Updating Prices from SSG

by ASVShade

Good day, everyone!

I have been asked several times how the program that updates card prices from http://www.starcitygames.com works.

So I decided to describe it briefly and publish the source code.

Developing all of this took only a couple of weekends, but it will stay useful for a long, long time :)

I should note that it was first implemented a long time ago in Delphi 2009 and Delphi 2010, with the results stored in text files, and was later rewritten in Java + MySQL.

Today we will look at how it all works in the Java version.

The algorithm is as follows:

1. Build an array of the names of the cards we need
2. In a loop, request the pages for those cards over HTTP (a plain "GET")
3. Process the received pages as strings, stripping out the HTML clutter
4. Then process the cleaned-up pages as XML
5. Save the results
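
Steps 1 and 2 boil down to turning a card name into a search URL. Below is a minimal sketch of that step, not the code from my program: the class name SearchUrl and the method build are made up for illustration, and it uses the standard URLEncoder instead of replacing spaces by hand. The base address and the "substring" and "start" parameters are the same ones used in the main method further down.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper (not from the original program) that builds one search URL.
final class SearchUrl {

    // Base address copied from the request code shown later in the article.
    static final String BASE = "http://sales.starcitygames.com//search.php";

    // Returns the URL for the given card name, starting at result offset `start`.
    static String build(String cardName, int start) {
        try {
            // URLEncoder turns spaces into "+" and percent-encodes other special characters.
            String encoded = URLEncoder.encode(cardName, "UTF-8");
            return BASE + "?substring=" + encoded + "&start=" + start;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is available on every JVM, so this branch should never run.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("Black Lotus", 0));
        System.out.println(build("Urza's Tower", 50));
    }
}
```

Compared with a plain replacement of spaces, this also copes with names that contain apostrophes or commas, which would otherwise end up unescaped in the query string.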

First, a few words about the types

The array of cards:

import java.util.ArrayList;

// Container for the list of cards whose prices are looked up.
public class CardsSSGPrice {
    public ArrayList<CardSSGPrice> Cards = new ArrayList<CardSSGPrice>();
}

The card itself:

public class CardSSGPrice {
    public String fName;
    public String fSet;
    public String fCondition;
    public String fStock;
    public String fPrice;
    public CardSSGPrice(
            String Name,
            String Set,
            String Condition,
            String Stock,
            String Price) {
        // spaces become "+" so the name can go straight into the search URL
        this.fName = Name.replaceAll(" ", "+");
        this.fSet = Set;
        this.fCondition = Condition;
        this.fStock = Stock;
        this.fPrice = Price;
    }
}

The method that requests a page:

    // Needs java.io.BufferedReader, java.io.InputStreamReader,
    // java.io.IOException and java.net.URL.
    // Downloads the page and returns its text (empty or partial if the request fails).
    private String GetHTMLFile(URL u) {
        StringBuilder s = new StringBuilder();
        // try-with-resources closes the reader and the underlying stream,
        // and avoids calling close() on a stream that was never opened
        try (BufferedReader BR = new BufferedReader(new InputStreamReader(u.openStream()))) {
            String Temp = BR.readLine();
            while (Temp != null) {
                s.append(Temp).append("\n");
                Temp = BR.readLine();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return s.toString();
    }
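
The reading loop can also be separated from the network call, which makes it easy to try out without touching the site. This is only a sketch: the class PageReader and the method readAll are invented for the example, and decoding the bytes as UTF-8 is my assumption here, since the method above relies on the platform default encoding.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not from the original program).
final class PageReader {

    // Reads the whole stream as UTF-8 text, ending every line with "\n".
    static String readAll(InputStream in) {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader and the wrapped stream, even on errors
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A fake "page" held in memory stands in for the HTTP response.
        byte[] fake = "<table>\n<tr></tr>\n</table>".getBytes(StandardCharsets.UTF_8);
        System.out.print(readAll(new ByteArrayInputStream(fake)));
    }
}
```

With a real page the call would look like readAll(url.openStream()), wrapped in the usual IOException handling.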

The method that strips the HTML clutter from the page:

private String ParsStr(String str) {
    int posB;
    int posE;
    // cut off the header: everything before the first result row
    posB = str.indexOf("<tr class=\"deckdbbody\">", 0);
    str = "<table>\n" + str.substring(posB);
    // cut off the tail: everything from the row that follows the results
    posE = str.indexOf("<tr><td colspan=\"10\" align=\"center\">", 0);
    str = str.substring(0, posE) + "</table>";
    // remove the small stuff
    str = str.replaceAll("<br>", "");
    str = str.replaceAll("&nbsp", "");
    str = str.replaceAll("&", "");
    str = str.replaceAll("\\]", "");
    str = str.replaceAll("\\[", "");
    str = str.replaceAll("<div style='text-align: center;'>", "");
    str = str.replaceAll("</div>", "");
    str = str.replaceAll("<b>", "");
    str = str.replaceAll("</b>", "");
    str = str.replaceAll("<tr height=\"4\"><td colspan=\"11\" class=\"deckdbheader\"></td></tr>", "");
    str = str.replaceAll("<img src=\"http://static.starcitygames.com/sales//images/plus_white.png\" alt=\"\\+1 Qty\\\" width=\\\"16\\\" height=\\\"16\\\">", "");
    // remove all <input> tags; replace() matches the tag literally, whereas
    // replaceAll() would treat it as a regex and could loop forever on "?" or "+"
    posB = str.indexOf("<input ");
    while (posB != -1) {
        posE = str.indexOf(">", posB);
        str = str.replace(str.substring(posB, posE + 1), "");
        posB = str.indexOf("<input ");
    }
    // remove all <img> tags, one occurrence at a time, by cutting them out
    posB = str.indexOf("<img ");
    while (posB != -1) {
        posE = str.indexOf(">", posB);
        str = str.substring(0, posB) + str.substring(posE + 1);
        posB = str.indexOf("<img ");
    }
    return str;
}
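
The two tag-removal loops can also be written as a single regular expression. The sketch below is an alternative, not what my program does; the class TagStripper and its method are names invented for the example. It assumes that no attribute value contains a ">" character, which is the same assumption the loops above make.

```java
// Hypothetical helper (not from the original program).
final class TagStripper {

    // Removes every <input ...> and <img ...> tag, whatever its attributes contain.
    static String stripInputsAndImages(String html) {
        // (?i)    match tag names case-insensitively
        // \b      the tag name must end here, so "<inputs>" is left alone
        // [^>]*   everything up to the closing bracket
        return html.replaceAll("(?i)<(input|img)\\b[^>]*>", "");
    }

    public static void main(String[] args) {
        String row = "<td><input type=\"text\" value=\"a+b?\"><img src=\"x.png\">NM/M</td>";
        System.out.println(stripInputsAndImages(row)); // prints <td>NM/M</td>
    }
}
```

Because the tag text is never reused as a pattern, characters such as "+" or "?" inside attributes cause no trouble here.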

And finally, the most important method:

    public ssg(WorkDB MyDB) {
        // fill the card list (implement this whichever way suits you; I, for example, load it from the database)
        CardsSSGPrice MyCard = MyDB.GetCardsForPriceSSG();
        // iterate over the cards
        for (int i_MyCard = 0; i_MyCard < MyCard.Cards.size(); i_MyCard++) {
            System.out.println((i_MyCard + 1) + " (" + MyCard.Cards.size() + ") [" + MyCard.Cards.get(i_MyCard).fName + "]");
            try {
                boolean PageGo = true;
                int PageNum = 0;
                // while there is still a page to process
                while (PageGo) {
                    // build the request URL and send the request
                    String str = GetHTMLFile(new URL(
                            "http://sales.starcitygames.com//search.php" +
                                    "?substring=" + MyCard.Cards.get(i_MyCard).fName +
                                    "&start=" + Integer.toString(PageNum)));
                    // check whether there are any results
                    if (!str.contains("We're sorry! There are no results with the name ")) {
                        // send the page through the string clean-up
                        str = ParsStr(str);
                        // from here on we work with XML
                        SAXBuilder builder = new SAXBuilder();
                        Document MyDoc = builder.build(new ByteArrayInputStream(str.getBytes("UTF-8")));
                        List Children = MyDoc.getRootElement().getChildren();
                        // these two variables keep the previous name and set: when consecutive rows share a name, it is not repeated in the later rows
                        String Name_Old = "";
                        String Set_Old = "";
                        for (int i = 0; i < Children.size(); i++) {
                            Element Child = (Element) Children.get(i);
                            String Name = "";
                            String Set = "";
                            String Condition = "";
                            String Stock = "";
                            String Price = "";
                            // the row carries a new name
                            if (Child.getChildren().size() == 10) {
                                Name = ((Element) Child.getChildren().get(0)).getValue().trim();
                                Set = ((Element) Child.getChildren().get(1)).getValue().trim();
                                Condition = ((Element) Child.getChildren().get(6)).getValue().trim();
                                Stock = ((Element) Child.getChildren().get(7)).getValue().replaceAll("Out of Stock", "0");
                                Price = ((Element) Child.getChildren().get(8)).getValue().replaceAll("\\$", "");
                                Name_Old = Name;
                                Set_Old = Set;
                            }
                            // the name has not changed
                            if (Child.getChildren().size() == 9) {
                                Name = Name_Old;
                                Set = Set_Old;
                                Condition = ((Element) Child.getChildren().get(5)).getValue().trim();
                                Stock = ((Element) Child.getChildren().get(6)).getValue().replaceAll("Out of Stock", "0");
                                Price = ((Element) Child.getChildren().get(7)).getValue().replaceAll("\\$", "");
                            }
                            // save the result (implement this whichever way suits you; I, for example, store it in the database)
                            MyDB.PriceToDB(Name, Set, Condition, Stock, Price);
                        }
                        if (Children.size() == 50) {
                            PageGo = true;
                            PageNum = PageNum + 50;
                            System.out.println("=========Next Page==========");
                        } else {
                            PageGo = false;
                            PageNum = 0;
                        }
                    } else {
                        PageGo = false;
                        PageNum = 0;
                        System.out.println("=========No Results==========");
                    }

                }
            } catch (MalformedURLException e) {
                e.printStackTrace();
            } catch (JDOMException e) {
                e.printStackTrace();
            } catch (UnsupportedEncodingException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
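
The least obvious part of the method is how rows without a name inherit it from the row above. Here is that logic on its own, as a sketch that works on plain lists of cell texts instead of JDOM elements. The class RowParser is made up for the example; the cell positions (6, 7 and 8 in a full row, shifted left by one in a short row) are taken from the code above. Unlike the method above, the sketch skips rows that have neither 10 nor 9 cells instead of saving empty values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not from the original program).
final class RowParser {

    // Returns one {name, set, condition, stock, price} array per recognised row.
    static List<String[]> parse(List<List<String>> rows) {
        List<String[]> result = new ArrayList<>();
        String lastName = "";
        String lastSet = "";
        for (List<String> cells : rows) {
            int offset;
            if (cells.size() == 10) {
                // full row: remember the name and set for the rows that follow
                lastName = cells.get(0).trim();
                lastSet = cells.get(1).trim();
                offset = 0;
            } else if (cells.size() == 9) {
                // short row: one cell is missing, so the rest shifts left by one
                offset = -1;
            } else {
                continue; // unexpected layout, skip the row
            }
            String condition = cells.get(6 + offset).trim();
            String stock = cells.get(7 + offset).replace("Out of Stock", "0").trim();
            String price = cells.get(8 + offset).replace("$", "").trim();
            result.add(new String[] {lastName, lastSet, condition, stock, price});
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample rows, only to show the shape of the data.
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("Black Lotus", "Alpha", "", "", "", "", "NM/M", "Out of Stock", "$9999.99", ""),
                Arrays.asList("Alpha", "", "", "", "", "SP", "2", "$7999.99", ""));
        for (String[] row : parse(rows)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Keeping this step free of HTTP and XML makes it easy to check against a few hand-written rows before running it on real pages.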

As you can see, there is not much code, so I don't think it will be hard to figure out.

Good luck, everyone!
