Get products from an online shop using a simple parser and pagination

I want to parse product links, names and prices. Here is my code. I am having trouble with the parsing because I don't know how to get the product link and the name. The price is fine, I can get that. The pagination doesn't work either.

    <h2>Telefonai Pigu</h2>
    </br>
    <?php
    include_once('simple_html_dom.php');

    $url = "http://pigu.lt/foto_gsm_mp3/mobilieji_telefonai/";

    // Start from the main page
    $nextLink = $url;

    // Loop on each next Link as long as it exsists
    while ($nextLink) {
        echo "<hr>nextLink: $nextLink<br>";

        //Create a DOM object
        $html = new simple_html_dom();
        // Load HTML from a url
        $html->load_file($nextLink);

        $phones = $html->find('div#productList span.product');

        foreach($phones as $phone) {
            // Get the link
            $linkas = $phone->href;
            // Get the name
            $pavadinimas = $phone->find('a[alt]', 0)->plaintext;
            // Get the name price and extract the useful part using regex
            $kaina = $phone->find('strong[class=nw]', 0)->plaintext;
            // This captures the integer part of decimal numbers: In "123,45" will capture "123"... Use @([\d,]+),?@ to capture the decimal part too
            echo $pavadinimas, " #----# ", $kaina, " #----# ", $linkas, "<br>";
            //$query = "insert into telefonai (pavadinimas,kaina,linkas) VALUES (?,?,?)";
            // $this->db->query($query, array($pavadinimas,$kaina, $linkas));
        }

        // Extract the next link, if not found return NULL
        $nextLink = ( ($temp = $html->find('div.pagination a[="rel"]', 0)) ? "https://www.pigu.lt".$temp->href : NULL );

        // Clear DOM object
        $html->clear();
        unset($html);
    }
    ?>

Output:

    nextLink: http://pigu.lt/foto_gsm_mp3/mobilieji_telefonai/

    A PHP Error was encountered
    Severity: Notice
    Message: Trying to get property of non-object
    Filename: views/pigu_view.php
    Line Number: 26

    #----# 999,00 Lt #----#

    A PHP Error was encountered
    Severity: Notice
    Message: Trying to get property of non-object
    Filename: views/pigu_view.php
    Line Number: 26


Please study the source of the page you are working with carefully first; from it you can work out which nodes you need. It is normal that code written for another website does not work here, because the two sites do not share the same markup and structure!

Let's go through it again, step by step.

$phones = $html->find('div#productList span.product'); gives you all the "phone containers", or what I called "blocks". Each block has the following structure:

    <span class="product">
      <div class="fakeProductContainer">
        <p class="productPhoto">
          <span class="">
            <span class="flags flag-disc-value" title="Akcija"><strong>500<br><span class="currencySymbol">Lt</span></strong></span>
            <span class="flags freeShipping" title="Nemokamas prekių atsiemimas į POST24 paštomatus. Pasiūlymas galioja iki sausio 31 d."></span>
          </span>
          <a href="/foto_gsm_mp3/mobilieji_telefonai/telefonas_sony_xperia_acro_s?id=4522595" title="Telefonas Sony Xperia acro S" class="photo-medium nobr"><img src="http://img.ruphp.com/php/c503caf69ad97d889842a5fd5b3ff372_medium.jpg" title="Telefonas Sony Xperia acro S" alt="Telefonas Sony Xperia acro S"></a>
        </p>
        <div class="price">
          <strong class="nw">999,00 Lt</strong>
          <del class="nw">1.499,00 Lt *</del>
        </div>
        <h3><a href="/foto_gsm_mp3/mobilieji_telefonai/telefonas_sony_xperia_acro_s?id=4522595" title="Telefonas Sony Xperia acro S">Sony Xperia acro S</a></h3>
        <p class="descFields">
          3G: <em>HSDPA 14.4 Mbps, HSUPA 5.76 Mbps</em><br>
          GPS: <em>Taip</em><br>
          NFC: <em>Taip</em><br>
          Operacinė sistema: <em>Android OS</em><br>
        </p>
      </div>
    </span>

The anchor that holds the product link is inside <p class="productPhoto">, and it is the only anchor there, so to extract it simply use $linkas = $phone->find('p.productPhoto a', 0)->href; (then make it absolute, since it is only a relative link).
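Making the relative link absolute can be sketched as below. This is only a minimal illustration: absoluteUrl() is a hypothetical helper, not part of simple_html_dom, and it assumes the site only uses root-relative hrefs (starting with "/"), as in the block shown above.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): turns a root-relative
// href such as "/foto_gsm_mp3/..." into an absolute URL.
// Assumption: hrefs are either already absolute or relative to the site root.
function absoluteUrl(string $base, string $href): string
{
    // Already absolute (http:// or https://): return it unchanged.
    if (preg_match('@^https?://@i', $href)) {
        return $href;
    }
    // Join host and path with exactly one slash between them.
    return rtrim($base, '/') . '/' . ltrim($href, '/');
}

// Example with the href from the product block above.
echo absoluteUrl(
    'http://pigu.lt',
    '/foto_gsm_mp3/mobilieji_telefonai/telefonas_sony_xperia_acro_s?id=4522595'
), "\n";
```

Inside the loop this would be used as $linkas = absoluteUrl('http://pigu.lt', $phone->find('p.productPhoto a', 0)->href);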

The product name is inside <h3>; again, we simply use $pavadinimas = $phone->find('h3 a', 0)->plaintext; to get it.

The price is inside <div class="price"><strong>, and again we use $kaina = $phone->find('div[class=price] strong', 0)->plaintext to extract it.

However, not all phones display a price, so we must check whether a price was actually found.
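One way to handle a missing price and to turn the text into a number is sketched below. parsePrice() is a hypothetical helper; it assumes that "." is the thousands separator and "," the decimal separator, as in "1.499,00 Lt" from the block above.

```php
<?php
// Hypothetical helper: converts price text such as "999,00 Lt" or
// "1.499,00 Lt *" into a float, or returns null when there is no number
// (for example when the product block has no price node at all).
// Assumption: "." separates thousands and "," separates decimals.
function parsePrice(?string $text): ?float
{
    // No text, or no digits in the text: report "no price".
    if ($text === null || !preg_match('@\d[\d.]*(?:,\d+)?@', $text, $m)) {
        return null;
    }
    // Drop thousands separators first, then turn the decimal comma into a dot.
    $normalized = str_replace(['.', ','], ['', '.'], $m[0]);
    return (float) $normalized;
}

var_dump(parsePrice('999,00 Lt'));
var_dump(parsePrice('1.499,00 Lt *'));
var_dump(parsePrice(null));
```

With simple_html_dom this could be called as $node = $phone->find('div[class=price] strong', 0); $kaina = parsePrice($node ? $node->plaintext : null); so that a missing node never has ->plaintext read from it.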

Finally, the HTML that contains the next-page link is the following:

    <div id="ListFootPannel">
      <div class="pages-list">
        <strong>1</strong>
        <a href="/foto_gsm_mp3/mobilieji_telefonai?page=2">2</a>
        <a href="/foto_gsm_mp3/mobilieji_telefonai?page=3">3</a>
        <a href="/foto_gsm_mp3/mobilieji_telefonai?page=4">4</a>
        <a href="/foto_gsm_mp3/mobilieji_telefonai?page=5">5</a>
        <a href="/foto_gsm_mp3/mobilieji_telefonai?page=6">6</a>
        <a rel="next" href="/foto_gsm_mp3/mobilieji_telefonai?page=2">Toliau</a>
      </div>
      <div class="pages-info"> Prekių </div>
    </div>

So we are interested in the <a rel="next"> tag, which can be retrieved with $html->find('div#ListFootPannel a[rel="next"]', 0)
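To try the rel="next" lookup on its own, without simple_html_dom installed, the same idea can be sketched with PHP's built-in DOM extension (DOMDocument and DOMXPath). This is a swapped-in technique for illustration only, not the approach used in this answer; nextPageHref() is a hypothetical helper name, and the sample markup is a shortened copy of the pagination block above.

```php
<?php
// Sketch using PHP's built-in DOM extension instead of simple_html_dom.
// Returns the href of the rel="next" link inside div#ListFootPannel,
// or null when there is no such link (i.e. on the last page).
function nextPageHref(string $html): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects empty input
    }

    // Collect libxml warnings about imperfect markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = $xpath->query('//div[@id="ListFootPannel"]//a[@rel="next"]');
    if ($links === false || $links->length === 0) {
        return null;
    }
    return $links->item(0)->getAttribute('href');
}

// Shortened copy of the pagination markup shown above.
$sample = '<div id="ListFootPannel"><div class="pages-list">'
        . '<strong>1</strong> '
        . '<a href="/foto_gsm_mp3/mobilieji_telefonai?page=2">2</a> '
        . '<a rel="next" href="/foto_gsm_mp3/mobilieji_telefonai?page=2">Toliau</a>'
        . '</div></div>';

var_dump(nextPageHref($sample));
```

A null result is what ends the while ($nextLink) loop: once the last page has no rel="next" anchor, there is nothing left to follow.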

With these changes applied to your original code, we get:

    include_once('simple_html_dom.php');

    $url = "http://pigu.lt/foto_gsm_mp3/mobilieji_telefonai/";

    // Start from the main page
    $nextLink = $url;

    // Loop over each next link as long as one exists
    while ($nextLink) {
        echo "nextLink: $nextLink<br>";

        // Create a DOM object
        $html = new simple_html_dom();
        // Load HTML from a URL
        $html->load_file($nextLink);

        // Get the phone blocks and extract the useful info
        $phones = $html->find('div#productList span.product');

        foreach ($phones as $phone) {
            // Get the link (relative in the markup, so prepend the host)
            $linkas = "http://pigu.lt" . $phone->find('p.productPhoto a', 0)->href;

            // Get the name
            $pavadinimas = $phone->find('h3 a', 0)->plaintext;

            // If no price node is found, find() returns a falsy value; fall back to "000"
            $kaina = "000";
            if ($tempPrice = $phone->find('div[class=price] strong', 0)) {
                // Capture the integer part, including "." thousands separators:
                // "1.499,00 Lt" gives "1.499", which becomes "1499".
                // Use @([\d.,]+)@ to capture the decimal part too.
                if (preg_match('@([\d.]+)@', $tempPrice->plaintext, $matches)) {
                    $kaina = str_replace('.', '', $matches[1]);
                }
            }

            echo $pavadinimas, " #----# ", $kaina, " #----# ", $linkas, "<br>";
        }

        // Extract the next link; NULL when there is none, which ends the loop
        $temp = $html->find('div#ListFootPannel a[rel="next"]', 0);
        $nextLink = $temp ? "http://pigu.lt" . $temp->href : NULL;

        // Clear the DOM object to free memory
        $html->clear();
        unset($html);

        echo "<hr>";
    }

Working demo