Here is the HTML I would like to parse:
    $html = '
    <h1>title</h1>
    <div id="main">
        <div id="page">
            <div class="article">
                <h2><span>date1</span> <a href="link1">title1</a></h2>
                <p>text1</p>
            </div>
            <div class="article">
                <h2><span>date2</span> <a href="link2">title2</a></h2>
                <p>text2</p>
            </div>
        </div>
    </div>';
Here is what I would like to get:
    Array
    (
        [0] => Array
            (
                [link] => link1
                [title] => title1
                [description] => description1
                [date] => date1
            )

        [1] => Array
            (
                [link] => link2
                [title] => title2
                [description] => description2
                [date] => date2
            )

    )
And here is my PHP:
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXpath($doc);
    $nodes = $xpath->query("//div[@class='article']/h2/a");
    $list = array();
    $i = 0;
    if($nodes) {
        foreach($nodes as $node) {
            if($node->getAttribute('href')) {
                $link = $node->getAttribute('href');
                $list[$i]['link'] = $link;
            }
            if($node->nodeValue) {
                $title = $node->nodeValue;
                $list[$i]['title'] = $title;
            }
            if($node->nodeValue) {
                $description = $node->nodeValue;
                $list[$i]['description'] = $description;
            }
            if($node->nodeValue) {
                $date = $node->nodeValue;
                $list[$i]['date'] = $date;
            }
            $i++;
        }
    }
    echo '<pre>';
    echo print_r ($list);
    echo '</pre>';
The result is OK for link1, title1, link2 and title2, but not for description1, date1, description2 and date2.
I looked in the PHP manual for concrete cases close to mine, but most of what it says about DOMDocument is fairly theoretical. Could you help me, or point me to more concrete material?
EDIT: here is the content of $node
    DOMElement Object
    (
        [tagName] => a
        [schemaTypeInfo] =>
        [nodeName] => a
        [nodeValue] => title1
        [nodeType] => 1
        [parentNode] => (object value omitted)
        [childNodes] => (object value omitted)
        [firstChild] => (object value omitted)
        [lastChild] => (object value omitted)
        [previousSibling] => (object value omitted)
        [attributes] => (object value omitted)
        [ownerDocument] => (object value omitted)
        [namespaceURI] =>
        [prefix] =>
        [localName] => a
        [baseURI] =>
        [textContent] => title1
    )
    1
    DOMElement Object
    (
        [tagName] => a
        [schemaTypeInfo] =>
        [nodeName] => a
        [nodeValue] => title2
        [nodeType] => 1
        [parentNode] => (object value omitted)
        [childNodes] => (object value omitted)
        [firstChild] => (object value omitted)
        [lastChild] => (object value omitted)
        [previousSibling] => (object value omitted)
        [attributes] => (object value omitted)
        [ownerDocument] => (object value omitted)
        [namespaceURI] =>
        [prefix] =>
        [localName] => a
        [baseURI] =>
        [textContent] => title2
    )
    1
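The dump above suggests why the date and description come out wrong: each $node is the a element, so its nodeValue is always the link text. One possible approach, sketched below and not the only one, is to loop over the div.article containers and run relative XPath queries against each of them via the second (context node) argument of DOMXPath::query. The variable names $article, $a, $span and $p are illustrative. With the sample markup, the description would be text1 and text2, since that is what the p elements contain.

```php
<?php
// Same markup as in the question.
$html = '<h1>title</h1>
<div id="main"><div id="page">
<div class="article"><h2><span>date1</span> <a href="link1">title1</a></h2><p>text1</p></div>
<div class="article"><h2><span>date2</span> <a href="link2">title2</a></h2><p>text2</p></div>
</div></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$list = array();

// Select each article container instead of the <a> inside it.
foreach ($xpath->query("//div[@class='article']") as $article) {
    // Passing $article as the context node makes these paths relative to it.
    $a    = $xpath->query("h2/a", $article)->item(0);
    $span = $xpath->query("h2/span", $article)->item(0);
    $p    = $xpath->query("p", $article)->item(0);

    // item(0) returns null when nothing matched, so guard each lookup.
    $list[] = array(
        'link'        => $a ? $a->getAttribute('href') : null,
        'title'       => $a ? trim($a->textContent) : null,
        'description' => $p ? trim($p->textContent) : null,
        'date'        => $span ? trim($span->textContent) : null,
    );
}

echo '<pre>';
print_r($list);
echo '</pre>';
```

Appending with $list[] removes the need for a manual $i counter, and calling print_r without echo avoids the stray 1 that follows each dump.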