I have code that reads an HTML file from a local web server (localhost), converts it to XHTML with tidy, and then loads that XHTML into my DOM. The code looks like this:
<?php
function getXHTML($html)
{
    $options = array(
        "output-html" => true,
        "quote-nbsp" => true,
        "drop-proprietary-attributes" => true,
        "drop-font-tags" => true,
        "drop-empty-paras" => true,
        "hide-comments" => true
    );
    $tidy = new tidy();
    $xhtml = $tidy->repairString($html, $options);
    echo $xhtml;
    return $xhtml;
}

$content = file_get_contents("http://localhost/filename.htm");
$page = new DOMDocument();
$xpath = new DOMXPath($page);
$content = getXHTML($content); // this is a tidy function to return XHTML
$page->loadHTML($content);
$totalPath = "//body/table[3]/tbody/tr[1]/td[4]";
$total = $xpath->query($totalPath);
echo $total->length; // this shows zero
?>
The contents of filename.htm look like this:
<!-- saved from url=(0041)http://www.rtu.ac.in/results/reformat.php --> <html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> <link rel="SHORTCUT ICON" href="http://www.rtu.ac.in/favicon.ico"> <link href="./Result - Rajasthan Technical University6_files/styleresults.css" rel="stylesheet" type="text/css"> <title>Result - Rajasthan Technical University</title> </head> <body> <table width="773" cellpadding="5" cellspacing="0" align="center"> <tbody><tr height="60"> <td width="16%" height="60" valign="top"><font color="brown" size="+2"><img src="./Result - Rajasthan Technical University6_files/logo.jpg" width="100" height="102" border="0" align="right"> </font></td> <td width="72%" height="60" align="center" valign="top"><p><font color="brown" size="+2"><strong>RAJASTHAN TECHNICAL UNIVERSITY </strong></font></p><font color="brown" size="+2"> <p><font size="+1"><strong>B.Tech -IVth SEMESTER -2010(Main) 16.5.2011</strong></font></p><font size="+1"> </font></font></td> <td width="12%" height="80"><strong>www.rtu.ac.in</strong> </td> </tr> </tbody></table> <br> <br> <table width="783" align="center" cellpadding="5" cellspacing="0" class="table"> <tbody> <tr> <td width="34%" align="center" valign="top" rowspan="2"><strong>Subject(s) Name </strong> </td> <td width="10%" align="center" valign="top" colspan="1" rowspan="2"> <strong>Subject(s) Code </strong> </td> <td align="center" valign="top" colspan="3" rowspan="1"><strong>Marks Obtained </strong> </td> </tr> <tr> <td width="20%" align="center"><strong>Internal</strong> </td> <td width="18%" align="center"><strong>Theory</strong> </td> <td width="18%" align="center"> </td> </tr> <tr> <td width="34%" align="center" style=" border-bottom: 0px none transparent;"><strong>SUBJECT-1</strong> </td> <td width="10%" align="center" style=" border-bottom: 0px none transparent;">4551</td> <td width="20%" align="center" style=" border-bottom: 0px none transparent;"> 16</td> <td width="18%" align="center" 
style=" border-bottom: 0px none transparent;"> 50</td> <td width="18%" align="center" style=" border-bottom: 0px none transparent;"> </td> </tr> <tr> <td width="34%" align="center" style=" border-bottom: 0px none transparent;"><strong>SUBJECT-2</strong> </td> <td width="10%" align="center" style=" border-bottom: 0px none transparent;"> 4552</td> <td width="20%" align="center" style=" border-bottom: 0px none transparent;"> 17</td> <td width="18%" align="center" style=" border-bottom: 0px none transparent;"> 61</td> <td width="18%" align="center" style=" border-bottom: 0px none transparent;"> </td> </tr> <tr> <td width="34%" align="center" style=" border-bottom: 0px none transparent;"><strong>SUBJECT-3</strong> </td> <td width="10%" align="center" style=" border-bottom: 0px none transparent;">4553</td> <td width="20%" align="center" style=" border-bottom: 0px none transparent;"> 19</td> <td width="18%" align="center" style=" border-bottom: 0px none transparent;"> 49</td> <td width="18%" align="center" style=" border-bottom: 0px none transparent;"> </td> </tr> <tr> <td align="center" style=" border-bottom: 0px none transparent;"><strong>SUBJECT-4</strong> </td> <td align="center" style=" border-bottom: 0px none transparent;">4554</td> <td align="center" style=" border-bottom: 0px none transparent;"> 14</td> <td align="center" style=" border-bottom: 0px none transparent;"> 68</td> <td align="center" style=" border-bottom: 0px none transparent;"> </td> </tr> <tr> <td align="center" style=" border-bottom: 0px none transparent;"><strong>SUBJECT-5</strong> </td> <td align="center" style=" border-bottom: 0px none transparent;">4555</td> <td align="center" style=" border-bottom: 0px none transparent;"> 14</td> <td align="center" style=" border-bottom: 0px none transparent;"> 36</td> <td align="center" style=" border-bottom: 0px none transparent;"> </td> </tr> <tr> <td width="34%" align="center" style=" border-bottom: 0px none transparent;"><strong>SUBJECT-6</strong> </td> 
<td width="10%" align="center" style=" border-bottom: 0px none transparent;">4556</td> <td width="20%" align="center" style=" border-bottom: 0px none transparent;"> 19</td> <td width="18%" align="center" style=" border-bottom: 0px none transparent;"> 48</td> <td width="18%" align="center" style=" border-bottom: 0px none transparent;"> </td> </tr><tr> <td align="center" style=" border-bottom: 0px none transparent;"> </td> <td align="center" style=" border-bottom: 0px none transparent;"> </td> <td align="center" style=" border-bottom: 0px none transparent;"> </td> <td align="center" style=" border-bottom: 0px none transparent;"> <strong>Internal</strong> </td> <td width="18%" align="center" style=" border-bottom: 0px none transparent;"><strong>Practical</strong> </td> </tr> <tr> <td width="34%" align="center" style=" border-bottom: 0px none transparent;"><strong>PSUBJECT-1</strong> </td> <td width="10%" align="center" style=" border-bottom: 0px none transparent;">4174</td> <td width="20%" align="center" style=" border-bottom: 0px none transparent;"> </td> <td width="18%" align="center" style=" border-bottom: 0px none transparent;"> 29</td> <td width="18%" align="center" style=" border-bottom: 0px none transparent;">48</td> </tr> <tr> <td width="34%" align="center" style=" border-bottom: 0px none transparent;"><strong>PSUBJECT-2</strong> </td> <td width="10%" align="center" style=" border-bottom: 0px none transparent;">4175</td> <td width="20%" align="center" style=" border-bottom: 0px none transparent;"> </td> <td width="18%" align="center" style=" border-bottom: 0px none transparent;"> 16</td> <td width="18%" align="center" style=" border-bottom: 0px none transparent;">26</td> </tr> <tr> <td width="34%" align="center" style=" border-bottom: 0px none transparent;"><strong>PSUBJECT-3</strong> </td> <td width="10%" align="center" style=" border-bottom: 0px none transparent;">4171</td> <td width="20%" align="center" style=" border-bottom: 0px none transparent;"> </td> 
<td width="18%" align="center" style=" border-bottom: 0px none transparent;"> 15</td> <td width="18%" align="center" style=" border-bottom: 0px none transparent;">27</td> </tr> <tr> <td align="center" style=" border-bottom: 0px none transparent;"><strong>PSUBJECT-4</strong> </td> <td align="center" style=" border-bottom: 0px none transparent;">4172</td> <td align="center" style=" border-bottom: 0px none transparent;"> </td> <td align="center" style=" border-bottom: 0px none transparent;"> 17</td> <td align="center" style=" border-bottom: 0px none transparent;">29</td> </tr> <tr> <td align="center" style=" border-bottom: 0px none transparent;"><strong>PSUBJECT-5</strong> </td> <td align="center" style=" border-bottom: 0px none transparent;">4173</td> <td align="center" style=" border-bottom: 0px none transparent;"> </td> <td align="center" style=" border-bottom: 0px none transparent;"> 29</td> <td align="center" style=" border-bottom: 0px none transparent;">46</td> </tr> <tr> <td width="34%" align="center" style=" border-bottom: 0px none transparent;"><strong>Disipline (Deca)</strong> </td> <td width="10%" align="center" style=" border-bottom: 0px none transparent;">4176</td> <td width="20%" align="center" style=" border-bottom: 0px none transparent;"> </td> <td width="18%" align="center" style=" border-bottom: 0px none transparent;"> </td> <td width="18%" align="center" style=" border-bottom: 0px none transparent;">46</td> </tr> <tr><td> </td><td> </td><td> </td><td> </td><td> </td></tr></tbody> </table> <br><table width="783" align="center" cellpadding="5" cellspacing="0" class="table"> <tbody><tr> <td width="18%" align="center" valign="top"><strong>Practical Marks </strong> </td> <td width="18%" align="center" valign="top">328</td> <td width="19%" align="center" valign="top"><strong>Theory Marks </strong> </td> <td width="19%" align="center" valign="top">411</td> </tr> <tr> <td width="18%" align="center"><strong>Institute Code </strong> </td> <td width="18%" 
align="center"> 1229 </td> <td width="19%" align="center"><strong>DECCA </strong> </td> <td width="19%" align="center">4176</td> </tr> <tr> <td width="18%" align="center"><strong>Division </strong> </td> <td width="18%" align="center"> PASS </td> <td width="19%" align="center"><strong>Grand Total </strong> </td> <td width="19%" align="center">739</td> </tr> </tbody></table> <!-- Reformatter by Shashank Kumar Jain (CS, IIIrd Year, 2010-11) --> <div id="csscan-wrapper" style="display: none; "><h2 id="csscan-header">element</h2><table id="csscan-table"><tbody><tr><th colspan="2" id="csscan-header-font" class="csscan-header">Font</th></tr><tr id="csscan-row-font-family"><td id="csscan-property-font-family" class="csscan-property">font-family</td><td id="csscan-value-font-family" class="csscan-value"></td></tr><tr id="csscan-row-font-size"><td id="csscan-property-font-size" class="csscan-property">font-size</td><td id="csscan-value-font-size" class="csscan-value"></td></tr><tr id="csscan-row-font-style"><td id="csscan-property-font-style" class="csscan-property">font-style</td><td id="csscan-value-font-style" class="csscan-value"></td></tr><tr id="csscan-row-font-variant"><td id="csscan-property-font-variant" class="csscan-property">font-variant</td><td id="csscan-value-font-variant" class="csscan-value"></td></tr><tr id="csscan-row-font-weight"><td id="csscan-property-font-weight" class="csscan-property">font-weight</td><td id="csscan-value-font-weight" class="csscan-value"></td></tr><tr id="csscan-row-letter-spacing"><td id="csscan-property-letter-spacing" class="csscan-property">letter-spacing</td><td id="csscan-value-letter-spacing" class="csscan-value"></td></tr><tr id="csscan-row-line-height"><td id="csscan-property-line-height" class="csscan-property">line-height</td><td id="csscan-value-line-height" class="csscan-value"></td></tr><tr id="csscan-row-text-decoration"><td id="csscan-property-text-decoration" class="csscan-property">text-decoration</td><td 
id="csscan-value-text-decoration" class="csscan-value"></td></tr><tr id="csscan-row-text-align"><td id="csscan-property-text-align" class="csscan-property">text-align</td><td id="csscan-value-text-align" class="csscan-value"></td></tr><tr id="csscan-row-text-indent"><td id="csscan-property-text-indent" class="csscan-property">text-indent</td><td id="csscan-value-text-indent" class="csscan-value"></td></tr><tr id="csscan-row-text-transform"><td id="csscan-property-text-transform" class="csscan-property">text-transform</td><td id="csscan-value-text-transform" class="csscan-value"></td></tr><tr id="csscan-row-white-space"><td id="csscan-property-white-space" class="csscan-property">white-space</td><td id="csscan-value-white-space" class="csscan-value"></td></tr><tr id="csscan-row-word-spacing"><td id="csscan-property-word-spacing" class="csscan-property">word-spacing</td><td id="csscan-value-word-spacing" class="csscan-value"></td></tr><tr id="csscan-row-color"><td id="csscan-property-color" class="csscan-property">color</td><td id="csscan-value-color" class="csscan-value"></td></tr><tr><th colspan="2" id="csscan-header-background" class="csscan-header">Background</th></tr><tr id="csscan-row-background-attachment"><td id="csscan-property-background-attachment" class="csscan-property">bg-attachment</td><td id="csscan-value-background-attachment" class="csscan-value"></td></tr><tr id="csscan-row-background-color"><td id="csscan-property-background-color" class="csscan-property">bg-color</td><td id="csscan-value-background-color" class="csscan-value"></td></tr><tr id="csscan-row-background-image"><td id="csscan-property-background-image" class="csscan-property">bg-image</td><td id="csscan-value-background-image" class="csscan-value"></td></tr><tr id="csscan-row-background-position"><td id="csscan-property-background-position" class="csscan-property">bg-position</td><td id="csscan-value-background-position" class="csscan-value"></td></tr><tr 
id="csscan-row-background-repeat"><td id="csscan-property-background-repeat" class="csscan-property">bg-repeat</td><td id="csscan-value-background-repeat" class="csscan-value"></td></tr><tr><th colspan="2" id="csscan-header-size" class="csscan-header">Box</th></tr><tr id="csscan-row-width"><td id="csscan-property-width" class="csscan-property">width</td><td id="csscan-value-width" class="csscan-value"></td></tr><tr id="csscan-row-height"><td id="csscan-property-height" class="csscan-property">height</td><td id="csscan-value-height" class="csscan-value"></td></tr><tr id="csscan-row-border-top"><td id="csscan-property-border-top" class="csscan-property">border-top</td><td id="csscan-value-border-top" class="csscan-value"></td></tr><tr id="csscan-row-border-right"><td id="csscan-property-border-right" class="csscan-property">border-right</td><td id="csscan-value-border-right" class="csscan-value"></td></tr><tr id="csscan-row-border-bottom"><td id="csscan-property-border-bottom" class="csscan-property">border-bottom</td><td id="csscan-value-border-bottom" class="csscan-value"></td></tr><tr id="csscan-row-border-left"><td id="csscan-property-border-left" class="csscan-property">border-left</td><td id="csscan-value-border-left" class="csscan-value"></td></tr><tr id="csscan-row-margin"><td id="csscan-property-margin" class="csscan-property">margin</td><td id="csscan-value-margin" class="csscan-value"></td></tr><tr id="csscan-row-padding"><td id="csscan-property-padding" class="csscan-property">padding</td><td id="csscan-value-padding" class="csscan-value"></td></tr><tr id="csscan-row-max-height"><td id="csscan-property-max-height" class="csscan-property">max-height</td><td id="csscan-value-max-height" class="csscan-value"></td></tr><tr id="csscan-row-min-height"><td id="csscan-property-min-height" class="csscan-property">min-height</td><td id="csscan-value-min-height" class="csscan-value"></td></tr><tr id="csscan-row-max-width"><td id="csscan-property-max-width" 
class="csscan-property">max-width</td><td id="csscan-value-max-width" class="csscan-value"></td></tr><tr id="csscan-row-min-width"><td id="csscan-property-min-width" class="csscan-property">min-width</td><td id="csscan-value-min-width" class="csscan-value"></td></tr><tr id="csscan-row-outline-color"><td id="csscan-property-outline-color" class="csscan-property">outline-color</td><td id="csscan-value-outline-color" class="csscan-value"></td></tr><tr id="csscan-row-outline-style"><td id="csscan-property-outline-style" class="csscan-property">outline-style</td><td id="csscan-value-outline-style" class="csscan-value"></td></tr><tr id="csscan-row-outline-width"><td id="csscan-property-outline-width" class="csscan-property">outline-width</td><td id="csscan-value-outline-width" class="csscan-value"></td></tr><tr><th colspan="2" id="csscan-header-position" class="csscan-header">Positioning</th></tr><tr id="csscan-row-position"><td id="csscan-property-position" class="csscan-property">position</td><td id="csscan-value-position" class="csscan-value"></td></tr><tr id="csscan-row-top"><td id="csscan-property-top" class="csscan-property">top</td><td id="csscan-value-top" class="csscan-value"></td></tr><tr id="csscan-row-bottom"><td id="csscan-property-bottom" class="csscan-property">bottom</td><td id="csscan-value-bottom" class="csscan-value"></td></tr><tr id="csscan-row-right"><td id="csscan-property-right" class="csscan-property">right</td><td id="csscan-value-right" class="csscan-value"></td></tr><tr id="csscan-row-left"><td id="csscan-property-left" class="csscan-property">left</td><td id="csscan-value-left" class="csscan-value"></td></tr><tr id="csscan-row-float"><td id="csscan-property-float" class="csscan-property">float</td><td id="csscan-value-float" class="csscan-value"></td></tr><tr id="csscan-row-display"><td id="csscan-property-display" class="csscan-property">display</td><td id="csscan-value-display" class="csscan-value"></td></tr><tr id="csscan-row-clear"><td 
id="csscan-property-clear" class="csscan-property">clear</td><td id="csscan-value-clear" class="csscan-value"></td></tr><tr id="csscan-row-z-index"><td id="csscan-property-z-index" class="csscan-property">z-index</td><td id="csscan-value-z-index" class="csscan-value"></td></tr><tr><th colspan="2" id="csscan-header-list" class="csscan-header">List</th></tr><tr id="csscan-row-list-style-image"><td id="csscan-property-list-style-image" class="csscan-property">list-style-image</td><td id="csscan-value-list-style-image" class="csscan-value"></td></tr><tr id="csscan-row-list-style-type"><td id="csscan-property-list-style-type" class="csscan-property">list-style-type</td><td id="csscan-value-list-style-type" class="csscan-value"></td></tr><tr id="csscan-row-list-style-position"><td id="csscan-property-list-style-position" class="csscan-property">list-style-position</td><td id="csscan-value-list-style-position" class="csscan-value"></td></tr><tr><th colspan="2" id="csscan-header-table" class="csscan-header">Table</th></tr><tr id="csscan-row-vertical-align"><td id="csscan-property-vertical-align" class="csscan-property">vertical-align</td><td id="csscan-value-vertical-align" class="csscan-value"></td></tr><tr id="csscan-row-border-collapse"><td id="csscan-property-border-collapse" class="csscan-property">border-collapse</td><td id="csscan-value-border-collapse" class="csscan-value"></td></tr><tr id="csscan-row-border-spacing"><td id="csscan-property-border-spacing" class="csscan-property">border-spacing</td><td id="csscan-value-border-spacing" class="csscan-value"></td></tr><tr id="csscan-row-caption-side"><td id="csscan-property-caption-side" class="csscan-property">caption-side</td><td id="csscan-value-caption-side" class="csscan-value"></td></tr><tr id="csscan-row-empty-cells"><td id="csscan-property-empty-cells" class="csscan-property">empty-cells</td><td id="csscan-value-empty-cells" class="csscan-value"></td></tr><tr id="csscan-row-table-layout"><td 
id="csscan-property-table-layout" class="csscan-property">table-layout</td><td id="csscan-value-table-layout" class="csscan-value"></td></tr><tr><th colspan="2" id="csscan-header-effects" class="csscan-header">Effects</th></tr><tr id="csscan-row-text-shadow"><td id="csscan-property-text-shadow" class="csscan-property">text-shadow</td><td id="csscan-value-text-shadow" class="csscan-value"></td></tr><tr id="csscan-row--webkit-box-shadow"><td id="csscan-property--webkit-box-shadow" class="csscan-property">-webkit-box-shadow</td><td id="csscan-value--webkit-box-shadow" class="csscan-value"></td></tr><tr id="csscan-row-border-radius"><td id="csscan-property-border-radius" class="csscan-property">border-radius</td><td id="csscan-value-border-radius" class="csscan-value"></td></tr><tr><th colspan="2" id="csscan-header-other" class="csscan-header">Other</th></tr><tr id="csscan-row-overflow"><td id="csscan-property-overflow" class="csscan-property">overflow</td><td id="csscan-value-overflow" class="csscan-value"></td></tr><tr id="csscan-row-cursor"><td id="csscan-property-cursor" class="csscan-property">cursor</td><td id="csscan-value-cursor" class="csscan-value"></td></tr><tr id="csscan-row-visibility"><td id="csscan-property-visibility" class="csscan-property">visibility</td><td id="csscan-value-visibility" class="csscan-value"></td></tr></tbody></table></div></body></html>
The XPath above is correct, since I checked it with FirePath. Can anyone tell me what I'm doing wrong?
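To help narrow this down, here is a minimal sketch that leaves tidy and the web server out. It uses a tiny inline HTML string as a hypothetical stand-in for filename.htm, and the variable names `$xpathBefore` and `$xpathAfter` are made up for illustration. It compares a DOMXPath object created before `loadHTML()`, as in the code above, with one created after it. This is a reproduction attempt under those assumptions, not a confirmed diagnosis.

```php
<?php
// Hypothetical stand-in for the tidied page; the real filename.htm is not used here.
$html = '<html><body><table><tbody><tr><td>739</td></tr></tbody></table></body></html>';

$page = new DOMDocument();

// Same order as in the code above: the XPath object is created before the markup is loaded.
$xpathBefore = new DOMXPath($page);

$page->loadHTML($html);

// For comparison: an XPath object created after the markup is loaded.
$xpathAfter = new DOMXPath($page);

// Shortened path that matches the stand-in markup, not the real page.
$query = '//body/table[1]/tbody/tr[1]/td[1]';

$before = $xpathBefore->query($query);
$after  = $xpathAfter->query($query);

echo 'created before loadHTML: ', ($before === false ? 'query failed' : $before->length), PHP_EOL;
echo 'created after loadHTML: ', ($after === false ? 'query failed' : $after->length), PHP_EOL;
```

If the two counts differ, the order of `new DOMXPath($page)` and `$page->loadHTML(...)` in the original code would be the first thing to check.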