Parse HTML using Perl


I have the following HTML:

<div>    <strong>date: </strong>        19 july 2011 </div> 

I have been using HTML::TreeBuilder to parse out particular parts of the HTML using either tags or classes, but the aforementioned HTML is giving me difficulty when I try to extract the date only.

For instance, I tried:

for ( $tree->look_down( '_tag' => 'div' ) ) {
    $date = $_->look_down( '_tag' => 'strong' )->as_trimmed_text;
}

But this seems to conflict with an earlier use of <strong>. I am looking to parse out '19 july 2011'. I have read the documentation on TreeBuilder but cannot find a way of doing this.

How can I do this using TreeBuilder?

The "dump" method is invaluable in finding your way around an HTML::TreeBuilder object.
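As a minimal sketch of what that looks like, the script below builds a tree from the HTML in the question and calls dump on it. The use of new_from_content here is just a convenience for the example; the exact layout of the dump output may vary between module versions, so none is shown.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use HTML::TreeBuilder;

# Build a tree from the sample HTML in the question.
my $tree = HTML::TreeBuilder->new_from_content(
    '<div>    <strong>date: </strong>        19 july 2011 </div>'
);

# dump() prints an indented outline of the tree to STDOUT.
# Elements appear as tags; text nodes appear as quoted strings,
# which makes it easy to see where the date text actually lives.
$tree->dump;
```

In the outline, the date should show up as a quoted string that is a sibling of the <strong> element rather than a child of it, which is why calling as_trimmed_text on <strong> only returns the label.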

The solution here is to get the parent element of the element you're interested in (which is, in this case, the <div>) and iterate across its content list. The text you're interested in will be in plain text nodes, i.e. elements in the list that are not references to HTML::Element objects.

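#!/usr/bin/perl

use strict;
use warnings;

use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new;
$tree->parse(<<END_OF_HTML);
<div>    <strong>date: </strong>        19 july 2011 </div>
END_OF_HTML
$tree->eof;    # signal end of input so the parser finishes building the tree

my $date;

# Visit each <div> and keep its plain-text children (the non-references)
for my $div ($tree->look_down( _tag => 'div')) {
  for ($div->content_list) {
    $date = $_ unless ref;
  }
}

print "$date\n";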
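A variation on the same idea, offered only as a sketch: start from the <strong> label and take its right-hand sibling, which is closer to the original attempt. This assumes the label you want is the first <strong> that look_down finds; on a page with an earlier <strong>, the search would need narrowing first (for example by starting from the right <div>). The trimming regex is an addition for the example, not part of the answer above.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new_from_content(
    '<div>    <strong>date: </strong>        19 july 2011 </div>'
);

my $date;

# In scalar context look_down returns the first matching element (or undef).
if ( my $strong = $tree->look_down( _tag => 'strong' ) ) {

    # In scalar context right() returns the next sibling in the parent's
    # content list. A text node comes back as a plain string, not an object.
    my $next = $strong->right;

    if ( defined $next && !ref $next ) {
        ( $date = $next ) =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
    }
}

print defined $date ? "$date\n" : "no date found\n";
```

Compared with looping over the whole content list, this picks out only the text immediately following the label, which may be safer when the <div> holds several text nodes.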
