Parse HTML using Perl


I have the following HTML:

<div>    <strong>date: </strong>        19 july 2011 </div> 

I have been using HTML::TreeBuilder to parse out particular parts of the HTML using either tags or classes, but the aforementioned HTML is giving me difficulty when I try to extract only the date.

For instance, I tried:

for ( $tree->look_down( '_tag' => 'div' ) ) {
    $date = $_->look_down( '_tag' => 'strong' )->as_trimmed_text;
}

but this seems to conflict with an earlier use of <strong>. I am looking to parse out '19 july 2011'. I have read the documentation on TreeBuilder but cannot find a way of doing this.

How can I do this using TreeBuilder?

The "dump" method is invaluable for finding your way around an HTML::TreeBuilder object.
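As a minimal sketch of what that looks like, the snippet below parses the HTML from the question and dumps the resulting tree. It assumes the markup is passed in as a single string through new_from_content, which is a choice made for this illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::TreeBuilder;

# Assumption for illustration: the question's HTML supplied as one string.
my $tree = HTML::TreeBuilder->new_from_content(
    '<div> <strong>date: </strong> 19 july 2011 </div>'
);

# Print an indented outline of the parsed tree to STDOUT.
# Elements are shown as tags; text nodes are shown as quoted strings.
$tree->dump;
```

In the dump, the date should appear as a quoted string directly under the <div>, alongside the <strong> element rather than inside it, which is why asking the <strong> element for its text does not return it.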

The solution here is to get the parent element of the element you're interested in (which is, in this case, the <div>), and iterate across its content list. The text you're interested in will be in the plain text nodes, i.e. the elements in the list that are not references to HTML::Element objects.

#!/usr/bin/perl

use strict;
use warnings;

use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new;
$tree->parse(<<END_OF_HTML);
<div>    <strong>date: </strong>        19 july 2011 </div>
END_OF_HTML
$tree->eof;    # signal end of input so the parser finishes the tree

my $date;
for my $div ($tree->look_down( _tag => 'div' )) {
  for ($div->content_list) {
    # Plain text nodes are ordinary strings, not references
    $date = $_ unless ref;
  }
}

print "$date\n";
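If the page contains several <div> elements with their own <strong> labels, the same content-list idea can be narrowed to the one labelled as the date. The following is only a sketch under assumptions: the second "author" <div> is hypothetical input invented for illustration, and matching on the label text "date:" is one possible way to choose the right element.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::TreeBuilder;

# Hypothetical input: two divs, only one of which carries the date label.
my $tree = HTML::TreeBuilder->new_from_content(<<'END_OF_HTML');
<div> <strong>author: </strong> someone </div>
<div> <strong>date: </strong> 19 july 2011 </div>
END_OF_HTML

my $date;
for my $div ($tree->look_down(_tag => 'div')) {
    # Skip divs whose <strong> label is missing or is not "date:".
    my $label = $div->look_down(_tag => 'strong') or next;
    next unless $label->as_trimmed_text =~ /^date:/i;

    # Plain text children are ordinary strings, not HTML::Element references.
    ($date) = grep { !ref && /\S/ } $div->content_list;
    last;
}

if (defined $date) {
    $date =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
    print "$date\n";
}
```

Filtering with grep keeps only the non-reference children that contain something other than whitespace, so stray whitespace nodes around the <strong> element do not overwrite the result.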
