Parse HTML using Perl

- September 15, 2015

I have the following HTML:

<div>    <strong>date: </strong>        19 july 2011 </div>

I have been using HTML::TreeBuilder to parse out particular parts of the HTML using either tags or classes. However, the aforementioned HTML is giving me difficulty when I try to extract the date only.

For instance, I tried:

for ( $tree->look_down( '_tag' => 'div' ) ) {
    $date = $_->look_down( '_tag' => 'strong' )->as_trimmed_text;
}

But that seems to conflict with the earlier use of <strong>. I am looking to parse out just '19 july 2011'. I have read the documentation on TreeBuilder but cannot find a way of doing this.

How can I do this using TreeBuilder?

The "dump" method is invaluable for finding your way around an HTML::TreeBuilder object.
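As a minimal sketch of that advice, the snippet below builds a tree from the question's HTML and captures the outline that dump writes. The in-memory filehandle and the $outline variable are illustrative choices, not part of the original answer; calling $tree->dump with no argument prints straight to STDOUT instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new;
$tree->parse('<div> <strong>date: </strong> 19 july 2011 </div>');
$tree->eof;    # signal end of input so the tree is complete

# dump writes an indented outline of the tree to the given filehandle.
# Here the output is captured in a string ($outline is an illustrative name).
my $outline = '';
open my $fh, '>', \$outline or die "Cannot open in-memory filehandle: $!";
$tree->dump($fh);
close $fh;

print $outline;
```

In the outline, element nodes are shown as tags and text nodes as quoted strings, which makes it easy to see that the date is a text child of the <div> rather than of the <strong>.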

The solution here is to get the parent element of the element you're interested in (which is, in this case, the <div>), and then iterate across its content list. The text you're interested in will be in plain text nodes, i.e. the elements in the list that are not references to HTML::Element objects.

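#!/usr/bin/perl

use strict;
use warnings;

use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new;
$tree->parse(<<END_OF_HTML);
<div>    <strong>date: </strong>        19 july 2011 </div>
END_OF_HTML
$tree->eof;    # signal end of input

my $date;
for my $div ($tree->look_down( _tag => 'div')) {
  # content_list returns child elements (references) and plain text strings
  for ($div->content_list) {
    $date = $_ unless ref;    # keep only the non-reference (text) children
  }
}

print "$date\n";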
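The loop above keeps only the last text node and leaves any surrounding whitespace in place. A possible variation, sketched here under the assumption that you want all direct text children joined and trimmed, wraps the same idea in a helper. The name direct_text is hypothetical and is not part of HTML::TreeBuilder.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::TreeBuilder;

# Hypothetical helper: return the text that sits directly inside an element,
# ignoring text that belongs to child elements such as <strong>.
sub direct_text {
    my ($element) = @_;
    my @parts = grep { !ref } $element->content_list;   # plain strings only
    my $text  = join ' ', @parts;
    $text =~ s/^\s+|\s+$//g;                            # trim both ends
    return $text;
}

my $tree = HTML::TreeBuilder->new_from_content(
    '<div> <strong>date: </strong> 19 july 2011 </div>'
);

# In scalar context look_down returns the first matching element.
my $div  = $tree->look_down( _tag => 'div' );
my $date = direct_text($div);

print "$date\n";
```

Joining the parts is a design choice: if the <div> held text both before and after the <strong>, both pieces would be returned, so you may prefer to pick a single element of @parts instead.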
