perl - HTML::TokeParser - finding text between and after tags
I am trying to extract the following data from an HTML page using the code below, but my code fails:
Budget - $25,000,000
Gross (worldwide) - $58,500,000
    #!/usr/bin/perl
    use HTML::TokeParser;

    # Single-quoted heredoc, so "$25" and friends are not interpolated as variables.
    my $content = <<'HTML';
    <h5>Budget</h5>
    $25,000,000 (estimated)<br/>
    <br/>
    <h5>Opening weekend</h5>
    $727,327 (USA) (<a href="/date/09-25/">25 September</a> <a href="/year/1994/">1994</a>) (33 screens)<br/>
    <br/>
    <h5>Gross</h5>
    $28,341,469 (USA) ( 410,811 (Germany) ( 1,245,604 (Spain)
    [...]
    <h5>Filming Dates</h5>
    [...]
    HTML

    my $parser = HTML::TokeParser->new(\$content);
    while (my $token = $parser->get_tag("h5")) {
        my $text = $parser->get_text();
        last if $text =~ /budget/i;
    }

This is probably not the most beautiful solution, but it works.

    # Assumes Perl 5.14+ (for say and s///r) and the $content shown above.
    use v5.14;
    my $tp = HTML::TokeParser->new(\$content);

    while (my $token = $tp->get_tag("h5")) {
        my $heading = $tp->get_text();
        $tp->get_tag("/h5");
        # Keep only digits, commas and the dollar sign.
        my $amount = $tp->get_trimmed_text =~ s/[^\d,\$]//gr;
        next unless $amount =~ m/\d/;
        say qq{$heading - $amount};
    }

According to the documentation, you can call get_tag to advance to a given start tag. Once we are past the opening <h5>, we can grab the heading text and then skip over the closing </h5>. What follows is a text node, and that node holds the monetary value.
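The steps described above can be put together as a small self-contained script. This is only a sketch under a few assumptions: the sample HTML is a shortened, hypothetical stand-in for the real page, extract_amounts() is an illustrative helper and not part of HTML::TokeParser, and it captures the first dollar amount with a regex instead of using the strip-everything-else substitution from the answer, so that stray digits later in the same text node (such as a screen count) do not end up in the amount.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical, shortened sample input modelled on the HTML in the question.
my $content = <<'HTML';
<h5>Budget</h5>
$25,000,000 (estimated)<br/>
<br/>
<h5>Opening Weekend</h5>
$727,327 (USA) (33 Screens)<br/>
<br/>
<h5>Filming Dates</h5>
June 1993<br/>
HTML

# Illustrative helper (an assumption, not a library function):
# returns a list of [heading, amount] pairs.
sub extract_amounts {
    my ($html) = @_;
    my $tp = HTML::TokeParser->new( \$html );
    my @pairs;
    while ( $tp->get_tag('h5') ) {
        my $heading = $tp->get_trimmed_text('/h5');   # text before </h5>
        $tp->get_tag('/h5');                          # step past the end tag
        my $text = $tp->get_trimmed_text;             # text up to the next tag
        # Keep the first dollar amount; skip headings that have none.
        next unless $text =~ /(\$[\d,]*\d)/;
        push @pairs, [ $heading, $1 ];
    }
    return @pairs;
}

print "$_->[0] - $_->[1]\n" for extract_amounts($content);
```

Headings whose following text has no dollar amount (Filming Dates in this sample) are skipped, which is the same role the "next unless" check plays in the answer.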