python - Finding next occurring tag and its enclosed text with Beautiful Soup
I'm trying to parse the text between <blockquote> tags. When I type

    soup.blockquote.get_text()

I get the result I want for the first blockquote in the HTML file. How do I get the next and subsequent <blockquote> tags? Maybe I'm just tired and can't find it in the documentation.
Example HTML file:

    <html>
    <head>header</head>
    <blockquote>I can get this text</blockquote>
    <p>eiaoiefj</p>
    <blockquote>capture this next</blockquote>
    <p></p>
    <strong>don't capture this</strong>
    <blockquote>capture this too, but separately, after "capture this next"</blockquote>
    </html>
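Simple Python code:

    from bs4 import BeautifulSoup

    # Naming the parser explicitly avoids bs4's "no parser specified" warning.
    with open("example.html") as html_doc:
        soup = BeautifulSoup(html_doc, "html.parser")

    print(soup.blockquote.get_text())
    # How to get the next blockquote???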
Use find_next_sibling (if the tag is not a sibling, use find_next instead):

    >>> html = '''
    ... <html>
    ... <head>header
    ... </head>
    ... <blockquote>blah blah
    ... </blockquote>
    ... <p>eiaoiefj</p>
    ... <blockquote>capture this next
    ... </blockquote>
    ... <p></p><strong>don't capture this</strong>
    ... <blockquote>
    ... capture this too but separately after "capture this next"
    ... </blockquote>
    ... </html>
    ... '''
    >>> from bs4 import BeautifulSoup
    >>> soup = BeautifulSoup(html)
    >>> quote1 = soup.blockquote
    >>> quote1.text
    u'blah blah\n'
    >>> quote2 = quote1.find_next_sibling('blockquote')
    >>> quote2.text
    u'capture this next\n'

Note that find_next_sibling (singular) returns a single tag, whereas find_next_siblings (plural) returns a list of tags, which has no .text attribute.
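
To collect every blockquote rather than only the second one, the same calls can be repeated in a loop. The following is only a sketch, not part of the original answer: the sample markup and the helper names sibling_quotes and document_quotes are made up for illustration, and it assumes the built-in "html.parser" backend. It contrasts find_next_sibling (same nesting level only), find_next (anything later in the document) and find_all (everything at once).

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the third blockquote is nested inside a <div>,
# so it is NOT a sibling of the others.
HTML = """
<html><body>
<blockquote>first</blockquote>
<p>filler</p>
<blockquote>second</blockquote>
<div><blockquote>nested third</blockquote></div>
<blockquote>fourth</blockquote>
</body></html>
"""

soup = BeautifulSoup(HTML, "html.parser")


def sibling_quotes(first):
    """Collect text from `first` and each later <blockquote> sibling."""
    texts = []
    tag = first
    while tag is not None:
        texts.append(tag.get_text())
        # Returns None once no further sibling matches, which ends the loop.
        tag = tag.find_next_sibling("blockquote")
    return texts


def document_quotes(first):
    """Collect text from `first` and each later <blockquote> in document order."""
    texts = []
    tag = first
    while tag is not None:
        texts.append(tag.get_text())
        # find_next also descends into other elements, so nesting does not matter.
        tag = tag.find_next("blockquote")
    return texts


siblings = sibling_quotes(soup.blockquote)
in_order = document_quotes(soup.blockquote)
# If no stepping is needed, find_all gathers every match in one call.
everything = [bq.get_text() for bq in soup.find_all("blockquote")]

print(siblings)
print(in_order)
print(everything)
```

With this markup the sibling walk skips the blockquote inside the div, while find_next and find_all both include it. If you only want the texts and do not care about the position of each tag relative to the others, find_all is usually the simplest choice.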