python - BeautifulSoup4 parse everything except specific tags
I am using Python to parse some HTML. The problem is that I want to strip every tag and keep only the text of the document, except for specific tags, i.e. <ul> and <li>, which I want to keep. So I need a function parse_everything_but_lists which will have the following behavior:

    >>> parse_everything_but_lists("Hello this <ul><li>I</li><li><p>Dr Pablov</p></li></ul>")
    "Hello this <ul><li>I</li><li>Dr Pablov</li></ul>"

You can still use unwrap, just get a bit recursive:

    from bs4 import BeautifulSoup, Tag

    def unwrapper(tags, keep=('ul', 'li')):
        for el in list(tags):  # iterate over a copy, since unwrap() edits the tree in place
            if isinstance(el, Tag):
                unwrapper(el, keep)  # recurse first, unwrap later
                if el.name not in keep:
                    el.unwrap()

Demo:

    s = "Hello this <ul><li>I</li><li><p>Dr Pablov</p></li></ul>"
    soup = BeautifulSoup(s, 'html.parser')  # force html.parser to avoid lxml's automatic <html><body> wrapping
    unwrapper(soup)
    soup
    Out[63]: Hello this <ul><li>I</li><li>Dr Pablov</li></ul>

This approach should work on arbitrary nestings of any tags, i.e.

    s = '''<a><p><ul><c><li><d>Hello</d></li></c></ul></p></a>'''
    soup = BeautifulSoup(s, 'html.parser')
    unwrapper(soup)
    soup
    Out[19]: <ul><li>Hello</li></ul>
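The recursive unwrap idea can also be packaged as the parse_everything_but_lists function the question asks for. The following is only a minimal sketch, not the answerer's exact code: the inner helper name strip and the keep parameter are assumptions made for illustration, and it assumes the result should come back as a plain string.

```python
from bs4 import BeautifulSoup, Tag


def parse_everything_but_lists(html, keep=("ul", "li")):
    """Return html with every tag removed except those named in keep."""
    # html.parser does not add <html>/<body> wrappers the way lxml does.
    soup = BeautifulSoup(html, "html.parser")

    def strip(node):  # hypothetical helper name
        # Iterate over a copy: unwrap() edits node.contents in place.
        for el in list(node.contents):
            if isinstance(el, Tag):
                strip(el)  # handle descendants first
                if el.name not in keep:
                    el.unwrap()  # drop the tag, keep its children

    strip(soup)
    return str(soup)


if __name__ == "__main__":
    print(parse_everything_but_lists(
        "Hello this <ul><li>I</li><li><p>Dr Pablov</p></li></ul>"
    ))
```

Iterating over a copy of the children matters when an unwanted tag is empty: unwrapping it shrinks the parent's child list, and a live iteration could then skip the next sibling.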