python - BeautifulSoup4 parse everything except specific tags


I am using Python to parse some HTML. I want to remove everything from the document except the text and the <ul> and <li> tags, so I need a function parse_everything_but_lists with the following behavior:

    >>> parse_everything_but_lists("Hello this <ul><li>I</li><li><p>Dr</p> Pablov</li></ul>")
    "Hello this <ul><li>I</li><li>Dr Pablov</li></ul>"

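You can still use unwrap; you just need to get a bit recursive:

    from bs4 import BeautifulSoup, Tag

    def unwrapper(tags, keep=('ul', 'li')):
        # iterate over a copy, because unwrap() changes the child list
        for el in list(tags):
            if isinstance(el, Tag):
                unwrapper(el, keep)  # recurse first, unwrap later
                if el.name not in keep:
                    el.unwrap()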

demo:

    s = "Hello this <ul><li>I</li><li><p>Dr</p> Pablov</li></ul>"
    soup = BeautifulSoup(s, 'html.parser')  # force html.parser to avoid lxml's automatic insertion of <html><body>
    unwrapper(soup)
    soup
    Out[63]: Hello this <ul><li>I</li><li>Dr Pablov</li></ul>

This approach should work on arbitrary nestings of tags, e.g.:

    s = '''<a><p><ul><c><li><d>Hello</d></li></c></ul></p></a>'''
    soup = BeautifulSoup(s, 'html.parser')
    unwrapper(soup)
    soup
    Out[19]: <ul><li>Hello</li></ul>
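The question asks for a function that takes a string and returns a string, while the recursive helper modifies a soup in place. A minimal sketch of how the two could be tied together is below. The name parse_everything_but_lists comes from the question; passing `keep` through the wrapper and returning `str(soup)` are assumptions of this sketch, not something the answer specifies.

```python
from bs4 import BeautifulSoup, Tag


def unwrapper(tags, keep=("ul", "li")):
    """Recursively unwrap every tag whose name is not in `keep`."""
    # Iterate over a copy: unwrap() mutates the parent's child list.
    for el in list(tags):
        if isinstance(el, Tag):
            unwrapper(el, keep)  # clean the subtree first
            if el.name not in keep:
                el.unwrap()  # drop the tag itself, keep its children


def parse_everything_but_lists(html, keep=("ul", "li")):
    """Return `html` with all tags removed except those named in `keep`."""
    # html.parser does not add <html><body> wrappers the way lxml does.
    soup = BeautifulSoup(html, "html.parser")
    unwrapper(soup, keep)
    return str(soup)


print(parse_everything_but_lists(
    "Hello this <ul><li>I</li><li><p>Dr</p> Pablov</li></ul>"
))
# -> Hello this <ul><li>I</li><li>Dr Pablov</li></ul>
```

Looping over `list(tags)` rather than over the tag directly means that unwrapping one child cannot cause the loop to skip a sibling, for example when an empty tag is removed.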
