How to extract text within HTML lists using BeautifulSoup in Python


I am trying to write a Python program that extracts the text between the list items in some HTML. I want to pull out details such as the hardcover page count of a book. Does anyone know the command for this?

    <h2>Product Details</h2>
    <div class="content">
    <ul>
    <li><b>Hardcover:</b> 156 pages</li>
    <li><b>Publisher:</b> Insight Editions; Har/PST edition (June 18, 2013)</li>
    <li><b>Language:</b> English</li>
    <li><b>ISBN-10:</b> 1608871827</li>
    <li><b>ISBN-13:</b> 978-1608871827</li>

For other information I used:

    for definition in soup.findAll('time', {'class': 'bb_price'}):
        definition = definition.renderContents()

But this does not work in this case.

Find the b tag by its text and get its next_sibling.

Working example:

    from bs4 import BeautifulSoup

    data = """
    <h2>Product Details</h2>
    <div class="content">
    <ul>
    <li><b>Hardcover:</b> 156 pages</li>
    <li><b>Publisher:</b> Insight Editions; Har/PST edition (June 18, 2013)</li>
    <li><b>Language:</b> English</li>
    <li><b>ISBN-10:</b> 1608871827</li>
    <li><b>ISBN-13:</b> 978-1608871827</li>
    </ul>
    </div>
    """

    soup = BeautifulSoup(data, "html.parser")
    # Find each <b> label by its exact text; the value is the text node right after it.
    print(soup.find('b', string='Hardcover:').next_sibling.strip())
    print(soup.find('b', string='Publisher:').next_sibling.strip())

Prints:

    156 pages
    Insight Editions; Har/PST edition (June 18, 2013)
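If you want every field rather than looking them up one label at a time, the same next_sibling idea can be applied to each list item in a loop. This is a minimal sketch, assuming every detail sits in an <li> inside <div class="content"> with its label in a leading <b> tag; the function name product_details and the DATA sample (a subset of the question's markup) are hypothetical, not part of the original answer. It uses soup.select, which relies on the soupsieve package that ships with current BeautifulSoup 4 releases.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: a subset of the "Product Details" markup from the question.
DATA = """
<h2>Product Details</h2>
<div class="content">
<ul>
<li><b>Hardcover:</b> 156 pages</li>
<li><b>Language:</b> English</li>
<li><b>ISBN-10:</b> 1608871827</li>
<li><b>ISBN-13:</b> 978-1608871827</li>
</ul>
</div>
"""


def product_details(html):
    """Map each <b> label (without its trailing colon) to the text after it."""
    soup = BeautifulSoup(html, "html.parser")
    details = {}
    # Restrict the search to list items inside <div class="content">.
    for li in soup.select("div.content li"):
        label_tag = li.find("b")
        if label_tag is None:
            continue  # skip items that have no bold label
        label = label_tag.get_text(strip=True).rstrip(":")
        # Join everything after the <b> tag in this <li>: text nodes as-is,
        # nested tags via their text content.
        value = "".join(
            s if isinstance(s, str) else s.get_text()
            for s in label_tag.next_siblings
        ).strip()
        details[label] = value
    return details


if __name__ == "__main__":
    details = product_details(DATA)
    print(details)
    # Assumes the hardcover value always starts with the page count.
    pages = int(details["Hardcover"].split()[0])
    print(pages)  # 156
```

Collecting everything into a dict means a missing field shows up as a KeyError (or use details.get) instead of an AttributeError on None, which is what soup.find(...).next_sibling raises when a label is absent.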
