How to extract text within HTML lists using BeautifulSoup in Python
I am trying to write a Python program that can extract the text inside list items in HTML. I want to extract information such as the book's hardcover page count and its publisher. Does anyone know the command for this operation?

    <h2>Product Details</h2>
    <div class="content">
    <ul>
    <li><b>Hardcover:</b> 156 pages</li>
    <li><b>Publisher:</b> Insight Editions; Har/Pstr edition (June 18, 2013)</li>
    <li><b>Language:</b> English</li>
    <li><b>ISBN-10:</b> 1608871827</li>
    <li><b>ISBN-13:</b> 978-1608871827</li>
    </ul>
    </div>

For other information I used:

    for Definition in soup.findAll('time', {"class": 'bb_price'}):
        Definition = Definition.renderContents()

But this does not work for this situation, because the values I need are bare text inside each li, not elements with a class of their own.

Find the b tag by its text and get its next_sibling. Working example:

    from bs4 import BeautifulSoup

    data = """
    <h2>Product Details</h2>
    <div class="content">
    <ul>
    <li><b>Hardcover:</b> 156 pages</li>
    <li><b>Publisher:</b> Insight Editions; Har/Pstr edition (June 18, 2013)</li>
    <li><b>Language:</b> English</li>
    <li><b>ISBN-10:</b> 1608871827</li>
    <li><b>ISBN-13:</b> 978-1608871827</li>
    </ul>
    </div>
    """

    # Name the parser explicitly to avoid a warning
    soup = BeautifulSoup(data, "html.parser")

    # The value is the text node that immediately follows the <b> label
    print(soup.find('b', text='Hardcover:').next_sibling)
    print(soup.find('b', text='Publisher:').next_sibling)

Prints:

     156 pages
     Insight Editions; Har/Pstr edition (June 18, 2013)
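The sibling lookup from the answer (find the b tag by its text, then take next_sibling) can be generalized to collect every label/value pair in one pass. The following is only a minimal sketch, not part of the original answer: the helper name extract_details, the CSS selector, and the shortened sample markup are all assumptions made for illustration, and it assumes each label sits in a b tag directly inside an li.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the snippet in the question (assumption:
# labels are in <b> tags and each value is the text node right after).
HTML = """
<h2>Product Details</h2>
<div class="content">
<ul>
<li><b>Hardcover:</b> 156 pages</li>
<li><b>Language:</b> English</li>
<li><b>ISBN-10:</b> 1608871827</li>
</ul>
</div>
"""


def extract_details(html):
    """Hypothetical helper: map each bold label (minus the colon) to its value."""
    soup = BeautifulSoup(html, "html.parser")
    details = {}
    for label in soup.select("div.content li > b"):
        value = label.next_sibling
        # next_sibling may be None or another tag; keep only text nodes.
        if isinstance(value, str):
            key = label.get_text(strip=True).rstrip(":")
            details[key] = value.strip()
    return details


if __name__ == "__main__":
    for key, value in extract_details(HTML).items():
        print(f"{key}: {value}")
```

Returning a dictionary avoids one find call per field, and stripping the value removes the leading space that next_sibling carries over from the markup.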