How to loop Python to read a set of HTML files and dump into JSON
I have a program that extracts some variables from a group of 20 HTML files. Can someone advise me how to loop the program so that it reads all the HTML files in a directory and prints the information to individual JSON documents?

    from bs4 import BeautifulSoup

    # open the data file
    get_data = open("Book1.html", 'r').read()

    # parse the HTML (parser named explicitly to avoid the bs4 warning)
    soup = BeautifulSoup(get_data, "html.parser")

    # find title and author
    title = soup.find("span", id="btAsinTitle")
    author = title.find_next("a", href=True)

    # find the price
    for definition in soup.find_all('span', {"class": 'bb_price'}):
        definition = definition.renderContents()

    # find ISBN and shipping weight
    print(soup.find('b', text='ISBN-10:').next_sibling)
    print(soup.find('b', text='Shipping Weight:').next_sibling)

    # print all information
    print(definition)
    print(title.get_text())
    print(author.get_text())

To loop through all of the HTML files in a directory, you can use glob. For each file name, pass the open file object to the BeautifulSoup constructor, get the necessary elements, and create a dictionary:

    import glob
    from bs4 import BeautifulSoup

    for filename in glob.iglob('*.html'):
        with open(filename) as f:
            soup = BeautifulSoup(f, "html.parser")

            title = soup.find("span", id="btAsinTitle")
            author = title.find_next("a", href=True)
            isbn = soup.find('b', text='ISBN-10:').next_sibling
            weight = soup.find('b', text='Shipping Weight:').next_sibling

            print({'title': title.get_text(),
                   'author': author.get_text(),
                   'isbn': isbn,
                   'weight': weight})
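The answer stops at printing the dictionary, while the question asks for one JSON document per HTML file. A minimal sketch of that last step follows, using only the standard library. The function names (`json_path_for`, `write_json`, `dump_directory`) and the `extract` callable are hypothetical, not part of the original answer. `extract` stands in for whatever function wraps the BeautifulSoup lookups and returns the dictionary. The sketch also assumes every value can be stored as a stripped string.

```python
import glob
import json
import os


def json_path_for(html_path):
    """Map an HTML file name to a sibling JSON name (book1.html -> book1.json)."""
    base, _ext = os.path.splitext(html_path)
    return base + ".json"


def write_json(record, html_path):
    """Write one dictionary of extracted fields to its own JSON document."""
    # Assumption: values are text-like (e.g. BeautifulSoup strings), so each
    # one is converted to a plain str and stripped; None is kept as null.
    cleaned = {key: (None if value is None else str(value).strip())
               for key, value in record.items()}
    out_path = json_path_for(html_path)
    with open(out_path, "w") as out:
        json.dump(cleaned, out, indent=2)
    return out_path


def dump_directory(directory, extract):
    """Call extract(path) for every .html file and save each result as JSON."""
    written = []
    for html_path in sorted(glob.glob(os.path.join(directory, "*.html"))):
        written.append(write_json(extract(html_path), html_path))
    return written
```

With the lookups from the loop wrapped in a function that takes a file name and returns the dictionary, a call such as `dump_directory(".", extract)` would leave a `.json` file next to each `.html` file.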