How to loop a Python program to read a set of HTML files and dump the results into JSON

March 15, 2012

I have a program that pulls a few variables out of one HTML file, and I have a batch of 20 such files. Can someone advise me how to loop the program so that it reads all the HTML files in a directory and writes the information to individual JSON documents?

    from bs4 import BeautifulSoup

    # open the data file
    get_data = open("Book1.html", 'r').read()

    # parse the HTML
    soup = BeautifulSoup(get_data)

    # find title and author
    title = soup.find("span", id="btAsinTitle")
    author = title.find_next("a", href=True)

    # find price
    for definition in soup.findAll('span', {"class": 'bb_price'}):
        definition = definition.renderContents()

    # find ISBN and shipping weight
    print(soup.find('b', text='ISBN-10:').next_sibling)
    print(soup.find('b', text='Shipping Weight:').next_sibling)

    # print all the information
    print(definition)
    print(title.get_text())
    print(author.get_text())

To loop over all the HTML files in the directory you can use glob.iglob. For each file name, pass the open file object to the BeautifulSoup constructor, get the necessary elements and build a dictionary:

    import glob
    from bs4 import BeautifulSoup

    for filename in glob.iglob('*.html'):
        with open(filename) as f:
            soup = BeautifulSoup(f)

            title = soup.find("span", id="btAsinTitle")
            author = title.find_next("a", href=True)
            isbn = soup.find('b', text='ISBN-10:').next_sibling
            weight = soup.find('b', text='Shipping Weight:').next_sibling

            # one dictionary per HTML file
            print({'title': title.get_text(),
                   'author': author.get_text(),
                   'isbn': isbn,
                   'weight': weight})
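The question asks for one JSON document per HTML file, which the printed dictionary does not yet provide. A minimal sketch of that last step follows, assuming Python 3 and the standard json module. The function names dump_html_dir_to_json and extract_book_fields are hypothetical, as is the choice to name each output file after its input (Book1.html becomes Book1.json); the selectors are the ones from the post, and the "html.parser" argument is an assumption about which parser to use.

```python
import glob
import json
import os


def extract_book_fields(html):
    """Pull title, author, ISBN and weight out of one page (selectors from the post)."""
    # Imported here so the loop helper below can be reused without bs4 installed.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("span", id="btAsinTitle")
    author = title.find_next("a", href=True)
    isbn = soup.find("b", text="ISBN-10:").next_sibling
    weight = soup.find("b", text="Shipping Weight:").next_sibling
    return {
        "title": title.get_text(),
        "author": author.get_text(),
        # next_sibling is a bs4 string object; store plain, trimmed text
        "isbn": str(isbn).strip(),
        "weight": str(weight).strip(),
    }


def dump_html_dir_to_json(directory, extract):
    """Write one <name>.json per <name>.html in `directory`; return the paths written."""
    written = []
    for html_path in sorted(glob.glob(os.path.join(directory, "*.html"))):
        with open(html_path, encoding="utf-8") as f:
            record = extract(f.read())
        # Hypothetical naming scheme: same base name, .json extension.
        json_path = os.path.splitext(html_path)[0] + ".json"
        with open(json_path, "w", encoding="utf-8") as out:
            json.dump(record, out, indent=2)
        written.append(json_path)
    return written
```

Calling dump_html_dir_to_json(".", extract_book_fields) would process every .html file in the current directory. If a page lacks one of the expected elements, soup.find returns None and the extractor raises AttributeError, so real input may need a check before each attribute access.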

