python - How to parse a remote document?
Please help me parse a document fetched from the Internet.
    import pprint
    import xml.dom.minidom
    from xml.dom.minidom import Node
    import requests

    addr = requests.get('http://fh79272k.bget.ru/py_test/books.xml')
    print(addr.status_code)
    doc = xml.dom.minidom.parse(str(addr))         # load doc into object
                                                   # usually parsed up front
    mapping = {}
    for node in doc.getElementsByTagName("book"):  # traverse DOM object
        isbn = node.getAttribute("isbn")           # via DOM object API
        L = node.getElementsByTagName("title")
        for node2 in L:
            title = ""
            for node3 in node2.childNodes:
                if node3.nodeType == Node.TEXT_NODE:
                    title += node3.data
            mapping[isbn] = title

    # mapping now has the same value as in the SAX example
    pprint.pprint(mapping)

This script does not work. The error message is:
    Traceback (most recent call last):
      File "C:\Winters\Openers\Open Servers\Domain\Localhost\python\parse_html\1\dombook.py", line 14, in <module>
        doc = xml.dom.minidom.parse(str(addr))  # load doc into object
      File "C:\Python33\lib\xml\dom\minidom.py", line 1960, in parse
        return expatbuilder.parse(file)
      File "C:\Python33\lib\xml\dom\expatbuilder.py", line 908, in parse
        fp = open(file, 'rb')
    OSError: [Errno 22] Invalid argument: ''
XML:
    <catalog>
      <book isbn="0-596-00128-2">
        <title>Python &amp; XML</title>
        <date>December 2001</date>
        <author>Jones, Drake</author>
      </book>
      <book isbn="0-596-15810-6">
        <title>Programming Python, 4th Edition</title>
        <date>October 2010</date>
        <author>Lutz</author>
      </book>
      <book isbn="0-596-15806-8">
        <title>Learning Python, 4th Edition</title>
        <date>September 2009</date>
        <author>Lutz</author>
      </book>
      <book isbn="0-596-15808-4">
        <title>Python Pocket Reference, 4th Edition</title>
        <date>October 2009</date>
        <author>Lutz</author>
      </book>
      <book isbn="0-596-00797-3">
        <title>Python Cookbook, 2nd Edition</title>
        <date>March 2005</date>
        <author>Martelli, Ravenscroft, Ascher</author>
      </book>
      <book isbn="0-596-10046-9">
        <title>Python in a Nutshell, 2nd Edition</title>
        <date>July 2006</date>
        <author>Martelli</author>
      </book>
      <!-- plus many more Python books that should appear here -->
    </catalog>
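You are building the XML document from the response object, not from its text. str(addr) only gives the object's printable representation, which minidom then tries to open as a file name (see the open(file, 'rb') call in the traceback). Use addr.text instead of str(addr). Because xml.dom.minidom.parse() expects a file name or a file object, pass the text to parseString():

    doc = xml.dom.minidom.parseString(addr.text)

Also, don't try to use an XML parser to handle HTML; it will get upset.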
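For reference, here is a minimal sketch of the fix described above, assuming the response body is well-formed XML shaped like the catalog in the question. The helper name titles_by_isbn and the inline SAMPLE string are hypothetical, added for illustration only. SAMPLE stands in for the downloaded text so the snippet runs without network access.

```python
import xml.dom.minidom
from xml.dom.minidom import Node


def titles_by_isbn(xml_text):
    """Map each <book> element's isbn attribute to its <title> text.

    Hypothetical helper: the name and signature are not from the question.
    """
    # parseString() parses XML held in memory; parse() would treat its
    # argument as a file name or file object instead.
    doc = xml.dom.minidom.parseString(xml_text)
    mapping = {}
    for book in doc.getElementsByTagName("book"):
        isbn = book.getAttribute("isbn")
        for title_node in book.getElementsByTagName("title"):
            # Concatenate only the text children of <title>.
            title = "".join(
                child.data
                for child in title_node.childNodes
                if child.nodeType == Node.TEXT_NODE
            )
            mapping[isbn] = title
    return mapping


# Assumed stand-in for the downloaded body (two books from the question's XML).
SAMPLE = """<catalog>
  <book isbn="0-596-00128-2">
    <title>Python &amp; XML</title>
  </book>
  <book isbn="0-596-15810-6">
    <title>Programming Python, 4th Edition</title>
  </book>
</catalog>"""

result = titles_by_isbn(SAMPLE)
print(result)

# With a live download the call would look roughly like this (not run here):
#     import requests
#     addr = requests.get('http://fh79272k.bget.ru/py_test/books.xml')
#     addr.raise_for_status()
#     mapping = titles_by_isbn(addr.text)
```

If the server reports the wrong character encoding, passing addr.content (the raw bytes) to parseString() lets the XML parser use the encoding declared in the document itself.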