java - Nested HTML not being parsed by Jsoup
I am trying to parse a page with Jsoup, but the HTML is not parsed correctly.
The general structure is:
    <html>
      <head> ... </head>
      <frameset ...>
        <frame ...>
          #document
          <html> ... </html>
        </frame>
      </frameset>
    </html>

When I parse the HTML and print it with

    Document doc = Jsoup.parse(html);
    System.out.println(doc.html());

only the outer HTML is printed (including #document, but not the frame or the inner HTML). Does anyone know how to get the inner HTML with Jsoup, or should I consider using a different library? Thank you.
EDIT: Here is the site I am parsing. I have a subscription; I don't know whether anyone without one can get into it.
After authentication, it takes you to:
Edit 2:
Then I run:
    Document doc = Jsoup.parse(html);
    Elements elems = doc.select("frameset > frame:last-child");
    // print(elems);
    switch (elems.size()) {
        case 0:
            break;
        case 1:
            doc = Jsoup.connect(elems.first().attr("src")).get();
            break;
        default:
            break;
    }
    System.out.println(doc.html());

Printed HTML (doc.html()):

    <html>
      <head></head>
      <body>
        &iuml;&raquo;&iquest; #document Hello Hello again
      </body>
    </html>

So the <frameset> is not even being found. Does anyone have ideas?
A way to parse the nested HTML:

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    // Frameset document
    Document doc = Jsoup.connect("http://database.asahi.com/library/2/login/login.php").get();

    // Login, password, etc. ...

    // Select the frame whose URL you want to parse
    // Note: I assume you want to parse the content of the first frame
    Elements elts = doc.select("frameset > frame:first-child");

    switch (elts.size()) {
        case 0:
            // No frame found ...
            break;
        case 1:
            Element frameElt = elts.first();
            Document frameDoc = Jsoup.connect(frameElt.attr("src")).get();
            // Add the frameDoc nodes to doc (frameElt#insertChildren)
            frameElt.insertChildren(0, frameDoc.childNodes());
            break;
        default:
            // Strange result ...
    }

    System.out.println(doc.html());
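One thing to watch in the snippet above: attr("src") returns the attribute exactly as written in the page, so if the frame uses a relative src, Jsoup.connect will reject it as a malformed URL. Jsoup has absUrl("src") for this, which should work when the document was fetched with Jsoup.connect (so its base URI is known). Below is a minimal sketch of the same resolution done with plain java.net.URI, in case the HTML was parsed from a string without a base URI. The class name, method name, and example.com URLs are made up for illustration and are not from the question.

```java
import java.net.URI;

class FrameUrlResolver {

    /**
     * Resolves a frame's raw src attribute against the URL of the page
     * that contains the frameset. Absolute src values are returned unchanged.
     */
    static String resolveFrameSrc(String pageUrl, String frameSrc) {
        return URI.create(pageUrl).resolve(frameSrc).toString();
    }

    public static void main(String[] args) {
        // Hypothetical page URL, used only for illustration
        String pageUrl = "http://example.com/library/login/login.php";

        // Relative to the page's directory
        System.out.println(resolveFrameSrc(pageUrl, "main.php"));
        // Relative to the site root
        System.out.println(resolveFrameSrc(pageUrl, "/top.html"));
        // Already absolute
        System.out.println(resolveFrameSrc(pageUrl, "http://other.example.com/a.html"));
    }
}
```

The resolved string can then be passed to Jsoup.connect(...) in place of the raw frameElt.attr("src") value.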