python - BeautifulSoup4 parse everything except specific tags

May 15, 2013

I am using Python to parse some HTML. I want to remove every tag from the document except <ul> and <li>, keeping the text that the removed tags contained. So I need a function parse_everything_but_lists with the following behavior:

    >>> parse_everything_but_lists("Hello this <ul><li>I</li><li><p>Dr</p> Pablov</li></ul>")
    "Hello this <ul><li>I</li><li>Dr Pablov</li></ul>"

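  You can still use unwrap, you just have to get a bit recursive:

    from bs4 import Tag

    def unwrapper(tags, keep=('ul', 'li')):
        # Iterate over a copy, because unwrap() modifies the parent's children.
        for el in list(tags):
            if isinstance(el, Tag):
                unwrapper(el, keep)  # recurse first, unwrap later
                if el.name not in keep:
                    el.unwrap()  # replace the tag with its contents
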
 demo:

    from bs4 import BeautifulSoup

    s = '''Hello this <ul><li>I</li><li><p>Dr</p> Pablov</li></ul>'''
    soup = BeautifulSoup(s, 'html.parser')  # force html.parser to avoid lxml's automatic inclusion of <html><body>
    unwrapper(soup)
    soup
    Out[63]: Hello this <ul><li>I</li><li>Dr Pablov</li></ul>

  This approach should work on any arbitrary nesting of tags, i.e.

    s = '''<a><p><ul><c><li><d>Hello</d></li></c></ul></p></a>'''
    soup = BeautifulSoup(s, 'html.parser')
    unwrapper(soup)
    soup
    Out[19]: <ul><li>Hello</li></ul>
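
The question asks for a function that takes a string and returns a string, while the answer modifies a soup object in place. A minimal sketch of how the two might be joined is shown here. The helper name unwrap_except and the keep parameter on parse_everything_but_lists are assumptions for illustration, not part of the original answer, and the recursion loops over a copy of the child list because unwrap() changes that list.

```python
from bs4 import BeautifulSoup, Tag


def unwrap_except(node, keep=("ul", "li")):
    """Recursively unwrap every tag under `node` whose name is not in `keep`.

    `unwrap_except` is a hypothetical helper name used for this sketch.
    """
    # Loop over a snapshot: unwrap() mutates node.contents during the loop.
    for el in list(node.contents):
        if isinstance(el, Tag):
            unwrap_except(el, keep)  # handle descendants first
            if el.name not in keep:
                el.unwrap()  # replace the tag with its own children


def parse_everything_but_lists(html, keep=("ul", "li")):
    """Return `html` as a string with all tags removed except those in `keep`."""
    # html.parser leaves fragments alone; lxml would add <html><body> around them.
    soup = BeautifulSoup(html, "html.parser")
    unwrap_except(soup, keep)
    return str(soup)


if __name__ == "__main__":
    print(parse_everything_but_lists(
        "Hello this <ul><li>I</li><li><p>Dr</p> Pablov</li></ul>"
    ))
    # Hello this <ul><li>I</li><li>Dr Pablov</li></ul>
```

Passing a different keep tuple, for example keep=("p",), would preserve paragraphs instead of lists, assuming the tag names are given in lowercase as the parser reports them.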

 


















