How to extract text within HTML lists using BeautifulSoup in Python

August 15, 2015

I am trying to write a Python program that extracts the text inside the list items of an HTML page. I want to pull out details such as a book's hardcover page count and its publisher. Does anyone know the command for this operation?

    <h2>Product Details</h2>
    <div class="content">
    <ul>
    <li><b>Hardcover:</b> 156 pages</li>
    <li><b>Publisher:</b> Insight Editions; Har/Pstr edition (June 18, 2013)</li>
    <li><b>Language:</b> English</li>
    <li><b>ISBN-10:</b> 1608871827</li>
    <li><b>ISBN-13:</b> 978-1608871827</li>

For other information I used:

    for definition in soup.findAll('span', {"class": 'bb_price'}):
        definition = definition.renderContents()

But this does not work in this situation.

Find the b tag by its text and take its next_sibling.

Working example:

    from bs4 import BeautifulSoup

    data = """
    <h2>Product Details</h2>
    <div class="content">
    <ul>
    <li><b>Hardcover:</b> 156 pages</li>
    <li><b>Publisher:</b> Insight Editions; Har/Pstr edition (June 18, 2013)</li>
    <li><b>Language:</b> English</li>
    <li><b>ISBN-10:</b> 1608871827</li>
    <li><b>ISBN-13:</b> 978-1608871827</li>
    </ul>
    </div>
    """

    # Name the parser explicitly so the result does not depend on what is installed
    soup = BeautifulSoup(data, 'html.parser')

    # next_sibling is the text node that follows the matching <b> tag
    print(soup.find('b', string='Hardcover:').next_sibling)
    print(soup.find('b', string='Publisher:').next_sibling)

Prints:

     156 pages
     Insight Editions; Har/Pstr edition (June 18, 2013)
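The same lookup can be repeated for every entry in the list instead of naming each label by hand. The following is only a minimal sketch, assuming each list item holds exactly one b label followed directly by a plain text node, as in the sample markup. The helper name `product_details` and the shortened `HTML` sample are made up for illustration, and `html.parser` is used only because it ships with Python.

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample markup (illustrative only)
HTML = """
<h2>Product Details</h2>
<div class="content">
  <ul>
    <li><b>Hardcover:</b> 156 pages</li>
    <li><b>Language:</b> English</li>
    <li><b>ISBN-10:</b> 1608871827</li>
    <li><b>ISBN-13:</b> 978-1608871827</li>
  </ul>
</div>
"""


def product_details(html):
    """Map each <b> label inside a list item to the text that follows it."""
    soup = BeautifulSoup(html, "html.parser")
    details = {}
    for item in soup.select("div.content li"):
        label = item.find("b")
        if label is None:
            continue  # skip list items that have no bold label
        # Drop the trailing colon so "Hardcover:" becomes the key "Hardcover"
        key = label.get_text(strip=True).rstrip(":")
        # next_sibling is the node right after the <b> tag (here, a text node)
        value = label.next_sibling
        details[key] = str(value).strip() if value is not None else ""
    return details


details = product_details(HTML)
print(details["Hardcover"])
print(details["ISBN-13"])
```

Collecting the pairs into a dictionary means a missing field shows up as a missing key rather than as an AttributeError from calling next_sibling on None, which is what the one-line find() version raises when a label is absent.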

 


















