python - scrapy crawler not working from home page -

July 15, 2013

I have a crawler written in Scrapy that is trying to collect items. The crawler is:

    from scrapy.contrib.spiders import CrawlSpider, Rule
    from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
    from scrapy.selector import Selector
    from .. import items

    class GinakSpider(CrawlSpider):
        name = "ginak"
        start_urls = [
            "http://www.shop.ginakdesigns.com/main.sc"
        ]
        rules = [
            Rule(SgmlLinkExtractor(allow=[r'category\.sc\?categoryId=\d+'])),
            Rule(SgmlLinkExtractor(allow=[r'product\.sc\?productId=\d+&categoryId=\d+']), callback='parse_item'),
        ]

        def parse_item(self, response):
            sel = Selector(response)
            self.log(response.url)
            item = items.GinakItem()
            item['name'] = sel.xpath('//*[@id="wrapper2"]/div/div/div[1]/div/div/div[2]/div/div/div[1]/div[1]/h2/text()').extract()
            item['price'] = sel.xpath('//*[@id="listprice"]/text()').extract()
            # The field name on the next line was lost in the source; 'description' is an assumption.
            item['description'] = sel.xpath('//*[@id="wrapper2"]/div/div/div[1]/div/div/div[2]/div/div/div[1]/div[4]/div/p/text()').extract()
            item['category'] = sel.xpath('//*[@id="breadcrumbs"]/a[2]/text()').extract()
            return item

However, it does not follow any links beyond the home page. I have tried all kinds of things and have also checked my regular expressions for SgmlLinkExtractor. Is there anything wrong here?
 
The problem is that a jsessionid is included in the links you want to extract. For example:

    <a href="/category.sc;jsessionid=EA2CAA7A3949F4E462BBF466E03755B7.m1plqscsfapp05?categoryId=16">

Your patterns require the "?" to come immediately after ".sc", so they never match these links. Relax them to allow anything between ".sc" and the query parameters:

    rules = [
        Rule(SgmlLinkExtractor(allow=[r'category\.sc.*?categoryId=\d+']), callback='parse_item'),
        Rule(SgmlLinkExtractor(allow=[r'product\.sc.*?productId=\d+&categoryId=\d+']), callback='parse_item'),
    ]

Hope that helps.
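To see why the original pattern misses these links, here is a minimal sketch that uses only Python's re module. It assumes the link extractor applies each allow pattern to the URL with search semantics. The helper name matches and the variable names are illustrative and not part of Scrapy.

```python
import re

# Link taken from the answer's example: the session id sits between
# ".sc" and the "?" that starts the query string.
url = "/category.sc;jsessionid=EA2CAA7A3949F4E462BBF466E03755B7.m1plqscsfapp05?categoryId=16"

# Pattern from the question: "?" must follow ".sc" directly.
strict = re.compile(r'category\.sc\?categoryId=\d+')
# Pattern from the answer: ".*?" lazily skips whatever lies in between.
relaxed = re.compile(r'category\.sc.*?categoryId=\d+')


def matches(pattern, link):
    """Return True if the pattern is found anywhere in the link (hypothetical helper)."""
    return pattern.search(link) is not None


strict_hit = matches(strict, url)    # session id blocks the match
relaxed_hit = matches(relaxed, url)  # session id is skipped
clean_hit = matches(relaxed, "/category.sc?categoryId=16")  # plain links still match

print(strict_hit, relaxed_hit, clean_hit)
```

The relaxed pattern keeps matching links without a session id, so it should be safe whether or not the site appends one.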

 


















