python - Scrapy crawler not working from home page


I have a scraper written with scrapy.contrib that is trying to collect items. The spider is:

    from scrapy.contrib.spiders import CrawlSpider, Rule
    from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
    from scrapy.selector import Selector
    from .. import items


    class GinakSpider(CrawlSpider):
        name = "ginak"
        start_urls = ["http://www.shop.ginakdesigns.com/main.sc"]
        rules = [
            # Follow category pages (no callback, so links are followed).
            Rule(SgmlLinkExtractor(allow=[r'category\.sc\?categoryId=\d+'])),
            # Parse product pages.
            Rule(SgmlLinkExtractor(allow=[r'product\.sc\?productId=\d+&categoryId=\d+']),
                 callback='parse_item'),
        ]

        def parse_item(self, response):
            sel = Selector(response)
            self.log(response.url)
            item = items.GinakItem()
            item['name'] = sel.xpath('//*[@id="wrapper2"]/div/div/div[1]/div/div/div[2]/div/div/div[1]/div[1]/h2/text()').extract()
            item['price'] = sel.xpath('//*[@id="listprice"]/text()').extract()
            # The field name on the next line is missing in the source; 'description' is assumed.
            item['description'] = sel.xpath('//*[@id="wrapper2"]/div/div/div[1]/div/div/div[2]/div/div/div[1]/div[4]/div/p/text()').extract()
            item['category'] = sel.xpath('//*[@id="breadcrumbs"]/a[2]/text()').extract()
            return item

However, it does not follow any links beyond the home page. I have tried all kinds of things and have also checked my regular expressions for SgmlLinkExtractor. Is there anything wrong here?

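The problem is that the links you want to extract include a jsessionid. For example:

    <a href="/category.sc;jsessionid=EA2CAA7A3949F4E462BBF466E03755B7.m1plqscsfapp05?categoryId=16">

The ;jsessionid=... segment sits between category.sc and the query string, so a pattern that expects \? immediately after category.sc never matches. Replacing \? with .*? allows for it: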

    rules = [
        Rule(SgmlLinkExtractor(allow=[r'category\.sc.*?categoryId=\d+'])),
        Rule(SgmlLinkExtractor(allow=[r'product\.sc.*?productId=\d+&categoryId=\d+']),
             callback='parse_item'),
    ]
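
To check the patterns without running a crawl, they can be tried against the example href using only Python's re module. This is a minimal sketch, not part of the spider: the matches helper and the product URL are hypothetical, and it assumes the link extractor tests its allow patterns with a regex search rather than a full match.

```python
import re

# Category href quoted in the answer (session id included).
category_href = ("/category.sc;jsessionid="
                 "EA2CAA7A3949F4E462BBF466E03755B7.m1plqscsfapp05?categoryId=16")
# Hypothetical product href, made up for illustration only.
product_href = "/product.sc;jsessionid=ABC123?productId=1&categoryId=16"

# Pattern from the question: requires '?' right after 'category.sc'.
strict_category = re.compile(r'category\.sc\?categoryId=\d+')
# Patterns from the answer: allow anything between '.sc' and the parameter.
relaxed_category = re.compile(r'category\.sc.*?categoryId=\d+')
relaxed_product = re.compile(r'product\.sc.*?productId=\d+&categoryId=\d+')


def matches(pattern, url):
    """Return True if the pattern is found anywhere in the URL."""
    return pattern.search(url) is not None


print(matches(strict_category, category_href))    # False: ';jsessionid=...' is in the way
print(matches(relaxed_category, category_href))   # True
print(matches(relaxed_category, "/category.sc?categoryId=16"))  # True: still matches without a session id
print(matches(relaxed_product, product_href))     # True
```

The relaxed patterns accept links both with and without the session id, so the rules keep working if the site stops appending it.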

Hope that helps.
