python - Scrapy crawler not working from home page
I have a scraper written in Scrapy that is trying to collect items. The crawler is:

    from scrapy.contrib.spiders import CrawlSpider, Rule
    from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
    from scrapy.selector import Selector
    from .. import items

    class GinakSpider(CrawlSpider):
        name = "ginak"
        start_urls = [
            "http://www.shop.ginakdesigns.com/main.sc"
        ]
        rules = [
            # Follow category pages (no callback, so links on them are followed).
            Rule(SgmlLinkExtractor(allow=[r'category\.sc\?categoryId=\d+'])),
            # Parse product pages.
            Rule(SgmlLinkExtractor(allow=[r'product\.sc\?productId=\d+&amp;categoryId=\d+']), callback='parse_item'),
        ]

        def parse_item(self, response):
            sel = Selector(response)
            self.log(response.url)
            item = items.GinakItem()
            item['name'] = sel.xpath('//*[@id="wrapper2"]/div/div/div[1]/div/div/div[2]/div/div/div[1]/div[1]/h2/text()').extract()
            item['price'] = sel.xpath('//*[@id="listprice"]/text()').extract()
            # field name assumed
            item['description'] = sel.xpath('//*[@id="wrapper2"]/div/div/div[1]/div/div/div[2]/div/div/div[1]/div[4]/div/p/text()').extract()
            item['category'] = sel.xpath('//*[@id="breadcrumbs"]/a[2]/text()').extract()
            return item

However, it does not follow any links beyond the home page. I have tried all kinds of things and have also double-checked my regular expressions for SgmlLinkExtractor. Is there anything wrong here?
The problem is that a jsessionid is included in the links that you want to extract. For example:

    <a href="/category.sc;jsessionid=EA2CAA7A3949F4E462BBF466E03755B7.m1plqscsfapp05?categoryId=16">

Because the session id sits between ".sc" and "?", a pattern that expects "\?" immediately after "category\.sc" never matches. Use a non-greedy ".*?" there instead:

    rules = [
        Rule(SgmlLinkExtractor(allow=[r'category\.sc.*?categoryId=\d+']), callback='parse_item'),
        Rule(SgmlLinkExtractor(allow=[r'product\.sc.*?productId=\d+&amp;categoryId=\d+']), callback='parse_item'),
    ]

Hope that helps.
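The effect of the jsessionid on the patterns can be checked without running a crawl. The following is a minimal sketch using only Python's standard `re` module, on the assumption that the link extractor applies its patterns as regular-expression searches against each link URL. The variable names (`strict`, `relaxed`, `strict_hit`, `relaxed_hit`) are illustrative and do not come from the original posts, and the `categoryId=16` query string is read from the example link in the answer.

```python
import re

# Example href from the answer: the session id sits between ".sc" and "?".
href = "/category.sc;jsessionid=EA2CAA7A3949F4E462BBF466E03755B7.m1plqscsfapp05?categoryId=16"

# Pattern from the question: requires "?" immediately after "category.sc".
strict = re.compile(r"category\.sc\?categoryId=\d+")

# Pattern from the answer: ".*?" lazily skips whatever precedes "categoryId".
relaxed = re.compile(r"category\.sc.*?categoryId=\d+")

strict_hit = strict.search(href) is not None
relaxed_hit = relaxed.search(href) is not None

print("strict pattern matches: ", strict_hit)
print("relaxed pattern matches:", relaxed_hit)
```

The relaxed pattern also still matches a link without a session id, such as "/category.sc?categoryId=16", so it should be safe to use whether or not the site appends one.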