【Question Title】: Scrapy FormRequest: trying to send a POST request (FormRequest) with currency-change form data
【Posted】: 2016-06-28 02:44:46
【Question Description】:

I have been trying to scrape the following Website, but with the currency switched to "SAR" via the currency form in the top-left corner. I tried sending the Scrapy request like this:

r = Request(url='https://www.mooda.com/en/',
            cookies=[{'name': 'currency', 'value': 'SAR',
                      'domain': '.www.mooda.com', 'path': '/'},
                     {'name': 'country', 'value': 'SA',
                      'domain': '.www.mooda.com', 'path': '/'}],
            dont_filter=True)

I still get the prices in EG£:

In [10]: response.css('.price').xpath('text()').extract()
Out[10]: 
[u'1,957 EG\xa3',
 u'3,736 EG\xa3',
 u'2,802 EG\xa3',
 u'10,380 EG\xa3',
 u'1,823 EG\xa3']
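
A quick way to check which currency actually came back is to split each price string into its numeric amount and currency label. This small helper is my own sketch for illustration, not part of the original question:

```python
def parse_price(text):
    """Split a price string like '1,957 EG\xa3' into (amount, currency label)."""
    amount, _, label = text.partition(' ')
    return float(amount.replace(',', '')), label

print(parse_price(u'1,957 EG\xa3'))  # (1957.0, 'EG£')
```

Asserting that the label is 'SR' rather than 'EG£' makes the test for a successful currency switch explicit.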

I also tried sending a POST request with the form data specified, like this:

from scrapy.http.request.form import FormRequest
url = 'https://www.mooda.com/en/'
r = FormRequest(url=url,formdata={'selectCurrency':'https://www.mooda.com/en/directory/currency/switch/currency/SAR/uenc/aHR0cHM6Ly93d3cubW9vZGEuY29tL2VuLw,,/'})
fetch(r)

It still does not work. I also tried FormRequest.from_response(), but that never works either. I would really appreciate some advice; I am new to Scrapy form requests, and I would be grateful if anyone can help.
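
As an aside, the uenc segment of that currency-switch URL appears to be the return URL encoded with Magento's URL-safe base64 convention, where '+', '/' and '=' are replaced by '-', '_' and ','. Decoding it with the standard library confirms this (my own observation, not from the original post):

```python
import base64

uenc = 'aHR0cHM6Ly93d3cubW9vZGEuY29tL2VuLw,,'
# undo Magento's substitution of ',' for the '=' padding, then decode
decoded = base64.urlsafe_b64decode(uenc.replace(',', '=')).decode('utf-8')
print(decoded)  # https://www.mooda.com/en/
```

So the form value is just "switch to SAR, then send the visitor back to this page"; it is not something the server expects to receive as POST form data.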

【Question Comments】:

Tags: python web-scraping scrapy scrapy-spider scrapy-shell


【Solution 1】:

It is all about the frontend cookie. I will first show you how to do it with requests; the logic is exactly the same with Scrapy:

    import requests
    from bs4 import BeautifulSoup

    head = {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:47.0) Gecko/20100101 Firefox/47.0"}

    with requests.Session() as s:
        soup = BeautifulSoup(s.get("https://www.mooda.com/en/").content, "lxml")
        # the option's value attribute holds the currency-switch url
        r2 = s.get(soup.select_one("#selectCurrency option[value*=SAR]")["value"])
        # pass the cookies set by the switch request back to the site
        r = s.get("https://www.mooda.com/en/", params={"currency": "sar"},
                  headers=head, cookies=dict(r2.cookies.items()))
        soup2 = BeautifulSoup(r.content, "lxml")
        print(soup2.select_one(".price").text)
    
    

    You need to make a request to the url found in the option under the select with id selectCurrency, then pass the cookies returned by that request along to https://www.mooda.com/en?currency=sar. There is no POST involved; they are all GET requests, but the frontend cookie from the switch GET is essential.

    If we run the code, you can see it does give us the correct data:

    In [9]: with requests.Session() as s:
       ...:         soup = BeautifulSoup(s.get("https://www.mooda.com/en/").content,"lxml")
       ...:         r2 = s.get(soup.select_one("#selectCurrency option[value*=SAR]")["value"])
       ...:         r = s.get("https://www.mooda.com/en/", params={"currency": "sar"}, headers=head, cookies=dict(r2.cookies.items()))
       ...:         soup2 = BeautifulSoup(r.content,"lxml")
       ...:         print(soup2.select_one(".price").text)
       ...:     
    
    825 SR
    

    With Scrapy:

    from scrapy import Spider, Request


    class S(Spider):
        name = "foo"
        allowed_domains = ["www.mooda.com"]
        start_urls = ["https://www.mooda.com/en"]

        def parse(self, resp):
            # url of the SAR currency-switch endpoint
            curr = resp.css("#selectCurrency option[value*='SAR']::attr(value)").extract_first()
            return Request(curr, callback=self.parse2)

        def parse2(self, resp):
            print(resp.headers.getlist('Set-Cookie'))
            # pull the name/value pairs out of the Set-Cookie headers
            cookies = {}
            for header in resp.headers.getlist('Set-Cookie'):
                name, _, rest = header.decode('utf-8').partition('=')
                cookies[name] = rest.split(';', 1)[0]
            return Request("https://www.mooda.com/en?currency=sar",
                           cookies=cookies, callback=self.parse3)

        def parse3(self, resp):
            print(resp.css('.price').xpath('text()').extract())
    

    If you run it, it will give you:

    ['frontend=c95er9h1at2srhtqu5rkfo13g0; expires=Wed, 28-Jun-2017 08:56:08 GMT; path=/; domain=www.mooda.com', 'currency=SAR; expires=Wed, 28-Jun-2017 08:56:08 GMT; path=/; domain=www.mooda.com']
    
    
    [u'825 SR', u'1,575 SR', u'1,181 SR', u'4,377 SR', u'769 SR']
    

    The GET to curr returns nothing; it just sets the cookies.
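
    If you would rather not split the Set-Cookie headers by hand, the standard library's http.cookies.SimpleCookie can turn one of those header values into a name/value mapping. A minimal sketch, using one of the cookie strings from the output above:

```python
from http.cookies import SimpleCookie

raw = 'currency=SAR; expires=Wed, 28-Jun-2017 08:56:08 GMT; path=/; domain=www.mooda.com'
cookie = SimpleCookie()
cookie.load(raw)
# keep only the cookie name and value, dropping expires/path/domain attributes
jar = {name: morsel.value for name, morsel in cookie.items()}
print(jar)  # {'currency': 'SAR'}
```

    The resulting dict can be passed directly as the cookies argument of a Scrapy Request.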

    【Discussion】:

    • Thank you so much, this was really helpful. I had no idea it was as simple as requesting url='mooda.com/en/directory/currency/switch/currency/SAR/uenc/…' and then requesting the site's URL with its cookies. Thanks for your effort.
    • No worries. It is actually not the most obvious solution; it only looks simple once you have figured out how to do it ;)