【Question title】: Extract tag value in Scrapy
【Posted】: 2019-01-25 18:15:08
【Question description】:

I want to extract the values of tags by XPath in Scrapy. For example, I have these paths and values:

/html/body/div[3]/ul[1]/li[1]/div/p

q1

/html/body/div[3]/ul[1]/li[3]/div/p

ans1

/html/body/div[3]/ul[2]/li[1]/div/p

q2

/html/body/div[3]/ul[2]/li[2]/div/p

ans2

Link: https://www.digikala.com/ajax/product/questions/980291

I want them in a yield like this:

    def parse(self, response):
        for quote in response.xpath('//html/body/main'):
            # question or answer
            # question pattern: li/div/p or li[1]/div/p
            # answer pattern: ends with li[2 or higher number]/div/p
            # a related question and answer share the same ul, e.g. both are under ul[1]
            yield {
                'type': quote.xpath('i dont know this part').extract_first(),
                'QAnumber': quote.xpath('?').extract(),
                'text': quote.xpath('/html/body/div[3]/*/*/div/p/text()').extract(),
            }

How can I extract these 3 parts?

【Comments】:

    Tags: python-2.7 web-scraping scrapy


    【Solution 1】:
     def parse(self, response):
         # Each <ul> holds one question and its answer, so query relative to it
         # (quote.css) rather than the whole page (response.css).
         for quote in response.css('#product-questions-list > ul'):
             quest = quote.css('.is-question > div.section > div > p::text').extract_first()
             answer = quote.css('.is-answer > div.section > div > p::text').extract_first()
             yield {quest: answer}
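
In a loop like the one above, it matters where each query starts: a query run from the page root returns the same first match on every pass, while a query run from the current `<ul>` sees only that block. Here is a minimal sketch of the difference. It uses the standard library's `xml.etree.ElementTree` as a stand-in for Scrapy selectors so it can run offline, and the `SAMPLE` markup is a simplified assumption, not the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: two <ul> blocks, each with one question and one answer.
SAMPLE = """
<div id="product-questions-list">
  <ul>
    <li class="is-question"><div><p>q1</p></div></li>
    <li class="is-answer"><div><p>ans1</p></div></li>
  </ul>
  <ul>
    <li class="is-question"><div><p>q2</p></div></li>
    <li class="is-answer"><div><p>ans2</p></div></li>
  </ul>
</div>
"""

root = ET.fromstring(SAMPLE.strip())
blocks = root.findall("ul")

# Query from the document root inside the loop: every pass finds the first match overall.
unscoped = [root.find(".//li[@class='is-question']/div/p").text for _ in blocks]

# Query relative to the current <ul>: each pass sees only its own block.
scoped = [ul.find("li[@class='is-question']/div/p").text for ul in blocks]

print(unscoped)  # ['q1', 'q1']
print(scoped)    # ['q1', 'q2']
```

The same idea applies in Scrapy: calling `.css()` or `.xpath()` on the loop variable keeps the search inside the current block.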
    

    【Comments】:

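      【Solution 2】:

      Your question is hard to understand. Do you want to extract the questions and answers? If so, it would look something like this.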

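      from w3lib.html import remove_tags

      def parse(self, response):
          for qa in response.css('div#product-questions-list ul.c-faq__list'):
              question = qa.css('li.is-question div.section > p::text').get()
              answer = qa.css('li.is-answer div.section > p').get()
              # The answer is taken as raw HTML, so strip any nested tags from it.
              answer = remove_tags(answer) if answer else None
              number = qa.css('li.is-question a::attr(data-question-id)').get()
              yield {'QAnumber': number, 'question': question, 'answer': answer}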
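
To produce the three fields the question asks for (`type`, `QAnumber`, `text`), one possible approach is to follow the rules stated in the question's own comments: the first `li` of a `ul` is the question, any later `li` is an answer, and both share the index of their `ul`. The sketch below illustrates only that logic. It uses `xml.etree.ElementTree` as an offline stand-in for Scrapy selectors; the `SAMPLE` markup and the `build_records` helper are hypothetical and not taken from the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup mirroring the paths in the question:
# ul[1]/li[1] = q1, ul[1]/li[3] = ans1, ul[2]/li[1] = q2, ul[2]/li[2] = ans2.
SAMPLE = """
<div>
  <ul>
    <li><div><p>q1</p></div></li>
    <li></li>
    <li><div><p>ans1</p></div></li>
  </ul>
  <ul>
    <li><div><p>q2</p></div></li>
    <li><div><p>ans2</p></div></li>
  </ul>
</div>
"""


def build_records(container):
    """Return one dict per question or answer found under container."""
    records = []
    for qa_number, ul in enumerate(container.findall("ul"), start=1):
        for position, li in enumerate(ul.findall("li"), start=1):
            p = li.find("div/p")  # relative to the current <li>
            if p is None:
                continue  # skip list items that carry no div/p text
            records.append({
                "type": "question" if position == 1 else "answer",
                "QAnumber": qa_number,
                "text": (p.text or "").strip(),
            })
    return records


records = build_records(ET.fromstring(SAMPLE.strip()))
for record in records:
    print(record)
```

In a Scrapy spider the same loop could be written with relative selector calls, for example `ul.xpath('./li')` for the items and `li.xpath('./div/p/text()').get()` for the text, yielding each dict instead of collecting it in a list.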
      

      【Comments】:
