【Question title】: How to get JavaScript dynamic content from a website
【Posted】: 2016-12-29 16:22:23
【Question】:

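I am trying to get dynamic content from a website.

I tried to get the content with Scrapy, but the content is loaded by a JS file, so it does not show up in the extracted text.

I then installed Selenium for this, but now I get a "No such session" error.

For example, this is the page I am trying to get content from: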

http://www.hepsiburada.com/fox-fitness-new-target-70e-2-5-hp-motorlu-masajli-kosu-bandi-hediye-secenekleriyle-p-SPORKONKSBFOX0081?magaza=Finspor

This is what I tried for this site:

item = ProductItem()
name = response.css('h1.product-name::text').extract_first()
price = response.css('span[id=offering-price] > span::text').extract_first()
# positional XPath: depends on the exact number of <script> tags in <head>
xpath = response.xpath('/html/head/script[17]')
data = xpath.re(r" = (\{.+\})")
print(data)

This is the content I want to get:

 var utagData = {"merchant_names":["Finspor"],"new_site":"new","order_store":"Finspor","order_currency":"TRY","page_domain":"www.hepsiburada.com","page_language":"tr-TR","page_site_name":"Hepsiburada","page_site_region":"tr","site_type":"desktop","page_type":"pdp","page_name":"Product Detail","category_path":"/product/spor-outdoor/spor-fitness/fitness-kondisyon/kosu-bantlari/sporkonksbfox008/","page_title":"Fox Fitness New Target 70E 2.5 Hp Motorlu, Masajlı Koşu Fiyatı","page_url":"http://www.hepsiburada.com/fox-fitness-new-target-70e-2-5-hp-motorlu-masajli-kosu-bandi-hediye-secenekleriyle-p-SPORKONKSBFOX0081?magaza=Finspor","page_referring_url":"http://www.hepsiburada.com/gunun-firsati-teklifi?element=1","page_query_string":["magaza=Finspor"],"is_canonical":"1","canonical_url":"http://www.hepsiburada.com/fox-fitness-new-target-70e-2-5-hp-motorlu-masajli-kosu-bandi-hediye-secenekleriyle-pm-sporkonksbfox008","product_prices":["999.00"],"product_unit_prices":["999.00"],"product_brands":["Fox Fitness"],"product_brand":"Fox Fitness","product_skus":["SPORKONKSBFOX0081"],"product_ids":["sporkonksbfox008"],"product_top_5":["sporkonksbfox008"],"product_names":["Fox Fitness New Target 70E 2.5 Hp Motorlu, Masajlı Koşu Bandı (Hediye Seçenekleriyle)"],"product_category_ids":["19249"],"product_categories":["kosu-bantlari"],"shipping_type":["super-hizli"],"product_quantities":["1"],"product_barcodes":["8691128100776"],"product_barcode":"8691128100776","product_name_array":"Fox Fitness New Target 70E 2.5 Hp Motorlu, Masajlı Koşu Bandı (Hediye Seçenekleriyle)","merchant_ids":["95df0e3483104fc1a16cca6e38bc45cc"],"order_subtotal":["999.00"],"category_id_hierarchy":"60001546 > 2147483635 > 353045 > 19249","category_name_hierarchy":"Spor Outdoor > Spor / Fitness > Fitness - Kondisyon > Koşu Bantları","product_status":"InStock"};
    var utagObject = utagData;
    var utag_data = {"merchant_names":["Finspor"],"new_site":"new","order_store":"Finspor","order_currency":"TRY","page_domain":"www.hepsiburada.com","page_language":"tr-TR","page_site_name":"Hepsiburada","page_site_region":"tr","site_type":"desktop","page_type":"pdp","page_name":"Product Detail","category_path":"/product/spor-outdoor/spor-fitness/fitness-kondisyon/kosu-bantlari/sporkonksbfox008/","page_title":"Fox Fitness New Target 70E 2.5 Hp Motorlu, Masajlı Koşu Fiyatı","page_url":"http://www.hepsiburada.com/fox-fitness-new-target-70e-2-5-hp-motorlu-masajli-kosu-bandi-hediye-secenekleriyle-p-SPORKONKSBFOX0081?magaza=Finspor","page_referring_url":"http://www.hepsiburada.com/gunun-firsati-teklifi?element=1","page_query_string":["magaza=Finspor"],"is_canonical":"1","canonical_url":"http://www.hepsiburada.com/fox-fitness-new-target-70e-2-5-hp-motorlu-masajli-kosu-bandi-hediye-secenekleriyle-pm-sporkonksbfox008","product_prices":["999.00"],"product_unit_prices":["999.00"],"product_brands":["Fox Fitness"],"product_brand":"Fox Fitness","product_skus":["SPORKONKSBFOX0081"],"product_ids":["sporkonksbfox008"],"product_top_5":["sporkonksbfox008"],"product_names":["Fox Fitness New Target 70E 2.5 Hp Motorlu, Masajlı Koşu Bandı (Hediye Seçenekleriyle)"],"product_category_ids":["19249"],"product_categories":["kosu-bantlari"],"shipping_type":["super-hizli"],"product_quantities":["1"],"product_barcodes":["8691128100776"],"product_barcode":"8691128100776","product_name_array":"Fox Fitness New Target 70E 2.5 Hp Motorlu, Masajlı Koşu Bandı (Hediye Seçenekleriyle)","merchant_ids":["95df0e3483104fc1a16cca6e38bc45cc"],"order_subtotal":["999.00"],"category_id_hierarchy":"60001546 > 2147483635 > 353045 > 19249","category_name_hierarchy":"Spor Outdoor > Spor / Fitness > Fitness - Kondisyon > Koşu Bantları","product_status":"InStock"};

【Comments】:

  • You haven't shown your Selenium code (which is where you should be getting the response).

Tags: javascript selenium scrapy scrapy-spider scrapy-splash


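【Solution 1】:

There is no need to execute any JavaScript here. If you right-click the page and choose "View page source" (or similar), you will find the data there in JSON format: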

# assuming we're crawling:
# 'http://www.hepsiburada.com/fox-fitness-new-target-70e-2-5-hp-motorlu-masajli-kosu-bandi-hediye-secenekleriyle-p-SPORKONKSBFOX0081?magaza=Finspor'

import json

def parse(self, response):
    # select the text of the <script> node that defines utagData
    node = response.xpath("//script[contains(text(),'var utagData = ')]/text()")
    # extract the JSON object literal from the script text with a regex
    # (raw string, so the backslashes reach the regex engine unchanged)
    data = node.re(r'= (\{.+\})')[0]
    # convert the JSON text to a Python dictionary
    data = json.loads(data)
    print(data)
    print(data['merchant_names'])
    # gives ['Finspor']
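
The same idea can be tried without running a spider. The following is a minimal sketch, not part of the original answer: the helper name `extract_js_object` is hypothetical, and `sample` is a shortened copy of the script text from the question. Instead of a greedy regex it locates the assignment and lets `json.JSONDecoder.raw_decode` parse exactly one JSON value, so the trailing `;` and the rest of the script are ignored.

```python
import json
import re


def extract_js_object(script_text, var_name):
    """Return the JSON object assigned to `var <var_name> = ...`, or None."""
    # Find the assignment; the pattern consumes whitespace up to the value.
    match = re.search(r"var\s+" + re.escape(var_name) + r"\s*=\s*", script_text)
    if match is None:
        return None
    # raw_decode parses a single JSON value starting at the given index
    # and returns it together with the index where parsing stopped.
    obj, _end = json.JSONDecoder().raw_decode(script_text, match.end())
    return obj


# Shortened sample of the inline script shown in the question (assumption:
# the real page keeps the same `var utagData = {...};` shape).
sample = '''
 var utagData = {"merchant_names":["Finspor"],"order_currency":"TRY","product_prices":["999.00"]};
    var utagObject = utagData;
'''

data = extract_js_object(sample, "utagData")
print(data["merchant_names"])
print(data["product_prices"][0])
```

Inside a Scrapy callback, the script text selected by the XPath above could be passed to this helper in place of the regex and `json.loads` steps. It only works while the assigned value is valid JSON; a JavaScript literal with unquoted keys or single quotes would raise `json.JSONDecodeError`.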


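    【Solution 2】:

    In the past I have used this library to scrape websites and get the content I needed: https://github.com/lapwinglabs/x-ray

    It has a nice API for finding the specific data you need: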

    // create an x-ray instance, then get the page title
    var Xray = require('x-ray');
    var xray = Xray();
    xray('http://google.com', 'title')(function(err, title) {
      console.log(title);
    })
    

    Or select content with a CSS selector:

    xray('http://reddit.com', '.content')(function(err, innerHTML) {
        console.log(innerHTML);
    })
    

    Or get the value of a specific attribute:

    xray('http://techcrunch.com', 'img.logo@src')(function(err, value) {
        console.log(value);
    })
    

    So take a look at this library. It may help you get the result you need.
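
For readers staying in Python, the `img.logo@src` idea (match a tag by class, then read one attribute) can be sketched with the standard library alone. This is an illustration under assumptions, not x-ray's implementation: the class name `AttrCollector` and the sample HTML are hypothetical. In a Scrapy callback the equivalent selector would be `response.css('img.logo::attr(src)')`.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collect one attribute from every <tag class="... css_class ..."> element."""

    def __init__(self, wanted_tag, css_class, attr_name):
        super().__init__()
        self.wanted_tag = wanted_tag
        self.css_class = css_class
        self.attr_name = attr_name
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; value may be None.
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if (tag == self.wanted_tag
                and self.css_class in classes
                and self.attr_name in attr_map):
            self.values.append(attr_map[self.attr_name])


# Hypothetical markup standing in for a downloaded page.
html = '''
<header>
  <img class="logo big" src="/static/logo.png">
  <img class="banner" src="/static/banner.png">
</header>
'''

collector = AttrCollector("img", "logo", "src")
collector.feed(html)
print(collector.values)
```

Like x-ray without a browser driver, this only sees markup present in the downloaded HTML; it does not execute JavaScript.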
