【Question title】: How to get Amazon product name
【Posted】: 2023-02-16 10:20:58
【Question description】:

Sorry if this post looks like a duplicate, but I couldn't find a working way to do this.

import requests
from bs4 import BeautifulSoup
from lxml import etree as et
import time
import random
import csv

header = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36",
    'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate, br', 'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8'
}

bucket_list = ['https://www.amazon.co.uk/Military-Analogue-Waterproof-Tactical-Minimalist/dp/B0B6C7RMQD/']


def get_product_name(dom):
    try:
        name = dom.xpath('//span[@id="productTitle"]/text()')
        [name.strip() for name in name]
        return name[0]
    except Exception as e:
        name = 'Not Available'
        return None


with open('master_data.csv', 'w') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(['product name', 'url'])

    for url in bucket_list:
        response = requests.get(url, headers=header)
        soup = BeautifulSoup(response.content, 'html.parser')
        amazon_dom = et.HTML(str(soup))

        product_name = get_product_name(amazon_dom)

        time.sleep(random.randint(2, 5))

        writer.writerow([product_name, url])
        print(product_name, url)

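I have this code that opens each link, looks up the product name, and writes it to a CSV file, but nothing gets written. How can I fix this?

【Comments】:

  • Have you considered using the Amazon Selling Partner API?
  • I want to do this without a seller account.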
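  • As for the question itself, I think Mihnea-Octavian Manolache gave a good answer, but your code can still be improved in several ways. 1) Never index into anything unless you are sure it won't raise IndexError (so add if name before return name[0]; by the way, on the line before that you run a list comprehension but don't save the result to any variable, which is probably not what you intended).
  • 2) Avoid a blanket except Exception: it swallows errors you did not anticipate and makes the script hard to debug (a bare except: would even catch KeyboardInterrupt). 3) In your except clause you assign a value to the local variable name, which you cannot access outside the function's scope. 4) If an exception occurs you return None, but the calling code doesn't check for that and uses the value as if a string had been returned. Either return an empty string in the exception case, or test for None in the calling code.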
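
The points raised in the comments above could be applied roughly as follows. This is only a sketch of one possible rewrite of get_product_name, not the commenter's own code: it keeps the stripped values, guards the index instead of catching exceptions, and returns an empty string when the title is missing. The two HTML snippets are made up for illustration.

```python
from lxml import etree as et


def get_product_name(dom):
    """Return the stripped product title, or an empty string if it is missing."""
    # xpath() returns a (possibly empty) list of text nodes; keep the stripped values
    names = [text.strip() for text in dom.xpath('//span[@id="productTitle"]/text()')]
    # Guard the index instead of relying on a broad except clause
    return names[0] if names else ''


# Usage on two small, made-up HTML snippets (assumptions, not real Amazon pages)
with_title = et.HTML('<html><body><span id="productTitle">  Example Watch  </span></body></html>')
without_title = et.HTML('<html><body><p>No title here</p></body></html>')

print(repr(get_product_name(with_title)))
print(repr(get_product_name(without_title)))
```

Because the function always returns a string, the calling code can write the value to the CSV directly or skip the row when it is empty.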

Tags: python selenium-webdriver web-scraping


【Solution 1】:

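Amazon is a dynamic website, meaning its content is loaded programmatically (using JS). Using requests alone is usually not enough to scrape Amazon. So the reason you get no results is probably that your response doesn't actually contain anything matching dom.xpath('//span[@id="productTitle"]/text()').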
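
One way to check this hypothesis before changing tools is to inspect what the server actually returned. The helper below is a hypothetical sketch (describe_response is not part of requests or of the original code): it reports the status code, the body length, and whether the raw HTML contains the productTitle attribute at all. The sample strings are made-up stand-ins for response.text.

```python
def describe_response(status_code, html):
    """Summarise whether a fetched page looks like a real product page."""
    return {
        'status_code': status_code,
        'length': len(html),
        # The asker's XPath can only match if this attribute is in the raw HTML
        'has_product_title': 'id="productTitle"' in html,
    }


# Made-up stand-ins for response.text; with requests you would call
# describe_response(response.status_code, response.text)
product_page = '<html><body><span id="productTitle"> Example Watch </span></body></html>'
other_page = '<html><body><p>Some page without the title element.</p></body></html>'

print(describe_response(200, product_page))
print(describe_response(503, other_page))
```

If has_product_title is False for the real response, the XPath has nothing to match, which would explain the empty CSV column.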

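If you want to scrape Amazon, there are at least two solutions:

1. Scraping with Python and Selenium

First, to render JavaScript you need to use an actual browser. Since your script is written in Python, I suggest installing Selenium and using it together with an HTML parser (such as BeautifulSoup) to extract the data. Here is a sample implementation: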

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

BUCKET_LIST = ['https://www.amazon.co.uk/Military-Analogue-Waterproof-Tactical-Minimalist/dp/B0B6C7RMQD/']

driver = webdriver.Chrome()
# WebDriverWait takes its timeout in seconds, not milliseconds
wait = WebDriverWait(driver, 10)

titles = []
try:
    for url in BUCKET_LIST:
        driver.get(url)
        # Block until the title element has been rendered and is visible
        title = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '#productTitle')))
        titles.append(title.text)
finally:
    # Close the browser even if a page times out
    driver.quit()

print(titles)
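
If you prefer to hand the rendered page to BeautifulSoup, as suggested above, the parsing step might look like the sketch below. extract_title is a hypothetical helper, and sample_source is a made-up stand-in for driver.page_source so the snippet runs without a browser.

```python
from bs4 import BeautifulSoup


def extract_title(page_source):
    """Return the text of the #productTitle element, or '' if it is absent."""
    soup = BeautifulSoup(page_source, 'html.parser')
    node = soup.find(id='productTitle')
    # find() returns None when no element has that id
    return node.get_text(strip=True) if node is not None else ''


# Made-up stand-in for driver.page_source (an assumption, not a real Amazon page)
sample_source = '<html><body><span id="productTitle">  Example Watch  </span></body></html>'
print(extract_title(sample_source))
```

In the Selenium loop you would call extract_title(driver.page_source) after the wait has succeeded.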

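But you also have to take into account that Amazon takes many measures to prevent scraping. As an engineer at WebScrapingAPI, I have seen many such scenarios, and we have invested a lot of time and effort in keeping our detection rate very low, so our product now offers a high success rate.

That said, if you don't want to invest in development and would rather focus on data extraction, your second option is:

2. Using a third-party API

Using a third-party application (such as our dedicated Amazon API) means you send a request to the API's endpoint and get the data back (usually as JSON). Here is a sample implementation: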

import requests

API_KEY = '<YOUR_API_KEY>'
SCRAPER_URL = 'https://ecom.webscrapingapi.com/v1'

PARAMS = {
    "api_key":API_KEY,
    "engine":"amazon",
    "type":"product",
    "product_id":"B09FQ35SW6"
}

response = requests.get(SCRAPER_URL, params=PARAMS)

print(response.text)

The response in this case looks like this:

{
    "search_parameters": {
        "amazon_url": "https://www.amazon.com/dp/B09FQ35SW6",
        "engine": "amazon",
        "amazon_domain": "amazon.com",
        "device": "desktop",
        "type": "product",
        "product_id": "B09FQ35SW6"
    },
    "search_information": {
        "organic_results_state": "Results for exact spelling",
        "total_results": null,
        "query_displayed": ""
    },
    "product_results": {
        "position": 1,
        "product_id": "B09FQ35SW6",
        "title": "Micro SD Card 512GB High Speed SD Card Class 10 Memory Card with Adapter for Smartphone Surveillance Camera Tachograph Tablet Computers",
        "keywords": [
            "Micro",
            "Card",
            "512GB",
            "High",
            "Speed",
            "Card",
            "Class",
            "Memory",
            "Card",
            "with",
            "Adapter",
            "for",
            "Smartphone",
            "Surveillance",
            "Camera",
            "Tachograph",
            "Tablet",
            "Computers"
        ],
        "subtitle": {
            "text": "Brand: HUNYEIZ",
            "link": "https://www.amazon.com/s/ref=bl_dp_s_web_0?ie=UTF8&search-alias=aps&field-keywords=HUNYEIZ"
        },
        "description": "Protection: Waterproof Temperature Proof Shock Proof X-ray Radiation Proof Warm Tips: 1.Our Store offers 100% genuine memory card with 1 years warranty 2.Please use quaity card reader to verify all memory card on PC。 3.Please don't use cheap card reader to test memory card, speed of memory card will be reduced by low quality card reader. 4.Memory card speed is greatly affected by card reader, adapter, USB port etc. Low quality device will 100% slow down card speed.",
        "price": "$19.99",
        "brand": "HUNYEIZ",
        "categories": [
            {
                "name": "Electronics",
                "link": "https://www.amazon.com/electronics-store/b/ref=dp_bc_aui_C_1/139-3506267-5844968?ie=UTF8&node=172282",
                "category_id": "172282"
            },
            {
                "name": "Computers & Accessories",
                "link": "https://www.amazon.com/computer-pc-hardware-accessories-add-ons/b/ref=dp_bc_aui_C_2/139-3506267-5844968?ie=UTF8&node=541966",
                "category_id": "541966"
            },
            {
                "name": "Computer Accessories & Peripherals",
                "link": "https://www.amazon.com/Computer-Accessories-Supplies/b/ref=dp_bc_aui_C_3/139-3506267-5844968?ie=UTF8&node=172456",
                "category_id": "172456"
            },
            {
                "name": "Memory Cards",
                "link": "https://www.amazon.com/Memory-Cards-Computer-Add-Ons-Computers/b/ref=dp_bc_aui_C_4/139-3506267-5844968?ie=UTF8&node=516866",
                "category_id": "516866"
            },
            {
                "name": "Micro SD Cards",
                "link": "https://www.amazon.com/Micro-SD-Memory-Cards/b/ref=dp_bc_aui_C_5/139-3506267-5844968?ie=UTF8&node=3015433011",
                "category_id": "3015433011"
            }
        ],
        "search_alias": {
            "name": "Electronics",
            "value": "electronics"
        },
        "link": "https://www.amazon.com/Adapter-Smartphone-Surveillance-Tachograph-Computers/dp/B09FQ35SW6",
        "feature_bullets": [
            "【Micro SD card with SD card adapter】This micro sd card 512GB comes with an SD card adapter, you can put the micro sd card into the adapter, and then you can use it on any SD card interface.",
            "【Stable and never worry about data loss】 Micro sd card 512GB includes SD adapter, 512GB SD Card is made of high-quality chips, providing reliable performance, making it ideal for write-intensive applications and ensuring clear recording Evidence HD without dropped frames.",
            "【Protection】The HUNYEIZ SD Card 512GB memory card for camera has been tested and can withstand extreme conditions. They are resistant to high temperature, waterproof, shockproof, X-ray and anti-static.",
            "【large capacity and high speed】SD Card 512GB fast reading rate, can be viewed and transferred instantly, the maximum capacity of 512GB TF card is 512GB, there is enough space to store thousands of snapshots and hours of full HD Video, which saves you from worrying about insufficient storage space.",
            "【3-year warranty】Customer satisfaction is the greatest motivation to pursue higher quality. The product quality is very high. We always strive to provide the best products and services to our valuable customers, and we have an industry-leading one-year warranty. If you have any questions about our products, welcome to contact us!"
        ],
        "main_image": "https://m.media-amazon.com/images/I/51i2zzSuiAS._AC_SL1200_.jpg",
        "images": [
            {
                "link": "https://m.media-amazon.com/images/I/31qhVgQVALS._AC_US1500_.jpg"
            },
            {
                "link": "https://m.media-amazon.com/images/I/31128DynPkS._AC_US1500_.jpg"
            },
            {
                "link": "https://m.media-amazon.com/images/I/518tJ5WG8XS._AC_US1500_.jpg"
            },
            {
                "link": "https://m.media-amazon.com/images/I/51qe2yJ2eNS._AC_US1500_.jpg"
            },
            {
                "link": "https://m.media-amazon.com/images/I/41hE9cNBj+S._AC_US1500_.jpg"
            },
            {
                "link": "https://m.media-amazon.com/images/I/51PYgIp7cGS._AC_US1500_.jpg"
            },
            {
                "link": "https://m.media-amazon.com/images/I/51l+9rMAnIS._AC_US1500_.jpg"
            }
        ],
        "has_360_view": true,
        "attributes": [
            {
                "name": "Brand",
                "value": "HUNYEIZ"
            },
            {
                "name": "Flash Memory Type",
                "value": "Micro SD"
            },
            {
                "name": "Hardware Interface",
                "value": "MicroSDXC"
            },
            {
                "name": "Secure Digital Association Speed Class",
                "value": "Class 10"
            },
            {
                "name": "Memory Storage Capacity",
                "value": "512 GB"
            }
        ],
        "dimensions": "4.84 x 2.87 x 0.55 inches",
        "weight": "0.634 ounces",
        "origin": "China",
        "ratings_total": 2,
        "rating": 5,
        "bestseller_rank": [
            {
                "rank": 421,
                "category": "Micro SD Memory Cards",
                "link": "https://www.amazon.com/gp/bestsellers/pc/3015433011/ref=pd_zg_hrsr_pc"
            }
        ],
        "first_available": "September 8, 2021"
    }
}
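
Given a response shaped like the sample above, the product name sits under product_results.title. The snippet below is a minimal sketch of reading it; get_title is a hypothetical helper, and sample_text is an abridged copy of the sample response (title shortened) so that it runs without an API key.

```python
import json


def get_title(payload):
    """Return product_results.title from a decoded response, or '' if missing."""
    return payload.get('product_results', {}).get('title', '')


# Abridged copy of the sample response; with requests, use response.json() instead
sample_text = '{"product_results": {"product_id": "B09FQ35SW6", "title": "Micro SD Card 512GB", "price": "$19.99"}}'

payload = json.loads(sample_text)
print(get_title(payload))
```

Using .get with defaults means a response without product_results yields an empty string instead of raising KeyError.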

