所以,这似乎有效。我两次都收到200 OK 响应,但内容长度不同。
对于它的价值,在 Firefox 中,当我单击蓝色的“购买此商店”按钮时,它会将我带到看起来完全相同的页面,但没有我刚刚单击的蓝色按钮。在 Chrome(测试版)中,当我单击蓝色按钮时,我会看到一个 403 Access denied 页面。他们的服务器打得不好。您可能很难实现自己想要实现的目标。
如果我在没有标题的情况下调用session.get,我将永远得不到回应。所以他们显然在检查用户代理,可能是 cookie 等。
import requests
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.5",
"Accept-Encoding": "gzip, deflate, br",
"Upgrade-Insecure-Requests": "1",}
session = requests.Session()
url = "https://www.lowes.com/store/AK-Anchorage/2955"
response1 = session.get(url, headers=headers)
print(response1, len(response1.content))
response2 = session.get(url, headers=headers)
print(response2, len(response2.content))
输出:
<Response [200]> 56282
<Response [200]> 56323
我做了更多的测试。如果您不更改默认 Python 请求中的 user-agent,服务器将超时。即使将其更改为"" 似乎也足以让服务器给您响应。
您无需选择特定商店即可获取产品信息,包括描述、规格和价格。看看这个 GET 请求,没有 cookie,也没有会话:
import requests, json
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"}
url = "https://www.lowes.com/pd/Google-Nest-Learning-Thermostat-3rd-Gen-Thermostat-and-Room-Sensor-with-with-Wi-Fi-Compatibility/1001080012"
r = requests.get(url, headers=headers, timeout=5)
print("return code:", r)
print("content length:", len(r.content))
for line in r.text.splitlines():
if "window.digitalData.products = [" in line:
print("This line includes the 'sellingPrice' and the 'retailPrice'. After some splicing, we can treat it as JSON.")
left = line.find(" = ") + 3
right = line.rfind(";")
print(json.dumps(json.loads(line[left:right]), indent=True))
break
输出:
return code: <Response [200]>
content length: 107134
This line includes the 'sellingPrice' and the 'retailPrice'. After some splicing, we can treat it as JSON.
[
{
"productId": [
"1001080012"
],
"productName": "Nest_Learning_Thermostat_3rd_Gen_Thermostat_and_Room_Sensor_with_with_Wi-Fi_Compatibility",
"ivm": "753160-83910-T3007ES",
"itemNumber": "753160",
"vendorNumber": "83910",
"modelId": "T3007ES",
"type": "ANY",
"brandName": "Google",
"superCategory": "Heating & Cooling",
"quantity": 1,
"sellingPrice": 249,
"retailPrice": 249
}
]
产品描述和规格可以在这个元素中找到:
<section class="pd-information met-product-information grid-100 grid-parent v-spacing-jumbo">
(大约 300 行,所以我只是复制父标签。)
有一个 API 接受产品 ID 和商店编号,并返回定价信息:
import requests, json
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"}
url = "https://www.lowes.com/PricingServices/price/balance?productId=1001080012&storeNumber=1955"
r = requests.get(url, headers=headers, timeout=5)
print("return code:", r)
print("content length:", len(r.content))
print(json.dumps(json.loads(r.text), indent=True))
输出:
return code: <Response [200]>
content length: 768
[
{
"productId": 1001080012,
"storeNumber": 1955,
"isSosVendorDirect": true,
"price": {
"selling": "249.00",
"retail": "249.00",
"typeCode": 1,
"typeIndicator": "Regular Price"
},
"availability": [
{
"availabilityStatus": "Available",
"productStockType": "STK",
"availabileQuantity": 822,
"deliveryMethodId": 1,
"deliveryMethodName": "Parcel Shipping",
"storeNumber": 907
},
{
"availabilityStatus": "Available",
"productStockType": "STK",
"availabileQuantity": 8,
"leadTime": 1570529161540,
"deliveryMethodId": 2,
"deliveryMethodName": "Store Pickup",
"storeNumber": 1955
},
{
"availabilityStatus": "Available",
"productStockType": "STK",
"availabileQuantity": 1,
"leadTime": 1570529161540,
"deliveryMethodId": 3,
"deliveryMethodName": "Truck Delivery",
"storeNumber": 1955
}
],
"@type": "item"
}
]
它可以包含多个产品编号。例如:
https://www.lowes.com/PricingServices/price/balance?productId=1001080046%2C1001135076%2C1001091656%2C1001086418%2C1001143824%2C1001094006%2C1000170557%2C1000920864%2C1000338547%2C1000265699%2C1000561915%2C1000745998&storeNumber=1564
您可以使用这个返回 1.6MB json 文件的 API 获取每个商店的信息。 maxResults 通常设置为30,query 是您的经度和纬度。我建议将其保存到磁盘。我怀疑它改变了很多。
https://www.lowes.com/wcs/resources/store/10151/storelocation/v1_0?maxResults=2000&query=0%2C0
请记住,PricingServices/price/balance 端点可以为 storeNumber 采用多个值,以 %2C(逗号)分隔,因此您不需要 1763 个单独的 GET 请求。我仍然使用requests.Session 发出多个请求(因此它重用了底层连接)。