Why is requests.get giving different results from my browser? (web scraping)

【Question title】: Why is requests.get giving different results from my browser? (web scraping)
【Posted】: 2022-01-24 20:00:55
【Question】:

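I am very new to Python and I am trying to build a web scraper that collects the ads people post on a Dutch website where you can sell your second-hand items. First, I ask the user for a search term, a distance, and a postal code. These three variables are then inserted into the URL, which is fetched with requests.get. See the function definition below: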

from bs4 import BeautifulSoup
import requests

BASE_URL = 'https://www.marktplaats.nl'

def find_ads(BASE_URL, search_term, distance, postal_code):
    new_url = f"{BASE_URL}/q/{search_term}/#distanceMeters:{distance * 1000}|postcode:{postal_code}"
    html_text = requests.get(new_url, timeout=5).text
    soup = BeautifulSoup(html_text, 'lxml')
    ads = soup.find_all('li', class_='mp-Listing mp-Listing--list-item')

    print(new_url)

    for index, ad in enumerate(ads):
        ad_name = ad.find('h3', class_='mp-Listing-title').text
        ad_location = ad.find('span', class_='mp-Listing-location').text

        print(ad_location + "\n")
        print(ad_name + "\n")

search_term = "bike"
distance = 3
postal_code = '1234ab'  # must be a string; a bare 1234ab is a syntax error

find_ads(BASE_URL, search_term, distance, postal_code)

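The generated URL (new_url) is exactly what I want. When I copy new_url into my browser, I get the page I want to scrape, with my location and distance settings applied.

However, when I look at ad_location, I expect every ad to be from the city I am in, but the console shows random cities scattered across the Netherlands. This means the page is somehow ignoring my location.

My question: why does requests.get return a different result than my browser does for the same URL? Why does it ignore the distance and postal code I entered?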

【Comments】:

Probably because the page is dynamic and rendered via JavaScript.

Tags: beautifulsoup python-requests
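
A related detail that may help explain the symptom: the distance and postal code in new_url sit after the `#`, which makes them a URL fragment. Fragments are handled by the browser (here, presumably by the page's JavaScript) and are not sent to the server as part of the HTTP request. The sketch below uses only the standard library to split a URL shaped like the one in the question; the values are the question's own examples.

```python
from urllib.parse import urlsplit

# URL shaped like the one built in the question (example values from the question)
url = "https://www.marktplaats.nl/q/bike/#distanceMeters:3000|postcode:1234ab"

parts = urlsplit(url)

# Only the path and query travel to the server in the HTTP request;
# the fragment stays on the client side.
print("path sent to server:", parts.path)
print("query sent to server:", repr(parts.query))
print("fragment kept client-side:", parts.fragment)
```

Because the query is empty and the filters live only in the fragment, the server has no way to see them, which is consistent with the unfiltered results described in the question.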

【Solution 1】:

The page is dynamic. However, you can get the listings through their API:

import requests

def find_ads(search_term, distance, postal_code):
    # Search API endpoint that returns the listings as JSON
    url = 'https://www.marktplaats.nl/lrp/api/search'
    payload = {
        'distanceMeters': str(distance * 1000),  # distance is given in km
        'limit': '30',
        'offset': '0',
        'postcode': postal_code,
        'query': search_term,
        'searchInTitleAndDescription': 'true',
        'viewOptions': 'list-view',
    }

    # The filters go in the query string (params=...), not after a '#'
    jsonData = requests.get(url, params=payload, timeout=5).json()

    ads = jsonData['listings']
    for ad in ads:
        ad_name = ad['title']
        ad_location = ad['location']  # a dict (cityName, latitude, ...), not a string

        print(ad_location, "\n")
        print(ad_name, "\n")

search_term = "bike"
distance = 3
postal_code = '1234ab'

find_ads(search_term, distance, postal_code)

Output:

{'cityName': 'Dordrecht', 'countryName': 'Nederland', 'countryAbbreviation': 'NL', 'distanceMeters': -1000, 'isBuyerLocation': False, 'onCountryLevel': False, 'abroad': False, 'latitude': 51.811457038595, 'longitude': 4.6650731189384} 

Cortina fietsen de grootste voorraad bij Mega Bike Dordrecht 

{'cityName': 'Dordrecht', 'countryName': 'Nederland', 'countryAbbreviation': 'NL', 'distanceMeters': -1000, 'isBuyerLocation': False, 'onCountryLevel': False, 'abroad': False, 'latitude': 51.811457038595, 'longitude': 4.6650731189384} 

FIETSEN MET LICHTE LAKSCHADE lage prijzen bij Mega Bike 

...
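
As the output shows, each listing's location comes back as a dict rather than a plain string. If only the city is wanted, as in the original question, something like the following could work. This is a minimal sketch: `format_ad` is a hypothetical helper name, and the sample record is copied (and shortened) from the output above rather than fetched live.

```python
def format_ad(ad):
    """Return 'title (city)' for one listing dict (hypothetical helper)."""
    # Fall back gracefully if a listing has no location or city name
    location = ad.get('location') or {}
    city = location.get('cityName', 'unknown')
    return f"{ad.get('title', '')} ({city})"

# Sample record copied from the output above (location fields shortened)
sample_ad = {
    'title': 'Cortina fietsen de grootste voorraad bij Mega Bike Dordrecht',
    'location': {'cityName': 'Dordrecht', 'countryName': 'Nederland',
                 'distanceMeters': -1000},
}

print(format_ad(sample_ad))
```

Inside find_ads, the two print calls could then be replaced by `print(format_ad(ad))`.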

【Discussion】: