BS4: Attribute Error in Web Scraping with Python

【Question title】: BS4: Attribute Error in Web Scraping with Python
【Posted】: 2020-07-22 22:29:48
【Question】:

I need to extract the names of the cities where the shops are located from this website. I wrote this code:

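import requests
import pandas as pd
from bs4 import BeautifulSoup

def get_page_data(number):
    print('number:', number)

    # The page number is substituted into the {} placeholder.
    url = 'https://www.biedronka.pl/pl/sklepy/lista,lat,52.25,lng,21,page,{}'.format(number)
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')

    container = soup.find(class_='s-content shop-list-page')
    items = container.find_all(class_='shopListElement')

    dane = []
    for item in items:
        # This line raises the AttributeError: find() returns None here.
        miasto = item.find(class_='h4').get_text(strip=True)
        adres = item.find(class_='shopFullAddress').get_text(strip=True)
        # Two values per row, matching the two DataFrame columns below.
        dane.append([miasto, adres])

    return dane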

wszystkie_dane = []
for number in range(1, 2):
    dane_na_stronie = get_page_data(number)

    wszystkie_dane.extend(dane_na_stronie)

dane = pd.DataFrame(wszystkie_dane, columns=['miasto','adres'])

dane.to_csv('biedronki_lista.csv', index=False)

The problem occurs at:

   miasto = item.find(class_='h4').get_text(strip=True)
AttributeError: 'NoneType' object has no attribute 'get_text'

Any idea how to extract the city name (inside the h4) from this site?

【Question comments】:

Tags: python web-scraping beautifulsoup attributeerror

【Solution 1】:

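class_ matches the class attribute of an element, but h4 is a tag name, so passing it as a class is incorrect. With no element of class "h4", find() returns None. Search by tag name instead: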

miasto = item.find('h4').get_text(strip=True)
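
A minimal sketch of the difference, run against a hand-written fragment rather than the live page. The markup and the address "ul. Example 1" are assumptions modelled on the h4 element quoted in this thread.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment for illustration; not fetched from the site.
html = '''
<div class="shopListElement">
  <h4 style="margin-bottom: 10px;">
    Rzeszów <span class="shopFullAddress">ul. Example 1</span>
  </h4>
</div>
'''

item = BeautifulSoup(html, 'html.parser').find(class_='shopListElement')

by_class = item.find(class_='h4')  # searches for class="h4": nothing matches
by_tag = item.find('h4')           # searches for an <h4> tag

print(by_class)     # None, so calling .get_text() on it raises AttributeError
print(by_tag.name)  # h4
```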

【Comments】:

【Solution 2】:

Try:

miasto = item.find('h4').text.split()[0]

Or:

miasto = item.find('h4').get_text(strip=True)

Note:

"h4" is a tag, not a class.

Explanation:

Calling .find('h4') returns:

<h4 style="margin-bottom: 10px;">

                Rzeszów             <span class="shopFullAddress">ul.<span class="shopAddress"> </span></span>

Taking .text of that element gives:

'Rzeszów            \tul.'

Calling .split() on that string returns:

['Rzeszów', 'ul.']

The first element of that list, index [0], is the city name we need.
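
Two caveats may be worth checking before relying on either variant. split()[0] keeps only the first word, so a two-word city name would be cut short, and get_text(strip=True) on the whole h4 also picks up the text of the nested span. The sketch below uses invented markup (the addresses and the second city are assumptions) and shows .stripped_strings as one possible alternative.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; the second city has a two-word name.
html = '''
<h4>Rzeszów <span class="shopFullAddress">ul. Example 1</span></h4>
<h4>Nowy Sącz <span class="shopFullAddress">ul. Example 2</span></h4>
'''

soup = BeautifulSoup(html, 'html.parser')
headers = soup.find_all('h4')

# split()[0] keeps only the first whitespace-separated word.
first_words = [h.text.split()[0] for h in headers]

# stripped_strings yields each text node with surrounding whitespace removed;
# the first one is the text that precedes the <span>.
first_nodes = [next(h.stripped_strings) for h in headers]

# get_text(strip=True) joins every text node, including the address.
whole = headers[0].get_text(strip=True)

print(first_words)  # ['Rzeszów', 'Nowy']
print(first_nodes)  # ['Rzeszów', 'Nowy Sącz']
print(whole)        # Rzeszówul. Example 1
```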

So apply the same fix wherever this error occurs in your code:

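    dane = []
    for item in items:
        # h4 is a tag name, so it is passed positionally.
        miasto = item.find('h4').get_text(strip=True)
        # shopFullAddress is a class on the span, so it keeps class_=.
        adres = item.find(class_='shopFullAddress').get_text(strip=True)
        dane.append([miasto, adres])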
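
For a version that does not crash when an entry lacks one of the elements, a guarded variant could look like the sketch below. The function name parse_shops and the sample markup are hypothetical, and it parses an HTML string instead of calling requests, so it can be tried offline.

```python
from bs4 import BeautifulSoup


def parse_shops(html):
    """Return [city, address] pairs from shop list markup (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for item in soup.find_all(class_='shopListElement'):
        header = item.find('h4')                       # looked up by tag name
        address = item.find(class_='shopFullAddress')  # looked up by class
        if header is None or address is None:
            continue  # skip entries that lack either element
        # First text node of the h4, i.e. the text before the nested span.
        city = next(header.stripped_strings, '')
        rows.append([city, address.get_text(' ', strip=True)])
    return rows


# Invented sample: one complete entry and one without an h4.
sample = '''
<div class="shopListElement">
  <h4>Rzeszów <span class="shopFullAddress">ul. <span class="shopAddress">Example 1</span></span></h4>
</div>
<div class="shopListElement"><p>no header here</p></div>
'''

print(parse_shops(sample))  # [['Rzeszów', 'ul. Example 1']]
```

Each returned row has two values, so the result can be passed to pd.DataFrame with columns=['miasto', 'adres'] as in the question.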

【Comments】:

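Remove class_= when searching for h4.
shopFullAddress, on the other hand, is a class (see the span in the output above), so it still needs class_=.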