【Question title】: Scraping prices with BeautifulSoup4 in Python3 Udemy Website
【Posted】: 2021-12-18 03:44:25
【Question】:

I am trying to extract the price and the number of students from the Udemy website. I am on Windows, using Python 3.8 and BeautifulSoup in a conda environment.

This is my code:

import requests
from bs4 import BeautifulSoup

url = 'https://www.udemy.com/course/business-analysis-conduct-a-strategy-analysis/'
html = requests.get(url).content
bs = BeautifulSoup(html, 'lxml')
searchingprice = bs.find('div', {'class':'price-text--price-part--2npPm udlite-clp-discount-price udlite-heading-xxl','data-purpose':'course-price-text'})
searchingstudents = bs.find('div', {'class':'','data-purpose':'enrollment'})
print(searchingprice)
print(searchingstudents)

I only get the student information, not the price. What am I doing wrong?

None
<div class="" data-purpose="enrollment">
13,490 students
</div>

Here is a screenshot of the website:

Thanks!

【Comments】:

    Tags: python python-3.x web-scraping beautifulsoup


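    【Solution 1】:

    The price is not in the page source; it is fetched with JavaScript, so we have to make the same request ourselves. The code below continues from your own, with bs already loaded.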

    # get the id of the course
    course_id = bs.body.attrs['data-clp-course-id']
    # build the request; feel free to drop fields you do not need
    link = f'https://www.udemy.com/api-2.0/pricing/?course_ids={course_id}&fields[pricing_result]=price,discount_price,list_price,price_detail,price_serve_tracking_id'
    # fetch the data
    res = requests.get(link).json()
    print(res)
    # output:
    # {'courses': {'1596446': {'_class': 'pricing_result', 'price_serve_tracking_id': 'rbNYz3yCSiS2G1J62gtSzg', 'price': {'amount': 16.99, 'currency': 'EUR', 'price_string': '€16.99', 'currency_symbol': '€'}, 'list_price': {'amount': 119.99, 'currency': 'EUR', 'price_string': '€119.99', 'currency_symbol': '€'}, 'discount_price': {'amount': 17.0, 'currency': 'EUR', 'price_string': '€17', 'currency_symbol': '€'}, 'price_detail': {'amount': 119.99, 'currency': 'EUR', 'price_string': '€119.99', 'currency_symbol': '€'}}}, 'bundles': {}}
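
Once the JSON is loaded, the displayed price still has to be pulled out of the nested dictionary. A minimal sketch of that step, assuming the response keeps the shape printed above; the helper name `extract_price_string` and the trimmed `sample` dictionary are illustrative and not part of the original answer:

```python
from typing import Optional


def extract_price_string(pricing: dict, course_id: str, field: str = "price") -> Optional[str]:
    """Return the display string for one price field, or None if it is missing."""
    # Course ids are string keys in the response, so normalise the lookup key.
    course = pricing.get("courses", {}).get(str(course_id))
    if not course:
        return None
    entry = course.get(field)
    if not isinstance(entry, dict):
        return None
    return entry.get("price_string")


# Trimmed copy of the response shown above (hypothetical sample data for the demo).
sample = {
    "courses": {
        "1596446": {
            "price": {"amount": 16.99, "currency": "EUR", "price_string": "€16.99"},
            "list_price": {"amount": 119.99, "currency": "EUR", "price_string": "€119.99"},
        }
    },
    "bundles": {},
}

print(extract_price_string(sample, "1596446"))
print(extract_price_string(sample, "1596446", "list_price"))
print(extract_price_string(sample, "0"))
```

With the live response you would pass `res` and `course_id` instead of `sample`. Returning None for a missing course or field avoids a KeyError if the API omits a field.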
    

    【Comments】:

      【Solution 2】:
      from bs4 import BeautifulSoup

      html = """<div class="price-text--container--103D9 udlite-clp-price-text" 
      data-purpose="price-text-container"><div class="price-text--price-part--2npPm 
      udlite-clp-discount-price udlite-heading-lg" 
      data-purpose="course-price-text">
      <span class="udlite-sr-only">Current price</span>
      <span><span>$14.99</span></span></div>
      <div class="price-text--price-part--2npPm price-text--original-price--1sDdx 
      udlite-clp-list-price udlite-text-sm" data-purpose="original-price-container">
      <div data-purpose="course-old-price-text"><span class="udlite-sr-only">Original Price</span>
      <span><s><span>$99.99</span></s></span></div></div>
      <div class="price-text--price-part--2npPm udlite-clp-percent-discount udlite-text-sm"
      data-purpose="discount-percentage"><span class="udlite-sr-only">Discount</span><span>85% off</span>
      </div></div>"""
      
      soup = BeautifulSoup(html, 'lxml')
      # find the children of the main div class
      lst = soup.find('div', class_='price-text--container--103D9 udlite-clp-price-text').findChildren('span')
      # list comprehension to find the span text that starts with $ and keep the first element
      print([span.text for span in lst if span.text.startswith('$')][0])  # -> '$14.99'
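
This snippet works on the hard-coded HTML string, but `find` returns None when the container is absent from the HTML that was actually downloaded, and the chained call then raises AttributeError. A hedged sketch that guards against this is shown here; the function name `find_price_text` and the two test strings are illustrative, it matches on the `data-purpose` attribute rather than the generated class names, and it uses the built-in `html.parser` instead of `lxml` to avoid the extra dependency:

```python
from typing import Optional

from bs4 import BeautifulSoup


def find_price_text(html: str) -> Optional[str]:
    """Return the first '$'-prefixed span text in the price container, or None."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", attrs={"data-purpose": "price-text-container"})
    if container is None:
        # The container is missing, e.g. because the price is filled in by JavaScript.
        return None
    for span in container.find_all("span"):
        text = span.get_text(strip=True)
        if text.startswith("$"):
            return text
    return None


# Hypothetical inputs: one with the price markup, one without it.
with_price = (
    '<div data-purpose="price-text-container">'
    '<span class="udlite-sr-only">Current price</span>'
    "<span><span>$14.99</span></span></div>"
)
without_price = '<div data-purpose="enrollment">13,490 students</div>'

print(find_price_text(with_price))
print(find_price_text(without_price))
```

A None result from the live page is a sign that the price is not in the downloaded HTML at all, in which case no selector will find it.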
      

      【Comments】:

      • Doesn't work. Error: AttributeError: 'NoneType' object has no attribute 'findChildren'