【Question title】: Iterating through web pages by modifying URL with for loop
【Posted】: 2018-07-24 09:34:14
【Question description】:

I am trying to scrape data for a specific hotel from tripadvisor.

The hotel's URL on tripadvisor is

https://www.tripadvisor.com/Hotel_Review-g39143-d92240-Reviews-Hawthorn_Suites_by_Wyndham_Wichita_East-Wichita_Kansas.html

Each page returns 5 reviews. On later pages, a page marker of the form "-orN-" appears after "d92240-Reviews", where N is a multiple of 5.

For example

https://www.tripadvisor.com/Hotel_Review-g39143-d92240-Reviews-or5-Hawthorn_Suites_by_Wyndham_Wichita_East-Wichita_Kansas.html

https://www.tripadvisor.com/Hotel_Review-g39143-d92240-Reviews-or10-Hawthorn_Suites_by_Wyndham_Wichita_East-Wichita_Kansas.html

For pages whose URL ends with "&start=(number of results)", I can write a for loop that returns each page:

for i in range(0, 200, 5):
    url = 'http://blahblahblah&start=' + str(i)

But I don't know how to do this with my tripadvisor URL.

【Question comments】:

    Tags: python html url web-scraping iteration


    【Solution 1】:

    Here you go:

    # The first page has no offset marker in its URL
    initial = 'https://www.tripadvisor.com/Hotel_Review-g39143-d92240-Reviews-Hawthorn_Suites_by_Wyndham_Wichita_East-Wichita_Kansas.html'
    # Later pages insert "or<offset>" between these two parts
    url_part1 = 'https://www.tripadvisor.com/Hotel_Review-g39143-d92240-Reviews-or'
    url_part2 = '-Hawthorn_Suites_by_Wyndham_Wichita_East-Wichita_Kansas.html'
    print(initial)
    # Offsets 5, 10, ..., 195: one page per 5 reviews
    for index in range(5, 200, 5):
        print(url_part1 + str(index) + url_part2)
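
The same idea can be wrapped in a small helper so the URL does not have to be split by hand. This is only a sketch: the function name `paged_urls` and its parameters `total`, `page_size`, and `marker` are made up for illustration, and it assumes the offset always goes directly after "-Reviews-", as in the example URLs from the question.

```python
def paged_urls(base_url, total, page_size=5, marker="-Reviews-"):
    """Yield one URL per page of reviews (hypothetical helper).

    Offset 0 is the base URL itself; later pages get "or<offset>-"
    inserted right after the marker.
    """
    head, sep, tail = base_url.partition(marker)
    if not sep:
        raise ValueError(f"marker {marker!r} not found in URL")
    for offset in range(0, total, page_size):
        if offset == 0:
            yield base_url
        else:
            yield f"{head}{sep}or{offset}-{tail}"


base = (
    "https://www.tripadvisor.com/Hotel_Review-g39143-d92240-Reviews-"
    "Hawthorn_Suites_by_Wyndham_Wichita_East-Wichita_Kansas.html"
)
# total=200 mirrors the range used above; the real review count is not known here
urls = list(paged_urls(base, total=200))
for url in urls:
    print(url)
```

Each URL in `urls` could then be passed to whatever fetching code you already use for the "&start=" style pages.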
    

    【Comments】:
