【Question title】: Selenium scrapes only the first item that it finds
【Posted】: 2021-11-12 17:00:59
【Question】:

I use the following code block to scrape a website:

driver = webdriver.Chrome(executable_path=r'C:/Users/USER/Downloads/chromedriver_win32/chromedriver.exe')
url = 'https://mamikos.com/cari/ugm/all/bulanan/0-15000000'
driver.get(url)

kamar = driver.find_elements_by_class_name('kost-rc__content')

for desc in kamar :
    nama = desc.find_element_by_xpath('//*[@id="app"]/div/div[5]/div/div[1]/div/div/div[1]/div[1]/div[1]/div/div[2]/div/div[2]/div[2]/div/span[1]').text
    kecamatan = desc.find_element_by_xpath('//*[@id="app"]/div/div[5]/div/div[1]/div/div/div[1]/div[1]/div[1]/div/div[2]/div/div[2]/div[2]/div/span[2]').text
    harga = desc.find_element_by_xpath('//*[@id="app"]/div/div[5]/div/div[1]/div/div/div[1]/div[1]/div[1]/div/div[2]/div/div[2]/div[4]/div/div[2]/div/span[1]').text
    print(nama, kecamatan, harga)

After running it, the output only prints the first result on the page. I tried changing the XPaths to this:

for desc in kamar :
    nama = desc.find_element_by_xpath('.//*[@id="app"]/div/div[5]/div/div[1]/div/div/div[1]/div[1]/div[1]/div/div[2]/div/div[2]/div[2]/div/span[1]').text
    kecamatan = desc.find_element_by_xpath('.//*[@id="app"]/div/div[5]/div/div[1]/div/div/div[1]/div[1]/div[1]/div/div[2]/div/div[2]/div[2]/div/span[2]').text
    harga = desc.find_element_by_xpath('.//*[@id="app"]/div/div[5]/div/div[1]/div/div/div[1]/div[1]/div[1]/div/div[2]/div/div[2]/div[4]/div/div[2]/div/span[1]').text
    print(nama, kecamatan, harga)

But that only gives an error. Please help.

Side note: Google Chrome is version 95.0.4638.69 (Official Build) (64-bit), and the driver used is ChromeDriver 95.0.4638.69.

【Question discussion】:

  • You need to read up on XPath and how to write it better.
  • "But that only gives an error" - which error?

Tags: python-3.x selenium-webdriver web-scraping xpath webdriverwait


【Solution 1】:

To extract the names, locations, and prices, you can use the following Locator Strategies.

Code block:

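from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Assumes chromedriver can be found automatically; otherwise create the driver as in the question
driver = webdriver.Chrome()
driver.get("https://mamikos.com/cari/ugm/all/bulanan/0-15000000")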
names = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='kost-rc__info']//span[contains(@class, 'rc-info__name')]")))]
infos = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='kost-rc__info']//span[contains(@class, 'rc-info__location')]")))]
prices = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='rc-price__real']//span[contains(@class, 'rc-price__text')]")))]
for i,j,k in zip(names, infos, prices):
    print(f"Name:{i} Title:{j} Price:{k}")
driver.quit()

Console output:

Name:Kost Singgahsini Sakura Karanggayam Sleman Yogyakarta Title:Kecamatan Depok Price:Rp1.370.000
Name:Kost Singgahsini Granada UGM Yogyakarta Title:Kecamatan Depok Price:Rp1.790.000
Name:Kost Kurnia Terban Tipe A UGM Yogyakarta RMZ Title:Kecamatan Gondokusuman Price:Rp606.000
Name:Kost Singgahsini Maleo UGM Kaliurang Yogyakarta Title:Kecamatan Depok Price:Rp1.973.000
Name:Kost AB-AE Tipe B Gejayan Yogyakarta RMZ Title:Depok Price:Rp1.710.000
Name:Kost AB-AE Tipe A Gejayan Yogyakarta RMZ Title:Depok Price:Rp1.425.000
Name:Kost Pogung Familia Tipe C Sleman Yogyakarta RMZ Title:Mlati Price:Rp1.900.000
Name:Kost Pogung Familia Tipe B Sleman Yogyakarta RMZ Title:Mlati Price:Rp1.710.000
Name:Kost Pogung Familia Tipe A Sleman Yogyakarta RMZ Title:Mlati Price:Rp1.425.000
Name:Kost Hanung Tipe B UGM Yogyakarta RMZ Title:Mlati Price:Rp736.000
Name:Kost Apik Tapak Dara Tipe B Deresan Yogyakarta Title:Depok Price:Rp1.620.000
Name:Kost Singgahsini Putri Maoni Tipe A Gejayan Yogyakarta Title:Depok Price:Rp1.520.000
Name:Kost Singgahsini Omah Khiar Tipe F Karang Gayam Yogyakarta Title:Depok Price:Rp1.720.000
Name:Kost Apik Tapak Dara Tipe C Deresan Yogyakarta Title:Kecamatan Depok Price:Rp2.205.000
Name:Kost Singgahsini Putri Maoni Tipe B Gejayan Yogyakarta Title:Depok Price:Rp1.720.000
Name:Kost Wisma Yudhistira Tipe C Mlati Sleman Yogyakarta Title:Mlati Price:Rp2.250.000
Name:Kost Pondok Bugenvil 3 Caturtunggal Depok Sleman Title:Depok Price:Rp1.800.000
Name:Kost Pranasmara 34C Tipe B Depok Sleman Title:Depok Price:Rp1.200.000
Name:Kost Pondok Bugenvil 2 Caturtunggal Depok Sleman Yogyakarta Title:Depok Price:Rp1.800.000
Name:Kost Rahayu Residence Tipe C Depok Sleman Yogyakarta Title:Depok Price:Rp1.150.000
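
One caveat with collecting three separate lists: `zip` stops at the shortest input, so if a listing were missing one of the spans, the later rows could pair a name with another listing's price without any error. A minimal sketch of a guard is below; `pair_fields` is a hypothetical helper name, not part of the answer's code.

```python
def pair_fields(names, infos, prices):
    """Zip the three lists, refusing to continue if their lengths differ."""
    lengths = (len(names), len(infos), len(prices))
    if len(set(lengths)) != 1:
        # A mismatch means at least one listing lacked a field, so rows would misalign.
        raise ValueError(f"field lists differ in length: {lengths}")
    return list(zip(names, infos, prices))
```

It could replace the bare `zip(names, infos, prices)` call in the loop above.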

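【Discussion】:

  • Hello, thank you for the code, it helped me a lot. Quick question: do you have any suggestions on where to start learning web scraping with Selenium? I am new to this and would like to understand the basics before going further. Thanks in advance.
  • The best way is to start with the Frequent 'selenium' Questions on StackOverflow.
【Solution 2】:

Here is complete C# code that solves your problem. You can adapt it to your own language, especially the XPath part.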

var els = driver.FindElements(By.XPath("//div[@class='kost-rc__content']"));

foreach (var el in els)
{
    // The leading "." makes each XPath relative to the current card
    var nama = el.FindElement(By.XPath(".//span[@class='rc-info__name bg-c-text bg-c-text--title-4 ']"));
    Console.WriteLine("nama:" + nama.Text);

    var kecamatan = el.FindElement(By.XPath(".//span[@class='rc-info__location bg-c-text bg-c-text--body-1 ']"));
    Console.WriteLine("kecamatan:" + kecamatan.Text);
}
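
For the Python used in the question, the same per-card loop could look roughly like the sketch below. The question's first loop printed the first listing on every pass because an XPath that begins with `//` is evaluated against the whole document even when it is called on an element. The second attempt presumably failed because `.//*[@id="app"]` searches only the card's descendants, while the `app` element is an ancestor of the card. The name `scrape_cards` and the constants are made up for illustration, the `contains(@class, ...)` expressions are borrowed from Solution 1, and it is an assumption that both spans sit inside each `kost-rc__content` card. The string `"xpath"` is the value Selenium's `By.XPATH` holds; it is written out here only so the helper needs no imports.

```python
XPATH = "xpath"  # same string value as selenium's By.XPATH

CARD = "//div[@class='kost-rc__content']"
NAME = ".//span[contains(@class, 'rc-info__name')]"
LOCATION = ".//span[contains(@class, 'rc-info__location')]"


def scrape_cards(driver):
    """Return one (name, location) tuple per listing card."""
    rows = []
    for card in driver.find_elements(XPATH, CARD):
        # The leading "." scopes each lookup to the current card.
        name = card.find_element(XPATH, NAME).text
        location = card.find_element(XPATH, LOCATION).text
        rows.append((name, location))
    return rows
```

With a live session, `scrape_cards(driver)` would be called after `driver.get(url)`; an explicit wait like the one in Solution 1 may still be needed so that the cards have rendered first.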
