【Question Title】: Selenium data scraping from a table
【Posted】: 2021-11-23 08:46:04
【Question】:

I'm trying to get the company names from the table on the website, but when I run the code below I get a

TypeError: 'str' object is not callable

pointing at the XPATH call. Is there a way to fix this?

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select

s = Service(r"C:\Program Files (x86)\chromedriver.exe")  # raw string keeps the Windows backslashes literal
browser = webdriver.Chrome(service=s)

# click accept on pop up
url = "https://www.dbs.com.sg/treasures/aics/stock-coverage/index.html"
browser.get(url)
WebDriverWait(browser, 10).until(EC.element_to_be_clickable((By.XPATH, "//button[text()='Accept']"))).click()


# getting data from table
# count no of rows

rows = len(browser.find_elements(By.XPATH("//*[@id='buyholdsell']/div/div/div/div/div[1]/div/div/div/div/table/tbody/tr")))

print(rows)

【Comments】:

Tags: python selenium web-scraping


【Solution 1】:

    This error

    TypeError: 'str' object is not callable
    

    implies that this

    find_elements(By.XPATH("//")
    

    should be

    find_elements(By.XPATH, "")
    

    So your working code block would be:

    Code:

    wait = WebDriverWait(driver, 30)
    url = "https://www.dbs.com.sg/treasures/aics/stock-coverage/index.html"
    driver.get(url)
    try:
        wait.until(EC.element_to_be_clickable((By.XPATH, "//button[text()='Accept']"))).click()
    except Exception:
        # the consent pop-up may not appear on every visit
        pass
    
    
    # getting data from table
    # count no of rows
    
    rows = len(driver.find_elements(By.XPATH, "//*[@id='buyholdsell']/div/div/div/div/div[1]/div/div/div/div/table/tbody/tr"))
    
    print(rows)
    

    Imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    

    Output:

    10
    

    【Comments】:

    • Thank you so much for the reply! I'm new to this; what gave away that the // was the problem?
    • find_elements(By.XPATH("//") isn't a supported call signature.
    • Please see the official documentation here: selenium-python.readthedocs.io/…
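    Why `By.XPATH("//...")` blows up can be reproduced without a browser at all: in Selenium, `By.XPATH` is the plain string constant `"xpath"`, and calling a string is exactly what raises this TypeError. A minimal sketch (the local `XPATH` variable stands in for `By.XPATH` so no Selenium install is needed):

```python
# By.XPATH is a string constant ("xpath") on Selenium's By class,
# so writing By.XPATH(...) attempts to call a string.
XPATH = "xpath"  # stands in for By.XPATH

try:
    XPATH("//*[@id='buyholdsell']")  # same shape as By.XPATH("//...")
except TypeError as e:
    print(e)  # 'str' object is not callable
```

    The fix is to pass the locator strategy and the expression as two separate arguments: `find_elements(By.XPATH, "...")`.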
    【Solution 2】:

    Selenium isn't needed here. You can fetch the table data with a POST request:

    import requests
    import pandas as pd
    
    
    page = 0
    data_present = True
    url = 'https://www.dbs.com.sg/contentapi/dbsstore_main_www_global-ia_en_article_equity/search'
    
    # the endpoint returns 10 hits per page; keep paging until a page comes back empty
    rows = []
    while data_present:
        payload = {"query":
                   {"bool":
                    {"filter":
                     [{"range":
                       {"date_sort.PublishedDate":
                        {"lte":"now/d"}}},
                      {"range":
                       {"date_sort.ExpiryDate":
                        {"gte":"now/d"}}},
                          {"term":
                           {"results_data.QuickViews.POWType.raw":"Equity"}},
                              {"term":{"meta.Country":"sg"}},
                              {"terms":{"results_data.Industry.raw":
                                        ["Industrials","BasicMaterials","ConsumerGoods","ConsumerServices",
                                         "Financials","HealthCare","OilGas","RealEstate","Technology",
                                         "Telecommunications","Utilities"]}},
                                  {"terms":
                                   {"results_data.QuickViews.Recommendation.raw":
                                    ["BUY","HOLD","SELL","FV"]}}]}},
                       "sort":{"results_data.Title.raw":{"order":"asc"}},
                       "from":page*10}
        
        
        jsonData = requests.post(url, json=payload).json()
        
        if len(jsonData['hits']['hits']) != 0:
            page+=1
            print(f'Page: {page}')
            hits = jsonData['hits']['hits']
            for hit in hits:
                company_name = hit['_source']['results_data']['CompanyName']
                stock_code = hit['_source']['results_data']['BloombergReutersCode']
                sector = hit['_source']['results_data']['Industry']
                market_capital = hit['_source']['results_data']['QuickViews']['MarketCap']
                target_price = hit['_source']['results_data']['QuickViews']['TargetPrice']
                time = hit['_source']['results_data']['QuickViews']['Time']
                cell = hit['_source']['results_data']['QuickViews']['Recommendation']
                
                row = {
                    'Cell':cell,
                    'Company Name':company_name,
                    'Stock Code':stock_code,
                    'Sector':sector,
                    'Market Capital':market_capital,
                    'Target Price':target_price,
                    'Time':time}
                
                rows.append(row)
                
        else:
            print('No more data.')
            data_present = False
            
            
    results = pd.DataFrame(rows)
    

    Output:

    print(results)
         Cell              Company Name  ...        Target Price       Time
    0     BUY          AEM Holdings Ltd  ...                4.98  12 months
    1     BUY            AIMS APAC REIT  ...    1.60140929562329  12 months
    2    HOLD               APAC Realty  ...   0.877765749701783  12 months
    3     BUY  ARA US Hospitality Trust  ...   0.752986472516872  12 months
    4     BUY      Ascendas India Trust  ...                 1.8  12 months
    ..    ...                       ...  ...                 ...        ...
    98   HOLD                 Vicom Ltd  ...                2.13  12 months
    99    BUY      Wilmar International  ...                6.67  12 months
    100   BUY  Yangzijiang Shipbuilding  ...                1.95  12 months
    101   BUY         iFAST Corporation  ...               12.93  12 months
    102  HOLD                  mm2 Asia  ...  0.0673190399546146  12 months
    
    [103 rows x 7 columns]
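    The loop's stopping condition, the `"from": page*10` offset plus "break on an empty page", can be sketched in isolation with the HTTP call stubbed out (the `fetch` helper and its 23-item dataset are made up for illustration):

```python
# Hypothetical stand-in for the POST request: returns at most 10 hits per
# page, mirroring the endpoint's "from": page*10 offset behaviour.
def fetch(page):
    data = list(range(23))  # pretend the server holds 23 hits in total
    return data[page * 10:(page + 1) * 10]

page, rows = 0, []
while True:
    hits = fetch(page)
    if not hits:        # an empty page means we've seen everything
        break
    rows.extend(hits)
    page += 1

print(len(rows), page)  # 23 3
```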
    

    【Comments】:

    • Thank you so much! Your answer is very helpful, but I'm new to this, so I haven't tried pandas yet.
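    For anyone else new to pandas, the pattern the answer relies on, collecting one dict per row into a list and building the DataFrame in one call, can be tried on its own (the two sample rows are copied from the output above; the CSV filename is made up):

```python
import pandas as pd

# Build a DataFrame from a list of per-row dicts, as Solution 2 does;
# the dict keys automatically become the column names.
rows = [
    {"Cell": "BUY", "Company Name": "AEM Holdings Ltd", "Target Price": "4.98"},
    {"Cell": "HOLD", "Company Name": "APAC Realty", "Target Price": "0.877765749701783"},
]
results = pd.DataFrame(rows)

print(results.shape)  # (2, 3)
results.to_csv("stock_coverage.csv", index=False)  # hypothetical output file
```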