[Question Title]: Get value from a web link
[Posted]: 2019-10-21 14:06:44
[Question]:

I have a URL from which I want to extract the line reading "Underlying Stock: NCC 96.70 As on Jun 06, 2019 10:12:20 IST", pull out the symbol ("NCC") and the underlying price ("96.70"), and put them into a list.

url = "https://nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?symbolCode=917&symbol=NCC&symbol=ncc&instrument=OPTSTK&date=-&segmentLink=17&segmentLink=17"

[Comments]:

  • What exactly do you want? Please clarify. You already have that line; try searching for it.
  • Could you please show us what you have tried, rather than a single line?

Tags: python python-3.x pandas dataframe web-scraping


[Solution 1]:

You can make a request to the site and then parse the result with Beautiful Soup.

Try this:

from bs4 import BeautifulSoup
import requests

url = "https://nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?symbolCode=917&symbol=NCC&symbol=ncc&instrument=OPTSTK&date=-&segmentLink=17&segmentLink=17"
res = requests.get(url)
soup = BeautifulSoup(res.text)

# hacky way of finding and parsing the stock data
soup.get_text().split("Underlying Stock")[1][2:10].split(" ")

This prints:

['NCC', '96.9']
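If the fixed character offsets in the split above feel fragile, a regular expression over the same page text is a less position-dependent sketch. It assumes the page text still contains an "Underlying Stock: SYMBOL PRICE ..." line; the sample string below is hard-coded from the question for illustration, not fetched live:

```python
import re

# Hard-coded sample of the headline text, as quoted in the question.
text = "Underlying Stock: NCC 96.70 As on Jun 06, 2019 10:12:20 IST"

# Capture the symbol (word characters) and the price (digits and dots)
# that follow "Underlying Stock:".
match = re.search(r"Underlying Stock:\s*(\w+)\s+([\d.]+)", text)
if match:
    symbol, price = match.groups()
    print([symbol, price])  # ['NCC', '96.70']
```

In a live scrape you would apply the same pattern to `soup.get_text()` instead of the hard-coded string.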

PS: if you get a warning about lxml, that is because it is used as the default parser once it is installed. In that case change this line to soup = BeautifulSoup(res.text, features="lxml"). You may need to install lxml first, e.g. with pip install lxml in your environment.

[Comments]:

  • Sir, I have added a picture of the area where I need values such as Symbol=NCC and Price=97.05
[Solution 2]:
from bs4 import BeautifulSoup
import requests

url = "https://nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?symbolCode=917&symbol=NCC&symbol=ncc&instrument=OPTSTK&date=-&segmentLink=17&segmentLink=17"
res = requests.get(url)
soup = BeautifulSoup(res.text, "lxml")

# hacky way of finding and parsing the stock data
mylist = soup.get_text().split("Underlying Stock")[1][2:10].split(" ")
print(mylist[:2])

===============================

import pandas as pd

dict1 = {'SYMBOL': ['ACC','ADANIENT','ADANIPORTS','ADANIPOWER','AJANTPHARM','ALBK','AMARAJABAT','AMBUJACEM','APOLLOHOSP','APOLLOTYRE','ARVIND','ASHOKLEY','ASIANPAINT','AUROPHARMA','AXISBANK','BAJAJ-AUTO','BAJAJFINSV','BAJFINANCE','BALKRISIND','BANKBARODA','BANKINDIA','BANKNIFTY','BATAINDIA','BEL','BEML','BERGEPAINT','BHARATFIN','BHARATFORG','BHARTIARTL','BHEL','BIOCON','BOSCHLTD','BPCL','BRITANNIA','BSOFT','CADILAHC','CANBK','CANFINHOME','CASTROLIND','CEATLTD','CENTURYTEX','CESC','CGPOWER','CHENNPETRO','CHOLAFIN','CIPLA','COALINDIA','COLPAL','CONCOR','CUMMINSIND','DABUR','DCBBANK','DHFL','DISHTV','DIVISLAB','DLF','DRREDDY','EICHERMOT','ENGINERSIN','EQUITAS','ESCORTS','EXIDEIND','FEDERALBNK','GAIL','GLENMARK','GMRINFRA','GODFRYPHLP','GODREJCP','GODREJIND','GRASIM','GSFC','HAVELLS','HCLTECH','HDFC','HDFCBANK','HEROMOTOCO','HEXAWARE','HINDALCO','HINDPETRO','HINDUNILVR','HINDZINC','IBULHSGFIN','ICICIBANK','ICICIPRULI','IDBI','IDEA','IDFC','IDFCFIRSTB','IFCI','IGL','INDIACEM','INDIANB','INDIGO','INDUSINDBK','INFIBEAM','INFRATEL','INFY','IOC','IRB','ITC','JETAIRWAYS','JINDALSTEL','JISLJALEQS','JSWSTEEL','JUBLFOOD','JUSTDIAL','KAJARIACER','KOTAKBANK','KSCL','KTKBANK','L&TFH','LICHSGFIN','LT','LUPIN','M&M','M&MFIN','MANAPPURAM','MARICO','MARUTI','MCDOWELL-N','MCX','MFSL','MGL','MINDTREE','MOTHERSUMI','MRF','MRPL','MUTHOOTFIN','NATIONALUM','NBCC','NCC','NESTLEIND','NHPC','NIFTY','NIFTYIT','NIITTECH','NMDC','NTPC','OFSS','OIL','ONGC','ORIENTBANK','PAGEIND','PCJEWELLER','PEL','PETRONET','PFC','PIDILITIND','PNB','POWERGRID','PVR','RAMCOCEM','RAYMOND','RBLBANK','RECLTD','RELCAPITAL','RELIANCE','RELINFRA','REPCOHOME','RPOWER','SAIL','SBIN','SHREECEM','SIEMENS','SOUTHBANK','SRF','SRTRANSFIN','STAR','SUNPHARMA','SUNTV','SUZLON','SYNDIBANK','TATACHEM','TATACOMM','TATAELXSI','TATAGLOBAL','TATAMOTORS','TATAMTRDVR','TATAPOWER','TATASTEEL','TCS','TECHM','TITAN','TORNTPHARM','TORNTPOWER','TV18BRDCST','TVSMOTOR','UBL','UJJIVAN','ULTRACEMCO','UNIONBANK','UPL','VEDL','VGUARD','VOLTAS','WIPRO','WOCKPHARMA','YESBANK','ZEEL'],
         'LOT_SIZE': [400,4000,2500,20000,500,13000,700,2500,500,3000,2000,4000,600,1000,1200,250,125,250,800,4000,6000,20,550,6000,700,2200,500,1200,1851,7500,900,30,1800,200,2250,1600,2000,1800,3400,400,600,550,12000,1800,500,1000,2200,700,1563,700,1250,4500,1500,8000,400,2600,250,25,4100,4000,1100,2000,7000,2667,1000,45000,700,600,1500,750,4700,1000,700,500,250,200,1500,3500,2100,300,3200,500,1375,1500,10000,19868,13200,12000,35000,2750,4500,2000,600,300,4000,2000,1200,3500,3200,2400,2200,2250,9000,1500,500,1400,1300,400,1500,4700,4500,1100,375,700,1000,1250,6000,2600,75,1250,700,1200,600,600,2850,10,7000,1500,8000,8000,8000,50,27000,75,50,750,6000,4800,150,3399,3750,7000,25,6500,302,3000,6200,500,7000,4000,400,800,800,1200,6000,1500,500,1300,1100,16000,12000,3000,50,550,33141,250,600,1100,1100,1000,76000,15000,750,1000,400,2250,2000,3800,9000,1061,250,1200,750,500,3000,13000,1000,700,1600,200,7000,600,2300,3000,1000,3200,900,1750,1300]}
df1 = pd.DataFrame(dict1)

dict2 = {'SYMBOL': ['INFY', 'TATAMOTORS', 'IDBI', 'BHEL', 'LT'],
         'LTP': ['55', '66', '77', '88', '99'],
         'PRICE': ['0.25', '0.36', '0.12', '0.28', '0.85']}
df2 = pd.DataFrame(dict2)

print(df1,'\n\n')
print(df2,'\n\n')

df2['LOT_SIZE']=df2[['SYMBOL']].merge(df1,how='left').LOT_SIZE

print(df2)
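The last line does a left join of df2's symbols against the lookup table, so every row of df2 picks up its LOT_SIZE (or NaN when the symbol is missing from df1). A minimal self-contained sketch of the same pattern, using a small made-up lookup table rather than the full one above:

```python
import pandas as pd

# Hypothetical lookup table: symbol -> lot size.
lots = pd.DataFrame({'SYMBOL': ['INFY', 'LT', 'BHEL'],
                     'LOT_SIZE': [1200, 375, 7500]})

# Rows to enrich with a LOT_SIZE column ('XYZ' has no match).
trades = pd.DataFrame({'SYMBOL': ['LT', 'INFY', 'XYZ']})

# Left join keeps every row of `trades` in order; unmatched symbols get NaN.
trades['LOT_SIZE'] = trades[['SYMBOL']].merge(lots, how='left').LOT_SIZE
print(trades)
```

Note that the presence of a single NaN makes the whole LOT_SIZE column float-typed, which is pandas' usual behaviour for missing integers.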

[Comments]:

[Solution 3]:

Another version, a little less hacky.

from bs4 import BeautifulSoup
import requests

url = "https://nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?symbolCode=917&symbol=NCC&symbol=ncc&instrument=OPTSTK&date=-&segmentLink=17&segmentLink=17"

page_html = requests.get(url).text
page_soup = BeautifulSoup(page_html, "html.parser")
# The headline is the first <b> element; its next text node holds the values.
page_soup.find("b").next.split(' ')

[Comments]:

[Solution 4]:

A concise way is to select the first right-aligned table cell (td[align=right]); you can actually simplify this to just the attribute selector [align=right]:

from bs4 import BeautifulSoup as bs
import requests

r = requests.get('https://nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?symbolCode=917&symbol=NCC&symbol=ncc&instrument=OPTSTK&date=-&segmentLink=17&segmentLink=17')
soup = bs(r.content, 'lxml')
headline = soup.select_one('[align=right]').text.strip().replace('\xa0\n',' ')
print(headline)

You could also take the first row of the first table:

from bs4 import BeautifulSoup as bs
import requests

r = requests.get('https://nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?symbolCode=917&symbol=NCC&symbol=ncc&instrument=OPTSTK&date=-&segmentLink=17&segmentLink=17')
soup = bs(r.content, 'lxml')
table = soup.select_one('table')
headline = table.select_one('tr:nth-of-type(1)').text.replace('\n',' ').replace('\xa0', ' ').strip()
print(headline)
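
To see what the [align=right] selector matches without hitting the live site, here is a sketch against a hard-coded HTML fragment. The markup below is a made-up stand-in for the NSE page, not the real response:

```python
from bs4 import BeautifulSoup

# Made-up fragment imitating the page's first right-aligned cell.
html = """
<table>
  <tr><td align=right>Underlying Stock: NCC 96.70</td></tr>
  <tr><td>other data</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
# select_one returns the first element matching the CSS attribute selector.
cell = soup.select_one('[align=right]')
print(cell.text.strip())  # Underlying Stock: NCC 96.70
```

Because select_one matches the first occurrence in document order, this approach stays stable even if more right-aligned cells appear later on the page.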
      

[Comments]:
