【Question Title】: Concatenate span results from Python BeautifulSoup into a string
【Posted】: 2021-07-24 00:56:56
【Question】:

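The snippet below works as intended, but as an improvement I would like to join the item results into a single comma-separated string. I have been trying, but with no luck.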

from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

url = 'https://bscscan.com/tx/0xb9044e77ae66b6f128866e049d55f09b3501de6fc75478e406e4c32d1de4bd6a'
headers = {'User-Agent': 'Mozilla/5.0'}

req = Request(url, headers=headers)
html = urlopen(req).read()
soup = BeautifulSoup(html, 'html.parser')

main_data = soup.select("ul#wrapperContent div.media-body")
for item in main_data:
    all_span = item.find_all("span", class_='mr-1')
    last_span = all_span[-1]
    all_a = item.find_all("a")
    last_a = all_a[-1]
    print("{:>35} | {:18} | https://bscscan.com{}".format(last_span.get_text(strip=True), last_a.get_text(strip=True), last_a['href']))

Current output:

                    2 ($598.51) | Wrapped BNB (WBNB) | https://bscscan.com/token/0xbb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c
          13.684565595242991082 | MoMo KEY (KEY)     | https://bscscan.com/token/0x85c128ee1feeb39a59490c720a9c563554b51d33
                              4 | Chi Gastoken...(CHI) | https://bscscan.com/token/0x0000000000004946c0e9f43f4dee607b0ef1fa1c

Desired improvement:

                    2 ($598.51) | Wrapped BNB (WBNB) | https://bscscan.com/token/0xbb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c
          13.684565595242991082 | MoMo KEY (KEY)     | https://bscscan.com/token/0x85c128ee1feeb39a59490c720a9c563554b51d33
                              4 | Chi Gastoken...(CHI) | https://bscscan.com/token/0x0000000000004946c0e9f43f4dee607b0ef1fa1c
         -> Wrapped BNB (WBNB) , MoMo KEY (KEY) , Chi Gastoken...(CHI) #-- Concatenated String

【Discussion】:

    Tags: python python-3.x string beautifulsoup python-requests


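    【Solution 1】:

    First, the strings you are trying to concatenate appear to be the text of the links, not of the spans.

    Second, initialize a string before the loop (in your case it will not be empty, because you want it to start with '->'), then append the desired text to it on each iteration. After the loop it holds the final result. Try the following: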

    from bs4 import BeautifulSoup
    from urllib.request import Request, urlopen
    
    url = 'https://bscscan.com/tx/0xb9044e77ae66b6f128866e049d55f09b3501de6fc75478e406e4c32d1de4bd6a'
    headers = {'User-Agent': 'Mozilla/5.0'}
    
    req = Request(url, headers=headers)
    html = urlopen(req).read()
    soup = BeautifulSoup(html, 'html.parser')
    
    main_data = soup.select("ul#wrapperContent div.media-body")
    link_texts = '->'    # initialize a new string
    for item in main_data:
        all_span = item.find_all("span", class_='mr-1')
        last_span = all_span[-1]
        all_a = item.find_all("a")
        last_a = all_a[-1]
        print("{:>35} | {:18} | https://bscscan.com{}".format(last_span.get_text(strip=True), last_a.get_text(strip=True), last_a['href']))
        link_texts += last_a.get_text(strip=True) + ","    # append the link text to the string on each iteration
    link_texts = link_texts.rstrip(",")    # drop the trailing comma left by the last iteration
    print(link_texts)
    

    Here is the output:

                            2 ($597.04) | Wrapped BNB (WBNB) | https://bscscan.com/token/0xbb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c
                  13.684565595242991082 | MoMo KEY (KEY)     | https://bscscan.com/token/0x85c128ee1feeb39a59490c720a9c563554b51d33
                                      4 | Chi Gastoken...(CHI) | https://bscscan.com/token/0x0000000000004946c0e9f43f4dee607b0ef1fa1c
    ->Wrapped BNB (WBNB),MoMo KEY (KEY),Chi Gastoken...(CHI)
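
The accumulation pattern above can be tried without a network request. The following is a minimal sketch, not the answerer's code: `accumulate_names` is a hypothetical helper, and the list of names is copied from the output shown above in place of the scraped links. It also guards one edge case: removing the last character unconditionally would cut into the '->' prefix if the loop never ran.

```python
def accumulate_names(names, prefix="->"):
    """Build prefix + comma-separated names by appending inside a loop."""
    result = prefix
    for name in names:
        result += name + ","
    # Remove the trailing comma only if at least one name was appended;
    # an unconditional result[:-1] would cut into the prefix for an empty list.
    if names:
        result = result[:-1]
    return result


# Names taken from the sample output; they stand in for last_a.get_text(strip=True).
sample = ["Wrapped BNB (WBNB)", "MoMo KEY (KEY)", "Chi Gastoken...(CHI)"]
print(accumulate_names(sample))
print(accumulate_names([]))
```

With the sample list this prints the same concatenated line as above, and with an empty list it prints only the prefix.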
    

    【Discussion】:

      【Solution 2】:

      You should store the values in a list (declared before the for loop) and join them with ', '.join(list_variable).

      Something like:

      temp_list = []
      main_data = soup.select("ul#wrapperContent div.media-body")
      for item in main_data:
          all_span = item.find_all("span", class_='mr-1')
          last_span = all_span[-1]
          all_a = item.find_all("a")
          last_a = all_a[-1]
          print("{:>35} | {:18} | https://bscscan.com{}".format(last_span.get_text(strip=True), last_a.get_text(strip=True), last_a['href']))
          temp_list.append(last_a.get_text(strip=True))
      
      print(', '.join(temp_list))
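
To reproduce the exact line requested in the question (a leading '-> ' and ' , ' between names), the same list-and-join idea can be wrapped in a small function. This is only a sketch under assumptions: `format_summary`, its `prefix` and `separator` parameters, and the hard-coded names are illustrative and do not come from the answer.

```python
def format_summary(names, prefix="-> ", separator=" , "):
    """Join the collected names into one line, e.g. '-> A , B , C'."""
    return prefix + separator.join(names)


collected = []  # declared once, before the loop, so it is not reset on each iteration
for name in ["Wrapped BNB (WBNB)", "MoMo KEY (KEY)", "Chi Gastoken...(CHI)"]:
    # In the real script this would be last_a.get_text(strip=True).
    collected.append(name)

summary = format_summary(collected)
print(summary)
```

Because `str.join` places the separator only between elements, there is no trailing separator to strip afterwards, and an empty list simply yields the prefix.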
      

      【Discussion】:

      • That is what I was doing wrong. I was creating the list inside the loop.