How can I remove the empty spaces from a web page that I scrape? [duplicate]

【Question title】: How can I remove the empty spaces from a web page that I scrape? [duplicate]
【Posted】: 2019-12-25 03:56:03
【Question description】:

I am using Beautiful Soup with python3 to scrape a website, and I store the data in a list.

I managed to extract the information I want:

import requests
from bs4 import BeautifulSoup

source = requests.get('my site').text
soup = BeautifulSoup(source, 'lxml')

lista = []

rows = soup.find('table', class_='exchange-rates-table not-responsive').find_all('tr')

for row in rows:          # Collect the text of contents[3] from every row
    lista.append(row.contents[3].get_text())
print(lista)



This is the output:

['Cod',
 '\n\n                        EUR\n\n                    ',
 '\n\n                        USD\n\n                    ',
 '\n\n                        GBP\n\n                    ',
 '\n\n                        CHF\n\n                    ',
 '\n\n                        AUD\n\n                    ',
 '\n\n                        DKK\n\n                    ',
 '\n\n                        HUF\n\n                    ',
 '\n\n                        JPY\n\n                    ',
 '\n\n                        NOK\n\n                    ',
 '\n\n                        SEK\n\n                    ']

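When I run this code I get the information I want, but each value is surrounded by a lot of spaces and newlines. How can I remove them so that only the values themselves remain?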

【Question discussion】:

Tags: python regex list web-scraping beautifulsoup

【Solution 1】:

Since your data is already in a list, you can use strip in a list comprehension:

[x.strip() for x in ['Cod', '\n\n EUR\n\n ', '\n\n USD\n\n ', '\n\n GBP\n\n ', '\n\n CHF\n\n ', '\n\n AUD\n\n ', '\n\n DKK\n\n ', '\n\n HUF\n\n ', '\n\n JPY\n\n ', '\n\n NOK\n\n ', '\n\n SEK\n\n ']]
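As a minimal sketch of this approach, the snippet below applies strip to a shortened copy of the list from the question. Called with no arguments, str.strip() removes leading and trailing whitespace, newlines included, and leaves everything in between untouched. The names raw and cleaned are illustrative only.

```python
# Shortened sample of the scraped values from the question.
raw = [
    'Cod',
    '\n\n                        EUR\n\n                    ',
    '\n\n                        USD\n\n                    ',
]

# str.strip() with no arguments drops leading and trailing whitespace
# (spaces, tabs, newlines) from each string.
cleaned = [x.strip() for x in raw]
print(cleaned)  # ['Cod', 'EUR', 'USD']
```

The list comprehension builds a new list, so the original values stay available if they are needed later.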

【Discussion】:

strip was the key, but here is the answer I needed: 'lista = [] for row in rows: lista.append(row.contents[i].text.strip())'
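The comment's approach strips each value while it is being scraped, so no second pass over the list is needed. Here is a minimal sketch of that idea, assuming a small inline HTML table in place of the real page. The markup, the two-column layout, and the choice of 'html.parser' are assumptions made for illustration, not the asker's actual site, and the cell is selected by tag rather than by contents index.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the scraped page; the real markup is not shown
# in the question.
html = """
<table class="exchange-rates-table not-responsive">
  <tr><th>Name</th><th>Cod</th></tr>
  <tr><td>Euro</td><td>

        EUR

  </td></tr>
  <tr><td>US Dollar</td><td>

        USD

  </td></tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')
rows = soup.find('table', class_='exchange-rates-table not-responsive').find_all('tr')

lista = []
for row in rows:
    cells = row.find_all(['th', 'td'])         # only the cell tags, no whitespace nodes
    lista.append(cells[1].text.strip())        # strip while appending
print(lista)  # ['Cod', 'EUR', 'USD']
```

Beautiful Soup's get_text(strip=True) is a possible alternative to .text.strip(). It strips every text fragment inside the tag before joining them, so the two can differ when a cell contains nested tags.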