【Question title】: Removing extra characters from a Python string
【Posted】: 2016-05-30 17:30:56
【Question description】:

The information I want to extract: the locations Al Bayan and Nepal, as a list: ['Al Bayan', 'Nepal']

<div class="location">
<div class="listing-location">Location</div>
<div class="location-areas">
<span class="location">Al Bayan</span>
‪,‪
<span class="location">Nepal</span>
</div>
<div class="area-description"> 3.3 km from Mall of the Emirates </div>
</div>

Code to extract the area:

# Area

try:
    area= soup.find('div', 'location-areas')
    area_result= str(area.get_text().strip().encode("utf-8"))
    print([area_result])


except StandardError as e:
    area_result="Error was {0}".format(e)
    print area_result

Output:

"Al Bayanأ¢â‚¬آھ,أ¢â‚¬آھ
                        
                            Nepal"

Desired output:

['Al Bayan', 'Nepal']

【Comments】:

  • Could you summarize the actual problem in one sentence?

Tags: python python-2.7 python-3.x web-scraping bs4


【Solution 1】:

I assume soup is a BeautifulSoup instance, e.g. soup = BeautifulSoup(html_string, "html.parser"), where html_string is your HTML markup.

Try this:

area_list = [area.get_text().strip().encode('utf-8') for area in soup.find_all('span', {'class': 'location'})]
print area_list
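
If you would rather clean up the text you already get from the whole div, here is a minimal Python 3 sketch. It assumes the stray bytes in the output are the UTF-8 encoding of invisible U+202A (LEFT-TO-RIGHT EMBEDDING) characters sitting around the comma in the markup; that is my reading of the output, not something the question confirms. The helper name clean_locations and the sample string are made up for illustration.

```python
import unicodedata


def clean_locations(text):
    """Split comma-separated text into names, dropping invisible format characters."""
    # Unicode category "Cf" covers invisible format characters such as U+202A.
    visible = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    # Split on commas and trim the whitespace/newlines left over from the markup.
    return [part.strip() for part in visible.split(",") if part.strip()]


# Hypothetical sample imitating what area.get_text() might return for the
# markup in the question (assumption: U+202A surrounds the comma).
sample = "\nAl Bayan\n\u202a,\u202a\nNepal\n"
print(clean_locations(sample))
```

In practice you would pass area.get_text() instead of sample. Note that this works on the decoded text, so skip the .encode("utf-8") call before cleaning.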

【Comments】:
