【Question title】: How to extract a substring between two strings with Python [duplicate]
【Posted】: 2020-03-23 14:18:27
【Question】:

I have this line:

<div data-asin="B0000BYDR1" data-asin-currency-code="USD" data-asin-price="45.66" data-asin-shipping="0" data-device-type="WEB" data-display-code="Asin is not eligible because it is price competitive" data-substitute-count="-1" id="cerberus-data-metrics" style="display: none;"></div>

I want to extract the price, 45.66, which sits between data-asin-price=" and " data-asin-shipping.

I found this code, but it does not work well.

def extractSubstring(text, sub1, sub2):
  pos1 = text.lower().find(sub1) + len(sub1)
  pos2 = text.lower().find(sub2)
  if pos1 > pos2 and pos2 > 0:
    return text[pos1:pos2]
  elif pos2 > pos1 and pos1 > 0:
    return text[pos2:pos1]
  elif pos1 > 0:
    return text[pos1:]
  elif pos2 > 0:
    return text[pos2:]

result = soup.find_all(attrs={"data-asin-currency-code": "USD"})
priceLine='<div data-asin="B0000BYDR1" data-asin-currency-code="USD" data-asin-price="45.66" data-asin-shipping="0" data-device-type="WEB" data-display-code="Asin is not eligible because it is price competitive" data-substitute-count="-1" id="cerberus-data-metrics" style="display: none;"></div>'

sub1 = 'data-asin-price="'
sub2 = '" data-asin-shipping'

substring = extractSubstring(str(priceLine), sub1, sub2)
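For reference, the function above appears to return an empty string for this input: `pos2` is larger than `pos1`, so the second branch runs and slices `text[pos2:pos1]`, which is empty. It also searches a lowercased copy of the text for delimiters that are not lowercased, and adding `len(sub1)` to a failed `find` (which returns -1) hides the miss. A minimal corrected sketch follows; the name `extract_between` and the shortened sample line are assumptions made for illustration.

```python
def extract_between(text, start, end):
    """Return the text between the first `start` and the next `end`, or None."""
    i = text.find(start)
    if i == -1:
        return None          # start delimiter not present
    i += len(start)          # move past the start delimiter
    j = text.find(end, i)    # look for the end delimiter only after the start
    if j == -1:
        return None          # end delimiter not present
    return text[i:j]


# Shortened version of the line from the question (illustrative).
price_line = '<div data-asin-price="45.66" data-asin-shipping="0"></div>'
price = extract_between(price_line, 'data-asin-price="', '" data-asin-shipping')
print(price)
```

Returning None when either delimiter is missing lets the caller tell "not found" apart from an empty value.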

【问题讨论】:

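  • Use regex
  • You could use price = re.findall(r"\d+\.\d+", priceLine)
  • It is not clear why you are not just using Beautiful Soup for this. It makes it very easy to extract attributes. Like: soup.div['data-asin-price']
  • I tried bs4 without success. I just found a way to extract it: result = re.search(sub1+'(.*)'+sub2, text). So, if anyone wants to answer this..
  • This might help: stackoverflow.com/questions/3368969/…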
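The regex suggestions in the comments can be sketched as follows. This is one possible variant, not the commenters' exact code: it escapes the delimiters with `re.escape` and uses a non-greedy group so the match stops at the first closing delimiter. The variable names and the shortened sample line are assumptions. Note that the `re.findall` pattern with `\d+\.\d+` returns every decimal number in the line, so it only works while the price is the only one.

```python
import re

# Shortened version of the line from the question (illustrative).
price_line = '<div data-asin="B0000BYDR1" data-asin-price="45.66" data-asin-shipping="0"></div>'

start = 'data-asin-price="'
end = '" data-asin-shipping'

# Escape the delimiters so characters such as '"' or '.' are matched literally;
# (.*?) is non-greedy and captures as little as possible between them.
pattern = re.escape(start) + r"(.*?)" + re.escape(end)

match = re.search(pattern, price_line)
price = match.group(1) if match else None
print(price)
```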

Tags: python regex string


【Solution 1】:

BeautifulSoup is the way to go:

import bs4

html = bs4.BeautifulSoup('<div data-asin="B0000BYDR1" data-asin-currency-code="USD" data-asin-price="45.66" data-asin-shipping="0" data-device-type="WEB" data-display-code="Asin is not eligible because it is price competitive" data-substitute-count="-1" id="cerberus-data-metrics" style="display: none;"></div>', 'html.parser')

Then:

print(html.div['data-asin-price'])
45.66
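If installing bs4 is not an option, the same attribute lookup can be approximated with the standard library's `html.parser`. This is a swapped-in alternative rather than the answer's method; the class name `AttrGrabber` and the shortened sample line are assumptions made for illustration.

```python
from html.parser import HTMLParser


class AttrGrabber(HTMLParser):
    """Collect the value of one attribute from every start tag that carries it."""

    def __init__(self, attr_name):
        super().__init__()
        self.attr_name = attr_name
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        for name, value in attrs:
            if name == self.attr_name:
                self.values.append(value)


parser = AttrGrabber("data-asin-price")
parser.feed('<div data-asin="B0000BYDR1" data-asin-price="45.66" data-asin-shipping="0"></div>')

# Convert to a number only if the attribute was actually found.
price = float(parser.values[0]) if parser.values else None
print(price)
```

Parsing the markup instead of searching the raw string means the result does not depend on the order in which the attributes happen to appear.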

【Discussion】:
