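I would solve this problem a different way.
Assuming all of your data strings follow the ": $" pattern (a label, then ": $", then the price), you can use the string's .split method. If you split the string on "$", the last item in the returned list is a string that can be converted to a float.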
def get_price(string):
    """Return the last token from splitting a string on the '$' as a float."""
    return float(string.split("$")[-1])
data = ["24hour low: $13.45", "24hour high: $23.45"]
print(list(map(get_price, data)))
# A list comprehension that is the equivalent of the above.
print([get_price(item) for item in data])
Output: [13.45, 23.45]
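The split approach relies on every string containing a "$" followed by a parseable number. As a minimal sketch of what a guard against strings that break that assumption could look like (the helper name `get_price_or_none` and the two malformed sample strings are hypothetical, not part of the original question):

```python
def get_price_or_none(string):
    """Return the text after the last '$' as a float, or None if that fails.

    Hypothetical helper for illustration only.
    """
    # rpartition splits on the LAST '$'; sep is '' when no '$' is present.
    _, sep, tail = string.rpartition("$")
    if not sep:
        return None
    try:
        return float(tail)
    except ValueError:
        # The text after '$' was not a plain number (for example '$n/a').
        return None


# The last two strings are made-up examples of malformed input.
data = ["24hour low: $13.45", "24hour high: $23.45", "no price here", "fee: $n/a"]
prices = [get_price_or_none(item) for item in data]
print(prices)
```

Returning None rather than raising is just one possible choice; if malformed strings should never occur, letting float() raise ValueError may be the better signal.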
Update: I just saw your comment. If you want a string of digits instead, you can use this code.
def get_price(string):
    """Return the last token from splitting a string on the '$'
    followed by a split on the '.'
    followed by a join with an empty string ''."""
    return "".join(string.split("$")[-1].split("."))
data = ["24hour low: $13.45", "24hour high: $23.45"]
print(list(map(get_price, data)))
# A list comprehension which is the equivalent of the above.
print([get_price(item) for item in data])
Output: ['1345', '2345']
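The split-on-"." followed by a join amounts to deleting the decimal point, so the same result can probably be written more directly with str.replace. A small sketch, assuming the same input format (the name `get_price_digits` is made up here to avoid clashing with `get_price`):

```python
def get_price_digits(string):
    """Return the text after the last '$' with the decimal point removed."""
    return string.split("$")[-1].replace(".", "")


data = ["24hour low: $13.45", "24hour high: $23.45"]
digits = [get_price_digits(item) for item in data]
print(digits)
```

Because the result stays a string, leading zeros are kept, which matters for prices below one dollar.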
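If you insist on using filter, here is an example that checks whether each character is a digit. The surviving characters are then joined into one string, and that string is sliced from index 2 to the end, which removes the leading "24".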
def is_digit(string):
    return string.isdigit()
stats = "24hour low: $13.45"
print(''.join(filter(is_digit, stats))[2:])
Output: 1345
Update: I am assuming you have a list of strings. Here is an example.
data = ["24hour low: $13.45", "24hour high: $23.45"]
print([''.join(filter(is_digit, string))[2:] for string in data])
Output: ['1345', '2345']
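The [2:] slice only works while every label contributes exactly two digits ("24"). One way to drop that assumption, sketched here with a hypothetical helper `digits_after_dollar` and a made-up "7day" string, is to filter only the part after the "$":

```python
def digits_after_dollar(string):
    """Keep only the digit characters that follow the last '$'."""
    # str.isdigit can be passed to filter directly; no wrapper is needed.
    return "".join(filter(str.isdigit, string.split("$")[-1]))


# "7day low" is an invented label with a single leading digit.
data = ["24hour low: $13.45", "7day low: $9.99"]
result = [digits_after_dollar(item) for item in data]
print(result)

# For comparison, the fixed [2:] slice cuts into the price on the second string.
sliced = ["".join(filter(str.isdigit, item))[2:] for item in data]
print(sliced)
```

This still uses filter as requested, but it no longer depends on how many digits appear in the label.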
Update after the comment containing a sample string:
import itertools as it

SAMPLE_STRING = "Key metrics24 Hour Low$21.2024 Hour High$22.90Net change$0.616816"
nested_tokens = [
    [''.join(filter(is_digit, item)) for item in token.split()]
    for token in SAMPLE_STRING.replace(
        "Net change", " "  # Replace 'Net change' with a space
    ).split(
        "24 Hour "  # Remove '24 Hour' to avoid ambiguity.
    )[
        1:  # slice 'Key metrics' from split
    ]
]
print(list(it.chain.from_iterable(nested_tokens)))
Output: ['2120', '2290', '0616816']
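The nested comprehension packs several steps into one expression. As a rough step-by-step sketch of the same idea (the intermediate names `spaced`, `pieces`, and `digits` are made up here, and str.isdigit stands in for the is_digit helper so the snippet runs on its own):

```python
SAMPLE_STRING = "Key metrics24 Hour Low$21.2024 Hour High$22.90Net change$0.616816"

# Step 1: turn 'Net change' into a space so the last two prices are separated.
spaced = SAMPLE_STRING.replace("Net change", " ")

# Step 2: split on '24 Hour ' and drop the leading 'Key metrics' piece.
# Without this, the '24' of '24 Hour' would run into the preceding price.
pieces = spaced.split("24 Hour ")[1:]
print(pieces)

# Step 3: split each piece on whitespace, then keep only the digits.
# A flat comprehension replaces the nested list plus itertools.chain.
digits = [
    "".join(filter(str.isdigit, item))
    for piece in pieces
    for item in piece.split()
]
print(digits)
```

Printing `pieces` makes it easier to see why the 'Net change' replacement is needed: it is what separates the high price from the net change value before the whitespace split.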