Excluding numbers from a filter: answers

【Question title】: Excluding numbers from a filter
【Posted】: 2021-01-19 15:38:42
【Question description】:

I am filtering a bunch of numbers out of a string. The code I have for this is:

def fun(variable):
    numbers = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '.', '$']
    if variable in numbers:
        return True
    else:
        return False


stats = "24hour low: $13.45"  # example value; see the description that follows
sequence = stats
filtered = filter(fun, sequence)
print('The filtered numbers are:')
for s in filtered:
    print(s)

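The variable "stats" is a string, for example "24hour low: $13.45". When the code runs, it outputs 2, 4, 1, 3, 4, 5. However, I want to exclude numbers like the 24, which is irrelevant as far as the actual data goes. What changes do I need to make to get this? Also, I am new to Python and to coding in general, so please forgive my ignorance.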

【Question discussion】:

Can you give an example of the output you expect/want?
@meshi Instead of "24hour low: $13.45", my goal is to get 1345

Tags: python-3.x filter output

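【Solution 1】:

I would approach the problem differently.

Assuming all of your data strings follow the pattern ": $", you can use the string's .split method. If you split the string on "$", the last item in the returned list is a string that can be converted to a float.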

def get_price(string):
    """Return the last token from splitting a string on the '$' as a float."""
    return float(string.split("$")[-1])

data = ["24hour low: $13.45", "24hour high: $23.45"]

print(list(map(get_price, data)))

# A list comprehension that is the equivalent of the above.
print([get_price(item) for item in data])

Output: [13.45, 23.45]
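
The split approach relies on every string containing a "$" followed by a number. As a rough sketch of how a string that breaks this assumption could be handled, the helper below returns None instead of raising ValueError. The name get_price_or_none and the second sample string are made up for illustration and are not part of the original answer.

```python
def get_price_or_none(string):
    """Return the text after the last '$' as a float, or None if that fails.

    Hypothetical helper for illustration; not part of the original answer.
    """
    # rpartition splits on the last '$'; sep is '' when no '$' is present.
    _, sep, tail = string.rpartition("$")
    if not sep:
        return None
    try:
        return float(tail)
    except ValueError:
        # The text after '$' was not a plain number.
        return None


# The second string is an invented example without a '$'.
data = ["24hour low: $13.45", "24hour volume: n/a"]
print([get_price_or_none(item) for item in data])  # [13.45, None]
```

Whether None is the right fallback depends on how the values are used later; raising the error may be preferable if a missing price should never happen.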

Update: I just saw your comment. If you want a string of digits, you can use this code.

def get_price(string):
    """Return the last token from splitting a string on the '$'
    followed by a split on the '.' 
    followed by a join with an empty string ''."""
    return "".join(string.split("$")[-1].split("."))

data = ["24hour low: $13.45", "24hour high: $23.45"]

print(list(map(get_price, data)))

# A list comprehension which is the equivalent of the above.
print([get_price(item) for item in data])

Output: ['1345', '2345']

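If you insist on using filter, here is an example that checks whether each character is a digit. The kept characters are joined into one string, which is then sliced from index 2 to the end. This removes the "24".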

def is_digit(string):
    return string.isdigit()

stats = "24hour low: $13.45"
print(''.join(filter(is_digit, stats))[2:])

Output: 1345

Update: I am assuming you have a list of strings. Here is an example.

data = ["24hour low: $13.45", "24hour high: $23.45"]

print([''.join(filter(is_digit, string))[2:] for string in data])

Output: ['1345', '2345']
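
Slicing with [2:] only works while the unwanted prefix is exactly two digits long. As a possible alternative, assuming the price always follows the first "$", the digits can be filtered from just that part of the string. The helper name digits_after_dollar and the "7day high" sample are invented for this sketch.

```python
def digits_after_dollar(string):
    """Keep only the digits that follow the first '$' (hypothetical helper)."""
    # partition returns (before, separator, after); index 2 is the text after '$'.
    after_dollar = string.partition("$")[2]
    return "".join(filter(str.isdigit, after_dollar))


# The second string is an invented example with a one-digit prefix.
data = ["24hour low: $13.45", "7day high: $123.45"]
print([digits_after_dollar(item) for item in data])  # ['1345', '12345']
```

If a string contains no "$" at all, this sketch returns an empty string rather than raising an error.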

Update after the comment containing the sample string:

import itertools as it


def is_digit(string):
    # Same helper as defined earlier in this answer; repeated here so that
    # this snippet runs on its own.
    return string.isdigit()


SAMPLE_STRING = "Key metrics24 Hour Low$21.2024 Hour High$22.90Net change$0.616816"

nested_tokens = [
    [''.join(filter(is_digit, item)) for item in token.split()]
    for token in SAMPLE_STRING.replace(
        "Net change", " "  # Replace 'Net change' with a space
    ).split(
        "24 Hour "  # Remove '24 Hour' to avoid ambiguity.
    )[
        1:  # slice 'Key metrics' from split
    ]
]

print(list(it.chain.from_iterable(nested_tokens)))

Output: ['2120', '2290', '0616816']
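
If numeric values are more useful than digit strings, the same sample string can also be turned into floats. The sketch below swaps in a regular expression from the standard re module, which is a different technique from the split-and-filter code in this answer, and it assumes every value is written as "$" followed by a plain decimal number.

```python
import re

SAMPLE_STRING = "Key metrics24 Hour Low$21.2024 Hour High$22.90Net change$0.616816"

# Remove the '24 Hour ' labels first; otherwise '$21.20' runs straight into
# the following '24' and would be read as 21.2024.
cleaned = SAMPLE_STRING.replace("24 Hour ", " ")

# Capture the number that follows each '$' and convert it to a float.
prices = [float(match) for match in re.findall(r"\$(\d+(?:\.\d+)?)", cleaned)]
print(prices)  # [21.2, 22.9, 0.616816]
```

The order of the results follows the order of the "$" signs in the string, so here the values are the low, the high, and the net change.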

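【Discussion】:

Thank you very much, this answers my question. However, the data I want to filter is held in a variable rather than written out as a string, and I cannot tell the 24 hour low apart from the 24 hour high. Is this still possible? I apologize for wording the question badly.
Please post a sample of the data you want to filter, i.e. include the value of the variable you are using.
My variable is the "stats" seen in the "sequence = stats" example: "Key metrics24 Hour Low$21.2024 Hour High$22.90Net change$0.616816"
I have included a solution that uses the sample string from your previous comment. Do you want to collect the data as float types? Strings without the decimal point may not be that useful.
This looks promising, but I get an error message saying "is_digit" is an unresolved reference. Also, I get a bunch of traceback errors: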