【Posted】: 2017-10-02 02:33:34
【Question】:
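How can I find the positions of a list of substrings in a string?
Given a string:
"The plane, bound for St Petersburg, crashed in Egypt's Sinai desert just 23 minutes after take-off from Sharm el-Sheikh on Saturday."
And a list of substrings:
['The', 'plane', ',', 'bound', 'for', 'St', 'Petersburg', ',', 'crashed', 'in', 'Egypt', "'s", 'Sinai', 'desert', 'just', '23', 'minutes', 'after', 'take-off', 'from', 'Sharm', 'el-Sheikh', 'on', 'Saturday', '.']
Desired output: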
>>> s = "The plane, bound for St Petersburg, crashed in Egypt's Sinai desert just 23 minutes after take-off from Sharm el-Sheikh on Saturday."
>>> tokens = ['The', 'plane', ',', 'bound', 'for', 'St', 'Petersburg', ',', 'crashed', 'in', 'Egypt', "'s", 'Sinai', 'desert', 'just', '23', 'minutes', 'after', 'take-off', 'from', 'Sharm', 'el-Sheikh', 'on', 'Saturday', '.']
>>> find_offsets(tokens, s)
[(0, 3), (4, 9), (9, 10), (11, 16), (17, 20), (21, 23), (24, 34),
(34, 35), (36, 43), (44, 46), (47, 52), (52, 54), (55, 60), (61, 67),
(68, 72), (73, 75), (76, 83), (84, 89), (90, 98), (99, 103), (104, 109),
(110, 119), (120, 122), (123, 131), (131, 132)]
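To explain the output: each (start, end) tuple indexes into the string s. For example, the first substring "The" is s[0:3], which corresponds to the first tuple of the desired output.
So if we iterate over all the integer tuples in the desired output (called out here), we get the list of substrings back, i.e.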
>>> [s[start:end] for start, end in out]
['The', 'plane', ',', 'bound', 'for', 'St', 'Petersburg', ',', 'crashed', 'in', 'Egypt', "'s", 'Sinai', 'desert', 'just', '23', 'minutes', 'after', 'take-off', 'from', 'Sharm', 'el-Sheikh', 'on', 'Saturday', '.']
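I tried:
def find_offsets(tokens, s):
    index = 0  # position in s where the search for the next token resumes
    offsets = []
    for token in tokens:
        # search only the unscanned tail of s, then shift back to absolute indices
        start = s[index:].index(token) + index
        index = start + len(token)
        offsets.append((start, index))
    return offsets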
Is there another way to find the positions of a list of substrings in a string?
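One possible variation, offered only as a sketch: `str.find` accepts a start index, so the search can resume where the previous token ended without slicing `s` on every iteration. The name `find_offsets_scan` is hypothetical, and the code assumes the tokens occur in `s` in the same order as in the list; a token that cannot be found raises `ValueError`.

```python
def find_offsets_scan(tokens, s):
    """Return (start, end) offsets for each token, scanning s left to right.

    Hypothetical helper; assumes tokens appear in s in list order.
    """
    offsets = []
    pos = 0  # the search for the next token resumes here
    for token in tokens:
        # str.find takes a start index, so no slice copy of s is needed
        start = s.find(token, pos)
        if start == -1:
            raise ValueError(f"token {token!r} not found at or after index {pos}")
        pos = start + len(token)
        offsets.append((start, pos))
    return offsets


s = ("The plane, bound for St Petersburg, crashed in Egypt's Sinai desert "
     "just 23 minutes after take-off from Sharm el-Sheikh on Saturday.")
tokens = ['The', 'plane', ',', 'bound', 'for', 'St', 'Petersburg', ',',
          'crashed', 'in', 'Egypt', "'s", 'Sinai', 'desert', 'just', '23',
          'minutes', 'after', 'take-off', 'from', 'Sharm', 'el-Sheikh',
          'on', 'Saturday', '.']

offsets = find_offsets_scan(tokens, s)
# Slicing s with each offset pair should give the tokens back
roundtrip = [s[start:end] for start, end in offsets]
```

Because each search starts at `pos`, repeated tokens such as the two commas map to successive occurrences rather than to the first one both times.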
【Discussion】:
Tags: python string indexing substring offset