【Question title】: Splitting and combining text based on a delimiter in Python
【Posted】: 2015-03-12 01:26:45
【Question】:

I have a list of lists containing strings. After various rounds of regex work, I have inserted @@@, which I want to use as a delimiter, into my strings:

[['@@@this is part one and here is part two and here is part three and heres more and heres more'],
 ['this is part one@@@and here is part two and here is part three and heres more and heres more'],
 ['this is part one and here is part two@@@and here is part three and heres more and heres more']
 ['this is part one and here is part two and here is part three@@@and heres more and heres more']
 ['this is part one and here is part two and here is part three and heres more@@@and heres more']]

Now, I need to come up with this:

[['this is part one'],['and here is part two'],['and here is part three'], ['and heres more'], ['and heres more']]  

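So far, my attempts have been bloated, clunky, and ugly. I find myself splitting, combining, and matching. Can anyone recommend some general advice for this kind of problem, and which tools to use to keep it manageable?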

EDIT: Please note! and heres more really does appear twice in the desired output!

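【Question discussion】:

  • Can you clarify the rules for when "and heres more" should be included in the output? Should it appear only once in the output list? Also, are the lists embedded in the input list all separated by commas?
  • Thanks for asking. There may be duplicate text... it must be kept, and the order must be preserved.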

Tags: python string list text


【Solution 1】:

I think you actually need to capture all the characters immediately following @@@, up to the next and or the end of the string.

>>> import re  # l is the list of lists from the question
>>> [[m] for x in l for m in re.findall(r'@@@(.*?)(?=\sand\b|$)', x[0])]
[['this is part one'], ['and here is part two'], ['and here is part three'], ['and heres more'], ['and heres more']]
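For anyone who wants to run this outside an interactive session, here is a minimal self-contained sketch of the same regex. The names `rows`, `PATTERN`, and `extract_parts` are illustrative and not from the original post, and the input is the question's list with commas added between all of the inner lists so that it parses.

```python
import re

# Input from the question (commas added between the inner lists).
rows = [
    ['@@@this is part one and here is part two and here is part three and heres more and heres more'],
    ['this is part one@@@and here is part two and here is part three and heres more and heres more'],
    ['this is part one and here is part two@@@and here is part three and heres more and heres more'],
    ['this is part one and here is part two and here is part three@@@and heres more and heres more'],
    ['this is part one and here is part two and here is part three and heres more@@@and heres more'],
]

# @@@             the literal delimiter
# (.*?)           lazily capture as few characters as possible ...
# (?=\sand\b|$)   ... stopping just before whitespace + the word "and", or at end of string
PATTERN = re.compile(r'@@@(.*?)(?=\sand\b|$)')


def extract_parts(rows):
    """Return one single-item list per captured segment, in input order."""
    return [[m] for row in rows for m in PATTERN.findall(row[0])]


print(extract_parts(rows))
```

Because the result is built with a list comprehension, the segments come out in input order and the repeated and heres more is kept twice.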

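【Discussion】:

  • Now the text seems to have lost its order: part one is no longer first!
  • set() does not preserve order. See here for discussion.
  • @MattO'Brien Do you mean the update? ... Do you want the expected output above for the whole list, or only for the first inner list?
  • @MattO'Brien Could you update your question with the exact expected output?
  • The expected output in the question is correct... indeed, and heres more must appear twice!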
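
The point about set() raised in this discussion can be illustrated with a small sketch. The list `parts` below is a hand-written stand-in for the extracted segments, not output copied from the thread, and the variable names are illustrative.

```python
# Hand-written stand-in for the extracted segments (illustrative only).
parts = [
    'this is part one',
    'and here is part two',
    'and here is part three',
    'and heres more',
    'and heres more',
]

as_list = list(parts)  # a list keeps both the order and the duplicate
as_set = set(parts)    # a set drops the duplicate and guarantees no order

print(len(as_list))    # 5
print(len(as_set))     # 4
print(as_list[0])      # this is part one
```

Since the asker needs both the original order and the repeated segment, a plain list is the safer container here.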