【Question title】: Updating text file from the python dictionary
【Posted】: 2019-10-20 00:47:45
【Question description】:

Hi everyone,

Suppose I have a dictionary in Python:

dict = {'fresh air', 'entertainment system', 'ice cream', 'milk', 'dog', 'blood pressure'}

and a list of texts like this:

text_file = ['is vitamin d in milk enough', 'try to improve quality level by automatic intake of fresh air', 'turn on the tv or entertainment system based on that individual preferences', 'blood pressure monitor', 'I buy more ice cream', 'proper method to add frozen wild blueberries in ice cream']

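For every occurrence in the text file of a phrase that belongs to the dictionary (say, fresh air), I want it to appear as #fresh_air#, and for every single word in the dictionary (say, milk), the output should show #milk#. In other words, the special character is attached to the start and end of every occurrence in text_file.

The output I want should have the following form (a list of lists):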

[[is vitamin d in #milk# enough], [try to improve quality level by automatic intake of #fresh_air#], [turn on the tv or #entertainment_system# based on the individual preferences], [#blood_pressure# monitor], [I buy more #ice_cream#], [proper method to add frozen wild blueberries in #ice_cream# with #milk#]]

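Is there a standard way to achieve this in a time-efficient manner?

I am new to list and text processing in Python. I tried a list comprehension but could not get the expected result. Any help is much appreciated.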

【Question comments】:

  • You have a set object

Tags: python python-3.x list dictionary nltk


【Solution 1】:

Use a regular expression.

For example:

import re

# The terms to tag (a set, because the braces hold no key: value pairs)
data = {'fresh air', 'entertainment system', 'ice cream', 'milk', 'dog', 'blood pressure'}
# One alternation of all terms; re.escape guards against regex metacharacters in a term
pattern = re.compile("(" + "|".join(map(re.escape, data)) + ")")
text_file = ['is vitamin d in milk enough', 'try to improve quality level by automatic intake of fresh air', 'turn on the tv or entertainment system based on that individual preferences', 'blood pressure monitor', 'I buy more ice cream', 'proper method to add frozen wild blueberries in ice cream']

# Wrap every match in '#'
result = [pattern.sub(r"#\1#", line) for line in text_file]
print(result)

Output:

['is vitamin d in #milk# enough',
 'try to improve quality level by automatic intake of #fresh air#',
 'turn on the tv or #entertainment system# based on that individual preferences',
 '#blood pressure# monitor',
 'I buy more #ice cream#',
 'proper method to add frozen wild blueberries in #ice cream#']

Note that your dict variable is a set object.
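To illustrate the note above, here is a small sketch of the difference between the two literals. The names `terms` and `mapping` are made up for this example and do not appear in the question.

```python
# Curly braces holding bare values build a set; key: value pairs build a dict.
terms = {'fresh air', 'milk'}
print(type(terms).__name__)   # set

# A dict can map each phrase to its replacement text.
mapping = {term: term.replace(' ', '_') for term in terms}
print(mapping['fresh air'])   # fresh_air
```

A set is enough when only membership matters. A dict becomes useful once each term needs an associated replacement.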


Updated the snippet as requested in the comments.

Demo:

import re

data = {'fresh air', 'entertainment system', 'ice cream', 'milk', 'dog', 'blood pressure'}
# Map each term to its replacement, with spaces turned into underscores
data = {i: i.replace(" ", "_") for i in data}
# \b anchors the match at word boundaries, so only whole words and phrases are tagged
pattern = re.compile(r"\b(" + "|".join(map(re.escape, data)) + r")\b")
text_file = ['is vitamin d in milk enough', 'try to improve quality level by automatic intake of fresh air', 'turn on the tv or entertainment system based on that individual preferences', 'blood pressure monitor', 'I buy more ice cream', 'proper method to add frozen wild blueberries in ice cream']

# Look up the matched term and wrap its replacement in '#'
result = [pattern.sub(lambda x: "#{}#".format(data[x.group()]), i) for i in text_file]
print(result)

Output:

['is vitamin d in #milk# enough',
 'try to improve quality level by automatic intake of #fresh_air#',
 'turn on the tv or #entertainment_system# based on that individual preferences',
 '#blood_pressure# monitor',
 'I buy more #ice_cream#',
 'proper method to add frozen wild blueberries in #ice_cream#']
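The same idea can be wrapped in a reusable function. The following is only a sketch: `tag_phrases` is a hypothetical helper name, and the longest-first ordering is an extra precaution that the answer does not use, added so that a longer phrase wins when a shorter term is contained in it (here `cream` inside `ice cream`). The last line shows one way to get the list-of-lists shape the question asks for.

```python
import re

def tag_phrases(lines, terms):
    """Wrap each whole-word occurrence of a term in '#', joining its words with '_'."""
    # Longest terms first, so 'ice cream' is tried before 'cream'.
    ordered = sorted(terms, key=len, reverse=True)
    # re.escape keeps any regex metacharacters in a term literal.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, ordered)) + r")\b")
    return [pattern.sub(lambda m: "#" + m.group(1).replace(" ", "_") + "#", line)
            for line in lines]

terms = {'fresh air', 'ice cream', 'milk', 'cream'}
lines = ['I buy more ice cream', 'cream and milk']
tagged = tag_phrases(lines, terms)
print(tagged)   # ['I buy more #ice_cream#', '#cream# and #milk#']

# One single-element list per line, if a list of lists is required.
nested = [[line] for line in tagged]
```

The pattern is compiled once and reused for every line, which keeps the work per line to a single pass of `sub`.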

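【Comments】:

  • Thanks a lot for your help, Rakesh. I will go through your suggested code in detail.
  • Also, using a built-in name (dict) as a variable name is a bad idea.
  • @Rakesh. Thanks. How can I attach the special character to the phrase (like #fresh_air#) while keeping the same structure as in your suggestion?
  • @MishraS. Updated the snippet.
  • @MishraS. You need to use regex word boundaries. In the solution above, use: pattern = re.compile(r"\b("+"|".join(data)+r")\b")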
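The effect of the word boundaries mentioned in the last comment can be seen in a short comparison. This is a sketch with made-up input; `loose`, `strict`, and the sample sentence are not from the thread.

```python
import re

terms = ['milk', 'dog']
# Without \b the terms also match inside longer words.
loose = re.compile("(" + "|".join(terms) + ")")
# With \b only whole-word occurrences match.
strict = re.compile(r"\b(" + "|".join(terms) + r")\b")

text = 'the dog likes milkshakes and dogma'
loose_result = loose.sub(r"#\1#", text)
strict_result = strict.sub(r"#\1#", text)
print(loose_result)    # the #dog# likes #milk#shakes and #dog#ma
print(strict_result)   # the #dog# likes milkshakes and dogma
```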