【Question title】: Building an Abbreviations Dictionary from a Text File
【Posted】: 2018-07-14 19:42:37
【Question】:

I want to build a dictionary of abbreviations.

I have a text file containing many abbreviations. After importing, it looks like this:

with open('abreviations.txt') as ab:
    ab_words = ab.read().splitlines()

Excerpt:

'ACE',
'Access Control Entry',
'ACK',
'Acknowledgement',
'ACORN',
'A Completely Obsessive Really Nutty person',

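Now I want to build a dictionary in which every odd-numbered line is a key and every even-numbered line is the corresponding value.

So in the end I should be able to write: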

ab_dict['ACE']

and get the result:

'Access Control Entry'

Also, how can I make it case-insensitive?

ab_dict['ace']

should yield the same result:

'Access Control Entry'

In fact, it would be perfect if the output were lowercase as well:

'access control entry'

Here is a link to the text file: https://www.dropbox.com/s/91afgnupk686p9y/abreviations.txt?dl=0

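【Comments】:

  • What if two entries have the same abbreviation? Also check this question: stackoverflow.com/questions/2082152/case-insensitive-dictionary
  • @schwobaseggl Then you have two different dictionary keys for the same value. That is not a problem. Thanks for the link!
  • What about keys like ACe? Are you only interested in keys that are fully lowercase (ace) or fully uppercase (ACE)?
  • @RoadRunner Good point. I think it would be perfect if it could handle that case too, and thus yield "Access Control Entry".
  • @RoadRunner: If I have a sentence like "ACE is not easy to understand", how can I automatically replace "ACE" in the sentence with "Access Control Entry"?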
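
On the first comment's point about duplicate abbreviations: with a plain dict, a later entry silently overwrites an earlier one. The following is only a sketch, assuming the alternating key/value layout from the question, of how every expansion per abbreviation could be kept instead. The helper name `build_multi_dict` and the sample lines (including the second 'ACE' entry) are made up for illustration and are not taken from the linked file.

```python
from collections import defaultdict

def build_multi_dict(lines):
    """Map each lowercased abbreviation to a list of all its expansions."""
    result = defaultdict(list)
    # Even indices (0, 2, ...) hold abbreviations, odd indices their expansions.
    for abbr, full in zip(lines[::2], lines[1::2]):
        result[abbr.strip().lower()].append(full.strip())
    return dict(result)

sample = [
    'ACE', 'Access Control Entry',
    'ACK', 'Acknowledgement',
    'ACE', 'Angiotensin-Converting Enzyme',  # hypothetical duplicate entry
]
multi = build_multi_dict(sample)
print(multi['ace'])
```

Lookups then need `key.lower()`, and the caller has to decide which of the listed expansions applies.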

Tags: python dictionary text


【Solution 1】:

A complete solution using a custom ABDict class and Python's iterator features (calling next() on the file object):

class ABDict(dict):
    '''Dictionary of abbreviations with case-relative lookup.'''

    def __getitem__(self, key):
        # Keys are stored upper-case, so normalize the lookup key first.
        v = dict.__getitem__(self, key.upper())
        # An all-lowercase key returns the lowercased expansion.
        return v.lower() if key.islower() else v

with open('abreviations.txt') as ab:
    ab_dict = ABDict()

    while True:
        try:
            k = next(ab).strip()    # `key` line
            v = next(ab).strip()    # `value` line
            ab_dict[k.upper()] = v  # store upper-case to match __getitem__
        except StopIteration:
            break

Now, testing (with case-relative access):

print(ab_dict['ACE'])
print(ab_dict['ace'])
print('*' * 10)
print(ab_dict['WYTB'])
print(ab_dict['wytb'])

Output (in order):

Access Control Entry
access control entry
**********
Wish You The Best
wish you the best
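
One thing to be aware of: overriding only `__getitem__` leaves `in` and `.get()` on the exact-match behaviour of `dict`. A possible extension, sketched here under the assumption that all keys should be normalized to upper case, also routes those operations through the same normalization. The class name `NormalizedABDict` is hypothetical and not part of the answer above.

```python
class NormalizedABDict(dict):
    """Abbreviation dict that stores keys upper-cased (hypothetical variant)."""

    def __setitem__(self, key, value):
        super().__setitem__(key.upper(), value)

    def __getitem__(self, key):
        value = super().__getitem__(key.upper())
        # All-lowercase input gives lowercase output; anything else is unchanged.
        return value.lower() if key.islower() else value

    def __contains__(self, key):
        return super().__contains__(key.upper())

    def get(self, key, default=None):
        # dict.get does not call __getitem__, so route it explicitly.
        return self[key] if key in self else default

ab = NormalizedABDict()
ab['Ace'] = 'Access Control Entry'
print(ab['ACE'], '|', ab['ace'], '|', ab['aCe'])
print('ack' in ab, ab.get('ack', 'unknown'))
```

Note that the `dict` constructor and `update()` bypass `__setitem__`, so entries should be added by item assignment as shown.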

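【Comments】:

  • Very nice solution.
  • Very nice! Thanks a lot! Did you see RoadRunner's comment? What should happen for ab_dict['Ace']? The perfect output would be: Access Control entry
  • If I have a sentence like "ACE is not easy to understand", how can I automatically replace "ACE" in the sentence with "Access Control Entry"?
  • @totyped, why should ab_dict['Ace'] be rendered as Access Control entry? Why not Access control Entry or access Control Entry? Adding extra logic requires clear and precise conditions.
  • @RomanPerekhrest Oh, that was a typo. It should be: Access Control Entry. Sorry! ...So the case of the input should determine the case of the output.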
【Solution 2】:

Here is another solution, based on the pairwise function from this solution:

from requests.structures import CaseInsensitiveDict

def pairwise(iterable):
    "s -> (s0, s1), (s2, s3), (s4, s5), ..."
    a = iter(iterable)
    return zip(a, a)

with open('abreviations.txt') as reader:
    abr_dict = CaseInsensitiveDict()
    for abr, full in pairwise(reader):
        abr_dict[abr.strip()] = full.strip()
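
To see what `pairwise` does without the file or the `requests` dependency, here is a small sketch that feeds it an in-memory stand-in for the file. The `io.StringIO` contents and the plain-dict lookup with lowercased keys are assumptions made for illustration. Because `zip` stops at the shorter input, a trailing abbreviation without an expansion is dropped silently.

```python
import io

def pairwise(iterable):
    "s -> (s0, s1), (s2, s3), (s4, s5), ..."
    a = iter(iterable)
    return zip(a, a)

# Stand-in for the opened file: iterating yields one line at a time.
fake_file = io.StringIO("ACE\nAccess Control Entry\nACK\nAcknowledgement\nACORN\n")

pairs = [(abr.strip(), full.strip()) for abr, full in pairwise(fake_file)]
print(pairs)  # 'ACORN' has no partner line, so it does not appear

# Plain-dict alternative: lowercase the keys once, then lowercase on lookup too.
lookup = {abr.lower(): full for abr, full in pairs}
print(lookup['ACE'.lower()])
```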

【Comments】:

【Solution 3】:

Here is an answer that also allows replacing words in a sentence with their dictionary entries:

import re
from requests.structures import CaseInsensitiveDict

def read_file_dict(filename):
    """
    Reads file data into a CaseInsensitiveDict
    """

    # lists for keys and values
    keys = []
    values = []

    # case-insensitive dict
    data = CaseInsensitiveDict()

    # count used for deciding which line we're on
    count = 1

    with open(filename) as file:
        temp = file.read().splitlines()

        for line in temp:

            # if the line count is even, a value is being read
            if count % 2 == 0:
                values.append(line)

            # otherwise, a key is being read
            else:
                keys.append(line)
            count += 1

    # Add to dictionary
    # perhaps some error checking here would be good
    for key, value in zip(keys, values):
        data[key] = value

    return data


def replace_word(ab_dict, sentence):
    """
    Replaces words in the sentence with their dictionary expansions
    """

    # not necessarily words, but you get the idea
    words = re.findall(r"[\w']+|[.,!?; ]", sentence)

    new_words = []
    for word in words:

        # if word is in dictionary, replace it and add it to resulting list
        if word in ab_dict:
            new_words.append(ab_dict[word])

        # otherwise add it unchanged
        else:
            new_words.append(word)

    # return sentence with replaced words
    return "".join(new_words)


def main():
    ab_dict = read_file_dict("abreviations.txt")

    print(ab_dict)

    print(ab_dict['ACE'])
    print(ab_dict['Ace'])
    print(ab_dict['ace'])

    print(replace_word(ab_dict, "The ACE is not easy to understand"))

if __name__ == '__main__':
    main()
    

Which outputs:

{'ACE': 'Access Control Entry', 'ACK': 'Acknowledgement', 'ACORN': 'A Completely Obsessive Really Nutty person'}
Access Control Entry
Access Control Entry
Access Control Entry
The Access Control Entry is not easy to understand
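
As an alternative to tokenizing and rejoining, the replacement could also be done in one pass with `re.sub` and a callback. This is only a sketch under stated assumptions: the function name `expand_abbreviations` is hypothetical, it takes a plain dict rather than a `CaseInsensitiveDict`, and it matches whole words regardless of case.

```python
import re

def expand_abbreviations(text, abbreviations):
    """Replace whole-word abbreviations in text, ignoring case."""
    # Lowercased keys give a case-insensitive lookup with a plain dict.
    lookup = {k.lower(): v for k, v in abbreviations.items()}
    if not lookup:
        return text
    # Longest keys first so a longer abbreviation wins over its prefix.
    alternatives = sorted(map(re.escape, lookup), key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)

sample = {'ACE': 'Access Control Entry', 'ACK': 'Acknowledgement'}
print(expand_abbreviations("The ACE is not easy to understand", sample))
```

Unlike the tokenizing version, this keeps every character that is not part of a match, such as hyphens or colons. Case-insensitive matching also means an ordinary lowercase word like "ace" would be expanded, which may or may not be wanted.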
    

【Comments】:
