Storing Strings and Integers from .txt file to a Dictionary in Python 2.7

【Question title】: Storing Strings and Integers from .txt file to a Dictionary in Python 2.7
【Posted】: 2016-04-30 05:13:20
【Question description】:

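I want my program to take a subspecies heading (e.g. "Ablepharus bivittatus") and store it as a string key. Then I want the program to collect the lines of sequence IDs (integers) that follow, up until the next subspecies heading. The integers would be stored as the values for the subspecies key grabbed above.

I want the program to prompt the user for a string, search all the dictionary keys for an exact match to the input (case-sensitive; spelling matters here), and then return the sequence IDs.

What is the most efficient way to do this? Right now I can separate the two entities (IDs and subspecies names), but I don't know how to build a dictionary that stores these values while iterating over the text file.

Some lines contain the same name repeated several times. How can I tell the program to detect this and use only the first of the identical subspecies names as a string key?

The text file is formatted as follows

Thank you for your time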

Ablepharus bivittatus   
36630
31764
31212
01996
09953
03744
14036
16094
01875
19076
09496
20583
24160
23142
26892
06533
05488
Ablepharus chernovi Ablepharus chernovi chernovi DAREVSKY 1953
Ablepharus chernovi eiselti SCHMIDTLER 1997
Ablepharus chernovi isauriensis SCHMIDTLER 1997
Ablepharus chernovi ressli SCHMIDTLER 1997
31212
01996
09637
14036
20583
23142
21989
26892
28697
09207
09206
Ablepharus darvazi  
06245
26892

Here is some code I have been messing around with so far.

dictionary = {}

with open("repCleanSubs2.txt") as file:
    for line in file:
        (key, val) = line.split()
        dictionary[val(key)] = val
print key(1)

'''import re
file = open('repCleanSubs2.txt')
subspecies = []
dnaIDs = []
for line in file:
    match = re.findall('^[a-zA-Z]+', line)
        if match:
            subspecies.append(line)
            #Grab sequence IDs under this line ^ 
            #
            #Until you reach next string match

print dnaIDs
#userInput = raw_input("Which subspecies would you like to view?: ")
#if userInput == re.match(subspecies(line)):
#   print subspecies(line)'''
# print sequences IDs from the line grabbed here ^

【Question comments】:

What code have you written so far?

Tags: python python-2.7 file dictionary

【Solution 1】:

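You may want to use file.read().splitlines() to get a list of lines.
Iterating over those lines and checking whether each one is a new subspecies or an ID seems most appropriate.
During the iteration you can then use the "current" name as the dictionary key and append each new ID to that key's list.

This seems to do what you are asking for: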

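import re

data = {}

with open("data.txt") as f:
    lines = f.read().splitlines()

name = ""
for l in lines:
    if re.match(r"\d{5}", l):
        # ID line: append it to the list of the most recent name
        data[name].append(l)
    else:
        # Any other line is a name: start a new, empty ID list for it
        name = l.strip()
        data[name] = []

print data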

The result looks like this:

{
    "Ablepharus chernovi isauriensis SCHMIDTLER 1997": [], 
    "Ablepharus bivittatus": [
        "36630", 
        "31764", 
        "31212", 
        "01996", 
        "09953", 
        "03744", 
        "14036", 
        "16094", 
        "01875", 
        "19076", 
        "09496", 
        "20583", 
        "24160", 
        "23142", 
        "26892", 
        "06533", 
        "05488"
    ], 
    "Ablepharus chernovi ressli SCHMIDTLER 1997": [
        "31212", 
        "01996", 
        "09637", 
        "14036", 
        "20583", 
        "23142", 
        "21989", 
        "26892", 
        "28697", 
        "09207", 
        "09206"
    ], 
    "Ablepharus darvazi": [
        "06245", 
        "26892"
    ], 
    "Ablepharus chernovi eiselti SCHMIDTLER 1997": [], 
    "Ablepharus chernovi Ablepharus chernovi chernovi DAREVSKY 1953": []
}
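
Note that a bare `print data` writes the dictionary on a single line; the indented layout above resembles what `json.dumps` produces. A minimal sketch of that kind of pretty-printing, assuming a small hand-written `data` dictionary in place of the one parsed from the file (the sample entries and the name `pretty` are only for illustration):

```python
import json

# Hypothetical sample standing in for the dictionary built from the file.
data = {
    "Ablepharus darvazi": ["06245", "26892"],
    "Ablepharus chernovi eiselti SCHMIDTLER 1997": [],
}

# indent=4 puts each key and each list item on its own line;
# sort_keys=True makes the key order reproducible between runs.
pretty = json.dumps(data, indent=4, sort_keys=True)
print(pretty)
```

Written this way, the snippet should behave the same under Python 2.7 and Python 3.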

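I'm not sure what you mean by some lines containing the same name repeated. If you can elaborate on that and show your expected output, those entries could be merged.

Finally, returning the sequence IDs for a key supplied by the user looks like this: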

print(data[raw_input()])
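
The one-liner above raises a `KeyError` when the typed name is not a key. A minimal sketch of a more forgiving lookup, assuming the keys were stripped of surrounding whitespace when the dictionary was built; `lookup_ids` and the sample `data` are hypothetical names used only for illustration:

```python
def lookup_ids(data, name):
    """Return the ID list stored under exactly `name`, or None if absent."""
    # dict.get compares keys exactly (case-sensitive) and avoids the
    # KeyError that data[name] raises for an unknown name.
    return data.get(name.strip())


# Hypothetical sample standing in for the parsed file.
data = {"Ablepharus darvazi": ["06245", "26892"]}

print(lookup_ids(data, "Ablepharus darvazi"))   # ['06245', '26892']
print(lookup_ids(data, "ablepharus darvazi"))   # None
```

In Python 2.7 the name would typically come from `raw_input()`, for example `lookup_ids(data, raw_input("Subspecies: "))`.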

【Comments】:

Ablepharus chernovi Ablepharus chernovi chernovi DAREVSKY 1953 Ablepharus chernovi eiselti SCHMIDTLER 1997 Ablepharus chernovi isauriensis SCHMIDTLER 1997 Ablepharus chernovi ressli SCHMIDTLER 1997 31212 01996 09637 14036 20583 23142 21989 26892 28697 09207 So, for example, these four names should count as just one key. How can I tell the program to detect multiple names that belong to a single set of sequence IDs?
When the user looks up that key, do they need to type any one of the 4 names, or the whole string together? That is, something like Ablepharus chernovi Ablepharus chernovi chernovi DAREVSKY 1953, or the whole Ablepharus chernovi Ablepharus chernovi chernovi DAREVSKY 1953 Ablepharus chernovi eiselti SCHMIDTLER 1997 Ablepharus chernovi isauriensis SCHMIDTLER 1997 Ablepharus chernovi ressli SCHMIDTLER 1997?
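
The thread leaves that question open. One possible reading is that any of the name lines should find the same IDs. The sketch below takes that reading and assumes that consecutive name lines with no IDs between them all belong to the block of IDs that follows; `parse_groups` and `sample` are hypothetical names, not part of the answer above:

```python
import re


def parse_groups(lines):
    """Map every name line to the ID list of the block it introduces.

    Assumption: consecutive name lines with no IDs between them are
    alternative names for the same block, so they share one list object.
    """
    data = {}
    current_ids = None   # list shared by all names of the current block
    last_was_id = True   # forces a fresh block on the first name line
    for raw in lines:
        line = raw.strip()
        if not line:
            continue     # ignore blank lines
        if re.match(r"\d+$", line):
            if current_ids is None:
                continue  # ID before any name: nothing to attach it to
            current_ids.append(line)
            last_was_id = True
        else:
            if last_was_id:
                current_ids = []  # a name right after IDs starts a new block
            # If the same name appears again later, the first entry is kept.
            data.setdefault(line, current_ids)
            last_was_id = False
    return data


sample = [
    "Ablepharus chernovi Ablepharus chernovi chernovi DAREVSKY 1953",
    "Ablepharus chernovi eiselti SCHMIDTLER 1997",
    "31212",
    "01996",
    "Ablepharus darvazi  ",
    "06245",
]
groups = parse_groups(sample)
print(groups["Ablepharus chernovi eiselti SCHMIDTLER 1997"])  # ['31212', '01996']
```

Because the names in a block point at the same list, typing any one of them returns the full set of IDs. If the whole multi-line heading should instead act as a single key, the name lines could be joined into one string before being stored.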