Return a file as a dictionary: answers

【Question title】: Return a file as dictionary
【Posted】: 2018-11-28 17:08:01
【Question description】:

So this is a file:

APPLE: toronto, 2018, garden, tasty, 5
apple is a tasty fruit
>>>end 
apple is a sour fruit
>>>end
grapes: america, 24, organic, sweet, 4
grapes is a sweet fruit
>>>end

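The file also contains newlines. I want to use this file to create a dictionary, like this:

The function is def f(file_to: TextIO) -> Dict[str, List[tuple]]

file_to is the input file, and the function returns the dictionary: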

{'apple': [('apple is a tasty fruit', 2018, 'garden', 'tasty', 5), ('apple is a sour fruit')], 'grapes': [('grapes is a sweet fruit', 24, 'organic', 5)]}

Each fruit is a key, and its descriptions are the values, formatted as shown. Each entry for a fruit ends with >>>end.

I tried:

with open (file_to, "r") as myfile:
    data= myfile.readlines()
return data

It returns the file's lines as a list of strings, each ending in \n. I think I can use strip() to remove that and take the element before ':' as the key.
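
The idea in the paragraph above can be sketched as follows. This is only an illustration under stated assumptions: the two sample lines are copied from the file shown earlier, and split_header is a hypothetical helper name, not something from the original post.

```python
def split_header(line):
    """Split 'KEY: a, b, c' into a lowercase key and a list of fields."""
    text = line.strip()                   # drop the trailing "\n"
    key, _, rest = text.partition(":")    # split at the first ':' only
    fields = [field.strip() for field in rest.split(",")]
    return key.lower(), fields


# Lines as readlines() would return them, newline included.
lines = [
    "APPLE: toronto, 2018, garden, tasty, 5\n",
    "apple is a tasty fruit\n",
]

key, fields = split_header(lines[0])
print(key)     # apple
print(fields)  # ['toronto', '2018', 'garden', 'tasty', '5']
```

Using partition(":") rather than split(":") means a later colon on the same line does not produce extra pieces.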

The code I tried is:

from pprint import pprint
import re
def main():
    fin = open('f1.txt', 'r')

    data = {}
    key = ''
    parsed = []
    for line in fin:
        line = line.rstrip()
        if line.startswith('>'):
            data[key] = parsed
            parsed = []
        elif ':' in line:
            parts = re.split('\W+', line)
            key = parts[0].lower()
            parsed += parts[2:]
        else:
            parsed.insert(0, line)

    fin.close()
    pprint(data)


main()

It doesn't give the expected result :(

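【Question discussion】:

Your attempt doesn't match the annotation.
Why not just use JSON or XML?
@DennisPatterson It sounds like they have been given a requirement and can't change the process (given the function snippet).

Tags: python file dictionary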

【Solution 1】:

I don't think you really need re and pprint. I tried it with a simple list comprehension and a few if statements.

def main():
    data = {}
    with open('f1.txt', 'r') as fin:
        for line in fin:
            line = line.rstrip()
            if line.startswith('>'):
                continue  # Lines starting with '>' are end markers, so we can skip them.
            elif ':' in line:
                parts = line.split(":", 1)  # Split at the first ':' only.
                key = parts[0].lower()

                firstInfo = parts[1].split(",")  # Fields to add to the value after reading the next line.
                firstInfo.pop(0)  # Remove the first element, the place name (it is not required).

                secondInfo = fin.readline().strip()  # Read the next line. It will be the first value in the list.

                value = [secondInfo]

                value.extend([x.strip() for x in firstInfo])  # Extend the value list with the other fields, stripped of spaces.

                data[key] = value

    print(data["apple"])
    return data

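If you run into any problems with this implementation, I'd be happy to help. (It is pretty self-explanatory, though :P)

【Discussion】:

Thanks for the help. But what if the data is APPLE: toronto, 2018, garden, sweet, 5 followed by apple is: a sweet fruit, with a ':' in the middle of the description? The code treats that line as another main key, but it isn't one, because everything that comes before >>>end belongs to one key and its value. What if a value contains ':'?
@Comp I modified my code (again :-P) to account for that case. elif re.match('^\w+:\s', line): uses the regular expression ^\w+:\s to identify the lines that should be treated as a key plus the rest of the line. ^ anchors the match at the start of the line, \w+ matches one or more word characters, and these must be followed immediately by : and a whitespace character \s. If the : appears anywhere else, the line is not a header line and should not match.
@Comp That situation is unlikely to come up, because once I have the first key I read one more line (secondInfo = fin.readline().strip()), assuming the first line of the data contains the key followed by its separator. That said, @Chris's answer is also correct, and it handles key detection much better.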
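
To make the header test from the comments above concrete, here is a small sketch of how the pattern ^\w+:\s separates header lines from description lines that merely contain a colon. The sample lines come from this thread; is_header is a hypothetical helper name used only for illustration.

```python
import re

# Pattern quoted in the comment above: word characters, then ':' and whitespace.
HEADER = re.compile(r'^\w+:\s')


def is_header(line):
    """Return True if the line looks like 'KEY: field, field, ...'."""
    return HEADER.match(line) is not None


samples = [
    "APPLE: toronto, 2018, garden, tasty, 5",  # header line
    "apple is: a sweet fruit",                 # description containing ':'
    ">>>end",                                  # end marker
]
results = [is_header(sample) for sample in samples]
print(results)  # [True, False, False]
```

Note the limit of this check: a description that itself starts with a single word followed by ': ' (for example "note: very sweet") would still be taken for a header.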

【Solution 2】:

I made a few adjustments to your code (the code I gave you in a previous post). I think this does what you need for the updated data.

Data:

APPLE: toronto, 2018, garden, tasty, 5
apple is a tasty fruit
>>>end
apple is a sour fruit
apple is ripe
>>>end
apple is red
>>>end
grapes: america, 24, organic, sweet, 4
grapes is a sweet fruit
>>>end

Here is the updated code:

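import re

def main():
    data = {}

    with open('f1.txt', 'r') as fin:
        for line in fin:
            line = line.rstrip()
            if line.startswith('>'):
                # End marker: store the first block (description + header fields) once per key.
                if key not in data:
                    data[key] = [tuple(parts)]

            elif re.match(r'^\w+:\s', line):
                # Header line: the name before ':' is the key; drop the place name, keep the other fields.
                key, _, *parts = re.split(r'[:,]\s+', line)
            else:
                if key in data:
                    # Description lines after the first block are collected as plain strings.
                    data[key].append(line)
                else:
                    # The first description goes in front of the header fields.
                    parts.insert(0, line)

    # Merge all later description lines into a single second tuple.
    for key in data:
        if len(data[key]) > 1:
            data[key][1] = tuple(data[key][1:])
            del data[key][2:]

    print(data)


main()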

The output for the modified data and code is:

{'APPLE': [('apple is a tasty fruit', '2018', 'garden', 'tasty', '5'), ('apple is a sour fruit', 'apple is ripe', 'apple is red')], 'grapes': [('grapes is a sweet fruit', '24', 'organic', 'sweet', '4')]}
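
The output above keeps every field as a string, while the expected dictionary in the question shows values such as 2018 and 5 as integers. A minimal sketch of one way to convert them, assuming that any field int() accepts should become an integer; convert_field is a hypothetical helper and not part of the answer's code:

```python
def convert_field(text):
    """Return int(text) when text is a whole number, otherwise text unchanged."""
    try:
        return int(text)
    except ValueError:
        return text


# First tuple from the output above.
row = ('apple is a tasty fruit', '2018', 'garden', 'tasty', '5')
converted = tuple(convert_field(item) for item in row)
print(converted)  # ('apple is a tasty fruit', 2018, 'garden', 'tasty', 5)
```

Applying the same conversion to each tuple before it is stored would bring the values closer to the format requested in the question.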

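【Discussion】:

Thanks for the help, Chris. But when I run it on my data I get an error: builtins.UnboundLocalError: local variable 'key' referenced before assignment
@Comp Then the sequence of your data doesn't match the sequence in the sample you provided. elif re.match('^\w+:\s', line): may not be matching a line, because the part before the colon may contain non-word characters (anything outside a-zA-Z0-9_).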
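
The error discussed above appears when a description or end-marker line is reached before any line has matched the header pattern. The following is a hedged sketch of a variant that reports this case explicitly instead of failing with UnboundLocalError. It is not the answer's code: the function name parse, the one-tuple-per->>>end grouping (following the expected output in the question), the lowercased keys, and the ValueError are all assumptions made for illustration.

```python
import io
import re
from typing import Dict, List, TextIO

HEADER = re.compile(r'^\w+:\s')  # same header pattern as in the answer


def parse(file_to: TextIO) -> Dict[str, List[tuple]]:
    data: Dict[str, List[tuple]] = {}
    key = None    # no header seen yet
    block = []    # description lines since the last header or end marker
    fields = []   # header fields waiting for the first block of this key
    for number, raw in enumerate(file_to, start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith('>>>'):
            # End marker: store the collected block as one tuple.
            if key is not None and (block or fields):
                data.setdefault(key, []).append(tuple(block + fields))
            block, fields = [], []
        elif HEADER.match(line):
            # Header: keep the key, drop the place name, keep the other fields.
            key, _, *fields = re.split(r'[:,]\s+', line)
            key = key.lower()
            block = []
        elif key is None:
            # Text before any header, or a header the pattern rejects.
            raise ValueError(f"line {number}: no 'name: ...' header seen yet: {line!r}")
        else:
            block.append(line)
    return data


sample = io.StringIO(
    "APPLE: toronto, 2018, garden, tasty, 5\n"
    "apple is a tasty fruit\n"
    ">>>end\n"
    "apple is a sour fruit\n"
    ">>>end\n"
)
result = parse(sample)
print(result)
# {'apple': [('apple is a tasty fruit', '2018', 'garden', 'tasty', '5'), ('apple is a sour fruit',)]}
```

With this sketch, a first line such as "red-apple: toronto, 2018" does not match \w+ because of the hyphen, so it raises ValueError naming line 1, which points at the offending data directly.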