【Posted】: 2018-08-03 18:01:00
【Question】:
I have a data file with the following contents:
Part#1
A 10 20 10 10 30 10 20 10 30 10 20
B 10 10 20 10 10 30 10 30 10 20 30
Part#2
A 30 30 30 10 10 20 20 20 10 10 10
B 10 10 20 10 10 30 10 30 10 30 10
Part#3
A 10 20 10 30 10 20 10 20 10 20 10
B 10 10 20 20 20 30 10 10 20 20 30
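From there I'd like to have a dictionary of dictionaries holding the aggregated counts for each letter, so it would look something like this:
dictionary = {'Part#1': {'A': {10: 6, 20: 3, 30: 2},
                         'B': {10: 6, 20: 2, 30: 3}},
              'Part#2': {'A': {10: 5, 20: 3, 30: 3},
                         'B': {10: 7, 20: 1, 30: 3}},
              'Part#3': {'A': {10: 6, 20: 4, 30: 1},
                         'B': {10: 4, 20: 5, 30: 2}}}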
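The nested dictionary above can be built straight from the text file, without the CSV/DataFrame detour. A minimal sketch, assuming the layout shown at the top of the question (a `Part#N` line followed by one line per letter); `summarize_partitions` is a hypothetical helper name, and `collections.Counter` does the tallying:

```python
from collections import Counter


def summarize_partitions(lines):
    """Build {part: {letter: {value: count}}} from lines in the sample format.

    Assumes every letter line follows a "Part#N" header line.
    """
    summary = {}
    current = None
    for raw in lines:
        # Treat commas as spaces in case lines look like "A, 10, 20, ..." (assumption)
        tokens = raw.replace(",", " ").split()
        if not tokens:
            continue  # skip blank lines
        if tokens[0].startswith("Part#"):
            current = tokens[0]
            summary[current] = {}
        else:
            letter, values = tokens[0], tokens[1:]
            # Counter tallies each value; sorting the keys gives the 10, 20, 30 order
            counts = Counter(int(v) for v in values)
            summary[current][letter] = dict(sorted(counts.items()))
    return summary


sample = """Part#1
A 10 20 10 10 30 10 20 10 30 10 20
B 10 10 20 10 10 30 10 30 10 20 30
Part#2
A 30 30 30 10 10 20 20 20 10 10 10
B 10 10 20 10 10 30 10 30 10 30 10
Part#3
A 10 20 10 30 10 20 10 20 10 20 10
B 10 10 20 20 20 30 10 10 20 20 30"""

dictionary = summarize_partitions(sample.splitlines())
print(dictionary["Part#1"])
# {'A': {10: 6, 20: 3, 30: 2}, 'B': {10: 6, 20: 2, 30: 3}}
```

With the real file, an open file object can be passed directly, e.g. `with open("sample.txt") as f: dictionary = summarize_partitions(f)`, since iterating a file yields its lines.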
That way, if I want to display a single part, it would give me output like this:
dictionary['Part#1']
A
10: 6
20: 3
30: 2
B
10: 6
20: 2
30: 3
… and so on for the remaining partitions in the file.
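Printing `dictionary['Part#1']` on its own shows the raw dict literal, not the listing above. A small sketch of a helper that walks the two inner levels to produce that listing; `format_partition` is a hypothetical name, and it returns a string so the caller decides whether to print it:

```python
def format_partition(summary, part):
    """Render one partition of {part: {letter: {value: count}}} as a listing."""
    lines = []
    for letter, counts in summary[part].items():
        lines.append(letter)
        # One "value: count" line per distinct value, in ascending value order
        for value, count in sorted(counts.items()):
            lines.append(f"{value}: {count}")
    return "\n".join(lines)


dictionary = {
    "Part#1": {"A": {10: 6, 20: 3, 30: 2}, "B": {10: 6, 20: 2, 30: 3}},
}
print(format_partition(dictionary, "Part#1"))
```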
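So far I've been able to parse the file from txt to csv and convert it into a dictionary (let's call it the outer dictionary). I've been testing several approaches to see what output I get, and so far this code comes closest to (but not exactly) the structure I'm looking for, which I described above:
partitions_dict = df.head(5).to_dict(orient='list')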
print(partitions_dict)
Output:
{0: ['A', 'B', 'A', 'B', 'A'], 1: ['10', '10', '10', '10', '10'], 2: [10, 10, 10, 10, 10], 3: [10, 10, 10, 10, 10], 4: [10, 10, 10, 10, 10], 5: [10, 10, 10, 10, 10], 6: [10, 10, 10, 10, 10], 7: [10, 10, 10, 10, 10]
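`to_dict(orient='list')` maps each column to its list of cells, which is why the result is keyed by column number. The target structure is a count per (part, letter, value), which pandas expresses as a `groupby` over long-format data. A sketch assuming the data has already been reshaped into one row per observed value with columns `part`, `letter`, `value` (hypothetical names, built inline here from the Part#1 sample):

```python
import pandas as pd

# Long format: one row per observed value, tagged with its partition and letter
records = [
    ("Part#1", "A", v) for v in [10, 20, 10, 10, 30, 10, 20, 10, 30, 10, 20]
] + [
    ("Part#1", "B", v) for v in [10, 10, 20, 10, 10, 30, 10, 30, 10, 20, 30]
]
long_df = pd.DataFrame(records, columns=["part", "letter", "value"])

# Count rows in every (part, letter, value) group; the result has a 3-level MultiIndex
counts = long_df.groupby(["part", "letter", "value"]).size()

# Fold the MultiIndex Series into nested dicts, converting NumPy scalars to int
nested = {}
for (part, letter, value), n in counts.items():
    nested.setdefault(part, {}).setdefault(letter, {})[int(value)] = int(n)

print(nested)
```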
The function I used to parse the file:
import csv
import os

import pandas as pd


def fileFormatConverter(txt_file):
    """Receive a generated text file of partitions and convert it to CSV format.

    input: path to the text file
    return: path to the new CSV file
    """
    filename, ext = os.path.splitext(txt_file)
    csv_file = filename + ".csv"
    # 'with' closes (and flushes) both files; newline="" stops csv.writer emitting blank rows on Windows
    with open(txt_file, "r") as fin, open(csv_file, "w", newline="") as fout:
        in_txt = csv.reader(fin, delimiter=" ")
        out_csv = csv.writer(fout)
        out_csv.writerows(in_txt)
    return csv_file

# skiprows=1 drops the first partition header line (e.g. "Part#1") from the dataframe
df_traces = pd.read_csv(fileFormatConverter("sample.txt"), skiprows=1, header=None)  # , error_bad_lines=False)
df_traces.head()
Output:
0 1 2 3 4 5 6 7 8 9 ... 15 16 17 18 19 20 21 22 23 24
0 A, 10, 20, 10, 10, 30, 10, 20, 10, 30, ... 20, 10, 10, 30, 10, 30, 10, 20, 30.0 NaN
1 Part#2 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 A, 30, 30, 30, 10, 10, 20, 20, 20, 10, ... 20, 10, 10, 30, 10, 30, 10, 30, 10.0 NaN
3 Part#3 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 A, 10, 20, 10, 30, 10, 20, 10, 20, 10, ... 20, 20, 20, 30, 10, 10, 20, 20, 30.0 NaN
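The CSV round trip is not strictly needed: pandas can read the whitespace-separated text directly, and labelling each row with its `Part#N` header keeps the partition information that `skiprows=1` throws away. A sketch assuming at most 11 values per letter line and the layout shown at the top of the question (`io.StringIO` stands in for `sample.txt`; the column names `part`, `letter`, `value` are hypothetical):

```python
import io

import pandas as pd

text = """Part#1
A 10 20 10 10 30 10 20 10 30 10 20
B 10 10 20 10 10 30 10 30 10 20 30
Part#2
A 30 30 30 10 10 20 20 20 10 10 10
B 10 10 20 10 10 30 10 30 10 30 10"""

# Explicit names (letter + 11 values) let the short "Part#N" lines load, padded with NaN
raw = pd.read_csv(io.StringIO(text), sep=r"\s+", header=None, names=list(range(12)))

# Tag every row with the most recent "Part#N" header, then drop the header rows
is_header = raw[0].str.startswith("Part#")
raw["part"] = raw[0].where(is_header).ffill()
data = raw[~is_header].rename(columns={0: "letter"})

# Reshape to one row per (part, letter, value) observation
long_df = (
    data.melt(id_vars=["part", "letter"], value_name="value")
    .dropna(subset=["value"])
    .astype({"value": int})
)
print(long_df.head())
```

From this long table, grouping by `part` and `letter` and counting `value` gives the per-letter totals.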
I used a function to change the headers so that the letters within each partition are easier to work with:
def changeDFHeaders(df):
    df_transpose = df.T
    new_header = df_transpose.iloc[0]  # stores the first row for the header
    df_transpose = df_transpose[1:]  # take the data less the header row
    df_transpose.columns = new_header  # set the header row as the df header
    # The counter column serves as an index for the entire dataframe
    # df_transpose['counter'] = range(len(df_transpose))  # adds the counter for rows column
    # df_transpose.set_index('counter', inplace=True)
    return df_transpose

df_transpose_headers = changeDFHeaders(df_traces)
df_transpose_headers.infer_objects()
Output:
A, Part#2 A, Part#3 A,
1 10, NaN 30, NaN 10,
2 20, NaN 30, NaN 20,
3 10, NaN 30, NaN 10,
4 10, NaN 10, NaN 30,
5 30, NaN 10, NaN 10,
6 10, NaN 20, NaN 20,
7 20, NaN 20, NaN 10,
8 10, NaN 20, NaN 20,
9 30, NaN 10, NaN 10,
10 10, NaN 10, NaN 20,
11 20, NaN 10, NaN 10,
12 B, NaN B, NaN B,
13 10, NaN 10, NaN 10,
14 10, NaN 10, NaN 10,
15 20, NaN 20, NaN 20,
16 10, NaN 10, NaN 20,
17 10, NaN 10, NaN 20,
18 30, NaN 30, NaN 30,
19 10, NaN 10, NaN 10,
20 30, NaN 30, NaN 10,
21 10, NaN 10, NaN 20,
22 20, NaN 30, NaN 20,
23 30 NaN 10 NaN 30
24 NaN NaN NaN NaN NaN
-- Still not quite right...
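Transposing turns every line of the file into a column, so the `Part#2` and `Part#3` header lines become all-NaN columns, as the output above shows. Working row by row on the untransposed frame avoids that. A sketch on a hypothetical miniature of `df_traces` (one row per text line, column 0 holding either a `Part#N` header or a letter); the real frame, which appears to hold both letters on one line, would need adjusting:

```python
import pandas as pd

# Hypothetical miniature of df_traces: header rows carry no values
df = pd.DataFrame([
    ["Part#1", None, None, None],
    ["A", 10, 20, 10],
    ["B", 30, 30, 10],
    ["Part#2", None, None, None],
    ["A", 20, 20, 20],
    ["B", 10, 30, 10],
])

summary = {}
part = None
for _, row in df.iterrows():
    label = row.iloc[0]
    if str(label).startswith("Part#"):
        # Start a new partition; following letter rows are filed under it
        part = label
        summary[part] = {}
    else:
        # Drop missing cells, count each value, and order the keys 10, 20, 30
        counts = row.iloc[1:].dropna().astype(int).value_counts().sort_index()
        summary[part][label] = {int(k): int(v) for k, v in counts.items()}

print(summary)
```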
If you check this statement:
df = df_transpose_headers
partitions_dict = df.head(5).to_dict(orient='list')
print(partitions_dict)
Output:
{'A,': ['10,', '20,', '10,', '30,', '10,'], 'Part#2': [nan, nan, nan, nan, nan], 'Part#3': [nan, nan, nan, nan, nan]}
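`to_dict` only reshapes a frame; it never counts. A small illustration on a made-up two-column frame: `orient='list'` yields column → list of cells (the shape in the output above), the default `orient='dict'` yields column → {index: cell}, and per-value counts need an explicit `value_counts`:

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 10], "B": [30, 10, 10]})

# orient='list': one list of cell values per column
as_list = df.to_dict(orient="list")

# orient='dict' (the default): per column, a mapping of index label -> cell value
as_dict = df.to_dict(orient="dict")

# Neither aggregates; counting occurrences per column is a separate step
counts = {col: df[col].value_counts().sort_index().to_dict() for col in df.columns}

print(as_list)
print(as_dict)
print(counts)
```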
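【Discussion】:
-
I noticed you've edited your question to clarify why it isn't a duplicate: could you also edit it to include what you've tried in order to solve this? Please include all the relevant code you have.
-
@TemporalWolf Thanks for the suggestion!
-
I've voted to reopen, but I can't see how you get the output in your code from the input given at the top of the question.
-
@TemporalWolf OK. I'll add the functions so you can see what's being done. It's still not quite right, though.
-
Thanks for responding to the request for more information. To improve your question further, you may find the tips in How to Ask and minimal reproducible example very helpful.
Tags: python dictionary nested aggregate summary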