Write dictionary with rows to csv iterating

【Question title】: Write dictionary with rows to csv iterating
【Posted】: 2018-09-03 19:33:27
【Question description】:

I need to write a dictionary to a csv file, but I can't hold all of it in memory, so I have to write it iteratively:

def save_phons_2_csv(pandas_dataset, csv_name):
    if not os.path.isfile(csv_name):  # create the file if it doesn't exist
        with open(csv_name, 'w') as csv_file:
            pass
    for index_r, row in pandas_dataset.iterrows():  # get all phons frames
        for index, phon_dict in enumerate(row['phons']):
            if phon_dict['phon'] not in no_phons:
                dicc = get_phonema(row, index)
                label = dicc['label']
                rows = np.array(dicc["frames"])

                with open(csv_name, 'a+') as ofile:
                    ...  # append label and rows to csv

In the end, what I want is to store label and rows in a csv file and be able to read it back.

My best attempt so far is this:

            with open(csv_name,'a+') as ofile:               
                wr = csv.writer(ofile)
                wr.writerow([label, rows])

But it only writes some of the values and skips most of the frames, like this:

sh,"[array([ 0.0005188 ,  0.        ,  0.00036621, ..., -0.00024414,
       -0.00131226, -0.0015564 ], dtype=float32)]"

ix,"[array([-0.0015564 , -0.00131226, -0.00061035, ...,  0.0017395 ,
        0.00012207, -0.00164795], dtype=float32)]"

It also puts \n wherever it wants.
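
Both symptoms most likely come from how NumPy converts an array to text rather than from the csv module: with default print options, an array of more than 1000 elements is summarised with "..." and the text is wrapped at 75 characters per line. A minimal sketch that reproduces this, assuming default print options and made-up array contents (long_frames and short_frames are illustrative names, not part of the question):

```python
import numpy as np

# Made-up stand-ins for the real frames.
long_frames = np.arange(2000, dtype=np.float32)
short_frames = np.arange(50, dtype=np.float32)

# Over 1000 elements: the middle of the array is replaced by "...".
print("..." in str(long_frames))

# Longer than 75 characters: the text is wrapped onto several lines.
print("\n" in str(short_frames))
```

Writing each value as its own csv field avoids going through this text form at all.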

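Edit, to clarify:

label is a string, such as 'sh' or 'ix' or something similar

rows is an array like [ 0.0005188 0. 0.00036621 ..., -0.00024414 -0.00131226 -0.0015564 ]

I also have the maximum length of all the frames, in case it helps

If I do print(pandas_dataset.head()), this is what I get: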

 Dialect  Female    ID   Male Type  \
0     DR1    True  CJF0  False   SA   
1     DR1    True  CJF0  False   SA   
2     DR1    True  CJF0  False   SI   
3     DR1    True  CJF0  False   SI   
4     DR1    True  CJF0  False   SI   

                                                path  \
0  C:\Users\isaac\Desktop\TFM\Database\TIMIT\TRAI...   
1  C:\Users\isaac\Desktop\TFM\Database\TIMIT\TRAI...   
2  C:\Users\isaac\Desktop\TFM\Database\TIMIT\TRAI...   
3  C:\Users\isaac\Desktop\TFM\Database\TIMIT\TRAI...   
4  C:\Users\isaac\Desktop\TFM\Database\TIMIT\TRAI...   

                                               phons  \
0  [{'end': 3050, 'start': 0, 'phon': 'h#'}, {'en...   
1  [{'end': 2260, 'start': 0, 'phon': 'h#'}, {'en...   
2  [{'end': 1513, 'start': 0, 'phon': 'h#'}, {'en...   
3  [{'end': 2120, 'start': 0, 'phon': 'h#'}, {'en...   
4  [{'end': 1507, 'start': 0, 'phon': 'h#'}, {'en...   

                                               words  
0  [{'end': 5723, 'start': 3050, 'word': 'she'}, ...  
1  [{'end': 4600, 'start': 2260, 'word': 'don't'}...  
2  [{'end': 7436, 'start': 1513, 'word': 'even'},...  
3  [{'end': 3533, 'start': 2120, 'word': 'or'}, {...  
4  [{'end': 2154, 'start': 1507, 'word': 'a'}, {'...

【Question discussion】:

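Don't you need to iterate over the rows? for row in rows: wr.writerow([label, rows])
csv can't handle a row whose data type is an array, so it converts the array to a string, and that is how you lose all the data. You could save it as writerow([label]+rows), but it looks like you're writing columns here?
That prints the same as mine, but many times for each phoneme @PeterWood
@pratiklodha Actually I'm trying to write it as rows, but as long as I can read it back, writing it as columns is fine. Do you know how to do that?
It would help if you could add the output of print(pandas_dataset.head()) to the question, along with what the expected output for that data should look like.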

Tags: python csv dictionary iteration writer

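【Solution 1】:

I finally managed to save it to csv, but I don't think it's a good solution, so I won't mark this answer as accepted in case someone comes up with a better one.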

def save_phons_2_csv(pandas_dataset, csv_name):
    # print arrays in full, on a single line, instead of summarising them
    np.set_printoptions(threshold=np.inf, linewidth=np.inf)
    if not os.path.isfile(csv_name):  # create the file if it doesn't exist
        with open(csv_name, 'w') as csv_file:
            pass

    for index_r, row in pandas_dataset.iterrows():  # get all phons frames
        for index, phon_dict in enumerate(row['phons']):
            if phon_dict['phon'] not in no_phons:
                dicc = get_phonema(row, index)
                label = dicc['label']
                rows = dicc["frames"]
                with open(csv_name, 'a+') as ofile:
                    text = '%s; %s\n' % (label, rows)
                    ofile.write(text)

Basically, what I did was change how np prints its output.

This gives me a csv like this:

sh   [ -3.35693359e-04   3.35693359e-04  -6.71386719e-04   9.46044922e-04...
iy   [  4.94384766e-03  -1.58691406e-03   7.93457031e-04   8.85009766e-04...
...

Each line has two cells, one for the label and one for the frames; I think it would be better to have one cell per frame.
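
Since the goal is to read the file back, a line in the 'label; [v1 v2 ...]' form written by the code above could be parsed roughly as follows. This is only a sketch: parse_line is a hypothetical helper, the sample values are made up, and it assumes frames is a one-dimensional array that was printed in full on one line.

```python
def parse_line(line):
    """Split one 'label; [v1 v2 ...]' line into (label, list of floats)."""
    label, _, values = line.partition(';')
    # Drop the surrounding brackets, then split on any run of whitespace.
    numbers = values.strip().strip('[]').split()
    return label.strip(), [float(n) for n in numbers]


# Made-up example line in the format produced by '%s; %s\n'.
label, frames = parse_line('sh; [ -3.5e-04   3.5e-04  -6.75e-04]\n')
print(label, frames)
```

If frames is nested (for example a list that contains arrays), the printed text has a different shape and this parser would not apply as written.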

【Discussion】:

【Solution 2】:

You should be able to improve on this by using a CSV writer object:

import csv
import os

import numpy as np
import pandas as pd


def save_phons_2_csv(pandas_dataset, csv_name):
    np.set_printoptions(threshold=np.inf, linewidth=np.inf)
    if not os.path.isfile(csv_name):  # create the file if it doesn't exist
        with open(csv_name, 'w') as csv_file:
            pass

    # open the file once and keep appending rows to it
    with open(csv_name, 'a+', newline='') as ofile:
        csv_ofile = csv.writer(ofile)

        for index_r, row in pandas_dataset.iterrows():  # get all phons frames
            for index, phon_dict in enumerate(row['phons']):
                if phon_dict['phon'] not in no_phons:
                    dicc = get_phonema(row, index)
                    label = dicc['label']
                    rows = dicc["frames"]
                    # one cell for the label, then one cell per frame value
                    csv_ofile.writerow([label] + list(rows))

This takes a list of elements and writes one line to your output file, with the correct delimiter between each element.

【Discussion】:
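
A file written this way can be read back with csv.reader, treating the first cell of each line as the label and the remaining cells as frame values. The sketch below is a round trip on made-up data: read_phons_csv, the sample values, and the temporary file name are all assumptions for illustration, and plain Python floats stand in for the real frames (a NumPy array could be converted with its tolist() method before writing).

```python
import csv
import os
import tempfile


def read_phons_csv(csv_name):
    """Yield (label, frames) pairs, one per line of the file."""
    with open(csv_name, newline='') as ifile:
        for record in csv.reader(ifile):
            if not record:  # skip blank lines
                continue
            yield record[0], [float(value) for value in record[1:]]


# Round trip with made-up data; rows may have different lengths.
samples = [('sh', [0.0005188, 0.0, -0.0015564]), ('ix', [0.0017395, 0.00012207])]
csv_name = os.path.join(tempfile.mkdtemp(), 'phons.csv')
with open(csv_name, 'a+', newline='') as ofile:
    writer = csv.writer(ofile)
    for label, frames in samples:
        writer.writerow([label] + list(frames))

restored = list(read_phons_csv(csv_name))
print(restored == samples)
```

Because the reader is a generator, it hands back one line at a time, which fits the original constraint of not holding everything in memory.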