Python :: Converting data from csv into data of type "str" [closed] - Answers

【Question title】: Python:: Converting a data from csv into data of type "str" [closed]
【Posted】: 2017-05-15 23:59:46
【Question description】:

This is my first day with Python.
I have a csv file that looks like the one below.

File link: https://1drv.ms/u/s!AlQo_tHSk1tGjlZYua8xoHSRQ4m6

File name: toy.csv

id  text
1   hello world
2   hello foo world
3   hello my world

I have to write a piece of code that puts it into the following format:

Required format:

'{"documents":[{"id":"1","text":"hello world"},{"id":"2","text":"hello foo world"},{"id":"three","text":"hello my world"},]}'
num_detect_langs = 1;

A straightforward, hard-coded way of doing this is:

input_texts = '{"documents":[{"id":"1","text":"hello world"},{"id":"2","text":"hello foo world"},{"id":"three","text":"hello my world"},]}'

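Here the type of input_texts is "str".

In practice this is not feasible, though, because my input file can contain 1000 records. I know we need to write something like a "for" loop so that the data ends up in the required format. I don't know how to do that.

Can someone help here?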

【Question comments】:

Have a look at this, it may help: docs.python.org/3/library/json.html
And this: pandas.pydata.org/pandas-docs/stable/generated/…

Tags: python json python-2.7 csv

【Solution 1】:

This is not exactly what you want yet, but it gets you very close:

import io
import json

# this is only to fake your input file...
file = io.StringIO('''id  text
1   hello world
2   hello foo world
3   hello my world
''')

# you would have to open your file:
# with open('filename', 'r') as file:
#     ...

lst = []
header = next(file)  # read and discard the header (id  text)
for line in file:
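    # rstrip('\n') drops the newline without eating a character
    # when the last line of the file has no trailing newline
    splt = line.rstrip('\n').split(None, 1)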
    lst.append({'id': splt[0], 'text': splt[1]})

print(json.dumps(lst))

# [{"id": "1", "text": "hello world"}, 
#  {"id": "2", "text": "hello foo world"},
#  {"id": "3", "text": "hello my world"}]

I'm sure you can work out the rest.

This uses only built-ins. But seeing that you mention a "data frame", I guess you want to use pandas...
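For the remaining step, a minimal sketch follows. It assumes the whitespace-separated layout shown in the question and wraps the list under a "documents" key before serialising it. The helper name `rows_to_documents_json` and the in-memory sample are made up for illustration.

```python
import io
import json


def rows_to_documents_json(file):
    """Build the '{"documents": [...]}' string from whitespace-separated rows."""
    next(file)  # skip the header line (id  text)
    documents = []
    for line in file:
        parts = line.split(None, 1)  # split off the id; the rest is the text
        if len(parts) < 2:
            continue  # skip blank or incomplete lines
        documents.append({'id': parts[0], 'text': parts[1].strip()})
    return json.dumps({'documents': documents})


# Stand-in for the real file; in practice use open('toy.csv') instead.
sample = io.StringIO(u'id  text\n1   hello world\n2   hello foo world\n')
input_texts = rows_to_documents_json(sample)
print(input_texts)
```

The result of `json.dumps` is a plain string, which is what the question asks for.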

【Comments】:

【Solution 2】:

To convert the df DataFrame object mentioned in the question into the required format, you can do the following:

d = {}
# lowercase key, matching the output shown below
d["documents"] = df.to_dict(orient='records')
print(d)

Output:

{'documents': [{'text': 'hello world', 'id': 1}, {'text': 'hello foo world', 'id': 2}, {'text': 'hello my world', 'id': 3}]}

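【Comments】:

Defining d={"documents": []} before the loop requires n lookups of the key "documents" just to append the values. Not a good idea.
@MYGz - Any idea why text and id are swapped? Can't it be like {"id":"1","text":"hello world"}, with id followed by text?
@Sijo Order is not preserved in Python dictionaries up to Python 3.5. From Python 3.6 onwards it is preserved.
@Harald You are right. Changed it. df.to_dict() can do that.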
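If the key order matters regardless of Python version, one option is `collections.OrderedDict`, which `json.dumps` serialises in insertion order. The sketch below is only an illustration: the `records` list is made-up sample data standing in for what `df.to_dict(orient='records')` returns, and the ids are cast to strings to match the format in the question.

```python
import json
from collections import OrderedDict

# Sample records standing in for df.to_dict(orient='records') output.
records = [
    {'text': 'hello world', 'id': 1},
    {'text': 'hello foo world', 'id': 2},
]

# Rebuild each record with "id" first and the id converted to a string.
ordered = [
    OrderedDict([('id', str(rec['id'])), ('text', rec['text'])])
    for rec in records
]

input_texts = json.dumps({'documents': ordered})
print(input_texts)
```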

【Solution 3】:

Assuming the input file is named data.txt:

id  text
1   hello world
2   hello foo world
3   hello my world

Do this to create the required JSON string:

import json

with open('data.txt','r') as f:
    lines = f.read().splitlines()

first_line = lines[0]

id_header, text_header = first_line.split()
text_index = first_line.index(text_header)

documents = []

for line in lines[1:]:
    index = line.split()[0]
    text = line[text_index:]

    documents.append({
        id_header: index,
        text_header: text,
    })

result = {"documents": documents}

json_string = json.dumps(result)
print(json_string)  # parenthesised form works on both Python 2 and 3

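【Comments】:

No, it's not a file. He has a DataFrame object. Even I mistook it for a file at first.
Damn. I have Pandas as an ignored tag because I know nothing about it, and I didn't even realise that "data frame" has a special meaning here. Oops.
Consider learning some basic Pandas. Data manipulation gets much easier. Have a look when you have time: pandas.pydata.org/pandas-docs/stable/tutorials.html
Funny that on his first day with Python he is doing pandas :P. Maybe he is an R guy.
@HaraldNordgren - Thanks for the reply. As MYGz pointed out, I am an R person. In R, once the data is read in, it is treated as a data frame. Sorry for the confusion. In the case above, though, I am loading the data from a csv file. Could you let me know whether the line id_header, text_header = first_line.split() is correct? It throws the error "ValueError: need more than 1 value to unpack".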
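The error in the last comment is what `first_line.split()` leads to when the header line contains no whitespace, for example when the file is comma-separated. That is an assumption about the asker's file, which is not shown here. A small sketch of the difference:

```python
# Assumed header lines; the real file may differ.
whitespace_header = 'id  text'
comma_header = 'id,text'

# split() with no argument splits on whitespace only.
print(whitespace_header.split())  # two fields
print(comma_header.split())       # one field, since there is no whitespace

try:
    id_header, text_header = comma_header.split()
except ValueError as exc:
    # Unpacking a single value into two names fails.
    print('ValueError:', exc)

# Splitting on the comma gives the two expected fields.
id_header, text_header = comma_header.split(',')
print(id_header, text_header)
```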

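【Solution 4】:

Assuming your data is in some file in the working directory, say "data.csv". I am also assuming it is a comma-separated list (you only posted a picture, which is not very useful). Anyway: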

import csv
import json
with open('data.csv') as f:
    reader = csv.DictReader(f)
    input_text = {'documents': list(reader)}
input_text = json.dumps(input_text)
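To see what this produces without a file on disk, here is a small sketch that feeds `csv.DictReader` from an in-memory string. The sample content is assumed to be comma-separated, as in this answer. `dict(row)` is used so the result does not depend on the row type a given Python version returns.

```python
import csv
import io
import json

# In-memory stand-in for data.csv (assumed comma-separated).
f = io.StringIO(u'id,text\n1,hello world\n2,hello foo world\n')

reader = csv.DictReader(f)
# Each row maps the header names to string values.
documents = [dict(row) for row in reader]
input_text = json.dumps({'documents': documents})

print(input_text)
```

If the file turns out to be tab-separated instead, passing `delimiter='\t'` to `csv.DictReader` should handle it.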

【Comments】:

Thank you very much for your help, this worked for me.