Reading the first element of each column and then the entire row in a csv file

【Question title】: Reading first element of each column and then the entire row in csv file
【Posted】: 2021-12-11 08:38:38
【Question】:

I have a csv file that looks like this:

Account     Email       User_Id     User Type   Base Role   Last Login
123456  x@proton.com    111111         inter      user         7
7891011 y@proton.com    222222         inter      user         6
121314  z@proton.com    333333         inter      user         5

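There will be 50 more rows like these. Each account can have multiple users, and the same account can be listed in the file more than once. I have to create a new csv file for each account, and for each account I have to select its entire rows and copy their contents into that file. How do I do this? What I want is: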

for each account number
   if a csv file for this account does not exist already
       create a new file
   copy the entire row and paste it in the new csv file

I can create a new csv file with this:

with open("test.csv", "w", newline="") as fp:  # "w" creates the file
    pass  # rows would be written here

But I don't know how to pick out each account number and then copy that row's contents into the new file. I'm new to Python. Please help.

【Comments】:

Tags: python python-3.x csv file

【Solution 1】:

Python ships with a csv module in its standard library.

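import csv

def get_firsts(csvfile, skip_first=True):
    """Return the first field of every row in csvfile."""
    # newline='' lets the csv module handle line endings itself
    with open(csvfile, 'r', newline='') as f:
        data = csv.reader(f, delimiter=',')
        if skip_first:
            _ = next(data)  # consume the header row so it is not returned
        firsts = [row[0] for row in data if row]  # ignore blank lines
    return firsts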

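This returns a list containing only the first element of each row. If the first row holds the column names, leave skip_first=True so that it is dropped.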
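The function above collects only the first column. For the task in the question (one file per account, each holding the full rows), a minimal sketch using only the csv module might look like the following. The function name split_by_account, the account_<number>.csv file naming, and the comma delimiter are assumptions made for illustration; the sample in the question may in fact be tab-separated, in which case delimiter="\t" would be needed.

```python
import csv
from collections import defaultdict
from pathlib import Path


def split_by_account(csvfile, out_dir=".", delimiter=","):
    """Write one CSV per account; return the list of files created."""
    rows_by_account = defaultdict(list)
    with open(csvfile, newline="") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader, None)        # first row holds the column names
        for row in reader:
            if row:                        # skip blank lines
                # row[0] is the account number; keep the whole row
                rows_by_account[row[0]].append(row)

    written = []
    for account, rows in rows_by_account.items():
        # hypothetical naming scheme: account_<number>.csv
        path = Path(out_dir) / f"account_{account}.csv"
        with open(path, "w", newline="") as out:
            writer = csv.writer(out, delimiter=delimiter)
            writer.writerow(header)        # repeat the header in every file
            writer.writerows(rows)
        written.append(path)
    return written
```

Grouping the rows in memory first means each output file is opened once, which is fine for a file of a few dozen rows like the one described.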

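【Discussion】:

What is the code if skip_first: _ = next(data) doing?
@winterlyrock If the first row holds the column labels, you usually want to split it off from the data. That helps when you run some logic over the columns. You could also bind the header to a meaningful name instead of _ and return it.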
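To make the effect of next() concrete, here is a small illustration. A csv.reader is an iterator, so calling next() on it hands back one row and moves past it. The in-memory sample and its values are made up for the demonstration.

```python
import csv
import io

# In-memory stand-in for a file; the values are illustrative only.
sample = io.StringIO("Account,Email\n123456,x@proton.com\n7891011,y@proton.com\n")

reader = csv.reader(sample)
header = next(reader)                    # consumes the first row only
accounts = [row[0] for row in reader]    # iteration resumes at the second row

print(header)    # ['Account', 'Email']
print(accounts)  # ['123456', '7891011']
```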

【Solution 2】:

You can use pandas in Python.

import pandas as pd

If you already have a DataFrame, you are set.

If you don't, you can load your CSV into a DataFrame with this line:

df = pd.read_csv('your_csv_file.csv')

Now you can select your data with DataFrame indexing, like this:

new_df = df.loc[df['Account'] == 123456]

new_df is also a DataFrame. You can save the resulting DataFrame with:

new_df.to_csv('results.csv')

You can do this for every account number with this code:

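for account in df['Account'].unique():  # visit each account once
    # select every row that belongs to this account
    new_df = df.loc[df['Account'] == account]
    # one file per account, so earlier results are not overwritten
    new_df.to_csv(f'account_{account}.csv', index=False)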
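As an alternative to filtering the DataFrame once per account, groupby can hand back each account's rows in a single pass. This is only a sketch: the helper name write_account_files and the account_<number>.csv naming are assumptions, not part of the answer above.

```python
import pandas as pd


def write_account_files(df, column="Account", out_dir="."):
    """Write one CSV per distinct value in `column`; return the file paths."""
    written = []
    # groupby yields (key, sub-DataFrame) pairs, one per distinct account
    for account, group in df.groupby(column):
        path = f"{out_dir}/account_{account}.csv"   # hypothetical naming
        group.to_csv(path, index=False)             # omit the row index column
        written.append(path)
    return written
```

It would be called as write_account_files(pd.read_csv('your_csv_file.csv')), and each output file keeps the original header along with every row for that account.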

【Discussion】:

How do I do this for every account number?
I'll edit the answer to cover that.

【Solution 3】:

You can try the convtools library, which provides many data-processing primitives, including aggregations and helpers for CSV files:

from convtools import conversion as c
from convtools.contrib.tables import Table

dialect = Table.csv_dialect(delimiter="\t")
# read the input file
table = Table.from_csv("input_1.csv", header=True, dialect=dialect)
# remember the header
header = table.columns

# prepare a converter to group by first column (we could work with dicts, but
# it is slower), aggregate by storing rows in arrays
converter = (
    c.group_by(c.item(0))
    .aggregate({"account": c.item(0), "rows": c.ReduceFuncs.Array(c.this())})
    .gen_converter()
)

# perform aggregation
data_by_accounts = converter(table.into_iter_rows(list))

# write files
for data in data_by_accounts:
    Table.from_rows(data["rows"], header=header).into_csv(
        "account_{}.csv".format(data["account"]), dialect=dialect
    )

【Discussion】: