【Question title】: Merge rows of csv file with same category in python
【Posted】: 2018-07-17 18:06:43
【Question description】:

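I am familiar with the basics of reading and writing csv files in Python, but I am stuck on working out the logic for this problem. I think a GROUP BY would solve it, but how do I do that in Python?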

Category         Data
A                Once upon a time.
A                There was a king.
A                who ruled a great and glorious nation.
B                He loved each of them dearly. 
B                One day, when the young ladies were of age to be married. 
B                terrible, three-headed dragon laid. 
C                It is so difficult to deny 
C                the reality

I want to produce output like the following, where all Data for category A is merged into a single row, and likewise for categories B and C.

Category         Data
    A                Once upon a time. There was a king. who ruled a great and glorious nation.
    B                He loved each of them dearly. One day, when the young ladies were of age to be married. terrible, three-headed dragon laid. 
    C                It is so difficult to deny the reality

If anyone can help me with this logic, I would appreciate the effort.

【Question discussion】:

Tags: python python-3.x csv


【Solution 1】:

With the pandas library, you can use groupby with a custom aggregation function that simply joins the Data values of each category:

>>> import pandas as pd
>>> data = [['A', 'Once upon a time.'], ['A', 'There was a king.'], ['A', 'who ruled a great and glorious nation.'], ['B', 'He loved each of them dearly. '], ['B', 'One day, when the young ladies were of age to be married. '], ['B', 'terrible, three-headed dragon laid. '], ['C', 'It is so difficult to deny '], ['C', 'the reality']]
>>> df = pd.DataFrame(data=data, columns=['Category','Data'])
>>> df
  Category                                               Data
0        A                                  Once upon a time.
1        A                                  There was a king.
2        A             who ruled a great and glorious nation.
3        B                     He loved each of them dearly.
4        B  One day, when the young ladies were of age to ...
5        B               terrible, three-headed dragon laid.
6        C                        It is so difficult to deny
7        C                                        the reality
>>> df.groupby('Category').agg({'Data': lambda x : ' '.join(x)})
                                                       Data
Category
A         Once upon a time. There was a king. who ruled ...
B         He loved each of them dearly.  One day, when t...
C                   It is so difficult to deny  the reality
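The joined output above contains doubled spaces (for example "dearly.  One day") because some Data values end with a trailing space. The following is a minimal sketch of one way to handle that, assuming it is acceptable to strip surrounding whitespace from each value before joining; the shortened sample data and the variable name `merged` are only illustrative.

```python
import pandas as pd

# Shortened sample data; some values keep their trailing spaces on purpose.
data = [
    ['A', 'Once upon a time.'],
    ['A', 'There was a king.'],
    ['B', 'He loved each of them dearly. '],
    ['B', 'One day, when the young ladies were of age to be married. '],
    ['C', 'It is so difficult to deny '],
    ['C', 'the reality'],
]
df = pd.DataFrame(data, columns=['Category', 'Data'])

# Strip surrounding whitespace first so the joined text uses single spaces.
df['Data'] = df['Data'].str.strip()

# as_index=False keeps Category as an ordinary column in the result.
merged = df.groupby('Category', as_index=False).agg({'Data': ' '.join})
print(merged.to_string(index=False))
```

If the data lives in a csv file, the DataFrame could presumably come from `pd.read_csv` instead of an in-memory list, and `merged.to_csv(..., index=False)` would write the result back out.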

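【Discussion】:

  • Do the other columns in the csv differ as well? That is, if your csv has a second column Data2, are its values the same or different across the rows of each Category?
【Solution 2】: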

itertools.groupby can help (assuming rows with the same category are adjacent, i.e. the letters in the first column are ordered):

from itertools import groupby
from io import StringIO

text = '''Category         Data
A                Once upon a time.
A                There was a king.
A                who ruled a great and glorious nation.
B                He loved each of them dearly.
B                One day, when the young ladies were of age to be married.
B                terrible, three-headed dragon laid.
C                It is so difficult to deny
C                the reality
'''

with StringIO(text) as file:
    next(file)  # skip header
    rows = (row.split('                ') for row in file)
    for key, items in groupby(rows, key=lambda x: x[0]):
        phrases = (item[1].strip() for item in items)
        print(key, ' '.join(phrases))

This gives:

A Once upon a time. There was a king. who ruled a great and glorious nation.
B He loved each of them dearly. One day, when the young ladies were of age to be married. terrible, three-headed dragon laid.
C It is so difficult to deny the reality
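Note that itertools.groupby only merges consecutive rows with equal keys, so a category that reappears later in the input would come out as a second, separate group. As a sketch of an alternative that does not depend on ordering, the rows can be collected in a dictionary keyed by category. The sample rows and the names `phrases_by_category` and `merged` are illustrative, not part of the original answer.

```python
from collections import defaultdict

# Illustrative rows in which category A reappears after B.
rows = [
    ('A', 'Once upon a time.'),
    ('B', 'He loved each of them dearly. '),
    ('A', 'There was a king.'),
    ('C', 'It is so difficult to deny '),
    ('C', 'the reality'),
]

# Collect every phrase under its category, regardless of row order.
phrases_by_category = defaultdict(list)
for category, phrase in rows:
    phrases_by_category[category].append(phrase.strip())

# Dicts preserve insertion order (Python 3.7+), so categories appear
# in the order in which they were first seen.
merged = {cat: ' '.join(phrases) for cat, phrases in phrases_by_category.items()}
for cat, text in merged.items():
    print(cat, text)
```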

If your data is in a file, replace with StringIO(text) as file: above with:

with open('textfile.txt') as file:
    # do stuff as above with file
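If the file is a true comma-separated csv rather than space-aligned text, the standard csv module can handle the parsing and writing. The following is a rough sketch under that assumption, with a header row of Category,Data; the function name `merge_rows` is hypothetical, and in-memory StringIO objects stand in for real files here.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def merge_rows(infile, outfile):
    """Merge the Data values of rows that share a Category."""
    reader = csv.reader(infile)
    header = next(reader)  # keep the header so it can be written back out
    # Sort by category so groupby sees equal keys next to each other;
    # sorted() is stable, so phrases keep their original relative order.
    rows = sorted(reader, key=itemgetter(0))
    writer = csv.writer(outfile)
    writer.writerow(header)
    for category, group in groupby(rows, key=itemgetter(0)):
        writer.writerow([category, ' '.join(row[1].strip() for row in group)])


# In-memory stand-ins for real files opened with open(..., newline='').
source = io.StringIO(
    'Category,Data\n'
    'A,Once upon a time.\n'
    'A,There was a king.\n'
    'C,the reality\n'
    'B,He loved each of them dearly. \n'
)
target = io.StringIO()
merge_rows(source, target)
print(target.getvalue())
```

Sorting first removes the need for the input to be pre-ordered, at the cost of reading all rows into memory.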

【Discussion】:
