【Posted】:2018-06-22 22:02:47
【Question】:
Here is my dataframe:
data1 = [['2017-02-10','orange','jon','small','1','1.1'], ['2017-02-10','orange','jon','medium','1','2.1'], ['2017-02-10','orange','jon','large','1','3.1'], ['2017-02-11','orange','mary','small','2','1.2'], ['2017-02-10','orange','jon','medium','2','2.2'], ['2017-02-10','orange','jon','large','2','3.2'], ['2017-02-10','orange','jon','small','1','7.1'], ['2017-02-11','orange','mary','medium','1','8.1'], ['2017-02-11','orange','mary','large','1','9.1'], ['2017-02-11','orange','mary','small','2','10.1'], ['2017-02-11','orange','mary','medium','2','11.1'], ['2017-02-11','orange','mary','large','2','12.1']]
import pandas as pd
df = pd.DataFrame(data1, columns=['date', 'fruit', 'name', 'size', 'replicate', 'weight'])
print(df)
          date   fruit  name    size  replicate  weight
0   2017-02-10  orange   jon   small          1     1.1
1   2017-02-10  orange   jon  medium          1     2.1
2   2017-02-10  orange   jon   large          1     3.1
3   2017-02-11  orange  mary   small          2     1.2
4   2017-02-10  orange   jon  medium          2     2.2
5   2017-02-10  orange   jon   large          2     3.2
6   2017-02-10  orange   jon   small          1     7.1
7   2017-02-11  orange  mary  medium          1     8.1
8   2017-02-11  orange  mary   large          1     9.1
9   2017-02-11  orange  mary   small          2    10.1
10  2017-02-11  orange  mary  medium          2    11.1
11  2017-02-11  orange  mary   large          2    12.1
I need to group this dataframe so that the output has the small, medium, and large values separated by replicate, like this:
val1 = ['2017-02-10', 'orange', 'jon', 'small', '1', '1.1'],
['2017-02-10', 'orange', 'jon', 'medium', '1', '2.1'],
['2017-02-10', 'orange', 'jon', 'large', '1', '3.1'],
val2 = ['2017-02-10', 'orange', 'jon', 'small', '2', '7.1'],
['2017-02-10', 'orange', 'jon', 'medium', '2', '2.2'],
['2017-02-10', 'orange', 'jon', 'large', '2', '3.2'],
val3 = ['2017-02-11', 'orange', 'mary', 'small', '1', '1.2'],
['2017-02-11', 'orange', 'mary', 'medium', '1', '8.1'],
['2017-02-11', 'orange', 'mary', 'large', '1', '9.1'],
val4....
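The format of the output matters little; what matters is how to group the data correctly. With a non-pandas/numpy approach, I could build a unique identifier from the values of several columns, so that even if a 'jon' row were out of place it would still end up in the right group in the output. More specifically, each output group can be identified by 'date', 'fruit', and 'name', but it must contain all of the corresponding 'small', 'medium', and 'large' instances, along with the items.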
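One way the composite-identifier idea might look in pandas is sketched below. It is not taken from the question: the sample rows are hypothetical, `key_cols` assumes that 'replicate' belongs to the identifier (since the desired output is separated by replicate), and `required_sizes` and `complete_groups` are illustrative names. A group is kept only if it contains all three sizes.

```python
import pandas as pd

# Hypothetical sample in the same shape as the question's data. Rows are
# deliberately out of order, and the 'mary' group has no 'small' row.
rows = [
    ['2017-02-10', 'orange', 'jon', 'small', '1', '1.1'],
    ['2017-02-11', 'orange', 'mary', 'medium', '1', '8.1'],
    ['2017-02-10', 'orange', 'jon', 'large', '1', '3.1'],
    ['2017-02-10', 'orange', 'jon', 'medium', '1', '2.1'],
    ['2017-02-11', 'orange', 'mary', 'large', '1', '9.1'],
]
sample = pd.DataFrame(
    rows, columns=['date', 'fruit', 'name', 'size', 'replicate', 'weight']
)

# Assumed composite identifier: one group per date/fruit/name/replicate.
key_cols = ['date', 'fruit', 'name', 'replicate']
required_sizes = {'small', 'medium', 'large'}

complete_groups = {}
for key, group in sample.groupby(key_cols):
    # Keep the group only if every required size appears at least once.
    if required_sizes.issubset(group['size']):
        complete_groups[key] = group.values.tolist()

for key, members in complete_groups.items():
    print(key, members)
```

Because `groupby` matches rows by key value rather than by position, a row that sits far from the rest of its group is still collected with it.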
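【Comments】:
- Do you just want to pull 3 rows at a time? Also, shouldn't there be a val4 as well?
- No, I don't want to just pull 3 rows at a time. This example happens to be organized that way, but not all of the data is. And yes, val4 should be in the output as well.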
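Since the rows are not guaranteed to arrive three at a time, one option is to sort on the identifying columns rather than rely on position. The sketch below uses hypothetical shuffled rows and assumes the desired within-group order is small, medium, large; `shuffled` and `ordered` are illustrative names.

```python
import pandas as pd

# Hypothetical rows, shuffled so that no group is contiguous in the frame.
rows = [
    ['2017-02-10', 'orange', 'jon', 'large', '2', '3.2'],
    ['2017-02-10', 'orange', 'jon', 'small', '1', '1.1'],
    ['2017-02-10', 'orange', 'jon', 'medium', '2', '2.2'],
    ['2017-02-10', 'orange', 'jon', 'large', '1', '3.1'],
    ['2017-02-10', 'orange', 'jon', 'small', '2', '7.1'],
    ['2017-02-10', 'orange', 'jon', 'medium', '1', '2.1'],
]
shuffled = pd.DataFrame(
    rows, columns=['date', 'fruit', 'name', 'size', 'replicate', 'weight']
)

# An ordered categorical makes sorting follow small < medium < large
# instead of alphabetical order.
shuffled['size'] = pd.Categorical(
    shuffled['size'], categories=['small', 'medium', 'large'], ordered=True
)

# Sort by the identifying columns first, then by size within each group.
ordered = shuffled.sort_values(
    ['date', 'fruit', 'name', 'replicate', 'size']
).reset_index(drop=True)

print(ordered)
```

After sorting, each date/fruit/name/replicate combination occupies consecutive rows, so the result can be grouped or sliced without assuming anything about the original row order.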
Tags: python pandas numpy grouping unique