【Posted】: 2015-05-22 04:55:07
【Question】:
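My goal is to build a new DataFrame named df3.
Using the ['Header 1', 'Header 2', 'Normalized'] values in df1, how can I find the rows in df2 where df1['Header 1', 'Header 2', 'Normalized'] equals df2['Header 1', 'Header 2', 'Normalized'], and build df3 from the result?
For example, in row 0 of df1, Header 1, Header 2 and Normalized are equal to those in rows 0 and 1 of df2.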
df1
    Header 1           Header 2  Header 3    Normalized  Status    Match type
 0  Boston             Label 1   "phrase 1"  phrase 1    eligible  Phrase
 1  DC/Baltimore       Label 2   [phrase 2]  phrase 2    eligible  Exact
 2  Philly/NJ          Label 3   "phrase 3"  phrase 3    eligible  Phrase
 3  Philly/NJ          Label 4   "phrase 4"  phrase 4    eligible  Phrase
 4  Philly/NJ          Label 5   "phrase 5"  phrase 5    eligible  Phrase
 5  Portland           Label 6   "phrase 6"  phrase 6    eligible  Phrase
 6  Raleigh/Charlotte  Label 7   [phrase 7]  phrase 7    eligible  Exact
 7  Raleigh/Charlotte  Label 8   "phrase 8"  phrase 8    eligible  Phrase
df2
    Header 1           Header 2  Header 3    Normalized  Status    Match type
 0  Boston             Label 1   +phrase +1  phrase 1    eligible  Broad
 1  Boston             Label 1   [phrase 1]  phrase 1    eligible  Exact
 2  DC/Baltimore       Label 2   +phrase +2  phrase 2    eligible  Broad
 3  DC/Baltimore       Label 2   "phrase 2"  phrase 2    eligible  Phrase
 4  Frag               Label 22  [what]      what        eligible  Exact
 5  Philly/NJ          Label 3   +phrase +3  phrase 3    eligible  Broad
 6  Philly/NJ          Label 4   +phrase +4  phrase 4    eligible  Broad
 7  Philly/NJ          Label 5   +phrase +5  phrase 5    eligible  Broad
 8  Philly/NJ          Label 3   [phrase 3]  phrase 3    eligible  Exact
 9  Philly/NJ          Label 4   [phrase 4]  phrase 4    eligible  Exact
10  Philly/NJ          Label 5   [phrase 5]  phrase 5    eligible  Exact
11  Portland           Label 6   +phrase +6  phrase 6    eligible  Broad
12  Portland           Label 6   [phrase 6]  phrase 6    eligible  Exact
13  Raleigh/Charlotte  Label 7   +phrase +7  phrase 7    eligible  Broad
14  Raleigh/Charlotte  Label 8   +phrase +8  phrase 8    eligible  Broad
15  Raleigh/Charlotte  Label 7   "phrase 7"  phrase 7    eligible  Phrase
16  Raleigh/Charlotte  Label 8   [phrase 8]  phrase 8    eligible  Exact
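df3: For this example, the final result would include every row from df1 plus every row from df2 except row (index) 4, because its ['Header 1', 'Header 2', 'Normalized'] values do not match any row in df1.
The key thing I don't understand is how to use multiple conditions taken from one DataFrame to filter the data in another DataFrame.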
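A minimal sketch of one common way to filter on several columns at once, assuming the column names shown in df1 and df2 above: turn the three key columns of each frame into a MultiIndex with `set_index`, then test membership with `Index.isin`. Only a subset of the question's rows is reproduced, and the names `KEYS`, `COLS`, `mask` and `df2_matched` are illustrative, not taken from the question.

```python
import pandas as pd

KEYS = ["Header 1", "Header 2", "Normalized"]
COLS = ["Header 1", "Header 2", "Header 3", "Normalized", "Status", "Match type"]

# A small subset of the question's data (df1 rows 0-1, df2 rows 0-4).
df1 = pd.DataFrame([
    ["Boston", "Label 1", '"phrase 1"', "phrase 1", "eligible", "Phrase"],
    ["DC/Baltimore", "Label 2", "[phrase 2]", "phrase 2", "eligible", "Exact"],
], columns=COLS)
df2 = pd.DataFrame([
    ["Boston", "Label 1", "+phrase +1", "phrase 1", "eligible", "Broad"],
    ["Boston", "Label 1", "[phrase 1]", "phrase 1", "eligible", "Exact"],
    ["DC/Baltimore", "Label 2", "+phrase +2", "phrase 2", "eligible", "Broad"],
    ["DC/Baltimore", "Label 2", '"phrase 2"', "phrase 2", "eligible", "Phrase"],
    ["Frag", "Label 22", "[what]", "what", "eligible", "Exact"],
], columns=COLS)

# Turn the three key columns of each frame into a MultiIndex of tuples.
df1_keys = df1.set_index(KEYS).index
df2_keys = df2.set_index(KEYS).index

# isin() checks each df2 tuple against the df1 tuples, so a row only
# passes when all three columns match the same df1 row.
mask = df2_keys.isin(df1_keys)
df2_matched = df2[mask]
print(df2_matched)
```

Because each row is compared as a whole tuple, this enforces all three conditions together; calling `isin` separately on each column would also accept rows whose values match different df1 rows.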
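Edit 1:
My end goal is for df3 to look like the table below. The key point is that it merges the full rows of df1 and df2 where ['Header 1', 'Header 2', 'Normalized'] are equal. I have tried the suggestion to use merge. It looks like exactly what I need, but the column headers come out with _x and _y suffixes appended. How can I produce the following output in one step? Do I have to rename the headers to match the original table's labels and drop several columns? Or is there a better way?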
    Header 1           Header 2  Header 3    Normalized  Status    Match type
 0  Boston             Label 1   "phrase 1"  phrase 1    eligible  Phrase
 1  DC/Baltimore       Label 2   [phrase 2]  phrase 2    eligible  Exact
 2  Philly/NJ          Label 3   "phrase 3"  phrase 3    eligible  Phrase
 3  Philly/NJ          Label 4   "phrase 4"  phrase 4    eligible  Phrase
 4  Philly/NJ          Label 5   "phrase 5"  phrase 5    eligible  Phrase
 5  Portland           Label 6   "phrase 6"  phrase 6    eligible  Phrase
 6  Raleigh/Charlotte  Label 7   [phrase 7]  phrase 7    eligible  Exact
 7  Raleigh/Charlotte  Label 8   "phrase 8"  phrase 8    eligible  Phrase
 0  Boston             Label 1   +phrase +1  phrase 1    eligible  Broad
 1  Boston             Label 1   [phrase 1]  phrase 1    eligible  Exact
 2  DC/Baltimore       Label 2   +phrase +2  phrase 2    eligible  Broad
 3  DC/Baltimore       Label 2   "phrase 2"  phrase 2    eligible  Phrase
 5  Philly/NJ          Label 3   +phrase +3  phrase 3    eligible  Broad
 6  Philly/NJ          Label 4   +phrase +4  phrase 4    eligible  Broad
 7  Philly/NJ          Label 5   +phrase +5  phrase 5    eligible  Broad
 8  Philly/NJ          Label 3   [phrase 3]  phrase 3    eligible  Exact
 9  Philly/NJ          Label 4   [phrase 4]  phrase 4    eligible  Exact
10  Philly/NJ          Label 5   [phrase 5]  phrase 5    eligible  Exact
11  Portland           Label 6   +phrase +6  phrase 6    eligible  Broad
12  Portland           Label 6   [phrase 6]  phrase 6    eligible  Exact
13  Raleigh/Charlotte  Label 7   +phrase +7  phrase 7    eligible  Broad
14  Raleigh/Charlotte  Label 8   +phrase +8  phrase 8    eligible  Broad
15  Raleigh/Charlotte  Label 7   "phrase 7"  phrase 7    eligible  Phrase
16  Raleigh/Charlotte  Label 8   [phrase 8]  phrase 8    eligible  Exact
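The _x/_y suffixes come from `merge` itself: it places matching rows side by side, so every non-key column present in both frames (Header 3, Status, Match type) appears twice and is suffixed. The desired table above is instead df1 stacked on top of the matching df2 rows, which is what `pd.concat` does. A sketch under the assumption that the data is exactly as shown in the question; `KEYS`, `COLS`, `mask` and `matched_via_merge` are illustrative names.

```python
import pandas as pd

COLS = ["Header 1", "Header 2", "Header 3", "Normalized", "Status", "Match type"]
KEYS = ["Header 1", "Header 2", "Normalized"]

df1 = pd.DataFrame([
    ["Boston", "Label 1", '"phrase 1"', "phrase 1", "eligible", "Phrase"],
    ["DC/Baltimore", "Label 2", "[phrase 2]", "phrase 2", "eligible", "Exact"],
    ["Philly/NJ", "Label 3", '"phrase 3"', "phrase 3", "eligible", "Phrase"],
    ["Philly/NJ", "Label 4", '"phrase 4"', "phrase 4", "eligible", "Phrase"],
    ["Philly/NJ", "Label 5", '"phrase 5"', "phrase 5", "eligible", "Phrase"],
    ["Portland", "Label 6", '"phrase 6"', "phrase 6", "eligible", "Phrase"],
    ["Raleigh/Charlotte", "Label 7", "[phrase 7]", "phrase 7", "eligible", "Exact"],
    ["Raleigh/Charlotte", "Label 8", '"phrase 8"', "phrase 8", "eligible", "Phrase"],
], columns=COLS)

df2 = pd.DataFrame([
    ["Boston", "Label 1", "+phrase +1", "phrase 1", "eligible", "Broad"],
    ["Boston", "Label 1", "[phrase 1]", "phrase 1", "eligible", "Exact"],
    ["DC/Baltimore", "Label 2", "+phrase +2", "phrase 2", "eligible", "Broad"],
    ["DC/Baltimore", "Label 2", '"phrase 2"', "phrase 2", "eligible", "Phrase"],
    ["Frag", "Label 22", "[what]", "what", "eligible", "Exact"],
    ["Philly/NJ", "Label 3", "+phrase +3", "phrase 3", "eligible", "Broad"],
    ["Philly/NJ", "Label 4", "+phrase +4", "phrase 4", "eligible", "Broad"],
    ["Philly/NJ", "Label 5", "+phrase +5", "phrase 5", "eligible", "Broad"],
    ["Philly/NJ", "Label 3", "[phrase 3]", "phrase 3", "eligible", "Exact"],
    ["Philly/NJ", "Label 4", "[phrase 4]", "phrase 4", "eligible", "Exact"],
    ["Philly/NJ", "Label 5", "[phrase 5]", "phrase 5", "eligible", "Exact"],
    ["Portland", "Label 6", "+phrase +6", "phrase 6", "eligible", "Broad"],
    ["Portland", "Label 6", "[phrase 6]", "phrase 6", "eligible", "Exact"],
    ["Raleigh/Charlotte", "Label 7", "+phrase +7", "phrase 7", "eligible", "Broad"],
    ["Raleigh/Charlotte", "Label 8", "+phrase +8", "phrase 8", "eligible", "Broad"],
    ["Raleigh/Charlotte", "Label 7", '"phrase 7"', "phrase 7", "eligible", "Phrase"],
    ["Raleigh/Charlotte", "Label 8", "[phrase 8]", "phrase 8", "eligible", "Exact"],
], columns=COLS)

# Keep only df2 rows whose (Header 1, Header 2, Normalized) tuple occurs in df1.
mask = df2.set_index(KEYS).index.isin(df1.set_index(KEYS).index)

# Stack rows instead of joining them side by side: concat lines up the shared
# column names, so no _x/_y suffixes appear, and the original index labels
# (0-7 from df1, then df2's labels without 4) are kept.
df3 = pd.concat([df1, df2[mask]])

# Alternative via merge: join df2 against df1's key columns only, so there are
# no overlapping non-key columns to suffix. drop_duplicates() keeps repeated
# key combinations in df1 from multiplying df2 rows. The result gets a new
# 0..n-1 index rather than df2's original labels.
matched_via_merge = df2.merge(df1[KEYS].drop_duplicates(), on=KEYS, how="inner")

print(df3)
```

The `concat` route reproduces the desired table including its index labels; the key-only `merge` selects the same 16 df2 rows but would still need to be concatenated with df1 afterwards.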
【Discussion】:
-
Your problem is defining the criterion that decides whether a final value should come from the left or the right side.
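To make the comment concrete: a full merge of the two frames on the key columns keeps both sides' copies of the overlapping columns, and `merge`'s `suffixes` parameter only renames them; it cannot choose which side to keep. A minimal sketch with a subset of the question's rows; the suffix strings `_df1`/`_df2` and the name `side_by_side` are arbitrary illustrative choices.

```python
import pandas as pd

KEYS = ["Header 1", "Header 2", "Normalized"]
COLS = ["Header 1", "Header 2", "Header 3", "Normalized", "Status", "Match type"]

# Subset of the question's data: one df1 row and its two matching df2 rows.
df1 = pd.DataFrame([
    ["Boston", "Label 1", '"phrase 1"', "phrase 1", "eligible", "Phrase"],
], columns=COLS)
df2 = pd.DataFrame([
    ["Boston", "Label 1", "+phrase +1", "phrase 1", "eligible", "Broad"],
    ["Boston", "Label 1", "[phrase 1]", "phrase 1", "eligible", "Exact"],
], columns=COLS)

# Overlapping non-key columns (Header 3, Status, Match type) appear twice,
# once per side, told apart only by the suffixes.
side_by_side = pd.merge(df1, df2, on=KEYS, suffixes=("_df1", "_df2"))
print(side_by_side.columns.tolist())
```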
Tags: python-3.x pandas merge