【发布时间】:2018-01-12 19:11:12
【问题描述】:
我想加入 2 个数据帧
邮政编码数据库(前 10 个条目)
0 zip_code City State County Population
0 0 90001 Los Angeles California Los Angeles 54481
1 1 90002 Los Angeles California Los Angeles 44584
2 2 90003 Los Angeles California Los Angeles 58187
3 3 90004 Los Angeles California Los Angeles 67850
4 4 90005 Los Angeles California Los Angeles 43014
5 5 90006 Los Angeles California Los Angeles 62765
6 6 90007 Los Angeles California Los Angeles 45021
7 7 90008 Los Angeles California Los Angeles 30840
8 8 90009 Los Angeles California Los Angeles -
9 9 90010 Los Angeles California Los Angeles 1943
和 数据(前 10 个条目)
buyer zip_code
0 SWEENEY,THOMAS R & MICHELLE H NaN
1 DOUGHERTY,HERBERT III & JENNIFER M NaN
2 WEST COAST RLTY SVCS INC NaN
3 LOVE,JULIE M NaN
4 SAHAR,DAVID NaN
5 SILBERSTERN,BRADLEY E TRUST 91199
6 LEE,SUSAN & JIMMY C 92025
7 FRAZZANO REAL ESTATE I NC NaN
8 RUV INVESTMENTS LLC 91730
9 KAOS KAPITAL LLC NaN
所以决赛桌应该有 [buyer, zip_code, City, County]。我加入的是邮政编码。
data_2 = data.join(zipcode_database[['City', 'County', 'zip_code']].set_index('zip_code'), on='zip_code')
但是 city 和 county 列是 NaN 即使对于 data 中实际存在邮政编码的元组.
buyer zip_code City County
10 LANDON AVE TRUST 37736 NaN NaN NaN
11 UMAR,AHMAD NaN NaN NaN
12 3 JPS INC 90717 NaN NaN
13 T & L HOLDINGS INC 95610 NaN NaN
14 CAHP HOLDINGS LLC 90808 NaN NaN
15 REBUILDING TOGETHER LONG BEACH 92344 NaN NaN
16 COLFIN AI-CA 4 LLC NaN NaN NaN
17 GUTIERREZ,HUGO 91381 NaN NaN
18 VALBRIDGE CAP GOLDEN GATE FUND NaN NaN NaN
19 SOLARES,OSCAR 92570 NaN NaN
为什么会这样?邮政编码数据库包含从 90001 到 999950 的所有邮政编码。
我的第一个想法是两者中“zip_code”的数据类型不同:
print(zipcode_database['zip_code'].dtype)
print(data['zip_code'].dtype)
输出:
int64
object
考虑过使用astype 进行类型转换,但这不适用于NaN 值。有什么想法吗?
【问题讨论】:
标签: python python-3.x pandas join