【Question title】: Compare dfs by nearest Lon, Lat (Python, Pandas)
【Posted】: 2019-04-02 19:30:28
【Question description】:

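I have a large df1 with columns (Lon, Lat, V1, V2, V3) and a large df2 with columns (V4, V5, Lat, Lon, V6). The coordinates in the two dfs do not match exactly, and df2 has a different number of rows. I want to: 1) based on (abs(df1.Lon-df2.Lon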

df1:

Lon,Lat,V1,V2,V3
-94.9324,34.9099,5.0,66.9,46.6
-103.524,34.457,6.0,186.7,3.8
-92.5145,38.7823,4.0,188.7,273.5
-92.5143,37.3182,2.0,78.8,218.4
-92.5142,36.6965,5.0,98.5,27.7
-89.2187,36.4448,7.3,79.8,35.8

df2:

V4,V5,Lat,Lon,V6
20190329,10,35.0,-94.9,105.9
20180329,11,34.5,-103.5,305.9
20170329,15,38.7,-92.5,206.0
20160329,14,36.5,-89.22,402.1
20150329,13,36.7,-92.6,316.1
20140329,05,37.4,-92.5,290.0
20130329,05,33.8,-89.2,250.0

df3:

Lon,Lat,V1,V6
-94.9324,34.9099,5.0,105.9
-103.524,34.457,6.0,305.9
-92.5145,38.7823,4.0,206.0
-92.5143,37.3182,2.0,290.0
-92.5142,36.6965,5.0,316.1
-89.2187,36.4448,7.3,402.1

Attempts that do not work:

df3 = df1.loc[~((abs(df2.Lat - df1.Lat) <= 0.11) & (abs(df2.Lon - df1.Lon) <= 0.11))]
df3 = df1.where((abs(df1[df1.Lon] - df2[df2.Lon]) <=0.11) & (abs(df1[df1.Lat] -df2[df2.Lat]) <=0.11))
df3 = pd.merge(df1, df2, on=[(abs(df1.Lon-df2.Lon)<=0.11), (abs(df1.Lat-df2.Lat)<=0.11)], how='inner')
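None of these attempts can work as intended: pandas aligns arithmetic between two Series on their index labels instead of comparing every row of df1 with every row of df2, and `pd.merge(on=...)` only accepts column or index-level names, not boolean conditions. A small illustration using the Lat columns from the sample data above:

```python
import pandas as pd

# Lat columns of df1 (6 rows) and df2 (7 rows) from the sample data
lat1 = pd.Series([34.9099, 34.457, 38.7823, 37.3182, 36.6965, 36.4448])
lat2 = pd.Series([35.0, 34.5, 38.7, 36.5, 36.7, 37.4, 33.8])

# Values are paired by index label: df1 row 3 (37.3182) is compared only
# with df2 row 3 (36.5), never with its real match, df2 row 5 (37.4)
diff = (lat1 - lat2).abs()

# Label 6 exists only in lat2, so that position becomes NaN
print(len(diff), int(diff.isna().sum()))  # prints: 7 1
```

To compare every pair of rows, the two frames first have to be combined into all row pairs (a cross join) or searched with a spatial index.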

【Question discussion】:

    Tags: python pandas


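    【Solution 1】:

    This is possible, but it uses a cross join, which builds len(df1) × len(df2) rows, so it needs a lot of memory if the DataFrames are large: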

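    import pandas as pd

    # Cross join: pair every df1 row with every df2 row via a constant key A
    # (on pandas >= 1.2, pd.merge(df1, df2, how='cross', ...) does the same)
    df = pd.merge(df1.assign(A=1), df2.assign(A=1), on='A', how='outer', suffixes=('','_'))
    
    cols = ['Lon','Lat','V1','V6']
    # Keep pairs where both |dLat| and |dLon| are within 0.11 degrees
    df3 = df[((df.Lat_ - df.Lat).abs() <= 0.11) & ((df.Lon_ - df.Lon).abs() <= 0.11)]
    # Keep the first matching df2 row for each df1 row
    df3 = df3.drop_duplicates(subset=df1.columns)[cols]
    print(df3)
             Lon      Lat   V1     V6
    0   -94.9324  34.9099  5.0  105.9
    8  -103.5240  34.4570  6.0  305.9
    16  -92.5145  38.7823  4.0  206.0
    26  -92.5143  37.3182  2.0  290.0
    32  -92.5142  36.6965  5.0  316.1
    38  -89.2187  36.4448  7.3  402.1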
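The answer keeps the *first* df2 row that falls inside the ±0.11 box for each df1 row, which is not necessarily the *nearest* one. A sketch that keeps the closest candidate instead, assuming pandas >= 1.2 (for `how='cross'`) and measuring closeness as the larger of the two absolute differences; that metric is our choice, picked because it matches the box-shaped 0.11 tolerance. The helper column `row1` is hypothetical, added only to de-duplicate per df1 row:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Lon": [-94.9324, -103.524, -92.5145, -92.5143, -92.5142, -89.2187],
    "Lat": [34.9099, 34.457, 38.7823, 37.3182, 36.6965, 36.4448],
    "V1": [5.0, 6.0, 4.0, 2.0, 5.0, 7.3],
    "V2": [66.9, 186.7, 188.7, 78.8, 98.5, 79.8],
    "V3": [46.6, 3.8, 273.5, 218.4, 27.7, 35.8],
})
df2 = pd.DataFrame({
    "V4": [20190329, 20180329, 20170329, 20160329, 20150329, 20140329, 20130329],
    "V5": [10, 11, 15, 14, 13, 5, 5],
    "Lat": [35.0, 34.5, 38.7, 36.5, 36.7, 37.4, 33.8],
    "Lon": [-94.9, -103.5, -92.5, -89.22, -92.6, -92.5, -89.2],
    "V6": [105.9, 305.9, 206.0, 402.1, 316.1, 290.0, 250.0],
})

TOL = 0.11  # tolerance in degrees, taken from the question

# Hypothetical helper column row1 remembers which df1 row each pair came from
left = df1.reset_index().rename(columns={"index": "row1"})
df = left.merge(df2, how="cross", suffixes=("", "_"))  # every df1 row x every df2 row

dlat = (df["Lat_"] - df["Lat"]).abs()
dlon = (df["Lon_"] - df["Lon"]).abs()
# Distance = larger of the two absolute differences (box-shaped metric)
df["dist"] = pd.concat([dlat, dlon], axis=1).max(axis=1)

# Keep only pairs inside the box, then the closest df2 row per df1 row
df = df[(dlat <= TOL) & (dlon <= TOL)]
df3 = (df.sort_values("dist")
         .drop_duplicates(subset="row1")
         .sort_values("row1")[["Lon", "Lat", "V1", "V6"]]
         .reset_index(drop=True))
print(df3)
```

Sorting by `dist` before `drop_duplicates` is what makes the closest df2 row win when several fall inside the box; df1 rows with no candidate are dropped. It still materializes every pair, so the memory cost is the same as the cross join.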
    

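    【Discussion】:

    • @user2031063 - Also edited. How does it work with your real data?
    • I get a memory error on the first line (df=pd.merge..etc). I'm running 64-bit and have plenty of memory. Any idea how to fix this? Thanks
    • @user2031063 - That's what I was afraid of. How many rows are in the two DataFrames?
    • How much RAM do you have?
    • More than 1M rows in each df!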
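With more than 1M rows on each side, the cross join has more than 10^12 row pairs; a single float64 column of that length is already 8 TB (10^12 × 8 bytes), so it cannot fit in RAM regardless of a 64-bit build. A common alternative, swapped in here and not part of the original answer, is a spatial index: build a k-d tree on df2's coordinates once and query it for each df1 point. A sketch assuming SciPy is installed; `p=np.inf` makes `cKDTree.query` use the max-of-absolute-differences distance, so `distance_upper_bound=0.11` approximates the ±0.11 box (points exactly on the boundary may be excluded). Lat/Lon are treated as flat coordinates in degrees, as in the question:

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

df1 = pd.DataFrame({
    "Lon": [-94.9324, -103.524, -92.5145, -92.5143, -92.5142, -89.2187],
    "Lat": [34.9099, 34.457, 38.7823, 37.3182, 36.6965, 36.4448],
    "V1": [5.0, 6.0, 4.0, 2.0, 5.0, 7.3],
})
df2 = pd.DataFrame({
    "Lat": [35.0, 34.5, 38.7, 36.5, 36.7, 37.4, 33.8],
    "Lon": [-94.9, -103.5, -92.5, -89.22, -92.6, -92.5, -89.2],
    "V6": [105.9, 305.9, 206.0, 402.1, 316.1, 290.0, 250.0],
})

TOL = 0.11  # tolerance in degrees, taken from the question

# Build the tree once on df2's (Lon, Lat) pairs; columns are selected by name
tree = cKDTree(df2[["Lon", "Lat"]].to_numpy())

# Nearest df2 point for every df1 point under the Chebyshev metric (p=inf),
# ignoring anything farther than TOL
dist, idx = tree.query(df1[["Lon", "Lat"]].to_numpy(), k=1, p=np.inf,
                       distance_upper_bound=TOL)

# Rows with no neighbour within TOL get dist=inf and idx=len(df2)
found = np.isfinite(dist)
df3 = df1.loc[found, ["Lon", "Lat", "V1"]].copy()
df3["V6"] = df2["V6"].to_numpy()[idx[found]]
print(df3)
```

Memory grows with len(df1) + len(df2) instead of their product, and each query is a tree search rather than a scan of all of df2, which is what makes inputs with millions of rows practical.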