【Posted】:2016-09-05 22:06:45
【Question】:
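I have a DataFrame of points; the first two columns are their positions (easting and northing). I am filtering out points that lie too close to another point. I compute all pairwise distances with cdist, then filter the result to find the indices of points that are less than 0.5 apart. Before that, I have to apply two small filters to these indices. First, I remove the indices that compare a point with itself: distance[n,n] is always zero, and I don't want to delete all of my points. Second, I remove the duplicate comparisons, since distance[n,m] = distance[m,n]. Because of this symmetry, the number of points to remove is essentially doubled, so I use unique to filter out half.
My question: loc_find is a numpy array containing the indices of the rows that should be removed. How do I use this array to remove those numbered rows from my pandas DataFrame without iterating over the DataFrame?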
from scipy.spatial.distance import cdist
import numpy as np
import pandas as pd
# make points and calculate distances
# (data is the existing DataFrame with 'easting' and 'northing' columns)
east=data['easting'].values
north=data['northing'].values
points=np.vstack((east,north)).T
distances=cdist(points,points) # big row x row matrix
zzzz=np.where(distances<0.5)
loc_dist=np.vstack((zzzz[0],zzzz[1])).T # array of index pairs whose points are
# too close together and will be filtered; it still contains unwanted
# comparisons, such as data[1,1] with data[1,1], which is always zero
# since it is the same point. Also, distance [1,2] is the same as [2,1]
# my code for filtering the indices
loc_dist=loc_dist.astype('int')
diff_loc=zzzz[0]-zzzz[1] # zero where a point is compared with itself,
# i.e. distance [n,n]
diff_zero=np.where(diff_loc==0)
loc_dist_s=np.delete(loc_dist, diff_zero[0],axis=0)
loc_find=np.unique(loc_dist_s) # remove indices for similar distance
# comparisons distance [n,m] = distance [m,n]
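One thing worth checking in the filtering above: np.unique flattens the two-column array, so loc_find ends up holding both members of every close pair, not half of them. A minimal sketch of one alternative follows, assuming the goal is to keep one point from each close pair. The sample frame is hypothetical, and dropping the later point of each pair is an assumed policy, not something stated in the question.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist

# Hypothetical sample data standing in for the asker's `data` frame
data = pd.DataFrame({
    "easting":  [0.0, 0.1, 5.0, 5.2, 10.0],
    "northing": [0.0, 0.1, 5.0, 5.1, 10.0],
    "value":    [1, 2, 3, 4, 5],
})

points = data[["easting", "northing"]].to_numpy()
distances = cdist(points, points)  # full row x row distance matrix

# Keep only the strict upper triangle (row < col): this skips the zero
# diagonal [n, n] and sees each symmetric pair [n, m] / [m, n] once.
rows, cols = np.where(np.triu(distances < 0.5, k=1))

# Assumed policy: for each close pair, mark the later point for removal.
loc_find = np.unique(cols)
print(loc_find)
```

With this sample data the close pairs are (0, 1) and (2, 3), so only positions 1 and 3 are marked, while the original filtering would mark all four.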
【Comments】:
df.drop(loc_find) should work
@EdChum Thanks for the suggestion about loc_find; it led me to another issue on GitHub that has the answer.
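A caveat on the drop suggestion: DataFrame.drop matches index labels, whereas np.where returns row positions. The two coincide only when the frame has the default 0..n-1 index. The sketch below, using a hypothetical frame with a non-default index, shows two ways to remove rows by position without iterating.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a non-default index, to make labels != positions
data = pd.DataFrame(
    {"easting": [0.0, 0.1, 5.0, 5.2], "northing": [0.0, 0.1, 5.0, 5.1]},
    index=[10, 11, 12, 13],
)
loc_find = np.array([1, 3])  # row positions to remove

# Option 1: translate positions into labels, then drop by label
filtered = data.drop(data.index[loc_find])

# Option 2: build a boolean mask over positions and keep the rest
keep = np.ones(len(data), dtype=bool)
keep[loc_find] = False
filtered_mask = data[keep]

print(filtered)
```

Both options leave the rows labelled 10 and 12. The mask version avoids any label lookup, which may matter if the index contains duplicate labels.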
Tags: python pandas indexing dataframe filtering