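【Posted】: 2019-07-02 04:45:03
【Question】:
My task is to find the nearest neighbors of certain points and delete the other points that are not nearest neighbors. The task is similar to a downsampling problem.
Code so far: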
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from scipy import spatial
data = pd.read_csv('data.csv')
majority = data.loc[data['class']==0]
minority = data.loc[data['class']==1]
majority_points=majority.drop('class', axis=1)
minority_points=minority.drop('class', axis=1)
all_data = pd.concat([majority,minority])
data_points = all_data.drop('class', axis=1)
# print(data_points)
majority_points=np.array(majority_points)
print (majority_points)
minority_points =np.array(minority_points)
print (minority_points)
#result
[[1 1]
[1 2]
[1 3]
[1 4]
[1 5]
[2 1]
[2 2]
[2 4]
[2 5]
[3 1]
[3 2]
[3 5]
[4 1]
[4 4]
[4 5]
[5 1]
[5 2]
[5 3]
[5 4]
[5 5]] (20, 2)
[[2 3]
[3 3]
[3 4]
[4 2]
[4 3]]
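Since `data.csv` isn't included, here is a minimal sketch that rebuilds the same two arrays from the points printed above, so the rest of the code can be run. It assumes the CSV holds two feature columns plus a `class` label. The column names `x` and `y` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.csv: the column names "x" and "y" are assumed;
# the coordinates are the majority/minority points printed above.
majority_xy = [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5),
               (2, 1), (2, 2), (2, 4), (2, 5),
               (3, 1), (3, 2), (3, 5),
               (4, 1), (4, 4), (4, 5),
               (5, 1), (5, 2), (5, 3), (5, 4), (5, 5)]
minority_xy = [(2, 3), (3, 3), (3, 4), (4, 2), (4, 3)]

data = pd.DataFrame(
    [(x, y, 0) for x, y in majority_xy] + [(x, y, 1) for x, y in minority_xy],
    columns=["x", "y", "class"],
)

# Same filtering as in the question: split by class, then drop the label column.
majority_points = data.loc[data["class"] == 0].drop("class", axis=1).to_numpy()
minority_points = data.loc[data["class"] == 1].drop("class", axis=1).to_numpy()

print(majority_points.shape, minority_points.shape)  # (20, 2) (5, 2)
```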
#to find nearest neighbor
from scipy.spatial import distance
Y = distance.cdist(minority_points, majority_points, 'euclidean')
K = np.argsort(Y)
print (Y)
print ("Ordered data: \n", K)
Y.sort()
print ("After sort: \n", Y)
#result
[[2.23606798 1.41421356 1. 1.41421356 2.23606798 2.
1. 1. 2. 2.23606798 1.41421356 2.23606798
2.82842712 2.23606798 2.82842712 3.60555128 3.16227766 3.
3.16227766 3.60555128]
[2.82842712 2.23606798 2. 2.23606798 2.82842712 2.23606798
1.41421356 1.41421356 2.23606798 2. 1. 2.
2.23606798 1.41421356 2.23606798 2.82842712 2.23606798 2.
2.23606798 2.82842712]
[3.60555128 2.82842712 2.23606798 2. 2.23606798 3.16227766
2.23606798 1. 1.41421356 3. 2. 1.
3.16227766 1. 1.41421356 3.60555128 2.82842712 2.23606798
2. 2.23606798]
[3.16227766 3. 3.16227766 3.60555128 4.24264069 2.23606798
2. 2.82842712 3.60555128 1.41421356 1. 3.16227766
1. 2. 3. 1.41421356 1. 1.41421356
2.23606798 3.16227766]
[3.60555128 3.16227766 3. 3.16227766 3.60555128 2.82842712
2.23606798 2.23606798 2.82842712 2.23606798 1.41421356 2.23606798
2. 1. 2. 2.23606798 1.41421356 1.
1.41421356 2.23606798]]
Ordered data:
[[ 2 6 7 1 3 10 5 8 0 13 11 9 4 12 14 17 16 18 15 19]
[10 6 7 13 9 2 17 11 1 3 5 8 18 12 14 16 0 4 15 19]
[ 7 11 13 8 14 3 18 10 19 2 4 17 6 1 16 9 12 5 15 0]
[16 10 12 9 17 15 6 13 5 18 7 1 14 0 2 11 19 8 3 4]
[17 13 16 10 18 14 12 9 15 11 19 7 6 5 8 2 3 1 4 0]]
After sort:
[[1. 1. 1. 1.41421356 1.41421356 1.41421356
2. 2. 2.23606798 2.23606798 2.23606798 2.23606798
2.23606798 2.82842712 2.82842712 3. 3.16227766 3.16227766
3.60555128 3.60555128]
[1. 1.41421356 1.41421356 1.41421356 2. 2.
2. 2. 2.23606798 2.23606798 2.23606798 2.23606798
2.23606798 2.23606798 2.23606798 2.23606798 2.82842712 2.82842712
2.82842712 2.82842712]
[1. 1. 1. 1.41421356 1.41421356 2.
2. 2. 2.23606798 2.23606798 2.23606798 2.23606798
2.23606798 2.82842712 2.82842712 3. 3.16227766 3.16227766
3.60555128 3.60555128]
[1. 1. 1. 1.41421356 1.41421356 1.41421356
2. 2. 2.23606798 2.23606798 2.82842712 3.
3. 3.16227766 3.16227766 3.16227766 3.16227766 3.60555128
3.60555128 4.24264069]
[1. 1. 1.41421356 1.41421356 1.41421356 2.
2. 2.23606798 2.23606798 2.23606798 2.23606798 2.23606798
2.23606798 2.82842712 2.82842712 3. 3.16227766 3.16227766
3.60555128 3.60555128]]
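`Y.sort()` sorts each row in place, so the sorted distances no longer record which majority point each value belongs to; that information lives only in `K`. Here is a minimal sketch that keeps indices and distances together for the k = 3 nearest neighbors. It assumes ties should be broken by the lower majority index (hence `kind="stable"`). The default sort can order equidistant points differently, as the `K` printed above shows (for example `[16 10 12 ...]` in the fourth row).

```python
import numpy as np
from scipy.spatial import distance

# The points printed above (majority = class 0, minority = class 1).
majority_points = np.array([[1, 1], [1, 2], [1, 3], [1, 4], [1, 5],
                            [2, 1], [2, 2], [2, 4], [2, 5],
                            [3, 1], [3, 2], [3, 5],
                            [4, 1], [4, 4], [4, 5],
                            [5, 1], [5, 2], [5, 3], [5, 4], [5, 5]])
minority_points = np.array([[2, 3], [3, 3], [3, 4], [4, 2], [4, 3]])

k = 3
Y = distance.cdist(minority_points, majority_points, "euclidean")

# Column indices of each row sorted by distance; "stable" breaks ties by the
# lower majority index, so equidistant neighbors come out in a fixed order.
K = np.argsort(Y, axis=1, kind="stable")[:, :k]
# Matching distances, in the same order as K (index/distance pairing kept).
D = np.take_along_axis(Y, K, axis=1)

for point, idx, d in zip(minority_points, K, D):
    print(point, "->", majority_points[idx].tolist(), np.round(d, 3).tolist())
```

With this data, `K` comes out as `[[2, 6, 7], [10, 6, 7], [7, 11, 13], [10, 12, 16], [13, 17, 10]]`.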
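For each minority point, I want to find its 3 nearest neighbors among the majority points, keep those majority points (with their array values), and delete the rest.
Here is an illustration:
The red points are the minority examples and the blue points are the majority examples. Each minority point computes, for example, its 3 nearest neighbors among the majority points. The algorithm then removes the majority points that are not among those nearest neighbors.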
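Here is a minimal sketch of that removal step, reusing the question's `cdist` + `argsort` approach. A majority point is kept if it is among the 3 nearest majority neighbors of at least one minority point, and every other majority point is removed. The stable tie-break is an assumption, because several majority points are equidistant here. The names `keep`, `removed`, `X` and `y` are illustrative.

```python
import numpy as np
from scipy.spatial import distance

# The points printed in the question.
majority_points = np.array([[1, 1], [1, 2], [1, 3], [1, 4], [1, 5],
                            [2, 1], [2, 2], [2, 4], [2, 5],
                            [3, 1], [3, 2], [3, 5],
                            [4, 1], [4, 4], [4, 5],
                            [5, 1], [5, 2], [5, 3], [5, 4], [5, 5]])
minority_points = np.array([[2, 3], [3, 3], [3, 4], [4, 2], [4, 3]])

k = 3
Y = distance.cdist(minority_points, majority_points, "euclidean")
nearest = np.argsort(Y, axis=1, kind="stable")[:, :k]  # shape (n_minority, k)

# Union of all neighbor indices: a majority point shared by several
# minority points is kept only once.
keep = np.unique(nearest)
removed = np.setdiff1d(np.arange(len(majority_points)), keep)
kept_majority = majority_points[keep]

# Downsampled, labeled data set: kept majority (class 0) + all minority (class 1).
X = np.vstack([kept_majority, minority_points])
y = np.concatenate([np.zeros(len(kept_majority), dtype=int),
                    np.ones(len(minority_points), dtype=int)])

print("kept:", kept_majority.tolist())
print("removed:", majority_points[removed].tolist())
print(X.shape, np.bincount(y))
```

Because nearby minority points share neighbors, this keeps 9 of the 20 majority points rather than 5 × 3 = 15. For larger data sets, `scipy.spatial.KDTree(majority_points).query(minority_points, k=3)` returns a (distances, indices) pair of the same shape without building the full distance matrix, although its ordering among equidistant points may differ from the stable argsort.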
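【Discussion】:
-
The question is unclear.
-
Which part don't you understand? @AkshaySehgal
-
How is your data stored? Do you have any code yet? Showing what you have already tried is essential to asking a good question.
-
I've added it @Acoop
-
Did you see my answer?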