【发布时间】:2022-01-18 18:16:03
【问题描述】:
如果想在 python 中的大 .csv 文件中获得最接近的匹配。我的(缩短的).csv 文件是:
0,4,5,0,132,24055,0,64,6,23215,39635,22,21451751,3233419908,8,0,4126,368,15087,0
0,4,5,16,52,22607,0,64,6,24727,22,39635,3233439332,21453192,8,0,26,501,28207,0
1,4,5,0,40,1727,0,128,6,29216,62281,22,123196295,3338477204,5,0,26,513,30738,0
0,4,5,0,116,24108,0,64,6,23178,39635,22,21452647,3233437508,8,0,4126,644,61163,0
0,4,5,0,724,32046,0,64,6,14632,38655,22,1452688218,1828171762,8,0,4126,343,31853,0
0,4,5,0,76,26502,0,128,6,4405,50266,22,1776918274,3172205875,5,0,4126,512,9381,0
1,4,5,0,40,7662,0,64,6,39665,22,62202,3176642698,3972914889,5,0,26,501,63331,0
1,4,5,0,52,939,0,128,6,29992,62206,22,1466629610,0,8,0,44,64240,43460,0
0,4,5,16,76,10076,0,64,6,37199,22,50268,4016221794,718292575,5,0,4126,501,310,0
0,4,5,0,40,26722,0,128,6,4221,50270,22,38340335,3852724687,5,0,26,510,36549,0
0,4,5,0,76,26631,0,128,6,4276,50266,22,1776920362,3172222235,5,0,4126,511,61692,0
0,4,5,16,148,38558,0,64,6,8680,22,37221,2019795091,3598991383,8,0,4126,501,9098,0
0,4,5,0,52,24058,0,64,6,23292,39635,22,21452135,3233420036,8,0,26,368,38558,0
0,4,5,16,76,10249,0,64,6,37026,22,50266,3172221011,1776919966,5,0,4126,501,31557,0
0,4,5,16,212,38490,0,64,6,8684,22,37221,2019776067,3598991175,8,0,4126,501,56063,0
0,4,5,0,60,0,0,64,6,47342,22,44751,2722242689,3606442876,10,0,4426,65160,29042,0
0,4,5,16,76,10234,0,64,6,37041,22,50266,3172220319,1776919498,5,0,4126,501,49854,0
1,4,5,0,1016,1737,0,128,6,28230,62273,22,3387237183,3449598142,5,0,4126,513,49536,0
1,4,5,0,40,20630,0,64,6,26697,22,62288,4040909519,95375909,5,0,26,501,36104,0
0,4,5,16,180,22591,0,64,6,24615,22,39635,3233437764,21452775,8,0,4126,501,28548,0
0,4,5,0,52,31654,0,64,6,15696,47873,22,3476257438,205382502,8,0,26,368,59804,0
1,4,5,0,320,20922,0,64,6,26125,22,62195,2187234888,2519273239,5,0,4126,501,52263,0
0,4,5,0,1132,22526,0,64,6,23744,22,39635,3233417124,21450447,8,0,4126,509,12391,0
1,4,5,0,52,0,0,64,6,47315,22,62282,3209938138,2722777338,8,0,4426,64240,36683,0
0,4,5,0,52,3091,0,64,6,44259,22,38655,1828172842,1452688914,8,0,26,504,7425,0
0,4,5,16,132,10184,0,64,6,37035,22,50266,3172212167,1776918310,5,0,4126,501,44260,0
0,4,5,16,256,10167,0,64,6,36928,22,50266,3172210503,1776918310,5,0,4126,501,19165,0
1,4,5,0,120,2043,0,128,6,28820,62294,22,644393448,2960970388,5,0,4126,512,36939,0
0,4,5,16,196,38575,0,64,6,8615,22,37221,2019796627,3598991543,8,0,4126,501,29587,0
0,4,5,16,148,22599,0,64,6,24639,22,39635,3233438532,21452967,8,0,4126,501,41316,0
1,4,5,0,88,1733,0,128,6,29162,62267,22,872073945,3114048214,5,0,4126,508,23918,0
我已经做了一个程序,但它还没有完成,我不知道我该如何完成它。我必须使用其他程序吗?:
with open("<dir>", "r") as file:
file = file.readlines()
len_ = len(file)
string = "4,5,0,52,32345,0,64,6,15005,37221,22,3598991799,2019801315,8,0,26,691,17176,0" #The string, that I want to find the neares data in the .csv data.
list_ = []
for i in range(1, len_):
item = str(file[i])
item2 = item[2:]
list_.append(item2)
for item in list_:
算法:在行上从左到右查找,找到与搜索数据连续匹配最多的行。
【问题讨论】:
-
用什么方式最接近的匹配?由于这是 csv,您是要匹配来自多个列的值,还是要像字符串一样匹配它们?这里的预期结果是什么?
-
期望值应该是csv文件中的第一个,在本例中为0或1
-
输出应该是第一个值。我很想获得所有其他值的最接近匹配。 @JCaesar
-
您需要明确定义“最近匹配”的含义。