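[Posted]: 2022-01-22 05:02:58
[Question]:
This is my proof-of-concept script with hard-coded items. I wrote it to compare a list of user-entered addresses against a list of unique items extracted from the county's address list. Using the street name, I use difflib to find the closest correct address, in order to clean up common misspellings, incorrect road names, and formatting. I don't understand why this isn't writing correctly. Any help would be great. The output doesn't need to be a .txt; that's just what I'm using for practice.
It seems like a simple mistake, but I can't figure it out. When I run the print statement in my IDE, the results are perfect:
OBJECTID,LONG,LAT,ADDRESS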
1,-121.5013397,38.57353936,624 Q ST
2,-121.4889809,38.58067826,1229 I ST
3,-121.6252964,38.68504066,7208 W ELKHORN BLVD
4,-121.4648967,38.57105638,3145 GRANADA WY
5,-121.5034945,38.56493704,731 BROADWAY
6,-121.4643582,38.54432866,3301 MARTIN LUTHER KING JR BLVD
7,-121.4267998,38.46806583,6500 WYNDHAM DR
8,-121.4277157,38.56776765,5990 H ST
9,-121.4261309,38.52390186,5642 66TH ST
10,-121.5312586,38.49791376,785 FLORIN ROAD
11,-121.4836172,38.53385557,4500 24TH ST
12,-121.5182376,38.51647637,1100 43RD AV
13,-121.4826673,38.59115124,1341 N C STREET
14,-121.497416,38.615358,1640 W EL CAMINO
15,-121.4798681,38.49076918,7363 24TH ST
16,-121.435397,38.64776157,1311 BELL AVE
17,-121.435397,38.64776157,
18,-121.479827,38.64700504,746 NORTH MARKET BLVD
19,-121.4275146,38.59602966,1700 CHALLENGE WY
20,-121.4476495,38.61318471,2512 RIO LINDA BLVD
21,-121.5036868,38.67119467,1901 CLUB CENTER DR
22,-121.54029,38.6446808,4201 EL CENTRO RD
23,-121.4656495,38.51005465,3720 47TH AVE
24,-121.4538398,38.48538997,7927 EAST PARKWAY
25,-121.3928243,38.54872313,3301 JULLIARD DR
26,-121.4656495,38.51005465,
In the .txt it writes, all I get is:
26,-121.4656495,38.51005465,
Here is what I have:
import csv
import usaddress
import difflib
def cls_name(stname, list):
    c = difflib.get_close_matches(
        stname,
        list)[0]
    return str(c)
unqlst = ['Q ST', 'I ST', 'W ELKHORN BLVD', 'GRANADA WY', 'BROADWAY', 'MARTIN LUTHER KING JR BLVD', 'WYNDHAM DR',
          'H ST', '66TH ST', 'FLORIN ROAD', '24TH ST', '43RD AV', 'N C STREET', 'W EL CAMINO', '24TH ST', 'BELL AVE',
          'NORTH MARKET BLVD', 'CHALLENGE WY', 'RIO LINDA BLVD', 'CLUB CENTER DR', 'EL CENTRO RD', '47TH AVE',
          'EAST PARKWAY']
# path = r'C:\Users\Michael\Desktop\Fire_Stations.csv'
# with open(path) as file:
with open(r"C:\Users\Michael\Desktop\Fire_Stations.csv") as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    # the below statement will skip the first row
    next(csv_reader)
    for line in csv_file:
        line = line.split(',')
        addys = line[3]
        # addys = addys.strip('\n')
        addys = addys.upper()
        addys = usaddress.tag(addys)  # prototype: getting parcel address w/o numbers for phase one cleaning, only
        try:
            rdnum = addys[0]['AddressNumber']  # Needed Try/Except I think because first title line
        except KeyError:
            rdnum = ''
        try:
            rsdir = addys[0]['StreetNamePreDirectional']
        except KeyError:
            rsdir = ''
        try:
            rdname = addys[0]['StreetName']
        except KeyError:
            rdname = ''
        try:
            rddsg = addys[0]['StreetNamePostType']
        except KeyError:
            rddsg = ''
        wrdsrdname = (rsdir, rdname, rddsg)
        wrdsrdname = " ".join(wrdsrdname)
        wrdsrdname = wrdsrdname.strip()
        try:
            if wrdsrdname in unqlst:  # if roadname is in the unique list from counties file, do nothing, if not find closest in list
                # print('ADDRESS CORRECT')
                pass  # print(rdname)
            else:
                wrdsrdname = cls_name(wrdsrdname, unqlst)  # calling fuction to find closest name
                # print('ADDRESS INCORRECT BUT FIXED')
                # print(rdname)
        except:
            # print('error002: no address match')
            pass
        tgthr = (rdnum, wrdsrdname)
        final = (' '.join(tgthr))
        # print(final)
        # header = ['OBJECTID', 'LONG', 'LAT', 'ADDRESS']
        data = [line[0], line[1], line[2], final]
        data = ','.join(data)
        print(data)
with open('Fire_Stations.txt', 'w') as f:
    f.write(data)
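As a side illustration of the matching step in cls_name: difflib.get_close_matches returns an empty list when nothing clears its similarity cutoff, so indexing the result with [0] raises an IndexError (which the bare except in the script would silently swallow). The following is only a minimal sketch of one way to make that case explicit. The helper name closest_street, the shortened KNOWN_STREETS list, the misspelled inputs, and the choice to fall back to the unchanged input are all assumptions for illustration, not part of the original script.

```python
import difflib

# A shortened copy of the question's unqlst, used only for this illustration.
KNOWN_STREETS = ['Q ST', 'I ST', 'W ELKHORN BLVD', 'GRANADA WY', 'BROADWAY', 'FLORIN ROAD']


def closest_street(name, candidates, cutoff=0.6):
    """Return the closest candidate, or the input unchanged if nothing is close enough."""
    # n=1 asks for at most one match; the result is an empty list when no
    # candidate reaches the cutoff, so check before indexing.
    matches = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else name


fixed = closest_street('GRENADA WY', KNOWN_STREETS)   # misspelling of a known street
unknown = closest_street('ZZZZZZ', KNOWN_STREETS)     # nothing similar in the list
print(fixed)
print(unknown)
```

Whether an unmatched name should pass through unchanged, be left blank, or be flagged for review depends on how the cleaned file will be used.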
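[Comments]:
- You can open both files in the same with statement and then write each line inside the for loop. Right now you only write after the for loop has finished, which is why you only get the last line. Side note: all the try/except KeyError blocks can be replaced with calls to .get() like this: addys[0].get('StreetNamePostType'), which does not raise a KeyError if the key is not found.
- @mechanical_meat Thanks! Opening the files in the same line works! But with .get(), I still get None back when what I want is for nothing to be returned. Is there a way to use .get() and return nothing if the key isn't found?
- Oh, I see. Yes, you can use addys[0].get('StreetNamePostType','') to get an empty string when the key isn't found.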
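The first suggestion in the comments (open both files in one with statement and write inside the loop) might look roughly like the sketch below. It is not the commenter's code: the function name clean_rows, the temporary sample file, and the use of csv.reader and csv.writer for both reading and writing are assumptions made for illustration, and the usaddress/difflib cleaning step is replaced by a simple upper-casing placeholder so the example runs without third-party packages.

```python
import csv
import os
import tempfile


def clean_rows(src_path, dst_path):
    """Copy rows from src_path to dst_path, writing each row as it is processed."""
    # A single with statement opens both files, so the output file stays
    # open for the whole loop instead of being written once at the end.
    with open(src_path, newline='') as src, open(dst_path, 'w', newline='') as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        writer.writerow(next(reader))  # copy the header row unchanged
        for row in reader:
            # Placeholder for the address-cleaning step from the question.
            address = row[3].strip().upper()
            # The write happens inside the loop, once per input row.
            writer.writerow([row[0], row[1], row[2], address])


# Small demonstration using a made-up two-row sample in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, 'Fire_Stations.csv')
    dst_path = os.path.join(tmp, 'Fire_Stations.txt')
    with open(src_path, 'w', newline='') as f:
        f.write('OBJECTID,LONG,LAT,ADDRESS\n'
                '1,-121.5013397,38.57353936,624 q st\n'
                '2,-121.4889809,38.58067826,1229 I ST\n')
    clean_rows(src_path, dst_path)
    with open(dst_path, newline='') as f:
        written = f.read().splitlines()

print(written)
```

Using csv.writer also takes care of the line terminator for each row; with a plain f.write(data) inside the loop, a newline would need to be appended to each line by hand.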
Tags: python