【Posted】: 2018-05-24 14:25:53
【Question】:
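I am trying to write a Python script that takes any CSV file, runs it through a geocoder, and then writes the resulting geocoding attributes (plus all the data from the original file) to a new CSV file.
My code so far is below. Everything works as expected except for combining the geocoding attributes with the data from the original CSV file. What currently happens is that all of a row's original field values show up in the output CSV as a single value (although the geocoding attributes are written correctly). The problem is at the end of the script. I have omitted the code for the various classes for brevity.
I should also note that I am using hasattr because, although I don't know all of the fields in the original in_file, I do know that the fields required for geocoding will appear somewhere in the input CSV.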
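To make the hasattr idea concrete, here is a minimal sketch of looking up a value under several possible field names. The helper name `first_attr` and the sample row (including the street address) are made up for illustration; they are not part of my script.

```python
from collections import namedtuple


def first_attr(row, *names, default=None):
    """Return the first attribute of `row` found among `names`, else `default`."""
    for name in names:
        if hasattr(row, name):
            return getattr(row, name)
    return default


# Hypothetical row whose headers use the alternative spellings.
Row = namedtuple("Row", ["uid", "occupant", "address1", "city", "st", "zip"])
row = Row("100001", "Playstation Theater", "123 Example St", "New York", "NY", "10036")

address = first_attr(row, "address", "address1")
state = first_attr(row, "state", "st")
zip_code = first_attr(row, "zipcode", "zip")
country = first_attr(row, "country", default="")  # no such column, falls back
```

A helper like this also avoids an unbound variable when none of the expected names is present, which the if/elif chains in my script do not guard against.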
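Initially I tried changing "new_file.writerow([])" to "new_file.writerow()". With that change the input row (r) was written to the CSV file correctly, but the geocoding attributes could no longer be written, because they were treated as additional arguments.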
import csv
import re
import time
from collections import namedtuple

import requests

# Address and GeocodedAddress are the classes omitted for brevity.


def locate(file=None):
    """Locate by geocoding func."""
    start_time = time.time()
    count = 0
    if file is not None:
        with open(file) as in_file:
            f_csv = csv.reader(in_file)
            # Regex the headers and lowercase them to standardize for the hasattr checks.
            headers = [re.sub(r'["\s+]', '_', h).lower() for h in next(f_csv)]
            # Used namedtuple for headers
            Row = namedtuple('Row', headers)
            # for row in file
            for r in f_csv:
                count += 1
                # set row values to named tuple values
                row = Row(*r)
                # Try hasattr to find field names address, city, state, zipcode
                if hasattr(row, 'address'):
                    address = row.address
                elif hasattr(row, 'address1'):
                    address = row.address1
                if hasattr(row, 'city'):
                    city = row.city
                if hasattr(row, 'state'):
                    state = row.state
                elif hasattr(row, 'st'):
                    state = row.st
                if hasattr(row, 'zipcode'):
                    zipCode = row.zipcode
                elif hasattr(row, 'zip'):
                    zipCode = row.zip
                # Create new address object
                addressObject = Address(address, city, state, zipCode)
                # Get response from api
                data = requests.get(str(addressObject)).json()
                try:
                    data['geocodeStatusCode'] = int(data['geocodeStatusCode'])
                except:
                    data['geocodeStatusCode'] = None
                if data['geocodeStatusCode'] == 'SomeNumber':
                    # geocoded address ideally uses parent class attributes
                    geocodedAddressObject = GeocodedAddress(addressObject.address, addressObject.city, addressObject.state, addressObject.zipCode, data['addressGeo']['latitude'], data['addressGeo']['longitude'], data['addressGeo']['score'])
                else:
                    geocodedAddressObject = GeocodedAddress(addressObject.address, addressObject.city, addressObject.state, addressObject.zipCode)
                # Problem Area
                geocoded_file = file.replace('.csv', '_geocoded2') + '.csv'
                with open(geocoded_file, 'a', newline='') as geocoded:
                    # Problem area -- the r (row) value is written entirely into one cell even though
                    # its items are comma separated. The geocoding attributes do write correctly.
                    new_file = csv.writer(geocoded)
                    new_file.writerow([r, geocodedAddressObject.latitude, geocodedAddressObject.longitude, geocodedAddressObject.geocodeScore])
        print('The time to geocode {} records: {}'.format(count, (time.time() - start_time)))
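The behaviour in the problem area can be reproduced in isolation. The sketch below is not part of the script above: it writes to an in-memory buffer, the helper `write_rows` is a made-up name, and the latitude, longitude, and score are placeholder values. It compares nesting the row list inside another list with concatenating the two lists before calling writerow.

```python
import csv
import io


def write_rows(rows):
    """Write rows with csv.writer into a string buffer and return the text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()


r = ["100001", "Playstation Theater", "New York", "NY", "10036"]
lat, lon, score = 45.1234, -110.4567, 100  # placeholder geocoding values

# Nested: r is a single item, so csv.writer stringifies the whole list into one cell.
nested = write_rows([[r, lat, lon, score]])

# Flat: concatenating gives one list in which every value is its own cell.
flat = write_rows([r + [lat, lon, score]])

print(nested)
print(flat)
```

writerow accepts exactly one iterable, which is probably why passing the row and the geocoding values as separate arguments did not work either.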
Sample CSV input data:
"UID", "Occupant", "Address", "City", "State", "ZipCode"
"100001", "Playstation Theater", "New York", "NY", "10036"
"100002", "Ed Sullivan Theater", "New York", "NY", "10019"
Sample CSV output (the additional fields are resolved during geocoding):
"UID", "Occupant", "Address", "City", "State", "ZipCode", "GeoCodingLatitude", "GeoCodingLongitude", "GeoCodingScore"
"100001", "Playstation Theater", "New York", "NY", "10036", "45.1234", "-110.4567", "100"
"100002", "Ed Sullivan Theater", "New York", "NY", "10019", "44.1234", "-111.4567", "100"
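【Discussion】:
- It sounds like you should be using DictReader. Showing expected versus actual output and sample input would help.
- @MarkTolonen If you think I should add anything else to the question, in style or content, to make it easier to answer, please let me know. Thanks!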
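For reference, the DictReader suggestion might look roughly like the sketch below. It is only an illustration under assumptions: `fake_geocode` is a hypothetical stand-in that returns fixed values instead of calling a real geocoder, `geocode_csv` is a made-up name, the input is an in-memory string, and the street address is invented.

```python
import csv
import io

EXTRA_FIELDS = ["GeoCodingLatitude", "GeoCodingLongitude", "GeoCodingScore"]


def fake_geocode(row):
    """Hypothetical stand-in for the real geocoder call; returns fixed values."""
    return {
        "GeoCodingLatitude": 45.1234,
        "GeoCodingLongitude": -110.4567,
        "GeoCodingScore": 100,
    }


def geocode_csv(in_handle, out_handle, geocode=fake_geocode):
    """Copy every input column and append the geocoding columns."""
    # skipinitialspace lets the reader cope with a space after each comma.
    reader = csv.DictReader(in_handle, skipinitialspace=True)
    fieldnames = list(reader.fieldnames) + EXTRA_FIELDS
    writer = csv.DictWriter(out_handle, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        row.update(geocode(row))  # add the three geocoding values to the row
        writer.writerow(row)


source = io.StringIO(
    '"UID", "Occupant", "Address", "City", "State", "ZipCode"\n'
    '"100001", "Playstation Theater", "123 Example St", "New York", "NY", "10036"\n'
)
out = io.StringIO()
geocode_csv(source, out)
print(out.getvalue())
```

Because each row is a mapping keyed by header, the original columns are carried over without knowing their names in advance, and the output file is opened and its header written once instead of on every row.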
Tags: csv python-3.6 namedtuple