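【Posted】: 2018-01-07 02:49:17
【Question】:
I have a DataFrame with more than 10^6 rows, and I am simply converting lat (degrees and decimal minutes) to lat (decimal degrees only). However, some rows in my frame contain the string 'p-', which kills my loop early on. I have tried a few things (below).
Code: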
import pandas as pd
import numpy as np
import glob
import matplotlib.pyplot as plt
path = r'/home/engr/Documents/SchoolHR/Data/SFSU-Boat/SBE45m/2015/'
allfiles_list = glob.glob(path + "/15*.hex")
allfiles_list = sorted(allfiles_list)
col = ["temp", "conduct", "salinity", "lat", "lon", "hms", "dmy"]
big_frame = pd.DataFrame()
for name in allfiles_list:
    df = pd.read_csv(name, skiprows=12, encoding="latin1", names=col, na_values=0, na_filter=False, engine="c")
    big_frame = big_frame.append(df)
# TODO surgery on columns to convert to float for use on big_frame
# regex \D to remove any non-digit characters -- hms & dmy
big_frame["hms"].replace(regex=True,inplace=True,to_replace=r'\D',value=r'')
big_frame["dmy"].replace(regex=True,inplace=True,to_replace=r'\D',value=r'')
big_frame["temp"].replace(regex=True,inplace=True,to_replace='(\D.\=)',value='')
big_frame["conduct"].replace(regex=True,inplace=True,to_replace='(\D.\=)',value='')
big_frame["salinity"].replace(regex=True,inplace=True,to_replace='(\D.\=)',value='')
big_frame["lat"].replace(regex=True,inplace=True,to_replace='[lonat=]',value='')
big_frame["lon"].replace(regex=True,inplace=True,to_replace='[lonat=]',value='')
for index, row in big_frame.iterrows():
    if row.lat[-1] == 'N':
        D = float(row.lat[1:3])
        M = float(row.lat[4:10])
        DD = D + float(M/60)
        row.lat = DD
    if row.lon[-1] == 'W':
        D1 = float(row.lon[1:4])
        M1 = float(row.lon[5:12])
        DD1 = D1 + float(M1/60)
        row.lon = -DD1
The code returns this error:
ValueError: could not convert string to float: 'p-'
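The error suggests that `float()` received a slice of a cell that is not numeric. As a minimal sketch, assuming the cleaned cells look like " 37 46.985 N" or " 122 15.544 W" (as in the sample data) and that malformed cells should become NaN instead of stopping the loop, a regex-based converter might look like the following. The name `dm_to_decimal` and the pattern are hypothetical and not part of the original code.

```python
import math
import re

# Hypothetical pattern: "<degrees> <decimal minutes> <hemisphere letter>".
_DM_PATTERN = re.compile(r"^\s*(\d{1,3})\s+(\d+(?:\.\d+)?)\s*([NSEW])\s*$")


def dm_to_decimal(text):
    """Convert 'DD MM.mmm H' to signed decimal degrees, or NaN if malformed."""
    match = _DM_PATTERN.match(str(text))
    if match is None:
        # Junk such as 'p-' ends up here instead of raising ValueError.
        return math.nan
    degrees, minutes, hemisphere = match.groups()
    value = float(degrees) + float(minutes) / 60.0
    # South and west are negative by convention.
    return -value if hemisphere in "SW" else value


print(dm_to_decimal(" 37 46.985 N"))
print(dm_to_decimal(" 122 15.544 W"))
print(dm_to_decimal("p-"))
```

Matching the whole cell, instead of slicing at fixed positions, means a cell with an unexpected layout is rejected as a whole and never partially parsed.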
I tried modifying the code like this and then running the loop on the DataFrame:
big_frame['lon'] = big_frame.lon.str.replace('p-?' , '')
big_frame['lat'] = big_frame.lat.str.replace('p-?' , '')
big_frame["lat"].replace(regex=True,inplace=True,to_replace='[)]',value='')
big_frame["lon"].replace(regex=True,inplace=True,to_replace='[)]',value='')
But then I just got this:
IndexError: string index out of range
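One plausible cause, assuming the offending cells hold nothing except 'p-': once that text is removed, the cell is an empty string, and `cell[-1]` on an empty string raises exactly this IndexError. The sketch below reproduces this with plain strings and made-up cell values, not the real data.

```python
import re

# Made-up cell values: one well-formed latitude and one junk cell.
cells = [" 37 46.985 N", "p-"]
cleaned = [re.sub(r"p-?", "", cell) for cell in cells]
print(cleaned)  # the junk cell has become an empty string

for cell in cleaned:
    try:
        hemisphere = cell[-1]
    except IndexError as exc:
        print("IndexError:", exc)
    else:
        print("hemisphere:", hemisphere)

# str.endswith is safe on empty strings, unlike cell[-1].
flags = [cell.endswith("N") for cell in cleaned]
print(flags)
```

If this is what happens, stripping the junk text only moves the failure; the empty cells still have to be skipped or turned into NaN before indexing into them.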
Sample dataset below:
t1= 16.8828, c1= 3.59481, s= 27.3995, lat=37 46.985 N, lon=122 15.544 W, hms=143857, dmy=170315
t1= 16.8674, c1= 3.59335, s= 27.3977, lat=37 46.975 N, lon=122 15.523 W, hms=143907, dmy=170315
t1= 16.8441, c1= 3.59179, s= 27.4003, lat=37 46.966 N, lon=122 15.502 W, hms=143917, dmy=170315
t1= 16.8353, c1= 3.59183, s= 27.4066, lat=37 46.956 N, lon=122 15.480 W, hms=143927, dmy=170315
t1= 16.8169, c1= 3.59054, s= 27.4082, lat=37 46.946 N, lon=122 15.459 W, hms=143937, dmy=170315
t1= 16.8018, c1= 3.58917, s= 27.4068, lat=37 46.936 N, lon=122 15.438 W, hms=143947, dmy=170315
t1= 16.8072, c1= 3.59052, s= 27.4147, lat=37 46.926 N, lon=122 15.417 W, hms=143957, dmy=170315
t1= 16.8361, c1= 3.59415, s= 27.4257, lat=37 46.916 N, lon=122 15.396 W, hms=144007, dmy=170315
t1= 16.8612, c1= 3.59678, s= 27.4308, lat=37 46.907 N, lon=122 15.375 W, hms=144017, dmy=170315
t1= 16.8452, c1= 3.59187, s= 27.4002, lat=37 46.898 N, lon=122 15.356 W, hms=144027, dmy=170315
t1= 16.8439, c1= 3.58982, s= 27.3838, lat=37 46.890 N, lon=122 15.337 W, hms=144037, dmy=170315
t1= 16.8328, c1= 3.58865, s= 27.3814, lat=37 46.882 N, lon=122 15.322 W, hms=144047, dmy=170315
t1= 16.8257, c1= 3.58841, s= 27.3842, lat=37 46.874 N, lon=122 15.307 W, hms=144057, dmy=170315
t1= 16.8165, c1= 3.58856, s= 27.3917, lat=37 46.866 N, lon=122 15.292 W, hms=144107, dmy=170315
t1= 16.8103, c1= 3.58836, s= 27.3942, lat=37 46.858 N, lon=122 15.277 W, hms=144117, dmy=170315
t1= 16.8400, c1= 3.58819, s= 27.3726, lat=37 46.850 N, lon=122 15.263 W, hms=144127, dmy=170315
t1= 16.8547, c1= 3.58945, s= 27.3733, lat=37 46.841 N, lon=122 15.249 W, hms=144137, dmy=170315
t1= 16.8784, c1= 3.59235, s= 27.3817, lat=37 46.833 N, lon=122 15.235 W, hms=144147, dmy=170315
t1= 16.8817, c1= 3.59347, s= 27.3889, lat=37 46.825 N, lon=122 15.221 W, hms=144157, dmy=170315
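For a frame of this size, a vectorized conversion may be worth considering instead of `iterrows`, partly because assigning to `row` inside `iterrows` is not guaranteed to write back to the frame. The sketch below is an assumption-laden illustration: it uses two cells copied from the sample rows plus one made-up 'p-' cell, and the helper name `to_decimal_degrees` and the output columns `lat_dd` and `lon_dd` are hypothetical.

```python
import pandas as pd

# Two cells taken from the sample rows, plus one made-up junk row.
frame = pd.DataFrame({
    "lat": [" lat=37 46.985 N", " lat=37 46.975 N", "p-"],
    "lon": [" lon=122 15.544 W", " lon=122 15.523 W", "p-"],
})


def to_decimal_degrees(column):
    """Parse '<deg> <min> <hemisphere>' cells; non-matching cells become NaN."""
    parts = column.str.extract(r"(\d+)\s+(\d+(?:\.\d+)?)\s*([NSEW])")
    degrees = pd.to_numeric(parts[0], errors="coerce")
    minutes = pd.to_numeric(parts[1], errors="coerce")
    # South and west are negative by convention.
    sign = parts[2].map({"N": 1.0, "E": 1.0, "S": -1.0, "W": -1.0})
    return sign * (degrees + minutes / 60.0)


frame["lat_dd"] = to_decimal_degrees(frame["lat"])
frame["lon_dd"] = to_decimal_degrees(frame["lon"])
print(frame[["lat_dd", "lon_dd"]])
```

Rows that do not match the pattern come out as NaN, so they can be inspected or dropped afterwards with `frame.dropna(subset=["lat_dd", "lon_dd"])`.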
【Discussion】: