【Question title】: pandas ValueError: could not convert string to float: 'p-'
【Posted】: 2018-01-07 02:49:17
【Question description】:

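I have a data frame with more than 10^6 rows, and I am simply converting lat (degrees and decimal minutes) to lat (decimal degrees). However, some rows in my frame contain the string 'p-', which kills my loop early on. I have tried a few things (below).

Code: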

import pandas as pd
import numpy as np
import glob
import matplotlib.pyplot as plt

path = r'/home/engr/Documents/SchoolHR/Data/SFSU-Boat/SBE45m/2015/'

allfiles_list = glob.glob(path + "/15*.hex")
allfiles_list = sorted(allfiles_list)
col = ["temp", "conduct", "salinity", "lat", "lon", "hms", "dmy"]
big_frame = pd.DataFrame()

for name in allfiles_list:
    df = pd.read_csv(name, skiprows=12, encoding="latin1", names=col, na_values=0, na_filter=False, engine="c")
    big_frame = big_frame.append(df)

# TODO surgery on columns to convert to float for use on big_frame

# regex \D to remove any non-digit characters -- hms & dmy
big_frame["hms"].replace(regex=True,inplace=True,to_replace=r'\D',value=r'')
big_frame["dmy"].replace(regex=True,inplace=True,to_replace=r'\D',value=r'')
big_frame["temp"].replace(regex=True,inplace=True,to_replace=r'(\D.\=)',value='')
big_frame["conduct"].replace(regex=True,inplace=True,to_replace=r'(\D.\=)',value='')
big_frame["salinity"].replace(regex=True,inplace=True,to_replace=r'(\D.\=)',value='')
big_frame["lat"].replace(regex=True,inplace=True,to_replace='[lonat=]',value='')
big_frame["lon"].replace(regex=True,inplace=True,to_replace='[lonat=]',value='')

for index, row in big_frame.iterrows():
    if row.lat[-1] == 'N':
        D = float(row.lat[1:3])
        M = float(row.lat[4:10])
        DD = D + float(M/60)
        row.lat = DD
    if row.lon[-1] == 'W':
        D1 = float(row.lon[1:4])
        M1 = float(row.lon[5:12])
        DD1 = D1 + float(M1/60)
        row.lon = -DD1
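
To make the fixed-position slicing in the loop concrete, here is a minimal pure-Python sketch of what it computes for one value. It assumes each cell still carries a leading space after the `[lonat=]` replacement (the sample lines separate fields with ", ", and `read_csv` keeps that space by default). The helper name `strip_label` is hypothetical, and `'p-'` simply stands in for a slice that lands on non-numeric text.

```python
import re

def strip_label(cell):
    # Same character-class removal as the question's replace() call:
    # deletes every l, o, n, a, t and = (uppercase N and W are untouched).
    return re.sub(r'[lonat=]', '', cell)

lat = strip_label(' lat=37 46.985 N')   # becomes ' 37 46.985 N'
degrees = float(lat[1:3])               # '37'
minutes = float(lat[4:10])              # '46.985'
decimal_lat = degrees + minutes / 60    # degrees plus minutes/60

lon = strip_label(' lon=122 15.544 W')  # becomes ' 122 15.544 W'
decimal_lon = -(float(lon[1:4]) + float(lon[5:12]) / 60)  # west is negative

# A slice that contains non-numeric text fails the way the question reports.
try:
    float('p-')
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(decimal_lat, decimal_lon, error_message)
```

Because the slices depend on exact character positions, any row whose layout differs from the sample lines can feed `float()` something it cannot parse.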

The code returns this error:

ValueError: could not convert string to float: 'p-'

I tried modifying the code like this and then running the loop over the data frame:

big_frame['lon'] = big_frame.lon.str.replace('p-?' , '')
big_frame['lat'] = big_frame.lat.str.replace('p-?' , '')
big_frame["lat"].replace(regex=True,inplace=True,to_replace='[)]',value='')
big_frame["lon"].replace(regex=True,inplace=True,to_replace='[)]',value='')

But then I just get this:

IndexError: string index out of range
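
An `IndexError: string index out of range` here is consistent with `row.lat[-1]` or `row.lon[-1]` being evaluated on an empty string, which could happen if a cell ends up empty once the stray characters are removed. That is an assumption about the data, not something the question confirms. A minimal sketch of a guard, using the hypothetical helper `has_direction`:

```python
def has_direction(cell, direction):
    # str.endswith is safe on empty strings, whereas cell[-1] raises IndexError.
    return cell.strip().endswith(direction)

# Indexing the last character of an empty string raises IndexError.
try:
    ''[-1]
    raised = False
except IndexError:
    raised = True

print(raised, has_direction('', 'N'), has_direction(' 37 46.985 N', 'N'))
```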

Sample data set below:

t1= 16.8828, c1= 3.59481, s= 27.3995, lat=37 46.985 N, lon=122 15.544 W, hms=143857, dmy=170315
t1= 16.8674, c1= 3.59335, s= 27.3977, lat=37 46.975 N, lon=122 15.523 W, hms=143907, dmy=170315
t1= 16.8441, c1= 3.59179, s= 27.4003, lat=37 46.966 N, lon=122 15.502 W, hms=143917, dmy=170315
t1= 16.8353, c1= 3.59183, s= 27.4066, lat=37 46.956 N, lon=122 15.480 W, hms=143927, dmy=170315
t1= 16.8169, c1= 3.59054, s= 27.4082, lat=37 46.946 N, lon=122 15.459 W, hms=143937, dmy=170315
t1= 16.8018, c1= 3.58917, s= 27.4068, lat=37 46.936 N, lon=122 15.438 W, hms=143947, dmy=170315
t1= 16.8072, c1= 3.59052, s= 27.4147, lat=37 46.926 N, lon=122 15.417 W, hms=143957, dmy=170315
t1= 16.8361, c1= 3.59415, s= 27.4257, lat=37 46.916 N, lon=122 15.396 W, hms=144007, dmy=170315
t1= 16.8612, c1= 3.59678, s= 27.4308, lat=37 46.907 N, lon=122 15.375 W, hms=144017, dmy=170315
t1= 16.8452, c1= 3.59187, s= 27.4002, lat=37 46.898 N, lon=122 15.356 W, hms=144027, dmy=170315
t1= 16.8439, c1= 3.58982, s= 27.3838, lat=37 46.890 N, lon=122 15.337 W, hms=144037, dmy=170315
t1= 16.8328, c1= 3.58865, s= 27.3814, lat=37 46.882 N, lon=122 15.322 W, hms=144047, dmy=170315
t1= 16.8257, c1= 3.58841, s= 27.3842, lat=37 46.874 N, lon=122 15.307 W, hms=144057, dmy=170315
t1= 16.8165, c1= 3.58856, s= 27.3917, lat=37 46.866 N, lon=122 15.292 W, hms=144107, dmy=170315
t1= 16.8103, c1= 3.58836, s= 27.3942, lat=37 46.858 N, lon=122 15.277 W, hms=144117, dmy=170315
t1= 16.8400, c1= 3.58819, s= 27.3726, lat=37 46.850 N, lon=122 15.263 W, hms=144127, dmy=170315
t1= 16.8547, c1= 3.58945, s= 27.3733, lat=37 46.841 N, lon=122 15.249 W, hms=144137, dmy=170315
t1= 16.8784, c1= 3.59235, s= 27.3817, lat=37 46.833 N, lon=122 15.235 W, hms=144147, dmy=170315
t1= 16.8817, c1= 3.59347, s= 27.3889, lat=37 46.825 N, lon=122 15.221 W, hms=144157, dmy=170315

【Question discussion】:

    Tags: python pandas geography


    【Solution 1】:

    You can drop the offending rows with something like:

    big_frame = big_frame[big_frame['col_name'].apply(lambda x: x.isdigit())]

    The operation should then no longer fail.
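
One caveat: `str.isdigit()` returns False for any string that contains a space, a decimal point, or a letter, so on values such as `37 46.985 N` this filter would drop every row. A sketch of a pattern-based alternative follows. It assumes the columns have already been reduced to the "degrees minutes hemisphere" form; the pattern, the helper name `to_decimal_degrees`, and the small demo frame are illustrative assumptions rather than part of the answer.

```python
import pandas as pd

# Degrees, decimal minutes, hemisphere letter; surrounding spaces tolerated.
PATTERN = (r'^\s*(?P<deg>\d{1,3})\s+(?P<min>\d{1,2}(?:\.\d+)?)'
           r'\s*(?P<hemi>[NSEW])\s*$')

def to_decimal_degrees(series):
    # Rows that do not match the pattern come back as all-NaN rows.
    parts = series.str.extract(PATTERN)
    deg = pd.to_numeric(parts['deg'], errors='coerce')
    minutes = pd.to_numeric(parts['min'], errors='coerce')
    # South and west are negative; unmatched rows stay NaN.
    sign = parts['hemi'].map({'N': 1.0, 'E': 1.0, 'S': -1.0, 'W': -1.0})
    return (deg + minutes / 60) * sign

# Tiny stand-in for big_frame; 'p-' imitates a corrupted cell.
demo = pd.DataFrame({
    'lat': [' 37 46.985 N', 'p-', ' 37 46.975 N'],
    'lon': [' 122 15.544 W', 'p-', ' 122 15.523 W'],
})
demo['lat_dd'] = to_decimal_degrees(demo['lat'])
demo['lon_dd'] = to_decimal_degrees(demo['lon'])

# Rows that failed to parse, kept for inspection, and the usable remainder.
bad_rows = demo[demo[['lat_dd', 'lon_dd']].isna().any(axis=1)]
clean = demo.dropna(subset=['lat_dd', 'lon_dd'])
print(clean[['lat_dd', 'lon_dd']])
```

Writing the result to new columns also sidesteps assigning to `row` inside `iterrows()`, which the pandas documentation warns may act on a copy and leave the frame unchanged.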

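    【Discussion】:

    • If you don't care why it fails and are going to ignore the offending rows anyway, just wrap it in try: except
    • Not sure I follow. Do you mean wrapping the original code in a try block so that the conversion still runs to completion?
    • I need the last character of the string so that the loop can detect the direction of the latitude or longitude. This is clever, but it doesn't help. The string also has the form "nnn nn.nnn L", and the space inside "nnn nn.nnn" makes it error out.
    • @ShpielMeister can you give an example? I tried error handling, but it did not skip the offending rows. How should I surface the broken rows?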
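
The try/except idea from the comments could look roughly like the following. This is a sketch, not the commenter's code: `parse_coordinate` and `convert_all` are hypothetical names, and the split-based parsing assumes every good value has the form "degrees minutes hemisphere". Failing values are collected together with their position so they can be inspected instead of being skipped silently.

```python
import math

def parse_coordinate(cell):
    # Expect three whitespace-separated parts, e.g. '122 15.544 W'.
    deg_text, min_text, hemi = cell.split()
    if hemi not in ('N', 'S', 'E', 'W'):
        raise ValueError(f'unknown hemisphere: {hemi!r}')
    value = float(deg_text) + float(min_text) / 60
    # South and west are negative.
    return -value if hemi in ('S', 'W') else value

def convert_all(cells):
    converted, failures = [], []
    for position, cell in enumerate(cells):
        try:
            converted.append(parse_coordinate(cell))
        except (ValueError, AttributeError) as exc:
            # Unpacking and float() failures are both ValueError;
            # AttributeError covers non-string cells such as NaN.
            converted.append(math.nan)
            failures.append((position, cell, str(exc)))
    return converted, failures

values, broken = convert_all([' 37 46.985 N', 'p-', '', ' 122 15.544 W'])
for position, cell, reason in broken:
    print(position, repr(cell), reason)
```

Keeping a NaN placeholder for each failure preserves the original row order, and the `broken` list shows which cells need a closer look.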