ValueError, though a check has already been performed for this

【Question title】: ValueError, though a check has already been performed for this
【Posted】: 2017-04-28 18:02:05
【Question】:

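I'm kind of stuck on NaN data. This program trawls through a folder on an external hard drive, loading txt files as dataframes, and should read the very last value of the last column. Because some of the last rows are incomplete for whatever reason, I have chosen the row before (or at least that is what I hope I have done). Here is the code; I have commented the line that I think is giving the trouble: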

#!/usr/bin/env python3

import glob
import math
import pandas as pd
import numpy as np

def get_avitime(vbo):
    try:
        df = pd.read_csv(vbo,
                         delim_whitespace=True,
                         header=90)
        row = next(df.iterrows())
        t = df.tail(2).avitime.values[0]
        return t
    except:
        pass

def human_time(seconds):
        secs = seconds/1000
        mins, secs = divmod(secs, 60)
        hours, mins = divmod(mins, 60)
        return '%02d:%02d:%02d' % (hours, mins, secs)
def main():
    path = 'Z:\\VBox_Backup\\**\\*.vbo'
    events = {}
    customers = {}

    for vbo_path in glob.glob(path, recursive=True):
        path_list = vbo_path.split('\\')
        event = path_list[2].upper()
        customer = path_list[3].title()
        avitime = get_avitime(vbo_path)
        if not avitime:             # this is to check there is a number
            continue
        else:
            if event not in events:
                events[event] = {customer:avitime}
                print(event)
            elif customer not in events[event]:
                events[event][last_customer] = human_time(events[event][last_customer])
                print(events[event][last_customer])
                events[event][customer] = avitime
            else:
                total_time = events[event][customer]
                total_time += avitime
                events[event][customer] = total_time
        last_customer = customer



    events[event][customer] = human_time(events[event][customer])
    df_events = pd.DataFrame(events)
    df_events.to_csv('event_track_times.csv')

main()

I put in a line to check for a value, but I'm guessing NaN does not count as an empty value, so the check doesn't quite work.
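
That guess matches how Python treats NaN: it is a non-zero float, so it is truthy and `if not avitime` lets it through. Below is a minimal sketch of a check that also rejects NaN. The helper name `is_usable` is hypothetical and not part of the original program, and it assumes the value is `None`, a number, or a string.

```python
import math

def is_usable(value):
    """Return True only if value converts to a float that is not NaN.

    Hypothetical helper, shown for illustration only.
    """
    try:
        return not math.isnan(float(value))
    except (TypeError, ValueError):
        # None raises TypeError, non-numeric strings raise ValueError.
        return False

nan = float('nan')
print(bool(nan))          # True: NaN is truthy, so `not avitime` is False
print(is_usable(nan))     # False
print(is_usable(None))    # False
print(is_usable(1234.5))  # True
```

Unlike `if not avitime`, this check accepts a legitimate value of 0, which may or may not be what the program wants.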

C:\Users\rob.kinsey\AppData\Local\Continuum\Anaconda3) c:\Users\rob.kinsey\Pro
ramming>python test_single.py
BARCELONA
03:52:42
02:38:31
03:21:02
00:16:35
00:59:00
00:17:45
01:31:42
03:03:03
03:16:43
01:08:03
01:59:54
00:09:03
COTA
04:38:42
02:42:34
sys:1: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
04:01:13
01:19:47
03:09:31
02:37:32
03:37:34
02:14:42
04:53:01
LAGUNA_SECA
01:09:10
01:34:31
01:49:27
03:05:34
02:39:03
01:48:14
SILVERSTONE
04:39:31
01:52:21
02:53:42
02:10:44
02:11:17
02:37:11
01:19:12
04:32:21
05:06:43
SPA
Traceback (most recent call last):
  File "test_single.py", line 56, in <module>
    main()
  File "test_single.py", line 41, in main
    events[event][last_customer] = human_time(events[event][last_customer])
  File "test_single.py", line 23, in human_time

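The output starts off correctly, apart from the sys:1 warning (at least the program carries on after that one), and then comes the final error, which stops the program completely. How can I get around this NaN problem? All the variables I'm using should either be float data types or be ignored. The data types should only ever be strings or floats until the time is converted to integers.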

【Question comments】:

Is this a new problem that nobody has seen before?

Tags: python-3.x nan

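【Solution 1】:

OK, even though nobody answered, I feel I have to answer my own question, because I don't believe I'm the only person to run into this problem.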

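There are three main ways to end up with NaN in a dataframe. Most of them revolve around infinity or undefined arithmetic, such as using 'inf' as a value in a calculation (for example inf - inf) or dividing zero by zero, both of which give NaN as the result. The wiki page was the most helpful thing for me in sorting this out: https://en.wikipedia.org/wiki/NaN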
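
A short sketch of some common ways a NaN turns up, using NumPy and pandas. The sample data is made up for illustration and is not taken from the .vbo files in the question.

```python
import io

import numpy as np
import pandas as pd

inf = np.inf

# Indeterminate arithmetic on infinity gives NaN.
inf_minus_inf = inf - inf
print(inf_minus_inf)

# Zero divided by zero gives NaN for NumPy floats (plain Python floats
# raise ZeroDivisionError instead). errstate silences the RuntimeWarning.
with np.errstate(invalid='ignore'):
    zero_over_zero = np.float64(0.0) / np.float64(0.0)
print(zero_over_zero)

# A missing field in a file is also read as NaN by pandas.
sample = io.StringIO("time,avitime\n1,100\n2,\n")
df = pd.read_csv(sample)
missing = df['avitime'].isna().tolist()
print(missing)
```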

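Another important point about NaN is that it behaves a bit like a virus: anything that touches it in a calculation becomes NaN as well, so the problem multiplies. In effect you are dealing with missing data. Until you realise what it is, NaN is about the most useless and frustrating thing there is, because it is a valid float value rather than an error, yet every mathematical operation on it ends in NaN. Beware!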
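
A minimal sketch of that propagation, and of why the failure only surfaces at the formatting step in `human_time`. The numbers are made up.

```python
nan = float('nan')

total = 1000.0
total += nan             # one NaN contaminates the running total
print(total)             # nan
print(total == total)    # False: NaN is not equal even to itself

# divmod happily passes NaN along without raising anything...
mins, secs = divmod(total / 1000, 60)

# ...but '%02d' needs an integer, and NaN cannot be converted to one.
try:
    '%02d:%02d' % (mins, secs)
except ValueError as exc:
    print('ValueError:', exc)
```

This would explain why the program runs through several events before stopping: the NaN is accumulated silently and only raises once it reaches the `%02d` format.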

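The cause in this case was that a fixed row number was being used to pick up the header when reading the csv files. That works for most of these files, but some of them have the header on a different line, so the headers imported into the dataframe were either part of the data itself or empty values. As a result, trying to access a column of the dataframe by its header name produced NaN, and, as mentioned above, it spread from there. The program threw up a few problems along the way that I solved with workarounds, one of which is actually acceptable, namely adding this line: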

df = df.fillna(0)

straight after the df variable is first defined, in this case:

df = pd.read_csv(vbo,
                 delim_whitespace=True,
                 header=90)

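The bottom line is that if you receive this value, the best thing to do is first work out why you are getting NaN. After that it is much easier to decide whether replacing NaN with 0 is a viable option.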
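
One way to act on this, sketched under assumptions: instead of hard-coding `header=90`, look for the header line in each file, then take the last non-NaN value of the column. The helper names `find_header_row` and `read_table` are hypothetical, the sample text is made up, and the sketch assumes the header is the first line containing `avitime` as a whitespace-separated token. `sep=r'\s+'` is used here in place of `delim_whitespace=True`.

```python
import io

import pandas as pd

def find_header_row(lines, column='avitime'):
    """Return the index of the first line containing `column`, or None."""
    for index, line in enumerate(lines):
        if column in line.split():
            return index
    return None

def read_table(text, column='avitime'):
    """Read whitespace-separated data starting at the detected header line."""
    header_row = find_header_row(text.splitlines(), column)
    if header_row is None:
        return None  # no recognisable header: skip this file
    return pd.read_csv(io.StringIO(text), sep=r'\s+',
                       skiprows=header_row, header=0)

# Two preamble lines, a header, two full rows and one incomplete last row.
sample = "some preamble\nmore preamble\ntime avitime\n1 100\n2 200\n3\n"
df = read_table(sample)

nan_count = df['avitime'].isna().sum()      # how many values are missing
last_valid = df['avitime'].dropna().iloc[-1]  # last complete value
print(nan_count, last_valid)
```

Counting the NaN values first shows whether they come from an occasional incomplete row or from a misread header, which is the information needed to judge whether `fillna(0)` is safe.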

I sincerely hope this helps whoever finds it. Regards, iFunction

【Comments】: