Writing a function that returns and prints the maximum value, out of all the values in a column - Answers

【Question title】: Writing a function that returns and prints the maximum value, out of all the values in a column
【Posted】: 2018-12-03 22:42:58
【Question description】:

I have this table:

A DataFrame table which is made by using Jupyter Notebook.

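This is actually only part of the table.

The full table is a .csv file; the .head() function shows only its first five rows.

I need to write a function that returns and prints the maximum of all the values in the second column, which is labeled "Gold".
The function should return a single string value.

Before writing my question I looked at several sources and tried many ways to solve the problem.

It seems like it should have a very simple solution, but unfortunately I have not managed to find it.
(Maybe there are several possible solutions to this query...?)

Please help me, I'm totally confused.
Thanks!

Here are all the sources:

Here are all the ways I tried to solve the problem, some of which have syntax errors:

1.a: The traditional algorithm for finding a maximum, as in the C language: a 'for' loop.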

def answer_one():
    row=1
    max_gold = df['Gold'].row  # Setting the initial maximum.
    for col in df.columns:
        if col[:2]=='Gold': # finding the column.
            # now iterating through all the rows, finding finally the absolute maximum:
            for row in df.itertuples():  # I also tried: for row=2 in df.rows:
                if(df['Gold'].row > max_gold)  # I also tried: if(row.Gold > max_gold)
                     max_gold = df['Gold'].row  #  I also tried: max_gold = row.Gold
    return df.max_gold

I had trouble working out how to fit the print call into the code above, so I added it separately:

1.b:

for row in df.itertuples():
    print(row.Gold)         # or: print(max_gold)

1.c:

for col in df.columns:
    if col[:2]=='Gold':
        df[df['Gold'].max()]

2:

def answer_one():
    df = pd.DataFrame(columns=['Gold']) # syntax error.
    for row in df.itertuples():    # The same as the separated code section above.
        print(row.Gold)

3:

def answer_one():
    print(df[['Gold']][df.Value == df.Value.max()]) # I don't know if "Value" is a key word or not.

4:

def answer_one():
    return df['Gold'].max() # right syntax, wrong result (not the max value).

5:

def answer_one():
    s=data.max()
    print '%s' % (s['Gold']) # syntax error.

6.a:

def answer_one():
    df.loc[df['Gold'].idxmax()] # right syntax, wrong output (all the column indexes of the table are shown in a column)

6.b:

def answer_one():
    df.loc[:,['Gold']]  # or: df.loc['Gold']
    df['Gold'].max()

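【Question comments】:

What is wrong with df['Gold'].max()? Why write such a long question?
@timgeb Wrong value. It should be 1022, not 976. I want to learn from my mistakes.
Huh, why isn't it 18?
That is not the complete list, as I wrote.
I suggest throwing them away. :)

Tags: python pandas jupyter-notebook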

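【Solution 1】:

Great first question. I assume you are taking the Python data science course on Coursera?

As already pointed out, df['Gold'].max() is correct; however, if the column has the wrong data type it will not return the expected result. So first make sure the column is numeric. You can check this by running df['Gold'].dtype. If the output for this dataset is not int64, you can correct it by running df.loc[:,'Gold'] = df.loc[:,'Gold'].str.replace(',','').astype(int), after which df['Gold'].max() will return 1022.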
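To illustrate the data-type point, here is a minimal sketch on made-up data (the three values are assumptions chosen for illustration, not the real dataset). When the column holds strings, max() compares them character by character, so '976' sorts above '1,022'. Once the column is converted to integers, the numeric maximum comes out.

```python
import pandas as pd

# Made-up values for illustration: counts stored as text, one with a thousands separator.
df = pd.DataFrame({"Gold": ["976", "1,022", "18"]})

print(df["Gold"].dtype)           # a text dtype, not int64
text_max = df["Gold"].max()       # string comparison: '9' sorts after '1'
print(text_max)

# Strip the separators, then convert to integers.
gold_numeric = df["Gold"].str.replace(",", "", regex=False).astype(int)
numeric_max = gold_numeric.max()
print(numeric_max)
```

If df['Gold'].dtype already reports int64, this conversion is not the cause of an unexpected maximum.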

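As for the for loop: in this case you can iterate over the values in the Gold series instead of iterating over all columns and all rows. Note that Python uses 0-based indexing! If you use row 1 as the starting point, you will get the wrong result whenever the maximum is in the first row (row 0). Also note that you index with [index], not .index. So the for loop could look like this: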

CurrentMax = df['Gold'].iloc[0]  # first value by position (row 0), whatever the index labels are
for value in df['Gold']:
    if value > CurrentMax:
        CurrentMax = value
print(CurrentMax)

Wrapped in a function:

def rowbyrow(df=df):
    CurrentMax = df['Gold'].iloc[0]  # first value by position (row 0)
    for value in df['Gold']:
        if value > CurrentMax:
            CurrentMax = value
    #print(CurrentMax) if you want to print the result when running
    return CurrentMax
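Since the question asks for a function that both prints and returns the maximum, here is a self-contained sketch of the loop approach on made-up data. The helper name max_gold_loop and the sample values are assumptions for illustration only. It takes the frame as an explicit argument and checks that the loop agrees with Series.max().

```python
import pandas as pd

def max_gold_loop(frame):
    """Hypothetical helper: find the largest 'Gold' value with a plain loop."""
    values = frame["Gold"]
    current_max = values.iloc[0]      # start from the first value by position
    for value in values:
        if value > current_max:
            current_max = value
    print(current_max)                # print the result...
    return current_max                # ...and also return it

# Made-up sample data, for illustration only.
sample = pd.DataFrame({"Gold": [18, 1022, 976, 5]})
loop_result = max_gold_loop(sample)
builtin_result = sample["Gold"].max()
print(loop_result == builtin_result)
```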

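Regarding point 3: I believe what you are after is the following, which filters Gold down to the rows where Gold equals the maximum. Because you used two brackets around Gold, it returns a DataFrame rather than just the value: df[['Gold']][df.Gold == df.Gold.max()]. With one bracket it returns a Series: df['Gold'][df.Gold == df.Gold.max()]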
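The difference between the two bracket styles can be seen in a small sketch. The data below is made up for illustration; only the column name Gold comes from the question.

```python
import pandas as pd

# Made-up sample data.
df = pd.DataFrame({"Country": ["A", "B", "C"], "Gold": [18, 1022, 976]})

mask = df["Gold"] == df["Gold"].max()   # True only on the row(s) holding the maximum

as_frame = df[["Gold"]][mask]           # double brackets: a one-column DataFrame
as_series = df["Gold"][mask]            # single brackets: a Series
single_value = as_series.iloc[0]        # pull the scalar out of the Series

print(type(as_frame).__name__)
print(type(as_series).__name__)
print(single_value)
```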

Regarding point 5: could the syntax error be because you are using Python 3? In Python 3, print is a function, so its arguments need parentheses. The following should work:

s=df.max()
print('%s' % (s['Gold']))

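Regarding point 6.a: if you only want to output a specific column, you need to pass that column after the filter condition (separated by a comma), like this: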

df.loc[df['Gold'].idxmax(),'Gold']

If you want to return several columns, you can pass a list, e.g.

df.loc[df['Gold'].idxmax(),['Country','Gold']]
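As a rough sketch of how idxmax() and .loc fit together: idxmax() returns the index label of the row holding the maximum, and .loc then selects by that label. The example assumes the index holds country names, which the question does not confirm, and the names and counts are placeholders. Under that assumption the label itself is a single string, which may be what "return a single string value" refers to.

```python
import pandas as pd

# Made-up sample; the country names and counts are placeholders.
df = pd.DataFrame(
    {"Gold": [18, 1022, 976], "Silver": [7, 800, 750]},
    index=["Country A", "Country B", "Country C"],
)

top_label = df["Gold"].idxmax()                   # index label of the row with the largest Gold
top_value = df.loc[top_label, "Gold"]             # a single cell
top_row = df.loc[top_label, ["Gold", "Silver"]]   # several columns of that row, as a Series

print(top_label)
print(top_value)
print(top_row)
```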

Regarding point 1.c: [:2] returns the first two characters, so the comparison with the four-letter word Gold will always be False.
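The slicing behaviour is easy to confirm with a plain string; the variable names here are only for illustration.

```python
col = "Gold"

first_two = col[:2]                   # only the first two characters
matches_slice = first_two == "Gold"   # a two-character string never equals a four-character word
matches_whole = col == "Gold"         # compare the whole label instead
matches_prefix = col[:4] == "Gold"    # or take a slice long enough to cover the word

print(first_two, matches_slice, matches_whole, matches_prefix)
```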

Some performance comparisons:

1.

%%timeit
df.loc[df['Gold'].idxmax(),'Gold']
10000 loops, best of 3: 76.6 µs per loop

2.

%%timeit
s=df.max()
'%s' % (s['Gold'])
1000 loops, best of 3: 733 µs per loop

3.

%%timeit
rowbyrow()
10000 loops, best of 3: 71 µs per loop

4.

%%timeit
df['Gold'].max()
10000 loops, best of 3: 106 µs per loop

I was surprised to find that the function rowbyrow() gave the fastest result.

After creating a series of 10k random values, rowbyrow() was still the fastest.

See here:

df = pd.DataFrame((np.random.rand(10000, 1)), columns=['Gold']) 

%%timeit  # no. 1
df['Gold'].max()

The slowest run took 10.30 times longer than the fastest.   
10000 loops, best of 3: 127 µs per loop


%%timeit  # no. 2
rowbyrow()

The slowest run took 8.12 times longer than the fastest.   
10000 loops, best of 3: 72.7 µs per loop
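One thing that may be worth checking in this comparison: rowbyrow is declared as def rowbyrow(df=df), and Python evaluates a default argument once, when the function is defined. If the function was not redefined after df was replaced with the 10k-row frame, rowbyrow() would still loop over the original small table, which might explain the nearly identical timing. The sketch below, on made-up data, shows that binding behaviour; passing the frame explicitly avoids it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Gold": [18, 1022, 976]})   # made-up small table

def rowbyrow(df=df):                 # the default is bound to the small table right here
    current_max = df["Gold"].iloc[0]
    for value in df["Gold"]:
        if value > current_max:
            current_max = value
    return current_max

# Rebind the global name to a much larger frame, as in the benchmark.
df = pd.DataFrame(np.random.rand(10000, 1), columns=["Gold"])

small_result = rowbyrow()            # still scans the original 3-row table
large_result = rowbyrow(df)          # scans the 10,000-row frame

print(small_result)
print(large_result == df["Gold"].max())
```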

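【Comments】:

Thank you very much for the comprehensive answer! Yes, this is exactly a problem from the Coursera course you mentioned! I will go through all of your suggested solutions again in my Jupyter Notebook. Have a nice day and take care.

【Solution 2】:

Well, after checking all the solutions suggested above, they all return the same value: 976.

But none of them returns 1022 (the correct answer).

See here:

Here:

And here:

The last picture shows that the returned value is in fact already of type 'int64', not 'str', whether I check the value type with the dtype() function before the following snippet: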

def answer_one():
    return df['Gold'].max()

answer_one()

or after it.

Regarding the line of code:

df.loc[:,'Gold'] = df.loc[:,'Gold'].str.replace(',','').astype(int)

suggested above for converting from the 'str' value type (string) to the 'int64' value type (number): it returns an error message, because the column is not a string type anyway.
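That error is expected when the column is already numeric, because the .str accessor only works on text values. A conversion that tolerates both cases could be sketched as below. The helper name gold_as_int and the sample values are assumptions for illustration. Note that it does not explain the 976 versus 1022 discrepancy; it only avoids the error.

```python
import pandas as pd

def gold_as_int(series):
    """Hypothetical helper: return the column as integers whether it starts as text or numbers."""
    # astype(str) makes the .str accessor valid even when the column is already numeric.
    return series.astype(str).str.replace(",", "", regex=False).astype(int)

already_int = pd.Series([18, 1022, 976], name="Gold")      # made-up values
as_text = pd.Series(["18", "1,022", "976"], name="Gold")   # made-up values

# Using .str directly on an integer column raises AttributeError.
try:
    already_int.str.replace(",", "")
    accessor_failed = False
except AttributeError:
    accessor_failed = True

int_max = gold_as_int(already_int).max()
text_max = gold_as_int(as_text).max()
print(accessor_failed, int_max, text_max)
```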

Can someone tell me why I am not getting the correct answer (976 instead of 1022)?
Is it a problem with my Jupyter Notebook? Or maybe something else?

Thanks!

【Comments】: