Pandas DataFrame indexing out of order: answers

【Question title】: Pandas DataFrame indexing out of order
【Posted】: 2018-06-18 21:37:01
【Question】:

I have the following Pandas DataFrame:

Year        Bananas     Apples

2015 - 1    151235.0    NaN
2015 - 10   517326.0    NaN
2015 - 11   497511.0    NaN
2015 - 12   503372.0    NaN
2015 - 13   524244.0    NaN
2015 - 14   505785.0    11588.0
2015 - 15   493530.0    19170.0
2015 - 16   511167.0    18304.0
2015 - 17   605087.0    19030.0
2015 - 18   523477.0    20732.0
2015 - 19   410203.0    22032.0
2015 - 2    410268.0    NaN
2015 - 20   436890.0    21447.0
2015 - 21   412306.0    21957.0
2015 - 22   390683.0    23072.0

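I want to use the "Year" column as the index of my DataFrame, but the rows come out in the wrong order. As you can see, the value "2015 - 2" should come before "2015 - 10".

All values in the "Year" column are strings, in the format [Year, Week number]. I would like to keep this format, because I have no information other than the year and the week number.

I tried sorting the values in ascending order with the sort_values method, but that did not solve the problem. I also tried setting the "Year" column as my index and calling sort_index, but that did not work either.

I am new to Python and Pandas and would greatly appreciate any help. Thank you.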

【Question comments】:

Tags: python pandas sorting datetime indexing

【Solution 1】:

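Unfortunately, at the time this answer was written, the pandas sorting functions had no key parameter for supplying a custom sort key (sort_values and sort_index gained one in pandas 1.1.0). The strings sort lexicographically, which is why "2015 - 10" lands before "2015 - 2". You can instead add a new column derived from "Year" and sort the data by it.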

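import pandas as pd

df = pd.DataFrame({
    'Year': ['2015 - 10', '2015 - 1', '2015 - 2'],
    'bla': [3, 1, 2]
})

# Parse each 'YYYY - W' string into [year, week] as integers
df['index'] = df['Year'].apply(lambda x: list(map(int, x.split(' - '))))
print(df)
# Lists compare element by element, so this orders by year, then by week
df = df.sort_values('index')
print(df)
df = df.drop('index', axis=1)  # drop the helper column if you don't need it
print(df)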

Output:

        Year  bla       index
0  2015 - 10    3  [2015, 10]
1   2015 - 1    1   [2015, 1]
2   2015 - 2    2   [2015, 2]
        Year  bla       index
1   2015 - 1    1   [2015, 1]
2   2015 - 2    2   [2015, 2]
0  2015 - 10    3  [2015, 10]
        Year  bla
1   2015 - 1    1
2   2015 - 2    2
0  2015 - 10    3
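
On pandas 1.1.0 or later, the helper column can probably be avoided by passing a key function to sort_values. The following is only a minimal sketch under two assumptions: that pandas is at least version 1.1, and that every label has the exact form "YYYY - W". The function name year_week_key is made up for illustration, and the three rows are taken from the question's table.

```python
import pandas as pd

df = pd.DataFrame({
    'Year': ['2015 - 10', '2015 - 1', '2015 - 2'],
    'Bananas': [517326.0, 151235.0, 410268.0],
})


def year_week_key(col: pd.Series) -> pd.Series:
    """Hypothetical helper: map 'YYYY - W' strings to sortable integers."""
    # Split each label into two columns (year, week) and convert to int.
    parts = col.str.split(' - ', expand=True).astype(int)
    # Week numbers never reach 100, so year * 100 + week keeps the order.
    return parts[0] * 100 + parts[1]


# The key must be vectorized: it receives the whole column as a Series.
sorted_df = df.sort_values('Year', key=year_week_key).set_index('Year')
print(sorted_df)
```

The original strings stay untouched and end up as the index, which is what the question asks for; only the sort order is computed from the numeric parts.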

【Comments】: