Count unique values of each column of a Pandas dataframe: answers

[Question title]: Count unique values of each column of the dataframe Pandas
[Posted]: 2020-06-24 05:11:47
[Question]:

I have the following dataframe:

index  state  city     gdp    main_sector
1      NY     NYC      1000   services
2      NY     Utica    200    agriculture 
3      CA     LA       1200   tourism
4      CA     SF       800    tourism
5      FL     Miami    1300   services

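I want to get a list or table that shows, for each column, its number of unique values (and, for gdp, its range instead):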

state        3
city         5
gdp          from 200 to 1300
main_sector  3

How can I do this?

[Question discussion]:

Tags: pandas count unique

[Solution 1]:

You can loop over the columns: for gdp, report its minimum and maximum; for every other column, count its unique values.

Input:

import pandas as pd

df = pd.DataFrame({'index': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
 'state': {0: 'NY', 1: 'NY', 2: 'CA', 3: 'CA', 4: 'FL'},
 'city': {0: 'NYC', 1: 'Utica', 2: 'LA', 3: 'SF', 4: 'Miami'},
 'gdp': {0: 1000, 1: 200, 2: 1200, 3: 800, 4: 1300},
 'main_sector': {0: 'services',
  1: 'agriculture',
  2: 'tourism',
  3: 'tourism',
  4: 'services'}})

a = []  # one summary value per column
b = []  # the matching column names, used as the row index
for col in df.columns:
    b.append(col)
    if col == 'gdp':
        # gdp: report the value range instead of a count
        a.append(f'from {df[col].min()} to {df[col].max()}')
    else:
        # other columns: number of distinct values
        a.append(len(df[col].unique()))
df_new = pd.DataFrame(a, index=b, columns=['A'])
df_new  # displays in a notebook; use print(df_new) in a script

Output:

            A
index       5
state       3
city        5
gdp         from 200 to 1300
main_sector 3
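The same result can be sketched without an explicit loop by starting from DataFrame.nunique(), which returns the distinct-value count of every column at once, and then overwriting the gdp entry with its range. This is a minimal sketch, not the answerer's code: summarize_columns and its range_cols parameter are hypothetical names introduced here, and the dataframe is rebuilt with plain lists.

```python
import pandas as pd

df = pd.DataFrame({
    'index': [1, 2, 3, 4, 5],
    'state': ['NY', 'NY', 'CA', 'CA', 'FL'],
    'city': ['NYC', 'Utica', 'LA', 'SF', 'Miami'],
    'gdp': [1000, 200, 1200, 800, 1300],
    'main_sector': ['services', 'agriculture', 'tourism', 'tourism', 'services'],
})


def summarize_columns(frame, range_cols=('gdp',)):
    """Hypothetical helper: unique-value count per column, but a
    'from min to max' string for every column listed in range_cols."""
    # nunique() gives an int64 Series indexed by column name; cast it to
    # object so individual entries can be replaced by strings.
    summary = frame.nunique().astype(object)
    for col in range_cols:
        summary[col] = f'from {frame[col].min()} to {frame[col].max()}'
    return summary


result = summarize_columns(df)
print(result)
```

To match the table in the question exactly, drop the helper column first, e.g. summarize_columns(df.drop(columns='index')).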

[Discussion]:

I get an error: unhashable type: 'numpy.ndarray'
I tried using "nunique", but got another error: object of type 'int' has no len()
@Guerra Try it now.
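Both errors in the comments can be reproduced in isolation. Series.nunique() already returns an integer, so wrapping it in len() fails; call it directly. The "unhashable type" error is assumed here (the thread does not confirm it) to come from a column whose cells are NumPy arrays: unique() and nunique() hash the values, and ndarrays cannot be hashed. A minimal sketch of one workaround converts such cells to tuples first; to_hashable and arr_col are hypothetical names for illustration.

```python
import numpy as np
import pandas as pd

# nunique() returns an int, so len(s.nunique()) raises
# "TypeError: object of type 'int' has no len()". Use the int directly.
s = pd.Series(['NY', 'NY', 'CA', 'CA', 'FL'])
n_states = s.nunique()

# Hypothetical column whose cells are NumPy arrays; calling
# arr_col.nunique() directly would fail because ndarrays are unhashable.
arr_col = pd.Series([np.array([1, 2]), np.array([1, 2]), np.array([3])],
                    dtype=object)


def to_hashable(v):
    """Turn ndarray cells into tuples (hashable); leave other values as-is."""
    return tuple(v.tolist()) if isinstance(v, np.ndarray) else v


# Count distinct arrays by comparing their tuple versions.
n_arrays = arr_col.map(to_hashable).nunique()
print(n_states, n_arrays)
```

Note that nunique() skips NaN by default, whereas len(unique()) counts it, so the two can differ on columns with missing values.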