【Question title】: Transform a DataFrame into a readable format for plotting a bar chart with plotly express
【Posted】: 2021-12-17 21:23:58
【Question description】:

I have the following messy DataFrame, and I am struggling to reshape it into a usable format.

import pandas as pd

df = pd.DataFrame({'Q3_3_1': {'R_2cedWe4sx09CKlb': -99.0,
  'R_3smCukGdFbm4i2t': -99.0,
  'R_3Oj484bqZHepbmT': -99.0,
  'R_2Wxyhyo1ZtxL0f6': -99.0,
  'R_eh84KSBtWy9OWZ3': -99.0,
  'R_1pndKdTJ0GC0crY': -99.0,
  'R_3MF4nebUAJ130N1': -99.0,
  'R_1rrd0yEcpoziBXX': 'I have not attended a course on entrepreneurship so far.',
  'R_3J3ZATf90VmSonA': 'I have not attended a course on entrepreneurship so far.',
  'R_aaP0vu2FJGdIrNT': -99.0},
 'Q3_3_2': {'R_2cedWe4sx09CKlb': -99.0,
  'R_3smCukGdFbm4i2t': -99.0,
  'R_3Oj484bqZHepbmT': 'I have attended at least one entrepreneurship course as elective.',
  'R_2Wxyhyo1ZtxL0f6': -99.0,
  'R_eh84KSBtWy9OWZ3': -99.0,
  'R_1pndKdTJ0GC0crY': -99.0,
  'R_3MF4nebUAJ130N1': -99.0,
  'R_1rrd0yEcpoziBXX': -99.0,
  'R_3J3ZATf90VmSonA': -99.0,
  'R_aaP0vu2FJGdIrNT': 'I have attended at least one entrepreneurship course as elective.'},
 'Q3_3_3': {'R_2cedWe4sx09CKlb': 'I have attended at least one entrepreneurship course as compulsory part of my studies.',
  'R_3smCukGdFbm4i2t': 'I have attended at least one entrepreneurship course as compulsory part of my studies.',
  'R_3Oj484bqZHepbmT': 'I have attended at least one entrepreneurship course as compulsory part of my studies.',
  'R_2Wxyhyo1ZtxL0f6': 'I have attended at least one entrepreneurship course as compulsory part of my studies.',
  'R_eh84KSBtWy9OWZ3': 'I have attended at least one entrepreneurship course as compulsory part of my studies.',
  'R_1pndKdTJ0GC0crY': -99.0,
  'R_3MF4nebUAJ130N1': 'I have attended at least one entrepreneurship course as compulsory part of my studies.',
  'R_1rrd0yEcpoziBXX': -99.0,
  'R_3J3ZATf90VmSonA': -99.0,
  'R_aaP0vu2FJGdIrNT': -99.0},
 'Q3_3_4': {'R_2cedWe4sx09CKlb': -99.0,
  'R_3smCukGdFbm4i2t': -99.0,
  'R_3Oj484bqZHepbmT': -99.0,
  'R_2Wxyhyo1ZtxL0f6': -99.0,
  'R_eh84KSBtWy9OWZ3': -99.0,
  'R_1pndKdTJ0GC0crY': 'I am studying in a specific program on entrepreneurship.',
  'R_3MF4nebUAJ130N1': -99.0,
  'R_1rrd0yEcpoziBXX': -99.0,
  'R_3J3ZATf90VmSonA': -99.0,
  'R_aaP0vu2FJGdIrNT': -99.0},
 'Q3_3_5': {'R_2cedWe4sx09CKlb': -99.0,
  'R_3smCukGdFbm4i2t': -99.0,
  'R_3Oj484bqZHepbmT': -99.0,
  'R_2Wxyhyo1ZtxL0f6': -99.0,
  'R_eh84KSBtWy9OWZ3': -99.0,
  'R_1pndKdTJ0GC0crY': -99.0,
  'R_3MF4nebUAJ130N1': -99.0,
  'R_1rrd0yEcpoziBXX': -99.0,
  'R_3J3ZATf90VmSonA': -99.0,
  'R_aaP0vu2FJGdIrNT': -99.0},
 'Type': {'R_2cedWe4sx09CKlb': 'student',
  'R_3smCukGdFbm4i2t': 'nascent',
  'R_3Oj484bqZHepbmT': 'nascent',
  'R_2Wxyhyo1ZtxL0f6': 'student',
  'R_eh84KSBtWy9OWZ3': 'student',
  'R_1pndKdTJ0GC0crY': 'student',
  'R_3MF4nebUAJ130N1': 'student',
  'R_1rrd0yEcpoziBXX': 'nascent',
  'R_3J3ZATf90VmSonA': 'student',
  'R_aaP0vu2FJGdIrNT': 'active'}})

I want to convert it into a readable format so that I can create a bar chart with plotly. The format I am trying to produce is the following:

df1 = pd.DataFrame({'Question': {0: 'Q3_3_1', 1: 'Q3_3_2', 2: 'Q3_3_3', 3: 'Q3_3_4', 4: 'Q3_3_5'},
 'student': {0: 1, 1: 0, 2: 4, 3: 1, 4: 0},
 'nascent': {0: 1, 1: 1, 2: 2, 3: 0, 4: 0},
 'active': {0: 0, 1: 1, 2: 0, 3: 0, 4: 0}})

The plotly call I want to use is:

import plotly.express as px
px.bar(df1, x='Question', y=['student', 'nascent','active'], barmode='group', title='Final Term')

Thanks for your help.

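【Comments】:

  • I don't understand how you get the values in the expected DataFrame: why must student be {0: 1, 1: 0, 2: 4, 3: 1, 4: 0}, nascent be {0: 1, 1: 1, 2: 2, 3: 0, 4: 0}, and active be {0: 0, 1: 1, 2: 0, 3: 0, 4: 0}? I don't see any correlation between the original DataFrame and the expected one. If there is no correlation, then it cannot be converted.
  • How did you obtain the original DataFrame? Perhaps you should change the code that generates it instead of converting it into the expected result.
  • You could also use a list such as 'Question': ['Q3_3_1', 'Q3_3_2', 'Q3_3_3', 'Q3_3_4', 'Q3_3_5'], to build the expected DataFrame, which makes the code more readable.
  • Hi, this is a survey from Qualtrics. The number in each column is how many respondents of that type gave the answer, for example "I have attended at least one entrepreneurship course as compulsory part of my studies."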

Tags: python pandas dataframe plotly


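【Solution 1】:
  • You have not defined the semantics of your data. It appears that -99 is really NaN.
  • The reshaping is straightforward:
    1. Add Type to the index.
    2. stack() moves the questions into the index. The index is now (row, Type, Question).
    3. Test whether each entry differs from the -99 placeholder with ne(-99).
    4. Count (really, sum) the booleans.
    5. Tidy up a bit.
  • You can now generate the plot with the line of code you specified: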
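df1 = (
    df.set_index("Type", append=True)   # index: (row id, Type)
    .stack()                            # index: (row id, Type, question)
    .ne(-99)                            # True where an answer was selected
    .reset_index()
    .drop(columns="level_0")            # keep only the boolean column for sum()
    .rename(columns={"level_2": "Question"})
    .groupby(["Type", "Question"])
    .sum()                              # summing booleans counts the answers
    .unstack("Type")                    # one column per Type
    .droplevel(0, axis=1)               # drop the unnamed value level
    .reset_index()
)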

px.bar(df1, x='Question', y=['student', 'nascent','active'], barmode='group', title='Final Term')
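
The same count can also be reached through a long-format intermediate, which may be easier to inspect step by step. The following is only a sketch under stated assumptions: `sample` is a small hypothetical frame with the same layout as the survey data (shortened answer texts, made-up row ids), and the list of expected types is hard-coded so that a type with no answers still gets a zero column.

```python
import pandas as pd

# Hypothetical sample with the same layout as the survey data:
# -99 marks "option not selected", any other value is a selected answer.
sample = pd.DataFrame(
    {
        "Q3_3_1": [-99.0, "no course", -99.0, "no course"],
        "Q3_3_2": ["elective", -99.0, -99.0, -99.0],
        "Type": ["student", "nascent", "student", "student"],
    },
    index=["r1", "r2", "r3", "r4"],
)

# Long format: one row per (respondent, question) pair.
long = sample.melt(id_vars="Type", var_name="Question", value_name="answer")

# Flag the entries that are real answers rather than the -99 placeholder.
long["answered"] = long["answer"].ne(-99)

# Sum the flags per question and type, then make sure every expected
# type has a column, even one that never occurs in the data.
counts = (
    long.groupby(["Question", "Type"])["answered"]
    .sum()
    .unstack("Type", fill_value=0)
    .reindex(columns=["student", "nascent", "active"], fill_value=0)
    .astype(int)
    .rename_axis(None, axis=1)
    .reset_index()
)

print(counts)
```

The resulting `counts` frame has the columns Question, student, nascent and active, so it can be passed to the same px.bar call with x='Question' and y=['student', 'nascent', 'active'].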

【Discussion】:
