【Question title】: How to plot a pandas DataFrame with andrews_curves?
【Posted】: 2015-03-29 05:09:12
【Question】:

I have the following pandas DataFrame:

df = pd.read_csv('path/file/file.csv',
                 header=0, sep=',', names=['PhraseId', 'SentenceId', 'Phrase', 'Sentiment'])

I want to plot it with andrews_curves. I tried the following:

andrews_curves(df, 'Name')

Any idea how to plot this? Here is the content of the csv:

PhraseId, SentenceId, Phrase, Sentiment
1, 1, A series of escapades demonstrating the adage that what is good for the goose is also good for the gander , some of which occasionally amuses but none of which amounts to much of a story ., 1
2, 1, A series of escapades demonstrating the adage that what is good for the goose, 2
3, 1, A series, 2
4, 1, A, 2
5, 1, series, 2
6, 1, of escapades demonstrating the adage that what is good for the goose, 2
7, 1, of, 2
8, 1, escapades demonstrating the adage that what is good for the goose, 2
9, 1, escapades, 2
10, 1, demonstrating the adage that what is good for the goose, 2
11, 1, demonstrating the adage, 2
12, 1, demonstrating, 2
13, 1, the adage, 2
14, 1, the, 2
15, 1, adage, 2
16, 1, that what is good for the goose, 2
17, 1, that, 2
18, 1, what is good for the goose, 2
19, 1, what, 2
20, 1, is good for the goose, 2
21, 1, is, 2
22, 1, good for the goose, 3
23, 1, good, 3
24, 1, for the goose, 2
25, 1, for, 2
26, 1, the goose, 2
27, 1, goose, 2
28, 1, is also good for the gander , some of which occasionally amuses but none of which amounts to much of a story ., 2
29, 1, is also good for the gander , some of which occasionally amuses but none of which amounts to much of a story, 2

【Comments】:

  • What is wrong with your attempt? Did you get an error?

Tags: python python-2.7 matplotlib pandas plot


【Solution 1】:

In the doc page you linked to, the iris dataset has a column called 'Name'. When you call

andrews_curves(data, 'Name')

the rows of data are grouped by the value of Name. That is why, for the iris dataset, you get lines in three different colors.
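
A minimal sketch of that grouping behaviour, assuming a small made-up DataFrame in place of the iris data (the column names x1 and x2, the labels a, b, c, and the random values are hypothetical). Each row becomes one curve, and the legend gets one entry per distinct Name:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import andrews_curves

rng = np.random.default_rng(0)

# Hypothetical stand-in for the iris data: numeric columns plus a label column.
data = pd.DataFrame({
    "x1": rng.normal(size=9),
    "x2": rng.normal(size=9),
    "Name": ["a"] * 3 + ["b"] * 3 + ["c"] * 3,
})

fig, ax = plt.subplots()
andrews_curves(data, "Name", ax=ax)

# One curve per row; one legend entry per distinct value of Name.
n_curves = len(ax.get_lines())
legend_labels = ax.get_legend_handles_labels()[1]
print(n_curves, legend_labels)
```

With an interactive backend you would call plt.show() instead of inspecting the Axes object.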

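In your dataset, you have three columns: A, B, and C. To call andrews_curves on your df, you first need to decide which value you want to group by. If, for example, it is the value of the C column, then call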

andrews_curves(data, 'C')

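If, on the other hand, you want to group by the column names A, B, and C, then first melt your DataFrame to convert it from wide format to long format, and then call andrews_curves on the variable column (which holds the value A, B, or C for each row):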

import numpy as np
import pandas as pd
import pandas.plotting as pdplt
import matplotlib.pyplot as plt

x = np.linspace(-1, 1, 1000)
df = pd.DataFrame({'A': np.sin(x**2)/x,
                   'B': np.sin(x)*np.exp(-x),
                   'C': np.cos(x)*x})
pdplt.andrews_curves(pd.melt(df), 'variable')
plt.show()

yields [figure: the resulting Andrews curves plot]
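
To see what the melt step does before plotting, here is a small sketch on a made-up two-row frame (the values are hypothetical). pd.melt with no extra arguments stacks every column into a pair of columns named variable and value:

```python
import pandas as pd

# Hypothetical wide-format frame with the same column names as in the answer.
wide = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0], "C": [5.0, 6.0]})

# Wide to long: 'variable' holds the original column name, 'value' the cell value.
long = pd.melt(wide)
print(long)
```

The long frame has one row per original cell, so andrews_curves(long, 'variable') draws one curve per cell and colors it by the column it came from.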

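【Discussion】:

  • This is too abstract. I don't know what "switch keys" means. Please post a sample of text.csv and the code you are running that raises the TypeError.
  • Thanks for the help @unutbu. I updated the question; I would like to plot Andrews curves from a small sample of the data.
  • @skwoi: The andrews_curves function requires all columns of the DataFrame to be numeric. You might want to convert the Phrase column to some kind of number. How you would do that is not obvious to me. Another problem is that you need to specify a column to group by. There is no obvious candidate in your data.
  • Thank you for the answer. What other kinds of plots would you recommend for this type of data?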
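
One possible way to act on the comment about numeric columns, offered only as a sketch: derive simple numeric features from Phrase and group by Sentiment. The feature names n_chars and n_words are hypothetical, the choice of Sentiment as the grouping column is an assumption, and the four rows are copied from the sample csv in the question. Whether such features are meaningful for this data is a separate question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import andrews_curves

# Four rows taken from the sample csv in the question.
df = pd.DataFrame({
    "PhraseId": [3, 4, 22, 23],
    "SentenceId": [1, 1, 1, 1],
    "Phrase": ["A series", "A", "good for the goose", "good"],
    "Sentiment": [2, 2, 3, 3],
})

# Hypothetical numeric features: character count and word count of each phrase.
features = pd.DataFrame({
    "n_chars": df["Phrase"].str.len(),
    "n_words": df["Phrase"].str.split().str.len(),
    "Sentiment": df["Sentiment"],  # assumed grouping column
})

fig, ax = plt.subplots()
andrews_curves(features, "Sentiment", ax=ax)
n_curves = len(ax.get_lines())
legend_labels = sorted(ax.get_legend_handles_labels()[1])
```

Only the feature columns and the grouping column are passed in, so the text column never reaches andrews_curves.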