What dtype to use for money representation in a pandas dataframe? Answers

【Question title】: What dtype to use for money representation in pandas dataframe?
【Posted】: 2015-06-14 14:55:39
【Question description】:

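So I have a pandas dataframe object with a money column that has two-decimal-place precision, like "133.04". There are no values with three or more decimal places, only two.

My attempt: the decimal module

I tried using the Decimal module for this, but when I try to resample it like this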

gr_by_price = df['price'].resample(timeframe, how='ohlc')

I get

pandas.core.groupby.DataError: No numeric types to aggregate

Before that, I checked the dtype

print(type(df['price'][0]))
<class 'decimal.Decimal'>

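I'm new to this library and to money handling; maybe Decimal is not the right choice for this? What should I do?

If I cast this column to <class 'numpy.float64'>, everything works fine.

Update: for now I'm using this approach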

d.Decimal("%0.2f" % float(d.Decimal("1.04")))
Decimal('1.04')

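from this question

【Question comments】:

Unfortunately, you need to use np.float64 for this; as long as you stay within its precision and limits, you'll be fine.
@EdChum Hmm.. I won't end up with 133.04 turning into 133.05 or 133.03, will I? So I convert to float64 before resampling, resample, and then convert back to Decimal, right?
That can happen, but the imprecision usually shows up in the low-order digits; if you convert to Decimal at the end, it should clip this.
@EdChum Thanks. I'm doing that now. Here's the fun part: >>> d.Decimal(float(d.Decimal("1.04"))) Decimal('1.04000000000000003552713678800500929355621337890625')
Most of the time, what you want to do is store the numbers as floats and then display them with the appropriate format. The fun happens around the 12th or 13th decimal place, so in practice it's rarely a problem. Decimal is not a core dtype (like int or float), so it can be painful to work with. Note that outside of the core dtypes, pandas stores things as objects. Use the info() method to check the dtypes.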
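
The round trip discussed in these comments can be sketched roughly as follows. This is only an illustration, assuming prices always have exactly two decimal places; the variable names are made up for the example, and `quantize` is swapped in as an alternative to the `"%0.2f"` formatting from the question's update.

```python
from decimal import Decimal, ROUND_HALF_UP

price = Decimal("1.04")
as_float = float(price)  # what pandas would hold after casting to float64

# Building a Decimal straight from the float exposes its exact binary value,
# which is not exactly 1.04.
exact = Decimal(as_float)

# Quantizing to two places recovers the intended amount.
restored = Decimal(as_float).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(exact == price)  # False
print(restored)        # 1.04
```

The quantize step is only safe while accumulated float error stays far below half a cent, which is the "precision and limits" caveat from the first comment.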

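Tags: python python-3.x pandas dataframe

【Solution 1】:

We had a similar problem, and the best approach was to multiply the values by 100 and represent them as integers
(and divide by 100 for printing/external output).
This gives fast, exact computations (1 + 2 == 3, unlike 0.1 + 0.2 != 0.3).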
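
A minimal sketch of the integer-cents idea, assuming the prices arrive as strings with at most two decimal places. The sample values and the `format_cents` helper are hypothetical, introduced only for this illustration.

```python
from decimal import Decimal

import pandas as pd

# Prices as they might arrive from a text source (illustrative values).
raw = pd.Series(["133.04", "0.10", "0.20"])

# Parse through Decimal so no binary-float rounding happens on the way in,
# then store the amounts as integer cents.
cents = raw.map(lambda s: int(Decimal(s) * 100)).astype("int64")


def format_cents(c):
    """Render an integer number of cents as a two-decimal string."""
    c = int(c)
    sign = "-" if c < 0 else ""
    whole, frac = divmod(abs(c), 100)
    return f"{sign}{whole}.{frac:02d}"


total_cents = int(cents.sum())

print(0.1 + 0.2 == 0.3)           # False with binary floats
print(10 + 20 == 30)              # True with integer cents
print(format_cents(total_cents))  # 133.34
```

Because the column is a plain int64, the built-in numeric aggregations in pandas can operate on it directly; the division by 100 happens only at the display boundary.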

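【Comments】:

【Solution 2】:

You need to distinguish between the internal value representation and the way you present it (see MVC for more). Since, as you said, you don't need any other kind of floating-point representation, I would suggest continuing to use regular float for the internal representation and the math (it's the IEEE-754 standard) and adding this line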

pd.options.display.float_format = '{:6.2f}'.format

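at the beginning of the script. This makes all printed values automatically rounded to two decimal places without actually changing their values. (pd is the usual alias for pandas.)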
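
A small sketch of the display-only rounding described here. The sample frame is made up, and resetting the option at the end is an addition for tidiness rather than part of the answer.

```python
import pandas as pd

# Global display option, as suggested in the answer.
pd.options.display.float_format = "{:6.2f}".format

df = pd.DataFrame({"price": [133.04, 1.0449999]})
printed = df.to_string()
print(printed)

# The stored value keeps its full precision; only the rendering is rounded.
stored = float(df.loc[1, "price"])
print(repr(stored))

# Restore the default so later output is unaffected.
pd.reset_option("display.float_format")
```

Note that this affects presentation only: sums and comparisons still run on the underlying binary floats.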

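【Comments】:

【Solution 3】:

Decimal seems like a very reasonable representation for your use case. The root problem here is that the ohlc aggregator in pandas calls into Cython for speed, and I don't think Cython can take Decimals. See here: https://github.com/pandas-dev/pandas/blob/v0.20.3/pandas/core/groupby.py#L1203-L1212

Instead, I think the most straightforward way is to write ohlc yourself so that it can operate on Decimals.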

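# Assumes: import numpy as np; import pandas as pd; from decimal import Decimal
# ('T' was the minute alias at the time; newer pandas spells it 'min'.)
In [89]: index = pd.date_range('1/1/2000', periods=9, freq='T')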

In [90]: series = pd.Series(np.linspace(0, 2, 9), index=index)

In [91]: series.resample('3T').ohlc()
Out[91]:
                     open  high   low  close
2000-01-01 00:00:00  0.00  0.50  0.00   0.50
2000-01-01 00:03:00  0.75  1.25  0.75   1.25
2000-01-01 00:06:00  1.50  2.00  1.50   2.00

In [92]: decimal_series = pd.Series([Decimal(x) for x in np.linspace(0, 2, 9)], index=index)

In [93]: def ohlc(x):
    ...:     x = x[x.notnull()]
    ...:     if x.empty:
    ...:         return pd.Series({'open': np.nan, 'high': np.nan, 'low': np.nan, 'close': np.nan})
    ...:     return pd.Series({'open': x.iloc[0], 'high': x.max(), 'low': x.min(), 'close':x.iloc[-1]})
    ...:
In [107]: decimal_series.resample('3T').apply(ohlc).unstack()
Out[107]:
                    close  high   low  open
2000-01-01 00:00:00   0.5   0.5     0     0
2000-01-01 00:03:00  1.25  1.25  0.75  0.75
2000-01-01 00:06:00     2     2   1.5   1.5
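
As an alternative sketch (not the answerer's code), the same hand-written OHLC can be built with an explicit loop over time bins, which avoids depending on how `resample(...).apply` shapes its result. The sample prices are made up, and unlike the `ohlc` function above, empty bins are skipped here rather than filled with NaN.

```python
from decimal import Decimal

import pandas as pd

index = pd.date_range("2000-01-01", periods=9, freq="min")
prices = ["1.04", "1.10", "1.02", "2.00", "1.99", "2.05", "3.00", "3.10", "2.95"]
decimal_series = pd.Series([Decimal(p) for p in prices], index=index)

stamps, records = [], []
# Group into 3-minute bins and compute OHLC in pure Python, so the values
# stay Decimal objects the whole way through.
for stamp, chunk in decimal_series.groupby(pd.Grouper(freq="3min")):
    chunk = chunk.dropna()
    if chunk.empty:
        continue
    stamps.append(stamp)
    records.append({
        "open": chunk.iloc[0],
        "high": max(chunk),
        "low": min(chunk),
        "close": chunk.iloc[-1],
    })

ohlc_frame = pd.DataFrame(records, index=stamps,
                          columns=["open", "high", "low", "close"])
print(ohlc_frame)
```

This runs at Python speed rather than Cython speed, which is the trade-off for keeping Decimal values.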

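【Comments】:

【Solution 4】:

I ran into this problem in the past too, and the solution I ended up using was to represent money as a multiple of its lowest denomination (i.e., one cent for USD). The type is therefore int. As already mentioned here, the advantage of this approach is that integer calculations are lossless.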

Price (currency) = Multiplier * Sub_unit

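For example, for USD the price unit is the dollar, the sub-unit is 1 cent, and the multiplier is 100.

Another aspect I'd like to mention is that this works across different currencies. For example, the smallest denomination of the Japanese yen is 1 yen, so the multiplier is 1. The smallest denomination of the Indonesian rupiah is 1000 rupiah, so the multiplier can also be 1. You just need to remember the multiplier for each currency.

In fact, you could even create a custom class that wraps this conversion for you, which is probably the most convenient solution.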
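
One way such a wrapper class might look, as a rough sketch. The `Money` class, the `MULTIPLIERS` table, and the method names are all hypothetical names invented for this illustration, and only two currencies are included.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical lookup table: sub-units per price unit for each currency.
MULTIPLIERS = {"USD": 100, "JPY": 1}


@dataclass(frozen=True)
class Money:
    units: int      # amount expressed in the smallest sub-unit
    currency: str

    @classmethod
    def from_string(cls, text, currency):
        """Parse a decimal string such as '133.04' into integer sub-units."""
        units = Decimal(text) * MULTIPLIERS[currency]
        if units != units.to_integral_value():
            raise ValueError(f"{text} has too many decimal places for {currency}")
        return cls(int(units), currency)

    def __add__(self, other):
        if other.currency != self.currency:
            raise ValueError("cannot add amounts in different currencies")
        return Money(self.units + other.units, self.currency)

    def to_decimal(self):
        """Convert back to the price unit for display."""
        return Decimal(self.units) / MULTIPLIERS[self.currency]


total = Money.from_string("133.04", "USD") + Money.from_string("0.96", "USD")
print(total.units)         # 13400
print(total.to_decimal())  # 134
```

Keeping the multiplier lookup inside the class means callers never handle the scaling themselves.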

【Comments】:

How would such a custom class work with a Pandas dataframe read from a CSV file?
@Martin Thoma I think you would have to do a quick conversion on it before using the DataFrame.
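
One possible form of that conversion, sketched under the assumption that the CSV has a `price` column of two-decimal strings. The CSV content and the `to_cents` helper are invented for the example, and it converts at read time via the `converters` argument of `pd.read_csv` instead of wrapping each value in a custom class.

```python
import io
from decimal import Decimal

import pandas as pd

# Stand-in for a CSV file on disk (illustrative data).
csv_text = "timestamp,price\n2015-06-14 14:55:00,133.04\n2015-06-14 14:56:00,2.50\n"


def to_cents(text):
    """Convert a two-decimal price string to integer cents."""
    return int(Decimal(text) * 100)


# The converter receives each raw cell as a string, so no float is involved.
df = pd.read_csv(io.StringIO(csv_text), converters={"price": to_cents})

print(df["price"].tolist())  # [13304, 250]
```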