【Question Title】: Python: normalizing some of the columns of a pandas DataFrame
【Posted】: 2025-12-10 07:30:01
【Question】:

I have a DataFrame in which I want to normalize some arbitrary columns using another arbitrary column:

import itertools as it
import numpy as np
import pandas as pd

header = tuple(['h_seqNum', 'h_stamp', 'user_id'])
joints = tuple(['head', 'neck', 'torso'])
attribs = tuple(['pos_x','pos_y','pos_z'])

# Two tuples: the joint name and the attribute name for each joint/attribute pair
all_columns = iter(zip(*it.product(joints, attribs)))
multiind_first = list(it.chain(['header']*len(header), next(all_columns), ['pose',]))
multiind_second = list(it.chain(header, next(all_columns), ['pose',]))

df = pd.DataFrame(np.random.rand(65).reshape(5,13),  columns = pd.MultiIndex.from_arrays([multiind_first, multiind_second], names=['joint', 'attrib']))

The resulting DataFrame looks like this:

joint    header                            head                       neck                       torso                      pose
attrib   h_seqNum    h_stamp    user_id    pos_x    pos_y    pos_z    pos_x    pos_y    pos_z    pos_x    pos_y    pos_z    pose
0        0.681       0.059      0.607      0.093    0.504    0.975    0.317    0.739    0.129    0.759    0.254    0.814    1
1        0.914       0.420      0.305      0.242    0.700    0.180    0.324    0.171    0.477    0.943    0.877    0.069    0
2        0.522       0.395      0.118      0.739    0.653    0.326    0.947    0.517    0.036    0.647    0.079    0.227    0
3        0.475       0.815      0.792      0.208    0.472    0.427    0.213    0.544    0.440    0.033    0.636    0.527    2
4        0.767       0.774      0.983      0.646    0.949    0.947    0.402    0.015    0.913    0.734    0.192    0.032    0    

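I want to normalize all the columns (attributes) belonging to an arbitrary joint (e.g. 'head') using another arbitrary joint (e.g. 'torso'). For example, something like this: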

df['head'] = df['head'] - df['torso']
df['neck'] = df['neck'] - df['torso']
# Note that torso remains "unnormalized"

To do this I wrote a function:

def normalize_joints(df, from_joint):
    # Every joint except the reference joint gets normalized
    joint_names = set(joints) - set([from_joint,])
    for j in list(joint_names):
        df[j] = df[j] - df[from_joint]

However, when I run this function, I get the following error:

normalize_joints(df, 'torso')

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-414-47f39f04716d> in <module>()
----> 1 normalize_joints(df, 'torso')

<ipython-input-407-cf13a67fabd8> in normalize_joints(df, from_joint)
      2     joint_names = set(joints) - set([from_joint,])
      3     for j in list(joint_names):
----> 4         df[j] = df[j] - df[from_joint]

/Library/Python/2.7/site-packages/pandas/core/frame.pyc in __setitem__(self, key, value)
   2117                                          fill_value, limit, takeable=takeable)
   2118 
-> 2119         return frame
   2120 
   2121     def _reindex_index(self, new_index, method, copy, level, fill_value=NA,

/Library/Python/2.7/site-packages/pandas/core/frame.pyc in _set_item(self, key, value)
   2164     @Appender(_shared_docs['reindex_axis'] % _shared_doc_kwargs)
   2165     def reindex_axis(self, labels, axis=0, method=None, level=None, copy=True,
-> 2166                      limit=None, fill_value=np.nan):
   2167         return super(DataFrame, self).reindex_axis(labels=labels, axis=axis,
   2168                                                    method=method, level=level,

/Library/Python/2.7/site-packages/pandas/core/generic.pyc in _set_item(self, key, value)
    677 
    678     __bool__ = __nonzero__
--> 679 
    680     def bool(self):
    681         """ Return the bool of a single element PandasObject

/Library/Python/2.7/site-packages/pandas/core/internals.pyc in set(self, item, value)
   1768     def sp_index(self):
   1769         return self.values.sp_index
-> 1770 
   1771     @property
   1772     def kind(self):

/Library/Python/2.7/site-packages/pandas/core/internals.pyc in _reset_ref_locs(self)
   1054         # see if we can align other
   1055         if hasattr(other, 'reindex_axis'):
-> 1056             if align:
   1057                 axis = getattr(other, '_info_axis_number', 0)
   1058                 other = other.reindex_axis(self.items, axis=axis,

/Library/Python/2.7/site-packages/pandas/core/internals.pyc in _rebuild_ref_locs(self)
   1062 
   1063         # make sure that we can broadcast
-> 1064         is_transposed = False
   1065         if hasattr(other, 'ndim') and hasattr(values, 'ndim'):
   1066             if values.ndim != other.ndim or values.shape == other.shape[::-1]:

AttributeError: _ref_locs

After many attempts, I have not been able to find the source of the error. If I run the operation

df['head'] - df['torso']

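it returns a DataFrame with the correct result. However, when I try to assign that DataFrame to df['head'], I get the error shown above.

Is there any way to accomplish this?

I would also like to know whether there is a better way to perform the same normalization than the one I am attempting. Perhaps using groupby and then applying a normalize function to the selected DataFrame?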

Edit:

This error occurs with numpy 1.6 and pandas 0.12.

After upgrading to numpy 1.8 and pandas 0.13, the following works:

df['head'] = df['head'] - df['torso']
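
As a minimal sketch for readers on Python 3 and a recent pandas, the direct assignment can be wrapped in the function from the question. This assumes a simplified frame that holds only the joint columns (no header or pose) and uses np.arange values instead of random ones so the result is easy to check. Returning a copy instead of modifying df in place is a choice made for this illustration, not something the original post does.

```python
import numpy as np
import pandas as pd

joints = ("head", "neck", "torso")
attribs = ("pos_x", "pos_y", "pos_z")

# Simplified stand-in for the frame in the question: joint columns only.
columns = pd.MultiIndex.from_product([joints, attribs], names=["joint", "attrib"])
df = pd.DataFrame(np.arange(18, dtype=float).reshape(2, 9), columns=columns)


def normalize_joints(df, from_joint):
    """Return a copy of df with from_joint subtracted from every other joint."""
    out = df.copy()
    for j in joints:
        if j != from_joint:
            # Partial-key assignment: 'head' addresses all three head columns.
            out[j] = df[j] - df[from_joint]
    return out


normalized = normalize_joints(df, "torso")
print(normalized)
```

The torso columns are left untouched, matching the "unnormalized" note in the question.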

【Comments】:

  • In your first code block, you need to replace multiind_first with mi_level_one, and multiind_second with mi_level_two
  • Replaced. It was just a copy-paste problem with my code. Thanks!

Tags: python pandas


【Solution 1】:

The problem is that your columns are an instance of MultiIndex. Try this:

def normalize_joints(df, from_joint):
    joint_names = set(joints) - set([from_joint,])
    for j in list(joint_names):
        keys = [(j,c) for c in attribs]
        df[keys] = df[j] - df[from_joint]

print(df)
normalize_joints(df, 'torso')
print(df)

Output:

joint     header                          head                          neck                         torso                          pose
attrib  h_seqNum   h_stamp   user_id     pos_x     pos_y     pos_z     pos_x     pos_y     pos_z     pos_x     pos_y     pos_z      pose
0       0.067366  0.957394  0.983969  0.602662  0.505270  0.990675  0.753841  0.598397  0.846479  0.757155  0.220009  0.328470  0.686525
1       0.806405  0.800388  0.302178  0.935559  0.180360  0.322767  0.230457  0.617555  0.602589  0.109482  0.181803  0.311266  0.929481
2       0.649677  0.237286  0.963088  0.370463  0.471590  0.489256  0.060383  0.070885  0.858312  0.306232  0.511731  0.257015  0.283287
3       0.054800  0.127925  0.099985  0.700160  0.211256  0.026782  0.820380  0.922593  0.600130  0.100745  0.418157  0.869735  0.597275
4       0.678372  0.334520  0.247894  0.616133  0.914610  0.229628  0.317488  0.224910  0.620222  0.952499  0.946568  0.539502  0.838473
joint     header                          head                          neck                         torso                          pose
attrib  h_seqNum   h_stamp   user_id     pos_x     pos_y     pos_z     pos_x     pos_y     pos_z     pos_x     pos_y     pos_z      pose
0       0.067366  0.957394  0.983969 -0.154493  0.285261  0.662205 -0.003314  0.378387  0.518009  0.757155  0.220009  0.328470  0.686525
1       0.806405  0.800388  0.302178  0.826077 -0.001443  0.011501  0.120975  0.435752  0.291322  0.109482  0.181803  0.311266  0.929481
2       0.649677  0.237286  0.963088  0.064231 -0.040141  0.232241 -0.245850 -0.440846  0.601297  0.306232  0.511731  0.257015  0.283287
3       0.054800  0.127925  0.099985  0.599414 -0.206900 -0.842953  0.719635  0.504436 -0.269605  0.100745  0.418157  0.869735  0.597275
4       0.678372  0.334520  0.247894 -0.336366 -0.031958 -0.309874 -0.635011 -0.721658  0.080719  0.952499  0.946568  0.539502  0.838473
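
The same approach as a self-contained Python 3 sketch. The frame is a simplified stand-in (joint columns only, np.arange values instead of random data), which is an assumption of this example and not the data shown above. The point it illustrates is that the left-hand side names every column by its full (joint, attrib) tuple.

```python
import numpy as np
import pandas as pd

joints = ("head", "neck", "torso")
attribs = ("pos_x", "pos_y", "pos_z")

columns = pd.MultiIndex.from_product([joints, attribs], names=["joint", "attrib"])
df = pd.DataFrame(np.arange(18, dtype=float).reshape(2, 9), columns=columns)


def normalize_joints(df, from_joint):
    """Subtract from_joint from every other joint, modifying df in place."""
    for j in set(joints) - {from_joint}:
        # Full two-level keys, e.g. ('head', 'pos_x'), one per attribute.
        keys = [(j, c) for c in attribs]
        df[keys] = df[j] - df[from_joint]


print(df)
normalize_joints(df, "torso")
print(df)
```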

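【Comments】:

  • Thanks, @xndrme. Your answer raises another question for me. If df['head'] - df['torso'] produces the same pd.DataFrame as in your answer, why can't it be assigned to df['head']? I know it must have something to do with the MultiIndex, but I don't understand why.
  • The problem is that df['head'] is only a partial key on the MultiIndex. That works for getting data, but it seems that for setting you should supply the full multi-level index. (I think this has to do with the pandas implementation; maybe one of its developers can answer your question better ;)
  • It seems the developers did think of this problem somehow. Upgrading to numpy 1.8 and pandas 0.13 solved it.
  • @VGonPa, well, I guess I need to upgrade too :)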
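
To illustrate the distinction between partial and full keys discussed in these comments, here is a small Python 3 sketch on a recent pandas. The frame and its values are made up for the example. It only shows what each kind of key selects; it does not reproduce the pandas 0.12 failure.

```python
import numpy as np
import pandas as pd

joints = ("head", "neck", "torso")
attribs = ("pos_x", "pos_y", "pos_z")

columns = pd.MultiIndex.from_product([joints, attribs], names=["joint", "attrib"])
df = pd.DataFrame(np.arange(18, dtype=float).reshape(2, 9), columns=columns)

# Partial key: selects the 'head' block and drops the 'joint' level.
partial = df["head"]
print(list(partial.columns))

# List of partial keys via .loc: same data, but both levels are kept.
full = df.loc[:, ["head"]]
print(list(full.columns))

# Full key: a (joint, attrib) tuple addresses exactly one column,
# so setting through it is unambiguous.
df[("head", "pos_x")] = df[("head", "pos_x")] - df[("torso", "pos_x")]
```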
【Solution 2】:

I believe I have found a fairly simple solution:

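def normalize(df, from_joint):
    # Drop the columns that must stay untouched, then subtract the reference
    # joint, broadcasting on the attribute level. The result is returned so
    # that df.update() can consume it.
    return df.drop(['header', 'pose', from_joint], axis=1, level='joint').sub(df[from_joint], level=1)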

df.update(normalize(df, 'torso'))
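
A hedged variant of the same idea for Python 3 and a recent pandas, which avoids relying on level broadcasting in sub(). It tiles the reference joint once per target joint with pd.concat so that both operands carry identical two-level column labels. The simplified frame (joint columns only, np.arange values) and the helper's internals are assumptions made for this sketch, not part of the answer above.

```python
import numpy as np
import pandas as pd

joints = ("head", "neck", "torso")
attribs = ("pos_x", "pos_y", "pos_z")

columns = pd.MultiIndex.from_product([joints, attribs], names=["joint", "attrib"])
df = pd.DataFrame(np.arange(18, dtype=float).reshape(2, 9), columns=columns)


def normalize(df, from_joint):
    """Return only the normalized joints, ready to pass to df.update()."""
    others = [j for j in joints if j != from_joint]
    # One copy of the reference joint per target joint, keyed by joint name,
    # so the columns line up as (joint, attrib) pairs.
    reference = pd.concat({j: df[from_joint] for j in others}, axis=1)
    return df[others] - reference


df.update(normalize(df, "torso"))
print(df)
```

Because the returned frame contains only the head and neck columns, df.update leaves torso as it was.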

【Comments】: