【Question Title】: How to standard scale a 3D matrix?
【Posted】: 2018-10-12 01:03:32
【Question】:

I am working on a signal classification problem and would like to scale the dataset matrix first, but my data is in a 3D format (batch, length, channels).
I tried to use Scikit-learn's StandardScaler:

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

But I got this error message:

ValueError: Found array with dim 3. StandardScaler expected <= 2.

I think one solution would be to split the matrix into one 2D matrix per channel, scale each of them separately, and then stack them back into 3D, but I wonder whether there is a better solution.
Thank you very much.

【Question Discussion】:

    Tags: python machine-learning keras scikit-learn deep-learning


    【Solution 1】:

    Only 3 lines of code...

    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train.reshape(-1, X_train.shape[-1])).reshape(X_train.shape)
    X_test = scaler.transform(X_test.reshape(-1, X_test.shape[-1])).reshape(X_test.shape)
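As a sanity check (a minimal sketch with synthetic data; the shapes and `rng` seed are illustrative assumptions), note that this reshape pools the statistics for each channel over both the batch and the length axes:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic data shaped (batch, length, channels) -- illustrative only
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 50, 3))

scaler = StandardScaler()
# Flatten (batch, length) into one axis so each channel becomes a column,
# scale, then restore the original 3D shape
X_scaled = scaler.fit_transform(
    X_train.reshape(-1, X_train.shape[-1])
).reshape(X_train.shape)

print(X_scaled.shape)                                          # (100, 50, 3)
print(np.allclose(X_scaled.reshape(-1, 3).mean(axis=0), 0.0))  # True
print(np.allclose(X_scaled.reshape(-1, 3).std(axis=0), 1.0))   # True
```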
    

    【Discussion】:

      【Solution 2】:

      You have to fit and store a scaler for each channel:

      from sklearn.preprocessing import StandardScaler
      
      scalers = {}
      for i in range(X_train.shape[1]):
          scalers[i] = StandardScaler()
          X_train[:, i, :] = scalers[i].fit_transform(X_train[:, i, :]) 
      
      for i in range(X_test.shape[1]):
          X_test[:, i, :] = scalers[i].transform(X_test[:, i, :]) 
      

      【Discussion】:

      • It doesn't work. Shouldn't it be: for i in range(X_train.shape[1])?
      • No, I think it should be X_train[:, :, i] = scalers[i].fit_transform(X_train[:, :, i]). At least for me, when my data is structured as (batch, samples, rows, columns).
      • Thanks. Does this work on pandas DataFrame columns? I have more than 291 columns; how can we apply the same thing to a pandas DataFrame?
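As the comments note, which axis you loop over depends on the data layout. For the question's (batch, length, channels) layout, a per-channel variant would index the last axis, as in this sketch with synthetic data (an assumption for illustration). Be aware that fitting on `X[:, :, c]` computes a separate mean/std for every time step of that channel, unlike the pooled reshape in Solution 1:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic (batch, length, channels) data -- illustrative only
rng = np.random.default_rng(0)
X_train = rng.normal(size=(64, 20, 3))
X_test = rng.normal(size=(16, 20, 3))

# One scaler per channel; each sees a (batch, length) 2D slice
scalers = {}
for c in range(X_train.shape[2]):
    scalers[c] = StandardScaler()
    X_train[:, :, c] = scalers[c].fit_transform(X_train[:, :, c])

# Reuse the stored scalers on the test set
for c in range(X_test.shape[2]):
    X_test[:, :, c] = scalers[c].transform(X_test[:, :, c])

print(X_train.shape, X_test.shape)  # (64, 20, 3) (16, 20, 3)
```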
      【Solution 3】:

      If you want to scale each feature differently, the way StandardScaler does, you can use this:

      import numpy as np
      from sklearn.base import TransformerMixin
      from sklearn.preprocessing import StandardScaler
      
      
      class NDStandardScaler(TransformerMixin):
          def __init__(self, **kwargs):
              self._scaler = StandardScaler(copy=True, **kwargs)
              self._orig_shape = None
      
          def fit(self, X, **kwargs):
              X = np.array(X)
              # Save the original shape to reshape the flattened X later
              # back to its original shape
              if len(X.shape) > 1:
                  self._orig_shape = X.shape[1:]
              X = self._flatten(X)
              self._scaler.fit(X, **kwargs)
              return self
      
          def transform(self, X, **kwargs):
              X = np.array(X)
              X = self._flatten(X)
              X = self._scaler.transform(X, **kwargs)
              X = self._reshape(X)
              return X
      
          def _flatten(self, X):
              # Reshape X to <= 2 dimensions
              if len(X.shape) > 2:
                  n_dims = np.prod(self._orig_shape)
                  X = X.reshape(-1, n_dims)
              return X
      
          def _reshape(self, X):
          # Reshape X back to its original shape
              if len(X.shape) >= 2:
                  X = X.reshape(-1, *self._orig_shape)
              return X
      

      It simply flattens the features of the input before passing them to sklearn's StandardScaler, and then reshapes them back. Usage is the same as for StandardScaler:

      data = [[[0, 1], [2, 3]], [[1, 5], [2, 9]]]
      scaler = NDStandardScaler()
      print(scaler.fit_transform(data))
      

      prints

      [[[-1. -1.]
        [ 0. -1.]]
      
       [[ 1.  1.]
        [ 0.  1.]]]
      

      The parameters with_mean and with_std are passed directly to StandardScaler and thus work as expected. copy=False does not work, since the reshaping does not happen in place. For 2D input, NDStandardScaler behaves like StandardScaler:

      data = [[0, 0], [0, 0], [1, 1], [1, 1]]
      scaler = NDStandardScaler()
      scaler.fit(data)
      print(scaler.transform(data))
      print(scaler.transform([[2, 2]]))
      

      prints

      [[-1. -1.]
       [-1. -1.]
       [ 1.  1.]
       [ 1.  1.]]
      [[3. 3.]]
      

      just like in the sklearn example for StandardScaler.

      【Discussion】:

      • I have 291 columns in a pandas DataFrame, so I wonder how we can apply the same thing to a pandas DataFrame?
      【Solution 4】:

      An elegant way is to use class inheritance, as follows:

      
      from sklearn.preprocessing import MinMaxScaler
      import numpy as np
      
      class MinMaxScaler3D(MinMaxScaler):
      
          def fit_transform(self, X, y=None):
              x = np.reshape(X, newshape=(X.shape[0]*X.shape[1], X.shape[2]))
              return np.reshape(super().fit_transform(x, y=y), newshape=X.shape)
      
      

      Usage:

      
      scaler = MinMaxScaler3D()
      X = scaler.fit_transform(X)
      
      

      【Discussion】:

      • Indeed! This is the most elegant, shortest, and simplest one.
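One caveat with the subclass above: only `fit_transform` is overridden, so calling `transform` on a held-out 3D test set would fail. A sketch that also overrides `transform` (the 2D guard is needed because `TransformerMixin.fit_transform` calls `self.transform` internally on the already-flattened array):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

class MinMaxScaler3D(MinMaxScaler):
    def fit_transform(self, X, y=None):
        x = np.reshape(X, newshape=(X.shape[0] * X.shape[1], X.shape[2]))
        return np.reshape(super().fit_transform(x, y=y), newshape=X.shape)

    def transform(self, X):
        # Guard: super().fit_transform() re-enters self.transform() with the
        # already-flattened 2D array, so only reshape genuinely 3D input
        if np.ndim(X) == 3:
            x = np.reshape(X, newshape=(X.shape[0] * X.shape[1], X.shape[2]))
            return np.reshape(super().transform(x), newshape=X.shape)
        return super().transform(X)

rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 10.0, size=(5, 4, 3))
X_test = rng.uniform(0.0, 10.0, size=(2, 4, 3))

scaler = MinMaxScaler3D()
print(scaler.fit_transform(X_train).shape)  # (5, 4, 3)
print(scaler.transform(X_test).shape)       # (2, 4, 3)
```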
      【Solution 5】:

      I used a normalization scheme for spatiotemporal data of shape (2500, 512, 642) --> (samples, timesteps, features/spatial locations). The following code can be used for the normalization and its inverse.

      def Normalize_data(data):
          scaled_data = []
          max_values  = []
          min_values  = []
          for N in range(data.shape[0]):
              temp = []
              t1   = []
              t2   = []
              for i in range(data.shape[1]):
                  max_val = np.max(data[N,i])
                  min_val = np.min(data[N,i])
                  norm = (data[N,i] - min_val)/(max_val - min_val)
                  temp.append(norm)
                  t1.append(max_val)
                  t2.append(min_val)
      
              scaled_data.append(temp)
              max_values.append(t1)
              min_values.append(t2)
          return (np.array(scaled_data), np.array(max_values), np.array(min_values))
      
      def InverseNormalize_data(scaled_data, max_values, min_values):
          res_data = []
          for N in range(scaled_data.shape[0]):
              temp = []
              for i in range(scaled_data.shape[1]):
                  max_val = max_values[N,i]
                  min_val = min_values[N,i]
                  #print(max_val)
                  #print(min_val)
                  orig = (scaled_data[N,i] * (max_val - min_val)) + min_val
                  temp.append(orig)
              res_data.append(temp)
          return np.array(res_data)
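The loops above can also be written in vectorized NumPy, which makes the round-trip property easy to check. A sketch (here the min/max arrays keep a trailing axis of length 1 for broadcasting, unlike the 2D arrays returned above):

```python
import numpy as np

def normalize(data):
    # Per-sample, per-timestep min-max over the feature axis
    mx = data.max(axis=2, keepdims=True)
    mn = data.min(axis=2, keepdims=True)
    return (data - mn) / (mx - mn), mx, mn

def denormalize(scaled, mx, mn):
    # Invert the min-max scaling using the stored extrema
    return scaled * (mx - mn) + mn

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6, 10))  # (samples, timesteps, features) -- illustrative
scaled, mx, mn = normalize(x)
print(scaled.min(), scaled.max())                   # 0.0 1.0
print(np.allclose(denormalize(scaled, mx, mn), x))  # True
```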
      

      【Discussion】:

        【Solution 6】:
        s0, s1, s2 = y_train.shape[0], y_train.shape[1], y_train.shape[2]
        y_train = y_train.reshape(s0 * s1, s2)
        y_train = minMaxScaler.fit_transform(y_train)
        y_train = y_train.reshape(s0, s1, s2)
        
        s0, s1, s2 = y_test.shape[0], y_test.shape[1], y_test.shape[2]
        y_test = y_test.reshape(s0 * s1, s2)
        y_test = minMaxScaler.transform(y_test)
        y_test = y_test.reshape(s0, s1, s2)
        

        Just reshape the data like this. For zero-padded data, use something like the following, which fits the scaler only on the first time step of each sample:

        s0, s1, s2 = x_train.shape[0], x_train.shape[1], x_train.shape[2]
        x_train = x_train.reshape(s0 * s1, s2)
        minMaxScaler.fit(x_train[0::s1])
        x_train = minMaxScaler.transform(x_train)
        x_train = x_train.reshape(s0, s1, s2)
        
        s0, s1, s2 = x_test.shape[0], x_test.shape[1], x_test.shape[2]
        x_test = x_test.reshape(s0 * s1, s2)
        x_test = minMaxScaler.transform(x_test)
        x_test = x_test.reshape(s0, s1, s2)
        

        【Discussion】:

          【Solution 7】:

          If you are working with pipelines, you can use this class:

          from sklearn.base import TransformerMixin,BaseEstimator
          from sklearn.preprocessing import StandardScaler
          
          class Scaler(BaseEstimator,TransformerMixin):
          
              def __init__(self):
                  self.scaler = StandardScaler()
          
              def fit(self,X,y=None):
                  self.scaler.fit(X.reshape(X.shape[0], -1))
                  return self
          
              def transform(self,X):
                  return self.scaler.transform(X.reshape(X.shape[0], -1)).reshape(X.shape)
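Assuming the `Scaler` class above (repeated here so the sketch is self-contained), it drops into a scikit-learn `Pipeline` as usual. Note that flattening to `(n_samples, length * channels)` fits separate statistics for every (time step, channel) pair:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

class Scaler(BaseEstimator, TransformerMixin):
    def __init__(self):
        self.scaler = StandardScaler()

    def fit(self, X, y=None):
        # Flatten everything after the sample axis into one feature axis
        self.scaler.fit(X.reshape(X.shape[0], -1))
        return self

    def transform(self, X):
        return self.scaler.transform(X.reshape(X.shape[0], -1)).reshape(X.shape)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4, 2))  # (batch, length, channels) -- illustrative

pipe = Pipeline([("scale", Scaler())])
out = pipe.fit_transform(X)
print(out.shape)  # (8, 4, 2)
```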
          

          【Discussion】:
