【Question title】: Slicing multiple column ranges from a dataframe using iloc
【Posted】: 2018-02-09 16:06:21
【Question】:

I have a df with 32 columns:

df.shape
(568285, 32)

I am trying to rearrange the columns in a specific order and drop the first column using iloc:

 df = df.iloc[:,[31,[1:23],24,25,26,28,27,29,30]]
                         ^
SyntaxError: invalid syntax

Is this the right way to do it?

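【Comments】:

  • @JohnGalt I need to improve my search skills, apologies.
  • Give it time, we've all been there. :)
  • @novicebioinforesearcher Also, it's important to note that dupes aren't bad. Sometimes we search differently, or you just don't know what to search for. It shouldn't reflect a negative judgment on you or your question. In fact, a lot of rep has been earned from this question and its answers. All that marking a question as a dupe accomplishes is redirecting search results to the dupe target. So, in fact, you have helped by including other search terms that will reroute to another answer (-:
  • Ah, I see. The concept of dupes makes sense.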

Tags: python pandas dataframe indexing


【Solution 1】:

You can use the np.r_ indexer. From its help text:

class RClass(AxisConcatenator)
 |  Translates slice objects to concatenation along the first axis.
 |  
 |  This is a simple way to build up arrays quickly. There are two use cases.

df = df.iloc[:, np.r_[31, 1:23, 24, 25, 26, 28, 27, 29, 30]]

df

     0     1     2     3     4     5     6     7     8     9   ...     40  \
A  33.0  44.0  68.0  31.0   NaN  87.0  66.0   NaN  72.0  33.0  ...   71.0   
B   NaN   NaN  77.0  98.0   NaN  48.0  91.0  43.0   NaN  89.0  ...   38.0   
C  45.0  55.0   NaN  72.0  61.0  87.0   NaN  99.0  96.0  75.0  ...   83.0   
D   NaN   NaN   NaN  58.0   NaN  97.0  64.0  49.0  52.0  45.0  ...   63.0   

     41    42    43    44    45    46    47    48    49  
A   NaN  87.0  31.0  50.0  48.0  73.0   NaN   NaN  81.0  
B  79.0  47.0  51.0  99.0  59.0   NaN  72.0  48.0   NaN  
C  93.0   NaN  95.0  97.0  52.0  99.0  71.0  53.0  69.0  
D   NaN  41.0   NaN   NaN  55.0  90.0   NaN   NaN  92.0

out = df.iloc[:, np.r_[31, 1:23, 24, 25, 26, 28, 27, 29, 30]]
out 
     31    1     2     3     4     5     6     7     8     9   ...     20  \
A  99.0  44.0  68.0  31.0   NaN  87.0  66.0   NaN  72.0  33.0  ...   66.0   
B  42.0   NaN  77.0  98.0   NaN  48.0  91.0  43.0   NaN  89.0  ...    NaN   
C  77.0  55.0   NaN  72.0  61.0  87.0   NaN  99.0  96.0  75.0  ...   76.0   
D  95.0   NaN   NaN  58.0   NaN  97.0  64.0  49.0  52.0  45.0  ...   71.0   

     21    22    24    25    26    28    27    29    30  
A   NaN  40.0  66.0  87.0  97.0  68.0   NaN  68.0   NaN  
B  95.0   NaN  47.0  79.0  47.0   NaN  83.0  81.0  57.0  
C   NaN  75.0  46.0  84.0   NaN  50.0  41.0  38.0  52.0  
D   NaN  74.0  41.0  55.0  60.0   NaN   NaN  84.0   NaN  
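
To make the mechanics concrete, here is a minimal sketch on a hypothetical 4 x 32 frame (the data and the variable names `positions` and `positions_list` are illustrative assumptions, not the asker's). Slice syntax such as `1:23` is only valid directly inside square-bracket indexing, which is why it fails inside a list literal but works inside `np.r_[...]`, where it is expanded and concatenated with the other integers. A plain-Python list built with `range` unpacking should give the same positions.

```python
import numpy as np
import pandas as pd

# Hypothetical 4 x 32 frame standing in for the asker's data
df = pd.DataFrame(np.arange(4 * 32).reshape(4, 32))

# np.r_ expands the slice 1:23 (end-exclusive) and concatenates
# it with the scalar positions into a single 1-D integer array
positions = np.r_[31, 1:23, 24, 25, 26, 28, 27, 29, 30]

# Equivalent plain-Python list, using range unpacking instead of np.r_
positions_list = [31, *range(1, 23), 24, 25, 26, 28, 27, 29, 30]

# Select and reorder the columns by integer position
out = df.iloc[:, positions]
```

Because the slice is end-exclusive, this selection drops column 23 as well as column 0. If only the first column is meant to be dropped, the slice would presumably need to be `1:24`.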


    【Solution 2】:

    Here is a custom solution that uses explicit indexing.
    Side note: np.r_ did not work for me, which is why I built this solution.

    import numpy as np
    import pandas as pd
    
    # Make a sample df of 1_000 rows & 100 cols
    data = np.zeros(shape=(1_000,100))
    df = pd.DataFrame(data)
    
    
    # Create a custom function for indexing
    def all_nums_in_range(*tuple_pairs, len_df):
        """
        Input pairs of tuples for index slicing
    
        Include `len_df` to ensure length of array matches indexed df
        """
    
        # Create an array with values to use as an index
        num_range = np.zeros(shape=(len_df,), dtype=bool)
    
        # Update
        for (start, end) in tuple_pairs:
            num_range[start:end] = True
    
        return num_range
    
    
    # Now apply
    num_range = all_nums_in_range((0,50), (75, 80), len_df=100)
    df.iloc[:, num_range]
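
One design point about the boolean-mask approach: a mask can only say which columns to keep, so the selected columns come back in their original order. The sketch below, on a hypothetical 3 x 100 frame, shows this and one possible way to recover integer positions with `np.flatnonzero` when reordering is also needed. The names `mask`, `selected`, `positions` and `reordered` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical 3 x 100 frame of zeros
df = pd.DataFrame(np.zeros((3, 100)))

# Boolean mask with one entry per column; True marks a column to keep
mask = np.zeros(df.shape[1], dtype=bool)
mask[0:50] = True
mask[75:80] = True

# Boolean indexing keeps the marked columns in their original order
selected = df.iloc[:, mask]

# Convert the mask to integer positions so the columns can be reordered,
# here simply reversed for illustration
positions = np.flatnonzero(mask)
reordered = df.iloc[:, positions[::-1]]
```

For the rearrangement in the original question, integer positions would be needed at some point, since a boolean mask alone cannot move column 31 to the front.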
    
