【Question title】: Create list of column names in multi-index pandas dataframe
【Posted】: 2020-11-21 15:35:28
【Question】:

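I have a convoluted list of column names in a dataframe that I read from an Excel sheet. The data is imported as a multi-index dataframe with two levels of column labels. I want to create a list of the column names that contain a specific string, so that I can drop them from the dataframe.

My idea was to use something like this: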

# Create list of names for unwanted columns.
lst = [col for col in df.columns if 'ISTD' in col]
# Returns empty.

# Drop columns from dataframe.
df.drop(labels = lst, axis=1, level=0, inplace=True)

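The list comes back empty, though, so I guess the problem is that I don't know how to select columns correctly in a multi-index dataframe. I find the documentation for it hard to follow, so I'm hoping to get an answer here.

Here are my column names for reference: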

df.columns
Out[44]: 
MultiIndex([('115  In ( ISTD )  [ He Gas ] ',                 'CPS'),
            ('115  In ( ISTD )  [ He Gas ] ',             'CPS RSD'),
            (         '137  Ba  [ He Gas ] ',           'Conc. RSD'),
            (         '137  Ba  [ He Gas ] ',       'Conc. [ ppb ]'),
            (         '137  Ba  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            ('159  Tb ( ISTD )  [ He Gas ] ',                 'CPS'),
            ('159  Tb ( ISTD )  [ He Gas ] ',             'CPS RSD'),
            ('175  Lu ( ISTD )  [ He Gas ] ',                 'CPS'),
            ('175  Lu ( ISTD )  [ He Gas ] ',             'CPS RSD'),
            (         '208  Pb  [ He Gas ] ',           'Conc. RSD'),
            (         '208  Pb  [ He Gas ] ',       'Conc. [ ppb ]'),
            (         '208  Pb  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '23  Na  [ He Gas ] ',           'Conc. RSD'),
            (          '23  Na  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '23  Na  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '24  Mg  [ He Gas ] ',           'Conc. RSD'),
            (          '24  Mg  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '24  Mg  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '27  Al  [ He Gas ] ',           'Conc. RSD'),
            (          '27  Al  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '27  Al  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (           '39  K  [ He Gas ] ',           'Conc. RSD'),
            (           '39  K  [ He Gas ] ',       'Conc. [ ppb ]'),
            (           '39  K  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '44  Ca  [ He Gas ] ',           'Conc. RSD'),
            (          '44  Ca  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '44  Ca  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            ( '45  Sc ( ISTD )  [ He Gas ] ',                 'CPS'),
            ( '45  Sc ( ISTD )  [ He Gas ] ',             'CPS RSD'),
            (          '52  Cr  [ He Gas ] ',           'Conc. RSD'),
            (          '52  Cr  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '52  Cr  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '55  Mn  [ He Gas ] ',           'Conc. RSD'),
            (          '55  Mn  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '55  Mn  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '56  Fe  [ He Gas ] ',           'Conc. RSD'),
            (          '56  Fe  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '56  Fe  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '60  Ni  [ He Gas ] ',           'Conc. RSD'),
            (          '60  Ni  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '60  Ni  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '63  Cu  [ He Gas ] ',           'Conc. RSD'),
            (          '63  Cu  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '63  Cu  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '66  Zn  [ He Gas ] ',           'Conc. RSD'),
            (          '66  Zn  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '66  Zn  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (  '7  Li ( ISTD )  [ He Gas ] ',                 'CPS'),
            (  '7  Li ( ISTD )  [ He Gas ] ',             'CPS RSD'),
            ( '72  Ge ( ISTD )  [ He Gas ] ',                 'CPS'),
            ( '72  Ge ( ISTD )  [ He Gas ] ',             'CPS RSD'),
            (          '75  As  [ He Gas ] ',           'Conc. RSD'),
            (          '75  As  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '75  As  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '78  Se  [ He Gas ] ',           'Conc. RSD'),
            (          '78  Se  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '78  Se  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '82  Se  [ He Gas ] ',           'Conc. RSD'),
            (          '82  Se  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '82  Se  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '95  Mo  [ He Gas ] ',           'Conc. RSD'),
            (          '95  Mo  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '95  Mo  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (                       'Sample',      'Acq. Date-Time'),
            (                       'Sample',             'Comment'),
            (                       'Sample',           'Data File'),
            (                       'Sample',               'Level'),
            (                       'Sample',                'Rjct'),
            (                       'Sample',         'Sample Name'),
            (                       'Sample',          'Total Dil.'),
            (                       'Sample',                'Type'),
            (                       'Sample',  'Unnamed: 0_level_1'),
            (                       'Sample',         'Vial Number')]

Thanks for reading.

【Comments】:

  • Have you tried using .tolist() after df.columns?

Tags: python pandas dataframe multi-index


【Solution 1】:

Here is another way. First, create a sample MultiIndex with 4 entries (each entry is a tuple):

import numpy as np
import pandas as pd

midx = pd.MultiIndex.from_tuples([
        ('115  In ( ISTD )  [ He Gas ] ',           'CPS'),
        ('115  In ( ISTD )  [ He Gas ] ',       'CPS RSD'),
        (         '137  Ba  [ He Gas ] ',     'Conc. RSD'),
        (         '137  Ba  [ He Gas ] ', 'Conc. [ ppb ]'),
])

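Now create a boolean mask (looking for ISTD in the first level of the MultiIndex) and use its negation to keep the remaining entries: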

mask = np.array(['ISTD' in idx for idx in midx.get_level_values(0)])
midx[ ~ mask ]

MultiIndex([('137  Ba  [ He Gas ] ',     'Conc. RSD'),
            ('137  Ba  [ He Gas ] ', 'Conc. [ ppb ]')],
           )
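
To get from the masked index back to the data, the same boolean mask can be passed to `df.loc` on the column axis. This is a minimal sketch, assuming a small made-up frame: the name `df_demo` and its values are illustrative and not taken from the question.

```python
import numpy as np
import pandas as pd

midx = pd.MultiIndex.from_tuples([
    ('115  In ( ISTD )  [ He Gas ] ', 'CPS'),
    ('115  In ( ISTD )  [ He Gas ] ', 'CPS RSD'),
    ('137  Ba  [ He Gas ] ', 'Conc. RSD'),
    ('137  Ba  [ He Gas ] ', 'Conc. [ ppb ]'),
])

# Hypothetical data: two rows of placeholder numbers.
df_demo = pd.DataFrame(np.arange(8).reshape(2, 4), columns=midx)

# True where the first-level label contains 'ISTD'.
mask = np.array(['ISTD' in name for name in midx.get_level_values(0)])

# Keep only the columns whose mask entry is False.
kept = df_demo.loc[:, ~mask]

# The same mask built with the vectorized string accessor on the level values.
mask_alt = midx.get_level_values(0).str.contains('ISTD', regex=False)
```

Selecting with `~mask` returns a new frame and leaves `df_demo` unchanged, which avoids building an explicit list of names to drop.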

【Comments】:

    【Solution 2】:

    MultiIndex columns behave like a list of tuples, so test the first element of each tuple. You can do this:

    lst = [col for col in df.columns if 'ISTD' in col[0]]
    df = df.drop(lst, axis=1)
    

    【Comments】:

      【Solution 3】:

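      With multi-level columns, df.columns returns an object (of type MultiIndex) that you can treat as a list of tuples.

      You can iterate over the tuples and drop the matching ones like this:

      # Test the first level, which is where 'ISTD' appears in the question's data.
      cols = [(first, second) for first, second in df.columns if 'ISTD' in first]
      # Full tuples are dropped without `level`; passing level=1 here raises a KeyError.
      df = df.drop(cols, axis=1)
      

      This looks for 'ISTD' only in the first level (the first value of each tuple you get from df.columns). To match against the second level instead, test `second`.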

      【Comments】:

      • Nice, it works if I remove the level argument from the drop call. With it, I got a KeyError (f"labels {codes} not found in level"). Not sure what that means.
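
The KeyError mentioned in the comment is consistent with how `level` works: when `level` is given, `drop` matches the labels against that single level, so it expects plain level values rather than full tuples. A sketch of both variants, assuming a small made-up frame (`df_demo` and its values are illustrative only):

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_tuples([
    ('115  In ( ISTD )  [ He Gas ] ', 'CPS'),
    ('115  In ( ISTD )  [ He Gas ] ', 'CPS RSD'),
    ('137  Ba  [ He Gas ] ', 'Conc. RSD'),
    ('Sample', 'Sample Name'),
])
df_demo = pd.DataFrame(np.zeros((2, 4)), columns=columns)

# Unique first-level labels that contain 'ISTD' (plain strings, not tuples).
istd_labels = [name for name in df_demo.columns.get_level_values(0).unique()
               if 'ISTD' in name]

# With level=0, drop matches these strings against the first level only.
by_level = df_demo.drop(columns=istd_labels, level=0)

# Without level, drop expects full (first, second) tuples instead.
istd_tuples = [col for col in df_demo.columns if 'ISTD' in col[0]]
by_tuple = df_demo.drop(columns=istd_tuples)
```

Both calls should leave the same two columns; the level-based form is closest to what the question originally attempted.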
      【Solution 4】:

      You don't need to build a list at all: you can skip the unwanted columns while reading the file by passing a callable to "usecols".

      data = pd.read_excel(directory, usecols=lambda x: "unwanted_string" not in x)
      

      If you still want to build a list, you can read the header row on its own and then filter the unwanted strings out of that list.

      # Read in the column names as a list:
      cols = pd.read_excel(directory, header=None, nrows=1, index_col=0).values[0]
      cols = cols.tolist()
      
      # Keep only the elements that do not contain the unwanted string.
      # A new list is built because removing items from a list while
      # iterating over it skips elements.
      cols = [item for item in cols if "string" not in str(item)]
      
      # Then assign the cols list as the columns of the dataframe
      # (its length must match the number of columns in data):
      data.columns = cols
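
One pitfall with filtering a list of names: calling `remove` on a list while looping over that same list skips the element right after each removal. The sketch below uses hypothetical header values to compare that pattern with building a new list.

```python
names = ['a ISTD', 'b ISTD', 'c', 'd ISTD']  # hypothetical header values

# Removing while iterating: the loop skips the item after each removal.
mutated = list(names)
for item in mutated:
    if 'ISTD' in item:
        mutated.remove(item)

# Building a new list visits every item exactly once.
filtered = [item for item in names if 'ISTD' not in item]
```

Here `mutated` still contains 'b ISTD', while `filtered` contains only 'c'.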
      

      【Comments】:
