【Question title】: Python pandas method chaining: assign column from strsplit
【Posted】: 2016-02-18 09:40:42
【Question】:

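I am having trouble with the assign method when I try to create a new column by splitting another column. If I select a value from the result of split, I get the error ValueError: Length of values does not match length of index. If I only apply the split, without selecting (indexing) any value, I get a new column that contains lists.

Here is the output when I do not index the result of split: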

import pandas as pd

(
    pd.DataFrame({
        "Gene": ["G1", "G1", "G2", "G2"],
        "Sample": ["H1_T1", "H2_T1", "H1_T1", "H2_T1"]
    })
    # No indexing here: each row of Timepoint holds the full list from split
    .assign(Timepoint = lambda x: x.Sample.str.split("_"))
)
  Gene Sample Timepoint
0   G1  H1_T1  [H1, T1]
1   G1  H2_T1  [H2, T1]
2   G2  H1_T1  [H1, T1]
3   G2  H2_T1  [H2, T1]

Here is the example where I try to select the T1 or T2 value from the Sample column, which raises the error:

(
    pd.DataFrame({
        "Gene": ["G1", "G1", "G2", "G2"],
        "Sample": ["H1_T1", "H2_T1", "H1_T1", "H2_T1"]
    })
    .assign(Timepoint = lambda x: x.Sample.str.split("_")[1])
)

The error I get from this is:

/home/user/anaconda3/lib/python3.4/site-packages/pandas/core/series.py in _sanitize_index(data, index, copy)
   2739 
   2740     if len(data) != len(index):
-> 2741         raise ValueError('Length of values does not match length of '
   2742                          'index')
   2743 

ValueError: Length of values does not match length of index

【Comments】:

    Tags: python pandas assign method-chaining


    【Solution 1】:

    IIUC, you need an additional call to str to select the element:

    In [234]:
    pd.DataFrame({
            "Gene": ["G1", "G1", "G2", "G2"],
            "Sample": ["H1_T1", "H2_T1", "H1_T1", "H2_T1"]
        }).assign(Timepoint = lambda x: x.Sample.str.split("_").str[1])
    
    Out[234]:
      Gene Sample Timepoint
    0   G1  H1_T1        T1
    1   G1  H2_T1        T1
    2   G2  H1_T1        T1
    3   G2  H2_T1        T1
    

    If we modify your df slightly so that every row has a different timepoint, and look at the output of split:

    In [237]:
    df = pd.DataFrame({
            "Gene": ["G1", "G1", "G2", "G2"],
            "Sample": ["H1_T1", "H2_T2", "H1_T3", "H2_T4"]
        })
    
    df['Sample'].str.split("_")
    
    Out[237]:
    0    [H1, T1]
    1    [H2, T2]
    2    [H1, T3]
    3    [H2, T4]
    dtype: object
    

    So what you tried is the following:

    In [238]:
    df['Sample'].str.split("_")[1]
    
    Out[238]:
    ['H2', 'T2']
    

    You can see that this selects the second row (the row labelled 1) of the Series. What you want is the second element of each row:

    In [239]:
    df['Sample'].str.split("_").str[1]
    
    Out[239]:
    0    T1
    1    T2
    2    T3
    3    T4
    dtype: object
    
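The difference between the two kinds of indexing can be checked in a short script. This is only a sketch built on the modified df from above; the variable names (parts, row_one, second_elems, assign_error, result) are made up for illustration, and it assumes the default integer index, so that parts[1] is a lookup of the row labelled 1.

```python
import pandas as pd

df = pd.DataFrame({
    "Gene": ["G1", "G1", "G2", "G2"],
    "Sample": ["H1_T1", "H2_T2", "H1_T3", "H2_T4"],
})

# Series of lists, one list per row
parts = df["Sample"].str.split("_")

# Plain [1] looks up the row labelled 1 and returns that row's list
row_one = parts[1]

# .str[1] (or the equivalent .str.get(1)) indexes into every row's list
second_elems = parts.str[1]
second_elems_get = parts.str.get(1)

# Handing the two-element list to assign fails on a four-row frame
try:
    df.assign(Timepoint=row_one)
    assign_error = None
except ValueError as exc:
    assign_error = str(exc)

# The chained version from the answer, applied to the modified df
result = df.assign(Timepoint=lambda x: x.Sample.str.split("_").str[1])
print(result)
```

Here row_one is a single list with two items, which is why assign complains about a length mismatch, while second_elems has one value per row and can be assigned directly.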

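    【Discussion】:

    • Aha, I see. Thanks for the explanation, @EdChum! I had assumed that the function (split) was being executed for each row rather than on the whole DataFrame. Thanks!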