【Question title】: Substituting substrings with other column substring in pandas
【Posted】: 2023-03-19 07:20:02
【Question description】:

I have a dataframe containing template strings and the corresponding string of variables to substitute into them. For example, given:

template,variable
"{color} shirt in {size}", "blue,medium"
"{capacity} bottle in {color}", "24oz,teal"
"{megapixel}mp camera", "24.1"

I would like to produce the following:

"blue shirt in medium"
"24oz bottle in teal"
"24.1mp camera"

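The number of template placeholders in the first column is guaranteed to equal the number of variables in the second column's string. The string formats are consistent with the example above.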

My first idea was to use extractall to create a multi-indexed dataframe and then join:

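# Named groups give the two extracted columns distinct names, so the join does not collide;
# [^,]+ keeps values such as "24.1" whole, which \w+ would split at the dot.
templates = df['template'].str.extractall(r'(?P<placeholder>\{\w+\})')
variables = df['variable'].str.extractall(r'(?P<value>[^,]+)')
multi_df = templates.join(variables, how='inner')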

But I don't know where to go from there. Or is there a simpler way?
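
One possible way to continue from the extractall idea is sketched below. It assumes named capture groups (`placeholder` and `value` are illustrative names, not from the question) so that the joined columns have distinct names, and it fills each template left to right. The helper `fill` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "template": [
        "{color} shirt in {size}",
        "{capacity} bottle in {color}",
        "{megapixel}mp camera",
    ],
    "variable": ["blue,medium", "24oz,teal", "24.1"],
})

# One row per match, indexed by (original row, match number).
placeholders = df["template"].str.extractall(r"(?P<placeholder>\{\w+\})")
values = df["variable"].str.extractall(r"(?P<value>[^,]+)")
pairs = placeholders.join(values, how="inner")


def fill(template, group):
    """Replace each placeholder once, left to right, with its paired value."""
    for placeholder, value in zip(group["placeholder"], group["value"]):
        template = template.replace(placeholder, value, 1)
    return template


# Group the (placeholder, value) pairs by original row and rebuild each string.
result = pd.Series(
    {idx: fill(df.at[idx, "template"], group) for idx, group in pairs.groupby(level=0)}
)
print(result)
```

This keeps the multi-index join from the original attempt, but the per-row loop means it is unlikely to be faster than calling `str.format` on each row directly.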

【Question comments】:

  • Wouldn't a plain print(f"{color} shirt in {size}") do the job?

Tags: python regex pandas


【Solution 1】:

Use string.Formatter to extract the variable names from the template column, then build a dictionary for the substitution.

>>> df
                       template        value  # I modified your column name
0       {color} shirt in {size}  blue,medium
1  {capacity} bottle in {color}    24oz,teal
2          {megapixel}mp camera         24.1

from string import Formatter

def extract_vars(s):
    return tuple(fn for _, fn, _, _ in Formatter().parse(s) if fn is not None)

df['variable'] = df['template'].apply(extract_vars)
df['value'] = df['value'].str.split(',')
df['combined'] = df.apply(lambda x: dict(zip(x['variable'], x['value'])), axis=1)

At this point, your dataframe looks like this:

                       template           value           variable                               combined
0       {color} shirt in {size}  [blue, medium]      [color, size]    {'color': 'blue', 'size': 'medium'}
1  {capacity} bottle in {color}    [24oz, teal]  [capacity, color]  {'capacity': '24oz', 'color': 'teal'}
2          {megapixel}mp camera          [24.1]        [megapixel]                  {'megapixel': '24.1'}

Finally, format your strings:

>>> df.apply(lambda x: x['template'].format(**x['combined']), axis=1)
0    blue shirt in medium
1     24oz bottle in teal
2           24.1mp camera
dtype: object
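
The same steps can also be folded into a single helper that needs no intermediate columns. This is only a sketch: `render` is a hypothetical name, the column names follow the question (`template`, `variable`) rather than the renamed `value` column above, and the length check simply makes the question's stated guarantee explicit.

```python
from string import Formatter

import pandas as pd


def render(template: str, values: str) -> str:
    """Fill the named fields of `template` with comma-separated `values`, in order."""
    names = [name for _, name, _, _ in Formatter().parse(template) if name is not None]
    parts = values.split(",")
    if len(names) != len(parts):
        raise ValueError(f"{len(names)} fields but {len(parts)} values in {template!r}")
    return template.format(**dict(zip(names, parts)))


df = pd.DataFrame({
    "template": [
        "{color} shirt in {size}",
        "{capacity} bottle in {color}",
        "{megapixel}mp camera",
    ],
    "variable": ["blue,medium", "24oz,teal", "24.1"],
})

# Iterating over the two columns together avoids the row-wise df.apply(axis=1).
df["result"] = [render(t, v) for t, v in zip(df["template"], df["variable"])]
print(df["result"])
```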

【Comments】:

    【Solution 2】:

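    I suggest applying a function to each row that:

    • extracts the keys used for formatting
    • substitutes the values using those keys

    import re
    import pandas as pd

    def formatter(row):
        # Capture the name inside each {placeholder}
        keys = re.findall(r'\{(\w+)\}', row['template'])
        return row['template'].format(**dict(zip(keys, row['variable'].split(","))))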
    
    
    values = [
        ["{color} shirt in {size}", "blue,medium"],
        ["{capacity} bottle in {color}", "24oz,teal"],
        ["{megapixel}mp camera", "24.1"]
    ]
    df = pd.DataFrame(values, columns=['template', 'variable'])
    df['result'] = df.apply(formatter, axis=1)
    print(df)
    

    Output:

                           template     variable                result
    0       {color} shirt in {size}  blue,medium  blue shirt in medium
    1  {capacity} bottle in {color}    24oz,teal   24oz bottle in teal
    2          {megapixel}mp camera         24.1         24.1mp camera
    

    【Comments】:

    • With a simple regex you can't catch something like {color:>30} shirt in size.
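
To illustrate the comment above, here is a small comparison of a `\w+`-based regex against `string.Formatter().parse` on a template that contains a format spec. The template is made up for the illustration and uses a width of 10 rather than 30 to keep the output short.

```python
import re
from string import Formatter

template = "{color:>10} shirt in {size}"

# The regex only matches placeholders made purely of word characters.
regex_keys = re.findall(r"\{(\w+)\}", template)

# Formatter().parse splits off the format spec and still reports the field name.
parsed_keys = [name for _, name, _, _ in Formatter().parse(template) if name is not None]

print(regex_keys)   # ['size']
print(parsed_keys)  # ['color', 'size']

formatted = template.format(**dict(zip(parsed_keys, ["blue", "medium"])))
print(repr(formatted))
```

With the regex keys alone, `str.format` would raise a `KeyError` for `color`, because that field never makes it into the mapping.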

    【Solution 3】:

    You can use defaultdict + str.format_map:

    from collections import defaultdict
    
    print(
        df.apply(
            lambda x: x["template"].format_map(
                defaultdict(lambda i=iter(x["variable"].split(",")): next(i))
            ),
            axis=1,
        )
    )
    

    Prints:

    0    blue shirt in medium
    1     24oz bottle in teal
    2           24.1mp camera
    dtype: object
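
The one-liner above is compact, so the mechanism is spelled out here as a standalone sketch (`fill_in_order` is a hypothetical helper name). `str.format_map` looks up each field name in the mapping; a `defaultdict` calls its zero-argument factory for every missing key, so each new name consumes the next value from the iterator.

```python
from collections import defaultdict


def fill_in_order(template, values):
    """Fill named fields with `values` in order of first appearance."""
    it = iter(values)
    # The factory takes no arguments; the looked-up key itself is ignored.
    return template.format_map(defaultdict(lambda: next(it)))


print(fill_in_order("{color} shirt in {size}", ["blue", "medium"]))
# blue shirt in medium

# defaultdict stores the value on the first miss, so a repeated name reuses it.
print(fill_in_order("{color} on {color}", ["red"]))
# red on red
```

Note that the field names are never compared against anything, so this approach relies entirely on the values arriving in the same order as the placeholders.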
    

    【Comments】:

      【Solution 4】:

      Perhaps this code is simpler:

      # replace e.g {color} by simply {}
      # As the substitution is positional, we can skip the hassle to build mapping of format keys to substitution values.
      df['template1'] = df['template'].str.replace(r'\{.*?\}', '{}', regex=True)
      
      # apply on each row with code similar to '{} shirt in {}'.format('blue', 'medium')
      df['result'] = df.apply(lambda x: x['template1'].format(*x['variable'].split(',')), axis=1)
      
      # remove temporary working column template1
      df = df.drop('template1', axis=1)
      

      Result:

      print(df)
      
                             template     variable                result
      0       {color} shirt in {size}  blue,medium  blue shirt in medium
      1  {capacity} bottle in {color}    24oz,teal   24oz bottle in teal
      2          {megapixel}mp camera         24.1         24.1mp camera
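
The positional idea can also be written as a plain function, which avoids the temporary column. This is a sketch with a hypothetical helper name (`fill_positional`). It also shows how a count mismatch behaves: `str.format` silently ignores surplus positional arguments but raises `IndexError` when there are too few.

```python
import re


def fill_positional(template, values):
    """Turn every {name} into {} and fill the slots positionally."""
    return re.sub(r"\{.*?\}", "{}", template).format(*values.split(","))


print(fill_positional("{capacity} bottle in {color}", "24oz,teal"))
# 24oz bottle in teal

# A surplus value is dropped without any error.
print(fill_positional("{megapixel}mp camera", "24.1,unused"))
# 24.1mp camera

# A missing value raises IndexError.
try:
    fill_positional("{color} shirt in {size}", "blue")
except IndexError as exc:
    print("too few values:", exc)
```

Because the substitution replaces the whole `{...}` span, any format spec inside the braces is discarded as well.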
      

      【Comments】:
