【Question Title】: Applying custom functions over lists of columns
【Posted】: 2021-10-09 02:21:40
【Question】:

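I'm trying to optimize some working code by applying a custom function, but I'm not sure how to do that for specific columns in a large DataFrame. In the example below, I select the open-ended questions in my DataFrame, which holds survey data. You'll see that I type out each open-ended column by hand, but I'd like a single custom function that iterates over the list of open-ended columns.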

openend = ['Q28','Q56','Q63']
### Change ranges to match the above
open1 = df.iloc[:, 28:29] # isolates 'range'
open1 = open1.iloc[1:] # removes first row
open1 = pd.concat([ids, open1], axis=1) # adds ids

open2 = df.iloc[:, 56:57]
open2 = open2.iloc[1:]
open2 = pd.concat([ids, open2], axis=1)

open3 = df.iloc[:, 63:64]
open3 = open3.iloc[1:]
open3 = pd.concat([ids, open3], axis=1)
open1['question'] = df1['Q28'][0]
open1['answer'] = open1.iloc[:,1:2]
open1 = open1.drop(open1.iloc[:,1:2], axis=1)

open2['question'] = df1['Q56'][0]
open2['answer'] = open2.iloc[:,1:2]
open2 = open2.drop(open2.iloc[:,1:2], axis=1)

open3['question'] = df1['Q63'][0]
open3['answer'] = open3.iloc[:,1:2]
open3 = open3.drop(open3.iloc[:,1:2], axis=1)
open1_stack = open1
open2_stack = open2
open3_stack = open3
open1_stack["answer"] = open1_stack["answer"].str.upper().str.title()
open1_count = open1_stack.answer.str.split(expand=True).stack().value_counts()
open1_count = open1_count.to_frame().reset_index()
open1_count.columns = ['Word', 'Count']
open1_count['question'] = df1['Q28'][0]

open2_stack["answer"] = open2_stack["answer"].str.upper().str.title()
open2_count = open2_stack.answer.str.split(expand=True).stack().value_counts()
open2_count = open2_count.to_frame().reset_index()
open2_count.columns = ['Word', 'Count']
open2_count['question'] = df1['Q56'][0]

open3_stack["answer"] = open3_stack["answer"].str.upper().str.title()
open3_count = open3_stack.answer.str.split(expand=True).stack().value_counts()
open3_count = open3_count.to_frame().reset_index()
open3_count.columns = ['Word', 'Count']
open3_count['question'] = df1['Q63'][0]

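Could someone use this example to show me how to iterate over the open-ended list and apply these functions in the best way?

Thanks in advance.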

【Comments】:

    Tags: python pandas dataframe function


    【Solution 1】:

    You can wrap all of your code in a function that takes openend as a parameter, with a signature like this:

    def prepare_survey(openend:list):
    

    Then iterate over that list to pull out each 'QXX':

    for q in openend:
        # process
    

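    As far as I can see, the steps are identical for every entry in openend; only the column index in the first few steps changes. So keep the rest as it is, but extract the question number like this (this assumes the number in 'QXX' matches the column position, as it does in your iloc calls):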

    import re
    
    def prepare_survey(openend:list):
        for q in openend:
            # process
            idx = int(re.sub("[^0-9]", "", q))  # extract question number
            # continue with the steps you have
            open1 = df.iloc[:, idx:idx+1]
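
Putting the steps from the question inside that loop, a minimal sketch of the whole function could look like the following. It is not the asker's exact code: it selects each column by label (df[q]) rather than by position, and it takes df and ids as parameters instead of reading globals. It also assumes that row 0 of df holds the question text and that ids is a Series sharing df's index. Returning one long answers table plus one word-count table is just one possible layout.

```python
import pandas as pd


def prepare_survey(df: pd.DataFrame, ids: pd.Series, openend: list):
    """Return (long-format answers, per-question word counts) for the given columns."""
    answers, counts = [], []
    for q in openend:
        question = df[q].iloc[0]   # assumption: row 0 holds the question text
        resp = df[q].iloc[1:]      # remaining rows are the responses
        long = pd.DataFrame({
            "id": ids.loc[resp.index],  # assumption: ids shares df's index
            "question": question,
            "answer": resp.str.upper().str.title(),  # same casing step as the question
        })
        answers.append(long)

        # Split answers into words, flatten them, and count occurrences
        words = long["answer"].str.split(expand=True).stack().value_counts()
        count = words.rename_axis("Word").reset_index(name="Count")
        count["question"] = question
        counts.append(count)

    # One combined table each, instead of open1/open2/open3 variables
    return (pd.concat(answers, ignore_index=True),
            pd.concat(counts, ignore_index=True))
```

It would be called as answers, counts = prepare_survey(df, ids, openend); filtering either result on the question column recovers the per-question tables.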
     
    

    【Comments】:
