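【Posted】: 2021-10-09 02:21:40
【Question】:
I'm trying to optimize some working code by applying a custom function, but I'm not sure how to do this for specific columns in a large DataFrame. In the example below, I select the open-ended questions from my DataFrame, which holds survey data. You'll see that I enter each open-ended column by hand, but I'd like a single custom function that iterates over the list of open-ended columns.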
openend = ['Q28','Q56','Q63']
### Change ranges to match the above
open1 = df.iloc[:, 28:29] # isolates 'range'
open1 = open1.iloc[1:] # removes first row
open1 = pd.concat([ids, open1], axis=1) # adds ids
open2 = df.iloc[:, 56:57]
open2 = open2.iloc[1:]
open2 = pd.concat([ids, open2], axis=1)
open3 = df.iloc[:, 63:64]
open3 = open3.iloc[1:]
open3 = pd.concat([ids, open3], axis=1)
open1['question'] = df1['Q28'][0]
open1['answer'] = open1.iloc[:,1:2]
open1 = open1.drop(open1.iloc[:,1:2], axis=1)
open2['question'] = df1['Q56'][0]
open2['answer'] = open2.iloc[:,1:2]
open2 = open2.drop(open2.iloc[:,1:2], axis=1)
open3['question'] = df1['Q63'][0]
open3['answer'] = open3.iloc[:,1:2]
open3 = open3.drop(open3.iloc[:,1:2], axis=1)
open1_stack = open1
open2_stack = open2
open3_stack = open3
open1_stack["answer"] = open1_stack["answer"].str.upper().str.title()
open1_count = open1_stack.answer.str.split(expand=True).stack().value_counts()
open1_count = open1_count.to_frame().reset_index()
open1_count.columns = ['Word', 'Count']
open1_count['question'] = df1['Q28'][0]
open2_stack["answer"] = open2_stack["answer"].str.upper().str.title()
open2_count = open2_stack.answer.str.split(expand=True).stack().value_counts()
open2_count = open2_count.to_frame().reset_index()
open2_count.columns = ['Word', 'Count']
open2_count['question'] = df1['Q56'][0]
open3_stack["answer"] = open3_stack["answer"].str.upper().str.title()
open3_count = open3_stack.answer.str.split(expand=True).stack().value_counts()
open3_count = open3_count.to_frame().reset_index()
open3_count.columns = ['Word', 'Count']
open3_count['question'] = df1['Q63'][0]
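Could someone use this example to show me how to iterate over the open-ended list and apply these functions in the best way?
Thanks in advance.
【Discussion】: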
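One possible way to fold the repeated steps into a single function and loop over `openend` is sketched below. It is only a sketch under several assumptions that the post does not confirm: the question text sits in the first row of each column, `df` and `df1` refer to the same frame, the respondent ids live in a column (here called `id`, a made-up name) rather than in a separate `ids` frame, and columns are selected by name instead of by `iloc` position. The sample data and the helper name `process_open_end` are hypothetical. It also uses `str.split().explode()` in place of `str.split(expand=True).stack()`, and drops the `.str.upper()` call since `.str.title()` alone produces the same casing.

```python
import pandas as pd

# Hypothetical sample data: row 0 holds the question text (assumption).
df = pd.DataFrame({
    "id": ["id", "r1", "r2", "r3"],
    "Q28": ["Why did you join?", "good price", "Good service", None],
    "Q56": ["What would you change?", "lower price", "nothing", "faster service"],
})

openend = ["Q28", "Q56"]


def process_open_end(df, col, id_col="id"):
    """Return (long-format answers, word counts) for one open-ended column."""
    question = df[col].iloc[0]  # question text taken from the first row

    # Drop the first row, keep the ids, and rename the column to 'answer'.
    answers = df.iloc[1:][[id_col, col]].rename(columns={col: "answer"})
    answers["question"] = question
    answers["answer"] = answers["answer"].str.title()
    answers = answers[[id_col, "question", "answer"]]

    # One row per word, then count; missing answers are dropped by value_counts.
    counts = (
        answers["answer"].str.split().explode().value_counts()
        .rename_axis("Word").reset_index(name="Count")
    )
    counts["question"] = question
    return answers, counts


# Apply the same steps to every open-ended column.
results = {col: process_open_end(df, col) for col in openend}

# Optionally combine everything into two long tables.
all_answers = pd.concat([a for a, _ in results.values()], ignore_index=True)
all_counts = pd.concat([c for _, c in results.values()], ignore_index=True)
```

Keeping the per-question results in a dict keyed by column name avoids numbered variables such as `open1`, `open2`, `open3`; `results["Q28"]` returns the answers table and the word counts for that question.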
Tags: python pandas dataframe function