【Question title】: How to build a pandas DataFrame from a multi-loop process that appends rows and columns?
【Posted】: 2026-01-31 09:10:01
【Question description】:

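I am trying to build a DataFrame from data collected from a crontab file. I'm not sure exactly how to take these pieces and compile them into a single DataFrame.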

Here is what I have so far:

from crontab import CronTab
import re

system_cron=CronTab()
user_cron=CronTab(user=True)
user_cron

#create and clean list of bash files

listJobs=[]
for job in user_cron:
    entry = str(job)
    # escape the dot so that only a literal ".sh" matches
    match = re.search(r'\.sh', entry)
    if match:
        pos = match.end()  # index just past ".sh"
        start = entry[::-1].find(' ')  # distance from the end of the line back to the last space
        print(entry[-start:pos])
        listJobs.append(entry[-start:pos])
listJobs = list(set(listJobs))
listJobs.remove("keybash.sh")

# listJobs is now a list of .sh files including their paths

# loop through the .sh files to pull the python notebooks

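for job in listJobs:
    with open(job, 'r') as file:
        # split the script into a list of lines
        text = file.read().splitlines()
    print(job)
    print(text)
# note: text is overwritten on every iteration, so after the loop it holds
# only the lines of the last .sh file
type(text)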

listFiles=[]
for entry in text:
    match = re.search('ipynb', entry)
    if match:
        pos = match.end()  # index just past "ipynb"
        start = entry[::-1].find(' ')  # distance from the end of the line back to the last space
        print(entry[-start:pos])
        listFiles.append(entry[-start:pos])
listFiles

So now I have two lists with different numbers of rows, and I don't know how to join them to get a result like this:

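I'm wondering whether I should use a dictionary, or convert to a DataFrame and then loop over it. What is the most efficient way to change my code to get what I need?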

【Comments】:

    Tags: python pandas dataframe for-loop


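    【Solution 1】:
    import pandas as pd

    %%timeit
    # Grow the DataFrame one row at a time with .loc
    df = pd.DataFrame(columns=['key'])
    for i in range(10):
        df.loc[i] = i
    15.8 ms ± 204 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    
    %%timeit
    # DataFrame.append was deprecated in pandas 1.4 and removed in 2.0,
    # so this cell only runs on older pandas versions
    df = pd.DataFrame() 
    for i in range(10):
        df = df.append({'key': i}, ignore_index=True)
    14.7 ms ± 849 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    
    %%timeit
    # Collect the rows in a plain list and build the DataFrame once
    # (over 20x faster than the two cells above in these timings)
    data = []
    for i in range(10):
        data.append({'key': i})
    df = pd.DataFrame(data) 
    668 µs ± 14.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)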
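
Applied to the question, the list-of-dicts pattern from the last timing might look like the sketch below. It is only an illustration under assumptions: the `scripts` dictionary is hypothetical sample data standing in for the lines read from each .sh file, the column names `script` and `notebook` are made up, and the regular expression assumes notebook paths contain no spaces. The key point is that the notebook search runs inside the loop over scripts, so each row records which script a notebook came from, and the DataFrame is built once at the end.

```python
import re

import pandas as pd

# Hypothetical sample data: script path -> lines of that script.
# In the question these lines would come from file.read().splitlines().
scripts = {
    "/home/user/job_a.sh": [
        "#!/bin/bash",
        "jupyter nbconvert --execute /home/user/nb/load.ipynb",
        "jupyter nbconvert --execute /home/user/nb/report.ipynb",
    ],
    "/home/user/job_b.sh": [
        "#!/bin/bash",
        "jupyter nbconvert --execute /home/user/nb/cleanup.ipynb",
    ],
}

# Assumption: a notebook path is a whitespace-free token ending in ".ipynb".
NOTEBOOK_RE = re.compile(r"\S+\.ipynb")

rows = []
for script, lines in scripts.items():
    for line in lines:
        for notebook in NOTEBOOK_RE.findall(line):
            # one row per (script, notebook) pair
            rows.append({"script": script, "notebook": notebook})

# Build the DataFrame once, after all rows have been collected.
df = pd.DataFrame(rows, columns=["script", "notebook"])
print(df)
```

Because every row carries both values, there are no longer two lists of different lengths to join; a script that calls several notebooks simply appears in several rows.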
    

    【Comments】: