【Question Title】: Add column to existing dataframe and import data into new column in Python Pandas
【Posted】: 2020-05-29 06:02:12
【Question】:

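I am using Python to read a CSV file into a pandas dataframe. I want to read a list of text files into a new column of that dataframe.

The original CSV file I am reading looks like this: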

Name,PrivateIP
bastion001,10.238.2.166
logicmonitor001,10.238.2.52
logicmonitor002,45.21.2.13

The original dataframe looks like this.

Code:

import os
import pandas as pd

hosts_list = dst = os.path.join('..', '..', 'source_files', 'aws_hosts_list', 'aws_hosts_list.csv')
fields = ["Name", "PrivateIP"]
orig_df = pd.read_csv(hosts_list, skipinitialspace=True, usecols=fields)
print(f"Orig DF:\n{orig_df}")

Output:

Orig DF:
                      Name     PrivateIP
0               bastion001  10.238.2.166
1          logicmonitor001   10.238.2.52
2          logicmonitor002    45.21.2.13

The text directory contains a bunch of text files, each holding a memory reading:


bastion001-memory.txt              B-mmp-rabbitmq-core002-memory.txt  logicmonitor002-memory.txt    mmp-cassandra001-memory.txt  company-division-rcsgw002-memory.txt
B-mmp-platsvc-core001-memory.txt   haproxy001-memory.txt              company-cassandra001-memory.txt  mmp-cassandra002-memory.txt  company-waepd001-memory.txt
B-mmp-platsvc-core002-memory.txt   haproxy002-memory.txt              company-cassandra002-memory.txt  mmp-cassandra003-memory.txt  company-waepd002-memory.txt
B-mmp-rabbitmq-core001-memory.txt  logicmonitor001-memory.txt         company-cassandra003-memory.txt  company-division-rcsgw001-memory.txt  company-waepd003-memory.txt

Each file looks similar to this:

cat haproxy001-memory.txt
7706172

I then try to read each file into the existing dataframe:


rowcount == 0
text_path = '/home/tdun0002/stash/cloud_scripts/output_files/memory_stats/text/'
filelist = os.listdir(text_path)
for filename in filelist:
    if rowcount == 0:
        pass
    else:
        my_file = text_path + filename
        print(f"Adding {filename} to DF")
        try:
            orig_df = pd.update(my_file)
            print(f"Data Frame: {orif_df}")
            ++rowcount
        except Exception as e:
            print(f"An error has occurred: {e}")

But when I read the resulting dataframe afterwards, it has not been updated. For clarity, I gave the new DF a new name.

Code:

result_df = orig_df
pd.options.display.max_rows
print(f"\nResult Data Frame:\n{result_df}\n")

Output:

Result Data Frame:
                      Name     PrivateIP
0               bastion001  10.238.2.166
1          logicmonitor001   10.238.2.52
2          logicmonitor002    45.21.2.13

How can I create a new column called Memory in the DF and add the contents of the text files to that column?

【Comments】:

    Tags: python pandas dataframe


    【Solution 1】:

    Here is code that I hope will work. It is a bit clunky, but you will get the idea. There are comments inside.

    import pandas as pd
    import os
    from os import listdir
    from os.path import isfile, join
    
    # list all regular files in the directory
    # os.getcwd() is used here for the current directory;
    # if your text files live elsewhere, put that exact path instead
    onlyfiles = [f for f in listdir(os.getcwd()) if isfile(join(os.getcwd(), f))]
    
    # convert it to series
    memory_series = pd.Series(onlyfiles)
    
    # keep only .txt files and return their name without the extension;
    # everything else is returned as None
    def file_name_getter(x):
        stem, ext = os.path.splitext(x)
        if ext == ".txt":
            return stem
        else:
            return None
    
    # apply the function and get a new series with name values
    mem_list = memory_series.apply(file_name_getter)
    
    # read the first line of each txt file as an integer;
    # non-txt entries (None) get 0
    def get_txt_data(x):
        if x is not None:
            with open(f'{x}.txt') as f:
                return int(f.readline().rstrip())
        else:
            return 0
    
    # apply the function, get a new series with memory values
    mem_val_list = mem_list.apply(get_txt_data)
    
    # create a df where our Name and Memory data are present
    # cast Memory data as int
    df = pd.DataFrame(mem_val_list, columns=["Memory"], dtype="int")
    df["Name"] = mem_list
    
    # strip the trailing "-memory" from each name
    def name_normalizer(x):
        if x is None:
            return x
        else:
            return x.rsplit("-", maxsplit=1)[0]
    
    # apply function
    df["Name"] = df["Name"].apply(name_normalizer)
    
    
    # sample orig_df for illustration; use your own dataframe here
    orig_df = pd.DataFrame([["algo_2", "10.10.10"], ["other", "20.20.20"]], columns=["Name", "PrivateIP"])
    
    # merge on Name; the default inner join keeps only the rows whose Name
    # has a matching memory file, so unmatched files cause no problems
    final_df = orig_df.merge(df, on="Name")
    

    Edit: fixed Name so it is returned correctly (xxx-memory becomes xxx).
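For comparison, here is a shorter sketch of the same idea. The loop in the question never changes the frame: `rowcount == 0` only compares, pandas has no top-level `pd.update` function, and `++rowcount` does not increment anything in Python. The helper below collects the readings into a plain dict keyed by host name. The names `read_memory_files` and `suffix` are made up for illustration, and it assumes each file holds a single integer, as in the `haproxy001-memory.txt` example.

```python
from pathlib import Path


def read_memory_files(text_dir, suffix="-memory.txt"):
    """Return {host_name: reading} for every '<host>-memory.txt' file in text_dir.

    Hypothetical helper: assumes the first whitespace-separated token of each
    file is the integer memory reading.
    """
    readings = {}
    for path in sorted(Path(text_dir).glob(f"*{suffix}")):
        # 'haproxy001-memory.txt' -> 'haproxy001'
        host = path.name[: -len(suffix)]
        tokens = path.read_text().split()
        # skip empty or malformed files instead of raising
        if tokens and tokens[0].isdigit():
            readings[host] = int(tokens[0])
    return readings
```

With the question's `orig_df` and `text_path`, something like `orig_df["Memory"] = orig_df["Name"].map(read_memory_files(text_path))` should then add the column. Hosts without a matching file get NaN instead of being dropped, which also turns the column into floats.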

    【Discussion】:

    • @bluethundr I just updated the code, take a look! I had forgotten that the .txt files are named xxx-memory.txt.