Read multiple txt files from different folders into a SAS dataset: answers

【Question title】: Read multiple txt files from different folders into SAS dataset
【Posted】: 2021-08-16 03:50:47
【Question description】:

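I have the following problem and really don't know where to start. I have a folder named "ALL" that contains subfolders, each named after the date it was created, in DD-MM-YYYY format. There is a folder for every day, i.e. no days are missing. Each folder contains many txt files. From each date folder I want to read exactly one of those text files. Its name follows the convention "thedata_" followed by a series of random digits.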

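For example, if the ALL folder contains 3 date folders, I want to read the 3 separate "thedata_" text files into 1 final SAS file. After that, a new folder is added every day, and I want to append the "thedata_" file from that folder to the existing SAS file rather than rerun the script from scratch.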

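【Comments】:

Start by building a list of files. Search here or on community.sas.com for programs that crawl a directory and list all of its files. Then use a data step to filter that list down to the files you want to read, according to your rules. That leaves you with the list of files to import. The FILEVAR= option of the INFILE statement lets you change the input file dynamically and read all of the files in a single step. There is a rough example in the documentation.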

Tags: sas txt

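【Solution 1】:

Here is one solution. It uses SAS directory functions to walk every subfolder and build a dataset of the matching files, so you do not need the X command to be enabled. The path of each matching file is also saved to a macro variable (file1, file2, ...), which lets you loop over the files and read each one as needed. You could modify it to use the FILEVAR= option instead.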

filename all "Directory/ALL";

data myfiles;
    length folder_name 
           file_name  
           file
           folder_path $5000.
    ;

    /* Folder delimiter */
    if("&sysscp." = "WIN") then SLASH = '\';
        else SLASH = '/';

    /* Open the ALL directory */
    did = dopen("all");

    /* If it was successful, continue */
    if(did) then do;  

        /* Iterate through all subfolders in ALL */
        do i = 1 to dnum(did);

            /* Get the subfolder name and full path */
            folder_name = dread(did, i);
            folder_path = cats(pathname('all'), SLASH, folder_name);

            /* Assign a filename statement to the subfolder */
            rc = filename('sub', folder_path);
            
            /* Open the subfolder and get its directory ID */
            did2 = dopen('sub');

            /* Open the subfolder and read all the .txt files within it */
            if(did2) then do;
                do j = 1 to dnum(did2);

                    file_name = dread(did2, j);
                    file_ext  = scan(file_name, -1, '.');
                    file      = cats(folder_path, SLASH, file_name);
                    
                    /* Save file name only if the expected value is found */
                    if(upcase(file_name) =: "THEDATA_" AND upcase(file_ext) = "TXT") then do;
                        nfiles+1;
                        call symputx(cats('file', nfiles), file); /* Save each file to a macro variable named file1, file2, etc. */
                        output;
                    end;
                end;
            end;

            /* Close the subfolder and move on to the next one */
            rc = dclose(did2);
        end;

    end;

    rc = dclose(did);

    /* Save the total number of files we found to a macro variable */
    call symputx('nFiles', nFiles);

    keep file file_name folder_name folder_path;
run;

/* Read all the files */
%macro readFiles;
    %do i = 1 %to &nFiles.;
        proc import 
            file = "&&file&i."
            out  =  _thedata_&i.
            dbms =  csv
            replace;
            guessingrows=max;
        run;
    %end;

    /* Put all the files together */
    data thedata;
        set _thedata_:;
    run;

    proc datasets lib=work nolist;
        delete _thedata_:;
    quit;
%mend;
%readFiles;
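
The FILEVAR= variant mentioned in the comments and in the answer could be sketched as below. This is a minimal sketch, not the answerer's code: it assumes the `myfiles` dataset built above, and it assumes every file is comma-delimited with one header row and the same layout. The output name `thedata_fv`, the helper variable `fname`, and the columns `col1` and `col2` are hypothetical placeholders to be replaced with the real layout.

```sas
/* Sketch: read every file listed in MYFILES in one data step.
   Column names and informats are hypothetical placeholders. */
data thedata_fv;
    set myfiles(keep=file folder_name);

    /* The FILEVAR= variable is not written to the output dataset,
       so a copy of FILE is used to drive the INFILE statement */
    fname = file;

    /* DUMMY is only a placeholder fileref; FNAME supplies the real path.
       FIRSTOBS=2 assumes each file starts with a header row. */
    infile dummy filevar=fname dsd dlm=',' firstobs=2 truncover end=eof;

    /* Read all records of the current file before moving to the next row of MYFILES */
    do while (not eof);
        input col1 :$50. col2;   /* hypothetical layout: one character, one numeric column */
        output;
    end;

    drop file;
run;
```

Because `folder_name` is kept on every row, a later run could restrict `myfiles` to folders that are not yet present in the final dataset and add only those rows with PROC APPEND, rather than reading everything again.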

【Discussion】: