【Question title】: SAS: Define type when importing .xlsx with PROC IMPORT
【Posted】: 2016-12-24 12:06:22
【Question description】:

Question: When using PROC IMPORT, how do I define the variable types of the variables imported from an .xlsx file?


My work

I am using SAS v9.4. As far as I know, it is vanilla SAS; I do not have SAS/ACCESS or similar add-ons.

My data looks like this:

ID1        ID2  MONTH   YEAR    QTR VAR1    VAR2
ABC_1234   1    1       2010    1   869     3988
ABC_1235   12   2       2010    1   639     3144
ABC_1236   13   3       2010    2   698     3714
ABC_1237   45   4       2010    2   630     3213

The program I am running is:

proc import out=rawdata
    datafile = "c:\rawdata.xlsx"
        dbms = xlsx replace;

    format ID1 $9. ;
    format ID2 $3. ;
    format MONTH best2. ;
    format YEAR best4. ;
    format QTR best1. ;
    format VAR1 best3. ;
    format VAR2 best4. ;
run;

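When I run this step, I get the following log output:

ERROR: You are trying to use the character format $ with the numeric variable ID2 in data set WORK.RAWDATA.

What this seems to tell me is that SAS assigns the variable types automatically. I would like to control them manually, but I cannot find documentation that explains how to do so. The INFORMAT, LENGTH, and INPUT statements do not seem to work with PROC IMPORT.

I use PROC IMPORT because it has been the most successful approach with .xlsx files. The two possible solutions I can think of are 1) converting the .xlsx to .csv and using INFILE in a DATA step, and 2) reading the data in as numeric and converting it to character in a later step. I do not like the first solution because it requires me to manipulate the data by hand, which is a potential source of error (for example, dropping leading zeros). I do not like the second because it may introduce errors unintentionally (again, for example, with leading zeros) and adds extraneous work.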

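【Question comments】:

  • Are you sure you do not have a SAS/ACCESS to PC FILES license? My impression is that DBMS=XLSX requires it as well.
  • Also, I did not know you could use a format statement directly in proc import (and, as it turns out, other similar attribute statements)!
  • @Joe I think dbms=xlsx is now part of BASE. EXCEL and the others still require PC Files. What the difference is, I do not know.
  • @DomPazz Not as of 2015 according to Chris Hemedinger, at any rate (TS1M2). I do not see any mention of it changing in TS1M3, but maybe I missed something.
  • Too bad dbms=xlsx does not work in proc export. =(

Tags: excel types import sas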


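【Solution 1】:

You can try setting the column type to "Text" in Excel to see whether SAS picks the type up from that. It is worth a try.

If that does not work, then unless you use PC Files Server, or have Excel of the same bitness installed on the same SAS server so the file can be accessed directly, you will need a separate data step to convert the column.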

proc import 
    file = "c:\rawdata.xlsx"
    out=_rawdata(rename=(ID2 = _ID2) )
    dbms = xlsx replace;
run;

data rawdata;
    format ID1 $9. ;
    format ID2 $3. ;
    format MONTH best2. ;
    format YEAR best4. ;
    format QTR best1. ;
    format VAR1 best3. ;
    format VAR2 best4. ;

    set _rawdata;

    ID2 = cats(_ID2);

    drop _:;
run;

If you have SAS/ACCESS to Excel, you can control these variables directly with the DBDSOPTS data set option. For example:

libname myxlsx Excel 'C:\rawdata.xlsx';

data rawdata;
    set myxlsx.'Sheet1$'n(DBDSOPTS="DBTYPE=(ID2='CHAR(3)')");
run;

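The problem arises because the xlsx engine in proc import is internal to SAS and separate from the Excel engine. The Excel engine uses Microsoft Jet or ACE, whereas the xlsx engine uses a proprietary system that does not offer as much control as Microsoft's. Why that is the case, I do not know.

When proc import runs, SAS tries to guess what type each column should be (for xls files you can influence this with the guessingrows option). If it detects only digits, it assumes a numeric variable. Unfortunately, without SAS/ACCESS to Excel or PC Files Server installed, you cannot control the variable types directly.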
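
To see which types were guessed, one option is to inspect the imported data set with PROC CONTENTS. The sketch below is only an illustration: it builds a small stand-in table with a DATA step (an assumption, so that it runs without the Excel file) and then lists each variable's type.

```sas
/* Stand-in for the table produced by PROC IMPORT (illustrative data only). */
data rawdata;
    input ID1 :$9. ID2 MONTH YEAR QTR VAR1 VAR2;
    datalines;
ABC_1234 1 1 2010 1 869 3988
ABC_1235 12 2 2010 1 639 3144
;
run;

/* Lists every variable with its type (Num or Char), length, and format. */
proc contents data=rawdata;
run;
```

Here ID2 is reported as numeric, which is the situation that triggers the format error in the question.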

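【Comments】:

  • I think the xlsx engine may use the Office Open XML document format, whereas the Jet engine may handle that format as well as the proprietary MS formats. My guess is that the xlsx parser is written in proc template; if you can find the template it uses, you could in theory work out why it does not support this, and even modify it to add the required functionality.
  • ^ If you feel like sifting through 8k lines of proc template code...
  • That is an interesting idea. I will look into it!
  • @LoremIpsum My guess is that dbms=xlsx may use the excelXP tagset (this is a big guess, by the way, and it is quite possible that it does something completely different). You can get a list of all tagsets (such as the xlsx tagset) by running proc template; list tagsets; run; and you can then list the source of a tagset with proc template; source tagsets.excelxp; run;. Doing so shows that it simply inherits from tagsets.excelbase. If we then ask for that source, we can see the code: proc template; source Tagsets.ExcelBase; run;
  • Here is some documentation to help you get started with tagset code: support.sas.com/documentation/cdl/en/odsug/61723/HTML/default/…
【Solution 2】: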

Define the type in Excel.

If you would rather convert it afterwards, use a data step to convert the column.

data want ;
  length id1 $9 id2 $3 ;
  set rawdata(rename=(id2=numeric_id2));
  id2=cats(numeric_id2);
  drop numeric_id2;
run;
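
If the identifier originally had leading zeros, cats() will not restore them. A minimal sketch of one alternative, assuming the identifier has a known fixed width of 3 digits (an assumption not stated in the question), is to use the Z format in put(). The input table is a stand-in built with datalines.

```sas
/* Stand-in for the imported table, where id2 arrived as numeric. */
data rawdata;
    input id1 :$9. id2;
    datalines;
ABC_1234 1
ABC_1235 12
;
run;

data want;
    length id1 $9 id2 $3;
    set rawdata(rename=(id2=numeric_id2));
    /* Z3. writes the number in 3 columns, padded on the left with zeros, */
    /* so 1 becomes '001' and 12 becomes '012'.                           */
    id2 = put(numeric_id2, z3.);
    drop numeric_id2;
run;
```

This only helps when the width is fixed and known; if the original identifiers had varying numbers of leading zeros, that information is already lost once the column has been read as numeric.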

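【Comments】:

    【Solution 3】:

    I solved this by not using PROC IMPORT. It is not a solution for everyone, but it works very well for my purposes (i.e., not "big data"). If you are reading data from Excel spreadsheets, it should work for you.

    ImportDataFile is a macro [1] that automates a data step import. A data step import needs a LENGTH statement to define the variable names and types, an INPUT statement to read the raw data from the external file, and an INFILE statement to specify which file.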

    data &dataset.;
      &infileStatement.;
    
      length &lengthStatement. ;
    
      input (_all_) (:) ;
    run;
    

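    The macro consists of three main steps:

    • Establish the DDE link if necessary (i.e., connect to Excel)
    • Obtain the variable names by reading in the header
    • Read in the remaining data

    Note how each of these corresponds to one of the three statements in the data step. Everything else in the macro supports that data step.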
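
To make the skeleton concrete, here is a sketch of what the data step might look like once the macro variables are resolved for a small comma-delimited file. The file contents, the fileref demo, and the variable names are made up for illustration; the infile options mirror those the macro builds for CSV files.

```sas
/* Write a tiny CSV to a temporary file so the sketch is self-contained. */
filename demo temp;
data _null_;
  file demo;
  put 'ID1,ID2,MONTH';
  put 'ABC_1234,001,1';
  put 'ABC_1235,012,2';
run;

/* Roughly what the macro's data step resolves to: every column is read */
/* as a 100-character field, so the leading zeros in ID2 are preserved. */
data xl_import;
  infile demo dlmstr=',' dsd missover firstobs=2;

  length raw_id1 $ 100 raw_id2 $ 100 raw_month $ 100;

  input (_all_) (:);
run;
```

Because nothing is read as numeric, SAS never has to guess a type.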

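    In my experience, it is best to import the data as fixed-width character and then convert to whatever types are needed in a separate step. Yes, this is redundant, but I have never run into memory or space problems, and the benefits far outweigh any hypothetical concerns. It makes the data flow the same for every analysis, which helps validation, and it saves time overall by avoiding the need to correct SAS's type guesses (and the inevitable silent truncation).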
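
The separate conversion step could look like the following sketch. The data set and variable names are hypothetical, and the raw table is created inline so the example runs on its own. The ?? modifier suppresses log errors for values that cannot be converted and leaves them missing.

```sas
/* Stand-in for an all-character import (illustrative values only). */
data raw;
  infile datalines dlm=',' dsd missover;
  input raw_id2 :$100. raw_month :$100. raw_var1 :$100.;
  datalines;
001,1,869
012,2,639
;
run;

data typed;
  set raw;
  length id2 $ 3;
  id2   = raw_id2;                  /* stays character, keeps leading zeros */
  month = input(raw_month, ?? 12.); /* character to numeric                 */
  var1  = input(raw_var1,  ?? 12.);
  drop raw_:;                       /* drop every variable starting raw_    */
run;
```

Keeping the type decisions in one explicit step like this makes them easy to review, at the cost of an extra pass over the data.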

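    Because SAS is a very verbose language, this answer exceeds the StackOverflow answer character limit. A fully documented copy is here: https://pastebin.com/raw/RsXz3juJ. Put the code in a file named ImportDataFile.sas and make sure it runs before the macro is called (for example with %include). The call has the form: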

    %ImportDataFile(   
           dirData=    
      ,   fileName=    
      ,    dataset=    
      ,  delimiter=    
      , overOption=    
      ,  headerRow=    
      ,      sheet=    
      ,      range=    
      ,     prefix=    
      ,       case=    
      ,  defLength=    
    );                       
    

    where

    Output(s)     : SAS dataset, macro variable &listHeader                 
    Inputs        :    dirData= Directory containing data file.             
                      fileName= Filename including file extension. Must be  
                                .csv, .txt, .tsv, .xls, or .xlsx.           
                       dataset= Name of dataset output to WORK library.     
                     delimiter= (optional) Delimiting string given in       
                                quotes. Default for CSV is a comma, for     
                                TXT/TSV a tab. This parameter may not be    
                                set for Excel files. Doing so generates a   
                                warning.                                    
                    overOption= (optional) INFILE option. Default is        
                                 MISSOVER.  Other choices are FLOWOVER,     
                                 STOPOVER, TRUNCOVER, or SCANOVER.          
                     headerRow= (optional) Row corresponding to header in   
                                an Excel file. Accepts R#C#:R#C#, but       
                                should be given as R#. Default is R1.       
                         sheet= Name of worksheet. Required for XLS or XLSX.
                         range= Range of spreadsheet to be imported.        
                                Required for XLS and XLSX. Use form         
                                R#C#:R#C#.  See example below.              
                        prefix= (optional) String to append to beginning of 
                                each variable name. Default is no prefix.   
                          case= (optional) Toggle mix case variable naming. 
                                Must be lower/upper/mixed. Default is       
                                lower.                                      
                     defLength= (optional) Character field length.  Default 
                                value is 100.                               
    

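    For example, the following creates a character-type dataset named xl_import from the file located at C:\Path\To\File\my_xl_file.xlsx, with every field 100 characters wide. Columns are prefixed with the string "raw_". The overOption values correspond to the options defined for the INFILE statement.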

    %ImportDataFile(              
           dirData= C:\Path\To\File
      ,   fileName= my_xl_file.xlsx      
      ,    dataset= xl_import     
      ,     prefix= raw_          
      ,      sheet= Sheet1     
      ,      range= R2C1:R13C18   
      ,  defLength= 100           
      , overOption= MISSOVER      
    );                            
    

    Here is the code for the macro. Enjoy.

    ********************************************************************
    ** Utilities / Sub Macros
    ********************************************************************;
    %macro ClearFileRef(fileRef);
      filename &fileRef. clear;
    %mend;
    
    %macro CompareVariablesToDDERange();
      %local columnIndex numberOfDDEColumns;
    
      %let columnIndex        = %eval(%sysfunc(findc(&range., 'C', ib)) + 1);
      %let numberOfDDEColumns = %sysfunc(substr(&range., &columnIndex));
      %if %ListLength(&listHeader) ^= &numberOfDDEColumns %then
        %put WARNING: [MACRO] Data file contains %ListLength(&listHeader) variables. RANGE argument has &numberOfDDEColumns columns.;
    %mend;
    
    %macro EstablishSystemLink(fileRef);
      filename &fileRef. dde 'excel|system';
    %mend;
    
    %macro EstablishWorkbookLink(fileRef, dirData, fileName, sheetName, range);
      filename &fileRef. dde "excel|&dirData.\[&fileName.]&sheetName.!&range.";
    %mend;
    
    %macro IsEmpty(macroVariable);
      %sysevalf(%superq(&macroVariable)=, boolean)
    %mend;
    
    %macro IsFileRef(reference);
      %local fileRefExists externalFileExists returnValue;
    
      %let fileRefExists      = %sysfunc(fexist(&reference.));
      %let externalFileExists = %sysfunc(fileexist(&reference.));
      %if &fileRefExists. = 1 and &externalFileExists. = 0 %then %let returnValue = 1;
      %else %let returnValue = 0;
      &returnValue
    %mend;
    
    %macro IsFilePath(reference);
      %local fileRefExists externalFileExists returnValue;
    
      %let fileRefExists      = %sysfunc(fexist(&reference.));
      %let externalFileExists = %sysfunc(fileexist(&reference.));
      %if &fileRefExists. = 0 and &externalFileExists. = 1 %then %let returnValue = 1;
      %else %let returnValue = 0;
      &returnValue
    %mend;
    
    %macro GetObsCount(dataset);
      %local exists returnValue closed;
    
      %let exists = %sysfunc(open(&dataset));
      %if &exists. %then %do;
        %let returnValue  = %sysfunc(attrn(&exists, nobs));
        %let closed       = %sysfunc(close(&exists));
        %end;
      %else %do;
        %put ERROR: [&SYSMACRONAME.] Dataset %upcase(&dataset) does not exist.;
        %abort cancel;
        %end;
      &returnValue
    %mend;
    
    %macro GetVarCount(dataset);
      %local exists varCount closed;
    
      %let exists = %sysfunc(open(&dataset));
      %if &exists. %then %do;
        %let varCount = %sysfunc(attrn(&exists, nvars));
        %let closed   = %sysfunc(close(&exists));
        %end;
      %else %do;
        %put ERROR: [&SYSMACRONAME.] Dataset %upcase(&dataset) does not exist.;
        %abort cancel;
        %end;
      &varCount
    %mend;
    
    %macro ListLength(list);
      %local count;
    
      %if %sysevalf(%superq(list)=, boolean) %then %let count = 0;
      %else %let count = %eval(%sysfunc(countc(&list., |)) + 1);
      &count
    %mend;
    
    %macro ListElement(list, n);
      %local nthElement;
    
      %let nthElement = %sysfunc(scan(%superq(&list.), &n., |, m));
      &nthElement
    %mend;
    
    %macro RemoveAllFormattingFromSheet(fileRef, sheet);
      data _null_;
        file &fileRef.;
        /* Select sheet of interest */
        put "[WORKBOOK.ACTIVATE(""&sheet."")]";
        /* Select first cell */
        put '[FORMULA.GOTO("R1C1")]';
        /* Apply dummy filter of ">2" to first column */
        put '[FILTER(1, ">2")]';
        /* Disable filters */
        put '[FILTER()]';
        /* Select all */
        put '[SELECT("R[0]C[0]:R[1048575]C[16383]", "R[0]C[0]")]';
        /* Unhide rows */
        put '[ROW.HEIGHT(,,TRUE, 2)]';
        /* Unhide columns */
        put '[COLUMN.WIDTH(,,TRUE, 2)]';
        /* Remove all formatting */
        put '[CLEAR(2)]';
        /* Autofit column width */
        put '[COLUMN.WIDTH(,,TRUE, 3)]';
      run;
    %mend;
    
    %macro SetSystemOptions(opt1, opt2, opt3);
      options &opt1. &opt2. &opt3.;
    %mend;
    
    %macro ImportDataFile(dirData=, fileName=, dataset=, delimiter=, overOption=MISSOVER, headerRow=R1, sheet=, range=, prefix=, case=lower, defLength=100) / minoperator mindelimiter=',';
    %put NOTE: [MACRO] Executing: ImportDataFile(dirData=&dirData, fileName=&fileName, dataset=&dataset, delimiter=&delimiter, overOption=&overOption, headerRow=&headerRow, sheet=&sheet, range=&range, prefix=&prefix, case=&case, defLength=&defLength);
    
      %local
        macroStart
        case
        extension
        HeaderRef
        lengthStatement
        delimiter
        InfileRef
        infileStatement
        numberOfRecords
        numberOfVars
        duration
       ;
    
      %global
        listHeader
        originalNOTES
        originalQUOTELENMAX
      ;
    
      %let macroStart           = %sysfunc(datetime());
      %let originalNOTES        = %sysfunc(getoption(notes));
      %let originalQUOTELENMAX  = %sysfunc(getoption(quotelenmax));
    
      %SetSystemOptions(nonotes);
    
    ********************************************************************
    ** Validation
    ********************************************************************;
      %if %IsEmpty(dirData) %then %do;
        %put ERROR: [&SYSMACRONAME.] DIRDATA argument is blank.;
        %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
        %abort cancel;
        %end;
    
      %if %IsEmpty(fileName) %then %do;
        %put ERROR: [&SYSMACRONAME.] FILENAME argument is blank.;
        %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
        %abort cancel;
        %end;
    
      %if %IsEmpty(dataset) %then %do;
        %put ERROR: [&SYSMACRONAME.] DATASET argument is blank.;
        %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
        %abort cancel;
        %end;
    
      %if not(%IsEmpty(prefix)) and not(%sysfunc(nvalid(&prefix, v7))) %then %do;
        %put ERROR: [&SYSMACRONAME.] Invalid PREFIX="&prefix.";
        %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
        %abort cancel;
        %end;
    
      %let case = %upcase(&case.);
    
      %if not(&case. in (LOWER, UPPER, MIXED)) %then %do;
        %put ERROR: [&SYSMACRONAME.] Invalid case option: &case. Must be LOWER, UPPER, MIXED.;
        %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
        %abort cancel;
        %end;
    
      %let extension  = %upcase(%scan(&fileName., 1, '.', b));
    
      %if not(&extension. in (TXT, TSV, CSV, XLS, XLSX)) %then %do;
        %put ERROR: [&SYSMACRONAME.] Invalid file type: &extension. Must be TXT, TSV, CSV, XLS, XLSX.;
        %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
        %abort cancel;
        %end;
    
      %if &extension. in (XLS, XLSX) and %IsEmpty(sheet) %then %do;
        %put ERROR: [&SYSMACRONAME.] SHEET argument undefined.;
        %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
        %abort cancel;
        %end;
    
      %if &extension. in (XLS, XLSX) and %IsEmpty(range) %then %do;
        %put ERROR: [&SYSMACRONAME.] RANGE argument undefined.;
        %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
        %abort cancel;
        %end;
    
      %if not(&extension. in (XLS, XLSX)) and not(%IsEmpty(sheet)) %then %do;
        %put ERROR: [&SYSMACRONAME.] SHEET argument only valid for XLS or XLSX files.;
        %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
        %abort cancel;
        %end;
    
      %if not(&extension. in (XLS, XLSX)) and not(%IsEmpty(range)) %then %do;
        %put ERROR: [&SYSMACRONAME.] RANGE argument only valid for XLS or XLSX files.;
        %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
        %abort cancel;
        %end;
    
    **********************************
    *** Define delimiter
    **********************************;
     %if %IsEmpty(delimiter) %then %do;
        %if       &extension. in (XLS, XLSX)  %then %let delimiter = '09'x;
        %else %if &extension. = CSV           %then %let delimiter = ',';
        %else %if &extension. in (TXT, TSV)   %then %let delimiter = '09'x;
        %else %do;
          %put ERROR: [&SYSMACRONAME.] Delimiter error.;
          %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
          %abort cancel;
          %end;
        %end;
    
      %if &extension. in (XLS, XLSX) and &delimiter ^= '09'x %then %do;
        %put WARNING: [&SYSMACRONAME.] Delimiter for Excel file must be '09'x.;
        %put WARNING: [&SYSMACRONAME.] Delimiter set to '09'x.;
        %let delimiter = '09'x;
        %end;
    
    ********************************************************************
    ** Prep Excel Worksheet
    ********************************************************************;
      %if &extension. in (XLS, XLSX) %then %do;
        %let DDECommandRef = DDEcmd;
        %EstablishDDELink(fileRef=&DDECommandRef.);
        %RemoveAllFormattingFromSheet(fileRef=&DDECommandRef., sheet=&sheet.);
        %end;
    
    ********************************************************************
    ** Get header
    ********************************************************************;
    
    **********************************
    *** Define file reference
    **********************************;
      %if &extension. in (XLS, XLSX) %then %do;
        %let HeaderRef = DDEHead;
        %EstablishDDELink(
          fileRef= &HeaderRef.
          ,   dirData= &dirData.
          ,  fileName= &fileName.
          , sheetName= &sheet.
          ,     range= &headerRow.
        );
        %end;
      %else %if &extension. in (CSV, TXT, TSV) %then
        %let HeaderRef = %sysfunc(dequote(&dirData.))\&fileName.;
    
      %ReadHeaderIntoList(reference=&HeaderRef., delimiter=&delimiter., prefix=&prefix., case=&case.);
    
    ********************************************************************
    ** Create length statement
    ********************************************************************;
      %let lengthStatement = %CreateLengthStatement(&listHeader., &defLength.);
    
    ********************************************************************
    ** Import data
    ********************************************************************;
    
    **********************************
    *** Define infile statement
    **********************************;
      %if &extension. in (XLS, XLSX) %then %do;
        %let InfileRef = DDESheet;
        %EstablishDDELink(
          fileRef= &InfileRef.
          ,   dirData= &dirData.
          ,  fileName= &fileName.
          , sheetName= &sheet.
          ,     range= &range.
        );
        %let infileStatement = infile &InfileRef. dlmstr=&delimiter. dsd notab &overOption.;
        %CompareVariablesToDDERange();
        %end;
      %else %if &extension. in (CSV, TXT, TSV) %then %do;
        %let InfileRef       = %sysfunc(dequote(&dirData.))\&fileName.;
        %let infileStatement = infile "&InfileRef." dlmstr=&delimiter. dsd &overOption. firstobs = 2 end=last_record;
        %end;
    
    **********************************
    *** Perform import
    **********************************;
      data &dataset.;
        &infileStatement.;
    
        length &lengthStatement. ;
    
        input (_all_) (:) ;
    
      run;
    
    ********************************************************************
    ** Housekeeping
    ********************************************************************;
      %let numberOfRecords = %GetObsCount(&dataset.);
      %let numberOfVars    = %GetVarCount(&dataset.);
    
      %SetSystemOptions(notes);
    
      %put;
      %put NOTE: [MACRO] The dataset WORK.%upcase(&dataset.) has &numberOfRecords. observations and &numberOfVars. variables.;
      %put NOTE: [MACRO] IMPORTDATAFILE macro used (Total process time):;
    
      %let duration = %sysfunc(putn(%sysevalf(%sysfunc(datetime()) - &macroStart.), time12.3));
      %if %sysfunc(minute("&duration."t)) > 0 %then %do;
        %put NO%str(TE-)         real time            %substr(&duration., 3, 8);
        %end;
      %else %do;
        %put NO%str(TE-)         real time            %substr(&duration., 6, 5) seconds;
        %end;
    
      %put;
    
      %SetSystemOptions(&originalNotes., &originalQUOTELENMAX.);
    
    %mend;
    
    %macro  EstablishDDELink(fileRef, dirData, fileName, sheetName, range);
    %put NOTE: [&SYSMACRONAME] Executing: EstablishDDELink(fileRef=&fileRef, dirData=&dirData, fileName=&fileName, sheetName=&sheetName, range=&range);
    
      %local dirData linkConnection stopTime closeReturnCode;
    
    ********************************************************************
    ** Validate arguments
    ********************************************************************;
      %if %IsEmpty(fileRef) %then %do;
        %put ERROR: [&SYSMACRONAME] fileRef is blank.;
        %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
        %abort cancel;
        %end;
    
      %if %length(&fileRef.) > 8 %then %do;
        %put ERROR: [&SYSMACRONAME] Fileref &fileRef exceeds 8 character limit.;
        %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
        %abort cancel;
        %end;
    
      %if not %IsEmpty(dirData) %then %let dirData = %sysfunc(dequote(&dirData.));
    
    ********************************************************************
    ** Assign fileref according to link type
    ********************************************************************;
      %if     %IsEmpty(dirData)
          and %IsEmpty(fileName)
          and %IsEmpty(sheetName)
          and %IsEmpty(range) %then %EstablishSystemLink(&fileRef.);
      %else %EstablishWorkbookLink(&fileRef., &dirData., &fileName., &sheetName., &range.);
    
    ********************************************************************
    ** Check that link has been established
    ********************************************************************;
      %let linkConnection = %sysfunc(fopen(&fileRef, S));
    
      %if not (&linkConnection. > 0) %then %do;
    
        /*Run until either Excel opens (linkConnection > 0)
          or until 10 seconds have passed.*/
        %let stopTime = %sysevalf(%sysfunc(datetime()) + 10);
    
        %do %until (&linkConnection. > 0);
          %if (%sysfunc(datetime()) >= &stopTime.) %then %do;
            %put ERROR: [&SYSMACRONAME] DDE system link was not established. Operation timed out.;
            %ClearFileRef(&fileRef.);
            %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
            %abort cancel;
            %end;
    
          %let linkConnection = %sysfunc(fopen(&fileRef, S));
          %end;
        %end;
    
    ********************************************************************
    ** Housekeeping
    ********************************************************************;
      %let closeReturnCode = %sysfunc(fclose(&linkConnection));
    
    %mend;
    
    %macro  ReadHeaderIntoList(reference, delimiter, prefix, case) / minoperator mindelimiter=',';
    %put NOTE: [MACRO] Executing: ReadHeaderIntoList(reference=&reference, delimiter=&delimiter, prefix=&prefix, case=&case);
    
      %local  fileSpecification notab delimiter;
      %global listHeader;
    
      %SetSystemOptions(nonotes);
    
      %if %IsEmpty(reference) %then %do;
        %put ERROR: [&SYSMACRONAME.] REFERENCE argument is blank.;
        %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
        %abort cancel;
        %end;
    
    ********************************************************************
    ** Determine infile statement options
    ********************************************************************;
      /*SAS filerefs exist only for Excel files*/
      %if       %IsFileRef(&reference.)  %then %do;
        %let fileSpecification  = &reference.;
        %let notab              = notab;
        %end;
      /*Absolute references only for CSV,TXT,TSV files*/
      %else %if %IsFilePath(&reference.) %then %do;
        %let fileSpecification  = "&reference.";
        %let notab              = ;
        %let extension          = %upcase(%scan(&reference., 1, '.', b));
        %end;
      %else %do;
        %put ERROR: [&SYSMACRONAME.] Invalid input REFERENCE: [&reference.];
        %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
        %abort cancel;
        %end;
    
    ********************************************************************
    ** Read in header
    ********************************************************************;
      data _null_;
        infile &fileSpecification. dlmstr = '```#@' &notab. obs = 1 lrecl = 32767 ;
        length
          raw_header_line   $ 32767
          raw_with_pipes    $ 32767
        ;
        input raw_header_line;
    
        raw_with_pipes  = tranwrd(raw_header_line, &delimiter., '|');
        call symput('rawListHeader', strip(raw_with_pipes));
      run;
    
    ********************************************************************
    ** Transform headers into valid variable names
    ********************************************************************;
      %SetSystemOptions(noquotelenmax);
      data _null_;
        length
          i           8
          listLength  8
          header_i    $ 32767
          temp_i      $ 32767
          listValid   $ 32767
        ;
        listLength = %ListLength(%superq(rawListHeader));
    
        do i = 1 to listLength;
          header_i = scan("%superq(rawListHeader)", i, '|', 'm');
    
    **********************************
    *** Apply prefix
    **********************************;
          if not missing(header_i) then prefixed_i = cats("&prefix.", header_i);
          else                          prefixed_i = header_i;
    
    **********************************
    *** Apply case
    **********************************;
          if      "&case." = "LOWER" then cased_i = lowcase(prefixed_i);
          else if "&case." = "UPPER" then cased_i = upcase(prefixed_i);
          else                            cased_i = prefixed_i;
    
    **********************************
    *** Keep valid otherwise correct
    **********************************;
          if nvalid(cased_i, 'v7') then do;
        if i = 1 then listValid = cased_i;
        else          listValid = catx('|', listValid, cased_i);
        end;
          else do;
    
    **********************************
    *** Fill in blank headers
    **********************************;
          if missing(cased_i) and "&case." = "UPPER" then temp_i = "%upcase(&prefix.)NO_HEADER";
          else if missing(cased_i)                   then temp_i = "&prefix.no_header";
    
    **********************************
    *** Replace blanks with _ and
    *** Remove invalid characters
    **********************************;
          else do;
        replaced_space_with_underscore = tranwrd(strip(cased_i), ' ', '_');
        temp_i = compress(replaced_space_with_underscore, '_', 'kin');
        end;
    
    **********************************
    *** Make first char _ if digit
    **********************************;
        if anydigit(temp_i) = 1 then temp_i = cats('_', temp_i);
    
    **********************************
    *** Trim length to 32
    **********************************;
        if length(temp_i) > 32 then temp_i = substr(temp_i, 1, 32);
    
    **********************************
    *** Verify valid V7 name
    **********************************;
        if not nvalid(temp_i, 'v7') then do;
          put "ERROR: [&SYSMACRONAME.] Error cleaning header " i +(-1) ". Invalid SAS name.";
          call execute('
            %SetSystemOptions(&originalNOTES., &originalQUOTELENMAX.);
            data _null_;
              abort cancel nolist;
            run;');
          stop;
          end;
    
        if i = 1 then listValid = temp_i;
        else          listValid = catx('|', listValid, temp_i);
        end;
    
          output;
        end;
        call symput('listValid', strip(listValid));
      run;
    
    ********************************************************************
    ** Append repeated headers with incremented value
    ********************************************************************;
      /*Use hash table with key being each header and value
        corresponding to the number of occurences.  Create new
        header list as follows: If first occurence of a header,
        add to list.  If not first occurence, ruthlessly append
        occurence number (ensuring validity) and add to list.
        Beware: SAS documentation for hashes contains syntax
        errors.*/
      data _null_;
        length
          element_i   $ 32
          item        $ 32
          occurrences 8
          new_list    $ 32767
        ;
    
        declare hash h();
        h.defineKey('item');
        h.defineData('item', 'occurrences');
        h.defineDone();
        call missing(item, occurrences);
    
        listLength = input("%ListLength(&listValid.)", 8.);
        do i = 1 to listLength;
          element_i = scan("&listValid.", i, '|');
    
          if not (h.find(key: element_i) = 0) then do;
        h.add(key: element_i, data: element_i, data: 1);
        new_list = catx('|', new_list, element_i);
        end;
          else do;
        occurrences + 1;
        h.replace(key: element_i, data: element_i, data: occurrences);
    
        len     = length(element_i);
        digits  = ceil(log10(occurrences + 1));
    
        if (len + digits) > 32 then
          new_element = cats(substr(element_i, 1, len - digits), occurrences);
        else new_element = cats(element_i, occurrences);
    
        new_list = catx('|', new_list, new_element);
        end;
        end;
    
        call symput('listHeader', strip(new_list));
      run;
    %mend;
    
    %macro  CreateLengthStatement(listHeader, defLength);
      %local lengthStatement header_h;
    
      %let lengthStatement=;
      %do h = 1 %to %ListLength(&listHeader.);
      %let header_h = %ListElement(listHeader, &h);
        %if &h. = 1 %then %let lengthStatement = &header_h. $ &defLength. ;
        %else %let lengthStatement = &lengthStatement. &header_h. $ &defLength. ;
      %end;
      %let lengthStatement = &lengthStatement;
      &lengthStatement
    %mend;
    

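    [1] This solution makes extensive use of macros. In my experience, people advise avoiding macros. With all due respect, I find it best to ignore that advice. SAS does not have functions in the usual sense, which makes it hard to develop abstractions. Macros let you imitate functions. The common fear about macros is debugging. Stick to the Single Responsibility Principle and you will find they are not hard to debug at all. Log them with %put statements and you will know which macro was called and when. If you are not familiar with macros, they are really just text substitution: the code passes through a preprocessor, which replaces the macro code with text, and that text is then executed along with the rest of your code. The best resource for learning about macros is the manual.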
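
As a small illustration of the function-like style described in this note, here is a sketch of a macro that logs its own call with %put and "returns" a value by leaving text behind. The macro name AddOne is hypothetical and is not part of ImportDataFile.

```sas
/* A function-style macro: the only text it generates is its result. */
%macro AddOne(n);
  /* %put writes to the log and generates no text of its own. */
  %put NOTE: [&SYSMACRONAME.] Executing: AddOne(n=&n.);
  %eval(&n. + 1)
%mend;

/* The call is replaced by the generated text before %let is executed. */
%let result = %AddOne(41);
%put NOTE: result=&result.;
```

The helper macros in the code above, such as %ListLength and %IsEmpty, follow the same pattern of leaving a single value behind as generated text.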

    【Comments】:
