【Question title】: In SAS, how to transpose a table, producing a dummy variable for each unique value in a column
【Posted】: 2026-01-19 19:10:01
【Question description】:

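Using SAS, I am trying to transpose the data in a table so that, for each value of the variable ID, every unique value of the variables Class and Subclass becomes a dummy variable.

Have: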

ID        Class        Subclass         
-------------------------------   
ID1        1           1a          
ID1        1           1b           
ID1        1           1c           
ID1        2           2a

ID2        1           1a           
ID2        1           1b           
ID2        2           2a           
ID2        2           2b              
ID2        3           3a

ID3        1           1a                      
ID3        1           1d 
ID3        2           2a
ID3        3           3a           
ID3        3           3b  

Want:

ID    Class_1    Class_2    Class_3    Subclass_1a  ...    Subclass_3b         
----------------------------------------------------...---------------   
ID1   1          1          0          1            ...    0
ID2   1          1          1          1            ...    0
ID3   1          1          1          1            ...    0

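I tried transposing the data by the variable ID, with Class and Subclass in the ID statement of PROC TRANSPOSE. However, this produces variables whose names are the concatenation of each unique combination of Class and Subclass values. Without a VAR statement in the transpose procedure, this approach also does not produce 0 and 1 values.

Do I need to create the actual dummy variables first, before transposing the data to get the wanted table, or is there a more direct approach?

【Question comments】:

Tags: sas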


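【Solution 1】:

It looks like you need PROC TRANSREG to generate the design matrix (one dummy variable per level), which PROC SUMMARY then reduces to one row per ID.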

data id;
   infile datalines firstobs=3;
   input ID :$3. class subclass :$2.;
   datalines;
ID        Class        Subclass
-------------------------------
ID1        1           1a
ID1        1           1b
ID1        1           1c
ID1        2           2a
ID2        1           1a
ID2        1           1b
ID2        2           2a
ID2        2           2b
ID2        3           3a
ID3        1           1a
ID3        1           1d
ID3        2           2a
ID3        3           3a
ID3        3           3b
;;;;
   run;
proc print;
   run;
proc transreg;
   id id;
   model class(class subclass / zero=none);
   output design out=dummy(drop=class subclass);
   run;
proc print;
   run;
proc summary nway;
   class id;
   output out=want(drop=_type_) max(class: subclass:)=;
   run;
proc print;
   run;

【Comments】:

    【Solution 2】:

    You can also select the distinct values of each variable, transpose each result, and merge them back together.

      data have;
     input ID  $      Class  $      Subclass   $  ;
     datalines;      
     ID1        1           1a          
     ID1        1           1b           
     ID1        1           1c           
     ID1        2           2a
     ID2        1           1a           
     ID2        1           1b           
     ID2        2           2a           
     ID2        2           2b              
     ID2        3           3a
     ID3        1           1a                      
     ID3        1           1d 
     ID3        2           2a
     ID3        3           3a           
     ID3        3           3b  
     ;
    
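    /* distinct ID/Class pairs, sorted for the BY statement */
    proc sql;
      create table want1 as
      select distinct id, class from have
      order by id, class;
    quit;

    proc transpose data=want1 out=want1a(drop=_name_) prefix=class_;
      by id;
      id class;
      var class;
    run;

    /* distinct ID/Subclass pairs, sorted for the BY statement */
    proc sql;
      create table want2 as
      select distinct id, subclass from have
      order by id, subclass;
    quit;

    proc transpose data=want2 out=want2a(drop=_name_) prefix=Subclass_;
      by id;
      id subclass;
      var subclass;
    run;

    /* merge the two wide tables and recode the character flags:
       missing becomes "0", anything else becomes "1" */
    data want;
      merge want1a want2a;
      by id;
      array flags(*) class_: subclass_:;
      do i = 1 to dim(flags);
        if missing(flags(i)) then flags(i) = "0";
        else flags(i) = "1";
      end;
      drop i;
    run;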
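
The flags produced above are character values ("0"/"1") because VAR names the character columns themselves. A minimal sketch of a numeric variant follows, assuming the HAVE data set defined above. The names FLAG, FLAGGED, C1, C2, C1_WIDE, C2_WIDE and WANT_NUM are illustrative and not part of the original answer, and the code has not been run against the poster's data.

```sas
/* Assumes the HAVE data set defined above.
   FLAG, FLAGGED, C1, C2, C1_WIDE, C2_WIDE and WANT_NUM are illustrative names. */
data flagged;
  set have;
  flag = 1;                       /* numeric indicator to be transposed */
run;

/* one row per ID/Class and per ID/Subclass, sorted for BY processing */
proc sort data=flagged(keep=id class flag) out=c1 nodupkey;
  by id class;
run;
proc sort data=flagged(keep=id subclass flag) out=c2 nodupkey;
  by id subclass;
run;

proc transpose data=c1 out=c1_wide(drop=_name_) prefix=Class_;
  by id;
  id class;
  var flag;
run;
proc transpose data=c2 out=c2_wide(drop=_name_) prefix=Subclass_;
  by id;
  id subclass;
  var flag;
run;

/* merge the wide tables and turn missing flags into 0 */
data want_num;
  merge c1_wide c2_wide;
  by id;
  array flags(*) Class_: Subclass_:;
  do i = 1 to dim(flags);
    if missing(flags(i)) then flags(i) = 0;
  end;
  drop i;
run;
```

Transposing a constant numeric FLAG instead of the Class and Subclass columns keeps the resulting dummy variables numeric, so they can be summed or averaged directly.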
    

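    【Comments】:

      【Solution 3】:

      This is some tricky code generation. It uses hashes to map each value to an array index, and that index corresponds to a flag variable indicating whether a given <name>_<value> combination is present.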

      data have;
      input ID $ Class Subclass $; datalines;
      ID1 1 1a 
      ID1 1 1b 
      ID1 1 1c 
      ID1 2 2a
      
      ID2 1 1a 
      ID2 1 1b 
      ID2 2 2a 
      ID2 2 2b 
      ID2 3 3a
      
      ID3 1 1a 
      ID3 1 1d 
      ID3 2 2a
      ID3 3 3a 
      ID3 3 3b 
      run;
      
      * create indexed name_value data for variable name construction and hash initialization;
      proc sql ; * fresh proc to reset within proc monotonic tracker;
        create table map1 as 
        select class, monotonic() as index 
        from (select distinct class from have);
      
      proc sql noprint;
        create table map2 as
        select subclass, monotonic() as index
        from (select distinct subclass from have);
      
      * populate macro variable with pdv target variable names to be arrayed;
      proc sql noprint;
        select catx('_','class',class) 
        into :map1vars separated by ' '
        from map1 order by index;
      
        select catx('_','subclass',subclass)
        into :map2vars separated by ' '
        from map2 order by index; 
      
      * group wise flag <variable>_<value> combinations;
      data want;
        if _n_ = 1 then do;
          if 0 then set map1 map2; * prep pdv with hash variables;
          declare hash map1(dataset:'map1');
          declare hash map2(dataset:'map2');
          map1.defineKey('class');
          map1.defineData('index');
          map1.defineDone();
          map2.defineKey('subclass');
          map2.defineData('index');
          map2.defineDone();
        end;
      
        * arrays for <name>_<value> combinations;
        array map1_ &map1vars;
        array map2_ &map2vars;

        * start every group with all flags at 0, since new variables are reset to missing each iteration;
        do index = 1 to dim(map1_); map1_[index] = 0; end;
        do index = 1 to dim(map2_); map2_[index] = 0; end;

        * group wise flag pivot vars (existential extrusion);
        do until (last.id);
          set have;
          by id;

          * use hash lookup on value to find index into target array;
          map1.find(); map1_[index] = 1;
          map2.find(); map2_[index] = 1;
        end;
        keep id &map1vars &map2vars;
      run;
      

      PROC REPORT can also display the values as ACROSS columns, showing the number of occurrences within each group.

      proc report data=have;
        define id / group;
        define class / across;
        define subclass / across;
      run;
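
If the report is needed as a data set rather than as printed output, PROC REPORT accepts an OUT= option. The sketch below assumes the HAVE data set defined above; REPORT_OUT is an illustrative name. In the output data set the ACROSS columns are expected to get generic names such as _C2_, _C3_ rather than Class_1 or Subclass_1a, and they hold counts rather than 0/1 flags, so a renaming and recoding step would still be needed.

```sas
/* Assumes the HAVE data set defined above; REPORT_OUT is an illustrative name. */
proc report data=have out=report_out;
  column id class subclass;
  define id / group;
  define class / across;
  define subclass / across;
run;
```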
      

      【Comments】:
