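[Question title]: SAS import messy data
[Posted]: 2021-10-07 11:47:26
[Question description]:

I have this SAS code, and I want it to produce a fruit data set with Date, Description and Supplier as the columns.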

data want;
infile datalines firstobs=2 dsd; 
input Date Description Supplier ;  
format Date ddmmyys8.; 
datalines;
Date, Description, Supplier 
14/2/21, "Jumbo size, sweet, organic", Fresh Ltd
13/3/21, "Fresh, juicy, sweet", Polan Ltd
12/1/21, Fresh and sweet, "30.kg", Japanko Ltd
13/4/21, "Sour and tasty", from Japan, "Juicy", Pan International
14/5/21, "Organic, honey sweet, fresh", Koreania Ltd
17/6/21, "Juicy, pulp", Grocer Fresh 
18/4/21, "Honey sweet", Korea, "fresh", Hanko Ltd
;
run;

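I tried the code above, but I still cannot import the data set successfully. How can I make sure the imported data has only the Date, Description and Supplier columns?

[Comments]:

  • What output do you want for the lines that have more than 3 values? Do you want multiple observations? Do you want to combine some of the extra values into one of the two character variables? Which one?
  • Can you get whoever created the file to recreate it in a proper format, so that it can be parsed unambiguously?
  • Those three values should all go in Description.

Tags: sas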


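[Solution 1]:

The main problem with your data step is that you never told SAS how to define the variables, so by default the INPUT statement tries to create them as numeric. You also did not supply an informat that would read the date strings as actual date values.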
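As a minimal sketch of the second point, the step below converts one of the date strings with the DDMMYY informat and prints both the raw numeric value and a formatted value to the log. The variable name d is made up for illustration, and how the two-digit year is interpreted depends on the session's YEARCUTOFF option.

```sas
/* Hypothetical one-off check: read a dd/mm/yy string as a SAS date. */
data _null_;
  /* The DDMMYY informat turns the text into a numeric SAS date value. */
  d = input('14/2/21', ddmmyy10.);
  /* Write the raw number, then the same value with a date format.     */
  put d= d= yymmdd10.;
run;
```

Without an informat like this, SAS would try to read 14/2/21 as an ordinary number and set the variable to missing.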

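It is hard to tell what you want to do with the extra values on some lines. What do they mean? Do you just want to ignore them? If you include them, where should they go?

The code here assumes that each line has one date followed by multiple pairs of description/supplier values.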

data want;
  infile datalines firstobs=2 dsd truncover;
  length Date 8 Description $30 Supplier $30 ; 
  informat date ddmmyy.;
  format date yymmdd10.;
  input Date Description Supplier @;
  do until(description=' ');
    output;
    input description supplier @;
  end; 
datalines;
Date, Description, Supplier 
14/2/21, "Jumbo size, sweet, organic", Fresh Ltd
13/3/21, "Fresh, juicy, sweet", Polan Ltd
12/1/21, Fresh and sweet, "30.kg", Japanko Ltd
13/4/21, "Sour and tasty", from Japan, "Juicy", Pan International
14/5/21, "Organic, honey sweet, fresh", Koreania Ltd
17/6/21, "Juicy, pulp", Grocer Fresh 
18/4/21, "Honey sweet", Korea, "fresh", Hanko Ltd
;

Result:

Obs          Date    Description                    Supplier

  1    2021-02-14    Jumbo size, sweet, organic     Fresh Ltd
  2    2021-03-13    Fresh, juicy, sweet            Polan Ltd
  3    2021-01-12    Fresh and sweet                30.kg
  4    2021-01-12    Japanko Ltd
  5    2021-04-13    Sour and tasty                 from Japan
  6    2021-04-13    Juicy                          Pan International
  7    2021-05-14    Organic, honey sweet, fresh    Koreania Ltd
  8    2021-06-17    Juicy, pulp                    Grocer Fresh
  9    2021-04-18    Honey sweet                    Korea
 10    2021-04-18    fresh                          Hanko Ltd

Here is a version that assumes the extra words all belong to the Description field.

data want;
  infile datalines firstobs=2 dsd truncover;
  length Date 8 Description $50 Supplier $30 ; 
  informat date ddmmyy.;
  format date yymmdd10.;
  input Date Description @ ;
  do _n_=1 to countw(_infile_,',','mq')-3;
     input Supplier @;
     description=catx(', ',description,supplier);
  end;
  input supplier;
datalines;
Date, Description, Supplier 
14/2/21, "Jumbo size, sweet, organic", Fresh Ltd
13/3/21, "Fresh, juicy, sweet", Polan Ltd
12/1/21, Fresh and sweet, "30.kg", Japanko Ltd
13/4/21, "Sour and tasty", from Japan, "Juicy", Pan International
14/5/21, "Organic, honey sweet, fresh", Koreania Ltd
17/6/21, "Juicy, pulp", Grocer Fresh 
18/4/21, "Honey sweet", Korea, "fresh", Hanko Ltd
;

Result:

Obs          Date    Description                          Supplier

 1     2021-02-14    Jumbo size, sweet, organic           Fresh Ltd
 2     2021-03-13    Fresh, juicy, sweet                  Polan Ltd
 3     2021-01-12    Fresh and sweet, 30.kg               Japanko Ltd
 4     2021-04-13    Sour and tasty, from Japan, Juicy    Pan International
 5     2021-05-14    Organic, honey sweet, fresh          Koreania Ltd
 6     2021-06-17    Juicy, pulp                          Grocer Fresh
 7     2021-04-18    Honey sweet, Korea, fresh            Hanko Ltd

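[Discussion]:

  • Thanks. By the way, what does the line "countw(_infile_,',','mq')-3;" do?
  • It sets the upper bound of the DO loop. The COUNTW() function counts words, here the comma-delimited values in the current input line (_INFILE_). The 'q' modifier ignores commas inside quoted strings, and the 'm' modifier counts the empty values between consecutive commas. So when a line has more than 3 values, the DO loop reads the extra values one at a time and appends them to DESCRIPTION.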
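
To see the loop bound from the explanation above in isolation, here is a small sketch that applies COUNTW() with the same delimiter and modifiers to two of the sample lines. The variable names line, n_words and n_extra are made up for illustration; in the answer's code the function is applied to _INFILE_ instead.

```sas
/* Hypothetical check of the DO loop bound on two sample lines. */
data _null_;
  length line $100;

  /* The commas inside the quoted description are ignored by 'q'. */
  line = '14/2/21, "Jumbo size, sweet, organic", Fresh Ltd';
  n_words = countw(line, ',', 'mq');
  n_extra = n_words - 3;
  put n_words= n_extra=;   /* n_words=3 n_extra=0 */

  /* Two extra values sit between the date and the supplier. */
  line = '13/4/21, "Sour and tasty", from Japan, "Juicy", Pan International';
  n_words = countw(line, ',', 'mq');
  n_extra = n_words - 3;
  put n_words= n_extra=;   /* n_words=5 n_extra=2 */
run;
```

With n_extra=0 the DO loop body never runs, so a well-formed line is read as a plain Date, Description, Supplier triple.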