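【Posted】: 2022-01-20 09:30:08
【Question】:
I am working with a database that has hundreds of variables, but because it comes from JSON I am having trouble organizing it. For example, instead of putting additional information into new columns, the file creates new rows. See the example below.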
# Input: several rows per ID (data_frame() is deprecated, so tibble() is used)
df1 <- tibble::tibble(ID = c(111,111,111,111,111,111,222,222,333),
                      NAME = c('JOHN','JOHN','MARY','MARY','JAMES','JAMES','WILL','WILL','MARK'),
                      ADRESS = c('NY','NY','NY','NY','ROMA','ROMA','LONDON','TOKYO',''),
                      COLOR = c('GREEN','GREEN','RED','RED','YELLOW','YELLOW','BLUE','BLUE','ORANGE'),
                      CAR = c('','','BMW','BMW','TRUCK','TRUCK','FORD','FORD','FERRARI'),
                      COUNTRY = c('USA','USA','USA','USA','USA','USA','USA','USA','USA'))
I would like to reorganize the file so that it is grouped by ID, with one row per ID, as in the following example:
# Desired output: one row per ID, numbered columns for values that vary
df2 <- tibble::tibble(ID = c(111,222,333),
                      NAME1 = c('JOHN','WILL','MARK'),
                      NAME2 = c('MARY','',''),
                      NAME3 = c('JAMES','',''),
                      ADRESS1 = c('NY','LONDON',''),
                      ADRESS2 = c('NY','TOKYO',''),
                      ADRESS3 = c('ROMA','',''),
                      COLOR1 = c('GREEN','BLUE','ORANGE'),
                      COLOR2 = c('RED','',''),
                      COLOR3 = c('YELLOW','',''),
                      CAR1 = c('','FORD','FERRARI'),
                      CAR2 = c('BMW','',''),
                      CAR3 = c('TRUCK','',''),
                      COUNTRY = c('USA','USA','USA'))
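Note, however, that the COUNTRY variable does not need multiple columns (COUNTRY1, COUNTRY2, COUNTRY3), because the values would simply be repeated. My original file contains many cases like this.
How can I reshape the data into the layout shown in df2?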
【Discussion】:
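One possible approach, offered as a rough sketch rather than a definitive answer: drop the fully duplicated rows, number the remaining rows within each ID, and spread them with tidyr's pivot_wider(). It assumes the dplyr, tidyr and tibble packages are installed. The helper names idx, constant_cols and varying_cols are made up for this illustration. A column is treated as "constant" (kept as a single column, like COUNTRY) when it never takes more than one value within any ID. Note one difference from df2: for ID 222 this sketch keeps the repeated values (NAME2 = 'WILL', COLOR2 = 'BLUE', CAR2 = 'FORD') instead of blanking them, because the rule that would blank them is not stated in the question.

```r
library(dplyr)
library(tidyr)
library(tibble)

df1 <- tibble(
  ID      = c(111, 111, 111, 111, 111, 111, 222, 222, 333),
  NAME    = c('JOHN','JOHN','MARY','MARY','JAMES','JAMES','WILL','WILL','MARK'),
  ADRESS  = c('NY','NY','NY','NY','ROMA','ROMA','LONDON','TOKYO',''),
  COLOR   = c('GREEN','GREEN','RED','RED','YELLOW','YELLOW','BLUE','BLUE','ORANGE'),
  CAR     = c('','','BMW','BMW','TRUCK','TRUCK','FORD','FORD','FERRARI'),
  COUNTRY = c('USA','USA','USA','USA','USA','USA','USA','USA','USA')
)

value_cols <- setdiff(names(df1), "ID")

# Assumption: a column is "constant" if it has exactly one distinct value
# within every ID; such columns are not numbered.
is_constant <- vapply(
  value_cols,
  function(col) {
    all(tapply(df1[[col]], df1$ID, function(x) length(unique(x))) == 1)
  },
  logical(1)
)
constant_cols <- value_cols[is_constant]
varying_cols  <- value_cols[!is_constant]

wide <- df1 %>%
  distinct() %>%                    # remove fully duplicated rows
  group_by(ID) %>%
  mutate(idx = row_number()) %>%    # 1, 2, 3, ... within each ID
  ungroup() %>%
  pivot_wider(
    id_cols     = all_of(c("ID", constant_cols)),
    names_from  = idx,
    values_from = all_of(varying_cols),
    names_sep   = "",               # NAME1, NAME2, ... instead of NAME_1
    values_fill = ""                # empty string where an ID has fewer rows
  ) %>%
  relocate(all_of(constant_cols), .after = last_col())

print(wide)
```

Because constant_cols is computed from the data, the same code should extend to a file with many COUNTRY-like columns without listing them by hand. values_fill = "" assumes every varying column is character, as in the example; with mixed types it would need to be a named list or left as NA.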