【发布时间】:2016-09-11 17:14:03
【问题描述】:
下面的 csv 来自一个更长的数据表,称之为temp。我想将其转换为temp.wide,region_code 作为列和,垂直顺序为region_code(SAS、SSA、EUR、...)作为列的顺序。我刚刚注意到 dcast 按字母顺序排列新列。
scenario region_code region_name value
1: 2010 SAS South Asia 61.17716
2: 2010 SSA Africa south of the Sahara 62.08588
3: 2010 EUR Europe 63.76123
4: 2010 LAC Latin America and Caribbean 68.84806
5: 2010 FSU Former Soviet Union 59.04499
6: 2010 EAP East Asia and Pacific 64.00579
7: 2010 NAM North America 66.18235
8: 2010 MEN Middle East and North Africa 58.03167
9: SSP2-NoCC-REF SAS South Asia 57.29973
10: SSP2-NoCC-REF SSA Africa south of the Sahara 65.14987
11: SSP2-NoCC-REF EUR Europe 63.99204
12: SSP2-NoCC-REF LAC Latin America and Caribbean 68.21118
13: SSP2-NoCC-REF FSU Former Soviet Union 60.10807
14: SSP2-NoCC-REF EAP East Asia and Pacific 63.86103
15: SSP2-NoCC-REF NAM North America 65.97859
16: SSP2-NoCC-REF MEN Middle East and North Africa 58.98356
temp = setDT(structure(list(scenario = c("2010", "2010", "2010", "2010", "2010",
"2010", "2010", "2010", "SSP2-NoCC-REF", "SSP2-NoCC-REF", "SSP2-NoCC-REF",
"SSP2-NoCC-REF", "SSP2-NoCC-REF", "SSP2-NoCC-REF", "SSP2-NoCC-REF",
"SSP2-NoCC-REF"), region_code = c("SAS", "SSA", "EUR", "LAC",
"FSU", "EAP", "NAM", "MEN", "SAS", "SSA", "EUR", "LAC", "FSU",
"EAP", "NAM", "MEN"), region_name = c("South Asia", "Africa south of the Sahara",
"Europe", "Latin America and Caribbean", "Former Soviet Union",
"East Asia and Pacific", "North America", "Middle East and North Africa",
"South Asia", "Africa south of the Sahara", "Europe", "Latin America and Caribbean",
"Former Soviet Union", "East Asia and Pacific", "North America",
"Middle East and North Africa"), value = c(61.1771623260257,
62.0858809906661, 63.7612306428217, 68.84805628195, 59.0449875464304,
64.0057851485101, 66.182351351389, 58.0316719859857, 57.299725759211,
65.1498720847705, 63.9920412193261, 68.2111842947542, 60.1080745513644,
63.86103368494, 65.9785850777114, 58.9835574681585)), .Names = c("scenario",
"region_code", "region_name", "value"), row.names = c(NA, -16L
), class = "data.frame"))
这是我使用的代码。
formula.wide <- "scenario ~ region_code"
temp.wide <- data.table::dcast(
data = temp,
formula = formula.wide,
value.var = "value")
scenario EAP EUR FSU LAC MEN NAM SAS SSA
1: 2010 64.00579 63.76123 59.04499 68.84806 58.03167 66.18235 61.17716 62.08588
2: SSP2-NoCC-REF 63.86103 63.99204 60.10807 68.21118 58.98356 65.97859 57.29973 65.14987
新的列名是scenario, EAP, EUR, FSU, LAC, MEN, NAM, SAS, SSA。
我可以从temp 获取正确的顺序,然后使用setcolorder 为temp.wide 提供正确的列顺序。但我想知道是否有某种方法可以不按字母顺序排列新的列顺序。
另外,dcast 的帮助文本说
正在转换的列的名称以相同的顺序生成 (由下划线 _ 分隔)与每个中的(唯一)值 公式 RHS 中提到的列。
如果我理解正确,我认为它并没有描述 dcast 的实际作用。但是我不明白括号中的短语(用下划线分隔,_)是什么意思。
【问题讨论】:
-
请提供一个可重现的例子。
-
可以
dput数据吗? -
我可以
dput数据,但我不知道如何处理生成的文件。
标签: r data.table