【发布时间】:2021-08-05 07:23:27
【问题描述】:
我有一个数据集,显示了几个国家的双边出口。因为数据波动,我需要计算年级的平均值。并非所有国家都准确涵盖了年份。有些开始较晚,有些介于两者之间 - 这意味着缺少一些年份(但没有 NA 条目)。在一位了不起的社区成员的帮助下,我已经设法将数据切割成碎片:year_group。
下面我列出了另外两个问题以及我的代码、错误的输出以及底部数据集 total_trade 的一些示例输入数据
问题 1
我面临的问题是代码没有计算正确的方法。当我手动计算结果时,我得到的结果与我的代码不同。 (见下文)
这是我的代码
# create vectors for coding 4 years average
year_group_break <- c(1999, 2003, 2007, 2011, 2015, 2019)
year_group_labels <- c("1999-2002", "2003-2006", "2007-2010", "2011-2014", "2015-2018")
years <- c(1999, 2000, 2001, 2002,2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019)
FourY_av <- total_trade %>%
# create year_group variable for average values with above predefined labels and cuts,
# chose right = FALSE to take cut before year_group_break
mutate(year_group = cut(Year, breaks = year_group_break,
labels = year_group_labels,
include.lowest = TRUE, right = FALSE)) %>%
# add column with mean of total trade per four year period: "avg_year_group_total"
group_by(ReporterName, year_group) %>%
mutate(total_year_group = mean(Total_Year)) %>%
arrange(ReporterName,PartnerName, desc(Year))
View(FourY_av)
下面是错误输出 此输出错误,因为 total_year_group(安哥拉年份组 '2015-2018' 的平均值)应为 34746013.5(手动计算时)而不是 34907582。(如输出所示)在哪里是我的错吗?
> head(FourY_av)
# A tibble: 6 x 9
# Groups: ReporterName, year_group [1]
Year ReporterName PartnerName PartnerISO3 `TradeValue in 1000 USD` Total_Year pct_by_partner_year year_group total_year_group
<int> <chr> <chr> <chr> <dbl> <dbl> <dbl> <fct> <dbl>
1 2018 Angola Afghanistan AFG 19.4 42096736. 0.0000460 2015-2018 34907582.
2 2017 Angola Afghanistan AFG 2.25 34904881. 0.00000644 2015-2018 34907582.
3 2016 Angola Afghanistan AFG 0.775 28057500. 0.00000276 2015-2018 34907582.
4 2015 Angola Afghanistan AFG 39.6 33924937. 0.000117 2015-2018 34907582.
5 2018 Angola Albania ALB 2.38 42096736. 0.00000565 2015-2018 34907582.
6 2017 Angola Albania ALB 39.7 34904881. 0.000114 2015-2018 34907582.
问题 2
另一个问题是,并非所有国家/地区都显示所有年份的数据。有的起步晚,有的有差距。 为了确保可比性,我仍然需要同年组的方法。数据集没有 NA。只是缺少数据。
例如安哥拉不包括 2008 年。数据集不包括 NA,但不包括安哥拉 2008 年的行和值。其他国家正在显示 2008 年的数据。我仍然需要在 total_year_group 列中获得安哥拉可用年份的平均值(取 2007、2009 和 2010 年的平均值)。这不应该是平均函数的问题,对吧?还是在这种情况下我需要考虑一些特殊的事情?
这里是 total_trade 的一些样本输入数据
dput(head(total_trade, n = 100))
structure(list(Year = c(2015L, 2018L, 2017L, 2016L, 2017L, 2015L,
2018L, 2016L, 2015L, 2017L, 2018L, 2018L, 2017L, 2018L, 2018L,
2015L, 2016L, 2017L, 2016L, 2015L, 2017L, 2018L, 2018L, 2017L,
2016L, 2015L, 2018L, 2014L, 2015L, 2016L, 2017L, 2017L, 2018L,
2016L, 2015L, 2016L, 2018L, 2017L, 2015L, 2010L, 2009L, 2016L,
2013L, 2014L, 2018L, 2017L, 2015L, 2016L, 2017L, 2018L, 2017L,
2018L, 2016L, 2016L, 2018L, 2007L, 2013L, 2009L, 2018L, 2015L,
2016L, 2014L, 2010L, 2017L, 2012L, 2011L, 2018L, 2016L, 2015L,
2016L, 2011L, 2018L, 2017L, 2015L, 2015L, 2016L, 2018L, 2017L,
2015L, 2015L, 2016L, 2018L, 2017L, 2007L, 2014L, 2010L, 2013L,
2011L, 2009L, 2012L, 2017L, 2018L, 2016L, 2015L, 2015L, 2015L,
2017L, 2016L, 2018L, 2015L), ReporterName = c("Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola"
), PartnerName = c("Afghanistan", "Afghanistan", "Afghanistan",
"Afghanistan", "Albania", "Albania", "Albania", "Algeria", "Algeria",
"Algeria", "Algeria", "American Samoa", "Andorra", "Andorra",
"Antigua and Barbuda", "Antigua and Barbuda", "Antigua and Barbuda",
"Antigua and Barbuda", "Argentina", "Argentina", "Argentina",
"Argentina", "Armenia", "Armenia", "Armenia", "Armenia", "Australia",
"Australia", "Australia", "Australia", "Australia", "Austria",
"Austria", "Austria", "Austria", "Azerbaijan", "Azerbaijan",
"Azerbaijan", "Azerbaijan", "Bahamas, The", "Bahamas, The", "Bahamas, The",
"Bahamas, The", "Bahamas, The", "Bahamas, The", "Bahamas, The",
"Bahamas, The", "Bahrain", "Bahrain", "Bahrain", "Bangladesh",
"Bangladesh", "Bangladesh", "Barbados", "Belarus", "Belgium",
"Belgium", "Belgium", "Belgium", "Belgium", "Belgium", "Belgium",
"Belgium", "Belgium", "Belgium", "Belgium", "Belize", "Belize",
"Belize", "Benin", "Benin", "Benin", "Benin", "Benin", "Bhutan",
"Bolivia", "Bolivia", "Bolivia", "Bolivia", "Botswana", "Botswana",
"Botswana", "Botswana", "Brazil", "Brazil", "Brazil", "Brazil",
"Brazil", "Brazil", "Brazil", "Brazil", "Brazil", "Brazil", "Brazil",
"British Virgin Islands", "Brunei", "Bulgaria", "Bulgaria", "Bulgaria",
"Bulgaria"), PartnerISO3 = c("AFG", "AFG", "AFG", "AFG", "ALB",
"ALB", "ALB", "DZA", "DZA", "DZA", "DZA", "ASM", "AND", "AND",
"ATG", "ATG", "ATG", "ATG", "ARG", "ARG", "ARG", "ARG", "ARM",
"ARM", "ARM", "ARM", "AUS", "AUS", "AUS", "AUS", "AUS", "AUT",
"AUT", "AUT", "AUT", "AZE", "AZE", "AZE", "AZE", "BHS", "BHS",
"BHS", "BHS", "BHS", "BHS", "BHS", "BHS", "BHR", "BHR", "BHR",
"BGD", "BGD", "BGD", "BRB", "BLR", "BEL", "BEL", "BEL", "BEL",
"BEL", "BEL", "BEL", "BEL", "BEL", "BEL", "BEL", "BLZ", "BLZ",
"BLZ", "BEN", "BEN", "BEN", "BEN", "BEN", "BTN", "BOL", "BOL",
"BOL", "BOL", "BWA", "BWA", "BWA", "BWA", "BRA", "BRA", "BRA",
"BRA", "BRA", "BRA", "BRA", "BRA", "BRA", "BRA", "BRA", "VGB",
"BRN", "BGR", "BGR", "BGR", "BGR"), `TradeValue in 1000 USD` = c(39.586,
19.353, 2.248, 0.775, 39.723, 2.259, 2.38, 2169.123, 2322.463,
2241.599, 245.226, 12.007, 5.975, 0.326, 422.006, 155.467, 47.018,
54.774, 483.147, 142.23, 98.7, 61.362, 60.105, 30.494, 0.99,
0.731, 40220.092, 45435.804, 16096.404, 8546.882, 1904.301, 627.179,
433.699, 23.118, 5.124, 985.67, 600.371, 143.356, 9.926, 140139.415,
108214.936, 64444.203, 100210.999, 52974.059, 7322.893, 145.791,
26.995, 4.847, 5.187, 1.958, 125.722, 55.22, 2.75, 3.366, 54.31,
107976.895, 123610.469, 66757.2, 67763.201, 50046.64, 40199.706,
52383.95, 45614.873, 28690.458, 52907.343, 39328.574, 452.078,
5.82, 0.32, 970.324, 1700.981, 804.478, 332.216, 69.342, 1.632,
1530.58, 308.752, 62.569, 19.822, 55.241, 37.029, 16.917, 0.198,
874217.786, 1032751.313, 509259.955, 428750.075, 333280.441,
192964.08, 315316.932, 119947.132, 141486.749, 66556.728, 1273.093,
5.064, 22.324, 158.252, 33.583, 8.435, 0.077), Total_Year = c(33924937.48,
42096736.31, 34904881.111, 28057499.527, 34904881.111, 33924937.48,
42096736.31, 28057499.527, 33924937.48, 34904881.111, 42096736.31,
42096736.31, 34904881.111, 42096736.31, 42096736.31, 33924937.48,
28057499.527, 34904881.111, 28057499.527, 33924937.48, 34904881.111,
42096736.31, 42096736.31, 34904881.111, 28057499.527, 33924937.48,
42096736.31, 58672369.19, 33924937.48, 28057499.527, 34904881.111,
34904881.111, 42096736.31, 28057499.527, 33924937.48, 28057499.527,
42096736.31, 34904881.111, 33924937.48, 52612114.76, 40639411.73,
28057499.527, 67712526.544, 58672369.19, 42096736.31, 34904881.111,
33924937.48, 28057499.527, 34904881.111, 42096736.31, 34904881.111,
42096736.31, 28057499.527, 28057499.527, 42096736.31, 44177783.072,
67712526.544, 40639411.73, 42096736.31, 33924937.48, 28057499.527,
58672369.19, 52612114.76, 34904881.111, 70863076.416, 66427390.221,
42096736.31, 28057499.527, 33924937.48, 28057499.527, 66427390.221,
42096736.31, 34904881.111, 33924937.48, 33924937.48, 28057499.527,
42096736.31, 34904881.111, 33924937.48, 33924937.48, 28057499.527,
42096736.31, 34904881.111, 44177783.072, 58672369.19, 52612114.76,
67712526.544, 66427390.221, 40639411.73, 70863076.416, 34904881.111,
42096736.31, 28057499.527, 33924937.48, 33924937.48, 33924937.48,
34904881.111, 28057499.527, 42096736.31, 33924937.48), pct_by_partner_year = c(0.000116687024179005,
4.59726850497024e-05, 6.44035999679013e-06, 2.7621848456389e-06,
0.000113803567683494, 6.65881846158674e-06, 5.65364493454718e-06,
0.0077309918437765, 0.00684588733986371, 0.00642202158738646,
0.000582529719629944, 2.8522401146684e-05, 1.71179497245645e-05,
7.74406827169068e-07, 0.00100246726228929, 0.000458267609458834,
0.000167577299448064, 0.000156923611416451, 0.00172198880208503,
0.000419249114560196, 0.000282768474948037, 0.00014576426910659,
0.000142778289407966, 8.73631395649993e-05, 3.5284683834613e-06,
2.15475710288619e-06, 0.0955420669759755, 0.0774398658640565,
0.0474471147057807, 0.0304620231456308, 0.00545568682484317,
0.00179682319502973, 0.00103024376238159, 8.23950829180388e-05,
1.51039335091503e-05, 0.00351303578942051, 0.00142616994243657,
0.000410704736521284, 2.92587127267419e-05, 0.2663633948935,
0.266280763902189, 0.229686194730164, 0.147994771595005, 0.0902879153020956,
0.0173953936620509, 0.000417680838208199, 7.95727332317548e-05,
1.72752386410474e-05, 1.48603858110989e-05, 4.65119192514428e-06,
0.000360184581635431, 0.000131174064405754, 9.80130106517029e-06,
1.19967925037684e-05, 0.000129012376636663, 0.244414471464133,
0.18255184868885, 0.164267141570654, 0.160970200874938, 0.147521686751831,
0.143276153177212, 0.0892821454514031, 0.0867003221749986, 0.0821961201035531,
0.0746613690455784, 0.0592053577133712, 0.00107390272887404,
2.07431171633786e-05, 9.43258923288073e-07, 0.00345834096536738,
0.0025606620918584, 0.00191102225615742, 0.000951775194258733,
0.000204398313308255, 4.81062050876917e-06, 0.00545515468521031,
0.000733434529761055, 0.000179255731601051, 5.84289949294256e-05,
0.000162833019316739, 0.000131975409869888, 4.01860131755188e-05,
5.6725590719059e-07, 1.97886296054109, 1.76020046106476, 0.967951882039117,
0.633191666126054, 0.50172141324715, 0.474820062066878, 0.444966473299775,
0.34363999584631, 0.336099093188823, 0.237215465105691, 0.00375267603882995,
1.49270724610338e-05, 6.58041006358842e-05, 0.000453380716286491,
0.00011969348860786, 2.00371827827334e-05, 2.26971678416193e-07
)), row.names = c(NA, -100L), groups = structure(list(Year = c(2007L,
2007L, 2009L, 2009L, 2009L, 2010L, 2010L, 2010L, 2011L, 2011L,
2011L, 2012L, 2012L, 2013L, 2013L, 2013L, 2014L, 2014L, 2014L,
2014L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L,
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L,
2015L, 2015L, 2015L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L,
2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L,
2016L, 2016L, 2016L, 2016L, 2017L, 2017L, 2017L, 2017L, 2017L,
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L,
2017L, 2017L, 2017L, 2017L, 2017L, 2018L, 2018L, 2018L, 2018L,
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L,
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L
), ReporterName = c("Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola",
"Angola", "Angola", "Angola", "Angola"), PartnerName = c("Belgium",
"Brazil", "Bahamas, The", "Belgium", "Brazil", "Bahamas, The",
"Belgium", "Brazil", "Belgium", "Benin", "Brazil", "Belgium",
"Brazil", "Bahamas, The", "Belgium", "Brazil", "Australia", "Bahamas, The",
"Belgium", "Brazil", "Afghanistan", "Albania", "Algeria", "Antigua and Barbuda",
"Argentina", "Armenia", "Australia", "Austria", "Azerbaijan",
"Bahamas, The", "Belgium", "Belize", "Benin", "Bhutan", "Bolivia",
"Botswana", "Brazil", "British Virgin Islands", "Brunei", "Bulgaria",
"Afghanistan", "Algeria", "Antigua and Barbuda", "Argentina",
"Armenia", "Australia", "Austria", "Azerbaijan", "Bahamas, The",
"Bahrain", "Bangladesh", "Barbados", "Belgium", "Belize", "Benin",
"Bolivia", "Botswana", "Brazil", "Bulgaria", "Afghanistan", "Albania",
"Algeria", "Andorra", "Antigua and Barbuda", "Argentina", "Armenia",
"Australia", "Austria", "Azerbaijan", "Bahamas, The", "Bahrain",
"Bangladesh", "Belgium", "Benin", "Bolivia", "Botswana", "Brazil",
"Bulgaria", "Afghanistan", "Albania", "Algeria", "American Samoa",
"Andorra", "Antigua and Barbuda", "Argentina", "Armenia", "Australia",
"Austria", "Azerbaijan", "Bahamas, The", "Bahrain", "Bangladesh",
"Belarus", "Belgium", "Belize", "Benin", "Bolivia", "Botswana",
"Brazil", "Bulgaria"), .rows = structure(list(56L, 84L, 41L,
58L, 89L, 40L, 63L, 86L, 66L, 71L, 88L, 65L, 90L, 43L, 57L,
87L, 28L, 44L, 62L, 85L, 1L, 6L, 9L, 16L, 20L, 26L, 29L,
35L, 39L, 47L, 60L, 69L, 74L, 75L, 79L, 80L, 94L, 95L, 96L,
100L, 4L, 8L, 17L, 19L, 25L, 30L, 34L, 36L, 42L, 48L, 53L,
54L, 61L, 68L, 70L, 76L, 81L, 93L, 98L, 3L, 5L, 10L, 13L,
18L, 21L, 24L, 31L, 32L, 38L, 46L, 49L, 51L, 64L, 73L, 78L,
83L, 91L, 97L, 2L, 7L, 11L, 12L, 14L, 15L, 22L, 23L, 27L,
33L, 37L, 45L, 50L, 52L, 55L, 59L, 67L, 72L, 77L, 82L, 92L,
99L), ptype = integer(0), class = c("vctrs_list_of", "vctrs_vctr",
"list"))), row.names = c(NA, 100L), class = c("tbl_df", "tbl",
"data.frame"), .drop = TRUE), class = c("grouped_df", "tbl_df",
"tbl", "data.frame"))
【问题讨论】:
-
你没有包含
year_group_break所以代码会抛出错误 -
谢谢!刚刚更新了帖子
标签: r dplyr group-by average mean