【Question title】: Calculate percentage between two data frames
【Posted】: 2021-12-11 14:42:41
【Question description】:

The first data frame, df1, is:

df1 = data.frame('gen' = c('a', 'b', 'c', 'd'), 'mm' = c(10, 20, 30, 40), 'nn' = c(50,60,70,80))
  gen mm nn
1   a 10 50
2   b 20 60
3   c 30 70
4   d 40 80

The second data frame, df2, is:

df2 = data.frame('gen' = c('x', 'y'), 'mm' = c(10,20), 'nn' = c(20,30))
  gen mm nn
1   x 10 20
2   y 20 30

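I want to calculate the percentage difference of each df1 value relative to every df2 value in the same column.

Expected output: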

  gen    x.1   y.1   x.2   y.2
  <chr> <dbl> <dbl> <dbl> <dbl>
1 a         0   -50   150  66.67
2 b       100     0   200 100.00
3 c       200    50   250 133.33
4 d       300   100   300 166.67

For example,

General formula:

(df1-df2)/df2*100

Consider a:

(10-10)/10*100 = 0 (x.1)

(10-20)/20*100 = -50 (y.1)

(50-20)/20*100 = 150 (x.2)

(50-30)/30*100 = 66.67 (y.2)

and so on...
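
The arithmetic for row a can be checked directly in R. This is a minimal sketch, assuming df1 and df2 are defined as above; the helper name pct_diff is hypothetical and introduced only for illustration.

```r
df1 <- data.frame(gen = c("a", "b", "c", "d"), mm = c(10, 20, 30, 40), nn = c(50, 60, 70, 80))
df2 <- data.frame(gen = c("x", "y"), mm = c(10, 20), nn = c(20, 30))

# Hypothetical helper: percentage difference of `a` relative to `b`
pct_diff <- function(a, b) (a - b) / b * 100

# Row "a" of df1 against both rows of df2, one column at a time
a_mm <- pct_diff(df1$mm[df1$gen == "a"], df2$mm)  # x.1, y.1
a_nn <- pct_diff(df1$nn[df1$gen == "a"], df2$nn)  # x.2, y.2

round(c(a_mm, a_nn), 2)  # 0, -50, 150, 66.67
```

The single df1 value is recycled against both df2 values, so each call yields one number per row of df2.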

Thanks...

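【Question comments】:

  • Both tables contain 10, 20, 50, so it isn't clear to me which elements of which table your formula refers to. Can you explain more?
  • Added the general formula.

Tags: r dplyr data-manipulation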


【Solution 1】:

Here is a data.table approach

library(data.table)
# Convert df1 and df2 to data.table format (by reference)
setDT(df1)
setDT(df2, keep.rownames = c("id"))
# Melt df1 and df2 to long format
df.melt <- melt(df1, id.vars = "gen", variable.factor = FALSE)
df2.melt <- melt(df2, id.vars = c("id", "gen"), variable.factor = FALSE)
# Perform left join
ans <- df2.melt[df.melt, on = .(variable), allow.cartesian = TRUE]
# Create new colnames
ans[, id2 := rowid(i.gen, gen)]
ans[, name := paste(gen, id2, sep = ".")]
# Perform calculation
ans[, new.value := 100 * (i.value - value) / value]
# Cast to wide format
dcast(ans, i.gen ~ name, value.var = "new.value")
#    i.gen x.1 x.2 y.1       y.2
# 1:     a   0 150 -50  66.66667
# 2:     b 100 200   0 100.00000
# 3:     c 200 250  50 133.33333
# 4:     d 300 300 100 166.66667

【Comments】:

  • Thanks for your answer. I get an error at ans <- df2.melt[df.melt, on = .(variable), allow.cartesian = TRUE], namely: Error in colnamesInt(i, unname(on), check_dups = FALSE) : argument specifying columns specify non existing column(s): cols[1]='variable'
  • Strange... no error here. Did you run all of the code from the answer above on the sample data?
  • Yes, I ran all of the code.
  • The code is fine.
【Solution 2】:

You could use

library(dplyr)
library(tidyr)

df1 %>% 
  pivot_longer(-gen) %>% 
  left_join(df2 %>% pivot_longer(-gen), by = "name") %>% 
  mutate(value.y = (value.x - value.y) / value.y * 100, .keep = "unused") %>% 
  pivot_wider(names_from = c("gen.y", "name"), values_from = "value.y") %>% 
  rename(gen = gen.x, x.1 = x_mm, y.1 = y_mm, x.2 = x_nn, y.2 = y_nn)

This returns

# A tibble: 4 x 5
  gen     x.1   y.1   x.2   y.2
  <chr> <dbl> <dbl> <dbl> <dbl>
1 a         0   -50   150  66.7
2 b       100     0   200 100  
3 c       200    50   250 133. 
4 d       300   100   300 167. 
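
For comparison, the same table can probably also be built in base R without reshaping. The sketch below is not part of the answer above; it assumes df1 and df2 as defined in the question and that the shared value columns are mm and nn. The names cols, parts and res are illustrative only.

```r
df1 <- data.frame(gen = c("a", "b", "c", "d"), mm = c(10, 20, 30, 40), nn = c(50, 60, 70, 80))
df2 <- data.frame(gen = c("x", "y"), mm = c(10, 20), nn = c(20, 30))

# Assumption: these are the value columns shared by both data frames
cols <- c("mm", "nn")

# For each column, outer() builds a matrix of percentage differences:
# rows follow df1$gen, columns follow df2$gen
parts <- lapply(seq_along(cols), function(k) {
  m <- outer(df1[[cols[k]]], df2[[cols[k]]], function(a, b) (a - b) / b * 100)
  colnames(m) <- paste(df2$gen, k, sep = ".")  # x.1, y.1 for mm; x.2, y.2 for nn
  m
})

# Bind the per-column matrices side by side and attach the df1 labels
res <- data.frame(gen = df1$gen, do.call(cbind, parts))
res
```

Because outer() pairs every df1 value with every df2 value, no join is needed; the trade-off is that the column names have to be constructed by hand.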

【Comments】:
