【Question title】: R - ggplot multiple regression lines for different columns in same chart
【Posted】: 2025-12-01 15:50:01
【Question description】:

Using the following data,

text = "
R_1700,R_350,R_2950,S_1700,S_350,S_2950
,,-98.2,,,14.15
-80,-82.3,-99,,-0.7,12.4
-77.55,-80.6,-97,,,14.5
-75.55,-80.85,-96.35,,,14.4
-80.8,-81.6,-94.3,,9.95,6
-80.8,-81.8,,,4.9,
-80.8,-81.85,,,8.2,
-73.8,-77.6,-98,,6.35,
-72.8,-76.7,-96.8,3.7,4.6,
-72.65,-81.7,-94.05,2.25,,
-72.95,-80.4,-94.6,1.7,,
-72.7,-81.7,-94.35,1.6,,
-76.05,-84.25,-95.65,3.65,,
-75.5,-84.65,-95.2,1.95,,
-74.65,-83.8,-94.6,2.6,,
-74.2,-83.95,-100.65,3.25,,
-66.8,-75.65,-97.25,,6.45,
-73.7,-77.7,-97.05,,6.8,
-97.8,-100.8,-116.9,,-5.3,
,-99.7,,,-1,
,-100.2,,,-1.3,
-93.3,-94.75,-103.7,,-4.25,
-94.6,-96.55,-105,,-6.7,
-96.4,-98.45,-110.1,,-6.9,
-96.4,-101.1,-110.7,,-7.65,
-94.95,-102,,,-7.2,
-94,-102.15,,,-9.35,
-91.8,-97,-110.3,,-5.3,
"
df1 <- read.table(text = text, sep = ",", header = TRUE)

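I need to plot regression lines for the following column pairs, with the R_... values on the X axis and the S_... values on the Y axis:

  1. S_1700 vs. R_1700
  2. S_350 vs. R_350
  3. S_2950 vs. R_2950

For a single pair of variables, I can do the following: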

ggplot(df1, aes(x=R_1700, y=S_1700)) +
  geom_point() + 
  geom_smooth(method=lm, se=FALSE, fullrange=TRUE)

I need help putting all three lines in a single plot, as in the example below. The three groups would be 1700, 350, and 2950.

【Question discussion】:

    Tags: r ggplot2 linear-regression


    【Solution 1】:

    A tidyverse solution

    library(tidyverse)
    
    df1 %>% 
      pivot_longer(everything()) %>% # wide to long format: one row per (column name, value)
      separate(name, c("key","number"), sep = "_") %>% # split names like R_1700 into key ("R") and number ("1700")
      group_by(number, key) %>% # group the values by number and key
      mutate(row = row_number()) %>% # row index within each group, so pivot_wider can pair R and S values
      pivot_wider(names_from = key, values_from = value) %>% # make separate columns for R and S
      ggplot(aes(x=R, y=S, color = number, shape = number)) +
      geom_point() + 
      geom_smooth(method=lm, se=FALSE, fullrange=TRUE)
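
As a cross-check on the lines that geom_smooth(method = lm) draws, the same per-group fits can be computed directly. This is a minimal base-R sketch, not part of the original answer. The data frame `pairs_long` is a hand-picked subset of complete (R, S) pairs from the question's data, laid out like the long table built above, and the names `pairs_long`, `fits`, and `coefs` are illustrative.

```r
# Hand-picked complete (R, S) pairs from the question's data, in long format
pairs_long <- data.frame(
  number = rep(c("1700", "350"), each = 4),
  R = c(-72.8, -72.65, -72.95, -72.7, -82.3, -81.6, -81.8, -81.85),
  S = c(3.7, 2.25, 1.7, 1.6, -0.7, 9.95, 4.9, 8.2)
)

# Fit S ~ R separately for each group and keep intercept and slope
fits <- lapply(split(pairs_long, pairs_long$number),
               function(d) coef(lm(S ~ R, data = d)))

# One row per group, columns "(Intercept)" and "R" (the slope)
coefs <- do.call(rbind, fits)
print(coefs)
```

Each row of `coefs` should correspond to one colored line in the plot, provided the same rows are used in both.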
    

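    【Discussion】:

    • Great. Could you also add comments on the steps being performed? Also, in separate you used sep = "_". What if, instead of R_1700, the column name follows a pattern like ABC_CA_BDEF_for_KEY_1700, and instead of S_1700 it is something like ABC_CA_XYZX_for_KEY_1700? Note that the two differ only in the third underscore-delimited segment (BDEF vs. XYZX).
    • See the updated answer. For details on separate, you can visit this
    • Also visit this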
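
For the longer column names raised in the comment above, one possible approach (a sketch, not something the answer's author posted) is to let pivot_longer split the names with a regular expression through its names_pattern argument instead of calling separate. The data frame `wide` and its values are hypothetical, and the regex assumes the key is always the third underscore-delimited segment and the number is the trailing run of digits.

```r
library(tidyr)

# Hypothetical wide data whose column names follow the pattern from the comment
wide <- data.frame(
  ABC_CA_BDEF_for_KEY_1700 = c(-80, -77.55),
  ABC_CA_XYZX_for_KEY_1700 = c(3.7, 2.25),
  ABC_CA_BDEF_for_KEY_350  = c(-82.3, -80.6),
  ABC_CA_XYZX_for_KEY_350  = c(-0.7, 4.9)
)

# Capture group 1: third underscore-delimited segment (the key)
# Capture group 2: trailing digits (the number)
long <- pivot_longer(
  wide,
  everything(),
  names_to = c("key", "number"),
  names_pattern = "^[^_]+_[^_]+_([^_]+)_.*_([0-9]+)$"
)
print(long)
```

From here the remaining steps (group_by, row_number, pivot_wider) would apply unchanged, with BDEF and XYZX taking the place of R and S.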
    【Solution 2】:

    If you can reorganize your data into the following format:

    # with data.table package
    library(data.table)
    setDT(df1)
    df2 <- melt(df1, measure.vars = patterns('R_', 'S_'))
    df2[, variable := factor(variable, levels = 1:3,
        labels = tstrsplit(grep('R_', names(df1), value = TRUE), '_')[[2]])]
    # > df2
    #     variable  value1 value2
    # 1:     1700      NA     NA
    # 2:     1700  -80.00     NA
    # 3:     1700  -77.55     NA
    # 4:     1700  -75.55     NA
    # 5:     1700  -80.80     NA
    # 6:     1700  -80.80     NA
    # 7:     1700  -80.80     NA
    # 8:     1700  -73.80     NA
    # 9:     1700  -72.80   3.70
    
    
    # without data.table
    tmp <- split.default(df1, f = sapply(strsplit(names(df1), '_'), `[`, 2))
    tmp <- lapply(tmp, function(dtf){
        names(dtf) <- c('value1', 'value2')
        return(dtf)
    })
    df2 <- do.call(rbind, tmp)
    df2$variable <- rep(names(tmp), each = nrow(df1))
    

    then you can easily visualize the data as needed:

    ggplot(df2, aes(x = value1, y = value2, color = variable)) +
        geom_point() + 
        geom_smooth(method=lm, se=FALSE, fullrange=TRUE) +
        labs(x = 'R', y = 'S')
    

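    【Discussion】:

    • Is it possible to get the answer without data.table?
    • @user3206440, sure. See the edited answer. It does not matter how you reformat the data, as long as the reorganized data is in a long format suitable for ggplot. For the current data, which has only a few columns, you could do it manually: subset the columns -> rename -> rbind, and finally add a column for the grouping variable. The automated way is more general, though, and works for datasets with more columns.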
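
The manual route described in the last comment (subset columns, rename, rbind, add a grouping column) might look like the following base-R sketch. The data frame `df_small` holds three rows copied from the question's data so the block runs on its own, and the names `df_small`, `groups`, `pieces`, and `df_long` are illustrative.

```r
# Three rows copied from the question's data (empty cells become NA)
df_small <- data.frame(
  R_1700 = c(-80, -72.8, -72.65),
  R_350  = c(-82.3, -76.7, -81.7),
  R_2950 = c(-99, -96.8, -94.05),
  S_1700 = c(NA, 3.7, 2.25),
  S_350  = c(-0.7, 4.6, NA),
  S_2950 = c(12.4, NA, NA)
)

groups <- c("1700", "350", "2950")

# For each group: take its R and S columns, rename them, and tag the group
pieces <- lapply(groups, function(g) {
  data.frame(
    R = df_small[[paste0("R_", g)]],
    S = df_small[[paste0("S_", g)]],
    group = g
  )
})

# Stack the pieces into one long data frame suitable for ggplot
df_long <- do.call(rbind, pieces)
print(df_long)
```

The result can be passed to ggplot with aes(x = R, y = S, color = group) in the same way as df2 above.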