【Question title】: Use tidyr::pivot_longer for multiple measurements with uncertainties
【Posted】: 2020-03-16 02:10:08
【Question description】:

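A common type of dataset I run into contains several measurements, with each row also holding the associated uncertainties. Here is an example: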


structure(list(meas1 = c(150.3197, 19.95853, 161.40022, 103.23733, 140.28786, 193.42983, 75.237556, 207.84688, 116.4379, 80.251797 ), unc1 = c(0.038140954, 0.09151666, 0.035390881, 0.043274285, 0.03396304, 0.033362432, 0.05290015, 0.035449262, 0.038330437, 0.049171039), meas2 = c(1270.5522, 562.92518, 940.65152, 696.6982, 380.22449, 1979.0521, 1022.01, 1269.7508, 1686.6116, 1256.0033 ), unc2 = c(0.06063558, 0.061388181, 0.060714985, 0.061178737, 0.061318833, 0.060302475, 0.060876815, 0.060659146, 0.060412551, 0.060635459), meas3 = c(601.11331, 1675.2958, 608.84736, 998.76837, 266.2926, 2933.9751, 1682.3191, 775.43699, 428.29473, 1393.6564 ), unc3 = c(0.103445147, 0.102309634, 0.103147224, 0.101772166, 0.104186185, 0.101292496, 0.101556363, 0.102983978, 0.10394405, 0.101598249), ID = 1:10), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))

I would like to put it into a tidy layout, like this:

ID meas_type    reading     uncert
1  1     meas1  150.31970 0.03814095
2  1     meas2 1270.55220 0.06063558
3  1     meas3  601.11331 0.10344515
4  2     meas1   19.95853 0.09151666
5  2     meas2  562.92518 0.06138818
6  2     meas3 1675.29580 0.10230963 ...

I have a workaround, but I wonder whether there is a pivot_longer() approach that does this more elegantly.

Here is my inelegant workaround:

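library(tidyr)  # also re-exports %>% and starts_with()

# Pivot the measurement columns, then keep ID, meas_type, reading
df_vals <- df_raw %>% 
  pivot_longer(cols = c("meas1", "meas2", "meas3"),
               names_to = "meas_type",
               values_to = "reading")
df_vals <- df_vals[, 4:6]

# Pivot the uncertainty columns, then keep ID, name, uncert
df_unc <- df_raw %>% 
  pivot_longer(cols = starts_with("unc"),
               values_to = "uncert")
df_unc <- df_unc[, 4:6]

# Attach the uncertainties; relies on both tables having the same row order
df <-  cbind(df_vals, "uncert" = df_unc$uncert)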

【Question comments】:

    Tags: r tidyr


    【Solution 1】:

    We can use the names_pattern argument of pivot_longer.

    tidyr::pivot_longer(df, cols = -ID, 
                        names_to = c(".value", "meas_type"),
                        names_pattern = "(.*)(\\d+)")
    
    # A tibble: 30 x 4
    #     ID meas_type   meas    unc
    #   <int> <chr>      <dbl>  <dbl>
    # 1     1 1          150.  0.0381
    # 2     1 2         1271.  0.0606
    # 3     1 3          601.  0.103 
    # 4     2 1           20.0 0.0915
    # 5     2 2          563.  0.0614
    # 6     2 3         1675.  0.102 
    # 7     3 1          161.  0.0354
    # 8     3 2          941.  0.0607
    # 9     3 3          609.  0.103 
    #10     4 1          103.  0.0433
    # … with 20 more rows
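
    The pattern above is greedy, so a column such as "meas10" would be split into "meas1" and "0". A minimal sketch of a variant that avoids this and also produces the column names requested in the question is shown below. The small `df_raw` stand-in (two rows, rounded values) and the use of dplyr's `rename()` and `mutate()` are assumptions added for illustration, not part of the original answer.

```r
library(tidyr)
library(dplyr)

# Illustrative stand-in for the question's data (first two rows, rounded)
df_raw <- tibble(
  meas1 = c(150.3, 20.0),   unc1 = c(0.038, 0.092),
  meas2 = c(1270.6, 562.9), unc2 = c(0.061, 0.061),
  meas3 = c(601.1, 1675.3), unc3 = c(0.103, 0.102),
  ID = 1:2
)

df_tidy <- df_raw %>%
  pivot_longer(
    cols = -ID,
    # ".value" keeps the first capture group ("meas"/"unc") as column names;
    # the second group (the digits) goes into meas_type
    names_to = c(".value", "meas_type"),
    # \\D+ matches only non-digits, so multi-digit suffixes stay intact
    names_pattern = "(\\D+)(\\d+)"
  ) %>%
  rename(reading = meas, uncert = unc) %>%
  mutate(meas_type = paste0("meas", meas_type))

df_tidy
```

    Each ID then contributes one row per measurement, with the columns ID, meas_type, reading and uncert.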
    

    【Comments】:

      【Solution 2】:

      If you are open to a base R solution, you need to use a data frame rather than a tibble, but this does what you need:

      d <- as.data.frame(d)
      
      reshape(data=d, varying=1:6,
              timevar="meas_type",
              direction="long",
              sep="")
      
      
           ID meas_type       meas        unc
      1.1   1         1  150.31970 0.03814095
      2.1   2         1   19.95853 0.09151666
      3.1   3         1  161.40022 0.03539088
      4.1   4         1  103.23733 0.04327429
      5.1   5         1  140.28786 0.03396304
      6.1   6         1  193.42983 0.03336243
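
      As a hedged variant of the call above, the sketch below names the varying columns explicitly and passes `idvar = "ID"`, so that reshape() reuses the existing ID column as the row identifier, then sorts the result by ID. The two-row data frame `d` is a made-up stand-in for the question's data.

```r
# Illustrative stand-in for the question's data (two rows, rounded)
d <- data.frame(
  meas1 = c(150.3, 20.0),   unc1 = c(0.038, 0.092),
  meas2 = c(1270.6, 562.9), unc2 = c(0.061, 0.061),
  ID = 1:2
)

long <- reshape(
  data = d,
  varying = c("meas1", "unc1", "meas2", "unc2"),
  idvar = "ID",            # use the existing ID column as the identifier
  timevar = "meas_type",   # receives the numeric suffix (1, 2, ...)
  direction = "long",
  sep = ""                 # split names between the letters and the digits
)

# Order rows by ID, then by measurement number
long <- long[order(long$ID, long$meas_type), ]
long
```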
      

      【Comments】:

        【Solution 3】:

        We can use melt from data.table.

        library(data.table)
        melt(setDT(df1), measure = patterns("^unc", "meas"), 
            value.name = c("unc", "meas"), variable.name = "meas_type")
        # ID meas_type        unc       meas
        # 1:  1         1 0.03814095  150.31970
        # 2:  2         1 0.09151666   19.95853
        # 3:  3         1 0.03539088  161.40022
        # 4:  4         1 0.04327429  103.23733
        # 5:  5         1 0.03396304  140.28786
        # 6:  6         1 0.03336243  193.42983
        # 7:  7         1 0.05290015   75.23756
        # 8:  8         1 0.03544926  207.84688
        # 9:  9         1 0.03833044  116.43790
        #10: 10         1 0.04917104   80.25180
        #11:  1         2 0.06063558 1270.55220
        #...
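
        The output above is grouped by measurement number rather than by ID. A minimal sketch that names the value columns as in the question and sorts by ID follows. The two-row `df1` is an illustrative stand-in, and `as.data.table()` is used here instead of `setDT()` so the input is left unmodified.

```r
library(data.table)

# Illustrative stand-in for the question's data (two rows, rounded)
df1 <- data.frame(
  meas1 = c(150.3, 20.0),   unc1 = c(0.038, 0.092),
  meas2 = c(1270.6, 562.9), unc2 = c(0.061, 0.061),
  ID = 1:2
)

long <- melt(
  as.data.table(df1),
  id.vars = "ID",
  # one pattern per output value column, in the same order as value.name
  measure.vars = patterns("^meas", "^unc"),
  value.name = c("reading", "uncert"),
  variable.name = "meas_type"
)

# Sort in place so each ID's measurements sit together
setorder(long, ID, meas_type)
long
```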
        

        【Comments】:

        • The melt option is easier to understand than the pivot_longer regex. Thanks!