[Question title]: setnames for duplicate colnames in data.table
[Posted]: 2015-05-24 06:12:53
[Question]:

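For some reason (it doesn't matter why), the data frame imported from Excel has duplicated column names, as shown below (DT is a data.table converted from the data frame DF). The columns themselves are distinct, though, so they need to be renamed with setnames.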

    library(data.table)

    DF <- structure(list(X1 = c("", "15 May 2014", "16 May 2014", "18 May 2014", 
    "19 May 2014"), X2 = c(NaN, 746.18, 746.18, 744.34, 739.95), 
    X3 = c(NaN, 549.9, 549.9, 546.5, 549.65), X1 = c(NaN, 406.57, 
    406.57, 406.66, 404.73), X1 = c(NaN, 1788.86, 1788.86, 1767.69, 
    1772.34), X1 = c(NaN, 2286, 2286, 2302.37, 2313.14), X2 = c(NaN, 
    3639.25, 3639.25, 3622.08, 3569.53), X3 = c(NaN, 1160.13, 
    1160.13, 1144.77, 1129.72), X1 = c(NaN, 182.83, 182.83, 182.83, 
    182.83), X2 = c(NaN, 787.13, 787.13, 775.39, 764.82), X1 = c(NaN, 
    853.2, 853.2, 849.67, 844.49)), .Names = c("X1", "X2", "X3", 
    "X1", "X1", "X1", "X2", "X3", "X1", "X2", "X1"), class = c("data.table", 
    "data.frame"), row.names = c(NA, -5L))

    DT <- as.data.table(DF)

    > DT
                X1     X2     X3     X1      X1      X1      X2      X3     X1     X2     X1
    1:                NaN    NaN    NaN     NaN     NaN     NaN     NaN    NaN    NaN    NaN
    2: 15 May 2014 746.18 549.90 406.57 1788.86 2286.00 3639.25 1160.13 182.83 787.13 853.20
    3: 16 May 2014 746.18 549.90 406.57 1788.86 2286.00 3639.25 1160.13 182.83 787.13 853.20
    4: 18 May 2014 744.34 546.50 406.66 1767.69 2302.37 3622.08 1144.77 182.83 775.39 849.67
    5: 19 May 2014 739.95 549.65 404.73 1772.34 2313.14 3569.53 1129.72 182.83 764.82 844.49

So I decided to change these column names with setnames, but I get the following error (which is to be expected):

    new_names <- c("Date","BOD","DO","FI","HT","HY","IN","MA","SE","OR","RA")
    old_names <- names(DT)
    setnames(DT, old_names, new_names)

    Error in setnames(DT, old_names, new_names) : 
      Some duplicates exist in 'old': X1,X1,X1,X2,X3,X1,X2,X1

So I fell back on the data.frame way of changing column names:

    names(DT) <- new_names # this doesn't give any error but still gives warnings

    Warning message:
    In `names<-.data.table`(`*tmp*`, value = c("Date", "BOD", "DO",  :
      The names(x)<-value syntax copies the whole table. This is due to <- in R itself. Please change to setnames(x,old,new) which does not copy and is faster. See help('setnames'). You can safely ignore this warning if it is inconvenient to change right now. Setting options(warn=2) turns this warning into an error, so you can then use traceback() to find and change your names<- calls.
    > DT
              Date    BOD     DO     FI      HT      HY      IN      MA     SE     OR     RA
    1:                NaN    NaN    NaN     NaN     NaN     NaN     NaN    NaN    NaN    NaN
    2: 15 May 2014 746.18 549.90 406.57 1788.86 2286.00 3639.25 1160.13 182.83 787.13 853.20
    3: 16 May 2014 746.18 549.90 406.57 1788.86 2286.00 3639.25 1160.13 182.83 787.13 853.20
    4: 18 May 2014 744.34 546.50 406.66 1767.69 2302.37 3622.08 1144.77 182.83 775.39 849.67
    5: 19 May 2014 739.95 549.65 404.73 1772.34 2313.14 3569.53 1129.72 182.83 764.82 844.49

So I would like to know whether there is a data.table-only way to change column names when they are not unique (again, this comes up because the data is imported from Excel).

[Comments]:

    Tags: r data.table


    [Answer 1]:

    You can omit old_names:

    setnames(DT, new_names)
    

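    This works as long as new_names supplies a name for every column, in the right order. From ?setnames:

    setnames(x,old,new):
    old: When new is provided, character names or numeric positions of column names to change. When new is not provided, the new column names, which must be the same length as the number of columns. See examples.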
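
The two call forms described in that help text can be sketched as follows. This is only an illustration: `toy` and `toy2` are hypothetical tables made up for the example, not the asker's data, and the sketch assumes that `data.table()` keeps duplicated names as given (its default `check.names = FALSE`).

```r
library(data.table)

# Hypothetical table with a duplicated column name ("X1" appears twice)
toy <- data.table(X1 = c("a", "b"), X2 = c(1, 2), X1 = c(3, 4))

# Form 1: 'new' omitted, so the second argument holds ALL new names,
# matched to columns by position. Duplicates in the old names do not matter.
setnames(toy, c("Date", "BOD", "FI"))
print(names(toy))

# Form 2: 'old' given as numeric positions, so only those columns are renamed.
# Positions are unambiguous even when the character names are duplicated.
toy2 <- data.table(X1 = c("a", "b"), X2 = c(1, 2), X1 = c(3, 4))
setnames(toy2, c(1L, 3L), c("Date", "FI"))
print(names(toy2))
```

Both forms rename by reference, so no copy of the table is made and the `names<-` warning from the question does not appear.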

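    [Comments]:

      [Answer 2]:

      I had the same problem, but I didn't care which new names the columns got; I only needed them to be unique. Combining make.unique or make.names (as suggested here) with setnames (as pointed out by @BrodieG above) solved my problem: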

      # considering your DT object:
      setnames(DT, make.unique(names(DT)))
      # The new column names are:
      names(DT)
      ## [1] "X1"   "X2"   "X3"   "X1.1" "X1.2" "X1.3" "X2.1" "X3.1" "X1.4" "X2.2" "X1.5"
      # Same can be achieved with:
      setnames(DT, make.names(names(DT), unique = TRUE))
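
The two helpers agree on names like those above, but they are not interchangeable in general. A small base R sketch of the difference, using a made-up vector `nms` that is not taken from the question:

```r
# Hypothetical names: one duplicate pair is syntactically valid ("X1"),
# the other contains a space ("my col")
nms <- c("X1", "X2", "X1", "my col", "my col")

# make.unique only appends numeric suffixes to repeated names
u1 <- make.unique(nms)
print(u1)
# [1] "X1"       "X2"       "X1.1"     "my col"   "my col.1"

# make.names first rewrites each name into a syntactically valid one
# (the space becomes a dot), then de-duplicates
u2 <- make.names(nms, unique = TRUE)
print(u2)
# [1] "X1"       "X2"       "X1.1"     "my.col"   "my.col.1"
```

If the imported headers should stay exactly as they were apart from the suffixes, make.unique is probably the safer choice; make.names is handy when the columns will later be referenced as bare symbols.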
      

      [Comments]:
