【Question title】: Find columns that have identical values
【Posted】: 2021-07-31 18:15:58
【Question description】:

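Problem statement: I want to exclude from further analysis every column that has the same value in all of its cells. To do that, I first need to find the columns whose values are all identical.

I wrote the following code. It seems to work on the data frame test, but not on the real data frame stpo.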


    library("dplyr")
    library("purrr")
    test_unique <- function(x)
    {
      return(length(unique(x)))
    }
    
    test <-data.frame(c1 = c("a", "a"), c2 = c(NA, NA), c3 =  c(1,2), c4=c(NA, 4))
    # I want to find the columns that have the same value throughout
    res <- map(test[,c(names(test))], test_unique)
    res
    # But when I apply the same thing to the dataset below, it does not work.
    # I am not sure why. Is there a better way to do this, perhaps with data.table? What am I doing wrong?
    res2 <- map(stpo[,c(names(stpo))], test_unique)
    res2

   
I am not sure of the best way to share the dput() output, so I am pasting it below (this is the data frame stpo):

structure(list(stlnr = c(1L, 2L, 3L, 3L, 3L, 3L, 4L), stlkn = c(1L, 
1L, 1L, 2L, 3L, 4L, 5L), stpoz = c(2L, 2L, 2L, 4L, 6L, 8L, 10L
), aennr = c(NA, NA, NA, NA, NA, NA, NA), vgknt = c(0L, 0L, 0L, 
0L, 0L, 0L, 0L), idnrk = c("test_1", "test_1", "test_2", "test_3", 
"test_3", "test_1", "test_2"), pswrk = c(NA, NA, NA, NA, NA, 
NA, NA), meins = c("EA", "EA", "EA", "EA", "EA", "EA", "EA"), 
    menge = c(1, 14, 4, 4, 2, 2, 1), fmeng = c(NA, NA, NA, NA, 
    NA, NA, NA), ausch = c(0, 0, 0, 0, 0, 0, 0), avoau = c(0, 
    0, 0, 0, 0, 0, 0), netau = c(NA, NA, NA, NA, NA, NA, NA), 
    erskz = c(NA, NA, NA, NA, NA, NA, NA), rekri = c(NA, NA, 
    NA, NA, NA, NA, NA), rekrs = c(NA, NA, NA, NA, NA, NA, NA
    ), nlfzt = c(0L, 0L, 0L, 0L, 0L, 0L, 0L), verti = c(NA, NA, 
    NA, NA, NA, NA, NA), alpos = c(NA, NA, NA, NA, NA, NA, NA
    ), ewahr = c(0L, 0L, 0L, 0L, 0L, 0L, 0L), ekgrp = c(NA, NA, 
    NA, NA, NA, NA, NA), lifzt = c(0L, 0L, 0L, 0L, 0L, 0L, 0L
    ), lifnr = c(NA, NA, NA, NA, NA, NA, NA), roms1 = c(0, 0, 
    0, 0, 0, 0, 0), roms2 = c(0, 0, 0, 0, 0, 0, 0), roms3 = c(0, 
    0, 0, 0, 0, 0, 0), romen = c(0, 0, 0, 0, 0, 0, 0), rform = c(NA, 
    NA, NA, NA, NA, NA, NA), upskz = c(NA, NA, NA, NA, NA, NA, 
    NA), valkz = c(NA, NA, NA, NA, NA, NA, NA), matkl = c(NA, 
    NA, NA, NA, NA, NA, NA), webaz = c(0L, 0L, 0L, 0L, 0L, 0L, 
    0L), clobk = c(NA, NA, NA, NA, NA, NA, NA), lgort = c(NA, 
    NA, NA, NA, NA, NA, 14L), kzkup = c(NA, NA, NA, NA, NA, NA, 
    NA), dvnam = c(NA, NA, NA, NA, NA, NA, NA), dspst = c(NA, 
    NA, NA, NA, NA, NA, NA), alpst = c(NA, NA, NA, NA, NA, NA, 
    NA), alprf = c(0L, 0L, 0L, 0L, 0L, 0L, 0L), alpgr = c(NA, 
    NA, NA, NA, NA, NA, NA), kstty = c(NA, NA, NA, NA, NA, NA, 
    NA), kstnr = c(NA, NA, NA, NA, NA, NA, NA), nlfzv = c(0L, 
    0L, 0L, 0L, 0L, 0L, 0L), nlfmv = c(NA, NA, NA, NA, NA, NA, 
    NA), idhis = c(0L, 0L, 0L, 0L, 0L, 0L, 0L), idvar = c(NA, 
    NA, NA, NA, NA, NA, NA), itsob = c(NA, NA, NA, NA, NA, NA, 
    NA), cufactor = c(0L, 0L, 0L, 0L, 0L, 0L, 0L), funcid = c(NA, 
    NA, NA, NA, NA, NA, NA)), row.names = c(NA, -7L), class = c("data.table", 
"data.frame"))

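【Question discussion】:

  • It works exactly as the function you wrote is defined, i.e. you are checking the length of the unique values, and it returns the correct counts: sapply(stpo[, 1:3, with = FALSE], function(x) length(unique(x))) # stlnr stlkn stpoz 4 5 5
  • It would be better to provide the expected output for the dput data.
  • If you want to remove the columns that have identical values, you can use Filter(var, stpo)
  • Hi, thanks for answering my question. The data frame stpo has 49 columns. Even if I take only the first column, "stlnr", and apply this function, I expect length(unique(stlnr)) to return the number of unique values rather than 1. At the moment it does not, which is why I asked this question on Stack Overflow.
  • Based on the dput, the first column has only 4 unique values: length(unique(stpo[[1]])) [1] 4

Tags: r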


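【Solution 1】:

The problem is that we are subsetting a data.table, not a data.frame. With the default with = TRUE, j is evaluated as an ordinary expression, so stpo[,c(names(stpo))] just returns the character vector of column names (shown below); map() then applies test_unique to each individual name string, which is why every count comes out as 1. To select the columns instead, we need with = FALSE (mentioned in ?data.table):

j - When with=TRUE (default), j is evaluated within the frame of the data.table; i.e., it sees column names as if they are variables.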

stpo[,c(names(stpo))]
 [1] "stlnr"    "stlkn"    "stpoz"    "aennr"    "vgknt"    "idnrk"    "pswrk"    "meins"    "menge"    "fmeng"    "ausch"    "avoau"    "netau"    "erskz"   
[15] "rekri"    "rekrs"    "nlfzt"    "verti"    "alpos"    "ewahr"    "ekgrp"    "lifzt"    "lifnr"    "roms1"    "roms2"    "roms3"    "romen"    "rform"   
[29] "upskz"    "valkz"    "matkl"    "webaz"    "clobk"    "lgort"    "kzkup"    "dvnam"    "dspst"    "alpst"    "alprf"    "alpgr"    "kstty"    "kstnr"   
[43] "nlfzv"    "nlfmv"    "idhis"    "idvar"    "itsob"    "cufactor" "funcid"  

Now check the output with with = FALSE:

stpo[,c(names(stpo)), with = FALSE]
 stlnr stlkn stpoz aennr vgknt  idnrk pswrk meins menge fmeng ausch avoau netau erskz rekri rekrs nlfzt verti alpos ewahr ekgrp lifzt lifnr roms1 roms2
1:     1     1     2    NA     0 test_1    NA    EA     1    NA     0     0    NA    NA    NA    NA     0    NA    NA     0    NA     0    NA     0     0
2:     2     1     2    NA     0 test_1    NA    EA    14    NA     0     0    NA    NA    NA    NA     0    NA    NA     0    NA     0    NA     0     0
3:     3     1     2    NA     0 test_2    NA    EA     4    NA     0     0    NA    NA    NA    NA     0    NA    NA     0    NA     0    NA     0     0
4:     3     2     4    NA     0 test_3    NA    EA     4    NA     0     0    NA    NA    NA    NA     0    NA    NA     0    NA     0    NA     0     0
5:     3     3     6    NA     0 test_3    NA    EA     2    NA     0     0    NA    NA    NA    NA     0    NA    NA     0    NA     0    NA     0     0
6:     3     4     8    NA     0 test_1    NA    EA     2    NA     0     0    NA    NA    NA    NA     0    NA    NA     0    NA     0    NA     0     0
7:     4     5    10    NA     0 test_2    NA    EA     1    NA     0     0    NA    NA    NA    NA     0    NA    NA     0    NA     0    NA     0     0
   roms3 romen rform upskz valkz matkl webaz clobk lgort kzkup dvnam dspst alpst alprf alpgr kstty kstnr nlfzv nlfmv idhis idvar itsob cufactor funcid
1:     0     0    NA    NA    NA    NA     0    NA    NA    NA    NA    NA    NA     0    NA    NA    NA     0    NA     0    NA    NA        0     NA
2:     0     0    NA    NA    NA    NA     0    NA    NA    NA    NA    NA    NA     0    NA    NA    NA     0    NA     0    NA    NA        0     NA
3:     0     0    NA    NA    NA    NA     0    NA    NA    NA    NA    NA    NA     0    NA    NA    NA     0    NA     0    NA    NA        0     NA
4:     0     0    NA    NA    NA    NA     0    NA    NA    NA    NA    NA    NA     0    NA    NA    NA     0    NA     0    NA    NA        0     NA
5:     0     0    NA    NA    NA    NA     0    NA    NA    NA    NA    NA    NA     0    NA    NA    NA     0    NA     0    NA    NA        0     NA
6:     0     0    NA    NA    NA    NA     0    NA    NA    NA    NA    NA    NA     0    NA    NA    NA     0    NA     0    NA    NA        0     NA
7:     0     0    NA    NA    NA    NA     0    NA    14    NA    NA    NA    NA     0    NA    NA    NA     0    NA     0    NA 
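A minimal sketch reproducing the difference on a small scale. The table dt and its columns a, b and z are hypothetical stand-ins for stpo, and the data.table package is assumed to be installed. It shows that c(names(dt)) in j returns only the names, so counting unique values gives 1 everywhere, while with = FALSE (or the ..cols prefix) returns the columns themselves:

```r
library(data.table)

# Hypothetical stand-in for stpo: one varying column, one constant, one all-NA
dt <- data.table(a = c(1L, 2L, 2L), b = c("x", "x", "x"), z = c(NA, NA, NA))

# With the default with = TRUE, j is evaluated as an expression:
# c(names(dt)) is just a character vector of names, not the columns
j_result <- dt[, c(names(dt))]
print(j_result)

# Counting unique values over those name strings gives 1 for each of them
print(sapply(j_result, function(x) length(unique(x))))

# with = FALSE (or the .. prefix) makes data.table select the named columns
cols <- names(dt)
by_with <- dt[, cols, with = FALSE]
by_dots <- dt[, ..cols]
print(sapply(by_with, function(x) length(unique(x))))
```

The last line reports 2 for a and 1 for both b and z, which is the per-column count the question was after.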

Also, if you are using all the columns, no subsetting is needed at all; just do

purrr::map(stpo, test_unique)

-output

$stlnr
[1] 4

$stlkn
[1] 5

$stpoz
[1] 5
...
...
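This works because a data.frame (and therefore a data.table) is a list of columns, so any list-mapping function iterates over the columns directly. A base-R sketch of the same count, using vapply() instead of purrr::map() on a small hypothetical data.frame, could look like this:

```r
test_unique <- function(x) length(unique(x))

# Hypothetical data with the same kinds of columns as stpo
df <- data.frame(
  stlnr = c(1L, 2L, 3L, 3L),
  meins = c("EA", "EA", "EA", "EA"),
  aennr = c(NA, NA, NA, NA)
)

# vapply walks over the columns and checks that each result is a single integer
counts <- vapply(df, test_unique, integer(1))
print(counts)
```

The same vapply() call works unchanged on a data.table.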

Regarding the use of

stpo[,1:length(names(stpo))]

this seems to be a bug, or a hacky way of doing it, rather than a standard option.


If we want to eliminate the columns that contain a single value, use var (this assumes all columns are numeric)

Filter(var, stpo)
stlnr stlkn stpoz menge
1:     1     1     2     1
2:     2     1     2    14
3:     3     1     2     4
4:     3     2     4     4
5:     3     3     6     2
6:     3     4     8     2
7:     4     5    10     1
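The var approach keeps a column only when its variance is a positive number. Because var() uses na.rm = FALSE by default, any column containing an NA gets a variance of NA and is dropped by Filter(), which is why lgort is missing from the output above even though one of its values differs. A small base-R sketch with hypothetical columns makes this visible:

```r
# Hypothetical columns illustrating how Filter(var, ...) decides
df <- data.frame(
  a = c(1, 2, 2),    # varies             -> var is 1/3, kept
  b = c(0, 0, 0),    # constant           -> var is 0,   dropped
  z = c(NA, NA, 5),  # varies but has NA  -> var is NA,  dropped
  w = c(NA, NA, NA)  # all NA             -> var is NA,  dropped
)

print(sapply(df, var))
print(names(Filter(var, df)))
```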

Or change the function to return a logical output (this also handles columns of other types)

f1 <- function(x) length(unique(x)) > 1
Filter(f1, stpo)

-output

    stlnr stlkn stpoz  idnrk menge lgort
1:     1     1     2 test_1     1    NA
2:     2     1     2 test_1    14    NA
3:     3     1     2 test_2     4    NA
4:     3     2     4 test_3     4    NA
5:     3     3     6 test_3     2    NA
6:     3     4     8 test_1     2    NA
7:     4     5    10 test_2     1    14
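Since the original goal is to drop the constant columns, the same predicate can also list them when it is negated with base R's Negate(). A sketch on a hypothetical data.frame modelled on stpo; note that length(unique(x)) treats NA as a value of its own, so a column like c(NA, NA, 14) counts as varying (as lgort does above), while an all-NA column counts as constant:

```r
f1 <- function(x) length(unique(x)) > 1

# Hypothetical columns modelled on stpo
df <- data.frame(
  idnrk = c("test_1", "test_1", "test_2"),  # varies           -> kept
  meins = c("EA", "EA", "EA"),              # constant         -> dropped
  menge = c(1, 14, 4),                      # varies           -> kept
  lgort = c(NA, NA, 14L),                   # NA is a value    -> kept
  pswrk = c(NA, NA, NA)                     # all NA, constant -> dropped
)

kept     <- Filter(f1, df)                  # columns with more than one value
constant <- names(Filter(Negate(f1), df))   # names of the constant columns
print(names(kept))
print(constant)
```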

Or use data.table's way of subsetting columns

 stpo[, .SD, .SDcols = f1]
   stlnr stlkn stpoz  idnrk menge lgort
1:     1     1     2 test_1     1    NA
2:     2     1     2 test_1    14    NA
3:     3     1     2 test_2     4    NA
4:     3     2     4 test_3     4    NA
5:     3     3     6 test_3     2    NA
6:     3     4     8 test_1     2    NA
7:     4     5    10 test_2     1    14
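If the constant columns should be removed from the table itself rather than from a returned copy, data.table can drop them by reference with := NULL. A sketch assuming the same f1 predicate; the small table dt is hypothetical:

```r
library(data.table)

f1 <- function(x) length(unique(x)) > 1

# Hypothetical data.table with one constant column (meins)
dt <- data.table(
  idnrk = c("test_1", "test_1", "test_2"),
  meins = "EA",
  menge = c(1, 14, 4)
)

# Names of the columns whose values are all identical
constant_cols <- names(dt)[!vapply(dt, f1, logical(1))]

# Delete those columns by reference (in place); skip if there is nothing to drop
if (length(constant_cols) > 0) dt[, (constant_cols) := NULL]
print(names(dt))
```

Wrapping constant_cols in parentheses tells := to use the variable's contents as column names rather than a column literally called constant_cols.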

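【Discussion】:

  • Arun: Thank you for the detailed explanation, it was very helpful. Satish

【Solution 2】:

Taking a cue from the code Arun wrote, I modified my code like this: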

res2 <- map(stpo[,1:length(names(stpo))], test_unique)

【Discussion】:

  • Anil: May I ask why this was edited? Are you saying this is not a solution? Satish