【发布时间】:2021-07-31 18:15:58
【问题描述】:
问题陈述: 我实际上想从进一步的分析中消除在所有单元格中具有相同值的列。为此,我想找到具有相同值的列。
我编写了以下代码,它似乎适用于数据帧测试,但不适用于真正的数据帧 stpo
library("dplyr")
library("purrr")
test_unique <- function(x)
{
return(length(unique(x)))
}
test <-data.frame(c1 = c("a", "a"), c2 = c(NA, NA), c3 = c(1,2), c4=c(NA, 4))
# What I want to find out the columns that have the same value throughout
res <- map(test[,c(names(test))], test_unique)
res
# But when I try to apply the same thing to the dataset below, it does not work.
# Not sure what the reason is. Is there a better way to do this? Perhaps using data.table? What am I doing wrong?
res2 <- map(stpo[,c(names(stpo))], test_unique)
res2
I am not exactly sure how to put the result of dput. I am putting this below (this is the dataframe stpo)
structure(list(stlnr = c(1L, 2L, 3L, 3L, 3L, 3L, 4L), stlkn = c(1L,
1L, 1L, 2L, 3L, 4L, 5L), stpoz = c(2L, 2L, 2L, 4L, 6L, 8L, 10L
), aennr = c(NA, NA, NA, NA, NA, NA, NA), vgknt = c(0L, 0L, 0L,
0L, 0L, 0L, 0L), idnrk = c("test_1", "test_1", "test_2", "test_3",
"test_3", "test_1", "test_2"), pswrk = c(NA, NA, NA, NA, NA,
NA, NA), meins = c("EA", "EA", "EA", "EA", "EA", "EA", "EA"),
menge = c(1, 14, 4, 4, 2, 2, 1), fmeng = c(NA, NA, NA, NA,
NA, NA, NA), ausch = c(0, 0, 0, 0, 0, 0, 0), avoau = c(0,
0, 0, 0, 0, 0, 0), netau = c(NA, NA, NA, NA, NA, NA, NA),
erskz = c(NA, NA, NA, NA, NA, NA, NA), rekri = c(NA, NA,
NA, NA, NA, NA, NA), rekrs = c(NA, NA, NA, NA, NA, NA, NA
), nlfzt = c(0L, 0L, 0L, 0L, 0L, 0L, 0L), verti = c(NA, NA,
NA, NA, NA, NA, NA), alpos = c(NA, NA, NA, NA, NA, NA, NA
), ewahr = c(0L, 0L, 0L, 0L, 0L, 0L, 0L), ekgrp = c(NA, NA,
NA, NA, NA, NA, NA), lifzt = c(0L, 0L, 0L, 0L, 0L, 0L, 0L
), lifnr = c(NA, NA, NA, NA, NA, NA, NA), roms1 = c(0, 0,
0, 0, 0, 0, 0), roms2 = c(0, 0, 0, 0, 0, 0, 0), roms3 = c(0,
0, 0, 0, 0, 0, 0), romen = c(0, 0, 0, 0, 0, 0, 0), rform = c(NA,
NA, NA, NA, NA, NA, NA), upskz = c(NA, NA, NA, NA, NA, NA,
NA), valkz = c(NA, NA, NA, NA, NA, NA, NA), matkl = c(NA,
NA, NA, NA, NA, NA, NA), webaz = c(0L, 0L, 0L, 0L, 0L, 0L,
0L), clobk = c(NA, NA, NA, NA, NA, NA, NA), lgort = c(NA,
NA, NA, NA, NA, NA, 14L), kzkup = c(NA, NA, NA, NA, NA, NA,
NA), dvnam = c(NA, NA, NA, NA, NA, NA, NA), dspst = c(NA,
NA, NA, NA, NA, NA, NA), alpst = c(NA, NA, NA, NA, NA, NA,
NA), alprf = c(0L, 0L, 0L, 0L, 0L, 0L, 0L), alpgr = c(NA,
NA, NA, NA, NA, NA, NA), kstty = c(NA, NA, NA, NA, NA, NA,
NA), kstnr = c(NA, NA, NA, NA, NA, NA, NA), nlfzv = c(0L,
0L, 0L, 0L, 0L, 0L, 0L), nlfmv = c(NA, NA, NA, NA, NA, NA,
NA), idhis = c(0L, 0L, 0L, 0L, 0L, 0L, 0L), idvar = c(NA,
NA, NA, NA, NA, NA, NA), itsob = c(NA, NA, NA, NA, NA, NA,
NA), cufactor = c(0L, 0L, 0L, 0L, 0L, 0L, 0L), funcid = c(NA,
NA, NA, NA, NA, NA, NA)), row.names = c(NA, -7L), class = c("data.table",
"data.frame"), .internal.selfref = <pointer: 0x0000022534c51ef0>)
【问题讨论】:
-
它的工作原理与创建的函数完全一样,即您正在检查它返回正确的唯一值的长度为
sapply(stpo[, 1:3, with = FALSE], function(x) length(unique(x)))# stlnr stlkn stpoz 4 5 5 -
最好为 dput 数据提供预期的输出
-
如果要删除具有相同值的列,可以使用
Filter(var, stpo) -
您好,感谢您回答我的问题。数据框 stpo 有 49 列。即使我采用第一列“stlnr”并应用此函数,我希望 length(unique(column stlnr)) 的值不应该返回 1,而是唯一值的数量。在这个时候,它不是。这就是stackoverflow上这个问题的原因。
-
基于 dput,第一列只有 4 个
length(unique(stpo[[1]])) [1] 4
标签: r