【Question title】: Select columns from a data frame whose names start with a number
【Posted】: 2020-12-28 11:32:01
【Question】:

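I have a data frame in which some column names start with a number and others start with letters. I want to subset the columns whose names start with a number followed by a dot.

The code below works for this example, but in my real data frame the AA ID column is also selected, and I don't know why.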

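library(dplyr)

# data.frame() uses check.names = TRUE by default, so `AA ID` becomes AA.ID and `1A` becomes X1A
df <- data.frame(`AA ID`=c(1,2,3,4,5,6,7,8,9,10),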
                 "BB"=c("AMK","KAMl","HAJ","NHS","KUL","GAF","BGA","NHU","VGY","NHU"),
                 "CC"=c("TAMAN","GHUSI","KELVIN","DEREK","LOKU","MNDHUL","JASMIN","BINNY","BURTAM","DAVID"),
                 "DD"=c(62,41,37,41,32,74,52,75,59,36),
                 "EE"=c("CA","NY","GA","DE","MN","LA","GA","VA","TM","BA"),
                 "FF"=c("ENGLISH","FRENCH","ENGLISH","FRENCH","ENGLISH","ENGLISH","SPANISH","ENGLISH","SPANISH","RUSSIAN"),
                 "GG"=c(33,44,51,51,37,58,24,67,41,75),
                 `1A`=c("","D","","NA","","D","","","D",""),
                 `2B`=c("","A","","","A","A","A","A","",""),
                 `3C`=c("","","","","","","","","",""),
                 `4D`=c("","G","G","G","G","G","G","G","",""),
                 "Concatenate" = c("","DAG","G","NAG","AG","DAG","AG","AG","D",""))

df <- df %>% rename(`1. A`="X1A",`1. B`="X2B",`1. C`="X3C",`1. D`="X4D")
Error_summary <- select(df,matches("^[0-9]*\\."))

I am also trying to add counts to the data frame, as follows:

df_row = 
  df %>% 
  summarize(across(c(matches("^[0-9]*\\."), Concatenate), ~ sum(!is.na(.) & . != "" & . != "NA")))

But this also selects the AA ID column, which I don't want.

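【Comments】:

  • If possible, avoid non-syntactic column names, i.e. names that start with a number or contain spaces.
  • By the way, the data you shared has no column names that start with a number. R changes them and prepends an X. Is that also the case in your real data?
  • Yes, I can see that R is adding an X to the column names, though I don't understand why. In my actual data the column names look like "1.city", "2.country", "5.professional".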
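
The renaming mentioned in the comments comes from the `check.names` argument of `data.frame()`, which passes names through `make.names()` by default. The following is a minimal base R sketch of that behavior; the column values are made up for illustration, and only the column names are taken from the comments above.

```r
# Hypothetical values; only the column names come from the thread.
# Default behavior: check.names = TRUE makes the names syntactically valid.
df_checked <- data.frame(`AA ID` = 1:2,
                         `1.city` = c("Oslo", "Lima"),
                         `2.country` = c("NO", "PE"))
names(df_checked)

# check.names = FALSE keeps the names exactly as written.
df_raw <- data.frame(`AA ID` = 1:2,
                     `1.city` = c("Oslo", "Lima"),
                     `2.country` = c("NO", "PE"),
                     check.names = FALSE)
names(df_raw)

# Which of the unmodified names start with one or more digits followed by a dot?
starts_with_number <- grepl("^[0-9]+\\.", names(df_raw))
```

With the default, a pattern has to account for the leading X. With `check.names = FALSE`, a pattern such as `^[0-9]+\\.` can be applied to the names as written.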

Tags: r dplyr


【Solution 1】:

Given that variables whose names should start with a number are converted to names starting with X, you can do this:

library(tidyverse)
df %>%
  select(matches("^X[0-9]"))

This gives:

   X1..A X2..B X3..C X4..D
1                         
2      D     A           G
3                        G
4     NA                 G
5            A           G
6      D     A           G
7            A           G
8            A           G
9      D                  
10                        

Using the same logic, you can compute the counts:

df %>% 
  summarize(across(c(matches("^X[0-9]"), Concatenate), ~ sum(!is.na(.) & . != "" & . != "NA")))

which gives:

  X1..A X2..B X3..C X4..D Concatenate
1     3     5     0     7           8

However, I'm not sure whether you want to exclude the "NAG" value in the Concatenate column.
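
If the real data keeps names such as "1.city" unmodified (for example because it was built or read with `check.names = FALSE`), the same approach might look like the sketch below. This is an assumption about the real data, not something confirmed in the thread. The data frame `df_real` and its values are hypothetical. The pattern uses `+` rather than `*` so that at least one digit is required before the dot.

```r
library(dplyr)

# Hypothetical data with unmodified, non-syntactic column names
df_real <- data.frame(`AA ID` = 1:3,
                      `1.city` = c("D", "", "NA"),
                      `2.country` = c("A", "A", ""),
                      Concatenate = c("DA", "A", ""),
                      check.names = FALSE)

# Select only the columns whose names start with digits followed by a dot
numbered <- df_real %>%
  select(matches("^[0-9]+\\."))

# Count entries that are neither missing, empty, nor the string "NA"
counts <- df_real %>%
  summarize(across(c(matches("^[0-9]+\\."), Concatenate),
                   ~ sum(!is.na(.x) & .x != "" & .x != "NA")))
```

Neither result includes the AA ID column, since that name does not begin with a digit.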

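【Comments】:

  • I have tried this, but it also selects the AA ID column.
  • It doesn't. Please look at how the summarize command in my post differs from your code. Also, with your code the "AA ID" column is not selected either, because its name does not start with [0-9].
  • In the actual data the column names are like 1.city, 2.country, but here R adds an X to the column names.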