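【Posted】: 2018-03-16 08:43:40
【Question】:
I'm working with a dataset that needs a lot of cleaning. I have a column of teacher names from which I can't seem to remove the leading whitespace.
Sample data: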
Data <- structure(list(Teacher = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L
), .Label = c("Please.rate.teacher:.JOHN.DOE .Overall.rating.for.teacher",
"Please.rate.teacher: Jane.Doe.Overall.rating.for.teacher"), class = "factor"),
Overall_Rating = c(5L, 4L, 5L, 4L, 4L, 5L, 4L, 4L, 4L, 4L,
3L, 5L, 4L, 4L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 3L)), .Names = c("Teacher",
"Overall_Rating"), class = "data.frame", row.names = c(NA, -22L
))
My cleaning attempt:
library(dplyr)  # provides %>% and mutate()

Data_clean <- Data %>%
  mutate(Teacher = as.character(Teacher),
         # strip the survey boilerplate and the colon
         Teacher = gsub("Please.rate.teacher|.Overall.rating.for.teacher|[:]", "", Teacher),
         # turn the remaining dots into spaces
         Teacher = gsub("[.]", " ", Teacher),
         Teacher = trimws(Teacher),
         Teacher = tolower(Teacher),
         Teacher = tools::toTitleCase(Teacher))
This still leaves leading and trailing whitespace, which also breaks the title casing of the second name:
unique(Data_clean$Teacher)
[1] "John Doe " " jane Doe"
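The first name still has a trailing space, and the second name still has a leading space, even though trimws() is called before the case is changed.
How can I remove them?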
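One thing worth checking when trimws() appears to do nothing is whether the stray characters are ordinary spaces at all. The sketch below is only an illustration under an assumption: it supposes the leftover characters are non-breaking spaces (U+00A0), which the dput() output above cannot confirm. The vector `x` is a hypothetical stand-in for the values returned by unique(Data_clean$Teacher).

```r
# Hypothetical values mimicking the output above. The "\u00a0" (non-breaking
# space) is an ASSUMPTION about what the stray characters are.
x <- c("John Doe\u00a0", "\u00a0jane Doe")

# By default trimws() strips only space, tab, carriage return and newline,
# so a non-breaking space is left in place.
unchanged <- all(trimws(x) == x)

# Inspect the first code point of the second value; 160 is U+00A0.
first_code_point <- utf8ToInt(x[2])[1]

# Strip ordinary whitespace and non-breaking spaces from both ends.
cleaned <- gsub("^[[:space:]\u00a0]+|[[:space:]\u00a0]+$", "", x)
```

If utf8ToInt() reports some other code point on the real data, the same gsub() pattern can be adapted by swapping in that character. On R 3.6.0 or later, trimws() also accepts a `whitespace` argument that can be widened in the same way.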
【Comments】:
-
Look up ?trimws.
-
I already call trimws before changing the case.
-
Sorry, I should have read more carefully. I've added a solution below; please take a look.
Tags: r data-cleaning