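[Posted]: 2016-12-23 20:31:49
[Question]:
I am trying to create a variable called "combo". I want the county name in all lowercase, keeping the spaces between words in multi-word names, with no space between the county name and the state abbreviation.
So far I have this: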
county <- c("Abbeville County", "Aleutians West Census Area",
"Cerro Gordo County", "Lonoke County")
state <- c("West Virginia", "Wisconsin", "Wyoming", "Alabama")
trialdat <- data.frame(county, state)
trialdat$state <- sapply(trialdat$state, tolower)
# deal with trailing spaces
trim.trailing <- function (x) sub("\\s+$", "", x)
trialdat$state2 <- as.factor(trim.trailing(as.factor(trialdat$state)))
trialdat$StateAbbrev <- stateFromLower(trialdat$state2)
trialdat$county2 <- as.factor(trim.trailing(as.factor(trialdat$county)))
# make combo variable (mutate() comes from dplyr)
library(dplyr)
trialdat = mutate(trialdat, combo=paste(tolower(gsub("County", "",county2)),
StateAbbrev, sep=""))
The desired output is a column:
combo
1 abbevilleWV
2 aleutians west census areaWI
3 cerro gordoWY
4 lonokeAL
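Something strange is happening. For counties with spaces in their names, I get what I want. But for the other counties, a space is still left after the county name. I can't simply gsub out all the spaces, because I need the ones between the words of a county name. Any ideas? Thanks!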
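One way to see where the stray space might come from, as a minimal base-R sketch that uses only the sample county names above: removing the literal string "County" leaves behind the space that preceded it, and that space did not exist yet when trim.trailing ran. The variable names `stripped` and `fixed` are made up for illustration.

```r
county <- c("Abbeville County", "Aleutians West Census Area",
            "Cerro Gordo County", "Lonoke County")

# Removing only the word "County" keeps the space that preceded it
stripped <- tolower(gsub("County", "", county))
# Number of leading/trailing whitespace characters left in each name
print(nchar(stripped) - nchar(trimws(stripped)))

# Removing the preceding whitespace together with the word avoids the leftover
fixed <- tolower(sub("\\s*County$", "", county))
print(fixed)
```

The pattern is anchored with `$`, so only a trailing "County" is removed; names that do not end in "County" pass through unchanged.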
Note: the stateFromLower function is below, slightly adapted from Chris' code. I am including it because the problem might come from this part; I am not sure.
stateFromLower <- function(x) {
# read 52 state codes into local variable [includes DC
# (Washington D.C.) and PR (Puerto Rico)]
st.codes <- data.frame(state1 = as.factor(c("AK", "AL", "AR",
"AZ", "CA", "CO", "CT", "DC", "DE", "FL", "GA", "HI",
"IA", "ID", "IL", "IN", "KS", "KY", "LA", "MA", "MD",
"ME", "MI", "MN", "MO", "MS", "MT", "NC", "ND", "NE",
"NH", "NJ", "NM", "NV", "NY", "OH", "OK", "OR", "PA",
"PR", "RI", "SC", "SD", "TN", "TX", "UT", "VA", "VT",
"WA", "WI", "WV", "WY")), full = as.factor(c("alaska",
"alabama", "arkansas", "arizona", "california", "colorado",
"connecticut", "district of columbia", "delaware", "florida",
"georgia", "hawaii", "iowa", "idaho", "illinois", "indiana",
"kansas", "kentucky", "louisiana", "massachusetts", "maryland",
"maine", "michigan", "minnesota", "missouri", "mississippi",
"montana", "north carolina", "north dakota", "nebraska",
"new hampshire", "new jersey", "new mexico", "nevada",
"new york", "ohio", "oklahoma", "oregon", "pennsylvania",
"puerto rico", "rhode island", "south carolina", "south dakota",
"tennessee", "texas", "utah", "virginia", "vermont",
"washington", "wisconsin", "west virginia", "wyoming")))
# create an nx1 data.frame of full state names from source column
st.x <- data.frame(full = x)
# match source names with the full names in the 'st.codes' local
# variable and use the match to look up the state code
refac.x <- st.codes$state1[match(st.x$full, st.codes$full)]
# return the state codes in the same order in which the names
# appeared in the original source
return(refac.x)
}
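As a possible alternative to the hand-written table, base R ships the `state.name` and `state.abb` vectors. They cover the 50 states only, so DC and Puerto Rico are appended by hand here. The helper name `abbrev_from_name` is hypothetical, and this is a sketch of the same match()-based lookup rather than a drop-in replacement: it returns a character vector instead of a factor.

```r
# Hypothetical helper: map full state names (any case) to postal abbreviations
abbrev_from_name <- function(x) {
  # state.name / state.abb are built-in vectors covering the 50 states;
  # DC and PR are added manually to mirror the table above
  full  <- tolower(c(state.name, "District of Columbia", "Puerto Rico"))
  codes <- c(state.abb, "DC", "PR")
  # Trim and lowercase the input so stray whitespace or case cannot break the match
  codes[match(tolower(trimws(as.character(x))), full)]
}

print(abbrev_from_name(c("West Virginia", "Wisconsin ", "wyoming", "Alabama")))
```

Names that are not found come back as NA, which makes bad input easy to spot.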
Thanks for your patience with the formatting issues; this is my first question!
[Comments]:
-
Welcome to SO. Can you provide a reproducible example of what you are trying to do, without making us download a file?
-
Also, have you tried trimws()?
-
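I suggest you get rid of the non-reproducible example. Try to make your example as small as possible, to isolate the error you are running into. State clearly what output you want for your sample input, so that possible solutions can be tested. The code you posted for stateFromLower appears to be incomplete.
-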
@C8H10N4O2, I did try trimws(), but the same problem persists. Thanks for the suggestion, though!
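A plausible reason trimws() appeared not to help: if it runs before "County" is removed, there is nothing to trim yet, and the space only appears afterwards. A minimal sketch, assuming the trimming is moved to after the removal. paste0() is used in place of paste(..., sep = ""), and the abbreviations are typed in by hand only to keep the example self-contained.

```r
county <- c("Abbeville County", "Aleutians West Census Area",
            "Cerro Gordo County", "Lonoke County")
# Abbreviations entered by hand for this sketch (stand-in for the stateFromLower result)
StateAbbrev <- c("WV", "WI", "WY", "AL")

# Trimming first changes nothing: the names have no edge whitespace yet
before <- gsub("County", "", trimws(county))
# Trimming after the removal drops the space that sat in front of "County"
after <- trimws(gsub("County", "", county))

combo <- paste0(tolower(after), StateAbbrev)
print(combo)
```

Spaces inside a name such as "cerro gordo" are kept, because trimws() only touches the two ends of each string.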