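【Posted】: 2020-03-13 13:03:34
【Question】:
I am trying to remove special characters such as "-", "/", ")", "(" and so on entirely from my data frame. However, my data frame contains only a single observation, because it is the input to a model that will be used in production. I have explicitly defined the factor levels in the data frame.
I tried the following: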
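library(magrittr)  # provides the %>% pipe used below

# Replace whitespace runs and the characters ( ) / - with underscores
sanitize_string <- function(string) {
  gsub('\\s+', "_", string) %>%
    gsub("[(]", "_", .) %>%
    gsub("[)]", "_", .) %>%
    gsub("[/]", "_", .) %>%
    gsub("[-]", "_", .)
}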
Then:
df <- as.data.frame(lapply(df, function(dataframe) sapply(dataframe, sanitize_string)), stringsAsFactors=FALSE)
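But when I do this, I lose my factor levels: each factor ends up being treated as having only one level. This causes a problem when I try to get predictions from my model, because sparse.model.matrix needs every factor to have 2 or more levels, whereas in production only a single observation will ever be sent.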
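To make the level loss concrete, here is a minimal sketch of one possible direction: sanitizing the level labels via `levels()` rather than converting the values to character. The two-column data frame below is hypothetical and only mimics the structure shown in this question, and `sanitize_string` is rewritten in base R as a single `gsub` call that should perform the same replacements. It also assumes the model was trained on level names sanitized the same way.

```r
# Base-R variant of sanitize_string (no pipe needed); intended to make the
# same replacements as the version in the question.
sanitize_string <- function(string) {
  gsub("\\s+|[()/-]", "_", string)
}

# Hypothetical one-row data frame with explicitly defined factor levels.
df <- data.frame(
  ft_employment_status = factor(
    "Employed",
    levels = c("Employed", "Full-Time Education(Student)")
  ),
  seats = factor("2", levels = c("-99", "2", "4"))
)

# Rename the level labels in place; the columns stay factors, so levels
# that do not occur in the single row are kept.
df[] <- lapply(df, function(column) {
  if (is.factor(column)) {
    levels(column) <- sanitize_string(levels(column))
  }
  column
})

str(df)
```

One caveat with this approach: if two different labels sanitize to the same string, `levels<-` merges them into a single level, so the level count can shrink in that case.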
Thanks.
Here is my data frame:
$ children_under16 : Factor w/ 2 levels "No","Yes": 1
$ ft_employment_status : Factor w/ 5 levels "Employed","Full-Time Education(Student)",..: 1
$ fuel_type : Factor w/ 2 levels "D","P": 2
$ homeowner : Factor w/ 2 levels "FALSE","TRUE": 2
$ marital_status : Factor w/ 6 levels "Married","Separated",..: 1
$ overnight_loc : Factor w/ 7 levels "In a private Driveway",..: NA
$ usage_type : Factor w/ 3 levels "CLASS_1","SDPC",..: 1
$ licence_type : Factor w/ 3 levels "UK","European",..: 1
$ yad_relationship_to_policyholder: Factor w/ 8 levels "Spouse","No_YAD",..: 1
$ A : Factor w/ 7 levels "1","2","5","3",..: 1
$ B : Factor w/ 19 levels "C","E","Q","D",..: 1
$ C : Factor w/ 63 levels "11","19","58",..: 1
$ region : Factor w/ 12 levels "Yorkshire and The Humber",..: 1
$ D : Factor w/ 28 levels "Semi-Detached Suburbia",..: 27
$ E : Factor w/ 77 levels "Families in Terraces and Flats",..: 77
$ F : Factor w/ 9 levels "Suburbanites",..: 1
$ industry_band : Factor w/ 18 levels "13","14","15",..: 14
$ occ_band_goco : Factor w/ 17 levels "0","1","2","3",..: 2
$ transmission : Factor w/ 2 levels "A","M": 2
$ vehicle_make : Factor w/ 19 levels "OTHER","AUDI",..: 1
$ vehicle_type : Factor w/ 17 levels "Mid Exec Saloon/Estate/Coupe",..: 1
$ rural_urban : Factor w/ 19 levels "Urban major conurbation",..: 2
$ water_company : Factor w/ 23 levels "Affinity Water",..: 23
$ seats : Factor w/ 6 levels "-99","2","4",..:
【Comments】:
- Could you provide head(df) and str(df)?
- Could you provide a sample of your data? I would like to reproduce the issue.
Tags: r gsub xgboost model.matrix