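【Question Title】: Change continuous variable into categorical
【Posted】: 2020-01-21 17:18:00
【Question】:

I have a year variable (YearBuilt) that I want to turn into a categorical variable with 3 levels. I am using the levels function here, and it is really painful.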

traintest$YearBuilt <- as.factor(traintest$YearBuilt)
levels(traintest$YearBuilt)[levels(traintest$YearBuilt)%in%c(1872,1875,1879,1880,1882,
                                                             1885,1890,1892,1893,1895,
                                                             1896,1898,1900,1901,1902,
                                                             1904,1905,1906,1907,1908,
                                                             1910,1911,1912,1913,1914,
                                                             1915,1916,1917,1918,1919,
                                                             1920,1921,1922,1923,1924,
                                                             1925,1926,1927,1928,1929,
                                                             1930,1931,1932,1934,1935,
                                                             1936,1937,1938,1939,1940,
                                                             1941,1942,1945,1946,1947,
                                                             1948,1949)] <- "Before1950"
levels(traintest$YearBuilt)[levels(traintest$YearBuilt)%in%c(1950,1951,1952,1953,1954,
                                                             1955,1956,1957,1958,1959,
                                                             1960,1961,1962,1963,1964,
                                                             1965,1966,1967,1968,1969,
                                                             1970,1971,1972,1973,1974,
                                                             1975,1976,1977,1978,1979,
                                                             1980,1981,1982,1983,1984,
                                                             1985,1986,1987,1988,1989,
                                                             1990,1991,1992,1993,1994,
                                                             1995,1996,1997,1998,1999)] <- "Between1950-2000"
levels(traintest$YearBuilt)[levels(traintest$YearBuilt)%in%c(2000,2001,2002,2003,2004,
                                                             2005,2006,2007,2008,2009,
                                                             2010)] <- "After2000"

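I tried the cut function, but it did not work well for me: it put essentially all of the values into the first category and left the other two categories empty.

Is there a simpler way to do this?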

【Question Comments】:

Tags: r


【Solution 1】:

One option is to build logical vectors over the factor levels

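# Numeric value of each level (YearBuilt is already a factor here)
v1 <- as.numeric(levels(traintest$YearBuilt))
i1 <- v1 < 1950
i2 <- v1 >= 1950 & v1 < 2000
i3 <- v1 >= 2000
# Build the full replacement vector and assign it once: relabelling in
# separate steps merges duplicate levels, which shifts the later indices
new_levels <- character(length(v1))
new_levels[i1] <- "Before1950"
new_levels[i2] <- "Between1950-2000"
new_levels[i3] <- "After2000"
levels(traintest$YearBuilt) <- new_levels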
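
A related base R idiom is the list form of `levels<-`, where each list name is a new level and each element holds the old levels to fold into it. The sketch below runs on a made-up vector `years` (hypothetical data standing in for `traintest$YearBuilt`); the names `f`, `lv` and `yr` are only for illustration.

```r
# Hypothetical sample data standing in for traintest$YearBuilt
years <- c(1872, 1949, 1950, 1999, 2000, 2010)
f <- as.factor(years)

lv <- levels(f)       # old levels as character strings
yr <- as.numeric(lv)  # the same levels as numbers, for comparisons

# List form of `levels<-`: name = new level, value = old levels to merge
levels(f) <- list(
  "Before1950"       = lv[yr < 1950],
  "Between1950-2000" = lv[yr >= 1950 & yr < 2000],
  "After2000"        = lv[yr >= 2000]
)

table(f)
```

Because the whole mapping is given in one assignment, the order of the list also fixes the order of the resulting levels.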

Or use cut

# v1 must come from the original, unmodified year levels
levels(traintest$YearBuilt) <- cut(v1, breaks = c(-Inf, 1949, 1999, Inf),
       labels = c("Before1950", "Between1950-2000", "After2000"))
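
If the column is still numeric, cut can also be applied to it directly, without going through levels at all. The following is a minimal sketch on a made-up data frame (the sample values and the new column name `YearBuiltCat` are assumptions, not from the question). The asker's original cut call is not shown, so this is only a guess, but one common way to end up with everything in the first bin is calling as.numeric() on a column that is already a factor: that returns the integer level codes (1, 2, 3, ...), all of which fall below 1950.

```r
# Hypothetical sample data standing in for traintest
traintest <- data.frame(YearBuilt = c(1872, 1949, 1950, 1999, 2000, 2010))

# Bin the numeric years directly. Intervals are right-closed by default,
# so the bins are (-Inf, 1949], (1949, 1999] and (1999, Inf]
traintest$YearBuiltCat <- cut(
  traintest$YearBuilt,
  breaks = c(-Inf, 1949, 1999, Inf),
  labels = c("Before1950", "Between1950-2000", "After2000")
)

# If a column was already converted with as.factor(), recover the years
# through the labels; as.numeric() alone would give the integer codes
f <- as.factor(traintest$YearBuilt)
codes <- as.numeric(f)                 # level codes, not years
years <- as.numeric(as.character(f))   # the actual years

table(traintest$YearBuiltCat)
```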

【Comments】:

  • In your cut call, I think you need the middle break to be 1949.