【Question title】: Create new categorical variable based on a subset of data
【Posted】: 2016-09-15 02:52:13
【Question】:

I have a data frame that looks like this:

         cnt    bnk qst ans
1  Country 1 Bank 1  q1   1
2  Country 2 Bank 2  q1   1
3  Country 3 Bank 3  q1   3
4  Country 4 Bank 4  q1   1
5  Country 1 Bank 1  q2   1
6  Country 2 Bank 2  q2   2
7  Country 3 Bank 3  q2   3
8  Country 4 Bank 4  q2   4
9  Country 1 Bank 1  q3   1
10 Country 2 Bank 2  q3   1
11 Country 3 Bank 3  q3   2
12 Country 4 Bank 4  q3   1

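For reference, q stands for "question", so q2 is "question 2". Likewise, ans is the response.

Now I want to create a categorical variable based on the responses to q2. Specifically, I want to assign the following categories:

  1. Public
  2. Private
  3. Mixed
  4. Other

So if ans=1 and qst=q2, the category is "Public"; if ans=2 and qst=q2, it is "Private"; and so on. The data frame I am after should look like this: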

         cnt    bnk qst ans   dummy
1  Country 1 Bank 1  q1   1  Public
2  Country 2 Bank 2  q1   1 Private
3  Country 3 Bank 3  q1   3   Mixed
4  Country 4 Bank 4  q1   1   Other
5  Country 1 Bank 1  q2   1  Public
6  Country 2 Bank 2  q2   2 Private
7  Country 3 Bank 3  q2   3   Mixed
8  Country 4 Bank 4  q2   4   Other
9  Country 1 Bank 1  q3   1  Public
10 Country 2 Bank 2  q3   1 Private
11 Country 3 Bank 3  q3   2   Mixed
12 Country 4 Bank 4  q3   1   Other

I tried using ifelse, but it did not do what I wanted. Can anyone give me some advice?

Data

dput(df)
structure(list(cnt = c("Country 1", "Country 2", "Country 3", 
"Country 4", "Country 1", "Country 2", "Country 3", "Country 4", 
"Country 1", "Country 2", "Country 3", "Country 4"), bnk = c("Bank 1", 
"Bank 2", "Bank 3", "Bank 4", "Bank 1", "Bank 2", "Bank 3", "Bank 4", 
"Bank 1", "Bank 2", "Bank 3", "Bank 4"), qst = structure(c(1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), .Label = c("q1", 
"q2", "q3"), class = "factor"), ans = c(1L, 1L, 3L, 1L, 1L, 2L, 
3L, 4L, 1L, 1L, 2L, 1L), dummy = c(NA, NA, NA, NA, "Public", 
"Private", "Mixed", "Other", NA, NA, NA, NA)), .Names = c("cnt", 
"bnk", "qst", "ans", "dummy"), row.names = c("1", "2", "3", "4", 
"5", "6", "7", "8", "9", "10", "11", "12"), class = "data.frame")

【Comments on the question】:

  • The second entry is Private with ans = 1?
  • Yes, because I do not care about ans for q1, only for q2.

Tags: r data-manipulation data-cleaning dummy-variable


【Solution 1】:

The following labels the q2 rows and assigns NA to the rows of every other question:

df$dummy <- ifelse(df$ans == 1 & df$qst == 'q2', 'Public', 
               ifelse(df$ans == 2 & df$qst == 'q2', 'Private', 
                   ifelse(df$ans == 3 & df$qst == 'q2', 'Mixed', 
                        ifelse(df$ans == 4 & df$qst == 'q2', 'Other', NA))))

#         cnt    bnk qst ans   dummy
#1  Country 1 Bank 1  q1   1    <NA>
#2  Country 2 Bank 2  q1   1    <NA>
#3  Country 3 Bank 3  q1   3    <NA>
#4  Country 4 Bank 4  q1   1    <NA>
#5  Country 1 Bank 1  q2   1  Public
#6  Country 2 Bank 2  q2   2 Private
#7  Country 3 Bank 3  q2   3   Mixed
#8  Country 4 Bank 4  q2   4   Other
#9  Country 1 Bank 1  q3   1    <NA>
#10 Country 2 Bank 2  q3   1    <NA>
#11 Country 3 Bank 3  q3   2    <NA>
#12 Country 4 Bank 4  q3   1    <NA>
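
The nested ifelse calls above repeat the same test four times. As a minimal sketch of an alternative, assuming ans only takes the values 1 to 4, the labels can be kept in a lookup vector and indexed by ans. The name `labels` and the rebuilt `df` are illustrative stand-ins, not part of the original answer.

```r
# Stand-in for the question's data frame (same columns and values).
df <- data.frame(
  cnt = rep(paste("Country", 1:4), times = 3),
  bnk = rep(paste("Bank", 1:4), times = 3),
  qst = factor(rep(c("q1", "q2", "q3"), each = 4)),
  ans = c(1L, 1L, 3L, 1L, 1L, 2L, 3L, 4L, 1L, 1L, 2L, 1L),
  stringsAsFactors = FALSE
)

# Hypothetical lookup vector: element i is the label for ans == i.
labels <- c("Public", "Private", "Mixed", "Other")

# Look up the label by ans on q2 rows only; all other rows become NA.
df$dummy <- ifelse(df$qst == "q2", labels[df$ans], NA_character_)

print(df)
```

Adding a fifth category then only means extending `labels`, rather than adding another level of nesting.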

【Comments】:

    【Solution 2】:

    The following should work for a data.frame named df. It was hard to test without the data:

    # construct dummy variable in subset data.frame
    dfCountryQ2 <- df[df$qst=="q2", c("cnt", "ans")]
    dfCountryQ2$dummy <- factor(dfCountryQ2$ans, levels=1:4,
                                labels=c("Public", "Private", "Mixed", "Other"))
    
    # merge on by country
    df <- merge(df, dfCountryQ2[, c("cnt", "dummy")], by="cnt")
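
Two things are worth knowing about merge() here: by default it sorts the result by the `by` column, so the original row order is lost, and if df already has a dummy column (as the dput in the question does), the result contains dummy.x and dummy.y instead of a single dummy. A minimal sketch that avoids both by using match(), assuming each (cnt, bnk) pair answers q2 exactly once; `key`, `q2_key` and `labels` are illustrative names, and `df` is rebuilt here so the block runs on its own:

```r
# Stand-in for the question's data frame (same columns and values).
df <- data.frame(
  cnt = rep(paste("Country", 1:4), times = 3),
  bnk = rep(paste("Bank", 1:4), times = 3),
  qst = factor(rep(c("q1", "q2", "q3"), each = 4)),
  ans = c(1L, 1L, 3L, 1L, 1L, 2L, 3L, 4L, 1L, 1L, 2L, 1L),
  stringsAsFactors = FALSE
)

labels <- c("Public", "Private", "Mixed", "Other")

# One row per bank: its answer to q2.
q2 <- df[df$qst == "q2", c("cnt", "bnk", "ans")]

# Build a key per row and per q2 row (assumes one q2 answer per cnt/bnk pair).
key    <- paste(df$cnt, df$bnk)
q2_key <- paste(q2$cnt, q2$bnk)

# For every row, find its bank's q2 answer and turn it into a labelled factor.
# Row order is untouched, and any existing dummy column is simply overwritten.
df$dummy <- factor(q2$ans[match(key, q2_key)], levels = 1:4, labels = labels)

print(df)
```

This fills dummy on every row of a bank, which is the layout shown as the desired output in the question.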
    

    【Comments】:
