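[Posted]: 2015-12-28 08:24:57
[Question]:
I need to recode a continuous variable into categories. I normally use the "cut" function, but "cut" requires me to specify the breaks. I am looking for a way to set different breaks depending on other categorical variables in my data frame.
The variable in my example is Cost, and the breaks are in the second table, "cost.range". Each "Region" and each "Category" has its own set of breaks.
Example: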
Region     Product    Category  Cost
Country A  Product 1  CAT A     731
Country B  Product 1  CAT A     659
Country C  Product 1  CAT A     385
Country D  Product 1  CAT A     763
Country A  Product 2  CAT A     701
Country B  Product 2  CAT A     759
Country C  Product 2  CAT A     580
Country D  Product 2  CAT A     147
Country A  Product 3  CAT B     645
Country B  Product 3  CAT B     657
Country C  Product 3  CAT B     424

Region     Category  Cost.Range  Range
Country A  CAT A     10          R1
Country A  CAT A     50          R2
Country A  CAT A     200         R3
Country A  CAT A     1000        R4
Country A  CAT B     20          R1
Country A  CAT B     100         R2
Country A  CAT B     400         R3
Country A  CAT B     1500        R4
Code to generate the example:
Region <- c("Country A","Country B","Country C","Country D","Country A","Country B","Country C","Country D","Country A","Country B","Country C","Country D","Country A","Country B","Country C","Country D")
Product <- c("Product 1","Product 1","Product 1","Product 1","Product 2","Product 2","Product 2","Product 2","Product 3","Product 3","Product 3","Product 3","Product 4","Product 4","Product 4","Product 4")
Category <- c("CAT A","CAT A","CAT A","CAT A","CAT A","CAT A","CAT A","CAT A","CAT B","CAT B","CAT B","CAT B","CAT B","CAT B","CAT B","CAT B")
Cost <- c(731,659,385,763,701,759,580,147,645,657,424,34,850,463,160,550)
Table1 <- data.frame(Region, Product, Category, Cost)
Region <- c("Country A","Country A","Country A","Country A","Country A","Country A","Country A","Country A")
Category <- c("CAT A","CAT A","CAT A","CAT A","CAT B","CAT B","CAT B","CAT B")
Cost.range <- c(10,50,200,1000,20,100,400,1500)
Range <- c("R1","R1","R3","R4","R1","R2","R3","R4")
Table2 <- data.frame(Region, Category, Cost.range, Range)
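[Discussion]:
- You can use by, which can also operate on each category at the same time. Could you provide your data in a reproducible form, along with the code you have tried?
- Thanks, I edited my post to include the code. I looked at the "by" documentation, but since I am new to R I don't know how to use it. Could you explain?
- I think I would use cut, but the labels in the Range column are not unique. Is that by design?
- Yes, that is why I can't use cut: I have 50 categories and 20 countries with different ranges.
- The differing ranges are not the problem; the non-uniqueness of the labels is.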
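One possible way to apply "cut" group by group, as discussed in the comments, is sketched below. It is only a sketch under several assumptions that the question does not confirm: each Cost.range value is treated as the inclusive upper bound of its range, the lowest range is open below (-Inf), and the labels follow the printed table (R1 to R4) rather than the generating code. The helper name recode_group is hypothetical. Asking "cut" for integer codes (labels = FALSE) and then indexing into the Range column sidesteps the question of whether the labels are unique.

```r
# Sample data rebuilt from the question, with strings kept as character.
Table1 <- data.frame(
  Region   = rep(c("Country A", "Country B", "Country C", "Country D"), times = 4),
  Product  = rep(paste("Product", 1:4), each = 4),
  Category = rep(c("CAT A", "CAT B"), each = 8),
  Cost     = c(731, 659, 385, 763, 701, 759, 580, 147,
               645, 657, 424, 34, 850, 463, 160, 550),
  stringsAsFactors = FALSE
)
Table2 <- data.frame(
  Region     = rep("Country A", 8),
  Category   = rep(c("CAT A", "CAT B"), each = 4),
  Cost.range = c(10, 50, 200, 1000, 20, 100, 400, 1500),
  Range      = rep(c("R1", "R2", "R3", "R4"), times = 2),  # labels as in the printed table
  stringsAsFactors = FALSE
)

# Hypothetical helper: recode the costs of ONE Region/Category group
# using the breaks stored for that group in `ranges`.
recode_group <- function(cost, region, category, ranges) {
  rng <- ranges[ranges$Region == region & ranges$Category == category, ]
  if (nrow(rng) == 0) {
    return(rep(NA_character_, length(cost)))  # no breaks defined for this group
  }
  rng <- rng[order(rng$Cost.range), ]
  # Assumption: each Cost.range is the inclusive upper bound of its range.
  idx <- cut(cost, breaks = c(-Inf, rng$Cost.range), labels = FALSE)
  rng$Range[idx]  # costs above the largest break become NA
}

# Row indices of Table1, split by Region/Category combination.
groups <- split(seq_len(nrow(Table1)),
                list(Table1$Region, Table1$Category), drop = TRUE)

Table1$Range <- NA_character_
for (rows in groups) {
  Table1$Range[rows] <- recode_group(Table1$Cost[rows],
                                     Table1$Region[rows[1]],
                                     Table1$Category[rows[1]],
                                     Table2)
}
```

With the sample data, only Country A has breaks in Table2, so its four rows are labelled and every other row stays NA. If a range should instead start at a finite lower bound, replace -Inf in the breaks accordingly.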
Tags: r