Count number of unique values in two variables in R (answers)

【Question title】: Count number of unique values in two variables in r
【Posted】: 2021-04-28 05:54:58
【Question】:

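I have a dataset like this. In R, I want to count how many times each customer visited over the whole year, based on the dates (using UniSA_Customer_No and Sale_Date).

Some customer number and date combinations are repeated. I need to group by date and customer number, and find out how many times each customer number visited during the year.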

【Comments】:

Images are not the right way to share data or code. Add them in a reproducible format that is easy to copy. See how to give a reproducible example.

Tags: r aggregate

【Solution 1】:

You can do this by tabulating the rows for each customer:

table(year2014$UniSA_Customer_No)

Two variables can also be cross-tabulated, for example customer against date:

table(year2014$UniSA_Customer_No, year2014$Sale_Date)
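To make the shape of a two-way table concrete, here is a minimal sketch on made-up data. The `toy` data frame and its values are assumptions for illustration; only the column names come from the question.

```r
# Toy data (hypothetical values); column names follow the question
toy <- data.frame(
  UniSA_Customer_No = c(101, 101, 102, 101, 102),
  Sale_Date = c("2014-01-05", "2014-01-05", "2014-01-05",
                "2014-02-10", "2014-03-01"),
  stringsAsFactors = FALSE
)

# Rows are customers, columns are dates, cells are row counts
visits <- table(toy$UniSA_Customer_No, toy$Sale_Date)
print(visits)

# Customer 101 has two rows on 2014-01-05, so this cell is 2
visits["101", "2014-01-05"]
```

A cell greater than 1 means the same customer appears on more than one row for that date, which is why repeated rows inflate the counts.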

In this case, however, I would suggest removing duplicates first (see this answer for details).

# Select rows from the year 2014 (assumes Sale_Date prints as "YYYY-MM-DD")
year2014 <- year2014[grep("^2014-", year2014$Sale_Date), ]
# Extract only the columns that define a duplicate
cust_date <- year2014[, c("UniSA_Customer_No", "Sale_Date")]
# Flag rows that repeat an earlier customer/date pair
dup_rows <- duplicated(cust_date)
# Keep only the first occurrence of each pair
year2014unique <- year2014[!dup_rows, ]
# Tabulate without duplicates (each customer counted at most once per day)
table(year2014unique$UniSA_Customer_No, year2014unique$Sale_Date)

A simple example of the difference between unique and table:

> unique(c(1, 2, 3, 1))
[1] 1 2 3
> table(c(1, 2, 3, 1))

1 2 3 
2 1 1

No external packages are needed for this.
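Putting the pieces together, the following is a small self-contained sketch of one way to get the number of distinct visit days per customer in base R. The `toy` data and its values are made up for illustration, and counting one visit per customer per day is an assumption about what the question wants.

```r
# Toy data (hypothetical values); column names follow the question
toy <- data.frame(
  UniSA_Customer_No = c(101, 101, 102, 101, 102, 101),
  Sale_Date = c("2014-01-05", "2014-01-05", "2014-01-05",
                "2014-02-10", "2014-03-01", "2014-03-01"),
  stringsAsFactors = FALSE
)

# Keep one row per customer/date pair
key_cols <- toy[, c("UniSA_Customer_No", "Sale_Date")]
toy_unique <- toy[!duplicated(key_cols), ]

# Number of distinct visit days per customer
visit_days <- table(toy_unique$UniSA_Customer_No)
print(visit_days)
```

Tabulating only the customer column after removing duplicates gives one count per customer, rather than a customer-by-date grid.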

【Comments】:

【Solution 2】:

You can use count from dplyr:

library(dplyr)
library(lubridate)

df %>% count(UniSA_Customer_No, Sale_Date = year(as.Date(Sale_Date)))

In base R, use table:

table(df$UniSA_Customer_No, format(as.Date(df$Sale_Date), '%Y'))
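As a rough illustration of the base R approach on data spanning more than one year, the sketch below builds the same kind of customer-by-year count and converts it to a long data frame. The `toy` data, the `Sale_Year` column, and the `n_rows` name are all hypothetical choices made for this example.

```r
# Toy data (hypothetical values); column names follow the question
toy <- data.frame(
  UniSA_Customer_No = c(101, 101, 102, 101, 102),
  Sale_Date = c("2014-01-05", "2014-06-30", "2014-01-05",
                "2015-02-10", "2015-03-01"),
  stringsAsFactors = FALSE
)

# Extract the calendar year as a character string
toy$Sale_Year <- format(as.Date(toy$Sale_Date), "%Y")

# Count rows per customer and year; xtabs keeps the variable names,
# and as.data.frame turns the table into one row per combination
by_year <- as.data.frame(
  xtabs(~ UniSA_Customer_No + Sale_Year, data = toy),
  responseName = "n_rows"
)
print(by_year)
```

This counts rows, so a customer/date pair that appears twice is counted twice. If each pair should count once, drop the repeated rows with duplicated() before tabulating.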

【Comments】: