【Question title】: Passing current value of ddply split on to function
【Posted】: 2014-03-23 04:27:15
【Question】:

Here is some sample data containing names whose gender I would like to encode over time:

names_to_encode <- structure(list(names = structure(c(2L, 2L, 1L, 1L, 3L, 3L), .Label = c("jane", "john", "madison"), class = "factor"), year = c(1890, 1990, 1890, 1990, 1890, 2012)), .Names = c("names", "year"), row.names = c(NA, -6L), class = "data.frame")

Here is a minimal subset of the Social Security data, limited to just those names in 1890 and 1990:

ssa_demo <- structure(list(name = c("jane", "jane", "john", "john", "madison", "madison"), year = c(1890L, 1990L, 1890L, 1990L, 1890L, 1990L), female = c(372, 771, 56, 81, 0, 1407), male = c(0, 8, 8502, 29066, 14, 145)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L), .Names = c("name", "year", "female", "male"))

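I've defined a function that subsets the Social Security data for a given year or range of years. In other words, it decides whether a name was male or female over a given time period by working out the proportion of male and female births with that name. Here is the function along with a helper function: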

require(plyr)
require(dplyr)

select_ssa <- function(years) {

  # If we get only one year (1890) convert it to a range of years (1890-1890)
  if (length(years) == 1) years <- c(years, years)

  # Calculate the male and female proportions for the given range of years
  ssa_select <- ssa_demo %.%
    filter(year >= years[1], year <= years[2]) %.%
    group_by(name) %.%
    summarise(female = sum(female),
              male = sum(male)) %.%
    mutate(proportion_male = round((male / (male + female)), digits = 4),
           proportion_female = round((female / (male + female)), digits = 4)) %.%
    mutate(gender = sapply(proportion_female, male_or_female))

  return(ssa_select)
}

# Helper function to determine whether a name is male or female in a given year
male_or_female <- function(proportion_female) {
  if (proportion_female > 0.5) {
    return("female")
  } else if(proportion_female == 0.5000) {
    return("either")
  } else {
    return("male")
  }
}

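Now what I'd like to do is use plyr, specifically ddply, to split the data to be encoded by year, and merge each of those pieces with the value returned by the select_ssa function. Here is my code.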

ddply(names_to_encode, .(year), merge, y = select_ssa(year), by.x = "names", by.y = "name", all.x = TRUE)

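This command works fine if, in the call to select_ssa(year), I hard-code a value like 1890 as the argument to the function. But when I try to pass it the current value of year that ddply is working with, I get an error message: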

Error in filter_impl(.data, dots(...), environment()) : 
  (list) object cannot be coerced to type 'integer'

How can I pass the current value of year that ddply is working with on to select_ssa?
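One way to read the problem, offered as a sketch rather than a confirmed diagnosis: extra arguments handed to ddply through `...` are evaluated in the calling environment, not once per piece, so `select_ssa(year)` never sees the year of the current split. Wrapping the merge in an anonymous function lets each piece supply its own year. Here `select_ssa_base` is a hypothetical base-R stand-in for select_ssa, used only so the example does not depend on a particular dplyr version, and the sample data is rebuilt with plain character columns.

```r
library(plyr)

# Sample data from the question, with names stored as character vectors
names_to_encode <- data.frame(
  names = c("john", "john", "jane", "jane", "madison", "madison"),
  year  = c(1890, 1990, 1890, 1990, 1890, 2012),
  stringsAsFactors = FALSE
)
ssa_demo <- data.frame(
  name   = c("jane", "jane", "john", "john", "madison", "madison"),
  year   = c(1890L, 1990L, 1890L, 1990L, 1890L, 1990L),
  female = c(372, 771, 56, 81, 0, 1407),
  male   = c(0, 8, 8502, 29066, 14, 145),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for select_ssa(), written in base R (an assumption,
# not the asker's function)
select_ssa_base <- function(years) {
  if (length(years) == 1) years <- c(years, years)
  in_range <- ssa_demo[ssa_demo$year >= years[1] & ssa_demo$year <= years[2], ]
  if (nrow(in_range) == 0) {
    # No SSA rows for this period: return an empty frame with the same columns
    return(data.frame(name = character(0), female = numeric(0),
                      male = numeric(0), proportion_female = numeric(0),
                      gender = character(0), stringsAsFactors = FALSE))
  }
  totals <- aggregate(cbind(female, male) ~ name, data = in_range, FUN = sum)
  totals$proportion_female <-
    round(totals$female / (totals$female + totals$male), 4)
  totals$gender <- ifelse(totals$proportion_female == 0.5, "either",
                   ifelse(totals$proportion_female > 0.5, "female", "male"))
  totals
}

# Each piece holds a single year, so piece$year[1] is the current split value
encoded <- ddply(names_to_encode, .(year), function(piece) {
  merge(piece, select_ssa_base(piece$year[1]),
        by.x = "names", by.y = "name", all.x = TRUE)
})
```

Rows whose year has no Social Security data (2012 in the sample) are kept by `all.x = TRUE` and get NA in the gender columns.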

【Question comments】:

    Tags: r plyr dplyr


    【Solution 1】:

    I think you're overcomplicating things by trying to do the join inside ddply. If I were using dplyr, I would probably do something more like this:

    names_to_encode <- structure(list(name = structure(c(2L, 2L, 1L, 1L, 3L, 3L), .Label = c("jane", "john", "madison"), class = "factor"), year = c(1890, 1990, 1890, 1990, 1890, 2012)), .Names = c("name", "year"), row.names = c(NA, -6L), class = "data.frame")
    
    ssa_demo <- structure(list(name = c("jane", "jane", "john", "john", "madison", "madison"), year = c(1890L, 1990L, 1890L, 1990L, 1890L, 1990L), female = c(372, 771, 56, 81, 0, 1407), male = c(0, 8, 8502, 29066, 14, 145)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L), .Names = c("name", "year", "female", "male"))
    
    names_to_encode$name <- as.character(names_to_encode$name)
    names_to_encode$year <- as.integer(names_to_encode$year)
    
    tmp <- left_join(ssa_demo,names_to_encode) %.%
            group_by(year,name) %.%
            summarise(female = sum(female),
                  male = sum(male)) %.%
            mutate(proportion_male = round((male / (male + female)), digits = 4),
               proportion_female = round((female / (male + female)), digits = 4)) %.%
            mutate(gender = ifelse(proportion_female == 0.5,"either",
                            ifelse(proportion_female > 0.5,"female","male")))
    

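    Note that dplyr 0.1.1 is still somewhat finicky about the types of the join columns, which is why I had to convert them. I think I saw some activity on GitHub suggesting that this has either been fixed in the development version or is at least something they are working on.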
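The `%.%` operator used above was later replaced in dplyr by `%>%`. As a sketch only, assuming a recent dplyr release, the same pipeline might be written as follows, with the join columns spelled out in `by` and the data rebuilt with matching column types:

```r
library(dplyr)

# Same sample data as above, with name as character and year as integer
names_to_encode <- data.frame(
  name = c("john", "john", "jane", "jane", "madison", "madison"),
  year = c(1890L, 1990L, 1890L, 1990L, 1890L, 2012L),
  stringsAsFactors = FALSE
)
ssa_demo <- data.frame(
  name   = c("jane", "jane", "john", "john", "madison", "madison"),
  year   = c(1890L, 1990L, 1890L, 1990L, 1890L, 1990L),
  female = c(372, 771, 56, 81, 0, 1407),
  male   = c(0, 8, 8502, 29066, 14, 145),
  stringsAsFactors = FALSE
)

# Join on name and year, total the births, then classify each name per year
tmp <- left_join(ssa_demo, names_to_encode, by = c("name", "year")) %>%
  group_by(year, name) %>%
  summarise(female = sum(female),
            male = sum(male)) %>%
  mutate(proportion_male = round(male / (male + female), digits = 4),
         proportion_female = round(female / (male + female), digits = 4)) %>%
  mutate(gender = ifelse(proportion_female == 0.5, "either",
                  ifelse(proportion_female > 0.5, "female", "male")))
```

As in the original answer, the left join keeps every row of ssa_demo, so a year that appears only in names_to_encode (2012 here) does not show up in the result.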

    【Comments】:

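    • This is great, and it works for these data sets. My difficulty is that I'm writing this code for an R package, so I can't assume that the name column in the user's data is called name and the year column is called year. In an earlier question (stackoverflow.com/questions/21888910/…) I learned that dplyr doesn't let you specify the columns to join on. Should I force users to rename their columns?
    • @LincolnMullen If it helps, you can group programmatically in dplyr using regroup. See here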
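For the package use case raised in the comments, one option that avoids the join restriction altogether is base R's merge, which takes the key columns of each table separately through `by.x` and `by.y`. The following is a minimal sketch; `encode_gender`, `name_col`, `year_col`, and `user_data` are hypothetical names introduced for illustration. Later dplyr releases also accept a named `by` vector such as `by = c("names" = "name")` in their join functions.

```r
ssa_demo <- data.frame(
  name   = c("jane", "jane", "john", "john", "madison", "madison"),
  year   = c(1890L, 1990L, 1890L, 1990L, 1890L, 1990L),
  female = c(372, 771, 56, 81, 0, 1407),
  male   = c(0, 8, 8502, 29066, 14, 145),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the caller passes the column names as strings,
# so the user's data does not have to be renamed
encode_gender <- function(data, name_col = "name", year_col = "year") {
  merged <- merge(data, ssa_demo,
                  by.x = c(name_col, year_col),
                  by.y = c("name", "year"),
                  all.x = TRUE)
  merged$proportion_female <-
    round(merged$female / (merged$female + merged$male), 4)
  merged$gender <- ifelse(merged$proportion_female == 0.5, "either",
                   ifelse(merged$proportion_female > 0.5, "female", "male"))
  merged
}

# Usage with differently named columns
user_data <- data.frame(first = c("john", "madison"),
                        born  = c(1890L, 1990L),
                        stringsAsFactors = FALSE)
result <- encode_gender(user_data, name_col = "first", year_col = "born")
```

The merged result keeps the user's own column names for the key columns.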