【Question title】: r - generate multiple files from randomizing a data frame
【Posted】: 2018-11-30 21:34:54
【Question】:

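I need to generate and save multiple files from randomizations of a data frame. The original data frame holds several years of daily weather data. I need files in which the years are randomly reordered while the order within each year is preserved.

I wrote some simple code that randomizes the years, but I cannot work out how to repeat the randomization and save each randomized data frame as a separate file.

Here is what I have so far: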

# Create example data frame
df <- data.frame(x=c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,8,8))
df$y <- c(4,8,9,1,1,5,8,8,3,2,0,9,4,4,7,3,5,5,2,4,6,6)
df$z <- c("A","A","A","B","B","B","C","C","C","D","D","D","F","F","F","G","G","G","H","H","I","I")

set.seed(30)

# Split data frame based on info in one column (i.e. df$x) and store in a list 
dt_list <- split(df, f = df$x)

# RANDOMIZE data list -- Create a new index and change the order of dt_list
# SAVE the result to "random list" (i.e. 'rd_list')

rd_list <- dt_list[sample(1:length(dt_list), length(dt_list))]

# Put back together data in the order established in 'rd_list' 
rd_data <- do.call(rbind, rd_list)

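This randomizes the data frame the way I need, but I don't know how to "save and repeat" so that I get multiple files, say about 20, named after the original with a sequential number (e.g. df_1, df_2, ...).

Also, because the sampling is random, duplicates are possible. Is there a way to discard duplicate files automatically?

Thanks!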

【Question comments】:

    Tags: r random


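    【Solution 1】:

    Here is an approach that uses a while loop and the handy sample_n() function from the dplyr package, which samples a specified number of rows from a data frame (with or without replacement).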

    library(dplyr)
    
    # Create the data
    weather_data <- data.frame(Weather = c("Sunny", "Cloudy", "Rainy", "Sunny"),
                               Temperature = c(75, 68, 71, 76))
    
    # Twenty times, repeatedly sample rows from the data and write to a csv file
    total_files <- 20
    df_index <- 1
    
    while (df_index <= total_files) {
      # Get a sample of the data
      sampled_subset <- sample_n(weather_data,
                                 size = 10,
                                 replace = TRUE)
    
      # Write the data to a csv file
      filename_to_use <- paste0("Sample_Data", "_", df_index, ".csv")
    
      # write.csv() always uses a comma separator; passing `sep` is ignored with a warning
      write.csv(x = sampled_subset,
                file = filename_to_use)
    
      df_index <- df_index + 1
    }
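
The loop above samples individual rows, whereas the question shuffles whole year blocks. As a minimal sketch, the same repeat-and-save idea can be applied to the question's own split()/rbind() code using base R only. The output folder (tempdir()) and the df_1.csv, df_2.csv, ... naming are assumptions chosen for illustration, not part of the original answer.

```r
# Example data from the question: x plays the role of the year
df <- data.frame(
  x = c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,8,8),
  y = c(4,8,9,1,1,5,8,8,3,2,0,9,4,4,7,3,5,5,2,4,6,6),
  z = c("A","A","A","B","B","B","C","C","C","D","D","D",
        "F","F","F","G","G","G","H","H","I","I")
)

set.seed(30)

dt_list <- split(df, f = df$x)   # one data frame per year
out_dir <- tempdir()             # assumption: replace with your own folder
total_files <- 20

for (i in seq_len(total_files)) {
  # Shuffle the order of the year blocks; rows inside each block keep their order
  rd_list <- dt_list[sample(length(dt_list))]
  rd_data <- do.call(rbind, rd_list)

  # Hypothetical naming scheme: df_1.csv, df_2.csv, ...
  out_file <- file.path(out_dir, paste0("df_", i, ".csv"))
  write.csv(rd_data, file = out_file, row.names = FALSE)
}
```

Each pass draws a fresh ordering of the blocks, so nothing here prevents two files from ending up with the same ordering.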
    

    【Comments】:

    • Thanks for your help. This is similar to using a for loop; I should have thought of that! Is there any way to discard duplicate files?
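
One possible way to avoid duplicates, sketched here as an assumption rather than as the answerer's method, is to record each shuffled order as a string key and redraw whenever a key has already been seen. The function name unique_orders and its arguments are hypothetical.

```r
# Draw `n_wanted` distinct random orderings of `n_groups` blocks.
# The function and argument names are illustrative only.
unique_orders <- function(n_groups, n_wanted) {
  # Guard against an endless loop: only n_groups! orderings exist
  stopifnot(n_wanted <= factorial(n_groups))

  seen <- character(0)
  orders <- list()
  while (length(orders) < n_wanted) {
    ord <- sample(n_groups)              # random permutation of 1..n_groups
    key <- paste(ord, collapse = "-")    # compact fingerprint of this ordering
    if (key %in% seen) next              # duplicate draw: discard and redraw
    seen <- c(seen, key)
    orders[[length(orders) + 1]] <- ord
  }
  orders
}

set.seed(30)
orders <- unique_orders(n_groups = 8, n_wanted = 20)
```

Each element of orders can then index the split list from the question, for example dt_list[orders[[i]]], before calling do.call(rbind, ...) and writing the file. With 8 year blocks there are 40320 possible orderings, so redraws should be rare when only 20 files are needed.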