【Question title】: Subsetting and saving files with loop in R
【Posted】: 2021-09-24 02:45:57
【Question】:

I have panel data covering 2005 to 2015, and I want to run some loops and save the output for each year separately. Here is my code:

for (i in 2005:2015){

  ntm_data <-subset(ntm_data_wip, StartDate <="i" & EndDate >"i")

*"inner loops"*

regulatory_distance_matrix$year <-i
write.dta(regulatory_distance_matrix, "C:/Users/Utente/Desktop/Master's thesis/Thesis analysis/- RD construction/Binary RD/regulatory_distance_matrix_",i,".dta")
} 

If I set the subset manually and pick a single year, the inner loops work. However, when I run the code as shown above, I get the following error:

 `summarise()` has grouped output by 'reporter', 'ntmcode'. You can override using the `.groups` argument.
Error in if (!is.na(regulatory_distance_matrix[k, avail_iso3s[g]])) { : 
  argument is of length zero
In addition: Warning messages:
1: In max(ntm_data$StartDate, na.rm = FALSE) :
  no non-missing arguments to max; returning -Inf
2: In min(ntm_data$EndDate, na.rm = FALSE) :
  no non-missing arguments to min; returning Inf

Does anyone know how to fix this? Thanks in advance.

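【Comments】:

  • "i" is a character constant containing the lowercase letter i. Your loop variable i holds a sequence of numeric values from 2005 to 2015. Try removing the quotes around "i". Also, where does regulatory_distance_matrix come from? Otherwise, we will need to see your data. Oh, and welcome to SO!
  • I think your write.dta call is also a bit off where you specify the file name. I think you need a paste: paste0("C:/Users/Utente/Desktop/Master's thesis/Thesis analysis/- RD construction/Binary RD/regulatory_distance_matrix_",i,".dta")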
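
To illustrate the first comment, here is a minimal sketch of what the quotes do. The data frame toy and its values are made up for illustration, and the sketch assumes StartDate and EndDate are stored as numeric years, which the question does not confirm. When a number is compared with a character string, R converts the number to a string and compares the two as text, so every year ends up being compared with the letter "i" rather than with the loop value.

```r
# Hypothetical toy data; the real ntm_data_wip is not shown in the question.
toy <- data.frame(
  reporter  = c("AAA", "BBB", "CCC"),
  StartDate = c(2004, 2006, 2010),
  EndDate   = c(2008, 2012, 2016)
)

i <- 2007

# Quoted: the years are coerced to character and compared with the letter "i"
quoted <- subset(toy, StartDate <= "i" & EndDate > "i")

# Unquoted: the years are compared with the number stored in i
unquoted <- subset(toy, StartDate <= i & EndDate > i)

nrow(quoted)
nrow(unquoted)
```

With the quotes, the subset comes back empty, which is consistent with the warnings in the question: max() and min() warn and return -Inf and Inf when they are given nothing to work with.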

Tags: r loops for-loop


【Answer 1】:

Thank you very much for the quick replies, they are much appreciated! I am attaching the whole code here, with some brief comments:

for (i in 2005:2007){

  ntm_data <-subset(ntm_data_wip, StartDate <=i & EndDate >i)

# Once the data is loaded, I exclude NTM codes which are missing. 
# I only need the reporter, NTM code and product codes (HS 6-digit codes).
ntm_data <- ntm_data[!is.na(ntm_data$ntmcode)&ntm_data$ntmcode!="",]
ntm_data <- ntm_data[,c("reporter", "ntmcode", "hs6")]

# I group the data by reporter, NTM and product code (hs6) and count the number of combinations in a new variable called count.
ntm_data <- ntm_data %>% group_by(reporter, ntmcode, hs6) %>%
  summarise(count = n())
head(ntm_data)

# I prepare the regulatory matrix by creating a list of countries for which I want the regulatory distance.  The 
# regulatory matrix shows the distance between two countries and has as column and row names the ISO3 codes of the countries.
# As specified above, I am interested in having the analysis for all available countries. 
avail_iso3s <- unique(ntm_data$reporter)

# I create an empty regulatory distance matrix. For column size I use the length of avail_iso3s and add 1 for the reporter column.
# I populate the column names with reporter and the ISO3 codes with the option dimnames.
regulatory_distance_matrix <- data.frame(matrix(vector(),0,length(avail_iso3s)+1,
                                                dimnames = list(c(), c("reporter", avail_iso3s )
                                                )),
                                         stringsAsFactors=F)


#' Now I can move on to calculating the regulatory distance formula in page 3 of "DEEP REGIONAL INTEGRATION AND NON-TARIFF MEASURES:A METHODOLOGY FOR DATA ANALYSIS (2015)" . 
#' As N is a constant, I start with calculating it outside of the loop
N <- ntm_data %>% group_by(ntmcode, hs6) %>% count()
N <- nrow(N)

# I now fill in the regulatory distance matrix with values

for (g in 1:length(avail_iso3s)){
  country_a <- ntm_data[ntm_data$reporter==avail_iso3s[g],c("ntmcode", "hs6")]
  country_a$country_a <- 1
  regulatory_distance_matrix[g,"reporter"] <- avail_iso3s[g]
  
  for (k in 1:length(avail_iso3s)){
    
    if (!is.na(regulatory_distance_matrix[k,avail_iso3s[g]])){next }
    
    country_b <- ntm_data[ntm_data$reporter==avail_iso3s[k],c("ntmcode", "hs6")]
    country_b$country_b <- 1
    merged <- merge(country_a, country_b, by=c("ntmcode", "hs6"), all = TRUE)
    merged[is.na(merged)] <- 0
    merged$abs_diff <- abs(merged$country_a-merged$country_b)
    rd <- sum(merged$abs_diff)/N
    regulatory_distance_matrix[g,avail_iso3s[k]] <- rd
    
  }
}

# Now I fill in the missing values and create a Stata dta.file.
for (g in 1:length(avail_iso3s)){
  for (k in 1:length(avail_iso3s)){
    if (is.na(regulatory_distance_matrix[k,avail_iso3s[g]])){
      regulatory_distance_matrix[k,avail_iso3s[g]] <- regulatory_distance_matrix[g,avail_iso3s[k]]
    }
  }
}


regulatory_distance_matrix$year <-i
write.dta(regulatory_distance_matrix, "C:/Users/Utente/Desktop/Master's thesis/Thesis analysis/- RD construction/Binary RD/new_regulatory_distance_matrix_",i,".dta")
} 

I hope this is useful. Also, after following your suggestion, I get the following error:

`summarise()` has grouped output by 'reporter', 'ntmcode'. You can override using the `.groups` argument.
Error in if (convert.dates) { : 
  argument is not interpretable as logical
In addition: Warning message:
In write.dta(regulatory_distance_matrix, "C:/Users/Utente/Desktop/Master's thesis/Thesis analysis/- RD construction/Binary RD/new_regulatory_distance_matrix_",  :
  Version must be 6-12: using 7

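I apologize for such a tedious question, but I am very new to R and other posts did not seem to help much.

【Comments】:

  • That one is fine. It gets printed, but it is not really an error: `summarise()` has grouped output by 'reporter', 'ntmcode'. You can override using the `.groups` argument. What you do need is to wrap the file path in paste0: write.dta(regulatory_distance_matrix, paste0("C:/Users/Utente/Desktop/Master's thesis/Thesis analysis/- RD construction/Binary RD/regulatory_distance_matrix_",i,".dta") )
  • It looks like the code runs correctly now! Thank you very much for your help, and have a nice day!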
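
The fix from the comments can be sketched as follows. This is a hedged example, not the poster's actual script: toy_matrix is a made-up stand-in for regulatory_distance_matrix, and tempdir() replaces the Desktop folder so the snippet runs anywhere. It assumes the foreign package, which provides write.dta(), is installed. Judging by the write.dta() argument order (dataframe, file, version, convert.dates, ...), the original call most likely passed i as version and ".dta" as convert.dates, which would explain both the "Version must be 6-12" warning and the error in if (convert.dates).

```r
# Hypothetical output directory; the thread writes to a folder on the Desktop.
out_dir <- tempdir()

# Hypothetical stand-in for regulatory_distance_matrix
toy_matrix <- data.frame(
  reporter = c("AAA", "BBB"),
  AAA = c(0, 0.25),
  BBB = c(0.25, 0),
  stringsAsFactors = FALSE
)

out_files <- character(0)
for (i in 2005:2007) {
  toy_matrix$year <- i

  # paste0() glues the pieces into ONE string, so write.dta() receives
  # a single file argument instead of three separate positional arguments
  out_file <- file.path(out_dir, paste0("regulatory_distance_matrix_", i, ".dta"))

  # Only write if the foreign package is available
  if (requireNamespace("foreign", quietly = TRUE)) {
    foreign::write.dta(toy_matrix, out_file)
  }
  out_files <- c(out_files, out_file)
}

basename(out_files)
```

The summarise() message is informational only; passing .groups = "drop" to summarise() is one way to silence it.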