[Posted]: 2021-01-06 09:20:31
[Question]:
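My script and one of the first 3 csv files can be found in my Github folder.
I have split a list of NDVI and climate data into small csv files. Each file contains 34 years of data.
Each 34-year series should then be split into two parts based on the conflict year, kept in the same table and restricted to a specific time range. That part of the code already works.
Now I want to control the second part of the list with the climate data from the first part, using multiple linear regression. That is also already done.
Basically, I need a loop that stores all coefficients from each run of the lm function, one run per csv file, in a new list.
I know I can loop with lapply and get the output as a list. But I am missing the parts that actually loop through the csv files.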
#load libraries
library(ggplot2)
library(readr)
library(tidyr)
library(dplyr)
library(ggpubr)
library(plyr)
library(tidyverse)
library(fs)
file_paths <- fs::dir_ls("E:\\PYTHON_ST\\breakCSV_PYTHON\\AIM_2_regions\\Afghanistan")
file_paths
#create empty list and fill with file paths and loop through them
file_contents <- list()
for (i in seq_along(file_paths)) { #seq_along for vectors (list of file paths is a vector)
file_contents[[i]] <- read_csv(file = file_paths[[i]])
for (i in seq_len(file_contents[[i]])){ # redundant?
# do all the following steps in every file
# Step 1)
# Define years to divide table
#select conflict year in df
ConflictYear = file_contents[[i]][1,9]
ConflictYear
# select Start year of regression in df
SlopeYears = file_contents[[i]][1,7] #to get slope years (e.g.17)
BCStartYear = ConflictYear-SlopeYears #to get start year for regression
BCStartYear
#End year of regression
ACEndYear = ConflictYear+(SlopeYears-1) # -1 because the conflict year is included
ACEndYear
# Step 2
#select needed rows from df
#no headers but row numbers. NDVI.Year = [r1-r34,c2]
NDVI.Year <- file_contents[[i]][1:34,2]
NDVI <- file_contents[[i]][1:34,21]
T.annual.max <- file_contents[[i]][1:34,19]
Prec.annual.max <- file_contents[[i]][1:34,20]
soilM.annual.max <- file_contents[[i]][1:34,18]
#Define BeforeConf and AfterConf depending on Slope Year number and Conflict Years
#Go through NDVI.Year till Conflict.Year (-1 year) since the conflict year is not included in bc
BeforeConf1 <- file_contents[[i]][ which(file_contents[[i]]$NDVI.Year >= BCStartYear & file_contents[[i]]$NDVI.Year < ConflictYear),] #eg. 1982 to 1999
BeforeConf2 <- c(NDVI.Year, NDVI, T.annual.max, Prec.annual.max, soilM.annual.max) #which columns to include
BeforeConf <- BeforeConf1[BeforeConf2] #create table
AfterConf1 <- file_contents[[i]][ which(file_contents[[i]]$NDVI.Year >= ConflictYear & file_contents[[i]]$NDVI.Year <= ACEndYear),] #eg. 1999 to 2015
AfterConf2 <- c(NDVI.Year, NDVI, T.annual.max, Prec.annual.max, soilM.annual.max)
AfterConf <- AfterConf1[AfterConf2]
#Step 3)a)
#create an empty list to fill with the coefficient results of the model for each csv file and save them in a new list
#Create an empty df for the output coefficients
names <- c("(Intercept)","BeforeConf$T.annual.max","BeforeConf$Prec.annual.max","BeforeConf$soilM.annual.max")
coef_df <- data.frame()
for (k in names) coef_df[[k]] <- as.character()
#Apply Multiple Linear Regression
plyrFunc <- function(x){
model <- lm(NDVI ~ T.annual.max + Prec.annual.max + soilM.annual.max, data = BeforeConf)
return(summary(model)$coefficients[1,1:4])
}
coef_df <- ddply(BeforeConf, .(), x)
coef_df
}}
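One way to make the per-file steps easier to loop over is to wrap them in a function that takes a single data frame and returns the coefficients. The following is only a minimal sketch in base R. It assumes the conflict year and the slope years sit in columns named `ConflictYear` and `SlopeYears` (hypothetical names; the script above addresses them by position, columns 9 and 7), and it runs on made-up toy data rather than the real files.

```r
# Fit the pre-conflict regression for ONE data frame and return its coefficients.
# ConflictYear and SlopeYears are hypothetical column names; the remaining
# column names are the ones used in the script above.
fit_before_conflict <- function(df) {
  conflict_year <- df$ConflictYear[1]
  slope_years   <- df$SlopeYears[1]
  bc_start_year <- conflict_year - slope_years

  # Rows from the start year up to, but not including, the conflict year
  before_conf <- df[df$NDVI.Year >= bc_start_year &
                      df$NDVI.Year < conflict_year, ]

  model <- lm(NDVI ~ T.annual.max + Prec.annual.max + soilM.annual.max,
              data = before_conf)
  coef(model)  # named vector: intercept plus one slope per predictor
}

# Made-up toy data with the same 34-year layout, for illustration only
set.seed(1)
toy <- data.frame(
  NDVI.Year        = 1982:2015,
  ConflictYear     = 1999,
  SlopeYears       = 17,
  T.annual.max     = rnorm(34, mean = 30, sd = 2),
  Prec.annual.max  = rnorm(34, mean = 100, sd = 20),
  soilM.annual.max = rnorm(34, mean = 0.3, sd = 0.05)
)
toy$NDVI <- 0.2 + 0.01 * toy$T.annual.max + rnorm(34, sd = 0.01)

coefs <- fit_before_conflict(toy)
print(coefs)
```

Note that on a tibble returned by `read_csv`, `file_contents[[i]][1, 9]` gives a one-cell tibble rather than a plain number, whereas `file_contents[[i]][[1, 9]]` extracts the value itself.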
[Discussion]:
-
"There are some missing parts" is not very clear. Please be specific about what does not work in your long code block.
-
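The code previously worked for just one csv file. The main problem is changing it so that it loops over a list of csv files and performs the task for all of them. To do that, I replaced the former file name with "file_contents[[i]]", but I don't know whether it works like that. The second problem is storing the coefficients in a new df. Does that have to happen inside the loop? It is the same issue: it works for a single file, but I don't know how to turn it into a loop.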
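On the second point, the data frame of coefficients does not necessarily have to be built inside the loop. A possible pattern, sketched here under assumptions, is to let `lapply` return one coefficient vector per file and bind them together afterwards. In this sketch `fit_one_file` is a hypothetical stand-in for the real per-file steps, and the three csv files are made-up toy data written to a temporary folder.

```r
# Hypothetical stand-in for the per-file steps (split by conflict year, then lm);
# here it simply fits the regression on the whole file.
fit_one_file <- function(path) {
  df <- read.csv(path)
  model <- lm(NDVI ~ T.annual.max + Prec.annual.max + soilM.annual.max,
              data = df)
  coef(model)
}

# Toy input: three small made-up csv files in a temporary folder
csv_dir <- file.path(tempdir(), "toy_csv")
dir.create(csv_dir, showWarnings = FALSE)
set.seed(42)
for (j in 1:3) {
  toy <- data.frame(
    NDVI             = runif(34),
    T.annual.max     = rnorm(34),
    Prec.annual.max  = rnorm(34),
    soilM.annual.max = rnorm(34)
  )
  write.csv(toy, file.path(csv_dir, paste0("region_", j, ".csv")),
            row.names = FALSE)
}

file_paths <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# One coefficient vector per file, collected as a list by lapply
coef_list <- lapply(file_paths, fit_one_file)
names(coef_list) <- basename(file_paths)

# Bind the vectors row-wise: one row per file, one column per coefficient
coef_df <- as.data.frame(do.call(rbind, coef_list))
print(coef_df)
```

Because each file is handled by its own function call, no index variable is shared between an outer and an inner loop, and the result table is assembled once at the end.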
Tags: r csv for-loop linear-regression plyr