【发布时间】:2017-03-16 03:56:29
【问题描述】:
我正在学习 R 和 Quantmod,并构建了一个非常简单的股票模型预测。我有一个 xgboost 和 caret 模型,这是整个例子:
library(quantmod)
library(xts)
# get market data
Nasdaq100_Symbols <- c("AAPL", "ADBE", "ADI", "ADP", "ADSK", "AKAM")
getSymbols(Nasdaq100_Symbols)
# merge them all together
nasdaq100 <- data.frame(as.xts(merge(AAPL, ADBE, ADI, ADP, ADSK, AKAM)))
# set outcome variable
outcomeSymbol <- 'ADP.Volume'
# shift outcome value to be on same line as predictors
nasdaq100 <- xts(nasdaq100,order.by=as.Date(rownames(nasdaq100)))
nasdaq100 <- as.data.frame(merge(nasdaq100, lm1=lag(nasdaq100[,outcomeSymbol],-1)))
nasdaq100$outcome <- ifelse(nasdaq100[,paste0(outcomeSymbol,'.1')] > nasdaq100[,outcomeSymbol], 1, 0)
# remove shifted down volume field
nasdaq100 <- nasdaq100[,!names(nasdaq100) %in% c(paste0(outcomeSymbol,'.1'))]
# cast date to true date and order in decreasing order
nasdaq100$date <- as.Date(row.names(nasdaq100))
nasdaq100 <- nasdaq100[order(as.Date(nasdaq100$date, "%m/%d/%Y"), decreasing = TRUE),]
# calculate all day differences and populate them on same row
GetDiffDays <- function(objDF,days=c(10), offLimitsSymbols=c('outcome'), roundByScaler=3) {
# needs to be sorted by date in decreasing order
ind <- sapply(objDF, is.numeric)
for (sym in names(objDF)[ind]) {
if (!sym %in% offLimitsSymbols) {
print(paste('*********', sym))
objDF[,sym] <- round(scale(objDF[,sym]),roundByScaler)
print(paste('theColName', sym))
for (day in days) {
objDF[paste0(sym,'_',day)] <- c(diff(objDF[,sym],lag = day),rep(x=0,day)) * -1
}
}
}
return (objDF)
}
# call the function with the following differences
nasdaq100 <- GetDiffDays(nasdaq100, days=c(1,2,3,4,5,10,20), offLimitsSymbols=c('outcome'), roundByScaler=2)
# drop most recent entry as we don't have an outcome
nasdaq100 <- nasdaq100[2:nrow(nasdaq100),]
# use POSIXlt to add day of the week, day of the month, day of the year
nasdaq100$wday <- as.POSIXlt(nasdaq100$date)$wday
nasdaq100$yday <- as.POSIXlt(nasdaq100$date)$mday
nasdaq100$mon<- as.POSIXlt(nasdaq100$date)$mon
# remove date field and shuffle data frame
nasdaq100 <- subset(nasdaq100, select=-c(date))
nasdaq100 <- nasdaq100[sample(nrow(nasdaq100)),]
# xgboost Modeling
library(xgboost)
predictorNames <- names(nasdaq100)[names(nasdaq100) != 'outcome']
set.seed(1234)
split <- sample(nrow(nasdaq100), floor(0.7*nrow(nasdaq100)))
train <-nasdaq100[split,]
test <- nasdaq100[-split,]
bst <- xgboost(data = as.matrix(train[,predictorNames]),
label = train$outcome,
verbose=0,
eta = 0.1,
gamma = 50,
missing = NaN,
nround = 150,
colsample_bytree = 0.1,
subsample = 1,
nthread = 4,
objective="binary:logistic")
predictions <- predict(bst, as.matrix(test[,predictorNames]), missing = NaN, outputmargin=TRUE)
library(pROC)
auc <- roc(test$outcome, predictions)
print(paste('AUC score:', auc$auc))
问题 1:
现在它训练 70%,预测 30%,最后我可以打印出 AUC 分数。假设我 100% 训练并想预测明天会发生什么? IE。获取模型认为明天体积会上升的符号。
问题 2:
理想情况下,我想继续将今天的日终数据添加到模型中,然后让它预测明天的符号。现在看来我不得不使用getSymbols() 再次拉出整个历史记录。有什么方法可以提取今天的数据并将其附加到该符号的 xts 对象?
【问题讨论】:
标签: r r-caret quantmod xgboost