Filtering a data table based on a condition in a column: answers

【Question title】: Filtering data table based on condition in Column
【Posted】: 2018-01-25 23:18:30
【Question description】:

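I am trying to download EOD data from the NSE site. The data contains rows of every series type: EQ, BE, DR, N1, and so on. I want to filter the table so that it keeps only the rows whose "SERIES" column is EQ, BE, or DR and excludes every other value.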

After reading and writing, the data looks like this:

      DATE SERIES     SYMBOL     OPEN     HIGH      LOW    CLOSE   VOLUME
1    2016-05-27     EQ  20MICRONS    28.30    29.20    28.05    28.25    31468
2    2016-05-27     EQ 3IINFOTECH     4.20     4.25     3.90     3.95  2209977
3    2016-05-27     EQ    3MINDIA 13170.00 13300.00 12611.00 12699.00     5511
4    2016-05-27     EQ    8KMILES  1717.00  1770.95  1685.00  1710.45    33558
5    2016-05-27     EQ   A2ZINFRA    24.80    25.65    24.70    25.15   102189
6    2016-05-27     EQ AARTIDRUGS   458.05   473.85   458.05   468.95    11140
7    2016-05-27     EQ   AARTIIND   512.60   519.95   512.20   516.20    13101
8    2016-05-27     EQ  AARVEEDEN    58.00    59.00    57.20    58.55     3436
9    2016-05-27     EQ       ABAN   198.55   202.50   198.50   199.55   999288
10   2016-05-27     EQ        ABB  1241.80  1273.85  1234.40  1253.95    51180
11   2016-05-27     EQ ABBOTINDIA  4703.00  4764.00  4639.70  4751.70     2663
12   2016-05-27     EQ      ABFRL   137.80   141.00   133.50   134.50   541872

I tried the which command, but it only returns the EQ series.

The code I used is:

#28-10-2014: Fix for '403 Forbidden'
## Credit http://stackoverflow.com/questions/26086868/error-downloading-a-csv-in-zip-from-website-with-get-in-r

library(httr)

#Define Working Directory, where files would be saved
setwd('D:/FII Stats/')

#Define start and end dates, and convert them into date format
startDate = as.Date("2016-05-26", order="ymd")
endDate =   as.Date("2016-05-27", order="ymd")

#work with date, month, year for which data has to be extracted
myDate = startDate
zippedFile <- tempfile() 

while (myDate <= endDate){
  filenameDate = paste(as.character(myDate, "%y%m%d"), ".csv", sep = "")
 monthfilename=paste(as.character(myDate, "%y%m"),".csv", sep = "")
 downloadfilename=paste("cm", toupper(as.character(myDate, "%d%b%Y")), "bhav.csv", sep = "")
 temp =""

  #Generate URL
 myURL = paste("http://www.nseindia.com/content/historical/EQUITIES/", as.character(myDate, "%Y"), "/", toupper(as.character(myDate, "%b")), "/", downloadfilename, ".zip", sep = "")

  #retrieve Zipped file
  tryCatch({
  #Download Zipped File

#28-10-2014: Fix for '403 Forbidden'
  #download.file(myURL,zippedFile, quiet=TRUE, mode="wb",cacheOK=TRUE)
  GET(myURL, user_agent("Mozilla/5.0"), write_disk(paste(downloadfilename,".zip",sep="")))


  #Unzip file and save it in temp 
  #28-10-2014: Fix for '403 Forbidden'
  temp <- read.csv(unzip(paste(downloadfilename,".zip",sep="")), sep = ",",as.is=TRUE) 

  #temp <-  temp[which(temp$SERIES=="EQ" | "DR" | "BE"), ]


  #Rename Columns Volume and Date
  colnames(temp)[9] <- "VOLUME"
  colnames(temp)[11] <- "DATE"

  #Define Date format
  temp$DATE <- as.Date(temp$DATE, format="%d-%b-%Y")

  #Reorder Columns and Select relevant columns
   temp<-subset(temp,select=c("DATE","SERIES","SYMBOL","OPEN","HIGH","LOW","CLOSE","VOLUME"))
   #temp<-subset(temp,temp[temp$"SERIES" == "BE & DR & EQ", ],select=c("DATE","SYMBOL","OPEN","HIGH","LOW","CLOSE","LAST","VOLUME"))

  #Write the BHAVCOPY csv - datewise
  write.csv(temp,file=filenameDate,row.names = FALSE)

  #Write the csv in Monthly file
  if (file.exists(monthfilename))
  {
   write.table(temp,file=monthfilename,sep=",", eol="\n", row.names = FALSE, col.names = FALSE, append=TRUE)
  }else
  {
   write.table(temp,file=monthfilename,sep=",", eol="\n", row.names = FALSE, col.names = TRUE, append=FALSE)
  }


  #Print Progress
  #print(paste (myDate, "-Done!", endDate-myDate, "left"))
 }, error=function(err){
  #print(paste(myDate, "-No Record"))
 }
 )
  myDate <- myDate+1
  print(paste(myDate, "Next Record"))
}

 #Delete temp file - Bhavcopy
 junk <- dir(pattern="cm")
 file.remove(junk)

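How do I get the desired result?

【Discussion】:

Tags: r

【Solution 1】:

Use %in% rather than "==". You cannot write x == A | B, but you can write x %in% c("A","B"). If you choose to use "[", do not use subset as well. It is an either-or choice: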

temp <- temp[ temp$"SERIES" %in% c("BE",  "DR", "EQ") ,   # row selection rule
             c("DATE","SYMBOL","OPEN","HIGH","LOW","CLOSE","LAST","VOLUME") ] #col select

Or use subset like this:

temp<-subset(temp,   SERIES %in% c("BE",  "DR", "EQ"),   # NSE, so use unquoted colname
               select=c("DATE","SYMBOL", "OPEN", "HIGH", "LOW", "CLOSE", "LAST", "VOLUME"))
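To illustrate why chaining bare strings with | fails while %in% works, here is a minimal sketch. The toy data frame and its symbols are made up for illustration and are not the real bhavcopy data.

```r
# Toy data standing in for the bhavcopy; values are illustrative only.
temp <- data.frame(
  SYMBOL = c("AAA", "BBB", "CCC", "DDD"),
  SERIES = c("EQ", "BE", "N1", "DR"),
  stringsAsFactors = FALSE
)

# Chaining bare strings with | raises an error: | expects logical
# (or numeric) operands, and "DR" is a character string.
attempt <- tryCatch(
  temp$SERIES == "EQ" | "DR" | "BE",
  error = function(e) "error"
)

# Spelled out in full, == works but grows with every extra value.
keep_long <- temp$SERIES == "EQ" | temp$SERIES == "DR" | temp$SERIES == "BE"

# %in% tests set membership and returns one TRUE/FALSE per row.
keep_in <- temp$SERIES %in% c("EQ", "DR", "BE")

filtered <- temp[keep_in, ]
```

Both keep_long and keep_in select the same rows; %in% simply stays readable as the list of wanted series grows.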

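If you intend to do any programming with R, it is better to use the "[" function. The NSE in subset (non-standard evaluation; look it up if the acronym is unfamiliar) is a persistent source of errors. Safest of all is to avoid "$":

temp <- temp[ temp[["SERIES"]] %in% c("BE",  "DR", "EQ") ,   # row selection rule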
             c("DATE","SYMBOL","OPEN","HIGH","LOW","CLOSE","LAST","VOLUME") ] # col select
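The answer does not spell out why "$" is less safe. Two likely reasons are that "$" partially matches column names on a data frame and that it cannot take a column name stored in a variable. The sketch below shows both, using a made-up two-row data frame and a hypothetical variable named wanted.

```r
# Toy data; values are illustrative only.
temp <- data.frame(
  SERIES = c("EQ", "N1"),
  CLOSE  = c(28.25, 101.5),
  stringsAsFactors = FALSE
)

# $ partially matches: "SER" silently resolves to the SERIES column.
partial_dollar <- suppressWarnings(temp$SER)

# [[ ]] matches exactly by default, so a misspelt name gives NULL.
exact_bracket <- temp[["SER"]]

# A column name held in a variable (hypothetical name `wanted`).
wanted <- "SERIES"
via_dollar  <- temp$wanted    # looks for a column literally named "wanted"
via_bracket <- temp[[wanted]] # looks up the column named by the variable
```

Here via_dollar is NULL while via_bracket holds the SERIES values, which is why "[[" tends to be the more predictable choice inside functions and loops.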

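【Discussion】:

I used the first code snippet and the result came out as expected. BE and EQ have to be in quotes, otherwise the result is incorrect. Thanks.

【Solution 2】:

This will do the job: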

library(data.table)

output <- setDT(df)[SERIES %in% c("EQ", "BE", "DR") ]
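As a self-contained sketch of the same filter, the example below builds a small made-up data frame named df (the answer does not define it) and assumes the data.table package is installed. Note that setDT(df) converts df itself by reference; as.data.table() is used here instead so that the original data frame stays untouched.

```r
library(data.table)

# Toy data; values are illustrative only.
df <- data.frame(
  SYMBOL = c("AAA", "BBB", "CCC"),
  SERIES = c("EQ", "N1", "BE"),
  stringsAsFactors = FALSE
)

# as.data.table() returns a converted copy and leaves df a plain data.frame.
dt <- as.data.table(df)

# Inside [ ], data.table evaluates column names directly, so no df$ prefix.
output <- dt[SERIES %in% c("EQ", "BE", "DR")]
```

If modifying df in place is acceptable, the one-liner from the answer does the same filtering without the intermediate copy.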

【Discussion】: