【Question title】: Extract rows of data frame with values between two other values
【Posted】: 2017-10-20 21:50:13
【Question description】:

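I have some data with seven columns. The first six are spatial coordinates of the cross-section of an object, in metres, relative to the point X=0, Y=0 at the bottom-left corner. The cross-section of the shape consists of a mesh of tessellated triangles, and each set of point coordinates (X1,Y1, X2,Y2, X3,Y3) gives the corners of one triangle. The seventh column (Z) is a value derived from an analysis, assigned to the triangle defined by the three point coordinates.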

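I am trying to extract the point coordinates and the "Z" column data along a narrow band through the middle of the shape. I looked here and used David Arenburg's code with the data.table package, but I can't seem to get it to work; possibly because I have more than two columns of data?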

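What I need to do is extract the coordinates and Z data within a 1 cm wide horizontal band through the geometric centre of the shape (half the height). If any point coordinate of any triangle falls within the 1 cm band, I would like the entire row of data; ideally in a separate data frame.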

This is what I have so far:

data<-sample_data
attach(data)
upper<-(max(data$Y3)/2)+0.005 # the horizontal centreline of the shape plus half a cm
lower<-(max(data$Y3)/2)-0.005 # the horizontal centreline of the shape minus half a cm
library(data.table)
(data[between(data,lower,upper,incbounds=FALSE)])

I have also tried these, but I get the same error message:

data[data>lower&data<upper]
data[sapply(data,function(x)x>lower&x<upper)]

# Error: Unsupported use of matrix or array for column indexing

If you like, you can plot the shape with this code:

plot(X1,Y1,pch=19,cex=0.6)
points(X2,Y2,pch=19,cex=0.6)
points(X3,Y3,pch=19,cex=0.6)

Hopefully I have explained this well enough for someone to offer some help.

Thanks

【Question discussion】:

Tags: r data.table range

【Solution 1】:

The OP has requested to

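extract the coordinates and Z data within a 1 cm wide horizontal band through the geometric centre of the shape (half the height). If any point coordinate of any triangle falls within the 1 cm band, I would like the entire row of data; ideally in a separate data frame.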

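Unfortunately, the question itself as well as the accepted answer only check the Y value of point 3, not the Y values of points 1 and 2. This does not meet the requirement quoted above, which is to select every triangle where any of its points, i.e., point 1, 2, or 3, lies within the 1 cm band.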

So, the crucial point here is to select the relevant triangles rather than single points.

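Below are two data.table solutions. The first one uses the data in wide format as provided by the OP; the second one reshapes the data to long format to simplify the code.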

Wide format

library(data.table)
# read data from dropbox
DT <- fread("https://www.dropbox.com/s/2h8oq8nzrr5jsnm/sample_data.csv?dl=1")
# compute horizontal band through the geometric center of the shape
lower <- DT[, mean(range(c(Y1, Y2, Y3)))] - 0.01 / 2
upper <- lower + 0.01
# select row if y value of any point is within the horizontal band 
DT[lower < Y1 & Y1 < upper | lower < Y2 & Y2 < upper | lower < Y3 & Y3 < upper]

             X1       Y1         X2       Y2         X3       Y3        Z
  1: 0.00737923 0.218856 0.00710657 0.215950 0.01030030 0.217116  37.1608
  2: 0.00517517 0.220532 0.00737923 0.218856 0.00761518 0.221670  57.6568
  3: 0.00679651 0.212962 0.00448935 0.214803 0.00407957 0.211809  16.6649
  4: 0.00407957 0.211809 0.00644539 0.209902 0.00679651 0.212962  15.2068
  5: 0.38168000 0.214740 0.38544600 0.212670 0.38533400 0.217001  28.2365
 ---                                                                     
177: 0.08690940 0.224751 0.07950840 0.237030 0.07896900 0.222427  86.2424
178: 0.08690940 0.224751 0.07896900 0.222427 0.08592510 0.216536  87.3141
179: 0.31252100 0.204228 0.30390000 0.214336 0.30509100 0.195766 127.5630
180: 0.01912900 0.209566 0.02296630 0.206891 0.02147170 0.214579  40.5351
181: 0.01912900 0.209566 0.02147170 0.214579 0.01702550 0.210148  37.3207

Note that mean(range(c(Y1, Y2, Y3))) is used to compute the y value of the geometric centre rather than max(data$Y3)/2, because the range of y values extends below 0:

DT[, range(c(Y1, Y2, Y3))]

[1] -0.00171812  0.43692700
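The difference between the two centre calculations is easiest to see on a small made-up vector. The following is only a sketch; y, centre_half_max and centre_mid_range are hypothetical names and the values are not taken from the OP's data.

```r
# Made-up y values whose minimum lies below zero
y <- c(-0.5, 0.25, 1.5)

# Half of the maximum assumes that the shape starts at y = 0
centre_half_max <- max(y) / 2

# The midpoint of the range also accounts for the negative minimum
centre_mid_range <- mean(range(y))

print(c(half_max = centre_half_max, mid_range = centre_mid_range))
```

Here half of the maximum is 0.75, while the midpoint of the range is 0.5. The two only agree when the minimum is exactly 0.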

Also, the condition
lower < Y1 & Y1 < upper | lower < Y2 & Y2 < upper | lower < Y3 & Y3 < upper
selects 181 triangles, whereas using only lower < Y3 & Y3 < upper selects just 93 triangles.
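The three repeated comparisons can also be generated from a vector of column names, which may be handy if there are more than three points per row. The following is a minimal base R sketch on made-up data, not the code used above; toy, y_cols, in_band and band are hypothetical names, and the band is 0.1 wide rather than 1 cm so that the toy values are easy to follow.

```r
# Made-up triangles: one row per triangle, three corner points each
toy <- data.frame(
  X1 = c(0.0, 1.0, 2.0), Y1 = c(0.00, 0.49, 0.90),
  X2 = c(0.5, 1.5, 2.5), Y2 = c(0.10, 0.60, 1.00),
  X3 = c(0.2, 1.2, 2.2), Y3 = c(0.20, 0.70, 0.95),
  Z  = c(10, 20, 30)
)
y_cols <- c("Y1", "Y2", "Y3")

# Horizontal band around the midpoint of all y values
centre <- mean(range(unlist(toy[y_cols])))
lower <- centre - 0.05
upper <- centre + 0.05

# One logical vector per Y column, combined with OR:
# a row is kept if any of its three points lies inside the band
in_band <- Reduce(`|`, lapply(toy[y_cols], function(y) lower < y & y < upper))

# Whole rows of the selected triangles, in a separate data frame
band <- toy[in_band, ]
print(band)
```

Only the second triangle is kept, because its first point (Y1 = 0.49) lies inside the band even though its other two points do not.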

Plotting the data

Using data.table syntax, the data can be plotted:

# plot all points
DT[, {plot(X1,Y1,pch=19,cex=0.6)
  points(X2,Y2,pch=19,cex=0.6)
  points(X3,Y3,pch=19,cex=0.6)}]
# plot points of selected triangles
DT[lower < Y1 & Y1 < upper | lower < Y2 & Y2 < upper | lower < Y3 & Y3 < upper, 
   {points(X1,Y1,pch=19,cex=0.6, col = "red")
     points(X2,Y2,pch=19,cex=0.6, col = "red")
     points(X3,Y3,pch=19,cex=0.6, col = "red")}]

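Long format

Each triangle consists of 3 pairs of x and y coordinates (plus one z value). If the data are reshaped from wide to long format, the code can be simplified: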

# reshape from wide to long with two value columns
mDT <- melt(DT, measure.vars = patterns("X", "Y"), value.name = c("X", "Y"))[
  # append column with triangle id
  , tn := rowid(variable)]
# compute range of horizontal band
Y_range <- mDT[, mean(range(Y)) + 0.005 * c(-1, 1)]
# get triangle ids which fulfill condition and subset original data set
DT[mDT[between(Y, Y_range[1], Y_range[2], FALSE), unique(tn)]]

             X1       Y1         X2       Y2         X3       Y3        Z
  1: 0.00737923 0.218856 0.00710657 0.215950 0.01030030 0.217116  37.1608
  2: 0.00517517 0.220532 0.00737923 0.218856 0.00761518 0.221670  57.6568
  3: 0.00679651 0.212962 0.00448935 0.214803 0.00407957 0.211809  16.6649
  4: 0.38168000 0.214740 0.38544600 0.212670 0.38533400 0.217001  28.2365
  5: 0.00485705 0.217712 0.00710657 0.215950 0.00737923 0.218856  35.5559
 ---                                                                     
177: 0.07138950 0.230589 0.06918600 0.223825 0.07896900 0.222427  69.3878
178: 0.31531000 0.223694 0.31960800 0.208014 0.32479400 0.215728 104.4240
179: 0.36601500 0.211508 0.37193400 0.210487 0.36756700 0.217592  42.0580
180: 0.08690940 0.224751 0.07950840 0.237030 0.07896900 0.222427  86.2424
181: 0.01912900 0.209566 0.02296630 0.206891 0.02147170 0.214579  40.5351

Again, 181 triangles are selected. The long format is also more convenient for plotting:

# plot all points
mDT[, plot(X, Y, pch = 19, cex = 0.6)]
# plot points of selected triangles
# using a right join on the triangle ids of the selected triangles
mDT[mDT[between(Y, Y_range[1], Y_range[2], FALSE), .(tn = unique(tn))], on = "tn",
    points(X, Y , pch = 19, cex = 0.6, col = "red")]
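The reshape-and-filter step of the long-format approach can be traced on a small made-up table. This is only a sketch under stated assumptions: it needs the data.table package, toy, long, y_range and ids are hypothetical names, and the band is 0.1 wide rather than 1 cm so that the toy values are easy to follow.

```r
library(data.table)

# Made-up triangles in wide format: one row per triangle
toy <- data.table(
  X1 = c(0.0, 1.0, 2.0), Y1 = c(0.00, 0.49, 0.90),
  X2 = c(0.5, 1.5, 2.5), Y2 = c(0.10, 0.60, 1.00),
  X3 = c(0.2, 1.2, 2.2), Y3 = c(0.20, 0.70, 0.95),
  Z  = c(10, 20, 30)
)

# Wide to long: one row per corner point, with two value columns X and Y
long <- melt(toy, measure.vars = patterns("^X", "^Y"), value.name = c("X", "Y"))

# Within each point number (variable), rows keep the original order,
# so rowid() recovers the row number of the triangle in the wide table
long[, tn := rowid(variable)]

# Band around the midpoint of all y values
y_range <- long[, mean(range(Y)) + 0.05 * c(-1, 1)]

# Ids of triangles with at least one point strictly inside the band
ids <- long[between(Y, y_range[1], y_range[2], incbounds = FALSE), unique(tn)]

# Subset the wide table by row number
print(toy[ids])
```

The long table has nine rows (three points for each of three triangles), and only triangle 2 is selected.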

Explanation of the error message

The OP reported that the following lines returned an error message:

data[between(data,lower,upper,incbounds=FALSE)]
data[data>lower&data<upper]
data[sapply(data,function(x)x>lower&x<upper)]

The reason is that the whole data object is used in the subsetting condition rather than single column vectors.
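A small base R sketch on made-up data shows what the condition evaluates to in each case; toy, whole and by_col are hypothetical names. Comparing a whole data frame with a number yields a logical matrix with one cell per value, whereas comparing a single column yields one logical value per row, which is what row subsetting needs.

```r
# Made-up data with two Y columns
toy <- data.frame(Y1 = c(0.1, 0.5), Y2 = c(0.6, 0.9))
lower <- 0.4
upper <- 0.55

# Condition on the whole object: a 2 x 2 logical matrix
whole <- toy > lower & toy < upper
print(is.matrix(whole))

# Condition on a single column: a logical vector with one entry per row
by_col <- lower < toy$Y1 & toy$Y1 < upper
print(toy[by_col, ])
```

A plain data.frame accepts a logical matrix as an index but then returns the matching cell values as a flat vector rather than whole rows; other data frame classes may reject such an index with an error, as reported by the OP.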

【Discussion】:

Wow! Thanks. I took the lower < Y1 & Y1 < upper | lower < Y2 & Y2 < upper | lower < Y3 & Y3 < upper condition and fixed it in my analysis. Great solution. Thanks