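[Posted]: 2019-08-21 16:24:36
[Question]:
I have a MongoDB aggregation pipeline with many stages (match on indexed fields, add fields, sort, collapse, sort again, page, project the results). If I comment out every stage except the first match stage, the query executes very quickly (0.075 seconds), because it uses the proper index. However, if I then add any subsequent stage, even something as simple as getting a count of the results, the query starts taking 27 seconds!
Here is the query (don't get too caught up in its complexity, since the index appears to execute it quickly...):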
db.runCommand({
  aggregate: 'ResidentialProperty',
  allowDiskUse: false,
  explain: false,
  cursor: {},
  pipeline: [
    {
      "$match" : {
        "$and" : [
          { "CountyPlaceId" : 20006073 },
          {
            "$or" : [
              {
                "$and" : [
                  { "ForSaleGroupId" : { "$in" : [2, 3] } },
                  {
                    "$or" : [
                      { "ForSaleGroupId" : { "$nin" : [2, 3] } },
                      { "ListDate" : { "$gte" : ISODate("2019-02-21T00:00:00.000Z") } }
                    ]
                  },
                  {
                    "$or" : [
                      { "ForSaleGroupId" : { "$ne" : 3 } },
                      { "PendingSaleDate" : { "$gte" : ISODate("2019-02-21T00:00:00.000Z") } }
                    ]
                  }
                ]
              },
              {
                "ForLeaseGroupId" : { "$in" : [2, 3] },
                "$or" : [
                  { "ForLeaseGroupId" : { "$nin" : [2, 3] } },
                  { "ListDate" : { "$gte" : ISODate("2019-02-21T00:00:00.000Z") } }
                ]
              },
              {
                "DistressedGroupId" : { "$in" : [2, 3, 4] },
                "$or" : [
                  { "DistressedGroupId" : 1 },
                  { "DistressedDate" : { "$gte" : ISODate("2019-02-21T00:00:00.000Z") } }
                ]
              },
              {
                "$and" : [
                  { "OffMarketGroupId" : { "$in" : [3, 8] } },
                  {
                    "$or" : [
                      { "OffMarketGroupId" : 1 },
                      { "OffMarketDate" : { "$gte" : ISODate("2019-02-21T00:00:00.000Z") } }
                    ]
                  },
                  {
                    "$or" : [
                      { "OffMarketGroupId" : { "$nin" : [7, 8] } },
                      { "SoldDate" : { "$gte" : ISODate("2019-02-21T00:00:00.000Z") } },
                      { "OffMarketDate" : { "$gte" : ISODate("2019-02-21T00:00:00.000Z") } }
                    ]
                  }
                ]
              },
              {
                "$or" : [
                  { "ForSaleGroupId" : { "$ne" : 1 } },
                  { "OffMarketGroupId" : 6 }
                ],
                "ChangedListPriceDate" : { "$gte" : ISODate("2019-02-21T00:00:00.000Z") }
              }
            ]
          },
          {
            "$or" : [
              { "ForSaleGroupId" : { "$ne" : 1 } },
              { "ForLeaseGroupId" : { "$ne" : 1 } },
              { "OffMarketGroupId" : 6 },
              { "IsListingOnly" : true },
              { "OrgId" : "" },
              { "OffMarketDate" : { "$gte" : ISODate("2018-11-23T00:00:00.000Z") } }
            ]
          },
          { "PropertyTypeId" : { "$in" : [1, 5, 6] } }
        ]
      }
    },
    // Other steps omitted, since it's slow regardless...
    { "$count": "Count" }
  ]
})
Here is what a sample ResidentialProperty document looks like:
{
  "_id" : 294401911,
  "PropertyId" : 86689647,
  "OrgId" : "caclaw-n",
  "OrgSecurableId" : 1,
  "ListingId" : "19443870",
  "Location" : {
    "type" : "Point",
    "coordinates" : [
      -117.316207,
      33.104623
    ]
  },
  "CountyPlaceId" : 20006073,
  "CityPlaceId" : 50611194,
  "ZipCodePlaceId" : 70092011,
  "MetropolitanAreaPlaceId" : 10041740,
  "MinorCivilDivisionPlaceId" : 30002074,
  "NeighborhoodPlaceId" : 150813707,
  "MacroNeighborhoodPlaceId" : 160051666,
  "SubNeighborhoodPlaceId" : null,
  "ResidentialNeighborhoodsPlaceId" : 220978234,
  "ForSaleGroupId" : 1,
  "DistressedGroupId" : 1,
  "OffMarketGroupId" : 1,
  "ForLeaseGroupId" : 2,
  "ForSaleDistressedGroupId" : 1,
  "OffMarketDistressedGroupId" : 1,
  "ListDate" : ISODate("2019-03-15T00:00:00.000Z"),
  "PendingSaleDate" : null,
  "OffMarketDate" : null,
  "DistressedDate" : null,
  "SoldDate" : null,
  "ChangedListPriceDate" : null,
  "ListPrice" : null,
  "ListPriceRangeLow" : null,
  "ListPriceRangeHigh" : null,
  "ListPricePerSqFt" : null,
  "ListPricePerLotSizeSqFt" : null,
  "SoldPrice" : 0,
  "SoldPricePerSqFt" : 0.0,
  "SoldPricePerLotSizeSqFt" : 0.0,
  "MonthlyLeaseListPrice" : 6950.0,
  "MonthlyLeaseListPricePerSqFt" : 2.5402,
  "MonthlyLeaseListPricePerLotSizeSqFt" : 2.5402,
  "MonthlyLeaseSoldPrice" : null,
  "MonthlyLeaseSoldPricePerSqFt" : null,
  "MonthlyLeaseSoldPricePerLotSizeSqFt" : null,
  "SoldToListPriceRatio" : 0.0,
  "EstimatedToListPriceRatio" : 0.0,
  "AppPropertyModeId" : 1,
  "PropertyTypeId" : 1,
  "PropertySubTypeId" : null,
  "Bedrooms" : 4,
  "Bathrooms" : 3,
  "LivingAreaInSqFt" : 2736,
  "LotSizeInSqFt" : NumberLong(5073),
  "YearBuilt" : 2004,
  "GarageSpaces" : 2,
  "BuildingSizeInSqFt" : 2736,
  "Units" : 1,
  "Rooms" : null,
  "NetIncome" : null,
  "EstimateTypeId" : 3,
  "EstimatedValue" : 1253740,
  "EstimatedValuePerSqFt" : 458.2383,
  "EstimatedValuePerLotSizeSqFt" : 247.1397,
  "CapRate" : null,
  "Keywords" : [
    "$6,950/month long-term minimum of 30 days. $8,950 June and then $9,950 for July or August. BeautifulWaters End Luxury Home walking distance to the beach. Short or Long term Fully Furnished (1 Month plus) with brand new furnishings & fresh paint & new carpets. Enjoy the beach & golf community lifestyle of Carlsbad, CA in this delightful North County San Diego vacation rental home! This spacious & comfortable two story single family home sits on a cul-de-sac in the gated community of Waters End. Easy walk to the beach and close proximity to the Carlsbad train station, area restaurants, shopping, golf courses, and San Diego theme park attractions. The community also offers many health and beauty spas, yoga, and meditation centers, nearby world-renowned golf courses (such as Torrey Pines, Aviara, and La Costa Resort and Spa) as well as some of the best cycling in all of San Diego County.",
    "San Diego (City) (Sd)",
    "R1",
    "Single Family"
  ],
  "OwnerName" : "Brookside Land Trust, ; State Trustee Services Llc",
  "TenantNames" : null,
  "Apn" : "214-610-49-00",
  "OpenHouseStartDate" : null,
  "OpenHouseEndDate" : null,
  "ListingPhotoCount" : 25,
  "StatusChangedDate" : ISODate("2019-06-28T00:00:00.000Z"),
  "SortAddress" : "BrooksideCtZZZZZZZZZZ00000000000000000617ZZZZZCarlsbadCA92011",
  "SortOwnerName" : "BrooksideLandTrust,;State",
  "ListingIdAlphaNum" : "19443870",
  "IsListingOnly" : false
}
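The count returns 27,815 results. I don't think this is an index problem, because the first match stage executes so quickly. I also don't think the problem is hitting the 100 MB memory limit per aggregation pipeline stage, because I set allowDiskUse: false and the query still executes without error.
Also interesting: another aggregation pipeline query against the same collection filters down to 45,081 records after its first match stage, yet a count after it returns in only 3 seconds. So the problem can't really be blamed on the document structure.
So what is going on here? Why is the match filtering so fast, while anything after it, even something as simple as a count, is so slow? I've tried enabling explain: true and nothing stands out. The match stage shows that it uses the proper index. The count stage doesn't include any further detail in the explain output.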
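One possible explanation for a fast first page and a slow count can be illustrated with a small simulation. The sketch below is plain JavaScript, not MongoDB code: `makeLazyCursor`, `nextBatch`, `countAll`, and the generated data are hypothetical names and values chosen for the example. It assumes a cursor that evaluates the filter lazily, one batch at a time, so fetching 50 results touches far fewer documents than counting every match.

```javascript
// Simulation only: a hypothetical lazy cursor, not the MongoDB driver API.
// stats.examined counts how many documents had to be looked at.
function makeLazyCursor(docs, predicate, stats) {
  let pos = 0;
  return {
    // Return up to batchSize matching documents, scanning only as far as needed.
    nextBatch(batchSize) {
      const batch = [];
      while (pos < docs.length && batch.length < batchSize) {
        const doc = docs[pos++];
        stats.examined++;
        if (predicate(doc)) batch.push(doc);
      }
      return batch;
    },
    // Exhaust the cursor and count every remaining match, as a count must.
    countAll() {
      let n = 0;
      let batch;
      while ((batch = this.nextBatch(1000)).length > 0) n += batch.length;
      return n;
    },
  };
}

// Made-up data: 100,000 documents, every second one matches the filter.
const docs = Array.from({ length: 100000 }, (_, i) => ({ id: i, groupId: i % 2 }));
const matches = (doc) => doc.groupId === 1;

// Case 1: fetch only the first 50 results.
const firstPageStats = { examined: 0 };
const firstPage = makeLazyCursor(docs, matches, firstPageStats).nextBatch(50);

// Case 2: count all results, which forces a full pass over the data.
const countStats = { examined: 0 };
const total = makeLazyCursor(docs, matches, countStats).countAll();

console.log(firstPage.length, firstPageStats.examined); // 50 100
console.log(total, countStats.examined);                // 50000 100000
```

Under that assumption, a fast first page says little about the cost of the whole match. In the mongo shell, a fairer timing would exhaust the cursor first, for example with `toArray()` or `itcount()`, before reading the elapsed time.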
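[Comments]:
- Given the complexity and heavy use of $or, it is hard to imagine how your $match is well supported by an index. When you tested the $match alone, did you exhaust the result cursor, or did you only get the first batch of results?
- To clarify the comment above: a MongoDB cursor does not retrieve the entire result set at once. Instead, small batches are retrieved at any given time for iteration. If you try to iterate through the whole cursor with only the $match results, there is a good chance you will still see the 27-second execution time. Given the number of conditional branches and the number of different fields being queried, your initial match is unlikely to be efficient.
- Your first set of changes should be to remove the first and last entries (CountyPlaceId and PropertyTypeId) from the top-level $and operation and place them at the very start of the match as your query predicates. You should then have three top-level $match fields: CountyPlaceId, PropertyTypeId, and $and. This should significantly reduce the overhead in the initial match. Further optimization may be needed, but make these changes first and go from there.
- Thanks, you were both spot on. I had assumed that since the first stage executed quickly, there was no problem with the query or the index, but I was only retrieving the first 50 results, which is why it was so fast. I was able to optimize the query and now get a more reasonable 2-3 second duration. If either of you wants to post your comments as an answer, I'll accept it.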
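The restructuring suggested in the comments can be sketched in plain JavaScript. `hoistSimpleConditions` is a hypothetical helper written for this illustration, and the two `$or` clauses in `original` are shortened stand-ins for the much larger branches in the question. The helper lifts single-field conditions out of a top-level `$and` so that they become top-level predicates of the `$match`, leaving the remaining clauses inside `$and`.

```javascript
// Sketch only: hoistSimpleConditions is a hypothetical helper, not a MongoDB API.
// It moves single-field clauses out of a top-level $and into top-level predicates.
function hoistSimpleConditions(match) {
  const { $and: clauses = [], ...rest } = match;
  const hoisted = { ...rest };
  const remaining = [];
  for (const clause of clauses) {
    const keys = Object.keys(clause);
    // Hoist only plain field conditions, and never overwrite an existing key.
    const isSimple =
      keys.length === 1 && !keys[0].startsWith("$") && !(keys[0] in hoisted);
    if (isSimple) {
      hoisted[keys[0]] = clause[keys[0]];
    } else {
      remaining.push(clause);
    }
  }
  return remaining.length > 0 ? { ...hoisted, $and: remaining } : hoisted;
}

// Shortened stand-in for the $match in the question (assumed shape).
const original = {
  $and: [
    { CountyPlaceId: 20006073 },
    { $or: [{ ForSaleGroupId: { $in: [2, 3] } }, { ForLeaseGroupId: { $in: [2, 3] } }] },
    { $or: [{ ForSaleGroupId: { $ne: 1 } }, { IsListingOnly: true }] },
    { PropertyTypeId: { $in: [1, 5, 6] } },
  ],
};

const rewritten = hoistSimpleConditions(original);
console.log(Object.keys(rewritten).join(", ")); // CountyPlaceId, PropertyTypeId, $and
console.log(JSON.stringify(rewritten, null, 2));
```

The rewritten filter is logically equivalent to the original, because top-level fields in a query document are combined with an implicit AND. Whether it actually runs faster depends on the query planner and the available indexes, so it is worth comparing the explain output before and after the change.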
Tags: mongodb mongodb-query aggregation-framework