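【Posted】: 2013-06-26 14:55:08
【Question】:
I'm using Entity Framework (code first) and finding that the order in which I specify clauses in my LINQ queries has a huge impact on performance. For example: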
using (var db = new MyDbContext())
{
var mySize = "medium";
var myColour = "vermilion";
var list1 = db.Widgets.Where(x => x.Colour == myColour && x.Size == mySize).ToList();
var list2 = db.Widgets.Where(x => x.Size == mySize && x.Colour == myColour).ToList();
}
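If the (rare) colour clause precedes the (common) size clause, the query is fast, but the other way round it is orders of magnitude slower. The table has a couple of million rows, and the two fields in question are nvarchar(50), so not normalised, but they are each indexed. The fields are specified code first as follows: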
[StringLength(50)]
public string Colour { get; set; }
[StringLength(50)]
public string Size { get; set; }
Should I really have to worry about such things in my LINQ queries? I thought that was the database's job.
System specs are:
- Visual Studio 2010
- .NET 4
- EntityFramework 6.0.0-beta1
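- SQL Server 2008 R2 Web (64 bit)
Update:
Right, for any gluttons for punishment, the effect can be replicated as below. The issue seems to be extremely sensitive to a number of factors, so please bear with the contrived nature of some of this:
Install EntityFramework 6.0.0-beta1 via NuGet, then generate the code-first classes: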
public class Widget
{
[Key]
public int WidgetId { get; set; }
[StringLength(50)]
public string Size { get; set; }
[StringLength(50)]
public string Colour { get; set; }
}
public class MyDbContext : DbContext
{
public MyDbContext()
: base("DefaultConnection")
{
}
public DbSet<Widget> Widgets { get; set; }
}
Generate dummy data with the following SQL:
insert into Widget (Size, Colour)
select RND1 + ' is the name is this size' as Size,
RND2 + ' is the name of this colour' as Colour
from (Select top 1000000
CAST(abs(Checksum(NewId())) % 100 as varchar) As RND1,
CAST(abs(Checksum(NewId())) % 10000 as varchar) As RND2
from master..spt_values t1 cross join master..spt_values t2) t3
Add one index each for Colour and Size, then query with:
string mySize = "99 is the name is this size";
string myColour = "9999 is the name of this colour";
using (var db = new MyDbContext())
{
var list1= db.Widgets.Where(x => x.Colour == myColour && x.Size == mySize).ToList();
}
using (var db = new MyDbContext())
{
var list2 = db.Widgets.Where(x => x.Size == mySize && x.Colour == myColour).ToList();
}
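To put numbers on the difference between the two orderings, each query can be timed with a Stopwatch. The following is only a minimal sketch: `QueryTimer` and `TimeIt` are hypothetical names, not part of the original code, and the lambda in `Main` is a placeholder workload standing in for the EF query. The warm-up call is there because the first use of a context includes one-off model initialisation that would otherwise skew the comparison.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

static class QueryTimer
{
    // Hypothetical helper: runs the query once as a warm-up,
    // then returns the average time of `runs` further executions.
    public static TimeSpan TimeIt(Action query, int runs = 3)
    {
        query(); // warm-up, excluded from the measurement

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++)
        {
            query();
        }
        sw.Stop();

        return TimeSpan.FromTicks(sw.Elapsed.Ticks / runs);
    }

    static void Main()
    {
        // Placeholder workload. With the context from the question this would be e.g.
        // () => db.Widgets.Where(x => x.Colour == myColour && x.Size == mySize).ToList()
        TimeSpan elapsed = TimeIt(
            () => Enumerable.Range(0, 100000).Where(n => n % 7 == 0).ToList());

        Console.WriteLine("average: {0} ms", elapsed.TotalMilliseconds);
    }
}
```

Timing each ordering with its own context instance, as in the two using blocks above, keeps one query's tracked entities from influencing the other.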
The issue seems related to the obtuse collection of NULL comparisons in the generated SQL, shown below.
exec sp_executesql N'SELECT
[Extent1].[WidgetId] AS [WidgetId],
[Extent1].[Size] AS [Size],
[Extent1].[Colour] AS [Colour]
FROM [dbo].[Widget] AS [Extent1]
WHERE ((([Extent1].[Size] = @p__linq__0)
AND ( NOT ([Extent1].[Size] IS NULL OR @p__linq__0 IS NULL)))
OR (([Extent1].[Size] IS NULL) AND (@p__linq__0 IS NULL)))
AND ((([Extent1].[Colour] = @p__linq__1) AND ( NOT ([Extent1].[Colour] IS NULL
OR @p__linq__1 IS NULL))) OR (([Extent1].[Colour] IS NULL)
AND (@p__linq__1 IS NULL)))',N'@p__linq__0 nvarchar(4000),@p__linq__1 nvarchar(4000)',
@p__linq__0=N'99 is the name is this size',
@p__linq__1=N'9999 is the name of this colour'
go
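Changing the equality operator in the LINQ to StartsWith() makes the problem go away, as does changing either of the two fields to be non-nullable in the database.
I despair!
Update 2:
Some assistance for any bounty hunters: the issue can be reproduced on SQL Server 2008 R2 Web (64 bit) in a clean database, as follows: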
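The NULL checks in the generated WHERE clause look redundant, but they appear to exist so that the SQL behaves like C# equality, where null == null is true, rather than like SQL's "=", which yields UNKNOWN when either side is NULL. The sketch below transcribes the per-column predicate into C#, relying on the fact that the lifted `&`, `|` and `!` operators on `bool?` follow three-valued logic. `SqlEquals` and `GeneratedPredicate` are hypothetical names used only for this illustration, and the string comparison ignores database collation.

```csharp
using System;

static class NullSemanticsDemo
{
    // Models SQL "=": the result is UNKNOWN (null) if either operand is NULL.
    // Assumption: ordinal comparison; a real column would use its collation.
    static bool? SqlEquals(string a, string b)
    {
        if (a == null || b == null) return null;
        return a == b;
    }

    // The predicate generated for one column, transcribed term by term:
    // ((col = @p) AND NOT (col IS NULL OR @p IS NULL))
    //   OR ((col IS NULL) AND (@p IS NULL))
    static bool? GeneratedPredicate(string column, string parameter)
    {
        bool? columnIsNull = column == null;
        bool? parameterIsNull = parameter == null;

        return (SqlEquals(column, parameter) & !(columnIsNull | parameterIsNull))
             | (columnIsNull & parameterIsNull);
    }

    static void Main()
    {
        string[] values = { null, "medium", "large" };

        foreach (var column in values)
        {
            foreach (var parameter in values)
            {
                bool? generated = GeneratedPredicate(column, parameter);
                bool csharp = column == parameter;

                Console.WriteLine("{0,-8} {1,-8} generated={2} c#={3}",
                    column ?? "NULL", parameter ?? "NULL", generated, csharp);
            }
        }
    }
}
```

For every combination the generated predicate is never UNKNOWN and agrees with the C# comparison. This is also consistent with the observation above that making a column non-nullable changes the behaviour, since the IS NULL branches on that column can then never be true.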
CREATE TABLE [dbo].[Widget](
[WidgetId] [int] IDENTITY(1,1) NOT NULL,
[Size] [nvarchar](50) NULL,
[Colour] [nvarchar](50) NULL,
CONSTRAINT [PK_dbo.Widget] PRIMARY KEY CLUSTERED
(
[WidgetId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX IX_Widget_Size ON dbo.Widget
(
Size
) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX IX_Widget_Colour ON dbo.Widget
(
Colour
) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
insert into Widget (Size, Colour)
select RND1 + ' is the name is this size' as Size,
RND2 + ' is the name of this colour' as Colour
from (Select top 1000000
CAST(abs(Checksum(NewId())) % 100 as varchar) As RND1,
CAST(abs(Checksum(NewId())) % 10000 as varchar) As RND2
from master..spt_values t1 cross join master..spt_values t2) t3
GO
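Then compare the relative performance of the following two queries (you may need to adjust the parameter test values to get a query that returns a couple of rows in order to observe the effect, i.e. the second query is much slower).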
exec sp_executesql N'SELECT
[Extent1].[WidgetId] AS [WidgetId],
[Extent1].[Size] AS [Size],
[Extent1].[Colour] AS [Colour]
FROM [dbo].[Widget] AS [Extent1]
WHERE ((([Extent1].[Colour] = @p__linq__0)
AND ( NOT ([Extent1].[Colour] IS NULL
OR @p__linq__0 IS NULL)))
OR (([Extent1].[Colour] IS NULL)
AND (@p__linq__0 IS NULL)))
AND ((([Extent1].[Size] = @p__linq__1)
AND ( NOT ([Extent1].[Size] IS NULL
OR @p__linq__1 IS NULL)))
OR (([Extent1].[Size] IS NULL) AND (@p__linq__1 IS NULL)))',
N'@p__linq__0 nvarchar(4000),@p__linq__1 nvarchar(4000)',
@p__linq__0=N'9999 is the name of this colour',
@p__linq__1=N'99 is the name is this size'
go
exec sp_executesql N'SELECT
[Extent1].[WidgetId] AS [WidgetId],
[Extent1].[Size] AS [Size],
[Extent1].[Colour] AS [Colour]
FROM [dbo].[Widget] AS [Extent1]
WHERE ((([Extent1].[Size] = @p__linq__0)
AND ( NOT ([Extent1].[Size] IS NULL
OR @p__linq__0 IS NULL)))
OR (([Extent1].[Size] IS NULL)
AND (@p__linq__0 IS NULL)))
AND ((([Extent1].[Colour] = @p__linq__1)
AND ( NOT ([Extent1].[Colour] IS NULL
OR @p__linq__1 IS NULL)))
OR (([Extent1].[Colour] IS NULL)
AND (@p__linq__1 IS NULL)))',
N'@p__linq__0 nvarchar(4000),@p__linq__1 nvarchar(4000)',
@p__linq__0=N'99 is the name is this size',
@p__linq__1=N'9999 is the name of this colour'
Like me, you may also find that if you rerun the dummy data insert so that there are now 2 million rows, the problem goes away.
【Comments】:
-
Using nvarchar for those columns seems like a poor choice.
-
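I'd expect the order in the generated SQL to be reversed as well. Does the order matter if you run the queries directly against the database, without Entity Framework?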
-
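Check the T-SQL that is output. Check the query plan. Usually the query optimizer finds the best plan, but in some cases the order of the T-SQL affects the plan; for two simple WHERE conditions, though, that surprises me.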
-
But what matters is how LINQ interprets it. Check the T-SQL that comes out of LINQ.
-
As far as the bounty goes, it's hard to be more credible than @PaulWhite
Tags: c# .net sql-server linq entity-framework