Why use the SQL Server 2008 geography data type? Answers

【Question title】: Why use the SQL Server 2008 geography data type?
【Posted】: 2011-11-16 13:36:08
【Question description】:

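I am redesigning a customer database, and one of the new pieces of information I would like to store along with the standard address fields (street, city, etc.) is the geographic location of the address. The only use case I have in mind is letting users map the coordinates on Google Maps when the address cannot otherwise be found, which often happens when the area is newly developed or is in a remote/rural location.

My first inclination was to store latitude and longitude as decimal values, but then I remembered that SQL Server 2008 R2 has a geography data type. I have absolutely no experience with geography, and from my initial research it looks like overkill for my scenario.

For example, to work with latitude and longitude stored as decimal(7,4), I can do this: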

insert into Geotest(Latitude, Longitude) values (47.6475, -122.1393)
select Latitude, Longitude from Geotest

but with geography, I would do this:

insert into Geotest(Geolocation) values (geography::Point(47.6475, -122.1393, 4326))
select Geolocation.Lat, Geolocation.Long from Geotest

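Although it is not that much more complicated, why add complexity if I don't have to?

Before I abandon the idea of using geography, is there anything I should consider? Would it be faster to search for a location using a spatial index than by indexing the latitude and longitude fields? Are there advantages to using geography that I am not aware of? Or, on the flip side, are there caveats I should know about that would discourage me from using geography?

Update

@Erik Philips brought up the ability to do proximity searches with geography, which is very cool.

On the other hand, a quick test shows that a simple select to get the latitude and longitude is significantly slower when using geography (details below), and a comment on the accepted answer to another SO question on geography has me leery:

@SaphuA You're welcome. As a side note, be VERY careful about using a spatial index on a nullable GEOGRAPHY column. There are some serious performance issues, so make that GEOGRAPHY column non-nullable even if you have to remodel your schema. – Thomas Jun 18 at 11:18

All in all, weighing the likelihood of doing proximity searches against the cost in performance and complexity, I have decided to forgo geography in this case.

Details of the test I ran:

I created two tables, one using geography and another using decimal(9,6) for latitude and longitude: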

CREATE TABLE [dbo].[GeographyTest]
(
    [RowId] [int] IDENTITY(1,1) NOT NULL,
    [Location] [geography] NOT NULL,
    CONSTRAINT [PK_GeographyTest] PRIMARY KEY CLUSTERED ( [RowId] ASC )
) 

CREATE TABLE [dbo].[LatLongTest]
(
    [RowId] [int] IDENTITY(1,1) NOT NULL,
    [Latitude] [decimal](9, 6) NULL,
    [Longitude] [decimal](9, 6) NULL,
    CONSTRAINT [PK_LatLongTest] PRIMARY KEY CLUSTERED ([RowId] ASC)
)

and inserted a single row with the same latitude and longitude values into each table:

insert into GeographyTest(Location) values (geography::Point(47.6475, -122.1393, 4326))
insert into LatLongTest(Latitude, Longitude) values (47.6475, -122.1393)

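Finally, running the following code shows that, on my machine, selecting the latitude and longitude is approximately 5 times slower when using geography.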

declare @lat float, @long float,
        @d datetime2, @repCount int, @trialCount int, 
        @geographyDuration int, @latlongDuration int,
        @trials int = 3, @reps int = 100000

create table #results 
(
    GeographyDuration int,
    LatLongDuration int
)

set @trialCount = 0

while @trialCount < @trials
begin

    set @repCount = 0
    set @d = sysdatetime()

    while @repCount < @reps
    begin
        select @lat = Location.Lat,  @long = Location.Long from GeographyTest where RowId = 1
        set @repCount = @repCount + 1
    end

    set @geographyDuration = datediff(ms, @d, sysdatetime())

    set @repCount = 0
    set @d = sysdatetime()

    while @repCount < @reps
    begin
        select @lat = Latitude,  @long = Longitude from LatLongTest where RowId = 1
        set @repCount = @repCount + 1
    end

    set @latlongDuration = datediff(ms, @d, sysdatetime())

    insert into #results values(@geographyDuration, @latlongDuration)

    set @trialCount = @trialCount + 1

end

select * 
from #results

select avg(GeographyDuration) as AvgGeographyDuration, avg(LatLongDuration) as AvgLatLongDuration
from #results

drop table #results

Results (durations in milliseconds):

GeographyDuration LatLongDuration
----------------- ---------------
5146              1020
5143              1016
5169              1030

AvgGeographyDuration AvgLatLongDuration
-------------------- ------------------
5152                 1022

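What is even more surprising is that geography is still slower when no row is selected at all, for example when selecting where RowId = 2, which does not exist: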

GeographyDuration LatLongDuration
----------------- ---------------
1607              948
1610              946
1607              947

AvgGeographyDuration AvgLatLongDuration
-------------------- ------------------
1608                 947

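【Question comments】:

I'm thinking about doing both: saving Lat and Lon in their own columns and having another column for the Geography object, so if all I need is Lat/Lon I take them from the columns, and if I need a proximity search I use the Geography. Is this wise? Are there any disadvantages (besides taking more space...)?
@YuvalA. That certainly sounds reasonable and may prove to be a good compromise. My only concern would be whether having the Geography column in the table has any impact on queries against the table. I don't have experience with that, so you would need to test to verify.
Why do you keep updating your question with new questions instead of asking a new question?
@Chad Not sure what you mean. I updated the body of the question once, and it wasn't to ask more questions.
For those finding this question now, it is worth noting that SQL Server 2012 includes significant performance improvements for spatial indexes. Also worth noting: as long as you store the location information, you can add the spatial information later by using a lookup service to geocode the addresses you have already stored.

Tags: sql-server-2008 geolocation geocoding

【Solution 1】:

If you plan on doing any spatial computation, EF 5.0 allows LINQ expressions like this: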

private Facility GetNearestFacilityToJobsite(DbGeography jobsite)
{   
    var q1 = from f in context.Facilities            
             let distance = f.Geocode.Distance(jobsite)
             where distance < 500 * 1609.344     
             orderby distance 
             select f;   
    return q1.FirstOrDefault();
}

Then there is a very good reason to use geography.
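For anyone without EF at hand, the logic of that query (keep the facilities within 500 miles, order by distance, take the first) can be sketched in plain Python. This is only an illustration under stated assumptions: the `Facility` tuple, the sample coordinates, and the haversine formula on a sphere of mean radius 6,371,008.8 m are all mine, not part of the answer. SQL Server's geography type works on the WGS 84 ellipsoid, so its distances will differ slightly.

```python
import math
from typing import NamedTuple, Optional, Sequence

EARTH_RADIUS_M = 6_371_008.8   # assumed mean earth radius (spherical approximation)
METERS_PER_MILE = 1609.344     # same constant the LINQ query uses


class Facility(NamedTuple):
    """Hypothetical stand-in for the EF entity in the answer."""
    name: str
    lat: float
    lon: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def nearest_facility(facilities: Sequence[Facility], lat: float, lon: float,
                     max_miles: float = 500.0) -> Optional[Facility]:
    """Return the closest facility within max_miles, or None if there is none."""
    limit_m = max_miles * METERS_PER_MILE
    in_range = [
        (dist, f)
        for f in facilities
        for dist in [haversine_m(f.lat, f.lon, lat, lon)]
        if dist < limit_m
    ]
    if not in_range:
        return None  # mirrors FirstOrDefault() returning null
    return min(in_range, key=lambda pair: pair[0])[1]


# Sample data, made up for illustration only.
facilities = [
    Facility("Seattle", 47.6062, -122.3321),
    Facility("Portland", 45.5152, -122.6784),
    Facility("Miami", 25.7617, -80.1918),
]
print(nearest_facility(facilities, 47.6475, -122.1393))
```

Note that this sketch scans every facility. Avoiding that scan on a large table is exactly what a spatial index is for.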

Explanation of spatial within Entity Framework.

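Updated with Creating High Performance Spatial Databases

As I noted on Noel Abrahams' answer:

A note on space: each coordinate is stored as a double-precision floating-point number that is 64 bits (8 bytes) long, and an 8-byte binary value is roughly equivalent to 15 digits of decimal precision, so comparing it with a decimal(9,6), which is only 5 bytes, is not exactly a fair comparison. For a real comparison the decimal would have to be at least Decimal(15,12) (9 bytes) for each of Lat and Long (18 bytes in total).

So comparing storage types: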

CREATE TABLE dbo.Geo
(    
geo geography
)
GO

CREATE TABLE dbo.LatLng
(    
    lat decimal(15, 12),   
    lng decimal(15, 12)
)
GO

INSERT dbo.Geo
SELECT geography::Point(12.3456789012345, 12.3456789012345, 4326) 
UNION ALL
SELECT geography::Point(87.6543210987654, 87.6543210987654, 4326) 

GO 10000

INSERT dbo.LatLng
SELECT  12.3456789012345, 12.3456789012345 
UNION
SELECT 87.6543210987654, 87.6543210987654

GO 10000

EXEC sp_spaceused 'dbo.Geo'

EXEC sp_spaceused 'dbo.LatLng'

Results:

name    rows    data     
Geo     20000   728 KB   
LatLng  20000   560 KB

The geography data type takes up 30% more space.
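The arithmetic behind those figures is easy to check. The sketch below is my own illustration, not part of the original test: it encodes the documented storage sizes for SQL Server decimal columns by precision and computes the relative overhead from the `sp_spaceused` numbers reported above. The function names are made up for this example.

```python
def decimal_storage_bytes(precision: int) -> int:
    """Storage size in bytes of a SQL Server decimal/numeric value of the given precision."""
    if not 1 <= precision <= 38:
        raise ValueError("precision must be between 1 and 38")
    if precision <= 9:
        return 5
    if precision <= 19:
        return 9
    if precision <= 28:
        return 13
    return 17


def overhead_percent(geo_kb: float, latlng_kb: float) -> float:
    """How much larger the geography table is, as a percentage of the lat/long table."""
    return (geo_kb / latlng_kb - 1) * 100


# Bytes needed for one latitude/longitude pair at each precision.
print("decimal(9,6) pair:  ", 2 * decimal_storage_bytes(9), "bytes")
print("decimal(15,12) pair:", 2 * decimal_storage_bytes(15), "bytes")

# Overhead computed from the data sizes measured above (728 KB vs 560 KB).
print("geography overhead: ", round(overhead_percent(728, 560)), "%")
```

The table sizes also include per-row overhead, which is why the measured ratio is smaller than a comparison of the raw column sizes alone would suggest.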

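Additionally, the geography data type is not limited to storing a Point. It can also store LineString, CircularString, CompoundCurve, Polygon, CurvePolygon, GeometryCollection, MultiPoint, MultiLineString, MultiPolygon and more. Any attempt to store even the simplest geography type beyond a Point (for example a LINESTRING(1 1, 2 2) instance) as Lat/Long columns requires an additional row for each point, a column recording the order of the points, and another column for grouping the rows that belong to one line. SQL Server also provides methods on the geography data type, including calculating Area, Boundary, Length, Distances, and more.

It seems unwise to store latitude and longitude as Decimal in SQL Server.

Update 2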

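If you plan on doing any calculations such as distance or area, calculating these properly over the surface of the earth is difficult. Each geography value stored in SQL Server is stored together with a Spatial Reference ID (SRID). These IDs identify different reference ellipsoids (4326, which is WGS 84, is the one commonly used for the earth). This means that calculations in SQL Server are actually performed over the surface of the earth, rather than along a straight line that could pass through the earth.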
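To give a feel for how much the two notions of distance differ, here is a minimal Python sketch that compares the distance along the surface with the straight-line chord through the earth. It assumes a perfect sphere with a mean radius of 6371.0088 km (SQL Server uses the ellipsoid defined by the SRID instead), and the two city coordinates are sample values I picked for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean radius; a sphere, not the WGS 84 ellipsoid


def central_angle(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Angle in radians between two points (degrees in), via the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * math.asin(math.sqrt(min(1.0, a)))


def surface_distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Length of the great-circle arc, measured along the surface."""
    return EARTH_RADIUS_KM * central_angle(lat1, lon1, lat2, lon2)


def chord_distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Length of the straight line between the points, cutting through the sphere."""
    return 2 * EARTH_RADIUS_KM * math.sin(central_angle(lat1, lon1, lat2, lon2) / 2)


seattle = (47.6062, -122.3321)   # sample coordinates
london = (51.5074, -0.1278)

arc = surface_distance_km(*seattle, *london)
chord = chord_distance_km(*seattle, *london)
print(f"along the surface: {arc:.0f} km, through the earth: {chord:.0f} km")
```

For nearby points the two values are almost identical, so the difference only matters over long distances. The bigger practical benefit of geography is that you do not have to write and verify this math yourself.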

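【Comments】:

To add to this, using Geography really extends what SQL searches can do, because the Geography data type lets you create multiple regions of almost any size and shape.
Thanks again. I did ask for reasons to consider using geography, and you provided some good ones. Ultimately I decided to just use decimal fields in this case (see my long-winded update), but it is good to know that I can use geography if I ever need to do anything fancier than simply mapping coordinates.

【Solution 2】:

Another thing to consider is the storage space taken up by each method. The geography type is stored as VARBINARY(MAX). Try running this script: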

CREATE TABLE dbo.Geo
(
    geo geography

)

GO

CREATE TABLE dbo.LatLon
(
    lat decimal(9, 6)
,   lon decimal(9, 6)

)

GO

INSERT dbo.Geo
SELECT geography::Point(36.204824, 138.252924, 4326) UNION ALL
SELECT geography::Point(51.5220066, -0.0717512, 4326) 

GO 10000

INSERT dbo.LatLon
SELECT  36.204824, 138.252924 UNION
SELECT 51.5220066, -0.0717512

GO 10000

EXEC sp_spaceused 'dbo.Geo'
EXEC sp_spaceused 'dbo.LatLon'

Results:

name    rows    data     
Geo     20000   728 KB   
LatLon  20000   400 KB

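The geography data type takes up almost twice as much space.

【Comments】:

A note on space: each coordinate is stored as a double-precision floating-point number that is 64 bits (8 bytes) long, and an 8-byte binary value is roughly equivalent to 15 digits of decimal precision, so comparing it with a decimal(9,6), which is only 5 bytes, is not exactly a fair comparison. For a real comparison the decimal would have to be at least Decimal(15,12) (9 bytes) for each of Lat and Long (18 bytes in total).
@ErikPhilips The point is: why would you use decimal(15, 12) when you only need decimal(9, 6)? The comparison above is a practical comparison, not an academic exercise.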
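One way to put numbers on this disagreement is to ask what ground distance the last stored decimal place represents. The sketch below is my own back-of-the-envelope illustration, assuming a spherical earth with a mean radius of 6,371,008.8 m. It uses degrees of latitude, which is the worst case because a degree of longitude shrinks toward the poles.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # assumed mean earth radius (spherical approximation)


def meters_per_degree_latitude() -> float:
    """Length of one degree of latitude on the assumed sphere."""
    return 2 * math.pi * EARTH_RADIUS_M / 360


def resolution_m(decimal_places: int) -> float:
    """Ground distance represented by one unit in the last stored decimal place."""
    return meters_per_degree_latitude() * 10 ** -decimal_places


# decimal(7,4), decimal(9,6) and decimal(15,12) keep 4, 6 and 12 decimal places.
for places in (4, 6, 12):
    print(f"{places:>2} decimal places: about {resolution_m(places):.3g} m")
```

Under these assumptions four decimal places resolve to roughly 11 m and six to roughly 11 cm, which supports the view that decimal(9,6) is enough for plotting an address on a map.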

【Solution 3】:

CREATE FUNCTION [dbo].[fn_GreatCircleDistance]
(
    @Latitude1 As Decimal(38, 19), @Longitude1 As Decimal(38, 19),
    @Latitude2 As Decimal(38, 19), @Longitude2 As Decimal(38, 19),
    @ValuesAsDecimalDegrees As bit = 1,
    @ResultAsMiles As bit = 0
)
RETURNS decimal(38,19)
AS
BEGIN
    -- Declare the return variable here
    DECLARE @ResultVar  decimal(38,19)

    -- Add the T-SQL statements to compute the return value here
/*
Credit for conversion algorithm to Chip Pearson
Web Page: www.cpearson.com/excel/latlong.aspx
Email: chip@cpearson.com
Phone: (816) 214-6957 USA Central Time (-6:00 UTC)
Between 9:00 AM and 7:00 PM

Ported to Transact SQL by Paul Burrows BCIS
*/
DECLARE  @C_RADIUS_EARTH_KM As Decimal(38, 19)
SET @C_RADIUS_EARTH_KM = 6370.97327862
DECLARE  @C_RADIUS_EARTH_MI As Decimal(38, 19)
SET @C_RADIUS_EARTH_MI = 3958.73926185
DECLARE  @C_PI As Decimal(38, 19)
SET @C_PI =  pi()

DECLARE @Lat1 As Decimal(38, 19)
DECLARE @Lat2 As Decimal(38, 19)
DECLARE @Long1 As Decimal(38, 19)
DECLARE @Long2 As Decimal(38, 19)
DECLARE @X As bigint
DECLARE @Delta As Decimal(38, 19)

If @ValuesAsDecimalDegrees = 1 
Begin
    set @X = 1
END
Else
Begin
    set @X = 24
End 

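-- convert to decimal degrees: @X is 1 when the inputs are already decimal degrees;
-- otherwise they are scaled by 24, which appears to assume Excel-style time values
-- (fractions of a day), as in the Excel code this was ported from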
set @Lat1 = @Latitude1 * @X
set @Long1 = @Longitude1 * @X
set @Lat2 = @Latitude2 * @X
set @Long2 = @Longitude2 * @X

-- convert to radians: radians = (degrees/180) * PI
set @Lat1 = (@Lat1 / 180) * @C_PI
set @Lat2 = (@Lat2 / 180) * @C_PI
set @Long1 = (@Long1 / 180) * @C_PI
set @Long2 = (@Long2 / 180) * @C_PI

-- get the central spherical angle
set @Delta = ((2 * ASin(Sqrt((power(Sin((@Lat1 - @Lat2) / 2) ,2)) + 
    Cos(@Lat1) * Cos(@Lat2) * (power(Sin((@Long1 - @Long2) / 2) ,2))))))

If @ResultAsMiles = 1 
Begin
    set @ResultVar = @Delta * @C_RADIUS_EARTH_MI
End
Else
Begin
    set @ResultVar = @Delta * @C_RADIUS_EARTH_KM
End

    -- Return the result of the function
    RETURN @ResultVar

END

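【Comments】:

New answers are always welcome, but please add some context. A brief explanation of how the code above solves the problem would make the answer more useful to others.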