【Question title】: SQL Server Full Text Search Condition for FORMSOF for phrase exclude Stop Words
【Posted】: 2012-02-17 13:41:07
【Question description】:

I want to search for phrases that contain stop words, for example "Line Through Crack". "Through" is a stop word. I want to get the same results as this query:

CONTAINS(*, 'FORMSOF(INFLECTIONAL, "Line") AND FORMSOF(INFLECTIONAL, "Crack")')

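That is, all rows that contain some inflectional form of every word in the phrase except the stop words. Can I do this if the client does not know the stop word list?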

【Question comments】:

    Tags: sql sql-server sql-server-2008 tsql full-text-search


    【Answer 1】:

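    What version of SQL Server are you using? If it is 2008 or later, you can retrieve the stop word list programmatically at query time. You can then check whether any of the search terms are in the stop word list and leave them out of the CONTAINS query string.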

    The following queries return the stop word list (for US English, i.e. language ID 1033):

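    -- List the languages available to full-text search and their LCIDs
    SELECT lcid, name FROM sys.fulltext_languages ORDER BY lcid;
    
    -- Then use that LCID to list the stop words of the system stoplist
    -- (sys.fulltext_stopwords holds the words of user-defined stoplists instead)
    SELECT * FROM sys.fulltext_system_stopwords WHERE language_id = 1033;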
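
    To check specific search terms rather than read the whole list, the terms can be joined against the stop word view. This is a minimal sketch, assuming the default system stoplist (exposed through sys.fulltext_system_stopwords) and US English (1033); the three sample words are taken from the phrase in the question, and the column alias is_stopword is only illustrative.

    ```sql
    -- Flag which of the sample terms are system stop words for language 1033.
    -- VALUES as a derived table needs SQL Server 2008 or later.
    SELECT  t.word,
            CASE WHEN sw.stopword IS NULL THEN 0 ELSE 1 END AS is_stopword
    FROM    (VALUES (N'line'), (N'through'), (N'crack')) AS t(word)
    LEFT    JOIN sys.fulltext_system_stopwords AS sw
            ON  t.word = sw.stopword COLLATE database_default
            AND sw.language_id = 1033;
    ```

    For a user-defined stoplist, the same join would go against sys.fulltext_stopwords, filtered by its stoplist_id.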
    

    With this information you can write a search procedure along these lines (a very basic example, but you should get the idea):

    USE [AdventureWorks]
    GO
    -- Make sure you have a full-text catalogue to test against
    /*
    IF EXISTS(SELECT * FROM sys.fulltext_indexes WHERE [object_id] = OBJECT_ID('Production.ProductDescription'))
        DROP FULLTEXT INDEX ON Production.ProductDescription;
    IF EXISTS(SELECT * FROM sys.fulltext_catalogs WHERE name = 'FTC_product_description')
        DROP FULLTEXT CATALOG FTC_product_description;
    CREATE FULLTEXT CATALOG [FTC_product_description]
        WITH ACCENT_SENSITIVITY = OFF
        AS DEFAULT AUTHORIZATION [dbo]
    CREATE FULLTEXT INDEX ON [Production].[ProductDescription]([Description] LANGUAGE [English])
        KEY INDEX [PK_ProductDescription_ProductDescriptionID] ON ([FTC_product_description], FILEGROUP [PRIMARY])
        WITH (CHANGE_TRACKING = AUTO, STOPLIST = SYSTEM);
    */
    GO
    -- Create a stub first so that the ALTER PROC below always has something to alter
    IF OBJECT_ID('dbo.my_search_proc') IS NULL EXEC ('CREATE PROC dbo.my_search_proc AS RETURN 0;');
    GO
    -- My Search Proc
    ALTER PROC dbo.my_search_proc (
        @query_string   NVARCHAR(1000),
        @language_id    INT = 1033 -- change this to whatever your default language ID is
    ) AS
    BEGIN
        SET NOCOUNT ON;
    
        ------------------------------------------------------
        -- Split the string into 1 row per word
        ------------------------------------------------------
        -- I've done this in-line here for simplicity, but I 
        -- would recommend creating a CLR function instead
        -- for performance reasons.
        DECLARE @words TABLE (id INT IDENTITY(1,1), word NVARCHAR(100));
        DECLARE @cnt INT, @split_on CHAR(1)
        SELECT @cnt = 1, @split_on = ' ';
        WHILE (CHARINDEX(@split_on, @query_string) > 0) 
        BEGIN 
            INSERT INTO @words (word) 
            SELECT word = LEFT(LTRIM(RTRIM(SUBSTRING(@query_string,1,CHARINDEX(@split_on,@query_string)-1))), 100); 
            SET @query_string = SUBSTRING(@query_string,CHARINDEX(@split_on,@query_string)+1,LEN(@query_string)); 
            SET @cnt = @cnt + 1; 
        END 
        INSERT INTO @words (word)
        SELECT word = LEFT(LTRIM(RTRIM(@query_string)), 100); 
    
        ------------------------------------------------------
        -- Now build your "FORMSOF" string, excluding stop words.
        ------------------------------------------------------
        DECLARE @formsof NVARCHAR(4000);
    
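        -- Note: concatenating with SELECT @var = @var + ... is not guaranteed
        -- behaviour in SQL Server (especially with ORDER BY), so test it on your version.
        SELECT  @formsof = ISNULL(@formsof, '') 
                + 'FORMSOF(INFLECTIONAL, "' + REPLACE(w.word, '"', '') + '") AND ' -- strip quotes that would break the condition
        FROM    @words AS w 
        LEFT    JOIN sys.fulltext_system_stopwords AS sw -- use sys.fulltext_stopwords instead if you're using a user-defined stop-word list (or use both)
                ON  w.word = sw.stopword COLLATE database_default 
                AND sw.language_id = @language_id 
        WHERE   sw.stopword IS NULL
                AND REPLACE(w.word, '"', '') <> '' -- skip empty entries, e.g. from repeated spaces
        ORDER   BY w.id; -- retain original order in case you do any weighting based on position, etc.
    
        -- If nothing was appended, then the whole query string was made up of stop-words,
        -- so return an empty result set to the application and stop here
        -- (CONTAINS raises an error for a NULL or empty search condition).
        IF @formsof IS NULL
        BEGIN
            SELECT TOP(0) * FROM Production.ProductDescription;
            RETURN;
        END
    
        -- Remove the last "AND". LEN ignores the trailing space, so LEN - 4 drops " AND".
        SET @formsof = LEFT(@formsof, LEN(@formsof)-4);
        PRINT 'Query String: ' + @formsof;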
    
        ------------------------------------------------------
        -- Now perform the actual Full-Text search
        ------------------------------------------------------
        SELECT  * 
        FROM    Production.ProductDescription
        WHERE   CONTAINS(*, @formsof);
    END
    GO
    
    EXEC dbo.my_search_proc 'bars for downhill';
    

    So if you search for "bars for downhill", "for" is removed (because it is a stop word) and you are left with FORMSOF(INFLECTIONAL, "bars") AND FORMSOF(INFLECTIONAL, "downhill").
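
    As a cross-check on which terms survive, SQL Server 2008 and later also expose the word breaker through sys.dm_fts_parser, which labels stop words as 'Noise Word' in its special_term column. The following is only a sketch, assuming the system stoplist (stoplist_id 0), accent sensitivity off, and a caller in the sysadmin fixed server role, which this function requires; the phrase is the one from the question.

    ```sql
    -- Parse the phrase with the English (1033) word breaker and the system stoplist,
    -- then keep only the terms that are not flagged as stop words.
    -- Arguments: query string, LCID, stoplist_id (0 = system stoplist), accent_sensitivity.
    SELECT  display_term, special_term
    FROM    sys.dm_fts_parser(N'"Line Through Crack"', 1033, 0, 0)
    WHERE   special_term = 'Exact Match';
    ```

    Because of the sysadmin requirement, this is probably better suited to debugging a search than to calling from application code.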

    Unfortunately, if you are on SQL Server 2005 and do not know what is in the noise word file, there is nothing you can do (as far as I know).

    Cheers, Dave

    【Comments】:
