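Assuming the simple case:
- Words are separated by a single space character, both in the table and in the replacement string.
- No natural-language punctuation. No leading or trailing noise.
- Case-sensitive matching.
- Remove all matches (not just the first).
And a table like this: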
CREATE TABLE strings(id serial PRIMARY KEY, string text);
INSERT INTO strings(string) VALUES
('John Doe lives in Emerald Street')
, ('John Doe lives in Emerald Street as john DOE');
A short solution:
SELECT *, rtrim(regexp_replace(string, '(John|Doe|Emerald) ?', '', 'g')) FROM strings;
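| separates alternative branches in a regular expression. The optional trailing space in the pattern removes the separator along with each word, and rtrim() strips the space left behind when the last word of the string is removed.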
Or, taking your original replacement string as input:
SELECT *, rtrim(regexp_replace(string, '(' || replace('John Doe Emerald', ' ', '|') || ') ?', '', 'g')) FROM strings;
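To see what the concatenation produces, the pattern expression can be run on its own. This is only a quick check with the same example word list. It assumes the words contain no regular expression metacharacters, which the simple case above implies.

```sql
-- Build the alternation pattern from a space-separated word list.
SELECT '(' || replace('John Doe Emerald', ' ', '|') || ') ?' AS pattern;
-- pattern: (John|Doe|Emerald) ?
```

The result is the same pattern that is hard-coded in the short solution.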
Set-based operation
Regular expressions are typically expensive. This may be faster (minimal form):
SELECT s.id, string_agg(word, ' ') AS string2
FROM strings s, unnest(string_to_array(s.string, ' ')) word
WHERE word <> ALL (string_to_array('John Doe Emerald', ' '))
GROUP BY 1
ORDER BY 1;
To avoid any ambiguity and make sure the original order of words is preserved:
SELECT s.id, string_agg(word, ' ' ORDER BY ord) AS string2
FROM strings s, unnest(string_to_array(s.string, ' ')) WITH ORDINALITY AS t(word, ord)
WHERE t.word <> ALL (string_to_array('John Doe Emerald', ' '))
GROUP BY 1
ORDER BY 1;
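The filter in the queries above relies on <> ALL, which is true only if the word differs from every element of the array. A minimal sketch of that predicate in isolation, using three sample words chosen here for illustration:

```sql
-- Test each sample word against the removal list.
SELECT w AS word
     , w <> ALL (string_to_array('John Doe Emerald', ' ')) AS kept
FROM   unnest('{John,john,lives}'::text[]) AS w;
-- John  -> kept = false (exact match with a listed word)
-- john  -> kept = true  (the comparison is case-sensitive)
-- lives -> kept = true
```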
Applying ORDER BY in a separate subquery is typically faster:
SELECT sub.id, string_agg(sub.word, ' ') AS string2
FROM (
SELECT s.id, t.word
FROM strings s, unnest(string_to_array(s.string, ' ')) WITH ORDINALITY AS t(word, ord)
WHERE t.word <> ALL (string_to_array('John Doe Emerald', ' '))
ORDER BY s.id, t.ord
) sub
GROUP BY 1
ORDER BY 1;
A LATERAL subquery is typically easier to integrate into a larger query:
SELECT s.id, sub.string2
FROM strings s
CROSS JOIN LATERAL (
SELECT string_agg(t.word, ' ' ORDER BY t.ord) AS string2
FROM unnest(string_to_array(s.string, ' ')) WITH ORDINALITY AS t(word, ord)
WHERE t.word <> ALL (string_to_array('John Doe Emerald', ' '))
) sub
ORDER BY 1;
This way, we don't need GROUP BY in the outer SELECT.
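One behavioral difference may be worth checking before switching forms. An aggregate without GROUP BY always returns exactly one row, so the LATERAL variant keeps a row whose words are all removed (with NULL for string2), while the GROUP BY variants drop such a row from the result. A small sketch, using a hypothetical inline row instead of the strings table:

```sql
-- Hypothetical one-row input in which every word is on the removal list.
SELECT s.id, sub.string2
FROM  (VALUES (1, 'John Doe')) AS s(id, string)
CROSS  JOIN LATERAL (
   SELECT string_agg(t.word, ' ' ORDER BY t.ord) AS string2
   FROM   unnest(string_to_array(s.string, ' ')) WITH ORDINALITY AS t(word, ord)
   WHERE  t.word <> ALL (string_to_array('John Doe Emerald', ' '))
   ) sub;
-- Returns one row: id = 1, string2 = NULL
```

If an empty string is preferred over NULL, wrapping the result in COALESCE(sub.string2, '') in the outer SELECT is one option.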