Split comma-separated values and map them to the original ID in SQLite: answers

【Question title】: Split comma-separated values and map them to original ID in SQLite
【Posted】: 2016-10-27 15:53:42
【Question description】:

I have a table named articles that contains data in the following format:

id|categories
--+----------
1|123,13,43
2|1,3,15
3|9,17,44,18,3

For testing purposes, you can create this table with the following SQL commands:

CREATE TABLE articles(id INTEGER PRIMARY KEY, categories TEXT);
INSERT INTO articles VALUES(1, '123,13,43'), (2, '1,3,15'), (3, '9,17,44,18,3');

Now I want to split the values of the categories column so that I get a table like this:

id|category
--+--------
1|123
1|13
1|43
2|1
2|3
2|15
3|9
3|17
3|44
3|18
3|3

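As you can see, I want to bring the original table into first normal form.

I already know how to split just one row this way, from this answer. The code example below takes only the second row (the one with id=2) and splits it in the desired way: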

WITH split(article_id, word, str, offsep) AS
(
    VALUES
    (
        2,
        '',
        (SELECT categories FROM articles WHERE id=2),
        1
    )
    UNION ALL
    SELECT
        article_id,
        substr(str, 0, CASE WHEN instr(str, ',') THEN instr(str, ',') ELSE length(str)+1 END),
        ltrim(substr(str, instr(str, ',')), ','),
        instr(str, ',')
        FROM split
        WHERE offsep
) SELECT article_id, word FROM split WHERE word!='';

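Of course, this is very inflexible, because the article ID has to be hard-coded. So my question is: what do I have to add or change in the SQLite code above so that it operates on all rows and outputs the desired result?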
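
To see why the single-row query terminates, it can help to select every column of the CTE instead of only article_id and word. The sketch below does that through Python's sqlite3 module with an in-memory database. The Python wrapper and the name TRACE_SQL are illustrative assumptions; the SQL is the query from the question with its initial VALUES row written on one line.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles(id INTEGER PRIMARY KEY, categories TEXT);
    INSERT INTO articles VALUES(1, '123,13,43'), (2, '1,3,15'), (3, '9,17,44,18,3');
""")

# Same recursive CTE as in the question, but the final SELECT returns
# every column so that each recursion step is visible.
TRACE_SQL = """
WITH RECURSIVE split(article_id, word, str, offsep) AS
(
    VALUES(2, '', (SELECT categories FROM articles WHERE id=2), 1)
    UNION ALL
    SELECT
        article_id,
        substr(str, 0, CASE WHEN instr(str, ',') THEN instr(str, ',') ELSE length(str)+1 END),
        ltrim(substr(str, instr(str, ',')), ','),
        instr(str, ',')
    FROM split
    WHERE offsep
) SELECT article_id, word, str, offsep FROM split;
"""

trace = conn.execute(TRACE_SQL).fetchall()
for row in trace:
    print(row)
```

Each step peels one token off the front of str. offsep holds the comma position found in the previous str, so it drops to 0 once no comma is left, and the WHERE offsep test then ends the recursion.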

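【Question discussion】:

Does this really have to be done in SQL (maybe "yes", just checking)? Could you do it in bash?
@DuduMarkovitz I don't think bash would work well here... and I'm doing this for fun and educational purposes, so learning the SQL approach would be better.
A one-line awk script would do the trick. This site has many similar questions. SQLite is not the right tool for this job.

Tags: sql sqlite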

【Solution 1】:

After playing around for a while, I finally came up with the solution myself. It also handles rows that have '' or NULL as their categories value:

-- create temporary table which buffers the maximum article ID, because SELECT MAX can take a very long time on huge databases
DROP TABLE IF EXISTS max_article_id;
CREATE TEMP TABLE max_article_id(num INTEGER);
INSERT INTO max_article_id VALUES((SELECT MAX(id) FROM articles));

WITH RECURSIVE split(article_id, word, str, offsep) AS
(
    VALUES ( 0, '', '', 0 )                                      -- begin with dummy article 0 (which does not actually exist) to avoid code duplication
    UNION ALL
    SELECT
        CASE WHEN offsep==0 OR str IS NULL
            THEN article_id+1                                    -- go to next article if the current one is finished
            ELSE article_id                                      -- and keep the current one in the opposite case
        END,
        CASE WHEN offsep==0 OR str IS NULL
            THEN ''
            ELSE substr(str, 0, CASE WHEN instr(str, ',') THEN instr(str, ',') ELSE length(str)+1 END)
        END,
        CASE WHEN offsep==0 OR str IS NULL                       -- when str==NULL, then there has been a NULL value for the categories cell of the current article
            THEN (SELECT categories FROM articles WHERE id=article_id+1)
            ELSE ltrim(substr(str, instr(str, ',')), ',')
        END,
        CASE WHEN offsep==0 OR str IS NULL                       -- offsep==0 means that the splitting was finished in the previous iteration
            THEN 1                                               -- offsep==1 means that splitting the categories for a new article will begin in the next iteration
            ELSE instr(str, ',')                                 -- the actual string splitting stuff is explained and taken from here: http://stackoverflow.com/a/32051164
        END
        FROM split
        WHERE article_id<=(SELECT * FROM max_article_id)         -- stop getting new articles when the maximum article ID is reached
) SELECT article_id, word AS category FROM split WHERE word!=''; -- only select article_id and word from the result to output the desired table layout
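
As a way to check the claim about '' and NULL rows, here is a minimal sketch that runs a compacted copy of the query above through Python's sqlite3 module. Rows 4, 5 and 7, the gap at id 6, and the name SPLIT_ALL_SQL are assumptions added for the test; the SQL logic itself is unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Rows 4 ('') and 5 (NULL), plus a missing id 6, are extra test data.
conn.executescript("""
    CREATE TABLE articles(id INTEGER PRIMARY KEY, categories TEXT);
    INSERT INTO articles VALUES
        (1, '123,13,43'), (2, '1,3,15'), (3, '9,17,44,18,3'),
        (4, ''), (5, NULL), (7, '8,9');
    CREATE TEMP TABLE max_article_id(num INTEGER);
    INSERT INTO max_article_id VALUES((SELECT MAX(id) FROM articles));
""")

SPLIT_ALL_SQL = """
WITH RECURSIVE split(article_id, word, str, offsep) AS
(
    VALUES (0, '', '', 0)
    UNION ALL
    SELECT
        CASE WHEN offsep==0 OR str IS NULL THEN article_id+1 ELSE article_id END,
        CASE WHEN offsep==0 OR str IS NULL THEN ''
             ELSE substr(str, 0, CASE WHEN instr(str, ',') THEN instr(str, ',') ELSE length(str)+1 END)
        END,
        CASE WHEN offsep==0 OR str IS NULL
             THEN (SELECT categories FROM articles WHERE id=article_id+1)
             ELSE ltrim(substr(str, instr(str, ',')), ',')
        END,
        CASE WHEN offsep==0 OR str IS NULL THEN 1 ELSE instr(str, ',') END
    FROM split
    WHERE article_id<=(SELECT num FROM max_article_id)
) SELECT article_id, word AS category FROM split WHERE word!='';
"""

rows = conn.execute(SPLIT_ALL_SQL).fetchall()
for article_id, category in rows:
    print(article_id, category)
```

The recursion visits every integer from 1 up to MAX(id), so empty, NULL and missing ids each cost one or two extra steps but produce no output rows.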

【Discussion】:

Thank you, this works great, except that SQLite seems to index from 1, so I changed this line where the word is set: ELSE substr(str, 1, CASE WHEN instr(str, ',') THEN instr(str, ',') ELSE length(str)+1 END)
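
Whether the start index should be 0 or 1 is easy to check on a given SQLite build. The sketch below, using Python's sqlite3 module, compares three variants on one sample string; the variable names are illustrative. It assumes the behaviour found in current SQLite releases, where a start position of 0 points just before the first character and so uses up one unit of the length argument.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
s = "1,3,15"  # sample value; instr(s, ',') is 2

def first_token(expr):
    """Evaluate a substr() expression with the sample string bound twice."""
    return conn.execute("SELECT " + expr, (s, s)).fetchone()[0]

start0 = first_token("substr(?, 0, instr(?, ','))")           # form used in the answers
start1 = first_token("substr(?, 1, instr(?, ','))")           # start at 1, same length
start1_shorter = first_token("substr(?, 1, instr(?, ',') - 1)")  # start at 1, length minus one

print(repr(start0), repr(start1), repr(start1_shorter))
```

If the second value comes back with a trailing comma, then switching the start index to 1 also requires shortening the length by one.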

【Solution 2】:

Hi, this may be a few years late, but I have a simpler solution, using a modified version of my answer to How to split comma-separated value in SQLite?

CREATE TABLE articles(id INTEGER PRIMARY KEY, categories TEXT);
INSERT INTO articles VALUES(1, '123,13,43'), (2, '1,3,15'),
(3, '9,17,44,18,3'), (4, ''), (5, NULL);

WITH split(id, category, str) AS (
    SELECT id, '', categories||',' FROM articles
    UNION ALL SELECT id,
    substr(str, 0, instr(str, ',')),
    substr(str, instr(str, ',')+1)
    FROM split WHERE str != ''   -- explicit test: a bare "WHERE str" is false for text that does not convert to a nonzero number
) SELECT id, category FROM split WHERE category != '' ORDER BY id;

The output is as expected:

id|category
1|123
1|13
1|43
2|1
2|3
2|15
3|9
3|17
3|44
3|18
3|3
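
A bare WHERE str or WHERE category test is true in SQLite only when the text converts to a nonzero number, so a token such as 0 or red would be dropped by it. The sketch below, using Python's sqlite3 module, runs the same CTE shape with both kinds of test on assumed sample rows (text tokens, a 0, an empty string and a NULL) to show the difference. The QUERY template and the sample data are illustrative, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles(id INTEGER PRIMARY KEY, categories TEXT);
    INSERT INTO articles VALUES(1, 'red,green'), (2, '0,7'), (3, ''), (4, NULL);
""")

# {more} decides whether recursion continues; {keep} filters the output rows.
QUERY = """
WITH RECURSIVE split(id, category, str) AS (
    SELECT id, '', categories||',' FROM articles
    UNION ALL
    SELECT id,
           substr(str, 0, instr(str, ',')),
           substr(str, instr(str, ',')+1)
    FROM split WHERE {more}
) SELECT id, category FROM split WHERE {keep} ORDER BY id;
"""

# Truthiness tests: the text is converted to a number first.
truthy = conn.execute(QUERY.format(more="str", keep="category")).fetchall()
# Explicit comparisons against the empty string.
explicit = conn.execute(QUERY.format(more="str != ''", keep="category != ''")).fetchall()

print("truthy:  ", truthy)
print("explicit:", sorted(explicit))
```

For purely numeric, nonzero category IDs like those in the question, both forms should behave the same; the explicit comparison only matters once tokens can be 0 or non-numeric.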

【Discussion】: