How to improve or index PostgreSQL's jsonb array field?

【Question title】: How to improve or index postgresql's jsonb array field?
【Posted】: 2021-04-23 00:35:14
【Question】:

I usually use a jsonb field to store array data. For example, to store customers' barcode information, I would create a table like this:

create table customers(fcustomerid bigint, fcodes jsonb);

Each customer has one row, and all of its barcode information is stored in the fcodes field, like this:

[
{
    "barcode":"000000001",
    "codeid":1,
    "product":"Coca Cola",
    "createdate":"2021-01-19",
    "lottorry":true,
    "lottdate":"2021-01-20",
    "bonus":50
},
{
    "barcode":"000000002",
    "codeid":2,
    "product":"Coca Cola",
    "createdate":"2021-01-19",
    "lottorry":false,
    "lottdate":"",
    "bonus":0
}
...
{
    "barcode":"000500000",
    "codeid":500000,
    "product":"Pepsi Cola",
    "createdate":"2021-01-19",
    "lottorry":false,
    "lottdate":"",
    "bonus":0
}
]

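The jsonb array may hold millions of barcode objects with the same structure. Maybe this is not a good idea, but when I have thousands of customers I can keep all the data in one table: each customer has one row, and all of its data sits in one field, which looks very concise and easy to manage.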

For this kind of scenario, how can I insert, modify, or query the data efficiently?

I can use jsonb_insert to insert an object, like this:

update customers 
set fcodes=jsonb_insert(fcodes,'{-1}','{...}'::jsonb) 
where fcustomerid=999;

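Modifying an object is a little harder, because I have to know the object's array index first. If I use the incrementing key codeid as the array index (index = codeid - 1), it becomes easy. I can use jsonb_set, as follows: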

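-- mycodeid is a placeholder for the codeid of the object to change;
-- jsonb_set expects the path as text[], so the concatenated string is cast
update customers 
set fcodes=jsonb_set(fcodes,concat('{',(mycodeid-1)::text,',lottorry}')::text[],'true'::jsonb) 
where fcustomerid=999;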

But if I want to query objects in the jsonb array by createdate, bonus, lottorry, or product, I have to use a jsonpath expression. For example:

select jsonb_path_query_array(fcodes,'$[*] ? (@.product=="Pepsi Cola")') 
from customers 
where fcustomerid=999;

Or:

select jsonb_path_query_array(fcodes,'$[*] ? (@.lottdate.datetime()>="2021-01-01".datetime() && @.lottdate.datetime()<="2021-01-31".datetime())') 
from customers 
where fcustomerid=999;

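A jsonb index looks useful, but it seems to help only when filtering across different rows, whereas my operations mostly take place inside the single jsonb field of one row.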
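To illustrate that distinction, here is a minimal sketch, assuming the index in question is a GIN index on fcodes (the index name is made up for the example). Such an index can help find which rows contain a matching object, but it does not locate objects inside one row's array:

```sql
-- Hypothetical index name; the jsonb_path_ops operator class supports @>, @? and @@.
CREATE INDEX customers_fcodes_gin ON customers USING gin (fcodes jsonb_path_ops);

-- Row-level question the index can serve:
-- "which customers have at least one Pepsi Cola barcode?"
SELECT fcustomerid
FROM customers
WHERE fcodes @> '[{"product": "Pepsi Cola"}]';

-- Element-level question the index cannot serve: the whole fcodes value
-- of customer 999 is still read and scanned to pick out the matching objects.
SELECT jsonb_path_query_array(fcodes, '$[*] ? (@.product == "Pepsi Cola")')
FROM customers
WHERE fcustomerid = 999;
```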

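I am very worried about efficiency. Is it a good idea to store millions of objects in one jsonb field of one row? And how can I improve efficiency in this scenario, especially for queries?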

【Comments】:

Don't improve it; rewrite it as two tables with a one-to-many relationship. See: PostgreSQL anti-patterns: Unnecessary json/hstore dynamic columns

Tags: arrays postgresql indexing jsonb

【Solution 1】:

You are right to be worried. With a JSON value this huge, you will never get good performance.

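Your data doesn't need JSON at all. Create a table that stores a single barcode per row and has a foreign key reference to customers. Then everything becomes simple and efficient.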
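As a rough sketch of what that could look like: the table name barcodes, the index names, and the column types below are assumptions, with columns mirroring the JSON keys from the question. It also assumes fcustomerid is unique and non-null in customers and that customer 999 exists.

```sql
-- A foreign key needs a unique target, so customers gets a primary key first.
ALTER TABLE customers ADD PRIMARY KEY (fcustomerid);

-- Hypothetical one-row-per-barcode table.
CREATE TABLE barcodes (
    fcustomerid bigint  NOT NULL REFERENCES customers (fcustomerid),
    codeid      bigint  NOT NULL,
    barcode     text    NOT NULL,
    product     text    NOT NULL,
    createdate  date    NOT NULL,
    lottorry    boolean NOT NULL DEFAULT false,
    lottdate    date,   -- NULL replaces the empty string used in the JSON
    bonus       integer NOT NULL DEFAULT 0,
    PRIMARY KEY (fcustomerid, codeid)
);

-- Indexes for the lookups mentioned in the question.
CREATE INDEX barcodes_customer_product_idx  ON barcodes (fcustomerid, product);
CREATE INDEX barcodes_customer_lottdate_idx ON barcodes (fcustomerid, lottdate);

-- Insert one barcode (replaces jsonb_insert).
INSERT INTO barcodes (fcustomerid, codeid, barcode, product, createdate)
VALUES (999, 500001, '000500001', 'Pepsi Cola', '2021-01-19');

-- Modify one barcode (replaces jsonb_set); no array index is needed.
UPDATE barcodes
SET lottorry = true
WHERE fcustomerid = 999 AND codeid = 2;

-- Query by product or by date range (replaces the jsonpath queries).
SELECT * FROM barcodes
WHERE fcustomerid = 999 AND product = 'Pepsi Cola';

SELECT * FROM barcodes
WHERE fcustomerid = 999
  AND lottdate BETWEEN '2021-01-01' AND '2021-01-31';
```

Each statement then touches only the affected rows, instead of rewriting or scanning one customer's entire jsonb value.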

Judging from the questions in this forum, using JSON in a database is almost always the wrong choice.

【Discussion】:

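Thank you very much! If I use a single table for storage and the row count is very large, say one billion rows in one indexed table, is the efficiency acceptable? Please give me some advice, thanks a lot.
A billion rows is no problem. Large JSON values are.
Thanks for your advice, I will create the table.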
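For moving existing data over, one possible approach is to expand each array with jsonb_array_elements. This is only a sketch: the barcodes table and its column types are hypothetical, and it assumes every fcodes value is a JSON array whose objects all carry the keys shown in the question.

```sql
-- Hypothetical target table, one row per barcode.
CREATE TABLE IF NOT EXISTS barcodes (
    fcustomerid bigint NOT NULL,
    codeid      bigint NOT NULL,
    barcode     text,
    product     text,
    createdate  date,
    lottorry    boolean,
    lottdate    date,
    bonus       integer,
    PRIMARY KEY (fcustomerid, codeid)
);

-- Expand every array element into its own row.
INSERT INTO barcodes (fcustomerid, codeid, barcode, product,
                      createdate, lottorry, lottdate, bonus)
SELECT c.fcustomerid,
       (e.obj ->> 'codeid')::bigint,
       e.obj ->> 'barcode',
       e.obj ->> 'product',
       (e.obj ->> 'createdate')::date,
       (e.obj ->> 'lottorry')::boolean,
       NULLIF(e.obj ->> 'lottdate', '')::date,  -- empty string becomes NULL
       (e.obj ->> 'bonus')::integer
FROM customers AS c
CROSS JOIN LATERAL jsonb_array_elements(c.fcodes) AS e(obj);
```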