【Question Title】: Cassandra query on secondary index: ReadTimeout: code=1200
【Posted】: 2023-08-18 14:42:01
【Question Description】:

I am using [cqlsh 5.0.1 | Cassandra 2.2.1 | CQL spec 3.3.0 | Native protocol v4]. I have a 2-node Cassandra cluster with a replication factor of 2.

$ nodetool status test_keyspace
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens       Owns (effective)  Host ID                         Rack
UN  10.xxx.4.xxx  85.32 GB   256          100.0%            xxxx-xx-xx-xx-xx                rack1
UN  10.xxx.4.xxx  80.99 GB   256          100.0%            x-xx-xx-xx-xx                   rack1

[I have replaced the digits with x]

Here is the keyspace definition:

cqlsh> describe test_keyspace;

CREATE KEYSPACE test_keyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '2'}  AND durable_writes = true;

CREATE TABLE test_keyspace.test_table (
    id text PRIMARY KEY,
    listids map<int, timestamp>
) WITH bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';
CREATE INDEX list_index ON test_keyspace.test_table (keys(listids));

id is unique, and the cardinality of the keys in listids is close to 1000. I have millions of records in this keyspace.

I want to get the number of records that contain a particular key, as well as the list of those records. I tried this query from cqlsh:

select count(1) from test_table where listids contains key 12;

After a few seconds I get this error:

ReadTimeout: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}

I have already increased the timeout parameters in cqlshrc and cassandra.yaml:

cat /etc/cassandra/conf/cassandra.yaml | grep read_request_timeout_in_ms
#read_request_timeout_in_ms: 5000
read_request_timeout_in_ms: 300000

cat ~/.cassandra/cqlshrc
[connection]
timeout = 36000
request_timeout = 36000
client_timeout = 36000

When I check /var/log/cassandra/system.log, the only entry I get is this:

WARN  [SharedPool-Worker-157] 2016-07-25 11:56:22,010 SelectStatement.java:253 - Aggregation query used without partition key

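I use the Java client in my code, and it also gets a lot of read timeouts. One solution might be to remodel my data, but that would take more time (though I'm not sure). Can anyone suggest a quick fix for this?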

Adding table statistics:

$ nodetool cfstats test_keyspace
Keyspace: test_keyspace
    Read Count: 5928987886
    Read Latency: 3.468279416568199 ms.
    Write Count: 1590771056
    Write Latency: 0.02020026287239664 ms.
    Pending Flushes: 0
        Table (index): test_table.list_index
        SSTable count: 9
        Space used (live): 9664953448
        Space used (total): 9664953448
        Space used by snapshots (total): 4749
        Off heap memory used (total): 1417400
        SSTable Compression Ratio: 0.822577888909709
        Number of keys (estimate): 108
        Memtable cell count: 672265
        Memtable data size: 30854168
        Memtable off heap memory used: 0
        Memtable switch count: 0
        Local read count: 1718274
        Local read latency: 63.356 ms
        Local write count: 1031719451
        Local write latency: 0.015 ms
        Pending flushes: 0
        Bloom filter false positives: 369
        Bloom filter false ratio: 0.00060
        Bloom filter space used: 592
        Bloom filter off heap memory used: 520
        Index summary off heap memory used: 144
        Compression metadata off heap memory used: 1416736
        Compacted partition minimum bytes: 73
        Compacted partition maximum bytes: 2874382626
        Compacted partition mean bytes: 36905317
        Average live cells per slice (last five minutes): 5389.0
        Maximum live cells per slice (last five minutes): 51012
        Average tombstones per slice (last five minutes): 2.0
        Maximum tombstones per slice (last five minutes): 2759

        Table: test_table
        SSTable count: 559
        Space used (live): 62368820540
        Space used (total): 62368820540
        Space used by snapshots (total): 4794
        Off heap memory used (total): 817427277
        SSTable Compression Ratio: 0.4856571513639344
        Number of keys (estimate): 96692796
        Memtable cell count: 2587248
        Memtable data size: 27398085
        Memtable off heap memory used: 0
        Memtable switch count: 558
        Local read count: 5927272991
        Local read latency: 3.788 ms
        Local write count: 559051606
        Local write latency: 0.037 ms
        Pending flushes: 0
        Bloom filter false positives: 4905594
        Bloom filter false ratio: 0.00023
        Bloom filter space used: 612245816
        Bloom filter off heap memory used: 612241344
        Index summary off heap memory used: 196239565
        Compression metadata off heap memory used: 8946368
        Compacted partition minimum bytes: 43
        Compacted partition maximum bytes: 1916
        Compacted partition mean bytes: 173
        Average live cells per slice (last five minutes): 1.0
        Maximum live cells per slice (last five minutes): 1
        Average tombstones per slice (last five minutes): 1.0
        Maximum tombstones per slice (last five minutes): 1

【Question Discussion】:

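  • I ran into the same problem. I tried 1) setting client_timeout = None in cqlshrc under ~/.cassandra (the sample file's comment says it "can also be set to None to disable"); that didn't help. 2) Increasing the *timeout_in_ms settings in cassandra.yaml; that didn't help either. In the end, I ran the select in my Java code, looped over the result, and got the count. Counting 12 million rows took 7 seconds, which is fast.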

Tags: java cassandra timeout cassandra-2.0 cqlsh


【Solution 1】:

You can either redesign your table or split your query into multiple smaller queries.

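You are querying through a secondary index without restricting the partition key (that is what the warning is telling you). By doing this, you are effectively performing a full table scan: your nodes have to look at every partition to satisfy your request.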
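One common way to split such a query into smaller pieces, without knowing every id in advance, is to restrict each sub-query to a slice of the token ring. This is not the approach described in this answer; it is a hedged sketch that assumes the default Murmur3Partitioner (tokens in the range (Long.MIN_VALUE, Long.MAX_VALUE]). The class and method names are hypothetical, and whether Cassandra 2.2 serves a token() restriction combined with CONTAINS KEY efficiently is an assumption to verify on your own cluster. The code only builds the CQL strings; executing them and summing the results is left to the client.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits the Murmur3Partitioner token ring into n contiguous
// (start, end] ranges, so one large index scan becomes n smaller sub-queries.
// Assumption: Murmur3 tokens lie in (Long.MIN_VALUE, Long.MAX_VALUE].
final class TokenRangeSplitter {

    static final class Range {
        final long start; // exclusive
        final long end;   // inclusive

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }

        // Builds one sub-query for this range; table and column names come from the question.
        String toCql(int mapKey) {
            return "SELECT id FROM test_keyspace.test_table WHERE token(id) > " + start
                    + " AND token(id) <= " + end
                    + " AND listids CONTAINS KEY " + mapKey;
        }
    }

    static List<Range> split(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        BigInteger min = BigInteger.valueOf(Long.MIN_VALUE);
        BigInteger span = BigInteger.valueOf(Long.MAX_VALUE).subtract(min);
        List<Range> ranges = new ArrayList<>(n);
        long prev = Long.MIN_VALUE;
        for (int i = 1; i <= n; i++) {
            // Boundary i = MIN + span * i / n, computed with BigInteger to avoid long overflow;
            // for i == n this is exactly Long.MAX_VALUE, so the ranges cover the whole ring.
            long next = min.add(span.multiply(BigInteger.valueOf(i))
                    .divide(BigInteger.valueOf(n))).longValueExact();
            ranges.add(new Range(prev, next));
            prev = next;
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (Range r : split(4)) {
            System.out.println(r.toCql(12));
        }
    }
}
```

Each sub-query still uses the index on every replica, but only over a fraction of the data, so individual requests are more likely to finish within the timeout; more ranges mean smaller requests and more round trips.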

A solution that doesn't change the data model is to iterate over all partitions and run the query once per partition:

select count(*) from test_table where id = 'somePartitionId' and listids contains key 12;

This way, your nodes know which partition to look in. You then have to aggregate the results of these queries on the client side.
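The client-side aggregation described above could look roughly like the following plain-Java sketch: one small query per id, run with bounded parallelism, with the per-partition counts summed at the end. It assumes you already have the list of ids from elsewhere (this sketch does not fetch them), and perPartitionCount is a hypothetical stand-in for executing the per-partition query through the driver; it is not a driver API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToLongFunction;

final class PerPartitionCounter {

    // Runs perPartitionCount once per id on a fixed-size pool and sums the results.
    // perPartitionCount is a hypothetical stand-in for executing
    //   SELECT count(*) FROM test_table WHERE id = ? AND listids CONTAINS KEY 12
    // through the driver and reading the count from the returned row.
    static long countAcross(List<String> ids, ToLongFunction<String> perPartitionCount,
                            int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<Long>> pending = new ArrayList<>(ids.size());
            for (String id : ids) {
                Callable<Long> task = () -> perPartitionCount.applyAsLong(id);
                pending.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : pending) {
                total += f.get(); // waits for each partition's count in submission order
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while counting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a per-partition query failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Toy data: pretend every partition except "b" contains map key 12.
        long total = countAcross(Arrays.asList("a", "b", "c"),
                id -> id.equals("b") ? 0L : 1L, 2);
        System.out.println(total); // prints 2
    }
}
```

The parallelism argument caps how many queries are in flight at once, so millions of ids do not flood the cluster with simultaneous requests.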

【Discussion】:

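  • One clarification: in my case id should be the partition key (I'm not sure), and id is almost unique (again, millions of records). How would I query then?
  • To be honest, I'd recommend remodeling your data. With millions of partitions you can certainly query each partition individually (in parallel), but that will obviously take a long time.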
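A typical remodel for this access pattern is a second, query-specific table partitioned by the map key and written alongside test_table. The schema in the comments (test_table_by_listid, with a bucket column) is hypothetical, and the Java below only models that table in memory to show why the lookup becomes a handful of single-partition reads. The bucket component is an assumption motivated by the cfstats above: the existing list_index table already has a compacted partition of about 2.9 GB (2874382626 bytes), so one partition per listid may grow too large.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// In-memory model of a hypothetical query table, written next to test_table:
//   CREATE TABLE test_keyspace.test_table_by_listid (
//       listid int, bucket int, id text,
//       PRIMARY KEY ((listid, bucket), id));
final class ListIdQueryTable {
    // Hypothetical bucket count: spreads one listid over several partitions
    // so that no single partition grows without bound.
    static final int BUCKETS = 16;

    // Partition key "(listid, bucket)" -> clustering values (ids), kept sorted.
    private final Map<String, Set<String>> partitions = new HashMap<>();

    static int bucketOf(String id) {
        return Math.floorMod(id.hashCode(), BUCKETS);
    }

    private static String partitionKey(int listid, int bucket) {
        return listid + ":" + bucket;
    }

    // Mirrors one write to test_table: one row here per key of the listids map
    // (the timestamp values are ignored in this model).
    void write(String id, Map<Integer, Long> listids) {
        for (int listid : listids.keySet()) {
            partitions.computeIfAbsent(partitionKey(listid, bucketOf(id)), k -> new TreeSet<>())
                      .add(id);
        }
    }

    // Answers "which ids contain map key listid" with BUCKETS single-partition reads
    // instead of a secondary-index scan across the whole cluster.
    Set<String> idsWithKey(int listid) {
        Set<String> result = new TreeSet<>();
        for (int b = 0; b < BUCKETS; b++) {
            result.addAll(partitions.getOrDefault(partitionKey(listid, b), Collections.emptySet()));
        }
        return result;
    }
}
```

In the real table, each bucket read would be a separate SELECT restricted to listid = 12 and one bucket value, and the count is the size of the combined result. The trade-off is extra write work: removing a key from listids also requires deleting the matching row from the query table.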
【Solution 2】:

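I ran into the same problem. I tried: 1) setting client_timeout = None in cqlshrc under ~/.cassandra (the sample file's comment says it "can also be set to None to disable"). That didn't help.

2) Increasing the *timeout_in_ms settings in cassandra.yaml.

That didn't help either. In the end, I ran the select in my Java code, looped over the result, and got the count. Counting 12 million rows took 7 seconds, which is fast.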

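import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

// serverIp and keyspace are your own contact point and keyspace name.
// try-with-resources closes the session and the cluster when done.
try (Cluster cluster = Cluster.builder()
            .addContactPoints(serverIp)
            .build();
     Session session = cluster.connect(keyspace)) {

    String cqlStatement = "SELECT count(*) FROM imadmin.device_appclass_attributes";
    //String cqlStatement = "SELECT * FROM system_schema.keyspaces";
    // Iterate over the returned rows (here, the single row holding the count).
    for (Row row : session.execute(cqlStatement)) {
        System.out.println(row.toString());
    }
}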

【Discussion】: