Handling of Cassandra blocking writes when memtable_cleanup_threshold is exceeded: answers

【Question title】: Handling of Cassandra blocking writes when it exceeds the memtable_cleanup_threshold
【Posted】: 2019-07-06 21:12:44
【Question】:

I was reading about the Cassandra flushing strategies and came across the following statement:

 If the data to be flushed exceeds the memtable_cleanup_threshold, Cassandra blocks writes until the next flush succeeds.

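Now my question is: say we have very heavy writes to Cassandra, about 10K records per second, and the application is running 24*7. What settings should we use for the following parameters to avoid blocking?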

memtable_heap_space_in_mb 
memtable_offheap_space_in_mb 
memtable_cleanup_threshold

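Also, since this is time-series data, do I need to change the compaction strategy as well? If yes, what would be best for my case?

My Spark application, which takes data from Kafka and continuously inserts it into Cassandra, hangs after a certain time. When I analyzed it at that moment, there were a lot of pending tasks in nodetool compactionstats.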

nodetool tablehistograms

Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                      (ms)           (ms)          (bytes)
50%         642.00    88.15          25109.16      310             24
75%         770.00    263.21         668489.53     535             50
95%         770.00    4055.27        668489.53     3311            310
98%         770.00    8409.01        668489.53     73457           6866
99%         770.00    12108.97       668489.53     219342          20501
Min         4.00      11.87          20924.30      150             9
Max         770.00    1996099.05     668489.53     4866323         454826


Keyspace : trackfleet_db
    Read Count: 7183347
    Read Latency: 15.153115504235004 ms
    Write Count: 2402229293
    Write Latency: 0.7495135263492935 ms
    Pending Flushes: 1
        Table: locationinfo
        SSTable count: 3307
        Space used (live): 62736956804
        Space used (total): 62736956804
        Space used by snapshots (total): 10469827269
        Off heap memory used (total): 56708763
        SSTable Compression Ratio: 0.38214618375483633
        Number of partitions (estimate): 493571
        Memtable cell count: 2089
        Memtable data size: 1168808
        Memtable off heap memory used: 0
        Memtable switch count: 88033
        Local read count: 765497
        Local read latency: 162.880 ms
        Local write count: 782044138
        Local write latency: 1.859 ms
        Pending flushes: 0
        Percent repaired: 0.0
        Bloom filter false positives: 368
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 29158176
        Bloom filter off heap memory used: 29104216
        Index summary off heap memory used: 7883835
        Compression metadata off heap memory used: 19720712
        Compacted partition minimum bytes: 150
        Compacted partition maximum bytes: 4866323
        Compacted partition mean bytes: 7626
        Average live cells per slice (last five minutes): 3.5
        Maximum live cells per slice (last five minutes): 6
        Average tombstones per slice (last five minutes): 1.0
        Maximum tombstones per slice (last five minutes): 1
        Dropped Mutations: 359

After changing the compaction strategy:

Keyspace : trackfleet_db
    Read Count: 8568544
    Read Latency: 15.943608060365916 ms
    Write Count: 2568676920
    Write Latency: 0.8019530641630868 ms
    Pending Flushes: 1
        Table: locationinfo
        SSTable count: 5843
        SSTables in each level: [5842/4, 0, 0, 0, 0, 0, 0, 0, 0]
        Space used (live): 71317936302
        Space used (total): 71317936302
        Space used by snapshots (total): 10469827269
        Off heap memory used (total): 105205165
        SSTable Compression Ratio: 0.3889946058934169
        Number of partitions (estimate): 542002
        Memtable cell count: 235
        Memtable data size: 131501
        Memtable off heap memory used: 0
        Memtable switch count: 93947
        Local read count: 768148
        Local read latency: NaN ms
        Local write count: 839003671
        Local write latency: 1.127 ms
        Pending flushes: 1
        Percent repaired: 0.0
        Bloom filter false positives: 1345
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 54904960
        Bloom filter off heap memory used: 55402400
        Index summary off heap memory used: 14884149
        Compression metadata off heap memory used: 34918616
        Compacted partition minimum bytes: 150
        Compacted partition maximum bytes: 4866323
        Compacted partition mean bytes: 4478
        Average live cells per slice (last five minutes): NaN
        Maximum live cells per slice (last five minutes): 0
        Average tombstones per slice (last five minutes): NaN
        Maximum tombstones per slice (last five minutes): 0
        Dropped Mutations: 660

Thanks,

【Comments】:

Can you add nodetool tablestats for your table?

Tags: scala cassandra spark-streaming cassandra-3.0

【Solution 1】:

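I would not touch the memtable settings unless there is a problem. They will only really block if you are writing at a rate that exceeds your disks' ability to write, or if GC pauses are messing up timings. "10K records per second and the application is running 24*7" is actually not that much, provided the records are not very large, and it will not overrun writes (a decent system can sustain a constant load of 100k-200k/s). nodetool tablestats, tablehistograms, and the schema can help identify whether your records are too big or your partitions too wide, and give a better indication of what your compaction strategy should be (probably TWCS, but maybe LCS if you have any reads at all and partitions span a day or so).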

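Pending tasks in nodetool compactionstats have nothing to do with the memtable settings either; they mean that your compactions are not keeping up. That can simply be a spike while bulk jobs run, small partitions flush, or repairs stream SSTables over, but if the number grows instead of going down you need to tune your compaction strategy. A lot really depends on the data model and the stats (tablestats/tablehistograms).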
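To tell a temporary spike from a backlog that keeps growing, the pending-task count can be sampled over time. The following is a minimal sketch, assuming the nodetool compactionstats output contains a line of the form `pending tasks: N`; the function names and the "never decreases over the last few samples" rule are illustrative assumptions, not an official heuristic.

```python
import re
from typing import List, Optional

# Assumption: compactionstats prints a summary line such as "pending tasks: 42".
_PENDING_RE = re.compile(r"^pending tasks:\s*(\d+)", re.MULTILINE)


def parse_pending_tasks(compactionstats_output: str) -> Optional[int]:
    """Return the pending-task count from the output, or None if the line is absent."""
    match = _PENDING_RE.search(compactionstats_output)
    return int(match.group(1)) if match else None


def backlog_is_growing(samples: List[int], min_samples: int = 3) -> bool:
    """True when the last `min_samples` readings never decrease and end higher than they began."""
    if len(samples) < min_samples:
        return False  # not enough data to call it a trend
    recent = samples[-min_samples:]
    never_decreases = all(b >= a for a, b in zip(recent, recent[1:]))
    return never_decreases and recent[-1] > recent[0]
```

Feeding it readings taken a few minutes apart gives a rough signal: a series such as 5, 12, 30 suggests compaction is falling behind, while 30, 12, 5 looks like a spike that is draining.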

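【Comments】:

Thank you, Chris. I have not been well for the past few days. My application is running right now. I will share nodetool tablestats once it hangs.
I have edited my question with the histograms. Can you check and let me know what is wrong?
SSTable count: 3307 is the problem. You probably want a different compaction strategy, TimeWindowCompactionStrategy, but LeveledCompactionStrategy would also help and is easier to set up. You can enable it on one node using JMX to test; see support.datastax.com/hc/en-us/articles/…
jmx_set -m org.apache.cassandra.db:type=ColumnFamilies,keyspace=trackfleet_db,columnfamily=locationinfo CompactionParametersJson \{"class":"LeveledCompactionStrategy","sstable_size_in_mb":"256"\} and give it a good long while for the compaction tasks to drop off in nodetool compactionstats. You could also skip that and change it cluster-wide right away, since it really cannot get much worse than 600+ SSTables per read.
My Cassandra workload is write-heavy, and according to the documentation LeveledCompactionStrategy is not a good fit for that case. I am inserting about 10K records per second. What do you suggest? Should I still go with leveled compaction? datastax.com/dev/blog/when-to-use-leveled-compaction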
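For the cluster-wide change mentioned in the comments above, the compaction strategy is set per table with a CQL ALTER TABLE statement. Here is a minimal sketch that only renders such a statement as a string; the helper name `alter_compaction_cql` is hypothetical, and the one-day TWCS window is an assumed value for illustration, not a recommendation from the thread.

```python
from typing import Dict


def alter_compaction_cql(keyspace: str, table: str, strategy: str,
                         options: Dict[str, str]) -> str:
    """Render an ALTER TABLE statement that sets the compaction strategy.

    No escaping is done, so pass trusted identifiers and values only.
    """
    params = {"class": strategy, **options}
    body = ", ".join(f"'{key}': '{value}'" for key, value in params.items())
    return f"ALTER TABLE {keyspace}.{table} WITH compaction = {{{body}}};"


# LCS with the SSTable size used in the jmx_set comment.
lcs_statement = alter_compaction_cql(
    "trackfleet_db", "locationinfo",
    "LeveledCompactionStrategy", {"sstable_size_in_mb": "256"},
)

# TWCS with an assumed one-day window (illustrative values only).
twcs_statement = alter_compaction_cql(
    "trackfleet_db", "locationinfo",
    "TimeWindowCompactionStrategy",
    {"compaction_window_unit": "DAYS", "compaction_window_size": "1"},
)
```

The resulting strings can be pasted into cqlsh. Unlike the JMX approach, which was suggested for testing on a single node, ALTER TABLE changes the schema for every node.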

【Solution 2】:

You can refer to this link for tuning the above parameters: http://abiasforaction.net/apache-cassandra-memtable-flush/

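memtable_cleanup_threshold - the percentage of your total available memtable space that will trigger a memtable cleanup. memtable_cleanup_threshold defaults to 1 / (memtable_flush_writers + 1). By default this is essentially 33% of your memtable_heap_space_in_mb, which corresponds to memtable_flush_writers = 2. A scheduled cleanup flushes the table/column family that occupies the largest portion of memtable space. This keeps happening until the memtable memory in use drops below the cleanup threshold.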
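The default described above can be worked through numerically. This is a small sketch of the arithmetic only; the 2048 MB of memtable space is an assumed example value, not a figure from the question.

```python
def cleanup_threshold(memtable_flush_writers: int) -> float:
    """Default memtable_cleanup_threshold = 1 / (memtable_flush_writers + 1)."""
    if memtable_flush_writers < 1:
        raise ValueError("memtable_flush_writers must be >= 1")
    return 1.0 / (memtable_flush_writers + 1)


def flush_trigger_mb(memtable_space_mb: float, memtable_flush_writers: int) -> float:
    """Occupied memtable space (MB) at which a cleanup flush is triggered."""
    return memtable_space_mb * cleanup_threshold(memtable_flush_writers)


if __name__ == "__main__":
    # Assumed example: 2 flush writers and 2048 MB of memtable space.
    print(round(cleanup_threshold(2), 2))    # 0.33
    print(round(flush_trigger_mb(2048, 2)))  # 683
```

With more flush writers the threshold drops, so flushes start earlier and each one covers a smaller share of the memtable space.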

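【Comments】:

What should I do about compaction?
If it is TTL-based time-series data and you do not perform deletes, you can look at TWCS, or stay with STCS and tune it when needed.
You can use either the size-tiered compaction strategy (STCS) or the leveled compaction strategy (LCS); LCS has much better read characteristics than STCS. Which one to choose depends on your application's behavior. See docs.datastax.com/en/dse/5.1/dse-dev/datastax_enterprise/config/…
@Laxmikant - No, it is not TTL-based. I keep storing data into the tables, and they grow continuously. Which parameters should I keep in mind to tune STCS?
@Pandey - I will have very heavy read/write operations, but as you said LCS is better for read characteristics, so should I stick with STCS?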