NoSpamLogger.java Maximum memory usage reached Cassandra - Answers

【Question title】: NoSpamLogger.java Maximum memory usage reached Cassandra
【Posted】: 2018-03-29 11:48:35
【Question】:

I have a 5-node Cassandra cluster with about 650 GB of data on each node, using a replication factor of 3. I recently started seeing the following error in /var/log/cassandra/system.log:

INFO [ReadStage-5] 2017-10-17 17:06:07,887 NoSpamLogger.java:91 - Maximum memory usage reached (1.000GiB), cannot allocate chunk of 1.000MiB

I tried increasing file_cache_size_in_mb, but sooner or later the same error shows up again. I have tried setting this parameter as high as 2 GB, to no avail.

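When the error occurs, CPU utilization spikes and read latency becomes very erratic. These spikes appear roughly every half hour; note the timestamps in the list below.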

INFO [ReadStage-5] 2017-10-17 17:06:07,887 NoSpamLogger.java:91 - Maximum memory usage reached (1.000GiB), cannot allocate chunk of 1.000MiB
INFO [ReadStage-36] 2017-10-17 17:36:09,807 NoSpamLogger.java:91 - Maximum memory usage reached (1.000GiB), cannot allocate chunk of 1.000MiB
INFO [ReadStage-15] 2017-10-17 18:05:56,003 NoSpamLogger.java:91 - Maximum memory usage reached (2.000GiB), cannot allocate chunk of 1.000MiB
INFO [ReadStage-28] 2017-10-17 18:36:01,177 NoSpamLogger.java:91 - Maximum memory usage reached (2.000GiB), cannot allocate chunk of 1.000MiB
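To make the roughly 30-minute pattern concrete, here is a small Python sketch (not part of the original post) that pulls the timestamps out of the four log lines above and prints the gap between consecutive occurrences. It assumes the usual system.log layout shown in those lines: level, `[thread]`, date, then time with comma-separated milliseconds.

```python
from datetime import datetime

# The four NoSpamLogger lines quoted in the question.
LOG_LINES = [
    "INFO [ReadStage-5] 2017-10-17 17:06:07,887 NoSpamLogger.java:91 - Maximum memory usage reached (1.000GiB), cannot allocate chunk of 1.000MiB",
    "INFO [ReadStage-36] 2017-10-17 17:36:09,807 NoSpamLogger.java:91 - Maximum memory usage reached (1.000GiB), cannot allocate chunk of 1.000MiB",
    "INFO [ReadStage-15] 2017-10-17 18:05:56,003 NoSpamLogger.java:91 - Maximum memory usage reached (2.000GiB), cannot allocate chunk of 1.000MiB",
    "INFO [ReadStage-28] 2017-10-17 18:36:01,177 NoSpamLogger.java:91 - Maximum memory usage reached (2.000GiB), cannot allocate chunk of 1.000MiB",
]


def parse_timestamp(line: str) -> datetime:
    """Read the date and time fields (3rd and 4th tokens) of a system.log line."""
    fields = line.split()
    # Milliseconds follow a comma, e.g. "17:06:07,887"; %f accepts 1-6 digits.
    return datetime.strptime(f"{fields[2]} {fields[3]}", "%Y-%m-%d %H:%M:%S,%f")


def gaps_in_minutes(lines: list[str]) -> list[float]:
    """Minutes elapsed between each pair of consecutive log lines."""
    stamps = [parse_timestamp(line) for line in lines]
    return [(later - earlier).total_seconds() / 60
            for earlier, later in zip(stamps, stamps[1:])]


for gap in gaps_in_minutes(LOG_LINES):
    print(f"{gap:.1f} min")  # prints 30.0 min, 29.8 min, 30.1 min
```

The same function can be pointed at a larger excerpt of system.log (filtered to the NoSpamLogger lines) to check whether the interval stays stable over a longer window.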

I have two tables that are partitioned by hour, and the partitions are large. For example, here is their output from nodetool tablestats:

    Read Count: 4693453
    Read Latency: 0.36752741680805157 ms.
    Write Count: 561026
    Write Latency: 0.03742310516803143 ms.
    Pending Flushes: 0
        Table: raw_data
        SSTable count: 55
        Space used (live): 594395754275
        Space used (total): 594395754275
        Space used by snapshots (total): 0
        Off heap memory used (total): 360753372
        SSTable Compression Ratio: 0.20022598072758296
        Number of keys (estimate): 45163
        Memtable cell count: 90441
        Memtable data size: 685647925
        Memtable off heap memory used: 0
        Memtable switch count: 1
        Local read count: 0
        Local read latency: NaN ms
        Local write count: 126710
        Local write latency: 0.096 ms
        Pending flushes: 0
        Percent repaired: 52.99
        Bloom filter false positives: 167775
        Bloom filter false ratio: 0.16152
        Bloom filter space used: 264448
        Bloom filter off heap memory used: 264008
        Index summary off heap memory used: 31060
        Compression metadata off heap memory used: 360458304
        Compacted partition minimum bytes: 51
        **Compacted partition maximum bytes: 3449259151**
        Compacted partition mean bytes: 16642499
        Average live cells per slice (last five minutes): 1.0005435888450147
        Maximum live cells per slice (last five minutes): 42
        Average tombstones per slice (last five minutes): 1.0
        Maximum tombstones per slice (last five minutes): 1
        Dropped Mutations: 0



    Read Count: 4712814
    Read Latency: 0.3356051004771247 ms.
    Write Count: 643718
    Write Latency: 0.04168356951335834 ms.
    Pending Flushes: 0
        Table: customer_profile_history
        SSTable count: 20
        Space used (live): 9423364484
        Space used (total): 9423364484
        Space used by snapshots (total): 0
        Off heap memory used (total): 6560008
        SSTable Compression Ratio: 0.1744084338623116
        Number of keys (estimate): 69
        Memtable cell count: 35242
        Memtable data size: 789595302
        Memtable off heap memory used: 0
        Memtable switch count: 1
        Local read count: 2307
        Local read latency: NaN ms
        Local write count: 51772
        Local write latency: 0.076 ms
        Pending flushes: 0
        Percent repaired: 0.0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 384
        Bloom filter off heap memory used: 224
        Index summary off heap memory used: 400
        Compression metadata off heap memory used: 6559384
        Compacted partition minimum bytes: 20502
        **Compacted partition maximum bytes: 4139110981**
        Compacted partition mean bytes: 708736810
        Average live cells per slice (last five minutes): NaN
        Maximum live cells per slice (last five minutes): 0
        Average tombstones per slice (last five minutes): NaN
        Maximum tombstones per slice (last five minutes): 0
        Dropped Mutations: 0

Here are the histograms:

    cdsdb/raw_data histograms
    Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                                  (micros)          (micros)           (bytes)
    50%             0.00             61.21              0.00           1955666               642
    75%             1.00             73.46              0.00          17436917              4768
    95%             3.00            105.78              0.00         107964792             24601
    98%             8.00            219.34              0.00         186563160             42510
    99%            12.00            315.85              0.00         268650950             61214
    Min             0.00              6.87              0.00                51                 0
    Max            14.00           1358.10              0.00        3449259151           7007506

    cdsdb/customer_profile_history histograms
    Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                                  (micros)          (micros)           (bytes)
    50%             0.00             73.46              0.00         223875792             61214
    75%             0.00             88.15              0.00         668489532            182785
    95%             0.00            152.32              0.00        1996099046            654949
    98%             0.00            785.94              0.00        3449259151           1358102
    99%             0.00            943.13              0.00        3449259151           1358102
    Min             0.00             24.60              0.00              5723                 4
    Max             0.00           5839.59              0.00        5960319812           1955666

Can you suggest a way to fix this problem?

【Discussion】:

Can you share the "nodetool cfhistograms" output for these two tables?
I have posted the histograms in the question.

Tags: cassandra cassandra-3.0

【Solution 1】:

Based on the cfhistograms output you posted, your partitions are huge.

In the raw_data table, the 95th-percentile partition size is 107 MB and the maximum is 3.44 GB. In customer_profile_history, the 95th-percentile partition size is 1.99 GB and the maximum is 5.96 GB.
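As a quick cross-check of those figures, this Python sketch (an illustration, not from the original answer) converts the raw byte values from the histograms into decimal MB/GB and flags each against the ~100 MB guideline mentioned below. Decimal units (1 MB = 10^6 bytes) are an assumption; values are rounded to two decimals, so 3449259151 and 1996099046 bytes print as 3.45 GB and 2.00 GB rather than the truncated 3.44 GB and 1.99 GB.

```python
# Partition-size percentiles in bytes, copied from the cfhistograms output above.
PARTITION_SIZES = {
    "raw_data": {"95%": 107964792, "Max": 3449259151},
    "customer_profile_history": {"95%": 1996099046, "Max": 5960319812},
}

# ~100 MB guideline from the answer; decimal megabytes are assumed here.
GUIDELINE_BYTES = 100 * 1000 * 1000


def to_human(n_bytes: int) -> str:
    """Format a byte count in decimal units: MB below 1 GB, GB from 1 GB up."""
    if n_bytes >= 1_000_000_000:
        return f"{n_bytes / 1_000_000_000:.2f} GB"
    return f"{n_bytes / 1_000_000:.1f} MB"


for table, percentiles in PARTITION_SIZES.items():
    for pct, size in percentiles.items():
        status = "over" if size > GUIDELINE_BYTES else "within"
        print(f"{table:<26} {pct:>4} {to_human(size):>9}  ({status} ~100 MB)")
```

All four values come out above the guideline, including the raw_data 95th percentile at roughly 108 MB.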

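This is clearly related to the issue you notice every half hour, as these huge partitions get written to SSTables. Given these partition sizes, the data model needs to change so that partitions are bucketed by minute instead of by hour. That way a 2 GB partition shrinks to about 33 MB.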
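A minimal Python sketch of the hour-versus-minute bucket, assuming the tables carry a text bucket column as part of a composite partition key (for example `PRIMARY KEY ((id, bucket), ts)`). The real schema is not shown in the question, so the format strings and function names here are hypothetical. It also reproduces the answer's arithmetic: an hourly 2 GB partition split into 60 evenly filled minute buckets.

```python
from datetime import datetime

# Hypothetical bucket formats; the real schema of these tables is not shown.
HOUR_FORMAT = "%Y-%m-%d %H"
MINUTE_FORMAT = "%Y-%m-%d %H:%M"


def time_bucket(ts: datetime, fmt: str) -> str:
    """Value for the (assumed) text bucket column of the partition key."""
    return ts.strftime(fmt)


def per_bucket_bytes(hourly_partition_bytes: float, buckets_per_hour: int) -> float:
    """Expected partition size after splitting one hour, assuming evenly spread writes."""
    return hourly_partition_bytes / buckets_per_hour


ts = datetime(2017, 10, 17, 17, 6, 7)
print(time_bucket(ts, HOUR_FORMAT))    # 2017-10-17 17
print(time_bucket(ts, MINUTE_FORMAT))  # 2017-10-17 17:06
print(f"{per_bucket_bytes(2e9, 60) / 1e6:.1f} MB")  # 33.3 MB
```

Every row written in the same minute for the same id lands in one partition, so a query for a time range becomes one partition read per minute rather than one per hour; the application has to enumerate those buckets when reading.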

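The recommendation is to keep partitions at around 100 MB at most. Although in theory you can store more than 100 MB in a partition, performance suffers. Keep in mind that every read of such a partition pulls more than 100 MB of data over the wire. In your case it is over 2 GB, which hurts performance across the board.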
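To see how fine the buckets must be to meet that ~100 MB target, this Python sketch computes the minimum number of equal sub-buckets per hour for the largest hourly partitions in the question. It assumes writes are spread evenly across the hour and decimal megabytes; note that cfhistograms reports histogram bucket boundaries, so the input sizes are themselves estimates.

```python
import math

# ~100 MB target from the answer (decimal megabytes assumed).
TARGET_BYTES = 100 * 1000 * 1000

# Largest hourly partitions reported by cfhistograms in the question, in bytes.
# cfhistograms reports histogram bucket boundaries, so these are estimates.
MAX_HOURLY_PARTITION = {
    "raw_data": 3449259151,
    "customer_profile_history": 5960319812,
}


def buckets_needed(partition_bytes: int, target: int = TARGET_BYTES) -> int:
    """Smallest number of equal sub-buckets per hour that keeps each one at or
    below `target`, assuming data is spread evenly across the hour."""
    return math.ceil(partition_bytes / target)


for table, size in MAX_HOURLY_PARTITION.items():
    n = buckets_needed(size)
    print(f"{table}: at least {n} buckets/hour, ~{size / n / 1e6:.0f} MB each")
```

With 60 buckets (one per minute), even the 5.96 GB worst case lands just under 100 MB per partition, which is consistent with the per-minute suggestion; a bursty write pattern would leave less headroom than this even-spread estimate suggests.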

【Discussion】:

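Awesome!! Thanks for pointing that out. Along the same lines, what is the easiest way to repartition such large tables? Do I need to create a new table and re-ingest the data into it, or is there a better way?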
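Interestingly, customer_profile_history, the table with the larger partitions, holds only about 10 GB of data, while raw_data, whose partitioning also needs improvement, holds about 594 GB. So I would focus first on re-creating the former table and loading its data with the bulk loader (aka sstableloader). You can use the same process to reload the second table's data, or consider loading it with Spark if you are familiar with it.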
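Because the partition key itself changes, each row has to be written under a newly computed key on its way into the new table (for example inside a Spark job, or in a script that produces the rows to load). The Python sketch below illustrates only that re-keying step on synthetic data; the row shape `(entity_id, event_time)` and the names are hypothetical, and no Cassandra driver or loader API is used.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, Iterator, Tuple

# Hypothetical row shape (entity_id, event_time); the real columns are not shown.
Row = Tuple[str, datetime]


def rekey(rows: Iterable[Row], fmt: str = "%Y-%m-%d %H:%M") -> Iterator[Tuple[Tuple[str, str], Row]]:
    """Pair each row with its new partition key (entity_id, time bucket)."""
    for entity_id, event_time in rows:
        yield (entity_id, event_time.strftime(fmt)), (entity_id, event_time)


def rows_per_partition(rows: Iterable[Row], fmt: str) -> Counter:
    """Count how many rows would land in each partition for a bucket format."""
    return Counter(key for key, _ in rekey(rows, fmt))


# Synthetic data: one row per second for one hour for a single entity.
start = datetime(2017, 10, 17, 17, 0, 0)
sample = [("sensor-1", start + timedelta(seconds=s)) for s in range(3600)]

hourly = rows_per_partition(sample, "%Y-%m-%d %H")
per_minute = rows_per_partition(sample, "%Y-%m-%d %H:%M")
print(len(hourly), max(hourly.values()))          # 1 3600
print(len(per_minute), max(per_minute.values()))  # 60 60
```

Running the same count over a sample of real rows before the full reload is a cheap way to confirm that the new bucket width actually spreads the data out.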
@Varsha If there are no follow-up questions, please remember to accept the answer (check mark).
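Thanks for your input so far. I am actually giving the solution a try. I changed the partitioning to be per minute instead of per hour. My application normally does not access any past data, so I expected these errors to go away once I deployed the new changes. But they persist, and it has been more than 12 hours since I deployed the change. Do you know why this might be?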
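Can you clarify whether you truncated the past data? Old data still goes through compaction cycles, so even if your application does not access it, Cassandra does. I assume your cassandra.yaml auto_snapshot property is set to its default of true, which means a snapshot is kept if you drop the table; once it is dropped, neither Cassandra nor the application touches that data anymore. You could treat this as a temporary drop while you test your application.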