【Question title】: Cassandra corruption due to UDT column drop
【Posted】: 2026-02-16 01:55:02
【Question description】:

Our production cluster runs this Cassandra version: [cqlsh 5.0.1 | Cassandra 3.11.3 | CQL spec 3.4.4 | Native protocol v4]

After restarting a Cassandra node, Cassandra failed to start and printed the following error:

INFO  [main] 2018-08-22 15:30:04,082 CommitLogReader.java:105 - Skipping playback of empty log: CommitLog-6-1534951460541.log
DEBUG [main] 2018-08-22 15:30:04,082 CommitLogReader.java:273 - Reading /var/lib/cassandra/commitlog/CommitLog-6-1527416281330.log (CL version 6, messaging version 11, compression null)
INFO  [Service Thread] 2018-08-22 15:30:06,501 GCInspector.java:284 - ParNew GC in 216ms.  CMS Old Gen: 10906456 -> 31114600; Par Eden Space: 859045888 -> 0; Par Survivor Space: 29166056 -> 43187600
DEBUG [main] 2018-08-22 15:30:06,673 CommitLogReader.java:264 - Finished reading /var/lib/cassandra/commitlog/CommitLog-6-1527416281330.log
DEBUG [main] 2018-08-22 15:30:06,674 CommitLogReader.java:273 - Reading /var/lib/cassandra/commitlog/CommitLog-6-1527416281331.log (CL version 6, messaging version 11, compression null)
DEBUG [main] 2018-08-22 15:30:08,009 CommitLogReader.java:264 - Finished reading /var/lib/cassandra/commitlog/CommitLog-6-1527416281331.log
DEBUG [main] 2018-08-22 15:30:08,009 CommitLogReader.java:273 - Reading /var/lib/cassandra/commitlog/CommitLog-6-1527416281332.log (CL version 6, messaging version 11, compression null)
ERROR [main] 2018-08-22 15:30:08,610 JVMStabilityInspector.java:102 - Exiting due to error while processing commit log during initialization.
org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException: Unexpected error deserializing mutation; saved to /tmp/mutation1296995018372874453dat.  This may be caused by replaying a mutation against a table with the same name but incompatible schema.  Exception follows: java.io.IOError: java.io.EOFException: EOF after 45 bytes out of 33554712
    at org.apache.cassandra.db.commitlog.CommitLogReader.readMutation(CommitLogReader.java:471) [apache-cassandra-3.11.3.jar:3.11.3]
    at org.apache.cassandra.db.commitlog.CommitLogReader.readSection(CommitLogReader.java:404) [apache-cassandra-3.11.3.jar:3.11.3]
    at org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:251) [apache-cassandra-3.11.3.jar:3.11.3]
    at org.apache.cassandra.db.commitlog.CommitLogReader.readAllFiles(CommitLogReader.java:132) [apache-cassandra-3.11.3.jar:3.11.3]
    at org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:137) [apache-cassandra-3.11.3.jar:3.11.3]
    at org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:177) [apache-cassandra-3.11.3.jar:3.11.3]
    at org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:158) [apache-cassandra-3.11.3.jar:3.11.3]
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:324) [apache-cassandra-3.11.3.jar:3.11.3]
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:602) [apache-cassandra-3.11.3.jar:3.11.3]
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:691) [apache-cassandra-3.11.3.jar:3.11.3]

After moving the CommitLogs out of the way (which caused data loss), Cassandra did start, but queries against some tables failed:

ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] message="Operation failed - received 0 responses and 1 failures" info={'failures': 1, 'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}

And in system.log:

WARN  [ReadStage-2] 2018-08-26 11:04:34,091 AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread Thread[ReadStage-2,10,main]: {}
java.lang.RuntimeException: org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /var/lib/cassandra/data/policy/rule-83f10050a91f11e890846d2c86545d91/mc-52-big-Data.db
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2601) ~[apache-cassandra-3.11.3.jar:3.11.3]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_171]
    at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162) ~[apache-cassandra-3.11.3.jar:3.11.3]
    at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134) [apache-cassandra-3.11.3.jar:3.11.3]
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) [apache-cassandra-3.11.3.jar:3.11.3]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_171]
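
The two log excerpts above each name a file: the commit log error gives the path where the undecodable mutation was saved, and the read failure gives the path of the SSTable reported as corrupted. As a minimal sketch, assuming the messages are worded exactly as in these excerpts (other Cassandra versions may phrase them differently), the paths can be pulled out of a log like this. The function name `find_corruption_hints` is made up for illustration.

```python
import re

# Patterns copied from the log excerpts in this question (assumption: the
# wording is the same in the log being scanned).
CORRUPT_SSTABLE = re.compile(r"CorruptSSTableException: Corrupted: (\S+)")
SAVED_MUTATION = re.compile(r"deserializing mutation; saved to (\S+?)\.\s")


def find_corruption_hints(log_lines):
    """Return (sstable_paths, mutation_paths) mentioned in the log lines."""
    sstables, mutations = [], []
    for line in log_lines:
        match = CORRUPT_SSTABLE.search(line)
        if match and match.group(1) not in sstables:
            sstables.append(match.group(1))
        match = SAVED_MUTATION.search(line)
        if match and match.group(1) not in mutations:
            mutations.append(match.group(1))
    return sstables, mutations
```

Any iterable of lines works as input, for example an open file object for system.log. This only reports what Cassandra has already logged; it does not detect corruption before a restart.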

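After investigating, I am fairly sure I managed to reproduce the error with the following steps:

  1. Create a brand-new Cassandra docker container: docker stop cassandra-prod; docker rm cassandra-prod; docker run -d --name cassandra-prod -p 9042:9042 cassandra:3.11.3; docker exec -it cassandra-prod bash
  2. Create a keyspace
  3. Create a UDT
  4. Create a table with a column whose type is the UDT created above
  5. Insert multiple rows into the table
  6. Drop the UDT column
  7. Restart Cassandra: docker stop cassandra-prod; docker start cassandra-prod; docker exec -it cassandra-prod bash
  8. Run a SELECT query against the table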


DROP KEYSPACE IF EXISTS my_ks;
CREATE KEYSPACE my_ks WITH replication = {'class':'SimpleStrategy', 'replication_factor':1};
CREATE TYPE my_ks.my_type(column1 text);

CREATE TABLE my_ks.my_table (
  id uuid primary key,
  mt my_type
);

INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});
INSERT INTO my_ks.my_table(id, mt) VALUES(uuid(), {column1 : 'value1'});

ALTER TABLE my_ks.my_table DROP mt;

These steps reproduce the CorruptSSTableException but not the CommitLogReadHandler$CommitLogReadException.
By the way, on Cassandra 3.11.1 the error did not reproduce with the steps above.
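
Since the reproduction needed many repeated inserts, here is a small sketch that generates the same CQL script with a configurable number of rows instead of pasting the INSERT line by hand. The function name `build_repro_cql` and its parameters are made up for illustration; the statements themselves are the ones from the script above, and the default of 98 rows matches the number of INSERT lines in it.

```python
def build_repro_cql(row_count=98, keyspace="my_ks", table="my_table", udt="my_type"):
    """Build the reproduction script from the question with row_count inserts."""
    lines = [
        f"DROP KEYSPACE IF EXISTS {keyspace};",
        # Only the first literal is an f-string; the braces in the second
        # literal are plain CQL map syntax.
        f"CREATE KEYSPACE {keyspace} WITH replication = "
        "{'class':'SimpleStrategy', 'replication_factor':1};",
        f"CREATE TYPE {keyspace}.{udt}(column1 text);",
        f"CREATE TABLE {keyspace}.{table} (id uuid primary key, mt {udt});",
    ]
    # Doubled braces in an f-string produce literal braces in the output.
    insert = (
        f"INSERT INTO {keyspace}.{table}(id, mt) "
        "VALUES(uuid(), {column1 : 'value1'});"
    )
    lines.extend([insert] * row_count)
    # Dropping the non-frozen UDT column is the step that triggers the problem.
    lines.append(f"ALTER TABLE {keyspace}.{table} DROP mt;")
    return "\n".join(lines) + "\n"
```

The returned text can be written to a file and run with `cqlsh -f`, followed by the restart and SELECT from steps 7 and 8.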

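【Question comments】:

  • This looks similar to CASSANDRA-12582. However, I tried to reproduce it with the steps you provided. I could reproduce it on 3.9, but not on 3.10, 3.11.1, or 3.11.3. That behavior seems consistent with what the Jira ticket describes. Could you provide more information so that it can be reproduced?
  • Also, a workaround might be to move the problematic commit log out of the way, re-add the column, and then put the commit log back.
  • @Horia After these 5 steps, did you restart Cassandra and try to select from the table? If you did not, sorry for being unclear. I will edit the question.
  • Yes, I restarted it (and I did not run nodetool flush either). As mentioned, I reproduced it successfully on 3.9.
  • OK, so I had some trouble reproducing it again. But after repeating the inserts, I think I managed it. It is a bit ugly, but it seems to work.

Tags: cassandra cql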


【Solution 1】:

In Cassandra 4.0, dropping a non-frozen user-defined type column will be forbidden. The error thrown is:

InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot drop non-frozen column mt of user type my_type"

I tested this on trunk. Unfortunately, for earlier versions (

Using frozen for your UDT column should solve the problem (I tested this on 3.11.3), but then the column's type cannot be altered.

CREATE TABLE my_ks.my_table (
  id uuid primary key,
  mt frozen<my_type>
);

There is also CASSANDRA-14673, which is open for this issue.
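
To find tables that are exposed to this before anyone drops a column, one option is to scan CREATE TABLE statements for columns declared with a bare UDT name rather than `frozen<...>`. The following is only a rough sketch: the function names are made up, the UDT names have to be supplied by the caller, and the parsing is a naive split on commas rather than a real CQL parser.

```python
def column_block(create_table_cql):
    """Return the text between the first "(" and its matching ")"."""
    start = create_table_cql.index("(")
    depth = 0
    for pos in range(start, len(create_table_cql)):
        char = create_table_cql[pos]
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth == 0:
                return create_table_cql[start + 1 : pos]
    raise ValueError("unbalanced parentheses in CREATE TABLE statement")


def non_frozen_udt_columns(create_table_cql, udt_names):
    """Return names of columns whose declared type is a bare (non-frozen) UDT."""
    wanted = {name.lower() for name in udt_names}
    found = []
    for definition in column_block(create_table_cql).split(","):
        parts = definition.split()
        if len(parts) < 2:
            continue  # not a "name type" pair, e.g. a fragment of a key clause
        column, col_type = parts[0], parts[1].lower()
        # Ignore an optional keyspace qualifier such as my_ks.my_type.
        if col_type.rsplit(".", 1)[-1] in wanted:
            found.append(column)
    return found
```

For the table in the question, called with the UDT name `my_type`, this flags the `mt` column; for the frozen definition above it flags nothing.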

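【Comments】:

  • Is there any way to detect these and other corruptions in Cassandra ahead of time? That is, before a node restart.
  • @YossiShasha Unfortunately, I don't think that is possible. For example, a query is written to the commit log as soon as the node receives it. The commit log is only read when the node restarts, when Cassandra replays it to reapply changes that had not yet reached the SSTables.
  • I see. If I understand correctly, that is about CommitLog corruption. What about SSTable corruption? The SSTables were corrupted too: even after moving the corrupted CommitLogs away, we could not query the table. These queries executed successfully before the restart, probably because they were served from MemTables. Please correct me if I'm wrong.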