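[Posted]: 2020-03-06 22:57:32
[Question]:
Our service can run SELECT and INSERT queries against both our local and our deployed Cassandra instances without any problems.
However, we are having trouble with the following DELETE query:
DELETE FROM config_by_uuid WHERE uuid = record_uuid;
Our service successfully deletes a record on our local instance, but not on our deployed instance. Note that this behavior is consistent on both instances, and no errors are reported on the deployed instance.
Notably, when the query above is run through cqlsh on our deployed instance, it does delete the record. It only fails when run from our service against the deployed instance. Our service and cqlsh use the same user to run the query.
At first we suspected a Cassandra consistency issue, so we ran the query in cqlsh at consistency levels ONE and QUORUM, and it succeeded at both. Note that our service currently uses QUORUM for all operations.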
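A side note on the consistency levels: Cassandra computes QUORUM as floor(replication_factor / 2) + 1, so with the replication_factor of 1 used by the config keyspace (defined further down), QUORUM and ONE both wait for the same single replica. The Go sketch below only illustrates that arithmetic; the function name quorum is made up for this example and is not part of gocql.

```go
package main

import "fmt"

// quorum returns how many replicas must acknowledge a request at
// consistency level QUORUM for the given replication factor.
// Hypothetical helper for illustration; integer division floors the result.
func quorum(replicationFactor int) int {
	return replicationFactor/2 + 1
}

func main() {
	// RF=1 matches the single-node keyspace described in this question.
	for _, rf := range []int{1, 3, 5} {
		fmt.Printf("RF=%d -> QUORUM waits for %d replica(s)\n", rf, quorum(rf))
	}
}
```

On a single node this makes a difference between ONE and QUORUM an unlikely explanation, which matches what we saw in cqlsh.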
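We ruled out a code issue because the service works as expected against our local instance. Our reasoning is that a code issue would make the delete fail on both instances, so the difference must lie somewhere in our Cassandra installations. Both instances run Cassandra 3.11.X.
The keyspace and table details are identical on both instances and are shown below (note that we are only using a single node for now, as we are still in early development):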
CREATE KEYSPACE config WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;
CREATE TABLE config.config_by_uuid (
    uuid uuid PRIMARY KEY,
    config_name text,
    config_value text,
    service_uuid uuid,
    tenant_uuid uuid,
    user_uuid uuid
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';
We have enabled tracing on the deployed Cassandra. Below are the details when the query is run through cqlsh:
system_traces.sessions:
session_id: 25b48ce0-0491-11ea-ace9-5db0758d00f3
client: node_ip
command: QUERY
coordinator: node_ip
duration: 1875
parameters: {'consistency_level': 'ONE', 'page_size': '100', 'query': 'delete from config_by_uuid where uuid = 96ac4699-5199-4a80-9c59-b592d28ea2b7;', 'serial_consistency_level': 'SERIAL'}
request: Execute CQL3 query
started_at: 2019-11-11 14:40:03.758000+0000
system_traces.events:
session_id | event_id | activity | source | source_elapsed | thread
--------------------------------------+--------------------------------------+---------------------------------------------------------------------------------------+--------------+----------------+-----------------------------
25b48ce0-0491-11ea-ace9-5db0758d00f3 | 25b4b3f0-0491-11ea-ace9-5db0758d00f3 | Parsing delete from config_by_uuid where uuid = 96ac4699-5199-4a80-9c59-b592d28ea2b7; | node_ip | 203 | Native-Transport-Requests-1
25b48ce0-0491-11ea-ace9-5db0758d00f3 | 25b4b3f1-0491-11ea-ace9-5db0758d00f3 | Preparing statement | node_ip | 381 | Native-Transport-Requests-1
25b48ce0-0491-11ea-ace9-5db0758d00f3 | 25b4b3f2-0491-11ea-ace9-5db0758d00f3 | Executing single-partition query on roles | node_ip | 1044 | ReadStage-3
25b48ce0-0491-11ea-ace9-5db0758d00f3 | 25b4b3f3-0491-11ea-ace9-5db0758d00f3 | Acquiring sstable references | node_ip | 1080 | ReadStage-3
25b48ce0-0491-11ea-ace9-5db0758d00f3 | 25b4db00-0491-11ea-ace9-5db0758d00f3 | Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones | node_ip | 1114 | ReadStage-3
25b48ce0-0491-11ea-ace9-5db0758d00f3 | 25b4db01-0491-11ea-ace9-5db0758d00f3 | Key cache hit for sstable 2 | node_ip | 1152 | ReadStage-3
25b48ce0-0491-11ea-ace9-5db0758d00f3 | 25b4db02-0491-11ea-ace9-5db0758d00f3 | Merged data from memtables and 1 sstables | node_ip | 1276 | ReadStage-3
25b48ce0-0491-11ea-ace9-5db0758d00f3 | 25b4db03-0491-11ea-ace9-5db0758d00f3 | Read 1 live rows and 0 tombstone cells | node_ip | 1307 | ReadStage-3
25b48ce0-0491-11ea-ace9-5db0758d00f3 | 25b4db04-0491-11ea-ace9-5db0758d00f3 | Executing single-partition query on roles | node_ip | 1466 | ReadStage-3
25b48ce0-0491-11ea-ace9-5db0758d00f3 | 25b4db05-0491-11ea-ace9-5db0758d00f3 | Acquiring sstable references | node_ip | 1484 | ReadStage-3
25b48ce0-0491-11ea-ace9-5db0758d00f3 | 25b4db06-0491-11ea-ace9-5db0758d00f3 | Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones | node_ip | 1501 | ReadStage-3
25b48ce0-0491-11ea-ace9-5db0758d00f3 | 25b4db07-0491-11ea-ace9-5db0758d00f3 | Key cache hit for sstable 2 | node_ip | 1525 | ReadStage-3
25b48ce0-0491-11ea-ace9-5db0758d00f3 | 25b4db08-0491-11ea-ace9-5db0758d00f3 | Merged data from memtables and 1 sstables | node_ip | 1573 | ReadStage-3
25b48ce0-0491-11ea-ace9-5db0758d00f3 | 25b4db09-0491-11ea-ace9-5db0758d00f3 | Read 1 live rows and 0 tombstone cells | node_ip | 1593 | ReadStage-3
25b48ce0-0491-11ea-ace9-5db0758d00f3 | 25b4db0a-0491-11ea-ace9-5db0758d00f3 | Determining replicas for mutation | node_ip | 1743 | Native-Transport-Requests-1
25b48ce0-0491-11ea-ace9-5db0758d00f3 | 25b4db0b-0491-11ea-ace9-5db0758d00f3 | Appending to commitlog | node_ip | 1796 | MutationStage-3
25b48ce0-0491-11ea-ace9-5db0758d00f3 | 25b4db0c-0491-11ea-ace9-5db0758d00f3 | Adding to config_by_uuid memtable | node_ip | 1827 | MutationStage-3
Below are the details when the query is run from our service:
system_traces.sessions:
session_id: 9ed67270-048f-11ea-ace9-5db0758d00f3
client: service_ip
command: QUERY
coordinator: node_ip
duration: 3247
parameters: {'bound_var_0_uuid': '19e12033-5ad4-4376-8293-315a26370d93', 'consistency_level': 'QUORUM', 'page_size': '5000', 'query': 'DELETE FROM config.config_by_uuid WHERE uuid=? ', 'serial_consistency_level': 'SERIAL'}
request: Execute CQL3 prepared query
started_at: 2019-11-11 14:29:07.991000+0000
system_traces.events:
session_id | event_id | activity | source | source_elapsed | thread
--------------------------------------+--------------------------------------+---------------------------------------------------------------------------+--------------+----------------+-----------------------------
9ed67270-048f-11ea-ace9-5db0758d00f3 | 9ed67271-048f-11ea-ace9-5db0758d00f3 | Executing single-partition query on roles | node_ip | 178 | ReadStage-2
9ed67270-048f-11ea-ace9-5db0758d00f3 | 9ed67272-048f-11ea-ace9-5db0758d00f3 | Acquiring sstable references | node_ip | 204 | ReadStage-2
9ed67270-048f-11ea-ace9-5db0758d00f3 | 9ed67273-048f-11ea-ace9-5db0758d00f3 | Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones | node_ip | 368 | ReadStage-2
9ed67270-048f-11ea-ace9-5db0758d00f3 | 9ed69980-048f-11ea-ace9-5db0758d00f3 | Key cache hit for sstable 2 | node_ip | 553 | ReadStage-2
9ed67270-048f-11ea-ace9-5db0758d00f3 | 9ed69981-048f-11ea-ace9-5db0758d00f3 | Merged data from memtables and 1 sstables | node_ip | 922 | ReadStage-2
9ed67270-048f-11ea-ace9-5db0758d00f3 | 9ed69982-048f-11ea-ace9-5db0758d00f3 | Read 1 live rows and 0 tombstone cells | node_ip | 1193 | ReadStage-2
9ed67270-048f-11ea-ace9-5db0758d00f3 | 9ed6c090-048f-11ea-ace9-5db0758d00f3 | Executing single-partition query on roles | node_ip | 1587 | ReadStage-3
9ed67270-048f-11ea-ace9-5db0758d00f3 | 9ed6c091-048f-11ea-ace9-5db0758d00f3 | Acquiring sstable references | node_ip | 1642 | ReadStage-3
9ed67270-048f-11ea-ace9-5db0758d00f3 | 9ed6c092-048f-11ea-ace9-5db0758d00f3 | Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones | node_ip | 1708 | ReadStage-3
9ed67270-048f-11ea-ace9-5db0758d00f3 | 9ed6c093-048f-11ea-ace9-5db0758d00f3 | Key cache hit for sstable 2 | node_ip | 1750 | ReadStage-3
9ed67270-048f-11ea-ace9-5db0758d00f3 | 9ed6c094-048f-11ea-ace9-5db0758d00f3 | Merged data from memtables and 1 sstables | node_ip | 1845 | ReadStage-3
9ed67270-048f-11ea-ace9-5db0758d00f3 | 9ed6c095-048f-11ea-ace9-5db0758d00f3 | Read 1 live rows and 0 tombstone cells | node_ip | 1888 | ReadStage-3
9ed67270-048f-11ea-ace9-5db0758d00f3 | 9ed6e7a0-048f-11ea-ace9-5db0758d00f3 | Determining replicas for mutation | node_ip | 2660 | Native-Transport-Requests-1
9ed67270-048f-11ea-ace9-5db0758d00f3 | 9ed6e7a1-048f-11ea-ace9-5db0758d00f3 | Appending to commitlog | node_ip | 3028 | MutationStage-2
9ed67270-048f-11ea-ace9-5db0758d00f3 | 9ed6e7a2-048f-11ea-ace9-5db0758d00f3 | Adding to config_by_uuid memtable | node_ip | 3133 | MutationStage-2
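To compare the two traces without reading them row by row, here is a small stdlib-only Go sketch that diffs their activity columns. The activity strings are copied from the traces above; missingFrom, concat, and the variable names are hypothetical helpers written for this post, not part of gocql or Cassandra.

```go
package main

import "fmt"

// The role lookup appears twice in each trace, followed by the mutation.
var rolesRead = []string{
	"Executing single-partition query on roles",
	"Acquiring sstable references",
	"Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones",
	"Key cache hit for sstable 2",
	"Merged data from memtables and 1 sstables",
	"Read 1 live rows and 0 tombstone cells",
}

var mutation = []string{
	"Determining replicas for mutation",
	"Appending to commitlog",
	"Adding to config_by_uuid memtable",
}

// concat joins several activity lists into one new slice.
func concat(parts ...[]string) []string {
	var out []string
	for _, p := range parts {
		out = append(out, p...)
	}
	return out
}

var cqlshTrace = concat([]string{
	"Parsing delete from config_by_uuid where uuid = 96ac4699-5199-4a80-9c59-b592d28ea2b7;",
	"Preparing statement",
}, rolesRead, rolesRead, mutation)

var serviceTrace = concat(rolesRead, rolesRead, mutation)

// missingFrom returns the activities of a that never occur in b,
// in their original order and without duplicates.
func missingFrom(a, b []string) []string {
	inB := make(map[string]bool, len(b))
	for _, act := range b {
		inB[act] = true
	}
	emitted := make(map[string]bool)
	var out []string
	for _, act := range a {
		if !inB[act] && !emitted[act] {
			emitted[act] = true
			out = append(out, act)
		}
	}
	return out
}

func main() {
	fmt.Printf("only in cqlsh trace:   %q\n", missingFrom(cqlshTrace, serviceTrace))
	fmt.Printf("only in service trace: %q\n", missingFrom(serviceTrace, cqlshTrace))
}
```

On the traces above, the only steps specific to the cqlsh run are the Parsing and Preparing statement activities, which is what we would expect when comparing a plain query with a prepared one. Both sessions end with the same three mutation steps, so the trace alone does not explain why the service's delete has no visible effect.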
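Here are the steps we used to install our local Cassandra on Windows 10. Note that no configuration files were changed after installation:
- Installed Java 8. java -version and javac -version both work.
- Installed Python 2. python --version works.
- Downloaded the latest Cassandra bin.tar.gz file from: http://cassandra.apache.org/download/
- Extracted the archive contents, renamed the folder to cassandra, and placed it in C:\.
- Added C:\cassandra\bin to our PATH environment variable.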
Here are the steps we used to install our deployed Cassandra on CentOS 8:
- Update yum:
  yum -y update
- Install Java:
  yum -y install java
  java -version
- Create the repo file used by yum:
  nano /etc/yum.repos.d/cassandra.repo
  ---
  [cassandra]
  name=Apache Cassandra
  baseurl=https://www.apache.org/dist/cassandra/redhat/311x/
  gpgcheck=1
  repo_gpgcheck=1
  gpgkey=https://www.apache.org/dist/cassandra/KEYS
- Install Cassandra:
  yum -y install cassandra
- Create a service file for Cassandra:
  nano /etc/systemd/system/cassandra.service
  ---
  [Unit]
  Description=Apache Cassandra
  After=network.target
  [Service]
  PIDFile=/var/run/cassandra/cassandra.pid
  User=cassandra
  Group=cassandra
  ExecStart=/usr/sbin/cassandra -f -p /var/run/cassandra/cassandra.pid
  Restart=always
  [Install]
  WantedBy=multi-user.target
- Reload the systemd daemon:
  systemctl daemon-reload
- Give Cassandra ownership of its directories:
  sudo chown -R cassandra:cassandra /var/lib/cassandra
  sudo chown -R cassandra:cassandra /var/log/cassandra
- Configure the system to run Cassandra at startup:
  systemctl enable cassandra
- Configure the cassandra.yaml file:
  nano /etc/cassandra/conf/cassandra.yaml
  ---
  (TIP: Use Ctrl+W to search for the settings you want to change.)
  authenticator: org.apache.cassandra.auth.PasswordAuthenticator
  authorizer: org.apache.cassandra.auth.CassandraAuthorizer
  role_manager: CassandraRoleManager
  roles_validity_in_ms: 0
  permissions_validity_in_ms: 0
  cluster_name: 'MyCompany Dev'
  initial_token: (should be commented-out)
  listen_address: node_ip
  rpc_address: node_ip
  endpoint_snitch: GossipingPropertyFileSnitch
  auto_bootstrap: false (add this at the bottom of the file)
  seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
        - seeds: "node_ip"
- Configure the cassandra-topology.properties file:
  nano /etc/cassandra/conf/cassandra-topology.properties
  ---
  (NOTE: For "Cassandra Node IP=Data Center:Rack", delete all existing values.)
  #Cassandra Node IP=Data Center:Rack
  [Local IP]=SG:Dev
  # default for unknown nodes
  default=SG:Dev
- Configure the cassandra-rackdc.properties file:
  nano /etc/cassandra/conf/cassandra-rackdc.properties
  ---
  dc=SG
  rack=Dev
- Run the following commands to clean the directories:
  rm -rf /var/lib/cassandra/data
  rm -rf /var/lib/cassandra/commitlog
  rm -rf /var/lib/cassandra/saved_caches
  rm -rf /var/lib/cassandra/hints
- Start Cassandra:
  service cassandra start
- Install Python 2:
  yum -y install python2
  python2 --version
- Log in as the default user:
  cqlsh -u cassandra -p cassandra node_ip --request-timeout=6000
- Create a new user:
  CREATE ROLE adminuser WITH PASSWORD = 'password' AND SUPERUSER = true AND LOGIN = true;
  exit;
- Log in as the new user:
  cqlsh -u adminuser -p password node_ip --request-timeout=6000
- Disable the default user:
  ALTER ROLE cassandra WITH PASSWORD = 'cassandra' AND SUPERUSER = false AND LOGIN = false;
  REVOKE ALL PERMISSIONS ON ALL KEYSPACES FROM cassandra;
  GRANT ALL PERMISSIONS ON ALL KEYSPACES TO adminuser;
  exit;
Our service is written in Golang and uses the following third-party libraries to communicate with Cassandra:
github.com/gocql/gocql
github.com/scylladb/gocqlx
github.com/scylladb/gocqlx/qb
Update 1: Below are the permissions of the user that both our service and cqlsh use to run queries (via list all permissions on config.config_by_uuid;):
role | username | resource | permission
----------+-----------+-------------------------------+------------
adminuser | adminuser | <all keyspaces> | CREATE
adminuser | adminuser | <all keyspaces> | ALTER
adminuser | adminuser | <all keyspaces> | DROP
adminuser | adminuser | <all keyspaces> | SELECT
adminuser | adminuser | <all keyspaces> | MODIFY
adminuser | adminuser | <all keyspaces> | AUTHORIZE
adminuser | adminuser | <keyspace config> | CREATE
adminuser | adminuser | <keyspace config> | ALTER
adminuser | adminuser | <keyspace config> | DROP
adminuser | adminuser | <keyspace config> | SELECT
adminuser | adminuser | <keyspace config> | MODIFY
adminuser | adminuser | <keyspace config> | AUTHORIZE
adminuser | adminuser | <table config.config_by_uuid> | ALTER
adminuser | adminuser | <table config.config_by_uuid> | DROP
adminuser | adminuser | <table config.config_by_uuid> | SELECT
adminuser | adminuser | <table config.config_by_uuid> | MODIFY
adminuser | adminuser | <table config.config_by_uuid> | AUTHORIZE
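The Cassandra documentation states that MODIFY grants the following permissions: INSERT, DELETE, UPDATE, TRUNCATE. Since adminuser can insert records without any problems, our delete problem does not appear to be a permissions issue.
Update 2: Below are the owners and permissions of the key Cassandra directories (via ls -al):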
/etc/cassandra:
total 20
drwxr-xr-x 3 root root 4096 Nov 12 22:18 .
drwxr-xr-x. 103 root root 12288 Nov 12 22:18 ..
lrwxrwxrwx 1 root root 27 Nov 12 22:18 conf -> /etc/alternatives/cassandra
drwxr-xr-x 3 root root 4096 Nov 12 22:18 default.conf
/var/lib/cassandra:
total 24
drwxr-xr-x 6 cassandra cassandra 4096 Nov 12 22:38 .
drwxr-xr-x. 43 root root 4096 Nov 12 22:18 ..
drwxr-xr-x 2 cassandra cassandra 4096 Nov 12 22:38 commitlog
drwxr-xr-x 8 cassandra cassandra 4096 Nov 12 22:40 data
drwxr-xr-x 2 cassandra cassandra 4096 Nov 12 22:38 hints
drwxr-xr-x 2 cassandra cassandra 4096 Nov 12 22:38 saved_caches
/var/log/cassandra:
total 3788
drwxr-xr-x 2 cassandra cassandra 4096 Nov 12 22:19 .
drwxr-xr-x. 11 root root 4096 Nov 12 22:18 ..
-rw-r--r-- 1 cassandra cassandra 2661056 Nov 12 22:41 debug.log
-rw-r--r-- 1 cassandra cassandra 52623 Nov 12 23:11 gc.log.0.current
-rw-r--r-- 1 cassandra cassandra 1141764 Nov 12 22:40 system.log
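Update 3: We also suspected a tombstone or compaction issue, so we tried setting gc_grace_seconds to 0 and running the delete query, but that did not help either.
Running nodetool compact -s config config_by_uuid with gc_grace_seconds set to 0 and to the default 864000 did not help either.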
Update 4: We tried uninstalling and reinstalling Cassandra, but that did not fix the problem. Here are the steps we used:
- Uninstall Cassandra via yum:
  yum -y remove cassandra
- Remove the following directories:
  rm -rf /var/lib/cassandra
  rm -rf /var/log/cassandra
  rm -rf /etc/cassandra
- Remove any remaining files:
  (NOTE: Run rm -rf on the results of the following commands.)
  find / -name 'cassandra'
  find / -name '*cassandra*'
  For example:
  rm -rf /run/lock/subsys/cassandra
  rm -rf /tmp/hsperfdata_cassandra
  rm -rf /etc/rc.d/rc3.d/S80cassandra
  rm -rf /etc/rc.d/rc2.d/S80cassandra
  rm -rf /etc/rc.d/rc0.d/K20cassandra
  rm -rf /etc/rc.d/rc6.d/K20cassandra
  rm -rf /etc/rc.d/rc5.d/S80cassandra
  rm -rf /etc/rc.d/rc4.d/S80cassandra
  rm -rf /etc/rc.d/rc1.d/K20cassandra
  rm -rf /root/.cassandra
  rm -rf /var/cache/dnf/cassandra-e96532ac33a46b7e
  rm -rf /var/cache/dnf/cassandra.solv
  rm -rf /var/cache/dnf/cassandra-filenames.solvx
  rm -rf /run/systemd/generator.late/graphical.target.wants/cassandra.service
  rm -rf /run/systemd/generator.late/multi-user.target.wants/cassandra.service
  rm -rf /run/systemd/generator.late/cassandra.service
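Update 5: The problem occurred on a server where we installed CentOS with the "Server" option, so we next tried "Minimal Install". Surprisingly, the problem did not occur on the minimal install. We are currently investigating what the differences might be.
Update 6: We created another server, this time also choosing the "Server" installation for CentOS. Surprisingly, the problem did not occur on this server either, so the CentOS installation type is also unrelated to our problem.
With this, we have confirmed that something is wrong with our Cassandra installation, although we are still unsure what we did wrong such that even uninstalling and reinstalling does not fix the problem on the original server.
Perhaps our uninstall steps above were not thorough enough?
Update 7: It turns out the new servers did not show the problem because the original server was built from a customized CentOS ISO rather than the vanilla one. One of our team members is looking into how the custom ISO differs, and I will update this question when they get back to us.
Update 8: It turns out the problem also exists with the supposedly vanilla CentOS ISO we use, and because the customized ISO is based on it, all of our servers currently have the problem.
However, for the problem to occur, the server needs to be restarted with the reboot command. The problem then appears on alternating reboots (reboot 1, no problem; reboot 2, problem; reboot 3, no problem).
One of our team members is currently investigating whether we are using a bad CentOS ISO. We are also considering the possibility that our ISO is fine and the problem lies in our virtual machine environment.
Update 9: The uncustomized CentOS ISO, CentOS-8-x86_64-1905-dvd1.iso, was downloaded from centos.org. We have verified its checksum and confirmed that the ISO is identical to the one on the official CentOS website.
With this, we have determined that the problem lies in our virtual machine environment.
We are using VMware ESXi to create the virtual machines that host Cassandra.
Our virtual machine details are as follows:
OS details: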
Compatibility: ESXi 6.7 virtual machine
Guest OS family: Linux
Guest OS version: CentOS 8 (64-bit)
Storage details:
Type: Standard (choices were `Standard` and `Persistent Memory`)
Datastore details:
Capacity: 886.75 GB
Free: 294.09 GB
Type: VMFS6
Thin provisioning: Supported
Access: Single
Virtual machine settings:
CPU: 1
(choices: 1-32)
Memory: 2048 MB
Hard disk 1: 16 GB
Maximum Size: 294.09 GB
Location: [datastore1] virtual_machine_name
Disk Provisioning: Thin Provisioned
(choices: Thin provisioned; Thick provisioned, lazily zeroed; Thick provisioned, eagerly zeroed)
Shares:
Type: Normal
(choices: Low, Normal, High, Custom)
Value: 1000
Limit - IOPs: Unlimited
Controller location: SCSI controller 0
(choices: IDE controller 0; IDE controller 1; SCSI controller 0; SATA controller 0)
Virtual Device Node unit: SCSI (0:0)
(choices: SCSI (0:0) to (0:64))
Disk mode: Dependent
(choices: Dependent; Independent - persistent; Independent - Non-persistent)
Sharing: None
(Disk sharing is only possible with eagerly zeroed, thick provisioned disks.)
SCSI Controller 0: VMware Paravirtual
(choices: LSI Logic SAS; LSI Logic Parallel; VMware Paravirtual)
SATA Controller 0: (no options)
USB controller 1: USB 2.0
(choices: USB 2.0; USB 3.0)
Network Adapter 1: our_domain
Connect: (checked)
CD/DVD Drive 1: Datastore ISO File (CentOS-8-x86_64-1905-dvd1.iso)
(choices: Host device; Datastore ISO File)
Connect: (checked)
Video Card: Default settings
(choices: Default settings; Specify custom settings)
Generated summary:
Name: virtual_machine_name
Datastore: datastore1
Guest OS name: CentOS 8 (64-bit)
Compatibility: ESXi 6.7 virtual machine
vCPUs: 1
Memory: 2048 MB
Network adapters: 1
Network adapter 1 network: our_domain
Network adapter 1 type: VMXNET 3
IDE controller 0: IDE 0
IDE controller 1: IDE 1
SCSI controller 0: VMware Paravirtual
SATA controller 0: New SATA controller
Hard disk 1:
Capacity: 16GB
Datastore: [datastore1] virtual_machine_name/
Mode: Dependent
Provisioning: Thin provisioned
Controller: SCSI controller 0 : 0
CD/DVD drive 1:
Backing: [datastore1] _Data/ISO/CentOS-8-x86_64-1905-dvd1.iso
Connected: Yes
USB controller 1: USB 2.0
Many thanks to everyone who took the time to read this long post!
[Discussion]:
Tags: cassandra row cql cassandra-3.0 cqlsh