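I assume you insert a large volume of data on a regular basis, so the partition key has to be chosen carefully to keep too much data from landing in a single partition. Since you aggregate the results once per hour, I chose a one-hour interval as the partition.
Here is the main table schema: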
CREATE TABLE transaction (
hour int,
day int,
month int,
year int,
transaction_id text,
item_code bigint,
payment_method text,
user_id bigint,
PRIMARY KEY ((hour, day, month, year), transaction_id)
);
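Here, split the timestamp into its hour, day, month and year components. Together they form the composite partition key, so each hour's transactions go into their own partition.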
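A minimal sketch of deriving that partition key on the client side. The function name `partition_key` is hypothetical, and normalising to UTC is an assumption (it keeps every writer mapping the same instant to the same hourly partition).

```python
from datetime import datetime, timezone


def partition_key(ts: datetime) -> tuple[int, int, int, int]:
    """Split a timestamp into the (hour, day, month, year) partition key.

    Assumes an aware datetime; it is converted to UTC first so that all
    clients agree on which hourly partition an instant belongs to.
    """
    ts = ts.astimezone(timezone.utc)
    return (ts.hour, ts.day, ts.month, ts.year)


print(partition_key(datetime(2017, 1, 1, 1, 30, tzinfo=timezone.utc)))  # (1, 1, 1, 2017)
```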
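If you want to aggregate the results, Spark or Hadoop is the better choice; they are built for this kind of job.
Or
If you want to do this kind of work in Cassandra itself, you need a separate table for each dimension. Every time you insert a row into the main table, you must also write to each of those tables.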
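To make the "write to every table" step concrete, here is a sketch that builds the full list of statements one transaction needs, using the schemas from this answer. It assumes the `%s` placeholder style of the DataStax Python driver; `statements_for` and the constant names are hypothetical, and the timestamp is assumed to be in UTC already. It only builds `(query, params)` pairs and does not talk to a cluster.

```python
from datetime import datetime

# Parameterised CQL for the main table and the three counter tables.
INSERT_TRANSACTION = (
    "INSERT INTO transaction (hour, day, month, year, transaction_id, "
    "item_code, payment_method, user_id) VALUES (%s, %s, %s, %s, %s, %s, %s, %s)"
)
_WHERE_HOUR = "WHERE hour = %s AND day = %s AND month = %s AND year = %s"
UPDATE_PAYMENT = ("UPDATE payment_method_counter SET count = count + 1 "
                  + _WHERE_HOUR + " AND type = %s")
UPDATE_USER = ("UPDATE user_transaction_counter SET count = count + 1 "
               + _WHERE_HOUR + " AND userid = %s")
UPDATE_ITEM = ("UPDATE item_sold_counter SET count = count + 1 "
               + _WHERE_HOUR + " AND item_code = %s")


def statements_for(ts: datetime, transaction_id: str, item_code: int,
                   payment_method: str, user_id: int) -> list[tuple[str, tuple]]:
    """Return every (query, params) pair one transaction must write.

    The first entry is the regular insert into the main table; the rest are
    counter increments, which Cassandra does not allow in the same batch as
    non-counter writes.
    """
    key = (ts.hour, ts.day, ts.month, ts.year)  # assumes ts is already UTC
    return [
        (INSERT_TRANSACTION, key + (transaction_id, item_code, payment_method, user_id)),
        (UPDATE_PAYMENT, key + (payment_method,)),
        (UPDATE_USER, key + (user_id,)),
        (UPDATE_ITEM, key + (item_code,)),
        (UPDATE_ITEM, key + (0,)),  # item_code = 0 row holds the hourly total
    ]
```

With the driver, each pair would be sent as `session.execute(query, params)`. Counter increments are not idempotent, so blindly retrying a timed-out counter update can over-count.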
Aggregating by payment method:
CREATE TABLE payment_method_counter (
hour int,
day int,
month int,
year int,
type text,
count counter,
PRIMARY KEY ((hour, day, month, year), type)
);
You can increment the counter with the following query:
UPDATE payment_method_counter SET count = count + 1 WHERE hour = 1 AND day = 1 AND month = 1 AND year = 2017 AND type = 'cashondelivery';
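The in-memory model below illustrates counter-table semantics: `UPDATE ... SET count = count + 1` on a row that does not exist yet behaves as if it started at 0, and reading one partition returns its rows ordered by the clustering key. `CounterTable` is a hypothetical stand-in for illustration, not a Cassandra client.

```python
from collections import Counter


class CounterTable:
    """In-memory stand-in for a Cassandra counter table (hypothetical helper)."""

    def __init__(self) -> None:
        # (partition_key, clustering_key) -> current count
        self.rows: Counter = Counter()

    def increment(self, partition: tuple, clustering, delta: int = 1) -> None:
        # Like UPDATE ... SET count = count + 1: a missing row starts at 0.
        self.rows[(partition, clustering)] += delta

    def read_partition(self, partition: tuple) -> list:
        # Like SELECT * ... WHERE <full partition key>: rows in clustering order.
        return sorted((c, n) for (p, c), n in self.rows.items() if p == partition)


table = CounterTable()
hour = (1, 1, 1, 2017)
for method in ["cashondelivery", "creditcard", "cashondelivery"]:
    table.increment(hour, method)
print(table.read_partition(hour))  # [('cashondelivery', 2), ('creditcard', 1)]
```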
Aggregating Transaction_by_unique_user_id:
CREATE TABLE user_transaction_counter (
hour int,
day int,
month int,
year int,
userid bigint,
count counter,
PRIMARY KEY ((hour, day, month, year), userid)
);
And the update query:
UPDATE user_transaction_counter SET count = count + 1 WHERE hour = 1 AND day = 1 AND month = 1 AND year = 2017 AND userid = 5;
Total items sold:
CREATE TABLE item_sold_counter (
hour int,
day int,
month int,
year int,
item_code bigint,
count counter,
PRIMARY KEY ((hour, day, month, year), item_code)
);
You can increment it with:
UPDATE item_sold_counter SET count = count + 1 WHERE hour = 1 AND day = 1 AND month = 1 AND year = 2017 AND item_code = 4;
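For the total number of items sold, use a special value such as item_code = 0. For every item sold, also increment the row with item_code = 0, so that row holds the total for the hour.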
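A small sketch of the two increments issued per sold item, using a plain `Counter` to stand in for one hourly partition of `item_sold_counter`. `record_sale` and `TOTAL_ITEM_CODE` are hypothetical names, and the code assumes no real product uses item code 0. The three sales are hypothetical, chosen to match the sample output later in this answer.

```python
from collections import Counter

TOTAL_ITEM_CODE = 0  # reserved sentinel; assumes no real item has code 0


def record_sale(item_counts: Counter, item_code: int) -> None:
    """Mirror the two counter increments issued for one sold item."""
    if item_code == TOTAL_ITEM_CODE:
        raise ValueError("item_code 0 is reserved for the hourly total")
    item_counts[item_code] += 1        # per-item row
    item_counts[TOTAL_ITEM_CODE] += 1  # running-total row


hour_counts = Counter()
for code in [4, 4, 3]:  # hypothetical sales within one hour
    record_sale(hour_counts, code)
print(sorted(hour_counts.items()))  # [(0, 3), (3, 1), (4, 2)]
```

Reserving a sentinel keeps the total readable from the same partition as the per-item rows, at the cost of one extra counter write per sale.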
Getting the results:
You can get the aggregated results for one hour with queries like these:
cassandra@cqlsh:test> SELECT * FROM payment_method_counter WHERE hour = 1 AND day = 1 AND month = 1 AND year = 2017;
hour | day | month | year | type | count
------+-----+-------+------+----------------+-------
1 | 1 | 1 | 2017 | cashondelivery | 2
1 | 1 | 1 | 2017 | creditcard | 1
(2 rows)
cassandra@cqlsh:test> SELECT * FROM user_transaction_counter WHERE hour = 1 AND day = 1 AND month = 1 AND year = 2017;
hour | day | month | year | userid | count
------+-----+-------+------+--------+-------
1 | 1 | 1 | 2017 | 5 | 2
1 | 1 | 1 | 2017 | 6 | 1
(2 rows)
cassandra@cqlsh:test> SELECT * FROM item_sold_counter WHERE hour = 1 AND day = 1 AND month = 1 AND year = 2017;
hour | day | month | year | item_code | count
------+-----+-------+------+-----------+-------
1 | 1 | 1 | 2017 | 0 | 3
1 | 1 | 1 | 2017 | 3 | 1
1 | 1 | 1 | 2017 | 4 | 2
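Because every partition covers a single hour, a total for a whole day means reading that day's hourly partitions and summing the rows on the client. The sketch below assumes hours are stored as 0 to 23, and `read_partition` is a hypothetical callback you would implement with one of the SELECT queries above; here it is fed a plain dict for illustration.

```python
from collections import Counter
from typing import Callable, Iterable


def day_partitions(day: int, month: int, year: int) -> list[tuple[int, int, int, int]]:
    """All (hour, day, month, year) keys for one day, assuming hours 0..23."""
    return [(hour, day, month, year) for hour in range(24)]


def daily_totals(read_partition: Callable[[tuple], Iterable[tuple]],
                 day: int, month: int, year: int) -> Counter:
    """Sum (clustering_key, count) rows from each hourly partition of a day."""
    totals: Counter = Counter()
    for key in day_partitions(day, month, year):
        for clustering, count in read_partition(key):
            totals[clustering] += count
    return totals


# Hypothetical hourly rows, keyed by partition key.
hourly = {
    (1, 1, 1, 2017): [("cashondelivery", 2), ("creditcard", 1)],
    (2, 1, 1, 2017): [("creditcard", 4)],
}
print(daily_totals(lambda k: hourly.get(k, []), 1, 1, 2017))
```

The 24 reads are independent, so a real client could issue them concurrently rather than one after another.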