【Posted】: 2016-06-03 23:03:55
【Question】:
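I'm trying to work with the reddit data on BigQuery, and I want to see a comment and its reply on a single row. I see that BigQuery supports subqueries, but I can't manage to construct the query. Because of how the data is structured, I have to use a subquery to self-join the same table: specifically, I want to join id to parent_id, but I first have to modify id (prefix it with `t1_`) before the two will match. Here is how I tried to write the query: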
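The key relationship can be sketched in plain Python, outside BigQuery. On reddit, a reply's `parent_id` holds the parent's "fullname", which is `t1_` + id for a comment (`t3_` + id when the parent is the link post itself). The rows and the function name `pair_with_replies` below are hypothetical and only illustrate the join condition `parent_id = 't1_' + id`, implemented as a simple hash join.

```python
# Hypothetical toy rows: (id, parent_id, body).
rows = [
    ("a1", "t3_x9", "top-level comment"),  # parent is the link post, not a comment
    ("b2", "t1_a1", "reply to a1"),
    ("c3", "t1_b2", "reply to b2"),
]


def pair_with_replies(rows):
    """Self-join rows on parent_id == 't1_' + id (a simple hash join)."""
    # Index every comment by its prefixed id, i.e. the value a child stores.
    by_fullname = {"t1_" + cid: body for cid, _, body in rows}
    pairs = []
    for _, parent_id, body in rows:
        parent_body = by_fullname.get(parent_id)
        if parent_body is not None:  # parent is a comment in this table
            pairs.append((parent_body, body))
    return pairs


print(pair_with_replies(rows))
```

Each output pair is (comment body, reply body), which is the single-row shape the question is after.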
SELECT
p.subreddit,
p.body AS first_body,
p.score AS first_score,
CONCAT('t1_',p.id) AS first_id ,
c.last_body,
c.last_score,
c.last_id
FROM
[fh-bigquery:reddit_comments.2016_01] p,
(
SELECT
body AS last_body,
score AS last_score,
CONCAT('t1_',id) AS last_id,
parent_id,
author,
body
FROM [fh-bigquery:reddit_comments.2016_01]
WHERE body != '[deleted]'
AND author != '[deleted]'
AND score > 1
) c
WHERE p.first_id = c.parent_id
AND p.score > 1
AND p.author != '[deleted]'
AND p.body != '[deleted]';
The error I get is:
Field 'c.parent_id' not found in table 'fh-bigquery:reddit_comments.2016_01'; did you mean 'parent_id'?
You can run the query here: https://bigquery.cloud.google.com/table/fh-bigquery:reddit_comments.2016_01
I'm not sure how to fix this. What is the correct way to write this join so that the query runs?
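Two things in the query above likely contribute to the error. First, `first_id` is only an alias in the outer SELECT list, so the table `p` has no `first_id` column for the WHERE clause to reference. Second, in BigQuery legacy SQL a comma between tables in FROM means UNION ALL, not a join. One common pattern is to compute the prefixed key inside a subquery and then use an explicit `JOIN ... ON`. The sketch below tests that pattern locally with Python's built-in `sqlite3`. The table name `comments`, its rows, and `comment_reply_pairs` are hypothetical stand-ins for `[fh-bigquery:reddit_comments.2016_01]`. SQLite concatenates strings with `||`. In BigQuery standard SQL you would write `CONCAT('t1_', id)` and the backtick table name `` `fh-bigquery.reddit_comments.2016_01` ``, and standard SQL also accepts the expression directly in the ON clause.

```python
import sqlite3

# Hypothetical stand-in rows: (id, parent_id, author, body, score, subreddit).
ROWS = [
    ("a1", "t3_x9", "alice", "top-level comment", 5, "python"),
    ("b2", "t1_a1", "bob", "reply to alice", 3, "python"),
    ("c3", "t1_a1", "[deleted]", "[deleted]", 4, "python"),
    ("d4", "t1_b2", "carol", "low-score reply", 1, "python"),
]

QUERY = """
SELECT
  p.subreddit,
  p.body  AS first_body,
  p.score AS first_score,
  p.first_id,
  c.body  AS last_body,
  c.score AS last_score,
  't1_' || c.id AS last_id
FROM (
  -- Compute the prefixed key inside the subquery so the outer
  -- query can refer to it as p.first_id.
  SELECT subreddit, body, score, 't1_' || id AS first_id
  FROM comments
  WHERE score > 1 AND author != '[deleted]' AND body != '[deleted]'
) AS p
JOIN comments AS c
  ON c.parent_id = p.first_id
WHERE c.score > 1
  AND c.author != '[deleted]'
  AND c.body != '[deleted]'
"""


def comment_reply_pairs(rows):
    """Load rows into an in-memory table and run the self-join."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE comments (id TEXT, parent_id TEXT, author TEXT,"
            " body TEXT, score INTEGER, subreddit TEXT)"
        )
        conn.executemany("INSERT INTO comments VALUES (?, ?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in comment_reply_pairs(ROWS):
    print(row)
```

With these toy rows only the alice/bob pair survives: the deleted reply and the reply with a score of 1 are filtered out by the conditions on `c`.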
【Discussion】:
Tags: sql subquery google-bigquery reddit bigdata