【Title】: BigQuery Google Analytics Users Not Counting Unique
【Posted】: 2017-05-03 10:22:12
【Question】:

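I'm having trouble with the following query against Google Analytics data in BigQuery. For some reason I can't get the user count to come out as unique users; it is essentially counting rows, so the numbers are very close to sessions. I also tried EXACT_COUNT_DISTINCT(), but it gave the same answer.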

SELECT
  date AS Day,
  MAX(CASE
      WHEN hits.sourcePropertyInfo.sourcePropertyTrackingId CONTAINS '778****' THEN 'MUG'
      WHEN hits.sourcePropertyInfo.sourcePropertyTrackingId = 'Social' THEN 'Social'ELSE 'Website' END) AS Property,
  geoNetwork.country AS Country,
  SUM(totals.visits) AS visits,
  COUNT (DISTINCT(fullVisitorId), 1000000) AS Users,
  SUM(IFNULL(totals.newVisits,0)) AS NEW,
  (SUM(IFNULL(totals.screenviews,0))+SUM(IFNULL(totals.pageviews,0))) AS PAGEVIEWS,
  IFNULL(SUM(CASE
        WHEN totals.screenviews = 1 THEN SUM(IFNULL(totals.screenviews,0))
        ELSE 0 END)+ SUM(IFNULL(totals.bounces,0)),0) AS BOUNCES,
  SUM(CASE
      WHEN REGEXP_MATCH(hits.eventInfo.eventAction,'register$|registersuccess|new registration|account signup|registro') THEN 1
      ELSE 0 END) AS NewRegistrations,
  SUM(CASE
      WHEN REGEXP_MATCH(hits.eventInfo.eventAction, 'add to cart|add to bag|click to buy|ass to basket|comprar') OR hits.eventInfo.eventAction CONTAINS 'addtobasket::' THEN 1
      ELSE 0 END) AS ClickToBuy,
  SUM(IFNULL(totals.transactions,0)) AS Transactions,
  SUM(IFNULL(totals.transactionRevenue,0))/1000000 AS Revenue
FROM (TABLE_DATE_RANGE([****.ga_sessions_], TIMESTAMP('2017-03-15'), TIMESTAMP('2017-03-31'))),
GROUP BY
  Day,
  Country,
  geoNetwork.country,
  totals.screenviews;

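【Comments】:

  • Why are you grouping by screenviews?
  • @ElliottBrossard I think that is the problem. I tried leaving it out, but it keeps forcing me to include it.
  • I think the problem is that you have some nested aggregation, i.e. a SUM inside a SUM. If you fix that logic, the query should work. That said, I'd really recommend using standard SQL for your analysis. You may also be interested in the migration guide.
  • Thanks @elliot, will give it a try. Would subqueries in the SELECT be the biggest advantage here?
  • A couple of others are that COUNT(DISTINCT ...) gives exact results (and is generally faster than EXACT_COUNT_DISTINCT), and that the handling of repeated fields, which matters for the ga_sessions_ tables, is much more sensible, although you may find there is a learning curve. The Working with Arrays topic is a good introduction.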
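
One plausible reading of the comments above is that the extra GROUP BY key is what makes the user count track sessions: once rows are split by totals.screenviews, the same visitor is counted once per bucket, and adding the buckets back up inflates the total. The sketch below reproduces that effect with Python's built-in sqlite3 module standing in for BigQuery. The `sessions` table, its columns, and its four rows are hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical toy data: visitor A has three sessions, visitor B has one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions (day TEXT, country TEXT, visitor TEXT, screenviews INTEGER)"
)
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?, ?)",
    [
        ("20170315", "UK", "A", 1),
        ("20170315", "UK", "A", 2),
        ("20170315", "UK", "A", 3),
        ("20170315", "UK", "B", 1),
    ],
)

# Grouping by day and country only: each visitor is counted once.
per_day = conn.execute(
    "SELECT day, country, COUNT(DISTINCT visitor) "
    "FROM sessions GROUP BY day, country"
).fetchall()

# Adding screenviews as a grouping key: visitor A appears in three buckets.
per_bucket = conn.execute(
    "SELECT day, country, screenviews, COUNT(DISTINCT visitor) "
    "FROM sessions GROUP BY day, country, screenviews "
    "ORDER BY screenviews"
).fetchall()

users_correct = per_day[0][2]                       # 2 distinct visitors
users_inflated = sum(row[3] for row in per_bucket)  # 4, same as the session count
print(users_correct, users_inflated)
```

Here the per-bucket counts add up to the number of sessions rather than the number of visitors, which matches the symptom described in the question.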

Tags: google-analytics google-bigquery


【Solution 1】:

I just tested this query, which is a somewhat simplified version:

SELECT
date,
MAX(CASE
     WHEN hits.sourcePropertyInfo.sourcePropertyTrackingId CONTAINS '778****' THEN 'MUG'
     WHEN hits.sourcePropertyInfo.sourcePropertyTrackingId = 'Social' THEN 'Social'ELSE 'Website' END) AS Property,
geoNetwork.country AS Country,
SUM(totals.visits) AS visits,
COUNT(DISTINCT(fullVisitorId), 1000000) AS Users,
SUM(totals.newVisits) AS NEW,
SUM(totals.pageviews) AS PAGEVIEWS,
SUM(totals.bounces) AS BOUNCES,
SUM(CASE
      WHEN REGEXP_MATCH(hits.eventInfo.eventAction,'register$|registersuccess|new registration|account signup|registro') THEN 1
      ELSE 0 END) AS NewRegistrations,
SUM(CASE
      WHEN REGEXP_MATCH(hits.eventInfo.eventAction, 'add to cart|add to bag|click to buy|ass to basket|comprar|addtobasket::') THEN 1
      ELSE 0 END) AS ClickToBuy,
SUM(totals.transactions) AS Transactions,
SUM(totals.transactionRevenue) /1000000 AS Revenue
FROM (TABLE_DATE_RANGE([project_id:dataset_id.ga_sessions_], TIMESTAMP('2017-03-15'), TIMESTAMP('2017-03-31'))),
GROUP BY
date, Country

It does work against our data (I'm not sure why you were adding screenviews and pageviews together).

In standard SQL (I strongly recommend using this version), this may already solve it for you:

SELECT
date,
MAX(CASE
     WHEN exists(select 1 from unnest(hits) hits where regexp_contains(hits.sourcePropertyInfo.sourcePropertyTrackingId, r'778\*\*\*\*')) THEN 'MUG'
     WHEN exists(select 1 from unnest(hits) hits where hits.sourcePropertyInfo.sourcePropertyTrackingId = 'Social') THEN 'Social'ELSE 'Website' END) AS Property,
geoNetwork.country AS Country,
SUM(totals.visits) AS visits,
COUNT(DISTINCT(fullVisitorId)) AS Users,
SUM(totals.newVisits) AS new_,
SUM(totals.pageviews) AS PAGEVIEWS,
SUM(totals.bounces) AS BOUNCES,
SUM(CASE
      WHEN exists(select 1 from unnest(hits) hits where REGEXP_contains(hits.eventInfo.eventAction,'register$|registersuccess|new registration|account signup|registro')) THEN 1
      ELSE 0 END) AS NewRegistrations,
SUM(CASE
      WHEN exists(select 1 from unnest(hits) hits where REGEXP_contains(hits.eventInfo.eventAction, 'add to cart|add to bag|click to buy|ass to basket|comprar|addtobasket::')) THEN 1
      ELSE 0 END) AS ClickToBuy,
SUM(totals.transactions) AS Transactions,
SUM(totals.transactionRevenue) /1000000 AS Revenue
FROM `project_id.dataset_id.ga_sessions*`
where 1 = 1
and parse_timestamp("%Y%m%d", regexp_extract(_table_suffix, r'.*_(.*)')) between  TIMESTAMP('2017-03-15') and  TIMESTAMP('2017-03-31')
GROUP BY
date, Country

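【Comments】:

  • I'm combining screenviews from the app with pageviews from the website, but I guess it would be easier if I just built intermediate tables! Thanks @will, this works.
  • Hi @will, I'm trying out the standard SQL version. I'd like to add hostname and home page load speed. Could you show me how?
  • Glad to know it works :). As for adding hostname and home page load speed, it's probably best to start a new question showing what you want to achieve and what you've tried, since it may be more complicated to solve (hits is a repeated field in the GA schema, so it may have to be unnested, which could change the query).
  • Managed to solve it through a lot of trial and error, thanks!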
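
On the first comment's point about combining app screenviews with website pageviews: a single combined total is possible without intermediate tables, as long as NULLs are handled per row before adding. The sketch below again uses Python's sqlite3 as a stand-in for BigQuery, with a hypothetical `sessions` table in which web rows have no screenviews and app rows have no pageviews. It is an illustration of the NULL behaviour, not the asker's final query.

```python
import sqlite3

# Hypothetical toy data: web sessions carry pageviews, app sessions carry screenviews.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions (source TEXT, pageviews INTEGER, screenviews INTEGER)"
)
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?)",
    [("web", 5, None), ("web", 3, None), ("app", None, 4)],
)

# Adding the raw columns yields NULL on every row, so SUM has nothing to add up.
naive = conn.execute(
    "SELECT SUM(pageviews + screenviews) FROM sessions"
).fetchone()[0]

# Replacing NULL with 0 per row first gives the combined total.
combined = conn.execute(
    "SELECT SUM(IFNULL(pageviews, 0) + IFNULL(screenviews, 0)) FROM sessions"
).fetchone()[0]

print(naive, combined)
```

The question's original expression, SUM(IFNULL(...)) + SUM(IFNULL(...)), handles the same case by defaulting each column separately before summing.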