【Question title】: Calculating unique pageviews in BigQuery
【Posted】: 2016-05-12 14:52:14
【Question description】:

I'm trying to calculate the unique pageviews for these 2 pages:

  • Account signup -> mysite.com/form?account=true&subscribed
  • Account only -> mysite.com/form?account=true

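I have this query with a CASE expression. When I run it separately for each page parameter, I get different results than when I run it for both combined, and the combined results are inaccurate. Can someone tell me what I'm doing wrong here?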

    SELECT
      COUNT(DISTINCT (CASE WHEN hits.type = "PAGE" THEN CONCAT(fullvisitorid,
        STRING(visitid), hits.page.pagepath) END)) AS UniquePageViews,
      CASE WHEN (REGEXP_MATCH (hits.page.pagePath, '(.*account=true)')) THEN "Accounts"
           WHEN (REGEXP_MATCH (hits.page.pagePath, '(.*subscribed)')) THEN "Signups"
           ELSE "Others" END AS Goals
    FROM
      [mydata.ga_sessions_20150506]
    GROUP BY
      Goals
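The UniquePageViews expression above counts each distinct combination of visitor, visit and page path among PAGE hits. A minimal local Python simulation of that idea might look like the sketch below; the sample rows and the unique_pageviews helper are hypothetical and not part of BigQuery. It keys on a tuple rather than a concatenated string, because plain concatenation, as in CONCAT(fullvisitorid, STRING(visitid), ...), could in principle merge two different keys: visitor "1" with visit "23" and visitor "12" with visit "3" both become "123".

```python
# Hypothetical flattened hits rows: (fullVisitorId, visitId, hit type, pagePath).
hits = [
    ("1", 100, "PAGE", "mysite.com/form?account=true"),
    ("1", 100, "PAGE", "mysite.com/form?account=true"),   # reload in the same visit
    ("1", 100, "EVENT", "mysite.com/form?account=true"),  # not a pageview
    ("1", 200, "PAGE", "mysite.com/form?account=true"),   # same visitor, new visit
    ("2", 100, "PAGE", "mysite.com/form?account=true&subscribed"),
]


def unique_pageviews(rows):
    """Count distinct (visitor, visit, path) keys among PAGE hits only."""
    keys = {(visitor, visit, path) for visitor, visit, kind, path in rows if kind == "PAGE"}
    return len(keys)


print(unique_pageviews(hits))  # 3
```

The reload collapses into one key and the EVENT hit is ignored, which is what "unique" pageviews are meant to capture.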

【Question discussion】:

Tags: google-bigquery


【Solution 1】:

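The problem is probably the case where the strings "account=true" and "subscribed" appear in the same pagePath. A CASE expression assigns each row only to the first WHEN branch that matches, so such a page is counted in only one of the two groups.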
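The first-match behaviour can be reproduced locally. This Python sketch assumes REGEXP_MATCH behaves like a partial, search-style match (approximated here with re.search); goal_first_match is a hypothetical helper that evaluates the question's WHEN branches in order, as CASE does. Because the signup URL also contains account=true, it never reaches the "Signups" branch.

```python
import re


def goal_first_match(path):
    """Mimic the question's CASE: the first WHEN branch that matches wins."""
    if re.search(r"(.*account=true)", path):
        return "Accounts"
    if re.search(r"(.*subscribed)", path):
        return "Signups"
    return "Others"


print(goal_first_match("mysite.com/form?account=true&subscribed"))  # Accounts
print(goal_first_match("mysite.com/form?account=true"))  # Accounts
```

Both of the question's pages end up labelled "Accounts", so the combined query cannot report signups for the signup URL.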

One workaround is to change the matching conditions, for example:

    SELECT
      EXACT_COUNT_DISTINCT(CASE WHEN hits.type = "PAGE" THEN CONCAT(fullvisitorid,
        STRING(visitid), hits.page.pagepath) END) AS UniquePageViews,
      CASE WHEN (REGEXP_MATCH (hits.page.pagePath, '(account=true)') AND NOT REGEXP_MATCH (hits.page.pagePath, '(subscribed)')) THEN "Accounts"
           WHEN (REGEXP_MATCH (hits.page.pagePath, '(subscribed)') AND NOT REGEXP_MATCH (hits.page.pagePath, '(account=true)')) THEN "Signups"
           WHEN (REGEXP_MATCH (hits.page.pagePath, '(subscribed)') AND REGEXP_MATCH (hits.page.pagePath, '(account=true)')) THEN "Both"
           ELSE "Others" END AS Goals
    FROM
      [mydata.ga_sessions_20150506]
    GROUP BY
      Goals
    

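I forced the "Accounts" condition to match only "account=true" without "subscribed" (and the "Signups" condition the other way round), so pages that contain both strings go into a separate "Both" group.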
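A Python sketch of the answer's four-way split, again approximating REGEXP_MATCH with re.search (an assumption) and using a hypothetical goal_exclusive helper. With the URLs from the question, a signup page contains both strings, so it lands in "Both" rather than "Signups".

```python
import re


def goal_exclusive(path):
    """Mimic the answer's CASE: label a path by which of the two strings it contains."""
    has_account = re.search(r"account=true", path) is not None
    has_subscribed = re.search(r"subscribed", path) is not None
    if has_account and not has_subscribed:
        return "Accounts"
    if has_subscribed and not has_account:
        return "Signups"
    if has_account and has_subscribed:
        return "Both"
    return "Others"


print(goal_exclusive("mysite.com/form?account=true&subscribed"))  # Both
print(goal_exclusive("mysite.com/form?account=true"))  # Accounts
```

Each path still receives exactly one label, so the groups remain mutually exclusive.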

For example, this is what I tested on a ga_sessions dataset:

    SELECT
      EXACT_COUNT_DISTINCT(CASE WHEN hits.type = "PAGE" THEN CONCAT(fullvisitorid, STRING(visitid), hits.page.pagepath) END) AS UniquePageViews,
      CASE WHEN (REGEXP_MATCH (hits.page.pagePath, '(colcci)') AND NOT REGEXP_MATCH (hits.page.pagePath, '(lacoste)')) THEN "colcci"
           WHEN (REGEXP_MATCH (hits.page.pagePath, '(lacoste)') AND NOT REGEXP_MATCH (hits.page.pagePath, '(colcci)')) THEN "lacoste"
           WHEN (REGEXP_MATCH (hits.page.pagePath, '(lacoste)') AND REGEXP_MATCH (hits.page.pagePath, '(colcci)')) THEN 'both'
           ELSE "Others" END AS Goals
    FROM [40663402.ga_sessions_20150506]
    GROUP BY
      Goals
    

Hope this helps. If you have any questions, let us know.

【Discussion】:

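  • I get the same results as before. Maybe the problem is with the unique pageviews? I tested unique pageviews with the query below and got accurate numbers, but I'm not sure how to incorporate this query into my example. SELECT COUNT(1) as unique_pageviews FROM ( SELECT hits.page.pagePath, hits.page.pageTitle, fullVisitorId, visitNumber, COUNT(1) as hits FROM [my_table] WHERE hits.type='PAGE' GROUP BY hits.page.pagePath, hits.page.pageTitle, fullVisitorId, visitNumber)
  • Not sure what's wrong. Maybe if you try adding: where hits.page.pagepath contains ("account=true&subscribed" or ("account=true" and not "subscribed") it might fix it, but I'm not sure why the first query doesn't work. (I haven't tested this query either, so it may not work.)
  • Quick question: could you share with us the results of the query I suggested and of the query you say is correct?
  • UniquePageViews/Goals: 62 = Both, 331269 = Others, 140 = Signups, 31 = Accounts
  • @willy, the correct values should be 93 = Accounts and 202 = Signups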
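The figures in these comments line up arithmetically if each page is counted in every group whose string it contains: 31 + 62 = 93 and 140 + 62 = 202. That suggests the expected numbers are overlapping counts, which a single CASE (one label per row) cannot produce. The sketch below, on hypothetical rows with a hypothetical overlapping_unique_pageviews helper, counts unique pageviews per pattern independently. In legacy SQL the analogous idea would presumably be one conditional count column per goal (for example EXACT_COUNT_DISTINCT over an IF(...) expression), but that is an untested assumption here.

```python
import re

# Hypothetical (fullVisitorId, visitId, pagePath) rows, PAGE hits only.
pageviews = [
    ("1", 100, "mysite.com/form?account=true"),
    ("2", 100, "mysite.com/form?account=true&subscribed"),
    ("2", 100, "mysite.com/form?account=true&subscribed"),  # duplicate in the same visit
    ("3", 300, "mysite.com/home"),
]

# Each goal is matched independently, so one path may count toward several goals.
GOALS = {"Accounts": r"account=true", "Signups": r"subscribed"}


def overlapping_unique_pageviews(rows):
    """Count distinct (visitor, visit, path) keys per goal pattern."""
    return {
        goal: len({(v, s, p) for v, s, p in rows if re.search(pattern, p)})
        for goal, pattern in GOALS.items()
    }


print(overlapping_unique_pageviews(pageviews))  # {'Accounts': 2, 'Signups': 1}
```

The signup page contributes to both "Accounts" and "Signups", mirroring how 62 "Both" pageviews would be added to each of the exclusive counts.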