【Question title】: BigQuery landing page figures not consistent with the Google Analytics interface
【Posted】: 2017-04-20 13:26:50
【Question】:

I'm using BigQuery to report on Google Analytics data, and I'm trying to recreate the Landing Pages data with it.

The following query reports 18% fewer sessions than the Google Analytics interface:

SELECT DISTINCT
  fullVisitorId,
  visitID,
  h.page.pagePath AS LandingPage
FROM
  `project-name.dataset.ga_sessions_*`, UNNEST(hits) AS h
WHERE 
  hitNumber = 1
AND h.type = 'PAGE'
AND _TABLE_SUFFIX BETWEEN '20170331' AND '20170331'
ORDER BY fullVisitorId DESC

Where is my approach going wrong? Why can't I get within a small margin of the figures reported in the GA interface?

【Question comments】:

    Tags: google-analytics google-bigquery


    【Answer 1】:

    There are several reasons:

    1. The BigQuery equivalent of the Landing Pages report:

    SELECT
      LandingPage,
      COUNT(sessionId) AS Sessions,
      100 * SUM(totals.bounces)/COUNT(sessionId) AS BounceRate,
      AVG(totals.pageviews) AS AvgPageviews,
      SUM(totals.timeOnSite)/COUNT(sessionId) AS AvgTimeOnSite
    from(
      SELECT
        CONCAT(fullVisitorId,STRING(visitId)) AS sessionID,
        totals.bounces,
        totals.pageviews,
        totals.timeOnSite,
        hits.page.pagePath AS landingPage
      FROM (
        SELECT
          fullVisitorId,
          visitId,
          hits.page.pagePath,
          totals.bounces,
          totals.pageviews,
          totals.timeOnSite,
          MIN(hits.hitNumber) WITHIN RECORD AS firstHit,
          hits.hitNumber AS hitNumber
        FROM (TABLE_DATE_RANGE ([XXXYYYZZZ.ga_sessions_],TIMESTAMP('2016-08-01'), TIMESTAMP ('2016-08-31')))
        WHERE
          hits.type = 'PAGE'
          AND hits.page.pagePath != '')
      WHERE
        hitNumber = firstHit)
    GROUP BY
      LandingPage
    ORDER BY
      Sessions DESC,
      LandingPage
    

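    2. Precalculated data (pre-aggregated tables)

    Google precalculates data, known as pre-aggregated tables, to speed up the UI. Google doesn't say exactly when this is done, and it can happen at any time.

    So if you compare the figures in the GA UI with your BigQuery output, you will always see some discrepancy. Go ahead and rely on your BigQuery data.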
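
    For readers who want point 1 in standard SQL, here is a sketch intended to mirror the legacy query. It uses the path of the lowest-numbered PAGE hit with a non-empty path as the landing page. The table name and date range are placeholders. The `totals.visits = 1` filter is an added assumption, borrowed from the other answers where it skips midnight-split sessions. Compare the output with the legacy query before relying on it.

    ```sql
    #standardSQL
    SELECT
      landingPage,
      COUNT(*) AS sessions,
      100 * SUM(IFNULL(totals.bounces, 0)) / COUNT(*) AS bounceRate,
      AVG(totals.pageviews) AS avgPageviews,
      SUM(IFNULL(totals.timeOnSite, 0)) / COUNT(*) AS avgTimeOnSite
    FROM (
      SELECT
        totals,
        -- path of the lowest-numbered PAGE hit in this session (NULL if none)
        (SELECT h.page.pagePath
         FROM UNNEST(hits) AS h
         WHERE h.type = 'PAGE' AND h.page.pagePath != ''
         ORDER BY h.hitNumber
         LIMIT 1) AS landingPage
      FROM `project-name.dataset.ga_sessions_*`  -- placeholder table
      WHERE _TABLE_SUFFIX BETWEEN '20160801' AND '20160831'
        AND totals.visits = 1  -- assumption: skip midnight-split sessions
    )
    WHERE landingPage IS NOT NULL  -- like the legacy query, drop sessions without a PAGE hit
    GROUP BY landingPage
    ORDER BY sessions DESC, landingPage
    ```

    Each row of `ga_sessions_*` is one session, so `COUNT(*)` plays the role of the legacy `COUNT(sessionId)`.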

    【Comments】:

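    • Thanks for your reply @Tushar. If I understand correctly, my query only looked at hitNumber = 1, which is why it came out 18% low. It needs to account for cases where the first hit isn't numbered 1, hence the MIN function. Also, even then, the GA interface trades some accuracy for scale. On my site's data, running the query above suggests the inaccuracy could be as high as 6%. Does that sound right? No I have to figure out how to rewrite your query in standard SQL, though that might be another question!
    • @goose I'm an analyst myself and work closely with Google. An acceptable discrepancy is 5-10%. But I won't stop you from writing and checking it yourself. Let me know if you have any concerns; maybe I can help :)
    • Thanks again @Tushar. I wasn't saying it isn't, I just wanted to check that I understood correctly. Useful to know.
    • Oki, you wrote "No" after the question mark; it should be "Now" ;) Sorry for the confusion :D
    • Ah, my bad. Yes, a typo. Oops!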
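
    The comments above attribute the 18% gap to the `hitNumber = 1` filter. If a session's first hit is not a PAGE hit (an EVENT hit, for example), the question's query drops that session. The sketch below is a diagnostic for measuring this on your own data. It reuses the question's placeholder table and date. The `totals.visits = 1` filter is an assumption taken from the other answers.

    ```sql
    #standardSQL
    SELECT
      COUNT(*) AS sessions,
      COUNTIF(firstHitType = 'PAGE') AS firstHitIsPage,
      -- sessions the question's query would miss
      COUNTIF(firstHitType IS NULL OR firstHitType != 'PAGE') AS missedByHitNumber1
    FROM (
      SELECT
        -- type of hit number 1 in each session
        (SELECT h.type FROM UNNEST(hits) AS h WHERE h.hitNumber = 1 LIMIT 1) AS firstHitType
      FROM `project-name.dataset.ga_sessions_*`  -- placeholder table
      WHERE _TABLE_SUFFIX BETWEEN '20170331' AND '20170331'
        AND totals.visits = 1  -- assumption: count sessions as the other answers do
    )
    ```

    If `missedByHitNumber1` is close to the gap you see against the GA UI, the fix is to take the first PAGE hit rather than hit number 1.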
    【Answer 2】:

    You can achieve the same thing by simply adding the following to your SELECT statement:

    ,(SELECT page.pagePath FROM UNNEST(hits) WHERE hitnumber = (SELECT MIN(hitnumber) FROM UNNEST(hits) WHERE type = 'PAGE')) landingpage

    When I run something like the following, I get a 1-to-1 match with the GA UI, and it's more concise than the original answer:

    SELECT DISTINCT
       a.landingpage
      ,COUNT(DISTINCT(a.sessionId)) sessions
      ,SUM(a.bounces) bounces
      ,AVG(a.avg_pages) avg_pages
      ,(SUM(tos)/COUNT(DISTINCT(a.sessionId)))/60 session_duration
    FROM
    (
        SELECT DISTINCT 
           CONCAT(CAST(fullVisitorId AS STRING),CAST(visitStartTime AS STRING)) sessionId
          ,(SELECT page.pagePath FROM UNNEST(hits) WHERE hitnumber = (SELECT MIN(hitnumber) FROM UNNEST(hits) WHERE type = 'PAGE')) landingpage
          ,totals.bounces bounces
          ,totals.timeonsite tos
          ,(SELECT COUNT(hitnumber) FROM UNNEST(hits) WHERE type = 'PAGE') avg_pages
        FROM `tablename_*`
          WHERE _TABLE_SUFFIX >= '20180801'
           AND _TABLE_SUFFIX <= '20180808'
            AND totals.visits = 1   
    ) a
    GROUP BY 1
    

      【Answer 3】:

      Here's another way that gives the same numbers:

      SELECT
        LandingPage,
        COUNT(DISTINCT(sessionID)) AS sessions
      FROM (
        SELECT
          CONCAT(fullVisitorId, CAST(visitId AS STRING)) AS sessionID,
          FIRST_VALUE(hits.page.pagePath) OVER (PARTITION BY CONCAT(fullVisitorId, CAST(visitId AS STRING)) ORDER BY hits.hitNumber ASC) AS LandingPage
        FROM
          `xxxxxxxx1.ga_sessions_*`,
          UNNEST(hits) AS hits
        WHERE
          _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
          AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
          AND hits.type = 'PAGE'
        GROUP BY fullVisitorId, visitId, sessionID, hits.page.pagePath, hits.hitNumber
      )
      GROUP BY LandingPage
      ORDER BY sessions DESC
      

        【Answer 4】:

        There is a hits.isEntrance field in the schema that can be used for this. The example below shows yesterday's landing pages:

        #standardSQL
        select
          date,
          hits.page.pagePath as landingPage,
          sum(totals.visits) as visits,
          sum(totals.bounces) as bounces,
          sum(totals.transactions) as transactions
        from
          `project.dataset.ga_sessions_*`,
          unnest(hits) as hits
        where
          (_table_suffix
            between format_date("%Y%m%d", date_sub(current_date(), interval 1 day))
            and format_date("%Y%m%d", date_sub(current_date(), interval 1 day)))
          and hits.isEntrance = True
          and totals.visits = 1 #avoid counting midnight-split sessions
        group by
          1, 2
        order by 3 desc
        

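        But there is still one source of discrepancy: sessions without a landing page. If you check the Landing Pages report in GA, there is sometimes a (not set) value.

        To include those sessions as well, you can do this: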

        with
        landing_pages_set as (
          select
            concat(cast(fullVisitorId as string), cast(visitId as string), cast(date as string)) as fullVisitId,
            hits.page.pagePath as virtualPagePath
          from
            `project.dataset.ga_sessions_*`,
            unnest(hits) as hits
          where
            (_table_suffix
              between format_date("%Y%m%d", date_sub(current_date(), interval 1 day))
              and format_date("%Y%m%d", date_sub(current_date(), interval 1 day)))
            and totals.visits = 1 #avoid counting midnight-split sessions
            and hits.isEntrance = TRUE
          group by 1, 2
        ),
        
        landing_pages_not_set as (
          select
            concat(cast(fullVisitorId as string), cast(visitId as string), cast(date as string)) as fullVisitId,
            date,
            "(not set)" as virtualPagePath,
            count(distinct concat(cast(fullVisitorId as string), cast(visitId as string), cast(date as string))) as visits,
            sum(totals.bounces) as bounces,
            sum(totals.transactions) as transactions
          from
            `project.dataset.ga_sessions_*`
          where
            (_table_suffix
              between format_date("%Y%m%d", date_sub(current_date(), interval 1 day))
              and format_date("%Y%m%d", date_sub(current_date(), interval 1 day)))
            and totals.visits = 1 #avoid counting midnight-split sessions
          group by 1, 2
        ),
        
        landing_pages as (
          select
            l.fullVisitId as fullVisitId,
            date,
            coalesce(r.virtualPagePath, l.virtualPagePath) as virtualPagePath,
            visits,
            bounces,
            transactions
          from
            landing_pages_not_set l left join landing_pages_set r on l.fullVisitId = r.fullVisitId
        )
        
        select virtualPagePath, sum(visits) from landing_pages group by 1 order by 2 desc
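
        The same counts can also be computed in a single pass over the sessions table. A correlated subquery picks each session's entrance page and falls back to "(not set)" when no hit is flagged. The sketch below assumes the same placeholder table and "yesterday" date range. It takes at most one entrance hit per session (the lowest-numbered), so no session is counted twice.

        ```sql
        #standardSQL
        SELECT
          landingPage,
          COUNT(*) AS visits,
          SUM(bounces) AS bounces,
          SUM(transactions) AS transactions
        FROM (
          SELECT
            -- entrance page of the session, or "(not set)" if no hit is flagged
            IFNULL(
              (SELECT h.page.pagePath
               FROM UNNEST(hits) AS h
               WHERE h.isEntrance = TRUE
               ORDER BY h.hitNumber
               LIMIT 1),
              '(not set)') AS landingPage,
            totals.bounces AS bounces,
            totals.transactions AS transactions
          FROM `project.dataset.ga_sessions_*`  -- placeholder table
          WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
            AND totals.visits = 1  -- avoid counting midnight-split sessions
        )
        GROUP BY landingPage
        ORDER BY visits DESC
        ```

        Grouping in an outer query, rather than directly on the subquery expression, keeps the landing-page logic in one place and avoids the join between two CTEs.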
        
