【Question Title】: Parse and Aggregate Google Analytics Browser Version CSV Data
【Posted】: 2023-03-06 11:08:01
【Question】:

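Google Analytics treats incremental browser versions as distinct browsers, so my reports don't support any useful conclusions. For example, Chrome 45.0.2454.93 is treated as a different browser from 45.0.2454.85.

I want to write a Python 2 application that reads a Google Analytics CSV export and aggregates session counts by major browser version.

I'm new to Python, but here is my attempt...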

from __future__ import division
import csv
from collections import defaultdict

RAWFile = 'somefile.csv'

def default_val():
    return [0, 0]

def aggregateaway():
    with open(RAWFile, 'r') as inf:
        has_header = csv.Sniffer().has_header(inf.read(1024))
        inf.seek(0)  # rewind
        incsv = csv.reader(inf)
        if has_header:
            next(incsv)  # skip header row

    reader = csv.DictReader(incsv, 'r')

    BrowserVersion = defaultdict(default_val)
    for row in reader:
        Sessions = int(row["Sessions"])
        BrowserVersion[row["BrowserVersion"]][0] += Sessions

    writer = csv.writer(open('out.csv', 'w'))
    writer.writerow(["BrowserVersion", "Sessions"])
    writer.writerows([BrowserVersion] + BrowserVersion[BrowserVersion] for BrowserVersion in BrowserVersion)

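I know of two problems:

  1. I get ValueError('I/O operation on closed file',). I think this is because of the logic I use to skip the leading lines before the data.
  2. I'm not sure how to group by major browser version programmatically. Is it left(BrowserVersion, 2)? Even then, that is flawed because of other browsers' versioning conventions. Maybe I could search for the first . and take the characters to its left. How would I add that to the code above?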
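
The "find the first . and keep what is to its left" idea from item 2 can be sketched as below. This is only a minimal illustration, not code from the thread: `major_version` is a hypothetical helper name, and it assumes that a version string with no dot (or an empty one) should be returned unchanged.

```python
def major_version(version):
    """Return the text before the first '.', or the whole string if there is no dot."""
    # split('.', 1) cuts at the first dot only, so '45.0.2454.85' -> ['45', '0.2454.85']
    return version.strip().split('.', 1)[0]


# Quick demonstration on versions taken from the sample data, plus two edge cases.
for v in ['45.0.2454.85', '45.0.2454.93', '8.0', '11', '']:
    print('%r -> %r' % (v, major_version(v)))
```

Unlike a fixed-width left(BrowserVersion, 2), this does not depend on how many digits the major version has, so "8.0" and "45.0.2454.85" are both handled.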

Edit: some sample CSV data:

# ----------------------------------------
# My Site
# Web Browsers
# 20150828-20150927
# ----------------------------------------

Browser,Operating System,Browser Version,Sessions,Bounce Rate
Safari,iOS,8.0,"1,681",68.91%
Chrome,Windows,45.0.2454.85,"1,200",40.98%
Chrome,Windows,45.0.2454.93,"2,273",40.98%

【Question Discussion】:

    Tags: python python-2.7 csv


    【Solution 1】:

    This is what I ended up using, with a lot of help from a colleague. Hopefully Google decides to add this feature to Google Analytics soon :)

    #!/usr/bin/env python
    import csv
    import operator
    import pprint
    
    inputfilename = 'input.csv'
    outputfilename = 'output.csv'
    
    values = []
    with open(inputfilename, 'rb') as csvfile: #Open file
        reader = csv.DictReader(filter(lambda row: row[0]!='#', csvfile)) #Skip rows with #
        header = reader.next().values()[0] #Gives a list of field names
        for rows in reader:
            row = rows.values()[0]
            values.append({header[i]: row[i] for i in range(len(header))}) #Creates list of csv data in a dictionary
    
    report = {} #Define empty dictionary to aggregate data into
    
    for value in values:
        if value["Browser"] == '': #Skip GA column totals (rows with a blank browser value)
            continue
        browserstring = value["Operating System"] + " - " + value["Browser"] + " - " + value["Browser Version"].split('.')[0] #Split browser version by '.' to get the major version
        sessions = int(value["Sessions"].replace(',','')) #Remove thousands separators before converting
        report[browserstring] = report.get(browserstring, 0) + sessions #Sum sessions per key, starting from 0 for a new key
    
    sorted_report = sorted(report.items(), reverse=True, key=operator.itemgetter(1)) #List of (key, sessions) tuples sorted by sessions, descending
    
    #pprint.pprint(sorted_report) #for debugging
    
    with open(outputfilename,'wb') as out: #Write the report to file (binary mode, as the Python 2 csv module expects)
        csv_out=csv.writer(out)
        csv_out.writerow(['Aggregated Browser Version - Major']) #Title
        csv_out.writerow(['Browser','Sessions']) #Column headers
        for row in sorted_report: #Data from ordered tuple list
            csv_out.writerow(row)
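
For readers on Python 3, the same aggregation can be sketched roughly as follows. This is an illustrative port rather than the code the answer's author ran: `aggregate_sessions` and `SAMPLE` are hypothetical names, the sample text is copied from the question, and the sketch assumes that totals rows can be recognised by a blank Browser or Sessions value.

```python
import csv
import io
from collections import Counter

# Sample export copied from the question, including the '#' banner and blank line.
SAMPLE = """\
# ----------------------------------------
# My Site
# Web Browsers
# 20150828-20150927
# ----------------------------------------

Browser,Operating System,Browser Version,Sessions,Bounce Rate
Safari,iOS,8.0,"1,681",68.91%
Chrome,Windows,45.0.2454.85,"1,200",40.98%
Chrome,Windows,45.0.2454.93,"2,273",40.98%
"""


def aggregate_sessions(lines):
    """Sum sessions per 'OS - Browser - major version' key from GA export lines."""
    # Drop the '#' banner and blank lines so DictReader sees the real header first.
    data_lines = (ln for ln in lines if ln.strip() and not ln.startswith('#'))
    totals = Counter()
    for row in csv.DictReader(data_lines):
        # Assumption: rows without a browser or session count are totals rows.
        if not row['Browser'] or not row['Sessions']:
            continue
        major = row['Browser Version'].split('.', 1)[0]
        key = ' - '.join([row['Operating System'], row['Browser'], major])
        # Strip thousands separators such as "1,681" before converting.
        totals[key] += int(row['Sessions'].replace(',', ''))
    return totals


totals = aggregate_sessions(io.StringIO(SAMPLE))
for key, sessions in totals.most_common():  # highest session count first
    print(key, sessions)
```

On the sample data from the question, the two Chrome 45 rows collapse into a single "Windows - Chrome - 45" entry with 3473 sessions, and "iOS - Safari - 8" keeps its 1681. To read a real export instead, pass an open file object (for example `open('input.csv', newline='')`) in place of the `io.StringIO` wrapper.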
    

    Two sample rows of the output CSV:

    【Discussion】:
