【Question title】: Average temperature from year and month data in a file (Python)
【Posted】: 2023-03-15 12:52:02
【Question】:

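I have a data file in a specific format that also contains a few extra lines which must be ignored during processing. I need to process the data and compute a value from it.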

Sample data:

Average monthly temperatures in Dubuque, Iowa, 
January 1964 through december 1975, n=144

24.7    25.7    30.6    47.5    62.9    68.5    73.7    67.9    61.1    48.5    39.6    20.0
16.1    19.1    24.2    45.4    61.3    66.5    72.1    68.4    60.2    50.9    37.4    31.1
10.4    21.6    37.4    44.7    53.2    68.0    73.7    68.2    60.7    50.2    37.2    24.6
21.5    14.7    35.0    48.3    54.0    68.2    69.6    65.7    60.8    49.1    33.2    26.0
19.1    20.6    40.2    50.0    55.3    67.7    70.7    70.3    60.6    50.7    35.8    20.7
14.0    24.1    29.4    46.6    58.6    62.2    72.1    71.7    61.9    47.6    34.2    20.4
8.4     19.0    31.4    48.7    61.6    68.1    72.2    70.6    62.5    52.7    36.7    23.8
11.2    20.0    29.6    47.7    55.8    73.2    68.0    67.1    64.9    57.1    37.6    27.7
13.4    17.2    30.8    43.7    62.3    66.4    70.2    71.6    62.1    46.0    32.7    17.3
22.5    25.7    42.3    45.2    55.5    68.9    72.3    72.3    62.5    55.6    38.0    20.4
17.6    20.5    34.2    49.2    54.8    63.8    74.0    67.1    57.7    50.8    36.8    25.5
20.4    19.6    24.6    41.3    61.8    68.5    72.0    71.1    57.3    52.5    40.6    26.2

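Source of the sample file: http://robjhyndman.com/tsdldata/data/cryer2.dat

Note: here, rows represent years and columns represent months.

I am trying to write a function that returns the average temperature for any given month, reading the data from a given URL.

I tried the following: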

def avg_temp_march(f):

    march_temps = []

    # read each line of the file and store the values
    # as floats in a list
    for line in f:
        line = str(line, 'ascii') # now line is a string
        temps = line.split()
        # check that it is not empty.
        if temps != []:
            march_temps.append(float(temps[2]))

    # calculate the average and return it
    return sum(march_temps) / len(march_temps)

avg_temp_march("data5.txt")

But I get an error at line = str(line, 'ascii'):

TypeError: decoding str is not supported
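
The TypeError comes from how `str()` treats an encoding argument: the form `str(obj, encoding)` only decodes bytes-like objects, and a line read from a file opened in text mode is already a `str`. A minimal sketch of the difference, using a made-up sample value rather than the real file:

```python
raw = b"24.7    25.7    30.6"   # bytes, e.g. from a file opened in binary mode
text = str(raw, "ascii")        # decoding bytes into a str works
fields = text.split()

try:
    str(text, "ascii")          # text is already a str, so decoding is rejected
    error = None
except TypeError as exc:
    error = str(exc)

print(fields)
print(error)
```

So the conversion is only needed when the lines arrive as bytes, for example from a URL response or a file opened with mode "rb".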

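【Comments】:

  • Please provide an MCVE.
  • Remove the line line = str(line, 'ascii') # now line is a string
  • str takes one argument; you are giving it two.
  • When I remove the line you mentioned, I get an index out of range error. I can't seem to reach the month: march_temps.append(float(temps[2])) IndexError: list index out of range
  • What I mean is that line is already a string. But please do read my first comment.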
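
The IndexError reported above is consistent with the call `avg_temp_march("data5.txt")`: the function receives the file name itself, so `for line in f` walks over its characters instead of over the lines of the file. A minimal sketch that opens the file first, assuming the layout shown in the question (header lines followed by one row of 12 values per year). The name `avg_temp` and its `month_index` parameter are illustrative and not part of the original post:

```python
def avg_temp(path, month_index):
    """Average of one month column (0 = January, ..., 11 = December)."""
    temps = []
    with open(path, encoding="ascii") as f:   # text mode: each line is a str
        for line in f:
            try:
                row = [float(x) for x in line.split()]
            except ValueError:
                continue                      # header line with non-numeric tokens
            if len(row) != 12:
                continue                      # blank or incomplete line
            temps.append(row[month_index])
    return sum(temps) / len(temps)

# Example call, assuming data5.txt holds the sample data:
# avg_temp("data5.txt", 2)   # March
```

Parsing the whole row before accepting it also keeps header tokens such as `1964` from being mistaken for a temperature when a different column is requested.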

Tags: python average temperature


【Solution 1】:

I don't think there is any need to convert a string to a string.

I tried to fix your code with a few modifications:

def avg_temp_march(f):
    # f is the file content, already read into a string

    march_temps = []

    for line in f.split("\n"):
        if line == "":
            continue
        temps = line.split(" ")
        temps = [t for t in temps if t != ""]

        # check that the line has enough fields
        month_index = 2
        if len(temps) > month_index:

            try:
                march_temps.append(float(temps[month_index]))
            except ValueError as e:
                # header lines end up here because their fields are not numbers
                print(temps)
                print("Skipping line:", e)
    # calculate the average and return it
    return sum(march_temps) / len(march_temps)

Output:

['Average', 'monthly', 'temperatures', 'in', 'Dubuque,', 'Iowa,']
Skipping line: could not convert string to float: temperatures
['January', '1964', 'through', 'december', '1975,', 'n=144']
Skipping line: could not convert string to float: through
32.475

Based on your original question (before the latest edit), I think you can solve your problem this way.

# from urllib2 import urlopen  # Python 2
from urllib.request import urlopen  # Python 3

def avg_temp_march(url):
    # read() returns bytes in Python 3, so decode to text before splitting
    text = urlopen(url).read().decode("ascii")
    lines = text.split("\n")[3:]  # ignore the first 3 lines
    rows = [line.split() for line in lines if line.strip()]  # ignore the empty lines
    data = [[float(t) for t in row] for row in rows]  # convert all numbers to float
    month_index = 2  # 2 for March
    monthly_sum = sum(row[month_index] for row in data)
    monthly_avg = monthly_sum / len(data)
    return monthly_avg

print(avg_temp_march("http://robjhyndman.com/tsdldata/data/cryer2.dat"))
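
Since rows are years and columns are months, the same parsing step can yield the average for every month at once by transposing the rows. A sketch in plain Python, assuming the text has already been downloaded and decoded. `monthly_averages` is a hypothetical helper name, and the two data rows are copied from the sample in the question:

```python
def monthly_averages(text):
    """Return a list of 12 column means from whitespace-separated rows."""
    rows = []
    for line in text.splitlines():
        try:
            row = [float(x) for x in line.split()]
        except ValueError:
            continue              # header line containing non-numeric tokens
        if len(row) == 12:        # keep only complete year rows
            rows.append(row)
    # zip(*rows) turns the list of year rows into 12 month columns
    return [sum(col) / len(col) for col in zip(*rows)]

sample = """Average monthly temperatures in Dubuque, Iowa,
January 1964 through december 1975, n=144

24.7    25.7    30.6    47.5    62.9    68.5    73.7    67.9    61.1    48.5    39.6    20.0
16.1    19.1    24.2    45.4    61.3    66.5    72.1    68.4    60.2    50.9    37.4    31.1
"""
averages = monthly_averages(sample)
print(round(averages[2], 3))  # March, over the two sample rows only
```

Indexing the result with 0 to 11 then gives any month, which matches the goal stated in the question without hard-coding March.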

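【Comments】:

  • Thanks for the help, I appreciate it! But I don't have "import urllib2". I think I can get it working with what you provided.
  • Hi, I think you are using Python 3. You can simply import urlopen from another module that is available in Python 3 (see the latest change to my answer). If it solves your problem, please mark it as the answer. If you still run into any problems, let me know.
【Solution 2】:

With pandas, the code gets a bit shorter: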

import calendar
import pandas as pd
df = pd.read_csv('data5.txt', sep=r'\s+', skiprows=2,
                 names=calendar.month_abbr[1:])

Now, for March:

>>> df.Mar.mean()
32.475000000000001

For all months:

>>> df.mean()
Jan    16.608333
Feb    20.650000
Mar    32.475000
Apr    46.525000
May    58.091667
Jun    67.500000
Jul    71.716667
Aug    69.333333
Sep    61.025000
Oct    50.975000
Nov    36.650000
Dec    23.641667
dtype: float64
