You should learn pandas.DataFrame.
First, I remove the lines containing the <pre> tags so that only the data remains.
content = """
<pre style="word-wrap: break-word; white-space: pre-wrap;">
Fri 09-24-2021 22.22 22.27 20.38 20.49 47101392
Thu 09-23-2021 22.52 22.63 21.32 21.48 48145436
Wed 09-22-2021 24.88 25.37 22.88 23.68 59917888
Tue 09-21-2021 26.03 28.18 25.20 25.86 73069928
Mon 09-20-2021 26.26 30.81 25.36 27.31 104578920
Fri 09-17-2021 21.56 23.58 21.33 23.48 61526336
Thu 09-16-2021 21.91 22.66 21.04 21.38 42485960
Wed 12-07-2016 9150.00 9390.00 8780.00 9270.00 37485
Tue 12-06-2016 9530.00 9660.00 9130.00 9210.00 27220
</pre>"""
# remove lines with `<>`
content = '\n'.join(line for line in content.split('\n') if not line.startswith('<')).strip()
print(content)
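The result now looks like a CSV file that uses whitespace as the separator, so you can use io to simulate a file in memory and read it with pandas.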
import pandas as pd
import io
# raw string so that `\s` is not treated as an (invalid) string escape
df = pd.read_csv(io.StringIO(content), sep=r'\s+', names=['day', 'date', 'A', 'B', 'C', 'D', 'volumen'])
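Next, you can create a date-day column that holds the day of the month, taken from characters 3-4 of the MM-DD-YYYY date string.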
df['date-day'] = df['date'].str[3:5]
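Slicing the string works because every date here is fixed-width MM-DD-YYYY. As an alternative sketch, assuming that format holds for every row, you could parse the column with pd.to_datetime and read the day through the .dt accessor. The column name day_of_month is only illustrative, and the two sample rows are copied from the data above.

```python
import io
import pandas as pd

# Two rows in the same whitespace-separated layout as the data above
sample = """Mon 09-20-2021 26.26 30.81 25.36 27.31 104578920
Fri 09-17-2021 21.56 23.58 21.33 23.48 61526336"""

df = pd.read_csv(io.StringIO(sample), sep=r'\s+',
                 names=['day', 'date', 'A', 'B', 'C', 'D', 'volumen'])

# Parse the MM-DD-YYYY strings into real datetimes (assumed format)
parsed = pd.to_datetime(df['date'], format='%m-%d-%Y')

# Integer day of month: this yields 20, not the string '20'
df['day_of_month'] = parsed.dt.day
print(df[['date', 'day_of_month']])
```

With this variant you would compare against the integer 20 rather than the string '20' when filtering.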
Then you can select all rows that have 20 in the date-day column and compute the average (mean).
day_20 = df[ df['date-day'] == '20' ]
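# numeric_only=True skips the text columns, which recent pandas versions refuse to average
print(day_20.mean(numeric_only=True))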
Or you can use groupby to process all days at once.
for value, group in df.groupby('date-day'):
    print('--- date-day:', value, '---')
    #print(group.mean(numeric_only=True))
    print('mean "A":', group['A'].mean())
    print('mean "B":', group['B'].mean())
    print('mean "C":', group['C'].mean())
    print('mean "D":', group['D'].mean())
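If you do not need the per-group printout, the loop can probably be collapsed into a single call. The following is a minimal sketch, assuming you only want the means of columns A to D; the variable name means is illustrative, and the three sample rows are copied from the data above.

```python
import io
import pandas as pd

# Three rows in the same layout as the data above
sample = """Mon 09-20-2021 26.26 30.81 25.36 27.31 104578920
Fri 09-17-2021 21.56 23.58 21.33 23.48 61526336
Thu 09-16-2021 21.91 22.66 21.04 21.38 42485960"""

df = pd.read_csv(io.StringIO(sample), sep=r'\s+',
                 names=['day', 'date', 'A', 'B', 'C', 'D', 'volumen'])
df['date-day'] = df['date'].str[3:5]

# One row per day of month, one column per averaged price column
means = df.groupby('date-day')[['A', 'B', 'C', 'D']].mean()
print(means)

# Look up a single day by its string label
print(means.loc['20'])
```

Each day occurs only once in this sample, so every mean equals the single row's value; with more history, rows that share a day of the month would be averaged together.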
Full working code:
content = """
<pre style="word-wrap: break-word; white-space: pre-wrap;">
Fri 09-24-2021 22.22 22.27 20.38 20.49 47101392
Thu 09-23-2021 22.52 22.63 21.32 21.48 48145436
Wed 09-22-2021 24.88 25.37 22.88 23.68 59917888
Tue 09-21-2021 26.03 28.18 25.20 25.86 73069928
Mon 09-20-2021 26.26 30.81 25.36 27.31 104578920
Fri 09-17-2021 21.56 23.58 21.33 23.48 61526336
Thu 09-16-2021 21.91 22.66 21.04 21.38 42485960
Wed 12-07-2016 9150.00 9390.00 8780.00 9270.00 37485
Tue 12-06-2016 9530.00 9660.00 9130.00 9210.00 27220
</pre>"""
# remove lines with `<>`
content = '\n'.join(line for line in content.split('\n') if not line.startswith('<')).strip()
import pandas as pd
import io
# raw string so that `\s` is not treated as an (invalid) string escape
df = pd.read_csv(io.StringIO(content), sep=r'\s+', names=['day', 'date', 'A', 'B', 'C', 'D', 'volumen'])
df['date-day'] = df['date'].str[3:5]
print(df)
day_20 = df[ df['date-day'] == '20' ]
print(day_20)
for value, group in df.groupby('date-day'):
    print('--- date-day:', value, '---')
    #print(group.mean(numeric_only=True))
    print('mean "A":', group['A'].mean())
    print('mean "B":', group['B'].mean())
    print('mean "C":', group['C'].mean())
    print('mean "D":', group['D'].mean())
Result:
   day        date        A        B        C        D    volumen  date-day
0  Fri  09-24-2021    22.22    22.27    20.38    20.49   47101392        24
1  Thu  09-23-2021    22.52    22.63    21.32    21.48   48145436        23
2  Wed  09-22-2021    24.88    25.37    22.88    23.68   59917888        22
3  Tue  09-21-2021    26.03    28.18    25.20    25.86   73069928        21
4  Mon  09-20-2021    26.26    30.81    25.36    27.31  104578920        20
5  Fri  09-17-2021    21.56    23.58    21.33    23.48   61526336        17
6  Thu  09-16-2021    21.91    22.66    21.04    21.38   42485960        16
7  Wed  12-07-2016  9150.00  9390.00  8780.00  9270.00      37485        07
8  Tue  12-06-2016  9530.00  9660.00  9130.00  9210.00      27220        06
   day        date      A      B      C      D    volumen  date-day
4  Mon  09-20-2021  26.26  30.81  25.36  27.31  104578920        20
--- date-day: 06 ---
mean "A": 9530.0
mean "B": 9660.0
mean "C": 9130.0
mean "D": 9210.0
--- date-day: 07 ---
mean "A": 9150.0
mean "B": 9390.0
mean "C": 8780.0
mean "D": 9270.0
--- date-day: 16 ---
mean "A": 21.91
mean "B": 22.66
mean "C": 21.04
mean "D": 21.38
--- date-day: 17 ---
mean "A": 21.56
mean "B": 23.58
mean "C": 21.33
mean "D": 23.48
--- date-day: 20 ---
mean "A": 26.26
mean "B": 30.81
mean "C": 25.36
mean "D": 27.31
--- date-day: 21 ---
mean "A": 26.03
mean "B": 28.18
mean "C": 25.2
mean "D": 25.86
--- date-day: 22 ---
mean "A": 24.88
mean "B": 25.37
mean "C": 22.88
mean "D": 23.68
--- date-day: 23 ---
mean "A": 22.52
mean "B": 22.63
mean "C": 21.32
mean "D": 21.48
--- date-day: 24 ---
mean "A": 22.22
mean "B": 22.27
mean "C": 20.38
mean "D": 20.49
If you use yfinance, you will get the data directly as a pandas.DataFrame.