Finding and comparing stock prices at a specific day of every month

【Question title】: Finding and comparing stock prices at a specific day of every month
【Posted】: 2021-09-27 14:12:43
【Question description】:

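The code I'm trying to write should work out which days of the month have historically been the best for buying and selling a stock. The stock I'm looking at specifically is UVXY. I'm trying to find which days are historically the monthly lows and which are the monthly highs, and then take the average. So far my code doesn't work because in some months the 20th or the 10th is not a trading day. The actual string is much longer and has more dates, but I'm willing to use yfinance to get the historical prices; I'm just not sure how it works. Thanks!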

from bs4 import BeautifulSoup

content = """
<pre style="word-wrap: break-word; white-space: pre-wrap;">
Fri 09-24-2021      22.22     22.27     20.38      20.49    47101392 
Thu 09-23-2021      22.52      22.63      21.32      21.48    48145436 
Wed 09-22-2021      24.88      25.37      22.88      23.68    59917888 
Tue 09-21-2021      26.03      28.18      25.20      25.86    73069928 
Mon 09-20-2021      26.26      30.81      25.36      27.31   104578920 
Fri 09-17-2021      21.56      23.58      21.33      23.48    61526336 
Thu 09-16-2021      21.91      22.66      21.04      21.38    42485960 
....
Wed 12-07-2016    9150.00    9390.00    8780.00    9270.00       37485 
Tue 12-06-2016    9530.00    9660.00    9130.00    9210.00       27220
</pre>""" 

soup = BeautifulSoup(content, "html.parser")
stuff = soup.find('pre').text
lines = stuff.split("\n")

listOfStuff=[]
openPriceOfTrades=[]
closePriceOfTrades=[]
difference=[]



for line in lines:
  if(line[7:9]=="20"):
    closePriceOfTrades.append(line[20:-46])
  if line[7:9]=="10":
    openPriceOfTrades.append(line[20:-46])
    difference = []   # initialization of result list

for i in range(len(openPriceOfTrades)-1):
  print(len(openPriceOfTrades))
  difference.append(float(closePriceOfTrades[i])-float(openPriceOfTrades[i]))
print(difference)

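【Comments】:

If you use yfinance, you should get the data as JSON, and I don't understand why you use BeautifulSoup for this.
Maybe you should keep the date as three columns, month, day, year; then selecting the data for the same day would be simpler. It would also be much simpler if you used a pandas.DataFrame instead of plain lists.
Your content looks like a CSV file with whitespace as the separator, so you can use io.StringIO and pandas.read_csv(..., sep=r'\s+') to convert it to a pandas.DataFrame.

Tags: python pandas dataframe stock yfinance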

【Solution 1】:

You should learn pandas.DataFrame.

First, I would remove the lines containing <pre> so that only the data remains.

content = """
<pre style="word-wrap: break-word; white-space: pre-wrap;">
Fri 09-24-2021      22.22     22.27     20.38      20.49    47101392 
Thu 09-23-2021      22.52      22.63      21.32      21.48    48145436 
Wed 09-22-2021      24.88      25.37      22.88      23.68    59917888 
Tue 09-21-2021      26.03      28.18      25.20      25.86    73069928 
Mon 09-20-2021      26.26      30.81      25.36      27.31   104578920 
Fri 09-17-2021      21.56      23.58      21.33      23.48    61526336 
Thu 09-16-2021      21.91      22.66      21.04      21.38    42485960 
Wed 12-07-2016    9150.00    9390.00    8780.00    9270.00       37485 
Tue 12-06-2016    9530.00    9660.00    9130.00    9210.00       27220
</pre>"""

# remove lines with `<>`
content = '\n'.join(line for line in content.split('\n') if not line.startswith('<')).strip()

print(content)

The result looks like a CSV file with whitespace as the separator, so you can use io to simulate a file in memory and read it:

import pandas as pd
import io

# raw string avoids an invalid-escape warning for '\s' on newer Python versions
df = pd.read_csv(io.StringIO(content), sep=r'\s+', names=['day', 'date', 'A', 'B', 'C', 'D', 'volumen'])

Then you can create a date-day column that holds the day of the month:

df['date-day'] = df['date'].str[3:5]
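
As an alternative to slicing the string, the date could be parsed and split into separate month, day and year columns, as one of the comments suggests. This is only a minimal sketch: the column names (weekday, open, high, low, close, volume) are assumptions, since the source does not say what each numeric column means.

```python
import io
import pandas as pd

# Two rows copied from the sample data in the question.
sample = """\
Mon 09-20-2021      26.26      30.81      25.36      27.31   104578920
Fri 09-17-2021      21.56      23.58      21.33      23.48    61526336
"""

# Assumed column names; the open/high/low/close order is a guess.
columns = ["weekday", "date", "open", "high", "low", "close", "volume"]
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", names=columns)

# Parse MM-DD-YYYY once, then derive three integer columns from it.
parsed = pd.to_datetime(df["date"], format="%m-%d-%Y")
df["month"] = parsed.dt.month
df["day"] = parsed.dt.day
df["year"] = parsed.dt.year

print(df[["date", "month", "day", "year"]])
```

With integer columns, a selection such as df[df["day"] == 20] no longer depends on character positions in the date string.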

Then you can select all rows with 20 in the date-day column and calculate the average (mean):

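day_20 = df[ df['date-day'] == '20' ]

# numeric_only=True skips the text columns (day, date, date-day),
# which newer pandas versions refuse to average
print(day_20.mean(numeric_only=True))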
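
The question also mentions months in which the 10th or the 20th is not a trading day, and selecting an exact day returns nothing for those months. One possible workaround, sketched below, is to take the first trading day on or after the target day in each month. The helper name first_trading_day_on_or_after is hypothetical, the prices are synthetic placeholders, and pd.bdate_range only skips weekends, not exchange holidays.

```python
import numpy as np
import pandas as pd

# Synthetic placeholder prices on weekdays only (not real UVXY quotes).
dates = pd.bdate_range("2021-06-01", "2021-09-30")
prices = pd.DataFrame({"close": np.arange(len(dates), dtype=float)}, index=dates)


def first_trading_day_on_or_after(frame, target_day):
    """Per month, keep the first row whose day of month is >= target_day.

    Assumes `frame` has a DatetimeIndex sorted in ascending order.
    """
    eligible = frame[frame.index.day >= target_day]
    months = eligible.index.to_period("M")
    return eligible.groupby(months).head(1)


day_10 = first_trading_day_on_or_after(prices, 10)
day_20 = first_trading_day_on_or_after(prices, 20)

# Re-index by month so the two series line up even when the dates differ.
close_10 = pd.Series(day_10["close"].to_numpy(), index=day_10.index.to_period("M"))
close_20 = pd.Series(day_20["close"].to_numpy(), index=day_20.index.to_period("M"))
difference = close_20 - close_10

print(day_10.index.strftime("%a %m-%d-%Y").tolist())
print(day_20.index.strftime("%a %m-%d-%Y").tolist())
print(difference)
```

In this synthetic calendar, 07-10-2021 falls on a Saturday and 06-20-2021 on a Sunday, so the helper picks the following Monday for those months.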

Or you can use groupby to process all days at once.

for value, group in df.groupby('date-day'):
    print('--- date-day:', value, '---')
    #print(group.mean())
    print('mean "A":', group['A'].mean())
    print('mean "B":', group['B'].mean())
    print('mean "C":', group['C'].mean())
    print('mean "D":', group['D'].mean())
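
The loop above can probably be shortened by letting groupby compute all the means in one call. The sketch below uses three rows from the sample data and the same column names as this answer. Treating column D as the price to rank by is an assumption made only for illustration.

```python
import io
import pandas as pd

# Three rows from the sample data used in this answer.
content = """\
Tue 09-21-2021      26.03      28.18      25.20      25.86    73069928
Mon 09-20-2021      26.26      30.81      25.36      27.31   104578920
Fri 09-17-2021      21.56      23.58      21.33      23.48    61526336
"""

df = pd.read_csv(io.StringIO(content), sep=r'\s+',
                 names=['day', 'date', 'A', 'B', 'C', 'D', 'volumen'])
df['date-day'] = df['date'].str[3:5]

# One row per day of month, one column per price field.
means = df.groupby('date-day')[['A', 'B', 'C', 'D']].mean()
print(means)

# Day of month with the lowest and the highest average in column D.
lowest_day = means['D'].idxmin()
highest_day = means['D'].idxmax()
print('lowest mean "D":', lowest_day)
print('highest mean "D":', highest_day)
```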

Full working code:

content = """
<pre style="word-wrap: break-word; white-space: pre-wrap;">
Fri 09-24-2021      22.22     22.27     20.38      20.49    47101392 
Thu 09-23-2021      22.52      22.63      21.32      21.48    48145436 
Wed 09-22-2021      24.88      25.37      22.88      23.68    59917888 
Tue 09-21-2021      26.03      28.18      25.20      25.86    73069928 
Mon 09-20-2021      26.26      30.81      25.36      27.31   104578920 
Fri 09-17-2021      21.56      23.58      21.33      23.48    61526336 
Thu 09-16-2021      21.91      22.66      21.04      21.38    42485960 
Wed 12-07-2016    9150.00    9390.00    8780.00    9270.00       37485 
Tue 12-06-2016    9530.00    9660.00    9130.00    9210.00       27220
</pre>"""

# remove lines with `<>`
content = '\n'.join(line for line in content.split('\n') if not line.startswith('<')).strip()

import pandas as pd
import io

# raw string avoids an invalid-escape warning for '\s' on newer Python versions
df = pd.read_csv(io.StringIO(content), sep=r'\s+', names=['day', 'date', 'A', 'B', 'C', 'D', 'volumen'])

df['date-day'] = df['date'].str[3:5]

print(df)

day_20 = df[ df['date-day'] == '20' ]
print(day_20)

for value, group in df.groupby('date-day'):
    print('--- date-day:', value, '---')
    #print(group.mean())
    print('mean "A":', group['A'].mean())
    print('mean "B":', group['B'].mean())
    print('mean "C":', group['C'].mean())
    print('mean "D":', group['D'].mean())

Result:

   day        date        A        B        C        D    volumen date-day
0  Fri  09-24-2021    22.22    22.27    20.38    20.49   47101392       24
1  Thu  09-23-2021    22.52    22.63    21.32    21.48   48145436       23
2  Wed  09-22-2021    24.88    25.37    22.88    23.68   59917888       22
3  Tue  09-21-2021    26.03    28.18    25.20    25.86   73069928       21
4  Mon  09-20-2021    26.26    30.81    25.36    27.31  104578920       20
5  Fri  09-17-2021    21.56    23.58    21.33    23.48   61526336       17
6  Thu  09-16-2021    21.91    22.66    21.04    21.38   42485960       16
7  Wed  12-07-2016  9150.00  9390.00  8780.00  9270.00      37485       07
8  Tue  12-06-2016  9530.00  9660.00  9130.00  9210.00      27220       06

   day        date      A      B      C      D    volumen date-day
4  Mon  09-20-2021  26.26  30.81  25.36  27.31  104578920       20

--- date-day: 06 ---
mean "A": 9530.0
mean "B": 9660.0
mean "C": 9130.0
mean "D": 9210.0
--- date-day: 07 ---
mean "A": 9150.0
mean "B": 9390.0
mean "C": 8780.0
mean "D": 9270.0
--- date-day: 16 ---
mean "A": 21.91
mean "B": 22.66
mean "C": 21.04
mean "D": 21.38
--- date-day: 17 ---
mean "A": 21.56
mean "B": 23.58
mean "C": 21.33
mean "D": 23.48
--- date-day: 20 ---
mean "A": 26.26
mean "B": 30.81
mean "C": 25.36
mean "D": 27.31
--- date-day: 21 ---
mean "A": 26.03
mean "B": 28.18
mean "C": 25.2
mean "D": 25.86
--- date-day: 22 ---
mean "A": 24.88
mean "B": 25.37
mean "C": 22.88
mean "D": 23.68
--- date-day: 23 ---
mean "A": 22.52
mean "B": 22.63
mean "C": 21.32
mean "D": 21.48
--- date-day: 24 ---
mean "A": 22.22
mean "B": 22.27
mean "C": 20.38
mean "D": 20.49

If you use yfinance, you get the data directly as a pandas.DataFrame.
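
A rough sketch of how that could look for the original goal, counting which day of the month most often holds the monthly low or high. It assumes that yfinance's Ticker.history returns a frame with a DatetimeIndex and "Low" and "High" columns. The function names fetch_history and monthly_extreme_days are hypothetical, and the demo runs on synthetic placeholder data so that it needs no network access. Comparing prices only within each month also avoids mixing the very different price levels of 2016 and 2021 seen in the sample data.

```python
import pandas as pd


def fetch_history(symbol="UVXY"):
    """Download daily prices. Needs network access and the yfinance package."""
    import yfinance as yf  # imported here so the offline demo below still runs
    return yf.Ticker(symbol).history(period="max")


def monthly_extreme_days(history):
    """Count how often each day of month held the month's lowest Low / highest High."""
    frame = history.assign(
        year=history.index.year,
        month=history.index.month,
        day=history.index.day,
    )
    by_month = frame.groupby(["year", "month"])
    # idxmin/idxmax give the date of the extreme in each (year, month) group.
    low_days = frame.loc[by_month["Low"].idxmin().tolist(), "day"]
    high_days = frame.loc[by_month["High"].idxmax().tolist(), "day"]
    return low_days.value_counts(), high_days.value_counts()


# Offline demo on synthetic placeholder data (not real quotes).
dates = pd.bdate_range("2021-08-02", "2021-09-30")
demo = pd.DataFrame({"Low": 10.0, "High": 20.0}, index=dates)
demo.loc[pd.Timestamp("2021-08-19"), "Low"] = 5.0
demo.loc[pd.Timestamp("2021-09-03"), "Low"] = 4.0
demo.loc[pd.Timestamp("2021-08-05"), "High"] = 30.0
demo.loc[pd.Timestamp("2021-09-20"), "High"] = 31.0

low_counts, high_counts = monthly_extreme_days(demo)
print(low_counts)
print(high_counts)
```

With real data, the call would be monthly_extreme_days(fetch_history("UVXY")), and the most frequent entries in each result would be the candidate buy and sell days.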
