【Question title】: Formatting Data to Output to Excel
【Posted】: 2020-10-20 04:01:45
【Question】:

I'm a beginner in Python, and I know this is spaghetti code right now. Please ignore my brutal use of regex to format some of the data; that will be my next post.

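Anyway, I'm trying to scrape the Texas Hold 'Em hand rankings from a website and output them to an Excel file so I can search them easily with Ctrl+F.

The table on the website isn't coded as an HTML table, so I decided to use BeautifulSoup to scrape the information.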

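So far I've managed to convert the data from a string into a list. When I export it to Excel, though, each whole row ends up in a single cell of one column, when it should be split row by row into cards, probability of win, and so on.

How can I format this data so that each value in a row appears in its own cell? My idea is to use a for loop to iterate over the list of hands and all their information, but I don't know how to tell the different headers (cards, probability of win, etc.) apart. So far I have used regular expressions to format the data so it is easy to split, with each step stored in a separate variable.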

The table on the site is a good example of how I want the data to look in Excel: https://wizardofodds.com/games/texas-hold-em/6-player-game/

from bs4 import BeautifulSoup
import requests
import re
import xlsxwriter

url = "https://wizardofodds.com/games/texas-hold-em/6-player-game/"

page = requests.get(url)

soup = BeautifulSoup(page.text, "html.parser")

def getContent():
    # All of the table's text, flattened into one string
    table_data = soup.find(class_="box-content has-data").get_text()

    # Shorten the hand names, e.g. "A/K suited" -> "AKs", "A/K unsuited" -> "AKo"
    handRegex1 = re.sub("Pair of ", "", table_data)
    handRegex2 = re.sub("'", "", handRegex1)
    handRegex3 = re.sub("/", "", handRegex2)
    handRegex4 = re.sub(" suited", "s", handRegex3)
    handRegex5 = re.sub(" unsuited", "o", handRegex4)
    # Turn line breaks (with or without the 4-space indent after them) into spaces
    handRegex6 = re.sub("\n    ", " ", handRegex5)
    handRegex7 = re.sub("\n", " ", handRegex6)
    # Replace each group of three consecutive whitespace characters with a comma, then split on commas
    handRegex8 = re.sub(r"\s\s\s", ",", handRegex7)
    separate = handRegex8.split(",")
    print(handRegex7)

    # Using handRegex7 we can add each word to an individual cell. We have to separate the
    # headers and sort those; the actual data should be easy to separate by the space character.

    workbook = xlsxwriter.Workbook('/Users/colivart/Excel_Files/Texas_Hold_Em_6.xlsx')
    worksheet = workbook.add_worksheet()
    # We can use a for loop to iterate through the list of values.
    # This will allow us to add each hand and its values one by one.

    # write_column() puts every item of the list in column A, one item per row
    worksheet.write_column('A1', separate)

    workbook.close()

getContent()
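The for-loop idea in the comments above can be sketched once the six column headers are known: treat the cleaned text as a flat list of values and cut it into rows of six. This is a minimal sketch that assumes every hand name has already been shortened to a single token (such as `AKs`), so that splitting on whitespace yields exactly six values per hand. `HEADERS`, `chunk_rows` and the sample string are illustrative names and data; the numbers are copied from the table shown in the first answer.

```python
# Column headers as they appear in the site's table (illustrative constant)
HEADERS = ["Cards", "Probability of Win", "Average Win",
           "Expected Value", "Probability", "Additive Probability"]


def chunk_rows(values, width=len(HEADERS)):
    """Group a flat list of cell values into rows of `width` values each."""
    if len(values) % width != 0:
        raise ValueError(f"{len(values)} values do not divide into rows of {width}")
    return [values[i:i + width] for i in range(0, len(values), width)]


# Hypothetical cleaned-up text: two hands, hand names already shortened to one token
flat = "AKs 32.15% 5.8 0.8641 0.30% 2.11% AKo 28.96% 5.77 0.6704 0.90% 4.37%"

# Header row first, then one list per hand
rows = [HEADERS] + chunk_rows(flat.split())
for row in rows:
    print(row)
```

Each inner list is one spreadsheet row, so instead of a single `write_column()` call it could be written with xlsxwriter's `worksheet.write_row(row_index, 0, row)` inside a loop over `enumerate(rows)`.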


【Question discussion】:

Tags: python web-scraping


【Answer 1】:

How about using requests and pandas to get what you want?

Here's how:

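import requests
import pandas as pd
from io import StringIO
from tabulate import tabulate

# Download the page's HTML as text
page = requests.get("https://wizardofodds.com/games/texas-hold-em/6-player-game/").text
# read_html() returns a list of DataFrames, one for each <table> it finds;
# flavor="bs4" parses with BeautifulSoup/html5lib rather than lxml
df = pd.read_html(StringIO(page), flavor="bs4")
# Combine them into a single DataFrame
df = pd.concat(df)
print(tabulate(df, showindex=False))
# Write only the cell values: no index column and no pandas column labels
df.to_csv("poker_hands.csv", index=False, header=False)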

This is what you get (plus a .csv file that you can import into Excel):

------------  ------------------  -----------  --------------  -----------  --------------------
Cards         Probability of Win  Average Win  Expected Value  Probability  Additive Probability
Pair of A's   49.51%              5.96         1.9508          0.45%        0.45%
Pair of K's   43.32%              5.95         1.5775          0.45%        0.90%
Pair of Q's   38.3%               5.93         1.2729          0.45%        1.36%
Pair of J's   34.05%              5.92         1.0142          0.45%        1.81%
A/K suited    32.15%              5.8          0.8641          0.30%        2.11%
Pair of T's   30.44%              5.89         0.7944          0.45%        2.56%
A/Q suited    30.56%              5.76         0.7589          0.30%        2.87%
K/Q suited    29.55%              5.76         0.7015          0.30%        3.17%
A/J suited    29.28%              5.71         0.6723          0.30%        3.47%
A/K unsuited  28.96%              5.77         0.6704          0.90%        4.37%
K/J suited    28.28%              5.72         0.6167          0.30%        4.68%
A/T suited    28.27%              5.67         0.6021          0.30%        4.98%
Pair of 9's   27.11%              5.89         0.5978          0.45%        5.43%
Q/J suited    27.57%              5.71         0.5737          0.30%        5.73%
A/Q unsuited  27.21%              5.71         0.555           0.90%        6.64%
K/T suited    27.32%              5.68         0.5506          0.30%        6.94%
Q/T suited    26.64%              5.67         0.5101          0.30%        7.24%
K/Q unsuited  26.28%              5.72         0.5028          0.90%        8.14%
J/T suited    26.33%              5.66         0.4904          0.30%        8.45%
A/J unsuited  25.79%              5.66         0.4597          0.90%        9.35%
A/9 suited    25.75%              5.62         0.4474          0.30%        9.65%
Pair of 8's   24.51%              5.88         0.4416          0.45%        10.11%
K/J unsuited  24.88%              5.67         0.4097          0.90%        11.01%
K/9 suited    24.73%              5.64         0.3946          0.30%        11.31%
A/8 suited    24.98%              5.58         0.3933          0.30%        11.61%
A/T unsuited  24.66%              5.6          0.3817          0.90%        12.52%
Q/J unsuited  24.26%              5.66         0.3721          0.90%        13.42%
Q/9 suited    24.07%              5.64         0.3565          0.30%        13.73%
A/7 suited    24.27%              5.54         0.3444          0.30%        14.03%
T/9 suited    23.9%               5.62         0.3432          0.30%        14.33%
J/9 suited    23.82%              5.63         0.3408          0.30%        14.63%
K/T unsuited  23.8%               5.61         0.3356          0.90%        15.54%
A/5 suited    24.19%              5.5          0.3309          0.30%        15.84%
Pair of 7's   22.35%              5.87         0.3116          0.45%        16.29%
Q/T unsuited  23.2%               5.61         0.3006          0.90%        17.19%
A/4 suited    23.63%              5.5          0.3002          0.30%        17.5%
A/6 suited    23.5%               5.51         0.2954          0.30%        17.8%
J/T unsuited  23.03%              5.6          0.2895          0.90%        18.7%
K/8 suited    22.97%              5.58         0.283           0.30%        19%
A/3 suited    23.05%              5.51         0.2705          0.30%        19.31%
Q/8 suited    22.24%              5.59         0.2436          0.30%        19.61%
K/7 suited    22.37%              5.55         0.2406          0.30%        19.91%
A/2 suited    22.39%              5.52         0.2357          0.30%        20.21%
T/8 suited    22.12%              5.58         0.2345          0.30%        20.51%
J/8 suited    22.01%              5.59         0.2298          0.30%        20.81%
9/8 suited    21.72%              5.6          0.2165          0.30%        21.12%
A/9 unsuited  21.93%              5.53         0.2136          0.90%        22.02%
Pair of 6's   20.59%              5.85         0.2054          0.45%        22.47%
K/6 suited    21.75%              5.52         0.1996          0.30%        22.78%
K/5 suited    21.29%              5.49         0.1692          0.30%        23.08%
K/9 unsuited  20.98%              5.56         0.1659          0.90%        23.98%
A/8 unsuited  21.07%              5.47         0.1533          0.90%        24.89%
Q/7 suited    20.65%              5.54         0.1434          0.30%        25.19%
K/4 suited    20.75%              5.5          0.1406          0.30%        25.49%
8/7 suited    20.39%              5.57         0.1367          0.30%        25.79%
Q/9 unsuited  20.41%              5.55         0.1334          0.90%        26.7%
T/7 suited    20.46%              5.54         0.1333          0.30%        27%
T/9 unsuited  20.44%              5.54         0.1322          0.90%        27.9%
9/7 suited    20.30%              5.57         0.1299          0.30%        28.21%
J/7 suited    20.38%              5.54         0.1293          0.30%        28.51%
J/9 unsuited  20.28%              5.55         0.1248          0.90%        29.41%
K/3 suited    20.25%              5.51         0.1154          0.30%        29.71%
Pair of 5's   19.05%              5.84         0.1118          0.45%        30.17%
Q/6 suited    20.19%              5.5          0.1109          0.30%        30.47%
A/7 unsuited  20.25%              5.42         0.0981          0.90%        31.37%
K/2 suited    19.76%              5.52         0.0909          0.30%        31.67%
A/5 unsuited  20.15%              5.37         0.083           0.90%        32.58%
Q/5 suited    19.72%              5.48         0.0806          0.30%        32.88%
7/6 suited    19.31%              5.56         0.0736          0.30%        33.18%
8/6 suited    19.11%              5.54         0.059           0.30%        33.48%
Q/4 suited    19.22%              5.49         0.0547          0.30%        33.79%
A/4 unsuited  19.51%              5.37         0.0484          0.90%        34.69%
A/6 unsuited  19.43%              5.38         0.0458          0.90%        35.6%
K/8 unsuited  19.04%              5.48         0.043           0.90%        36.5%
T/6 suited    18.95%              5.49         0.0407          0.30%        36.8%
9/6 suited    18.82%              5.53         0.0402          0.30%        37.1%
J/6 suited    18.92%              5.49         0.038           0.30%        37.41%
Pair of 4's   17.76%              5.84         0.0369          0.45%        37.86%
Q/3 suited    18.69%              5.5          0.0281          0.30%        38.16%
6/5 suited    18.45%              5.55         0.0233          0.30%        38.46%
A/3 unsuited  18.87%              5.38         0.0148          0.90%        39.37%
T/8 unsuited  18.5%               5.48         0.0138          0.90%        40.27%
J/5 suited    18.56%              5.46         0.0131          0.30%        40.57%
Q/8 unsuited  18.43%              5.48         0.0104          0.90%        41.48%
Q/2 suited    18.25%              5.52         0.0063          0.30%        41.78%
J/8 unsuited  18.31%              5.48         0.004           0.90%        42.68%
7/5 suited    18.1%               5.53         0.0007          0.30%        42.99%
9/8 unsuited  18.14%              5.5          -0.0013         0.90%        43.89%
K/7 unsuited  18.38%              5.42         -0.0036         0.90%        44.8%
5/4 suited    17.82%              5.54         -0.0125         0.30%        45.1%
J/4 suited    18.03%              5.47         -0.0139         0.30%        45.4%
A/2 unsuited  18.15%              5.38         -0.0235         0.90%        46.3%
8/5 suited    17.73%              5.5          -0.0244         0.30%        46.61%
Pair of 3's   16.69%              5.85         -0.0245         0.45%        47.06%
J/3 suited    17.57%              5.48         -0.0365         0.30%        47.36%
T/5 suited    17.62%              5.44         -0.041          0.30%        47.66%
9/5 suited    17.41%              5.48         -0.0458         0.30%        47.96%
K/6 unsuited  17.72%              5.38         -0.047          0.90%        48.87%
6/4 suited    17.17%              5.54         -0.0484         0.30%        49.17%
J/2 suited    17.11%              5.5          -0.0589         0.30%        49.47%
T/4 suited    17.19%              5.44         -0.0643         0.30%        49.77%
Pair of 2's   15.87%              5.85         -0.0708         0.45%        50.23%
7/4 suited    16.66%              5.52         -0.0809         0.30%        50.53%
5/3 suited    16.59%              5.54         -0.081          0.30%        50.83%
K/5 unsuited  17.18%              5.34         -0.0829         0.90%        51.73%
8/7 unsuited  16.76%              5.47         -0.0841         0.90%        52.64%
T/3 suited    16.72%              5.46         -0.0871         0.30%        52.94%
9/7 unsuited  16.61%              5.45         -0.0953         0.90%        53.85%
T/7 unsuited  16.71%              5.41         -0.0962         0.90%        54.75%
Q/7 unsuited  16.7%               5.4          -0.098          0.90%        55.66%
J/7 unsuited  16.51%              5.41         -0.1073         0.90%        56.56%
8/4 suited    16.25%              5.49         -0.1085         0.30%        56.86%
T/2 suited    16.27%              5.48         -0.1089         0.30%        57.16%
K/4 unsuited  16.59%              5.34         -0.114          0.90%        58.07%
4/3 suited    15.9%               5.56         -0.1162         0.30%        58.37%
9/4 suited    16.05%              5.46         -0.1237         0.30%        58.67%
6/3 suited    15.77%              5.54         -0.127          0.30%        58.97%
Q/6 unsuited  16.17%              5.35         -0.135          0.90%        59.88%
K/3 unsuited  16.03%              5.35         -0.142          0.90%        60.78%
9/3 suited    15.67%              5.47         -0.143          0.30%        61.09%
7/6 unsuited  15.62%              5.43         -0.1512         0.90%        61.99%
5/2 suited    15.19%              5.53         -0.1593         0.30%        62.29%
7/3 suited    15.25%              5.51         -0.1605         0.30%        62.59%
9/2 suited    15.22%              5.49         -0.1644         0.30%        62.9%
Q/5 unsuited  15.66%              5.31         -0.1685         0.90%        63.8%
K/2 unsuited  15.49%              5.36         -0.1695         0.90%        64.71%
8/6 unsuited  15.35%              5.41         -0.1698         0.90%        65.61%
8/3 suited    14.94%              5.47         -0.1832         0.30%        65.91%
4/2 suited    14.68%              5.56         -0.1836         0.30%        66.21%
9/6 unsuited  15%                 5.38         -0.193          0.90%        67.12%
T/6 unsuited  15.06%              5.33         -0.1972         0.90%        68.02%
Q/4 unsuited  15.08%              5.31         -0.1987         0.90%        68.93%
8/2 suited    14.58%              5.49         -0.2            0.30%        69.23%
6/5 unsuited  14.71%              5.41         -0.2045         0.90%        70.14%
J/6 unsuited  14.94%              5.32         -0.2053         0.90%        71.04%
6/2 suited    14.37%              5.53         -0.2056         0.30%        71.34%
3/2 suited    14.02%              5.58         -0.2171         0.30%        71.64%
Q/3 unsuited  14.52%              5.33         -0.2267         0.90%        72.55%
7/5 unsuited  14.34%              5.38         -0.2289         0.90%        73.45%
7/2 suited    13.97%              5.49         -0.2331         0.30%        73.76%
J/5 unsuited  14.53%              5.27         -0.2339         0.90%        74.66%
5/4 unsuited  14.05%              5.39         -0.2418         0.90%        75.57%
Q/2 unsuited  14%                 5.34         -0.2523         0.90%        76.47%
8/5 unsuited  13.87%              5.34         -0.2595         0.90%        77.38%
J/4 unsuited  13.96%              5.28         -0.2632         0.90%        78.28%
6/4 unsuited  13.34%              5.39         -0.2814         0.90%        79.19%
9/5 unsuited  13.48%              5.3          -0.2859         0.90%        80.09%
T/5 unsuited  13.63%              5.24         -0.2859         0.90%        81%
J/3 unsuited  13.42%              5.29         -0.2898         0.90%        81.9%
T/4 unsuited  13.16%              5.23         -0.3113         0.90%        82.81%
J/2 unsuited  12.92%              5.3          -0.3147         0.90%        83.71%
5/3 unsuited  12.71%              5.38         -0.3167         0.90%        84.62%
7/4 unsuited  12.78%              5.34         -0.3175         0.90%        85.52%
T/3 unsuited  12.62%              5.25         -0.3377         0.90%        86.43%
8/4 unsuited  12.29%              5.29         -0.35           0.90%        87.33%
4/3 unsuited  11.99%              5.39         -0.3537         0.90%        88.24%
T/2 unsuited  12.12%              5.26         -0.3623         0.90%        89.14%
6/3 unsuited  11.82%              5.35         -0.3672         0.90%        90.05%
9/4 unsuited  12.01%              5.24         -0.371          0.90%        90.95%
9/3 unsuited  11.58%              5.25         -0.3927         0.90%        91.86%
5/2 unsuited  11.23%              5.34         -0.3998         0.90%        92.76%
7/3 unsuited  11.24%              5.3          -0.4046         0.90%        93.67%
9/2 unsuited  11.08%              5.26         -0.417          0.90%        94.57%
4/2 unsuited  10.67%              5.37         -0.4267         0.90%        95.48%
8/3 unsuited  10.86%              5.23         -0.4314         0.90%        96.38%
6/2 unsuited  10.32%              5.31         -0.4519         0.90%        97.29%
8/2 unsuited  10.45%              5.24         -0.4522         0.90%        98.19%
3/2 unsuited  9.98%               5.39         -0.462          0.90%        99.1%
7/2 unsuited  9.87%               5.24         -0.4833         0.90%        100%
Total         18.14%              5.51         0               100%         0%
------------  ------------------  -----------  --------------  -----------  --------------------

And this is what's in the .csv file:

【Discussion】:

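  • I don't know why, but I thought pandas could only be used when the data was formatted as a <table> in the HTML code. This is awesome! Can I still use regex to edit the hand names when working with a DataFrame?
  • Of course. If you found my answer useful, please upvote and/or accept it; there's a check mark next to the answer. Thanks, and have fun coding!
  • Will do. It says I have to wait another minute before I can accept your answer. Thanks again!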
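As the reply above says, regex substitutions still work on a DataFrame: pandas applies them to a whole column at once through the `.str.replace()` accessor. This is a minimal sketch on a small hypothetical DataFrame with named columns (the one returned by `read_html` in the answer may have numeric column labels instead). Mapping pairs to the `AA` shorthand is a choice made here, not what the question's regex produces.

```python
import pandas as pd

# Small hypothetical DataFrame shaped like part of the scraped table
df = pd.DataFrame({
    "Cards": ["Pair of A's", "A/K suited", "A/K unsuited"],
    "Probability of Win": ["49.51%", "32.15%", "28.96%"],
})

# Each call rewrites every cell in the column; regex=True/False is stated explicitly
df["Cards"] = (
    df["Cards"]
    .str.replace(r"^Pair of (\w)'s$", r"\1\1", regex=True)  # "Pair of A's" -> "AA"
    .str.replace("/", "", regex=False)                      # "A/K suited" -> "AK suited"
    .str.replace(" unsuited", "o", regex=False)             # "AK unsuited" -> "AKo"
    .str.replace(" suited", "s", regex=False)               # "AK suited" -> "AKs"
)

print(df["Cards"].tolist())  # ['AA', 'AKs', 'AKo']
```

Passing `regex=` explicitly avoids depending on the default, which changed to `False` in pandas 2.0.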
【Answer 2】:

Have you tried separating the values with ; and \n? Files meant for Excel are usually delimiter-separated, most often with commas.

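So when you retrieve and clean the data, you end up with text like this (each \n is a line break):

Cards;Probability;Win\n
card1;probability1;win1\n
card2;probability2;win2\n

and so on.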
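Python's standard `csv` module can produce that semicolon-and-newline layout without assembling the strings by hand. This is a minimal sketch, assuming the rows have already been cleaned into lists; the sample rows are illustrative, with values taken from the table in the first answer. Writing into an `io.StringIO` only makes the result easy to inspect.

```python
import csv
import io

# Hypothetical cleaned rows: a header followed by one row per hand
rows = [
    ["Cards", "Probability of Win", "Average Win"],
    ["AKs", "32.15%", "5.8"],
    ["AKo", "28.96%", "5.77"],
]

buffer = io.StringIO()
# delimiter=";" separates the values; lineterminator="\n" ends each row with one newline
writer = csv.writer(buffer, delimiter=";", lineterminator="\n")
writer.writerows(rows)
text = buffer.getvalue()
print(text)
```

To produce a file instead, pass `open("poker_hands.csv", "w", newline="")` to `csv.writer`. Depending on regional settings, Excel may need to be told that `;` is the delimiter when opening the file.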

【Discussion】:
