【Posted】: 2021-10-04 10:12:33
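【Question】:
I have a list of tuples containing utility data: consumption in cubic feet (CF), gallons of water, and the approximate charge. There are 13 tuples: one for each month of the year, plus one for the year-end total consumption. My goal is to extract these three values, store them in a DataFrame, and eventually export them to an Excel sheet.
This is what they look like after I converted the tuples to strings. (I iterated over them and converted them to strings because they were originally in Soup (BeautifulSoup) format, which was hard to organize into a list.)
This is what one tuple looks like: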
[\'<area alt="" coords="151,115,181,382" onmouseout="DisplayTooltip(\\\'\\\');" onmouseover="DisplayTooltip(\\\'Consumption = 49,094.00 CF (367,223.12 Gallons) <br /> Approximate Charge = $5,073.42\\\');" shape="rect"/>\']'
Below is the full list of tuples. The only exception is the last (13th) tuple, which lists "Total Consumption" instead of "Consumption".
['[\'<area alt="" coords="113,88,143,382" onmouseout="DisplayTooltip(\\\'\\\');" onmouseover="DisplayTooltip(\\\'**Consumption = 54,070.00 CF (404,443.60 Gallons)** <br /> **Approximate Charge = $5,587.65**\\\');" shape="rect"/>\']', '[\'<area alt="" coords="151,115,181,382" onmouseout="DisplayTooltip(\\\'\\\');" onmouseover="DisplayTooltip(\\\'Consumption = 49,094.00 CF (367,223.12 Gallons) <br /> Approximate Charge = $5,073.42\\\');" shape="rect"/>\']', '[\'<area alt="" coords="188,99,218,382" onmouseout="DisplayTooltip(\\\'\\\');" onmouseover="DisplayTooltip(\\\'Consumption = 51,921.00 CF (388,369.08 Gallons) <br /> Approximate Charge = $5,365.57\\\');" shape="rect"/>\']', '[\'<area alt="" coords="226,125,256,382" onmouseout="DisplayTooltip(\\\'\\\');" onmouseover="DisplayTooltip(\\\'Consumption = 47,122.00 CF (352,472.56 Gallons) <br /> Approximate Charge = $4,869.63\\\');" shape="rect"/>\']', '[\'<area alt="" coords="263,101,294,382" onmouseout="DisplayTooltip(\\\'\\\');" onmouseover="DisplayTooltip(\\\'Consumption = 51,687.00 CF (386,618.76 Gallons) <br /> Approximate Charge = $5,341.39\\\');" shape="rect"/>\']', '[\'<area alt="" coords="301,139,331,382" onmouseout="DisplayTooltip(\\\'\\\');" onmouseover="DisplayTooltip(\\\'Consumption = 44,643.00 CF (333,929.64 Gallons) <br /> Approximate Charge = $4,613.45\\\');" shape="rect"/>\']', '[\'<area alt="" coords="339,176,369,382" onmouseout="DisplayTooltip(\\\'\\\');" onmouseover="DisplayTooltip(\\\'Consumption = 37,770.00 CF (282,519.60 Gallons) <br /> Approximate Charge = $4,010.80\\\');" shape="rect"/>\']', '[\'<area alt="" coords="376,382,407,383" onmouseout="DisplayTooltip(\\\'\\\');" onmouseover="DisplayTooltip(\\\'Consumption = 0.00 CF (0.00 Gallons) <br /> Approximate Charge = $0.00\\\');" shape="rect"/>\']', '[\'<area alt="" coords="414,382,444,383" onmouseout="DisplayTooltip(\\\'\\\');" onmouseover="DisplayTooltip(\\\'Consumption = 0.00 CF (0.00 Gallons) <br /> Approximate Charge = $0.00\\\');" 
shape="rect"/>\']', '[\'<area alt="" coords="452,382,482,383" onmouseout="DisplayTooltip(\\\'\\\');" onmouseover="DisplayTooltip(\\\'Consumption = 0.00 CF (0.00 Gallons) <br /> Approximate Charge = $0.00\\\');" shape="rect"/>\']', '[\'<area alt="" coords="489,382,519,383" onmouseout="DisplayTooltip(\\\'\\\');" onmouseover="DisplayTooltip(\\\'Consumption = 0.00 CF (0.00 Gallons) <br /> Approximate Charge = $0.00\\\');" shape="rect"/>\']', '[\'<area alt="" coords="527,382,557,383" onmouseout="DisplayTooltip(\\\'\\\');" onmouseover="DisplayTooltip(\\\'Consumption = 0.00 CF (0.00 Gallons) <br /> Approximate Charge = $0.00\\\');" shape="rect"/>\']', '[\'<area alt="" coords="653,68,733,382" onmouseout="DisplayTooltip(\\\'\\\');" onmouseover="DisplayTooltip(\\\'Total Consumption = 336,307 CF (2,515,576 Gallons) <br /> Approximate Charge = $34,861.91\\\');" shape="rect"/>\']']
I wrote this regular expression to extract the gallons:
import re

# test_line holds one tuple from the list above, as a string
gallons = re.search('CF((.*)Gallons)', test_line)
print(gallons)
Which outputs this:
<re.Match object; span=(128, 150), match='CF (404,443.60 Gallons'>
That doesn't really make it any easier, because now I still have to find a way to extract '404,443.60'.
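One possible way to get just the number out of the match, sketched under the assumption that the tooltip text always has the form `CF (<number> Gallons)`: escape the literal parentheses in the pattern and read the capture group with `match.group(1)`. The `test_line` value below is a shortened stand-in for one tuple, not the full string.

```python
import re

# Shortened stand-in for one tuple (assumption: only the tooltip text matters).
test_line = ("Consumption = 54,070.00 CF (404,443.60 Gallons) "
             "<br /> Approximate Charge = $5,587.65")

# \( and \) match the literal parentheses; the inner (...) is the capture group.
match = re.search(r"CF \(([\d,.]+) Gallons\)", test_line)

gallons_text = match.group(1)                    # the captured digits only
gallons = float(gallons_text.replace(",", ""))   # drop thousands separators
print(gallons_text, gallons)
```

Note that `re.search` returns `None` when nothing matches, so real data may need a check before calling `.group(1)`.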
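If anyone could recommend a way to extract these three pieces of data from the list of tuples (I assume I will most likely need some form of iteration over the list) and store them in a DataFrame, that would be very helpful. The end goal is to store these numbers in a DataFrame and eventually export them to an Excel sheet.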
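To make the goal concrete, here is a minimal sketch of the kind of iteration described above. It assumes every string contains the text `Consumption = … CF (… Gallons)` followed later by `Approximate Charge = $…`, with an optional `Total ` prefix on the last entry. The names `parse_rows`, `to_excel`, `sample_items`, and the column names are hypothetical, and the three sample strings are abbreviated versions of the real ones. The export step assumes pandas and an Excel writer such as openpyxl are installed.

```python
import re

# One pattern for all 13 entries; "Total " is optional so the last one also matches.
PATTERN = re.compile(
    r"(?P<total>Total )?Consumption = (?P<cf>[\d,.]+) CF "
    r"\((?P<gal>[\d,.]+) Gallons\).*?Approximate Charge = \$(?P<charge>[\d,.]+)"
)

def to_number(text):
    """Convert '367,223.12' to 367223.12."""
    return float(text.replace(",", ""))

def parse_rows(items):
    """Return one dict per string that contains a consumption tooltip."""
    rows = []
    for item in items:
        m = PATTERN.search(item)
        if m is None:          # skip strings that do not fit the assumed format
            continue
        rows.append({
            "is_total": m.group("total") is not None,
            "cubic_feet": to_number(m.group("cf")),
            "gallons": to_number(m.group("gal")),
            "charge": to_number(m.group("charge")),
        })
    return rows

def to_excel(rows, path):
    """Write the rows to an Excel file (requires pandas and openpyxl)."""
    import pandas as pd        # imported here so parsing works without pandas
    pd.DataFrame(rows).to_excel(path, index=False)

# Abbreviated sample strings (assumption: the real ones carry more attributes).
sample_items = [
    '<area onmouseover="DisplayTooltip(\'Consumption = 49,094.00 CF (367,223.12 Gallons) <br /> Approximate Charge = $5,073.42\');"/>',
    '<area onmouseover="DisplayTooltip(\'Consumption = 0.00 CF (0.00 Gallons) <br /> Approximate Charge = $0.00\');"/>',
    '<area onmouseover="DisplayTooltip(\'Total Consumption = 336,307 CF (2,515,576 Gallons) <br /> Approximate Charge = $34,861.91\');"/>',
]

rows = parse_rows(sample_items)
```

With the full list in place of `sample_items`, a call such as `to_excel(rows, "water_usage.xlsx")` (hypothetical file name) would produce the spreadsheet; the `is_total` column makes it possible to separate the year-end entry from the monthly ones.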
【Comments】: