List comprehension and items with Chinese characters: answers

【Question title】: List comprehension and items with Chinese characters
【Posted】: 2017-01-23 03:09:13
【Question description】:

I have the following code, which fetches some data containing Chinese characters from a website.

import csv
import requests
from bs4 import BeautifulSoup

url = "http://www.hkcpast.net/cpast_homepage/xyzbforms/BetMatchDetails.asp?tBetDate=2016/9/11"

r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")

for a in soup.find_all('html'):
    a.decompose()

list = []
for row in soup.find_all('tr'):
    cols = row.find_all('td')
    for col in cols:
        if len(col) > 0:
            list.append(col.text.encode('utf-8').strip())

The result currently looks like this:

[1, x, y, z, 2, x, y, z, 3, x, y, z]

My problem is that I want to create sub-lists from this list, split at the numbers (1, 2, 3, 4, 5, ...).

The result would then look like this:

[1, x, y, z]
[2, x, y, z]
[3, x, y, z]

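The ultimate goal is to write each sub-list as one row of a CSV file. Does it make sense to first split the list into individual entries and then write them to the CSV file?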

【Question discussion】:

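Please stick to one question per post. However, both of the problems you describe are too vague to answer. Perhaps you could start by giving us some sample input and the expected output? The latter is just an encoding matter: you encode the data as UTF-8, so when you print or list it you will see \xhh byte escapes in the output. That is normal.
I have edited the question so that only one question remains: how to create sub-lists from a list according to a specific requirement.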
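
The encoding remark above can be illustrated with a minimal sketch. It assumes Python 3, and the sample string is made up for illustration rather than taken from the site:

```python
# Illustrative only: this sample string is an assumption, not data from the page.
text = "香港"

# Encoding to UTF-8 yields a bytes object, three bytes per character here.
encoded = text.encode("utf-8")

# The repr of a bytes object shows every non-ASCII byte as a \xhh escape,
# which is what appears when a list of encoded values is printed.
print(repr(encoded))

# Decoding gives back the original characters, so nothing has been lost.
decoded = encoded.decode("utf-8")
```

If the goal is readable output, one option is to keep the values as text (skip the `.encode('utf-8')` call) and let the file or terminal handle the encoding.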

Tags: python list csv

【Solution 1】:

A literal translation of your code would be as follows:

sublists = []  # renamed from `list` to avoid shadowing the built-in
for row in soup.find_all('tr'):
    cols = row.find_all('td')
    for col in cols:
        if len(col) == 0:
            continue  # Skip empty cells; saves some indentation
        txt = col.text.encode('utf-8').strip()
        try:
            _ = int(txt)
            # txt is an int: start a new sub-list
            sublists.append([txt])
        except ValueError:
            # txt is not an int: append it to the end of the previous sub-list
            sublists[-1].append(txt)

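(Note that this fails badly if the first entry is not an int: no sub-list exists yet at that point, so indexing the last element raises an IndexError!)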
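
One way to sidestep that failure is sketched below, working on a plain flat list so it runs without the website. The helper name `split_on_ints` and the sample data are hypothetical, and the choice to open a group for a leading non-integer item is an assumption, not something the question specifies:

```python
def split_on_ints(items):
    """Group a flat list into sub-lists, starting a new one at each integer-like item."""
    groups = []
    for item in items:
        try:
            int(item)
            starts_group = True
        except ValueError:
            starts_group = False
        # Also open a group when none exists yet, so a leading
        # non-integer item does not trigger an IndexError.
        if starts_group or not groups:
            groups.append([item])
        else:
            groups[-1].append(item)
    return groups


# Hypothetical sample mirroring the shape shown in the question.
flat = ["1", "x", "y", "z", "2", "x", "y", "z", "3", "x", "y", "z"]
for group in split_on_ints(flat):
    print(group)
```

Whether a leading non-integer item should start its own group or simply be dropped depends on the data; the sketch keeps it.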

However, I suspect you actually want to create a new sub-list for every row in the table.
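
If that is the case, each row's cells are already one CSV record and no splitting on numbers is needed. The sketch below uses only the standard library and assumes the rows have already been collected, one list per `tr`; the `rows` data and the `rows_to_csv` helper are hypothetical and stand in for the scraped table:

```python
import csv
import io

# Hypothetical rows, one per table row. With BeautifulSoup they might be built as
# [td.get_text(strip=True) for td in row.find_all('td')] for each 'tr' (an assumption).
rows = [
    ["1", "x", "y", "z"],
    ["2", "x", "y", "z"],
    [],  # e.g. a 'tr' without any 'td' cells
    ["3", "x", "y", "z"],
]


def rows_to_csv(rows):
    """Return the CSV text for the given rows, skipping empty ones."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerows(row for row in rows if row)
    return buffer.getvalue()


csv_text = rows_to_csv(rows)
print(csv_text)
```

To write to disk instead, the same `csv.writer` can wrap a file opened with `open(path, "w", newline="", encoding="utf-8")` in Python 3, which keeps the Chinese text as text and leaves the encoding to the file object.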

【Discussion】: