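Learn to embrace Unicode... the world isn't ASCII anymore.
Assuming you are on Windows and viewing the .csv with Excel or Notepad, use the following line on Python 3. With that change alone (and after fixing the indentation in your post), non-ASCII characters display correctly. Notepad and Excel like the UTF-8 BOM signature that utf-8-sig writes at the start of the file; they use it to recognize the file as UTF-8.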
with open('usnwr_schools.csv', 'w', newline='', encoding='utf-8-sig') as f:
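If you read the file in another Python script, make sure to open it as shown in the next line. Your example b'University of Michigan\xe2\x80\x94\xe2\x80\x8bAnn Arbor' was read in binary mode ('rb'), so you were looking at the raw UTF-8 bytes rather than decoded text: \xe2\x80\x94 is an em dash and \xe2\x80\x8b is a zero-width space.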
with open('usnwr_schools.csv', encoding='utf-8-sig') as f:
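As a quick check of what those bytes are, here is a minimal sketch that decodes the byte string from your example and names the non-ASCII characters in it. The variable names are only for illustration.

```python
import unicodedata

# The byte string from the question, as seen when the file is read with 'rb'.
raw = b'University of Michigan\xe2\x80\x94\xe2\x80\x8bAnn Arbor'

# Decoding as UTF-8 turns the six odd-looking bytes into two characters.
text = raw.decode('utf-8')

# Collect the code point and Unicode name of every non-ASCII character.
non_ascii = [(hex(ord(ch)), unicodedata.name(ch)) for ch in text if ord(ch) > 127]
print(non_ascii)
# [('0x2014', 'EM DASH'), ('0x200b', 'ZERO WIDTH SPACE')]
```

Opening the file in text mode with an explicit encoding does this decoding for you.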
On Linux, you can use utf8 instead of utf-8-sig.
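To see the difference between the two codecs, the following sketch writes a small file with utf-8-sig and reads it back three ways. The scratch file name bom_demo.csv is made up for the demonstration so that usnwr_schools.csv is left alone.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'bom_demo.csv')

# utf-8-sig writes the three-byte BOM before the first character.
with open(path, 'w', newline='', encoding='utf-8-sig') as f:
    f.write('University of Michigan\u2014Ann Arbor\n')

with open(path, 'rb') as f:
    raw = f.read()
with open(path, newline='', encoding='utf-8-sig') as f:
    with_sig = f.read()   # BOM is stripped on read
with open(path, newline='', encoding='utf-8') as f:
    plain = f.read()      # BOM survives as U+FEFF

print(raw[:3])            # b'\xef\xbb\xbf'
print(repr(with_sig[0]))  # 'U'
print(repr(plain[0]))     # '\ufeff'
```

So a file written with utf-8-sig should also be read with utf-8-sig; otherwise the first field starts with an invisible U+FEFF character.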
By the way, you can replace your loop with:
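with open('usnwr_schools.csv', 'w', newline='', encoding='utf-8-sig') as f:
    writer = csv.writer(f)
    # find_all() searches the whole document, so iterating over reqSoup here
    # writes the same rows once per top-level node of the parsed page.
    for school in reqSoup:
        x = reqSoup.find_all("a", {"class" : "school-name"})
        for item in x:
            y = item.get_text()
            writer.writerow([y])  # one school name per row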
Reading it back:
with open('usnwr_schools.csv',encoding='utf-8-sig') as f:
    print(f.read())
Output:
Massachusetts Institute of Technology
Stanford University
University of California—Berkeley
California Institute of Technology
Carnegie Mellon University
University of Michigan—Ann Arbor
Georgia Institute of Technology
University of Illinois—Urbana-Champaign
Purdue University—West Lafayette
University of Texas—Austin (Cockrell)
Texas A&M University—College Station (Look)
Cornell University
University of Southern California (Viterbi)
Columbia University (Fu Foundation)
University of California—Los Angeles (Samueli)
University of California—San Diego (Jacobs)
Princeton University
Northwestern University (McCormick)
University of Pennsylvania
Johns Hopkins University (Whiting)
Virginia Tech
University of California—Santa Barbara
Harvard University
University of Maryland—College Park (Clark)
University of Washington
[The same 25 lines are printed four more times.]
If you still want ASCII only, this will do it:
import requests
import bs4
import csv

results = requests.get('http://grad-schools.usnews.rankingsandreviews.com/best-graduate-schools/top-engineering-schools/eng-rankings?int=a74509')

# Map both dashes to a plain hyphen; None deletes the zero-width space.
replacements = {ord('\N{EN DASH}'):'-',
                ord('\N{EM DASH}'):'-',
                ord('\N{ZERO WIDTH SPACE}'):None}

reqSoup = bs4.BeautifulSoup(results.text, "html.parser")

# encoding='ascii' raises UnicodeEncodeError if any other non-ASCII character slips through.
with open('usnwr_schools.csv', 'w', newline='', encoding='ascii') as f:
    writer = csv.writer(f)
    for school in reqSoup:
        x = reqSoup.find_all("a", {"class" : "school-name"})
        for item in x:
            y = item.get_text()
            writer.writerow([y.translate(replacements)])

with open('usnwr_schools.csv',encoding='ascii') as f:
    print(f.read())
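If you want to try the translation table without hitting the website, here is a small sketch that applies the same mapping to one sample string. The sample is the school name from your byte-string example; the variable names are only for illustration.

```python
# Same table as above: dashes become '-', the zero-width space is deleted.
replacements = {ord('\N{EN DASH}'): '-',
                ord('\N{EM DASH}'): '-',
                ord('\N{ZERO WIDTH SPACE}'): None}

sample = 'University of Michigan\N{EM DASH}\N{ZERO WIDTH SPACE}Ann Arbor'
cleaned = sample.translate(replacements)

print(cleaned)                   # University of Michigan-Ann Arbor
print(cleaned.encode('ascii'))   # encodes without error now
```

Any non-ASCII character that is not in the table is left untouched by translate(), so it would still fail when written with encoding='ascii'.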