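[Posted]: 2021-11-19 04:08:50
[Question]:
I'm trying to use json.loads to load data that I collected with Beautiful Soup. However, the data has a problem: some fields contain double quotes inside the field value. Example: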
"rComments":"He is a very easy grader, but gets boring occasionally. I wish he would quit saying "Without further ado..." Cancer Bio is a great class because there is a different lecturer each time."
This causes the following error:
JSONDecodeError: Expecting ',' delimiter: line 1 column 3556 (char 3555)
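To see which part of the text the decoder is complaining about, the character offset in the error can be used to slice out the surrounding text. A minimal sketch, where `show_decode_error` is a hypothetical helper name and `sample` is a shortened version of the comment above rather than the real response:

```python
import json

def show_decode_error(text, width=40):
    """Return the text around the point where json.loads fails, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # exc.pos is the zero-based character offset reported as "char N" in the message.
        start = max(exc.pos - width, 0)
        return text[start:exc.pos + width]
    return None

# Shortened stand-in for the failing comment (not the full response).
sample = '{"rComments":"quit saying "Without further ado..." Cancer Bio","rDate":"11/18/2005"}'
context = show_decode_error(sample)
print(context)
```

With the real data, `show_decode_error(soup1)` should return the text on either side of char 3555, which makes it easy to confirm that the nested quote is where parsing stops.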
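Is there a way, using a regex or some other method, to replace the double quotes around "Without further ado..." with single quotes, or to remove them? I need to keep the other double quotes because JSON requires them.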
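One heuristic that might work for this kind of repair is sketched below; it is not a general solution. It assumes the JSON is compact (no whitespace around `:` and `,`, as in the repr further down), and it treats a double quote as structural only when it directly follows one of `{ [ , :` or directly precedes one of `, } ] :`. Every other unescaped quote is assumed to be inside a value and is escaped. The name `escape_inner_quotes` and the sample string are made up for illustration.

```python
import json
import re

# A quote counts as structural if it opens a string (preceded by { [ , :)
# or closes one (followed by , } ] : or the end of the text).
# The lookbehind also skips quotes that are already escaped with a backslash.
INNER_QUOTE = re.compile(r'(?<![{\[,:\\])"(?![,}\]:]|$)')

def escape_inner_quotes(text):
    """Backslash-escape double quotes that appear to sit inside a JSON string value."""
    return INNER_QUOTE.sub(r'\\"', text)

# Shortened stand-in for the failing record (assumed compact JSON).
broken = ('{"rComments":"I wish he would quit saying "Without further ado..." '
          'Cancer Bio is great.","rDate":"11/18/2005","tags":[],"onlineClass":""}')
data = json.loads(escape_inner_quotes(broken))
print(data["rComments"])
```

The heuristic gives wrong results when an inner quote happens to sit next to a structural character, for example a comment containing `"hi", then`, so the output is worth checking before relying on it.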
Here is a copy of my code. It fails for any professor ID whose comments contain nested double quotes.
import json
import re
import requests
from bs4 import BeautifulSoup

# Make Request
url1 = 'https://www.ratemyprofessors.com/paginate/professors/ratings?tid={}&filter=&courseCode=&page=1'.format(124880)
page1 = requests.get(url1)
soup1 = BeautifulSoup(page1.text, "html.parser")
soup1 = str(soup1)
# Remove Double Quotes in Comments
soup1 = re.sub(r'(?:[\b\s\:]\".*)(?:.*)(\")(?:.*\")', '', soup1)
# Create Dictionary
Dict1 = json.loads(soup1)
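One possibility that the post does not confirm: the raw response may carry the inner quotes as HTML entities (`&quot;`), and passing the body through an HTML parser and back to a string would then turn them into bare `"` characters. If that is what is happening, parsing the body as JSON first and decoding entities afterwards avoids the problem. The sketch below uses only the standard library; `parse_ratings` is a hypothetical name and the `raw` string is an assumed shape for the response, not captured output.

```python
import html
import json

def parse_ratings(raw_text):
    """Parse the body as JSON, then decode HTML entities inside string values."""
    def unescape(value):
        if isinstance(value, str):
            return html.unescape(value)
        if isinstance(value, list):
            return [unescape(item) for item in value]
        if isinstance(value, dict):
            return {key: unescape(item) for key, item in value.items()}
        return value
    return unescape(json.loads(raw_text))

# Assumed response body: the &quot; entities are a guess, not taken from the post.
raw = ('{"ratings":[{"rComments":"quit saying &quot;Without further ado...&quot; '
       'Cancer Bio"}],"remaining":0}')
ratings = parse_ratings(raw)
print(ratings["ratings"][0]["rComments"])
```

In the code above this would mean calling `parse_ratings(page1.text)` instead of going through `BeautifulSoup` and `str(soup1)`.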
I also tried the regex below, which did not work either.
:r"(\".*?)\"(.*?)\"(.*\")
For reference, this is what repr(soup1) returns.
'\'{"ratings":[{"attendance":"N/A","clarityColor":"good","easyColor":"average","helpColor":"good","helpCount":2,"id":29366967,"notHelpCount":0,"onlineClass":"","quality":"awesome","rClarity":5,"rClass":"BIOL4015","rComments":"One of my favorite professors at Tech. Really cares about his students, and even brought us apples from Elijay and snacks during the final. His tests are not too bad and the group project is pretty easy. Good teacher and even better human being.","rDate":"01/01/2018","rEasy":3.0,"rEasyString":"3.0","rErrorMsg":null,"rHelpful":5,"rInterest":"N/A","rOverall":5.0,"rOverallString":"5.0","rStatus":1,"rTextBookUse":"Yes","rTimestamp":1514816343000,"rWouldTakeAgain":"Yes","sId":361,"takenForCredit":"Yes","teacher":null,"teacherGrade":"B+","teacherRatingTags":["Inspirational","Caring"],"unUsefulGrouping":"people","usefulGrouping":"people"},{"attendance":"Not Mandatory","clarityColor":"good","easyColor":"average","helpColor":"good","helpCount":0,"id":28805507,"notHelpCount":0,"onlineClass":"","quality":"awesome","rClarity":5,"rClass":"BIOL3450","rComments":"GOAT","rDate":"10/30/2017","rEasy":3.0,"rEasyString":"3.0","rErrorMsg":null,"rHelpful":5,"rInterest":"N/A","rOverall":5.0,"rOverallString":"5.0","rStatus":1,"rTextBookUse":"Yes","rTimestamp":1509404689000,"rWouldTakeAgain":"Yes","sId":361,"takenForCredit":"Yes","teacher":null,"teacherGrade":"A","teacherRatingTags":["Caring","Get ready to read","Accessible outside class"],"unUsefulGrouping":"people","usefulGrouping":"people"},{"attendance":"N/A","clarityColor":"average","easyColor":"good","helpColor":"poor","helpCount":0,"id":19977224,"notHelpCount":0,"onlineClass":"","quality":"poor","rClarity":2,"rClass":"BIOL3450","rComments":"Dr Merril is a really, really nice person, and I\\\'m sure he\\\'s great doing his research but he is just not a good professor for a lecture based class with 150ish people. He\\\'s soft spoken, moves too fast in lecture and goes into unnecessary detail. 
Also does not hold office hours. Would rather defer students to TA.","rDate":"03/31/2012","rEasy":4.0,"rEasyString":"4.0","rErrorMsg":null,"rHelpful":1,"rInterest":"Low","rOverall":1.5,"rOverallString":"1.5","rStatus":1,"rTextBookUse":"Yes","rTimestamp":1333212949000,"rWouldTakeAgain":"N/A","sId":361,"takenForCredit":"N/A","teacher":null,"teacherGrade":"N/A","teacherRatingTags":[],"unUsefulGrouping":"people","usefulGrouping":"people"},{"attendance":"N/A","clarityColor":"good","easyColor":"good","helpColor":"average","helpCount":0,"id":15545116,"notHelpCount":0,"onlineClass":"","quality":"good","rClarity":5,"rClass":"BIOL3340","rComments":"Dr. Merrill is a very nice man and a decent teacher. Class attendance isn\\\'t necessary, however, he does offer extra credit for attendence occasionally. The class is all memorization and a lot of nit-picky information. Didn\\\'t like the class too much, but he was a fine teacher.","rDate":"03/18/2009","rEasy":4.0,"rEasyString":"4.0","rErrorMsg":null,"rHelpful":2,"rInterest":"Meh","rOverall":3.5,"rOverallString":"3.5","rStatus":1,"rTextBookUse":"Yes","rTimestamp":1237418592000,"rWouldTakeAgain":"N/A","sId":361,"takenForCredit":"N/A","teacher":null,"teacherGrade":"N/A","teacherRatingTags":[],"unUsefulGrouping":"people","usefulGrouping":"people"},{"attendance":"N/A","clarityColor":"good","easyColor":"poor","helpColor":"good","helpCount":1,"id":10944025,"notHelpCount":0,"onlineClass":"","quality":"awesome","rClarity":5,"rClass":"BIOL8802","rComments":"He is a very easy grader, but gets boring occasionally. I wish he would quit saying "Without further ado..." 
Cancer Bio is a great class because there is a different lecturer each time.","rDate":"11/18/2005","rEasy":1.0,"rEasyString":"1.0","rErrorMsg":null,"rHelpful":4,"rInterest":"It\\\'s my life","rOverall":4.5,"rOverallString":"4.5","rStatus":1,"rTextBookUse":"N/A","rTimestamp":1132303531000,"rWouldTakeAgain":"N/A","sId":361,"takenForCredit":"N/A","teacher":null,"teacherGrade":"N/A","teacherRatingTags":[],"unUsefulGrouping":"people","usefulGrouping":"person"},{"attendance":"N/A","clarityColor":"good","easyColor":"average","helpColor":"good","helpCount":0,"id":614809,"notHelpCount":0,"onlineClass":"","quality":"awesome","rClarity":4,"rClass":"3331","rComments":"Not very challenging","rDate":"02/22/2003","rEasy":2.0,"rEasyString":"2.0","rErrorMsg":null,"rHelpful":5,"rInterest":"N/A","rOverall":4.5,"rOverallString":"4.5","rStatus":1,"rTextBookUse":"N/A","rTimestamp":1045879151000,"rWouldTakeAgain":"N/A","sId":361,"takenForCredit":"N/A","teacher":null,"teacherGrade":"N/A","teacherRatingTags":[],"unUsefulGrouping":"people","usefulGrouping":"people"}],"remaining":0}\''
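[Comments]:
- Nested quotes shouldn't be a problem here. What error do you get if you try to load the JSON without removing the quotes? Could you provide repr(soup1) from before the regex substitution is applied?
- Hi @WillDaSilva, I updated the post with the error and the repr.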