【Posted】: 2021-02-17 20:13:02
【Question】:
I am working with Twitter streaming data, and I get output like this:
{
    "data": {
        "author_id": "1318123716522479616",
        "created_at": "2020-11-05T04:18:21.000Z",
        "entities": {
            "hashtags": [
                {
                    "end": 107,
                    "start": 86,
                    "tag": "MilliHesaplarYanyana"
                }
            ],
            "mentions": [
                {
                    "end": 15,
                    "start": 3,
                    "username": "MilliTaakip"
                }
            ]
        },
        "id": "1324204381177323520",
        "lang": "tr",
        "text": "RT @MilliTaakip: Milli hesaplar\u0131m\u0131z\u0131n g\u00fc\u00e7lenmesi i\u00e7in\nCumhurba\u015fkan\u0131m\u0131z\u0131n talimat\u0131yla,\n#MilliHesaplarYanyana \u00e7al\u0131\u015fmas\u0131n\u0131 destekliyoruz;\n\n\ud83c\uddf9\ud83c\uddf7\u2026"
    }
}
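I want to extract specific information from this data, such as the hashtags, and store it in my database.
I have tried several approaches, such as pd.json_normalize and flatten_json, but they do not work. I get the following output
Here is my code: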
import json

import pandas as pd
import psycopg2
import requests
import sqlalchemy
from sqlalchemy import create_engine


def connect_to_endpoint(url, headers):
    # `payload` (the query parameters) is defined elsewhere in the script
    response = requests.request("GET", url, headers=headers, stream=True, params=payload)
    print(response.status_code)
    for response_line in response.iter_lines():
        if response_line:
            # print(ndjson.dumps(json_response["data"]["text"], indent=4, sort_keys=True))
            conn = psycopg2.connect(database="tweetData", user="postgres", password="pass", host="localhost", port="5432")
            cur = conn.cursor()
            # cc
            try:
                data = json.loads(response_line.decode('utf-8'))
                index = 0
                # for created at
                var2 = data["data"]["text"]
                # define a list of keywords
                keywords = ('biden', 'election', 'trump', 'stocks')
                if any(keyword in var2.lower() for keyword in keywords):
                    df = pd.json_normalize(data)
                    dffinal = pd.DataFrame(df)
                    engine = create_engine('postgresql+psycopg2://postgres:root@localhost:5432/tweetData')
                    dffinal.to_sql("new-tweets", engine, if_exists='append', dtype={'relevant_column': sqlalchemy.types.JSON})
                    print("loaded")
                else:
                    print("none")
                conn.commit()
                index += 1
                cur.close()
            except IOError as io:
                print("ERROR!")
    if response.status_code != 200:
        raise Exception(
            "Request returned an error: {} {}".format(
                response.status_code, response.text
            )
        )
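Please advise how I should proceed and what is wrong with my approach.
Edit: Whenever I try to retrieve tweet data and the tweet has no entities or no hashtags, it raises an error: KeyError: 'entities'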
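For reference, a minimal sketch of one way the hashtags might be read without triggering that KeyError. It assumes each streamed line decodes to a dict shaped like the sample above, and that "entities" and "hashtags" may simply be absent. The helper name extract_hashtags and the two sample tweets are hypothetical, made up for illustration only.

```python
import json


def extract_hashtags(tweet):
    """Return the hashtag strings of one decoded tweet, or [] if it has none."""
    # "entities" and "hashtags" are treated as optional keys, so dict.get
    # falls back to an empty container instead of raising KeyError.
    entities = tweet.get("data", {}).get("entities", {})
    return [hashtag["tag"] for hashtag in entities.get("hashtags", [])]


# Hypothetical sample lines: one tweet with a hashtag, one without "entities".
with_tags = json.loads(
    '{"data": {"id": "1", "text": "a #b", '
    '"entities": {"hashtags": [{"start": 2, "end": 4, "tag": "b"}]}}}'
)
without_entities = json.loads('{"data": {"id": "2", "text": "no tags"}}')

print(extract_hashtags(with_tags))         # ['b']
print(extract_hashtags(without_entities))  # []
```

The returned list could then be written to the database one row per tag, or skipped entirely when it is empty.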
【Discussion】:
Tags: json python-3.x pandas postgresql