【Posted】: 2017-02-22 19:57:59
【Problem description】:
I am trying to download the Twitter followers of a list of accounts. My function (which uses twython) works fine for short account lists but raises an error for longer ones. It is not a rate-limit problem, because my function sleeps until the next window whenever the rate limit is hit. The error is:
twythonerror: ('Connection aborted.', error(10054, ''))
Other people seem to have had the same problem, and the suggested solution was to make the function sleep between the individual REST API calls, so I implemented the following code:
del twapi
sleep(nap[afternoon])
afternoon = afternoon + 1
twapi = Twython(app_key=app_key, app_secret=app_secret,
                oauth_token=oauth_token, oauth_token_secret=oauth_token_secret)
nap is a list of intervals in seconds and afternoon is an index into it. Despite this suggestion I still get exactly the same error; the sleeps do not seem to solve the problem. Can anyone help me?
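For reference, an alternative to sleeping preemptively between calls (not tried in the code above) is to catch the connection error when it happens and retry the failed call with exponential backoff. A minimal, library-agnostic sketch of that pattern, with all names illustrative rather than taken from twython:

```python
import time

def call_with_retry(func, *args, max_tries=5, base_delay=1, **kwargs):
    """Call func, retrying on ConnectionError with exponential backoff.

    Sleeps base_delay, 2*base_delay, 4*base_delay, ... seconds between
    attempts, and re-raises after max_tries failures.
    """
    for attempt in range(max_tries):
        try:
            return func(*args, **kwargs)
        except ConnectionError:
            if attempt == max_tries - 1:
                raise  # give up after max_tries attempts
            time.sleep(base_delay * (2 ** attempt))

# Demo with a deliberately flaky function standing in for an API call:
# it fails twice with a connection error, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("Connection aborted.")
    return "ok"

print(call_with_retry(flaky, base_delay=0))  # → ok
```

In a real run, the `func` passed in would be the twython call that fails (and the caught exception type would be the one actually raised, e.g. twython's wrapper around the connection error), but the retry structure is the same.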
Here is the complete function:
# imports assumed by the function below (not shown in the original post)
from time import sleep, localtime
import random
import twython
from twython import Twython

def download_follower(serie_lst):
    """Creates account named txt files containing followers ids. Uses for loop on accounts names list."""
    nap = [1, 2, 4, 8, 16, 32, 64, 128]
    afternoon = 0
    for exemplar in serie_lst:
        # username from serie_lst entries
        account_name = exemplar
        twapi = Twython(app_key=app_key, app_secret=app_secret,
                        oauth_token=oauth_token, oauth_token_secret=oauth_token_secret)
        try:
            # initializations
            del twapi
            if afternoon >= 7:
                afternoon = 0
            sleep(nap[afternoon])
            afternoon = afternoon + 1
            twapi = Twython(app_key=app_key, app_secret=app_secret,
                            oauth_token=oauth_token, oauth_token_secret=oauth_token_secret)
            next_cursor = -1
            result = {}
            result["screen_name"] = ""
            result["followers"] = []
            iteration = 0
            file_name = ""
            # user info
            user = twapi.lookup_user(screen_name=account_name)
            # store user name
            result['screen_name'] = account_name
            # loop until all cursored results are stored
            while next_cursor != 0:
                sleep(random.randrange(start=1, stop=15, step=1))
                call_result = twapi.get_followers_ids(screen_name=account_name, cursor=next_cursor)
                # append each entry of the followers id page to result["followers"]
                for i in call_result["ids"]:
                    result["followers"].append(i)
                next_cursor = call_result["next_cursor"]  # new next_cursor
                iteration = iteration + 1
                if iteration > 13:  # pre-emptively wait out the 15-min rate-limit window
                    error_msg = localtime()
                    error_msg = "".join([str(error_msg.tm_mon), "/", str(error_msg.tm_mday), "/", str(error_msg.tm_year), " at ", str(error_msg.tm_hour), ":", str(error_msg.tm_min)])
                    error_msg = "".join(["Twitter API Request Rate Limit hit on ", error_msg, ", wait..."])
                    print(error_msg)
                    del error_msg
                    sleep(901)  # 15 min + 1 sec
                    iteration = 0
            # output file
            file_name = "".join([account_name, ".txt"])
            # print output
            out_file = open(file_name, "w")  # open file "account_name.txt"
            # out_file.write(str(result["followers"]))  # standard format
            for i in result["followers"]:  # R friendly table format
                out_file.write(str(i))
                out_file.write("\n")
            out_file.close()
        except twython.TwythonRateLimitError:
            # wait out the window, then retry this account from scratch
            error_msg = localtime()
            error_msg = "".join([str(error_msg.tm_mon), "/", str(error_msg.tm_mday), "/", str(error_msg.tm_year), " at ", str(error_msg.tm_hour), ":", str(error_msg.tm_min)])
            error_msg = "".join(["Twitter API Request Rate Limit hit on ", error_msg, ", wait..."])
            print(error_msg)
            del error_msg
            del twapi
            sleep(901)  # 15 min + 1 sec
            # initializations
            if afternoon >= 7:
                afternoon = 0
            sleep(nap[afternoon])
            afternoon = afternoon + 1
            twapi = Twython(app_key=app_key, app_secret=app_secret,
                            oauth_token=oauth_token, oauth_token_secret=oauth_token_secret)
            next_cursor = -1
            result = {}
            result["screen_name"] = ""
            result["followers"] = []
            iteration = 0
            file_name = ""
            # user info
            user = twapi.lookup_user(screen_name=account_name)
            # store user name
            result['screen_name'] = account_name
            # loop until all cursored results are stored
            while next_cursor != 0:
                sleep(random.randrange(start=1, stop=15, step=1))
                call_result = twapi.get_followers_ids(screen_name=account_name, cursor=next_cursor)
                # append each entry of the followers id page to result["followers"]
                for i in call_result["ids"]:
                    result["followers"].append(i)
                next_cursor = call_result["next_cursor"]  # new next_cursor
                iteration = iteration + 1
                if iteration > 13:  # pre-emptively wait out the 15-min rate-limit window
                    error_msg = localtime()
                    error_msg = "".join([str(error_msg.tm_mon), "/", str(error_msg.tm_mday), "/", str(error_msg.tm_year), " at ", str(error_msg.tm_hour), ":", str(error_msg.tm_min)])
                    error_msg = "".join(["Twitter API Request Rate Limit hit on ", error_msg, ", wait..."])
                    print(error_msg)
                    del error_msg
                    sleep(901)  # 15 min + 1 sec
                    iteration = 0
            # output file
            file_name = "".join([account_name, ".txt"])
            # print output
            out_file = open(file_name, "w")  # open file "account_name.txt"
            # out_file.write(str(result["followers"]))  # standard format
            for i in result["followers"]:  # R friendly table format
                out_file.write(str(i))
                out_file.write("\n")
            out_file.close()
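The cursoring logic buried in the function above can be isolated into a small helper, which makes it easier to test without hitting Twitter at all. In this sketch the `fetch_page` parameter is illustrative; in the real function it would be a wrapper around twython's `get_followers_ids`:

```python
def collect_cursored_ids(fetch_page):
    """Accumulate ids from a cursored API, following next_cursor until it is 0.

    fetch_page(cursor) must return a dict with "ids" and "next_cursor" keys,
    matching the shape of the get_followers_ids response used above.
    """
    ids = []
    cursor = -1  # Twitter's cursoring convention: -1 requests the first page
    while cursor != 0:
        page = fetch_page(cursor)
        ids.extend(page["ids"])
        cursor = page["next_cursor"]
    return ids

# Demo with two fake pages of follower ids keyed by cursor value
pages = {
    -1: {"ids": [1, 2], "next_cursor": 7},
    7:  {"ids": [3],    "next_cursor": 0},
}
print(collect_cursored_ids(pages.__getitem__))  # → [1, 2, 3]
```

Separating the paging loop this way also makes it possible to put the sleep/retry policy in one place instead of duplicating it inside both the `try` and the `except` branches.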
【Discussion】:
- What are the values in nap, and what is the initial value of afternoon? You need to provide more context before this can be understood.
- nap = [1,2,4,8,16,32,64,128], and afternoon is initialized to 0 and reset to 0 when needed. I checked that part; the problem is that the server keeps closing the connection even though the program sleeps between every call.
- Why are you using such short pauses? If this were a rate-limit problem, those values would probably not be enough to reach the next window; as far as I can tell, the limit is per 15-minute period.
- Also, why do you delete the connection every few seconds? I would think you could keep the connection open and just wait before making the next request.
- It is not a rate-limit problem. To avoid that, my function sleeps for 900 seconds (15 minutes). I did run into rate-limit problems, but I have already solved them; this time it is a different issue. Possibly the Twitter server treats my calls as a denial-of-service attack, so I make my function sleep for varying intervals, and I delete the connection for the same reason (as suggested here --> stackoverflow.com/questions/27333671/…).
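The suggestion in the comments above, keeping one client alive and only pacing the requests instead of deleting and rebuilding the connection, can be sketched generically. `PacedCaller` and its parameters are illustrative helpers, not part of twython:

```python
import time

class PacedCaller:
    """Wrap a callable so consecutive calls are at least min_interval seconds
    apart, without tearing down the underlying client between calls.

    clock and sleep are injectable to make the pacing logic testable."""
    def __init__(self, func, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.func = func
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def __call__(self, *args, **kwargs):
        now = self.clock()
        if self._last is not None:
            wait = self.min_interval - (now - self._last)
            if wait > 0:
                self.sleep(wait)  # pad out the gap since the previous call
        self._last = self.clock()
        return self.func(*args, **kwargs)

# Demo: pace a stand-in API call to at most one call per 0.01 s
paced = PacedCaller(lambda x: x * 2, min_interval=0.01)
print([paced(i) for i in range(3)])  # → [0, 2, 4]
```

In the question's setting, `func` would be the single long-lived twython call, so the client object is constructed once and reused across the whole account list.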
Tags: python api twitter twython twitter-rest-api