Posted: 2017-08-14 15:34:47
Question:
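I'm currently working on a Google Cloud project under the free trial. I have cron jobs that fetch data from a data vendor and store it in Datastore. I wrote the data-fetching code a few weeks ago and everything worked fine, but for the last two days I have suddenly been getting the error "DeadlineExceededError: The overall deadline for responding to the HTTP request was exceeded". I believe cron jobs should only time out after 60 minutes. Any idea why I'm getting this error?
Cron task: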
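import logging

# cron and company_repository are the project's own modules (not shown).
def run():
    try:
        config = cron.config
        actual_data_source = config['xxx']['xxxx']
        original_data_source = actual_data_source
        # Fetches every page from the vendor before anything is saved.
        company_list = cron.rest_client.load(config, "companies", '')
        if not company_list:
            logging.info("Company list is empty")
            return "OK"
        for row in company_list:
            company_repository.save(row, original_data_source, actual_data_source)
        return "OK"
    except Exception:
        logging.exception("cron: error occurred loading or saving companies")
        raise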
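As written, run() waits for load() to fetch every page before the first save happens. A minimal sketch of the page-by-page alternative suggested in the comments, assuming a hypothetical fetch_page(page) callable that returns the decoded JSON body (with the "data" and "total_pages" keys the RestClient code reads) and a hypothetical save_rows callable. It uses Python 3 syntax and no App Engine APIs, so it only illustrates the control flow:

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, str]


def iter_pages(fetch_page: Callable[[int], dict]) -> Iterator[List[Row]]:
    """Yield each page's rows as soon as that page is fetched.

    fetch_page is hypothetical: it is assumed to return the decoded JSON body
    with "data" and "total_pages" keys, like the vendor API in the question.
    """
    page = 1
    while True:
        body = fetch_page(page)
        rows = body.get("data") or []
        if not rows:
            return
        yield rows
        if body.get("total_pages") == page:
            return
        page += 1


def run_streaming(fetch_page: Callable[[int], dict],
                  save_rows: Callable[[List[Row]], None]) -> int:
    """Save each page before fetching the next one; return the number of rows saved."""
    saved = 0
    for rows in iter_pages(fetch_page):
        save_rows(rows)  # hypothetical: e.g. build entities and batch-put them
        saved += len(rows)
    return saved
```

This does not reduce the total amount of work in one request; it only lets saving start earlier. Its main use is as a building block for handling one page (or a few) per request or task.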
Repository code:

import logging

# CompanyInfo is the project's ndb.Model subclass (not shown).
def save(dto, org_ds, act_dp):
    try:
        key = 'FIN/%s' % (dto['ticker'])
        company = CompanyInfo(id=key)
        company.stock_code = key
        company.ticker = dto['ticker']
        company.name = dto['name']
        company.original_data_source = org_ds
        company.actual_data_provider = act_dp
        company.put()  # one Datastore RPC per company
        return company
    except Exception:
        logging.exception("company_repository: error occurred saving the company record")
        raise
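save() issues one put() RPC per company, so a long company list means many sequential Datastore round trips. The comments suggest ndb.put_multi instead. A minimal sketch of the batching logic, assuming a put_multi callable shaped like ndb.put_multi (a list of entities in, a list of keys out) and a batch size of 500, chosen on the assumption that it stays within Datastore's per-commit entity limit. The fake put_multi in the usage check is hypothetical, so the logic runs without App Engine (Python 3 syntax):

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
K = TypeVar("K")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def save_all(entities: Iterable[T],
             put_multi: Callable[[List[T]], List[K]],
             batch_size: int = 500) -> List[K]:
    """Write entities with one put_multi call per batch instead of one put() each."""
    keys: List[K] = []
    for batch in chunked(entities, batch_size):
        keys.extend(put_multi(batch))
    return keys
```

In the App Engine code, the caller would build the CompanyInfo objects as save() does (minus the put() call) and pass ndb.put_multi as the put_multi argument.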
RestClient:

import base64
import json
import logging
from datetime import datetime

from google.appengine.api import urlfetch

def load(config, resource, filter):
    try:
        username = config['xxxx']['xxxx']
        password = config['xxxx']['xxxx']
        # Python 2: b64encode accepts a str directly.
        headers = {"Authorization": "Basic %s" % base64.b64encode(username + ":" + password)}
        if filter:
            from_date = filter['from']
            to_date = filter['to']
            ticker = filter['ticker']
            # Convert YYYYMMDD to YYYY-MM-DD for the API.
            start_date = datetime.strptime(from_date, '%Y%m%d').strftime("%Y-%m-%d")
            end_date = datetime.strptime(to_date, '%Y%m%d').strftime("%Y-%m-%d")
        current_page = 1
        data = []
        while True:
            if filter:
                url = config['xxxx']["endpoints"][resource] % (ticker, current_page, start_date, end_date)
            else:
                url = config['xxxx']["endpoints"][resource] % (current_page)
            # Each fetch may block for up to 60 seconds; pages are fetched one after another.
            response = urlfetch.fetch(
                url=url,
                deadline=60,
                method=urlfetch.GET,
                headers=headers,
                follow_redirects=False,
            )
            if response.status_code != 200:
                logging.error("xxxx GET received status code %d!", response.status_code)
                # Don't log the headers: they contain the Basic auth credentials.
                logging.error("error happened for url: %s", url)
                # Raise instead of returning a (message, 500) tuple, which run() would
                # otherwise treat as a non-empty company list.
                raise RuntimeError("xxxx API request failed with status %d" % response.status_code)
            db = json.loads(response.content)
            if not db['data']:
                break
            data.extend(db['data'])
            if db['total_pages'] == current_page:
                break
            current_page += 1
        return data
    except Exception:
        logging.exception("Error occurred with xxxx API request")
        raise
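Each urlfetch.fetch call in the loop may wait up to its 60-second deadline, and pages are fetched sequentially, so in the worst case the request runs for roughly (number of pages) × 60 seconds before anything is saved; with 20 pages that is up to 20 × 60 = 1200 seconds. A minimal sketch of a pagination loop that stops before a time budget runs out and reports the page to resume from, so a follow-up request or task could continue. fetch_page, the 480-second default budget (an arbitrary placeholder, to be set below whatever deadline actually applies to the service) and the injectable clock are all hypothetical; the code is Python 3 and uses no App Engine APIs:

```python
import time
from typing import Callable, List, Optional, Tuple


def load_with_budget(fetch_page: Callable[[int], dict],
                     start_page: int = 1,
                     budget_seconds: float = 480.0,
                     per_fetch_seconds: float = 60.0,
                     clock: Callable[[], float] = time.monotonic,
                     ) -> Tuple[List[dict], Optional[int]]:
    """Fetch pages until done, or until one more fetch might overrun the budget.

    Returns (rows, next_page). next_page is None when every page was read;
    otherwise it is the page a follow-up request should start from.
    per_fetch_seconds mirrors the deadline=60 passed to urlfetch.fetch.
    """
    started = clock()
    rows: List[dict] = []
    page = start_page
    while True:
        # Stop early if a worst-case fetch could push us past the budget.
        if clock() - started + per_fetch_seconds > budget_seconds:
            return rows, page
        body = fetch_page(page)
        data = body.get("data") or []
        if not data:
            return rows, None
        rows.extend(data)
        if body.get("total_pages") == page:
            return rows, None
        page += 1
```

Passing the returned next_page to a new task is one way to follow the "one task per iteration" idea from the comments while keeping each request short.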
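Comments:
- Assuming you are not being blocked or rate-limited as @momus suggested, consider dispatching a task to perform the saves for each iteration of the while loop in the load function. That way you don't have to wait for load to finish before starting the Datastore updates. You could also consider using ndb.put_multi instead of calling put() on each instance.
- Related (yes, I know this is really a different question): stackoverflow.com/questions/45594018/…
- Which scaling type and instance class does the service handling these cron requests use?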
Tags: google-app-engine google-cloud-datastore google-app-engine-python