
【Question title】: Is there a way to run an AWS Glue crawler after a job is finished?
【Posted】: 2018-06-20 09:29:13
【Question】:

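For example, I run an ETL job that may add new fields or columns to the target table. To detect the table changes a crawler has to run, but a crawler can only be run manually or on a schedule.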

Is it possible to trigger the crawler once the job has finished?

【Question comments】:

Tags: amazon-web-services aws-glue

【Solution 1】:

import boto3

# Start the crawler once the job's ETL logic has finished.
glue_client = boto3.client('glue', region_name='us-east-1')  # use the crawler's region
glue_client.start_crawler(Name='name_of_crawler')

Paste this snippet at the end of your job script.
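A slightly more defensive sketch of the same idea, assuming the job script can call boto3 and that you want the job to wait for the crawler: it tolerates a crawler that is already running (`CrawlerRunningException`) and polls `get_crawler` until the crawler is back in the `READY` state. The function name, polling interval and timeout are illustrative choices, not part of the original answer.

```python
import time


def start_crawler_and_wait(glue_client, crawler_name, poll_seconds=30.0,
                           timeout_seconds=3600.0, sleep=time.sleep):
    """Start a Glue crawler, then poll until it is READY again.

    `glue_client` is assumed to be a boto3 Glue client, for example
    boto3.client("glue", region_name="us-east-1"). Returns the Status
    of the crawler's LastCrawl (e.g. "SUCCEEDED"), or None if absent.
    """
    try:
        glue_client.start_crawler(Name=crawler_name)
    except glue_client.exceptions.CrawlerRunningException:
        # A run is already in progress; wait for that one instead of failing.
        pass

    waited = 0.0
    while waited < timeout_seconds:
        # Sleep first so the crawler has time to leave the READY state.
        sleep(poll_seconds)
        waited += poll_seconds
        crawler = glue_client.get_crawler(Name=crawler_name)["Crawler"]
        if crawler["State"] == "READY":
            return crawler.get("LastCrawl", {}).get("Status")
    raise TimeoutError(
        f"crawler {crawler_name!r} still running after {timeout_seconds}s")
```

In a job script this would be called as `start_crawler_and_wait(boto3.client('glue', region_name='us-east-1'), 'name_of_crawler')`. Waiting keeps the job running while the crawler works, so if you only need the crawler to start, the original snippet is enough.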

【Comments】:

This throws a connection timeout error. Is there an alternative, or a fix for this error? ConnectTimeoutError: Connect timeout on endpoint URL: "glue.eu-central-1.amazonaws.com"

【Solution 2】:

You can use a trigger, but not from the Triggers UI :S

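Use a Glue workflow: add a trigger that starts the job, add the job, add a trigger that fires when the job succeeds, and add the crawler as that trigger's target.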
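The workflow steps above can also be set up with boto3. This is a minimal sketch using the Glue API's `create_workflow` and `create_trigger` calls; the trigger names (derived from `workflow_name`) are hypothetical, and the request building is split into a plain function so it can be inspected without touching AWS.

```python
def job_then_crawler_triggers(workflow_name, job_name, crawler_name):
    """Build create_trigger() arguments for a two-step Glue workflow:
    an on-demand trigger that starts the job, then a conditional trigger
    that starts the crawler once the job has SUCCEEDED."""
    start_job = {
        "Name": f"{workflow_name}-start",  # hypothetical naming scheme
        "WorkflowName": workflow_name,
        "Type": "ON_DEMAND",
        "Actions": [{"JobName": job_name}],
    }
    crawl_on_success = {
        "Name": f"{workflow_name}-job-succeeded",  # hypothetical naming scheme
        "WorkflowName": workflow_name,
        "Type": "CONDITIONAL",
        "StartOnCreation": True,  # activate the conditional trigger right away
        "Predicate": {
            "Logical": "ANY",
            "Conditions": [{
                "LogicalOperator": "EQUALS",
                "JobName": job_name,
                "State": "SUCCEEDED",
            }],
        },
        "Actions": [{"CrawlerName": crawler_name}],
    }
    return [start_job, crawl_on_success]


def create_job_then_crawler_workflow(glue_client, workflow_name, job_name,
                                     crawler_name):
    """Create the workflow and its triggers with a boto3 Glue client."""
    glue_client.create_workflow(Name=workflow_name)
    for trigger in job_then_crawler_triggers(workflow_name, job_name,
                                             crawler_name):
        glue_client.create_trigger(**trigger)
    # Run the whole chain later with:
    # glue_client.start_workflow_run(Name=workflow_name)
```

Keeping the job and the crawler inside one workflow means a single `start_workflow_run` drives both steps, and the console shows them as one run.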

Or, using the CLI:

aws glue create-trigger --name myJob-success \
    --type CONDITIONAL \
    --predicate '{"Logical":"ANY","Conditions":[{"JobName":"myJob","LogicalOperator":"EQUALS","State":"SUCCEEDED"}]}' \
    --actions CrawlerName=myCrawler \
    --start-on-creation
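For scripts that already use boto3, the same conditional trigger as the CLI command above can be created with `create_trigger`. This is a sketch: the default trigger name `<job>-success` mirrors `myJob-success` from the command and is otherwise arbitrary.

```python
def create_crawler_on_job_success_trigger(glue_client, job_name, crawler_name,
                                          trigger_name=None):
    """boto3 counterpart of the `aws glue create-trigger` command:
    start `crawler_name` whenever `job_name` finishes with SUCCEEDED."""
    return glue_client.create_trigger(
        Name=trigger_name or f"{job_name}-success",
        Type="CONDITIONAL",
        Predicate={
            "Logical": "ANY",
            "Conditions": [{
                "JobName": job_name,
                "LogicalOperator": "EQUALS",
                "State": "SUCCEEDED",
            }],
        },
        Actions=[{"CrawlerName": crawler_name}],
        StartOnCreation=True,  # same effect as --start-on-creation
    )
```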

Or in CloudFormation:

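Type: AWS::Glue::Trigger
Properties:
  Name: job_success
  Type: CONDITIONAL
  StartOnCreation: true  # activate the trigger, like --start-on-creation in the CLI
  Predicate:
    Logical: ANY
    Conditions:
      - JobName: myJob
        LogicalOperator: EQUALS
        State: SUCCEEDED
  Actions:
    - CrawlerName: myCrawler  # the space after the colon makes this a key/value pair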

【Comments】: