【Title】: AttributeError: 'Namespace' object has no attribute 'project'
【Posted】: 2021-03-11 19:03:46
【Question】:

I'm trying to reuse code copied from https://www.opsguru.io/post/solution-walkthrough-visualizing-daily-cloud-spend-on-gcp-using-gke-dataflow-bigquery-and-grafana. I'm not very familiar with Python, so I'm asking for help here. The goal is to copy GCP BigQuery data into Postgres.

I've made some modifications to the code, and now I'm getting errors, caused either by my changes or by the original code.

This is what I have:

import uuid
import argparse
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions, GoogleCloudOptions, WorkerOptions
from beam_nuggets.io import relational_db
from apache_beam.io.gcp import bigquery


parser = argparse.ArgumentParser()
args = parser.parse_args()
project = args.project("project", help="Enter Project ID")
job_name = args.job_name + str(uuid.uuid4())
bigquery_source = args.bigquery_source
postgresql_user = args.postgresql_user
postgresql_password = args.postgresql_password
postgresql_host = args.postgresql_host
postgresql_port = args.postgresql_port
postgresql_db = args.postgresql_db
postgresql_table = args.postgresql_table
staging_location = args.staging_location
temp_location = args.temp_location
subnetwork = args.subnetwork
 
options = PipelineOptions(
    flags=["--requirements_file", "/opt/python/requirements.txt"])
# For Cloud execution, set the Cloud Platform project, job_name,
# staging location, temp_location and specify DataflowRunner.
 
google_cloud_options = options.view_as(GoogleCloudOptions)
google_cloud_options.project = project
google_cloud_options.job_name = job_name
google_cloud_options.staging_location = staging_location
google_cloud_options.temp_location = temp_location
google_cloud_options.region = "europe-west4"
worker_options = options.view_as(WorkerOptions)
worker_options.zone = "europe-west4-a"
worker_options.subnetwork = subnetwork
worker_options.max_num_workers = 20


 
options.view_as(StandardOptions).runner = 'DataflowRunner'

start_date = define_start_date()
with beam.Pipeline(options=options) as p:
    rows = p | 'QueryTableStdSQL' >> beam.io.Read(beam.io.BigQuerySource(
        query = 'SELECT \
        billing_account_id, \
        service.id as service_id, \
        service.description as service_description, \
        sku.id as sku_id, \
        sku.description as sku_description, \
        usage_start_time, \
        usage_end_time, \
        project.id as project_id, \
        project.name as project_description, \
        TO_JSON_STRING(project.labels) \
        as project_labels, \
        project.ancestry_numbers \
        as project_ancestry_numbers, \
        TO_JSON_STRING(labels) as labels, \
        TO_JSON_STRING(system_labels) as system_labels, \
        location.location as location_location, \
        location.country as location_country, \
        location.region as location_region, \
        location.zone as location_zone, \
        export_time, \
        cost, \
        currency, \
        currency_conversion_rate, \
        usage.amount as usage_amount, \
        usage.unit as usage_unit, \
        usage.amount_in_pricing_units as \
        usage_amount_in_pricing_units, \
        usage.pricing_unit as usage_pricing_unit, \
        TO_JSON_STRING(credits) as credits, \
        invoice.month as invoice_month cost_type \
        FROM `' + project + '.' + bigquery_source + '` \
        WHERE export_time >= "' + start_date + '"', use_standard_sql=True))
    source_config = relational_db.SourceConfiguration(
        drivername='postgresql+pg8000',
        host=postgresql_host,
        port=postgresql_port,
        username=postgresql_user,
        password=postgresql_password,
        database=postgresql_db,
        create_if_missing=True,
    )
    table_config = relational_db.TableConfiguration(
        name=postgresql_table,
        create_if_missing=True
    )
    rows | 'Writing to DB' >> relational_db.Write(
        source_config=source_config,
        table_config=table_config
    )

When I run the program I get the following error:

bq-to-sql.py: error: unrecognized arguments: --project xxxxx --job_name bq-to-sql-job --bigquery_source xxxxxxxx
 --postgresql_user xxxxx --postgresql_password xxxxx --postgresql_host xx.xx.xx.xx --postgresql_port 5432 --postgresql_db xxxx --postgresql_table xxxx --staging_location g
s://xxxxx-staging --temp_location gs://xxxxx-temp --subnetwork regions/europe-west4/subnetworks/xxxx

【Comments】:

Tags: postgresql python-2.7 google-cloud-platform google-cloud-dataflow apache-beam


【Solution 1】:

argparse needs to be configured. It feels like magic, but it still requires setup. Between `parser = argparse.ArgumentParser()` and `args = parser.parse_args()` you need these lines:

parser.add_argument("--project")
parser.add_argument("--job_name")
parser.add_argument("--bigquery_source")
parser.add_argument("--postgresql_user")
parser.add_argument("--postgresql_password")
parser.add_argument("--postgresql_host")
parser.add_argument("--postgresql_port")
parser.add_argument("--postgresql_db")
parser.add_argument("--postgresql_table")
parser.add_argument("--staging_location")
parser.add_argument("--temp_location")
parser.add_argument("--subnetwork")

Argparse is a useful library. I recommend adding plenty of options (help text, types, required flags, and so on) to these add_argument calls.
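For instance, the calls above could be fleshed out like this. This is a sketch: the flag names come from the error message in the question, but the `type`, `default`, `required`, and `help` details are assumptions, not from the original post.

```python
import argparse

# Parser configuration matching the flags the script is invoked with.
parser = argparse.ArgumentParser(description="Copy BigQuery billing data to Postgres")
parser.add_argument("--project", required=True, help="GCP project ID")
parser.add_argument("--job_name", required=True, help="Dataflow job name prefix")
parser.add_argument("--bigquery_source", required=True, help="BigQuery dataset.table")
parser.add_argument("--postgresql_user", help="Postgres user")
parser.add_argument("--postgresql_password", help="Postgres password")
parser.add_argument("--postgresql_host", help="Postgres host")
parser.add_argument("--postgresql_port", type=int, default=5432, help="Postgres port")
parser.add_argument("--postgresql_db", help="Postgres database")
parser.add_argument("--postgresql_table", help="Target table")
parser.add_argument("--staging_location", help="gs:// staging bucket")
parser.add_argument("--temp_location", help="gs:// temp bucket")
parser.add_argument("--subnetwork", help="VPC subnetwork path")

# Parsing a sample command line now yields a Namespace whose attributes
# exist, so args.project no longer raises AttributeError.
args = parser.parse_args(["--project", "my-proj",
                          "--job_name", "bq-to-sql-job",
                          "--bigquery_source", "billing.gcp_billing_export"])
print(args.project, args.postgresql_port)
```

Note that in the question's code, `project = args.project("project", help="Enter Project ID")` tries to *call* the attribute; once the parser is configured, plain attribute access (`project = args.project`) is what's needed.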

【Discussion】:

  • Thanks, that seems to fix the original error, but now there's another one: Traceback (most recent call last): File "/opt/python/bq-to-sql.py", line 56, in <module> start_date = define_start_date('2020-01-01') NameError: name 'define_start_date' is not defined
  • @tripleb The function define_start_date() does indeed appear to be missing from the Python code. Create it, import it, or change the code so it isn't called.
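  • If you choose to create it, a minimal guess at the missing helper (the original blog post doesn't show it) is a function that returns a timestamp string the pipeline can splice into its `WHERE export_time >= "..."` clause. The name of the `days_back` parameter and the exact format are assumptions:

```python
from datetime import datetime, timedelta, timezone

def define_start_date(days_back=1):
    """Return an export_time lower bound as a string that BigQuery can
    compare against a TIMESTAMP column (sketch of the missing helper)."""
    start = datetime.now(timezone.utc) - timedelta(days=days_back)
    return start.strftime("%Y-%m-%d 00:00:00 UTC")
```

`define_start_date()` would then return something like "2021-03-10 00:00:00 UTC", matching how `start_date` is concatenated into the query string.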