【Question Title】: Write to Firestore from inside Google Cloud Dataflow
【Posted】: 2020-05-19 00:12:59
【Question】:

The core problem I'm hitting: when I run my Dataflow pipeline deployed to Google Cloud Dataflow, I get the error:

java.lang.IllegalStateException: FirebaseApp with name [DEFAULT] doesn't exist.

If I run the same pipeline locally, everything works fine. So I suspect either an authentication issue or an environment issue.

The relevant bits of code:

The DEPLOY and REAL variables control whether the job is pushed to the cloud (vs. run locally) and whether it uses my Pub/Sub source or mocked data. Switching between mocked and Pub/Sub data doesn't seem to affect the Firestore situation at all; only deploying vs. not deploying does.

The part of main() where I initialize the Firestore app:

    public class BreakingDataTransactions {

      // When true, this pulls from the specified Pub/Sub topic
      static Boolean REAL = true;
      // When set to true the job gets deployed to Cloud Dataflow
      static Boolean DEPLOY = true;

      public static void main(String[] args) {
        // Validate our env vars
        if (GlobalVars.projectId   == null ||
            GlobalVars.pubsubTopic == null ||
            GlobalVars.gcsBucket   == null ||
            GlobalVars.region      == null) {
          System.out.println("You have to set environment variables for project (BREAKING_PROJECT), pubsub topic (BREAKING_PUBSUB), region (BREAKING_REGION) and Cloud Storage bucket for staging (BREAKING_DATAFLOW_BUCKET) in order to deploy this pipeline.");
          System.exit(1);
        }

        // Initialize our Firestore instance
        try {
          GoogleCredentials credentials = GoogleCredentials.getApplicationDefault();
          System.out.println("*************************");
          System.out.println(credentials);
          FirebaseOptions firebaseOptions =
              new FirebaseOptions.Builder()
                  .setCredentials(credentials)
                  .setProjectId(GlobalVars.projectId)
                  .build();
          FirebaseApp firebaseApp = FirebaseApp.initializeApp(firebaseOptions);
        } catch (IOException e) {
          e.printStackTrace();
        }

        // Start the Dataflow pipeline
        DataflowPipelineOptions options =
            PipelineOptionsFactory.create().as(DataflowPipelineOptions.class);

        options.setProject(GlobalVars.projectId);

        if (DEPLOY) {
          options.setRunner(DataflowRunner.class);
          options.setTempLocation(GlobalVars.gcsBucket);
          options.setRegion(GlobalVars.region);
        }

        Pipeline p = Pipeline.create(options);

And the processing section:

    PCollection<Data> dataCollection =
        jsonStrings
            .apply(ParDo.of(JSONToPOJO.create(Data.class)))
            .setCoder(AvroCoder.of(Data.class));

    PCollection<Result> result =
        dataCollection
            .apply(Window.into(FixedWindows.of(Duration.standardSeconds(1))))
            .apply(WithKeys.of(x -> x.operation + "-" + x.job_id))
            .setCoder(KvCoder.of(StringUtf8Coder.of(), AvroCoder.of(Data.class)))
            .apply(Combine.<String, Data, Result>perKey(new DataAnalysis()))
            .apply(Reify.windowsInValue())
            .apply(MapElements.into(TypeDescriptor.of(Result.class))
                    .<KV<String, ValueInSingleWindow<Result>>>via(
                        x -> {
                          Result r = new Result();
                          String key = x.getKey();
                          r.query_action = key.substring(0, key.indexOf("-"));
                          r.job_id = key.substring(key.indexOf("-") + 1);
                          r.average_latency = x.getValue().getValue().average_latency;
                          r.failure_percent = x.getValue().getValue().failure_percent;
                          r.timestamp = x.getValue().getTimestamp().getMillis();
                          return r;
                        }));

        // this node will (hopefully) actually write out to Firestore
        result.apply(ParDo.of(new FireStoreOutput()));
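A note on the MapElements step above: it rebuilds query_action and job_id by splitting the combined key on the first "-". Because indexOf finds the first dash, a job_id that itself contains dashes survives intact, but this does assume the operation name contains no dash. A minimal sketch of just that splitting logic (hypothetical class name, not from the question):

```java
public class KeySplit {
    // Split "operation-job_id" on the FIRST dash only, mirroring the
    // substring/indexOf logic in the MapElements lambda above.
    static String[] split(String key) {
        int i = key.indexOf("-");
        return new String[] { key.substring(0, i), key.substring(i + 1) };
    }

    public static void main(String[] args) {
        String[] parts = split("read-job-42");
        System.out.println(parts[0]); // read
        System.out.println(parts[1]); // job-42
    }
}
```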

And finally the FireStoreOutput class:

  public static class FireStoreOutput extends DoFn<Result, String> {

    Firestore db;

    @ProcessElement
    public void processElement(@Element Result result) {

      db = FirestoreClient.getFirestore();
      DocumentReference docRef = db.collection("events")
                                   .document("next2020")
                                   .collection("transactions")
                                   .document(result.job_id)
                                   .collection("transactions")
                                   .document();
      //System.out.println(docRef.getId());
      // Build the document data as a map
      Map<String, Object> data = new HashMap<>();
      data.put("failure_percent", result.failure_percent);
      data.put("average_latency", result.average_latency);
      data.put("query_action", result.query_action);
      data.put("timestamp", result.timestamp);

      // Asynchronously write data, then block on the result
      ApiFuture<WriteResult> writeResult = docRef.set(data);
      try {
        writeResult.get();
      } catch (InterruptedException | ExecutionException e) {
        e.printStackTrace();
      }
    }
  }

The error occurs on the line: db = FirestoreClient.getFirestore();

I'm deploying the Dataflow job with the --serviceAccount flag, specifying a service account that has permission to do everything it needs.

So unless GoogleCredentials credentials = GoogleCredentials.getApplicationDefault(); somehow isn't working (but you can see the print statement there, and it does correctly print out the credentials at build time), that's not it.

But that only happens at build time... so I wonder whether I have a persistence problem: it initializes fine at build time, but when the job actually runs in the cloud, the initialization is lost somewhere between deployment and processing. If so, how do I fix that?
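That suspicion matches how Dataflow works: with DataflowRunner, main() executes only in the JVM that submits the job, while DoFns are Java-serialized and deserialized into separate worker JVMs, where the FirebaseApp created in main() never existed. A plain-Java sketch (no Beam or Firebase types; the WorkerFn/client names are hypothetical) of how state set up before serialization does not survive the round trip:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class WorkerFn implements Serializable {
    // A non-serializable client must be transient; it comes back null after
    // deserialization, just as the FirebaseApp from main() is absent on workers.
    transient Object client = null;

    WorkerFn initialize() { client = new Object(); return this; }
}

public class SerializationDemo {
    // Serialize and deserialize, simulating shipping a DoFn to a worker.
    static WorkerFn roundTrip(WorkerFn fn) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(fn);
        oos.flush();
        ObjectInputStream in =
            new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
        return (WorkerFn) in.readObject();
    }

    public static void main(String[] args) throws Exception {
        WorkerFn submitted = new WorkerFn().initialize();
        WorkerFn onWorker = roundTrip(submitted);
        System.out.println(submitted.client != null); // true  (launcher JVM)
        System.out.println(onWorker.client == null);  // true  (worker copy lost it)
    }
}
```

The consequence: any client a DoFn needs must be created (or lazily re-created) on the worker side, inside the DoFn itself, not in main().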

Thanks!

【Comments】:

    Tags: java google-cloud-firestore google-cloud-dataflow apache-beam


    【Solution 1】:

    OK, I found the solution... the biggest issue was that my DAG's PCollection splits into two thread paths. I have two operation types, "read" and "write", and each of those branches sends a PCollection into my FireStoreOutput class, which is where I was trying to initialize the Firestore app, causing the "already initialized" problem.

    However, making my db object a static object with a synchronized getDB() method that initializes it only if it isn't already set fixes that. The final updated relevant code for the FireStoreOutput section:

      public static class FireStoreOutput extends DoFn<Result, String> {

        static Firestore db;

        public static synchronized Firestore getDB() {
          if (db == null) {
            System.out.println("I'm being called");
            // Initialize our Firestore instance
            try {
              GoogleCredentials credentials = GoogleCredentials.getApplicationDefault();
              System.out.println("*************************");
              System.out.println(credentials);
              FirebaseOptions firebaseOptions =
                  new FirebaseOptions.Builder()
                      .setCredentials(credentials)
                      .setProjectId(GlobalVars.projectId)
                      .build();
              FirebaseApp firebaseApp = FirebaseApp.initializeApp(firebaseOptions);
            } catch (IOException e) {
              e.printStackTrace();
            }
            db = FirestoreClient.getFirestore();
          }
          return db;
        }

        @ProcessElement
        public void processElement(@Element Result result) {
          DocumentReference docRef = getDB().collection("events")
                                            .document("next2020")
                                            .collection("transactions")
                                            .document(result.job_id)
                                            .collection("transactions")
                                            .document();
          // ... the rest of processElement is unchanged from the question
        }
      }

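The fix boils down to a lazily initialized, synchronized static singleton: each worker JVM builds the client the first time any DoFn instance needs it, and every later call reuses it. The same pattern in isolation (hypothetical Client class standing in for Firestore; the construction counter is only there to make the single-initialization behavior visible):

```java
public class LazyClientHolder {
    // Counts constructions so we can verify the client is built only once.
    static int constructions = 0;

    static class Client {
        Client() { constructions++; }
    }

    private static Client client;

    // Same shape as getDB(): synchronized lazy initialization, safe when
    // multiple DoFn instances on one worker race to create the client.
    public static synchronized Client getClient() {
        if (client == null) {
            client = new Client();
        }
        return client;
    }

    public static void main(String[] args) {
        Client a = getClient();
        Client b = getClient();
        System.out.println(a == b);        // true
        System.out.println(constructions); // 1
    }
}
```

A more Beam-idiomatic alternative is to perform this initialization in a @Setup method on the DoFn, which Beam calls once per DoFn instance on the worker before any bundles are processed.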
    【Discussion】:
