How to run a Spring Boot Java application in Azure Databricks

【Question title】: How to run a Spring Boot Java application in Azure Databricks
【Posted】: 2021-05-12 12:54:27
【Question description】:

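I am looking for guidance on how to run a Spring Boot based Java application in Azure Databricks.

I am used to running Spring Boot based Java applications in Apache Spark on premises. Spring Boot applications will not run as-is in Apache Spark; the technique that works for me to get them running is:

Use the copy-rename-maven-plugin to rename the original jar file produced by the spring-boot-maven-plugin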

      <plugin>
        <groupId>com.coderplus.maven.plugins</groupId>
        <artifactId>copy-rename-maven-plugin</artifactId>
        <version>1.0.1</version>
        <executions>
          <execution>
            <id>rename-file</id>
            <phase>package</phase>
            <goals>
              <goal>rename</goal>
            </goals>
            <configuration>
              <sourceFile>target/${project.name}-${project.version}.jar.original</sourceFile>
              <destinationFile>target/${project.name}-${project.version}-original.jar</destinationFile>
            </configuration>
          </execution>
        </executions>
      </plugin>

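Have a configuration item for the original jar file name and location, set to where the original jar file is installed
Pass the original file name and location to the Spark session, using "-Doriginal.jar-file" in the spark-submit command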

sparkConf.setJars(new String[]{props.getJarFile()});

Run using spark-submit

spark-submit --master yarn --deploy-mode client --conf "spark.driver.extraJavaOptions=-Dspring.profiles.active=dev" SparkPiBoot-0.0.1.jar
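The jar-passing step above could be sketched roughly as follows. This is a minimal sketch using only the JDK: the class name `OriginalJarLocator` and the method `jarsFor` are hypothetical, and only the property name `original.jar-file` comes from the question. It assumes the property reaches the driver JVM as a system property, for example by adding `-Doriginal.jar-file=...` to the `spark.driver.extraJavaOptions` value.

```java
import java.util.Properties;

/**
 * Hypothetical helper: resolves the path of the original (non-repackaged) jar
 * from the original.jar-file system property mentioned in the question.
 */
final class OriginalJarLocator {

    // Property name taken from the question; everything else is illustrative.
    static final String PROPERTY = "original.jar-file";

    private OriginalJarLocator() {
    }

    /** Returns a single-element array in the shape accepted by SparkConf.setJars(String[]). */
    static String[] jarsFor(Properties props) {
        String path = props.getProperty(PROPERTY);
        if (path == null || path.trim().isEmpty()) {
            // Fail early with a readable message instead of a class loading error on the executors.
            throw new IllegalStateException("Missing system property -D" + PROPERTY);
        }
        return new String[] {path.trim()};
    }
}
```

In the driver this would be used as `sparkConf.setJars(OriginalJarLocator.jarsFor(System.getProperties()));`, which plays the same role as the `props.getJarFile()` call shown above.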

The technique is outlined at https://radanalytics.io/assets/my-first-radanalytics-app/sparkpi-java-spring.html, where it is used with OpenShift builds.

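With this technique, my driver application runs in on-premises Apache Spark as a full-fledged Spring Boot application with dependency injection and so on. The DataFrame code runs in Apache Spark and is taken from the original, unprocessed jar file, so it has no dependency injection.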

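In Azure Databricks I intend to schedule Databricks jobs to run from an Azure Data Factory pipeline, so I tried the same technique: I installed the 2 jar files in the Azure Databricks file system (DBFS) and created an Azure Data Factory Databricks activity with a user property pointing to the original jar file in DBFS.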

This results in an error and the application does not run:

command--1:1: error: Class org.springframework.boot.CommandLineRunner not found - continuing with a stub.
io.radanalytics.SparkPiBootApplication.main(Array())
^

The main jar file is a Spring Boot uber jar that contains all dependency jars, including spring-boot-1.5.2.RELEASE.jar, which in turn contains org.springframework.boot.CommandLineRunner.

【Question comments】:

Tags: spring-boot databricks azure-databricks

【Solution 1】:

I received the following reply from Microsoft support: the Databricks class loader does not support loading classes from a Spring Boot uber jar.
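To see why this might happen, it can help to look at the entry layout of the jar. The sketch below assumes the uber jar uses the usual Spring Boot 1.4+ executable layout, with application classes under `BOOT-INF/classes/` and dependency jars nested under `BOOT-INF/lib/`. The class `UberJarLayout` and its methods are hypothetical, and it says nothing about how the Databricks class loader works internally; it only checks whether a class file sits at the jar root, where a class loader that treats the file as an ordinary jar would look.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

/** Hypothetical inspector for the entry layout of a jar file. */
final class UberJarLayout {

    private UberJarLayout() {
    }

    /** Lists all entry names of the jar at the given path. */
    static List<String> entryNames(String jarPath) {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream().map(JarEntry::getName).collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Dependency jars nested under BOOT-INF/lib/ (assumed Spring Boot layout). */
    static List<String> nestedLibraries(Collection<String> entryNames) {
        List<String> nested = new ArrayList<>();
        for (String name : entryNames) {
            if (name.startsWith("BOOT-INF/lib/") && name.endsWith(".jar")) {
                nested.add(name);
            }
        }
        return nested;
    }

    /** True if the class file is stored directly at the jar root. */
    static boolean isVisibleAtRoot(Collection<String> entryNames, String className) {
        return entryNames.contains(className.replace('.', '/') + ".class");
    }
}
```

Calling `UberJarLayout.isVisibleAtRoot(UberJarLayout.entryNames(path), "org.springframework.boot.CommandLineRunner")` on the repackaged jar and on a shaded jar shows the difference: a shaded jar unpacks dependency classes to the root, while the repackaged jar keeps them inside nested jars.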

There is a workaround, which involves:

Replacing the default Spring jars installed in the Databricks cluster with the required versions, using these instructions: https://docs.microsoft.com/en-us/azure/databricks/kb/libraries/replace-default-jar-new-jar
Building the jar with the maven-shade-plugin

<build>
   <plugins>
       <plugin>
           <groupId>org.apache.maven.plugins</groupId>
           <artifactId>maven-shade-plugin</artifactId>
           <dependencies>
               <dependency>
                   <groupId>org.springframework.boot</groupId>
                   <artifactId>spring-boot-maven-plugin</artifactId>
                   <version>1.5.2.RELEASE</version>
               </dependency>
           </dependencies>
           <configuration>
               <keepDependenciesWithProvidedScope>false</keepDependenciesWithProvidedScope>
               <createDependencyReducedPom>false</createDependencyReducedPom>
               <filters>
                   <filter>
                       <artifact>*:*</artifact>
                       <excludes>
                           <exclude>META-INF/*.SF</exclude>
                           <exclude>META-INF/*.DSA</exclude>
                           <exclude>META-INF/*.RSA</exclude>
                       </excludes>
                   </filter>
               </filters>
               <transformers>
                   <transformer
                           implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                       <resource>META-INF/spring.handlers</resource>
                   </transformer>
                   <transformer
                           implementation="org.springframework.boot.maven.PropertiesMergingResourceTransformer">
                       <resource>META-INF/spring.factories</resource>
                   </transformer>
                   <transformer
                           implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                       <resource>META-INF/spring.schemas</resource>
                   </transformer>
                   <transformer
                           implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer" />
                   <transformer
                           implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                       <mainClass>${main-class}</mainClass>
                   </transformer>
               </transformers>
           </configuration>
           <executions>
               <execution>
                   <phase>package</phase>
                   <goals>
                       <goal>shade</goal>
                   </goals>
               </execution>
           </executions>
       </plugin>
   </plugins>
</build>

【Comments】:

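I persisted with this approach. Besides upgrading the Spring Framework version in the cluster, I also had to resolve the following issues:

1) Upgrade the SnakeYAML version in the cluster.
2) Disable the logging system in the application with org.springframework.boot.logging.LoggingSystem=none, and disable Gson auto-configuration: @SpringBootApplication( exclude = { org.springframework.boot.autoconfigure.gson.GsonAutoConfiguration.class } )
3) Remove the log4j2 dependencies from the shaded jar and use the log4j available in the cluster.
...
5) Use the slf4j API in my code, so that it is independent of log4j and log4j2.
6) Remove the log4j and log4j2 configuration files and rely on the defaults.
7) AOP annotations cannot be used to monitor code that runs SQL queries, because they interfere with the Spark session; use an intermediate class for monitoring instead, which then calls the query code.
8) @Component classes need to be public; they may or may not need to be serializable.
9) A cluster is needed per application, because applications are deployed in the cluster as libraries and are all added to the common classpath.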
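Point 7 above could look roughly like the following. This is only a sketch of the general idea, not the commenter's actual code: `QueryMonitor` and its methods are hypothetical names, and the timing logic stands in for whatever the AOP aspect was recording. The wrapper is called explicitly around the query code instead of being woven in through annotations.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for an AOP aspect: times a query call explicitly. */
final class QueryMonitor {

    // Duration of the most recent monitored call, or -1 if none has run yet.
    private long lastDurationNanos = -1L;

    /** Runs the query code, records how long it took, and returns its result. */
    <T> T monitor(String label, Supplier<T> query) {
        long start = System.nanoTime();
        try {
            return query.get();
        } finally {
            // Recorded even when the query throws, so failures are timed as well.
            lastDurationNanos = System.nanoTime() - start;
            System.out.println(label + " took " + (lastDurationNanos / 1_000_000) + " ms");
        }
    }

    long lastDurationNanos() {
        return lastDurationNanos;
    }
}
```

A caller would wrap its query in a lambda, for example `monitor.monitor("rowCount", () -> repository.countRows())`, where `repository.countRows()` is a placeholder for the code that runs the SQL query against the Spark session.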