How to save uuid from pyspark dataframe to postgres? Answers

【Question title】: How to save uuid from pyspark dataframe to postgres?
【Posted】: 2021-02-04 15:30:36
【Question description】:

I get the error below while saving a uuid column to PostgreSQL:

at org.postgresql.jdbc.PgStatement$BatchResultHandler.handleError(PgStatement.java:2356)
at org.postgresql.core.v3.QueryExecutorImpl$1.handleError(QueryExecutorImpl.java:395)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1912)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:338)
at org.postgresql.jdbc.PgStatement.executeBatch(PgStatement.java:2534)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:676)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:838)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:838)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:980)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:980)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Caused by: org.postgresql.util.PSQLException: ERROR: column "id" is of type uuid but expression is of type character varying
Hint: You will need to rewrite or cast the expression.
Position: 276
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2182)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1911)
... 17 more

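【Comments】:

Having the same problem :/ However, I get a different error message: Caused by: org.postgresql.util.PSQLException: ERROR: column "id" is of type uuid but expression is of type character varying
@user1809802 Did you find a solution other than changing the data type in postgres?

Tags: postgresql pyspark uuid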

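【Solution 1】:

Sadly, Spark seems to implicitly convert the uuid type to character varying when the data is loaded into a dataframe. I am still looking for the best way to handle this, but so far, using a trigger to cast the field back to uuid seems to be the safest way to get the uuid where it belongs. Note that this approach adds overhead and should be benchmarked to see whether it is worth it. I would suggest dropping uuid in favor of integer ids, because the uuid type has some performance issues and is not fully compatible across databases.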
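As an alternative to the trigger approach (and not part of the answer above), the error's hint about casting can sometimes be handled at the driver level. The PostgreSQL JDBC driver has a `stringtype` connection parameter. When it is set to `unspecified`, string parameters are sent untyped and the server tries to infer the column type, which should let a string column land in a `uuid` column. The following is a minimal sketch under stated assumptions: the helper names `build_postgres_jdbc_url` and `write_with_uuid_column`, the host, the database name, and the table name are all hypothetical, and the behavior should be checked against your driver and Spark versions.

```python
from urllib.parse import urlencode


def build_postgres_jdbc_url(host, port, database, **params):
    """Build a PostgreSQL JDBC URL; keyword arguments become URL parameters.

    Hypothetical helper for illustration only.
    """
    base = f"jdbc:postgresql://{host}:{port}/{database}"
    return f"{base}?{urlencode(params)}" if params else base


def write_with_uuid_column(df, table, url, user, password):
    """Append a Spark DataFrame to `table` over JDBC.

    `df` is assumed to hold the uuid values as strings; the cast to uuid
    is left to the PostgreSQL server via stringtype=unspecified in `url`.
    """
    (
        df.write.format("jdbc")
        .option("url", url)
        .option("dbtable", table)
        .option("user", user)
        .option("password", password)
        .option("driver", "org.postgresql.Driver")
        .mode("append")
        .save()
    )


# Assumed connection details; replace with your own.
url = build_postgres_jdbc_url(
    "localhost", 5432, "mydb", stringtype="unspecified"
)
print(url)
```

With a real dataframe, the call would look like `write_with_uuid_column(df, "public.my_table", url, "user", "secret")`. Because `stringtype=unspecified` applies to every string parameter on the connection, it may be worth testing on tables that mix `uuid` and ordinary text columns before relying on it.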

【Discussion】: