【Title】: snowflake connector errors ProgrammingError 100078 (22000)
【Posted】: 2020-01-24 05:55:40
【Description】:

When I try to load data from S3 into Snowflake using a Python script, I get the following error:

String '$METADATA$FILENAME' is too long and would be truncated
  File '#######', line 1, character 1
  Row 1, column $METADATA$FILENAME

I am trying to store the original filename in the table. For that I am using the $METADATA$FILENAME keyword. In the table, this column is defined with the full-length VARCHAR(16777216) data type. Is there any way to solve this?
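For reference, Snowflake's stage metadata column is spelled METADATA$FILENAME, with no leading `$`, and is selected alongside the positional data columns (`$1`, `$2`, ...) inside a COPY INTO. A minimal sketch of building such a statement, with hypothetical table and stage names:

```python
# Hypothetical names -- substitute your own table and external stage.
table = "my_table"
stage = "@my_stage"

# The filename pseudo-column is METADATA$FILENAME (no leading '$');
# data columns in a staged file are addressed as $1, $2, ...
copy_sql = f"""
COPY INTO {table} (filename, col1, col2)
FROM (
    SELECT METADATA$FILENAME, t.$1, t.$2
    FROM {stage} t
)
FILE_FORMAT = (TYPE = CSV)
"""
print(copy_sql)
```

The statement string would then be executed through whichever Snowflake client the script uses.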

【Discussion】:

  • Could you please show the code you are using for this, and provide an example of one of the filenames?
  • An example would help, as well as what you have tested so far. Is your filename longer than 16,777,216 characters, and/or is the column value larger than 16 MB?
  • Sorry, I can't share the code. @Suzy Lockwood Yes, my column value is larger than 16 MB, and it does not exceed 16,777,216.
  • @YohanNeranga Snowflake's VARCHAR has a maximum length of 16 MB (uncompressed), as described in the docs: docs.snowflake.net/manuals/sql-reference/…
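Since the 16 MB cap applies to the uncompressed value regardless of the declared VARCHAR length, a value over that limit cannot be stored and must be shortened before (or during) the load. A connector-agnostic sketch of pre-truncating a string to the byte limit in Python (the constant is from the Snowflake docs; the helper name is ours):

```python
SNOWFLAKE_VARCHAR_MAX_BYTES = 16_777_216  # 16 MB uncompressed, per Snowflake docs

def truncate_to_varchar_limit(value: str,
                              limit: int = SNOWFLAKE_VARCHAR_MAX_BYTES) -> str:
    """Clip a string so its UTF-8 encoding fits within `limit` bytes,
    without splitting a multi-byte character at the cut point."""
    encoded = value.encode("utf-8")
    if len(encoded) <= limit:
        return value
    # Dropping any partial trailing character left by the byte-level cut.
    return encoded[:limit].decode("utf-8", errors="ignore")

# A value just over the limit gets clipped; one under it is untouched.
big = "x" * (SNOWFLAKE_VARCHAR_MAX_BYTES + 100)
assert len(truncate_to_varchar_limit(big).encode("utf-8")) == SNOWFLAKE_VARCHAR_MAX_BYTES
```

The same effect can be had server-side with SUBSTR in the COPY's SELECT list, which avoids shipping the oversized value at all.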

标签: python-3.x snowflake-cloud-data-platform


【Solution 1】:
Here is the Python script. Next time, please share your code too.

-------------------------------------------------------------------------------

#!/usr/bin/env python
# coding=utf-8
import logging
from logging import getLogger

from pyspark.sql import SparkSession

v_log = "load_to_snowflake.log"  # Fill your values here (log file path)

spark = (SparkSession.builder
         .appName("my_app")
         .config("spark.sql.codegen.wholeStage", False)
         .getOrCreate())

# Point the s3n:// scheme at the native S3 filesystem and supply credentials.
sc = spark.sparkContext
hadoop_conf = sc._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
hadoop_conf.set("fs.s3n.awsAccessKeyId", "")      # Fill your values here
hadoop_conf.set("fs.s3n.awsSecretAccessKey", "")  # Fill your values here

logging.basicConfig(
        filename=v_log,
        level=logging.DEBUG)
logger = getLogger(__name__)

# Connection options for the Spark-Snowflake connector.
sfOptions = {
    "sfURL": "sfcsupport.snowflakecomputing.com",
    "sfAccount": "",   # Fill your values here
    "sfUser": "",      # Fill your values here
    "sfPassword": "",  # Fill your values here
    "sfDatabase": "",  # Fill your values here
    "sfSchema": "PUBLIC",
    "sfWarehouse": "", # Fill your values here
    "sfRole": "",      # Fill your values here
    "parallelism": "64",
    "awsAccessKey": hadoop_conf.get("fs.s3n.awsAccessKeyId"),
    "awsSecretKey": hadoop_conf.get("fs.s3n.awsSecretAccessKey"),
    "tempdir": "s3n://<pathtofile>"  # Fill your values here
}

SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

# Read the CSV from S3.
df = spark.read.option("delimiter", ",").csv(
    "s3n://<pathtofile>", header=False)  # Fill your values here
df.show()

# Write the DataFrame to Snowflake through the connector configured above.
(df.write
   .format(SNOWFLAKE_SOURCE_NAME)
   .options(**sfOptions)
   .option("dbtable", "")  # Fill your values here (target table)
   .mode("append")
   .save())
----------------------------------------------------------------------------

【Discussion】:
