【Posted】: 2018-08-02 10:42:26
【Question】:
I'm using the DESCRIBE keyword to get column information about a temporary view. It's a useful approach, but I have a table where I only want to describe a subset of the columns. I've been trying to combine LIMIT with DESCRIBE to do this, but can't figure it out.
Here is a toy dataset (created with pyspark):
# make some test data
columns = ['id', 'dogs', 'cats', 'horses', 'people']
vals = [
    (1, 2, 0, 4, 3),
    (2, 0, 1, 2, 4)
]
# create DataFrame
df = spark.createDataFrame(vals, columns)
df.createOrReplaceTempView('df')
Now describe it with SQL:
%%sql
DESCRIBE df
Output:
col_name data_type
id bigint
dogs bigint
cats bigint
horses bigint
people bigint
In reality I have many more columns than this, and what I'd like to do is LIMIT the output of this query. Here are a couple of things I've tried:
Attempt #1:
DESCRIBE df
LIMIT 3
Error:
An error was encountered:
"\nextraneous input '3' expecting {<EOF>, '.'}(line 3, pos 6)\n\n== SQL ==\n\nDESCRIBE df\nLIMIT 3 \n------^^^\n"
Traceback (most recent call last):
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 603, in sql
return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 73, in deco
raise ParseException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.ParseException: "\nextraneous input '3' expecting {<EOF>, '.'}(line 3, pos 6)\n\n== SQL ==\n\nDESCRIBE df\nLIMIT 3 \n------^^^\n"
Attempt #2:
SELECT a.*
FROM (
DESCRIBE df
) AS a
LIMIT 3
Error:
An error was encountered:
'Table or view not found: DESCRIBE; line 4 pos 4'
Traceback (most recent call last):
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 603, in sql
return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 69, in deco
raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: 'Table or view not found: DESCRIBE; line 4 pos 4'
Does anyone know whether it's possible to limit the output of DESCRIBE?
【Discussion】:
Tags: sql pyspark apache-spark-sql pyspark-sql sql-limit