【Posted】: 2016-02-24 13:40:14
【Question】:
I have a Spark cluster on AWS EMR and am trying to start the following code with the thrift server:
...
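// Reuse the running SparkContext (or create one) and wrap it for Java and Hive.
JavaSparkContext jsc = new JavaSparkContext(SparkContext.getOrCreate());
HiveContext hiveContext = new HiveContext(jsc);
// Parse each line of people.txt into a Person bean.
JavaRDD<Person> people = jsc.textFile("people.txt").map(
    new Function<String, Person>() {
        public Person call(String line) throws Exception {
            ...
        }
    });
DataFrame schemaPeople = hiveContext.createDataFrame(people, Person.class);
// Temporary table, registered only in this HiveContext.
schemaPeople.registerTempTable("people_temp");
// Persistent table, saved through the Hive metastore.
schemaPeople.saveAsTable("people");
// Expose this HiveContext over JDBC.
HiveThriftServer2.startWithContext(hiveContext);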
...
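The Person class is not shown in the question. createDataFrame(people, Person.class) infers the schema from JavaBean getters, so the class has to be a public, serializable bean. A minimal sketch, assuming each line of people.txt holds a name and an age (the field names name and age are hypothetical, not taken from the question):

```java
import java.io.Serializable;

// Hypothetical bean for illustration; the real Person class is not shown in the question.
// Spark derives one DataFrame column per getter/setter pair.
public class Person implements Serializable {
    private String name; // assumed field
    private int age;     // assumed field

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }
}
```

With a bean like this, the resulting tables would have the columns age and name.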
I run this code with the following command:
sudo ./sbin/start-thriftserver.sh --jars /home/ec2-user/some.jar --class spark.jobs.thrift.ThriftServerInit
After the thrift server starts, I connect to it with beeline:
!connect jdbc:hive2://localhost:10001, then run show tables; and get this result:
+--------------+--------------+--+
| tableName | isTemporary |
+--------------+--------------+--+
| people | false |
+--------------+--------------+--+
I also expected to see the temporary table people_temp. Why is people_temp missing?
【Discussion】:
Tags: java amazon-web-services jdbc apache-spark amazon-emr