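【Posted】: 2021-11-08 02:27:40
【Question】:
I am trying to export data from HDFS to a MySQL database. I have found various solutions, but none of them worked; I even tried removing the WINDOWS-1251 characters from the file.
As a brief summary: I am doing this in VirtualBox with the Hortonworks sandbox image.
My Hive table, in the default database: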
CREATE EXTERNAL TABLE `airqualitydata`(
`sensor_id` VARCHAR(100),
`sensor_type` VARCHAR(100),
`location` VARCHAR(100),
`lat` VARCHAR(100),
`lon` VARCHAR(100),
`timestamp` timestamp,
`p1` VARCHAR(100),
`durp1` VARCHAR(100),
`ratiop1` VARCHAR(100),
`p2` VARCHAR(100),
`durp2` VARCHAR(100),
`ratiop2` VARCHAR(100))
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\073'
LOCATION 'hdfs://sandbox-hdp.hortonworks.com:8020/hadoop/airqualitydata'
TBLPROPERTIES ("skip.header.line.count"="1");
The file stored in /hadoop/airqualitydata on HDFS (with the Windows-1251 characters removed, just to be sure).
Note that this data can be viewed in Hive by running SELECT * FROM airqualitydata.
sensor_id;sensor_type;location;lat;lon;timestamp;P1;durP1;ratioP1;P2;durP2;ratioP2
9710;SDS011;4894;43.226;27.934;2021-09-09T00:00:12;70;;;20;;
9710;SDS011;4894;43.226;27.934;2021-09-09T00:02:41;83;;;0.93;;
9710;SDS011;4894;43.226;27.934;2021-09-09T00:05:14;0.80;;;0.73;;
9710;SDS011;4894;43.226;27.934;2021-09-09T00:07:42;0.50;;;0.50;;
9710;SDS011;4894;43.226;27.934;2021-09-09T00:10:10;57;;;0.80;;
9710;SDS011;4894;43.226;27.934;2021-09-09T00:12:39;0.40;;;0.40;;
9710;SDS011;4894;43.226;27.934;2021-09-09T00:15:07;0.70;;;0.70;;
9710;SDS011;4894;43.226;27.934;2021-09-09T00:17:35;2;;;0.47;;
9710;SDS011;4894;43.226;27.934;2021-09-09T00:20:04;90;;;0.63;;
9710;SDS011;4894;43.226;27.934;2021-09-09T00:22:34;0.57;;;0.57;;
9710;SDS011;4894;43.226;27.934;2021-09-09T00:25:01;0.73;;;0.60;;
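To make the table definition and the file layout above concrete, here is a minimal Python sketch of how the file is meant to be read: `'\073'` is an octal escape for `;`, and `skip.header.line.count` set to 1 drops the first line. The helper names (`read_rows`, `normalize_timestamp`) are hypothetical and only for illustration. The sketch also rewrites the timestamp, because the `T` separator in the file differs from the `yyyy-MM-dd HH:mm:ss` layout that the Hive documentation gives for TIMESTAMP values in text files; whether that matters for this export is an assumption, not something confirmed here.

```python
import csv
import io
from datetime import datetime

# '\073' in the Hive DDL is an octal escape: 0o73 == 59 == ord(';').
DELIMITER = chr(0o73)

# Two lines copied from the sample file above (header plus first data row).
SAMPLE = (
    "sensor_id;sensor_type;location;lat;lon;timestamp;P1;durP1;ratioP1;P2;durP2;ratioP2\n"
    "9710;SDS011;4894;43.226;27.934;2021-09-09T00:00:12;70;;;20;;\n"
)


def read_rows(text, skip_header_lines=1):
    """Split semicolon-delimited text into rows, dropping leading header lines."""
    reader = csv.reader(io.StringIO(text), delimiter=DELIMITER)
    rows = list(reader)
    return rows[skip_header_lines:]


def normalize_timestamp(value):
    """Rewrite an ISO-8601 'T' timestamp as 'yyyy-MM-dd HH:mm:ss'."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")
    return parsed.strftime("%Y-%m-%d %H:%M:%S")


rows = read_rows(SAMPLE)
first = rows[0]
# Column 5 is the timestamp; print it before and after normalization.
print(len(first), first[5], normalize_timestamp(first[5]))
```

Empty fields such as `durP1` come through as empty strings, which matches the blank cells in the data.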
The MySQL database and table:
CREATE DATABASE airquality CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
CREATE TABLE `airqualitydata`(
`sensor_id` VARCHAR(100),
`sensor_type` VARCHAR(100),
`location` VARCHAR(100),
`lat` VARCHAR(100),
`lon` VARCHAR(100),
`timestamp` timestamp,
`p1` VARCHAR(100),
`durp1` VARCHAR(100),
`ratiop1` VARCHAR(100),
`p2` VARCHAR(100),
`durp2` VARCHAR(100),
`ratiop2` VARCHAR(100)
);
The Sqoop CLI call:
sqoop export --connect "jdbc:mysql://localhost:3306/airquality?useUnicode=true&characterEncoding=WINDOWS-1251" --username root --password hortonworks1 --export-dir hdfs://sandbox-hdp.hortonworks.com:8020/hadoop/airqualitydata --table airqualitydata --input-fields-terminated-by "\073" --input-lines-terminated-by "\n" -m 1
I also tried removing ?useUnicode=true&characterEncoding=WINDOWS-1251 from the connection string, but that did not help.
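For comparing connection-string variants, a small sketch that assembles the JDBC URL from its parts may help. The function name `jdbc_mysql_url` is hypothetical. The parameter names `useUnicode`, `characterEncoding` and `useSSL` all appear in this post (the last one in the SSL warning in the output); choosing `UTF-8` as the encoding value is an assumption based on the database being created as utf8mb4, not a confirmed fix.

```python
from urllib.parse import urlencode


def jdbc_mysql_url(host, port, database, **params):
    """Assemble a jdbc:mysql:// URL, adding a query string only if params are given."""
    base = f"jdbc:mysql://{host}:{port}/{database}"
    return f"{base}?{urlencode(params)}" if params else base


# Variant without any connection properties.
plain = jdbc_mysql_url("localhost", 3306, "airquality")

# Variant with explicit properties (values here are assumptions to experiment with).
tuned = jdbc_mysql_url(
    "localhost",
    3306,
    "airquality",
    useUnicode="true",
    characterEncoding="UTF-8",
    useSSL="false",
)

print(plain)
print(tuned)
```

Because the URL contains `&`, it has to stay quoted on the shell command line, as in the call above.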
I also cannot open the logs at the URL printed in the terminal, so this failure output is all I have:
Warning: /usr/hdp/2.6.5.0-292/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
21/09/12 04:04:40 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.6.5.0-292
21/09/12 04:04:40 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
21/09/12 04:04:40 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
21/09/12 04:04:40 INFO tool.CodeGenTool: Beginning code generation
Sun Sep 12 04:04:40 UTC 2021 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
21/09/12 04:04:40 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `airqualitydata` AS t LIMIT 1
21/09/12 04:04:40 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `airqualitydata` AS t LIMIT 1
21/09/12 04:04:40 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.6.5.0-292/hadoop-mapreduce
Note: /tmp/sqoop-raj_ops/compile/41fba9933b913b974b70403656a13287/airqualitydata.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
21/09/12 04:04:42 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-raj_ops/compile/41fba9933b913b974b70403656a13287/airqualitydata.jar
21/09/12 04:04:42 INFO mapreduce.ExportJobBase: Beginning export of airqualitydata
21/09/12 04:04:43 INFO client.RMProxy: Connecting to ResourceManager at sandbox-hdp.hortonworks.com/172.18.0.2:8032
21/09/12 04:04:43 INFO client.AHSProxy: Connecting to Application History server at sandbox-hdp.hortonworks.com/172.18.0.2:10200
21/09/12 04:04:50 INFO input.FileInputFormat: Total input paths to process : 1
21/09/12 04:04:50 INFO input.FileInputFormat: Total input paths to process : 1
21/09/12 04:04:50 INFO mapreduce.JobSubmitter: number of splits:1
21/09/12 04:04:51 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1631399426919_0028
21/09/12 04:04:51 INFO impl.YarnClientImpl: Submitted application application_1631399426919_0028
21/09/12 04:04:51 INFO mapreduce.Job: The url to track the job: http://sandbox-hdp.hortonworks.com:8088/proxy/application_1631399426919_0028/
21/09/12 04:04:51 INFO mapreduce.Job: Running job: job_1631399426919_0028
21/09/12 04:04:59 INFO mapreduce.Job: Job job_1631399426919_0028 running in uber mode : false
21/09/12 04:04:59 INFO mapreduce.Job: map 0% reduce 0%
21/09/12 04:05:03 INFO mapreduce.Job: map 100% reduce 0%
21/09/12 04:05:04 INFO mapreduce.Job: Job job_1631399426919_0028 failed with state FAILED due to: Task failed task_1631399426919_0028_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
21/09/12 04:05:04 INFO mapreduce.Job: Counters: 8
Job Counters
Failed map tasks=1
Launched map tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=2840
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=2840
Total vcore-milliseconds taken by all map tasks=2840
Total megabyte-milliseconds taken by all map tasks=710000
21/09/12 04:05:04 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
21/09/12 04:05:04 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 21.2627 seconds (0 bytes/sec)
21/09/12 04:05:04 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
21/09/12 04:05:04 INFO mapreduce.ExportJobBase: Exported 0 records.
21/09/12 04:05:04 ERROR mapreduce.ExportJobBase: Export job failed!
21/09/12 04:05:04 ERROR tool.ExportTool: Error during export: Export job failed!
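The output above only says that the map task failed; the actual cause is normally in the task's container log. When the tracking URL is not reachable, those logs can usually be pulled with the `yarn logs` command instead. The sketch below derives the application ID from the job ID and builds that command. It assumes that log aggregation is enabled on the sandbox, that the `yarn` CLI is on the PATH, and that it is run as the user who submitted the job; the function names are hypothetical.

```python
import re
import subprocess


def application_id_from_job_id(job_id):
    """Map a MapReduce job ID to its YARN application ID (same numeric parts)."""
    match = re.fullmatch(r"job_(\d+)_(\d+)", job_id)
    if match is None:
        raise ValueError(f"unexpected job ID: {job_id!r}")
    return f"application_{match.group(1)}_{match.group(2)}"


def yarn_logs_command(job_id):
    """Build the argv for `yarn logs` without running it."""
    return ["yarn", "logs", "-applicationId", application_id_from_job_id(job_id)]


def fetch_logs(job_id):
    """Run the command on a cluster node and return its combined text output."""
    result = subprocess.run(
        yarn_logs_command(job_id), capture_output=True, text=True, check=False
    )
    return result.stdout + result.stderr


# Only prints the command; call fetch_logs(...) on the sandbox to execute it.
print(" ".join(yarn_logs_command("job_1631399426919_0028")))
```

Searching the returned text for `Caused by` or `Exception` is usually the quickest way to find the failing record or column.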
Any pointers would be appreciated, thanks!
Edit #1: Following the comments, using:
sqoop export --connect jdbc:mysql://localhost:3306/airquality --table airqualitydata --username root --password hortonworks1 --hcatalog-database default --hcatalog-table airqualitydata --verbose
or, in general form (for anyone reproducing this):
sqoop export --connect jdbc:mysql://<host:port>/<mysql db> --table <mysql table> --username <mysql_user> --password <mysqlpass> --hcatalog-database <hive_db> --hcatalog-table <hive_table> --verbose
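I got it to put the data into MySQL. However, it also inserted the header row. In addition, when it is run twice (I believed it would overwrite the data), the data ends up in the table twice.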
+-----------+-------------+----------+--------+--------+---------------------+------+-------+---------+------+-------+---------+
| sensor_id | sensor_type | location | lat | lon | timestamp | p1 | durp1 | ratiop1 | p2 | durp2 | ratiop2 |
+-----------+-------------+----------+--------+--------+---------------------+------+-------+---------+------+-------+---------+
| sensor_id | sensor_type | location | lat | lon | 2021-09-12 05:55:49 | P1 | durP1 | ratioP1 | P2 | durP2 | ratioP2 |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:55:49 | 70 | | | 20 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:55:49 | 83 | | | 0.93 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:55:49 | 0.80 | | | 0.73 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:55:49 | 0.50 | | | 0.50 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:55:49 | 57 | | | 0.80 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:55:49 | 0.40 | | | 0.40 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:55:49 | 0.70 | | | 0.70 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:55:49 | 2 | | | 0.47 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:55:49 | 90 | | | 0.63 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:55:49 | 0.57 | | | 0.57 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:55:49 | 0.73 | | | 0.60 | | |
| sensor_id | sensor_type | location | lat | lon | 2021-09-12 05:58:02 | P1 | durP1 | ratioP1 | P2 | durP2 | ratioP2 |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:58:02 | 70 | | | 20 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:58:02 | 83 | | | 0.93 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:58:02 | 0.80 | | | 0.73 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:58:02 | 0.50 | | | 0.50 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:58:02 | 57 | | | 0.80 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:58:02 | 0.40 | | | 0.40 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:58:02 | 0.70 | | | 0.70 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:58:02 | 2 | | | 0.47 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:58:02 | 90 | | | 0.63 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:58:02 | 0.57 | | | 0.57 | | |
| 9710 | SDS011 | 4894 | 43.226 | 27.934 | 2021-09-12 05:58:02 | 0.73 | | | 0.60 | | |
+-----------+-------------+----------+--------+--------+---------------------+------+-------+---------+------+-------+---------+
The data in Hive is fine (there is no header row there). What could be causing this?
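A note on the duplicated rows: according to the Sqoop documentation, `sqoop export` by default turns each input record into an INSERT, so it appends to the target table rather than overwriting it, and the MySQL table above has no primary key that could reject repeats. The sketch below illustrates the difference between appending and emptying the table first. It uses Python's built-in SQLite purely as a stand-in for MySQL so that it runs anywhere; the table `t`, the reduced column set and the helper names are hypothetical.

```python
import sqlite3

# Two sample records with a reduced column set (sensor_id, sensor_type, p1).
ROWS = [("9710", "SDS011", "70"), ("9710", "SDS011", "83")]


def export_append(conn, rows):
    """Mimic the default export behaviour: plain INSERTs, nothing removed first."""
    conn.executemany(
        "INSERT INTO t (sensor_id, sensor_type, p1) VALUES (?, ?, ?)", rows
    )
    conn.commit()


def export_replace(conn, rows):
    """Empty the target table first so that re-running gives the same result."""
    conn.execute("DELETE FROM t")
    export_append(conn, rows)


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (sensor_id TEXT, sensor_type TEXT, p1 TEXT)")

export_append(conn, ROWS)
export_append(conn, ROWS)
appended = count_rows(conn)  # row count after two append-style runs

export_replace(conn, ROWS)
export_replace(conn, ROWS)
replaced = count_rows(conn)  # row count after two replace-style runs

print(appended, replaced)
```

Against MySQL, the equivalent of the `DELETE` step would be a `TRUNCATE TABLE airqualitydata` before each export. Sqoop also documents `--update-key` and `--update-mode` for update-style exports, though whether they combine with the HCatalog options used here has not been verified.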
I also get an exception, although the job completes overall. Does it matter?
21/09/12 05:57:41 INFO mapreduce.Job: Running job: job_1631399426919_0035
21/09/12 05:57:55 INFO mapreduce.Job: Job job_1631399426919_0035 running in uber mode : false
21/09/12 05:57:55 INFO mapreduce.Job: map 0% reduce 0%
21/09/12 05:58:03 INFO mapreduce.Job: map 100% reduce 0%
21/09/12 05:58:05 INFO mapreduce.Job: Job job_1631399426919_0035 completed successfully
21/09/12 05:58:06 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=345759
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2597
HDFS: Number of bytes written=0
HDFS: Number of read operations=2
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Launched map tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=4966
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=4966
Total vcore-milliseconds taken by all map tasks=4966
Total megabyte-milliseconds taken by all map tasks=1241500
Map-Reduce Framework
Map input records=12
Map output records=12
Input split bytes=1800
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=211
CPU time spent (ms)=3490
Physical memory (bytes) snapshot=217477120
Virtual memory (bytes) snapshot=1972985856
Total committed heap usage (bytes)=51380224
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
21/09/12 05:58:06 INFO mapreduce.ExportJobBase: Transferred 2.5361 KB in 62.3328 seconds (41.6635 bytes/sec)
21/09/12 05:58:06 INFO mapreduce.ExportJobBase: Exported 12 records.
21/09/12 05:58:06 INFO mapreduce.ExportJobBase: Publishing HCatalog export job data to Listeners
21/09/12 05:58:06 WARN mapreduce.PublishJobData: Unable to publish export data to publisher org.apache.atlas.sqoop.hook.SqoopHook
java.lang.ClassNotFoundException: org.apache.atlas.sqoop.hook.SqoopHook
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.sqoop.mapreduce.PublishJobData.publishJobData(PublishJobData.java:46)
at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:457)
at org.apache.sqoop.manager.SqlManager.exportTable(SqlManager.java:931)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:81)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:225)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.main(Sqoop.java:243)
21/09/12 05:58:06 DEBUG util.ClassLoaderStack: Restoring classloader: sun.misc.Launcher$AppClassLoader@4232c52b
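Regarding the exception: in the output above it is logged at WARN level, after the `Exported 12 records.` line, and the job reports `completed successfully`. That suggests, but does not prove, that only the optional Atlas publishing hook failed to load and not the export itself. A small sketch for triaging this kind of output follows. It assumes the `yy/MM/dd HH:mm:ss LEVEL logger: message` line layout seen in the logs here, and the function name `summarize` is hypothetical.

```python
import re

# Matches lines such as "21/09/12 05:58:06 WARN mapreduce.PublishJobData: ...".
LINE = re.compile(
    r"^\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} "
    r"(?P<level>[A-Z]+) (?P<logger>\S+): (?P<msg>.*)$"
)
EXPORTED = re.compile(r"Exported (\d+) records\.")


def summarize(log_text):
    """Return (exported_count_or_None, warn_messages, error_messages)."""
    exported, warns, errors = None, [], []
    for line in log_text.splitlines():
        m = LINE.match(line.strip())
        if m is None:
            continue  # stack-trace frames and counter lines carry no level
        msg = m.group("msg")
        if m.group("level") == "WARN":
            warns.append(msg)
        elif m.group("level") == "ERROR":
            errors.append(msg)
        found = EXPORTED.search(msg)
        if found:
            exported = int(found.group(1))
    return exported, warns, errors


# Three lines copied from the output above.
SAMPLE = """\
21/09/12 05:58:06 INFO mapreduce.ExportJobBase: Exported 12 records.
21/09/12 05:58:06 WARN mapreduce.PublishJobData: Unable to publish export data to publisher org.apache.atlas.sqoop.hook.SqoopHook
java.lang.ClassNotFoundException: org.apache.atlas.sqoop.hook.SqoopHook
"""

exported, warns, errors = summarize(SAMPLE)
print(exported, len(warns), len(errors))
```

Running the same function over the first, failed run would instead surface the two ERROR lines and an exported count of 0.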
【Discussion】:
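-
Can you specify the database and table names, as in --hcatalog-database mydb --hcatalog-table airquality, and remove `--export-dir`? Also make sure the data types and lengths in Hive and MySQL are equivalent or similar.
-
Also, you can run with --verbose to get more information from sqoop.
-
Hi @KoushikRoy, I have just updated the post. It appends the data to the table (so when it is run twice I get duplicates, which I don't think is expected and would at least like to fix), and there is an exception, although the job completes overall. You may want to post this as an answer so that I can accept it. If you have any ideas about the other issues, I would appreciate those too. Thanks for your time!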