【问题标题】:How to import aws public dataset into mysql instance?如何将 aws 公共数据集导入 mysql 实例?
【发布时间】:2016-08-13 16:41:28
【问题描述】:

我正在尝试使用 Amazon AWS 中的公共数据集,名为 Twilio/Wigle.net Street Vector Data Set。该数据集包含美国街道名称和地址范围的数据。它的大小约为 10 GiB。当我们在 linux 中查看时,它看起来是这样的:

ubuntu@ip-172-31-xxx-xxx:/data-us-street$ pwd
/data-us-street
ubuntu@ip-172-31-xxx-xxx:/data-us-street$ ll
total 20576
drwxr-xr-x  6   27 sudo     4096 May 19  2009 ./
drwxr-xr-x 23 root root     4096 Apr 20 18:10 ../
-rw-r--r--  1 root root     8339 May 19  2009 README
drwx------  2   27 sudo     4096 Mar 18  2009 addresses/
-rw-rw----  1   27 sudo  5242880 Mar 18  2009 ib_logfile0
-rw-rw----  1   27 sudo  5242880 Mar  8  2009 ib_logfile1
-rw-rw----  1   27 sudo 10485760 Mar 18  2009 ibdata1
drwx------  2   27 sudo    16384 Mar  8  2009 lost+found/
drwx------  2   27 sudo     4096 Mar  8  2009 mysql/
-rw-rw----  1   27 sudo      117 Mar 18  2009 mysql-bin.000001
-rw-rw----  1   27 sudo       19 Mar 18  2009 mysql-bin.index
drwx------  2   27 sudo     4096 Mar  8  2009 test/

为了使用它的数据,我想将它链接到同一主机中的 MySQL 数据库。谁能告诉我怎么做?


我尝试过的

我尝试用以下更改覆盖位于/etc/mysql/my.cnf 的mysql 存储目录datadir

#datadir        = /var/lib/mysql
datadir         = /data-us-street

我停止了服务器,更改为 value 然后重新启动了 MySQL 服务器。但是,它不起作用。


自述文件

/*
    Copyright (c) 2009 Twilio, Inc.

    Permission is hereby granted, free of charge, to any person
    obtaining a copy of this software and associated documentation
    files (the "Software"), to deal in the Software without
    restriction, including without limitation the rights to use,
    copy, modify, merge, publish, distribute, sublicense, and/or sell
    copies of the Software, and to permit persons to whom the
    Software is furnished to do so, subject to the following
    conditions:

    The above copyright notice and this permission notice shall be
    included in all copies or substantial portions of the Software.

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
    OTHER DEALINGS IN THE SOFTWARE.
*/

1. The Twilio/Wigle.net public address dataset 

This public dataset contains the street data for the U.S., based on 
the geodata published by the U.S. Census Bureau's TIGER project.   
We have reformated the data from GIS friendly shapefiles to a more 
generally accessible MySQL database.   

This dataset covers all the streets, roads, and highways in the U.S.  
These streets are represented as "polylines" which are shapes made up
of individual line segments.    Each polyline has its own unique ID, 
and each segment that makes up the polyline has a sequence number.  
The combination of this ID and sequence number is the primary key of 
the address table.   Each row in the database represents one of these 
line segments.

2. Description of Columns 

     +-------------+------------+------+-----+------------+-------+
     | Field       | Type       | Null | Key | Default    | Extra |
     +-------------+------------+------+-----+------------+-------+
     | id          | char(10)   | NO   | PRI |            |       | 
     | seq         | char(3)    | NO   | PRI |            |       | 
     | name        | char(30)   | YES  | MUL | NULL       |       | 
     | prefix      | char(2)    | YES  | MUL | NULL       |       | 
     | type        | char(4)    | YES  |     | NULL       |       | 
     | startlat    | float(12,8)| NO   |     | 0.00000000 |       | 
     | startlong   | float(12,8)| NO   |     | 0.00000000 |       | 
     | endlat      | float(12,8)| NO   |     | 0.00000000 |       | 
     | endlong     | float(12,8)| NO   |     | 0.00000000 |       | 
     | leftzip     | int(5)     | YES  | MUL | NULL       |       | 
     | rightzip    | int(5)     | YES  | MUL | NULL       |       | 
     | leftaddr1   | char(11)   | YES  |     | NULL       |       | 
     | leftaddr2   | char(11)   | YES  |     | NULL       |       | 
     | rightaddr1  | char(11)   | YES  |     | NULL       |       | 
     | rightaddr2  | char(11)   | YES  |     | NULL       |       | 
     | name_dtmf   | char(30)   | YES  |     | NULL       |       | 
     | prefix_dtmf | char(2)    | YES  |     | NULL       |       | 
     +-------------+------------+------+-----+------------+-------+

   id - this is the unique ID of a street, road, or highway polyline,
     according to US census blocks

   seq - this is the sequence number of this segment in the polyline,
     used to correlate 2 segments of the same street within a US
     census block

   name - the name of this street, road, or highway.  example: Main

   prefix - the prefix of the street name.  examples: N,S,E,W

   type - the suffix of the street name.  examples: Blvd, St, Rd 

   startlat/startlong - the lat/long pair of this segments the starting 
point

   endlat/endlong - the lat/long pair of this segments the ending point

   leftzip/rightzip - the zipcode of the addresses on the corresponding
      side of the street

   leftaddr1/leftaddr2/rightaddr1/rightaddr2 - the starting and ending 
addresses
      numbers for each side of the street

   name_dtmf/prefix_dtmf/type_dtmf - the name, prefix and type columns, 
represented  
      as DTMF encoded numbers.  For example, (A,B,C) = 2, (D,E,F) = 3, 
etc.
      Useful for telephony applications. 


3. Example Segment 

    --------------------------------------------------------
             id: 111710515
            seq: 0
           name: Main
         prefix: 
           type: St
       startlat: 41.49493408
      startlong: -87.70324707
         endlat: 41.49483490
        endlong: -87.70324707
        leftzip: 60466
       rightzip: 60443
      leftaddr1: 21801
      leftaddr2: 21805
     rightaddr1: 21800
     rightaddr2: 21804
      name_dtmf: 6246
    prefix_dtmf: 
    --------------------------------------------------------

      Name: Main St.

        Left Zip: 60466
      21801                                21805
        X------------------------------------>
      21800                                21804
        Right Zip: 60443




4. Common Queries

   a. Find the street segment record for 30 Rockefeller Plaza, NY, NY, 
10020 

   SELECT * FROM address WHERE leftzip=10020 AND name LIKE 'Rockefeller' 
AND 30 BETWEEN leftaddr1 AND leftaddr2  
   UNION 
   SELECT * FROM address WHERE rightzip=10020 AND name LIKE 
'Rockefeller' AND 30 BETWEEN rightaddr1 AND rightaddr2

    --------------------------------------------------------
             id: 59657155
            seq: 0
           name: Rockefeller
         prefix: 
           type: Plz
       startlat: 40.7585
      startlong: -73.979
         endlat: 40.7592
        endlong: -73.9786
        leftzip: 10020
       rightzip: 10020
      leftaddr1: 22
      leftaddr2: 38
     rightaddr1: 21
     rightaddr2: 39
      name_dtmf: 7625335537
    prefix_dtmf:  
    --------------------------------------------------------


   b.  Find all the street names within a zipcode. 

   SELECT DISTINCT prefix, name, type FROM address WHERE leftzip=10009 
OR rightzip = 10009

     +--------+----------+------+
     | prefix | name     | type |
     +--------+----------+------+
     |        | 1st      | Ave  | 
     |        | Avenue A |      | 
     | E      | 13th     | St   | 
     | E      | 14th     | St   | 
     | E      | 12th     | St   | 
     | E      | 11th     | St   | 
     | E      | 10th     | St   | 
     | E      | 2nd      | St   | 
     | E      | 3rd      | St   | 
     | E      | 4th      | St   | 
     | E      | 6th      | St   | 
     | E      | 7th      | St   | 
     |        | St Marks | Pl   | 
     | E      | 9th      | St   | 
     |        | Avenue B |      | 
     | E      | 5th      | St   | 
     |        | Avenue C |      | 
     | E      | 20th     | St   | 
     | E      | 15th     | St   | 
     | E      | 16th     | St   | 
     | E      | 8th      | St   | 
     |        | Avenue D |      | 
     |        | Szold    | Pl   | 
     +--------+----------+------+

   c.  Streets starting with F in a zipcode 

   SELECT DISTINCT prefix, name, type from address WHERE leftzip = 94117 
AND name LIKE 'F%'
   UNION
   SELECT DISTINCT prefix, name, type from address WHERE rightzip = 
94117 AND name LIKE 'F%'

     +--------+------------+------+
     | prefix | name       | type |
     +--------+------------+------+
     |        | Fulton     | St   | 
     |        | Fell       | St   | 
     |        | Frederick  | St   | 
     |        | Farnsworth | Ln   | 
     |        | Fillmore   | St   | 
     |        | Friendship | Ct   | 
     +--------+------------+------+

   d. Streets starting with "FRE" as DTMF digits in a zipcode: 

   SELECT DISTINCT prefix, name, type from address WHERE leftzip = 94117 
AND name_dtmf LIKE '373%'
   UNION  
   SELECT DISTINCT prefix, name, type from address WHERE rightzip = 
94117 AND name_dtmf LIKE '373%'

     +--------+------------+------+
     | prefix | name       | type |
     +--------+------------+------+
     |        | Fulton     | St   | 
     |        | Fell       | St   | 
     |        | Frederick  | St   | 
     |        | Farnsworth | Ln   | 
     |        | Fillmore   | St   | 
     |        | Friendship | Ct   | 
     +--------+------------+------+

【问题讨论】:

    标签: mysql ubuntu amazon-web-services amazon-ec2


    【解决方案1】:

    根文件夹/data-us-street 下的每个子文件夹都代表一个数据库。所以如果我想使用数据库addresses,我需要使用文件夹./addresses

    1. 将目标数据库文件夹复制到当前 MySQL 数据库服务器实例的数据目录中,例如/var/lib/mysql

      $ cp -r /data-us-street/addresses /var/lib/mysql/addresses
      
    2. 递归更改文件夹所有者

      $ chown mysql:mysql /var/lib/mysql/addresses/*
      
    3. 使用数据库addresses,尽情享受吧!

    【讨论】:

      猜你喜欢
      • 2012-02-17
      • 1970-01-01
      • 2018-10-09
      • 2012-07-28
      • 2021-01-26
      • 1970-01-01
      • 1970-01-01
      • 2018-12-29
      • 1970-01-01
      相关资源
      最近更新 更多