【Question title】:Accessing Hive from a remote server through Python
【Posted】:2020-05-28 01:29:37
【Question description】:

I have installed the following packages on the remote server so that I can access Hive through Python.

Python 2.7.6,
Python development tools,
pyhs2,
sasl-0.1.3,
thrift-0.9.1,
PyHive-0.1.0

This is the Python script that accesses Hive.

#!/usr/bin/env python
import pyhs2 as hive
import getpass
DEFAULT_DB = 'camp'
DEFAULT_SERVER = '10.25.xx.xx'
DEFAULT_PORT = 10000
DEFAULT_DOMAIN = 'xxx.xxxxxx.com'

# Get the username and password
u = raw_input('Enter PAM username: ')
s = getpass.getpass()
# Build the Hive Connection
connection = hive.connect(host=DEFAULT_SERVER, port=DEFAULT_PORT,
                          authMechanism='LDAP',
                          user=u + '@' + DEFAULT_DOMAIN, password=s)
# Hive query statement
statement = "select * from camp.test"
cur = connection.cursor()

# Runs a Hive query and returns the result as a list of list
cur.execute(statement)
df = cur.fetchall()

This is the output I get:

  File "build/bdist.linux-x86_64/egg/pyhs2/__init__.py", line 7, in connect
  File "build/bdist.linux-x86_64/egg/pyhs2/connections.py", line 46, in __init__
  File "build/bdist.linux-x86_64/egg/pyhs2/cloudera/thrift_sasl.py", line 74, in open
  File "build/bdist.linux-x86_64/egg/pyhs2/cloudera/thrift_sasl.py", line 92, in _recv_sasl_message
  File "build/bdist.linux-x86_64/egg/thrift/transport/TTransport.py", line 58, in readAll
  File "build/bdist.linux-x86_64/egg/thrift/transport/TSocket.py", line 118, in read
thrift.transport.TTransport.TTransportException: TSocket read 0 bytes

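After running the script I don't see any other error in the output, but I also don't see any query results on screen. I'm not sure why no query results are displayed; the Hive server IP, port, user, and password are correct. I also verified the connection between the Hive server and the remote server, and there is no connectivity problem.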
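The connectivity check mentioned above can be reproduced with a few lines of standard-library Python. This is only a sketch: `port_is_open` is a hypothetical helper name, and the host and port in the usage comment are the placeholders from the script. A successful TCP connection shows only that the port is reachable. It does not show that the SASL handshake will succeed, and `TSocket read 0 bytes` is often reported when the client's authentication mechanism differs from the one HiveServer2 is configured for (an assumption about this setup, not something the traceback proves).

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a plain TCP connection to (host, port) can be opened.

    Hypothetical helper: it checks reachability only, not Hive authentication.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        # Refused, unreachable, or timed out.
        return False
    sock.close()
    return True


# Usage (placeholders taken from the question's script):
#   port_is_open(DEFAULT_SERVER, DEFAULT_PORT)
```

If this returns True while `connect` still fails, the next thing to compare is the `authMechanism` passed by the client against the server's configured authentication mode.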

【Comments】:

    Tags: python-2.7 hadoop hive


    【Solution 1】:

    Try this code:

    import pyhs2
    
    with pyhs2.connect(host='localhost',
                       port=10000,
                       authMechanism="PLAIN",
                       user='root',
                       password='test',
                       database='default') as conn:
        with conn.cursor() as cur:
            #Show databases
            print cur.getDatabases()
    
            #Execute query
            cur.execute("select * from table")
    
            #Return column info from query
            print cur.getSchema()
    
            #Fetch table results
            for i in cur.fetch():
                print i
    

    【Comments】:

    • I get a "Cloudera package not found" error when running this code in Python 3.
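For readers who hit the Python 3 problem described in the comment above, one possible alternative is the `hive` module of PyHive, which is already on the question's package list. The sketch below assumes a recent PyHive release with its Hive dependencies (thrift, sasl, thrift_sasl) installed and a server that accepts LDAP authentication, as in the question. `hive_connect_kwargs` and `fetch_rows` are hypothetical helper names, not part of any library.

```python
def hive_connect_kwargs(host, port, user, password, database="default"):
    """Collect keyword arguments for pyhive.hive.connect using LDAP auth.

    Hypothetical helper; auth="LDAP" mirrors the question's authMechanism
    and is an assumption about how the server is configured.
    """
    return {
        "host": host,
        "port": port,
        "username": user,
        "password": password,
        "auth": "LDAP",
        "database": database,
    }


def fetch_rows(statement, **connect_kwargs):
    """Run one statement against HiveServer2 and return all rows."""
    # Imported lazily so the helper above works without PyHive installed.
    from pyhive import hive

    connection = hive.connect(**connect_kwargs)
    try:
        cursor = connection.cursor()
        cursor.execute(statement)
        return cursor.fetchall()
    finally:
        connection.close()


# Usage (placeholders from the question):
#   kwargs = hive_connect_kwargs('10.25.xx.xx', 10000, user, password, 'camp')
#   for row in fetch_rows("select * from camp.test", **kwargs):
#       print(row)
```

PyHive only accepts a `password` together with `auth="LDAP"` or `auth="CUSTOM"`, so the two are set together here.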
    【Solution 2】:

    I managed to get access in the following way:

    from pyhive import presto
    DEFAULT_DB = 'XXXXX'
    DEFAULT_SERVER = 'server.name.blah'
    DEFAULT_PORT = 8000
    
    # Username
    u = "user"
    
    # Build the Hive Connection
    connection = presto.connect(host=DEFAULT_SERVER, port=DEFAULT_PORT, username=u)
    
    # Hive query statement
    statement = "select * from public.dudebro limit 5"
    cur = connection.cursor()
    
    # Runs a Hive query and returns the result as a list of list
    cur.execute(statement)
    df = cur.fetchall()
    print df
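Unlike the script in the question, which stores the result of `fetchall()` in `df` and never prints it, the snippet above prints the rows. If column names are wanted next to the values, a DB-API cursor exposes them through `cursor.description`. The sketch below uses the standard-library `sqlite3` module purely as a stand-in for a Hive or Presto cursor so that it runs without a server; `rows_as_dicts` and `demo` are hypothetical names, and the table is made up for illustration.

```python
import sqlite3


def rows_as_dicts(cursor):
    """Pair each fetched row with the column names from cursor.description."""
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def demo():
    """Build a throwaway in-memory table and read it back as dicts."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE test (id INTEGER, name TEXT)")
    cur.executemany("INSERT INTO test VALUES (?, ?)", [(1, "a"), (2, "b")])
    cur.execute("SELECT id, name FROM test ORDER BY id")
    rows = rows_as_dicts(cur)
    conn.close()
    return rows
```

The same `rows_as_dicts` call should work on any cursor that follows the DB-API convention of populating `description` after `execute`.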
    

    【Comments】:

    • Does it only try to connect when the statement is executed? It runs fine up to cur.execute and then fails with a connection refused error.