Apache Hive setup
Overview of dbt-hive
- Maintained by: Cloudera
- Authors: Cloudera
- GitHub repo: cloudera/dbt-hive
- PyPI package: dbt-hive
- Slack channel: #db-hive
- Supported dbt Core version: v1.1.0 and newer
- dbt Cloud support: Not Supported
- Minimum data platform version: n/a
Installing dbt-hive
pip is the easiest way to install the adapter:

```shell
pip install dbt-hive
```

Installing dbt-hive will also install dbt-core and any other dependencies.
Configuring dbt-hive
For Hive-specific configuration, please refer to Hive Configuration.
For further information, refer to the GitHub repository: cloudera/dbt-hive
Connection Methods
dbt-hive can connect to Apache Hive and Cloudera Data Platform clusters. The Impyla library is used to establish connections to Hive.
dbt-hive supports two transport mechanisms:
- binary
- HTTP(S)
The default mechanism is binary. To use HTTP transport, set the boolean option use_http_transport: [true / false].
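As a sketch of how the two transports differ in profiles.yml, the two outputs below change only the transport-related fields. The target names binary_dev and http_dev and the host hive.example.com are hypothetical. The ports and http_path assume a stock HiveServer2 (binary on 10000, HTTP on 10001 with path cliservice); a CDP endpoint will typically use different values, so substitute your cluster's settings and add the authentication fields described under Authentication Methods.

```yaml
your_profile_name:
  target: binary_dev
  outputs:
    binary_dev:                  # hypothetical target using binary transport
      type: hive
      host: hive.example.com     # placeholder host
      port: 10000                # assumed HiveServer2 binary port
      use_http_transport: false
      schema: [schema name]
    http_dev:                    # hypothetical target using HTTP transport
      type: hive
      host: hive.example.com     # placeholder host
      port: 10001                # assumed HiveServer2 HTTP port
      use_http_transport: true
      http_path: cliservice      # assumed HiveServer2 default HTTP path
      schema: [schema name]
```

You can then pick a transport per invocation with dbt's --target option, for example dbt debug --target http_dev.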
Authentication Methods
dbt-hive supports two authentication mechanisms, LDAP and Kerberos, plus an insecure mode intended for local testing:
Insecure
This method is only recommended if you have a local install of Hive and want to test out the dbt-hive adapter.
```yaml
your_profile_name:
  target: dev
  outputs:
    dev:
      type: hive
      host: localhost
      port: [port] # default value: 10000
      schema: [schema name]
```
LDAP
LDAP allows you to authenticate with a username and password when Hive is configured with LDAP authentication. LDAP is supported over both the binary and HTTP transport mechanisms.
This is the recommended authentication mechanism to use with Cloudera Data Platform (CDP).
```yaml
your_profile_name:
  target: dev
  outputs:
    dev:
      type: hive
      host: [host name]
      http_path: [optional, http path to Hive] # default value: None
      port: [port] # default value: 10000
      auth_type: ldap
      use_http_transport: [true / false] # default value: true
      use_ssl: [true / false] # TLS should always be used with LDAP to ensure secure transmission of credentials, default value: true
      username: [username]
      password: [password]
      schema: [schema name]
```
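To keep the LDAP password out of profiles.yml, dbt can read profile values from environment variables through its env_var Jinja function. A minimal sketch, assuming hypothetical variable names DBT_HIVE_USER and DBT_HIVE_PASSWORD that you export in your shell before running dbt:

```yaml
your_profile_name:
  target: dev
  outputs:
    dev:
      type: hive
      host: [host name]
      port: [port]
      auth_type: ldap
      use_http_transport: true
      use_ssl: true              # keep TLS on so credentials are encrypted in transit
      # DBT_HIVE_USER and DBT_HIVE_PASSWORD are hypothetical names; export them before running dbt
      username: "{{ env_var('DBT_HIVE_USER') }}"
      password: "{{ env_var('DBT_HIVE_PASSWORD') }}"
      schema: [schema name]
```

Quoting the Jinja expressions keeps the YAML parser from misreading the braces.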
Note: When creating a workload user in CDP, make sure the user has CREATE, SELECT, ALTER, INSERT, UPDATE, DROP, INDEX, READ and WRITE permissions. If you need the user to execute GRANT statements, you should also configure the appropriate GRANT permissions for them. When using Apache Ranger, permissions for allowing GRANT are typically set using the "Delegate Admin" option. For more information, see grants and on-run-start & on-run-end.
Kerberos
The Kerberos authentication mechanism uses GSSAPI to share Kerberos credentials when Hive is configured with Kerberos authentication.
```yaml
your_profile_name:
  target: dev
  outputs:
    dev:
      type: hive
      host: [hostname]
      port: [port] # default value: 10000
      auth_type: GSSAPI
      kerberos_service_name: [kerberos service name] # default value: None
      use_http_transport: true # default value: true
      use_ssl: true # TLS should always be used to ensure secure transmission of credentials, default value: true
      schema: [schema name]
```
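A filled-in Kerberos profile might look like the sketch below. The host is a placeholder, and kerberos_service_name: hive assumes the HiveServer2 service principal has the common form hive/<host>@<REALM>; check the principal your cluster actually uses. No username or password appears because the connection relies on the Kerberos ticket obtained with kinit, as described in the note that follows.

```yaml
your_profile_name:
  target: dev
  outputs:
    dev:
      type: hive
      host: hive.example.com         # placeholder host
      port: [port]
      auth_type: GSSAPI
      kerberos_service_name: hive    # assumes a hive/<host>@<REALM> service principal
      use_http_transport: true
      use_ssl: true
      schema: [schema name]
```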
Note: A typical Cloudera Private Cloud setup involves the following steps to set up Kerberos before you can execute dbt commands:
- Get the correct realm config file for your installation (krb5.conf)
- Set the KRB5_CONFIG environment variable to point to the config file (export KRB5_CONFIG=/path/to/krb5.conf)
- Set the correct permissions on the config file (sudo chmod 644 /path/to/krb5.conf)
- Obtain a Kerberos ticket using kinit (kinit username@YOUR_REALM.YOUR_DOMAIN)
- The ticket is valid for a limited period, after which you will need to run kinit again to obtain a new ticket.
- The user will need CREATE, DROP, and INSERT permissions on the schema provided in profiles.yml
Instrumentation
By default, the adapter collects instrumentation events to help improve functionality and understand bugs. If you want to switch this off, for instance in a production environment, explicitly set the flag usage_tracking: false in your profiles.yml file.
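A minimal sketch of a production target with instrumentation disabled, assuming usage_tracking is set alongside the other connection fields of the target like the rest of the dbt-hive options. The target name prod is hypothetical; fill in the authentication fields for your chosen method.

```yaml
your_profile_name:
  target: prod
  outputs:
    prod:                        # hypothetical production target
      type: hive
      host: [host name]
      port: [port]
      schema: [schema name]
      usage_tracking: false      # opt out of dbt-hive instrumentation events
```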
Installation and Distribution
dbt's adapter for Apache Hive is managed in its own repository, dbt-hive. To use it, you must install the dbt-hive plugin.
Using pip
The following command installs the latest version of dbt-hive, along with the required version of dbt-core and the impyla driver used for connections.

```shell
pip install dbt-hive
```
Supported Functionality
| Feature | Supported |
|---|---|
| Materialization: Table | Yes |
| Materialization: View | Yes |
| Materialization: Incremental - Append | Yes |
| Materialization: Incremental - Insert+Overwrite | Yes |
| Materialization: Incremental - Merge | No |
| Materialization: Ephemeral | No |
| Seeds | Yes |
| Tests | Yes |
| Snapshots | No |
| Documentation | Yes |
| Authentication: LDAP | Yes |
| Authentication: Kerberos | Yes |