Creating an Apache Hive Table with MongoDB: Creating an External Table in Hive

Having created a MongoDB datastore we shall create a Hive external table defined on the MongoDB datastore using the Hive MongoDB Storage Handler. The TBLPROPERTIES for the MongoDB storage handler requires the mongo.user and mongo.password properties for which we need to create a user. We’ll create a MongoDB user first and then go on to create the external table.

  1. Start the MongoDB server shell with the following command.

>mongod

  1. A user that creates another user must have the createUser action privilege. First, create an administrative user, which has the createUser action privilege. The use admin command sets the database as admin. Add a user called hive with the db.addUser() or db.createUser() method. Use the db.auth() method to authenticate the administrative user.

>use admin

>db.addUser(‘hive’, ‘hive’);

>db.auth(‘hive’,’hive’);

The output from the Mongo shell commands is shown in Figure 11-13.

  1. Shut down the server.

>db.shutdownServer()

  1. Start the MongoDB server and log in to the shell as the administrative user hive with the following command.

mongo –port 27017 -u hive -p hive –authenticationDatabase admin

  1. The MongoDB shell connects to the test database. Create a user called hive with the following command.

db.createUser( { “user” : “hive”,

  “pwd”: “hive”,

  “roles” : [ { role: “clusterAdmin”, db: “admin” },

  { role: “readAnyDatabase”, db: “admin” },

  “readWrite”

  ] },

{ w: “majority” , wtimeout: 5000 } )

A new user called hive gets added as shown in Figure 11-14.

  1. As we shall be using the Remote metastore we need to start the Hive server with the following command.

hive -service hiveserver

The Hive Thrift Server gets started as shown in Figure 11-15.

  1. Start the Hive shell with the following command. hive
  2. In the Hive shell add the MongoDB storage handler for Hive to the Hive classpath with the ADD JAR command.

hive> ADD JAR hive-mongo-0.0.3-jar-with-dependencies.jar;

  1. Run a CREATE EXTERNAL TABLE command to create the Hive table wlslog with the following included:
  • Set the columns to TIME_STAMP,CATEGORY,TYPE,SERVERNAME,CODE, and MSG.
  • Set the STORED BY clause to ‘org.yong3.hive.mongo.MongoStorageHandler’.
  • In the WITH SERDEPROPERTIES clause set the mongo.column.mapping property to the column names in the MongoDB document collection.
  • In the TBLPROPERTIES clause set the properties shown in Table 11-2.

The command in the Hive shell to create a Hive external table is wlslog.

hive>CREATE EXTERNAL TABLE wlslog (TIME_STAMP string, CATEGORY string, TYPE string, SERVERNAME string, CODE string, MSG string)

STORED BY ‘org.yong3.hive.mongo.MongoStorageHandler’

WITH SERDEPROPERTIES (“mongo.column.mapping”=”TIME_STAMP,CATEGORY,TYPE,SERV ERNAME,CODE,MSG”)

TBLPROPERTIES ( “mongo.host” = “10.0.2.15”, “mongo.port” = “27017”,

“mongo.db” = “test”,“mongo.user” = “hive”, “mongo.passwd” = “hive”,

“mongo.collection” = “wlslog” );

The output from the command indicates that a Hive external table gets created.

  1. Run a SELECT query on the wlslog table to list the MongoDB data as shown in Figure 11-16.

The show tables command should list the wlslog table added as shown in Figure 11-17.

Source: Vohra Deepak (2015), Pro MongoDB™ Development, Apress; 1st ed. edition.

Leave a Reply

Your email address will not be published. Required fields are marked *