Overview of Hive Storage Handler for MongoDB

Apache Hive supports two types of tables: managed tables and external tables. For a managed table, Hive manages both the metadata and the data. For an external table, Hive manages only the metadata, not the table data. Consequently, dropping a managed table drops both the data and the metadata, whereas dropping an external table drops only the metadata and leaves the table data in place. An external table is therefore suitable when the data is stored in an external data source such as a MongoDB database. In this section we introduce the Hive MongoDB storage handler, which is used to create a Hive external table over a MongoDB database.
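The difference in drop behavior can be illustrated with a short HiveQL sketch. The table names, columns, and HDFS location below are hypothetical and chosen only for illustration; they do not come from the book.

```sql
-- Managed table: Hive owns both the metadata and the data files.
CREATE TABLE managed_logs (id INT, msg STRING);

-- External table: Hive records only the metadata; the files under
-- LOCATION (a hypothetical path) remain owned by whoever wrote them.
CREATE EXTERNAL TABLE external_logs (id INT, msg STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/external_logs';

-- Removes the metadata and the underlying data.
DROP TABLE managed_logs;

-- Removes only the metadata; the files in /data/external_logs are kept.
DROP TABLE external_logs;
```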

The class for the MongoDB storage handler for Hive is org.yong3.hive.mongo.MongoStorageHandler. The storage handler supports only the Hive primitive types, such as int and string. It provides the serdeproperties key mongo.column.mapping, in which the MongoDB datastore column names to be mapped to the Hive external table are specified. In addition, the tblproperties listed in Table 11-1 are supported.
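As a rough sketch of how these pieces fit together, the following HiveQL statement creates an external table backed by a MongoDB collection. The handler class and the mongo.column.mapping key come from the text above. Everything else is an assumption: the table name, the columns, the connection values, and in particular the tblproperties names (mongo.host, mongo.port, mongo.db, mongo.collection), which should be checked against Table 11-1. The comma-separated form of the mapping value is also assumed.

```sql
-- Assumes the storage handler JAR and the MongoDB Java driver are
-- already on the Hive classpath (for example, via ADD JAR).
CREATE EXTERNAL TABLE mongo_users (id INT, name STRING, age INT)
STORED BY 'org.yong3.hive.mongo.MongoStorageHandler'
WITH SERDEPROPERTIES (
  -- Assumed format: MongoDB names listed in the order of the Hive columns.
  'mongo.column.mapping' = '_id,name,age'
)
TBLPROPERTIES (
  -- Property names and values below are assumptions; verify against Table 11-1.
  'mongo.host' = 'localhost',
  'mongo.port' = '27017',
  'mongo.db' = 'test',
  'mongo.collection' = 'users'
);
```

Because the table is external, dropping mongo_users would remove only the Hive metadata and leave the MongoDB collection untouched.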

Source: Vohra, Deepak (2015), Pro MongoDB™ Development, Apress; 1st edition.
