Migrating an Apache Cassandra Table to MongoDB: Creating a Table in Apache Cassandra

In this section we shall create the Cassandra table that is to be migrated to MongoDB. A Cassandra table may be created either with the Cassandra-CLI or from a Java application that uses a Cassandra Java driver. We shall create the table in a Java application, CreateCassandraDatabase, which first connects to Cassandra using the DataStax Java driver.

The main entry point of the DataStax Java driver is the Cluster class. A Cluster instance maintains a connection to one of the server nodes to keep track of the state and current topology of the cluster, and the driver uses auto-discovery to find all the nodes in the cluster, including nodes that join later. Cluster instances are built with Cluster.Builder, a helper class whose instances are obtained from the static method Cluster.builder().

  1. We need to provide the address of at least one node in the Cassandra cluster so that the DataStax driver can connect to the cluster and discover the other nodes through auto-discovery. Add the address of the Cassandra server running on localhost (127.0.0.1) with the addContactPoint(String) method of Cluster.Builder.
  2. Next, invoke the build() method to build the Cluster from the configured contact points. The calls may be chained because we don't need to keep a reference to the intermediate Cluster.Builder instance.

cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
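Chaining works because each Cluster.Builder configuration method returns the builder itself, and build() turns the accumulated settings into the final object. The sketch below mimics that fluent-builder shape with a hypothetical ContactPointsConfig class, which is not part of the DataStax driver and opens no connections, so the pattern can be seen without a running server.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class that only mimics the shape of Cluster/Cluster.Builder;
// it is NOT part of the DataStax Java driver and opens no connections.
final class ContactPointsConfig {
    private final List<String> contactPoints;

    private ContactPointsConfig(Builder builder) {
        // Defensive, read-only copy of the builder's state.
        this.contactPoints = Collections.unmodifiableList(new ArrayList<>(builder.contactPoints));
    }

    // Static factory, analogous to Cluster.builder().
    static Builder builder() {
        return new Builder();
    }

    List<String> contactPoints() {
        return contactPoints;
    }

    static final class Builder {
        private final List<String> contactPoints = new ArrayList<>();

        // Returning this is what allows the calls to be chained.
        Builder addContactPoint(String address) {
            contactPoints.add(address);
            return this;
        }

        // Checks the accumulated state, then creates the immutable result.
        ContactPointsConfig build() {
            if (contactPoints.isEmpty()) {
                throw new IllegalStateException("At least one contact point is required");
            }
            return new ContactPointsConfig(this);
        }
    }
}
```

For example, ContactPointsConfig.builder().addContactPoint("127.0.0.1").build() has the same shape as the driver call above, and no variable ever holds the intermediate builder.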

  3. Next, create a session on the cluster by invoking the connect() method. A session, represented by the Session class, is used to query the cluster and holds multiple connections to the cluster's nodes. The session also applies the policy that decides which node in the cluster each query is sent to; the default policy uses round robin over the nodes in the cluster. Session also handles retries of failed queries. Session instances are thread-safe, and a single instance is sufficient for an application that connects to only one keyspace. A separate Session instance is required for each keyspace if sessions are bound to keyspaces with connect(String keyspace).

Session session = cluster.connect();
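To make the round-robin idea concrete, here is a minimal, driver-independent sketch of round-robin node selection. RoundRobinSelector is a hypothetical class, not the driver's load-balancing policy; it simply cycles through a fixed list of node addresses and is safe to call from several threads, since a single Session is typically shared across threads.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of round-robin selection; the DataStax driver's
// real load-balancing policies are more elaborate (e.g., data-center awareness).
final class RoundRobinSelector {
    private final List<String> nodes;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinSelector(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("No nodes to choose from");
        }
        this.nodes = Collections.unmodifiableList(new ArrayList<>(nodes));
    }

    // Returns the next node in turn. AtomicInteger makes concurrent calls safe,
    // and floorMod keeps the index non-negative even after the counter overflows.
    String next() {
        int index = Math.floorMod(counter.getAndIncrement(), nodes.size());
        return nodes.get(index);
    }
}
```

With three nodes, successive calls to next() return the first, second, and third node and then start again with the first, so the query load is spread evenly.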

The Cassandra server, which we started earlier, must be running when the application is run. If the Cassandra server is not running, the connection attempt fails with com.datastax.driver.core.exceptions.NoHostAvailableException.
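Because a stopped server only surfaces as a NoHostAvailableException when the session connects, it can be handy to check first whether anything is listening on Cassandra's native protocol port. The sketch below uses only java.net.Socket. The port 9042 is an assumption: it is Cassandra's default native transport port, but native_transport_port in cassandra.yaml may set a different one. CassandraPortCheck and isPortOpen are hypothetical names. An open port does not prove that Cassandra is healthy; it only rules out the most common cause of the exception.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical pre-flight check: is anything accepting TCP connections on host:port?
final class CassandraPortCheck {
    // Assumption: Cassandra's default native transport port.
    static final int DEFAULT_NATIVE_PORT = 9042;

    static boolean isPortOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // connect() throws an IOException (e.g., ConnectException or
            // SocketTimeoutException) if the port is closed or unreachable.
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

Calling CassandraPortCheck.isPortOpen("127.0.0.1", CassandraPortCheck.DEFAULT_NATIVE_PORT, 2000) before Cluster.builder() lets the application print a short, friendly message instead of a stack trace.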

The Session class provides several methods to prepare and run queries on the server, some of which are discussed in Table 6-2.
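Besides execute(String), Session offers prepare(String), which returns a PreparedStatement whose bind(...) method supplies the values, and the resulting bound statement is passed to execute(...). Prepared statements use ? bind markers in place of literal values. The sketch below is a hypothetical helper, not part of the driver, that builds such a parameterized INSERT for a list of column names; the string it returns is what you would pass to session.prepare().

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not part of the DataStax driver) that builds a
// parameterized CQL INSERT suitable for Session.prepare(String).
final class CqlInsertBuilder {
    // qualifiedTable and column names are inserted verbatim: pass trusted names only.
    static String insertWithPlaceholders(String qualifiedTable, List<String> columns, boolean ifNotExists) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("At least one column is required");
        }
        // One "?" bind marker per column, e.g. "?, ?, ?".
        String markers = String.join(", ", Collections.nCopies(columns.size(), "?"));
        String cql = "INSERT INTO " + qualifiedTable
                + " (" + String.join(", ", columns) + ") VALUES (" + markers + ")";
        return ifNotExists ? cql + " IF NOT EXISTS" : cql;
    }
}
```

With the driver, the result would be used roughly as session.execute(session.prepare(cql).bind("catalog1", "Oracle Magazine", ...)), with the values bound in the same order as the columns. Preparing once and binding many times avoids re-parsing the statement and keeps values out of the CQL string.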

  4. We need a keyspace to store the table in. Add a static method createKeyspace() to the CreateCassandraDatabase application to create one. CQL 3 (Cassandra Query Language 3) supports running CREATE statements conditionally: with the IF NOT EXISTS clause, an object is created only if it does not already exist. Create a keyspace called datastax that uses the SimpleStrategy replication strategy class with a replication factor of 1.

session.execute("CREATE KEYSPACE IF NOT EXISTS datastax WITH replication "
        + "= {'class':'SimpleStrategy', 'replication_factor':1};");
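The replication map in this statement depends on the strategy class: SimpleStrategy takes a single replication_factor, while NetworkTopologyStrategy takes one replication factor per data center. The sketch below builds the CREATE KEYSPACE IF NOT EXISTS statement for either case. KeyspaceCql and its methods are hypothetical names, and any data-center names you pass (such as dc1) are assumptions that must match the names your cluster actually reports.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that builds CREATE KEYSPACE statements for two replica
// placement strategies. Not part of the DataStax driver; the keyspace and
// data-center names are inserted verbatim, so pass trusted names only.
final class KeyspaceCql {
    // SimpleStrategy: one replication factor for the whole cluster.
    static String simpleStrategy(String keyspace, int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("This sketch expects a replication factor of at least 1");
        }
        return "CREATE KEYSPACE IF NOT EXISTS " + keyspace
                + " WITH replication = {'class':'SimpleStrategy', 'replication_factor':"
                + replicationFactor + "}";
    }

    // NetworkTopologyStrategy: one replication factor per data center, emitted in
    // the map's iteration order (use a LinkedHashMap for a predictable order).
    static String networkTopologyStrategy(String keyspace, Map<String, Integer> factorsByDataCenter) {
        if (factorsByDataCenter.isEmpty()) {
            throw new IllegalArgumentException("At least one data center is required");
        }
        StringJoiner options = new StringJoiner(", ", "{", "}");
        options.add("'class':'NetworkTopologyStrategy'");
        for (Map.Entry<String, Integer> entry : factorsByDataCenter.entrySet()) {
            if (entry.getValue() < 0) {
                throw new IllegalArgumentException("Negative replication factor for " + entry.getKey());
            }
            options.add("'" + entry.getKey() + "':" + entry.getValue());
        }
        return "CREATE KEYSPACE IF NOT EXISTS " + keyspace + " WITH replication = " + options;
    }
}
```

KeyspaceCql.simpleStrategy("datastax", 1) yields the same statement as the one above without the trailing semicolon, which is optional when executing through the driver.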

  5. Invoke the createKeyspace() method in the main method. When the application is run, the keyspace is created. Table 6-3 lists the replica placement strategy classes that Cassandra supports.

  6. Next, we shall create a column family, which CQL 3 calls a table. Add a static method createTable() to the CreateCassandraDatabase application. Like CREATE KEYSPACE, the CREATE TABLE command supports IF NOT EXISTS to create a table conditionally. CQL 3 also supports compound primary keys, that is, primary keys made up of more than one column. In a compound primary key the first column is the partition key, and the remaining columns are clustering columns. Create a table called catalog with the columns catalog_id, journal, publisher, edition, title, and author. In the catalog table the compound primary key is made from the catalog_id and journal columns, with catalog_id as the partition key. Invoke the execute(String) method to create the catalog table as follows.

session.execute("CREATE TABLE IF NOT EXISTS datastax.catalog (catalog_id text, "
        + "journal text, publisher text, edition text, title text, author text, "
        + "PRIMARY KEY (catalog_id, journal))");
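The roles of the two key columns can be pictured as a map of maps: catalog_id, the partition key, selects the partition a row belongs to, and journal, the clustering column, orders rows within that partition. The in-memory sketch below is only a model of that logical layout, not of how Cassandra stores data on disk; CatalogModel is a hypothetical name, and it assumes the default ascending clustering order, approximated with Java's String ordering.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of PRIMARY KEY (catalog_id, journal).
final class CatalogModel {
    // Outer key: partition key (catalog_id). A HashMap is used because Cassandra
    // orders partitions by token, not by the key's natural order.
    private final Map<String, TreeMap<String, Map<String, String>>> partitions = new HashMap<>();

    // Inner TreeMap: clustering column (journal), kept in ascending order.
    // Writing the same (catalogId, journal) pair twice overwrites the row, as a
    // plain CQL INSERT (an upsert) would.
    void upsert(String catalogId, String journal, Map<String, String> otherColumns) {
        partitions.computeIfAbsent(catalogId, id -> new TreeMap<>()).put(journal, otherColumns);
    }

    // Rows of one partition, ordered by the clustering column.
    Map<String, Map<String, String>> partition(String catalogId) {
        return partitions.getOrDefault(catalogId, new TreeMap<>());
    }

    int rowCount() {
        int count = 0;
        for (Map<String, Map<String, String>> rows : partitions.values()) {
            count += rows.size();
        }
        return count;
    }
}
```

The model makes one consequence of the key design visible: two rows are the same row only when both catalog_id and journal match, so one catalog_id can hold several journals.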

  7. Prefix the table name with the keyspace name, as in datastax.catalog. Invoke the createTable() method in the main method. When the CreateCassandraDatabase application is run, the catalog table is created.
  8. Next, we shall add data to the catalog table with the INSERT statement. Use the IF NOT EXISTS clause to add rows conditionally. When a compound primary key is used, values must be supplied for all of the primary key columns, here both catalog_id and journal.
  9. Add a method insert() to the CreateCassandraDatabase class and invoke the method in the main() method.
  10. Add two rows, identified by the catalog_id values catalog1 and catalog2, to the catalog table as follows.

session.execute("INSERT INTO datastax.catalog (catalog_id, journal, publisher, edition, title, author) VALUES ('catalog1', 'Oracle Magazine', 'Oracle Publishing', 'November-December 2013', 'Engineering as a Service', 'David A. Kelly') IF NOT EXISTS");

session.execute("INSERT INTO datastax.catalog (catalog_id, journal, publisher, edition, title, author) VALUES ('catalog2', 'Oracle Magazine', 'Oracle Publishing', 'November-December 2013', 'Quintessential and Collaborative', 'Tom Haunert') IF NOT EXISTS");
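An INSERT ... IF NOT EXISTS is a conditional write (a lightweight transaction): it is applied only if no row with the same full primary key exists, and the result reports the outcome in an [applied] column. The in-memory sketch below models only that outcome with Map.putIfAbsent; it ignores concurrency control and replication, and ConditionalCatalog is a hypothetical name. It shows why running insert() a second time leaves the two rows unchanged instead of overwriting them.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of INSERT ... IF NOT EXISTS for a table whose
// primary key is (catalog_id, journal). Only the title column is kept.
final class ConditionalCatalog {
    private final Map<Map.Entry<String, String>, String> titlesByKey = new HashMap<>();

    // Returns true if the row was inserted ([applied] = true) and false if a row
    // with the same primary key already existed. title must be non-null.
    boolean insertIfNotExists(String catalogId, String journal, String title) {
        Map.Entry<String, String> key = new SimpleImmutableEntry<>(catalogId, journal);
        return titlesByKey.putIfAbsent(key, title) == null;
    }

    String title(String catalogId, String journal) {
        return titlesByKey.get(new SimpleImmutableEntry<>(catalogId, journal));
    }
}
```

Recent versions of the DataStax Java driver expose the same flag through ResultSet.wasApplied(); check the API documentation for the driver version you use.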

  11. To verify that the Cassandra table was created and that the data was actually added, we shall next run a SELECT statement on the catalog table. Add a method select() to run the SELECT statement. Select all the columns of the catalog table by using * as the column selection.

ResultSet results = session.execute("select * from datastax.catalog");

  12. Each row in the ResultSet is represented by the Row class. Iterate over the ResultSet and output the value of each column.

for (Row row : results) {
    System.out.println("Catalog Id: " + row.getString("catalog_id"));
    System.out.println("\n");
    System.out.println("Journal: " + row.getString("journal"));
    System.out.println("Publisher: " + row.getString("publisher"));
    System.out.println("Edition: " + row.getString("edition"));
    System.out.println("Title: " + row.getString("title"));
    System.out.println("Author: " + row.getString("author"));
    System.out.println("\n");
    System.out.println("\n");
}
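The loop repeats one println call per column. A data-driven alternative keeps the column names and their labels in one ordered map and formats every row from it, so adding a column means changing one line. In the sketch below, rows are modeled as Map<String, String> so that it runs without a Cassandra server; with the driver you would read each value with row.getString(column) instead. CatalogPrinter and formatRow are hypothetical names.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

final class CatalogPrinter {
    // Column name -> display label, in the order the columns should be printed.
    private static final Map<String, String> LABELS;

    static {
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("catalog_id", "Catalog Id");
        labels.put("journal", "Journal");
        labels.put("publisher", "Publisher");
        labels.put("edition", "Edition");
        labels.put("title", "Title");
        labels.put("author", "Author");
        LABELS = Collections.unmodifiableMap(labels);
    }

    // Formats one row as "Label: value" lines. A Map stands in for the driver's
    // Row here, so the lookup is row.get(column) rather than row.getString(column).
    static String formatRow(Map<String, String> row) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> column : LABELS.entrySet()) {
            out.append(column.getValue()).append(": ").append(row.get(column.getKey()))
               .append(System.lineSeparator());
        }
        return out.toString();
    }
}
```

Inside the driver loop, the equivalent would be to iterate over the same column names and append row.getString(column) for each, then print one blank line between rows.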

The CreateCassandraDatabase class is listed below.

package mongodb;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class CreateCassandraDatabase {

    private static Cluster cluster;
    private static Session session;

    public static void main(String[] argv) {
        // Contact the local Cassandra node; the driver discovers any other nodes.
        cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        try {
            session = cluster.connect();
            createKeyspace();
            createTable();
            insert();
            select();
        } finally {
            // Release the session and cluster resources even if a query fails.
            if (session != null) {
                session.close();
            }
            cluster.close();
        }
    }

    // Creates the datastax keyspace unless it already exists.
    private static void createKeyspace() {
        session.execute("CREATE KEYSPACE IF NOT EXISTS datastax WITH replication "
                + "= {'class':'SimpleStrategy', 'replication_factor':1};");
    }

    // Creates the catalog table with the compound primary key (catalog_id, journal).
    private static void createTable() {
        session.execute("CREATE TABLE IF NOT EXISTS datastax.catalog (catalog_id text, journal text, "
                + "publisher text, edition text, title text, author text, "
                + "PRIMARY KEY (catalog_id, journal))");
    }

    // Adds two rows; IF NOT EXISTS means re-running the application does not overwrite them.
    private static void insert() {
        session.execute("INSERT INTO datastax.catalog (catalog_id, journal, publisher, edition, title, author) "
                + "VALUES ('catalog1', 'Oracle Magazine', 'Oracle Publishing', "
                + "'November-December 2013', 'Engineering as a Service', 'David A. Kelly') IF NOT EXISTS");
        session.execute("INSERT INTO datastax.catalog (catalog_id, journal, publisher, edition, title, author) "
                + "VALUES ('catalog2', 'Oracle Magazine', 'Oracle Publishing', "
                + "'November-December 2013', 'Quintessential and Collaborative', 'Tom Haunert') IF NOT EXISTS");
    }

    // Reads back every row of the catalog table and prints each column.
    private static void select() {
        ResultSet results = session.execute("select * from datastax.catalog");
        for (Row row : results) {
            System.out.println("Catalog Id: " + row.getString("catalog_id"));
            System.out.println("\n");
            System.out.println("Journal: " + row.getString("journal"));
            System.out.println("\n");
            System.out.println("Publisher: " + row.getString("publisher"));
            System.out.println("\n");
            System.out.println("Edition: " + row.getString("edition"));
            System.out.println("\n");
            System.out.println("Title: " + row.getString("title"));
            System.out.println("\n");
            System.out.println("Author: " + row.getString("author"));
            System.out.println("\n");
        }
    }
}

  13. Run the CreateCassandraDatabase application to add two rows of data to the catalog table. Right-click CreateCassandraDatabase.java in the Package Explorer and select Run As ► Java Application, as shown in Figure 6-12.

The datastax keyspace and the catalog table are created, and the data is added to the table. The SELECT statement, run as a test, outputs the two rows added to Cassandra, as shown in Figure 6-13.

  14. To verify that the datastax keyspace was created in Cassandra, start the Cassandra command-line client with the following command. The cassandra-cli tool was removed in Apache Cassandra 3.0; if your Cassandra version does not include it, use the cassandra-cli from an earlier release such as Apache Cassandra 2.1.7.

cassandra-cli

  15. Run the following command to switch to the datastax keyspace.

use datastax;

The CLI reports that it is authenticated to the datastax keyspace, as shown in Figure 6-14.

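  16. To output the rows stored in the catalog table, run the following commands in the Cassandra-CLI. The three assume commands tell the CLI to interpret the row keys, the column values (validator), and the column names (comparator) of catalog as UTF-8 strings; the two GET commands then fetch the rows whose keys are catalog1 and catalog2.

assume catalog keys as utf8;
assume catalog validator as utf8;
assume catalog comparator as utf8;
GET catalog[utf8('catalog1')];
GET catalog[utf8('catalog2')];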

The two rows stored in the catalog table get listed as shown in Figure 6-15.

Next, we shall migrate the Cassandra data to MongoDB server.

Source: Vohra, Deepak (2015), Pro MongoDB™ Development, Apress, 1st edition.
