Input and Output: Memory-Mapped Files in Java

Most operating systems can take advantage of a virtual memory implementation to “map” a file, or a region of a file, into memory. Then the file can be accessed as if it were an in-memory array, which is much faster than the traditional file operations.

1. Memory-Mapped File Performance

At the end of this section, you can find a program that computes the CRC32 checksum of a file using traditional file input and a memory-mapped file. On one machine, we got the timing data shown in Table 2.5 when computing the checksum of the 37MB file rt.jar in the jre/lib directory of the JDK.

As you can see, on this particular machine, memory mapping is a bit faster than using buffered sequential input and dramatically faster than using a RandomAccessFile.

Of course, the exact values will differ greatly from one machine to another, but it is obvious that the performance gain, compared to random access, can be substantial. For sequential reading of files of moderate size, on the other hand, there is no reason to use memory mapping.

The java.nio package makes memory mapping quite simple. Here is what you do.

First, get a channel for the file. A channel is an abstraction for a disk file that lets you access operating system features such as memory mapping, file locking, and fast data transfers between files.

FileChannel channel = FileChannel.open(path, options);

Then, get a ByteBuffer from the channel by calling the map method of the FileChannel class. Specify the area of the file that you want to map and a mapping mode. Three modes are supported:

FileChannel.MapMode.READ_ONLY: The resulting buffer is read-only. Any attempt to write to the buffer results in a ReadOnlyBufferException.
FileChannel.MapMode.READ_WRITE: The resulting buffer is writable, and the changes will be written back to the file at some time. Note that other programs that have mapped the same file might not see those changes immediately. The exact behavior of simultaneous file mapping by multiple programs depends on the operating system.
FileChannel.MapMode.PRIVATE: The resulting buffer is writable, but any changes are private to this buffer and not propagated to the file.
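
The two steps described above, opening a channel and mapping it, can be sketched as follows. This is a minimal illustration, not the book's listing: the class name MapModeDemo, its method names, and the small temporary file are assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; not part of the book's listings.
class MapModeDemo {
    // Maps the whole file read-only and adds up its bytes.
    static long sumMappedBytes(Path path) {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            // A single mapping cannot exceed Integer.MAX_VALUE bytes.
            MappedByteBuffer buffer =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            long sum = 0;
            while (buffer.hasRemaining()) sum += buffer.get();
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates a four-byte temporary file and sums it through a mapping.
    static long demo() {
        try {
            Path path = Files.createTempFile("mapdemo", ".bin");
            path.toFile().deleteOnExit();
            Files.write(path, new byte[] { 1, 2, 3, 4 });
            return sumMappedBytes(path);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 1 + 2 + 3 + 4
    }
}
```

For READ_WRITE or PRIVATE mappings, the channel would have to be opened with both StandardOpenOption.READ and StandardOpenOption.WRITE.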

Once you have the buffer, you can read and write data using the methods of the ByteBuffer class and the Buffer superclass.

Buffers support both sequential and random data access. A buffer has a position that is advanced by get and put operations. For example, you can sequentially traverse all bytes in the buffer as

while (buffer.hasRemaining())
{
   byte b = buffer.get();
   …
}

Alternatively, you can use random access:

for (int i = 0; i < buffer.limit(); i++)
{
   byte b = buffer.get(i);
   …
}
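
The difference between the two access styles shows up in the buffer's position. The following sketch uses an in-memory buffer from ByteBuffer.wrap instead of a mapped file (an assumption made to keep it self-contained); the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical demo class comparing relative and absolute get.
class TraversalDemo {
    // Relative get: each call advances the position by one.
    static int sumSequential(ByteBuffer buffer) {
        int sum = 0;
        while (buffer.hasRemaining()) sum += buffer.get();
        return sum;
    }

    // Absolute get: reads at index i and leaves the position alone.
    static int sumRandom(ByteBuffer buffer) {
        int sum = 0;
        for (int i = 0; i < buffer.limit(); i++) sum += buffer.get(i);
        return sum;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.wrap(new byte[] { 10, 20, 30 });
        System.out.println(sumRandom(buffer) + " " + buffer.position());     // 60 0
        System.out.println(sumSequential(buffer) + " " + buffer.position()); // 60 3
    }
}
```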

You can also read and write arrays of bytes with the methods

get(byte[] bytes)

get(byte[] bytes, int offset, int length)
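
As a small sketch of the two bulk get methods, the following code reads a fixed-size prefix into a fresh array and then reads further bytes into the middle of a larger array. The class name, method names, and sample data are made up for the illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for the bulk get methods.
class BulkGetDemo {
    // get(byte[]) fills the whole array and advances the position by its length.
    static String readAscii(ByteBuffer buffer, int length) {
        byte[] bytes = new byte[length];
        buffer.get(bytes);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    // get(byte[], offset, length) fills only bytes[offset .. offset + length - 1].
    static byte[] readInto(ByteBuffer buffer, int arraySize, int offset, int length) {
        byte[] bytes = new byte[arraySize];
        buffer.get(bytes, offset, length);
        return bytes;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.wrap("HEADbody".getBytes(StandardCharsets.US_ASCII));
        System.out.println(readAscii(buffer, 4)); // HEAD
        byte[] padded = readInto(buffer, 6, 1, 4); // 0, 'b', 'o', 'd', 'y', 0
        System.out.println(padded.length + " " + buffer.remaining()); // 6 0
    }
}
```

Both methods throw a BufferUnderflowException if the buffer has fewer bytes remaining than were requested.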

Finally, there are methods

getInt getChar

getLong getFloat

getShort getDouble

to read primitive-type values that are stored as binary values in the file. As we already mentioned, Java uses big-endian ordering for binary data. However, if you need to process a file containing binary numbers in little-endian order, simply call

buffer.order(ByteOrder.LITTLE_ENDIAN);

To find out the current byte order of a buffer, call

ByteOrder b = buffer.order();
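
To make the effect of the byte order concrete, the following sketch reads the same four bytes as an int under both orders. The class name and the sample bytes are assumptions for the illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class showing how byte order changes getInt.
class ByteOrderDemo {
    // Interprets the first four bytes of the array as an int in the given order.
    static int readInt(byte[] bytes, ByteOrder order) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        buffer.order(order);
        return buffer.getInt();
    }

    public static void main(String[] args) {
        byte[] bytes = { 0x01, 0x02, 0x03, 0x04 };
        // A new buffer starts out big-endian, whatever the platform's native order is.
        System.out.println(ByteBuffer.wrap(bytes).order()); // BIG_ENDIAN
        System.out.println(Integer.toHexString(readInt(bytes, ByteOrder.BIG_ENDIAN)));    // 1020304
        System.out.println(Integer.toHexString(readInt(bytes, ByteOrder.LITTLE_ENDIAN))); // 4030201
    }
}
```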

To write numbers to a buffer, use one of the methods

putInt putChar

putLong putFloat

putShort putDouble

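At some point these changes are written back to the file; the operating system decides exactly when. To request that the changes in a mapped buffer be written out immediately, call the force method of the MappedByteBuffer class.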
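
Writing through a READ_WRITE mapping might look like the following sketch. It stores an int in a mapped temporary file, calls the force method of MappedByteBuffer to request that the change be written to storage, and then reads the file back with ordinary file input. The class name, method name, and temporary file are assumptions for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; writes an int through a READ_WRITE mapping.
class MappedWriteDemo {
    static int writeAndReadBack(int value) {
        try {
            Path path = Files.createTempFile("mapwrite", ".bin");
            path.toFile().deleteOnExit();
            Files.write(path, new byte[Integer.BYTES]); // four zero bytes to map

            // A READ_WRITE mapping needs a channel opened for reading and writing.
            try (FileChannel channel = FileChannel.open(path,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_WRITE, 0, Integer.BYTES);
                buffer.putInt(0, value); // absolute put, big-endian by default
                buffer.force();          // ask for the change to be written out now
            }

            // Read the file back without any mapping.
            return ByteBuffer.wrap(Files.readAllBytes(path)).getInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(writeAndReadBack(0xCAFEBABE))); // cafebabe
    }
}
```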

Listing 2.5 computes the 32-bit cyclic redundancy checksum (CRC32) of a file. That checksum is often used to determine whether a file has been corrupted. Corruption of a file makes it very likely that the checksum has changed. The java.util.zip package contains a class CRC32 that computes the checksum of a sequence of bytes, using the following loop:

var crc = new CRC32();
while (more bytes)
   crc.update(next byte);
long checksum = crc.getValue();

The details of the CRC computation are not important. We just use it as an example of a useful file operation. (In practice, you would read and update data in larger blocks, not a byte at a time. Then the speed differences are not as dramatic.)
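
The block-wise approach mentioned in the parenthetical remark could be sketched as follows. This is not the book's Listing 2.5; the class ChecksumDemo, its methods, the 8192-byte block size, and the temporary file are assumptions made for the illustration. One method reads blocks from an input stream, and the other passes a mapped buffer to the update(ByteBuffer) method of CRC32.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.zip.CRC32;

// Hypothetical demo class; not the book's Listing 2.5.
class ChecksumDemo {
    // Reads the file in blocks and updates the checksum one block at a time.
    static long checksumBlocks(Path path) {
        try (InputStream in = Files.newInputStream(path)) {
            CRC32 crc = new CRC32();
            byte[] block = new byte[8192];
            int n;
            while ((n = in.read(block)) != -1) crc.update(block, 0, n);
            return crc.getValue();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Maps the file and hands the whole buffer to the checksum.
    static long checksumMapped(Path path) {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer buffer =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            CRC32 crc = new CRC32();
            crc.update(buffer); // consumes all remaining bytes of the buffer
            return crc.getValue();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the text to a temporary file and returns both checksums.
    static long[] demo(String text) {
        try {
            Path path = Files.createTempFile("crcdemo", ".txt");
            path.toFile().deleteOnExit();
            Files.write(path, text.getBytes(StandardCharsets.US_ASCII));
            return new long[] { checksumBlocks(path), checksumMapped(path) };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] sums = demo("123456789");
        // cbf43926 is the standard CRC32 check value for "123456789"
        System.out.println(Long.toHexString(sums[0]) + " " + Long.toHexString(sums[1]));
    }
}
```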

Run the program as

java memoryMap.MemoryMapTest filename

Listing 2.5 memoryMap/MemoryMapTest.java

2. The Buffer Data Structure

When you use memory mapping, you make a single buffer that spans the entire file or the area of the file that you’re interested in. You can also use buffers to read and write more modest chunks of information.

In this section, we briefly describe the basic operations on Buffer objects. A buffer is an array of values of the same type. The Buffer class is an abstract class with concrete subclasses ByteBuffer, CharBuffer, DoubleBuffer, FloatBuffer, IntBuffer, LongBuffer, and ShortBuffer.

In practice, you will most commonly use ByteBuffer and CharBuffer. As shown in Figure 2.9, a buffer has

A capacity that never changes
A position at which the next value is read or written
A limit beyond which reading and writing is meaningless
Optionally, a mark for repeating a read or write operation

These values fulfill the condition

0 ≤ mark ≤ position ≤ limit ≤ capacity

The principal purpose of a buffer is a “write, then read” cycle. At the outset, the buffer’s position is 0 and the limit is the capacity. Keep calling put to add values to the buffer. When you run out of data or reach the capacity, it is time to switch to reading.

Call flip to set the limit to the current position and the position to 0. Now keep calling get while the remaining method (which returns limit – position) is positive. When you have read all values in the buffer, call clear to prepare the buffer for the next writing cycle. The clear method resets the position to 0 and the limit to the capacity.

If you want to reread the buffer, use rewind or mark/reset (see the API notes for details).
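
The "write, then read" cycle can be traced by recording the position and limit after each step. In this sketch the class name, the helper method state, and the capacity of 8 are assumptions chosen for the illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical demo class tracing position/limit through one buffer cycle.
class FlipDemo {
    // Formats the buffer state as "position/limit".
    static String state(ByteBuffer buffer) {
        return buffer.position() + "/" + buffer.limit();
    }

    static String cycle() {
        ByteBuffer buffer = ByteBuffer.allocate(8);
        StringBuilder log = new StringBuilder();

        buffer.put((byte) 1).put((byte) 2).put((byte) 3); // writing phase
        log.append(state(buffer)).append(' ');

        buffer.flip();                                    // limit = position, position = 0
        log.append(state(buffer)).append(' ');

        while (buffer.remaining() > 0) buffer.get();      // reading phase
        log.append(state(buffer)).append(' ');

        buffer.rewind();                                  // position = 0, limit unchanged
        log.append(state(buffer)).append(' ');

        buffer.clear();                                   // position = 0, limit = capacity
        log.append(state(buffer));
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(cycle()); // 3/8 0/3 3/3 0/3 0/8
    }
}
```

Note that clear does not erase the bytes in the buffer; it only resets the position and limit so that new data can overwrite the old.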

To get a buffer, call a static method such as ByteBuffer.allocate or ByteBuffer.wrap.

Then, you can fill a buffer from a channel, or write its contents to a channel. For example,

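ByteBuffer buffer = ByteBuffer.allocate(RECORD_SIZE);
channel.read(buffer); // fill the buffer from the channel's current position
channel.position(newpos);
buffer.flip(); // switch the buffer from being filled to being drained
channel.write(buffer); // write the buffered bytes at newpos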

This can be a useful alternative to a random-access file.
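
As a self-contained sketch of this pattern, the following code copies one fixed-size record over another within the same file. The class name, the record size of 4 bytes, and the sample file contents are assumptions for the illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; copies a record within a file through a buffer.
class RecordCopyDemo {
    static final int RECORD_SIZE = 4; // assumed record size for this sketch

    // Reads one record at offset 'from' and writes it at offset 'to'.
    static void copyRecord(FileChannel channel, long from, long to) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(RECORD_SIZE);
        channel.position(from);
        // A single read may deliver fewer bytes than requested, so loop.
        while (buffer.hasRemaining() && channel.read(buffer) != -1) { }
        buffer.flip();
        channel.position(to);
        while (buffer.hasRemaining()) channel.write(buffer);
    }

    // Creates a file with two records and overwrites the second with the first.
    static String demo() {
        try {
            Path path = Files.createTempFile("records", ".dat");
            path.toFile().deleteOnExit();
            Files.write(path, "AAAABBBB".getBytes(StandardCharsets.US_ASCII));
            try (FileChannel channel = FileChannel.open(path,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                copyRecord(channel, 0, RECORD_SIZE);
            }
            return new String(Files.readAllBytes(path), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // AAAAAAAA
    }
}
```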

Source: Horstmann Cay S. (2019), Core Java. Volume II – Advanced Features, Pearson; 11th edition.
