JAVA NIO – Buffer

A Buffer is abstract class of java.nio, which contains fixed amount of data. It can store data and later retrieve those data. Buffer has key three properties:

  • Capacity: A buffer’s capacity is the number of elements it contains. It is constant number that doesn’t change
  • Limit: A buffer limits is the index of the first element that should not be read or written. It always less than buffer’s capacity.
  • Position: A buffer position is the index of next element to be read from the buffer.

There is one subclass of this class for each non-boolean primitive type e.g. CharBuffer, IntBuffer etc. Buffer is abstract class that extended by non-Boolean primitive that provide common behavior across the various Buffers as shown below.

JAVA NIO Buffer

A buffer is a linear, finite sequence of elements of a specific primitive type wrapped inside an object. It constitutes date content and information about the data into single object. Data may transfer in to or out of the buffer by the I/O operations by a channel at the specified position. The I/O operation can read and write at the current position and then increment the position by the number of elements transferred. Buffer throws BufferUnderflowException and BufferOverflowException if position exceed while get operation or put operation respectively. The position field of Buffer specified the position to retrieve or insert the data element inside Buffer. Limit fields of Buffer indicates the end of the buffer which can be set using below operation

public final Buffer limit(int newLimit)

We can also drain the buffer by using flip method as below operation

public final Buffer flip()

Flip method set the limit to the current position and position to 0.The rewind () method is similar to flip () but does not affect the limit. It only sets the position back to 0. You can use rewind () to go back and reread the data in a buffer that has already been flipped.

Byte Buffers:

ByteBuffer is similar to OS ByteBuffer that could be mapped to OS’s ByteBuffer without any translation. ByteBuffer uses Byte core unit to read and write IO data, which is more significant, compare to other primitive data type buffers.

Byte buffers can be created either by allocation, which allocates space for the buffer’s content, or by wrapping an existing byte array into a buffer.

ByteBuffer are two type Direct Buffer and Indirect Buffer.

Direct & Indirect Buffers

ByteBuffer are two types Direct Buffer and Indirect Buffers. The key difference between direct buffer and indirect buffer is that direct buffer could directly access native IO calls whereas indirect buffers could not. Direct buffer access file data directly by using native I/O operation to fill or drain byte buffer. Direct buffer is memory consuming but other side it provided most efficient I/O mechanism. It doesn’t copy the buffer’s content to an intermediate buffer or vice versa.

Indirect buffer could also be used to pass the data but it could not directly uses native I/O operation upon it. Indirect buffer indirectly uses temporary direct buffer to access file IO data.

A direct buffer could be created by allocateDirect factory method, which is more expensive, and therefore it is advisable to use direct buffers only when they yield a measureable gain in program performance.

Whether a byte buffer is direct or non-direct may be determined by invoking its isDirect () method. This method is provided so that explicit buffer management can be done in performance-critical code

JAVA NIO – Introduction

Operating System allocates memory to JVM to process its task. In old JDK (>1.4), JVM uses FileSystem API to access file from hard disk which is quite burden to JVM because there is no direct reference between JVM and File inside hard disk. JVM uses Operating System system calls to access file. Operating System stores files into large ByteBuffer, which is quite large compare to byte stream used by JVM. JVM uses extra efforts to convert ByteBuffer to byte stream and vice versa.

So there were two key challenges in existing old JDK

  • It could not access file directly from disk
  • It has to do extra efforts to convert files data to byte stream

Following are the key steps to access Files

  1. Operating System allocate memory to JVM
  2. Client invoke FileSystem to access specific file from OS
  3. JVM make OS system call to access to File data
  4. JVM get OS’s data as ByteBuffer and converts it into Byte Stream

For large file OS uses Virtual Memory to store data outside of RAM. The benefit of Virtual Memory is that it is sharable across multiple processes and hence VM could be accesses by OS and JVM both. From transferring data from OS to VM could be quite fast by using DMA (Direct Memory Access) whereas transferring data from VM to JVM is slow because JVM does extra efforts to break large data buffer to byte stream.

JAVA FileSystem: Java uses FileSystem API to access physical storage inside system. When client try to access a particular file via FileSystem, FileSystem identify storage location and load those disk stores into memory.

File’s data stores into multiple pages and page contain group of block. Kernel establishes mapping between memory pages and filesystem pages.

The Virtual memory read paging content from disk and uses page fault to synchronize the file data to Virtual Memory. Once pageins completed FileSystem read the file contents and its Meta information

JAVA NIO

JAVA NIO that introduce JDK 1.4 and keep enhancing on newer version improve the I/O operations. It provides new type of buffers such as ByteBuffer, CharBuffer, and IntBuffer etc. that reduce the overhead during transferring data from OS to JVM. Java NIO could be able to map directly from VM to JVM bye using new ByteBuffer. JAVA ByteBuffer is same as OS Byte Buffer that’s why it could be easily mapped from OS Byte Buffer to JVM Byte Buffer so less overhead on data conversion from OS to JVM.

JAVA NIO2

As per above diagram if OS uses VM and JVM use newly NIO interface, it enhances the performance while processing file especially large file. We will also discuss later there are another MappedByteBuffer which have capabilities to directly process the file in VM without transferring data from VM to JVM.

Hadoop HDFS JAVA API

Hadoop’s org.apache.hadoop.fs.FileSystem is generic class to access and manage HDFS files/directories located in distributed environment. File’s content stored inside datanode with multiple equal large sizes of blocks (e.g. 64 MB), and namenode keep the information of those blocks and Meta information. FileSystem read and stream by accessing blocks in sequence order. FileSystem first get blocks information from NameNode then open, read and close one by one. It opens first blocks once it complete then close and open next block. HDFS replicate the block to give higher reliability and scalability and if client is one of the datanode then it tries to access block locally if fail then move to other cluster datanode. Continue reading

Hadoop MapReduce

Introduction
HDFS stores file in multiple equal large size block e.g. 64 MB, 128 MB etc. and MapReduce framework access and process these files in distributed environment.
The MapReduce framework works on key-value pairs, it has two key part Mapper and Reducer.Map Reducers read file and split and pass to Mapper. Mapper set the input as key-value pairs and pass to the intermediate for sorting and shuffling. Reducer takes the key and list of value, process and writes to the disk. Continue reading