JAVA NIO – Introduction

Operating System allocates memory to JVM to process its task. In old JDK (>1.4), JVM uses FileSystem API to access file from hard disk which is quite burden to JVM because there is no direct reference between JVM and File inside hard disk. JVM uses Operating System system calls to access file. Operating System stores files into large ByteBuffer, which is quite large compare to byte stream used by JVM. JVM uses extra efforts to convert ByteBuffer to byte stream and vice versa.

So there were two key challenges in existing old JDK

  • It could not access file directly from disk
  • It has to do extra efforts to convert files data to byte stream

Following are the key steps to access Files

  1. Operating System allocate memory to JVM
  2. Client invoke FileSystem to access specific file from OS
  3. JVM make OS system call to access to File data
  4. JVM get OS’s data as ByteBuffer and converts it into Byte Stream

For large file OS uses Virtual Memory to store data outside of RAM. The benefit of Virtual Memory is that it is sharable across multiple processes and hence VM could be accesses by OS and JVM both. From transferring data from OS to VM could be quite fast by using DMA (Direct Memory Access) whereas transferring data from VM to JVM is slow because JVM does extra efforts to break large data buffer to byte stream.

JAVA FileSystem: Java uses FileSystem API to access physical storage inside system. When client try to access a particular file via FileSystem, FileSystem identify storage location and load those disk stores into memory.

File’s data stores into multiple pages and page contain group of block. Kernel establishes mapping between memory pages and filesystem pages.

The Virtual memory read paging content from disk and uses page fault to synchronize the file data to Virtual Memory. Once pageins completed FileSystem read the file contents and its Meta information


JAVA NIO that introduce JDK 1.4 and keep enhancing on newer version improve the I/O operations. It provides new type of buffers such as ByteBuffer, CharBuffer, and IntBuffer etc. that reduce the overhead during transferring data from OS to JVM. Java NIO could be able to map directly from VM to JVM bye using new ByteBuffer. JAVA ByteBuffer is same as OS Byte Buffer that’s why it could be easily mapped from OS Byte Buffer to JVM Byte Buffer so less overhead on data conversion from OS to JVM.


As per above diagram if OS uses VM and JVM use newly NIO interface, it enhances the performance while processing file especially large file. We will also discuss later there are another MappedByteBuffer which have capabilities to directly process the file in VM without transferring data from VM to JVM.


Hadoop’s org.apache.hadoop.fs.FileSystem is generic class to access and manage HDFS files/directories located in distributed environment. File’s content stored inside datanode with multiple equal large sizes of blocks (e.g. 64 MB), and namenode keep the information of those blocks and Meta information. FileSystem read and stream by accessing blocks in sequence order. FileSystem first get blocks information from NameNode then open, read and close one by one. It opens first blocks once it complete then close and open next block. HDFS replicate the block to give higher reliability and scalability and if client is one of the datanode then it tries to access block locally if fail then move to other cluster datanode. Continue reading

Hadoop MapReduce

HDFS stores file in multiple equal large size block e.g. 64 MB, 128 MB etc. and MapReduce framework access and process these files in distributed environment.
The MapReduce framework works on key-value pairs, it has two key part Mapper and Reducer.Map Reducers read file and split and pass to Mapper. Mapper set the input as key-value pairs and pass to the intermediate for sorting and shuffling. Reducer takes the key and list of value, process and writes to the disk. Continue reading