Java NIO

Ron Hitchens


The Java New I/O (NIO) packages in J2SE 1.4 introduce many new, indispensable features previously unavailable to Java programmers. These include APIs for high-performance I/O operations, regular expression processing, and character set coding. These new libraries are a treasure trove for Java developers. The NIO APIs are especially valuable where high-performance I/O is a requirement, but they can also be useful in a wide range of scenarios. The new APIs let you work directly with I/O buffers, multiplex nonblocking streams, do scattering reads and gathering writes, do channel-to-channel transfers, work with memory-mapped files, manage file locks, and much more. The new high-performance regular expression library provides sophisticated, Perl-like regex-processing features such as pattern matching, search and replace, capture groups, look-ahead assertions, and many others. The Charset API gives you complete control over character set encoding and decoding, which are vital for properly managing the exchange of documents on the Web, for localization, or for other purposes. You can also create and install your own custom character sets.

Staying current with the latest Java technology is never easy. NIO, new in Java 1.4, is quite possibly the most important new Java feature since Swing. Understanding it thoroughly is essential for any serious Java developer. NIO closes the gap between Java and natively compiled languages and enables Java applications to achieve maximum I/O performance by effectively leveraging operating-system services in a portable way.

Java NIO is a comprehensive guide to the Java New I/O facilities. It lets you take full advantage of NIO features and shows you how they work, what they can do for you, and when you should use them. This book brings you up to speed on NIO and shows you how to bring your I/O-bound Java applications up to speed as well. Java NIO is an essential part of any Java professional's library.


Mentioned in questions and answers.

To allocate() or to allocateDirect(), that is the question.

For some years now I've stuck to the idea that, since DirectByteBuffers are a direct memory mapping at the OS level, they should perform get/put calls faster than HeapByteBuffers. I was never really interested in finding out the exact details until now. I want to know which of the two types of ByteBuffer is faster and under what conditions.

Ron Hitchens, in his excellent book Java NIO, seems to offer what I thought could be a good answer to your question:

Operating systems perform I/O operations on memory areas. These memory areas, as far as the operating system is concerned, are contiguous sequences of bytes. It's no surprise then that only byte buffers are eligible to participate in I/O operations. Also recall that the operating system will directly access the address space of the process, in this case the JVM process, to transfer the data. This means that memory areas that are targets of I/O operations must be contiguous sequences of bytes. In the JVM, an array of bytes may not be stored contiguously in memory, or the garbage collector could move it at any time. Arrays are objects in Java, and the way data is stored inside that object could vary from one JVM implementation to another.

For this reason, the notion of a direct buffer was introduced. Direct buffers are intended for interaction with channels and native I/O routines. They make a best effort to store the byte elements in a memory area that a channel can use for direct, or raw, access by using native code to tell the operating system to drain or fill the memory area directly.

Direct byte buffers are usually the best choice for I/O operations. By design, they support the most efficient I/O mechanism available to the JVM. Nondirect byte buffers can be passed to channels, but doing so may incur a performance penalty. It's usually not possible for a nondirect buffer to be the target of a native I/O operation. If you pass a nondirect ByteBuffer object to a channel for write, the channel may implicitly do the following on each call:

  1. Create a temporary direct ByteBuffer object.
  2. Copy the content of the nondirect buffer to the temporary buffer.
  3. Perform the low-level I/O operation using the temporary buffer.
  4. The temporary buffer object goes out of scope and is eventually garbage collected.

This can potentially result in buffer copying and object churn on every I/O, which are exactly the sorts of things we'd like to avoid. However, depending on the implementation, things may not be this bad. The runtime will likely cache and reuse direct buffers or perform other clever tricks to boost throughput. If you're simply creating a buffer for one-time use, the difference is not significant. On the other hand, if you will be using the buffer repeatedly in a high-performance scenario, you're better off allocating direct buffers and reusing them.

Direct buffers are optimal for I/O, but they may be more expensive to create than nondirect byte buffers. The memory used by direct buffers is allocated by calling through to native, operating system-specific code, bypassing the standard JVM heap. Setting up and tearing down direct buffers could be significantly more expensive than heap-resident buffers, depending on the host operating system and JVM implementation. The memory-storage areas of direct buffers are not subject to garbage collection because they are outside the standard JVM heap.

The performance tradeoffs of using direct versus nondirect buffers can vary widely by JVM, operating system, and code design. By allocating memory outside the heap, you may subject your application to additional forces of which the JVM is unaware. When bringing additional moving parts into play, make sure that you're achieving the desired effect. I recommend the old software maxim: first make it work, then make it fast. Don't worry too much about optimization up front; concentrate first on correctness. The JVM implementation may be able to perform buffer caching or other optimizations that will give you the performance you need without a lot of unnecessary effort on your part.
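As a small illustration of the reuse advice above, here is a minimal sketch (the class name, file name, and buffer size are arbitrary choices of mine, not from the book) that allocates one direct buffer up front and reuses it for every read from a channel, instead of creating a fresh heap buffer per call:

    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class DirectBufferReuse {
        public static void main(String[] args) throws Exception {
            // Allocate the direct buffer once; setup is relatively expensive,
            // so the cost is amortized over many I/O calls.
            ByteBuffer buffer = ByteBuffer.allocateDirect(64 * 1024);

            try (FileChannel channel = FileChannel.open(Paths.get("data.bin"),
                    StandardOpenOption.READ)) {
                long total = 0;
                while (channel.read(buffer) != -1) {
                    buffer.flip();
                    total += buffer.remaining(); // process the bytes here instead of just counting them
                    buffer.clear();              // make the buffer ready for the next read
                }
                System.out.println("Read " + total + " bytes");
            }
        }
    }

Swapping allocateDirect for allocate gives the heap-buffer variant, which is the comparison the question is asking about.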

I'm wondering how expensive it is to have many threads in a waiting state in Java 1.6 x64.

To be more specific, I'm writing an application which runs across many computers and sends/receives data from one to another. I feel more comfortable having a separate thread for each connected machine and task, like 1) sending data, 2) receiving data, 3) reestablishing the connection when it is dropped. So, given that there are N nodes in the cluster, each machine is going to have 3 threads for each of its N-1 neighbours. Typically there will be 12 machines, which comes to 33 communication threads.

Most of those threads will be sleeping most of the time, so for optimization purposes I could reduce the number of threads and give more work to each of them. For example, reestablishing the connection could be the responsibility of the receiving thread, or sending to all connected machines could be done by a single thread.

So is there any significant performance impact of having many sleeping threads?

We had very much the same problem before we switched to NIO, so I will second Liedman's recommendation to go with that framework. You should be able to find a tutorial, but if you want the details, I can recommend Java NIO by Ron Hitchens.

Switching to NIO significantly increased the number of connections we could handle, which was really critical for us.

RandomAccessFile is quite slow for random access to a file. You often read about implementing a buffered layer over it, but code that does this is hard to find online.

So my question is: would those of you who know of any open-source implementation of such a class share a pointer, or share your own implementation?

It would be nice if this question turned out to be a collection of useful links and code about this problem, which, I'm sure, is shared by many and was never addressed properly by Sun.

Please, no references to memory mapping, as the files can be way bigger than Integer.MAX_VALUE.

Well, I do not see a reason not to use java.nio.MappedByteBuffer even if the files are bigger than Integer.MAX_VALUE.

Evidently you will not be allowed to define a single MappedByteBuffer for the whole file. But you could have several MappedByteBuffers accessing different regions of the file.

The position and size parameters of FileChannel.map are of type long, which means you can provide values over Integer.MAX_VALUE; the only thing you have to take care of is that the size of your buffer is not bigger than Integer.MAX_VALUE.

Therefore, you could define several maps like this:

buffer[0] = fileChannel.map(FileChannel.MapMode.READ_WRITE, 0L, Integer.MAX_VALUE);
buffer[1] = fileChannel.map(FileChannel.MapMode.READ_WRITE, 2147483647L, Integer.MAX_VALUE);
buffer[2] = fileChannel.map(FileChannel.MapMode.READ_WRITE, 4294967294L, Integer.MAX_VALUE);
...

In summary, the size cannot be bigger than Integer.MAX_VALUE, but the start position can be anywhere in your file.
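Putting that together, here is a minimal sketch (the class name, file name, and example offset are placeholders of mine) that maps a file larger than Integer.MAX_VALUE in fixed-size regions and reads a byte at an arbitrary long position:

    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class LargeFileMapper {
        private static final long REGION_SIZE = Integer.MAX_VALUE; // max size of a single mapping

        public static void main(String[] args) throws Exception {
            try (FileChannel fileChannel = FileChannel.open(Paths.get("huge.dat"),
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                long fileSize = fileChannel.size();
                int regions = (int) ((fileSize + REGION_SIZE - 1) / REGION_SIZE);
                MappedByteBuffer[] buffers = new MappedByteBuffer[regions];
                for (int i = 0; i < regions; i++) {
                    long start = i * REGION_SIZE;
                    long size = Math.min(REGION_SIZE, fileSize - start);
                    buffers[i] = fileChannel.map(FileChannel.MapMode.READ_WRITE, start, size);
                }

                // Read one byte at a position beyond Integer.MAX_VALUE.
                long position = 3_000_000_000L;
                byte b = buffers[(int) (position / REGION_SIZE)].get((int) (position % REGION_SIZE));
                System.out.println(b);
            }
        }
    }

Each region is at most Integer.MAX_VALUE bytes, but together the mappings cover the whole file.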

In the book Java NIO, the author Ron Hitchens states:

Accessing a file through the memory-mapping mechanism can be far more efficient than reading or writing data by conventional means, even when using channels. No explicit system calls need to be made, which can be time-consuming. More importantly, the virtual memory system of the operating system automatically caches memory pages. These pages will be cached using system memory and will not consume space from the JVM's memory heap.

Once a memory page has been made valid (brought in from disk), it can be accessed again at full hardware speed without the need to make another system call to get the data. Large, structured files that contain indexes or other sections that are referenced or updated frequently can benefit tremendously from memory mapping. When combined with file locking to protect critical sections and control transactional atomicity, you begin to see how memory mapped buffers can be put to good use.

I really doubt that you will find a third-party API that does something better than that. Perhaps you may find an API written on top of this architecture to simplify the work.

Don't you think that this approach ought to work for you?

Currently I am using Scanner/FileReader and a while (hasNextLine()) loop. I think this method is not very efficient. Is there any other method to read a file with similar functionality?

public void Read(String file) {
        Scanner sc = null;

        try {
            sc = new Scanner(new FileReader(file));

            while (sc.hasNextLine()) {
                String text = sc.nextLine();
                String[] file_Array = text.split(" ", 3);

                if (file_Array[0].equalsIgnoreCase("case")) {
                    //do something
                } else if (file_Array[0].equalsIgnoreCase("object")) {
                    //do something
                } else if (file_Array[0].equalsIgnoreCase("classes")) {
                    //do something
                } else if (file_Array[0].equalsIgnoreCase("function")) {
                    //do something
                } 
                else if (file_Array[0].equalsIgnoreCase("ignore")) {
                    //do something
                }
                else if (file_Array[0].equalsIgnoreCase("display")) {
                    //do something
                }
            }

        } catch (FileNotFoundException e) {
            System.out.println("Input file " + file + " not found");
            System.exit(1);
        } finally {
            sc.close();
        }
    }

You can use FileChannel and ByteBuffer from Java NIO. From what I have observed, the ByteBuffer size is the most critical factor in reading data faster. The code below reads the content of the file.

static public void main(String args[]) throws Exception
{
    try (FileInputStream fileInputStream = new FileInputStream(new File("sample4.txt"));
         FileChannel fileChannel = fileInputStream.getChannel()) {
        ByteBuffer byteBuffer = ByteBuffer.allocate(1024);

        // Keep reading until the channel signals end-of-file (read returns -1)
        while (fileChannel.read(byteBuffer) != -1) {
            byteBuffer.flip();                      // switch the buffer to draining mode
            while (byteBuffer.hasRemaining()) {
                System.out.print((char) byteBuffer.get());
            }
            byteBuffer.clear();                     // make it ready for the next read
        }
    }
}

You can check for '\n' to detect line breaks here. Thanks.

You can also use a scattering read to fill several buffers in one operation, which can read files faster, i.e.

fileChannel.read(buffers);

where

      ByteBuffer b1 = ByteBuffer.allocate(B1);
      ByteBuffer b2 = ByteBuffer.allocate(B2);
      ByteBuffer b3 = ByteBuffer.allocate(B3);

      ByteBuffer[] buffers = {b1, b2, b3};

This saves the user process from making several system calls (which can be expensive) and allows the kernel to optimize handling of the data, because it has information about the total transfer. If multiple CPUs are available, it may even be possible to fill and drain several buffers simultaneously.

From this book.
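For completeness, here is a minimal runnable sketch of a scattering read along those lines (the class name, file name, and buffer sizes are placeholders I chose):

    import java.io.FileInputStream;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;

    public class ScatterReadExample {
        public static void main(String[] args) throws Exception {
            try (FileInputStream in = new FileInputStream("sample4.txt");
                 FileChannel fileChannel = in.getChannel()) {
                // A header-sized buffer and a larger body buffer, filled in one call.
                ByteBuffer header = ByteBuffer.allocate(128);
                ByteBuffer body = ByteBuffer.allocate(4096);
                ByteBuffer[] buffers = {header, body};

                long bytesRead = fileChannel.read(buffers); // scattering read
                System.out.println("Read " + bytesRead + " bytes");
            }
        }
    }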

For me, below are the most plausible definitions of asynchronous and non-blocking I/O:

Asynchronous I/O: In asynchronous I/O, the application returns immediately and the OS will let it know when the bytes are available for processing.

Non-blocking I/O: Here the call returns immediately with whatever data is available, and the application needs a polling mechanism to find out when more data is ready.

With these definitions in mind, if we analyze the Java channels, i.e. SocketChannel, ServerSocketChannel and DatagramChannel, we find that these channels can be used in blocking or non-blocking mode via the method configureBlocking(boolean block). Assume that we are using them in non-blocking mode. So here comes the question:

If I use a Selector, i.e. register channels with a selector, is that asynchronous I/O or non-blocking I/O?

I feel this is asynchronous I/O in Java if and only if the underlying operating system is informing the Java application about the readiness selection of a channel. Otherwise it is non-blocking I/O, and the selector is just a mechanism which helps us poll the above-mentioned channels, as I mentioned in the definition. Which is correct? Thanks in advance.

EDIT:

I have answered one part of the question, i.e. the types of I/O and how Java facilitates this functionality.

But one question still remains: is all of this functionality provided by Java simulated at the Java layer, or does it use the underlying OS to provide it? Assume the underlying OS has full support for these features.

Please refer to the answer.

I thought of answering my own question by doing some more homework. This post will also help in understanding the I/O concepts with respect to the underlying OS.

  • Blocking I/O: FileInputStream, FileOutputStream and even reading from or writing to a Socket come under this category.

  • Non-blocking I/O: this is used by socket channels like ServerSocketChannel, SocketChannel and DatagramChannel in Java.

  • Multiplexed I/O: in Java this is used by the Selector to handle multiple channels, and these channels should be non-blocking by nature. So socket channels can be registered with a Selector, and the Selector can manage them via the I/O multiplexing facility of the underlying OS.

  • Now comes asynchronous I/O. In asynchronous I/O, the application returns immediately and the OS will let it know when the bytes are available for processing. In Java it is facilitated by AsynchronousSocketChannel, AsynchronousServerSocketChannel and AsynchronousFileChannel (see the sketch after this list).
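A minimal sketch of that asynchronous style with AsynchronousFileChannel follows; the class name and file name are placeholders of mine. The read call returns immediately and the CompletionHandler is invoked once the OS has the data:

    import java.nio.ByteBuffer;
    import java.nio.channels.AsynchronousFileChannel;
    import java.nio.channels.CompletionHandler;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;
    import java.util.concurrent.CountDownLatch;

    public class AsyncFileReadExample {
        public static void main(String[] args) throws Exception {
            CountDownLatch done = new CountDownLatch(1);
            AsynchronousFileChannel channel = AsynchronousFileChannel.open(
                    Paths.get("sample.txt"), StandardOpenOption.READ);
            ByteBuffer buffer = ByteBuffer.allocate(1024);

            // The call returns immediately; the handler runs when the OS has the data.
            channel.read(buffer, 0, buffer, new CompletionHandler<Integer, ByteBuffer>() {
                @Override
                public void completed(Integer bytesRead, ByteBuffer attachment) {
                    System.out.println("Read " + bytesRead + " bytes asynchronously");
                    done.countDown();
                }

                @Override
                public void failed(Throwable exc, ByteBuffer attachment) {
                    exc.printStackTrace();
                    done.countDown();
                }
            });

            done.await();   // wait so the JVM does not exit before the callback fires
            channel.close();
        }
    }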

For all of the above functionality, Java relies heavily on the underlying OS. This became evident when I was going through the book. In chapter 4 the author mentions that:

True readiness selection must be done by the operating system. One of the most important functions performed by an operating system is to handle I/O requests and notify processes when their data is ready. So it only makes sense to delegate this function down to the operating system. The Selector class provides the abstraction by which Java code can request readiness selection service from the underlying operating system in a portable way.

Hence it's clear that Java uses underlying OS heavily for these features.
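To make the non-blocking, multiplexed case concrete as well, here is a minimal sketch of readiness selection with a Selector; the port number, buffer size and echo behaviour are arbitrary choices of mine, not from the book:

    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;
    import java.util.Iterator;

    public class SelectorExample {
        public static void main(String[] args) throws Exception {
            Selector selector = Selector.open();
            ServerSocketChannel server = ServerSocketChannel.open();
            server.bind(new InetSocketAddress(9000));
            server.configureBlocking(false);                 // must be non-blocking to register
            server.register(selector, SelectionKey.OP_ACCEPT);

            while (true) {
                selector.select();                           // blocks until the OS reports readiness
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        ByteBuffer buffer = ByteBuffer.allocate(1024);
                        if (client.read(buffer) == -1) {     // peer closed the connection
                            client.close();
                        } else {
                            buffer.flip();
                            client.write(buffer);            // echo the data back
                        }
                    }
                }
            }
        }
    }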

I'm trying to print to a device which supports CP866 encoding only.

Unfortunately the device from which I'm printing (an Android device) does not support CP866, resulting in "abc".getBytes("CP866") throwing the UnsupportedEncodingException.

So, I guess, I have to do the Unicode to CP866 encoding myself. Is there any freeware Java library that does that?

I needed to encode a string with Cp866 on Android. You can use a Java library with ready-made charset classes; Cp866 is among them.

This is the link: http://www.doc.ic.ac.uk/~awl03/cgi-bin/trac.cgi/miro/browser/trunk/gcc/libjava/classpath/gnu/java/nio/charset

If you want to extend the Charset class and add your own private Charset, see Java NIO, Chapter 6, Character Sets.
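For reference, here is a minimal sketch of what extending Charset involves, along the lines of that chapter. The class name is a placeholder of mine and the mapping table is left empty, so this is not a real CP866 implementation:

    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.charset.Charset;
    import java.nio.charset.CharsetDecoder;
    import java.nio.charset.CharsetEncoder;
    import java.nio.charset.CoderResult;

    // Skeleton of a custom single-byte charset; the mapping table is a placeholder.
    public class Cp866Charset extends Charset {

        // A real implementation would fill this with the 256 CP866-to-Unicode mappings.
        private static final char[] TABLE = new char[256];

        public Cp866Charset() {
            super("x-custom-Cp866", new String[0]);
        }

        @Override
        public boolean contains(Charset cs) {
            return cs instanceof Cp866Charset;
        }

        @Override
        public CharsetDecoder newDecoder() {
            return new CharsetDecoder(this, 1.0f, 1.0f) {
                @Override
                protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
                    while (in.hasRemaining()) {
                        if (!out.hasRemaining()) return CoderResult.OVERFLOW;
                        out.put(TABLE[in.get() & 0xFF]);   // byte -> char via the table
                    }
                    return CoderResult.UNDERFLOW;
                }
            };
        }

        @Override
        public CharsetEncoder newEncoder() {
            return new CharsetEncoder(this, 1.0f, 1.0f) {
                @Override
                protected CoderResult encodeLoop(CharBuffer in, ByteBuffer out) {
                    while (in.hasRemaining()) {
                        if (!out.hasRemaining()) return CoderResult.OVERFLOW;
                        char c = in.get();
                        // Simplified: a real encoder would do a reverse lookup into the
                        // CP866 table instead of narrowing the char to a byte.
                        out.put((byte) c);
                    }
                    return CoderResult.UNDERFLOW;
                }
            };
        }
    }

To make such a charset visible through Charset.forName, you would additionally register a CharsetProvider subclass via META-INF/services/java.nio.charset.spi.CharsetProvider.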

I was reading the book and it has the following lines:

A MappedByteBuffer directly reflects the disk file with which it is associated. If the file is structurally modified while the mapping is in effect, strange behavior can result (exact behaviors are, of course, operating system- and filesystem-dependent). A MappedByteBuffer has a fixed size, but the file it's mapped to is elastic. Specifically, if a file's size changes while the mapping is in effect, some or all of the buffer may become inaccessible, undefined data could be returned, or unchecked exceptions could be thrown.

So my questions are:

  • Can't I append text to files which I have already mapped? If yes, then how?
  • Can somebody please explain the real use cases of memory-mapped files? It would be great if you could mention what specific problem you have solved with them.

Please bear with me if the questions are pretty naive. Thanks.

Memory-mapped files are much faster than the regular ByteBuffer approach, but they allocate the whole mapped region up front: for example, if you map 4 MB, the operating system will create a 4 MB file on the filesystem, map it into memory, and you can write to the file directly just by writing to that memory. This is handy when you know exactly how much data you want to write, because if you write less than the specified size, the rest of the mapped region will be filled with zeros. Also, Windows will lock the file (it can't be deleted until the JVM exits); this is not the case on Linux.

Below is an example of appending to a file with a memory-mapped buffer; for the position, just use the current size of the file you are writing to:

int BUFFER_SIZE = 4 * 1024 * 1024; // 4 MB
String mainPath = "C:\\temp.txt";
// A READ_WRITE mapping requires a FileChannel opened for both reading and writing.
FileChannel dataFileChannel = FileChannel.open(Paths.get(mainPath),
        StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE);
long fileSize = dataFileChannel.size(); // map a region starting at the current end of the file
MappedByteBuffer writeBuffer = dataFileChannel.map(FileChannel.MapMode.READ_WRITE, fileSize, BUFFER_SIZE);
writeBuffer.put(arrayOfBytes); // arrayOfBytes holds the bytes to append