Monday, 8 January 2007

MultiThreading - Part 2

Building programs around threads of execution, that is, around specific sequences of instructions, can deliver significant performance benefits. Consider, for example, a program that reads large amounts of data from disk and processes that data before writing it to the screen (such as a DVD player). In a traditional, single-threaded program (the kind most client programs use today), where only one task executes at a time, each of these activities happens as one of a sequence of distinct phases. No data is processed until a chunk of a defined size has been read, so the program logic that could be processing the data sits idle until the disk reads complete. This leads to inferior performance.
In a threaded program, one thread can be assigned to read data, another thread to process it, and a third to write it out to the graphics card. These three threads can operate in parallel, so that data is being processed while disk reads are going on, and overall performance improves. Many other examples can be devised in which the ability to do two things at once provides better performance. The Java virtual machine (JVM) is itself heavily threaded for just this reason.
This article discusses creating multithreaded Java code, a few best-practices for designing parallel programs, and some of the tools and resources available to developers. That's a lot to cover in one article, so I'll just highlight the salient points and direct you to resources for additional information.
Threading Java Code
All programs use at least one thread. In C/C++ and Java, this is the thread that is started with the call to main(). The creation of additional threads requires two steps: creating a new thread and then assigning it work. Once the work is done, the thread terminates, and the memory it used is eventually reclaimed by the JVM.
Java provides two methods for creating threads and assigning them work. The first of these is to subclass Java's Thread class (in the java.lang package) and then override the run() method with the thread's work function. Here is an example of this:

```java
public class SimpleThread extends Thread {
    public SimpleThread(String str) {
        super(str);
    }

    public void run() {
        for (int i = 0; i < 10; i++) {
            System.out.println(i + " " + getName());
            try {
                sleep((long)(Math.random() * 1000));
            } catch (InterruptedException e) {}
        }
        System.out.println("DONE! " + getName());
    }
}
```
This class subclasses Thread and provides its own run() method. That method runs a loop which prints a passed string to the screen and then waits for a random amount of time. After ten iterations of the loop, the method prints "DONE!", then exits, which terminates the thread. Here is the main function that creates the threads:

```java
public class TwoThreadsDemo {
    public static void main(String[] args) {
        new SimpleThread("Do it!").start();
        new SimpleThread("Definitely not!").start();
    }
}
```
Note the simplicity of this code: each thread is created with a name (which is the string the thread will print), and start() is called; start() in turn invokes the run() method. The results of this program are as one would expect:

```
0 Do it!
0 Definitely not!
1 Definitely not!
2 Definitely not!
1 Do it!
2 Do it!
3 Do it!
3 Definitely not!
4 Do it!
4 Definitely not!
5 Do it!
5 Definitely not!
6 Do it!
7 Do it!
6 Definitely not!
8 Do it!
7 Definitely not!
8 Definitely not!
9 Do it!
DONE! Do it!
9 Definitely not!
DONE! Definitely not!
```
As can be seen, the output from the two threads is intermingled. In a single-threaded program, all the "Do it!" lines would have printed together, followed by all the "Definitely not!" lines.
Different runs of this program will generate different results. This uncertainty has two causes: there is a random pause between iterations of the loop, and, more importantly, the timing of thread execution is not guaranteed. This is a key principle. The JVM runs the threads on its own schedule, which generally favors running them as quickly as possible, but there is no guarantee of when a given thread will be run. A priority can be associated with individual threads to encourage the JVM to handle critical threads before less important ones.
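As a minimal sketch of the priority mechanism just described (the class name and the two tasks are illustrative, not from the original example):

```java
// Priorities range from Thread.MIN_PRIORITY (1) to Thread.MAX_PRIORITY (10).
// They are hints to the scheduler, not guarantees of execution order.
public class PriorityDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread critical = new Thread(() -> System.out.println("critical task"));
        Thread background = new Thread(() -> System.out.println("background task"));

        critical.setPriority(Thread.MAX_PRIORITY);
        background.setPriority(Thread.MIN_PRIORITY);

        background.start();
        critical.start();

        // Wait for both threads to finish before main() exits.
        background.join();
        critical.join();
    }
}
```

Even with these priorities set, either line may print first; priority only biases the scheduler's choices.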
The second way to start up a thread is to use a class that implements the Runnable interface, which is also defined in java.lang. The Runnable interface specifies a run() method that becomes the thread's principal function, just as in the previous code.
The general style in Java these days is to favor interfaces over inheritance. This rule of thumb is valid in this choice of options as well. By using the interface option, a class can still inherit (subclass) later on, if necessary. (This would occur, for example, if the class were to be used later as an applet.)
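As a minimal sketch of the Runnable approach (this SimpleRunnable class is illustrative, mirroring the SimpleThread example above):

```java
public class SimpleRunnable implements Runnable {
    private final String name;

    public SimpleRunnable(String name) {
        this.name = name;
    }

    // run() becomes the thread's principal function, as with Thread.run().
    public void run() {
        for (int i = 0; i < 10; i++) {
            System.out.println(i + " " + name);
            try {
                Thread.sleep((long)(Math.random() * 1000));
            } catch (InterruptedException e) {}
        }
        System.out.println("DONE! " + name);
    }

    public static void main(String[] args) {
        // A Runnable does not start itself; it is handed to a Thread.
        new Thread(new SimpleRunnable("Do it!")).start();
        new Thread(new SimpleRunnable("Definitely not!")).start();
    }
}
```

Note that SimpleRunnable extends nothing, leaving it free to subclass another class later on.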
Implications of Threading
While threading enhances performance, it adds complexity to the program's internal operations. This complexity arises primarily from interaction among the threads themselves. It is important to become familiar with these issues, because as more cores are added to Intel® processors, the number of threads to use will grow correspondingly. If these issues are not understood when programming for many threads, difficult to find bugs will undoubtedly ensue. So, let's look at some of the issues and solutions.
Waiting on another thread to finish: Suppose we have an array of integers that we want to process. We could step through the array, one integer at a time, and perform the operation. Or, more efficiently, we could assign several threads so that each one processes some portion of the array. Suppose we had to wait for all threads to finish before moving on to the next step. To synchronize their activities, threads use the join() method, which makes one thread wait on another to complete: the joining thread (thread B) waits on the joined thread (thread A) to finish. An optional timeout in join() enables thread B to move forward with other work if thread A does not terminate within a given time frame. This provision touches on a core complexity of threading: the problem of waiting on a thread, which we look at next.
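The array-processing scenario above can be sketched as follows (the class and method names are illustrative): each worker thread sums one slice of the array, and the main thread join()s on every worker before combining the partial results.

```java
public class ParallelSum {
    static long sum(int[] data, int numThreads) throws InterruptedException {
        long[] partial = new long[numThreads];
        Thread[] workers = new Thread[numThreads];
        int chunk = (data.length + numThreads - 1) / numThreads;

        for (int t = 0; t < numThreads; t++) {
            final int id = t;
            final int start = t * chunk;
            final int end = Math.min(start + chunk, data.length);
            workers[t] = new Thread(() -> {
                long s = 0;
                for (int i = start; i < end; i++) s += data[i];
                partial[id] = s;   // each thread writes only its own slot
            });
            workers[t].start();
        }

        long total = 0;
        for (int t = 0; t < numThreads; t++) {
            workers[t].join();     // wait for worker t to finish
            total += partial[t];
        }
        return total;
    }
}
```

Because each worker writes only its own slot of the partial array, no locking is needed until the joins are complete.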
Waiting on locked objects: Suppose we wrote an airline seat-assignment system. On large programs of this kind, it is common that a thread is assigned to every user connected to the software, such as one to every ticket clerk. (In very large systems, this is not always the case.) If two users are simultaneously attempting to assign the same seat, problems will arise. Unless special steps are taken, one thread will assign the seat while another thread is doing the same thing. Both users will think they have an assigned place on the flight.
To avoid two threads simultaneously modifying the same data item, we have one thread lock the data item just prior to modification. In this way, when the second thread comes along to make the modification, it has to wait until the first thread releases the lock. When that happens, the second thread will see that the seat has just been assigned, and its request for the seat assignment will fail. The problem of two threads racing to get the seat assignment completed is known as a race condition, and it can wreak havoc when it occurs. Best practice is to synchronize any code that accesses data shared by more than one thread.
There are several locking options in Java. The most common of these is the use of the synchronized keyword. When a method is declared synchronized, only one thread at a time can execute it on a given object; the lock, which is held on that object, is released when the method finishes executing. For example:

```java
protected synchronized int reserveSeat(Seat seat_number) {
    if (seat_number.getReserved() == false) {
        seat_number.setReserved();
        return 0;
    } else {
        return -1;
    }
}
```

This method can only be run by one thread at a time on a given object, which breaks the race condition described earlier.
Using synchronized is one of several methods for handling interactions between threads. The J2SE 5.0 release added several convenient ways to lock objects. Most of these are found in the java.util.concurrent.locks package, which should be examined carefully once you're comfortable with Java threads.
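As a minimal sketch, the seat-reservation logic above could be rewritten with ReentrantLock from java.util.concurrent.locks (the LockingSeat class here is illustrative, folding the seat state into the class rather than a separate Seat object):

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockingSeat {
    private final ReentrantLock lock = new ReentrantLock();
    private boolean reserved = false;

    public int reserve() {
        lock.lock();               // blocks until the lock is available
        try {
            if (!reserved) {
                reserved = true;
                return 0;          // success
            }
            return -1;             // seat already taken
        } finally {
            lock.unlock();         // always release, even on exception
        }
    }
}
```

Unlike synchronized, an explicit lock can also be polled with tryLock() or acquired with a timeout, which gives finer control over how threads wait.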
While locking mechanisms solve the problem of race conditions, they introduce new complexity. The most difficult situation, in this regard, is called deadlock. If thread A is waiting on thread B while thread B is waiting on thread A, both threads will be blocked forever, which is why the term deadlock is so descriptive. Deadlocks can be very difficult to identify (they can be hard to reproduce), so care must be taken to make sure there are no dependencies of this kind between threads.
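One common defense against deadlock is to always acquire locks in a fixed global order, so that two threads can never each hold one lock while waiting for the other. A minimal sketch (the Account class and transfer scenario are illustrative, not from the article):

```java
public class Account {
    private final int id;      // used to impose a global lock order
    private long balance;

    public Account(int id, long balance) {
        this.id = id;
        this.balance = balance;
    }

    public long getBalance() { return balance; }

    public static void transfer(Account from, Account to, long amount) {
        // Lock the lower-id account first. Two opposing transfers then
        // always contend for the same lock first, so neither can hold
        // one lock while waiting for the other.
        Account first  = from.id < to.id ? from : to;
        Account second = from.id < to.id ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance += amount;
            }
        }
    }
}
```

If transfer() instead always locked from before to, two simultaneous opposing transfers could each grab one lock and wait forever for the other: a classic deadlock.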
Using Thread Pools
As mentioned earlier, when threads finish executing they terminate, and the memory allocated to them is eventually garbage-collected. The trouble with constantly creating and destroying threads is that it wastes clock cycles, since creating a thread has distinct overhead. A common solution, and a best practice, is to allocate a group of threads (called a thread pool) early in the program and then reuse them as they become available. Under this scheme, the work assigned to a thread at creation is to sit in the pool and wait for an assignment; when its assigned work is complete, the thread returns to the pool.
J2SE 5.0 introduced the java.util.concurrent package, which includes a pre-built thread-pooling framework that greatly facilitates this approach. More information on Java thread pools, including a tutorial, can be found here.
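As a minimal sketch of that framework (the class name and task count are illustrative): a fixed pool of four threads works through ten tasks, reusing threads between tasks rather than creating a new thread per task.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PoolDemo {
    public static void main(String[] args) throws InterruptedException {
        // Four pooled threads service all ten tasks.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 10; i++) {
            final int task = i;
            pool.execute(() -> System.out.println(
                "task " + task + " on " + Thread.currentThread().getName()));
        }
        pool.shutdown();                          // accept no new tasks
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

The printed thread names repeat across tasks, showing the same pooled threads being reused.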
When designing threaded programs and thread pools, the question naturally arises as to how many threads should be created. The answer depends on how you plan to use the threads. If you divide the work across threads on the basis of discrete tasks, then the number of threads equals the number of tasks. For example, a word processor might use one thread for display (the primary program thread in almost all systems is the one entitled to update the user interface), one for paginating the document, a third one for spell checking, and a fourth for miscellaneous background operations. In this case, four threads are ideal and they provide a very natural way to write the software.
However, if the program—like the one discussed earlier—uses multiple threads to do similar work, then the optimal number of threads tends to be a reflection of the system resources, especially the number of execution pipelines on the processor and the number of processors. On systems with Intel processors that rely on Hyper-Threading Technology (HT Technology), there are currently two execution pipelines per processor core. The new multi-core chips have two processor cores per chip. Intel has indicated that future chips are likely to have more cores, in good part because additional cores generate more performance without substantially increasing heat or power consumption. So, pipelines are going to become more plentiful.
Using the arithmetic suggested by these architectures, on a dual-core Pentium® 4-based system, four execution pipelines will be available, and so four threads will provide ideal performance. On a dual-processor Intel® Xeon™-based workstation, the ideal number of threads is also four, because at present the Xeon chips offer HT Technology but no multi-core models.
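Rather than hard-coding this arithmetic, a program can ask the JVM how many hardware threads the platform exposes and size its pool accordingly. A minimal sketch (the class and method names are illustrative):

```java
public class PoolSize {
    // Returns the number of logical processors visible to the JVM:
    // processors times cores times hardware threads per core, so an
    // HT-enabled dual-core chip reports four.
    public static int idealThreadCount() {
        return Runtime.getRuntime().availableProcessors();
    }
}
```

This value can then be passed directly to Executors.newFixedThreadPool(), keeping the code correct as core counts grow.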
Final Thoughts
When running threaded Java programs on Intel platforms, you will likely want to be able to monitor the load on processors and the execution of threads. One of the best JVMs for obtaining this data and managing how the JVM handles parallel processing is BEA's WebLogic JRockit. JRockit has the additional advantage of having been purpose-built for the Intel platforms and optimized by engineers from BEA and Intel.
Regardless of which JVM you use, the Intel® VTune™ Performance Analyzer will give you a deep view into how the JVM is executing your code, including per-thread performance bottlenecks. A white paper on using the VTune performance analyzer with Java is located here.

This article has provided an overview of how threading works on the Java platform. As Intel continues shipping processors with HT Technology and releases more multi-core chips, the pressure to obtain the performance benefits of these multiple pipelines will increase. And as the number of cores increases, the number of pipelines will proliferate correspondingly. The only way to exploit this advantage is to use threads, such as those discussed in this article, and the benefits of threaded applications in Java will become ever more evident. The following section provides additional resources to help guide you along.
