Introduction to Multithreading
The other day I was reading an article titled "Increase Response Time With Multithreading" in the November 2002 issue of Visual Studio Magazine (formerly Visual Basic Programmer's Journal). After chuckling about the title (don't you want to decrease the time needed for a response?), I got to thinking. Multithreading is one of the most highly hyped features in VB .NET, but it's really a kludge to patch up a poorly designed system. Let me explain.
What is Multithreading?
Most computers run multitasking operating systems, but a computer with a single CPU can only do one thing at a time. The CPU quickly switches back and forth between different processes to make it seem as if they are all running at the same time. Most processes spend a lot of time sitting around doing nothing (waiting for user input, waiting for disk access, waiting for a network connection to complete, and so forth), so switching them out doesn't hurt anything, and it makes the user's life a lot easier.
Each major piece of running code executes in its own process. A process can also execute multiple threads that run different routines "at the same time." Like the main processes, the threads don't really run simultaneously. The system quickly switches back and forth between them to make it seem as if the threads are running at the same time.
Multithreading itself doesn't make the computer any faster because the same CPU still has to process every command. In fact, multithreading slows the computer down a little bit because the system needs to maintain extra information about each thread and spend time switching them in and out. The penalty is small, however, so that by itself isn't a hugely important reason to avoid multithreading.
Why Multithread?
If multithreading doesn't really make the computer faster, why bother? The reason is to allow the computer to handle more than one task at (sort of) the same time. The most common example occurs when a program needs to perform some relatively long task but you want to let the user continue interacting with the application.
For example, suppose your program needs to perform a complex calculation that will take several minutes. With older programming environments, the program would seem to lock up so the user couldn't do anything until the calculation was finished. A reasonably designed program would display a status bar to at least let the user know it was still working. A more sophisticated program would periodically check the keyboard to see if the user had pressed Escape to cancel the calculation.
If you're a Visual Basic programmer, you may not see why this is a big deal. If you're a long-time C++ programmer, however, you probably know why this sort of thing can be hard. To understand why this is the case, consider how a C++ program in Windows used to work.
The heart of the program was an event loop. This loop checked the Windows message queue to see if there were any messages waiting for the program. It would respond to the messages by taking appropriate action. For example, when the program found a paint message, it would repaint part of its forms.
Do
    ' Process messages.
    Select Case (message type)
        Case paint
            draw stuff ...
        Case exit
            end the program...
        etc.
    End Select
Loop
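To make the loop concrete, here is a minimal Python sketch, not Visual Basic and not the real Windows API. It simulates the message queue with a list of strings. The message names and the `run_event_loop` function are hypothetical; a real Windows program would call GetMessage and DispatchMessage instead.

```python
from collections import deque


def run_event_loop(messages):
    """Process simulated messages in order until an "exit" message arrives.

    Returns a log of the actions taken so the behavior is easy to inspect.
    """
    queue = deque(messages)
    log = []
    while queue:
        message = queue.popleft()
        if message == "paint":
            log.append("draw stuff")
        elif message == "exit":
            log.append("end the program")
            break
        else:
            log.append("ignore " + message)
    return log


print(run_event_loop(["paint", "click", "exit", "paint"]))
# ['draw stuff', 'ignore click', 'end the program']
```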
So now how does the message loop handle a long calculation?
Do
    ' Process messages.
    Select Case (message type)
        Case start long calculation
            call a routine to perform the long calculation
        Case paint
            draw stuff ...
        Case exit
            end the program...
        etc.
    End Select
Loop
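A small Python simulation (hypothetical names, with a plain list standing in for the Windows message queue) shows the problem. The "start" handler runs the whole calculation before returning, so a "cancel" message queued right behind it is not seen until the calculation is already over.

```python
from collections import deque


def run_blocking_loop(messages, steps):
    """Simulated event loop whose "start" handler runs the whole calculation inline."""
    queue = deque(messages)
    log = []
    while queue:
        message = queue.popleft()
        if message == "start":
            # The loop cannot look at the queue again until this finishes.
            for step in range(steps):
                log.append(f"step {step}")
            log.append("calculation done")
        elif message == "cancel":
            log.append("cancel seen")
        elif message == "exit":
            break
    return log


print(run_blocking_loop(["start", "cancel", "exit"], steps=3))
# ['step 0', 'step 1', 'step 2', 'calculation done', 'cancel seen']
```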
While the subroutine performing the long calculation is running, the program does not return to the event loop. So how can the program tell if the user presses Escape or clicks the Cancel button?
One method would be to have the calculation routine periodically check the message queue to see if there are messages indicating that the user pressed Escape or clicked Cancel. But what about the other messages in the queue? Either the calculation routine needs to process them (in which case it contains its own copy of the event loop) or it must leave them in the queue so the main event loop can process them later. That means the system needs routines to let the code peek at the event queue but possibly not remove messages from it. Either way, this is a big mess.
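In Windows the real call for this is PeekMessage. The Python sketch below only mimics the idea with a deque, and `calculate_with_peek` and its `arrivals` table of simulated user input are hypothetical. Between units of work, the calculation looks at the queue and removes only the "cancel" message. Everything else is left for the main loop, which is exactly the bookkeeping that makes this approach messy.

```python
from collections import deque


def calculate_with_peek(queue, steps, arrivals):
    """Do up to `steps` units of work, peeking at `queue` between units.

    `arrivals` maps a step number to messages that appear at that moment,
    standing in for user input that arrives while the calculation runs.
    Only "cancel" is removed; every other message is left for the main loop.
    Returns (units_completed, was_cancelled).
    """
    done = 0
    for step in range(steps):
        queue.extend(arrivals.get(step, []))
        if "cancel" in queue:
            queue.remove("cancel")   # consume just this one message
            return done, True
        done += 1                    # one unit of the real calculation
    return done, False


pending = deque()
print(calculate_with_peek(pending, 10, {2: ["paint"], 4: ["cancel"]}))  # (4, True)
print(pending)  # deque(['paint'])
```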
Another possibility is to launch the calculation routine in its own thread. The system executes it on its own so the main program can go back to twiddling its thumbs in the event loop.
This is a much cleaner solution than making the calculation routine dig through the message queue. The main event loop can watch for the Escape key and Cancel button. If it sees one of these, it can kill the calculation thread relatively painlessly.
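Here is a Python sketch of the threaded version; the function names are hypothetical. Python offers no way to kill a thread outright, so the sketch asks the worker to stop by setting a `threading.Event`. That is the cooperative equivalent of the main loop noticing Escape or Cancel.

```python
import threading
import time


def long_calculation(cancel, progress, steps):
    """Worker thread body: do `steps` units of work unless `cancel` is set."""
    for step in range(steps):
        if cancel.is_set():
            return              # the worker stops itself when asked
        progress.append(step)   # stand-in for one unit of real work
        time.sleep(0.001)


def run_and_cancel(steps=10_000, cancel_after=0.05):
    cancel = threading.Event()
    progress = []
    worker = threading.Thread(target=long_calculation, args=(cancel, progress, steps))
    worker.start()
    # The main thread is free here; in a GUI it would be back in its event loop.
    time.sleep(cancel_after)
    cancel.set()                # what a Cancel button handler would do
    worker.join()
    return progress


done = run_and_cancel()
print(len(done) < 10_000)  # True: the worker stopped early
```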
There are still some synchronization issues. The main program probably needs to disable the button that launched the calculation until the thread finishes so the user doesn't launch another thread. There may also be issues with access to data shared by the program and the thread. For example, you don't want both to modify the same data at the same instant. The operating system provides tools such as locks for handling most of these issues, but the program must use them explicitly.
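A minimal Python sketch of protecting shared data with a lock; the names are hypothetical. The update `counter["value"] += 1` is a read-modify-write, so two threads could otherwise interleave and lose updates. The `threading.Lock` makes each update happen one thread at a time.

```python
import threading


def add_many(counter, lock, times):
    """Increment the shared counter `times` times, one locked update at a time."""
    for _ in range(times):
        with lock:                 # only one thread may update at a time
            counter["value"] += 1


def run_locked(thread_count=4, times=10_000):
    counter = {"value": 0}
    lock = threading.Lock()
    workers = [
        threading.Thread(target=add_many, args=(counter, lock, times))
        for _ in range(thread_count)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return counter["value"]


print(run_locked())  # 40000
```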
Where's The Loop?
Many Visual Basic programmers don't appreciate the biggest single advantage of Visual Basic: you never have to write an event loop. Whenever your code isn't actually running, the system executes its own event loop behind the scenes. When it finds a message you need to know about, the system raises an event for your program to handle in an event handler.
Now suppose the user clicks the Long Calculation button. The program starts running the subroutine that takes a long time to finish. Periodically the routine updates a progress bar so the user knows what's happening. This is pretty straightforward, but there's still the issue of responsiveness. How do you let other programs on the computer update their displays and perform other bookkeeping chores? And how do you let the user cancel the long calculation?
If you've been using Visual Basic for a while, you probably know the answer is DoEvents. When the program executes the DoEvents statement, it releases control of the CPU so other processes can clean up their message queues. In particular, it allows the main program's behind-the-scenes event loop to run and look for Escape keys and Cancel button clicks. If it finds one of those, it invokes your event handler. The event handler can set a global flag indicating that the user wants to end the calculation.
After the other processes have handled their waiting messages, DoEvents returns control to the long calculation program. That subroutine checks the global flag, sees that the user wants to quit, and exits.
' Indicates whether the calculation is still running.
Private m_CalculationRunning As Boolean
' Perform the long calculation.
Private Sub cmdStartCalculation_Click()
    ' The calculation is now running.
    m_CalculationRunning = True
    Do
        ' Perform part of the calculation.
        ' (Exit Do here when the calculation is complete.)
        ...
        ' See if the user wants to quit.
        DoEvents
        If Not m_CalculationRunning Then Exit Do
    Loop
    ' Display partial or full results.
    ...
    ' The calculation is done.
    m_CalculationRunning = False
End Sub
' Cancel the long calculation.
Private Sub cmdCancel_Click()
    m_CalculationRunning = False
End Sub
There are still synchronization issues. For example, you'll want to disable the Start Calculation button and enable the Cancel button while the calculation is running.
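The DoEvents pattern is single-threaded: the calculation itself periodically lets pending event handlers run. The Python sketch below imitates that with a simulated message queue. `DoEventsSimulation`, its message names, and the `arrivals` table of simulated user input are all hypothetical, and `do_events` only mimics what VB's DoEvents does. The sketch also shows why the Start button should be disabled. A second Start click processed inside `do_events` would re-enter the click handler, so the sketch refuses to start while a run is in progress.

```python
from collections import deque


class DoEventsSimulation:
    """Single-threaded sketch of the DoEvents pattern (hypothetical names)."""

    def __init__(self, arrivals):
        # arrivals maps a step number to messages the "user" sends at that step.
        self.arrivals = arrivals
        self.queue = deque()
        self.calculation_running = False
        self.ignored_starts = 0

    def do_events(self, step):
        """Stand-in for DoEvents: handle every pending message, then return."""
        self.queue.extend(self.arrivals.get(step, []))
        while self.queue:
            message = self.queue.popleft()
            if message == "cancel":
                self.cancel_click()
            elif message == "start":
                self.start_calculation_click(0)

    def cancel_click(self):
        self.calculation_running = False

    def start_calculation_click(self, max_steps):
        # Acts like a disabled Start button: refuse to start a second run
        # from inside do_events while the first one is still going.
        if self.calculation_running:
            self.ignored_starts += 1
            return None
        self.calculation_running = True
        steps_done = 0
        for step in range(max_steps):
            steps_done += 1          # perform part of the calculation
            self.do_events(step)     # let the "event handlers" run
            if not self.calculation_running:
                break                # the user cancelled
        self.calculation_running = False
        return steps_done


print(DoEventsSimulation({3: ["cancel"]}).start_calculation_click(100))  # 4
```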
What Now?
Like many of the other highly hyped new features in VB .NET, multithreading is not that big a deal. It seems more like something the C++ programmers who are used to explicitly dealing with event loops would find really exciting rather than something essential for a typical Visual Basic programmer. How do you keep your application responsive to the user? DoEvents. (Yawn. Next question.)
Oh sure, I can concoct situations where multithreading would be useful. My wife has an application that interacts with an imaging system that takes a long time to process data. She launches a thread because she doesn't have access to the imaging system's code, so she can't put calls to DoEvents inside it (she's not using Visual Basic, but the same would apply to a similar Visual Basic program).
In this case, multithreading is truly a kludge to compensate for poor software engineering. If the imaging system processed data asynchronously and raised an event when it was finished, Michelle wouldn't have to worry about multithreading.
There are other situations where multithreading might make a program's design less confusing. For example, suppose you want to simulate multiple computers performing a highly parallel calculation. You could launch a simulator for each computer using a separate thread. I think the timing and synchronization issues would make this more trouble than it's worth, but I can see some benefit.
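To give a feel for those synchronization issues, here is a Python sketch in which each simulated computer is a thread and all of them must finish a round before any starts the next. The function name, the round structure, and the log format are assumptions made for illustration. A `threading.Barrier` enforces the lockstep.

```python
import threading


def lockstep_simulation(computer_count=3, rounds=4):
    """Simulate computers that must all finish a round before any starts the next."""
    barrier = threading.Barrier(computer_count)
    log_lock = threading.Lock()
    log = []

    def computer(computer_id):
        for round_number in range(rounds):
            with log_lock:
                log.append((round_number, computer_id))  # "work" for this round
            barrier.wait()  # wait here until every computer finishes the round

    threads = [threading.Thread(target=computer, args=(i,)) for i in range(computer_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return log


log = lockstep_simulation()
rounds_seen = [round_number for round_number, _ in log]
print(rounds_seen == sorted(rounds_seen))  # True: no computer ever ran ahead
```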
Overall, however, multithreading is an interesting novelty and an artifact of an obsolete programming paradigm. Another wonderful improvement brought to you by the developers who brought you the += operator (from C++), Try/Catch/Throw (also from C++), and garbage collection (which solves a major problem in C++ programs that doesn't occur in VB programs)!
Adam Kelly has this reply:
I agree with most parts of your rant on multithreading, but the argument seems to be made for a single processor. What about a dual-processor machine? I have been trying for a long time to find really useful ways to use multithreading, without much luck apart from the areas discussed below.
Splitting the work between two processors when working with large arrays in image processing seems a good use of resources, but can two threads read and write the same array at different locations? From what I have read, it seems one process is always locked out to stop a read and a write at the same location.
That said, a dual-processor machine does have some advantages for non-multithreaded code that are often overlooked. We have several dual-processor machines working on single-threaded code to process very large amounts of data. The second processor can handle passing data between machines over the network while the code runs flat out on the other processor. It is not perfect, but it is faster than a single processor trying to balance work and data transfer at the same time, i.e., when a second machine requests data from or sends data to the first while it is busy.
Also, if you use SQL Server, it does a great job of balancing data requests and loading data through the use of multiple threads. With a single thread, a user requesting even a small amount of data would be frozen out while a large write was being done to the database. However, this does require a dual-CPU machine.
It seems what you were talking about was multithreading on a single processor. Will the same hold true with Intel's Hyper-Threading, where two instructions can be executed simultaneously on a single CPU?
Now if you figure out how I can address the same array with two threads for image processing, starting one at the top of the array and one in the middle so that the two threads get done in half the time on a dual-processor or hyper-threaded CPU, I am all ears!
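Here is a Python sketch of the split Adam describes: one thread starts at the top of the array and the other at the middle. The two threads write disjoint index ranges, so these writes never touch the same element and need no lock. The pixel inversion and function names are hypothetical stand-ins for real image processing. Also note that in CPython the global interpreter lock keeps pure-Python threads from running this in parallel. The sketch shows the partitioning, not the speedup; that would need truly parallel threads, as in C++ or a .NET language on a multi-CPU machine.

```python
import threading


def invert_range(pixels, start, stop):
    """Invert 8-bit pixel values in pixels[start:stop], in place."""
    for i in range(start, stop):
        pixels[i] = 255 - pixels[i]


def invert_in_two_halves(pixels):
    """Split the array at the middle and give each half to its own thread."""
    middle = len(pixels) // 2
    threads = [
        threading.Thread(target=invert_range, args=(pixels, 0, middle)),
        threading.Thread(target=invert_range, args=(pixels, middle, len(pixels))),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()   # wait for both halves before using the result
    return pixels


print(invert_in_two_halves([0, 10, 200, 255, 5]))  # [255, 245, 55, 0, 250]
```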
Adam makes excellent points. Yes I was talking about a single CPU system, specifically with respect to Visual Basic. A multiple CPU system gives you a lot more options for this sort of thing.
Note that some hardware can automatically examine a program's instruction stream and execute independent instructions in parallel.
After further discussion, Adam told me:
I hate to admit this, but I have 30 computers at work all working flat out, all the time! We are at the point where we don't let them idle (hence the second processors handling data transfer in the background).
With those kinds of processing needs, he certainly knows how to squeeze out every last CPU cycle!
The hyper-threading that Adam mentions is a new feature of Intel chips. As near as I can tell, it allows the hardware to perform context switching much as the operating system does now. Because it is in hardware, it is significantly faster. One article says:
... However, the performance improvement does not double - a gain of around
25 per cent is achieved on desktop processors, and up to 30 per cent - but
often less - on servers, Intel says. Also, developers need to write their
apps for multithreading for users to gain full benefits from the
technology...
Here are some links where you can get more information:
The first article claims that multithreading often makes an application easier to develop. They cite server programming. That's probably true for some server applications, but the few times I've needed to write something server-like, Visual Basic's ActiveX DLLs did just fine. Properties on the DLL classes let you decide whether instances should run separately or be shared, and that removes the need for explicit multithreading.
Adam Kelly had some more thoughts:
I have been thinking about threading issues some more as I have been working on code the past couple of weeks. I have come up with a case where even a single-processor program may need two or more threads, because to my knowledge an alternative like DoEvents can't help.
The best example of this is when you make a query of a large database with ADO. Until the database engine (Jet/MSDE/SQL etc.) returns the data, the execution of your program is locked. So the form is locked and appears hung! The same is true of any other call to an external function (DLL etc.). We basically turn over control to the outside function and wait. DoEvents won't save us! Is there a solution I don't know of, short of threading? How can the form behave normally while an external function is executing?
I think some database calls support asynchronous processing. That would be a better solution for both long queries and other slow DLL routines. The routine would return immediately and then work independently in the background, raising an event or calling a callback routine when it is finished. In the best scenario, the routine's library would also provide a function to get the progress of the long operation so you can keep the user posted.
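Here is a Python sketch of what such an asynchronous interface might look like from the caller's side. `AsyncOperation`, its methods, and its simulated work are all hypothetical, not any real database library's API. `start` returns immediately, `progress` can be polled to keep the user posted, and the completion callback plays the role of the event.

```python
import threading
import time


class AsyncOperation:
    """Hypothetical library object with a built-in asynchronous interface.

    start() returns immediately, progress() can be polled at any time, and
    on_finished is called (on the background thread) when the work is done.
    `steps` must be positive.
    """

    def __init__(self, steps, on_finished):
        self.steps = steps
        self.on_finished = on_finished
        self._done_steps = 0
        self._lock = threading.Lock()
        self._thread = threading.Thread(target=self._run)

    def start(self):
        self._thread.start()            # returns without waiting for the work

    def progress(self):
        """Return the fraction of the work completed, from 0.0 to 1.0."""
        with self._lock:
            return self._done_steps / self.steps

    def wait(self):
        self._thread.join()

    def _run(self):
        for _ in range(self.steps):
            time.sleep(0.001)           # stand-in for one chunk of slow work
            with self._lock:
                self._done_steps += 1
        self.on_finished("finished")    # the "event" the caller asked for


events = []
operation = AsyncOperation(50, events.append)
operation.start()
while operation.progress() < 1.0:
    time.sleep(0.01)                    # the main program keeps the user posted here
operation.wait()
print(events)  # ['finished']
```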
You can get this effect with Visual Basic relatively easily. In the initial subroutine call, start a timer set for a short time and then return. You'll still need to call DoEvents to let the main program get something done. When the timer fires, start the real calculation. When the calculation is finished, raise an event or invoke a callback method.
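The timer trick works because everything stays on one thread: the call returns at once, and the real work starts later from the event loop. Here is a Python analogue using `asyncio`, not Visual Basic. `loop.call_later` stands in for the short VB Timer, `await asyncio.sleep(0)` stands in for DoEvents, and the callback stands in for raising an event. The function names are hypothetical.

```python
import asyncio

_background_tasks = set()  # keep references so pending tasks are not garbage-collected


def start_calculation(loop, steps, on_finished):
    """Return at once; the real work begins when a short timer fires."""
    def timer_fired():
        task = loop.create_task(calculate(steps, on_finished))
        _background_tasks.add(task)
        task.add_done_callback(_background_tasks.discard)

    loop.call_later(0.01, timer_fired)  # like a VB Timer with a short Interval


async def calculate(steps, on_finished):
    total = 0
    for i in range(steps):
        total += i                 # one small piece of the calculation
        await asyncio.sleep(0)     # give the loop a turn, much as DoEvents does
    on_finished(total)             # "raise an event" with the result


async def main():
    loop = asyncio.get_running_loop()
    finished = loop.create_future()
    start_calculation(loop, 1000, finished.set_result)
    # start_calculation has already returned, so the loop is free for other work.
    return await finished


print(asyncio.run(main()))  # 499500
```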
Adam's right, though, that some long-running routines don't provide this sort of service, and in that case you're stuck. It's a kludge to cover bad DLL design, but you might very well want to use multithreading in these cases.
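When the routine offers no asynchronous option, the kludge is to wrap the blocking call in a worker thread yourself and get a callback when it returns. Here is a Python sketch of that; `slow_library_call` is a hypothetical stand-in for the DLL routine you can't change. The callback runs on the worker thread, so a real GUI program would have to marshal the result back to the UI thread (Windows Forms provides `Control.Invoke` for this).

```python
import threading
import time


def slow_library_call(x):
    """Hypothetical stand-in for a DLL routine you cannot modify."""
    time.sleep(0.05)
    return x * 2


def call_in_background(blocking_func, args, on_finished):
    """Run blocking_func(*args) on a worker thread, then pass its result to on_finished.

    on_finished runs on the worker thread, not on the caller's thread.
    """
    def worker():
        on_finished(blocking_func(*args))

    thread = threading.Thread(target=worker)
    thread.start()
    return thread


results = []
worker = call_in_background(slow_library_call, (21,), results.append)
# The caller is free to keep handling events here.
worker.join()
print(results)  # [42]
```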
Michael Miller of HalfTime Technologies Inc. says:
Our company specializes in multithreaded development and we even offer a VB6 multithreading component library called Thread Factory. I invite you to download a copy for your own evaluation.
Anyway, my programming background focuses almost entirely on multithreaded application development, specifically real-time stock trading applications. These applications are extremely time-critical, and everything is measured in milliseconds. On your web page you invited feedback from programmers who "worked on an application where multithreading made things easier". Multithreading never makes things easier. However, it does make certain things possible and in some cases more efficient. The asynchronous nature of multithreading is what guarantees its complexity and is why it's never easier to implement.
Let me begin by saying that many developers are misled into believing that multithreaded programs are only advantageous when running on a multiple CPU computer. This is a terrible misunderstanding; whether a computer has 1 or more CPUs is almost irrelevant in determining when to multithread a program. The important thing to do is define the tasks needed to be accomplished and determine if there will be a gain by processing them concurrently.
As an example, I recently worked on a large stock pricing application at an investment broker. The application ran very fast but was slow saving prices to the database because of network latency and database contention. The solution I chose was to create multiple save services, each running on its own thread and having its own connection to the database. Rather than having a single object trying to save 10,000 stock prices, I opted to have 5 save objects running independently of each other, saving 2,000 prices each. Overall, saving was consistently 300% faster in this situation.
As you can see, in this situation the number of CPUs was irrelevant because CPU usage was approximately 5% during the entire save process. The BIG picture must be looked at in order to make judgment calls about when to multithread. The number of CPUs can be an important factor depending on the tasks to be accomplished, but this is not always the case.
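Here is a Python sketch of the batching idea Michael describes: split 10,000 prices into 5 batches of 2,000 and hand each batch to its own save service on a thread pool. `save_batch` is hypothetical; its `time.sleep` only simulates the network and database waits, and no real database is involved.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def save_batch(batch):
    """Hypothetical save service: pretend to write one batch over its own connection."""
    time.sleep(0.05)        # stand-in for network latency and database waits
    return len(batch)


def save_all(prices, service_count=5):
    """Split prices into one batch per save service and save the batches concurrently."""
    if not prices:
        return []
    size = -(-len(prices) // service_count)   # ceiling division
    batches = [prices[i:i + size] for i in range(0, len(prices), size)]
    with ThreadPoolExecutor(max_workers=service_count) as pool:
        return list(pool.map(save_batch, batches))


print(save_all(list(range(10_000))))   # [2000, 2000, 2000, 2000, 2000]
```

Because each simulated save spends its time waiting rather than computing, the five waits overlap. The total wall-clock time is then roughly one batch's wait instead of five, which matches the low CPU usage described above.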
I cannot emphasize enough that the key to multithreading is understanding the tasks you need to accomplish and how they would perform when running concurrently. Fortunately, this analysis gets much easier as you gain experience with multithreading.
If you have worked on an application where multithreading made things easier, email me and let me know. I'll post people's experiences here.