![]() |
|
![]() |
|||||||||||||||||||||
![]() |
![]() |
||||||||||||||||||||
|
How can I run queries in threads? | ||||||||||||||||||||
I have several programs that run large queries, and would like to move the query processing to one or more background threads. How can I implement this in my program?
Process and Thread Basics Many programmers, especially those new to programming with multiple threads, believe threads are the sole domain of programming gurus. But it's all a matter of understanding some fundamental concepts about how threads work in Delphi and how you can apply threads in your code. If what you're programming has to do with the User Interface, you don't want to use multiple threads because threads require resources. In fact, every time you create a thread, it gets the same size stack as the main thread of a program (we'll discuss this below). If you have a lot of threads, you'll take up a lot of resources. So the idea is to be judicious in your use of threads. It may be tempting to create a bunch of threads to handle a bunch of different tasks because creating threads, as you will see, is fairly easy. But you don't want to create threads just for the sake of creating threads. In addition to taking up resources, every executing thread creates another time slice on the CPU, forcing it to handle more tasks. The net result is that the computer will slow way down. But there are some instances in which running background threads makes a lot of sense: It's ideal to create background threads when:
In Windows 95 and NT (I'll refer to both systems as Win32 throughout this article), every program loaded into memory is called a process. Many people new to threads (including myself) make the mistake of believing that the definition of a process is interchangeable with that of a thread. It's not. Processes are their own entities. Though the name "processes" implies a form of activity, a process does nothing. It is merely a memory address placeholder for its threads and a space where executable code gets loaded. A process' thread is what actually does the work in a program. When a process is created in Win32, it automatically has an execution thread associated with it. This is called the main thread of the program. Other threads can be instantiated within a process, but you won't see many programs using multiple threads of execution. A thread can only be associated with one process, but a process can have many threads. Therefore, there is a distinct one-way, one-to-many relationship between processes and threads. The TThread Object Traditionally, processes (executing programs) are created in Win32 using the WinAPI CreateProcess and threads are created using CreateThread. In fact, many advanced Delphi and Windows programmers I've spoken with say that using the WinAPI call is their method of preference. With Delphi 2.0, the Delphi team created a wrapper class called TThread that encapsulates the WinAPI thread calls. TThread provides developers with a programmatic interface for creating multiple threads of execution in their programs. It also makes the job of creating and maintaining threads easier than directly using WinAPI calls. Does this come at a price? I don't know. I have several multithreaded applications using both straight WinAPI calls and the TThread object and haven't noticed any significant differences. But my test arena was not as wide as it should have been to accurately gauge performance differences. Most of the VCL is not thread-safe -- it's very important to take this into consideration when creating multiple threads of execution. If you call a VCL object from within a thread, most likely you'll raise an exception, because many of the VCL objects were not written with any type of synchronization code to ensure data integrity when called at random times from anything but the main thread. Essentially, they can only receive messages from a single thread. If they get a message from another thread, they'll hiccup, and your program will probably crash. Fortunately, TThread has a very simple way of safely making calls into the VCL that we'll discuss in a bit. Let's look at the TThread's structure. Here's the declaration for the TThread object: TThread = class private FHandle: THandle; FThreadID: THandle; FTerminated: Boolean; FSuspended: Boolean; FMainThreadWaiting: Boolean; FFreeOnTerminate: Boolean; FFinished: Boolean; FReturnValue: Integer; FOnTerminate: TNotifyEvent; FMethod: TThreadMethod; FSynchronizeException: TObject; procedure CallOnTerminate; function GetPriority: TThreadPriority; procedure SetPriority(Value: TThreadPriority); procedure SetSuspended(Value: Boolean); protected procedure DoTerminate; virtual; procedure Execute; virtual; abstract; procedure Synchronize(Method: TThreadMethod); property ReturnValue: Integer read FReturnValue write FReturnValue; property Terminated: Boolean read FTerminated; public constructor Create(CreateSuspended: Boolean); destructor Destroy; override; procedure Resume; procedure Suspend; procedure Terminate; function WaitFor: Integer; property FreeOnTerminate: Boolean read FFreeOnTerminate write FFreeOnTerminate; property Handle: THandle read FHandle; property Priority: TThreadPriority read GetPriority write SetPriority; property Suspended: Boolean read FSuspended write SetSuspended; property ThreadID: THandle read FThreadID; property OnTerminate: TNotifyEvent read FOnTerminate write FOnTerminate; end; Its structure is quite simple -- and simple is good In most components there are only a few procedures and properties you need to think about; this is not an exception with TThread. The only methods you'll need to worry about are Execute, Create, and Synchronize; and the only property that you'll usually need to access is FreeOnTerminate. Key Methods and Properties of TThread The key methods and property of TThread are listed below in Table 1.
A Real-life Example Most people get more out of seeing code examples to implement a certain concept. Below are code excerpts from a program that I wrote that performs a bunch of queries in sequence. First we have the type declaration: type TEpiInfoProc = class(TThread) {Base processing class for Episode Information processing} private FStatMsg : String; FSession : TSession; tblPrimary, tblHistory, tblSymmetry : String; FIter : Integer; property Iteration : Integer read FIter write FIter; procedure eiCreateEpiTemp1; //Performs initial joins procedure eiCreateEpiTemp2; //Creates new table of summarized info procedure eiGetClassifications(clState, //'AMI', 'Asthma', etc. clName, //'H1', 'H2', etc. priFld, //Join field from Primary hstFld : String; //Join field from History bMode : TBatchMode); //Batch Mode (will always be //batCopy 1st time); procedure eiCreateEpiInfoTable(clSrc, clDst, clName : String); procedure eiCreateHistory(HistIndicator : String); //Generic processing methods procedure EnableBtns; procedure GetTableNames; procedure UpdStatus; procedure IndexTbl(dbName, tblName, idxName, fldName : String; idxOpts : TIndexOptions); protected procedure Execute; override; public constructor Create; end; The above is like anything you see in Delphi when you declare a descendant of a class. You declare your variables, properties, and methods just like anything else. Here's the Create constructor for the thread: constructor TEpiInfoProc.Create; begin inherited Create(True); FSession := TSession.Create(Application); with FSession do begin SessionName := 'EpiInfoSession'; NetFileDir := Session.NetFileDir; PrivateDir := 'D:\EpiPriv'; end; FreeOnTerminate := True; Resume; end; Notice I'm creating a new TSession instance. When we discuss running queries in threads, I'll go into more depth about this. At this point the important thing to note is that I create the thread in a suspended state by calling the inherited Create and setting its CreateSuspended parameter to True. This allows me to perform the code immediately below the inherited Create before the thread code actually starts running. Now, on to the Execute procedure. procedure TEpiInfoProc.Execute; var N, M : Integer; begin try Synchronize(EnableBtns); //Set enabled property of buttons to opposite of current state Synchronize(GetTableNames);//Get Names FStatMsg := 'Now performing initial summarizations'; Synchronize(UpdStatus); ShowMessage('Did it!'); Exit; eiCreateEpiTemp1; eiCreateEpiTemp; {Create H1 - H9 tables} for N := 0 to 8 do for M := 0 to 5 do begin FStatMsg := 'Now performing '+ arPri[M] + ' field extract for ' + arDis[N]; Synchronize(UpdStatus); Iteration := M; //first iteration must be a batCopy, then batAppend thereafter if (M = 0) then eiGetClassifications(arDis[N], arCls[N], arPri[M], arHst[M], batCopy) else eiGetClassifications(arDis[N], arCls[N], arPri[M], arHst[M], batAppend); end; {Now do Outer Joins} for N := 0 to 8 do begin FStatMsg := 'Now creating ' + arDst[N]; Synchronize(UpdStatus); eiCreateEpiInfoTable(arSrc[N], arDst[N], arCls[N]); end; IndexTbl('EPIINFO', 'EpiInfo', 'Primary', 'Episode',[ixPrimary]); for N := 0 to 8 do eiCreateHistory(arCls[N]); FStatMsg := 'Operation Complete!'; Synchronize(UpdStatus); Synchronize(EnableBtns); except Terminate; Abort; end; end; Notice all my calls to Synchronize, especially the synchronized call to UpdStatus. Immediately preceding this call, I set a variable called FStatMsg to some text. UpdStatus uses this text to set the SimpleText of a TStatusBar on the main form of the program. Why not just pass this as a string variable parameter to UpdStatus? Because synchronized methods cannot have parameters. If you need to pass parameters, you must either set them in variables or create a structure to hold values your synchronized method can then access and pass to your main form. Also take note of the looping structures I've written. You might be tempted to run a loop within a synchronized method. Although your code will work, this defeats the purpose of multiple threads because of what we discussed above. Synchronization makes your thread part of the main thread during the time it is executing, meaning you have only one thread actually running. This reduces your program to a single thread of execution. And if you have a potentially long process like several sequential queries executed from within a loop like I have above, forget it! Your program will act just like a single-threaded application until you're out the loop. In order to successfully run queries in threads, you must follow a few cardinal rules:
Of the above rules, 1 and 2 are the most important elements for successfully running queries in separate threads of execution. While rules 3 and 4 are important, under certain conditions your queries will not run if you don't obey rules 1 and 2. "Visual" and Non-visual Background Queries Now that we've established ground rules regarding running queries in threads, we must take into consideration a couple of implementation paths. I put the visual in quotes above to denote queries whose results will be displayed in a data-aware component of some sort. Non-visual background queries don't provide any visual result. They just run, write to persistent data stores and return. How you implement these methods is significantly different -- we'll discuss them in separate sections below. In either of these cases, the aim is to free up the user interface. One of my biggest frustrations with writing programs that process a lot of information is that once I've started execution, I can't do anything with the interface. I can't minimize or maximize the application; I can't move the window. Ever since I've learned to implement threaded technology in my programs, I need not worry about that. It's a real boon to my productivity. Running Totally Background Queries This method is a challenge; you have to code everything yourself. It's not a lot of code, but there are certain things to consider that you can take for granted when you drop VCL components onto a form. This method is very useful for "Data Mining" types of programs, in which you're querying huge tables and coming up with highly refined, summarized result sets. The operations that typify this type of application are several queries performed in succession. Were you to run this kind of application in a single-threaded program, you can effectively kiss your productivity goodbye because your user interface will be locked. The example I'll be using for this discussion is the example I used above because that application is a data mining type of application. It was built to run several sequential queries against Paradox tables with hundreds of thousands of records (pretty big for Paradox tables). Let's revisit the type declaration of thread: type TEpiInfoProc = class(TThread) {Base processing class for Episode Information processing} private FStatMsg : String; FSession : TSession; tblPrimary, tblHistory, tblSymmetry : String; FIter : Integer; property Iteration : Integer read FIter write FIter; procedure eiCreateEpiTemp1; //Performs initial joins procedure eiCreateEpiTemp2; //Creates new table of summarized info procedure eiGetClassifications(clState, //'AMI', 'Asthma', etc. clName, //'H1', 'H2', etc. priFld, //Join field from Primary hstFld : String; //Join field from History bMode : TBatchMode); //Batch Mode (will always be //batCopy 1st time); procedure eiCreateEpiInfoTable(clSrc, clDst, clName : String); procedure eiCreateHistory(HistIndicator : String); //Generic processing methods procedure EnableBtns; procedure GetTableNames; procedure UpdStatus; procedure IndexTbl(dbName, tblName, idxName, fldName : String; idxOpts : TIndexOptions); protected procedure Execute; override; public constructor Create; end; There's something in the type declaration that I didn't bring up previously. Notice the second private variable FSession. Then, look at the constructor Create code below: constructor TEpiInfoProc.Create; begin inherited Create(True); FSession := TSession.Create(Application); with FSession do begin SessionName := 'EpiInfoSession'; NetFileDir := Session.NetFileDir; PrivateDir := 'D:\EpiPriv'; end; FreeOnTerminate := True; Resume; end; I simply instantiate a new session in my Create constructor for my queries and tables to point to during the course of execution. If you don't have the latest update to Delphi and you look in the help file under TSession, you're warned not to create a TSession on the fly. Why? Truthfully, I don't know. I broke the rules anyway when I originally saw this warning because after looking at the VCL source for the TSession object in the DB.PAS file, I didn't see anything in the TSession object that would lead me to believe that I couldn't instantiate a TSession object on the fly. This changed in 2.01 -- the help file makes no such warning -- so it's not an issue. Of all the things that I'm doing in the thread, creating this TSession is the absolute key operation because it provides a common ground for all the data access components that I instantiate in the thread. During the course of execution, I make calls to several procedures that perform queries on several different tables. However, they all operate similarly, so it's only necessary to list one of the procedures to illustrate how you should do your queries in a thread. Here's an example procedure: procedure TEpiInfoProc.eiCreateEpiTemp1; var sqlEI : TEnhQuery; begin sqlEI := TEnhQuery.Create(Application); with sqlEI do begin SessionName := FSession.SessionName; DatabaseName := 'Primary'; DestDatabaseName := 'PRIVATE'; DestinationTable := 'epitemp1.db'; DestBatchMoveMode := batCopy; DoBatchMove := True; with SQL do begin Clear; Add('SELECT DISTINCT d.Episode, d.PatientID, d.Paid AS TotPd, d.Charge AS TotChg, D1.Start'); Add('FROM "' + tblPrimary + '" d LEFT OUTER JOIN "'+ tblSymmetry+'" D1 '); Add('ON (D1.Episode = d.Episode)'); Add('ORDER BY d.Episode, d.PatientID, d.Paid, d.Charge, D1.Start'); end; try try Execute; except raise; Abort; end; finally Free; end; end; end; The important thing to notice in the code example above is that the first property I set for the query I create is its SessionName property. This falls in line with obeying rules 1 and 2 that I mentioned above. I did this first so I'd be in compliance with them right away, though you can set properties in any order. The whole point to this is that looking at the procedure, it's no different from any type of procedure you'd use to create a query and execute it in code. The only difference is that you don't rely on the default TSession; you use the one you created at construction time. While I can't release what the application above actually does, I'll give you some gory details. On the average, the application requires three to four hours to execute completely, based on the size of the client database. Most of the client databases are several hundred megabytes in size, and all are in excess of 600MB per table, so you can imagine that with the enormity of the data sets, a single threaded application would just sit there and appear to be locked. But running the queries in the background frees the interface, so status messages can be supplied quite easily, and the form remains active during the run. I took an easy way out in this program to prevent multiple threads from executing. Typically what you'll do is use a system mutex or a semaphore to provide a mechanism signaling that you have a process running. But I found it much easier to just disable all the buttons on the form, so the user could still move the form around and even enter selection criteria but couldn't execute the new process until the current process has finished. It took much less code to implement a disabling feature than to implement a system mutex. Running Background Queries that will Produce Visible Result Sets You won't need to do this often because typically, queries run to display data in a grid or some other data-aware component are usually run against smaller tables and execute fairly quickly, so you can get away with running a single thread of execution. There are some instances in which you might need to query a couple of unrelated tables at the same time, so running the queries in separate threads of execution makes a lot of sense. Below is unit code from a test unit I wrote to illustrate this purpose. On my form I've dropped two of each of the following components: TQuery, TSession, TDataSource and a TDBGrid. I also have a button that will perform the thread creation. Let's look at the code, then discuss it: unit thrtest; interface uses Windows, Messages, SysUtils, Classes, Graphics, Controls, Forms, Dialogs, DB, DBTables, Grids, DBGrids, StdCtrls; type TForm1 = class(TForm) DataSource1: TDataSource; Query1: TQuery; DBGrid1: TDBGrid; Button1: TButton; DataSource2: TDataSource; Query2: TQuery; DBGrid2: TDBGrid; Session1: TSession; Session2: TSession; procedure Button1Click(Sender: TObject); end; //Thread class declaration - very simple TQThread = class(TThread) private FQuery: TQuery; protected procedure Execute; override; public constructor Create(Query: TQuery); end; var Form1: TForm1; implementation {$R *.DFM} constructor TQThread.Create(Query: TQuery); begin inherited Create(True); // Create thread in a suspendend state so we can prepare vars FQuery := Query; //Set up local query var to be executed. FreeOnTerminate := True; //Free thread when finished executing Resume; end; procedure TQThread.Execute; begin FQuery.Open; //Perform the query end; procedure TForm1.Button1Click(Sender: TObject); begin TQThread.Create(Query1); TQThread.Create(Query2); end; end. I've made the thread intentionally easy. The only thing you pass to the constructor is a query. The rest is all taken care of. Before I did any coding of the above, I did the following with the components on the form:
This is a really simplistic example that works amazingly well. When the user presses the button, the program creates two new threads of type TQThread. Since we pass in the query we want to execute, the thread knows which one to operate on. Notice that I didn't put any synchronization code in this example. Some might think that you have to do this for a DataSource component, but the DataSource is Session-less, so it's impossible. Besides, the DataSource is merely a conduit between the Query and the DBGrid. It's fairly inert. As in any program, you can make this much more complex. For example, you can set the DatabaseName and SQL of the TQuerys at runtime, making this a really flexible display tool. You can add a batchmove facility at the tail-end of the run so that in addition to displaying the results in a DBGrid, the program will also write to result tables. Play around with this to see how you can expand on the concept. Conclusion Multi-threading is a breakthrough and revolutionary programming technology for many programmers. Note, that on Unix systems, multithreading has existed for several years. But for Windows platforms, this hasn't been the case until a few years ago. Also, I didn't cover some of the really advanced stuff that you can do with threads. For that, you'll have to refer to advanced-level Win32 development books. But hopefully, in the course of this discussion, you've reached a level of understanding that will get you on your way to constructing your own multi-threaded applications. |
Copyright © The Delphi Corner 2001 All Rights Reserved |
![]() |
||
|