Tempdb is often overlooked when finding out problems for perfomance issues with Sql Server. Usually tempdb just lies at the default location with the default size of 8 MB and autogrow enabled. Our application had this exact problem. We are using tempdb heavily with reporting, but we hadn't put any effort into tuning the tempdb. While testing a new server I tried to do some empirical testing with different configurations.
The Sql Server tempdb documents (for example Working with tempdb in SQL Server 2005) recommend the following things: setting the correct size, increasing the number of data files and putting the tempdb on separate disk array. Setting the correct size is obviously a good idea, since that removes the extra burden of constantly growing the file. Especially after a reboot when the file starts from 8 MB. Microsoft says that with SQL Server 2005 there shouldn't be so much issues with tempdb contention and need for multiple files, but I will test that also. The last recommendation of putting tempdb on separate disk array might be difficult in real life, where you usually have a server with some disk configuration and you have to work with that. In our case we have just two disk arrays so tempdb has to be on one or the other. But even if it isn't on dedicated disk, choosing the right disk array still makes a noticeable difference.
The server I used was a dual cpu quadcore with 8 GB of ram. It had a raid controller with two disk sets configured, RAID1 for the operating system (Windows Server 2003 Standard x64) and RAID10 for the data. I have written a piece about the used disk alignment and NTFS allocation unit sizes.
I didn't use any testing program here. Instead I used our own application which has one report which uses tempdb heavily. The database size is bit over 40 GB. Transaction log is located on operating system RAID1 array and the database is on RAID10 array. The report is just a stored procedure which mostly reads data from one big table based on clustered index. Of course the data is then joined and processed with other tables. At the end the results are written to another table, so this is not strictly a read only experiment.
For concurrency I created a Python script which made a 90 combinations of parameters for the report and then created different number of processes, from 1 to 128 (which is overkill, since there wasn't enough parameters for each of these). Every one of the processes took one set of parameters and started the stored procedure. After the procedure completed, the process took another set of parameters and so on, until all the reports were run. Between each test run Sql Server was restarted, so tempdb was reset to default size and cache cleared.
If you are now thinking "Gotcha! GIL! Python sucks at concurrency." I can tell you that I used Pyprocessing. But it doesn't make any difference here, since the concurrent activity is actually done on SQL Server and running the report takes anywhere from 20 seconds to several minutes. So if GIL caused any delays here, they would probably be marginal. I also made a version of the test script that uses threads and the results were in line with the process version.
Basically this experiment just documents the performance of tempdb on this specific hardware and with this specific application. But since the used hardware and configuration is quite ordinary, the results might be interesting for others also. Anyway, don't trust any of these numbers without testing with your own configuration first.
Straight to the hard facts. Below are the results for all the tests. These will be discussed later. All tests were run at least twice and I have picked the best time in each case. The results are here presented in seconds that it took to run all the 90 reports with different number of processes. The system was not running anything else except some system monitoring software and normal Windows background processes. Still there are some values in results that might include some kind of performance penalty because the data file was growing at that point or indexes grew larger etc. So we are not looking so much for single result but the trends across different workloads.
proc = number of processes defC = default 8MB tempdb located on C: (operating system drive) defD = default 8MB tempdb located on D: (data drive) diff% = difference to the base results (defC), negative means slower than original 3D = 3 GB tempdb located on D: 3Dm = 3 GB tempdb space located on D: consisting 8 equal size files 3DlC = 3 GB tempdb located on D: and tempdb log on C: proc defC defD diff% 3D diff% 3Dm diff% 3DlC diff% 1 4000 3847 3.85 3672 8.22 5190 -29.72 3708 7.30 2 2782 2640 5.10 2518 9.47 3500 -25.81 2581 7.21 4 2223 2059 7.36 2133 4.03 2687 -20.88 2123 4.51 8 1961 1875 4.39 1916 2.27 2255 -15.02 1959 0.06 16 1655 1473 10.97 1461 11.74 1739 -5.06 1619 2.15 32 1535 1363 11.15 1535 -0.01 1787 -16.44 1625 -5.90 64 1399 1478 -5.63 1350 3.55 1709 -22.09 1691 -20.83 128 1665 1441 13.44 1356 18.54 1653 0.71 1482 10.96
In general the throughput seems to grow nicely (total execution time decreases by several minutes) up to 16 processes but after that the problem probably is with disk system and adding more processes does not really increase throughput much.
The primary suggestion for tempdb is of course separate it from the user databases and SQL Server binaries. In this case this wasn't possible. So tempdb has to be either on operating system array or data array. I tested the difference first with default tempdb settings, where it starts to grow from 8 MB. With 128 processes tempdb grew up to 3 GB. This is almost 10 times bigger than in our current live database, where tempdb is around 300 MB, so the test is somewhat artificial, but at least it should give plenty of room for growth of the system.
SQL Server installation places tempdb in the same directory with other system databases. In this case that means the operating system drive. Tempdb data and log files are places in same directory. In this experiment I just moved tempdb to D: drive. This can be done with the following commands. The new paths will be used only after SQL Server is restarted.
alter database tempdb modify file(name=tempdev, filename='d:\mssql\data\tempdb.mdf') alter database tempdb modify file(name=templog, filename='d:\mssql\data\templog.ldf')
Looking at ther first two result columns (defC and defD) in results table we can see that just placing the tempdb on faster disk array gives us up to 13 percent more performance. Not surprisingly the difference is greatest with biggest workload.
The default size for tempdb is measly 8 MB. Microsoft recommends setting the correct size as default. This can be done with alter database command also. For example like this.
alter database tempdb modify file(name=tempdev, filename='d:\mssql\data\tempdb.mdf', size=3GB)
It has to be pointed out that while growing tempdb dynamically slows down the system temporarily while it does this, it usually is not a big issue. This is because tempdb shrinks only when SQL Server is restarted. Or maybe the correct saying would be that tempdb is recreated at restart. Anyway, if there is need for frequent restarts then there usually is some kind of other problem. SQL Server 2005 has also some optimizations for instant file initialization, which you can read more from the Microsoft article.
In my tests I found out that setting the default size at maximum did indeed give some extra performance. This is due to tempdb not needing to use autogrow at all. But comparing the results (3D) to default tempdb settings on same drive (defD) the performance increase is not big. But in any case, the best results of the experiment were achieved with this configuration.
Articles about optimizing tempdb usage suggest that more data files for tempdb will reduce contention. The recommended starting number is the number of processors in system.
This recommendation is maybe more important with older versions of SQL Server, since tempdb usage has been optimized in SQL Server 2005. In fact, there is an interesting blog post about this issue, which suggests that start with one file and only add files whenever there is measurable tempdb contention. Adding too many files will lead to extra overhead and cause performance problems itself.
In my tests this was indeed the case. I added 8 files (as the system has 8 cores). It is important to make them all equal sizes since only then SQL Server will spread the workload among all the files. This can be done with commands like the following.
alter database tempdb add file(name=tempdev2, filename='d:\mssql\data\tempdb2.ndf', size=384MB, filegrowth=10%)
Having multiple tempdb files only was useful when the number of processes was at the maximum. Which of course makes sense, since having large number of processes means more contention for tempdb resources. But the real surprise was how much worse the results were with smaller number of processes. For example having only one process the overhead was almost 30 percent comparing to baseline results. And comparing to same size tempdb on data array the results were over 40 percent weaker.
The results were so surprising that I rebooted, restored the test database and rerun the tests. Still the results were same and the number here is best of four test runs.
At least this experiment shows that one should never blindly follow any best practises lists. At least when the lists point to previous versions of the product.
Tempdb does not need backups and it can well be in simple recovery model. This is the recommendation and also the default. Tempdb does still use transaction log and setting the correct default size for log might also make a marginal difference. Nowhere had I seen any indication that there would be need to separate tempdb data and log files. As I was doing the test runs anyway, I decided to test this also.
Well, there is no need. At least in this case. But the results (3DlC) can be explained, because here we have only two disk arrays. And putting anything else on same drive with transaction log probably isn't a good idea anyway. The results were better compared to default settings on either drive, but this probably is just due to having the data file on faster array.
This experiment shows that when having to choose from transaction log and data arrays it is best to place tempdb with the data. Usually this will also be the faster disk array.
Setting the correct size for tempdb makes a measurable but small difference. Disk space should not be an issue since autogrow will anyway use the same space so it is best to set the default size to something reasonably big. Microsoft actually recommends something around 80 percent of the approximated maximum tempdb size, so autogrow would only be in use in extremely demanding situations.
The third important result was that adding more that one tempdb files is not a good idea unless tempdb contention is a problem.