When your Project Server queue slows down

There can be a whole load of different reasons why a queue in Project Server might not be processing as fast as you’d like – and after listing a few of them I am going in to detail on one condition we have seen with a few different customers now, under some different scenarios.  All of these are equally as applicable to Project Server 2007 as to Project Server 2010 queue processing.

1. Poor database maintenance – see http://technet.microsoft.com/en-us/library/cc973097.aspx for Project Server 2010 and http://technet.microsoft.com/en-us/library/cc973097(v=Office.12).aspx for the 2007 version

2. Large Shadow Tables – see http://support.microsoft.com/kb/2596531

3. Incorrectly configured queue settings – too many threads for the hardware you have – the 2007 reference is still one of the best descriptions of the Project Server queue – http://technet.microsoft.com/en-us/library/cc197395.aspx 

Update *** 3a (Thanks for the reminder Hrishi!) – Clock differences between servers – if the queue job can be processed ‘now’ but ‘now’ appears to be in the future to other servers then the job may go to sleep.

4. Just busy – the queue is there to spread the work out – sometimes this might take a while (long duration ‘admin’ type plans publishing…)

5. Poor execution plans for the stored procedures that control the queue processing – the main topic of this blog!

Getting in to more detail on point 5 – we have seen a few scenarios where the queue is processing really slowly – although SQL Server isn’t very busy and there appears to be no good reason that the jobs aren’t going through faster.  One customer’s observation (thanks Eric!) was that the jobs appeared to also complete in groups – that related to the number of threads.  For example if you had 8 threads running your Project queue (and of course plenty of hardware to support this quite high setting) then you might see 8 jobs running and some of these might have been expected to finish quickly – but they all seem to be waiting for the longest running job – then they all finish at once.  What appeared to be happening is that certain stored procedures that govern the queue jobs that ordinarily take just a few milliseconds might slow down and take half a second – or 10 seconds – or in the extreme 30 seconds to run.  And it was these stored procedures that decided which jobs get processed next.  A good way of summarizing this was that Project Server appeared to be taking more time deciding which queue job to run and not spending any time actually running anything!  Initially a reboot was the step that got things moving – but we wanted to be a bit smarter than the ‘turn it off and turn it on again’ approach.

I’ll talk a little more about the scenarios where we have seen this – some of the symptoms – some of the remedies and then some of the quick fixes that can both get things moving again and which can also confirm the problem – as well as introducing the topic of execution plans and how SQL Server decides how to get answers from the database.

An execution plan will detail how SQL Server goes about finding the data it needs – and will detail if specific indexes are used – or if perhaps just a table read is going to be quicker.  The plan will depend on the indexes available, the statistics available to understand the make-up of the indexes and the accuracy of those statistics.  Some of this comes back to point 1 in the list above – database maintenance – and some may depend on certain server settings – and even patches and versions of SQL Server in use.  Most of the slow downs we have seen relate to activities that will create a large number of queue jobs in a very short time.  Active Directory Synchronization, reporting database rebuild, mass deletion of projects or very heavy timesheet usage (or even automation) are the ones that spring to mind.  My theory is that the queue tables that were running along quite nicely while relatively empty suddenly find they have thousands of rows and the execution plan that is cached is no longer optimal.  SQL Server should quickly adjust for this – and we have certainly seen that Service Pack 1 of SQL Server 2008 R2 appears less susceptible to this problem.  There is also a recommended setting for Project Server – SET AUTO_UPDATE_STATISTICS_ASYNC ON (documented at http://technet.microsoft.com/en-us/library/ee662107.aspx) that could avoid this problem – as potentially it may be triggered if the database is waiting on the statistics to update rather than just getting on with the work!  As the problem is quite rare – and we (and our customers) usually want to get things working rather than examine the slow system – we haven’t had a good chance to confirm some of these theories.

One typical symptom, apart from things just running slow, is a particular event in the ULS log indicating a slow running stored procedure.  These are verbose level messages from SharePoint Foundation Monitoring, so may not be traces that are normally written to your ULS logs, but if they and they occur for some of the queue stored procedures such as MSP_ProjQ_Lock_Next_Available_Group  or MSP_TimesheetQ_Lock_Next_Available_Group.  The event ID to look for is b4ly and it might look something like tis:

08/09/2012 16:10:25.10 Microsoft.Office.Project.Server (0x36D0) 0x2E58 SharePoint Foundation Monitoring b4ly Verbose Leaving Monitored Scope (FillTypedDataSet — MSP_ProjQ_Lock_Next_Available_Group). Execution Time=21102.179988219601 <correlation ID>

I’ve highlighted the Event ID and the stored procedure name.  The execution time is in ms – so this is 21 seconds – v. slow.  You may also be able to track an increase in this time across your logs.

As mentioned above we usually found that a reboot got things moving again, but once we understood the likely causes of the problem we could be a little more focused and fix things with less inconvenience to the users.

How to Fix

1. Always worth getting your database maintenance working well – for this and many other reasons.

2. Set the recommended value for SET AUTO_UPDATE_STATISTICS_ASYNC

3. If Project Server is the only application running on SQL Server then the sledgehammer approach would be to run the following command within SQL Server management studio:  DBCC FREEPROCCACHE.  This will empty the stored execution plans from the procedure cache and then ALL stored procedures will need to recompile and the new execution plan should get things working faster.

4.  If other applications are running you probably don’t want to run that DBCC command – so instead you could just recompile a couple of the likely stored procedures:

Draft DB
EXEC Sp_recompile MSP_ProjQ_Lock_Next_Available_Group
EXEC Sp_recompile MSP_ProjQ_Get_Status_Of_Jobs_List
Published DB
EXEC Sp_recompile MSP_TimesheetQ_Get_Status_Of_Jobs_List
EXEC Sp_recompile MSP_TimesheetQ_Lock_Next_Available_Group

I hope this helps if you do run in to this kind of behavior – and hopefully by keeping SQL Server patched and using our recommended settings it isn’t something that you should see happen – but just in case – this detail might save a call to me and my colleagues.