Friends don’t let friends delete their cache or cancel queue jobs

Inspired by Brian Kennemer’s e-mail tag line of “Friends don’t let friends assign resources to summary tasks”, I thought I would get back on my soapbox about the cache and queue.  I do appreciate that there are some early bugs around custom field display that require the occasional local cache deletion, and there are a couple of rare scenarios that leave the queue in a bad way so that things need canceling.  But many of the situations our customers run into can be resolved without recourse to either of these actions, both of which can lead to DATA LOSS!

A couple of examples from the queue:-

Project Save from Project Professional – Getting Queued

This means the data is flowing from the client cache to the server queue, and once it is all in the queue it will be loaded into the Project Server database tables.  If the client goes away while this is happening (and this can be our fault, as we don’t handle Project closing very well), or the network goes down, or you hibernate your laptop as you race out of Starbucks, then the job will just sit in the Getting Queued state.  If you cancel the queue job then the good data in the client cache will never see the light of day.  The correct approach is to identify where the save is coming from (the queue displays the job owner) and then get that person to reconnect their client, at which point the Getting Queued job should continue.  In some cases you will see the original save show as canceled, but if you look in the ULS logs there will be a message along the lines of:-

PWA:http://server1/PWA, SSP:SharedServices1, User:DOMAIN\username, PSI: WinProj.PreSaveProject [T:abf8f56f-e3d1-4139-9355-55ef33aa1378][U:079d778a-2a14-455a-a52e-3141b57e75ea][S:6521e25f-5c1c-41d3-a224-7a868e161c42][D:CLIENT1\ProjConf 2][J:abf8f56f-e3d1-4139-9355-55ef33aa1378][PS_AC][3] Cancelling correlation 2439f848-3966-44b7-a645-1ff7b6914f10 as it has 1 send incomplete winproj save jobs.

which indicates the original save hadn’t got very far, so the server cancels it and starts again.  This was in fact the project that should have demonstrated this recovery at the Project Conference, but I didn’t leave Project Professional connected to that profile for long enough (my fault, trying to present 3 hours of stuff in 75 minutes).  Another interesting tip from this queue job: CLIENT1\ProjConf 2 is the client machine name and the Project Server account (not the user account but the “profile”) used on that machine to make this queue request.

So the queue shows something like this:-

[image: screenshot of the queue]

with the important fact that I didn’t cancel anything and the save came from my client cache – and nothing was lost.
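If you find yourself hunting through ULS logs for these messages, the bracketed tags can be pulled apart with a few lines of Python. This is only a rough sketch: the one tag meaning taken from this post is that D: carries the machine name and profile, and the code assumes the two are separated by a backslash. The function name `parse_uls_tags` is made up for illustration and is not part of any Project Server tooling.

```python
import re

# Sample message from this post. The backslashes in DOMAIN\username and
# CLIENT1\ProjConf 2 are an assumption about how the fields are separated.
ULS_MESSAGE = (
    r"PWA:http://server1/PWA, SSP:SharedServices1, User:DOMAIN\username, "
    r"PSI: WinProj.PreSaveProject "
    r"[T:abf8f56f-e3d1-4139-9355-55ef33aa1378]"
    r"[U:079d778a-2a14-455a-a52e-3141b57e75ea]"
    r"[S:6521e25f-5c1c-41d3-a224-7a868e161c42]"
    r"[D:CLIENT1\ProjConf 2]"
    r"[J:abf8f56f-e3d1-4139-9355-55ef33aa1378][PS_AC][3] "
    r"Cancelling correlation 2439f848-3966-44b7-a645-1ff7b6914f10 "
    r"as it has 1 send incomplete winproj save jobs."
)

# Matches tags of the form [X:value], where X is a single capital letter.
TAG_RE = re.compile(r"\[([A-Z]):([^\]]*)\]")


def parse_uls_tags(message):
    """Return the single-letter tags plus the machine/profile split of D:."""
    tags = dict(TAG_RE.findall(message))
    machine, sep, profile = tags.get("D", "").partition("\\")
    return {
        "tags": tags,
        "machine": machine if sep else None,
        "profile": profile if sep else None,
    }


if __name__ == "__main__":
    info = parse_uls_tags(ULS_MESSAGE)
    print("Machine:", info["machine"])
    print("Profile:", info["profile"])
```

Knowing the machine and profile tells you whose Project Professional needs to be reconnected, which is exactly what you want before anyone reaches for the cancel button.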

Timesheet Update – Failed and Blocking Correlation

This next example shows a couple of things: the sleeping state, and that the retry does work, as long as you fix the underlying problem.  The queue is all data driven, and if the data stays the same then it will behave exactly the same.  (One definition of insanity is doing the same thing over and over and expecting a different outcome; same thing with the queue.)  If I submit a timesheet with administrative time, then when the update is processed it puts a calendar exception into my calendar for the non-working time.  If as a resource I am checked out, then this update can go into a sleeping state (Waiting to be processed (Sleeping)), where it wakes up every 2 minutes and tries again.  If I happen to get checked in in the meantime then all is good and the process completes.  If not, then eventually it will fail.  The error shown in the queue even gives you a reasonable clue as to why it failed (if you know the secret language: CICO = Check-in check-out):-

Error summary/areas:
Array
CICOAlreadyCheckedOutToYou
Queue
GeneralQueueJobFailed
Error details:

<?xml version="1.0" encoding="utf-16"?>
<errinfo>
  <array name="Array" type="System.Guid">
    <item value="079d778a-2a14-455a-a52e-3141b57e75ea">
      <error id="10101" name="CICOAlreadyCheckedOutToYou" uid="ce366c36-421b-4c47-8fa0-d68f42ba63d6" />
    </item>
  </array>
  <general>
    <class name="Queue">
      <error id="26000" name="GeneralQueueJobFailed" uid="385171b0-3ee9-4087-b308-859cb62fea53" JobUID="702e81a6-4f0e-4faf-ab78-2ab81fe60972" ComputerName="SERVER2" GroupType="TimesheetUpdate" MessageType="UpdateTimesheetMessage" MessageId="1" Stage="" />
    </class>
  </general>
</errinfo>
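If you have a pile of these error details to wade through, the XML above is easy enough to read with a script. The following is a minimal sketch using only the Python standard library. The function name `summarize_errinfo` and the "CICO prefix means check-in/check-out" test are my own illustration, based on nothing more than the sample error shown here.

```python
import xml.etree.ElementTree as ET

# Error details copied from the failed timesheet update in this post.
ERROR_DETAILS = '''<?xml version="1.0" encoding="utf-16"?>
<errinfo>
  <array name="Array" type="System.Guid">
    <item value="079d778a-2a14-455a-a52e-3141b57e75ea">
      <error id="10101" name="CICOAlreadyCheckedOutToYou" uid="ce366c36-421b-4c47-8fa0-d68f42ba63d6" />
    </item>
  </array>
  <general>
    <class name="Queue">
      <error id="26000" name="GeneralQueueJobFailed" uid="385171b0-3ee9-4087-b308-859cb62fea53" JobUID="702e81a6-4f0e-4faf-ab78-2ab81fe60972" ComputerName="SERVER2" GroupType="TimesheetUpdate" MessageType="UpdateTimesheetMessage" MessageId="1" Stage="" />
    </class>
  </general>
</errinfo>'''


def summarize_errinfo(xml_text):
    """List every <error> element with its id, name and queue attributes."""
    # Skip the XML declaration: the text is already a Python string, so the
    # utf-16 label in the declaration no longer applies.
    root = ET.fromstring(xml_text[xml_text.index("<errinfo"):])
    return [
        {
            "id": err.get("id"),
            "name": err.get("name"),
            "job_uid": err.get("JobUID"),        # only on the queue error
            "group_type": err.get("GroupType"),  # only on the queue error
        }
        for err in root.iter("error")
    ]


def looks_like_checkout_problem(errors):
    """Assumption for illustration: names starting with CICO are check-in/check-out errors."""
    return any(e["name"].startswith("CICO") for e in errors)


if __name__ == "__main__":
    errors = summarize_errinfo(ERROR_DETAILS)
    for e in errors:
        print(e["id"], e["name"])
    if looks_like_checkout_problem(errors):
        print("Check who has the resource checked out before retrying.")
```

The point is the same as reading it by eye: the specific error (here the CICO one) tells you what to fix, and the general queue error just tells you which job to retry afterwards.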

To recover from this error you do not need to cancel.  Just fix the underlying problem (in this case I had my account open in Manage Users in another IE session), then select the job and click Retry.


This time it all works fine – even the blocked jobs can continue and other related jobs get spawned to update the reporting DB.

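To make the "data driven" point concrete, here is a toy model of the sleeping and retry behavior from this example. It is emphatically not how the queue is implemented. The two-minute wake-up comes from the description above, but `MAX_ATTEMPTS`, the `Job` class and both functions are hypothetical, and the model counts attempts rather than actually sleeping.

```python
from dataclasses import dataclass

SLEEP_SECONDS = 120  # the two-minute wake-up described in this post
MAX_ATTEMPTS = 5     # hypothetical limit; the real value is not given here


@dataclass
class Job:
    name: str
    state: str = "Waiting to be processed"
    attempts: int = 0


def process(job, is_checked_out, max_attempts=MAX_ATTEMPTS):
    """Try the job until the resource is checked in or attempts run out.

    is_checked_out(attempt) returns True while the resource is still
    checked out on that attempt.
    """
    while job.attempts < max_attempts:
        job.attempts += 1
        if not is_checked_out(job.attempts):
            job.state = "Success"
            return job
        # A real queue would now wait SLEEP_SECONDS before waking up again.
        job.state = "Waiting to be processed (Sleeping)"
    job.state = "Failed and Blocking Correlation"
    return job


def retry(job, is_checked_out, max_attempts=MAX_ATTEMPTS):
    """Retry a failed job: same job, fresh set of attempts."""
    job.attempts = 0
    job.state = "Waiting to be processed"
    return process(job, is_checked_out, max_attempts)


if __name__ == "__main__":
    job = process(Job("Timesheet Update"), lambda attempt: True)
    print(job.state)  # nothing changed, so it fails
    job = retry(job, lambda attempt: True)
    print(job.state)  # same data, same failure
    job = retry(job, lambda attempt: False)
    print(job.state)  # problem fixed first, so the retry succeeds
```

Retrying without changing anything gives the same failure every time; retrying after the check-out is cleared succeeds, and nothing had to be canceled along the way.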

So please, please, please – deleting the local cache and canceling queue jobs should be a last resort.  There is usually a better way.
