SQL, Team Foundation Server, Windows

TFS 2013 Upgrade: Project Collection Stuck Offline (Servicing…) State

We got around to finally upgrading our TFS 2012 Update 4 & SQL Server 2008 R2 SP2 environments to TFS 2013 Update 4 & SQL Server 2014 CU6 Enterprise.

Step 0. Background

The environment consisted of 3 project collections:

  • CollectionA – 450GB – 30 projects
  • CollectionB – 10GB – 1300 projects
  • CollectionC – 1.9TB – 10 projects

All the collections BUT CollectionB upgraded (eventually) without a problem on a dedicated hardware platform consisting of 4 CPUs (HT), 32GB RAM & 15K RAID10 disk configurations.

Puzzling thing was that CollectionB being the smallest, it was in an Offline (Servicing…) state for nearly 8 hours!:

Looking at SQL processes and connections we didn’t see any activity what so ever, nothing out of the ordinary on network, CPU or memory utilisation. The same went for the application tier server which was tasked to perform the upgrade, BUT we did notice that one core (out of 4) was utilised at 100%…

Additionally in Configuration Logs there was no ApplyPatch log as that gets written only when it finishes! How convenient – you’re in the dark! 

Time to dig into SQL more deeply (thanks to a call to Microsoft who provided some queries which we overlooked):

Step 1: Get the job id

Here the OperationClass can be set to either ‘DetachCollection’ or ‘ApplyPatch’:

FROM    tbl_ServicingJobDetail
WHERE   OperationClass = ‘AttachCollection’

Step 1.1. Get the job id

We can get the job ID from querying with the ‘ApplyPatch’ value to see what project collection is being upgraded:

FROM    tbl_ServicingJobDetail
WHERE   OperationClass = ‘ApplyPatch’

Step 2. Get job status

We want to see the status of the job, to see if it still queued or started:

SELECT  QueueTime ,
        StartTime ,
        EndTime ,
        Result ,
FROM    tbl_JobHistory
WHERE   JobId LIKE ‘9E639712-61CA-4B17-ADA0-2989240FB1F6’
QueueTime StartTime EndTime Result ResultMessage
2015-03-24 20:09:55.493 2015-03-24 20:09:57.087 2015-03-24 20:10:45.367 7 NULL

Step 3. Get the latest log output

This is the the golden nugget that you can see exactly what is going on behind the scenes:

SELECT  DetailTime,
FROM    vw_ServicingStepDetail
WHERE   JobId = ‘9E639712-61CA-4B17-ADA0-2989240FB1F6’

So we were able to see that the upgrade process was still running at step 290 of 376. The Chat.FixupChatPermissions is a new thing from Microsoft – and if memory serves me right goes as back as TFS 2012.

End Result

In the end for us what worked was to wait for the upgrade process to finish on that project collection. We were lucky that we were able to determine the expected (worse case scenario) time of completion as we executed the upgrade on a test environment which was identical to the production one. In the test environment it took in total 36 hours to complete therefore we expected that on production it should not be more than that.


What have we learned from this upgrade? Well a few things that’s for sure:

  • A large project collections would take equal amount of time to upgrade as a complex one (1.7TB PC equals same amount of time as a PC with 1300 projects)
    • BUT in our case the 1.7TB project collection took the same amount of time as the 1300 projects PC because it spent 90% of the time compressing certain tables as we upgraded to SQL Server 2014 Enterprise Edition – TFS automatically detects this and goes ahead compressing and optimising tables
  • If you restart TFS/SQL the upgrade will resume where it left off BUT if that particular step takes 10 hours, 10 hours would be restarted again – so please think long and hard before you do it!
  • Upgrade process is performed by single-threaded servicing AKA on one core = SLOW

So there we go, before thinking that you might have a frozen project collection after an upgrade or attachment/detachment follow the above steps to see what really is going on behind the scenes!


Leave a Comment

Your email address will not be published. Required fields are marked *