I Command You to Deliver Those Commands – Replication weirdness

Apr 2, 2010

/*Note – I have yet to see if this breaks anything – this post is for observational use only.  If it does break replication, I’ll add some notes to the bottom of this post */

 

Late last week, we had the fabulous opportunity to see just what happens when the dual power supplies for your SAN are plugged into the same PDU and that PDU gets powered off.  As you can imagine, this caused all sorts of neat things to occur.  Fortunately, the situation was remedied fairly quickly, but not before breaking replication for one of my databases.

 

The error that I received was: “The concurrent snapshot for publication ‘xx’ is not available because it has not been fully generated…”.  There were probably different routes to go to fix this issue, but I decided to reinitialize the subscription from a snapshot and went ahead and created the snapshot.  This particular database is a little under 500GB, so I knew that this would take awhile.

 

The next morning when I came to work, I could see that the snapshot had completed and the subscription had been reinitialized.   I knew that it would take some time for all of the commands that had queued up in the distribution database to be written to the subscription and so I went about my day.

 

Later, I noticed that the number of undistributed commands didn’t seem to be going down at all and was in fact growing.  This is a fairly heavily utilized database with writes going on very frequently, but it still seemed odd.   I did some querying to try to figure out just how behind we were, but, to my surprise, the subscription was completely up to date.  This is where it started to get strange.

 

I queried the MSdistriibution_status view to get an idea of what was still outstanding and saw the following:

Article_id Agent_id UndelivCmdsInDistDB DelivCmdsInDistDB
1 3 0 2130
2 3 0 3295
3 3 0 8275
4 3 0 2555
5 3 0 8310
6 3 0 2521
1 -2 2130 0
2 -2 3295 0
3 -2 8275 0
4 -2 2555 0
5 -2 8310 0
6 -2 2521 0
1 -1 2130 0
2 -1 3295 0
3 -1 8275 0
4 -1 2555 0
5 -1 8310 0
6 -1 2521 0

 

You’ll notice that the row counts for agents -1 and -2, the number of undistributed commands for each article was the same as the distributed commands for agent 3.

 

When I queried the MSDistribution_Agents table, I saw three distribution agents.  One was named identically to the distribution agent job.  The other two had subscriber_ids with negative numbers and the subscriber_db was ‘virtual’. 

 

I did some digging around on some unnamed search engine to try to figure out what virtual subscriptions were.  To be honest, there wasn’t  a lot of information to be found.  I did find a question on Eggheadcafe.com from 2007 where it seemed like the person was having a similar issue.  Reading through, it looked like it had something to do with the settings for allowing anonymous subscriptions and immediately synchronizing the snapshot.  I looked at my publication and both those were set to true.

 

Of course, I posted a question on Twitter at the same time and Hilary Cotter [Blog/Twitter], stated that the virtual subscriptions “are to keep your snapshot up to date, ignore them”.  That seemed to be consistent with what I was reading, but it still seemed odd that they appeared to be holding onto undistributed commands.  This, in turn, was causing my distribution database to grow.

 

Since, at this point, I figured that I was going to have to tear the whole thing down and rebuild it, I thought that I’d try to modify the publication first.  I knew that I didn’t want to allow anonymous subscriptions and I didn’t need to immediately synch from a snapshot, so I ran the following commands:

 

exec sp_changepublication @publication = 'xxxxx', @property = N'allow_anonymous', @value = 'false'

exec sp_changepublication @publication = 'xxxxx', @property = N'immediate_sync', @value = 'false'

 

I reran my query of the MSdistribution_status and voila, the undistributed commands were gone.  When I queried MSDistribution_agents, there was still a virtual subscriber, but it was no longer filling up my distribution database with undelivered commands.

 

I still don’t know what caused this – it seems like something occurred when I reinitialized the snapshot.  So far, replication is continuing to run without errors.  I’d be interested to hear if anyone has run into a similar issue.

Share with others

No Responses so far | Have Your Say!

Leave a Feedback

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

Spam Protection by WP-SpamFree Plugin