/*Note – I have yet to see if this breaks anything – this post is for observational use only.  If it does break replication, I’ll add some notes to the bottom of this post */

 

Late last week, we had the fabulous opportunity to see just what happens when the dual power supplies for your SAN are plugged into the same PDU and that PDU gets powered off.  As you can imagine, this caused all sorts of neat things to occur.  Fortunately, the situation was remedied fairly quickly, but not before breaking replication for one of my databases.

 

The error that I received was: “The concurrent snapshot for publication ‘xx’ is not available because it has not been fully generated…”.  There were probably different routes to go to fix this issue, but I decided to reinitialize the subscription from a snapshot and went ahead and created the snapshot.  This particular database is a little under 500GB, so I knew that this would take awhile.

 

The next morning when I came to work, I could see that the snapshot had completed and the subscription had been reinitialized.   I knew that it would take some time for all of the commands that had queued up in the distribution database to be written to the subscription and so I went about my day.

 

Later, I noticed that the number of undistributed commands didn’t seem to be going down at all and was in fact growing.  This is a fairly heavily utilized database with writes going on very frequently, but it still seemed odd.   I did some querying to try to figure out just how behind we were, but, to my surprise, the subscription was completely up to date.  This is where it started to get strange.

 

I queried the MSdistriibution_status view to get an idea of what was still outstanding and saw the following:

Article_id Agent_id UndelivCmdsInDistDB DelivCmdsInDistDB
1 3 0 2130
2 3 0 3295
3 3 0 8275
4 3 0 2555
5 3 0 8310
6 3 0 2521
1 -2 2130 0
2 -2 3295 0
3 -2 8275 0
4 -2 2555 0
5 -2 8310 0
6 -2 2521 0
1 -1 2130 0
2 -1 3295 0
3 -1 8275 0
4 -1 2555 0
5 -1 8310 0
6 -1 2521 0

 

You’ll notice that the row counts for agents -1 and -2, the number of undistributed commands for each article was the same as the distributed commands for agent 3.

 

When I queried the MSDistribution_Agents table, I saw three distribution agents.  One was named identically to the distribution agent job.  The other two had subscriber_ids with negative numbers and the subscriber_db was ‘virtual’. 

 

I did some digging around on some unnamed search engine to try to figure out what virtual subscriptions were.  To be honest, there wasn’t  a lot of information to be found.  I did find a question on Eggheadcafe.com from 2007 where it seemed like the person was having a similar issue.  Reading through, it looked like it had something to do with the settings for allowing anonymous subscriptions and immediately synchronizing the snapshot.  I looked at my publication and both those were set to true.

 

Of course, I posted a question on Twitter at the same time and Hilary Cotter [Blog/Twitter], stated that the virtual subscriptions “are to keep your snapshot up to date, ignore them”.  That seemed to be consistent with what I was reading, but it still seemed odd that they appeared to be holding onto undistributed commands.  This, in turn, was causing my distribution database to grow.

 

Since, at this point, I figured that I was going to have to tear the whole thing down and rebuild it, I thought that I’d try to modify the publication first.  I knew that I didn’t want to allow anonymous subscriptions and I didn’t need to immediately synch from a snapshot, so I ran the following commands:

 

exec sp_changepublication @publication = 'xxxxx', @property = N'allow_anonymous', @value = 'false'

exec sp_changepublication @publication = 'xxxxx', @property = N'immediate_sync', @value = 'false'

 

I reran my query of the MSdistribution_status and voila, the undistributed commands were gone.  When I queried MSDistribution_agents, there was still a virtual subscriber, but it was no longer filling up my distribution database with undelivered commands.

 

I still don’t know what caused this – it seems like something occurred when I reinitialized the snapshot.  So far, replication is continuing to run without errors.  I’d be interested to hear if anyone has run into a similar issue.

The company I work for is currently in the midst of solidifying their SQL Server high availability and disaster recovery scheme.  While doing this, I did a comparison of all the available options within SQL Server for HA/DR for recommendation to my management.  We eventually went with a third party tool and this blog isn’t an endorsement of that tool, but rather is intended to give insight into how one company (the one I work for) approached looking at our disaster recovery options.  Since I had to put this together, I figured that maybe some of this might be helpful to someone else out there, so I thought I would write it up as a blog post.

 

What Do The Terms High Availability And Disaster Recovery Mean?

Before we get too deep into this, I figured I would define the terms High Availability and Disaster Recovery since they are quite often used together and sometimes, mistakenly, interchangeably.

 

High Availability – Protects against hardware/software issues that may cause downtime.  An example could be a bad processor or memory issue.

Disaster Recovery – The ability to recover from a catastrophic event, such as a natural disaster or, on a more local level, a data center disaster (fire, flood, cooling outage, etc) and be able to continue business operations with minimal loss and minimal downtime.

 

These two concepts really go hand in hand in ensuring business continuity and both should be built into a business continuity solution.

 

High Availability/Disaster Recovery Options In SQL Server

SQL Server has a number of native features built in to provide some basic measure of high availability/disaster recovery.  These include:

 

  • Database backups – this is probably the most basic form of disaster recovery for SQL Server and one that should be practiced in every situation, regardless of what other HA/DR solutions are in place.
  • Clustering – this provides a way of binding two or more like Windows servers together in what is known as a cluster.  Each of the servers in the cluster is considered a node and, typically, one node is “active” (processing transactions) and the other nodes are “passive”.  There is a private network that runs between the nodes so that if the active node fails to deliver a “heartbeat” that the other node(s) can detect, an automatic failover is invoked and one of the passive nodes is promoted to active.
    • Pros of Clustering:  Clustering provides redundancy in the case of server hardware failure and provides a fairly quick (within 5 minutes), automatic solution to move processing to another server.
    • Cons of Clustering
      • Does not protect against disk issues since all nodes share the database on the same disk.
      • Only protects against issues with that specific server, not data center-wide since all nodes are located in same data center.
      • Only addresses availability, not disaster recovery
  • Database Mirroring – new in SQL Server 2005, database mirroring offers a way to mirror a database to another server (and disk).  All transactions are sent to the mirror server as they are committed on the production server.  Depending on how it is implemented, can automate failover, similar to clustering.
    • Pros of Database Mirroring
      • Provides some form of both HA and DR since mirror can be located in another data center, thus protecting you from hardware failure and disaster.
      • Fast.  Mirror is updated virtually instantly
    • Cons of Database Mirroring
      • Only done at the database level, not the instance level and only user databases can be mirrored, not system databases.  This means that some other form of synchronizing logins and other system database objects has to be devised.
      • To be get all features of database mirroring, Enterprise Edition has to be used. 
      • Any SQL Agent jobs must be manually enabled on the mirror if a failover takes place.
  • Log Shipping – this is one of the oldest forms DR available in SQL Server and involves setting up a warm standby server with a copy of the user database on it that is to be protected and backups of the transaction log from the production database are periodically shipped to the standby server and applied.
    • Pros of Log Shipping:
      • Tried and true technology that has been around for a long time.
      • At the database level, can provide both HA and DR protection because warm standby can be located in another data center.
    • Cons of Log Shipping:
      • Amount of potential data loss is higher than with the other options because logs are usually shipped no more frequently than every 5 minutes and typically, more like every 30 minutes to an hour.
      • Failover is fairly manual and time intensive.  Takes longer than other options to bring warm standby online.
      • Like database mirroring, this only protects a database, not the entire instance.
      • For SQL Server 2000, this feature is only available in Enterprise Edition.  Available in Standard Edition from SQL Server 2005 forward.
      • Does not transfer non-logged transactions or schema changes (security, addition of database objects
  • Replication – while not necessarily intended as an HA/DR solution replication can be used in both scenarios.
    • Pros of Replication:
      • Real-time data updates.  Minimal data loss, if at all.
      • Can be used for both HA/DR as publisher and subscriber can be in different data centers.
    • Cons of Replication:
      • Complicated to setup and maintain.
      • No provided failover mechanism.  This has to be created as part of solution.
      • Again, only database specific solution, not instance specific.

Given that these native solutions were really only database and not instance based, we chose to look at third party options.  The product that we settled on was Double-Take.  While certainly not an easy solution to implement, Double-Take was attractive to us because it allowed us to set up a stand-by server in our hot site for each of our mission critical SQL Servers and then continuously replicate the entire instance to the stand-by server.  It also provides for either automated (if the server stops responding) or manual failover (we have opted for manual) through a control applet that automatically swaps DNS entries between the production and the standby server when a failover is initiated.

 

Double-Take:  How It Works

Both the production and the standby server have to have the exact same SQL Server configuration (Edition, build, directory structure/location, instances, etc) installed.  The Double-Take software is then installed on both the production and standby server and then, through the Double-Take software, the production server is configured as the source and the standby server is configured as the target.

 

During the configuration process, you can configure Double-Take to compress the data before it replicates it over the WAN to the stand-by server.  This can save a ton of bandwidth and makes sure that the transactions are queued on the target server as quickly as possible ensuring minimal data loss in the event of a failure.

 

Additionally, Double-Take will also generate scripts for the failover, failback, and restore of the databases back to the production server when it is back in commission.  These scripts and/or the replication can be customized by overriding the automatically generated rules that Double-Take creates.

This is a quickie blog post, but I thought I’d post it before I forgot about it.

 

We have transactional replication turned on for one of our production databases.  This database includes XML fields and we recently saw the following error:

 

           Length of LOB data (78862) to be replicated exceeds configured maximum 65536

 

Looking into it, this is a server setting and the default value for Max Text Replication Size is 65536 (bytes).   Because our XML data is of varying sizes, I made the decision to set this to the maximum allowable value – 2147483647 (bytes). 

 

To change it through SSMS:

  •                Right click on the server and choose Properties
  •                In the Advanced page – change the Max Text Replication Size to 2147483647

In T-SQL:

EXEC sp_configuremax text repl size’, 2147483647

GO

RECONFIGURE
GO
 
Hope this helps someone out! 
 
 

 

 

 

Replication tidbit

17 September 2009

I had set up replication a few times with SQL Server 2000 and there were times that I didn’t want to use a snapshot or a backup – had the schema and data already available and I just wanted to push new data over there.  Setting up the subscription was pretty easy.  In the Initialize Subscription dialog box, I just checked ‘No, the Subscriber already has the schema and data’.  Simple, eh?

initialize_2000

Recently I ran into the same situation – couldn’t use a snapshot or a backup (don’t ask).  The subscription database was already in place and I just wanted to push any new transactions from the publication over there.  When I went through the handy dandy replication wizard, though, I didn’t see the option that I had remembered from SQL Server 2000.  After an initial panic attack, I did some research.  The option is still there, it just looks different.

initialize_2005

All that you need to do now is uncheck the Initialize checkbox in the Initalize Subscription dialog box and away you go.  

Live, learn and replicate.