March towards SQL Server : Day 21 – SQL DBA Interview Questions Answers – REPLICATION
Now that we have crossed two-thirds of the month and covered a lot of topics, let’s take a deeper look at another very interesting feature of SQL Server: Replication. It can be simple, a bit tough, or complex, depending on the type of replication you use. While it is highly useful for reporting purposes, it can also be described as one of the cheapest options for high availability and disaster recovery, because you can move individual objects (say, the most important objects of your environment) from one server to another. You don’t need to move the whole database from one node to another if only a few tables matter from an HA/DR point of view. Let’s start with the most frequently asked interview questions and answers on Replication. Here you go.
1) What is Replication?
Replication is a set of technologies in SQL Server that can move data and database objects in an automated way from one database to another. This allows users to work with the same data at different locations, and changes that are made are transferred to keep the databases synchronized.
2) What are types of replication?
- Snapshot replication – As the name implies snapshot replication takes a snapshot of the published objects and applies it to a subscriber. Snapshot replication completely overwrites the data at the subscriber each time a snapshot is applied. It is best suited for fairly static data or if it’s acceptable to have data out of sync between replication intervals. A subscriber does not always need to be connected, so data marked for replication can be applied the next time the subscriber is connected. An example use of snapshot replication is to update a list of items that only changes periodically.
- Transactional replication – As the name implies, it replicates each transaction for the article being published. To set up transactional replication, a snapshot of the publisher or a backup is taken and applied to the subscriber to synchronize the data. After that, when a transaction is written to the transaction log, the Log Reader Agent reads it from the log and writes it to the distribution database, and the Distribution Agent then delivers it to the subscriber. Only committed transactions are replicated to ensure data consistency. Transactional replication is widely applied where high latency is not acceptable, such as an OLTP system for a bank or a stock trading firm, because you need near real-time updates of cash or stocks.
- Merge replication – This is the most complex type of replication, and it allows changes to happen at both the publisher and the subscriber. As the name implies, changes are merged to keep the data consistent and uniform. Just like transactional replication, an initial synchronization is done by applying a snapshot. When a change occurs at the Publisher or Subscriber, triggers record it in change tracking tables. The Merge Agent reads these tracking tables and applies the changes directly between the Publisher and the Subscriber; unlike transactional replication, the changes do not flow through the distribution database. The Merge Agent is capable of resolving conflicts that occur during data synchronization. An example of using merge replication is a store with many branches where products are centrally stored in inventory. As the overall inventory is reduced, the change is propagated to the other stores to keep the databases synchronized.
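The three types above map onto different publication settings. The following is a minimal sketch, assuming a Distributor is already configured; the database name SalesDB and the publication names are hypothetical and not from the original post.

```sql
-- Hypothetical names: SalesDB, Sales_Snapshot, Sales_Tran, Sales_Merge.
USE SalesDB;
GO
-- Enable the database for snapshot/transactional publishing.
EXEC sp_replicationdboption
    @dbname = N'SalesDB', @optname = N'publish', @value = N'true';

-- Snapshot publication: data reaches Subscribers only when a snapshot is applied.
EXEC sp_addpublication
    @publication = N'Sales_Snapshot', @repl_freq = N'snapshot', @status = N'active';

-- Transactional publication: committed changes are delivered continuously.
EXEC sp_addpublication
    @publication = N'Sales_Tran', @repl_freq = N'continuous', @status = N'active';

-- Merge publication: uses its own database option and its own procedure.
EXEC sp_replicationdboption
    @dbname = N'SalesDB', @optname = N'merge publish', @value = N'true';
EXEC sp_addmergepublication @publication = N'Sales_Merge';
```

A working publication needs further steps (agent jobs, articles, subscriptions); this only shows where the replication type is chosen.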
3) What are various Agents of replication?
- Snapshot Agent – The Snapshot Agent is used with all types of replication. It prepares the schema and the initial bulk copy files of published tables and other objects, stores the snapshot files, and records information about synchronization in the distribution database. The Snapshot Agent runs at the Distributor.
- Log Reader Agent – The Log Reader Agent is used with transactional replication. It moves transactions marked for replication from the transaction log on the Publisher to the distribution database. Each database published using transactional replication has its own Log Reader Agent that runs on the Distributor and connects to the Publisher (the Distributor can be on the same computer as the Publisher).
- Distribution Agent – The Distribution Agent is used with snapshot replication and transactional replication. It applies the initial snapshot to the Subscriber and moves transactions held in the distribution database to Subscribers. The Distribution Agent runs at either the Distributor for push subscriptions or at the Subscriber for pull subscriptions.
- Merge Agent – The Merge Agent is used with merge replication. It applies the initial snapshot to the Subscriber and moves and reconciles incremental data changes that occur. Each merge subscription has its own Merge Agent that connects to both the Publisher and the Subscriber and updates both. The Merge Agent runs at either the Distributor for push subscriptions or the Subscriber for pull subscriptions.
- Queue Reader Agent – The Queue Reader Agent is used with transactional replication with the queued updating option. The agent runs at the Distributor and moves changes made at the Subscriber back to the Publisher. Unlike the Distribution Agent and the Merge Agent, only one instance of the Queue Reader Agent exists to service all Publishers and publications for a given distribution database.
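The agents listed above run as SQL Server Agent jobs, so one way to see which of them exist on a server is to list the jobs in the replication job categories. This is a sketch only; it assumes you run it on the server where the agents run (the Distributor, or the Subscriber for pull subscriptions).

```sql
-- List replication agent jobs by category (REPL-Snapshot, REPL-LogReader,
-- REPL-Distribution, REPL-Merge, REPL-QueueReader, and the cleanup jobs).
SELECT j.name    AS job_name,
       c.name    AS agent_category,
       j.enabled AS is_enabled
FROM msdb.dbo.sysjobs AS j
JOIN msdb.dbo.syscategories AS c
    ON c.category_id = j.category_id
WHERE c.name LIKE N'REPL-%'
ORDER BY c.name, j.name;
```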
4) Why is primary key needed in Transactional replication?
The reason is that at the Subscriber, rows are updated or deleted one at a time, and each row is located by its primary key.
If you delete 100 rows at the Publisher using a single DELETE statement, 100 DELETE statements are executed at the Subscriber.
-- on publisher: one statement removes many rows
DELETE FROM dbo.tbAddress WHERE City = 'LONDON'
-- on subscriber: executed once per deleted row
DELETE FROM dbo.tbAddress WHERE pk = @pk
5) Which database objects can be included in replication?
6) What are prerequisites of transactional replication?
Primary key:
Every table published as an article in transactional replication must have a primary key. Primary keys maintain the uniqueness of records and referential integrity between tables, and replication relies on them to identify the row to change at the Subscriber.
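To check this prerequisite before publishing, a query along the following lines lists the tables that cannot be published in a transactional publication. It is a sketch to run in the database you intend to publish.

```sql
-- Tables without a primary key cannot be articles in transactional replication.
SELECT s.name AS schema_name,
       t.name AS table_name
FROM sys.tables AS t
JOIN sys.schemas AS s
    ON s.schema_id = t.schema_id
WHERE OBJECTPROPERTY(t.object_id, 'TableHasPrimaryKey') = 0
ORDER BY s.name, t.name;
```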
Securing snapshot folder:
Enough disk space for database being published:
We need to make sure that we have ample space available for the transaction log of the published database, because the log will continue to grow and its records cannot be truncated until they have been moved to the distribution database. Please note that even in the simple recovery model, the log file can grow large if replication breaks. That is why it is recommended to enable autogrowth for the transaction log. We should also make sure that the distribution database is available and the Log Reader Agent is running.
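When a published database's log keeps growing, a quick way to confirm that replication is what holds it is to look at the log reuse wait reason. A minimal sketch:

```sql
-- Databases whose log cannot be truncated because replicated transactions
-- have not yet been read by the Log Reader Agent.
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
WHERE log_reuse_wait_desc = N'REPLICATION';

-- Current log size and percentage used for every database.
DBCC SQLPERF(LOGSPACE);
```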
Enough disk space for distribution database:
It is necessary to have enough disk space allocated to the distribution database. This is because the distribution database stores the transactions marked for replication until they are applied to the subscriber database, within the distribution retention period (72 hours by default), or it retains the transactions until the Snapshot Agent re-runs and creates a new snapshot.
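The retention period mentioned above can be inspected and changed at the Distributor. The sketch below assumes the distribution database has the default name distribution.

```sql
-- Run at the Distributor.
-- Shows each distribution database with its minimum and maximum
-- transaction retention (in hours) and history retention.
EXEC sp_helpdistributiondb;

-- Set the maximum transaction retention explicitly (72 hours is the default).
EXEC sp_changedistributiondb
    @database = N'distribution',
    @property = N'max_distretention',
    @value    = N'72';
```

A longer retention period lets Subscribers stay offline longer before they must be reinitialized, at the cost of a larger distribution database.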
Use domain account as service account:
We should always use a domain account as the service account, so that when agents access the shared folder of snapshot files, they do not fail simply because a local account has no permission on the network share. When choosing the service account, we can pick a built-in account such as Local System or Network Service, or the “This account” option, where we specify the domain account under which the service will run.
7) Difference between push and pull replication.
- Push – As the name implies, a push subscription pushes data from publisher to the subscriber. Changes can be pushed to subscribers on demand, continuously, or on a scheduled basis.
- Pull – As the name implies, a pull subscription requests changes from the Publisher. This allows the subscriber to pull data as needed. This is useful for disconnected machines such as notebook computers that are not always connected and when they connect they can pull the data.
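In T-SQL, the difference shows up in where the subscription and its agent job are created. The sketch below uses hypothetical server names (PUBSRV, SUBSRV1, SUBSRV2), a hypothetical publication Sales_Tran in database SalesDB, and assumes the Publisher is also the Distributor.

```sql
-- PUSH: both steps run at the Publisher, in the publication database.
EXEC sp_addsubscription
    @publication = N'Sales_Tran', @subscriber = N'SUBSRV1',
    @destination_db = N'SalesCopy', @subscription_type = N'push';
EXEC sp_addpushsubscription_agent
    @publication = N'Sales_Tran', @subscriber = N'SUBSRV1',
    @subscriber_db = N'SalesCopy';

-- PULL, step 1: register the subscription at the Publisher.
EXEC sp_addsubscription
    @publication = N'Sales_Tran', @subscriber = N'SUBSRV2',
    @destination_db = N'SalesCopy', @subscription_type = N'pull';

-- PULL, step 2: run at the Subscriber (SUBSRV2), in the subscription database.
EXEC sp_addpullsubscription
    @publisher = N'PUBSRV', @publisher_db = N'SalesDB',
    @publication = N'Sales_Tran';
EXEC sp_addpullsubscription_agent
    @publisher = N'PUBSRV', @publisher_db = N'SalesDB',
    @publication = N'Sales_Tran', @distributor = N'PUBSRV';
```

Agent security settings (job login, connection mode) are left at their defaults here and would normally be specified explicitly.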
8) Define Distributor, Subscriber & Publisher
The Publisher is a server that makes data available for replication to other servers. In addition to being the server where you specify which data is to be replicated, the Publisher also detects which data has changed and maintains information about all publications at that site. Usually, any data element that is replicated has a single Publisher, even if it may be updated by several Subscribers or republished by a Subscriber. The publication database is the database on the Publisher that is the source of data and database objects to be replicated. Each database used in replication must be enabled as a publication database either through the Configure Publishing and Distribution Wizard, the Publisher and Distributor properties, by using the sp_replicationdboption system stored procedure, or by creating a publication on that database using the Create Publication Wizard.
The Distributor is a server that contains the distribution database and stores metadata, history data, and/or transactions. The Distributor can be a separate server from the Publisher (remote Distributor), or it can be the same server as the Publisher (local Distributor). The role of the Distributor varies depending on which type of replication you implement, and in general, its role is much greater for snapshot replication and transactional replication than it is for merge replication.
Subscribers are servers that receive replicated data. Subscribers subscribe to publications, not to individual articles within a publication, and they subscribe only to the publications that they need, not necessarily all of the publications available on a Publisher. If you have applications using transactional replication built with Microsoft SQL Server version 6.5 or later, and those applications subscribe directly to articles instead of to publications, the applications will continue to work in SQL Server 2000. However, you should begin to migrate your subscriptions to the publication level where each publication is composed of one or more articles.
9) Define Article, Publication & Subscription.
An article identifies a database object that is included in a publication. A publication can contain different types of articles, including tables, views, stored procedures, and other objects. When tables are published as articles, filters can be used to restrict the columns and rows of the data sent to Subscribers.
A publication is a collection of one or more articles from one database. The grouping of multiple articles into a publication makes it easier to specify a logically related set of database objects and data that are replicated as a unit.
A subscription is a request for a copy of a publication to be delivered to a Subscriber. The subscription defines what publication will be received, where, and when. There are two types of subscriptions: push and pull.
10) Can we add or drop a single article from a publication? If so, how?
It is not necessary to stop activity on the publication or subscription databases in order to add a table (or another object). Add a table to a publication through the Publication Properties – <Publication> dialog box or the stored procedures sp_addarticle and sp_addmergearticle.
Remove a table from the publication using sp_droparticle, sp_dropmergearticle, or the Publication Properties – <Publication> dialog box. You cannot drop articles from snapshot or transactional publications after subscriptions have been added; you must drop the subscriptions first.
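For a transactional publication, the two procedures named above could be used roughly as follows. The publication name Sales_Tran and database SalesDB are hypothetical; dbo.tbAddress is the table from the earlier example.

```sql
USE SalesDB;   -- hypothetical publication database
GO
-- Add a single table to an existing transactional publication.
EXEC sp_addarticle
    @publication   = N'Sales_Tran',
    @article       = N'tbAddress',
    @source_owner  = N'dbo',
    @source_object = N'tbAddress';

-- Remove the same article again. As noted above, this fails for snapshot
-- and transactional publications while subscriptions to the article exist.
EXEC sp_droparticle
    @publication = N'Sales_Tran',
    @article     = N'tbAddress',
    @force_invalidate_snapshot = 1;
```

For merge publications, sp_addmergearticle and sp_dropmergearticle play the same role.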
11) Define sp_replcounters
Returns replication statistics about latency, throughput, and transaction count for each published database. This stored procedure is executed at the Publisher on any database.
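The procedure takes no parameters, so a call is as simple as the sketch below (run at the Publisher, in any database).

```sql
-- One row per published database: replicated transaction count,
-- replication rate, and latency as seen by the Log Reader Agent.
EXEC sp_replcounters;
```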
12) Can we use replication to replicate data across different RDBMSs, e.g. SQL Server to Oracle?
Oracle and DB2 can subscribe to snapshot and transactional publications using push subscriptions. Subscriptions are supported for the two most recent versions of each database listed using the most recent version of the OLE DB provider listed.
However, heterogeneous replication to non-SQL Server subscribers is deprecated, and Oracle publishing is deprecated as well. To move data, create solutions using change data capture and SSIS.
This feature will be removed in a future version of Microsoft SQL Server. Avoid using this feature in new development work, and plan to modify applications that currently use this feature.
13) Explain latency in replication. How can you monitor the latency of a particular publication?
Transactional replication provides the tracer token feature, which provides a convenient way to measure latency in transactional replication topologies and to validate the connections between the Publisher, Distributor and Subscribers. A token (a small amount of data) is written to the transaction log of the publication database, marked as though it were a typical replicated transaction, and sent through the system, allowing a calculation of:
- How much time elapses between a transaction being committed at the Publisher and the corresponding command being inserted in the distribution database at the Distributor.
- How much time elapses between a command being inserted in the distribution database and the corresponding transaction being committed at a Subscriber.
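Tracer tokens can also be posted and read with T-SQL. This is a minimal sketch, run at the Publisher in the publication database; SalesDB and Sales_Tran are hypothetical names, and the ten-second wait is an arbitrary choice to give the token time to travel.

```sql
USE SalesDB;
GO
DECLARE @token_id int;

-- Write a tracer token into the publication database's transaction log.
EXEC sys.sp_posttracertoken
    @publication     = N'Sales_Tran',
    @tracer_token_id = @token_id OUTPUT;

-- Give the token time to reach the Distributor and the Subscribers.
WAITFOR DELAY '00:00:10';

-- List all tokens posted for the publication.
EXEC sys.sp_helptracertokens @publication = N'Sales_Tran';

-- Show Publisher-to-Distributor and Distributor-to-Subscriber latency
-- for the token just posted.
EXEC sys.sp_helptracertokenhistory
    @publication = N'Sales_Tran',
    @tracer_id   = @token_id;
```

A NULL latency in the history output usually means the token has not yet arrived at that stage.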
14) What permissions does a user need to monitor replication?
The replmonitor database role in the distribution database. These users can monitor replication, but cannot change any replication properties.
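Granting this access could look like the sketch below. The Windows login CONTOSO\repl_watcher is hypothetical and is assumed to exist already on the Distributor; the distribution database is assumed to have the default name.

```sql
USE distribution;
GO
-- Map the (hypothetical) login to a user in the distribution database.
CREATE USER [CONTOSO\repl_watcher] FOR LOGIN [CONTOSO\repl_watcher];

-- Membership in replmonitor allows monitoring but not changing replication.
-- ALTER ROLE ... ADD MEMBER needs SQL Server 2012 or later;
-- older versions use sp_addrolemember instead.
ALTER ROLE replmonitor ADD MEMBER [CONTOSO\repl_watcher];
```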
15) Name some commonly used Replication DMVs and their use.
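There are four replication-related DMVs in SQL Server:
- sys.dm_repl_articles – Returns information about database objects published as articles in a replication topology.
- sys.dm_repl_schemas – Returns information about table columns published by replication.
- sys.dm_repl_tranhash – Returns information about transactions being replicated in a transactional publication.
- sys.dm_repl_traninfo – Returns information on each replicated or change data capture transaction.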
16) What are the advantages and disadvantages of Snapshot replication over Transactional replication.
Snapshot Replication would be good to use if:
1. You are sure that you will synchronize only once a day, and your business requirements do not include replicating transactions as and when they are committed on the Publisher.
2. The size of the replicating articles is small, maybe a few MBs/GBs.
3. It does not matter that the replicating articles are locked for some time (until the snapshot is generated).
Transactional Replication would be good to use if:
1. You want incremental changes to be propagated to Subscribers as they occur.
2. The application requires low latency between the time changes are made at the Publisher and the time the changes arrive at the Subscriber.
3. The application requires access to intermediate data states. For example, if a row changes five times, transactional replication allows an application to respond to each change (such as firing a trigger), not simply the net data change to the row.
4. The Publisher has a very high volume of insert, update, and delete activity.
15) What is peer-to-peer replication?
Peer-to-peer replication provides a scale-out and high-availability solution by maintaining copies of data across multiple server instances, also referred to as nodes. Built on the foundation of transactional replication, peer-to-peer replication propagates transactionally consistent changes in near real-time. This enables applications that require scale-out of read operations to distribute the reads from clients across multiple nodes. Because data is maintained across the nodes in near real-time, peer-to-peer replication provides data redundancy, which increases the availability of data.
16) What is conflict resolution in merge replication?
Merge replication allows multiple nodes to make autonomous data changes, so situations exist in which a change made at one node may conflict with a change made to the same data at another node. In other situations, the Merge Agent encounters an error such as a constraint violation and cannot propagate a change made at a particular node to another node.
The Merge Agent detects conflicts by using the lineage column of the MSmerge_contents system table; if column-level tracking is enabled for an article, the COLV1 column is also used. These columns contain metadata about when a row or column is inserted or updated, and about which nodes in a merge replication topology made changes to the row or column. You can use the system stored procedure sp_showrowreplicainfo to view this metadata.
As the Merge Agent enumerates changes to be applied during synchronization, it compares the metadata for each row at the Publisher and Subscriber. The Merge Agent uses this metadata to determine if a row or column has changed at more than one node in the topology, which indicates a potential conflict. After a conflict is detected, the Merge Agent launches the conflict resolver specified for the article with a conflict and uses the resolver to determine the conflict winner. The winning row is applied at the Publisher and Subscriber, and the data from the losing row is written to a conflict table.
Conflicts are resolved automatically and immediately by the Merge Agent unless you have chosen interactive conflict resolution for the article.
17) What are the data type concerns in transactional replication?
Transactional replication supports publishing LOBs and performs partial updates on LOB columns: if a LOB column is updated, only the fragment of data changed is replicated, rather than all the data in the column.
If a published table includes any LOBs, consider using the following Distribution Agent parameters: -UseOledbStreaming, -OledbStreamThreshold, and -PacketSize. The most straightforward way to set these parameters is to use the Distribution Agent profile titled Distribution Profile for OLEDB streaming.
The process of replicating text, ntext and image data types in a transactional publication is subject to a number of considerations. It is recommended that you use the data types varchar(max), nvarchar(max), and varbinary(max) instead of text, ntext, and image, respectively.
18) Which SQL Server editions provide replication functionality?
|Feature Name|Enterprise|Business Intelligence|Standard|Web|Express|
|---|---|---|---|---|---|
|SQL Server change tracking|Yes|Yes|Yes|Yes|Yes|
|Merge replication|Yes|Yes|Yes|Yes (Subscriber only)|Yes (Subscriber only)|
|Transactional replication|Yes|Yes|Yes|Yes (Subscriber only)|Yes (Subscriber only)|
|Snapshot replication|Yes|Yes|Yes|Yes (Subscriber only)|Yes (Subscriber only)|
|Peer to Peer transactional replication|Yes| | | | |
19) Can we rename a database used in a publication or subscription?
No. We would need to drop the publications, rename the database, and configure replication all over again. So there is no easy way to do this.
20) Are logins and passwords replicated?
No. You could create a DTS\SSIS package to transfer logins and passwords from a Publisher to one or more Subscribers.
21) Please outline the complications involved in using replication on a SQL Server cluster.
No special considerations are required because all data is stored on one set of disks on the cluster.
22) Are tables locked during snapshot generation?
The length of time that the locks are taken depends on the type of replication used:
- For merge publications, the Snapshot Agent does not take any locks.
- For transactional publications, by default the Snapshot Agent takes locks only during the initial phase of snapshot generation.
- For snapshot publications the Snapshot Agent takes locks during the entire snapshot generation process.
Because locks prevent other users from updating the tables, the Snapshot Agent should be scheduled to execute during periods of lower activity on the database, especially for snapshot publications.
23) What recovery model is required on a replicated database?
Replication is not dependent on any particular recovery model. A database can participate in replication whether it uses the simple, bulk-logged, or full recovery model. However, how changes are tracked for replication depends on the type of replication used.
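To see which recovery model each replicated database currently uses, the replication flags in sys.databases can be read alongside the recovery model. A small sketch:

```sql
-- Recovery model next to the replication role of each database.
SELECT name,
       recovery_model_desc,
       is_published,        -- snapshot or transactional publication database
       is_merge_published,  -- merge publication database
       is_distributor       -- distribution database
FROM sys.databases
ORDER BY name;
```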
24) Can the same objects be published in different publications?
Replication supports publishing articles in multiple publications (including republishing data) with the following restrictions:
- If an article is published in a transactional publication and a merge publication, ensure that the @published_in_tran_pub property is set to TRUE for the merge article.
- An article cannot be published in both a merge publication and a transactional publication with queued updating subscriptions.
- Articles included in transactional publications that support updating subscriptions cannot be republished.
- Transactional replication and unfiltered merge replication support publishing a table in multiple publications and then subscribing within a single table in the subscription database (commonly referred to as a roll-up scenario). Roll-up is often used for aggregating subsets of data from multiple locations in one table at a central Subscriber.
25) Can multiple publications use the same distribution database?
Yes. There are no restrictions on the number or types of publications that can use the same distribution database. All publications from a given Publisher must use the same Distributor and distribution database.
If you have multiple Publishers, you can configure multiple distribution databases at the Distributor to ensure that the data flowing through each distribution database comes from a single Publisher. Use the Distributor Properties dialog box or sp_adddistributiondb to add a distribution database.
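Adding a second distribution database with T-SQL might look like the sketch below. The database name distribution2, the Publisher name PUBSRV2, and the snapshot share path are hypothetical, and Windows authentication is assumed.

```sql
-- Run at the Distributor.
-- Create an additional distribution database (hypothetical name).
EXEC sp_adddistributiondb
    @database      = N'distribution2',
    @security_mode = 1;   -- 1 = Windows authentication

-- Register a (hypothetical) Publisher so that it uses the new database.
EXEC sp_adddistpublisher
    @publisher         = N'PUBSRV2',
    @distribution_db   = N'distribution2',
    @working_directory = N'\\DISTSRV\repldata';
```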
26) Does replication encrypt data?
No. Replication does not encrypt data that is stored in the database or transferred over the network.
27) What is the effect of running a bulk insert command on a replicated database?
For transactional replication, bulk inserts are tracked and replicated like other inserts. For merge replication, you must ensure that change tracking metadata is updated properly.
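For a table published with merge replication, one way to keep the change tracking metadata up to date is to let the bulk load fire the merge triggers. The sketch below assumes a hypothetical CSV file and reuses the table from the earlier example.

```sql
-- FIRE_TRIGGERS makes the bulk load execute the insert triggers,
-- including the ones merge replication uses to track changes.
BULK INSERT dbo.tbAddress
FROM 'C:\load\addresses.csv'          -- hypothetical file path
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    FIRE_TRIGGERS
);
```

If the data was loaded without firing triggers, sp_addtabletocontents can be run afterwards to add the missing rows to the tracking tables.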
28) Why can’t I run TRUNCATE TABLE on a published table?
TRUNCATE TABLE is a DDL statement that does not log individual row deletions and does not fire DML triggers. It is not permitted because replication cannot track the changes caused by the operation: transactional replication tracks changes through the transaction log; merge replication tracks changes through DML triggers on published tables.
29) What is the NOT FOR REPLICATION option for table constraints?
In some cases, it is desirable for user activity in a replication topology to be treated differently from agent activity. For example, if a row is inserted by a user at the Publisher and that insert satisfies a check constraint on the table, it might not be required to enforce the same constraint when the row is inserted by a replication agent at the Subscriber. The NOT FOR REPLICATION option allows you to specify that the following database objects are treated differently when a replication agent performs an operation:
- Foreign key constraints : The foreign key constraint is not enforced when a replication agent performs an insert, update, or delete operation.
- Check constraints : The check constraint is not enforced when a replication agent performs an insert, update, or delete operation.
- Identity columns : The identity column value is not incremented when a replication agent performs an insert operation.
- Triggers : The trigger is not executed when a replication agent performs an insert, update, or delete operation.
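The four cases above can be declared as in the following sketch. The tables, constraints, and trigger are hypothetical and exist only to show where the NOT FOR REPLICATION clause goes in each definition.

```sql
CREATE TABLE dbo.Customers
(
    CustomerID int NOT NULL PRIMARY KEY
);

CREATE TABLE dbo.Orders
(
    -- Identity column: explicit values inserted by a replication agent
    -- do not advance the identity counter.
    OrderID    int IDENTITY(1,1) NOT FOR REPLICATION NOT NULL PRIMARY KEY,
    CustomerID int NOT NULL,
    Quantity   int NOT NULL,
    -- Foreign key: not enforced for changes applied by a replication agent.
    CONSTRAINT FK_Orders_Customers FOREIGN KEY (CustomerID)
        REFERENCES dbo.Customers (CustomerID) NOT FOR REPLICATION,
    -- Check constraint: not enforced for changes applied by a replication agent.
    CONSTRAINT CK_Orders_Quantity CHECK NOT FOR REPLICATION (Quantity > 0)
);
GO

-- Trigger: fires for user inserts, but not for inserts applied by a replication agent.
CREATE TRIGGER dbo.trg_Orders_Insert
ON dbo.Orders
AFTER INSERT
NOT FOR REPLICATION
AS
BEGIN
    SET NOCOUNT ON;
    PRINT 'Row inserted by a user connection';
END;
```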
30) Does replication resume if a connection is dropped or do we need to reinitialize the replication?
Yes. Replication processing resumes at the point at which it left off if a connection is dropped. If you are using merge replication over an unreliable network, consider using logical records, which ensures related changes are processed as a unit.
31) How do I move or rename files for databases involved in replication?
In versions of SQL Server prior to SQL Server 2005, moving or renaming database files required detaching and reattaching the database. Because a replicated database cannot be detached, replication had to be removed from these databases first. Beginning with SQL Server 2005, you can move or rename files without detaching and re-attaching the database, with no effect on replication.
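The move without detach is done with ALTER DATABASE. The sketch below relocates a log file; the database name, logical file name, and target path are hypothetical, and the file copy itself happens in the operating system while the database is offline.

```sql
-- 1. Take the database offline (open transactions are rolled back).
ALTER DATABASE SalesDB SET OFFLINE WITH ROLLBACK IMMEDIATE;

-- 2. Move the physical file to the new location using the operating system.

-- 3. Point SQL Server at the new location (hypothetical names and path).
ALTER DATABASE SalesDB
    MODIFY FILE (NAME = SalesDB_log, FILENAME = N'E:\SQLLogs\SalesDB_log.ldf');

-- 4. Bring the database back online; replication agents resume on their next run.
ALTER DATABASE SalesDB SET ONLINE;
```

While the database is offline, replication agents connecting to it will fail and retry, so it is reasonable to do this in a maintenance window.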
References: Thanks to the all the SQL Server techies who wrote and shared the valuable information which helped me a lot to prepare this series of Questions. Also big thanks to Microsoft Documentation which contains each and everything about their product.