Hanging replication

Created: 03 Oct 2012 | 21 comments

Hello,

 

Replication between 2 SEPM servers hangs ('Downloading 10%').

What logs can I check to debug this ?

 

Thanks in advance,

Regards

21 Comments

.Brian's picture

How is the bandwidth between the two?

What version of SEP is this?

Can you please post the contents of the scm-server-0.log or scm-server.1.log?

Please click the "Mark as solution" link at bottom left on the post that best answers your question. This will benefit admins looking for a solution to the same problem.

Ashish-Sharma's picture

What version of SEP and MSSQL are you running? Can you please post the contents of the scm-server-0.log or scm-server.1.log?

This may help:

https://www-secure.symantec.com/connect/articles/replication-and-considerations

Thanks In Advance

Ashish Sharma

 

 

Xtof's picture

Bandwidth is good, at least 100 Mbit/s.

Version is 12.1 mp1 ru1.

The logs are not up to date. No scm-server* log file has been updated for a few weeks, apparently since the installation.

 

We have never used replication before, so it's very new for us. It may be a simple problem, but I don't know how to debug it.

 

 

pete_4u2002's picture

Did you check the tomcat\logs folder for logs?
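To confirm which log files are still being written (and which have been stale for weeks), you can list them by last-modified time. This is only a minimal Python sketch: `log_ages` is a hypothetical helper name, and the folder you pass in is whatever log directory your SEPM install uses, which is not stated in this thread.

```python
import time
from pathlib import Path


def log_ages(log_dir, pattern="scm-server*.log", now=None):
    """Return (file name, age in hours) for each matching log, newest first."""
    now = time.time() if now is None else now
    entries = []
    for path in Path(log_dir).glob(pattern):
        # Age is measured from the file's last modification time.
        age_hours = (now - path.stat().st_mtime) / 3600.0
        entries.append((path.name, age_hours))
    # Smallest age first, so the most recently written log is at the top.
    return sorted(entries, key=lambda item: item[1])
```

If every entry comes back with an age of several hundred hours, the server is probably logging somewhere else or not logging at all, which is worth knowing before digging further.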

 

A. Wesker's picture

Hi Xtof,

 

'Downloading 10%' is just a generic status shown in the SEPM console. It does not reflect what is really happening while the replication tries to run, so your SEPMs might not be able to connect to each other at all.

Or the SEPM simply can't run the process that exports its own database and compresses it into a data.zip file for transfer to the other SEPM. The situation can vary a lot, so hanging at 'Downloading 10%' means little by itself. The logs will tell us more ;-)

 

Basic replication troubleshooting steps:

- Ensure both SEPMs run the same version, as replication between SEPMs of different versions is not supported.

- Ensure Windows Firewall is deactivated on both SEPMs for all profiles (Public network, Private network, Domain network).

- Ensure port 8443 is not already in use by something else and is allowed on both SEPMs.

- Ensure the certificates of both SEPMs are valid.

- Auto Replicate: No

- LiveUpdate should never run continuously.
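The port item in the list above can be checked from each SEPM towards the other with a plain TCP connection test. A minimal Python sketch, assuming the default port 8443 mentioned above; `can_connect` is a hypothetical helper and the host name is whatever your partner server is called. It only proves that the TCP port is reachable, not that the certificate or the replication itself is fine.

```python
import socket


def can_connect(host, port=8443, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Run it on SEPM 1 against SEPM 2 and the other way around; a `False` in either direction points to a firewall or routing problem rather than a SEPM one.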

Deeper replication troubleshooting steps:

You should be aware that Auto-Upgrade, LiveUpdate (LUALL.exe) and replication use a common process. So if any of them is running when a replication starts, the replication will fail.

In this situation, the logs may show unexpected errors with various error IDs, and sometimes a relevant entry such as "Unable to lock the process to replicate from site XXX to site XXX".

It is recommended to adjust the replication settings according to the database type, the number of clients and your log settings.

LiveUpdate on both SEPMs should be set to run every 4 hours or, ideally, once a day.

 

For example:

LU schedule for SEPM 1:

From 8:00 AM to 6:00 PM (it will cover all the daily distribution).

LU schedule for SEPM 2:

From 8:00 AM to 6:00 PM (it will cover all the daily distribution).

Replication from SEPM 1 to SEPM 2:

Daily at 8:00 PM

Replication from SEPM 2 to SEPM 1:

Daily at 1:00 AM

If you're using Auto-Upgrade to deploy to some of your clients, make sure it is scheduled at a different time (for example, from 6:00 AM to 7:00 AM if we follow the schedule above).

With this type of configuration, you should not run into replication issues caused by the common process shared by Auto-Upgrade, LiveUpdate and replication.
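The example schedule can be sanity-checked for overlaps with a few lines of Python. This is just a sketch: the LiveUpdate and Auto-Upgrade windows come from the example above, but the two-hour length given to each replication is my assumption (the post only gives start times), and windows that cross midnight are not handled.

```python
def to_minutes(hhmm):
    """Convert "HH:MM" (24-hour clock) to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def overlaps(a, b):
    """True if two same-day windows (start, end) share any time."""
    a_start, a_end = to_minutes(a[0]), to_minutes(a[1])
    b_start, b_end = to_minutes(b[0]), to_minutes(b[1])
    return a_start < b_end and b_start < a_end


def conflicts(windows):
    """Return pairs of task names whose windows overlap."""
    names = sorted(windows)
    return [
        (x, y)
        for i, x in enumerate(names)
        for y in names[i + 1:]
        if overlaps(windows[x], windows[y])
    ]


# The two-hour replication windows are an assumption, not from the post.
schedule = {
    "liveupdate": ("08:00", "18:00"),
    "replication_1_to_2": ("20:00", "22:00"),
    "replication_2_to_1": ("01:00", "03:00"),
    "auto_upgrade": ("06:00", "07:00"),
}
print(conflicts(schedule))  # prints []
```

If a replication regularly runs longer than the window you assumed, widen it here and check again before touching the real schedule.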

As for what to replicate, check "replicate logs" only. The rest is not worthwhile, especially because both of your SEPMs already have their own SEP install packages and LU content.

I hope this information helps you ;-)

 

Kind Regards,

A. Wesker

Xtof's picture

Hello,

 

Thanks for your help.

Here are some strange things in the replication logs :

2012-10-05 08:40:38.709 THREAD 43 ATTENTION: ReplicationManager>> getUSNCacheStatus: usnStatus:2 replicationInRetrieving:false StrictUSN:false StrictLogUSN:false
2012-10-05 08:40:38.709 THREAD 43 ATTENTION: ReplicationManager>> getUSNCacheStatus: usnStatus:2 replicationInRetrieving:false StrictUSN:false StrictLogUSN:false
 

2012-10-05 00:13:02.463 THREAD 44 ATTENTION: ReplicationTask>> canDoReplication: Warning-> local site id (7C7963A9C0C5A08101E7EF2CB0B9FBB5) is larger than remote site id (652D5A83C0C590810071E0FB6BFDB04E). No replication occurs!
2012-10-05 00:13:02.463 THREAD 44 ATTENTION: ReplicationTask>> replicate: skip!
 

What is the meaning of this local site ID ?

 

Thanks in advance

Xtof's picture

Last update : After a while, "Failed to submit" appears in the console...

pete_4u2002's picture

The warning seems to be OK. Can you post the scm-server-0.log as soon as it fails?

Is there a firewall that is timing out the connection?

Xtof's picture

It seems to be working now, as I can see a zip file being copied from site 2 to site 1. The zip file is huge: 8 GB!

What is strange is that all updates to policies, groups, etc. have been made on site 1, so I don't understand what could explain 8 GB coming from site 2! It's the first time the replication has worked, which may explain it. The LUA downloads on the 2 SEPMs are the same, so I hope it doesn't replicate the updates twice...

Is there a way to see what is really replicated (it seems the zip file cannot be extracted)?

 

Regards

A. Wesker's picture

Good news Xtof. At least it's resolved :-)

 

Yes, it's normal that during the first replication the data.zip is very big and takes a long time to process, but you don't have to worry about future replications.

Why? Because once the replication has completed successfully one time on both SEPMs, most of the database information is the same.

Your future replications will get faster, as not all the data has to be written anymore: it was already contained in the first data.zip.

Both data.zip files will be compared and only new information will be written.

It's a little bit like a delta data.zip that is created and contains only the new information that SEPM 2 didn't have from SEPM 1, and vice versa ^^

Regarding the logs you provided:

Only this one looks like an error message:

2012-10-05 00:13:02.463 THREAD 44 ATTENTION: ReplicationTask>> canDoReplication: Warning-> local site id (7C7963A9C0C5A08101E7EF2CB0B9FBB5) is larger than remote site id (652D5A83C0C590810071E0FB6BFDB04E). No replication occurs!
2012-10-05 00:13:02.463 THREAD 44 ATTENTION: ReplicationTask>> replicate: skip!

And it's not really an error; it's expected behavior from the SEPM where the replication ran.

Between the two replicating servers there is the concept of Master and Servant. The Master is the SEPM server with the lowest site ID and the Servant is the one with the highest site ID.

Replication can only happen in one order, which is why it was skipped.

If the replication has not yet been done on the Master, the one from the Servant will be skipped automatically until the one on the Master is done.
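To spot this "skip" warning in a log and see which side has the lower site ID, something like the following Python sketch can be used. The regular expression is written against the exact log line quoted above; comparing the IDs as hexadecimal numbers is my assumption about how "larger" is evaluated (for equal-length uppercase IDs, a plain string comparison gives the same result). The function names are hypothetical.

```python
import re

SITE_ID_RE = re.compile(
    r"local site id \((?P<local>[0-9A-Fa-f]+)\) is larger than "
    r"remote site id \((?P<remote>[0-9A-Fa-f]+)\)"
)


def skipped_replication(line):
    """Return (local_id, remote_id) if the line is the skip warning, else None."""
    match = SITE_ID_RE.search(line)
    if match is None:
        return None
    return match.group("local"), match.group("remote")


def lower_site_id(id_a, id_b):
    """Return the smaller ID, comparing both as hexadecimal numbers (assumption)."""
    return id_a if int(id_a, 16) < int(id_b, 16) else id_b
```

With the line from Xtof's log, the remote site (652D...) has the lower ID, so under the Master/Servant rule described above it would be the Master.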

Be sure to schedule your replications at suitable times accordingly, and do not forget that when you change replication schedules, you have to run a replication manually once, on the Master first and then on the Servant, so that the new schedule is applied to future replications.

Your hanging issue probably happened because of the large amount of data to be transferred the first time.

If you would like to have a look at the data.zip, be sure to make a copy of it and not touch the original, but you don't need to look inside the data.zip anyway ;-)

The data replicated by default is your database (the table called sem5_db on your SQL Server) with groups and policies.

Then, depending on whether you check the other boxes for the content to replicate, there will also be LiveUpdate content (sem5_content), SEP install packages and logs (sem5_log, sem5_log1, ...).

If you look at the size of all these tables on your SQL Server, you will understand why your first data.zip was so huge.

Something important you should know: when a replication runs, the database plus the group and policy data are downloaded from your SQL Server to your SEPM machine, then this data is compressed into the data.zip, which is transferred to the replication subfolder of your other SEPM.

To download your database to the SEPM, the database connection has to be locked for all your managed SEP clients; otherwise it wouldn't be possible, as the database would stay in use forever.

During the whole replication workflow, the database connection is locked for the managed SEP clients, so all the threads sent by the managed SEP clients to the SEPM during this time are held on the SEPM and not processed. They are processed once the replication has ended.

Now you can see even more clearly why it's important to replicate logs only, especially when you have a SQL Server: if you use a SQL Server, it almost certainly means you have more than 5,000 managed clients, because the SEPM embedded database officially supports up to 5,000 clients only, and unofficially we already recommend a dedicated SQL Server for better performance above 3,000 managed clients.

The more data you replicate, the longer it takes to create the data.zip, the longer it takes to transfer it to the other SEPM, and the longer the database connection is closed for your managed SEP clients, which can cause issues such as mismatched information between your SEPM console and your SEP clients.
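To put a rough number on the transfer part alone, here is a small Python calculation using the figures from this thread (an 8 GB data.zip over a 100 Mbit/s link). It is an idealized lower bound that assumes the link is fully available and uses decimal gigabytes; it says nothing about the time needed to export and compress the database, which is not covered here.

```python
def transfer_seconds(size_gb, link_mbit_per_s):
    """Ideal transfer time: decimal gigabytes over a link rated in megabits/s."""
    size_megabits = size_gb * 1000 * 8  # 1 GB = 1000 MB, 1 byte = 8 bits
    return size_megabits / link_mbit_per_s


print(transfer_seconds(8, 100))  # prints 640.0
```

So even in the best case the first 8 GB file needs about 10.7 minutes on the wire, and during the whole replication workflow the database connection stays locked for the clients.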

We recommend replicating logs only, especially for big databases on SQL Server: the amount of data is bigger if you replicate LiveUpdate content and SEP install packages, which increases the risk that your servers hang and makes the replication take longer to finish.

And as I already told you in my previous post:

As for what to replicate, check "replicate logs" only. The rest is not worthwhile, especially because both of your SEPMs already have their own SEP install packages and LU content.

 

Kind Regards,

A. Wesker

pete_4u2002's picture

You may need to uncheck the replication of content and packages.

Xtof's picture

Hi all,

 

thanks a lot for your help.

Replication works fine now but I still have one question :

I thought replication was done with 2 zip files (A=>B, B=>A). But it's running at the moment and 3 zip files have been created:

B=>A, but no changes on console A (8 GB)

A=>B, and groups etc. from A replicated on B OK (5 GB)

B=>A, running right now, and I don't know what it will do (5 GB)

 

Is the first zip used for comparison between A and B? It's so huge!

 

Regards

pete_4u2002's picture

What's the name of the zip file? Is it data.zip?

You can delete the old one.

Xtof's picture

data.zip, right.

The 3rd data.zip file has just finished processing. Replication is finished now.

But I still don't understand why there were 3 zip files...

John Santana's picture

Well, I have been in this situation before as well. The Symantec support staff guided me through deleting the relationship, upgrading the SEPM to the latest version on both sites, and then setting up the replication again from scratch.

That did the trick for me.

Kind regards,

John Santana
IT Professional

--------------------------------------------------

Please be nice to me as I'm newbie in this forum.

pete_4u2002's picture

It should be removed once replication completes successfully on both sites.

John Santana's picture

Yes, correct, you are right Pete!

It may take some time before the replication finishes.

Kind regards,

John Santana

Xtof's picture

Hello,

 

Here is the final status:

- Server A was fully configured with groups, policies, etc. Server B had other groups and policies.

- First replication initiated: all groups, policies, etc. on B were replaced by A's settings! So this was not a merge but a mirroring replication! Is this normal because it was the first time? Maybe...

- The next replication successfully merged the data. No more deletion of groups and other items; they were really merged.

- data.zip files are now created and copied to the replication partner only 2 times (both ways) and no longer 3 times. Can I conclude that the 3 data.zip files were due to the first replication, which completely overwrote my server B with A's settings? Maybe...

- Does someone have an explicit flowchart of the replication process (db extract, file creation, zip, ...)?

- Last one: servers A and B are replicated. Each has a LUA server to update content. Is it necessary to enable content replication between A and B? If I enable it, is the content replicated even if it is already on the 2 servers thanks to the LUA?

Thanks for your help.

Rgds

pete_4u2002's picture

- Server A was fully configured with groups, policies, etc. Server B had other groups and policies.

- First replication initiated: all groups, policies, etc. on B were replaced by A's settings! So this was not a merge but a mirroring replication! Is this normal because it was the first time? Maybe...

Groups and policies always replicate both ways. In the next replication cycle, only incremental changes are replicated.

- The next replication successfully merged the data. No more deletion of groups and other items; they were really merged.

- data.zip files are now created and copied to the replication partner only 2 times (both ways) and no longer 3 times. Can I conclude that the 3 data.zip files were due to the first replication, which completely overwrote my server B with A's settings? Maybe...

Could be, but are the 3 data.zip files under the same folder?

 

- Does someone have an explicit flowchart of the replication process (db extract, file creation, zip, ...)?

https://www-secure.symantec.com/connect/articles/replication-and-considerations

Symantec Endpoint Protection Manager Replication Workflow

http://www.symantec.com/business/support/index?page=content&id=TECH172181

 

- Last one: servers A and B are replicated. Each has a LUA server to update content. Is it necessary to enable content replication between A and B? If I enable it, is the content replicated even if it is already on the 2 servers thanks to the LUA?

No need to replicate, as both servers have the content.

A. Wesker's picture

Hi Xtof,

 

It's strange because a recent customer asked me some questions very close to yours (maybe it was you ^^).

To be honest, I haven't come across a white paper or anything like that about replication so far, unfortunately.

For sure, the more replications run, the smaller the data.zip file will be, as there will be fewer and fewer differences between the sites, which is completely normal behavior.

I'm not a hundred percent sure, but by design, when you have to use the wizard in the SEPM to add the replication partner, this one will be the Servant, so it will have the highest site ID, while the Master will have the lowest one.

Then, for sure and again by design, site B will be a perfect copy of site A after the first replication. It's expected that B will lose its previous information.

If you later add some new groups on site B that are not present on site A, then during future replications these new groups and policies will appear on site A as well.

Why? Because it seems that by design, the Master replicates to its partner, and the partner loses all its previous groups, policies and settings.

Once a site is the Master, it seems to be extremely hard (I'm not even sure it's possible or recommended to try) to change the site ID in order to change who is the Master and who is the Servant.

But after that first replication, the database, groups and policies are compared, and it's the newest information from both sites that is kept and replicated.

So the replication is bi-directional except during the first replication.

It might be interesting to run some tests in a small test environment, even with an embedded database, as it works the same way.

I personally did many replication tests in order to understand better how it works. After letting replications run for a month with 2 SEPMs on 12.1 RU1 MP1, for example, you should find only a temp folder and a single data.zip file in each inbox/outbox replication folder on both SEPMs.

You may find more than one data.zip if more than two SEPMs replicate with each other, but then they will be in different folders with different IDs.

For example:

Site A replicates to B and vice versa.

Inside their replication inbox/outbox folder there will be another folder named with an ID (I don't remember whether this ID corresponds to the site ID or the USN; I have a small doubt about it).

Inside this ID folder, you will have your data.zip file.

If you then add a replication partner on site C so that it replicates with B, then in the inbox/outbox folder of site B you will for sure have more than one data.zip, but the second one will be in another folder with another ID. This is expected behavior; it is clearly how it's supposed to work.

As for the reason why, it's probably because SEPMs can't have the same site ID and can't use the same USN to replicate with more than one site.

In the end, the data from C will be gone and C will be a copy of B (because of the first replication).
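To see how many data.zip files are sitting under a replication folder, and in which ID subfolders, a short Python sketch like this one can help. `find_data_zips` is a hypothetical helper; the root folder is whatever inbox/outbox replication directory your SEPM uses, which this thread does not spell out.

```python
from pathlib import Path


def find_data_zips(replication_root):
    """Return (relative folder, size in bytes) for every data.zip under the root."""
    root = Path(replication_root)
    found = []
    for path in root.rglob("data.zip"):
        # The parent folder name is the ID folder described above.
        found.append((str(path.parent.relative_to(root)), path.stat().st_size))
    return sorted(found)
```

One entry per partner is what the description above leads you to expect; several entries in the same ID folder would be worth a closer look.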

I hope this information helped you a little bit, Xtof :-)

 

Kind Regards,

A. Wesker

 

 

PS: Please do not forget to mark as the solution the answer you found most helpful for your request.

A. Wesker's picture

- Last one: servers A and B are replicated. Each has a LUA server to update content. Is it necessary to enable content replication between A and B? If I enable it, is the content replicated even if it is already on the 2 servers thanks to the LUA?

No need to replicate, as both servers have the content.

 

Agreed, it's not needed and not worthwhile at all, except if only one SEPM were retrieving the content from the LUA.

But as both are getting their content from the LUA, replicating LU content is completely useless and will just make the replication take longer (not to mention that SEP packages will be replicated as well, because we can't separate the two for the moment when using this option).

 

Kind Regards,

A. Wesker