Endpoint Management Community Blog

Massive Connector import speed improvement with one single change

Created: 24 Jan 2013 • 3 comments
Ludovic Ferre

I have a customer who uses the Connector Solution (in 6.0 and 7.x) to import users, groups and parameters into their CMDBs for various business reasons.

Over time their import data has grown, and so has the processing time, to some extremes: updating 10~20K entries on a dataclass from a CSV file containing 1,000,000+ lines (a mere 30 MiB) would take 3+ hours in 6.0, and many more in 7.x.

Note that the imported data is not one for one - i.e. we are not populating a data table but linking keys from other tables, which is a different process for the SMP to handle.

In a couple of cases we decided to take the processing outside of the Connector Solution, via a simple SQL procedure. This worked great, but it takes a serious amount of work to implement the data insert, update and delete parts of the procedure. It allowed us to run the import in less than one minute (FYI, it's Import #3 in the table below).

Thankfully, my customer reported today an alternative to this tedious process, and here it is:

Add an index on the Item.Name field!
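To illustrate why an index on a name column matters for lookups, here is a minimal sketch. It uses an in-memory SQLite database as a stand-in, not the SMP's SQL Server database. The two-column Item table and the index name idx_item_name are simplified assumptions for illustration only. It prints the query plan for a lookup by name before and after the index is created.

```python
import sqlite3

# SQLite stand-in (assumption): the real SMP database runs on SQL Server.
conn = sqlite3.connect(":memory:")

# Simplified, hypothetical schema: the real Item table has more columns.
conn.execute("CREATE TABLE Item (Guid TEXT PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO Item (Guid, Name) VALUES (?, ?)",
    ((f"guid-{i}", f"user-{i}") for i in range(10_000)),
)


def lookup_plan(connection):
    """Return the query plan text for resolving one item by its name."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN SELECT Guid FROM Item WHERE Name = ?",
        ("user-42",),
    ).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[3] for row in rows)


plan_before = lookup_plan(conn)

# Hypothetical index name; the point is only that Name becomes indexed.
conn.execute("CREATE INDEX idx_item_name ON Item (Name)")
plan_after = lookup_plan(conn)

print("before:", plan_before)
print("after: ", plan_after)
```

Without the index, the plan reports a scan of the whole table for every lookup by name. With it, the plan reports a search through idx_item_name. When a lookup like this is repeated for every row of a large import, that difference adds up.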

Yes, it sounds like so little, but the customer reported the following figures, which are significant:

            Before          After
Import #1   150+ minutes    26 minutes
Import #2   120+ minutes    25 minutes
Import #3   150+ minutes    40 minutes
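As a rough way to read these figures, the sketch below computes the speed-up ratio for each import. It treats "150+" and "120+" as exactly 150 and 120 minutes, which is an assumption, so each ratio is only a lower bound on the real improvement.

```python
# Figures copied from the table; the "+" on the "before" values is dropped,
# so every ratio below is a lower bound (assumption stated in the text).
timings_minutes = {
    "Import #1": (150, 26),
    "Import #2": (120, 25),
    "Import #3": (150, 40),
}

# Speed-up = time before the index / time after the index.
speedups = {
    name: before / after for name, (before, after) in timings_minutes.items()
}

for name, ratio in speedups.items():
    print(f"{name}: at least {ratio:.2f}x faster")
```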

 

Comments

petr_sanda

Somebody might say the support guys should check such basics at the beginning of an investigation.

Ludovic Ferre

Totally yes, but since the import process wasn't showing a single bottleneck in the profiler traces (remember, the big issue with the import problem comes from the quantity), it wasn't obvious.

I remember sharing the following calculation with development:

  • If you load 900,000 items at 1 millisecond each, it takes 900 seconds (15 minutes) to load all the items from the CSV import file.
  • Then the processing starts and runs in sequential batches, so again large slices of time go by without much work being done per item: 20 ms * 900,000 = 18,000 seconds = 300 minutes.
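The calculation above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the per-item costs of 1 ms and 20 ms are the illustrative figures from the two bullets, not measured values, and the function name is mine.

```python
ITEMS = 900_000
LOAD_MS_PER_ITEM = 1      # illustrative per-item load cost from the first bullet
PROCESS_MS_PER_ITEM = 20  # illustrative per-item processing cost from the second bullet


def total_minutes(items, ms_per_item):
    """Total time in minutes for strictly sequential per-item work."""
    return items * ms_per_item / 1000 / 60


load_minutes = total_minutes(ITEMS, LOAD_MS_PER_ITEM)
process_minutes = total_minutes(ITEMS, PROCESS_MS_PER_ITEM)

print(f"load: {load_minutes:.0f} minutes")
print(f"processing: {process_minutes:.0f} minutes")
```

No single step looks slow in a profiler at 1 ms or 20 ms, yet multiplied by the row count the total runs to hours.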

And I'm sure you know that, with hindsight, such a large problem can often look very easy to resolve (thanks to the hindsight bias [1]).

[1] http://en.wikipedia.org/wiki/Hindsight_bias

I am currently off-net, on a retreat of some kind. I'll be back real soon, and you sure will hear from me then ;-).

Ludovic FERRÉ
Principal Remote Product Specialist
Symantec

petr_sanda

You're right.

I agree this was not obvious until the end, and some confusion may have been involved. But now that hindsight shows it could be resolved so simply, we will focus on the basics first, before spending weeks on measuring, proving and corresponding.
