Deployment Solution

 View Only

Best Practices for Optimizing SQL Server in SAN Environments Using Altiris Products 

May 29, 2008 05:35 PM

This article includes items to be aware of when using SAN technology with SQL Server and Altiris products. Thanks to the Notification Server Sustained Engineering team for initial development work on this document.

SAN's (Storage Area Networks) have many advantages in today's corporate environments. Large enterprises have been enjoying the benefits from SAN technologies. A SAN basically removes or uncouples storage (disks and drives) from servers and attaches that storage directly to the network. This allows for shared storage making SAN's highly scalable and flexible in their storage resource allocation. This also allows better and more efficient backup solutions, and overall better storage utilization.

As shown below in Figure 1, each server in a basic SAN configuration has two network connections. One goes to the primary network, show below in bold. This is the primary connection for the servers to connect to LAN and WAN resources. The second connection is to the SAN network fabric, isolating the 2nd network adapter and using it specifically for storage traffic.

Figure 1 - Basic SAN Diagram

Click to view.

One of the advantages of a SAN is its inherent ability to ensure high data availability. This high availability makes it ideal for database applications, as the various disk I/O technologies that it uses, combined with high-speed networking come together to get the data where it needs to be quickly and accurately.

There are many considerations that need to be taken care of when either installing and configuring SQL Server to a SAN-based environment for the first time, or when migrating from directly attached storage to a SAN. Many of these concepts should be familiar to support staff, while others will be more specific and oriented to the SAN. Furthermore, some performance metrics need to be available for measurement to determine the real usability of the SAN and whether or not the configuration will be acceptable for use with Notification Server.

This article will attempt to highlight and address some of the considerations, as well as introduce a metric to measure performance against.

Fundamentals

The fundamentals of SQL Server should not be overlooked when installing to a SAN. Whether or not a SAN is being used, it is important that good knowledge exists of the hardware that SQL files are being stored on. There should not be any configuration values that conflict with those basic fundamentals.

All SAN hardware should honor SQL Server I/O basics. Of high importance is that write ordering needs must be maintained, meaning that writes must always be secured in or on stable media and torn I/O prevention must occur 1. How this occurs should be readily available from the chosen manufacturers SAN documentation, or from the manufacturer direct. Specifically,

  • Stable Media refers to storage that can survive system restart or common failure2 and is generally physical disk storage, but SAN caches also fall under this definition. Stable Media is a requirement by SQL Server, as it is relied upon by Write Ahead Logging to maintain the ACID3 properties of the database and to ensure data integrity.4 Stable Media strategies need to be discussed with the hardware manufacturer.
  • Write Ordering is the ability to preserve the order of write operations and is a requirement of SQL Server on Hardware.5 SQL Server requires this to be maintained for local and remote I/O destinations. If Write Ordering is not maintained then the system breaks the Write Ahead Logging protocol.
  • Torn I/O refers to a situation where a partial write takes place, leaving the data in an invalid state. SQL Server documents this as a Torn Page. Torn Page or I/O prevention means that the disk system must provide sector alignment and sizing in a way that prevents torn I/O including splitting I/Os across various I/O entities in the I/O path.6

Physical SAN Characteristics

Because of the abstraction between the physical disk and SQL server that comes as a result of installing on a SAN, it then becomes vital to understand the underlying characteristics of a SAN in the early stages of any installation. A few key measures are outlined in the following paragraphs:

  • The relationship between the logical and physical portions of the SAN needs to be understood. The number of physical disks behind each LUN7 and those LUNs which share the same physical spindles needs to be understood. SAN hardware often supports 'zoning' which enables an admin to dedicate full physical disks to a LUN, so that the I/O is isolated to the SQL Server. SQL log files should be separated from data files at a physical disk level,8 in a SAN environment, this means allocating the database log files to LUNs which have dedicated physical disks - those that are not shared with other LUNs, including those which will be used for other database log, data or index files. A write response time of 2-3 msecs is the goal for this configuration. Any physical disks that are shared between servers or applications should be highlighted, as different servers running different applications with different I/O characteristics can cause problems. Furthermore, in general more spindles means better performance.9
  • Knowledge of each part of the network span between the host and the array is critical. An administrator should be able to point out each component and their relative speeds in the sequence that they receive the data. For example, first the number of HBAs inside the host, then the switch ports, the path from switch to array, the front side fiber channel ports, the cache and so on all the way up to the physical disks. This will illuminate any areas of concern, and it extremely useful information to have around when diagnosing performance issues.
  • To prevent bus saturation, it is often necessary to spread HBAs across the various busses in the host. Most current servers have several PCI or PCI-X busses contained therein, and those systems with multiple HBA's (Host Bus Adapters) should not overload a bus by having all the HBAs installed in any one bus. Furthermore, ensure that the HBAs are installed in slots with the highest speed, as the different busses will not all perform at the same rate. Vendor-specific, up-to-date HBA drivers should be used, and multipath software should be used to balance I/O across HBAs. Take special care of queue depths in HBA driver settings - these are often not set to be deep enough to support SQL Server I/O.10

RAID Characteristics

SQL Server performance is heavily dependant on access to disks, so basic RAID fundamentals are just as important as other areas when deploying on a SAN. In general the following recommendations are made:

  • Log files should always be kept on RAID 1+0.11 If possible also keep data and index files on RAID 1+0, but this can be expensive. Often customers are installing RAID 5, however this is not recommended. RAID 1+0 gives better write performance and availability than RAID 5.
  • Microsoft recommends a stripe size of either 64Kb or 256Kb, and further recommends the latter option where index scan ranges are on tables of greater than one hundred megabytes.12 In large installations the Altiris database often contains tables of 100 Mb, so a 256 Kb stripe size is the recommendation. Stripe sizes should never be set below 64Kb, as this is the size of a single SQL extent.
  • Because of the heavy use of the TempDB in Altiris, RAID1+0 is the preferred solution for this database. Again, this is a costly solution.

SQL Characteristics and Configuration

When configuring a SQL Server I/O subsystem for optimal running, it is important to not overlook the configuration of SQL itself. A few notes:

  • Do not rely on Auto Grow. Data Files should be pre-allocated to the appropriate size. Auto grow can be left enabled, simply to cover your bases, but do not rely on it as an appropriate strategy. Instead of using Auto Grow, manually manage the size of the files proactively.
  • Similarly, do not use Auto Shrink. Because Auto Shrink kicks in every 30 minutes (a value which cannot be configured) and takes up a lot of resources, Auto Shrink can cause long disk queue lengths or possibly I/O timeouts.13 Furthermore Auto Shrink and Auto Grow can team up together when switched on, causing repeated shrink and grow operations which in turn can cause file fragmentation. It is helpful to switch Auto Shrink off on the Model database, so that newly created databases do not inherit the option. If a database needs to be shrunk, it is a better option to periodically shrink the database with DBCC SHRINKDATABASE or DBCC SHRINKFILE commands.
  • Consider increasing the number of files per filegroup for the Altiris Database and TempDB on multi processor servers. A suggested formula is 1 data file per CPU for TempDB and between .25 and 1 data file per CPU for the Altiris Database. Note that Dual Core processors count as 2 CPUs, but hyperthreading does not. SQL Server uses a proportional fill algorithm, meaning that operations are distributed amongst multiple files. Because this algorithm tends to prefer allocations in files with more free space, it is helpful to created data files of the same size. It is not necessary to create these files on different spindles unless there are visible I/O bottlenecks.

Measuring Performance

Once the server has been configured to optimal configuration, it is a logical next step to check performance and measure a baseline. Similarly, once a server has the application installed and has been running, performance of the IO subsystem can be checked, if this is a concern for slow general performance. The following sections outline a few processes to measure IO performance and either highlight or eliminate it as a performance bottleneck.

SQLIO

Microsoft has a free, unsupported tool for measuring performance of IO subsystems, specifically on SAN environments.14 Although it is implied in the name of the product, SQL Server is not actually required for this to application to be run, nor do its results refer exclusively to SQL Server. It is a recommendation to get good baseline tests done on the SAN before any application or SQL is installed, in order to verify that the IO subsystem is able to handle the stress of serving said applications. There are a number of considerations support staff should be aware of when testing with SQLIO, a few of which are outlined in the following points:

  • If SQLIO is not being run before application install time, be aware that it is going to put extra load on the I/O subsystem. Further, any load that is not SQLIO may skew the results. For this reason it is recommended to do the tests in server maintenance periods or preferably before the server goes into production.
  • SQLIO depends on the size of the test file to be greater than the size of the SAN cache. It is very important to get accurate results to ensure that this is true. The default size of the cache file is 100Mb, which may or may not be enough. The size of the test file is tuned inside the parameter file that is passed to SQLIO at the command prompt. Also ensure that the test period is of sufficient length to make sure that the SAN cache is exhausted.
  • It is important to test many combinations of IO types. Refer to Appendix A outlining types and sizes of SQL operations. As a general rule make sure to test checkpoint operations with many random writes of 8kb; Test log operations with sequential reads and writes of 2kb to 6kb; Test read ahead operations with sequential reads and writes of 8kb to 256kb.
  • As noted in the SQLIO documentation, each IO path should be tested individually and then in combination. Take care of multi-pathing - if this is in use ensure it is working as expected. In order for this test to be relevant, a good understanding of the path is necessary - do not expect better throughput than the maximum of any single point in the path (HBA, fiber channel ports etc)
  • Read and understand the documentation that comes with SQLIO. Although the number of files included with the application is small, it is important to understand them - many points are not covered above as knowledge of the documentation is assumed.

Perfmon

The use of the Physical/Logical Disk -> Disk Queue Length perfmon counter has always been used in the past as a first call for measuring disk performance. Unfortunately the use of the Disk Queue Length in SAN environments is dramatically diminished.

The usual approach is to measure disk queue length and conclude that any sustained Disk Queue Lengths of greater than two is bad performance for any given disk. Microsoft specifies this technically as any Avg Disk Queue Length that exceeds twice the number of spindles is likely developing a bottleneck.13 Due to the fact that any given LUN on the SAN may be composed of many physical disks, the exact number of spindles is not necessarily known. Therefore it is of little value to measure all LUNs against a single arbitrary number. By the same token comparing one Disk Queue Length against another, even on the same server, is of dubious value.

Fortunately, there is another perfmon counter which still gives results in SAN environments: Logical Disk -> Avg Disk/Sec. This counter measures the speed that data is being moved in seconds. Specifically it measures the average time of each data transfer regardless of the number of bytes read or written.16 The suggestion here is that all Logical Disks involved in SQL operations should have an Avg Disk/Sec transfer time of between 10 and 20 ms. Arguments could be made for results greater than this, but certainly consistent times of greater than 50ms are cause for concern.

Appendices

Appendix A - SQL Server I/O Characteristics

Operation Random/ Sequential Read/Write Size Range
OLTP - Log Sequential Write 512 bytes - 64KB
OLTP - Data Random Read/Write 8K
Bulk Insert Sequential Write 8KB - 128KB (in multiples of 8KB)
Read Ahead Sequential Read 8KB - 256KB (in multiples of 8K)
CREATE DATABASE Sequential Write 512KB
Backup Sequential Read/Write 1MB
Restore Sequential Read/Write 64KB
DBCC CHECKDB Sequential Read 8KB - 64KB
DBCC DBREINDEX (read phase) Sequential Read (See Read Ahead)
DBCC DBREINDEX (write phase) Sequential Write 8KB - 128KB (multiples of 8KB)
DBCC SHOWCONTIG Sequential Read 8KB - 64KB

Found in Microsoft Customer Advisory Team http://blogs.msdn.com/sqlcat/archive/2005/11/17/493944.aspx

Appendix B - Indicative Throughput Speeds for SQL/IO

The following figures, from a number of different environments show SQL/IO throughput speeds. This may be useful to measure against and get a feeling of what speeds should be expected.

All results were collated with 8, 64, 128 and 256 as the byte size (-b parameter) using a 100 MB file with 2 threads. Results were generated for sequential and random patterns.

Environment Operation Byte Size IO/Sec MB's/Sec
Optiplex GX280 Random Write 8Kb 454.36 3.54
Optiplex GX280 Random Write 64Kb 191.88 11.99
Optiplex GX280 Random Write 128Kb 123.01 15.37
Optiplex GX280 Random Write 256Kb 75.92 18.98
Optiplex GX280 Random Read 8Kb 309.48 2.41
Optiplex GX280 Random Read 64Kb 161.10 10.06
Optiplex GX280 Random Read 128Kb 133.67 16.70
Optiplex GX280 Random Read 256Kb 120.78 30.19
Optiplex GX280 Sequential Write 8Kb 3295.72 25.74
Optiplex GX280 Sequential Write 64Kb 728.24 45.51
Optiplex GX280 Sequential Write 128Kb 276.86 34.60
Optiplex GX280 Sequential Write 256Kb 122.01 30.50
Optiplex GX280 Sequential Read 8Kb 7907.16 61.77
Optiplex GX280 Sequential Read 64Kb 1508.55 94.28
Optiplex GX280 Sequential Read 128Kb 748.96 93.62
Optiplex GX280 Sequential Read 256Kb 377.31 94.32
Dimension 4400 Random Write 8Kb 139.28 1.08
Dimension 4400 Random Write 64Kb 143.27 8.95
Dimension 4400 Random Write 128Kb 97.63 12.20
Dimension 4400 Random Write 256Kb 62.99 15.74
Dimension 4400 Random Read 8Kb 148.41 1.15
Dimension 4400 Random Read 64Kb 110.69 6.91
Dimension 4400 Random Read 128Kb 90.93 11.36
Dimension 4400 Random Read 256Kb 65.43 16.35
Dimension 4400 Sequential Write 8Kb 2173.18 16.97
Dimension 4400 Sequential Write 64Kb 479.06 29.94
Dimension 4400 Sequential Write 128Kb 245.05 30.63
Dimension 4400 Sequential Write 256Kb 122.96 30.74
Dimension 4400 Sequential Read 8Kb 2485.44 19.41
Dimension 4400 Sequential Read 64Kb 454.78 28.42
Dimension 4400 Sequential Read 128Kb 240.71 30.08
Dimension 4400 Sequential Read 256Kb 124.20 31.05
Dev SQL Server Random Write 8Kb 4208.79 32.88
Dev SQL Server Random Write 64Kb 1846.73 115.42
Dev SQL Server Random Write 128Kb 866.13 108.26
Dev SQL Server Random Write 256Kb 462.64 115.66
Dev SQL Server Random Read 8Kb 23088.01 180.37
Dev SQL Server Random Read 64Kb 6917.49 432.34
Dev SQL Server Random Read 128Kb 3675.82 459.47
Dev SQL Server Random Read 256Kb 1858.87 464.71
Dev SQL Server Sequential Write 8Kb 22106.94 172.71
Dev SQL Server Sequential Write 64Kb 2695.11 168.44
Dev SQL Server Sequential Write 128Kb 1291.00 161.37
Dev SQL Server Sequential Write 256Kb 603.14 150.78
Dev SQL Server Sequential Read 8Kb 52776.60 412.31
Dev SQL Server Sequential Read 64Kb 7221.93 451.37
Dev SQL Server Sequential Read 128Kb 3734.26 466.78
Dev SQL Server Sequential Read 256Kb 1940.99 485.24

1 From http://support.microsoft.com/kb/917043/en-us
2 From http://www.microsoft.com/technet/prodtechnol/sql/2...
3 Atomicity, Consistency, Isolation and Durability
4 http://www.microsoft.com/sql/alwayson/storage-req...
5 http://www.microsoft.com/sql/alwayson/storage-req...
6 http://www.microsoft.com/sql/alwayson/storage-req...
7 Logical Unit Number, defined by Wikipedia as "An address for an individual disk drive, and by extension the disk device itself"
8 Number 6 in the storage top ten. http://www.microsoft.com/technet/prodtechnol/sql... 9 Number 2 in the storage top ten.
10 Number 10 in storage top ten.
11 Number 5 in storage top ten
12 http://www.microsoft.com/technet/prodtechnol/sql/2...
13 For more information http://blogs.msdn.com/sqlserverstorageengine/archi...
14SQLIO found at http://www.microsoft.com/downloads/details.aspx?fa...
15 http://www.microsoft.com/technet/prodtechnol/windo...
16 Found in http://www.microsoft.com/technet/prodtechnol/wind...

Statistics
0 Favorited
0 Views
1 Files
0 Shares
0 Downloads
Attachment(s)
doc file
SQL Server SAN and NS.doc   1.52 MB   1 version
Uploaded - Feb 25, 2020

Tags and Keywords

Related Entries and Links

No Related Resource entered.