Optimizing RTCI for Intel vPro Operations
A few months ago I posted the article "Integrating Intel AMT Power Control Into Everyday Operations".
The good news is that it generated renewed interest in the underlying capability of out-of-band management, especially for mass power-on of systems at a defined time. The article also brought focus to the optimizations and considerations that should be taken into account in order to achieve success. Here are a few additional insights on those optimizations and considerations, based on various customer and lab experiences, and I fully expect more will come.
- Server vs. Client based TaskServer jobs
- Minimize the number of protocols used in the connection profile
- Retry after Wake and Timeout value
- Ensure Host Resolution is working in your environment
- Each Intel AMT session requires authentication and authorization
Server vs. Client Based TaskServer Jobs
When you define jobs within TaskServer, you may have noticed there are "server" and "client" job types. In my own experience, a client job is something that can be distributed to other TaskServers in the environment\hierarchy, and may be directed to the TaskAgent on the client or initiated by the remote TaskServer. A server job will originate from the core Notification Server (i.e. Symantec Management Platform core server). Intel AMT tasks are server jobs - they are out-of-band management functions that originate from the core Notification Server and require authentication\authorization into the firmware of the client. The protocols and associated authentication are defined by the Connection Profile.
Again - this is my own experience\understanding of the environment - and I welcome corrections, clarifications, and so forth from others. In my lab environment, I only have one TaskServer which is running on the core NS7 system. A few customers I am working with are using a hierarchical TaskServer setup.
Minimize the Number of Protocols Used in the Connection Profile
Have you ever noticed that there can be more than one Connection Profile defined? Each connection profile contains a list of protocols which can be enabled\disabled, along with the credentials to use for each protocol. On that note - you can define multiple credentials for a single protocol, yet a single connection profile can select only one of them per protocol. (Even as I typed that, it sounded confusing, but stick with me or try it out yourself. It will make sense in a moment.) This was the design and intent of the "Pluggable Protocol Architecture" or PPA.
Take a look at the image below. Only the AMT (Active Management Technology) protocol is selected with a defined credential.
If desired, the "runtime credentials" can be selected. However, this might introduce additional latencies, since it requires either communication with the Intel SCS (AMTconfig server, IntelAMT database) to get the credentials stored per client instance OR, if using Kerberos authentication, a query to Active Directory for the service token. (BTW: A theory on my part - this is part of the reason Intel AMT out-of-band tasks are "server" based and not client based. Imagine the additional latencies that could occur if remote TaskServers had to query back to Intel SCS to obtain a unique credential per instance.)
Instead - having a defined Intel AMT user directly specified should help reduce the latencies.
Another factor to keep in mind - if additional protocols are enabled in the connection profile used for the Intel AMT power-on job, by the nature of RTCI (Real Time Console Infrastructure) it will attempt to communicate with the client on those protocols. For example, if WMI is enabled a connection will be attempted. However, if the remote client is currently off or asleep, WMI is not available on the client. The RTCI Task will wait until the protocol connection times out. If only Intel AMT connections are attempted for a power-on job, this will reduce unnecessary traffic and associated attempts.
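To put a rough number on that wait, here is a minimal sketch in Python. It assumes the worst case, where every enabled protocol that needs a running operating system is tried in turn and waits out its full timeout. The protocol names and timeout values are placeholders for illustration only, not values taken from the product.

```python
def added_wait_per_client(enabled_timeouts, out_of_band=("AMT",)):
    """Worst-case seconds spent waiting on protocols that cannot answer
    while the client is off or asleep.

    enabled_timeouts: dict of protocol name -> timeout in seconds
                      (hypothetical values; check your own connection profile).
    out_of_band:      protocols that can answer without a running OS.
    """
    return sum(
        timeout
        for name, timeout in enabled_timeouts.items()
        if name not in out_of_band
    )


# Placeholder profile: AMT plus WMI, with an assumed 30 second WMI timeout.
print(added_wait_per_client({"AMT": 10, "WMI": 30}))
# AMT-only profile: nothing extra to wait for.
print(added_wait_per_client({"AMT": 10}))
```

With these placeholder numbers, the first profile adds up to 30 seconds of waiting per powered-off client, while the AMT-only profile adds none.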
If you are using TLS with your Intel AMT configuration, keep in mind that a TLS session will need to be established for each client. This again adds latency, along with server load to establish each session. In general, I do not recommend TLS for Intel AMT configurations and communications. It is not a matter of it working (because it works great), but more of a performance issue due to the nature of TLS sessions in general. A key question I often ask - "Do you encrypt and secure traffic on your internal network today?" In general the answer is no.
Retry after Wake and Timeout Value
If you scroll down on the AMT protocol options in the connection profile (not shown in the image above), you will notice two options at the bottom:
- Timeout - my understanding is this defines how long the server will attempt to connect to the Intel AMT client before timing out. The default setting is 10 seconds. More experimentation and inquiries will be needed to confirm this, but it may help to adjust the value down to 5 seconds.
- Retry after wakeup from sleep - According to the power policy defined in the Intel AMT configuration profile, Intel AMT can be placed in a low power state. In the latest generations of the platform, this is often the default setting. The upside is that the power consumption of the client is very low (~1W) when the system is powered off. The downside is that a delay will occur in waking up the ME. Another point that I am waiting to clarify is what the resulting impact might be with the timeout value. For example, if the timeout is 10 seconds and Intel AMT must first be awakened - does this mean 10 seconds to wake-up followed by a re-attempt? I hope to get this clarified soon.
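To make the open question concrete, here is a minimal Python sketch of one possible reading: a connection attempt that waits up to the timeout, followed by a retry. This is not how RTCI is known to behave; it only shows that, under this reading, the worst-case wait per client is the timeout multiplied by the number of attempts. The function names and the `connect` parameter are hypothetical. Port 16992 is the standard Intel AMT web services port without TLS.

```python
import socket
import time

AMT_HTTP_PORT = 16992  # Intel AMT web services port without TLS (16993 with TLS)


def worst_case_wait(timeout, retries):
    """Upper bound in seconds if every attempt runs into the full timeout."""
    return timeout * (retries + 1)


def try_connect(host, port=AMT_HTTP_PORT, timeout=10.0, retries=1,
                connect=socket.create_connection):
    """Try a TCP connection, retrying after a failed or timed-out attempt.

    Returns (connected, elapsed_seconds). The `connect` parameter exists
    only so the function can be exercised without a real AMT client.
    """
    start = time.monotonic()
    for _ in range(retries + 1):
        try:
            conn = connect((host, port), timeout)
        except OSError:  # covers refused connections and timeouts
            continue
        conn.close()
        return True, time.monotonic() - start
    return False, time.monotonic() - start


# With the default 10 second timeout and a single retry:
print(worst_case_wait(10, 1))  # 20
```

Measuring `try_connect` against a client whose ME is in the low power state, with and without the retry, would be one way to observe the actual delay in your own environment.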
Ensure Host Resolution is Working in Your Environment
Communications to Intel AMT are SOAP-over-HTTP based, meaning that they use a TCP\IP connection. The WebServices traffic requires resolution of the current IP address of the client. It appears the current nature of the Real-Time Console Infrastructure (RTCI) is to perform an FQDN\hostname resolution of the client IP at the time of the event. The client IP address is captured by the NSagent and stored in the database, yet there could be situations where the IP address recorded by the NS is no longer valid.
In the case of a 1-to-many power-on event, this would imply RTCI needs to first request the correct IP address of the target clients before being able to send a request. DNS resolution could be a few milliseconds.
Take a look at the Microsoft TechNet article that explains Windows based hostname resolution - http://technet.microsoft.com/en-us/library/bb727005.aspx. In my own experience, the local DNS cache of the Windows operating system may cause incorrect IP resolution or other issues.
The key point here - ensure your client IP resolution is working correctly in the environment.
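One way to spot-check this ahead of a large power-on job is to compare the IP address on record for each client with what its name resolves to right now. The Python sketch below assumes you can export a hostname to IP mapping from your own inventory. The function names and the `resolver` parameter are hypothetical, and the lookup goes through the operating system resolver, so it is subject to the same local cache behavior mentioned above.

```python
import socket


def resolve_ipv4(fqdn, resolver=socket.getaddrinfo):
    """Return the set of IPv4 addresses the name resolves to (empty on failure)."""
    try:
        infos = resolver(fqdn, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return set()
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is (address, port).
    return {info[4][0] for info in infos}


def find_stale(recorded, resolver=socket.getaddrinfo):
    """Return clients whose recorded IP is not among the resolved addresses.

    recorded: dict of FQDN -> IP address on record (exported from your inventory).
    Result:   dict of FQDN -> sorted list of currently resolved addresses,
              which is empty when the name does not resolve at all.
    """
    stale = {}
    for fqdn, ip in recorded.items():
        current = resolve_ipv4(fqdn, resolver)
        if ip not in current:
            stale[fqdn] = sorted(current)
    return stale
```

Calling `find_stale({"pc1.example.com": "10.0.0.5"})` returns an empty dictionary when the record matches. Any entry that does come back is a client worth checking before the job runs.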
Each Intel AMT Session Requires Authentication and Authorization
In connection with the insights shared above, Intel AMT communications occur per client device and require authentication\authorization.
The upside is that you can ensure only those with appropriate credentials can communicate with and utilize the Intel AMT features. In addition, since the communications are TCP\IP based they are routable by nature. This results in the ability to reliably know the current power state, perform a power-on, and know that the power-on event completed successfully. In general - these positives are key reasons customers really like how the technology works.
The downside is that connections are per client (i.e. unicast), authentication must occur (the examples above would be MD5 Digest), and the authenticated user must be authorized to perform the desired task (which means that the system must be checked to see if that task is possible, whether the user is authorized, etc). On a per system basis, this all happens very quickly. But if you account for how many systems have been targeted, how many threads are available to process them, and so forth, the numbers might indicate why not all of the targeted clients power on right away.
The actual performance will vary greatly based on environmental conditions and variables. Thus what I share below is my own experience based on a simplified network capture and analysis. Per system traffic for a power-on event is less than 40KB, requiring up to 2.5 seconds (often the time was less than a second). Some customers have indicated a few minutes to power-on hundreds or thousands of clients. In a training room environment, I ran a single task to power-on 25 systems which completed in about a minute. (BTW: It was rather impressive to the attendees of the training to see a collection of systems power-on in this manner, especially when I explained that Wake-on-LAN was not used.) An unknown variable is the latency between the start of the job and when the TaskServer actually starts to send messages to the clients. As indicated above - there are other variables which apply not only to the NS7\TaskServer environment and Intel AMT firmware, but also to the infrastructure, other processes or resources on the server, and so forth.
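For a back-of-the-envelope feel for how these figures scale, here is a small Python sketch. It uses the upper figures from my capture (2.5 seconds and 40KB per client) and a hypothetical number of worker threads, and it assumes each worker handles one client at a time. How TaskServer actually schedules the work is not something I can confirm, and the startup latency mentioned above is not included.

```python
import math

PER_CLIENT_SECONDS = 2.5  # upper figure from the capture described above
PER_CLIENT_KB = 40        # upper figure from the capture described above


def estimate_power_on(clients, worker_threads,
                      per_client_seconds=PER_CLIENT_SECONDS,
                      per_client_kb=PER_CLIENT_KB):
    """Rough upper-bound estimate for a 1-to-many power-on job.

    worker_threads is a hypothetical knob: the number of clients assumed
    to be processed in parallel.
    """
    waves = math.ceil(clients / worker_threads)
    return {
        "seconds": waves * per_client_seconds,
        "traffic_mb": clients * per_client_kb / 1024,
    }


print(estimate_power_on(25, 1))     # {'seconds': 62.5, 'traffic_mb': 0.9765625}
print(estimate_power_on(1000, 10))  # {'seconds': 250.0, 'traffic_mb': 39.0625}
```

With a single worker, 25 clients come to about 62 seconds, which is in the same range as the training room result, though that is no proof of how the job was actually processed. The second line suggests why a few minutes for a thousand clients is plausible under these assumptions.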
The information provided above is for reference only, and is meant to provide insights based on key learnings and experiences. Keep in mind that a collection of variables (hardware, software, environmental, etc) will affect the overall performance and scale of a job. There will be tweaks and optimizations for each variable, just as there may be updates to firmware or software. If you have additional insights to share - please do.
A comment often heard is why not use a BIOS timer to wake up the systems at a defined time. One answer - that timer is set to work at only that time, whereas Intel AMT is a service waiting for a request at any given time. Some customers need to define multiple wake-up times. Some customers have chosen to use Windows Scheduled Tasks to wake from sleep states. While these alternative approaches may work - they sidestep a common need for true out-of-band management.
You may be interested to know that in the newer generations of Intel AMT there is a little known feature called PC Alarm Clock - http://software.intel.com/en-us/articles/intel-active-management-technology-pc-alarm-clock/. The feature was introduced with Intel AMT 5.1 (2009 Desktops). This feature is not supported natively by Symantec\Altiris, yet there are ways to enable and configure it if needed. For example, the Intel Core vPro Processor PowerShell Module provides the ability to get, set, and clear the PC Alarm Clock on configured Intel AMT systems. Take a look at http://communities.intel.com/community/openportit/vproexpert/blog/2010/07/19/intel-core-vpro-processor-powershell-module--release-introduction.
Again - I welcome your thoughts and feedback on the information shared. View it as a work in progress as more experiences are obtained and shared.
The opinions expressed on this site are mine alone and do not necessarily reflect the opinions or strategies of Intel Corporation or its worldwide subsidiaries.