H.323 Mediated Voice over IP: Protocols, Vulnerabilities & Remediation
by Dr. Thomas Porter
An IBM executive was once quoted as saying, "Our goal is to make the computer as easy to use as the telephone". Our goal, now, is the reverse: to make using an IP telephone at least as easy and secure as using a computer on the Internet.
Voice over IP (VoIP) can be a complex subject. Network security professionals may find the terminology foreign, and VoIP vulnerabilities are often misunderstood. This paper provides an overview of the H.323 protocol suite, its known vulnerabilities, and then suggests twenty rules for securing an H.323-based network.
VoIP protocols can be classified according to their role during message transmission. H.323 and SIP are signaling protocols and thus, they are involved in call setup, teardown, and modification. RTP (real-time transport protocol) and RTCP (real-time transport control protocol) are media transport protocols, and are involved in end-to-end transport of voice and multimedia data. TRIP, SAP, SRP, OSP, et. al. comprise a group of VoIP-related support protocols. Finally, because H.323 mediated VoIP relies upon the underlying transport layer to move data, more traditional protocols that security professionals are familiar with, such as TCP/IP, DNS, DHCP, SNMP, RSVP, and TFTP, may be required.
H.323 (which is implemented primarily at versions 3 and 4 as of the time of this writing) is a byzantine international protocol, published by the ITU, that supports interoperability between differing vendor implementations of telephony and multimedia products across IP-based networks. H.323 entities provide real-time audio, video and/or data communications. Support for audio is mandatory, while support for data and video is optional.
The dissection of H.323-mediated VoIP from a security point-of-view, is complex - the plethora of associated protocols & the large number of vendor implementations (at least 40 separate vendor implementations) has resulted in further complicating the interactions and different security features of the vendor implementations. SIP also has a number of security issues, but those will be addressed in another article.
As will be seen below, a surfeit of attacks against the H.323 protocols can be envisioned. The only existing vulnerabilities that we are aware of at this time take advantage of ASN.1 parsing defects in the first phase of H.225 data exchange. More vulnerabilities can be expected for several reasons: the large number of differing vendor implementations, the complex nature of this collection of protocols, problems with the various implementations of ASN.1/PER encoding/decoding, and the fact that these protocols, alone and in concert, have not endured the same level of scrutiny that other, more common protocols have been subjected to.
H.323 Signaling protocols
H.323 allows dissimilar communication devices to communicate with each other by using a standardized communication protocol. H.323 defines a common set of call setup and negotiating procedures and basic data-transport methods - the most common in VoIP applications being H.225.0, H.235, H.245, H.248, and the Q.900 signaling series. In addition, for VoIP communications H.323 specifies a group of audio codecs - the G.700 series. The following is an overview of these major protocols.
ASN.1 (Abstract Syntax Notation One) is commonly misunderstood. It is not a programming language, but it is a flexible notation that allows one to define a variety of data types. ASN.1 encoding rules are sets of rules used to transform data specified in the ASN.1 language into a standard format that can be decoded on any system that has a decoder based on the same set of rules. The H.323 protocol family is compiled into a wire-line protocol using PER (Packed Encoding Rules). PER is a compact binary encoding that is used on limited-bandwidth networks. It is designed to optimize the use of bandwidth, but the tradeoff is complexity: decoding PER PDUs (protocol data units) has lead to problems due to a number of factors including issues with octet alignment, integer precision, and unconstrained character strings.
H.323 Messaging sequence
H.323 signaling exchanges typically are routed via the gatekeeper or directly between the participants as chosen by the gatekeeper. Media exchanges are normally directly routed between the participants of a call.
Normally, the first message components used to initiate an H.323 exchange are Gatekeeper Discovery packets. Establishing a call between two endpoints requires two TCP connections between the endpoints: one for call setup (Q.931/H.225 messages), and one for capabilities exchange and call control (H.245 messages). First, an endpoint initiates an H.225/Q931 exchange on a TCP well-known port (TCP 1720) with another endpoint. Successful completion of the "call" results in an end-to-end reliable channel supporting H.245 messaging.
H.245 negotiations usually take place on a separate channel from the one used for H.225 exchanges, but newer applications support tunneling of H.245 PDUs in the H.225 signaling channel. There is no well-known port for H.245. The H.245 transport address is always passed in a call-signaling message. The media channels (those used to transport voice and video) are similarly dynamically-allocated. As an aside, this use of dynamic ports makes it difficult to implement security policy on firewalls, NAT, and traffic shaping.
H.323 data communications utilizes both TCP and UDP. TCP ensures reliable transport for control signals and data, because these signals must be received in proper order and cannot be lost. UDP is used for audio and video streams, which are time-sensitive but are not as sensitive to an occasional dropped packet. Consequently, the H.225 call signaling channel and the H.245 control channel typically run over TCP, while audio, video, and RAS channel exchanges rely on UDP for transport.
RTP and RTCP
Real-time transport protocol (RTP) is an application layer protocol that provides end-to-end delivery services of real-time audio and video. RTP provides payload identification, sequencing, timestamping, and delivery monitoring. UDP provides multiplexing and checksum services. RTP can also be used with other transport protocols.
The actual media, such as voice, first needs to be encoded using an appropriate codec. The encoded audio stream is then passed to RTP, which is used to transfer the real-time audio/video streams over the Internet. RTCP provides status and control information for the use of RTP.
Real-time transport control protocol (RTCP) is the counterpart of RTP that provides control services. The primary function of RTCP is to provide feedback on the quality of the data distribution. Other RTCP functions include carrying a transport-level identifier for an RTP source, called a canonical name, which can be used by receivers to synchronize audio and video.
RTP runs on dynamic, even-numbered, high ports (ports above 1024), while RTCP runs on the next corresponding odd numbered, high port.
H.225 (denial of service; execution of code)
These failures result from insufficient bounds checking of H.225 messages as they are parsed and processed by affected systems. These errors are primarily due to problems in low-level byte operations with vendor ASN.1 PER/BER PDU decoders, as mentioned earlier. Depending upon the affected system and implementation, these attacks result in system crash and reload (DoS), or in the case of systems that parse these data (such as Microsoft ISA server), execution of code within the context of the security service.
Additionally, we have found that flooding multiple, malformed GRQ (Gatekeeper Request) packets to the gatekeeper results in the disconnection of a number of vendor's IP phones.
Issues with remediation
The H.323 protocol suite is complex, but provides a great deal of flexibility. Protocol-specific problems will be addressed in a similar manner as problems with traditional protocols -- through testing and independent audit followed by remediation. Unfortunately there are still issues that cannot be easily addressed using traditional security devices found in typical organizations.
Twenty rules for securing H.323 networks
In light of the above issues, there are still numerous steps that can be taken to secure VoIP infrastructure. The key to securing H.323 networks is to use and enforce the security mechanisms already deployed in data networks (firewalls, encryption, etc.). This allows one to emulate the security level currently enjoyed by PSTN network users.
The author would like to thank Anton Rager, Brian Boyter, and Rick Robinson for their helpful discussions in preparing this article.
About the author
Thomas Porter, Ph.D. is a Senior Security Consultant with Avaya's Enterprise Security division. He has spent over eleven years in the networking and security industry as a consultant, speaker and developer of security tools; he also holds numerous security certifications. Tom lives in Chapel Hill, North Carolina with his wife.
This article originally appeared on SecurityFocus.com -- reproduction in whole or in part is not allowed without expressed written consent.