Wired for sound: How SIP won the VoIP protocol wars

As an industry grows, it is quite common to find multiple solutions that all attempt to address similar requirements. This evolution dictates that these proposed standards go through a stage of selection—over time, we see some become more dominant than others. Today, the Session Initiation Protocol (SIP) is clearly one of the dominant VoIP protocols, but that obviously didn’t happen overnight. In this article, the first of a series of in-depth articles exploring SIP and VoIP, we’ll look at the main factors that led to this outcome.

A brief history of VoIP

Let’s go back to 1995, the days before Google, IM, and even broadband. Cell phones were large and bulky, Microsoft had just introduced a new Windows interface with a “Start” button, and Netscape had the most popular Web browser. The growth of the Internet and data networks led many to realize that the new networks could serve our voice communication needs at a substantially lower cost. The first commercial Internet VoIP product came from a company called VocalTec; its software allowed two people to talk with each other over the Internet. You would make a local call to an ISP over a 28.8K or 33.6K modem and could then talk with friends even if they lived far away. I remember trying out this software, and the sound quality was definitely below acceptable. (It frequently sounded like you were trying to speak while submerged in a swimming pool.) However, the software successfully connected two people and brought real-time voice conversation to a bandwidth-constrained network.

It was immediately apparent to the first VoIP implementers that the telephone network and the data network differ in several ways. One of them is how messages are exchanged. The phone system is circuit-switched, where a circuit is the complete path between two endpoints, so all of the messages in a single call are guaranteed to follow the same path. The data network is packet-switched: the hops along the way route each packet toward its final destination, and the path may change from one packet to the next. Because of this structure, the data network cannot guarantee that the packets of a single session will traverse the same path. VoIP therefore required some new innovations before it could really get off the ground.

To start a call, you need a VoIP signaling protocol. The term “signaling” comes from the world of circuit-switched telephony, where signals are sent from one end to the other to set up the connection that lets us talk over vast distances. The role of a signaling protocol is to define how these messages are structured and the rules that let us start, configure, and end a conversation. It is worth pointing out that signaling messages do not include the voice one hears (the media of the call). A signaling protocol may carry information about the media streams and their attributes, but the speech itself is not a signaling message. If you’re looking for a very high-level explanation, just think of signaling as the messages a device sends when you dial or hang up the phone.

So the race was on to create a new signaling protocol. Some of these protocol specifications were open for everyone to implement, and others were vendor-proprietary solutions. And that race still isn’t quite over, as we’re constantly seeing new proposals that attempt to convince everyone that there’s a better way to do things. A VoIP signaling protocol must show how it integrates with the data network; this includes aspects such as defining a method of locating the communication devices, specifying server behavior, introducing new services, and security design.

SIP protocol design

SIP is an Internet Engineering Task Force (IETF) protocol and, as such, was designed as an open Internet protocol. Its first release, RFC 2543, was published in 1999, though its early drafts date back to 1996. In 2002, RFC 3261 revised many of its definitions and replaced RFC 2543.

Let’s look at a simple SIP request:

INVITE sip:hannibal@arstechnica.com SIP/2.0
Via: SIP/2.0/UDP home.mynetwork.org;branch=z9hG4bK8uf35f
To: Jon Stokes <sip:hannibal@arstechnica.com>
From: Gilad <sip:gilad@voxisoft.com>;tag=n23ycs
Call-ID: nbo34tsggvsqap@home.mynetwork.org
CSeq: 59164 INVITE
Contact: sip:gilad@voxisoft.com
Max-Forwards: 70

SIP is text-based, and its addresses look very much like email addresses. Although SIP can carry telephone numbers, the basic idea is that addresses do not have to be phone numbers, just as you would not expect your email address to look like your home or work address. Compare the SIP request above with the following (partial) HTTP request:

GET /reviews/ HTTP/1.1
Host: arstechnica.com
User-Agent: Gecko/Firefox/3.5.5

As you can see, SIP is quite similar to HTTP. The first line is the request line, which contains the request type (GET in HTTP and INVITE in SIP for these examples), the target address, and the protocol version, while the subsequent lines are headers carrying additional information. Naturally, SIP responses also look very similar to HTTP responses. The idea was to borrow the structure of one of the most popular Internet protocols and so make SIP easier for software developers and network managers to work with.
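Because a SIP request shares HTTP’s layout (a request line, header lines, a blank line, then an optional body), the same kind of simple text parser can read both. The following is a minimal Python sketch under simplifying assumptions: lines end with CRLF as they do on the wire, there is no header line folding, and compact header names (such as “v” for Via) are not expanded. The names SipRequest and parse_request are hypothetical and not part of any library.

```python
from dataclasses import dataclass, field


@dataclass
class SipRequest:
    method: str
    request_uri: str
    version: str
    # Header names are lower-cased because SIP, like HTTP, treats them
    # case-insensitively. A header may appear more than once (e.g. Via).
    headers: dict[str, list[str]] = field(default_factory=dict)
    body: str = ""


def parse_request(raw: str) -> SipRequest:
    """Split a text request into request line, headers, and body.

    Simplified sketch: assumes CRLF line endings, no line folding,
    and no compact header names.
    """
    head, _, body = raw.partition("\r\n\r\n")  # a blank line ends the headers
    lines = head.split("\r\n")
    # Request line: METHOD SP Request-URI SP Version
    method, request_uri, version = lines[0].split(" ", 2)
    headers: dict[str, list[str]] = {}
    for line in lines[1:]:
        # Split on the first colon only; values such as "<sip:...>" contain colons.
        name, _, value = line.partition(":")
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return SipRequest(method, request_uri, version, headers, body)


# The INVITE from the article, with the CRLF line endings used on the wire.
invite = "\r\n".join([
    "INVITE sip:hannibal@arstechnica.com SIP/2.0",
    "Via: SIP/2.0/UDP home.mynetwork.org;branch=z9hG4bK8uf35f",
    "To: Jon Stokes <sip:hannibal@arstechnica.com>",
    "From: Gilad <sip:gilad@voxisoft.com>;tag=n23ycs",
    "Call-ID: nbo34tsggvsqap@home.mynetwork.org",
    "CSeq: 59164 INVITE",
    "Contact: sip:gilad@voxisoft.com",
    "Max-Forwards: 70",
]) + "\r\n\r\n"

req = parse_request(invite)
print(req.method, req.request_uri)  # INVITE sip:hannibal@arstechnica.com
print(req.headers["cseq"])          # ['59164 INVITE']
```

The same function also accepts the HTTP request above (“GET /reviews/ HTTP/1.1” followed by its Host header), which is precisely the benefit of SIP borrowing HTTP’s format.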

These attempts to make SIP as easy as HTTP worked out to some extent, but the requirements SIP addresses are more complex than those of HTTP, so the protocol is more complex as well. For example, SIP must support two-way, symmetric communication: each SIP endpoint acts as both a client and a server, so the called party can later send its own requests (such as a BYE to hang up). In a typical HTTP scenario, by contrast, the client makes requests and the server only sends responses. Still, even without prior HTTP knowledge, the basic message structure is very easy to learn.
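To illustrate that responses follow the same HTTP-like pattern, and that the called party answers as a server within the same exchange, here is a minimal Python sketch of how Jon’s phone might build a provisional “180 Ringing” response to the INVITE above. It assumes the header-copying rules of RFC 3261 (section 8.2.6): Via, From, Call-ID, and CSeq are copied unchanged, and the callee adds its own tag to the To header. The function name build_response and the random tag are illustrative choices, and a real response would carry further headers such as Contact.

```python
import secrets

# Headers the answering side copies from the request into its response,
# paired with their canonical spelling (after RFC 3261, section 8.2.6).
COPIED_HEADERS = [
    ("via", "Via"),
    ("from", "From"),
    ("to", "To"),
    ("call-id", "Call-ID"),
    ("cseq", "CSeq"),
]


def build_response(request_headers: dict[str, list[str]], code: int, reason: str) -> str:
    """Build a minimal SIP response to a request whose headers are given
    as a dict of lower-case names to lists of values."""
    lines = [f"SIP/2.0 {code} {reason}"]  # status line, shaped like HTTP's
    for key, name in COPIED_HEADERS:
        for value in request_headers.get(key, []):
            if key == "to" and code != 100 and ";tag=" not in value:
                # The callee adds its own tag to identify its side of the
                # dialog (optional for 100 Trying, so this sketch omits it there).
                value += f";tag={secrets.token_hex(4)}"
            lines.append(f"{name}: {value}")
    lines.append("Content-Length: 0")
    return "\r\n".join(lines) + "\r\n\r\n"


invite_headers = {
    "via": ["SIP/2.0/UDP home.mynetwork.org;branch=z9hG4bK8uf35f"],
    "from": ["Gilad <sip:gilad@voxisoft.com>;tag=n23ycs"],
    "to": ["Jon Stokes <sip:hannibal@arstechnica.com>"],
    "call-id": ["nbo34tsggvsqap@home.mynetwork.org"],
    "cseq": ["59164 INVITE"],
}
print(build_response(invite_headers, 180, "Ringing"))
```

The status line mirrors HTTP’s “HTTP/1.1 200 OK” form, while the copied Via and CSeq headers let the caller match the response to the request it sent.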

For those who are wondering, the SIP example above is the first packet one might send when calling from a SIP phone to Ars Technica’s Deputy Editor, Jon Stokes. I will refrain from going into the technical details of the message contents at this time, as this is a subject for a separate article.

Reuse, and keeping it simple

Another important factor in SIP’s design was the decision to reuse existing Internet standards as much as possible: locating servers and users relies on DNS, user authentication uses HTTP digest authentication, the call’s media streams are described with the Session Description Protocol (SDP), encryption uses TLS, and, where applicable, endpoints exchange XML documents. This integration further established SIP as part of the Internet protocol world, and vendors could reuse existing implementations in their SIP applications. On the other hand, in some cases the IETF had to extend other protocols to serve SIP’s needs.
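As an illustration of that reuse, here is a minimal Python sketch of the HTTP digest computation a SIP phone performs when a server challenges its INVITE with a 401 or 407 response. It shows only the basic form from RFC 2617, without the optional qop parameter; every credential, realm, and nonce value is made up for the example, and the function names are hypothetical.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username: str, realm: str, password: str,
                    method: str, uri: str, nonce: str) -> str:
    """HTTP digest 'response' value in its basic form (no qop), per RFC 2617."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # depends on the secret
    ha2 = md5_hex(f"{method}:{uri}")                 # depends on the request
    return md5_hex(f"{ha1}:{nonce}:{ha2}")           # bound to the server's nonce


# Hypothetical challenge values; a real nonce comes from the server's 401/407.
realm, nonce = "voxisoft.com", "a1b2c3d4"
uri = "sip:hannibal@arstechnica.com"
resp = digest_response("gilad", realm, "secret", "INVITE", uri, nonce)
print(f'Authorization: Digest username="gilad", realm="{realm}", '
      f'nonce="{nonce}", uri="{uri}", response="{resp}", algorithm=MD5')
```

The computed value goes in an Authorization header after a 401 challenge, or in a Proxy-Authorization header after a 407. Because the password is only ever hashed together with the server’s nonce, it never crosses the network in clear text.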

Keeping the servers along the call path, especially the proxies, as simple as possible was another emphasis of SIP’s design. SIP proxies route messages between the calling parties. The proxies defined in the standard do not need to be aware of the call state; instead, they operate at the transaction level, and they may even be stateless. This helps with scalability, because fewer devices can serve more calls. To achieve this, the protocol itself was separated into several distinct layers (RFC 3261 describes syntax and encoding, transport, transactions, and the “transaction user” on top), a common practice programmers use to break down a complex system. This design further simplifies SIP and makes it easier to implement. At times, keeping state to a minimum imposed some limitations (and, later, some changes to the protocol), but these side effects remained small.
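The following Python sketch shows what “stateless” means in practice for a proxy: it keeps no record of the call and handles each request on its own, decrementing Max-Forwards, rejecting looping requests with 483 (Too Many Hops), and adding its own Via header. It assumes headers arrive as an ordered list of name/value pairs, leaves out the routing decision itself, and derives the new branch parameter with a simplified version of the hashing approach RFC 3261 recommends for stateless proxies. The host name proxy.voxisoft.com and the function and exception names are hypothetical.

```python
import hashlib

MAGIC_COOKIE = "z9hG4bK"  # RFC 3261 prefix for branch parameters


class TooManyHops(Exception):
    """Raised when the proxy should reply 483 (Too Many Hops)."""


def forward_stateless(headers: list[tuple[str, str]], proxy_host: str) -> list[tuple[str, str]]:
    """Return the header list a stateless proxy would forward.

    Headers are (name, value) pairs in wire order, because the order of
    repeated headers such as Via matters.
    """
    out: list[tuple[str, str]] = []
    saw_max_forwards = False
    for name, value in headers:
        if name.lower() == "max-forwards":
            saw_max_forwards = True
            hops = int(value)
            if hops == 0:
                raise TooManyHops("483 Too Many Hops")
            value = str(hops - 1)  # each hop decrements Max-Forwards
        out.append((name, value))
    if not saw_max_forwards:
        out.append(("Max-Forwards", "70"))

    # No per-call memory: the new branch is a hash of the received branch,
    # so a retransmitted request is forwarded with an identical branch.
    top_via = next((v for n, v in headers if n.lower() == "via"), "")
    received_branch = top_via.partition(";branch=")[2].split(";")[0]
    branch = MAGIC_COOKIE + hashlib.sha1(received_branch.encode()).hexdigest()[:16]
    out.insert(0, ("Via", f"SIP/2.0/UDP {proxy_host};branch={branch}"))
    return out


request = [
    ("Via", "SIP/2.0/UDP home.mynetwork.org;branch=z9hG4bK8uf35f"),
    ("To", "Jon Stokes <sip:hannibal@arstechnica.com>"),
    ("Max-Forwards", "70"),
]
for name, value in forward_stateless(request, "proxy.voxisoft.com"):
    print(f"{name}: {value}")
```

Because the branch is computed from the request rather than stored, a retransmitted INVITE is forwarded identically without the proxy remembering anything, which is what lets a stateless proxy scale.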

Finally, and perhaps most importantly, SIP was not built solely as a replacement for the telephone system. It allows extensions, and it relies on them to provide services beyond simple calls. For example, you can use SIP to maintain user status information in an IM client as well as to set up IM sessions. Another extension enables transferring a call to a third party, something the basic SIP specification simply did not define. This works because SIP provides general-purpose building blocks and restricts them only where necessary. SIP defines the concept of a “dialog”, a two-way relationship between two endpoints, but does not limit dialogs to calls: two-way communication also includes setting your IM status and receiving your IM friends’ updates. Extensions can also easily define new request or response types and new headers when needed.
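RFC 3261 identifies a dialog by three values: the Call-ID plus a local tag and a remote tag taken from the From and To headers. Nothing in that identifier is specific to phone calls, which is why the same bookkeeping works for a presence subscription created by SUBSCRIBE. The following Python sketch assumes a user agent that tracks its dialogs in a dictionary; the second Call-ID and every tag other than n23ycs are made-up values, and the helper names are hypothetical.

```python
from typing import NamedTuple


class DialogId(NamedTuple):
    call_id: str
    local_tag: str
    remote_tag: str


def tag_of(header_value: str) -> str:
    """Extract the tag parameter from a From or To header value."""
    return header_value.partition(";tag=")[2].split(";")[0]


def dialog_id(call_id: str, from_value: str, to_value: str, is_caller: bool) -> DialogId:
    """A dialog is identified by Call-ID plus local and remote tags.

    The side that sent the initial request owns the From tag; the side
    that answered owns the To tag.
    """
    from_tag, to_tag = tag_of(from_value), tag_of(to_value)
    if is_caller:
        return DialogId(call_id, from_tag, to_tag)
    return DialogId(call_id, to_tag, from_tag)


dialogs: dict[DialogId, str] = {}

# A voice call set up by INVITE (the To tag "8f2a" is hypothetical).
call = dialog_id("nbo34tsggvsqap@home.mynetwork.org",
                 "Gilad <sip:gilad@voxisoft.com>;tag=n23ycs",
                 "Jon Stokes <sip:hannibal@arstechnica.com>;tag=8f2a",
                 is_caller=True)
dialogs[call] = "INVITE: voice call"

# A presence subscription set up by SUBSCRIBE (all values hypothetical).
presence = dialog_id("p7q2x9@home.mynetwork.org",
                     "Gilad <sip:gilad@voxisoft.com>;tag=k41",
                     "Jon Stokes <sip:hannibal@arstechnica.com>;tag=m09",
                     is_caller=True)
dialogs[presence] = "SUBSCRIBE: presence updates"

for key, kind in dialogs.items():
    print(key, "->", kind)
```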

These design decisions made it possible to meet the changing needs of the communication world with an existing protocol, and thus to offer new services faster. The downside is that vendors who want to support many services, and to enable new ones, have to follow the growing number of SIP-related RFCs (and sometimes their earlier drafts) closely. This took a toll on SIP interoperability: with so many extensions, it can be harder to deploy a SIP network built from multiple vendors’ devices. SIP mitigates this problem by having each extension define a keyword, called an option tag. An endpoint lists the extensions it supports in a Supported header and the extensions it requires in a Require header, and a peer that does not implement a required extension rejects the request with a 420 (Bad Extension) response. Market forces also drive vendors to implement the most commonly required extensions.
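The option-tag check itself is simple. Here is a minimal Python sketch of what a receiving endpoint might do with a Require header: compare its option tags against the extensions it implements and, if any are missing, answer 420 (Bad Extension) with an Unsupported header naming them. The tags 100rel, replaces, and timer are real SIP option tags (defined in RFCs 3262, 3891, and 4028); the endpoint’s capability set and the function names are assumptions for illustration, and the copied Via, From, To, Call-ID, and CSeq headers of the response are omitted.

```python
def unsupported_options(require_value: str, supported: set[str]) -> list[str]:
    """Return the option tags listed in a Require header value that this
    endpoint does not implement, preserving their order."""
    required = [tag.strip() for tag in require_value.split(",") if tag.strip()]
    return [tag for tag in required if tag not in supported]


def bad_extension_lines(missing: list[str]) -> str:
    """Status line and Unsupported header of a 420 (Bad Extension) response."""
    return "SIP/2.0 420 Bad Extension\r\nUnsupported: " + ", ".join(missing) + "\r\n"


# Hypothetical endpoint that implements reliable provisional responses
# (100rel) and call replacement (replaces), but not session timers (timer).
MY_OPTIONS = {"100rel", "replaces"}

missing = unsupported_options("100rel, timer", MY_OPTIONS)
if missing:
    print(bad_extension_lines(missing))  # lists "timer" as unsupported
else:
    print("All required extensions are supported; process the request.")
```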

A different approach

H.323 is an ITU protocol, or more precisely a suite of ITU protocols. It is common to find comparisons between SIP and H.323. Some of these comparisons aim to show the benefits of one protocol over the other, but our purpose here is to present a different design approach and its outcome. We will take a quick look at H.323, but this is in no way a complete review.

H.323 is also an open protocol, and its first release predates SIP, going back to 1996. To the naked eye, H.323 looks like a binary protocol. It does have some binary elements, but for the most part its messages are defined in ASN.1. ASN.1 describes a structure or an object, so a decoder can easily turn a message back into a tree-like structure holding all the data elements. H.323 uses the Packed Encoding Rules (PER) to reduce packet length, which results in very efficient, small packets. Information sent in PER-encoded ASN.1 therefore requires less bandwidth than text, but many people find it easier simply to read text than to traverse a tree structure, especially when there are many elements. Furthermore, PER encoding and decoding is not that simple and often takes more programming effort than a text parser.

H.323 defines the details of its protocol more precisely than SIP does, so vendors implementing H.323 can expect quicker interoperability with other vendors of the same standard. When the ITU releases a new version of the protocol, it is always fully compatible with the previous ones, so you don’t need to worry about whether your existing implementation will become deprecated. On the flip side, every implementation of a new version must support the scenarios defined in older versions, even when it is well established that those procedures have significant disadvantages. To date, seven versions of H.323 have been defined, and some H.323 procedures, such as the flows used to start a new call, changed along the way. So everyone must support the previous methods even when they are less efficient and more difficult to code. SIP, by contrast, has had only two versions, and both are labeled SIP/2.0, so they are not really considered different versions. In any case, where a definition changed, backward compatibility was maintained. Today, some issues considered bugs in the protocol are being fixed by short RFCs, and in some cases SIP RFCs deprecate earlier requirements.

In some cases, H.323 had features before SIP did, such as resource reservation, or the digit notification event that is generated when someone presses the telephone keys during a call. This forced SIP vendors to create their own solutions for the missing extensions, or to use an early draft of an RFC. H.323 does support extensions by allowing vendor-specific information in some of its fields, so it is possible to extend H.323, but it is much harder. H.323 formally introduced extension support in its fourth version, but not everyone was ready to move to that version quickly, whereas SIP had this capability from the beginning. Most of H.323’s extensions focused on supplementary call services, while SIP extensions offered much more than additions to calls. H.323 reuses many ITU protocols, many of which came from ISDN, but it does not show the same layer separation and modularity as SIP.

Both designs have merits as well as disadvantages. H.323 was released first, which contributed significantly to its adoption at the time. Today, however, new deployments usually go with SIP rather than H.323. So what made SIP more acceptable over time?

The keys to SIP’s success

SIP’s ability to add new services and extensions proved to be the primary factor in its success. When the focus turned to multimedia sessions with capabilities beyond mere calls, SIP’s adaptability proved crucial. Simply put, working with extensions is very natural in SIP. SIP is also relatively programmer-friendly: it does not take long to learn, and the time needed to create a basic application is relatively short. The IETF usually makes an effort to ensure its specifications are readable and that implementers can understand them. Open-source projects can therefore create SIP-based applications much faster, and commercial vendors find the ability to release a product with fewer resources very appealing, which means more products for customers to choose from.

Finally, as time went on, interoperability became less of an issue because vendors were able to test their SIP implementations against one another. By the time most network operators had to choose a protocol, SIP interoperability levels were already very high. SIP competes not just with H.323 but also with several proprietary protocols. Network operators deploying VoIP value being able to choose among different products, and interoperable solutions also let them deploy equipment from multiple vendors on the same network without depending on any specific implementation. This last factor gave a strong push in favor of an open protocol rather than one owned by a single company.

One major example of SIP’s success is the IP Multimedia Subsystem (IMS). IMS brings Internet services to cellular networks and merges the fixed and mobile worlds, and SIP is a crucial building block in its ability to do so. One of the goals of the IMS design was the ability to introduce multiple services, a requirement that fits SIP’s design very well.

As with any protocol, SIP is not perfect. Over time, it has required some alterations to cope with new real-world network scenarios. Thus far, it has proven flexible and adaptive enough to handle these changes without breaking the protocol. That’s a very good sign for the future of SIP.

Nonetheless, some feel that a newer protocol would be better. In fact, the ITU has started working on H.325. (One might expect it to be named H.324, but H.324 already exists: it defines the transmission of voice, video, and data over analog telephone lines and was later adapted as H.324M for video over cellular networks.) H.325 is also often referred to as Advanced Multimedia System (AMS), and it aims to offer a new protocol designed to address the latest requirements from day one. However, H.325 is in the very early stages of development and has yet to prove itself among a variety of competing options. For now, it’s safe to say that SIP will continue to dominate into the foreseeable future. In fact, wireless operators are starting to shift to a next-generation, all-IP network known as Long Term Evolution, or LTE, and SIP already plays an important role in this upcoming network architecture.

Source: Ars Technica
