Options for Securing RTP
SessionsEricssonFarogatan 6SE-164 80 KistaSweden+46 10 714 82 87magnus.westerlund@ericsson.comUniversity of GlasgowSchool of Computing ScienceGlasgowG12 8QQUnited Kingdomcsp@csperkins.orghttp://csperkins.org/The Real-time Transport Protocol (RTP) is used in a large number of
different application domains and environments. This heterogeneity
implies that different security mechanisms are needed to provide
services such as confidentiality, integrity, and source authentication
of RTP and RTP Control Protocol (RTCP) packets suitable for the various
environments. The range of solutions makes it difficult for RTP-based
application developers to pick the most suitable mechanism. This
document provides an overview of a number of security solutions for RTP
and gives guidance for developers on how to choose the appropriate
security mechanism.The Real-time Transport Protocol (RTP)
is widely used in a large variety of multimedia applications, including
Voice over IP (VoIP), centralized multimedia conferencing, sensor data
transport, and Internet television (IPTV) services. These applications
can range from point-to-point phone calls, through centralized group
teleconferences, to large-scale television distribution services. The
types of media can vary significantly, as can the signaling methods used
to establish the RTP sessions.So far, this multidimensional heterogeneity has prevented development
of a single security solution that meets the needs of the different
applications. Instead, a significant number of different solutions have
been developed to meet different sets of security goals. This makes it
difficult for application developers to know what solutions exist and
whether their properties are appropriate. This memo gives an overview of
the available RTP solutions and provides guidance on their applicability
for different application domains. It also attempts to provide an
indication of actual and intended usage at the time of writing as
additional input to help with considerations such as interoperability,
availability of implementations, etc. The guidance provided is not
exhaustive, and this memo does not provide normative
recommendations.It is important that application developers consider the security
goals and requirements for their application. The IETF considers it
important that protocols implement secure modes of operation and makes
them available to users . Because of the
heterogeneity of RTP applications and use cases, however, a single
security solution cannot be mandated . Instead,
application developers need to select mechanisms that provide
appropriate security for their environment. It is strongly encouraged
that common mechanisms be used by related applications in common
environments. The IETF publishes guidelines for specific classes of
applications, so it is worth searching for such guidelines.The remainder of this document is structured as follows. provides additional background. outlines the available security mechanisms at the
time of this writing and lists their key security properties and
constraints. provides guidelines and
important aspects to consider when securing an RTP application. Finally,
in , we give some examples of application
domains where guidelines for security exist.RTP can be used in a wide variety of topologies due to its support
for point-to-point sessions, multicast groups, and other topologies
built around different types of RTP middleboxes. In the following, we
review the different topologies supported by RTP to understand their
implications for the security properties and trust relations that can
exist in RTP sessions.The most basic use case is two directly connected endpoints, shown
in , where A has established an RTP session
with B. In this case, the RTP security is primarily about ensuring
that any third party be unable to compromise the confidentiality and
integrity of the media communication. This requires confidentiality
protection of the RTP session, integrity protection of the RTP/RTCP
packets, and source authentication of all the packets to ensure no
man-in-the-middle (MITM) attack is taking place.The source authentication can also be tied to a user or an
endpoint's verifiable identity to ensure that the peer knows with whom
they are communicating. Here, the combination of the security protocol
protecting the RTP session (and, hence, the RTP and RTCP traffic) and
the key management protocol becomes important to determine what
security claims can be made.An RTP mixer is an RTP session-level middlebox around which one can
build a multiparty RTP-based conference. The RTP mixer might actually
perform media mixing, like mixing audio or compositing video images
into a new media stream being sent from the mixer to a given
participant, or it might provide a conceptual stream; for example, the
video of the current active speaker. From a security point of view,
the important features of an RTP mixer are that it generates a new
media stream, has its own source identifier, and does not simply
forward the original media.An RTP session using a mixer might have a topology like that in
. In this example, participants A through D
each send unicast RTP traffic to the RTP mixer, and receive an RTP
stream from the mixer, comprising a mixture of the streams from the
other participants.A consequence of an RTP mixer having its own source identifier and
acting as an active participant towards the other endpoints is that
the RTP mixer needs to be a trusted device that has access to the
security context(s) established. The RTP mixer can also become a
security-enforcing entity. For example, a common approach to secure
the topology in is to establish a security
context between the mixer and each participant independently and have
the mixer source authenticate each peer. The mixer then ensures that
one participant cannot impersonate another.RTP translators are middleboxes that provide various levels of
in&nbhy;network media translation and transcoding. Their security
properties vary widely, depending on which type of operations they
attempt to perform. We identify and discuss three different categories
of RTP translators: transport translators, gateways, and media
transcoders.A transport translator operates on a
level below RTP and RTCP. It relays the RTP/RTCP traffic from one
endpoint to one or more other addresses. This can be done based only
on IP addresses and transport protocol ports, and each receive port
on the translator can have a very basic list of where to forward
traffic. Transport translators also need to implement ingress
filtering to prevent random traffic from being forwarded that isn't
coming from a participant in the conference. shows an example transport translator,
where traffic from any one of the four participants will be
forwarded to the other three participants unchanged. The resulting
topology is very similar to an Any Source Multicast (ASM) session
(as discussed in ) but is implemented at the
application layer.A transport translator can often operate without needing access
to the security context, as long as the security mechanism does not
provide protection over the transport-layer information. A transport
translator does, however, make the group communication visible and,
thus, can complicate keying and source authentication mechanisms.
This is further discussed in .Gateways are deployed when the endpoints are not fully
compatible. shows an example topology.
The functions a gateway provides can be diverse and range from
transport-layer relaying between two domains not allowing direct
communication, via transport or media protocol function initiation
or termination, to protocol- or media-encoding translation. The
supported security protocol might even be one of the reasons a
gateway is needed.The choice of security protocol, and the details of the gateway
function, will determine if the gateway needs to be trusted with
access to the application security context. Many gateways need to be
trusted by all peers to perform the translation; in other cases,
some or all peers might not be aware of the presence of the gateway.
The security protocols have different properties depending on the
degree of trust and visibility needed. Ensuring communication is
possible without trusting the gateway can be a strong incentive for
accepting different security properties. Some security solutions
will be able to detect the gateways as manipulating the media
stream, unless the gateway is a trusted device.A media transcoder is a special type of gateway device that
changes the encoding of the media being transported by RTP. The
discussion in applies. A media
transcoder alters the media data and, thus, needs to be trusted with
access to the security context.Any Source Multicast is the original
multicast model where any multicast group participant can send to the
multicast group and get their packets delivered to all group members
(see ). This form of communication has
interesting security properties due to the many-to-many nature of the
group. Source authentication is important, but all participants with
access to the group security context will have the necessary secrets
to decrypt and verify the integrity of the traffic. Thus, use of any
group security context fails if the goal is to separate individual
sources; alternate solutions are needed.In addition, the potential large size of multicast groups creates
some considerations for the scalability of the solution and how the
key management is handled.Source-Specific Multicast (SSM)
allows only a specific endpoint to send traffic to the multicast
group, irrespective of the number of RTP media sources. The endpoint
is known as the media distribution source. For the RTP session to
function correctly with RTCP over an SSM session, extensions have been
defined in .
shows a sample SSM-based RTP session where several media sources,
MS1...MSm, all send media to a distribution source, which then
forwards the media data to the SSM group for delivery to the
receivers, R1...Rn, and the feedback targets, FT1...FTn. RTCP
reception quality feedback is sent unicast from each receiver to one
of the feedback targets. The feedback targets aggregate reception
quality feedback and forward it upstream towards the distribution
source. The distribution source forwards (possibly aggregated and
summarized) reception feedback to the SSM group and back to the
original media sources. The feedback targets are also members of the
SSM group and receive the media data, so they can send unicast repair
data to the receivers in response to feedback if appropriate.The use of SSM makes it more difficult to inject traffic into the
multicast group, but not impossible. Source authentication
requirements apply for SSM sessions, too; an individual verification
of who sent the RTP and RTCP packets is needed. An RTP session using
SSM will have a group security context that includes the media
sources, distribution source, feedback targets, and the receivers.
Each has a different role and will be trusted to perform different
actions. For example, the distribution source will need to
authenticate the media sources to prevent unwanted traffic from being
distributed via the SSM group. Similarly, the receivers need to
authenticate both the distribution source and their feedback target to
prevent injection attacks from malicious devices claiming to be
feedback targets. An understanding of the trust relationships and
group security context is needed between all components of the
system.This section provides an overview of security requirements and the
current RTP security mechanisms that implement those requirements. This
cannot be a complete survey, since new security mechanisms are defined
regularly. The goal is to help applications designers by reviewing the
types of solutions that are available. This section will use a number of
different security-related terms, as described in the Internet Security
Glossary, Version 2 .The Secure Real-time Transport Protocol
(SRTP) is one of the most commonly used mechanisms to provide
confidentiality, integrity protection, source authentication, and
replay protection for RTP. SRTP was developed with RTP header
compression and third-party monitors in mind. Thus, the RTP header is
not encrypted in RTP data packets, and the first 8 bytes of the first
RTCP packet header in each compound RTCP packet are not encrypted. The
entirety of RTP packets and compound RTCP packets are integrity
protected. This allows RTP header compression to work and lets
third-party monitors determine what RTP traffic flows exist based on
the synchronization source (SSRC) fields, but it protects the
sensitive content.SRTP works with transforms where different combinations of
encryption algorithm, authentication algorithm, and pseudorandom
function can be used, and the authentication tag length can be set to
any value. SRTP can also be easily extended with additional
cryptographic transforms. This gives flexibility but requires more
security knowledge by the application developer. To simplify things,
Session Description Protocol (SDP) security descriptions (see ) and Datagram Transport Layer Security
Extension for SRTP (DTLS-SRTP) (see ) use
predefined combinations of transforms, known as SRTP crypto suites and
SRTP protection profiles, that bundle together transforms and other
parameters, making them easier to use but reducing flexibility. The
Multimedia Internet Keying (MIKEY) protocol (see ) provides flexibility to negotiate the full selection
of transforms. At the time of this writing, the following transforms,
SRTP crypto suites, and SRTP protection profiles are defined or under
definition:AES Counter Mode encryption
with 128-bit keys combined with 160-bit keyed HMAC-SHA-1 with an
80-bit authentication tag. This is the default cryptographic
transform that needs to be supported. The transforms are defined
in SRTP, with the corresponding SRTP
crypto suite defined in and SRTP
protection profile defined in .AES f8-mode encryption using
128-bit keys combined with keyed HMAC-SHA-1 using 80-bit
authentication. The transforms are defined in , with the corresponding SRTP crypto suite
defined in . The corresponding SRTP
protection profile is not defined.A Korean national standard cryptographic
transform that is defined to be used with SRTP in . Three options are defined: one using SHA-1
authentication, one using Counter Mode with Cipher Block Chaining
Message Authentication Code (CBC-MAC), and one using Galois
Counter Mode.A Korean block
cipher that supports 128-, 192-, and 256-bit keys. It also
defines three options: Counter Mode where combined with HMAC-SHA-1
with 80- or 32-bit authentication tags, Counter Mode with CBC-MAC,
and Galois Counter Mode. It also defines a different key
derivation function than the AES-based systems.Cryptographic transforms
for SRTP based on AES-192 and AES-256 Counter Mode encryption and
160-bit keyed HMAC-SHA-1 with 80- and 32-bit authentication tags.
These provide 192- and 256-bit encryption keys, but otherwise
match the default 128-bit AES-CM transform. The transforms are
defined in and ,
and the SRTP crypto suites are defined in .AES Galois Counter Mode and AES
Counter Mode with CBC-MAC for AES-128 and AES-256. This
authentication is included in the cipher text, which becomes
expanded with the length of the authentication tag instead of
using the SRTP authentication tag. This is defined in .SRTP also provides a
NULL cipher that can be used when no confidentiality for RTP/RTCP
is requested. The corresponding SRTP protection profile is defined
in .The source authentication guarantees provided by SRTP depend on the
cryptographic transform and key management used. Some transforms give
strong source authentication even in multiparty sessions; others give
weaker guarantees and can authenticate group membership but not
sources. Timed Efficient Stream Loss-Tolerant Authentication (TESLA)
offers a complement to the regular symmetric
keyed authentication transforms, like HMAC-SHA-1, and can provide
per&nbhy;source authentication in some group communication scenarios.
The downside is the need for buffering the packets for a while before
authenticity can be verified. defines a variant of the authentication
tag that enables a receiver to obtain the Roll over Counter for the
RTP sequence number that is part of the Initialization Vector (IV) for
many cryptographic transforms. This enables quicker and easier options
for joining a long-lived RTP group; for example, a broadcast
session.RTP header extensions are normally carried in the clear and are
only integrity protected in SRTP. This can be problematic in some
cases, so defines an extension to also
encrypt selected header extensions.SRTP is specified and deployed in a number of RTP usage contexts;
significant support is provided in SIP-established VoIP clients,
including IP Multimedia Subsystems (IMS), and in the Real Time Streaming Protocol (RTSP) and
RTP-based media streaming. Thus, SRTP in general is widely deployed.
When it comes to cryptographic transforms, the default (AES-CM and
HMAC-SHA-1) is the most commonly used, but it might be expected that
AES-GCM, AES-192-CM, and AES-256-CM will gain usage in future,
especially due to the AES- and GCM-specific instructions in new
CPUs.SRTP does not contain an integrated key management solution;
instead, it relies on an external key management protocol. There are
several protocols that can be used. The following sections outline
some popular schemes.A Datagram Transport Layer Security (DTLS) extension exists for
establishing SRTP keys . This extension provides secure key exchange
between two peers, enabling Perfect Forward Secrecy (PFS) and
binding strong identity verification to an endpoint. PFS is a
property of the key agreement protocol that ensures that a session
key derived from a set of long-term keys will not be compromised if
one of the long-term keys is compromised in the future. The default
key generation will generate a key that contains material
contributed by both peers. The key exchange happens in the media
plane directly between the peers. The common key exchange procedures
will take two round trips assuming no losses. Transport Layer
Security (TLS) resumption can be used when establishing additional
media streams with the same peer, and it reduces the setup time to
one RTT for these streams (see for a
discussion of TLS resumption in this context).The actual security properties of an established SRTP session
using DTLS will depend on the cipher suites offered and used, as
well as the mechanism for identifying the endpoints of the
handshake. For example, some cipher suites provide PFS, while others
do not. When using DTLS, the application designer needs to select
which cipher suites DTLS-SRTP can offer and accept so that the
desired security properties are achieved. The next choice is how to
verify the identity of the peer endpoint. One choice can be to rely
on the certificates and use a PKI to verify them to make an identity
assertion. However, this is not the most common way; instead,
self-signed certificates are common to use to establish trust
through signaling or other third-party solutions.DTLS-SRTP key management can use the signaling protocol in four
ways: First, to agree on using DTLS-SRTP for media security. Second,
to determine the network location (address and port) where each side
is running a DTLS listener to let the parts perform the key
management handshakes that generate the keys used by SRTP. Third, to
exchange hashes of each side's certificates to bind these to the
signaling and ensure there is no MITM attack. This assumes that one
can trust the signaling solution to be resistant to modification and
not be in collaboration with an attacker. Finally, to provide an
asserted identity, e.g., , that can be used
to prevent modification of the signaling and the exchange of
certificate hashes. That way, it enables binding between the key
exchange and the signaling.This usage is well defined for SIP/SDP in and, in most cases, can be adopted for use with
other bidirectional signaling solutions. It is to be noted that
there is work underway to revisit the SIP
Identity mechanism in the IETF STIR working group.The main question regarding DTLS-SRTP's security properties is
how one verifies any peer identity or at least prevents MITM
attacks. This does require trust in some DTLS-SRTP external parties:
either a PKI, a signaling system, or some identity provider.DTLS-SRTP usage is clearly on the rise. It is mandatory to
support in Web Real-Time Communication (WebRTC). It has growing
support among SIP endpoints. DTLS-SRTP was developed in IETF
primarily to meet security requirements for RTP-based media
established using SIP. The requirements considered can be reviewed
in "Requirements and Analysis of Media
Security Management Protocols" .Multimedia Internet Keying (MIKEY)
is a keying protocol that has several modes with different
properties. MIKEY can be used in point-to-point applications using
SIP and RTSP (e.g., VoIP calls) but is also suitable for use in
broadcast and multicast applications and centralized group
communications.MIKEY can establish multiple security contexts or cryptographic
sessions with a single message. It is usable in scenarios where one
entity generates the key and needs to distribute the key to a number
of participants. The different modes and the resulting properties
are highly dependent on the cryptographic method used to establish
the session keys actually used by the security protocol, like
SRTP.MIKEY has the following modes of operation:Uses a pre-shared secret for
symmetric key crypto used to secure a keying message carrying
the already-generated session key. This system is the most
efficient from the perspective of having small messages and
processing demands. The downside is scalability, where usually
the effort for the provisioning of pre-shared keys is only
manageable if the number of endpoints is small.Uses a public key crypto to
secure a keying message carrying the already-generated session
key. This is more resource intensive but enables scalable
systems. It does require a public key infrastructure to enable
verification.Uses Diffie-Hellman key agreement
to generate the session key, thus providing perfect forward
secrecy. The downside is high resource consumption in bandwidth
and processing during the MIKEY exchange. This method can't be
used to establish group keys as each pair of peers performing
the MIKEY exchange will establish different keys. defines a variant of the Diffie-Hellman
exchange that uses a pre-shared key in a keyed Hashed Message
Authentication Code (HMAC) to verify authenticity of the keying
material instead of a digital signature as in the previous
method. This method is still restricted to
point&nbhy;to&nbhy;point usage.MIKEY-RSA in Reverse
mode is a variant of the public key method, which doesn't
rely on the initiator of the key exchange knowing the
responder's certificate. This method lets both the initiator and
the responder specify the session keying material depending on
the use case. Usage of this mode requires one round-trip
time.Ticket Payload (TICKET) is a MIKEY extension using a trusted
centralized key management service (KMS). The initiator and
responder do not share any credentials; instead, they trust a
third party, the KMS, with which they both have or can establish
shared credentials.Identity-Based Authenticated Key Exchange
(IBAKE) uses a KMS infrastructure but
with lower demand on the KMS. It claims to provide both perfect
forward and backwards secrecy. provides
Sakai-Kasahara Key Encryption (SAKKE) in MIKEY. It is based on
Identity-based Public Key Cryptography and a KMS infrastructure
to establish a shared secret value and certificateless
signatures to provide source authentication. Its features
include simplex transmission, scalability, low-latency call
setup, and support for secure deferred delivery.MIKEY messages have several different transports. defines how MIKEY messages can be embedded in
general SDP for usage with the signaling protocols SIP, Session
Announcement Protocol (SAP), and RTSP. There also exists a usage of
MIKEY defined by the Third Generation Partnership Project (3GPP)
that sends MIKEY messages directly over
UDP to key the receivers of Multimedia Broadcast and Multicast Service
(MBMS). defines the
application/mikey media type, allowing MIKEY to be used in, e.g.,
email and HTTP.Based on the many choices, it is important to consider the
properties needed in one's solution and based on that evaluate which
modes are candidates for use. More information on the applicability
of the different MIKEY modes can be found in .MIKEY with pre-shared keys is used by 3GPP MBMS, and IMS media security
specifies the use of the TICKET mode
transported over SIP and HTTP. RTSP 2.0
specifies use of the RSA-R mode. There are some SIP endpoints that
support MIKEY. The modes they use are unknown to the authors. provides a keying solution based on
sending plaintext keys in SDP. It is
primarily used with SIP and the SDP Offer/Answer model and is well
defined in point-to-point sessions where each side declares its own
unique key. Using security descriptions to establish group keys is
less well defined and can have security issues since it's difficult
to guarantee unique SSRCs (as needed to avoid a "two-time pad"
attack -- see Section 9 of ).Since keys are transported in plaintext in SDP, they can easily
be intercepted unless the SDP carrying protocol provides strong
end&nbhy;to&nbhy;end confidentiality and authentication guarantees.
This is not normally the case; instead, hop-by-hop security is
provided between signaling nodes using TLS. This leaves the keying
material sensitive to capture by the traversed signaling nodes.
Thus, in most cases, the security properties of security
descriptions are weak. The usage of security descriptions usually
requires additional security measures; for example, the signaling
nodes are trusted and protected by strict access control. Usage of
security descriptions requires careful design in order to ensure
that the security goals can be met.Security descriptions are the most commonly deployed keying
solution for SIP-based endpoints, where almost all endpoints that
support SRTP also support security descriptions. It is also used for
access protection in IMS Media Security .Encrypted Key Transport (EKT) is an
SRTP extension that enables group keying despite using a keying
mechanism like DTLS-SRTP that doesn't support group keys. It is
designed for centralized conferencing, but it can also be used in
sessions where endpoints connect to a conference bridge or a gateway
and need to be provisioned with the keys each participant on the
bridge or gateway uses to avoid decryption and encryption cycles.
This can enable interworking between DTLS-SRTP and other keying
systems where either party can set the key (e.g., interworking with
security descriptions).The mechanism is based on establishing an additional EKT key,
which everyone uses to protect their actual session key. The actual
session key is sent in an expanded authentication tag to the other
session participants. This key is only sent occasionally or
periodically depending on use cases and depending on what
requirements exist for timely delivery or notification.The only known deployment of EKT so far is in some Cisco video
conferencing products.The ZRTP key management system for
SRTP was proposed as an alternative to DTLS-SRTP. ZRTP provides best
effort encryption independent of the signaling protocol and utilizes
key continuity, Short Authentication Strings, or a PKI for
authentication. ZRTP wasn't adopted as an IETF Standards Track
protocol, but was instead published as an Informational RFC in the
IETF stream. Commercial implementations exist.Additional proprietary solutions are also known to exist.Section 9 of the RTP standard defines a
Data Encryption Standard (DES) or 3DES-based encryption of RTP and
RTCP packets. This mechanism is keyed using plaintext keys in SDP using the "k=" SDP field. This method can
provide confidentiality but, as discussed in Section 9 of , it has extremely weak security properties and is
not to be used.IPsec can be used in either tunnel or
transport mode to protect RTP and RTCP packets in transit from one
network interface to another. This can be sufficient when the network
interfaces have a direct relation or in a secured environment where it
can be controlled who can read the packets from those interfaces.The main concern with using IPsec to protect RTP traffic is that in
most cases, using a VPN approach that terminates the security
association at some node prior to the RTP endpoint leaves the traffic
vulnerable to attack between the VPN termination node and the
endpoint. Thus, usage of IPsec requires careful thought and design of
its usage so that it meets the security goals. An important question
is how one ensures the IPsec terminating peer and the ultimate
destination are the same. Applications can have issues using existing
APIs when determining if IPsec is being used or not and when
determining who the authenticated peer entity is when IPsec is
used.IPsec with RTP is more commonly used as a security solution between
infrastructure nodes that exchange many RTP sessions and media
streams. The establishment of a secure tunnel between such nodes
minimizes the key management overhead.Just as RTP can be sent over TCP, it
can also be sent over TLS over TCP,
using TLS to provide point-to-point security services. The security
properties TLS provides are confidentiality, integrity protection, and
possible source authentication if the client or server certificates
are verified and provide a usable identity. When used in multiparty
scenarios using a central node for media distribution, the security
provided is only between the central node and the peers, so the
security properties for the whole session are dependent on what trust
one can place in the central node.RTSP 1.0 and 2.0 specify the usage of RTP over the same
TLS/TCP connection that the RTSP messages are sent over. It appears
that RTP over TLS/TCP is also used in some proprietary solutions that
use TLS to bypass firewalls.DTLS is based on TLS but designed to work over an unreliable
datagram-oriented transport rather than requiring reliable byte stream
semantics from the transport protocol. Accordingly, DTLS can provide
point-to-point security for RTP flows analogous to that provided by
TLS but over a datagram transport such as UDP. The two peers establish
a DTLS association between each other, including the possibility to do
certificate-based source authentication when establishing the
association. All RTP and RTCP packets flowing will be protected by
this DTLS association.Note that using DTLS for RTP flows is different from using
DTLS-SRTP key management. DTLS-SRTP uses the same key management steps
as DTLS, but uses SRTP for the per-packet security operations. Using
DTLS for RTP flows uses the normal datagram TLS data protection,
wrapping complete RTP packets. When using DTLS for RTP flows, the RTP
and RTCP packets are completely encrypted with no headers in the
clear; when using DTLS-SRTP, the RTP headers are in the clear and only
the payload data is encrypted.DTLS can use similar techniques to those available for DTLS-SRTP to
bind a signaling-side agreement to communicate to the certificates
used by the endpoint when doing the DTLS handshake. This enables use
without having a certificate-based trust chain to a trusted
certificate root.There does not appear to be significant usage of DTLS for RTP.Mechanisms have been defined that encrypt only the media content
operating within the RTP payload data and leaving the RTP headers and
RTCP unaffected. There are several reasons why this might be
appropriate, but a common rationale is to ensure that the content
stored by RTSP streaming servers has the media content in a protected
format that cannot be read by the streaming server (this is mostly
done in the context of Digital Rights Management). These approaches
then use a key management solution between the rights provider and the
consuming client to deliver the key used to protect the content and do
not give the media server access to the security context. Such methods
have several security weaknesses such as the fact that the same key is
handed out to a potentially large group of receiving clients,
increasing the risk of a leak.Use of this type of solution can be of interest in environments
that allow middleboxes to rewrite the RTP headers and select which
streams are delivered to an endpoint (e.g., some types of centralized
video conference systems). The advantage of encrypting and possibly
integrity protecting the payload but not the headers is that the
middlebox can't eavesdrop on the media content, but it can still
provide stream switching functionality. The downside of such a system
is that it likely needs two levels of security: the payload-level
solution, to provide confidentiality and source authentication, and a
second layer with additional transport security ensuring source
authentication and integrity of the RTP headers associated with the
encrypted payloads. This can also result in the need to have two
different key management systems as the entity protecting the packets
and payloads are different with a different set of keys.The aspect of two tiers of security are present in ISMACryp (see
) and the deprecated 3GPP Packet-switched Streaming Service
solution; see Annex K of .The Internet Streaming Media Alliance (ISMA) has defined ISMA Encryption and Authentication 2.0.
This specification defines how one encrypts and packetizes the
encrypted application data units (ADUs) in an RTP payload using the
MPEG-4 generic payload format. The ADU
types that are allowed are those that can be stored as elementary
streams in an ISO Media File format-based file. ISMACryp uses SRTP
for packet-level integrity and source authentication from a
streaming server to the receiver.Key management for an ISMACryp-based system can be achieved
through Open Mobile Alliance (OMA) Digital
Rights Management 2.0, for example.In the following, we provide guidelines for how to choose appropriate
security mechanisms for RTP applications.This section discusses a number of application requirements that
need to be considered. An application designer choosing security
solutions requires a good understanding of what level of security is
needed and what behavior they strive to achieve.When it comes to confidentiality of an RTP session, there are
several aspects to consider:When using encryption
to provide media confidentiality, it is necessary to have some
rough understanding of the security goal and how long one can
expect the protected content to remain confidential. National or
other regulations might provide additional requirements on a
particular usage of an RTP. From that, one can determine which
encryption algorithms are to be used from the set of available
transforms.RTP-based security in
most of its forms simply wraps RTP and RTCP packets into
cryptographic containers. This commonly means that the size of
the original RTP payload is visible to observers of the
protected packet flow. This can provide information to those
observers. A well-documented case is the risk with variable
bitrate speech codecs that produce different sized packets based
on the speech input . Potential threats
such as these need to be considered and, if they are
significant, then restrictions will be needed on mode choices in
the codec, or additional padding will need to be added to make
all packets equal size and remove the informational leakage.Another case is RTP header extensions. If SRTP is
used, header extensions are normally not protected by the
security mechanism protecting the RTP payload. If the header
extension carries information that is considered sensitive, then
the application needs to be modified to ensure that mechanisms
used to protect against such information leakage are
employed.When considering the
confidentiality properties of a system, it is important to
consider where the media handled in the clear. For example, if
the system is based on an RTP mixer that needs the keys to
decrypt the media, process it, and repacketize it, then is the
mixer providing the security guarantees expected by the other
parts of the system? Furthermore, it is important to consider
who has access to the keys. The policies for the handling of the
keys, and who can access the keys, need to be considered along
with the confidentiality goals.As can be seen, the actual confidentiality level has likely more
to do with the application's usage of centralized nodes, and the
details of the key management solution chosen, than with the actual
choice of encryption algorithm (although, of course, the encryption
algorithm needs to be chosen appropriately for the desired security
level).Protection against modification of content by a third party, or
due to errors in the network, is another factor to consider. The
first aspect that one assesses is what resilience one has against
modifications to the content. Some media types are extremely
sensitive to network bit errors, whereas others might be able to
tolerate some degree of data corruption. Equally important is to
consider the sensitivity of the content, who is providing the
integrity assertion, what is the source of the integrity tag, and
what are the risks of modifications happening prior to that point
where protection is applied. These issues affect what cryptographic
algorithm is used, the length of the integrity tags, and whether the
entire payload is protected.RTP applications that rely on central nodes need to consider if
hop&nbhy;by&nbhy;hop integrity is acceptable or if true end-to-end
integrity protection is needed. Is it important to be able to tell
if a middlebox has modified the data? There are some uses of RTP
that require trusted middleboxes that can modify the data in a way
that doesn't break integrity protection as seen by the receiver, for
example, local advertisement insertion in IPTV systems. There are
also uses where it is essential that such in-network modification be
detectable. RTP can support both with appropriate choices of
security mechanisms.Integrity of the data is commonly closely tied to the question of
source authentication. That is, it becomes important to know who
makes an integrity assertion for the data.Source authentication is about determining who sent a particular
RTP or RTCP packet. It is normally closely tied with integrity,
since a receiver generally also wants to ensure that the data
received is what the source really sent, so source authentication
without integrity is not particularly useful. Similarly, integrity
protection without source authentication is also not particularly
useful; a claim that a packet is unchanged that cannot itself be
validated as from the source (or some from other known and trusted
party) is meaningless.Source authentication can be asserted in several different ways:
Using cryptographic mechanisms that
give authentication with some type of key management provide an
implicit method for source authentication. Assuming that the
mechanism has sufficient strength not to be circumvented in the
time frame when you would accept the packet as valid, it is
possible to assert a source-authenticated statement; this
message is likely from a source that has the cryptographic
key(s) to this communication.What that assertion actually means is highly
dependent on the application and how it handles the keys. If
only the two peers have access to the keys, this can form a
basis for a strong trust relationship that traffic is
authenticated coming from one of the peers. However, in a
multiparty scenario where security contexts are shared among
participants, most base-level authentication solutions can't
even assert that this packet is from the same source as the
previous packet.A step up in
the assertion that can be done in base-level systems is to tie
the signaling to the key exchange. Here, the goal is to at least
be able to assert that the source of the packets is the same
entity with which the receiver established the session. How
feasible this is depends on the properties of the key management
system, the ability to tie the signaling to a particular source,
and the degree of trust the receiver places on the different
nodes involved.For example, systems where the key exchange is
done using the signaling systems, such as security descriptions enable a direct
binding between signaling and key exchange. In such systems, the
actual security depends on the trust one can place in the
signaling system to correctly associate the peer's identifier
with the key exchange.If the applications have access
to a system that can provide verifiable identifiers, then the
source authentication can be bound to that identifier. For
example, in a point-to-point communication, even symmetric key
crypto, where the key management can assert that the key has
only been exchanged with a particular identifier, can provide a
strong assertion about the source of the traffic. SIP Identity
provides one example of how this can be
done and could be used to bind DTLS-SRTP certificates used by an
endpoint to the identity provider's public key to authenticate
the source of a DTLS-SRTP flow.Note that all levels of the system need to have
matching capability to assert identifiers. If the signaling can
assert that only a given entity in a multiparty session has a
key, then the media layer might be able to provide guarantees
about the identifier used by the media sender. However, using a
signaling authentication mechanism built on a group key can
limit the media layer to asserting only group membership.There exist many different types of systems providing identifiers
with different properties (e.g., SIP Identity ). In the context of RTP applications, the most
important property is the possibility to perform source
authentication and verify such assertions in relation to any claimed
identifiers. What an identifier really represents can also vary but,
in the context of communication, one of the most obvious is the
identifiers representing the identity of the human user with which
one communicates. However, the human user can also have additional
identifiers in a particular role. For example, the human (Alice) can
also be a police officer, and in some cases, an identifier for her
role as police officer will be more relevant than one that asserts
that she is Alice. This is common in contact with organizations,
where it is important to prove the person's right to represent the
organization. Some examples of identifier/identity mechanisms that
can be used: A certificate is used to assert
the identifiers used to claim an identity; by having access to
the private part of the certificate, one can perform signing to
assert one's identity. Any entity interested in verifying the
assertion then needs the public part of the certificate. By
having the certificate, one can verify the signature against the
certificate. The next step is to determine if one trusts the
certificate's trust chain. Commonly, by provisioning the
verifier with the public part of a root certificate, this
enables the verifier to verify a trust chain from the root
certificate down to the identifier in the certificate. However,
the trust is based on all steps in the certificate chain being
verifiable and trusted. Thus, the provisioning of root
certificates and the ability to revoke compromised certificates
are aspects that will require infrastructure.An online identity
provider (IdP) can authenticate a user's right to use an
identifier and then perform assertions on their behalf or
provision the requester with short-term credentials to assert
the identifiers. The verifier can then contact the IdP to
request verification of a particular identifier. Here, the trust
is highly dependent on how much one trusts the IdP. The system
also becomes dependent on having access to the relevant IdP.In all of the above examples, an important part of the security
properties is related to the method for authenticating the access to
the identity.RTP applications need to consider what privacy goals they have.
As RTP applications communicate directly between peers in many
cases, the IP addresses of any communication peer will be available.
The main privacy concern with IP addresses is related to
geographical location and the possibility to track a user of an
endpoint. The main way to avoid such concerns is the introduction of
relay (e.g., a Traversal Using Relay NAT (TURN) server ) or centralized media mixers or forwarders that
hide the address of a peer from any other peer. The security and
trust placed in these relays obviously needs to be carefully
considered.RTP itself can contribute to enabling a particular user to be
tracked between communication sessions if the Canonical Name (CNAME)
is generated according to the RTP specification in the form of
user@host. Such RTCP CNAMEs are likely long-term stable over
multiple sessions, allowing tracking of users. This can be desirable
for long-term fault tracking and diagnosis, but it clearly has
privacy implications. Instead, cryptographically random ones could
be used as defined by "Guidelines for
Choosing RTP Control Protocol (RTCP) CNAMEs".If privacy goals exist, they need to be considered and the system
designed with them in mind. In addition, certain RTP features might
have to be configured to safeguard privacy or have requirements on
how the implementation is done.When it comes to RTP security, the most appropriate solution is
often highly dependent on the topology of the communication session.
The signaling also impacts what information can be provided and if
this can be instance specific or common for a group. In the end, the
key management system will highly affect the security properties
achieved by the application. At the same time, the communication
structure of the application limits what key management methods are
applicable. As different key management methods have different
requirements on underlying infrastructure, it is important to take
that aspect into consideration early in the design.The guidelines for Cryptographic Key
Management provide an overview of why automatic key management
is important. They also provide a strong recommendation on using
automatic key management. Most of the security solutions reviewed in
this document provide or support automatic key management, at least to
establish session keys. In some more long-term use cases, credentials
might need to be manually deployed in certain cases.For SRTP, an important aspect of automatic key management is to
ensure that two-time pads do not occur, in particular by preventing
multiple endpoints using the same session key and SSRC. In these
cases, automatic key management methods can have strong dependencies
on signaling features to function correctly. If those dependencies
can't be fulfilled, additional constrains on usage, e.g., per-endpoint
session keys, might be needed to avoid the issue.When selecting security mechanisms for an RTP application, it is
important to consider the properties of the key management. Using key
management that is both automatic and integrated will provide minimal
interruption for the user and is important to ensure that security
can, and will remain, to be on by default.If the security mechanism only provides a secured tunnel, for
example, like some common uses of IPsec, it is important to consider the full
end-to-end properties of the system. How does one ensure that the path
from the endpoint to the local tunnel ingress/egress is secure and can
be trusted (and similarly for the other end of the tunnel)? How does
one handle the source authentication of the peer, as the security
protocol identifies the other end of the tunnel? These are some of the
issues that arise when one considers a tunnel-based security protocol
rather than an end-to-end one. Even with clear requirements and
knowledge that one still can achieve the security properties using a
tunnel-based solution, one ought to prefer to use end-to-end
mechanisms, as they are much less likely to violate any assumptions
made about deployment. These assumptions can also be difficult to
automatically verify.Key management solutions that use plaintext keys, like SDP security descriptions, require care
to ensure a secure transport of the signaling messages that contain
the plaintext keys. For plaintext keys, the security properties of the
system depend on how securely the plaintext keys are protected
end-to-end between the sender and receiver(s). Not only does one need
to consider what transport protection is provided for the signaling
message, including the keys, but also the degree to which any
intermediaries in the signaling are trusted. Untrusted intermediaries
can perform MITM attacks on the communication or can log the keys,
resulting in the encryption being compromised significantly after the
actual communication occurred.Few RTP applications exist as independent applications that never
interoperate with anything else. Rather, they enable communication
with a potentially large number of other systems. To minimize the
number of security mechanisms that need to be implemented, it is
important to consider if one can use the same security mechanisms as
other applications. This can also reduce problems with determining
what security level is actually negotiated in a particular
session.The desire to be interoperable can, in some cases, be in conflict
with the security requirements of an application. To meet the security
goals, it might be necessary to sacrifice interoperability.
Alternatively, one can implement multiple security mechanisms; this,
however, introduces the complication of ensuring that the user
understands what it means to use a particular security system. In
addition, the application can then become vulnerable to bid-down
attacks.In the following, we describe a number of example security solutions
for applications using RTP services or frameworks. These examples are
provided to illustrate the choices available. They are not normative
recommendations for security.In 2009, the IETF evaluated media security for RTP sessions
established using point-to-point SIP sessions. A number of
requirements were determined, and based on those, the existing
solutions for media security and especially the keying methods were
analyzed. The resulting requirements and analysis were published in
. Based on this analysis and working group
discussion, DTLS-SRTP was determined to be the best solution.The security solution for SIP using DTLS-SRTP is defined in "Framework for Establishing a Secure Real-time
Transport Protocol (SRTP) Security Context Using Datagram Transport
Layer Security (DTLS)". On a high level, the framework uses SIP
with SDP offer/answer procedures to exchange the network addresses
where the server endpoint will have a DTLS-SRTP-enabled server
running. The SIP signaling is also used to exchange the fingerprints
of the certificate each endpoint will use in the DTLS establishment
process. When the signaling is sufficiently completed, the DTLS-SRTP
client performs DTLS handshakes and establishes SRTP session keys. The
clients also verify the fingerprints of the certificates to verify
that no man in the middle has inserted themselves into the
exchange.DTLS has a number of good security properties. For example, to
enable a MITM, someone in the signaling path needs to perform an
active action and modify both the signaling message and the DTLS
handshake. Solutions also exist that enable the fingerprints to be
bound to identities. SIP Identity provides an identity established by
the first proxy for each user . This reduces
the number of nodes the connecting User Agent has to trust to include
just the first-hop proxy rather than the full signaling path. The
biggest security weakness of this system is its dependency on the
signaling. SIP signaling passes multiple nodes and there is usually no
message security deployed, only hop-by-hop transport security, if any,
between the nodes.Web Real-Time Communication (WebRTC) is a
solution providing JavaScript web applications with real-time media
directly between browsers. Media is transported using RTP and
protected using a mandatory application of SRTP , with keying done using DTLS-SRTP . The security configuration is further defined in
"WebRTC Security Architecture" .A hash of the peer's certificate is provided to the JavaScript web
application, allowing that web application to verify identity of the
peer. There are several ways in which the certificate hashes can be
verified. An approach identified in the WebRTC security architecture
is to use an identity provider. In this
solution, the identity provider, which is a third party to the web
application, signs the DTLS-SRTP hash combined with a statement on the
validity of the user identity that has been used to sign the hash. The
receiver of such an identity assertion can then independently verify
the user identity to ensure that it is the identity that the receiver
intended to communicate with, and that the cryptographic assertion
holds; this way, a user can be certain that the application also can't
perform a MITM and acquire the keys to the media communication. Other
ways of verifying the certificate hashes exist; for example, they
could be verified against a hash carried in some out-of-band channel
(e.g., compare with a hash printed on a business card) or using a
verbal short authentication string (e.g., as in ZRTP ) or using hash continuity.In the development of WebRTC, there has also been attention given
to privacy considerations. The main RTP-related concerns that have
been raised are:As Interactive Connectivity
Establishment (ICE) negotiation provides
IP addresses and ports for the browser, this leaks location
information in the signaling to the peer. To prevent this, one can
block the usage of any ICE candidate that isn't a relay candidate,
i.e., where the IP and port provided belong to the service
providers media traffic relay.Static RTP CNAMEs
and DTLS-SRTP certificates provide information that is reused
between session instances. Thus, to prevent tracking, such
information ought not be reused between sessions, or the
information ought not be sent in the clear. Note that generating
new certificates each time prevents continuity in authentication,
however, as WebRTC users are expected to use multiple devices to
access the same communication service, such continuity can't be
expected anyway; instead, the above-described identity mechanism
has to be relied on.Note: The above cases are focused on providing privacy from other
parties, not on providing privacy from the web server that provides
the WebRTC JavaScript application.In IMS, the core network is controlled by a single operator or by
several operators with high trust in each other. Except for some types
of accesses, the operator is in full control, and no packages are
routed over the Internet. Nodes in the core network offer services
such as voice mail, interworking with legacy systems (Public Switched
Telephone Network (PSTN), Global System for Mobile Communications
(GSM), and 3G), and transcoding. Endpoints are authenticated during
the SIP registration using either IMS and Authentication and Key
Agreement (AKA) (using Subscriber Identity Module (SIM) credentials)
or SIP Digest (using a password).In IMS media security,
end-to-end encryption is, therefore, not seen as needed or desired as
it would hinder, for example, interworking and transcoding, making
calls between incompatible terminals impossible. Because of this, IMS
media security mostly uses end-to-access-edge security where SRTP is
terminated in the first node in the core network. As the SIP signaling
is trusted and encrypted (with TLS or IPsec), security descriptions is considered to give
good protection against eavesdropping over the accesses that are not
already encrypted (GSM, 3G, and Long Term Evolution (LTE)). Media
source authentication is based on knowledge of the SRTP session key
and trust in that the IMS network will only forward media from the
correct endpoint.For enterprises and government agencies, which might have weaker
trust in the IMS core network and can be assumed to have compatible
terminals, end-to-end security can be achieved by deploying their own
key management server.Work on interworking with WebRTC is currently ongoing; the security
will still be end-to-access-edge but using DTLS-SRTP instead of security
descriptions.The 3GPP Release 11 PSS specification of the Packet-switched
Streaming Service (PSS) defines,
in Annex R, a set of security mechanisms. These security mechanisms
are concerned with protecting the content from being copied, i.e.,
Digital Rights Management (DRM). To meet these goals with the
specified solution, the client implementation and the application
platform are trusted to protect against access and modification by an
attacker.PSS is media controlled by RTSP 1.0
streaming over RTP. Thus, an RTSP client whose user wants to access a
protected content will request a session description (SDP) for the protected content. This SDP will
indicate that the media is protected by ISMACryp 2.0 encoding application units
(AUs).
The key(s) used to protect the media is provided in one of two ways. If a
single key is used, then the client uses some DRM system to retrieve
the key as indicated in the SDP. Commonly, OMA
DRM v2 will be used to retrieve the key. If multiple keys are
to be used, then an additional RTSP stream for key updates in parallel
with the media streams is established, where key updates are sent to
the client using Short Term Key Messages
defined in the "Service and Content Protection for Mobile Broadcast
Services" part of the OMA Mobile Broadcast Services.Worth noting is that this solution doesn't provide any integrity
verification method for the RTP header and payload header information;
only the encoded media AU is protected. 3GPP has not defined any
requirement for supporting any solution that could provide that
service. Thus, replay or insertion attacks are possible. Another
property is that the media content can be protected by the ones
providing the media, so that the operators of the RTSP server have no
access to unprotected content. Instead, all that want to access the
media are supposed to contact the DRM keying server, and if the device
is acceptable, they will be given the key to decrypt the media.To protect the signaling, RTSP 1.0 supports the usage of TLS. This
is, however, not explicitly discussed in the PSS specification. Usage
of TLS can prevent both modification of the session description
information and help maintain some privacy of what content the user is
watching as all URLs would then be confidentiality protected.The Real-time Streaming Protocol 2.0
offers an interesting comparison to the PSS service that is based on RTSP 1.0
and service requirements perceived by mobile operators. A major
difference between RTSP 1.0 and RTSP 2.0 is that 2.0 is fully defined
under the requirement to have a mandatory-to-implement security
mechanism. As it specifies one transport media over RTP, it is also
defining security mechanisms for the RTP-transported media
streams.The security goal for RTP in RTSP 2.0 is to ensure that there is
confidentiality, integrity, and source authentication between the RTSP
server and the client. This to prevent eavesdropping on what the user
is watching for privacy reasons and to prevent replay or injection
attacks on the media stream. To reach these goals, the signaling also
has to be protected, requiring the use of TLS between the client and
server.Using TLS-protected signaling, the client and server agree on the
media transport method when doing the SETUP request and response. The
secured media transport is SRTP (SAVP/RTP) normally over UDP. The key
management for SRTP is MIKEY using RSA-R mode. The RSA-R mode is
selected as it allows the RTSP server to select the key despite having
the RTSP client initiate the MIKEY exchange. It also enables the reuse
of the RTSP server's TLS certificate when creating the MIKEY messages,
thus ensuring a binding between the RTSP server and the key exchange.
Assuming the SETUP process works, this will establish a SRTP crypto
context to be used between the RTSP server and the client for the
RTP-transported media streams.This entire document is about security. Please read it.We thank the IESG for their careful review of , which led to the writing of this memo. John Mattsson
has contributed the IMS Media Security
example.The authors wish to thank Christian Correll, Dan Wing, Kevin Gross,
Alan Johnston, Michael Peck, Ole Jacobsen, Spencer Dawkins, Stephen
Farrell, John Mattsson, and Suresh Krishnan for their reviews and
proposals for improvements to the text.Securing the RTP Protocol Framework: Why RTP Does Not Mandate
a Single Media Security SolutionThe ARIA Algorithm and Its Use with the Secure Real-time
Transport Protocol(SRTP)AES-GCM and AES-CCM Authenticated Encryption in Secure RTP
(SRTP)Encrypted Key Transport for Secure RTPReal Time Streaming Protocol 2.0 (RTSP)Overview: Real Time Protocols for Browser-based
ApplicationsWebRTC Security ArchitectureISMA Encryption and Authentication Version 2.0Internet Streaming Media Alliance
(ISMA)OMA Digital Rights Management V2.0Open Mobile AllianceMobile Broadcast Services Version 1.0Open Mobile AllianceService and Content Protection for Mobile Broadcast
ServicesOpen Mobile AllianceTechnical Specification Group Services and System Aspects;
Transparent end-to-end Packet-switched Streaming Service (PSS);
Protocols and codecs3GPPTechnical Specification Group Services and System Aspects;
Transparent end-to-end Packet-switched Streaming Service (PSS);
Protocols and codecs3GPPMultimedia Broadcast/Multicast Service (MBMS); Protocols and
codecs3GPP3G Security; Security of Multimedia Broadcast/Multicast
Service (MBMS)3GPPIP Multimedia Subsystem (IMS) media plane security3GPP