Index: doc/draft-ietf-payload-rtp-opus.xml |
diff --git a/doc/draft-ietf-payload-rtp-opus.xml b/doc/draft-ietf-payload-rtp-opus.xml |
index 02440d94ff3aab41846e2b26611eff7aa83cba43..7f1f8678441611e355ab4dc4b2266017ca236ac5 100644 |
--- a/doc/draft-ietf-payload-rtp-opus.xml |
+++ b/doc/draft-ietf-payload-rtp-opus.xml |
@@ -1,10 +1,11 @@ |
<?xml version="1.0" encoding="UTF-8"?> |
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [ |
<!ENTITY rfc2119 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml'> |
+<!ENTITY rfc3389 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3389.xml'> |
<!ENTITY rfc3550 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3550.xml'> |
<!ENTITY rfc3711 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3711.xml'> |
<!ENTITY rfc3551 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3551.xml'> |
-<!ENTITY rfc4288 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4288.xml'> |
+<!ENTITY rfc6838 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.6838.xml'> |
<!ENTITY rfc4855 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4855.xml'> |
<!ENTITY rfc4566 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4566.xml'> |
<!ENTITY rfc3264 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3264.xml'> |
@@ -17,7 +18,7 @@ |
<!ENTITY nbsp " "> |
]> |
- <rfc category="std" ipr="trust200902" docName="draft-ietf-payload-rtp-opus-01"> |
+ <rfc category="std" ipr="trust200902" docName="draft-ietf-payload-rtp-opus-07"> |
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?> |
<?rfc strict="yes" ?> |
@@ -43,14 +44,14 @@ |
</author> |
<author initials='K.' surname='Vos' fullname='Koen Vos'> |
- <organization>Skype Technologies S.A.</organization> |
+ <organization>vocTone</organization> |
<address> |
<postal> |
- <street>3210 Porter Drive</street> |
- <code>94304</code> |
- <city>Palo Alto</city> |
- <region>CA</region> |
- <country>USA</country> |
+ <street></street> |
+ <code></code> |
+ <city></city> |
+ <region></region> |
+ <country></country> |
</postal> |
<email>koenvos74@gmail.com</email> |
</address> |
@@ -60,7 +61,7 @@ |
<organization>Mozilla</organization> |
<address> |
<postal> |
- <street>650 Castro Street</street> |
+ <street>331 E. Evelyn Avenue</street> |
<city>Mountain View</city> |
<region>CA</region> |
<code>94041</code> |
@@ -70,15 +71,15 @@ |
</address> |
</author> |
- <date day='2' month='August' year='2013' /> |
+ <date day='13' month='January' year='2015' /> |
<abstract> |
<t> |
This document defines the Real-time Transport Protocol (RTP) payload |
format for packetization of Opus encoded |
- speech and audio data that is essential to integrate the codec in the |
- most compatible way. Further, media type registrations |
- are described for the RTP payload format. |
+ speech and audio data necessary to integrate the codec in the |
+ most compatible way. Further, it describes media type registrations |
+ for the RTP payload format. |
</t> |
</abstract> |
</front> |
@@ -87,19 +88,19 @@ |
<section title='Introduction'> |
<t> |
The Opus codec is a speech and audio codec developed within the |
- IETF Internet Wideband Audio Codec working group (codec). The codec |
+ IETF Internet Wideband Audio Codec working group. The codec |
has a very low algorithmic delay and it |
is highly scalable in terms of audio bandwidth, bitrate, and |
complexity. Further, it provides different modes to efficiently encode speech signals |
- as well as music signals, thus, making it the codec of choice for |
+ as well as music signals, thus making it the codec of choice for |
various applications using the Internet or similar networks. |
</t> |
<t> |
This document defines the Real-time Transport Protocol (RTP) |
<xref target="RFC3550"/> payload format for packetization |
- of Opus encoded speech and audio data that is essential to |
+ of Opus encoded speech and audio data necessary to |
integrate the Opus codec in the |
- most compatible way. Further, media type registrations are described for |
+ most compatible way. Further, it describes media type registrations for |
the RTP payload format. More information on the Opus |
codec can be obtained from <xref target="RFC6716"/>. |
</t> |
@@ -111,46 +112,46 @@ |
document are to be interpreted as described in <xref target="RFC2119"/>.</t> |
<t> |
<list style='hanging'> |
+ <t hangText="audio bandwidth:"> The range of audio frequencies being coded</t> |
<t hangText="CBR:"> Constant bitrate</t> |
<t hangText="CPU:"> Central Processing Unit</t> |
<t hangText="DTX:"> Discontinuous transmission</t> |
<t hangText="FEC:"> Forward error correction</t> |
- <t hangText="IP:"> Internet Protocol</t> |
- <t hangText="samples:"> Speech or audio samples (usually per channel)</t> |
- <t hangText="SDP:"> Session Description Protocol</t> |
+ <t hangText="IP:"> Internet Protocol</t> |
+ <t hangText="samples:"> Speech or audio samples (per channel)</t> |
+ <t hangText="SDP:"> Session Description Protocol</t> |
<t hangText="VBR:"> Variable bitrate</t> |
</list> |
</t> |
- <section title='Audio Bandwidth'> |
- <t> |
- Throughout this document, we refer to the following definitions: |
- </t> |
+ <t> |
+ Throughout this document, we refer to the following definitions: |
+ </t> |
<texttable anchor='bandwidth_definitions'> |
- <ttcol align='center'>Abbreviation</ttcol> |
+ <ttcol align='center'>Abbreviation</ttcol> |
<ttcol align='center'>Name</ttcol> |
- <ttcol align='center'>Bandwidth</ttcol> |
- <ttcol align='center'>Sampling</ttcol> |
- <c>nb</c> |
+ <ttcol align='center'>Audio Bandwidth (Hz)</ttcol> |
+ <ttcol align='center'>Sampling Rate (Hz)</ttcol> |
+ <c>NB</c> |
<c>Narrowband</c> |
<c>0 - 4000</c> |
<c>8000</c> |
- <c>mb</c> |
+ <c>MB</c> |
<c>Mediumband</c> |
<c>0 - 6000</c> |
<c>12000</c> |
- <c>wb</c> |
+ <c>WB</c> |
<c>Wideband</c> |
<c>0 - 8000</c> |
<c>16000</c> |
- <c>swb</c> |
+ <c>SWB</c> |
<c>Super-wideband</c> |
<c>0 - 12000</c> |
<c>24000</c> |
- <c>fb</c> |
+ <c>FB</c> |
<c>Fullband</c> |
<c>0 - 20000</c> |
<c>48000</c> |
@@ -159,21 +160,20 @@ |
Audio bandwidth naming |
</postamble> |
</texttable> |
- </section> |
</section> |
<section title='Opus Codec'> |
<t> |
- The Opus <xref target="RFC6716"/> speech and audio codec has been developed to encode speech |
- signals as well as audio signals. Two different modes, a voice mode |
- or an audio mode, may be chosen to allow the most efficient coding |
- dependent on the type of input signal, the sampling frequency of the |
- input signal, and the specific application. |
+ The Opus <xref target="RFC6716"/> codec encodes speech |
+ signals as well as general audio signals. Two different modes can be |
+ chosen, a voice mode or an audio mode, to allow the most efficient coding |
+ depending on the type of the input signal, the sampling frequency of the |
+ input signal, and the intended application. |
</t> |
<t> |
The voice mode allows efficient encoding of voice signals at lower bit |
- rates while the audio mode is optimized for audio signals at medium and |
+ rates while the audio mode is optimized for general audio signals at medium and |
higher bitrates. |
</t> |
@@ -185,43 +185,43 @@ |
<section title='Network Bandwidth'> |
<t> |
- Opus supports all bitrates from 6 kb/s to 510 kb/s. |
- The bitrate can be changed dynamically within that range. |
- All |
- other parameters being |
- equal, higher bitrate results in higher quality. |
- </t> |
- <section title='Recommended Bitrate' anchor='bitrate_by_bandwidth'> |
- <t> |
- For a frame size of |
- 20 ms, these |
- are the bitrate "sweet spots" for Opus in various configurations: |
+ Opus supports bitrates from 6 kb/s to 510 kb/s. |
+ The bitrate can be changed dynamically within that range. |
+ All |
+ other parameters being |
+ equal, higher bitrates result in higher quality. |
+ </t> |
+ <section title='Recommended Bitrate' anchor='bitrate_by_bandwidth'> |
+ <t> |
+ For a frame size of |
+ 20 ms, these |
+ are the bitrate "sweet spots" for Opus in various configurations: |
<list style="symbols"> |
- <t>8-12 kb/s for NB speech,</t> |
- <t>16-20 kb/s for WB speech,</t> |
- <t>28-40 kb/s for FB speech,</t> |
- <t>48-64 kb/s for FB mono music, and</t> |
- <t>64-128 kb/s for FB stereo music.</t> |
- </list> |
- </t> |
+ <t>8-12 kb/s for NB speech,</t> |
+ <t>16-20 kb/s for WB speech,</t> |
+ <t>28-40 kb/s for FB speech,</t> |
+ <t>48-64 kb/s for FB mono music, and</t> |
+ <t>64-128 kb/s for FB stereo music.</t> |
+ </list> |
+ </t> |
</section> |
- <section title='Variable versus Constant Bit Rate' anchor='variable-vs-constant-bitrate'> |
+ <section title='Variable versus Constant Bitrate' anchor='variable-vs-constant-bitrate'> |
+ <t> |
+ For the same average bitrate, variable bitrate (VBR) can achieve higher quality |
+ than constant bitrate (CBR). For the majority of voice transmission applications, VBR |
+ is the best choice. One reason for choosing CBR is the potential |
+ information leak that <spanx style='emph'>might</spanx> occur when encrypting the |
+ compressed stream. See <xref target="RFC6562"/> for guidelines on when VBR is |
+ appropriate for encrypted audio communications. If an existing |
+ VBR stream needs to be converted to CBR for security reasons, the Opus padding |
+ mechanism described in <xref target="RFC6716"/> is the RECOMMENDED way to achieve padding |
+ because the RTP padding bit is unencrypted.</t> |
+ |
<t> |
- For the same average bitrate, variable bitrate (VBR) can achieve higher quality |
- than constant bitrate (CBR). For the majority of voice transmission application, VBR |
- is the best choice. One potential reason for choosing CBR is the potential |
- information leak that <spanx style='emph'>may</spanx> occur when encrypting the |
- compressed stream. See <xref target="RFC6562"/> for guidelines on when VBR is |
- appropriate for encrypted audio communications. In the case where an existing |
- VBR stream needs to be converted to CBR for security reasons, then the Opus padding |
- mechanism described in <xref target="RFC6716"/> is the RECOMMENDED way to achieve padding |
- because the RTP padding bit is unencrypted.</t> |
- |
- <t> |
The bitrate can be adjusted at any point in time. To avoid congestion, |
- the average bitrate SHOULD be adjusted to the available |
- network capacity. If no target bitrate is specified, the bitrates specified in |
+ the average bitrate SHOULD NOT exceed the available |
+ network bandwidth. If no target bitrate is specified, the bitrates specified in |
<xref target='bitrate_by_bandwidth'/> are RECOMMENDED. |
</t> |
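The limit above applies to network bandwidth, so the per-packet IP/UDP/RTP headers count against it along with the Opus payload. The sketch below is illustrative only. It assumes IPv4 without options, UDP, and a 12-byte RTP header with no CSRCs, header extensions, or SRTP authentication tag. The function names are hypothetical.

```python
# Sketch: payload bitrate left over once per-packet header overhead is paid.
# Assumes IPv4 (20 bytes) + UDP (8 bytes) + a 12-byte RTP header with no
# CSRCs, no header extensions, and no SRTP authentication tag.
IPV4_HEADER_BYTES = 20
UDP_HEADER_BYTES = 8
RTP_HEADER_BYTES = 12
HEADER_BYTES = IPV4_HEADER_BYTES + UDP_HEADER_BYTES + RTP_HEADER_BYTES  # 40


def header_overhead_bps(packet_ms: float) -> float:
    """Header bits per second when one packet is sent every packet_ms ms."""
    packets_per_second = 1000.0 / packet_ms
    return HEADER_BYTES * 8 * packets_per_second


def max_opus_bitrate_bps(available_bps: float, packet_ms: float) -> float:
    """Largest average Opus payload bitrate that fits in available_bps."""
    return max(0.0, available_bps - header_overhead_bps(packet_ms))


if __name__ == "__main__":
    # Same 64 kb/s link, different packet durations (hypothetical hypothetical budget).
    for ptime in (10, 20, 60):
        print(ptime, "ms:", round(max_opus_bitrate_bps(64000, ptime)), "b/s for Opus")
```

With one packet every 20 ms, the headers alone take 16 kb/s, so a 64 kb/s link leaves at most 48 kb/s for Opus. Longer packets lower that overhead at the cost of latency, as the Congestion Control section notes.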
@@ -230,12 +230,12 @@ |
<section title='Discontinuous Transmission (DTX)'> |
<t> |
- The Opus codec may, as described in <xref target='variable-vs-constant-bitrate'/>, |
- be operated with an adaptive bitrate. In that case, the bitrate |
- will automatically be reduced for certain input signals like periods |
- of silence. During continuous transmission the bitrate will be |
- reduced, when the input signal allows to do so, but the transmission |
- to the receiver itself will never be interrupted. Therefore, the |
+ The Opus codec can, as described in <xref target='variable-vs-constant-bitrate'/>, |
+ be operated with a variable bitrate. In that case, the encoder will |
+ automatically reduce the bitrate for certain input signals, like periods |
+ of silence. When using continuous transmission, it will reduce the |
+ bitrate when the characteristics of the input signal permit, but |
+ will never interrupt the transmission to the receiver. Therefore, the |
received signal will maintain the same high level of quality over the |
full duration of a transmission while minimizing the average bit |
rate over time. |
@@ -244,25 +244,33 @@ |
<t> |
In cases where the bitrate of Opus needs to be reduced even |
further or in cases where only constant bitrate is available, |
- the Opus encoder may be set to use discontinuous |
+ the Opus encoder can use discontinuous |
transmission (DTX), where parts of the encoded signal that |
correspond to periods of silence in the input speech or audio signal |
- are not transmitted to the receiver. |
+ are not transmitted to the receiver. A receiver can distinguish |
+ between DTX and packet loss by looking for gaps in the sequence |
+ number, as described by Section 4.1 |
+ of <xref target="RFC3551"/>. |
</t> |
<t> |
On the receiving side, the non-transmitted parts will be handled by a |
frame loss concealment unit in the Opus decoder which generates a |
comfort noise signal to replace the non transmitted parts of the |
- speech or audio signal. |
+ speech or audio signal. Use of <xref target="RFC3389"/> Comfort |
+ Noise (CN) with Opus is discouraged. |
+ The transmitter MUST drop whole frames only, |
+ based on the size of the last transmitted frame, |
+ to ensure successive RTP timestamps differ by a multiple of 120 (2.5 ms at 48000 Hz) and |
+ to allow the receiver to use whole frames for concealment. |
</t> |
<t> |
- The DTX mode of Opus will have a slightly lower speech or audio |
- quality than the continuous mode. Therefore, it is RECOMMENDED to |
- use Opus in the continuous mode unless restraints on network |
- capacity are severe. The DTX mode can be engaged for operation |
- in both adaptive or constant bitrate. |
+ DTX can be used with both variable and constant bitrate. |
+ It will have a slightly lower speech or audio |
+ quality than continuous transmission. Therefore, using continuous |
+ transmission is RECOMMENDED unless constraints on available network bandwidth |
+ are severe. |
</t> |
</section> |
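The DTX rules above can be sketched as a receiver-side check on two packets that arrive back to back. A gap in sequence numbers means loss. Consecutive sequence numbers combined with a timestamp jump mean the sender was in DTX. This is a minimal illustration, not part of the payload format. `classify_gap` and its return labels are hypothetical, and reordered packets are not handled.

```python
# Sketch: tell DTX apart from packet loss using RTP sequence numbers.
# classify_gap and its labels are hypothetical; reordering is not handled.
SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits and wrap
TS_MOD = 1 << 32   # RTP timestamps are 32 bits and wrap
TS_UNIT = 120      # 2.5 ms at 48000 Hz: DTX gaps are multiples of this


def classify_gap(prev_seq: int, prev_ts: int, prev_duration: int,
                 seq: int, ts: int) -> str:
    """Classify what lies between two packets received back to back.

    prev_duration is the timestamp span of the previous packet
    (e.g. 960 for one 20 ms frame).
    """
    seq_delta = (seq - prev_seq) % SEQ_MOD
    ts_delta = (ts - prev_ts) % TS_MOD
    if seq_delta == 0:
        return "duplicate"   # discard; only one copy goes to the decoder
    if seq_delta > 1:
        return "loss"        # sequence numbers were skipped
    if ts_delta > prev_duration:
        # No sequence gap, but the timestamp jumped: the sender stopped
        # transmitting (DTX). Whole-frame dropping keeps this a multiple of 120.
        return "dtx" if ts_delta % TS_UNIT == 0 else "unexpected"
    return "continuous"
```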
@@ -272,7 +280,7 @@ |
<section title='Complexity'> |
<t> |
- Complexity can be scaled to optimize for CPU resources in real-time, mostly as |
+ Complexity of the encoder can be scaled to optimize for CPU resources in real-time, mostly as |
a trade-off between audio quality and bitrate. Also, different modes of Opus have different complexity. |
</t> |
@@ -281,10 +289,10 @@ |
<section title="Forward Error Correction (FEC)"> |
<t> |
- The voice mode of Opus allows for "in-band" forward error correction (FEC) |
- data to be embedded into the bit stream of Opus. This FEC scheme adds |
- redundant information about the previous packet (n-1) to the current |
- output packet n. For |
+ The voice mode of Opus allows for embedding "in-band" forward error correction (FEC) |
+ data into the Opus bit stream. This FEC scheme adds |
+ redundant information about the previous packet (N-1) to the current |
+ output packet N. For |
each frame, the encoder decides whether to use FEC based on (1) an |
externally-provided estimate of the channel's packet loss rate; (2) an |
externally-provided estimate of the channel's capacity; (3) the |
@@ -297,16 +305,18 @@ |
<t> |
On the receiving side, the decoder can take advantage of this |
- additional information when, in case of a packet loss, the next packet |
+ additional information when it loses a packet and the next packet |
is available. In order to use the FEC data, the jitter buffer needs |
- to provide access to payloads with the FEC data. The decoder API function |
- has a flag to indicate that a FEC frame rather than a regular frame should |
- be decoded. If no FEC data is available for the current frame, the decoder |
- will consider the frame lost and invokes the frame loss concealment. |
+ to provide access to payloads with the FEC data. |
+ Instead of performing loss concealment for a missing packet, the |
+ receiver can then configure its decoder to decode the FEC data from the next packet. |
</t> |
<t> |
- If the FEC scheme is not implemented on the receiving side, FEC |
+ Any compliant Opus decoder is capable of ignoring |
+ FEC information when it is not needed, so encoding with FEC cannot cause |
+ interoperability problems. |
+ However, if FEC cannot be used on the receiving side, then FEC |
SHOULD NOT be used, as it leads to an inefficient usage of network |
resources. Decoder support for FEC SHOULD be indicated at the time a |
session is set up. |
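A jitter buffer that uses in-band FEC might make a decision for each missing packet, as sketched below. If packet N is missing but N+1 has arrived, it decodes the FEC data carried in N+1. Otherwise, it falls back to loss concealment. The function `plan_decode` and its labels are hypothetical. In libopus, the FEC case roughly corresponds to calling `opus_decode()` on the next packet with its `decode_fec` argument set to 1.

```python
from typing import Dict, List, Optional, Tuple

SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits and wrap


def plan_decode(buffer: Dict[int, bytes], first_seq: int,
                count: int) -> List[Tuple[str, Optional[bytes]]]:
    """Decide, per sequence number, how the decoder should produce audio.

    Hypothetical helper. Returns ("normal", payload), ("fec", next_payload)
    or ("plc", None) for each of count sequence numbers from first_seq.
    """
    plan: List[Tuple[str, Optional[bytes]]] = []
    for i in range(count):
        seq = (first_seq + i) % SEQ_MOD
        if seq in buffer:
            plan.append(("normal", buffer[seq]))
            continue
        nxt = (seq + 1) % SEQ_MOD
        if nxt in buffer:
            # Packet seq is lost but seq+1 arrived: use its in-band FEC data.
            # An Opus decoder conceals the frame if no FEC data is present.
            plan.append(("fec", buffer[nxt]))
        else:
            plan.append(("plc", None))  # nothing usable: loss concealment
    return plan
```

The next packet still appears once more as a "normal" entry, because its FEC data only reconstructs the preceding frame.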
@@ -319,15 +329,16 @@ |
<t> |
Opus allows for transmission of stereo audio signals. This operation |
is signaled in-band in the Opus payload and no special arrangement |
- is required in the payload format. Any implementation of the Opus |
- decoder MUST be capable of receiving stereo signals, although it MAY |
- decode those signals as mono. |
+ is needed in the payload format. An |
+ Opus decoder is capable of handling a stereo encoding, but an |
+ application might only be capable of consuming a single audio |
+ channel. |
</t> |
<t> |
- If a decoder can not take advantage of the benefits of a stereo signal |
+ If a decoder cannot take advantage of the benefits of a stereo signal |
this SHOULD be indicated at the time a session is set up. In that case |
the sending side SHOULD NOT send stereo signals as it leads to an |
- inefficient usage of the network. |
+ inefficient usage of network resources. |
</t> |
</section> |
@@ -338,65 +349,53 @@ |
<t>The payload format for Opus consists of the RTP header and Opus payload |
data.</t> |
<section title='RTP Header Usage'> |
- <t>The format of the RTP header is specified in <xref target="RFC3550"/>. The Opus |
- payload format uses the fields of the RTP header consistent with this |
- specification.</t> |
- |
- <t>The payload length of Opus is a multiple number of octets and |
- therefore no padding is required. The payload MAY be padded by an |
- integer number of octets according to <xref target="RFC3550"/>.</t> |
- |
- <t>The marker bit (M) of the RTP header is used in accordance with |
- Section 4.1 of <xref target="RFC3551"/>.</t> |
- |
- <t>The RTP payload type for Opus has not been assigned statically and is |
- expected to be assigned dynamically.</t> |
- |
- <t>The receiving side MUST be prepared to receive duplicates of RTP |
- packets. Only one of those payloads MUST be provided to the Opus decoder |
- for decoding and others MUST be discarded.</t> |
- |
- <t>Opus supports 5 different audio bandwidths which may be adjusted during |
- the duration of a call. The RTP timestamp clock frequency is defined as |
- the highest supported sampling frequency of Opus, i.e. 48000 Hz, for all |
- modes and sampling rates of Opus. The unit |
+ <t>The format of the RTP header is specified in <xref target="RFC3550"/>. |
+ The use of the fields of the RTP header by the Opus payload format is |
+ consistent with that specification.</t> |
+ |
+ <t>The payload length of Opus is an integer number of octets and |
+ therefore no padding is necessary. The payload MAY be padded by an |
+ integer number of octets according to <xref target="RFC3550"/>, |
+ although the Opus internal padding is preferred.</t> |
+ |
+ <t>The timestamp, sequence number, and marker bit (M) of the RTP header |
+ are used in accordance with Section 4.1 |
+ of <xref target="RFC3551"/>.</t> |
+ |
+ <t>The RTP payload type for Opus is to be assigned dynamically.</t> |
+ |
+ <t>The receiving side MUST be prepared to receive duplicate RTP |
+ packets. The receiver MUST provide at most one of those payloads to the |
+ Opus decoder for decoding, and MUST discard the others.</t> |
+ |
+ <t>Opus supports 5 different audio bandwidths, which can be adjusted during |
+ a call. |
+ The RTP timestamp is incremented with a 48000 Hz clock rate |
+ for all modes of Opus and all sampling rates. |
+ The unit |
for the timestamp is samples per single (mono) channel. The RTP timestamp corresponds to the |
- sample time of the first encoded sample in the encoded frame. For sampling |
- rates lower than 48000 Hz the number of samples has to be multiplied with |
- a multiplier according to <xref target="fs-upsample-factors"/> to determine |
- the RTP timestamp.</t> |
- |
- <texttable anchor='fs-upsample-factors' title="Timestamp multiplier"> |
- <ttcol align='center'>fs (Hz)</ttcol> |
- <ttcol align='center'>Multiplier</ttcol> |
- <c>8000</c> |
- <c>6</c> |
- <c>12000</c> |
- <c>4</c> |
- <c>16000</c> |
- <c>3</c> |
- <c>24000</c> |
- <c>2</c> |
- <c>48000</c> |
- <c>1</c> |
- </texttable> |
+ sample time of the first encoded sample in the encoded frame. |
+ For data encoded with sampling rates other than 48000 Hz, |
+ the sampling rate has to be adjusted to 48000 Hz.</t> |
+ |
</section> |
<section title='Payload Structure'> |
<t> |
- The Opus encoder can be set to output encoded frames representing 2.5, 5, 10, 20, |
- 40, or 60 ms of speech or audio data. Further, an arbitrary number of frames can be |
- combined into a packet. The maximum packet length is limited to the amount of encoded |
- data representing 120 ms of speech or audio data. The packetization of encoded data |
- is purely done by the Opus encoder and therefore only one packet output from the Opus |
- encoder MUST be used as a payload. |
+ The Opus encoder can output encoded frames representing 2.5, 5, 10, 20, |
+ 40, or 60 ms of speech or audio data. Further, an arbitrary number of frames can be |
+ combined into a packet, up to a maximum packet duration representing |
+ 120 ms of speech or audio data. The grouping of one or more Opus |
+ frames into a single Opus packet is defined in Section 3 of |
+ <xref target="RFC6716"/>. An RTP payload MUST contain exactly one |
+ Opus packet as defined by that document. |
</t> |
<t><xref target='payload-structure'/> shows the structure combined with the RTP header.</t> |
<figure anchor="payload-structure" |
- title="Payload Structure with RTP header"> |
- <artwork> |
+ title="Packet structure with RTP header"> |
+ <artwork align="center"> |
<![CDATA[ |
+----------+--------------+ |
|RTP Header| Opus Payload | |
@@ -406,16 +405,16 @@ |
</figure> |
<t> |
- <xref target='opus-packetization'/> shows supported frame sizes in |
- milliseconds of encoded speech or audio data for speech and audio mode |
- (Mode) and sampling rates (fs) of Opus and how the timestamp needs to |
- be incremented for packetization (ts incr). If the Opus encoder |
- outputs multiple encoded frames into a single packet the timestamps |
- have to be added up according to the combined frames. |
+ <xref target='opus-packetization'/> shows supported frame sizes in |
+ milliseconds of encoded speech or audio data for the speech and audio modes |
+ (Mode) and sampling rates (fs) of Opus and shows how the timestamp is |
+ incremented for packetization (ts incr). If the Opus encoder |
+ outputs multiple encoded frames into a single packet, the timestamp |
+ increment is the sum of the increments for the individual frames. |
</t> |
- <texttable anchor='opus-packetization' title="Supported Opus frame |
- sizes and timestamp increments"> |
+ <texttable anchor='opus-packetization' title="Supported Opus frame |
+ sizes and timestamp increments (o: supported, x: unsupported)"> |
<ttcol align='center'>Mode</ttcol> |
<ttcol align='center'>fs</ttcol> |
<ttcol align='center'>2.5</ttcol> |
@@ -433,21 +432,21 @@ |
<c>1920</c> |
<c>2880</c> |
<c>voice</c> |
- <c>nb/mb/wb/swb/fb</c> |
- <c></c> |
- <c></c> |
- <c>x</c> |
- <c>x</c> |
+ <c>NB/MB/WB/SWB/FB</c> |
<c>x</c> |
<c>x</c> |
+ <c>o</c> |
+ <c>o</c> |
+ <c>o</c> |
+ <c>o</c> |
<c>audio</c> |
- <c>nb/wb/swb/fb</c> |
- <c>x</c> |
+ <c>NB/WB/SWB/FB</c> |
+ <c>o</c> |
+ <c>o</c> |
+ <c>o</c> |
+ <c>o</c> |
<c>x</c> |
<c>x</c> |
- <c>x</c> |
- <c></c> |
- <c></c> |
</texttable> |
</section> |
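The timestamp arithmetic behind the table can be sketched as follows. Each frame advances the 48000 Hz RTP clock by its duration in milliseconds times 48, and a packet advances it by the sum over its frames. Following Section 3 of RFC 6716, the sketch assumes that all frames in one Opus packet share the same duration. It checks the 120 ms packet limit but not which frame sizes each mode supports. Its function names are hypothetical.

```python
from fractions import Fraction

RTP_CLOCK_HZ = 48000  # Opus RTP clock rate, regardless of the coded bandwidth
FRAME_MS = (Fraction(5, 2), 5, 10, 20, 40, 60)  # Opus frame durations
MAX_PACKET_MS = 120


def ts_increment(frame_ms) -> int:
    """RTP timestamp increment for one Opus frame of the given duration."""
    ms = Fraction(frame_ms)  # exact arithmetic, so 2.5 ms gives exactly 120
    if ms not in FRAME_MS:
        raise ValueError(f"unsupported Opus frame size: {frame_ms} ms")
    return int(ms * RTP_CLOCK_HZ / 1000)


def packet_ts_increment(frame_ms, frame_count: int) -> int:
    """Timestamp increment for one Opus packet holding frame_count frames."""
    if frame_count < 1:
        raise ValueError("this sketch requires at least one frame per packet")
    if Fraction(frame_ms) * frame_count > MAX_PACKET_MS:
        raise ValueError("an Opus packet can represent at most 120 ms")
    # The sum of equal per-frame increments.
    return frame_count * ts_increment(frame_ms)
```

For example, three 20 ms frames in one packet advance the timestamp by 2880, the same as a single 60 ms frame.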
@@ -456,19 +455,17 @@ |
<section title='Congestion Control'> |
- <t>The adaptive nature of the Opus codec allows for an efficient |
- congestion control.</t> |
- |
- <t>The target bitrate of Opus can be adjusted at any point in time and |
- thus allowing for an efficient congestion control. Furthermore, the amount |
+ <t>The target bitrate of Opus can be adjusted at any point in time, thus |
+ allowing efficient congestion control. Furthermore, the amount |
of encoded speech or audio data encoded in a |
- single packet can be used for congestion control since the transmission |
- rate is inversely proportional to these frame sizes. A lower packet |
- transmission rate reduces the amount of header overhead but at the same |
- time increases latency and error sensitivity and should be done with care.</t> |
- |
- <t>It is RECOMMENDED that congestion control is applied during the |
- transmission of Opus encoded data.</t> |
+ single packet can be used for congestion control, since the transmission |
+ rate is inversely proportional to the packet duration. A lower packet |
+ transmission rate reduces the amount of header overhead, but at the same |
+ time increases latency and loss sensitivity, so it ought to be used with |
+ care.</t> |
+ |
+ <t>It is RECOMMENDED that senders of Opus encoded data apply congestion |
+ control.</t> |
</section> |
<section title='IANA Considerations'> |
@@ -477,7 +474,7 @@ |
<section title='Opus Media Type Registration'> |
<t>Media type registration is done according to <xref |
- target="RFC4288"/> and <xref target="RFC4855"/>.<vspace |
+ target="RFC6838"/> and <xref target="RFC4855"/>.<vspace |
blankLines='1'/></t> |
<t>Type name: audio<vspace blankLines='1'/></t> |
@@ -485,10 +482,10 @@ |
<t>Required parameters:</t> |
<t><list style="hanging"> |
- <t hangText="rate:"> RTP timestamp clock rate is incremented with |
+ <t hangText="rate:"> the RTP timestamp is incremented with a |
48000 Hz clock rate for all modes of Opus and all sampling |
- frequencies. For audio sampling rates other than 48000 Hz the rate |
- has to be adjusted to 48000 Hz according to <xref target="fs-upsample-factors"/>. |
+ rates. For data encoded with sampling rates other than 48000 Hz, |
+ the sampling rate has to be adjusted to 48000 Hz. |
</t> |
</list></t> |
@@ -505,7 +502,7 @@ |
usage and encoding complexity, so an encoder SHOULD NOT encode |
frequencies above the audio bandwidth specified by maxplaybackrate. |
This parameter can take any value between 8000 and 48000, although |
- commonly the value will match one of the Opus bandwidths |
+ commonly the value will match one of the Opus bandwidths |
(<xref target="bandwidth_definitions"/>). |
By default, the receiver is assumed to have no limitations, i.e. 48000. |
<vspace blankLines='1'/> |
@@ -519,119 +516,95 @@ |
This parameter is useful to avoid wasting receiver resources by operating the audio |
processing pipeline (e.g. echo cancellation) at a higher rate than necessary. |
This parameter can take any value between 8000 and 48000, although |
- commonly the value will match one of the Opus bandwidths |
+ commonly the value will match one of the Opus bandwidths |
(<xref target="bandwidth_definitions"/>). |
By default, the sender is assumed to have no limitations, i.e. 48000. |
<vspace blankLines='1'/> |
</t> |
- <t hangText="maxptime:"> the decoder's maximum length of time in |
- milliseconds rounded up to the next full integer value represented |
- by the media in a packet that can be |
- encapsulated in a received packet according to Section 6 of |
- <xref target="RFC4566"/>. Possible values are 3, 5, 10, 20, 40, |
- and 60 or an arbitrary multiple of Opus frame sizes rounded up to |
- the next full integer value up to a maximum value of 120 as |
+ <t hangText="maxptime:"> the maximum duration of media represented |
+ by a packet (according to Section 6 of |
+ <xref target="RFC4566"/>) that a decoder wants to receive, in |
+ milliseconds rounded up to the next full integer value. |
+ Possible values are 3, 5, 10, 20, 40, 60, or an arbitrary |
+ multiple of an Opus frame size rounded up to the next full integer |
+ value, up to a maximum value of 120, as |
defined in <xref target='opus-rtp-payload-format'/>. If no value is |
- specified, 120 is assumed as default. This value is a recommendation |
- by the decoding side to ensure the best |
- performance for the decoder. The decoder MUST be |
- capable of accepting any allowed packet sizes to |
- ensure maximum compatibility. |
+ specified, the default is 120. |
<vspace blankLines='1'/></t> |
- <t hangText="ptime:"> the decoder's recommended length of time in |
- milliseconds rounded up to the next full integer value represented |
- by the media in a packet according to |
- Section 6 of <xref target="RFC4566"/>. Possible values are |
- 3, 5, 10, 20, 40, or 60 or an arbitrary multiple of Opus frame sizes |
- rounded up to the next full integer value up to a maximum |
- value of 120 as defined in <xref |
+ <t hangText="ptime:"> the preferred duration of media represented |
+ by a packet (according to Section 6 of |
+ <xref target="RFC4566"/>) that a decoder wants to receive, in |
+ milliseconds rounded up to the next full integer value. |
+ Possible values are 3, 5, 10, 20, 40, 60, or an arbitrary |
+ multiple of an Opus frame size rounded up to the next full integer |
+ value, up to a maximum value of 120, as defined in <xref |
target='opus-rtp-payload-format'/>. If no value is |
- specified, 20 is assumed as default. If ptime is greater than |
- maxptime, ptime MUST be ignored. This parameter MAY be changed |
- during a session. This value is a recommendation by the decoding |
- side to ensure the best |
- performance for the decoder. The decoder MUST be |
- capable of accepting any allowed packet sizes to |
- ensure maximum compatibility. |
- <vspace blankLines='1'/></t> |
- |
- <t hangText="minptime:"> the decoder's minimum length of time in |
- milliseconds rounded up to the next full integer value represented |
- by the media in a packet that SHOULD |
- be encapsulated in a received packet according to Section 6 of <xref |
- target="RFC4566"/>. Possible values are 3, 5, 10, 20, 40, and 60 |
- or an arbitrary multiple of Opus frame sizes rounded up to the next |
- full integer value up to a maximum value of 120 |
- as defined in <xref target='opus-rtp-payload-format'/>. If no value is |
- specified, 3 is assumed as default. This value is a recommendation |
- by the decoding side to ensure the best |
- performance for the decoder. The decoder MUST be |
- capable to accept any allowed packet sizes to |
- ensure maximum compatibility. |
+ specified, the default is 20. |
<vspace blankLines='1'/></t> |
<t hangText="maxaveragebitrate:"> specifies the maximum average |
- receive bitrate of a session in bits per second (b/s). The actual |
- value of the bitrate may vary as it is dependent on the |
+ receive bitrate of a session in bits per second (b/s). The actual |
+ value of the bitrate can vary, as it is dependent on the |
characteristics of the media in a packet. Note that the maximum |
average bitrate MAY be modified dynamically during a session. Any |
- positive integer is allowed but values outside the range between |
- 6000 and 510000 SHOULD be ignored. If no value is specified, the |
+ positive integer is allowed, but values outside the range |
+ 6000 to 510000 SHOULD be ignored. If no value is specified, the |
maximum value specified in <xref target='bitrate_by_bandwidth'/> |
- for the corresponding mode of Opus and corresponding maxplaybackrate: |
- will be the default.<vspace blankLines='1'/></t> |
+ for the corresponding mode of Opus and corresponding maxplaybackrate |
+ is the default.<vspace blankLines='1'/></t> |
<t hangText="stereo:"> |
specifies whether the decoder prefers receiving stereo or mono signals. |
- Possible values are 1 and 0 where 1 specifies that stereo signals are preferred |
+ Possible values are 1 and 0, where 1 specifies that stereo signals are preferred, |
and 0 specifies that only mono signals are preferred. |
Independent of the stereo parameter every receiver MUST be able to receive and |
decode stereo signals but sending stereo signals to a receiver that signaled a |
preference for mono signals may result in higher than necessary network |
- utilisation and encoding complexity. If no value is specified, mono |
- is assumed (stereo=0).<vspace blankLines='1'/> |
+ utilization and encoding complexity. If no value is specified, |
+ the default is 0 (mono).<vspace blankLines='1'/> |
</t> |
<t hangText="sprop-stereo:"> |
specifies whether the sender is likely to produce stereo audio. |
- Possible values are 1 and 0 where 1 specifies that stereo signals are likely to |
- be sent, and 0 speficies that the sender will likely only send mono. |
- This is not a guarantee that the sender will never send stereo audio |
- (e.g. it could send a pre-recorded prompt that uses stereo), but it |
- indicates to the receiver that the received signal can be safely downmixed to mono. |
- This parameter is useful to avoid wasting receiver resources by operating the audio |
- processing pipeline (e.g. echo cancellation) in stereo when not necessary. |
- If no value is specified, mono |
- is assumed (sprop-stereo=0).<vspace blankLines='1'/> |
+ Possible values are 1 and 0, where 1 specifies that stereo signals are likely to |
+ be sent, and 0 specifies that the sender will likely only send mono. |
+ This is not a guarantee that the sender will never send stereo audio |
+ (e.g. it could send a pre-recorded prompt that uses stereo), but it |
+ indicates to the receiver that the received signal can be safely downmixed to mono. |
+ This parameter is useful to avoid wasting receiver resources by operating the audio |
+ processing pipeline (e.g. echo cancellation) in stereo when not necessary. |
+ If no value is specified, the default is 0 |
+ (mono).<vspace blankLines='1'/> |
</t> |
<t hangText="cbr:"> |
specifies if the decoder prefers the use of a constant bitrate versus |
- variable bitrate. Possible values are 1 and 0 where 1 specifies constant |
- bitrate and 0 specifies variable bitrate. If no value is specified, cbr |
- is assumed to be 0. Note that the maximum average bitrate may still be |
- changed, e.g. to adapt to changing network conditions.<vspace blankLines='1'/> |
+ variable bitrate. Possible values are 1 and 0, where 1 specifies constant |
+ bitrate and 0 specifies variable bitrate. If no value is specified, |
+ the default is 0 (variable bitrate). When cbr is 1, the maximum average bitrate can still |
+ change, e.g. to adapt to changing network conditions.<vspace blankLines='1'/> |
</t> |
<t hangText="useinbandfec:"> specifies that the decoder has the capability to |
- take advantage of the Opus in-band FEC. Possible values are 1 and 0. It is RECOMMENDED to provide |
- 0 in case FEC cannot be utilized on the receiving side. If no |
+ take advantage of the Opus in-band FEC. Possible values are 1 and 0. |
+ Providing 0 when FEC cannot be used on the receiving side is |
+ RECOMMENDED. If no |
value is specified, useinbandfec is assumed to be 0. |
This parameter is only a preference and the receiver MUST be able to process |
packets that include FEC information, even if it means the FEC part is discarded. |
<vspace blankLines='1'/></t> |
<t hangText="usedtx:"> specifies if the decoder prefers the use of |
- DTX. Possible values are 1 and 0. If no value is specified, usedtx |
- is assumed to be 0.<vspace blankLines='1'/></t> |
+ DTX. Possible values are 1 and 0. If no value is specified, the |
+ default is 0.<vspace blankLines='1'/></t> |
</list></t> |
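The defaults and the maxaveragebitrate range rule defined above can be summarized in a short Python sketch. It is illustrative only and assumes the text after "a=fmtp:<pt> " has already been extracted. The function name resolve_opus_fmtp is hypothetical, and treating values other than 0 or 1 for the flag parameters as "use the default" is an assumption, not a rule from this draft. A returned maxaveragebitrate of None means the table-dependent default applies.

```python
# Illustrative sketch only: resolve the Opus fmtp parameters defined above,
# applying the documented defaults. resolve_opus_fmtp is a hypothetical name.

# Flag parameters and their defaults when no value is specified.
FLAG_DEFAULTS = {
    "stereo": 0,
    "sprop-stereo": 0,
    "cbr": 0,
    "useinbandfec": 0,
    "usedtx": 0,
}

# maxaveragebitrate values outside this range SHOULD be ignored.
MIN_AVG_BITRATE = 6000
MAX_AVG_BITRATE = 510000


def resolve_opus_fmtp(fmtp: str) -> dict:
    """Parse 'name=value; name=value' pairs and fill in defaults."""
    raw = {}
    for item in fmtp.split(";"):
        item = item.strip()
        if "=" not in item:
            continue  # skip empty or malformed entries
        name, value = item.split("=", 1)
        # Names are compared case-insensitively here (assumption).
        raw[name.strip().lower()] = value.strip()

    params = {}
    for name, default in FLAG_DEFAULTS.items():
        value = raw.get(name)
        # Only 0 and 1 are defined; anything else falls back (assumption).
        params[name] = int(value) if value in ("0", "1") else default

    # None means: use the maximum from the bitrate-by-bandwidth table.
    params["maxaveragebitrate"] = None
    try:
        bitrate = int(raw.get("maxaveragebitrate", ""))
    except ValueError:
        bitrate = None  # absent or not an integer
    if bitrate is not None and MIN_AVG_BITRATE <= bitrate <= MAX_AVG_BITRATE:
        params["maxaveragebitrate"] = bitrate
    return params
```

For example, resolve_opus_fmtp("maxplaybackrate=16000; stereo=1; useinbandfec=1") reports stereo and useinbandfec as 1, the remaining flags as 0, and maxaveragebitrate as None.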
<t>Encoding considerations:<vspace blankLines='1'/></t> |
<t><list style="hanging"> |
- <t>Opus media type is framed and consists of binary data according |
- to Section 4.8 in <xref target="RFC4288"/>.</t> |
+ <t>The Opus media type is framed and consists of binary data according |
+ to Section 4.8 in <xref target="RFC6838"/>.</t> |
</list></t> |
<t>Security considerations: </t> |
@@ -640,16 +613,20 @@ |
</list></t> |
<t>Interoperability considerations: none<vspace blankLines='1'/></t> |
- <t>Published specification: none<vspace blankLines='1'/></t> |
+ <t>Published specification: RFC [XXXX]</t> |
+ <t>Note to the RFC Editor: Replace [XXXX] with the number of the published |
+ RFC.<vspace blankLines='1'/></t> |
<t>Applications that use this media type: </t> |
<t><list style="hanging"> |
<t>Any application that requires the transport of |
- speech or audio data may use this media type. Some examples are, |
+ speech or audio data can use this media type. Some examples are, |
but not limited to, audio and video conferencing, Voice over IP, |
media streaming.</t> |
</list></t> |
+ <t>Fragment identifier considerations: N/A<vspace blankLines='1'/></t> |
+ |
<t>Person & email address to contact for further information:</t> |
<t><list style="hanging"> |
<t>SILK Support silksupport@skype.net</t> |
@@ -673,7 +650,7 @@ |
<t>Jean-Marc Valin jmvalin@jmvalin.ca<vspace blankLines='1'/></t> |
</list></t> |
- <t> Change controller: TBD</t> |
+ <t> Change controller: IETF Payload Working Group delegated from the IESG</t> |
</section> |
<section title='Mapping to SDP Parameters'> |
@@ -689,18 +666,18 @@ |
<t>The media subtype ("opus") goes in SDP "a=rtpmap" as the encoding |
name. The RTP clock rate in "a=rtpmap" MUST be 48000 and the number of |
- channels MUST be 2.</t> |
+ channels MUST be 2.</t> |
<t>The OPTIONAL media type parameters "ptime" and "maxptime" are |
mapped to "a=ptime" and "a=maxptime" attributes, respectively, in the |
SDP.</t> |
- <t>The OPTIONAL media type parameters "maxaveragebitrate", |
- "maxplaybackrate", "minptime", "stereo", "cbr", "useinbandfec", and |
- "usedtx", when present, MUST be included in the "a=fmtp" attribute |
+ <t>The OPTIONAL media type parameters "maxaveragebitrate", |
+ "maxplaybackrate", "stereo", "cbr", "useinbandfec", and |
+ "usedtx", when present, MUST be included in the "a=fmtp" attribute |
in the SDP, expressed as a media type string in the form of a |
semicolon-separated list of parameter=value pairs (e.g., |
- maxaveragebitrate=20000). They MUST NOT be specified in an |
+ maxplaybackrate=48000). They MUST NOT be specified in an |
SSRC-specific "fmtp" source-level attribute (as defined in |
Section 6.3 of <xref target="RFC5576"/>).</t> |
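As a hedged illustration of the SDP mapping rules above, the following Python sketch emits the a=rtpmap, a=fmtp, a=ptime and a=maxptime lines for one payload type. The helper build_opus_sdp and the payload type 101 used below are hypothetical choices, and only the parameters listed above for "a=fmtp" are handled.

```python
# Illustrative sketch only: build SDP attribute lines for one Opus payload
# type following the mapping rules above. build_opus_sdp is hypothetical.

# Parameters that belong in a=fmtp, in the order they are emitted.
FMTP_PARAMS = ("maxaveragebitrate", "maxplaybackrate", "stereo",
               "cbr", "useinbandfec", "usedtx")


def build_opus_sdp(pt: int, params: dict) -> list:
    # Encoding name "opus", clock rate 48000 and 2 channels are always
    # signaled, regardless of the sampling rate or channels actually used.
    lines = [f"a=rtpmap:{pt} opus/48000/2"]
    # fmtp parameters form a single semicolon-separated list.
    fmtp = [f"{name}={params[name]}" for name in FMTP_PARAMS if name in params]
    if fmtp:
        lines.append(f"a=fmtp:{pt} " + "; ".join(fmtp))
    # ptime and maxptime map to their own attributes, never to fmtp.
    for name in ("ptime", "maxptime"):
        if name in params:
            lines.append(f"a={name}:{params[name]}")
    return lines
```

For instance, build_opus_sdp(101, {"maxplaybackrate": 16000, "stereo": 1, "ptime": 40}) yields "a=rtpmap:101 opus/48000/2", "a=fmtp:101 maxplaybackrate=16000; stereo=1" and "a=ptime:40".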
@@ -735,8 +712,8 @@ |
<t>Example 2: 16000 Hz clock rate, maximum packet size of 40 ms, |
recommended packet size of 40 ms, maximum average bitrate of 20000 bps, |
- prefers to receive stereo but only plans to send mono, FEC is allowed, |
- DTX is not allowed</t> |
+ prefers to receive stereo but only plans to send mono, FEC is desired, |
+ DTX is not desired</t> |
<figure> |
<artwork> |
@@ -775,8 +752,8 @@ |
<t>Opus supports several clock rates. For signaling purposes only |
the highest, i.e. 48000, is used. The actual clock rate of the |
corresponding media is signaled inside the payload and is not |
- subject to this payload format description. The decoder MUST be |
- capable to decode every received clock rate. An example |
+ restricted by this payload format description. The decoder MUST be |
+ capable of decoding every received clock rate. An example |
is shown below: |
<figure> |
@@ -791,29 +768,26 @@ |
<t>The "ptime" and "maxptime" parameters are unidirectional |
receive-only parameters and typically will not compromise |
- interoperability; however, dependent on the set values of the |
- parameters the performance of the application may suffer. <xref |
+ interoperability; however, some values might cause application |
+ performance to suffer. <xref |
target="RFC3264"/> defines the SDP offer-answer handling of the |
"ptime" parameter. The "maxptime" parameter MUST be handled in the |
same way.</t> |
<t> |
- The "minptime" parameter is a unidirectional |
- receive-only parameters and typically will not compromise |
- interoperability; however, dependent on the set values of the |
- parameter the performance of the application may suffer and should be |
- set with care. |
- </t> |
- |
- <t> |
The "maxplaybackrate" parameter is a unidirectional receive-only |
- parameter that reflects limitations of the local receiver. The sender |
- of the other side SHOULD NOT send with an audio bandwidth higher than |
- "maxplaybackrate" as this would lead to inefficient use of network resources. |
+ parameter that reflects limitations of the local receiver. When |
+ sending to a single destination, a sender MUST NOT use an audio |
+ bandwidth higher than necessary to make full use of audio sampled at |
+ a sampling rate of "maxplaybackrate". Gateways or senders that |
+ are sending the same encoded audio to multiple destinations |
+ SHOULD NOT use an audio bandwidth higher than necessary to |
+ represent audio sampled at "maxplaybackrate", as this would lead |
+ to inefficient use of network resources. |
The "maxplaybackrate" parameter does not |
- affect interoperability. Also, this parameter SHOULD NOT be used |
- to adjust the audio bandwidth as a function of the bitrates, as this |
- is the responsibility of the Opus encoder implementation. |
+ affect interoperability. Also, this parameter SHOULD NOT be used |
+ to adjust the audio bandwidth as a function of the bitrate, as this |
+ is the responsibility of the Opus encoder implementation. |
</t> |
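One possible reading of the "maxplaybackrate" sender rule is sketched below in Python: the sender caps the encoder at the narrowest Opus audio bandwidth whose effective sample rate (as listed in RFC 6716) still covers maxplaybackrate. The helper max_opus_bandwidth is hypothetical, and its result is only an upper limit; as stated above, choosing the bandwidth for a given bitrate remains the job of the Opus encoder.

```python
# Illustrative sketch of one reading of the "maxplaybackrate" rule.
# max_opus_bandwidth is a hypothetical helper, not a library API.

# Opus audio bandwidths with their effective sample rates (RFC 6716).
OPUS_BANDWIDTHS = [
    ("NB", 8000),    # narrowband, 4 kHz audio bandwidth
    ("MB", 12000),   # mediumband, 6 kHz
    ("WB", 16000),   # wideband, 8 kHz
    ("SWB", 24000),  # super-wideband, 12 kHz
    ("FB", 48000),   # fullband, 20 kHz
]


def max_opus_bandwidth(maxplaybackrate: int) -> str:
    # Narrowest bandwidth that makes full use of audio sampled at
    # maxplaybackrate; anything wider would waste network resources.
    for name, rate in OPUS_BANDWIDTHS:
        if rate >= maxplaybackrate:
            return name
    return "FB"  # fullband is the widest Opus bandwidth
```

For example, maxplaybackrate=16000 gives "WB", while 22050 gives "SWB", because wideband (8 kHz) would not make full use of audio sampled at 22050 Hz.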
<t>The "maxaveragebitrate" parameter is a unidirectional receive-only |
@@ -821,9 +795,9 @@ |
of the other side MUST NOT send with an average bitrate higher than |
"maxaveragebitrate" as it might overload the network and/or |
receiver. The "maxaveragebitrate" parameter typically will not |
- compromise interoperability; however, dependent on the set value of |
- the parameter the performance of the application may suffer and should |
- be set with care.</t> |
+ compromise interoperability; however, some values might cause |
+ application performance to suffer, so the parameter ought to be |
+ set with care.</t> |
<t>The "sprop-maxcapturerate" and "sprop-stereo" parameters are |
unidirectional sender-only parameters that reflect limitations of |
@@ -837,7 +811,12 @@ |
<t> |
The "stereo" parameter is a unidirectional receive-only |
- parameter. |
+ parameter. When sending to a single destination, a sender MUST |
+ NOT use stereo when "stereo" is 0. Gateways or senders that are |
+ sending the same encoded audio to multiple destinations SHOULD |
+ NOT use stereo when "stereo" is 0, as this would lead to |
+ inefficient use of network resources. The "stereo" parameter does |
+ not affect interoperability. |
</t> |
<t> |
@@ -865,23 +844,21 @@ |
<t><list style="symbols"> |
- <t>The values for "maxptime", "ptime", "minptime", "maxplaybackrate", and |
- "maxaveragebitrate" should be selected carefully to ensure that a |
+ <t>The values for "maxptime", "ptime", "maxplaybackrate", and |
+ "maxaveragebitrate" ought to be selected carefully to ensure that a |
reasonable performance can be achieved for the participants of a session.</t> |
<t> |
- The values for "maxptime", "ptime", and "minptime" of the payload |
+ The values for "maxptime" and "ptime" of the payload |
format configuration are recommendations by the decoding side to ensure |
- the best performance for the decoder. The decoder MUST be |
- capable to accept any allowed packet sizes to |
- ensure maximum compatibility. |
+ the best performance for the decoder. |
</t> |
<t>All other parameters of the payload format configuration are declarative |
and a participant MUST use the configurations that are provided for |
- the session. More than one configuration may be provided if necessary |
+ the session. More than one configuration can be provided if necessary |
by declaring multiple RTP payload types; however, the number of types |
- should be kept small.</t> |
+ ought to be kept small.</t> |
</list></t> |
</section> |
</section> |
@@ -891,13 +868,13 @@ |
<t>All RTP packets using the payload format defined in this specification |
are subject to the general security considerations discussed in the RTP |
- specification <xref target="RFC3550"/> and any profile from |
- e.g. <xref target="RFC3711"/> or <xref target="RFC3551"/>.</t> |
+ specification <xref target="RFC3550"/> and any applicable RTP profile, |
+ such as those defined in <xref target="RFC3711"/> or <xref target="RFC3551"/>.</t> |
- <t>This payload format transports Opus encoded speech or audio data, |
- hence, security issues include confidentiality, integrity protection, and |
- authentication of the speech or audio itself. The Opus payload format does |
- not have any built-in security mechanisms. Any suitable external |
+ <t>This payload format transports Opus-encoded speech or audio data. |
+ Hence, security issues include confidentiality, integrity protection, and |
+ authentication of the speech or audio itself. Opus does not provide |
+ any confidentiality or integrity protection. Any suitable external |
mechanisms, such as SRTP <xref target="RFC3711"/>, MAY be used.</t> |
<t>This payload format and the Opus encoding do not exhibit any |
@@ -907,26 +884,33 @@ |
</section> |
<section title='Acknowledgements'> |
- <t>TBD</t> |
+ <t>Many people have made useful comments and suggestions contributing to this document. |
+ In particular, we would like to thank |
+ Tina le Grand, Cullen Jennings, Jonathan Lennox, Gregory Maxwell, Colin Perkins, Jan Skoglund, |
+ Timothy B. Terriberry, Martin Thompson, Justin Uberti, Magnus Westerlund, and Mo Zanaty.</t> |
</section> |
</middle> |
<back> |
<references title="Normative References"> |
&rfc2119; |
+ &rfc3389; |
&rfc3550; |
&rfc3711; |
&rfc3551; |
- &rfc4288; |
+ &rfc6838; |
&rfc4855; |
&rfc4566; |
&rfc3264; |
- &rfc2974; |
&rfc2326; |
&rfc5576; |
&rfc6562; |
&rfc6716; |
</references> |
+ <references title="Informative References"> |
+ &rfc2974; |
+ </references> |
+ |
</back> |
</rfc> |