OLD | NEW |
(Empty) | |
| 1 <?xml version="1.0" encoding="UTF-8"?> |
| 2 <!DOCTYPE rfc SYSTEM "rfc2629.dtd" [ |
| 3 <!ENTITY rfc2119 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.
RFC.2119.xml'> |
| 4 <!ENTITY rfc3389 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.
RFC.3389.xml'> |
| 5 <!ENTITY rfc3550 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.
RFC.3550.xml'> |
| 6 <!ENTITY rfc3711 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.
RFC.3711.xml'> |
| 7 <!ENTITY rfc3551 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.
RFC.3551.xml'> |
| 8 <!ENTITY rfc6838 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.
RFC.6838.xml'> |
| 9 <!ENTITY rfc4855 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.
RFC.4855.xml'> |
| 10 <!ENTITY rfc4566 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.
RFC.4566.xml'> |
| 11 <!ENTITY rfc4585 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.
RFC.4585.xml'> |
| 12 <!ENTITY rfc3264 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.
RFC.3264.xml'> |
| 13 <!ENTITY rfc2974 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.
RFC.2974.xml'> |
| 14 <!ENTITY rfc2326 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.
RFC.2326.xml'> |
| 15 <!ENTITY rfc3555 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.
RFC.3555.xml'> |
| 16 <!ENTITY rfc5124 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.
RFC.5124.xml'> |
| 17 <!ENTITY rfc5405 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.
RFC.5405.xml'> |
| 18 <!ENTITY rfc5576 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.
RFC.5576.xml'> |
| 19 <!ENTITY rfc6562 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.
RFC.6562.xml'> |
| 20 <!ENTITY rfc6716 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.
RFC.6716.xml'> |
| 21 <!ENTITY rfc7202 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.
RFC.7202.xml'> |
| 22 <!ENTITY nbsp " "> |
| 23 ]> |
| 24 |
| 25 <rfc category="std" ipr="trust200902" docName="draft-ietf-payload-rtp-opus-11"
> |
| 26 <?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?> |
| 27 |
| 28 <?rfc strict="yes" ?> |
| 29 <?rfc toc="yes" ?> |
| 30 <?rfc tocdepth="3" ?> |
| 31 <?rfc tocappendix='no' ?> |
| 32 <?rfc tocindent='yes' ?> |
| 33 <?rfc symrefs="yes" ?> |
| 34 <?rfc sortrefs="yes" ?> |
| 35 <?rfc compact="no" ?> |
| 36 <?rfc subcompact="yes" ?> |
| 37 <?rfc iprnotified="yes" ?> |
| 38 |
| 39 <front> |
| 40 <title abbrev="RTP Payload Format for Opus"> |
| 41 RTP Payload Format for the Opus Speech and Audio Codec |
| 42 </title> |
| 43 |
| 44 <author fullname="Julian Spittka" initials="J." surname="Spittka"> |
| 45 <address> |
| 46 <email>jspittka@gmail.com</email> |
| 47 </address> |
| 48 </author> |
| 49 |
| 50 <author initials='K.' surname='Vos' fullname='Koen Vos'> |
| 51 <organization>vocTone</organization> |
| 52 <address> |
| 53 <postal> |
| 54 <street></street> |
| 55 <code></code> |
| 56 <city></city> |
| 57 <region></region> |
| 58 <country></country> |
| 59 </postal> |
| 60 <email>koenvos74@gmail.com</email> |
| 61 </address> |
| 62 </author> |
| 63 |
| 64 <author initials="JM" surname="Valin" fullname="Jean-Marc Valin"> |
| 65 <organization>Mozilla</organization> |
| 66 <address> |
| 67 <postal> |
| 68 <street>331 E. Evelyn Avenue</street> |
| 69 <city>Mountain View</city> |
| 70 <region>CA</region> |
| 71 <code>94041</code> |
| 72 <country>USA</country> |
| 73 </postal> |
| 74 <email>jmvalin@jmvalin.ca</email> |
| 75 </address> |
| 76 </author> |
| 77 |
| 78 <date day='14' month='April' year='2015' /> |
| 79 |
| 80 <abstract> |
| 81 <t> |
| 82 This document defines the Real-time Transport Protocol (RTP) payload |
| 83 format for packetization of Opus encoded |
| 84 speech and audio data necessary to integrate the codec in the |
| 85 most compatible way. It also provides an applicability statement |
| 86 for the use of Opus over RTP. Further, it describes media type registrations |
| 87 for the RTP payload format. |
| 88 </t> |
| 89 </abstract> |
| 90 </front> |
| 91 |
| 92 <middle> |
| 93 <section title='Introduction'> |
| 94 <t> |
| 95 Opus <xref target="RFC6716"/> is a speech and audio codec developed within the |
| 96 IETF Internet Wideband Audio Codec working group. The codec |
| 97 has a very low algorithmic delay and it |
| 98 is highly scalable in terms of audio bandwidth, bitrate, and |
| 99 complexity. Further, it provides different modes to efficiently encode speech signals |
| 100 as well as music signals, thus making it the codec of choice for |
| 101 various applications using the Internet or similar networks. |
| 102 </t> |
| 103 <t> |
| 104 This document defines the Real-time Transport Protocol (RTP) |
| 105 <xref target="RFC3550"/> payload format for packetization |
| 106 of Opus encoded speech and audio data necessary to |
| 107 integrate Opus in the |
| 108 most compatible way. It also provides an applicability statement |
| 109 for the use of Opus over RTP. |
| 110 Further, it describes media type registrations for |
| 111 the RTP payload format. |
| 112 </t> |
| 113 </section> |
| 114 |
| 115 <section title='Conventions, Definitions and Acronyms used in this document'
> |
| 116 <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", |
| 117 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this |
| 118 document are to be interpreted as described in <xref target="RFC2119"/>.</
t> |
| 119 <t> |
| 120 <list style='hanging'> |
| 121 <t hangText="audio bandwidth:"> The range of audio frequencies being coded</t> |
| 122 <t hangText="CBR:"> Constant bitrate</t> |
| 123 <t hangText="CPU:"> Central Processing Unit</t> |
| 124 <t hangText="DTX:"> Discontinuous transmission</t> |
| 125 <t hangText="FEC:"> Forward error correction</t> |
| 126 <t hangText="IP:"> Internet Protocol</t> |
| 127 <t hangText="samples:"> Speech or audio samples (per channel)</t> |
| 128 <t hangText="SDP:"> Session Description Protocol</t> |
| 129 <t hangText="VBR:"> Variable bitrate</t> |
| 130 </list> |
| 131 </t> |
| 132 <t> |
| 133 Throughout this document, we refer to the following definitions: |
| 134 </t> |
| 135 <texttable anchor='bandwidth_definitions'> |
| 136 <ttcol align='center'>Abbreviation</ttcol> |
| 137 <ttcol align='center'>Name</ttcol> |
| 138 <ttcol align='center'>Audio Bandwidth (Hz)</ttcol> |
| 139 <ttcol align='center'>Sampling Rate (Hz)</ttcol> |
| 140 <c>NB</c> |
| 141 <c>Narrowband</c> |
| 142 <c>0 - 4000</c> |
| 143 <c>8000</c> |
| 144 |
| 145 <c>MB</c> |
| 146 <c>Mediumband</c> |
| 147 <c>0 - 6000</c> |
| 148 <c>12000</c> |
| 149 |
| 150 <c>WB</c> |
| 151 <c>Wideband</c> |
| 152 <c>0 - 8000</c> |
| 153 <c>16000</c> |
| 154 |
| 155 <c>SWB</c> |
| 156 <c>Super-wideband</c> |
| 157 <c>0 - 12000</c> |
| 158 <c>24000</c> |
| 159 |
| 160 <c>FB</c> |
| 161 <c>Fullband</c> |
| 162 <c>0 - 20000</c> |
| 163 <c>48000</c> |
| 164 |
| 165 <postamble> |
| 166 Audio bandwidth naming |
| 167 </postamble> |
| 168 </texttable> |
| 169 </section> |
| 170 |
| 171 <section title='Opus Codec'> |
| 172 <t> |
| 173 Opus encodes speech |
| 174 signals as well as general audio signals. Two different modes can be |
| 175 chosen, a voice mode or an audio mode, to allow the most efficient coding |
| 176 depending on the type of the input signal, the sampling frequency of the |
| 177 input signal, and the intended application. |
| 178 </t> |
| 179 |
| 180 <t> |
| 181 The voice mode allows efficient encoding of voice signals at lower |
| 182 bitrates, while the audio mode is optimized for general audio signals at medium and |
| 183 higher bitrates. |
| 184 </t> |
| 185 |
| 186 <t> |
| 187 Opus is highly scalable in terms of audio |
| 188 bandwidth, bitrate, and complexity. Further, Opus allows |
| 189 transmitting stereo signals with in-band signaling in the bit-stream. |
| 190 </t> |
| 191 |
| 192 <section title='Network Bandwidth'> |
| 193 <t> |
| 194 Opus supports bitrates from 6 kb/s to 510 kb/s. |
| 195 The bitrate can be changed dynamically within that range. |
| 196 All |
| 197 other parameters being |
| 198 equal, higher bitrates result in higher audio quality. |
| 199 </t> |
| 200 <section title='Recommended Bitrate' anchor='bitrate_by_bandwidth'> |
| 201 <t> |
| 202 For a frame size of |
| 203 20 ms, these |
| 204 are the bitrate "sweet spots" for Opus in various configurations: |
| 205 |
| 206 <list style="symbols"> |
| 207 <t>8-12 kb/s for NB speech,</t> |
| 208 <t>16-20 kb/s for WB speech,</t> |
| 209 <t>28-40 kb/s for FB speech,</t> |
| 210 <t>48-64 kb/s for FB mono music, and</t> |
| 211 <t>64-128 kb/s for FB stereo music.</t> |
| 212 </list> |
| 213 </t> |
| 214 </section> |
| 215 <section title='Variable versus Constant Bitrate' anchor='variable-vs-c
onstant-bitrate'> |
| 216 <t> |
| 217 For the same average bitrate, variable bitrate (VBR) can achieve higher audio quality |
| 218 than constant bitrate (CBR). For the majority of voice transmission applications, VBR |
| 219 is the best choice. One reason for choosing CBR is the potential |
| 220 information leak that <spanx style='emph'>might</spanx> occur when encrypting the |
| 221 compressed stream. See <xref target="RFC6562"/> for guidelines on when VBR is |
| 222 appropriate for encrypted audio communications. In the case where an existing |
| 223 VBR stream needs to be converted to CBR for security reasons, then the Opus padding |
| 224 mechanism described in <xref target="RFC6716"/> is the RECOMMENDED way to achieve padding |
| 225 because the RTP padding bit is unencrypted.</t> |
| 226 |
| 227 <t> |
| 228 The bitrate can be adjusted at any point in time. To avoid congestio
n, |
| 229 the average bitrate SHOULD NOT exceed the available |
| 230 network bandwidth. If no target bitrate is specified, the bitrates s
pecified in |
| 231 <xref target='bitrate_by_bandwidth'/> are RECOMMENDED. |
| 232 </t> |
| 233 |
| 234 </section> |
| 235 |
| 236 <section title='Discontinuous Transmission (DTX)'> |
| 237 |
| 238 <t> |
| 239 Opus can, as described in <xref target='variable-vs-constant-bitrate
'/>, |
| 240 be operated with a variable bitrate. In that case, the encoder will |
| 241 automatically reduce the bitrate for certain input signals, like per
iods |
| 242 of silence. When using continuous transmission, it will reduce the |
| 243 bitrate when the characteristics of the input signal permit, but |
| 244 will never interrupt the transmission to the receiver. Therefore, th
e |
| 245 received signal will maintain the same high level of audio quality o
ver the |
| 246 full duration of a transmission while minimizing the average |
| 247 bitrate over time. |
| 248 </t> |
| 249 |
| 250 <t> |
| 251 In cases where the bitrate of Opus needs to be reduced even |
| 252 further or in cases where only constant bitrate is available, |
| 253 the Opus encoder can use discontinuous |
| 254 transmission (DTX), where parts of the encoded signal that |
| 255 correspond to periods of silence in the input speech or audio signal |
| 256 are not transmitted to the receiver. A receiver can distinguish |
| 257 between DTX and packet loss by looking for gaps in the sequence |
| 258 number, as described by Section 4.1 |
| 259 of <xref target="RFC3551"/>. |
| 260 </t> |
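The distinction between DTX and packet loss described above can be sketched as follows. This is a minimal illustration, not part of the payload format: the function name `classify_gap` and its return labels are hypothetical, and it assumes the receiver tracks the sequence number, timestamp, and duration (in 48000 Hz ticks) of the previously received packet.

```python
def classify_gap(prev_seq, prev_ts, prev_duration, seq, ts):
    """Classify the relation between the previous packet and a new one.

    prev_duration is the duration of the previous packet in 48 kHz ticks.
    The labels returned here are illustrative, not defined by the draft.
    """
    seq_delta = (seq - prev_seq) & 0xFFFF      # sequence numbers are 16 bits
    ts_delta = (ts - prev_ts) & 0xFFFFFFFF     # timestamps are 32 bits

    if seq_delta == 0:
        return "duplicate"      # same packet received again
    if seq_delta >= 0x8000:
        return "reordered"      # packet from before the previous one
    if seq_delta > 1:
        return "loss"           # gap in sequence numbers: packets missing
    if ts_delta > prev_duration:
        return "dtx"            # consecutive sequence number, timestamp jump
    return "continuous"
```

With a 20 ms packet (960 ticks), a following packet whose sequence number advances by one but whose timestamp advances by more than 960 indicates that the sender skipped frames deliberately.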
| 261 |
| 262 <t> |
| 263 On the receiving side, the non-transmitted parts will be handled by
a |
| 264 frame loss concealment unit in the Opus decoder which generates a |
| 265 comfort noise signal to replace the non-transmitted parts of the |
| 266 speech or audio signal. Use of <xref target="RFC3389"/> Comfort |
| 267 Noise (CN) with Opus is discouraged. |
| 268 The transmitter MUST drop whole frames only, |
| 269 based on the size of the last transmitted frame, |
| 270 to ensure successive RTP timestamps differ by a multiple of 120 and |
| 271 to allow the receiver to use whole frames for concealment. |
| 272 </t> |
| 273 |
| 274 <t> |
| 275 DTX can be used with both variable and constant bitrate. |
| 276 It will have a slightly lower speech or audio |
| 277 quality than continuous transmission. Therefore, using continuous |
| 278 transmission is RECOMMENDED unless constraints on available network
bandwidth |
| 279 are severe. |
| 280 </t> |
| 281 |
| 282 </section> |
| 283 |
| 284 </section> |
| 285 |
| 286 <section title='Complexity'> |
| 287 |
| 288 <t> |
| 289 Complexity of the encoder can be scaled to optimize for CPU resources
in real-time, mostly as |
| 290 a trade-off between audio quality and bitrate. Also, different modes o
f Opus have different complexity. |
| 291 </t> |
| 292 |
| 293 </section> |
| 294 |
| 295 <section title="Forward Error Correction (FEC)"> |
| 296 |
| 297 <t> |
| 298 The voice mode of Opus allows for embedding "in-band" forward error co
rrection (FEC) |
| 299 data into the Opus bit stream. This FEC scheme adds |
| 300 redundant information about the previous packet (N-1) to the current |
| 301 output packet N. For |
| 302 each frame, the encoder decides whether to use FEC based on (1) an |
| 303 externally-provided estimate of the channel's packet loss rate; (2) an |
| 304 externally-provided estimate of the channel's capacity; (3) the |
| 305 sensitivity of the audio or speech signal to packet loss; (4) whether |
| 306 the receiving decoder has indicated it can take advantage of "in-band" |
| 307 FEC information. The decision to send "in-band" FEC information is |
| 308 entirely controlled by the encoder and therefore no special precaution
s |
| 309 for the payload have to be taken. |
| 310 </t> |
| 311 |
| 312 <t> |
| 313 On the receiving side, the decoder can take advantage of this |
| 314 additional information when it loses a packet and the next packet |
| 315 is available. In order to use the FEC data, the jitter buffer needs |
| 316 to provide access to payloads with the FEC data. |
| 317 Instead of performing loss concealment for a missing packet, the |
| 318 receiver can then configure its decoder to decode the FEC data from th
e next packet. |
| 319 </t> |
| 320 |
| 321 <t> |
| 322 Any compliant Opus decoder is capable of ignoring |
| 323 FEC information when it is not needed, so encoding with FEC cannot cau
se |
| 324 interoperability problems. |
| 325 However, if FEC cannot be used on the receiving side, then FEC |
| 326 SHOULD NOT be used, as it leads to an inefficient usage of network |
| 327 resources. Decoder support for FEC SHOULD be indicated at the time a |
| 328 session is set up. |
| 329 </t> |
| 330 |
| 331 </section> |
| 332 |
| 333 <section title='Stereo Operation'> |
| 334 |
| 335 <t> |
| 336 Opus allows for transmission of stereo audio signals. This operation |
| 337 is signaled in-band in the Opus bit-stream and no special arrangement |
| 338 is needed in the payload format. An |
| 339 Opus decoder is capable of handling a stereo encoding, but an |
| 340 application might only be capable of consuming a single audio |
| 341 channel. |
| 342 </t> |
| 343 <t> |
| 344 If a decoder cannot take advantage of the benefits of a stereo signal, |
| 345 this SHOULD be indicated at the time a session is set up. In that case |
| 346 the sending side SHOULD NOT send stereo signals as it leads to an |
| 347 inefficient usage of network resources. |
| 348 </t> |
| 349 |
| 350 </section> |
| 351 |
| 352 </section> |
| 353 |
| 354 <section title='Opus RTP Payload Format' anchor='opus-rtp-payload-format'> |
| 355 <t>The payload format for Opus consists of the RTP header and Opus payload |
| 356 data.</t> |
| 357 <section title='RTP Header Usage'> |
| 358 <t>The format of the RTP header is specified in <xref target="RFC3550"/>
. |
| 359 The use of the fields of the RTP header by the Opus payload format is |
| 360 consistent with that specification.</t> |
| 361 |
| 362 <t>The payload length of Opus is an integer number of octets and |
| 363 therefore no padding is necessary. The payload MAY be padded by an |
| 364 integer number of octets according to <xref target="RFC3550"/>, |
| 365 although the Opus internal padding is preferred.</t> |
| 366 |
| 367 <t>The timestamp, sequence number, and marker bit (M) of the RTP header |
| 368 are used in accordance with Section 4.1 |
| 369 of <xref target="RFC3551"/>.</t> |
| 370 |
| 371 <t>The RTP payload type for Opus is to be assigned dynamically.</t> |
| 372 |
| 373 <t>The receiving side MUST be prepared to receive duplicate RTP |
| 374 packets. The receiver MUST provide at most one of those payloads to the |
| 375 Opus decoder for decoding, and MUST discard the others.</t> |
| 376 |
| 377 <t>Opus supports 5 different audio bandwidths, which can be adjusted dur
ing |
| 378 a stream. |
| 379 The RTP timestamp is incremented with a 48000 Hz clock rate |
| 380 for all modes of Opus and all sampling rates. |
| 381 The unit |
| 382 for the timestamp is samples per single (mono) channel. The RTP timestam
p corresponds to the |
| 383 sample time of the first encoded sample in the encoded frame. |
| 384 For data encoded with sampling rates other than 48000 Hz, |
| 385 the sampling rate has to be adjusted to 48000 Hz.</t> |
| 386 |
| 387 </section> |
| 388 |
| 389 <section title='Payload Structure'> |
| 390 <t> |
| 391 The Opus encoder can output encoded frames representing 2.5, 5, 10, 20, |
| 392 40, or 60 ms of speech or audio data. Further, an arbitrary number of frames can be |
| 393 combined into a packet, up to a maximum packet duration representing |
| 394 120 ms of speech or audio data. The grouping of one or more Opus |
| 395 frames into a single Opus packet is defined in Section 3 of |
| 396 <xref target="RFC6716"/>. An RTP payload MUST contain exactly one |
| 397 Opus packet as defined by that document. |
| 398 </t> |
| 399 |
| 400 <t><xref target='payload-structure'/> shows the structure combined with
the RTP header.</t> |
| 401 |
| 402 <figure anchor="payload-structure" |
| 403 title="Packet structure with RTP header"> |
| 404 <artwork align="center"> |
| 405 <![CDATA[ |
| 406 +----------+--------------+ |
| 407 |RTP Header| Opus Payload | |
| 408 +----------+--------------+ |
| 409 ]]> |
| 410 </artwork> |
| 411 </figure> |
| 412 |
| 413 <t> |
| 414 <xref target='opus-packetization'/> shows supported frame sizes in |
| 415 milliseconds of encoded speech or audio data for the speech and audio
modes |
| 416 (Mode) and sampling rates (fs) of Opus and shows how the timestamp is |
| 417 incremented for packetization (ts incr). If the Opus encoder |
| 418 outputs multiple encoded frames into a single packet, the timestamp |
| 419 increment is the sum of the increments for the individual frames. |
| 420 </t> |
| 421 |
| 422 <texttable anchor='opus-packetization' title="Supported Opus frame |
| 423 sizes and timestamp increments marked with an o. Unsupported marked wit
h an x."> |
| 424 <ttcol align='center'>Mode</ttcol> |
| 425 <ttcol align='center'>fs</ttcol> |
| 426 <ttcol align='center'>2.5</ttcol> |
| 427 <ttcol align='center'>5</ttcol> |
| 428 <ttcol align='center'>10</ttcol> |
| 429 <ttcol align='center'>20</ttcol> |
| 430 <ttcol align='center'>40</ttcol> |
| 431 <ttcol align='center'>60</ttcol> |
| 432 <c>ts incr</c> |
| 433 <c>all</c> |
| 434 <c>120</c> |
| 435 <c>240</c> |
| 436 <c>480</c> |
| 437 <c>960</c> |
| 438 <c>1920</c> |
| 439 <c>2880</c> |
| 440 <c>voice</c> |
| 441 <c>NB/MB/WB/SWB/FB</c> |
| 442 <c>x</c> |
| 443 <c>x</c> |
| 444 <c>o</c> |
| 445 <c>o</c> |
| 446 <c>o</c> |
| 447 <c>o</c> |
| 448 <c>audio</c> |
| 449 <c>NB/WB/SWB/FB</c> |
| 450 <c>o</c> |
| 451 <c>o</c> |
| 452 <c>o</c> |
| 453 <c>o</c> |
| 454 <c>x</c> |
| 455 <c>x</c> |
| 456 </texttable> |
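The increments in the table follow from the fixed 48000 Hz RTP clock, which advances 48 ticks per millisecond. The sketch below reproduces them and also shows the rounding used for "ptime" and "maxptime" values. The helper names `timestamp_increment` and `ptime_value` are hypothetical, and the sketch assumes all frames in a packet share one duration; it does not check mode or bandwidth restrictions.

```python
import math
from fractions import Fraction

RTP_CLOCK_HZ = 48000
FRAME_SIZES_MS = {Fraction(5, 2), Fraction(5), Fraction(10),
                  Fraction(20), Fraction(40), Fraction(60)}
MAX_PACKET_MS = 120


def _packet_duration_ms(frame_ms, frame_count):
    """Validate the inputs and return the exact packet duration in ms."""
    frame = Fraction(frame_ms)
    if frame not in FRAME_SIZES_MS:
        raise ValueError(f"unsupported frame size: {frame_ms} ms")
    if frame_count < 1:
        raise ValueError("a packet carries at least one frame")
    total = frame * frame_count
    if total > MAX_PACKET_MS:
        raise ValueError(f"packet duration {total} ms exceeds 120 ms")
    return total


def timestamp_increment(frame_ms, frame_count=1):
    """RTP timestamp increment for one packet, in 48 kHz ticks."""
    total = _packet_duration_ms(frame_ms, frame_count)
    return int(total * RTP_CLOCK_HZ / 1000)


def ptime_value(frame_ms, frame_count=1):
    """Packet duration in ms, rounded up to the next full integer."""
    return math.ceil(_packet_duration_ms(frame_ms, frame_count))
```

For example, a packet holding three 20 ms frames advances the timestamp by 2880, and a single 2.5 ms frame is signaled as a packet time of 3.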
| 457 |
| 458 </section> |
| 459 |
| 460 </section> |
| 461 |
| 462 <section title='Congestion Control'> |
| 463 |
| 464 <t>The target bitrate of Opus can be adjusted at any point in time, thus |
| 465 allowing efficient congestion control. Furthermore, the amount |
| 466 of encoded speech or audio data encoded in a |
| 467 single packet can be used for congestion control, since the transmission |
| 468 rate is inversely proportional to the packet duration. A lower packet |
| 469 transmission rate reduces the amount of header overhead, but at the same |
| 470 time increases latency and loss sensitivity, so it ought to be used with |
| 471 care.</t> |
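The trade-off between packet duration and header overhead can be made concrete with a short calculation. The sketch below assumes a 40-octet header per packet (20 for IPv4 without options, 8 for UDP, 12 for RTP without CSRC entries or header extensions); these sizes and the function names are assumptions for illustration, and any security or tunneling overhead would add to them.

```python
# Assumed per-packet header size: IPv4 (20) + UDP (8) + RTP (12) octets.
IPV4_UDP_RTP_HEADER_BYTES = 20 + 8 + 12


def packets_per_second(packet_ms):
    """Packet transmission rate for a given packet duration."""
    return 1000 / packet_ms


def header_overhead_bps(packet_ms, header_bytes=IPV4_UDP_RTP_HEADER_BYTES):
    """Bitrate consumed by headers alone, in bits per second."""
    return header_bytes * 8 * packets_per_second(packet_ms)


if __name__ == "__main__":
    for ms in (2.5, 20, 60, 120):
        print(f"{ms:>5} ms packets: {header_overhead_bps(ms):9.0f} b/s of headers")
```

Under these assumptions, 20 ms packets spend 16000 b/s on headers, which is comparable to the recommended bitrate for WB speech, while 60 ms packets spend one third of that at the cost of higher latency.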
| 472 |
| 473 <t>Since UDP does not provide congestion control, applications that use |
| 474 RTP over UDP SHOULD implement their own congestion control above the |
| 475 UDP layer <xref target="RFC5405"/>. Work in the rmcat working group |
| 476 <xref target="rmcat"/> describes the |
| 477 interactions and conceptual interfaces necessary between the application |
| 478 components that relate to congestion control, including the RTP layer, |
| 479 the higher-level media codec control layer, and the lower-level |
| 480 transport interface, as well as components dedicated to congestion |
| 481 control functions.</t> |
| 482 </section> |
| 483 |
| 484 <section title='IANA Considerations'> |
| 485 <t>One media subtype (audio/opus) has been defined and registered as |
| 486 described in the following section.</t> |
| 487 |
| 488 <section title='Opus Media Type Registration'> |
| 489 <t>Media type registration is done according to <xref |
| 490 target="RFC6838"/> and <xref target="RFC4855"/>.<vspace |
| 491 blankLines='1'/></t> |
| 492 |
| 493 <t>Type name: audio<vspace blankLines='1'/></t> |
| 494 <t>Subtype name: opus<vspace blankLines='1'/></t> |
| 495 |
| 496 <t>Required parameters:</t> |
| 497 <t><list style="hanging"> |
| 498 <t hangText="rate:"> the RTP timestamp is incremented with a |
| 499 48000 Hz clock rate for all modes of Opus and all sampling |
| 500 rates. For data encoded with sampling rates other than 48000 Hz, |
| 501 the sampling rate has to be adjusted to 48000 Hz. |
| 502 </t> |
| 503 </list></t> |
| 504 |
| 505 <t>Optional parameters:</t> |
| 506 |
| 507 <t><list style="hanging"> |
| 508 <t hangText="maxplaybackrate:"> |
| 509 a hint about the maximum output sampling rate that the receiver is |
| 510 capable of rendering in Hz. |
| 511 The decoder MUST be capable of decoding |
| 512 any audio bandwidth but due to hardware limitations only signals |
| 513 up to the specified sampling rate can be played back. Sending sign
als |
| 514 with higher audio bandwidth results in higher than necessary netwo
rk |
| 515 usage and encoding complexity, so an encoder SHOULD NOT encode |
| 516 frequencies above the audio bandwidth specified by maxplaybackrate
. |
| 517 This parameter can take any value between 8000 and 48000, although |
| 518 commonly the value will match one of the Opus bandwidths |
| 519 (<xref target="bandwidth_definitions"/>). |
| 520 By default, the receiver is assumed to have no limitations, i.e. 48000. |
| 521 <vspace blankLines='1'/> |
| 522 </t> |
| 523 |
| 524 <t hangText="sprop-maxcapturerate:"> |
| 525 a hint about the maximum input sampling rate that the sender is li
kely to produce. |
| 526 This is not a guarantee that the sender will never send any higher
bandwidth |
| 527 (e.g. it could send a pre-recorded prompt that uses a higher bandw
idth), but it |
| 528 indicates to the receiver that frequencies above this maximum can
safely be discarded. |
| 529 This parameter is useful to avoid wasting receiver resources by op
erating the audio |
| 530 processing pipeline (e.g. echo cancellation) at a higher rate than
necessary. |
| 531 This parameter can take any value between 8000 and 48000, although |
| 532 commonly the value will match one of the Opus bandwidths |
| 533 (<xref target="bandwidth_definitions"/>). |
| 534 By default, the sender is assumed to have no limitations, i.e. 48000. |
| 535 <vspace blankLines='1'/> |
| 536 </t> |
| 537 |
| 538 <t hangText="maxptime:"> the maximum duration of media represented |
| 539 by a packet (according to Section 6 of |
| 540 <xref target="RFC4566"/>) that a decoder wants to receive, in |
| 541 milliseconds rounded up to the next full integer value. |
| 542 Possible values are 3, 5, 10, 20, 40, 60, or an arbitrary |
| 543 multiple of an Opus frame size rounded up to the next full integer |
| 544 value, up to a maximum value of 120, as |
| 545 defined in <xref target='opus-rtp-payload-format'/>. If no value is |
| 546 specified, the default is 120. |
| 547 <vspace blankLines='1'/></t> |
| 548 |
| 549 <t hangText="ptime:"> the preferred duration of media represented |
| 550 by a packet (according to Section 6 of |
| 551 <xref target="RFC4566"/>) that a decoder wants to receive, in |
| 552 milliseconds rounded up to the next full integer value. |
| 553 Possible values are 3, 5, 10, 20, 40, 60, or an arbitrary |
| 554 multiple of an Opus frame size rounded up to the next full integer |
| 555 value, up to a maximum value of 120, as defined in <xref |
| 556 target='opus-rtp-payload-format'/>. If no value is |
| 557 specified, the default is 20. |
| 558 <vspace blankLines='1'/></t> |
| 559 |
| 560 <t hangText="maxaveragebitrate:"> specifies the maximum average |
| 561 receive bitrate of a session in bits per second (b/s). The actual |
| 562 value of the bitrate can vary, as it is dependent on the |
| 563 characteristics of the media in a packet. Note that the maximum |
| 564 average bitrate MAY be modified dynamically during a session. Any |
| 565 positive integer is allowed, but values outside the range |
| 566 6000 to 510000 SHOULD be ignored. If no value is specified, the |
| 567 maximum value specified in <xref target='bitrate_by_bandwidth'/> |
| 568 for the corresponding mode of Opus and corresponding maxplaybackrate |
| 569 is the default.<vspace blankLines='1'/></t> |
| 570 |
| 571 <t hangText="stereo:"> |
| 572 specifies whether the decoder prefers receiving stereo or mono sig
nals. |
| 573 Possible values are 1 and 0 where 1 specifies that stereo signals
are preferred, |
| 574 and 0 specifies that only mono signals are preferred. |
| 575 Independent of the stereo parameter every receiver MUST be able to
receive and |
| 576 decode stereo signals but sending stereo signals to a receiver tha
t signaled a |
| 577 preference for mono signals may result in higher than necessary ne
twork |
| 578 utilization and encoding complexity. If no value is specified, |
| 579 the default is 0 (mono).<vspace blankLines='1'/> |
| 580 </t> |
| 581 |
| 582 <t hangText="sprop-stereo:"> |
| 583 specifies whether the sender is likely to produce stereo audio. |
| 584 Possible values are 1 and 0, where 1 specifies that stereo signals
are likely to |
| 585 be sent, and 0 specifies that the sender will likely only send mon
o. |
| 586 This is not a guarantee that the sender will never send stereo aud
io |
| 587 (e.g. it could send a pre-recorded prompt that uses stereo), but i
t |
| 588 indicates to the receiver that the received signal can be safely d
ownmixed to mono. |
| 589 This parameter is useful to avoid wasting receiver resources by op
erating the audio |
| 590 processing pipeline (e.g. echo cancellation) in stereo when not ne
cessary. |
| 591 If no value is specified, the default is 0 |
| 592 (mono).<vspace blankLines='1'/> |
| 593 </t> |
| 594 |
| 595 <t hangText="cbr:"> |
| 596 specifies if the decoder prefers the use of a constant bitrate ver
sus |
| 597 variable bitrate. Possible values are 1 and 0, where 1 specifies c
onstant |
| 598 bitrate and 0 specifies variable bitrate. If no value is specified
, |
| 599 the default is 0 (vbr). When cbr is 1, the maximum average bitrate
can still |
| 600 change, e.g. to adapt to changing network conditions.<vspace blank
Lines='1'/> |
| 601 </t> |
| 602 |
| 603 <t hangText="useinbandfec:"> specifies that the decoder has the capa
bility to |
| 604 take advantage of the Opus in-band FEC. Possible values are 1 and 0. |
| 605 Providing 0 when FEC cannot be used on the receiving side is |
| 606 RECOMMENDED. If no |
| 607 value is specified, useinbandfec is assumed to be 0. |
| 608 This parameter is only a preference and the receiver MUST be able to
process |
| 609 packets that include FEC information, even if it means the FEC part
is discarded. |
| 610 <vspace blankLines='1'/></t> |
| 611 |
| 612 <t hangText="usedtx:"> specifies if the decoder prefers the use of |
| 613 DTX. Possible values are 1 and 0. If no value is specified, the |
| 614 default is 0.<vspace blankLines='1'/></t> |
| 615 </list></t> |
| 616 |
| 617 <t>Encoding considerations:<vspace blankLines='1'/></t> |
| 618 <t><list style="hanging"> |
| 619 <t>The Opus media type is framed and consists of binary data accordi
ng |
| 620 to Section 4.8 in <xref target="RFC6838"/>.</t> |
| 621 </list></t> |
| 622 |
| 623 <t>Security considerations: </t> |
| 624 <t><list style="hanging"> |
| 625 <t>See <xref target='security-considerations'/> of this document.</t
> |
| 626 </list></t> |
| 627 |
| 628 <t>Interoperability considerations: none<vspace blankLines='1'/></t> |
| 629 <t>Published specification: RFC [XXXX]</t> |
| 630 <t>Note to the RFC Editor: Replace [XXXX] with the number of the publi
shed |
| 631 RFC.<vspace blankLines='1'/></t> |
| 632 |
| 633 <t>Applications that use this media type: </t> |
| 634 <t><list style="hanging"> |
| 635 <t>Any application that requires the transport of |
| 636 speech or audio data can use this media type. Examples include, |
| 637 but are not limited to, audio and video conferencing, Voice over IP, |
| 638 and media streaming.</t> |
| 639 </list></t> |
| 640 |
| 641 <t>Fragment identifier considerations: N/A<vspace blankLines='1'/></t> |
| 642 |
| 643 <t>Person & email address to contact for further information:</t> |
| 644 <t><list style="hanging"> |
| 645 <t>SILK Support silksupport@skype.net</t> |
| 646 <t>Jean-Marc Valin jmvalin@jmvalin.ca</t> |
| 647 </list></t> |
| 648 |
| 649 <t>Intended usage: COMMON<vspace blankLines='1'/></t> |
| 650 |
| 651 <t>Restrictions on usage:<vspace blankLines='1'/></t> |
| 652 |
| 653 <t><list style="hanging"> |
| 654 <t>For transfer over RTP, the RTP payload format (<xref |
| 655 target='opus-rtp-payload-format'/> of this document) SHALL be |
| 656 used.</t> |
| 657 </list></t> |
| 658 |
| 659 <t>Author:</t> |
| 660 <t><list style="hanging"> |
| 661 <t>Julian Spittka jspittka@gmail.com<vspace blankLines='1'/></t> |
| 662 <t>Koen Vos koenvos74@gmail.com<vspace blankLines='1'/></t> |
| 663 <t>Jean-Marc Valin jmvalin@jmvalin.ca<vspace blankLines='1'/></t> |
| 664 </list></t> |
| 665 |
| 666 <t> Change controller: IETF Payload Working Group delegated from the I
ESG</t> |
| 667 </section> |
| 668 </section> |
| 669 |
| 670 <section title='SDP Considerations'> |
| 671 <t>The information described in the media type specification has a |
| 672 specific mapping to fields in the Session Description Protocol (SDP) |
| 673 <xref target="RFC4566"/>, which is commonly used to describe RTP |
| 674 sessions. When SDP is used to specify sessions employing Opus, |
| 675 the mapping is as follows:</t> |
| 676 |
| 677 <t> |
| 678 <list style="symbols"> |
| 679 <t>The media type ("audio") goes in SDP "m=" as the media name.</t> |
| 680 |
| 681 <t>The media subtype ("opus") goes in SDP "a=rtpmap" as the encoding |
| 682 name. The RTP clock rate in "a=rtpmap" MUST be 48000 and the number
of |
| 683 channels MUST be 2.</t> |
| 684 |
| 685 <t>The OPTIONAL media type parameters "ptime" and "maxptime" are |
| 686 mapped to "a=ptime" and "a=maxptime" attributes, respectively, in th
e |
| 687 SDP.</t> |
| 688 |
| 689 <t>The OPTIONAL media type parameters "maxaveragebitrate", |
| 690 "maxplaybackrate", "stereo", "cbr", "useinbandfec", and |
| 691 "usedtx", when present, MUST be included in the "a=fmtp" attribute |
| 692 in the SDP, expressed as a media type string in the form of a |
| 693 semicolon-separated list of parameter=value pairs (e.g., |
| 694 maxplaybackrate=48000). They MUST NOT be specified in an |
| 695 SSRC-specific "fmtp" source-level attribute (as defined in |
| 696 Section 6.3 of <xref target="RFC5576"/>).</t> |
| 697 |
| 698 <t>The OPTIONAL media type parameters "sprop-maxcapturerate" |
| 699 and "sprop-stereo" MAY be mapped to the "a=fmtp" SDP attribute by |
| 700 copying them directly from the media type parameter string as part |
| 701 of the semicolon-separated list of parameter=value pairs (e.g., |
| 702 sprop-stereo=1). These same OPTIONAL media type parameters MAY also |
| 703 be specified using an SSRC-specific "fmtp" source-level attribute |
| 704 as described in Section 6.3 of <xref target="RFC5576"/>. |
| 705 They MAY be specified in both places, in which case the parameter |
| 706 in the source-level attribute overrides the one found on the |
| 707 "a=fmtp" line. The value of any parameter that is not specified in |
| 708 a source-level attribute MUST be taken from the "a=fmtp" |
| 709 line, if it is present there.</t> |
| 710 |
| 711 </list> |
| 712 </t> |
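<t>The override rule for the "sprop-" parameters can be sketched as follows. This is a minimal illustration, not part of the payload format: the function names parse_params and effective_sprop are hypothetical, and the inputs are assumed to be the parameter strings that follow the payload type on the media-level "a=fmtp" line and in the source-level "fmtp" attribute.</t>

```python
# Hypothetical helpers; only the precedence rule comes from the text above.
SPROP_PARAMS = ("sprop-maxcapturerate", "sprop-stereo")


def parse_params(value):
    """Split a 'name=value; name=value' string into a dict of strings."""
    params = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, val = item.partition("=")
        params[name.strip()] = val.strip()
    return params


def effective_sprop(media_fmtp, source_fmtp=""):
    """Return the sender-only parameters that apply to one SSRC.

    A value in the source-level attribute overrides the media-level one;
    a parameter missing at source level is taken from the "a=fmtp" line
    if it is present there.
    """
    media = parse_params(media_fmtp)
    source = parse_params(source_fmtp)
    result = {}
    for name in SPROP_PARAMS:
        if name in source:
            result[name] = source[name]
        elif name in media:
            result[name] = media[name]
    return result
```

<t>The sketch deliberately ignores receive-only parameters such as "stereo" in the source-level string, since those are not allowed there.</t>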
| 713 |
| 714 <t>Below are some examples of SDP session descriptions for Opus:</t> |
| 715 |
| 716 <t>Example 1: Standard mono session with 48000 Hz clock rate</t> |
| 717 <figure> |
| 718 <artwork> |
| 719 <![CDATA[ |
| 720 m=audio 54312 RTP/AVP 101 |
| 721 a=rtpmap:101 opus/48000/2 |
| 722 ]]> |
| 723 </artwork> |
| 724 </figure> |
| 725 |
| 726 |
| 727 <t>Example 2: 16000 Hz clock rate, maximum packet size of 40 ms, |
| 728 recommended packet size of 40 ms, maximum average bitrate of 20000 bps, |
| 729 prefers to receive stereo but only plans to send mono, FEC is desired, |
| 730 DTX is not desired</t> |
| 731 |
| 732 <figure> |
| 733 <artwork> |
| 734 <![CDATA[ |
| 735 m=audio 54312 RTP/AVP 101 |
| 736 a=rtpmap:101 opus/48000/2 |
| 737 a=fmtp:101 maxplaybackrate=16000; sprop-maxcapturerate=16000; |
| 738 maxaveragebitrate=20000; stereo=1; useinbandfec=1; usedtx=0 |
| 739 a=ptime:40 |
| 740 a=maxptime:40 |
| 741 ]]> |
| 742 </artwork> |
| 743 </figure> |
| 744 |
| 745 <t>Example 3: Two-way full-band stereo preferred</t> |
| 746 |
| 747 <figure> |
| 748 <artwork> |
| 749 <![CDATA[ |
| 750 m=audio 54312 RTP/AVP 101 |
| 751 a=rtpmap:101 opus/48000/2 |
| 752 a=fmtp:101 stereo=1; sprop-stereo=1 |
| 753 ]]> |
| 754 </artwork> |
| 755 </figure> |
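<t>As an illustration of how a receiver might read descriptions like the examples above, the following sketch extracts the Opus payload type, the "a=fmtp" parameters, and the packet time attributes. The function name parse_opus_media is hypothetical. The sketch assumes each attribute arrives as a single line (the wrapped "a=fmtp" line in Example 2 is treated as one line) and that the encoding name is compared case-insensitively.</t>

```python
def parse_opus_media(sdp_lines):
    """Parse the Opus-related lines of one SDP media description (sketch)."""
    lines = [ln.strip() for ln in sdp_lines if ln.strip()]

    # Find the payload type mapped to Opus and check the fixed rate/channels.
    payload_type = None
    for ln in lines:
        if ln.startswith("a=rtpmap:"):
            pt, _, encoding = ln[len("a=rtpmap:"):].partition(" ")
            parts = encoding.strip().split("/")
            if parts[0].lower() != "opus":
                continue
            if parts[1:] != ["48000", "2"]:
                raise ValueError("Opus rtpmap must be opus/48000/2, got " + encoding)
            payload_type = pt
            break
    if payload_type is None:
        raise ValueError("no Opus rtpmap line found")

    result = {"payload_type": payload_type, "fmtp": {}}
    for ln in lines:
        if ln.startswith("a=fmtp:" + payload_type + " "):
            # Everything after the first space is the parameter list.
            body = ln.partition(" ")[2]
            for item in body.split(";"):
                name, sep, val = item.strip().partition("=")
                if sep:
                    result["fmtp"][name] = val.strip()
        elif ln.startswith("a=ptime:"):
            result["ptime"] = int(ln.split(":", 1)[1])
        elif ln.startswith("a=maxptime:"):
            result["maxptime"] = int(ln.split(":", 1)[1])
    return result
```

<t>Values in the "fmtp" dictionary are left as strings so that a caller can apply its own validation and defaults.</t>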
| 756 |
| 757 |
| 758 <section title='SDP Offer/Answer Considerations'> |
| 759 |
| 760 <t>When using the offer-answer procedure described in <xref |
| 761 target="RFC3264"/> to negotiate the use of Opus, the following |
| 762 considerations apply:</t> |
| 763 |
| 764 <t><list style="symbols"> |
| 765 |
| 766 <t>Opus supports several clock rates. For signaling purposes, only |
| 767 the highest, i.e., 48000, is used. The actual clock rate of the |
| 768 corresponding media is signaled inside the payload and is not |
| 769 restricted by this payload format description. The decoder MUST be |
| 770 capable of decoding every received clock rate. An example |
| 771 is shown below: |
| 772 |
| 773 <figure> |
| 774 <artwork> |
| 775 <![CDATA[ |
| 776 m=audio 54312 RTP/AVP 100 |
| 777 a=rtpmap:100 opus/48000/2 |
| 778 ]]> |
| 779 </artwork> |
| 780 </figure> |
| 781 </t> |
| 782 |
| 783 <t>The "ptime" and "maxptime" parameters are unidirectional |
| 784 receive-only parameters and typically will not compromise |
| 785 interoperability; however, some values might cause application |
| 786 performance to suffer. <xref |
| 787 target="RFC3264"/> defines the SDP offer-answer handling of the |
| 788 "ptime" parameter. The "maxptime" parameter MUST be handled in the |
| 789 same way.</t> |
| 790 |
| 791 <t> |
| 792 The "maxplaybackrate" parameter is a unidirectional receive-only |
| 793 parameter that reflects limitations of the local receiver. When |
| 794 sending to a single destination, a sender MUST NOT use an audio |
| 795 bandwidth higher than necessary to make full use of audio sampled at |
| 796 a sampling rate of "maxplaybackrate". Gateways or senders that |
| 797 are sending the same encoded audio to multiple destinations |
| 798 SHOULD NOT use an audio bandwidth higher than necessary to |
| 799 represent audio sampled at "maxplaybackrate", as this would lead |
| 800 to inefficient use of network resources. |
| 801 The "maxplaybackrate" parameter does not |
| 802 affect interoperability. Also, this parameter SHOULD NOT be used |
| 803 to adjust the audio bandwidth as a function of the bitrate, as this |
| 804 is the responsibility of the Opus encoder implementation. |
| 805 </t> |
| 806 |
| 807 <t>The "maxaveragebitrate" parameter is a unidirectional receive-only |
| 808 parameter that reflects limitations of the local receiver. The sender |
| 809 on the other side MUST NOT send with an average bitrate higher than |
| 810 "maxaveragebitrate", as it might overload the network and/or |
| 811 receiver. The "maxaveragebitrate" parameter typically will not |
| 812 compromise interoperability; however, some values might cause |
| 813 application performance to suffer, and ought to be set with |
| 814 care.</t> |
| 815 |
| 816 <t>The "sprop-maxcapturerate" and "sprop-stereo" parameters are |
| 817 unidirectional sender-only parameters that reflect limitations of |
| 818 the sender side. |
| 819 They allow the receiver to set up a reduced-complexity audio |
| 820 processing pipeline if the sender is not planning to use the full |
| 821 range of Opus's capabilities. |
| 822 Neither "sprop-maxcapturerate" nor "sprop-stereo" affects |
| 823 interoperability, and the receiver MUST be capable of receiving any signal. |
| 824 </t> |
| 825 |
| 826 <t> |
| 827 The "stereo" parameter is a unidirectional receive-only |
| 828 parameter. When sending to a single destination, a sender MUST |
| 829 NOT use stereo when "stereo" is 0. Gateways or senders that are |
| 830 sending the same encoded audio to multiple destinations SHOULD |
| 831 NOT use stereo when "stereo" is 0, as this would lead to |
| 832 inefficient use of network resources. The "stereo" parameter does |
| 833 not affect interoperability. |
| 834 </t> |
| 835 |
| 836 <t> |
| 837 The "cbr" parameter is a unidirectional receive-only |
| 838 parameter. |
| 839 </t> |
| 840 |
| 841 <t>The "useinbandfec" parameter is a unidirectional receive-only |
| 842 parameter.</t> |
| 843 |
| 844 <t>The "usedtx" parameter is a unidirectional receive-only |
| 845 parameter.</t> |
| 846 |
| 847 <t>Any unknown parameter in an offer MUST be ignored by the receiver |
| 848 and MUST be removed from the answer.</t> |
| 849 |
| 850 </list></t> |
| 851 |
| 852 <t> |
| 853 The Opus parameters in an SDP Offer/Answer exchange are completely |
| 854 orthogonal, and there is no relationship between the SDP Offer and |
| 855 the Answer. |
| 856 </t> |
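<t>Because the offer and answer parameters are independent, an answerer can build its "a=fmtp" line from its own preferences alone, while discarding anything it does not recognize in the offer. The following sketch shows one way to do that. The names KNOWN_PARAMS, known_offer_params, and build_answer_fmtp are hypothetical, and the payload type is assumed to be chosen by the caller.</t>

```python
# Parameter names taken from the SDP mapping described in this section.
KNOWN_PARAMS = {
    "maxplaybackrate", "sprop-maxcapturerate", "maxaveragebitrate",
    "stereo", "sprop-stereo", "cbr", "useinbandfec", "usedtx",
}


def known_offer_params(offer_fmtp):
    """Drop unknown parameters from a parsed offer (they are ignored)."""
    return {k: v for k, v in offer_fmtp.items() if k in KNOWN_PARAMS}


def build_answer_fmtp(payload_type, local_prefs):
    """Build the answer's a=fmtp line from local preferences only.

    Nothing is copied from the offer, so unknown offer parameters can
    never leak into the answer. Returns None if there is nothing to say.
    """
    items = [
        "{}={}".format(name, value)
        for name, value in local_prefs.items()
        if name in KNOWN_PARAMS
    ]
    if not items:
        return None
    return "a=fmtp:{} {}".format(payload_type, "; ".join(items))
```

<t>The parsed offer is still useful to the answerer as input to its own sending decisions; it just does not determine what the answer declares.</t>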
| 857 </section> |
| 858 |
| 859 <section title='Declarative SDP Considerations for Opus'> |
| 860 |
| 861 <t>For declarative use of SDP with Opus, such as in the Session Announcement |
| 862 Protocol (SAP) <xref target="RFC2974"/> and RTSP <xref target="RFC2326"/>, |
| 863 the following needs to be considered:</t> |
| 864 |
| 865 <t><list style="symbols"> |
| 866 |
| 867 <t>The values for "maxptime", "ptime", "maxplaybackrate", and |
| 868 "maxaveragebitrate" ought to be selected carefully to ensure that |
| 869 reasonable performance can be achieved for the participants of a session.</t> |
| 870 |
| 871 <t> |
| 872 The values for "maxptime" and "ptime" of the payload |
| 873 format configuration are recommendations by the decoding side to ensure |
| 874 the best performance for the decoder. |
| 875 </t> |
| 876 |
| 877 <t>All other parameters of the payload format configuration are declarative |
| 878 and a participant MUST use the configurations that are provided for |
| 879 the session. More than one configuration can be provided if necessary |
| 880 by declaring multiple RTP payload types; however, the number of types |
| 881 ought to be kept small.</t> |
| 882 </list></t> |
| 883 </section> |
| 884 </section> |
| 885 |
| 886 <section title='Security Considerations' anchor='security-considerations'> |
| 887 |
| 888 <t>Use of variable bitrate (VBR) is subject to the security considerations in |
| 889 <xref target="RFC6562"/>.</t> |
| 890 |
| 891 <t>RTP packets using the payload format defined in this specification |
| 892 are subject to the security considerations discussed in the RTP |
| 893 specification <xref target="RFC3550"/>, and in any applicable RTP profile, such as |
| 894 RTP/AVP <xref target="RFC3551"/>, RTP/AVPF <xref target="RFC4585"/>, |
| 895 RTP/SAVP <xref target="RFC3711"/> or RTP/SAVPF <xref target="RFC5124"/>. |
| 896 However, as "Securing the RTP Protocol Framework: |
| 897 Why RTP Does Not Mandate a Single Media Security Solution" |
| 898 <xref target="RFC7202"/> discusses, it is not an RTP payload |
| 899 format's responsibility to discuss or mandate what solutions are used |
| 900 to meet the basic security goals like confidentiality, integrity and |
| 901 source authenticity for RTP in general. This responsibility lies with |
| 902 anyone using RTP in an application. They can find guidance on |
| 903 available security mechanisms and important considerations in Options |
| 904 for Securing RTP Sessions [I-D.ietf-avtcore-rtp-security-options]. |
| 905 Applications SHOULD use one or more appropriate strong security |
| 906 mechanisms.</t> |
| 907 |
| 908 <t>This payload format and the Opus encoding do not exhibit any |
| 909 significant non-uniformity in the receiver-end computational load and thus |
| 910 are unlikely to pose a denial-of-service threat due to the receipt of |
| 911 pathological datagrams.</t> |
| 912 </section> |
| 913 |
| 914 <section title='Acknowledgements'> |
| 915 <t>Many people have made useful comments and suggestions contributing to this document. |
| 916 In particular, we would like to thank |
| 917 Tina le Grand, Cullen Jennings, Jonathan Lennox, Gregory Maxwell, Colin Perkins, Jan Skoglund, |
| 918 Timothy B. Terriberry, Martin Thompson, Justin Uberti, Magnus Westerlund, and Mo Zanaty.</t> |
| 919 </section> |
| 920 </middle> |
| 921 |
| 922 <back> |
| 923 <references title="Normative References"> |
| 924 &rfc2119; |
| 925 &rfc3389; |
| 926 &rfc3550; |
| 927 &rfc3711; |
| 928 &rfc3551; |
| 929 &rfc6838; |
| 930 &rfc4855; |
| 931 &rfc4566; |
| 932 &rfc3264; |
| 933 &rfc2326; |
| 934 &rfc5576; |
| 935 &rfc6562; |
| 936 &rfc6716; |
| 937 </references> |
| 938 |
| 939 <references title="Informative References"> |
| 940 &rfc2974; |
| 941 &rfc4585; |
| 942 &rfc5124; |
| 943 &rfc5405; |
| 944 &rfc7202; |
| 945 |
| 946 <reference anchor='rmcat' target='https://datatracker.ietf.org/wg/rmcat/documents/'> |
| 947 <front> |
| 948 <title>rmcat documents</title> |
| 949 <author/> |
| 950 <date/> |
| 951 <abstract> |
| 952 <t></t> |
| 953 </abstract></front> |
| 954 </reference> |
| 955 |
| 956 |
| 957 </references> |
| 958 |
| 959 </back> |
| 960 </rfc> |