third_party/opus/src/doc/draft-ietf-payload-rtp-opus.xml - Issue 2962373002: [Opus] Update to v1.2.1

Side by Side Diff: third_party/opus/src/doc/draft-ietf-payload-rtp-opus.xml

Issue 2962373002: [Opus] Update to v1.2.1 (Closed)

Patch Set: Pre-increment instead of post-increment Created 3 years, 5 months ago


OLD	NEW
(Empty)
	1 <?xml version="1.0" encoding="UTF-8"?>

	2 <!DOCTYPE rfc SYSTEM "rfc2629.dtd" [

	3 <!ENTITY rfc2119 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml'>

	4 <!ENTITY rfc3389 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3389.xml'>

	5 <!ENTITY rfc3550 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3550.xml'>

	6 <!ENTITY rfc3711 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3711.xml'>

	7 <!ENTITY rfc3551 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3551.xml'>

	8 <!ENTITY rfc6838 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.6838.xml'>

	9 <!ENTITY rfc4855 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4855.xml'>

	10 <!ENTITY rfc4566 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4566.xml'>

	11 <!ENTITY rfc4585 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4585.xml'>

	12 <!ENTITY rfc3264 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3264.xml'>

	13 <!ENTITY rfc2974 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2974.xml'>

	14 <!ENTITY rfc2326 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2326.xml'>

	15 <!ENTITY rfc3555 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3555.xml'>

	16 <!ENTITY rfc5124 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.5124.xml'>

	17 <!ENTITY rfc5405 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.5405.xml'>

	18 <!ENTITY rfc5576 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.5576.xml'>

	19 <!ENTITY rfc6562 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.6562.xml'>

	20 <!ENTITY rfc6716 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.6716.xml'>

	21 <!ENTITY rfc7202 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.7202.xml'>

	22 <!ENTITY nbsp " ">

	23 ]>

	24

	25 <rfc category="std" ipr="trust200902" docName="draft-ietf-payload-rtp-opus-11" >

	26 <?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>

	27

	28 <?rfc strict="yes" ?>

	29 <?rfc toc="yes" ?>

	30 <?rfc tocdepth="3" ?>

	31 <?rfc tocappendix='no' ?>

	32 <?rfc tocindent='yes' ?>

	33 <?rfc symrefs="yes" ?>

	34 <?rfc sortrefs="yes" ?>

	35 <?rfc compact="no" ?>

	36 <?rfc subcompact="yes" ?>

	37 <?rfc iprnotified="yes" ?>

	38

	39 <front>

	40 <title abbrev="RTP Payload Format for Opus">

	41 RTP Payload Format for the Opus Speech and Audio Codec

	42 </title>

	43

	44 <author fullname="Julian Spittka" initials="J." surname="Spittka">

	45 <address>

	46 <email>jspittka@gmail.com</email>

	47 </address>

	48 </author>

	49

	50 <author initials='K.' surname='Vos' fullname='Koen Vos'>

	51 <organization>vocTone</organization>

	52 <address>

	53 <postal>

	54 <street></street>

	55 <code></code>

	56 <city></city>

	57 <region></region>

	58 <country></country>

	59 </postal>

	60 <email>koenvos74@gmail.com</email>

	61 </address>

	62 </author>

	63

	64 <author initials="JM" surname="Valin" fullname="Jean-Marc Valin">

	65 <organization>Mozilla</organization>

	66 <address>

	67 <postal>

	68 <street>331 E. Evelyn Avenue</street>

	69 <city>Mountain View</city>

	70 <region>CA</region>

	71 <code>94041</code>

	72 <country>USA</country>

	73 </postal>

	74 <email>jmvalin@jmvalin.ca</email>

	75 </address>

	76 </author>

	77

	78 <date day='14' month='April' year='2015' />

	79

	80 <abstract>

	81 <t>

	82 This document defines the Real-time Transport Protocol (RTP) payload

	83 format for packetization of Opus encoded

	84 speech and audio data necessary to integrate the codec in the

	85 most compatible way. It also provides an applicability statement

	86 for the use of Opus over RTP. Further, it describes media type registrations

	87 for the RTP payload format.

	88 </t>

	89 </abstract>

	90 </front>

	91

	92 <middle>

	93 <section title='Introduction'>

	94 <t>

	95 Opus <xref target="RFC6716"/> is a speech and audio codec developed within the

	96 IETF Internet Wideband Audio Codec working group. The codec

	97 has a very low algorithmic delay and it

	98 is highly scalable in terms of audio bandwidth, bitrate, and

	99 complexity. Further, it provides different modes to efficiently encode speech signals

	100 as well as music signals, thus making it the codec of choice for

	101 various applications using the Internet or similar networks.

	102 </t>

	103 <t>

	104 This document defines the Real-time Transport Protocol (RTP)

	105 <xref target="RFC3550"/> payload format for packetization

	106 of Opus encoded speech and audio data necessary to

	107 integrate Opus in the

	108 most compatible way. It also provides an applicability statement

	109 for the use of Opus over RTP.

	110 Further, it describes media type registrations for

	111 the RTP payload format.

	112 </t>

	113 </section>

	114

	115 <section title='Conventions, Definitions and Acronyms used in this document'>

	116 <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",

	117 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this

	118 document are to be interpreted as described in <xref target="RFC2119"/>.</t>

	119 <t>

	120 <list style='hanging'>

	121 <t hangText="audio bandwidth:"> The range of audio frequencies being coded</t>

	122 <t hangText="CBR:"> Constant bitrate</t>

	123 <t hangText="CPU:"> Central Processing Unit</t>

	124 <t hangText="DTX:"> Discontinuous transmission</t>

	125 <t hangText="FEC:"> Forward error correction</t>

	126 <t hangText="IP:"> Internet Protocol</t>

	127 <t hangText="samples:"> Speech or audio samples (per channel)</t>

	128 <t hangText="SDP:"> Session Description Protocol</t>

	129 <t hangText="VBR:"> Variable bitrate</t>

	130 </list>

	131 </t>

	132 <t>

	133 Throughout this document, we refer to the following definitions:

	134 </t>

	135 <texttable anchor='bandwidth_definitions'>

	136 <ttcol align='center'>Abbreviation</ttcol>

	137 <ttcol align='center'>Name</ttcol>

	138 <ttcol align='center'>Audio Bandwidth (Hz)</ttcol>

	139 <ttcol align='center'>Sampling Rate (Hz)</ttcol>

	140 <c>NB</c>

	141 <c>Narrowband</c>

	142 <c>0 - 4000</c>

	143 <c>8000</c>

	144

	145 <c>MB</c>

	146 <c>Mediumband</c>

	147 <c>0 - 6000</c>

	148 <c>12000</c>

	149

	150 <c>WB</c>

	151 <c>Wideband</c>

	152 <c>0 - 8000</c>

	153 <c>16000</c>

	154

	155 <c>SWB</c>

	156 <c>Super-wideband</c>

	157 <c>0 - 12000</c>

	158 <c>24000</c>

	159

	160 <c>FB</c>

	161 <c>Fullband</c>

	162 <c>0 - 20000</c>

	163 <c>48000</c>

	164

	165 <postamble>

	166 Audio bandwidth naming

	167 </postamble>

	168 </texttable>

	169 </section>

	170

	171 <section title='Opus Codec'>

	172 <t>

	173 Opus encodes speech

	174 signals as well as general audio signals. Two different modes can be

	175 chosen, a voice mode or an audio mode, to allow the most efficient coding

	176 depending on the type of the input signal, the sampling frequency of the

	177 input signal, and the intended application.

	178 </t>

	179

	180 <t>

	181 The voice mode allows efficient encoding of voice signals at lower bit

	182 rates while the audio mode is optimized for general audio signals at medium and

	183 higher bitrates.

	184 </t>

	185

	186 <t>

	187 Opus is highly scalable in terms of audio

	188 bandwidth, bitrate, and complexity. Further, Opus allows

	189 transmitting stereo signals with in-band signaling in the bit-stream.

	190 </t>

	191

	192 <section title='Network Bandwidth'>

	193 <t>

	194 Opus supports bitrates from 6 kb/s to 510 kb/s.

	195 The bitrate can be changed dynamically within that range.

	196 All

	197 other parameters being

	198 equal, higher bitrates result in higher audio quality.

	199 </t>

	200 <section title='Recommended Bitrate' anchor='bitrate_by_bandwidth'>

	201 <t>

	202 For a frame size of

	203 20 ms, these

	204 are the bitrate "sweet spots" for Opus in various configurations:

	205

	206 <list style="symbols">

	207 <t>8-12 kb/s for NB speech,</t>

	208 <t>16-20 kb/s for WB speech,</t>

	209 <t>28-40 kb/s for FB speech,</t>

	210 <t>48-64 kb/s for FB mono music, and</t>

	211 <t>64-128 kb/s for FB stereo music.</t>

	212 </list>

	213 </t>

	214 </section>

	215 <section title='Variable versus Constant Bitrate' anchor='variable-vs-constant-bitrate'>

	216 <t>

	217 For the same average bitrate, variable bitrate (VBR) can achieve higher audio quality

	218 than constant bitrate (CBR). For the majority of voice transmission applications, VBR

	219 is the best choice. One reason for choosing CBR is the potential

	220 information leak that <spanx style='emph'>might</spanx> occur when encrypting the

	221 compressed stream. See <xref target="RFC6562"/> for guidelines on when VBR is

	222 appropriate for encrypted audio communications. In the case where an existing

	223 VBR stream needs to be converted to CBR for security reasons, then the Opus padding

	224 mechanism described in <xref target="RFC6716"/> is the RECOMMENDED way to achieve padding

	225 because the RTP padding bit is unencrypted.</t>

	226

	227 <t>

	228 The bitrate can be adjusted at any point in time. To avoid congestion,

	229 the average bitrate SHOULD NOT exceed the available

	230 network bandwidth. If no target bitrate is specified, the bitrates specified in

	231 <xref target='bitrate_by_bandwidth'/> are RECOMMENDED.

	232 </t>

	233

	234 </section>

	235

	236 <section title='Discontinuous Transmission (DTX)'>

	237

	238 <t>

	239 Opus can, as described in <xref target='variable-vs-constant-bitrate'/>,

	240 be operated with a variable bitrate. In that case, the encoder will

	241 automatically reduce the bitrate for certain input signals, like periods

	242 of silence. When using continuous transmission, it will reduce the

	243 bitrate when the characteristics of the input signal permit, but

	244 will never interrupt the transmission to the receiver. Therefore, the

	245 received signal will maintain the same high level of audio quality over the

	246 full duration of a transmission while minimizing the average bit

	247 rate over time.

	248 </t>

	249

	250 <t>

	251 In cases where the bitrate of Opus needs to be reduced even

	252 further or in cases where only constant bitrate is available,

	253 the Opus encoder can use discontinuous

	254 transmission (DTX), where parts of the encoded signal that

	255 correspond to periods of silence in the input speech or audio signal

	256 are not transmitted to the receiver. A receiver can distinguish

	257 between DTX and packet loss by looking for gaps in the sequence

	258 number, as described by Section 4.1

	259 of <xref target="RFC3551"/>.

	260 </t>

	261

	262 <t>

	263 On the receiving side, the non-transmitted parts will be handled by a

	264 frame loss concealment unit in the Opus decoder which generates a

	265 comfort noise signal to replace the non transmitted parts of the

	266 speech or audio signal. Use of <xref target="RFC3389"/> Comfort

	267 Noise (CN) with Opus is discouraged.

	268 The transmitter MUST drop whole frames only,

	269 based on the size of the last transmitted frame,

	270 to ensure successive RTP timestamps differ by a multiple of 120 and

	271 to allow the receiver to use whole frames for concealment.

	272 </t>

	273

	274 <t>

	275 DTX can be used with both variable and constant bitrate.

	276 It will have a slightly lower speech or audio

	277 quality than continuous transmission. Therefore, using continuous

	278 transmission is RECOMMENDED unless constraints on available network bandwidth

	279 are severe.

	280 </t>

	281

	282 </section>
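
The DTX section above says a receiver can tell DTX apart from packet loss by looking for gaps in the sequence number. A minimal Python sketch of that check follows. It is illustrative only and not part of the draft: the function name `classify_gap` and its return labels are hypothetical, and it assumes packets are examined in arrival order without reordering, with 16-bit sequence numbers and 32-bit timestamps on the 48000 Hz clock.

```python
def classify_gap(prev_seq, prev_ts, prev_duration_ts, seq, ts):
    """Classify the gap between two consecutively received RTP packets.

    prev_duration_ts is the duration of the previous packet in 48 kHz
    timestamp units (for example 960 for 20 ms). All names here are
    hypothetical; reordered packets are not handled.
    """
    seq_delta = (seq - prev_seq) & 0xFFFF    # 16-bit sequence wraparound
    ts_delta = (ts - prev_ts) & 0xFFFFFFFF   # 32-bit timestamp wraparound

    if seq_delta == 0:
        return "duplicate"    # same sequence number seen again
    if seq_delta > 1:
        return "loss"         # sequence numbers were skipped
    if ts_delta > prev_duration_ts:
        return "dtx"          # consecutive sequence, but a jump in time
    return "continuous"
```

A "dtx" result means the sender skipped whole frames on purpose, so the receiver would run concealment for the silent interval rather than treating it as a network problem.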

	283

	284 </section>

	285

	286 <section title='Complexity'>

	287

	288 <t>

	289 Complexity of the encoder can be scaled to optimize for CPU resources in real-time, mostly as

	290 a trade-off between audio quality and bitrate. Also, different modes of Opus have different complexity.

	291 </t>

	292

	293 </section>

	294

	295 <section title="Forward Error Correction (FEC)">

	296

	297 <t>

	298 The voice mode of Opus allows for embedding "in-band" forward error correction (FEC)

	299 data into the Opus bit stream. This FEC scheme adds

	300 redundant information about the previous packet (N-1) to the current

	301 output packet N. For

	302 each frame, the encoder decides whether to use FEC based on (1) an

	303 externally-provided estimate of the channel's packet loss rate; (2) an

	304 externally-provided estimate of the channel's capacity; (3) the

	305 sensitivity of the audio or speech signal to packet loss; (4) whether

	306 the receiving decoder has indicated it can take advantage of "in-band"

	307 FEC information. The decision to send "in-band" FEC information is

	308 entirely controlled by the encoder and therefore no special precautions

	309 for the payload have to be taken.

	310 </t>

	311

	312 <t>

	313 On the receiving side, the decoder can take advantage of this

	314 additional information when it loses a packet and the next packet

	315 is available. In order to use the FEC data, the jitter buffer needs

	316 to provide access to payloads with the FEC data.

	317 Instead of performing loss concealment for a missing packet, the

	318 receiver can then configure its decoder to decode the FEC data from the next packet.

	319 </t>

	320

	321 <t>

	322 Any compliant Opus decoder is capable of ignoring

	323 FEC information when it is not needed, so encoding with FEC cannot cause

	324 interoperability problems.

	325 However, if FEC cannot be used on the receiving side, then FEC

	326 SHOULD NOT be used, as it leads to an inefficient usage of network

	327 resources. Decoder support for FEC SHOULD be indicated at the time a

	328 session is set up.

	329 </t>

	330

	331 </section>
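
The receiver-side behaviour described in the FEC section (use the FEC data in packet N+1 when packet N is missing, otherwise fall back to loss concealment) can be sketched as a small decision function. This is a hedged illustration, not the draft's algorithm: `recovery_strategy` and its return values are hypothetical names, and whether the next packet actually carries FEC data is something only the decoder can determine.

```python
def recovery_strategy(missing_seq, buffered_seqs, fec_enabled=True):
    """Decide how to fill the slot of a missing packet.

    buffered_seqs is the set of sequence numbers currently held in the
    jitter buffer. Returns ("fec", seq_of_next_packet) when the next
    packet is available, or ("plc", None) to request loss concealment.
    """
    next_seq = (missing_seq + 1) & 0xFFFF  # 16-bit sequence wraparound
    if fec_enabled and next_seq in buffered_seqs:
        return ("fec", next_seq)
    return ("plc", None)
```

The sketch makes the jitter buffer requirement visible: the FEC path is only reachable if the buffer can hand over the payload of the following packet before the missing slot has to be played out.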

	332

	333 <section title='Stereo Operation'>

	334

	335 <t>

	336 Opus allows for transmission of stereo audio signals. This operation

	337 is signaled in-band in the Opus bit-stream and no special arrangement

	338 is needed in the payload format. An

	339 Opus decoder is capable of handling a stereo encoding, but an

	340 application might only be capable of consuming a single audio

	341 channel.

	342 </t>

	343 <t>

	344 If a decoder cannot take advantage of the benefits of a stereo signal

	345 this SHOULD be indicated at the time a session is set up. In that case

	346 the sending side SHOULD NOT send stereo signals as it leads to an

	347 inefficient usage of network resources.

	348 </t>

	349

	350 </section>

	351

	352 </section>

	353

	354 <section title='Opus RTP Payload Format' anchor='opus-rtp-payload-format'>

	355 <t>The payload format for Opus consists of the RTP header and Opus payload

	356 data.</t>

	357 <section title='RTP Header Usage'>

	358 <t>The format of the RTP header is specified in <xref target="RFC3550"/>.

	359 The use of the fields of the RTP header by the Opus payload format is

	360 consistent with that specification.</t>

	361

	362 <t>The payload length of Opus is an integer number of octets and

	363 therefore no padding is necessary. The payload MAY be padded by an

	364 integer number of octets according to <xref target="RFC3550"/>,

	365 although the Opus internal padding is preferred.</t>

	366

	367 <t>The timestamp, sequence number, and marker bit (M) of the RTP header

	368 are used in accordance with Section 4.1

	369 of <xref target="RFC3551"/>.</t>

	370

	371 <t>The RTP payload type for Opus is to be assigned dynamically.</t>

	372

	373 <t>The receiving side MUST be prepared to receive duplicate RTP

	374 packets. The receiver MUST provide at most one of those payloads to the

	375 Opus decoder for decoding, and MUST discard the others.</t>

	376

	377 <t>Opus supports 5 different audio bandwidths, which can be adjusted during

	378 a stream.

	379 The RTP timestamp is incremented with a 48000 Hz clock rate

	380 for all modes of Opus and all sampling rates.

	381 The unit

	382 for the timestamp is samples per single (mono) channel. The RTP timestamp corresponds to the

	383 sample time of the first encoded sample in the encoded frame.

	384 For data encoded with sampling rates other than 48000 Hz,

	385 the sampling rate has to be adjusted to 48000 Hz.</t>

	386

	387 </section>

	388

	389 <section title='Payload Structure'>

	390 <t>

	391 The Opus encoder can output encoded frames representing 2.5, 5, 10, 20,

	392 40, or 60 ms of speech or audio data. Further, an arbitrary number of frames can be

	393 combined into a packet, up to a maximum packet duration representing

	394 120 ms of speech or audio data. The grouping of one or more Opus

	395 frames into a single Opus packet is defined in Section 3 of

	396 <xref target="RFC6716"/>. An RTP payload MUST contain exactly one

	397 Opus packet as defined by that document.

	398 </t>

	399

	400 <t><xref target='payload-structure'/> shows the structure combined with the RTP header.</t>

	401

	402 <figure anchor="payload-structure"

	403 title="Packet structure with RTP header">

	404 <artwork align="center">

	405 <![CDATA[

	406 +----------+--------------+

	407 |RTP Header| Opus Payload |

	408 +----------+--------------+

	409 ]]>

	410 </artwork>

	411 </figure>

	412

	413 <t>

	414 <xref target='opus-packetization'/> shows supported frame sizes in

	415 milliseconds of encoded speech or audio data for the speech and audio modes

	416 (Mode) and sampling rates (fs) of Opus and shows how the timestamp is

	417 incremented for packetization (ts incr). If the Opus encoder

	418 outputs multiple encoded frames into a single packet, the timestamp

	419 increment is the sum of the increments for the individual frames.

	420 </t>

	421

	422 <texttable anchor='opus-packetization' title="Supported Opus frame

	423 sizes and timestamp increments marked with an o. Unsupported marked with an x.">

	424 <ttcol align='center'>Mode</ttcol>

	425 <ttcol align='center'>fs</ttcol>

	426 <ttcol align='center'>2.5</ttcol>

	427 <ttcol align='center'>5</ttcol>

	428 <ttcol align='center'>10</ttcol>

	429 <ttcol align='center'>20</ttcol>

	430 <ttcol align='center'>40</ttcol>

	431 <ttcol align='center'>60</ttcol>

	432 <c>ts incr</c>

	433 <c>all</c>

	434 <c>120</c>

	435 <c>240</c>

	436 <c>480</c>

	437 <c>960</c>

	438 <c>1920</c>

	439 <c>2880</c>

	440 <c>voice</c>

	441 <c>NB/MB/WB/SWB/FB</c>

	442 <c>x</c>

	443 <c>x</c>

	444 <c>o</c>

	445 <c>o</c>

	446 <c>o</c>

	447 <c>o</c>

	448 <c>audio</c>

	449 <c>NB/WB/SWB/FB</c>

	450 <c>o</c>

	451 <c>o</c>

	452 <c>o</c>

	453 <c>o</c>

	454 <c>x</c>

	455 <c>x</c>

	456 </texttable>

	457

	458 </section>
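
The timestamp increments in the packetization table above follow directly from the 48000 Hz RTP clock: 48 timestamp units per millisecond, summed over the frames in a packet. A minimal Python sketch of that arithmetic follows. The function name `ts_increment` is hypothetical, and the sketch only checks frame durations and the 120 ms packet limit, not the other packet constraints of RFC 6716.

```python
from fractions import Fraction

RTP_CLOCK_HZ = 48000
FRAME_DURATIONS_MS = (Fraction(5, 2), 5, 10, 20, 40, 60)
MAX_PACKET_MS = 120


def ts_increment(frame_durations_ms):
    """Return the RTP timestamp increment for one packet.

    frame_durations_ms lists the duration in milliseconds of each Opus
    frame in the packet, for example [20] or [20, 20, 20].
    """
    frames = [Fraction(d) for d in frame_durations_ms]
    if not frames or any(f not in FRAME_DURATIONS_MS for f in frames):
        raise ValueError("unsupported Opus frame duration")
    total_ms = sum(frames)
    if total_ms > MAX_PACKET_MS:
        raise ValueError("packet exceeds 120 ms")
    # Exact rational arithmetic avoids rounding issues with 2.5 ms frames.
    return int(total_ms * RTP_CLOCK_HZ / 1000)
```

For example, `ts_increment([2.5])` gives 120 and `ts_increment([20, 20, 20])` gives 2880, matching the "ts incr" row for 2.5 ms and the sum rule for multi-frame packets.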

	459

	460 </section>

	461

	462 <section title='Congestion Control'>

	463

	464 <t>The target bitrate of Opus can be adjusted at any point in time, thus

	465 allowing efficient congestion control. Furthermore, the amount

	466 of encoded speech or audio data encoded in a

	467 single packet can be used for congestion control, since the transmission

	468 rate is inversely proportional to the packet duration. A lower packet

	469 transmission rate reduces the amount of header overhead, but at the same

	470 time increases latency and loss sensitivity, so it ought to be used with

	471 care.</t>

	472

	473 <t>Since UDP does not provide congestion control, applications that use

	474 RTP over UDP SHOULD implement their own congestion control above the

	475 UDP layer <xref target="RFC5405"/>. Work in the rmcat working group

	476 <xref target="rmcat"/> describes the

	477 interactions and conceptual interfaces necessary between the application

	478 components that relate to congestion control, including the RTP layer,

	479 the higher-level media codec control layer, and the lower-level

	480 transport interface, as well as components dedicated to congestion

	481 control functions.</t>

	482 </section>
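
The trade-off mentioned in the Congestion Control section, where longer packets reduce header overhead, can be quantified with a short calculation. The sketch below is illustrative only. The function name `header_overhead_bps` is hypothetical, and the 40-byte default is an assumption: IPv4 (20 bytes) plus UDP (8 bytes) plus a 12-byte RTP header with no extensions or CSRC entries.

```python
def header_overhead_bps(packet_ms, header_bytes=40):
    """Header bitrate in bits per second when one packet is sent
    every packet_ms milliseconds.

    header_bytes=40 assumes IPv4 + UDP + a minimal RTP header.
    """
    packets_per_second = 1000 / packet_ms
    return header_bytes * 8 * packets_per_second
```

Under these assumptions, 20 ms packets cost 16000 b/s in headers, which is comparable to the recommended bitrate for WB speech, while 60 ms packets cost about a third of that at the price of higher latency and loss sensitivity.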

	483

	484 <section title='IANA Considerations'>

	485 <t>One media subtype (audio/opus) has been defined and registered as

	486 described in the following section.</t>

	487

	488 <section title='Opus Media Type Registration'>

	489 <t>Media type registration is done according to <xref

	490 target="RFC6838"/> and <xref target="RFC4855"/>.<vspace

	491 blankLines='1'/></t>

	492

	493 <t>Type name: audio<vspace blankLines='1'/></t>

	494 <t>Subtype name: opus<vspace blankLines='1'/></t>

	495

	496 <t>Required parameters:</t>

	497 <t><list style="hanging">

	498 <t hangText="rate:"> the RTP timestamp is incremented with a

	499 48000 Hz clock rate for all modes of Opus and all sampling

	500 rates. For data encoded with sampling rates other than 48000 Hz,

	501 the sampling rate has to be adjusted to 48000 Hz.

	502 </t>

	503 </list></t>

	504

	505 <t>Optional parameters:</t>

	506

	507 <t><list style="hanging">

	508 <t hangText="maxplaybackrate:">

	509 a hint about the maximum output sampling rate that the receiver is

	510 capable of rendering in Hz.

	511 The decoder MUST be capable of decoding

	512 any audio bandwidth but due to hardware limitations only signals

	513 up to the specified sampling rate can be played back. Sending signals

	514 with higher audio bandwidth results in higher than necessary network

	515 usage and encoding complexity, so an encoder SHOULD NOT encode

	516 frequencies above the audio bandwidth specified by maxplaybackrate.

	517 This parameter can take any value between 8000 and 48000, although

	518 commonly the value will match one of the Opus bandwidths

	519 (<xref target="bandwidth_definitions"/>).

	520 By default, the receiver is assumed to have no limitations, i.e. 48000.

	521 <vspace blankLines='1'/>

	522 </t>

	523

	524 <t hangText="sprop-maxcapturerate:">

	525 a hint about the maximum input sampling rate that the sender is likely to produce.

	526 This is not a guarantee that the sender will never send any higher bandwidth

	527 (e.g. it could send a pre-recorded prompt that uses a higher bandwidth), but it

	528 indicates to the receiver that frequencies above this maximum can safely be discarded.

	529 This parameter is useful to avoid wasting receiver resources by operating the audio

	530 processing pipeline (e.g. echo cancellation) at a higher rate than necessary.

	531 This parameter can take any value between 8000 and 48000, although

	532 commonly the value will match one of the Opus bandwidths

	533 (<xref target="bandwidth_definitions"/>).

	534 By default, the sender is assumed to have no limitations, i.e. 48000.

	535 <vspace blankLines='1'/>

	536 </t>

	537

	538 <t hangText="maxptime:"> the maximum duration of media represented

	539 by a packet (according to Section 6 of

	540 <xref target="RFC4566"/>) that a decoder wants to receive, in

	541 milliseconds rounded up to the next full integer value.

	542 Possible values are 3, 5, 10, 20, 40, 60, or an arbitrary

	543 multiple of an Opus frame size rounded up to the next full integer

	544 value, up to a maximum value of 120, as

	545 defined in <xref target='opus-rtp-payload-format'/>. If no value is

	546 specified, the default is 120.

	547 <vspace blankLines='1'/></t>

	548

	549 <t hangText="ptime:"> the preferred duration of media represented

	550 by a packet (according to Section 6 of

	551 <xref target="RFC4566"/>) that a decoder wants to receive, in

	552 milliseconds rounded up to the next full integer value.

	553 Possible values are 3, 5, 10, 20, 40, 60, or an arbitrary

	554 multiple of an Opus frame size rounded up to the next full integer

	555 value, up to a maximum value of 120, as defined in <xref

	556 target='opus-rtp-payload-format'/>. If no value is

	557 specified, the default is 20.

	558 <vspace blankLines='1'/></t>

	559

	560 <t hangText="maxaveragebitrate:"> specifies the maximum average

	561 receive bitrate of a session in bits per second (b/s). The actual

	562 value of the bitrate can vary, as it is dependent on the

	563 characteristics of the media in a packet. Note that the maximum

	564 average bitrate MAY be modified dynamically during a session. Any

	565 positive integer is allowed, but values outside the range

	566 6000 to 510000 SHOULD be ignored. If no value is specified, the

	567 maximum value specified in <xref target='bitrate_by_bandwidth'/>

	568 for the corresponding mode of Opus and corresponding maxplaybackrate

	569 is the default.<vspace blankLines='1'/></t>

	570

	571 <t hangText="stereo:">

	572 specifies whether the decoder prefers receiving stereo or mono signals.

	573 Possible values are 1 and 0 where 1 specifies that stereo signals are preferred,

	574 and 0 specifies that only mono signals are preferred.

	575 Independent of the stereo parameter every receiver MUST be able to receive and

	576 decode stereo signals but sending stereo signals to a receiver that signaled a

	577 preference for mono signals may result in higher than necessary network

	578 utilization and encoding complexity. If no value is specified,

	579 the default is 0 (mono).<vspace blankLines='1'/>

	580 </t>

	581

	582 <t hangText="sprop-stereo:">

	583 specifies whether the sender is likely to produce stereo audio.

	584 Possible values are 1 and 0, where 1 specifies that stereo signals are likely to

	585 be sent, and 0 specifies that the sender will likely only send mono.

	586 This is not a guarantee that the sender will never send stereo audio

	587 (e.g. it could send a pre-recorded prompt that uses stereo), but it

	588 indicates to the receiver that the received signal can be safely downmixed to mono.

	589 This parameter is useful to avoid wasting receiver resources by operating the audio

	590 processing pipeline (e.g. echo cancellation) in stereo when not necessary.

	591 If no value is specified, the default is 0

	592 (mono).<vspace blankLines='1'/>

	593 </t>

	594

	595 <t hangText="cbr:">

	596 specifies if the decoder prefers the use of a constant bitrate versus

	597 variable bitrate. Possible values are 1 and 0, where 1 specifies constant

	598 bitrate and 0 specifies variable bitrate. If no value is specified,

	599 the default is 0 (vbr). When cbr is 1, the maximum average bitrate can still

	600 change, e.g. to adapt to changing network conditions.<vspace blankLines='1'/>

	601 </t>

	602

	603 <t hangText="useinbandfec:"> specifies that the decoder has the capability to

	604 take advantage of the Opus in-band FEC. Possible values are 1 and 0.

	605 Providing 0 when FEC cannot be used on the receiving side is

	606 RECOMMENDED. If no

	607 value is specified, useinbandfec is assumed to be 0.

	608 This parameter is only a preference and the receiver MUST be able to process

	609 packets that include FEC information, even if it means the FEC part is discarded.

	610 <vspace blankLines='1'/></t>

	611

	612 <t hangText="usedtx:"> specifies if the decoder prefers the use of

	613 DTX. Possible values are 1 and 0. If no value is specified, the

	614 default is 0.<vspace blankLines='1'/></t>

	615 </list></t>
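
Two of the optional parameters above involve small numeric rules: "ptime" and "maxptime" are packet durations rounded up to the next full integer (so a 2.5 ms packet is signaled as 3), and "maxaveragebitrate" values outside 6000 to 510000 are to be ignored. A minimal Python sketch of both rules follows. The helper names `ptime_value` and `usable_maxaveragebitrate` are hypothetical and not defined by the draft.

```python
import math


def ptime_value(packet_ms):
    """Round a packet duration in milliseconds up to the integer that
    would be signaled in "ptime" or "maxptime"."""
    if not 0 < packet_ms <= 120:
        raise ValueError("packet duration must be in (0, 120] ms")
    return math.ceil(packet_ms)


def usable_maxaveragebitrate(value):
    """Return the bitrate in b/s if it lies in the supported range,
    or None when the value should be ignored."""
    if 6000 <= value <= 510000:
        return value
    return None
```

Returning `None` for an out-of-range bitrate models "SHOULD be ignored": the caller would then fall back to the default for the mode in use rather than reject the session.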

	616

	617 <t>Encoding considerations:<vspace blankLines='1'/></t>

	618 <t><list style="hanging">

	619 <t>The Opus media type is framed and consists of binary data according

	620 to Section 4.8 in <xref target="RFC6838"/>.</t>

	621 </list></t>

	622

	623 <t>Security considerations: </t>

	624 <t><list style="hanging">

	625 <t>See <xref target='security-considerations'/> of this document.</t>

	626 </list></t>

	627

	628 <t>Interoperability considerations: none<vspace blankLines='1'/></t>

	629 <t>Published specification: RFC [XXXX]</t>

	630 <t>Note to the RFC Editor: Replace [XXXX] with the number of the published

	631 RFC.<vspace blankLines='1'/></t>

	632

	633 <t>Applications that use this media type: </t>

	634 <t><list style="hanging">

	635 <t>Any application that requires the transport of

	636 speech or audio data can use this media type. Some examples are,

	637 but not limited to, audio and video conferencing, Voice over IP,

	638 media streaming.</t>

	639 </list></t>

	640

	641 <t>Fragment identifier considerations: N/A<vspace blankLines='1'/></t>

	642

	643 <t>Person &amp; email address to contact for further information:</t>

	644 <t><list style="hanging">

	645 <t>SILK Support silksupport@skype.net</t>

	646 <t>Jean-Marc Valin jmvalin@jmvalin.ca</t>

	647 </list></t>

	648

	649 <t>Intended usage: COMMON<vspace blankLines='1'/></t>

	650

	651 <t>Restrictions on usage:<vspace blankLines='1'/></t>

	652

	653 <t><list style="hanging">

	654 <t>For transfer over RTP, the RTP payload format (<xref

	655 target='opus-rtp-payload-format'/> of this document) SHALL be

	656 used.</t>

	657 </list></t>

	658

	659 <t>Author:</t>

	660 <t><list style="hanging">

	661 <t>Julian Spittka jspittka@gmail.com<vspace blankLines='1'/></t>

	662 <t>Koen Vos koenvos74@gmail.com<vspace blankLines='1'/></t>

	663 <t>Jean-Marc Valin jmvalin@jmvalin.ca<vspace blankLines='1'/></t>

	664 </list></t>

	665

	666 <t> Change controller: IETF Payload Working Group delegated from the IESG</t>

	667 </section>

	668 </section>

	669

	670 <section title='SDP Considerations'>

	671 <t>The information described in the media type specification has a

	672 specific mapping to fields in the Session Description Protocol (SDP)

	673 <xref target="RFC4566"/>, which is commonly used to describe RTP

	674 sessions. When SDP is used to specify sessions employing Opus,

	675 the mapping is as follows:</t>

	676

	677 <t>

	678 <list style="symbols">

	679 <t>The media type ("audio") goes in SDP "m=" as the media name.</t>

	680

	681 <t>The media subtype ("opus") goes in SDP "a=rtpmap" as the encoding

	682 name. The RTP clock rate in "a=rtpmap" MUST be 48000 and the number of

	683 channels MUST be 2.</t>

	684

	685 <t>The OPTIONAL media type parameters "ptime" and "maxptime" are

	686 mapped to "a=ptime" and "a=maxptime" attributes, respectively, in the

	687 SDP.</t>

	688

	689 <t>The OPTIONAL media type parameters "maxaveragebitrate",

	690 "maxplaybackrate", "stereo", "cbr", "useinbandfec", and

	691 "usedtx", when present, MUST be included in the "a=fmtp" attribute

	692 in the SDP, expressed as a media type string in the form of a

	693 semicolon-separated list of parameter=value pairs (e.g.,

	694 maxplaybackrate=48000). They MUST NOT be specified in an

	695 SSRC-specific "fmtp" source-level attribute (as defined in

	696 Section 6.3 of <xref target="RFC5576"/>).</t>

	697

	698 <t>The OPTIONAL media type parameters "sprop-maxcapturerate",

	699 and "sprop-stereo" MAY be mapped to the "a=fmtp" SDP attribute by

	700 copying them directly from the media type parameter string as part

	701 of the semicolon-separated list of parameter=value pairs (e.g.,

	702 sprop-stereo=1). These same OPTIONAL media type parameters MAY also

	703 be specified using an SSRC-specific "fmtp" source-level attribute

	704 as described in Section 6.3 of <xref target="RFC5576"/>.

	705 They MAY be specified in both places, in which case the parameter

	706 in the source-level attribute overrides the one found on the

	707 "a=fmtp" line. The value of any parameter which is not specified in

	708 a source-level source attribute MUST be taken from the "a=fmtp"

	709 line, if it is present there.</t>

	710

	711 </list>

	712 </t>

	713

	714 <t>Below are some examples of SDP session descriptions for Opus:</t>

	715

	716 <t>Example 1: Standard mono session with 48000 Hz clock rate</t>

	717 <figure>

	718 <artwork>

	719 <![CDATA[

	720 m=audio 54312 RTP/AVP 101

	721 a=rtpmap:101 opus/48000/2

	722 ]]>

	723 </artwork>

	724 </figure>

	725

	726

	727 <t>Example 2: 16000 Hz clock rate, maximum packet size of 40 ms,

	728 recommended packet size of 40 ms, maximum average bitrate of 20000 bps,

	729 prefers to receive stereo but only plans to send mono, FEC is desired,

	730 DTX is not desired</t>

	731

	732 <figure>

	733 <artwork>

	734 <![CDATA[

	735 m=audio 54312 RTP/AVP 101

	736 a=rtpmap:101 opus/48000/2

	737 a=fmtp:101 maxplaybackrate=16000; sprop-maxcapturerate=16000;

	738 maxaveragebitrate=20000; stereo=1; useinbandfec=1; usedtx=0

	739 a=ptime:40

	740 a=maxptime:40

	741 ]]>

	742 </artwork>

	743 </figure>

	744

	745 <t>Example 3: Two-way full-band stereo preferred</t>

	746

	747 <figure>

	748 <artwork>

	749 <![CDATA[

	750 m=audio 54312 RTP/AVP 101

	751 a=rtpmap:101 opus/48000/2

	752 a=fmtp:101 stereo=1; sprop-stereo=1

	753 ]]>

	754 </artwork>

	755 </figure>

	756

	757

	758 <section title='SDP Offer/Answer Considerations'>

	759

	760 <t>When using the offer-answer procedure described in <xref

	761 target="RFC3264"/> to negotiate the use of Opus, the following

	762 considerations apply:</t>

	763

	764 <t><list style="symbols">

	765

	766 <t>Opus supports several clock rates. For signaling purposes only

	767 the highest, i.e. 48000, is used. The actual clock rate of the

	768 corresponding media is signaled inside the payload and is not

	769 restricted by this payload format description. The decoder MUST be

	770 capable of decoding every received clock rate. An example

	771 is shown below:

	772

	773 <figure>

	774 <artwork>

	775 <![CDATA[

	776 m=audio 54312 RTP/AVP 100

	777 a=rtpmap:100 opus/48000/2

	778 ]]>

	779 </artwork>

	780 </figure>

	781 </t>

	782

	783 <t>The "ptime" and "maxptime" parameters are unidirectional

	784 receive-only parameters and typically will not compromise

	785 interoperability; however, some values might cause application

	786 performance to suffer. <xref

	787 target="RFC3264"/> defines the SDP offer-answer handling of the

	788 "ptime" parameter. The "maxptime" parameter MUST be handled in the

	789 same way.</t>

	790

	791 <t>

	792 The "maxplaybackrate" parameter is a unidirectional receive-only

	793 parameter that reflects limitations of the local receiver. When

	794 sending to a single destination, a sender MUST NOT use an audio

	795 bandwidth higher than necessary to make full use of audio sampled at

	796 a sampling rate of "maxplaybackrate". Gateways or senders that

	797 are sending the same encoded audio to multiple destinations

	798 SHOULD NOT use an audio bandwidth higher than necessary to

	799 represent audio sampled at "maxplaybackrate", as this would lead

	800 to inefficient use of network resources.

	801 The "maxplaybackrate" parameter does not

	802 affect interoperability. Also, this parameter SHOULD NOT be used

	803 to adjust the audio bandwidth as a function of the bitrate, as this

	804 is the responsibility of the Opus encoder implementation.

	805 </t>

	806

	807 <t>The "maxaveragebitrate" parameter is a unidirectional receive-only

	808 parameter that reflects limitations of the local receiver. The sender

	809 of the other side MUST NOT send with an average bitrate higher than

	810 "maxaveragebitrate" as it might overload the network and/or

	811 receiver. The "maxaveragebitrate" parameter typically will not

	812 compromise interoperability; however, some values might cause

	813 application performance to suffer, and ought to be set with

	814 care.</t>

	815

	816 <t>The "sprop-maxcapturerate" and "sprop-stereo" parameters are

	817 unidirectional sender-only parameters that reflect limitations of

	818 the sender side.

	819 They allow the receiver to set up a reduced-complexity audio

	820 processing pipeline if the sender is not planning to use the full

	821 range of Opus's capabilities.

	822 Neither "sprop-maxcapturerate" nor "sprop-stereo" affect

	823 interoperability and the receiver MUST be capable of receiving any signal.

	824 </t>

	825

	826 <t>

	827 The "stereo" parameter is a unidirectional receive-only

	828 parameter. When sending to a single destination, a sender MUST

	829 NOT use stereo when "stereo" is 0. Gateways or senders that are

	830 sending the same encoded audio to multiple destinations SHOULD

	831 NOT use stereo when "stereo" is 0, as this would lead to

	832 inefficient use of network resources. The "stereo" parameter does

	833 not affect interoperability.

	834 </t>

	835

	836 <t>

	837 The "cbr" parameter is a unidirectional receive-only

	838 parameter.

	839 </t>

	840

	841 <t>The "useinbandfec" parameter is a unidirectional receive-only

	842 parameter.</t>

	843

	844 <t>The "usedtx" parameter is a unidirectional receive-only

	845 parameter.</t>

	846

	847 <t>Any unknown parameter in an offer MUST be ignored by the receiver

	848 and MUST be removed from the answer.</t>

	849

	850 </list></t>

	851

	852 <t>

	853 The Opus parameters in an SDP Offer/Answer exchange are completely

	854 orthogonal, and there is no relationship between the SDP Offer and

	855 the Answer.

	856 </t>

	857 </section>

	858

	859 <section title='Declarative SDP Considerations for Opus'>

	860

	861 <t>For declarative use of SDP such as in Session Announcement Protocol

	862 (SAP), <xref target="RFC2974"/>, and RTSP, <xref target="RFC2326"/>, for

	863 Opus, the following needs to be considered:</t>

	864

	865 <t><list style="symbols">

	866

	867 <t>The values for "maxptime", "ptime", "maxplaybackrate", and

	868 "maxaveragebitrate" ought to be selected carefully to ensure that a

	869 reasonable performance can be achieved for the participants of a session.</t>

	870

	871 <t>

	872 The values for "maxptime", "ptime", and of the payload

	873 format configuration are recommendations by the decoding side to ensure

	874 the best performance for the decoder.

	875 </t>

	876

	877 <t>All other parameters of the payload format configuration are declarative

	878 and a participant MUST use the configurations that are provided for

	879 the session. More than one configuration can be provided if necessary

	880 by declaring multiple RTP payload types; however, the number of types

	881 ought to be kept small.</t>

	882 </list></t>

	883 </section>

	884 </section>

	885

	886 <section title='Security Considerations' anchor='security-considerations'>

	887

	888 <t>Use of variable bitrate (VBR) is subject to the security considerations in

	889 <xref target="RFC6562"/>.</t>

	890

	891 <t>RTP packets using the payload format defined in this specification

	892 are subject to the security considerations discussed in the RTP

	893 specification <xref target="RFC3550"/>, and in any applicable RTP profile such as

	894 RTP/AVP <xref target="RFC3551"/>, RTP/AVPF <xref target="RFC4585"/>,

	895 RTP/SAVP <xref target="RFC3711"/> or RTP/SAVPF <xref target="RFC5124"/>.

	896 However, as "Securing the RTP Protocol Framework:

	897 Why RTP Does Not Mandate a Single Media Security Solution"

	898 <xref target="RFC7202"/> discusses, it is not an RTP payload

	899 format's responsibility to discuss or mandate what solutions are used

	900 to meet the basic security goals like confidentiality, integrity and

	901 source authenticity for RTP in general. This responsibility lies on

	902 anyone using RTP in an application. They can find guidance on

	903 available security mechanisms and important considerations in Options

	904 for Securing RTP Sessions [I-D.ietf-avtcore-rtp-security-options].

	905 Applications SHOULD use one or more appropriate strong security

	906 mechanisms.</t>

	907

	908 <t>This payload format and the Opus encoding do not exhibit any

	909 significant non-uniformity in the receiver-end computational load and thus

	910 are unlikely to pose a denial-of-service threat due to the receipt of

	911 pathological datagrams.</t>

	912 </section>

	913

	914 <section title='Acknowledgements'>

	915 <t>Many people have made useful comments and suggestions contributing to this document.

	916 In particular, we would like to thank

	917 Tina le Grand, Cullen Jennings, Jonathan Lennox, Gregory Maxwell, Colin Perkins, Jan Skoglund,

	918 Timothy B. Terriberry, Martin Thompson, Justin Uberti, Magnus Westerlund, and Mo Zanaty.</t>

	919 </section>

	920 </middle>

	921

	922 <back>

	923 <references title="Normative References">

	924 &rfc2119;

	925 &rfc3389;

	926 &rfc3550;

	927 &rfc3711;

	928 &rfc3551;

	929 &rfc6838;

	930 &rfc4855;

	931 &rfc4566;

	932 &rfc3264;

	933 &rfc2326;

	934 &rfc5576;

	935 &rfc6562;

	936 &rfc6716;

	937 </references>

	938

	939 <references title="Informative References">

	940 &rfc2974;

	941 &rfc4585;

	942 &rfc5124;

	943 &rfc5405;

	944 &rfc7202;

	945

	946 <reference anchor='rmcat' target='https://datatracker.ietf.org/wg/rmcat/documents/'>

	947 <front>

	948 <title>rmcat documents</title>

	949 <author/>

	950 <date/>

	951 <abstract>

	952 <t></t>

	953 </abstract></front>

	954 </reference>

	955

	956

	957 </references>

	958

	959 </back>

	960 </rfc>
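
Reviewer sketch (not part of the file): the draft's mapping section says the Opus media type parameters travel in "a=fmtp" as a semicolon-separated list of parameter=value pairs, and the offer/answer section says unknown parameters in an offer MUST be ignored and removed from the answer. The Python below is a minimal illustration of those two rules, assuming the fmtp attribute is already on a single line (Example 2 in the draft is folded across two lines for presentation). The helper names `parse_fmtp`, `drop_unknown`, and `format_fmtp` are hypothetical and do not come from the draft or from any SDP library.

```python
# Hypothetical helpers for illustration only; names are not from the draft.

# Parameter names listed in the draft's SDP mapping section.
KNOWN_FMTP_PARAMS = {
    "maxplaybackrate", "sprop-maxcapturerate", "maxaveragebitrate",
    "stereo", "sprop-stereo", "cbr", "useinbandfec", "usedtx",
}


def parse_fmtp(line):
    """Split 'a=fmtp:<pt> k=v; k=v' into (payload_type, params dict)."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    payload_type, _, rest = line[len(prefix):].partition(" ")
    params = {}
    for item in rest.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        key, _, value = item.partition("=")
        params[key.strip()] = value.strip()
    return int(payload_type), params


def drop_unknown(params):
    """Keep only parameters named in the draft; unknown ones are dropped."""
    return {k: v for k, v in params.items() if k in KNOWN_FMTP_PARAMS}


def format_fmtp(payload_type, params):
    """Rebuild an a=fmtp line from a payload type and a params dict."""
    body = "; ".join(f"{k}={v}" for k, v in params.items())
    return f"a=fmtp:{payload_type} {body}"


pt, offered = parse_fmtp("a=fmtp:101 stereo=1; sprop-stereo=1; x-unknown=7")
print(format_fmtp(pt, drop_unknown(offered)))
```

Values are kept as strings here on purpose; range checks and the SSRC-specific source-level "fmtp" attribute from RFC 5576 are out of scope for this sketch.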
