doc/draft-ietf-codec-oggopus.xml - Issue 28553003: Updating Opus to a pre-release of 1.1

Unified Diff: doc/draft-ietf-codec-oggopus.xml

Issue 28553003: Updating Opus to a pre-release of 1.1 (Closed) Base URL: svn://svn.chromium.org/chrome/trunk/deps/third_party/opus

Patch Set: Removing failing file Created 7 years, 2 months ago

Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.

Jump to:

View side-by-side diff with in-line comments

Download patch

Index: doc/draft-ietf-codec-oggopus.xml

diff --git a/doc/draft-ietf-codec-oggopus.xml b/doc/draft-ietf-codec-oggopus.xml

new file mode 100644

index 0000000000000000000000000000000000000000..6131e69ed5aaa8179ee6fe6b4b84a747cd26ce3f

--- /dev/null

+++ b/doc/draft-ietf-codec-oggopus.xml

@@ -0,0 +1,1447 @@

+<?xml version="1.0" encoding="utf-8"?>

+<!DOCTYPE rfc SYSTEM 'rfc2629.dtd' [

+<!ENTITY rfc2119 PUBLIC '' 'https://xml2rfc.tools.ietf.org/tools/xml2rfc/public/rfc/bibxml/reference.RFC.2119.xml'>

+<!ENTITY rfc3533 PUBLIC '' 'https://xml2rfc.tools.ietf.org/tools/xml2rfc/public/rfc/bibxml/reference.RFC.3533.xml'>

+<!ENTITY rfc3629 PUBLIC '' 'https://xml2rfc.tools.ietf.org/tools/xml2rfc/public/rfc/bibxml/reference.RFC.3629.xml'>

+<!ENTITY rfc4732 PUBLIC '' 'https://xml2rfc.tools.ietf.org/tools/xml2rfc/public/rfc/bibxml/reference.RFC.4732.xml'>

+<!ENTITY rfc5334 PUBLIC '' 'https://xml2rfc.tools.ietf.org/tools/xml2rfc/public/rfc/bibxml/reference.RFC.5334.xml'>

+<!ENTITY rfc6381 PUBLIC '' 'https://xml2rfc.tools.ietf.org/tools/xml2rfc/public/rfc/bibxml/reference.RFC.6381.xml'>

+<!ENTITY rfc6716 PUBLIC '' 'https://xml2rfc.tools.ietf.org/tools/xml2rfc/public/rfc/bibxml/reference.RFC.6716.xml'>

+]>

+<?rfc toc="yes" symrefs="yes" ?>

+<rfc ipr="trust200902" category="std" docName="draft-ietf-codec-oggopus-01">

+<front>

+<title abbrev="Ogg Opus">Ogg Encapsulation for the Opus Audio Codec</title>

+<author initials="T.B." surname="Terriberry" fullname="Timothy B. Terriberry">

+<organization>Mozilla Corporation</organization>

+<address>

+<postal>

+<street>650 Castro Street</street>

+<city>Mountain View</city>

+<region>CA</region>

+<code>94041</code>

+<country>USA</country>

+</postal>

+<phone>+1 650 903-0800</phone>

+<email>tterribe@xiph.org</email>

+</address>

+</author>

+<author initials="R." surname="Lee" fullname="Ron Lee">

+<organization>Voicetronix</organization>

+<address>

+<postal>

+<street>246 Pulteney Street, Level 1</street>

+<city>Adelaide</city>

+<region>SA</region>

+<code>5000</code>

+<country>Australia</country>

+</postal>

+<phone>+61 8 8232 9112</phone>

+<email>ron@debian.org</email>

+</address>

+</author>

+<author initials="R." surname="Giles" fullname="Ralph Giles">

+<organization>Mozilla Corporation</organization>

+<address>

+<postal>

+<street>163 West Hastings Street</street>

+<city>Vancouver</city>

+<region>BC</region>

+<code>V6B 1H5</code>

+<country>Canada</country>

+</postal>

+<phone>+1 604 778 1540</phone>

+<email>giles@xiph.org</email>

+</address>

+</author>

+<date day="24" month="May" year="2013"/>

+<area>RAI</area>

+<workgroup>codec</workgroup>

+<abstract>

+<t>

+This document defines the Ogg encapsulation for the Opus interactive speech and

+ audio codec.

+This allows data encoded in the Opus format to be stored in an Ogg logical

+ bitstream.

+Ogg encapsulation provides Opus with a long-term storage format supporting

+ all of the essential features, including metadata, fast and accurate seeking,

+ corruption detection, recapture after errors, low overhead, and the ability to

+ multiplex Opus with other codecs (including video) with minimal buffering.

+It also provides a live streamable format, capable of delivery over a reliable

+ stream-oriented transport, without requiring all the data, or even the total

+ length of the data, up-front, in a form that is identical to the on-disk

+ storage format.

+</t>

+</abstract>

+</front>

+<middle>

+<section anchor="intro" title="Introduction">

+<t>

+The IETF Opus codec is a low-latency audio codec optimized for both voice and

+ general-purpose audio.

+See <xref target="RFC6716"/> for technical details.

+This document defines the encapsulation of Opus in a continuous, logical Ogg

+ bitstream <xref target="RFC3533"/>.

+</t>

+<t>

+Ogg bitstreams are made up of a series of 'pages', each of which contains data

+ from one or more 'packets'.

+Pages are the fundamental unit of multiplexing in an Ogg stream.

+Each page is associated with a particular logical stream and contains a capture

+ pattern and checksum, flags to mark the beginning and end of the logical

+ stream, and a 'granule position' that represents an absolute position in the

+ stream, to aid seeking.

+A single page can contain up to 65,025 octets of packet data from up to 255

+ different packets.

+Packets may be split arbitrarily across pages, and continued from one page to

+ the next (allowing packets much larger than would fit on a single page).

+Each page contains 'lacing values' that indicate how the data is partitioned

+ into packets, allowing a demuxer to recover the packet boundaries without

+ examining the encoded data.

+A packet is said to 'complete' on a page when the page contains the final

+ lacing value corresponding to that packet.

+</t>

+<t>

+This encapsulation defines the required contents of the packet data, including

+ the necessary headers, the organization of those packets into a logical

+ stream, and the interpretation of the codec-specific granule position field.

+It does not attempt to describe or specify the existing Ogg container format.

+Readers unfamiliar with the basic concepts mentioned above are encouraged to

+ review the details in <xref target="RFC3533"/>.

+</t>

+</section>

+<section anchor="terminology" title="Terminology">

+<t>

+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",

+ "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be

+ interpreted as described in <xref target="RFC2119"/>.

+</t>

+<t>

+Implementations that fail to satisfy one or more "MUST" requirements are

+ considered non-compliant.

+Implementations that satisfy all "MUST" requirements, but fail to satisfy one

+ or more "SHOULD" requirements are said to be "conditionally compliant".

+All other implementations are "unconditionally compliant".

+</t>

+</section>

+<section anchor="packet_organization" title="Packet Organization">

+<t>

+An Opus stream is organized as follows.

+</t>

+<t>

+There are two mandatory header packets.

+The granule position of the pages on which these packets complete MUST be zero.

+</t>

+<t>

+The first packet in the logical Ogg bitstream MUST contain the identification

+ (ID) header, which uniquely identifies a stream as Opus audio.

+The format of this header is defined in <xref target="id_header"/>.

+It MUST be placed alone (without any other packet data) on the first page of

+ the logical Ogg bitstream, and must complete on that page.

+This page MUST have its 'beginning of stream' flag set.

+</t>

+<t>

+The second packet in the logical Ogg bitstream MUST contain the comment header,

+ which contains user-supplied metadata.

+The format of this header is defined in <xref target="comment_header"/>.

+It MAY span one or more pages, beginning on the second page of the logical

+ stream.

+However many pages it spans, the comment header packet MUST finish the page on

+ which it completes.

+</t>

+<t>

+All subsequent pages are audio data pages, and the Ogg packets they contain are

+ audio data packets.

+Each audio data packet contains one Opus packet for each of N different

+ streams, where N is typically one for mono or stereo, but may be greater than

+ one for, e.g., multichannel audio.

+The value N is specified in the ID header (see

+ <xref target="channel_mapping"/>), and is fixed over the entire length of the

+ logical Ogg bitstream.

+</t>

+<t>

+The first N-1 Opus packets, if any, are packed one after another into the Ogg

+ packet, using the self-delimiting framing from Appendix B of

+ <xref target="RFC6716"/>.

+The remaining Opus packet is packed at the end of the Ogg packet using the

+ regular, undelimited framing from Section 3 of <xref target="RFC6716"/>.

+All of the Opus packets in a single Ogg packet MUST be constrained to have the

+ same duration.

+The duration and coding modes of each Opus packet are contained in the

+ TOC (table of contents) sequence in the first few bytes.

+A decoder SHOULD treat any Opus packet whose duration is different from that of

+ the first Opus packet in an Ogg packet as if it were an Opus packet with an

+ illegal TOC sequence.

+</t>

+<t>

+The first audio data page SHOULD NOT have the 'continued packet' flag set

+ (which would indicate the first audio data packet is continued from a previous

+ page).

+Packets MUST be placed into Ogg pages in order until the end of stream.

+Audio packets MAY span page boundaries.

+A decoder MUST treat a zero-octet audio data packet as if it were an Opus

+ packet with an illegal TOC sequence.

+The last page SHOULD have the 'end of stream' flag set, but implementations

+ should be prepared to deal with truncated streams that do not have a page

+ marked 'end of stream'.

+The final packet on the last page SHOULD NOT be a continued packet, i.e., the

+ final lacing value should be less than 255.

+There MUST NOT be any more pages in an Opus logical bitstream after a page

+ marked 'end of stream'.

+</t>

+</section>

+<section anchor="granpos" title="Granule Position">

+<t>

+The granule position of an audio data page encodes the total number of PCM

+ samples in the stream up to and including the last fully-decodable sample from

+ the last packet completed on that page.

+A page that is entirely spanned by a single packet (that completes on a

+ subsequent page) has no granule position, and the granule position field MUST

+ be set to the special value '-1' in two's complement.

+</t>

+<t>

+The granule position of an audio data page is in units of PCM audio samples at

+ a fixed rate of 48 kHz (per channel; a stereo stream's granule position

+ does not increment at twice the speed of a mono stream).

+It is possible to run an Opus decoder at other sampling rates, but the value

+ in the granule position field always counts samples assuming a 48 kHz

+ decoding rate, and the rest of this specification makes the same assumption.

+</t>

+<t>

+The duration of an Opus packet may be any multiple of 2.5 ms, up to a

+ maximum of 120 ms.

+This duration is encoded in the TOC sequence at the beginning of each packet.

+The number of samples returned by a decoder corresponds to this duration

+ exactly, even for the first few packets.

+For example, a 20 ms packet fed to a decoder running at 48 kHz will

+ always return 960 samples.

+A demuxer can parse the TOC sequence at the beginning of each Ogg packet to

+ work backwards or forwards from a packet with a known granule position (i.e.,

+ the last packet completed on some page) in order to assign granule positions

+ to every packet, or even every individual sample.

+The one exception is the last page in the stream, as described below.

+</t>

+<t>

+All other pages with completed packets after the first MUST have a granule

+ position equal to the number of samples contained in packets that complete on

+ that page plus the granule position of the most recent page with completed

+ packets.

+This guarantees that a demuxer can assign individual packets the same granule

+ position when working forwards as when working backwards.

+For this to work, there cannot be any gaps.

+In order to support capturing a stream that uses discontinuous transmission

+ (DTX), an encoder SHOULD emit packets that explicitly request the use of

+ Packet Loss Concealment (PLC) (i.e., with a frame length of 0, as defined in

+ Section 3.2.1 of <xref target="RFC6716"/>) in place of the packets that were

+ not transmitted.

+</t>

+<section anchor="preskip" title="Pre-skip">

+<t>

+There is some amount of latency introduced during the decoding process, to

+ allow for overlap in the MDCT modes, stereo mixing in the LP modes, and

+ resampling, and the encoder will introduce even more latency (though the exact

+ amount is not specified).

+Therefore, the first few samples produced by the decoder do not correspond to

+ real input audio, but are instead composed of padding inserted by the encoder

+ to compensate for this latency.

+These samples need to be stored and decoded, as Opus is an asymptotically

+ convergent predictive codec, meaning the decoded contents of each frame depend

+ on the recent history of decoder inputs.

+However, a decoder will want to skip these samples after decoding them.

+</t>

+<t>

+A 'pre-skip' field in the ID header (see <xref target="id_header"/>) signals

+ the number of samples which SHOULD be skipped (decoded but discarded) at the

+ beginning of the stream.

+This provides sufficient history to the decoder so that it has already

+ converged before the stream's output begins.

+It may also be used to perform sample-accurate cropping of existing encoded

+ streams.

+This amount need not be a multiple of 2.5 ms, may be smaller than a single

+ packet, or may span the contents of several packets.

+</t>

+</section>

+<section anchor="pcm_sample_position" title="PCM Sample Position">

+<t>

+The PCM sample position is determined from the granule position using the

+ formula

+<figure align="center">

+<artwork align="center"><![CDATA[

+'PCM sample position' = 'granule position' - 'pre-skip' .

+]]></artwork>

+</figure>

+</t>

+<t>

+For example, if the granule position of the first audio data page is 59,971,

+ and the pre-skip is 11,971, then the PCM sample position of the last decoded

+ sample from that page is 48,000.

+This can be converted into a playback time using the formula

+<figure align="center">

+<artwork align="center"><![CDATA[

+ 'PCM sample position'

+'playback time' = --------------------- .

+ 48000.0

+]]></artwork>

+</figure>

+</t>

+<t>

+The initial PCM sample position before any samples are played is normally '0'.

+In this case, the PCM sample position of the first audio sample to be played

+ starts at '1', because it marks the time on the clock

+ <spanx style="emph">after</spanx> that sample has been played, and a stream

+ that is exactly one second long has a final PCM sample position of '48000',

+ as in the example here.

+</t>

+<t>

+Vorbis streams use a granule position smaller than the number of audio samples

+ contained in the first audio data page to indicate that some of those samples

+ must be trimmed from the output (see <xref target="vorbis-trim"/>).

+However, to do so, Vorbis requires that the first audio data page contains

+ exactly two packets, in order to allow the decoder to perform PCM position

+ adjustments before needing to return any PCM data.

+Opus uses the pre-skip mechanism for this purpose instead, since the encoder

+ may introduce more than a single packet's worth of latency, and since very

+ large packets in streams with a very large number of channels might not fit

+ on a single page.

+</t>

+</section>

+<section anchor="end_trimming" title="End Trimming">

+<t>

+The page with the 'end of stream' flag set MAY have a granule position that

+ indicates the page contains less audio data than would normally be returned by

+ decoding up through the final packet.

+This is used to end the stream somewhere other than an even frame boundary.

+The granule position of the most recent audio data page with completed packets

+ is used to make this determination, or '0' is used if there were no previous

+ audio data pages with a completed packet.

+The difference between these granule positions indicates how many samples to

+ keep after decoding the packets that completed on the final page.

+The remaining samples are discarded.

+The number of discarded samples SHOULD be no larger than the number decoded

+ from the last packet.

+</t>

+</section>

+<section anchor="start_granpos_restrictions"

+ title="Restrictions on the Initial Granule Position">

+<t>

+The granule position of the first audio data page with a completed packet MAY

+ be larger than the number of samples contained in packets that complete on

+ that page, however it MUST NOT be smaller, unless that page has the 'end of

+ stream' flag set.

+Allowing a granule position larger than the number of samples allows the

+ beginning of a stream to be cropped or a live stream to be joined without

+ rewriting the granule position of all the remaining pages.

+This means that the PCM sample position just before the first sample to be

+ played may be larger than '0'.

+Synchronization when multiplexing with other logical streams still uses the PCM

+ sample position relative to '0' to compute sample times.

+This does not affect the behavior of pre-skip: exactly 'pre-skip' samples

+ should be skipped from the beginning of the decoded output, even if the

+ initial PCM sample position is greater than zero.

+</t>

+<t>

+On the other hand, a granule position that is smaller than the number of

+ decoded samples prevents a demuxer from working backwards to assign each

+ packet or each individual sample a valid granule position, since granule

+ positions must be non-negative.

+A decoder MUST reject as invalid any stream where the granule position is

+ smaller than the number of samples contained in packets that complete on the

+ first audio data page with a completed packet, unless that page has the 'end

+ of stream' flag set.

+It MAY defer this action until it decodes the last packet completed on that

+ page.

+</t>

+<t>

+If that page has the 'end of stream' flag set, a demuxer MUST reject as invalid

+ any stream where its granule position is smaller than the 'pre-skip' amount.

+This would indicate that more samples should be skipped from the initial

+ decoded output than exist in the stream.

+If the granule position is smaller than the number of decoded samples produced

+ by the packets that complete on that page, then a demuxer MUST use an initial

+ granule position of '0', and can work forwards from '0' to timestamp

+ individual packets.

+If the granule position is larger than the number of decoded samples available,

+ then the demuxer MUST still work backwards as described above, even if the

+ 'end of stream' flag is set, to determine the initial granule position, and

+ thus the initial PCM sample position.

+Both of these will be greater than '0' in this case.

+</t>

+</section>

+<section anchor="seeking_and_preroll" title="Seeking and Pre-roll">

+<t>

+Seeking in Ogg files is best performed using a bisection search for a page

+ whose granule position corresponds to a PCM position at or before the seek

+ target.

+With appropriately weighted bisection, accurate seeking can be performed with

+ just three or four bisections even in multi-gigabyte files.

+See <xref target="seeking"/> for general implementation guidance.

+</t>

+<t>

+When seeking within an Ogg Opus stream, the decoder SHOULD start decoding (and

+ discarding the output) at least 3840 samples (80 ms) prior to the

+ seek target in order to ensure that the output audio is correct by the time it

+ reaches the seek target.

+This 'pre-roll' is separate from, and unrelated to, the 'pre-skip' used at the

+ beginning of the stream.

+If the point 80 ms prior to the seek target comes before the initial PCM

+ sample position, the decoder SHOULD start decoding from the beginning of the

+ stream, applying pre-skip as normal, regardless of whether the pre-skip is

+ larger or smaller than 80 ms, and then continue to discard the samples

+ required to reach the seek target (if any).

+</t>

+</section>

+<section anchor="headers" title="Header Packets">

+<t>

+An Opus stream contains exactly two mandatory header packets:

+ an identification header and a comment header.

+</t>

+<section anchor="id_header" title="Identification Header">

+<figure anchor="id_header_packet" title="ID Header Packet" align="center">

+<artwork align="center"><![CDATA[

+ 0 1 2 3

+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

+| 'O' | 'p' | 'u' | 's' |

++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

+| 'H' | 'e' | 'a' | 'd' |

++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

+| Version = 1 | Channel Count | Pre-skip |

++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

+| Input Sample Rate (Hz) |

++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

+| Output Gain (Q7.8 in dB) | Mapping Family| |

++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ :

+| |

+: Optional Channel Mapping Table... :

+| |

++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

+]]></artwork>

+</figure>

+<t>

+The fields in the identification (ID) header have the following meaning:

+<list style="numbers">

+<t><spanx style="strong">Magic Signature</spanx>:

+<vspace blankLines="1"/>

+This is an 8-octet (64-bit) field that allows codec identification and is

+ human-readable.

+It contains, in order, the magic numbers:

+<list style="empty">

+<t>0x4F 'O'</t>

+<t>0x70 'p'</t>

+<t>0x75 'u'</t>

+<t>0x73 's'</t>

+<t>0x48 'H'</t>

+<t>0x65 'e'</t>

+<t>0x61 'a'</t>

+<t>0x64 'd'</t>

+</list>

+Starting with "Op" helps distinguish it from audio data packets, as this is an

+ invalid TOC sequence.

+<vspace blankLines="1"/>

+</t>

+<t><spanx style="strong">Version</spanx> (8 bits, unsigned):

+<vspace blankLines="1"/>

+The version number MUST always be '1' for this version of the encapsulation

+ specification.

+Implementations SHOULD treat streams where the upper four bits of the version

+ number match that of a recognized specification as backwards-compatible with

+ that specification.

+That is, the version number can be split into "major" and "minor" version

+ sub-fields, with changes to the "minor" sub-field (in the lower four bits)

+ signaling compatible changes.

+For example, a decoder implementing this specification SHOULD accept any stream

+ with a version number of '15' or less, and SHOULD assume any stream with a

+ version number '16' or greater is incompatible.

+The initial version '1' was chosen to keep implementations from relying on this

+ octet as a null terminator for the "OpusHead" string.

+<vspace blankLines="1"/>

+</t>

+<t><spanx style="strong">Output Channel Count</spanx> 'C' (8 bits, unsigned):

+<vspace blankLines="1"/>

+This is the number of output channels.

+This might be different than the number of encoded channels, which can change

+ on a packet-by-packet basis.

+This value MUST NOT be zero.

+The maximum allowable value depends on the channel mapping family, and might be

+ as large as 255.

+See <xref target="channel_mapping"/> for details.

+<vspace blankLines="1"/>

+</t>

+<t><spanx style="strong">Pre-skip</spanx> (16 bits, unsigned, little

+ endian):

+<vspace blankLines="1"/>

+This is the number of samples (at 48 kHz) to discard from the decoder

+ output when starting playback, and also the number to subtract from a page's

+ granule position to calculate its PCM sample position.

+When cropping the beginning of existing Ogg Opus streams, a pre-skip of at

+ least 3,840 samples (80 ms) is RECOMMENDED to ensure complete

+ convergence in the decoder.

+<vspace blankLines="1"/>

+</t>

+<t><spanx style="strong">Input Sample Rate</spanx> (32 bits, unsigned, little

+ endian):

+<vspace blankLines="1"/>

+This field is <spanx style="emph">not</spanx> the sample rate to use for

+ playback of the encoded data.

+<vspace blankLines="1"/>

+Opus has a handful of coding modes, with internal audio bandwidths of 4, 6, 8,

+ 12, and 20 kHz.

+Each packet in the stream may have a different audio bandwidth.

+Regardless of the audio bandwidth, the reference decoder supports decoding any

+ stream at a sample rate of 8, 12, 16, 24, or 48 kHz.

+The original sample rate of the encoder input is not preserved by the lossy

+ compression.

+<vspace blankLines="1"/>

+An Ogg Opus player SHOULD select the playback sample rate according to the

+ following procedure:

+<list style="numbers">

+<t>If the hardware supports 48 kHz playback, decode at 48 kHz.</t>

+<t>Otherwise, if the hardware's highest available sample rate is a supported

+ rate, decode at this sample rate.</t>

+<t>Otherwise, if the hardware's highest available sample rate is less than

+ 48 kHz, decode at the highest supported rate above this and resample.</t>

+<t>Otherwise, decode at 48 kHz and resample.</t>

+</list>

+However, the 'Input Sample Rate' field allows the encoder to pass the sample

+ rate of the original input stream as metadata.

+This may be useful when the user requires the output sample rate to match the

+ input sample rate.

+For example, a non-player decoder writing PCM format samples to disk might

+ choose to resample the output audio back to the original input sample rate to

+ reduce surprise to the user, who might reasonably expect to get back a file

+ with the same sample rate as the one they fed to the encoder.

+<vspace blankLines="1"/>

+A value of zero indicates 'unspecified'.

+Encoders SHOULD write the actual input sample rate or zero, but decoder

+ implementations which do something with this field SHOULD take care to behave

+ sanely if given crazy values (e.g., do not actually upsample the output to

+ 10 MHz if requested).

+<vspace blankLines="1"/>

+</t>

+<t><spanx style="strong">Output Gain</spanx> (16 bits, signed, little

+ endian):

+<vspace blankLines="1"/>

+This is a gain to be applied by the decoder.

+It is 20*log10 of the factor to scale the decoder output by to achieve the

+ desired playback volume, stored in a 16-bit, signed, two's complement

+ fixed-point value with 8 fractional bits (i.e., Q7.8).

+To apply the gain, a decoder could use

+<figure align="center">

+<artwork align="center"><![CDATA[

+sample *= pow(10, output_gain/(20.0*256)) ,

+]]></artwork>

+</figure>

+ where output_gain is the raw 16-bit value from the header.

+<vspace blankLines="1"/>

+Virtually all players and media frameworks should apply it by default.

+If a player chooses to apply any volume adjustment or gain modification, such

+ as the R128_TRACK_GAIN (see <xref target="comment_header"/>) or a user-facing

+ volume knob, the adjustment MUST be applied in addition to this output gain in

+ order to achieve playback at the desired volume.

+<vspace blankLines="1"/>

+An encoder SHOULD set this field to zero, and instead apply any gain prior to

+ encoding, when this is possible and does not conflict with the user's wishes.

+The output gain should only be nonzero when the gain is adjusted after

+ encoding, or when the user wishes to adjust the gain for playback while

+ preserving the ability to recover the original signal amplitude.

+<vspace blankLines="1"/>

+Although the output gain has enormous range (+/- 128 dB, enough to amplify

+ inaudible sounds to the threshold of physical pain), most applications can

+ only reasonably use a small portion of this range around zero.

+The large range serves in part to ensure that gain can always be losslessly

+ transferred between OpusHead and R128_TRACK_GAIN (see below) without

+ saturating.

+<vspace blankLines="1"/>

+</t>

+<t><spanx style="strong">Channel Mapping Family</spanx> (8 bits,

+ unsigned):

+<vspace blankLines="1"/>

+This octet indicates the order and semantic meaning of the various channels

+ encoded in each Ogg packet.

+<vspace blankLines="1"/>

+Each possible value of this octet indicates a mapping family, which defines a

+ set of allowed channel counts, and the ordered set of channel names for each

+ allowed channel count.

+The details are described in <xref target="channel_mapping"/>.

+</t>

+<t><spanx style="strong">Channel Mapping Table</spanx>:

+This table defines the mapping from encoded streams to output channels.

+It is omitted when the channel mapping family is 0, but REQUIRED otherwise.

+Its contents are specified in <xref target="channel_mapping"/>.

+</t>

+</list>

+</t>

+<t>

+All fields in the ID headers are REQUIRED, except for the channel mapping

+ table, which is omitted when the channel mapping family is 0.

+Implementations SHOULD reject ID headers which do not contain enough data for

+ these fields, even if they contain a valid Magic Signature.

+Future versions of this specification, even backwards-compatible versions,

+ might include additional fields in the ID header.

+If an ID header has a compatible major version, but a larger minor version,

+ an implementation MUST NOT reject it for containing additional data not

+ specified here.

+However, implementations MAY reject streams in which the ID header does not

+ complete on the first page.

+</t>

+<section anchor="channel_mapping" title="Channel Mapping">

+<t>

+An Ogg Opus stream allows mapping one number of Opus streams (N) to a possibly

+ larger number of decoded channels (M+N) to yet another number of output

+ channels (C), which might be larger or smaller than the number of decoded

+ channels.

+The order and meaning of these channels are defined by a channel mapping,

+ which consists of the 'channel mapping family' octet and, for channel mapping

+ families other than family 0, a channel mapping table, as illustrated in

+ <xref target="channel_mapping_table"/>.

+</t>

+<figure anchor="channel_mapping_table" title="Channel Mapping Table"

+ align="center">

+<artwork align="center"><![CDATA[

+ 0 1 2 3

+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

+ +-+-+-+-+-+-+-+-+

+ | Stream Count |

++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

+| Coupled Count | Channel Mapping... :

++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

+]]></artwork>

+</figure>

+<t>

+The fields in the channel mapping table have the following meaning:

+<list style="numbers" counter="8">

+<t><spanx style="strong">Stream Count</spanx> 'N' (8 bits, unsigned):

+<vspace blankLines="1"/>

+This is the total number of streams encoded in each Ogg packet.

+This value is required to correctly parse the packed Opus packets inside an

+ Ogg packet, as described in <xref target="packet_organization"/>.

+This value MUST NOT be zero, as without at least one Opus packet with a valid

+ TOC sequence, a demuxer cannot recover the duration of an Ogg packet.

+<vspace blankLines="1"/>

+For channel mapping family 0, this value defaults to 1, and is not coded.

+<vspace blankLines="1"/>

+</t>

+<t><spanx style="strong">Coupled Stream Count</spanx> 'M' (8 bits, unsigned):

+This is the number of streams whose decoders should be configured to produce

+ two channels.

+This MUST be no larger than the total number of streams, N.

+<vspace blankLines="1"/>

+Each packet in an Opus stream has an internal channel count of 1 or 2, which

+ can change from packet to packet.

+This is selected by the encoder depending on the bitrate and the audio being

+ encoded.

+The original channel count of the encoder input is not preserved by the lossy

+ compression.

+<vspace blankLines="1"/>

+Regardless of the internal channel count, any Opus stream can be decoded as

+ mono (a single channel) or stereo (two channels) by appropriate initialization

+ of the decoder.

+The 'coupled stream count' field indicates that the first M Opus decoders are

+ to be initialized in stereo mode, and the remaining N-M decoders are to be

+ initialized in mono mode.

+The total number of decoded channels, (M+N), MUST be no larger than 255, as

+ there is no way to index more channels than that in the channel mapping.

+<vspace blankLines="1"/>

+For channel mapping family 0, this value defaults to C-1 (i.e., 0 for mono

+ and 1 for stereo), and is not coded.

+<vspace blankLines="1"/>

+</t>

+<t><spanx style="strong">Channel Mapping</spanx> (8*C bits):

+This contains one octet per output channel, indicating which decoded channel

+ should be used for each one.

+Let 'index' be the value of this octet for a particular output channel.

+This value MUST either be smaller than (M+N), or be the special value 255.

+If 'index' is less than 2*M, the output MUST be taken from decoding stream

+ ('index'/2) as stereo and selecting the left channel if 'index' is even, and

+ the right channel if 'index' is odd.

+If 'index' is 2*M or larger, the output MUST be taken from decoding stream

+ ('index'-M) as mono.

+If 'index' is 255, the corresponding output channel MUST contain pure silence.

+<vspace blankLines="1"/>

+The number of output channels, C, is not constrained to match the number of

+ decoded channels (M+N).

+A single index value MAY appear multiple times, i.e., the same decoded channel

+ might be mapped to multiple output channels.

+Some decoded channels might not be assigned to any output channel, as well.

+<vspace blankLines="1"/>

+For channel mapping family 0, the first index defaults to 0, and if C==2,

+ the second index defaults to 1.

+Neither index is coded.

+</t>

+</list>

+</t>

+<t>

+After producing the output channels, the channel mapping family determines the

+ semantic meaning of each one.

+Currently there are three defined mapping families, although more may be added.

+</t>

+<section anchor="channel_mapping_0" title="Channel Mapping Family 0">

+<t>

+Allowed numbers of channels: 1 or 2.

+RTP mapping.

+</t>

+<t>

+<list style="symbols">

+<t>1 channel: monophonic (mono).</t>

+<t>2 channels: stereo (left, right).</t>

+</list>

+<spanx style="strong">Special mapping</spanx>: This channel mapping value also

+ indicates that the contents consists of a single Opus stream that is stereo if

+ and only if C==2, with stream index 0 mapped to output channel 0 (mono, or

+ left channel) and stream index 1 mapped to output channel 1 (right channel)

+ if stereo.

+When the 'channel mapping family' octet has this value, the channel mapping

+ table MUST be omitted from the ID header packet.

+</t>

+</section>

+<section anchor="channel_mapping_1" title="Channel Mapping Family 1">

+<t>

+Allowed numbers of channels: 1...8.

+Vorbis channel order.

+</t>

+<t>

+Each channel is assigned to a speaker location in a conventional surround

+ configuration.

+Specific locations depend on the number of channels, and are given below

+ in order of the corresponding channel indicies.

+<list style="symbols">

+ <t>1 channel: monophonic (mono).</t>

+ <t>2 channels: stereo (left, right).</t>

+ <t>3 channels: linear surround (left, center, right)</t>

+ <t>4 channels: quadraphonic (front left, front right, rear left, rear right).</t>

+ <t>5 channels: 5.0 surround (front left, front center, front right, rear left, rear right).</t>

+ <t>6 channels: 5.1 surround (front left, front center, front right, rear left, rear right, LFE).</t>

+ <t>7 channels: 6.1 surround (front left, front center, front right, side left, side right, rear center, LFE).</t>

+ <t>8 channels: 7.1 surround (front left, front center, front right, side left, side right, rear left, rear right, LFE)</t>

+</list>

+This set of surround configurations and speaker location orderings is the same

+ as the one used by the Vorbis codec <xref target="vorbis-mapping"/>.

+The ordering is different from the one used by the

+ WAVE <xref target="wave-multichannel"/> and

+ FLAC <xref target="flac"/> formats,

+ so correct ordering requires permutation of the output channels when encoding

+ from or decoding to those formats.

+'LFE' here refers to a Low Frequency Effects, often mapped to a subwoofer

+ with no particular spacial position.

+Implementations SHOULD identify 'side' or 'rear' speaker locations with

+ 'surround' and 'back' as appropriate when interfacing with audio formats

+ or systems which prefer that terminology.

+Speaker configurations other than those described here are not supported.

+</t>

+</section>

+<section anchor="channel_mapping_255"

+ title="Channel Mapping Family 255">

+<t>

+Allowed numbers of channels: 1...255.

+No defined channel meaning.

+</t>

+<t>

+Channels are unidentified.

+General-purpose players SHOULD NOT attempt to play these streams, and offline

+ decoders MAY deinterleave the output into separate PCM files, one per channel.

+Decoders SHOULD NOT produce output for channels mapped to stream index 255

+ (pure silence) unless they have no other way to indicate the index of

+ non-silent channels.

+</t>

+</section>

+<section anchor="channel_mapping_undefined"

+ title="Undefined Channel Mappings">

+<t>

+The remaining channel mapping families (2...254) are reserved.

+A decoder encountering a reserved channel mapping family value SHOULD act as

+ though the value is 255.

+</t>

+</section>

+<section anchor="downmix" title="Downmixing">

+<t>

+An Ogg Opus player MUST play any Ogg Opus stream with a channel mapping family

+ of 0 or 1, even if the number of channels does not match the physically

+ connected audio hardware.

+Players SHOULD perform channel mixing to increase or reduce the number of

+ channels as needed.

+</t>

+<t>

+Implementations MAY use the following matricies to implement downmixing from

+ multichannel files using <xref target="channel_mapping_1">Channel Mapping

+ Family 1</xref>, which are known to give acceptable results for stereo.

+Matricies for 3 and 4 channels are normalized so each coefficent row sums

+ to 1 to avoid clipping.

+For 5 or more channels they are normalized to 2 as a compromize between

+ clipping and dynamic range reduction.

+</t>

+<t>

+In these matricies the front left and front right channels are generally

+passed through directly.

+When a surround channel is split between both the left and right stereo

+ channels, coefficients are chosen so their squares sum to 1, which

+ helps preserve the perceived intensity.

+Rear channels are mixed more diffusely or attenuated to maintain focus

+ on the front channels.

+</t>

+<figure anchor="downmix-matrix-3"

+ title="Stereo downmix matrix for the linear surround channel mapping"

+ align="center">

+<artwork align="center"><![CDATA[

+ Left output = ( 0.585786 * left + 0.414214 * center )

+Right output = ( 0.414214 * center + 0.585786 * right )

+]]></artwork>

+<postamble>

+Exact coefficient values are 1 and 1/sqrt(2), multiplied by

+ 1/(1 + 1/sqrt(2)) for normalization.

+</postamble>

+</figure>

+<figure anchor="downmix-matrix-4"

+ title="Stereo downmix matrix for the quadraphonic channel mapping"

+ align="center">

+<artwork align="center"><![CDATA[

+/ \ / \ / FL \

+| L output | | 0.422650 0.000000 0.366025 0.211325 | | FR |

+| R output | = | 0.000000 0.422650 0.211325 0.366025 | | RL |

+\ / \ / \ RR /

+]]></artwork>

+<postamble>

+Exact coefficient values are 1, sqrt(3)/2 and 1/2, multiplied by

+ 1/(1 + sqrt(3)/2 + 1/2) for normalization.

+</postamble>

+</figure>

+<figure anchor="downmix-matrix-5"

+ title="Stereo downmix matrix for the 5.0 surround mapping"

+ align="center">

+<artwork align="center"><![CDATA[

+ / FL \

+/ \ / \ | FC |

+| L | | 0.650802 0.460186 0.000000 0.563611 0.325401 | | FR |

+| R | = | 0.000000 0.460186 0.650802 0.325401 0.563611 | | RL |

+\ / \ / | RR |

+ \ /

+]]></artwork>

+<postamble>

+Exact coefficient values are 1, 1/sqrt(2), sqrt(3)/2 and 1/2, multiplied by

+ 2/(1 + 1/sqrt(2) + sqrt(3)/2 + 1/2)

+ for normalization.

+</postamble>

+</figure>

+<figure anchor="downmix-matrix-6"

+ title="Stereo downmix matrix for the 5.1 surround mapping"

+ align="center">

+<artwork align="center"><![CDATA[

+ /FL \

+/ \ / \ |FC |

+|L| | 0.529067 0.374107 0.000000 0.458186 0.264534 0.374107 | |FR |

+|R| = | 0.000000 0.374107 0.529067 0.264534 0.458186 0.374107 | |RL |

+\ / \ / |RR |

+ \LFE/

+]]></artwork>

+<postamble>

+Exact coefficient values are 1, 1/sqrt(2), sqrt(3)/2 and 1/2, multiplied by

+2/(1 + 1/sqrt(2) + sqrt(3)/2 + 1/2 + 1/sqrt(2))

+ for normalization.

+</postamble>

+</figure>

+<figure anchor="downmix-matrix-7"

+ title="Stereo downmix matrix for the 6.1 surround mapping"

+ align="center">

+<artwork align="center"><![CDATA[

+ / \

+ | 0.455310 0.321953 0.000000 0.394310 0.227655 0.278819 0.321953 |

+ | 0.000000 0.321953 0.455310 0.227655 0.394310 0.278819 0.321953 |

+ \ /

+]]></artwork>

+<postamble>

+Exact coefficient values are 1, 1/sqrt(2), sqrt(3)/2, 1/2 and

+ sqrt(3)/2/sqrt(2), multiplied by

+ 2/(1 + 1/sqrt(2) + sqrt(3)/2 + 1/2 +

+ sqrt(3)/2/sqrt(2) + 1/sqrt(2)) for normalization.

+The coeffients are in the same order as in <xref target="channel_mapping_1" />,

+ and the matricies above.

+</postamble>

+</figure>

+<figure anchor="downmix-matrix-8"

+ title="Stereo downmix matrix for the 7.1 surround mapping"

+ align="center">

+<artwork align="center"><![CDATA[

+/ \

+| .388631 .274804 .000000 .336565 .194316 .336565 .194316 .274804 |

+| .000000 .274804 .388631 .194316 .336565 .194316 .336565 .274804 |

+\ /

+]]></artwork>

+<postamble>

+Exact coefficient values are 1, 1/sqrt(2), sqrt(3)/2 and 1/2, multiplied by

+ 2/(2 + 2/sqrt(2) + sqrt(3)) for normalization.

+The coeffients are in the same order as in <xref target="channel_mapping_1" />,

+ and the matricies above.

+</postamble>

+</figure>

+</section>

+</section>

+</section>

+<section anchor="comment_header" title="Comment Header">

+<figure anchor="comment_header_packet" title="Comment Header Packet"

+ align="center">

+<artwork align="center"><![CDATA[

+ 0 1 2 3

+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

+| 'O' | 'p' | 'u' | 's' |

++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

+| 'T' | 'a' | 'g' | 's' |

++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

+| Vendor String Length |

++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

+| |

+: Vendor String... :

+| |

++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

+| User Comment List Length |

++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

+| User Comment #0 String Length |

++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

+| |

+: User Comment #0 String... :

+| |

++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

+| User Comment #1 String Length |

++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

+: :

+]]></artwork>

+</figure>

+<t>

+The comment header consists of a 64-bit magic signature, followed by data in

+ the same format as the <xref target="vorbis-comment"/> header used in Ogg

+ Vorbis (without the final "framing bit"), Ogg Theora, and Speex.

+<list style="numbers">

+<t><spanx style="strong">Magic Signature</spanx>:

+<vspace blankLines="1"/>

+This is an 8-octet (64-bit) field that allows codec identification and is

+ human-readable.

+It contains, in order, the magic numbers:

+<list style="empty">

+<t>0x4F 'O'</t>

+<t>0x70 'p'</t>

+<t>0x75 'u'</t>

+<t>0x73 's'</t>

+<t>0x54 'T'</t>

+<t>0x61 'a'</t>

+<t>0x67 'g'</t>

+<t>0x73 's'</t>

+</list>

+Starting with "Op" helps distinguish it from audio data packets, as this is an

+ invalid TOC sequence.

+<vspace blankLines="1"/>

+</t>

+<t><spanx style="strong">Vendor String Length</spanx> (32 bits, unsigned,

+ little endian):

+<vspace blankLines="1"/>

+This field gives the length of the following vendor string, in octets.

+It MUST NOT indicate that the vendor string is longer than the rest of the

+ packet.

+<vspace blankLines="1"/>

+</t>

+<t><spanx style="strong">Vendor String</spanx> (variable length, UTF-8 vector):

+<vspace blankLines="1"/>

+This is a simple human-readable tag for vendor information, encoded as a UTF-8

+ string <xref target="RFC3629"/>.

+No terminating null octet is required.

+<vspace blankLines="1"/>

+This tag is intended to identify the codec encoder and encapsulation

+ implementations, for tracing differences in technical behavior.

+User-facing encoding applications can use the 'ENCODER' user comment tag

+ to identify themselves.

+<vspace blankLines="1"/>

+</t>

+<t><spanx style="strong">User Comment List Length</spanx> (32 bits, unsigned,

+ little endian):

+<vspace blankLines="1"/>

+This field indicates the number of user-supplied comments.

+It MAY indicate there are zero user-supplied comments, in which case there are

+ no additional fields in the packet.

+It MUST NOT indicate that there are so many comments that the comment string

+ lengths would require more data than is available in the rest of the packet.

+<vspace blankLines="1"/>

+</t>

+<t><spanx style="strong">User Comment #i String Length</spanx> (32 bits,

+ unsigned, little endian):

+<vspace blankLines="1"/>

+This field gives the length of the following user comment string, in octets.

+There is one for each user comment indicated by the 'user comment list length'

+ field.

+It MUST NOT indicate that the string is longer than the rest of the packet.

+<vspace blankLines="1"/>

+</t>

+<t><spanx style="strong">User Comment #i String</spanx> (variable length, UTF-8

+ vector):

+<vspace blankLines="1"/>

+This field contains a single user comment string.

+There is one for each user comment indicated by the 'user comment list length'

+ field.

+</t>

+</list>

+</t>

+<t>

+The vendor string length and user comment list length are REQUIRED, and

+ implementations SHOULD reject comment headers that do not contain enough data

+ for these fields, or that do not contain enough data for the corresponding

+ vendor string or user comments they describe.

+Making this check before allocating the associated memory to contain the data

+ may help prevent a possible Denial-of-Service (DoS) attack from small comment

+ headers that claim to contain strings longer than the entire packet or more

+ user comments than than could possibly fit in the packet.

+</t>

+<t>

+The user comment strings follow the NAME=value format described by

+ <xref target="vorbis-comment"/> with the same recommended tag names.

+One new comment tag is introduced for Ogg Opus:

+<figure align="center">

+<artwork align="left"><![CDATA[

+R128_TRACK_GAIN=-573

+]]></artwork>

+</figure>

+representing the volume shift needed to normalize the track's volume.

+The gain is a Q7.8 fixed point number in dB, as in the ID header's 'output

+ gain' field.

+This tag is similar to the REPLAYGAIN_TRACK_GAIN tag in

+ Vorbis <xref target="replay-gain"/>, except that the normal volume

+ reference is the <xref target="EBU-R128"/> standard.

+</t>

+<t>

+An Ogg Opus file MUST NOT have more than one such tag, and if present its

+ value MUST be an integer from -32768 to 32767, inclusive, represented in

+ ASCII with no whitespace.

+If present, it MUST correctly represent the R128 normalization gain relative

+ to the 'output gain' field specified in the ID header.

+If a player chooses to make use of the R128_TRACK_GAIN tag, it MUST be

+ applied <spanx style="emph">in addition</spanx> to the 'output gain' value.

+If an encoder wishes to use R128 normalization, and the output gain is not

+ otherwise constrained or specified, the encoder SHOULD write the R128 gain

+ into the 'output gain' field and store a tag containing "R128_TRACK_GAIN=0".

+That is, it should assume that by default tools will respect the 'output gain'

+ field, and not the comment tag.

+If a tool modifies the ID header's 'output gain' field, it MUST also update or

+ remove the R128_TRACK_GAIN comment tag.

+</t>

+<t>

+To avoid confusion with multiple normalization schemes, an Opus comment header

+ SHOULD NOT contain any of the REPLAYGAIN_TRACK_GAIN, REPLAYGAIN_TRACK_PEAK,

+ REPLAYGAIN_ALBUM_GAIN, or REPLAYGAIN_ALBUM_PEAK tags.

+</t>

+<t>

+There is no Opus comment tag corresponding to REPLAYGAIN_ALBUM_GAIN.

+That information should instead be stored in the ID header's 'output gain'

+ field.

+</t>

+</section>

+<section anchor="packet_size_limits" title="Packet Size Limits">

+<t>

+Technically valid Opus packets can be arbitrarily large due to the padding

+ format, although the amount of non-padding data they can contain is bounded.

+These packets might be spread over a similarly enormous number of Ogg pages.

+Encoders SHOULD use no more padding than required to make a variable bitrate

+ (VBR) stream constant bitrate (CBR).

+Decoders SHOULD avoid attempting to allocate excessive amounts of memory when

+ presented with a very large packet.

+The presence of an extremely large packet in the stream could indicate a

+ memory exhaustion attack or stream corruption.

+Decoders SHOULD reject a packet that is too large to process, and display a

+ warning message.

+</t>

+<t>

+In an Ogg Opus stream, the largest possible valid packet that does not use

+ padding has a size of (61,298*N - 2) octets, or about 60 kB per

+ Opus stream.

+With 255 streams, this is 15,630,988 octets (14.9 MB) and can

+ span up to 61,298 Ogg pages, all but one of which will have a granule

+ position of -1.

+This is of course a very extreme packet, consisting of 255 streams, each

+ containing 120 ms of audio encoded as 2.5 ms frames, each frame

+ using the maximum possible number of octets (1275) and stored in the least

+ efficient manner allowed (a VBR code 3 Opus packet).

+Even in such a packet, most of the data will be zeros as 2.5 ms frames

+ cannot actually use all 1275 octets.

+The largest packet consisting of entirely useful data is

+ (15,326*N - 2) octets, or about 15 kB per stream.

+This corresponds to 120 ms of audio encoded as 10 ms frames in either

+ LP or Hybrid mode, but at a data rate of over 1 Mbps, which makes little

+ sense for the quality achieved.

+A more reasonable limit is (7,664*N - 2) octets, or about 7.5 kB

+ per stream.

+This corresponds to 120 ms of audio encoded as 20 ms stereo MDCT-mode

+ frames, with a total bitrate just under 511 kbps (not counting the Ogg

+ encapsulation overhead).

+With N=8, the maximum number of channels currently defined by mapping

+ family 1, this gives a maximum packet size of 61,310 octets, or just

+ under 60 kB.

+This is still quite conservative, as it assumes each output channel is taken

+ from one decoded channel of a stereo packet.

+An implementation could reasonably choose any of these numbers for its internal

+ limits.

+</t>

+</section>

+<section anchor="encoder" title="Encoder Guidelines">

+<t>

+When encoding Opus files, Ogg encoders should take into account the

+ algorithmic delay of the Opus encoder.

+</t>

+<figure align="center">

+<preamble>

+In encoders derived from the reference implementation, the number of

+ samples can be queried with:

+</preamble>

+<artwork align="center"><![CDATA[

+ opus_encoder_ctl(encoder_state, OPUS_GET_LOOKAHEAD, &samples_delay);

+]]></artwork>

+</figure>

+<t>

+To achieve good quality in the very first samples of a stream, the Ogg encoder

+ MAY use LPC extrapolation to generate at least 120 extra samples

+ (extra_samples) at the beginning to avoid the Opus encoder having to encode

+ a discontinuous signal.

+For an input file containing length samples, the Ogg encoder SHOULD set the

+ preskip header flag to samples_delay+extra_samples, encode at least

+ length+samples_delay+extra_samples samples, and set the granulepos of the last

+ page to length+samples_delay+extra_samples.

+This ensures that the encoded file has the same duration as the original, with

+ no time offset. The best way to pad the end of the stream is to also use LPC

+ extrapolation, but zero-padding is also acceptable.

+</t>

+<section anchor="lpc" title="LPC Extrapolation">

+<t>

+The first step in LPC extrapolation is to compute linear prediction

+ coefficients.

+When extending the end of the signal, order-N (typically with N ranging from 8

+ to 40) LPC analysis is performed on a window near the end of the signal.

+The last N samples are used as memory to an infinite impulse response (IIR)

+ filter.

+</t>

+<figure align="center">

+<preamble>

+The filter is then applied on a zero input to extrapolate the end of the signal.

+Let a(k) be the kth LPC coefficient and x(n) be the nth sample of the signal,

+ each new sample past the end of the signal is computed as:

+</preamble>

+<artwork align="center"><![CDATA[

+ N

+ ---

+x(n) = \ a(k)*x(n-k)

+ /

+ ---

+ k=1

+]]></artwork>

+</figure>

+<t>

+The process is repeated independently for each channel.

+It is possible to extend the beginning of the signal by applying the same

+ process backward in time.

+When extending the beginning of the signal, it is best to apply a "fade in" to

+ the extrapolated signal, e.g. by multiplying it by a half-Hanning window

+ <xref target="hanning"/>.

+</t>

+</section>

+<section anchor="continuous_chaining" title="Continuous Chaining">

+<t>

+In some applications, such as Internet radio, it is desirable to cut a long

+ streams into smaller chains, e.g. so the comment header can be updated.

+This can be done simply by separating the input streams into segments and

+ encoding each segment independently.

+The drawback of this approach is that it creates a small discontinuity

+ at the boundary due to the lossy nature of Opus.

+An encoder MAY avoid this discontinuity by using the following procedure:

+<list style="numbers">

+<t>Encode the last frame of the first segment as an independent frame by

+ turning off all forms of inter-frame prediction.

+De-emphasis is allowed.</t>

+<t>Set the granulepos of the last page to a point near the end of the last

+ frame.</t>

+<t>Begin the second segment with a copy of the last frame of the first

+ segment.</t>

+<t>Set the preskip flag of the second stream in such a way as to properly

+ join the two streams.</t>

+<t>Continue the encoding process normally from there, without any reset to

+ the encoder.</t>

+</list>

+</t>

+</section>

+<section anchor="implementation" title="Implementation Status">

+<t>

+A brief summary of major implementations of this draft is available

+ at <eref target="https://wiki.xiph.org/OggOpusImplementation"/>,

+ along with their status.

+</t>

+<t>

+[Note to RFC Editor: please remove this entire section before

+ final publication per <xref target="draft-sheffer-running-code"/>.]

+</t>

+</section>

+<section anchor="security" title="Security Considerations">

+<t>

+Implementations of the Opus codec need to take appropriate security

+ considerations into account, as outlined in <xref target="RFC4732"/>.

+This is just as much a problem for the container as it is for the codec itself.

+It is extremely important for the decoder to be robust against malicious

+ payloads.

+Malicious payloads must not cause the decoder to overrun its allocated memory

+ or to take an excessive amount of resources to decode.

+Although problems in encoders are typically rarer, the same applies to the

+ encoder.

+Malicious audio streams must not cause the encoder to misbehave because this

+ would allow an attacker to attack transcoding gateways.

+</t>

+<t>

+Like most other container formats, Ogg Opus files should not be used with

+ insecure ciphers or cipher modes that are vulnerable to known-plaintext

+ attacks.

+Elements such as the Ogg page capture pattern and the magic signatures in the

+ ID header and the comment header all have easily predictable values, in

+ addition to various elements of the codec data itself.

+</t>

+</section>

+<section anchor="content_type" title="Content Type">

+<t>

+An "Ogg Opus file" consists of one or more sequentially multiplexed segments,

+ each containing exactly one Ogg Opus stream.

+The RECOMMENDED mime-type for Ogg Opus files is "audio/ogg".

+</t>

+<figure>

+<preamble>

+If more specificity is desired, one MAY indicate the presence of Opus streams

+ using the codecs parameter defined in <xref target="RFC6381"/>, e.g.,

+</preamble>

+<artwork align="center"><![CDATA[

+ audio/ogg; codecs=opus

+]]></artwork>

+<postamble>

+ for an Ogg Opus file.

+</postamble>

+</figure>

+<t>

+The RECOMMENDED filename extension for Ogg Opus files is '.opus'.

+</t>

+<t>

+When Opus is concurrently multiplexed with other streams in an Ogg container,

+ one SHOULD use one of the "audio/ogg", "video/ogg", or "application/ogg"

+ mime-types, as defined in <xref target="RFC5334"/>.

+Such streams are not strictly "Ogg Opus files" as described above,

+ since they contain more than a single Opus stream per sequentially

+ multiplexed segment, e.g. video or multiple audio tracks.

+In such cases the the '.opus' filename extension is NOT RECOMMENDED.

+</t>

+</section>

+<section title="IANA Considerations">

+<t>

+This document has no actions for IANA.

+</t>

+</section>

+<section anchor="Acknowledgments" title="Acknowledgments">

+<t>

+Thanks to Greg Maxwell, Christopher "Monty" Montgomery, and Jean-Marc Valin for

+ their valuable contributions to this document.

+Additional thanks to Andrew D'Addesio, Greg Maxwell, and Vincent Penqeurc'h for

+ their feedback based on early implementations.

+</t>

+</section>

+<section title="Copying Conditions">

+<t>

+The authors agree to grant third parties the irrevocable right to copy, use,

+ and distribute the work, with or without modification, in any medium, without

+ royalty, provided that, unless separate permission is granted, redistributed

+ modified works do not contain misleading author, version, name of work, or

+ endorsement information.

+</t>

+</section>

+</middle>

+<back>

+<references title="Normative References">

+ &rfc2119;

+ &rfc3533;

+ &rfc3629;

+ &rfc5334;

+ &rfc6381;

+ &rfc6716;

+<reference anchor="EBU-R128" target="http://tech.ebu.ch/loudness">

+<front>

+<title>"Loudness Recommendation EBU R128</title>

+<author fullname="EBU Technical Committee"/>

+<date month="August" year="2011"/>

+</front>

+</reference>

+<reference anchor="vorbis-comment"

+ target="http://www.xiph.org/vorbis/doc/v-comment.html">

+<front>

+<title>Ogg Vorbis I Format Specification: Comment Field and Header

+ Specification</title>

+<author initials="C." surname="Montgomery"

+ fullname="Christopher "Monty" Montgomery"/>

+<date month="July" year="2002"/>

+</front>

+</reference>

+</references>

+<references title="Informative References">

+

+ &rfc4732;

+<reference anchor="draft-sheffer-running-code"

+ target="https://tools.ietf.org/html/draft-sheffer-running-code-05#section-2">

+ <front>

+ <title>Improving "Rough Consensus" with Running Code</title>

+ <author initials="Y." surname="Sheffer" fullname="Yaron Sheffer"/>

+ <author initials="A." surname="Farrel" fullname="Adrian Farrel"/>

+ <date month="May" year="2013"/>

+ </front>

+</reference>

+<reference anchor="flac"

+ target="https://xiph.org/flac/format.html">

+ <front>

+ <title>FLAC - Free Lossless Audio Codec Format Description</title>

+ <author initials="J." surname="Coalson" fullname="Josh Coalson"/>

+ <date month="January" year="2008"/>

+ </front>

+</reference>

+<reference anchor="hanning"

+ target="http://en.wikipedia.org/wiki/Hamming_function#Hann_.28Hanning.29_window">

+ <front>

+ <title>"Hann window</title>

+ <author fullname="Wikipedia"/>

+ <date month="May" year="2013"/>

+ </front>

+</reference>

+<reference anchor="replay-gain"

+ target="http://wiki.xiph.org/VorbisComment#Replay_Gain">

+<front>

+<title>VorbisComment: Replay Gain</title>

+<author initials="C." surname="Parker" fullname="Conrad Parker"/>

+<author initials="M." surname="Leese" fullname="Martin Leese"/>

+<date month="June" year="2009"/>

+</front>

+</reference>

+<reference anchor="seeking"

+ target="http://wiki.xiph.org/Seeking">

+<front>

+<title>Granulepos Encoding and How Seeking Really Works</title>

+<author initials="S." surname="Pfeiffer" fullname="Silvia Pfeiffer"/>

+<author initials="C." surname="Parker" fullname="Conrad Parker"/>

+<author initials="G." surname="Maxwell" fullname="Greg Maxwell"/>

+<date month="May" year="2012"/>

+</front>

+</reference>

+<reference anchor="vorbis-mapping"

+ target="http://www.xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-800004.3.9">

+<front>

+<title>The Vorbis I Specification, Section 4.3.9 Output Channel Order</title>

+<author initials="C." surname="Montgomery"

+ fullname="Christopher "Monty" Montgomery"/>

+<date month="January" year="2010"/>

+</front>

+</reference>

+<reference anchor="vorbis-trim"

+ target="http://xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-130000A.2">

+ <front>

+ <title>The Vorbis I Specification, Appendix A: Embedding Vorbis

+ into an Ogg stream</title>

+ <author initials="C." surname="Montgomery"

+ fullname="Christopher "Monty" Montgomery"/>

+ <date month="November" year="2008"/>

+ </front>

+</reference>

+<reference anchor="wave-multichannel"

+ target="http://msdn.microsoft.com/en-us/windows/hardware/gg463006.aspx">

+ <front>

+ <title>Multiple Channel Audio Data and WAVE Files</title>

+ <author fullname="Microsoft Corporation"/>

+ <date month="March" year="2007"/>

+ </front>

+</reference>

+</references>

+</back>

+</rfc>

« no previous file with comments | « doc/build_oggdraft.sh ('k') | doc/draft-ietf-codec-opus.xml » ('j') | no next file with comments »