third_party/opus/src/doc/draft-ietf-codec-opus-update.xml - Issue 2962373002: [Opus] Update to v1.2.1

Unified Diff: third_party/opus/src/doc/draft-ietf-codec-opus-update.xml

Issue 2962373002: [Opus] Update to v1.2.1 (Closed)

Patch Set: Pre-increment instead of post-increment Created 3 years, 5 months ago

Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.

Jump to:

View side-by-side diff with in-line comments

« no previous file with comments | « third_party/opus/src/doc/draft-ietf-codec-opus.xml ('k') | third_party/opus/src/doc/draft-ietf-payload-rtp-opus.xml » ('j') | no next file with comments »
Expand Comments ('e') | Collapse Comments ('c') | Hide Comments ('s')

Index: third_party/opus/src/doc/draft-ietf-codec-opus-update.xml

diff --git a/third_party/opus/src/doc/draft-ietf-codec-opus-update.xml b/third_party/opus/src/doc/draft-ietf-codec-opus-update.xml

new file mode 100644

index 0000000000000000000000000000000000000000..a97124fc58c031d4eb19c18e81b0dc97f5df5019

--- /dev/null

+++ b/third_party/opus/src/doc/draft-ietf-codec-opus-update.xml

@@ -0,0 +1,438 @@

+<?xml version="1.0" encoding="US-ASCII"?>

+<!DOCTYPE rfc SYSTEM "rfc2629.dtd">

+<?rfc toc="yes"?>

+<?rfc tocompact="yes"?>

+<?rfc tocdepth="3"?>

+<?rfc tocindent="yes"?>

+<?rfc symrefs="yes"?>

+<?rfc sortrefs="yes"?>

+<?rfc comments="yes"?>

+<?rfc inline="yes"?>

+<?rfc compact="yes"?>

+<?rfc subcompact="no"?>

+<rfc category="std" docName="draft-ietf-codec-opus-update-06"

+ ipr="trust200902">

+ <front>

+ <title abbrev="Opus Update">Updates to the Opus Audio Codec</title>

+<author initials="JM" surname="Valin" fullname="Jean-Marc Valin">

+<organization>Mozilla Corporation</organization>

+<address>

+<postal>

+<street>331 E. Evelyn Avenue</street>

+<city>Mountain View</city>

+<region>CA</region>

+<code>94041</code>

+<country>USA</country>

+</postal>

+<phone>+1 650 903-0800</phone>

+<email>jmvalin@jmvalin.ca</email>

+</address>

+</author>

+<author initials="K." surname="Vos" fullname="Koen Vos">

+<organization>vocTone</organization>

+<address>

+<postal>

+<street></street>

+<city></city>

+<region></region>

+<code></code>

+<country></country>

+</postal>

+<phone></phone>

+<email>koenvos74@gmail.com</email>

+</address>

+</author>

+ <date day="19" month="June" year="2017" />

+ <abstract>

+ <t>This document addresses minor issues that were found in the specification

+ of the Opus audio codec in <xref target="RFC6716">RFC 6716</xref>.</t>

+ </abstract>

+ </front>

+ <middle>

+ <section title="Introduction">

+ <t>This document addresses minor issues that were discovered in the reference

+ implementation of the Opus codec that serves as the specification in

+ <xref target="RFC6716">RFC 6716</xref>. Only issues affecting the decoder are

+ listed here. An up-to-date implementation of the Opus encoder can be found at

+ https://opus-codec.org/.</t>

+ <t>

+ Some of the changes in this document update normative behaviour in a way that requires

+ new test vectors. The English text of the specification is unaffected, only

+ the C implementation is. The updated specification remains fully compatible with

+ the original specification.

+ </t>

+ <t>

+ Note: due to RFC formatting conventions, lines exceeding the column width

+ in the patch are split using a backslash character. The backslashes

+ at the end of a line and the white space at the beginning

+ of the following line are not part of the patch. A properly formatted patch

+ including all changes is available at

+ <eref target="https://jmvalin.ca/misc_stuff/opus_update.patch"/>. (EDITOR:

+ change to an ietf.org link when ready)

+ </t>

+ </section>

+ <section title="Terminology">

+ <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",

+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this

+ document are to be interpreted as described in <xref

+ target="RFC2119">RFC 2119</xref>.</t>

+ </section>

+ <section title="Stereo State Reset in SILK">

+ <t>The reference implementation does not reinitialize the stereo state

+ during a mode switch. The old stereo memory can produce a brief impulse

+ (i.e. single sample) in the decoded audio. This can be fixed by changing

+ silk/dec_API.c at line 72:

+ </t>

+<figure>

+<artwork><![CDATA[

+ for( n = 0; n < DECODER_NUM_CHANNELS; n++ ) {

+ ret = silk_init_decoder( &channel_state[ n ] );

+ }

++ silk_memset(&((silk_decoder *)decState)->sStereo, 0,

++ sizeof(((silk_decoder *)decState)->sStereo));

++ /* Not strictly needed, but it's cleaner that way */

++ ((silk_decoder *)decState)->prev_decode_only_middle = 0;

+ return ret;

+ }

+]]></artwork>

+</figure>

+ <t>

+ This change affects the normative part of the decoder, although the

+ amount of change is too small to make a significant impact on testvectors.

+ </t>

+ </section>

+ <section anchor="padding" title="Parsing of the Opus Packet Padding">

+ <t>It was discovered that some invalid packets of very large size could trigger

+ an out-of-bounds read in the Opus packet parsing code responsible for padding.

+ This is due to an integer overflow if the signaled padding exceeds 2^31-1 bytes

+ (the actual packet may be smaller). The code can be fixed by applying the following

+ changes at line 596 of src/opus_decoder.c:

+ </t>

+<figure>

+<artwork><![CDATA[

+ /* Padding flag is bit 6 */

+ if (ch&0x40)

+ {

+- int padding=0;

+ int p;

+ do {

+ if (len<=0)

+ return OPUS_INVALID_PACKET;

+ p = *data++;

+ len--;

+- padding += p==255 ? 254: p;

++ len -= p==255 ? 254: p;

+ } while (p==255);

+- len -= padding;

+ }

+]]></artwork>

+</figure>

+ <t>This packet parsing issue is limited to reading memory up

+ to about 60 kB beyond the compressed buffer. This can only be triggered

+ by a compressed packet more than about 16 MB long, so it's not a problem

+ for RTP. In theory, it <spanx style="emph">could</spanx> crash a file

+ decoder (e.g. Opus in Ogg) if the memory just after the incoming packet

+ is out-of-range, but our attempts to trigger such a crash in a production

+ application built using an affected version of the Opus decoder failed.</t>

+ </section>

+ <section anchor="resampler" title="Resampler buffer">

+ <t>The SILK resampler had the following issues:

+ <list style="numbers">

+ <t>The calls to memcpy() were using sizeof(opus_int32), but the type of the

+ local buffer was opus_int16.</t>

+ <t>Because the size was wrong, this potentially allowed the source

+ and destination regions of the memcpy() to overlap.

+ We <spanx style="emph">believe</spanx> that nSamplesIn is at least fs_in_khZ,

+ which is at least 8.

+ Since RESAMPLER_ORDER_FIR_12 is only 8, that should not be a problem once

+ the type size is fixed.</t>

+ <t>The size of the buffer used RESAMPLER_MAX_BATCH_SIZE_IN, but the

+ data stored in it was actually _twice_ the input batch size

+ (nSamplesIn<<1).</t>

+ </list></t>

+ <t>

+ The fact that the code never produced any error in testing (including when run under the

+ Valgrind memory debugger), suggests that in practice

+ the batch sizes are reasonable enough that none of the issues above

+ was ever a problem. However, proving that is non-obvious.

+ </t>

+ <t>The code can be fixed by applying the following changes to line 78 of silk/resampler_private_IIR_FIR.c:

+ </t>

+<figure>

+<artwork><![CDATA[

+ )

+ {

+ silk_resampler_state_struct *S = \

+(silk_resampler_state_struct *)SS;

+ opus_int32 nSamplesIn;

+ opus_int32 max_index_Q16, index_increment_Q16;

+- opus_int16 buf[ RESAMPLER_MAX_BATCH_SIZE_IN + \

+RESAMPLER_ORDER_FIR_12 ];

++ opus_int16 buf[ 2*RESAMPLER_MAX_BATCH_SIZE_IN + \

+RESAMPLER_ORDER_FIR_12 ];

+ /* Copy buffered samples to start of buffer */

+- silk_memcpy( buf, S->sFIR, RESAMPLER_ORDER_FIR_12 \

+* sizeof( opus_int32 ) );

++ silk_memcpy( buf, S->sFIR, RESAMPLER_ORDER_FIR_12 \

+* sizeof( opus_int16 ) );

+ /* Iterate over blocks of frameSizeIn input samples */

+ index_increment_Q16 = S->invRatio_Q16;

+ while( 1 ) {

+ nSamplesIn = silk_min( inLen, S->batchSize );

+ /* Upsample 2x */

+ silk_resampler_private_up2_HQ( S->sIIR, &buf[ \

+RESAMPLER_ORDER_FIR_12 ], in, nSamplesIn );

+ max_index_Q16 = silk_LSHIFT32( nSamplesIn, 16 + 1 \

+); /* + 1 because 2x upsampling */

+ out = silk_resampler_private_IIR_FIR_INTERPOL( out, \

+buf, max_index_Q16, index_increment_Q16 );

+ in += nSamplesIn;

+ inLen -= nSamplesIn;

+ if( inLen > 0 ) {

+ /* More iterations to do; copy last part of \

+filtered signal to beginning of buffer */

+- silk_memcpy( buf, &buf[ nSamplesIn << 1 ], \

+RESAMPLER_ORDER_FIR_12 * sizeof( opus_int32 ) );

++ silk_memmove( buf, &buf[ nSamplesIn << 1 ], \

+RESAMPLER_ORDER_FIR_12 * sizeof( opus_int16 ) );

+ } else {

+ break;

+ }

+ /* Copy last part of filtered signal to the state for \

+the next call */

+- silk_memcpy( S->sFIR, &buf[ nSamplesIn << 1 ], \

+RESAMPLER_ORDER_FIR_12 * sizeof( opus_int32 ) );

++ silk_memcpy( S->sFIR, &buf[ nSamplesIn << 1 ], \

+RESAMPLER_ORDER_FIR_12 * sizeof( opus_int16 ) );

+ }

+]]></artwork>

+</figure>

+ </section>

+ <section title="Integer wrap-around in inverse gain computation">

+ <t>

+ It was discovered through decoder fuzzing that some bitstreams could produce

+ integer values exceeding 32-bits in LPC_inverse_pred_gain_QA(), causing

+ a wrap-around. Although the error is harmless in practice, the C standard considers

+ the behavior as undefined, so the following patch to line 87 of silk/LPC_inv_pred_gain.c

+ detects values that do not fit in a 32-bit integer and considers the corresponding filters unstable:

+ </t>

+<figure>

+<artwork><![CDATA[

+ /* Update AR coefficient */

+ for( n = 0; n < k; n++ ) {

+- tmp_QA = Aold_QA[ n ] - MUL32_FRAC_Q( \

+Aold_QA[ k - n - 1 ], rc_Q31, 31 );

+- Anew_QA[ n ] = MUL32_FRAC_Q( tmp_QA, rc_mult2 , mult2Q );

++ opus_int64 tmp64;

++ tmp_QA = silk_SUB_SAT32( Aold_QA[ n ], MUL32_FRAC_Q( \

+Aold_QA[ k - n - 1 ], rc_Q31, 31 ) );

++ tmp64 = silk_RSHIFT_ROUND64( silk_SMULL( tmp_QA, \

+rc_mult2 ), mult2Q);

++ if( tmp64 > silk_int32_MAX || tmp64 < silk_int32_MIN ) {

++ return 0;

++ }

++ Anew_QA[ n ] = ( opus_int32 )tmp64;

+ }

+]]></artwork>

+</figure>

+ </section>

+ <section title="Integer wrap-around in LSF decoding">

+ <t>

+ It was discovered -- also from decoder fuzzing -- that an integer wrap-around could

+ occur when decoding line spectral frequency coefficients from extreme bitstreams.

+ The end result of the wrap-around is an illegal read access on the stack, which

+ the authors do not believe is exploitable but should nonetheless be fixed. The following

+ patch to line 137 of silk/NLSF_stabilize.c prevents the problem:

+ </t>

+<figure>

+<artwork><![CDATA[

+ /* Keep delta_min distance between the NLSFs */

+ for( i = 1; i < L; i++ )

+- NLSF_Q15[i] = silk_max_int( NLSF_Q15[i], \

+NLSF_Q15[i-1] + NDeltaMin_Q15[i] );

++ NLSF_Q15[i] = silk_max_int( NLSF_Q15[i], \

+silk_ADD_SAT16( NLSF_Q15[i-1], NDeltaMin_Q15[i] ) );

+ /* Last NLSF should be no higher than 1 - NDeltaMin[L] */

+]]></artwork>

+</figure>

+ </section>

+ <section title="Cap on Band Energy">

+ <t>On extreme bit-streams, it is possible for log-domain band energy levels

+ to exceed the maximum single-precision floating point value once converted

+ to a linear scale. This would later cause the decoded values to be NaN,

+ possibly causing problems in the software using the PCM values. This can be

+ avoided with the following patch to line 552 of celt/quant_bands.c:

+ </t>

+<figure>

+<artwork><![CDATA[

+ {

+ opus_val16 lg = ADD16(oldEBands[i+c*m->nbEBands],

+ SHL16((opus_val16)eMeans[i],6));

++ lg = MIN32(QCONST32(32.f, 16), lg);

+ eBands[i+c*m->nbEBands] = PSHR32(celt_exp2(lg),4);

+ }

+ for (;i<m->nbEBands;i++)

+]]></artwork>

+</figure>

+ </section>

+ <section title="Hybrid Folding" anchor="folding">

+ <t>When encoding in hybrid mode at low bitrate, we sometimes only have

+ enough bits to code a single CELT band (8 - 9.6 kHz). When that happens,

+ the second band (CELT band 18, from 9.6 to 12 kHz) cannot use folding

+ because it is wider than the amount already coded, and falls back to

+ LCG noise. Because it can also happen on transients (e.g. stops), it

+ can cause audible pre-echo.

+ </t>

+ <t>

+ To address the issue, we change the folding behavior so that it is

+ never forced to fall back to LCG due to the first band not containing

+ enough coefficients to fold onto the second band. This

+ is achieved by simply repeating part of the first band in the folding

+ of the second band. This changes the code in celt/bands.c around line 1237:

+ </t>

+<figure>

+<artwork><![CDATA[

+ b = 0;

+ }

+- if (resynth && M*eBands[i]-N >= M*eBands[start] && \

+(update_lowband || lowband_offset==0))

++ if (resynth && (M*eBands[i]-N >= M*eBands[start] || \

+i==start+1) && (update_lowband || lowband_offset==0))

+ lowband_offset = i;

++ if (i == start+1)

++ {

++ int n1, n2;

++ int offset;

++ n1 = M*(eBands[start+1]-eBands[start]);

++ n2 = M*(eBands[start+2]-eBands[start+1]);

++ offset = M*eBands[start];

++ /* Duplicate enough of the first band folding data to \

+be able to fold the second band.

++ Copies no data for CELT-only mode. */

++ OPUS_COPY(&norm[offset+n1], &norm[offset+2*n1 - n2], n2-n1);

++ if (C==2)

++ OPUS_COPY(&norm2[offset+n1], &norm2[offset+2*n1 - n2], \

+n2-n1);

++ }

+ tf_change = tf_res[i];

+ if (i>=m->effEBands)

+ {

+]]></artwork>

+</figure>

+ <t>

+ as well as line 1260:

+ </t>

+<figure>

+<artwork><![CDATA[

+ fold_start = lowband_offset;

+ while(M*eBands[--fold_start] > effective_lowband);

+ fold_end = lowband_offset-1;

+- while(M*eBands[++fold_end] < effective_lowband+N);

++ while(++fold_end < i && M*eBands[fold_end] < \

+effective_lowband+N);

+ x_cm = y_cm = 0;

+ fold_i = fold_start; do {

+ x_cm |= collapse_masks[fold_i*C+0];

+]]></artwork>

+</figure>

+ <t>

+ The fix does not impact compatibility, because the improvement does

+ not depend on the encoder doing anything special. There is also no

+ reasonable way for an encoder to use the original behavior to

+ improve quality over the proposed change.

+ </t>

+ </section>

+ <section title="Downmix to Mono" anchor="stereo">

+ <t>The last issue is not strictly a bug, but it is an issue that has been reported

+ when downmixing an Opus decoded stream to mono, whether this is done inside the decoder

+ or as a post-processing step on the stereo decoder output. Opus intensity stereo allows

+ optionally coding the two channels 180-degrees out of phase on a per-band basis.

+ This provides better stereo quality than forcing the two channels to be in phase,

+ but when the output is downmixed to mono, the energy in the affected bands is cancelled

+ sometimes resulting in audible artefacts.

+ </t>

+ <t>As a work-around for this issue, the decoder MAY choose not to apply the 180-degree

+ phase shift when the output is meant to be downmixed (inside or

+ outside of the decoder).

+ </t>

+ </section>

+ <section title="New Test Vectors">

+ <t>Changes in <xref target="folding"/> and <xref target="stereo"/> have

+ sufficient impact on the testvectors to make them fail. For this reason,

+ this document also updates the Opus test vectors. The new test vectors now

+ include two decoded outputs for the same bitstream. The outputs with

+ suffix 'm' do not apply the CELT 180-degree phase shift as allowed in

+ <xref target="stereo"/>, while the outputs without the suffix do. An

+ implementation is compliant as long as it passes either set of vectors.

+ </t>

+ <t>

+ In addition, any Opus implementation

+ that passes the original test vectors from <xref target="RFC6716">RFC 6716</xref>

+ is still compliant with the Opus specification. However, newer implementations

+ SHOULD be based on the new test vectors rather than the old ones.

+ </t>

+ <t>The new test vectors are located at

+ <eref target="https://jmvalin.ca/misc_stuff/opus_newvectors.tar.gz"/>. (EDITOR:

+ change to an ietf.org link when ready)

+ </t>

+ </section>

+ <section anchor="IANA" title="IANA Considerations">

+ <t>This document makes no request of IANA.</t>

+ <t>Note to RFC Editor: this section may be removed on publication as an

+ RFC.</t>

+ </section>

+ <section anchor="Acknowledgements" title="Acknowledgements">

+ <t>We would like to thank Juri Aedla for reporting the issue with the parsing of

+ the Opus padding. Also, thanks to Jonathan Lennox and Mark Harris for their

+ feedback on this document.</t>

+ </section>

+ </middle>

+ <back>

+ <references title="References">

+ <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml"?>

+ <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.6716.xml"?>

+ </references>

+ </back>

+</rfc>

« no previous file with comments | « third_party/opus/src/doc/draft-ietf-codec-opus.xml ('k') | third_party/opus/src/doc/draft-ietf-payload-rtp-opus.xml » ('j') | no next file with comments »