OLD | NEW |
(Empty) | |
| 1 <?xml version="1.0" encoding="US-ASCII"?> |
| 2 <!DOCTYPE rfc SYSTEM "rfc2629.dtd"> |
| 3 <?rfc toc="yes"?> |
| 4 <?rfc tocompact="yes"?> |
| 5 <?rfc tocdepth="3"?> |
| 6 <?rfc tocindent="yes"?> |
| 7 <?rfc symrefs="yes"?> |
| 8 <?rfc sortrefs="yes"?> |
| 9 <?rfc comments="yes"?> |
| 10 <?rfc inline="yes"?> |
| 11 <?rfc compact="yes"?> |
| 12 <?rfc subcompact="no"?> |
| 13 <rfc category="std" docName="draft-ietf-codec-opus-update-06" |
| 14 ipr="trust200902"> |
| 15 <front> |
| 16 <title abbrev="Opus Update">Updates to the Opus Audio Codec</title> |
| 17 |
| 18 <author initials="JM" surname="Valin" fullname="Jean-Marc Valin"> |
| 19 <organization>Mozilla Corporation</organization> |
| 20 <address> |
| 21 <postal> |
| 22 <street>331 E. Evelyn Avenue</street> |
| 23 <city>Mountain View</city> |
| 24 <region>CA</region> |
| 25 <code>94041</code> |
| 26 <country>USA</country> |
| 27 </postal> |
| 28 <phone>+1 650 903-0800</phone> |
| 29 <email>jmvalin@jmvalin.ca</email> |
| 30 </address> |
| 31 </author> |
| 32 |
| 33 <author initials="K." surname="Vos" fullname="Koen Vos"> |
| 34 <organization>vocTone</organization> |
| 35 <address> |
| 36 <postal> |
| 37 <street></street> |
| 38 <city></city> |
| 39 <region></region> |
| 40 <code></code> |
| 41 <country></country> |
| 42 </postal> |
| 43 <phone></phone> |
| 44 <email>koenvos74@gmail.com</email> |
| 45 </address> |
| 46 </author> |
| 47 |
| 48 |
| 49 |
| 50 <date day="19" month="June" year="2017" /> |
| 51 |
| 52 <abstract> |
| 53 <t>This document addresses minor issues that were found in the specificati
on |
| 54 of the Opus audio codec in <xref target="RFC6716">RFC 6716</xref>.</t> |
| 55 </abstract> |
| 56 </front> |
| 57 |
| 58 <middle> |
| 59 <section title="Introduction"> |
| 60 <t>This document addresses minor issues that were discovered in the refere
nce |
| 61 implementation of the Opus codec that serves as the specification in |
| 62 <xref target="RFC6716">RFC 6716</xref>. Only issues affecting the decoder
are |
| 63 listed here. An up-to-date implementation of the Opus encoder can be found
at |
| 64 https://opus-codec.org/.</t> |
| 65 <t> |
| 66 Some of the changes in this document update normative behaviour in a way t
hat requires |
| 67 new test vectors. The English text of the specification is unaffected; onl
y |
| 68 the C implementation is. The updated specification remains fully compatibl
e with |
| 69 the original specification. |
| 70 </t> |
| 71 |
| 72 <t> |
| 73 Note: due to RFC formatting conventions, lines exceeding the column width |
| 74 in the patch are split using a backslash character. The backslashes |
| 75 at the end of a line and the white space at the beginning |
| 76 of the following line are not part of the patch. A properly formatted patch |
| 77 including all changes is available at |
| 78 <eref target="https://jmvalin.ca/misc_stuff/opus_update.patch"/>. (EDITOR: |
| 79 change to an ietf.org link when ready) |
| 80 </t> |
| 81 |
| 82 </section> |
| 83 |
| 84 <section title="Terminology"> |
| 85 <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", |
| 86 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this |
| 87 document are to be interpreted as described in <xref |
| 88 target="RFC2119">RFC 2119</xref>.</t> |
| 89 </section> |
| 90 |
| 91 <section title="Stereo State Reset in SILK"> |
| 92 <t>The reference implementation does not reinitialize the stereo state |
| 93 during a mode switch. The old stereo memory can produce a brief impulse |
| 94 (i.e., a single sample) in the decoded audio. This can be fixed by changing |
| 95 silk/dec_API.c at line 72: |
| 96 </t> |
| 97 <figure> |
| 98 <artwork><![CDATA[ |
| 99 for( n = 0; n < DECODER_NUM_CHANNELS; n++ ) { |
| 100 ret = silk_init_decoder( &channel_state[ n ] ); |
| 101 } |
| 102 + silk_memset(&((silk_decoder *)decState)->sStereo, 0, |
| 103 + sizeof(((silk_decoder *)decState)->sStereo)); |
| 104 + /* Not strictly needed, but it's cleaner that way */ |
| 105 + ((silk_decoder *)decState)->prev_decode_only_middle = 0; |
| 106 |
| 107 return ret; |
| 108 } |
| 109 ]]></artwork> |
| 110 </figure> |
| 111 <t> |
| 112 This change affects the normative part of the decoder, although the |
| 113 change is too small to have a significant impact on the test vectors. |
| 114 </t> |
| 115 </section> |
| 116 |
| 117 <section anchor="padding" title="Parsing of the Opus Packet Padding"> |
| 118 <t>It was discovered that some invalid packets of very large size could tr
igger |
| 119 an out-of-bounds read in the Opus packet parsing code responsible for padd
ing. |
| 120 This is due to an integer overflow if the signaled padding exceeds 2^31-1
bytes |
| 121 (the actual packet may be smaller). The code can be fixed by applying the
following |
| 122 changes at line 596 of src/opus_decoder.c: |
| 123 </t> |
| 124 <figure> |
| 125 <artwork><![CDATA[ |
| 126 /* Padding flag is bit 6 */ |
| 127 if (ch&0x40) |
| 128 { |
| 129 - int padding=0; |
| 130 int p; |
| 131 do { |
| 132 if (len<=0) |
| 133 return OPUS_INVALID_PACKET; |
| 134 p = *data++; |
| 135 len--; |
| 136 - padding += p==255 ? 254: p; |
| 137 + len -= p==255 ? 254: p; |
| 138 } while (p==255); |
| 139 - len -= padding; |
| 140 } |
| 141 ]]></artwork> |
| 142 </figure> |
| 143 <t>This packet parsing issue is limited to reading memory up |
| 144 to about 60 kB beyond the compressed buffer. This can only be triggered |
| 145 by a compressed packet more than about 16 MB long, so it's not a proble
m |
| 146 for RTP. In theory, it <spanx style="emph">could</spanx> crash a file |
| 147 decoder (e.g. Opus in Ogg) if the memory just after the incoming packet |
| 148 is out-of-range, but our attempts to trigger such a crash in a producti
on |
| 149 application built using an affected version of the Opus decoder failed.
</t> |
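<t>
The effect of the patch can be illustrated with a minimal, stand-alone sketch of the
padding loop. It is not the reference code: the function name and the error constant
are hypothetical, and the final negative-length check is an assumption about what the
surrounding parser does. Because the signaled padding is subtracted from the remaining
length on every iteration, the loop stops as soon as the length is exhausted, so no
accumulator can grow large enough to overflow.
</t>
<figure>
<artwork><![CDATA[
```c
/* Hypothetical stand-in for OPUS_INVALID_PACKET (illustrative value). */
#define SKETCH_INVALID_PACKET (-4)

/* Hypothetical helper, not part of libopus.
   Reads the padding-length bytes at the start of data, where len is the
   number of bytes remaining in the packet. Returns the number of bytes
   left after removing the length bytes and the signaled padding, or
   SKETCH_INVALID_PACKET if the packet is too short. */
int sketch_parse_padding(const unsigned char *data, int len)
{
    int p;
    do {
        if (len <= 0)
            return SKETCH_INVALID_PACKET;
        p = *data++;
        len--;
        /* Subtract immediately: len shrinks by at most 254 per byte
           read, so it goes non-positive long before it could wrap. */
        len -= (p == 255) ? 254 : p;
    } while (p == 255);
    /* Assumed follow-up check: more padding signaled than available. */
    if (len < 0)
        return SKETCH_INVALID_PACKET;
    return len;
}
```
]]></artwork>
</figure>
<t>
For example, a 300-byte remainder that starts with the bytes 255 and 1 signals
255 bytes of padding using two length bytes, leaving 43 bytes.
</t>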
| 150 </section> |
| 151 |
| 152 <section anchor="resampler" title="Resampler buffer"> |
| 153 <t>The SILK resampler had the following issues: |
| 154 <list style="numbers"> |
| 155 <t>The calls to memcpy() were using sizeof(opus_int32), but the type of the |
| 156 local buffer was opus_int16.</t> |
| 157 <t>Because the size was wrong, this potentially allowed the source |
| 158 and destination regions of the memcpy() to overlap. |
| 159 We <spanx style="emph">believe</spanx> that nSamplesIn is at least fs_
in_khZ, |
| 160 which is at least 8. |
| 161 Since RESAMPLER_ORDER_FIR_12 is only 8, that should not be a problem once |
| 162 the type size is fixed.</t> |
| 163 <t>The size of the buffer used RESAMPLER_MAX_BATCH_SIZE_IN, but the |
| 164 data stored in it was actually _twice_ the input batch size |
| 165 (nSamplesIn<<1).</t> |
| 166 </list></t> |
| 167 <t> |
| 168 The fact that the code never produced any error in testing (including when
run under the |
| 169 Valgrind memory debugger) suggests that in practice |
| 170 the batch sizes are reasonable enough that none of the issues above |
| 171 was ever a problem. However, proving that is non-obvious. |
| 172 </t> |
| 173 <t>The code can be fixed by applying the following changes to line 78 of sil
k/resampler_private_IIR_FIR.c: |
| 174 </t> |
| 175 <figure> |
| 176 <artwork><![CDATA[ |
| 177 ) |
| 178 { |
| 179 silk_resampler_state_struct *S = \ |
| 180 (silk_resampler_state_struct *)SS; |
| 181 opus_int32 nSamplesIn; |
| 182 opus_int32 max_index_Q16, index_increment_Q16; |
| 183 - opus_int16 buf[ RESAMPLER_MAX_BATCH_SIZE_IN + \ |
| 184 RESAMPLER_ORDER_FIR_12 ]; |
| 185 + opus_int16 buf[ 2*RESAMPLER_MAX_BATCH_SIZE_IN + \ |
| 186 RESAMPLER_ORDER_FIR_12 ]; |
| 187 |
| 188 /* Copy buffered samples to start of buffer */ |
| 189 - silk_memcpy( buf, S->sFIR, RESAMPLER_ORDER_FIR_12 \ |
| 190 * sizeof( opus_int32 ) ); |
| 191 + silk_memcpy( buf, S->sFIR, RESAMPLER_ORDER_FIR_12 \ |
| 192 * sizeof( opus_int16 ) ); |
| 193 |
| 194 /* Iterate over blocks of frameSizeIn input samples */ |
| 195 index_increment_Q16 = S->invRatio_Q16; |
| 196 while( 1 ) { |
| 197 nSamplesIn = silk_min( inLen, S->batchSize ); |
| 198 |
| 199 /* Upsample 2x */ |
| 200 silk_resampler_private_up2_HQ( S->sIIR, &buf[ \ |
| 201 RESAMPLER_ORDER_FIR_12 ], in, nSamplesIn ); |
| 202 |
| 203 max_index_Q16 = silk_LSHIFT32( nSamplesIn, 16 + 1 \ |
| 204 ); /* + 1 because 2x upsampling */ |
| 205 out = silk_resampler_private_IIR_FIR_INTERPOL( out, \ |
| 206 buf, max_index_Q16, index_increment_Q16 ); |
| 207 in += nSamplesIn; |
| 208 inLen -= nSamplesIn; |
| 209 |
| 210 if( inLen > 0 ) { |
| 211 /* More iterations to do; copy last part of \ |
| 212 filtered signal to beginning of buffer */ |
| 213 - silk_memcpy( buf, &buf[ nSamplesIn << 1 ], \ |
| 214 RESAMPLER_ORDER_FIR_12 * sizeof( opus_int32 ) ); |
| 215 + silk_memmove( buf, &buf[ nSamplesIn << 1 ], \ |
| 216 RESAMPLER_ORDER_FIR_12 * sizeof( opus_int16 ) ); |
| 217 } else { |
| 218 break; |
| 219 } |
| 220 } |
| 221 |
| 222 /* Copy last part of filtered signal to the state for \ |
| 223 the next call */ |
| 224 - silk_memcpy( S->sFIR, &buf[ nSamplesIn << 1 ], \ |
| 225 RESAMPLER_ORDER_FIR_12 * sizeof( opus_int32 ) ); |
| 226 + silk_memcpy( S->sFIR, &buf[ nSamplesIn << 1 ], \ |
| 227 RESAMPLER_ORDER_FIR_12 * sizeof( opus_int16 ) ); |
| 228 } |
| 229 ]]></artwork> |
| 230 </figure> |
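<t>
The two memory-copy fixes can be illustrated in isolation. The sketch below is not the
SILK code: the function name and the SKETCH_ORDER constant are hypothetical, with the
constant set to 8 to mirror the value the text gives for RESAMPLER_ORDER_FIR_12. It
sizes the copy from the buffer's own element type, so the byte count cannot drift from
the declaration, and it uses memmove() because the source and destination may overlap
when the batch is short.
</t>
<figure>
<artwork><![CDATA[
```c
#include <stdint.h>
#include <string.h>

/* Hypothetical constant mirroring RESAMPLER_ORDER_FIR_12 (8). */
#define SKETCH_ORDER 8

/* Hypothetical helper, not part of SILK.
   Moves the last SKETCH_ORDER samples of the 2x-upsampled signal,
   which start at index 2*n_in, to the front of buf. The element size
   comes from buf itself rather than from a hard-coded type. */
void sketch_carry_tail(int16_t *buf, int n_in)
{
    memmove(buf, &buf[2 * n_in], SKETCH_ORDER * sizeof(buf[0]));
}
```
]]></artwork>
</figure>
<t>
With n_in = 3, the source starts at index 6 and the destination covers indices 0 to 7,
so the two regions share indices 6 and 7. This is the overlapping case that memcpy()
does not define.
</t>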
| 231 </section> |
| 232 |
| 233 <section title="Integer wrap-around in inverse gain computation"> |
| 234 <t> |
| 235 It was discovered through decoder fuzzing that some bitstreams could pro
duce |
| 236 integer values exceeding 32 bits in LPC_inverse_pred_gain_QA(), causing |
| 237 a wrap-around. Although the error is harmless in practice, the C standar
d considers |
| 238 the behavior as undefined, so the following patch to line 87 of silk/LPC
_inv_pred_gain.c |
| 239 detects values that do not fit in a 32-bit integer and considers the cor
responding filters unstable: |
| 240 </t> |
| 241 <figure> |
| 242 <artwork><![CDATA[ |
| 243 /* Update AR coefficient */ |
| 244 for( n = 0; n < k; n++ ) { |
| 245 - tmp_QA = Aold_QA[ n ] - MUL32_FRAC_Q( \ |
| 246 Aold_QA[ k - n - 1 ], rc_Q31, 31 ); |
| 247 - Anew_QA[ n ] = MUL32_FRAC_Q( tmp_QA, rc_mult2 , mult2Q ); |
| 248 + opus_int64 tmp64; |
| 249 + tmp_QA = silk_SUB_SAT32( Aold_QA[ n ], MUL32_FRAC_Q( \ |
| 250 Aold_QA[ k - n - 1 ], rc_Q31, 31 ) ); |
| 251 + tmp64 = silk_RSHIFT_ROUND64( silk_SMULL( tmp_QA, \ |
| 252 rc_mult2 ), mult2Q); |
| 253 + if( tmp64 > silk_int32_MAX || tmp64 < silk_int32_MIN ) { |
| 254 + return 0; |
| 255 + } |
| 256 + Anew_QA[ n ] = ( opus_int32 )tmp64; |
| 257 } |
| 258 ]]></artwork> |
| 259 </figure> |
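<t>
The pattern used by the patch (compute in 64 bits, then reject anything that does not
fit in 32 bits) can be sketched without the SILK macros. The function below is a
hypothetical illustration using only standard C types; it is not equivalent to
silk_SMULL() and silk_RSHIFT_ROUND64() in every detail, and it assumes a shift between
1 and 62 and an arithmetic right shift for negative values.
</t>
<figure>
<artwork><![CDATA[
```c
#include <stdint.h>

/* Hypothetical helper, not part of SILK.
   Computes a*b / 2^shift, rounded to nearest, in 64-bit arithmetic.
   Returns 1 and stores the result in *out if it fits in 32 bits;
   returns 0 and leaves *out untouched otherwise.
   Assumes 1 <= shift <= 62 and an arithmetic right shift for
   negative values (implementation-defined in C). */
int sketch_mul_shift_checked(int32_t a, int32_t b, int shift,
                             int32_t *out)
{
    int64_t prod = (int64_t)a * (int64_t)b;   /* |prod| <= 2^62 */
    int64_t rounded = (prod + ((int64_t)1 << (shift - 1))) >> shift;
    if (rounded > INT32_MAX || rounded < INT32_MIN)
        return 0;   /* caller treats the filter as unstable */
    *out = (int32_t)rounded;
    return 1;
}
```
]]></artwork>
</figure>
<t>
A caller would treat a return value of 0 in the same way the patch treats an
out-of-range tmp64, by reporting the filter as unstable.
</t>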
| 260 </section> |
| 261 |
| 262 <section title="Integer wrap-around in LSF decoding"> |
| 263 <t> |
| 264 It was discovered -- also from decoder fuzzing -- that an integer wrap-a
round could |
| 265 occur when decoding line spectral frequency coefficients from extreme bi
tstreams. |
| 266 The end result of the wrap-around is an illegal read access on the stack
, which |
| 267 the authors do not believe is exploitable but should nonetheless be fixe
d. The following |
| 268 patch to line 137 of silk/NLSF_stabilize.c prevents the problem: |
| 269 </t> |
| 270 <figure> |
| 271 <artwork><![CDATA[ |
| 272 /* Keep delta_min distance between the NLSFs */ |
| 273 for( i = 1; i < L; i++ ) |
| 274 - NLSF_Q15[i] = silk_max_int( NLSF_Q15[i], \ |
| 275 NLSF_Q15[i-1] + NDeltaMin_Q15[i] ); |
| 276 + NLSF_Q15[i] = silk_max_int( NLSF_Q15[i], \ |
| 277 silk_ADD_SAT16( NLSF_Q15[i-1], NDeltaMin_Q15[i] ) ); |
| 278 |
| 279 /* Last NLSF should be no higher than 1 - NDeltaMin[L] */ |
| 280 ]]></artwork> |
| 281 </figure> |
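<t>
A saturating 16-bit addition of the kind the patch relies on can be sketched as
follows. The function name is hypothetical and the code is only an illustration of the
technique, not the definition of silk_ADD_SAT16(): the sum is formed in 32 bits and
then clamped to the 16-bit range, so it can no longer wrap to a value of the opposite
sign.
</t>
<figure>
<artwork><![CDATA[
```c
#include <stdint.h>

/* Hypothetical helper, not part of SILK.
   Adds two 16-bit values in 32-bit arithmetic and clamps the result
   to [INT16_MIN, INT16_MAX] instead of letting it wrap. */
int16_t sketch_add_sat16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX)
        return INT16_MAX;
    if (sum < INT16_MIN)
        return INT16_MIN;
    return (int16_t)sum;
}
```
]]></artwork>
</figure>
<t>
For instance, 32000 + 1000 yields 32767 rather than a wrapped negative value.
</t>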
| 282 |
| 283 </section> |
| 284 |
| 285 <section title="Cap on Band Energy"> |
| 286 <t>On extreme bit-streams, it is possible for log-domain band energy level
s |
| 287 to exceed the maximum single-precision floating point value once convert
ed |
| 288 to a linear scale. This would later cause the decoded values to be NaN, |
| 289 possibly causing problems in the software using the PCM values. This can
be |
| 290 avoided with the following patch to line 552 of celt/quant_bands.c: |
| 291 </t> |
| 292 <figure> |
| 293 <artwork><![CDATA[ |
| 294 { |
| 295 opus_val16 lg = ADD16(oldEBands[i+c*m->nbEBands], |
| 296 SHL16((opus_val16)eMeans[i],6)); |
| 297 + lg = MIN32(QCONST32(32.f, 16), lg); |
| 298 eBands[i+c*m->nbEBands] = PSHR32(celt_exp2(lg),4); |
| 299 } |
| 300 for (;i<m->nbEBands;i++) |
| 301 ]]></artwork> |
| 302 </figure> |
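<t>
The failure mode can be reproduced with a few lines of plain C. The sketch below is a
simplification under stated assumptions: it uses integer exponents and hypothetical
helper names rather than celt_exp2(), and it assumes IEEE 754 single-precision
arithmetic. An uncapped exponent overflows to infinity, and subtracting infinity from
itself is one way in which later arithmetic then yields NaN. Capping the exponent at
32, the limit used in the patch, keeps the value finite.
</t>
<figure>
<artwork><![CDATA[
```c
/* Hypothetical helpers for illustration; not part of CELT. */

/* Returns 2^e for e >= 0 by repeated doubling. With IEEE 754 single
   precision the result becomes +infinity once e reaches 128. */
float sketch_pow2(int e)
{
    float v = 1.0f;
    while (e-- > 0)
        v *= 2.0f;
    return v;
}

/* Returns the smaller of two integers (stands in for the cap). */
int sketch_min_int(int a, int b)
{
    return a < b ? a : b;
}

/* NaN is the only floating-point value that differs from itself. */
int sketch_is_nan(float x)
{
    return x != x;
}
```
]]></artwork>
</figure>
<t>
With these helpers, sketch_pow2(200) is infinite and its difference with itself is NaN,
whereas sketch_pow2(sketch_min_int(200, 32)) is 4294967296 and behaves normally.
</t>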
| 303 </section> |
| 304 |
| 305 <section title="Hybrid Folding" anchor="folding"> |
| 306 <t>When encoding in hybrid mode at low bitrate, we sometimes only have |
| 307 enough bits to code a single CELT band (8 - 9.6 kHz). When that happens, |
| 308 the second band (CELT band 18, from 9.6 to 12 kHz) cannot use folding |
| 309 because it is wider than the amount already coded, and falls back to |
| 310 LCG noise. Because it can also happen on transients (e.g. stops), it |
| 311 can cause audible pre-echo. |
| 312 </t> |
| 313 <t> |
| 314 To address the issue, we change the folding behavior so that it is |
| 315 never forced to fall back to LCG due to the first band not containing |
| 316 enough coefficients to fold onto the second band. This |
| 317 is achieved by simply repeating part of the first band in the folding |
| 318 of the second band. This changes the code in celt/bands.c around line 12
37: |
| 319 </t> |
| 320 <figure> |
| 321 <artwork><![CDATA[ |
| 322 b = 0; |
| 323 } |
| 324 |
| 325 - if (resynth && M*eBands[i]-N >= M*eBands[start] && \ |
| 326 (update_lowband || lowband_offset==0)) |
| 327 + if (resynth && (M*eBands[i]-N >= M*eBands[start] || \ |
| 328 i==start+1) && (update_lowband || lowband_offset==0)) |
| 329 lowband_offset = i; |
| 330 |
| 331 + if (i == start+1) |
| 332 + { |
| 333 + int n1, n2; |
| 334 + int offset; |
| 335 + n1 = M*(eBands[start+1]-eBands[start]); |
| 336 + n2 = M*(eBands[start+2]-eBands[start+1]); |
| 337 + offset = M*eBands[start]; |
| 338 + /* Duplicate enough of the first band folding data to \ |
| 339 be able to fold the second band. |
| 340 + Copies no data for CELT-only mode. */ |
| 341 + OPUS_COPY(&norm[offset+n1], &norm[offset+2*n1 - n2], n2-n1); |
| 342 + if (C==2) |
| 343 + OPUS_COPY(&norm2[offset+n1], &norm2[offset+2*n1 - n2], \ |
| 344 n2-n1); |
| 345 + } |
| 346 + |
| 347 tf_change = tf_res[i]; |
| 348 if (i>=m->effEBands) |
| 349 { |
| 350 ]]></artwork> |
| 351 </figure> |
| 352 |
| 353 <t> |
| 354 as well as line 1260: |
| 355 </t> |
| 356 |
| 357 <figure> |
| 358 <artwork><![CDATA[ |
| 359 fold_start = lowband_offset; |
| 360 while(M*eBands[--fold_start] > effective_lowband); |
| 361 fold_end = lowband_offset-1; |
| 362 - while(M*eBands[++fold_end] < effective_lowband+N); |
| 363 + while(++fold_end < i && M*eBands[fold_end] < \ |
| 364 effective_lowband+N); |
| 365 x_cm = y_cm = 0; |
| 366 fold_i = fold_start; do { |
| 367 x_cm |= collapse_masks[fold_i*C+0]; |
| 368 |
| 369 ]]></artwork> |
| 370 </figure> |
| 371 <t> |
| 372 The fix does not impact compatibility, because the improvement does |
| 373 not depend on the encoder doing anything special. There is also no |
| 374 reasonable way for an encoder to use the original behavior to |
| 375 improve quality over the proposed change. |
| 376 </t> |
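<t>
The duplication step in the first patch can be illustrated on a plain array. The
sketch below is not the CELT code: the function name is hypothetical, it handles a
single channel, and the guard on the sizes is an added assumption that keeps the
source and destination from overlapping. It copies the last n2-n1 values of the first
band into the space directly after it, so that n2 values are available for folding
onto the wider second band.
</t>
<figure>
<artwork><![CDATA[
```c
#include <string.h>

/* Hypothetical helper, not part of CELT.
   The first band occupies norm[offset .. offset+n1-1]. This fills
   norm[offset+n1 .. offset+n2-1] with a copy of the last (n2 - n1)
   values of that band. It does nothing when the second band is not
   wider than the first (n2 <= n1), and also when n2 > 2*n1, a case
   this sketch does not handle because the regions would overlap. */
void sketch_extend_first_band(float *norm, int offset, int n1, int n2)
{
    int extra = n2 - n1;
    if (extra <= 0 || extra > n1)
        return;
    memcpy(&norm[offset + n1], &norm[offset + n1 - extra],
           (size_t)extra * sizeof(norm[0]));
}
```
]]></artwork>
</figure>
<t>
With illustrative sizes n1 = 8 and n2 = 12, which match the 2:3 width ratio of the two
bands named above, the values at indices 4 to 7 are repeated at indices 8 to 11.
</t>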
| 377 </section> |
| 378 |
| 379 <section title="Downmix to Mono" anchor="stereo"> |
| 380 <t>The last issue is not strictly a bug, but it is an issue that has been
reported |
| 381 when downmixing an Opus decoded stream to mono, whether this is done insid
e the decoder |
| 382 or as a post-processing step on the stereo decoder output. Opus intensity
stereo allows |
| 383 optionally coding the two channels 180-degrees out of phase on a per-band
basis. |
| 384 This provides better stereo quality than forcing the two channels to be in
phase, |
| 385 but when the output is downmixed to mono, the energy in the affected bands
is cancelled, |
| 386 sometimes resulting in audible artefacts. |
| 387 </t> |
| 388 <t>As a work-around for this issue, the decoder MAY choose not to apply th
e 180-degree |
| 389 phase shift when the output is meant to be downmixed (inside or |
| 390 outside of the decoder). |
| 391 </t> |
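<t>
The cancellation is easy to see in a simplified model. The sketch below is an
illustration only: the function names are hypothetical, and it ignores the per-channel
gains of real intensity stereo by deriving both channels from one coded value. When the
right channel is inverted, an equal-weight mono downmix of the two channels is zero.
</t>
<figure>
<artwork><![CDATA[
```c
/* Hypothetical helpers for illustration; not part of the decoder. */

/* Simplified intensity stereo: both channels are derived from one
   coded value. When inverted is nonzero, the right channel is
   180 degrees out of phase with the left channel. */
void sketch_intensity(float coded, int inverted,
                      float *left, float *right)
{
    *left = coded;
    *right = inverted ? -coded : coded;
}

/* Equal-weight mono downmix of one stereo sample pair. */
float sketch_downmix(float left, float right)
{
    return 0.5f * (left + right);
}
```
]]></artwork>
</figure>
<t>
For a coded value of 0.5, the downmix is 0.5 without the inversion and 0 with it,
which is why skipping the phase shift helps when the output is destined for mono.
</t>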
| 392 </section> |
| 393 |
| 394 |
| 395 <section title="New Test Vectors"> |
| 396 <t>Changes in <xref target="folding"/> and <xref target="stereo"/> have |
| 397 sufficient impact on the test vectors to make them fail. For this reason, |
| 398 this document also updates the Opus test vectors. The new test vectors n
ow |
| 399 include two decoded outputs for the same bitstream. The outputs with |
| 400 suffix 'm' do not apply the CELT 180-degree phase shift as allowed in |
| 401 <xref target="stereo"/>, while the outputs without the suffix do. An |
| 402 implementation is compliant as long as it passes either set of vectors. |
| 403 </t> |
| 404 <t> |
| 405 In addition, any Opus implementation |
| 406 that passes the original test vectors from <xref target="RFC6716">RFC 67
16</xref> |
| 407 is still compliant with the Opus specification. However, newer implement
ations |
| 408 SHOULD be based on the new test vectors rather than the old ones. |
| 409 </t> |
| 410 <t>The new test vectors are located at |
| 411 <eref target="https://jmvalin.ca/misc_stuff/opus_newvectors.tar.gz"/>. (
EDITOR: |
| 412 change to an ietf.org link when ready) |
| 413 </t> |
| 414 </section> |
| 415 |
| 416 <section anchor="IANA" title="IANA Considerations"> |
| 417 <t>This document makes no request of IANA.</t> |
| 418 |
| 419 <t>Note to RFC Editor: this section may be removed on publication as an |
| 420 RFC.</t> |
| 421 </section> |
| 422 |
| 423 <section anchor="Acknowledgements" title="Acknowledgements"> |
| 424 <t>We would like to thank Juri Aedla for reporting the issue with the pars
ing of |
| 425 the Opus padding. Also, thanks to Jonathan Lennox and Mark Harris for thei
r |
| 426 feedback on this document.</t> |
| 427 </section> |
| 428 </middle> |
| 429 |
| 430 <back> |
| 431 <references title="References"> |
| 432 <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.211
9.xml"?> |
| 433 <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.671
6.xml"?> |
| 434 |
| 435 |
| 436 </references> |
| 437 </back> |
| 438 </rfc> |