media/filters/audio_renderer_algorithm.h - Issue 19111004: Upgrade AudioRendererAlgorithm to use WSOLA,

Side by Side Diff: media/filters/audio_renderer_algorithm.h

Issue 19111004: Upgrade AudioRendererAlgorithm to use WSOLA, (Closed) Base URL: svn://svn.chromium.org/chrome/trunk/src

Patch Set: "Dale's and Marco's comments are addressed." Created 7 years, 4 months ago


OLD	NEW
1 // Copyright (c) 2012 The Chromium Authors. All rights reserved.	1 // Copyright (c) 2012 The Chromium Authors. All rights reserved.

2 // Use of this source code is governed by a BSD-style license that can be	2 // Use of this source code is governed by a BSD-style license that can be

3 // found in the LICENSE file.	3 // found in the LICENSE file.

4	4

5 // AudioRendererAlgorithm buffers and transforms audio data. The owner of	5 // AudioRendererAlgorithm buffers and transforms audio data. The owner of

6 // this object provides audio data to the object through EnqueueBuffer() and	6 // this object provides audio data to the object through EnqueueBuffer() and

7 // requests data from the buffer via FillBuffer(). The owner also sets the	7 // requests data from the buffer via FillBuffer(). The owner also sets the

8 // playback rate, and the AudioRendererAlgorithm will stretch or compress the	8 // playback rate, and the AudioRendererAlgorithm will stretch or compress the

9 // buffered audio as necessary to match the playback rate when fulfilling	9 // buffered audio as necessary to match the playback rate when fulfilling

10 // FillBuffer() requests.	10 // FillBuffer() requests.

(...skipping 66 matching lines...)
77 // than \|audio_buffer_\| was intending to hold.	77 // than \|audio_buffer_\| was intending to hold.

78 int frames_buffered() { return audio_buffer_.frames(); }	78 int frames_buffered() { return audio_buffer_.frames(); }

79	79

80 // Returns the samples per second for this audio stream.	80 // Returns the samples per second for this audio stream.

81 int samples_per_second() { return samples_per_second_; }	81 int samples_per_second() { return samples_per_second_; }

82	82

83 // Is the sound currently muted?	83 // Is the sound currently muted?

84 bool is_muted() { return muted_; }	84 bool is_muted() { return muted_; }

85	85

86 private:	86 private:

87 // Fills \|dest\| with up to \|requested_frames\| frames of audio data at faster	87 // Within the search region, find the block of data that is most similar to

88 // than normal speed. Returns the number of frames inserted into \|dest\|. If	88 // target block, and write it in \|optimal_block_\|. Returns false if there is
	DaleCurtis 2013/08/13 21:11:04 s/target block/\|target_block_\|/
	turaj 2013/08/16 22:13:56 Comment rephrased.
89 // not enough data available, returns 0.	89 // not enough data to perform search. This is the case if either

90 //	90 // \|target_block_\| or \|search_block_\| extend into the future, i.e more input

91 // When the audio playback is > 1.0, we use a variant of Overlap-Add to squish	91 // is required. Otherwise true is returned.

92 // audio output while preserving pitch. Essentially, we play a bit of audio	92 bool GetOptimalBlock();

93 // data at normal speed, then we "fast forward" by dropping the next bit of

94 // audio data, and then we stich the pieces together by crossfading from one

95 // audio chunk to the next.

96 int OutputFasterPlayback(AudioBus* dest,

97 int dest_offset,

98 int requested_frames,

99 int input_step,

100 int output_step);

101	93

102 // Fills \|dest\| with up to \|requested_frames\| frames of audio data at slower	94 // Read a maximum of \|requested_frames\| frames from \|wsola_output_\|. Returns

103 // than normal speed. Returns the number of frames inserted into \|dest\|. If	95 // number of frames actually read.

104 // not enough data available, returns 0.	96 int WriteCompletedFramesTo(

105 //	97 int requested_frames, int output_offset, AudioBus* dest);

106 // When the audio playback is < 1.0, we use a variant of Overlap-Add to

107 // stretch audio output while preserving pitch. This works by outputting a

108 // segment of audio data at normal speed. The next audio segment then starts

109 // by repeating some of the audio data from the previous audio segment.

110 // Segments are stiched together by crossfading from one audio chunk to the

111 // next.

112 int OutputSlowerPlayback(AudioBus* dest,

113 int dest_offset,

114 int requested_frames,

115 int input_step,

116 int output_step);

117	98

118 // Resets the window state to the start of a new window.	99 // Fill \|dest\| with frames from \|audio_buffer_\| starting from frame

119 void ResetWindow();	100 // \|read_offset_frames\|. \|dest\| is expected to have the same number of

	101 // channels as \|audio_buffer_\|. A Negative offset, i.e.
	DaleCurtis 2013/08/13 21:11:04 s/Negative/negative/
	turaj 2013/08/16 22:13:56 Done.
	102 // \|read_offset_frames\| < 0, is accepted assuming that \|audio_buffer\| is zero

	103 // for negative indices. This might happen for few first frames. False will

	104 // be returned if it is required to read beyond the last frame of

	105 // \|audio_buffer_\|, otherwise true is returned.

	106 bool PeekAudioWithZeroAppend(int read_offset_frames, AudioBus* dest);
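
Note on the contract above: the comment for PeekAudioWithZeroAppend() says negative offsets read as zeros and that a read past the last buffered frame fails. A minimal mono sketch of that contract is given below. It is not the patch's implementation: the name `PeekWithZeroPrepend` and the use of `std::vector<float>` in place of `AudioBus` and `AudioBufferQueue` are assumptions made only for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical mono stand-in for the declared method. Copies dest->size()
// frames of |input| starting at |read_offset_frames| into |dest|. Frames at
// negative indices are treated as zero. Returns false, leaving |dest|
// untouched, if the read would go past the last frame of |input|.
bool PeekWithZeroPrepend(const std::vector<float>& input,
                         int read_offset_frames,
                         std::vector<float>* dest) {
  assert(dest != nullptr);
  const int num_frames = static_cast<int>(dest->size());
  const int input_frames = static_cast<int>(input.size());
  if (read_offset_frames + num_frames > input_frames)
    return false;  // More input is required.
  for (int i = 0; i < num_frames; ++i) {
    const int src = read_offset_frames + i;
    (*dest)[i] = src < 0 ? 0.0f : input[src];
  }
  return true;
}
```

With a four-frame input, a three-frame read at offset -2 would yield two zeros followed by the first input frame, while a three-frame read at offset 2 would fail because it needs a fifth frame.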

120	107

121 // Does a linear crossfade from \|intro\| into \|outtro\| for one frame.	108 // Run one iteration of WSOLA, if there are sufficient frames. This will

122 void CrossfadeFrame(AudioBus* intro,	109 // extend the output by \|ola_hop_size_\|, written to \|wsola_output_\|. Then,

123 int intro_offset,	110 // at most \|requested_frames\| frames are read and written to \|dest\|, starting

124 AudioBus* outtro,	111 // at \|dest_offset\| frame. The number of frames

125 int outtro_offset,	112 // which is actually written to \|dest\| is returned.

126 int fade_offset);	113 bool WsolaIteration();
	DaleCurtis 2013/08/13 21:11:04 RunOneWsolaIteration() ?
	turaj 2013/08/16 22:13:56 Done.
	114

	115 // Seek \|audio_buffer_\| forward to remove frames from input that are not used

	116 // any more. State of the WSOLA will be updated accordingly.

	117 void RemoveOldInputFrames();

	118

	119 // Return the index to the first frame of the search region.

	120 int GetSearchRegionIndex() const;

	121

	122 // Is the target block fully within search region? If so, we don't need to
	DaleCurtis 2013/08/13 21:11:04 Use explicit \|target_block_\| and \|search_block_\| references in method descriptions (here and elsewhere) -- the algorithm is already confusing :)
	turaj 2013/08/16 22:13:56 Done.
	123 // perform the search.

	124 bool TargetIsWithinSearchRegion() const;

	125

	126 // Do we have enough data to perform one round of WSOLA?

	127 bool CanPerformWsola() const;

127	128

128 // Number of channels in audio stream.	129 // Number of channels in audio stream.

129 int channels_;	130 int channels_;

130	131

131 // Sample rate of audio stream.	132 // Sample rate of audio stream.

132 int samples_per_second_;	133 int samples_per_second_;

133	134

134 // Used by algorithm to scale output.	135 // Used by algorithm to scale output.

135 float playback_rate_;	136 float playback_rate_;

136	137

137 // Buffered audio data.	138 // Buffered audio data.

138 AudioBufferQueue audio_buffer_;	139 AudioBufferQueue audio_buffer_;

139	140

140 // Length for crossfade in frames.

141 int frames_in_crossfade_;

142

143 // The current location in the audio window, between 0 and \|window_size_\|.

144 // When \|index_into_window_\| reaches \|window_size_\|, the window resets.

145 // Indexed by frame.

146 int index_into_window_;

147

148 // The frame number in the crossfade.

149 int crossfade_frame_number_;

150

151 // True if the audio should be muted.	141 // True if the audio should be muted.

152 bool muted_;	142 bool muted_;

153	143

154 // If muted, keep track of partial frames that should have been skipped over.	144 // If muted, keep track of partial frames that should have been skipped over.

155 double muted_partial_frame_;	145 double muted_partial_frame_;

156	146

157 // Temporary buffer to hold crossfade data.

158 scoped_ptr<AudioBus> crossfade_buffer_;

159

160 // Window size, in frames (calculated from audio properties).

161 int window_size_;

162

163 // How many frames to have in the queue before we report the queue is full.	147 // How many frames to have in the queue before we report the queue is full.

164 int capacity_;	148 int capacity_;

165	149

	150 // Waveform Similarity Overlap-and-add (WSOLA) variables.
	DaleCurtis 2013/08/13 21:11:04 This is more of an algorithm description and should be in the .cc file not the .h. Also the listed variables don't match up to the ones actually in the class.
	turaj 2013/08/16 22:13:56 I thought a description of algorithm helps understanding of variables. Moved to .cc.
	DaleCurtis 2013/08/19 22:15:23 Feel free to add a comment telling readers to look at the top of the .cc file for a more elaborate description.
	turaj 2013/08/21 01:01:19 Done. If you feel appropriate I can give reference to the paper where the algorithm is published.
	151 //

	152 // This is how WSOLA with 50% overlap-add works:

	153 //

	154 // Notation:

	155 //

	156 // \|W\| overlap-and-add (OLA) window.

	157 // \|L\| size of \|W\| in samples.

	158 // \|alpha\| playback-rate, where values less than 1 indicate a slowed-down

	159 // playout (output is longer than input).

	160 // \|ts_out\| current timestamp of output.

	161 // \|target\| target-block, we search the input to find a block that is most

	162 // similar to \|target\|. Similarity is measured by the correlation

	163 // between two given blocks.

	164 // \|tau\| a parameter defining the search interval. The search interval for

	165 // the best matched to \|target\| is

	166 // [\|ts_out\|\|alpha\|-\|tau\|, \|ts_out\|\|alpha\|+\|tau\|].

	167 // \|U\| Transition Window. See the step 5) for the usage of this window.

	168 //

	169 // Assume we start at time 0, i.e. beginning of both input

	170 // and output streams.

	171 //

	172 // 1) Initialize the output with the faded-out version of the first \|L/2\|

	173 // samples of the input. The faded-out version is constructed by

	174 // multiplying \|L/2\| input samples with the second half of OLA window, \|W\|.

	175 //

	176 // 2) Set the timestamp of output, \|ts_out\|, to \|L/2\|.

	177 //

	178 // 3) \|target\| is samples [0, L) of the input. This is the "natural"

	179 // continuation to the output (given 50% overlap-and-add).

	180 //

	181 // 4) Search interval of input is then centered at \|ts_out\| * \|alpha\| with

	182 // the width of 2 * \|tau\|, i.e. \|ts_out\| * \|alpha\| + [-\|tau\|, \|tau\|].

	183 //

	184 // 5) Find a frame which is centered within the search interval and is most

	185 // similar to \|target\|. Let \|Q\| be the most similar block to \|target\|

	186 // centered at \|ts_in_opt\|.

	187 // We compute the optimal block as \|opt\| = \|U\| * \|target\| +

	188 // (1 - \|U\|) * \|Q\|.

	189 //

	190 // 6) Overlap-and-add \|opt\| to the output. That is to add \|opt\| * \|W\| to the

	191 // output with \|L/2\| samples overlap.

	192 //

	193 // 7) \|ts_out\| = \|ts_out\| + \|L/2\|

	194 // Let \|target\| be the frame of the input centered at \|ts_in_opt\| + \|L/2\|.

	195 // Note that now \|target\| is the natural continuation to the current

	196 // output (the frame that follows \|opt\| in overlap-and-add sense).

	197 // Continue from step 4.

	198 //
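
Note on step 5 of the description above: the search and the transition-window blend can be sketched for a single channel as follows. This is only an illustration under stated assumptions, not the code in the .cc file. The function names, the `std::vector<float>` types, and the choice to score candidates by correlation normalized by the candidate's energy are all assumptions; the description only says that similarity is measured by correlation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the start index of the block inside |search|
// that is most similar to |target|. Similarity here is the dot product with
// |target| divided by the square root of the candidate's energy (an assumed
// normalization).
int FindMostSimilarBlock(const std::vector<float>& search,
                         const std::vector<float>& target) {
  assert(!target.empty() && search.size() >= target.size());
  const std::size_t block = target.size();
  int best_index = 0;
  double best_score = -1e300;
  for (std::size_t start = 0; start + block <= search.size(); ++start) {
    double dot = 0.0;
    double energy = 0.0;
    for (std::size_t i = 0; i < block; ++i) {
      const double s = search[start + i];
      dot += s * target[i];
      energy += s * s;
    }
    const double score = energy > 0.0 ? dot / std::sqrt(energy) : 0.0;
    if (score > best_score) {
      best_score = score;
      best_index = static_cast<int>(start);
    }
  }
  return best_index;
}

// Hypothetical helper for the blend in step 5:
// opt = U * target + (1 - U) * Q, applied sample by sample.
std::vector<float> BlendWithTransitionWindow(const std::vector<float>& target,
                                             const std::vector<float>& q,
                                             const std::vector<float>& u) {
  assert(target.size() == q.size() && target.size() == u.size());
  std::vector<float> opt(target.size());
  for (std::size_t i = 0; i < opt.size(); ++i)
    opt[i] = u[i] * target[i] + (1.0f - u[i]) * q[i];
  return opt;
}
```

The exhaustive loop visits every candidate start position in the search block; a real implementation may well prune or decimate the search, which this sketch does not attempt.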

	199

	200 // Book keeping of the current index of generated audio, in frames. This

	201 // should be appropriately updated when out samples are generated, regardless

	202 // of whether we push samples out when FillBuffer() is called or we store

	203 // audio in \|wsola_output\| for the subsequent calls to FillBuffer().

	204 // Furthermore, if samples from input \|audio_buffer_\| are evicted then this

	205 // variable should be updated accordingly, based on \|playback_rate_\|.

	206 int output_index_;

	207

	208 // The offset of the search center frame w.r.t. the first frame.

	209 int search_block_center_offset_;

	210

	211 // Number of Blocks to search to find the most similar one to the target

	212 // frame.

	213 int num_candidate_blocks_;

	214

	215 // Index of the beginning of the target block, counted in frames.

	216 int target_block_index_;

	217

	218 // Overlap-and-add window size in frames, denoted as \|L\| in WSOLA description.

	219 int ola_window_size_;

	220

	221 // The hop size of overlap-and-add in frames (\|L/2\|). This implementation

	222 // assumes 50% overlap-and-add.

	223 int ola_hop_size_;

	224

	225 // Number of frames in \|wsola_output_\| that overlap-and-add is completed for

	226 // them and can be copied to output if FillBuffer() is called. It also

	227 // specifies the index where the next WSOLA window has to overlap-and-add.

	228 int num_complete_frames_;

	229

	230 // This stores a part of the output that is created but couldn't be rendered.

	231 // Output is generated frame-by-frame which at some point might exceed the

	232 // number of requested samples. Furthermore, due to overlap-and-add,

	233 // the last half-window of the output is incomplete, which is stored in this

	234 // buffer.

	235 scoped_ptr<AudioBus> wsola_output_;

	236

	237 // Overlap-and-add window, denoted as \|W\| in the above (see step 6).

	238 scoped_ptr<float[]> ola_window_;
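
Note on step 6 and the 50% overlap assumption: the sketch below shows why a suitable window makes overlap-and-add with a hop of \|L/2\| reconstruct a constant signal exactly. The patch excerpt does not say which window shape is used, so the periodic Hann window here is an assumption, as are the function names and the mono `std::vector<float>` representation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical window generator: periodic Hann window of even length |size|.
// For this shape, w[n] + w[n + size / 2] == 1 for every n in [0, size / 2).
std::vector<float> MakePeriodicHannWindow(std::size_t size) {
  assert(size > 0 && size % 2 == 0);
  const double kPi = 3.14159265358979323846;
  std::vector<float> window(size);
  for (std::size_t n = 0; n < size; ++n) {
    window[n] = static_cast<float>(
        0.5 * (1.0 - std::cos(2.0 * kPi * static_cast<double>(n) /
                              static_cast<double>(size))));
  }
  return window;
}

// Hypothetical overlap-and-add: adds block[i] * window[i] into |output|
// starting at |position|, growing |output| with zeros when needed.
void OverlapAdd(const std::vector<float>& block,
                const std::vector<float>& window,
                std::size_t position,
                std::vector<float>* output) {
  assert(output != nullptr && block.size() == window.size());
  if (output->size() < position + block.size())
    output->resize(position + block.size(), 0.0f);
  for (std::size_t i = 0; i < block.size(); ++i)
    (*output)[position + i] += block[i] * window[i];
}
```

Adding blocks of ones at positions 0, L/2, L, and so on gives an output that is flat wherever two windows overlap. Only the first and last half-windows stay incomplete, which matches the reason the header keeps the trailing half-window in \|wsola_output_\| until the next iteration.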

	239

	240 // Transition window, denoted as \|U\| in the above (see step 5).

	241 scoped_ptr<float[]> transition_window_;

	242

	243 // Auxiliary variables to avoid allocation in every iteration.

	244

	245 // Stores the optimal block in every iteration. This is the most

	246 // similar block to \|target_block_\| within \|search_block_\| and it is

	247 // overlap-and-added to \|wsola_output_\|.

	248 scoped_ptr<AudioBus> optimal_block_;

	249

	250 // A block of data that search is performed over to find the \|optimal_block_\|.

	251 scoped_ptr<AudioBus> search_block_;

	252

	253 // Stores the target block, denoted as \|target\| above. \|search_block_\| is

	254 // searched for a block (\|optimal_block_\|) that is most similar to

	255 // \|target_block_\|.

	256 scoped_ptr<AudioBus> target_block_;

	257

166 DISALLOW_COPY_AND_ASSIGN(AudioRendererAlgorithm);	258 DISALLOW_COPY_AND_ASSIGN(AudioRendererAlgorithm);

167 };	259 };

168	260

169 } // namespace media	261 } // namespace media

170	262

171 #endif // MEDIA_FILTERS_AUDIO_RENDERER_ALGORITHM_H_	263 #endif // MEDIA_FILTERS_AUDIO_RENDERER_ALGORITHM_H_
