content/renderer/speech_recognition_audio_source_provider.cc - Issue 499233003: Binding media stream audio track to speech recognition [renderer]

Side by Side Diff: content/renderer/speech_recognition_audio_source_provider.cc

Issue 499233003: Binding media stream audio track to speech recognition [renderer] (Closed) Base URL: https://chromium.googlesource.com/chromium/src.git@master

Patch Set: Add unit test and refactor Created 6 years, 3 months ago

OLD	NEW
(Empty)
	1 // Copyright 2014 The Chromium Authors. All rights reserved.

	2 // Use of this source code is governed by a BSD-style license that can be

	3 // found in the LICENSE file.

	4

	5 #include "content/renderer/speech_recognition_audio_source_provider.h"

	6

	7 #include "base/logging.h"

	8 #include "base/memory/shared_memory.h"

	9 #include "base/threading/thread_restrictions.h"
	xians1 2014/09/16 12:44:05: Why do you have base/threading/thread_restrictions.h in both the header and the implementation?
	burnik 2014/09/16 19:10:22: Removed from implementation.
	10 #include "base/time/time.h"
	xians1 2014/09/15 08:31:28: Nit: alphabetical order.
	burnik 2014/09/15 15:00:06: Alphabetical order of what?
	xians1 2014/09/16 12:44:05: I was wrong, ignore it.
	burnik 2014/09/16 19:10:22: Acknowledged.
	11 #include "media/audio/audio_parameters.h"

	12 #include "media/base/audio_fifo.h"

	13

	14 namespace content {

	15

	16 SpeechRecognitionAudioSourceProvider::SpeechRecognitionAudioSourceProvider(

	17 const blink::WebMediaStreamTrack& track,

	18 const media::AudioParameters& params, const base::SharedMemoryHandle memory,

	19 base::SyncSocket* socket, OnErrorCB on_error_cb)

	20 : track_(track),

	21 shared_memory_(memory, false),

	22 socket_(socket),

	23 output_params_(params),

	24 track_stopped_(false),

	25 buffer_index_(0),

	26 on_error_cb_(on_error_cb) {

	27 DCHECK(main_render_thread_checker_.CalledOnValidThread());

	28 DCHECK(params.IsValid());

	29 const size_t memory_length = media::AudioBus::CalculateMemorySize(params) +

	30 sizeof(media::AudioInputBufferParameters);

	31 CHECK(shared_memory_.Map(memory_length));

	32

	33 uint8* ptr = static_cast<uint8*>(shared_memory_.memory());

	34 media::AudioInputBuffer* buffer =

	35 reinterpret_cast<media::AudioInputBuffer*>(ptr);

	36 // Keep params for sync with client via |params.size| on the shared memory.

	37 peer_buffer_index_ = &(buffer->params.size);
	xians1 2014/09/15 08:31:29: I think this is a bit wrong: shared_memory_ has not been used before, so why read the value there? You can simply initialize peer_buffer_index_ to 0 here.
	burnik 2014/09/15 15:00:05: It has been used, in the browser process, where it was initialized to 0 upon allocation and sharing. It makes sense to me to allocate and initialize in the same place.
	38 // Client must manage his own counter and reset it.

	39 DCHECK_EQ(0U, *peer_buffer_index_);

	40 output_bus_ = media::AudioBus::WrapMemory(params, buffer->audio);

	41 // Connect the source provider to the track as a sink.

	42 MediaStreamAudioSink::AddToAudioTrack(this, track_);

	43 }
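A minimal sketch of the shared-memory layout the constructor above assumes: a small header whose `size` field the client reuses as its consumed-buffer counter, followed by the audio payload, with the total length computed as payload plus header. `BufferHeader`, `AudioPayloadSize()`, `SharedMemoryLength()` and `SharedRegion` are hypothetical stand-ins for media::AudioInputBufferParameters, media::AudioInputBuffer and AudioBus::CalculateMemorySize(); the real definitions (including per-channel alignment padding) may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical header standing in for media::AudioInputBufferParameters.
// Only |size| matters here: the client reuses it as its consumed-buffer count.
struct BufferHeader {
  double volume;
  std::uint32_t size;
};

// Hypothetical simplification of AudioBus::CalculateMemorySize(): planar
// float samples with no per-channel alignment padding.
std::size_t AudioPayloadSize(int channels, int frames) {
  return sizeof(float) * static_cast<std::size_t>(channels) *
         static_cast<std::size_t>(frames);
}

// Mirrors |memory_length| in the constructor: audio payload + header.
std::size_t SharedMemoryLength(int channels, int frames) {
  return AudioPayloadSize(channels, frames) + sizeof(BufferHeader);
}

// Models the region the browser allocates and zero-initializes: the header
// sits at offset 0 and the audio payload follows it.
class SharedRegion {
 public:
  SharedRegion(int channels, int frames)
      : bytes_(SharedMemoryLength(channels, frames), 0) {
    assert(channels > 0 && frames > 0);
    header_ = new (bytes_.data()) BufferHeader{};  // |size| starts at 0.
  }
  SharedRegion(const SharedRegion&) = delete;
  SharedRegion& operator=(const SharedRegion&) = delete;

  BufferHeader* header() { return header_; }
  float* audio() {
    return reinterpret_cast<float*>(bytes_.data() + sizeof(BufferHeader));
  }
  std::size_t length() const { return bytes_.size(); }

 private:
  std::vector<unsigned char> bytes_;
  BufferHeader* header_ = nullptr;
};
```

Because the browser zero-fills the region, the renderer can DCHECK that the counter starts at 0 instead of writing it, which is the point burnik makes in the thread on line 37.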

	44

	45 SpeechRecognitionAudioSourceProvider::~SpeechRecognitionAudioSourceProvider() {

	46 DCHECK(main_render_thread_checker_.CalledOnValidThread());

	47 if (audio_converter_.get()) audio_converter_->RemoveInput(this);

	48 // Notify the track before this sink goes away.

	49 if (!track_stopped_) MediaStreamAudioSink::RemoveFromAudioTrack(this, track_);

	50 }

	51

	52 // static

	53 bool SpeechRecognitionAudioSourceProvider::IsAllowedAudioTrack(
	xians1 2014/09/15 08:31:29: IsAudioTrackSupported() seems a more suitable name here.
	burnik 2014/09/15 15:00:05: "Supported" would indicate there is a technical barrier to supporting it. Here it's actually a policy, because of the dreaded abuse speech recognition could experience.
	xians1 2014/09/16 12:44:05: The policy you mentioned is just one of the reasons we have this method; |track| can be any track JS injects, such as a video track, a remote audio track, or a screen-cast track.
	54 const blink::WebMediaStreamTrack& track) {

	55 DCHECK(track.source().type() == blink::WebMediaStreamSource::TypeAudio);
	xians1 2014/09/15 08:31:28: You can't put a DCHECK here; this method is triggered by JS, and developers can do whatever they want. Just return false if it is not TypeAudio.
	burnik 2014/09/15 15:00:05: True, no checks were done elsewhere. Done.
	56 MediaStreamAudioSource* native_source =

	57 static_cast<MediaStreamAudioSource*>(track.source().extraData());

	58 DCHECK(native_source);
	xians1 2014/09/15 08:31:28: Same here: return false if native_source does not exist.
	burnik 2014/09/15 15:00:05: Done.
	59 const StreamDeviceInfo& device_info = native_source->device_info();

	60 // Purposely only support tracks from an audio device. Disallow WebAudio.

	61 return (device_info.device.type == content::MEDIA_DEVICE_AUDIO_CAPTURE);

	62 }
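Following the review feedback on this method (return false rather than DCHECK on script-controlled input), a minimal sketch of the resulting checks. `SourceType`, `DeviceType`, `NativeAudioSource`, `Track` and the free function below are hypothetical simplifications, not the blink or content API.

```cpp
#include <cassert>

// Hypothetical stand-ins for blink::WebMediaStreamTrack,
// MediaStreamAudioSource and the content device-type enum.
enum class SourceType { kAudio, kVideo };
enum class DeviceType { kAudioCapture, kOther };

struct NativeAudioSource {
  DeviceType device_type;
};

struct Track {
  SourceType source_type;
  const NativeAudioSource* native_source;  // May be null.
};

// |track| comes from script, so unexpected input returns false instead of
// tripping a DCHECK.
bool IsAllowedAudioTrack(const Track& track) {
  if (track.source_type != SourceType::kAudio)
    return false;
  if (track.native_source == nullptr)
    return false;
  // Policy: only tracks captured from a local audio input device.
  return track.native_source->device_type == DeviceType::kAudioCapture;
}
```

Each early return covers one of the inputs xians1 lists: non-audio tracks, tracks without a native source, and audio tracks that do not come from a capture device.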

	63

	64 void SpeechRecognitionAudioSourceProvider::OnSetFormat(

	65 const media::AudioParameters& input_params) {

	66 // We need to detach the thread here because it will be a new capture thread

	67 // calling OnSetFormat() and OnData() if the source is restarted.

	68 capture_thread_checker_.DetachFromThread();

	69 DCHECK(capture_thread_checker_.CalledOnValidThread());

	70 DCHECK(input_params.IsValid());

	71

	72 input_params_ = input_params;

	73 fifo_buffer_size_ = output_params_.frames_per_buffer() *
	xians1 2014/09/15 08:31:28: How is this cast?
	burnik 2014/09/15 15:00:05: Floored. Integer division.
	xians1 2014/09/16 12:44:05: Do it the C++ way: add static_cast<int>() here. Also, could you explain why floor is used instead of ceiling?
	burnik 2014/09/16 19:10:22: Input and output params are of type media::AudioParameters, and all members here are int, so integer division drops the fractional part. I added DCHECK(output_params_.IsValid()); to the next patch set, which checks that the output sample rate is not 0. In production, the input will be 44100 Hz with 441 frames and the output 16000 Hz with 1600 frames. Also, the DCHECKs that follow check that we have enough buffer.
	xians1 2014/09/17 15:55:19: The example you give is just what it is on your machine; the input sample rate can be any hardware sample rate, from 8k up to 192k.
	burnik 2014/09/18 19:09:21: OK, agreed. So if I do it this way:
	    fifo_buffer_size_ = std::ceil(output_params_.frames_per_buffer() *
	        static_cast<double>(input_params_.sample_rate()) /
	        output_params_.sample_rate());
	I've tested it, and it works properly for these:
	    ================================
	    in.sr   in.fpb  out.sr  out.fpb
	    --------------------------------
	    8000    80      16000   1600
	    8000    800     16000   1600
	    16000   160     16000   1600
	    16000   1600    16000   1600
	    32000   320     16000   1600
	    32000   3200    16000   1600
	    44100   441     16000   1600
	    44100   4410    16000   1600
	    48000   480     16000   1600
	    48000   4800    16000   1600
	    96000   960     16000   1600
	    96000   9600    16000   1600
	    11025   111*    16000   1600
	    11025   1103*   16000   1600
	    22050   221*    16000   1600
	    22050   2205    16000   1600
	    88200   882     16000   1600
	    88200   8820    16000   1600
	    176400  1764    16000   1600
	    176400  17640   16000   1600
	    192000  1920    16000   1600
	    192000  19200   16000   1600
	    ================================
	* The starred values are always rounded up, right?
	xians1 2014/09/19 08:58:56: I think this looks correct.
	74 input_params_.sample_rate() /

	75 output_params_.sample_rate();

	76 DCHECK_GE(fifo_buffer_size_, input_params_.frames_per_buffer());

	77 DCHECK_GE(fifo_buffer_size_, output_params_.frames_per_buffer());

	78

	79 // Allows for some delays on the endpoint client.

	80 static const int kNumberOfBuffersInFifo = 2;

	81 int frames_in_fifo = kNumberOfBuffersInFifo * fifo_buffer_size_;

	82 fifo_.reset(new media::AudioFifo(input_params.channels(), frames_in_fifo));

	83 input_bus_ = media::AudioBus::Create(input_params.channels(),

	84 input_params.frames_per_buffer());

	85

	86 // Create the audio converter with \|disable_fifo\| as false so that the

	87 // converter will request input_params.frames_per_buffer() each time.

	88 // This will not increase the complexity as there is only one client to

	89 // the converter.

	90 audio_converter_.reset(

	91 new media::AudioConverter(input_params, output_params_, false));

	92 audio_converter_->AddInput(this);

	93 }
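The review thread above settles on rounding the FIFO buffer size up instead of flooring it. A sketch of that computation plus the two-buffer headroom from kNumberOfBuffersInFifo; `FifoBufferSize()` and `FramesInFifo()` are hypothetical names, and the sketch swaps the proposed std::ceil on a double for integer ceiling division, which gives the same result for these positive integer inputs.

```cpp
#include <cassert>
#include <cstdint>

// Input frames needed to produce one output buffer, rounded up (ceiling)
// rather than floored, so non-integer ratios such as 11025 -> 16000 Hz do not
// undercount the input required for a full output buffer.
int FifoBufferSize(int output_frames_per_buffer, int input_sample_rate,
                   int output_sample_rate) {
  assert(output_frames_per_buffer > 0 && input_sample_rate > 0 &&
         output_sample_rate > 0);
  const std::int64_t numerator =
      static_cast<std::int64_t>(output_frames_per_buffer) * input_sample_rate;
  // Integer ceiling division: ceil(a / b) == (a + b - 1) / b for a, b > 0.
  return static_cast<int>((numerator + output_sample_rate - 1) /
                          output_sample_rate);
}

// Total FIFO capacity with two buffers of headroom, like
// kNumberOfBuffersInFifo in OnSetFormat().
int FramesInFifo(int fifo_buffer_size) {
  const int kNumberOfBuffersInFifo = 2;
  return kNumberOfBuffersInFifo * fifo_buffer_size;
}
```

For example, 11025 Hz input with a 1600-frame, 16000 Hz output gives 1102.5 input frames, which floors to 1102 but rounds up to 1103, matching the starred 1103 row in the thread.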

	94

	95 void SpeechRecognitionAudioSourceProvider::OnReadyStateChanged(

	96 blink::WebMediaStreamSource::ReadyState state) {

	97 DCHECK(main_render_thread_checker_.CalledOnValidThread());

	98 if (track_stopped_) return;
	xians1 2014/09/15 08:31:27: New line.
	burnik 2014/09/15 15:00:06: Done. However, clang-format proposes it this way.
	99 if (state == blink::WebMediaStreamSource::ReadyStateEnded) {
	xians1 2014/09/15 08:31:28: Add an empty line before the second if (.
	burnik 2014/09/15 15:00:05: Done.
	100 track_stopped_ = true;

	101 MediaStreamAudioSink::RemoveFromAudioTrack(this, track_);
	xians1 2014/09/15 08:31:28: Remove this line of code. track_ has already ended; you should not call into track_ any more.
	burnik 2014/09/15 15:00:06: Are you sure? Will the MediaStreamAudioSink remove the track on its own? Can you point me to that code, please? This is paired with the destructor of the class.
	102 NotifyErrorState(ErrorState::TRACK_STOPPED);
	xians1 2014/09/15 08:31:29: Hmm, the track-ended state is not an error; ErrorState should not include TRACK_STOPPED at all.
	burnik 2014/09/15 15:00:06: Agreed. It's here for now while I refactor.
	103 }

	104 }

	105

	106 void SpeechRecognitionAudioSourceProvider::OnData(const int16* audio_data,

	107 int sample_rate,

	108 int number_of_channels,

	109 int number_of_frames) {

	110 DCHECK(capture_thread_checker_.CalledOnValidThread());

	111 DCHECK(peer_buffer_index_);

	112 DCHECK_EQ(input_bus_->frames(), number_of_frames);

	113 DCHECK_EQ(input_bus_->channels(), number_of_channels);

	114 if (fifo_->frames() + number_of_frames > fifo_->max_frames()) {

	115 NotifyErrorState(ErrorState::AUDIO_FIFO_OVERFLOW);
	xians1 2014/09/15 08:31:28: Log it. Also, could you please explain what the client is supposed to do when it gets an AUDIO_FIFO_OVERFLOW callback?
	burnik 2014/09/15 15:00:06: Logged via DLOG(ERROR). The client can destroy the audio source provider and potentially end the session early.
	116 return;

	117 }

	118 // TODO(xians): A better way to handle the interleaved and deinterleaved

	119 // format switching, see issue/317710.

	120 input_bus_->FromInterleaved(audio_data, number_of_frames,

	121 sizeof(audio_data[0]));
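Conceptually, the FromInterleaved() call above splits the interleaved 16-bit capture data into one float channel per bus channel. A minimal sketch of that conversion; `DeinterleaveToFloat()` is a hypothetical name, and the 1/32768 scale factor is an assumption that may not match media::AudioBus exactly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Splits interleaved 16-bit samples (c0 c1 c0 c1 ...) into one float vector
// per channel. The 1/32768 scale is an assumption for illustration;
// media::AudioBus::FromInterleaved() may scale slightly differently.
std::vector<std::vector<float>> DeinterleaveToFloat(
    const std::int16_t* interleaved, int channels, int frames) {
  assert(channels > 0 && frames >= 0);
  std::vector<std::vector<float>> planar(
      static_cast<std::size_t>(channels),
      std::vector<float>(static_cast<std::size_t>(frames)));
  for (int f = 0; f < frames; ++f) {
    for (int c = 0; c < channels; ++c) {
      planar[c][f] = interleaved[f * channels + c] / 32768.0f;
    }
  }
  return planar;
}
```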

	122

	123 fifo_->Push(input_bus_.get());

	124 // Wait for FIFO to have at least |fifo_buffer_size_| frames ready.

	125 if (fifo_->frames() < fifo_buffer_size_) return;
	xians1 2014/09/15 08:31:28: Empty line for the return.
	burnik 2014/09/15 15:00:06: Done.
	126

	127 // Make sure the previous output buffer was consumed by client before we send

	128 // the next buffer. |peer_buffer_index_| is pointing to shared memory.

	129 // The client must write to it (incrementing by 1) once the buffer has been

	130 // consumed. This is intentionally non-blocking for the audio capture thread.

	131 if (buffer_index_ != (*peer_buffer_index_)) {

	132 NotifyErrorState(ErrorState::BUFFER_SYNC_LAG);

	133 return;

	134 }

	135

	136 audio_converter_->Convert(output_bus_.get());

	137

	138 // Notify client to consume buffer |buffer_index_| on |output_bus_|.

	139 const size_t bytes_sent =

	140 socket_->Send(&buffer_index_, sizeof(buffer_index_));

	141 if (bytes_sent != sizeof(buffer_index_)) {

	142 // The send usually fails if the user changes his input audio device.

	143 NotifyErrorState(ErrorState::SEND_FAILED);

	144 // We have discarded this buffer, but could still recover on the next one.

	145 // However, if the socket was closed, this will soon end up

	146 // in |ErrorState::AUDIO_FIFO_OVERFLOW|.

	147 return;

	148 }

	149

	150 // Count the sent buffer. We expect the client to do the same on his end.

	151 ++buffer_index_;

	152 }
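The send path in OnData() relies on a simple counter handshake: the renderer fills the shared buffer again only once the client has advanced the shared counter to match buffer_index_, and otherwise reports BUFFER_SYNC_LAG without blocking. A single-threaded sketch of that rule; `Producer`, `TrySend()` and `sent()` are hypothetical names, and the shared counter is modeled as a plain pointer, as in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical single-threaded model of the handshake in OnData().
// |sent_| plays the role of buffer_index_; |peer_index_| points at the
// counter the client increments after consuming a buffer (the shared
// |params.size| word). Real cross-process code would normally also need
// atomic access or memory barriers on that word.
class Producer {
 public:
  explicit Producer(const std::uint32_t* peer_index) : peer_index_(peer_index) {
    assert(peer_index_ != nullptr);
  }

  // Returns false when the client still lags behind (BUFFER_SYNC_LAG in the
  // patch); otherwise "fills" the shared buffer and counts it as sent.
  bool TrySend() {
    if (sent_ != *peer_index_)
      return false;
    // ...convert into the shared buffer and Send(&sent_) on the socket...
    ++sent_;
    return true;
  }

  std::uint32_t sent() const { return sent_; }

 private:
  const std::uint32_t* peer_index_;
  std::uint32_t sent_ = 0;
};
```

The capture thread never waits on the client: a lagging consumer costs one dropped opportunity to send, and the audio stays queued in the FIFO.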

	153

	154 double SpeechRecognitionAudioSourceProvider::ProvideInput(

	155 media::AudioBus* audio_bus, base::TimeDelta buffer_delay) {

	156 DCHECK(capture_thread_checker_.CalledOnValidThread());

	157 if (fifo_->frames() >= audio_bus->frames())

	158 fifo_->Consume(audio_bus, 0, audio_bus->frames());

	159 else

	160 audio_bus->Zero();
	xians1 2014/09/15 08:31:29: Do you know if the else case can happen here?
	burnik 2014/09/15 15:00:05: Yes. The else happens when we attach to the converter in |OnSetFormat|. Otherwise we wouldn't be removing the |attached_converter_|.
	161 return 1.0;
	xians1 2014/09/15 08:31:28: Empty line before the return.
	burnik 2014/09/15 15:00:05: Done.
	162 }
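ProvideInput() either hands the converter a full buffer from the FIFO or fills the request with silence, which, per the thread above, happens right after the converter is attached in OnSetFormat(). A mono sketch with std::deque standing in for media::AudioFifo and std::vector for the AudioBus; the signature is hypothetical, and the constant 1.0 return simply mirrors the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical mono stand-in for ProvideInput(): |fifo| models
// media::AudioFifo and |out| models the AudioBus the converter asks to fill.
double ProvideInput(std::deque<float>& fifo, std::vector<float>& out) {
  if (fifo.size() >= out.size()) {
    // Enough buffered input: move exactly out.size() frames out of the FIFO.
    for (float& sample : out) {
      sample = fifo.front();
      fifo.pop_front();
    }
  } else {
    // Not enough input yet (e.g. right after the converter is attached):
    // emit silence and leave the FIFO untouched.
    std::fill(out.begin(), out.end(), 0.0f);
  }
  return 1.0;  // Same constant the patch returns.
}
```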

	163

	164 void SpeechRecognitionAudioSourceProvider::NotifyErrorState(ErrorState error) {

	165 // TODO(burnik): Runs on capture thread. Should run on main renderer thread!

	166 DCHECK(capture_thread_checker_.CalledOnValidThread());

	167 if (on_error_cb_.is_null()) return;

	168 on_error_cb_.Run(error);

	169 }

	170

	171 } // namespace content