content/browser/speech/speech_recognizer_impl.cc - Issue 9835049: Speech refactoring: Reimplemented speech_recognizer as a FSM. (CL1.5)

Side by Side Diff: content/browser/speech/speech_recognizer_impl.cc

Issue 9835049: Speech refactoring: Reimplemented speech_recognizer as a FSM. (CL1.5) (Closed) Base URL: svn://svn.chromium.org/chrome/trunk/src

Patch Set: Rebased from master. Created 8 years, 9 months ago


OLD	NEW
1 // Copyright (c) 2012 The Chromium Authors. All rights reserved.	1 // Copyright (c) 2012 The Chromium Authors. All rights reserved.

2 // Use of this source code is governed by a BSD-style license that can be	2 // Use of this source code is governed by a BSD-style license that can be

3 // found in the LICENSE file.	3 // found in the LICENSE file.

4	4

5 #include "content/browser/speech/speech_recognizer_impl.h"	5 #include "content/browser/speech/speech_recognizer_impl.h"

6	6

7 #include "base/bind.h"	7 #include "base/bind.h"

8 #include "base/time.h"	8 #include "base/time.h"

9 #include "content/browser/browser_main_loop.h"	9 #include "content/browser/browser_main_loop.h"

10 #include "content/browser/speech/audio_buffer.h"	10 #include "content/browser/speech/audio_buffer.h"

11 #include "content/browser/speech/google_one_shot_remote_engine.h"	11 #include "content/browser/speech/google_one_shot_remote_engine.h"

12 #include "content/public/browser/browser_thread.h"	12 #include "content/public/browser/browser_thread.h"

13 #include "content/public/browser/speech_recognition_event_listener.h"	13 #include "content/public/browser/speech_recognition_event_listener.h"

14 #include "content/public/browser/speech_recognizer.h"	14 #include "content/public/browser/speech_recognizer.h"

15 #include "content/public/common/speech_recognition_error.h"	15 #include "content/public/common/speech_recognition_error.h"

16 #include "content/public/common/speech_recognition_result.h"	16 #include "content/public/common/speech_recognition_result.h"

17 #include "net/url_request/url_request_context_getter.h"	17 #include "net/url_request/url_request_context_getter.h"

18	18

	19 #define UNREACHABLE_CONDITION() do { NOTREACHED(); return state_; } while(0)
	Satish 2012/03/27 09:47:42 Can this be changed to a method InvalidInput() along the lines of the existing DoNothing() method? You can remove the 'do-while' usage as well.
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 I used a macro because, in case of bugs or failing DCHECKs, the reported line number then points at the code in the FSM switch matrix, which helps debugging a lot.
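	Editorial sketch of the point made in the reply above, under stated assumptions: RECORD_FAILURE_LINE, InvalidInputAsFunction and INVALID_INPUT_AS_MACRO are hypothetical names that are not part of the patch, and recording __LINE__ stands in for a failing NOTREACHED(). It only illustrates why a macro reports the call-site line while a helper function always reports its own line.

```cpp
#include <cassert>

// Hypothetical stand-in for a failing NOTREACHED(): it only records the
// source line on which the check fired.
static int g_reported_line = 0;
#define RECORD_FAILURE_LINE() (g_reported_line = __LINE__)

// Function variant: the check lives inside the helper, so every caller
// reports the same line (the one inside this function body).
inline int InvalidInputAsFunction() {
  RECORD_FAILURE_LINE();
  return g_reported_line;
}

// Macro variant: the check is expanded at the call site, so the reported
// line is the line of the switch case that used the macro.
#define INVALID_INPUT_AS_MACRO() RECORD_FAILURE_LINE()
```

	With the function variant every invalid transition would be reported at the same source line, whereas the macro variant reports the line of the offending state/event pair.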
	20

19 using content::BrowserMainLoop;	21 using content::BrowserMainLoop;

20 using content::BrowserThread;	22 using content::BrowserThread;

21 using content::SpeechRecognitionError;	23 using content::SpeechRecognitionError;

22 using content::SpeechRecognitionEventListener;	24 using content::SpeechRecognitionEventListener;

23 using content::SpeechRecognitionResult;	25 using content::SpeechRecognitionResult;

24 using content::SpeechRecognizer;	26 using content::SpeechRecognizer;

25 using media::AudioInputController;	27 using media::AudioInputController;

26	28

	29 // TODO(primiano) what about a watchdog here to avoid getting stuck if the

	30 // SpeechRecognitionEngine does not deliver a result (in reasonable time)?
	Satish 2012/03/27 09:47:42 For remote engines, the network connection should time out automatically, so a watchdog is probably not required. Since that's the only kind we support now, you could remove this TODO.
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 Done.
27 namespace {	31 namespace {

28	32 // Enables spontaneous transition from WaitingForSpeech to RecognizingSpeech,
	Satish 2012/03/27 09:47:42 add newline above
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 Done.
	33 // which is required for the mock recognition engine which sends fake results.

	34 const bool skipSilenceDetectionForTesting = false;
	Satish 2012/03/27 09:47:42 This doesn't seem to be set to true anywhere else, so code checking for this runs always now. Is it going to be in a future CL? If so, you can remove it here and add it in that CL.
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 Done.
29 // The following constants are related to the volume level indicator shown in	35 // The following constants are related to the volume level indicator shown in

30 // the UI for recorded audio.	36 // the UI for recorded audio.

31 // Multiplier used when new volume is greater than previous level.	37 // Multiplier used when new volume is greater than previous level.

32 const float kUpSmoothingFactor = 1.0f;	38 const float kUpSmoothingFactor = 1.0f;

33 // Multiplier used when new volume is lesser than previous level.	39 // Multiplier used when new volume is lesser than previous level.

34 const float kDownSmoothingFactor = 0.7f;	40 const float kDownSmoothingFactor = 0.7f;

35 // RMS dB value of a maximum (unclipped) sine wave for int16 samples.	41 // RMS dB value of a maximum (unclipped) sine wave for int16 samples.

36 const float kAudioMeterMaxDb = 90.31f;	42 const float kAudioMeterMaxDb = 90.31f;

37 // This value corresponds to RMS dB for int16 with 6 most-significant-bits = 0.	43 // This value corresponds to RMS dB for int16 with 6 most-significant-bits = 0.

38 // Values lower than this will display as empty level-meter.	44 // Values lower than this will display as empty level-meter.

39 const float kAudioMeterMinDb = 30.0f;	45 const float kAudioMeterMinDb = 30.0f;

40 const float kAudioMeterDbRange = kAudioMeterMaxDb - kAudioMeterMinDb;	46 const float kAudioMeterDbRange = kAudioMeterMaxDb - kAudioMeterMinDb;

41	47

42 // Maximum level to draw to display unclipped meter. (1.0f displays clipping.)	48 // Maximum level to draw to display unclipped meter. (1.0f displays clipping.)

43 const float kAudioMeterRangeMaxUnclipped = 47.0f / 48.0f;	49 const float kAudioMeterRangeMaxUnclipped = 47.0f / 48.0f;

44	50

45 // Returns true if more than 5% of the samples are at min or max value.	51 // Returns true if more than 5% of the samples are at min or max value.

46 bool DetectClipping(const speech::AudioChunk& chunk) {	52 bool DetectClipping(const speech::AudioChunk& chunk) {

47 const int num_samples = chunk.NumSamples();	53 const int num_samples = chunk.NumSamples();

48 const int16* samples = chunk.SamplesData16();	54 const int16* samples = chunk.SamplesData16();

49 const int kThreshold = num_samples / 20;	55 const int kThreshold = num_samples / 20;

50 int clipping_samples = 0;	56 int clipping_samples = 0;

	57

51 for (int i = 0; i < num_samples; ++i) {	58 for (int i = 0; i < num_samples; ++i) {

52 if (samples[i] <= -32767 || samples[i] >= 32767) {	59 if (samples[i] <= -32767 || samples[i] >= 32767) {

53 if (++clipping_samples > kThreshold)	60 if (++clipping_samples > kThreshold)

54 return true;	61 return true;

55 }	62 }

56 }	63 }

57 return false;	64 return false;

58 }	65 }

59	66

60 } // namespace	67 } // namespace

61	68

62 SpeechRecognizer* SpeechRecognizer::Create(	69 SpeechRecognizer* SpeechRecognizer::Create(

63 SpeechRecognitionEventListener* listener,	70 SpeechRecognitionEventListener* listener,

64 int caller_id,	71 int caller_id,

65 const std::string& language,	72 const std::string& language,

66 const std::string& grammar,	73 const std::string& grammar,

67 net::URLRequestContextGetter* context_getter,	74 net::URLRequestContextGetter* context_getter,

68 bool filter_profanities,	75 bool filter_profanities,

69 const std::string& hardware_info,	76 const std::string& hardware_info,

70 const std::string& origin_url) {	77 const std::string& origin_url) {

	78 speech::GoogleOneShotRemoteEngineConfig google_sr_config;

	79 google_sr_config.language = language;

	80 google_sr_config.grammar = grammar;

	81 google_sr_config.audio_sample_rate =

	82 speech::SpeechRecognizerImpl::kAudioSampleRate;

	83 google_sr_config.audio_num_bits_per_sample =

	84 speech::SpeechRecognizerImpl::kNumBitsPerAudioSample;

	85 google_sr_config.filter_profanities = filter_profanities;

	86 google_sr_config.hardware_info = hardware_info;

	87 google_sr_config.origin_url = origin_url;

	88

	89 speech::GoogleOneShotRemoteEngine* google_sr_engine =

	90 new speech::GoogleOneShotRemoteEngine(context_getter);

	91 google_sr_engine->SetConfig(google_sr_config);
	Satish 2012/03/27 09:47:42 Is this config ever changed after creating the engine? If not, can we pass it as an arg to the constructor instead of a separate method?
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 It can be changed, so that the recognition engine can be reused instead of instantiating a new engine for each request (e.g. the JS changes the .lang property and reissues .Start() on the recognition object).
	92

71 return new speech::SpeechRecognizerImpl(listener,	93 return new speech::SpeechRecognizerImpl(listener,

72 caller_id,	94 caller_id,

73 language,	95 google_sr_engine);

74 grammar,

75 context_getter,

76 filter_profanities,

77 hardware_info,

78 origin_url);

79 }	96 }

80	97

81 namespace speech {	98 namespace speech {

82

83 const int SpeechRecognizerImpl::kAudioSampleRate = 16000;	99 const int SpeechRecognizerImpl::kAudioSampleRate = 16000;

84 const ChannelLayout SpeechRecognizerImpl::kChannelLayout = CHANNEL_LAYOUT_MONO;	100 const ChannelLayout SpeechRecognizerImpl::kChannelLayout = CHANNEL_LAYOUT_MONO;

85 const int SpeechRecognizerImpl::kNumBitsPerAudioSample = 16;	101 const int SpeechRecognizerImpl::kNumBitsPerAudioSample = 16;

86 const int SpeechRecognizerImpl::kNoSpeechTimeoutMs = 8000;	102 const int SpeechRecognizerImpl::kNoSpeechTimeoutMs = 8000;

87 const int SpeechRecognizerImpl::kEndpointerEstimationTimeMs = 300;	103 const int SpeechRecognizerImpl::kEndpointerEstimationTimeMs = 300;

88	104

89 SpeechRecognizerImpl::SpeechRecognizerImpl(	105 SpeechRecognizerImpl::SpeechRecognizerImpl(

90 SpeechRecognitionEventListener* listener,	106 SpeechRecognitionEventListener* listener,

91 int caller_id,	107 int caller_id,

92 const std::string& language,	108 SpeechRecognitionEngine* engine)

93 const std::string& grammar,

94 net::URLRequestContextGetter* context_getter,

95 bool filter_profanities,

96 const std::string& hardware_info,

97 const std::string& origin_url)

98 : listener_(listener),	109 : listener_(listener),

99 testing_audio_manager_(NULL),	110 testing_audio_manager_(NULL),

	111 recognition_engine_(engine),

100 endpointer_(kAudioSampleRate),	112 endpointer_(kAudioSampleRate),

101 context_getter_(context_getter),

102 caller_id_(caller_id),	113 caller_id_(caller_id),
	Satish 2012/03/27 09:47:42 This initializer list should be in the same order as in the class declaration (.h file). I think the Clang builder will turn red otherwise.
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 Isn't it? (Except for non-POD fields and POD fields that do not require initialization, which are skipped.)
103 language_(language),	114 event_dispatch_nesting_level_(0),

104 grammar_(grammar),	115 state_(kIdle),

105 filter_profanities_(filter_profanities),	116 event_args_(NULL) {

106 hardware_info_(hardware_info),

107 origin_url_(origin_url),

108 num_samples_recorded_(0),

109 audio_level_(0.0f) {

110 DCHECK(listener_ != NULL);	117 DCHECK(listener_ != NULL);

	118 DCHECK(recognition_engine_ != NULL);

111 endpointer_.set_speech_input_complete_silence_length(	119 endpointer_.set_speech_input_complete_silence_length(

112 base::Time::kMicrosecondsPerSecond / 2);	120 base::Time::kMicrosecondsPerSecond / 2);

113 endpointer_.set_long_speech_input_complete_silence_length(	121 endpointer_.set_long_speech_input_complete_silence_length(

114 base::Time::kMicrosecondsPerSecond);	122 base::Time::kMicrosecondsPerSecond);

115 endpointer_.set_long_speech_length(3 * base::Time::kMicrosecondsPerSecond);	123 endpointer_.set_long_speech_length(3 * base::Time::kMicrosecondsPerSecond);

116 endpointer_.StartSession();	124 endpointer_.StartSession();

	125 recognition_engine_->set_delegate(this);

117 }	126 }

118	127

119 SpeechRecognizerImpl::~SpeechRecognizerImpl() {	128 SpeechRecognizerImpl::~SpeechRecognizerImpl() {
	Satish 2012/03/27 09:47:42 Add a DCHECK to verify you are in a valid (idle?) state when destroyed.
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 Hmm, the browser could be closed while a recognition is in progress, which would erroneously fail the DCHECK.
120 // Recording should have stopped earlier due to the endpointer or

121 // |StopRecording| being called.

122 DCHECK(!audio_controller_.get());

123 DCHECK(!recognition_engine_.get() ||

124 !recognition_engine_->IsRecognitionPending());

125 endpointer_.EndSession();	129 endpointer_.EndSession();

126 }	130 }

127	131

	132 // ------- Methods that trigger Finite State Machine (FSM) events ------------

	133

	134 // NOTE: all the external events and request should be enqueued (PostTask), even

	135 // if they come from the same (IO) thread, in order to preserve the relationship

	136 // of causality between events.

	137 // Imagine what would happen if a Start has been enqueued from another thread
	Satish 2012/03/27 09:47:42 137-145 looks like a scare tactic :) and could be removed. 134-136 is quite explanatory.
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 Done.
	138 // (but not yet processed) and we suddenly issue a Stop from the IO thread.

	139 // Furthermore, even if you are sure to not interleave start and stop requests,

	140 // asynchronous event processing mixed with synchronous callback can cause very

	141 // mind-breaking side effects.

	142 // For instance, if someone could call Abort synchronously (instead of posting

	143 // the event on the queue), it will receive interleaved callbacks (e.g. an error

	144 // or the audio-end event) before the Abort call is effectively ended.

	145 // Is your (caller) code ready for this?
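	Editorial sketch of the enqueueing rule described in the NOTE above, under stated assumptions: TaskQueueSketch and RecognizerSketch are hypothetical classes, and the hand-rolled queue merely stands in for the IO thread's message loop (it is not the BrowserThread::PostTask API). It shows that posted requests are handled later and strictly in posting order.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical single-threaded task queue standing in for the IO thread's
// message loop.
class TaskQueueSketch {
 public:
  void PostTask(std::function<void()> task) {
    tasks_.push_back(std::move(task));
  }

  // Runs queued tasks one at a time, in the order they were posted.
  void RunUntilIdle() {
    while (!tasks_.empty()) {
      std::function<void()> task = std::move(tasks_.front());
      tasks_.pop_front();
      task();
    }
  }

 private:
  std::deque<std::function<void()>> tasks_;
};

// Hypothetical recognizer: public methods never handle a request inline,
// they only enqueue an event for later, sequential processing.
class RecognizerSketch {
 public:
  explicit RecognizerSketch(TaskQueueSketch* queue) : queue_(queue) {}

  void StartRecognition() { Post("start"); }
  void AbortRecognition() { Post("abort"); }

  const std::vector<std::string>& handled() const { return handled_; }

 private:
  void Post(const std::string& event) {
    queue_->PostTask([this, event] { handled_.push_back(event); });
  }

  TaskQueueSketch* queue_;
  std::vector<std::string> handled_;
};
```

	Because nothing is handled until the queue runs, an Abort issued right after a Start cannot overtake it, which is the causality property the NOTE refers to.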

	146

128 void SpeechRecognizerImpl::StartRecognition() {	147 void SpeechRecognizerImpl::StartRecognition() {

	148 FSMEventArgs args;

	149 BrowserThread::PostTask(BrowserThread::IO, FROM_HERE,

	150 base::Bind(&SpeechRecognizerImpl::DispatchEvent,

	151 this, kStartRequest, args));
	Satish 2012/03/27 09:47:42 Could make it simple by replacing 'args' with 'FSMEventArgs()'.
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 Done.
	152 }

	153

	154 void SpeechRecognizerImpl::AbortRecognition() {

	155 FSMEventArgs args;

	156 BrowserThread::PostTask(BrowserThread::IO, FROM_HERE,

	157 base::Bind(&SpeechRecognizerImpl::DispatchEvent,

	158 this, kAbortRequest, args));

	159 }

	160

	161 void SpeechRecognizerImpl::StopAudioCapture() {

	162 FSMEventArgs args;

	163 BrowserThread::PostTask(BrowserThread::IO, FROM_HERE,

	164 base::Bind(&SpeechRecognizerImpl::DispatchEvent,

	165 this, kStopCaptureRequest, args));

	166 }

	167

	168 bool SpeechRecognizerImpl::IsActive() const {

	169 // Checking the FSM state from another thread (thus, while the FSM is

	170 // potentially concurrently evolving) is meaningless.

	171 // If you're doing it, probably you have some design issues.

129 DCHECK(BrowserThread::CurrentlyOn(BrowserThread::IO));	172 DCHECK(BrowserThread::CurrentlyOn(BrowserThread::IO));

130 DCHECK(!audio_controller_.get());	173 return state_ != kIdle;

131 DCHECK(!recognition_engine_.get() ||	174 }

132 !recognition_engine_->IsRecognitionPending());	175

133	176 bool SpeechRecognizerImpl::IsCapturingAudio() const {

134 // The endpointer needs to estimate the environment/background noise before	177 DCHECK(BrowserThread::CurrentlyOn(BrowserThread::IO)); // See IsActive().

135 // starting to treat the audio as user input. In |HandleOnData| we wait until	178 return state_ >= kStartingRecognition && state_ <= kRecognizingSpeech;
	Satish 2012/03/27 09:47:42 Would checking for audio_controller_ != NULL be more authoritative?
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 IMHO, all the decisions related to the evolution of an FSM should refer to its state and not to the side effects caused by previous transitions. However, I agree with the DCHECK and slightly revised the code, making more pedantic checks.
136 // such time has passed before switching to user input mode.	179 }

137 endpointer_.SetEnvironmentEstimationMode();	180

138	181 // Invoked in the audio thread.

	182 void SpeechRecognizerImpl::OnError(AudioInputController* controller,

	183 int error_code) {

	184 FSMEventArgs args;

	185 args.audio_error_code = error_code;

	186 BrowserThread::PostTask(BrowserThread::IO, FROM_HERE,

	187 base::Bind(&SpeechRecognizerImpl::DispatchEvent,

	188 this, kAudioError, args));

	189 }

	190

	191 void SpeechRecognizerImpl::OnData(AudioInputController* controller,

	192 const uint8* data, uint32 size) {

	193 if (size == 0) // This could happen when audio capture stops and is normal.

	194 return;

	195

	196 FSMEventArgs args;

	197 args.audio_data = new AudioChunk(data, static_cast<size_t>(size),
	Satish 2012/03/27 09:47:42 Add a comment here that the event handler takes ownership.
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 Done.
	198 kNumBitsPerAudioSample / 8);
	Satish 2012/03/27 09:47:42 Since we are assuming kNumBitsPerAudioSample is a multiple of 8, can you add a COMPILE_ASSERT at the top to validate this assumption? If it ever gets changed to something else (e.g. 15 bits per sample) this assert would catch it.
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 Done.
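	Editorial sketch of the compile-time check requested above, under stated assumptions: it uses the standard C++11 static_assert rather than the COMPILE_ASSERT macro named in the comment, and kBytesPerSample and NumSamplesInBuffer are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>

const int kNumBitsPerAudioSample = 16;

// Fails the build if the sample width stops being a whole number of bytes,
// e.g. if the constant were changed to 15.
static_assert(kNumBitsPerAudioSample % 8 == 0,
              "kNumBitsPerAudioSample must be a multiple of 8");

const std::size_t kBytesPerSample = kNumBitsPerAudioSample / 8;

// Hypothetical helper: number of whole samples in a raw capture buffer.
inline std::size_t NumSamplesInBuffer(std::size_t size_in_bytes) {
  return size_in_bytes / kBytesPerSample;
}
```

	The division by 8 in OnData() silently truncates for any other sample width, which is why the assumption is worth pinning down at compile time.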
	199 BrowserThread::PostTask(BrowserThread::IO, FROM_HERE,

	200 base::Bind(&SpeechRecognizerImpl::DispatchEvent,

	201 this, kAudioData, args));

	202 }

	203

	204 void SpeechRecognizerImpl::OnSpeechRecognitionEngineResult(

	205 const content::SpeechRecognitionResult& result) {

	206 FSMEvent event = kRecognitionResult;
	Satish 2012/03/27 09:47:42 Can this value be passed directly to the base::Bind call below?
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 It must! Honestly, I don't know why I passed it through that variable; probably a leftover from some copy/paste.
	207 FSMEventArgs args;

	208 args.speech_result = result;

	209 BrowserThread::PostTask(BrowserThread::IO, FROM_HERE,

	210 base::Bind(&SpeechRecognizerImpl::DispatchEvent,

	211 this, event, args));

	212 }

	213

	214 void SpeechRecognizerImpl::OnSpeechRecognitionEngineError(

	215 const content::SpeechRecognitionError& error) {

	216 FSMEvent event = kRecognitionError;
	Satish 2012/03/27 09:47:42 ditto
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 Done.
	217 FSMEventArgs args;

	218 args.error = error;

	219 BrowserThread::PostTask(BrowserThread::IO, FROM_HERE,

	220 base::Bind(&SpeechRecognizerImpl::DispatchEvent,

	221 this, event, args));

	222 }

	223

	224 // ----------------------- Core FSM implementation ---------------------------

	225

	226 void SpeechRecognizerImpl::DispatchEvent(FSMEvent event, FSMEventArgs args) {

	227 DCHECK(BrowserThread::CurrentlyOn(BrowserThread::IO));

	228 DCHECK_LE(event, kMaxEvent);

	229 DCHECK_LE(state_, kMaxState);

	230 // Event dispatching must be sequential, otherwise it will break all the rules
	Satish 2012/03/27 09:47:42 Add newline above full-length comments such as these.
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 Done.
	231 // and the assumptions of the finite state automata model.

	232 DCHECK_EQ(event_dispatch_nesting_level_, 0);
	Satish 2012/03/27 09:47:42 Could be clearer if this variable was a bool such as 'in_dispatch_event_', set to true here and to false at the end of the method.
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 Right.
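	Editorial sketch of the boolean guard suggested above, under stated assumptions: DispatcherSketch is a hypothetical class, the handler callback stands in for the FSM transition code, and a rejected call returns false where the real code would DCHECK.

```cpp
#include <cassert>
#include <functional>

class DispatcherSketch {
 public:
  // |handler| runs while an event is being dispatched. Returns false if the
  // call was rejected because another dispatch is still in progress.
  bool DispatchEvent(const std::function<void()>& handler) {
    if (in_dispatch_event_)
      return false;

    in_dispatch_event_ = true;
    handler();
    in_dispatch_event_ = false;
    return true;
  }

 private:
  bool in_dispatch_event_ = false;
};
```

	A handler that tries to dispatch again from inside DispatchEvent() is rejected, while a later, non-nested dispatch succeeds.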
	233 ++event_dispatch_nesting_level_;

	234 // Guard against the delegate freeing us until we finish processing the event.
	Satish 2012/03/27 09:47:42 ditto
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 Done.
	235 scoped_refptr<SpeechRecognizerImpl> me(this);

	236

	237 event_ = event;
	Satish 2012/03/27 09:47:42 These look a bit dangerous as they are invalid after this method returns. Can you pass them as arguments instead of having them as member variables?
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 Mmm, what do you mean? They are used only by (private) functions that are called exclusively from DispatchEvent. Perhaps we could add a DCHECK(in_event_processing_) to each of those functions, but I don't know if it is worth it. Or, if we want to be very paranoid, instead of setting the event_args_ pointer to NULL we could set it to an empty (static) object.
	238 event_args_ = &args;

	239

	240 if (event == kAudioData)

	241 ProcessAudioPipeline();

	242 // The audio pipeline must be processed before the ProcessEvent, otherwise it
	Satish 2012/03/27 09:47:42 add newline above
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 Done.
	243 // would take actions according to the future state and not the current one.

	244 state_ = ProcessEvent(event);

	245

	246 // Cleanup event args.

	247 if (args.audio_data)
	Satish 2012/03/27 09:47:42 This cleanup should be part of the FSMEventArgs destructor. Better still, you can make audio_data into a scoped_ptr so a custom destructor isn't required.
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 AudioChunk is now refcounted and should be destroyed automatically upon return of this function.
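	Editorial sketch of the ownership change discussed above, under stated assumptions: std::shared_ptr stands in for Chromium's refcounting (scoped_refptr), and AudioChunkSketch, FSMEventArgsSketch and HandleAudioEvent are hypothetical names. It shows that no manual delete is needed once the event arguments hold a counted reference.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for AudioChunk.
struct AudioChunkSketch {
  explicit AudioChunkSketch(std::size_t num_samples) : samples(num_samples) {}
  std::vector<short> samples;
};

// The event arguments hold a counted reference, so copying them (e.g. into
// a bound callback) keeps the chunk alive and dropping the last copy frees it.
struct FSMEventArgsSketch {
  std::shared_ptr<const AudioChunkSketch> audio_data;
};

// Takes the arguments by value, like DispatchEvent() in the patch; the copy
// releases its reference automatically on return.
inline std::size_t HandleAudioEvent(FSMEventArgsSketch args) {
  return args.audio_data ? args.audio_data->samples.size() : 0;
}
```

	The chunk is released when the last FSMEventArgsSketch copy goes out of scope, so the explicit cleanup at the end of the dispatch method becomes unnecessary.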
	248 delete args.audio_data;

	249 event_args_ = NULL;

	250 --event_dispatch_nesting_level_;

	251 }

	252

	253 // ----------- Contract for all the FSM evolution functions below -------------

	254 // - Are guaranteed to be executed in the IO thread;

	255 // - Are guaranteed to be not reentrant (themselves and each other);

	256 // - event_args_ is guaranteed to be non NULL;

	257 // - event_args_ members are guaranteed to be stable during the call;

	258 // - The class won't be freed in the meanwhile due to callbacks;

	259

	260 // TODO(primiano) the audio pipeline is currently serial. However, the

	261 // clipper->endpointer->vumeter chain and the sr_engine could be parallelized.

	262 // We should profile the execution to see if it would be worth or not.

	263 void SpeechRecognizerImpl::ProcessAudioPipeline() {

	264 const bool always = true;
	Satish 2012/03/27 09:47:42 Remove this as it's used only in the next line.
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 Done.
	265 const bool route_audio_to_clipper = always;
	Satish 2012/03/27 09:47:42 Only use 1 space on either side of = and && operators; we don't align the RHS of assignments across multiple lines like here.
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 Done.
	266 const bool route_audio_to_endpointer = state_ >= kEstimatingEnvironment &&

	267 state_ <= kRecognizingSpeech;

	268 const bool route_audio_to_sr_engine = route_audio_to_endpointer;

	269 const bool route_audio_to_vumeter = state_ >= kWaitingForSpeech &&

	270 state_ <= kRecognizingSpeech;

	271

	272 AudioChunk& recorded_audio_data = *(event_args_->audio_data);
	Satish 2012/03/27 09:47:42 use "const AudioChunk&"
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 Done.
	273

	274 num_samples_recorded_ += recorded_audio_data.NumSamples();

	275

	276 if (route_audio_to_clipper) {

	277 clipper_detected_clip_ = DetectClipping(recorded_audio_data);
	Satish 2012/03/27 09:47:42 clipper_detected_clip_ is set here and used in the UpdateSignalAndNoiseLevels() call made a few lines below. So it seems like it shouldn't be a member variable; instead make it a local here and pass it as a parameter to the method below.
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 Done.
	278 }

	279 if (route_audio_to_endpointer) {

	280 endpointer_.ProcessAudio(recorded_audio_data, &rms_);
	Satish 2012/03/27 09:47:42 ditto for 'rms_'
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 Done.
	281 }

	282 if (route_audio_to_vumeter) {

	283 DCHECK(route_audio_to_endpointer); // Depends on endpointer due to |rms_|.

	284 UpdateSignalAndNoiseLevels(rms_);
	Satish 2012/03/27 09:47:42 Since this is the only method making use of clipping information, it seems like DetectClipping should be called above this line, with no need for the 'route_audio_to_clipper' check.
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 Done.
	285 }

	286 if (route_audio_to_sr_engine) {

	287 DCHECK(recognition_engine_.get());

	288 recognition_engine_->TakeAudioChunk(recorded_audio_data);

	289 }

	290 }

	291

	292 SpeechRecognizerImpl::FSMState SpeechRecognizerImpl::ProcessEvent(
	Satish 2012/03/27 09:47:42 DispatchEvent and ProcessEvent are too similar; please find a more suitable name for this method or merge it with DispatchEvent.
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 Done. Renamed to ExecuteTransitionAndGetNextState.
	293 FSMEvent event) {

	294 switch (state_) {

	295 case kIdle:

	296 switch (event) {

	297 // TODO(primiano) restore UNREACHABLE_CONDITION above when speech

	298 // input extensions are fixed.

	299 case kAbortRequest: return DoNothing(); //UNREACHABLE_CONDITION();

	300 case kStartRequest: return InitializeAndStartRecording();
	Satish 2012/03/27 09:47:42 Since this is the only valid event in this state, can you rewrite as:
	  switch (event) {
	    case kStartRequest:
	      return InitializeAndStartRecording();
	    default:
	      return DoNothing();
	  }
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 Hmm, IMHO it might introduce bugs if a new event is introduced. In the current implementation, if a new event is introduced and is not explicitly processed, it will be caught by the final UNREACHABLE_CONDITION.
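	Editorial sketch of the trade-off discussed above, under stated assumptions: State, Event and NextStateFromIdle are hypothetical, reduced versions of the FSM types, and the hit_unreachable flag stands in for the final UNREACHABLE_CONDITION. Leaving out the 'default:' label means that GCC and Clang can warn (-Wswitch, enabled by -Wall) when a new enumerator is added without a matching case, and any unhandled value still falls through to the trailing check at run time.

```cpp
#include <cassert>

enum class State { kIdle, kCapturing };
enum class Event { kStartRequest, kAbortRequest, kAudioData };

// Returns the next state for an event received while idle. Every event is
// listed explicitly and there is deliberately no 'default:' label.
inline State NextStateFromIdle(Event event, bool* hit_unreachable) {
  switch (event) {
    case Event::kStartRequest: return State::kCapturing;
    case Event::kAbortRequest: return State::kIdle;
    case Event::kAudioData:    return State::kIdle;
  }
  // Reached only if |event| holds a value with no case above; the real code
  // would hit NOTREACHED() here.
  *hit_unreachable = true;
  return State::kIdle;
}
```

	With a 'default: return DoNothing();' branch, a newly added event would be silently ignored in this state instead of being flagged.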
	301 case kStopCaptureRequest: return DoNothing(); //UNREACHABLE_CONDITION();

	302 case kAudioData: return DoNothing(); // Corner cases related to

	303 case kRecognitionResult: return DoNothing(); // queued messages being

	304 case kRecognitionError: return DoNothing(); // lately dispatched.

	305 case kAudioError: return DoNothing();

	306 }

	307 break;

	308 case kStartingRecognition:

	309 switch (event) {

	310 case kAbortRequest: return Abort();
	Satish 2012/03/27 09:47:42 would be simpler to collapse multiple similar handlers:
	  switch (event) {
	    case kAbortRequest:
	    case kStopCaptureRequest:
	    case kRecognitionError:
	    case kAudioError:
	      return Abort(event);
	    case kAudioData:
	      return StartSpeechRecognition();
	    default:
	      return InvalidInput();
	  }
	Same for every other switch below
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 IMHO it would become more difficult to read (since the order of events will not be the same in all cases) and to maintain.
	311 case kStartRequest: UNREACHABLE_CONDITION();

	312 case kStopCaptureRequest: return Abort();

	313 case kAudioData: return StartSpeechRecognition();

	314 case kRecognitionResult: UNREACHABLE_CONDITION();

	315 case kRecognitionError: return Abort();

	316 case kAudioError: return Abort();

	317 }

	318 break;

	319 case kEstimatingEnvironment:

	320 switch (event) {

	321 case kAbortRequest: return Abort();
	Satish 2012/03/27 09:47:42 hmm, since kAbortRequest, kRecognitionError and kAudioError are always triggering 'return Abort' in every state, maybe these can be moved outside at the top and handled in a single place (if you also add a 'default: InvalidInput();' case). Up to you though..
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 They are not exactly equivalent since they trigger different behaviors inside Abort. Btw, I would like to keep the current structure: I know that it can be compacted, but I think that its verbosity helps to see all the possible race conditions that might happen, which would otherwise be hidden by grouping cases or factoring events.
	322 case kStartRequest: UNREACHABLE_CONDITION();

	323 case kStopCaptureRequest: return StopCaptureAndWaitForResult();

	324 case kAudioData: return EnvironmentEstimation();

	325 case kRecognitionResult: return ProcessIntermediateRecognitionResult();

	326 case kRecognitionError: return Abort();

	327 case kAudioError: return Abort();

	328 }

	329 break;

	330 case kWaitingForSpeech:

	331 switch (event) {

	332 case kAbortRequest: return Abort();

	333 case kStartRequest: UNREACHABLE_CONDITION();

	334 case kStopCaptureRequest: return StopCaptureAndWaitForResult();

	335 case kAudioData: return DetectUserSpeechOrTimeout();

	336 case kRecognitionResult: return ProcessIntermediateRecognitionResult();

	337 case kRecognitionError: return Abort();

	338 case kAudioError: return Abort();

	339 }

	340 break;

	341 case kRecognizingSpeech:

	342 switch (event) {

	343 case kAbortRequest: return Abort();

	344 case kStartRequest: UNREACHABLE_CONDITION();

	345 case kStopCaptureRequest: return StopCaptureAndWaitForResult();

	346 case kAudioData: return DetectEndOfSpeech();

	347 case kRecognitionResult: return ProcessIntermediateRecognitionResult();

	348 case kRecognitionError: return Abort();

	349 case kAudioError: return Abort();

	350 }

	351 break;

	352 case kWaitingFinalResult:

	353 switch (event) {

	354 case kAbortRequest: return Abort();

	355 case kStartRequest: UNREACHABLE_CONDITION();

	356 case kStopCaptureRequest: return DoNothing();

	357 case kAudioData: return DoNothing();

	358 case kRecognitionResult: return ProcessFinalRecognitionResult();

	359 case kRecognitionError: return Abort();

	360 case kAudioError: return Abort();

	361 }

	362 break;

	363 }

	364 UNREACHABLE_CONDITION();

	365 }
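	Note on the switch structure discussed in the comments above: a minimal sketch of the "list every (state, event) pair, no default:" style, reduced to three states and four events. The names State, Event, NextState and CheckOneCycle are hypothetical and not part of this CL, and invalid pairs throw here instead of using UNREACHABLE_CONDITION(). One side effect of omitting default: is that GCC and Clang can flag an unhandled enumerator at compile time (-Wswitch); that is an observation about the pattern, not something claimed in the review.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical, reduced state and event sets; not the ones from the CL.
enum class State { kIdle, kRecording, kWaitingResult };
enum class Event { kStart, kStop, kAudioData, kResult };

// Every (state, event) pair is written out and no `default:` label is used,
// so a newly added enumerator that is not handled can be reported by
// -Wswitch. Pairs that must never occur throw std::logic_error.
State NextState(State state, Event event) {
  switch (state) {
    case State::kIdle:
      switch (event) {
        case Event::kStart:     return State::kRecording;
        case Event::kStop:      return state;  // Late queued message: ignore.
        case Event::kAudioData: return state;
        case Event::kResult:    return state;
      }
      break;
    case State::kRecording:
      switch (event) {
        case Event::kStart:     throw std::logic_error("start while recording");
        case Event::kStop:      return State::kWaitingResult;
        case Event::kAudioData: return state;
        case Event::kResult:    throw std::logic_error("result while recording");
      }
      break;
    case State::kWaitingResult:
      switch (event) {
        case Event::kStart:     throw std::logic_error("start while waiting");
        case Event::kStop:      return state;
        case Event::kAudioData: return state;
        case Event::kResult:    return State::kIdle;
      }
      break;
  }
  // Reached only when a value outside the declared enumerators is passed in.
  throw std::logic_error("unhandled state/event pair");
}

// Usage: walk one full capture cycle and end up back in kIdle.
inline void CheckOneCycle() {
  State s = State::kIdle;
  s = NextState(s, Event::kStart);
  s = NextState(s, Event::kAudioData);
  s = NextState(s, Event::kStop);
  s = NextState(s, Event::kResult);
  assert(s == State::kIdle);
}
```

	With a default: branch the same table is shorter, but a forgotten event silently takes the default action, which is the concern raised in the reply above.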

	366

	367 SpeechRecognizerImpl::FSMState

	368 SpeechRecognizerImpl::InitializeAndStartRecording() {

	369 DCHECK(recognition_engine_.get());

	370 DCHECK(audio_controller_.get() == NULL);

139 AudioManager* audio_manager = (testing_audio_manager_ != NULL) ?	371 AudioManager* audio_manager = (testing_audio_manager_ != NULL) ?

140 testing_audio_manager_ :	372 testing_audio_manager_ :

141 BrowserMainLoop::GetAudioManager();	373 BrowserMainLoop::GetAudioManager();

	374 DCHECK(audio_manager != NULL);

	375

	376 VLOG(1) << "SpeechRecognizerImpl starting audio capture.";

	377 num_samples_recorded_ = 0;

	378 rms_ = 0;

	379 audio_level_ = 0;

	380 clipper_detected_clip_ = false;

	381 listener_->OnRecognitionStart(caller_id_);

	382

	383 if (!audio_manager->HasAudioInputDevices()) {

	384 return Abort(SpeechRecognitionError(

	385 content::SPEECH_RECOGNITION_ERROR_AUDIO,

	386 content::SPEECH_AUDIO_ERROR_DETAILS_NO_MIC));

	387 }

	388

	389 if (audio_manager->IsRecordingInProcess()) {

	390 return Abort(SpeechRecognitionError(

	391 content::SPEECH_RECOGNITION_ERROR_AUDIO,

	392 content::SPEECH_AUDIO_ERROR_DETAILS_IN_USE));

	393 }

	394

142 const int samples_per_packet = kAudioSampleRate *	395 const int samples_per_packet = kAudioSampleRate *
	Satish 2012/03/27 09:47:42 add parentheses around (kAudioSampleRate * ..) / 1000 ?
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 Done.
143 GoogleOneShotRemoteEngine::kAudioPacketIntervalMs / 1000;	396 recognition_engine_->GetDesiredAudioChunkDurationMs() / 1000;

144 AudioParameters params(AudioParameters::AUDIO_PCM_LINEAR, kChannelLayout,	397 AudioParameters params(AudioParameters::AUDIO_PCM_LINEAR, kChannelLayout,

145 kAudioSampleRate, kNumBitsPerAudioSample,	398 kAudioSampleRate, kNumBitsPerAudioSample,

146 samples_per_packet);	399 samples_per_packet);

147 audio_controller_ = AudioInputController::Create(audio_manager, this, params);	400 audio_controller_ = AudioInputController::Create(audio_manager, this, params);

148 DCHECK(audio_controller_.get());	401

149 VLOG(1) << "SpeechRecognizer starting record.";	402 if (audio_controller_.get() == NULL) {

150 num_samples_recorded_ = 0;	403 return Abort(

	404 SpeechRecognitionError(content::SPEECH_RECOGNITION_ERROR_AUDIO));

	405 }

	406

	407 // The endpointer needs to estimate the environment/background noise before

	408 // starting to treat the audio as user input. We wait in the state

	409 // kEstimatingEnvironment until such interval has elapsed before switching

	410 // to user input mode.

	411 endpointer_.SetEnvironmentEstimationMode();

151 audio_controller_->Record();	412 audio_controller_->Record();

152 }	413 return kStartingRecognition;

153	414 }

154 void SpeechRecognizerImpl::AbortRecognition() {	415

155 DCHECK(BrowserThread::CurrentlyOn(BrowserThread::IO));	416 SpeechRecognizerImpl::FSMState SpeechRecognizerImpl::StartSpeechRecognition() {

156 DCHECK(audio_controller_.get() \|\| recognition_engine_.get());	417 // This was the first audio packet recorded, so start a request to the
	Satish 2012/03/27 09:47:42 update comment to say that the first audio packet has been received so start the recognition engine
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 Done.
157	418 // engine to send the data and inform the delegate.

158 // Stop recording if required.	419 DCHECK(recognition_engine_.get());

159 if (audio_controller_.get()) {	420 recognition_engine_->StartRecognition();

	421 listener_->OnAudioStart(caller_id_);

	422 // TODO(primiano) this is a little hack, since TakeAudioChunk() is already
	Satish 2012/03/27 09:47:42 add newline above
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 Done.
	423 // called by ProcessAudioPipeline(). I hate it since it weakens the

	424 // architectural beauty of this class. But it is the best tradeoff, unless we
	Satish 2012/03/27 09:47:42 could remove reference to 'architectural beauty' :)
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 Done.
	425 // allow the drop the first audio chunk captured after opening the audio dev.

	426 recognition_engine_->TakeAudioChunk(*(event_args_->audio_data));

	427 return kEstimatingEnvironment;

	428 }

	429

	430 SpeechRecognizerImpl::FSMState SpeechRecognizerImpl::EnvironmentEstimation() {
	Satish 2012/03/27 09:47:42 this method's name doesn't indicate what it actually does, can you use a more appropriate name?
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 Done. WaitEnvironmentEstimationCompletion
	431 DCHECK(endpointer_.IsEstimatingEnvironment());

	432 if (GetElapsedTimeMs() >= kEndpointerEstimationTimeMs) {

	433 endpointer_.SetUserInputMode();

	434 listener_->OnEnvironmentEstimationComplete(caller_id_);

	435 return kWaitingForSpeech;

	436 } else {

	437 return kEstimatingEnvironment;

	438 }

	439 }

	440

	441 SpeechRecognizerImpl::FSMState

	442 SpeechRecognizerImpl::DetectUserSpeechOrTimeout() {

	443 if (skipSilenceDetectionForTesting)

	444 return kRecognizingSpeech;

	445

	446 if (endpointer_.DidStartReceivingSpeech()) {

	447 listener_->OnSoundStart(caller_id_);

	448 return kRecognizingSpeech;

	449 } else if (GetElapsedTimeMs() >= kNoSpeechTimeoutMs) {

	450 return Abort(

	451 SpeechRecognitionError(content::SPEECH_RECOGNITION_ERROR_NO_SPEECH));

	452 } else {

	453 return kWaitingForSpeech;

	454 }

	455 }

	456

	457 SpeechRecognizerImpl::FSMState SpeechRecognizerImpl::DetectEndOfSpeech() {

	458 if (endpointer_.speech_input_complete()) {

	459 return StopCaptureAndWaitForResult();

	460 } else {

	461 return kRecognizingSpeech;

	462 }

	463 }

	464

	465 SpeechRecognizerImpl::FSMState

	466 SpeechRecognizerImpl::StopCaptureAndWaitForResult() {

	467 DCHECK(state_ >= kEstimatingEnvironment && state_ <= kRecognizingSpeech);

	468

	469 VLOG(1) << "Concluding recognition";

	470 CloseAudioControllerSynchronously();

	471 recognition_engine_->AudioChunksEnded();

	472

	473 if (state_ > kWaitingForSpeech)

	474 listener_->OnSoundEnd(caller_id_);

	475

	476 listener_->OnAudioEnd(caller_id_);

	477 return kWaitingFinalResult;

	478 }

	479

	480 SpeechRecognizerImpl::FSMState SpeechRecognizerImpl::Abort() {

	481 // TODO(primiano) Should raise SPEECH_RECOGNITION_ERROR_ABORTED in lack of

	482 // other specific error sources (so that it was an explicit abort request).

	483 // However, SPEECH_RECOGNITION_ERROR_ABORTED is not caught in UI layers

	484 // and currently would cause an exception. JS will probably need it in future.

	485 SpeechRecognitionError error(content::SPEECH_RECOGNITION_ERROR_NONE);

	486 bool has_error = false;

	487 if (event_ == kAudioError) {

	488 has_error = true;

	489 error.code = content::SPEECH_RECOGNITION_ERROR_AUDIO;

	490 } else if (event_ == kRecognitionError) {

	491 has_error = true;

	492 error = event_args_->error;

	493 }

	494 return Abort(has_error, error);

	495 }

	496

	497 SpeechRecognizerImpl::FSMState SpeechRecognizerImpl::Abort(

	498 const SpeechRecognitionError& error) {

	499 return Abort(true, error);

	500 }
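	Note on the Abort overloads in this part of the diff: they carry a has_error flag next to the error object. A minimal sketch of the pointer-based alternative raised in review, where a null pointer means "no error to report". ErrorSketch, AbortSketch and the string log of callbacks are hypothetical stand-ins for SpeechRecognitionError, Abort and the listener calls; the patch set that actually implemented the change is not shown on this page.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for SpeechRecognitionError.
struct ErrorSketch {
  int code;
};

// Returns the listener callbacks that would fire, in order. A null `error`
// replaces the separate has_error flag: the error callback is emitted only
// when there is an error object to report.
std::vector<std::string> AbortSketch(const ErrorSketch* error) {
  std::vector<std::string> callbacks;
  if (error != nullptr) {
    callbacks.push_back("OnRecognitionError(" + std::to_string(error->code) +
                        ")");
  }
  callbacks.push_back("OnRecognitionEnd");
  return callbacks;
}
```

	A caller with nothing to report passes nullptr; a caller with an error passes its address, so the flag and the object cannot disagree.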

	501

	502 SpeechRecognizerImpl::FSMState SpeechRecognizerImpl::Abort(

	503 bool has_error, const SpeechRecognitionError& error) {
	Satish 2012/03/27 09:47:42 can we change 'error' to be a pointer and remove 'has_error' parameter?
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 Done.
	504 if (audio_controller_)

160 CloseAudioControllerSynchronously();	505 CloseAudioControllerSynchronously();

161 }	506

162	507 VLOG(1) << "SpeechRecognizerImpl canceling recognition. " <<

163 VLOG(1) << "SpeechRecognizer canceling recognition.";	508 error.code << " " << error.details;

164 recognition_engine_.reset();	509

165 }	510 // The recognition engine is initialized only after kStartingRecognition.

166	511 if (state_ > kStartingRecognition) {

167 void SpeechRecognizerImpl::StopAudioCapture() {	512 DCHECK(recognition_engine_.get());

168 DCHECK(BrowserThread::CurrentlyOn(BrowserThread::IO));	513 recognition_engine_->EndRecognition();

169	514 //TODO(primiano) reset the engine? Why, after all?
	Satish 2012/03/27 09:47:42 This comment is unclear, please reword if required or remove. Also should the next line remain commented?
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 Done.
170 // If audio recording has already stopped and we are in recognition phase,	515 //recognition_engine_.reset();

171 // silently ignore any more calls to stop recording.	516 }

172 if (!audio_controller_.get())	517

173 return;	518 if (state_ > kWaitingForSpeech && state_ < kWaitingFinalResult)
	Satish 2012/03/27 09:47:42 would be useful for the unittest to verify that all these callbacks to listener come as expected
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 Done.
174	519 listener_->OnSoundEnd(caller_id_);

175 CloseAudioControllerSynchronously();	520

176 listener_->OnSoundEnd(caller_id_);	521 if (state_ > kStartingRecognition && state_ < kWaitingFinalResult)

177 listener_->OnAudioEnd(caller_id_);	522 listener_->OnAudioEnd(caller_id_);

178	523

179 // If we haven't got any audio yet end the recognition sequence here.	524 if (has_error)

180 if (recognition_engine_ == NULL) {	525 listener_->OnRecognitionError(caller_id_, error);

181 // Guard against the listener freeing us until we finish our job.	526

182 scoped_refptr<SpeechRecognizerImpl> me(this);	527 listener_->OnRecognitionEnd(caller_id_);

183 listener_->OnRecognitionEnd(caller_id_);	528

184 } else {	529 return kIdle;

185 recognition_engine_->AudioChunksEnded();	530 }

186 }	531

187 }	532 SpeechRecognizerImpl::FSMState

188	533 SpeechRecognizerImpl::ProcessIntermediateRecognitionResult() {

189 // Invoked in the audio thread.	534 // This is in preparation for future speech recognition functions.
	Satish 2012/03/27 09:47:42 remove these commented lines
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 Done.
190 void SpeechRecognizerImpl::OnError(AudioInputController* controller,	535 // DCHECK(continuous_mode_);

191 int error_code) {	536 // const SpeechRecognitionResult& result = event_args_->speech_result;

192 BrowserThread::PostTask(BrowserThread::IO, FROM_HERE,	537 // VLOG(1) << "Got intermediate result";

193 base::Bind(&SpeechRecognizerImpl::HandleOnError,	538 // listener_->OnRecognitionResult(caller_id_, result);

194 this, error_code));	539 NOTREACHED();

195 }	540 return state_;

196	541 }

197 void SpeechRecognizerImpl::HandleOnError(int error_code) {	542

198 LOG(WARNING) << "SpeechRecognizer::HandleOnError, code=" << error_code;	543 SpeechRecognizerImpl::FSMState

199	544 SpeechRecognizerImpl::ProcessFinalRecognitionResult() {

200 // Check if we are still recording before canceling recognition, as	545 const SpeechRecognitionResult& result = event_args_->speech_result;

201 // recording might have been stopped after this error was posted to the queue	546 VLOG(1) << "Got valid result";

202 // by \|OnError\|.	547 recognition_engine_->EndRecognition();

203 if (!audio_controller_.get())

204 return;

205

206 InformErrorAndAbortRecognition(content::SPEECH_RECOGNITION_ERROR_AUDIO);

207 }

208

209 void SpeechRecognizerImpl::OnData(AudioInputController* controller,

210 const uint8* data, uint32 size) {

211 if (size == 0) // This could happen when recording stops and is normal.

212 return;

213 AudioChunk* raw_audio = new AudioChunk(data, static_cast<size_t>(size),

214 kNumBitsPerAudioSample / 8);

215 BrowserThread::PostTask(BrowserThread::IO, FROM_HERE,

216 base::Bind(&SpeechRecognizerImpl::HandleOnData,

217 this, raw_audio));

218 }

219

220 void SpeechRecognizerImpl::HandleOnData(AudioChunk* raw_audio) {

221 scoped_ptr<AudioChunk> free_raw_audio_on_return(raw_audio);

222 // Check if we are still recording and if not discard this buffer, as

223 // recording might have been stopped after this buffer was posted to the queue

224 // by \|OnData\|.

225 if (!audio_controller_.get())

226 return;

227

228 bool speech_was_heard_before_packet = endpointer_.DidStartReceivingSpeech();

229

230 float rms;

231 endpointer_.ProcessAudio(*raw_audio, &rms);

232 bool did_clip = DetectClipping(*raw_audio);

233 num_samples_recorded_ += raw_audio->NumSamples();

234

235 if (recognition_engine_ == NULL) {

236 // This was the first audio packet recorded, so start a request to the

237 // server to send the data and inform the listener.

238 listener_->OnAudioStart(caller_id_);

239 GoogleOneShotRemoteEngineConfig google_sr_config;

240 google_sr_config.language = language_;

241 google_sr_config.grammar = grammar_;

242 google_sr_config.audio_sample_rate = kAudioSampleRate;

243 google_sr_config.audio_num_bits_per_sample = kNumBitsPerAudioSample;

244 google_sr_config.filter_profanities = filter_profanities_;

245 google_sr_config.hardware_info = hardware_info_;

246 google_sr_config.origin_url = origin_url_;

247 GoogleOneShotRemoteEngine* google_sr_engine =

248 new GoogleOneShotRemoteEngine(context_getter_.get());

249 google_sr_engine->SetConfig(google_sr_config);

250 recognition_engine_.reset(google_sr_engine);

251 recognition_engine_->set_delegate(this);

252 recognition_engine_->StartRecognition();

253 }

254

255 recognition_engine_->TakeAudioChunk(*raw_audio);

256

257 if (endpointer_.IsEstimatingEnvironment()) {

258 // Check if we have gathered enough audio for the endpointer to do

259 // environment estimation and should move on to detect speech/end of speech.

260 if (num_samples_recorded_ >= (kEndpointerEstimationTimeMs *

261 kAudioSampleRate) / 1000) {

262 endpointer_.SetUserInputMode();

263 listener_->OnEnvironmentEstimationComplete(caller_id_);

264 }

265 return; // No more processing since we are still estimating environment.

266 }

267

268 // Check if we have waited too long without hearing any speech.

269 bool speech_was_heard_after_packet = endpointer_.DidStartReceivingSpeech();

270 if (!speech_was_heard_after_packet &&

271 num_samples_recorded_ >= (kNoSpeechTimeoutMs / 1000) * kAudioSampleRate) {

272 InformErrorAndAbortRecognition(

273 content::SPEECH_RECOGNITION_ERROR_NO_SPEECH);

274 return;

275 }

276

277 if (!speech_was_heard_before_packet && speech_was_heard_after_packet)

278 listener_->OnSoundStart(caller_id_);

279

280 // Calculate the input volume to display in the UI, smoothing towards the

281 // new level.

282 float level = (rms - kAudioMeterMinDb) /

283 (kAudioMeterDbRange / kAudioMeterRangeMaxUnclipped);

284 level = std::min(std::max(0.0f, level), kAudioMeterRangeMaxUnclipped);

285 if (level > audio_level_) {

286 audio_level_ += (level - audio_level_) * kUpSmoothingFactor;

287 } else {

288 audio_level_ += (level - audio_level_) * kDownSmoothingFactor;

289 }

290

291 float noise_level = (endpointer_.NoiseLevelDb() - kAudioMeterMinDb) /

292 (kAudioMeterDbRange / kAudioMeterRangeMaxUnclipped);

293 noise_level = std::min(std::max(0.0f, noise_level),

294 kAudioMeterRangeMaxUnclipped);

295

296 listener_->OnAudioLevelsChange(caller_id_, did_clip ? 1.0f : audio_level_,

297 noise_level);

298

299 if (endpointer_.speech_input_complete())

300 StopAudioCapture();

301 }

302

303 void SpeechRecognizerImpl::OnSpeechRecognitionEngineResult(

304 const content::SpeechRecognitionResult& result) {

305 // Guard against the listener freeing us until we finish our job.

306 scoped_refptr<SpeechRecognizerImpl> me(this);

307 listener_->OnRecognitionResult(caller_id_, result);	548 listener_->OnRecognitionResult(caller_id_, result);

308 listener_->OnRecognitionEnd(caller_id_);	549 listener_->OnRecognitionEnd(caller_id_);

309 }	550 return kIdle;

310	551 }

311 void SpeechRecognizerImpl::OnSpeechRecognitionEngineError(	552

312 const content::SpeechRecognitionError& error) {	553 SpeechRecognizerImpl::FSMState SpeechRecognizerImpl::DoNothing() const {

313 InformErrorAndAbortRecognition(error.code);	554 return state_; // Just keep the current state.
	Satish 2012/03/27 09:47:42 2 spaces before //, here and other places in this file
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 Done.
314 }

315

316 void SpeechRecognizerImpl::InformErrorAndAbortRecognition(

317 content::SpeechRecognitionErrorCode error) {

318 DCHECK_NE(error, content::SPEECH_RECOGNITION_ERROR_NONE);

319 AbortRecognition();

320

321 // Guard against the listener freeing us until we finish our job.

322 scoped_refptr<SpeechRecognizerImpl> me(this);

323 listener_->OnRecognitionError(caller_id_, error);

324 }	555 }

325	556

326 void SpeechRecognizerImpl::CloseAudioControllerSynchronously() {	557 void SpeechRecognizerImpl::CloseAudioControllerSynchronously() {

327 VLOG(1) << "SpeechRecognizer stopping record.";	558 DCHECK(audio_controller_);

	559 VLOG(1) << "SpeechRecognizerImpl stopping audio capture.";

328	560

329 // TODO(satish): investigate the possibility to utilize the closure	561 // TODO(satish): investigate the possibility to utilize the closure

330 // and switch to async. version of this method. Compare with how	562 // and switch to async. version of this method. Compare with how

331 // it's done in e.g. the AudioRendererHost.	563 // it's done in e.g. the AudioRendererHost.

332 base::WaitableEvent closed_event(true, false);	564 base::WaitableEvent closed_event(true, false);

333 audio_controller_->Close(base::Bind(&base::WaitableEvent::Signal,	565 audio_controller_->Close(base::Bind(&base::WaitableEvent::Signal,

334 base::Unretained(&closed_event)));	566 base::Unretained(&closed_event)));

335 closed_event.Wait();	567 closed_event.Wait();

336 audio_controller_ = NULL; // Releases the ref ptr.	568 audio_controller_ = NULL; // Releases the ref ptr.

337 }	569 }

338	570

339 bool SpeechRecognizerImpl::IsActive() const {	571 int SpeechRecognizerImpl::GetElapsedTimeMs() const {

340 return (recognition_engine_.get() != NULL);	572 return num_samples_recorded_ * 1000 / kAudioSampleRate;
	Satish 2012/03/27 09:47:42 use parenthesis around (num_samples_recorded_ * 1000) / ...
	Primiano Tucci (use gerrit) 2012/03/28 13:24:44 Done.
341 }	573 }
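	Note on GetElapsedTimeMs: the expression already evaluates left to right, so the requested parentheses affect readability only. What does affect the result is multiplying before dividing, and the width of the intermediate product. A minimal sketch, assuming a 16 kHz sample rate (the real kAudioSampleRate is defined outside this diff); ElapsedMs and ElapsedMsDivideFirst are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Assumed value for illustration only.
constexpr int kSampleRateHzSketch = 16000;

// Multiply first, in 64 bits, so the product cannot overflow for any
// 32-bit sample count; then divide with truncation toward zero.
constexpr int64_t ElapsedMs(int32_t num_samples, int sample_rate_hz) {
  return (static_cast<int64_t>(num_samples) * 1000) / sample_rate_hz;
}

// Dividing first truncates to whole seconds before scaling to milliseconds.
constexpr int64_t ElapsedMsDivideFirst(int32_t num_samples,
                                       int sample_rate_hz) {
  return (static_cast<int64_t>(num_samples) / sample_rate_hz) * 1000;
}

// Half a second of audio at the assumed rate.
static_assert(ElapsedMs(8000, kSampleRateHzSketch) == 500,
              "multiply-first keeps sub-second resolution");
static_assert(ElapsedMsDivideFirst(8000, kSampleRateHzSketch) == 0,
              "divide-first loses it");
```

	With a 32-bit int product, num_samples * 1000 exceeds INT_MAX after 2,147,483 samples, which is roughly 134 seconds at the assumed 16 kHz; whether a capture can run that long is not stated in this diff.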

342	574

343 bool SpeechRecognizerImpl::IsCapturingAudio() const {	575 void SpeechRecognizerImpl::UpdateSignalAndNoiseLevels(const float& rms) {

344 return (audio_controller_.get() != NULL);	576 // Calculate the input volume to display in the UI, smoothing towards the

	577 // new level.

	578 // TODO(primiano) Do we really need all this floating point arith here?

	579 // Perhaps it might be quite expensive on mobile.

	580 float level = (rms - kAudioMeterMinDb) /

	581 (kAudioMeterDbRange / kAudioMeterRangeMaxUnclipped);

	582 level = std::min(std::max(0.0f, level), kAudioMeterRangeMaxUnclipped);

	583 if (level > audio_level_) {

	584 audio_level_ += (level - audio_level_) * kUpSmoothingFactor;

	585 } else {

	586 audio_level_ += (level - audio_level_) * kDownSmoothingFactor;

	587 }

	588

	589 float noise_level = (endpointer_.NoiseLevelDb() - kAudioMeterMinDb) /

	590 (kAudioMeterDbRange / kAudioMeterRangeMaxUnclipped);

	591 noise_level = std::min(std::max(0.0f, noise_level),

	592 kAudioMeterRangeMaxUnclipped);

	593

	594 listener_->OnAudioLevelsChange(

	595 caller_id_, clipper_detected_clip_ ? 1.0f : audio_level_, noise_level);

345 }	596 }
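	Note on UpdateSignalAndNoiseLevels: the meter maps a dB value onto a clamped display range and then moves the displayed level toward it with separate rise and fall factors. A minimal sketch of that arithmetic in isolation. MeterConfig, NormalizeLevel, Smooth and DisplayLevel are hypothetical names, and the numeric values used in any call are placeholders; the real kAudioMeter* and smoothing constants are defined outside this diff.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical bundle of the meter constants.
struct MeterConfig {
  float min_db;         // dB value mapped to 0.
  float db_range;       // Width of the dB range covered by the meter.
  float max_unclipped;  // Largest level shown when there is no clipping.
  float up_factor;      // Smoothing factor when the level rises.
  float down_factor;    // Smoothing factor when the level falls.
};

// Maps a dB value to [0, max_unclipped], clamping values outside the range.
float NormalizeLevel(float db, const MeterConfig& c) {
  const float level = (db - c.min_db) / (c.db_range / c.max_unclipped);
  return std::min(std::max(0.0f, level), c.max_unclipped);
}

// Moves `current` a fraction of the way toward `target`; the fraction
// depends on whether the level is rising or falling.
float Smooth(float current, float target, const MeterConfig& c) {
  const float factor = target > current ? c.up_factor : c.down_factor;
  return current + (target - current) * factor;
}

// A detected clip overrides the smoothed level with full scale.
float DisplayLevel(bool clipped, float smoothed_level) {
  return clipped ? 1.0f : smoothed_level;
}
```

	For example, with min_db = -40, db_range = 40 and max_unclipped = 0.5, an input of -20 dB normalizes to 0.25, and anything above 0 dB clamps to 0.5.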

346	597

347 const SpeechRecognitionEngine&	598 const SpeechRecognitionEngine&

348 SpeechRecognizerImpl::recognition_engine() const {	599 SpeechRecognizerImpl::recognition_engine() const {

349 return *(recognition_engine_.get());	600 return *(recognition_engine_.get());

350 }	601 }

351	602

352 void SpeechRecognizerImpl::SetAudioManagerForTesting(	603 void SpeechRecognizerImpl::SetAudioManagerForTesting(

353 AudioManager* audio_manager) {	604 AudioManager* audio_manager) {

354 testing_audio_manager_ = audio_manager;	605 testing_audio_manager_ = audio_manager;

355 }	606 }

356	607

	608 SpeechRecognizerImpl::FSMEventArgs::FSMEventArgs()

	609 : audio_error_code(0),

	610 audio_data(NULL),

	611 error(content::SPEECH_RECOGNITION_ERROR_NONE) {

	612 }

357	613

358 } // namespace speech	614 } // namespace speech
