Chromium Code Reviews| Index: chrome/browser/metrics/metrics_service.cc |
| =================================================================== |
| --- chrome/browser/metrics/metrics_service.cc (revision 238706) |
| +++ chrome/browser/metrics/metrics_service.cc (working copy) |
| @@ -22,14 +22,14 @@ |
| // before being persisted. |
| // |
| // Logs fall into one of two categories: "initial logs," and "ongoing logs." |
| -// There is at most one initial log sent for each complete run of the chromium |
| -// product (from startup, to browser shutdown). An initial log is generally |
| -// transmitted some short time (1 minute?) after startup, and includes stats |
| -// such as recent crash info, the number and types of plugins, etc. The |
| -// external server's response to the initial log conceptually tells this MS if |
| -// it should continue transmitting logs (during this session). The server |
| -// response can actually be much more detailed, and always includes (at a |
| -// minimum) how often additional ongoing logs should be sent. |
| +// There is at most one initial log sent for each complete run of Chrome (from |
| +// startup, to browser shutdown). An initial log is generally transmitted some |
| +// short time (1 minute?) after startup, and includes stats such as recent crash |
| +// info, the number and types of plugins, etc. The external server's response |
| +// to the initial log conceptually tells this MS if it should continue |
| +// transmitting logs (during this session). The server response can actually be |
| +// much more detailed, and always includes (at a minimum) how often additional |
| +// ongoing logs should be sent. |
| // |
| // After the above initial log, a series of ongoing logs will be transmitted. |
| // The first ongoing log actually begins to accumulate information stating when |
| @@ -45,13 +45,13 @@ |
| // memory statistics are deposited into a histogram, and the log finalization |
| // code is then called. In the finalization, a call to a Histogram server |
| // acquires a list of all local histograms that have been flagged for upload |
| -// to the UMA server. The finalization also acquires a the most recent number |
| +// to the UMA server. The finalization also acquires the most recent number |
| // of page loads, along with any counts of renderer or plugin crashes. |
| // |
| // When the browser shuts down, there will typically be a fragment of an ongoing |
| -// log that has not yet been transmitted. At shutdown time, that fragment |
| -// is closed (including snapshotting histograms), and persisted, for |
| -// potential transmission during a future run of the product. |
| +// log that has not yet been transmitted. At shutdown time, that fragment is |
| +// closed (including snapshotting histograms), and persisted, for potential |
| +// transmission during a future run of the product. |
| // |
| // There are two slightly abnormal shutdown conditions. There is a |
| // "disconnected scenario," and a "really fast startup and shutdown" scenario. |
| @@ -73,15 +73,16 @@ |
| // in a future run's initial log. (i.e., we don't lose crash stats). |
| // |
| // With the above overview, we can now describe the state machine's various |
| -// stats, based on the State enum specified in the state_ member. Those states |
| +// states, based on the State enum specified in the state_ member. Those states |
| // are: |
| // |
| -// INITIALIZED, // Constructor was called. |
| -// INIT_TASK_SCHEDULED, // Waiting for deferred init tasks to complete. |
| -// INIT_TASK_DONE, // Waiting for timer to send initial log. |
| -// INITIAL_LOG_READY, // Initial log generated, and waiting for reply. |
| -// SENDING_OLD_LOGS, // Sending unsent logs from previous session. |
| -// SENDING_CURRENT_LOGS, // Sending standard current logs as they accrue. |
| +// INITIALIZED, // Constructor was called. |
| +// INIT_TASK_SCHEDULED, // Waiting for deferred init tasks to finish. |
|
Ilya Sherman
2013/12/06 06:21:26
nit: Note that you still have "tasks" here, but yo
Alexei Svitkine (slow)
2013/12/06 17:59:22
Done.
|
| +// INIT_TASK_DONE, // Waiting for timer to send initial log. |
| +// SENDING_INITIAL_STABILITY_LOG, // Initial stability log being sent. |
| +// SENDING_INITIAL_METRICS_LOG, // Initial metrics log being sent. |
| +// SENDING_OLD_LOGS, // Sending unsent logs from previous session. |
| +// SENDING_CURRENT_LOGS, // Sending ongoing logs as they acrue. |
| // |
| // In more detail, we have: |
| // |
| @@ -89,7 +90,7 @@ |
| // The MS has been constructed, but has taken no actions to compose the |
| // initial log. |
| // |
| -// INIT_TASK_SCHEDULED, // Waiting for deferred init tasks to complete. |
| +// INIT_TASK_SCHEDULED, // Waiting for deferred init task to complete. |
| // Typically about 30 seconds after startup, a task is sent to a second thread |
| // (the file thread) to perform deferred (lower priority and slower) |
| // initialization steps such as getting the list of plugins. That task will |
| @@ -101,11 +102,21 @@ |
| // created. This callback typically arrives back less than one second after |
| // the deferred init task is dispatched. |
| // |
| -// INITIAL_LOG_READY, // Initial log generated, and waiting for reply. |
| -// This state is entered only after an initial log has been composed, and |
| -// prepared for transmission. It is also the case that any previously unsent |
| -// logs have been loaded into instance variables for possible transmission. |
| +// SENDING_INITIAL_STABILITY_LOG, // Initial stability log being sent. |
| +// During initialization, if a crash occurred during the previous session, an |
| +// initial stability log will be generated and registered with the log manager. |
| +// This state will be entered if there's such a stability log waiting to be |
|
Ilya Sherman
2013/12/06 06:21:26
nit: What does it mean for there to be a stability
Alexei Svitkine (slow)
2013/12/06 17:59:22
Done.
|
| +// transmitted when its time to sent up the first log (per the reporting |
|
Ilya Sherman
2013/12/06 06:21:26
nit: "its" -> "it's"
Ilya Sherman
2013/12/06 06:21:26
nit: "sent" -> "send"
Alexei Svitkine (slow)
2013/12/06 17:59:22
Done.
Alexei Svitkine (slow)
2013/12/06 17:59:22
Done.
|
| +// scheduler). If there is no initial stability log (e.g. there was no previous |
| +// crash), then this state will be skipped and the state will advance to |
| +// SENDING_INITIAL_METRICS_LOG. |
| // |
| +// SENDING_INITIAL_METRICS_LOG, // Initial metrics log being sent. |
| +// This state is entered once the initial metrics log has been composed, and |
| +// prepared for transmission and after sending the initial stability log, if |
| +// any. It is also the case that any previously unsent logs have been loaded |
|
Ilya Sherman
2013/12/06 06:21:26
nit: This first sentence is a bit hard to parse; c
Alexei Svitkine (slow)
2013/12/06 17:59:22
Done.
|
| +// into instance variables for possible transmission. |
| +// |
| // SENDING_OLD_LOGS, // Sending unsent logs from previous session. |
| // This state indicates that the initial log for this session has been |
| // successfully sent and it is now time to send any logs that were |
| @@ -337,6 +348,11 @@ |
| pref->CommitPendingWrite(); |
| } |
| +// Returns whether initial stability metrics should be sent in a separate log. |
| +bool SendSeparateInitialStabilityLog() { |
| + return base::FieldTrialList::FindFullName("UMAStability") == "SeparateLog"; |
| +} |
| + |
| } // namespace |
| @@ -503,6 +519,7 @@ |
| state_(INITIALIZED), |
| low_entropy_source_(kLowEntropySourceNotSet), |
| idle_since_last_transmission_(false), |
| + session_id_(-1), |
| next_window_id_(0), |
| self_ptr_factory_(this), |
| state_saver_factory_(this), |
| @@ -510,14 +527,7 @@ |
| num_async_histogram_fetches_in_progress_(0), |
| entropy_source_returned_(LAST_ENTROPY_NONE) { |
| DCHECK(IsSingleThreaded()); |
| - InitializeMetricsState(); |
| - base::Closure callback = base::Bind(&MetricsService::StartScheduledUpload, |
| - self_ptr_factory_.GetWeakPtr()); |
| - scheduler_.reset(new MetricsReportingScheduler(callback)); |
| - log_manager_.set_log_serializer(new MetricsLogSerializer()); |
| - log_manager_.set_max_ongoing_log_store_size(kUploadLogAvoidRetransmitSize); |
| - |
| BrowserChildProcessObserver::Add(this); |
| } |
| @@ -527,6 +537,17 @@ |
| BrowserChildProcessObserver::Remove(this); |
| } |
| +void MetricsService::InitializeMetricsRecordingState( |
| + bool reporting_will_be_enabled) { |
| + InitializeMetricsState(reporting_will_be_enabled); |
| + |
| + base::Closure callback = base::Bind(&MetricsService::StartScheduledUpload, |
| + self_ptr_factory_.GetWeakPtr()); |
| + scheduler_.reset(new MetricsReportingScheduler(callback)); |
| + log_manager_.set_log_serializer(new MetricsLogSerializer()); |
| + log_manager_.set_max_ongoing_log_store_size(kUploadLogAvoidRetransmitSize); |
| +} |
| + |
| void MetricsService::Start() { |
| HandleIdleSinceLastTransmission(false); |
| EnableRecording(); |
| @@ -788,7 +809,7 @@ |
| // killed, so this has to be treated similar to a shutdown, closing and |
| // persisting all logs. Unlinke a shutdown, the state is primed to be ready |
| // to continue logging and uploading if the process does return. |
| - if (recording_active() && state_ >= INITIAL_LOG_READY) { |
| + if (recording_active() && state_ >= SENDING_INITIAL_STABILITY_LOG) { |
| PushPendingLogsToPersistentStorage(); |
| // Persisting logs closes the current log, so start recording a new log |
| // immediately to capture any background work that might be done before the |
| @@ -893,7 +914,7 @@ |
| //------------------------------------------------------------------------------ |
| // Initialization methods |
| -void MetricsService::InitializeMetricsState() { |
| +void MetricsService::InitializeMetricsState(bool reporting_will_be_enabled) { |
| #if defined(OS_POSIX) |
| network_stats_server_ = chrome_common_net::kEchoTestServerLocation; |
| http_pipelining_test_server_ = chrome_common_net::kPipelineTestServerBaseUrl; |
| @@ -906,6 +927,8 @@ |
| PrefService* pref = g_browser_process->local_state(); |
| DCHECK(pref); |
| + // TODO(asvitkine): Kill this logic when SendSeparateInitialStabilityLog() is |
| + // is made the default behavior. |
| if ((pref->GetInt64(prefs::kStabilityStatsBuildTime) |
| != MetricsLog::GetBuildTime()) || |
| (pref->GetString(prefs::kStabilityStatsVersion) |
| @@ -919,14 +942,11 @@ |
| MetricsLog::GetBuildTime()); |
| } |
| - // Update session ID |
| + // Initialize uptime counters. |
| + const base::TimeDelta startup_uptime = GetIncrementalUptime(pref); |
| + DCHECK_EQ(0, startup_uptime.InMicroseconds()); |
|
Ilya Sherman
2013/12/06 06:21:26
Hmm, why does this need to happen earlier now?
Alexei Svitkine (slow)
2013/12/06 17:59:22
In a previous version of this I used startup_uptim
|
| session_id_ = pref->GetInteger(prefs::kMetricsSessionID); |
| - ++session_id_; |
| - pref->SetInteger(prefs::kMetricsSessionID, session_id_); |
| - // Stability bookkeeping |
| - IncrementPrefValue(prefs::kStabilityLaunchCount); |
| - |
| if (!pref->GetBoolean(prefs::kStabilityExitedCleanly)) { |
| IncrementPrefValue(prefs::kStabilityCrashCount); |
| // Reset flag, and wait until we call LogNeedForCleanShutdown() before |
| @@ -938,7 +958,20 @@ |
| int execution_phase = pref->GetInteger(prefs::kStabilityExecutionPhase); |
| UMA_HISTOGRAM_SPARSE_SLOWLY("Chrome.Browser.CrashedExecutionPhase", |
| execution_phase); |
| + |
| + // If the previous session didn't exit cleanly, then prepare an initial |
| + // stability log if UMA is enabled. |
| + if (reporting_will_be_enabled && SendSeparateInitialStabilityLog()) |
| + PrepareInitialStabilityLog(); |
| } |
| + |
| + // Update session ID. |
| + ++session_id_; |
| + pref->SetInteger(prefs::kMetricsSessionID, session_id_); |
| + |
| + // Stability bookkeeping |
| + IncrementPrefValue(prefs::kStabilityLaunchCount); |
| + |
| DCHECK_EQ(UNINITIALIZED_PHASE, execution_phase_); |
| SetExecutionPhase(START_METRICS_RECORDING); |
| @@ -952,9 +985,6 @@ |
| pref->SetBoolean(prefs::kStabilitySessionEndCompleted, true); |
| } |
| - // Initialize uptime counters. |
| - const base::TimeDelta startup_uptime = GetIncrementalUptime(pref); |
| - DCHECK_EQ(0, startup_uptime.InMicroseconds()); |
| // For backwards compatibility, leave this intact in case Omaha is checking |
| // them. prefs::kStabilityLastTimestampSec may also be useless now. |
| // TODO(jar): Delete these if they have no uses. |
| @@ -1087,10 +1117,10 @@ |
| // Upon the first callback, create the initial log so that we can immediately |
| // save the profiler data. |
| - if (!initial_log_.get()) |
| - initial_log_.reset(new MetricsLog(client_id_, session_id_)); |
| + if (!initial_metrics_log_.get()) |
| + initial_metrics_log_.reset(new MetricsLog(client_id_, session_id_)); |
| - initial_log_->RecordProfilerData(process_data, process_type); |
| + initial_metrics_log_->RecordProfilerData(process_data, process_type); |
| } |
| void MetricsService::FinishedReceivingProfilerData() { |
| @@ -1247,7 +1277,7 @@ |
| } |
| void MetricsService::PushPendingLogsToPersistentStorage() { |
| - if (state_ < INITIAL_LOG_READY) |
| + if (state_ < SENDING_INITIAL_STABILITY_LOG) |
| return; // We didn't and still don't have time to get plugin list etc. |
| if (log_manager_.has_staged_log()) { |
| @@ -1280,8 +1310,10 @@ |
| // Even if reporting is disabled, the scheduler is needed to trigger the |
| // creation of the initial log, which must be done in order for any logs to be |
| // persisted on shutdown or backgrounding. |
| - if (recording_active() && (reporting_active() || state_ < INITIAL_LOG_READY)) |
| + if (recording_active() && |
| + (reporting_active() || state_ < SENDING_INITIAL_STABILITY_LOG)) { |
| scheduler_->Start(); |
| + } |
| } |
| void MetricsService::StartScheduledUpload() { |
| @@ -1295,7 +1327,7 @@ |
| // recording are turned off instead of letting it fire and then aborting. |
| if (idle_since_last_transmission_ || |
| !recording_active() || |
| - (!reporting_active() && state_ >= INITIAL_LOG_READY)) { |
| + (!reporting_active() && state_ >= SENDING_INITIAL_STABILITY_LOG)) { |
| scheduler_->Stop(); |
| scheduler_->UploadCancelled(); |
| return; |
| @@ -1434,7 +1466,7 @@ |
| } |
| void MetricsService::StageNewLog() { |
| - if (log_manager_.has_staged_log()) |
| + if (state_ != INIT_TASK_DONE && log_manager_.has_staged_log()) |
| return; |
| switch (state_) { |
| @@ -1444,10 +1476,19 @@ |
| return; |
| case INIT_TASK_DONE: |
| - PrepareInitialLog(); |
| - DCHECK_EQ(INIT_TASK_DONE, state_); |
| - log_manager_.LoadPersistedUnsentLogs(); |
| - state_ = INITIAL_LOG_READY; |
| + if (log_manager_.has_staged_log()) { |
| + // There's an initial stability log, ready to send. |
| + state_ = SENDING_INITIAL_STABILITY_LOG; |
| + } else { |
| + // TODO(asvitkine): When the field trial is removed, the |log_type| |
| + // arg should be removed and PrepareInitialMetricsLog() should always |
| + // use ONGOING_LOG. Use INITIAL_LOG only to match to the old behavior |
| + // when the field trial is off. |
| + MetricsLog::LogType log_type = SendSeparateInitialStabilityLog() ? |
| + MetricsLog::ONGOING_LOG : MetricsLog::INITIAL_LOG; |
| + PrepareInitialMetricsLog(log_type); |
| + state_ = SENDING_INITIAL_METRICS_LOG; |
| + } |
| break; |
| case SENDING_OLD_LOGS: |
| @@ -1468,35 +1509,56 @@ |
| DCHECK(log_manager_.has_staged_log()); |
| } |
| -void MetricsService::PrepareInitialLog() { |
| - DCHECK_EQ(INIT_TASK_DONE, state_); |
|
Ilya Sherman
2013/12/06 06:21:26
nit: Please preserve some flavor of this DCHECK.
Alexei Svitkine (slow)
2013/12/06 17:59:22
Done (in PrepareInitialMetricsLog()).
|
| +void MetricsService::PrepareInitialStabilityLog() { |
| + DCHECK_EQ(INITIALIZED, state_); |
| - DCHECK(initial_log_.get()); |
| - initial_log_->set_hardware_class(hardware_class_); |
| + scoped_ptr<MetricsLog> initial_stability_log( |
| + new MetricsLog(client_id_, session_id_)); |
| + if (!initial_stability_log->LoadSavedEnvironmentFromPrefs()) |
| + return; |
| + initial_stability_log->RecordStabilityMetrics(base::TimeDelta(), |
| + MetricsLog::INITIAL_LOG); |
| + log_manager_.PauseCurrentLog(); |
| + log_manager_.BeginLoggingWithLog(initial_stability_log.release(), |
| + MetricsLog::INITIAL_LOG); |
| + log_manager_.FinishCurrentLog(); |
| + log_manager_.ResumePausedLog(); |
| + // TODO(asvitkine): It would be nice to defer StageNextLogForUpload() until |
| + // the log is ready to be actually sent. This way, the initial stability log |
| + // can be kept around in unsent logs (and saved to prefs) in case there's a |
| + // crash. However, this would require loading existing unsent logs earlier - |
| + // which is currently done in PrepareInitialMetricsLog(). |
|
Ilya Sherman
2013/12/06 06:21:26
I think it's important to get this right, i.e. to
Alexei Svitkine (slow)
2013/12/06 17:59:22
Okay - that's why I put the comment here to raise
Ilya Sherman
2013/12/07 06:51:53
Hmm. Could you possibly measure the actual perfor
Alexei Svitkine (slow)
2013/12/09 21:17:02
Agreed. I've added histograms to measure how long
Ilya Sherman
2013/12/10 06:02:10
Yeah, that sounds reasonable. Thanks! :)
|
| + log_manager_.StageNextLogForUpload(); |
| +} |
| + |
| +void MetricsService::PrepareInitialMetricsLog(MetricsLog::LogType log_type) { |
| + DCHECK(initial_metrics_log_.get()); |
|
Ilya Sherman
2013/12/06 06:21:26
nit: No need for .get()
Alexei Svitkine (slow)
2013/12/06 17:59:22
I actually just deleted this line now. If it's nul
|
| + initial_metrics_log_->set_hardware_class(hardware_class_); |
| + |
| std::vector<chrome_variations::ActiveGroupId> synthetic_trials; |
| GetCurrentSyntheticFieldTrials(&synthetic_trials); |
| - initial_log_->RecordEnvironment(plugins_, google_update_metrics_, |
| - synthetic_trials); |
| + initial_metrics_log_->RecordEnvironment(plugins_, google_update_metrics_, |
| + synthetic_trials); |
| PrefService* pref = g_browser_process->local_state(); |
| - initial_log_->RecordStabilityMetrics(GetIncrementalUptime(pref), |
| - MetricsLog::INITIAL_LOG); |
| + initial_metrics_log_->RecordStabilityMetrics(GetIncrementalUptime(pref), |
| + log_type); |
| // Histograms only get written to the current log, so make the new log current |
| // before writing them. |
| log_manager_.PauseCurrentLog(); |
| - log_manager_.BeginLoggingWithLog(initial_log_.release(), |
| - MetricsLog::INITIAL_LOG); |
| + log_manager_.BeginLoggingWithLog(initial_metrics_log_.release(), log_type); |
| RecordCurrentHistograms(); |
| log_manager_.FinishCurrentLog(); |
| log_manager_.ResumePausedLog(); |
| DCHECK(!log_manager_.has_staged_log()); |
| log_manager_.StageNextLogForUpload(); |
| + log_manager_.LoadPersistedUnsentLogs(); |
|
Ilya Sherman
2013/12/06 06:21:26
nit: It seems more appropriate to leave this where
Alexei Svitkine (slow)
2013/12/06 17:59:22
See my comment above about moving this call earlie
|
| } |
| void MetricsService::StoreUnsentLogs() { |
| - if (state_ < INITIAL_LOG_READY) |
| + if (state_ < SENDING_INITIAL_STABILITY_LOG) |
| return; // We never Recalled the prior unsent logs. |
| log_manager_.PersistUnsentLogs(); |
| @@ -1607,7 +1669,13 @@ |
| if (!log_manager_.has_staged_log()) { |
| switch (state_) { |
| - case INITIAL_LOG_READY: |
| + case SENDING_INITIAL_STABILITY_LOG: |
| + PrepareInitialMetricsLog(MetricsLog::ONGOING_LOG); |
| + SendStagedLog(); |
| + state_ = SENDING_INITIAL_METRICS_LOG; |
| + break; |
| + |
| + case SENDING_INITIAL_METRICS_LOG: |
| state_ = log_manager_.has_unsent_logs() ? SENDING_OLD_LOGS |
| : SENDING_CURRENT_LOGS; |
| break; |
| @@ -1631,10 +1699,16 @@ |
| DCHECK_LT(state_, SENDING_CURRENT_LOGS); |
| } |
| - // Error 400 indicates a problem with the log, not with the server, so |
| - // don't consider that a sign that the server is in trouble. |
| bool server_is_healthy = upload_succeeded || response_code == 400; |
| - scheduler_->UploadFinished(server_is_healthy, log_manager_.has_unsent_logs()); |
| + // Don't notify the scheduler that the upload is finished if we've only sent |
| + // the initial stability log, but not yet the initial metrics log (treat the |
| + // two as a single unit of work as far as the scheduler is concerned). |
| + if (state_ != SENDING_INITIAL_METRICS_LOG) { |
| + // Error 400 indicates a problem with the log, not with the server, so |
| + // don't consider that a sign that the server is in trouble. |
|
Ilya Sherman
2013/12/06 06:21:26
nit: This comment still belongs above line 1702 IM
Alexei Svitkine (slow)
2013/12/06 17:59:22
Done.
|
| + scheduler_->UploadFinished(server_is_healthy, |
| + log_manager_.has_unsent_logs()); |
| + } |
| // Collect network stats if UMA upload succeeded. |
| IOThread* io_thread = g_browser_process->io_thread(); |
| @@ -1700,8 +1774,8 @@ |
| } else if (status == base::TERMINATION_STATUS_STILL_RUNNING) { |
| UMA_HISTOGRAM_PERCENTAGE("BrowserRenderProcessHost.DisconnectedAlive", |
| was_extension_process ? 2 : 1); |
| - } |
| } |
| +} |
| void MetricsService::LogRendererHang() { |
| IncrementPrefValue(prefs::kStabilityRendererHangCount); |