Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(1391)

Unified Diff: chrome/browser/metrics/metrics_service.cc

Issue 81603002: Send UMA stability stats in a separate UMA log on startup. (Closed) Base URL: svn://svn.chromium.org/chrome/trunk/src/
Patch Set: Created 7 years ago
Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.
Jump to:
View side-by-side diff with in-line comments
Download patch
Index: chrome/browser/metrics/metrics_service.cc
===================================================================
--- chrome/browser/metrics/metrics_service.cc (revision 238706)
+++ chrome/browser/metrics/metrics_service.cc (working copy)
@@ -22,14 +22,14 @@
// before being persisted.
//
// Logs fall into one of two categories: "initial logs," and "ongoing logs."
-// There is at most one initial log sent for each complete run of the chromium
-// product (from startup, to browser shutdown). An initial log is generally
-// transmitted some short time (1 minute?) after startup, and includes stats
-// such as recent crash info, the number and types of plugins, etc. The
-// external server's response to the initial log conceptually tells this MS if
-// it should continue transmitting logs (during this session). The server
-// response can actually be much more detailed, and always includes (at a
-// minimum) how often additional ongoing logs should be sent.
+// There is at most one initial log sent for each complete run of Chrome (from
+// startup, to browser shutdown). An initial log is generally transmitted some
+// short time (1 minute?) after startup, and includes stats such as recent crash
+// info, the number and types of plugins, etc. The external server's response
+// to the initial log conceptually tells this MS if it should continue
+// transmitting logs (during this session). The server response can actually be
+// much more detailed, and always includes (at a minimum) how often additional
+// ongoing logs should be sent.
//
// After the above initial log, a series of ongoing logs will be transmitted.
// The first ongoing log actually begins to accumulate information stating when
@@ -45,13 +45,13 @@
// memory statistics are deposited into a histogram, and the log finalization
// code is then called. In the finalization, a call to a Histogram server
// acquires a list of all local histograms that have been flagged for upload
-// to the UMA server. The finalization also acquires a the most recent number
+// to the UMA server. The finalization also acquires the most recent number
// of page loads, along with any counts of renderer or plugin crashes.
//
// When the browser shuts down, there will typically be a fragment of an ongoing
-// log that has not yet been transmitted. At shutdown time, that fragment
-// is closed (including snapshotting histograms), and persisted, for
-// potential transmission during a future run of the product.
+// log that has not yet been transmitted. At shutdown time, that fragment is
+// closed (including snapshotting histograms), and persisted, for potential
+// transmission during a future run of the product.
//
// There are two slightly abnormal shutdown conditions. There is a
// "disconnected scenario," and a "really fast startup and shutdown" scenario.
@@ -73,15 +73,16 @@
// in a future run's initial log. (i.e., we don't lose crash stats).
//
// With the above overview, we can now describe the state machine's various
-// stats, based on the State enum specified in the state_ member. Those states
+// states, based on the State enum specified in the state_ member. Those states
// are:
//
-// INITIALIZED, // Constructor was called.
-// INIT_TASK_SCHEDULED, // Waiting for deferred init tasks to complete.
-// INIT_TASK_DONE, // Waiting for timer to send initial log.
-// INITIAL_LOG_READY, // Initial log generated, and waiting for reply.
-// SENDING_OLD_LOGS, // Sending unsent logs from previous session.
-// SENDING_CURRENT_LOGS, // Sending standard current logs as they accrue.
+// INITIALIZED, // Constructor was called.
+// INIT_TASK_SCHEDULED, // Waiting for deferred init tasks to finish.
Ilya Sherman 2013/12/06 06:21:26 nit: Note that you still have "tasks" here, but yo
Alexei Svitkine (slow) 2013/12/06 17:59:22 Done.
+// INIT_TASK_DONE, // Waiting for timer to send initial log.
+// SENDING_INITIAL_STABILITY_LOG, // Initial stability log being sent.
+// SENDING_INITIAL_METRICS_LOG, // Initial metrics log being sent.
+// SENDING_OLD_LOGS, // Sending unsent logs from previous session.
+// SENDING_CURRENT_LOGS, // Sending ongoing logs as they acrue.
//
// In more detail, we have:
//
@@ -89,7 +90,7 @@
// The MS has been constructed, but has taken no actions to compose the
// initial log.
//
-// INIT_TASK_SCHEDULED, // Waiting for deferred init tasks to complete.
+// INIT_TASK_SCHEDULED, // Waiting for deferred init task to complete.
// Typically about 30 seconds after startup, a task is sent to a second thread
// (the file thread) to perform deferred (lower priority and slower)
// initialization steps such as getting the list of plugins. That task will
@@ -101,11 +102,21 @@
// created. This callback typically arrives back less than one second after
// the deferred init task is dispatched.
//
-// INITIAL_LOG_READY, // Initial log generated, and waiting for reply.
-// This state is entered only after an initial log has been composed, and
-// prepared for transmission. It is also the case that any previously unsent
-// logs have been loaded into instance variables for possible transmission.
+// SENDING_INITIAL_STABILITY_LOG, // Initial stability log being sent.
+// During initialization, if a crash occurred during the previous session, an
+// initial stability log will be generated and registered with the log manager.
+// This state will be entered if there's such a stability log waiting to be
Ilya Sherman 2013/12/06 06:21:26 nit: What does it mean for there to be a stability
Alexei Svitkine (slow) 2013/12/06 17:59:22 Done.
+// transmitted when its time to sent up the first log (per the reporting
Ilya Sherman 2013/12/06 06:21:26 nit: "its" -> "it's"
Ilya Sherman 2013/12/06 06:21:26 nit: "sent" -> "send"
Alexei Svitkine (slow) 2013/12/06 17:59:22 Done.
Alexei Svitkine (slow) 2013/12/06 17:59:22 Done.
+// scheduler). If there is no initial stability log (e.g. there was no previous
+// crash), then this state will be skipped and the state will advance to
+// SENDING_INITIAL_METRICS_LOG.
//
+// SENDING_INITIAL_METRICS_LOG, // Initial metrics log being sent.
+// This state is entered once the initial metrics log has been composed, and
+// prepared for transmission and after sending the initial stability log, if
+// any. It is also the case that any previously unsent logs have been loaded
Ilya Sherman 2013/12/06 06:21:26 nit: This first sentence is a bit hard to parse; c
Alexei Svitkine (slow) 2013/12/06 17:59:22 Done.
+// into instance variables for possible transmission.
+//
// SENDING_OLD_LOGS, // Sending unsent logs from previous session.
// This state indicates that the initial log for this session has been
// successfully sent and it is now time to send any logs that were
@@ -337,6 +348,11 @@
pref->CommitPendingWrite();
}
+// Returns whether initial stability metrics should be sent in a separate log.
+bool SendSeparateInitialStabilityLog() {
+ return base::FieldTrialList::FindFullName("UMAStability") == "SeparateLog";
+}
+
} // namespace
@@ -503,6 +519,7 @@
state_(INITIALIZED),
low_entropy_source_(kLowEntropySourceNotSet),
idle_since_last_transmission_(false),
+ session_id_(-1),
next_window_id_(0),
self_ptr_factory_(this),
state_saver_factory_(this),
@@ -510,14 +527,7 @@
num_async_histogram_fetches_in_progress_(0),
entropy_source_returned_(LAST_ENTROPY_NONE) {
DCHECK(IsSingleThreaded());
- InitializeMetricsState();
- base::Closure callback = base::Bind(&MetricsService::StartScheduledUpload,
- self_ptr_factory_.GetWeakPtr());
- scheduler_.reset(new MetricsReportingScheduler(callback));
- log_manager_.set_log_serializer(new MetricsLogSerializer());
- log_manager_.set_max_ongoing_log_store_size(kUploadLogAvoidRetransmitSize);
-
BrowserChildProcessObserver::Add(this);
}
@@ -527,6 +537,17 @@
BrowserChildProcessObserver::Remove(this);
}
+void MetricsService::InitializeMetricsRecordingState(
+ bool reporting_will_be_enabled) {
+ InitializeMetricsState(reporting_will_be_enabled);
+
+ base::Closure callback = base::Bind(&MetricsService::StartScheduledUpload,
+ self_ptr_factory_.GetWeakPtr());
+ scheduler_.reset(new MetricsReportingScheduler(callback));
+ log_manager_.set_log_serializer(new MetricsLogSerializer());
+ log_manager_.set_max_ongoing_log_store_size(kUploadLogAvoidRetransmitSize);
+}
+
void MetricsService::Start() {
HandleIdleSinceLastTransmission(false);
EnableRecording();
@@ -788,7 +809,7 @@
// killed, so this has to be treated similar to a shutdown, closing and
// persisting all logs. Unlinke a shutdown, the state is primed to be ready
// to continue logging and uploading if the process does return.
- if (recording_active() && state_ >= INITIAL_LOG_READY) {
+ if (recording_active() && state_ >= SENDING_INITIAL_STABILITY_LOG) {
PushPendingLogsToPersistentStorage();
// Persisting logs closes the current log, so start recording a new log
// immediately to capture any background work that might be done before the
@@ -893,7 +914,7 @@
//------------------------------------------------------------------------------
// Initialization methods
-void MetricsService::InitializeMetricsState() {
+void MetricsService::InitializeMetricsState(bool reporting_will_be_enabled) {
#if defined(OS_POSIX)
network_stats_server_ = chrome_common_net::kEchoTestServerLocation;
http_pipelining_test_server_ = chrome_common_net::kPipelineTestServerBaseUrl;
@@ -906,6 +927,8 @@
PrefService* pref = g_browser_process->local_state();
DCHECK(pref);
+ // TODO(asvitkine): Kill this logic when SendSeparateInitialStabilityLog() is
+ // is made the default behavior.
if ((pref->GetInt64(prefs::kStabilityStatsBuildTime)
!= MetricsLog::GetBuildTime()) ||
(pref->GetString(prefs::kStabilityStatsVersion)
@@ -919,14 +942,11 @@
MetricsLog::GetBuildTime());
}
- // Update session ID
+ // Initialize uptime counters.
+ const base::TimeDelta startup_uptime = GetIncrementalUptime(pref);
+ DCHECK_EQ(0, startup_uptime.InMicroseconds());
Ilya Sherman 2013/12/06 06:21:26 Hmm, why does this need to happen earlier now?
Alexei Svitkine (slow) 2013/12/06 17:59:22 In a previous version of this I used startup_uptim
session_id_ = pref->GetInteger(prefs::kMetricsSessionID);
- ++session_id_;
- pref->SetInteger(prefs::kMetricsSessionID, session_id_);
- // Stability bookkeeping
- IncrementPrefValue(prefs::kStabilityLaunchCount);
-
if (!pref->GetBoolean(prefs::kStabilityExitedCleanly)) {
IncrementPrefValue(prefs::kStabilityCrashCount);
// Reset flag, and wait until we call LogNeedForCleanShutdown() before
@@ -938,7 +958,20 @@
int execution_phase = pref->GetInteger(prefs::kStabilityExecutionPhase);
UMA_HISTOGRAM_SPARSE_SLOWLY("Chrome.Browser.CrashedExecutionPhase",
execution_phase);
+
+ // If the previous session didn't exit cleanly, then prepare an initial
+ // stability log if UMA is enabled.
+ if (reporting_will_be_enabled && SendSeparateInitialStabilityLog())
+ PrepareInitialStabilityLog();
}
+
+ // Update session ID.
+ ++session_id_;
+ pref->SetInteger(prefs::kMetricsSessionID, session_id_);
+
+ // Stability bookkeeping
+ IncrementPrefValue(prefs::kStabilityLaunchCount);
+
DCHECK_EQ(UNINITIALIZED_PHASE, execution_phase_);
SetExecutionPhase(START_METRICS_RECORDING);
@@ -952,9 +985,6 @@
pref->SetBoolean(prefs::kStabilitySessionEndCompleted, true);
}
- // Initialize uptime counters.
- const base::TimeDelta startup_uptime = GetIncrementalUptime(pref);
- DCHECK_EQ(0, startup_uptime.InMicroseconds());
// For backwards compatibility, leave this intact in case Omaha is checking
// them. prefs::kStabilityLastTimestampSec may also be useless now.
// TODO(jar): Delete these if they have no uses.
@@ -1087,10 +1117,10 @@
// Upon the first callback, create the initial log so that we can immediately
// save the profiler data.
- if (!initial_log_.get())
- initial_log_.reset(new MetricsLog(client_id_, session_id_));
+ if (!initial_metrics_log_.get())
+ initial_metrics_log_.reset(new MetricsLog(client_id_, session_id_));
- initial_log_->RecordProfilerData(process_data, process_type);
+ initial_metrics_log_->RecordProfilerData(process_data, process_type);
}
void MetricsService::FinishedReceivingProfilerData() {
@@ -1247,7 +1277,7 @@
}
void MetricsService::PushPendingLogsToPersistentStorage() {
- if (state_ < INITIAL_LOG_READY)
+ if (state_ < SENDING_INITIAL_STABILITY_LOG)
return; // We didn't and still don't have time to get plugin list etc.
if (log_manager_.has_staged_log()) {
@@ -1280,8 +1310,10 @@
// Even if reporting is disabled, the scheduler is needed to trigger the
// creation of the initial log, which must be done in order for any logs to be
// persisted on shutdown or backgrounding.
- if (recording_active() && (reporting_active() || state_ < INITIAL_LOG_READY))
+ if (recording_active() &&
+ (reporting_active() || state_ < SENDING_INITIAL_STABILITY_LOG)) {
scheduler_->Start();
+ }
}
void MetricsService::StartScheduledUpload() {
@@ -1295,7 +1327,7 @@
// recording are turned off instead of letting it fire and then aborting.
if (idle_since_last_transmission_ ||
!recording_active() ||
- (!reporting_active() && state_ >= INITIAL_LOG_READY)) {
+ (!reporting_active() && state_ >= SENDING_INITIAL_STABILITY_LOG)) {
scheduler_->Stop();
scheduler_->UploadCancelled();
return;
@@ -1434,7 +1466,7 @@
}
void MetricsService::StageNewLog() {
- if (log_manager_.has_staged_log())
+ if (state_ != INIT_TASK_DONE && log_manager_.has_staged_log())
return;
switch (state_) {
@@ -1444,10 +1476,19 @@
return;
case INIT_TASK_DONE:
- PrepareInitialLog();
- DCHECK_EQ(INIT_TASK_DONE, state_);
- log_manager_.LoadPersistedUnsentLogs();
- state_ = INITIAL_LOG_READY;
+ if (log_manager_.has_staged_log()) {
+ // There's an initial stability log, ready to send.
+ state_ = SENDING_INITIAL_STABILITY_LOG;
+ } else {
+ // TODO(asvitkine): When the field trial is removed, the |log_type|
+ // arg should be removed and PrepareInitialMetricsLog() should always
+ // use ONGOING_LOG. Use INITIAL_LOG only to match to the old behavior
+ // when the field trial is off.
+ MetricsLog::LogType log_type = SendSeparateInitialStabilityLog() ?
+ MetricsLog::ONGOING_LOG : MetricsLog::INITIAL_LOG;
+ PrepareInitialMetricsLog(log_type);
+ state_ = SENDING_INITIAL_METRICS_LOG;
+ }
break;
case SENDING_OLD_LOGS:
@@ -1468,35 +1509,56 @@
DCHECK(log_manager_.has_staged_log());
}
-void MetricsService::PrepareInitialLog() {
- DCHECK_EQ(INIT_TASK_DONE, state_);
Ilya Sherman 2013/12/06 06:21:26 nit: Please preserve some flavor of this DCHECK.
Alexei Svitkine (slow) 2013/12/06 17:59:22 Done (in PrepareInitialMetricsLog()).
+void MetricsService::PrepareInitialStabilityLog() {
+ DCHECK_EQ(INITIALIZED, state_);
- DCHECK(initial_log_.get());
- initial_log_->set_hardware_class(hardware_class_);
+ scoped_ptr<MetricsLog> initial_stability_log(
+ new MetricsLog(client_id_, session_id_));
+ if (!initial_stability_log->LoadSavedEnvironmentFromPrefs())
+ return;
+ initial_stability_log->RecordStabilityMetrics(base::TimeDelta(),
+ MetricsLog::INITIAL_LOG);
+ log_manager_.PauseCurrentLog();
+ log_manager_.BeginLoggingWithLog(initial_stability_log.release(),
+ MetricsLog::INITIAL_LOG);
+ log_manager_.FinishCurrentLog();
+ log_manager_.ResumePausedLog();
+ // TODO(asvitkine): It would be nice to defer StageNextLogForUpload() until
+ // the log is ready to be actually sent. This way, the initial stability log
+ // can be kept around in unsent logs (and saved to prefs) in case there's a
+ // crash. However, this would require loading existing unsent logs earlier -
+ // which is currently done in PrepareInitialMetricsLog().
Ilya Sherman 2013/12/06 06:21:26 I think it's important to get this right, i.e. to
Alexei Svitkine (slow) 2013/12/06 17:59:22 Okay - that's why I put the comment here to raise
Ilya Sherman 2013/12/07 06:51:53 Hmm. Could you possibly measure the actual perfor
Alexei Svitkine (slow) 2013/12/09 21:17:02 Agreed. I've added histograms to measure how long
Ilya Sherman 2013/12/10 06:02:10 Yeah, that sounds reasonable. Thanks! :)
+ log_manager_.StageNextLogForUpload();
+}
+
+void MetricsService::PrepareInitialMetricsLog(MetricsLog::LogType log_type) {
+ DCHECK(initial_metrics_log_.get());
Ilya Sherman 2013/12/06 06:21:26 nit: No need for .get()
Alexei Svitkine (slow) 2013/12/06 17:59:22 I actually just deleted this line now. If it's nul
+ initial_metrics_log_->set_hardware_class(hardware_class_);
+
std::vector<chrome_variations::ActiveGroupId> synthetic_trials;
GetCurrentSyntheticFieldTrials(&synthetic_trials);
- initial_log_->RecordEnvironment(plugins_, google_update_metrics_,
- synthetic_trials);
+ initial_metrics_log_->RecordEnvironment(plugins_, google_update_metrics_,
+ synthetic_trials);
PrefService* pref = g_browser_process->local_state();
- initial_log_->RecordStabilityMetrics(GetIncrementalUptime(pref),
- MetricsLog::INITIAL_LOG);
+ initial_metrics_log_->RecordStabilityMetrics(GetIncrementalUptime(pref),
+ log_type);
// Histograms only get written to the current log, so make the new log current
// before writing them.
log_manager_.PauseCurrentLog();
- log_manager_.BeginLoggingWithLog(initial_log_.release(),
- MetricsLog::INITIAL_LOG);
+ log_manager_.BeginLoggingWithLog(initial_metrics_log_.release(), log_type);
RecordCurrentHistograms();
log_manager_.FinishCurrentLog();
log_manager_.ResumePausedLog();
DCHECK(!log_manager_.has_staged_log());
log_manager_.StageNextLogForUpload();
+ log_manager_.LoadPersistedUnsentLogs();
Ilya Sherman 2013/12/06 06:21:26 nit: It seems more appropriate to leave this where
Alexei Svitkine (slow) 2013/12/06 17:59:22 See my comment above about moving this call earlie
}
void MetricsService::StoreUnsentLogs() {
- if (state_ < INITIAL_LOG_READY)
+ if (state_ < SENDING_INITIAL_STABILITY_LOG)
return; // We never Recalled the prior unsent logs.
log_manager_.PersistUnsentLogs();
@@ -1607,7 +1669,13 @@
if (!log_manager_.has_staged_log()) {
switch (state_) {
- case INITIAL_LOG_READY:
+ case SENDING_INITIAL_STABILITY_LOG:
+ PrepareInitialMetricsLog(MetricsLog::ONGOING_LOG);
+ SendStagedLog();
+ state_ = SENDING_INITIAL_METRICS_LOG;
+ break;
+
+ case SENDING_INITIAL_METRICS_LOG:
state_ = log_manager_.has_unsent_logs() ? SENDING_OLD_LOGS
: SENDING_CURRENT_LOGS;
break;
@@ -1631,10 +1699,16 @@
DCHECK_LT(state_, SENDING_CURRENT_LOGS);
}
- // Error 400 indicates a problem with the log, not with the server, so
- // don't consider that a sign that the server is in trouble.
bool server_is_healthy = upload_succeeded || response_code == 400;
- scheduler_->UploadFinished(server_is_healthy, log_manager_.has_unsent_logs());
+ // Don't notify the scheduler that the upload is finished if we've only sent
+ // the initial stability log, but not yet the initial metrics log (treat the
+ // two as a single unit of work as far as the scheduler is concerned).
+ if (state_ != SENDING_INITIAL_METRICS_LOG) {
+ // Error 400 indicates a problem with the log, not with the server, so
+ // don't consider that a sign that the server is in trouble.
Ilya Sherman 2013/12/06 06:21:26 nit: This comment still belongs above line 1702 IM
Alexei Svitkine (slow) 2013/12/06 17:59:22 Done.
+ scheduler_->UploadFinished(server_is_healthy,
+ log_manager_.has_unsent_logs());
+ }
// Collect network stats if UMA upload succeeded.
IOThread* io_thread = g_browser_process->io_thread();
@@ -1700,8 +1774,8 @@
} else if (status == base::TERMINATION_STATUS_STILL_RUNNING) {
UMA_HISTOGRAM_PERCENTAGE("BrowserRenderProcessHost.DisconnectedAlive",
was_extension_process ? 2 : 1);
- }
}
+}
void MetricsService::LogRendererHang() {
IncrementPrefValue(prefs::kStabilityRendererHangCount);
« chrome/browser/metrics/metrics_service.h ('K') | « chrome/browser/metrics/metrics_service.h ('k') | no next file » | no next file with comments »

Powered by Google App Engine
This is Rietveld 408576698