Chromium Code Reviews

Side by Side Diff: chrome/browser/metrics/thread_watcher.h

Issue 12183008: ThreadWatcher - Added jankiness monitor for FILE thread. (Closed) Base URL: svn://svn.chromium.org/chrome/trunk/src/
Patch Set: Created 7 years, 10 months ago
OLD | NEW
1 // Copyright (c) 2012 The Chromium Authors. All rights reserved. 1 // Copyright (c) 2012 The Chromium Authors. All rights reserved.
2 // Use of this source code is governed by a BSD-style license that can be 2 // Use of this source code is governed by a BSD-style license that can be
3 // found in the LICENSE file. 3 // found in the LICENSE file.
4 4
5 // This file defines a WatchDog thread that monitors the responsiveness of other 5 // This file defines a WatchDog thread that monitors the responsiveness of other
6 // browser threads like UI, IO, DB, FILE and CACHED threads. It also defines 6 // browser threads like UI, IO, DB, FILE and CACHED threads. It also defines
7 // ThreadWatcher class which performs health check on threads that would like to 7 // ThreadWatcher class which performs health check on threads that would like to
8 // be watched. This file also defines ThreadWatcherList class that has list of 8 // be watched. This file also defines ThreadWatcherList class that has list of
9 // all active ThreadWatcher objects. 9 // all active ThreadWatcher objects.
10 // 10 //
(...skipping 290 matching lines...)
301 301
302 // Class with a list of all active thread watchers. A thread watcher is active 302 // Class with a list of all active thread watchers. A thread watcher is active
303 // if it has been registered, which includes determining the histogram name. This 303 // if it has been registered, which includes determining the histogram name. This
304 // class provides utility functions to start and stop watching all browser 304 // class provides utility functions to start and stop watching all browser
305 // threads. Only one instance of this class exists. 305 // threads. Only one instance of this class exists.
306 class ThreadWatcherList { 306 class ThreadWatcherList {
307 public: 307 public:
308 // A map from BrowserThread to the actual instances. 308 // A map from BrowserThread to the actual instances.
309 typedef std::map<content::BrowserThread::ID, ThreadWatcher*> RegistrationList; 309 typedef std::map<content::BrowserThread::ID, ThreadWatcher*> RegistrationList;
310 310
311 // A map from thread names (UI, IO, etc) to |live_threads_threshold|. 311 // A map from thread names (UI, IO, etc) to |CrashDataThresholds|.
312 // |live_threads_threshold| specifies the maximum number of browser threads 312 // |live_threads_threshold| specifies the maximum number of browser threads
313 // that have to be responsive when we want to crash the browser because of 313 // that have to be responsive when we want to crash the browser because of
314 // hung watched thread. 314 // hung watched thread. This threshold allows us to either look for a system
315 // deadlock, or look for a solo hung thread. A small live_threads_threshold
316 // looks for a broad deadlock (few browser threads left running), and a large
317 // threshold looks for a single hung thread (this is only appropriate for a
318 // thread that *should* never have much jank, such as the IO).
319 //
320 // |unresponsive_threshold| specifies the number of unanswered ping messages
321 // after which watched (UI, IO, etc) thread is considered as not responsive.
322 // We translate "time" (given in seconds) into a number of pings. As a result,
323 // we only declare a thread unresponsive when a lot of "time" has passed (many
324 // pings), and yet our pinging thread has continued to process messages (so we
325 // know the entire PC is not hung). Set this number higher to crash less
326 // often, and lower to crash more often.
315 // 327 //
316 // The map lists all threads (by name) that can induce a crash by hanging. It 328 // The map lists all threads (by name) that can induce a crash by hanging. It
317 // is populated from the command line, or given a default list. See 329 // is populated from the command line, or given a default list. See
318 // InitializeAndStartWatching() for the separate list of all threads that are 330 // InitializeAndStartWatching() for the separate list of all threads that are
319 // watched, as they provide the system context of how hung *other* threads 331 // watched, as they provide the system context of how hung *other* threads
320 // are. 332 // are.
321 // 333 //
322 // Example 1: If the value for "IO" was 3, then we would crash if at least one 334 // CrashOnHangThreadMap is populated by ParseCommandLineCrashOnHangThreads().
323 // thread is responding and total responding threads is less than or equal to 335 // It parses command line argument like "UI:3:18,IO:3:18,FILE:5:90". In this
324 // 3 (this thread, plus at least one other thread is unresponsive). We would 336 // string, the first parameter specifies the thread_id: UI, IO or FILE. The
325 // not crash if none of the threads are not responding, as we'd assume such 337 // second parameter specifies |live_threads_threshold|. For UI and IO threads,
326 // large hang counts mean that the system is generally unresponsive. 338 // we would crash if the number of threads responding is less than or equal to
327 // Example 2: If the value for "UI" was INT_MAX, then we would always crash if 339 // 3. The third parameter specifies the unresponsive threshold seconds. This
328 // the UI thread was hung, no matter what the other threads are doing. 340 // number is used to calculate |unresponsive_threshold|. In this example for
329 // Example 3: If the value of "FILE" was 5, then we would only crash if the 341 // UI and IO threads, we would crash if it doesn't respond for 18 seconds (or
jar (doing other things) 2013/02/12 01:27:27 nit: it doesn't --> those threads don't
ramant (doing other things) 2013/02/12 01:54:30 Done.
330 // FILE thread was the ONLY hung thread (because we watch 6 threads). IF there 342 // 9 unanswered ping messages) and for FILE thread, crash_seconds is set to 90
331 // was another unresponsive thread, we would not consider this a problem worth 343 // seconds (or 45 unanswered ping messages).
332 // crashing for. 344 //
333 typedef std::map<std::string, uint32> CrashOnHangThreadMap; 345 // ThreadWatcher watches six (UI, IO, DB, FILE, FILE_USER_BLOCKING and CACHE)
jar (doing other things) 2013/02/12 01:27:27 This paragraph should move up earlier, before you
ramant (doing other things) 2013/02/12 01:54:30 Done.
346 // browser threads. The following examples explain how the data in
347 // |CrashDataThresholds| controls the crashes.
348 //
349 // Example 1: If the |live_threads_threshold| value for "IO" was 3 and
350 // unresponsive threshold seconds is 18 (or |unresponsive_threshold| is 9),
351 // then we would crash if the IO thread was hung (9 unanswered ping messages)
352 // and if at least one thread is responding and total responding threads is
353 // less than or equal to 3 (this thread, plus at least one other thread is
354 // unresponsive). We would not crash if none of the threads are not
355 // responding, as we'd assume such large hang counts mean that the system is
jar (doing other things) 2013/02/12 01:27:27 nit: should have read: We would not crash if none
ramant (doing other things) 2013/02/12 01:54:30 Done.
356 // generally unresponsive.
357 // Example 2: If the |live_threads_threshold| value for "UI" was INT_MAX and
jar (doing other things) 2013/02/12 01:27:27 Instead of INT_MAX, "any number higher than 6"
ramant (doing other things) 2013/02/12 01:54:30 Done.
358 // unresponsive threshold seconds is 18 (or |unresponsive_threshold| is 9),
359 // then we would always crash if the UI thread was hung (9 unanswered ping
360 // messages), no matter what the other threads are doing.
361 // Example 3: If the |live_threads_threshold| value of "FILE" was 5 and
362 // unresponsive threshold seconds is 90 (or |unresponsive_threshold| is 45),
363 // then we would only crash if the FILE thread was the ONLY hung thread
364 // (because we watch 6 threads). If there was another unresponsive thread, we
365 // would not consider this a problem worth crashing for. FILE thread would be
366 // considered as hung if it didn't respond for 45 ping messages.
367 struct CrashDataThresholds {
368 CrashDataThresholds(uint32 live_threads_threshold,
369 uint32 unresponsive_threshold);
370 CrashDataThresholds();
371
372 uint32 live_threads_threshold;
373 uint32 unresponsive_threshold;
374 };
375 typedef std::map<std::string, CrashDataThresholds> CrashOnHangThreadMap;
334 376
335 // This method posts a task on WatchDogThread to start watching all browser 377 // This method posts a task on WatchDogThread to start watching all browser
336 // threads. 378 // threads.
337 // This method is accessible on UI thread. 379 // This method is accessible on UI thread.
338 static void StartWatchingAll(const CommandLine& command_line); 380 static void StartWatchingAll(const CommandLine& command_line);
339 381
340 // This method posts a task on WatchDogThread to RevokeAll tasks and to 382 // This method posts a task on WatchDogThread to RevokeAll tasks and to
341 // deactivate thread watching of other threads and tell NotificationService to 383 // deactivate thread watching of other threads and tell NotificationService to
342 // stop calling Observe. 384 // stop calling Observe.
343 // This method is accessible on UI thread. 385 // This method is accessible on UI thread.
(...skipping 10 matching lines...)
354 uint32* unresponding_thread_count); 396 uint32* unresponding_thread_count);
355 397
356 // This will ensure that the watching is actively taking place, and awaken 398 // This will ensure that the watching is actively taking place, and awaken
357 // all thread watchers that are registered. 399 // all thread watchers that are registered.
358 static void WakeUpAll(); 400 static void WakeUpAll();
359 401
360 private: 402 private:
361 // Allow tests to access our innards for testing purposes. 403 // Allow tests to access our innards for testing purposes.
362 friend class CustomThreadWatcher; 404 friend class CustomThreadWatcher;
363 friend class ThreadWatcherTest; 405 friend class ThreadWatcherTest;
364 FRIEND_TEST_ALL_PREFIXES(ThreadWatcherTest, CommandLineArgs); 406 FRIEND_TEST_ALL_PREFIXES(ThreadWatcherTest, ThreadNamesOnlyArgs);
407 FRIEND_TEST_ALL_PREFIXES(ThreadWatcherTest, ThreadNamesAndLiveThresholdArgs);
408 FRIEND_TEST_ALL_PREFIXES(ThreadWatcherTest, CrashOnHangThreadsAllArgs);
365 409
366 // This singleton holds the global list of registered ThreadWatchers. 410 // This singleton holds the global list of registered ThreadWatchers.
367 ThreadWatcherList(); 411 ThreadWatcherList();
368 412
369 // Destructor deletes all registered ThreadWatcher instances. 413 // Destructor deletes all registered ThreadWatcher instances.
370 virtual ~ThreadWatcherList(); 414 virtual ~ThreadWatcherList();
371 415
372 // Parses the command line to get |unresponsive_threshold| from 416 // Parses the command line to get |crash_on_hang_threads| map from
373 // switches::kCrashOnHangSeconds, |crash_on_hang| thread names from 417 // switches::kCrashOnHangThreads. |crash_on_hang_threads| is a map of
374 // switches::kCrashOnHangThreads and |live_threads_threshold| from 418 // |crash_on_hang| thread's names to |CrashDataThresholds|.
375 // switches::kCrashOnLive. |crash_on_hang_threads| is a map of
376 // |crash_on_hang| thread's names to |live_threads_threshold|.
377 static void ParseCommandLine( 419 static void ParseCommandLine(
378 const CommandLine& command_line, 420 const CommandLine& command_line,
379 uint32* unresponsive_threshold, 421 uint32* unresponsive_threshold,
380 CrashOnHangThreadMap* crash_on_hang_threads); 422 CrashOnHangThreadMap* crash_on_hang_threads);
381 423
424 // Parses the argument |crash_on_hang_thread_names| and creates
425 // |crash_on_hang_threads| map of |crash_on_hang| thread's names to
426 // |CrashDataThresholds|. If |crash_on_hang_thread_names| doesn't specify
427 // |live_threads_threshold|, then it uses |default_live_threads_threshold| as
428 // the value. If |crash_on_hang_thread_names| doesn't specify |crash_seconds|,
429 // then it uses |default_crash_seconds| as the value.
430 static void ParseCommandLineCrashOnHangThreads(
431 const std::string& crash_on_hang_thread_names,
432 uint32 default_live_threads_threshold,
433 uint32 default_crash_seconds,
434 CrashOnHangThreadMap* crash_on_hang_threads);
435
382 // This constructs the |ThreadWatcherList| singleton and starts watching 436 // This constructs the |ThreadWatcherList| singleton and starts watching
383 // browser threads by calling StartWatching() on each browser thread that is 437 // browser threads by calling StartWatching() on each browser thread that is
384 // watched. It disarms StartupTimeBomb. 438 // watched. It disarms StartupTimeBomb.
385 static void InitializeAndStartWatching( 439 static void InitializeAndStartWatching(
386 uint32 unresponsive_threshold, 440 uint32 unresponsive_threshold,
387 const CrashOnHangThreadMap& crash_on_hang_threads); 441 const CrashOnHangThreadMap& crash_on_hang_threads);
388 442
389 // This method calls ThreadWatcher::StartWatching() to perform health check on 443 // This method calls ThreadWatcher::StartWatching() to perform health check on
390 // the given |thread_id|. 444 // the given |thread_id|.
391 static void StartWatching( 445 static void StartWatching(
(...skipping 174 matching lines...)
566 // shutdown_watchdog_ watches for hangs during shutdown. 620 // shutdown_watchdog_ watches for hangs during shutdown.
567 base::Watchdog* shutdown_watchdog_; 621 base::Watchdog* shutdown_watchdog_;
568 622
569 // The |thread_id_| on which this object is constructed. 623 // The |thread_id_| on which this object is constructed.
570 const base::PlatformThreadId thread_id_; 624 const base::PlatformThreadId thread_id_;
571 625
572 DISALLOW_COPY_AND_ASSIGN(ShutdownWatcherHelper); 626 DISALLOW_COPY_AND_ASSIGN(ShutdownWatcherHelper);
573 }; 627 };
574 628
575 #endif // CHROME_BROWSER_METRICS_THREAD_WATCHER_H_ 629 #endif // CHROME_BROWSER_METRICS_THREAD_WATCHER_H_
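
The command-line format and crash rule described in the ThreadWatcherList header comment can be sketched in standalone C++. This is a minimal illustration under stated assumptions, not the code from thread_watcher.cc: Split, ParseCrashOnHangThreads, ShouldCrash, and kPingIntervalSeconds are hypothetical names, the 2-second ping interval is inferred from the comment's "18 seconds (or 9 unanswered ping messages)", and the crash rule follows the wording of Example 1 only.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Mirrors the struct added in this patch set.
struct CrashDataThresholds {
  uint32_t live_threads_threshold = 0;
  uint32_t unresponsive_threshold = 0;  // Counted in unanswered pings.
};
typedef std::map<std::string, CrashDataThresholds> CrashOnHangThreadMap;

// Assumption: one ping every 2 seconds (18 seconds <-> 9 pings in the comment).
const uint32_t kPingIntervalSeconds = 2;

// Hypothetical helper: splits |input| on |delimiter|.
std::vector<std::string> Split(const std::string& input, char delimiter) {
  std::vector<std::string> parts;
  std::istringstream stream(input);
  std::string part;
  while (std::getline(stream, part, delimiter))
    parts.push_back(part);
  return parts;
}

// Hypothetical stand-in for ParseCommandLineCrashOnHangThreads(). Parses
// entries of the form "NAME[:live_threads_threshold[:crash_seconds]]" and
// falls back to the defaults for any field that is missing or empty.
// Non-numeric fields make std::stoul throw; a real parser would handle that.
void ParseCrashOnHangThreads(const std::string& spec,
                             uint32_t default_live_threads_threshold,
                             uint32_t default_crash_seconds,
                             CrashOnHangThreadMap* crash_on_hang_threads) {
  assert(crash_on_hang_threads);
  for (const std::string& entry : Split(spec, ',')) {
    std::vector<std::string> fields = Split(entry, ':');
    if (fields.empty() || fields[0].empty())
      continue;  // Skip empty entries such as the middle of "UI,,IO".
    uint32_t live_threads = default_live_threads_threshold;
    uint32_t crash_seconds = default_crash_seconds;
    if (fields.size() > 1 && !fields[1].empty())
      live_threads = static_cast<uint32_t>(std::stoul(fields[1]));
    if (fields.size() > 2 && !fields[2].empty())
      crash_seconds = static_cast<uint32_t>(std::stoul(fields[2]));
    CrashDataThresholds thresholds;
    thresholds.live_threads_threshold = live_threads;
    thresholds.unresponsive_threshold = crash_seconds / kPingIntervalSeconds;
    (*crash_on_hang_threads)[fields[0]] = thresholds;
  }
}

// Crash rule as worded in Example 1: the watched thread has missed enough
// pings, at least one thread still responds, and the number of responding
// threads does not exceed |live_threads_threshold|.
bool ShouldCrash(const CrashDataThresholds& thresholds,
                 uint32_t unanswered_pings,
                 uint32_t responding_thread_count) {
  return unanswered_pings >= thresholds.unresponsive_threshold &&
         responding_thread_count > 0 &&
         responding_thread_count <= thresholds.live_threads_threshold;
}
```

Under these assumptions, parsing "UI:3:18,IO:3:18,FILE:5:90" yields an |unresponsive_threshold| of 9 pings for UI and IO and 45 pings for FILE, matching the figures in the header comment. An entry that names only the thread, such as "DB", picks up both defaults.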