Chromium Code Reviews

| OLD | NEW |
|---|---|
| 1 // Copyright (c) 2012 The Chromium Authors. All rights reserved. | 1 // Copyright (c) 2012 The Chromium Authors. All rights reserved. |
| 2 // Use of this source code is governed by a BSD-style license that can be | 2 // Use of this source code is governed by a BSD-style license that can be |
| 3 // found in the LICENSE file. | 3 // found in the LICENSE file. |
| 4 | 4 |
| 5 // This file defines a WatchDog thread that monitors the responsiveness of other | 5 // This file defines a WatchDog thread that monitors the responsiveness of other |
| 6 // browser threads like UI, IO, DB, FILE and CACHED threads. It also defines | 6 // browser threads like UI, IO, DB, FILE and CACHED threads. It also defines |
| 7 // ThreadWatcher class which performs health check on threads that would like to | 7 // ThreadWatcher class which performs health check on threads that would like to |
| 8 // be watched. This file also defines ThreadWatcherList class that has list of | 8 // be watched. This file also defines ThreadWatcherList class that has list of |
| 9 // all active ThreadWatcher objects. | 9 // all active ThreadWatcher objects. |
| 10 // | 10 // |
| (...skipping 290 matching lines...) | |
| 301 | 301 |
| 302 // Class with a list of all active thread watchers. A thread watcher is active | 302 // Class with a list of all active thread watchers. A thread watcher is active |
| 303 // if it has been registered, which includes determining the histogram name. This | 303 // if it has been registered, which includes determining the histogram name. This |
| 304 // class provides utility functions to start and stop watching all browser | 304 // class provides utility functions to start and stop watching all browser |
| 305 // threads. Only one instance of this class exists. | 305 // threads. Only one instance of this class exists. |
| 306 class ThreadWatcherList { | 306 class ThreadWatcherList { |
| 307 public: | 307 public: |
| 308 // A map from BrowserThread to the actual instances. | 308 // A map from BrowserThread to the actual instances. |
| 309 typedef std::map<content::BrowserThread::ID, ThreadWatcher*> RegistrationList; | 309 typedef std::map<content::BrowserThread::ID, ThreadWatcher*> RegistrationList; |
| 310 | 310 |
| 311 // A map from thread names (UI, IO, etc) to |live_threads_threshold|. | 311 // A map from thread names (UI, IO, etc) to |CrashDataThresholds|. |
| 312 // |live_threads_threshold| specifies the maximum number of browser threads | 312 // |live_threads_threshold| specifies the maximum number of browser threads |
| 313 // that have to be responsive when we want to crash the browser because of | 313 // that have to be responsive when we want to crash the browser because of |
| 314 // hung watched thread. | 314 // hung watched thread. This threshold allows us to either look for a system |
| | 315 // deadlock, or look for a solo hung thread. A small live_threads_threshold |
| | 316 // looks for a broad deadlock (few browser threads left running), and a large |
| | 317 // threshold looks for a single hung thread (this is only appropriate for a |
| | 318 // thread that *should* never have much jank, such as the IO). |
| | 319 // |
| | 320 // |unresponsive_threshold| specifies the number of unanswered ping messages |
| | 321 // after which watched (UI, IO, etc) thread is considered as not responsive. |
| | 322 // We translate "time" (given in seconds) into a number of pings. As a result, |
| | 323 // we only declare a thread unresponsive when a lot of "time" has passed (many |
| | 324 // pings), and yet our pinging thread has continued to process messages (so we |
| | 325 // know the entire PC is not hung). Set this number higher to crash less |
| | 326 // often, and lower to crash more often. |
| 315 // | 327 // |
| 316 // The map lists all threads (by name) that can induce a crash by hanging. It | 328 // The map lists all threads (by name) that can induce a crash by hanging. It |
| 317 // is populated from the command line, or given a default list. See | 329 // is populated from the command line, or given a default list. See |
| 318 // InitializeAndStartWatching() for the separate list of all threads that are | 330 // InitializeAndStartWatching() for the separate list of all threads that are |
| 319 // watched, as they provide the system context of how hung *other* threads | 331 // watched, as they provide the system context of how hung *other* threads |
| 320 // are. | 332 // are. |
| 321 // | 333 // |
| 322 // Example 1: If the value for "IO" was 3, then we would crash if at least one | 334 // CrashOnHangThreadMap is populated by ParseCommandLineCrashOnHangThreads(). |
| 323 // thread is responding and total responding threads is less than or equal to | 335 // It parses command line argument like "UI:3:18,IO:3:18,FILE:5:90". In this |
| 324 // 3 (this thread, plus at least one other thread is unresponsive). We would | 336 // string, the first parameter specifies the thread_id: UI, IO or FILE. The |
| 325 // not crash if none of the threads are not responding, as we'd assume such | 337 // second parameter specifies |live_threads_threshold|. For UI and IO threads, |
| 326 // large hang counts mean that the system is generally unresponsive. | 338 // we would crash if the number of threads responding is less than or equal to |
| 327 // Example 2: If the value for "UI" was INT_MAX, then we would always crash if | 339 // 3. The third parameter specifies the unresponsive threshold seconds. This |
| 328 // the UI thread was hung, no matter what the other threads are doing. | 340 // number is used to calculate |unresponsive_threshold|. In this example for |
| 329 // Example 3: If the value of "FILE" was 5, then we would only crash if the | 341 // UI and IO threads, we would crash if it doesn't respond for 18 seconds (or |
|
jar (doing other things)
2013/02/12 01:27:27
nit: it doesn't --> those threads don't
ramant (doing other things)
2013/02/12 01:54:30
Done.
| |
| 330 // FILE thread was the ONLY hung thread (because we watch 6 threads). IF there | 342 // 9 unanswered ping messages) and for FILE thread, crash_seconds is set to 90 |
| 331 // was another unresponsive thread, we would not consider this a problem worth | 343 // seconds (or 45 unanswered ping messages). |
| 332 // crashing for. | 344 // |
| 333 typedef std::map<std::string, uint32> CrashOnHangThreadMap; | 345 // ThreadWatcher watches six (UI, IO, DB, FILE, FILE_USER_BLOCKING and CACHE) |
|
jar (doing other things)
2013/02/12 01:27:27
This paragraph should move up earlier, before you
ramant (doing other things)
2013/02/12 01:54:30
Done.
| |
| | 346 // browser threads. The following examples explain how the data in |
| | 347 // |CrashDataThresholds| controls the crashes. |
| | 348 // |
| | 349 // Example 1: If the |live_threads_threshold| value for "IO" was 3 and |
| | 350 // unresponsive threshold seconds is 18 (or |unresponsive_threshold| is 9), |
| | 351 // then we would crash if the IO thread was hung (9 unanswered ping messages) |
| | 352 // and if at least one thread is responding and total responding threads is |
| | 353 // less than or equal to 3 (this thread, plus at least one other thread is |
| | 354 // unresponsive). We would not crash if none of the threads are not |
| | 355 // responding, as we'd assume such large hang counts mean that the system is |
|
jar (doing other things)
2013/02/12 01:27:27
nit: should have read:
We would not crash if none
ramant (doing other things)
2013/02/12 01:54:30
Done.
| |
| | 356 // generally unresponsive. |
| | 357 // Example 2: If the |live_threads_threshold| value for "UI" was INT_MAX and |
|
jar (doing other things)
2013/02/12 01:27:27
Instead of INT_MAX, "any number higher than 6"
ramant (doing other things)
2013/02/12 01:54:30
Done.
| |
| | 358 // unresponsive threshold seconds is 18 (or |unresponsive_threshold| is 9), |
| | 359 // then we would always crash if the UI thread was hung (9 unanswered ping |
| | 360 // messages), no matter what the other threads are doing. |
| | 361 // Example 3: If the |live_threads_threshold| value of "FILE" was 5 and |
| | 362 // unresponsive threshold seconds is 90 (or |unresponsive_threshold| is 45), |
| | 363 // then we would only crash if the FILE thread was the ONLY hung thread |
| | 364 // (because we watch 6 threads). If there was another unresponsive thread, we |
| | 365 // would not consider this a problem worth crashing for. FILE thread would be |
| | 366 // considered as hung if it didn't respond for 45 ping messages. |
| | 367 struct CrashDataThresholds { |
| | 368 CrashDataThresholds(uint32 live_threads_threshold, |
| | 369 uint32 unresponsive_threshold); |
| | 370 CrashDataThresholds(); |
| | 371 |
| | 372 uint32 live_threads_threshold; |
| | 373 uint32 unresponsive_threshold; |
| | 374 }; |
| | 375 typedef std::map<std::string, CrashDataThresholds> CrashOnHangThreadMap; |
| 334 | 376 |
| 335 // This method posts a task on WatchDogThread to start watching all browser | 377 // This method posts a task on WatchDogThread to start watching all browser |
| 336 // threads. | 378 // threads. |
| 337 // This method is accessible on UI thread. | 379 // This method is accessible on UI thread. |
| 338 static void StartWatchingAll(const CommandLine& command_line); | 380 static void StartWatchingAll(const CommandLine& command_line); |
| 339 | 381 |
| 340 // This method posts a task on WatchDogThread to RevokeAll tasks and to | 382 // This method posts a task on WatchDogThread to RevokeAll tasks and to |
| 341 // deactivate thread watching of other threads and tell NotificationService to | 383 // deactivate thread watching of other threads and tell NotificationService to |
| 342 // stop calling Observe. | 384 // stop calling Observe. |
| 343 // This method is accessible on UI thread. | 385 // This method is accessible on UI thread. |
| (...skipping 10 matching lines...) | |
| 354 uint32* unresponding_thread_count); | 396 uint32* unresponding_thread_count); |
| 355 | 397 |
| 356 // This will ensure that the watching is actively taking place, and awaken | 398 // This will ensure that the watching is actively taking place, and awaken |
| 357 // all thread watchers that are registered. | 399 // all thread watchers that are registered. |
| 358 static void WakeUpAll(); | 400 static void WakeUpAll(); |
| 359 | 401 |
| 360 private: | 402 private: |
| 361 // Allow tests to access our innards for testing purposes. | 403 // Allow tests to access our innards for testing purposes. |
| 362 friend class CustomThreadWatcher; | 404 friend class CustomThreadWatcher; |
| 363 friend class ThreadWatcherTest; | 405 friend class ThreadWatcherTest; |
| 364 FRIEND_TEST_ALL_PREFIXES(ThreadWatcherTest, CommandLineArgs); | 406 FRIEND_TEST_ALL_PREFIXES(ThreadWatcherTest, ThreadNamesOnlyArgs); |
| | 407 FRIEND_TEST_ALL_PREFIXES(ThreadWatcherTest, ThreadNamesAndLiveThresholdArgs); |
| | 408 FRIEND_TEST_ALL_PREFIXES(ThreadWatcherTest, CrashOnHangThreadsAllArgs); |
| 365 | 409 |
| 366 // This singleton holds the global list of registered ThreadWatchers. | 410 // This singleton holds the global list of registered ThreadWatchers. |
| 367 ThreadWatcherList(); | 411 ThreadWatcherList(); |
| 368 | 412 |
| 369 // Destructor deletes all registered ThreadWatcher instances. | 413 // Destructor deletes all registered ThreadWatcher instances. |
| 370 virtual ~ThreadWatcherList(); | 414 virtual ~ThreadWatcherList(); |
| 371 | 415 |
| 372 // Parses the command line to get |unresponsive_threshold| from | 416 // Parses the command line to get |crash_on_hang_threads| map from |
| 373 // switches::kCrashOnHangSeconds, |crash_on_hang| thread names from | 417 // switches::kCrashOnHangThreads. |crash_on_hang_threads| is a map of |
| 374 // switches::kCrashOnHangThreads and |live_threads_threshold| from | 418 // |crash_on_hang| thread's names to |CrashDataThresholds|. |
| 375 // switches::kCrashOnLive. |crash_on_hang_threads| is a map of | |
| 376 // |crash_on_hang| thread's names to |live_threads_threshold|. | |
| 377 static void ParseCommandLine( | 419 static void ParseCommandLine( |
| 378 const CommandLine& command_line, | 420 const CommandLine& command_line, |
| 379 uint32* unresponsive_threshold, | 421 uint32* unresponsive_threshold, |
| 380 CrashOnHangThreadMap* crash_on_hang_threads); | 422 CrashOnHangThreadMap* crash_on_hang_threads); |
| 381 | 423 |
| | 424 // Parses the argument |crash_on_hang_thread_names| and creates |
| | 425 // |crash_on_hang_threads| map of |crash_on_hang| thread's names to |
| | 426 // |CrashDataThresholds|. If |crash_on_hang_thread_names| doesn't specify |
| | 427 // |live_threads_threshold|, then it uses |default_live_threads_threshold| as |
| | 428 // the value. If |crash_on_hang_thread_names| doesn't specify |crash_seconds|, |
| | 429 // then it uses |default_crash_seconds| as the value. |
| | 430 static void ParseCommandLineCrashOnHangThreads( |
| | 431 const std::string& crash_on_hang_thread_names, |
| | 432 uint32 default_live_threads_threshold, |
| | 433 uint32 default_crash_seconds, |
| | 434 CrashOnHangThreadMap* crash_on_hang_threads); |
| | 435 |
| 382 // This constructs the |ThreadWatcherList| singleton and starts watching | 436 // This constructs the |ThreadWatcherList| singleton and starts watching |
| 383 // browser threads by calling StartWatching() on each browser thread that is | 437 // browser threads by calling StartWatching() on each browser thread that is |
| 384 // watched. It disarms StartupTimeBomb. | 438 // watched. It disarms StartupTimeBomb. |
| 385 static void InitializeAndStartWatching( | 439 static void InitializeAndStartWatching( |
| 386 uint32 unresponsive_threshold, | 440 uint32 unresponsive_threshold, |
| 387 const CrashOnHangThreadMap& crash_on_hang_threads); | 441 const CrashOnHangThreadMap& crash_on_hang_threads); |
| 388 | 442 |
| 389 // This method calls ThreadWatcher::StartWatching() to perform health check on | 443 // This method calls ThreadWatcher::StartWatching() to perform health check on |
| 390 // the given |thread_id|. | 444 // the given |thread_id|. |
| 391 static void StartWatching( | 445 static void StartWatching( |
| (...skipping 174 matching lines...) | |
| 566 // shutdown_watchdog_ watches for hangs during shutdown. | 620 // shutdown_watchdog_ watches for hangs during shutdown. |
| 567 base::Watchdog* shutdown_watchdog_; | 621 base::Watchdog* shutdown_watchdog_; |
| 568 | 622 |
| 569 // The |thread_id_| on which this object is constructed. | 623 // The |thread_id_| on which this object is constructed. |
| 570 const base::PlatformThreadId thread_id_; | 624 const base::PlatformThreadId thread_id_; |
| 571 | 625 |
| 572 DISALLOW_COPY_AND_ASSIGN(ShutdownWatcherHelper); | 626 DISALLOW_COPY_AND_ASSIGN(ShutdownWatcherHelper); |
| 573 }; | 627 }; |
| 574 | 628 |
| 575 #endif // CHROME_BROWSER_METRICS_THREAD_WATCHER_H_ | 629 #endif // CHROME_BROWSER_METRICS_THREAD_WATCHER_H_ |
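
The NEW-column comments describe a spec string such as "UI:3:18,IO:3:18,FILE:5:90" and a crash rule built from |live_threads_threshold| and |unresponsive_threshold|. The following is a minimal standalone sketch of that behavior, not the code from this patch. `ParseCrashOnHangSpec`, `ShouldCrash`, `Split`, `ParseOrDefault`, and `kAssumedSecondsPerPing` are hypothetical names. The 2-seconds-per-ping ratio is inferred from the comment's "18 seconds (or 9 unanswered ping messages)". The crash rule follows the wording of Example 1 (at least one thread responding, and responding threads less than or equal to the threshold), and treating a thread as hung at "greater than or equal to" the ping threshold is also an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Mirrors the struct in the NEW column; uint32_t stands in for Chromium's uint32.
struct CrashDataThresholds {
  uint32_t live_threads_threshold;
  uint32_t unresponsive_threshold;
};
typedef std::map<std::string, CrashDataThresholds> CrashOnHangThreadMap;

// Assumption: one ping every 2 seconds (18 seconds == 9 pings in the comment).
const uint32_t kAssumedSecondsPerPing = 2;

// Splits |text| on |delimiter|.
std::vector<std::string> Split(const std::string& text, char delimiter) {
  std::vector<std::string> parts;
  std::istringstream stream(text);
  std::string part;
  while (std::getline(stream, part, delimiter))
    parts.push_back(part);
  return parts;
}

// Parses a decimal number; returns |fallback| if |text| is empty or malformed.
uint32_t ParseOrDefault(const std::string& text, uint32_t fallback) {
  if (text.empty())
    return fallback;
  char* end = nullptr;
  const unsigned long value = std::strtoul(text.c_str(), &end, 10);
  return (*end == '\0') ? static_cast<uint32_t>(value) : fallback;
}

// Hypothetical stand-in for ParseCommandLineCrashOnHangThreads(): each entry
// is "name[:live_threads_threshold[:crash_seconds]]"; missing fields fall back
// to the supplied defaults, and seconds are converted to a ping count.
CrashOnHangThreadMap ParseCrashOnHangSpec(
    const std::string& spec,
    uint32_t default_live_threads_threshold,
    uint32_t default_crash_seconds) {
  CrashOnHangThreadMap result;
  for (const std::string& entry : Split(spec, ',')) {
    const std::vector<std::string> fields = Split(entry, ':');
    if (fields.empty() || fields[0].empty())
      continue;  // Skip entries without a thread name.
    uint32_t live = default_live_threads_threshold;
    uint32_t seconds = default_crash_seconds;
    if (fields.size() > 1)
      live = ParseOrDefault(fields[1], live);
    if (fields.size() > 2)
      seconds = ParseOrDefault(fields[2], seconds);
    CrashDataThresholds thresholds;
    thresholds.live_threads_threshold = live;
    thresholds.unresponsive_threshold = seconds / kAssumedSecondsPerPing;
    result[fields[0]] = thresholds;
  }
  return result;
}

// Crash rule as worded in Example 1: the watched thread has missed enough
// pings, at least one browser thread still responds, and the number of
// responding threads does not exceed |live_threads_threshold|.
bool ShouldCrash(const CrashDataThresholds& thresholds,
                 uint32_t unanswered_pings,
                 uint32_t responding_thread_count) {
  if (unanswered_pings < thresholds.unresponsive_threshold)
    return false;
  if (responding_thread_count == 0)
    return false;  // Nothing responds: assume the whole system is stalled.
  return responding_thread_count <= thresholds.live_threads_threshold;
}
```

Under these assumptions, parsing "UI:3:18,IO:3:18,FILE:5:90" yields an |unresponsive_threshold| of 9 for UI and IO and 45 for FILE, and an entry that names only a thread (for example "DB") takes both values from the defaults.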