docs/subtle_threading_bugs.md - Issue 2883903004: Port https://sites.google.com/a/chromium.org/dev/developers/design-documents/threading/suble-thread…

Side by Side Diff: docs/subtle_threading_bugs.md

Issue 2883903004: Port https://sites.google.com/a/chromium.org/dev/developers/design-documents/threading/suble-thread… (Closed)

Patch Set: Created 3 years, 7 months ago

Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.

Jump to:

View unified diff | Download patch

OLD	NEW
(Empty)
	1 # Subtle Threading Bugs and Patterns to avoid them

	2

	3 [TOC]

	4

	5 ## The Problem

	6 We were using a number of patterns that were problematic:

	7

	8 1. Using `BrowserThread::GetMessageLoop` This isn't safe, since it could return

	9 a valid pointer, but since the caller isn't holding on to a lock anymore, the

	10 target MessageLoop could be destructed while it's being used.

	11

	12 1. Caching of MessageLoop pointers in order to use them later for PostTask and f riends

	13

	14 * This was more efficient previously (more on that later) since using

	15 `BrowserThread::GetMessageLoop` involved a lock.

	16

	17 * But it spread logic about the order of thread destruction all over the

	18 code. Code that moved from the IO thread to the file thread and back, in

	19 order to avoid doing disk access on the IO thread, ended up having to do an

	20 extra hop through the UI thread on the way back to the IO thread since the

	21 file thread outlives the IO thread. Of course, most code learnt this the

	22 hard way, after doing the straight forward IO->file->IO thread hop and

	23 updating the code after seeing reliability or user crashes.

	24

	25 * It made the browser shutdown fragile and hence difficult to update.

	26

	27 1. File thread hops using RefCountedThreadSafe objects which have non-trivial

	28 destructors

	29

	30 * To reduce jank, frequently an object on the UI or IO thread would execute

	31 some code on the file thread, then post the result back to the original

	32 thread. We make this easy using `base::Callback` and

	33 `RefCountedThreadSafe`, so this pattern happened often, but it's not always

	34 safe: base::Callback holds an extra reference on the object to ensure that

	35 it doesn't invoke a method on a deleted object. But it's quite possible

	36 that before the file thread's stack unwinds and it releases the extra

	37 reference, that the response task on the original thread executed and

	38 released its own additional reference. The file thread is then left with

	39 the remaining reference and the object gets destructed there. While for

	40 most objects this is ok, many have non-trivial destructors, with the worst

	41 being ones that register with the UI-thread NotificationService. Dangling

	42 pointers would be left behind and tracking these crashes from ChromeBot or

	43 the crash dumps has wasted several days at least for me.

	44

	45 1. Having browser code take different code paths if a thread didn't exist

	46

	47 * This could be either deceptively harmless (i.e. execute synchronously when

	48 it was normally asynchronous), when in fact it makes shutdown slower

	49 because disk access is moved to the UI thread.

	50

	51 * It could lead to data loss, if tasks are silently not posted because the

	52 code assumes this only happens in unit tests, when it could occur on

	53 browser shutdown as well.

	54

	55 ## The Solution

	56

	57 * 1+2: Where possible, use `BrowserThread::PostTask`. Everywhere else, use

	58 `scoped_refptr<MessageLoopProxy>`.

	59

	60 `BrowserThread::PostTask` and friends (i.e. `PostDelayedTask`, `DeleteSoon`,

	61 `ReleaseSoon`) are safe and efficient: no locks are grabbed if the target

	62 thread is known to outlive the current thread. The four static methods have

	63 the same signature as the ones from `MessageLoop`, with the addition of the

	64 first parameter to indicate the target thread:

	65

	66 `BrowserThread::PostTask(BrowserThread::FILE, FROM_HERE, task);`

	67

	68 Similarly, `MessageLoopProxy` has (non-static) methods with the same signature

	69 as the `MessageLoop` counterparts, but is safe for caching a [reference

	70 counting] pointer to. You can obtain a `MessageLoopProxy` via

	71 `Thread::message_loop_proxy()`, `BrowserThread::GetMessageLoopProxy()`, or

	72 `MessageLoopProxy::CreateForCurrentThread()`.

	73

	74 * 3: If you want to execute a method on another thread and jump back to the

	75 original thread, use `PostTaskAndReply()`:

	76

	77 BrowserThread::PostTaskAndReply(BrowserThread::FILE, FROM_HERE,

	78 base::Bind(&Foo::DoStuffOnFileThread, this),

	79 base::Bind(&Foo::StuffDone, this));

	80

	81 `PostTaskAndReply()` will make sure that both tasks are destroyed on the

	82 thread they were created on, so if they hold the last reference to the object,

	83 it will be destroyed on the originating thread. You can also use

	84 `PostTaskAndReplyWithResult()` to return a value from the first task and pass

	85 it into the second task.

	86

	87 Alternatively, if your object must be destructed on a specific thread, you can

	88 use a trait from BrowserThread:

	89

	90 class Foo : public base::RefCountedThreadSafe<Foo, BrowserThread::DeleteOnIO Thread>

	91

	92 * 4: I've removed all the special casing and always made the objects in the

	93 browser code behave in one way. If you're writing a unit test and need to use

	94 an object that goes to a file thread (where before it would proceed

	95 synchronously), you just need:

	96

	97 BrowserThread file_thread(BrowserThread::FILE, MessageLoop::current());

	98 foo->StartAsyncTaskOnFileThread();

	99 MessageLoop::current()->RunAllPending();

	100

	101 There are plenty of examples now in the tests.

	102

	103 ## Gotchas

	104 Even when using `BrowseThread` or `MessageLoopProxy`, you will still likely have

	105 messages lost (usually resulting in memory leaks) when the target thread is in

	106 the process of shutting down: the benefit over `MessageLoop` is primarily one of

	107 avoiding crashing in unpredictable ways. (See this thread for debate.)

	108

	109 `BrowseThread` and `MessageLoopProxy::PostTask` will silently delete a task if

	110 the thread doesn't exist. This is done to avoid having all the code that uses

	111 it have special cases if the target thread exists or not, and to make Valgrind

	112 happy. As usual, the task for `DeleteSoon/ReleaseSoon` doesn't do anything in

	113 its destructor, so this won't cause unexpected behavior with them. But be wary

	114 of posted `Task` objects which have non-trivial destructors or smart pointers as

	115 members. I'm still on the fence about this, since while the latter is

	116 theoretical now, it could lead to problems in the future. I might change it so

	117 that the tasks are not deleted when I'm ready for more Valgrind fun.

	118

	119 If you absolutely must know if a task was posted or not, you can check the

	120 return value of `PostTask` and friends. But note that even if the task was

	121 posted successfully, there's no guarantee that it will run because the target

	122 thread could already have a `QuitTask` queued up, or be in the early stages of

	123 quitting.

	124

	125 `g_browser->io_thread()` and `file_thread()` are still around (the former for

	126 IPC code, and the latter for Linux proxy code which is in net and so can't use

	127 `BrowserThread`). Don't use them unless absolutely necessary.

	128

	129

	130 ### More information

	131

	132 * https://bugs.chromium.org/p/chromium/issues/detail?id=25354

	133

OLD	NEW

« no previous file with comments | « docs/README.md ('k') | no next file » | no next file with comments »