Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(367)

Side by Side Diff: net/docs/crash-course-in-net-internals.md

Issue 2501113002: Improve net-internals crash course. (Closed)
Patch Set: Expand on dt. Created 4 years, 1 month ago
Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.
Jump to:
View unified diff | Download patch
« no previous file with comments | « no previous file | no next file » | no next file with comments »
Toggle Intra-line Diffs ('i') | Expand Comments ('e') | Collapse Comments ('c') | Show Comments Hide Comments ('s')
OLDNEW
1 # A Crash Course in Debugging with about:net-internals 1 # A Crash Course in Debugging with about:net-internals
2 2
3 This document is intended to help get people started debugging network errors 3 This document is intended to help get people started debugging network errors
4 with about:net-internals, with some commonly useful tips and tricks. This 4 with about:net-internals, with some commonly useful tips and tricks. This
5 document is aimed more at how to get started using some of its features to 5 document is aimed more at how to get started using some of its features to
6 investigate bug reports, rather than as a feature overview. 6 investigate bug reports, rather than as a feature overview.
7 7
8 It would probably be useful to read 8 It would probably be useful to read
9 [life-of-a-url-request.md](life-of-a-url-request.md) before this document. 9 [life-of-a-url-request.md](life-of-a-url-request.md) before this document.
10 10
11 # What Data Net-Internals Contains 11 # What Data Net-Internals Contains
12 12
13 about:net-internals provides a view of browser activity from net/'s perspective. 13 about:net-internals provides a view of browser activity from net/'s perspective.
14 For this reason, it lacks knowledge of tabs, navigation, frames, resource types, 14 For this reason, it lacks knowledge of tabs, navigation, frames, resource types,
15 etc. 15 etc.
16 16
17 The top level network stack object is the URLRequestContext. The Events View 17 The top level network stack object is the URLRequestContext. The Events View
lilyhoughton 2016/12/01 18:26:32 Mention what this is (i.e., the views are listed i
Bence 2016/12/05 17:58:01 Done.
18 has information for all Chrome URLRequestContexts that are hooked up to the 18 has information for all Chrome URLRequestContexts that are hooked up to the
19 single, global, ChromeNetLog object. This includes both incognito and non- 19 single, global, ChromeNetLog object. This includes both incognito and
20 incognito profiles, among other things. The Events view only shows events for 20 non-incognito profiles, among other things. The Events view only shows events
21 the period that net-internals was open and running, and is incrementally updated 21 for the period that net-internals was open and running, and is incrementally
22 as events occur. The code attempts to add a top level event for URLRequests 22 updated as events occur. The code attempts to add a top level event for
23 that were active when the tab was opened, to help debug hung requests, but 23 URLRequests that were active when the about:net-internals tab was opened, to
24 that's best-effort only, and only includes requests for the current profile and 24 help debug hung requests, but that's best-effort only, and only includes
25 the system URLRequestContext. 25 requests for the current profile and the system URLRequestContext.
26 26
27 The other views are all snapshots of the current state of the main 27 The other views are all snapshots of the current state of the main
28 URLRequestContext's components, and are updated on a 5 second timer. These will 28 URLRequestContext's components, and are updated on a 5 second timer. These will
29 show objects that were created before about:net-internals was opened. Most 29 show objects that were created before about:net-internals was opened. Most
30 debugging is done with the Events view (which will be all this document 30 debugging is done with the Events view (which will be all this document covers),
31 covers), but it's good to be aware of this distinction. 31 but it's good to be aware of this distinction.
32 32
33 # Events vs Sources 33 # Events vs Sources
34 34
35 The Event View shows events logged by the NetLog. The NetLog model is that 35 The Event View shows events logged by the NetLog. The NetLog model is that
36 long-lived network stack objects, called sources, emit events over their 36 long-lived network stack objects, called sources, emit events over their
37 lifetime. When looking at the code, a "NetLogWithSource" object contains a sour ce 37 lifetime. When looking at the code, a NetLogWithSource object contains a source
lilyhoughton 2016/12/01 18:26:32 What is the point of this clause?
Bence 2016/12/05 17:58:01 Nothing. I guess I'm better off removing it.
38 ID, and a pointer to the NetLog the source emits events to. Some events have a 38 ID, a NetLogSourceType, and a pointer to the NetLog the source emits events to.
39 beginning and end point (during which other subevents may occur), and some only 39
40 occur at a single point in time. Generally only one event can be occuring for a 40 Sources show up in the left column of the Event View. Sources that include an
lilyhoughton 2016/12/01 18:26:32 Afaict, Before an event is actually selected, this
Bence 2016/12/05 17:58:01 Done.
41 source at a time. If there can be multiple events doing completely independent 41 event with a net_error parameter with negative value (that is, some kind of
42 thing, the code often uses new sources to represent the parallelism. 42 ERR_) are shown with red background. Sources whose opening event has not ended
43 yet are shown with white background. Other events have green background. The
44 search queries corresponding to the first two kinds are `is:error` and
45 `is:active`.
46
47 Events corresponding to selected sources show up in the right column, organized
lilyhoughton 2016/12/01 18:26:32 Maybe emphasize/clarify the need to and process of
48 by source, and by time within each source. There are two time values: t is
49 measured from some reference point common to all sources, and st is measured
50 from the first event for each source. Time is displayed in milliseconds.
51
52 Since the network stack is asynchronous, events from different sources will
53 often be interlaced in time, but Event View does not feature showing events from
54 different sources ordered by time. Large time gaps in the event list of a
lilyhoughton 2016/12/01 18:26:32 It does appear to feature showing events from diff
Bence 2016/12/05 17:58:01 Correct. I replaced the word "organized" with "so
55 single source usually mean that time is spent in the context of another source.
56
57 Some events come in pairs: a beginning and end event, between which other events
58 may occur. They are shown with + and - prefixes, respectively. The begin event
59 has a dt value which shows the duration. If the end event was captured, then
60 duration is calculated as the time difference between the begin and the end
61 events. Otherwise the time elapsed from the begin event until capturing
62 was stopped is displayed (a lower bound for actual duration), followed by a +
63 sign (for example, "dt=120+").
64
65 If there are no other events in between the begin and end, and the end event has
66 no parameters, then they are collapsed in a single line without a sign prefix.
67
68 Some other events only occur at a single point in time, and will not have a sign
69 prefix.
lilyhoughton 2016/12/01 18:26:32 Do these events also have a dt, or does that apply
Bence 2016/12/05 17:58:01 No duration. Added clarification to the text.
70
71 Generally only one event can be occuring for a source at a time. If there can
72 be multiple events doing completely independent things, the code often uses new
73 sources to represent the parallelism.
74
75 Most, but not all events correspond to a source. Exceptions are global events,
76 which have no source, and show up as individual entries in the source list.
77 Examples include NETWORK_CHANGED, DNS_CONFIG_CHANGED, and PROXY_CONFIG_CHANGED.
lilyhoughton 2016/12/01 18:26:32 Of global events, right?
Bence 2016/12/05 17:58:01 Correct. Added clarification to text.
78
79 # Common source types
43 80
44 "Sources" correspond to certain net objects, however, multiple layers of net/ 81 "Sources" correspond to certain net objects, however, multiple layers of net/
45 will often log to a single source. Here are the main source types and what they 82 will often log to a single source. Here are the main source types and what they
46 include (Excluding HTTP2 [SPDY]/QUIC): 83 include (excluding HTTP2 [SPDY]/QUIC):
47 84
48 * URL_REQUEST: This corresponds to the URLRequest object. It includes events 85 * URL_REQUEST: This corresponds to the URLRequest object. It includes events
49 from all the URLRequestJobs, HttpCache::Transactions, NetworkTransactions, 86 from all the URLRequestJobs, HttpCache::Transactions, NetworkTransactions,
50 HttpStreamFactoryImpl::Requests, HttpStream implementations, and 87 HttpStreamFactoryImpl::Requests, HttpStream implementations, and
51 HttpStreamParsers used to service a response. If the URL_REQUEST follows HTTP 88 HttpStreamParsers used to service a response. If the URL_REQUEST follows HTTP
52 redirects, it will include each redirect. This is a lot of stuff, but generally 89 redirects, it will include each redirect. This is a lot of stuff, but generally
53 only object is doing work at a time. This event source includes the full URL 90 only one object is doing work at a time. This event source includes the full
54 and generally includes the request / response headers (Except when the cache 91 URL and generally includes the request / response headers (except when the cache
55 handles the response). 92 handles the response).
56 93
57 * HTTP_STREAM_JOB: This corresponds to HttpStreamFactoryImpl::Job (Note that 94 * HTTP_STREAM_JOB: This corresponds to HttpStreamFactoryImpl::Job (note that
58 one Request can have multiple Jobs). It also includes its proxy and DNS 95 one Request can have multiple Jobs). It also includes its proxy and DNS
59 lookups. HTTP_STREAM_JOB log events are separate from URL_REQUEST because 96 lookups. HTTP_STREAM_JOB log events are separate from URL_REQUEST because two
60 two stream jobs may be created and races against each other, in some cases - 97 stream jobs may be created and races against each other, in some cases -- one
61 one for one for QUIC, and one for HTTP. One of the final events of this source 98 for QUIC, and one for HTTP.
62 indicates how an HttpStream was created (Reusing an existing SOCKET / 99
63 HTTP2_SESSION / QUIC_SESSION, or creating a new one). 100 One of the final events of this source, before the
101 HTTP_STREAM_JOB_BOUND_TO_REQUEST event, indicates how an HttpStream was
102 created:
103
104 + A SOCKET_POOL_BOUND_TO_CONNECT_JOB event means that a new TCP socket was
105 created, whereas a SOCKET_POOL_REUSED_AN_EXISTING_SOCKET event indicates tha t
106 an existing TCP socket was reused for a non-HTTP/2 request.
107
108 + An HTTP2_SESSION_POOL_IMPORTED_SESSION_FROM_SOCKET event indicates that a
109 new HTTP/2 session was opened by this Job.
110
111 + An HTTP2_SESSION_POOL_FOUND_EXISTING_SESSION event indicates that the requ est
112 was served on a preexisting HTTP/2 session.
113
114 + An HTTP2_SESSION_POOL_FOUND_EXISTING_SESSION_FROM_IP_POOL event means that
115 the request was pooled to a preexisting HTTP/2 session which had a different
116 SpdySessionKey, but DNS resolution resulted in the same IP, and the
117 certificate matches.
118
119 + There are currently no events logged for opening new QUIC sessions or
120 reusing existing ones.
64 121
65 * CONNECT_JOB: This corresponds to the ConnectJob subclasses that each socket 122 * CONNECT_JOB: This corresponds to the ConnectJob subclasses that each socket
66 pool uses. A successful CONNECT_JOB return a SOCKET. The events here vary a 123 pool uses. A successful CONNECT_JOB returns a SOCKET. The events here vary a
lilyhoughton 2016/12/01 18:26:32 What does it mean for a source to be successful?
Bence 2016/12/05 17:58:01 I guess that it signals OK (and not ERR_*) (via Co
67 lot by job type. Their main event is generally either to create a socket, or 124 lot by job type. Their main event is generally either to create a socket, or
68 request a socket from another socket pool (Which creates another CONNECT_JOB) 125 request a socket from another socket pool (which creates another CONNECT_JOB)
69 and then do some extra work on top of that - like establish an SSL connection on 126 and then do some extra work on top of that -- like establish an SSL connection o n
70 top of a TCP connection. 127 top of a TCP connection.
71 128
72 * SOCKET: These correspond to TCPSockets, but may also have other classes 129 * SOCKET: These correspond to TCPSockets, but may also have other classes
73 layered on top of them (Like an SSLClientSocket). This is a bit different from 130 layered on top of them (like an SSLClientSocket). This is a bit different from
74 the other classes, where the name corresponds to the topmost class, instead of 131 the other classes, where the name corresponds to the topmost class, instead of
75 the bottommost one. This is largely an artifact of the fact the socket is 132 the bottommost one. This is largely an artifact of the fact the socket is
76 created first, and then SSL (Or a proxy connection) is layered on top of it. 133 created first, and then SSL (or a proxy connection) is layered on top of it.
77 SOCKETs may be reused between multiple requests, and a request may end up 134 SOCKETs may be reused between multiple requests, and a request may end up
78 getting a socket created for another request. 135 getting a socket created for another request.
79 136
80 * HOST_RESOLVER_IMPL_JOB: These correspond to HostResolverImpl::Job. The 137 * HOST_RESOLVER_IMPL_JOB: These correspond to HostResolverImpl::Job. The
lilyhoughton 2016/12/01 18:26:32 The -> They?
Bence 2016/12/05 17:58:01 Done.
81 include information about how long the lookup was queued, each DNS request that 138 include information about how long the lookup was queued, each DNS request that
82 was attempted (With the platform or built-in resolver) and all the other sources 139 was attempted (with the platform or built-in resolver) and all the other sources
83 that are waiting on the job. 140 that are waiting on the job.
84 141
85 When one source depends on another, the code generally logs an event with 142 When one source depends on another, the code generally logs an event with
86 "source_dependency" value to both sources, which lets you jump between the two 143 `source_dependency` value to both sources, which lets you jump between the two
87 related events. 144 related events.
lilyhoughton 2016/12/01 18:26:32 Elaborate on this?
Bence 2016/12/05 17:58:01 Done.
88 145
89 # Debugging 146 # Debugging
90 147
91 When you receive a report from the user, the first thing you'll generally want 148 When you receive a report from the user, the first thing you'll generally want
92 to do find the URL_REQUEST[s] that are misbehaving. If the user gives an ERR_* 149 to do find the URL_REQUEST[s] that are misbehaving. If the user gives an ERR_*
93 code or the exact URL of the resource that won't load, you can just search for 150 code or the exact URL of the resource that won't load, you can just search for
94 it. If it's an upload, you can search for "post", or if it's a redirect issue, 151 it. If it's an upload, you can search for "post", or if it's a redirect issue,
95 you can search for "redirect". However, you often won't have much information 152 you can search for "redirect". However, you often won't have much information
96 about the actual problem. There are two filters in net-internals that can help 153 about the actual problem. There are two filters in net-internals that can help
97 in a lot of cases: 154 in a lot of cases:
98 155
99 * "type:URL_REQUEST is:error" will restrict the list to URL_REQUEST object with 156 * "type:URL_REQUEST is:error" will restrict the source list to URL_REQUEST
100 an error of some sort (red background). Cache errors are often non-fatal, so 157 objects with an error of some sort. Cache errors are often non-fatal, so you
101 you should generally ignore those, and look for a more interesting one. 158 should generally ignore those, and look for a more interesting one.
102 159
103 * "type:URL_REQUEST sort:duration" will show the longest-lived requests first. 160 * "type:URL_REQUEST sort:duration" will show the longest-lived requests first.
104 This is often useful in finding hung or slow requests. 161 This is often useful in finding hung or slow requests.
105 162
106 For a list of other filter commands, you can mouse over the question mark on 163 For a list of other filter commands, you can mouse over the question mark on
107 about:net-internals. 164 about:net-internals.
108 165
109 Once you locate the problematic request, the next is to figure out where the 166 Once you locate the problematic request, the next is to figure out where the
110 problem is - it's often one of the last events, though it could also be related 167 problem is -- it's often one of the last events, though it could also be related
111 to response or request headers. You can use "source_dependency" links to drill 168 to response or request headers. You can use "source_dependency" links to drill
112 down into other related sources, or up from layers below URL_REQUEST. 169 down into other related sources, or up from layers below URL_REQUEST.
lilyhoughton 2016/12/01 18:26:32 This phrasing is a little confusing - it's saying
Bence 2016/12/05 17:58:01 Indeed this is confusing. I agree with your inter
113 170
114 You can use the name of an event to search for the code responsible for that 171 You can use the name of an event to search for the code responsible for that
115 event, and try to deduce what went wrong before/after a particular event. Note 172 event, and try to deduce what went wrong before/after a particular event. Note
116 that the event names used in net-internals are not the entire string names, so 173 that the event names used in net-internals are not the entire string names, so
lilyhoughton 2016/12/01 18:26:32 Not sure if it's necessary, but it might be worth
Bence 2016/12/05 17:58:01 This is outdated, I am removing the last sentence
117 you should not do an entire string match. 174 you should not do an entire string match.
118 175
119 Some things to look for while debugging: 176 Some things to look for while debugging:
120 177
121 * CANCELLED events almost always come from outside the network stack. 178 * CANCELLED events almost always come from outside the network stack.
122 179
123 * Changing networks and entering / exiting suspend mode can have all sorts of 180 * Changing networks and entering / exiting suspend mode can have all sorts of
124 fun and exciting effects on underway network activity. Network changes log a 181 fun and exciting effects on underway network activity. Network changes log a
125 top level NETWORK_CHANGED event with no source - the event itself is treated as 182 top level NETWORK_CHANGED event. Suspend events are currently not logged.
126 its own source. Suspend events are currently not logged.
127 183
128 * URL_REQUEST_DELEGATE / DELEGATE_INFO events mean a URL_REQUEST is blocked on a 184 * URL_REQUEST_DELEGATE / DELEGATE_INFO events mean a URL_REQUEST is blocked on a
129 URLRequest::Delegate or the NetworkDelegate, which are implemented outside the 185 URLRequest::Delegate or the NetworkDelegate, which are implemented outside the
130 network stack. A request will sometimes be CANCELED here for reasons known only 186 network stack. A request will sometimes be CANCELED here for reasons known only
131 to the delegate. Or the delegate may cause a hang. In general, to debug issues 187 to the delegate. Or the delegate may cause a hang. In general, to debug issues
132 related to delegates, one needs to figure out which method of which object is 188 related to delegates, one needs to figure out which method of which object is
133 causing the problem. The object may be the a NetworkDelegate, a 189 causing the problem. The object may be the a NetworkDelegate, a
134 ResourceThrottle, a ResourceHandler, the ResourceLoader itself, or the 190 ResourceThrottle, a ResourceHandler, the ResourceLoader itself, or the
135 ResourceDispatcherHost. 191 ResourceDispatcherHost.
136 192
137 * Sockets are often reused between requests. If a request is on a stale 193 * Sockets are often reused between requests. If a request is on a stale
138 (reused) socket, what was the previous request that used the socket, how long 194 (reused) socket, what was the previous request that used the socket, how long
lilyhoughton 2016/12/01 18:26:32 Is it possible to get this information from net_in
Bence 2016/12/05 17:58:01 I believe so, I added a sentence here.
139 ago was it made? 195 ago was it made?
140 196
141 * SSL negotation is a process fraught with peril, particularly with broken 197 * SSL negotation is a process fraught with peril, particularly with broken
142 proxies. These will generally stall or fail in the SSL_CONNECT phase at the 198 proxies. These will generally stall or fail in the SSL_CONNECT phase at the
143 SOCKET layer. 199 SOCKET layer.
144 200
145 * Range requests have magic to handle them at the cache layer, and are often 201 * Range requests have magic to handle them at the cache layer, and are often
146 issued by the media and PDF code. 202 issued by the media and PDF code.
147 203
148 * Late binding: HTTP_STREAM_JOBs are not associated with any CONNECT_JOB until 204 * Late binding: HTTP_STREAM_JOBs are not associated with any CONNECT_JOB until
149 a CONNECT_JOB actually connects. This is so the highest priority pending job 205 a CONNECT_JOB actually connects. This is so the highest priority pending job
lilyhoughton 2016/12/01 18:26:32 Referring to the HTTP_STREAM_JOB, not the CONNECT_
Bence 2016/12/05 17:58:01 I think this sentence is correct. When ConnectJob
lilyhoughton 2016/12/09 19:56:09 Specifically, it isn't super clear to me what "the
150 gets the first available socket (Which may be a new socket, or an old one that's 206 gets the first available socket (which may be a new socket, or an old one that's
151 freed up). For this reason, it can be a little tricky to relate hung 207 freed up). For this reason, it can be a little tricky to relate hung
152 HTTP_STREAM_JOBs to CONNECT_JOBs. 208 HTTP_STREAM_JOBs to CONNECT_JOBs.
153 209
154 * Each CONNECT_JOB belongs to a "group", which has a limit of 6 connections. If 210 * Each CONNECT_JOB belongs to a "group", which has a limit of 6 connections. If
155 all CONNECT_JOBs beling to a group (The CONNECT_JOB's description field) are 211 all CONNECT_JOBs belonging to a group (the CONNECT_JOB's description field) are
156 stalled waiting on an available socket, the group probably has 6 sockets that 212 stalled waiting on an available socket, the group probably has 6 sockets that
157 that are hung - either hung trying to connect, or used by stalled requests and 213 that are hung -- either hung trying to connect, or used by stalled requests and
158 thus outside the socket pool's control. 214 thus outside the socket pool's control.
159 215
160 * There's a limit on number of DNS resolutions that can be started at once. If 216 * There's a limit on number of DNS resolutions that can be started at once. If
161 everything is stalled while resolving DNS addresses, you've probably hit this 217 everything is stalled while resolving DNS addresses, you've probably hit this
162 limit, and the DNS lookups are also misbehaving in some fashion. 218 limit, and the DNS lookups are also misbehaving in some fashion.
163 219
164 # Miscellany 220 # Miscellany
165 221
166 These are just miscellaneous things you may notice when looking through the 222 These are just miscellaneous things you may notice when looking through the
167 logs. 223 logs.
168 224
169 * URLRequests that look to start twice for no obvious reason. These are 225 * URLRequests that look to start twice for no obvious reason. These are
170 typically main frame requests, and the first request is AppCache. Can just 226 typically main frame requests, and the first request is AppCache. Can just
171 ignore it and move on with your life. 227 ignore it and move on with your life.
172 228
173 * Some HTTP requests are not handled by URLRequestHttpJobs. These include 229 * Some HTTP requests are not handled by URLRequestHttpJobs. These include
174 things like HSTS redirects (URLRequestRedirectJob), AppCache, ServiceWorker, 230 things like HSTS redirects (URLRequestRedirectJob), AppCache, ServiceWorker,
175 etc. These generally don't log as much information, so it can be tricky to 231 etc. These generally don't log as much information, so it can be tricky to
176 figure out what's going on with these. 232 figure out what's going on with these.
177 233
178 * Non-HTTP requests also appear in the log, and also generally don't log much 234 * Non-HTTP requests also appear in the log, and also generally don't log much
179 (blob URLs, chrome URLs, etc). 235 (blob URLs, chrome URLs, etc).
180 236
181 * Preconnects create a "HTTP_STREAM_JOB" event that may create multiple 237 * Preconnects create a "HTTP_STREAM_JOB" event that may create multiple
182 CONNECT_JOBs (or none) and is then destroyed. These can be identified by the 238 CONNECT_JOBs (or none) and is then destroyed. These can be identified by the
183 "SOCKET_POOL_CONNECTING_N_SOCKETS" events. 239 "SOCKET_POOL_CONNECTING_N_SOCKETS" events.
OLDNEW
« no previous file with comments | « no previous file | no next file » | no next file with comments »

Powered by Google App Engine
This is Rietveld 408576698