Created: 5 years, 5 months ago by petrcermak
Modified: 5 years, 2 months ago
Reviewers: pfeldman
CC: blink-reviews, caseq+blink_chromium.org, yurys+blink_chromium.org, lushnikov+blink_chromium.org, pfeldman+blink_chromium.org, apavlov+blink_chromium.org, devtools-reviews_chromium.org, sergeyv+blink_chromium.org, kozyatinskiy+blink_chromium.org
Base URL: https://chromium.googlesource.com/chromium/blink.git@master
Target Ref: refs/heads/master
Project: blink
Visibility: Public.
Description
Add Tracing.requestMemoryDump method to DevTools API (protocol)
This method will allow memory profiling tools such as Telemetry to
request memory dumps.
Proposal:
https://docs.google.com/document/d/190-URzSjfsiXNeyke86z85-5ctuqsC_RoX1J1uJTwIQ/edit?usp=sharing
Tracing handler change (Chromium side):
https://codereview.chromium.org/1232313004/
BUG=505826
Committed: https://crrev.com/a99accc93d66ea89d963739bfdf976a9ee6ee47d
git-svn-id: svn://svn.chromium.org/blink/trunk@199571 bbb929c8-8fbe-4397-9dbb-9b2b20218538
Patch Set 1 #
Total comments: 2
Messages
Total messages: 16 (2 generated)
pfeldman@chromium.org changed reviewers: + pfeldman@chromium.org
https://codereview.chromium.org/1235393005/diff/1/Source/devtools/protocol.json
File Source/devtools/protocol.json (right):

https://codereview.chromium.org/1235393005/diff/1/Source/devtools/protocol.js...
Source/devtools/protocol.json:4720: "description": "Request a global memory dump.",
Does tracing need to be started to use this one?
https://codereview.chromium.org/1235393005/diff/1/Source/devtools/protocol.json
File Source/devtools/protocol.json (right):

https://codereview.chromium.org/1235393005/diff/1/Source/devtools/protocol.js...
Source/devtools/protocol.json:4720: "description": "Request a global memory dump.",
On 2015/07/20 18:44:07, pfeldman wrote:
> Does tracing need to be started to use this one?

Yes, it does. The request will fail if tracing hasn't been started. If tracing has been started but the memory-infra category is not enabled, the response will immediately return success=false.
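A minimal sketch of the behaviour described in this reply, assuming only what the thread states: the request fails when tracing is not started, and reports success=false when tracing runs without the memory-infra category. The handler function is hypothetical, not the real Blink tracing handler.

```python
def handle_request_memory_dump(tracing_started, memory_infra_enabled):
    """Model the documented responses of Tracing.requestMemoryDump."""
    if not tracing_started:
        # The request fails outright if tracing hasn't been started.
        return {"error": {"message": "Tracing is not started"}}
    if not memory_infra_enabled:
        # Tracing is on but the memory-infra category is missing:
        # the response immediately reports success=false.
        return {"result": {"success": False}}
    # Otherwise a global memory dump is triggered and injected into the trace.
    return {"result": {"success": True}}
```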
I guess it justifies being a part of the tracing domain then. Having said that, it goes a little against the tracing nature of capturing things into a trace. I'd like to see memory info in devtools and in the task manager, at least with some granularity. Does this mean that tracing should always be started in Chrome?

Sorry, I missed the whole initiative and its design details. Could you point me to where it is made clear that we need to initiate memory profiling and can't make it a poll-triggered visitor?
On 2015/07/21 13:50:29, pfeldman wrote:
> I guess it justifies being a part of tracing domain then. Having said that, it
> goes a little against the tracing nature of capturing things into a trace. I'd
> like to see memory info in devtools and in the task manager, at least with some
> granularity. Does this mean that tracing should be always started in Chrome?

I'm not sure if this addresses your concern, but you can already enable the memory tracing category using the DevTools API by sending a Tracing.start request with 'disabled-by-default-memory-infra' in the categories parameter. The memory dump data can then be collected from Tracing.dataCollected notifications once Tracing.end is called. I don't think tracing can always be running in Chrome, because the underlying buffers tend to take up quite a lot of memory.

> Sorry I missed the whole initiative and its design details. Could you point me
> to where it makes it clear that we need to initiate memory profiling and can't
> make it a poll-triggered visitor?

You can find all the information about the initiative, as well as all design documents, at go/memory-infra. Tracing needs to be enabled because the memory dump manager adds the resulting memory dumps to the trace (which is not present if tracing is disabled). I suppose it would be possible to return the memory dumps directly by value in the Tracing.requestMemoryDump response, but it seems like a lot of extra work that we do not need at the moment. Primiano might be able to give a more detailed explanation on some of this.
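A sketch of the flow petrcermak describes, assuming a generic DevTools protocol client: start tracing with the disabled-by-default-memory-infra category, end tracing, and pick the dump data out of Tracing.dataCollected notifications. The message ids and the collect_memory_dumps helper are illustrative, not part of any real client library.

```python
import json

def tracing_session_messages():
    """Build the two requests the reply above describes, as JSON strings."""
    start = {"id": 1, "method": "Tracing.start",
             "params": {"categories": "disabled-by-default-memory-infra"}}
    end = {"id": 2, "method": "Tracing.end"}
    return [json.dumps(start), json.dumps(end)]

def collect_memory_dumps(notifications):
    """Gather trace events carried by Tracing.dataCollected notifications."""
    events = []
    for note in notifications:
        if note.get("method") == "Tracing.dataCollected":
            events.extend(note["params"]["value"])
    return events
```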
As far as this CL is concerned, yes, this is purely an add-on to the tracing subsystem and falls entirely into that domain. This API is telling memory-infra: please grab a dump and inject it into the trace. This is the way to get reliable memory measurements in telemetry, and is an important precondition for us to automate and improve most of our (today super-noisy) memory metrics in telemetry.

In essence, the memory-infra code (concretely base/trace_event/*memory*) works like this:
- Something (right now only tracing) asks it to grab a memory dump across all chrome subsystems.
- The memory-infra code collects all the data (e.g., queries PartitionAlloc stats, Oilpan stats, malloc stats, v8 stats, etc.)... IPC / PostTask happen yada yada.
- All that information is dumped into the trace.

The best way to get a quick understanding of the core architecture without being overwhelmed by dozens of design docs is this slide deck [1], slides 27-33. Or even better, we can have a VC.
[1] https://docs.google.com/presentation/d/1hDV_8h41-e9dTo3PHZOaotel1cSDM0QwNAMt_...

> I'd like to see memory info in devtools and in the task manager, at least with
> some granularity.

I see where you are aiming and it makes sense. However, my feeling is that right now memory-infra is still in its exploratory phase. We are using tracing as a way to explore the areas we need to cover (concretely: which parts of the codebase to instrument and how they relate to each other), and telemetry is the best way for us to figure out the quality of this data. Exposing some API from memory-infra to (devtools, chrome memory internals), in the form of on-demand collection of memory stats, is perfectly reasonable; it is something that I have in mind and it definitely matches the architecture memory-infra has been designed around. Also, I have some bugs already assigned to that (e.g., crbug.com/25454). We can't get it right now (some internal optimizations dump data straight into TraceValue), but the outer interfaces are unaware of this and are open to these other use cases.

However, at the current stage:
- Before committing to a public API, I'd like to have a clear idea of the data we want to expose. As a concrete example, we're collaborating with GPU folks at the time of writing to rationalize GPU memory across all the platforms.
- Before exposing that information to devtools, I think we should sit down and rationalize what subset of our data makes sense to expose to web developers, and which relates more to chrome internals (I had some chat in the past with isherman@ about that).

> Does this mean that tracing should be always started in Chrome?

At the current state, yes. But I am very keen to consider a way to get on-demand memory stats out of band, without tracing. I just think that we should do that when our dataset is stable. All the technology is there; it is simply too premature right now to commit to making that interface stable and exposing it to the rest of the codebase. I am very happy to discuss more over VC if you want.
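A toy model of the pipeline primiano sketches above: something asks the dump manager for a global dump, each registered subsystem reports its stats, and everything lands in the trace. All class, method, and provider names here are illustrative, not the real base/trace_event/*memory* code.

```python
class ToyMemoryDumpManager:
    def __init__(self):
        self.providers = {}  # subsystem name -> zero-arg stats callable
        self.trace = []      # stands in for the trace buffer

    def register_provider(self, name, provider):
        self.providers[name] = provider

    def request_global_dump(self, tracing_enabled):
        # Today the dump only exists as part of a trace, so the request
        # is refused when tracing is off.
        if not tracing_enabled:
            return False
        dump = {name: provider() for name, provider in self.providers.items()}
        self.trace.append({"memory_dump": dump})
        return True

# Usage: register a couple of fake subsystems and grab a dump.
manager = ToyMemoryDumpManager()
manager.register_provider("malloc", lambda: {"allocated_bytes": 4096})
manager.register_provider("v8", lambda: {"heap_bytes": 8192})
```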
On Tue, Jul 21, 2015 at 8:22 AM, <primiano@chromium.org> wrote:
> As far as this CL is concerned, yes, this is purely an add-on to the tracing
> subsystem and falls entirely into that domain.
> This API is telling to memory-infra: please grab a dump and inject it into the
> trace. This is the way to get reliable memory measurements in telemetry, and is
> an important precondition for us to automate and improve most of our (today
> super-noisy) memory metrics in telemetry.

We might be lost in the terminology, since Tracing means so many things these days. When I say "Tracing", I mean its controller/config, categories, and start and stop capabilities. In terms of remote debugging and telemetry, it is exposed under the "Tracing" domain with the Tracing.* methods and events. Telemetry is not tracing; it is something bigger. It uses the remote debugging protocol to communicate with Chrome and is capable of starting tracing, among other things. It talks to multiple domains such as "Page", "Runtime", "Input", etc.

Now my perception of memory sampling is that it has a poll / sampling nature, while tracing has more of a continuous / instrumentation / flow nature. To get a simple memory sample / snapshot, I don't want to start Tracing, request a memory dump, and stop Tracing. This looks like a tracing misuse. I would rather see a Memory.* domain next to Tracing which is available to Telemetry just as much as tracing is. And every time I need a snapshot, I do "Memory.requestMemoryDump".

> In the ultra essence, the memory-infra code (concretely
> base/trace_event/*memory*) works this:
> - Somethings (right now only tracing) asks it to grab a memory dump across all
> chrome subsystem.
> - The memory-infra code collects all the data (e.g., queries PartitionAlloc
> stats, Oilpan stats, malloc stats, v8 stats, etc etc)... IPC / PostTask happen
> yada yada.
> - All that information is dumped in the trace.

Ok, so pardon my ignorance due to not reading the docs just yet, but I don't see how you need tracing here. You should implement your own 100-line Memory protocol handler and make your information available outside of the tracing infrastructure. Telemetry would have it.

> The best way to get a quick understanding of the core architecture without
> being overwhelmed by dozens of design docs is this slide deck [1], slides 27-33.
> Or even better, we can have a VC.
> [1]
> https://docs.google.com/presentation/d/1hDV_8h41-e9dTo3PHZOaotel1cSDM0QwNAMt_...

> I see where you are aiming at and it makes sense. However, my feeling is that
> right now memory-infra is still in its exploratory phase. We are using tracing
> as a way to explore the areas we need to cover (concretely: which parts of
> codebase to instrument and how do they relate each other) and telemetry is the
> best way for us to figure out the quality of this data.

So, in the terms defined above, do you need tracing or telemetry?

> Exposing some API from memory-infra to (devtools, chrome memory internals), in
> the form of on-demand collection of memory stats is perfectly reasonable, is
> something that I have in mind and definitely matches the architecture
> memory-infra has been designed around. Also, I have some bugs already assigned
> to that (e.g., crbug.com/25454)
> We can't get it right now (some internal optimizations that dumps data straight
> into TraceValue) but the outer interfaces are unaware of this are open to these
> other use cases.

Let's discuss it in greater detail; at least as far as remote debugging / telemetry is concerned, driving things, requesting snapshots, serializing results, etc. is trivial.

> However, at current stage:
> - Before committing to a public API, I'd like to have a clear idea of the data
> we want to expose. As a concrete example, we're collaborating with GPU folks at
> the time of writing to rationalize GPU memory across all the platforms.
> - Before exposing that information to devtools, I think we should sit down and
> rationalize what subset of our data makes sense to expose to web developers,
> and which relates more to chrome internals (I had some chat in the past with
> isherman@ about that)

The remote debugging protocol is not entirely public. We have numerous hidden methods (80% of the protocol), so we can experiment with the data we send out. It is as much subject to change as tracing categories.

> > Does this mean that tracing should be always started in Chrome?
> At current state, yes. But I am very keen to consider a way to get on demand
> memory stats out of band, without tracing. I just think that we should do that
> when our dataset is stable.
> All the technology is there. I just think that right now it is too premature to
> commit to making that interface stable and exposing it to the rest of the
> codebase.

I don't think we need any commitment here; all I am advocating for is a sample-friendly controller API that does not necessarily need to be a part of tracing and inherit its start/stop instrumentation character. Other than that, it seems your code is structured in a way that supports and appreciates this approach.

> I am very happy to discuss more over VC if you want.

Let's definitely do that; put something on my calendar and I'll be happy to help you navigate the remote debugging protocol infrastructure.
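The Memory.* domain pfeldman proposes here is only a sketch; this dict shows what a hidden protocol.json entry for it could look like, mirroring the general shape of existing DevTools domain definitions. Every field value is an assumption, not anything present in this CL.

```python
# Hypothetical protocol.json fragment for the proposed Memory domain,
# expressed as a Python dict for easy inspection.
proposed_memory_domain = {
    "domain": "Memory",
    "hidden": True,  # hidden methods let the team experiment freely
    "commands": [{
        "name": "requestMemoryDump",
        "description": "Request a one-off global memory dump.",
        "returns": [{"name": "success", "type": "boolean"}],
    }],
}
```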
> Now my perception of the memory sampling is that it has a poll / sampling
> nature, while tracing has more of a continuous / instrumentation / flow
> nature.

Correct as it is right now, but the plan for the future would be to also have some TRACE_EVENT_WITH_MEMORY; TL;DR, the codebase triggering memory samples (post v8 GC, post oilpan GC, etc.) which end up in the trace (when enabled).

> To get a simple memory sample / snapshot, I don't want to start
> Tracing, request memory dump and stop Tracing. This looks like a tracing
> misuse. I rather see a Memory.* domain next to Tracing which is available
> to Telemetry just as much as tracing is. And every time I need a snapshot,
> I do "Memory.requestMemoryDump".

All of memory-infra has been designed to dump inside the tracing ecosystem. The reasons for this choice are:

1. Correlation (between trace events and memory dumps). It is very important to understand causation effects (did memory increase / decrease when there was a V8/Oilpan GC event? Or after a raster? Or while parsing HTML?). Having that integrated with the tracing timeline is fundamental to understanding the memory dynamics of chrome internals.

2. Data analysis. Right now the data dumped inside tracing is pretty raw and requires a lot of offline massaging to be rationalized. Today this massaging is done inside the TraceViewer importer. You might argue at this point: so, how do you intend to make use of this data in telemetry, if you don't have trace-viewer there? The answer is:
- (Very) short term: we cherry-pick the few metrics that require very little massaging (and, yes, duplicate that code).
- Medium term: there is an ongoing project to merge the trace-viewer importers into telemetry (timeline based measurement: run trace-viewer importers in telemetry via D8 to process the data and use that in the telemetry benchmarks). References and design docs: https://github.com/google/trace-viewer/issues/1076 and https://github.com/google/trace-viewer/issues/984

In both cases (short and long term) we need an API to inject the trace points into telemetry and make sure that they are time-aligned and correlated with the other trace events.

> Ok, so pardon my ignorance due to not reading the docs just yet, but I
> don't see how you need tracing here. You should implement your own 100 line
> Memory protocol handler and make your information available outside of the
> tracing infrastructure. Telemetry would have it.

The problem is not the 100-line protocol handler; it is the ~10k LOC (it's not a random number) which interprets the memory dump data (and lives in trace-viewer). There is a lot of graph-based math involved there (and more to come). So, to answer your question, the reason why we need tracing is:
- Today tracing is the main (actually the driving) use case for memory-infra: we need to inspect chrome internals to see what happens both in the time dimension (trace events) and the memory dimension.
- Trace-viewer has all the technology to understand the memory dumps. Very soon that technology will be available in telemetry for free (the aforementioned D8 project).

> So in the terms defined above, do you need tracing or telemetry?

Both. We need telemetry to:
- Trigger memory dumps at precise points.
- Turn the post-processed data into benchmark results.

We need those points to end up in tracing because they are time-aligned with trace events AND because the code in tracing knows how to parse and massage the data. In the short term we are going to massage 5% of them via home-brewed (python) telemetry code. But that is only temporary, to address urgent short-term needs (internal stuff, let's discuss offline), and we already have plans to switch to the new timeline based measurement (D8) as soon as it becomes available in telemetry.

> Remote debugging protocol is not entirely public. We have numerous hidden
> methods (80% of the protocol), so we can experiment with the data we send
> out. It is as much a subject to change as tracing categories.

Ok, let's discuss this offline. Probably we want this to be a hidden method, at least for the moment.

> I don't think we need any commitment here, all I am advocating for is a
> sample-friendly controller API that does not necessarily need to be a part
> of tracing and inherit its start/stop instrumentation character. Other than
> that, it seems your code is structured in a way that supports and
> appreciates this approach.

This is correct. The only problem is that, at the moment, the data that we get as output is very complex and hard to digest. Trace-viewer allows us to iterate on it quickly and out-of-band of the chromium codebase (if we make mistakes / improvements, we just need to change the data massaging and can retroactively apply it to existing traces). I will be totally happy to have some digested output available within the chromium codebase itself. But before that I want to understand and rationalize what that output should be, and we are not there yet.

In essence, today the mental model is:

  MemoryInfra::RequestMemoryDump() -> Dumps ***loads*** of data into the trace

It feels like the mental model that we want tomorrow is:

                                         / Loads of data injected into the trace
  MemoryInfra::RequestMemoryDump(depth)
                                         \ Lightweight processed data returned to the caller

Which I'm totally happy to support once we figure out what that "lightweight processed data" should look like. Looking forward to discussing more offline.
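A hypothetical sketch of the "tomorrow" mental model in the message above: one RequestMemoryDump(depth) entry point that both injects the full data into the trace and hands a lightweight digest back to the caller. Every name, the depth parameter, and the stats shapes here are illustrative; nothing like this exists in the CL under review.

```python
TRACE = []  # stands in for the trace buffer

def collect_all_stats():
    # Placeholder for querying every dump provider (malloc, v8, Oilpan, ...).
    return {"malloc": {"total": 1024, "by_bucket": {"a": 512, "b": 512}},
            "v8": {"total": 2048, "by_bucket": {"heap": 2048}}}

def request_memory_dump(depth):
    full = collect_all_stats()
    TRACE.append(full)  # loads of data injected into the trace
    if depth == "light":
        # lightweight processed data returned to the caller
        return {name: stats["total"] for name, stats in full.items()}
    return full
```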
Message was sent while issue was closed.
This patch was superseded by https://codereview.chromium.org/1245233004/. We decided to extend the API exposed by the existing memory benchmarking extension instead of augmenting the DevTools API.
pfeldman: It turns out that I'll have to land the Blink patch (this one) before the Chromium patch. PTAL.
Thanks,
Petr
lgtm, but I think chromium needs to land first.
The CQ bit was checked by petrcermak@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1235393005/1 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1235393005/1
Message was sent while issue was closed.
Committed patchset #1 (id:1) as https://src.chromium.org/viewvc/blink?view=rev&revision=199571
Message was sent while issue was closed.
Patchset 1 (id:??) landed as https://crrev.com/a99accc93d66ea89d963739bfdf976a9ee6ee47d