mmoroz: I'd like your fuzzing efficiency opinion here. This fuzzer does a lot of work ...
3 years, 9 months ago
(2017-03-03 01:51:49 UTC)
#2
mmoroz: I'd like your fuzzing efficiency opinion here.
This fuzzer does a lot of work each iteration. It is so slow that on my z620
linux workstation, I am getting only 20-40 exec/s.
However, it doesn't do *that* much unnecessary work. I profiled it to be sure
and found few places we could improve. A change to the codec API could maybe eke
out a 2% efficiency gain (exposing the TextCodecFactory).
The reason it is slow is that we run the input on 42 encodings. For every
encoding we create 14 codecs to run the input on. That's 588 codecs per exec.
In previous fuzzers, I've generally not done this (i.e. loop through all
possible modes, running input with each mode). Rather, I've used the input
stream to *define* the mode (i.e. by picking random bits off the end), giving
the fuzzing infrastructure the power to figure out which branches are most
interesting to take.
I'm curious about this balance. WDYT?
It's much more efficient to create 14 different fuzzers for each codec. You could use ...
3 years, 9 months ago
(2017-03-03 02:03:56 UTC)
#4
It's much more efficient to create 14 different fuzzers for each codec. You
could use gn automation to help you with some sort of a #define.
Charlie Harrison
On 2017/03/03 02:03:56, aizatsky1 wrote: > It's much more efficient to create 14 different fuzzers ...
3 years, 9 months ago
(2017-03-03 02:07:57 UTC)
#5
On 2017/03/03 02:03:56, aizatsky1 wrote:
> It's much more efficient to create 14 different fuzzers for each codec. You
> could use gn automation to help you with some sort of a #define.
Do you mean create 588 different fuzzer targets? That seems excessive. Can you
clarify?
blink-reviews
Well, it depends )). Let me give you an example when decision is clear. Imagine ...
3 years, 9 months ago
(2017-03-03 02:20:51 UTC)
#6
Well, it depends )). Let me give you an example when decision is clear.
Imagine simple fuzzer:
for (codec in codecs)
video = parseVideo(data, codec);
In reality different codecs input are so much different that it makes
little sense of mutating, say, input for mp4 codec, and trying it on vp9.
Or trying to combine them together and checking on a third one. Not a lot
of interesting code paths will be found this way. Most likely mutating mp4
codec is useful only for mp4 codec. So your fuzzer's effectiveness is
automatically 1/codecs.size();
How data-independent are your 588 configurations? Do you expect inputs from
one be an interesting input for another? If you don't have much
independence, then current approach could be fine, but you won't get enough
CF cores this way.
On Thu, Mar 2, 2017 at 6:07 PM <csharrison@chromium.org> wrote:
> On 2017/03/03 02:03:56, aizatsky1 wrote:
> > It's much more efficient to create 14 different fuzzers for each codec.
> You
> > could use gn automation to help you with some sort of a #define.
>
> Do you mean create 588 different fuzzer targets? That seems excessive. Can
> you
> clarify?
>
> https://codereview.chromium.org/2731643002/
>
--
Mike
Sent from phone
--
You received this message because you are subscribed to the Google Groups "Blink
Reviews" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to blink-reviews+unsubscribe@chromium.org.
chromium-reviews
Well, it depends )). Let me give you an example when decision is clear. Imagine ...
3 years, 9 months ago
(2017-03-03 02:20:51 UTC)
#7
Well, it depends )). Let me give you an example when decision is clear.
Imagine simple fuzzer:
for (codec in codecs)
video = parseVideo(data, codec);
In reality different codecs input are so much different that it makes
little sense of mutating, say, input for mp4 codec, and trying it on vp9.
Or trying to combine them together and checking on a third one. Not a lot
of interesting code paths will be found this way. Most likely mutating mp4
codec is useful only for mp4 codec. So your fuzzer's effectiveness is
automatically 1/codecs.size();
How data-independent are your 588 configurations? Do you expect inputs from
one be an interesting input for another? If you don't have much
independence, then current approach could be fine, but you won't get enough
CF cores this way.
On Thu, Mar 2, 2017 at 6:07 PM <csharrison@chromium.org> wrote:
> On 2017/03/03 02:03:56, aizatsky1 wrote:
> > It's much more efficient to create 14 different fuzzers for each codec.
> You
> > could use gn automation to help you with some sort of a #define.
>
> Do you mean create 588 different fuzzer targets? That seems excessive. Can
> you
> clarify?
>
> https://codereview.chromium.org/2731643002/
>
--
Mike
Sent from phone
--
You received this message because you are subscribed to the Google Groups
"Chromium-reviews" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to chromium-reviews+unsubscribe@chromium.org.
Charlie Harrison
Thanks for the clarification. I think we could probably get away with separating out into ...
3 years, 9 months ago
(2017-03-03 02:29:12 UTC)
#8
Thanks for the clarification.
I think we could probably get away with separating out into a few different
fuzzers per group of encodings, for encodings which are similar.
For the 14 modes, I think we could probably keep them per codec, or branch on
them based on the fuzzed data. I think for most of the modes, "interesting
data" is still interesting.
Let me think on it.
mmoroz
I'm super excited to see that many new targets! +1 to what Mike said, smaller ...
3 years, 9 months ago
(2017-03-03 09:49:47 UTC)
#9
I'm super excited to see that many new targets! +1 to what Mike said, smaller
targets are almost always better. Obviously, we'll get a better coverage with 42
targets running let's say 1K execs/s each than with a single target running ~30
execs/s :)
If there is an assumption that it might be interesting to feed data with
encoding A to decoder B, I have good news. We've added a thing called "corpus
cross-pollination", i.e. sometimes we randomly pick corpus of a completely
different target and feed it to another one. Thus, in a long run we'll cover all
possible combinations anyway, and each fuzz target will save those inputs which
are interesting for it.
https://codereview.chromium.org/2731643002/diff/40001/third_party/WebKit/Sour...
File third_party/WebKit/Source/platform/BUILD.gn (right):
https://codereview.chromium.org/2731643002/diff/40001/third_party/WebKit/Sour...
third_party/WebKit/Source/platform/BUILD.gn:2059:
fuzzer_test("blink_text_codec_fuzzer_" + target_name) {
Could you please name it in a way that it ends with '_fuzzer'. I think that we
rely on that format name in Chromium for some reason.
`"blink_text_codec_" + target_name + "_fuzzer"` would be good
https://codereview.chromium.org/2731643002/diff/40001/third_party/WebKit/Sour...
third_party/WebKit/Source/platform/BUILD.gn:2069: # Seed corpus uploaded to
Google Cloud Storage bucket used by ClusterFuzz.
Do you have the same seed corpus set for all the configs, or a separate for each
of them?
https://codereview.chromium.org/2731643002/diff/40001/third_party/WebKit/Sour...
File third_party/WebKit/Source/platform/TextCodecFuzzer.cpp (right):
https://codereview.chromium.org/2731643002/diff/40001/third_party/WebKit/Sour...
third_party/WebKit/Source/platform/TextCodecFuzzer.cpp:1: // Copyright 2016 The
Chromium Authors. All rights reserved.
2017
https://codereview.chromium.org/2731643002/diff/40001/third_party/WebKit/Sour...
third_party/WebKit/Source/platform/TextCodecFuzzer.cpp:28: extern "C" int
LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
As I understood, we are using first 2 bytes to initialize some flags on lines
128 and 131. Due to that, we can add `if (size < 2) return 0;` in the beginning,
so libFuzzer would understand that it makes no sense to try smaller inputs.
https://codereview.chromium.org/2731643002/diff/40001/third_party/WebKit/Sour...
third_party/WebKit/Source/platform/TextCodecFuzzer.cpp:120: static const
Vector<WTF::UnencodableHandling> unencodableHandlingOptions{
Does it make sense to have #118 and #120 inside LLVMFuzzerTestOneInput? Can we
move them to global since they are static const?
https://codereview.chromium.org/2731643002/diff/40001/third_party/WebKit/Sour...
third_party/WebKit/Source/platform/TextCodecFuzzer.cpp:135:
codec->decode(reinterpret_cast<const char*>(data), size, flushBehavior,
I think that we should use `ConsumeRemainingBytes`
(https://cs.chromium.org/chromium/src/base/test/fuzzed_data_provider.h?q=Fuzze...
instead of full |data|, since we have already used some leading bytes (lines 128
and 131). It would be better to avoid the dependency caused by bytes used twice.
https://codereview.chromium.org/2731643002/diff/40001/third_party/WebKit/Sour...
third_party/WebKit/Source/platform/TextCodecFuzzer.cpp:140:
codec->encode(reinterpret_cast<const LChar*>(data), size / sizeof(LChar),
The same as for line 135. Let's call `ConsumeRemainingBytes` before and then use
.data() or .c_str() in these calls.
https://codereview.chromium.org/2731643002/diff/40001/third_party/WebKit/Sour...
third_party/WebKit/Source/platform/TextCodecFuzzer.cpp:145:
codec->encode(reinterpret_cast<const UChar*>(data), size / sizeof(UChar),
Why we call `decode` on line 135 and `encode` here and line 140? Is it intended?
jsbell
Thanks for picking this up! https://codereview.chromium.org/2731643002/diff/40001/third_party/WebKit/Source/platform/TextCodecFuzzer.cpp File third_party/WebKit/Source/platform/TextCodecFuzzer.cpp (right): https://codereview.chromium.org/2731643002/diff/40001/third_party/WebKit/Source/platform/TextCodecFuzzer.cpp#newcode145 third_party/WebKit/Source/platform/TextCodecFuzzer.cpp:145: codec->encode(reinterpret_cast<const UChar*>(data), size / ...
3 years, 9 months ago
(2017-03-03 17:35:43 UTC)
#10
Thanks for picking this up!
https://codereview.chromium.org/2731643002/diff/40001/third_party/WebKit/Sour...
File third_party/WebKit/Source/platform/TextCodecFuzzer.cpp (right):
https://codereview.chromium.org/2731643002/diff/40001/third_party/WebKit/Sour...
third_party/WebKit/Source/platform/TextCodecFuzzer.cpp:145:
codec->encode(reinterpret_cast<const UChar*>(data), size / sizeof(UChar),
On 2017/03/03 09:49:47, mmoroz wrote:
> Why we call `decode` on line 135 and `encode` here and line 140? Is it
intended?
The fuzzer is using the input data in three ways to test each codec's separate
encoding and decoding logic.
* treating it as bytes-off-the-wire and decoding
* treating it as a blink 8-bit string (latin1) and encoding
* treating it as a blink 16-bit string ("utf-16") and encoding - but only if
it's an even number of bytes
(The last case could instead truncate down to an even number of bytes rather
than rejecting half the inputs.)
Charlie Harrison
Thanks for the review! Apologies that the CL was in a weird state. Once I ...
3 years, 9 months ago
(2017-03-03 19:33:55 UTC)
#11
Thanks for the review! Apologies that the CL was in a weird state. Once I got GN
working I gave up and went to bed :D
In any case, I am happy that the new targets make you happy. I have tried to
address most of the comments.
https://codereview.chromium.org/2731643002/diff/40001/third_party/WebKit/Sour...
File third_party/WebKit/Source/platform/BUILD.gn (right):
https://codereview.chromium.org/2731643002/diff/40001/third_party/WebKit/Sour...
third_party/WebKit/Source/platform/BUILD.gn:2059:
fuzzer_test("blink_text_codec_fuzzer_" + target_name) {
On 2017/03/03 09:49:47, mmoroz wrote:
> Could you please name it in a way that it ends with '_fuzzer'. I think that we
> rely on that format name in Chromium for some reason.
>
> `"blink_text_codec_" + target_name + "_fuzzer"` would be good
Done.
https://codereview.chromium.org/2731643002/diff/40001/third_party/WebKit/Sour...
third_party/WebKit/Source/platform/BUILD.gn:2069: # Seed corpus uploaded to
Google Cloud Storage bucket used by ClusterFuzz.
On 2017/03/03 09:49:47, mmoroz wrote:
> Do you have the same seed corpus set for all the configs, or a separate for
each
> of them?
Right now there is just one named "blink_text_codec_fuzzer" that jsbell
uploaded. I am not sure what we should do and would appreciate advice!
https://codereview.chromium.org/2731643002/diff/40001/third_party/WebKit/Sour...
File third_party/WebKit/Source/platform/TextCodecFuzzer.cpp (right):
https://codereview.chromium.org/2731643002/diff/40001/third_party/WebKit/Sour...
third_party/WebKit/Source/platform/TextCodecFuzzer.cpp:1: // Copyright 2016 The
Chromium Authors. All rights reserved.
On 2017/03/03 09:49:47, mmoroz wrote:
> 2017
Done.
https://codereview.chromium.org/2731643002/diff/40001/third_party/WebKit/Sour...
third_party/WebKit/Source/platform/TextCodecFuzzer.cpp:28: extern "C" int
LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
On 2017/03/03 09:49:47, mmoroz wrote:
> As I understood, we are using first 2 bytes to initialize some flags on lines
> 128 and 131. Due to that, we can add `if (size < 2) return 0;` in the
beginning,
> so libFuzzer would understand that it makes no sense to try smaller inputs.
Done, but we are actually using 3 bytes.
https://codereview.chromium.org/2731643002/diff/40001/third_party/WebKit/Sour...
third_party/WebKit/Source/platform/TextCodecFuzzer.cpp:120: static const
Vector<WTF::UnencodableHandling> unencodableHandlingOptions{
On 2017/03/03 09:49:47, mmoroz wrote:
> Does it make sense to have #118 and #120 inside LLVMFuzzerTestOneInput? Can we
> move them to global since they are static const?
Done.
https://codereview.chromium.org/2731643002/diff/40001/third_party/WebKit/Sour...
third_party/WebKit/Source/platform/TextCodecFuzzer.cpp:135:
codec->decode(reinterpret_cast<const char*>(data), size, flushBehavior,
On 2017/03/03 09:49:47, mmoroz wrote:
> I think that we should use `ConsumeRemainingBytes`
>
(https://cs.chromium.org/chromium/src/base/test/fuzzed_data_provider.h?q=Fuzze...
> instead of full |data|, since we have already used some leading bytes (lines
128
> and 131). It would be better to avoid the dependency caused by bytes used
twice.
Yes! Sorry this was just a simple error.
https://codereview.chromium.org/2731643002/diff/40001/third_party/WebKit/Sour...
third_party/WebKit/Source/platform/TextCodecFuzzer.cpp:140:
codec->encode(reinterpret_cast<const LChar*>(data), size / sizeof(LChar),
On 2017/03/03 09:49:47, mmoroz wrote:
> The same as for line 135. Let's call `ConsumeRemainingBytes` before and then
use
> .data() or .c_str() in these calls.
Done.
https://codereview.chromium.org/2731643002/diff/40001/third_party/WebKit/Sour...
third_party/WebKit/Source/platform/TextCodecFuzzer.cpp:145:
codec->encode(reinterpret_cast<const UChar*>(data), size / sizeof(UChar),
On 2017/03/03 17:35:43, jsbell wrote:
> On 2017/03/03 09:49:47, mmoroz wrote:
> > Why we call `decode` on line 135 and `encode` here and line 140? Is it
> intended?
>
> The fuzzer is using the input data in three ways to test each codec's separate
> encoding and decoding logic.
>
> * treating it as bytes-off-the-wire and decoding
> * treating it as a blink 8-bit string (latin1) and encoding
> * treating it as a blink 16-bit string ("utf-16") and encoding - but only if
> it's an even number of bytes
>
> (The last case could instead truncate down to an even number of bytes rather
> than rejecting half the inputs.)
I have updated comments based on this discussion to make this clearer.
mmoroz
Thanks for clarifications, LGTM! Regarding seed corpus, there are two options: 1) upload files to ...
3 years, 9 months ago
(2017-03-05 19:09:46 UTC)
#12
Thanks for clarifications, LGTM!
Regarding seed corpus, there are two options:
1) upload files to the repo ad use via `seed_corpus` GN attribute
(https://chromium.googlesource.com/chromium/src/+/master/testing/libfuzzer/eff...)
2) upload it to GCS (as you already did), but we have to create a copy of your
corpus for every configuration you are going to land
I'd prefer option 1, as it's more scalable: we just add one line to the GN
template and all targets may use the same seed corpus. In general, we encourage
developers to put seed corpus to the repo, but if you have reasons not to do
that, option 2 is also fine. I can quickly copy the corpus you've already
uploaded for each of the new target names.
One more suggestion: does it make sense to have a round trip, i.e. `assert(data
!= encode(decode(data))`? Not sure whether it's applicable in this case, but if
it is, might be a good improvement in future :)
https://codereview.chromium.org/2731643002/diff/60001/third_party/WebKit/Sour...
File third_party/WebKit/Source/platform/TextCodecFuzzer.cpp (right):
https://codereview.chromium.org/2731643002/diff/60001/third_party/WebKit/Sour...
third_party/WebKit/Source/platform/TextCodecFuzzer.cpp:41: static
WTF::TextEncoding encoding =
nit: can be const?
https://codereview.chromium.org/2731643002/diff/60001/third_party/WebKit/Sour...
third_party/WebKit/Source/platform/TextCodecFuzzer.cpp:140: CString byteString =
fuzzedData.ConsumeRemainingBytes();
nit: can it be const?
https://codereview.chromium.org/2731643002/diff/60001/third_party/WebKit/Sour...
third_party/WebKit/Source/platform/TextCodecFuzzer.cpp:150:
std::unique_ptr<TextCodec> codec = newTextCodec(encoding);
Do we really need to create a new codec object here and on line 157? Sorry if
this is a stupid question, I'm not familiar with the API :)
Charlie Harrison
Thanks mmoroz. The data jsbell uploaded is around 35kb. Is it still a good idea ...
3 years, 9 months ago
(2017-03-06 01:51:12 UTC)
#13
Thanks mmoroz.
The data jsbell uploaded is around 35kb. Is it still a good idea to upload it in
repo? It feels too big for me.
I will have to defer to jsbell about whether the data is public or not. I think
all the bugs were fixed using this corpus though?
https://codereview.chromium.org/2731643002/diff/60001/third_party/WebKit/Sour...
File third_party/WebKit/Source/platform/TextCodecFuzzer.cpp (right):
https://codereview.chromium.org/2731643002/diff/60001/third_party/WebKit/Sour...
third_party/WebKit/Source/platform/TextCodecFuzzer.cpp:41: static
WTF::TextEncoding encoding =
On 2017/03/05 19:09:46, mmoroz wrote:
> nit: can be const?
Done.
https://codereview.chromium.org/2731643002/diff/60001/third_party/WebKit/Sour...
third_party/WebKit/Source/platform/TextCodecFuzzer.cpp:140: CString byteString =
fuzzedData.ConsumeRemainingBytes();
On 2017/03/05 19:09:45, mmoroz wrote:
> nit: can it be const?
Done.
https://codereview.chromium.org/2731643002/diff/60001/third_party/WebKit/Sour...
third_party/WebKit/Source/platform/TextCodecFuzzer.cpp:150:
std::unique_ptr<TextCodec> codec = newTextCodec(encoding);
On 2017/03/05 19:09:45, mmoroz wrote:
> Do we really need to create a new codec object here and on line 157? Sorry if
> this is a stupid question, I'm not familiar with the API :)
I think it does need to be different. It looks like some of the codecs are
stateful, so it makes sense to create a new one if we ever re-decode.
Charlie Harrison
The CQ bit was checked by csharrison@chromium.org to run a CQ dry run
3 years, 9 months ago
(2017-03-06 02:26:39 UTC)
#14
Dry run: Try jobs failed on following builders: linux_chromium_compile_dbg_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_compile_dbg_ng/builds/251069)
3 years, 9 months ago
(2017-03-06 03:05:47 UTC)
#17
3 years, 9 months ago
(2017-03-06 07:11:07 UTC)
#21
Dry run: This issue passed the CQ dry run.
mmoroz
On 2017/03/06 01:51:12, Charlie Harrison wrote: > Thanks mmoroz. > > The data jsbell uploaded ...
3 years, 9 months ago
(2017-03-06 10:00:19 UTC)
#22
On 2017/03/06 01:51:12, Charlie Harrison wrote:
> Thanks mmoroz.
>
> The data jsbell uploaded is around 35kb. Is it still a good idea to upload it
in
> repo? It feels too big for me.
>
> I will have to defer to jsbell about whether the data is public or not. I
think
> all the bugs were fixed using this corpus though?
>
>
https://codereview.chromium.org/2731643002/diff/60001/third_party/WebKit/Sour...
> File third_party/WebKit/Source/platform/TextCodecFuzzer.cpp (right):
>
>
https://codereview.chromium.org/2731643002/diff/60001/third_party/WebKit/Sour...
> third_party/WebKit/Source/platform/TextCodecFuzzer.cpp:41: static
> WTF::TextEncoding encoding =
> On 2017/03/05 19:09:46, mmoroz wrote:
> > nit: can be const?
>
> Done.
>
>
https://codereview.chromium.org/2731643002/diff/60001/third_party/WebKit/Sour...
> third_party/WebKit/Source/platform/TextCodecFuzzer.cpp:140: CString byteString
=
> fuzzedData.ConsumeRemainingBytes();
> On 2017/03/05 19:09:45, mmoroz wrote:
> > nit: can it be const?
>
> Done.
>
>
https://codereview.chromium.org/2731643002/diff/60001/third_party/WebKit/Sour...
> third_party/WebKit/Source/platform/TextCodecFuzzer.cpp:150:
> std::unique_ptr<TextCodec> codec = newTextCodec(encoding);
> On 2017/03/05 19:09:45, mmoroz wrote:
> > Do we really need to create a new codec object here and on line 157? Sorry
if
> > this is a stupid question, I'm not familiar with the API :)
>
> I think it does need to be different. It looks like some of the codecs are
> stateful, so it makes sense to create a new one if we ever re-decode.
Thanks for the comments!
Regarding corpus, there is a trick to minimize it. You can run:
$ mkdir minimized_corpus
$ fuzzer -merge=1 ./minimized_corpus ./initial_corpus
So 'minimized_corpus' will have a minimized set of inputs giving the same
coverage as your 'initial_corpus' dir.
However, to do that, you'll probably have to run previous version of the fuzzer
which executes all possible encoding modes in a loop :)
I don't think that 35KB is too much, e.g. we have 272KB in
https://cs.chromium.org/chromium/src/third_party/harfbuzz-ng/fuzz/seed_corpus...
But minimization via `-merge=1` option would be a good move anyway :)
jsbell
lgtm It's unfortunate that we need to keep the build/code in sync and also remember ...
3 years, 9 months ago
(2017-03-06 21:52:18 UTC)
#23
lgtm
It's unfortunate that we need to keep the build/code in sync and also remember
to update if the list of codecs changes over in wtf/text (either directly or via
ICU tinkering), but I don't have concrete suggestions. cc: jshin@ so someone
else is aware.
(Also, FYI, if you want to verify the fuzzer try reverting any of the revs
2e2c129641218bcf03de40f74124a5c4f7c21f86
38d521c4ccc88c44de1d189c82b895c4a39f90c4
9d2d2e8409c21feca88979245ce75f9ac4a00f05
... which were found by the fuzzer.)
Charlie Harrison
Tragically I don't know how to download all of the files in GCS to run ...
3 years, 9 months ago
(2017-03-07 01:20:36 UTC)
#24
Tragically I don't know how to download all of the files in GCS to run the
minifier on them. I spent a few minutes trying to get gsutil to do it with no
luck. The only option seems to be to download them all manually (which is not
practical).
mmoroz
On 2017/03/07 01:20:36, Charlie Harrison wrote: > Tragically I don't know how to download all ...
3 years, 9 months ago
(2017-03-07 09:11:10 UTC)
#25
On 2017/03/07 01:20:36, Charlie Harrison wrote:
> Tragically I don't know how to download all of the files in GCS to run the
> minifier on them. I spent a few minutes trying to get gsutil to do it with no
> luck. The only option seems to be to download them all manually (which is not
> practical).
Wow! Doesn't the following work for you:
mkdir corpus
gsutil -m rsync gs://clusterfuzz-corpus/libfuzzer/gsutil -m rsync
gs://clusterfuzz-corpus/libfuzzer/blink_text_codec_fuzzer_static corpus
?
If not, what kind of error it shows? I'd be happy to help with that. Maybe I
should also update the docs, if it doesn't work smoothly.
Charlie Harrison
Looks like I was just holding gsutil wrong. I was also wrong about the estimated ...
3 years, 9 months ago
(2017-03-07 14:13:36 UTC)
#26
Looks like I was just holding gsutil wrong. I was also wrong about the estimated
size of the corpus. Minified it seems to be 600k. PTAL.
mmoroz
Awesome, LGTM!
3 years, 9 months ago
(2017-03-07 16:31:04 UTC)
#27
Awesome, LGTM!
Charlie Harrison
FYI why I haven't landed this. I realized I generated the seed corpus with a ...
3 years, 9 months ago
(2017-03-10 15:31:28 UTC)
#28
FYI why I haven't landed this.
I realized I generated the seed corpus with a slightly different fuzzer than
jsbell's original (it loops through the codecs but still uses random bits off
the fuzzed data for policy decisions). I'm going to re-generate when I have some
spare time with the fuzzer from the original patch.
Charlie Harrison
OK I've uploaded a corpus that represents something better. Basically, I ran jsbell's original fuzzer ...
3 years, 9 months ago
(2017-03-16 00:35:27 UTC)
#29
OK I've uploaded a corpus that represents something better.
Basically, I ran jsbell's original fuzzer to merge the corpus. This produced a
minimized corpus of ~1.9MB.
Then, to make it compatible (ish) with the current fuzzer, I appended 3 null
bytes to the end of each file using printf '\0\0\0' >> <corpus_file>
mmoroz
On 2017/03/16 00:35:27, Charlie Harrison-slow til 3-20 wrote: > OK I've uploaded a corpus that ...
3 years, 9 months ago
(2017-03-16 10:26:41 UTC)
#30
On 2017/03/16 00:35:27, Charlie Harrison-slow til 3-20 wrote:
> OK I've uploaded a corpus that represents something better.
>
> Basically, I ran jsbell's original fuzzer to merge the corpus. This produced a
> minimized corpus of ~1.9MB.
>
> Then, to make it compatible (ish) with the current fuzzer, I appended 3 null
> bytes to the end of each file using printf '\0\0\0' >> <corpus_file>
That sounds great!
Haraken, would you PTAL for OWNERS and general directory structure (esp. fuzzer files).
3 years, 9 months ago
(2017-03-16 10:29:52 UTC)
#32
Haraken, would you PTAL for OWNERS and general directory structure (esp. fuzzer
files).
Charlie Harrison
Haraken, would you PTAL for OWNERS and general directory structure (esp. fuzzer corpus files).
3 years, 9 months ago
(2017-03-16 10:30:09 UTC)
#33
Haraken, would you PTAL for OWNERS and general directory structure (esp. fuzzer
corpus files).
Charlie Harrison
Description was changed from ========== Fuzzer for TextCodecs Fuzzer originally written by jsbell here: https://codereview.chromium.org/2546233002 ...
3 years, 9 months ago
(2017-03-16 10:32:06 UTC)
#34
Description was changed from
==========
Fuzzer for TextCodecs
Fuzzer originally written by jsbell here:
https://codereview.chromium.org/2546233002
This introduces a libFuzzer-based fuzzer (which can be run locally
or via ClusterFuzz) for the WTF::TextCodec implementations. It
exercises the codecs - some of which are implemented in blink,
like UTF-8, UTF-16, Latin1, and some of which come wrap ICU - with
all the argument permutations for encoding and decoding.
Fuzzer docs:
https://chromium.googlesource.com/chromium/src/+/master/testing/libfuzzer/get...
A generated corpus was uploaded to Google Cloud Storage per
the docs; no dictionary is added since any byte stream is
useful. Running the fuzzer locally, three bugs were already
found and fixed.
==========
to
==========
Fuzzer for TextCodecs
Fuzzer originally written by jsbell here:
https://codereview.chromium.org/2546233002
This introduces a libFuzzer-based fuzzer template (which can be run locally
or via ClusterFuzz) for the WTF::TextCodec implementations. It
introduces a new target for every text codec.
Fuzzer docs:
https://chromium.googlesource.com/chromium/src/+/master/testing/libfuzzer/get...
This patch also includes a generated fuzzer corpus.
==========
haraken
On 2017/03/16 10:30:09, Charlie Harrison-slow til 3-20 wrote: > Haraken, would you PTAL for OWNERS ...
3 years, 9 months ago
(2017-03-16 14:17:17 UTC)
#35
On 2017/03/16 10:30:09, Charlie Harrison-slow til 3-20 wrote:
> Haraken, would you PTAL for OWNERS and general directory structure (esp.
fuzzer
> corpus files).
Are other fuzzers adding the corpus binary files in a similar way?
mmoroz
On 2017/03/16 14:17:17, haraken wrote: > On 2017/03/16 10:30:09, Charlie Harrison-slow til 3-20 wrote: > ...
3 years, 9 months ago
(2017-03-16 14:20:54 UTC)
#36
On 2017/03/16 14:17:17, haraken wrote:
> On 2017/03/16 10:30:09, Charlie Harrison-slow til 3-20 wrote:
> > Haraken, would you PTAL for OWNERS and general directory structure (esp.
> fuzzer
> > corpus files).
>
> Are other fuzzers adding the corpus binary files in a similar way?
Yes: https://cs.chromium.org/search/?q=fuzz+corpus+package:%5Echromium$&type=cs
haraken
LGTM
3 years, 9 months ago
(2017-03-16 14:23:41 UTC)
#37
LGTM
Charlie Harrison
The CQ bit was checked by csharrison@chromium.org
3 years, 9 months ago
(2017-03-16 14:35:23 UTC)
#38
Issue 2731643002: Fuzzer for TextCodecs
(Closed)
Created 3 years, 9 months ago by Charlie Harrison
Modified 3 years, 9 months ago
Reviewers: mmoroz, jsbell, aizatsky1, haraken
Base URL:
Comments: 23