Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(73)

Issue 184323003: Manually set fFilterValues in SkConvolutionFilter1D. (Closed)

Created:
6 years, 9 months ago by Tom Hudson
Modified:
6 years, 9 months ago
CC:
skia-review_googlegroups.com
Visibility:
Public.

Description

Manually set fFilterValues in SkConvolutionFilter1D. Using fFilterValues.reset() or a loop of fFilterValues.push_back() is about 5x slower than calling fFilterValues.resize_back() and then looping using primitive [] and =. This is only going to show up if you apply https://codereview.chromium.org/183763047/, where it yields about 2.5% speedup in the bitmap resize microbenchmarks on a Linux desktop. Ceteris paribus, it should actually improve rasterization time of drawBitmapRectToRect() with a resize by about 5% in Chromium. BUG=skia:2258 R=humper@google.com Committed: http://code.google.com/p/skia/source/detail?r=13681

Patch Set 1 #

Patch Set 2 : #

Patch Set 3 : #

Patch Set 4 : #

Unified diffs Side-by-side diffs Delta from patch set Stats (+6 lines, -1 line) Patch
M src/core/SkConvolver.cpp View 1 2 3 2 chunks +6 lines, -1 line 0 comments Download

Messages

Total messages: 15 (0 generated)
tomhudson
PTAL
6 years, 9 months ago (2014-03-06 12:23:31 UTC) #1
humper
On 2014/03/06 12:23:31, tomhudson wrote: > PTAL LGTM, nice find.
6 years, 9 months ago (2014-03-06 14:42:11 UTC) #2
tomhudson
The CQ bit was checked by tomhudson@google.com
6 years, 9 months ago (2014-03-06 14:42:54 UTC) #3
commit-bot: I haz the power
CQ is trying da patch. Follow status at https://skia-tree-status.appspot.com/cq/tomhudson@chromium.org/184323003/1
6 years, 9 months ago (2014-03-06 14:43:02 UTC) #4
commit-bot: I haz the power
The CQ bit was unchecked by commit-bot@chromium.org
6 years, 9 months ago (2014-03-06 14:56:59 UTC) #5
commit-bot: I haz the power
Commit queue rejected this change because the description was changed between the time the change ...
6 years, 9 months ago (2014-03-06 14:57:00 UTC) #6
tomhudson
The CQ bit was checked by tomhudson@google.com
6 years, 9 months ago (2014-03-06 14:57:23 UTC) #7
commit-bot: I haz the power
CQ is trying da patch. Follow status at https://skia-tree-status.appspot.com/cq/tomhudson@chromium.org/184323003/1
6 years, 9 months ago (2014-03-06 14:57:32 UTC) #8
commit-bot: I haz the power
Change committed as 13681
6 years, 9 months ago (2014-03-06 14:57:49 UTC) #9
rmistry
On 2014/03/06 14:57:49, I haz the power (commit-bot) wrote: > Change committed as 13681 This ...
6 years, 9 months ago (2014-03-06 15:40:27 UTC) #10
tomhudson
Understand my error now. fFilterValues[] is a single array, not a separate array for each ...
6 years, 9 months ago (2014-03-06 17:01:14 UTC) #11
tomhudson
Patch set two is correct (as far as gm on Linux is concerned), but yields ...
6 years, 9 months ago (2014-03-06 18:01:08 UTC) #12
mtklein
On 2014/03/06 18:01:08, tomhudson wrote: > Patch set two is correct (as far as gm ...
6 years, 9 months ago (2014-03-06 18:21:08 UTC) #13
mtklein
On 2014/03/06 18:21:08, mtklein wrote: > On 2014/03/06 18:01:08, tomhudson wrote: > > Patch set ...
6 years, 9 months ago (2014-03-06 18:24:38 UTC) #14
tomhudson
6 years, 9 months ago (2014-03-07 11:18:03 UTC) #15
Message was sent while issue was closed.
On 2014/03/06 18:24:38, mtklein wrote:
> Actually, have I got it right that fFilterValues is just an SkTArray<short>? 
If
> that's the case, you can probably get a big, dare I say bigass, speedup just
by
> switching to SkTDArray.

Naive change in code produces zero change in performance (within noise):

+#include "SkTDArray.h"

+        //fFilterValues.push_back( val );
+        fFilterValues.append( val );

+    //SkTArray<ConvolutionFixed> fFilterValues;
+    SkTDArray<ConvolutionFixed> fFilterValues;

+        //fFilterValues.resize_back(initialOffset + filterLength);
+        fFilterValues.setCount(initialOffset + filterLength);

Patch set 4 is a bit more aggressive: if we assume that we have a square filter,
when we see the first bit of length L (including padding zeroes), allocate L^2.
This gets rid of all the incremental allocations for the square-filter special
case, which I believe is by far the most common. But since we landed
https://codereview.chromium.org/188743002 for a 2x speedup in that special case
anyway, it's only offering to trim 1-2% off the microbenchmarks. We're getting
closer to a (Linux) profile dominated by convolution, as you might hope for:

+  44.52%  [.] void convolveVertically_SSE2<false>(short const*, int, unsigned
char* const*, int,
+  26.82%  [.] convolve4RowsHorizontally_SSE2(unsigned char const**,
SkConvolutionFilter1D const&,
+   6.29%  [.] __memcpy_ssse3_back
+   5.04%  [.]
_ZN14SkResizeFilter14computeFiltersEiiifP21SkConvolutionFilter1DRK18SkConvolutionPr
+   3.33%  [.] SkMitchellFilter::evaluate(float) const
+   2.69%  [.] BGRAConvolve2D(unsigned char const*, int, bool,
SkConvolutionFilter1D const&, SkCon
+   1.31%  [.] convolveHorizontally_SSE2(unsigned char const*,
SkConvolutionFilter1D const&, unsig
+   0.84%  [.] SkConvolutionFilter1D::AddFilter(int, short const*, int)
+   0.71%  [.] _int_malloc

The surprising-to-me memcpy is supposedly because we're calling
SkMatrix::mapRect() with an identity matrix an AWFUL LOT. In one of the two
forms this ends up copying a rect over itself?

Patch set 4 is not for commit because it breaks non-square filters; but is the
modest over-allocation worth pursuing? (If filters are circular, we're
allocating 4r^2 instead of pi*r^2, but it's a single allocation that gets avoids
fragmentation or realloc copying.

Powered by Google App Engine
This is Rietveld 408576698