OLD | NEW |
(Empty) | |
| 1 # Introduction |
| 2 |
| 3 This document discusses the current state of filtering in libyuv. An emphasis on
maximum performance while avoiding memory exceptions, and minimal amount of cod
e/complexity. See future work at end. |
| 4 |
| 5 # LibYuv Filter Subsampling |
| 6 |
| 7 There are 2 challenges with subsampling |
| 8 |
| 9 1. centering of samples, which involves clamping on edges |
| 10 2. clipping a source region |
| 11 |
| 12 Centering depends on scale factor and filter mode. |
| 13 |
| 14 # Down Sampling |
| 15 |
| 16 If scaling down, the stepping rate is always src_width / dst_width. |
| 17 |
| 18 dx = src_width / dst_width; |
| 19 |
| 20 e.g. If scaling from 1280x720 to 640x360, the step thru the source will be 2.0,
stepping over 2 pixels of source for each pixel of destination. |
| 21 |
| 22 Centering, depends on filter mode. |
| 23 |
| 24 *Point* downsampling takes the middle pixel. |
| 25 |
| 26 x = dx >> 1; |
| 27 |
| 28 For odd scale factors (e.g. 3x down) this is exactly the middle. For even scale
factors, this rounds up and takes the pixel to the right of center. e.g. scale
of 4x down will take pixel 2. |
| 29 |
| 30 **Bilinear** filter, uses the 2x2 pixels in the middle. |
| 31 |
| 32 x = dx / 2 - 0.5; |
| 33 |
| 34 For odd scale factors (e.g. 3x down) this is exactly the middle, and point sampl
ing is used. |
| 35 For even scale factors, this evenly filters the middle 2x2 pixels. e.g. 4x down
will filter pixels 1,2 at 50% in both directions. |
| 36 |
| 37 **Box** filter averages the entire box so sampling starts at 0. |
| 38 |
| 39 x = 0; |
| 40 |
| 41 For a scale factor of 2x down, this is equivalent to bilinear. |
| 42 |
| 43 # Up Sampling |
| 44 |
| 45 **Point** upsampling use stepping rate of src_width / dst_width and a starting c
oordinate of 0. |
| 46 |
| 47 x = 0; |
| 48 dx = src_width / dst_width; |
| 49 |
| 50 e.g. If scaling from 640x360 to 1280x720 the step thru the source will be 0.0, s
tepping half a pixel of source for each pixel of destination. Each pixel is repl
icated by the scale factor. |
| 51 |
| 52 **Bilinear** filter stretches such that the first pixel of source maps to the fi
rst pixel of destination, and the last pixel of source maps to the last pixel of
destination. |
| 53 |
| 54 x = 0; |
| 55 dx = (src_width - 1) / (dst_width - 1); |
| 56 |
| 57 This method is not technically correct, and will likely change in the future. |
| 58 |
| 59 * It is inconsistent with the bilinear down sampler. The same method could be u
sed for down sampling, and then it would be more reversible, but that would prev
ent specialized 2x down sampling. |
| 60 * Although centered, the image is slightly magnified. |
| 61 * The filtering was changed in early 2013 - previously it used: |
| 62 |
| 63 x = 0; |
| 64 dx = (src_width - 1) / (dst_width - 1); |
| 65 |
| 66 Which is the correct scale factor, but shifted the image left, and extruded the
last pixel. The reason for the change was to remove the extruding code from the
low level row functions, allowing 3 functions to sshare the same row functions
- ARGBScale, I420Scale, and ARGBInterpolate. Then the one function was ported t
o many cpu variations: SSE2, SSSE3, AVX2, Neon and 'Any' version for any number
of pixels and alignment. The function is also specialized for 0,25,50,75%. |
| 67 |
| 68 The above goes still has the potential to read the last pixel 100% and last pixe
l + 1 0%, which may cause a memory exception. So the left pixel goes to a fract
ion less than the last pixel, but filters in the minimum amount of it, and the m
aximum of the last pixel. |
| 69 |
| 70 dx = FixedDiv((src_width << 16) - 0x00010001, (dst << 16) - 0x00010000); |
| 71 |
| 72 **Box** filter for upsampling switches over to Bilinear. |
| 73 |
| 74 # Scale snippet: |
| 75 |
| 76 #define CENTERSTART(dx, s) (dx < 0) ? -((-dx >> 1) + s) : ((dx >> 1) + s) |
| 77 #define FIXEDDIV1(src, dst) FixedDiv((src << 16) - 0x00010001, \ |
| 78 (dst << 16) - 0x00010000); |
| 79 |
| 80 // Compute slope values for stepping. |
| 81 void ScaleSlope(int src_width, int src_height, |
| 82 int dst_width, int dst_height, |
| 83 FilterMode filtering, |
| 84 int* x, int* y, int* dx, int* dy) { |
| 85 assert(x != NULL); |
| 86 assert(y != NULL); |
| 87 assert(dx != NULL); |
| 88 assert(dy != NULL); |
| 89 assert(src_width != 0); |
| 90 assert(src_height != 0); |
| 91 assert(dst_width > 0); |
| 92 assert(dst_height > 0); |
| 93 if (filtering == kFilterBox) { |
| 94 // Scale step for point sampling duplicates all pixels equally. |
| 95 *dx = FixedDiv(Abs(src_width), dst_width); |
| 96 *dy = FixedDiv(src_height, dst_height); |
| 97 *x = 0; |
| 98 *y = 0; |
| 99 } else if (filtering == kFilterBilinear) { |
| 100 // Scale step for bilinear sampling renders last pixel once for upsample
. |
| 101 if (dst_width <= Abs(src_width)) { |
| 102 *dx = FixedDiv(Abs(src_width), dst_width); |
| 103 *x = CENTERSTART(*dx, -32768); |
| 104 } else if (dst_width > 1) { |
| 105 *dx = FIXEDDIV1(Abs(src_width), dst_width); |
| 106 *x = 0; |
| 107 } |
| 108 if (dst_height <= src_height) { |
| 109 *dy = FixedDiv(src_height, dst_height); |
| 110 *y = CENTERSTART(*dy, -32768); // 32768 = -0.5 to center bilinear. |
| 111 } else if (dst_height > 1) { |
| 112 *dy = FIXEDDIV1(src_height, dst_height); |
| 113 *y = 0; |
| 114 } |
| 115 } else if (filtering == kFilterLinear) { |
| 116 // Scale step for bilinear sampling renders last pixel once for upsample
. |
| 117 if (dst_width <= Abs(src_width)) { |
| 118 *dx = FixedDiv(Abs(src_width), dst_width); |
| 119 *x = CENTERSTART(*dx, -32768); |
| 120 } else if (dst_width > 1) { |
| 121 *dx = FIXEDDIV1(Abs(src_width), dst_width); |
| 122 *x = 0; |
| 123 } |
| 124 *dy = FixedDiv(src_height, dst_height); |
| 125 *y = *dy >> 1; |
| 126 } else { |
| 127 // Scale step for point sampling duplicates all pixels equally. |
| 128 *dx = FixedDiv(Abs(src_width), dst_width); |
| 129 *dy = FixedDiv(src_height, dst_height); |
| 130 *x = CENTERSTART(*dx, 0); |
| 131 *y = CENTERSTART(*dy, 0); |
| 132 } |
| 133 // Negative src_width means horizontally mirror. |
| 134 if (src_width < 0) { |
| 135 *x += (dst_width - 1) * *dx; |
| 136 *dx = -*dx; |
| 137 src_width = -src_width; |
| 138 } |
| 139 } |
| 140 |
| 141 # Future Work |
| 142 |
| 143 Point sampling should ideally be the same as bilinear, but pixel by pixel, round
to nearest neighbor. But as is, it is reversible and exactly matches ffmpeg at
all scale factors, both up and down. The scale factor is |
| 144 |
| 145 dx = src_width / dst_width; |
| 146 |
| 147 The step value is centered for down sample: |
| 148 |
| 149 x = dx / 2; |
| 150 |
| 151 Or starts at 0 for upsample. |
| 152 |
| 153 x = 0; |
| 154 |
| 155 Bilinear filtering is currently correct for down sampling, but not for upsamplin
g. |
| 156 Upsampling is stretching the first and last pixel of source, to the first and la
st pixel of destination. |
| 157 |
| 158 dx = (src_width - 1) / (dst_width - 1);<br> |
| 159 x = 0; |
| 160 |
| 161 It should be stretching such that the first pixel is centered in the middle of t
he scale factor, to match the pixel that would be sampled for down sampling by t
he same amount. And same on last pixel. |
| 162 |
| 163 dx = src_width / dst_width;<br> |
| 164 x = dx / 2 - 0.5; |
| 165 |
| 166 This would start at -0.5 and go to last pixel + 0.5, sampling 50% from last pixe
l + 1. |
| 167 Then clamping would be needed. On GPUs there are numerous ways to clamp. |
| 168 |
| 169 1. Clamp the coordinate to the edge of the texture, duplicating the first and la
st pixel. |
| 170 2. Blend with a constant color, such as transparent black. Typically best for f
onts. |
| 171 3. Mirror the UV coordinate, which is similar to clamping. Good for continuous
tone images. |
| 172 4. Wrap the coordinate, for texture tiling. |
| 173 5. Allow the coordinate to index beyond the image, which may be the correct data
if sampling a subimage. |
| 174 6. Extrapolate the edge based on the previous pixel. pixel -0.5 is computed fro
m slope of pixel 0 and 1. |
| 175 |
| 176 Some of these are computational, even for a GPU, which is one reason textures ar
e sometimes limited to power of 2 sizes. |
| 177 We do care about the clipping case, where allowing coordinates to become negativ
e and index pixels before the image is the correct data. But normally for simpl
e scaling, we want to clamp to the edge pixel. For example, if bilinear scaling
from 3x3 to 30x30, we’d essentially want 10 pixels of each of the original 3 pi
xels. But we want the original pixels to land in the middle of each 10 pixels,
at offsets 5, 15 and 25. There would be filtering between 5 and 15 between the
original pixels 0 and 1. And filtering between 15 and 25 from original pixels 1
and 2. The first 5 pixels are clamped to pixel 0 and the last 5 pixels are cla
mped to pixel 2. |
| 178 The easiest way to implement this is copy the original 3 pixels to a buffer, and
duplicate the first and last pixels. 0,1,2 becomes 0, 0,1,2, 2. Then implemen
t a filtering without clamping. We call this source extruding. Its only necess
ary on up sampling, since down sampler will always have valid surrounding pixels
. |
| 179 Extruding is practical when the image is already copied to a temporary buffer.
It could be done to the original image, as long as the original memory is resto
red, but valgrind and/or memory protection would disallow this, so it requires a
memcpy to a temporary buffer, which may hurt performance. The memcpy has a per
formance advantage, from a cache point of view, that can actually make this tech
nique faster, depending on hardware characteristics. |
| 180 Vertical extrusion can be done with a memcpy of the first/last row, or clamping
a pointer. |
| 181 |
| 182 |
| 183 The other way to implement clamping is handle the edges with a memset. e.g. Rea
d first source pixel and memset the first 5 pixels. Filter pixels 0,1,2 to 5 to
25. Read last pixel and memset the last 5 pixels. Blur is implemented with th
is method like this, which has 3 loops per row - left, middle and right. |
| 184 |
| 185 Box filter is only used for 2x down sample or more. Its based on integer sized
boxes. Technically it should be filtered edges, but thats substantially slower
(roughly 100x), and at that point you may as well do a cubic filter which is mor
e correct. |
| 186 |
| 187 Box filter currently sums rows into a row buffer. It does this with |
| 188 |
| 189 Mirroring will use the same slope as normal, but with a negative. |
| 190 The starting coordinate needs to consider the scale factor and filter. e.g. box
filter of 30x30 to 3x3 with mirroring would use -10 for step, but x = 20. widt
h (30) - dx. |
| 191 |
| 192 Step needs to be accurate, so it uses an integer divide. This is as much as 5%
of the profile. An approximated divide is substantially faster, but the inaccur
acy causes stepping beyond the original image boundaries. 3 general solutions: |
| 193 |
| 194 1. copy image to buffer with padding. allows for small errors in stepping. |
| 195 2. hash the divide, so common values are quickly found. |
| 196 3. change api so caller provides the slope. |
OLD | NEW |