Chromium Code Reviews

Side by Side Diff: google.patch

Issue 1258673007: Add jpeg_skip_scanlines() API to libjpeg-turbo (Closed) Base URL: https://chromium.googlesource.com/chromium/deps/libjpeg_turbo.git@master
Patch Set: Updating google.patch Created 5 years, 4 months ago
1 Index: README
2 ===================================================================
3 --- README (revision 829)
4 +++ README (working copy)
5 @@ -1,26 +1,26 @@
6 +libjpeg-turbo note: This file has been modified by The libjpeg-turbo Project
7 +to include only information relevant to libjpeg-turbo, to wordsmith certain
8 +sections, and to remove impolitic language that existed in the libjpeg v8
9 +README. It is included only for reference. Please see README-turbo.txt for
10 +information specific to libjpeg-turbo.
11 +
12 +
13 The Independent JPEG Group's JPEG software
14 ==========================================
15
16 -README for release 6b of 27-Mar-1998
17 -====================================
18 +This distribution contains a release of the Independent JPEG Group's free JPEG
19 +software. You are welcome to redistribute this software and to use it for any
20 +purpose, subject to the conditions under LEGAL ISSUES, below.
21
22 -This distribution contains the sixth public release of the Independent JPEG
23 -Group's free JPEG software. You are welcome to redistribute this software and
24 -to use it for any purpose, subject to the conditions under LEGAL ISSUES, below.
25 +This software is the work of Tom Lane, Guido Vollbeding, Philip Gladstone,
26 +Bill Allombert, Jim Boucher, Lee Crocker, Bob Friesenhahn, Ben Jackson,
27 +Julian Minguillon, Luis Ortiz, George Phillips, Davide Rossi, Ge' Weijers,
28 +and other members of the Independent JPEG Group.
29
30 -Serious users of this software (particularly those incorporating it into
31 -larger programs) should contact IJG at jpeg-info@uunet.uu.net to be added to
32 -our electronic mailing list. Mailing list members are notified of updates
33 -and have a chance to participate in technical discussions, etc.
34 +IJG is not affiliated with the ISO/IEC JTC1/SC29/WG1 standards committee
35 +(also known as JPEG, together with ITU-T SG16).
36
37 -This software is the work of Tom Lane, Philip Gladstone, Jim Boucher,
38 -Lee Crocker, Julian Minguillon, Luis Ortiz, George Phillips, Davide Rossi,
39 -Guido Vollbeding, Ge' Weijers, and other members of the Independent JPEG
40 -Group.
41
42 -IJG is not affiliated with the official ISO JPEG standards committee.
43 -
44 -
45 DOCUMENTATION ROADMAP
46 =====================
47
48 @@ -30,7 +30,6 @@
49 LEGAL ISSUES Copyright, lack of warranty, terms of distribution.
50 REFERENCES Where to learn more about JPEG.
51 ARCHIVE LOCATIONS Where to find newer versions of this software.
52 -RELATED SOFTWARE Other stuff you should get.
53 FILE FORMAT WARS Software *not* to get.
54 TO DO Plans for future IJG releases.
55
56 @@ -37,20 +36,19 @@
57 Other documentation files in the distribution are:
58
59 User documentation:
60 - install.doc How to configure and install the IJG software.
61 - usage.doc Usage instructions for cjpeg, djpeg, jpegtran,
62 + install.txt How to configure and install the IJG software.
63 + usage.txt Usage instructions for cjpeg, djpeg, jpegtran,
64 rdjpgcom, and wrjpgcom.
65 - *.1 Unix-style man pages for programs (same info as usage.doc).
66 - wizard.doc Advanced usage instructions for JPEG wizards only.
67 + *.1 Unix-style man pages for programs (same info as usage.txt).
68 + wizard.txt Advanced usage instructions for JPEG wizards only.
69 change.log Version-to-version change highlights.
70 Programmer and internal documentation:
71 - libjpeg.doc How to use the JPEG library in your own programs.
72 + libjpeg.txt How to use the JPEG library in your own programs.
73 example.c Sample code for calling the JPEG library.
74 - structure.doc Overview of the JPEG library's internal structure.
75 - filelist.doc Road map of IJG files.
76 - coderules.doc Coding style rules --- please read if you contribute code.
77 + structure.txt Overview of the JPEG library's internal structure.
78 + coderules.txt Coding style rules --- please read if you contribute code.
79
80 -Please read at least the files install.doc and usage.doc. Useful information
81 +Please read at least the files install.txt and usage.txt. Some information
82 can also be found in the JPEG FAQ (Frequently Asked Questions) article. See
83 ARCHIVE LOCATIONS below to find out where to obtain the FAQ article.
84
85 @@ -62,24 +60,27 @@
86 OVERVIEW
87 ========
88
89 -This package contains C software to implement JPEG image compression and
90 -decompression. JPEG (pronounced "jay-peg") is a standardized compression
91 -method for full-color and gray-scale images. JPEG is intended for compressing
92 -"real-world" scenes; line drawings, cartoons and other non-realistic images
93 -are not its strong suit. JPEG is lossy, meaning that the output image is not
94 -exactly identical to the input image. Hence you must not use JPEG if you
95 -have to have identical output bits. However, on typical photographic images,
96 -very good compression levels can be obtained with no visible change, and
97 -remarkably high compression levels are possible if you can tolerate a
98 -low-quality image. For more details, see the references, or just experiment
99 -with various compression settings.
100 +This package contains C software to implement JPEG image encoding, decoding,
101 +and transcoding. JPEG (pronounced "jay-peg") is a standardized compression
102 +method for full-color and gray-scale images. JPEG's strong suit is compressing
103 +photographic images or other types of images that have smooth color and
104 +brightness transitions between neighboring pixels. Images with sharp lines or
105 +other abrupt features may not compress well with JPEG, and a higher JPEG
106 +quality may have to be used to avoid visible compression artifacts with such
107 +images.
108
109 +JPEG is lossy, meaning that the output pixels are not necessarily identical to
110 +the input pixels. However, on photographic content and other "smooth" images,
111 +very good compression ratios can be obtained with no visible compression
112 +artifacts, and extremely high compression ratios are possible if you are
113 +willing to sacrifice image quality (by reducing the "quality" setting in the
114 +compressor.)
115 +
116 This software implements JPEG baseline, extended-sequential, and progressive
117 compression processes. Provision is made for supporting all variants of these
118 processes, although some uncommon parameter settings aren't implemented yet.
119 -For legal reasons, we are not distributing code for the arithmetic-coding
120 -variants of JPEG; see LEGAL ISSUES. We have made no provision for supporting
121 -the hierarchical or lossless processes defined in the standard.
122 +We have made no provision for supporting the hierarchical or lossless
123 +processes defined in the standard.
124
125 We provide a set of library routines for reading and writing JPEG image files,
126 plus two sample applications "cjpeg" and "djpeg", which use the library to
127 @@ -91,11 +92,12 @@
128 for example, the color quantization modules are not strictly part of JPEG
129 decoding, but they are essential for output to colormapped file formats or
130 colormapped displays. These extra functions can be compiled out of the
131 -library if not required for a particular application. We have also included
132 -"jpegtran", a utility for lossless transcoding between different JPEG
133 -processes, and "rdjpgcom" and "wrjpgcom", two simple applications for
134 -inserting and extracting textual comments in JFIF files.
135 +library if not required for a particular application.
136
137 +We have also included "jpegtran", a utility for lossless transcoding between
138 +different JPEG processes, and "rdjpgcom" and "wrjpgcom", two simple
139 +applications for inserting and extracting textual comments in JFIF files.
140 +
141 The emphasis in designing this software has been on achieving portability and
142 flexibility, while also making it fast enough to be useful. In particular,
143 the software is not intended to be read as a tutorial on JPEG. (See the
144 @@ -127,7 +129,7 @@
145 fitness for a particular purpose. This software is provided "AS IS", and you,
146 its user, assume the entire risk as to its quality and accuracy.
147
148 -This software is copyright (C) 1991-1998, Thomas G. Lane.
149 +This software is copyright (C) 1991-2012, Thomas G. Lane, Guido Vollbeding.
150 All Rights Reserved except as specified below.
151
152 Permission is hereby granted to use, copy, modify, and distribute this
153 @@ -158,30 +160,12 @@
154 assumed by the product vendor.
155
156
157 -ansi2knr.c is included in this distribution by permission of L. Peter Deutsch,
158 -sole proprietor of its copyright holder, Aladdin Enterprises of Menlo Park, CA.
159 -ansi2knr.c is NOT covered by the above copyright and conditions, but instead
160 -by the usual distribution terms of the Free Software Foundation; principally,
161 -that you must include source code if you redistribute it. (See the file
162 -ansi2knr.c for full details.) However, since ansi2knr.c is not needed as part
163 -of any program generated from the IJG code, this does not limit you more than
164 -the foregoing paragraphs do.
165 -
166 The Unix configuration script "configure" was produced with GNU Autoconf.
167 It is copyright by the Free Software Foundation but is freely distributable.
168 The same holds for its supporting scripts (config.guess, config.sub,
169 -ltconfig, ltmain.sh). Another support script, install-sh, is copyright
170 -by M.I.T. but is also freely distributable.
171 +ltmain.sh). Another support script, install-sh, is copyright by X Consortium
172 +but is also freely distributable.
173
174 -It appears that the arithmetic coding option of the JPEG spec is covered by
175 -patents owned by IBM, AT&T, and Mitsubishi. Hence arithmetic coding cannot
176 -legally be used without obtaining one or more licenses. For this reason,
177 -support for arithmetic coding has been removed from the free JPEG software.
178 -(Since arithmetic coding provides only a marginal gain over the unpatented
179 -Huffman mode, it is unlikely that very many implementations will support it.)
180 -So far as we are aware, there are no patent restrictions on the remaining
181 -code.
182 -
183 The IJG distribution formerly included code to read and write GIF files.
184 To avoid entanglement with the Unisys LZW patent, GIF reading support has
185 been removed altogether, and the GIF writer has been simplified to produce
186 @@ -198,7 +182,7 @@
187 REFERENCES
188 ==========
189
190 -We highly recommend reading one or more of these references before trying to
191 +We recommend reading one or more of these references before trying to
192 understand the innards of the JPEG software.
193
194 The best short technical introduction to the JPEG compression algorithm is
195 @@ -207,7 +191,7 @@
196 (Adjacent articles in that issue discuss MPEG motion picture compression,
197 applications of JPEG, and related topics.) If you don't have the CACM issue
198 handy, a PostScript file containing a revised version of Wallace's article is
199 -available at ftp://ftp.uu.net/graphics/jpeg/wallace.ps.gz. The file (actually
200 +available at http://www.ijg.org/files/wallace.ps.gz. The file (actually
201 a preprint for an article that appeared in IEEE Trans. Consumer Electronics)
202 omits the sample images that appeared in CACM, but it includes corrections
203 and some added material. Note: the Wallace article is copyright ACM and IEEE,
204 @@ -222,45 +206,29 @@
205 sample code is far from industrial-strength, but when you are ready to look
206 at a full implementation, you've got one here...
207
208 -The best full description of JPEG is the textbook "JPEG Still Image Data
209 -Compression Standard" by William B. Pennebaker and Joan L. Mitchell, published
210 -by Van Nostrand Reinhold, 1993, ISBN 0-442-01272-1. Price US$59.95, 638 pp.
211 -The book includes the complete text of the ISO JPEG standards (DIS 10918-1
212 -and draft DIS 10918-2). This is by far the most complete exposition of JPEG
213 -in existence, and we highly recommend it.
214 +The best currently available description of JPEG is the textbook "JPEG Still
215 +Image Data Compression Standard" by William B. Pennebaker and Joan L.
216 +Mitchell, published by Van Nostrand Reinhold, 1993, ISBN 0-442-01272-1.
217 +Price US$59.95, 638 pp. The book includes the complete text of the ISO JPEG
218 +standards (DIS 10918-1 and draft DIS 10918-2).
219
220 -The JPEG standard itself is not available electronically; you must order a
221 -paper copy through ISO or ITU. (Unless you feel a need to own a certified
222 -official copy, we recommend buying the Pennebaker and Mitchell book instead;
223 -it's much cheaper and includes a great deal of useful explanatory material.)
224 -In the USA, copies of the standard may be ordered from ANSI Sales at (212)
225 -642-4900, or from Global Engineering Documents at (800) 854-7179. (ANSI
226 -doesn't take credit card orders, but Global does.) It's not cheap: as of
227 -1992, ANSI was charging $95 for Part 1 and $47 for Part 2, plus 7%
228 -shipping/handling. The standard is divided into two parts, Part 1 being the
229 -actual specification, while Part 2 covers compliance testing methods. Part 1
230 -is titled "Digital Compression and Coding of Continuous-tone Still Images,
231 +The original JPEG standard is divided into two parts, Part 1 being the actual
232 +specification, while Part 2 covers compliance testing methods. Part 1 is
233 +titled "Digital Compression and Coding of Continuous-tone Still Images,
234 Part 1: Requirements and guidelines" and has document numbers ISO/IEC IS
235 10918-1, ITU-T T.81. Part 2 is titled "Digital Compression and Coding of
236 Continuous-tone Still Images, Part 2: Compliance testing" and has document
237 numbers ISO/IEC IS 10918-2, ITU-T T.83.
238
239 -Some extensions to the original JPEG standard are defined in JPEG Part 3,
240 -a newer ISO standard numbered ISO/IEC IS 10918-3 and ITU-T T.84. IJG
241 -currently does not support any Part 3 extensions.
242 -
243 The JPEG standard does not specify all details of an interchangeable file
244 format. For the omitted details we follow the "JFIF" conventions, revision
245 -1.02. A copy of the JFIF spec is available from:
246 - Literature Department
247 - C-Cube Microsystems, Inc.
248 - 1778 McCarthy Blvd.
249 - Milpitas, CA 95035
250 - phone (408) 944-6300, fax (408) 944-6314
251 -A PostScript version of this document is available by FTP at
252 -ftp://ftp.uu.net/graphics/jpeg/jfif.ps.gz. There is also a plain text
253 -version at ftp://ftp.uu.net/graphics/jpeg/jfif.txt.gz, but it is missing
254 -the figures.
255 +1.02. JFIF 1.02 has been adopted as an Ecma International Technical Report
256 +and thus received a formal publication status. It is available as a free
257 +download in PDF format from
258 +http://www.ecma-international.org/publications/techreports/E-TR-098.htm.
259 +A PostScript version of the JFIF document is available at
260 +http://www.ijg.org/files/jfif.ps.gz. There is also a plain text version at
261 +http://www.ijg.org/files/jfif.txt.gz, but it is missing the figures.
262
263 The TIFF 6.0 file format specification can be obtained by FTP from
264 ftp://ftp.sgi.com/graphics/tiff/TIFF6.ps.gz. The JPEG incorporation scheme
265 @@ -267,37 +235,24 @@
266 found in the TIFF 6.0 spec of 3-June-92 has a number of serious problems.
267 IJG does not recommend use of the TIFF 6.0 design (TIFF Compression tag 6).
268 Instead, we recommend the JPEG design proposed by TIFF Technical Note #2
269 -(Compression tag 7). Copies of this Note can be obtained from ftp.sgi.com or
270 -from ftp://ftp.uu.net/graphics/jpeg/. It is expected that the next revision
271 +(Compression tag 7). Copies of this Note can be obtained from
272 +http://www.ijg.org/files/. It is expected that the next revision
273 of the TIFF spec will replace the 6.0 JPEG design with the Note's design.
274 Although IJG's own code does not support TIFF/JPEG, the free libtiff library
275 -uses our library to implement TIFF/JPEG per the Note. libtiff is available
276 -from ftp://ftp.sgi.com/graphics/tiff/.
277 +uses our library to implement TIFF/JPEG per the Note.
278
279
280 ARCHIVE LOCATIONS
281 =================
282
283 -The "official" archive site for this software is ftp.uu.net (Internet
284 -address 192.48.96.9). The most recent released version can always be found
285 -there in directory graphics/jpeg. This particular version will be archived
286 -as ftp://ftp.uu.net/graphics/jpeg/jpegsrc.v6b.tar.gz. If you don't have
287 -direct Internet access, UUNET's archives are also available via UUCP; contact
288 -help@uunet.uu.net for information on retrieving files that way.
289 +The "official" archive site for this software is www.ijg.org.
290 +The most recent released version can always be found there in
291 +directory "files". This particular version will be archived as
292 +http://www.ijg.org/files/jpegsrc.v8d.tar.gz, and in Windows-compatible
293 +"zip" archive format as http://www.ijg.org/files/jpegsr8d.zip.
294
295 -Numerous Internet sites maintain copies of the UUNET files. However, only
296 -ftp.uu.net is guaranteed to have the latest official version.
297 -
298 -You can also obtain this software in DOS-compatible "zip" archive format from
299 -the SimTel archives (ftp://ftp.simtel.net/pub/simtelnet/msdos/graphics/), or
300 -on CompuServe in the Graphics Support forum (GO CIS:GRAPHSUP), library 12
301 -"JPEG Tools". Again, these versions may sometimes lag behind the ftp.uu.net
302 -release.
303 -
304 -The JPEG FAQ (Frequently Asked Questions) article is a useful source of
305 -general information about JPEG. It is updated constantly and therefore is
306 -not included in this distribution. The FAQ is posted every two weeks to
307 -Usenet newsgroups comp.graphics.misc, news.answers, and other groups.
308 +The JPEG FAQ (Frequently Asked Questions) article is a source of some
309 +general information about JPEG.
310 It is available on the World Wide Web at http://www.faqs.org/faqs/jpeg-faq/
311 and other news.answers archive sites, including the official news.answers
312 archive at rtfm.mit.edu: ftp://rtfm.mit.edu/pub/usenet/news.answers/jpeg-faq/.
313 @@ -307,79 +262,21 @@
314 send usenet/news.answers/jpeg-faq/part2
315
316
317 -RELATED SOFTWARE
318 -================
319 -
320 -Numerous viewing and image manipulation programs now support JPEG. (Quite a
321 -few of them use this library to do so.) The JPEG FAQ described above lists
322 -some of the more popular free and shareware viewers, and tells where to
323 -obtain them on Internet.
324 -
325 -If you are on a Unix machine, we highly recommend Jef Poskanzer's free
326 -PBMPLUS software, which provides many useful operations on PPM-format image
327 -files. In particular, it can convert PPM images to and from a wide range of
328 -other formats, thus making cjpeg/djpeg considerably more useful. The latest
329 -version is distributed by the NetPBM group, and is available from numerous
330 -sites, notably ftp://wuarchive.wustl.edu/graphics/graphics/packages/NetPBM/.
331 -Unfortunately PBMPLUS/NETPBM is not nearly as portable as the IJG software is;
332 -you are likely to have difficulty making it work on any non-Unix machine.
333 -
334 -A different free JPEG implementation, written by the PVRG group at Stanford,
335 -is available from ftp://havefun.stanford.edu/pub/jpeg/. This program
336 -is designed for research and experimentation rather than production use;
337 -it is slower, harder to use, and less portable than the IJG code, but it
338 -is easier to read and modify. Also, the PVRG code supports lossless JPEG,
339 -which we do not. (On the other hand, it doesn't do progressive JPEG.)
340 -
341 -
342 FILE FORMAT WARS
343 ================
344
345 -Some JPEG programs produce files that are not compatible with our library.
346 -The root of the problem is that the ISO JPEG committee failed to specify a
347 -concrete file format. Some vendors "filled in the blanks" on their own,
348 -creating proprietary formats that no one else could read. (For example, none
349 -of the early commercial JPEG implementations for the Macintosh were able to
350 -exchange compressed files.)
351 +The ISO/IEC JTC1/SC29/WG1 standards committee (also known as JPEG, together
352 +with ITU-T SG16) currently promotes different formats containing the name
353 +"JPEG" which are incompatible with original DCT-based JPEG. IJG therefore does
354 +not support these formats (see REFERENCES). Indeed, one of the original
355 +reasons for developing this free software was to help force convergence on
356 +common, interoperable format standards for JPEG files.
357 +Don't use an incompatible file format!
358 +(In any case, our decoder will remain capable of reading existing JPEG
359 +image files indefinitely.)
360
361 -The file format we have adopted is called JFIF (see REFERENCES). This format
362 -has been agreed to by a number of major commercial JPEG vendors, and it has
363 -become the de facto standard. JFIF is a minimal or "low end" representation.
364 -We recommend the use of TIFF/JPEG (TIFF revision 6.0 as modified by TIFF
365 -Technical Note #2) for "high end" applications that need to record a lot of
366 -additional data about an image. TIFF/JPEG is fairly new and not yet widely
367 -supported, unfortunately.
368
369 -The upcoming JPEG Part 3 standard defines a file format called SPIFF.
370 -SPIFF is interoperable with JFIF, in the sense that most JFIF decoders should
371 -be able to read the most common variant of SPIFF. SPIFF has some technical
372 -advantages over JFIF, but its major claim to fame is simply that it is an
373 -official standard rather than an informal one. At this point it is unclear
374 -whether SPIFF will supersede JFIF or whether JFIF will remain the de-facto
375 -standard. IJG intends to support SPIFF once the standard is frozen, but we
376 -have not decided whether it should become our default output format or not.
377 -(In any case, our decoder will remain capable of reading JFIF indefinitely.)
378 -
379 -Various proprietary file formats incorporating JPEG compression also exist.
380 -We have little or no sympathy for the existence of these formats. Indeed,
381 -one of the original reasons for developing this free software was to help
382 -force convergence on common, open format standards for JPEG files. Don't
383 -use a proprietary file format!
384 -
385 -
386 TO DO
387 =====
388
389 -The major thrust for v7 will probably be improvement of visual quality.
390 -The current method for scaling the quantization tables is known not to be
391 -very good at low Q values. We also intend to investigate block boundary
392 -smoothing, "poor man's variable quantization", and other means of improving
393 -quality-vs-file-size performance without sacrificing compatibility.
394 -
395 -In future versions, we are considering supporting some of the upcoming JPEG
396 -Part 3 extensions --- principally, variable quantization and the SPIFF file
397 -format.
398 -
399 -As always, speeding things up is of great interest.
400 -
401 -Please send bug reports, offers of help, etc. to jpeg-info@uunet.uu.net.
402 +Please send bug reports, offers of help, etc. to jpeg-info@jpegclub.org.
403 Index: bmp.c
404 ===================================================================
405 --- bmp.c (revision 829)
406 +++ bmp.c (working copy)
407 @@ -1,370 +1,274 @@
408 -/* Copyright (C)2004 Landmark Graphics Corporation
409 - * Copyright (C)2005 Sun Microsystems, Inc.
410 +/*
411 + * Copyright (C)2011 D. R. Commander. All Rights Reserved.
412 *
413 - * This library is free software and may be redistributed and/or modified under
414 - * the terms of the wxWindows Library License, Version 3.1 or (at your option)
415 - * any later version. The full license is in the LICENSE.txt file included
416 - * with this distribution.
417 + * Redistribution and use in source and binary forms, with or without
418 + * modification, are permitted provided that the following conditions are met:
419 *
420 - * This library is distributed in the hope that it will be useful,
421 - * but WITHOUT ANY WARRANTY; without even the implied warranty of
422 - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
423 - * wxWindows Library License for more details.
424 -*/
425 + * - Redistributions of source code must retain the above copyright notice,
426 + * this list of conditions and the following disclaimer.
427 + * - Redistributions in binary form must reproduce the above copyright notice,
428 + * this list of conditions and the following disclaimer in the documentation
429 + * and/or other materials provided with the distribution.
430 + * - Neither the name of the libjpeg-turbo Project nor the names of its
431 + * contributors may be used to endorse or promote products derived from this
432 + * software without specific prior written permission.
433 + *
434 + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS",
435 + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
436 + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
437 + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE
438 + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
439 + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
440 + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
441 + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
442 + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
443 + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
444 + * POSSIBILITY OF SUCH DAMAGE.
445 + */
446
447 -#include <fcntl.h>
448 -#include <sys/types.h>
449 -#include <sys/stat.h>
450 -#include <errno.h>
451 -#include <stdlib.h>
452 #include <stdio.h>
453 #include <string.h>
454 -#ifdef _WIN32
455 - #include <io.h>
456 -#else
457 - #include <unistd.h>
458 -#endif
459 -#include "./rrutil.h"
460 -#include "./bmp.h"
461 +#include <setjmp.h>
462 +#include <errno.h>
463 +#include "cdjpeg.h"
464 +#include <jpeglib.h>
465 +#include <jpegint.h>
466 +#include "tjutil.h"
467 +#include "bmp.h"
468
469 -#ifndef BI_BITFIELDS
470 -#define BI_BITFIELDS 3L
471 -#endif
472 -#ifndef BI_RGB
473 -#define BI_RGB 0L
474 -#endif
475
476 -#define BMPHDRSIZE 54
477 -typedef struct _bmphdr
478 -{
479 - unsigned short bfType;
480 - unsigned int bfSize;
481 - unsigned short bfReserved1, bfReserved2;
482 - unsigned int bfOffBits;
483 +/* This duplicates the functionality of the VirtualGL bitmap library using
484 + the components from cjpeg and djpeg */
485
486 - unsigned int biSize;
487 - int biWidth, biHeight;
488 - unsigned short biPlanes, biBitCount;
489 - unsigned int biCompression, biSizeImage;
490 - int biXPelsPerMeter, biYPelsPerMeter;
491 - unsigned int biClrUsed, biClrImportant;
492 -} bmphdr;
493
494 -static const char *__bmperr="No error";
495 +/* Error handling (based on example in example.c) */
496
497 -static const int ps[BMPPIXELFORMATS]={3, 4, 3, 4, 4, 4};
498 -static const int roffset[BMPPIXELFORMATS]={0, 0, 2, 2, 3, 1};
499 -static const int goffset[BMPPIXELFORMATS]={1, 1, 1, 1, 2, 2};
500 -static const int boffset[BMPPIXELFORMATS]={2, 2, 0, 0, 1, 3};
501 +static char errStr[JMSG_LENGTH_MAX]="No error";
502
503 -#define _throw(m) {__bmperr=m; retcode=-1; goto finally;}
504 -#define _unix(f) {if((f)==-1) _throw(strerror(errno));}
505 -#define _catch(f) {if((f)==-1) {retcode=-1; goto finally;}}
506 +struct my_error_mgr
507 +{
508 + struct jpeg_error_mgr pub;
509 + jmp_buf setjmp_buffer;
510 +};
511 +typedef struct my_error_mgr *my_error_ptr;
512
513 -#define readme(fd, addr, size) \
514 - if((bytesread=read(fd, addr, (size)))==-1) _throw(strerror(errno)); \
515 - if(bytesread!=(size)) _throw("Read error");
516 -
517 -void pixelconvert(unsigned char *srcbuf, enum BMPPIXELFORMAT srcformat,
518 - int srcpitch, unsigned char *dstbuf, enum BMPPIXELFORMAT dstformat, int dstpitch,
519 - int w, int h, int flip)
520 +static void my_error_exit(j_common_ptr cinfo)
521 {
522 - unsigned char *srcptr, *srcptr0, *dstptr, *dstptr0;
523 - int i, j;
524 -
525 - srcptr=flip? &srcbuf[srcpitch*(h-1)]:srcbuf;
526 - for(j=0, dstptr=dstbuf; j<h; j++,
527 - srcptr+=flip? -srcpitch:srcpitch, dstptr+=dstpitch)
528 - {
529 - for(i=0, srcptr0=srcptr, dstptr0=dstptr; i<w; i++,
530 - srcptr0+=ps[srcformat], dstptr0+=ps[dstformat])
531 - {
532 - dstptr0[roffset[dstformat]]=srcptr0[roffset[srcformat]];
533 - dstptr0[goffset[dstformat]]=srcptr0[goffset[srcformat]];
534 - dstptr0[boffset[dstformat]]=srcptr0[boffset[srcformat]];
535 - }
536 - }
537 + my_error_ptr myerr=(my_error_ptr)cinfo->err;
538 + (*cinfo->err->output_message)(cinfo);
539 + longjmp(myerr->setjmp_buffer, 1);
540 }
541
542 -int loadppm(int *fd, unsigned char **buf, int *w, int *h,
543 - enum BMPPIXELFORMAT f, int align, int dstbottomup, int ascii)
544 +/* Based on output_message() in jerror.c */
545 +
546 +static void my_output_message(j_common_ptr cinfo)
547 {
548 - FILE *fs=NULL; int retcode=0, scalefactor, dstpitch;
549 - unsigned char *tempbuf=NULL; char temps[255], temps2[255];
550 - int numread=0, totalread=0, pixel[3], i, j;
551 + (*cinfo->err->format_message)(cinfo, errStr);
552 +}
553
554 - if((fs=fdopen(*fd, "r"))==NULL) _throw(strerror(errno));
555 +#define _throw(m) {snprintf(errStr, JMSG_LENGTH_MAX, "%s", m); \
556 + retval=-1; goto bailout;}
557 +#define _throwunix(m) {snprintf(errStr, JMSG_LENGTH_MAX, "%s\n%s", m, \
558 + strerror(errno)); retval=-1; goto bailout;}
559
560 - do
561 - {
562 - if(!fgets(temps, 255, fs)) _throw("Read error");
563 - if(strlen(temps)==0 || temps[0]=='\n') continue;
564 - if(sscanf(temps, "%s", temps2)==1 && temps2[1]=='#') continue;
565 - switch(totalread)
566 - {
567 - case 0:
568 - if((numread=sscanf(temps, "%d %d %d", w, h, &scalefactor))==EOF)
569 - _throw("Read error");
570 - break;
571 - case 1:
572 - if((numread=sscanf(temps, "%d %d", h, &scalefactor))==EOF)
573 - _throw("Read error");
574 - break;
575 - case 2:
576 - if((numread=sscanf(temps, "%d", &scalefactor))==EOF)
577 - _throw("Read error");
578 - break;
579 - }
580 - totalread+=numread;
581 - } while(totalread<3);
582 - if((*w)<1 || (*h)<1 || scalefactor<1) _throw("Corrupt PPM header");
583
584 - dstpitch=(((*w)*ps[f])+(align-1))&(~(align-1));
585 - if((*buf=(unsigned char *)malloc(dstpitch*(*h)))==NULL)
586 - _throw("Memory allocation error");
587 - if(ascii)
588 +static void pixelconvert(unsigned char *srcbuf, int srcpf, int srcbottomup,
589 + unsigned char *dstbuf, int dstpf, int dstbottomup, int w, int h)
590 +{
591 + unsigned char *srcptr=srcbuf, *srcptr2;
592 + int srcps=tjPixelSize[srcpf];
593 + int srcstride=srcbottomup? -w*srcps:w*srcps;
594 + unsigned char *dstptr=dstbuf, *dstptr2;
595 + int dstps=tjPixelSize[dstpf];
596 + int dststride=dstbottomup? -w*dstps:w*dstps;
597 + int row, col;
598 +
599 + if(srcbottomup) srcptr=&srcbuf[w*srcps*(h-1)];
600 + if(dstbottomup) dstptr=&dstbuf[w*dstps*(h-1)];
601 + for(row=0; row<h; row++, srcptr+=srcstride, dstptr+=dststride)
602 {
603 - for(j=0; j<*h; j++)
604 + for(col=0, srcptr2=srcptr, dstptr2=dstptr; col<w; col++, srcptr2+=srcps,
605 + dstptr2+=dstps)
606 {
607 - for(i=0; i<*w; i++)
608 - {
609 - if(fscanf(fs, "%d%d%d", &pixel[0], &pixel[1], &pixel[2])!=3)
610 - _throw("Read error");
611 - (*buf)[j*dstpitch+i*ps[f]+roffset[f]]=(unsigned char)(pixel[0]*255/scalefactor);
612 - (*buf)[j*dstpitch+i*ps[f]+goffset[f]]=(unsigned char)(pixel[1]*255/scalefactor);
613 - (*buf)[j*dstpitch+i*ps[f]+boffset[f]]=(unsigned char)(pixel[2]*255/scalefactor);
614 - }
615 + dstptr2[tjRedOffset[dstpf]]=srcptr2[tjRedOffset[srcpf]];
616 + dstptr2[tjGreenOffset[dstpf]]=srcptr2[tjGreenOffset[srcpf]];
617 + dstptr2[tjBlueOffset[dstpf]]=srcptr2[tjBlueOffset[srcpf]];
618 }
619 }
620 - else
621 - {
622 - if(scalefactor!=255)
623 - _throw("Binary PPMs must have 8-bit components");
624 - if((tempbuf=(unsigned char *)malloc((*w)*(*h)*3))==NULL)
625 - _throw("Memory allocation error");
626 - if(fread(tempbuf, (*w)*(*h)*3, 1, fs)!=1) _throw("Read error");
627 - pixelconvert(tempbuf, BMP_RGB, (*w)*3, *buf, f, dstpitch, *w, *h, dstbottomup);
628 - }
629 -
630 - finally:
631 - if(fs) {fclose(fs); *fd=-1;}
632 - if(tempbuf) free(tempbuf);
633 - return retcode;
634 }
635
636
637 int loadbmp(char *filename, unsigned char **buf, int *w, int *h,
638 - enum BMPPIXELFORMAT f, int align, int dstbottomup)
639 + int dstpf, int bottomup)
640 {
641 - int fd=-1, bytesread, srcpitch, srcbottomup=1, srcps, dstpitch,
642 - retcode=0;
643 - unsigned char *tempbuf=NULL;
644 - bmphdr bh; int flags=O_RDONLY;
645 + int retval=0, dstps, srcpf, tempc;
646 + struct jpeg_compress_struct cinfo;
647 + struct my_error_mgr jerr;
648 + cjpeg_source_ptr src;
649 + FILE *file=NULL;
650
651 - dstbottomup=dstbottomup? 1:0;
652 - #ifdef _WIN32
653 - flags|=O_BINARY;
654 - #endif
655 - if(!filename || !buf || !w || !h || f<0 || f>BMPPIXELFORMATS-1 || align<1)
656 - _throw("invalid argument to loadbmp()");
657 - if((align&(align-1))!=0)
658 - _throw("Alignment must be a power of 2");
659 - _unix(fd=open(filename, flags));
660 + memset(&cinfo, 0, sizeof(struct jpeg_compress_struct));
661
662 - readme(fd, &bh.bfType, sizeof(unsigned short));
663 - if(!littleendian()) bh.bfType=byteswap16(bh.bfType);
664 + if(!filename || !buf || !w || !h || dstpf<0 || dstpf>=TJ_NUMPF)
665 + _throw("loadbmp(): Invalid argument");
666
667 - if(bh.bfType==0x3650)
668 + if((file=fopen(filename, "rb"))==NULL)
669 + _throwunix("loadbmp(): Cannot open input file");
670 +
671 + cinfo.err=jpeg_std_error(&jerr.pub);
672 + jerr.pub.error_exit=my_error_exit;
673 + jerr.pub.output_message=my_output_message;
674 +
675 + if(setjmp(jerr.setjmp_buffer))
676 {
677 - _catch(loadppm(&fd, buf, w, h, f, align, dstbottomup, 0));
678 - goto finally;
679 + /* If we get here, the JPEG code has signaled an error. */
680 + retval=-1; goto bailout;
681 }
682 - if(bh.bfType==0x3350)
683 - {
684 - _catch(loadppm(&fd, buf, w, h, f, align, dstbottomup, 1));
685 - goto finally;
686 - }
687
688 - readme(fd, &bh.bfSize, sizeof(unsigned int));
689 - readme(fd, &bh.bfReserved1, sizeof(unsigned short));
690 - readme(fd, &bh.bfReserved2, sizeof(unsigned short));
691 - readme(fd, &bh.bfOffBits, sizeof(unsigned int));
692 - readme(fd, &bh.biSize, sizeof(unsigned int));
693 - readme(fd, &bh.biWidth, sizeof(int));
694 - readme(fd, &bh.biHeight, sizeof(int));
695 - readme(fd, &bh.biPlanes, sizeof(unsigned short));
696 - readme(fd, &bh.biBitCount, sizeof(unsigned short));
697 - readme(fd, &bh.biCompression, sizeof(unsigned int));
698 - readme(fd, &bh.biSizeImage, sizeof(unsigned int));
699 - readme(fd, &bh.biXPelsPerMeter, sizeof(int));
700 - readme(fd, &bh.biYPelsPerMeter, sizeof(int));
701 - readme(fd, &bh.biClrUsed, sizeof(unsigned int));
702 - readme(fd, &bh.biClrImportant, sizeof(unsigned int));
703 + jpeg_create_compress(&cinfo);
704 + if((tempc=getc(file))<0 || ungetc(tempc, file)==EOF)
705 + _throwunix("loadbmp(): Could not read input file")
706 + else if(tempc==EOF) _throw("loadbmp(): Input file contains no data");
707
708 - if(!littleendian())
709 + if(tempc=='B')
710 {
711 - bh.bfSize=byteswap(bh.bfSize);
712 - bh.bfOffBits=byteswap(bh.bfOffBits);
713 - bh.biSize=byteswap(bh.biSize);
714 - bh.biWidth=byteswap(bh.biWidth);
715 - bh.biHeight=byteswap(bh.biHeight);
716 - bh.biPlanes=byteswap16(bh.biPlanes);
717 - bh.biBitCount=byteswap16(bh.biBitCount);
718 - bh.biCompression=byteswap(bh.biCompression);
719 - bh.biSizeImage=byteswap(bh.biSizeImage);
720 - bh.biXPelsPerMeter=byteswap(bh.biXPelsPerMeter);
721 - bh.biYPelsPerMeter=byteswap(bh.biYPelsPerMeter);
722 - bh.biClrUsed=byteswap(bh.biClrUsed);
723 - bh.biClrImportant=byteswap(bh.biClrImportant);
724 + if((src=jinit_read_bmp(&cinfo))==NULL)
725 + _throw("loadbmp(): Could not initialize bitmap loader");
726 }
727 + else if(tempc=='P')
728 + {
729 + if((src=jinit_read_ppm(&cinfo))==NULL)
730 + _throw("loadbmp(): Could not initialize bitmap loader");
731 + }
732 + else _throw("loadbmp(): Unsupported file type");
733
734 - if(bh.bfType!=0x4d42 || bh.bfOffBits<BMPHDRSIZE
735 - || bh.biWidth<1 || bh.biHeight==0)
736 - _throw("Corrupt bitmap header");
737 - if((bh.biBitCount!=24 && bh.biBitCount!=32) || bh.biCompression!=BI_RGB)
738 - _throw("Only uncompessed RGB bitmaps are supported");
739 + src->input_file=file;
740 + (*src->start_input)(&cinfo, src);
741 + (*cinfo.mem->realize_virt_arrays)((j_common_ptr)&cinfo);
742
743 - *w=bh.biWidth; *h=bh.biHeight; srcps=bh.biBitCount/8;
744 - if(*h<0) {*h=-(*h); srcbottomup=0;}
745 - srcpitch=(((*w)*srcps)+3)&(~3);
746 - dstpitch=(((*w)*ps[f])+(align-1))&(~(align-1));
747 + *w=cinfo.image_width; *h=cinfo.image_height;
748
749 - if(srcpitch*(*h)+bh.bfOffBits!=bh.bfSize) _throw("Corrupt bitmap header");
750 - if((tempbuf=(unsigned char *)malloc(srcpitch*(*h)))==NULL
751 - || (*buf=(unsigned char *)malloc(dstpitch*(*h)))==NULL)
752 - _throw("Memory allocation error");
753 - if(lseek(fd, (long)bh.bfOffBits, SEEK_SET)!=(long)bh.bfOffBits)
754 - _throw(strerror(errno));
755 - _unix(bytesread=read(fd, tempbuf, srcpitch*(*h)));
756 - if(bytesread!=srcpitch*(*h)) _throw("Read error");
757 + if(cinfo.input_components==1 && cinfo.in_color_space==JCS_RGB)
758 + srcpf=TJPF_GRAY;
759 + else srcpf=TJPF_RGB;
760
761 - pixelconvert(tempbuf, BMP_BGR, srcpitch, *buf, f, dstpitch, *w, *h,
762 - srcbottomup!=dstbottomup);
763 + dstps=tjPixelSize[dstpf];
764 + if((*buf=(unsigned char *)malloc((*w)*(*h)*dstps))==NULL)
765 + _throw("loadbmp(): Memory allocation failure");
766
767 - finally:
768 - if(tempbuf) free(tempbuf);
769 - if(fd!=-1) close(fd);
770 - return retcode;
771 + while(cinfo.next_scanline<cinfo.image_height)
772 + {
773 + int i, nlines=(*src->get_pixel_rows)(&cinfo, src);
774 + for(i=0; i<nlines; i++)
775 + {
776 + unsigned char *outbuf; int row;
777 + row=cinfo.next_scanline+i;
778 + if(bottomup) outbuf=&(*buf)[((*h)-row-1)*(*w)*dstps];
779 + else outbuf=&(*buf)[row*(*w)*dstps];
780 + pixelconvert(src->buffer[i], srcpf, 0, outbuf, dstpf, bottomup, *w,
781 + nlines);
782 + }
783 + cinfo.next_scanline+=nlines;
784 + }
785 +
786 + (*src->finish_input)(&cinfo, src);
787 +
788 + bailout:
789 + jpeg_destroy_compress(&cinfo);
790 + if(file) fclose(file);
791 + if(retval<0 && buf && *buf) {free(*buf); *buf=NULL;}
792 + return retval;
793 }
794
795 -#define writeme(fd, addr, size) \
796 - if((byteswritten=write(fd, addr, (size)))==-1) _throw(strerror(errno)); \
797 - if(byteswritten!=(size)) _throw("Write error");
798
799 -int saveppm(char *filename, unsigned char *buf, int w, int h,
800 - enum BMPPIXELFORMAT f, int srcpitch, int srcbottomup)
801 +int savebmp(char *filename, unsigned char *buf, int w, int h, int srcpf,
802 + int bottomup)
803 {
804 - FILE *fs=NULL; int retcode=0;
805 - unsigned char *tempbuf=NULL;
806 + int retval=0, srcps, dstpf;
807 + struct jpeg_decompress_struct dinfo;
808 + struct my_error_mgr jerr;
809 + djpeg_dest_ptr dst;
810 + FILE *file=NULL;
811 + char *ptr=NULL;
812
813 - if((fs=fopen(filename, "wb"))==NULL) _throw(strerror(errno));
814 - if(fprintf(fs, "P6\n")<1) _throw("Write error");
815 - if(fprintf(fs, "%d %d\n", w, h)<1) _throw("Write error");
816 - if(fprintf(fs, "255\n")<1) _throw("Write error");
817 + memset(&dinfo, 0, sizeof(struct jpeg_decompress_struct));
818
819 - if((tempbuf=(unsigned char *)malloc(w*h*3))==NULL)
820 - _throw("Memory allocation error");
821 + if(!filename || !buf || w<1 || h<1 || srcpf<0 || srcpf>=TJ_NUMPF)
822 + _throw("savebmp(): Invalid argument");
823
824 - pixelconvert(buf, f, srcpitch, tempbuf, BMP_RGB, w*3, w, h,
825 - srcbottomup);
826 + if((file=fopen(filename, "wb"))==NULL)
827 + _throwunix("savebmp(): Cannot open output file");
828
829 - if((fwrite(tempbuf, w*h*3, 1, fs))!=1) _throw("Write error");
830 + dinfo.err=jpeg_std_error(&jerr.pub);
831 + jerr.pub.error_exit=my_error_exit;
832 + jerr.pub.output_message=my_output_message;
833
834 - finally:
835 - if(tempbuf) free(tempbuf);
836 - if(fs) fclose(fs);
837 - return retcode;
838 -}
839 + if(setjmp(jerr.setjmp_buffer))
840 + {
841 + /* If we get here, the JPEG code has signaled an error. */
842 + retval=-1; goto bailout;
843 + }
844
845 -int savebmp(char *filename, unsigned char *buf, int w, int h,
846 - enum BMPPIXELFORMAT f, int srcpitch, int srcbottomup)
847 -{
848 - int fd=-1, byteswritten, dstpitch, retcode=0;
849 - int flags=O_RDWR|O_CREAT|O_TRUNC;
850 - unsigned char *tempbuf=NULL; char *temp;
851 - bmphdr bh; int mode;
852 + jpeg_create_decompress(&dinfo);
853 + if(srcpf==TJPF_GRAY)
854 + {
855 + dinfo.out_color_components=dinfo.output_components=1;
856 + dinfo.out_color_space=JCS_GRAYSCALE;
857 + }
858 + else
859 + {
860 + dinfo.out_color_components=dinfo.output_components=3;
861 + dinfo.out_color_space=JCS_RGB;
862 + }
863 + dinfo.image_width=w; dinfo.image_height=h;
864 + dinfo.global_state=DSTATE_READY;
865 + dinfo.scale_num=dinfo.scale_denom=1;
866
867 - #ifdef _WIN32
868 - flags|=O_BINARY; mode=_S_IREAD|_S_IWRITE;
869 - #else
870 - mode=S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH;
871 - #endif
872 - if(!filename || !buf || w<1 || h<1 || f<0 || f>BMPPIXELFORMATS-1 || srcpitch<0)
873 - _throw("bad argument to savebmp()");
874 -
875 - if(srcpitch==0) srcpitch=w*ps[f];
876 -
877 - if((temp=strrchr(filename, '.'))!=NULL)
878 + ptr=strrchr(filename, '.');
879 + if(ptr && !strcasecmp(ptr, ".bmp"))
880 {
881 - if(!stricmp(temp, ".ppm"))
882 - return saveppm(filename, buf, w, h, f, srcpitch, srcbottomup);
883 + if((dst=jinit_write_bmp(&dinfo, 0))==NULL)
884 + _throw("savebmp(): Could not initialize bitmap writer");
885 }
886 + else
887 + {
888 + if((dst=jinit_write_ppm(&dinfo))==NULL)
889 + _throw("savebmp(): Could not initialize PPM writer");
890 + }
891
892 - _unix(fd=open(filename, flags, mode));
893 - dstpitch=((w*3)+3)&(~3);
894 + dst->output_file=file;
895 + (*dst->start_output)(&dinfo, dst);
896 + (*dinfo.mem->realize_virt_arrays)((j_common_ptr)&dinfo);
897
898 - bh.bfType=0x4d42;
899 - bh.bfSize=BMPHDRSIZE+dstpitch*h;
900 - bh.bfReserved1=0; bh.bfReserved2=0;
901 - bh.bfOffBits=BMPHDRSIZE;
902 - bh.biSize=40;
903 - bh.biWidth=w; bh.biHeight=h;
904 - bh.biPlanes=0; bh.biBitCount=24;
905 - bh.biCompression=BI_RGB; bh.biSizeImage=0;
906 - bh.biXPelsPerMeter=0; bh.biYPelsPerMeter=0;
907 - bh.biClrUsed=0; bh.biClrImportant=0;
908 + if(srcpf==TJPF_GRAY) dstpf=srcpf;
909 + else dstpf=TJPF_RGB;
910 + srcps=tjPixelSize[srcpf];
911
912 - if(!littleendian())
913 + while(dinfo.output_scanline<dinfo.output_height)
914 {
915 - bh.bfType=byteswap16(bh.bfType);
916 - bh.bfSize=byteswap(bh.bfSize);
917 - bh.bfOffBits=byteswap(bh.bfOffBits);
918 - bh.biSize=byteswap(bh.biSize);
919 - bh.biWidth=byteswap(bh.biWidth);
920 - bh.biHeight=byteswap(bh.biHeight);
921 - bh.biPlanes=byteswap16(bh.biPlanes);
922 - bh.biBitCount=byteswap16(bh.biBitCount);
923 - bh.biCompression=byteswap(bh.biCompression);
924 - bh.biSizeImage=byteswap(bh.biSizeImage);
925 - bh.biXPelsPerMeter=byteswap(bh.biXPelsPerMeter);
926 - bh.biYPelsPerMeter=byteswap(bh.biYPelsPerMeter);
927 - bh.biClrUsed=byteswap(bh.biClrUsed);
928 - bh.biClrImportant=byteswap(bh.biClrImportant);
929 + int i, nlines=dst->buffer_height;
930 + for(i=0; i<nlines; i++)
931 + {
932 + unsigned char *inbuf; int row;
933 + row=dinfo.output_scanline+i;
934 + if(bottomup) inbuf=&buf[(h-row-1)*w*srcps];
935 + else inbuf=&buf[row*w*srcps];
936 + pixelconvert(inbuf, srcpf, bottomup, dst->buffer[i], dstpf, 0, w,
937 + nlines);
938 + }
939 + (*dst->put_pixel_rows)(&dinfo, dst, nlines);
940 + dinfo.output_scanline+=nlines;
941 }
942
943 - writeme(fd, &bh.bfType, sizeof(unsigned short));
944 - writeme(fd, &bh.bfSize, sizeof(unsigned int));
945 - writeme(fd, &bh.bfReserved1, sizeof(unsigned short));
946 - writeme(fd, &bh.bfReserved2, sizeof(unsigned short));
947 - writeme(fd, &bh.bfOffBits, sizeof(unsigned int));
948 - writeme(fd, &bh.biSize, sizeof(unsigned int));
949 - writeme(fd, &bh.biWidth, sizeof(int));
950 - writeme(fd, &bh.biHeight, sizeof(int));
951 - writeme(fd, &bh.biPlanes, sizeof(unsigned short));
952 - writeme(fd, &bh.biBitCount, sizeof(unsigned short));
953 - writeme(fd, &bh.biCompression, sizeof(unsigned int));
954 - writeme(fd, &bh.biSizeImage, sizeof(unsigned int));
955 - writeme(fd, &bh.biXPelsPerMeter, sizeof(int));
956 - writeme(fd, &bh.biYPelsPerMeter, sizeof(int));
957 - writeme(fd, &bh.biClrUsed, sizeof(unsigned int));
958 - writeme(fd, &bh.biClrImportant, sizeof(unsigned int));
959 + (*dst->finish_output)(&dinfo, dst);
960
961 - if((tempbuf=(unsigned char *)malloc(dstpitch*h))==NULL)
962 - _throw("Memory allocation error");
963 -
964 - pixelconvert(buf, f, srcpitch, tempbuf, BMP_BGR, dstpitch, w, h,
965 - !srcbottomup);
966 -
967 - if((byteswritten=write(fd, tempbuf, dstpitch*h))!=dstpitch*h)
968 - _throw(strerror(errno));
969 -
970 - finally:
971 - if(tempbuf) free(tempbuf);
972 - if(fd!=-1) close(fd);
973 - return retcode;
974 + bailout:
975 + jpeg_destroy_decompress(&dinfo);
976 + if(file) fclose(file);
977 + return retval;
978 }
979
980 const char *bmpgeterr(void)
981 {
982 - return __bmperr;
983 + return errStr;
984 }
985 Index: bmp.h
986 ===================================================================
987 --- bmp.h (revision 829)
988 +++ bmp.h (working copy)
989 @@ -1,48 +1,42 @@
990 -/* Copyright (C)2004 Landmark Graphics Corporation
991 - * Copyright (C)2005 Sun Microsystems, Inc.
992 +/*
993 + * Copyright (C)2011 D. R. Commander. All Rights Reserved.
994 *
995 - * This library is free software and may be redistributed and/or modified under
996 - * the terms of the wxWindows Library License, Version 3.1 or (at your option)
997 - * any later version. The full license is in the LICENSE.txt file included
998 - * with this distribution.
999 + * Redistribution and use in source and binary forms, with or without
1000 + * modification, are permitted provided that the following conditions are met:
1001 *
1002 - * This library is distributed in the hope that it will be useful,
1003 - * but WITHOUT ANY WARRANTY; without even the implied warranty of
1004 - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
1005 - * wxWindows Library License for more details.
1006 -*/
1007 + * - Redistributions of source code must retain the above copyright notice,
1008 + * this list of conditions and the following disclaimer.
1009 + * - Redistributions in binary form must reproduce the above copyright notice,
1010 + * this list of conditions and the following disclaimer in the documentation
1011 + * and/or other materials provided with the distribution.
1012 + * - Neither the name of the libjpeg-turbo Project nor the names of its
1013 + * contributors may be used to endorse or promote products derived from this
1014 + * software without specific prior written permission.
1015 + *
1016 + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS",
1017 + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
1018 + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
1019 + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE
1020 + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
1021 + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
1022 + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
1023 + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
1024 + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
1025 + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
1026 + * POSSIBILITY OF SUCH DAMAGE.
1027 + */
1028
1029 -// This provides rudimentary facilities for loading and saving true color
1030 -// BMP and PPM files
1031 -
1032 #ifndef __BMP_H__
1033 #define __BMP_H__
1034
1035 -#define BMPPIXELFORMATS 6
1036 -enum BMPPIXELFORMAT {BMP_RGB=0, BMP_RGBA, BMP_BGR, BMP_BGRA, BMP_ABGR, BMP_ARGB};
1037 +#include "./turbojpeg.h"
1038
1039 -#ifdef __cplusplus
1040 -extern "C" {
1041 -#endif
1042 +int loadbmp(char *filename, unsigned char **buf, int *w, int *h, int pf,
1043 + int bottomup);
1044
1045 -// This will load a Windows bitmap from a file and return a buffer with the
1046 -// specified pixel format, scanline alignment, and orientation. The width and
1047 -// height are returned in w and h.
1048 +int savebmp(char *filename, unsigned char *buf, int w, int h, int pf,
1049 + int bottomup);
1050
1051 -int loadbmp(char *filename, unsigned char **buf, int *w, int *h,
1052 - enum BMPPIXELFORMAT f, int align, int dstbottomup);
1053 -
1054 -// This will save a buffer with the specified pixel format, pitch, orientation,
1055 -// width, and height as a 24-bit Windows bitmap or PPM (the filename determines
1056 -// which format to use)
1057 -
1058 -int savebmp(char *filename, unsigned char *buf, int w, int h,
1059 - enum BMPPIXELFORMAT f, int srcpitch, int srcbottomup);
1060 -
1061 const char *bmpgeterr(void);
1062
1063 -#ifdef __cplusplus
1064 -}
1065 #endif
1066 -
1067 -#endif
1068 Index: cderror.h
1069 ===================================================================
1070 --- cderror.h (revision 829)
1071 +++ cderror.h (working copy)
1072 @@ -2,6 +2,7 @@
1073 * cderror.h
1074 *
1075 * Copyright (C) 1994-1997, Thomas G. Lane.
1076 + * Modified 2009 by Guido Vollbeding.
1077 * This file is part of the Independent JPEG Group's software.
1078 * For conditions of distribution and use, see the accompanying README file.
1079 *
1080 @@ -45,6 +46,7 @@
1081 JMESSAGE(JERR_BMP_BADPLANES, "Invalid BMP file: biPlanes not equal to 1")
1082 JMESSAGE(JERR_BMP_COLORSPACE, "BMP output must be grayscale or RGB")
1083 JMESSAGE(JERR_BMP_COMPRESSED, "Sorry, compressed BMPs not yet supported")
1084 +JMESSAGE(JERR_BMP_EMPTY, "Empty BMP image")
1085 JMESSAGE(JERR_BMP_NOT, "Not a BMP file - does not start with BM")
1086 JMESSAGE(JTRC_BMP, "%ux%u 24-bit BMP image")
1087 JMESSAGE(JTRC_BMP_MAPPED, "%ux%u 8-bit colormapped BMP image")
1088 Index: cdjpeg.h
1089 ===================================================================
1090 --- cdjpeg.h (revision 829)
1091 +++ cdjpeg.h (working copy)
1092 @@ -104,6 +104,7 @@
1093 #define jinit_write_targa jIWrTarga
1094 #define read_quant_tables RdQTables
1095 #define read_scan_script RdScnScript
1096 +#define set_quality_ratings SetQRates
1097 #define set_quant_slots SetQSlots
1098 #define set_sample_factors SetSFacts
1099 #define read_color_map RdCMap
1100 @@ -131,8 +132,10 @@
1101 /* cjpeg support routines (in rdswitch.c) */
1102
1103 EXTERN(boolean) read_quant_tables JPP((j_compress_ptr cinfo, char * filename,
1104 - int scale_factor, boolean force_baseline));
1105 + boolean force_baseline));
1106 EXTERN(boolean) read_scan_script JPP((j_compress_ptr cinfo, char * filename));
1107 +EXTERN(boolean) set_quality_ratings JPP((j_compress_ptr cinfo, char *arg,
1108 + boolean force_baseline));
1109 EXTERN(boolean) set_quant_slots JPP((j_compress_ptr cinfo, char *arg));
1110 EXTERN(boolean) set_sample_factors JPP((j_compress_ptr cinfo, char *arg));
1111
1112 Index: cjpeg.c
1113 ===================================================================
1114 --- cjpeg.c (revision 829)
1115 +++ cjpeg.c (working copy)
1116 @@ -1,8 +1,11 @@
1117 /*
1118 * cjpeg.c
1119 *
1120 + * This file was part of the Independent JPEG Group's software:
1121 * Copyright (C) 1991-1998, Thomas G. Lane.
1122 - * This file is part of the Independent JPEG Group's software.
1123 + * Modified 2003-2011 by Guido Vollbeding.
1124 + * libjpeg-turbo Modifications:
1125 + * Copyright (C) 2010, 2013, D. R. Commander.
1126 * For conditions of distribution and use, see the accompanying README file.
1127 *
1128 * This file contains a command-line user interface for the JPEG compressor.
1129 @@ -25,6 +28,7 @@
1130
1131 #include "cdjpeg.h" /* Common decls for cjpeg/djpeg applications */
1132 #include "jversion.h" /* for version message */
1133 +#include "config.h"
1134
1135 #ifdef USE_CCOMMAND /* command-line reader for Macintosh */
1136 #ifdef __MWERKS__
1137 @@ -135,6 +139,7 @@
1138
1139 static const char * progname; /* program name for error messages */
1140 static char * outfilename; /* for -outfile switch */
1141 +boolean memdst; /* for -memdst switch */
1142
1143
1144 LOCAL(void)
1145 @@ -149,8 +154,9 @@
1146 #endif
1147
1148 fprintf(stderr, "Switches (names may be abbreviated):\n");
1149 - fprintf(stderr, " -quality N Compression quality (0..100; 5-95 is useful range)\n");
1150 + fprintf(stderr, " -quality N[,...] Compression quality (0..100; 5-95 is useful range)\n");
1151 fprintf(stderr, " -grayscale Create monochrome JPEG file\n");
1152 + fprintf(stderr, " -rgb Create RGB JPEG file\n");
1153 #ifdef ENTROPY_OPT_SUPPORTED
1154 fprintf(stderr, " -optimize Optimize Huffman table (smaller file, but slow compression)\n");
1155 #endif
1156 @@ -161,6 +167,9 @@
1157 fprintf(stderr, " -targa Input file is Targa format (usually not needed)\n");
1158 #endif
1159 fprintf(stderr, "Switches for advanced users:\n");
1160 +#ifdef C_ARITH_CODING_SUPPORTED
1161 + fprintf(stderr, " -arithmetic Use arithmetic coding\n");
1162 +#endif
1163 #ifdef DCT_ISLOW_SUPPORTED
1164 fprintf(stderr, " -dct int Use integer DCT method%s\n",
1165 (JDCT_DEFAULT == JDCT_ISLOW ? " (default)" : ""));
1166 @@ -179,11 +188,11 @@
1167 #endif
1168 fprintf(stderr, " -maxmemory N Maximum memory to use (in kbytes)\n");
1169 fprintf(stderr, " -outfile name Specify name for output file\n");
1170 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED)
1171 + fprintf(stderr, " -memdst Compress to memory instead of file (useful for benchmarking)\n");
1172 +#endif
1173 fprintf(stderr, " -verbose or -debug Emit debug output\n");
1174 fprintf(stderr, "Switches for wizards:\n");
1175 -#ifdef C_ARITH_CODING_SUPPORTED
1176 - fprintf(stderr, " -arithmetic Use arithmetic coding\n");
1177 -#endif
1178 fprintf(stderr, " -baseline Force baseline quantization tables\n");
1179 fprintf(stderr, " -qtables file Use quantization tables given in file\n");
1180 fprintf(stderr, " -qslots N[,...] Set component quantization tables\n");
1181 @@ -209,10 +218,9 @@
1182 {
1183 int argn;
1184 char * arg;
1185 - int quality; /* -quality parameter */
1186 - int q_scale_factor; /* scaling percentage for -qtables */
1187 boolean force_baseline;
1188 boolean simple_progressive;
1189 + char * qualityarg = NULL; /* saves -quality parm if any */
1190 char * qtablefile = NULL; /* saves -qtables filename if any */
1191 char * qslotsarg = NULL; /* saves -qslots parm if any */
1192 char * samplearg = NULL; /* saves -sample parm if any */
1193 @@ -219,15 +227,12 @@
1194 char * scansarg = NULL; /* saves -scans parm if any */
1195
1196 /* Set up default JPEG parameters. */
1197 - /* Note that default -quality level need not, and does not,
1198 - * match the default scaling for an explicit -qtables argument.
1199 - */
1200 - quality = 75; /* default -quality value */
1201 - q_scale_factor = 100; /* default to no scaling for -qtables */
1202 +
1203 force_baseline = FALSE; /* by default, allow 16-bit quantizers */
1204 simple_progressive = FALSE;
1205 is_targa = FALSE;
1206 outfilename = NULL;
1207 + memdst = FALSE;
1208 cinfo->err->trace_level = 0;
1209
1210 /* Scan command line options, adjust parameters */
1211 @@ -277,8 +282,11 @@
1212 static boolean printed_version = FALSE;
1213
1214 if (! printed_version) {
1215 - fprintf(stderr, "Independent JPEG Group's CJPEG, version %s\n%s\n",
1216 - JVERSION, JCOPYRIGHT);
1217 + fprintf(stderr, "%s version %s (build %s)\n",
1218 + PACKAGE_NAME, VERSION, BUILD);
1219 + fprintf(stderr, "%s\n\n", JCOPYRIGHT);
1220 + fprintf(stderr, "Emulating The Independent JPEG Group's software, version %s\n\n",
1221 + JVERSION);
1222 printed_version = TRUE;
1223 }
1224 cinfo->err->trace_level++;
1225 @@ -287,6 +295,10 @@
1226 /* Force a monochrome JPEG file to be generated. */
1227 jpeg_set_colorspace(cinfo, JCS_GRAYSCALE);
1228
1229 + } else if (keymatch(arg, "rgb", 3)) {
1230 + /* Force an RGB JPEG file to be generated. */
1231 + jpeg_set_colorspace(cinfo, JCS_RGB);
1232 +
1233 } else if (keymatch(arg, "maxmemory", 3)) {
1234 /* Maximum memory in Kb (or Mb with 'm'). */
1235 long lval;
1236 @@ -305,7 +317,7 @@
1237 #ifdef ENTROPY_OPT_SUPPORTED
1238 cinfo->optimize_coding = TRUE;
1239 #else
1240 - fprintf(stderr, "%s: sorry, entropy optimization was not compiled\n",
1241 + fprintf(stderr, "%s: sorry, entropy optimization was not compiled in\n",
1242 progname);
1243 exit(EXIT_FAILURE);
1244 #endif
1245 @@ -322,19 +334,26 @@
1246 simple_progressive = TRUE;
1247 /* We must postpone execution until num_components is known. */
1248 #else
1249 - fprintf(stderr, "%s: sorry, progressive output was not compiled\n",
1250 + fprintf(stderr, "%s: sorry, progressive output was not compiled in\n",
1251 progname);
1252 exit(EXIT_FAILURE);
1253 #endif
1254
1255 + } else if (keymatch(arg, "memdst", 2)) {
1256 + /* Use in-memory destination manager */
1257 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED)
1258 + memdst = TRUE;
1259 +#else
1260 + fprintf(stderr, "%s: sorry, in-memory destination manager was not compiled in\n",
1261 + progname);
1262 + exit(EXIT_FAILURE);
1263 +#endif
1264 +
1265 } else if (keymatch(arg, "quality", 1)) {
1266 - /* Quality factor (quantization table scaling factor). */
1267 + /* Quality ratings (quantization table scaling factors). */
1268 if (++argn >= argc) /* advance to next argument */
1269 usage();
1270 - if (sscanf(argv[argn], "%d", &quality) != 1)
1271 - usage();
1272 - /* Change scale factor in case -qtables is present. */
1273 - q_scale_factor = jpeg_quality_scaling(quality);
1274 + qualityarg = argv[argn];
1275
1276 } else if (keymatch(arg, "qslots", 2)) {
1277 /* Quantization table slot numbers. */
1278 @@ -382,7 +401,7 @@
1279 * default sampling factors.
1280 */
1281
1282 - } else if (keymatch(arg, "scans", 2)) {
1283 + } else if (keymatch(arg, "scans", 4)) {
1284 /* Set scan script. */
1285 #ifdef C_MULTISCAN_FILES_SUPPORTED
1286 if (++argn >= argc) /* advance to next argument */
1287 @@ -390,7 +409,7 @@
1288 scansarg = argv[argn];
1289 /* We must postpone reading the file in case -progressive appears. */
1290 #else
1291 - fprintf(stderr, "%s: sorry, multi-scan output was not compiled\n",
1292 + fprintf(stderr, "%s: sorry, multi-scan output was not compiled in\n",
1293 progname);
1294 exit(EXIT_FAILURE);
1295 #endif
1296 @@ -422,11 +441,12 @@
1297
1298 /* Set quantization tables for selected quality. */
1299 /* Some or all may be overridden if -qtables is present. */
1300 - jpeg_set_quality(cinfo, quality, force_baseline);
1301 + if (qualityarg != NULL) /* process -quality if it was present */
1302 + if (! set_quality_ratings(cinfo, qualityarg, force_baseline))
1303 + usage();
1304
1305 if (qtablefile != NULL) /* process -qtables if it was present */
1306 - if (! read_quant_tables(cinfo, qtablefile,
1307 - q_scale_factor, force_baseline))
1308 + if (! read_quant_tables(cinfo, qtablefile, force_baseline))
1309 usage();
1310
1311 if (qslotsarg != NULL) /* process -qslots if it was present */
1312 @@ -468,7 +488,9 @@
1313 int file_index;
1314 cjpeg_source_ptr src_mgr;
1315 FILE * input_file;
1316 - FILE * output_file;
1317 + FILE * output_file = NULL;
1318 + unsigned char *outbuffer = NULL;
1319 + unsigned long outsize = 0;
1320 JDIMENSION num_scanlines;
1321
1322 /* On Mac, fetch a command line. */
1323 @@ -511,20 +533,22 @@
1324 file_index = parse_switches(&cinfo, argc, argv, 0, FALSE);
1325
1326 #ifdef TWO_FILE_COMMANDLINE
1327 - /* Must have either -outfile switch or explicit output file name */
1328 - if (outfilename == NULL) {
1329 - if (file_index != argc-2) {
1330 - fprintf(stderr, "%s: must name one input and one output file\n",
1331 - progname);
1332 - usage();
1333 + if (!memdst) {
1334 + /* Must have either -outfile switch or explicit output file name */
1335 + if (outfilename == NULL) {
1336 + if (file_index != argc-2) {
1337 + fprintf(stderr, "%s: must name one input and one output file\n",
1338 + progname);
1339 + usage();
1340 + }
1341 + outfilename = argv[file_index+1];
1342 + } else {
1343 + if (file_index != argc-1) {
1344 + fprintf(stderr, "%s: must name one input and one output file\n",
1345 + progname);
1346 + usage();
1347 + }
1348 }
1349 - outfilename = argv[file_index+1];
1350 - } else {
1351 - if (file_index != argc-1) {
1352 - fprintf(stderr, "%s: must name one input and one output file\n",
1353 - progname);
1354 - usage();
1355 - }
1356 }
1357 #else
1358 /* Unix style: expect zero or one file name */
1359 @@ -551,7 +575,7 @@
1360 fprintf(stderr, "%s: can't open %s\n", progname, outfilename);
1361 exit(EXIT_FAILURE);
1362 }
1363 - } else {
1364 + } else if (!memdst) {
1365 /* default output file is stdout */
1366 output_file = write_stdout();
1367 }
1368 @@ -574,7 +598,12 @@
1369 file_index = parse_switches(&cinfo, argc, argv, 0, TRUE);
1370
1371 /* Specify data destination for compression */
1372 - jpeg_stdio_dest(&cinfo, output_file);
1373 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED)
1374 + if (memdst)
1375 + jpeg_mem_dest(&cinfo, &outbuffer, &outsize);
1376 + else
1377 +#endif
1378 + jpeg_stdio_dest(&cinfo, output_file);
1379
1380 /* Start compressor */
1381 jpeg_start_compress(&cinfo, TRUE);
1382 @@ -593,7 +622,7 @@
1383 /* Close files, if we opened them */
1384 if (input_file != stdin)
1385 fclose(input_file);
1386 - if (output_file != stdout)
1387 + if (output_file != stdout && output_file != NULL)
1388 fclose(output_file);
1389
1390 #ifdef PROGRESS_REPORT
1391 @@ -600,6 +629,12 @@
1392 end_progress_monitor((j_common_ptr) &cinfo);
1393 #endif
1394
1395 + if (memdst) {
1396 + fprintf(stderr, "Compressed size: %lu bytes\n", outsize);
1397 + if (outbuffer != NULL)
1398 + free(outbuffer);
1399 + }
1400 +
1401 /* All done. */
1402 exit(jerr.num_warnings ? EXIT_WARNING : EXIT_SUCCESS);
1403 return 0; /* suppress no-return-value warnings */
1404 Index: djpeg.c
1405 ===================================================================
1406 --- djpeg.c (revision 829)
1407 +++ djpeg.c (working copy)
1408 @@ -1,8 +1,11 @@
1409 /*
1410 * djpeg.c
1411 *
1412 + * This file was part of the Independent JPEG Group's software:
1413 * Copyright (C) 1991-1997, Thomas G. Lane.
1414 - * This file is part of the Independent JPEG Group's software.
1415 + * libjpeg-turbo Modifications:
1416 + * Copyright (C) 2010-2011, 2013-2015, D. R. Commander.
1417 + * Copyright (C) 2015, Google, Inc.
1418 * For conditions of distribution and use, see the accompanying README file.
1419 *
1420 * This file contains a command-line user interface for the JPEG decompressor.
1421 @@ -25,6 +28,7 @@
1422
1423 #include "cdjpeg.h" /* Common decls for cjpeg/djpeg applications */
1424 #include "jversion.h" /* for version message */
1425 +#include "config.h"
1426
1427 #include <ctype.h> /* to declare isprint() */
1428
1429 @@ -84,6 +88,10 @@
1430
1431 static const char * progname; /* program name for error messages */
1432 static char * outfilename; /* for -outfile switch */
1433 +boolean memsrc; /* for -memsrc switch */
1434 +boolean strip, skip;
1435 +JDIMENSION startY, endY;
1436 +#define INPUT_BUF_SIZE 4096
1437
1438
1439 LOCAL(void)
1440 @@ -101,6 +109,7 @@
1441 fprintf(stderr, " -colors N Reduce image to no more than N colors\n");
1442 fprintf(stderr, " -fast Fast, low-quality processing\n");
1443 fprintf(stderr, " -grayscale Force grayscale output\n");
1444 + fprintf(stderr, " -rgb Force RGB output\n");
1445 #ifdef IDCT_SCALING_SUPPORTED
1446 fprintf(stderr, " -scale M/N Scale output image by fraction M/N, eg, 1/8\n");
1447 #endif
1448 @@ -153,6 +162,12 @@
1449 #endif
1450 fprintf(stderr, " -maxmemory N Maximum memory to use (in kbytes)\n");
1451 fprintf(stderr, " -outfile name Specify name for output file\n");
1452 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED)
1453 + fprintf(stderr, " -memsrc Load input file into memory before decompressing\n");
1454 +#endif
1455 +
1456 + fprintf(stderr, " -skip Y0,Y1 Decode all rows except those between Y0 and Y1 (inclusive)\n");
1457 + fprintf(stderr, " -strip Y0,Y1 Decode only rows between Y0 and Y1 (inclusive)\n");
1458 fprintf(stderr, " -verbose or -debug Emit debug output\n");
1459 exit(EXIT_FAILURE);
1460 }
1461 @@ -176,6 +191,9 @@
1462 /* Set up default JPEG parameters. */
1463 requested_fmt = DEFAULT_FMT; /* set default output file format */
1464 outfilename = NULL;
1465 + memsrc = FALSE;
1466 + strip = FALSE;
1467 + skip = FALSE;
1468 cinfo->err->trace_level = 0;
1469
1470 /* Scan command line options, adjust parameters */
1471 @@ -240,8 +258,11 @@
1472 static boolean printed_version = FALSE;
1473
1474 if (! printed_version) {
1475 - fprintf(stderr, "Independent JPEG Group's DJPEG, version %s\n%s\n",
1476 - JVERSION, JCOPYRIGHT);
1477 + fprintf(stderr, "%s version %s (build %s)\n",
1478 + PACKAGE_NAME, VERSION, BUILD);
1479 + fprintf(stderr, "%s\n\n", JCOPYRIGHT);
1480 + fprintf(stderr, "Emulating The Independent JPEG Group's software, version %s\n\n",
1481 + JVERSION);
1482 printed_version = TRUE;
1483 }
1484 cinfo->err->trace_level++;
1485 @@ -263,6 +284,10 @@
1486 /* Force monochrome output. */
1487 cinfo->out_color_space = JCS_GRAYSCALE;
1488
1489 + } else if (keymatch(arg, "rgb", 2)) {
1490 + /* Force RGB output. */
1491 + cinfo->out_color_space = JCS_RGB;
1492 +
1493 } else if (keymatch(arg, "map", 3)) {
1494 /* Quantize to a color map taken from an input file. */
1495 if (++argn >= argc) /* advance to next argument */
1496 @@ -314,6 +339,16 @@
1497 usage();
1498 outfilename = argv[argn]; /* save it away for later use */
1499
1500 + } else if (keymatch(arg, "memsrc", 2)) {
1501 + /* Use in-memory source manager */
1502 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED)
1503 + memsrc = TRUE;
1504 +#else
1505 + fprintf(stderr, "%s: sorry, in-memory source manager was not compiled in\n",
1506 + progname);
1507 + exit(EXIT_FAILURE);
1508 +#endif
1509 +
1510 } else if (keymatch(arg, "pnm", 1) || keymatch(arg, "ppm", 1)) {
1511 /* PPM/PGM output format. */
1512 requested_fmt = FMT_PPM;
1513 @@ -322,7 +357,7 @@
1514 /* RLE output format. */
1515 requested_fmt = FMT_RLE;
1516
1517 - } else if (keymatch(arg, "scale", 1)) {
1518 + } else if (keymatch(arg, "scale", 2)) {
1519 /* Scale the output image by a fraction M/N. */
1520 if (++argn >= argc) /* advance to next argument */
1521 usage();
1522 @@ -330,6 +365,20 @@
1523 &cinfo->scale_num, &cinfo->scale_denom) != 2)
1524 usage();
1525
1526 + } else if (keymatch(arg, "strip", 2)) {
1527 + if (++argn >= argc)
1528 + usage();
1529 + if (sscanf(argv[argn], "%d,%d", &startY, &endY) != 2 || startY > endY)
1530 + usage();
1531 + strip = TRUE;
1532 +
1533 + } else if (keymatch(arg, "skip", 2)) {
1534 + if (++argn >= argc)
1535 + usage();
1536 + if (sscanf(argv[argn], "%d,%d", &startY, &endY) != 2 || startY > endY)
1537 + usage();
1538 + skip = TRUE;
1539 +
1540 } else if (keymatch(arg, "targa", 1)) {
1541 /* Targa output format. */
1542 requested_fmt = FMT_TARGA;
1543 @@ -432,6 +481,8 @@
1544 djpeg_dest_ptr dest_mgr = NULL;
1545 FILE * input_file;
1546 FILE * output_file;
1547 + unsigned char *inbuffer = NULL;
1548 + unsigned long insize = 0;
1549 JDIMENSION num_scanlines;
1550
1551 /* On Mac, fetch a command line. */
1552 @@ -455,7 +506,7 @@
1553 * APP12 is used by some digital camera makers for textual info,
1554 * so we provide the ability to display it as text.
1555 * If you like, additional APPn marker types can be selected for display,
1556 - * but don't try to override APP0 or APP14 this way (see libjpeg.doc).
1557 + * but don't try to override APP0 or APP14 this way (see libjpeg.txt).
1558 */
1559 jpeg_set_marker_processor(&cinfo, JPEG_COM, print_text_marker);
1560 jpeg_set_marker_processor(&cinfo, JPEG_APP0+12, print_text_marker);
1561 @@ -526,7 +577,30 @@
1562 #endif
1563
1564 /* Specify data source for decompression */
1565 - jpeg_stdio_src(&cinfo, input_file);
1566 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED)
1567 + if (memsrc) {
1568 + size_t nbytes;
1569 + do {
1570 + inbuffer = (unsigned char *)realloc(inbuffer, insize + INPUT_BUF_SIZE);
1571 + if (inbuffer == NULL) {
1572 + fprintf(stderr, "%s: memory allocation failure\n", progname);
1573 + exit(EXIT_FAILURE);
1574 + }
1575 + nbytes = JFREAD(input_file, &inbuffer[insize], INPUT_BUF_SIZE);
1576 + if (nbytes < INPUT_BUF_SIZE && ferror(input_file)) {
1577 + if (file_index < argc)
1578 + fprintf(stderr, "%s: can't read from %s\n", progname,
1579 + argv[file_index]);
1580 + else
1581 + fprintf(stderr, "%s: can't read from stdin\n", progname);
1582 + }
1583 + insize += (unsigned long)nbytes;
1584 + } while (nbytes == INPUT_BUF_SIZE);
1585 + fprintf(stderr, "Compressed size: %lu bytes\n", insize);
1586 + jpeg_mem_src(&cinfo, inbuffer, insize);
1587 + } else
1588 +#endif
1589 + jpeg_stdio_src(&cinfo, input_file);
1590
1591 /* Read file header, set default decompression parameters */
1592 (void) jpeg_read_header(&cinfo, TRUE);
1593 @@ -575,14 +649,64 @@
1594 /* Start decompressor */
1595 (void) jpeg_start_decompress(&cinfo);
1596
1597 - /* Write output file header */
1598 - (*dest_mgr->start_output) (&cinfo, dest_mgr);
1599 + /* Strip decode */
1600 + if (strip || skip) {
1601 + JDIMENSION tmp;
1602
1603 - /* Process data */
1604 - while (cinfo.output_scanline < cinfo.output_height) {
1605 - num_scanlines = jpeg_read_scanlines(&cinfo, dest_mgr->buffer,
1606 - dest_mgr->buffer_height);
1607 - (*dest_mgr->put_pixel_rows) (&cinfo, dest_mgr, num_scanlines);
1608 + /* Check for valid endY. We cannot check this value until after
1609 + * jpeg_start_decompress() is called. Note that we have already verified
1610 + * that startY <= endY.
1611 + */
1612 + if (endY > cinfo.output_height - 1) {
1613 + fprintf(stderr, "%s: strip %d-%d exceeds image height %d\n", progname,
1614 + startY, endY, cinfo.output_height);
1615 + exit(EXIT_FAILURE);
1616 + }
1617 +
1618 + /* Write output file header. This is a hack to ensure that the destination
1619 + * manager creates an image of the proper size for the partial decode.
1620 + */
1621 + tmp = cinfo.output_height;
1622 + cinfo.output_height = endY - startY + 1;
1623 + if (skip)
1624 + cinfo.output_height = tmp - cinfo.output_height;
1625 + (*dest_mgr->start_output) (&cinfo, dest_mgr);
1626 + cinfo.output_height = tmp;
1627 +
1628 + /* Process data */
1629 + if (skip) {
1630 + while (cinfo.output_scanline < startY) {
1631 + num_scanlines = jpeg_read_scanlines(&cinfo, dest_mgr->buffer,
1632 + dest_mgr->buffer_height);
1633 + (*dest_mgr->put_pixel_rows) (&cinfo, dest_mgr, num_scanlines);
1634 + }
1635 + jpeg_skip_scanlines(&cinfo, endY - startY + 1);
1636 + while (cinfo.output_scanline < cinfo.output_height) {
1637 + num_scanlines = jpeg_read_scanlines(&cinfo, dest_mgr->buffer,
1638 + dest_mgr->buffer_height);
1639 + (*dest_mgr->put_pixel_rows) (&cinfo, dest_mgr, num_scanlines);
1640 + }
1641 + } else {
1642 + jpeg_skip_scanlines(&cinfo, startY);
1643 + while (cinfo.output_scanline <= endY) {
1644 + num_scanlines = jpeg_read_scanlines(&cinfo, dest_mgr->buffer,
1645 + dest_mgr->buffer_height);
1646 + (*dest_mgr->put_pixel_rows) (&cinfo, dest_mgr, num_scanlines);
1647 + }
1648 + jpeg_skip_scanlines(&cinfo, cinfo.output_height - endY + 1);
1649 + }
1650 +
1651 + /* Normal full image decode */
1652 + } else {
1653 + /* Write output file header */
1654 + (*dest_mgr->start_output) (&cinfo, dest_mgr);
1655 +
1656 + /* Process data */
1657 + while (cinfo.output_scanline < cinfo.output_height) {
1658 + num_scanlines = jpeg_read_scanlines(&cinfo, dest_mgr->buffer,
1659 + dest_mgr->buffer_height);
1660 + (*dest_mgr->put_pixel_rows) (&cinfo, dest_mgr, num_scanlines);
1661 + }
1662 }
1663
1664 #ifdef PROGRESS_REPORT
1665 @@ -610,6 +734,9 @@
1666 end_progress_monitor((j_common_ptr) &cinfo);
1667 #endif
1668
1669 + if (memsrc && inbuffer != NULL)
1670 + free(inbuffer);
1671 +
1672 /* All done. */
1673 exit(jerr.num_warnings ? EXIT_WARNING : EXIT_SUCCESS);
1674 return 0; /* suppress no-return-value warnings */
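Reviewer note on the djpeg.c hunk above: the -strip and -skip paths both divide the image into rows before Y0, rows in the inclusive range Y0..Y1, and rows after Y1. The sketch below works through that arithmetic in isolation. It is only an illustration: `row_split`, `split_rows`, `strip_output_height` and `skip_output_height` are hypothetical names that do not appear in the patch, and the code does not call libjpeg-turbo. It assumes 0-based row indices, as the `endY > cinfo.output_height - 1` check suggests.

```c
#include <stdbool.h>

/* Hypothetical helper type (not part of the patch): how an image of
 * `height` rows is divided by an inclusive row range y0..y1. */
typedef struct {
  unsigned rows_before;   /* rows ahead of the range */
  unsigned rows_in_range; /* rows inside y0..y1, inclusive */
  unsigned rows_after;    /* rows following the range */
} row_split;

/* Returns false for an invalid range, mirroring the two checks the patch
 * performs (startY <= endY at parse time, endY < output_height later). */
static bool split_rows(unsigned height, unsigned y0, unsigned y1,
                       row_split *out)
{
  if (height == 0 || y0 > y1 || y1 > height - 1)
    return false;
  out->rows_before = y0;
  out->rows_in_range = y1 - y0 + 1;
  out->rows_after = height - y1 - 1;
  return true;
}

/* -strip writes only the rows inside the range. */
static unsigned strip_output_height(const row_split *s)
{
  return s->rows_in_range;
}

/* -skip writes everything except the rows inside the range. */
static unsigned skip_output_height(const row_split *s)
{
  return s->rows_before + s->rows_after;
}
```

By this arithmetic, exactly `output_height - endY - 1` rows remain after the last row of a strip has been read. The final `jpeg_skip_scanlines()` call in the -strip branch passes `cinfo.output_height - endY + 1`, so it appears to depend on the library clamping an over-large count.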
1675 Index: jcapimin.c
1676 ===================================================================
1677 --- jcapimin.c (revision 829)
1678 +++ jcapimin.c (working copy)
1679 @@ -2,6 +2,7 @@
1680 * jcapimin.c
1681 *
1682 * Copyright (C) 1994-1998, Thomas G. Lane.
1683 + * Modified 2003-2010 by Guido Vollbeding.
1684 * This file is part of the Independent JPEG Group's software.
1685 * For conditions of distribution and use, see the accompanying README file.
1686 *
1687 @@ -63,8 +64,12 @@
1688
1689 cinfo->comp_info = NULL;
1690
1691 - for (i = 0; i < NUM_QUANT_TBLS; i++)
1692 + for (i = 0; i < NUM_QUANT_TBLS; i++) {
1693 cinfo->quant_tbl_ptrs[i] = NULL;
1694 +#if JPEG_LIB_VERSION >= 70
1695 + cinfo->q_scale_factor[i] = 100;
1696 +#endif
1697 + }
1698
1699 for (i = 0; i < NUM_HUFF_TBLS; i++) {
1700 cinfo->dc_huff_tbl_ptrs[i] = NULL;
1701 @@ -71,6 +76,13 @@
1702 cinfo->ac_huff_tbl_ptrs[i] = NULL;
1703 }
1704
1705 +#if JPEG_LIB_VERSION >= 80
1706 + /* Must do it here for emit_dqt in case jpeg_write_tables is used */
1707 + cinfo->block_size = DCTSIZE;
1708 + cinfo->natural_order = jpeg_natural_order;
1709 + cinfo->lim_Se = DCTSIZE2-1;
1710 +#endif
1711 +
1712 cinfo->script_space = NULL;
1713
1714 cinfo->input_gamma = 1.0; /* in case application forgets */
1715 Index: jccolor.c
1716 ===================================================================
1717 --- jccolor.c (revision 829)
1718 +++ jccolor.c (working copy)
1719 @@ -1,10 +1,11 @@
1720 /*
1721 * jccolor.c
1722 *
1723 + * This file was part of the Independent JPEG Group's software:
1724 * Copyright (C) 1991-1996, Thomas G. Lane.
1725 + * libjpeg-turbo Modifications:
1726 * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
1727 - * Copyright 2009 D. R. Commander
1728 - * This file is part of the Independent JPEG Group's software.
1729 + * Copyright (C) 2009-2012, D. R. Commander.
1730 * For conditions of distribution and use, see the accompanying README file.
1731 *
1732 * This file contains input colorspace conversion routines.
1733 @@ -14,6 +15,7 @@
1734 #include "jinclude.h"
1735 #include "jpeglib.h"
1736 #include "jsimd.h"
1737 +#include "config.h"
1738
1739
1740 /* Private subobject */
1741 @@ -81,6 +83,111 @@
1742 #define TABLE_SIZE (8*(MAXJSAMPLE+1))
1743
1744
1745 +/* Include inline routines for colorspace extensions */
1746 +
1747 +#include "jccolext.c"
1748 +#undef RGB_RED
1749 +#undef RGB_GREEN
1750 +#undef RGB_BLUE
1751 +#undef RGB_PIXELSIZE
1752 +
1753 +#define RGB_RED EXT_RGB_RED
1754 +#define RGB_GREEN EXT_RGB_GREEN
1755 +#define RGB_BLUE EXT_RGB_BLUE
1756 +#define RGB_PIXELSIZE EXT_RGB_PIXELSIZE
1757 +#define rgb_ycc_convert_internal extrgb_ycc_convert_internal
1758 +#define rgb_gray_convert_internal extrgb_gray_convert_internal
1759 +#define rgb_rgb_convert_internal extrgb_rgb_convert_internal
1760 +#include "jccolext.c"
1761 +#undef RGB_RED
1762 +#undef RGB_GREEN
1763 +#undef RGB_BLUE
1764 +#undef RGB_PIXELSIZE
1765 +#undef rgb_ycc_convert_internal
1766 +#undef rgb_gray_convert_internal
1767 +#undef rgb_rgb_convert_internal
1768 +
1769 +#define RGB_RED EXT_RGBX_RED
1770 +#define RGB_GREEN EXT_RGBX_GREEN
1771 +#define RGB_BLUE EXT_RGBX_BLUE
1772 +#define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE
1773 +#define rgb_ycc_convert_internal extrgbx_ycc_convert_internal
1774 +#define rgb_gray_convert_internal extrgbx_gray_convert_internal
1775 +#define rgb_rgb_convert_internal extrgbx_rgb_convert_internal
1776 +#include "jccolext.c"
1777 +#undef RGB_RED
1778 +#undef RGB_GREEN
1779 +#undef RGB_BLUE
1780 +#undef RGB_PIXELSIZE
1781 +#undef rgb_ycc_convert_internal
1782 +#undef rgb_gray_convert_internal
1783 +#undef rgb_rgb_convert_internal
1784 +
1785 +#define RGB_RED EXT_BGR_RED
1786 +#define RGB_GREEN EXT_BGR_GREEN
1787 +#define RGB_BLUE EXT_BGR_BLUE
1788 +#define RGB_PIXELSIZE EXT_BGR_PIXELSIZE
1789 +#define rgb_ycc_convert_internal extbgr_ycc_convert_internal
1790 +#define rgb_gray_convert_internal extbgr_gray_convert_internal
1791 +#define rgb_rgb_convert_internal extbgr_rgb_convert_internal
1792 +#include "jccolext.c"
1793 +#undef RGB_RED
1794 +#undef RGB_GREEN
1795 +#undef RGB_BLUE
1796 +#undef RGB_PIXELSIZE
1797 +#undef rgb_ycc_convert_internal
1798 +#undef rgb_gray_convert_internal
1799 +#undef rgb_rgb_convert_internal
1800 +
1801 +#define RGB_RED EXT_BGRX_RED
1802 +#define RGB_GREEN EXT_BGRX_GREEN
1803 +#define RGB_BLUE EXT_BGRX_BLUE
1804 +#define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE
1805 +#define rgb_ycc_convert_internal extbgrx_ycc_convert_internal
1806 +#define rgb_gray_convert_internal extbgrx_gray_convert_internal
1807 +#define rgb_rgb_convert_internal extbgrx_rgb_convert_internal
1808 +#include "jccolext.c"
1809 +#undef RGB_RED
1810 +#undef RGB_GREEN
1811 +#undef RGB_BLUE
1812 +#undef RGB_PIXELSIZE
1813 +#undef rgb_ycc_convert_internal
1814 +#undef rgb_gray_convert_internal
1815 +#undef rgb_rgb_convert_internal
1816 +
1817 +#define RGB_RED EXT_XBGR_RED
1818 +#define RGB_GREEN EXT_XBGR_GREEN
1819 +#define RGB_BLUE EXT_XBGR_BLUE
1820 +#define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE
1821 +#define rgb_ycc_convert_internal extxbgr_ycc_convert_internal
1822 +#define rgb_gray_convert_internal extxbgr_gray_convert_internal
1823 +#define rgb_rgb_convert_internal extxbgr_rgb_convert_internal
1824 +#include "jccolext.c"
1825 +#undef RGB_RED
1826 +#undef RGB_GREEN
1827 +#undef RGB_BLUE
1828 +#undef RGB_PIXELSIZE
1829 +#undef rgb_ycc_convert_internal
1830 +#undef rgb_gray_convert_internal
1831 +#undef rgb_rgb_convert_internal
1832 +
1833 +#define RGB_RED EXT_XRGB_RED
1834 +#define RGB_GREEN EXT_XRGB_GREEN
1835 +#define RGB_BLUE EXT_XRGB_BLUE
1836 +#define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE
1837 +#define rgb_ycc_convert_internal extxrgb_ycc_convert_internal
1838 +#define rgb_gray_convert_internal extxrgb_gray_convert_internal
1839 +#define rgb_rgb_convert_internal extxrgb_rgb_convert_internal
1840 +#include "jccolext.c"
1841 +#undef RGB_RED
1842 +#undef RGB_GREEN
1843 +#undef RGB_BLUE
1844 +#undef RGB_PIXELSIZE
1845 +#undef rgb_ycc_convert_internal
1846 +#undef rgb_gray_convert_internal
1847 +#undef rgb_rgb_convert_internal
1848 +
1849 +
1850 /*
1851 * Initialize for RGB->YCC colorspace conversion.
1852 */
1853 @@ -119,14 +226,6 @@
1854
1855 /*
1856 * Convert some rows of samples to the JPEG colorspace.
1857 - *
1858 - * Note that we change from the application's interleaved-pixel format
1859 - * to our internal noninterleaved, one-plane-per-component format.
1860 - * The input buffer is therefore three times as wide as the output buffer.
1861 - *
1862 - * A starting row offset is provided only for the output buffer. The caller
1863 - * can easily adjust the passed input_buf value to accommodate any row
1864 - * offset required on that side.
1865 */
1866
1867 METHODDEF(void)
1868 @@ -134,43 +233,39 @@
1869 JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
1870 JDIMENSION output_row, int num_rows)
1871 {
1872 - my_cconvert_ptr cconvert = (my_cconvert_ptr) cinfo->cconvert;
1873 - register int r, g, b;
1874 - register INT32 * ctab = cconvert->rgb_ycc_tab;
1875 - register JSAMPROW inptr;
1876 - register JSAMPROW outptr0, outptr1, outptr2;
1877 - register JDIMENSION col;
1878 - JDIMENSION num_cols = cinfo->image_width;
1879 -
1880 - while (--num_rows >= 0) {
1881 - inptr = *input_buf++;
1882 - outptr0 = output_buf[0][output_row];
1883 - outptr1 = output_buf[1][output_row];
1884 - outptr2 = output_buf[2][output_row];
1885 - output_row++;
1886 - for (col = 0; col < num_cols; col++) {
1887 - r = GETJSAMPLE(inptr[rgb_red[cinfo->in_color_space]]);
1888 - g = GETJSAMPLE(inptr[rgb_green[cinfo->in_color_space]]);
1889 - b = GETJSAMPLE(inptr[rgb_blue[cinfo->in_color_space]]);
1890 - inptr += rgb_pixelsize[cinfo->in_color_space];
1891 - /* If the inputs are 0..MAXJSAMPLE, the outputs of these equations
1892 - * must be too; we do not need an explicit range-limiting operation.
1893 - * Hence the value being shifted is never negative, and we don't
1894 - * need the general RIGHT_SHIFT macro.
1895 - */
1896 - /* Y */
1897 - outptr0[col] = (JSAMPLE)
1898 - ((ctab[r+R_Y_OFF] + ctab[g+G_Y_OFF] + ctab[b+B_Y_OFF])
1899 - >> SCALEBITS);
1900 - /* Cb */
1901 - outptr1[col] = (JSAMPLE)
1902 - ((ctab[r+R_CB_OFF] + ctab[g+G_CB_OFF] + ctab[b+B_CB_OFF])
1903 - >> SCALEBITS);
1904 - /* Cr */
1905 - outptr2[col] = (JSAMPLE)
1906 - ((ctab[r+R_CR_OFF] + ctab[g+G_CR_OFF] + ctab[b+B_CR_OFF])
1907 - >> SCALEBITS);
1908 - }
1909 + switch (cinfo->in_color_space) {
1910 + case JCS_EXT_RGB:
1911 + extrgb_ycc_convert_internal(cinfo, input_buf, output_buf, output_row,
1912 + num_rows);
1913 + break;
1914 + case JCS_EXT_RGBX:
1915 + case JCS_EXT_RGBA:
1916 + extrgbx_ycc_convert_internal(cinfo, input_buf, output_buf, output_row,
1917 + num_rows);
1918 + break;
1919 + case JCS_EXT_BGR:
1920 + extbgr_ycc_convert_internal(cinfo, input_buf, output_buf, output_row,
1921 + num_rows);
1922 + break;
1923 + case JCS_EXT_BGRX:
1924 + case JCS_EXT_BGRA:
1925 + extbgrx_ycc_convert_internal(cinfo, input_buf, output_buf, output_row,
1926 + num_rows);
1927 + break;
1928 + case JCS_EXT_XBGR:
1929 + case JCS_EXT_ABGR:
1930 + extxbgr_ycc_convert_internal(cinfo, input_buf, output_buf, output_row,
1931 + num_rows);
1932 + break;
1933 + case JCS_EXT_XRGB:
1934 + case JCS_EXT_ARGB:
1935 + extxrgb_ycc_convert_internal(cinfo, input_buf, output_buf, output_row,
1936 + num_rows);
1937 + break;
1938 + default:
1939 + rgb_ycc_convert_internal(cinfo, input_buf, output_buf, output_row,
1940 + num_rows);
1941 + break;
1942 }
1943 }
1944
1945 @@ -180,9 +275,6 @@
1946
1947 /*
1948 * Convert some rows of samples to the JPEG colorspace.
1949 - * This version handles RGB->grayscale conversion, which is the same
1950 - * as the RGB->Y portion of RGB->YCbCr.
1951 - * We assume rgb_ycc_start has been called (we only use the Y tables).
1952 */
1953
1954 METHODDEF(void)
1955 @@ -190,28 +282,85 @@
1956 JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
1957 JDIMENSION output_row, int num_rows)
1958 {
1959 - my_cconvert_ptr cconvert = (my_cconvert_ptr) cinfo->cconvert;
1960 - register int r, g, b;
1961 - register INT32 * ctab = cconvert->rgb_ycc_tab;
1962 - register JSAMPROW inptr;
1963 - register JSAMPROW outptr;
1964 - register JDIMENSION col;
1965 - JDIMENSION num_cols = cinfo->image_width;
1966 + switch (cinfo->in_color_space) {
1967 + case JCS_EXT_RGB:
1968 + extrgb_gray_convert_internal(cinfo, input_buf, output_buf, output_row,
1969 + num_rows);
1970 + break;
1971 + case JCS_EXT_RGBX:
1972 + case JCS_EXT_RGBA:
1973 + extrgbx_gray_convert_internal(cinfo, input_buf, output_buf, output_row,
1974 + num_rows);
1975 + break;
1976 + case JCS_EXT_BGR:
1977 + extbgr_gray_convert_internal(cinfo, input_buf, output_buf, output_row,
1978 + num_rows);
1979 + break;
1980 + case JCS_EXT_BGRX:
1981 + case JCS_EXT_BGRA:
1982 + extbgrx_gray_convert_internal(cinfo, input_buf, output_buf, output_row,
1983 + num_rows);
1984 + break;
1985 + case JCS_EXT_XBGR:
1986 + case JCS_EXT_ABGR:
1987 + extxbgr_gray_convert_internal(cinfo, input_buf, output_buf, output_row,
1988 + num_rows);
1989 + break;
1990 + case JCS_EXT_XRGB:
1991 + case JCS_EXT_ARGB:
1992 + extxrgb_gray_convert_internal(cinfo, input_buf, output_buf, output_row,
1993 + num_rows);
1994 + break;
1995 + default:
1996 + rgb_gray_convert_internal(cinfo, input_buf, output_buf, output_row,
1997 + num_rows);
1998 + break;
1999 + }
2000 +}
2001
2002 - while (--num_rows >= 0) {
2003 - inptr = *input_buf++;
2004 - outptr = output_buf[0][output_row];
2005 - output_row++;
2006 - for (col = 0; col < num_cols; col++) {
2007 - r = GETJSAMPLE(inptr[rgb_red[cinfo->in_color_space]]);
2008 - g = GETJSAMPLE(inptr[rgb_green[cinfo->in_color_space]]);
2009 - b = GETJSAMPLE(inptr[rgb_blue[cinfo->in_color_space]]);
2010 - inptr += rgb_pixelsize[cinfo->in_color_space];
2011 - /* Y */
2012 - outptr[col] = (JSAMPLE)
2013 - ((ctab[r+R_Y_OFF] + ctab[g+G_Y_OFF] + ctab[b+B_Y_OFF])
2014 - >> SCALEBITS);
2015 - }
2016 +
2017 +/*
2018 + * Extended RGB to plain RGB conversion
2019 + */
2020 +
2021 +METHODDEF(void)
2022 +rgb_rgb_convert (j_compress_ptr cinfo,
2023 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
2024 + JDIMENSION output_row, int num_rows)
2025 +{
2026 + switch (cinfo->in_color_space) {
2027 + case JCS_EXT_RGB:
2028 + extrgb_rgb_convert_internal(cinfo, input_buf, output_buf, output_row,
2029 + num_rows);
2030 + break;
2031 + case JCS_EXT_RGBX:
2032 + case JCS_EXT_RGBA:
2033 + extrgbx_rgb_convert_internal(cinfo, input_buf, output_buf, output_row,
2034 + num_rows);
2035 + break;
2036 + case JCS_EXT_BGR:
2037 + extbgr_rgb_convert_internal(cinfo, input_buf, output_buf, output_row,
2038 + num_rows);
2039 + break;
2040 + case JCS_EXT_BGRX:
2041 + case JCS_EXT_BGRA:
2042 + extbgrx_rgb_convert_internal(cinfo, input_buf, output_buf, output_row,
2043 + num_rows);
2044 + break;
2045 + case JCS_EXT_XBGR:
2046 + case JCS_EXT_ABGR:
2047 + extxbgr_rgb_convert_internal(cinfo, input_buf, output_buf, output_row,
2048 + num_rows);
2049 + break;
2050 + case JCS_EXT_XRGB:
2051 + case JCS_EXT_ARGB:
2052 + extxrgb_rgb_convert_internal(cinfo, input_buf, output_buf, output_row,
2053 + num_rows);
2054 + break;
2055 + default:
2056 + rgb_rgb_convert_internal(cinfo, input_buf, output_buf, output_row,
2057 + num_rows);
2058 + break;
2059 }
2060 }
2061
2062 @@ -377,6 +526,10 @@
2063 case JCS_EXT_BGRX:
2064 case JCS_EXT_XBGR:
2065 case JCS_EXT_XRGB:
2066 + case JCS_EXT_RGBA:
2067 + case JCS_EXT_BGRA:
2068 + case JCS_EXT_ABGR:
2069 + case JCS_EXT_ARGB:
2070 if (cinfo->input_components != rgb_pixelsize[cinfo->in_color_space])
2071 ERREXIT(cinfo, JERR_BAD_IN_COLORSPACE);
2072 break;
2073 @@ -411,9 +564,17 @@
2074 cinfo->in_color_space == JCS_EXT_BGR ||
2075 cinfo->in_color_space == JCS_EXT_BGRX ||
2076 cinfo->in_color_space == JCS_EXT_XBGR ||
2077 - cinfo->in_color_space == JCS_EXT_XRGB) {
2078 - cconvert->pub.start_pass = rgb_ycc_start;
2079 - cconvert->pub.color_convert = rgb_gray_convert;
2080 + cinfo->in_color_space == JCS_EXT_XRGB ||
2081 + cinfo->in_color_space == JCS_EXT_RGBA ||
2082 + cinfo->in_color_space == JCS_EXT_BGRA ||
2083 + cinfo->in_color_space == JCS_EXT_ABGR ||
2084 + cinfo->in_color_space == JCS_EXT_ARGB) {
2085 + if (jsimd_can_rgb_gray())
2086 + cconvert->pub.color_convert = jsimd_rgb_gray_convert;
2087 + else {
2088 + cconvert->pub.start_pass = rgb_ycc_start;
2089 + cconvert->pub.color_convert = rgb_gray_convert;
2090 + }
2091 } else if (cinfo->in_color_space == JCS_YCbCr)
2092 cconvert->pub.color_convert = grayscale_convert;
2093 else
2094 @@ -421,17 +582,25 @@
2095 break;
2096
2097 case JCS_RGB:
2098 - case JCS_EXT_RGB:
2099 - case JCS_EXT_RGBX:
2100 - case JCS_EXT_BGR:
2101 - case JCS_EXT_BGRX:
2102 - case JCS_EXT_XBGR:
2103 - case JCS_EXT_XRGB:
2104 if (cinfo->num_components != 3)
2105 ERREXIT(cinfo, JERR_BAD_J_COLORSPACE);
2106 - if (cinfo->in_color_space == cinfo->jpeg_color_space &&
2107 - rgb_pixelsize[cinfo->in_color_space] == 3)
2108 + if (rgb_red[cinfo->in_color_space] == 0 &&
2109 + rgb_green[cinfo->in_color_space] == 1 &&
2110 + rgb_blue[cinfo->in_color_space] == 2 &&
2111 + rgb_pixelsize[cinfo->in_color_space] == 3)
2112 cconvert->pub.color_convert = null_convert;
2113 + else if (cinfo->in_color_space == JCS_RGB ||
2114 + cinfo->in_color_space == JCS_EXT_RGB ||
2115 + cinfo->in_color_space == JCS_EXT_RGBX ||
2116 + cinfo->in_color_space == JCS_EXT_BGR ||
2117 + cinfo->in_color_space == JCS_EXT_BGRX ||
2118 + cinfo->in_color_space == JCS_EXT_XBGR ||
2119 + cinfo->in_color_space == JCS_EXT_XRGB ||
2120 + cinfo->in_color_space == JCS_EXT_RGBA ||
2121 + cinfo->in_color_space == JCS_EXT_BGRA ||
2122 + cinfo->in_color_space == JCS_EXT_ABGR ||
2123 + cinfo->in_color_space == JCS_EXT_ARGB)
2124 + cconvert->pub.color_convert = rgb_rgb_convert;
2125 else
2126 ERREXIT(cinfo, JERR_CONVERSION_NOTIMPL);
2127 break;
2128 @@ -445,7 +614,11 @@
2129 cinfo->in_color_space == JCS_EXT_BGR ||
2130 cinfo->in_color_space == JCS_EXT_BGRX ||
2131 cinfo->in_color_space == JCS_EXT_XBGR ||
2132 - cinfo->in_color_space == JCS_EXT_XRGB) {
2133 + cinfo->in_color_space == JCS_EXT_XRGB ||
2134 + cinfo->in_color_space == JCS_EXT_RGBA ||
2135 + cinfo->in_color_space == JCS_EXT_BGRA ||
2136 + cinfo->in_color_space == JCS_EXT_ABGR ||
2137 + cinfo->in_color_space == JCS_EXT_ARGB) {
2138 if (jsimd_can_rgb_ycc())
2139 cconvert->pub.color_convert = jsimd_rgb_ycc_convert;
2140 else {
2141 Index: jcdctmgr.c
2142 ===================================================================
2143 --- jcdctmgr.c (revision 829)
2144 +++ jcdctmgr.c (working copy)
2145 @@ -1,10 +1,12 @@
2146 /*
2147 * jcdctmgr.c
2148 *
2149 + * This file was part of the Independent JPEG Group's software:
2150 * Copyright (C) 1994-1996, Thomas G. Lane.
2151 + * libjpeg-turbo Modifications:
2152 * Copyright (C) 1999-2006, MIYASAKA Masaru.
2153 * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
2154 - * This file is part of the Independent JPEG Group's software.
2155 + * Copyright (C) 2011 D. R. Commander
2156 * For conditions of distribution and use, see the accompanying README file.
2157 *
2158 * This file contains the forward-DCT management logic.
2159 @@ -39,6 +41,8 @@
2160 (JCOEFPTR coef_block, FAST_FLOAT * divisors,
2161 FAST_FLOAT * workspace));
2162
2163 +METHODDEF(void) quantize (JCOEFPTR, DCTELEM *, DCTELEM *);
2164 +
2165 typedef struct {
2166 struct jpeg_forward_dct pub; /* public fields */
2167
2168 @@ -73,7 +77,7 @@
2169 * Find the highest bit in an integer through binary search.
2170 */
2171 LOCAL(int)
2172 -fls (UINT16 val)
2173 +flss (UINT16 val)
2174 {
2175 int bit;
2176
2177 @@ -160,7 +164,7 @@
2178 * of in a consecutive manner, yet again in order to allow SIMD
2179 * routines.
2180 */
2181 -LOCAL(void)
2182 +LOCAL(int)
2183 compute_reciprocal (UINT16 divisor, DCTELEM * dtbl)
2184 {
2185 UDCTELEM2 fq, fr;
2186 @@ -167,7 +171,7 @@
2187 UDCTELEM c;
2188 int b, r;
2189
2190 - b = fls(divisor) - 1;
2191 + b = flss(divisor) - 1;
2192 r = sizeof(DCTELEM) * 8 + b;
2193
2194 fq = ((UDCTELEM2)1 << r) / divisor;
2195 @@ -179,7 +183,7 @@
2196 /* fq will be one bit too large to fit in DCTELEM, so adjust */
2197 fq >>= 1;
2198 r--;
2199 - } else if (fr <= (divisor / 2)) { /* fractional part is < 0.5 */
2200 + } else if (fr <= (divisor / 2U)) { /* fractional part is < 0.5 */
2201 c++;
2202 } else { /* fractional part is > 0.5 */
2203 fq++;
2204 @@ -189,6 +193,9 @@
2205 dtbl[DCTSIZE2 * 1] = (DCTELEM) c; /* correction + roundfactor */
2206 dtbl[DCTSIZE2 * 2] = (DCTELEM) (1 << (sizeof(DCTELEM)*8*2 - r)); /* scale */
2207 dtbl[DCTSIZE2 * 3] = (DCTELEM) r - sizeof(DCTELEM)*8; /* shift */
2208 +
2209 + if(r <= 16) return 0;
2210 + else return 1;
2211 }
2212
2213 /*
2214 @@ -232,7 +239,9 @@
2215 }
2216 dtbl = fdct->divisors[qtblno];
2217 for (i = 0; i < DCTSIZE2; i++) {
2218 - compute_reciprocal(qtbl->quantval[i] << 3, &dtbl[i]);
2219 + if(!compute_reciprocal(qtbl->quantval[i] << 3, &dtbl[i])
2220 + && fdct->quantize == jsimd_quantize)
2221 + fdct->quantize = quantize;
2222 }
2223 break;
2224 #endif
2225 @@ -266,10 +275,12 @@
2226 }
2227 dtbl = fdct->divisors[qtblno];
2228 for (i = 0; i < DCTSIZE2; i++) {
2229 - compute_reciprocal(
2230 + if(!compute_reciprocal(
2231 DESCALE(MULTIPLY16V16((INT32) qtbl->quantval[i],
2232 (INT32) aanscales[i]),
2233 - CONST_BITS-3), &dtbl[i]);
2234 + CONST_BITS-3), &dtbl[i])
2235 + && fdct->quantize == jsimd_quantize)
2236 + fdct->quantize = quantize;
2237 }
2238 }
2239 break;
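Reviewer note on the jcdctmgr.c hunk above: the diff renames `fls` to `flss` but does not show the function body, which the comment describes as finding the highest bit through binary search. Below is a minimal sketch of that technique for a 16-bit value. The name `highest_bit16` is hypothetical and the body is an assumption about how such a search is usually written, not a copy of the patched function. The rename presumably avoids a clash with a platform-provided `fls()`, but that is a guess.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 16-bit "find last set" routine: returns the
 * 1-based position of the highest set bit, or 0 when val is 0. */
static int highest_bit16(uint16_t val)
{
  int bit = 16;

  if (!val)
    return 0;
  /* Halve the search window at each step: if the upper part of the
   * current window is empty, shift the value up and lower the estimate. */
  if (!(val & 0xff00)) { bit -= 8; val = (uint16_t)(val << 8); }
  if (!(val & 0xf000)) { bit -= 4; val = (uint16_t)(val << 4); }
  if (!(val & 0xc000)) { bit -= 2; val = (uint16_t)(val << 2); }
  if (!(val & 0x8000)) { bit -= 1; }
  return bit;
}
```

With a helper like this, `compute_reciprocal()` derives `b` as the result minus one, so a divisor of 8 gives `b = 3`.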
2240 Index: jchuff.c
2241 ===================================================================
2242 --- jchuff.c (revision 829)
2243 +++ jchuff.c (working copy)
2244 @@ -1,8 +1,10 @@
2245 /*
2246 * jchuff.c
2247 *
2248 + * This file was part of the Independent JPEG Group's software:
2249 * Copyright (C) 1991-1997, Thomas G. Lane.
2250 - * This file is part of the Independent JPEG Group's software.
2251 + * libjpeg-turbo Modifications:
2252 + * Copyright (C) 2009-2011, D. R. Commander.
2253 * For conditions of distribution and use, see the accompanying README file.
2254 *
2255 * This file contains Huffman entropy encoding routines.
2256 @@ -14,21 +16,6 @@
2257 * permanent JPEG objects only upon successful completion of an MCU.
2258 */
2259
2260 -/* Modifications:
2261 - * Copyright (C)2007 Sun Microsystems, Inc.
2262 - * Copyright (C)2009 D. R. Commander
2263 - *
2264 - * This library is free software and may be redistributed and/or modified under
2265 - * the terms of the wxWindows Library License, Version 3.1 or (at your option)
2266 - * any later version. The full license is in the LICENSE.txt file included
2267 - * with this distribution.
2268 - *
2269 - * This library is distributed in the hope that it will be useful,
2270 - * but WITHOUT ANY WARRANTY; without even the implied warranty of
2271 - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
2272 - * wxWindows Library License for more details.
2273 - */
2274 -
2275 #define JPEG_INTERNALS
2276 #include "jinclude.h"
2277 #include "jpeglib.h"
2278 @@ -35,13 +22,42 @@
2279 #include "jchuff.h" /* Declarations shared with jcphuff.c */
2280 #include <limits.h>
2281
2282 -static unsigned char jpeg_first_bit_table[65536];
2283 -int jpeg_first_bit_table_init=0;
2284 +/*
2285 + * NOTE: If USE_CLZ_INTRINSIC is defined, then clz/bsr instructions will be
2286 + * used for bit counting rather than the lookup table. This will reduce the
2287 + * memory footprint by 64k, which is important for some mobile applications
2288 + * that create many isolated instances of libjpeg-turbo (web browsers, for
2289 + * instance.) This may improve performance on some mobile platforms as well.
2290 + * This feature is enabled by default only on ARM processors, because some x86
2291 + * chips have a slow implementation of bsr, and the use of clz/bsr cannot be
2292 + * shown to have a significant performance impact even on the x86 chips that
2293 + * have a fast implementation of it. When building for ARMv6, you can
2294 + * explicitly disable the use of clz/bsr by adding -mthumb to the compiler
2295 + * flags (this defines __thumb__).
2296 + */
2297
2298 +/* NOTE: Both GCC and Clang define __GNUC__ */
2299 +#if defined __GNUC__ && defined __arm__
2300 +#if !defined __thumb__ || defined __thumb2__
2301 +#define USE_CLZ_INTRINSIC
2302 +#endif
2303 +#endif
2304 +
2305 +#ifdef USE_CLZ_INTRINSIC
2306 +#define JPEG_NBITS_NONZERO(x) (32 - __builtin_clz(x))
2307 +#define JPEG_NBITS(x) (x ? JPEG_NBITS_NONZERO(x) : 0)
2308 +#else
2309 +static unsigned char jpeg_nbits_table[65536];
2310 +static int jpeg_nbits_table_init = 0;
2311 +#define JPEG_NBITS(x) (jpeg_nbits_table[x])
2312 +#define JPEG_NBITS_NONZERO(x) JPEG_NBITS(x)
2313 +#endif
2314 +
2315 #ifndef min
2316 #define min(a,b) ((a)<(b)?(a):(b))
2317 #endif
2318
2319 +
2320 /* Expanded entropy encoder object for Huffman encoding.
2321 *
2322 * The savable_state subrecord contains fields that change within an MCU,
2323 @@ -49,7 +65,7 @@
2324 */
2325
2326 typedef struct {
2327 - long put_buffer; /* current bit-accumulation buffer */
2328 + size_t put_buffer; /* current bit-accumulation buffer */
2329 int put_bits; /* # of bits now in it */
2330 int last_dc_val[MAX_COMPS_IN_SCAN]; /* last DC coef for each component */
2331 } savable_state;
2332 @@ -181,7 +197,6 @@
2333 }
2334
2335 /* Initialize bit buffer to empty */
2336 -
2337 entropy->saved.put_buffer = 0;
2338 entropy->saved.put_bits = 0;
2339
2340 @@ -285,14 +300,16 @@
2341 dtbl->ehufsi[i] = huffsize[p];
2342 }
2343
2344 - if(!jpeg_first_bit_table_init) {
2345 +#ifndef USE_CLZ_INTRINSIC
2346 + if(!jpeg_nbits_table_init) {
2347 for(i = 0; i < 65536; i++) {
2348 - int bit = 0, val = i;
2349 - while (val) {val >>= 1; bit++;}
2350 - jpeg_first_bit_table[i] = bit;
2351 + int nbits = 0, temp = i;
2352 + while (temp) {temp >>= 1; nbits++;}
2353 + jpeg_nbits_table[i] = nbits;
2354 }
2355 - jpeg_first_bit_table_init = 1;
2356 + jpeg_nbits_table_init = 1;
2357 }
2358 +#endif
2359 }
2360
2361
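Reviewer note on the hunk above: the patch keeps two ways of computing the bit count of a coefficient magnitude, the 64k lookup table and the clz intrinsic. The sketch below is not part of the patch. It uses hypothetical helper names (`nbits_loop`, `nbits_nonzero`, `nbits`, `nbits_methods_agree`) to show that the two approaches should agree on every 16-bit input, assuming a GCC-compatible compiler for `__builtin_clz`. Unlike the patch, which hardcodes 32, it derives the word width from `sizeof(unsigned int)`.

```c
#include <assert.h>
#include <limits.h>

/* Reference bit count: the same shift loop the patch uses to fill
 * jpeg_nbits_table[]. Returns 0 for x == 0. */
static int nbits_loop(unsigned int x)
{
  int nbits = 0;
  while (x) { x >>= 1; nbits++; }
  return nbits;
}

/* clz-based count for nonzero input (hypothetical helper, not in the patch).
 * __builtin_clz(0) is undefined, hence the precondition. */
static int nbits_nonzero(unsigned int x)
{
  assert(x != 0);
#if defined(__GNUC__)   /* GCC and Clang both define __GNUC__ */
  return (int)(sizeof(unsigned int) * CHAR_BIT) - __builtin_clz(x);
#else
  return nbits_loop(x); /* fallback when the intrinsic is unavailable */
#endif
}

/* Zero-safe wrapper, analogous in spirit to JPEG_NBITS(x). */
static int nbits(unsigned int x)
{
  return x ? nbits_nonzero(x) : 0;
}

/* Returns 1 if both methods agree on every 16-bit value, else 0. */
static int nbits_methods_agree(void)
{
  unsigned int i;
  for (i = 0; i < 65536u; i++)
    if (nbits(i) != nbits_loop(i))
      return 0;
  return 1;
}
```

A caller that already knows the value is nonzero (as the AC loop does) can use the `nbits_nonzero` form directly and skip the zero test.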
2362 @@ -312,8 +329,6 @@
2363 {
2364 struct jpeg_destination_mgr * dest = state->cinfo->dest;
2365
2366 - dest->free_in_buffer = state->free_in_buffer;
2367 -
2368 if (! (*dest->empty_output_buffer) (state->cinfo))
2369 return FALSE;
2370 /* After a successful buffer dump, must reset buffer pointers */
2371 @@ -325,178 +340,133 @@
2372
2373 /* Outputting bits to the file */
2374
2375 -/* Only the right 24 bits of put_buffer are used; the valid bits are
2376 - * left-justified in this part. At most 16 bits can be passed to emit_bits
2377 - * in one call, and we never retain more than 7 bits in put_buffer
2378 - * between calls, so 24 bits are sufficient.
2379 +/* These macros perform the same task as the emit_bits() function in the
2380 + * original libjpeg code. In addition to reducing overhead by explicitly
2381 + * inlining the code, additional performance is achieved by taking into
2382 + * account the size of the bit buffer and waiting until it is almost full
2383 + * before emptying it. This mostly benefits 64-bit platforms, since 6
2384 + * bytes can be stored in a 64-bit bit buffer before it has to be emptied.
2385 */
2386
2387 -/***************************************************************/
2388 -
2389 -#define EMIT_BYTE() { \
2390 - if (0xFF == (*buffer++ = (unsigned char)(put_buffer >> (put_bits -= 8)))) \
2391 - *buffer++ = 0; \
2392 +#define EMIT_BYTE() { \
2393 + JOCTET c; \
2394 + put_bits -= 8; \
2395 + c = (JOCTET)GETJOCTET(put_buffer >> put_bits); \
2396 + *buffer++ = c; \
2397 + if (c == 0xFF) /* need to stuff a zero byte? */ \
2398 + *buffer++ = 0; \
2399 }
2400
2401 -/***************************************************************/
2402 +#define PUT_BITS(code, size) { \
2403 + put_bits += size; \
2404 + put_buffer = (put_buffer << size) | code; \
2405 +}
2406
2407 -#define DUMP_BITS_(code, size) { \
2408 - put_bits += size; \
2409 - put_buffer = (put_buffer << size) | code; \
2410 - if (put_bits > 7) \
2411 - while(put_bits > 7) \
2412 - EMIT_BYTE() \
2413 - }
2414 -
2415 -/***************************************************************/
2416 -
2417 -#define CHECKBUF15() { \
2418 - if (put_bits > 15) { \
2419 - EMIT_BYTE() \
2420 - EMIT_BYTE() \
2421 - } \
2422 +#define CHECKBUF15() { \
2423 + if (put_bits > 15) { \
2424 + EMIT_BYTE() \
2425 + EMIT_BYTE() \
2426 + } \
2427 }
2428
2429 -#define CHECKBUF47() { \
2430 - if (put_bits > 47) { \
2431 - EMIT_BYTE() \
2432 - EMIT_BYTE() \
2433 - EMIT_BYTE() \
2434 - EMIT_BYTE() \
2435 - EMIT_BYTE() \
2436 - EMIT_BYTE() \
2437 - } \
2438 +#define CHECKBUF31() { \
2439 + if (put_bits > 31) { \
2440 + EMIT_BYTE() \
2441 + EMIT_BYTE() \
2442 + EMIT_BYTE() \
2443 + EMIT_BYTE() \
2444 + } \
2445 }
2446
2447 -#define CHECKBUF31() { \
2448 - if (put_bits > 31) { \
2449 - EMIT_BYTE() \
2450 - EMIT_BYTE() \
2451 - EMIT_BYTE() \
2452 - EMIT_BYTE() \
2453 - } \
2454 +#define CHECKBUF47() { \
2455 + if (put_bits > 47) { \
2456 + EMIT_BYTE() \
2457 + EMIT_BYTE() \
2458 + EMIT_BYTE() \
2459 + EMIT_BYTE() \
2460 + EMIT_BYTE() \
2461 + EMIT_BYTE() \
2462 + } \
2463 }
2464
2465 -/***************************************************************/
2466 +#if __WORDSIZE==64 || defined(_WIN64)
2467
2468 -#define DUMP_BITS_NOCHECK(code, size) { \
2469 - put_bits += size; \
2470 - put_buffer = (put_buffer << size) | code; \
2471 - }
2472 +#define EMIT_BITS(code, size) { \
2473 + CHECKBUF47() \
2474 + PUT_BITS(code, size) \
2475 +}
2476
2477 -#if __WORDSIZE==64
2478 -
2479 -#define DUMP_BITS(code, size) { \
2480 - CHECKBUF47() \
2481 - put_bits += size; \
2482 - put_buffer = (put_buffer << size) | code; \
2483 +#define EMIT_CODE(code, size) { \
2484 + temp2 &= (((INT32) 1)<<nbits) - 1; \
2485 + CHECKBUF31() \
2486 + PUT_BITS(code, size) \
2487 + PUT_BITS(temp2, nbits) \
2488 }
2489
2490 #else
2491
2492 -#define DUMP_BITS(code, size) { \
2493 - put_bits += size; \
2494 - put_buffer = (put_buffer << size) | code; \
2495 - CHECKBUF15() \
2496 - }
2497 +#define EMIT_BITS(code, size) { \
2498 + PUT_BITS(code, size) \
2499 + CHECKBUF15() \
2500 +}
2501
2502 -#endif
2503 -
2504 -/***************************************************************/
2505 -
2506 -#define DUMP_SINGLE_VALUE(ht, codevalue) { \
2507 - size = ht->ehufsi[codevalue]; \
2508 - code = ht->ehufco[codevalue]; \
2509 - \
2510 - DUMP_BITS(code, size) \
2511 +#define EMIT_CODE(code, size) { \
2512 + temp2 &= (((INT32) 1)<<nbits) - 1; \
2513 + PUT_BITS(code, size) \
2514 + CHECKBUF15() \
2515 + PUT_BITS(temp2, nbits) \
2516 + CHECKBUF15() \
2517 }
2518
2519 -/***************************************************************/
2520 -
2521 -#define DUMP_VALUE_SLOW(ht, codevalue, t, nbits) { \
2522 - size = ht->ehufsi[codevalue]; \
2523 - code = ht->ehufco[codevalue]; \
2524 - t &= ~(-1 << nbits); \
2525 - DUMP_BITS_NOCHECK(code, size) \
2526 - CHECKBUF15() \
2527 - DUMP_BITS_NOCHECK(t, nbits) \
2528 - CHECKBUF15() \
2529 - }
2530 -
2531 -int _max=0;
2532 -
2533 -#if __WORDSIZE==64
2534 -
2535 -#define DUMP_VALUE(ht, codevalue, t, nbits) { \
2536 - size = ht->ehufsi[codevalue]; \
2537 - code = ht->ehufco[codevalue]; \
2538 - t &= ~(-1 << nbits); \
2539 - CHECKBUF31() \
2540 - DUMP_BITS_NOCHECK(code, size) \
2541 - DUMP_BITS_NOCHECK(t, nbits) \
2542 - }
2543 -
2544 -#else
2545 -
2546 -#define DUMP_VALUE(ht, codevalue, t, nbits) { \
2547 - size = ht->ehufsi[codevalue]; \
2548 - code = ht->ehufco[codevalue]; \
2549 - t &= ~(-1 << nbits); \
2550 - DUMP_BITS_NOCHECK(code, size) \
2551 - CHECKBUF15() \
2552 - DUMP_BITS_NOCHECK(t, nbits) \
2553 - CHECKBUF15() \
2554 - }
2555 -
2556 #endif
2557
2558 -/***************************************************************/
2559
2560 #define BUFSIZE (DCTSIZE2 * 2)
2561
2562 -#define LOAD_BUFFER() { \
2563 - if (state->free_in_buffer < BUFSIZE) { \
2564 - localbuf = 1; \
2565 - buffer = _buffer; \
2566 - } \
2567 - else buffer = state->next_output_byte; \
2568 +#define LOAD_BUFFER() { \
2569 + if (state->free_in_buffer < BUFSIZE) { \
2570 + localbuf = 1; \
2571 + buffer = _buffer; \
2572 + } \
2573 + else buffer = state->next_output_byte; \
2574 }
2575
2576 -#define STORE_BUFFER() { \
2577 - if (localbuf) { \
2578 - bytes = buffer - _buffer; \
2579 - buffer = _buffer; \
2580 - while (bytes > 0) { \
2581 - bytestocopy = min(bytes, state->free_in_buffer); \
2582 - MEMCOPY(state->next_output_byte, buffer, bytestocopy); \
2583 - state->next_output_byte += bytestocopy; \
2584 - buffer += bytestocopy; \
2585 - state->free_in_buffer -= bytestocopy; \
2586 - if (state->free_in_buffer == 0) \
2587 - if (! dump_buffer(state)) return FALSE; \
2588 - bytes -= bytestocopy; \
2589 - } \
2590 - } \
2591 - else { \
2592 - state->free_in_buffer -= (buffer - state->next_output_byte); \
2593 - state->next_output_byte = buffer; \
2594 - } \
2595 +#define STORE_BUFFER() { \
2596 + if (localbuf) { \
2597 + bytes = buffer - _buffer; \
2598 + buffer = _buffer; \
2599 + while (bytes > 0) { \
2600 + bytestocopy = min(bytes, state->free_in_buffer); \
2601 + MEMCOPY(state->next_output_byte, buffer, bytestocopy); \
2602 + state->next_output_byte += bytestocopy; \
2603 + buffer += bytestocopy; \
2604 + state->free_in_buffer -= bytestocopy; \
2605 + if (state->free_in_buffer == 0) \
2606 + if (! dump_buffer(state)) return FALSE; \
2607 + bytes -= bytestocopy; \
2608 + } \
2609 + } \
2610 + else { \
2611 + state->free_in_buffer -= (buffer - state->next_output_byte); \
2612 + state->next_output_byte = buffer; \
2613 + } \
2614 }
2615
2616 -/***************************************************************/
2617
2618 LOCAL(boolean)
2619 flush_bits (working_state * state)
2620 {
2621 - unsigned char _buffer[BUFSIZE], *buffer;
2622 - long put_buffer; int put_bits;
2623 - int bytes, bytestocopy, localbuf = 0;
2624 + JOCTET _buffer[BUFSIZE], *buffer;
2625 + size_t put_buffer; int put_bits;
2626 + size_t bytes, bytestocopy; int localbuf = 0;
2627
2628 put_buffer = state->cur.put_buffer;
2629 put_bits = state->cur.put_bits;
2630 LOAD_BUFFER()
2631
2632 - DUMP_BITS_(0x7F, 7)
2633 + /* fill any partial byte with ones */
2634 + PUT_BITS(0x7F, 7)
2635 + while (put_bits >= 8) EMIT_BYTE()
2636
2637 state->cur.put_buffer = 0; /* and reset bit-buffer to empty */
2638 state->cur.put_bits = 0;
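Reviewer note on the hunk above: EMIT_BYTE() and the new flush_bits() body together implement JPEG byte stuffing (a zero byte after every 0xFF) and padding of the final partial byte with one bits. The following is a minimal standalone sketch of that behavior, not the patch's code. `bit_state`, `put_bits_sketch`, `emit_byte_sketch` and `flush_sketch` are hypothetical names, and the caller is assumed to supply an output buffer large enough for the stuffed bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the put_buffer/put_bits pair in savable_state. */
typedef struct {
  size_t put_buffer;  /* bit accumulator, newest bits at the low end */
  int put_bits;       /* number of valid bits in put_buffer */
} bit_state;

/* Append `size` bits of `code`; mirrors PUT_BITS(). */
static void put_bits_sketch(bit_state *s, unsigned int code, int size)
{
  assert(size >= 0 && size <= 16);
  s->put_bits += size;
  s->put_buffer = (s->put_buffer << size) | code;
}

/* Write the oldest 8 buffered bits; stuff 0x00 after a 0xFF byte.
 * Returns the advanced output pointer. Mirrors EMIT_BYTE(). */
static unsigned char *emit_byte_sketch(bit_state *s, unsigned char *out)
{
  unsigned char c;
  s->put_bits -= 8;
  c = (unsigned char)((s->put_buffer >> s->put_bits) & 0xFF);
  *out++ = c;
  if (c == 0xFF)
    *out++ = 0;
  return out;
}

/* Pad any partial byte with ones, drain whole bytes, reset the state.
 * Returns the number of bytes written, including stuffed zeros. */
static size_t flush_sketch(bit_state *s, unsigned char *out)
{
  unsigned char *p = out;
  put_bits_sketch(s, 0x7F, 7);
  while (s->put_bits >= 8)
    p = emit_byte_sketch(s, p);
  s->put_buffer = 0;
  s->put_bits = 0;
  return (size_t)(p - out);
}
```

For example, buffering the three bits 101 and flushing yields the single byte 0xBF (101 followed by five one bits); buffering a full 0xFF yields 0xFF 0x00.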
2639 @@ -505,6 +475,7 @@
2640 return TRUE;
2641 }
2642
2643 +
2644 /* Encode a single block's worth of coefficients */
2645
2646 LOCAL(boolean)
2647 @@ -511,13 +482,13 @@
2648 encode_one_block (working_state * state, JCOEFPTR block, int last_dc_val,
2649 c_derived_tbl *dctbl, c_derived_tbl *actbl)
2650 {
2651 - int temp, temp2;
2652 + int temp, temp2, temp3;
2653 int nbits;
2654 - int r, sflag, size, code;
2655 - unsigned char _buffer[BUFSIZE], *buffer;
2656 - long put_buffer; int put_bits;
2657 + int r, code, size;
2658 + JOCTET _buffer[BUFSIZE], *buffer;
2659 + size_t put_buffer; int put_bits;
2660 int code_0xf0 = actbl->ehufco[0xf0], size_0xf0 = actbl->ehufsi[0xf0];
2661 - int bytes, bytestocopy, localbuf = 0;
2662 + size_t bytes, bytestocopy; int localbuf = 0;
2663
2664 put_buffer = state->cur.put_buffer;
2665 put_bits = state->cur.put_bits;
2666 @@ -527,50 +498,88 @@
2667
2668 temp = temp2 = block[0] - last_dc_val;
2669
2670 - sflag = temp >> 31;
2671 - temp -= ((temp + temp) & sflag);
2672 - temp2 += sflag;
2673 - nbits = jpeg_first_bit_table[temp];
2674 - DUMP_VALUE_SLOW(dctbl, nbits, temp2, nbits)
2675 + /* This is a well-known technique for obtaining the absolute value without a
2676 + * branch. It is derived from an assembly language technique presented in
2677 + * "How to Optimize for the Pentium Processors", Copyright (c) 1996, 1997 by
2678 + * Agner Fog.
2679 + */
2680 + temp3 = temp >> (CHAR_BIT * sizeof(int) - 1);
2681 + temp ^= temp3;
2682 + temp -= temp3;
2683
2684 + /* For a negative input, want temp2 = bitwise complement of abs(input) */
2685 + /* This code assumes we are on a two's complement machine */
2686 + temp2 += temp3;
2687 +
2688 + /* Find the number of bits needed for the magnitude of the coefficient */
2689 + nbits = JPEG_NBITS(temp);
2690 +
2691 + /* Emit the Huffman-coded symbol for the number of bits */
2692 + code = dctbl->ehufco[nbits];
2693 + size = dctbl->ehufsi[nbits];
2694 + PUT_BITS(code, size)
2695 + CHECKBUF15()
2696 +
2697 + /* Mask off any extra bits in code */
2698 + temp2 &= (((INT32) 1)<<nbits) - 1;
2699 +
2700 + /* Emit that number of bits of the value, if positive, */
2701 + /* or the complement of its magnitude, if negative. */
2702 + PUT_BITS(temp2, nbits)
2703 + CHECKBUF15()
2704 +
2705 /* Encode the AC coefficients per section F.1.2.2 */
2706
2707 r = 0; /* r = run length of zeros */
2708
2709 -#define innerloop(order) { \
2710 - temp2 = *(JCOEF*)((unsigned char*)block + order); \
2711 - if(temp2 == 0) r++; \
2712 - else { \
2713 - temp = (JCOEF)temp2; \
2714 - sflag = temp >> 31; \
2715 - temp = (temp ^ sflag) - sflag; \
2716 - temp2 += sflag; \
2717 - nbits = jpeg_first_bit_table[temp]; \
2718 - for(; r > 15; r -= 16) DUMP_BITS(code_0xf0, size_0xf0) \
2719 - sflag = (r << 4) + nbits; \
2720 - DUMP_VALUE(actbl, sflag, temp2, nbits) \
2721 +/* Manually unroll the k loop to eliminate the counter variable. This
2722 + * improves performance greatly on systems with a limited number of
2723 + * registers (such as x86.)
2724 + */
2725 +#define kloop(jpeg_natural_order_of_k) { \
2726 + if ((temp = block[jpeg_natural_order_of_k]) == 0) { \
2727 + r++; \
2728 + } else { \
2729 + temp2 = temp; \
2730 + /* Branch-less absolute value, bitwise complement, etc., same as above */ \
2731 + temp3 = temp >> (CHAR_BIT * sizeof(int) - 1); \
2732 + temp ^= temp3; \
2733 + temp -= temp3; \
2734 + temp2 += temp3; \
2735 + nbits = JPEG_NBITS_NONZERO(temp); \
2736 + /* if run length > 15, must emit special run-length-16 codes (0xF0) */ \
2737 + while (r > 15) { \
2738 + EMIT_BITS(code_0xf0, size_0xf0) \
2739 + r -= 16; \
2740 + } \
2741 + /* Emit Huffman symbol for run length / number of bits */ \
2742 + temp3 = (r << 4) + nbits; \
2743 + code = actbl->ehufco[temp3]; \
2744 + size = actbl->ehufsi[temp3]; \
2745 + EMIT_CODE(code, size) \
2746 r = 0; \
2747 - }}
2748 + } \
2749 +}
2750
2751 - innerloop(2*1); innerloop(2*8); innerloop(2*16); innerloop(2*9);
2752 - innerloop(2*2); innerloop(2*3); innerloop(2*10); innerloop(2*17);
2753 - innerloop(2*24); innerloop(2*32); innerloop(2*25); innerloop(2*18);
2754 - innerloop(2*11); innerloop(2*4); innerloop(2*5); innerloop(2*12);
2755 - innerloop(2*19); innerloop(2*26); innerloop(2*33); innerloop(2*40);
2756 - innerloop(2*48); innerloop(2*41); innerloop(2*34); innerloop(2*27);
2757 - innerloop(2*20); innerloop(2*13); innerloop(2*6); innerloop(2*7);
2758 - innerloop(2*14); innerloop(2*21); innerloop(2*28); innerloop(2*35);
2759 - innerloop(2*42); innerloop(2*49); innerloop(2*56); innerloop(2*57);
2760 - innerloop(2*50); innerloop(2*43); innerloop(2*36); innerloop(2*29);
2761 - innerloop(2*22); innerloop(2*15); innerloop(2*23); innerloop(2*30);
2762 - innerloop(2*37); innerloop(2*44); innerloop(2*51); innerloop(2*58);
2763 - innerloop(2*59); innerloop(2*52); innerloop(2*45); innerloop(2*38);
2764 - innerloop(2*31); innerloop(2*39); innerloop(2*46); innerloop(2*53);
2765 - innerloop(2*60); innerloop(2*61); innerloop(2*54); innerloop(2*47);
2766 - innerloop(2*55); innerloop(2*62); innerloop(2*63);
2767 + /* One iteration for each value in jpeg_natural_order[] */
2768 + kloop(1); kloop(8); kloop(16); kloop(9); kloop(2); kloop(3);
2769 + kloop(10); kloop(17); kloop(24); kloop(32); kloop(25); kloop(18);
2770 + kloop(11); kloop(4); kloop(5); kloop(12); kloop(19); kloop(26);
2771 + kloop(33); kloop(40); kloop(48); kloop(41); kloop(34); kloop(27);
2772 + kloop(20); kloop(13); kloop(6); kloop(7); kloop(14); kloop(21);
2773 + kloop(28); kloop(35); kloop(42); kloop(49); kloop(56); kloop(57);
2774 + kloop(50); kloop(43); kloop(36); kloop(29); kloop(22); kloop(15);
2775 + kloop(23); kloop(30); kloop(37); kloop(44); kloop(51); kloop(58);
2776 + kloop(59); kloop(52); kloop(45); kloop(38); kloop(31); kloop(39);
2777 + kloop(46); kloop(53); kloop(60); kloop(61); kloop(54); kloop(47);
2778 + kloop(55); kloop(62); kloop(63);
2779
2780 /* If the last coef(s) were zero, emit an end-of-block code */
2781 - if (r > 0) DUMP_SINGLE_VALUE(actbl, 0x0)
2782 + if (r > 0) {
2783 + code = actbl->ehufco[0];
2784 + size = actbl->ehufsi[0];
2785 + EMIT_BITS(code, size)
2786 + }
2787
2788 state->cur.put_buffer = put_buffer;
2789 state->cur.put_bits = put_bits;
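Reviewer note on the encode_one_block() changes above: the comments describe a branch-free absolute value plus the "bitwise complement of abs(input)" trick for negative coefficients. The sketch below isolates that arithmetic so it can be checked by hand. It is an illustration under stated assumptions, not the patch's code: `coef_bits` and `magnitude_bits` are hypothetical names, right shift of a negative `int` is assumed to be arithmetic (as the patch itself assumes on two's complement machines), and inputs are assumed to stay within the 16-bit coefficient range.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical result type: how many bits to emit and which bits. */
typedef struct {
  int nbits;          /* size category of the coefficient */
  unsigned int bits;  /* the nbits low-order bits to append after the code */
} coef_bits;

static coef_bits magnitude_bits(int diff)
{
  coef_bits out;
  int temp = diff, temp2 = diff, temp3;
  int nbits = 0;

  assert(diff > -65536 && diff < 65536);

  /* temp3 is 0 for non-negative input and -1 (all ones) for negative input,
   * assuming an arithmetic right shift. */
  temp3 = temp >> (CHAR_BIT * sizeof(int) - 1);
  temp ^= temp3;
  temp -= temp3;   /* temp = abs(diff) without a branch */
  temp2 += temp3;  /* diff - 1 for negatives: low bits equal ~abs(diff) */

  /* Bit count of the magnitude (plain loop instead of JPEG_NBITS). */
  while (temp) { temp >>= 1; nbits++; }

  out.nbits = nbits;
  out.bits = (unsigned int)temp2 & ((1u << nbits) - 1u);  /* mask extra bits */
  return out;
}
```

Worked by hand: for diff = -5, abs is 5 (3 bits), temp2 becomes -6, and -6 masked to 3 bits is binary 010, which matches ~5 masked to 3 bits.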
2790 Index: jcinit.c
2791 ===================================================================
2792 --- jcinit.c (revision 829)
2793 +++ jcinit.c (working copy)
2794 @@ -42,7 +42,11 @@
2795 jinit_forward_dct(cinfo);
2796 /* Entropy encoding: either Huffman or arithmetic coding. */
2797 if (cinfo->arith_code) {
2798 +#ifdef C_ARITH_CODING_SUPPORTED
2799 + jinit_arith_encoder(cinfo);
2800 +#else
2801 ERREXIT(cinfo, JERR_ARITH_NOTIMPL);
2802 +#endif
2803 } else {
2804 if (cinfo->progressive_mode) {
2805 #ifdef C_PROGRESSIVE_SUPPORTED
2806 Index: jcmainct.c
2807 ===================================================================
2808 --- jcmainct.c (revision 829)
2809 +++ jcmainct.c (working copy)
2810 @@ -68,32 +68,32 @@
2811 METHODDEF(void)
2812 start_pass_main (j_compress_ptr cinfo, J_BUF_MODE pass_mode)
2813 {
2814 - my_main_ptr main = (my_main_ptr) cinfo->main;
2815 + my_main_ptr main_ptr = (my_main_ptr) cinfo->main;
2816
2817 /* Do nothing in raw-data mode. */
2818 if (cinfo->raw_data_in)
2819 return;
2820
2821 - main->cur_iMCU_row = 0; /* initialize counters */
2822 - main->rowgroup_ctr = 0;
2823 - main->suspended = FALSE;
2824 - main->pass_mode = pass_mode; /* save mode for use by process_data */
2825 + main_ptr->cur_iMCU_row = 0; /* initialize counters */
2826 + main_ptr->rowgroup_ctr = 0;
2827 + main_ptr->suspended = FALSE;
2828 + main_ptr->pass_mode = pass_mode; /* save mode for use by process_data */
2829
2830 switch (pass_mode) {
2831 case JBUF_PASS_THRU:
2832 #ifdef FULL_MAIN_BUFFER_SUPPORTED
2833 - if (main->whole_image[0] != NULL)
2834 + if (main_ptr->whole_image[0] != NULL)
2835 ERREXIT(cinfo, JERR_BAD_BUFFER_MODE);
2836 #endif
2837 - main->pub.process_data = process_data_simple_main;
2838 + main_ptr->pub.process_data = process_data_simple_main;
2839 break;
2840 #ifdef FULL_MAIN_BUFFER_SUPPORTED
2841 case JBUF_SAVE_SOURCE:
2842 case JBUF_CRANK_DEST:
2843 case JBUF_SAVE_AND_PASS:
2844 - if (main->whole_image[0] == NULL)
2845 + if (main_ptr->whole_image[0] == NULL)
2846 ERREXIT(cinfo, JERR_BAD_BUFFER_MODE);
2847 - main->pub.process_data = process_data_buffer_main;
2848 + main_ptr->pub.process_data = process_data_buffer_main;
2849 break;
2850 #endif
2851 default:
2852 @@ -114,14 +114,14 @@
2853 JSAMPARRAY input_buf, JDIMENSION *in_row_ctr,
2854 JDIMENSION in_rows_avail)
2855 {
2856 - my_main_ptr main = (my_main_ptr) cinfo->main;
2857 + my_main_ptr main_ptr = (my_main_ptr) cinfo->main;
2858
2859 - while (main->cur_iMCU_row < cinfo->total_iMCU_rows) {
2860 + while (main_ptr->cur_iMCU_row < cinfo->total_iMCU_rows) {
2861 /* Read input data if we haven't filled the main buffer yet */
2862 - if (main->rowgroup_ctr < DCTSIZE)
2863 + if (main_ptr->rowgroup_ctr < DCTSIZE)
2864 (*cinfo->prep->pre_process_data) (cinfo,
2865 input_buf, in_row_ctr, in_rows_avail,
2866 - main->buffer, &main->rowgroup_ctr,
2867 + main_ptr->buffer, &main_ptr->rowgroup_ctr,
2868 (JDIMENSION) DCTSIZE);
2869
2870 /* If we don't have a full iMCU row buffered, return to application for
2871 @@ -128,11 +128,11 @@
2872 * more data. Note that preprocessor will always pad to fill the iMCU row
2873 * at the bottom of the image.
2874 */
2875 - if (main->rowgroup_ctr != DCTSIZE)
2876 + if (main_ptr->rowgroup_ctr != DCTSIZE)
2877 return;
2878
2879 /* Send the completed row to the compressor */
2880 - if (! (*cinfo->coef->compress_data) (cinfo, main->buffer)) {
2881 + if (! (*cinfo->coef->compress_data) (cinfo, main_ptr->buffer)) {
2882 /* If compressor did not consume the whole row, then we must need to
2883 * suspend processing and return to the application. In this situation
2884 * we pretend we didn't yet consume the last input row; otherwise, if
2885 @@ -139,9 +139,9 @@
2886 * it happened to be the last row of the image, the application would
2887 * think we were done.
2888 */
2889 - if (! main->suspended) {
2890 + if (! main_ptr->suspended) {
2891 (*in_row_ctr)--;
2892 - main->suspended = TRUE;
2893 + main_ptr->suspended = TRUE;
2894 }
2895 return;
2896 }
2897 @@ -148,12 +148,12 @@
2898 /* We did finish the row. Undo our little suspension hack if a previous
2899 * call suspended; then mark the main buffer empty.
2900 */
2901 - if (main->suspended) {
2902 + if (main_ptr->suspended) {
2903 (*in_row_ctr)++;
2904 - main->suspended = FALSE;
2905 + main_ptr->suspended = FALSE;
2906 }
2907 - main->rowgroup_ctr = 0;
2908 - main->cur_iMCU_row++;
2909 + main_ptr->rowgroup_ctr = 0;
2910 + main_ptr->cur_iMCU_row++;
2911 }
2912 }
2913
2914 @@ -170,25 +170,25 @@
2915 JSAMPARRAY input_buf, JDIMENSION *in_row_ctr,
2916 JDIMENSION in_rows_avail)
2917 {
2918 - my_main_ptr main = (my_main_ptr) cinfo->main;
2919 + my_main_ptr main_ptr = (my_main_ptr) cinfo->main;
2920 int ci;
2921 jpeg_component_info *compptr;
2922 - boolean writing = (main->pass_mode != JBUF_CRANK_DEST);
2923 + boolean writing = (main_ptr->pass_mode != JBUF_CRANK_DEST);
2924
2925 - while (main->cur_iMCU_row < cinfo->total_iMCU_rows) {
2926 + while (main_ptr->cur_iMCU_row < cinfo->total_iMCU_rows) {
2927 /* Realign the virtual buffers if at the start of an iMCU row. */
2928 - if (main->rowgroup_ctr == 0) {
2929 + if (main_ptr->rowgroup_ctr == 0) {
2930 for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components;
2931 ci++, compptr++) {
2932 - main->buffer[ci] = (*cinfo->mem->access_virt_sarray)
2933 - ((j_common_ptr) cinfo, main->whole_image[ci],
2934 - main->cur_iMCU_row * (compptr->v_samp_factor * DCTSIZE),
2935 + main_ptr->buffer[ci] = (*cinfo->mem->access_virt_sarray)
2936 + ((j_common_ptr) cinfo, main_ptr->whole_image[ci],
2937 + main_ptr->cur_iMCU_row * (compptr->v_samp_factor * DCTSIZE),
2938 (JDIMENSION) (compptr->v_samp_factor * DCTSIZE), writing);
2939 }
2940 /* In a read pass, pretend we just read some source data. */
2941 if (! writing) {
2942 *in_row_ctr += cinfo->max_v_samp_factor * DCTSIZE;
2943 - main->rowgroup_ctr = DCTSIZE;
2944 + main_ptr->rowgroup_ctr = DCTSIZE;
2945 }
2946 }
2947
2948 @@ -197,16 +197,16 @@
2949 if (writing) {
2950 (*cinfo->prep->pre_process_data) (cinfo,
2951 input_buf, in_row_ctr, in_rows_avail,
2952 - main->buffer, &main->rowgroup_ctr,
2953 + main_ptr->buffer, &main_ptr->rowgroup_ctr,
2954 (JDIMENSION) DCTSIZE);
2955 /* Return to application if we need more data to fill the iMCU row. */
2956 - if (main->rowgroup_ctr < DCTSIZE)
2957 + if (main_ptr->rowgroup_ctr < DCTSIZE)
2958 return;
2959 }
2960
2961 /* Emit data, unless this is a sink-only pass. */
2962 - if (main->pass_mode != JBUF_SAVE_SOURCE) {
2963 - if (! (*cinfo->coef->compress_data) (cinfo, main->buffer)) {
2964 + if (main_ptr->pass_mode != JBUF_SAVE_SOURCE) {
2965 + if (! (*cinfo->coef->compress_data) (cinfo, main_ptr->buffer)) {
2966 /* If compressor did not consume the whole row, then we must need to
2967 * suspend processing and return to the application. In this situation
2968 * we pretend we didn't yet consume the last input row; otherwise, if
2969 @@ -213,9 +213,9 @@
2970 * it happened to be the last row of the image, the application would
2971 * think we were done.
2972 */
2973 - if (! main->suspended) {
2974 + if (! main_ptr->suspended) {
2975 (*in_row_ctr)--;
2976 - main->suspended = TRUE;
2977 + main_ptr->suspended = TRUE;
2978 }
2979 return;
2980 }
2981 @@ -222,15 +222,15 @@
2982 /* We did finish the row. Undo our little suspension hack if a previous
2983 * call suspended; then mark the main buffer empty.
2984 */
2985 - if (main->suspended) {
2986 + if (main_ptr->suspended) {
2987 (*in_row_ctr)++;
2988 - main->suspended = FALSE;
2989 + main_ptr->suspended = FALSE;
2990 }
2991 }
2992
2993 /* If get here, we are done with this iMCU row. Mark buffer empty. */
2994 - main->rowgroup_ctr = 0;
2995 - main->cur_iMCU_row++;
2996 + main_ptr->rowgroup_ctr = 0;
2997 + main_ptr->cur_iMCU_row++;
2998 }
2999 }
3000
3001 @@ -244,15 +244,15 @@
3002 GLOBAL(void)
3003 jinit_c_main_controller (j_compress_ptr cinfo, boolean need_full_buffer)
3004 {
3005 - my_main_ptr main;
3006 + my_main_ptr main_ptr;
3007 int ci;
3008 jpeg_component_info *compptr;
3009
3010 - main = (my_main_ptr)
3011 + main_ptr = (my_main_ptr)
3012 (*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_IMAGE,
3013 SIZEOF(my_main_controller));
3014 - cinfo->main = (struct jpeg_c_main_controller *) main;
3015 - main->pub.start_pass = start_pass_main;
3016 + cinfo->main = (struct jpeg_c_main_controller *) main_ptr;
3017 + main_ptr->pub.start_pass = start_pass_main;
3018
3019 /* We don't need to create a buffer in raw-data mode. */
3020 if (cinfo->raw_data_in)
3021 @@ -267,7 +267,7 @@
3022 /* Note we pad the bottom to a multiple of the iMCU height */
3023 for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components;
3024 ci++, compptr++) {
3025 - main->whole_image[ci] = (*cinfo->mem->request_virt_sarray)
3026 + main_ptr->whole_image[ci] = (*cinfo->mem->request_virt_sarray)
3027 ((j_common_ptr) cinfo, JPOOL_IMAGE, FALSE,
3028 compptr->width_in_blocks * DCTSIZE,
3029 (JDIMENSION) jround_up((long) compptr->height_in_blocks,
3030 @@ -279,12 +279,12 @@
3031 #endif
3032 } else {
3033 #ifdef FULL_MAIN_BUFFER_SUPPORTED
3034 - main->whole_image[0] = NULL; /* flag for no virtual arrays */
3035 + main_ptr->whole_image[0] = NULL; /* flag for no virtual arrays */
3036 #endif
3037 /* Allocate a strip buffer for each component */
3038 for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components;
3039 ci++, compptr++) {
3040 - main->buffer[ci] = (*cinfo->mem->alloc_sarray)
3041 + main_ptr->buffer[ci] = (*cinfo->mem->alloc_sarray)
3042 ((j_common_ptr) cinfo, JPOOL_IMAGE,
3043 compptr->width_in_blocks * DCTSIZE,
3044 (JDIMENSION) (compptr->v_samp_factor * DCTSIZE));
3045 Index: jcmarker.c
3046 ===================================================================
3047 --- jcmarker.c (revision 829)
3048 +++ jcmarker.c (working copy)
3049 @@ -1,8 +1,11 @@
3050 /*
3051 * jcmarker.c
3052 *
3053 + * This file was part of the Independent JPEG Group's software:
3054 * Copyright (C) 1991-1998, Thomas G. Lane.
3055 - * This file is part of the Independent JPEG Group's software.
3056 + * Modified 2003-2010 by Guido Vollbeding.
3057 + * libjpeg-turbo Modifications:
3058 + * Copyright (C) 2010, D. R. Commander.
3059 * For conditions of distribution and use, see the accompanying README file.
3060 *
3061 * This file contains routines to write JPEG datastream markers.
3062 @@ -11,6 +14,7 @@
3063 #define JPEG_INTERNALS
3064 #include "jinclude.h"
3065 #include "jpeglib.h"
3066 +#include "jpegcomp.h"
3067
3068
3069 typedef enum { /* JPEG marker codes */
3070 @@ -18,24 +22,24 @@
3071 M_SOF1 = 0xc1,
3072 M_SOF2 = 0xc2,
3073 M_SOF3 = 0xc3,
3074 -
3075 +
3076 M_SOF5 = 0xc5,
3077 M_SOF6 = 0xc6,
3078 M_SOF7 = 0xc7,
3079 -
3080 +
3081 M_JPG = 0xc8,
3082 M_SOF9 = 0xc9,
3083 M_SOF10 = 0xca,
3084 M_SOF11 = 0xcb,
3085 -
3086 +
3087 M_SOF13 = 0xcd,
3088 M_SOF14 = 0xce,
3089 M_SOF15 = 0xcf,
3090 -
3091 +
3092 M_DHT = 0xc4,
3093 -
3094 +
3095 M_DAC = 0xcc,
3096 -
3097 +
3098 M_RST0 = 0xd0,
3099 M_RST1 = 0xd1,
3100 M_RST2 = 0xd2,
3101 @@ -44,7 +48,7 @@
3102 M_RST5 = 0xd5,
3103 M_RST6 = 0xd6,
3104 M_RST7 = 0xd7,
3105 -
3106 +
3107 M_SOI = 0xd8,
3108 M_EOI = 0xd9,
3109 M_SOS = 0xda,
3110 @@ -53,7 +57,7 @@
3111 M_DRI = 0xdd,
3112 M_DHP = 0xde,
3113 M_EXP = 0xdf,
3114 -
3115 +
3116 M_APP0 = 0xe0,
3117 M_APP1 = 0xe1,
3118 M_APP2 = 0xe2,
3119 @@ -70,13 +74,13 @@
3120 M_APP13 = 0xed,
3121 M_APP14 = 0xee,
3122 M_APP15 = 0xef,
3123 -
3124 +
3125 M_JPG0 = 0xf0,
3126 M_JPG13 = 0xfd,
3127 M_COM = 0xfe,
3128 -
3129 +
3130 M_TEM = 0x01,
3131 -
3132 +
3133 M_ERROR = 0x100
3134 } JPEG_MARKER;
3135
3136 @@ -229,33 +233,39 @@
3137 char ac_in_use[NUM_ARITH_TBLS];
3138 int length, i;
3139 jpeg_component_info *compptr;
3140 -
3141 +
3142 for (i = 0; i < NUM_ARITH_TBLS; i++)
3143 dc_in_use[i] = ac_in_use[i] = 0;
3144 -
3145 +
3146 for (i = 0; i < cinfo->comps_in_scan; i++) {
3147 compptr = cinfo->cur_comp_info[i];
3148 - dc_in_use[compptr->dc_tbl_no] = 1;
3149 - ac_in_use[compptr->ac_tbl_no] = 1;
3150 + /* DC needs no table for refinement scan */
3151 + if (cinfo->Ss == 0 && cinfo->Ah == 0)
3152 + dc_in_use[compptr->dc_tbl_no] = 1;
3153 + /* AC needs no table when not present */
3154 + if (cinfo->Se)
3155 + ac_in_use[compptr->ac_tbl_no] = 1;
3156 }
3157 -
3158 +
3159 length = 0;
3160 for (i = 0; i < NUM_ARITH_TBLS; i++)
3161 length += dc_in_use[i] + ac_in_use[i];
3162 -
3163 - emit_marker(cinfo, M_DAC);
3164 -
3165 - emit_2bytes(cinfo, length*2 + 2);
3166 -
3167 - for (i = 0; i < NUM_ARITH_TBLS; i++) {
3168 - if (dc_in_use[i]) {
3169 - emit_byte(cinfo, i);
3170 - emit_byte(cinfo, cinfo->arith_dc_L[i] + (cinfo->arith_dc_U[i]<<4));
3171 +
3172 + if (length) {
3173 + emit_marker(cinfo, M_DAC);
3174 +
3175 + emit_2bytes(cinfo, length*2 + 2);
3176 +
3177 + for (i = 0; i < NUM_ARITH_TBLS; i++) {
3178 + if (dc_in_use[i]) {
3179 + emit_byte(cinfo, i);
3180 + emit_byte(cinfo, cinfo->arith_dc_L[i] + (cinfo->arith_dc_U[i]<<4));
3181 + }
3182 + if (ac_in_use[i]) {
3183 + emit_byte(cinfo, i + 0x10);
3184 + emit_byte(cinfo, cinfo->arith_ac_K[i]);
3185 + }
3186 }
3187 - if (ac_in_use[i]) {
3188 - emit_byte(cinfo, i + 0x10);
3189 - emit_byte(cinfo, cinfo->arith_ac_K[i]);
3190 - }
3191 }
3192 #endif /* C_ARITH_CODING_SUPPORTED */
3193 }
3194 @@ -285,13 +295,13 @@
3195 emit_2bytes(cinfo, 3 * cinfo->num_components + 2 + 5 + 1); /* length */
3196
3197 /* Make sure image isn't bigger than SOF field can handle */
3198 - if ((long) cinfo->image_height > 65535L ||
3199 - (long) cinfo->image_width > 65535L)
3200 + if ((long) cinfo->_jpeg_height > 65535L ||
3201 + (long) cinfo->_jpeg_width > 65535L)
3202 ERREXIT1(cinfo, JERR_IMAGE_TOO_BIG, (unsigned int) 65535);
3203
3204 emit_byte(cinfo, cinfo->data_precision);
3205 - emit_2bytes(cinfo, (int) cinfo->image_height);
3206 - emit_2bytes(cinfo, (int) cinfo->image_width);
3207 + emit_2bytes(cinfo, (int) cinfo->_jpeg_height);
3208 + emit_2bytes(cinfo, (int) cinfo->_jpeg_width);
3209
3210 emit_byte(cinfo, cinfo->num_components);
3211
3212 @@ -320,22 +330,16 @@
3213 for (i = 0; i < cinfo->comps_in_scan; i++) {
3214 compptr = cinfo->cur_comp_info[i];
3215 emit_byte(cinfo, compptr->component_id);
3216 - td = compptr->dc_tbl_no;
3217 - ta = compptr->ac_tbl_no;
3218 - if (cinfo->progressive_mode) {
3219 - /* Progressive mode: only DC or only AC tables are used in one scan;
3220 - * furthermore, Huffman coding of DC refinement uses no table at all.
3221 - * We emit 0 for unused field(s); this is recommended by the P&M text
3222 - * but does not seem to be specified in the standard.
3223 - */
3224 - if (cinfo->Ss == 0) {
3225 - ta = 0; /* DC scan */
3226 - if (cinfo->Ah != 0 && !cinfo->arith_code)
3227 - td = 0; /* no DC table either */
3228 - } else {
3229 - td = 0; /* AC scan */
3230 - }
3231 - }
3232 +
3233 + /* We emit 0 for unused field(s); this is recommended by the P&M text
3234 + * but does not seem to be specified in the standard.
3235 + */
3236 +
3237 + /* DC needs no table for refinement scan */
3238 + td = cinfo->Ss == 0 && cinfo->Ah == 0 ? compptr->dc_tbl_no : 0;
3239 + /* AC needs no table when not present */
3240 + ta = cinfo->Se ? compptr->ac_tbl_no : 0;
3241 +
3242 emit_byte(cinfo, (td << 4) + ta);
3243 }
3244
3245 @@ -529,7 +533,10 @@
3246
3247 /* Emit the proper SOF marker */
3248 if (cinfo->arith_code) {
3249 - emit_sof(cinfo, M_SOF9); /* SOF code for arithmetic coding */
3250 + if (cinfo->progressive_mode)
3251 + emit_sof(cinfo, M_SOF10); /* SOF code for progressive arithmetic */
3252 + else
3253 + emit_sof(cinfo, M_SOF9); /* SOF code for sequential arithmetic */
3254 } else {
3255 if (cinfo->progressive_mode)
3256 emit_sof(cinfo, M_SOF2); /* SOF code for progressive Huffman */
3257 @@ -566,19 +573,12 @@
3258 */
3259 for (i = 0; i < cinfo->comps_in_scan; i++) {
3260 compptr = cinfo->cur_comp_info[i];
3261 - if (cinfo->progressive_mode) {
3262 - /* Progressive mode: only DC or only AC tables are used in one scan */
3263 - if (cinfo->Ss == 0) {
3264 - if (cinfo->Ah == 0) /* DC needs no table for refinement scan */
3265 - emit_dht(cinfo, compptr->dc_tbl_no, FALSE);
3266 - } else {
3267 - emit_dht(cinfo, compptr->ac_tbl_no, TRUE);
3268 - }
3269 - } else {
3270 - /* Sequential mode: need both DC and AC tables */
3271 + /* DC needs no table for refinement scan */
3272 + if (cinfo->Ss == 0 && cinfo->Ah == 0)
3273 emit_dht(cinfo, compptr->dc_tbl_no, FALSE);
3274 + /* AC needs no table when not present */
3275 + if (cinfo->Se)
3276 emit_dht(cinfo, compptr->ac_tbl_no, TRUE);
3277 - }
3278 }
3279 }
3280
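Reviewer note on the jcmarker.c changes above: the patch replaces the progressive-only special cases with two mode-independent tests: a DC table is needed only when Ss == 0 and Ah == 0, and an AC table only when Se is nonzero. The sketch below restates that rule as small predicates and applies it to the table-selector byte written in the SOS header. The `scan_params` struct and all function names are hypothetical; only the conditions come from the diff.

```c
#include <assert.h>

/* Hypothetical subset of the scan parameters held in the compress struct. */
typedef struct {
  int Ss;  /* start of spectral selection */
  int Se;  /* end of spectral selection */
  int Ah;  /* successive approximation high bit position */
} scan_params;

/* DC needs no table for a refinement scan or an AC-only scan. */
static int scan_needs_dc_table(const scan_params *sp)
{
  return sp->Ss == 0 && sp->Ah == 0;
}

/* AC needs no table when no AC coefficients are present in the scan. */
static int scan_needs_ac_table(const scan_params *sp)
{
  return sp->Se != 0;
}

/* Table-selector byte for one component: unused fields are emitted as 0. */
static int sos_table_byte(const scan_params *sp, int dc_tbl_no, int ac_tbl_no)
{
  int td, ta;
  assert(dc_tbl_no >= 0 && dc_tbl_no < 16 && ac_tbl_no >= 0 && ac_tbl_no < 16);
  td = scan_needs_dc_table(sp) ? dc_tbl_no : 0;
  ta = scan_needs_ac_table(sp) ? ac_tbl_no : 0;
  return (td << 4) + ta;
}
```

With table numbers 1 and 1, a sequential scan (Ss=0, Se=63, Ah=0) gives 0x11, a first DC scan (Se=0) gives 0x10, a DC refinement scan (Ah=1) gives 0x00, and an AC scan (Ss=1) gives 0x01.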
3281 Index: jcmaster.c
3282 ===================================================================
3283 --- jcmaster.c (revision 829)
3284 +++ jcmaster.c (working copy)
3285 @@ -1,8 +1,11 @@
3286 /*
3287 * jcmaster.c
3288 *
3289 + * This file was part of the Independent JPEG Group's software:
3290 * Copyright (C) 1991-1997, Thomas G. Lane.
3291 - * This file is part of the Independent JPEG Group's software.
3292 + * Modified 2003-2010 by Guido Vollbeding.
3293 + * libjpeg-turbo Modifications:
3294 + * Copyright (C) 2010, D. R. Commander.
3295 * For conditions of distribution and use, see the accompanying README file.
3296 *
3297 * This file contains master control logic for the JPEG compressor.
3298 @@ -14,6 +17,7 @@
3299 #define JPEG_INTERNALS
3300 #include "jinclude.h"
3301 #include "jpeglib.h"
3302 +#include "jpegcomp.h"
3303
3304
3305 /* Private state */
3306 @@ -42,8 +46,28 @@
3307 * Support routines that do various essential calculations.
3308 */
3309
3310 +#if JPEG_LIB_VERSION >= 70
3311 +/*
3312 + * Compute JPEG image dimensions and related values.
3313 + * NOTE: this is exported for possible use by application.
3314 + * Hence it mustn't do anything that can't be done twice.
3315 + */
3316 +
3317 +GLOBAL(void)
3318 +jpeg_calc_jpeg_dimensions (j_compress_ptr cinfo)
3319 +/* Do computations that are needed before master selection phase */
3320 +{
3321 + /* Hardwire it to "no scaling" */
3322 + cinfo->jpeg_width = cinfo->image_width;
3323 + cinfo->jpeg_height = cinfo->image_height;
3324 + cinfo->min_DCT_h_scaled_size = DCTSIZE;
3325 + cinfo->min_DCT_v_scaled_size = DCTSIZE;
3326 +}
3327 +#endif
3328 +
3329 +
3330 LOCAL(void)
3331 -initial_setup (j_compress_ptr cinfo)
3332 +initial_setup (j_compress_ptr cinfo, boolean transcode_only)
3333 /* Do computations that are needed before master selection phase */
3334 {
3335 int ci;
3336 @@ -51,14 +75,21 @@
3337 long samplesperrow;
3338 JDIMENSION jd_samplesperrow;
3339
3340 +#if JPEG_LIB_VERSION >= 70
3341 +#if JPEG_LIB_VERSION >= 80
3342 + if (!transcode_only)
3343 +#endif
3344 + jpeg_calc_jpeg_dimensions(cinfo);
3345 +#endif
3346 +
3347 /* Sanity check on image dimensions */
3348 - if (cinfo->image_height <= 0 || cinfo->image_width <= 0
3349 + if (cinfo->_jpeg_height <= 0 || cinfo->_jpeg_width <= 0
3350 || cinfo->num_components <= 0 || cinfo->input_components <= 0)
3351 ERREXIT(cinfo, JERR_EMPTY_IMAGE);
3352
3353 /* Make sure image isn't bigger than I can handle */
3354 - if ((long) cinfo->image_height > (long) JPEG_MAX_DIMENSION ||
3355 - (long) cinfo->image_width > (long) JPEG_MAX_DIMENSION)
3356 + if ((long) cinfo->_jpeg_height > (long) JPEG_MAX_DIMENSION ||
3357 + (long) cinfo->_jpeg_width > (long) JPEG_MAX_DIMENSION)
3358 ERREXIT1(cinfo, JERR_IMAGE_TOO_BIG, (unsigned int) JPEG_MAX_DIMENSION);
3359
3360 /* Width of an input scanline must be representable as JDIMENSION. */
3361 @@ -96,20 +127,24 @@
3362 /* Fill in the correct component_index value; don't rely on application */
3363 compptr->component_index = ci;
3364 /* For compression, we never do DCT scaling. */
3365 +#if JPEG_LIB_VERSION >= 70
3366 + compptr->DCT_h_scaled_size = compptr->DCT_v_scaled_size = DCTSIZE;
3367 +#else
3368 compptr->DCT_scaled_size = DCTSIZE;
3369 +#endif
3370 /* Size in DCT blocks */
3371 compptr->width_in_blocks = (JDIMENSION)
3372 - jdiv_round_up((long) cinfo->image_width * (long) compptr->h_samp_factor,
3373 + jdiv_round_up((long) cinfo->_jpeg_width * (long) compptr->h_samp_factor,
3374 (long) (cinfo->max_h_samp_factor * DCTSIZE));
3375 compptr->height_in_blocks = (JDIMENSION)
3376 - jdiv_round_up((long) cinfo->image_height * (long) compptr->v_samp_factor,
3377 + jdiv_round_up((long) cinfo->_jpeg_height * (long) compptr->v_samp_factor,
3378 (long) (cinfo->max_v_samp_factor * DCTSIZE));
3379 /* Size in samples */
3380 compptr->downsampled_width = (JDIMENSION)
3381 - jdiv_round_up((long) cinfo->image_width * (long) compptr->h_samp_factor,
3382 + jdiv_round_up((long) cinfo->_jpeg_width * (long) compptr->h_samp_factor,
3383 (long) cinfo->max_h_samp_factor);
3384 compptr->downsampled_height = (JDIMENSION)
3385 - jdiv_round_up((long) cinfo->image_height * (long) compptr->v_samp_factor,
3386 + jdiv_round_up((long) cinfo->_jpeg_height * (long) compptr->v_samp_factor,
3387 (long) cinfo->max_v_samp_factor);
3388 /* Mark component needed (this flag isn't actually used for compression) */
3389 compptr->component_needed = TRUE;
3390 @@ -119,7 +154,7 @@
3391 * main controller will call coefficient controller).
3392 */
3393 cinfo->total_iMCU_rows = (JDIMENSION)
3394 - jdiv_round_up((long) cinfo->image_height,
3395 + jdiv_round_up((long) cinfo->_jpeg_height,
3396 (long) (cinfo->max_v_samp_factor*DCTSIZE));
3397 }
3398
3399 @@ -347,10 +382,10 @@
3400
3401 /* Overall image size in MCUs */
3402 cinfo->MCUs_per_row = (JDIMENSION)
3403 - jdiv_round_up((long) cinfo->image_width,
3404 + jdiv_round_up((long) cinfo->_jpeg_width,
3405 (long) (cinfo->max_h_samp_factor*DCTSIZE));
3406 cinfo->MCU_rows_in_scan = (JDIMENSION)
3407 - jdiv_round_up((long) cinfo->image_height,
3408 + jdiv_round_up((long) cinfo->_jpeg_height,
3409 (long) (cinfo->max_v_samp_factor*DCTSIZE));
3410
3411 cinfo->blocks_in_MCU = 0;
3412 @@ -554,7 +589,7 @@
3413 master->pub.is_last_pass = FALSE;
3414
3415 /* Validate parameters, determine derived values */
3416 - initial_setup(cinfo);
3417 + initial_setup(cinfo, transcode_only);
3418
3419 if (cinfo->scan_info != NULL) {
3420 #ifdef C_MULTISCAN_FILES_SUPPORTED
3421 @@ -567,7 +602,7 @@
3422 cinfo->num_scans = 1;
3423 }
3424
3425 - if (cinfo->progressive_mode) /* TEMPORARY HACK ??? */
3426 + if (cinfo->progressive_mode && !cinfo->arith_code) /* TEMPORARY HACK ??? */
3427 cinfo->optimize_coding = TRUE; /* assume default tables no good for progressive mode */
3428
3429 /* Initialize my private state */
3430 Index: jcparam.c
3431 ===================================================================
3432 --- jcparam.c (revision 829)
3433 +++ jcparam.c (working copy)
3434 @@ -1,9 +1,11 @@
3435 /*
3436 * jcparam.c
3437 *
3438 + * This file was part of the Independent JPEG Group's software:
3439 * Copyright (C) 1991-1998, Thomas G. Lane.
3440 - * Copyright (C) 2009, D. R. Commander.
3441 - * This file is part of the Independent JPEG Group's software.
3442 + * Modified 2003-2008 by Guido Vollbeding.
3443 + * libjpeg-turbo Modifications:
3444 + * Copyright (C) 2009-2011, D. R. Commander.
3445 * For conditions of distribution and use, see the accompanying README file.
3446 *
3447 * This file contains optional default-setting code for the JPEG compressor.
3448 @@ -61,7 +63,50 @@
3449 }
3450
3451
3452 +/* These are the sample quantization tables given in JPEG spec section K.1.
3453 + * The spec says that the values given produce "good" quality, and
3454 + * when divided by 2, "very good" quality.
3455 + */
3456 +static const unsigned int std_luminance_quant_tbl[DCTSIZE2] = {
3457 + 16, 11, 10, 16, 24, 40, 51, 61,
3458 + 12, 12, 14, 19, 26, 58, 60, 55,
3459 + 14, 13, 16, 24, 40, 57, 69, 56,
3460 + 14, 17, 22, 29, 51, 87, 80, 62,
3461 + 18, 22, 37, 56, 68, 109, 103, 77,
3462 + 24, 35, 55, 64, 81, 104, 113, 92,
3463 + 49, 64, 78, 87, 103, 121, 120, 101,
3464 + 72, 92, 95, 98, 112, 100, 103, 99
3465 +};
3466 +static const unsigned int std_chrominance_quant_tbl[DCTSIZE2] = {
3467 + 17, 18, 24, 47, 99, 99, 99, 99,
3468 + 18, 21, 26, 66, 99, 99, 99, 99,
3469 + 24, 26, 56, 99, 99, 99, 99, 99,
3470 + 47, 66, 99, 99, 99, 99, 99, 99,
3471 + 99, 99, 99, 99, 99, 99, 99, 99,
3472 + 99, 99, 99, 99, 99, 99, 99, 99,
3473 + 99, 99, 99, 99, 99, 99, 99, 99,
3474 + 99, 99, 99, 99, 99, 99, 99, 99
3475 +};
3476 +
3477 +
3478 +#if JPEG_LIB_VERSION >= 70
3479 GLOBAL(void)
3480 +jpeg_default_qtables (j_compress_ptr cinfo, boolean force_baseline)
3481 +/* Set or change the 'quality' (quantization) setting, using default tables
3482 + * and straight percentage-scaling quality scales.
3483 + * This entry point allows different scalings for luminance and chrominance.
3484 + */
3485 +{
3486 + /* Set up two quantization tables using the specified scaling */
3487 + jpeg_add_quant_table(cinfo, 0, std_luminance_quant_tbl,
3488 + cinfo->q_scale_factor[0], force_baseline);
3489 + jpeg_add_quant_table(cinfo, 1, std_chrominance_quant_tbl,
3490 + cinfo->q_scale_factor[1], force_baseline);
3491 +}
3492 +#endif
3493 +
3494 +
3495 +GLOBAL(void)
3496 jpeg_set_linear_quality (j_compress_ptr cinfo, int scale_factor,
3497 boolean force_baseline)
3498 /* Set or change the 'quality' (quantization) setting, using default tables
3499 @@ -70,31 +115,6 @@
3500 * applications that insist on a linear percentage scaling.
3501 */
3502 {
3503 - /* These are the sample quantization tables given in JPEG spec section K.1.
3504 - * The spec says that the values given produce "good" quality, and
3505 - * when divided by 2, "very good" quality.
3506 - */
3507 - static const unsigned int std_luminance_quant_tbl[DCTSIZE2] = {
3508 - 16, 11, 10, 16, 24, 40, 51, 61,
3509 - 12, 12, 14, 19, 26, 58, 60, 55,
3510 - 14, 13, 16, 24, 40, 57, 69, 56,
3511 - 14, 17, 22, 29, 51, 87, 80, 62,
3512 - 18, 22, 37, 56, 68, 109, 103, 77,
3513 - 24, 35, 55, 64, 81, 104, 113, 92,
3514 - 49, 64, 78, 87, 103, 121, 120, 101,
3515 - 72, 92, 95, 98, 112, 100, 103, 99
3516 - };
3517 - static const unsigned int std_chrominance_quant_tbl[DCTSIZE2] = {
3518 - 17, 18, 24, 47, 99, 99, 99, 99,
3519 - 18, 21, 26, 66, 99, 99, 99, 99,
3520 - 24, 26, 56, 99, 99, 99, 99, 99,
3521 - 47, 66, 99, 99, 99, 99, 99, 99,
3522 - 99, 99, 99, 99, 99, 99, 99, 99,
3523 - 99, 99, 99, 99, 99, 99, 99, 99,
3524 - 99, 99, 99, 99, 99, 99, 99, 99,
3525 - 99, 99, 99, 99, 99, 99, 99, 99
3526 - };
3527 -
3528 /* Set up two quantization tables using the specified scaling */
3529 jpeg_add_quant_table(cinfo, 0, std_luminance_quant_tbl,
3530 scale_factor, force_baseline);
3531 @@ -285,6 +305,10 @@
3532
3533 /* Initialize everything not dependent on the color space */
3534
3535 +#if JPEG_LIB_VERSION >= 70
3536 + cinfo->scale_num = 1; /* 1:1 scaling */
3537 + cinfo->scale_denom = 1;
3538 +#endif
3539 cinfo->data_precision = BITS_IN_JSAMPLE;
3540 /* Set up two quantization tables using default quality of 75 */
3541 jpeg_set_quality(cinfo, 75, TRUE);
3542 @@ -321,6 +345,11 @@
3543 /* By default, use the simpler non-cosited sampling alignment */
3544 cinfo->CCIR601_sampling = FALSE;
3545
3546 +#if JPEG_LIB_VERSION >= 70
3547 + /* By default, apply fancy downsampling */
3548 + cinfo->do_fancy_downsampling = TRUE;
3549 +#endif
3550 +
3551 /* No input smoothing */
3552 cinfo->smoothing_factor = 0;
3553
3554 @@ -370,6 +399,10 @@
3555 case JCS_EXT_BGRX:
3556 case JCS_EXT_XBGR:
3557 case JCS_EXT_XRGB:
3558 + case JCS_EXT_RGBA:
3559 + case JCS_EXT_BGRA:
3560 + case JCS_EXT_ABGR:
3561 + case JCS_EXT_ARGB:
3562 jpeg_set_colorspace(cinfo, JCS_YCbCr);
3563 break;
3564 case JCS_YCbCr:
3565 Index: jctrans.c
3566 ===================================================================
3567 --- jctrans.c (revision 829)
3568 +++ jctrans.c (working copy)
3569 @@ -2,6 +2,7 @@
3570 * jctrans.c
3571 *
3572 * Copyright (C) 1995-1998, Thomas G. Lane.
3573 + * Modified 2000-2009 by Guido Vollbeding.
3574 * This file is part of the Independent JPEG Group's software.
3575 * For conditions of distribution and use, see the accompanying README file.
3576 *
3577 @@ -76,6 +77,12 @@
3578 dstinfo->image_height = srcinfo->image_height;
3579 dstinfo->input_components = srcinfo->num_components;
3580 dstinfo->in_color_space = srcinfo->jpeg_color_space;
3581 +#if JPEG_LIB_VERSION >= 70
3582 + dstinfo->jpeg_width = srcinfo->output_width;
3583 + dstinfo->jpeg_height = srcinfo->output_height;
3584 + dstinfo->min_DCT_h_scaled_size = srcinfo->min_DCT_h_scaled_size;
3585 + dstinfo->min_DCT_v_scaled_size = srcinfo->min_DCT_v_scaled_size;
3586 +#endif
3587 /* Initialize all parameters to default values */
3588 jpeg_set_defaults(dstinfo);
3589 /* jpeg_set_defaults may choose wrong colorspace, eg YCbCr if input is RGB.
3590 @@ -167,7 +174,11 @@
3591
3592 /* Entropy encoding: either Huffman or arithmetic coding. */
3593 if (cinfo->arith_code) {
3594 +#ifdef C_ARITH_CODING_SUPPORTED
3595 + jinit_arith_encoder(cinfo);
3596 +#else
3597 ERREXIT(cinfo, JERR_ARITH_NOTIMPL);
3598 +#endif
3599 } else {
3600 if (cinfo->progressive_mode) {
3601 #ifdef C_PROGRESSIVE_SUPPORTED
3602 Index: jdapistd.c
3603 ===================================================================
3604 --- jdapistd.c (revision 829)
3605 +++ jdapistd.c (working copy)
3606 @@ -1,8 +1,11 @@
3607 /*
3608 * jdapistd.c
3609 *
3610 + * This file was part of the Independent JPEG Group's software:
3611 * Copyright (C) 1994-1996, Thomas G. Lane.
3612 - * This file is part of the Independent JPEG Group's software.
3613 + * libjpeg-turbo Modifications:
3614 + * Copyright (C) 2010, 2015, D. R. Commander.
3615 + * Copyright (C) 2015, Google, Inc.
3616 * For conditions of distribution and use, see the accompanying README file.
3617 *
3618 * This file contains application interface code for the decompression half
3619 @@ -14,9 +17,10 @@
3620 * whole decompression library into a transcoder.
3621 */
3622
3623 -#define JPEG_INTERNALS
3624 -#include "jinclude.h"
3625 -#include "jpeglib.h"
3626 +#include "jdmainct.h"
3627 +#include "jdcoefct.h"
3628 +#include "jdsample.h"
3629 +#include "jmemsys.h"
3630
3631
3632 /* Forward declarations */
3633 @@ -176,7 +180,236 @@
3634 }
3635
3636
3637 +
3638 +/* Dummy color convert function used by jpeg_skip_scanlines() */
3639 +LOCAL(void)
3640 +noop_convert (j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
3641 + JDIMENSION input_row, JSAMPARRAY output_buf, int num_rows)
3642 +{
3643 +}
3644 +
3645 +
3646 /*
3647 + * In some cases, it is best to call jpeg_read_scanlines() and discard the
3648 + * output, rather than skipping the scanlines, because this allows us to
3649 + * maintain the internal state of the context-based upsampler. In these cases,
3650 + * we set up and tear down a dummy color converter in order to avoid valgrind
3651 + * errors and to achieve the best possible performance.
3652 + */
3653 +LOCAL(void)
3654 +read_and_discard_scanlines (j_decompress_ptr cinfo, JDIMENSION num_lines)
3655 +{
3656 + JDIMENSION n;
3657 + void (*color_convert) (j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
3658 + JDIMENSION input_row, JSAMPARRAY output_buf,
3659 + int num_rows);
3660 +
3661 + color_convert = cinfo->cconvert->color_convert;
3662 + cinfo->cconvert->color_convert = noop_convert;
3663 +
3664 + for (n = 0; n < num_lines; n++)
3665 + jpeg_read_scanlines(cinfo, NULL, 1);
3666 +
3667 + cinfo->cconvert->color_convert = color_convert;
3668 +}
3669 +
3670 +/*
3671 + * Called by jpeg_skip_scanlines(). This partially skips a decompress block by
3672 + * incrementing the rowgroup counter.
3673 + */
3674 +
3675 +LOCAL(void)
3676 +increment_simple_rowgroup_ctr (j_decompress_ptr cinfo, JDIMENSION rows)
3677 +{
3678 + JDIMENSION rows_left;
3679 + my_main_ptr main_ptr = (my_main_ptr) cinfo->main;
3680 +
3681 + /* Increment the counter to the next row group after the skipped rows. */
3682 + main_ptr->rowgroup_ctr += rows / cinfo->max_v_samp_factor;
3683 +
3684 + /* Partially skipping a row group would involve modifying the internal state
3685 + * of the upsampler, so read the remaining rows into a dummy buffer instead.
3686 + */
3687 + rows_left = rows % cinfo->max_v_samp_factor;
3688 + cinfo->output_scanline += rows - rows_left;
3689 +
3690 + read_and_discard_scanlines(cinfo, rows_left);
3691 +}
3692 +
3693 +/*
3694 + * Skips some scanlines of data from the JPEG decompressor.
3695 + *
3696 + * The return value will be the number of lines actually skipped. If skipping
3697 + * num_lines would move beyond the end of the image, then the actual number of
3698 + * lines remaining in the image is returned. Otherwise, the return value will
3699 + * be equal to num_lines.
3700 + *
3701 + * Refer to libjpeg.txt for more information.
3702 + */
3703 +
3704 +GLOBAL(JDIMENSION)
3705 +jpeg_skip_scanlines (j_decompress_ptr cinfo, JDIMENSION num_lines)
3706 +{
3707 + my_main_ptr main_ptr = (my_main_ptr) cinfo->main;
3708 + my_coef_ptr coef = (my_coef_ptr) cinfo->coef;
3709 + my_upsample_ptr upsample = (my_upsample_ptr) cinfo->upsample;
3710 + JDIMENSION i, x;
3711 + int y;
3712 + JDIMENSION lines_per_iMCU_row, lines_left_in_iMCU_row, lines_after_iMCU_row;
3713 + JDIMENSION lines_to_skip, lines_to_read;
3714 +
3715 + if (cinfo->global_state != DSTATE_SCANNING)
3716 + ERREXIT1(cinfo, JERR_BAD_STATE, cinfo->global_state);
3717 +
3718 + /* Do not skip past the bottom of the image. */
3719 + if (cinfo->output_scanline + num_lines >= cinfo->output_height) {
3720 + num_lines = cinfo->output_height - cinfo->output_scanline;
3721 + cinfo->output_scanline = cinfo->output_height;
3722 + return num_lines;
3723 + }
3724 + if (num_lines == 0)
3725 + return 0;
3726 +
3727 + lines_per_iMCU_row = cinfo->_min_DCT_scaled_size * cinfo->max_v_samp_factor;
3728 + lines_left_in_iMCU_row =
3729 + (lines_per_iMCU_row - (cinfo->output_scanline % lines_per_iMCU_row)) %
3730 + lines_per_iMCU_row;
3731 + lines_after_iMCU_row = num_lines - lines_left_in_iMCU_row;
3732 +
3733 + /* Skip the lines remaining in the current iMCU row. When upsampling
3734 + * requires context rows, we need the previous and next rows in order to read
3735 + * the current row. This adds some complexity.
3736 + */
3737 + if (cinfo->upsample->need_context_rows) {
3738 + /* If the skipped lines would not move us past the current iMCU row, we
3739 + * read the lines and ignore them. There might be a faster way of doing
3740 + * this, but we are facing increasing complexity for diminishing returns.
3741 + * The increasing complexity would be a by-product of meddling with the
3742 + * state machine used to skip context rows. Near the end of an iMCU row,
3743 + * the next iMCU row may have already been entropy-decoded. In this unique
3744 + * case, we will read the next iMCU row if we cannot skip past it as well.
3745 + */
3746 + if ((num_lines < lines_left_in_iMCU_row + 1) ||
3747 + (lines_left_in_iMCU_row <= 1 && main_ptr->buffer_full &&
3748 + lines_after_iMCU_row < lines_per_iMCU_row + 1)) {
3749 + read_and_discard_scanlines(cinfo, num_lines);
3750 + return num_lines;
3751 + }
3752 +
3753 + /* If the next iMCU row has already been entropy-decoded, make sure that
3754 + * we do not skip too far.
3755 + */
3756 + if (lines_left_in_iMCU_row <= 1 && main_ptr->buffer_full) {
3757 + cinfo->output_scanline += lines_left_in_iMCU_row + lines_per_iMCU_row;
3758 + lines_after_iMCU_row -= lines_per_iMCU_row;
3759 + } else {
3760 + cinfo->output_scanline += lines_left_in_iMCU_row;
3761 + }
3762 +
3763 + /* If we have just completed the first block, adjust the buffer pointers */
3764 + if (main_ptr->iMCU_row_ctr == 0 ||
3765 + (main_ptr->iMCU_row_ctr == 1 && lines_left_in_iMCU_row > 2))
3766 + set_wraparound_pointers(cinfo);
3767 + main_ptr->buffer_full = FALSE;
3768 + main_ptr->rowgroup_ctr = 0;
3769 + main_ptr->context_state = CTX_PREPARE_FOR_IMCU;
3770 + upsample->next_row_out = cinfo->max_v_samp_factor;
3771 + upsample->rows_to_go = cinfo->output_height - cinfo->output_scanline;
3772 + }
3773 +
3774 + /* Skipping is much simpler when context rows are not required. */
3775 + else {
3776 + if (num_lines < lines_left_in_iMCU_row) {
3777 + increment_simple_rowgroup_ctr(cinfo, num_lines);
3778 + return num_lines;
3779 + } else {
3780 + cinfo->output_scanline += lines_left_in_iMCU_row;
3781 + main_ptr->buffer_full = FALSE;
3782 + main_ptr->rowgroup_ctr = 0;
3783 + upsample->next_row_out = cinfo->max_v_samp_factor;
3784 + upsample->rows_to_go = cinfo->output_height - cinfo->output_scanline;
3785 + }
3786 + }
3787 +
3788 + /* Calculate how many full iMCU rows we can skip. */
3789 + if (cinfo->upsample->need_context_rows)
3790 + lines_to_skip = ((lines_after_iMCU_row - 1) / lines_per_iMCU_row) *
3791 + lines_per_iMCU_row;
3792 + else
3793 + lines_to_skip = (lines_after_iMCU_row / lines_per_iMCU_row) *
3794 + lines_per_iMCU_row;
3795 + /* Calculate the number of lines that remain to be skipped after skipping all
3796 + * of the full iMCU rows that we can. We will not read these lines unless we
3797 + * have to.
3798 + */
3799 + lines_to_read = lines_after_iMCU_row - lines_to_skip;
3800 +
3801 + /* For images requiring multiple scans (progressive, non-interleaved, etc.),
3802 + * all of the entropy decoding occurs in jpeg_start_decompress(), assuming
3803 + * that the input data source is non-suspending. This makes skipping easy.
3804 + */
3805 + if (cinfo->inputctl->has_multiple_scans) {
3806 + if (cinfo->upsample->need_context_rows) {
3807 + cinfo->output_scanline += lines_to_skip;
3808 + cinfo->output_iMCU_row += lines_to_skip / lines_per_iMCU_row;
3809 + main_ptr->iMCU_row_ctr += lines_after_iMCU_row / lines_per_iMCU_row;
3810 + /* It is complex to properly move to the middle of a context block, so
3811 + * read the remaining lines instead of skipping them.
3812 + */
3813 + read_and_discard_scanlines(cinfo, lines_to_read);
3814 + } else {
3815 + cinfo->output_scanline += lines_to_skip;
3816 + cinfo->output_iMCU_row += lines_to_skip / lines_per_iMCU_row;
3817 + increment_simple_rowgroup_ctr(cinfo, lines_to_read);
3818 + }
3819 + upsample->rows_to_go = cinfo->output_height - cinfo->output_scanline;
3820 + return num_lines;
3821 + }
3822 +
3823 + /* Skip the iMCU rows that we can safely skip. */
3824 + for (i = 0; i < lines_to_skip; i += lines_per_iMCU_row) {
3825 + for (y = 0; y < coef->MCU_rows_per_iMCU_row; y++) {
3826 + for (x = 0; x < cinfo->MCUs_per_row; x++) {
3827 + /* Calling decode_mcu() with a NULL pointer causes it to discard the
3828 + * decoded coefficients. This is ~5% faster for large subsets, but
3829 + * it's tough to tell a difference for smaller images.
3830 + */
3831 + (*cinfo->entropy->decode_mcu) (cinfo, NULL);
3832 + }
3833 + }
3834 + cinfo->input_iMCU_row++;
3835 + cinfo->output_iMCU_row++;
3836 + if (cinfo->input_iMCU_row < cinfo->total_iMCU_rows)
3837 + start_iMCU_row(cinfo);
3838 + else
3839 + (*cinfo->inputctl->finish_input_pass) (cinfo);
3840 + }
3841 + cinfo->output_scanline += lines_to_skip;
3842 +
3843 + if (cinfo->upsample->need_context_rows) {
3844 + /* Context-based upsampling keeps track of iMCU rows. */
3845 + main_ptr->iMCU_row_ctr += lines_to_skip / lines_per_iMCU_row;
3846 +
3847 + /* It is complex to properly move to the middle of a context block, so
3848 + * read the remaining lines instead of skipping them.
3849 + */
3850 + read_and_discard_scanlines(cinfo, lines_to_read);
3851 + } else {
3852 + increment_simple_rowgroup_ctr(cinfo, lines_to_read);
3853 + }
3854 +
3855 + /* Since skipping lines involves skipping the upsampling step, the value of
3856 + * "rows_to_go" will become invalid unless we set it here. NOTE: This is a
3857 + * bit odd, since "rows_to_go" seems to be redundantly keeping track of
3858 + * output_scanline.
3859 + */
3860 + upsample->rows_to_go = cinfo->output_height - cinfo->output_scanline;
3861 +
3862 + /* Always skip the requested number of lines. */
3863 + return num_lines;
3864 +}
3865 +
3866 +/*
3867 * Alternate entry point to read raw data.
3868 * Processes exactly one iMCU row per call, unless suspended.
3869 */
3870 @@ -202,7 +435,7 @@
3871 }
3872
3873 /* Verify that at least one iMCU row can be returned. */
3874 - lines_per_iMCU_row = cinfo->max_v_samp_factor * cinfo->min_DCT_scaled_size;
3875 + lines_per_iMCU_row = cinfo->max_v_samp_factor * cinfo->_min_DCT_scaled_size;
3876 if (max_lines < lines_per_iMCU_row)
3877 ERREXIT(cinfo, JERR_BUFFER_SIZE);
3878
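For reviewers following the iMCU-row arithmetic in jpeg_skip_scanlines() (jdapistd.c, above), the following is a minimal standalone sketch of how a skip request is split into three parts: the lines that finish the current iMCU row, the whole iMCU rows that can be skipped, and the trailing lines that are read and discarded. The names skip_plan and plan_skip are hypothetical and not part of libjpeg-turbo. The sketch assumes num_lines is greater than the number of lines left in the current iMCU row, and it ignores the buffer_full special case handled by the patch.

```c
/* Hypothetical helper type for illustration; not part of libjpeg-turbo. */
typedef struct {
  unsigned int lines_left_in_iMCU_row; /* lines that finish the current iMCU row */
  unsigned int lines_to_skip;          /* whole iMCU rows to skip, in lines */
  unsigned int lines_to_read;          /* trailing lines to read and discard */
} skip_plan;

/* Mirrors the arithmetic used in jpeg_skip_scanlines().
 * Assumption: num_lines > lines left in the current iMCU row, so the
 * subtraction below cannot wrap around.
 */
static skip_plan
plan_skip (unsigned int output_scanline, unsigned int num_lines,
           unsigned int lines_per_iMCU_row, int need_context_rows)
{
  skip_plan p;
  unsigned int lines_after_iMCU_row;

  /* Zero when output_scanline already sits on an iMCU row boundary. */
  p.lines_left_in_iMCU_row =
    (lines_per_iMCU_row - (output_scanline % lines_per_iMCU_row)) %
    lines_per_iMCU_row;
  lines_after_iMCU_row = num_lines - p.lines_left_in_iMCU_row;

  /* With context rows, keep at least one line back so that the final
   * iMCU row is read rather than skipped.
   */
  if (need_context_rows)
    p.lines_to_skip = ((lines_after_iMCU_row - 1) / lines_per_iMCU_row) *
                      lines_per_iMCU_row;
  else
    p.lines_to_skip = (lines_after_iMCU_row / lines_per_iMCU_row) *
                      lines_per_iMCU_row;

  p.lines_to_read = lines_after_iMCU_row - p.lines_to_skip;
  return p;
}
```

With 16 lines per iMCU row, skipping 40 lines from scanline 5 gives 11 lines to finish the current row, 16 lines skipped as one whole iMCU row, and 13 lines read and discarded. Starting on a row boundary (scanline 16) and skipping 32 lines, the context-row case skips only 16 lines and reads the other 16, while the simple case skips all 32.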
3879 Index: jdatadst.c
3880 ===================================================================
3881 --- jdatadst.c (revision 829)
3882 +++ jdatadst.c (working copy)
3883 @@ -1,14 +1,17 @@
3884 /*
3885 * jdatadst.c
3886 *
3887 + * This file was part of the Independent JPEG Group's software:
3888 * Copyright (C) 1994-1996, Thomas G. Lane.
3889 - * This file is part of the Independent JPEG Group's software.
3890 + * Modified 2009-2012 by Guido Vollbeding.
3891 + * libjpeg-turbo Modifications:
3892 + * Copyright (C) 2013, D. R. Commander.
3893 * For conditions of distribution and use, see the accompanying README file.
3894 *
3895 * This file contains compression data destination routines for the case of
3896 - * emitting JPEG data to a file (or any stdio stream). While these routines
3897 - * are sufficient for most applications, some will want to use a different
3898 - * destination manager.
3899 + * emitting JPEG data to memory or to a file (or any stdio stream).
3900 + * While these routines are sufficient for most applications,
3901 + * some will want to use a different destination manager.
3902 * IMPORTANT: we assume that fwrite() will correctly transcribe an array of
3903 * JOCTETs into 8-bit-wide elements on external storage. If char is wider
3904 * than 8 bits on your machine, you may need to do some tweaking.
3905 @@ -19,7 +22,12 @@
3906 #include "jpeglib.h"
3907 #include "jerror.h"
3908
3909 +#ifndef HAVE_STDLIB_H /* <stdlib.h> should declare malloc(),free() */
3910 +extern void * malloc JPP((size_t size));
3911 +extern void free JPP((void *ptr));
3912 +#endif
3913
3914 +
3915 /* Expanded data destination object for stdio output */
3916
3917 typedef struct {
3918 @@ -34,6 +42,23 @@
3919 #define OUTPUT_BUF_SIZE 4096 /* choose an efficiently fwrite'able size */
3920
3921
3922 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED)
3923 +/* Expanded data destination object for memory output */
3924 +
3925 +typedef struct {
3926 + struct jpeg_destination_mgr pub; /* public fields */
3927 +
3928 + unsigned char ** outbuffer; /* target buffer */
3929 + unsigned long * outsize;
3930 + unsigned char * newbuffer; /* newly allocated buffer */
3931 + JOCTET * buffer; /* start of buffer */
3932 + size_t bufsize;
3933 +} my_mem_destination_mgr;
3934 +
3935 +typedef my_mem_destination_mgr * my_mem_dest_ptr;
3936 +#endif
3937 +
3938 +
3939 /*
3940 * Initialize destination --- called by jpeg_start_compress
3941 * before any data is actually written.
3942 @@ -53,7 +78,15 @@
3943 dest->pub.free_in_buffer = OUTPUT_BUF_SIZE;
3944 }
3945
3946 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED)
3947 +METHODDEF(void)
3948 +init_mem_destination (j_compress_ptr cinfo)
3949 +{
3950 + /* no work necessary here */
3951 +}
3952 +#endif
3953
3954 +
3955 /*
3956 * Empty the output buffer --- called whenever buffer fills up.
3957 *
3958 @@ -92,7 +125,39 @@
3959 return TRUE;
3960 }
3961
3962 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED)
3963 +METHODDEF(boolean)
3964 +empty_mem_output_buffer (j_compress_ptr cinfo)
3965 +{
3966 + size_t nextsize;
3967 + JOCTET * nextbuffer;
3968 + my_mem_dest_ptr dest = (my_mem_dest_ptr) cinfo->dest;
3969
3970 + /* Try to allocate new buffer with double size */
3971 + nextsize = dest->bufsize * 2;
3972 + nextbuffer = (JOCTET *) malloc(nextsize);
3973 +
3974 + if (nextbuffer == NULL)
3975 + ERREXIT1(cinfo, JERR_OUT_OF_MEMORY, 10);
3976 +
3977 + MEMCOPY(nextbuffer, dest->buffer, dest->bufsize);
3978 +
3979 + if (dest->newbuffer != NULL)
3980 + free(dest->newbuffer);
3981 +
3982 + dest->newbuffer = nextbuffer;
3983 +
3984 + dest->pub.next_output_byte = nextbuffer + dest->bufsize;
3985 + dest->pub.free_in_buffer = dest->bufsize;
3986 +
3987 + dest->buffer = nextbuffer;
3988 + dest->bufsize = nextsize;
3989 +
3990 + return TRUE;
3991 +}
3992 +#endif
3993 +
3994 +
3995 /*
3996 * Terminate destination --- called by jpeg_finish_compress
3997 * after all data has been written. Usually needs to flush buffer.
3998 @@ -119,7 +184,18 @@
3999 ERREXIT(cinfo, JERR_FILE_WRITE);
4000 }
4001
4002 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED)
4003 +METHODDEF(void)
4004 +term_mem_destination (j_compress_ptr cinfo)
4005 +{
4006 + my_mem_dest_ptr dest = (my_mem_dest_ptr) cinfo->dest;
4007
4008 + *dest->outbuffer = dest->buffer;
4009 + *dest->outsize = (unsigned long)(dest->bufsize - dest->pub.free_in_buffer);
4010 +}
4011 +#endif
4012 +
4013 +
4014 /*
4015 * Prepare for output to a stdio stream.
4016 * The caller must have already opened the stream, and is responsible
4017 @@ -149,3 +225,55 @@
4018 dest->pub.term_destination = term_destination;
4019 dest->outfile = outfile;
4020 }
4021 +
4022 +
4023 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED)
4024 +/*
4025 + * Prepare for output to a memory buffer.
4026 + * The caller may supply an own initial buffer with appropriate size.
4027 + * Otherwise, or when the actual data output exceeds the given size,
4028 + * the library adapts the buffer size as necessary.
4029 + * The standard library functions malloc/free are used for allocating
4030 + * larger memory, so the buffer is available to the application after
4031 + * finishing compression, and then the application is responsible for
4032 + * freeing the requested memory.
4033 + */
4034 +
4035 +GLOBAL(void)
4036 +jpeg_mem_dest (j_compress_ptr cinfo,
4037 + unsigned char ** outbuffer, unsigned long * outsize)
4038 +{
4039 + my_mem_dest_ptr dest;
4040 +
4041 + if (outbuffer == NULL || outsize == NULL) /* sanity check */
4042 + ERREXIT(cinfo, JERR_BUFFER_SIZE);
4043 +
4044 + /* The destination object is made permanent so that multiple JPEG images
4045 + * can be written to the same buffer without re-executing jpeg_mem_dest.
4046 + */
4047 + if (cinfo->dest == NULL) { /* first time for this JPEG object? */
4048 + cinfo->dest = (struct jpeg_destination_mgr *)
4049 + (*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_PERMANENT,
4050 + SIZEOF(my_mem_destination_mgr));
4051 + }
4052 +
4053 + dest = (my_mem_dest_ptr) cinfo->dest;
4054 + dest->pub.init_destination = init_mem_destination;
4055 + dest->pub.empty_output_buffer = empty_mem_output_buffer;
4056 + dest->pub.term_destination = term_mem_destination;
4057 + dest->outbuffer = outbuffer;
4058 + dest->outsize = outsize;
4059 + dest->newbuffer = NULL;
4060 +
4061 + if (*outbuffer == NULL || *outsize == 0) {
4062 + /* Allocate initial buffer */
4063 + dest->newbuffer = *outbuffer = (unsigned char *) malloc(OUTPUT_BUF_SIZE);
4064 + if (dest->newbuffer == NULL)
4065 + ERREXIT1(cinfo, JERR_OUT_OF_MEMORY, 10);
4066 + *outsize = OUTPUT_BUF_SIZE;
4067 + }
4068 +
4069 + dest->pub.next_output_byte = dest->buffer = *outbuffer;
4070 + dest->pub.free_in_buffer = dest->bufsize = *outsize;
4071 +}
4072 +#endif
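The growth strategy in empty_mem_output_buffer() (jdatadst.c, above) can be illustrated in isolation. This is a minimal sketch, not the library's code: grow_buf, grow_buf_init, and grow_buf_double are hypothetical names, and the sketch assumes the buffer is always allocated with malloc() by the helper itself and has a nonzero size. The patch, by contrast, tracks library-allocated buffers separately (newbuffer) so that it never frees a buffer supplied by the caller. As in the patch, the doubling step assumes it is called only when the buffer is completely full.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the memory destination state. */
typedef struct {
  unsigned char *buffer;           /* start of the current buffer */
  size_t bufsize;                  /* total size of the current buffer */
  unsigned char *next_output_byte; /* where the next byte will be written */
  size_t free_in_buffer;           /* bytes still available */
} grow_buf;

/* Allocate the initial buffer.  Returns 0 on success, -1 on failure. */
static int
grow_buf_init (grow_buf *b, size_t size)
{
  b->buffer = (unsigned char *) malloc(size);
  if (b->buffer == NULL)
    return -1;
  b->bufsize = size;
  b->next_output_byte = b->buffer;
  b->free_in_buffer = size;
  return 0;
}

/* Called when the buffer is full: allocate a buffer of twice the size,
 * copy the existing data, and release the old buffer.
 * Returns 0 on success, -1 if the size would overflow or malloc() fails.
 */
static int
grow_buf_double (grow_buf *b)
{
  size_t nextsize;
  unsigned char *nextbuffer;

  if (b->bufsize > SIZE_MAX / 2)
    return -1;
  nextsize = b->bufsize * 2;
  nextbuffer = (unsigned char *) malloc(nextsize);
  if (nextbuffer == NULL)
    return -1;

  memcpy(nextbuffer, b->buffer, b->bufsize);
  free(b->buffer);

  /* Writing resumes right after the copied data; the new half is free. */
  b->next_output_byte = nextbuffer + b->bufsize;
  b->free_in_buffer = b->bufsize;
  b->buffer = nextbuffer;
  b->bufsize = nextsize;
  return 0;
}
```

Doubling keeps the number of reallocations logarithmic in the final output size, at the cost of copying the data each time the buffer grows.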
4073 Index: jdatasrc.c
4074 ===================================================================
4075 --- jdatasrc.c (revision 829)
4076 +++ jdatasrc.c (working copy)
4077 @@ -1,14 +1,17 @@
4078 /*
4079 * jdatasrc.c
4080 *
4081 + * This file was part of the Independent JPEG Group's software:
4082 * Copyright (C) 1994-1996, Thomas G. Lane.
4083 - * This file is part of the Independent JPEG Group's software.
4084 + * Modified 2009-2011 by Guido Vollbeding.
4085 + * libjpeg-turbo Modifications:
4086 + * Copyright (C) 2013, D. R. Commander.
4087 * For conditions of distribution and use, see the accompanying README file.
4088 *
4089 * This file contains decompression data source routines for the case of
4090 - * reading JPEG data from a file (or any stdio stream). While these routines
4091 - * are sufficient for most applications, some will want to use a different
4092 - * source manager.
4093 + * reading JPEG data from memory or from a file (or any stdio stream).
4094 + * While these routines are sufficient for most applications,
4095 + * some will want to use a different source manager.
4096 * IMPORTANT: we assume that fread() will correctly transcribe an array of
4097 * JOCTETs from 8-bit-wide elements on external storage. If char is wider
4098 * than 8 bits on your machine, you may need to do some tweaking.
4099 @@ -52,7 +55,15 @@
4100 src->start_of_file = TRUE;
4101 }
4102
4103 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED)
4104 +METHODDEF(void)
4105 +init_mem_source (j_decompress_ptr cinfo)
4106 +{
4107 + /* no work necessary here */
4108 +}
4109 +#endif
4110
4111 +
4112 /*
4113 * Fill the input buffer --- called whenever buffer is emptied.
4114 *
4115 @@ -111,7 +122,30 @@
4116 return TRUE;
4117 }
4118
4119 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED)
4120 +METHODDEF(boolean)
4121 +fill_mem_input_buffer (j_decompress_ptr cinfo)
4122 +{
4123 + static const JOCTET mybuffer[4] = {
4124 + (JOCTET) 0xFF, (JOCTET) JPEG_EOI, 0, 0
4125 + };
4126
4127 + /* The whole JPEG data is expected to reside in the supplied memory
4128 + * buffer, so any request for more data beyond the given buffer size
4129 + * is treated as an error.
4130 + */
4131 + WARNMS(cinfo, JWRN_JPEG_EOF);
4132 +
4133 + /* Insert a fake EOI marker */
4134 +
4135 + cinfo->src->next_input_byte = mybuffer;
4136 + cinfo->src->bytes_in_buffer = 2;
4137 +
4138 + return TRUE;
4139 +}
4140 +#endif
4141 +
4142 +
4143 /*
4144 * Skip data --- used to skip over a potentially large amount of
4145 * uninteresting data (such as an APPn marker).
4146 @@ -127,7 +161,7 @@
4147 METHODDEF(void)
4148 skip_input_data (j_decompress_ptr cinfo, long num_bytes)
4149 {
4150 - my_src_ptr src = (my_src_ptr) cinfo->src;
4151 + struct jpeg_source_mgr * src = cinfo->src;
4152
4153 /* Just a dumb implementation for now. Could use fseek() except
4154 * it doesn't work on pipes. Not clear that being smart is worth
4155 @@ -134,15 +168,15 @@
4156 * any trouble anyway --- large skips are infrequent.
4157 */
4158 if (num_bytes > 0) {
4159 - while (num_bytes > (long) src->pub.bytes_in_buffer) {
4160 - num_bytes -= (long) src->pub.bytes_in_buffer;
4161 - (void) fill_input_buffer(cinfo);
4162 + while (num_bytes > (long) src->bytes_in_buffer) {
4163 + num_bytes -= (long) src->bytes_in_buffer;
4164 + (void) (*src->fill_input_buffer) (cinfo);
4165 /* note we assume that fill_input_buffer will never return FALSE,
4166 * so suspension need not be handled.
4167 */
4168 }
4169 - src->pub.next_input_byte += (size_t) num_bytes;
4170 - src->pub.bytes_in_buffer -= (size_t) num_bytes;
4171 + src->next_input_byte += (size_t) num_bytes;
4172 + src->bytes_in_buffer -= (size_t) num_bytes;
4173 }
4174 }
4175
4176 @@ -210,3 +244,40 @@
4177 src->pub.bytes_in_buffer = 0; /* forces fill_input_buffer on first read */
4178 src->pub.next_input_byte = NULL; /* until buffer loaded */
4179 }
4180 +
4181 +
4182 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED)
4183 +/*
4184 + * Prepare for input from a supplied memory buffer.
4185 + * The buffer must contain the whole JPEG data.
4186 + */
4187 +
4188 +GLOBAL(void)
4189 +jpeg_mem_src (j_decompress_ptr cinfo,
4190 + unsigned char * inbuffer, unsigned long insize)
4191 +{
4192 + struct jpeg_source_mgr * src;
4193 +
4194 + if (inbuffer == NULL || insize == 0) /* Treat empty input as fatal error */
4195 + ERREXIT(cinfo, JERR_INPUT_EMPTY);
4196 +
4197 + /* The source object is made permanent so that a series of JPEG images
4198 + * can be read from the same buffer by calling jpeg_mem_src only before
4199 + * the first one.
4200 + */
4201 + if (cinfo->src == NULL) { /* first time for this JPEG object? */
4202 + cinfo->src = (struct jpeg_source_mgr *)
4203 + (*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_PERMANENT,
4204 + SIZEOF(struct jpeg_source_mgr));
4205 + }
4206 +
4207 + src = cinfo->src;
4208 + src->init_source = init_mem_source;
4209 + src->fill_input_buffer = fill_mem_input_buffer;
4210 + src->skip_input_data = skip_input_data;
4211 + src->resync_to_restart = jpeg_resync_to_restart; /* use default method */
4212 + src->term_source = term_source;
4213 + src->bytes_in_buffer = (size_t) insize;
4214 + src->next_input_byte = (JOCTET *) inbuffer;
4215 +}
4216 +#endif
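
Review note on the jdatasrc.c hunks above: the memory source never refills from real data. Once the supplied buffer is exhausted, fill_mem_input_buffer() warns and hands the decoder a two-byte fake EOI marker, and skip_input_data() now goes through the source manager's own fill_input_buffer pointer so the same loop serves both the stdio and memory cases. The following is a minimal standalone sketch of that interaction, not the library code: mem_source and the mem_source_* helpers are hypothetical stand-ins for struct jpeg_source_mgr and its methods, and the warned flag stands in for WARNMS(cinfo, JWRN_JPEG_EOF).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields of struct jpeg_source_mgr used here. */
typedef struct mem_source {
  const unsigned char *next_input_byte;  /* next byte to read */
  size_t bytes_in_buffer;                /* bytes remaining */
  int warned;                            /* stands in for WARNMS(..., JWRN_JPEG_EOF) */
} mem_source;

/* 0xFF 0xD9 is the JPEG EOI marker. */
static const unsigned char fake_eoi[2] = { 0xFF, 0xD9 };

/* Point the source at a caller-supplied buffer holding the whole image. */
static void mem_source_init(mem_source *src, const unsigned char *buf,
                            size_t size)
{
  assert(buf != NULL && size > 0);  /* the patch treats empty input as fatal */
  src->next_input_byte = buf;
  src->bytes_in_buffer = size;
  src->warned = 0;
}

/* Called when the buffer runs dry: record a warning, supply a fake EOI. */
static int mem_source_fill(mem_source *src)
{
  src->warned = 1;
  src->next_input_byte = fake_eoi;
  src->bytes_in_buffer = 2;
  return 1;
}

/* Same loop shape as skip_input_data() in the patch. */
static void mem_source_skip(mem_source *src, long num_bytes)
{
  if (num_bytes > 0) {
    while (num_bytes > (long) src->bytes_in_buffer) {
      num_bytes -= (long) src->bytes_in_buffer;
      (void) mem_source_fill(src);
    }
    src->next_input_byte += (size_t) num_bytes;
    src->bytes_in_buffer -= (size_t) num_bytes;
  }
}
```

In this model, skipping past the end of the buffer leaves the read pointer inside the fake EOI bytes, so a truncated image ends with a warning rather than a read beyond the caller's memory.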
4217 Index: jdcoefct.c
4218 ===================================================================
4219 --- jdcoefct.c (revision 829)
4220 +++ jdcoefct.c (working copy)
4221 @@ -1,8 +1,11 @@
4222 /*
4223 * jdcoefct.c
4224 *
4225 + * This file was part of the Independent JPEG Group's software:
4226 * Copyright (C) 1994-1997, Thomas G. Lane.
4227 - * This file is part of the Independent JPEG Group's software.
4228 + * libjpeg-turbo Modifications:
4229 + * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
4230 + * Copyright (C) 2010, D. R. Commander.
4231 * For conditions of distribution and use, see the accompanying README file.
4232 *
4233 * This file contains the coefficient buffer controller for decompression.
4234 @@ -14,56 +17,10 @@
4235 * Also, the input side (only) is used when reading a file for transcoding.
4236 */
4237
4238 -#define JPEG_INTERNALS
4239 -#include "jinclude.h"
4240 -#include "jpeglib.h"
4241 +#include "jdcoefct.h"
4242 +#include "jpegcomp.h"
4243
4244 -/* Block smoothing is only applicable for progressive JPEG, so: */
4245 -#ifndef D_PROGRESSIVE_SUPPORTED
4246 -#undef BLOCK_SMOOTHING_SUPPORTED
4247 -#endif
4248
4249 -/* Private buffer controller object */
4250 -
4251 -typedef struct {
4252 - struct jpeg_d_coef_controller pub; /* public fields */
4253 -
4254 - /* These variables keep track of the current location of the input side. */
4255 - /* cinfo->input_iMCU_row is also used for this. */
4256 - JDIMENSION MCU_ctr; /* counts MCUs processed in current row */
4257 - int MCU_vert_offset; /* counts MCU rows within iMCU row */
4258 - int MCU_rows_per_iMCU_row; /* number of such rows needed */
4259 -
4260 - /* The output side's location is represented by cinfo->output_iMCU_row. */
4261 -
4262 - /* In single-pass modes, it's sufficient to buffer just one MCU.
4263 - * We allocate a workspace of D_MAX_BLOCKS_IN_MCU coefficient blocks,
4264 - * and let the entropy decoder write into that workspace each time.
4265 - * (On 80x86, the workspace is FAR even though it's not really very big;
4266 - * this is to keep the module interfaces unchanged when a large coefficient
4267 - * buffer is necessary.)
4268 - * In multi-pass modes, this array points to the current MCU's blocks
4269 - * within the virtual arrays; it is used only by the input side.
4270 - */
4271 - JBLOCKROW MCU_buffer[D_MAX_BLOCKS_IN_MCU];
4272 -
4273 - /* Temporary workspace for one MCU */
4274 - JCOEF * workspace;
4275 -
4276 -#ifdef D_MULTISCAN_FILES_SUPPORTED
4277 - /* In multi-pass modes, we need a virtual block array for each component. */
4278 - jvirt_barray_ptr whole_image[MAX_COMPONENTS];
4279 -#endif
4280 -
4281 -#ifdef BLOCK_SMOOTHING_SUPPORTED
4282 - /* When doing block smoothing, we latch coefficient Al values here */
4283 - int * coef_bits_latch;
4284 -#define SAVED_COEFS 6 /* we save coef_bits[0..5] */
4285 -#endif
4286 -} my_coef_controller;
4287 -
4288 -typedef my_coef_controller * my_coef_ptr;
4289 -
4290 /* Forward declarations */
4291 METHODDEF(int) decompress_onepass
4292 JPP((j_decompress_ptr cinfo, JSAMPIMAGE output_buf));
4293 @@ -78,30 +35,6 @@
4294 #endif
4295
4296
4297 -LOCAL(void)
4298 -start_iMCU_row (j_decompress_ptr cinfo)
4299 -/* Reset within-iMCU-row counters for a new row (input side) */
4300 -{
4301 - my_coef_ptr coef = (my_coef_ptr) cinfo->coef;
4302 -
4303 - /* In an interleaved scan, an MCU row is the same as an iMCU row.
4304 - * In a noninterleaved scan, an iMCU row has v_samp_factor MCU rows.
4305 - * But at the bottom of the image, process only what's left.
4306 - */
4307 - if (cinfo->comps_in_scan > 1) {
4308 - coef->MCU_rows_per_iMCU_row = 1;
4309 - } else {
4310 - if (cinfo->input_iMCU_row < (cinfo->total_iMCU_rows-1))
4311 - coef->MCU_rows_per_iMCU_row = cinfo->cur_comp_info[0]->v_samp_factor;
4312 - else
4313 - coef->MCU_rows_per_iMCU_row = cinfo->cur_comp_info[0]->last_row_height;
4314 - }
4315 -
4316 - coef->MCU_ctr = 0;
4317 - coef->MCU_vert_offset = 0;
4318 -}
4319 -
4320 -
4321 /*
4322 * Initialize for an input processing pass.
4323 */
4324 @@ -190,7 +123,7 @@
4325 useful_width = (MCU_col_num < last_MCU_col) ? compptr->MCU_width
4326 : compptr->last_col_width;
4327 output_ptr = output_buf[compptr->component_index] +
4328 - yoffset * compptr->DCT_scaled_size;
4329 + yoffset * compptr->_DCT_scaled_size;
4330 start_col = MCU_col_num * compptr->MCU_sample_width;
4331 for (yindex = 0; yindex < compptr->MCU_height; yindex++) {
4332 if (cinfo->input_iMCU_row < last_iMCU_row ||
4333 @@ -200,11 +133,11 @@
4334 (*inverse_DCT) (cinfo, compptr,
4335 (JCOEFPTR) coef->MCU_buffer[blkn+xindex],
4336 output_ptr, output_col);
4337 - output_col += compptr->DCT_scaled_size;
4338 + output_col += compptr->_DCT_scaled_size;
4339 }
4340 }
4341 blkn += compptr->MCU_width;
4342 - output_ptr += compptr->DCT_scaled_size;
4343 + output_ptr += compptr->_DCT_scaled_size;
4344 }
4345 }
4346 }
4347 @@ -365,9 +298,9 @@
4348 (*inverse_DCT) (cinfo, compptr, (JCOEFPTR) buffer_ptr,
4349 output_ptr, output_col);
4350 buffer_ptr++;
4351 - output_col += compptr->DCT_scaled_size;
4352 + output_col += compptr->_DCT_scaled_size;
4353 }
4354 - output_ptr += compptr->DCT_scaled_size;
4355 + output_ptr += compptr->_DCT_scaled_size;
4356 }
4357 }
4358
4359 @@ -660,9 +593,9 @@
4360 DC4 = DC5; DC5 = DC6;
4361 DC7 = DC8; DC8 = DC9;
4362 buffer_ptr++, prev_block_row++, next_block_row++;
4363 - output_col += compptr->DCT_scaled_size;
4364 + output_col += compptr->_DCT_scaled_size;
4365 }
4366 - output_ptr += compptr->DCT_scaled_size;
4367 + output_ptr += compptr->_DCT_scaled_size;
4368 }
4369 }
4370
4371 Index: jdcolor.c
4372 ===================================================================
4373 --- jdcolor.c (revision 829)
4374 +++ jdcolor.c (working copy)
4375 @@ -1,10 +1,12 @@
4376 /*
4377 * jdcolor.c
4378 *
4379 + * This file was part of the Independent JPEG Group's software:
4380 * Copyright (C) 1991-1997, Thomas G. Lane.
4381 + * Modified 2011 by Guido Vollbeding.
4382 + * libjpeg-turbo Modifications:
4383 * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
4384 - * Copyright (C) 2009, D. R. Commander.
4385 - * This file is part of the Independent JPEG Group's software.
4386 + * Copyright (C) 2009, 2011-2012, D. R. Commander.
4387 * For conditions of distribution and use, see the accompanying README file.
4388 *
4389 * This file contains output colorspace conversion routines.
4390 @@ -14,6 +16,7 @@
4391 #include "jinclude.h"
4392 #include "jpeglib.h"
4393 #include "jsimd.h"
4394 +#include "config.h"
4395
4396
4397 /* Private subobject */
4398 @@ -26,6 +29,9 @@
4399 int * Cb_b_tab; /* => table for Cb to B conversion */
4400 INT32 * Cr_g_tab; /* => table for Cr to G conversion */
4401 INT32 * Cb_g_tab; /* => table for Cb to G conversion */
4402 +
4403 + /* Private state for RGB->Y conversion */
4404 + INT32 * rgb_y_tab; /* => table for RGB to Y conversion */
4405 } my_color_deconverter;
4406
4407 typedef my_color_deconverter * my_cconvert_ptr;
4408 @@ -32,14 +38,19 @@
4409
4410
4411 /**************** YCbCr -> RGB conversion: most common case **************/
4412 +/**************** RGB -> Y conversion: less common case **************/
4413
4414 /*
4415 * YCbCr is defined per CCIR 601-1, except that Cb and Cr are
4416 * normalized to the range 0..MAXJSAMPLE rather than -0.5 .. 0.5.
4417 * The conversion equations to be implemented are therefore
4418 + *
4419 * R = Y + 1.40200 * Cr
4420 * G = Y - 0.34414 * Cb - 0.71414 * Cr
4421 * B = Y + 1.77200 * Cb
4422 + *
4423 + * Y = 0.29900 * R + 0.58700 * G + 0.11400 * B
4424 + *
4425 * where Cb and Cr represent the incoming values less CENTERJSAMPLE.
4426 * (These numbers are derived from TIFF 6.0 section 21, dated 3-June-92.)
4427 *
4428 @@ -64,7 +75,132 @@
4429 #define ONE_HALF ((INT32) 1 << (SCALEBITS-1))
4430 #define FIX(x) ((INT32) ((x) * (1L<<SCALEBITS) + 0.5))
4431
4432 +/* We allocate one big table for RGB->Y conversion and divide it up into
4433 + * three parts, instead of doing three alloc_small requests. This lets us
4434 + * use a single table base address, which can be held in a register in the
4435 + * inner loops on many machines (more than can hold all three addresses,
4436 + * anyway).
4437 + */
4438
4439 +#define R_Y_OFF 0 /* offset to R => Y section */
4440 +#define G_Y_OFF (1*(MAXJSAMPLE+1)) /* offset to G => Y section */
4441 +#define B_Y_OFF (2*(MAXJSAMPLE+1)) /* etc. */
4442 +#define TABLE_SIZE (3*(MAXJSAMPLE+1))
4443 +
4444 +
4445 +/* Include inline routines for colorspace extensions */
4446 +
4447 +#include "jdcolext.c"
4448 +#undef RGB_RED
4449 +#undef RGB_GREEN
4450 +#undef RGB_BLUE
4451 +#undef RGB_PIXELSIZE
4452 +
4453 +#define RGB_RED EXT_RGB_RED
4454 +#define RGB_GREEN EXT_RGB_GREEN
4455 +#define RGB_BLUE EXT_RGB_BLUE
4456 +#define RGB_PIXELSIZE EXT_RGB_PIXELSIZE
4457 +#define ycc_rgb_convert_internal ycc_extrgb_convert_internal
4458 +#define gray_rgb_convert_internal gray_extrgb_convert_internal
4459 +#define rgb_rgb_convert_internal rgb_extrgb_convert_internal
4460 +#include "jdcolext.c"
4461 +#undef RGB_RED
4462 +#undef RGB_GREEN
4463 +#undef RGB_BLUE
4464 +#undef RGB_PIXELSIZE
4465 +#undef ycc_rgb_convert_internal
4466 +#undef gray_rgb_convert_internal
4467 +#undef rgb_rgb_convert_internal
4468 +
4469 +#define RGB_RED EXT_RGBX_RED
4470 +#define RGB_GREEN EXT_RGBX_GREEN
4471 +#define RGB_BLUE EXT_RGBX_BLUE
4472 +#define RGB_ALPHA 3
4473 +#define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE
4474 +#define ycc_rgb_convert_internal ycc_extrgbx_convert_internal
4475 +#define gray_rgb_convert_internal gray_extrgbx_convert_internal
4476 +#define rgb_rgb_convert_internal rgb_extrgbx_convert_internal
4477 +#include "jdcolext.c"
4478 +#undef RGB_RED
4479 +#undef RGB_GREEN
4480 +#undef RGB_BLUE
4481 +#undef RGB_ALPHA
4482 +#undef RGB_PIXELSIZE
4483 +#undef ycc_rgb_convert_internal
4484 +#undef gray_rgb_convert_internal
4485 +#undef rgb_rgb_convert_internal
4486 +
4487 +#define RGB_RED EXT_BGR_RED
4488 +#define RGB_GREEN EXT_BGR_GREEN
4489 +#define RGB_BLUE EXT_BGR_BLUE
4490 +#define RGB_PIXELSIZE EXT_BGR_PIXELSIZE
4491 +#define ycc_rgb_convert_internal ycc_extbgr_convert_internal
4492 +#define gray_rgb_convert_internal gray_extbgr_convert_internal
4493 +#define rgb_rgb_convert_internal rgb_extbgr_convert_internal
4494 +#include "jdcolext.c"
4495 +#undef RGB_RED
4496 +#undef RGB_GREEN
4497 +#undef RGB_BLUE
4498 +#undef RGB_PIXELSIZE
4499 +#undef ycc_rgb_convert_internal
4500 +#undef gray_rgb_convert_internal
4501 +#undef rgb_rgb_convert_internal
4502 +
4503 +#define RGB_RED EXT_BGRX_RED
4504 +#define RGB_GREEN EXT_BGRX_GREEN
4505 +#define RGB_BLUE EXT_BGRX_BLUE
4506 +#define RGB_ALPHA 3
4507 +#define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE
4508 +#define ycc_rgb_convert_internal ycc_extbgrx_convert_internal
4509 +#define gray_rgb_convert_internal gray_extbgrx_convert_internal
4510 +#define rgb_rgb_convert_internal rgb_extbgrx_convert_internal
4511 +#include "jdcolext.c"
4512 +#undef RGB_RED
4513 +#undef RGB_GREEN
4514 +#undef RGB_BLUE
4515 +#undef RGB_ALPHA
4516 +#undef RGB_PIXELSIZE
4517 +#undef ycc_rgb_convert_internal
4518 +#undef gray_rgb_convert_internal
4519 +#undef rgb_rgb_convert_internal
4520 +
4521 +#define RGB_RED EXT_XBGR_RED
4522 +#define RGB_GREEN EXT_XBGR_GREEN
4523 +#define RGB_BLUE EXT_XBGR_BLUE
4524 +#define RGB_ALPHA 0
4525 +#define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE
4526 +#define ycc_rgb_convert_internal ycc_extxbgr_convert_internal
4527 +#define gray_rgb_convert_internal gray_extxbgr_convert_internal
4528 +#define rgb_rgb_convert_internal rgb_extxbgr_convert_internal
4529 +#include "jdcolext.c"
4530 +#undef RGB_RED
4531 +#undef RGB_GREEN
4532 +#undef RGB_BLUE
4533 +#undef RGB_ALPHA
4534 +#undef RGB_PIXELSIZE
4535 +#undef ycc_rgb_convert_internal
4536 +#undef gray_rgb_convert_internal
4537 +#undef rgb_rgb_convert_internal
4538 +
4539 +#define RGB_RED EXT_XRGB_RED
4540 +#define RGB_GREEN EXT_XRGB_GREEN
4541 +#define RGB_BLUE EXT_XRGB_BLUE
4542 +#define RGB_ALPHA 0
4543 +#define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE
4544 +#define ycc_rgb_convert_internal ycc_extxrgb_convert_internal
4545 +#define gray_rgb_convert_internal gray_extxrgb_convert_internal
4546 +#define rgb_rgb_convert_internal rgb_extxrgb_convert_internal
4547 +#include "jdcolext.c"
4548 +#undef RGB_RED
4549 +#undef RGB_GREEN
4550 +#undef RGB_BLUE
4551 +#undef RGB_ALPHA
4552 +#undef RGB_PIXELSIZE
4553 +#undef ycc_rgb_convert_internal
4554 +#undef gray_rgb_convert_internal
4555 +#undef rgb_rgb_convert_internal
4556 +
4557 +
4558 /*
4559 * Initialize tables for YCC->RGB colorspace conversion.
4560 */
4561 @@ -110,13 +246,6 @@
4562
4563 /*
4564 * Convert some rows of samples to the output colorspace.
4565 - *
4566 - * Note that we change from noninterleaved, one-plane-per-component format
4567 - * to interleaved-pixel format. The output buffer is therefore three times
4568 - * as wide as the input buffer.
4569 - * A starting row offset is provided only for the input buffer. The caller
4570 - * can easily adjust the passed output_buf value to accommodate any row
4571 - * offset required on that side.
4572 */
4573
4574 METHODDEF(void)
4575 @@ -124,19 +253,86 @@
4576 JSAMPIMAGE input_buf, JDIMENSION input_row,
4577 JSAMPARRAY output_buf, int num_rows)
4578 {
4579 + switch (cinfo->out_color_space) {
4580 + case JCS_EXT_RGB:
4581 + ycc_extrgb_convert_internal(cinfo, input_buf, input_row, output_buf,
4582 + num_rows);
4583 + break;
4584 + case JCS_EXT_RGBX:
4585 + case JCS_EXT_RGBA:
4586 + ycc_extrgbx_convert_internal(cinfo, input_buf, input_row, output_buf,
4587 + num_rows);
4588 + break;
4589 + case JCS_EXT_BGR:
4590 + ycc_extbgr_convert_internal(cinfo, input_buf, input_row, output_buf,
4591 + num_rows);
4592 + break;
4593 + case JCS_EXT_BGRX:
4594 + case JCS_EXT_BGRA:
4595 + ycc_extbgrx_convert_internal(cinfo, input_buf, input_row, output_buf,
4596 + num_rows);
4597 + break;
4598 + case JCS_EXT_XBGR:
4599 + case JCS_EXT_ABGR:
4600 + ycc_extxbgr_convert_internal(cinfo, input_buf, input_row, output_buf,
4601 + num_rows);
4602 + break;
4603 + case JCS_EXT_XRGB:
4604 + case JCS_EXT_ARGB:
4605 + ycc_extxrgb_convert_internal(cinfo, input_buf, input_row, output_buf,
4606 + num_rows);
4607 + break;
4608 + default:
4609 + ycc_rgb_convert_internal(cinfo, input_buf, input_row, output_buf,
4610 + num_rows);
4611 + break;
4612 + }
4613 +}
4614 +
4615 +
4616 +/**************** Cases other than YCbCr -> RGB **************/
4617 +
4618 +
4619 +/*
4620 + * Initialize for RGB->grayscale colorspace conversion.
4621 + */
4622 +
4623 +LOCAL(void)
4624 +build_rgb_y_table (j_decompress_ptr cinfo)
4625 +{
4626 my_cconvert_ptr cconvert = (my_cconvert_ptr) cinfo->cconvert;
4627 - register int y, cb, cr;
4628 + INT32 * rgb_y_tab;
4629 + INT32 i;
4630 +
4631 + /* Allocate and fill in the conversion tables. */
4632 + cconvert->rgb_y_tab = rgb_y_tab = (INT32 *)
4633 + (*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_IMAGE,
4634 + (TABLE_SIZE * SIZEOF(INT32)));
4635 +
4636 + for (i = 0; i <= MAXJSAMPLE; i++) {
4637 + rgb_y_tab[i+R_Y_OFF] = FIX(0.29900) * i;
4638 + rgb_y_tab[i+G_Y_OFF] = FIX(0.58700) * i;
4639 + rgb_y_tab[i+B_Y_OFF] = FIX(0.11400) * i + ONE_HALF;
4640 + }
4641 +}
4642 +
4643 +
4644 +/*
4645 + * Convert RGB to grayscale.
4646 + */
4647 +
4648 +METHODDEF(void)
4649 +rgb_gray_convert (j_decompress_ptr cinfo,
4650 + JSAMPIMAGE input_buf, JDIMENSION input_row,
4651 + JSAMPARRAY output_buf, int num_rows)
4652 +{
4653 + my_cconvert_ptr cconvert = (my_cconvert_ptr) cinfo->cconvert;
4654 + register int r, g, b;
4655 + register INT32 * ctab = cconvert->rgb_y_tab;
4656 register JSAMPROW outptr;
4657 register JSAMPROW inptr0, inptr1, inptr2;
4658 register JDIMENSION col;
4659 JDIMENSION num_cols = cinfo->output_width;
4660 - /* copy these pointers into registers if possible */
4661 - register JSAMPLE * range_limit = cinfo->sample_range_limit;
4662 - register int * Crrtab = cconvert->Cr_r_tab;
4663 - register int * Cbbtab = cconvert->Cb_b_tab;
4664 - register INT32 * Crgtab = cconvert->Cr_g_tab;
4665 - register INT32 * Cbgtab = cconvert->Cb_g_tab;
4666 - SHIFT_TEMPS
4667
4668 while (--num_rows >= 0) {
4669 inptr0 = input_buf[0][input_row];
4670 @@ -145,24 +341,18 @@
4671 input_row++;
4672 outptr = *output_buf++;
4673 for (col = 0; col < num_cols; col++) {
4674 - y = GETJSAMPLE(inptr0[col]);
4675 - cb = GETJSAMPLE(inptr1[col]);
4676 - cr = GETJSAMPLE(inptr2[col]);
4677 - /* Range-limiting is essential due to noise introduced by DCT losses. */
4678 - outptr[rgb_red[cinfo->out_color_space]] = range_limit[y + Crrtab[cr]];
4679 - outptr[rgb_green[cinfo->out_color_space]] = range_limit[y +
4680 - ((int) RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr],
4681 - SCALEBITS))];
4682 - outptr[rgb_blue[cinfo->out_color_space]] = range_limit[y + Cbbtab[cb]];
4683 - outptr += rgb_pixelsize[cinfo->out_color_space];
4684 + r = GETJSAMPLE(inptr0[col]);
4685 + g = GETJSAMPLE(inptr1[col]);
4686 + b = GETJSAMPLE(inptr2[col]);
4687 + /* Y */
4688 + outptr[col] = (JSAMPLE)
4689 + ((ctab[r+R_Y_OFF] + ctab[g+G_Y_OFF] + ctab[b+B_Y_OFF])
4690 + >> SCALEBITS);
4691 }
4692 }
4693 }
4694
4695
4696 -/**************** Cases other than YCbCr -> RGB **************/
4697 -
4698 -
4699 /*
4700 * Color conversion for no colorspace change: just copy the data,
4701 * converting from separate-planes to interleaved representation.
4702 @@ -211,9 +401,7 @@
4703
4704
4705 /*
4706 - * Convert grayscale to RGB: just duplicate the graylevel three times.
4707 - * This is provided to support applications that don't want to cope
4708 - * with grayscale as a separate case.
4709 + * Convert grayscale to RGB
4710 */
4711
4712 METHODDEF(void)
4713 @@ -221,20 +409,85 @@
4714 JSAMPIMAGE input_buf, JDIMENSION input_row,
4715 JSAMPARRAY output_buf, int num_rows)
4716 {
4717 - register JSAMPROW inptr, outptr;
4718 - register JDIMENSION col;
4719 - JDIMENSION num_cols = cinfo->output_width;
4720 + switch (cinfo->out_color_space) {
4721 + case JCS_EXT_RGB:
4722 + gray_extrgb_convert_internal(cinfo, input_buf, input_row, output_buf,
4723 + num_rows);
4724 + break;
4725 + case JCS_EXT_RGBX:
4726 + case JCS_EXT_RGBA:
4727 + gray_extrgbx_convert_internal(cinfo, input_buf, input_row, output_buf,
4728 + num_rows);
4729 + break;
4730 + case JCS_EXT_BGR:
4731 + gray_extbgr_convert_internal(cinfo, input_buf, input_row, output_buf,
4732 + num_rows);
4733 + break;
4734 + case JCS_EXT_BGRX:
4735 + case JCS_EXT_BGRA:
4736 + gray_extbgrx_convert_internal(cinfo, input_buf, input_row, output_buf,
4737 + num_rows);
4738 + break;
4739 + case JCS_EXT_XBGR:
4740 + case JCS_EXT_ABGR:
4741 + gray_extxbgr_convert_internal(cinfo, input_buf, input_row, output_buf,
4742 + num_rows);
4743 + break;
4744 + case JCS_EXT_XRGB:
4745 + case JCS_EXT_ARGB:
4746 + gray_extxrgb_convert_internal(cinfo, input_buf, input_row, output_buf,
4747 + num_rows);
4748 + break;
4749 + default:
4750 + gray_rgb_convert_internal(cinfo, input_buf, input_row, output_buf,
4751 + num_rows);
4752 + break;
4753 + }
4754 +}
4755
4756 - while (--num_rows >= 0) {
4757 - inptr = input_buf[0][input_row++];
4758 - outptr = *output_buf++;
4759 - for (col = 0; col < num_cols; col++) {
4760 - /* We can dispense with GETJSAMPLE() here */
4761 - outptr[rgb_red[cinfo->out_color_space]] =
4762 - outptr[rgb_green[cinfo->out_color_space]] =
4763 - outptr[rgb_blue[cinfo->out_color_space]] = inptr[col];
4764 - outptr += rgb_pixelsize[cinfo->out_color_space];
4765 - }
4766 +
4767 +/*
4768 + * Convert plain RGB to extended RGB
4769 + */
4770 +
4771 +METHODDEF(void)
4772 +rgb_rgb_convert (j_decompress_ptr cinfo,
4773 + JSAMPIMAGE input_buf, JDIMENSION input_row,
4774 + JSAMPARRAY output_buf, int num_rows)
4775 +{
4776 + switch (cinfo->out_color_space) {
4777 + case JCS_EXT_RGB:
4778 + rgb_extrgb_convert_internal(cinfo, input_buf, input_row, output_buf,
4779 + num_rows);
4780 + break;
4781 + case JCS_EXT_RGBX:
4782 + case JCS_EXT_RGBA:
4783 + rgb_extrgbx_convert_internal(cinfo, input_buf, input_row, output_buf,
4784 + num_rows);
4785 + break;
4786 + case JCS_EXT_BGR:
4787 + rgb_extbgr_convert_internal(cinfo, input_buf, input_row, output_buf,
4788 + num_rows);
4789 + break;
4790 + case JCS_EXT_BGRX:
4791 + case JCS_EXT_BGRA:
4792 + rgb_extbgrx_convert_internal(cinfo, input_buf, input_row, output_buf,
4793 + num_rows);
4794 + break;
4795 + case JCS_EXT_XBGR:
4796 + case JCS_EXT_ABGR:
4797 + rgb_extxbgr_convert_internal(cinfo, input_buf, input_row, output_buf,
4798 + num_rows);
4799 + break;
4800 + case JCS_EXT_XRGB:
4801 + case JCS_EXT_ARGB:
4802 + rgb_extxrgb_convert_internal(cinfo, input_buf, input_row, output_buf,
4803 + num_rows);
4804 + break;
4805 + default:
4806 + rgb_rgb_convert_internal(cinfo, input_buf, input_row, output_buf,
4807 + num_rows);
4808 + break;
4809 }
4810 }
4811
4812 @@ -356,6 +609,9 @@
4813 /* For color->grayscale conversion, only the Y (0) component is needed */
4814 for (ci = 1; ci < cinfo->num_components; ci++)
4815 cinfo->comp_info[ci].component_needed = FALSE;
4816 + } else if (cinfo->jpeg_color_space == JCS_RGB) {
4817 + cconvert->pub.color_convert = rgb_gray_convert;
4818 + build_rgb_y_table(cinfo);
4819 } else
4820 ERREXIT(cinfo, JERR_CONVERSION_NOTIMPL);
4821 break;
4822 @@ -367,6 +623,10 @@
4823 case JCS_EXT_BGRX:
4824 case JCS_EXT_XBGR:
4825 case JCS_EXT_XRGB:
4826 + case JCS_EXT_RGBA:
4827 + case JCS_EXT_BGRA:
4828 + case JCS_EXT_ABGR:
4829 + case JCS_EXT_ARGB:
4830 cinfo->out_color_components = rgb_pixelsize[cinfo->out_color_space];
4831 if (cinfo->jpeg_color_space == JCS_YCbCr) {
4832 if (jsimd_can_ycc_rgb())
4833 @@ -377,9 +637,14 @@
4834 }
4835 } else if (cinfo->jpeg_color_space == JCS_GRAYSCALE) {
4836 cconvert->pub.color_convert = gray_rgb_convert;
4837 - } else if (cinfo->jpeg_color_space == cinfo->out_color_space &&
4838 - rgb_pixelsize[cinfo->out_color_space] == 3) {
4839 - cconvert->pub.color_convert = null_convert;
4840 + } else if (cinfo->jpeg_color_space == JCS_RGB) {
4841 + if (rgb_red[cinfo->out_color_space] == 0 &&
4842 + rgb_green[cinfo->out_color_space] == 1 &&
4843 + rgb_blue[cinfo->out_color_space] == 2 &&
4844 + rgb_pixelsize[cinfo->out_color_space] == 3)
4845 + cconvert->pub.color_convert = null_convert;
4846 + else
4847 + cconvert->pub.color_convert = rgb_rgb_convert;
4848 } else
4849 ERREXIT(cinfo, JERR_CONVERSION_NOTIMPL);
4850 break;
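
Review note on the jdcolor.c hunks above: build_rgb_y_table() precomputes the three products of Y = 0.29900 * R + 0.58700 * G + 0.11400 * B in fixed point, folds the rounding constant ONE_HALF into the blue section, and rgb_gray_convert() then needs only three lookups, two adds and a shift per pixel. A minimal standalone sketch of the same arithmetic follows. It assumes 8-bit samples and SCALEBITS = 16; MAXSAMPLE, build_table and rgb_to_gray are hypothetical names, and a file-scope array replaces the alloc_small() allocation.

```c
#include <assert.h>
#include <stdint.h>

#define SCALEBITS 16                      /* assumed fixed-point precision */
#define ONE_HALF ((int32_t) 1 << (SCALEBITS - 1))
#define FIX(x) ((int32_t) ((x) * (1L << SCALEBITS) + 0.5))

#define MAXSAMPLE 255                     /* assumes 8-bit samples */
#define R_Y_OFF 0                         /* offset to R => Y section */
#define G_Y_OFF (1 * (MAXSAMPLE + 1))     /* offset to G => Y section */
#define B_Y_OFF (2 * (MAXSAMPLE + 1))     /* offset to B => Y section */
#define TABLE_SIZE (3 * (MAXSAMPLE + 1))

/* One table split into three sections, as in the patch. */
static int32_t rgb_y_tab[TABLE_SIZE];

/* Fill the table; the rounding term is added once, in the B section. */
static void build_table(void)
{
  int32_t i;
  for (i = 0; i <= MAXSAMPLE; i++) {
    rgb_y_tab[i + R_Y_OFF] = FIX(0.29900) * i;
    rgb_y_tab[i + G_Y_OFF] = FIX(0.58700) * i;
    rgb_y_tab[i + B_Y_OFF] = FIX(0.11400) * i + ONE_HALF;
  }
}

/* Convert one pixel: three lookups, two adds, one shift. */
static unsigned char rgb_to_gray(int r, int g, int b)
{
  assert(r >= 0 && r <= MAXSAMPLE);
  assert(g >= 0 && g <= MAXSAMPLE);
  assert(b >= 0 && b <= MAXSAMPLE);
  return (unsigned char)
    ((rgb_y_tab[r + R_Y_OFF] + rgb_y_tab[g + G_Y_OFF] +
      rgb_y_tab[b + B_Y_OFF]) >> SCALEBITS);
}
```

With these constants the three scaled coefficients sum to exactly 1 << SCALEBITS, so pure white maps to 255 and no range-limit table is needed on this path.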
4851 Index: jdct.h
4852 ===================================================================
4853 --- jdct.h (revision 829)
4854 +++ jdct.h (working copy)
4855 @@ -95,9 +95,21 @@
4856 #define jpeg_idct_islow jRDislow
4857 #define jpeg_idct_ifast jRDifast
4858 #define jpeg_idct_float jRDfloat
4859 +#define jpeg_idct_7x7 jRD7x7
4860 +#define jpeg_idct_6x6 jRD6x6
4861 +#define jpeg_idct_5x5 jRD5x5
4862 #define jpeg_idct_4x4 jRD4x4
4863 +#define jpeg_idct_3x3 jRD3x3
4864 #define jpeg_idct_2x2 jRD2x2
4865 #define jpeg_idct_1x1 jRD1x1
4866 +#define jpeg_idct_9x9 jRD9x9
4867 +#define jpeg_idct_10x10 jRD10x10
4868 +#define jpeg_idct_11x11 jRD11x11
4869 +#define jpeg_idct_12x12 jRD12x12
4870 +#define jpeg_idct_13x13 jRD13x13
4871 +#define jpeg_idct_14x14 jRD14x14
4872 +#define jpeg_idct_15x15 jRD15x15
4873 +#define jpeg_idct_16x16 jRD16x16
4874 #endif /* NEED_SHORT_EXTERNAL_NAMES */
4875
4876 /* Extern declarations for the forward and inverse DCT routines. */
4877 @@ -115,9 +127,21 @@
4878 EXTERN(void) jpeg_idct_float
4879 JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
4880 JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
4881 +EXTERN(void) jpeg_idct_7x7
4882 + JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
4883 + JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
4884 +EXTERN(void) jpeg_idct_6x6
4885 + JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
4886 + JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
4887 +EXTERN(void) jpeg_idct_5x5
4888 + JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
4889 + JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
4890 EXTERN(void) jpeg_idct_4x4
4891 JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
4892 JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
4893 +EXTERN(void) jpeg_idct_3x3
4894 + JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
4895 + JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
4896 EXTERN(void) jpeg_idct_2x2
4897 JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
4898 JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
4899 @@ -124,6 +148,30 @@
4900 EXTERN(void) jpeg_idct_1x1
4901 JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
4902 JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
4903 +EXTERN(void) jpeg_idct_9x9
4904 + JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
4905 + JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
4906 +EXTERN(void) jpeg_idct_10x10
4907 + JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
4908 + JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
4909 +EXTERN(void) jpeg_idct_11x11
4910 + JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
4911 + JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
4912 +EXTERN(void) jpeg_idct_12x12
4913 + JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
4914 + JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
4915 +EXTERN(void) jpeg_idct_13x13
4916 + JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
4917 + JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
4918 +EXTERN(void) jpeg_idct_14x14
4919 + JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
4920 + JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
4921 +EXTERN(void) jpeg_idct_15x15
4922 + JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
4923 + JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
4924 +EXTERN(void) jpeg_idct_16x16
4925 + JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
4926 + JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
4927
4928
4929 /*
4930 Index: jddctmgr.c
4931 ===================================================================
4932 --- jddctmgr.c (revision 829)
4933 +++ jddctmgr.c (working copy)
4934 @@ -1,9 +1,12 @@
4935 /*
4936 * jddctmgr.c
4937 *
4938 + * This file was part of the Independent JPEG Group's software:
4939 * Copyright (C) 1994-1996, Thomas G. Lane.
4940 + * Modified 2002-2010 by Guido Vollbeding.
4941 + * libjpeg-turbo Modifications:
4942 * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
4943 - * This file is part of the Independent JPEG Group's software.
4944 + * Copyright (C) 2010, D. R. Commander.
4945 * For conditions of distribution and use, see the accompanying README file.
4946 *
4947 * This file contains the inverse-DCT management logic.
4948 @@ -21,6 +24,7 @@
4949 #include "jpeglib.h"
4950 #include "jdct.h" /* Private declarations for DCT subsystem */
4951 #include "jsimddct.h"
4952 +#include "jpegcomp.h"
4953
4954
4955 /*
4956 @@ -100,7 +104,7 @@
4957 for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components;
4958 ci++, compptr++) {
4959 /* Select the proper IDCT routine for this component's scaling */
4960 - switch (compptr->DCT_scaled_size) {
4961 + switch (compptr->_DCT_scaled_size) {
4962 #ifdef IDCT_SCALING_SUPPORTED
4963 case 1:
4964 method_ptr = jpeg_idct_1x1;
4965 @@ -113,6 +117,10 @@
4966 method_ptr = jpeg_idct_2x2;
4967 method = JDCT_ISLOW; /* jidctred uses islow-style table */
4968 break;
4969 + case 3:
4970 + method_ptr = jpeg_idct_3x3;
4971 + method = JDCT_ISLOW; /* jidctint uses islow-style table */
4972 + break;
4973 case 4:
4974 if (jsimd_can_idct_4x4())
4975 method_ptr = jsimd_idct_4x4;
4976 @@ -120,6 +128,18 @@
4977 method_ptr = jpeg_idct_4x4;
4978 method = JDCT_ISLOW; /* jidctred uses islow-style table */
4979 break;
4980 + case 5:
4981 + method_ptr = jpeg_idct_5x5;
4982 + method = JDCT_ISLOW; /* jidctint uses islow-style table */
4983 + break;
4984 + case 6:
4985 + method_ptr = jpeg_idct_6x6;
4986 + method = JDCT_ISLOW; /* jidctint uses islow-style table */
4987 + break;
4988 + case 7:
4989 + method_ptr = jpeg_idct_7x7;
4990 + method = JDCT_ISLOW; /* jidctint uses islow-style table */
4991 + break;
4992 #endif
4993 case DCTSIZE:
4994 switch (cinfo->dct_method) {
4995 @@ -155,8 +175,40 @@
4996 break;
4997 }
4998 break;
4999 + case 9:
5000 + method_ptr = jpeg_idct_9x9;
5001 + method = JDCT_ISLOW; /* jidctint uses islow-style table */
5002 + break;
5003 + case 10:
5004 + method_ptr = jpeg_idct_10x10;
5005 + method = JDCT_ISLOW; /* jidctint uses islow-style table */
5006 + break;
5007 + case 11:
5008 + method_ptr = jpeg_idct_11x11;
5009 + method = JDCT_ISLOW; /* jidctint uses islow-style table */
5010 + break;
5011 + case 12:
5012 + method_ptr = jpeg_idct_12x12;
5013 + method = JDCT_ISLOW; /* jidctint uses islow-style table */
5014 + break;
5015 + case 13:
5016 + method_ptr = jpeg_idct_13x13;
5017 + method = JDCT_ISLOW; /* jidctint uses islow-style table */
5018 + break;
5019 + case 14:
5020 + method_ptr = jpeg_idct_14x14;
5021 + method = JDCT_ISLOW; /* jidctint uses islow-style table */
5022 + break;
5023 + case 15:
5024 + method_ptr = jpeg_idct_15x15;
5025 + method = JDCT_ISLOW; /* jidctint uses islow-style table */
5026 + break;
5027 + case 16:
5028 + method_ptr = jpeg_idct_16x16;
5029 + method = JDCT_ISLOW; /* jidctint uses islow-style table */
5030 + break;
5031 default:
5032 - ERREXIT1(cinfo, JERR_BAD_DCTSIZE, compptr->DCT_scaled_size);
5033 + ERREXIT1(cinfo, JERR_BAD_DCTSIZE, compptr->_DCT_scaled_size);
5034 break;
5035 }
5036 idct->pub.inverse_DCT[ci] = method_ptr;
5037 Index: jdhuff.c
5038 ===================================================================
5039 --- jdhuff.c (revision 829)
5040 +++ jdhuff.c (working copy)
5041 @@ -1,8 +1,10 @@
5042 /*
5043 * jdhuff.c
5044 *
5045 + * This file was part of the Independent JPEG Group's software:
5046 * Copyright (C) 1991-1997, Thomas G. Lane.
5047 - * This file is part of the Independent JPEG Group's software.
5048 + * libjpeg-turbo Modifications:
5049 + * Copyright (C) 2009-2011, 2015, D. R. Commander.
5050 * For conditions of distribution and use, see the accompanying README file.
5051 *
5052 * This file contains Huffman entropy decoding routines.
5053 @@ -18,6 +20,7 @@
5054 #include "jinclude.h"
5055 #include "jpeglib.h"
5056 #include "jdhuff.h" /* Declarations shared with jdphuff.c */
5057 +#include "jpegcomp.h"
5058
5059
5060 /*
5061 @@ -122,7 +125,7 @@
5062 if (compptr->component_needed) {
5063 entropy->dc_needed[blkn] = TRUE;
5064 /* we don't need the ACs if producing a 1/8th-size image */
5065 - entropy->ac_needed[blkn] = (compptr->DCT_scaled_size > 1);
5066 + entropy->ac_needed[blkn] = (compptr->_DCT_scaled_size > 1);
5067 } else {
5068 entropy->dc_needed[blkn] = entropy->ac_needed[blkn] = FALSE;
5069 }
5070 @@ -225,6 +228,7 @@
5071 dtbl->maxcode[l] = -1; /* -1 if no codes of this length */
5072 }
5073 }
5074 + dtbl->valoffset[17] = 0;
5075 dtbl->maxcode[17] = 0xFFFFFL; /* ensures jpeg_huff_decode terminates */
5076
5077 /* Compute lookahead tables to speed up decoding.
5078 @@ -234,7 +238,8 @@
5079 * with that code.
5080 */
5081
5082 - MEMZERO(dtbl->look_nbits, SIZEOF(dtbl->look_nbits));
5083 + for (i = 0; i < (1 << HUFF_LOOKAHEAD); i++)
5084 + dtbl->lookup[i] = (HUFF_LOOKAHEAD + 1) << HUFF_LOOKAHEAD;
5085
5086 p = 0;
5087 for (l = 1; l <= HUFF_LOOKAHEAD; l++) {
5088 @@ -243,8 +248,7 @@
5089 /* Generate left-justified code followed by all possible bit sequences */
5090 lookbits = huffcode[p] << (HUFF_LOOKAHEAD-l);
5091 for (ctr = 1 << (HUFF_LOOKAHEAD-l); ctr > 0; ctr--) {
5092 - dtbl->look_nbits[lookbits] = l;
5093 - dtbl->look_sym[lookbits] = htbl->huffval[p];
5094 + dtbl->lookup[lookbits] = (l << HUFF_LOOKAHEAD) | htbl->huffval[p];
5095 lookbits++;
5096 }
5097 }
5098 @@ -389,6 +393,50 @@
5099 }
5100
5101
5102 +/* Macro version of the above, which performs much better but does not
5103 + handle markers. We have to hand off any blocks with markers to the
5104 + slower routines. */
5105 +
5106 +#define GET_BYTE \
5107 +{ \
5108 + register int c0, c1; \
5109 + c0 = GETJOCTET(*buffer++); \
5110 + c1 = GETJOCTET(*buffer); \
5111 + /* Pre-execute most common case */ \
5112 + get_buffer = (get_buffer << 8) | c0; \
5113 + bits_left += 8; \
5114 + if (c0 == 0xFF) { \
5115 + /* Pre-execute case of FF/00, which represents an FF data byte */ \
5116 + buffer++; \
5117 + if (c1 != 0) { \
5118 + /* Oops, it's actually a marker indicating end of compressed data. */ \
5119 + cinfo->unread_marker = c1; \
5120 + /* Back out pre-execution and fill the buffer with zero bits */ \
5121 + buffer -= 2; \
5122 + get_buffer &= ~0xFF; \
5123 + } \
5124 + } \
5125 +}
5126 +
5127 +#if __WORDSIZE == 64 || defined(_WIN64)
5128 +
5129 +/* Pre-fetch 48 bytes, because the holding register is 64-bit */
5130 +#define FILL_BIT_BUFFER_FAST \
5131 + if (bits_left < 16) { \
5132 + GET_BYTE GET_BYTE GET_BYTE GET_BYTE GET_BYTE GET_BYTE \
5133 + }
5134 +
5135 +#else
5136 +
5137 +/* Pre-fetch 16 bytes, because the holding register is 32-bit */
5138 +#define FILL_BIT_BUFFER_FAST \
5139 + if (bits_left < 16) { \
5140 + GET_BYTE GET_BYTE \
5141 + }
5142 +
5143 +#endif
5144 +
5145 +
5146 /*
5147 * Out-of-line code for Huffman code decoding.
5148 * See jdhuff.h for info about usage.
5149 @@ -438,9 +486,10 @@
5150 * On some machines, a shift and add will be faster than a table lookup.
5151 */
5152
5153 +#define AVOID_TABLES
5154 #ifdef AVOID_TABLES
5155
5156 -#define HUFF_EXTEND(x,s) ((x) < (1<<((s)-1)) ? (x) + (((-1)<<(s)) + 1) : (x))
5157 +#define HUFF_EXTEND(x,s) ((x) + ((((x) - (1<<((s)-1))) >> 31) & (((-1)<<(s)) + 1)))
5158
5159 #else
5160
5161 @@ -498,6 +547,191 @@
5162 }
5163
5164
5165 +LOCAL(boolean)
5166 +decode_mcu_slow (j_decompress_ptr cinfo, JBLOCKROW *MCU_data)
5167 +{
5168 + huff_entropy_ptr entropy = (huff_entropy_ptr) cinfo->entropy;
5169 + BITREAD_STATE_VARS;
5170 + int blkn;
5171 + savable_state state;
5172 + /* Outer loop handles each block in the MCU */
5173 +
5174 + /* Load up working state */
5175 + BITREAD_LOAD_STATE(cinfo,entropy->bitstate);
5176 + ASSIGN_STATE(state, entropy->saved);
5177 +
5178 + for (blkn = 0; blkn < cinfo->blocks_in_MCU; blkn++) {
5179 + JBLOCKROW block = MCU_data ? MCU_data[blkn] : NULL;
5180 + d_derived_tbl * dctbl = entropy->dc_cur_tbls[blkn];
5181 + d_derived_tbl * actbl = entropy->ac_cur_tbls[blkn];
5182 + register int s, k, r;
5183 +
5184 + /* Decode a single block's worth of coefficients */
5185 +
5186 + /* Section F.2.2.1: decode the DC coefficient difference */
5187 + HUFF_DECODE(s, br_state, dctbl, return FALSE, label1);
5188 + if (s) {
5189 + CHECK_BIT_BUFFER(br_state, s, return FALSE);
5190 + r = GET_BITS(s);
5191 + s = HUFF_EXTEND(r, s);
5192 + }
5193 +
5194 + if (entropy->dc_needed[blkn]) {
5195 + /* Convert DC difference to actual value, update last_dc_val */
5196 + int ci = cinfo->MCU_membership[blkn];
5197 + s += state.last_dc_val[ci];
5198 + state.last_dc_val[ci] = s;
5199 + if (block) {
5200 + /* Output the DC coefficient (assumes jpeg_natural_order[0] = 0) */
5201 + (*block)[0] = (JCOEF) s;
5202 + }
5203 + }
5204 +
5205 + if (entropy->ac_needed[blkn] && block) {
5206 +
5207 + /* Section F.2.2.2: decode the AC coefficients */
5208 + /* Since zeroes are skipped, output area must be cleared beforehand */
5209 + for (k = 1; k < DCTSIZE2; k++) {
5210 + HUFF_DECODE(s, br_state, actbl, return FALSE, label2);
5211 +
5212 + r = s >> 4;
5213 + s &= 15;
5214 +
5215 + if (s) {
5216 + k += r;
5217 + CHECK_BIT_BUFFER(br_state, s, return FALSE);
5218 + r = GET_BITS(s);
5219 + s = HUFF_EXTEND(r, s);
5220 + /* Output coefficient in natural (dezigzagged) order.
5221 + * Note: the extra entries in jpeg_natural_order[] will save us
5222 + * if k >= DCTSIZE2, which could happen if the data is corrupted.
5223 + */
5224 + (*block)[jpeg_natural_order[k]] = (JCOEF) s;
5225 + } else {
5226 + if (r != 15)
5227 + break;
5228 + k += 15;
5229 + }
5230 + }
5231 +
5232 + } else {
5233 +
5234 + /* Section F.2.2.2: decode the AC coefficients */
5235 + /* In this path we just discard the values */
5236 + for (k = 1; k < DCTSIZE2; k++) {
5237 + HUFF_DECODE(s, br_state, actbl, return FALSE, label3);
5238 +
5239 + r = s >> 4;
5240 + s &= 15;
5241 +
5242 + if (s) {
5243 + k += r;
5244 + CHECK_BIT_BUFFER(br_state, s, return FALSE);
5245 + DROP_BITS(s);
5246 + } else {
5247 + if (r != 15)
5248 + break;
5249 + k += 15;
5250 + }
5251 + }
5252 + }
5253 + }
5254 +
5255 + /* Completed MCU, so update state */
5256 + BITREAD_SAVE_STATE(cinfo,entropy->bitstate);
5257 + ASSIGN_STATE(entropy->saved, state);
5258 + return TRUE;
5259 +}
5260 +
5261 +
5262 +LOCAL(boolean)
5263 +decode_mcu_fast (j_decompress_ptr cinfo, JBLOCKROW *MCU_data)
5264 +{
5265 + huff_entropy_ptr entropy = (huff_entropy_ptr) cinfo->entropy;
5266 + BITREAD_STATE_VARS;
5267 + JOCTET *buffer;
5268 + int blkn;
5269 + savable_state state;
5270 + /* Outer loop handles each block in the MCU */
5271 +
5272 + /* Load up working state */
5273 + BITREAD_LOAD_STATE(cinfo,entropy->bitstate);
5274 + buffer = (JOCTET *) br_state.next_input_byte;
5275 + ASSIGN_STATE(state, entropy->saved);
5276 +
5277 + for (blkn = 0; blkn < cinfo->blocks_in_MCU; blkn++) {
5278 + JBLOCKROW block = MCU_data[blkn];
5279 + d_derived_tbl * dctbl = entropy->dc_cur_tbls[blkn];
5280 + d_derived_tbl * actbl = entropy->ac_cur_tbls[blkn];
5281 + register int s, k, r, l;
5282 +
5283 + HUFF_DECODE_FAST(s, l, dctbl, slow_decode_mcu);
5284 + if (s) {
5285 + FILL_BIT_BUFFER_FAST
5286 + r = GET_BITS(s);
5287 + s = HUFF_EXTEND(r, s);
5288 + }
5289 +
5290 + if (entropy->dc_needed[blkn]) {
5291 + int ci = cinfo->MCU_membership[blkn];
5292 + s += state.last_dc_val[ci];
5293 + state.last_dc_val[ci] = s;
5294 + if (block)
5295 + (*block)[0] = (JCOEF) s;
5296 + }
5297 +
5298 + if (entropy->ac_needed[blkn] && block) {
5299 +
5300 + for (k = 1; k < DCTSIZE2; k++) {
5301 + HUFF_DECODE_FAST(s, l, actbl, slow_decode_mcu);
5302 + r = s >> 4;
5303 + s &= 15;
5304 +
5305 + if (s) {
5306 + k += r;
5307 + FILL_BIT_BUFFER_FAST
5308 + r = GET_BITS(s);
5309 + s = HUFF_EXTEND(r, s);
5310 + (*block)[jpeg_natural_order[k]] = (JCOEF) s;
5311 + } else {
5312 + if (r != 15) break;
5313 + k += 15;
5314 + }
5315 + }
5316 +
5317 + } else {
5318 +
5319 + for (k = 1; k < DCTSIZE2; k++) {
5320 + HUFF_DECODE_FAST(s, l, actbl, slow_decode_mcu);
5321 + r = s >> 4;
5322 + s &= 15;
5323 +
5324 + if (s) {
5325 + k += r;
5326 + FILL_BIT_BUFFER_FAST
5327 + DROP_BITS(s);
5328 + } else {
5329 + if (r != 15) break;
5330 + k += 15;
5331 + }
5332 + }
5333 + }
5334 + }
5335 +
5336 + if (cinfo->unread_marker != 0) {
5337 +slow_decode_mcu:
5338 + cinfo->unread_marker = 0;
5339 + return FALSE;
5340 + }
5341 +
5342 + br_state.bytes_in_buffer -= (buffer - br_state.next_input_byte);
5343 + br_state.next_input_byte = buffer;
5344 + BITREAD_SAVE_STATE(cinfo,entropy->bitstate);
5345 + ASSIGN_STATE(entropy->saved, state);
5346 + return TRUE;
5347 +}
5348 +
5349 +
5350 /*
5351 * Decode and return one MCU's worth of Huffman-compressed coefficients.
5352 * The coefficients are reordered from zigzag order into natural array order,
5353 @@ -513,13 +747,13 @@
5354 * this module, since we'll just re-assign them on the next call.)
5355 */
5356
5357 +#define BUFSIZE (DCTSIZE2 * 2u)
5358 +
5359 METHODDEF(boolean)
5360 decode_mcu (j_decompress_ptr cinfo, JBLOCKROW *MCU_data)
5361 {
5362 huff_entropy_ptr entropy = (huff_entropy_ptr) cinfo->entropy;
5363 - int blkn;
5364 - BITREAD_STATE_VARS;
5365 - savable_state state;
5366 + int usefast = 1;
5367
5368 /* Process restart marker if needed; may have to suspend */
5369 if (cinfo->restart_interval) {
5370 @@ -526,98 +760,26 @@
5371 if (entropy->restarts_to_go == 0)
5372 if (! process_restart(cinfo))
5373 return FALSE;
5374 + usefast = 0;
5375 }
5376
5377 + if (cinfo->src->bytes_in_buffer < BUFSIZE * (size_t)cinfo->blocks_in_MCU
5378 + || cinfo->unread_marker != 0)
5379 + usefast = 0;
5380 +
5381 /* If we've run out of data, just leave the MCU set to zeroes.
5382 * This way, we return uniform gray for the remainder of the segment.
5383 */
5384 if (! entropy->pub.insufficient_data) {
5385
5386 - /* Load up working state */
5387 - BITREAD_LOAD_STATE(cinfo,entropy->bitstate);
5388 - ASSIGN_STATE(state, entropy->saved);
5389 -
5390 - /* Outer loop handles each block in the MCU */
5391 -
5392 - for (blkn = 0; blkn < cinfo->blocks_in_MCU; blkn++) {
5393 - JBLOCKROW block = MCU_data[blkn];
5394 - d_derived_tbl * dctbl = entropy->dc_cur_tbls[blkn];
5395 - d_derived_tbl * actbl = entropy->ac_cur_tbls[blkn];
5396 - register int s, k, r;
5397 -
5398 - /* Decode a single block's worth of coefficients */
5399 -
5400 - /* Section F.2.2.1: decode the DC coefficient difference */
5401 - HUFF_DECODE(s, br_state, dctbl, return FALSE, label1);
5402 - if (s) {
5403 - CHECK_BIT_BUFFER(br_state, s, return FALSE);
5404 - r = GET_BITS(s);
5405 - s = HUFF_EXTEND(r, s);
5406 - }
5407 -
5408 - if (entropy->dc_needed[blkn]) {
5409 - /* Convert DC difference to actual value, update last_dc_val */
5410 - int ci = cinfo->MCU_membership[blkn];
5411 - s += state.last_dc_val[ci];
5412 - state.last_dc_val[ci] = s;
5413 - /* Output the DC coefficient (assumes jpeg_natural_order[0] = 0) */
5414 - (*block)[0] = (JCOEF) s;
5415 - }
5416 -
5417 - if (entropy->ac_needed[blkn]) {
5418 -
5419 - /* Section F.2.2.2: decode the AC coefficients */
5420 - /* Since zeroes are skipped, output area must be cleared beforehand */
5421 - for (k = 1; k < DCTSIZE2; k++) {
5422 - HUFF_DECODE(s, br_state, actbl, return FALSE, label2);
5423 -
5424 - r = s >> 4;
5425 - s &= 15;
5426 -
5427 - if (s) {
5428 - k += r;
5429 - CHECK_BIT_BUFFER(br_state, s, return FALSE);
5430 - r = GET_BITS(s);
5431 - s = HUFF_EXTEND(r, s);
5432 - /* Output coefficient in natural (dezigzagged) order.
5433 - * Note: the extra entries in jpeg_natural_order[] will save us
5434 - * if k >= DCTSIZE2, which could happen if the data is corrupted.
5435 - */
5436 - (*block)[jpeg_natural_order[k]] = (JCOEF) s;
5437 - } else {
5438 - if (r != 15)
5439 - break;
5440 - k += 15;
5441 - }
5442 - }
5443 -
5444 - } else {
5445 -
5446 - /* Section F.2.2.2: decode the AC coefficients */
5447 - /* In this path we just discard the values */
5448 - for (k = 1; k < DCTSIZE2; k++) {
5449 - HUFF_DECODE(s, br_state, actbl, return FALSE, label3);
5450 -
5451 - r = s >> 4;
5452 - s &= 15;
5453 -
5454 - if (s) {
5455 - k += r;
5456 - CHECK_BIT_BUFFER(br_state, s, return FALSE);
5457 - DROP_BITS(s);
5458 - } else {
5459 - if (r != 15)
5460 - break;
5461 - k += 15;
5462 - }
5463 - }
5464 -
5465 - }
5466 + if (usefast) {
5467 + if (!decode_mcu_fast(cinfo, MCU_data)) goto use_slow;
5468 }
5469 + else {
5470 + use_slow:
5471 + if (!decode_mcu_slow(cinfo, MCU_data)) return FALSE;
5472 + }
5473
5474 - /* Completed MCU, so update state */
5475 - BITREAD_SAVE_STATE(cinfo,entropy->bitstate);
5476 - ASSIGN_STATE(entropy->saved, state);
5477 }
5478
5479 /* Account for restart interval (no-op if not using restarts) */
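
Note on the HUFF_EXTEND change in jdhuff.c above: the new macro replaces the conditional with a sign mask. The following is a minimal sketch, not code from the patch, that checks the two forms agree. The function names are hypothetical, and the sketch assumes a 32-bit int with arithmetic right shift of negative values, which the `>> 31` in the patch also appears to rely on. It writes the offset as -(1 << s) + 1 instead of ((-1) << s) + 1 to avoid left-shifting a negative value.

```c
#include <assert.h>

/* Branching form: values below 2^(s-1) map to the negative half. */
static int huff_extend_branch(int x, int s)
{
  return x < (1 << (s - 1)) ? x - (1 << s) + 1 : x;
}

/* Branchless form: (x - 2^(s-1)) >> 31 is all ones when x lies in the
 * negative half and zero otherwise, so it selects the offset.
 * Assumes 32-bit int and arithmetic right shift (hypothetical helper). */
static int huff_extend_branchless(int x, int s)
{
  int offset = -(1 << s) + 1;            /* same value as ((-1) << s) + 1 */
  int mask = (x - (1 << (s - 1))) >> 31;
  return x + (mask & offset);
}

/* Returns the number of (x, s) pairs on which the two forms disagree. */
static int count_mismatches(void)
{
  int s, x, bad = 0;
  for (s = 1; s <= 15; s++)
    for (x = 0; x < (1 << s); x++)
      if (huff_extend_branch(x, s) != huff_extend_branchless(x, s))
        bad++;
  return bad;
}

/* Convenience self-check for callers that want a single entry point. */
static void self_check(void)
{
  assert(count_mismatches() == 0);
}
```

For example, with s = 3 the raw value 2 falls in the negative half and extends to -5, while 5 is returned unchanged.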
5480 Index: jdhuff.h
5481 ===================================================================
5482 --- jdhuff.h (revision 829)
5483 +++ jdhuff.h (working copy)
5484 @@ -1,8 +1,10 @@
5485 /*
5486 * jdhuff.h
5487 *
5488 + * This file was part of the Independent JPEG Group's software:
5489 * Copyright (C) 1991-1997, Thomas G. Lane.
5490 - * This file is part of the Independent JPEG Group's software.
5491 + * Modifications:
5492 + * Copyright (C) 2010-2011, D. R. Commander.
5493 * For conditions of distribution and use, see the accompanying README file.
5494 *
5495 * This file contains declarations for Huffman entropy decoding routines
5496 @@ -27,7 +29,7 @@
5497 /* Basic tables: (element [0] of each array is unused) */
5498 INT32 maxcode[18]; /* largest code of length k (-1 if none) */
5499 /* (maxcode[17] is a sentinel to ensure jpeg_huff_decode terminates) */
5500 - INT32 valoffset[17]; /* huffval[] offset for codes of length k */
5501 + INT32 valoffset[18]; /* huffval[] offset for codes of length k */
5502 /* valoffset[k] = huffval[] index of 1st symbol of code length k, less
5503 * the smallest code of length k; so given a code of length k, the
5504 * corresponding symbol is huffval[code + valoffset[k]]
5505 @@ -36,13 +38,17 @@
5506 /* Link to public Huffman table (needed only in jpeg_huff_decode) */
5507 JHUFF_TBL *pub;
5508
5509 - /* Lookahead tables: indexed by the next HUFF_LOOKAHEAD bits of
5510 + /* Lookahead table: indexed by the next HUFF_LOOKAHEAD bits of
5511 * the input data stream. If the next Huffman code is no more
5512 * than HUFF_LOOKAHEAD bits long, we can obtain its length and
5513 - * the corresponding symbol directly from these tables.
5514 + * the corresponding symbol directly from this tables.
5515 + *
5516 + * The lower 8 bits of each table entry contain the number of
5517 + * bits in the corresponding Huffman code, or HUFF_LOOKAHEAD + 1
5518 + * if too long. The next 8 bits of each entry contain the
5519 + * symbol.
5520 */
5521 - int look_nbits[1<<HUFF_LOOKAHEAD]; /* # bits, or 0 if too long */
5522 - UINT8 look_sym[1<<HUFF_LOOKAHEAD]; /* symbol, or unused */
5523 + int lookup[1<<HUFF_LOOKAHEAD];
5524 } d_derived_tbl;
5525
5526 /* Expand a Huffman table definition into the derived format */
5527 @@ -69,9 +75,18 @@
5528 * necessary.
5529 */
5530
5531 +#if __WORDSIZE == 64 || defined(_WIN64)
5532 +
5533 +typedef size_t bit_buf_type; /* type of bit-extraction buffer */
5534 +#define BIT_BUF_SIZE 64 /* size of buffer in bits */
5535 +
5536 +#else
5537 +
5538 typedef INT32 bit_buf_type; /* type of bit-extraction buffer */
5539 -#define BIT_BUF_SIZE 32 /* size of buffer in bits */
5540 +#define BIT_BUF_SIZE 32 /* size of buffer in bits */
5541
5542 +#endif
5543 +
5544 /* If long is > 32 bits on your machine, and shifting/masking longs is
5545 * reasonably fast, making bit_buf_type be long and setting BIT_BUF_SIZE
5546 * appropriately should be a win. Unfortunately we can't define the size
5547 @@ -183,11 +198,10 @@
5548 } \
5549 } \
5550 look = PEEK_BITS(HUFF_LOOKAHEAD); \
5551 - if ((nb = htbl->look_nbits[look]) != 0) { \
5552 + if ((nb = (htbl->lookup[look] >> HUFF_LOOKAHEAD)) <= HUFF_LOOKAHEAD) { \
5553 DROP_BITS(nb); \
5554 - result = htbl->look_sym[look]; \
5555 + result = htbl->lookup[look] & ((1 << HUFF_LOOKAHEAD) - 1); \
5556 } else { \
5557 - nb = HUFF_LOOKAHEAD+1; \
5558 slowlabel: \
5559 if ((result=jpeg_huff_decode(&state,get_buffer,bits_left,htbl,nb)) < 0) \
5560 { failaction; } \
5561 @@ -195,6 +209,28 @@
5562 } \
5563 }
5564
5565 +#define HUFF_DECODE_FAST(s,nb,htbl,slowlabel) \
5566 + FILL_BIT_BUFFER_FAST; \
5567 + s = PEEK_BITS(HUFF_LOOKAHEAD); \
5568 + s = htbl->lookup[s]; \
5569 + nb = s >> HUFF_LOOKAHEAD; \
5570 + /* Pre-execute the common case of nb <= HUFF_LOOKAHEAD */ \
5571 + DROP_BITS(nb); \
5572 + s = s & ((1 << HUFF_LOOKAHEAD) - 1); \
5573 + if (nb > HUFF_LOOKAHEAD) { \
5574 + /* Equivalent of jpeg_huff_decode() */ \
5575 + /* Don't use GET_BITS() here because we don't want to modify bits_left */ \
5576 + s = (get_buffer >> bits_left) & ((1 << (nb)) - 1); \
5577 + while (s > htbl->maxcode[nb]) { \
5578 + s <<= 1; \
5579 + s |= GET_BITS(1); \
5580 + nb++; \
5581 + } \
5582 + if (nb > 16) \
5583 + goto slowlabel; \
5584 + s = htbl->pub->huffval[ (int) (s + htbl->valoffset[nb]) ]; \
5585 + }
5586 +
5587 /* Out-of-line case for Huffman code fetching */
5588 EXTERN(int) jpeg_huff_decode
5589 JPP((bitread_working_state * state, register bit_buf_type get_buffer,
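
Note on the merged lookup table in jdhuff.h above: the single `lookup[]` array replaces `look_nbits[]` and `look_sym[]`. The following is a minimal sketch, not code from the patch, of how one entry is packed and unpacked. It follows the shift and mask used in HUFF_DECODE and in jdhuff.c, where the code length sits above bit HUFF_LOOKAHEAD and the symbol occupies the low HUFF_LOOKAHEAD bits; the header comment appears to describe the two fields the other way round. The helper names and the toy 3-bit code are assumptions made for illustration.

```c
#include <assert.h>

#define HUFF_LOOKAHEAD 8

static int lookup[1 << HUFF_LOOKAHEAD];

/* Mark every entry as "code longer than HUFF_LOOKAHEAD bits". */
static void init_lookup(void)
{
  int i;
  for (i = 0; i < (1 << HUFF_LOOKAHEAD); i++)
    lookup[i] = (HUFF_LOOKAHEAD + 1) << HUFF_LOOKAHEAD;
}

/* Fill all entries whose top `len` bits equal `code` (hypothetical helper). */
static void add_code(int code, int len, int symbol)
{
  int lookbits = code << (HUFF_LOOKAHEAD - len);  /* left-justify the code */
  int ctr;
  for (ctr = 1 << (HUFF_LOOKAHEAD - len); ctr > 0; ctr--)
    lookup[lookbits++] = (len << HUFF_LOOKAHEAD) | symbol;
}

/* Code length stored in the bits above HUFF_LOOKAHEAD. */
static int entry_nbits(int look)
{
  return lookup[look] >> HUFF_LOOKAHEAD;
}

/* Symbol stored in the low HUFF_LOOKAHEAD bits. */
static int entry_symbol(int look)
{
  return lookup[look] & ((1 << HUFF_LOOKAHEAD) - 1);
}

/* Build a table holding one toy code: binary 100 (3 bits) -> symbol 0x11. */
static void build_demo_table(void)
{
  init_lookup();
  add_code(0x4, 3, 0x11);
}
```

Because unfilled entries carry a length of HUFF_LOOKAHEAD + 1, a single comparison against HUFF_LOOKAHEAD distinguishes a table hit from the slow path, which is why the separate `nb = HUFF_LOOKAHEAD+1` assignment could be dropped from HUFF_DECODE.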
5590 Index: jdinput.c
5591 ===================================================================
5592 --- jdinput.c (revision 829)
5593 +++ jdinput.c (working copy)
5594 @@ -1,8 +1,10 @@
5595 /*
5596 * jdinput.c
5597 *
5598 + * This file was part of the Independent JPEG Group's software:
5599 * Copyright (C) 1991-1997, Thomas G. Lane.
5600 - * This file is part of the Independent JPEG Group's software.
5601 + * libjpeg-turbo Modifications:
5602 + * Copyright (C) 2010, D. R. Commander.
5603 * For conditions of distribution and use, see the accompanying README file.
5604 *
5605 * This file contains input control logic for the JPEG decompressor.
5606 @@ -14,6 +16,7 @@
5607 #define JPEG_INTERNALS
5608 #include "jinclude.h"
5609 #include "jpeglib.h"
5610 +#include "jpegcomp.h"
5611
5612
5613 /* Private state */
5614 @@ -70,16 +73,30 @@
5615 compptr->v_samp_factor);
5616 }
5617
5618 +#if JPEG_LIB_VERSION >=80
5619 + cinfo->block_size = DCTSIZE;
5620 + cinfo->natural_order = jpeg_natural_order;
5621 + cinfo->lim_Se = DCTSIZE2-1;
5622 +#endif
5623 +
5624 /* We initialize DCT_scaled_size and min_DCT_scaled_size to DCTSIZE.
5625 * In the full decompressor, this will be overridden by jdmaster.c;
5626 * but in the transcoder, jdmaster.c is not used, so we must do it here.
5627 */
5628 +#if JPEG_LIB_VERSION >= 70
5629 + cinfo->min_DCT_h_scaled_size = cinfo->min_DCT_v_scaled_size = DCTSIZE;
5630 +#else
5631 cinfo->min_DCT_scaled_size = DCTSIZE;
5632 +#endif
5633
5634 /* Compute dimensions of components */
5635 for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components;
5636 ci++, compptr++) {
5637 +#if JPEG_LIB_VERSION >= 70
5638 + compptr->DCT_h_scaled_size = compptr->DCT_v_scaled_size = DCTSIZE;
5639 +#else
5640 compptr->DCT_scaled_size = DCTSIZE;
5641 +#endif
5642 /* Size in DCT blocks */
5643 compptr->width_in_blocks = (JDIMENSION)
5644 jdiv_round_up((long) cinfo->image_width * (long) compptr->h_samp_factor,
5645 @@ -138,7 +155,7 @@
5646 compptr->MCU_width = 1;
5647 compptr->MCU_height = 1;
5648 compptr->MCU_blocks = 1;
5649 - compptr->MCU_sample_width = compptr->DCT_scaled_size;
5650 + compptr->MCU_sample_width = compptr->_DCT_scaled_size;
5651 compptr->last_col_width = 1;
5652 /* For noninterleaved scans, it is convenient to define last_row_height
5653 * as the number of block rows present in the last iMCU row.
5654 @@ -174,7 +191,7 @@
5655 compptr->MCU_width = compptr->h_samp_factor;
5656 compptr->MCU_height = compptr->v_samp_factor;
5657 compptr->MCU_blocks = compptr->MCU_width * compptr->MCU_height;
5658 - compptr->MCU_sample_width = compptr->MCU_width * compptr->DCT_scaled_size;
5659 + compptr->MCU_sample_width = compptr->MCU_width * compptr->_DCT_scaled_size;
5660 /* Figure number of non-dummy blocks in last MCU column & row */
5661 tmp = (int) (compptr->width_in_blocks % compptr->MCU_width);
5662 if (tmp == 0) tmp = compptr->MCU_width;
5663 Index: jdmainct.c
5664 ===================================================================
5665 --- jdmainct.c (revision 829)
5666 +++ jdmainct.c (working copy)
5667 @@ -1,8 +1,10 @@
5668 /*
5669 * jdmainct.c
5670 *
5671 + * This file was part of the Independent JPEG Group's software:
5672 * Copyright (C) 1994-1996, Thomas G. Lane.
5673 - * This file is part of the Independent JPEG Group's software.
5674 + * libjpeg-turbo Modifications:
5675 + * Copyright (C) 2010, D. R. Commander.
5676 * For conditions of distribution and use, see the accompanying README file.
5677 *
5678 * This file contains the main buffer controller for decompression.
5679 @@ -13,9 +15,7 @@
5680 * supplies the equivalent of the main buffer in that case.
5681 */
5682
5683 -#define JPEG_INTERNALS
5684 -#include "jinclude.h"
5685 -#include "jpeglib.h"
5686 +#include "jdmainct.h"
5687
5688
5689 /*
5690 @@ -109,36 +109,6 @@
5691 */
5692
5693
5694 -/* Private buffer controller object */
5695 -
5696 -typedef struct {
5697 - struct jpeg_d_main_controller pub; /* public fields */
5698 -
5699 - /* Pointer to allocated workspace (M or M+2 row groups). */
5700 - JSAMPARRAY buffer[MAX_COMPONENTS];
5701 -
5702 - boolean buffer_full; /* Have we gotten an iMCU row from decoder? */
5703 - JDIMENSION rowgroup_ctr; /* counts row groups output to postprocessor */
5704 -
5705 - /* Remaining fields are only used in the context case. */
5706 -
5707 - /* These are the master pointers to the funny-order pointer lists. */
5708 - JSAMPIMAGE xbuffer[2]; /* pointers to weird pointer lists */
5709 -
5710 - int whichptr; /* indicates which pointer set is now in use */
5711 - int context_state; /* process_data state machine status */
5712 - JDIMENSION rowgroups_avail; /* row groups available to postprocessor */
5713 - JDIMENSION iMCU_row_ctr; /* counts iMCU rows to detect image top/bot */
5714 -} my_main_controller;
5715 -
5716 -typedef my_main_controller * my_main_ptr;
5717 -
5718 -/* context_state values: */
5719 -#define CTX_PREPARE_FOR_IMCU 0 /* need to prepare for MCU row */
5720 -#define CTX_PROCESS_IMCU 1 /* feeding iMCU to postprocessor */
5721 -#define CTX_POSTPONED_ROW 2 /* feeding postponed row group */
5722 -
5723 -
5724 /* Forward declarations */
5725 METHODDEF(void) process_data_simple_main
5726 JPP((j_decompress_ptr cinfo, JSAMPARRAY output_buf,
5727 @@ -159,9 +129,9 @@
5728 * This is done only once, not once per pass.
5729 */
5730 {
5731 - my_main_ptr main = (my_main_ptr) cinfo->main;
5732 + my_main_ptr main_ptr = (my_main_ptr) cinfo->main;
5733 int ci, rgroup;
5734 - int M = cinfo->min_DCT_scaled_size;
5735 + int M = cinfo->_min_DCT_scaled_size;
5736 jpeg_component_info *compptr;
5737 JSAMPARRAY xbuf;
5738
5739 @@ -168,15 +138,15 @@
5740 /* Get top-level space for component array pointers.
5741 * We alloc both arrays with one call to save a few cycles.
5742 */
5743 - main->xbuffer[0] = (JSAMPIMAGE)
5744 + main_ptr->xbuffer[0] = (JSAMPIMAGE)
5745 (*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_IMAGE,
5746 cinfo->num_components * 2 * SIZEOF(JSAMPARRAY));
5747 - main->xbuffer[1] = main->xbuffer[0] + cinfo->num_components;
5748 + main_ptr->xbuffer[1] = main_ptr->xbuffer[0] + cinfo->num_components;
5749
5750 for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components;
5751 ci++, compptr++) {
5752 - rgroup = (compptr->v_samp_factor * compptr->DCT_scaled_size) /
5753 - cinfo->min_DCT_scaled_size; /* height of a row group of component */
5754 + rgroup = (compptr->v_samp_factor * compptr->_DCT_scaled_size) /
5755 + cinfo->_min_DCT_scaled_size; /* height of a row group of component */
5756 /* Get space for pointer lists --- M+4 row groups in each list.
5757 * We alloc both pointer lists with one call to save a few cycles.
5758 */
5759 @@ -184,9 +154,9 @@
5760 (*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_IMAGE,
5761 2 * (rgroup * (M + 4)) * SIZEOF(JSAMPROW));
5762 xbuf += rgroup; /* want one row group at negative offsets */
5763 - main->xbuffer[0][ci] = xbuf;
5764 + main_ptr->xbuffer[0][ci] = xbuf;
5765 xbuf += rgroup * (M + 4);
5766 - main->xbuffer[1][ci] = xbuf;
5767 + main_ptr->xbuffer[1][ci] = xbuf;
5768 }
5769 }
5770
5771 @@ -194,26 +164,26 @@
5772 LOCAL(void)
5773 make_funny_pointers (j_decompress_ptr cinfo)
5774 /* Create the funny pointer lists discussed in the comments above.
5775 - * The actual workspace is already allocated (in main->buffer),
5776 + * The actual workspace is already allocated (in main_ptr->buffer),
5777 * and the space for the pointer lists is allocated too.
5778 * This routine just fills in the curiously ordered lists.
5779 * This will be repeated at the beginning of each pass.
5780 */
5781 {
5782 - my_main_ptr main = (my_main_ptr) cinfo->main;
5783 + my_main_ptr main_ptr = (my_main_ptr) cinfo->main;
5784 int ci, i, rgroup;
5785 - int M = cinfo->min_DCT_scaled_size;
5786 + int M = cinfo->_min_DCT_scaled_size;
5787 jpeg_component_info *compptr;
5788 JSAMPARRAY buf, xbuf0, xbuf1;
5789
5790 for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components;
5791 ci++, compptr++) {
5792 - rgroup = (compptr->v_samp_factor * compptr->DCT_scaled_size) /
5793 - cinfo->min_DCT_scaled_size; /* height of a row group of component */
5794 - xbuf0 = main->xbuffer[0][ci];
5795 - xbuf1 = main->xbuffer[1][ci];
5796 + rgroup = (compptr->v_samp_factor * compptr->_DCT_scaled_size) /
5797 + cinfo->_min_DCT_scaled_size; /* height of a row group of component */
5798 + xbuf0 = main_ptr->xbuffer[0][ci];
5799 + xbuf1 = main_ptr->xbuffer[1][ci];
5800 /* First copy the workspace pointers as-is */
5801 - buf = main->buffer[ci];
5802 + buf = main_ptr->buffer[ci];
5803 for (i = 0; i < rgroup * (M + 2); i++) {
5804 xbuf0[i] = xbuf1[i] = buf[i];
5805 }
5806 @@ -235,34 +205,6 @@
5807
5808
5809 LOCAL(void)
5810 -set_wraparound_pointers (j_decompress_ptr cinfo)
5811 -/* Set up the "wraparound" pointers at top and bottom of the pointer lists.
5812 - * This changes the pointer list state from top-of-image to the normal state.
5813 - */
5814 -{
5815 - my_main_ptr main = (my_main_ptr) cinfo->main;
5816 - int ci, i, rgroup;
5817 - int M = cinfo->min_DCT_scaled_size;
5818 - jpeg_component_info *compptr;
5819 - JSAMPARRAY xbuf0, xbuf1;
5820 -
5821 - for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components;
5822 - ci++, compptr++) {
5823 - rgroup = (compptr->v_samp_factor * compptr->DCT_scaled_size) /
5824 - cinfo->min_DCT_scaled_size; /* height of a row group of component */
5825 - xbuf0 = main->xbuffer[0][ci];
5826 - xbuf1 = main->xbuffer[1][ci];
5827 - for (i = 0; i < rgroup; i++) {
5828 - xbuf0[i - rgroup] = xbuf0[rgroup*(M+1) + i];
5829 - xbuf1[i - rgroup] = xbuf1[rgroup*(M+1) + i];
5830 - xbuf0[rgroup*(M+2) + i] = xbuf0[i];
5831 - xbuf1[rgroup*(M+2) + i] = xbuf1[i];
5832 - }
5833 - }
5834 -}
5835 -
5836 -
5837 -LOCAL(void)
5838 set_bottom_pointers (j_decompress_ptr cinfo)
5839 /* Change the pointer lists to duplicate the last sample row at the bottom
5840 * of the image. whichptr indicates which xbuffer holds the final iMCU row.
5841 @@ -269,7 +211,7 @@
5842 * Also sets rowgroups_avail to indicate number of nondummy row groups in row.
5843 */
5844 {
5845 - my_main_ptr main = (my_main_ptr) cinfo->main;
5846 + my_main_ptr main_ptr = (my_main_ptr) cinfo->main;
5847 int ci, i, rgroup, iMCUheight, rows_left;
5848 jpeg_component_info *compptr;
5849 JSAMPARRAY xbuf;
5850 @@ -277,8 +219,8 @@
5851 for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components;
5852 ci++, compptr++) {
5853 /* Count sample rows in one iMCU row and in one row group */
5854 - iMCUheight = compptr->v_samp_factor * compptr->DCT_scaled_size;
5855 - rgroup = iMCUheight / cinfo->min_DCT_scaled_size;
5856 + iMCUheight = compptr->v_samp_factor * compptr->_DCT_scaled_size;
5857 + rgroup = iMCUheight / cinfo->_min_DCT_scaled_size;
5858 /* Count nondummy sample rows remaining for this component */
5859 rows_left = (int) (compptr->downsampled_height % (JDIMENSION) iMCUheight);
5860 if (rows_left == 0) rows_left = iMCUheight;
5861 @@ -286,12 +228,12 @@
5862 * so we need only do it once.
5863 */
5864 if (ci == 0) {
5865 - main->rowgroups_avail = (JDIMENSION) ((rows_left-1) / rgroup + 1);
5866 + main_ptr->rowgroups_avail = (JDIMENSION) ((rows_left-1) / rgroup + 1);
5867 }
5868 /* Duplicate the last real sample row rgroup*2 times; this pads out the
5869 * last partial rowgroup and ensures at least one full rowgroup of context.
5870 */
5871 - xbuf = main->xbuffer[main->whichptr][ci];
5872 + xbuf = main_ptr->xbuffer[main_ptr->whichptr][ci];
5873 for (i = 0; i < rgroup * 2; i++) {
5874 xbuf[rows_left + i] = xbuf[rows_left-1];
5875 }
5876 @@ -306,27 +248,27 @@
5877 METHODDEF(void)
5878 start_pass_main (j_decompress_ptr cinfo, J_BUF_MODE pass_mode)
5879 {
5880 - my_main_ptr main = (my_main_ptr) cinfo->main;
5881 + my_main_ptr main_ptr = (my_main_ptr) cinfo->main;
5882
5883 switch (pass_mode) {
5884 case JBUF_PASS_THRU:
5885 if (cinfo->upsample->need_context_rows) {
5886 - main->pub.process_data = process_data_context_main;
5887 + main_ptr->pub.process_data = process_data_context_main;
5888 make_funny_pointers(cinfo); /* Create the xbuffer[] lists */
5889 - main->whichptr = 0; /* Read first iMCU row into xbuffer[0] */
5890 - main->context_state = CTX_PREPARE_FOR_IMCU;
5891 - main->iMCU_row_ctr = 0;
5892 + main_ptr->whichptr = 0; /* Read first iMCU row into xbuffer[0] */
5893 + main_ptr->context_state = CTX_PREPARE_FOR_IMCU;
5894 + main_ptr->iMCU_row_ctr = 0;
5895 } else {
5896 /* Simple case with no context needed */
5897 - main->pub.process_data = process_data_simple_main;
5898 + main_ptr->pub.process_data = process_data_simple_main;
5899 }
5900 - main->buffer_full = FALSE; /* Mark buffer empty */
5901 - main->rowgroup_ctr = 0;
5902 + main_ptr->buffer_full = FALSE; /* Mark buffer empty */
5903 + main_ptr->rowgroup_ctr = 0;
5904 break;
5905 #ifdef QUANT_2PASS_SUPPORTED
5906 case JBUF_CRANK_DEST:
5907 /* For last pass of 2-pass quantization, just crank the postprocessor */
5908 - main->pub.process_data = process_data_crank_post;
5909 + main_ptr->pub.process_data = process_data_crank_post;
5910 break;
5911 #endif
5912 default:
5913 @@ -346,18 +288,18 @@
5914 JSAMPARRAY output_buf, JDIMENSION *out_row_ctr,
5915 JDIMENSION out_rows_avail)
5916 {
5917 - my_main_ptr main = (my_main_ptr) cinfo->main;
5918 + my_main_ptr main_ptr = (my_main_ptr) cinfo->main;
5919 JDIMENSION rowgroups_avail;
5920
5921 /* Read input data if we haven't filled the main buffer yet */
5922 - if (! main->buffer_full) {
5923 - if (! (*cinfo->coef->decompress_data) (cinfo, main->buffer))
5924 + if (! main_ptr->buffer_full) {
5925 + if (! (*cinfo->coef->decompress_data) (cinfo, main_ptr->buffer))
5926 return; /* suspension forced, can do nothing more */
5927 - main->buffer_full = TRUE; /* OK, we have an iMCU row to work with */
5928 + main_ptr->buffer_full = TRUE; /* OK, we have an iMCU row to work with */
5929 }
5930
5931 /* There are always min_DCT_scaled_size row groups in an iMCU row. */
5932 - rowgroups_avail = (JDIMENSION) cinfo->min_DCT_scaled_size;
5933 + rowgroups_avail = (JDIMENSION) cinfo->_min_DCT_scaled_size;
5934 /* Note: at the bottom of the image, we may pass extra garbage row groups
5935 * to the postprocessor. The postprocessor has to check for bottom
5936 * of image anyway (at row resolution), so no point in us doing it too.
5937 @@ -364,14 +306,14 @@
5938 */
5939
5940 /* Feed the postprocessor */
5941 - (*cinfo->post->post_process_data) (cinfo, main->buffer,
5942 - &main->rowgroup_ctr, rowgroups_avail,
5943 + (*cinfo->post->post_process_data) (cinfo, main_ptr->buffer,
5944 + &main_ptr->rowgroup_ctr, rowgroups_avail,
5945 output_buf, out_row_ctr, out_rows_avail);
5946
5947 /* Has postprocessor consumed all the data yet? If so, mark buffer empty */
5948 - if (main->rowgroup_ctr >= rowgroups_avail) {
5949 - main->buffer_full = FALSE;
5950 - main->rowgroup_ctr = 0;
5951 + if (main_ptr->rowgroup_ctr >= rowgroups_avail) {
5952 + main_ptr->buffer_full = FALSE;
5953 + main_ptr->rowgroup_ctr = 0;
5954 }
5955 }
5956
5957 @@ -386,15 +328,15 @@
5958 JSAMPARRAY output_buf, JDIMENSION *out_row_ctr,
5959 JDIMENSION out_rows_avail)
5960 {
5961 - my_main_ptr main = (my_main_ptr) cinfo->main;
5962 + my_main_ptr main_ptr = (my_main_ptr) cinfo->main;
5963
5964 /* Read input data if we haven't filled the main buffer yet */
5965 - if (! main->buffer_full) {
5966 + if (! main_ptr->buffer_full) {
5967 if (! (*cinfo->coef->decompress_data) (cinfo,
5968 - main->xbuffer[main->whichptr]))
5969 + main_ptr->xbuffer[main_ptr->whichptr]))
5970 return; /* suspension forced, can do nothing more */
5971 - main->buffer_full = TRUE; /* OK, we have an iMCU row to work with */
5972 - main->iMCU_row_ctr++; /* count rows received */
5973 + main_ptr->buffer_full = TRUE; /* OK, we have an iMCU row to work with */
5974 + main_ptr->iMCU_row_ctr++; /* count rows received */
5975 }
5976
5977 /* Postprocessor typically will not swallow all the input data it is handed
5978 @@ -402,47 +344,47 @@
5979 * to exit and restart. This switch lets us keep track of how far we got.
5980 * Note that each case falls through to the next on successful completion.
5981 */
5982 - switch (main->context_state) {
5983 + switch (main_ptr->context_state) {
5984 case CTX_POSTPONED_ROW:
5985 /* Call postprocessor using previously set pointers for postponed row */
5986 - (*cinfo->post->post_process_data) (cinfo, main->xbuffer[main->whichptr],
5987 - &main->rowgroup_ctr, main->rowgroups_avail,
5988 + (*cinfo->post->post_process_data) (cinfo, main_ptr->xbuffer[main_ptr->whichptr],
5989 + &main_ptr->rowgroup_ctr, main_ptr->rowgroups_avail,
5990 output_buf, out_row_ctr, out_rows_avail);
5991 - if (main->rowgroup_ctr < main->rowgroups_avail)
5992 + if (main_ptr->rowgroup_ctr < main_ptr->rowgroups_avail)
5993 return; /* Need to suspend */
5994 - main->context_state = CTX_PREPARE_FOR_IMCU;
5995 + main_ptr->context_state = CTX_PREPARE_FOR_IMCU;
5996 if (*out_row_ctr >= out_rows_avail)
5997 return; /* Postprocessor exactly filled output buf */
5998 /*FALLTHROUGH*/
5999 case CTX_PREPARE_FOR_IMCU:
6000 /* Prepare to process first M-1 row groups of this iMCU row */
6001 - main->rowgroup_ctr = 0;
6002 - main->rowgroups_avail = (JDIMENSION) (cinfo->min_DCT_scaled_size - 1);
6003 + main_ptr->rowgroup_ctr = 0;
6004 + main_ptr->rowgroups_avail = (JDIMENSION) (cinfo->_min_DCT_scaled_size - 1);
6005 /* Check for bottom of image: if so, tweak pointers to "duplicate"
6006 * the last sample row, and adjust rowgroups_avail to ignore padding rows.
6007 */
6008 - if (main->iMCU_row_ctr == cinfo->total_iMCU_rows)
6009 + if (main_ptr->iMCU_row_ctr == cinfo->total_iMCU_rows)
6010 set_bottom_pointers(cinfo);
6011 - main->context_state = CTX_PROCESS_IMCU;
6012 + main_ptr->context_state = CTX_PROCESS_IMCU;
6013 /*FALLTHROUGH*/
6014 case CTX_PROCESS_IMCU:
6015 /* Call postprocessor using previously set pointers */
6016 - (*cinfo->post->post_process_data) (cinfo, main->xbuffer[main->whichptr],
6017 - &main->rowgroup_ctr, main->rowgroups_avail,
6018 + (*cinfo->post->post_process_data) (cinfo, main_ptr->xbuffer[main_ptr->whichptr],
6019 + &main_ptr->rowgroup_ctr, main_ptr->rowgroups_avail,
6020 output_buf, out_row_ctr, out_rows_avail);
6021 - if (main->rowgroup_ctr < main->rowgroups_avail)
6022 + if (main_ptr->rowgroup_ctr < main_ptr->rowgroups_avail)
6023 return; /* Need to suspend */
6024 /* After the first iMCU, change wraparound pointers to normal state */
6025 - if (main->iMCU_row_ctr == 1)
6026 + if (main_ptr->iMCU_row_ctr == 1)
6027 set_wraparound_pointers(cinfo);
6028 /* Prepare to load new iMCU row using other xbuffer list */
6029 - main->whichptr ^= 1; /* 0=>1 or 1=>0 */
6030 - main->buffer_full = FALSE;
6031 + main_ptr->whichptr ^= 1; /* 0=>1 or 1=>0 */
6032 + main_ptr->buffer_full = FALSE;
6033 /* Still need to process last row group of this iMCU row, */
6034 /* which is saved at index M+1 of the other xbuffer */
6035 - main->rowgroup_ctr = (JDIMENSION) (cinfo->min_DCT_scaled_size + 1);
6036 - main->rowgroups_avail = (JDIMENSION) (cinfo->min_DCT_scaled_size + 2);
6037 - main->context_state = CTX_POSTPONED_ROW;
6038 + main_ptr->rowgroup_ctr = (JDIMENSION) (cinfo->_min_DCT_scaled_size + 1);
6039 + main_ptr->rowgroups_avail = (JDIMENSION) (cinfo->_min_DCT_scaled_size + 2);
6040 + main_ptr->context_state = CTX_POSTPONED_ROW;
6041 }
6042 }
6043
6044 @@ -475,15 +417,15 @@
6045 GLOBAL(void)
6046 jinit_d_main_controller (j_decompress_ptr cinfo, boolean need_full_buffer)
6047 {
6048 - my_main_ptr main;
6049 + my_main_ptr main_ptr;
6050 int ci, rgroup, ngroups;
6051 jpeg_component_info *compptr;
6052
6053 - main = (my_main_ptr)
6054 + main_ptr = (my_main_ptr)
6055 (*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_IMAGE,
6056 SIZEOF(my_main_controller));
6057 - cinfo->main = (struct jpeg_d_main_controller *) main;
6058 - main->pub.start_pass = start_pass_main;
6059 + cinfo->main = (struct jpeg_d_main_controller *) main_ptr;
6060 + main_ptr->pub.start_pass = start_pass_main;
6061
6062 if (need_full_buffer) /* shouldn't happen */
6063 ERREXIT(cinfo, JERR_BAD_BUFFER_MODE);
6064 @@ -492,21 +434,21 @@
6065 * ngroups is the number of row groups we need.
6066 */
6067 if (cinfo->upsample->need_context_rows) {
6068 - if (cinfo->min_DCT_scaled_size < 2) /* unsupported, see comments above */
6069 + if (cinfo->_min_DCT_scaled_size < 2) /* unsupported, see comments above */
6070 ERREXIT(cinfo, JERR_NOTIMPL);
6071 alloc_funny_pointers(cinfo); /* Alloc space for xbuffer[] lists */
6072 - ngroups = cinfo->min_DCT_scaled_size + 2;
6073 + ngroups = cinfo->_min_DCT_scaled_size + 2;
6074 } else {
6075 - ngroups = cinfo->min_DCT_scaled_size;
6076 + ngroups = cinfo->_min_DCT_scaled_size;
6077 }
6078
6079 for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components;
6080 ci++, compptr++) {
6081 - rgroup = (compptr->v_samp_factor * compptr->DCT_scaled_size) /
6082 - cinfo->min_DCT_scaled_size; /* height of a row group of component */
6083 - main->buffer[ci] = (*cinfo->mem->alloc_sarray)
6084 + rgroup = (compptr->v_samp_factor * compptr->_DCT_scaled_size) /
6085 + cinfo->_min_DCT_scaled_size; /* height of a row group of component */
6086 + main_ptr->buffer[ci] = (*cinfo->mem->alloc_sarray)
6087 ((j_common_ptr) cinfo, JPOOL_IMAGE,
6088 - compptr->width_in_blocks * compptr->DCT_scaled_size,
6089 + compptr->width_in_blocks * compptr->_DCT_scaled_size,
6090 (JDIMENSION) (rgroup * ngroups));
6091 }
6092 }
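The allocation loop at the end of jinit_d_main_controller sizes each component buffer as rgroup * ngroups rows. The following is a minimal sketch of that arithmetic only, assuming plain int parameters in place of the cinfo and compptr fields; main_buffer_rows is a hypothetical name and is not part of libjpeg-turbo.

```c
#include <assert.h>

/* Hypothetical helper (not a libjpeg-turbo function).
 * rgroup  = height of one row group of the component,
 * ngroups = row groups per buffer; two extra groups are reserved
 *           when the upsampler needs context rows. */
static int main_buffer_rows(int v_samp_factor, int dct_scaled_size,
                            int min_dct_scaled_size, int need_context_rows)
{
  int rgroup, ngroups;

  assert(min_dct_scaled_size > 0);
  rgroup = (v_samp_factor * dct_scaled_size) / min_dct_scaled_size;
  ngroups = need_context_rows ? min_dct_scaled_size + 2
                              : min_dct_scaled_size;
  return rgroup * ngroups;
}
```

For an unscaled luma component with v_samp_factor 2, this gives 16 rows without context and 20 rows with context.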
6093 Index: jdmarker.c
6094 ===================================================================
6095 --- jdmarker.c (revision 829)
6096 +++ jdmarker.c (working copy)
6097 @@ -1,8 +1,10 @@
6098 /*
6099 * jdmarker.c
6100 *
6101 + * This file was part of the Independent JPEG Group's software:
6102 * Copyright (C) 1991-1998, Thomas G. Lane.
6103 - * This file is part of the Independent JPEG Group's software.
6104 + * libjpeg-turbo Modifications:
6105 + * Copyright (C) 2012, D. R. Commander.
6106 * For conditions of distribution and use, see the accompanying README file.
6107 *
6108 * This file contains routines to decode JPEG datastream markers.
6109 @@ -302,7 +304,7 @@
6110 /* Process a SOS marker */
6111 {
6112 INT32 length;
6113 - int i, ci, n, c, cc;
6114 + int i, ci, n, c, cc, pi;
6115 jpeg_component_info * compptr;
6116 INPUT_VARS(cinfo);
6117
6118 @@ -322,13 +324,17 @@
6119
6120 /* Collect the component-spec parameters */
6121
6122 + for (i = 0; i < MAX_COMPS_IN_SCAN; i++)
6123 + cinfo->cur_comp_info[i] = NULL;
6124 +
6125 for (i = 0; i < n; i++) {
6126 INPUT_BYTE(cinfo, cc, return FALSE);
6127 INPUT_BYTE(cinfo, c, return FALSE);
6128
6129 - for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components;
6130 + for (ci = 0, compptr = cinfo->comp_info;
6131 +» ci < cinfo->num_components && ci < MAX_COMPS_IN_SCAN;
6132 » ci++, compptr++) {
6133 - if (cc == compptr->component_id)
6134 + if (cc == compptr->component_id && !cinfo->cur_comp_info[ci])
6135 » goto id_found;
6136 }
6137
6138 @@ -342,6 +348,13 @@
6139
6140 TRACEMS3(cinfo, 1, JTRC_SOS_COMPONENT, cc,
6141 » compptr->dc_tbl_no, compptr->ac_tbl_no);
6142 +
6143 + /* This CSi (cc) should differ from the previous CSi */
6144 + for (pi = 0; pi < i; pi++) {
6145 + if (cinfo->cur_comp_info[pi] == compptr) {
6146 + ERREXIT1(cinfo, JERR_BAD_COMPONENT_ID, cc);
6147 + }
6148 + }
6149 }
6150
6151 /* Collect the additional scan parameters Ss, Se, Ah/Al. */
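The added loop in get_sos rejects a scan whose component selectors resolve to the same component twice. A rough stand-alone sketch of the same idea, assuming components are identified by small integers rather than by jpeg_component_info pointers; first_repeated_index is a hypothetical helper, not library code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the index of the first entry that repeats
 * an earlier one, or -1 if all entries are distinct. The patch performs
 * the equivalent comparison on cinfo->cur_comp_info[] pointers. */
static int first_repeated_index(const int *ids, size_t n)
{
  size_t i, pi;

  assert(ids != NULL || n == 0);
  for (i = 0; i < n; i++) {
    for (pi = 0; pi < i; pi++) {
      if (ids[pi] == ids[i])
        return (int) i;
    }
  }
  return -1;
}
```

The quadratic scan is acceptable here because a scan holds at most MAX_COMPS_IN_SCAN components.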
6152 @@ -459,18 +472,21 @@
6153 for (i = 0; i < count; i++)
6154 INPUT_BYTE(cinfo, huffval[i], return FALSE);
6155
6156 + MEMZERO(&huffval[count], (256 - count) * SIZEOF(UINT8));
6157 +
6158 length -= count;
6159
6160 if (index & 0x10) {» » /* AC table definition */
6161 index -= 0x10;
6162 + if (index < 0 || index >= NUM_HUFF_TBLS)
6163 + ERREXIT1(cinfo, JERR_DHT_INDEX, index);
6164 htblptr = &cinfo->ac_huff_tbl_ptrs[index];
6165 } else {» » » /* DC table definition */
6166 + if (index < 0 || index >= NUM_HUFF_TBLS)
6167 + ERREXIT1(cinfo, JERR_DHT_INDEX, index);
6168 htblptr = &cinfo->dc_huff_tbl_ptrs[index];
6169 }
6170
6171 - if (index < 0 || index >= NUM_HUFF_TBLS)
6172 - ERREXIT1(cinfo, JERR_DHT_INDEX, index);
6173 -
6174 if (*htblptr == NULL)
6175 *htblptr = jpeg_alloc_huff_table((j_common_ptr) cinfo);
6176
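The hunk above moves the DHT index range check so that it runs before the table pointer is formed, instead of after. A minimal sketch of the decode-then-validate order, assuming a local NUM_TBLS constant of 4 as a stand-in for NUM_HUFF_TBLS; split_dht_index is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_TBLS 4  /* assumption: stands in for NUM_HUFF_TBLS */

/* Hypothetical helper: split the DHT table byte into class (AC/DC) and
 * slot, and report whether the slot is usable as an array index.
 * Returns 1 if the slot is in range, 0 otherwise. */
static int split_dht_index(int raw, int *is_ac, int *slot)
{
  assert(is_ac != NULL && slot != NULL);
  *is_ac = (raw & 0x10) != 0;
  *slot = *is_ac ? raw - 0x10 : raw;
  /* Validate before any ac/dc table array is indexed with *slot. */
  return *slot >= 0 && *slot < NUM_TBLS;
}
```

A caller would index the AC or DC table array only when the function returns 1.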
6177 @@ -906,7 +922,7 @@
6178 }
6179
6180 if (cinfo->marker->discarded_bytes != 0) {
6181 - WARNMS2(cinfo, JWRN_EXTRANEOUS_DATA, cinfo->marker->discarded_bytes, c);
6182 + TRACEMS2(cinfo, 1, JWRN_EXTRANEOUS_DATA, cinfo->marker->discarded_bytes, c);
6183 cinfo->marker->discarded_bytes = 0;
6184 }
6185
6186 @@ -940,7 +956,144 @@
6187 return TRUE;
6188 }
6189
6190 +#ifdef MOTION_JPEG_SUPPORTED
6191
6192 +/* The default Huffman tables used by motion JPEG frames. When a motion JPEG
6193 + * frame does not have DHT tables, we should use the huffman tables suggested by
6194 + * the JPEG standard. Each of these tables represents a member of the JHUFF_TBLS
6195 + * struct so we can just copy it to the according JHUFF_TBLS member.
6196 + */
(...skipping 124 matching lines...)
6321 +#else
6322 +
6323 +#define mjpg_load_huff_tables(cinfo)
6324 +
6325 +#endif /* MOTION_JPEG_SUPPORTED */
6326 +
6327 +
6328 /*
6329 * Read markers until SOS or EOI.
6330 *
6331 @@ -1009,6 +1162,7 @@
6332 break;
6333
6334 case M_SOS:
6335 + mjpg_load_huff_tables(cinfo);
6336 if (! get_sos(cinfo))
6337 return JPEG_SUSPENDED;
6338 cinfo->unread_marker = 0; /* processed the marker */
6339 Index: jdmaster.c
6340 ===================================================================
6341 --- jdmaster.c (revision 829)
6342 +++ jdmaster.c (working copy)
6343 @@ -1,9 +1,11 @@
6344 /*
6345 * jdmaster.c
6346 *
6347 + * This file was part of the Independent JPEG Group's software:
6348 * Copyright (C) 1991-1997, Thomas G. Lane.
6349 - * Copyright (C) 2009, D. R. Commander.
6350 - * This file is part of the Independent JPEG Group's software.
6351 + * Modified 2002-2009 by Guido Vollbeding.
6352 + * libjpeg-turbo Modifications:
6353 + * Copyright (C) 2009-2011, D. R. Commander.
6354 * For conditions of distribution and use, see the accompanying README file.
6355 *
6356 * This file contains master control logic for the JPEG decompressor.
6357 @@ -15,6 +17,7 @@
6358 #define JPEG_INTERNALS
6359 #include "jinclude.h"
6360 #include "jpeglib.h"
6361 +#include "jpegcomp.h"
6362
6363
6364 /* Private state */
6365 @@ -56,7 +59,11 @@
6366 cinfo->out_color_space != JCS_EXT_BGR &&
6367 cinfo->out_color_space != JCS_EXT_BGRX &&
6368 cinfo->out_color_space != JCS_EXT_XBGR &&
6369 - cinfo->out_color_space != JCS_EXT_XRGB) ||
6370 + cinfo->out_color_space != JCS_EXT_XRGB &&
6371 + cinfo->out_color_space != JCS_EXT_RGBA &&
6372 + cinfo->out_color_space != JCS_EXT_BGRA &&
6373 + cinfo->out_color_space != JCS_EXT_ABGR &&
6374 + cinfo->out_color_space != JCS_EXT_ARGB) ||
6375 cinfo->out_color_components != rgb_pixelsize[cinfo->out_color_space])
6376 return FALSE;
6377 /* and it only handles 2h1v or 2h2v sampling ratios */
6378 @@ -68,9 +75,9 @@
6379 cinfo->comp_info[2].v_samp_factor != 1)
6380 return FALSE;
6381 /* furthermore, it doesn't work if we've scaled the IDCTs differently */
6382 - if (cinfo->comp_info[0].DCT_scaled_size != cinfo->min_DCT_scaled_size ||
6383 - cinfo->comp_info[1].DCT_scaled_size != cinfo->min_DCT_scaled_size ||
6384 - cinfo->comp_info[2].DCT_scaled_size != cinfo->min_DCT_scaled_size)
6385 + if (cinfo->comp_info[0]._DCT_scaled_size != cinfo->_min_DCT_scaled_size ||
6386 + cinfo->comp_info[1]._DCT_scaled_size != cinfo->_min_DCT_scaled_size ||
6387 + cinfo->comp_info[2]._DCT_scaled_size != cinfo->_min_DCT_scaled_size)
6388 return FALSE;
6389 /* ??? also need to test for upsample-time rescaling, when & if supported */
6390 return TRUE; /* by golly, it'll work... */
6391 @@ -84,6 +91,177 @@
6392 * Compute output image dimensions and related values.
6393 * NOTE: this is exported for possible use by application.
6394 * Hence it mustn't do anything that can't be done twice.
6395 + */
6396 +
6397 +#if JPEG_LIB_VERSION >= 80
6398 +GLOBAL(void)
6399 +#else
6400 +LOCAL(void)
6401 +#endif
6402 +jpeg_core_output_dimensions (j_decompress_ptr cinfo)
6403 +/* Do computations that are needed before master selection phase.
6404 + * This function is used for transcoding and full decompression.
6405 + */
6406 +{
6407 +#ifdef IDCT_SCALING_SUPPORTED
6408 + int ci;
6409 + jpeg_component_info *compptr;
6410 +
6411 + /* Compute actual output image dimensions and DCT scaling choices. */
6412 + if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom) {
6413 + /* Provide 1/block_size scaling */
6414 + cinfo->output_width = (JDIMENSION)
6415 + jdiv_round_up((long) cinfo->image_width, (long) DCTSIZE);
6416 + cinfo->output_height = (JDIMENSION)
6417 + jdiv_round_up((long) cinfo->image_height, (long) DCTSIZE);
6418 + cinfo->_min_DCT_h_scaled_size = 1;
6419 + cinfo->_min_DCT_v_scaled_size = 1;
6420 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 2) {
6421 + /* Provide 2/block_size scaling */
6422 + cinfo->output_width = (JDIMENSION)
6423 + jdiv_round_up((long) cinfo->image_width * 2L, (long) DCTSIZE);
6424 + cinfo->output_height = (JDIMENSION)
6425 + jdiv_round_up((long) cinfo->image_height * 2L, (long) DCTSIZE);
6426 + cinfo->_min_DCT_h_scaled_size = 2;
6427 + cinfo->_min_DCT_v_scaled_size = 2;
6428 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 3) {
6429 + /* Provide 3/block_size scaling */
6430 + cinfo->output_width = (JDIMENSION)
6431 + jdiv_round_up((long) cinfo->image_width * 3L, (long) DCTSIZE);
6432 + cinfo->output_height = (JDIMENSION)
6433 + jdiv_round_up((long) cinfo->image_height * 3L, (long) DCTSIZE);
6434 + cinfo->_min_DCT_h_scaled_size = 3;
6435 + cinfo->_min_DCT_v_scaled_size = 3;
6436 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 4) {
6437 + /* Provide 4/block_size scaling */
6438 + cinfo->output_width = (JDIMENSION)
6439 + jdiv_round_up((long) cinfo->image_width * 4L, (long) DCTSIZE);
6440 + cinfo->output_height = (JDIMENSION)
6441 + jdiv_round_up((long) cinfo->image_height * 4L, (long) DCTSIZE);
6442 + cinfo->_min_DCT_h_scaled_size = 4;
6443 + cinfo->_min_DCT_v_scaled_size = 4;
6444 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 5) {
6445 + /* Provide 5/block_size scaling */
6446 + cinfo->output_width = (JDIMENSION)
6447 + jdiv_round_up((long) cinfo->image_width * 5L, (long) DCTSIZE);
6448 + cinfo->output_height = (JDIMENSION)
6449 + jdiv_round_up((long) cinfo->image_height * 5L, (long) DCTSIZE);
6450 + cinfo->_min_DCT_h_scaled_size = 5;
6451 + cinfo->_min_DCT_v_scaled_size = 5;
6452 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 6) {
6453 + /* Provide 6/block_size scaling */
6454 + cinfo->output_width = (JDIMENSION)
6455 + jdiv_round_up((long) cinfo->image_width * 6L, (long) DCTSIZE);
6456 + cinfo->output_height = (JDIMENSION)
6457 + jdiv_round_up((long) cinfo->image_height * 6L, (long) DCTSIZE);
6458 + cinfo->_min_DCT_h_scaled_size = 6;
6459 + cinfo->_min_DCT_v_scaled_size = 6;
6460 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 7) {
6461 + /* Provide 7/block_size scaling */
6462 + cinfo->output_width = (JDIMENSION)
6463 + jdiv_round_up((long) cinfo->image_width * 7L, (long) DCTSIZE);
6464 + cinfo->output_height = (JDIMENSION)
6465 + jdiv_round_up((long) cinfo->image_height * 7L, (long) DCTSIZE);
6466 + cinfo->_min_DCT_h_scaled_size = 7;
6467 + cinfo->_min_DCT_v_scaled_size = 7;
6468 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 8) {
6469 + /* Provide 8/block_size scaling */
6470 + cinfo->output_width = (JDIMENSION)
6471 + jdiv_round_up((long) cinfo->image_width * 8L, (long) DCTSIZE);
6472 + cinfo->output_height = (JDIMENSION)
6473 + jdiv_round_up((long) cinfo->image_height * 8L, (long) DCTSIZE);
6474 + cinfo->_min_DCT_h_scaled_size = 8;
6475 + cinfo->_min_DCT_v_scaled_size = 8;
6476 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 9) {
6477 + /* Provide 9/block_size scaling */
6478 + cinfo->output_width = (JDIMENSION)
6479 + jdiv_round_up((long) cinfo->image_width * 9L, (long) DCTSIZE);
6480 + cinfo->output_height = (JDIMENSION)
6481 + jdiv_round_up((long) cinfo->image_height * 9L, (long) DCTSIZE);
6482 + cinfo->_min_DCT_h_scaled_size = 9;
6483 + cinfo->_min_DCT_v_scaled_size = 9;
6484 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 10) {
6485 + /* Provide 10/block_size scaling */
6486 + cinfo->output_width = (JDIMENSION)
6487 + jdiv_round_up((long) cinfo->image_width * 10L, (long) DCTSIZE);
6488 + cinfo->output_height = (JDIMENSION)
6489 + jdiv_round_up((long) cinfo->image_height * 10L, (long) DCTSIZE);
6490 + cinfo->_min_DCT_h_scaled_size = 10;
6491 + cinfo->_min_DCT_v_scaled_size = 10;
6492 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 11) {
6493 + /* Provide 11/block_size scaling */
6494 + cinfo->output_width = (JDIMENSION)
6495 + jdiv_round_up((long) cinfo->image_width * 11L, (long) DCTSIZE);
6496 + cinfo->output_height = (JDIMENSION)
6497 + jdiv_round_up((long) cinfo->image_height * 11L, (long) DCTSIZE);
6498 + cinfo->_min_DCT_h_scaled_size = 11;
6499 + cinfo->_min_DCT_v_scaled_size = 11;
6500 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 12) {
6501 + /* Provide 12/block_size scaling */
6502 + cinfo->output_width = (JDIMENSION)
6503 + jdiv_round_up((long) cinfo->image_width * 12L, (long) DCTSIZE);
6504 + cinfo->output_height = (JDIMENSION)
6505 + jdiv_round_up((long) cinfo->image_height * 12L, (long) DCTSIZE);
6506 + cinfo->_min_DCT_h_scaled_size = 12;
6507 + cinfo->_min_DCT_v_scaled_size = 12;
6508 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 13) {
6509 + /* Provide 13/block_size scaling */
6510 + cinfo->output_width = (JDIMENSION)
6511 + jdiv_round_up((long) cinfo->image_width * 13L, (long) DCTSIZE);
6512 + cinfo->output_height = (JDIMENSION)
6513 + jdiv_round_up((long) cinfo->image_height * 13L, (long) DCTSIZE);
6514 + cinfo->_min_DCT_h_scaled_size = 13;
6515 + cinfo->_min_DCT_v_scaled_size = 13;
6516 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 14) {
6517 + /* Provide 14/block_size scaling */
6518 + cinfo->output_width = (JDIMENSION)
6519 + jdiv_round_up((long) cinfo->image_width * 14L, (long) DCTSIZE);
6520 + cinfo->output_height = (JDIMENSION)
6521 + jdiv_round_up((long) cinfo->image_height * 14L, (long) DCTSIZE);
6522 + cinfo->_min_DCT_h_scaled_size = 14;
6523 + cinfo->_min_DCT_v_scaled_size = 14;
6524 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 15) {
6525 + /* Provide 15/block_size scaling */
6526 + cinfo->output_width = (JDIMENSION)
6527 + jdiv_round_up((long) cinfo->image_width * 15L, (long) DCTSIZE);
6528 + cinfo->output_height = (JDIMENSION)
6529 + jdiv_round_up((long) cinfo->image_height * 15L, (long) DCTSIZE);
6530 + cinfo->_min_DCT_h_scaled_size = 15;
6531 + cinfo->_min_DCT_v_scaled_size = 15;
6532 + } else {
6533 + /* Provide 16/block_size scaling */
6534 + cinfo->output_width = (JDIMENSION)
6535 + jdiv_round_up((long) cinfo->image_width * 16L, (long) DCTSIZE);
6536 + cinfo->output_height = (JDIMENSION)
6537 + jdiv_round_up((long) cinfo->image_height * 16L, (long) DCTSIZE);
6538 + cinfo->_min_DCT_h_scaled_size = 16;
6539 + cinfo->_min_DCT_v_scaled_size = 16;
6540 + }
6541 +
6542 + /* Recompute dimensions of components */
6543 + for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components;
6544 + ci++, compptr++) {
6545 + compptr->_DCT_h_scaled_size = cinfo->_min_DCT_h_scaled_size;
6546 + compptr->_DCT_v_scaled_size = cinfo->_min_DCT_v_scaled_size;
6547 + }
6548 +
6549 +#else /* !IDCT_SCALING_SUPPORTED */
6550 +
6551 + /* Hardwire it to "no scaling" */
6552 + cinfo->output_width = cinfo->image_width;
6553 + cinfo->output_height = cinfo->image_height;
6554 + /* jdinput.c has already initialized DCT_scaled_size,
6555 + * and has computed unscaled downsampled_width and downsampled_height.
6556 + */
6557 +
6558 +#endif /* IDCT_SCALING_SUPPORTED */
6559 +}
6560 +
6561 +
6562 +/*
6563 + * Compute output image dimensions and related values.
6564 + * NOTE: this is exported for possible use by application.
6565 + * Hence it mustn't do anything that can't be done twice.
6566 * Also note that it may be called before the master module is initialized!
6567 */
6568
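The if/else chain in jpeg_core_output_dimensions picks the smallest k in 1..16 with scale_num * DCTSIZE <= scale_denom * k, then scales each dimension by k/DCTSIZE, rounding up. The same selection can be sketched as a loop. This is an illustration under assumptions, not the patch's code: pick_scaled_size, scaled_dim and div_round_up are hypothetical names, and DCTSIZE is assumed to be 8.

```c
#include <assert.h>

#define DCTSIZE 8  /* assumption: the usual libjpeg block size */

/* Ceiling division for positive operands; same idea as jdiv_round_up. */
static long div_round_up(long a, long b)
{
  assert(b > 0);
  return (a + b - 1) / b;
}

/* Hypothetical helper: smallest k in 1..15 with
 * scale_num * DCTSIZE <= scale_denom * k, else 16
 * (the final "else" branch of the chain). */
static int pick_scaled_size(unsigned scale_num, unsigned scale_denom)
{
  int k;

  for (k = 1; k < 2 * DCTSIZE; k++) {
    if (scale_num * DCTSIZE <= scale_denom * (unsigned) k)
      return k;
  }
  return 2 * DCTSIZE;
}

/* Hypothetical helper: output dimension for a chosen k. */
static unsigned scaled_dim(unsigned image_dim, int k)
{
  return (unsigned) div_round_up((long) image_dim * k, (long) DCTSIZE);
}
```

With scale 1/2 the loop selects k = 4, so a 100-pixel dimension becomes 50; with scale 2/1 it falls through to k = 16.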
6569 @@ -100,52 +278,31 @@
6570 if (cinfo->global_state != DSTATE_READY)
6571 ERREXIT1(cinfo, JERR_BAD_STATE, cinfo->global_state);
6572
6573 + /* Compute core output image dimensions and DCT scaling choices. */
6574 + jpeg_core_output_dimensions(cinfo);
6575 +
6576 #ifdef IDCT_SCALING_SUPPORTED
6577
6578 - /* Compute actual output image dimensions and DCT scaling choices. */
6579 - if (cinfo->scale_num * 8 <= cinfo->scale_denom) {
6580 - /* Provide 1/8 scaling */
6581 - cinfo->output_width = (JDIMENSION)
6582 - jdiv_round_up((long) cinfo->image_width, 8L);
6583 - cinfo->output_height = (JDIMENSION)
6584 - jdiv_round_up((long) cinfo->image_height, 8L);
6585 - cinfo->min_DCT_scaled_size = 1;
6586 - } else if (cinfo->scale_num * 4 <= cinfo->scale_denom) {
6587 - /* Provide 1/4 scaling */
6588 - cinfo->output_width = (JDIMENSION)
6589 - jdiv_round_up((long) cinfo->image_width, 4L);
6590 - cinfo->output_height = (JDIMENSION)
6591 - jdiv_round_up((long) cinfo->image_height, 4L);
6592 - cinfo->min_DCT_scaled_size = 2;
6593 - } else if (cinfo->scale_num * 2 <= cinfo->scale_denom) {
6594 - /* Provide 1/2 scaling */
6595 - cinfo->output_width = (JDIMENSION)
6596 - jdiv_round_up((long) cinfo->image_width, 2L);
6597 - cinfo->output_height = (JDIMENSION)
6598 - jdiv_round_up((long) cinfo->image_height, 2L);
6599 - cinfo->min_DCT_scaled_size = 4;
6600 - } else {
6601 - /* Provide 1/1 scaling */
6602 - cinfo->output_width = cinfo->image_width;
6603 - cinfo->output_height = cinfo->image_height;
6604 - cinfo->min_DCT_scaled_size = DCTSIZE;
6605 - }
6606 /* In selecting the actual DCT scaling for each component, we try to
6607 * scale up the chroma components via IDCT scaling rather than upsampling.
6608 * This saves time if the upsampler gets to use 1:1 scaling.
6609 - * Note this code assumes that the supported DCT scalings are powers of 2.
6610 + * Note this code adapts subsampling ratios which are powers of 2.
6611 */
6612 for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components;
6613 ci++, compptr++) {
6614 - int ssize = cinfo->min_DCT_scaled_size;
6615 + int ssize = cinfo->_min_DCT_scaled_size;
6616 while (ssize < DCTSIZE &&
6617 - (compptr->h_samp_factor * ssize * 2 <=
6618 - cinfo->max_h_samp_factor * cinfo->min_DCT_scaled_size) &&
6619 - (compptr->v_samp_factor * ssize * 2 <=
6620 - cinfo->max_v_samp_factor * cinfo->min_DCT_scaled_size)) {
6621 + ((cinfo->max_h_samp_factor * cinfo->_min_DCT_scaled_size) %
6622 + (compptr->h_samp_factor * ssize * 2) == 0) &&
6623 + ((cinfo->max_v_samp_factor * cinfo->_min_DCT_scaled_size) %
6624 + (compptr->v_samp_factor * ssize * 2) == 0)) {
6625 ssize = ssize * 2;
6626 }
6627 +#if JPEG_LIB_VERSION >= 70
6628 + compptr->DCT_h_scaled_size = compptr->DCT_v_scaled_size = ssize;
6629 +#else
6630 compptr->DCT_scaled_size = ssize;
6631 +#endif
6632 }
6633
6634 /* Recompute downsampled dimensions of components;
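The per-component loop in the hunk above now doubles ssize only while the maximum sampling factor times the minimum scaled size is an exact multiple of the doubled component size, where the old code used a "<=" comparison. A minimal sketch of the new rule, assuming plain int arguments and DCTSIZE of 8; component_scaled_size is a hypothetical name.

```c
#include <assert.h>

#define DCTSIZE 8  /* assumption: the usual libjpeg block size */

/* Hypothetical helper mirroring the patched while-loop: grow the
 * component's IDCT size by powers of 2 while both sampling ratios
 * still divide evenly. */
static int component_scaled_size(int h_samp, int v_samp,
                                 int max_h, int max_v, int min_size)
{
  int ssize = min_size;

  assert(h_samp > 0 && v_samp > 0 && min_size > 0);
  while (ssize < DCTSIZE &&
         (max_h * min_size) % (h_samp * ssize * 2) == 0 &&
         (max_v * min_size) % (v_samp * ssize * 2) == 0) {
    ssize = ssize * 2;
  }
  return ssize;
}
```

For 4:2:0 chroma at half scale (min_size 4) the component is scaled up to 8, while a 3:1 sampling ratio is left at min_size because 3 is not divisible by 2.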
6635 @@ -156,11 +313,11 @@
6636 /* Size in samples, after IDCT scaling */
6637 compptr->downsampled_width = (JDIMENSION)
6638 jdiv_round_up((long) cinfo->image_width *
6639 - (long) (compptr->h_samp_factor * compptr->DCT_scaled_size),
6640 + (long) (compptr->h_samp_factor * compptr->_DCT_scaled_size),
6641 (long) (cinfo->max_h_samp_factor * DCTSIZE));
6642 compptr->downsampled_height = (JDIMENSION)
6643 jdiv_round_up((long) cinfo->image_height *
6644 - (long) (compptr->v_samp_factor * compptr->DCT_scaled_size),
6645 + (long) (compptr->v_samp_factor * compptr->_DCT_scaled_size),
6646 (long) (cinfo->max_v_samp_factor * DCTSIZE));
6647 }
6648
6649 @@ -188,6 +345,10 @@
6650 case JCS_EXT_BGRX:
6651 case JCS_EXT_XBGR:
6652 case JCS_EXT_XRGB:
6653 + case JCS_EXT_RGBA:
6654 + case JCS_EXT_BGRA:
6655 + case JCS_EXT_ABGR:
6656 + case JCS_EXT_ARGB:
6657 cinfo->out_color_components = rgb_pixelsize[cinfo->out_color_space];
6658 break;
6659 case JCS_YCbCr:
6660 @@ -384,7 +545,11 @@
6661 jinit_inverse_dct(cinfo);
6662 /* Entropy decoding: either Huffman or arithmetic coding. */
6663 if (cinfo->arith_code) {
6664 +#ifdef D_ARITH_CODING_SUPPORTED
6665 + jinit_arith_decoder(cinfo);
6666 +#else
6667 ERREXIT(cinfo, JERR_ARITH_NOTIMPL);
6668 +#endif
6669 } else {
6670 if (cinfo->progressive_mode) {
6671 #ifdef D_PROGRESSIVE_SUPPORTED
6672 Index: jdmerge.c
6673 ===================================================================
6674 --- jdmerge.c (revision 829)
6675 +++ jdmerge.c (working copy)
6676 @@ -1,10 +1,11 @@
6677 /*
6678 * jdmerge.c
6679 *
6680 + * This file was part of the Independent JPEG Group's software:
6681 * Copyright (C) 1994-1996, Thomas G. Lane.
6682 * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
6683 - * Copyright (C) 2009, D. R. Commander.
6684 - * This file is part of the Independent JPEG Group's software.
6685 + * libjpeg-turbo Modifications:
6686 + * Copyright (C) 2009, 2011, D. R. Commander.
6687 * For conditions of distribution and use, see the accompanying README file.
6688 *
6689 * This file contains code for merged upsampling/color conversion.
6690 @@ -38,6 +39,7 @@
6691 #include "jinclude.h"
6692 #include "jpeglib.h"
6693 #include "jsimd.h"
6694 +#include "config.h"
6695
6696 #ifdef UPSAMPLE_MERGING_SUPPORTED
6697
6698 @@ -77,6 +79,107 @@
6699 #define FIX(x) ((INT32) ((x) * (1L<<SCALEBITS) + 0.5))
6700
6701
6702 +/* Include inline routines for colorspace extensions */
6703 +
6704 +#include "jdmrgext.c"
6705 +#undef RGB_RED
6706 +#undef RGB_GREEN
6707 +#undef RGB_BLUE
6708 +#undef RGB_PIXELSIZE
6709 +
6710 +#define RGB_RED EXT_RGB_RED
6711 +#define RGB_GREEN EXT_RGB_GREEN
6712 +#define RGB_BLUE EXT_RGB_BLUE
6713 +#define RGB_PIXELSIZE EXT_RGB_PIXELSIZE
6714 +#define h2v1_merged_upsample_internal extrgb_h2v1_merged_upsample_internal
6715 +#define h2v2_merged_upsample_internal extrgb_h2v2_merged_upsample_internal
6716 +#include "jdmrgext.c"
6717 +#undef RGB_RED
6718 +#undef RGB_GREEN
6719 +#undef RGB_BLUE
6720 +#undef RGB_PIXELSIZE
6721 +#undef h2v1_merged_upsample_internal
6722 +#undef h2v2_merged_upsample_internal
6723 +
6724 +#define RGB_RED EXT_RGBX_RED
6725 +#define RGB_GREEN EXT_RGBX_GREEN
6726 +#define RGB_BLUE EXT_RGBX_BLUE
6727 +#define RGB_ALPHA 3
6728 +#define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE
6729 +#define h2v1_merged_upsample_internal extrgbx_h2v1_merged_upsample_internal
6730 +#define h2v2_merged_upsample_internal extrgbx_h2v2_merged_upsample_internal
6731 +#include "jdmrgext.c"
6732 +#undef RGB_RED
6733 +#undef RGB_GREEN
6734 +#undef RGB_BLUE
6735 +#undef RGB_ALPHA
6736 +#undef RGB_PIXELSIZE
6737 +#undef h2v1_merged_upsample_internal
6738 +#undef h2v2_merged_upsample_internal
6739 +
6740 +#define RGB_RED EXT_BGR_RED
6741 +#define RGB_GREEN EXT_BGR_GREEN
6742 +#define RGB_BLUE EXT_BGR_BLUE
6743 +#define RGB_PIXELSIZE EXT_BGR_PIXELSIZE
6744 +#define h2v1_merged_upsample_internal extbgr_h2v1_merged_upsample_internal
6745 +#define h2v2_merged_upsample_internal extbgr_h2v2_merged_upsample_internal
6746 +#include "jdmrgext.c"
6747 +#undef RGB_RED
6748 +#undef RGB_GREEN
6749 +#undef RGB_BLUE
6750 +#undef RGB_PIXELSIZE
6751 +#undef h2v1_merged_upsample_internal
6752 +#undef h2v2_merged_upsample_internal
6753 +
6754 +#define RGB_RED EXT_BGRX_RED
6755 +#define RGB_GREEN EXT_BGRX_GREEN
6756 +#define RGB_BLUE EXT_BGRX_BLUE
6757 +#define RGB_ALPHA 3
6758 +#define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE
6759 +#define h2v1_merged_upsample_internal extbgrx_h2v1_merged_upsample_internal
6760 +#define h2v2_merged_upsample_internal extbgrx_h2v2_merged_upsample_internal
6761 +#include "jdmrgext.c"
6762 +#undef RGB_RED
6763 +#undef RGB_GREEN
6764 +#undef RGB_BLUE
6765 +#undef RGB_ALPHA
6766 +#undef RGB_PIXELSIZE
6767 +#undef h2v1_merged_upsample_internal
6768 +#undef h2v2_merged_upsample_internal
6769 +
6770 +#define RGB_RED EXT_XBGR_RED
6771 +#define RGB_GREEN EXT_XBGR_GREEN
6772 +#define RGB_BLUE EXT_XBGR_BLUE
6773 +#define RGB_ALPHA 0
6774 +#define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE
6775 +#define h2v1_merged_upsample_internal extxbgr_h2v1_merged_upsample_internal
6776 +#define h2v2_merged_upsample_internal extxbgr_h2v2_merged_upsample_internal
6777 +#include "jdmrgext.c"
6778 +#undef RGB_RED
6779 +#undef RGB_GREEN
6780 +#undef RGB_BLUE
6781 +#undef RGB_ALPHA
6782 +#undef RGB_PIXELSIZE
6783 +#undef h2v1_merged_upsample_internal
6784 +#undef h2v2_merged_upsample_internal
6785 +
6786 +#define RGB_RED EXT_XRGB_RED
6787 +#define RGB_GREEN EXT_XRGB_GREEN
6788 +#define RGB_BLUE EXT_XRGB_BLUE
6789 +#define RGB_ALPHA 0
6790 +#define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE
6791 +#define h2v1_merged_upsample_internal extxrgb_h2v1_merged_upsample_internal
6792 +#define h2v2_merged_upsample_internal extxrgb_h2v2_merged_upsample_internal
6793 +#include "jdmrgext.c"
6794 +#undef RGB_RED
6795 +#undef RGB_GREEN
6796 +#undef RGB_BLUE
6797 +#undef RGB_ALPHA
6798 +#undef RGB_PIXELSIZE
6799 +#undef h2v1_merged_upsample_internal
6800 +#undef h2v2_merged_upsample_internal
6801 +
6802 +
6803 /*
6804 * Initialize tables for YCC->RGB colorspace conversion.
6805 * This is taken directly from jdcolor.c; see that file for more info.
6806 @@ -230,56 +333,40 @@
6807 JSAMPIMAGE input_buf, JDIMENSION in_row_group_ctr,
6808 JSAMPARRAY output_buf)
6809 {
6810 - my_upsample_ptr upsample = (my_upsample_ptr) cinfo->upsample;
6811 - register int y, cred, cgreen, cblue;
6812 - int cb, cr;
6813 - register JSAMPROW outptr;
6814 - JSAMPROW inptr0, inptr1, inptr2;
6815 - JDIMENSION col;
6816 - /* copy these pointers into registers if possible */
6817 - register JSAMPLE * range_limit = cinfo->sample_range_limit;
6818 - int * Crrtab = upsample->Cr_r_tab;
6819 - int * Cbbtab = upsample->Cb_b_tab;
6820 - INT32 * Crgtab = upsample->Cr_g_tab;
6821 - INT32 * Cbgtab = upsample->Cb_g_tab;
6822 - SHIFT_TEMPS
6823 -
6824 - inptr0 = input_buf[0][in_row_group_ctr];
6825 - inptr1 = input_buf[1][in_row_group_ctr];
6826 - inptr2 = input_buf[2][in_row_group_ctr];
6827 - outptr = output_buf[0];
6828 - /* Loop for each pair of output pixels */
6829 - for (col = cinfo->output_width >> 1; col > 0; col--) {
6830 - /* Do the chroma part of the calculation */
6831 - cb = GETJSAMPLE(*inptr1++);
6832 - cr = GETJSAMPLE(*inptr2++);
6833 - cred = Crrtab[cr];
6834 - cgreen = (int) RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS);
6835 - cblue = Cbbtab[cb];
6836 - /* Fetch 2 Y values and emit 2 pixels */
6837 - y = GETJSAMPLE(*inptr0++);
6838 - outptr[rgb_red[cinfo->out_color_space]] = range_limit[y + cred];
6839 - outptr[rgb_green[cinfo->out_color_space]] = range_limit[y + cgreen];
6840 - outptr[rgb_blue[cinfo->out_color_space]] = range_limit[y + cblue];
6841 - outptr += rgb_pixelsize[cinfo->out_color_space];
6842 - y = GETJSAMPLE(*inptr0++);
6843 - outptr[rgb_red[cinfo->out_color_space]] = range_limit[y + cred];
6844 - outptr[rgb_green[cinfo->out_color_space]] = range_limit[y + cgreen];
6845 - outptr[rgb_blue[cinfo->out_color_space]] = range_limit[y + cblue];
6846 - outptr += rgb_pixelsize[cinfo->out_color_space];
6847 + switch (cinfo->out_color_space) {
6848 + case JCS_EXT_RGB:
6849 + extrgb_h2v1_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr,
6850 + output_buf);
6851 + break;
6852 + case JCS_EXT_RGBX:
6853 + case JCS_EXT_RGBA:
6854 + extrgbx_h2v1_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr,
6855 + output_buf);
6856 + break;
6857 + case JCS_EXT_BGR:
6858 + extbgr_h2v1_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr,
6859 + output_buf);
6860 + break;
6861 + case JCS_EXT_BGRX:
6862 + case JCS_EXT_BGRA:
6863 + extbgrx_h2v1_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr,
6864 + output_buf);
6865 + break;
6866 + case JCS_EXT_XBGR:
6867 + case JCS_EXT_ABGR:
6868 + extxbgr_h2v1_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr,
6869 + output_buf);
6870 + break;
6871 + case JCS_EXT_XRGB:
6872 + case JCS_EXT_ARGB:
6873 + extxrgb_h2v1_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr,
6874 + output_buf);
6875 + break;
6876 + default:
6877 + h2v1_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr,
6878 + output_buf);
6879 + break;
6880 }
6881 - /* If image width is odd, do the last output column separately */
6882 - if (cinfo->output_width & 1) {
6883 - cb = GETJSAMPLE(*inptr1);
6884 - cr = GETJSAMPLE(*inptr2);
6885 - cred = Crrtab[cr];
6886 - cgreen = (int) RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS);
6887 - cblue = Cbbtab[cb];
6888 - y = GETJSAMPLE(*inptr0);
6889 - outptr[rgb_red[cinfo->out_color_space]] = range_limit[y + cred];
6890 - outptr[rgb_green[cinfo->out_color_space]] = range_limit[y + cgreen];
6891 - outptr[rgb_blue[cinfo->out_color_space]] = range_limit[y + cblue];
6892 - }
6893 }
6894
6895
6896 @@ -292,72 +379,40 @@
6897 JSAMPIMAGE input_buf, JDIMENSION in_row_group_ctr,
6898 JSAMPARRAY output_buf)
6899 {
6900 - my_upsample_ptr upsample = (my_upsample_ptr) cinfo->upsample;
6901 - register int y, cred, cgreen, cblue;
6902 - int cb, cr;
6903 - register JSAMPROW outptr0, outptr1;
6904 - JSAMPROW inptr00, inptr01, inptr1, inptr2;
6905 - JDIMENSION col;
6906 - /* copy these pointers into registers if possible */
6907 - register JSAMPLE * range_limit = cinfo->sample_range_limit;
6908 - int * Crrtab = upsample->Cr_r_tab;
6909 - int * Cbbtab = upsample->Cb_b_tab;
6910 - INT32 * Crgtab = upsample->Cr_g_tab;
6911 - INT32 * Cbgtab = upsample->Cb_g_tab;
6912 - SHIFT_TEMPS
6913 -
6914 - inptr00 = input_buf[0][in_row_group_ctr*2];
6915 - inptr01 = input_buf[0][in_row_group_ctr*2 + 1];
6916 - inptr1 = input_buf[1][in_row_group_ctr];
6917 - inptr2 = input_buf[2][in_row_group_ctr];
6918 - outptr0 = output_buf[0];
6919 - outptr1 = output_buf[1];
6920 - /* Loop for each group of output pixels */
6921 - for (col = cinfo->output_width >> 1; col > 0; col--) {
6922 - /* Do the chroma part of the calculation */
6923 - cb = GETJSAMPLE(*inptr1++);
6924 - cr = GETJSAMPLE(*inptr2++);
6925 - cred = Crrtab[cr];
6926 - cgreen = (int) RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS);
6927 - cblue = Cbbtab[cb];
6928 - /* Fetch 4 Y values and emit 4 pixels */
6929 - y = GETJSAMPLE(*inptr00++);
6930 - outptr0[rgb_red[cinfo->out_color_space]] = range_limit[y + cred];
6931 - outptr0[rgb_green[cinfo->out_color_space]] = range_limit[y + cgreen];
6932 - outptr0[rgb_blue[cinfo->out_color_space]] = range_limit[y + cblue];
6933 - outptr0 += RGB_PIXELSIZE;
6934 - y = GETJSAMPLE(*inptr00++);
6935 - outptr0[rgb_red[cinfo->out_color_space]] = range_limit[y + cred];
6936 - outptr0[rgb_green[cinfo->out_color_space]] = range_limit[y + cgreen];
6937 - outptr0[rgb_blue[cinfo->out_color_space]] = range_limit[y + cblue];
6938 - outptr0 += RGB_PIXELSIZE;
6939 - y = GETJSAMPLE(*inptr01++);
6940 - outptr1[rgb_red[cinfo->out_color_space]] = range_limit[y + cred];
6941 - outptr1[rgb_green[cinfo->out_color_space]] = range_limit[y + cgreen];
6942 - outptr1[rgb_blue[cinfo->out_color_space]] = range_limit[y + cblue];
6943 - outptr1 += RGB_PIXELSIZE;
6944 - y = GETJSAMPLE(*inptr01++);
6945 - outptr1[rgb_red[cinfo->out_color_space]] = range_limit[y + cred];
6946 - outptr1[rgb_green[cinfo->out_color_space]] = range_limit[y + cgreen];
6947 - outptr1[rgb_blue[cinfo->out_color_space]] = range_limit[y + cblue];
6948 - outptr1 += RGB_PIXELSIZE;
6949 + switch (cinfo->out_color_space) {
6950 + case JCS_EXT_RGB:
6951 + extrgb_h2v2_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr,
6952 + output_buf);
6953 + break;
6954 + case JCS_EXT_RGBX:
6955 + case JCS_EXT_RGBA:
6956 + extrgbx_h2v2_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr,
6957 + output_buf);
6958 + break;
6959 + case JCS_EXT_BGR:
6960 + extbgr_h2v2_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr,
6961 + output_buf);
6962 + break;
6963 + case JCS_EXT_BGRX:
6964 + case JCS_EXT_BGRA:
6965 + extbgrx_h2v2_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr,
6966 + output_buf);
6967 + break;
6968 + case JCS_EXT_XBGR:
6969 + case JCS_EXT_ABGR:
6970 + extxbgr_h2v2_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr,
6971 + output_buf);
6972 + break;
6973 + case JCS_EXT_XRGB:
6974 + case JCS_EXT_ARGB:
6975 + extxrgb_h2v2_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr,
6976 + output_buf);
6977 + break;
6978 + default:
6979 + h2v2_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr,
6980 + output_buf);
6981 + break;
6982 }
6983 - /* If image width is odd, do the last output column separately */
6984 - if (cinfo->output_width & 1) {
6985 - cb = GETJSAMPLE(*inptr1);
6986 - cr = GETJSAMPLE(*inptr2);
6987 - cred = Crrtab[cr];
6988 - cgreen = (int) RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS);
6989 - cblue = Cbbtab[cb];
6990 - y = GETJSAMPLE(*inptr00);
6991 - outptr0[rgb_red[cinfo->out_color_space]] = range_limit[y + cred];
6992 - outptr0[rgb_green[cinfo->out_color_space]] = range_limit[y + cgreen];
6993 - outptr0[rgb_blue[cinfo->out_color_space]] = range_limit[y + cblue];
6994 - y = GETJSAMPLE(*inptr01);
6995 - outptr1[rgb_red[cinfo->out_color_space]] = range_limit[y + cred];
6996 - outptr1[rgb_green[cinfo->out_color_space]] = range_limit[y + cgreen];
6997 - outptr1[rgb_blue[cinfo->out_color_space]] = range_limit[y + cblue];
6998 - }
6999 }
7000
7001
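Review note on the jdmerge.c hunks above: the removed loops looked up `rgb_red[cinfo->out_color_space]` and friends for every pixel, while the added code builds one specialized copy of each routine per pixel layout and picks it with a single `switch` per call. The sketch below illustrates that pattern only. It assumes a function-generating macro in place of the repeated `#include "jdmrgext.c"` (a single-file example cannot re-include a second file), and the names `layout_t`, `DEFINE_STORE_PIXEL` and `store_pixel` are hypothetical, not part of libjpeg-turbo.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout tags standing in for JCS_EXT_RGB, JCS_EXT_BGR, ... */
typedef enum { LAYOUT_RGB, LAYOUT_BGR, LAYOUT_XRGB } layout_t;

/* Generate one specialized writer per layout.  The channel offsets and the
 * pixel size are compile-time constants inside each generated function. */
#define DEFINE_STORE_PIXEL(name, R, G, B, SIZE)                     \
  static size_t name(unsigned char *out, unsigned char r,           \
                     unsigned char g, unsigned char b)              \
  {                                                                 \
    out[R] = r;                                                     \
    out[G] = g;                                                     \
    out[B] = b;                                                     \
    return SIZE; /* bytes to advance the output pointer */          \
  }

DEFINE_STORE_PIXEL(store_rgb, 0, 1, 2, 3)
DEFINE_STORE_PIXEL(store_bgr, 2, 1, 0, 3)
DEFINE_STORE_PIXEL(store_xrgb, 1, 2, 3, 4) /* byte 0 is padding, untouched here */

/* Dispatch once per call, outside any per-pixel loop. */
static size_t store_pixel(layout_t layout, unsigned char *out,
                          unsigned char r, unsigned char g, unsigned char b)
{
  assert(out != NULL);
  switch (layout) {
  case LAYOUT_BGR:
    return store_bgr(out, r, g, b);
  case LAYOUT_XRGB:
    return store_xrgb(out, r, g, b);
  default:
    return store_rgb(out, r, g, b);
  }
}
```

In the patch the dispatch sits around the whole row routine rather than around a single pixel store, so the `switch` cost is paid once per row group.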
7002 Index: jdphuff.c
7003 ===================================================================
7004 --- jdphuff.c (revision 829)
7005 +++ jdphuff.c (working copy)
7006 @@ -198,6 +198,7 @@
7007 * On some machines, a shift and add will be faster than a table lookup.
7008 */
7009
7010 +#define AVOID_TABLES
7011 #ifdef AVOID_TABLES
7012
7013 #define HUFF_EXTEND(x,s) ((x) < (1<<((s)-1)) ? (x) + (((-1)<<(s)) + 1) : (x))
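Review note on the jdphuff.c hunk above: defining `AVOID_TABLES` selects the shift-and-add form of `HUFF_EXTEND` instead of a table lookup. A minimal sketch of the same arithmetic follows, written as a function named `huff_extend` (a hypothetical name). It subtracts `(1 << s) - 1` rather than adding `((-1) << s) + 1`; the two are arithmetically equal, and the rewritten form avoids left-shifting a negative value, which ISO C leaves undefined.

```c
#include <assert.h>

/* Sign-extend an s-bit magnitude x (1 <= s <= 15) without a lookup table.
 * Values below 2^(s-1) encode negatives: the result is x - (2^s - 1).
 * Values at or above 2^(s-1) are already the positive result. */
static int huff_extend(int x, int s)
{
  assert(s >= 1 && s <= 15);
  assert(x >= 0 && x < (1 << s));
  return x < (1 << (s - 1)) ? x - ((1 << s) - 1) : x;
}
```

For example, with `s = 3` the inputs 0..3 map to -7..-4 and the inputs 4..7 map to themselves.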
7014 Index: jdsample.c
7015 ===================================================================
7016 --- jdsample.c (revision 829)
7017 +++ jdsample.c (working copy)
7018 @@ -1,9 +1,11 @@
7019 /*
7020 * jdsample.c
7021 *
7022 + * This file was part of the Independent JPEG Group's software:
7023 * Copyright (C) 1991-1996, Thomas G. Lane.
7024 + * libjpeg-turbo Modifications:
7025 * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
7026 - * This file is part of the Independent JPEG Group's software.
7027 + * Copyright (C) 2010, D. R. Commander.
7028 * For conditions of distribution and use, see the accompanying README file.
7029 *
7030 * This file contains upsampling routines.
7031 @@ -19,50 +21,12 @@
7032 * Pub. by IEEE Computer Society Press, Los Alamitos, CA. ISBN 0-8186-8944-7.
7033 */
7034
7035 -#define JPEG_INTERNALS
7036 -#include "jinclude.h"
7037 -#include "jpeglib.h"
7038 +#include "jdsample.h"
7039 #include "jsimd.h"
7040 +#include "jpegcomp.h"
7041
7042
7043 -/* Pointer to routine to upsample a single component */
7044 -typedef JMETHOD(void, upsample1_ptr,
7045 - (j_decompress_ptr cinfo, jpeg_component_info * compptr,
7046 - JSAMPARRAY input_data, JSAMPARRAY * output_data_ptr));
7047
7048 -/* Private subobject */
7049 -
7050 -typedef struct {
7051 - struct jpeg_upsampler pub; /* public fields */
7052 -
7053 - /* Color conversion buffer. When using separate upsampling and color
7054 - * conversion steps, this buffer holds one upsampled row group until it
7055 - * has been color converted and output.
7056 - * Note: we do not allocate any storage for component(s) which are full-size,
7057 - * ie do not need rescaling. The corresponding entry of color_buf[] is
7058 - * simply set to point to the input data array, thereby avoiding copying.
7059 - */
7060 - JSAMPARRAY color_buf[MAX_COMPONENTS];
7061 -
7062 - /* Per-component upsampling method pointers */
7063 - upsample1_ptr methods[MAX_COMPONENTS];
7064 -
7065 - int next_row_out; /* counts rows emitted from color_buf */
7066 - JDIMENSION rows_to_go; /* counts rows remaining in image */
7067 -
7068 - /* Height of an input row group for each component. */
7069 - int rowgroup_height[MAX_COMPONENTS];
7070 -
7071 - /* These arrays save pixel expansion factors so that int_expand need not
7072 - * recompute them each time. They are unused for other upsampling methods.
7073 - */
7074 - UINT8 h_expand[MAX_COMPONENTS];
7075 - UINT8 v_expand[MAX_COMPONENTS];
7076 -} my_upsampler;
7077 -
7078 -typedef my_upsampler * my_upsample_ptr;
7079 -
7080 -
7081 /*
7082 * Initialize for an upsampling pass.
7083 */
7084 @@ -420,7 +384,7 @@
7085 /* jdmainct.c doesn't support context rows when min_DCT_scaled_size = 1,
7086 * so don't ask for it.
7087 */
7088 - do_fancy = cinfo->do_fancy_upsampling && cinfo->min_DCT_scaled_size > 1;
7089 + do_fancy = cinfo->do_fancy_upsampling && cinfo->_min_DCT_scaled_size > 1;
7090
7091 /* Verify we can handle the sampling factors, select per-component methods,
7092 * and create storage as needed.
7093 @@ -430,10 +394,10 @@
7094 /* Compute size of an "input group" after IDCT scaling. This many samples
7095 * are to be converted to max_h_samp_factor * max_v_samp_factor pixels.
7096 */
7097 - h_in_group = (compptr->h_samp_factor * compptr->DCT_scaled_size) /
7098 - cinfo->min_DCT_scaled_size;
7099 - v_in_group = (compptr->v_samp_factor * compptr->DCT_scaled_size) /
7100 - cinfo->min_DCT_scaled_size;
7101 + h_in_group = (compptr->h_samp_factor * compptr->_DCT_scaled_size) /
7102 + cinfo->_min_DCT_scaled_size;
7103 + v_in_group = (compptr->v_samp_factor * compptr->_DCT_scaled_size) /
7104 + cinfo->_min_DCT_scaled_size;
7105 h_out_group = cinfo->max_h_samp_factor;
7106 v_out_group = cinfo->max_v_samp_factor;
7107 upsample->rowgroup_height[ci] = v_in_group; /* save for use later */
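Review note on the last jdsample.c hunk above: the only change is the switch to the `_DCT_scaled_size` and `_min_DCT_scaled_size` names; the group-size arithmetic itself is unchanged. A minimal sketch of that arithmetic is below. The struct `group_sizes` and the function `compute_group_sizes` are hypothetical, and the derived expansion factors (output group size divided by input group size, when it divides evenly) are an assumption about how the result is used, since that code is not part of this hunk.

```c
#include <assert.h>

/* Result of the group-size arithmetic for one component. */
typedef struct {
  int h_in_group; /* input samples per group, horizontally */
  int v_in_group; /* input samples per group, vertically */
  int h_expand;   /* horizontal replication factor, 0 if not integral */
  int v_expand;   /* vertical replication factor, 0 if not integral */
} group_sizes;

static group_sizes compute_group_sizes(int h_samp_factor, int v_samp_factor,
                                       int dct_scaled_size,
                                       int min_dct_scaled_size,
                                       int max_h_samp_factor,
                                       int max_v_samp_factor)
{
  group_sizes g;

  assert(min_dct_scaled_size > 0);
  /* Same formulas as in the hunk: size of an input group after IDCT scaling. */
  g.h_in_group = (h_samp_factor * dct_scaled_size) / min_dct_scaled_size;
  g.v_in_group = (v_samp_factor * dct_scaled_size) / min_dct_scaled_size;
  assert(g.h_in_group > 0 && g.v_in_group > 0);

  /* Assumed follow-up step: integral expansion to the output group size. */
  g.h_expand = (max_h_samp_factor % g.h_in_group == 0)
                   ? max_h_samp_factor / g.h_in_group : 0;
  g.v_expand = (max_v_samp_factor % g.v_in_group == 0)
                   ? max_v_samp_factor / g.v_in_group : 0;
  return g;
}
```

With 4:2:0 sampling and no IDCT scaling, a chroma component (sampling factors 1x1, maximum 2x2) gets an input group of 1x1 and an expansion of 2x2, while the luma component (2x2) gets an input group of 2x2 and an expansion of 1x1.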
7108 Index: jdtrans.c
7109 ===================================================================
7110 --- jdtrans.c (revision 829)
7111 +++ jdtrans.c (working copy)
7112 @@ -99,9 +99,18 @@
7113 /* This is effectively a buffered-image operation. */
7114 cinfo->buffered_image = TRUE;
7115
7116 +#if JPEG_LIB_VERSION >= 80
7117 + /* Compute output image dimensions and related values. */
7118 + jpeg_core_output_dimensions(cinfo);
7119 +#endif
7120 +
7121 /* Entropy decoding: either Huffman or arithmetic coding. */
7122 if (cinfo->arith_code) {
7123 +#ifdef D_ARITH_CODING_SUPPORTED
7124 + jinit_arith_decoder(cinfo);
7125 +#else
7126 ERREXIT(cinfo, JERR_ARITH_NOTIMPL);
7127 +#endif
7128 } else {
7129 if (cinfo->progressive_mode) {
7130 #ifdef D_PROGRESSIVE_SUPPORTED
7131 Index: jerror.h
7132 ===================================================================
7133 --- jerror.h (revision 829)
7134 +++ jerror.h (working copy)
7135 @@ -2,6 +2,7 @@
7136 * jerror.h
7137 *
7138 * Copyright (C) 1994-1997, Thomas G. Lane.
7139 + * Modified 1997-2009 by Guido Vollbeding.
7140 * This file is part of the Independent JPEG Group's software.
7141 * For conditions of distribution and use, see the accompanying README file.
7142 *
7143 @@ -39,14 +40,23 @@
7144 JMESSAGE(JMSG_NOMESSAGE, "Bogus message code %d") /* Must be first entry! */
7145
7146 /* For maintenance convenience, list is alphabetical by message code name */
7147 +#if JPEG_LIB_VERSION < 70
7148 JMESSAGE(JERR_ARITH_NOTIMPL,
7149 - "Sorry, there are legal restrictions on arithmetic coding")
7150 + "Sorry, arithmetic coding is not implemented")
7151 +#endif
7152 JMESSAGE(JERR_BAD_ALIGN_TYPE, "ALIGN_TYPE is wrong, please fix")
7153 JMESSAGE(JERR_BAD_ALLOC_CHUNK, "MAX_ALLOC_CHUNK is wrong, please fix")
7154 JMESSAGE(JERR_BAD_BUFFER_MODE, "Bogus buffer control mode")
7155 JMESSAGE(JERR_BAD_COMPONENT_ID, "Invalid component ID %d in SOS")
7156 +#if JPEG_LIB_VERSION >= 70
7157 +JMESSAGE(JERR_BAD_CROP_SPEC, "Invalid crop request")
7158 +#endif
7159 JMESSAGE(JERR_BAD_DCT_COEF, "DCT coefficient out of range")
7160 JMESSAGE(JERR_BAD_DCTSIZE, "IDCT output block size %d not supported")
7161 +#if JPEG_LIB_VERSION >= 70
7162 +JMESSAGE(JERR_BAD_DROP_SAMPLING,
7163 + "Component index %d: mismatching sampling ratio %d:%d, %d:%d, %c")
7164 +#endif
7165 JMESSAGE(JERR_BAD_HUFF_TABLE, "Bogus Huffman table definition")
7166 JMESSAGE(JERR_BAD_IN_COLORSPACE, "Bogus input colorspace")
7167 JMESSAGE(JERR_BAD_J_COLORSPACE, "Bogus JPEG colorspace")
7168 @@ -93,6 +103,9 @@
7169 JMESSAGE(JERR_MODE_CHANGE, "Invalid color quantization mode change")
7170 JMESSAGE(JERR_NOTIMPL, "Not implemented yet")
7171 JMESSAGE(JERR_NOT_COMPILED, "Requested feature was omitted at compile time")
7172 +#if JPEG_LIB_VERSION >= 70
7173 +JMESSAGE(JERR_NO_ARITH_TABLE, "Arithmetic table 0x%02x was not defined")
7174 +#endif
7175 JMESSAGE(JERR_NO_BACKING_STORE, "Backing store not supported")
7176 JMESSAGE(JERR_NO_HUFF_TABLE, "Huffman table 0x%02x was not defined")
7177 JMESSAGE(JERR_NO_IMAGE, "JPEG datastream contains no image")
7178 @@ -170,6 +183,9 @@
7179 JMESSAGE(JTRC_XMS_CLOSE, "Freed XMS handle %u")
7180 JMESSAGE(JTRC_XMS_OPEN, "Obtained XMS handle %u")
7181 JMESSAGE(JWRN_ADOBE_XFORM, "Unknown Adobe color transform code %d")
7182 +#if JPEG_LIB_VERSION >= 70
7183 +JMESSAGE(JWRN_ARITH_BAD_CODE, "Corrupt JPEG data: bad arithmetic code")
7184 +#endif
7185 JMESSAGE(JWRN_BOGUS_PROGRESSION,
7186 "Inconsistent progression sequence for component %d coefficient %d")
7187 JMESSAGE(JWRN_EXTRANEOUS_DATA,
7188 @@ -182,6 +198,13 @@
7189 "Corrupt JPEG data: found marker 0x%02x instead of RST%d")
7190 JMESSAGE(JWRN_NOT_SEQUENTIAL, "Invalid SOS parameters for sequential JPEG")
7191 JMESSAGE(JWRN_TOO_MUCH_DATA, "Application transferred too many scanlines")
7192 +#if JPEG_LIB_VERSION < 70
7193 +JMESSAGE(JERR_BAD_CROP_SPEC, "Invalid crop request")
7194 +#if defined(C_ARITH_CODING_SUPPORTED) || defined(D_ARITH_CODING_SUPPORTED)
7195 +JMESSAGE(JERR_NO_ARITH_TABLE, "Arithmetic table 0x%02x was not defined")
7196 +JMESSAGE(JWRN_ARITH_BAD_CODE, "Corrupt JPEG data: bad arithmetic code")
7197 +#endif
7198 +#endif
7199
7200 #ifdef JMAKE_ENUM_LIST
7201
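Review note on the jerror.h hunks above: each `JMESSAGE(code, text)` entry feeds both the enum of message codes and the message string table, so the order of entries determines each code's numeric value. The last hunk appends the new codes at the end of the list when `JPEG_LIB_VERSION < 70`, which leaves the values of all existing codes unchanged (presumably the reason for the placement). The sketch below shows the list-expanded-twice pattern using a single list macro instead of re-including a header; `MESSAGE_LIST`, `AS_ENUM`, `AS_TEXT`, `message_table` and `message_text` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message list; each entry is expanded once per definition of X. */
#define MESSAGE_LIST(X)                              \
  X(MSG_NOMESSAGE, "Bogus message code %d")          \
  X(MSG_BAD_CROP_SPEC, "Invalid crop request")       \
  X(MSG_NO_IMAGE, "JPEG datastream contains no image")

/* Expansion 1: the enum of message codes, numbered in list order. */
#define AS_ENUM(code, text) code,
typedef enum { MESSAGE_LIST(AS_ENUM) MSG_LASTCODE } message_code;

/* Expansion 2: the parallel string table, in the same order. */
#define AS_TEXT(code, text) text,
static const char *const message_table[] = { MESSAGE_LIST(AS_TEXT) NULL };

/* Look up a message, falling back to the first entry for unknown codes. */
static const char *message_text(int code)
{
  /* The table has one extra NULL terminator beyond the enum range. */
  assert(sizeof(message_table) / sizeof(message_table[0]) == MSG_LASTCODE + 1);
  if (code < 0 || code >= MSG_LASTCODE)
    code = MSG_NOMESSAGE;
  return message_table[code];
}
```

Adding an entry in the middle of `MESSAGE_LIST` would shift every later enum value by one; adding it at the end would not.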
7202 Index: jidctint.c
7203 ===================================================================
7204 --- jidctint.c (revision 829)
7205 +++ jidctint.c (working copy)
7206 @@ -2,6 +2,7 @@
7207 * jidctint.c
7208 *
7209 * Copyright (C) 1991-1998, Thomas G. Lane.
7210 + * Modification developed 2002-2009 by Guido Vollbeding.
7211 * This file is part of the Independent JPEG Group's software.
7212 * For conditions of distribution and use, see the accompanying README file.
7213 *
7214 @@ -23,6 +24,27 @@
7215 * The advantage of this method is that no data path contains more than one
7216 * multiplication; this allows a very simple and accurate implementation in
7217 * scaled fixed-point arithmetic, with a minimal number of shifts.
7218 + *
7219 + * We also provide IDCT routines with various output sample block sizes for
7220 + * direct resolution reduction or enlargement without additional resampling:
7221 + * NxN (N=1...16) pixels for one 8x8 input DCT block.
7222 + *
7223 + * For N<8 we simply take the corresponding low-frequency coefficients of
7224 + * the 8x8 input DCT block and apply an NxN point IDCT on the sub-block
7225 + * to yield the downscaled outputs.
7226 + * This can be seen as direct low-pass downsampling from the DCT domain
7227 + * point of view rather than the usual spatial domain point of view,
7228 + * yielding significant computational savings and results at least
7229 + * as good as common bilinear (averaging) spatial downsampling.
7230 + *
7231 + * For N>8 we apply a partial NxN IDCT on the 8 input coefficients as
7232 + * lower frequencies and higher frequencies assumed to be zero.
7233 + * It turns out that the computational effort is similar to the 8x8 IDCT
7234 + * regarding the output size.
7235 + * Furthermore, the scaling and descaling is the same for all IDCT sizes.
7236 + *
7237 + * CAUTION: We rely on the FIX() macro except for the N=1,2,4,8 cases
7238 + * since there would be too many additional constants to pre-calculate.
7239 */
7240
7241 #define JPEG_INTERNALS
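Review note on the jidctint.c header comment above: it says the routines use scaled fixed-point arithmetic, rely on `FIX()`, and share one scaling and descaling scheme across all IDCT sizes. A minimal sketch of that scheme follows. The values `CONST_BITS = 13` and `PASS1_BITS = 2` are assumptions (they are not shown in this excerpt), `FIX` is re-declared locally with `long` in place of `INT32`, and `descale`, `scale_by` and `dc_only_output` are hypothetical helper names. The sketch handles nonnegative inputs only, since right-shifting a negative value is implementation-defined in C.

```c
#include <assert.h>

#define CONST_BITS 13 /* assumed value; not shown in this patch excerpt */
#define PASS1_BITS 2  /* assumed value; not shown in this patch excerpt */

/* Convert a real constant to a scaled integer, rounding to nearest. */
#define FIX(x) ((long) ((x) * (1L << CONST_BITS) + 0.5))

/* Divide by 2^n with rounding: add half of the divisor ("fudge factor"),
 * then shift.  Restricted to nonnegative x in this sketch. */
static long descale(long x, int n)
{
  assert(x >= 0 && n >= 1);
  return (x + (1L << (n - 1))) >> n;
}

/* Multiply an integer sample by a real constant using only integer math. */
static long scale_by(long sample, long fixed_constant)
{
  return descale(sample * fixed_constant, CONST_BITS);
}

/* Scaling applied to a block whose only nonzero coefficient is the DC term:
 * pass 1 keeps PASS1_BITS extra fractional bits, pass 2 removes them and
 * divides by 8 (the "+3" in the final shift). */
static long dc_only_output(long dc)
{
  long pass1 = descale(dc << CONST_BITS, CONST_BITS - PASS1_BITS);
  return descale(pass1 << CONST_BITS, CONST_BITS + PASS1_BITS + 3);
}
```

Under these assumptions a dequantized DC value of 800 yields 100, that is DC/8, before range limiting; the same shift counts appear in the 7x7, 6x6 and 5x5 routines added by this patch.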
7242 @@ -38,7 +60,7 @@
7243 */
7244
7245 #if DCTSIZE != 8
7246 - Sorry, this code only copes with 8x8 DCTs. /* deliberate syntax err */
7247 + Sorry, this code only copes with 8x8 DCT blocks. /* deliberate syntax err */
7248 #endif
7249
7250
7251 @@ -386,4 +408,2216 @@
7252 }
7253 }
7254
7255 +#ifdef IDCT_SCALING_SUPPORTED
7256 +
7257 +
7258 +/*
7259 + * Perform dequantization and inverse DCT on one block of coefficients,
7260 + * producing a 7x7 output block.
7261 + *
7262 + * Optimized algorithm with 12 multiplications in the 1-D kernel.
7263 + * cK represents sqrt(2) * cos(K*pi/14).
7264 + */
7265 +
7266 +GLOBAL(void)
7267 +jpeg_idct_7x7 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
7268 + JCOEFPTR coef_block,
7269 + JSAMPARRAY output_buf, JDIMENSION output_col)
7270 +{
7271 + INT32 tmp0, tmp1, tmp2, tmp10, tmp11, tmp12, tmp13;
7272 + INT32 z1, z2, z3;
7273 + JCOEFPTR inptr;
7274 + ISLOW_MULT_TYPE * quantptr;
7275 + int * wsptr;
7276 + JSAMPROW outptr;
7277 + JSAMPLE *range_limit = IDCT_range_limit(cinfo);
7278 + int ctr;
7279 + int workspace[7*7]; /* buffers data between passes */
7280 + SHIFT_TEMPS
7281 +
7282 + /* Pass 1: process columns from input, store into work array. */
7283 +
7284 + inptr = coef_block;
7285 + quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
7286 + wsptr = workspace;
7287 + for (ctr = 0; ctr < 7; ctr++, inptr++, quantptr++, wsptr++) {
7288 + /* Even part */
7289 +
7290 + tmp13 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
7291 + tmp13 <<= CONST_BITS;
7292 + /* Add fudge factor here for final descale. */
7293 + tmp13 += ONE << (CONST_BITS-PASS1_BITS-1);
7294 +
7295 + z1 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
7296 + z2 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
7297 + z3 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);
7298 +
7299 + tmp10 = MULTIPLY(z2 - z3, FIX(0.881747734)); /* c4 */
7300 + tmp12 = MULTIPLY(z1 - z2, FIX(0.314692123)); /* c6 */
7301 + tmp11 = tmp10 + tmp12 + tmp13 - MULTIPLY(z2, FIX(1.841218003)); /* c2+c4-c6 */
7302 + tmp0 = z1 + z3;
7303 + z2 -= tmp0;
7304 + tmp0 = MULTIPLY(tmp0, FIX(1.274162392)) + tmp13; /* c2 */
7305 + tmp10 += tmp0 - MULTIPLY(z3, FIX(0.077722536)); /* c2-c4-c6 */
7306 + tmp12 += tmp0 - MULTIPLY(z1, FIX(2.470602249)); /* c2+c4+c6 */
7307 + tmp13 += MULTIPLY(z2, FIX(1.414213562)); /* c0 */
7308 +
7309 + /* Odd part */
7310 +
7311 + z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
7312 + z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
7313 + z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
7314 +
7315 + tmp1 = MULTIPLY(z1 + z2, FIX(0.935414347)); /* (c3+c1-c5)/2 */
7316 + tmp2 = MULTIPLY(z1 - z2, FIX(0.170262339)); /* (c3+c5-c1)/2 */
7317 + tmp0 = tmp1 - tmp2;
7318 + tmp1 += tmp2;
7319 + tmp2 = MULTIPLY(z2 + z3, - FIX(1.378756276)); /* -c1 */
7320 + tmp1 += tmp2;
7321 + z2 = MULTIPLY(z1 + z3, FIX(0.613604268)); /* c5 */
7322 + tmp0 += z2;
7323 + tmp2 += z2 + MULTIPLY(z3, FIX(1.870828693)); /* c3+c1-c5 */
7324 +
7325 + /* Final output stage */
7326 +
7327 + wsptr[7*0] = (int) RIGHT_SHIFT(tmp10 + tmp0, CONST_BITS-PASS1_BITS);
7328 + wsptr[7*6] = (int) RIGHT_SHIFT(tmp10 - tmp0, CONST_BITS-PASS1_BITS);
7329 + wsptr[7*1] = (int) RIGHT_SHIFT(tmp11 + tmp1, CONST_BITS-PASS1_BITS);
7330 + wsptr[7*5] = (int) RIGHT_SHIFT(tmp11 - tmp1, CONST_BITS-PASS1_BITS);
7331 + wsptr[7*2] = (int) RIGHT_SHIFT(tmp12 + tmp2, CONST_BITS-PASS1_BITS);
7332 + wsptr[7*4] = (int) RIGHT_SHIFT(tmp12 - tmp2, CONST_BITS-PASS1_BITS);
7333 + wsptr[7*3] = (int) RIGHT_SHIFT(tmp13, CONST_BITS-PASS1_BITS);
7334 + }
7335 +
7336 + /* Pass 2: process 7 rows from work array, store into output array. */
7337 +
7338 + wsptr = workspace;
7339 + for (ctr = 0; ctr < 7; ctr++) {
7340 + outptr = output_buf[ctr] + output_col;
7341 +
7342 + /* Even part */
7343 +
7344 + /* Add fudge factor here for final descale. */
7345 + tmp13 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
7346 + tmp13 <<= CONST_BITS;
7347 +
7348 + z1 = (INT32) wsptr[2];
7349 + z2 = (INT32) wsptr[4];
7350 + z3 = (INT32) wsptr[6];
7351 +
7352 + tmp10 = MULTIPLY(z2 - z3, FIX(0.881747734)); /* c4 */
7353 + tmp12 = MULTIPLY(z1 - z2, FIX(0.314692123)); /* c6 */
7354 + tmp11 = tmp10 + tmp12 + tmp13 - MULTIPLY(z2, FIX(1.841218003)); /* c2+c4-c6 */
7355 + tmp0 = z1 + z3;
7356 + z2 -= tmp0;
7357 + tmp0 = MULTIPLY(tmp0, FIX(1.274162392)) + tmp13; /* c2 */
7358 + tmp10 += tmp0 - MULTIPLY(z3, FIX(0.077722536)); /* c2-c4-c6 */
7359 + tmp12 += tmp0 - MULTIPLY(z1, FIX(2.470602249)); /* c2+c4+c6 */
7360 + tmp13 += MULTIPLY(z2, FIX(1.414213562)); /* c0 */
7361 +
7362 + /* Odd part */
7363 +
7364 + z1 = (INT32) wsptr[1];
7365 + z2 = (INT32) wsptr[3];
7366 + z3 = (INT32) wsptr[5];
7367 +
7368 + tmp1 = MULTIPLY(z1 + z2, FIX(0.935414347)); /* (c3+c1-c5)/2 */
7369 + tmp2 = MULTIPLY(z1 - z2, FIX(0.170262339)); /* (c3+c5-c1)/2 */
7370 + tmp0 = tmp1 - tmp2;
7371 + tmp1 += tmp2;
7372 + tmp2 = MULTIPLY(z2 + z3, - FIX(1.378756276)); /* -c1 */
7373 + tmp1 += tmp2;
7374 + z2 = MULTIPLY(z1 + z3, FIX(0.613604268)); /* c5 */
7375 + tmp0 += z2;
7376 + tmp2 += z2 + MULTIPLY(z3, FIX(1.870828693)); /* c3+c1-c5 */
7377 +
7378 + /* Final output stage */
7379 +
7380 + outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp0,
7381 + CONST_BITS+PASS1_BITS+3)
7382 + & RANGE_MASK];
7383 + outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp0,
7384 + CONST_BITS+PASS1_BITS+3)
7385 + & RANGE_MASK];
7386 + outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp11 + tmp1,
7387 + CONST_BITS+PASS1_BITS+3)
7388 + & RANGE_MASK];
7389 + outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp11 - tmp1,
7390 + CONST_BITS+PASS1_BITS+3)
7391 + & RANGE_MASK];
7392 + outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp12 + tmp2,
7393 + CONST_BITS+PASS1_BITS+3)
7394 + & RANGE_MASK];
7395 + outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp12 - tmp2,
7396 + CONST_BITS+PASS1_BITS+3)
7397 + & RANGE_MASK];
7398 + outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp13,
7399 + CONST_BITS+PASS1_BITS+3)
7400 + & RANGE_MASK];
7401 +
7402 + wsptr += 7; /* advance pointer to next row */
7403 + }
7404 +}
7405 +
7406 +
7407 +/*
7408 + * Perform dequantization and inverse DCT on one block of coefficients,
7409 + * producing a reduced-size 6x6 output block.
7410 + *
7411 + * Optimized algorithm with 3 multiplications in the 1-D kernel.
7412 + * cK represents sqrt(2) * cos(K*pi/12).
7413 + */
7414 +
7415 +GLOBAL(void)
7416 +jpeg_idct_6x6 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
7417 + JCOEFPTR coef_block,
7418 + JSAMPARRAY output_buf, JDIMENSION output_col)
7419 +{
7420 + INT32 tmp0, tmp1, tmp2, tmp10, tmp11, tmp12;
7421 + INT32 z1, z2, z3;
7422 + JCOEFPTR inptr;
7423 + ISLOW_MULT_TYPE * quantptr;
7424 + int * wsptr;
7425 + JSAMPROW outptr;
7426 + JSAMPLE *range_limit = IDCT_range_limit(cinfo);
7427 + int ctr;
7428 + int workspace[6*6]; /* buffers data between passes */
7429 + SHIFT_TEMPS
7430 +
7431 + /* Pass 1: process columns from input, store into work array. */
7432 +
7433 + inptr = coef_block;
7434 + quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
7435 + wsptr = workspace;
7436 + for (ctr = 0; ctr < 6; ctr++, inptr++, quantptr++, wsptr++) {
7437 + /* Even part */
7438 +
7439 + tmp0 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
7440 + tmp0 <<= CONST_BITS;
7441 + /* Add fudge factor here for final descale. */
7442 + tmp0 += ONE << (CONST_BITS-PASS1_BITS-1);
7443 + tmp2 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
7444 + tmp10 = MULTIPLY(tmp2, FIX(0.707106781)); /* c4 */
7445 + tmp1 = tmp0 + tmp10;
7446 + tmp11 = RIGHT_SHIFT(tmp0 - tmp10 - tmp10, CONST_BITS-PASS1_BITS);
7447 + tmp10 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
7448 + tmp0 = MULTIPLY(tmp10, FIX(1.224744871)); /* c2 */
7449 + tmp10 = tmp1 + tmp0;
7450 + tmp12 = tmp1 - tmp0;
7451 +
7452 + /* Odd part */
7453 +
7454 + z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
7455 + z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
7456 + z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
7457 + tmp1 = MULTIPLY(z1 + z3, FIX(0.366025404)); /* c5 */
7458 + tmp0 = tmp1 + ((z1 + z2) << CONST_BITS);
7459 + tmp2 = tmp1 + ((z3 - z2) << CONST_BITS);
7460 + tmp1 = (z1 - z2 - z3) << PASS1_BITS;
7461 +
7462 + /* Final output stage */
7463 +
7464 + wsptr[6*0] = (int) RIGHT_SHIFT(tmp10 + tmp0, CONST_BITS-PASS1_BITS);
7465 + wsptr[6*5] = (int) RIGHT_SHIFT(tmp10 - tmp0, CONST_BITS-PASS1_BITS);
7466 + wsptr[6*1] = (int) (tmp11 + tmp1);
7467 + wsptr[6*4] = (int) (tmp11 - tmp1);
7468 + wsptr[6*2] = (int) RIGHT_SHIFT(tmp12 + tmp2, CONST_BITS-PASS1_BITS);
7469 + wsptr[6*3] = (int) RIGHT_SHIFT(tmp12 - tmp2, CONST_BITS-PASS1_BITS);
7470 + }
7471 +
7472 + /* Pass 2: process 6 rows from work array, store into output array. */
7473 +
7474 + wsptr = workspace;
7475 + for (ctr = 0; ctr < 6; ctr++) {
7476 + outptr = output_buf[ctr] + output_col;
7477 +
7478 + /* Even part */
7479 +
7480 + /* Add fudge factor here for final descale. */
7481 + tmp0 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
7482 + tmp0 <<= CONST_BITS;
7483 + tmp2 = (INT32) wsptr[4];
7484 + tmp10 = MULTIPLY(tmp2, FIX(0.707106781)); /* c4 */
7485 + tmp1 = tmp0 + tmp10;
7486 + tmp11 = tmp0 - tmp10 - tmp10;
7487 + tmp10 = (INT32) wsptr[2];
7488 + tmp0 = MULTIPLY(tmp10, FIX(1.224744871)); /* c2 */
7489 + tmp10 = tmp1 + tmp0;
7490 + tmp12 = tmp1 - tmp0;
7491 +
7492 + /* Odd part */
7493 +
7494 + z1 = (INT32) wsptr[1];
7495 + z2 = (INT32) wsptr[3];
7496 + z3 = (INT32) wsptr[5];
7497 + tmp1 = MULTIPLY(z1 + z3, FIX(0.366025404)); /* c5 */
7498 + tmp0 = tmp1 + ((z1 + z2) << CONST_BITS);
7499 + tmp2 = tmp1 + ((z3 - z2) << CONST_BITS);
7500 + tmp1 = (z1 - z2 - z3) << CONST_BITS;
7501 +
7502 + /* Final output stage */
7503 +
7504 + outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp0,
7505 + CONST_BITS+PASS1_BITS+3)
7506 + & RANGE_MASK];
7507 + outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp0,
7508 + CONST_BITS+PASS1_BITS+3)
7509 + & RANGE_MASK];
7510 + outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp11 + tmp1,
7511 + CONST_BITS+PASS1_BITS+3)
7512 + & RANGE_MASK];
7513 + outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp11 - tmp1,
7514 + CONST_BITS+PASS1_BITS+3)
7515 + & RANGE_MASK];
7516 + outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp12 + tmp2,
7517 + CONST_BITS+PASS1_BITS+3)
7518 + & RANGE_MASK];
7519 + outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp12 - tmp2,
7520 + CONST_BITS+PASS1_BITS+3)
7521 + & RANGE_MASK];
7522 +
7523 + wsptr += 6; /* advance pointer to next row */
7524 + }
7525 +}
7526 +
7527 +
7528 +/*
7529 + * Perform dequantization and inverse DCT on one block of coefficients,
7530 + * producing a reduced-size 5x5 output block.
7531 + *
7532 + * Optimized algorithm with 5 multiplications in the 1-D kernel.
7533 + * cK represents sqrt(2) * cos(K*pi/10).
7534 + */
7535 +
7536 +GLOBAL(void)
7537 +jpeg_idct_5x5 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
7538 + JCOEFPTR coef_block,
7539 + JSAMPARRAY output_buf, JDIMENSION output_col)
7540 +{
7541 + INT32 tmp0, tmp1, tmp10, tmp11, tmp12;
7542 + INT32 z1, z2, z3;
7543 + JCOEFPTR inptr;
7544 + ISLOW_MULT_TYPE * quantptr;
7545 + int * wsptr;
7546 + JSAMPROW outptr;
7547 + JSAMPLE *range_limit = IDCT_range_limit(cinfo);
7548 + int ctr;
7549 + int workspace[5*5]; /* buffers data between passes */
7550 + SHIFT_TEMPS
7551 +
7552 + /* Pass 1: process columns from input, store into work array. */
7553 +
7554 + inptr = coef_block;
7555 + quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
7556 + wsptr = workspace;
7557 + for (ctr = 0; ctr < 5; ctr++, inptr++, quantptr++, wsptr++) {
7558 + /* Even part */
7559 +
7560 + tmp12 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
7561 + tmp12 <<= CONST_BITS;
7562 + /* Add fudge factor here for final descale. */
7563 + tmp12 += ONE << (CONST_BITS-PASS1_BITS-1);
7564 + tmp0 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
7565 + tmp1 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
7566 + z1 = MULTIPLY(tmp0 + tmp1, FIX(0.790569415)); /* (c2+c4)/2 */
7567 + z2 = MULTIPLY(tmp0 - tmp1, FIX(0.353553391)); /* (c2-c4)/2 */
7568 + z3 = tmp12 + z2;
7569 + tmp10 = z3 + z1;
7570 + tmp11 = z3 - z1;
7571 + tmp12 -= z2 << 2;
7572 +
7573 + /* Odd part */
7574 +
7575 + z2 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
7576 + z3 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
7577 +
7578 + z1 = MULTIPLY(z2 + z3, FIX(0.831253876)); /* c3 */
7579 + tmp0 = z1 + MULTIPLY(z2, FIX(0.513743148)); /* c1-c3 */
7580 + tmp1 = z1 - MULTIPLY(z3, FIX(2.176250899)); /* c1+c3 */
7581 +
7582 + /* Final output stage */
7583 +
7584 + wsptr[5*0] = (int) RIGHT_SHIFT(tmp10 + tmp0, CONST_BITS-PASS1_BITS);
7585 + wsptr[5*4] = (int) RIGHT_SHIFT(tmp10 - tmp0, CONST_BITS-PASS1_BITS);
7586 + wsptr[5*1] = (int) RIGHT_SHIFT(tmp11 + tmp1, CONST_BITS-PASS1_BITS);
7587 + wsptr[5*3] = (int) RIGHT_SHIFT(tmp11 - tmp1, CONST_BITS-PASS1_BITS);
7588 + wsptr[5*2] = (int) RIGHT_SHIFT(tmp12, CONST_BITS-PASS1_BITS);
7589 + }
7590 +
7591 + /* Pass 2: process 5 rows from work array, store into output array. */
7592 +
7593 + wsptr = workspace;
7594 + for (ctr = 0; ctr < 5; ctr++) {
7595 + outptr = output_buf[ctr] + output_col;
7596 +
7597 + /* Even part */
7598 +
7599 + /* Add fudge factor here for final descale. */
7600 + tmp12 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
7601 + tmp12 <<= CONST_BITS;
7602 + tmp0 = (INT32) wsptr[2];
7603 + tmp1 = (INT32) wsptr[4];
7604 + z1 = MULTIPLY(tmp0 + tmp1, FIX(0.790569415)); /* (c2+c4)/2 */
7605 + z2 = MULTIPLY(tmp0 - tmp1, FIX(0.353553391)); /* (c2-c4)/2 */
7606 + z3 = tmp12 + z2;
7607 + tmp10 = z3 + z1;
7608 + tmp11 = z3 - z1;
7609 + tmp12 -= z2 << 2;
7610 +
7611 + /* Odd part */
7612 +
7613 + z2 = (INT32) wsptr[1];
7614 + z3 = (INT32) wsptr[3];
7615 +
7616 + z1 = MULTIPLY(z2 + z3, FIX(0.831253876)); /* c3 */
7617 + tmp0 = z1 + MULTIPLY(z2, FIX(0.513743148)); /* c1-c3 */
7618 + tmp1 = z1 - MULTIPLY(z3, FIX(2.176250899)); /* c1+c3 */
7619 +
7620 + /* Final output stage */
7621 +
7622 + outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp0,
7623 + CONST_BITS+PASS1_BITS+3)
7624 + & RANGE_MASK];
7625 + outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp0,
7626 + CONST_BITS+PASS1_BITS+3)
7627 + & RANGE_MASK];
7628 + outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp11 + tmp1,
7629 + CONST_BITS+PASS1_BITS+3)
7630 + & RANGE_MASK];
7631 + outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp11 - tmp1,
7632 + CONST_BITS+PASS1_BITS+3)
7633 + & RANGE_MASK];
7634 + outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp12,
7635 + CONST_BITS+PASS1_BITS+3)
7636 + & RANGE_MASK];
7637 +
7638 + wsptr += 5; /* advance pointer to next row */
7639 + }
7640 +}
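Reviewer note: the two "fudge factor" additions in the 5x5 routine above are the rounding terms for the two descale shifts. Below is a minimal sketch of that arithmetic for a block whose only nonzero dequantized coefficient is the DC term. It assumes CONST_BITS = 13 and PASS1_BITS = 2 (neither value is shown in this hunk) and uses a hypothetical helper, `descale_floor`, in place of the RIGHT_SHIFT macro so that negative values are handled portably.

```c
#include <stdint.h>

#define CONST_BITS 13   /* assumed: not shown in this hunk */
#define PASS1_BITS 2    /* assumed: not shown in this hunk */

/* Hypothetical stand-in for RIGHT_SHIFT: floor division by 2^n,
 * written so it does not rely on >> of a negative value. */
static int32_t descale_floor(int32_t x, int n)
{
  int32_t d = (int32_t)1 << n;
  int32_t q = x / d;
  if (x % d < 0)
    q--;
  return q;
}

/* Value of every output sample, before the range_limit lookup,
 * when only the dequantized DC term `dc` is nonzero. */
int32_t dc_only_output(int32_t dc)
{
  int32_t t, ws;

  /* Pass 1: scale up, add the rounding term, descale to PASS1_BITS. */
  t = dc * ((int32_t)1 << CONST_BITS);
  t += (int32_t)1 << (CONST_BITS - PASS1_BITS - 1);
  ws = descale_floor(t, CONST_BITS - PASS1_BITS);

  /* Pass 2: add the rounding term first, scale up, then remove
   * CONST_BITS + PASS1_BITS + 3 bits in the final shift. */
  t = (ws + ((int32_t)1 << (PASS1_BITS + 2))) * ((int32_t)1 << CONST_BITS);
  return descale_floor(t, CONST_BITS + PASS1_BITS + 3);
}
```

Under these assumptions every sample comes out as dc/8 rounded to nearest, which is what the extra 3 bits in the final shift account for.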
7641 +
7642 +
7643 +/*
7644 + * Perform dequantization and inverse DCT on one block of coefficients,
7645 + * producing a reduced-size 3x3 output block.
7646 + *
7647 + * Optimized algorithm with 2 multiplications in the 1-D kernel.
7648 + * cK represents sqrt(2) * cos(K*pi/6).
7649 + */
7650 +
7651 +GLOBAL(void)
7652 +jpeg_idct_3x3 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
7653 + JCOEFPTR coef_block,
7654 + JSAMPARRAY output_buf, JDIMENSION output_col)
7655 +{
7656 + INT32 tmp0, tmp2, tmp10, tmp12;
7657 + JCOEFPTR inptr;
7658 + ISLOW_MULT_TYPE * quantptr;
7659 + int * wsptr;
7660 + JSAMPROW outptr;
7661 + JSAMPLE *range_limit = IDCT_range_limit(cinfo);
7662 + int ctr;
7663 + int workspace[3*3]; /* buffers data between passes */
7664 + SHIFT_TEMPS
7665 +
7666 + /* Pass 1: process columns from input, store into work array. */
7667 +
7668 + inptr = coef_block;
7669 + quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
7670 + wsptr = workspace;
7671 + for (ctr = 0; ctr < 3; ctr++, inptr++, quantptr++, wsptr++) {
7672 + /* Even part */
7673 +
7674 + tmp0 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
7675 + tmp0 <<= CONST_BITS;
7676 + /* Add fudge factor here for final descale. */
7677 + tmp0 += ONE << (CONST_BITS-PASS1_BITS-1);
7678 + tmp2 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
7679 + tmp12 = MULTIPLY(tmp2, FIX(0.707106781)); /* c2 */
7680 + tmp10 = tmp0 + tmp12;
7681 + tmp2 = tmp0 - tmp12 - tmp12;
7682 +
7683 + /* Odd part */
7684 +
7685 + tmp12 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
7686 + tmp0 = MULTIPLY(tmp12, FIX(1.224744871)); /* c1 */
7687 +
7688 + /* Final output stage */
7689 +
7690 + wsptr[3*0] = (int) RIGHT_SHIFT(tmp10 + tmp0, CONST_BITS-PASS1_BITS);
7691 + wsptr[3*2] = (int) RIGHT_SHIFT(tmp10 - tmp0, CONST_BITS-PASS1_BITS);
7692 + wsptr[3*1] = (int) RIGHT_SHIFT(tmp2, CONST_BITS-PASS1_BITS);
7693 + }
7694 +
7695 + /* Pass 2: process 3 rows from work array, store into output array. */
7696 +
7697 + wsptr = workspace;
7698 + for (ctr = 0; ctr < 3; ctr++) {
7699 + outptr = output_buf[ctr] + output_col;
7700 +
7701 + /* Even part */
7702 +
7703 + /* Add fudge factor here for final descale. */
7704 + tmp0 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
7705 + tmp0 <<= CONST_BITS;
7706 + tmp2 = (INT32) wsptr[2];
7707 + tmp12 = MULTIPLY(tmp2, FIX(0.707106781)); /* c2 */
7708 + tmp10 = tmp0 + tmp12;
7709 + tmp2 = tmp0 - tmp12 - tmp12;
7710 +
7711 + /* Odd part */
7712 +
7713 + tmp12 = (INT32) wsptr[1];
7714 + tmp0 = MULTIPLY(tmp12, FIX(1.224744871)); /* c1 */
7715 +
7716 + /* Final output stage */
7717 +
7718 + outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp0,
7719 + CONST_BITS+PASS1_BITS+3)
7720 + & RANGE_MASK];
7721 + outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp0,
7722 + CONST_BITS+PASS1_BITS+3)
7723 + & RANGE_MASK];
7724 + outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp2,
7725 + CONST_BITS+PASS1_BITS+3)
7726 + & RANGE_MASK];
7727 +
7728 + wsptr += 3; /* advance pointer to next row */
7729 + }
7730 +}
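Reviewer note: the 3-point kernel above is small enough to write out directly. This is a floating-point sketch of the same 1-D butterfly, with c1 and c2 taken from the constants in the comments. The name `idct3_1d` is hypothetical, and dequantization and the fixed-point scaling are left out on purpose.

```c
/* Floating-point version of the 3-point 1-D kernel; illustrative only. */
#define C1 1.224744871  /* sqrt(2) * cos(1*pi/6) */
#define C2 0.707106781  /* sqrt(2) * cos(2*pi/6) */

void idct3_1d(const double in[3], double out[3])
{
  /* Even part: DC term and coefficient 2 */
  double t12 = in[2] * C2;
  double t10 = in[0] + t12;
  double t2  = in[0] - t12 - t12;

  /* Odd part: coefficient 1 */
  double t0 = in[1] * C1;

  /* Output stage: samples 0 and 2 mirror each other around sample 1 */
  out[0] = t10 + t0;
  out[2] = t10 - t0;
  out[1] = t2;
}
```

The two multiplications here are the "2 multiplications in the 1-D kernel" mentioned in the function comment.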
7731 +
7732 +
7733 +/*
7734 + * Perform dequantization and inverse DCT on one block of coefficients,
7735 + * producing a 9x9 output block.
7736 + *
7737 + * Optimized algorithm with 10 multiplications in the 1-D kernel.
7738 + * cK represents sqrt(2) * cos(K*pi/18).
7739 + */
7740 +
7741 +GLOBAL(void)
7742 +jpeg_idct_9x9 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
7743 + JCOEFPTR coef_block,
7744 + JSAMPARRAY output_buf, JDIMENSION output_col)
7745 +{
7746 + INT32 tmp0, tmp1, tmp2, tmp3, tmp10, tmp11, tmp12, tmp13, tmp14;
7747 + INT32 z1, z2, z3, z4;
7748 + JCOEFPTR inptr;
7749 + ISLOW_MULT_TYPE * quantptr;
7750 + int * wsptr;
7751 + JSAMPROW outptr;
7752 + JSAMPLE *range_limit = IDCT_range_limit(cinfo);
7753 + int ctr;
7754 + int workspace[8*9]; /* buffers data between passes */
7755 + SHIFT_TEMPS
7756 +
7757 + /* Pass 1: process columns from input, store into work array. */
7758 +
7759 + inptr = coef_block;
7760 + quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
7761 + wsptr = workspace;
7762 + for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {
7763 + /* Even part */
7764 +
7765 + tmp0 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
7766 + tmp0 <<= CONST_BITS;
7767 + /* Add fudge factor here for final descale. */
7768 + tmp0 += ONE << (CONST_BITS-PASS1_BITS-1);
7769 +
7770 + z1 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
7771 + z2 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
7772 + z3 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);
7773 +
7774 + tmp3 = MULTIPLY(z3, FIX(0.707106781)); /* c6 */
7775 + tmp1 = tmp0 + tmp3;
7776 + tmp2 = tmp0 - tmp3 - tmp3;
7777 +
7778 + tmp0 = MULTIPLY(z1 - z2, FIX(0.707106781)); /* c6 */
7779 + tmp11 = tmp2 + tmp0;
7780 + tmp14 = tmp2 - tmp0 - tmp0;
7781 +
7782 + tmp0 = MULTIPLY(z1 + z2, FIX(1.328926049)); /* c2 */
7783 + tmp2 = MULTIPLY(z1, FIX(1.083350441)); /* c4 */
7784 + tmp3 = MULTIPLY(z2, FIX(0.245575608)); /* c8 */
7785 +
7786 + tmp10 = tmp1 + tmp0 - tmp3;
7787 + tmp12 = tmp1 - tmp0 + tmp2;
7788 + tmp13 = tmp1 - tmp2 + tmp3;
7789 +
7790 + /* Odd part */
7791 +
7792 + z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
7793 + z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
7794 + z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
7795 + z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);
7796 +
7797 + z2 = MULTIPLY(z2, - FIX(1.224744871)); /* -c3 */
7798 +
7799 + tmp2 = MULTIPLY(z1 + z3, FIX(0.909038955)); /* c5 */
7800 + tmp3 = MULTIPLY(z1 + z4, FIX(0.483689525)); /* c7 */
7801 + tmp0 = tmp2 + tmp3 - z2;
7802 + tmp1 = MULTIPLY(z3 - z4, FIX(1.392728481)); /* c1 */
7803 + tmp2 += z2 - tmp1;
7804 + tmp3 += z2 + tmp1;
7805 + tmp1 = MULTIPLY(z1 - z3 - z4, FIX(1.224744871)); /* c3 */
7806 +
7807 + /* Final output stage */
7808 +
7809 + wsptr[8*0] = (int) RIGHT_SHIFT(tmp10 + tmp0, CONST_BITS-PASS1_BITS);
7810 + wsptr[8*8] = (int) RIGHT_SHIFT(tmp10 - tmp0, CONST_BITS-PASS1_BITS);
7811 + wsptr[8*1] = (int) RIGHT_SHIFT(tmp11 + tmp1, CONST_BITS-PASS1_BITS);
7812 + wsptr[8*7] = (int) RIGHT_SHIFT(tmp11 - tmp1, CONST_BITS-PASS1_BITS);
7813 + wsptr[8*2] = (int) RIGHT_SHIFT(tmp12 + tmp2, CONST_BITS-PASS1_BITS);
7814 + wsptr[8*6] = (int) RIGHT_SHIFT(tmp12 - tmp2, CONST_BITS-PASS1_BITS);
7815 + wsptr[8*3] = (int) RIGHT_SHIFT(tmp13 + tmp3, CONST_BITS-PASS1_BITS);
7816 + wsptr[8*5] = (int) RIGHT_SHIFT(tmp13 - tmp3, CONST_BITS-PASS1_BITS);
7817 + wsptr[8*4] = (int) RIGHT_SHIFT(tmp14, CONST_BITS-PASS1_BITS);
7818 + }
7819 +
7820 + /* Pass 2: process 9 rows from work array, store into output array. */
7821 +
7822 + wsptr = workspace;
7823 + for (ctr = 0; ctr < 9; ctr++) {
7824 + outptr = output_buf[ctr] + output_col;
7825 +
7826 + /* Even part */
7827 +
7828 + /* Add fudge factor here for final descale. */
7829 + tmp0 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
7830 + tmp0 <<= CONST_BITS;
7831 +
7832 + z1 = (INT32) wsptr[2];
7833 + z2 = (INT32) wsptr[4];
7834 + z3 = (INT32) wsptr[6];
7835 +
7836 + tmp3 = MULTIPLY(z3, FIX(0.707106781)); /* c6 */
7837 + tmp1 = tmp0 + tmp3;
7838 + tmp2 = tmp0 - tmp3 - tmp3;
7839 +
7840 + tmp0 = MULTIPLY(z1 - z2, FIX(0.707106781)); /* c6 */
7841 + tmp11 = tmp2 + tmp0;
7842 + tmp14 = tmp2 - tmp0 - tmp0;
7843 +
7844 + tmp0 = MULTIPLY(z1 + z2, FIX(1.328926049)); /* c2 */
7845 + tmp2 = MULTIPLY(z1, FIX(1.083350441)); /* c4 */
7846 + tmp3 = MULTIPLY(z2, FIX(0.245575608)); /* c8 */
7847 +
7848 + tmp10 = tmp1 + tmp0 - tmp3;
7849 + tmp12 = tmp1 - tmp0 + tmp2;
7850 + tmp13 = tmp1 - tmp2 + tmp3;
7851 +
7852 + /* Odd part */
7853 +
7854 + z1 = (INT32) wsptr[1];
7855 + z2 = (INT32) wsptr[3];
7856 + z3 = (INT32) wsptr[5];
7857 + z4 = (INT32) wsptr[7];
7858 +
7859 + z2 = MULTIPLY(z2, - FIX(1.224744871)); /* -c3 */
7860 +
7861 + tmp2 = MULTIPLY(z1 + z3, FIX(0.909038955)); /* c5 */
7862 + tmp3 = MULTIPLY(z1 + z4, FIX(0.483689525)); /* c7 */
7863 + tmp0 = tmp2 + tmp3 - z2;
7864 + tmp1 = MULTIPLY(z3 - z4, FIX(1.392728481)); /* c1 */
7865 + tmp2 += z2 - tmp1;
7866 + tmp3 += z2 + tmp1;
7867 + tmp1 = MULTIPLY(z1 - z3 - z4, FIX(1.224744871)); /* c3 */
7868 +
7869 + /* Final output stage */
7870 +
7871 + outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp0,
7872 + CONST_BITS+PASS1_BITS+3)
7873 + & RANGE_MASK];
7874 + outptr[8] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp0,
7875 + CONST_BITS+PASS1_BITS+3)
7876 + & RANGE_MASK];
7877 + outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp11 + tmp1,
7878 + CONST_BITS+PASS1_BITS+3)
7879 + & RANGE_MASK];
7880 + outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp11 - tmp1,
7881 + CONST_BITS+PASS1_BITS+3)
7882 + & RANGE_MASK];
7883 + outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp12 + tmp2,
7884 + CONST_BITS+PASS1_BITS+3)
7885 + & RANGE_MASK];
7886 + outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp12 - tmp2,
7887 + CONST_BITS+PASS1_BITS+3)
7888 + & RANGE_MASK];
7889 + outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp13 + tmp3,
7890 + CONST_BITS+PASS1_BITS+3)
7891 + & RANGE_MASK];
7892 + outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp13 - tmp3,
7893 + CONST_BITS+PASS1_BITS+3)
7894 + & RANGE_MASK];
7895 + outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp14,
7896 + CONST_BITS+PASS1_BITS+3)
7897 + & RANGE_MASK];
7898 +
7899 + wsptr += 8; /* advance pointer to next row */
7900 + }
7901 +}
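Reviewer note: the literals passed to FIX() in these routines become integers scaled by 2^CONST_BITS. This sketch shows that conversion, assuming CONST_BITS = 13 and the usual round-to-nearest definition of FIX (neither is shown in this hunk). `fix13` and `mul_descale` are hypothetical names used only for illustration.

```c
#include <stdint.h>

#define CONST_BITS 13  /* assumed: not shown in this hunk */

/* Convert a positive real constant to a CONST_BITS fixed-point integer. */
int32_t fix13(double x)
{
  return (int32_t)(x * (double)((int32_t)1 << CONST_BITS) + 0.5);
}

/* Multiply by a fixed-point constant and remove the scaling again,
 * rounding to nearest. Intended for small non-negative products only,
 * so the right shift stays portable and nothing overflows. */
int32_t mul_descale(int32_t v, int32_t k)
{
  return (v * k + ((int32_t)1 << (CONST_BITS - 1))) >> CONST_BITS;
}
```

Note that the routines above do not descale after each multiply. They accumulate the scaled products and shift once in the final output stage; `mul_descale` only shows what a single scaled constant represents.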
7902 +
7903 +
7904 +/*
7905 + * Perform dequantization and inverse DCT on one block of coefficients,
7906 + * producing a 10x10 output block.
7907 + *
7908 + * Optimized algorithm with 12 multiplications in the 1-D kernel.
7909 + * cK represents sqrt(2) * cos(K*pi/20).
7910 + */
7911 +
7912 +GLOBAL(void)
7913 +jpeg_idct_10x10 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
7914 + JCOEFPTR coef_block,
7915 + JSAMPARRAY output_buf, JDIMENSION output_col)
7916 +{
7917 + INT32 tmp10, tmp11, tmp12, tmp13, tmp14;
7918 + INT32 tmp20, tmp21, tmp22, tmp23, tmp24;
7919 + INT32 z1, z2, z3, z4, z5;
7920 + JCOEFPTR inptr;
7921 + ISLOW_MULT_TYPE * quantptr;
7922 + int * wsptr;
7923 + JSAMPROW outptr;
7924 + JSAMPLE *range_limit = IDCT_range_limit(cinfo);
7925 + int ctr;
7926 + int workspace[8*10]; /* buffers data between passes */
7927 + SHIFT_TEMPS
7928 +
7929 + /* Pass 1: process columns from input, store into work array. */
7930 +
7931 + inptr = coef_block;
7932 + quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
7933 + wsptr = workspace;
7934 + for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {
7935 + /* Even part */
7936 +
7937 + z3 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
7938 + z3 <<= CONST_BITS;
7939 + /* Add fudge factor here for final descale. */
7940 + z3 += ONE << (CONST_BITS-PASS1_BITS-1);
7941 + z4 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
7942 + z1 = MULTIPLY(z4, FIX(1.144122806)); /* c4 */
7943 + z2 = MULTIPLY(z4, FIX(0.437016024)); /* c8 */
7944 + tmp10 = z3 + z1;
7945 + tmp11 = z3 - z2;
7946 +
7947 + tmp22 = RIGHT_SHIFT(z3 - ((z1 - z2) << 1), /* c0 = (c4-c8)*2 */
7948 + CONST_BITS-PASS1_BITS);
7949 +
7950 + z2 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
7951 + z3 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);
7952 +
7953 + z1 = MULTIPLY(z2 + z3, FIX(0.831253876)); /* c6 */
7954 + tmp12 = z1 + MULTIPLY(z2, FIX(0.513743148)); /* c2-c6 */
7955 + tmp13 = z1 - MULTIPLY(z3, FIX(2.176250899)); /* c2+c6 */
7956 +
7957 + tmp20 = tmp10 + tmp12;
7958 + tmp24 = tmp10 - tmp12;
7959 + tmp21 = tmp11 + tmp13;
7960 + tmp23 = tmp11 - tmp13;
7961 +
7962 + /* Odd part */
7963 +
7964 + z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
7965 + z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
7966 + z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
7967 + z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);
7968 +
7969 + tmp11 = z2 + z4;
7970 + tmp13 = z2 - z4;
7971 +
7972 + tmp12 = MULTIPLY(tmp13, FIX(0.309016994)); /* (c3-c7)/2 */
7973 + z5 = z3 << CONST_BITS;
7974 +
7975 + z2 = MULTIPLY(tmp11, FIX(0.951056516)); /* (c3+c7)/2 */
7976 + z4 = z5 + tmp12;
7977 +
7978 + tmp10 = MULTIPLY(z1, FIX(1.396802247)) + z2 + z4; /* c1 */
7979 + tmp14 = MULTIPLY(z1, FIX(0.221231742)) - z2 + z4; /* c9 */
7980 +
7981 + z2 = MULTIPLY(tmp11, FIX(0.587785252)); /* (c1-c9)/2 */
7982 + z4 = z5 - tmp12 - (tmp13 << (CONST_BITS - 1));
7983 +
7984 + tmp12 = (z1 - tmp13 - z3) << PASS1_BITS;
7985 +
7986 + tmp11 = MULTIPLY(z1, FIX(1.260073511)) - z2 - z4; /* c3 */
7987 + tmp13 = MULTIPLY(z1, FIX(0.642039522)) - z2 + z4; /* c7 */
7988 +
7989 + /* Final output stage */
7990 +
7991 + wsptr[8*0] = (int) RIGHT_SHIFT(tmp20 + tmp10, CONST_BITS-PASS1_BITS);
7992 + wsptr[8*9] = (int) RIGHT_SHIFT(tmp20 - tmp10, CONST_BITS-PASS1_BITS);
7993 + wsptr[8*1] = (int) RIGHT_SHIFT(tmp21 + tmp11, CONST_BITS-PASS1_BITS);
7994 + wsptr[8*8] = (int) RIGHT_SHIFT(tmp21 - tmp11, CONST_BITS-PASS1_BITS);
7995 + wsptr[8*2] = (int) (tmp22 + tmp12);
7996 + wsptr[8*7] = (int) (tmp22 - tmp12);
7997 + wsptr[8*3] = (int) RIGHT_SHIFT(tmp23 + tmp13, CONST_BITS-PASS1_BITS);
7998 + wsptr[8*6] = (int) RIGHT_SHIFT(tmp23 - tmp13, CONST_BITS-PASS1_BITS);
7999 + wsptr[8*4] = (int) RIGHT_SHIFT(tmp24 + tmp14, CONST_BITS-PASS1_BITS);
8000 + wsptr[8*5] = (int) RIGHT_SHIFT(tmp24 - tmp14, CONST_BITS-PASS1_BITS);
8001 + }
8002 +
8003 + /* Pass 2: process 10 rows from work array, store into output array. */
8004 +
8005 + wsptr = workspace;
8006 + for (ctr = 0; ctr < 10; ctr++) {
8007 + outptr = output_buf[ctr] + output_col;
8008 +
8009 + /* Even part */
8010 +
8011 + /* Add fudge factor here for final descale. */
8012 + z3 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
8013 + z3 <<= CONST_BITS;
8014 + z4 = (INT32) wsptr[4];
8015 + z1 = MULTIPLY(z4, FIX(1.144122806)); /* c4 */
8016 + z2 = MULTIPLY(z4, FIX(0.437016024)); /* c8 */
8017 + tmp10 = z3 + z1;
8018 + tmp11 = z3 - z2;
8019 +
8020 + tmp22 = z3 - ((z1 - z2) << 1); /* c0 = (c4-c8)*2 */
8021 +
8022 + z2 = (INT32) wsptr[2];
8023 + z3 = (INT32) wsptr[6];
8024 +
8025 + z1 = MULTIPLY(z2 + z3, FIX(0.831253876)); /* c6 */
8026 + tmp12 = z1 + MULTIPLY(z2, FIX(0.513743148)); /* c2-c6 */
8027 + tmp13 = z1 - MULTIPLY(z3, FIX(2.176250899)); /* c2+c6 */
8028 +
8029 + tmp20 = tmp10 + tmp12;
8030 + tmp24 = tmp10 - tmp12;
8031 + tmp21 = tmp11 + tmp13;
8032 + tmp23 = tmp11 - tmp13;
8033 +
8034 + /* Odd part */
8035 +
8036 + z1 = (INT32) wsptr[1];
8037 + z2 = (INT32) wsptr[3];
8038 + z3 = (INT32) wsptr[5];
8039 + z3 <<= CONST_BITS;
8040 + z4 = (INT32) wsptr[7];
8041 +
8042 + tmp11 = z2 + z4;
8043 + tmp13 = z2 - z4;
8044 +
8045 + tmp12 = MULTIPLY(tmp13, FIX(0.309016994)); /* (c3-c7)/2 */
8046 +
8047 + z2 = MULTIPLY(tmp11, FIX(0.951056516)); /* (c3+c7)/2 */
8048 + z4 = z3 + tmp12;
8049 +
8050 + tmp10 = MULTIPLY(z1, FIX(1.396802247)) + z2 + z4; /* c1 */
8051 + tmp14 = MULTIPLY(z1, FIX(0.221231742)) - z2 + z4; /* c9 */
8052 +
8053 + z2 = MULTIPLY(tmp11, FIX(0.587785252)); /* (c1-c9)/2 */
8054 + z4 = z3 - tmp12 - (tmp13 << (CONST_BITS - 1));
8055 +
8056 + tmp12 = ((z1 - tmp13) << CONST_BITS) - z3;
8057 +
8058 + tmp11 = MULTIPLY(z1, FIX(1.260073511)) - z2 - z4; /* c3 */
8059 + tmp13 = MULTIPLY(z1, FIX(0.642039522)) - z2 + z4; /* c7 */
8060 +
8061 + /* Final output stage */
8062 +
8063 + outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp10,
8064 + CONST_BITS+PASS1_BITS+3)
8065 + & RANGE_MASK];
8066 + outptr[9] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp10,
8067 + CONST_BITS+PASS1_BITS+3)
8068 + & RANGE_MASK];
8069 + outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp11,
8070 + CONST_BITS+PASS1_BITS+3)
8071 + & RANGE_MASK];
8072 + outptr[8] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp11,
8073 + CONST_BITS+PASS1_BITS+3)
8074 + & RANGE_MASK];
8075 + outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp12,
8076 + CONST_BITS+PASS1_BITS+3)
8077 + & RANGE_MASK];
8078 + outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp12,
8079 + CONST_BITS+PASS1_BITS+3)
8080 + & RANGE_MASK];
8081 + outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp13,
8082 + CONST_BITS+PASS1_BITS+3)
8083 + & RANGE_MASK];
8084 + outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp13,
8085 + CONST_BITS+PASS1_BITS+3)
8086 + & RANGE_MASK];
8087 + outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp14,
8088 + CONST_BITS+PASS1_BITS+3)
8089 + & RANGE_MASK];
8090 + outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp14,
8091 + CONST_BITS+PASS1_BITS+3)
8092 + & RANGE_MASK];
8093 +
8094 + wsptr += 8; /* advance pointer to next row */
8095 + }
8096 +}
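Reviewer note: every output line indexes range_limit with the descaled value masked by RANGE_MASK. As I understand it, for 8-bit samples the net effect on values near the valid range is a level shift by 128 followed by a clamp to 0..255. The sketch below shows that effect, not the table itself. `clamp_sample` is a hypothetical name, and the constants 128 and 255 assume 8-bit JSAMPLEs.

```c
/* Level shift and clamp: the effect the table lookup is meant to have
 * for 8-bit samples (assumption; the table is not shown in this hunk). */
unsigned char clamp_sample(int v)
{
  int s = v + 128;   /* restore the offset removed before the forward DCT */
  if (s < 0)
    s = 0;
  if (s > 255)
    s = 255;
  return (unsigned char) s;
}
```

The table lookup avoids the two branches per sample, which is presumably why the code masks and indexes instead of comparing.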
8097 +
8098 +
8099 +/*
8100 + * Perform dequantization and inverse DCT on one block of coefficients,
8101 + * producing an 11x11 output block.
8102 + *
8103 + * Optimized algorithm with 24 multiplications in the 1-D kernel.
8104 + * cK represents sqrt(2) * cos(K*pi/22).
8105 + */
8106 +
8107 +GLOBAL(void)
8108 +jpeg_idct_11x11 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
8109 + JCOEFPTR coef_block,
8110 + JSAMPARRAY output_buf, JDIMENSION output_col)
8111 +{
8112 + INT32 tmp10, tmp11, tmp12, tmp13, tmp14;
8113 + INT32 tmp20, tmp21, tmp22, tmp23, tmp24, tmp25;
8114 + INT32 z1, z2, z3, z4;
8115 + JCOEFPTR inptr;
8116 + ISLOW_MULT_TYPE * quantptr;
8117 + int * wsptr;
8118 + JSAMPROW outptr;
8119 + JSAMPLE *range_limit = IDCT_range_limit(cinfo);
8120 + int ctr;
8121 + int workspace[8*11]; /* buffers data between passes */
8122 + SHIFT_TEMPS
8123 +
8124 + /* Pass 1: process columns from input, store into work array. */
8125 +
8126 + inptr = coef_block;
8127 + quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
8128 + wsptr = workspace;
8129 + for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {
8130 + /* Even part */
8131 +
8132 + tmp10 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
8133 + tmp10 <<= CONST_BITS;
8134 + /* Add fudge factor here for final descale. */
8135 + tmp10 += ONE << (CONST_BITS-PASS1_BITS-1);
8136 +
8137 + z1 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
8138 + z2 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
8139 + z3 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);
8140 +
8141 + tmp20 = MULTIPLY(z2 - z3, FIX(2.546640132)); /* c2+c4 */
8142 + tmp23 = MULTIPLY(z2 - z1, FIX(0.430815045)); /* c2-c6 */
8143 + z4 = z1 + z3;
8144 + tmp24 = MULTIPLY(z4, - FIX(1.155664402)); /* -(c2-c10) */
8145 + z4 -= z2;
8146 + tmp25 = tmp10 + MULTIPLY(z4, FIX(1.356927976)); /* c2 */
8147 + tmp21 = tmp20 + tmp23 + tmp25 -
8148 + MULTIPLY(z2, FIX(1.821790775)); /* c2+c4+c10-c6 */
8149 + tmp20 += tmp25 + MULTIPLY(z3, FIX(2.115825087)); /* c4+c6 */
8150 + tmp23 += tmp25 - MULTIPLY(z1, FIX(1.513598477)); /* c6+c8 */
8151 + tmp24 += tmp25;
8152 + tmp22 = tmp24 - MULTIPLY(z3, FIX(0.788749120)); /* c8+c10 */
8153 + tmp24 += MULTIPLY(z2, FIX(1.944413522)) - /* c2+c8 */
8154 + MULTIPLY(z1, FIX(1.390975730)); /* c4+c10 */
8155 + tmp25 = tmp10 - MULTIPLY(z4, FIX(1.414213562)); /* c0 */
8156 +
8157 + /* Odd part */
8158 +
8159 + z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
8160 + z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
8161 + z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
8162 + z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);
8163 +
8164 + tmp11 = z1 + z2;
8165 + tmp14 = MULTIPLY(tmp11 + z3 + z4, FIX(0.398430003)); /* c9 */
8166 + tmp11 = MULTIPLY(tmp11, FIX(0.887983902)); /* c3-c9 */
8167 + tmp12 = MULTIPLY(z1 + z3, FIX(0.670361295)); /* c5-c9 */
8168 + tmp13 = tmp14 + MULTIPLY(z1 + z4, FIX(0.366151574)); /* c7-c9 */
8169 + tmp10 = tmp11 + tmp12 + tmp13 -
8170 + MULTIPLY(z1, FIX(0.923107866)); /* c7+c5+c3-c1-2*c9 */
8171 + z1 = tmp14 - MULTIPLY(z2 + z3, FIX(1.163011579)); /* c7+c9 */
8172 + tmp11 += z1 + MULTIPLY(z2, FIX(2.073276588)); /* c1+c7+3*c9-c3 */
8173 + tmp12 += z1 - MULTIPLY(z3, FIX(1.192193623)); /* c3+c5-c7-c9 */
8174 + z1 = MULTIPLY(z2 + z4, - FIX(1.798248910)); /* -(c1+c9) */
8175 + tmp11 += z1;
8176 + tmp13 += z1 + MULTIPLY(z4, FIX(2.102458632)); /* c1+c5+c9-c7 */
8177 + tmp14 += MULTIPLY(z2, - FIX(1.467221301)) + /* -(c5+c9) */
8178 + MULTIPLY(z3, FIX(1.001388905)) - /* c1-c9 */
8179 + MULTIPLY(z4, FIX(1.684843907)); /* c3+c9 */
8180 +
8181 + /* Final output stage */
8182 +
8183 + wsptr[8*0] = (int) RIGHT_SHIFT(tmp20 + tmp10, CONST_BITS-PASS1_BITS);
8184 + wsptr[8*10] = (int) RIGHT_SHIFT(tmp20 - tmp10, CONST_BITS-PASS1_BITS);
8185 + wsptr[8*1] = (int) RIGHT_SHIFT(tmp21 + tmp11, CONST_BITS-PASS1_BITS);
8186 + wsptr[8*9] = (int) RIGHT_SHIFT(tmp21 - tmp11, CONST_BITS-PASS1_BITS);
8187 + wsptr[8*2] = (int) RIGHT_SHIFT(tmp22 + tmp12, CONST_BITS-PASS1_BITS);
8188 + wsptr[8*8] = (int) RIGHT_SHIFT(tmp22 - tmp12, CONST_BITS-PASS1_BITS);
8189 + wsptr[8*3] = (int) RIGHT_SHIFT(tmp23 + tmp13, CONST_BITS-PASS1_BITS);
8190 + wsptr[8*7] = (int) RIGHT_SHIFT(tmp23 - tmp13, CONST_BITS-PASS1_BITS);
8191 + wsptr[8*4] = (int) RIGHT_SHIFT(tmp24 + tmp14, CONST_BITS-PASS1_BITS);
8192 + wsptr[8*6] = (int) RIGHT_SHIFT(tmp24 - tmp14, CONST_BITS-PASS1_BITS);
8193 + wsptr[8*5] = (int) RIGHT_SHIFT(tmp25, CONST_BITS-PASS1_BITS);
8194 + }
8195 +
8196 + /* Pass 2: process 11 rows from work array, store into output array. */
8197 +
8198 + wsptr = workspace;
8199 + for (ctr = 0; ctr < 11; ctr++) {
8200 + outptr = output_buf[ctr] + output_col;
8201 +
8202 + /* Even part */
8203 +
8204 + /* Add fudge factor here for final descale. */
8205 + tmp10 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
8206 + tmp10 <<= CONST_BITS;
8207 +
8208 + z1 = (INT32) wsptr[2];
8209 + z2 = (INT32) wsptr[4];
8210 + z3 = (INT32) wsptr[6];
8211 +
8212 + tmp20 = MULTIPLY(z2 - z3, FIX(2.546640132)); /* c2+c4 */
8213 + tmp23 = MULTIPLY(z2 - z1, FIX(0.430815045)); /* c2-c6 */
8214 + z4 = z1 + z3;
8215 + tmp24 = MULTIPLY(z4, - FIX(1.155664402)); /* -(c2-c10) */
8216 + z4 -= z2;
8217 + tmp25 = tmp10 + MULTIPLY(z4, FIX(1.356927976)); /* c2 */
8218 + tmp21 = tmp20 + tmp23 + tmp25 -
8219 + MULTIPLY(z2, FIX(1.821790775)); /* c2+c4+c10-c6 */
8220 + tmp20 += tmp25 + MULTIPLY(z3, FIX(2.115825087)); /* c4+c6 */
8221 + tmp23 += tmp25 - MULTIPLY(z1, FIX(1.513598477)); /* c6+c8 */
8222 + tmp24 += tmp25;
8223 + tmp22 = tmp24 - MULTIPLY(z3, FIX(0.788749120)); /* c8+c10 */
8224 + tmp24 += MULTIPLY(z2, FIX(1.944413522)) - /* c2+c8 */
8225 + MULTIPLY(z1, FIX(1.390975730)); /* c4+c10 */
8226 + tmp25 = tmp10 - MULTIPLY(z4, FIX(1.414213562)); /* c0 */
8227 +
8228 + /* Odd part */
8229 +
8230 + z1 = (INT32) wsptr[1];
8231 + z2 = (INT32) wsptr[3];
8232 + z3 = (INT32) wsptr[5];
8233 + z4 = (INT32) wsptr[7];
8234 +
8235 + tmp11 = z1 + z2;
8236 + tmp14 = MULTIPLY(tmp11 + z3 + z4, FIX(0.398430003)); /* c9 */
8237 + tmp11 = MULTIPLY(tmp11, FIX(0.887983902)); /* c3-c9 */
8238 + tmp12 = MULTIPLY(z1 + z3, FIX(0.670361295)); /* c5-c9 */
8239 + tmp13 = tmp14 + MULTIPLY(z1 + z4, FIX(0.366151574)); /* c7-c9 */
8240 + tmp10 = tmp11 + tmp12 + tmp13 -
8241 + MULTIPLY(z1, FIX(0.923107866)); /* c7+c5+c3-c1-2*c9 */
8242 + z1 = tmp14 - MULTIPLY(z2 + z3, FIX(1.163011579)); /* c7+c9 */
8243 + tmp11 += z1 + MULTIPLY(z2, FIX(2.073276588)); /* c1+c7+3*c9-c3 */
8244 + tmp12 += z1 - MULTIPLY(z3, FIX(1.192193623)); /* c3+c5-c7-c9 */
8245 + z1 = MULTIPLY(z2 + z4, - FIX(1.798248910)); /* -(c1+c9) */
8246 + tmp11 += z1;
8247 + tmp13 += z1 + MULTIPLY(z4, FIX(2.102458632)); /* c1+c5+c9-c7 */
8248 + tmp14 += MULTIPLY(z2, - FIX(1.467221301)) + /* -(c5+c9) */
8249 + MULTIPLY(z3, FIX(1.001388905)) - /* c1-c9 */
8250 + MULTIPLY(z4, FIX(1.684843907)); /* c3+c9 */
8251 +
8252 + /* Final output stage */
8253 +
8254 + outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp10,
8255 + CONST_BITS+PASS1_BITS+3)
8256 + & RANGE_MASK];
8257 + outptr[10] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp10,
8258 + CONST_BITS+PASS1_BITS+3)
8259 + & RANGE_MASK];
8260 + outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp11,
8261 + CONST_BITS+PASS1_BITS+3)
8262 + & RANGE_MASK];
8263 + outptr[9] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp11,
8264 + CONST_BITS+PASS1_BITS+3)
8265 + & RANGE_MASK];
8266 + outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp12,
8267 + CONST_BITS+PASS1_BITS+3)
8268 + & RANGE_MASK];
8269 + outptr[8] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp12,
8270 + CONST_BITS+PASS1_BITS+3)
8271 + & RANGE_MASK];
8272 + outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp13,
8273 + CONST_BITS+PASS1_BITS+3)
8274 + & RANGE_MASK];
8275 + outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp13,
8276 + CONST_BITS+PASS1_BITS+3)
8277 + & RANGE_MASK];
8278 + outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp14,
8279 + CONST_BITS+PASS1_BITS+3)
8280 + & RANGE_MASK];
8281 + outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp14,
8282 + CONST_BITS+PASS1_BITS+3)
8283 + & RANGE_MASK];
8284 + outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp25,
8285 + CONST_BITS+PASS1_BITS+3)
8286 + & RANGE_MASK];
8287 +
8288 + wsptr += 8; /* advance pointer to next row */
8289 + }
8290 +}
8291 +
8292 +
8293 +/*
8294 + * Perform dequantization and inverse DCT on one block of coefficients,
8295 + * producing a 12x12 output block.
8296 + *
8297 + * Optimized algorithm with 15 multiplications in the 1-D kernel.
8298 + * cK represents sqrt(2) * cos(K*pi/24).
8299 + */
8300 +
8301 +GLOBAL(void)
8302 +jpeg_idct_12x12 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
8303 + JCOEFPTR coef_block,
8304 + JSAMPARRAY output_buf, JDIMENSION output_col)
8305 +{
8306 + INT32 tmp10, tmp11, tmp12, tmp13, tmp14, tmp15;
8307 + INT32 tmp20, tmp21, tmp22, tmp23, tmp24, tmp25;
8308 + INT32 z1, z2, z3, z4;
8309 + JCOEFPTR inptr;
8310 + ISLOW_MULT_TYPE * quantptr;
8311 + int * wsptr;
8312 + JSAMPROW outptr;
8313 + JSAMPLE *range_limit = IDCT_range_limit(cinfo);
8314 + int ctr;
8315 + int workspace[8*12]; /* buffers data between passes */
8316 + SHIFT_TEMPS
8317 +
8318 + /* Pass 1: process columns from input, store into work array. */
8319 +
8320 + inptr = coef_block;
8321 + quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
8322 + wsptr = workspace;
8323 + for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {
8324 + /* Even part */
8325 +
8326 + z3 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
8327 + z3 <<= CONST_BITS;
8328 + /* Add fudge factor here for final descale. */
8329 + z3 += ONE << (CONST_BITS-PASS1_BITS-1);
8330 +
8331 + z4 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
8332 + z4 = MULTIPLY(z4, FIX(1.224744871)); /* c4 */
8333 +
8334 + tmp10 = z3 + z4;
8335 + tmp11 = z3 - z4;
8336 +
8337 + z1 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
8338 + z4 = MULTIPLY(z1, FIX(1.366025404)); /* c2 */
8339 + z1 <<= CONST_BITS;
8340 + z2 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);
8341 + z2 <<= CONST_BITS;
8342 +
8343 + tmp12 = z1 - z2;
8344 +
8345 + tmp21 = z3 + tmp12;
8346 + tmp24 = z3 - tmp12;
8347 +
8348 + tmp12 = z4 + z2;
8349 +
8350 + tmp20 = tmp10 + tmp12;
8351 + tmp25 = tmp10 - tmp12;
8352 +
8353 + tmp12 = z4 - z1 - z2;
8354 +
8355 + tmp22 = tmp11 + tmp12;
8356 + tmp23 = tmp11 - tmp12;
8357 +
8358 + /* Odd part */
8359 +
8360 + z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
8361 + z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
8362 + z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
8363 + z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);
8364 +
8365 + tmp11 = MULTIPLY(z2, FIX(1.306562965)); /* c3 */
8366 + tmp14 = MULTIPLY(z2, - FIX_0_541196100); /* -c9 */
8367 +
8368 + tmp10 = z1 + z3;
8369 + tmp15 = MULTIPLY(tmp10 + z4, FIX(0.860918669)); /* c7 */
8370 + tmp12 = tmp15 + MULTIPLY(tmp10, FIX(0.261052384)); /* c5-c7 */
8371 + tmp10 = tmp12 + tmp11 + MULTIPLY(z1, FIX(0.280143716)); /* c1-c5 */
8372 + tmp13 = MULTIPLY(z3 + z4, - FIX(1.045510580)); /* -(c7+c11) */
8373 + tmp12 += tmp13 + tmp14 - MULTIPLY(z3, FIX(1.478575242)); /* c1+c5-c7-c11 */
8374 + tmp13 += tmp15 - tmp11 + MULTIPLY(z4, FIX(1.586706681)); /* c1+c11 */
8375 + tmp15 += tmp14 - MULTIPLY(z1, FIX(0.676326758)) - /* c7-c11 */
8376 + MULTIPLY(z4, FIX(1.982889723)); /* c5+c7 */
8377 +
8378 + z1 -= z4;
8379 + z2 -= z3;
8380 + z3 = MULTIPLY(z1 + z2, FIX_0_541196100); /* c9 */
8381 + tmp11 = z3 + MULTIPLY(z1, FIX_0_765366865); /* c3-c9 */
8382 + tmp14 = z3 - MULTIPLY(z2, FIX_1_847759065); /* c3+c9 */
8383 +
8384 + /* Final output stage */
8385 +
8386 + wsptr[8*0] = (int) RIGHT_SHIFT(tmp20 + tmp10, CONST_BITS-PASS1_BITS);
8387 + wsptr[8*11] = (int) RIGHT_SHIFT(tmp20 - tmp10, CONST_BITS-PASS1_BITS);
8388 + wsptr[8*1] = (int) RIGHT_SHIFT(tmp21 + tmp11, CONST_BITS-PASS1_BITS);
8389 + wsptr[8*10] = (int) RIGHT_SHIFT(tmp21 - tmp11, CONST_BITS-PASS1_BITS);
8390 + wsptr[8*2] = (int) RIGHT_SHIFT(tmp22 + tmp12, CONST_BITS-PASS1_BITS);
8391 + wsptr[8*9] = (int) RIGHT_SHIFT(tmp22 - tmp12, CONST_BITS-PASS1_BITS);
8392 + wsptr[8*3] = (int) RIGHT_SHIFT(tmp23 + tmp13, CONST_BITS-PASS1_BITS);
8393 + wsptr[8*8] = (int) RIGHT_SHIFT(tmp23 - tmp13, CONST_BITS-PASS1_BITS);
8394 + wsptr[8*4] = (int) RIGHT_SHIFT(tmp24 + tmp14, CONST_BITS-PASS1_BITS);
8395 + wsptr[8*7] = (int) RIGHT_SHIFT(tmp24 - tmp14, CONST_BITS-PASS1_BITS);
8396 + wsptr[8*5] = (int) RIGHT_SHIFT(tmp25 + tmp15, CONST_BITS-PASS1_BITS);
8397 + wsptr[8*6] = (int) RIGHT_SHIFT(tmp25 - tmp15, CONST_BITS-PASS1_BITS);
8398 + }
8399 +
8400 + /* Pass 2: process 12 rows from work array, store into output array. */
8401 +
8402 + wsptr = workspace;
8403 + for (ctr = 0; ctr < 12; ctr++) {
8404 + outptr = output_buf[ctr] + output_col;
8405 +
8406 + /* Even part */
8407 +
8408 + /* Add fudge factor here for final descale. */
8409 + z3 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
8410 + z3 <<= CONST_BITS;
8411 +
8412 + z4 = (INT32) wsptr[4];
8413 + z4 = MULTIPLY(z4, FIX(1.224744871)); /* c4 */
8414 +
8415 + tmp10 = z3 + z4;
8416 + tmp11 = z3 - z4;
8417 +
8418 + z1 = (INT32) wsptr[2];
8419 + z4 = MULTIPLY(z1, FIX(1.366025404)); /* c2 */
8420 + z1 <<= CONST_BITS;
8421 + z2 = (INT32) wsptr[6];
8422 + z2 <<= CONST_BITS;
8423 +
8424 + tmp12 = z1 - z2;
8425 +
8426 + tmp21 = z3 + tmp12;
8427 + tmp24 = z3 - tmp12;
8428 +
8429 + tmp12 = z4 + z2;
8430 +
8431 + tmp20 = tmp10 + tmp12;
8432 + tmp25 = tmp10 - tmp12;
8433 +
8434 + tmp12 = z4 - z1 - z2;
8435 +
8436 + tmp22 = tmp11 + tmp12;
8437 + tmp23 = tmp11 - tmp12;
8438 +
8439 + /* Odd part */
8440 +
8441 + z1 = (INT32) wsptr[1];
8442 + z2 = (INT32) wsptr[3];
8443 + z3 = (INT32) wsptr[5];
8444 + z4 = (INT32) wsptr[7];
8445 +
8446 + tmp11 = MULTIPLY(z2, FIX(1.306562965)); /* c3 */
8447 + tmp14 = MULTIPLY(z2, - FIX_0_541196100); /* -c9 */
8448 +
8449 + tmp10 = z1 + z3;
8450 + tmp15 = MULTIPLY(tmp10 + z4, FIX(0.860918669)); /* c7 */
8451 + tmp12 = tmp15 + MULTIPLY(tmp10, FIX(0.261052384)); /* c5-c7 */
8452 + tmp10 = tmp12 + tmp11 + MULTIPLY(z1, FIX(0.280143716)); /* c1-c5 */
8453 + tmp13 = MULTIPLY(z3 + z4, - FIX(1.045510580)); /* -(c7+c11) */
8454 + tmp12 += tmp13 + tmp14 - MULTIPLY(z3, FIX(1.478575242)); /* c1+c5-c7-c11 */
8455 + tmp13 += tmp15 - tmp11 + MULTIPLY(z4, FIX(1.586706681)); /* c1+c11 */
8456 + tmp15 += tmp14 - MULTIPLY(z1, FIX(0.676326758)) - /* c7-c11 */
8457 + MULTIPLY(z4, FIX(1.982889723)); /* c5+c7 */
8458 +
8459 + z1 -= z4;
8460 + z2 -= z3;
8461 + z3 = MULTIPLY(z1 + z2, FIX_0_541196100); /* c9 */
8462 + tmp11 = z3 + MULTIPLY(z1, FIX_0_765366865); /* c3-c9 */
8463 + tmp14 = z3 - MULTIPLY(z2, FIX_1_847759065); /* c3+c9 */
8464 +
8465 + /* Final output stage */
8466 +
8467 + outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp10,
8468 + CONST_BITS+PASS1_BITS+3)
8469 + & RANGE_MASK];
8470 + outptr[11] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp10,
8471 + CONST_BITS+PASS1_BITS+3)
8472 + & RANGE_MASK];
8473 + outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp11,
8474 + CONST_BITS+PASS1_BITS+3)
8475 + & RANGE_MASK];
8476 + outptr[10] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp11,
8477 + CONST_BITS+PASS1_BITS+3)
8478 + & RANGE_MASK];
8479 + outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp12,
8480 + CONST_BITS+PASS1_BITS+3)
8481 + & RANGE_MASK];
8482 + outptr[9] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp12,
8483 + CONST_BITS+PASS1_BITS+3)
8484 + & RANGE_MASK];
8485 + outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp13,
8486 + CONST_BITS+PASS1_BITS+3)
8487 + & RANGE_MASK];
8488 + outptr[8] = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp13,
8489 + CONST_BITS+PASS1_BITS+3)
8490 + & RANGE_MASK];
8491 + outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp14,
8492 + CONST_BITS+PASS1_BITS+3)
8493 + & RANGE_MASK];
8494 + outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp14,
8495 + CONST_BITS+PASS1_BITS+3)
8496 + & RANGE_MASK];
8497 + outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp25 + tmp15,
8498 + CONST_BITS+PASS1_BITS+3)
8499 + & RANGE_MASK];
8500 + outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp25 - tmp15,
8501 + CONST_BITS+PASS1_BITS+3)
8502 + & RANGE_MASK];
8503 +
8504 + wsptr += 8; /* advance pointer to next row */
8505 + }
8506 +}
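The kernels in this hunk use scaled-integer arithmetic: constants are pre-multiplied by 2^CONST_BITS, and the "fudge factor" comments refer to adding half of the final divisor so that the closing right shift rounds to nearest instead of truncating. The sketch below illustrates that idea for the DC term of pass 2, assuming the jidctint.c defaults for 8-bit samples (CONST_BITS = 13, PASS1_BITS = 2). The helper names fix, floor_shift, pass2_dc_fused, pass2_dc_explicit and dc_forms_agree are hypothetical and exist only for this illustration; they are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

#define CONST_BITS 13   /* assumed: jidctint.c default for 8-bit samples */
#define PASS1_BITS 2    /* assumed: jidctint.c default for 8-bit samples */

/* Scale a real constant to fixed point, rounding to nearest (mirrors FIX()). */
static int32_t fix(double x)
{
  return (int32_t) (x * (double) (1L << CONST_BITS) + 0.5);
}

/* Floor division by 2^n that does not depend on how the compiler shifts
 * negative values (the library wraps this concern in RIGHT_SHIFT). */
static int32_t floor_shift(int32_t x, int n)
{
  int32_t bias = ((int32_t) 1 << n) - 1;
  return (x >= 0) ? (x >> n) : -((-x + bias) >> n);
}

/* Pass-2 style: add the rounding term first, then scale by 2^CONST_BITS. */
static int32_t pass2_dc_fused(int32_t ws)
{
  int32_t z = ws + ((int32_t) 1 << (PASS1_BITS + 2));
  z *= ((int32_t) 1 << CONST_BITS);
  return floor_shift(z, CONST_BITS + PASS1_BITS + 3);
}

/* Same value written the long way: scale, add half the divisor, shift. */
static int32_t pass2_dc_explicit(int32_t ws)
{
  int32_t z = ws * ((int32_t) 1 << CONST_BITS);
  z += (int32_t) 1 << (CONST_BITS + PASS1_BITS + 2);
  return floor_shift(z, CONST_BITS + PASS1_BITS + 3);
}

/* Returns 1 when both forms agree for every workspace value in [lo, hi]. */
static int dc_forms_agree(int32_t lo, int32_t hi)
{
  int32_t ws;
  for (ws = lo; ws <= hi; ws++)
    if (pass2_dc_fused(ws) != pass2_dc_explicit(ws))
      return 0;
  return 1;
}

/* Self-check; call from a test driver. */
static void check_fixed_point(void)
{
  assert(fix(0.541196100) == 4433);   /* value of FIX_0_541196100 */
  assert(pass2_dc_fused(16) == 1);    /* 16/32 rounds up to 1 */
  assert(pass2_dc_fused(15) == 0);    /* 15/32 rounds down to 0 */
  assert(dc_forms_agree(-4096, 4096));
}
```

Adding the rounding term before the `<<= CONST_BITS` step, as the pass-2 code does, saves one addition on a wide value and gives the same result as the explicit form, which is what dc_forms_agree checks.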
8507 +
8508 +
8509 +/*
8510 + * Perform dequantization and inverse DCT on one block of coefficients,
8511 + * producing a 13x13 output block.
8512 + *
8513 + * Optimized algorithm with 29 multiplications in the 1-D kernel.
8514 + * cK represents sqrt(2) * cos(K*pi/26).
8515 + */
8516 +
8517 +GLOBAL(void)
8518 +jpeg_idct_13x13 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
8519 + JCOEFPTR coef_block,
8520 + JSAMPARRAY output_buf, JDIMENSION output_col)
8521 +{
8522 + INT32 tmp10, tmp11, tmp12, tmp13, tmp14, tmp15;
8523 + INT32 tmp20, tmp21, tmp22, tmp23, tmp24, tmp25, tmp26;
8524 + INT32 z1, z2, z3, z4;
8525 + JCOEFPTR inptr;
8526 + ISLOW_MULT_TYPE * quantptr;
8527 + int * wsptr;
8528 + JSAMPROW outptr;
8529 + JSAMPLE *range_limit = IDCT_range_limit(cinfo);
8530 + int ctr;
8531 + int workspace[8*13]; /* buffers data between passes */
8532 + SHIFT_TEMPS
8533 +
8534 + /* Pass 1: process columns from input, store into work array. */
8535 +
8536 + inptr = coef_block;
8537 + quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
8538 + wsptr = workspace;
8539 + for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {
8540 + /* Even part */
8541 +
8542 + z1 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
8543 + z1 <<= CONST_BITS;
8544 + /* Add fudge factor here for final descale. */
8545 + z1 += ONE << (CONST_BITS-PASS1_BITS-1);
8546 +
8547 + z2 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
8548 + z3 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
8549 + z4 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);
8550 +
8551 + tmp10 = z3 + z4;
8552 + tmp11 = z3 - z4;
8553 +
8554 + tmp12 = MULTIPLY(tmp10, FIX(1.155388986)); /* (c4+c6)/2 */
8555 + tmp13 = MULTIPLY(tmp11, FIX(0.096834934)) + z1; /* (c4-c6)/2 */
8556 +
8557 + tmp20 = MULTIPLY(z2, FIX(1.373119086)) + tmp12 + tmp13; /* c2 */
8558 + tmp22 = MULTIPLY(z2, FIX(0.501487041)) - tmp12 + tmp13; /* c10 */
8559 +
8560 + tmp12 = MULTIPLY(tmp10, FIX(0.316450131)); /* (c8-c12)/2 */
8561 + tmp13 = MULTIPLY(tmp11, FIX(0.486914739)) + z1; /* (c8+c12)/2 */
8562 +
8563 + tmp21 = MULTIPLY(z2, FIX(1.058554052)) - tmp12 + tmp13; /* c6 */
8564 + tmp25 = MULTIPLY(z2, - FIX(1.252223920)) + tmp12 + tmp13; /* c4 */
8565 +
8566 + tmp12 = MULTIPLY(tmp10, FIX(0.435816023)); /* (c2-c10)/2 */
8567 + tmp13 = MULTIPLY(tmp11, FIX(0.937303064)) - z1; /* (c2+c10)/2 */
8568 +
8569 + tmp23 = MULTIPLY(z2, - FIX(0.170464608)) - tmp12 - tmp13; /* c12 */
8570 + tmp24 = MULTIPLY(z2, - FIX(0.803364869)) + tmp12 - tmp13; /* c8 */
8571 +
8572 + tmp26 = MULTIPLY(tmp11 - z2, FIX(1.414213562)) + z1; /* c0 */
8573 +
8574 + /* Odd part */
8575 +
8576 + z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
8577 + z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
8578 + z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
8579 + z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);
8580 +
8581 + tmp11 = MULTIPLY(z1 + z2, FIX(1.322312651)); /* c3 */
8582 + tmp12 = MULTIPLY(z1 + z3, FIX(1.163874945)); /* c5 */
8583 + tmp15 = z1 + z4;
8584 + tmp13 = MULTIPLY(tmp15, FIX(0.937797057)); /* c7 */
8585 + tmp10 = tmp11 + tmp12 + tmp13 -
8586 + MULTIPLY(z1, FIX(2.020082300)); /* c7+c5+c3-c1 */
8587 + tmp14 = MULTIPLY(z2 + z3, - FIX(0.338443458)); /* -c11 */
8588 + tmp11 += tmp14 + MULTIPLY(z2, FIX(0.837223564)); /* c5+c9+c11-c3 */
8589 + tmp12 += tmp14 - MULTIPLY(z3, FIX(1.572116027)); /* c1+c5-c9-c11 */
8590 + tmp14 = MULTIPLY(z2 + z4, - FIX(1.163874945)); /* -c5 */
8591 + tmp11 += tmp14;
8592 + tmp13 += tmp14 + MULTIPLY(z4, FIX(2.205608352)); /* c3+c5+c9-c7 */
8593 + tmp14 = MULTIPLY(z3 + z4, - FIX(0.657217813)); /* -c9 */
8594 + tmp12 += tmp14;
8595 + tmp13 += tmp14;
8596 + tmp15 = MULTIPLY(tmp15, FIX(0.338443458)); /* c11 */
8597 + tmp14 = tmp15 + MULTIPLY(z1, FIX(0.318774355)) - /* c9-c11 */
8598 + MULTIPLY(z2, FIX(0.466105296)); /* c1-c7 */
8599 + z1 = MULTIPLY(z3 - z2, FIX(0.937797057)); /* c7 */
8600 + tmp14 += z1;
8601 + tmp15 += z1 + MULTIPLY(z3, FIX(0.384515595)) - /* c3-c7 */
8602 + MULTIPLY(z4, FIX(1.742345811)); /* c1+c11 */
8603 +
8604 + /* Final output stage */
8605 +
8606 + wsptr[8*0] = (int) RIGHT_SHIFT(tmp20 + tmp10, CONST_BITS-PASS1_BITS);
8607 + wsptr[8*12] = (int) RIGHT_SHIFT(tmp20 - tmp10, CONST_BITS-PASS1_BITS);
8608 + wsptr[8*1] = (int) RIGHT_SHIFT(tmp21 + tmp11, CONST_BITS-PASS1_BITS);
8609 + wsptr[8*11] = (int) RIGHT_SHIFT(tmp21 - tmp11, CONST_BITS-PASS1_BITS);
8610 + wsptr[8*2] = (int) RIGHT_SHIFT(tmp22 + tmp12, CONST_BITS-PASS1_BITS);
8611 + wsptr[8*10] = (int) RIGHT_SHIFT(tmp22 - tmp12, CONST_BITS-PASS1_BITS);
8612 + wsptr[8*3] = (int) RIGHT_SHIFT(tmp23 + tmp13, CONST_BITS-PASS1_BITS);
8613 + wsptr[8*9] = (int) RIGHT_SHIFT(tmp23 - tmp13, CONST_BITS-PASS1_BITS);
8614 + wsptr[8*4] = (int) RIGHT_SHIFT(tmp24 + tmp14, CONST_BITS-PASS1_BITS);
8615 + wsptr[8*8] = (int) RIGHT_SHIFT(tmp24 - tmp14, CONST_BITS-PASS1_BITS);
8616 + wsptr[8*5] = (int) RIGHT_SHIFT(tmp25 + tmp15, CONST_BITS-PASS1_BITS);
8617 + wsptr[8*7] = (int) RIGHT_SHIFT(tmp25 - tmp15, CONST_BITS-PASS1_BITS);
8618 + wsptr[8*6] = (int) RIGHT_SHIFT(tmp26, CONST_BITS-PASS1_BITS);
8619 + }
8620 +
8621 + /* Pass 2: process 13 rows from work array, store into output array. */
8622 +
8623 + wsptr = workspace;
8624 + for (ctr = 0; ctr < 13; ctr++) {
8625 + outptr = output_buf[ctr] + output_col;
8626 +
8627 + /* Even part */
8628 +
8629 + /* Add fudge factor here for final descale. */
8630 + z1 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
8631 + z1 <<= CONST_BITS;
8632 +
8633 + z2 = (INT32) wsptr[2];
8634 + z3 = (INT32) wsptr[4];
8635 + z4 = (INT32) wsptr[6];
8636 +
8637 + tmp10 = z3 + z4;
8638 + tmp11 = z3 - z4;
8639 +
8640 + tmp12 = MULTIPLY(tmp10, FIX(1.155388986)); /* (c4+c6)/2 */
8641 + tmp13 = MULTIPLY(tmp11, FIX(0.096834934)) + z1; /* (c4-c6)/2 */
8642 +
8643 + tmp20 = MULTIPLY(z2, FIX(1.373119086)) + tmp12 + tmp13; /* c2 */
8644 + tmp22 = MULTIPLY(z2, FIX(0.501487041)) - tmp12 + tmp13; /* c10 */
8645 +
8646 + tmp12 = MULTIPLY(tmp10, FIX(0.316450131)); /* (c8-c12)/2 */
8647 + tmp13 = MULTIPLY(tmp11, FIX(0.486914739)) + z1; /* (c8+c12)/2 */
8648 +
8649 + tmp21 = MULTIPLY(z2, FIX(1.058554052)) - tmp12 + tmp13; /* c6 */
8650 + tmp25 = MULTIPLY(z2, - FIX(1.252223920)) + tmp12 + tmp13; /* c4 */
8651 +
8652 + tmp12 = MULTIPLY(tmp10, FIX(0.435816023)); /* (c2-c10)/2 */
8653 + tmp13 = MULTIPLY(tmp11, FIX(0.937303064)) - z1; /* (c2+c10)/2 */
8654 +
8655 + tmp23 = MULTIPLY(z2, - FIX(0.170464608)) - tmp12 - tmp13; /* c12 */
8656 + tmp24 = MULTIPLY(z2, - FIX(0.803364869)) + tmp12 - tmp13; /* c8 */
8657 +
8658 + tmp26 = MULTIPLY(tmp11 - z2, FIX(1.414213562)) + z1; /* c0 */
8659 +
8660 + /* Odd part */
8661 +
8662 + z1 = (INT32) wsptr[1];
8663 + z2 = (INT32) wsptr[3];
8664 + z3 = (INT32) wsptr[5];
8665 + z4 = (INT32) wsptr[7];
8666 +
8667 + tmp11 = MULTIPLY(z1 + z2, FIX(1.322312651)); /* c3 */
8668 + tmp12 = MULTIPLY(z1 + z3, FIX(1.163874945)); /* c5 */
8669 + tmp15 = z1 + z4;
8670 + tmp13 = MULTIPLY(tmp15, FIX(0.937797057)); /* c7 */
8671 + tmp10 = tmp11 + tmp12 + tmp13 -
8672 + MULTIPLY(z1, FIX(2.020082300)); /* c7+c5+c3-c1 */
8673 + tmp14 = MULTIPLY(z2 + z3, - FIX(0.338443458)); /* -c11 */
8674 + tmp11 += tmp14 + MULTIPLY(z2, FIX(0.837223564)); /* c5+c9+c11-c3 */
8675 + tmp12 += tmp14 - MULTIPLY(z3, FIX(1.572116027)); /* c1+c5-c9-c11 */
8676 + tmp14 = MULTIPLY(z2 + z4, - FIX(1.163874945)); /* -c5 */
8677 + tmp11 += tmp14;
8678 + tmp13 += tmp14 + MULTIPLY(z4, FIX(2.205608352)); /* c3+c5+c9-c7 */
8679 + tmp14 = MULTIPLY(z3 + z4, - FIX(0.657217813)); /* -c9 */
8680 + tmp12 += tmp14;
8681 + tmp13 += tmp14;
8682 + tmp15 = MULTIPLY(tmp15, FIX(0.338443458)); /* c11 */
8683 + tmp14 = tmp15 + MULTIPLY(z1, FIX(0.318774355)) - /* c9-c11 */
8684 + MULTIPLY(z2, FIX(0.466105296)); /* c1-c7 */
8685 + z1 = MULTIPLY(z3 - z2, FIX(0.937797057)); /* c7 */
8686 + tmp14 += z1;
8687 + tmp15 += z1 + MULTIPLY(z3, FIX(0.384515595)) - /* c3-c7 */
8688 + MULTIPLY(z4, FIX(1.742345811)); /* c1+c11 */
8689 +
8690 + /* Final output stage */
8691 +
8692 + outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp10,
8693 + CONST_BITS+PASS1_BITS+3)
8694 + & RANGE_MASK];
8695 + outptr[12] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp10,
8696 + CONST_BITS+PASS1_BITS+3)
8697 + & RANGE_MASK];
8698 + outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp11,
8699 + CONST_BITS+PASS1_BITS+3)
8700 + & RANGE_MASK];
8701 + outptr[11] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp11,
8702 + CONST_BITS+PASS1_BITS+3)
8703 + & RANGE_MASK];
8704 + outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp12,
8705 + CONST_BITS+PASS1_BITS+3)
8706 + & RANGE_MASK];
8707 + outptr[10] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp12,
8708 + CONST_BITS+PASS1_BITS+3)
8709 + & RANGE_MASK];
8710 + outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp13,
8711 + CONST_BITS+PASS1_BITS+3)
8712 + & RANGE_MASK];
8713 + outptr[9] = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp13,
8714 + CONST_BITS+PASS1_BITS+3)
8715 + & RANGE_MASK];
8716 + outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp14,
8717 + CONST_BITS+PASS1_BITS+3)
8718 + & RANGE_MASK];
8719 + outptr[8] = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp14,
8720 + CONST_BITS+PASS1_BITS+3)
8721 + & RANGE_MASK];
8722 + outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp25 + tmp15,
8723 + CONST_BITS+PASS1_BITS+3)
8724 + & RANGE_MASK];
8725 + outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp25 - tmp15,
8726 + CONST_BITS+PASS1_BITS+3)
8727 + & RANGE_MASK];
8728 + outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp26,
8729 + CONST_BITS+PASS1_BITS+3)
8730 + & RANGE_MASK];
8731 +
8732 + wsptr += 8; /* advance pointer to next row */
8733 + }
8734 +}
8735 +
8736 +
8737 +/*
8738 + * Perform dequantization and inverse DCT on one block of coefficients,
8739 + * producing a 14x14 output block.
8740 + *
8741 + * Optimized algorithm with 20 multiplications in the 1-D kernel.
8742 + * cK represents sqrt(2) * cos(K*pi/28).
8743 + */
8744 +
8745 +GLOBAL(void)
8746 +jpeg_idct_14x14 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
8747 + JCOEFPTR coef_block,
8748 + JSAMPARRAY output_buf, JDIMENSION output_col)
8749 +{
8750 + INT32 tmp10, tmp11, tmp12, tmp13, tmp14, tmp15, tmp16;
8751 + INT32 tmp20, tmp21, tmp22, tmp23, tmp24, tmp25, tmp26;
8752 + INT32 z1, z2, z3, z4;
8753 + JCOEFPTR inptr;
8754 + ISLOW_MULT_TYPE * quantptr;
8755 + int * wsptr;
8756 + JSAMPROW outptr;
8757 + JSAMPLE *range_limit = IDCT_range_limit(cinfo);
8758 + int ctr;
8759 + int workspace[8*14]; /* buffers data between passes */
8760 + SHIFT_TEMPS
8761 +
8762 + /* Pass 1: process columns from input, store into work array. */
8763 +
8764 + inptr = coef_block;
8765 + quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
8766 + wsptr = workspace;
8767 + for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {
8768 + /* Even part */
8769 +
8770 + z1 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
8771 + z1 <<= CONST_BITS;
8772 + /* Add fudge factor here for final descale. */
8773 + z1 += ONE << (CONST_BITS-PASS1_BITS-1);
8774 + z4 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
8775 + z2 = MULTIPLY(z4, FIX(1.274162392)); /* c4 */
8776 + z3 = MULTIPLY(z4, FIX(0.314692123)); /* c12 */
8777 + z4 = MULTIPLY(z4, FIX(0.881747734)); /* c8 */
8778 +
8779 + tmp10 = z1 + z2;
8780 + tmp11 = z1 + z3;
8781 + tmp12 = z1 - z4;
8782 +
8783 + tmp23 = RIGHT_SHIFT(z1 - ((z2 + z3 - z4) << 1), /* c0 = (c4+c12-c8)*2 */
8784 + CONST_BITS-PASS1_BITS);
8785 +
8786 + z1 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
8787 + z2 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);
8788 +
8789 + z3 = MULTIPLY(z1 + z2, FIX(1.105676686)); /* c6 */
8790 +
8791 + tmp13 = z3 + MULTIPLY(z1, FIX(0.273079590)); /* c2-c6 */
8792 + tmp14 = z3 - MULTIPLY(z2, FIX(1.719280954)); /* c6+c10 */
8793 + tmp15 = MULTIPLY(z1, FIX(0.613604268)) - /* c10 */
8794 + MULTIPLY(z2, FIX(1.378756276)); /* c2 */
8795 +
8796 + tmp20 = tmp10 + tmp13;
8797 + tmp26 = tmp10 - tmp13;
8798 + tmp21 = tmp11 + tmp14;
8799 + tmp25 = tmp11 - tmp14;
8800 + tmp22 = tmp12 + tmp15;
8801 + tmp24 = tmp12 - tmp15;
8802 +
8803 + /* Odd part */
8804 +
8805 + z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
8806 + z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
8807 + z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
8808 + z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);
8809 + tmp13 = z4 << CONST_BITS;
8810 +
8811 + tmp14 = z1 + z3;
8812 + tmp11 = MULTIPLY(z1 + z2, FIX(1.334852607)); /* c3 */
8813 + tmp12 = MULTIPLY(tmp14, FIX(1.197448846)); /* c5 */
8814 + tmp10 = tmp11 + tmp12 + tmp13 - MULTIPLY(z1, FIX(1.126980169)); /* c3+c5-c1 */
8815 + tmp14 = MULTIPLY(tmp14, FIX(0.752406978)); /* c9 */
8816 + tmp16 = tmp14 - MULTIPLY(z1, FIX(1.061150426)); /* c9+c11-c13 */
8817 + z1 -= z2;
8818 + tmp15 = MULTIPLY(z1, FIX(0.467085129)) - tmp13; /* c11 */
8819 + tmp16 += tmp15;
8820 + z1 += z4;
8821 + z4 = MULTIPLY(z2 + z3, - FIX(0.158341681)) - tmp13; /* -c13 */
8822 + tmp11 += z4 - MULTIPLY(z2, FIX(0.424103948)); /* c3-c9-c13 */
8823 + tmp12 += z4 - MULTIPLY(z3, FIX(2.373959773)); /* c3+c5-c13 */
8824 + z4 = MULTIPLY(z3 - z2, FIX(1.405321284)); /* c1 */
8825 + tmp14 += z4 + tmp13 - MULTIPLY(z3, FIX(1.6906431334)); /* c1+c9-c11 */
8826 + tmp15 += z4 + MULTIPLY(z2, FIX(0.674957567)); /* c1+c11-c5 */
8827 +
8828 + tmp13 = (z1 - z3) << PASS1_BITS;
8829 +
8830 + /* Final output stage */
8831 +
8832 + wsptr[8*0] = (int) RIGHT_SHIFT(tmp20 + tmp10, CONST_BITS-PASS1_BITS);
8833 + wsptr[8*13] = (int) RIGHT_SHIFT(tmp20 - tmp10, CONST_BITS-PASS1_BITS);
8834 + wsptr[8*1] = (int) RIGHT_SHIFT(tmp21 + tmp11, CONST_BITS-PASS1_BITS);
8835 + wsptr[8*12] = (int) RIGHT_SHIFT(tmp21 - tmp11, CONST_BITS-PASS1_BITS);
8836 + wsptr[8*2] = (int) RIGHT_SHIFT(tmp22 + tmp12, CONST_BITS-PASS1_BITS);
8837 + wsptr[8*11] = (int) RIGHT_SHIFT(tmp22 - tmp12, CONST_BITS-PASS1_BITS);
8838 + wsptr[8*3] = (int) (tmp23 + tmp13);
8839 + wsptr[8*10] = (int) (tmp23 - tmp13);
8840 + wsptr[8*4] = (int) RIGHT_SHIFT(tmp24 + tmp14, CONST_BITS-PASS1_BITS);
8841 + wsptr[8*9] = (int) RIGHT_SHIFT(tmp24 - tmp14, CONST_BITS-PASS1_BITS);
8842 + wsptr[8*5] = (int) RIGHT_SHIFT(tmp25 + tmp15, CONST_BITS-PASS1_BITS);
8843 + wsptr[8*8] = (int) RIGHT_SHIFT(tmp25 - tmp15, CONST_BITS-PASS1_BITS);
8844 + wsptr[8*6] = (int) RIGHT_SHIFT(tmp26 + tmp16, CONST_BITS-PASS1_BITS);
8845 + wsptr[8*7] = (int) RIGHT_SHIFT(tmp26 - tmp16, CONST_BITS-PASS1_BITS);
8846 + }
8847 +
8848 + /* Pass 2: process 14 rows from work array, store into output array. */
8849 +
8850 + wsptr = workspace;
8851 + for (ctr = 0; ctr < 14; ctr++) {
8852 + outptr = output_buf[ctr] + output_col;
8853 +
8854 + /* Even part */
8855 +
8856 + /* Add fudge factor here for final descale. */
8857 + z1 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
8858 + z1 <<= CONST_BITS;
8859 + z4 = (INT32) wsptr[4];
8860 + z2 = MULTIPLY(z4, FIX(1.274162392)); /* c4 */
8861 + z3 = MULTIPLY(z4, FIX(0.314692123)); /* c12 */
8862 + z4 = MULTIPLY(z4, FIX(0.881747734)); /* c8 */
8863 +
8864 + tmp10 = z1 + z2;
8865 + tmp11 = z1 + z3;
8866 + tmp12 = z1 - z4;
8867 +
8868 + tmp23 = z1 - ((z2 + z3 - z4) << 1); /* c0 = (c4+c12-c8)*2 */
8869 +
8870 + z1 = (INT32) wsptr[2];
8871 + z2 = (INT32) wsptr[6];
8872 +
8873 + z3 = MULTIPLY(z1 + z2, FIX(1.105676686)); /* c6 */
8874 +
8875 + tmp13 = z3 + MULTIPLY(z1, FIX(0.273079590)); /* c2-c6 */
8876 + tmp14 = z3 - MULTIPLY(z2, FIX(1.719280954)); /* c6+c10 */
8877 + tmp15 = MULTIPLY(z1, FIX(0.613604268)) - /* c10 */
8878 + MULTIPLY(z2, FIX(1.378756276)); /* c2 */
8879 +
8880 + tmp20 = tmp10 + tmp13;
8881 + tmp26 = tmp10 - tmp13;
8882 + tmp21 = tmp11 + tmp14;
8883 + tmp25 = tmp11 - tmp14;
8884 + tmp22 = tmp12 + tmp15;
8885 + tmp24 = tmp12 - tmp15;
8886 +
8887 + /* Odd part */
8888 +
8889 + z1 = (INT32) wsptr[1];
8890 + z2 = (INT32) wsptr[3];
8891 + z3 = (INT32) wsptr[5];
8892 + z4 = (INT32) wsptr[7];
8893 + z4 <<= CONST_BITS;
8894 +
8895 + tmp14 = z1 + z3;
8896 + tmp11 = MULTIPLY(z1 + z2, FIX(1.334852607)); /* c3 */
8897 + tmp12 = MULTIPLY(tmp14, FIX(1.197448846)); /* c5 */
8898 + tmp10 = tmp11 + tmp12 + z4 - MULTIPLY(z1, FIX(1.126980169)); /* c3+c5-c1 */
8899 + tmp14 = MULTIPLY(tmp14, FIX(0.752406978)); /* c9 */
8900 + tmp16 = tmp14 - MULTIPLY(z1, FIX(1.061150426)); /* c9+c11-c13 */
8901 + z1 -= z2;
8902 + tmp15 = MULTIPLY(z1, FIX(0.467085129)) - z4; /* c11 */
8903 + tmp16 += tmp15;
8904 + tmp13 = MULTIPLY(z2 + z3, - FIX(0.158341681)) - z4; /* -c13 */
8905 + tmp11 += tmp13 - MULTIPLY(z2, FIX(0.424103948)); /* c3-c9-c13 */
8906 + tmp12 += tmp13 - MULTIPLY(z3, FIX(2.373959773)); /* c3+c5-c13 */
8907 + tmp13 = MULTIPLY(z3 - z2, FIX(1.405321284)); /* c1 */
8908 + tmp14 += tmp13 + z4 - MULTIPLY(z3, FIX(1.6906431334)); /* c1+c9-c11 */
8909 + tmp15 += tmp13 + MULTIPLY(z2, FIX(0.674957567)); /* c1+c11-c5 */
8910 +
8911 + tmp13 = ((z1 - z3) << CONST_BITS) + z4;
8912 +
8913 + /* Final output stage */
8914 +
8915 + outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp10,
8916 + CONST_BITS+PASS1_BITS+3)
8917 + & RANGE_MASK];
8918 + outptr[13] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp10,
8919 + CONST_BITS+PASS1_BITS+3)
8920 + & RANGE_MASK];
8921 + outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp11,
8922 + CONST_BITS+PASS1_BITS+3)
8923 + & RANGE_MASK];
8924 + outptr[12] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp11,
8925 + CONST_BITS+PASS1_BITS+3)
8926 + & RANGE_MASK];
8927 + outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp12,
8928 + CONST_BITS+PASS1_BITS+3)
8929 + & RANGE_MASK];
8930 + outptr[11] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp12,
8931 + CONST_BITS+PASS1_BITS+3)
8932 + & RANGE_MASK];
8933 + outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp13,
8934 + CONST_BITS+PASS1_BITS+3)
8935 + & RANGE_MASK];
8936 + outptr[10] = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp13,
8937 + CONST_BITS+PASS1_BITS+3)
8938 + & RANGE_MASK];
8939 + outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp14,
8940 + CONST_BITS+PASS1_BITS+3)
8941 + & RANGE_MASK];
8942 + outptr[9] = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp14,
8943 + CONST_BITS+PASS1_BITS+3)
8944 + & RANGE_MASK];
8945 + outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp25 + tmp15,
8946 + CONST_BITS+PASS1_BITS+3)
8947 + & RANGE_MASK];
8948 + outptr[8] = range_limit[(int) RIGHT_SHIFT(tmp25 - tmp15,
8949 + CONST_BITS+PASS1_BITS+3)
8950 + & RANGE_MASK];
8951 + outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp26 + tmp16,
8952 + CONST_BITS+PASS1_BITS+3)
8953 + & RANGE_MASK];
8954 + outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp26 - tmp16,
8955 + CONST_BITS+PASS1_BITS+3)
8956 + & RANGE_MASK];
8957 +
8958 + wsptr += 8; /* advance pointer to next row */
8959 + }
8960 +}
8961 +
8962 +
8963 +/*
8964 + * Perform dequantization and inverse DCT on one block of coefficients,
8965 + * producing a 15x15 output block.
8966 + *
8967 + * Optimized algorithm with 22 multiplications in the 1-D kernel.
8968 + * cK represents sqrt(2) * cos(K*pi/30).
8969 + */
8970 +
8971 +GLOBAL(void)
8972 +jpeg_idct_15x15 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
8973 + JCOEFPTR coef_block,
8974 + JSAMPARRAY output_buf, JDIMENSION output_col)
8975 +{
8976 + INT32 tmp10, tmp11, tmp12, tmp13, tmp14, tmp15, tmp16;
8977 + INT32 tmp20, tmp21, tmp22, tmp23, tmp24, tmp25, tmp26, tmp27;
8978 + INT32 z1, z2, z3, z4;
8979 + JCOEFPTR inptr;
8980 + ISLOW_MULT_TYPE * quantptr;
8981 + int * wsptr;
8982 + JSAMPROW outptr;
8983 + JSAMPLE *range_limit = IDCT_range_limit(cinfo);
8984 + int ctr;
8985 + int workspace[8*15]; /* buffers data between passes */
8986 + SHIFT_TEMPS
8987 +
8988 + /* Pass 1: process columns from input, store into work array. */
8989 +
8990 + inptr = coef_block;
8991 + quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
8992 + wsptr = workspace;
8993 + for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {
8994 + /* Even part */
8995 +
8996 + z1 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
8997 + z1 <<= CONST_BITS;
8998 + /* Add fudge factor here for final descale. */
8999 + z1 += ONE << (CONST_BITS-PASS1_BITS-1);
9000 +
9001 + z2 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
9002 + z3 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
9003 + z4 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);
9004 +
9005 + tmp10 = MULTIPLY(z4, FIX(0.437016024)); /* c12 */
9006 + tmp11 = MULTIPLY(z4, FIX(1.144122806)); /* c6 */
9007 +
9008 + tmp12 = z1 - tmp10;
9009 + tmp13 = z1 + tmp11;
9010 + z1 -= (tmp11 - tmp10) << 1; /* c0 = (c6-c12)*2 */
9011 +
9012 + z4 = z2 - z3;
9013 + z3 += z2;
9014 + tmp10 = MULTIPLY(z3, FIX(1.337628990)); /* (c2+c4)/2 */
9015 + tmp11 = MULTIPLY(z4, FIX(0.045680613)); /* (c2-c4)/2 */
9016 + z2 = MULTIPLY(z2, FIX(1.439773946)); /* c4+c14 */
9017 +
9018 + tmp20 = tmp13 + tmp10 + tmp11;
9019 + tmp23 = tmp12 - tmp10 + tmp11 + z2;
9020 +
9021 + tmp10 = MULTIPLY(z3, FIX(0.547059574)); /* (c8+c14)/2 */
9022 + tmp11 = MULTIPLY(z4, FIX(0.399234004)); /* (c8-c14)/2 */
9023 +
9024 + tmp25 = tmp13 - tmp10 - tmp11;
9025 + tmp26 = tmp12 + tmp10 - tmp11 - z2;
9026 +
9027 + tmp10 = MULTIPLY(z3, FIX(0.790569415)); /* (c6+c12)/2 */
9028 + tmp11 = MULTIPLY(z4, FIX(0.353553391)); /* (c6-c12)/2 */
9029 +
9030 + tmp21 = tmp12 + tmp10 + tmp11;
9031 + tmp24 = tmp13 - tmp10 + tmp11;
9032 + tmp11 += tmp11;
9033 + tmp22 = z1 + tmp11; /* c10 = c6-c12 */
9034 + tmp27 = z1 - tmp11 - tmp11; /* c0 = (c6-c12)*2 */
9035 +
9036 + /* Odd part */
9037 +
9038 + z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
9039 + z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
9040 + z4 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
9041 + z3 = MULTIPLY(z4, FIX(1.224744871)); /* c5 */
9042 + z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);
9043 +
9044 + tmp13 = z2 - z4;
9045 + tmp15 = MULTIPLY(z1 + tmp13, FIX(0.831253876)); /* c9 */
9046 + tmp11 = tmp15 + MULTIPLY(z1, FIX(0.513743148)); /* c3-c9 */
9047 + tmp14 = tmp15 - MULTIPLY(tmp13, FIX(2.176250899)); /* c3+c9 */
9048 +
9049 + tmp13 = MULTIPLY(z2, - FIX(0.831253876)); /* -c9 */
9050 + tmp15 = MULTIPLY(z2, - FIX(1.344997024)); /* -c3 */
9051 + z2 = z1 - z4;
9052 + tmp12 = z3 + MULTIPLY(z2, FIX(1.406466353)); /* c1 */
9053 +
9054 + tmp10 = tmp12 + MULTIPLY(z4, FIX(2.457431844)) - tmp15; /* c1+c7 */
9055 + tmp16 = tmp12 - MULTIPLY(z1, FIX(1.112434820)) + tmp13; /* c1-c13 */
9056 + tmp12 = MULTIPLY(z2, FIX(1.224744871)) - z3; /* c5 */
9057 + z2 = MULTIPLY(z1 + z4, FIX(0.575212477)); /* c11 */
9058 + tmp13 += z2 + MULTIPLY(z1, FIX(0.475753014)) - z3; /* c7-c11 */
9059 + tmp15 += z2 - MULTIPLY(z4, FIX(0.869244010)) + z3; /* c11+c13 */
9060 +
9061 + /* Final output stage */
9062 +
9063 + wsptr[8*0] = (int) RIGHT_SHIFT(tmp20 + tmp10, CONST_BITS-PASS1_BITS);
9064 + wsptr[8*14] = (int) RIGHT_SHIFT(tmp20 - tmp10, CONST_BITS-PASS1_BITS);
9065 + wsptr[8*1] = (int) RIGHT_SHIFT(tmp21 + tmp11, CONST_BITS-PASS1_BITS);
9066 + wsptr[8*13] = (int) RIGHT_SHIFT(tmp21 - tmp11, CONST_BITS-PASS1_BITS);
9067 + wsptr[8*2] = (int) RIGHT_SHIFT(tmp22 + tmp12, CONST_BITS-PASS1_BITS);
9068 + wsptr[8*12] = (int) RIGHT_SHIFT(tmp22 - tmp12, CONST_BITS-PASS1_BITS);
9069 + wsptr[8*3] = (int) RIGHT_SHIFT(tmp23 + tmp13, CONST_BITS-PASS1_BITS);
9070 + wsptr[8*11] = (int) RIGHT_SHIFT(tmp23 - tmp13, CONST_BITS-PASS1_BITS);
9071 + wsptr[8*4] = (int) RIGHT_SHIFT(tmp24 + tmp14, CONST_BITS-PASS1_BITS);
9072 + wsptr[8*10] = (int) RIGHT_SHIFT(tmp24 - tmp14, CONST_BITS-PASS1_BITS);
9073 + wsptr[8*5] = (int) RIGHT_SHIFT(tmp25 + tmp15, CONST_BITS-PASS1_BITS);
9074 + wsptr[8*9] = (int) RIGHT_SHIFT(tmp25 - tmp15, CONST_BITS-PASS1_BITS);
9075 + wsptr[8*6] = (int) RIGHT_SHIFT(tmp26 + tmp16, CONST_BITS-PASS1_BITS);
9076 + wsptr[8*8] = (int) RIGHT_SHIFT(tmp26 - tmp16, CONST_BITS-PASS1_BITS);
9077 + wsptr[8*7] = (int) RIGHT_SHIFT(tmp27, CONST_BITS-PASS1_BITS);
9078 + }
9079 +
9080 + /* Pass 2: process 15 rows from work array, store into output array. */
9081 +
9082 + wsptr = workspace;
9083 + for (ctr = 0; ctr < 15; ctr++) {
9084 + outptr = output_buf[ctr] + output_col;
9085 +
9086 + /* Even part */
9087 +
9088 + /* Add fudge factor here for final descale. */
9089 + z1 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
9090 + z1 <<= CONST_BITS;
9091 +
9092 + z2 = (INT32) wsptr[2];
9093 + z3 = (INT32) wsptr[4];
9094 + z4 = (INT32) wsptr[6];
9095 +
9096 + tmp10 = MULTIPLY(z4, FIX(0.437016024)); /* c12 */
9097 + tmp11 = MULTIPLY(z4, FIX(1.144122806)); /* c6 */
9098 +
9099 + tmp12 = z1 - tmp10;
9100 + tmp13 = z1 + tmp11;
9101 + z1 -= (tmp11 - tmp10) << 1; /* c0 = (c6-c12)*2 */
9102 +
9103 + z4 = z2 - z3;
9104 + z3 += z2;
9105 + tmp10 = MULTIPLY(z3, FIX(1.337628990)); /* (c2+c4)/2 */
9106 + tmp11 = MULTIPLY(z4, FIX(0.045680613)); /* (c2-c4)/2 */
9107 + z2 = MULTIPLY(z2, FIX(1.439773946)); /* c4+c14 */
9108 +
9109 + tmp20 = tmp13 + tmp10 + tmp11;
9110 + tmp23 = tmp12 - tmp10 + tmp11 + z2;
9111 +
9112 + tmp10 = MULTIPLY(z3, FIX(0.547059574)); /* (c8+c14)/2 */
9113 + tmp11 = MULTIPLY(z4, FIX(0.399234004)); /* (c8-c14)/2 */
9114 +
9115 + tmp25 = tmp13 - tmp10 - tmp11;
9116 + tmp26 = tmp12 + tmp10 - tmp11 - z2;
9117 +
9118 + tmp10 = MULTIPLY(z3, FIX(0.790569415)); /* (c6+c12)/2 */
9119 + tmp11 = MULTIPLY(z4, FIX(0.353553391)); /* (c6-c12)/2 */
9120 +
9121 + tmp21 = tmp12 + tmp10 + tmp11;
9122 + tmp24 = tmp13 - tmp10 + tmp11;
9123 + tmp11 += tmp11;
9124 + tmp22 = z1 + tmp11; /* c10 = c6-c12 */
9125 + tmp27 = z1 - tmp11 - tmp11; /* c0 = (c6-c12)*2 */
9126 +
9127 + /* Odd part */
9128 +
9129 + z1 = (INT32) wsptr[1];
9130 + z2 = (INT32) wsptr[3];
9131 + z4 = (INT32) wsptr[5];
9132 + z3 = MULTIPLY(z4, FIX(1.224744871)); /* c5 */
9133 + z4 = (INT32) wsptr[7];
9134 +
9135 + tmp13 = z2 - z4;
9136 + tmp15 = MULTIPLY(z1 + tmp13, FIX(0.831253876)); /* c9 */
9137 + tmp11 = tmp15 + MULTIPLY(z1, FIX(0.513743148)); /* c3-c9 */
9138 + tmp14 = tmp15 - MULTIPLY(tmp13, FIX(2.176250899)); /* c3+c9 */
9139 +
9140 + tmp13 = MULTIPLY(z2, - FIX(0.831253876)); /* -c9 */
9141 + tmp15 = MULTIPLY(z2, - FIX(1.344997024)); /* -c3 */
9142 + z2 = z1 - z4;
9143 + tmp12 = z3 + MULTIPLY(z2, FIX(1.406466353)); /* c1 */
9144 +
9145 + tmp10 = tmp12 + MULTIPLY(z4, FIX(2.457431844)) - tmp15; /* c1+c7 */
9146 + tmp16 = tmp12 - MULTIPLY(z1, FIX(1.112434820)) + tmp13; /* c1-c13 */
9147 + tmp12 = MULTIPLY(z2, FIX(1.224744871)) - z3; /* c5 */
9148 + z2 = MULTIPLY(z1 + z4, FIX(0.575212477)); /* c11 */
9149 + tmp13 += z2 + MULTIPLY(z1, FIX(0.475753014)) - z3; /* c7-c11 */
9150 + tmp15 += z2 - MULTIPLY(z4, FIX(0.869244010)) + z3; /* c11+c13 */
9151 +
9152 + /* Final output stage */
9153 +
9154 + outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp10,
9155 + CONST_BITS+PASS1_BITS+3)
9156 + & RANGE_MASK];
9157 + outptr[14] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp10,
9158 + CONST_BITS+PASS1_BITS+3)
9159 + & RANGE_MASK];
9160 + outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp11,
9161 + CONST_BITS+PASS1_BITS+3)
9162 + & RANGE_MASK];
9163 + outptr[13] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp11,
9164 + CONST_BITS+PASS1_BITS+3)
9165 + & RANGE_MASK];
9166 + outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp12,
9167 + CONST_BITS+PASS1_BITS+3)
9168 + & RANGE_MASK];
9169 + outptr[12] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp12,
9170 + CONST_BITS+PASS1_BITS+3)
9171 + & RANGE_MASK];
9172 + outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp13,
9173 + CONST_BITS+PASS1_BITS+3)
9174 + & RANGE_MASK];
9175 + outptr[11] = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp13,
9176 + CONST_BITS+PASS1_BITS+3)
9177 + & RANGE_MASK];
9178 + outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp14,
9179 + CONST_BITS+PASS1_BITS+3)
9180 + & RANGE_MASK];
9181 + outptr[10] = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp14,
9182 + CONST_BITS+PASS1_BITS+3)
9183 + & RANGE_MASK];
9184 + outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp25 + tmp15,
9185 + CONST_BITS+PASS1_BITS+3)
9186 + & RANGE_MASK];
9187 + outptr[9] = range_limit[(int) RIGHT_SHIFT(tmp25 - tmp15,
9188 + CONST_BITS+PASS1_BITS+3)
9189 + & RANGE_MASK];
9190 + outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp26 + tmp16,
9191 + CONST_BITS+PASS1_BITS+3)
9192 + & RANGE_MASK];
9193 + outptr[8] = range_limit[(int) RIGHT_SHIFT(tmp26 - tmp16,
9194 + CONST_BITS+PASS1_BITS+3)
9195 + & RANGE_MASK];
9196 + outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp27,
9197 + CONST_BITS+PASS1_BITS+3)
9198 + & RANGE_MASK];
9199 +
9200 + wsptr += 8; /* advance pointer to next row */
9201 + }
9202 +}
9203 +
9204 +
9205 +/*
9206 + * Perform dequantization and inverse DCT on one block of coefficients,
9207 + * producing a 16x16 output block.
9208 + *
9209 + * Optimized algorithm with 28 multiplications in the 1-D kernel.
9210 + * cK represents sqrt(2) * cos(K*pi/32).
9211 + */
9212 +
9213 +GLOBAL(void)
9214 +jpeg_idct_16x16 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
9215 + JCOEFPTR coef_block,
9216 + JSAMPARRAY output_buf, JDIMENSION output_col)
9217 +{
9218 + INT32 tmp0, tmp1, tmp2, tmp3, tmp10, tmp11, tmp12, tmp13;
9219 + INT32 tmp20, tmp21, tmp22, tmp23, tmp24, tmp25, tmp26, tmp27;
9220 + INT32 z1, z2, z3, z4;
9221 + JCOEFPTR inptr;
9222 + ISLOW_MULT_TYPE * quantptr;
9223 + int * wsptr;
9224 + JSAMPROW outptr;
9225 + JSAMPLE *range_limit = IDCT_range_limit(cinfo);
9226 + int ctr;
9227 + int workspace[8*16]; /* buffers data between passes */
9228 + SHIFT_TEMPS
9229 +
9230 + /* Pass 1: process columns from input, store into work array. */
9231 +
9232 + inptr = coef_block;
9233 + quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;
9234 + wsptr = workspace;
9235 + for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {
9236 + /* Even part */
9237 +
9238 + tmp0 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
9239 + tmp0 <<= CONST_BITS;
9240 + /* Add fudge factor here for final descale. */
9241 + tmp0 += 1 << (CONST_BITS-PASS1_BITS-1);
9242 +
9243 + z1 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
9244 + tmp1 = MULTIPLY(z1, FIX(1.306562965)); /* c4[16] = c2[8] */
9245 + tmp2 = MULTIPLY(z1, FIX_0_541196100); /* c12[16] = c6[8] */
9246 +
9247 + tmp10 = tmp0 + tmp1;
9248 + tmp11 = tmp0 - tmp1;
9249 + tmp12 = tmp0 + tmp2;
9250 + tmp13 = tmp0 - tmp2;
9251 +
9252 + z1 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
9253 + z2 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);
9254 + z3 = z1 - z2;
9255 + z4 = MULTIPLY(z3, FIX(0.275899379)); /* c14[16] = c7[8] */
9256 + z3 = MULTIPLY(z3, FIX(1.387039845)); /* c2[16] = c1[8] */
9257 +
9258 + tmp0 = z3 + MULTIPLY(z2, FIX_2_562915447); /* (c6+c2)[16] = (c3+c1)[8] */
9259 + tmp1 = z4 + MULTIPLY(z1, FIX_0_899976223); /* (c6-c14)[16] = (c3-c7)[8] */
9260 + tmp2 = z3 - MULTIPLY(z1, FIX(0.601344887)); /* (c2-c10)[16] = (c1-c5)[8] */
9261 + tmp3 = z4 - MULTIPLY(z2, FIX(0.509795579)); /* (c10-c14)[16] = (c5-c7)[8] */
9262 +
9263 + tmp20 = tmp10 + tmp0;
9264 + tmp27 = tmp10 - tmp0;
9265 + tmp21 = tmp12 + tmp1;
9266 + tmp26 = tmp12 - tmp1;
9267 + tmp22 = tmp13 + tmp2;
9268 + tmp25 = tmp13 - tmp2;
9269 + tmp23 = tmp11 + tmp3;
9270 + tmp24 = tmp11 - tmp3;
9271 +
9272 + /* Odd part */
9273 +
9274 + z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
9275 + z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
9276 + z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
9277 + z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);
9278 +
9279 + tmp11 = z1 + z3;
9280 +
9281 + tmp1 = MULTIPLY(z1 + z2, FIX(1.353318001)); /* c3 */
9282 + tmp2 = MULTIPLY(tmp11, FIX(1.247225013)); /* c5 */
9283 + tmp3 = MULTIPLY(z1 + z4, FIX(1.093201867)); /* c7 */
9284 + tmp10 = MULTIPLY(z1 - z4, FIX(0.897167586)); /* c9 */
9285 + tmp11 = MULTIPLY(tmp11, FIX(0.666655658)); /* c11 */
9286 + tmp12 = MULTIPLY(z1 - z2, FIX(0.410524528)); /* c13 */
9287 + tmp0 = tmp1 + tmp2 + tmp3 -
9288 + MULTIPLY(z1, FIX(2.286341144)); /* c7+c5+c3-c1 */
9289 + tmp13 = tmp10 + tmp11 + tmp12 -
9290 + MULTIPLY(z1, FIX(1.835730603)); /* c9+c11+c13-c15 */
9291 + z1 = MULTIPLY(z2 + z3, FIX(0.138617169)); /* c15 */
9292 + tmp1 += z1 + MULTIPLY(z2, FIX(0.071888074)); /* c9+c11-c3-c15 */
9293 + tmp2 += z1 - MULTIPLY(z3, FIX(1.125726048)); /* c5+c7+c15-c3 */
9294 + z1 = MULTIPLY(z3 - z2, FIX(1.407403738)); /* c1 */
9295 + tmp11 += z1 - MULTIPLY(z3, FIX(0.766367282)); /* c1+c11-c9-c13 */
9296 + tmp12 += z1 + MULTIPLY(z2, FIX(1.971951411)); /* c1+c5+c13-c7 */
9297 + z2 += z4;
9298 + z1 = MULTIPLY(z2, - FIX(0.666655658)); /* -c11 */
9299 + tmp1 += z1;
9300 + tmp3 += z1 + MULTIPLY(z4, FIX(1.065388962)); /* c3+c11+c15-c7 */
9301 + z2 = MULTIPLY(z2, - FIX(1.247225013)); /* -c5 */
9302 + tmp10 += z2 + MULTIPLY(z4, FIX(3.141271809)); /* c1+c5+c9-c13 */
9303 + tmp12 += z2;
9304 + z2 = MULTIPLY(z3 + z4, - FIX(1.353318001)); /* -c3 */
9305 + tmp2 += z2;
9306 + tmp3 += z2;
9307 + z2 = MULTIPLY(z4 - z3, FIX(0.410524528)); /* c13 */
9308 + tmp10 += z2;
9309 + tmp11 += z2;
9310 +
9311 + /* Final output stage */
9312 +
9313 + wsptr[8*0] = (int) RIGHT_SHIFT(tmp20 + tmp0, CONST_BITS-PASS1_BITS);
9314 + wsptr[8*15] = (int) RIGHT_SHIFT(tmp20 - tmp0, CONST_BITS-PASS1_BITS);
9315 + wsptr[8*1] = (int) RIGHT_SHIFT(tmp21 + tmp1, CONST_BITS-PASS1_BITS);
9316 + wsptr[8*14] = (int) RIGHT_SHIFT(tmp21 - tmp1, CONST_BITS-PASS1_BITS);
9317 + wsptr[8*2] = (int) RIGHT_SHIFT(tmp22 + tmp2, CONST_BITS-PASS1_BITS);
9318 + wsptr[8*13] = (int) RIGHT_SHIFT(tmp22 - tmp2, CONST_BITS-PASS1_BITS);
9319 + wsptr[8*3] = (int) RIGHT_SHIFT(tmp23 + tmp3, CONST_BITS-PASS1_BITS);
9320 + wsptr[8*12] = (int) RIGHT_SHIFT(tmp23 - tmp3, CONST_BITS-PASS1_BITS);
9321 + wsptr[8*4] = (int) RIGHT_SHIFT(tmp24 + tmp10, CONST_BITS-PASS1_BITS);
9322 + wsptr[8*11] = (int) RIGHT_SHIFT(tmp24 - tmp10, CONST_BITS-PASS1_BITS);
9323 + wsptr[8*5] = (int) RIGHT_SHIFT(tmp25 + tmp11, CONST_BITS-PASS1_BITS);
9324 + wsptr[8*10] = (int) RIGHT_SHIFT(tmp25 - tmp11, CONST_BITS-PASS1_BITS);
9325 + wsptr[8*6] = (int) RIGHT_SHIFT(tmp26 + tmp12, CONST_BITS-PASS1_BITS);
9326 + wsptr[8*9] = (int) RIGHT_SHIFT(tmp26 - tmp12, CONST_BITS-PASS1_BITS);
9327 + wsptr[8*7] = (int) RIGHT_SHIFT(tmp27 + tmp13, CONST_BITS-PASS1_BITS);
9328 + wsptr[8*8] = (int) RIGHT_SHIFT(tmp27 - tmp13, CONST_BITS-PASS1_BITS);
9329 + }
9330 +
9331 + /* Pass 2: process 16 rows from work array, store into output array. */
9332 +
9333 + wsptr = workspace;
9334 + for (ctr = 0; ctr < 16; ctr++) {
9335 + outptr = output_buf[ctr] + output_col;
9336 +
9337 + /* Even part */
9338 +
9339 + /* Add fudge factor here for final descale. */
9340 + tmp0 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));
9341 + tmp0 <<= CONST_BITS;
9342 +
9343 + z1 = (INT32) wsptr[4];
9344 + tmp1 = MULTIPLY(z1, FIX(1.306562965)); /* c4[16] = c2[8] */
9345 + tmp2 = MULTIPLY(z1, FIX_0_541196100); /* c12[16] = c6[8] */
9346 +
9347 + tmp10 = tmp0 + tmp1;
9348 + tmp11 = tmp0 - tmp1;
9349 + tmp12 = tmp0 + tmp2;
9350 + tmp13 = tmp0 - tmp2;
9351 +
9352 + z1 = (INT32) wsptr[2];
9353 + z2 = (INT32) wsptr[6];
9354 + z3 = z1 - z2;
9355 + z4 = MULTIPLY(z3, FIX(0.275899379)); /* c14[16] = c7[8] */
9356 + z3 = MULTIPLY(z3, FIX(1.387039845)); /* c2[16] = c1[8] */
9357 +
9358 + tmp0 = z3 + MULTIPLY(z2, FIX_2_562915447); /* (c6+c2)[16] = (c3+c1)[8] */
9359 + tmp1 = z4 + MULTIPLY(z1, FIX_0_899976223); /* (c6-c14)[16] = (c3-c7)[8] */
9360 + tmp2 = z3 - MULTIPLY(z1, FIX(0.601344887)); /* (c2-c10)[16] = (c1-c5)[8] */
9361 + tmp3 = z4 - MULTIPLY(z2, FIX(0.509795579)); /* (c10-c14)[16] = (c5-c7)[8] */
9362 +
9363 + tmp20 = tmp10 + tmp0;
9364 + tmp27 = tmp10 - tmp0;
9365 + tmp21 = tmp12 + tmp1;
9366 + tmp26 = tmp12 - tmp1;
9367 + tmp22 = tmp13 + tmp2;
9368 + tmp25 = tmp13 - tmp2;
9369 + tmp23 = tmp11 + tmp3;
9370 + tmp24 = tmp11 - tmp3;
9371 +
9372 + /* Odd part */
9373 +
9374 + z1 = (INT32) wsptr[1];
9375 + z2 = (INT32) wsptr[3];
9376 + z3 = (INT32) wsptr[5];
9377 + z4 = (INT32) wsptr[7];
9378 +
9379 + tmp11 = z1 + z3;
9380 +
9381 + tmp1 = MULTIPLY(z1 + z2, FIX(1.353318001)); /* c3 */
9382 + tmp2 = MULTIPLY(tmp11, FIX(1.247225013)); /* c5 */
9383 + tmp3 = MULTIPLY(z1 + z4, FIX(1.093201867)); /* c7 */
9384 + tmp10 = MULTIPLY(z1 - z4, FIX(0.897167586)); /* c9 */
9385 + tmp11 = MULTIPLY(tmp11, FIX(0.666655658)); /* c11 */
9386 + tmp12 = MULTIPLY(z1 - z2, FIX(0.410524528)); /* c13 */
9387 + tmp0 = tmp1 + tmp2 + tmp3 -
9388 + MULTIPLY(z1, FIX(2.286341144)); /* c7+c5+c3-c1 */
9389 + tmp13 = tmp10 + tmp11 + tmp12 -
9390 + MULTIPLY(z1, FIX(1.835730603)); /* c9+c11+c13-c15 */
9391 + z1 = MULTIPLY(z2 + z3, FIX(0.138617169)); /* c15 */
9392 + tmp1 += z1 + MULTIPLY(z2, FIX(0.071888074)); /* c9+c11-c3-c15 */
9393 + tmp2 += z1 - MULTIPLY(z3, FIX(1.125726048)); /* c5+c7+c15-c3 */
9394 + z1 = MULTIPLY(z3 - z2, FIX(1.407403738)); /* c1 */
9395 + tmp11 += z1 - MULTIPLY(z3, FIX(0.766367282)); /* c1+c11-c9-c13 */
9396 + tmp12 += z1 + MULTIPLY(z2, FIX(1.971951411)); /* c1+c5+c13-c7 */
9397 + z2 += z4;
9398 + z1 = MULTIPLY(z2, - FIX(0.666655658)); /* -c11 */
9399 + tmp1 += z1;
9400 + tmp3 += z1 + MULTIPLY(z4, FIX(1.065388962)); /* c3+c11+c15-c7 */
9401 + z2 = MULTIPLY(z2, - FIX(1.247225013)); /* -c5 */
9402 + tmp10 += z2 + MULTIPLY(z4, FIX(3.141271809)); /* c1+c5+c9-c13 */
9403 + tmp12 += z2;
9404 + z2 = MULTIPLY(z3 + z4, - FIX(1.353318001)); /* -c3 */
9405 + tmp2 += z2;
9406 + tmp3 += z2;
9407 + z2 = MULTIPLY(z4 - z3, FIX(0.410524528)); /* c13 */
9408 + tmp10 += z2;
9409 + tmp11 += z2;
9410 +
9411 + /* Final output stage */
9412 +
9413 + outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp0,
9414 + CONST_BITS+PASS1_BITS+3)
9415 + & RANGE_MASK];
9416 + outptr[15] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp0,
9417 + CONST_BITS+PASS1_BITS+3)
9418 + & RANGE_MASK];
9419 + outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp1,
9420 + CONST_BITS+PASS1_BITS+3)
9421 + & RANGE_MASK];
9422 + outptr[14] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp1,
9423 + CONST_BITS+PASS1_BITS+3)
9424 + & RANGE_MASK];
9425 + outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp2,
9426 + CONST_BITS+PASS1_BITS+3)
9427 + & RANGE_MASK];
9428 + outptr[13] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp2,
9429 + CONST_BITS+PASS1_BITS+3)
9430 + & RANGE_MASK];
9431 + outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp3,
9432 + CONST_BITS+PASS1_BITS+3)
9433 + & RANGE_MASK];
9434 + outptr[12] = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp3,
9435 + CONST_BITS+PASS1_BITS+3)
9436 + & RANGE_MASK];
9437 + outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp10,
9438 + CONST_BITS+PASS1_BITS+3)
9439 + & RANGE_MASK];
9440 + outptr[11] = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp10,
9441 + CONST_BITS+PASS1_BITS+3)
9442 + & RANGE_MASK];
9443 + outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp25 + tmp11,
9444 + CONST_BITS+PASS1_BITS+3)
9445 + & RANGE_MASK];
9446 + outptr[10] = range_limit[(int) RIGHT_SHIFT(tmp25 - tmp11,
9447 + CONST_BITS+PASS1_BITS+3)
9448 + & RANGE_MASK];
9449 + outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp26 + tmp12,
9450 + CONST_BITS+PASS1_BITS+3)
9451 + & RANGE_MASK];
9452 + outptr[9] = range_limit[(int) RIGHT_SHIFT(tmp26 - tmp12,
9453 + CONST_BITS+PASS1_BITS+3)
9454 + & RANGE_MASK];
9455 + outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp27 + tmp13,
9456 + CONST_BITS+PASS1_BITS+3)
9457 + & RANGE_MASK];
9458 + outptr[8] = range_limit[(int) RIGHT_SHIFT(tmp27 - tmp13,
9459 + CONST_BITS+PASS1_BITS+3)
9460 + & RANGE_MASK];
9461 +
9462 + wsptr += 8; /* advance pointer to next row */
9463 + }
9464 +}
9465 +
9466 +#endif /* IDCT_SCALING_SUPPORTED */
9467 #endif /* DCT_ISLOW_SUPPORTED */
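Reviewer note on the fixed-point arithmetic used by jpeg_idct_16x16 above: the kernel multiplies by constants pre-scaled by 2^CONST_BITS and removes the scale with a rounded right shift (the "fudge factor"). The sketch below illustrates that idea only. It assumes CONST_BITS = 13 and PASS1_BITS = 2 (the values I believe jidctint.c uses for 8-bit samples), and the helper names descale, scale_by_c4 and pass1_dc_only are hypothetical, not part of the patch.

```c
#include <stdint.h>

/* Assumed values; check jidctint.c before relying on them. */
#define CONST_BITS 13
#define PASS1_BITS 2

/* Convert a fractional constant to a scaled integer, rounding to nearest. */
#define FIX(x) ((int32_t) ((x) * (1 << CONST_BITS) + 0.5))

/* Hypothetical helper: rounded descale by n bits. Adds half of the
 * divisor before shifting, which is the role of the "fudge factor".
 * Assumes an arithmetic right shift for negative values. */
static int32_t descale(int32_t x, int n)
{
  return (x + ((int32_t) 1 << (n - 1))) >> n;
}

/* Hypothetical helper: multiply a sample by c4 = sqrt(2)*cos(4*pi/32)
 * and return the rounded integer result. */
static int32_t scale_by_c4(int32_t sample)
{
  return descale(sample * FIX(1.306562965), CONST_BITS);
}

/* Hypothetical helper: what pass 1 produces for a column whose only
 * nonzero input is the DC term. The result stays scaled up by
 * 2^PASS1_BITS; pass 2 removes that scale in its final shift. */
static int32_t pass1_dc_only(int32_t dc)
{
  int32_t tmp0 = dc * (1 << CONST_BITS);
  tmp0 += 1 << (CONST_BITS - PASS1_BITS - 1);   /* fudge factor */
  return tmp0 >> (CONST_BITS - PASS1_BITS);
}
```

Under these assumptions FIX(0.541196100) evaluates to 4433, which should match the precomputed FIX_0_541196100 constant referenced in the hunk.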
9468 Index: jmemmgr.c
9469 ===================================================================
9470 --- jmemmgr.c (revision 829)
9471 +++ jmemmgr.c (working copy)
9472 @@ -37,6 +37,15 @@
9473 #endif
9474
9475
9476 +LOCAL(size_t)
9477 +round_up_pow2 (size_t a, size_t b)
9478 +/* a rounded up to the next multiple of b, i.e. ceil(a/b)*b */
9479 +/* Assumes a >= 0, b > 0, and b is a power of 2 */
9480 +{
9481 + return ((a + b - 1) & (~(b - 1)));
9482 +}
9483 +
9484 +
9485 /*
9486 * Some important notes:
9487 * The allocation routines provided here must never return NULL.
9488 @@ -122,7 +131,7 @@
9489 jvirt_barray_ptr virt_barray_list;
9490
9491 /* This counts total space obtained from jpeg_get_small/large */
9492 - long total_space_allocated;
9493 + size_t total_space_allocated;
9494
9495 /* alloc_sarray and alloc_barray set this value for use by virtual
9496 * array routines.
9497 @@ -265,7 +274,7 @@
9498 * and so that algorithms can straddle outside the proper area up
9499 * to the next alignment.
9500 */
9501 - sizeofobject = jround_up(sizeofobject, ALIGN_SIZE);
9502 + sizeofobject = round_up_pow2(sizeofobject, ALIGN_SIZE);
9503
9504 /* Check for unsatisfiable request (do now to ensure no overflow below) */
9505 if ((SIZEOF(small_pool_hdr) + sizeofobject + ALIGN_SIZE - 1) > MAX_ALLOC_CHUNK)
9506 @@ -317,8 +326,8 @@
9507 /* OK, allocate the object from the current pool */
9508 data_ptr = (char *) hdr_ptr; /* point to first data byte in pool... */
9509 data_ptr += SIZEOF(small_pool_hdr); /* ...by skipping the header... */
9510 - if ((unsigned long)data_ptr % ALIGN_SIZE) /* ...and adjust for alignment */
9511 - data_ptr += ALIGN_SIZE - (unsigned long)data_ptr % ALIGN_SIZE;
9512 + if ((size_t)data_ptr % ALIGN_SIZE) /* ...and adjust for alignment */
9513 + data_ptr += ALIGN_SIZE - (size_t)data_ptr % ALIGN_SIZE;
9514 data_ptr += hdr_ptr->bytes_used; /* point to place for object */
9515 hdr_ptr->bytes_used += sizeofobject;
9516 hdr_ptr->bytes_left -= sizeofobject;
9517 @@ -354,7 +363,7 @@
9518 * algorithms can straddle outside the proper area up to the next
9519 * alignment.
9520 */
9521 - sizeofobject = jround_up(sizeofobject, ALIGN_SIZE);
9522 + sizeofobject = round_up_pow2(sizeofobject, ALIGN_SIZE);
9523
9524 /* Check for unsatisfiable request (do now to ensure no overflow below) */
9525 if ((SIZEOF(large_pool_hdr) + sizeofobject + ALIGN_SIZE - 1) > MAX_ALLOC_CHUNK)
9526 @@ -382,8 +391,8 @@
9527
9528 data_ptr = (char *) hdr_ptr; /* point to first data byte in pool... */
9529 data_ptr += SIZEOF(small_pool_hdr); /* ...by skipping the header... */
9530 - if ((unsigned long)data_ptr % ALIGN_SIZE) /* ...and adjust for alignment */
9531 - data_ptr += ALIGN_SIZE - (unsigned long)data_ptr % ALIGN_SIZE;
9532 + if ((size_t)data_ptr % ALIGN_SIZE) /* ...and adjust for alignment */
9533 + data_ptr += ALIGN_SIZE - (size_t)data_ptr % ALIGN_SIZE;
9534
9535 return (void FAR *) data_ptr;
9536 }
9537 @@ -420,7 +429,7 @@
9538 /* Make sure each row is properly aligned */
9539 if ((ALIGN_SIZE % SIZEOF(JSAMPLE)) != 0)
9540 out_of_memory(cinfo, 5); /* safety check */
9541 - samplesperrow = jround_up(samplesperrow, (2 * ALIGN_SIZE) / SIZEOF(JSAMPLE));
9542 + samplesperrow = (JDIMENSION)round_up_pow2(samplesperrow, (2 * ALIGN_SIZE) / SIZEOF(JSAMPLE));
9543
9544 /* Calculate max # of rows allowed in one allocation chunk */
9545 ltemp = (MAX_ALLOC_CHUNK-SIZEOF(large_pool_hdr)) /
9546 @@ -608,8 +617,8 @@
9547 /* Allocate the in-memory buffers for any unrealized virtual arrays */
9548 {
9549 my_mem_ptr mem = (my_mem_ptr) cinfo->mem;
9550 - long space_per_minheight, maximum_space, avail_mem;
9551 - long minheights, max_minheights;
9552 + size_t space_per_minheight, maximum_space, avail_mem;
9553 + size_t minheights, max_minheights;
9554 jvirt_sarray_ptr sptr;
9555 jvirt_barray_ptr bptr;
9556
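Reviewer note on round_up_pow2 in the jmemmgr.c hunks above: the mask form only agrees with ceil(a/b)*b when b is a power of 2, as the function's comment states. A minimal sketch comparing it with a generic division-based form follows. round_up_generic is a hypothetical name introduced here for comparison and is not the library's jround_up.

```c
#include <stddef.h>

/* Same expression as the patch: valid only when b is a power of 2. */
static size_t round_up_pow2(size_t a, size_t b)
{
  return (a + b - 1) & ~(b - 1);
}

/* Hypothetical reference form, ceil(a/b)*b, valid for any b > 0. */
static size_t round_up_generic(size_t a, size_t b)
{
  return ((a + b - 1) / b) * b;
}
```

For a = 13, b = 16 both return 16. For a non-power-of-2 such as b = 12 the two forms disagree (the mask form yields 16 where the generic form yields 24), so every call site needs ALIGN_SIZE, and (2 * ALIGN_SIZE) / SIZEOF(JSAMPLE), to be powers of 2.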
9557 Index: jmemnobs.c
9558 ===================================================================
9559 --- jmemnobs.c (revision 829)
9560 +++ jmemnobs.c (working copy)
9561 @@ -69,9 +69,9 @@
9562 * Here we always say, "we got all you want bud!"
9563 */
9564
9565 -GLOBAL(long)
9566 -jpeg_mem_available (j_common_ptr cinfo, long min_bytes_needed,
9567 - long max_bytes_needed, long already_allocated)
9568 +GLOBAL(size_t)
9569 +jpeg_mem_available (j_common_ptr cinfo, size_t min_bytes_needed,
9570 + size_t max_bytes_needed, size_t already_allocated)
9571 {
9572 return max_bytes_needed;
9573 }
9574 Index: jmemsys.h
9575 ===================================================================
9576 --- jmemsys.h (revision 829)
9577 +++ jmemsys.h (working copy)
9578 @@ -100,10 +100,10 @@
9579 * Conversely, zero may be returned to always use the minimum amount of memory.
9580 */
9581
9582 -EXTERN(long) jpeg_mem_available JPP((j_common_ptr cinfo,
9583 - long min_bytes_needed,
9584 - long max_bytes_needed,
9585 - long already_allocated));
9586 +EXTERN(size_t) jpeg_mem_available JPP((j_common_ptr cinfo,
9587 + size_t min_bytes_needed,
9588 + size_t max_bytes_needed,
9589 + size_t already_allocated));
9590
9591
9592 /*
167 Index: jmorecfg.h 9593 Index: jmorecfg.h
168 =================================================================== 9594 ===================================================================
169 --- jmorecfg.h (revision 829) 9595 --- jmorecfg.h (revision 829)
170 +++ jmorecfg.h (working copy) 9596 +++ jmorecfg.h (working copy)
171 @@ -153,14 +153,18 @@ 9597 @@ -1,9 +1,10 @@
9598 /*
9599 * jmorecfg.h
9600 *
9601 + * This file was part of the Independent JPEG Group's software:
9602 * Copyright (C) 1991-1997, Thomas G. Lane.
9603 - * Copyright (C) 2009, D. R. Commander.
9604 - * This file is part of the Independent JPEG Group's software.
9605 + * Modifications:
9606 + * Copyright (C) 2009, 2011, D. R. Commander.
9607 * For conditions of distribution and use, see the accompanying README file.
9608 *
9609 * This file contains additional configuration options that customize the
9610 @@ -153,14 +154,18 @@
172 /* INT16 must hold at least the values -32768..32767. */ 9611 /* INT16 must hold at least the values -32768..32767. */
173 9612
174 #ifndef XMD_H /* X11/xmd.h correctly defines INT16 */ 9613 #ifndef XMD_H /* X11/xmd.h correctly defines INT16 */
175 +#ifndef _BASETSD_H_ /* basetsd.h correctly defines INT32 */ 9614 +#ifndef _BASETSD_H_ /* basetsd.h correctly defines INT32 */
176 typedef short INT16; 9615 typedef short INT16;
177 #endif 9616 #endif
178 +#endif 9617 +#endif
179 9618
180 /* INT32 must hold at least signed 32-bit values. */ 9619 /* INT32 must hold at least signed 32-bit values. */
181 9620
182 #ifndef XMD_H /* X11/xmd.h correctly defines INT32 */ 9621 #ifndef XMD_H /* X11/xmd.h correctly defines INT32 */
183 +#ifndef _BASETSD_H_ /* basetsd.h correctly defines INT32 */ 9622 +#ifndef _BASETSD_H_ /* basetsd.h correctly defines INT32 */
184 typedef long INT32; 9623 typedef long INT32;
185 #endif 9624 #endif
186 +#endif 9625 +#endif
187 9626
188 /* Datatype used for image dimensions. The JPEG standard only supports 9627 /* Datatype used for image dimensions. The JPEG standard only supports
189 * images up to 64K*64K due to 16-bit fields in SOF markers. Therefore 9628 * images up to 64K*64K due to 16-bit fields in SOF markers. Therefore
190 @@ -210,11 +214,13 @@ 9629 @@ -210,11 +215,16 @@
191 * explicit coding is needed; see uses of the NEED_FAR_POINTERS symbol. 9630 * explicit coding is needed; see uses of the NEED_FAR_POINTERS symbol.
192 */ 9631 */
193 9632
194 +#ifndef FAR 9633 +#ifndef FAR
195 #ifdef NEED_FAR_POINTERS 9634 #ifdef NEED_FAR_POINTERS
9635 +#ifndef FAR
196 #define FAR far 9636 #define FAR far
9637 +#endif
197 #else 9638 #else
9639 +#undef FAR
198 #define FAR 9640 #define FAR
199 #endif 9641 #endif
200 +#endif 9642 +#endif
201 9643
202 9644
203 /* 9645 /*
9646 @@ -257,8 +267,6 @@
9647 * (You may HAVE to do that if your compiler doesn't like null source files.)
9648 */
9649
9650 -/* Arithmetic coding is unsupported for legal reasons. Complaints to IBM. */
9651 -
9652 /* Capability options common to encoder and decoder: */
9653
9654 #define DCT_ISLOW_SUPPORTED /* slow but accurate integer algorithm */
9655 @@ -267,7 +275,6 @@
9656
9657 /* Encoder capability options: */
9658
9659 -#undef C_ARITH_CODING_SUPPORTED /* Arithmetic coding back end? */
9660 #define C_MULTISCAN_FILES_SUPPORTED /* Multiple-scan JPEG files? */
9661 #define C_PROGRESSIVE_SUPPORTED /* Progressive JPEG? (Requires MULTISCAN)*/
9662 #define ENTROPY_OPT_SUPPORTED /* Optimization of entropy coding parms? */
9663 @@ -283,7 +290,6 @@
9664
9665 /* Decoder capability options: */
9666
9667 -#undef D_ARITH_CODING_SUPPORTED /* Arithmetic coding back end? */
9668 #define D_MULTISCAN_FILES_SUPPORTED /* Multiple-scan JPEG files? */
9669 #define D_PROGRESSIVE_SUPPORTED /* Progressive JPEG? (Requires MULTISCAN)*/
9670 #define SAVE_MARKERS_SUPPORTED /* jpeg_save_markers() needed? */
9671 @@ -317,22 +323,60 @@
9672 #define RGB_BLUE 2 /* Offset of Blue */
9673 #define RGB_PIXELSIZE 3 /* JSAMPLEs per RGB scanline element */
9674
9675 -#define JPEG_NUMCS 12
9676 +#define JPEG_NUMCS 16
9677
9678 +#define EXT_RGB_RED 0
9679 +#define EXT_RGB_GREEN 1
9680 +#define EXT_RGB_BLUE 2
9681 +#define EXT_RGB_PIXELSIZE 3
9682 +
9683 +#define EXT_RGBX_RED 0
9684 +#define EXT_RGBX_GREEN 1
9685 +#define EXT_RGBX_BLUE 2
9686 +#define EXT_RGBX_PIXELSIZE 4
9687 +
9688 +#define EXT_BGR_RED 2
9689 +#define EXT_BGR_GREEN 1
9690 +#define EXT_BGR_BLUE 0
9691 +#define EXT_BGR_PIXELSIZE 3
9692 +
9693 +#define EXT_BGRX_RED 2
9694 +#define EXT_BGRX_GREEN 1
9695 +#define EXT_BGRX_BLUE 0
9696 +#define EXT_BGRX_PIXELSIZE 4
9697 +
9698 +#define EXT_XBGR_RED 3
9699 +#define EXT_XBGR_GREEN 2
9700 +#define EXT_XBGR_BLUE 1
9701 +#define EXT_XBGR_PIXELSIZE 4
9702 +
9703 +#define EXT_XRGB_RED 1
9704 +#define EXT_XRGB_GREEN 2
9705 +#define EXT_XRGB_BLUE 3
9706 +#define EXT_XRGB_PIXELSIZE 4
9707 +
9708 static const int rgb_red[JPEG_NUMCS] = {
9709 - -1, -1, RGB_RED, -1, -1, -1, 0, 0, 2, 2, 3, 1
9710 + -1, -1, RGB_RED, -1, -1, -1, EXT_RGB_RED, EXT_RGBX_RED,
9711 + EXT_BGR_RED, EXT_BGRX_RED, EXT_XBGR_RED, EXT_XRGB_RED,
9712 + EXT_RGBX_RED, EXT_BGRX_RED, EXT_XBGR_RED, EXT_XRGB_RED
9713 };
9714
9715 static const int rgb_green[JPEG_NUMCS] = {
9716 - -1, -1, RGB_GREEN, -1, -1, -1, 1, 1, 1, 1, 2, 2
9717 + -1, -1, RGB_GREEN, -1, -1, -1, EXT_RGB_GREEN, EXT_RGBX_GREEN,
9718 + EXT_BGR_GREEN, EXT_BGRX_GREEN, EXT_XBGR_GREEN, EXT_XRGB_GREEN,
9719 + EXT_RGBX_GREEN, EXT_BGRX_GREEN, EXT_XBGR_GREEN, EXT_XRGB_GREEN
9720 };
9721
9722 static const int rgb_blue[JPEG_NUMCS] = {
9723 - -1, -1, RGB_BLUE, -1, -1, -1, 2, 2, 0, 0, 1, 3
9724 + -1, -1, RGB_BLUE, -1, -1, -1, EXT_RGB_BLUE, EXT_RGBX_BLUE,
9725 + EXT_BGR_BLUE, EXT_BGRX_BLUE, EXT_XBGR_BLUE, EXT_XRGB_BLUE,
9726 + EXT_RGBX_BLUE, EXT_BGRX_BLUE, EXT_XBGR_BLUE, EXT_XRGB_BLUE
9727 };
9728
9729 static const int rgb_pixelsize[JPEG_NUMCS] = {
9730 - -1, -1, RGB_PIXELSIZE, -1, -1, -1, 3, 4, 3, 4, 4, 4
9731 + -1, -1, RGB_PIXELSIZE, -1, -1, -1, EXT_RGB_PIXELSIZE, EXT_RGBX_PIXELSIZE,
9732 + EXT_BGR_PIXELSIZE, EXT_BGRX_PIXELSIZE, EXT_XBGR_PIXELSIZE, EXT_XRGB_PIXELSIZE,
9733 + EXT_RGBX_PIXELSIZE, EXT_BGRX_PIXELSIZE, EXT_XBGR_PIXELSIZE, EXT_XRGB_PIXELSIZE
9734 };
9735
9736 /* Definitions for speed-related optimizations. */
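Reviewer note on the rgb_red / rgb_green / rgb_blue / rgb_pixelsize tables above: they map a color space index to byte offsets within one pixel, so conversion code can write any supported layout from a single loop. The sketch below shows that lookup for the four 4-byte layouts only. The demo_* names and the reduced enum are hypothetical stand-ins, not the library's identifiers; the offset values are copied from the EXT_*_RED/GREEN/BLUE macros in the hunk.

```c
/* Hypothetical stand-ins for four J_COLOR_SPACE values. */
enum demo_cs { DEMO_RGBX, DEMO_BGRX, DEMO_XBGR, DEMO_XRGB, DEMO_NUMCS };

/* Byte offsets per layout, taken from the EXT_* macros in the patch. */
static const int demo_red[DEMO_NUMCS]       = { 0, 2, 3, 1 };
static const int demo_green[DEMO_NUMCS]     = { 1, 1, 2, 2 };
static const int demo_blue[DEMO_NUMCS]      = { 2, 0, 1, 3 };
static const int demo_pixelsize[DEMO_NUMCS] = { 4, 4, 4, 4 };

/* Store one pixel in the requested layout. The fourth byte is filled
 * with 0xFF, which is the behaviour the patch documents for the
 * JCS_EXT_RGBA/BGRA/ABGR/ARGB constants (opaque alpha). */
static void demo_store_pixel(unsigned char *dst, enum demo_cs cs,
                             unsigned char r, unsigned char g,
                             unsigned char b)
{
  int i;

  for (i = 0; i < demo_pixelsize[cs]; i++)
    dst[i] = 0xFF;                /* pre-fill, covers the X/alpha byte */
  dst[demo_red[cs]] = r;
  dst[demo_green[cs]] = g;
  dst[demo_blue[cs]] = b;
}
```

This also shows why the alpha variants can reuse the X-variant offsets in the tables: only the value written to the remaining byte differs.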
9737 Index: jpegint.h
9738 ===================================================================
9739 --- jpegint.h (revision 829)
9740 +++ jpegint.h (working copy)
9741 @@ -2,6 +2,7 @@
9742 * jpegint.h
9743 *
9744 * Copyright (C) 1991-1997, Thomas G. Lane.
9745 + * Modified 1997-2009 by Guido Vollbeding.
9746 * This file is part of the Independent JPEG Group's software.
9747 * For conditions of distribution and use, see the accompanying README file.
9748 *
9749 @@ -304,6 +305,7 @@
9750 #define jinit_forward_dct jIFDCT
9751 #define jinit_huff_encoder jIHEncoder
9752 #define jinit_phuff_encoder jIPHEncoder
9753 +#define jinit_arith_encoder jIAEncoder
9754 #define jinit_marker_writer jIMWriter
9755 #define jinit_master_decompress jIDMaster
9756 #define jinit_d_main_controller jIDMainC
9757 @@ -313,6 +315,7 @@
9758 #define jinit_marker_reader jIMReader
9759 #define jinit_huff_decoder jIHDecoder
9760 #define jinit_phuff_decoder jIPHDecoder
9761 +#define jinit_arith_decoder jIADecoder
9762 #define jinit_inverse_dct jIIDCT
9763 #define jinit_upsampler jIUpsampler
9764 #define jinit_color_deconverter jIDColor
9765 @@ -327,6 +330,7 @@
9766 #define jzero_far jZeroFar
9767 #define jpeg_zigzag_order jZIGTable
9768 #define jpeg_natural_order jZAGTable
9769 +#define jpeg_aritab jAriTab
9770 #endif /* NEED_SHORT_EXTERNAL_NAMES */
9771
9772
9773 @@ -345,6 +349,7 @@
9774 EXTERN(void) jinit_forward_dct JPP((j_compress_ptr cinfo));
9775 EXTERN(void) jinit_huff_encoder JPP((j_compress_ptr cinfo));
9776 EXTERN(void) jinit_phuff_encoder JPP((j_compress_ptr cinfo));
9777 +EXTERN(void) jinit_arith_encoder JPP((j_compress_ptr cinfo));
9778 EXTERN(void) jinit_marker_writer JPP((j_compress_ptr cinfo));
9779 /* Decompression module initialization routines */
9780 EXTERN(void) jinit_master_decompress JPP((j_decompress_ptr cinfo));
9781 @@ -358,6 +363,7 @@
9782 EXTERN(void) jinit_marker_reader JPP((j_decompress_ptr cinfo));
9783 EXTERN(void) jinit_huff_decoder JPP((j_decompress_ptr cinfo));
9784 EXTERN(void) jinit_phuff_decoder JPP((j_decompress_ptr cinfo));
9785 +EXTERN(void) jinit_arith_decoder JPP((j_decompress_ptr cinfo));
9786 EXTERN(void) jinit_inverse_dct JPP((j_decompress_ptr cinfo));
9787 EXTERN(void) jinit_upsampler JPP((j_decompress_ptr cinfo));
9788 EXTERN(void) jinit_color_deconverter JPP((j_decompress_ptr cinfo));
9789 @@ -382,6 +388,9 @@
9790 #endif
9791 extern const int jpeg_natural_order[]; /* zigzag coef order to natural order */
9792
9793 +/* Arithmetic coding probability estimation tables in jaricom.c */
9794 +extern const INT32 jpeg_aritab[];
9795 +
9796 /* Suppress undefined-structure complaints if necessary. */
9797
9798 #ifdef INCOMPLETE_TYPES_BROKEN
204 Index: jpeglib.h 9799 Index: jpeglib.h
205 =================================================================== 9800 ===================================================================
206 --- jpeglib.h (revision 829) 9801 --- jpeglib.h (revision 829)
207 +++ jpeglib.h (working copy) 9802 +++ jpeglib.h (working copy)
208 @@ -15,6 +15,10 @@ 9803 @@ -1,9 +1,12 @@
9804 /*
9805 * jpeglib.h
9806 *
9807 + * This file was part of the Independent JPEG Group's software:
9808 * Copyright (C) 1991-1998, Thomas G. Lane.
9809 - * Copyright (C) 2009, D. R. Commander.
9810 - * This file is part of the Independent JPEG Group's software.
9811 + * Modified 2002-2009 by Guido Vollbeding.
9812 + * Modifications:
9813 + * Copyright (C) 2009-2011, 2013, D. R. Commander.
9814 + * Copyright (C) 2015, Google, Inc.
9815 * For conditions of distribution and use, see the accompanying README file.
9816 *
9817 * This file defines the application interface for the JPEG library.
9818 @@ -14,6 +17,10 @@
209 #ifndef JPEGLIB_H 9819 #ifndef JPEGLIB_H
210 #define JPEGLIB_H 9820 #define JPEGLIB_H
211 9821
212 +/* Begin chromium edits */ 9822 +/* Begin chromium edits */
213 +#include "jpeglibmangler.h" 9823 +#include "jpeglibmangler.h"
214 +/* End chromium edits */ 9824 +/* End chromium edits */
215 + 9825 +
216 /* 9826 /*
217 * First we include the configuration files that record how this 9827 * First we include the configuration files that record how this
218 * installation of the JPEG library is set up. jconfig.h can be 9828 * installation of the JPEG library is set up. jconfig.h can be
9829 @@ -27,13 +34,13 @@
9830 #include "jmorecfg.h" /* seldom changed options */
9831
9832
9833 -/* Version ID for the JPEG library.
9834 - * Might be useful for tests like "#if JPEG_LIB_VERSION >= 60".
9835 - */
9836 +#ifdef __cplusplus
9837 +#ifndef DONT_USE_EXTERN_C
9838 +extern "C" {
9839 +#endif
9840 +#endif
9841
9842 -#define JPEG_LIB_VERSION 62 /* Version 6b */
9843
9844 -
9845 /* Various constants determining the sizes of things.
9846 * All of these are specified by the JPEG standard, so don't change them
9847 * if you want to be compatible.
9848 @@ -145,12 +152,17 @@
9849 * Values of 1,2,4,8 are likely to be supported. Note that different
9850 * components may receive different IDCT scalings.
9851 */
9852 +#if JPEG_LIB_VERSION >= 70
9853 + int DCT_h_scaled_size;
9854 + int DCT_v_scaled_size;
9855 +#else
9856 int DCT_scaled_size;
9857 +#endif
9858 /* The downsampled dimensions are the component's actual, unpadded number
9859 * of samples at the main buffer (preprocessing/compression interface), thus
9860 * downsampled_width = ceil(image_width * Hi/Hmax)
9861 * and similarly for height. For decompression, IDCT scaling is included, so
9862 - * downsampled_width = ceil(image_width * Hi/Hmax * DCT_scaled_size/DCTSIZE)
9863 + * downsampled_width = ceil(image_width * Hi/Hmax * DCT_[h_]scaled_size/DCTSIZE)
9864 */
9865 JDIMENSION downsampled_width; /* actual width in samples */
9866 JDIMENSION downsampled_height; /* actual height in samples */
9867 @@ -165,7 +177,7 @@
9868 int MCU_width; /* number of blocks per MCU, horizontally */
9869 int MCU_height; /* number of blocks per MCU, vertically */
9870 int MCU_blocks; /* MCU_width * MCU_height */
9871 - int MCU_sample_width; /* MCU width in samples, MCU_width*DCT_scaled_size */
9872 + int MCU_sample_width; /* MCU width in samples, MCU_width*DCT_[h_]scaled_size */
9873 int last_col_width; /* # of non-dummy blocks across in last MCU */
9874 int last_row_height; /* # of non-dummy blocks down in last MCU */
9875
9876 @@ -205,12 +217,13 @@
9877 /* Known color spaces. */
9878
9879 #define JCS_EXTENSIONS 1
9880 +#define JCS_ALPHA_EXTENSIONS 1
9881
9882 typedef enum {
9883 JCS_UNKNOWN, /* error/unspecified */
9884 JCS_GRAYSCALE, /* monochrome */
9885 JCS_RGB, /* red/green/blue as specified by the RGB_RED, RGB_GREEN,
9886 - RGB_BLUE, and RGB_PIXELSIZE macros */
9887 + RGB_BLUE, and RGB_PIXELSIZE macros */
9888 JCS_YCbCr, /* Y/Cb/Cr (also known as YUV) */
9889 JCS_CMYK, /* C/M/Y/K */
9890 JCS_YCCK, /* Y/Cb/Cr/K */
9891 @@ -220,6 +233,17 @@
9892 JCS_EXT_BGRX, /* blue/green/red/x */
9893 JCS_EXT_XBGR, /* x/blue/green/red */
9894 JCS_EXT_XRGB, /* x/red/green/blue */
9895 + /* When out_color_space it set to JCS_EXT_RGBX, JCS_EXT_BGRX,
9896 + JCS_EXT_XBGR, or JCS_EXT_XRGB during decompression, the X byte is
9897 + undefined, and in order to ensure the best performance,
9898 + libjpeg-turbo can set that byte to whatever value it wishes. Use
9899 + the following colorspace constants to ensure that the X byte is set
9900 + to 0xFF, so that it can be interpreted as an opaque alpha
9901 + channel. */
9902 + JCS_EXT_RGBA, /* red/green/blue/alpha */
9903 + JCS_EXT_BGRA, /* blue/green/red/alpha */
9904 + JCS_EXT_ABGR, /* alpha/blue/green/red */
9905 + JCS_EXT_ARGB /* alpha/red/green/blue */
9906 } J_COLOR_SPACE;
9907
9908 /* DCT/IDCT algorithm options. */
9909 @@ -301,6 +325,19 @@
9910 * helper routines to simplify changing parameters.
9911 */
9912
9913 +#if JPEG_LIB_VERSION >= 70
9914 + unsigned int scale_num, scale_denom; /* fraction by which to scale image */
9915 +
9916 + JDIMENSION jpeg_width; /* scaled JPEG image width */
9917 + JDIMENSION jpeg_height; /* scaled JPEG image height */
9918 + /* Dimensions of actual JPEG image that will be written to file,
9919 + * derived from input dimensions by scaling factors above.
9920 + * These fields are computed by jpeg_start_compress().
9921 + * You can also use jpeg_calc_jpeg_dimensions() to determine these values
9922 + * in advance of calling jpeg_start_compress().
9923 + */
9924 +#endif
9925 +
9926 int data_precision; /* bits of precision in image data */
9927
9928 int num_components; /* # of color components in JPEG image */
9929 @@ -308,14 +345,19 @@
9930
9931 jpeg_component_info * comp_info;
9932 /* comp_info[i] describes component that appears i'th in SOF */
9933 -
9934 +
9935 JQUANT_TBL * quant_tbl_ptrs[NUM_QUANT_TBLS];
9936 - /* ptrs to coefficient quantization tables, or NULL if not defined */
9937 -
9938 +#if JPEG_LIB_VERSION >= 70
9939 + int q_scale_factor[NUM_QUANT_TBLS];
9940 +#endif
9941 + /* ptrs to coefficient quantization tables, or NULL if not defined,
9942 + * and corresponding scale factors (percentage, initialized 100).
9943 + */
9944 +
9945 JHUFF_TBL * dc_huff_tbl_ptrs[NUM_HUFF_TBLS];
9946 JHUFF_TBL * ac_huff_tbl_ptrs[NUM_HUFF_TBLS];
9947 /* ptrs to Huffman coding tables, or NULL if not defined */
9948 -
9949 +
9950 UINT8 arith_dc_L[NUM_ARITH_TBLS]; /* L values for DC arith-coding tables */
9951 UINT8 arith_dc_U[NUM_ARITH_TBLS]; /* U values for DC arith-coding tables */
9952 UINT8 arith_ac_K[NUM_ARITH_TBLS]; /* Kx values for AC arith-coding tables */
9953 @@ -331,6 +373,9 @@
9954 boolean arith_code; /* TRUE=arithmetic coding, FALSE=Huffman */
9955 boolean optimize_coding; /* TRUE=optimize entropy encoding parms */
9956 boolean CCIR601_sampling; /* TRUE=first samples are cosited */
9957 +#if JPEG_LIB_VERSION >= 70
9958 + boolean do_fancy_downsampling; /* TRUE=apply fancy downsampling */
9959 +#endif
9960 int smoothing_factor; /* 1..100, or 0 for no input smoothing */
9961 J_DCT_METHOD dct_method; /* DCT algorithm selector */
9962
9963 @@ -374,6 +419,11 @@
9964 int max_h_samp_factor; /* largest h_samp_factor */
9965 int max_v_samp_factor; /* largest v_samp_factor */
9966
9967 +#if JPEG_LIB_VERSION >= 70
9968 + int min_DCT_h_scaled_size; /* smallest DCT_h_scaled_size of any component */
9969 + int min_DCT_v_scaled_size; /* smallest DCT_v_scaled_size of any component */
9970 +#endif
9971 +
9972 JDIMENSION total_iMCU_rows; /* # of iMCU rows to be input to coef ctlr */
9973 /* The coefficient controller receives data in units of MCU rows as defined
9974 * for fully interleaved scans (whether the JPEG file is interleaved or not).
9975 @@ -399,6 +449,12 @@
9976
9977 int Ss, Se, Ah, Al; /* progressive JPEG parameters for scan */
9978
9979 +#if JPEG_LIB_VERSION >= 80
9980 + int block_size; /* the basic DCT block size: 1..16 */
9981 + const int * natural_order; /* natural-order position array */
9982 + int lim_Se; /* min( Se, DCTSIZE2-1 ) */
9983 +#endif
9984 +
9985 /*
9986 * Links to compression subobjects (methods and private variables of modules)
9987 */
9988 @@ -545,6 +601,9 @@
9989 jpeg_component_info * comp_info;
9990 /* comp_info[i] describes component that appears i'th in SOF */
9991
9992 +#if JPEG_LIB_VERSION >= 80
9993 + boolean is_baseline; /* TRUE if Baseline SOF0 encountered */
9994 +#endif
9995 boolean progressive_mode; /* TRUE if SOFn specifies progressive mode */
9996 boolean arith_code; /* TRUE=arithmetic coding, FALSE=Huffman */
9997
9998 @@ -585,7 +644,12 @@
9999 int max_h_samp_factor; /* largest h_samp_factor */
10000 int max_v_samp_factor; /* largest v_samp_factor */
10001
10002 +#if JPEG_LIB_VERSION >= 70
10003 + int min_DCT_h_scaled_size; /* smallest DCT_h_scaled_size of any component */
10004 + int min_DCT_v_scaled_size; /* smallest DCT_v_scaled_size of any component */
10005 +#else
10006 int min_DCT_scaled_size; /* smallest DCT_scaled_size of any component */
10007 +#endif
10008
10009 JDIMENSION total_iMCU_rows; /* # of iMCU rows in image */
10010 /* The coefficient controller's input and output progress is measured in
10011 @@ -593,7 +657,7 @@
10012 * in fully interleaved JPEG scans, but are used whether the scan is
10013 * interleaved or not. We define an iMCU row as v_samp_factor DCT block
10014 * rows of each component. Therefore, the IDCT output contains
10015 - * v_samp_factor*DCT_scaled_size sample rows of a component per iMCU row.
10016 + * v_samp_factor*DCT_[v_]scaled_size sample rows of a component per iMCU row.
10017 */
10018
10019 JSAMPLE * sample_range_limit; /* table for fast range-limiting */
10020 @@ -617,6 +681,14 @@
10021
10022 int Ss, Se, Ah, Al; /* progressive JPEG parameters for scan */
10023
10024 +#if JPEG_LIB_VERSION >= 80
10025 + /* These fields are derived from Se of first SOS marker.
10026 + */
10027 + int block_size; /* the basic DCT block size: 1..16 */
10028 + const int * natural_order; /* natural-order position array for entropy decode */
10029 + int lim_Se; /* min( Se, DCTSIZE2-1 ) for entropy decode */
10030 +#endif
10031 +
10032 /* This field is shared between entropy decoder and marker parser.
10033 * It is either zero or the code of a JPEG marker that has been
10034 * read from the data source, but has not yet been processed.
10035 @@ -846,11 +918,18 @@
10036 #define jpeg_destroy_decompress jDestDecompress
10037 #define jpeg_stdio_dest jStdDest
10038 #define jpeg_stdio_src jStdSrc
10039 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED)
10040 +#define jpeg_mem_dest jMemDest
10041 +#define jpeg_mem_src jMemSrc
10042 +#endif
10043 #define jpeg_set_defaults jSetDefaults
10044 #define jpeg_set_colorspace jSetColorspace
10045 #define jpeg_default_colorspace jDefColorspace
10046 #define jpeg_set_quality jSetQuality
10047 #define jpeg_set_linear_quality jSetLQuality
10048 +#if JPEG_LIB_VERSION >= 70
10049 +#define jpeg_default_qtables jDefQTables
10050 +#endif
10051 #define jpeg_add_quant_table jAddQuantTable
10052 #define jpeg_quality_scaling jQualityScaling
10053 #define jpeg_simple_progression jSimProgress
10054 @@ -860,6 +939,9 @@
10055 #define jpeg_start_compress jStrtCompress
10056 #define jpeg_write_scanlines jWrtScanlines
10057 #define jpeg_finish_compress jFinCompress
10058 +#if JPEG_LIB_VERSION >= 70
10059 +#define jpeg_calc_jpeg_dimensions jCjpegDimensions
10060 +#endif
10061 #define jpeg_write_raw_data jWrtRawData
10062 #define jpeg_write_marker jWrtMarker
10063 #define jpeg_write_m_header jWrtMHeader
10064 @@ -876,6 +958,9 @@
10065 #define jpeg_input_complete jInComplete
10066 #define jpeg_new_colormap jNewCMap
10067 #define jpeg_consume_input jConsumeInput
10068 +#if JPEG_LIB_VERSION >= 80
10069 +#define jpeg_core_output_dimensions jCoreDimensions
10070 +#endif
10071 #define jpeg_calc_output_dimensions jCalcDimensions
10072 #define jpeg_save_markers jSaveMarkers
10073 #define jpeg_set_marker_processor jSetMarker
10074 @@ -920,6 +1005,16 @@
10075 EXTERN(void) jpeg_stdio_dest JPP((j_compress_ptr cinfo, FILE * outfile));
10076 EXTERN(void) jpeg_stdio_src JPP((j_decompress_ptr cinfo, FILE * infile));
10077
10078 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED)
10079 +/* Data source and destination managers: memory buffers. */
10080 +EXTERN(void) jpeg_mem_dest JPP((j_compress_ptr cinfo,
10081 + unsigned char ** outbuffer,
10082 + unsigned long * outsize));
10083 +EXTERN(void) jpeg_mem_src JPP((j_decompress_ptr cinfo,
10084 + unsigned char * inbuffer,
10085 + unsigned long insize));
10086 +#endif
10087 +
10088 /* Default parameter setup for compression */
10089 EXTERN(void) jpeg_set_defaults JPP((j_compress_ptr cinfo));
10090 /* Compression parameter setup aids */
10091 @@ -931,6 +1026,10 @@
10092 EXTERN(void) jpeg_set_linear_quality JPP((j_compress_ptr cinfo,
10093 int scale_factor,
10094 boolean force_baseline));
10095 +#if JPEG_LIB_VERSION >= 70
10096 +EXTERN(void) jpeg_default_qtables JPP((j_compress_ptr cinfo,
10097 + boolean force_baseline));
10098 +#endif
10099 EXTERN(void) jpeg_add_quant_table JPP((j_compress_ptr cinfo, int which_tbl,
10100 const unsigned int *basic_table,
10101 int scale_factor,
10102 @@ -950,12 +1049,17 @@
10103 JDIMENSION num_lines));
10104 EXTERN(void) jpeg_finish_compress JPP((j_compress_ptr cinfo));
10105
10106 +#if JPEG_LIB_VERSION >= 70
10107 +/* Precalculate JPEG dimensions for current compression parameters. */
10108 +EXTERN(void) jpeg_calc_jpeg_dimensions JPP((j_compress_ptr cinfo));
10109 +#endif
10110 +
10111 /* Replaces jpeg_write_scanlines when writing raw downsampled data. */
10112 EXTERN(JDIMENSION) jpeg_write_raw_data JPP((j_compress_ptr cinfo,
10113 JSAMPIMAGE data,
10114 JDIMENSION num_lines));
10115
10116 -/* Write a special marker. See libjpeg.doc concerning safe usage. */
10117 +/* Write a special marker. See libjpeg.txt concerning safe usage. */
10118 EXTERN(void) jpeg_write_marker
10119 JPP((j_compress_ptr cinfo, int marker,
10120 const JOCTET * dataptr, unsigned int datalen));
10121 @@ -986,6 +1090,8 @@
10122 EXTERN(JDIMENSION) jpeg_read_scanlines JPP((j_decompress_ptr cinfo,
10123 JSAMPARRAY scanlines,
10124 JDIMENSION max_lines));
10125 +EXTERN(JDIMENSION) jpeg_skip_scanlines (j_decompress_ptr cinfo,
10126 + JDIMENSION num_lines);
10127 EXTERN(boolean) jpeg_finish_decompress JPP((j_decompress_ptr cinfo));
10128
10129 /* Replaces jpeg_read_scanlines when reading raw downsampled data. */
10130 @@ -1009,6 +1115,9 @@
10131 #define JPEG_SCAN_COMPLETED 4 /* Completed last iMCU row of a scan */
10132
10133 /* Precalculate output dimensions for current decompression parameters. */
10134 +#if JPEG_LIB_VERSION >= 80
10135 +EXTERN(void) jpeg_core_output_dimensions JPP((j_decompress_ptr cinfo));
10136 +#endif
10137 EXTERN(void) jpeg_calc_output_dimensions JPP((j_decompress_ptr cinfo));
10138
10139 /* Control saving of COM and APPn markers into marker_list. */
10140 @@ -1103,4 +1212,10 @@
10141 #include "jerror.h" /* fetch error codes too */
10142 #endif
10143
10144 +#ifdef __cplusplus
10145 +#ifndef DONT_USE_EXTERN_C
10146 +}
10147 +#endif
10148 +#endif
10149 +
10150 #endif /* JPEGLIB_H */
219 Index: jpeglibmangler.h 10151 Index: jpeglibmangler.h
220 =================================================================== 10152 ===================================================================
221 --- jpeglibmangler.h (revision 0) 10153 --- jpeglibmangler.h (revision 0)
222 +++ jpeglibmangler.h (revision 0) 10154 +++ jpeglibmangler.h (working copy)
223 @@ -0,0 +1,113 @@ 10155 @@ -0,0 +1,114 @@
224 +// Copyright (c) 2009 The Chromium Authors. All rights reserved. 10156 +// Copyright (c) 2009 The Chromium Authors. All rights reserved.
225 +// Use of this source code is governed by a BSD-style license that can be 10157 +// Use of this source code is governed by a BSD-style license that can be
226 +// found in the LICENSE file. 10158 +// found in the LICENSE file.
227 + 10159 +
228 +#ifndef THIRD_PARTY_LIBJPEG_TURBO_JPEGLIBMANGLER_H_ 10160 +#ifndef THIRD_PARTY_LIBJPEG_TURBO_JPEGLIBMANGLER_H_
229 +#define THIRD_PARTY_LIBJPEG_TURBO_JPEGLIBMANGLER_H_ 10161 +#define THIRD_PARTY_LIBJPEG_TURBO_JPEGLIBMANGLER_H_
230 + 10162 +
231 +// Mangle all externally visible function names so we can build our own libjpeg 10163 +// Mangle all externally visible function names so we can build our own libjpeg
232 +// without system libraries trying to use it. 10164 +// without system libraries trying to use it.
233 + 10165 +
(...skipping 64 matching lines...)
298 +#define jpeg_write_scanlines chromium_jpeg_write_scanlines 10230 +#define jpeg_write_scanlines chromium_jpeg_write_scanlines
299 +#define jpeg_finish_compress chromium_jpeg_finish_compress 10231 +#define jpeg_finish_compress chromium_jpeg_finish_compress
300 +#define jpeg_write_raw_data chromium_jpeg_write_raw_data 10232 +#define jpeg_write_raw_data chromium_jpeg_write_raw_data
301 +#define jpeg_write_marker chromium_jpeg_write_marker 10233 +#define jpeg_write_marker chromium_jpeg_write_marker
302 +#define jpeg_write_m_header chromium_jpeg_write_m_header 10234 +#define jpeg_write_m_header chromium_jpeg_write_m_header
303 +#define jpeg_write_m_byte chromium_jpeg_write_m_byte 10235 +#define jpeg_write_m_byte chromium_jpeg_write_m_byte
304 +#define jpeg_write_tables chromium_jpeg_write_tables 10236 +#define jpeg_write_tables chromium_jpeg_write_tables
305 +#define jpeg_read_header chromium_jpeg_read_header 10237 +#define jpeg_read_header chromium_jpeg_read_header
306 +#define jpeg_start_decompress chromium_jpeg_start_decompress 10238 +#define jpeg_start_decompress chromium_jpeg_start_decompress
307 +#define jpeg_read_scanlines chromium_jpeg_read_scanlines 10239 +#define jpeg_read_scanlines chromium_jpeg_read_scanlines
10240 +#define jpeg_skip_scanlines chromium_jpeg_skip_scanlines
308 +#define jpeg_finish_decompress chromium_jpeg_finish_decompress 10241 +#define jpeg_finish_decompress chromium_jpeg_finish_decompress
309 +#define jpeg_read_raw_data chromium_jpeg_read_raw_data 10242 +#define jpeg_read_raw_data chromium_jpeg_read_raw_data
310 +#define jpeg_has_multiple_scans chromium_jpeg_has_multiple_scans 10243 +#define jpeg_has_multiple_scans chromium_jpeg_has_multiple_scans
311 +#define jpeg_start_output chromium_jpeg_start_output 10244 +#define jpeg_start_output chromium_jpeg_start_output
312 +#define jpeg_finish_output chromium_jpeg_finish_output 10245 +#define jpeg_finish_output chromium_jpeg_finish_output
313 +#define jpeg_input_complete chromium_jpeg_input_complete 10246 +#define jpeg_input_complete chromium_jpeg_input_complete
314 +#define jpeg_new_colormap chromium_jpeg_new_colormap 10247 +#define jpeg_new_colormap chromium_jpeg_new_colormap
315 +#define jpeg_consume_input chromium_jpeg_consume_input 10248 +#define jpeg_consume_input chromium_jpeg_consume_input
316 +#define jpeg_calc_output_dimensions chromium_jpeg_calc_output_dimensions 10249 +#define jpeg_calc_output_dimensions chromium_jpeg_calc_output_dimensions
317 +#define jpeg_save_markers chromium_jpeg_save_markers 10250 +#define jpeg_save_markers chromium_jpeg_save_markers
318 +#define jpeg_set_marker_processor chromium_jpeg_set_marker_processor 10251 +#define jpeg_set_marker_processor chromium_jpeg_set_marker_processor
319 +#define jpeg_read_coefficients chromium_jpeg_read_coefficients 10252 +#define jpeg_read_coefficients chromium_jpeg_read_coefficients
320 +#define jpeg_write_coefficients chromium_jpeg_write_coefficients 10253 +#define jpeg_write_coefficients chromium_jpeg_write_coefficients
321 +#define jpeg_copy_critical_parameters chromium_jpeg_copy_critical_parameters 10254 +#define jpeg_copy_critical_parameters chromium_jpeg_copy_critical_parameters
322 +#define jpeg_abort_compress chromium_jpeg_abort_compress 10255 +#define jpeg_abort_compress chromium_jpeg_abort_compress
323 +#define jpeg_abort_decompress chromium_jpeg_abort_decompress 10256 +#define jpeg_abort_decompress chromium_jpeg_abort_decompress
324 +#define jpeg_abort chromium_jpeg_abort 10257 +#define jpeg_abort chromium_jpeg_abort
325 +#define jpeg_destroy chromium_jpeg_destroy 10258 +#define jpeg_destroy chromium_jpeg_destroy
326 +#define jpeg_resync_to_restart chromium_jpeg_resync_to_restart 10259 +#define jpeg_resync_to_restart chromium_jpeg_resync_to_restart
327 +#define jpeg_get_small chromium_jpeg_get_small 10260 +#define jpeg_get_small chromium_jpeg_get_small
328 +#define jpeg_free_small chromium_jpeg_free_small 10261 +#define jpeg_free_small chromium_jpeg_free_small
329 +#define jpeg_get_large chromium_jpeg_get_large 10262 +#define jpeg_get_large chromium_jpeg_get_large
330 +#define jpeg_free_large chromium_jpeg_free_large 10263 +#define jpeg_free_large chromium_jpeg_free_large
331 +#define jpeg_mem_available chromium_jpeg_mem_available 10264 +#define jpeg_mem_available chromium_jpeg_mem_available
332 +#define jpeg_open_backing_store chromium_jpeg_open_backing_store 10265 +#define jpeg_open_backing_store chromium_jpeg_open_backing_store
333 +#define jpeg_mem_init chromium_jpeg_mem_init 10266 +#define jpeg_mem_init chromium_jpeg_mem_init
334 +#define jpeg_mem_term chromium_jpeg_mem_term 10267 +#define jpeg_mem_term chromium_jpeg_mem_term
335 + 10268 +
336 +#endif // THIRD_PARTY_LIBJPEG_TURBO_JPEGLIBMANGLER_H_ 10269 +#endif // THIRD_PARTY_LIBJPEG_TURBO_JPEGLIBMANGLER_H_
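A minimal sketch of what the mangling header above does, assuming nothing beyond the C preprocessor. The `STR` helper and `mangled_name()` are hypothetical names added only to make the rename observable; only the `#define jpeg_read_header chromium_jpeg_read_header` line mirrors the patch.

```c
/* Two-level stringification so the argument is macro-expanded first. */
#define STR_(x) #x
#define STR(x) STR_(x)

/* Same shape as one line of jpeglibmangler.h. */
#define jpeg_read_header chromium_jpeg_read_header

/* After preprocessing, this defines chromium_jpeg_read_header(),
 * so it cannot collide with a system libjpeg's jpeg_read_header(). */
int jpeg_read_header(void)
{
  return 1;
}

/* Hypothetical helper: report the symbol name the compiler actually sees. */
const char *mangled_name(void)
{
  return STR(jpeg_read_header);
}
```

Because every caller that includes the header is rewritten the same way, the new `jpeg_skip_scanlines` entry needs its own `#define`, which is what this patch set adds.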
337 Index: simd/jcgrass2-64.asm 10270 Index: jpegut.c
338 =================================================================== 10271 ===================================================================
339 --- simd/jcgrass2-64.asm (revision 829) 10272 --- jpegut.c (revision 829)
340 +++ simd/jcgrass2-64.asm (working copy) 10273 +++ jpegut.c (working copy)
341 @@ -30,7 +30,7 @@ 10274 @@ -19,11 +19,14 @@
342 SECTION SEG_CONST 10275 #include "./rrtimer.h"
343 10276 #include "./turbojpeg.h"
344 alignz 16 10277
345 - global EXTN(jconst_rgb_gray_convert_sse2) 10278 -#define _catch(f) {if((f)==-1) {printf("TJPEG: %s\n", tjGetErrorStr()); goto finally;}}
346 + global EXTN(jconst_rgb_gray_convert_sse2) PRIVATE 10279 +#define _catch(f) {if((f)==-1) {printf("TJPEG: %s\n", tjGetErrorStr()); bailout();}}
347 10280
348 EXTN(jconst_rgb_gray_convert_sse2): 10281 const char *_subnamel[NUMSUBOPT]={"4:4:4", "4:2:2", "4:2:0", "GRAY"};
349 10282 const char *_subnames[NUMSUBOPT]={"444", "422", "420", "GRAY"};
350 Index: simd/jiss2fst.asm 10283
351 =================================================================== 10284 +int exitstatus=0;
352 --- simd/jiss2fst.asm (revision 829) 10285 +#define bailout() {exitstatus=-1; goto finally;}
353 +++ simd/jiss2fst.asm (working copy) 10286 +
354 @@ -59,7 +59,7 @@ 10287 int pixels[9][3]=
355 %define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS) 10288 {
356 10289 {0, 255, 0},
357 alignz 16 10290 @@ -70,7 +73,7 @@
358 - global EXTN(jconst_idct_ifast_sse2) 10291 }
359 + global EXTN(jconst_idct_ifast_sse2) PRIVATE 10292 }
360 10293
361 EXTN(jconst_idct_ifast_sse2): 10294 -int dumpbuf(unsigned char *buf, int w, int h, int ps, int flags)
362 10295 +void dumpbuf(unsigned char *buf, int w, int h, int ps, int flags)
363 @@ -92,7 +92,7 @@ 10296 {
364 %define WK_NUM 2 10297 int roffset=(flags&TJ_BGR)?2:0, goffset=1, boffset=(flags&TJ_BGR)?0:2, i,
10298 j;
10299 @@ -177,12 +180,12 @@
10300 if((outfile=fopen(filename, "wb"))==NULL)
10301 {
10302 printf("ERROR: Could not open %s for writing.\n", filename);
10303 - goto finally;
10304 + bailout();
10305 }
10306 if(fwrite(jpegbuf, jpgbufsize, 1, outfile)!=1)
10307 {
10308 printf("ERROR: Could not write to %s.\n", filename);
10309 - goto finally;
10310 + bailout();
10311 }
10312
10313 finally:
10314 @@ -210,7 +213,7 @@
10315
10316 if((bmpbuf=(unsigned char *)malloc(w*h*ps+1))==NULL)
10317 {
10318 - printf("ERROR: Could not allocate buffer\n"); goto finally;
10319 + printf("ERROR: Could not allocate buffer\n"); bailout();
10320 }
10321 initbuf(bmpbuf, w, h, ps, flags);
10322 memset(jpegbuf, 0, TJBUFSIZE(w, h));
10323 @@ -249,12 +252,12 @@
10324 _catch(tjDecompressHeader(hnd, jpegbuf, jpegsize, &_w, &_h));
10325 if(_w!=w || _h!=h)
10326 {
10327 - printf("Incorrect JPEG header\n"); goto finally;
10328 + printf("Incorrect JPEG header\n"); bailout();
10329 }
10330
10331 if((bmpbuf=(unsigned char *)malloc(w*h*ps+1))==NULL)
10332 {
10333 - printf("ERROR: Could not allocate buffer\n"); goto finally;
10334 + printf("ERROR: Could not allocate buffer\n"); bailout();
10335 }
10336 memset(bmpbuf, 0, w*ps*h);
10337
10338 @@ -278,13 +281,13 @@
10339
10340 if((jpegbuf=(unsigned char *)malloc(TJBUFSIZE(w, h))) == NULL)
10341 {
10342 - puts("ERROR: Could not allocate buffer."); goto finally;
10343 + puts("ERROR: Could not allocate buffer."); bailout();
10344 }
10345
10346 if((hnd=tjInitCompress())==NULL)
10347 - {printf("Error in tjInitCompress():\n%s\n", tjGetErrorStr()); goto finally;}
10348 + {printf("Error in tjInitCompress():\n%s\n", tjGetErrorStr()); bailout();}
10349 if((dhnd=tjInitDecompress())==NULL)
10350 - {printf("Error in tjInitDecompress():\n%s\n", tjGetErrorStr()); goto finally;}
10351 + {printf("Error in tjInitDecompress():\n%s\n", tjGetErrorStr()); bailout();}
10352
10353 gentestjpeg(hnd, jpegbuf, &size, w, h, ps, basefilename, subsamp, 100, 0);
10354 gentestbmp(dhnd, jpegbuf, size, w, h, ps, basefilename, subsamp, 100, 0);
10355 @@ -327,7 +330,7 @@
10356 int i, j, i2; unsigned char *bmpbuf=NULL, *jpgbuf=NULL;
10357 tjhandle hnd=NULL; unsigned long size;
10358 if((hnd=tjInitCompress())==NULL)
10359 - {printf("Error in tjInitCompress():\n%s\n", tjGetErrorStr()); goto finally;}
10360 + {printf("Error in tjInitCompress():\n%s\n", tjGetErrorStr()); bailout();}
10361 printf("Buffer size regression test\n");
10362 for(j=1; j<48; j++)
10363 {
10364 @@ -337,7 +340,7 @@
10365 if((bmpbuf=(unsigned char *)malloc(i*j*4))==NULL
10366 || (jpgbuf=(unsigned char *)malloc(TJBUFSIZE(i, j)))==NULL)
10367 {
10368 - printf("Memory allocation failure\n"); goto finally;
10369 + printf("Memory allocation failure\n"); bailout();
10370 }
10371 memset(bmpbuf, 0, i*j*4);
10372 for(i2=0; i2<i*j; i2++)
10373 @@ -353,7 +356,7 @@
10374 if((bmpbuf=(unsigned char *)malloc(j*i*4))==NULL
10375 || (jpgbuf=(unsigned char *)malloc(TJBUFSIZE(j, i)))==NULL)
10376 {
10377 - printf("Memory allocation failure\n"); goto finally;
10378 + printf("Memory allocation failure\n"); bailout();
10379 }
10380 for(i2=0; i2<j*i*4; i2++)
10381 {
10382 @@ -380,5 +383,5 @@
10383 dotest(35, 41, 4, TJ_GRAYSCALE, "test");
10384 dotest1();
10385
10386 - return 0;
10387 + return exitstatus;
10388 }
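The jpegut.c hunks above replace `goto finally` with a `bailout()` macro that also records a failure status, so the test program can exit non-zero. A minimal sketch of that pattern follows; `try_alloc()` and its `simulate_failure` flag are hypothetical and exist only to exercise both paths.

```c
#include <stdio.h>
#include <stdlib.h>

/* As in the patch: remember that something failed, then jump to cleanup. */
int exitstatus = 0;
#define bailout() {exitstatus = -1; goto finally;}

/* Hypothetical helper: allocate a buffer, or pretend the allocation failed. */
void try_alloc(size_t n, int simulate_failure)
{
  unsigned char *buf = NULL;

  if (simulate_failure || (buf = (unsigned char *) malloc(n)) == NULL) {
    printf("ERROR: Could not allocate buffer\n");
    bailout();
  }
  buf[0] = 0;  /* stand-in for real work on the buffer */

finally:
  free(buf);   /* free(NULL) is a no-op, so both paths are safe */
}
```

With plain `goto finally`, `main` returned 0 even after a failed test; returning `exitstatus` instead lets a build script detect the failure.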
10389 Index: jpgtest.cxx
10390 ===================================================================
10391 --- jpgtest.cxx (revision 829)
10392 +++ jpgtest.cxx (working copy)
10393 @@ -322,22 +322,22 @@
10394 if(!stricmp(argv[i], "-tile")) dotile=1;
10395 if(!stricmp(argv[i], "-forcesse3"))
10396 {
10397 - printf("Using SSE3 code in Intel compressor\n");
10398 + printf("Using SSE3 code\n");
10399 forcesse3=1;
10400 }
10401 if(!stricmp(argv[i], "-forcesse2"))
10402 {
10403 - printf("Using SSE2 code in Intel compressor\n");
10404 + printf("Using SSE2 code\n");
10405 forcesse2=1;
10406 }
10407 if(!stricmp(argv[i], "-forcesse"))
10408 {
10409 - printf("Using SSE code in Intel compressor\n");
10410 + printf("Using SSE code\n");
10411 forcesse=1;
10412 }
10413 if(!stricmp(argv[i], "-forcemmx"))
10414 {
10415 - printf("Using MMX code in Intel compressor\n");
10416 + printf("Using MMX code\n");
10417 forcemmx=1;
10418 }
10419 if(!stricmp(argv[i], "-fastupsample"))
10420 Index: jquant1.c
10421 ===================================================================
10422 --- jquant1.c (revision 829)
10423 +++ jquant1.c (working copy)
10424 @@ -1,9 +1,10 @@
10425 /*
10426 * jquant1.c
10427 *
10428 + * This file was part of the Independent JPEG Group's software:
10429 * Copyright (C) 1991-1996, Thomas G. Lane.
10430 + * libjpeg-turbo Modifications:
10431 * Copyright (C) 2009, D. R. Commander
10432 - * This file is part of the Independent JPEG Group's software.
10433 * For conditions of distribution and use, see the accompanying README file.
10434 *
10435 * This file contains 1-pass color quantization (color mapping) routines.
10436 Index: jquant2.c
10437 ===================================================================
10438 --- jquant2.c (revision 829)
10439 +++ jquant2.c (working copy)
10440 @@ -1,9 +1,10 @@
10441 /*
10442 * jquant2.c
10443 *
10444 + * This file was part of the Independent JPEG Group's software:
10445 * Copyright (C) 1991-1996, Thomas G. Lane.
10446 + * libjpeg-turbo Modifications:
10447 * Copyright (C) 2009, D. R. Commander.
10448 - * This file is part of the Independent JPEG Group's software.
10449 * For conditions of distribution and use, see the accompanying README file.
10450 *
10451 * This file contains 2-pass color quantization (color mapping) routines.
10452 Index: jsimd.h
10453 ===================================================================
10454 --- jsimd.h (revision 829)
10455 +++ jsimd.h (working copy)
10456 @@ -2,9 +2,11 @@
10457 * jsimd.h
10458 *
10459 * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
10460 + * Copyright 2011 D. R. Commander
10461 *
10462 * Based on the x86 SIMD extension for IJG JPEG library,
10463 * Copyright (C) 1999-2006, MIYASAKA Masaru.
10464 + * For conditions of distribution and use, see copyright notice in jsimdext.inc
10465 *
10466 */
10467
10468 @@ -12,8 +14,10 @@
10469
10470 #ifdef NEED_SHORT_EXTERNAL_NAMES
10471 #define jsimd_can_rgb_ycc jSCanRgbYcc
10472 +#define jsimd_can_rgb_gray jSCanRgbGry
10473 #define jsimd_can_ycc_rgb jSCanYccRgb
10474 #define jsimd_rgb_ycc_convert jSRgbYccConv
10475 +#define jsimd_rgb_gray_convert jSRgbGryConv
10476 #define jsimd_ycc_rgb_convert jSYccRgbConv
10477 #define jsimd_can_h2v2_downsample jSCanH2V2Down
10478 #define jsimd_can_h2v1_downsample jSCanH2V1Down
10479 @@ -34,6 +38,7 @@
10480 #endif /* NEED_SHORT_EXTERNAL_NAMES */
10481
10482 EXTERN(int) jsimd_can_rgb_ycc JPP((void));
10483 +EXTERN(int) jsimd_can_rgb_gray JPP((void));
10484 EXTERN(int) jsimd_can_ycc_rgb JPP((void));
10485
10486 EXTERN(void) jsimd_rgb_ycc_convert
10487 @@ -40,6 +45,10 @@
10488 JPP((j_compress_ptr cinfo,
10489 JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
10490 JDIMENSION output_row, int num_rows));
10491 +EXTERN(void) jsimd_rgb_gray_convert
10492 + JPP((j_compress_ptr cinfo,
10493 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
10494 + JDIMENSION output_row, int num_rows));
10495 EXTERN(void) jsimd_ycc_rgb_convert
10496 JPP((j_decompress_ptr cinfo,
10497 JSAMPIMAGE input_buf, JDIMENSION input_row,
10498 Index: jsimd_none.c
10499 ===================================================================
10500 --- jsimd_none.c (revision 829)
10501 +++ jsimd_none.c (working copy)
10502 @@ -2,10 +2,11 @@
10503 * jsimd_none.c
10504 *
10505 * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
10506 - * Copyright 2009 D. R. Commander
10507 + * Copyright 2009-2011 D. R. Commander
10508 *
10509 * Based on the x86 SIMD extension for IJG JPEG library,
10510 * Copyright (C) 1999-2006, MIYASAKA Masaru.
10511 + * For conditions of distribution and use, see copyright notice in jsimdext.inc
10512 *
10513 * This file contains stubs for when there is no SIMD support available.
10514 */
10515 @@ -24,6 +25,12 @@
10516 }
10517
10518 GLOBAL(int)
10519 +jsimd_can_rgb_gray (void)
10520 +{
10521 + return 0;
10522 +}
10523 +
10524 +GLOBAL(int)
10525 jsimd_can_ycc_rgb (void)
10526 {
10527 return 0;
10528 @@ -37,6 +44,13 @@
10529 }
10530
10531 GLOBAL(void)
10532 +jsimd_rgb_gray_convert (j_compress_ptr cinfo,
10533 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
10534 + JDIMENSION output_row, int num_rows)
10535 +{
10536 +}
10537 +
10538 +GLOBAL(void)
10539 jsimd_ycc_rgb_convert (j_decompress_ptr cinfo,
10540 JSAMPIMAGE input_buf, JDIMENSION input_row,
10541 JSAMPARRAY output_buf, int num_rows)
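The stubs above only make sense together with a caller that checks `jsimd_can_rgb_gray()` before using `jsimd_rgb_gray_convert()`. The sketch below shows that check-then-dispatch shape under simplified assumptions: all `demo_*` names and the flat byte-buffer signature are hypothetical, and the fallback uses the standard 0.299/0.587/0.114 luma weights in integer form rather than the library's table-driven code.

```c
#include <stddef.h>

/* Stub in the spirit of jsimd_none.c: no SIMD path is available. */
int demo_simd_can_rgb_gray(void)
{
  return 0;
}

/* Stub converter: intentionally empty, never called while the check is 0. */
void demo_simd_rgb_gray_convert(const unsigned char *rgb, unsigned char *gray,
                                size_t npixels)
{
  (void) rgb; (void) gray; (void) npixels;
}

/* Portable fallback: rounded integer luma from interleaved RGB. */
void demo_c_rgb_gray_convert(const unsigned char *rgb, unsigned char *gray,
                             size_t npixels)
{
  size_t i;
  for (i = 0; i < npixels; i++) {
    unsigned int r = rgb[3 * i], g = rgb[3 * i + 1], b = rgb[3 * i + 2];
    gray[i] = (unsigned char) ((299 * r + 587 * g + 114 * b + 500) / 1000);
  }
}

/* Dispatcher: use SIMD only when the capability check says it exists. */
void demo_rgb_gray_convert(const unsigned char *rgb, unsigned char *gray,
                           size_t npixels)
{
  if (demo_simd_can_rgb_gray())
    demo_simd_rgb_gray_convert(rgb, gray, npixels);
  else
    demo_c_rgb_gray_convert(rgb, gray, npixels);
}
```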
10542 Index: jsimddct.h
10543 ===================================================================
10544 --- jsimddct.h (revision 829)
10545 +++ jsimddct.h (working copy)
10546 @@ -5,6 +5,7 @@
10547 *
10548 * Based on the x86 SIMD extension for IJG JPEG library,
10549 * Copyright (C) 1999-2006, MIYASAKA Masaru.
10550 + * For conditions of distribution and use, see copyright notice in jsimdext.inc
10551 *
10552 */
10553
10554 Index: jversion.h
10555 ===================================================================
10556 --- jversion.h (revision 829)
10557 +++ jversion.h (working copy)
10558 @@ -1,8 +1,10 @@
10559 /*
10560 * jversion.h
10561 *
10562 - * Copyright (C) 1991-1998, Thomas G. Lane.
10563 - * This file is part of the Independent JPEG Group's software.
10564 + * This file was part of the Independent JPEG Group's software:
10565 + * Copyright (C) 1991-2012, Thomas G. Lane, Guido Vollbeding.
10566 + * Modifications:
10567 + * Copyright (C) 2010, 2012-2014, D. R. Commander.
10568 * For conditions of distribution and use, see the accompanying README file.
10569 *
10570 * This file contains software version identification.
10571 @@ -9,6 +11,22 @@
10572 */
10573
10574
10575 +#if JPEG_LIB_VERSION >= 80
10576 +
10577 +#define JVERSION "8d 15-Jan-2012"
10578 +
10579 +#elif JPEG_LIB_VERSION >= 70
10580 +
10581 +#define JVERSION "7 27-Jun-2009"
10582 +
10583 +#else
10584 +
10585 #define JVERSION "6b 27-Mar-1998"
10586
10587 -#define JCOPYRIGHT "Copyright (C) 1998, Thomas G. Lane"
10588 +#endif
10589 +
10590 +#define JCOPYRIGHT "Copyright (C) 1991-2012 Thomas G. Lane, Guido Vollbeding\n" \
10591 + "Copyright (C) 1999-2006 MIYASAKA Masaru\n" \
10592 + "Copyright (C) 2009 Pierre Ossman for Cendio AB\n" \
10593 + "Copyright (C) 2009-2014 D. R. Commander\n" \
10594 + "Copyright (C) 2009-2011 Nokia Corporation and/or its subsidiary(-ies)"
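The jversion.h hunk above picks the version string from `JPEG_LIB_VERSION` at compile time. A minimal sketch of that selection follows. The `#ifndef` default of 62 is an assumption for this stand-alone example (in the real tree the value comes from the build configuration), and `jversion()` is a hypothetical accessor.

```c
/* Assumed default for this sketch only; normally set by the build config. */
#ifndef JPEG_LIB_VERSION
#define JPEG_LIB_VERSION 62
#endif

/* Same three-way selection as the patched jversion.h. */
#if JPEG_LIB_VERSION >= 80
#define JVERSION "8d 15-Jan-2012"
#elif JPEG_LIB_VERSION >= 70
#define JVERSION "7 27-Jun-2009"
#else
#define JVERSION "6b 27-Mar-1998"
#endif

/* Hypothetical accessor so the selected string can be inspected at run time. */
const char *jversion(void)
{
  return JVERSION;
}
```

Compiling with `-DJPEG_LIB_VERSION=80` would select the "8d" string instead; the same macro gates the extra struct fields and prototypes in the jpeglib.h hunks earlier in this patch.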
10595 Index: rdbmp.c
10596 ===================================================================
10597 --- rdbmp.c (revision 829)
10598 +++ rdbmp.c (working copy)
10599 @@ -1,8 +1,11 @@
10600 /*
10601 * rdbmp.c
10602 *
10603 + * This file was part of the Independent JPEG Group's software:
10604 * Copyright (C) 1994-1996, Thomas G. Lane.
10605 - * This file is part of the Independent JPEG Group's software.
10606 + * Modified 2009-2010 by Guido Vollbeding.
10607 + * libjpeg-turbo Modifications:
10608 + * Modified 2011 by Siarhei Siamashka.
10609 * For conditions of distribution and use, see the accompanying README file.
10610 *
10611 * This file contains routines to read input images in Microsoft "BMP"
10612 @@ -177,10 +180,41 @@
10613 }
10614
10615
10616 +METHODDEF(JDIMENSION)
10617 +get_32bit_row (j_compress_ptr cinfo, cjpeg_source_ptr sinfo)
10618 +/* This version is for reading 32-bit pixels */
10619 +{
10620 + bmp_source_ptr source = (bmp_source_ptr) sinfo;
10621 + JSAMPARRAY image_ptr;
10622 + register JSAMPROW inptr, outptr;
10623 + register JDIMENSION col;
10624 +
10625 + /* Fetch next row from virtual array */
10626 + source->source_row--;
10627 + image_ptr = (*cinfo->mem->access_virt_sarray)
10628 + ((j_common_ptr) cinfo, source->whole_image,
10629 + source->source_row, (JDIMENSION) 1, FALSE);
10630 + /* Transfer data. Note source values are in BGR order
10631 + * (even though Microsoft's own documents say the opposite).
10632 + */
10633 + inptr = image_ptr[0];
10634 + outptr = source->pub.buffer[0];
10635 + for (col = cinfo->image_width; col > 0; col--) {
10636 + outptr[2] = *inptr++; /* can omit GETJSAMPLE() safely */
10637 + outptr[1] = *inptr++;
10638 + outptr[0] = *inptr++;
10639 + inptr++; /* skip the 4th byte (Alpha channel) */
10640 + outptr += 3;
10641 + }
10642 +
10643 + return 1;
10644 +}
10645 +
10646 +
10647 /*
10648 * This method loads the image into whole_image during the first call on
10649 * get_pixel_rows. The get_pixel_rows pointer is then adjusted to call
10650 - * get_8bit_row or get_24bit_row on subsequent calls.
10651 + * get_8bit_row, get_24bit_row, or get_32bit_row on subsequent calls.
10652 */
10653
10654 METHODDEF(JDIMENSION)
10655 @@ -188,10 +222,9 @@
10656 {
10657 bmp_source_ptr source = (bmp_source_ptr) sinfo;
10658 register FILE *infile = source->pub.input_file;
10659 - register int c;
10660 register JSAMPROW out_ptr;
10661 JSAMPARRAY image_ptr;
10662 - JDIMENSION row, col;
10663 + JDIMENSION row;
10664 cd_progress_ptr progress = (cd_progress_ptr) cinfo->progress;
10665
10666 /* Read the data into a virtual array in input-file row order. */
10667 @@ -205,11 +238,11 @@
10668 ((j_common_ptr) cinfo, source->whole_image,
10669 row, (JDIMENSION) 1, TRUE);
10670 out_ptr = image_ptr[0];
10671 - for (col = source->row_width; col > 0; col--) {
10672 - /* inline copy of read_byte() for speed */
10673 - if ((c = getc(infile)) == EOF)
10674 - ERREXIT(cinfo, JERR_INPUT_EOF);
10675 - *out_ptr++ = (JSAMPLE) c;
10676 + if (fread(out_ptr, 1, source->row_width, infile) != source->row_width) {
10677 + if (feof(infile))
10678 + ERREXIT(cinfo, JERR_INPUT_EOF);
10679 + else
10680 + ERREXIT(cinfo, JERR_FILE_READ);
10681 }
10682 }
10683 if (progress != NULL)
10684 @@ -223,6 +256,9 @@
10685 case 24:
10686 source->pub.get_pixel_rows = get_24bit_row;
10687 break;
10688 + case 32:
10689 + source->pub.get_pixel_rows = get_32bit_row;
10690 + break;
10691 default:
10692 ERREXIT(cinfo, JERR_BMP_BADDEPTH);
10693 }
10694 @@ -251,8 +287,8 @@
10695 (((INT32) UCH(array[offset+3])) << 24))
10696 INT32 bfOffBits;
10697 INT32 headerSize;
10698 - INT32 biWidth = 0; /* initialize to avoid compiler warning */
10699 - INT32 biHeight = 0;
10700 + INT32 biWidth;
10701 + INT32 biHeight;
10702 unsigned int biPlanes;
10703 INT32 biCompression;
10704 INT32 biXPelsPerMeter,biYPelsPerMeter;
10705 @@ -300,8 +336,6 @@
10706 ERREXIT(cinfo, JERR_BMP_BADDEPTH);
10707 break;
10708 }
10709 - if (biPlanes != 1)
10710 - ERREXIT(cinfo, JERR_BMP_BADPLANES);
10711 break;
10712 case 40:
10713 case 64:
10714 @@ -325,12 +359,13 @@
10715 case 24: /* RGB image */
10716 TRACEMS2(cinfo, 1, JTRC_BMP, (int) biWidth, (int) biHeight);
10717 break;
10718 + case 32: /* RGB image + Alpha channel */
10719 + TRACEMS2(cinfo, 1, JTRC_BMP, (int) biWidth, (int) biHeight);
10720 + break;
10721 default:
10722 ERREXIT(cinfo, JERR_BMP_BADDEPTH);
10723 break;
10724 }
10725 - if (biPlanes != 1)
10726 - ERREXIT(cinfo, JERR_BMP_BADPLANES);
10727 if (biCompression != 0)
10728 ERREXIT(cinfo, JERR_BMP_COMPRESSED);
10729
10730 @@ -343,9 +378,14 @@
10731 break;
10732 default:
10733 ERREXIT(cinfo, JERR_BMP_BADHEADER);
10734 - break;
10735 + return;
10736 }
10737
10738 + if (biWidth <= 0 || biHeight <= 0)
10739 + ERREXIT(cinfo, JERR_BMP_EMPTY);
10740 + if (biPlanes != 1)
10741 + ERREXIT(cinfo, JERR_BMP_BADPLANES);
10742 +
10743 /* Compute distance to bitmap data --- will adjust for colormap below */
10744 bPad = bfOffBits - (headerSize + 14);
10745
10746 @@ -375,6 +415,8 @@
10747 /* Compute row width in file, including padding to 4-byte boundary */
10748 if (source->bits_per_pixel == 24)
10749 row_width = (JDIMENSION) (biWidth * 3);
10750 + else if (source->bits_per_pixel == 32)
10751 + row_width = (JDIMENSION) (biWidth * 4);
10752 else
10753 row_width = (JDIMENSION) biWidth;
10754 while ((row_width & 3) != 0) row_width++;
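The rdbmp.c changes above add 32-bit BMP input in two places: the row-width computation and a per-row converter that drops the fourth byte. A minimal stand-alone sketch of both follows, using plain `unsigned char` buffers instead of the library's `JSAMPROW` and virtual arrays; the function names are hypothetical.

```c
#include <stddef.h>

/* Bytes per BMP row in the file, padded up to a 4-byte boundary. */
size_t bmp_row_width(size_t width, int bits_per_pixel)
{
  size_t row_width;

  if (bits_per_pixel == 24)
    row_width = width * 3;
  else if (bits_per_pixel == 32)
    row_width = width * 4;
  else
    row_width = width;          /* 8-bit colormapped */
  while ((row_width & 3) != 0)
    row_width++;
  return row_width;
}

/* Convert one row of B,G,R,A pixels to R,G,B, discarding alpha. */
void bgra_row_to_rgb(const unsigned char *in, unsigned char *out, size_t width)
{
  size_t col;

  for (col = 0; col < width; col++) {
    out[2] = *in++;             /* blue  */
    out[1] = *in++;             /* green */
    out[0] = *in++;             /* red   */
    in++;                       /* skip the 4th byte (alpha channel) */
    out += 3;
  }
}
```

A 32-bit row is always a multiple of 4 bytes, so the padding loop only ever changes the 8-bit and 24-bit cases.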
10755 Index: rdppm.c
10756 ===================================================================
10757 --- rdppm.c (revision 829)
10758 +++ rdppm.c (working copy)
10759 @@ -2,6 +2,7 @@
10760 * rdppm.c
10761 *
10762 * Copyright (C) 1991-1997, Thomas G. Lane.
10763 + * Modified 2009 by Bill Allombert, Guido Vollbeding.
10764 * This file is part of the Independent JPEG Group's software.
10765 * For conditions of distribution and use, see the accompanying README file.
10766 *
10767 @@ -250,8 +251,8 @@
10768 bufferptr = source->iobuffer;
10769 for (col = cinfo->image_width; col > 0; col--) {
10770 register int temp;
10771 - temp = UCH(*bufferptr++);
10772 - temp |= UCH(*bufferptr++) << 8;
10773 + temp = UCH(*bufferptr++) << 8;
10774 + temp |= UCH(*bufferptr++);
10775 *ptr++ = rescale[temp];
10776 }
10777 return 1;
10778 @@ -274,14 +275,14 @@
10779 bufferptr = source->iobuffer;
10780 for (col = cinfo->image_width; col > 0; col--) {
10781 register int temp;
10782 - temp = UCH(*bufferptr++);
10783 - temp |= UCH(*bufferptr++) << 8;
10784 + temp = UCH(*bufferptr++) << 8;
10785 + temp |= UCH(*bufferptr++);
10786 *ptr++ = rescale[temp];
10787 - temp = UCH(*bufferptr++);
10788 - temp |= UCH(*bufferptr++) << 8;
10789 + temp = UCH(*bufferptr++) << 8;
10790 + temp |= UCH(*bufferptr++);
10791 *ptr++ = rescale[temp];
10792 - temp = UCH(*bufferptr++);
10793 - temp |= UCH(*bufferptr++) << 8;
10794 + temp = UCH(*bufferptr++) << 8;
10795 + temp |= UCH(*bufferptr++);
10796 *ptr++ = rescale[temp];
10797 }
10798 return 1;
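Note on the rdppm.c hunks above: the swap makes the reader treat the first byte of each 16-bit sample as the high byte, which matches the most-significant-byte-first order used for raw PPM/PGM samples wider than 8 bits. A minimal sketch of that read follows; `read_be16` is a hypothetical helper name and is not in the patch.

```c
/* Hypothetical helper: read one 16-bit big-endian sample and advance
 * the caller's pointer past it. */
static unsigned int read_be16(const unsigned char **pp)
{
  unsigned int temp;

  temp = (unsigned int)(*pp)[0] << 8;   /* first byte is the high byte */
  temp |= (unsigned int)(*pp)[1];       /* second byte is the low byte */
  *pp += 2;
  return temp;
}
```

With the old order, a sample stored as bytes 0x12 0x34 would have been read as 0x3412 and then used as an index into `rescale[]`.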
10799 Index: rdswitch.c
10800 ===================================================================
10801 --- rdswitch.c (revision 829)
10802 +++ rdswitch.c (working copy)
10803 @@ -1,8 +1,10 @@
10804 /*
10805 * rdswitch.c
10806 *
10807 + * This file was part of the Independent JPEG Group's software:
10808 * Copyright (C) 1991-1996, Thomas G. Lane.
10809 - * This file is part of the Independent JPEG Group's software.
10810 + * libjpeg-turbo Modifications:
10811 + * Copyright (C) 2010, D. R. Commander.
10812 * For conditions of distribution and use, see the accompanying README file.
10813 *
10814 * This file contains routines to process some of cjpeg's more complicated
10815 @@ -9,6 +11,7 @@
10816 * command-line switches. Switches processed here are:
10817 * -qtables file Read quantization tables from text file
10818 * -scans file Read scan script from text file
10819 + * -quality N[,N,...] Set quality ratings
10820 * -qslots N[,N,...] Set component quantization table selectors
10821 * -sample HxV[,HxV,...] Set component sampling factors
10822 */
10823 @@ -69,9 +72,12 @@
10824 }
10825
10826
10827 +#if JPEG_LIB_VERSION < 70
10828 +static int q_scale_factor[NUM_QUANT_TBLS] = {100, 100, 100, 100};
10829 +#endif
10830 +
10831 GLOBAL(boolean)
10832 -read_quant_tables (j_compress_ptr cinfo, char * filename,
10833 - int scale_factor, boolean force_baseline)
10834 +read_quant_tables (j_compress_ptr cinfo, char * filename, boolean force_baseline)
10835 /* Read a set of quantization tables from the specified file.
10836 * The file is plain ASCII text: decimal numbers with whitespace between.
10837 * Comments preceded by '#' may be included in the file.
10838 @@ -108,7 +114,13 @@
10839 }
10840 table[i] = (unsigned int) val;
10841 }
10842 - jpeg_add_quant_table(cinfo, tblno, table, scale_factor, force_baseline);
10843 +#if JPEG_LIB_VERSION >= 70
10844 + jpeg_add_quant_table(cinfo, tblno, table, cinfo->q_scale_factor[tblno],
10845 + force_baseline);
10846 +#else
10847 + jpeg_add_quant_table(cinfo, tblno, table, q_scale_factor[tblno],
10848 + force_baseline);
10849 +#endif
10850 tblno++;
10851 }
10852
10853 @@ -262,7 +274,85 @@
10854 #endif /* C_MULTISCAN_FILES_SUPPORTED */
10855
10856
10857 +#if JPEG_LIB_VERSION < 70
10858 +/* These are the sample quantization tables given in JPEG spec section K.1.
10859 + * The spec says that the values given produce "good" quality, and
10860 + * when divided by 2, "very good" quality.
10861 + */
10862 +static const unsigned int std_luminance_quant_tbl[DCTSIZE2] = {
10863 + 16, 11, 10, 16, 24, 40, 51, 61,
10864 + 12, 12, 14, 19, 26, 58, 60, 55,
10865 + 14, 13, 16, 24, 40, 57, 69, 56,
10866 + 14, 17, 22, 29, 51, 87, 80, 62,
10867 + 18, 22, 37, 56, 68, 109, 103, 77,
10868 + 24, 35, 55, 64, 81, 104, 113, 92,
10869 + 49, 64, 78, 87, 103, 121, 120, 101,
10870 + 72, 92, 95, 98, 112, 100, 103, 99
10871 +};
10872 +static const unsigned int std_chrominance_quant_tbl[DCTSIZE2] = {
10873 + 17, 18, 24, 47, 99, 99, 99, 99,
10874 + 18, 21, 26, 66, 99, 99, 99, 99,
10875 + 24, 26, 56, 99, 99, 99, 99, 99,
10876 + 47, 66, 99, 99, 99, 99, 99, 99,
10877 + 99, 99, 99, 99, 99, 99, 99, 99,
10878 + 99, 99, 99, 99, 99, 99, 99, 99,
10879 + 99, 99, 99, 99, 99, 99, 99, 99,
10880 + 99, 99, 99, 99, 99, 99, 99, 99
10881 +};
10882 +
10883 +
10884 +LOCAL(void)
10885 +jpeg_default_qtables (j_compress_ptr cinfo, boolean force_baseline)
10886 +{
10887 + jpeg_add_quant_table(cinfo, 0, std_luminance_quant_tbl,
10888 + q_scale_factor[0], force_baseline);
10889 + jpeg_add_quant_table(cinfo, 1, std_chrominance_quant_tbl,
10890 + q_scale_factor[1], force_baseline);
10891 +}
10892 +#endif
10893 +
10894 +
10895 GLOBAL(boolean)
10896 +set_quality_ratings (j_compress_ptr cinfo, char *arg, boolean force_baseline)
10897 +/* Process a quality-ratings parameter string, of the form
10898 + * N[,N,...]
10899 + * If there are more q-table slots than parameters, the last value is replicated.
10900 + */
10901 +{
10902 + int val = 75; /* default value */
10903 + int tblno;
10904 + char ch;
10905 +
10906 + for (tblno = 0; tblno < NUM_QUANT_TBLS; tblno++) {
10907 + if (*arg) {
10908 + ch = ','; /* if not set by sscanf, will be ',' */
10909 + if (sscanf(arg, "%d%c", &val, &ch) < 1)
10910 + return FALSE;
10911 + if (ch != ',') /* syntax check */
10912 + return FALSE;
10913 + /* Convert user 0-100 rating to percentage scaling */
10914 +#if JPEG_LIB_VERSION >= 70
10915 + cinfo->q_scale_factor[tblno] = jpeg_quality_scaling(val);
10916 +#else
10917 + q_scale_factor[tblno] = jpeg_quality_scaling(val);
10918 +#endif
10919 + while (*arg && *arg++ != ',') /* advance to next segment of arg string */
10920 + ;
10921 + } else {
10922 + /* reached end of parameter, set remaining factors to last value */
10923 +#if JPEG_LIB_VERSION >= 70
10924 + cinfo->q_scale_factor[tblno] = jpeg_quality_scaling(val);
10925 +#else
10926 + q_scale_factor[tblno] = jpeg_quality_scaling(val);
10927 +#endif
10928 + }
10929 + }
10930 + jpeg_default_qtables(cinfo, force_baseline);
10931 + return TRUE;
10932 +}
10933 +
10934 +
10935 +GLOBAL(boolean)
10936 set_quant_slots (j_compress_ptr cinfo, char *arg)
10937 /* Process a quantization-table-selectors parameter string, of the form
10938 * N[,N,...]
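Note on set_quality_ratings() in the rdswitch.c hunk above: the `"%d%c"` scan reads one rating plus the separator that follows it, and the last rating fills any remaining slots. The sketch below reproduces that parsing outside libjpeg so it can be tried in isolation. `parse_quality_ratings`, `quality_scaling` and `NUM_SLOTS` are hypothetical names; `quality_scaling` is assumed to follow the same mapping as libjpeg's jpeg_quality_scaling(), and the two assignment branches of the patch are merged into one statement here.

```c
#include <stdio.h>

#define NUM_SLOTS 4   /* stands in for NUM_QUANT_TBLS */

/* Assumed to match jpeg_quality_scaling(): maps a 0-100 rating to a
 * percentage scale factor for the quantization tables. */
static int quality_scaling(int quality)
{
  if (quality <= 0) quality = 1;
  if (quality > 100) quality = 100;
  return (quality < 50) ? 5000 / quality : 200 - quality * 2;
}

/* Hypothetical standalone parser for "N[,N,...]".
 * Returns 1 on success, 0 on a syntax error. */
static int parse_quality_ratings(const char *arg, int scale[NUM_SLOTS])
{
  int val = 75;                 /* default rating */
  int slot;
  char ch;

  for (slot = 0; slot < NUM_SLOTS; slot++) {
    if (*arg) {
      ch = ',';                 /* stays ',' if sscanf hits end of string */
      if (sscanf(arg, "%d%c", &val, &ch) < 1)
        return 0;
      if (ch != ',')            /* anything but a comma is a syntax error */
        return 0;
      while (*arg && *arg++ != ',')   /* advance to the next segment */
        ;
    }
    /* Either a freshly parsed rating or the last one, replicated. */
    scale[slot] = quality_scaling(val);
  }
  return 1;
}
```

For example, `"90,70"` would leave the four slots at scale factors 20, 60, 60, 60, while `"90x"` is rejected before any slot is written.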
10939 Index: rrutil.h
10940 ===================================================================
10941 --- rrutil.h (revision 829)
10942 +++ rrutil.h (working copy)
10943 @@ -1,5 +1,6 @@
10944 /* Copyright (C)2004 Landmark Graphics Corporation
10945 * Copyright (C)2005 Sun Microsystems, Inc.
10946 + * Copyright (C)2010 D. R. Commander
10947 *
10948 * This library is free software and may be redistributed and/or modified under
10949 * the terms of the wxWindows Library License, Version 3.1 or (at your option)
10950 @@ -47,9 +48,9 @@
10951 static __inline int numprocs(void)
10952 {
10953 #ifdef _WIN32
10954 - DWORD ProcAff, SysAff, i; int count=0;
10955 + DWORD_PTR ProcAff, SysAff, i; int count=0;
10956 if(!GetProcessAffinityMask(GetCurrentProcess(), &ProcAff, &SysAff)) return(1);
10957 - for(i=0; i<32; i++) if(ProcAff&(1<<i)) count++;
10958 + for(i=0; i<sizeof(long*)*8; i++) if(ProcAff&(1LL<<i)) count++;
10959 return(count);
10960 #elif defined (__APPLE__)
10961 return(1);
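Note on the numprocs() hunk above: the affinity mask is pointer-sized, so on 64-bit Windows a 32-iteration loop with an `int` shift misses the upper half of the mask. A portable sketch of the corrected bit count follows. `count_mask_bits` is a hypothetical name, and `uintptr_t` is used here as a stand-in for DWORD_PTR, which is an assumption made so the example builds without the Windows headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count the set bits in a pointer-sized mask. */
static int count_mask_bits(uintptr_t mask)
{
  int count = 0;
  size_t i;

  for (i = 0; i < sizeof(uintptr_t) * 8; i++) {
    /* Shift a pointer-sized 1 so bits above 31 are reachable. */
    if (mask & ((uintptr_t)1 << i))
      count++;
  }
  return count;
}
```

Casting the literal to the mask's own type keeps the shift width and the loop bound tied to the same type, which the `sizeof(long*)` and `1LL` pairing in the hunk achieves less directly.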
10962 Index: simd/jcclrmmx.asm
10963 ===================================================================
10964 --- simd/jcclrmmx.asm (revision 829)
10965 +++ simd/jcclrmmx.asm (working copy)
10966 @@ -19,8 +19,6 @@
10967 %include "jcolsamp.inc"
10968
10969 ; --------------------------------------------------------------------------
10970 - SECTION SEG_TEXT
10971 - BITS 32
10972 ;
10973 ; Convert some rows of samples to the output colorspace.
10974 ;
10975 @@ -42,7 +40,7 @@
10976 %define gotptr wk(0)-SIZEOF_POINTER ; void * gotptr
365 10977
366 align 16 10978 align 16
367 -» global» EXTN(jsimd_idct_ifast_sse2) 10979 -» global» EXTN(jsimd_rgb_ycc_convert_mmx)
368 +» global» EXTN(jsimd_idct_ifast_sse2) PRIVATE 10980 +» global» EXTN(jsimd_rgb_ycc_convert_mmx) PRIVATE
369 10981
370 EXTN(jsimd_idct_ifast_sse2): 10982 EXTN(jsimd_rgb_ycc_convert_mmx):
371 push ebp 10983 push ebp
10984 @@ -474,3 +472,6 @@
10985 pop ebp
10986 ret
10987
10988 +; For some reason, the OS X linker does not honor the request to align the
10989 +; segment unless we do this.
10990 + align 16
372 Index: simd/jcclrss2-64.asm 10991 Index: simd/jcclrss2-64.asm
373 =================================================================== 10992 ===================================================================
374 --- simd/jcclrss2-64.asm (revision 829) 10993 --- simd/jcclrss2-64.asm (revision 829)
375 +++ simd/jcclrss2-64.asm (working copy) 10994 +++ simd/jcclrss2-64.asm (working copy)
376 @@ -37,7 +37,7 @@ 10995 @@ -1,5 +1,5 @@
10996 ;
10997 -; jcclrss2.asm - colorspace conversion (64-bit SSE2)
10998 +; jcclrss2-64.asm - colorspace conversion (64-bit SSE2)
10999 ;
11000 ; x86 SIMD extension for IJG JPEG library
11001 ; Copyright (C) 1999-2006, MIYASAKA Masaru.
11002 @@ -17,8 +17,6 @@
11003 %include "jcolsamp.inc"
11004
11005 ; --------------------------------------------------------------------------
11006 -» SECTION»SEG_TEXT
11007 -» BITS» 64
11008 ;
11009 ; Convert some rows of samples to the output colorspace.
11010 ;
11011 @@ -39,7 +37,7 @@
377 11012
378 align 16 11013 align 16
379 11014
380 - global EXTN(jsimd_rgb_ycc_convert_sse2) 11015 - global EXTN(jsimd_rgb_ycc_convert_sse2)
381 + global EXTN(jsimd_rgb_ycc_convert_sse2) PRIVATE 11016 + global EXTN(jsimd_rgb_ycc_convert_sse2) PRIVATE
382 11017
383 EXTN(jsimd_rgb_ycc_convert_sse2): 11018 EXTN(jsimd_rgb_ycc_convert_sse2):
384 push rbp 11019 push rbp
385 Index: simd/jiss2red-64.asm 11020 @@ -49,8 +47,8 @@
386 =================================================================== 11021 » mov» [rsp],rax
387 --- simd/jiss2red-64.asm» (revision 829) 11022 » mov» rbp,rsp»» » » ; rbp = aligned rbp
388 +++ simd/jiss2red-64.asm» (working copy) 11023 » lea» rsp, [wk(0)]
389 @@ -73,7 +73,7 @@ 11024 +» collect_args
11025 » push» rbx
11026 -» collect_args
11027
11028 » mov» rcx, r10
11029 » test» rcx,rcx
11030 @@ -70,7 +68,7 @@
11031 » pop» rcx
11032
11033 » mov rsi, r11
11034 -» mov» rax, r14
11035 +» mov» eax, r14d
11036 » test» rax,rax
11037 » jle» near .return
11038 .rowloop:
11039 @@ -475,10 +473,13 @@
11040 » jg» near .rowloop
11041
11042 .return:
11043 +» pop» rbx
11044 » uncollect_args
11045 -» pop» rbx
11046 » mov» rsp,rbp»» ; rsp <- aligned rbp
11047 » pop» rsp» » ; rsp <- original rbp
11048 » pop» rbp
11049 » ret
11050
11051 +; For some reason, the OS X linker does not honor the request to align the
11052 +; segment unless we do this.
11053 +» align» 16
11054 Index: simd/jcclrss2.asm
11055 ===================================================================
11056 --- simd/jcclrss2.asm» (revision 829)
11057 +++ simd/jcclrss2.asm» (working copy)
11058 @@ -16,8 +16,6 @@
11059 %include "jcolsamp.inc"
11060
11061 ; --------------------------------------------------------------------------
11062 -» SECTION»SEG_TEXT
11063 -» BITS» 32
11064 ;
11065 ; Convert some rows of samples to the output colorspace.
11066 ;
11067 @@ -40,7 +38,7 @@
11068
11069 » align» 16
11070
11071 -» global» EXTN(jsimd_rgb_ycc_convert_sse2)
11072 +» global» EXTN(jsimd_rgb_ycc_convert_sse2) PRIVATE
11073
11074 EXTN(jsimd_rgb_ycc_convert_sse2):
11075 » push» ebp
11076 @@ -500,3 +498,6 @@
11077 » pop» ebp
11078 » ret
11079
11080 +; For some reason, the OS X linker does not honor the request to align the
11081 +; segment unless we do this.
11082 +» align» 16
11083 Index: simd/jccolmmx.asm
11084 ===================================================================
11085 --- simd/jccolmmx.asm» (revision 829)
11086 +++ simd/jccolmmx.asm» (working copy)
11087 @@ -37,7 +37,7 @@
390 SECTION SEG_CONST 11088 SECTION SEG_CONST
391 11089
392 alignz 16 11090 alignz 16
393 -» global» EXTN(jconst_idct_red_sse2) 11091 -» global» EXTN(jconst_rgb_ycc_convert_mmx)
394 +» global» EXTN(jconst_idct_red_sse2) PRIVATE 11092 +» global» EXTN(jconst_rgb_ycc_convert_mmx) PRIVATE
395 11093
396 EXTN(jconst_idct_red_sse2): 11094 EXTN(jconst_rgb_ycc_convert_mmx):
397 11095
398 @@ -114,7 +114,7 @@ 11096 @@ -51,6 +51,9 @@
399 %define WK_NUM»» 2 11097 » alignz» 16
11098
11099 ; --------------------------------------------------------------------------
11100 +» SECTION»SEG_TEXT
11101 +» BITS» 32
11102 +
11103 %include "jcclrmmx.asm"
11104
11105 %undef RGB_RED
11106 @@ -57,10 +60,10 @@
11107 %undef RGB_GREEN
11108 %undef RGB_BLUE
11109 %undef RGB_PIXELSIZE
11110 -%define RGB_RED 0
11111 -%define RGB_GREEN 1
11112 -%define RGB_BLUE 2
11113 -%define RGB_PIXELSIZE 3
11114 +%define RGB_RED EXT_RGB_RED
11115 +%define RGB_GREEN EXT_RGB_GREEN
11116 +%define RGB_BLUE EXT_RGB_BLUE
11117 +%define RGB_PIXELSIZE EXT_RGB_PIXELSIZE
11118 %define jsimd_rgb_ycc_convert_mmx jsimd_extrgb_ycc_convert_mmx
11119 %include "jcclrmmx.asm"
11120
11121 @@ -68,10 +71,10 @@
11122 %undef RGB_GREEN
11123 %undef RGB_BLUE
11124 %undef RGB_PIXELSIZE
11125 -%define RGB_RED 0
11126 -%define RGB_GREEN 1
11127 -%define RGB_BLUE 2
11128 -%define RGB_PIXELSIZE 4
11129 +%define RGB_RED EXT_RGBX_RED
11130 +%define RGB_GREEN EXT_RGBX_GREEN
11131 +%define RGB_BLUE EXT_RGBX_BLUE
11132 +%define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE
11133 %define jsimd_rgb_ycc_convert_mmx jsimd_extrgbx_ycc_convert_mmx
11134 %include "jcclrmmx.asm"
11135
11136 @@ -79,10 +82,10 @@
11137 %undef RGB_GREEN
11138 %undef RGB_BLUE
11139 %undef RGB_PIXELSIZE
11140 -%define RGB_RED 2
11141 -%define RGB_GREEN 1
11142 -%define RGB_BLUE 0
11143 -%define RGB_PIXELSIZE 3
11144 +%define RGB_RED EXT_BGR_RED
11145 +%define RGB_GREEN EXT_BGR_GREEN
11146 +%define RGB_BLUE EXT_BGR_BLUE
11147 +%define RGB_PIXELSIZE EXT_BGR_PIXELSIZE
11148 %define jsimd_rgb_ycc_convert_mmx jsimd_extbgr_ycc_convert_mmx
11149 %include "jcclrmmx.asm"
11150
11151 @@ -90,10 +93,10 @@
11152 %undef RGB_GREEN
11153 %undef RGB_BLUE
11154 %undef RGB_PIXELSIZE
11155 -%define RGB_RED 2
11156 -%define RGB_GREEN 1
11157 -%define RGB_BLUE 0
11158 -%define RGB_PIXELSIZE 4
11159 +%define RGB_RED EXT_BGRX_RED
11160 +%define RGB_GREEN EXT_BGRX_GREEN
11161 +%define RGB_BLUE EXT_BGRX_BLUE
11162 +%define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE
11163 %define jsimd_rgb_ycc_convert_mmx jsimd_extbgrx_ycc_convert_mmx
11164 %include "jcclrmmx.asm"
11165
11166 @@ -101,10 +104,10 @@
11167 %undef RGB_GREEN
11168 %undef RGB_BLUE
11169 %undef RGB_PIXELSIZE
11170 -%define RGB_RED 3
11171 -%define RGB_GREEN 2
11172 -%define RGB_BLUE 1
11173 -%define RGB_PIXELSIZE 4
11174 +%define RGB_RED EXT_XBGR_RED
11175 +%define RGB_GREEN EXT_XBGR_GREEN
11176 +%define RGB_BLUE EXT_XBGR_BLUE
11177 +%define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE
11178 %define jsimd_rgb_ycc_convert_mmx jsimd_extxbgr_ycc_convert_mmx
11179 %include "jcclrmmx.asm"
11180
11181 @@ -112,9 +115,9 @@
11182 %undef RGB_GREEN
11183 %undef RGB_BLUE
11184 %undef RGB_PIXELSIZE
11185 -%define RGB_RED 1
11186 -%define RGB_GREEN 2
11187 -%define RGB_BLUE 3
11188 -%define RGB_PIXELSIZE 4
11189 +%define RGB_RED EXT_XRGB_RED
11190 +%define RGB_GREEN EXT_XRGB_GREEN
11191 +%define RGB_BLUE EXT_XRGB_BLUE
11192 +%define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE
11193 %define jsimd_rgb_ycc_convert_mmx jsimd_extxrgb_ycc_convert_mmx
11194 %include "jcclrmmx.asm"
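Note on the jccolmmx.asm hunks above: the file builds one converter per pixel layout by redefining RGB_RED, RGB_GREEN, RGB_BLUE and RGB_PIXELSIZE and including jcclrmmx.asm again each time. The C sketch below shows the same "one body, many layouts" idea with a macro instead of repeated inclusion. `DEFINE_UNPACK` and the generated `*_unpack` functions are hypothetical and not part of libjpeg-turbo; the offsets are the literal values the hunks remove, on the assumption that the EXT_* constants replacing them carry the same values.

```c
/* Generate one unpack function and one pixel-size constant per layout.
 * R, G and B are byte offsets inside a pixel; PS is the pixel size. */
#define DEFINE_UNPACK(name, R, G, B, PS)                        \
  enum { name##_pixelsize = (PS) };                             \
  static void name##_unpack(const unsigned char *px,            \
                            unsigned char rgb[3])               \
  {                                                             \
    rgb[0] = px[R];                                             \
    rgb[1] = px[G];                                             \
    rgb[2] = px[B];                                             \
  }

DEFINE_UNPACK(extrgb,  0, 1, 2, 3)
DEFINE_UNPACK(extrgbx, 0, 1, 2, 4)
DEFINE_UNPACK(extbgr,  2, 1, 0, 3)
DEFINE_UNPACK(extbgrx, 2, 1, 0, 4)
DEFINE_UNPACK(extxbgr, 3, 2, 1, 4)
DEFINE_UNPACK(extxrgb, 1, 2, 3, 4)
```

Replacing the literals with named constants, as the hunks do, keeps the layout table in one place instead of repeating it in every SIMD source file.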
11195 Index: simd/jccolss2-64.asm
11196 ===================================================================
11197 --- simd/jccolss2-64.asm» (revision 829)
11198 +++ simd/jccolss2-64.asm» (working copy)
11199 @@ -1,5 +1,5 @@
11200 ;
11201 -; jccolss2.asm - colorspace conversion (64-bit SSE2)
11202 +; jccolss2-64.asm - colorspace conversion (64-bit SSE2)
11203 ;
11204 ; x86 SIMD extension for IJG JPEG library
11205 ; Copyright (C) 1999-2006, MIYASAKA Masaru.
11206 @@ -34,7 +34,7 @@
11207 » SECTION»SEG_CONST
11208
11209 » alignz» 16
11210 -» global» EXTN(jconst_rgb_ycc_convert_sse2)
11211 +» global» EXTN(jconst_rgb_ycc_convert_sse2) PRIVATE
11212
11213 EXTN(jconst_rgb_ycc_convert_sse2):
11214
11215 @@ -48,6 +48,9 @@
11216 » alignz» 16
11217
11218 ; --------------------------------------------------------------------------
11219 +» SECTION»SEG_TEXT
11220 +» BITS» 64
11221 +
11222 %include "jcclrss2-64.asm"
11223
11224 %undef RGB_RED
11225 @@ -54,10 +57,10 @@
11226 %undef RGB_GREEN
11227 %undef RGB_BLUE
11228 %undef RGB_PIXELSIZE
11229 -%define RGB_RED 0
11230 -%define RGB_GREEN 1
11231 -%define RGB_BLUE 2
11232 -%define RGB_PIXELSIZE 3
11233 +%define RGB_RED EXT_RGB_RED
11234 +%define RGB_GREEN EXT_RGB_GREEN
11235 +%define RGB_BLUE EXT_RGB_BLUE
11236 +%define RGB_PIXELSIZE EXT_RGB_PIXELSIZE
11237 %define jsimd_rgb_ycc_convert_sse2 jsimd_extrgb_ycc_convert_sse2
11238 %include "jcclrss2-64.asm"
11239
11240 @@ -65,10 +68,10 @@
11241 %undef RGB_GREEN
11242 %undef RGB_BLUE
11243 %undef RGB_PIXELSIZE
11244 -%define RGB_RED 0
11245 -%define RGB_GREEN 1
11246 -%define RGB_BLUE 2
11247 -%define RGB_PIXELSIZE 4
11248 +%define RGB_RED EXT_RGBX_RED
11249 +%define RGB_GREEN EXT_RGBX_GREEN
11250 +%define RGB_BLUE EXT_RGBX_BLUE
11251 +%define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE
11252 %define jsimd_rgb_ycc_convert_sse2 jsimd_extrgbx_ycc_convert_sse2
11253 %include "jcclrss2-64.asm"
11254
11255 @@ -76,10 +79,10 @@
11256 %undef RGB_GREEN
11257 %undef RGB_BLUE
11258 %undef RGB_PIXELSIZE
11259 -%define RGB_RED 2
11260 -%define RGB_GREEN 1
11261 -%define RGB_BLUE 0
11262 -%define RGB_PIXELSIZE 3
11263 +%define RGB_RED EXT_BGR_RED
11264 +%define RGB_GREEN EXT_BGR_GREEN
11265 +%define RGB_BLUE EXT_BGR_BLUE
11266 +%define RGB_PIXELSIZE EXT_BGR_PIXELSIZE
11267 %define jsimd_rgb_ycc_convert_sse2 jsimd_extbgr_ycc_convert_sse2
11268 %include "jcclrss2-64.asm"
11269
11270 @@ -87,10 +90,10 @@
11271 %undef RGB_GREEN
11272 %undef RGB_BLUE
11273 %undef RGB_PIXELSIZE
11274 -%define RGB_RED 2
11275 -%define RGB_GREEN 1
11276 -%define RGB_BLUE 0
11277 -%define RGB_PIXELSIZE 4
11278 +%define RGB_RED EXT_BGRX_RED
11279 +%define RGB_GREEN EXT_BGRX_GREEN
11280 +%define RGB_BLUE EXT_BGRX_BLUE
11281 +%define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE
11282 %define jsimd_rgb_ycc_convert_sse2 jsimd_extbgrx_ycc_convert_sse2
11283 %include "jcclrss2-64.asm"
11284
11285 @@ -98,10 +101,10 @@
11286 %undef RGB_GREEN
11287 %undef RGB_BLUE
11288 %undef RGB_PIXELSIZE
11289 -%define RGB_RED 3
11290 -%define RGB_GREEN 2
11291 -%define RGB_BLUE 1
11292 -%define RGB_PIXELSIZE 4
11293 +%define RGB_RED EXT_XBGR_RED
11294 +%define RGB_GREEN EXT_XBGR_GREEN
11295 +%define RGB_BLUE EXT_XBGR_BLUE
11296 +%define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE
11297 %define jsimd_rgb_ycc_convert_sse2 jsimd_extxbgr_ycc_convert_sse2
11298 %include "jcclrss2-64.asm"
11299
11300 @@ -109,9 +112,9 @@
11301 %undef RGB_GREEN
11302 %undef RGB_BLUE
11303 %undef RGB_PIXELSIZE
11304 -%define RGB_RED 1
11305 -%define RGB_GREEN 2
11306 -%define RGB_BLUE 3
11307 -%define RGB_PIXELSIZE 4
11308 +%define RGB_RED EXT_XRGB_RED
11309 +%define RGB_GREEN EXT_XRGB_GREEN
11310 +%define RGB_BLUE EXT_XRGB_BLUE
11311 +%define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE
11312 %define jsimd_rgb_ycc_convert_sse2 jsimd_extxrgb_ycc_convert_sse2
11313 %include "jcclrss2-64.asm"
11314 Index: simd/jccolss2.asm
11315 ===================================================================
11316 --- simd/jccolss2.asm» (revision 829)
11317 +++ simd/jccolss2.asm» (working copy)
11318 @@ -34,7 +34,7 @@
11319 » SECTION»SEG_CONST
11320
11321 » alignz» 16
11322 -» global» EXTN(jconst_rgb_ycc_convert_sse2)
11323 +» global» EXTN(jconst_rgb_ycc_convert_sse2) PRIVATE
11324
11325 EXTN(jconst_rgb_ycc_convert_sse2):
11326
11327 @@ -48,6 +48,9 @@
11328 » alignz» 16
11329
11330 ; --------------------------------------------------------------------------
11331 +» SECTION»SEG_TEXT
11332 +» BITS» 32
11333 +
11334 %include "jcclrss2.asm"
11335
11336 %undef RGB_RED
11337 @@ -54,10 +57,10 @@
11338 %undef RGB_GREEN
11339 %undef RGB_BLUE
11340 %undef RGB_PIXELSIZE
11341 -%define RGB_RED 0
11342 -%define RGB_GREEN 1
11343 -%define RGB_BLUE 2
11344 -%define RGB_PIXELSIZE 3
11345 +%define RGB_RED EXT_RGB_RED
11346 +%define RGB_GREEN EXT_RGB_GREEN
11347 +%define RGB_BLUE EXT_RGB_BLUE
11348 +%define RGB_PIXELSIZE EXT_RGB_PIXELSIZE
11349 %define jsimd_rgb_ycc_convert_sse2 jsimd_extrgb_ycc_convert_sse2
11350 %include "jcclrss2.asm"
11351
11352 @@ -65,10 +68,10 @@
11353 %undef RGB_GREEN
11354 %undef RGB_BLUE
11355 %undef RGB_PIXELSIZE
11356 -%define RGB_RED 0
11357 -%define RGB_GREEN 1
11358 -%define RGB_BLUE 2
11359 -%define RGB_PIXELSIZE 4
11360 +%define RGB_RED EXT_RGBX_RED
11361 +%define RGB_GREEN EXT_RGBX_GREEN
11362 +%define RGB_BLUE EXT_RGBX_BLUE
11363 +%define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE
11364 %define jsimd_rgb_ycc_convert_sse2 jsimd_extrgbx_ycc_convert_sse2
11365 %include "jcclrss2.asm"
11366
11367 @@ -76,10 +79,10 @@
11368 %undef RGB_GREEN
11369 %undef RGB_BLUE
11370 %undef RGB_PIXELSIZE
11371 -%define RGB_RED 2
11372 -%define RGB_GREEN 1
11373 -%define RGB_BLUE 0
11374 -%define RGB_PIXELSIZE 3
11375 +%define RGB_RED EXT_BGR_RED
11376 +%define RGB_GREEN EXT_BGR_GREEN
11377 +%define RGB_BLUE EXT_BGR_BLUE
11378 +%define RGB_PIXELSIZE EXT_BGR_PIXELSIZE
11379 %define jsimd_rgb_ycc_convert_sse2 jsimd_extbgr_ycc_convert_sse2
11380 %include "jcclrss2.asm"
11381
11382 @@ -87,10 +90,10 @@
11383 %undef RGB_GREEN
11384 %undef RGB_BLUE
11385 %undef RGB_PIXELSIZE
11386 -%define RGB_RED 2
11387 -%define RGB_GREEN 1
11388 -%define RGB_BLUE 0
11389 -%define RGB_PIXELSIZE 4
11390 +%define RGB_RED EXT_BGRX_RED
11391 +%define RGB_GREEN EXT_BGRX_GREEN
11392 +%define RGB_BLUE EXT_BGRX_BLUE
11393 +%define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE
11394 %define jsimd_rgb_ycc_convert_sse2 jsimd_extbgrx_ycc_convert_sse2
11395 %include "jcclrss2.asm"
11396
11397 @@ -98,10 +101,10 @@
11398 %undef RGB_GREEN
11399 %undef RGB_BLUE
11400 %undef RGB_PIXELSIZE
11401 -%define RGB_RED 3
11402 -%define RGB_GREEN 2
11403 -%define RGB_BLUE 1
11404 -%define RGB_PIXELSIZE 4
11405 +%define RGB_RED EXT_XBGR_RED
11406 +%define RGB_GREEN EXT_XBGR_GREEN
11407 +%define RGB_BLUE EXT_XBGR_BLUE
11408 +%define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE
11409 %define jsimd_rgb_ycc_convert_sse2 jsimd_extxbgr_ycc_convert_sse2
11410 %include "jcclrss2.asm"
11411
11412 @@ -109,9 +112,9 @@
11413 %undef RGB_GREEN
11414 %undef RGB_BLUE
11415 %undef RGB_PIXELSIZE
11416 -%define RGB_RED 1
11417 -%define RGB_GREEN 2
11418 -%define RGB_BLUE 3
11419 -%define RGB_PIXELSIZE 4
11420 +%define RGB_RED EXT_XRGB_RED
11421 +%define RGB_GREEN EXT_XRGB_GREEN
11422 +%define RGB_BLUE EXT_XRGB_BLUE
11423 +%define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE
11424 %define jsimd_rgb_ycc_convert_sse2 jsimd_extxrgb_ycc_convert_sse2
11425 %include "jcclrss2.asm"
11426 Index: simd/jcqnt3dn.asm
11427 ===================================================================
11428 --- simd/jcqnt3dn.asm» (revision 829)
11429 +++ simd/jcqnt3dn.asm» (working copy)
11430 @@ -35,7 +35,7 @@
11431 %define workspace» ebp+16» » ; FAST_FLOAT * workspace
400 11432
401 align 16 11433 align 16
402 -» global» EXTN(jsimd_idct_4x4_sse2) 11434 -» global» EXTN(jsimd_convsamp_float_3dnow)
403 +» global» EXTN(jsimd_idct_4x4_sse2) PRIVATE 11435 +» global» EXTN(jsimd_convsamp_float_3dnow) PRIVATE
404 11436
405 EXTN(jsimd_idct_4x4_sse2): 11437 EXTN(jsimd_convsamp_float_3dnow):
11438 » push» ebp
11439 @@ -138,7 +138,7 @@
11440 %define workspace» ebp+16» » ; FAST_FLOAT * workspace
11441
11442 » align» 16
11443 -» global» EXTN(jsimd_quantize_float_3dnow)
11444 +» global» EXTN(jsimd_quantize_float_3dnow) PRIVATE
11445
11446 EXTN(jsimd_quantize_float_3dnow):
11447 » push» ebp
11448 @@ -228,3 +228,6 @@
11449 » pop» ebp
11450 » ret
11451
11452 +; For some reason, the OS X linker does not honor the request to align the
11453 +; segment unless we do this.
11454 +» align» 16
11455 Index: simd/jcqntmmx.asm
11456 ===================================================================
11457 --- simd/jcqntmmx.asm» (revision 829)
11458 +++ simd/jcqntmmx.asm» (working copy)
11459 @@ -35,7 +35,7 @@
11460 %define workspace» ebp+16» » ; DCTELEM * workspace
11461
11462 » align» 16
11463 -» global» EXTN(jsimd_convsamp_mmx)
11464 +» global» EXTN(jsimd_convsamp_mmx) PRIVATE
11465
11466 EXTN(jsimd_convsamp_mmx):
11467 » push» ebp
11468 @@ -140,7 +140,7 @@
11469 %define workspace» ebp+16» » ; DCTELEM * workspace
11470
11471 » align» 16
11472 -» global» EXTN(jsimd_quantize_mmx)
11473 +» global» EXTN(jsimd_quantize_mmx) PRIVATE
11474
11475 EXTN(jsimd_quantize_mmx):
11476 » push» ebp
11477 @@ -269,3 +269,6 @@
11478 » pop» ebp
11479 » ret
11480
11481 +; For some reason, the OS X linker does not honor the request to align the
11482 +; segment unless we do this.
11483 +» align» 16
11484 Index: simd/jcqnts2f-64.asm
11485 ===================================================================
11486 --- simd/jcqnts2f-64.asm» (revision 829)
11487 +++ simd/jcqnts2f-64.asm» (working copy)
11488 @@ -1,5 +1,5 @@
11489 ;
11490 -; jcqnts2f.asm - sample data conversion and quantization (64-bit SSE & SSE2)
11491 +; jcqnts2f-64.asm - sample data conversion and quantization (64-bit SSE & SSE2)
11492 ;
11493 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
11494 ; Copyright 2009 D. R. Commander
11495 @@ -36,13 +36,14 @@
11496 ; r12 = FAST_FLOAT * workspace
11497
11498 » align» 16
11499 -» global» EXTN(jsimd_convsamp_float_sse2)
11500 +» global» EXTN(jsimd_convsamp_float_sse2) PRIVATE
11501
11502 EXTN(jsimd_convsamp_float_sse2):
406 push rbp 11503 push rbp
407 @@ -413,7 +413,7 @@ 11504 +» mov» rax,rsp
408 ; r13 = JDIMENSION output_col 11505 » mov» rbp,rsp
11506 +» collect_args
11507 » push» rbx
11508 -» collect_args
11509
11510 » pcmpeqw xmm7,xmm7
11511 » psllw xmm7,7
11512 @@ -89,8 +90,8 @@
11513 » dec» rcx
11514 » jnz» short .convloop
11515
11516 +» pop» rbx
11517 » uncollect_args
11518 -» pop» rbx
11519 » pop» rbp
11520 » ret
11521
11522 @@ -109,10 +110,11 @@
11523 ; r12 = FAST_FLOAT * workspace
409 11524
410 align 16 11525 align 16
411 -» global» EXTN(jsimd_idct_2x2_sse2) 11526 -» global» EXTN(jsimd_quantize_float_sse2)
412 +» global» EXTN(jsimd_idct_2x2_sse2) PRIVATE 11527 +» global» EXTN(jsimd_quantize_float_sse2) PRIVATE
413 11528
414 EXTN(jsimd_idct_2x2_sse2): 11529 EXTN(jsimd_quantize_float_sse2):
415 push rbp 11530 push rbp
416 Index: simd/ji3dnflt.asm 11531 +» mov» rax,rsp
417 =================================================================== 11532 » mov» rbp,rsp
418 --- simd/ji3dnflt.asm» (revision 829) 11533 » collect_args
419 +++ simd/ji3dnflt.asm» (working copy) 11534
420 @@ -27,7 +27,7 @@ 11535 @@ -150,3 +152,7 @@
421 » SECTION»SEG_CONST 11536 » uncollect_args
422 11537 » pop» rbp
423 » alignz» 16 11538 » ret
424 -» global» EXTN(jconst_idct_float_3dnow) 11539 +
425 +» global» EXTN(jconst_idct_float_3dnow) PRIVATE 11540 +; For some reason, the OS X linker does not honor the request to align the
426 11541 +; segment unless we do this.
427 EXTN(jconst_idct_float_3dnow): 11542 +» align» 16
428
429 @@ -63,7 +63,7 @@
430 » » » » » ; FAST_FLOAT workspace[DCTSIZE2]
431
432 » align» 16
433 -» global» EXTN(jsimd_idct_float_3dnow)
434 +» global» EXTN(jsimd_idct_float_3dnow) PRIVATE
435
436 EXTN(jsimd_idct_float_3dnow):
437 » push» ebp
438 Index: simd/jsimdcpu.asm
439 ===================================================================
440 --- simd/jsimdcpu.asm» (revision 829)
441 +++ simd/jsimdcpu.asm» (working copy)
442 @@ -29,7 +29,7 @@
443 ;
444
445 » align» 16
446 -» global» EXTN(jpeg_simd_cpu_support)
447 +» global» EXTN(jpeg_simd_cpu_support) PRIVATE
448
449 EXTN(jpeg_simd_cpu_support):
450 » push» ebx
451 Index: simd/jdmerss2-64.asm
452 ===================================================================
453 --- simd/jdmerss2-64.asm» (revision 829)
454 +++ simd/jdmerss2-64.asm» (working copy)
455 @@ -35,7 +35,7 @@
456 » SECTION»SEG_CONST
457
458 » alignz» 16
459 -» global» EXTN(jconst_merged_upsample_sse2)
460 +» global» EXTN(jconst_merged_upsample_sse2) PRIVATE
461
462 EXTN(jconst_merged_upsample_sse2):
463
464 Index: simd/jdsammmx.asm
465 ===================================================================
466 --- simd/jdsammmx.asm» (revision 829)
467 +++ simd/jdsammmx.asm» (working copy)
468 @@ -22,7 +22,7 @@
469 » SECTION»SEG_CONST
470
471 » alignz» 16
472 -» global» EXTN(jconst_fancy_upsample_mmx)
473 +» global» EXTN(jconst_fancy_upsample_mmx) PRIVATE
474
475 EXTN(jconst_fancy_upsample_mmx):
476
477 @@ -58,7 +58,7 @@
478 %define output_data_ptr(b)» (b)+20» » ; JSAMPARRAY * output_data_ptr
479
480 » align» 16
481 -» global» EXTN(jsimd_h2v1_fancy_upsample_mmx)
482 +» global» EXTN(jsimd_h2v1_fancy_upsample_mmx) PRIVATE
483
484 EXTN(jsimd_h2v1_fancy_upsample_mmx):
485 » push» ebp
486 @@ -216,7 +216,7 @@
487 %define gotptr»» wk(0)-SIZEOF_POINTER» ; void * gotptr
488
489 » align» 16
490 -» global» EXTN(jsimd_h2v2_fancy_upsample_mmx)
491 +» global» EXTN(jsimd_h2v2_fancy_upsample_mmx) PRIVATE
492
493 EXTN(jsimd_h2v2_fancy_upsample_mmx):
494 » push» ebp
495 @@ -542,7 +542,7 @@
496 %define output_data_ptr(b)» (b)+20» » ; JSAMPARRAY * output_data_ptr
497
498 » align» 16
499 -» global» EXTN(jsimd_h2v1_upsample_mmx)
500 +» global» EXTN(jsimd_h2v1_upsample_mmx) PRIVATE
501
502 EXTN(jsimd_h2v1_upsample_mmx):
503 » push» ebp
504 @@ -643,7 +643,7 @@
505 %define output_data_ptr(b)» (b)+20» » ; JSAMPARRAY * output_data_ptr
506
507 » align» 16
508 -» global» EXTN(jsimd_h2v2_upsample_mmx)
509 +» global» EXTN(jsimd_h2v2_upsample_mmx) PRIVATE
510
511 EXTN(jsimd_h2v2_upsample_mmx):
512 » push» ebp
513 Index: simd/jdmrgmmx.asm
514 ===================================================================
515 --- simd/jdmrgmmx.asm» (revision 829)
516 +++ simd/jdmrgmmx.asm» (working copy)
517 @@ -40,7 +40,7 @@
518 %define gotptr»» wk(0)-SIZEOF_POINTER» ; void * gotptr
519
520 » align» 16
521 -» global» EXTN(jsimd_h2v1_merged_upsample_mmx)
522 +» global» EXTN(jsimd_h2v1_merged_upsample_mmx) PRIVATE
523
524 EXTN(jsimd_h2v1_merged_upsample_mmx):
525 » push» ebp
526 @@ -409,7 +409,7 @@
527 %define output_buf(b)» » (b)+20» » ; JSAMPARRAY output_buf
528
529 » align» 16
530 -» global» EXTN(jsimd_h2v2_merged_upsample_mmx)
531 +» global» EXTN(jsimd_h2v2_merged_upsample_mmx) PRIVATE
532
533 EXTN(jsimd_h2v2_merged_upsample_mmx):
534 » push» ebp
535 Index: simd/jdsamss2.asm
536 ===================================================================
537 --- simd/jdsamss2.asm» (revision 829)
538 +++ simd/jdsamss2.asm» (working copy)
539 @@ -22,7 +22,7 @@
540 » SECTION»SEG_CONST
541
542 » alignz» 16
543 -» global» EXTN(jconst_fancy_upsample_sse2)
544 +» global» EXTN(jconst_fancy_upsample_sse2) PRIVATE
545
546 EXTN(jconst_fancy_upsample_sse2):
547
548 @@ -58,7 +58,7 @@
549 %define output_data_ptr(b)» (b)+20» » ; JSAMPARRAY * output_data_ptr
550
551 » align» 16
552 -» global» EXTN(jsimd_h2v1_fancy_upsample_sse2)
553 +» global» EXTN(jsimd_h2v1_fancy_upsample_sse2) PRIVATE
554
555 EXTN(jsimd_h2v1_fancy_upsample_sse2):
556 » push» ebp
557 @@ -214,7 +214,7 @@
558 %define gotptr»» wk(0)-SIZEOF_POINTER» ; void * gotptr
559
560 » align» 16
561 -» global» EXTN(jsimd_h2v2_fancy_upsample_sse2)
562 +» global» EXTN(jsimd_h2v2_fancy_upsample_sse2) PRIVATE
563
564 EXTN(jsimd_h2v2_fancy_upsample_sse2):
565 » push» ebp
566 @@ -538,7 +538,7 @@
567 %define output_data_ptr(b)» (b)+20» » ; JSAMPARRAY * output_data_ptr
568
569 » align» 16
570 -» global» EXTN(jsimd_h2v1_upsample_sse2)
571 +» global» EXTN(jsimd_h2v1_upsample_sse2) PRIVATE
572
573 EXTN(jsimd_h2v1_upsample_sse2):
574 » push» ebp
575 @@ -637,7 +637,7 @@
576 %define output_data_ptr(b)» (b)+20» » ; JSAMPARRAY * output_data_ptr
577
578 » align» 16
579 -» global» EXTN(jsimd_h2v2_upsample_sse2)
580 +» global» EXTN(jsimd_h2v2_upsample_sse2) PRIVATE
581
582 EXTN(jsimd_h2v2_upsample_sse2):
583 » push» ebp
584 Index: simd/jiss2flt-64.asm
585 ===================================================================
586 --- simd/jiss2flt-64.asm» (revision 829)
587 +++ simd/jiss2flt-64.asm» (working copy)
588 @@ -38,7 +38,7 @@
589 » SECTION»SEG_CONST
590
591 » alignz» 16
592 -» global» EXTN(jconst_idct_float_sse2)
593 +» global» EXTN(jconst_idct_float_sse2) PRIVATE
594
595 EXTN(jconst_idct_float_sse2):
596
597 @@ -74,7 +74,7 @@
598 » » » » » ; FAST_FLOAT workspace[DCTSIZE2]
599
600 » align» 16
601 -» global» EXTN(jsimd_idct_float_sse2)
602 +» global» EXTN(jsimd_idct_float_sse2) PRIVATE
603
604 EXTN(jsimd_idct_float_sse2):
605 » push» rbp
606 Index: simd/jfss2int-64.asm
607 ===================================================================
608 --- simd/jfss2int-64.asm» (revision 829)
609 +++ simd/jfss2int-64.asm» (working copy)
610 @@ -67,7 +67,7 @@
611 » SECTION»SEG_CONST
612
613 » alignz» 16
614 -» global» EXTN(jconst_fdct_islow_sse2)
615 +» global» EXTN(jconst_fdct_islow_sse2) PRIVATE
616
617 EXTN(jconst_fdct_islow_sse2):
618
619 @@ -101,7 +101,7 @@
620 %define WK_NUM»» 6
621
622 » align» 16
623 -» global» EXTN(jsimd_fdct_islow_sse2)
624 +» global» EXTN(jsimd_fdct_islow_sse2) PRIVATE
625
626 EXTN(jsimd_fdct_islow_sse2):
627 » push» rbp
628 11543 Index: simd/jcqnts2f.asm
629 11544 ===================================================================
630 11545 --- simd/jcqnts2f.asm (revision 829)
631 11546 +++ simd/jcqnts2f.asm (working copy)
632 11547 @@ -35,7 +35,7 @@
633 11548 %define workspace ebp+16 ; FAST_FLOAT * workspace
634 11549
635 11550 align 16
636 11551 - global EXTN(jsimd_convsamp_float_sse2)
637 11552 + global EXTN(jsimd_convsamp_float_sse2) PRIVATE
638 11553
639 11554 EXTN(jsimd_convsamp_float_sse2):
640 11555 push ebp
641 11556 @@ -115,7 +115,7 @@
642 11557 %define workspace ebp+16 ; FAST_FLOAT * workspace
643 11558
644 11559 align 16
645 11560 - global EXTN(jsimd_quantize_float_sse2)
646 11561 + global EXTN(jsimd_quantize_float_sse2) PRIVATE
647 11562
648 11563 EXTN(jsimd_quantize_float_sse2):
649 11564 push ebp
650 Index: simd/jdmrgss2.asm 11565 @@ -166,3 +166,6 @@
651 =================================================================== 11566 » pop» ebp
652 --- simd/jdmrgss2.asm» (revision 829) 11567 » ret
653 +++ simd/jdmrgss2.asm» (working copy) 11568
654 @@ -40,7 +40,7 @@ 11569 +; For some reason, the OS X linker does not honor the request to align the
655 %define gotptr»» wk(0)-SIZEOF_POINTER» ; void * gotptr 11570 +; segment unless we do this.
11571 +» align» 16
11572 Index: simd/jcqnts2i-64.asm
11573 ===================================================================
11574 --- simd/jcqnts2i-64.asm» (revision 829)
11575 +++ simd/jcqnts2i-64.asm» (working copy)
11576 @@ -1,5 +1,5 @@
11577 ;
11578 -; jcqnts2i.asm - sample data conversion and quantization (64-bit SSE2)
11579 +; jcqnts2i-64.asm - sample data conversion and quantization (64-bit SSE2)
11580 ;
11581 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
11582 ; Copyright 2009 D. R. Commander
11583 @@ -36,13 +36,14 @@
11584 ; r12 = DCTELEM * workspace
656 11585
657 align 16 11586 align 16
658 -» global» EXTN(jsimd_h2v1_merged_upsample_sse2) 11587 -» global» EXTN(jsimd_convsamp_sse2)
659 +» global» EXTN(jsimd_h2v1_merged_upsample_sse2) PRIVATE 11588 +» global» EXTN(jsimd_convsamp_sse2) PRIVATE
660 11589
661 EXTN(jsimd_h2v1_merged_upsample_sse2): 11590 EXTN(jsimd_convsamp_sse2):
662 » push» ebp 11591 » push» rbp
663 @@ -560,7 +560,7 @@ 11592 +» mov» rax,rsp
664 %define output_buf(b)» » (b)+20» » ; JSAMPARRAY output_buf 11593 » mov» rbp,rsp
11594 +» collect_args
11595 » push» rbx
11596 -» collect_args
11597
11598 » pxor» xmm6,xmm6» » ; xmm6=(all 0's)
11599 » pcmpeqw»xmm7,xmm7
11600 @@ -84,8 +85,8 @@
11601 » dec» rcx
11602 » jnz» short .convloop
11603
11604 +» pop» rbx
11605 » uncollect_args
11606 -» pop» rbx
11607 » pop» rbp
11608 » ret
11609
11610 @@ -111,10 +112,11 @@
11611 ; r12 = DCTELEM * workspace
665 11612
666 align 16 11613 align 16
667 -» global» EXTN(jsimd_h2v2_merged_upsample_sse2) 11614 -» global» EXTN(jsimd_quantize_sse2)
668 +» global» EXTN(jsimd_h2v2_merged_upsample_sse2) PRIVATE 11615 +» global» EXTN(jsimd_quantize_sse2) PRIVATE
669 11616
670 EXTN(jsimd_h2v2_merged_upsample_sse2): 11617 EXTN(jsimd_quantize_sse2):
671 » push» ebp
672 Index: simd/jfmmxint.asm
673 ===================================================================
674 --- simd/jfmmxint.asm» (revision 829)
675 +++ simd/jfmmxint.asm» (working copy)
676 @@ -66,7 +66,7 @@
677 » SECTION»SEG_CONST
678
679 » alignz» 16
680 -» global» EXTN(jconst_fdct_islow_mmx)
681 +» global» EXTN(jconst_fdct_islow_mmx) PRIVATE
682
683 EXTN(jconst_fdct_islow_mmx):
684
685 @@ -101,7 +101,7 @@
686 %define WK_NUM»» 2
687
688 » align» 16
689 -» global» EXTN(jsimd_fdct_islow_mmx)
690 +» global» EXTN(jsimd_fdct_islow_mmx) PRIVATE
691
692 EXTN(jsimd_fdct_islow_mmx):
693 » push» ebp
694 Index: simd/jcgryss2-64.asm
695 ===================================================================
696 --- simd/jcgryss2-64.asm» (revision 829)
697 +++ simd/jcgryss2-64.asm» (working copy)
698 @@ -37,7 +37,7 @@
699
700 » align» 16
701
702 -» global» EXTN(jsimd_rgb_gray_convert_sse2)
703 +» global» EXTN(jsimd_rgb_gray_convert_sse2) PRIVATE
704
705 EXTN(jsimd_rgb_gray_convert_sse2):
706 11618 push rbp
11619 + mov rax,rsp
11620 mov rbp,rsp
11621 collect_args
11622
11623 @@ -179,3 +181,7 @@
11624 uncollect_args
11625 pop rbp
11626 ret
11627 +
11628 +; For some reason, the OS X linker does not honor the request to align the
11629 +; segment unless we do this.
11630 + align 16
707 11631 Index: simd/jcqnts2i.asm
708 11632 ===================================================================
709 11633 --- simd/jcqnts2i.asm (revision 829)
710 11634 +++ simd/jcqnts2i.asm (working copy)
711 11635 @@ -35,7 +35,7 @@
712 11636 %define workspace ebp+16 ; DCTELEM * workspace
713 11637
714 11638 align 16
715 11639 - global EXTN(jsimd_convsamp_sse2)
716 11640 + global EXTN(jsimd_convsamp_sse2) PRIVATE
717 11641
718 11642 EXTN(jsimd_convsamp_sse2):
719 11643 push ebp
720 11644 @@ -117,7 +117,7 @@
721 11645 %define workspace ebp+16 ; DCTELEM * workspace
722 11646
723 11647 align 16
724 11648 - global EXTN(jsimd_quantize_sse2)
725 11649 + global EXTN(jsimd_quantize_sse2) PRIVATE
726 11650
727 11651 EXTN(jsimd_quantize_sse2):
728 11652 push ebp
729 Index: simd/jiss2fst-64.asm 11653 @@ -195,3 +195,6 @@
11654 » pop» ebp
11655 » ret
11656
11657 +; For some reason, the OS X linker does not honor the request to align the
11658 +; segment unless we do this.
11659 +» align» 16
11660 Index: simd/jcqntsse.asm
730 =================================================================== 11661 ===================================================================
731 --- simd/jiss2fst-64.asm» (revision 829) 11662 --- simd/jcqntsse.asm» (revision 829)
732 +++ simd/jiss2fst-64.asm» (working copy) 11663 +++ simd/jcqntsse.asm» (working copy)
733 @@ -60,7 +60,7 @@ 11664 @@ -35,7 +35,7 @@
734 %define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS) 11665 %define workspace» ebp+16» » ; FAST_FLOAT * workspace
735
736 » alignz» 16
737 -» global» EXTN(jconst_idct_ifast_sse2)
738 +» global» EXTN(jconst_idct_ifast_sse2) PRIVATE
739
740 EXTN(jconst_idct_ifast_sse2):
741
742 @@ -93,7 +93,7 @@
743 %define WK_NUM»» 2
744 11666
745 align 16 11667 align 16
746 -» global» EXTN(jsimd_idct_ifast_sse2) 11668 -» global» EXTN(jsimd_convsamp_float_sse)
747 +» global» EXTN(jsimd_idct_ifast_sse2) PRIVATE 11669 +» global» EXTN(jsimd_convsamp_float_sse) PRIVATE
748 11670
749 EXTN(jsimd_idct_ifast_sse2): 11671 EXTN(jsimd_convsamp_float_sse):
750 » push» rbp 11672 » push» ebp
751 Index: simd/jiss2flt.asm 11673 @@ -138,7 +138,7 @@
752 =================================================================== 11674 %define workspace» ebp+16» » ; FAST_FLOAT * workspace
753 --- simd/jiss2flt.asm» (revision 829)
754 +++ simd/jiss2flt.asm» (working copy)
755 @@ -37,7 +37,7 @@
756 » SECTION»SEG_CONST
757
758 » alignz» 16
759 -» global» EXTN(jconst_idct_float_sse2)
760 +» global» EXTN(jconst_idct_float_sse2) PRIVATE
761
762 EXTN(jconst_idct_float_sse2):
763
764 @@ -73,7 +73,7 @@
765 » » » » » ; FAST_FLOAT workspace[DCTSIZE2]
766 11675
767 align 16 11676 align 16
768 -» global» EXTN(jsimd_idct_float_sse2) 11677 -» global» EXTN(jsimd_quantize_float_sse)
769 +» global» EXTN(jsimd_idct_float_sse2) PRIVATE 11678 +» global» EXTN(jsimd_quantize_float_sse) PRIVATE
770 11679
771 EXTN(jsimd_idct_float_sse2): 11680 EXTN(jsimd_quantize_float_sse):
772 push ebp 11681 push ebp
773 Index: simd/jiss2int.asm 11682 @@ -206,3 +206,6 @@
11683 » pop» ebp
11684 » ret
11685
11686 +; For some reason, the OS X linker does not honor the request to align the
11687 +; segment unless we do this.
11688 +» align» 16
11689 Index: simd/jcsammmx.asm
774 =================================================================== 11690 ===================================================================
775 --- simd/jiss2int.asm» (revision 829) 11691 --- simd/jcsammmx.asm» (revision 829)
776 +++ simd/jiss2int.asm» (working copy) 11692 +++ simd/jcsammmx.asm» (working copy)
777 @@ -66,7 +66,7 @@ 11693 @@ -40,7 +40,7 @@
778 » SECTION»SEG_CONST 11694 %define output_data(b)»(b)+28» ; JSAMPARRAY output_data
779
780 » alignz» 16
781 -» global» EXTN(jconst_idct_islow_sse2)
782 +» global» EXTN(jconst_idct_islow_sse2) PRIVATE
783
784 EXTN(jconst_idct_islow_sse2):
785
786 @@ -105,7 +105,7 @@
787 %define WK_NUM»» 12
788 11695
789 align 16 11696 align 16
790 -» global» EXTN(jsimd_idct_islow_sse2) 11697 -» global» EXTN(jsimd_h2v1_downsample_mmx)
791 +» global» EXTN(jsimd_idct_islow_sse2) PRIVATE 11698 +» global» EXTN(jsimd_h2v1_downsample_mmx) PRIVATE
792 11699
793 EXTN(jsimd_idct_islow_sse2): 11700 EXTN(jsimd_h2v1_downsample_mmx):
794 push ebp 11701 push ebp
795 Index: simd/jfsseflt-64.asm 11702 @@ -95,7 +95,7 @@
796 ===================================================================
797 --- simd/jfsseflt-64.asm» (revision 829)
798 +++ simd/jfsseflt-64.asm» (working copy)
799 @@ -38,7 +38,7 @@
800 » SECTION»SEG_CONST
801 11703
802 » alignz» 16 11704 » mov» eax, JDIMENSION [v_samp(ebp)]» ; rowctr
803 -» global» EXTN(jconst_fdct_float_sse) 11705 » test» eax,eax
804 +» global» EXTN(jconst_fdct_float_sse) PRIVATE 11706 -» jle» short .return
11707 +» jle» near .return
805 11708
806 EXTN(jconst_fdct_float_sse): 11709 » mov edx, 0x00010000» ; bias pattern
807 11710 » movd mm7,edx
808 @@ -65,7 +65,7 @@ 11711 @@ -182,7 +182,7 @@
809 %define WK_NUM»» 2 11712 %define output_data(b)»(b)+28» ; JSAMPARRAY output_data
810 11713
811 align 16 11714 align 16
812 -» global» EXTN(jsimd_fdct_float_sse) 11715 -» global» EXTN(jsimd_h2v2_downsample_mmx)
813 +» global» EXTN(jsimd_fdct_float_sse) PRIVATE 11716 +» global» EXTN(jsimd_h2v2_downsample_mmx) PRIVATE
814 11717
815 EXTN(jsimd_fdct_float_sse): 11718 EXTN(jsimd_h2v2_downsample_mmx):
816 » push» rbp 11719 » push» ebp
817 Index: simd/jccolss2-64.asm 11720 @@ -319,3 +319,6 @@
818 =================================================================== 11721 » pop» ebp
819 --- simd/jccolss2-64.asm» (revision 829) 11722 » ret
820 +++ simd/jccolss2-64.asm» (working copy)
821 @@ -34,7 +34,7 @@
822 » SECTION»SEG_CONST
823 11723
824 » alignz» 16 11724 +; For some reason, the OS X linker does not honor the request to align the
825 -» global» EXTN(jconst_rgb_ycc_convert_sse2) 11725 +; segment unless we do this.
826 +» global» EXTN(jconst_rgb_ycc_convert_sse2) PRIVATE 11726 +» align» 16
827
828 EXTN(jconst_rgb_ycc_convert_sse2):
829
830 11727 Index: simd/jcsamss2-64.asm
831 11728 ===================================================================
832 11729 --- simd/jcsamss2-64.asm (revision 829)
833 11730 +++ simd/jcsamss2-64.asm (working copy)
834 @@ -41,7 +41,7 @@ 11731 @@ -1,5 +1,5 @@
11732 ;
11733 -; jcsamss2.asm - downsampling (64-bit SSE2)
11734 +; jcsamss2-64.asm - downsampling (64-bit SSE2)
11735 ;
11736 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
11737 ; Copyright 2009 D. R. Commander
11738 @@ -41,10 +41,11 @@
835 11739 ; r15 = JSAMPARRAY output_data
836 11740
837 11741 align 16
838 11742 - global EXTN(jsimd_h2v1_downsample_sse2)
839 11743 + global EXTN(jsimd_h2v1_downsample_sse2) PRIVATE
840 11744
841 11745 EXTN(jsimd_h2v1_downsample_sse2):
842 11746 push rbp
843 @@ -185,7 +185,7 @@ 11747 +» mov» rax,rsp
11748 » mov» rbp,rsp
11749 » collect_args
11750
11751 @@ -184,10 +185,11 @@
844 11752 ; r15 = JSAMPARRAY output_data
845 11753
846 11754 align 16
847 11755 - global EXTN(jsimd_h2v2_downsample_sse2)
848 11756 + global EXTN(jsimd_h2v2_downsample_sse2) PRIVATE
849 11757
850 11758 EXTN(jsimd_h2v2_downsample_sse2):
851 11759 push rbp
11760 + mov rax,rsp
11761 mov rbp,rsp
11762 collect_args
11763
11764 @@ -322,3 +324,7 @@
11765 uncollect_args
11766 pop rbp
11767 ret
11768 +
11769 +; For some reason, the OS X linker does not honor the request to align the
11770 +; segment unless we do this.
11771 + align 16
11772 Index: simd/jcsamss2.asm
11773 ===================================================================
11774 --- simd/jcsamss2.asm (revision 829)
11775 +++ simd/jcsamss2.asm (working copy)
11776 @@ -40,7 +40,7 @@
11777 %define output_data(b) (b)+28 ; JSAMPARRAY output_data
11778
11779 align 16
11780 - global EXTN(jsimd_h2v1_downsample_sse2)
11781 + global EXTN(jsimd_h2v1_downsample_sse2) PRIVATE
11782
11783 EXTN(jsimd_h2v1_downsample_sse2):
11784 push ebp
11785 @@ -195,7 +195,7 @@
11786 %define output_data(b) (b)+28 ; JSAMPARRAY output_data
11787
11788 align 16
11789 - global EXTN(jsimd_h2v2_downsample_sse2)
11790 + global EXTN(jsimd_h2v2_downsample_sse2) PRIVATE
11791
11792 EXTN(jsimd_h2v2_downsample_sse2):
11793 push ebp
11794 @@ -346,3 +346,6 @@
11795 pop ebp
11796 ret
11797
11798 +; For some reason, the OS X linker does not honor the request to align the
11799 +; segment unless we do this.
11800 + align 16
11801 Index: simd/jdclrmmx.asm
11802 ===================================================================
11803 --- simd/jdclrmmx.asm (revision 829)
11804 +++ simd/jdclrmmx.asm (working copy)
11805 @@ -19,8 +19,6 @@
11806 %include "jcolsamp.inc"
11807
11808 ; --------------------------------------------------------------------------
11809 - SECTION SEG_TEXT
11810 - BITS 32
11811 ;
11812 ; Convert some rows of samples to the output colorspace.
11813 ;
11814 @@ -42,7 +40,7 @@
11815 %define gotptr wk(0)-SIZEOF_POINTER ; void * gotptr
11816
11817 align 16
11818 - global EXTN(jsimd_ycc_rgb_convert_mmx)
11819 + global EXTN(jsimd_ycc_rgb_convert_mmx) PRIVATE
11820
11821 EXTN(jsimd_ycc_rgb_convert_mmx):
11822 push ebp
11823 @@ -402,3 +400,6 @@
11824 pop ebp
11825 ret
11826
11827 +; For some reason, the OS X linker does not honor the request to align the
11828 +; segment unless we do this.
11829 + align 16
852 11830 Index: simd/jdclrss2-64.asm
853 11831 ===================================================================
854 11832 --- simd/jdclrss2-64.asm (revision 829)
855 11833 +++ simd/jdclrss2-64.asm (working copy)
856 @@ -39,7 +39,7 @@ 11834 @@ -1,8 +1,8 @@
11835 ;
11836 -; jdclrss2.asm - colorspace conversion (64-bit SSE2)
11837 +; jdclrss2-64.asm - colorspace conversion (64-bit SSE2)
11838 ;
11839 -; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
11840 -; Copyright 2009 D. R. Commander
11841 +; Copyright 2009, 2012 Pierre Ossman <ossman@cendio.se> for Cendio AB
11842 +; Copyright 2009, 2012 D. R. Commander
11843 ;
11844 ; Based on
11845 ; x86 SIMD extension for IJG JPEG library
11846 @@ -20,8 +20,6 @@
11847 %include "jcolsamp.inc"
11848 » » » »
11849 ; --------------------------------------------------------------------------
11850 -» SECTION»SEG_TEXT
11851 -» BITS» 64
11852 ;
11853 ; Convert some rows of samples to the output colorspace.
11854 ;
11855 @@ -41,7 +39,7 @@
857 11856 %define WK_NUM 2
858 11857
859 11858 align 16
860 11859 - global EXTN(jsimd_ycc_rgb_convert_sse2)
861 11860 + global EXTN(jsimd_ycc_rgb_convert_sse2) PRIVATE
862 11861
863 11862 EXTN(jsimd_ycc_rgb_convert_sse2):
864 11863 push rbp
11864 @@ -51,8 +49,8 @@
11865 mov [rsp],rax
11866 mov rbp,rsp ; rbp = aligned rbp
11867 lea rsp, [wk(0)]
11868 + collect_args
11869 push rbx
11870 - collect_args
11871
11872 mov rcx, r10 ; num_cols
11873 test rcx,rcx
11874 @@ -72,7 +70,7 @@
11875 pop rcx
11876
11877 mov rdi, r13
11878 - mov rax, r14
11879 + mov eax, r14d
11880 test rax,rax
11881 jle near .return
11882 .rowloop:
11883 @@ -253,17 +251,13 @@
11884 movntdq XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA
11885 movntdq XMMWORD [rdi+1*SIZEOF_XMMWORD], xmmD
11886 movntdq XMMWORD [rdi+2*SIZEOF_XMMWORD], xmmF
11887 - add rdi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr
11888 jmp short .out0
11889 .out1: ; --(unaligned)-----------------
11890 - pcmpeqb xmmH,xmmH ; xmmH=(all 1's)
11891 - maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [rdi], xmmA
11892 - add rdi, byte SIZEOF_XMMWORD ; outptr
11893 - maskmovdqu xmmD,xmmH ; movntdqu XMMWORD [rdi], xmmD
11894 - add rdi, byte SIZEOF_XMMWORD ; outptr
11895 - maskmovdqu xmmF,xmmH ; movntdqu XMMWORD [rdi], xmmF
11896 - add rdi, byte SIZEOF_XMMWORD ; outptr
11897 + movdqu XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA
11898 + movdqu XMMWORD [rdi+1*SIZEOF_XMMWORD], xmmD
11899 + movdqu XMMWORD [rdi+2*SIZEOF_XMMWORD], xmmF
11900 .out0:
11901 + add rdi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr
11902 sub rcx, byte SIZEOF_XMMWORD
11903 jz near .nextrow
11904
11905 @@ -273,14 +267,12 @@
11906 jmp near .columnloop
11907
11908 .column_st32:
11909 - pcmpeqb xmmH,xmmH ; xmmH=(all 1's)
11910 lea rcx, [rcx+rcx*2] ; imul ecx, RGB_PIXELSIZE
11911 cmp rcx, byte 2*SIZEOF_XMMWORD
11912 jb short .column_st16
11913 - maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [rdi], xmmA
11914 - add rdi, byte SIZEOF_XMMWORD ; outptr
11915 - maskmovdqu xmmD,xmmH ; movntdqu XMMWORD [rdi], xmmD
11916 - add rdi, byte SIZEOF_XMMWORD ; outptr
11917 + movdqu XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA
11918 + movdqu XMMWORD [rdi+1*SIZEOF_XMMWORD], xmmD
11919 + add rdi, byte 2*SIZEOF_XMMWORD ; outptr
11920 movdqa xmmA,xmmF
11921 sub rcx, byte 2*SIZEOF_XMMWORD
11922 jmp short .column_st15
11923 @@ -287,50 +279,44 @@
11924 .column_st16:
11925 cmp rcx, byte SIZEOF_XMMWORD
11926 jb short .column_st15
11927 - maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [rdi], xmmA
11928 + movdqu XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA
11929 add rdi, byte SIZEOF_XMMWORD ; outptr
11930 movdqa xmmA,xmmD
11931 sub rcx, byte SIZEOF_XMMWORD
11932 .column_st15:
11933 - mov rax,rcx
11934 - xor rcx, byte 0x0F
11935 - shl rcx, 2
11936 - movd xmmB,ecx
11937 - psrlq xmmH,4
11938 - pcmpeqb xmmE,xmmE
11939 - psrlq xmmH,xmmB
11940 - psrlq xmmE,xmmB
11941 - punpcklbw xmmE,xmmH
11942 - ; ----------------
11943 - mov rcx,rdi
11944 - and rcx, byte SIZEOF_XMMWORD-1
11945 - jz short .adj0
11946 - add rax,rcx
11947 - cmp rax, byte SIZEOF_XMMWORD
11948 - ja short .adj0
11949 - and rdi, byte (-SIZEOF_XMMWORD) ; align to 16-byte boundary
11950 - shl rcx, 3 ; pslldq xmmA,ecx & pslldq xmmE,rcx
11951 - movdqa xmmG,xmmA
11952 - movdqa xmmC,xmmE
11953 - pslldq xmmA, SIZEOF_XMMWORD/2
11954 - pslldq xmmE, SIZEOF_XMMWORD/2
11955 - movd xmmD,ecx
11956 - sub rcx, byte (SIZEOF_XMMWORD/2)*BYTE_BIT
11957 - jb short .adj1
11958 - movd xmmF,ecx
11959 - psllq xmmA,xmmF
11960 - psllq xmmE,xmmF
11961 - jmp short .adj0
11962 -.adj1: neg ecx
11963 - movd xmmF,ecx
11964 - psrlq xmmA,xmmF
11965 - psrlq xmmE,xmmF
11966 - psllq xmmG,xmmD
11967 - psllq xmmC,xmmD
11968 - por xmmA,xmmG
11969 - por xmmE,xmmC
11970 -.adj0: ; ----------------
11971 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [rdi], xmmA
11972 + ; Store the lower 8 bytes of xmmA to the output when it has enough
11973 + ; space.
11974 + cmp rcx, byte SIZEOF_MMWORD
11975 + jb short .column_st7
11976 + movq XMM_MMWORD [rdi], xmmA
11977 + add rdi, byte SIZEOF_MMWORD
11978 + sub rcx, byte SIZEOF_MMWORD
11979 + psrldq xmmA, SIZEOF_MMWORD
11980 +.column_st7:
11981 + ; Store the lower 4 bytes of xmmA to the output when it has enough
11982 + ; space.
11983 + cmp rcx, byte SIZEOF_DWORD
11984 + jb short .column_st3
11985 + movd XMM_DWORD [rdi], xmmA
11986 + add rdi, byte SIZEOF_DWORD
11987 + sub rcx, byte SIZEOF_DWORD
11988 + psrldq xmmA, SIZEOF_DWORD
11989 +.column_st3:
11990 + ; Store the lower 2 bytes of rax to the output when it has enough
11991 + ; space.
11992 + movd eax, xmmA
11993 + cmp rcx, byte SIZEOF_WORD
11994 + jb short .column_st1
11995 + mov WORD [rdi], ax
11996 + add rdi, byte SIZEOF_WORD
11997 + sub rcx, byte SIZEOF_WORD
11998 + shr rax, 16
11999 +.column_st1:
12000 + ; Store the lower 1 byte of rax to the output when it has enough
12001 + ; space.
12002 + test rcx, rcx
12003 + jz short .nextrow
12004 + mov BYTE [rdi], al
12005
12006 %else ; RGB_PIXELSIZE == 4 ; -----------
12007
12008 @@ -375,19 +361,14 @@
12009 movntdq XMMWORD [rdi+1*SIZEOF_XMMWORD], xmmD
12010 movntdq XMMWORD [rdi+2*SIZEOF_XMMWORD], xmmC
12011 movntdq XMMWORD [rdi+3*SIZEOF_XMMWORD], xmmH
12012 - add rdi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr
12013 jmp short .out0
12014 .out1: ; --(unaligned)-----------------
12015 - pcmpeqb xmmE,xmmE ; xmmE=(all 1's)
12016 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [rdi], xmmA
12017 - add rdi, byte SIZEOF_XMMWORD ; outptr
12018 - maskmovdqu xmmD,xmmE ; movntdqu XMMWORD [rdi], xmmD
12019 - add rdi, byte SIZEOF_XMMWORD ; outptr
12020 - maskmovdqu xmmC,xmmE ; movntdqu XMMWORD [rdi], xmmC
12021 - add rdi, byte SIZEOF_XMMWORD ; outptr
12022 - maskmovdqu xmmH,xmmE ; movntdqu XMMWORD [rdi], xmmH
12023 - add rdi, byte SIZEOF_XMMWORD ; outptr
12024 + movdqu XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA
12025 + movdqu XMMWORD [rdi+1*SIZEOF_XMMWORD], xmmD
12026 + movdqu XMMWORD [rdi+2*SIZEOF_XMMWORD], xmmC
12027 + movdqu XMMWORD [rdi+3*SIZEOF_XMMWORD], xmmH
12028 .out0:
12029 + add rdi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr
12030 sub rcx, byte SIZEOF_XMMWORD
12031 jz near .nextrow
12032
12033 @@ -397,13 +378,11 @@
12034 jmp near .columnloop
12035
12036 .column_st32:
12037 - pcmpeqb xmmE,xmmE ; xmmE=(all 1's)
12038 cmp rcx, byte SIZEOF_XMMWORD/2
12039 jb short .column_st16
12040 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [rdi], xmmA
12041 - add rdi, byte SIZEOF_XMMWORD ; outptr
12042 - maskmovdqu xmmD,xmmE ; movntdqu XMMWORD [rdi], xmmD
12043 - add rdi, byte SIZEOF_XMMWORD ; outptr
12044 + movdqu XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA
12045 + movdqu XMMWORD [rdi+1*SIZEOF_XMMWORD], xmmD
12046 + add rdi, byte 2*SIZEOF_XMMWORD ; outptr
12047 movdqa xmmA,xmmC
12048 movdqa xmmD,xmmH
12049 sub rcx, byte SIZEOF_XMMWORD/2
12050 @@ -410,50 +389,25 @@
12051 .column_st16:
12052 cmp rcx, byte SIZEOF_XMMWORD/4
12053 jb short .column_st15
12054 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [rdi], xmmA
12055 + movdqu XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA
12056 add rdi, byte SIZEOF_XMMWORD ; outptr
12057 movdqa xmmA,xmmD
12058 sub rcx, byte SIZEOF_XMMWORD/4
12059 .column_st15:
12060 - cmp rcx, byte SIZEOF_XMMWORD/16
12061 - jb near .nextrow
12062 - mov rax,rcx
12063 - xor rcx, byte 0x03
12064 - inc rcx
12065 - shl rcx, 4
12066 - movd xmmF,ecx
12067 - psrlq xmmE,xmmF
12068 - punpcklbw xmmE,xmmE
12069 - ; ----------------
12070 - mov rcx,rdi
12071 - and rcx, byte SIZEOF_XMMWORD-1
12072 - jz short .adj0
12073 - lea rax, [rcx+rax*4] ; RGB_PIXELSIZE
12074 - cmp rax, byte SIZEOF_XMMWORD
12075 - ja short .adj0
12076 - and rdi, byte (-SIZEOF_XMMWORD) ; align to 16-byte boundary
12077 - shl rcx, 3 ; pslldq xmmA,ecx & pslldq xmmE,ecx
12078 - movdqa xmmB,xmmA
12079 - movdqa xmmG,xmmE
12080 - pslldq xmmA, SIZEOF_XMMWORD/2
12081 - pslldq xmmE, SIZEOF_XMMWORD/2
12082 - movd xmmC,ecx
12083 - sub rcx, byte (SIZEOF_XMMWORD/2)*BYTE_BIT
12084 - jb short .adj1
12085 - movd xmmH,ecx
12086 - psllq xmmA,xmmH
12087 - psllq xmmE,xmmH
12088 - jmp short .adj0
12089 -.adj1: neg rcx
12090 - movd xmmH,ecx
12091 - psrlq xmmA,xmmH
12092 - psrlq xmmE,xmmH
12093 - psllq xmmB,xmmC
12094 - psllq xmmG,xmmC
12095 - por xmmA,xmmB
12096 - por xmmE,xmmG
12097 -.adj0: ; ----------------
12098 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [rdi], xmmA
12099 + ; Store two pixels (8 bytes) of xmmA to the output when it has enough
12100 + ; space.
12101 + cmp rcx, byte SIZEOF_XMMWORD/8
12102 + jb short .column_st7
12103 + movq MMWORD [rdi], xmmA
12104 + add rdi, byte SIZEOF_XMMWORD/8*4
12105 + sub rcx, byte SIZEOF_XMMWORD/8
12106 + psrldq xmmA, SIZEOF_XMMWORD/8*4
12107 +.column_st7:
12108 + ; Store one pixel (4 bytes) of xmmA to the output when it has enough
12109 + ; space.
12110 + test rcx, rcx
12111 + jz short .nextrow
12112 + movd XMM_DWORD [rdi], xmmA
12113
12114 %endif ; RGB_PIXELSIZE ; ---------------
12115
12116 @@ -475,9 +429,13 @@
12117 sfence ; flush the write buffer
12118
12119 .return:
12120 + pop rbx
12121 uncollect_args
12122 - pop rbx
12123 mov rsp,rbp ; rsp <- aligned rbp
12124 pop rsp ; rsp <- original rbp
12125 pop rbp
12126 ret
12127 +
12128 +; For some reason, the OS X linker does not honor the request to align the
12129 +; segment unless we do this.
12130 + align 16
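
Note on the jdclrss2-64.asm hunks above: in the RGB_PIXELSIZE == 3 path, the removed code built a byte mask and wrote the final partial vector with a single maskmovdqu. The added code at .column_st15 / .column_st7 / .column_st3 / .column_st1 instead writes the remaining bytes (fewer than 16) with progressively narrower plain stores of 8, 4, 2 and 1 bytes. The C sketch below is only an illustration of that cascade and is not code from libjpeg-turbo. The name store_tail is hypothetical, memcpy stands in for the movq/movd/mov stores, and advancing the source pointer stands in for the psrldq/shr shifts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, for illustration only (not part of libjpeg-turbo).
 * Writes the low n bytes (n < 16) of a 16-byte vector to dst using
 * progressively narrower stores, without touching dst[n] or beyond. */
static void store_tail(uint8_t *dst, const uint8_t vec[16], size_t n)
{
    const uint8_t *src = vec;

    assert(n < 16);

    if (n >= 8) {            /* .column_st15: 8-byte store, then shift by 8 */
        memcpy(dst, src, 8);
        dst += 8; src += 8; n -= 8;
    }
    if (n >= 4) {            /* .column_st7: 4-byte store, then shift by 4 */
        memcpy(dst, src, 4);
        dst += 4; src += 4; n -= 4;
    }
    if (n >= 2) {            /* .column_st3: 2-byte store, then shift by 2 */
        memcpy(dst, src, 2);
        dst += 2; src += 2; n -= 2;
    }
    if (n >= 1) {            /* .column_st1: final single byte */
        *dst = *src;
    }
}
```

Because 8 + 4 + 2 + 1 = 15, the four conditional stores cover every remainder from 0 to 15 bytes, and no store writes past the last valid output byte.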
12131 Index: simd/jdclrss2.asm
12132 ===================================================================
12133 --- simd/jdclrss2.asm (revision 829)
12134 +++ simd/jdclrss2.asm (working copy)
12135 @@ -1,7 +1,8 @@
12136 ;
12137 ; jdclrss2.asm - colorspace conversion (SSE2)
12138 ;
12139 -; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
12140 +; Copyright 2009, 2012 Pierre Ossman <ossman@cendio.se> for Cendio AB
12141 +; Copyright 2012 D. R. Commander
12142 ;
12143 ; Based on
12144 ; x86 SIMD extension for IJG JPEG library
12145 @@ -19,8 +20,6 @@
12146 %include "jcolsamp.inc"
12147
12148 ; --------------------------------------------------------------------------
12149 - SECTION SEG_TEXT
12150 - BITS 32
12151 ;
12152 ; Convert some rows of samples to the output colorspace.
12153 ;
12154 @@ -42,7 +41,7 @@
12155 %define gotptr wk(0)-SIZEOF_POINTER ; void * gotptr
12156
12157 align 16
12158 - global EXTN(jsimd_ycc_rgb_convert_sse2)
12159 + global EXTN(jsimd_ycc_rgb_convert_sse2) PRIVATE
12160
12161 EXTN(jsimd_ycc_rgb_convert_sse2):
12162 push ebp
12163 @@ -264,17 +263,13 @@
12164 movntdq XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA
12165 movntdq XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD
12166 movntdq XMMWORD [edi+2*SIZEOF_XMMWORD], xmmF
12167 - add edi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr
12168 jmp short .out0
12169 .out1: ; --(unaligned)-----------------
12170 - pcmpeqb xmmH,xmmH ; xmmH=(all 1's)
12171 - maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [edi], xmmA
12172 - add edi, byte SIZEOF_XMMWORD ; outptr
12173 - maskmovdqu xmmD,xmmH ; movntdqu XMMWORD [edi], xmmD
12174 - add edi, byte SIZEOF_XMMWORD ; outptr
12175 - maskmovdqu xmmF,xmmH ; movntdqu XMMWORD [edi], xmmF
12176 - add edi, byte SIZEOF_XMMWORD ; outptr
12177 + movdqu XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA
12178 + movdqu XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD
12179 + movdqu XMMWORD [edi+2*SIZEOF_XMMWORD], xmmF
12180 .out0:
12181 + add edi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr
12182 sub ecx, byte SIZEOF_XMMWORD
12183 jz near .nextrow
12184
12185 @@ -285,14 +280,12 @@
12186 alignx 16,7
12187
12188 .column_st32:
12189 - pcmpeqb xmmH,xmmH ; xmmH=(all 1's)
12190 lea ecx, [ecx+ecx*2] ; imul ecx, RGB_PIXELSIZE
12191 cmp ecx, byte 2*SIZEOF_XMMWORD
12192 jb short .column_st16
12193 - maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [edi], xmmA
12194 - add edi, byte SIZEOF_XMMWORD ; outptr
12195 - maskmovdqu xmmD,xmmH ; movntdqu XMMWORD [edi], xmmD
12196 - add edi, byte SIZEOF_XMMWORD ; outptr
12197 + movdqu XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA
12198 + movdqu XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD
12199 + add edi, byte 2*SIZEOF_XMMWORD ; outptr
12200 movdqa xmmA,xmmF
12201 sub ecx, byte 2*SIZEOF_XMMWORD
12202 jmp short .column_st15
12203 @@ -299,50 +292,44 @@
12204 .column_st16:
12205 cmp ecx, byte SIZEOF_XMMWORD
12206 jb short .column_st15
12207 - maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [edi], xmmA
12208 + movdqu XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA
12209 add edi, byte SIZEOF_XMMWORD ; outptr
12210 movdqa xmmA,xmmD
12211 sub ecx, byte SIZEOF_XMMWORD
12212 .column_st15:
12213 - mov eax,ecx
12214 - xor ecx, byte 0x0F
12215 - shl ecx, 2
12216 - movd xmmB,ecx
12217 - psrlq xmmH,4
12218 - pcmpeqb xmmE,xmmE
12219 - psrlq xmmH,xmmB
12220 - psrlq xmmE,xmmB
12221 - punpcklbw xmmE,xmmH
12222 - ; ----------------
12223 - mov ecx,edi
12224 - and ecx, byte SIZEOF_XMMWORD-1
12225 - jz short .adj0
12226 - add eax,ecx
12227 - cmp eax, byte SIZEOF_XMMWORD
12228 - ja short .adj0
12229 - and edi, byte (-SIZEOF_XMMWORD) ; align to 16-byte boundary
12230 - shl ecx, 3 ; pslldq xmmA,ecx & pslldq xmmE,ecx
12231 - movdqa xmmG,xmmA
12232 - movdqa xmmC,xmmE
12233 - pslldq xmmA, SIZEOF_XMMWORD/2
12234 - pslldq xmmE, SIZEOF_XMMWORD/2
12235 - movd xmmD,ecx
12236 - sub ecx, byte (SIZEOF_XMMWORD/2)*BYTE_BIT
12237 - jb short .adj1
12238 - movd xmmF,ecx
12239 - psllq xmmA,xmmF
12240 - psllq xmmE,xmmF
12241 - jmp short .adj0
12242 -.adj1: neg ecx
12243 - movd xmmF,ecx
12244 - psrlq xmmA,xmmF
12245 - psrlq xmmE,xmmF
12246 - psllq xmmG,xmmD
12247 - psllq xmmC,xmmD
12248 - por xmmA,xmmG
12249 - por xmmE,xmmC
12250 -.adj0: ; ----------------
12251 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA
12252 + ; Store the lower 8 bytes of xmmA to the output when it has enough
12253 + ; space.
12254 + cmp ecx, byte SIZEOF_MMWORD
12255 + jb short .column_st7
12256 + movq XMM_MMWORD [edi], xmmA
12257 + add edi, byte SIZEOF_MMWORD
12258 + sub ecx, byte SIZEOF_MMWORD
12259 + psrldq xmmA, SIZEOF_MMWORD
12260 +.column_st7:
12261 + ; Store the lower 4 bytes of xmmA to the output when it has enough
12262 + ; space.
12263 + cmp ecx, byte SIZEOF_DWORD
12264 + jb short .column_st3
12265 + movd XMM_DWORD [edi], xmmA
12266 + add edi, byte SIZEOF_DWORD
12267 + sub ecx, byte SIZEOF_DWORD
12268 + psrldq xmmA, SIZEOF_DWORD
12269 +.column_st3:
12270 + ; Store the lower 2 bytes of eax to the output when it has enough
12271 + ; space.
12272 + movd eax, xmmA
12273 + cmp ecx, byte SIZEOF_WORD
12274 + jb short .column_st1
12275 + mov WORD [edi], ax
12276 + add edi, byte SIZEOF_WORD
12277 + sub ecx, byte SIZEOF_WORD
12278 + shr eax, 16
12279 +.column_st1:
12280 + ; Store the lower 1 byte of eax to the output when it has enough
12281 + ; space.
12282 + test ecx, ecx
12283 + jz short .nextrow
12284 + mov BYTE [edi], al
12285
12286 %else ; RGB_PIXELSIZE == 4 ; -----------
12287
12288 @@ -387,19 +374,14 @@
12289 movntdq XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD
12290 movntdq XMMWORD [edi+2*SIZEOF_XMMWORD], xmmC
12291 movntdq XMMWORD [edi+3*SIZEOF_XMMWORD], xmmH
12292 - add edi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr
12293 jmp short .out0
12294 .out1: ; --(unaligned)-----------------
12295 - pcmpeqb xmmE,xmmE ; xmmE=(all 1's)
12296 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA
12297 - add edi, byte SIZEOF_XMMWORD ; outptr
12298 - maskmovdqu xmmD,xmmE ; movntdqu XMMWORD [edi], xmmD
12299 - add edi, byte SIZEOF_XMMWORD ; outptr
12300 - maskmovdqu xmmC,xmmE ; movntdqu XMMWORD [edi], xmmC
12301 - add edi, byte SIZEOF_XMMWORD ; outptr
12302 - maskmovdqu xmmH,xmmE ; movntdqu XMMWORD [edi], xmmH
12303 - add edi, byte SIZEOF_XMMWORD ; outptr
12304 + movdqu XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA
12305 + movdqu XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD
12306 + movdqu XMMWORD [edi+2*SIZEOF_XMMWORD], xmmC
12307 + movdqu XMMWORD [edi+3*SIZEOF_XMMWORD], xmmH
12308 .out0:
12309 + add edi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr
12310 sub ecx, byte SIZEOF_XMMWORD
12311 jz near .nextrow
12312
12313 @@ -410,13 +392,11 @@
12314 alignx 16,7
12315
12316 .column_st32:
12317 - pcmpeqb xmmE,xmmE ; xmmE=(all 1's)
12318 cmp ecx, byte SIZEOF_XMMWORD/2
12319 jb short .column_st16
12320 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA
12321 - add edi, byte SIZEOF_XMMWORD ; outptr
12322 - maskmovdqu xmmD,xmmE ; movntdqu XMMWORD [edi], xmmD
12323 - add edi, byte SIZEOF_XMMWORD ; outptr
12324 + movdqu XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA
12325 + movdqu XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD
12326 + add edi, byte 2*SIZEOF_XMMWORD ; outptr
12327 movdqa xmmA,xmmC
12328 movdqa xmmD,xmmH
12329 sub ecx, byte SIZEOF_XMMWORD/2
12330 @@ -423,50 +403,25 @@
12331 .column_st16:
12332 cmp ecx, byte SIZEOF_XMMWORD/4
12333 jb short .column_st15
12334 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA
12335 + movdqu XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA
12336 add edi, byte SIZEOF_XMMWORD ; outptr
12337 movdqa xmmA,xmmD
12338 sub ecx, byte SIZEOF_XMMWORD/4
12339 .column_st15:
12340 - cmp ecx, byte SIZEOF_XMMWORD/16
12341 - jb short .nextrow
12342 - mov eax,ecx
12343 - xor ecx, byte 0x03
12344 - inc ecx
12345 - shl ecx, 4
12346 - movd xmmF,ecx
12347 - psrlq xmmE,xmmF
12348 - punpcklbw xmmE,xmmE
12349 - ; ----------------
12350 - mov ecx,edi
12351 - and ecx, byte SIZEOF_XMMWORD-1
12352 - jz short .adj0
12353 - lea eax, [ecx+eax*4] ; RGB_PIXELSIZE
12354 - cmp eax, byte SIZEOF_XMMWORD
12355 - ja short .adj0
12356 - and edi, byte (-SIZEOF_XMMWORD) ; align to 16-byte boundary
12357 - shl ecx, 3 ; pslldq xmmA,ecx & pslldq xmmE,ecx
12358 - movdqa xmmB,xmmA
12359 - movdqa xmmG,xmmE
12360 - pslldq xmmA, SIZEOF_XMMWORD/2
12361 - pslldq xmmE, SIZEOF_XMMWORD/2
12362 - movd xmmC,ecx
12363 - sub ecx, byte (SIZEOF_XMMWORD/2)*BYTE_BIT
12364 - jb short .adj1
12365 - movd xmmH,ecx
12366 - psllq xmmA,xmmH
12367 - psllq xmmE,xmmH
12368 - jmp short .adj0
12369 -.adj1: neg ecx
12370 - movd xmmH,ecx
12371 - psrlq xmmA,xmmH
12372 - psrlq xmmE,xmmH
12373 - psllq xmmB,xmmC
12374 - psllq xmmG,xmmC
12375 - por xmmA,xmmB
12376 - por xmmE,xmmG
12377 -.adj0: ; ----------------
12378 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA
12379 + ; Store two pixels (8 bytes) of xmmA to the output when it has enough
12380 + ; space.
12381 + cmp ecx, byte SIZEOF_XMMWORD/8
12382 + jb short .column_st7
12383 + movq XMM_MMWORD [edi], xmmA
12384 + add edi, byte SIZEOF_XMMWORD/8*4
12385 + sub ecx, byte SIZEOF_XMMWORD/8
12386 + psrldq xmmA, SIZEOF_XMMWORD/8*4
12387 +.column_st7:
12388 + ; Store one pixel (4 bytes) of xmmA to the output when it has enough
12389 + ; space.
12390 + test ecx, ecx
12391 + jz short .nextrow
12392 + movd XMM_DWORD [edi], xmmA
12393
12394 %endif ; RGB_PIXELSIZE ; ---------------
12395
12396 @@ -500,3 +455,6 @@
12397 pop ebp
12398 ret
12399
12400 +; For some reason, the OS X linker does not honor the request to align the
12401 +; segment unless we do this.
12402 + align 16
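The tail handling in the hunks above replaces the masked `maskmovdqu` store with a cascade of plain stores of 8, 4, 2 and 1 bytes, shifting the register after each one. A minimal C sketch of that idea follows, assuming the remaining count is in bytes and is below 16; `store_tail` and its parameters are hypothetical names for illustration and do not exist in libjpeg-turbo.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of libjpeg-turbo): write the first
 * `remaining` bytes (0..15) of a 16-byte register image to `out`.
 * Each step stores the largest power-of-two chunk that still fits and
 * then moves past it, which plays the role of `psrldq`/`shr` shifting
 * the consumed bytes out of the register in the assembly. */
size_t store_tail(uint8_t *out, const uint8_t reg[16], size_t remaining)
{
    static const size_t chunk_sizes[] = { 8, 4, 2, 1 };
    size_t written = 0;

    assert(remaining < 16);
    for (size_t i = 0; i < sizeof chunk_sizes / sizeof chunk_sizes[0]; i++) {
        size_t chunk = chunk_sizes[i];
        if (remaining >= chunk) {
            /* Plain store of `chunk` bytes; never touches bytes past the tail. */
            memcpy(out + written, reg + written, chunk);
            written += chunk;
            remaining -= chunk;
        }
    }
    return written;
}
```

Because every chunk size is used at most once, any tail length from 0 to 15 bytes is covered by at most four stores, and no byte beyond the tail is written.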
12403 Index: simd/jdcolmmx.asm
12404 ===================================================================
12405 --- simd/jdcolmmx.asm (revision 829)
12406 +++ simd/jdcolmmx.asm (working copy)
12407 @@ -35,7 +35,7 @@
12408 SECTION SEG_CONST
12409
12410 alignz 16
12411 - global EXTN(jconst_ycc_rgb_convert_mmx)
12412 + global EXTN(jconst_ycc_rgb_convert_mmx) PRIVATE
12413
12414 EXTN(jconst_ycc_rgb_convert_mmx):
12415
12416 @@ -48,6 +48,9 @@
12417 alignz 16
12418
12419 ; --------------------------------------------------------------------------
12420 + SECTION SEG_TEXT
12421 + BITS 32
12422 +
12423 %include "jdclrmmx.asm"
12424
12425 %undef RGB_RED
12426 @@ -54,10 +57,10 @@
12427 %undef RGB_GREEN
12428 %undef RGB_BLUE
12429 %undef RGB_PIXELSIZE
12430 -%define RGB_RED 0
12431 -%define RGB_GREEN 1
12432 -%define RGB_BLUE 2
12433 -%define RGB_PIXELSIZE 3
12434 +%define RGB_RED EXT_RGB_RED
12435 +%define RGB_GREEN EXT_RGB_GREEN
12436 +%define RGB_BLUE EXT_RGB_BLUE
12437 +%define RGB_PIXELSIZE EXT_RGB_PIXELSIZE
12438 %define jsimd_ycc_rgb_convert_mmx jsimd_ycc_extrgb_convert_mmx
12439 %include "jdclrmmx.asm"
12440
12441 @@ -65,10 +68,10 @@
12442 %undef RGB_GREEN
12443 %undef RGB_BLUE
12444 %undef RGB_PIXELSIZE
12445 -%define RGB_RED 0
12446 -%define RGB_GREEN 1
12447 -%define RGB_BLUE 2
12448 -%define RGB_PIXELSIZE 4
12449 +%define RGB_RED EXT_RGBX_RED
12450 +%define RGB_GREEN EXT_RGBX_GREEN
12451 +%define RGB_BLUE EXT_RGBX_BLUE
12452 +%define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE
12453 %define jsimd_ycc_rgb_convert_mmx jsimd_ycc_extrgbx_convert_mmx
12454 %include "jdclrmmx.asm"
12455
12456 @@ -76,10 +79,10 @@
12457 %undef RGB_GREEN
12458 %undef RGB_BLUE
12459 %undef RGB_PIXELSIZE
12460 -%define RGB_RED 2
12461 -%define RGB_GREEN 1
12462 -%define RGB_BLUE 0
12463 -%define RGB_PIXELSIZE 3
12464 +%define RGB_RED EXT_BGR_RED
12465 +%define RGB_GREEN EXT_BGR_GREEN
12466 +%define RGB_BLUE EXT_BGR_BLUE
12467 +%define RGB_PIXELSIZE EXT_BGR_PIXELSIZE
12468 %define jsimd_ycc_rgb_convert_mmx jsimd_ycc_extbgr_convert_mmx
12469 %include "jdclrmmx.asm"
12470
12471 @@ -87,10 +90,10 @@
12472 %undef RGB_GREEN
12473 %undef RGB_BLUE
12474 %undef RGB_PIXELSIZE
12475 -%define RGB_RED 2
12476 -%define RGB_GREEN 1
12477 -%define RGB_BLUE 0
12478 -%define RGB_PIXELSIZE 4
12479 +%define RGB_RED EXT_BGRX_RED
12480 +%define RGB_GREEN EXT_BGRX_GREEN
12481 +%define RGB_BLUE EXT_BGRX_BLUE
12482 +%define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE
12483 %define jsimd_ycc_rgb_convert_mmx jsimd_ycc_extbgrx_convert_mmx
12484 %include "jdclrmmx.asm"
12485
12486 @@ -98,10 +101,10 @@
12487 %undef RGB_GREEN
12488 %undef RGB_BLUE
12489 %undef RGB_PIXELSIZE
12490 -%define RGB_RED 3
12491 -%define RGB_GREEN 2
12492 -%define RGB_BLUE 1
12493 -%define RGB_PIXELSIZE 4
12494 +%define RGB_RED EXT_XBGR_RED
12495 +%define RGB_GREEN EXT_XBGR_GREEN
12496 +%define RGB_BLUE EXT_XBGR_BLUE
12497 +%define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE
12498 %define jsimd_ycc_rgb_convert_mmx jsimd_ycc_extxbgr_convert_mmx
12499 %include "jdclrmmx.asm"
12500
12501 @@ -109,9 +112,9 @@
12502 %undef RGB_GREEN
12503 %undef RGB_BLUE
12504 %undef RGB_PIXELSIZE
12505 -%define RGB_RED 1
12506 -%define RGB_GREEN 2
12507 -%define RGB_BLUE 3
12508 -%define RGB_PIXELSIZE 4
12509 +%define RGB_RED EXT_XRGB_RED
12510 +%define RGB_GREEN EXT_XRGB_GREEN
12511 +%define RGB_BLUE EXT_XRGB_BLUE
12512 +%define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE
12513 %define jsimd_ycc_rgb_convert_mmx jsimd_ycc_extxrgb_convert_mmx
12514 %include "jdclrmmx.asm"
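The wrapper file above builds one routine per pixel layout by redefining `RGB_RED`, `RGB_GREEN`, `RGB_BLUE` and `RGB_PIXELSIZE` and then including the same body again. A rough C analogy of that pattern is sketched below, using a macro in place of `%include`. The `EXT_*` values are assumptions copied from the literals this patch removes (their real definitions are not shown in this diff), and `DEFINE_STORE_PIXEL` and the `store_pixel_*` functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values, taken from the literals the patch removes. The real
 * EXT_* constants are defined elsewhere and are not part of this diff. */
#define EXT_RGB_RED        0
#define EXT_RGB_GREEN      1
#define EXT_RGB_BLUE       2
#define EXT_RGB_PIXELSIZE  3

#define EXT_BGRX_RED       2
#define EXT_BGRX_GREEN     1
#define EXT_BGRX_BLUE      0
#define EXT_BGRX_PIXELSIZE 4

/* Hypothetical "template": each expansion emits one function for one
 * layout, much as each %define group followed by %include emits one
 * routine in the assembly. Returns the pixel size in bytes. */
#define DEFINE_STORE_PIXEL(name, RED, GREEN, BLUE, PIXELSIZE)      \
    size_t name(uint8_t *out, uint8_t r, uint8_t g, uint8_t b)     \
    {                                                              \
        assert(out != NULL);                                       \
        out[RED] = r;                                              \
        out[GREEN] = g;                                            \
        out[BLUE] = b;                                             \
        return PIXELSIZE;                                          \
    }

DEFINE_STORE_PIXEL(store_pixel_extrgb, EXT_RGB_RED, EXT_RGB_GREEN,
                   EXT_RGB_BLUE, EXT_RGB_PIXELSIZE)
DEFINE_STORE_PIXEL(store_pixel_extbgrx, EXT_BGRX_RED, EXT_BGRX_GREEN,
                   EXT_BGRX_BLUE, EXT_BGRX_PIXELSIZE)
```

Naming the offsets once and reusing them in every expansion keeps the layouts consistent across wrappers, which appears to be the motivation for switching from literals to `EXT_*` symbols here.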
12515 Index: simd/jdcolss2-64.asm
12516 ===================================================================
12517 --- simd/jdcolss2-64.asm (revision 829)
12518 +++ simd/jdcolss2-64.asm (working copy)
12519 @@ -1,5 +1,5 @@
12520 ;
12521 -; jdcolss2.asm - colorspace conversion (64-bit SSE2)
12522 +; jdcolss2-64.asm - colorspace conversion (64-bit SSE2)
12523 ;
12524 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
12525 ; Copyright 2009 D. R. Commander
12526 @@ -35,7 +35,7 @@
12527 SECTION SEG_CONST
12528
12529 alignz 16
12530 - global EXTN(jconst_ycc_rgb_convert_sse2)
12531 + global EXTN(jconst_ycc_rgb_convert_sse2) PRIVATE
12532
12533 EXTN(jconst_ycc_rgb_convert_sse2):
12534
12535 @@ -48,6 +48,9 @@
12536 alignz 16
12537
12538 ; --------------------------------------------------------------------------
12539 + SECTION SEG_TEXT
12540 + BITS 64
12541 +
12542 %include "jdclrss2-64.asm"
12543
12544 %undef RGB_RED
12545 @@ -54,10 +57,10 @@
12546 %undef RGB_GREEN
12547 %undef RGB_BLUE
12548 %undef RGB_PIXELSIZE
12549 -%define RGB_RED 0
12550 -%define RGB_GREEN 1
12551 -%define RGB_BLUE 2
12552 -%define RGB_PIXELSIZE 3
12553 +%define RGB_RED EXT_RGB_RED
12554 +%define RGB_GREEN EXT_RGB_GREEN
12555 +%define RGB_BLUE EXT_RGB_BLUE
12556 +%define RGB_PIXELSIZE EXT_RGB_PIXELSIZE
12557 %define jsimd_ycc_rgb_convert_sse2 jsimd_ycc_extrgb_convert_sse2
12558 %include "jdclrss2-64.asm"
12559
12560 @@ -65,10 +68,10 @@
12561 %undef RGB_GREEN
12562 %undef RGB_BLUE
12563 %undef RGB_PIXELSIZE
12564 -%define RGB_RED 0
12565 -%define RGB_GREEN 1
12566 -%define RGB_BLUE 2
12567 -%define RGB_PIXELSIZE 4
12568 +%define RGB_RED EXT_RGBX_RED
12569 +%define RGB_GREEN EXT_RGBX_GREEN
12570 +%define RGB_BLUE EXT_RGBX_BLUE
12571 +%define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE
12572 %define jsimd_ycc_rgb_convert_sse2 jsimd_ycc_extrgbx_convert_sse2
12573 %include "jdclrss2-64.asm"
12574
12575 @@ -76,10 +79,10 @@
12576 %undef RGB_GREEN
12577 %undef RGB_BLUE
12578 %undef RGB_PIXELSIZE
12579 -%define RGB_RED 2
12580 -%define RGB_GREEN 1
12581 -%define RGB_BLUE 0
12582 -%define RGB_PIXELSIZE 3
12583 +%define RGB_RED EXT_BGR_RED
12584 +%define RGB_GREEN EXT_BGR_GREEN
12585 +%define RGB_BLUE EXT_BGR_BLUE
12586 +%define RGB_PIXELSIZE EXT_BGR_PIXELSIZE
12587 %define jsimd_ycc_rgb_convert_sse2 jsimd_ycc_extbgr_convert_sse2
12588 %include "jdclrss2-64.asm"
12589
12590 @@ -87,10 +90,10 @@
12591 %undef RGB_GREEN
12592 %undef RGB_BLUE
12593 %undef RGB_PIXELSIZE
12594 -%define RGB_RED 2
12595 -%define RGB_GREEN 1
12596 -%define RGB_BLUE 0
12597 -%define RGB_PIXELSIZE 4
12598 +%define RGB_RED EXT_BGRX_RED
12599 +%define RGB_GREEN EXT_BGRX_GREEN
12600 +%define RGB_BLUE EXT_BGRX_BLUE
12601 +%define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE
12602 %define jsimd_ycc_rgb_convert_sse2 jsimd_ycc_extbgrx_convert_sse2
12603 %include "jdclrss2-64.asm"
12604
12605 @@ -98,10 +101,10 @@
12606 %undef RGB_GREEN
12607 %undef RGB_BLUE
12608 %undef RGB_PIXELSIZE
12609 -%define RGB_RED 3
12610 -%define RGB_GREEN 2
12611 -%define RGB_BLUE 1
12612 -%define RGB_PIXELSIZE 4
12613 +%define RGB_RED EXT_XBGR_RED
12614 +%define RGB_GREEN EXT_XBGR_GREEN
12615 +%define RGB_BLUE EXT_XBGR_BLUE
12616 +%define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE
12617 %define jsimd_ycc_rgb_convert_sse2 jsimd_ycc_extxbgr_convert_sse2
12618 %include "jdclrss2-64.asm"
12619
12620 @@ -109,9 +112,9 @@
12621 %undef RGB_GREEN
12622 %undef RGB_BLUE
12623 %undef RGB_PIXELSIZE
12624 -%define RGB_RED 1
12625 -%define RGB_GREEN 2
12626 -%define RGB_BLUE 3
12627 -%define RGB_PIXELSIZE 4
12628 +%define RGB_RED EXT_XRGB_RED
12629 +%define RGB_GREEN EXT_XRGB_GREEN
12630 +%define RGB_BLUE EXT_XRGB_BLUE
12631 +%define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE
12632 %define jsimd_ycc_rgb_convert_sse2 jsimd_ycc_extxrgb_convert_sse2
12633 %include "jdclrss2-64.asm"
12634 Index: simd/jdcolss2.asm
12635 ===================================================================
12636 --- simd/jdcolss2.asm (revision 829)
12637 +++ simd/jdcolss2.asm (working copy)
12638 @@ -35,7 +35,7 @@
12639 SECTION SEG_CONST
12640
12641 alignz 16
12642 - global EXTN(jconst_ycc_rgb_convert_sse2)
12643 + global EXTN(jconst_ycc_rgb_convert_sse2) PRIVATE
12644
12645 EXTN(jconst_ycc_rgb_convert_sse2):
12646
12647 @@ -48,6 +48,9 @@
12648 alignz 16
12649
12650 ; --------------------------------------------------------------------------
12651 + SECTION SEG_TEXT
12652 + BITS 32
12653 +
12654 %include "jdclrss2.asm"
12655
12656 %undef RGB_RED
12657 @@ -54,10 +57,10 @@
12658 %undef RGB_GREEN
12659 %undef RGB_BLUE
12660 %undef RGB_PIXELSIZE
12661 -%define RGB_RED 0
12662 -%define RGB_GREEN 1
12663 -%define RGB_BLUE 2
12664 -%define RGB_PIXELSIZE 3
12665 +%define RGB_RED EXT_RGB_RED
12666 +%define RGB_GREEN EXT_RGB_GREEN
12667 +%define RGB_BLUE EXT_RGB_BLUE
12668 +%define RGB_PIXELSIZE EXT_RGB_PIXELSIZE
12669 %define jsimd_ycc_rgb_convert_sse2 jsimd_ycc_extrgb_convert_sse2
12670 %include "jdclrss2.asm"
12671
12672 @@ -65,10 +68,10 @@
12673 %undef RGB_GREEN
12674 %undef RGB_BLUE
12675 %undef RGB_PIXELSIZE
12676 -%define RGB_RED 0
12677 -%define RGB_GREEN 1
12678 -%define RGB_BLUE 2
12679 -%define RGB_PIXELSIZE 4
12680 +%define RGB_RED EXT_RGBX_RED
12681 +%define RGB_GREEN EXT_RGBX_GREEN
12682 +%define RGB_BLUE EXT_RGBX_BLUE
12683 +%define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE
12684 %define jsimd_ycc_rgb_convert_sse2 jsimd_ycc_extrgbx_convert_sse2
12685 %include "jdclrss2.asm"
12686
12687 @@ -76,10 +79,10 @@
12688 %undef RGB_GREEN
12689 %undef RGB_BLUE
12690 %undef RGB_PIXELSIZE
12691 -%define RGB_RED 2
12692 -%define RGB_GREEN 1
12693 -%define RGB_BLUE 0
12694 -%define RGB_PIXELSIZE 3
12695 +%define RGB_RED EXT_BGR_RED
12696 +%define RGB_GREEN EXT_BGR_GREEN
12697 +%define RGB_BLUE EXT_BGR_BLUE
12698 +%define RGB_PIXELSIZE EXT_BGR_PIXELSIZE
12699 %define jsimd_ycc_rgb_convert_sse2 jsimd_ycc_extbgr_convert_sse2
12700 %include "jdclrss2.asm"
12701
12702 @@ -87,10 +90,10 @@
12703 %undef RGB_GREEN
12704 %undef RGB_BLUE
12705 %undef RGB_PIXELSIZE
12706 -%define RGB_RED 2
12707 -%define RGB_GREEN 1
12708 -%define RGB_BLUE 0
12709 -%define RGB_PIXELSIZE 4
12710 +%define RGB_RED EXT_BGRX_RED
12711 +%define RGB_GREEN EXT_BGRX_GREEN
12712 +%define RGB_BLUE EXT_BGRX_BLUE
12713 +%define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE
12714 %define jsimd_ycc_rgb_convert_sse2 jsimd_ycc_extbgrx_convert_sse2
12715 %include "jdclrss2.asm"
12716
12717 @@ -98,10 +101,10 @@
12718 %undef RGB_GREEN
12719 %undef RGB_BLUE
12720 %undef RGB_PIXELSIZE
12721 -%define RGB_RED 3
12722 -%define RGB_GREEN 2
12723 -%define RGB_BLUE 1
12724 -%define RGB_PIXELSIZE 4
12725 +%define RGB_RED EXT_XBGR_RED
12726 +%define RGB_GREEN EXT_XBGR_GREEN
12727 +%define RGB_BLUE EXT_XBGR_BLUE
12728 +%define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE
12729 %define jsimd_ycc_rgb_convert_sse2 jsimd_ycc_extxbgr_convert_sse2
12730 %include "jdclrss2.asm"
12731
12732 @@ -109,9 +112,9 @@
12733 %undef RGB_GREEN
12734 %undef RGB_BLUE
12735 %undef RGB_PIXELSIZE
12736 -%define RGB_RED 1
12737 -%define RGB_GREEN 2
12738 -%define RGB_BLUE 3
12739 -%define RGB_PIXELSIZE 4
12740 +%define RGB_RED EXT_XRGB_RED
12741 +%define RGB_GREEN EXT_XRGB_GREEN
12742 +%define RGB_BLUE EXT_XRGB_BLUE
12743 +%define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE
12744 %define jsimd_ycc_rgb_convert_sse2 jsimd_ycc_extxrgb_convert_sse2
12745 %include "jdclrss2.asm"
12746 Index: simd/jdmermmx.asm
12747 ===================================================================
12748 --- simd/jdmermmx.asm (revision 829)
12749 +++ simd/jdmermmx.asm (working copy)
12750 @@ -35,7 +35,7 @@
12751 SECTION SEG_CONST
12752
12753 alignz 16
12754 - global EXTN(jconst_merged_upsample_mmx)
12755 + global EXTN(jconst_merged_upsample_mmx) PRIVATE
12756
12757 EXTN(jconst_merged_upsample_mmx):
12758
12759 @@ -48,6 +48,9 @@
12760 alignz 16
12761
12762 ; --------------------------------------------------------------------------
12763 + SECTION SEG_TEXT
12764 + BITS 32
12765 +
12766 %include "jdmrgmmx.asm"
12767
12768 %undef RGB_RED
12769 @@ -54,10 +57,10 @@
12770 %undef RGB_GREEN
12771 %undef RGB_BLUE
12772 %undef RGB_PIXELSIZE
12773 -%define RGB_RED 0
12774 -%define RGB_GREEN 1
12775 -%define RGB_BLUE 2
12776 -%define RGB_PIXELSIZE 3
12777 +%define RGB_RED EXT_RGB_RED
12778 +%define RGB_GREEN EXT_RGB_GREEN
12779 +%define RGB_BLUE EXT_RGB_BLUE
12780 +%define RGB_PIXELSIZE EXT_RGB_PIXELSIZE
12781 %define jsimd_h2v1_merged_upsample_mmx jsimd_h2v1_extrgb_merged_upsample_mmx
12782 %define jsimd_h2v2_merged_upsample_mmx jsimd_h2v2_extrgb_merged_upsample_mmx
12783 %include "jdmrgmmx.asm"
12784 @@ -66,10 +69,10 @@
12785 %undef RGB_GREEN
12786 %undef RGB_BLUE
12787 %undef RGB_PIXELSIZE
12788 -%define RGB_RED 0
12789 -%define RGB_GREEN 1
12790 -%define RGB_BLUE 2
12791 -%define RGB_PIXELSIZE 4
12792 +%define RGB_RED EXT_RGBX_RED
12793 +%define RGB_GREEN EXT_RGBX_GREEN
12794 +%define RGB_BLUE EXT_RGBX_BLUE
12795 +%define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE
12796 %define jsimd_h2v1_merged_upsample_mmx jsimd_h2v1_extrgbx_merged_upsample_mmx
12797 %define jsimd_h2v2_merged_upsample_mmx jsimd_h2v2_extrgbx_merged_upsample_mmx
12798 %include "jdmrgmmx.asm"
12799 @@ -78,10 +81,10 @@
12800 %undef RGB_GREEN
12801 %undef RGB_BLUE
12802 %undef RGB_PIXELSIZE
12803 -%define RGB_RED 2
12804 -%define RGB_GREEN 1
12805 -%define RGB_BLUE 0
12806 -%define RGB_PIXELSIZE 3
12807 +%define RGB_RED EXT_BGR_RED
12808 +%define RGB_GREEN EXT_BGR_GREEN
12809 +%define RGB_BLUE EXT_BGR_BLUE
12810 +%define RGB_PIXELSIZE EXT_BGR_PIXELSIZE
12811 %define jsimd_h2v1_merged_upsample_mmx jsimd_h2v1_extbgr_merged_upsample_mmx
12812 %define jsimd_h2v2_merged_upsample_mmx jsimd_h2v2_extbgr_merged_upsample_mmx
12813 %include "jdmrgmmx.asm"
12814 @@ -90,10 +93,10 @@
12815 %undef RGB_GREEN
12816 %undef RGB_BLUE
12817 %undef RGB_PIXELSIZE
12818 -%define RGB_RED 2
12819 -%define RGB_GREEN 1
12820 -%define RGB_BLUE 0
12821 -%define RGB_PIXELSIZE 4
12822 +%define RGB_RED EXT_BGRX_RED
12823 +%define RGB_GREEN EXT_BGRX_GREEN
12824 +%define RGB_BLUE EXT_BGRX_BLUE
12825 +%define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE
12826 %define jsimd_h2v1_merged_upsample_mmx jsimd_h2v1_extbgrx_merged_upsample_mmx
12827 %define jsimd_h2v2_merged_upsample_mmx jsimd_h2v2_extbgrx_merged_upsample_mmx
12828 %include "jdmrgmmx.asm"
12829 @@ -102,10 +105,10 @@
12830 %undef RGB_GREEN
12831 %undef RGB_BLUE
12832 %undef RGB_PIXELSIZE
12833 -%define RGB_RED 3
12834 -%define RGB_GREEN 2
12835 -%define RGB_BLUE 1
12836 -%define RGB_PIXELSIZE 4
12837 +%define RGB_RED EXT_XBGR_RED
12838 +%define RGB_GREEN EXT_XBGR_GREEN
12839 +%define RGB_BLUE EXT_XBGR_BLUE
12840 +%define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE
12841 %define jsimd_h2v1_merged_upsample_mmx jsimd_h2v1_extxbgr_merged_upsample_mmx
12842 %define jsimd_h2v2_merged_upsample_mmx jsimd_h2v2_extxbgr_merged_upsample_mmx
12843 %include "jdmrgmmx.asm"
12844 @@ -114,10 +117,10 @@
12845 %undef RGB_GREEN
12846 %undef RGB_BLUE
12847 %undef RGB_PIXELSIZE
12848 -%define RGB_RED 1
12849 -%define RGB_GREEN 2
12850 -%define RGB_BLUE 3
12851 -%define RGB_PIXELSIZE 4
12852 +%define RGB_RED EXT_XRGB_RED
12853 +%define RGB_GREEN EXT_XRGB_GREEN
12854 +%define RGB_BLUE EXT_XRGB_BLUE
12855 +%define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE
12856 %define jsimd_h2v1_merged_upsample_mmx jsimd_h2v1_extxrgb_merged_upsample_mmx
12857 %define jsimd_h2v2_merged_upsample_mmx jsimd_h2v2_extxrgb_merged_upsample_mmx
12858 %include "jdmrgmmx.asm"
12859 Index: simd/jdmerss2-64.asm
12860 ===================================================================
12861 --- simd/jdmerss2-64.asm (revision 829)
12862 +++ simd/jdmerss2-64.asm (working copy)
12863 @@ -1,5 +1,5 @@
12864 ;
12865 -; jdmerss2.asm - merged upsampling/color conversion (64-bit SSE2)
12866 +; jdmerss2-64.asm - merged upsampling/color conversion (64-bit SSE2)
12867 ;
12868 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
12869 ; Copyright 2009 D. R. Commander
12870 @@ -35,7 +35,7 @@
12871 SECTION SEG_CONST
12872
12873 alignz 16
12874 - global EXTN(jconst_merged_upsample_sse2)
12875 + global EXTN(jconst_merged_upsample_sse2) PRIVATE
12876
12877 EXTN(jconst_merged_upsample_sse2):
12878
12879 @@ -48,6 +48,9 @@
12880 alignz 16
12881
12882 ; --------------------------------------------------------------------------
12883 + SECTION SEG_TEXT
12884 + BITS 64
12885 +
12886 %include "jdmrgss2-64.asm"
12887
12888 %undef RGB_RED
12889 @@ -54,10 +57,10 @@
12890 %undef RGB_GREEN
12891 %undef RGB_BLUE
12892 %undef RGB_PIXELSIZE
12893 -%define RGB_RED 0
12894 -%define RGB_GREEN 1
12895 -%define RGB_BLUE 2
12896 -%define RGB_PIXELSIZE 3
12897 +%define RGB_RED EXT_RGB_RED
12898 +%define RGB_GREEN EXT_RGB_GREEN
12899 +%define RGB_BLUE EXT_RGB_BLUE
12900 +%define RGB_PIXELSIZE EXT_RGB_PIXELSIZE
12901 %define jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_extrgb_merged_upsample_sse2
12902 %define jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_extrgb_merged_upsample_sse2
12903 %include "jdmrgss2-64.asm"
12904 @@ -66,10 +69,10 @@
12905 %undef RGB_GREEN
12906 %undef RGB_BLUE
12907 %undef RGB_PIXELSIZE
12908 -%define RGB_RED 0
12909 -%define RGB_GREEN 1
12910 -%define RGB_BLUE 2
12911 -%define RGB_PIXELSIZE 4
12912 +%define RGB_RED EXT_RGBX_RED
12913 +%define RGB_GREEN EXT_RGBX_GREEN
12914 +%define RGB_BLUE EXT_RGBX_BLUE
12915 +%define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE
12916 %define jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_extrgbx_merged_upsample_sse2
12917 %define jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_extrgbx_merged_upsample_sse2
12918 %include "jdmrgss2-64.asm"
12919 @@ -78,10 +81,10 @@
12920 %undef RGB_GREEN
12921 %undef RGB_BLUE
12922 %undef RGB_PIXELSIZE
12923 -%define RGB_RED 2
12924 -%define RGB_GREEN 1
12925 -%define RGB_BLUE 0
12926 -%define RGB_PIXELSIZE 3
12927 +%define RGB_RED EXT_BGR_RED
12928 +%define RGB_GREEN EXT_BGR_GREEN
12929 +%define RGB_BLUE EXT_BGR_BLUE
12930 +%define RGB_PIXELSIZE EXT_BGR_PIXELSIZE
12931 %define jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_extbgr_merged_upsample_sse2
12932 %define jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_extbgr_merged_upsample_sse2
12933 %include "jdmrgss2-64.asm"
12934 @@ -90,10 +93,10 @@
12935 %undef RGB_GREEN
12936 %undef RGB_BLUE
12937 %undef RGB_PIXELSIZE
12938 -%define RGB_RED 2
12939 -%define RGB_GREEN 1
12940 -%define RGB_BLUE 0
12941 -%define RGB_PIXELSIZE 4
12942 +%define RGB_RED EXT_BGRX_RED
12943 +%define RGB_GREEN EXT_BGRX_GREEN
12944 +%define RGB_BLUE EXT_BGRX_BLUE
12945 +%define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE
12946 %define jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_extbgrx_merged_upsample_sse2
12947 %define jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_extbgrx_merged_upsample_sse2
12948 %include "jdmrgss2-64.asm"
12949 @@ -102,10 +105,10 @@
12950 %undef RGB_GREEN
12951 %undef RGB_BLUE
12952 %undef RGB_PIXELSIZE
12953 -%define RGB_RED 3
12954 -%define RGB_GREEN 2
12955 -%define RGB_BLUE 1
12956 -%define RGB_PIXELSIZE 4
12957 +%define RGB_RED EXT_XBGR_RED
12958 +%define RGB_GREEN EXT_XBGR_GREEN
12959 +%define RGB_BLUE EXT_XBGR_BLUE
12960 +%define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE
12961 %define jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_extxbgr_merged_upsample_sse2
12962 %define jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_extxbgr_merged_upsample_sse2
12963 %include "jdmrgss2-64.asm"
12964 @@ -114,10 +117,10 @@
12965 %undef RGB_GREEN
12966 %undef RGB_BLUE
12967 %undef RGB_PIXELSIZE
12968 -%define RGB_RED 1
12969 -%define RGB_GREEN 2
12970 -%define RGB_BLUE 3
12971 -%define RGB_PIXELSIZE 4
12972 +%define RGB_RED EXT_XRGB_RED
12973 +%define RGB_GREEN EXT_XRGB_GREEN
12974 +%define RGB_BLUE EXT_XRGB_BLUE
12975 +%define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE
12976 %define jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_extxrgb_merged_upsample_sse2
12977 %define jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_extxrgb_merged_upsample_sse2
12978 %include "jdmrgss2-64.asm"
12979 Index: simd/jdmerss2.asm
12980 ===================================================================
12981 --- simd/jdmerss2.asm (revision 829)
12982 +++ simd/jdmerss2.asm (working copy)
12983 @@ -35,7 +35,7 @@
12984 SECTION SEG_CONST
12985
12986 alignz 16
12987 - global EXTN(jconst_merged_upsample_sse2)
12988 + global EXTN(jconst_merged_upsample_sse2) PRIVATE
12989
12990 EXTN(jconst_merged_upsample_sse2):
12991
12992 @@ -48,6 +48,9 @@
12993 alignz 16
12994
12995 ; --------------------------------------------------------------------------
12996 + SECTION SEG_TEXT
12997 + BITS 32
12998 +
12999 %include "jdmrgss2.asm"
13000
13001 %undef RGB_RED
13002 @@ -54,10 +57,10 @@
13003 %undef RGB_GREEN
13004 %undef RGB_BLUE
13005 %undef RGB_PIXELSIZE
13006 -%define RGB_RED 0
13007 -%define RGB_GREEN 1
13008 -%define RGB_BLUE 2
13009 -%define RGB_PIXELSIZE 3
13010 +%define RGB_RED EXT_RGB_RED
13011 +%define RGB_GREEN EXT_RGB_GREEN
13012 +%define RGB_BLUE EXT_RGB_BLUE
13013 +%define RGB_PIXELSIZE EXT_RGB_PIXELSIZE
13014 %define jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_extrgb_merged_upsample_sse2
13015 %define jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_extrgb_merged_upsample_sse2
13016 %include "jdmrgss2.asm"
13017 @@ -66,10 +69,10 @@
13018 %undef RGB_GREEN
13019 %undef RGB_BLUE
13020 %undef RGB_PIXELSIZE
13021 -%define RGB_RED 0
13022 -%define RGB_GREEN 1
13023 -%define RGB_BLUE 2
13024 -%define RGB_PIXELSIZE 4
13025 +%define RGB_RED EXT_RGBX_RED
13026 +%define RGB_GREEN EXT_RGBX_GREEN
13027 +%define RGB_BLUE EXT_RGBX_BLUE
13028 +%define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE
13029 %define jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_extrgbx_merged_upsample_sse2
13030 %define jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_extrgbx_merged_upsample_sse2
13031 %include "jdmrgss2.asm"
13032 @@ -78,10 +81,10 @@
13033 %undef RGB_GREEN
13034 %undef RGB_BLUE
13035 %undef RGB_PIXELSIZE
13036 -%define RGB_RED 2
13037 -%define RGB_GREEN 1
13038 -%define RGB_BLUE 0
13039 -%define RGB_PIXELSIZE 3
13040 +%define RGB_RED EXT_BGR_RED
13041 +%define RGB_GREEN EXT_BGR_GREEN
13042 +%define RGB_BLUE EXT_BGR_BLUE
13043 +%define RGB_PIXELSIZE EXT_BGR_PIXELSIZE
13044 %define jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_extbgr_merged_upsample_sse2
13045 %define jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_extbgr_merged_upsample_sse2
13046 %include "jdmrgss2.asm"
13047 @@ -90,10 +93,10 @@
13048 %undef RGB_GREEN
13049 %undef RGB_BLUE
13050 %undef RGB_PIXELSIZE
13051 -%define RGB_RED 2
13052 -%define RGB_GREEN 1
13053 -%define RGB_BLUE 0
13054 -%define RGB_PIXELSIZE 4
13055 +%define RGB_RED EXT_BGRX_RED
13056 +%define RGB_GREEN EXT_BGRX_GREEN
13057 +%define RGB_BLUE EXT_BGRX_BLUE
13058 +%define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE
13059 %define jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_extbgrx_merged_upsample_sse2
13060 %define jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_extbgrx_merged_upsample_sse2
13061 %include "jdmrgss2.asm"
13062 @@ -102,10 +105,10 @@
13063 %undef RGB_GREEN
13064 %undef RGB_BLUE
13065 %undef RGB_PIXELSIZE
13066 -%define RGB_RED 3
13067 -%define RGB_GREEN 2
13068 -%define RGB_BLUE 1
13069 -%define RGB_PIXELSIZE 4
13070 +%define RGB_RED EXT_XBGR_RED
13071 +%define RGB_GREEN EXT_XBGR_GREEN
13072 +%define RGB_BLUE EXT_XBGR_BLUE
13073 +%define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE
13074 %define jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_extxbgr_merged_upsample_sse2
13075 %define jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_extxbgr_merged_upsample_sse2
13076 %include "jdmrgss2.asm"
13077 @@ -114,10 +117,10 @@
13078 %undef RGB_GREEN
13079 %undef RGB_BLUE
13080 %undef RGB_PIXELSIZE
13081 -%define RGB_RED 1
13082 -%define RGB_GREEN 2
13083 -%define RGB_BLUE 3
13084 -%define RGB_PIXELSIZE 4
13085 +%define RGB_RED EXT_XRGB_RED
13086 +%define RGB_GREEN EXT_XRGB_GREEN
13087 +%define RGB_BLUE EXT_XRGB_BLUE
13088 +%define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE
13089 %define jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_extxrgb_merged_upsample_sse2
13090 %define jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_extxrgb_merged_upsample_sse2
13091 %include "jdmrgss2.asm"
13092 Index: simd/jdmrgmmx.asm
13093 ===================================================================
13094 --- simd/jdmrgmmx.asm (revision 829)
13095 +++ simd/jdmrgmmx.asm (working copy)
13096 @@ -19,8 +19,6 @@
13097 %include "jcolsamp.inc"
13098
13099 ; --------------------------------------------------------------------------
13100 - SECTION SEG_TEXT
13101 - BITS 32
13102 ;
13103 ; Upsample and color convert for the case of 2:1 horizontal and 1:1 vertical.
13104 ;
13105 @@ -42,7 +40,7 @@
13106 %define gotptr wk(0)-SIZEOF_POINTER ; void * gotptr
13107
13108 align 16
13109 - global EXTN(jsimd_h2v1_merged_upsample_mmx)
13110 + global EXTN(jsimd_h2v1_merged_upsample_mmx) PRIVATE
13111
13112 EXTN(jsimd_h2v1_merged_upsample_mmx):
13113 push ebp
13114 @@ -253,7 +251,7 @@
13115 movq MMWORD [edi+2*SIZEOF_MMWORD], mmC
13116
13117 sub ecx, byte SIZEOF_MMWORD
13118 - jz short .endcolumn
13119 + jz near .endcolumn
13120
13121 add edi, byte RGB_PIXELSIZE*SIZEOF_MMWORD ; outptr
13122 add esi, byte SIZEOF_MMWORD ; inptr0
13123 @@ -411,7 +409,7 @@
13124 %define output_buf(b) (b)+20 ; JSAMPARRAY output_buf
13125
13126 align 16
13127 - global EXTN(jsimd_h2v2_merged_upsample_mmx)
13128 + global EXTN(jsimd_h2v2_merged_upsample_mmx) PRIVATE
13129
13130 EXTN(jsimd_h2v2_merged_upsample_mmx):
13131 push ebp
13132 @@ -461,3 +459,6 @@
13133 pop ebp
13134 ret
13135
13136 +; For some reason, the OS X linker does not honor the request to align the
13137 +; segment unless we do this.
13138 + align 16
13139 Index: simd/jdmrgss2-64.asm
13140 ===================================================================
13141 --- simd/jdmrgss2-64.asm (revision 829)
13142 +++ simd/jdmrgss2-64.asm (working copy)
13143 @@ -1,8 +1,8 @@
13144 ;
13145 -; jdmrgss2.asm - merged upsampling/color conversion (64-bit SSE2)
13146 +; jdmrgss2-64.asm - merged upsampling/color conversion (64-bit SSE2)
13147 ;
13148 -; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
13149 -; Copyright 2009 D. R. Commander
13150 +; Copyright 2009, 2012 Pierre Ossman <ossman@cendio.se> for Cendio AB
13151 +; Copyright 2009, 2012 D. R. Commander
13152 ;
13153 ; Based on
13154 ; x86 SIMD extension for IJG JPEG library
13155 @@ -20,8 +20,6 @@
13156 %include "jcolsamp.inc"
13157
13158 ; --------------------------------------------------------------------------
13159 - SECTION SEG_TEXT
13160 - BITS 64
13161 ;
13162 ; Upsample and color convert for the case of 2:1 horizontal and 1:1 vertical.
13163 ;
13164 @@ -41,7 +39,7 @@
13165 %define WK_NUM 3
13166
13167 align 16
13168 - global EXTN(jsimd_h2v1_merged_upsample_sse2)
13169 + global EXTN(jsimd_h2v1_merged_upsample_sse2) PRIVATE
13170
13171 EXTN(jsimd_h2v1_merged_upsample_sse2):
13172 push rbp
13173 @@ -51,8 +49,8 @@
13174 mov [rsp],rax
13175 mov rbp,rsp ; rbp = aligned rbp
13176 lea rsp, [wk(0)]
13177 + collect_args
13178 push rbx
13179 - collect_args
13180
13181 mov rcx, r10 ; col
13182 test rcx,rcx
13183 @@ -254,17 +252,13 @@
13184 movntdq XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA
13185 movntdq XMMWORD [rdi+1*SIZEOF_XMMWORD], xmmD
13186 movntdq XMMWORD [rdi+2*SIZEOF_XMMWORD], xmmF
13187 - add rdi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr
13188 jmp short .out0
13189 .out1: ; --(unaligned)-----------------
13190 - pcmpeqb xmmH,xmmH ; xmmH=(all 1's)
13191 - maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [rdi], xmmA
13192 - add rdi, byte SIZEOF_XMMWORD ; outptr
13193 - maskmovdqu xmmD,xmmH ; movntdqu XMMWORD [rdi], xmmD
13194 - add rdi, byte SIZEOF_XMMWORD ; outptr
13195 - maskmovdqu xmmF,xmmH ; movntdqu XMMWORD [rdi], xmmF
13196 - add rdi, byte SIZEOF_XMMWORD ; outptr
13197 + movdqu XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA
13198 + movdqu XMMWORD [rdi+1*SIZEOF_XMMWORD], xmmD
13199 + movdqu XMMWORD [rdi+2*SIZEOF_XMMWORD], xmmF
13200 .out0:
13201 + add rdi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr
13202 sub rcx, byte SIZEOF_XMMWORD
13203 jz near .endcolumn
13204
13205 @@ -277,14 +271,12 @@
13206 jmp near .columnloop
13207
13208 .column_st32:
13209 - pcmpeqb xmmH,xmmH ; xmmH=(all 1's)
13210 lea rcx, [rcx+rcx*2] ; imul ecx, RGB_PIXELSIZE
13211 cmp rcx, byte 2*SIZEOF_XMMWORD
13212 jb short .column_st16
13213 - maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [rdi], xmmA
13214 - add rdi, byte SIZEOF_XMMWORD ; outptr
13215 - maskmovdqu xmmD,xmmH ; movntdqu XMMWORD [rdi], xmmD
13216 - add rdi, byte SIZEOF_XMMWORD ; outptr
13217 + movdqu XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA
13218 + movdqu XMMWORD [rdi+1*SIZEOF_XMMWORD], xmmD
13219 + add rdi, byte 2*SIZEOF_XMMWORD ; outptr
13220 movdqa xmmA,xmmF
13221 sub rcx, byte 2*SIZEOF_XMMWORD
13222 jmp short .column_st15
13223 @@ -291,50 +283,44 @@
13224 .column_st16:
13225 cmp rcx, byte SIZEOF_XMMWORD
13226 jb short .column_st15
13227 - maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [rdi], xmmA
13228 + movdqu XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA
13229 add rdi, byte SIZEOF_XMMWORD ; outptr
13230 movdqa xmmA,xmmD
13231 sub rcx, byte SIZEOF_XMMWORD
13232 .column_st15:
13233 - mov rax,rcx
13234 - xor rcx, byte 0x0F
13235 - shl rcx, 2
13236 - movd xmmB,ecx
13237 - psrlq xmmH,4
13238 - pcmpeqb xmmE,xmmE
13239 - psrlq xmmH,xmmB
13240 - psrlq xmmE,xmmB
13241 - punpcklbw xmmE,xmmH
13242 - ; ----------------
13243 - mov rcx,rdi
13244 - and rcx, byte SIZEOF_XMMWORD-1
13245 - jz short .adj0
13246 - add rax,rcx
13247 - cmp rax, byte SIZEOF_XMMWORD
13248 - ja short .adj0
13249 - and rdi, byte (-SIZEOF_XMMWORD) ; align to 16-byte boundary
13250 - shl rcx, 3 ; pslldq xmmA,ecx & pslldq xmmE,ecx
13251 - movdqa xmmG,xmmA
13252 - movdqa xmmC,xmmE
13253 - pslldq xmmA, SIZEOF_XMMWORD/2
13254 - pslldq xmmE, SIZEOF_XMMWORD/2
13255 - movd xmmD,ecx
13256 - sub rcx, byte (SIZEOF_XMMWORD/2)*BYTE_BIT
13257 - jb short .adj1
13258 - movd xmmF,ecx
13259 - psllq xmmA,xmmF
13260 - psllq xmmE,xmmF
13261 - jmp short .adj0
13262 -.adj1: neg rcx
13263 - movd xmmF,ecx
13264 - psrlq xmmA,xmmF
13265 - psrlq xmmE,xmmF
13266 - psllq xmmG,xmmD
13267 - psllq xmmC,xmmD
13268 - por xmmA,xmmG
13269 - por xmmE,xmmC
13270 -.adj0: ; ----------------
13271 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA
13272 + ; Store the lower 8 bytes of xmmA to the output when it has enough
13273 + ; space.
13274 + cmp rcx, byte SIZEOF_MMWORD
13275 + jb short .column_st7
13276 + movq XMM_MMWORD [rdi], xmmA
13277 + add rdi, byte SIZEOF_MMWORD
13278 + sub rcx, byte SIZEOF_MMWORD
13279 + psrldq xmmA, SIZEOF_MMWORD
13280 +.column_st7:
13281 + ; Store the lower 4 bytes of xmmA to the output when it has enough
13282 + ; space.
13283 + cmp rcx, byte SIZEOF_DWORD
13284 + jb short .column_st3
13285 + movd XMM_DWORD [rdi], xmmA
13286 + add rdi, byte SIZEOF_DWORD
13287 + sub rcx, byte SIZEOF_DWORD
13288 + psrldq xmmA, SIZEOF_DWORD
13289 +.column_st3:
13290 + ; Store the lower 2 bytes of rax to the output when it has enough
13291 + ; space.
13292 + movd eax, xmmA
13293 + cmp rcx, byte SIZEOF_WORD
13294 + jb short .column_st1
13295 + mov WORD [rdi], ax
13296 + add rdi, byte SIZEOF_WORD
13297 + sub rcx, byte SIZEOF_WORD
13298 + shr rax, 16
13299 +.column_st1:
13300 + ; Store the lower 1 byte of rax to the output when it has enough
13301 + ; space.
13302 + test rcx, rcx
13303 + jz short .endcolumn
13304 + mov BYTE [rdi], al
13305
13306 %else ; RGB_PIXELSIZE == 4 ; -----------
13307
13308 @@ -379,19 +365,14 @@
13309 movntdq XMMWORD [rdi+1*SIZEOF_XMMWORD], xmmD
13310 movntdq XMMWORD [rdi+2*SIZEOF_XMMWORD], xmmC
13311 movntdq XMMWORD [rdi+3*SIZEOF_XMMWORD], xmmH
13312 - add rdi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr
13313 jmp short .out0
13314 .out1: ; --(unaligned)-----------------
13315 - pcmpeqb xmmE,xmmE ; xmmE=(all 1's)
13316 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [rdi], xmmA
13317 - add rdi, byte SIZEOF_XMMWORD ; outptr
13318 - maskmovdqu xmmD,xmmE ; movntdqu XMMWORD [rdi], xmmD
13319 - add rdi, byte SIZEOF_XMMWORD ; outptr
13320 - maskmovdqu xmmC,xmmE ; movntdqu XMMWORD [rdi], xmmC
13321 - add rdi, byte SIZEOF_XMMWORD ; outptr
13322 - maskmovdqu xmmH,xmmE ; movntdqu XMMWORD [rdi], xmmH
13323 - add rdi, byte SIZEOF_XMMWORD ; outptr
13324 + movdqu XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA
13325 + movdqu XMMWORD [rdi+1*SIZEOF_XMMWORD], xmmD
13326 + movdqu XMMWORD [rdi+2*SIZEOF_XMMWORD], xmmC
13327 + movdqu XMMWORD [rdi+3*SIZEOF_XMMWORD], xmmH
13328 .out0:
13329 + add rdi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr
13330 sub rcx, byte SIZEOF_XMMWORD
13331 jz near .endcolumn
13332
13333 @@ -404,13 +385,11 @@
13334 jmp near .columnloop
13335
13336 .column_st32:
13337 - pcmpeqb xmmE,xmmE ; xmmE=(all 1's)
13338 cmp rcx, byte SIZEOF_XMMWORD/2
13339 jb short .column_st16
13340 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [rdi], xmmA
13341 - add rdi, byte SIZEOF_XMMWORD ; outptr
13342 - maskmovdqu xmmD,xmmE ; movntdqu XMMWORD [rdi], xmmD
13343 - add rdi, byte SIZEOF_XMMWORD ; outptr
13344 + movdqu XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA
13345 + movdqu XMMWORD [rdi+1*SIZEOF_XMMWORD], xmmD
13346 + add rdi, byte 2*SIZEOF_XMMWORD ; outptr
13347 movdqa xmmA,xmmC
13348 movdqa xmmD,xmmH
13349 sub rcx, byte SIZEOF_XMMWORD/2
13350 @@ -417,50 +396,25 @@
13351 .column_st16:
13352 cmp rcx, byte SIZEOF_XMMWORD/4
13353 jb short .column_st15
13354 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA
13355 + movdqu XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA
13356 add rdi, byte SIZEOF_XMMWORD ; outptr
13357 movdqa xmmA,xmmD
13358 sub rcx, byte SIZEOF_XMMWORD/4
13359 .column_st15:
13360 - cmp rcx, byte SIZEOF_XMMWORD/16
13361 - jb near .endcolumn
13362 - mov rax,rcx
13363 - xor rcx, byte 0x03
13364 - inc rcx
13365 - shl rcx, 4
13366 - movd xmmF,ecx
13367 - psrlq xmmE,xmmF
13368 - punpcklbw xmmE,xmmE
13369 - ; ----------------
13370 - mov rcx,rdi
13371 - and rcx, byte SIZEOF_XMMWORD-1
13372 - jz short .adj0
13373 - lea rax, [rcx+rax*4] ; RGB_PIXELSIZE
13374 - cmp rax, byte SIZEOF_XMMWORD
13375 - ja short .adj0
13376 - and rdi, byte (-SIZEOF_XMMWORD) ; align to 16-byte boundary
13377 - shl rcx, 3 ; pslldq xmmA,ecx & pslldq xmmE,ecx
13378 - movdqa xmmB,xmmA
13379 - movdqa xmmG,xmmE
13380 - pslldq xmmA, SIZEOF_XMMWORD/2
13381 - pslldq xmmE, SIZEOF_XMMWORD/2
13382 - movd xmmC,ecx
13383 - sub rcx, byte (SIZEOF_XMMWORD/2)*BYTE_BIT
13384 - jb short .adj1
13385 - movd xmmH,ecx
13386 - psllq xmmA,xmmH
13387 - psllq xmmE,xmmH
13388 - jmp short .adj0
13389 -.adj1: neg rcx
13390 - movd xmmH,ecx
13391 - psrlq xmmA,xmmH
13392 - psrlq xmmE,xmmH
13393 - psllq xmmB,xmmC
13394 - psllq xmmG,xmmC
13395 - por xmmA,xmmB
13396 - por xmmE,xmmG
13397 -.adj0: ; ----------------
13398 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA
13399 + ; Store two pixels (8 bytes) of xmmA to the output when it has enough
13400 + ; space.
13401 + cmp rcx, byte SIZEOF_XMMWORD/8
13402 + jb short .column_st7
13403 + movq XMM_MMWORD [rdi], xmmA
13404 + add rdi, byte SIZEOF_XMMWORD/8*4
13405 + sub rcx, byte SIZEOF_XMMWORD/8
13406 + psrldq xmmA, SIZEOF_XMMWORD/8*4
13407 +.column_st7:
13408 + ; Store one pixel (4 bytes) of xmmA to the output when it has enough
13409 + ; space.
13410 + test rcx, rcx
13411 + jz short .endcolumn
13412 + movd XMM_DWORD [rdi], xmmA
13413
13414 %endif ; RGB_PIXELSIZE ; ---------------
13415
13416 @@ -468,8 +422,8 @@
13417 sfence ; flush the write buffer
13418
13419 .return:
13420 + pop rbx
13421 uncollect_args
13422 - pop rbx
13423 mov rsp,rbp ; rsp <- aligned rbp
13424 pop rsp ; rsp <- original rbp
13425 pop rbp
13426 @@ -492,13 +446,14 @@
13427 ; r13 = JSAMPARRAY output_buf
13428
13429 align 16
13430 - global EXTN(jsimd_h2v2_merged_upsample_sse2)
13431 + global EXTN(jsimd_h2v2_merged_upsample_sse2) PRIVATE
13432
13433 EXTN(jsimd_h2v2_merged_upsample_sse2):
13434 push rbp
13435 + mov rax,rsp
13436 mov rbp,rsp
13437 + collect_args
13438 push rbx
13439 - collect_args
13440
13441 mov rax, r10
13442
13443 @@ -519,10 +474,17 @@
13444 push rcx
13445 push rax
13446
13447 + %ifdef WIN64
13448 + mov r8, rcx
13449 + mov r9, rdi
13450 + mov rcx, rax
13451 + mov rdx, rbx
13452 + %else
13453 mov rdx, rcx
13454 mov rcx, rdi
13455 mov rdi, rax
13456 mov rsi, rbx
13457 + %endif
13458
13459 call EXTN(jsimd_h2v1_merged_upsample_sse2)
13460
13461 @@ -545,10 +507,17 @@
13462 push rcx
13463 push rax
13464
13465 + %ifdef WIN64
13466 + mov r8, rcx
13467 + mov r9, rdi
13468 + mov rcx, rax
13469 + mov rdx, rbx
13470 + %else
13471 mov rdx, rcx
13472 mov rcx, rdi
13473 mov rdi, rax
13474 mov rsi, rbx
13475 + %endif
13476
13477 call EXTN(jsimd_h2v1_merged_upsample_sse2)
13478
13479 @@ -559,7 +528,11 @@
13480 pop rbx
13481 pop rdx
13482
13483 + pop rbx
13484 uncollect_args
13485 - pop rbx
13486 pop rbp
13487 ret
13488 +
13489 +; For some reason, the OS X linker does not honor the request to align the
13490 +; segment unless we do this.
13491 + align 16
13492 Index: simd/jdmrgss2.asm
13493 ===================================================================
13494 --- simd/jdmrgss2.asm (revision 829)
13495 +++ simd/jdmrgss2.asm (working copy)
13496 @@ -1,7 +1,8 @@
13497 ;
13498 ; jdmrgss2.asm - merged upsampling/color conversion (SSE2)
13499 ;
13500 -; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
13501 +; Copyright 2009, 2012 Pierre Ossman <ossman@cendio.se> for Cendio AB
13502 +; Copyright 2012 D. R. Commander
13503 ;
13504 ; Based on
13505 ; x86 SIMD extension for IJG JPEG library
13506 @@ -19,8 +20,6 @@
13507 %include "jcolsamp.inc"
13508
13509 ; --------------------------------------------------------------------------
13510 - SECTION SEG_TEXT
13511 - BITS 32
13512 ;
13513 ; Upsample and color convert for the case of 2:1 horizontal and 1:1 vertical.
13514 ;
13515 @@ -42,7 +41,7 @@
13516 %define gotptr wk(0)-SIZEOF_POINTER ; void * gotptr
13517
13518 align 16
13519 - global EXTN(jsimd_h2v1_merged_upsample_sse2)
13520 + global EXTN(jsimd_h2v1_merged_upsample_sse2) PRIVATE
13521
13522 EXTN(jsimd_h2v1_merged_upsample_sse2):
13523 push ebp
13524 @@ -266,17 +265,13 @@
13525 movntdq XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA
13526 movntdq XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD
13527 movntdq XMMWORD [edi+2*SIZEOF_XMMWORD], xmmF
13528 - add edi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr
13529 jmp short .out0
13530 .out1: ; --(unaligned)-----------------
13531 - pcmpeqb xmmH,xmmH ; xmmH=(all 1's)
13532 - maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [edi], xmmA
13533 - add edi, byte SIZEOF_XMMWORD ; outptr
13534 - maskmovdqu xmmD,xmmH ; movntdqu XMMWORD [edi], xmmD
13535 - add edi, byte SIZEOF_XMMWORD ; outptr
13536 - maskmovdqu xmmF,xmmH ; movntdqu XMMWORD [edi], xmmF
13537 - add edi, byte SIZEOF_XMMWORD ; outptr
13538 + movdqu XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA
13539 + movdqu XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD
13540 + movdqu XMMWORD [edi+2*SIZEOF_XMMWORD], xmmF
13541 .out0:
13542 + add edi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr
13543 sub ecx, byte SIZEOF_XMMWORD
13544 jz near .endcolumn
13545
13546 @@ -290,14 +285,12 @@
13547 alignx 16,7
13548
13549 .column_st32:
13550 - pcmpeqb xmmH,xmmH ; xmmH=(all 1's)
13551 lea ecx, [ecx+ecx*2] ; imul ecx, RGB_PIXELSIZE
13552 cmp ecx, byte 2*SIZEOF_XMMWORD
13553 jb short .column_st16
13554 - maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [edi], xmmA
13555 - add edi, byte SIZEOF_XMMWORD ; outptr
13556 - maskmovdqu xmmD,xmmH ; movntdqu XMMWORD [edi], xmmD
13557 - add edi, byte SIZEOF_XMMWORD ; outptr
13558 + movdqu XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA
13559 + movdqu XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD
13560 + add edi, byte 2*SIZEOF_XMMWORD ; outptr
13561 movdqa xmmA,xmmF
13562 sub ecx, byte 2*SIZEOF_XMMWORD
13563 jmp short .column_st15
13564 @@ -304,50 +297,44 @@
13565 .column_st16:
13566 cmp ecx, byte SIZEOF_XMMWORD
13567 jb short .column_st15
13568 - maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [edi], xmmA
13569 + movdqu XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA
13570 add edi, byte SIZEOF_XMMWORD ; outptr
13571 movdqa xmmA,xmmD
13572 sub ecx, byte SIZEOF_XMMWORD
13573 .column_st15:
13574 - mov eax,ecx
13575 - xor ecx, byte 0x0F
13576 - shl ecx, 2
13577 - movd xmmB,ecx
13578 - psrlq xmmH,4
13579 - pcmpeqb xmmE,xmmE
13580 - psrlq xmmH,xmmB
13581 - psrlq xmmE,xmmB
13582 - punpcklbw xmmE,xmmH
13583 - ; ----------------
13584 - mov ecx,edi
13585 - and ecx, byte SIZEOF_XMMWORD-1
13586 - jz short .adj0
13587 - add eax,ecx
13588 - cmp eax, byte SIZEOF_XMMWORD
13589 - ja short .adj0
13590 - and edi, byte (-SIZEOF_XMMWORD) ; align to 16-byte boundary
13591 - shl ecx, 3 ; pslldq xmmA,ecx & pslldq xmmE,ecx
13592 - movdqa xmmG,xmmA
13593 - movdqa xmmC,xmmE
13594 - pslldq xmmA, SIZEOF_XMMWORD/2
13595 - pslldq xmmE, SIZEOF_XMMWORD/2
13596 - movd xmmD,ecx
13597 - sub ecx, byte (SIZEOF_XMMWORD/2)*BYTE_BIT
13598 - jb short .adj1
13599 - movd xmmF,ecx
13600 - psllq xmmA,xmmF
13601 - psllq xmmE,xmmF
13602 - jmp short .adj0
13603 -.adj1: neg ecx
13604 - movd xmmF,ecx
13605 - psrlq xmmA,xmmF
13606 - psrlq xmmE,xmmF
13607 - psllq xmmG,xmmD
13608 - psllq xmmC,xmmD
13609 - por xmmA,xmmG
13610 - por xmmE,xmmC
13611 -.adj0: ; ----------------
13612 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA
13613 + ; Store the lower 8 bytes of xmmA to the output when it has enough
13614 + ; space.
13615 + cmp ecx, byte SIZEOF_MMWORD
13616 + jb short .column_st7
13617 + movq XMM_MMWORD [edi], xmmA
13618 + add edi, byte SIZEOF_MMWORD
13619 + sub ecx, byte SIZEOF_MMWORD
13620 + psrldq xmmA, SIZEOF_MMWORD
13621 +.column_st7:
13622 + ; Store the lower 4 bytes of xmmA to the output when it has enough
13623 + ; space.
13624 + cmp ecx, byte SIZEOF_DWORD
13625 + jb short .column_st3
13626 + movd XMM_DWORD [edi], xmmA
13627 + add edi, byte SIZEOF_DWORD
13628 + sub ecx, byte SIZEOF_DWORD
13629 + psrldq xmmA, SIZEOF_DWORD
13630 +.column_st3:
13631 + ; Store the lower 2 bytes of eax to the output when it has enough
13632 + ; space.
13633 + movd eax, xmmA
13634 + cmp ecx, byte SIZEOF_WORD
13635 + jb short .column_st1
13636 + mov WORD [edi], ax
13637 + add edi, byte SIZEOF_WORD
13638 + sub ecx, byte SIZEOF_WORD
13639 + shr eax, 16
13640 +.column_st1:
13641 + ; Store the lower 1 byte of eax to the output when it has enough
13642 + ; space.
13643 + test ecx, ecx
13644 + jz short .endcolumn
13645 + mov BYTE [edi], al
13646
13647 %else ; RGB_PIXELSIZE == 4 ; -----------
13648
13649 @@ -392,19 +379,14 @@
13650 movntdq XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD
13651 movntdq XMMWORD [edi+2*SIZEOF_XMMWORD], xmmC
13652 movntdq XMMWORD [edi+3*SIZEOF_XMMWORD], xmmH
13653 - add edi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr
13654 jmp short .out0
13655 .out1: ; --(unaligned)-----------------
13656 - pcmpeqb xmmE,xmmE ; xmmE=(all 1's)
13657 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA
13658 - add edi, byte SIZEOF_XMMWORD ; outptr
13659 - maskmovdqu xmmD,xmmE ; movntdqu XMMWORD [edi], xmmD
13660 - add edi, byte SIZEOF_XMMWORD ; outptr
13661 - maskmovdqu xmmC,xmmE ; movntdqu XMMWORD [edi], xmmC
13662 - add edi, byte SIZEOF_XMMWORD ; outptr
13663 - maskmovdqu xmmH,xmmE ; movntdqu XMMWORD [edi], xmmH
13664 - add edi, byte SIZEOF_XMMWORD ; outptr
13665 + movdqu XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA
13666 + movdqu XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD
13667 + movdqu XMMWORD [edi+2*SIZEOF_XMMWORD], xmmC
13668 + movdqu XMMWORD [edi+3*SIZEOF_XMMWORD], xmmH
13669 .out0:
13670 + add edi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr
13671 sub ecx, byte SIZEOF_XMMWORD
13672 jz near .endcolumn
13673
13674 @@ -418,13 +400,11 @@
13675 alignx 16,7
13676
13677 .column_st32:
13678 - pcmpeqb xmmE,xmmE ; xmmE=(all 1's)
13679 cmp ecx, byte SIZEOF_XMMWORD/2
13680 jb short .column_st16
13681 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA
13682 - add edi, byte SIZEOF_XMMWORD ; outptr
13683 - maskmovdqu xmmD,xmmE ; movntdqu XMMWORD [edi], xmmD
13684 - add edi, byte SIZEOF_XMMWORD ; outptr
13685 + movdqu XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA
13686 + movdqu XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD
13687 + add edi, byte 2*SIZEOF_XMMWORD ; outptr
13688 movdqa xmmA,xmmC
13689 movdqa xmmD,xmmH
13690 sub ecx, byte SIZEOF_XMMWORD/2
13691 @@ -431,50 +411,25 @@
13692 .column_st16:
13693 cmp ecx, byte SIZEOF_XMMWORD/4
13694 jb short .column_st15
13695 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA
13696 + movdqu XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA
13697 add edi, byte SIZEOF_XMMWORD ; outptr
13698 movdqa xmmA,xmmD
13699 sub ecx, byte SIZEOF_XMMWORD/4
13700 .column_st15:
13701 - cmp ecx, byte SIZEOF_XMMWORD/16
13702 - jb short .endcolumn
13703 - mov eax,ecx
13704 - xor ecx, byte 0x03
13705 - inc ecx
13706 - shl ecx, 4
13707 - movd xmmF,ecx
13708 - psrlq xmmE,xmmF
13709 - punpcklbw xmmE,xmmE
13710 - ; ----------------
13711 - mov ecx,edi
13712 - and ecx, byte SIZEOF_XMMWORD-1
13713 - jz short .adj0
13714 - lea eax, [ecx+eax*4] ; RGB_PIXELSIZE
13715 - cmp eax, byte SIZEOF_XMMWORD
13716 - ja short .adj0
13717 - and edi, byte (-SIZEOF_XMMWORD) ; align to 16-byte boundary
13718 - shl ecx, 3 ; pslldq xmmA,ecx & pslldq xmmE,ecx
13719 - movdqa xmmB,xmmA
13720 - movdqa xmmG,xmmE
13721 - pslldq xmmA, SIZEOF_XMMWORD/2
13722 - pslldq xmmE, SIZEOF_XMMWORD/2
13723 - movd xmmC,ecx
13724 - sub ecx, byte (SIZEOF_XMMWORD/2)*BYTE_BIT
13725 - jb short .adj1
13726 - movd xmmH,ecx
13727 - psllq xmmA,xmmH
13728 - psllq xmmE,xmmH
13729 - jmp short .adj0
13730 -.adj1: neg ecx
13731 - movd xmmH,ecx
13732 - psrlq xmmA,xmmH
13733 - psrlq xmmE,xmmH
13734 - psllq xmmB,xmmC
13735 - psllq xmmG,xmmC
13736 - por xmmA,xmmB
13737 - por xmmE,xmmG
13738 -.adj0: ; ----------------
13739 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA
13740 + ; Store two pixels (8 bytes) of xmmA to the output when it has enough
13741 + ; space.
13742 + cmp ecx, byte SIZEOF_XMMWORD/8
13743 + jb short .column_st7
13744 + movq XMM_MMWORD [edi], xmmA
13745 + add edi, byte SIZEOF_XMMWORD/8*4
13746 + sub ecx, byte SIZEOF_XMMWORD/8
13747 + psrldq xmmA, SIZEOF_XMMWORD/8*4
13748 +.column_st7:
13749 + ; Store one pixel (4 bytes) of xmmA to the output when it has enough
13750 + ; space.
13751 + test ecx, ecx
13752 + jz short .endcolumn
13753 + movd XMM_DWORD [edi], xmmA
13754
13755 %endif ; RGB_PIXELSIZE ; ---------------
13756
13757 @@ -509,7 +464,7 @@
13758 %define output_buf(b) (b)+20 ; JSAMPARRAY output_buf
13759
13760 align 16
13761 - global EXTN(jsimd_h2v2_merged_upsample_sse2)
13762 + global EXTN(jsimd_h2v2_merged_upsample_sse2) PRIVATE
13763
13764 EXTN(jsimd_h2v2_merged_upsample_sse2):
13765 push ebp
13766 @@ -559,3 +514,6 @@
13767 pop ebp
13768 ret
13769
13770 +; For some reason, the OS X linker does not honor the request to align the
13771 +; segment unless we do this.
13772 + align 16
13773 Index: simd/jdsammmx.asm
13774 ===================================================================
13775 --- simd/jdsammmx.asm (revision 829)
13776 +++ simd/jdsammmx.asm (working copy)
13777 @@ -22,7 +22,7 @@
13778 SECTION SEG_CONST
13779
13780 alignz 16
13781 -» global» EXTN(jconst_fancy_upsample_mmx)
13782 +» global» EXTN(jconst_fancy_upsample_mmx) PRIVATE
13783
13784 EXTN(jconst_fancy_upsample_mmx):
13785
13786 @@ -58,7 +58,7 @@
13787 %define output_data_ptr(b)» (b)+20» » ; JSAMPARRAY * output_data_ptr
13788
13789 » align» 16
13790 -» global» EXTN(jsimd_h2v1_fancy_upsample_mmx)
13791 +» global» EXTN(jsimd_h2v1_fancy_upsample_mmx) PRIVATE
13792
13793 EXTN(jsimd_h2v1_fancy_upsample_mmx):
13794 push ebp
13795 @@ -216,7 +216,7 @@
13796 %define gotptr»» wk(0)-SIZEOF_POINTER» ; void * gotptr
13797
13798 » align» 16
13799 -» global» EXTN(jsimd_h2v2_fancy_upsample_mmx)
13800 +» global» EXTN(jsimd_h2v2_fancy_upsample_mmx) PRIVATE
13801
13802 EXTN(jsimd_h2v2_fancy_upsample_mmx):
13803 push ebp
13804 @@ -542,7 +542,7 @@
13805 %define output_data_ptr(b)» (b)+20» » ; JSAMPARRAY * output_data_ptr
13806
13807 » align» 16
13808 -» global» EXTN(jsimd_h2v1_upsample_mmx)
13809 +» global» EXTN(jsimd_h2v1_upsample_mmx) PRIVATE
13810
13811 EXTN(jsimd_h2v1_upsample_mmx):
13812 push ebp
13813 @@ -643,7 +643,7 @@
13814 %define output_data_ptr(b)» (b)+20» » ; JSAMPARRAY * output_data_ptr
13815
13816 » align» 16
13817 -» global» EXTN(jsimd_h2v2_upsample_mmx)
13818 +» global» EXTN(jsimd_h2v2_upsample_mmx) PRIVATE
13819
13820 EXTN(jsimd_h2v2_upsample_mmx):
13821 push ebp
13822 @@ -732,3 +732,6 @@
13823 » pop» ebp
13824 » ret
13825
13826 +; For some reason, the OS X linker does not honor the request to align the
13827 +; segment unless we do this.
13828 +» align» 16
13829 Index: simd/jdsamss2-64.asm
13830 ===================================================================
13831 --- simd/jdsamss2-64.asm (revision 829)
13832 +++ simd/jdsamss2-64.asm (working copy)
13833 @@ -1,5 +1,5 @@
13834 ;
13835 -; jdsamss2.asm - upsampling (64-bit SSE2)
13836 +; jdsamss2-64.asm - upsampling (64-bit SSE2)
13837 ;
13838 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
13839 ; Copyright 2009 D. R. Commander
13840 @@ -23,7 +23,7 @@
13841 SECTION SEG_CONST
13842
13843 alignz 16
13844 - global EXTN(jconst_fancy_upsample_sse2)
13845 + global EXTN(jconst_fancy_upsample_sse2) PRIVATE
13846
13847 EXTN(jconst_fancy_upsample_sse2):
13848
13849 @@ -59,10 +59,11 @@
13850 ; r13 = JSAMPARRAY * output_data_ptr
13851
13852 align 16
13853 - global EXTN(jsimd_h2v1_fancy_upsample_sse2)
13854 + global EXTN(jsimd_h2v1_fancy_upsample_sse2) PRIVATE
13855
13856 EXTN(jsimd_h2v1_fancy_upsample_sse2):
13857 push rbp
13858 +» mov» rax,rsp
13859 » mov» rbp,rsp
13860 » collect_args
13861
13862 @@ -200,7 +201,7 @@
13863 %define WK_NUM 4
13864
13865 align 16
13866 - global EXTN(jsimd_h2v2_fancy_upsample_sse2)
13867 + global EXTN(jsimd_h2v2_fancy_upsample_sse2) PRIVATE
13868
13869 EXTN(jsimd_h2v2_fancy_upsample_sse2):
13870 push rbp
13871 @@ -210,8 +211,8 @@
13872 » mov» [rsp],rax
13873 » mov» rbp,rsp»» » » ; rbp = aligned rbp
13874 » lea» rsp, [wk(0)]
13875 +» collect_args
13876 » push» rbx
13877 -» collect_args
13878
13879 » mov» rax, r11 ; colctr
13880 » test» rax,rax
13881 @@ -472,8 +473,8 @@
13882 » jg» near .rowloop
13883
13884 .return:
13885 +» pop» rbx
13886 » uncollect_args
13887 -» pop» rbx
13888 » mov» rsp,rbp»» ; rsp <- aligned rbp
13889 » pop» rsp» » ; rsp <- original rbp
13890 » pop» rbp
13891 @@ -497,10 +498,11 @@
13892 ; r13 = JSAMPARRAY * output_data_ptr
13893
13894 align 16
13895 - global EXTN(jsimd_h2v1_upsample_sse2)
13896 + global EXTN(jsimd_h2v1_upsample_sse2) PRIVATE
13897
13898 EXTN(jsimd_h2v1_upsample_sse2):
13899 push rbp
13900 +» mov» rax,rsp
13901 » mov» rbp,rsp
13902 » collect_args
13903
13904 @@ -585,13 +587,14 @@
13905 ; r13 = JSAMPARRAY * output_data_ptr
13906
13907 align 16
13908 - global EXTN(jsimd_h2v2_upsample_sse2)
13909 + global EXTN(jsimd_h2v2_upsample_sse2) PRIVATE
13910
13911 EXTN(jsimd_h2v2_upsample_sse2):
13912 push rbp
13913 +» mov» rax,rsp
13914 » mov» rbp,rsp
13915 +» collect_args
13916 » push» rbx
13917 -» collect_args
13918
13919 » mov» rdx, r11
13920 » add» rdx, byte (2*SIZEOF_XMMWORD)-1
13921 @@ -658,7 +661,11 @@
13922 » jg» near .rowloop
13923
13924 .return:
13925 +» pop» rbx
13926 » uncollect_args
13927 -» pop» rbx
13928 » pop» rbp
13929 » ret
13930 +
13931 +; For some reason, the OS X linker does not honor the request to align the
13932 +; segment unless we do this.
13933 +» align» 16
13934 Index: simd/jdsamss2.asm
13935 ===================================================================
13936 --- simd/jdsamss2.asm» (revision 829)
13937 +++ simd/jdsamss2.asm» (working copy)
13938 @@ -22,7 +22,7 @@
13939 » SECTION»SEG_CONST
13940
13941 » alignz» 16
13942 -» global» EXTN(jconst_fancy_upsample_sse2)
13943 +» global» EXTN(jconst_fancy_upsample_sse2) PRIVATE
13944
13945 EXTN(jconst_fancy_upsample_sse2):
13946
13947 @@ -58,7 +58,7 @@
13948 %define output_data_ptr(b)» (b)+20» » ; JSAMPARRAY * output_data_ptr
13949
13950 » align» 16
13951 -» global» EXTN(jsimd_h2v1_fancy_upsample_sse2)
13952 +» global» EXTN(jsimd_h2v1_fancy_upsample_sse2) PRIVATE
13953
13954 EXTN(jsimd_h2v1_fancy_upsample_sse2):
13955 » push» ebp
13956 @@ -214,7 +214,7 @@
13957 %define gotptr»» wk(0)-SIZEOF_POINTER» ; void * gotptr
13958
13959 » align» 16
13960 -» global» EXTN(jsimd_h2v2_fancy_upsample_sse2)
13961 +» global» EXTN(jsimd_h2v2_fancy_upsample_sse2) PRIVATE
13962
13963 EXTN(jsimd_h2v2_fancy_upsample_sse2):
13964 » push» ebp
13965 @@ -538,7 +538,7 @@
13966 %define output_data_ptr(b)» (b)+20» » ; JSAMPARRAY * output_data_ptr
13967
13968 » align» 16
13969 -» global» EXTN(jsimd_h2v1_upsample_sse2)
13970 +» global» EXTN(jsimd_h2v1_upsample_sse2) PRIVATE
13971
13972 EXTN(jsimd_h2v1_upsample_sse2):
13973 » push» ebp
13974 @@ -637,7 +637,7 @@
13975 %define output_data_ptr(b)» (b)+20» » ; JSAMPARRAY * output_data_ptr
13976
13977 » align» 16
13978 -» global» EXTN(jsimd_h2v2_upsample_sse2)
13979 +» global» EXTN(jsimd_h2v2_upsample_sse2) PRIVATE
13980
13981 EXTN(jsimd_h2v2_upsample_sse2):
13982 » push» ebp
13983 @@ -724,3 +724,6 @@
13984 » pop» ebp
13985 » ret
13986
13987 +; For some reason, the OS X linker does not honor the request to align the
13988 +; segment unless we do this.
13989 +» align» 16
13990 Index: simd/jf3dnflt.asm
13991 ===================================================================
13992 --- simd/jf3dnflt.asm» (revision 829)
13993 +++ simd/jf3dnflt.asm» (working copy)
13994 @@ -27,7 +27,7 @@
13995 » SECTION»SEG_CONST
13996
13997 » alignz» 16
13998 -» global» EXTN(jconst_fdct_float_3dnow)
13999 +» global» EXTN(jconst_fdct_float_3dnow) PRIVATE
14000
14001 EXTN(jconst_fdct_float_3dnow):
14002
14003 @@ -55,7 +55,7 @@
14004 %define WK_NUM»» 2
14005
14006 » align» 16
14007 -» global» EXTN(jsimd_fdct_float_3dnow)
14008 +» global» EXTN(jsimd_fdct_float_3dnow) PRIVATE
14009
14010 EXTN(jsimd_fdct_float_3dnow):
14011 » push» ebp
14012 @@ -315,3 +315,6 @@
14013 » pop» ebp
14014 » ret
14015
14016 +; For some reason, the OS X linker does not honor the request to align the
14017 +; segment unless we do this.
14018 +» align» 16
14019 Index: simd/jfmmxfst.asm
14020 ===================================================================
14021 --- simd/jfmmxfst.asm» (revision 829)
14022 +++ simd/jfmmxfst.asm» (working copy)
14023 @@ -52,7 +52,7 @@
14024 %define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS)
14025
14026 » alignz» 16
14027 -» global» EXTN(jconst_fdct_ifast_mmx)
14028 +» global» EXTN(jconst_fdct_ifast_mmx) PRIVATE
14029
14030 EXTN(jconst_fdct_ifast_mmx):
14031
14032 @@ -80,7 +80,7 @@
14033 %define WK_NUM»» 2
14034
14035 » align» 16
14036 -» global» EXTN(jsimd_fdct_ifast_mmx)
14037 +» global» EXTN(jsimd_fdct_ifast_mmx) PRIVATE
14038
14039 EXTN(jsimd_fdct_ifast_mmx):
14040 » push» ebp
14041 @@ -392,3 +392,6 @@
14042 » pop» ebp
14043 » ret
14044
14045 +; For some reason, the OS X linker does not honor the request to align the
14046 +; segment unless we do this.
14047 +» align» 16
14048 Index: simd/jfmmxint.asm
14049 ===================================================================
14050 --- simd/jfmmxint.asm» (revision 829)
14051 +++ simd/jfmmxint.asm» (working copy)
14052 @@ -66,7 +66,7 @@
14053 » SECTION»SEG_CONST
14054
14055 » alignz» 16
14056 -» global» EXTN(jconst_fdct_islow_mmx)
14057 +» global» EXTN(jconst_fdct_islow_mmx) PRIVATE
14058
14059 EXTN(jconst_fdct_islow_mmx):
14060
14061 @@ -101,7 +101,7 @@
14062 %define WK_NUM»» 2
14063
14064 » align» 16
14065 -» global» EXTN(jsimd_fdct_islow_mmx)
14066 +» global» EXTN(jsimd_fdct_islow_mmx) PRIVATE
14067
14068 EXTN(jsimd_fdct_islow_mmx):
14069 » push» ebp
14070 @@ -617,3 +617,6 @@
14071 » pop» ebp
14072 » ret
14073
14074 +; For some reason, the OS X linker does not honor the request to align the
14075 +; segment unless we do this.
14076 +» align» 16
14077 Index: simd/jfss2fst-64.asm
14078 ===================================================================
14079 --- simd/jfss2fst-64.asm» (revision 829)
14080 +++ simd/jfss2fst-64.asm» (working copy)
14081 @@ -1,5 +1,5 @@
14082 ;
14083 -; jfss2fst.asm - fast integer FDCT (64-bit SSE2)
14084 +; jfss2fst-64.asm - fast integer FDCT (64-bit SSE2)
14085 ;
14086 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
14087 ; Copyright 2009 D. R. Commander
14088 @@ -53,7 +53,7 @@
14089 %define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS)
14090
14091 » alignz» 16
14092 -» global» EXTN(jconst_fdct_ifast_sse2)
14093 +» global» EXTN(jconst_fdct_ifast_sse2) PRIVATE
14094
14095 EXTN(jconst_fdct_ifast_sse2):
14096
14097 @@ -80,7 +80,7 @@
14098 %define WK_NUM»» 2
14099
14100 » align» 16
14101 -» global» EXTN(jsimd_fdct_ifast_sse2)
14102 +» global» EXTN(jsimd_fdct_ifast_sse2) PRIVATE
14103
14104 EXTN(jsimd_fdct_ifast_sse2):
14105 » push» rbp
14106 @@ -386,3 +386,7 @@
14107 » pop» rsp» » ; rsp <- original rbp
14108 » pop» rbp
14109 » ret
1245 + 14110 +
1246 +-#if defined(__linux__) || defined(ANDROID) || defined(__ANDROID__) 14111 +; For some reason, the OS X linker does not honor the request to align the
1247 ++#if !defined(__ARM_NEON__) && (defined(__linux__) || defined(ANDROID) || defined(__ANDROID__)) 14112 +; segment unless we do this.
14113 +» align» 16
14114 Index: simd/jfss2fst.asm
14115 ===================================================================
14116 --- simd/jfss2fst.asm» (revision 829)
14117 +++ simd/jfss2fst.asm» (working copy)
14118 @@ -52,7 +52,7 @@
14119 %define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS)
14120
14121 » alignz» 16
14122 -» global» EXTN(jconst_fdct_ifast_sse2)
14123 +» global» EXTN(jconst_fdct_ifast_sse2) PRIVATE
14124
14125 EXTN(jconst_fdct_ifast_sse2):
14126
14127 @@ -80,7 +80,7 @@
14128 %define WK_NUM»» 2
14129
14130 » align» 16
14131 -» global» EXTN(jsimd_fdct_ifast_sse2)
14132 +» global» EXTN(jsimd_fdct_ifast_sse2) PRIVATE
14133
14134 EXTN(jsimd_fdct_ifast_sse2):
14135 » push» ebp
14136 @@ -399,3 +399,6 @@
14137 » pop» ebp
14138 » ret
14139
14140 +; For some reason, the OS X linker does not honor the request to align the
14141 +; segment unless we do this.
14142 +» align» 16
14143 Index: simd/jfss2int-64.asm
14144 ===================================================================
14145 --- simd/jfss2int-64.asm» (revision 829)
14146 +++ simd/jfss2int-64.asm» (working copy)
14147 @@ -1,5 +1,5 @@
14148 ;
14149 -; jfss2int.asm - accurate integer FDCT (64-bit SSE2)
14150 +; jfss2int-64.asm - accurate integer FDCT (64-bit SSE2)
14151 ;
14152 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
14153 ; Copyright 2009 D. R. Commander
14154 @@ -67,7 +67,7 @@
14155 » SECTION»SEG_CONST
14156
14157 » alignz» 16
14158 -» global» EXTN(jconst_fdct_islow_sse2)
14159 +» global» EXTN(jconst_fdct_islow_sse2) PRIVATE
14160
14161 EXTN(jconst_fdct_islow_sse2):
14162
14163 @@ -101,7 +101,7 @@
14164 %define WK_NUM»» 6
14165
14166 » align» 16
14167 -» global» EXTN(jsimd_fdct_islow_sse2)
14168 +» global» EXTN(jsimd_fdct_islow_sse2) PRIVATE
14169
14170 EXTN(jsimd_fdct_islow_sse2):
14171 » push» rbp
14172 @@ -616,3 +616,7 @@
14173 » pop» rsp» » ; rsp <- original rbp
14174 » pop» rbp
14175 » ret
1248 + 14176 +
1249 + #define SOMEWHAT_SANE_PROC_CPUINFO_SIZE_LIMIT (1024 * 1024) 14177 +; For some reason, the OS X linker does not honor the request to align the
1250 + 14178 +; segment unless we do this.
1251 +@@ -100,6 +100,6 @@ 14179 +» align» 16
1252 + init_simd (void)
1253 + {
1254 + char *env = NULL;
1255 +-#if !defined(__ARM_NEON__) && defined(__linux__) || defined(ANDROID) || defined(__ANDROID__)
1256 ++#if !defined(__ARM_NEON__) && (defined(__linux__) || defined(ANDROID) || defined(__ANDROID__))
1257 + int bufsize = 1024; /* an initial guess for the line buffer size limit */
1258 + #endif
1259 +
1260 Index: simd/jsimd_arm_neon.S
1261 ===================================================================
1262 --- simd/jsimd_arm_neon.S» (revision 272637)
1263 +++ simd/jsimd_arm_neon.S» (working copy)
1264 @@ -41,11 +41,9 @@
1265 /* Supplementary macro for setting function attributes */
1266 .macro asm_function fname
1267 #ifdef __APPLE__
1268 - .func _\fname
1269 .globl _\fname
1270 _\fname:
1271 #else
1272 - .func \fname
1273 .global \fname
1274 #ifdef __ELF__
1275 .hidden \fname
1276 @@ -670,7 +668,6 @@
1277 .unreq ROW6R
1278 .unreq ROW7L
1279 .unreq ROW7R
1280 -.endfunc
1281
1282
1283 /*****************************************************************************/
1284 @@ -895,7 +892,6 @@
1285 .unreq TMP2
1286 .unreq TMP3
1287 .unreq TMP4
1288 -.endfunc
1289
1290
1291 /*****************************************************************************/
1292 @@ -1108,7 +1104,6 @@
1293 .unreq TMP2
1294 .unreq TMP3
1295 .unreq TMP4
1296 -.endfunc
1297
1298 .purgem idct_helper
1299
1300 @@ -1263,7 +1258,6 @@
1301 .unreq OUTPUT_COL
1302 .unreq TMP1
1303 .unreq TMP2
1304 -.endfunc
1305
1306 .purgem idct_helper
1307
1308 @@ -1547,7 +1541,6 @@
1309 .unreq U
1310 .unreq V
1311 .unreq N
1312 -.endfunc
1313
1314 .purgem do_yuv_to_rgb
1315 .purgem do_yuv_to_rgb_stage1
1316 @@ -1858,7 +1851,6 @@
1317 .unreq U
1318 .unreq V
1319 .unreq N
1320 -.endfunc
1321
1322 .purgem do_rgb_to_yuv
1323 .purgem do_rgb_to_yuv_stage1
1324 @@ -1940,7 +1932,6 @@
1325 .unreq TMP2
1326 .unreq TMP3
1327 .unreq TMP4
1328 -.endfunc
1329
1330
1331 /*****************************************************************************/
1332 @@ -2064,7 +2055,6 @@
1333
1334 .unreq DATA
1335 .unreq TMP
1336 -.endfunc
1337
1338
1339 /*****************************************************************************/
1340 @@ -2166,7 +2156,6 @@
1341 .unreq CORRECTION
1342 .unreq SHIFT
1343 .unreq LOOP_COUNT
1344 -.endfunc
1345
1346
1347 /*****************************************************************************/
1348 @@ -2401,7 +2390,6 @@
1349 .unreq WIDTH
1350 .unreq TMP
1351
1352 -.endfunc
1353
1354 .purgem upsample16
1355 .purgem upsample32
1356 Index: simd/jsimd_i386.c
1357 ===================================================================
1358 --- simd/jsimd_i386.c» (revision 829)
1359 +++ simd/jsimd_i386.c» (working copy)
1360 @@ -61,6 +61,7 @@
1361 simd_support &= JSIMD_SSE2;
1362 }
1363
1364 +#ifndef JPEG_DECODE_ONLY
1365 GLOBAL(int)
1366 jsimd_can_rgb_ycc (void)
1367 {
1368 @@ -82,6 +83,7 @@
1369
1370 return 0;
1371 }
1372 +#endif
1373
1374 GLOBAL(int)
1375 jsimd_can_rgb_gray (void)
1376 @@ -127,6 +129,7 @@
1377 return 0;
1378 }
1379
1380 +#ifndef JPEG_DECODE_ONLY
1381 GLOBAL(void)
1382 jsimd_rgb_ycc_convert (j_compress_ptr cinfo,
1383 JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
1384 @@ -179,6 +182,7 @@
1385 mmxfct(cinfo->image_width, input_buf,
1386 output_buf, output_row, num_rows);
1387 }
1388 +#endif
1389
1390 GLOBAL(void)
1391 jsimd_rgb_gray_convert (j_compress_ptr cinfo,
1392 @@ -286,6 +290,7 @@
1393 input_row, output_buf, num_rows);
1394 }
1395
1396 +#ifndef JPEG_DECODE_ONLY
1397 GLOBAL(int)
1398 jsimd_can_h2v2_downsample (void)
1399 {
1400 @@ -351,6 +356,7 @@
1401 compptr->v_samp_factor, compptr->width_in_blocks,
1402 input_data, output_data);
1403 }
1404 +#endif
1405
1406 GLOBAL(int)
1407 jsimd_can_h2v2_upsample (void)
1408 @@ -636,6 +642,7 @@
1409 in_row_group_ctr, output_buf);
1410 }
1411
1412 +#ifndef JPEG_DECODE_ONLY
1413 GLOBAL(int)
1414 jsimd_can_convsamp (void)
1415 {
1416 @@ -855,6 +862,7 @@
1417 else if (simd_support & JSIMD_3DNOW)
1418 jsimd_quantize_float_3dnow(coef_block, divisors, workspace);
1419 }
1420 +#endif
1421
1422 GLOBAL(int)
1423 jsimd_can_idct_2x2 (void)
1424 @@ -1045,4 +1053,3 @@
1425 jsimd_idct_float_3dnow(compptr->dct_table, coef_block,
1426 output_buf, output_col);
1427 }
1428 -
1429 Index: simd/jcqnts2f-64.asm
1430 ===================================================================
1431 --- simd/jcqnts2f-64.asm» (revision 829)
1432 +++ simd/jcqnts2f-64.asm» (working copy)
1433 @@ -36,7 +36,7 @@
1434 ; r12 = FAST_FLOAT * workspace
1435
1436 » align» 16
1437 -» global» EXTN(jsimd_convsamp_float_sse2)
1438 +» global» EXTN(jsimd_convsamp_float_sse2) PRIVATE
1439
1440 EXTN(jsimd_convsamp_float_sse2):
1441 » push» rbp
1442 @@ -110,7 +110,7 @@
1443 ; r12 = FAST_FLOAT * workspace
1444
1445 » align» 16
1446 -» global» EXTN(jsimd_quantize_float_sse2)
1447 +» global» EXTN(jsimd_quantize_float_sse2) PRIVATE
1448
1449 EXTN(jsimd_quantize_float_sse2):
1450 » push» rbp
1451 Index: simd/jcqnt3dn.asm
1452 ===================================================================
1453 --- simd/jcqnt3dn.asm» (revision 829)
1454 +++ simd/jcqnt3dn.asm» (working copy)
1455 @@ -35,7 +35,7 @@
1456 %define workspace» ebp+16» » ; FAST_FLOAT * workspace
1457
1458 » align» 16
1459 -» global» EXTN(jsimd_convsamp_float_3dnow)
1460 +» global» EXTN(jsimd_convsamp_float_3dnow) PRIVATE
1461
1462 EXTN(jsimd_convsamp_float_3dnow):
1463 » push» ebp
1464 @@ -138,7 +138,7 @@
1465 %define workspace» ebp+16» » ; FAST_FLOAT * workspace
1466
1467 » align» 16
1468 -» global» EXTN(jsimd_quantize_float_3dnow)
1469 +» global» EXTN(jsimd_quantize_float_3dnow) PRIVATE
1470
1471 EXTN(jsimd_quantize_float_3dnow):
1472 » push» ebp
1473 Index: simd/jcsamss2.asm
1474 ===================================================================
1475 --- simd/jcsamss2.asm» (revision 829)
1476 +++ simd/jcsamss2.asm» (working copy)
1477 @@ -40,7 +40,7 @@
1478 %define output_data(b)»(b)+28» » ; JSAMPARRAY output_data
1479
1480 » align» 16
1481 -» global» EXTN(jsimd_h2v1_downsample_sse2)
1482 +» global» EXTN(jsimd_h2v1_downsample_sse2) PRIVATE
1483
1484 EXTN(jsimd_h2v1_downsample_sse2):
1485 » push» ebp
1486 @@ -195,7 +195,7 @@
1487 %define output_data(b)»(b)+28» ; JSAMPARRAY output_data
1488
1489 » align» 16
1490 -» global» EXTN(jsimd_h2v2_downsample_sse2)
1491 +» global» EXTN(jsimd_h2v2_downsample_sse2) PRIVATE
1492
1493 EXTN(jsimd_h2v2_downsample_sse2):
1494 » push» ebp
1495 Index: simd/jsimd_x86_64.c
1496 ===================================================================
1497 --- simd/jsimd_x86_64.c»(revision 829)
1498 +++ simd/jsimd_x86_64.c»(working copy)
1499 @@ -29,6 +29,7 @@
1500
1501 #define IS_ALIGNED_SSE(ptr) (IS_ALIGNED(ptr, 4)) /* 16 byte alignment */
1502
1503 +#ifndef JPEG_DECODE_ONLY
1504 GLOBAL(int)
1505 jsimd_can_rgb_ycc (void)
1506 {
1507 @@ -45,6 +46,7 @@
1508
1509 return 1;
1510 }
1511 +#endif
1512
1513 GLOBAL(int)
1514 jsimd_can_rgb_gray (void)
1515 @@ -80,6 +82,7 @@
1516 return 1;
1517 }
1518
1519 +#ifndef JPEG_DECODE_ONLY
1520 GLOBAL(void)
1521 jsimd_rgb_ycc_convert (j_compress_ptr cinfo,
1522 JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
1523 @@ -118,6 +121,7 @@
1524
1525 sse2fct(cinfo->image_width, input_buf, output_buf, output_row, num_rows);
1526 }
1527 +#endif
1528
1529 GLOBAL(void)
1530 jsimd_rgb_gray_convert (j_compress_ptr cinfo,
1531 @@ -197,6 +201,7 @@
1532 sse2fct(cinfo->output_width, input_buf, input_row, output_buf, num_rows);
1533 }
1534
1535 +#ifndef JPEG_DECODE_ONLY
1536 GLOBAL(int)
1537 jsimd_can_h2v2_downsample (void)
1538 {
1539 @@ -242,6 +247,7 @@
1540 compptr->width_in_blocks,
1541 input_data, output_data);
1542 }
1543 +#endif
1544
1545 GLOBAL(int)
1546 jsimd_can_h2v2_upsample (void)
1547 @@ -451,6 +457,7 @@
1548 sse2fct(cinfo->output_width, input_buf, in_row_group_ctr, output_buf);
1549 }
1550
1551 +#ifndef JPEG_DECODE_ONLY
1552 GLOBAL(int)
1553 jsimd_can_convsamp (void)
1554 {
1555 @@ -601,6 +608,7 @@
1556 {
1557 jsimd_quantize_float_sse2(coef_block, divisors, workspace);
1558 }
1559 +#endif
1560
1561 GLOBAL(int)
1562 jsimd_can_idct_2x2 (void)
1563 @@ -750,4 +758,3 @@
1564 jsimd_idct_float_sse2(compptr->dct_table, coef_block,
1565 output_buf, output_col);
1566 }
1567 -
1568 Index: simd/jimmxint.asm
1569 ===================================================================
1570 --- simd/jimmxint.asm» (revision 829)
1571 +++ simd/jimmxint.asm» (working copy)
1572 @@ -66,7 +66,7 @@
1573 » SECTION»SEG_CONST
1574
1575 » alignz» 16
1576 -» global» EXTN(jconst_idct_islow_mmx)
1577 +» global» EXTN(jconst_idct_islow_mmx) PRIVATE
1578
1579 EXTN(jconst_idct_islow_mmx):
1580
1581 @@ -107,7 +107,7 @@
1582 » » » » » ; JCOEF workspace[DCTSIZE2]
1583
1584 » align» 16
1585 -» global» EXTN(jsimd_idct_islow_mmx)
1586 +» global» EXTN(jsimd_idct_islow_mmx) PRIVATE
1587
1588 EXTN(jsimd_idct_islow_mmx):
1589 » push» ebp
1590 Index: simd/jcgrymmx.asm
1591 ===================================================================
1592 --- simd/jcgrymmx.asm» (revision 829)
1593 +++ simd/jcgrymmx.asm» (working copy)
1594 @@ -41,7 +41,7 @@
1595 %define gotptr»» wk(0)-SIZEOF_POINTER» ; void * gotptr
1596
1597 » align» 16
1598 -» global» EXTN(jsimd_rgb_gray_convert_mmx)
1599 +» global» EXTN(jsimd_rgb_gray_convert_mmx) PRIVATE
1600
1601 EXTN(jsimd_rgb_gray_convert_mmx):
1602 » push» ebp
1603 Index: simd/jfss2int.asm 14180 Index: simd/jfss2int.asm
1604 =================================================================== 14181 ===================================================================
1605 --- simd/jfss2int.asm (revision 829) 14182 --- simd/jfss2int.asm (revision 829)
1606 +++ simd/jfss2int.asm (working copy) 14183 +++ simd/jfss2int.asm (working copy)
1607 @@ -66,7 +66,7 @@ 14184 @@ -66,7 +66,7 @@
1608 SECTION SEG_CONST 14185 SECTION SEG_CONST
1609 14186
1610 alignz 16 14187 alignz 16
1611 - global EXTN(jconst_fdct_islow_sse2) 14188 - global EXTN(jconst_fdct_islow_sse2)
1612 + global EXTN(jconst_fdct_islow_sse2) PRIVATE 14189 + global EXTN(jconst_fdct_islow_sse2) PRIVATE
1613 14190
1614 EXTN(jconst_fdct_islow_sse2): 14191 EXTN(jconst_fdct_islow_sse2):
1615 14192
1616 @@ -101,7 +101,7 @@ 14193 @@ -101,7 +101,7 @@
1617 %define WK_NUM 6 14194 %define WK_NUM 6
1618 14195
1619 align 16 14196 align 16
1620 - global EXTN(jsimd_fdct_islow_sse2) 14197 - global EXTN(jsimd_fdct_islow_sse2)
1621 + global EXTN(jsimd_fdct_islow_sse2) PRIVATE 14198 + global EXTN(jsimd_fdct_islow_sse2) PRIVATE
1622 14199
1623 EXTN(jsimd_fdct_islow_sse2): 14200 EXTN(jsimd_fdct_islow_sse2):
1624 push ebp 14201 push ebp
1625 Index: simd/jcgryss2.asm 14202 @@ -629,3 +629,6 @@
14203 » pop» ebp
14204 » ret
14205
14206 +; For some reason, the OS X linker does not honor the request to align the
14207 +; segment unless we do this.
14208 +» align» 16
14209 Index: simd/jfsseflt-64.asm
1626 =================================================================== 14210 ===================================================================
1627 --- simd/jcgryss2.asm» (revision 829) 14211 --- simd/jfsseflt-64.asm» (revision 829)
1628 +++ simd/jcgryss2.asm» (working copy) 14212 +++ simd/jfsseflt-64.asm» (working copy)
1629 @@ -39,7 +39,7 @@ 14213 @@ -1,5 +1,5 @@
14214 ;
14215 -; jfsseflt.asm - floating-point FDCT (64-bit SSE)
14216 +; jfsseflt-64.asm - floating-point FDCT (64-bit SSE)
14217 ;
14218 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
14219 ; Copyright 2009 D. R. Commander
14220 @@ -38,7 +38,7 @@
14221 » SECTION»SEG_CONST
14222
14223 » alignz» 16
14224 -» global» EXTN(jconst_fdct_float_sse)
14225 +» global» EXTN(jconst_fdct_float_sse) PRIVATE
14226
14227 EXTN(jconst_fdct_float_sse):
14228
14229 @@ -65,7 +65,7 @@
14230 %define WK_NUM»» 2
1630 14231
1631 align 16 14232 align 16
14233 - global EXTN(jsimd_fdct_float_sse)
14234 + global EXTN(jsimd_fdct_float_sse) PRIVATE
1632 14235
1633 -» global» EXTN(jsimd_rgb_gray_convert_sse2) 14236 EXTN(jsimd_fdct_float_sse):
1634 +» global» EXTN(jsimd_rgb_gray_convert_sse2) PRIVATE 14237 » push» rbp
1635 14238 @@ -352,3 +352,7 @@
1636 EXTN(jsimd_rgb_gray_convert_sse2): 14239 » pop» rsp» » ; rsp <- original rbp
1637 » push» ebp 14240 » pop» rbp
1638 Index: simd/jccolmmx.asm 14241 » ret
14242 +
14243 +; For some reason, the OS X linker does not honor the request to align the
14244 +; segment unless we do this.
14245 +» align» 16
14246 Index: simd/jfsseflt.asm
1639 =================================================================== 14247 ===================================================================
1640 --- simd/jccolmmx.asm» (revision 829) 14248 --- simd/jfsseflt.asm» (revision 829)
1641 +++ simd/jccolmmx.asm» (working copy) 14249 +++ simd/jfsseflt.asm» (working copy)
1642 @@ -37,7 +37,7 @@ 14250 @@ -37,7 +37,7 @@
1643 SECTION SEG_CONST 14251 SECTION SEG_CONST
1644 14252
1645 alignz 16 14253 alignz 16
1646 -» global» EXTN(jconst_rgb_ycc_convert_mmx) 14254 -» global» EXTN(jconst_fdct_float_sse)
1647 +» global» EXTN(jconst_rgb_ycc_convert_mmx) PRIVATE 14255 +» global» EXTN(jconst_fdct_float_sse) PRIVATE
1648 14256
1649 EXTN(jconst_rgb_ycc_convert_mmx): 14257 EXTN(jconst_fdct_float_sse):
1650 14258
14259 @@ -65,7 +65,7 @@
14260 %define WK_NUM 2
14261
14262 align 16
14263 - global EXTN(jsimd_fdct_float_sse)
14264 + global EXTN(jsimd_fdct_float_sse) PRIVATE
14265
14266 EXTN(jsimd_fdct_float_sse):
14267 push ebp
14268 @@ -365,3 +365,6 @@
14269 pop ebp
14270 ret
14271
14272 +; For some reason, the OS X linker does not honor the request to align the
14273 +; segment unless we do this.
14274 + align 16
14275 Index: simd/ji3dnflt.asm
14276 ===================================================================
14277 --- simd/ji3dnflt.asm (revision 829)
14278 +++ simd/ji3dnflt.asm (working copy)
14279 @@ -27,7 +27,7 @@
14280 SECTION SEG_CONST
14281
14282 alignz 16
14283 - global EXTN(jconst_idct_float_3dnow)
14284 + global EXTN(jconst_idct_float_3dnow) PRIVATE
14285
14286 EXTN(jconst_idct_float_3dnow):
14287
14288 @@ -63,7 +63,7 @@
14289 ; FAST_FLOAT workspace[DCTSIZE2]
14290
14291 align 16
14292 - global EXTN(jsimd_idct_float_3dnow)
14293 + global EXTN(jsimd_idct_float_3dnow) PRIVATE
14294
14295 EXTN(jsimd_idct_float_3dnow):
14296 push ebp
14297 @@ -447,3 +447,6 @@
14298 pop ebp
14299 ret
14300
14301 +; For some reason, the OS X linker does not honor the request to align the
14302 +; segment unless we do this.
14303 + align 16
14304 Index: simd/jimmxfst.asm
14305 ===================================================================
14306 --- simd/jimmxfst.asm (revision 829)
14307 +++ simd/jimmxfst.asm (working copy)
14308 @@ -59,7 +59,7 @@
14309 %define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS)
14310
14311 alignz 16
14312 - global EXTN(jconst_idct_ifast_mmx)
14313 + global EXTN(jconst_idct_ifast_mmx) PRIVATE
14314
14315 EXTN(jconst_idct_ifast_mmx):
14316
14317 @@ -94,7 +94,7 @@
14318 ; JCOEF workspace[DCTSIZE2]
14319
14320 align 16
14321 - global EXTN(jsimd_idct_ifast_mmx)
14322 + global EXTN(jsimd_idct_ifast_mmx) PRIVATE
14323
14324 EXTN(jsimd_idct_ifast_mmx):
14325 push ebp
14326 @@ -495,3 +495,6 @@
14327 pop ebp
14328 ret
14329
14330 +; For some reason, the OS X linker does not honor the request to align the
14331 +; segment unless we do this.
14332 + align 16
14333 Index: simd/jimmxint.asm
14334 ===================================================================
14335 --- simd/jimmxint.asm (revision 829)
14336 +++ simd/jimmxint.asm (working copy)
14337 @@ -66,7 +66,7 @@
14338 SECTION SEG_CONST
14339
14340 alignz 16
14341 - global EXTN(jconst_idct_islow_mmx)
14342 + global EXTN(jconst_idct_islow_mmx) PRIVATE
14343
14344 EXTN(jconst_idct_islow_mmx):
14345
14346 @@ -107,7 +107,7 @@
14347 ; JCOEF workspace[DCTSIZE2]
14348
14349 align 16
14350 - global EXTN(jsimd_idct_islow_mmx)
14351 + global EXTN(jsimd_idct_islow_mmx) PRIVATE
14352
14353 EXTN(jsimd_idct_islow_mmx):
14354 push ebp
14355 @@ -847,3 +847,6 @@
14356 pop ebp
14357 ret
14358
14359 +; For some reason, the OS X linker does not honor the request to align the
14360 +; segment unless we do this.
14361 + align 16
1651 Index: simd/jimmxred.asm 14362 Index: simd/jimmxred.asm
1652 =================================================================== 14363 ===================================================================
1653 --- simd/jimmxred.asm (revision 829) 14364 --- simd/jimmxred.asm (revision 829)
1654 +++ simd/jimmxred.asm (working copy) 14365 +++ simd/jimmxred.asm (working copy)
1655 @@ -72,7 +72,7 @@ 14366 @@ -72,7 +72,7 @@
1656 SECTION SEG_CONST 14367 SECTION SEG_CONST
1657 14368
1658 alignz 16 14369 alignz 16
1659 - global EXTN(jconst_idct_red_mmx) 14370 - global EXTN(jconst_idct_red_mmx)
1660 + global EXTN(jconst_idct_red_mmx) PRIVATE 14371 + global EXTN(jconst_idct_red_mmx) PRIVATE
(...skipping 11 matching lines...)
1672 push ebp 14383 push ebp
1673 @@ -503,7 +503,7 @@ 14384 @@ -503,7 +503,7 @@
1674 %define output_col(b) (b)+20 ; JDIMENSION output_col 14385 %define output_col(b) (b)+20 ; JDIMENSION output_col
1675 14386
1676 align 16 14387 align 16
1677 - global EXTN(jsimd_idct_2x2_mmx) 14388 - global EXTN(jsimd_idct_2x2_mmx)
1678 + global EXTN(jsimd_idct_2x2_mmx) PRIVATE 14389 + global EXTN(jsimd_idct_2x2_mmx) PRIVATE
1679 14390
1680 EXTN(jsimd_idct_2x2_mmx): 14391 EXTN(jsimd_idct_2x2_mmx):
1681 push ebp 14392 push ebp
1682 Index: simd/jsimdext.inc 14393 @@ -701,3 +701,6 @@
14394 » pop» ebp
14395 » ret
14396
14397 +; For some reason, the OS X linker does not honor the request to align the
14398 +; segment unless we do this.
14399 +» align» 16
14400 Index: simd/jiss2flt-64.asm
1683 =================================================================== 14401 ===================================================================
1684 --- simd/jsimdext.inc» (revision 829) 14402 --- simd/jiss2flt-64.asm» (revision 829)
1685 +++ simd/jsimdext.inc» (working copy) 14403 +++ simd/jiss2flt-64.asm» (working copy)
1686 @@ -73,6 +73,9 @@ 14404 @@ -1,5 +1,5 @@
1687 ; * *BSD family Unix using elf format
1688 ; * Unix System V, including Solaris x86, UnixWare and SCO Unix
1689
1690 +; PIC is the default on Linux
1691 +%define PIC
1692 +
1693 ; mark stack as non-executable
1694 section .note.GNU-stack noalloc noexec nowrite progbits
1695
1696 @@ -375,4 +378,14 @@
1697 ; 14405 ;
1698 %include "jsimdcfg.inc" 14406 -; jiss2flt.asm - floating-point IDCT (64-bit SSE & SSE2)
1699 14407 +; jiss2flt-64.asm - floating-point IDCT (64-bit SSE & SSE2)
1700 +; Begin chromium edits 14408 ;
1701 +%ifdef MACHO ; ----(nasm -fmacho -DMACHO ...)-------- 14409 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
1702 +%define PRIVATE :private_extern 14410 ; Copyright 2009 D. R. Commander
1703 +%elifdef ELF ; ----(nasm -felf[64] -DELF ...)------------ 14411 @@ -38,7 +38,7 @@
1704 +%define PRIVATE :hidden
1705 +%else
1706 +%define PRIVATE
1707 +%endif
1708 +; End chromium edits
1709 +
1710 ; --------------------------------------------------------------------------
1711 Index: simd/jdclrmmx.asm
1712 ===================================================================
1713 --- simd/jdclrmmx.asm» (revision 829)
1714 +++ simd/jdclrmmx.asm» (working copy)
1715 @@ -40,7 +40,7 @@
1716 %define gotptr»» wk(0)-SIZEOF_POINTER» ; void * gotptr
1717
1718 » align» 16
1719 -» global» EXTN(jsimd_ycc_rgb_convert_mmx)
1720 +» global» EXTN(jsimd_ycc_rgb_convert_mmx) PRIVATE
1721
1722 EXTN(jsimd_ycc_rgb_convert_mmx):
1723 » push» ebp
1724 Index: simd/jccolss2.asm
1725 ===================================================================
1726 --- simd/jccolss2.asm» (revision 829)
1727 +++ simd/jccolss2.asm» (working copy)
1728 @@ -34,7 +34,7 @@
1729 SECTION SEG_CONST 14412 SECTION SEG_CONST
1730 14413
1731 alignz 16 14414 alignz 16
1732 -» global» EXTN(jconst_rgb_ycc_convert_sse2) 14415 -» global» EXTN(jconst_idct_float_sse2)
1733 +» global» EXTN(jconst_rgb_ycc_convert_sse2) PRIVATE 14416 +» global» EXTN(jconst_idct_float_sse2) PRIVATE
1734 14417
1735 EXTN(jconst_rgb_ycc_convert_sse2): 14418 EXTN(jconst_idct_float_sse2):
1736 14419
1737 Index: simd/jisseflt.asm 14420 @@ -74,7 +74,7 @@
14421 » » » » » ; FAST_FLOAT workspace[DCTSIZE2]
14422
14423 » align» 16
14424 -» global» EXTN(jsimd_idct_float_sse2)
14425 +» global» EXTN(jsimd_idct_float_sse2) PRIVATE
14426
14427 EXTN(jsimd_idct_float_sse2):
14428 » push» rbp
14429 @@ -81,11 +81,11 @@
14430 » mov» rax,rsp»» » » ; rax = original rbp
14431 » sub» rsp, byte 4
14432 » and» rsp, byte (-SIZEOF_XMMWORD)» ; align to 128 bits
14433 -» mov» [rsp],eax
14434 +» mov» [rsp],rax
14435 » mov» rbp,rsp»» » » ; rbp = aligned rbp
14436 » lea» rsp, [workspace]
14437 +» collect_args
14438 » push» rbx
14439 -» collect_args
14440
14441 » ; ---- Pass 1: process columns from input, store into work array.
14442
14443 @@ -471,9 +471,13 @@
14444 » dec» rcx» » » » ; ctr
14445 » jnz» near .rowloop
14446
14447 +» pop» rbx
14448 » uncollect_args
14449 -» pop» rbx
14450 » mov» rsp,rbp»» ; rsp <- aligned rbp
14451 » pop» rsp» » ; rsp <- original rbp
14452 » pop» rbp
14453 » ret
14454 +
14455 +; For some reason, the OS X linker does not honor the request to align the
14456 +; segment unless we do this.
14457 +» align» 16
14458 Index: simd/jiss2flt.asm
1738 =================================================================== 14459 ===================================================================
1739 --- simd/jisseflt.asm» (revision 829) 14460 --- simd/jiss2flt.asm» (revision 829)
1740 +++ simd/jisseflt.asm» (working copy) 14461 +++ simd/jiss2flt.asm» (working copy)
1741 @@ -37,7 +37,7 @@ 14462 @@ -37,7 +37,7 @@
1742 SECTION SEG_CONST 14463 SECTION SEG_CONST
1743 14464
1744 alignz 16 14465 alignz 16
1745 -» global» EXTN(jconst_idct_float_sse) 14466 -» global» EXTN(jconst_idct_float_sse2)
1746 +» global» EXTN(jconst_idct_float_sse) PRIVATE 14467 +» global» EXTN(jconst_idct_float_sse2) PRIVATE
1747 14468
1748 EXTN(jconst_idct_float_sse): 14469 EXTN(jconst_idct_float_sse2):
1749 14470
1750 @@ -73,7 +73,7 @@ 14471 @@ -73,7 +73,7 @@
1751 ; FAST_FLOAT workspace[DCTSIZE2] 14472 ; FAST_FLOAT workspace[DCTSIZE2]
1752 14473
1753 align 16 14474 align 16
1754 -» global» EXTN(jsimd_idct_float_sse) 14475 -» global» EXTN(jsimd_idct_float_sse2)
1755 +» global» EXTN(jsimd_idct_float_sse) PRIVATE 14476 +» global» EXTN(jsimd_idct_float_sse2) PRIVATE
1756 14477
1757 EXTN(jsimd_idct_float_sse): 14478 EXTN(jsimd_idct_float_sse2):
1758 push ebp 14479 push ebp
1759 Index: simd/jcqnts2i-64.asm 14480 @@ -493,3 +493,6 @@
14481 » pop» ebp
14482 » ret
14483
14484 +; For some reason, the OS X linker does not honor the request to align the
14485 +; segment unless we do this.
14486 +» align» 16
14487 Index: simd/jiss2fst-64.asm
1760 =================================================================== 14488 ===================================================================
1761 --- simd/jcqnts2i-64.asm» (revision 829) 14489 --- simd/jiss2fst-64.asm» (revision 829)
1762 +++ simd/jcqnts2i-64.asm» (working copy) 14490 +++ simd/jiss2fst-64.asm» (working copy)
1763 @@ -36,7 +36,7 @@ 14491 @@ -1,5 +1,5 @@
1764 ; r12 = DCTELEM * workspace 14492 ;
14493 -; jiss2fst.asm - fast integer IDCT (64-bit SSE2)
14494 +; jiss2fst-64.asm - fast integer IDCT (64-bit SSE2)
14495 ;
14496 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
14497 ; Copyright 2009 D. R. Commander
14498 @@ -60,7 +60,7 @@
14499 %define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS)
14500
14501 » alignz» 16
14502 -» global» EXTN(jconst_idct_ifast_sse2)
14503 +» global» EXTN(jconst_idct_ifast_sse2) PRIVATE
14504
14505 EXTN(jconst_idct_ifast_sse2):
14506
14507 @@ -93,7 +93,7 @@
14508 %define WK_NUM»» 2
1765 14509
1766 align 16 14510 align 16
1767 -» global» EXTN(jsimd_convsamp_sse2) 14511 -» global» EXTN(jsimd_idct_ifast_sse2)
1768 +» global» EXTN(jsimd_convsamp_sse2) PRIVATE 14512 +» global» EXTN(jsimd_idct_ifast_sse2) PRIVATE
1769 14513
1770 EXTN(jsimd_convsamp_sse2): 14514 EXTN(jsimd_idct_ifast_sse2):
1771 push rbp 14515 push rbp
1772 @@ -112,7 +112,7 @@ 14516 @@ -100,7 +100,7 @@
1773 ; r12 = DCTELEM * workspace 14517 » mov» rax,rsp»» » » ; rax = original rbp
14518 » sub» rsp, byte 4
14519 » and» rsp, byte (-SIZEOF_XMMWORD)» ; align to 128 bits
14520 -» mov» [rsp],eax
14521 +» mov» [rsp],rax
14522 » mov» rbp,rsp»» » » ; rbp = aligned rbp
14523 » lea» rsp, [wk(0)]
14524 » collect_args
14525 @@ -486,3 +486,7 @@
14526 » pop» rbp
14527 » ret
14528 » ret
14529 +
14530 +; For some reason, the OS X linker does not honor the request to align the
14531 +; segment unless we do this.
14532 +» align» 16
14533 Index: simd/jiss2fst.asm
14534 ===================================================================
14535 --- simd/jiss2fst.asm» (revision 829)
14536 +++ simd/jiss2fst.asm» (working copy)
14537 @@ -59,7 +59,7 @@
14538 %define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS)
14539
14540 » alignz» 16
14541 -» global» EXTN(jconst_idct_ifast_sse2)
14542 +» global» EXTN(jconst_idct_ifast_sse2) PRIVATE
14543
14544 EXTN(jconst_idct_ifast_sse2):
14545
14546 @@ -92,7 +92,7 @@
14547 %define WK_NUM»» 2
1774 14548
1775 align 16 14549 align 16
1776 -» global» EXTN(jsimd_quantize_sse2) 14550 -» global» EXTN(jsimd_idct_ifast_sse2)
1777 +» global» EXTN(jsimd_quantize_sse2) PRIVATE 14551 +» global» EXTN(jsimd_idct_ifast_sse2) PRIVATE
1778 14552
1779 EXTN(jsimd_quantize_sse2): 14553 EXTN(jsimd_idct_ifast_sse2):
1780 » push» rbp 14554 » push» ebp
1781 Index: simd/jdclrss2.asm 14555 @@ -497,3 +497,6 @@
1782 =================================================================== 14556 » pop» ebp
1783 --- simd/jdclrss2.asm» (revision 829) 14557 » ret
1784 +++ simd/jdclrss2.asm» (working copy)
1785 @@ -40,7 +40,7 @@
1786 %define gotptr»» wk(0)-SIZEOF_POINTER» ; void * gotptr
1787 14558
1788 » align» 16 14559 +; For some reason, the OS X linker does not honor the request to align the
1789 -» global» EXTN(jsimd_ycc_rgb_convert_sse2) 14560 +; segment unless we do this.
1790 +» global» EXTN(jsimd_ycc_rgb_convert_sse2) PRIVATE 14561 +» align» 16
1791
1792 EXTN(jsimd_ycc_rgb_convert_sse2):
1793 » push» ebp
1794 Index: simd/jcqntsse.asm
1795 ===================================================================
1796 --- simd/jcqntsse.asm» (revision 829)
1797 +++ simd/jcqntsse.asm» (working copy)
1798 @@ -35,7 +35,7 @@
1799 %define workspace» ebp+16» » ; FAST_FLOAT * workspace
1800
1801 » align» 16
1802 -» global» EXTN(jsimd_convsamp_float_sse)
1803 +» global» EXTN(jsimd_convsamp_float_sse) PRIVATE
1804
1805 EXTN(jsimd_convsamp_float_sse):
1806 » push» ebp
1807 @@ -138,7 +138,7 @@
1808 %define workspace» ebp+16» » ; FAST_FLOAT * workspace
1809
1810 » align» 16
1811 -» global» EXTN(jsimd_quantize_float_sse)
1812 +» global» EXTN(jsimd_quantize_float_sse) PRIVATE
1813
1814 EXTN(jsimd_quantize_float_sse):
1815 » push» ebp
1816 Index: simd/jiss2int-64.asm 14562 Index: simd/jiss2int-64.asm
1817 =================================================================== 14563 ===================================================================
1818 --- simd/jiss2int-64.asm (revision 829) 14564 --- simd/jiss2int-64.asm (revision 829)
1819 +++ simd/jiss2int-64.asm (working copy) 14565 +++ simd/jiss2int-64.asm (working copy)
14566 @@ -1,5 +1,5 @@
14567 ;
14568 -; jiss2int.asm - accurate integer IDCT (64-bit SSE2)
14569 +; jiss2int-64.asm - accurate integer IDCT (64-bit SSE2)
14570 ;
14571 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
14572 ; Copyright 2009 D. R. Commander
1820 @@ -67,7 +67,7 @@ 14573 @@ -67,7 +67,7 @@
1821 SECTION SEG_CONST 14574 SECTION SEG_CONST
1822 14575
1823 alignz 16 14576 alignz 16
1824 - global EXTN(jconst_idct_islow_sse2) 14577 - global EXTN(jconst_idct_islow_sse2)
1825 + global EXTN(jconst_idct_islow_sse2) PRIVATE 14578 + global EXTN(jconst_idct_islow_sse2) PRIVATE
1826 14579
1827 EXTN(jconst_idct_islow_sse2): 14580 EXTN(jconst_idct_islow_sse2):
1828 14581
1829 @@ -106,7 +106,7 @@ 14582 @@ -106,7 +106,7 @@
1830 %define WK_NUM 12 14583 %define WK_NUM 12
1831 14584
1832 align 16 14585 align 16
1833 - global EXTN(jsimd_idct_islow_sse2) 14586 - global EXTN(jsimd_idct_islow_sse2)
1834 + global EXTN(jsimd_idct_islow_sse2) PRIVATE 14587 + global EXTN(jsimd_idct_islow_sse2) PRIVATE
1835 14588
1836 EXTN(jsimd_idct_islow_sse2): 14589 EXTN(jsimd_idct_islow_sse2):
1837 push rbp 14590 push rbp
1838 Index: simd/jfmmxfst.asm 14591 @@ -842,3 +842,7 @@
14592 » pop» rsp» » ; rsp <- original rbp
14593 » pop» rbp
14594 » ret
14595 +
14596 +; For some reason, the OS X linker does not honor the request to align the
14597 +; segment unless we do this.
14598 +» align» 16
14599 Index: simd/jiss2int.asm
1839 =================================================================== 14600 ===================================================================
1840 --- simd/jfmmxfst.asm» (revision 829) 14601 --- simd/jiss2int.asm» (revision 829)
1841 +++ simd/jfmmxfst.asm» (working copy) 14602 +++ simd/jiss2int.asm» (working copy)
1842 @@ -52,7 +52,7 @@ 14603 @@ -66,7 +66,7 @@
1843 %define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS) 14604 » SECTION»SEG_CONST
1844 14605
1845 alignz 16 14606 alignz 16
1846 -» global» EXTN(jconst_fdct_ifast_mmx) 14607 -» global» EXTN(jconst_idct_islow_sse2)
1847 +» global» EXTN(jconst_fdct_ifast_mmx) PRIVATE 14608 +» global» EXTN(jconst_idct_islow_sse2) PRIVATE
1848 14609
1849 EXTN(jconst_fdct_ifast_mmx): 14610 EXTN(jconst_idct_islow_sse2):
1850 14611
1851 @@ -80,7 +80,7 @@ 14612 @@ -105,7 +105,7 @@
14613 %define WK_NUM»» 12
14614
14615 » align» 16
14616 -» global» EXTN(jsimd_idct_islow_sse2)
14617 +» global» EXTN(jsimd_idct_islow_sse2) PRIVATE
14618
14619 EXTN(jsimd_idct_islow_sse2):
14620 » push» ebp
14621 @@ -854,3 +854,6 @@
14622 » pop» ebp
14623 » ret
14624
14625 +; For some reason, the OS X linker does not honor the request to align the
14626 +; segment unless we do this.
14627 +» align» 16
14628 Index: simd/jiss2red-64.asm
14629 ===================================================================
14630 --- simd/jiss2red-64.asm» (revision 829)
14631 +++ simd/jiss2red-64.asm» (working copy)
14632 @@ -1,5 +1,5 @@
14633 ;
14634 -; jiss2red.asm - reduced-size IDCT (64-bit SSE2)
14635 +; jiss2red-64.asm - reduced-size IDCT (64-bit SSE2)
14636 ;
14637 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
14638 ; Copyright 2009 D. R. Commander
14639 @@ -73,7 +73,7 @@
14640 » SECTION»SEG_CONST
14641
14642 » alignz» 16
14643 -» global» EXTN(jconst_idct_red_sse2)
14644 +» global» EXTN(jconst_idct_red_sse2) PRIVATE
14645
14646 EXTN(jconst_idct_red_sse2):
14647
14648 @@ -114,7 +114,7 @@
1852 %define WK_NUM 2 14649 %define WK_NUM 2
1853 14650
1854 align 16 14651 align 16
1855 -» global» EXTN(jsimd_fdct_ifast_mmx) 14652 -» global» EXTN(jsimd_idct_4x4_sse2)
1856 +» global» EXTN(jsimd_fdct_ifast_mmx) PRIVATE 14653 +» global» EXTN(jsimd_idct_4x4_sse2) PRIVATE
1857 14654
1858 EXTN(jsimd_fdct_ifast_mmx): 14655 EXTN(jsimd_idct_4x4_sse2):
14656 » push» rbp
14657 @@ -121,7 +121,7 @@
14658 » mov» rax,rsp»» » » ; rax = original rbp
14659 » sub» rsp, byte 4
14660 » and» rsp, byte (-SIZEOF_XMMWORD)» ; align to 128 bits
14661 -» mov» [rsp],eax
14662 +» mov» [rsp],rax
14663 » mov» rbp,rsp»» » » ; rbp = aligned rbp
14664 » lea» rsp, [wk(0)]
14665 » collect_args
14666 @@ -413,13 +413,14 @@
14667 ; r13 = JDIMENSION output_col
14668
14669 » align» 16
14670 -» global» EXTN(jsimd_idct_2x2_sse2)
14671 +» global» EXTN(jsimd_idct_2x2_sse2) PRIVATE
14672
14673 EXTN(jsimd_idct_2x2_sse2):
14674 » push» rbp
14675 +» mov» rax,rsp
14676 » mov» rbp,rsp
14677 +» collect_args
14678 » push» rbx
14679 -» collect_args
14680
14681 » ; ---- Pass 1: process columns from input.
14682
14683 @@ -565,7 +566,11 @@
14684 » mov» WORD [rdx+rax*SIZEOF_JSAMPLE], bx
14685 » mov» WORD [rsi+rax*SIZEOF_JSAMPLE], cx
14686
14687 +» pop» rbx
14688 » uncollect_args
14689 -» pop» rbx
14690 » pop» rbp
14691 » ret
14692 +
14693 +; For some reason, the OS X linker does not honor the request to align the
14694 +; segment unless we do this.
14695 +» align» 16
14696 Index: simd/jiss2red.asm
14697 ===================================================================
14698 --- simd/jiss2red.asm» (revision 829)
14699 +++ simd/jiss2red.asm» (working copy)
14700 @@ -72,7 +72,7 @@
14701 » SECTION»SEG_CONST
14702
14703 » alignz» 16
14704 -» global» EXTN(jconst_idct_red_sse2)
14705 +» global» EXTN(jconst_idct_red_sse2) PRIVATE
14706
14707 EXTN(jconst_idct_red_sse2):
14708
14709 @@ -113,7 +113,7 @@
14710 %define WK_NUM»» 2
14711
14712 » align» 16
14713 -» global» EXTN(jsimd_idct_4x4_sse2)
14714 +» global» EXTN(jsimd_idct_4x4_sse2) PRIVATE
14715
14716 EXTN(jsimd_idct_4x4_sse2):
1859 push ebp 14717 push ebp
1860 Index: jdarith.c 14718 @@ -424,7 +424,7 @@
14719 %define output_col(b)» (b)+20» » ; JDIMENSION output_col
14720
14721 » align» 16
14722 -» global» EXTN(jsimd_idct_2x2_sse2)
14723 +» global» EXTN(jsimd_idct_2x2_sse2) PRIVATE
14724
14725 EXTN(jsimd_idct_2x2_sse2):
14726 » push» ebp
14727 @@ -589,3 +589,6 @@
14728 » pop» ebp
14729 » ret
14730
14731 +; For some reason, the OS X linker does not honor the request to align the
14732 +; segment unless we do this.
14733 +» align» 16
14734 Index: simd/jisseflt.asm
1861 =================================================================== 14735 ===================================================================
1862 --- jdarith.c» (revision 829) 14736 --- simd/jisseflt.asm» (revision 829)
1863 +++ jdarith.c» (working copy) 14737 +++ simd/jisseflt.asm» (working copy)
1864 @@ -150,8 +150,8 @@ 14738 @@ -37,7 +37,7 @@
1865 */ 14739 » SECTION»SEG_CONST
1866 sv = *st; 14740
1867 qe = jpeg_aritab[sv & 0x7F];»/* => Qe_Value */ 14741 » alignz» 16
1868 - nl = qe & 0xFF; qe >>= 8;» /* Next_Index_LPS + Switch_MPS */ 14742 -» global» EXTN(jconst_idct_float_sse)
1869 - nm = qe & 0xFF; qe >>= 8;» /* Next_Index_MPS */ 14743 +» global» EXTN(jconst_idct_float_sse) PRIVATE
1870 + nl = (unsigned char) qe & 0xFF; qe >>= 8;» /* Next_Index_LPS + Switch_MPS */ 14744
1871 + nm = (unsigned char) qe & 0xFF; qe >>= 8;» /* Next_Index_MPS */ 14745 EXTN(jconst_idct_float_sse):
1872 14746
1873 /* Decode & estimation procedures per sections D.2.4 & D.2.5 */ 14747 @@ -73,7 +73,7 @@
1874 temp = e->a - qe; 14748 » » » » » ; FAST_FLOAT workspace[DCTSIZE2]
1875 Index: jdhuff.c 14749
14750 » align» 16
14751 -» global» EXTN(jsimd_idct_float_sse)
14752 +» global» EXTN(jsimd_idct_float_sse) PRIVATE
14753
14754 EXTN(jsimd_idct_float_sse):
14755 » push» ebp
14756 @@ -567,3 +567,6 @@
14757 » pop» ebp
14758 » ret
14759
14760 +; For some reason, the OS X linker does not honor the request to align the
14761 +; segment unless we do this.
14762 +» align» 16
14763 Index: simd/jsimd.h
1876 =================================================================== 14764 ===================================================================
1877 --- jdhuff.c (revision 1541) 14765 --- simd/jsimd.h» (revision 829)
1878 +++ jdhuff.c (working copy) 14766 +++ simd/jsimd.h» (working copy)
1879 @@ -662,7 +662,7 @@ 14767 @@ -2,19 +2,22 @@
1880 d_derived_tbl * actbl = entropy->ac_cur_tbls[blkn]; 14768 * simd/jsimd.h
1881 register int s, k, r, l; 14769 *
1882 14770 * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
1883 - HUFF_DECODE_FAST(s, l, dctbl); 14771 + * Copyright 2011 D. R. Commander
1884 + HUFF_DECODE_FAST(s, l, dctbl, slow_decode_mcu); 14772 *
1885 if (s) { 14773 * Based on the x86 SIMD extension for IJG JPEG library,
1886 FILL_BIT_BUFFER_FAST 14774 * Copyright (C) 1999-2006, MIYASAKA Masaru.
1887 r = GET_BITS(s); 14775 + * For conditions of distribution and use, see copyright notice in jsimdext.inc
1888 @@ -679,7 +679,7 @@ 14776 *
1889 if (entropy->ac_needed[blkn]) {
1890
1891 for (k = 1; k < DCTSIZE2; k++) {
1892 - HUFF_DECODE_FAST(s, l, actbl);
1893 + HUFF_DECODE_FAST(s, l, actbl, slow_decode_mcu);
1894 r = s >> 4;
1895 s &= 15;
1896
1897 @@ -698,7 +698,7 @@
1898 } else {
1899
1900 for (k = 1; k < DCTSIZE2; k++) {
1901 - HUFF_DECODE_FAST(s, l, actbl);
1902 + HUFF_DECODE_FAST(s, l, actbl, slow_decode_mcu);
1903 r = s >> 4;
1904 s &= 15;
1905
1906 @@ -715,6 +715,7 @@
1907 }
1908
1909 if (cinfo->unread_marker != 0) {
1910 +slow_decode_mcu:
1911 cinfo->unread_marker = 0;
1912 return FALSE;
1913 }
1914 @@ -742,7 +743,7 @@
1915 * this module, since we'll just re-assign them on the next call.)
1916 */ 14777 */
1917 14778
1918 -#define BUFSIZE (DCTSIZE2 * 2) 14779 /* Bitmask for supported acceleration methods */
1919 +#define BUFSIZE (DCTSIZE2 * 2u) 14780
1920 14781 -#define JSIMD_NONE 0x00
1921 METHODDEF(boolean) 14782 -#define JSIMD_MMX 0x01
1922 decode_mcu (j_decompress_ptr cinfo, JBLOCKROW *MCU_data) 14783 -#define JSIMD_3DNOW 0x02
1923 Index: jdhuff.h 14784 -#define JSIMD_SSE 0x04
14785 -#define JSIMD_SSE2 0x08
14786 +#define JSIMD_NONE 0x00
14787 +#define JSIMD_MMX 0x01
14788 +#define JSIMD_3DNOW 0x02
14789 +#define JSIMD_SSE 0x04
14790 +#define JSIMD_SSE2 0x08
14791 +#define JSIMD_ARM_NEON 0x10
14792
14793 /* Short forms of external names for systems with brain-damaged linkers. */
14794
14795 @@ -27,6 +30,13 @@
14796 #define jsimd_extbgrx_ycc_convert_mmx jSEXTBGRXYCCM
14797 #define jsimd_extxbgr_ycc_convert_mmx jSEXTXBGRYCCM
14798 #define jsimd_extxrgb_ycc_convert_mmx jSEXTXRGBYCCM
14799 +#define jsimd_rgb_gray_convert_mmx jSRGBGRYM
14800 +#define jsimd_extrgb_gray_convert_mmx jSEXTRGBGRYM
14801 +#define jsimd_extrgbx_gray_convert_mmx jSEXTRGBXGRYM
14802 +#define jsimd_extbgr_gray_convert_mmx jSEXTBGRGRYM
14803 +#define jsimd_extbgrx_gray_convert_mmx jSEXTBGRXGRYM
14804 +#define jsimd_extxbgr_gray_convert_mmx jSEXTXBGRGRYM
14805 +#define jsimd_extxrgb_gray_convert_mmx jSEXTXRGBGRYM
14806 #define jsimd_ycc_rgb_convert_mmx jSYCCRGBM
14807 #define jsimd_ycc_extrgb_convert_mmx jSYCCEXTRGBM
14808 #define jsimd_ycc_extrgbx_convert_mmx jSYCCEXTRGBXM
14809 @@ -42,6 +52,14 @@
14810 #define jsimd_extbgrx_ycc_convert_sse2 jSEXTBGRXYCCS2
14811 #define jsimd_extxbgr_ycc_convert_sse2 jSEXTXBGRYCCS2
14812 #define jsimd_extxrgb_ycc_convert_sse2 jSEXTXRGBYCCS2
14813 +#define jconst_rgb_gray_convert_sse2 jSCRGBGRYS2
14814 +#define jsimd_rgb_gray_convert_sse2 jSRGBGRYS2
14815 +#define jsimd_extrgb_gray_convert_sse2 jSEXTRGBGRYS2
14816 +#define jsimd_extrgbx_gray_convert_sse2 jSEXTRGBXGRYS2
14817 +#define jsimd_extbgr_gray_convert_sse2 jSEXTBGRGRYS2
14818 +#define jsimd_extbgrx_gray_convert_sse2 jSEXTBGRXGRYS2
14819 +#define jsimd_extxbgr_gray_convert_sse2 jSEXTXBGRGRYS2
14820 +#define jsimd_extxrgb_gray_convert_sse2 jSEXTXRGBGRYS2
14821 #define jconst_ycc_rgb_convert_sse2 jSCYCCRGBS2
14822 #define jsimd_ycc_rgb_convert_sse2 jSYCCRGBS2
14823 #define jsimd_ycc_extrgb_convert_sse2 jSYCCEXTRGBS2
14824 @@ -162,6 +180,35 @@
14825 JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
14826 JDIMENSION output_row, int num_rows));
14827
14828 +EXTERN(void) jsimd_rgb_gray_convert_mmx
14829 + JPP((JDIMENSION img_width,
14830 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
14831 + JDIMENSION output_row, int num_rows));
14832 +EXTERN(void) jsimd_extrgb_gray_convert_mmx
14833 + JPP((JDIMENSION img_width,
14834 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
14835 + JDIMENSION output_row, int num_rows));
14836 +EXTERN(void) jsimd_extrgbx_gray_convert_mmx
14837 + JPP((JDIMENSION img_width,
14838 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
14839 + JDIMENSION output_row, int num_rows));
14840 +EXTERN(void) jsimd_extbgr_gray_convert_mmx
14841 + JPP((JDIMENSION img_width,
14842 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
14843 + JDIMENSION output_row, int num_rows));
14844 +EXTERN(void) jsimd_extbgrx_gray_convert_mmx
14845 + JPP((JDIMENSION img_width,
14846 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
14847 + JDIMENSION output_row, int num_rows));
14848 +EXTERN(void) jsimd_extxbgr_gray_convert_mmx
14849 + JPP((JDIMENSION img_width,
14850 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
14851 + JDIMENSION output_row, int num_rows));
14852 +EXTERN(void) jsimd_extxrgb_gray_convert_mmx
14853 + JPP((JDIMENSION img_width,
14854 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
14855 + JDIMENSION output_row, int num_rows));
14856 +
14857 EXTERN(void) jsimd_ycc_rgb_convert_mmx
14858 JPP((JDIMENSION out_width,
14859 JSAMPIMAGE input_buf, JDIMENSION input_row,
14860 @@ -221,6 +268,36 @@
14861 JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
14862 JDIMENSION output_row, int num_rows));
14863
14864 +extern const int jconst_rgb_gray_convert_sse2[];
14865 +EXTERN(void) jsimd_rgb_gray_convert_sse2
14866 + JPP((JDIMENSION img_width,
14867 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
14868 + JDIMENSION output_row, int num_rows));
14869 +EXTERN(void) jsimd_extrgb_gray_convert_sse2
14870 + JPP((JDIMENSION img_width,
14871 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
14872 + JDIMENSION output_row, int num_rows));
14873 +EXTERN(void) jsimd_extrgbx_gray_convert_sse2
14874 + JPP((JDIMENSION img_width,
14875 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
14876 + JDIMENSION output_row, int num_rows));
14877 +EXTERN(void) jsimd_extbgr_gray_convert_sse2
14878 + JPP((JDIMENSION img_width,
14879 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
14880 + JDIMENSION output_row, int num_rows));
14881 +EXTERN(void) jsimd_extbgrx_gray_convert_sse2
14882 + JPP((JDIMENSION img_width,
14883 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
14884 + JDIMENSION output_row, int num_rows));
14885 +EXTERN(void) jsimd_extxbgr_gray_convert_sse2
14886 + JPP((JDIMENSION img_width,
14887 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
14888 + JDIMENSION output_row, int num_rows));
14889 +EXTERN(void) jsimd_extxrgb_gray_convert_sse2
14890 + JPP((JDIMENSION img_width,
14891 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
14892 + JDIMENSION output_row, int num_rows));
14893 +
14894 extern const int jconst_ycc_rgb_convert_sse2[];
14895 EXTERN(void) jsimd_ycc_rgb_convert_sse2
14896 JPP((JDIMENSION out_width,
14897 @@ -251,6 +328,64 @@
14898 JSAMPIMAGE input_buf, JDIMENSION input_row,
14899 JSAMPARRAY output_buf, int num_rows));
14900
14901 +EXTERN(void) jsimd_rgb_ycc_convert_neon
14902 + JPP((JDIMENSION img_width,
14903 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
14904 + JDIMENSION output_row, int num_rows));
14905 +EXTERN(void) jsimd_extrgb_ycc_convert_neon
14906 + JPP((JDIMENSION img_width,
14907 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
14908 + JDIMENSION output_row, int num_rows));
14909 +EXTERN(void) jsimd_extrgbx_ycc_convert_neon
14910 + JPP((JDIMENSION img_width,
14911 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
14912 + JDIMENSION output_row, int num_rows));
14913 +EXTERN(void) jsimd_extbgr_ycc_convert_neon
14914 + JPP((JDIMENSION img_width,
14915 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
14916 + JDIMENSION output_row, int num_rows));
14917 +EXTERN(void) jsimd_extbgrx_ycc_convert_neon
14918 + JPP((JDIMENSION img_width,
14919 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
14920 + JDIMENSION output_row, int num_rows));
14921 +EXTERN(void) jsimd_extxbgr_ycc_convert_neon
14922 + JPP((JDIMENSION img_width,
14923 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
14924 + JDIMENSION output_row, int num_rows));
14925 +EXTERN(void) jsimd_extxrgb_ycc_convert_neon
14926 + JPP((JDIMENSION img_width,
14927 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
14928 + JDIMENSION output_row, int num_rows));
14929 +
14930 +EXTERN(void) jsimd_ycc_rgb_convert_neon
14931 + JPP((JDIMENSION out_width,
14932 + JSAMPIMAGE input_buf, JDIMENSION input_row,
14933 + JSAMPARRAY output_buf, int num_rows));
14934 +EXTERN(void) jsimd_ycc_extrgb_convert_neon
14935 + JPP((JDIMENSION out_width,
14936 + JSAMPIMAGE input_buf, JDIMENSION input_row,
14937 + JSAMPARRAY output_buf, int num_rows));
14938 +EXTERN(void) jsimd_ycc_extrgbx_convert_neon
14939 + JPP((JDIMENSION out_width,
14940 + JSAMPIMAGE input_buf, JDIMENSION input_row,
14941 + JSAMPARRAY output_buf, int num_rows));
14942 +EXTERN(void) jsimd_ycc_extbgr_convert_neon
14943 + JPP((JDIMENSION out_width,
14944 + JSAMPIMAGE input_buf, JDIMENSION input_row,
14945 + JSAMPARRAY output_buf, int num_rows));
14946 +EXTERN(void) jsimd_ycc_extbgrx_convert_neon
14947 + JPP((JDIMENSION out_width,
14948 + JSAMPIMAGE input_buf, JDIMENSION input_row,
14949 + JSAMPARRAY output_buf, int num_rows));
14950 +EXTERN(void) jsimd_ycc_extxbgr_convert_neon
14951 + JPP((JDIMENSION out_width,
14952 + JSAMPIMAGE input_buf, JDIMENSION input_row,
14953 + JSAMPARRAY output_buf, int num_rows));
14954 +EXTERN(void) jsimd_ycc_extxrgb_convert_neon
14955 + JPP((JDIMENSION out_width,
14956 + JSAMPIMAGE input_buf, JDIMENSION input_row,
14957 + JSAMPARRAY output_buf, int num_rows));
14958 +
14959 /* SIMD Downsample */
14960 EXTERN(void) jsimd_h2v2_downsample_mmx
14961 JPP((JDIMENSION image_width, int max_v_samp_factor,
14962 @@ -387,6 +522,10 @@
14963 JPP((JDIMENSION output_width, JSAMPIMAGE input_buf,
14964 JDIMENSION in_row_group_ctr, JSAMPARRAY output_buf));
14965
14966 +EXTERN(void) jsimd_h2v1_fancy_upsample_neon
14967 + JPP((int max_v_samp_factor, JDIMENSION downsampled_width,
14968 + JSAMPARRAY input_data, JSAMPARRAY * output_data_ptr));
14969 +
14970 /* SIMD Sample Conversion */
14971 EXTERN(void) jsimd_convsamp_mmx JPP((JSAMPARRAY sample_data,
14972 JDIMENSION start_col,
14973 @@ -396,6 +535,10 @@
14974 JDIMENSION start_col,
14975 DCTELEM * workspace));
14976
14977 +EXTERN(void) jsimd_convsamp_neon JPP((JSAMPARRAY sample_data,
14978 + JDIMENSION start_col,
14979 + DCTELEM * workspace));
14980 +
14981 EXTERN(void) jsimd_convsamp_float_3dnow JPP((JSAMPARRAY sample_data,
14982 JDIMENSION start_col,
14983 FAST_FLOAT * workspace));
14984 @@ -417,6 +560,8 @@
14985 extern const int jconst_fdct_islow_sse2[];
14986 EXTERN(void) jsimd_fdct_ifast_sse2 JPP((DCTELEM * data));
14987
14988 +EXTERN(void) jsimd_fdct_ifast_neon JPP((DCTELEM * data));
14989 +
14990 EXTERN(void) jsimd_fdct_float_3dnow JPP((FAST_FLOAT * data));
14991
14992 extern const int jconst_fdct_float_sse[];
14993 @@ -431,6 +576,10 @@
14994 DCTELEM * divisors,
14995 DCTELEM * workspace));
14996
14997 +EXTERN(void) jsimd_quantize_neon JPP((JCOEFPTR coef_block,
14998 + DCTELEM * divisors,
14999 + DCTELEM * workspace));
15000 +
15001 EXTERN(void) jsimd_quantize_float_3dnow JPP((JCOEFPTR coef_block,
15002 FAST_FLOAT * divisors,
15003 FAST_FLOAT * workspace));
15004 @@ -463,6 +612,15 @@
15005 JSAMPARRAY output_buf,
15006 JDIMENSION output_col));
15007
15008 +EXTERN(void) jsimd_idct_2x2_neon JPP((void * dct_table,
15009 + JCOEFPTR coef_block,
15010 + JSAMPARRAY output_buf,
15011 + JDIMENSION output_col));
15012 +EXTERN(void) jsimd_idct_4x4_neon JPP((void * dct_table,
15013 + JCOEFPTR coef_block,
15014 + JSAMPARRAY output_buf,
15015 + JDIMENSION output_col));
15016 +
15017 /* SIMD Inverse DCT */
15018 EXTERN(void) jsimd_idct_islow_mmx JPP((void * dct_table,
15019 JCOEFPTR coef_block,
15020 @@ -484,6 +642,15 @@
15021 JSAMPARRAY output_buf,
15022 JDIMENSION output_col));
15023
15024 +EXTERN(void) jsimd_idct_islow_neon JPP((void * dct_table,
15025 + JCOEFPTR coef_block,
15026 + JSAMPARRAY output_buf,
15027 + JDIMENSION output_col));
15028 +EXTERN(void) jsimd_idct_ifast_neon JPP((void * dct_table,
15029 + JCOEFPTR coef_block,
15030 + JSAMPARRAY output_buf,
15031 + JDIMENSION output_col));
15032 +
15033 EXTERN(void) jsimd_idct_float_3dnow JPP((void * dct_table,
15034 JCOEFPTR coef_block,
15035 JSAMPARRAY output_buf,
15036 Index: simd/jsimd_i386.c
1924 =================================================================== 15037 ===================================================================
1925 --- jdhuff.h (revision 1541) 15038 --- simd/jsimd_i386.c» (revision 829)
1926 +++ jdhuff.h (working copy) 15039 +++ simd/jsimd_i386.c» (working copy)
1927 @@ -208,7 +208,7 @@ 15040 @@ -2,10 +2,11 @@
1928 } \ 15041 * jsimd_i386.c
15042 *
15043 * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
15044 - * Copyright 2009 D. R. Commander
15045 + * Copyright 2009-2011 D. R. Commander
15046 *
15047 * Based on the x86 SIMD extension for IJG JPEG library,
15048 * Copyright (C) 1999-2006, MIYASAKA Masaru.
15049 + * For conditions of distribution and use, see copyright notice in jsimdext.inc
15050 *
15051 * This file contains the interface between the "normal" portions
15052 * of the library and the SIMD implementations when running on a
15053 @@ -40,7 +41,7 @@
15054 {
15055 char *env = NULL;
15056
15057 - if (simd_support != ~0)
15058 + if (simd_support != ~0U)
15059 return;
15060
15061 simd_support = jpeg_simd_cpu_support();
15062 @@ -51,15 +52,16 @@
15063 simd_support &= JSIMD_MMX;
15064 env = getenv("JSIMD_FORCE3DNOW");
15065 if ((env != NULL) && (strcmp(env, "1") == 0))
15066 - simd_support &= JSIMD_3DNOW;
15067 + simd_support &= JSIMD_3DNOW|JSIMD_MMX;
15068 env = getenv("JSIMD_FORCESSE");
15069 if ((env != NULL) && (strcmp(env, "1") == 0))
15070 - simd_support &= JSIMD_SSE;
15071 + simd_support &= JSIMD_SSE|JSIMD_MMX;
15072 env = getenv("JSIMD_FORCESSE2");
15073 if ((env != NULL) && (strcmp(env, "1") == 0))
15074 simd_support &= JSIMD_SSE2;
1929 } 15075 }
1930 15076
1931 -#define HUFF_DECODE_FAST(s,nb,htbl) \ 15077 +#ifndef JPEG_DECODE_ONLY
1932 +#define HUFF_DECODE_FAST(s,nb,htbl,slowlabel) \ 15078 GLOBAL(int)
1933 FILL_BIT_BUFFER_FAST; \ 15079 jsimd_can_rgb_ycc (void)
1934 s = PEEK_BITS(HUFF_LOOKAHEAD); \ 15080 {
1935 s = htbl->lookup[s]; \ 15081 @@ -81,8 +83,31 @@
1936 @@ -225,7 +225,9 @@ 15082
1937 s |= GET_BITS(1); \ 15083 return 0;
1938 nb++; \ 15084 }
1939 } \
1940 - s = htbl->pub->huffval[ (int) (s + htbl->valoffset[nb]) & 0xFF ]; \
1941 + if (nb > 16) \
1942 + goto slowlabel; \
1943 + s = htbl->pub->huffval[ (int) (s + htbl->valoffset[nb]) ]; \
1944 }
1945
1946 /* Out-of-line case for Huffman code fetching */
1947
1948 Index: jchuff.c
1949 ===================================================================
1950 --- jchuff.c» (revision 1219)
1951 +++ jchuff.c» (revision 1220)
1952 @@ -22,8 +22,36 @@
1953 #include "jchuff.h"» » /* Declarations shared with jcphuff.c */
1954 #include <limits.h>
1955
1956 +/*
1957 + * NOTE: If USE_CLZ_INTRINSIC is defined, then clz/bsr instructions will be
1958 + * used for bit counting rather than the lookup table. This will reduce the
1959 + * memory footprint by 64k, which is important for some mobile applications
1960 + * that create many isolated instances of libjpeg-turbo (web browsers, for
1961 + * instance.) This may improve performance on some mobile platforms as well.
1962 + * This feature is enabled by default only on ARM processors, because some x86
1963 + * chips have a slow implementation of bsr, and the use of clz/bsr cannot be
1964 + * shown to have a significant performance impact even on the x86 chips that
1965 + * have a fast implementation of it. When building for ARMv6, you can
1966 + * explicitly disable the use of clz/bsr by adding -mthumb to the compiler
1967 + * flags (this defines __thumb__).
1968 + */
1969 +
1970 +/* NOTE: Both GCC and Clang define __GNUC__ */
1971 +#if defined __GNUC__ && defined __arm__
1972 +#if !defined __thumb__ || defined __thumb2__
1973 +#define USE_CLZ_INTRINSIC
1974 +#endif 15085 +#endif
1975 +#endif 15086
1976 + 15087 GLOBAL(int)
1977 +#ifdef USE_CLZ_INTRINSIC
1978 +#define JPEG_NBITS_NONZERO(x) (32 - __builtin_clz(x))
1979 +#define JPEG_NBITS(x) (x ? JPEG_NBITS_NONZERO(x) : 0)
1980 +#else
1981 static unsigned char jpeg_nbits_table[65536];
1982 static int jpeg_nbits_table_init = 0;
1983 +#define JPEG_NBITS(x) (jpeg_nbits_table[x])
1984 +#define JPEG_NBITS_NONZERO(x) JPEG_NBITS(x)
1985 +#endif
1986
1987 #ifndef min
1988 #define min(a,b) ((a)<(b)?(a):(b))
1989 @@ -272,6 +300,7 @@
1990 dtbl->ehufsi[i] = huffsize[p];
1991 }
1992
1993 +#ifndef USE_CLZ_INTRINSIC
1994 if(!jpeg_nbits_table_init) {
1995 for(i = 0; i < 65536; i++) {
1996 int nbits = 0, temp = i;
1997 @@ -280,6 +309,7 @@
1998 }
1999 jpeg_nbits_table_init = 1;
2000 }
2001 +#endif
2002 }
2003
2004
2005 @@ -482,7 +512,7 @@
2006 temp2 += temp3;
2007
2008 /* Find the number of bits needed for the magnitude of the coefficient */
2009 - nbits = jpeg_nbits_table[temp];
2010 + nbits = JPEG_NBITS(temp);
2011
2012 /* Emit the Huffman-coded symbol for the number of bits */
2013 code = dctbl->ehufco[nbits];
2014 @@ -516,7 +546,7 @@
2015 temp ^= temp3; \
2016 temp -= temp3; \
2017 temp2 += temp3; \
2018 - nbits = jpeg_nbits_table[temp]; \
2019 + nbits = JPEG_NBITS_NONZERO(temp); \
2020 /* if run length > 15, must emit special run-length-16 codes (0xF0) */ \
2021 while (r > 15) { \
2022 EMIT_BITS(code_0xf0, size_0xf0) \
2023 Index: simd/jsimd_arm64.c
2024 ===================================================================
2025 --- /dev/null
2026 +++ simd/jsimd_arm64.c
2027 @@ -0,0 +1,544 @@
2028 +/*
2029 + * jsimd_arm64.c
2030 + *
2031 + * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
2032 + * Copyright 2009-2011, 2013-2014 D. R. Commander
2033 + *
2034 + * Based on the x86 SIMD extension for IJG JPEG library,
2035 + * Copyright (C) 1999-2006, MIYASAKA Masaru.
2036 + * For conditions of distribution and use, see copyright notice in jsimdext.inc
2037 + *
2038 + * This file contains the interface between the "normal" portions
2039 + * of the library and the SIMD implementations when running on a
2040 + * 64-bit ARM architecture.
2041 + */
2042 +
2043 +#define JPEG_INTERNALS
2044 +#include "../jinclude.h"
2045 +#include "../jpeglib.h"
2046 +#include "../jsimd.h"
2047 +#include "../jdct.h"
2048 +#include "../jsimddct.h"
2049 +#include "jsimd.h"
2050 +
2051 +#include <stdio.h>
2052 +#include <string.h>
2053 +#include <ctype.h>
2054 +
2055 +static unsigned int simd_support = ~0;
2056 +
2057 +/*
2058 + * Check what SIMD accelerations are supported.
2059 + *
2060 + * FIXME: This code is racy under a multi-threaded environment.
2061 + */
2062 +
2063 +/*
2064 + * ARMv8 architectures support NEON extensions by default.
2065 + * It is no longer optional as it was with ARMv7.
2066 + */
2067 +
2068 +
2069 +LOCAL(void)
2070 +init_simd (void)
2071 +{
2072 + char *env = NULL;
2073 +
2074 + if (simd_support != ~0U)
2075 + return;
2076 +
2077 + simd_support = 0;
2078 +
2079 + simd_support |= JSIMD_ARM_NEON;
2080 +
2081 + /* Force different settings through environment variables */
2082 + env = getenv("JSIMD_FORCENEON");
2083 + if ((env != NULL) && (strcmp(env, "1") == 0))
2084 + simd_support &= JSIMD_ARM_NEON;
2085 + env = getenv("JSIMD_FORCENONE");
2086 + if ((env != NULL) && (strcmp(env, "1") == 0))
2087 + simd_support = 0;
2088 +}
2089 +
2090 +GLOBAL(int)
2091 +jsimd_can_rgb_ycc (void)
2092 +{
2093 + init_simd();
2094 +
2095 + return 0;
2096 +}
2097 +
2098 +GLOBAL(int)
2099 +jsimd_can_rgb_gray (void) 15088 +jsimd_can_rgb_gray (void)
2100 +{ 15089 +{
2101 + init_simd(); 15090 + init_simd();
2102 + 15091 +
2103 + return 0;
2104 +}
2105 +
2106 +GLOBAL(int)
2107 +jsimd_can_ycc_rgb (void)
2108 +{
2109 + init_simd();
2110 +
2111 + /* The code is optimised for these values only */ 15092 + /* The code is optimised for these values only */
2112 + if (BITS_IN_JSAMPLE != 8) 15093 + if (BITS_IN_JSAMPLE != 8)
2113 + return 0; 15094 + return 0;
2114 + if (sizeof(JDIMENSION) != 4) 15095 + if (sizeof(JDIMENSION) != 4)
2115 + return 0; 15096 + return 0;
2116 + if ((RGB_PIXELSIZE != 3) && (RGB_PIXELSIZE != 4)) 15097 + if ((RGB_PIXELSIZE != 3) && (RGB_PIXELSIZE != 4))
2117 + return 0; 15098 + return 0;
2118 + 15099 +
2119 + if (simd_support & JSIMD_ARM_NEON) 15100 + if ((simd_support & JSIMD_SSE2) &&
15101 + IS_ALIGNED_SSE(jconst_rgb_gray_convert_sse2))
15102 + return 1;
15103 + if (simd_support & JSIMD_MMX)
2120 + return 1; 15104 + return 1;
2121 + 15105 +
2122 + return 0; 15106 + return 0;
2123 +} 15107 +}
2124 + 15108 +
2125 +GLOBAL(int) 15109 +GLOBAL(int)
2126 +jsimd_can_ycc_rgb565 (void) 15110 jsimd_can_ycc_rgb (void)
15111 {
15112 init_simd();
15113 @@ -104,6 +129,7 @@
15114 return 0;
15115 }
15116
15117 +#ifndef JPEG_DECODE_ONLY
15118 GLOBAL(void)
15119 jsimd_rgb_ycc_convert (j_compress_ptr cinfo,
15120 JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
15121 @@ -119,6 +145,7 @@
15122 mmxfct=jsimd_extrgb_ycc_convert_mmx;
15123 break;
15124 case JCS_EXT_RGBX:
15125 + case JCS_EXT_RGBA:
15126 sse2fct=jsimd_extrgbx_ycc_convert_sse2;
15127 mmxfct=jsimd_extrgbx_ycc_convert_mmx;
15128 break;
15129 @@ -127,14 +154,17 @@
15130 mmxfct=jsimd_extbgr_ycc_convert_mmx;
15131 break;
15132 case JCS_EXT_BGRX:
15133 + case JCS_EXT_BGRA:
15134 sse2fct=jsimd_extbgrx_ycc_convert_sse2;
15135 mmxfct=jsimd_extbgrx_ycc_convert_mmx;
15136 break;
15137 case JCS_EXT_XBGR:
15138 + case JCS_EXT_ABGR:
15139 sse2fct=jsimd_extxbgr_ycc_convert_sse2;
15140 mmxfct=jsimd_extxbgr_ycc_convert_mmx;
15141 break;
15142 case JCS_EXT_XRGB:
15143 + case JCS_EXT_ARGB:
15144 sse2fct=jsimd_extxrgb_ycc_convert_sse2;
15145 mmxfct=jsimd_extxrgb_ycc_convert_mmx;
15146 break;
15147 @@ -152,8 +182,62 @@
15148 mmxfct(cinfo->image_width, input_buf,
15149 output_buf, output_row, num_rows);
15150 }
15151 +#endif
15152
15153 GLOBAL(void)
15154 +jsimd_rgb_gray_convert (j_compress_ptr cinfo,
15155 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
15156 + JDIMENSION output_row, int num_rows)
2127 +{ 15157 +{
2128 + init_simd(); 15158 + void (*sse2fct)(JDIMENSION, JSAMPARRAY, JSAMPIMAGE, JDIMENSION, int);
2129 + 15159 + void (*mmxfct)(JDIMENSION, JSAMPARRAY, JSAMPIMAGE, JDIMENSION, int);
15160 +
15161 + switch(cinfo->in_color_space)
15162 + {
15163 + case JCS_EXT_RGB:
15164 + sse2fct=jsimd_extrgb_gray_convert_sse2;
15165 + mmxfct=jsimd_extrgb_gray_convert_mmx;
15166 + break;
15167 + case JCS_EXT_RGBX:
15168 + case JCS_EXT_RGBA:
15169 + sse2fct=jsimd_extrgbx_gray_convert_sse2;
15170 + mmxfct=jsimd_extrgbx_gray_convert_mmx;
15171 + break;
15172 + case JCS_EXT_BGR:
15173 + sse2fct=jsimd_extbgr_gray_convert_sse2;
15174 + mmxfct=jsimd_extbgr_gray_convert_mmx;
15175 + break;
15176 + case JCS_EXT_BGRX:
15177 + case JCS_EXT_BGRA:
15178 + sse2fct=jsimd_extbgrx_gray_convert_sse2;
15179 + mmxfct=jsimd_extbgrx_gray_convert_mmx;
15180 + break;
15181 + case JCS_EXT_XBGR:
15182 + case JCS_EXT_ABGR:
15183 + sse2fct=jsimd_extxbgr_gray_convert_sse2;
15184 + mmxfct=jsimd_extxbgr_gray_convert_mmx;
15185 + break;
15186 + case JCS_EXT_XRGB:
15187 + case JCS_EXT_ARGB:
15188 + sse2fct=jsimd_extxrgb_gray_convert_sse2;
15189 + mmxfct=jsimd_extxrgb_gray_convert_mmx;
15190 + break;
15191 + default:
15192 + sse2fct=jsimd_rgb_gray_convert_sse2;
15193 + mmxfct=jsimd_rgb_gray_convert_mmx;
15194 + break;
15195 + }
15196 +
15197 + if ((simd_support & JSIMD_SSE2) &&
15198 + IS_ALIGNED_SSE(jconst_rgb_gray_convert_sse2))
15199 + sse2fct(cinfo->image_width, input_buf,
15200 + output_buf, output_row, num_rows);
15201 + else if (simd_support & JSIMD_MMX)
15202 + mmxfct(cinfo->image_width, input_buf,
15203 + output_buf, output_row, num_rows);
15204 +}
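
Note on the dispatch pattern above: the added `case JCS_EXT_RGBA:`-style labels fall through to the existing X-variant routines, so an alpha-carrying layout reuses the code path of its padded counterpart. A minimal scalar sketch of that selection follows. The `sk_` names, the enum, and the integer luma weights (77/150/29, an 8-bit approximation of 0.299/0.587/0.114) are assumptions for illustration, not the library's SIMD routines.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a few J_COLOR_SPACE values. */
typedef enum { SK_CS_RGB, SK_CS_RGBX, SK_CS_RGBA, SK_CS_BGR } sk_color_space;

typedef void (*sk_gray_fn)(unsigned int width,
                           const unsigned char *in, unsigned char *out);

/* Shared worker: r/g/b are byte offsets within a pixel, ps is the pixel size. */
static void
sk_gray_generic(unsigned int width, const unsigned char *in,
                unsigned char *out, int r, int g, int b, int ps)
{
  unsigned int i;

  for (i = 0; i < width; i++) {
    const unsigned char *p = in + (size_t)i * (size_t)ps;
    /* 77 + 150 + 29 == 256, so the result stays within 0..255. */
    out[i] = (unsigned char)((77 * p[r] + 150 * p[g] + 29 * p[b] + 128) >> 8);
  }
}

static void sk_rgb_gray(unsigned int w, const unsigned char *in, unsigned char *out)
{ sk_gray_generic(w, in, out, 0, 1, 2, 3); }

static void sk_rgbx_gray(unsigned int w, const unsigned char *in, unsigned char *out)
{ sk_gray_generic(w, in, out, 0, 1, 2, 4); }

static void sk_bgr_gray(unsigned int w, const unsigned char *in, unsigned char *out)
{ sk_gray_generic(w, in, out, 2, 1, 0, 3); }

/* Select a routine by input layout; the fourth byte is ignored either way,
 * so the RGBA label simply shares the RGBX routine. */
static sk_gray_fn
sk_select_gray(sk_color_space cs)
{
  switch (cs) {
  case SK_CS_RGBX:
  case SK_CS_RGBA:
    return sk_rgbx_gray;
  case SK_CS_BGR:
    return sk_bgr_gray;
  case SK_CS_RGB:
  default:
    return sk_rgb_gray;
  }
}
```

In the patch itself, the selected pointer is only called when the matching `simd_support` bit is set, so callers are expected to have consulted the corresponding `jsimd_can_*` function first.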
15205 +
15206 +GLOBAL(void)
15207 jsimd_ycc_rgb_convert (j_decompress_ptr cinfo,
15208 JSAMPIMAGE input_buf, JDIMENSION input_row,
15209 JSAMPARRAY output_buf, int num_rows)
15210 @@ -168,6 +252,7 @@
15211 mmxfct=jsimd_ycc_extrgb_convert_mmx;
15212 break;
15213 case JCS_EXT_RGBX:
15214 + case JCS_EXT_RGBA:
15215 sse2fct=jsimd_ycc_extrgbx_convert_sse2;
15216 mmxfct=jsimd_ycc_extrgbx_convert_mmx;
15217 break;
15218 @@ -176,14 +261,17 @@
15219 mmxfct=jsimd_ycc_extbgr_convert_mmx;
15220 break;
15221 case JCS_EXT_BGRX:
15222 + case JCS_EXT_BGRA:
15223 sse2fct=jsimd_ycc_extbgrx_convert_sse2;
15224 mmxfct=jsimd_ycc_extbgrx_convert_mmx;
15225 break;
15226 case JCS_EXT_XBGR:
15227 + case JCS_EXT_ABGR:
15228 sse2fct=jsimd_ycc_extxbgr_convert_sse2;
15229 mmxfct=jsimd_ycc_extxbgr_convert_mmx;
15230 break;
15231 case JCS_EXT_XRGB:
15232 + case JCS_EXT_ARGB:
15233 sse2fct=jsimd_ycc_extxrgb_convert_sse2;
15234 mmxfct=jsimd_ycc_extxrgb_convert_mmx;
15235 break;
15236 @@ -202,6 +290,7 @@
15237 input_row, output_buf, num_rows);
15238 }
15239
15240 +#ifndef JPEG_DECODE_ONLY
15241 GLOBAL(int)
15242 jsimd_can_h2v2_downsample (void)
15243 {
15244 @@ -267,6 +356,7 @@
15245 compptr->v_samp_factor, compptr->width_in_blocks,
15246 input_data, output_data);
15247 }
15248 +#endif
15249
15250 GLOBAL(int)
15251 jsimd_can_h2v2_upsample (void)
15252 @@ -382,7 +472,7 @@
15253 {
15254 if ((simd_support & JSIMD_SSE2) &&
15255 IS_ALIGNED_SSE(jconst_fancy_upsample_sse2))
15256 - jsimd_h2v1_fancy_upsample_sse2(cinfo->max_v_samp_factor,
15257 + jsimd_h2v2_fancy_upsample_sse2(cinfo->max_v_samp_factor,
15258 compptr->downsampled_width, input_data, output_data_ptr);
15259 else if (simd_support & JSIMD_MMX)
15260 jsimd_h2v2_fancy_upsample_mmx(cinfo->max_v_samp_factor,
15261 @@ -460,6 +550,7 @@
15262 mmxfct=jsimd_h2v2_extrgb_merged_upsample_mmx;
15263 break;
15264 case JCS_EXT_RGBX:
15265 + case JCS_EXT_RGBA:
15266 sse2fct=jsimd_h2v2_extrgbx_merged_upsample_sse2;
15267 mmxfct=jsimd_h2v2_extrgbx_merged_upsample_mmx;
15268 break;
15269 @@ -468,14 +559,17 @@
15270 mmxfct=jsimd_h2v2_extbgr_merged_upsample_mmx;
15271 break;
15272 case JCS_EXT_BGRX:
15273 + case JCS_EXT_BGRA:
15274 sse2fct=jsimd_h2v2_extbgrx_merged_upsample_sse2;
15275 mmxfct=jsimd_h2v2_extbgrx_merged_upsample_mmx;
15276 break;
15277 case JCS_EXT_XBGR:
15278 + case JCS_EXT_ABGR:
15279 sse2fct=jsimd_h2v2_extxbgr_merged_upsample_sse2;
15280 mmxfct=jsimd_h2v2_extxbgr_merged_upsample_mmx;
15281 break;
15282 case JCS_EXT_XRGB:
15283 + case JCS_EXT_ARGB:
15284 sse2fct=jsimd_h2v2_extxrgb_merged_upsample_sse2;
15285 mmxfct=jsimd_h2v2_extxrgb_merged_upsample_mmx;
15286 break;
15287 @@ -510,6 +604,7 @@
15288 mmxfct=jsimd_h2v1_extrgb_merged_upsample_mmx;
15289 break;
15290 case JCS_EXT_RGBX:
15291 + case JCS_EXT_RGBA:
15292 sse2fct=jsimd_h2v1_extrgbx_merged_upsample_sse2;
15293 mmxfct=jsimd_h2v1_extrgbx_merged_upsample_mmx;
15294 break;
15295 @@ -518,14 +613,17 @@
15296 mmxfct=jsimd_h2v1_extbgr_merged_upsample_mmx;
15297 break;
15298 case JCS_EXT_BGRX:
15299 + case JCS_EXT_BGRA:
15300 sse2fct=jsimd_h2v1_extbgrx_merged_upsample_sse2;
15301 mmxfct=jsimd_h2v1_extbgrx_merged_upsample_mmx;
15302 break;
15303 case JCS_EXT_XBGR:
15304 + case JCS_EXT_ABGR:
15305 sse2fct=jsimd_h2v1_extxbgr_merged_upsample_sse2;
15306 mmxfct=jsimd_h2v1_extxbgr_merged_upsample_mmx;
15307 break;
15308 case JCS_EXT_XRGB:
15309 + case JCS_EXT_ARGB:
15310 sse2fct=jsimd_h2v1_extxrgb_merged_upsample_sse2;
15311 mmxfct=jsimd_h2v1_extxrgb_merged_upsample_mmx;
15312 break;
15313 @@ -544,6 +642,7 @@
15314 in_row_group_ctr, output_buf);
15315 }
15316
15317 +#ifndef JPEG_DECODE_ONLY
15318 GLOBAL(int)
15319 jsimd_can_convsamp (void)
15320 {
15321 @@ -763,6 +862,7 @@
15322 else if (simd_support & JSIMD_3DNOW)
15323 jsimd_quantize_float_3dnow(coef_block, divisors, workspace);
15324 }
15325 +#endif
15326
15327 GLOBAL(int)
15328 jsimd_can_idct_2x2 (void)
15329 @@ -953,4 +1053,3 @@
15330 jsimd_idct_float_3dnow(compptr->dct_table, coef_block,
15331 output_buf, output_col);
15332 }
15333 -
15334 Index: simd/jsimd_x86_64.c
15335 ===================================================================
15336 --- simd/jsimd_x86_64.c (revision 829)
15337 +++ simd/jsimd_x86_64.c (working copy)
15338 @@ -2,10 +2,11 @@
15339 * jsimd_x86_64.c
15340 *
15341 * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
15342 - * Copyright 2009 D. R. Commander
15343 + * Copyright 2009-2011 D. R. Commander
15344 *
15345 * Based on the x86 SIMD extension for IJG JPEG library,
15346 * Copyright (C) 1999-2006, MIYASAKA Masaru.
15347 + * For conditions of distribution and use, see copyright notice in jsimdext.inc
15348 *
15349 * This file contains the interface between the "normal" portions
15350 * of the library and the SIMD implementations when running on a
15351 @@ -18,16 +19,17 @@
15352 #include "../jsimd.h"
15353 #include "../jdct.h"
15354 #include "../jsimddct.h"
15355 -#include "simd/jsimd.h"
15356 +#include "jsimd.h"
15357
15358 /*
15359 * In the PIC cases, we have no guarantee that constants will keep
15360 * their alignment. This macro allows us to verify it at runtime.
15361 */
15362 -#define IS_ALIGNED(ptr, order) (((unsigned)ptr & ((1 << order) - 1)) == 0)
15363 +#define IS_ALIGNED(ptr, order) (((size_t)ptr & ((1 << order) - 1)) == 0)
15364
15365 #define IS_ALIGNED_SSE(ptr) (IS_ALIGNED(ptr, 4)) /* 16 byte alignment */
15366
15367 +#ifndef JPEG_DECODE_ONLY
15368 GLOBAL(int)
15369 jsimd_can_rgb_ycc (void)
15370 {
15371 @@ -44,8 +46,26 @@
15372
15373 return 1;
15374 }
15375 +#endif
15376
15377 GLOBAL(int)
15378 +jsimd_can_rgb_gray (void)
15379 +{
2130 + /* The code is optimised for these values only */ 15380 + /* The code is optimised for these values only */
2131 + if (BITS_IN_JSAMPLE != 8) 15381 + if (BITS_IN_JSAMPLE != 8)
2132 + return 0; 15382 + return 0;
2133 + if (sizeof(JDIMENSION) != 4) 15383 + if (sizeof(JDIMENSION) != 4)
2134 + return 0; 15384 + return 0;
2135 + 15385 + if ((RGB_PIXELSIZE != 3) && (RGB_PIXELSIZE != 4))
2136 + if (simd_support & JSIMD_ARM_NEON) 15386 + return 0;
2137 + return 1; 15387 +
2138 + 15388 + if (!IS_ALIGNED_SSE(jconst_rgb_gray_convert_sse2))
2139 + return 0; 15389 + return 0;
15390 +
15391 + return 1;
2140 +} 15392 +}
2141 + 15393 +
2142 +GLOBAL(void) 15394 +GLOBAL(int)
2143 +jsimd_rgb_ycc_convert (j_compress_ptr cinfo, 15395 jsimd_can_ycc_rgb (void)
2144 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf, 15396 {
2145 + JDIMENSION output_row, int num_rows) 15397 /* The code is optimised for these values only */
2146 +{ 15398 @@ -62,6 +82,7 @@
2147 +} 15399 return 1;
2148 + 15400 }
2149 +GLOBAL(void) 15401
15402 +#ifndef JPEG_DECODE_ONLY
15403 GLOBAL(void)
15404 jsimd_rgb_ycc_convert (j_compress_ptr cinfo,
15405 JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
15406 @@ -75,6 +96,7 @@
15407 sse2fct=jsimd_extrgb_ycc_convert_sse2;
15408 break;
15409 case JCS_EXT_RGBX:
15410 + case JCS_EXT_RGBA:
15411 sse2fct=jsimd_extrgbx_ycc_convert_sse2;
15412 break;
15413 case JCS_EXT_BGR:
15414 @@ -81,12 +103,15 @@
15415 sse2fct=jsimd_extbgr_ycc_convert_sse2;
15416 break;
15417 case JCS_EXT_BGRX:
15418 + case JCS_EXT_BGRA:
15419 sse2fct=jsimd_extbgrx_ycc_convert_sse2;
15420 break;
15421 case JCS_EXT_XBGR:
15422 + case JCS_EXT_ABGR:
15423 sse2fct=jsimd_extxbgr_ycc_convert_sse2;
15424 break;
15425 case JCS_EXT_XRGB:
15426 + case JCS_EXT_ARGB:
15427 sse2fct=jsimd_extxrgb_ycc_convert_sse2;
15428 break;
15429 default:
15430 @@ -96,8 +121,48 @@
15431
15432 sse2fct(cinfo->image_width, input_buf, output_buf, output_row, num_rows);
15433 }
15434 +#endif
15435
15436 GLOBAL(void)
2150 +jsimd_rgb_gray_convert (j_compress_ptr cinfo, 15437 +jsimd_rgb_gray_convert (j_compress_ptr cinfo,
2151 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf, 15438 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
2152 + JDIMENSION output_row, int num_rows) 15439 + JDIMENSION output_row, int num_rows)
2153 +{ 15440 +{
2154 +} 15441 + void (*sse2fct)(JDIMENSION, JSAMPARRAY, JSAMPIMAGE, JDIMENSION, int);
2155 + 15442 +
2156 +GLOBAL(void) 15443 + switch(cinfo->in_color_space)
2157 +jsimd_ycc_rgb_convert (j_decompress_ptr cinfo, 15444 + {
2158 + JSAMPIMAGE input_buf, JDIMENSION input_row,
2159 + JSAMPARRAY output_buf, int num_rows)
2160 +{
2161 + void (*neonfct)(JDIMENSION, JSAMPIMAGE, JDIMENSION, JSAMPARRAY, int);
2162 +
2163 + switch(cinfo->out_color_space) {
2164 + case JCS_EXT_RGB: 15445 + case JCS_EXT_RGB:
2165 + neonfct=jsimd_ycc_extrgb_convert_neon; 15446 + sse2fct=jsimd_extrgb_gray_convert_sse2;
2166 + break; 15447 + break;
2167 + case JCS_EXT_RGBX: 15448 + case JCS_EXT_RGBX:
2168 + case JCS_EXT_RGBA: 15449 + case JCS_EXT_RGBA:
2169 + neonfct=jsimd_ycc_extrgbx_convert_neon; 15450 + sse2fct=jsimd_extrgbx_gray_convert_sse2;
2170 + break; 15451 + break;
2171 + case JCS_EXT_BGR: 15452 + case JCS_EXT_BGR:
2172 + neonfct=jsimd_ycc_extbgr_convert_neon; 15453 + sse2fct=jsimd_extbgr_gray_convert_sse2;
2173 + break; 15454 + break;
2174 + case JCS_EXT_BGRX: 15455 + case JCS_EXT_BGRX:
2175 + case JCS_EXT_BGRA: 15456 + case JCS_EXT_BGRA:
2176 + neonfct=jsimd_ycc_extbgrx_convert_neon; 15457 + sse2fct=jsimd_extbgrx_gray_convert_sse2;
2177 + break; 15458 + break;
2178 + case JCS_EXT_XBGR: 15459 + case JCS_EXT_XBGR:
2179 + case JCS_EXT_ABGR: 15460 + case JCS_EXT_ABGR:
2180 + neonfct=jsimd_ycc_extxbgr_convert_neon; 15461 + sse2fct=jsimd_extxbgr_gray_convert_sse2;
2181 + break; 15462 + break;
2182 + case JCS_EXT_XRGB: 15463 + case JCS_EXT_XRGB:
2183 + case JCS_EXT_ARGB: 15464 + case JCS_EXT_ARGB:
2184 + neonfct=jsimd_ycc_extxrgb_convert_neon; 15465 + sse2fct=jsimd_extxrgb_gray_convert_sse2;
2185 + break; 15466 + break;
2186 + default: 15467 + default:
2187 + neonfct=jsimd_ycc_extrgb_convert_neon; 15468 + sse2fct=jsimd_rgb_gray_convert_sse2;
2188 + break; 15469 + break;
2189 + } 15470 + }
2190 + 15471 +
2191 + if (simd_support & JSIMD_ARM_NEON) 15472 + sse2fct(cinfo->image_width, input_buf, output_buf, output_row, num_rows);
2192 + neonfct(cinfo->output_width, input_buf, input_row, output_buf, num_rows);
2193 +} 15473 +}
2194 + 15474 +
2195 +GLOBAL(void) 15475 +GLOBAL(void)
2196 +jsimd_ycc_rgb565_convert (j_decompress_ptr cinfo, 15476 jsimd_ycc_rgb_convert (j_decompress_ptr cinfo,
2197 + JSAMPIMAGE input_buf, JDIMENSION input_row, 15477 JSAMPIMAGE input_buf, JDIMENSION input_row,
2198 + JSAMPARRAY output_buf, int num_rows) 15478 JSAMPARRAY output_buf, int num_rows)
15479 @@ -110,6 +175,7 @@
15480 sse2fct=jsimd_ycc_extrgb_convert_sse2;
15481 break;
15482 case JCS_EXT_RGBX:
15483 + case JCS_EXT_RGBA:
15484 sse2fct=jsimd_ycc_extrgbx_convert_sse2;
15485 break;
15486 case JCS_EXT_BGR:
15487 @@ -116,12 +182,15 @@
15488 sse2fct=jsimd_ycc_extbgr_convert_sse2;
15489 break;
15490 case JCS_EXT_BGRX:
15491 + case JCS_EXT_BGRA:
15492 sse2fct=jsimd_ycc_extbgrx_convert_sse2;
15493 break;
15494 case JCS_EXT_XBGR:
15495 + case JCS_EXT_ABGR:
15496 sse2fct=jsimd_ycc_extxbgr_convert_sse2;
15497 break;
15498 case JCS_EXT_XRGB:
15499 + case JCS_EXT_ARGB:
15500 sse2fct=jsimd_ycc_extxrgb_convert_sse2;
15501 break;
15502 default:
15503 @@ -132,6 +201,7 @@
15504 sse2fct(cinfo->output_width, input_buf, input_row, output_buf, num_rows);
15505 }
15506
15507 +#ifndef JPEG_DECODE_ONLY
15508 GLOBAL(int)
15509 jsimd_can_h2v2_downsample (void)
15510 {
15511 @@ -177,6 +247,7 @@
15512 compptr->width_in_blocks,
15513 input_data, output_data);
15514 }
15515 +#endif
15516
15517 GLOBAL(int)
15518 jsimd_can_h2v2_upsample (void)
15519 @@ -260,7 +331,7 @@
15520 JSAMPARRAY input_data,
15521 JSAMPARRAY * output_data_ptr)
15522 {
15523 - jsimd_h2v1_fancy_upsample_sse2(cinfo->max_v_samp_factor,
15524 + jsimd_h2v2_fancy_upsample_sse2(cinfo->max_v_samp_factor,
15525 compptr->downsampled_width,
15526 input_data, output_data_ptr);
15527 }
15528 @@ -320,6 +391,7 @@
15529 sse2fct=jsimd_h2v2_extrgb_merged_upsample_sse2;
15530 break;
15531 case JCS_EXT_RGBX:
15532 + case JCS_EXT_RGBA:
15533 sse2fct=jsimd_h2v2_extrgbx_merged_upsample_sse2;
15534 break;
15535 case JCS_EXT_BGR:
15536 @@ -326,12 +398,15 @@
15537 sse2fct=jsimd_h2v2_extbgr_merged_upsample_sse2;
15538 break;
15539 case JCS_EXT_BGRX:
15540 + case JCS_EXT_BGRA:
15541 sse2fct=jsimd_h2v2_extbgrx_merged_upsample_sse2;
15542 break;
15543 case JCS_EXT_XBGR:
15544 + case JCS_EXT_ABGR:
15545 sse2fct=jsimd_h2v2_extxbgr_merged_upsample_sse2;
15546 break;
15547 case JCS_EXT_XRGB:
15548 + case JCS_EXT_ARGB:
15549 sse2fct=jsimd_h2v2_extxrgb_merged_upsample_sse2;
15550 break;
15551 default:
15552 @@ -356,6 +431,7 @@
15553 sse2fct=jsimd_h2v1_extrgb_merged_upsample_sse2;
15554 break;
15555 case JCS_EXT_RGBX:
15556 + case JCS_EXT_RGBA:
15557 sse2fct=jsimd_h2v1_extrgbx_merged_upsample_sse2;
15558 break;
15559 case JCS_EXT_BGR:
15560 @@ -362,12 +438,15 @@
15561 sse2fct=jsimd_h2v1_extbgr_merged_upsample_sse2;
15562 break;
15563 case JCS_EXT_BGRX:
15564 + case JCS_EXT_BGRA:
15565 sse2fct=jsimd_h2v1_extbgrx_merged_upsample_sse2;
15566 break;
15567 case JCS_EXT_XBGR:
15568 + case JCS_EXT_ABGR:
15569 sse2fct=jsimd_h2v1_extxbgr_merged_upsample_sse2;
15570 break;
15571 case JCS_EXT_XRGB:
15572 + case JCS_EXT_ARGB:
15573 sse2fct=jsimd_h2v1_extxrgb_merged_upsample_sse2;
15574 break;
15575 default:
15576 @@ -378,6 +457,7 @@
15577 sse2fct(cinfo->output_width, input_buf, in_row_group_ctr, output_buf);
15578 }
15579
15580 +#ifndef JPEG_DECODE_ONLY
15581 GLOBAL(int)
15582 jsimd_can_convsamp (void)
15583 {
15584 @@ -528,6 +608,7 @@
15585 {
15586 jsimd_quantize_float_sse2(coef_block, divisors, workspace);
15587 }
15588 +#endif
15589
15590 GLOBAL(int)
15591 jsimd_can_idct_2x2 (void)
15592 @@ -677,4 +758,3 @@
15593 jsimd_idct_float_sse2(compptr->dct_table, coef_block,
15594 output_buf, output_col);
15595 }
15596 -
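
Note on the `IS_ALIGNED` change in simd/jsimd_x86_64.c above: the patch replaces the `(unsigned)` cast with `(size_t)`, which avoids converting a 64-bit pointer to a narrower integer type. The sketch below shows the same low-bits test using `uintptr_t` from `<stdint.h>`, which is a substitution on my part rather than what the patch does. The `SK_`/`sk_` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* True when the low 'order' bits of the address are zero,
 * i.e. the pointer is aligned to (1 << order) bytes. */
#define SK_IS_ALIGNED(ptr, order) \
  ((((uintptr_t)(ptr)) & ((((uintptr_t)1) << (order)) - 1)) == 0)

#define SK_IS_ALIGNED_SSE(ptr)  (SK_IS_ALIGNED(ptr, 4))  /* 16-byte alignment */

/* A 16-byte aligned buffer to exercise the check (C11 _Alignas). */
static _Alignas(16) unsigned char sk_buf[32];

/* Function wrapper so the check can be used where a macro is awkward. */
static int
sk_is_aligned_sse(const void *p)
{
  return SK_IS_ALIGNED_SSE(p);
}
```

With this, `sk_is_aligned_sse(sk_buf)` and `sk_is_aligned_sse(sk_buf + 16)` hold, while `sk_is_aligned_sse(sk_buf + 1)` does not.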
15597 Index: simd/jsimdcfg.inc.h
15598 ===================================================================
15599 --- simd/jsimdcfg.inc.h (revision 829)
15600 +++ simd/jsimdcfg.inc.h (working copy)
15601 @@ -15,26 +15,54 @@
15602 #include "../jmorecfg.h"
15603 #include "jsimd.h"
15604
15605 -#define define(var) %define _cpp_protection_##var
15606 -#define definev(var) %define _cpp_protection_##var var
15607 -
15608 ;
15609 ; -- jpeglib.h
15610 ;
15611
15612 -definev(DCTSIZE)
15613 -definev(DCTSIZE2)
15614 +%define _cpp_protection_DCTSIZE DCTSIZE
15615 +%define _cpp_protection_DCTSIZE2 DCTSIZE2
15616
15617 ;
15618 ; -- jmorecfg.h
15619 ;
15620
15621 -definev(RGB_RED)
15622 -definev(RGB_GREEN)
15623 -definev(RGB_BLUE)
15624 +%define _cpp_protection_RGB_RED RGB_RED
15625 +%define _cpp_protection_RGB_GREEN RGB_GREEN
15626 +%define _cpp_protection_RGB_BLUE RGB_BLUE
15627 +%define _cpp_protection_RGB_PIXELSIZE RGB_PIXELSIZE
15628
15629 -definev(RGB_PIXELSIZE)
15630 +%define _cpp_protection_EXT_RGB_RED EXT_RGB_RED
15631 +%define _cpp_protection_EXT_RGB_GREEN EXT_RGB_GREEN
15632 +%define _cpp_protection_EXT_RGB_BLUE EXT_RGB_BLUE
15633 +%define _cpp_protection_EXT_RGB_PIXELSIZE EXT_RGB_PIXELSIZE
15634
15635 +%define _cpp_protection_EXT_RGBX_RED EXT_RGBX_RED
15636 +%define _cpp_protection_EXT_RGBX_GREEN EXT_RGBX_GREEN
15637 +%define _cpp_protection_EXT_RGBX_BLUE EXT_RGBX_BLUE
15638 +%define _cpp_protection_EXT_RGBX_PIXELSIZE EXT_RGBX_PIXELSIZE
15639 +
15640 +%define _cpp_protection_EXT_BGR_RED EXT_BGR_RED
15641 +%define _cpp_protection_EXT_BGR_GREEN EXT_BGR_GREEN
15642 +%define _cpp_protection_EXT_BGR_BLUE EXT_BGR_BLUE
15643 +%define _cpp_protection_EXT_BGR_PIXELSIZE EXT_BGR_PIXELSIZE
15644 +
15645 +%define _cpp_protection_EXT_BGRX_RED EXT_BGRX_RED
15646 +%define _cpp_protection_EXT_BGRX_GREEN EXT_BGRX_GREEN
15647 +%define _cpp_protection_EXT_BGRX_BLUE EXT_BGRX_BLUE
15648 +%define _cpp_protection_EXT_BGRX_PIXELSIZE EXT_BGRX_PIXELSIZE
15649 +
15650 +%define _cpp_protection_EXT_XBGR_RED EXT_XBGR_RED
15651 +%define _cpp_protection_EXT_XBGR_GREEN EXT_XBGR_GREEN
15652 +%define _cpp_protection_EXT_XBGR_BLUE EXT_XBGR_BLUE
15653 +%define _cpp_protection_EXT_XBGR_PIXELSIZE EXT_XBGR_PIXELSIZE
15654 +
15655 +%define _cpp_protection_EXT_XRGB_RED EXT_XRGB_RED
15656 +%define _cpp_protection_EXT_XRGB_GREEN EXT_XRGB_GREEN
15657 +%define _cpp_protection_EXT_XRGB_BLUE EXT_XRGB_BLUE
15658 +%define _cpp_protection_EXT_XRGB_PIXELSIZE EXT_XRGB_PIXELSIZE
15659 +
15660 +%define RGBX_FILLER_0XFF 1
15661 +
15662 ; Representation of a single sample (pixel element value).
15663 ; On this SIMD implementation, this must be 'unsigned char'.
15664 ;
15665 @@ -42,7 +70,7 @@
15666 %define JSAMPLE byte ; unsigned char
15667 %define SIZEOF_JSAMPLE SIZEOF_BYTE ; sizeof(JSAMPLE)
15668
15669 -definev(CENTERJSAMPLE)
15670 +%define _cpp_protection_CENTERJSAMPLE CENTERJSAMPLE
15671
15672 ; Representation of a DCT frequency coefficient.
15673 ; On this SIMD implementation, this must be 'short'.
15674 @@ -95,74 +123,74 @@
15675 ; -- jsimd.h
15676 ;
15677
15678 -definev(JSIMD_NONE)
15679 -definev(JSIMD_MMX)
15680 -definev(JSIMD_3DNOW)
15681 -definev(JSIMD_SSE)
15682 -definev(JSIMD_SSE2)
15683 +%define _cpp_protection_JSIMD_NONE JSIMD_NONE
15684 +%define _cpp_protection_JSIMD_MMX JSIMD_MMX
15685 +%define _cpp_protection_JSIMD_3DNOW JSIMD_3DNOW
15686 +%define _cpp_protection_JSIMD_SSE JSIMD_SSE
15687 +%define _cpp_protection_JSIMD_SSE2 JSIMD_SSE2
15688
15689 ; Short forms of external names for systems with brain-damaged linkers.
15690 ;
15691 #ifdef NEED_SHORT_EXTERNAL_NAMES
15692 -definev(jpeg_simd_cpu_support)
15693 -definev(jsimd_rgb_ycc_convert_mmx)
15694 -definev(jsimd_ycc_rgb_convert_mmx)
15695 -definev(jconst_rgb_ycc_convert_sse2)
15696 -definev(jsimd_rgb_ycc_convert_sse2)
15697 -definev(jconst_ycc_rgb_convert_sse2)
15698 -definev(jsimd_ycc_rgb_convert_sse2)
15699 -definev(jsimd_h2v2_downsample_mmx)
15700 -definev(jsimd_h2v1_downsample_mmx)
15701 -definev(jsimd_h2v2_downsample_sse2)
15702 -definev(jsimd_h2v1_downsample_sse2)
15703 -definev(jsimd_h2v2_upsample_mmx)
15704 -definev(jsimd_h2v1_upsample_mmx)
15705 -definev(jsimd_h2v1_fancy_upsample_mmx)
15706 -definev(jsimd_h2v2_fancy_upsample_mmx)
15707 -definev(jsimd_h2v1_merged_upsample_mmx)
15708 -definev(jsimd_h2v2_merged_upsample_mmx)
15709 -definev(jsimd_h2v2_upsample_sse2)
15710 -definev(jsimd_h2v1_upsample_sse2)
15711 -definev(jconst_fancy_upsample_sse2)
15712 -definev(jsimd_h2v1_fancy_upsample_sse2)
15713 -definev(jsimd_h2v2_fancy_upsample_sse2)
15714 -definev(jconst_merged_upsample_sse2)
15715 -definev(jsimd_h2v1_merged_upsample_sse2)
15716 -definev(jsimd_h2v2_merged_upsample_sse2)
15717 -definev(jsimd_convsamp_mmx)
15718 -definev(jsimd_convsamp_sse2)
15719 -definev(jsimd_convsamp_float_3dnow)
15720 -definev(jsimd_convsamp_float_sse)
15721 -definev(jsimd_convsamp_float_sse2)
15722 -definev(jsimd_fdct_islow_mmx)
15723 -definev(jsimd_fdct_ifast_mmx)
15724 -definev(jconst_fdct_islow_sse2)
15725 -definev(jsimd_fdct_islow_sse2)
15726 -definev(jconst_fdct_ifast_sse2)
15727 -definev(jsimd_fdct_ifast_sse2)
15728 -definev(jsimd_fdct_float_3dnow)
15729 -definev(jconst_fdct_float_sse)
15730 -definev(jsimd_fdct_float_sse)
15731 -definev(jsimd_quantize_mmx)
15732 -definev(jsimd_quantize_sse2)
15733 -definev(jsimd_quantize_float_3dnow)
15734 -definev(jsimd_quantize_float_sse)
15735 -definev(jsimd_quantize_float_sse2)
15736 -definev(jsimd_idct_2x2_mmx)
15737 -definev(jsimd_idct_4x4_mmx)
15738 -definev(jconst_idct_red_sse2)
15739 -definev(jsimd_idct_2x2_sse2)
15740 -definev(jsimd_idct_4x4_sse2)
15741 -definev(jsimd_idct_islow_mmx)
15742 -definev(jsimd_idct_ifast_mmx)
15743 -definev(jconst_idct_islow_sse2)
15744 -definev(jsimd_idct_islow_sse2)
15745 -definev(jconst_idct_ifast_sse2)
15746 -definev(jsimd_idct_ifast_sse2)
15747 -definev(jsimd_idct_float_3dnow)
15748 -definev(jconst_idct_float_sse)
15749 -definev(jsimd_idct_float_sse)
15750 -definev(jconst_idct_float_sse2)
15751 -definev(jsimd_idct_float_sse2)
15752 +%define _cpp_protection_jpeg_simd_cpu_support jpeg_simd_cpu_support
15753 +%define _cpp_protection_jsimd_rgb_ycc_convert_mmx jsimd_rgb_ycc_convert_mmx
15754 +%define _cpp_protection_jsimd_ycc_rgb_convert_mmx jsimd_ycc_rgb_convert_mmx
15755 +%define _cpp_protection_jconst_rgb_ycc_convert_sse2 jconst_rgb_ycc_convert_sse2
15756 +%define _cpp_protection_jsimd_rgb_ycc_convert_sse2 jsimd_rgb_ycc_convert_sse2
15757 +%define _cpp_protection_jconst_ycc_rgb_convert_sse2 jconst_ycc_rgb_convert_sse2
15758 +%define _cpp_protection_jsimd_ycc_rgb_convert_sse2 jsimd_ycc_rgb_convert_sse2
15759 +%define _cpp_protection_jsimd_h2v2_downsample_mmx jsimd_h2v2_downsample_mmx
15760 +%define _cpp_protection_jsimd_h2v1_downsample_mmx jsimd_h2v1_downsample_mmx
15761 +%define _cpp_protection_jsimd_h2v2_downsample_sse2 jsimd_h2v2_downsample_sse2
15762 +%define _cpp_protection_jsimd_h2v1_downsample_sse2 jsimd_h2v1_downsample_sse2
15763 +%define _cpp_protection_jsimd_h2v2_upsample_mmx jsimd_h2v2_upsample_mmx
15764 +%define _cpp_protection_jsimd_h2v1_upsample_mmx jsimd_h2v1_upsample_mmx
15765 +%define _cpp_protection_jsimd_h2v1_fancy_upsample_mmx jsimd_h2v1_fancy_upsample_mmx
15766 +%define _cpp_protection_jsimd_h2v2_fancy_upsample_mmx jsimd_h2v2_fancy_upsample_mmx
15767 +%define _cpp_protection_jsimd_h2v1_merged_upsample_mmx jsimd_h2v1_merged_upsample_mmx
15768 +%define _cpp_protection_jsimd_h2v2_merged_upsample_mmx jsimd_h2v2_merged_upsample_mmx
15769 +%define _cpp_protection_jsimd_h2v2_upsample_sse2 jsimd_h2v2_upsample_sse2
15770 +%define _cpp_protection_jsimd_h2v1_upsample_sse2 jsimd_h2v1_upsample_sse2
15771 +%define _cpp_protection_jconst_fancy_upsample_sse2 jconst_fancy_upsample_sse2
15772 +%define _cpp_protection_jsimd_h2v1_fancy_upsample_sse2 jsimd_h2v1_fancy_upsample_sse2
15773 +%define _cpp_protection_jsimd_h2v2_fancy_upsample_sse2 jsimd_h2v2_fancy_upsample_sse2
15774 +%define _cpp_protection_jconst_merged_upsample_sse2 jconst_merged_upsample_sse2
15775 +%define _cpp_protection_jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_merged_upsample_sse2
15776 +%define _cpp_protection_jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_merged_upsample_sse2
15777 +%define _cpp_protection_jsimd_convsamp_mmx jsimd_convsamp_mmx
15778 +%define _cpp_protection_jsimd_convsamp_sse2 jsimd_convsamp_sse2
15779 +%define _cpp_protection_jsimd_convsamp_float_3dnow jsimd_convsamp_float_3dnow
15780 +%define _cpp_protection_jsimd_convsamp_float_sse jsimd_convsamp_float_sse
15781 +%define _cpp_protection_jsimd_convsamp_float_sse2 jsimd_convsamp_float_sse2
15782 +%define _cpp_protection_jsimd_fdct_islow_mmx jsimd_fdct_islow_mmx
15783 +%define _cpp_protection_jsimd_fdct_ifast_mmx jsimd_fdct_ifast_mmx
15784 +%define _cpp_protection_jconst_fdct_islow_sse2 jconst_fdct_islow_sse2
15785 +%define _cpp_protection_jsimd_fdct_islow_sse2 jsimd_fdct_islow_sse2
15786 +%define _cpp_protection_jconst_fdct_ifast_sse2 jconst_fdct_ifast_sse2
15787 +%define _cpp_protection_jsimd_fdct_ifast_sse2 jsimd_fdct_ifast_sse2
15788 +%define _cpp_protection_jsimd_fdct_float_3dnow jsimd_fdct_float_3dnow
15789 +%define _cpp_protection_jconst_fdct_float_sse jconst_fdct_float_sse
15790 +%define _cpp_protection_jsimd_fdct_float_sse jsimd_fdct_float_sse
15791 +%define _cpp_protection_jsimd_quantize_mmx jsimd_quantize_mmx
15792 +%define _cpp_protection_jsimd_quantize_sse2 jsimd_quantize_sse2
15793 +%define _cpp_protection_jsimd_quantize_float_3dnow jsimd_quantize_float_3dnow
15794 +%define _cpp_protection_jsimd_quantize_float_sse jsimd_quantize_float_sse
15795 +%define _cpp_protection_jsimd_quantize_float_sse2 jsimd_quantize_float_sse2
15796 +%define _cpp_protection_jsimd_idct_2x2_mmx jsimd_idct_2x2_mmx
15797 +%define _cpp_protection_jsimd_idct_4x4_mmx jsimd_idct_4x4_mmx
15798 +%define _cpp_protection_jconst_idct_red_sse2 jconst_idct_red_sse2
15799 +%define _cpp_protection_jsimd_idct_2x2_sse2 jsimd_idct_2x2_sse2
15800 +%define _cpp_protection_jsimd_idct_4x4_sse2 jsimd_idct_4x4_sse2
15801 +%define _cpp_protection_jsimd_idct_islow_mmx jsimd_idct_islow_mmx
15802 +%define _cpp_protection_jsimd_idct_ifast_mmx jsimd_idct_ifast_mmx
15803 +%define _cpp_protection_jconst_idct_islow_sse2 jconst_idct_islow_sse2
15804 +%define _cpp_protection_jsimd_idct_islow_sse2 jsimd_idct_islow_sse2
15805 +%define _cpp_protection_jconst_idct_ifast_sse2 jconst_idct_ifast_sse2
15806 +%define _cpp_protection_jsimd_idct_ifast_sse2 jsimd_idct_ifast_sse2
15807 +%define _cpp_protection_jsimd_idct_float_3dnow jsimd_idct_float_3dnow
15808 +%define _cpp_protection_jconst_idct_float_sse jconst_idct_float_sse
15809 +%define _cpp_protection_jsimd_idct_float_sse jsimd_idct_float_sse
15810 +%define _cpp_protection_jconst_idct_float_sse2 jconst_idct_float_sse2
15811 +%define _cpp_protection_jsimd_idct_float_sse2 jsimd_idct_float_sse2
15812 #endif /* NEED_SHORT_EXTERNAL_NAMES */
15813
15814 Index: simd/jsimdcpu.asm
15815 ===================================================================
15816 --- simd/jsimdcpu.asm (revision 829)
15817 +++ simd/jsimdcpu.asm (working copy)
15818 @@ -29,7 +29,7 @@
15819 ;
15820
15821 align 16
15822 - global EXTN(jpeg_simd_cpu_support)
15823 + global EXTN(jpeg_simd_cpu_support) PRIVATE
15824
15825 EXTN(jpeg_simd_cpu_support):
15826 push ebx
15827 @@ -100,3 +100,6 @@
15828 pop ebx
15829 ret
15830
15831 +; For some reason, the OS X linker does not honor the request to align the
15832 +; segment unless we do this.
15833 + align 16
15834 Index: simd/jsimdext.inc
15835 ===================================================================
15836 --- simd/jsimdext.inc (revision 829)
15837 +++ simd/jsimdext.inc (working copy)
15838 @@ -2,6 +2,7 @@
15839 ; jsimdext.inc - common declarations
15840 ;
15841 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
15842 +; Copyright 2010 D. R. Commander
15843 ;
15844 ; Based on
15845 ; x86 SIMD extension for IJG JPEG library - version 1.02
15846 @@ -37,9 +38,28 @@
15847
15848 ; -- segment definition --
15849 ;
15850 +%ifdef __YASM_VER__
15851 +%define SEG_TEXT .text align=16
15852 +%define SEG_CONST .rdata align=16
15853 +%else
15854 %define SEG_TEXT .text align=16 public use32 class=CODE
15855 %define SEG_CONST .rdata align=16 public use32 class=CONST
15856 +%endif
15857
15858 +%elifdef WIN64 ; ----(nasm -fwin64 -DWIN64 ...)--------
15859 +; * Microsoft Visual C++
15860 +
15861 +; -- segment definition --
15862 +;
15863 +%ifdef __YASM_VER__
15864 +%define SEG_TEXT .text align=16
15865 +%define SEG_CONST .rdata align=16
15866 +%else
15867 +%define SEG_TEXT .text align=16 public use64 class=CODE
15868 +%define SEG_CONST .rdata align=16 public use64 class=CONST
15869 +%endif
15870 +%define EXTN(name) name ; foo() -> foo
15871 +
15872 %elifdef OBJ32 ; ----(nasm -fobj -DOBJ32 ...)----------
15873 ; * Borland C++ (Win32)
15874
15875 @@ -53,6 +73,12 @@
15876 ; * *BSD family Unix using elf format
15877 ; * Unix System V, including Solaris x86, UnixWare and SCO Unix
15878
15879 +; PIC is the default on Linux
15880 +%define PIC
15881 +
15882 +; mark stack as non-executable
15883 +section .note.GNU-stack noalloc noexec nowrite progbits
15884 +
15885 ; -- segment definition --
15886 ;
15887 %ifdef __x86_64__
15888 @@ -280,7 +306,44 @@
15889 %endmacro
15890
15891 %ifdef __x86_64__
15892 +
15893 +%ifdef WIN64
15894 +
15895 %imacro collect_args 0
15896 + push r12
15897 + push r13
15898 + push r14
15899 + push r15
15900 + mov r10, rcx
15901 + mov r11, rdx
15902 + mov r12, r8
15903 + mov r13, r9
15904 + mov r14, [rax+48]
15905 + mov r15, [rax+56]
15906 + push rsi
15907 + push rdi
15908 + sub rsp, SIZEOF_XMMWORD
15909 + movaps XMMWORD [rsp], xmm6
15910 + sub rsp, SIZEOF_XMMWORD
15911 + movaps XMMWORD [rsp], xmm7
15912 +%endmacro
15913 +
15914 +%imacro uncollect_args 0
15915 + movaps xmm7, XMMWORD [rsp]
15916 + add rsp, SIZEOF_XMMWORD
15917 + movaps xmm6, XMMWORD [rsp]
15918 + add rsp, SIZEOF_XMMWORD
15919 + pop rdi
15920 + pop rsi
15921 + pop r15
15922 + pop r14
15923 + pop r13
15924 + pop r12
15925 +%endmacro
15926 +
15927 +%else
15928 +
15929 +%imacro collect_args 0
15930 push r10
15931 push r11
15932 push r12
15933 @@ -306,9 +369,21 @@
15934
15935 %endif
15936
15937 +%endif
15938 +
15939 ; --------------------------------------------------------------------------
15940 ; Defines picked up from the C headers
15941 ;
15942 %include "jsimdcfg.inc"
15943
15944 +; Begin chromium edits
15945 +%ifdef MACHO ; ----(nasm -fmacho -DMACHO ...)--------
15946 +%define PRIVATE :private_extern
15947 +%elifdef ELF ; ----(nasm -felf[64] -DELF ...)------------
15948 +%define PRIVATE :hidden
15949 +%else
15950 +%define PRIVATE
15951 +%endif
15952 +; End chromium edits
15953 +
15954 ; --------------------------------------------------------------------------
15955 Index: turbojpeg.h
15956 ===================================================================
15957 --- turbojpeg.h (revision 829)
15958 +++ turbojpeg.h (working copy)
15959 @@ -1,231 +1,932 @@
15960 -/* Copyright (C)2004 Landmark Graphics Corporation
15961 - * Copyright (C)2005, 2006 Sun Microsystems, Inc.
15962 - * Copyright (C)2009 D. R. Commander
15963 +/*
15964 + * Copyright (C)2009-2013 D. R. Commander. All Rights Reserved.
15965 *
15966 - * This library is free software and may be redistributed and/or modified under
15967 - * the terms of the wxWindows Library License, Version 3.1 or (at your option)
15968 - * any later version. The full license is in the LICENSE.txt file included
15969 - * with this distribution.
15970 + * Redistribution and use in source and binary forms, with or without
15971 + * modification, are permitted provided that the following conditions are met:
15972 *
15973 - * This library is distributed in the hope that it will be useful,
15974 - * but WITHOUT ANY WARRANTY; without even the implied warranty of
15975 - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
15976 - * wxWindows Library License for more details.
15977 + * - Redistributions of source code must retain the above copyright notice,
15978 + * this list of conditions and the following disclaimer.
15979 + * - Redistributions in binary form must reproduce the above copyright notice,
15980 + * this list of conditions and the following disclaimer in the documentation
15981 + * and/or other materials provided with the distribution.
15982 + * - Neither the name of the libjpeg-turbo Project nor the names of its
15983 + * contributors may be used to endorse or promote products derived from this
15984 + * software without specific prior written permission.
15985 + *
15986 + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS",
15987 + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
15988 + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
15989 + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE
15990 + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
15991 + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
15992 + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
15993 + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
15994 + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
15995 + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
15996 + * POSSIBILITY OF SUCH DAMAGE.
15997 */
15998
15999 -#if (defined(_MSC_VER) || defined(__CYGWIN__) || defined(__MINGW32__)) && defined(_WIN32) && defined(DLLDEFINE)
16000 +#ifndef __TURBOJPEG_H__
16001 +#define __TURBOJPEG_H__
16002 +
16003 +#if defined(_WIN32) && defined(DLLDEFINE)
16004 #define DLLEXPORT __declspec(dllexport)
16005 #else
16006 #define DLLEXPORT
16007 #endif
16008 -
16009 #define DLLCALL
16010
16011 -/* Subsampling */
16012 -#define NUMSUBOPT 4
16013
16014 -enum {TJ_444=0, TJ_422, TJ_420, TJ_GRAYSCALE};
16015 +/**
16016 + * @addtogroup TurboJPEG
16017 + * TurboJPEG API. This API provides an interface for generating, decoding, and
16018 + * transforming planar YUV and JPEG images in memory.
16019 + *
16020 + * @{
16021 + */
16022
16023 -/* Flags */
16024 -#define TJ_BGR 1
16025 -#define TJ_BOTTOMUP 2
16026 -#define TJ_FORCEMMX 8 /* Force IPP to use MMX code even if SSE available */
16027 -#define TJ_FORCESSE 16 /* Force IPP to use SSE1 code even if SSE2 available */
16028 -#define TJ_FORCESSE2 32 /* Force IPP to use SSE2 code (useful if auto-detect is not working properly) */
16029 -#define TJ_ALPHAFIRST 64 /* BGR buffer is ABGR and RGB buffer is ARGB */
16030 -#define TJ_FORCESSE3 128 /* Force IPP to use SSE3 code (useful if auto-detect is not working properly) */
16031 -#define TJ_FASTUPSAMPLE 256 /* Use fast, inaccurate 4:2:2 and 4:2:0 YUV upsampling routines in libjpeg decompressor */
16032
16033 +/**
16034 + * The number of chrominance subsampling options
16035 + */
16036 +#define TJ_NUMSAMP 5
16037 +
16038 +/**
16039 + * Chrominance subsampling options.
16040 + * When an image is converted from the RGB to the YCbCr colorspace as part of
16041 + * the JPEG compression process, some of the Cb and Cr (chrominance) components
16042 + * can be discarded or averaged together to produce a smaller image with little
16043 + * perceptible loss of image clarity (the human eye is more sensitive to small
16044 + * changes in brightness than small changes in color.) This is called
16045 + * "chrominance subsampling".
16046 + * <p>
16047 + * NOTE: Technically, the JPEG format uses the YCbCr colorspace, but per the
16048 + * convention of the digital video community, the TurboJPEG API uses "YUV" to
16049 + * refer to an image format consisting of Y, Cb, and Cr image planes.
16050 + */
16051 +enum TJSAMP
2199 +{ | 16052 +{
2200 + if (simd_support & JSIMD_ARM_NEON) | 16053 + /**
2201 + jsimd_ycc_rgb565_convert_neon(cinfo->output_width, input_buf, input_row, | 16054 + * 4:4:4 chrominance subsampling (no chrominance subsampling). The JPEG or
2202 + output_buf, num_rows); | 16055 + * YUV image will contain one chrominance component for every pixel in the
2203 +} | 16056 + * source image.
2204 + | 16057 + */
2205 +GLOBAL(int) | 16058 + TJSAMP_444=0,
2206 +jsimd_can_h2v2_downsample (void) | 16059 + /**
16060 + * 4:2:2 chrominance subsampling. The JPEG or YUV image will contain one
16061 + * chrominance component for every 2x1 block of pixels in the source image.
16062 + */
16063 + TJSAMP_422,
16064 + /**
16065 + * 4:2:0 chrominance subsampling. The JPEG or YUV image will contain one
16066 + * chrominance component for every 2x2 block of pixels in the source image.
16067 + */
16068 + TJSAMP_420,
16069 + /**
16070 + * Grayscale. The JPEG or YUV image will contain no chrominance components.
16071 + */
16072 + TJSAMP_GRAY,
16073 + /**
16074 + * 4:4:0 chrominance subsampling. The JPEG or YUV image will contain one
16075 + * chrominance component for every 1x2 block of pixels in the source image.
16076 + * Note that 4:4:0 subsampling is not fully accelerated in libjpeg-turbo.
16077 + */
16078 + TJSAMP_440
16079 +};
16080 +
16081 +/**
16082 + * MCU block width (in pixels) for a given level of chrominance subsampling.
16083 + * MCU block sizes:
16084 + * - 8x8 for no subsampling or grayscale
16085 + * - 16x8 for 4:2:2
16086 + * - 8x16 for 4:4:0
16087 + * - 16x16 for 4:2:0
16088 + */
16089 +static const int tjMCUWidth[TJ_NUMSAMP] = {8, 16, 16, 8, 8};
16090 +
16091 +/**
16092 + * MCU block height (in pixels) for a given level of chrominance subsampling.
16093 + * MCU block sizes:
16094 + * - 8x8 for no subsampling or grayscale
16095 + * - 16x8 for 4:2:2
16096 + * - 8x16 for 4:4:0
16097 + * - 16x16 for 4:2:0
16098 + */
16099 +static const int tjMCUHeight[TJ_NUMSAMP] = {8, 8, 16, 8, 16};
16100 +
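Reviewer note: the two MCU tables above are indexed by a TJSAMP value. A minimal sketch of how a caller might use them to round image dimensions up to whole MCU blocks follows. The tables and enum are copied from this hunk; the helper names (round_up_to_mcu, mcu_aligned_width, mcu_aligned_height) are hypothetical and do not appear in the patch.

```c
#include <assert.h>

/* Copied from the turbojpeg.h hunk above. */
#define TJ_NUMSAMP 5
enum TJSAMP { TJSAMP_444 = 0, TJSAMP_422, TJSAMP_420, TJSAMP_GRAY, TJSAMP_440 };
static const int tjMCUWidth[TJ_NUMSAMP]  = {8, 16, 16, 8, 8};
static const int tjMCUHeight[TJ_NUMSAMP] = {8, 8, 16, 8, 16};

/* Hypothetical helper (not part of the patch): round a pixel count up to
 * the next multiple of the MCU dimension. */
static int round_up_to_mcu(int pixels, int mcu)
{
  assert(pixels >= 0 && mcu > 0);
  return (pixels + mcu - 1) / mcu * mcu;
}

/* Hypothetical helper: image width once padded out to whole MCU blocks. */
static int mcu_aligned_width(int width, int subsamp)
{
  assert(subsamp >= 0 && subsamp < TJ_NUMSAMP);
  return round_up_to_mcu(width, tjMCUWidth[subsamp]);
}

/* Hypothetical helper: image height once padded out to whole MCU blocks. */
static int mcu_aligned_height(int height, int subsamp)
{
  assert(subsamp >= 0 && subsamp < TJ_NUMSAMP);
  return round_up_to_mcu(height, tjMCUHeight[subsamp]);
}
```

For a 227x149 image with TJSAMP_420 (16x16 MCU blocks), these helpers give 240x160.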
16101 +
16102 +/**
16103 + * The number of pixel formats
16104 + */
16105 +#define TJ_NUMPF 11
16106 +
16107 +/**
16108 + * Pixel formats
16109 + */
16110 +enum TJPF
2207 +{ | 16111 +{
2208 + init_simd(); | 16112 + /**
2209 + | 16113 + * RGB pixel format. The red, green, and blue components in the image are
2210 + return 0; | 16114 + * stored in 3-byte pixels in the order R, G, B from lowest to highest byte
2211 +} | 16115 + * address within each pixel.
2212 + | 16116 + */
2213 +GLOBAL(int) | 16117 + TJPF_RGB=0,
2214 +jsimd_can_h2v1_downsample (void) | 16118 + /**
16119 + * BGR pixel format. The red, green, and blue components in the image are
16120 + * stored in 3-byte pixels in the order B, G, R from lowest to highest byte
16121 + * address within each pixel.
16122 + */
16123 + TJPF_BGR,
16124 + /**
16125 + * RGBX pixel format. The red, green, and blue components in the image are
16126 + * stored in 4-byte pixels in the order R, G, B from lowest to highest byte
16127 + * address within each pixel. The X component is ignored when compressing
16128 + * and undefined when decompressing.
16129 + */
16130 + TJPF_RGBX,
16131 + /**
16132 + * BGRX pixel format. The red, green, and blue components in the image are
16133 + * stored in 4-byte pixels in the order B, G, R from lowest to highest byte
16134 + * address within each pixel. The X component is ignored when compressing
16135 + * and undefined when decompressing.
16136 + */
16137 + TJPF_BGRX,
16138 + /**
16139 + * XBGR pixel format. The red, green, and blue components in the image are
16140 + * stored in 4-byte pixels in the order R, G, B from highest to lowest byte
16141 + * address within each pixel. The X component is ignored when compressing
16142 + * and undefined when decompressing.
16143 + */
16144 + TJPF_XBGR,
16145 + /**
16146 + * XRGB pixel format. The red, green, and blue components in the image are
16147 + * stored in 4-byte pixels in the order B, G, R from highest to lowest byte
16148 + * address within each pixel. The X component is ignored when compressing
16149 + * and undefined when decompressing.
16150 + */
16151 + TJPF_XRGB,
16152 + /**
16153 + * Grayscale pixel format. Each 1-byte pixel represents a luminance
16154 + * (brightness) level from 0 to 255.
16155 + */
16156 + TJPF_GRAY,
16157 + /**
16158 + * RGBA pixel format. This is the same as @ref TJPF_RGBX, except that when
16159 + * decompressing, the X component is guaranteed to be 0xFF, which can be
16160 + * interpreted as an opaque alpha channel.
16161 + */
16162 + TJPF_RGBA,
16163 + /**
16164 + * BGRA pixel format. This is the same as @ref TJPF_BGRX, except that when
16165 + * decompressing, the X component is guaranteed to be 0xFF, which can be
16166 + * interpreted as an opaque alpha channel.
16167 + */
16168 + TJPF_BGRA,
16169 + /**
16170 + * ABGR pixel format. This is the same as @ref TJPF_XBGR, except that when
16171 + * decompressing, the X component is guaranteed to be 0xFF, which can be
16172 + * interpreted as an opaque alpha channel.
16173 + */
16174 + TJPF_ABGR,
16175 + /**
16176 + * ARGB pixel format. This is the same as @ref TJPF_XRGB, except that when
16177 + * decompressing, the X component is guaranteed to be 0xFF, which can be
16178 + * interpreted as an opaque alpha channel.
16179 + */
16180 + TJPF_ARGB
16181 +};
16182 +
16183 +/**
16184 + * Red offset (in bytes) for a given pixel format. This specifies the number
16185 + * of bytes that the red component is offset from the start of the pixel. For
16186 + * instance, if a pixel of format TJ_BGRX is stored in <tt>char pixel[]</tt>,
16187 + * then the red component will be <tt>pixel[tjRedOffset[TJ_BGRX]]</tt>.
16188 + */
16189 +static const int tjRedOffset[TJ_NUMPF] = {0, 2, 0, 2, 3, 1, 0, 0, 2, 3, 1};
16190 +/**
16191 + * Green offset (in bytes) for a given pixel format. This specifies the number
16192 + * of bytes that the green component is offset from the start of the pixel.
16193 + * For instance, if a pixel of format TJ_BGRX is stored in
16194 + * <tt>char pixel[]</tt>, then the green component will be
16195 + * <tt>pixel[tjGreenOffset[TJ_BGRX]]</tt>.
16196 + */
16197 +static const int tjGreenOffset[TJ_NUMPF] = {1, 1, 1, 1, 2, 2, 0, 1, 1, 2, 2};
16198 +/**
16199 + * Blue offset (in bytes) for a given pixel format. This specifies the number
16200 + * of bytes that the Blue component is offset from the start of the pixel. For
16201 + * instance, if a pixel of format TJ_BGRX is stored in <tt>char pixel[]</tt>,
16202 + * then the blue component will be <tt>pixel[tjBlueOffset[TJ_BGRX]]</tt>.
16203 + */
16204 +static const int tjBlueOffset[TJ_NUMPF] = {2, 0, 2, 0, 1, 3, 0, 2, 0, 1, 3};
16205 +
16206 +/**
16207 + * Pixel size (in bytes) for a given pixel format.
16208 + */
16209 +static const int tjPixelSize[TJ_NUMPF] = {3, 3, 4, 4, 4, 4, 1, 4, 4, 4, 4};
16210 +
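Reviewer note: the offset and pixel-size tables above can be combined to read a pixel without branching on its format. The sketch below copies the tables and the TJPF enum from this hunk; rgb_triplet and read_pixel are hypothetical names used only for illustration. The doc comments write TJ_BGRX, which appears to correspond to the TJPF_BGRX enumerator declared in this same header.

```c
#include <assert.h>

/* Copied from the turbojpeg.h hunk above. */
#define TJ_NUMPF 11
enum TJPF {
  TJPF_RGB = 0, TJPF_BGR, TJPF_RGBX, TJPF_BGRX, TJPF_XBGR, TJPF_XRGB,
  TJPF_GRAY, TJPF_RGBA, TJPF_BGRA, TJPF_ABGR, TJPF_ARGB
};
static const int tjRedOffset[TJ_NUMPF]   = {0, 2, 0, 2, 3, 1, 0, 0, 2, 3, 1};
static const int tjGreenOffset[TJ_NUMPF] = {1, 1, 1, 1, 2, 2, 0, 1, 1, 2, 2};
static const int tjBlueOffset[TJ_NUMPF]  = {2, 0, 2, 0, 1, 3, 0, 2, 0, 1, 3};
static const int tjPixelSize[TJ_NUMPF]   = {3, 3, 4, 4, 4, 4, 1, 4, 4, 4, 4};

/* Hypothetical result type (not part of the patch). */
typedef struct { unsigned char r, g, b; } rgb_triplet;

/* Hypothetical helper: fetch pixel x from one row of an image stored in
 * the given pixel format. For TJPF_GRAY all three offsets are 0, so the
 * luminance byte is returned in r, g, and b. */
static rgb_triplet read_pixel(const unsigned char *row, int x, int pixelFormat)
{
  const unsigned char *p;
  rgb_triplet out;

  assert(row != 0 && x >= 0);
  assert(pixelFormat >= 0 && pixelFormat < TJ_NUMPF);

  p = row + (long)x * tjPixelSize[pixelFormat];
  out.r = p[tjRedOffset[pixelFormat]];
  out.g = p[tjGreenOffset[pixelFormat]];
  out.b = p[tjBlueOffset[pixelFormat]];
  return out;
}
```

Indexing through the tables keeps the caller independent of byte order, which is the point of exposing them in the header.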
16211 +
16212 +/**
16213 + * The uncompressed source/destination image is stored in bottom-up (Windows,
16214 + * OpenGL) order, not top-down (X11) order.
16215 + */
16216 +#define TJFLAG_BOTTOMUP 2
16217 +/**
16218 + * Turn off CPU auto-detection and force TurboJPEG to use MMX code (if the
16219 + * underlying codec supports it.)
16220 + */
16221 +#define TJFLAG_FORCEMMX 8
16222 +/**
16223 + * Turn off CPU auto-detection and force TurboJPEG to use SSE code (if the
16224 + * underlying codec supports it.)
16225 + */
16226 +#define TJFLAG_FORCESSE 16
16227 +/**
16228 + * Turn off CPU auto-detection and force TurboJPEG to use SSE2 code (if the
16229 + * underlying codec supports it.)
16230 + */
16231 +#define TJFLAG_FORCESSE2 32
16232 +/**
16233 + * Turn off CPU auto-detection and force TurboJPEG to use SSE3 code (if the
16234 + * underlying codec supports it.)
16235 + */
16236 +#define TJFLAG_FORCESSE3 128
16237 +/**
16238 + * When decompressing an image that was compressed using chrominance
16239 + * subsampling, use the fastest chrominance upsampling algorithm available in
16240 + * the underlying codec. The default is to use smooth upsampling, which
16241 + * creates a smooth transition between neighboring chrominance components in
16242 + * order to reduce upsampling artifacts in the decompressed image.
16243 + */
16244 +#define TJFLAG_FASTUPSAMPLE 256
16245 +/**
16246 + * Disable buffer (re)allocation. If passed to #tjCompress2() or
16247 + * #tjTransform(), this flag will cause those functions to generate an error if
16248 + * the JPEG image buffer is invalid or too small rather than attempting to
16249 + * allocate or reallocate that buffer. This reproduces the behavior of earlier
16250 + * versions of TurboJPEG.
16251 + */
16252 +#define TJFLAG_NOREALLOC 1024
16253 +/**
16254 + * Use the fastest DCT/IDCT algorithm available in the underlying codec. The
16255 + * default if this flag is not specified is implementation-specific. For
16256 + * example, the implementation of TurboJPEG for libjpeg[-turbo] uses the fast
16257 + * algorithm by default when compressing, because this has been shown to have
16258 + * only a very slight effect on accuracy, but it uses the accurate algorithm
16259 + * when decompressing, because this has been shown to have a larger effect.
16260 + */
16261 +#define TJFLAG_FASTDCT 2048
16262 +/**
16263 + * Use the most accurate DCT/IDCT algorithm available in the underlying codec.
16264 + * The default if this flag is not specified is implementation-specific. For
16265 + * example, the implementation of TurboJPEG for libjpeg[-turbo] uses the fast
16266 + * algorithm by default when compressing, because this has been shown to have
16267 + * only a very slight effect on accuracy, but it uses the accurate algorithm
16268 + * when decompressing, because this has been shown to have a larger effect.
16269 + */
16270 +#define TJFLAG_ACCURATEDCT 4096
16271 +
16272 +
16273 +/**
16274 + * The number of transform operations
16275 + */
16276 +#define TJ_NUMXOP 8
16277 +
16278 +/**
16279 + * Transform operations for #tjTransform()
16280 + */
16281 +enum TJXOP
2215 +{ | 16282 +{
2216 + init_simd(); | 16283 + /**
2217 + | 16284 + * Do not transform the position of the image pixels
2218 + return 0; | 16285 + */
2219 +} | 16286 + TJXOP_NONE=0,
2220 + | 16287 + /**
2221 +GLOBAL(void) | 16288 + * Flip (mirror) image horizontally. This transform is imperfect if there
2222 +jsimd_h2v2_downsample (j_compress_ptr cinfo, jpeg_component_info * compptr, | 16289 + * are any partial MCU blocks on the right edge (see #TJXOPT_PERFECT.)
2223 + JSAMPARRAY input_data, JSAMPARRAY output_data) | 16290 + */
16291 + TJXOP_HFLIP,
16292 + /**
16293 + * Flip (mirror) image vertically. This transform is imperfect if there are
16294 + * any partial MCU blocks on the bottom edge (see #TJXOPT_PERFECT.)
16295 + */
16296 + TJXOP_VFLIP,
16297 + /**
16298 + * Transpose image (flip/mirror along upper left to lower right axis.) This
16299 + * transform is always perfect.
16300 + */
16301 + TJXOP_TRANSPOSE,
16302 + /**
16303 + * Transverse transpose image (flip/mirror along upper right to lower left
16304 + * axis.) This transform is imperfect if there are any partial MCU blocks in
16305 + * the image (see #TJXOPT_PERFECT.)
16306 + */
16307 + TJXOP_TRANSVERSE,
16308 + /**
16309 + * Rotate image clockwise by 90 degrees. This transform is imperfect if
16310 + * there are any partial MCU blocks on the bottom edge (see
16311 + * #TJXOPT_PERFECT.)
16312 + */
16313 + TJXOP_ROT90,
16314 + /**
16315 + * Rotate image 180 degrees. This transform is imperfect if there are any
16316 + * partial MCU blocks in the image (see #TJXOPT_PERFECT.)
16317 + */
16318 + TJXOP_ROT180,
16319 + /**
16320 + * Rotate image counter-clockwise by 90 degrees. This transform is imperfect
16321 + * if there are any partial MCU blocks on the right edge (see
16322 + * #TJXOPT_PERFECT.)
16323 + */
16324 + TJXOP_ROT270
16325 +};
16326 +
16327 +
16328 +/**
16329 + * This option will cause #tjTransform() to return an error if the transform is
16330 + * not perfect. Lossless transforms operate on MCU blocks, whose size depends
16331 + * on the level of chrominance subsampling used (see #tjMCUWidth
16332 + * and #tjMCUHeight.) If the image's width or height is not evenly divisible
16333 + * by the MCU block size, then there will be partial MCU blocks on the right
16334 + * and/or bottom edges. It is not possible to move these partial MCU blocks to
16335 + * the top or left of the image, so any transform that would require that is
16336 + * "imperfect." If this option is not specified, then any partial MCU blocks
16337 + * that cannot be transformed will be left in place, which will create
16338 + * odd-looking strips on the right or bottom edge of the image.
16339 + */
16340 +#define TJXOPT_PERFECT 1
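Reviewer note: the "perfect transform" rules are spread across the TJXOP doc comments above. The sketch below restates them as a single predicate, assuming the comments are an exact description of the behaviour. It is not the check the library itself performs; transform_is_perfect is a hypothetical name, and the tables and enums are copied from this hunk.

```c
#include <assert.h>

/* Copied from the turbojpeg.h hunk above. */
#define TJ_NUMSAMP 5
enum TJSAMP { TJSAMP_444 = 0, TJSAMP_422, TJSAMP_420, TJSAMP_GRAY, TJSAMP_440 };
static const int tjMCUWidth[TJ_NUMSAMP]  = {8, 16, 16, 8, 8};
static const int tjMCUHeight[TJ_NUMSAMP] = {8, 8, 16, 8, 16};
enum TJXOP {
  TJXOP_NONE = 0, TJXOP_HFLIP, TJXOP_VFLIP, TJXOP_TRANSPOSE,
  TJXOP_TRANSVERSE, TJXOP_ROT90, TJXOP_ROT180, TJXOP_ROT270
};

/* Hypothetical helper (not part of the patch): returns 1 if, per the doc
 * comments, the transform is perfect for an image of the given size and
 * subsampling level, and 0 otherwise. */
static int transform_is_perfect(int op, int width, int height, int subsamp)
{
  int partial_right, partial_bottom;

  assert(width > 0 && height > 0);
  assert(subsamp >= 0 && subsamp < TJ_NUMSAMP);

  /* A partial MCU block exists when the dimension is not a multiple of
   * the MCU size for this subsampling level. */
  partial_right  = (width  % tjMCUWidth[subsamp])  != 0;
  partial_bottom = (height % tjMCUHeight[subsamp]) != 0;

  switch (op) {
    case TJXOP_HFLIP:
    case TJXOP_ROT270:
      return !partial_right;
    case TJXOP_VFLIP:
    case TJXOP_ROT90:
      return !partial_bottom;
    case TJXOP_TRANSVERSE:
    case TJXOP_ROT180:
      return !(partial_right || partial_bottom);
    default:
      /* TJXOP_NONE and TJXOP_TRANSPOSE are documented as always perfect. */
      return 1;
  }
}
```

A caller could run such a check before passing TJXOPT_PERFECT, to decide up front whether to fall back to TJXOPT_TRIM.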
16341 +/**
16342 + * This option will cause #tjTransform() to discard any partial MCU blocks that
16343 + * cannot be transformed.
16344 + */
16345 +#define TJXOPT_TRIM 2
16346 +/**
16347 + * This option will enable lossless cropping. See #tjTransform() for more
16348 + * information.
16349 + */
16350 +#define TJXOPT_CROP 4
16351 +/**
16352 + * This option will discard the color data in the input image and produce
16353 + * a grayscale output image.
16354 + */
16355 +#define TJXOPT_GRAY 8
16356 +/**
16357 + * This option will prevent #tjTransform() from outputting a JPEG image for
16358 + * this particular transform (this can be used in conjunction with a custom
16359 + * filter to capture the transformed DCT coefficients without transcoding
16360 + * them.)
16361 + */
16362 +#define TJXOPT_NOOUTPUT 16
16363 +
16364 +
16365 +/**
16366 + * Scaling factor
16367 + */
16368 +typedef struct
2224 +{ | 16369 +{
2225 +} | 16370 + /**
2226 + | 16371 + * Numerator
2227 +GLOBAL(void) | 16372 + */
2228 +jsimd_h2v1_downsample (j_compress_ptr cinfo, jpeg_component_info * compptr, | 16373 + int num;
2229 + JSAMPARRAY input_data, JSAMPARRAY output_data) | 16374 + /**
16375 + * Denominator
16376 + */
16377 + int denom;
16378 +} tjscalingfactor;
16379 +
16380 +/**
16381 + * Cropping region
16382 + */
16383 +typedef struct
2230 +{ | 16384 +{
2231 +} | 16385 + /**
2232 + | 16386 + * The left boundary of the cropping region. This must be evenly divisible
2233 +GLOBAL(int) | 16387 + * by the MCU block width (see #tjMCUWidth.)
2234 +jsimd_can_h2v2_upsample (void) | 16388 + */
16389 + int x;
16390 + /**
16391 + * The upper boundary of the cropping region. This must be evenly divisible
16392 + * by the MCU block height (see #tjMCUHeight.)
16393 + */
16394 + int y;
16395 + /**
16396 + * The width of the cropping region. Setting this to 0 is the equivalent of
16397 + * setting it to the width of the source JPEG image - x.
16398 + */
16399 + int w;
16400 + /**
16401 + * The height of the cropping region. Setting this to 0 is the equivalent of
16402 + * setting it to the height of the source JPEG image - y.
16403 + */
16404 + int h;
16405 +} tjregion;
16406 +
16407 +/**
16408 + * Lossless transform
16409 + */
16410 +typedef struct tjtransform
2235 +{ | 16411 +{
2236 + init_simd(); | 16412 + /**
2237 + | 16413 + * Cropping region
2238 + return 0; | 16414 + */
2239 +} | 16415 + tjregion r;
2240 + | 16416 + /**
2241 +GLOBAL(int) | 16417 + * One of the @ref TJXOP "transform operations"
2242 +jsimd_can_h2v1_upsample (void) | 16418 + */
2243 +{ | 16419 + int op;
2244 + init_simd(); | 16420 + /**
2245 + | 16421 + * The bitwise OR of one of more of the @ref TJXOPT_CROP "transform options"
2246 + return 0; | 16422 + */
2247 +} | 16423 + int options;
2248 + | 16424 + /**
2249 +GLOBAL(void) | 16425 + * Arbitrary data that can be accessed within the body of the callback
2250 +jsimd_h2v2_upsample (j_decompress_ptr cinfo, | 16426 + * function
2251 + jpeg_component_info * compptr, | 16427 + */
2252 + JSAMPARRAY input_data, | 16428 + void *data;
2253 + JSAMPARRAY * output_data_ptr) | 16429 + /**
2254 +{ | 16430 + * A callback function that can be used to modify the DCT coefficients
2255 +} | 16431 + * after they are losslessly transformed but before they are transcoded to a
2256 + | 16432 + * new JPEG image. This allows for custom filters or other transformations
2257 +GLOBAL(void) | 16433 + * to be applied in the frequency domain.
2258 +jsimd_h2v1_upsample (j_decompress_ptr cinfo, | 16434 + *
2259 + jpeg_component_info * compptr, | 16435 + * @param coeffs pointer to an array of transformed DCT coefficients. (NOTE:
2260 + JSAMPARRAY input_data, | 16436 + * this pointer is not guaranteed to be valid once the callback
2261 + JSAMPARRAY * output_data_ptr) | 16437 + * returns, so applications wishing to hand off the DCT coefficients
2262 +{ | 16438 + * to another function or library should make a copy of them within
2263 +} | 16439 + * the body of the callback.)
2264 + | 16440 + * @param arrayRegion #tjregion structure containing the width and height of
2265 +GLOBAL(int) | 16441 + * the array pointed to by <tt>coeffs</tt> as well as its offset
2266 +jsimd_can_h2v2_fancy_upsample (void) | 16442 + * relative to the component plane. TurboJPEG implementations may
2267 +{ | 16443 + * choose to split each component plane into multiple DCT coefficient
2268 + init_simd(); | 16444 + * arrays and call the callback function once for each array.
2269 + | 16445 + * @param planeRegion #tjregion structure containing the width and height of
2270 + return 0; | 16446 + * the component plane to which <tt>coeffs</tt> belongs
2271 +} | 16447 + * @param componentID ID number of the component plane to which
2272 + | 16448 + * <tt>coeffs</tt> belongs (Y, Cb, and Cr have, respectively, ID's of
2273 +GLOBAL(int) | 16449 + * 0, 1, and 2 in typical JPEG images.)
2274 +jsimd_can_h2v1_fancy_upsample (void) | 16450 + * @param transformID ID number of the transformed image to which
2275 +{ | 16451 + * <tt>coeffs</tt> belongs. This is the same as the index of the
2276 + init_simd(); | 16452 + * transform in the <tt>transforms</tt> array that was passed to
2277 + | 16453 + * #tjTransform().
2278 + return 0; | 16454 + * @param transform a pointer to a #tjtransform structure that specifies the
2279 +} | 16455 + * parameters and/or cropping region for this transform
2280 + | 16456 + *
2281 +GLOBAL(void) | 16457 + * @return 0 if the callback was successful, or -1 if an error occurred.
2282 +jsimd_h2v2_fancy_upsample (j_decompress_ptr cinfo, | 16458 + */
2283 + jpeg_component_info * compptr, | 16459 + int (*customFilter)(short *coeffs, tjregion arrayRegion,
2284 + JSAMPARRAY input_data, | 16460 + tjregion planeRegion, int componentIndex, int transformIndex,
2285 + JSAMPARRAY * output_data_ptr) | 16461 + struct tjtransform *transform);
2286 +{ | 16462 +} tjtransform;
2287 +} | 16463 +
2288 + | 16464 +/**
2289 +GLOBAL(void) | 16465 + * TurboJPEG instance handle
2290 +jsimd_h2v1_fancy_upsample (j_decompress_ptr cinfo, | 16466 + */
2291 + jpeg_component_info * compptr, | 16467 typedef void* tjhandle;
2292 + JSAMPARRAY input_data, | 16468
2293 + JSAMPARRAY * output_data_ptr) | 16469 -#define TJPAD(p) (((p)+3)&(~3))
2294 +{ | 16470 -#ifndef max
2295 +} | 16471 - #define max(a,b) ((a)>(b)?(a):(b))
2296 + | 16472 -#endif
2297 +GLOBAL(int) | 16473
2298 +jsimd_can_h2v2_merged_upsample (void) | 16474 +/**
2299 +{ | 16475 + * Pad the given width to the nearest 32-bit boundary
2300 + init_simd(); | 16476 + */
2301 + | 16477 +#define TJPAD(width) (((width)+3)&(~3))
2302 + return 0; | 16478 +
2303 +} | 16479 +/**
2304 + | 16480 + * Compute the scaled value of <tt>dimension</tt> using the given scaling
2305 +GLOBAL(int) | 16481 + * factor. This macro performs the integer equivalent of <tt>ceil(dimension *
2306 +jsimd_can_h2v1_merged_upsample (void) | 16482 + * scalingFactor)</tt>.
2307 +{ | 16483 + */
2308 + init_simd(); | 16484 +#define TJSCALED(dimension, scalingFactor) ((dimension * scalingFactor.num \
2309 + | 16485 + + scalingFactor.denom - 1) / scalingFactor.denom)
2310 + return 0; | 16486 +
2311 +} | 16487 +
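Reviewer note: a minimal sketch of TJPAD and TJSCALED in use, with both macros and the tjscalingfactor struct copied from this hunk. The wrapper functions padded_pitch and scaled_dimension are hypothetical. One thing worth noticing in the macro text: TJSCALED does not parenthesize its dimension argument, so an expression such as w + 1 passed directly would expand as w + 1 * scalingFactor.num. Wrapping the macro in a function, as below, evaluates the argument first and sidesteps that.

```c
#include <assert.h>

/* Copied from the turbojpeg.h hunk above. */
typedef struct { int num; int denom; } tjscalingfactor;
#define TJPAD(width) (((width)+3)&(~3))
#define TJSCALED(dimension, scalingFactor) ((dimension * scalingFactor.num \
  + scalingFactor.denom - 1) / scalingFactor.denom)

/* Hypothetical helper (not part of the patch): bytes per line when each
 * line is padded to a 32-bit boundary, as for Windows bitmaps. */
static int padded_pitch(int width, int pixelSize)
{
  assert(width > 0 && pixelSize > 0);
  return TJPAD(width * pixelSize);
}

/* Hypothetical helper: integer ceil(dimension * num / denom). Because
 * `dimension` is a plain variable here, the missing parentheses in
 * TJSCALED cannot change the result. */
static int scaled_dimension(int dimension, tjscalingfactor sf)
{
  assert(dimension >= 0 && sf.num > 0 && sf.denom > 0);
  return TJSCALED(dimension, sf);
}
```

For a 227-pixel-wide RGB image, padded_pitch(227, 3) is 684 (681 rounded up to a multiple of 4), and scaling 227 by 3/8 gives 86.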
2312 + | 16488 #ifdef __cplusplus
2313 +GLOBAL(void) | 16489 extern "C" {
2314 +jsimd_h2v2_merged_upsample (j_decompress_ptr cinfo, | 16490 #endif
2315 + JSAMPIMAGE input_buf, | 16491
2316 + JDIMENSION in_row_group_ctr, | 16492 -/* API follows */
2317 + JSAMPARRAY output_buf) | 16493
2318 +{ | 16494 +/**
2319 +} | 16495 + * Create a TurboJPEG compressor instance.
2320 + | 16496 + *
2321 +GLOBAL(void) | 16497 + * @return a handle to the newly-created instance, or NULL if an error
2322 +jsimd_h2v1_merged_upsample (j_decompress_ptr cinfo, | 16498 + * occurred (see #tjGetErrorStr().)
2323 + JSAMPIMAGE input_buf, | 16499 + */
2324 + JDIMENSION in_row_group_ctr, | 16500 +DLLEXPORT tjhandle DLLCALL tjInitCompress(void);
2325 + JSAMPARRAY output_buf) | 16501
2326 +{ | 16502 -/*
2327 +} | 16503 - tjhandle tjInitCompress(void)
2328 + | 16504
2329 +GLOBAL(int) | 16505 - Creates a new JPEG compressor instance, allocates memory for the structures,
2330 +jsimd_can_convsamp (void) | 16506 - and returns a handle to the instance. Most applications will only
2331 +{ | 16507 - need to call this once at the beginning of the program or once for each
2332 + init_simd(); | 16508 - concurrent thread. Don't try to create a new instance every time you
2333 + | 16509 - compress an image, because this will cause performance to suffer.
2334 + return 0; | 16510 -
2335 +} | 16511 - RETURNS: NULL on error
2336 + | 16512 +/**
2337 +GLOBAL(int) | 16513 + * Compress an RGB or grayscale image into a JPEG image.
2338 +jsimd_can_convsamp_float (void) | 16514 + *
2339 +{ | 16515 + * @param handle a handle to a TurboJPEG compressor or transformer instance
2340 + init_simd(); | 16516 + * @param srcBuf pointer to an image buffer containing RGB or grayscale pixels
2341 + | 16517 + * to be compressed
2342 + return 0; | 16518 + * @param width width (in pixels) of the source image
2343 +} | 16519 + * @param pitch bytes per line of the source image. Normally, this should be
2344 + | 16520 + * <tt>width * #tjPixelSize[pixelFormat]</tt> if the image is unpadded,
2345 +GLOBAL(void) | 16521 + * or <tt>#TJPAD(width * #tjPixelSize[pixelFormat])</tt> if each line of
2346 +jsimd_convsamp (JSAMPARRAY sample_data, JDIMENSION start_col, | 16522 + * the image is padded to the nearest 32-bit boundary, as is the case
2347 + DCTELEM * workspace) | 16523 + * for Windows bitmaps. You can also be clever and use this parameter
2348 +{ | 16524 + * to skip lines, etc. Setting this parameter to 0 is the equivalent of
2349 +} | 16525 + * setting it to <tt>width * #tjPixelSize[pixelFormat]</tt>.
2350 + | 16526 + * @param height height (in pixels) of the source image
2351 +GLOBAL(void) | 16527 + * @param pixelFormat pixel format of the source image (see @ref TJPF
2352 +jsimd_convsamp_float (JSAMPARRAY sample_data, JDIMENSION start_col, | 16528 + * "Pixel formats".)
2353 + FAST_FLOAT * workspace) | 16529 + * @param jpegBuf address of a pointer to an image buffer that will receive the
2354 +{ | 16530 + * JPEG image. TurboJPEG has the ability to reallocate the JPEG buffer
2355 +} | 16531 + * to accommodate the size of the JPEG image. Thus, you can choose to:
2356 + | 16532 + * -# pre-allocate the JPEG buffer with an arbitrary size using
2357 +GLOBAL(int) | 16533 + * #tjAlloc() and let TurboJPEG grow the buffer as needed,
2358 +jsimd_can_fdct_islow (void) | 16534 + * -# set <tt>*jpegBuf</tt> to NULL to tell TurboJPEG to allocate the
2359 +{ | 16535 + * buffer for you, or
2360 + init_simd(); | 16536 + * -# pre-allocate the buffer to a "worst case" size determined by
2361 + | 16537 + * calling #tjBufSize(). This should ensure that the buffer never has
2362 + return 0; | 16538 + * to be re-allocated (setting #TJFLAG_NOREALLOC guarantees this.)
2363 +} | 16539 + * .
2364 + | 16540 + * If you choose option 1, <tt>*jpegSize</tt> should be set to the
2365 +GLOBAL(int) | 16541 + * size of your pre-allocated buffer. In any case, unless you have
2366 +jsimd_can_fdct_ifast (void) | 16542 + * set #TJFLAG_NOREALLOC, you should always check <tt>*jpegBuf</tt> upon
2367 +{ | 16543 + * return from this function, as it may have changed.
2368 + init_simd(); | 16544 + * @param jpegSize pointer to an unsigned long variable that holds the size of
2369 + | 16545 + * the JPEG image buffer. If <tt>*jpegBuf</tt> points to a
2370 + return 0; | 16546 + * pre-allocated buffer, then <tt>*jpegSize</tt> should be set to the
2371 +} | 16547 + * size of the buffer. Upon return, <tt>*jpegSize</tt> will contain the
2372 + | 16548 + * size of the JPEG image (in bytes.)
2373 +GLOBAL(int) | 16549 + * @param jpegSubsamp the level of chrominance subsampling to be used when
2374 +jsimd_can_fdct_float (void) | 16550 + * generating the JPEG image (see @ref TJSAMP
2375 +{ | 16551 + * "Chrominance subsampling options".)
2376 + init_simd(); | 16552 + * @param jpegQual the image quality of the generated JPEG image (1 = worst,
2377 + | 16553 + 100 = best)
2378 + return 0; | 16554 + * @param flags the bitwise OR of one or more of the @ref TJFLAG_BOTTOMUP
2379 +} | 16555 + * "flags".
2380 + | 16556 + *
2381 +GLOBAL(void) | 16557 + * @return 0 if successful, or -1 if an error occurred (see #tjGetErrorStr().)
2382 +jsimd_fdct_islow (DCTELEM * data) | 16558 */
2383 +{ | 16559 -DLLEXPORT tjhandle DLLCALL tjInitCompress(void);
2384 +} | 16560 +DLLEXPORT int DLLCALL tjCompress2(tjhandle handle, unsigned char *srcBuf,
2385 + | 16561 + int width, int pitch, int height, int pixelFormat, unsigned char **jpegBuf,
2386 +GLOBAL(void) | 16562 + unsigned long *jpegSize, int jpegSubsamp, int jpegQual, int flags);
2387 +jsimd_fdct_ifast (DCTELEM * data) | 16563
2388 +{ 16564
2389 +} 16565 -/*
2390 + 16566 - int tjCompress(tjhandle j,
2391 +GLOBAL(void) 16567 - unsigned char *srcbuf, int width, int pitch, int height, int pixelsize,
2392 +jsimd_fdct_float (FAST_FLOAT * data) 16568 - unsigned char *dstbuf, unsigned long *size,
2393 +{ 16569 - int jpegsubsamp, int jpegqual, int flags)
2394 +} 16570 +/**
2395 + 16571 + * The maximum size of the buffer (in bytes) required to hold a JPEG image with
2396 +GLOBAL(int) 16572 + * the given parameters. The number of bytes returned by this function is
2397 +jsimd_can_quantize (void) 16573 + * larger than the size of the uncompressed source image. The reason for this
2398 +{ 16574 + * is that the JPEG format uses 16-bit coefficients, and it is thus possible
2399 + init_simd(); 16575 + * for a very high-quality JPEG image with very high-frequency content to
2400 + 16576 + * expand rather than compress when converted to the JPEG format. Such images
2401 + return 0; 16577 + * represent a very rare corner case, but since there is no way to predict the
2402 +} 16578 + * size of a JPEG image prior to compression, the corner case has to be
2403 + 16579 + * handled.
2404 +GLOBAL(int) 16580 + *
2405 +jsimd_can_quantize_float (void) 16581 + * @param width width of the image (in pixels)
2406 +{ 16582 + * @param height height of the image (in pixels)
2407 + init_simd(); 16583 + * @param jpegSubsamp the level of chrominance subsampling to be used when
2408 + 16584 + * generating the JPEG image (see @ref TJSAMP
2409 + return 0; 16585 + * "Chrominance subsampling options".)
2410 +} 16586 + *
2411 + 16587 + * @return the maximum size of the buffer (in bytes) required to hold the
2412 +GLOBAL(void) 16588 + * image, or -1 if the arguments are out of bounds.
2413 +jsimd_quantize (JCOEFPTR coef_block, DCTELEM * divisors, 16589 + */
2414 + DCTELEM * workspace) 16590 +DLLEXPORT unsigned long DLLCALL tjBufSize(int width, int height,
2415 +{ 16591 + int jpegSubsamp);
2416 +} 16592
2417 + 16593 - [INPUT] j = instance handle previously returned from a call to
2418 +GLOBAL(void) 16594 - tjInitCompress()
2419 +jsimd_quantize_float (JCOEFPTR coef_block, FAST_FLOAT * divisors, 16595 - [INPUT] srcbuf = pointer to user-allocated image buffer containing pixels in
2420 + FAST_FLOAT * workspace) 16596 - RGB(A) or BGR(A) form
2421 +{ 16597 - [INPUT] width = width (in pixels) of the source image
2422 +} 16598 - [INPUT] pitch = bytes per line of the source image (width*pixelsize if the
2423 + 16599 - bitmap is unpadded, else TJPAD(width*pixelsize) if each line of the bitmap
2424 +GLOBAL(int) 16600 - is padded to the nearest 32-bit boundary, such as is the case for Windows
2425 +jsimd_can_idct_2x2 (void) 16601 - bitmaps. You can also be clever and use this parameter to skip lines, etc.,
2426 +{ 16602 - as long as the pitch is greater than 0.)
2427 + init_simd(); 16603 - [INPUT] height = height (in pixels) of the source image
2428 + 16604 - [INPUT] pixelsize = size (in bytes) of each pixel in the source image
2429 + /* The code is optimised for these values only */ 16605 - RGBA and BGRA: 4, RGB and BGR: 3
2430 + if (DCTSIZE != 8) 16606 - [INPUT] dstbuf = pointer to user-allocated image buffer which will receive
2431 + return 0; 16607 - the JPEG image. Use the macro TJBUFSIZE(width, height) to determine
2432 + if (sizeof(JCOEF) != 2) 16608 - the appropriate size for this buffer based on the image width and height.
2433 + return 0; 16609 - [OUTPUT] size = pointer to unsigned long which receives the size (in bytes)
2434 + if (BITS_IN_JSAMPLE != 8) 16610 - of the compressed image
2435 + return 0; 16611 - [INPUT] jpegsubsamp = Specifies either 4:2:0, 4:2:2, or 4:4:4 subsampling.
2436 + if (sizeof(JDIMENSION) != 4) 16612 - When the image is converted from the RGB to YCbCr colorspace as part of the
2437 + return 0; 16613 - JPEG compression process, every other Cb and Cr (chrominance) pixel can be
2438 + if (sizeof(ISLOW_MULT_TYPE) != 2) 16614 - discarded to produce a smaller image with little perceptible loss of
2439 + return 0; 16615 - image clarity (the human eye is more sensitive to small changes in
2440 + 16616 - brightness than small changes in color.)
2441 + if (simd_support & JSIMD_ARM_NEON) 16617
2442 + return 1; 16618 - TJ_420: 4:2:0 subsampling. Discards every other Cb, Cr pixel in both
2443 + 16619 - horizontal and vertical directions.
2444 + return 0; 16620 - TJ_422: 4:2:2 subsampling. Discards every other Cb, Cr pixel only in
2445 +} 16621 - the horizontal direction.
2446 + 16622 - TJ_444: no subsampling.
2447 +GLOBAL(int) 16623 - TJ_GRAYSCALE: Generate grayscale JPEG image
2448 +jsimd_can_idct_4x4 (void) 16624 +/**
2449 +{ 16625 + * The size of the buffer (in bytes) required to hold a YUV planar image with
2450 + init_simd(); 16626 + * the given parameters.
2451 + 16627 + *
2452 + /* The code is optimised for these values only */ 16628 + * @param width width of the image (in pixels)
2453 + if (DCTSIZE != 8) 16629 + * @param height height of the image (in pixels)
2454 + return 0; 16630 + * @param subsamp level of chrominance subsampling in the image (see
2455 + if (sizeof(JCOEF) != 2) 16631 + * @ref TJSAMP "Chrominance subsampling options".)
2456 + return 0; 16632 + *
2457 + if (BITS_IN_JSAMPLE != 8) 16633 + * @return the size of the buffer (in bytes) required to hold the image, or
2458 + return 0; 16634 + * -1 if the arguments are out of bounds.
2459 + if (sizeof(JDIMENSION) != 4) 16635 + */
2460 + return 0; 16636 +DLLEXPORT unsigned long DLLCALL tjBufSizeYUV(int width, int height,
2461 + if (sizeof(ISLOW_MULT_TYPE) != 2) 16637 + int subsamp);
2462 + return 0; 16638
2463 + 16639 - [INPUT] jpegqual = JPEG quality (an integer between 0 and 100 inclusive.)
2464 + if (simd_support & JSIMD_ARM_NEON) 16640 - [INPUT] flags = the bitwise OR of one or more of the following
2465 + return 1; 16641
2466 + 16642 - TJ_BGR: The components of each pixel in the source image are stored in
2467 + return 0; 16643 - B,G,R order, not R,G,B
2468 +} 16644 - TJ_BOTTOMUP: The source image is stored in bottom-up (Windows) order,
2469 + 16645 - not top-down
2470 +GLOBAL(void) 16646 - TJ_FORCEMMX: Valid only for the Intel Performance Primitives implementation
2471 +jsimd_idct_2x2 (j_decompress_ptr cinfo, jpeg_component_info * compptr, 16647 - of this codec-- force IPP to use MMX code (bypass CPU auto-detection)
2472 + JCOEFPTR coef_block, JSAMPARRAY output_buf, 16648 - TJ_FORCESSE: Valid only for the Intel Performance Primitives implementation
2473 + JDIMENSION output_col) 16649 - of this codec-- force IPP to use SSE code (bypass CPU auto-detection)
2474 +{ 16650 - TJ_FORCESSE2: Valid only for the Intel Performance Primitives implementation
2475 + if (simd_support & JSIMD_ARM_NEON) 16651 - of this codec-- force IPP to use SSE2 code (bypass CPU auto-detection)
2476 + jsimd_idct_2x2_neon(compptr->dct_table, coef_block, output_buf, 16652 - TJ_FORCESSE3: Valid only for the Intel Performance Primitives implementation
2477 + output_col); 16653 - of this codec-- force IPP to use SSE3 code (bypass CPU auto-detection)
2478 +} 16654 +/**
2479 + 16655 + * Encode an RGB or grayscale image into a YUV planar image. This function
2480 +GLOBAL(void) 16656 + * uses the accelerated color conversion routines in TurboJPEG's underlying
2481 +jsimd_idct_4x4 (j_decompress_ptr cinfo, jpeg_component_info * compptr, 16657 + * codec to produce a planar YUV image that is suitable for X Video.
2482 + JCOEFPTR coef_block, JSAMPARRAY output_buf, 16658 + * Specifically, if the chrominance components are subsampled along the
2483 + JDIMENSION output_col) 16659 + * horizontal dimension, then the width of the luminance plane is padded to the
2484 +{ 16660 + * nearest multiple of 2 in the output image (same goes for the height of the
2485 + if (simd_support & JSIMD_ARM_NEON) 16661 + * luminance plane, if the chrominance components are subsampled along the
2486 + jsimd_idct_4x4_neon(compptr->dct_table, coef_block, output_buf, 16662 + * vertical dimension.) Also, each line of each plane in the output image is
2487 + output_col); 16663 + * padded to 4 bytes. Although this will work with any subsampling option, it
2488 +} 16664 + * is really only useful in combination with TJ_420, which produces an image
2489 + 16665 + * compatible with the I420 (AKA "YUV420P") format.
2490 +GLOBAL(int) 16666 + * <p>
2491 +jsimd_can_idct_islow (void) 16667 + * NOTE: Technically, the JPEG format uses the YCbCr colorspace, but per the
2492 +{ 16668 + * convention of the digital video community, the TurboJPEG API uses "YUV" to
2493 + init_simd(); 16669 + * refer to an image format consisting of Y, Cb, and Cr image planes.
2494 + 16670 + *
2495 + /* The code is optimised for these values only */ 16671 + * @param handle a handle to a TurboJPEG compressor or transformer instance
2496 + if (DCTSIZE != 8) 16672 + * @param srcBuf pointer to an image buffer containing RGB or grayscale pixels
2497 + return 0; 16673 + * to be encoded
2498 + if (sizeof(JCOEF) != 2) 16674 + * @param width width (in pixels) of the source image
2499 + return 0; 16675 + * @param pitch bytes per line of the source image. Normally, this should be
2500 + if (BITS_IN_JSAMPLE != 8) 16676 + * <tt>width * #tjPixelSize[pixelFormat]</tt> if the image is unpadded,
2501 + return 0; 16677 + * or <tt>#TJPAD(width * #tjPixelSize[pixelFormat])</tt> if each line of
2502 + if (sizeof(JDIMENSION) != 4) 16678 + * the image is padded to the nearest 32-bit boundary, as is the case
2503 + return 0; 16679 + * for Windows bitmaps. You can also be clever and use this parameter
2504 + if (sizeof(ISLOW_MULT_TYPE) != 2) 16680 + * to skip lines, etc. Setting this parameter to 0 is the equivalent of
2505 + return 0; 16681 + * setting it to <tt>width * #tjPixelSize[pixelFormat]</tt>.
2506 + 16682 + * @param height height (in pixels) of the source image
2507 + if (simd_support & JSIMD_ARM_NEON) 16683 + * @param pixelFormat pixel format of the source image (see @ref TJPF
2508 + return 1; 16684 + * "Pixel formats".)
2509 + 16685 + * @param dstBuf pointer to an image buffer that will receive the YUV image.
2510 + return 0; 16686 + * Use #tjBufSizeYUV() to determine the appropriate size for this buffer
2511 +} 16687 + * based on the image width, height, and level of chrominance
2512 + 16688 + * subsampling.
2513 +GLOBAL(int) 16689 + * @param subsamp the level of chrominance subsampling to be used when
2514 +jsimd_can_idct_ifast (void) 16690 + * generating the YUV image (see @ref TJSAMP
2515 +{ 16691 + * "Chrominance subsampling options".)
2516 + init_simd(); 16692 + * @param flags the bitwise OR of one or more of the @ref TJFLAG_BOTTOMUP
2517 + 16693 + * "flags".
2518 + /* The code is optimised for these values only */ 16694 + *
2519 + if (DCTSIZE != 8) 16695 + * @return 0 if successful, or -1 if an error occurred (see #tjGetErrorStr().)
2520 + return 0; 16696 +*/
2521 + if (sizeof(JCOEF) != 2) 16697 +DLLEXPORT int DLLCALL tjEncodeYUV2(tjhandle handle,
2522 + return 0; 16698 + unsigned char *srcBuf, int width, int pitch, int height, int pixelFormat,
2523 + if (BITS_IN_JSAMPLE != 8) 16699 + unsigned char *dstBuf, int subsamp, int flags);
2524 + return 0; 16700
2525 + if (sizeof(JDIMENSION) != 4) 16701 - RETURNS: 0 on success, -1 on error
2526 + return 0; 16702 +
2527 + if (sizeof(IFAST_MULT_TYPE) != 2) 16703 +/**
2528 + return 0; 16704 + * Create a TurboJPEG decompressor instance.
2529 + if (IFAST_SCALE_BITS != 2) 16705 + *
2530 + return 0; 16706 + * @return a handle to the newly-created instance, or NULL if an error
2531 + 16707 + * occurred (see #tjGetErrorStr().)
2532 + if (simd_support & JSIMD_ARM_NEON) 16708 */
2533 + return 1; 16709 -DLLEXPORT int DLLCALL tjCompress(tjhandle j,
2534 + 16710 - unsigned char *srcbuf, int width, int pitch, int height, int pixelsize,
2535 + return 0; 16711 - unsigned char *dstbuf, unsigned long *size,
2536 +} 16712 - int jpegsubsamp, int jpegqual, int flags);
2537 + 16713 +DLLEXPORT tjhandle DLLCALL tjInitDecompress(void);
2538 +GLOBAL(int) 16714
2539 +jsimd_can_idct_float (void) 16715 -DLLEXPORT unsigned long DLLCALL TJBUFSIZE(int width, int height);
2540 +{ 16716
2541 + init_simd(); 16717 -/*
2542 + 16718 - tjhandle tjInitDecompress(void)
2543 + return 0; 16719 +/**
2544 +} 16720 + * Retrieve information about a JPEG image without decompressing it.
2545 + 16721 + *
2546 +GLOBAL(void) 16722 + * @param handle a handle to a TurboJPEG decompressor or transformer instance
2547 +jsimd_idct_islow (j_decompress_ptr cinfo, jpeg_component_info * compptr, 16723 + * @param jpegBuf pointer to a buffer containing a JPEG image
2548 + JCOEFPTR coef_block, JSAMPARRAY output_buf, 16724 + * @param jpegSize size of the JPEG image (in bytes)
2549 + JDIMENSION output_col) 16725 + * @param width pointer to an integer variable that will receive the width (in
2550 +{ 16726 + * pixels) of the JPEG image
2551 + if (simd_support & JSIMD_ARM_NEON) 16727 + * @param height pointer to an integer variable that will receive the height
2552 + jsimd_idct_islow_neon(compptr->dct_table, coef_block, output_buf, 16728 + * (in pixels) of the JPEG image
2553 + output_col); 16729 + * @param jpegSubsamp pointer to an integer variable that will receive the
2554 +} 16730 + * level of chrominance subsampling used when compressing the JPEG image
2555 + 16731 + * (see @ref TJSAMP "Chrominance subsampling options".)
2556 +GLOBAL(void) 16732 + *
2557 +jsimd_idct_ifast (j_decompress_ptr cinfo, jpeg_component_info * compptr, 16733 + * @return 0 if successful, or -1 if an error occurred (see #tjGetErrorStr().)
2558 + JCOEFPTR coef_block, JSAMPARRAY output_buf, 16734 +*/
2559 + JDIMENSION output_col) 16735 +DLLEXPORT int DLLCALL tjDecompressHeader2(tjhandle handle,
2560 +{ 16736 + unsigned char *jpegBuf, unsigned long jpegSize, int *width, int *height,
2561 + if (simd_support & JSIMD_ARM_NEON) 16737 + int *jpegSubsamp);
2562 + jsimd_idct_ifast_neon(compptr->dct_table, coef_block, output_buf, 16738
2563 + output_col); 16739 - Creates a new JPEG decompressor instance, allocates memory for the
2564 +} 16740 - structures, and returns a handle to the instance. Most applications will
2565 + 16741 - only need to call this once at the beginning of the program or once for each
2566 +GLOBAL(void) 16742 - concurrent thread. Don't try to create a new instance every time you
2567 +jsimd_idct_float (j_decompress_ptr cinfo, jpeg_component_info * compptr, 16743 - decompress an image, because this will cause performance to suffer.
2568 + JCOEFPTR coef_block, JSAMPARRAY output_buf, 16744
2569 + JDIMENSION output_col) 16745 - RETURNS: NULL on error
2570 +{ 16746 +/**
2571 +} 16747 + * Returns a list of fractional scaling factors that the JPEG decompressor in
2572 Index: simd/jsimd_arm64_neon.S 16748 + * this implementation of TurboJPEG supports.
2573 new file mode 100644 16749 + *
16750 + * @param numscalingfactors pointer to an integer variable that will receive
16751 + * the number of elements in the list
16752 + *
16753 + * @return a pointer to a list of fractional scaling factors, or NULL if an
16754 + * error is encountered (see #tjGetErrorStr().)
16755 */
16756 -DLLEXPORT tjhandle DLLCALL tjInitDecompress(void);
16757 +DLLEXPORT tjscalingfactor* DLLCALL tjGetScalingFactors(int *numscalingfactors);
16758
16759
16760 -/*
16761 - int tjDecompressHeader(tjhandle j,
16762 - unsigned char *srcbuf, unsigned long size,
16763 - int *width, int *height)
16764 +/**
16765 + * Decompress a JPEG image to an RGB or grayscale image.
16766 + *
16767 + * @param handle a handle to a TurboJPEG decompressor or transformer instance
16768 + * @param jpegBuf pointer to a buffer containing the JPEG image to decompress
16769 + * @param jpegSize size of the JPEG image (in bytes)
16770 + * @param dstBuf pointer to an image buffer that will receive the decompressed
16771 + * image. This buffer should normally be <tt>pitch * scaledHeight</tt>
16772 + * bytes in size, where <tt>scaledHeight</tt> can be determined by
16773 + * calling #TJSCALED() with the JPEG image height and one of the scaling
16774 + * factors returned by #tjGetScalingFactors(). The <tt>dstBuf</tt>
16775 + * pointer may also be used to decompress into a specific region of a
16776 + * larger buffer.
16777 + * @param width desired width (in pixels) of the destination image. If this is
16778 + * different than the width of the JPEG image being decompressed, then
16779 + * TurboJPEG will use scaling in the JPEG decompressor to generate the
16780 + * largest possible image that will fit within the desired width. If
16781 + * <tt>width</tt> is set to 0, then only the height will be considered
16782 + * when determining the scaled image size.
16783 + * @param pitch bytes per line of the destination image. Normally, this is
16784 + * <tt>scaledWidth * #tjPixelSize[pixelFormat]</tt> if the decompressed
16785 + * image is unpadded, else <tt>#TJPAD(scaledWidth *
16786 + * #tjPixelSize[pixelFormat])</tt> if each line of the decompressed
16787 + * image is padded to the nearest 32-bit boundary, as is the case for
16788 + * Windows bitmaps. (NOTE: <tt>scaledWidth</tt> can be determined by
16789 + * calling #TJSCALED() with the JPEG image width and one of the scaling
16790 + * factors returned by #tjGetScalingFactors().) You can also be clever
16791 + * and use the pitch parameter to skip lines, etc. Setting this
16792 + * parameter to 0 is the equivalent of setting it to <tt>scaledWidth
16793 + * * #tjPixelSize[pixelFormat]</tt>.
16794 + * @param height desired height (in pixels) of the destination image. If this
16795 + * is different than the height of the JPEG image being decompressed,
16796 + * then TurboJPEG will use scaling in the JPEG decompressor to generate
16797 + * the largest possible image that will fit within the desired height.
16798 + * If <tt>height</tt> is set to 0, then only the width will be
16799 + * considered when determining the scaled image size.
16800 + * @param pixelFormat pixel format of the destination image (see @ref
16801 + * TJPF "Pixel formats".)
16802 + * @param flags the bitwise OR of one or more of the @ref TJFLAG_BOTTOMUP
16803 + * "flags".
16804 + *
16805 + * @return 0 if successful, or -1 if an error occurred (see #tjGetErrorStr().)
16806 + */
16807 +DLLEXPORT int DLLCALL tjDecompress2(tjhandle handle,
16808 + unsigned char *jpegBuf, unsigned long jpegSize, unsigned char *dstBuf,
16809 + int width, int pitch, int height, int pixelFormat, int flags);
16810
16811 - [INPUT] j = instance handle previously returned from a call to
16812 - tjInitDecompress()
16813 - [INPUT] srcbuf = pointer to a user-allocated buffer containing the JPEG image
16814 - to decompress
16815 - [INPUT] size = size of the JPEG image buffer (in bytes)
16816 - [OUTPUT] width = width (in pixels) of the JPEG image
16817 - [OUTPUT] height = height (in pixels) of the JPEG image
16818
16819 - RETURNS: 0 on success, -1 on error
16820 -*/
16821 -DLLEXPORT int DLLCALL tjDecompressHeader(tjhandle j,
16822 - unsigned char *srcbuf, unsigned long size,
16823 - int *width, int *height);
16824 +/**
16825 + * Decompress a JPEG image to a YUV planar image. This function performs JPEG
16826 + * decompression but leaves out the color conversion step, so a planar YUV
16827 + * image is generated instead of an RGB image. The padding of the planes in
16828 + * this image is the same as in the images generated by #tjEncodeYUV2(). Note
16829 + * that, if the width or height of the image is not an even multiple of the MCU
16830 + * block size (see #tjMCUWidth and #tjMCUHeight), then an intermediate buffer
16831 + * copy will be performed within TurboJPEG.
16832 + * <p>
16833 + * NOTE: Technically, the JPEG format uses the YCbCr colorspace, but per the
16834 + * convention of the digital video community, the TurboJPEG API uses "YUV" to
16835 + * refer to an image format consisting of Y, Cb, and Cr image planes.
16836 + *
16837 + * @param handle a handle to a TurboJPEG decompressor or transformer instance
16838 + * @param jpegBuf pointer to a buffer containing the JPEG image to decompress
16839 + * @param jpegSize size of the JPEG image (in bytes)
16840 + * @param dstBuf pointer to an image buffer that will receive the YUV image.
16841 + * Use #tjBufSizeYUV() to determine the appropriate size for this buffer
16842 + * based on the image width, height, and level of subsampling.
16843 + * @param flags the bitwise OR of one or more of the @ref TJFLAG_BOTTOMUP
16844 + * "flags".
16845 + *
16846 + * @return 0 if successful, or -1 if an error occurred (see #tjGetErrorStr().)
16847 + */
16848 +DLLEXPORT int DLLCALL tjDecompressToYUV(tjhandle handle,
16849 + unsigned char *jpegBuf, unsigned long jpegSize, unsigned char *dstBuf,
16850 + int flags);
16851
16852
16853 -/*
16854 - int tjDecompress(tjhandle j,
16855 - unsigned char *srcbuf, unsigned long size,
16856 - unsigned char *dstbuf, int width, int pitch, int height, int pixelsize,
16857 - int flags)
16858 +/**
16859 + * Create a new TurboJPEG transformer instance.
16860 + *
16861 + * @return a handle to the newly-created instance, or NULL if an error
16862 + * occurred (see #tjGetErrorStr().)
16863 + */
16864 +DLLEXPORT tjhandle DLLCALL tjInitTransform(void);
16865
16866 - [INPUT] j = instance handle previously returned from a call to
16867 - tjInitDecompress()
16868 - [INPUT] srcbuf = pointer to a user-allocated buffer containing the JPEG image
16869 - to decompress
16870 - [INPUT] size = size of the JPEG image buffer (in bytes)
16871 - [INPUT] dstbuf = pointer to user-allocated image buffer which will receive
16872 - the bitmap image. This buffer should normally be pitch*height
16873 - bytes in size, although this pointer may also be used to decompress into
16874 - a specific region of a larger buffer.
16875 - [INPUT] width = width (in pixels) of the destination image
16876 - [INPUT] pitch = bytes per line of the destination image (width*pixelsize if the
16877 - bitmap is unpadded, else TJPAD(width*pixelsize) if each line of the bitmap
16878 - is padded to the nearest 32-bit boundary, such as is the case for Windows
16879 - bitmaps. You can also be clever and use this parameter to skip lines, etc.,
16880 - as long as the pitch is greater than 0.)
16881 - [INPUT] height = height (in pixels) of the destination image
16882 - [INPUT] pixelsize = size (in bytes) of each pixel in the destination image
16883 - RGBA/RGBx and BGRA/BGRx: 4, RGB and BGR: 3
16884 - [INPUT] flags = the bitwise OR of one or more of the following
16885
16886 - TJ_BGR: The components of each pixel in the destination image should be
16887 - written in B,G,R order, not R,G,B
16888 - TJ_BOTTOMUP: The destination image should be stored in bottom-up
16889 - (Windows) order, not top-down
16890 - TJ_FORCEMMX: Valid only for the Intel Performance Primitives implementation
16891 - of this codec-- force IPP to use MMX code (bypass CPU auto-detection)
16892 - TJ_FORCESSE: Valid only for the Intel Performance Primitives implementation
16893 - of this codec-- force IPP to use SSE code (bypass CPU auto-detection)
16894 - TJ_FORCESSE2: Valid only for the Intel Performance Primitives implementation
16895 - of this codec-- force IPP to use SSE2 code (bypass CPU auto-detection)
16896 +/**
16897 + * Losslessly transform a JPEG image into another JPEG image. Lossless
16898 + * transforms work by moving the raw coefficients from one JPEG image structure
16899 + * to another without altering the values of the coefficients. While this is
16900 + * typically faster than decompressing the image, transforming it, and
16901 + * re-compressing it, lossless transforms are not free. Each lossless
16902 + * transform requires reading and performing Huffman decoding on all of the
16903 + * coefficients in the source image, regardless of the size of the destination
16904 + * image. Thus, this function provides a means of generating multiple
16905 + * transformed images from the same source or applying multiple
16906 + * transformations simultaneously, in order to eliminate the need to read the
16907 + * source coefficients multiple times.
16908 + *
16909 + * @param handle a handle to a TurboJPEG transformer instance
16910 + * @param jpegBuf pointer to a buffer containing the JPEG image to transform
16911 + * @param jpegSize size of the JPEG image (in bytes)
16912 + * @param n the number of transformed JPEG images to generate
16913 + * @param dstBufs pointer to an array of n image buffers. <tt>dstBufs[i]</tt>
16914 + * will receive a JPEG image that has been transformed using the
16915 + * parameters in <tt>transforms[i]</tt>. TurboJPEG has the ability to
16916 + * reallocate the JPEG buffer to accommodate the size of the JPEG image.
16917 + * Thus, you can choose to:
16918 + * -# pre-allocate the JPEG buffer with an arbitrary size using
16919 + * #tjAlloc() and let TurboJPEG grow the buffer as needed,
16920 + * -# set <tt>dstBufs[i]</tt> to NULL to tell TurboJPEG to allocate the
16921 + * buffer for you, or
16922 + * -# pre-allocate the buffer to a "worst case" size determined by
16923 + * calling #tjBufSize() with the transformed or cropped width and
16924 + * height. This should ensure that the buffer never has to be
16925 + * re-allocated (setting #TJFLAG_NOREALLOC guarantees this.)
16926 + * .
16927 + * If you choose option 1, <tt>dstSizes[i]</tt> should be set to
16928 + * the size of your pre-allocated buffer. In any case, unless you have
16929 + * set #TJFLAG_NOREALLOC, you should always check <tt>dstBufs[i]</tt>
16930 + * upon return from this function, as it may have changed.
16931 + * @param dstSizes pointer to an array of n unsigned long variables that will
16932 + * receive the actual sizes (in bytes) of each transformed JPEG image.
16933 + * If <tt>dstBufs[i]</tt> points to a pre-allocated buffer, then
16934 + * <tt>dstSizes[i]</tt> should be set to the size of the buffer. Upon
16935 + * return, <tt>dstSizes[i]</tt> will contain the size of the JPEG image
16936 + * (in bytes.)
16937 + * @param transforms pointer to an array of n #tjtransform structures, each of
16938 + * which specifies the transform parameters and/or cropping region for
16939 + * the corresponding transformed output image.
16940 + * @param flags the bitwise OR of one or more of the @ref TJFLAG_BOTTOMUP
16941 + * "flags".
16942 + *
16943 + * @return 0 if successful, or -1 if an error occurred (see #tjGetErrorStr().)
16944 + */
16945 +DLLEXPORT int DLLCALL tjTransform(tjhandle handle, unsigned char *jpegBuf,
16946 + unsigned long jpegSize, int n, unsigned char **dstBufs,
16947 + unsigned long *dstSizes, tjtransform *transforms, int flags);
16948
16949 - RETURNS: 0 on success, -1 on error
16950 -*/
16951 -DLLEXPORT int DLLCALL tjDecompress(tjhandle j,
16952 - unsigned char *srcbuf, unsigned long size,
16953 - unsigned char *dstbuf, int width, int pitch, int height, int pixelsize,
16954 - int flags);
16955
16956 +/**
16957 + * Destroy a TurboJPEG compressor, decompressor, or transformer instance.
16958 + *
16959 + * @param handle a handle to a TurboJPEG compressor, decompressor or
16960 + * transformer instance
16961 + *
16962 + * @return 0 if successful, or -1 if an error occurred (see #tjGetErrorStr().)
16963 + */
16964 +DLLEXPORT int DLLCALL tjDestroy(tjhandle handle);
16965
16966 -/*
16967 - int tjDestroy(tjhandle h)
16968
16969 - Frees structures associated with a compression or decompression instance
16970 -
16971 - [INPUT] h = instance handle (returned from a previous call to
16972 - tjInitCompress() or tjInitDecompress()
16973 +/**
16974 + * Allocate an image buffer for use with TurboJPEG. You should always use
16975 + * this function to allocate the JPEG destination buffer(s) for #tjCompress2()
16976 + * and #tjTransform() unless you are disabling automatic buffer
16977 + * (re)allocation (by setting #TJFLAG_NOREALLOC.)
16978 + *
16979 + * @param bytes the number of bytes to allocate
16980 + *
16981 + * @return a pointer to a newly-allocated buffer with the specified number of
16982 + * bytes
16983 + *
16984 + * @sa tjFree()
16985 + */
16986 +DLLEXPORT unsigned char* DLLCALL tjAlloc(int bytes);
16987
16988 - RETURNS: 0 on success, -1 on error
16989 -*/
16990 -DLLEXPORT int DLLCALL tjDestroy(tjhandle h);
16991
16992 +/**
16993 + * Free an image buffer previously allocated by TurboJPEG. You should always
16994 + * use this function to free JPEG destination buffer(s) that were automatically
16995 + * (re)allocated by #tjCompress2() or #tjTransform() or that were manually
16996 + * allocated using #tjAlloc().
16997 + *
16998 + * @param buffer address of the buffer to free
16999 + *
17000 + * @sa tjAlloc()
17001 + */
17002 +DLLEXPORT void DLLCALL tjFree(unsigned char *buffer);
17003
17004 -/*
17005 - char *tjGetErrorStr(void)
17006 -
17007 - Returns a descriptive error message explaining why the last command failed
17008 -*/
17009 +
17010 +/**
17011 + * Returns a descriptive error message explaining why the last command failed.
17012 + *
17013 + * @return a descriptive error message explaining why the last command failed.
17014 + */
17015 DLLEXPORT char* DLLCALL tjGetErrorStr(void);
17016
17017 +
17018 +/* Backward compatibility functions and macros (nothing to see here) */
17019 +#define NUMSUBOPT TJ_NUMSAMP
17020 +#define TJ_444 TJSAMP_444
17021 +#define TJ_422 TJSAMP_422
17022 +#define TJ_420 TJSAMP_420
17023 +#define TJ_411 TJSAMP_420
17024 +#define TJ_GRAYSCALE TJSAMP_GRAY
17025 +
17026 +#define TJ_BGR 1
17027 +#define TJ_BOTTOMUP TJFLAG_BOTTOMUP
17028 +#define TJ_FORCEMMX TJFLAG_FORCEMMX
17029 +#define TJ_FORCESSE TJFLAG_FORCESSE
17030 +#define TJ_FORCESSE2 TJFLAG_FORCESSE2
17031 +#define TJ_ALPHAFIRST 64
17032 +#define TJ_FORCESSE3 TJFLAG_FORCESSE3
17033 +#define TJ_FASTUPSAMPLE TJFLAG_FASTUPSAMPLE
17034 +#define TJ_YUV 512
17035 +
17036 +DLLEXPORT unsigned long DLLCALL TJBUFSIZE(int width, int height);
17037 +
17038 +DLLEXPORT unsigned long DLLCALL TJBUFSIZEYUV(int width, int height,
17039 + int jpegSubsamp);
17040 +
17041 +DLLEXPORT int DLLCALL tjCompress(tjhandle handle, unsigned char *srcBuf,
17042 + int width, int pitch, int height, int pixelSize, unsigned char *dstBuf,
17043 + unsigned long *compressedSize, int jpegSubsamp, int jpegQual, int flags);
17044 +
17045 +DLLEXPORT int DLLCALL tjEncodeYUV(tjhandle handle,
17046 + unsigned char *srcBuf, int width, int pitch, int height, int pixelSize,
17047 + unsigned char *dstBuf, int subsamp, int flags);
17048 +
17049 +DLLEXPORT int DLLCALL tjDecompressHeader(tjhandle handle,
17050 + unsigned char *jpegBuf, unsigned long jpegSize, int *width, int *height);
17051 +
17052 +DLLEXPORT int DLLCALL tjDecompress(tjhandle handle,
17053 + unsigned char *jpegBuf, unsigned long jpegSize, unsigned char *dstBuf,
17054 + int width, int pitch, int height, int pixelSize, int flags);
17055 +
17056 +
17057 +/**
17058 + * @}
17059 + */
17060 +
17061 #ifdef __cplusplus
17062 }
17063 #endif
17064 +
17065 +#endif
17066 Index: turbojpegl.c
2574 =================================================================== 17067 ===================================================================
2575 --- /dev/null 17068 --- turbojpegl.c (revision 829)
2576 +++ simd/jsimd_arm64_neon.S 17069 +++ turbojpegl.c (working copy)
2577 @@ -0,0 +1,1861 @@ 17070 @@ -149,6 +149,10 @@
2578 +/* 17071 #error "TurboJPEG requires JPEG colorspace extensions"
2579 + * ARMv8 NEON optimizations for libjpeg-turbo 17072 #endif
2580 + * 17073
2581 + * Copyright (C) 2009-2011 Nokia Corporation and/or its subsidiary(-ies). 17074 + if(flags&TJ_FORCEMMX) putenv("JSIMD_FORCEMMX=1");
2582 + * All rights reserved. 17075 + else if(flags&TJ_FORCESSE) putenv("JSIMD_FORCESSE=1");
2583 + * Author: Siarhei Siamashka <siarhei.siamashka@nokia.com> 17076 + else if(flags&TJ_FORCESSE2) putenv("JSIMD_FORCESSE2=1");
2584 + * Copyright (C) 2013-2014, Linaro Limited 17077 +
2585 + * Author: Ragesh Radhakrishnan <ragesh.r@linaro.org> 17078 if(setjmp(j->jerr.jb))
2586 + * 17079 { // this will execute if LIBJPEG has an error
2587 + * This software is provided 'as-is', without any express or implied 17080 if(row_pointer) free(row_pointer);
2588 + * warranty. In no event will the authors be held liable for any damages 17081 @@ -188,7 +192,8 @@
2589 + * arising from the use of this software. 17082 j->cinfo.image_height-j->cinfo.next_scanline);
2590 + * 17083 }
2591 + * Permission is granted to anyone to use this software for any purpose, 17084 jpeg_finish_compress(&j->cinfo);
2592 + * including commercial applications, and to alter it and redistribute it 17085 - *size=TJBUFSIZE(j->cinfo.image_width, j->cinfo.image_height)-(j->jdms.free_in_buffer);
2593 + * freely, subject to the following restrictions: 17086 + *size=TJBUFSIZE(j->cinfo.image_width, j->cinfo.image_height)
2594 + * 17087 + -(unsigned long)(j->jdms.free_in_buffer);
2595 + * 1. The origin of this software must not be misrepresented; you must not 17088
2596 + * claim that you wrote the original software. If you use this software 17089 if(row_pointer) free(row_pointer);
2597 + * in a product, an acknowledgment in the product documentation would be 17090 return 0;
2598 + * appreciated but is not required. 17091 @@ -287,6 +292,10 @@
2599 + * 2. Altered source versions must be plainly marked as such, and must not be 17092
2600 + * misrepresented as being the original software. 17093 if(pitch==0) pitch=width*ps;
2601 + * 3. This notice may not be removed or altered from any source distribution. 17094
2602 + */ 17095 + if(flags&TJ_FORCEMMX) putenv("JSIMD_FORCEMMX=1");
2603 + 17096 + else if(flags&TJ_FORCESSE) putenv("JSIMD_FORCESSE=1");
2604 +#if defined(__linux__) && defined(__ELF__) 17097 + else if(flags&TJ_FORCESSE2) putenv("JSIMD_FORCESSE2=1");
2605 +.section .note.GNU-stack,"",%progbits /* mark stack as non-executable */ 17098 +
2606 +#endif 17099 if(setjmp(j->jerr.jb))
2607 + 17100 { // this will execute if LIBJPEG has an error
2608 +.text 17101 if(row_pointer) free(row_pointer);
2609 +.arch armv8-a+fp+simd 17102 Index: wrppm.c
2610 + 17103 ===================================================================
2611 + 17104 --- wrppm.c (revision 829)
2612 +#define RESPECT_STRICT_ALIGNMENT 1 17105 +++ wrppm.c (working copy)
2613 + 17106 @@ -2,6 +2,7 @@
2614 + 17107 * wrppm.c
2615 +/*****************************************************************************/ 17108 *
2616 + 17109 * Copyright (C) 1991-1996, Thomas G. Lane.
2617 +/* Supplementary macro for setting function attributes */ 17110 + * Modified 2009 by Guido Vollbeding.
2618 +.macro asm_function fname 17111 * This file is part of the Independent JPEG Group's software.
2619 +#ifdef __APPLE__ 17112 * For conditions of distribution and use, see the accompanying README file.
2620 + .globl _\fname 17113 *
2621 +_\fname: 17114 @@ -40,11 +41,11 @@
2622 +#else 17115 #define BYTESPERSAMPLE 1
2623 + .global \fname 17116 #define PPM_MAXVAL 255
2624 +#ifdef __ELF__ 17117 #else
2625 + .hidden \fname 17118 -/* The word-per-sample format always puts the LSB first. */
2626 + .type \fname, %function 17119 +/* The word-per-sample format always puts the MSB first. */
2627 +#endif 17120 #define PUTPPMSAMPLE(ptr,v) \
2628 +\fname: 17121 { register int val_ = v; \
2629 +#endif 17122 + *ptr++ = (char) ((val_ >> 8) & 0xFF); \
2630 +.endm 17123 *ptr++ = (char) (val_ & 0xFF); \
2631 + 17124 - *ptr++ = (char) ((val_ >> 8) & 0xFF); \
2632 +/* Transpose elements of single 128 bit registers */ 17125 }
2633 +.macro transpose_single x0,x1,xi,xilen,literal 17126 #define BYTESPERSAMPLE 2
2634 + ins \xi\xilen[0], \x0\xilen[0] 17127 #define PPM_MAXVAL ((1<<BITS_IN_JSAMPLE)-1)
2635 + ins \x1\xilen[0], \x0\xilen[1]
2636 + trn1 \x0\literal, \x0\literal, \x1\literal
2637 + trn2 \x1\literal, \xi\literal, \x1\literal
2638 +.endm
2639 +
2640 +/* Transpose elements of 2 different registers */
2641 +.macro transpose x0,x1,xi,xilen,literal
2642 + mov \xi\xilen, \x0\xilen
2643 + trn1 \x0\literal, \x0\literal, \x1\literal
2644 + trn2 \x1\literal, \xi\literal, \x1\literal
2645 +.endm
2646 +
2647 +/* Transpose a block of 4x4 coefficients in four 64-bit registers */
2648 +.macro transpose_4x4_32 x0,x0len x1,x1len x2,x2len x3,x3len,xi,xilen
2649 + mov \xi\xilen, \x0\xilen
2650 + trn1 \x0\x0len, \x0\x0len, \x2\x2len
2651 + trn2 \x2\x2len, \xi\x0len, \x2\x2len
2652 + mov \xi\xilen, \x1\xilen
2653 + trn1 \x1\x1len, \x1\x1len, \x3\x3len
2654 + trn2 \x3\x3len, \xi\x1len, \x3\x3len
2655 +.endm
2656 +
2657 +.macro transpose_4x4_16 x0,x0len x1,x1len, x2,x2len, x3,x3len,xi,xilen
2658 + mov \xi\xilen, \x0\xilen
2659 + trn1 \x0\x0len, \x0\x0len, \x1\x1len
2660 + trn2 \x1\x2len, \xi\x0len, \x1\x2len
2661 + mov \xi\xilen, \x2\xilen
2662 + trn1 \x2\x2len, \x2\x2len, \x3\x3len
2663 + trn2 \x3\x2len, \xi\x1len, \x3\x3len
2664 +.endm
2665 +
2666 +.macro transpose_4x4 x0, x1, x2, x3,x5
2667 + transpose_4x4_16 \x0,.4h, \x1,.4h, \x2,.4h,\x3,.4h,\x5,.16b
2668 + transpose_4x4_32 \x0,.2s, \x1,.2s, \x2,.2s,\x3,.2s,\x5,.16b
2669 +.endm
2670 +
2671 +
2672 +#define CENTERJSAMPLE 128
2673 +
2674 +/*****************************************************************************/
2675 +
2676 +/*
2677 + * Perform dequantization and inverse DCT on one block of coefficients.
2678 + *
2679 + * GLOBAL(void)
2680 + * jsimd_idct_islow_neon (void * dct_table, JCOEFPTR coef_block,
2681 + * JSAMPARRAY output_buf, JDIMENSION output_col)
2682 + */
2683 +
2684 +#define FIX_0_298631336 (2446)
2685 +#define FIX_0_390180644 (3196)
2686 +#define FIX_0_541196100 (4433)
2687 +#define FIX_0_765366865 (6270)
2688 +#define FIX_0_899976223 (7373)
2689 +#define FIX_1_175875602 (9633)
2690 +#define FIX_1_501321110 (12299)
2691 +#define FIX_1_847759065 (15137)
2692 +#define FIX_1_961570560 (16069)
2693 +#define FIX_2_053119869 (16819)
2694 +#define FIX_2_562915447 (20995)
2695 +#define FIX_3_072711026 (25172)
2696 +
2697 +#define FIX_1_175875602_MINUS_1_961570560 (FIX_1_175875602 - FIX_1_961570560)
2698 +#define FIX_1_175875602_MINUS_0_390180644 (FIX_1_175875602 - FIX_0_390180644)
2699 +#define FIX_0_541196100_MINUS_1_847759065 (FIX_0_541196100 - FIX_1_847759065)
2700 +#define FIX_3_072711026_MINUS_2_562915447 (FIX_3_072711026 - FIX_2_562915447)
2701 +#define FIX_0_298631336_MINUS_0_899976223 (FIX_0_298631336 - FIX_0_899976223)
2702 +#define FIX_1_501321110_MINUS_0_899976223 (FIX_1_501321110 - FIX_0_899976223)
2703 +#define FIX_2_053119869_MINUS_2_562915447 (FIX_2_053119869 - FIX_2_562915447)
2704 +#define FIX_0_541196100_PLUS_0_765366865 (FIX_0_541196100 + FIX_0_765366865)
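The FIX_* values above look like the real-valued multipliers scaled by 2^13 and rounded to the nearest integer, which would match the `<< 13` shifts and the `#11` rounding narrowing shifts used elsewhere in this file. The following is a minimal C sketch under that assumption; `SKETCH_CONST_BITS`, `fix13` and `descale_round` are hypothetical names introduced for illustration and are not part of the patch.

```c
#include <stdint.h>

/* Assumed scale: 13 fractional bits (not taken from the patch itself). */
#define SKETCH_CONST_BITS 13

/* Convert a real multiplier to fixed point, rounding to nearest.
 * Only meant for the positive constants listed in the table. */
static int32_t fix13(double x)
{
    return (int32_t)(x * (double)(1 << SKETCH_CONST_BITS) + 0.5);
}

/* Rounding right shift, the scalar analogue of a rounding narrowing shift.
 * Assumes x is non-negative or that >> on negative values is arithmetic. */
static int32_t descale_round(int32_t x, int n)
{
    return (x + ((int32_t)1 << (n - 1))) >> n;
}
```

For example, `fix13(0.298631336)` reproduces the 2446 listed for FIX_0_298631336, and the derived constants are plain sums or differences of such values.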
2705 +
2706 +/*
2707 + * Reference SIMD-friendly 1-D ISLOW iDCT C implementation.
2708 + * Uses some ideas from the comments in 'simd/jiss2int-64.asm'
2709 + */
2710 +#define REF_1D_IDCT(xrow0, xrow1, xrow2, xrow3, xrow4, xrow5, xrow6, xrow7) \
2711 +{ \
2712 + DCTELEM row0, row1, row2, row3, row4, row5, row6, row7; \
2713 + INT32 q1, q2, q3, q4, q5, q6, q7; \
2714 + INT32 tmp11_plus_tmp2, tmp11_minus_tmp2; \
2715 + \
2716 + /* 1-D iDCT input data */ \
2717 + row0 = xrow0; \
2718 + row1 = xrow1; \
2719 + row2 = xrow2; \
2720 + row3 = xrow3; \
2721 + row4 = xrow4; \
2722 + row5 = xrow5; \
2723 + row6 = xrow6; \
2724 + row7 = xrow7; \
2725 + \
2726 + q5 = row7 + row3; \
2727 + q4 = row5 + row1; \
2728 + q6 = MULTIPLY(q5, FIX_1_175875602_MINUS_1_961570560) + \
2729 + MULTIPLY(q4, FIX_1_175875602); \
2730 + q7 = MULTIPLY(q5, FIX_1_175875602) + \
2731 + MULTIPLY(q4, FIX_1_175875602_MINUS_0_390180644); \
2732 + q2 = MULTIPLY(row2, FIX_0_541196100) + \
2733 + MULTIPLY(row6, FIX_0_541196100_MINUS_1_847759065); \
2734 + q4 = q6; \
2735 + q3 = ((INT32) row0 - (INT32) row4) << 13; \
2736 + q6 += MULTIPLY(row5, -FIX_2_562915447) + \
2737 + MULTIPLY(row3, FIX_3_072711026_MINUS_2_562915447); \
2738 + /* now we can use q1 (reloadable constants have been used up) */ \
2739 + q1 = q3 + q2; \
2740 + q4 += MULTIPLY(row7, FIX_0_298631336_MINUS_0_899976223) + \
2741 + MULTIPLY(row1, -FIX_0_899976223); \
2742 + q5 = q7; \
2743 + q1 = q1 + q6; \
2744 + q7 += MULTIPLY(row7, -FIX_0_899976223) + \
2745 + MULTIPLY(row1, FIX_1_501321110_MINUS_0_899976223); \
2746 + \
2747 + /* (tmp11 + tmp2) has been calculated (out_row1 before descale) */ \
2748 + tmp11_plus_tmp2 = q1; \
2749 + row1 = 0; \
2750 + \
2751 + q1 = q1 - q6; \
2752 + q5 += MULTIPLY(row5, FIX_2_053119869_MINUS_2_562915447) + \
2753 + MULTIPLY(row3, -FIX_2_562915447); \
2754 + q1 = q1 - q6; \
2755 + q6 = MULTIPLY(row2, FIX_0_541196100_PLUS_0_765366865) + \
2756 + MULTIPLY(row6, FIX_0_541196100); \
2757 + q3 = q3 - q2; \
2758 + \
2759 + /* (tmp11 - tmp2) has been calculated (out_row6 before descale) */ \
2760 + tmp11_minus_tmp2 = q1; \
2761 + \
2762 + q1 = ((INT32) row0 + (INT32) row4) << 13; \
2763 + q2 = q1 + q6; \
2764 + q1 = q1 - q6; \
2765 + \
2766 + /* pick up the results */ \
2767 + tmp0 = q4; \
2768 + tmp1 = q5; \
2769 + tmp2 = (tmp11_plus_tmp2 - tmp11_minus_tmp2) / 2; \
2770 + tmp3 = q7; \
2771 + tmp10 = q2; \
2772 + tmp11 = (tmp11_plus_tmp2 + tmp11_minus_tmp2) / 2; \
2773 + tmp12 = q3; \
2774 + tmp13 = q1; \
2775 +}
2776 +
2777 +#define XFIX_0_899976223 v0.4h[0]
2778 +#define XFIX_0_541196100 v0.4h[1]
2779 +#define XFIX_2_562915447 v0.4h[2]
2780 +#define XFIX_0_298631336_MINUS_0_899976223 v0.4h[3]
2781 +#define XFIX_1_501321110_MINUS_0_899976223 v1.4h[0]
2782 +#define XFIX_2_053119869_MINUS_2_562915447 v1.4h[1]
2783 +#define XFIX_0_541196100_PLUS_0_765366865 v1.4h[2]
2784 +#define XFIX_1_175875602 v1.4h[3]
2785 +#define XFIX_1_175875602_MINUS_0_390180644 v2.4h[0]
2786 +#define XFIX_0_541196100_MINUS_1_847759065 v2.4h[1]
2787 +#define XFIX_3_072711026_MINUS_2_562915447 v2.4h[2]
2788 +#define XFIX_1_175875602_MINUS_1_961570560 v2.4h[3]
2789 +
2790 +.balign 16
2791 +jsimd_idct_islow_neon_consts:
2792 + .short FIX_0_899976223 /* d0[0] */
2793 + .short FIX_0_541196100 /* d0[1] */
2794 + .short FIX_2_562915447 /* d0[2] */
2795 + .short FIX_0_298631336_MINUS_0_899976223 /* d0[3] */
2796 + .short FIX_1_501321110_MINUS_0_899976223 /* d1[0] */
2797 + .short FIX_2_053119869_MINUS_2_562915447 /* d1[1] */
2798 + .short FIX_0_541196100_PLUS_0_765366865 /* d1[2] */
2799 + .short FIX_1_175875602 /* d1[3] */
2800 + /* reloadable constants */
2801 + .short FIX_1_175875602_MINUS_0_390180644 /* d2[0] */
2802 + .short FIX_0_541196100_MINUS_1_847759065 /* d2[1] */
2803 + .short FIX_3_072711026_MINUS_2_562915447 /* d2[2] */
2804 + .short FIX_1_175875602_MINUS_1_961570560 /* d2[3] */
2805 +
2806 +asm_function jsimd_idct_islow_neon
2807 +
2808 + DCT_TABLE .req x0
2809 + COEF_BLOCK .req x1
2810 + OUTPUT_BUF .req x2
2811 + OUTPUT_COL .req x3
2812 + TMP1 .req x0
2813 + TMP2 .req x1
2814 + TMP3 .req x2
2815 + TMP4 .req x15
2816 +
2817 + ROW0L .req v16
2818 + ROW0R .req v17
2819 + ROW1L .req v18
2820 + ROW1R .req v19
2821 + ROW2L .req v20
2822 + ROW2R .req v21
2823 + ROW3L .req v22
2824 + ROW3R .req v23
2825 + ROW4L .req v24
2826 + ROW4R .req v25
2827 + ROW5L .req v26
2828 + ROW5R .req v27
2829 + ROW6L .req v28
2830 + ROW6R .req v29
2831 + ROW7L .req v30
2832 + ROW7R .req v31
2833 + /* Save all NEON registers and x15 (32 NEON registers * 8 bytes + 16) */
2834 + sub sp, sp, 272
2835 + str x15, [sp], 16
2836 + adr x15, jsimd_idct_islow_neon_consts
2837 + st1 {v0.8b - v3.8b}, [sp], 32
2838 + st1 {v4.8b - v7.8b}, [sp], 32
2839 + st1 {v8.8b - v11.8b}, [sp], 32
2840 + st1 {v12.8b - v15.8b}, [sp], 32
2841 + st1 {v16.8b - v19.8b}, [sp], 32
2842 + st1 {v20.8b - v23.8b}, [sp], 32
2843 + st1 {v24.8b - v27.8b}, [sp], 32
2844 + st1 {v28.8b - v31.8b}, [sp], 32
2845 + ld1 {v16.4h, v17.4h, v18.4h, v19.4h}, [COEF_BLOCK], 32
2846 + ld1 {v0.4h, v1.4h, v2.4h, v3.4h}, [DCT_TABLE], 32
2847 + ld1 {v20.4h, v21.4h, v22.4h, v23.4h}, [COEF_BLOCK], 32
2848 + mul v16.4h, v16.4h, v0.4h
2849 + mul v17.4h, v17.4h, v1.4h
2850 + ins v16.2d[1], v17.2d[0] /* 128 bit q8 */
2851 + ld1 {v4.4h, v5.4h, v6.4h, v7.4h}, [DCT_TABLE], 32
2852 + mul v18.4h, v18.4h, v2.4h
2853 + mul v19.4h, v19.4h, v3.4h
2854 + ins v18.2d[1], v19.2d[0] /* 128 bit q9 */
2855 + ld1 {v24.4h, v25.4h, v26.4h, v27.4h}, [COEF_BLOCK], 32
2856 + mul v20.4h, v20.4h, v4.4h
2857 + mul v21.4h, v21.4h, v5.4h
2858 + ins v20.2d[1], v21.2d[0] /* 128 bit q10 */
2859 + ld1 {v0.4h, v1.4h, v2.4h, v3.4h}, [DCT_TABLE], 32
2860 + mul v22.4h, v22.4h, v6.4h
2861 + mul v23.4h, v23.4h, v7.4h
2862 + ins v22.2d[1], v23.2d[0] /* 128 bit q11 */
2863 + ld1 {v28.4h, v29.4h, v30.4h, v31.4h}, [COEF_BLOCK]
2864 + mul v24.4h, v24.4h, v0.4h
2865 + mul v25.4h, v25.4h, v1.4h
2866 + ins v24.2d[1], v25.2d[0] /* 128 bit q12 */
2867 + ld1 {v4.4h, v5.4h, v6.4h, v7.4h}, [DCT_TABLE], 32
2868 + mul v28.4h, v28.4h, v4.4h
2869 + mul v29.4h, v29.4h, v5.4h
2870 + ins v28.2d[1], v29.2d[0] /* 128 bit q14 */
2871 + mul v26.4h, v26.4h, v2.4h
2872 + mul v27.4h, v27.4h, v3.4h
2873 + ins v26.2d[1], v27.2d[0] /* 128 bit q13 */
2874 + ld1 {v0.4h, v1.4h, v2.4h, v3.4h}, [x15] /* load constants */
2875 + add x15, x15, #16
2876 + mul v30.4h, v30.4h, v6.4h
2877 + mul v31.4h, v31.4h, v7.4h
2878 + ins v30.2d[1], v31.2d[0] /* 128 bit q15 */
2879 + /* Go to the bottom of the stack */
2880 + sub sp, sp, 352
2881 + stp x4, x5, [sp], 16
2882 + st1 {v8.4h - v11.4h}, [sp], 32 /* save NEON registers */
2883 + st1 {v12.4h - v15.4h}, [sp], 32
2884 + /* 1-D IDCT, pass 1, left 4x8 half */
2885 + add v4.4h, ROW7L.4h, ROW3L.4h
2886 + add v5.4h, ROW5L.4h, ROW1L.4h
2887 + smull v12.4s, v4.4h, XFIX_1_175875602_MINUS_1_961570560
2888 + smlal v12.4s, v5.4h, XFIX_1_175875602
2889 + smull v14.4s, v4.4h, XFIX_1_175875602
2890 + /* Check for the zero coefficients in the right 4x8 half */
2891 + smlal v14.4s, v5.4h, XFIX_1_175875602_MINUS_0_390180644
2892 + ssubl v6.4s, ROW0L.4h, ROW4L.4h
2893 + ldp w4, w5, [COEF_BLOCK, #(-96 + 2 * (4 + 1 * 8))]
2894 + smull v4.4s, ROW2L.4h, XFIX_0_541196100
2895 + smlal v4.4s, ROW6L.4h, XFIX_0_541196100_MINUS_1_847759065
2896 + orr x0, x4, x5
2897 + mov v8.16b, v12.16b
2898 + smlsl v12.4s, ROW5L.4h, XFIX_2_562915447
2899 + ldp w4, w5, [COEF_BLOCK, #(-96 + 2 * (4 + 2 * 8))]
2900 + smlal v12.4s, ROW3L.4h, XFIX_3_072711026_MINUS_2_562915447
2901 + shl v6.4s, v6.4s, #13
2902 + orr x0, x0, x4
2903 + smlsl v8.4s, ROW1L.4h, XFIX_0_899976223
2904 + orr x0, x0 , x5
2905 + add v2.4s, v6.4s, v4.4s
2906 + ldp w4, w5, [COEF_BLOCK, #(-96 + 2 * (4 + 3 * 8))]
2907 + mov v10.16b, v14.16b
2908 + add v2.4s, v2.4s, v12.4s
2909 + orr x0, x0, x4
2910 + smlsl v14.4s, ROW7L.4h, XFIX_0_899976223
2911 + orr x0, x0, x5
2912 + smlal v14.4s, ROW1L.4h, XFIX_1_501321110_MINUS_0_899976223
2913 + rshrn ROW1L.4h, v2.4s, #11
2914 + ldp w4, w5, [COEF_BLOCK, #(-96 + 2 * (4 + 4 * 8))]
2915 + sub v2.4s, v2.4s, v12.4s
2916 + smlal v10.4s, ROW5L.4h, XFIX_2_053119869_MINUS_2_562915447
2917 + orr x0, x0, x4
2918 + smlsl v10.4s, ROW3L.4h, XFIX_2_562915447
2919 + orr x0, x0, x5
2920 + sub v2.4s, v2.4s, v12.4s
2921 + smull v12.4s, ROW2L.4h, XFIX_0_541196100_PLUS_0_765366865
2922 + ldp w4, w5, [COEF_BLOCK, #(-96 + 2 * (4 + 5 * 8))]
2923 + smlal v12.4s, ROW6L.4h, XFIX_0_541196100
2924 + sub v6.4s, v6.4s, v4.4s
2925 + orr x0, x0, x4
2926 + rshrn ROW6L.4h, v2.4s, #11
2927 + orr x0, x0, x5
2928 + add v2.4s, v6.4s, v10.4s
2929 + ldp w4, w5, [COEF_BLOCK, #(-96 + 2 * (4 + 6 * 8))]
2930 + sub v6.4s, v6.4s, v10.4s
2931 + saddl v10.4s, ROW0L.4h, ROW4L.4h
2932 + orr x0, x0, x4
2933 + rshrn ROW2L.4h, v2.4s, #11
2934 + orr x0, x0, x5
2935 + rshrn ROW5L.4h, v6.4s, #11
2936 + ldp w4, w5, [COEF_BLOCK, #(-96 + 2 * (4 + 7 * 8))]
2937 + shl v10.4s, v10.4s, #13
2938 + smlal v8.4s, ROW7L.4h, XFIX_0_298631336_MINUS_0_899976223
2939 + orr x0, x0, x4
2940 + add v4.4s, v10.4s, v12.4s
2941 + orr x0, x0, x5
2942 + cmp x0, #0 /* orrs instruction removed */
2943 + sub v2.4s, v10.4s, v12.4s
2944 + add v12.4s, v4.4s, v14.4s
2945 + ldp w4, w5, [COEF_BLOCK, #(-96 + 2 * (4 + 0 * 8))]
2946 + sub v4.4s, v4.4s, v14.4s
2947 + add v10.4s, v2.4s, v8.4s
2948 + orr x0, x4, x5
2949 + sub v6.4s, v2.4s, v8.4s
2950 + /* pop {x4, x5} */
2951 + sub sp, sp, 80
2952 + ldp x4, x5, [sp], 16
2953 + rshrn ROW7L.4h, v4.4s, #11
2954 + rshrn ROW3L.4h, v10.4s, #11
2955 + rshrn ROW0L.4h, v12.4s, #11
2956 + rshrn ROW4L.4h, v6.4s, #11
2957 +
2958 + beq 3f /* Go to do some special handling for the sparse right 4x8 half */
2959 +
2960 + /* 1-D IDCT, pass 1, right 4x8 half */
2961 + ld1 {v2.4h}, [x15] /* reload constants */
2962 + add v10.4h, ROW7R.4h, ROW3R.4h
2963 + add v8.4h, ROW5R.4h, ROW1R.4h
2964 + /* Transpose ROW6L <-> ROW7L (v3 available free register) */
2965 + transpose ROW6L, ROW7L, v3, .16b, .4h
2966 + smull v12.4s, v10.4h, XFIX_1_175875602_MINUS_1_961570560
2967 + smlal v12.4s, v8.4h, XFIX_1_175875602
2968 + /* Transpose ROW2L <-> ROW3L (v3 available free register) */
2969 + transpose ROW2L, ROW3L, v3, .16b, .4h
2970 + smull v14.4s, v10.4h, XFIX_1_175875602
2971 + smlal v14.4s, v8.4h, XFIX_1_175875602_MINUS_0_390180644
2972 + /* Transpose ROW0L <-> ROW1L (v3 available free register) */
2973 + transpose ROW0L, ROW1L, v3, .16b, .4h
2974 + ssubl v6.4s, ROW0R.4h, ROW4R.4h
2975 + smull v4.4s, ROW2R.4h, XFIX_0_541196100
2976 + smlal v4.4s, ROW6R.4h, XFIX_0_541196100_MINUS_1_847759065
2977 + /* Transpose ROW4L <-> ROW5L (v3 available free register) */
2978 + transpose ROW4L, ROW5L, v3, .16b, .4h
2979 + mov v8.16b, v12.16b
2980 + smlsl v12.4s, ROW5R.4h, XFIX_2_562915447
2981 + smlal v12.4s, ROW3R.4h, XFIX_3_072711026_MINUS_2_562915447
2982 + /* Transpose ROW1L <-> ROW3L (v3 available free register) */
2983 + transpose ROW1L, ROW3L, v3, .16b, .2s
2984 + shl v6.4s, v6.4s, #13
2985 + smlsl v8.4s, ROW1R.4h, XFIX_0_899976223
2986 + /* Transpose ROW4L <-> ROW6L (v3 available free register) */
2987 + transpose ROW4L, ROW6L, v3, .16b, .2s
2988 + add v2.4s, v6.4s, v4.4s
2989 + mov v10.16b, v14.16b
2990 + add v2.4s, v2.4s, v12.4s
2991 + /* Transpose ROW0L <-> ROW2L (v3 available free register) */
2992 + transpose ROW0L, ROW2L, v3, .16b, .2s
2993 + smlsl v14.4s, ROW7R.4h, XFIX_0_899976223
2994 + smlal v14.4s, ROW1R.4h, XFIX_1_501321110_MINUS_0_899976223
2995 + rshrn ROW1R.4h, v2.4s, #11
2996 + /* Transpose ROW5L <-> ROW7L (v3 available free register) */
2997 + transpose ROW5L, ROW7L, v3, .16b, .2s
2998 + sub v2.4s, v2.4s, v12.4s
2999 + smlal v10.4s, ROW5R.4h, XFIX_2_053119869_MINUS_2_562915447
3000 + smlsl v10.4s, ROW3R.4h, XFIX_2_562915447
3001 + sub v2.4s, v2.4s, v12.4s
3002 + smull v12.4s, ROW2R.4h, XFIX_0_541196100_PLUS_0_765366865
3003 + smlal v12.4s, ROW6R.4h, XFIX_0_541196100
3004 + sub v6.4s, v6.4s, v4.4s
3005 + rshrn ROW6R.4h, v2.4s, #11
3006 + add v2.4s, v6.4s, v10.4s
3007 + sub v6.4s, v6.4s, v10.4s
3008 + saddl v10.4s, ROW0R.4h, ROW4R.4h
3009 + rshrn ROW2R.4h, v2.4s, #11
3010 + rshrn ROW5R.4h, v6.4s, #11
3011 + shl v10.4s, v10.4s, #13
3012 + smlal v8.4s, ROW7R.4h, XFIX_0_298631336_MINUS_0_899976223
3013 + add v4.4s, v10.4s, v12.4s
3014 + sub v2.4s, v10.4s, v12.4s
3015 + add v12.4s, v4.4s, v14.4s
3016 + sub v4.4s, v4.4s, v14.4s
3017 + add v10.4s, v2.4s, v8.4s
3018 + sub v6.4s, v2.4s, v8.4s
3019 + rshrn ROW7R.4h, v4.4s, #11
3020 + rshrn ROW3R.4h, v10.4s, #11
3021 + rshrn ROW0R.4h, v12.4s, #11
3022 + rshrn ROW4R.4h, v6.4s, #11
3023 + /* Transpose right 4x8 half */
3024 + transpose ROW6R, ROW7R, v3, .16b, .4h
3025 + transpose ROW2R, ROW3R, v3, .16b, .4h
3026 + transpose ROW0R, ROW1R, v3, .16b, .4h
3027 + transpose ROW4R, ROW5R, v3, .16b, .4h
3028 + transpose ROW1R, ROW3R, v3, .16b, .2s
3029 + transpose ROW4R, ROW6R, v3, .16b, .2s
3030 + transpose ROW0R, ROW2R, v3, .16b, .2s
3031 + transpose ROW5R, ROW7R, v3, .16b, .2s
3032 +
3033 +1: /* 1-D IDCT, pass 2 (normal variant), left 4x8 half */
3034 + ld1 {v2.4h}, [x15] /* reload constants */
3035 + smull v12.4S, ROW1R.4h, XFIX_1_175875602 /* ROW5L.4h <-> ROW1R.4h */
3036 + smlal v12.4s, ROW1L.4h, XFIX_1_175875602
3037 + smlal v12.4s, ROW3R.4h, XFIX_1_175875602_MINUS_1_961570560 /* ROW7L.4h <-> ROW3R.4h */
3038 + smlal v12.4s, ROW3L.4h, XFIX_1_175875602_MINUS_1_961570560
3039 + smull v14.4s, ROW3R.4h, XFIX_1_175875602 /* ROW7L.4h <-> ROW3R.4h */
3040 + smlal v14.4s, ROW3L.4h, XFIX_1_175875602
3041 + smlal v14.4s, ROW1R.4h, XFIX_1_175875602_MINUS_0_390180644 /* ROW5L.4h <-> ROW1R.4h */
3042 + smlal v14.4s, ROW1L.4h, XFIX_1_175875602_MINUS_0_390180644
3043 + ssubl v6.4s, ROW0L.4h, ROW0R.4h /* ROW4L.4h <-> ROW0R.4h */
3044 + smull v4.4s, ROW2L.4h, XFIX_0_541196100
3045 + smlal v4.4s, ROW2R.4h, XFIX_0_541196100_MINUS_1_847759065 /* ROW6L.4h <-> ROW2R.4h */
3046 + mov v8.16b, v12.16b
3047 + smlsl v12.4s, ROW1R.4h, XFIX_2_562915447 /* ROW5L.4h <-> ROW1R.4h */
3048 + smlal v12.4s, ROW3L.4h, XFIX_3_072711026_MINUS_2_562915447
3049 + shl v6.4s, v6.4s, #13
3050 + smlsl v8.4s, ROW1L.4h, XFIX_0_899976223
3051 + add v2.4s, v6.4s, v4.4s
3052 + mov v10.16b, v14.16b
3053 + add v2.4s, v2.4s, v12.4s
3054 + smlsl v14.4s, ROW3R.4h, XFIX_0_899976223 /* ROW7L.4h <-> ROW3R.4h */
3055 + smlal v14.4s, ROW1L.4h, XFIX_1_501321110_MINUS_0_899976223
3056 + shrn ROW1L.4h, v2.4s, #16
3057 + sub v2.4s, v2.4s, v12.4s
3058 + smlal v10.4s, ROW1R.4h, XFIX_2_053119869_MINUS_2_562915447 /* ROW5L.4h <-> ROW1R.4h */
3059 + smlsl v10.4s, ROW3L.4h, XFIX_2_562915447
3060 + sub v2.4s, v2.4s, v12.4s
3061 + smull v12.4s, ROW2L.4h, XFIX_0_541196100_PLUS_0_765366865
3062 + smlal v12.4s, ROW2R.4h, XFIX_0_541196100 /* ROW6L.4h <-> ROW2R.4h */
3063 + sub v6.4s, v6.4s, v4.4s
3064 + shrn ROW2R.4h, v2.4s, #16 /* ROW6L.4h <-> ROW2R.4h */
3065 + add v2.4s, v6.4s, v10.4s
3066 + sub v6.4s, v6.4s, v10.4s
3067 + saddl v10.4s, ROW0L.4h, ROW0R.4h /* ROW4L.4h <-> ROW0R.4h */
3068 + shrn ROW2L.4h, v2.4s, #16
3069 + shrn ROW1R.4h, v6.4s, #16 /* ROW5L.4h <-> ROW1R.4h */
3070 + shl v10.4s, v10.4s, #13
3071 + smlal v8.4s, ROW3R.4h, XFIX_0_298631336_MINUS_0_899976223 /* ROW7L.4h <-> ROW3R.4h */
3072 + add v4.4s, v10.4s, v12.4s
3073 + sub v2.4s, v10.4s, v12.4s
3074 + add v12.4s, v4.4s, v14.4s
3075 + sub v4.4s, v4.4s, v14.4s
3076 + add v10.4s, v2.4s, v8.4s
3077 + sub v6.4s, v2.4s, v8.4s
3078 + shrn ROW3R.4h, v4.4s, #16 /* ROW7L.4h <-> ROW3R.4h */
3079 + shrn ROW3L.4h, v10.4s, #16
3080 + shrn ROW0L.4h, v12.4s, #16
3081 + shrn ROW0R.4h, v6.4s, #16 /* ROW4L.4h <-> ROW0R.4h */
3082 + /* 1-D IDCT, pass 2, right 4x8 half */
3083 + ld1 {v2.4h}, [x15] /* reload constants */
3084 + smull v12.4s, ROW5R.4h, XFIX_1_175875602
3085 + smlal v12.4s, ROW5L.4h, XFIX_1_175875602 /* ROW5L.4h <-> ROW1R.4h */
3086 + smlal v12.4s, ROW7R.4h, XFIX_1_175875602_MINUS_1_961570560
3087 + smlal v12.4s, ROW7L.4h, XFIX_1_175875602_MINUS_1_961570560 /* ROW7L.4h <-> ROW3R.4h */
3088 + smull v14.4s, ROW7R.4h, XFIX_1_175875602
3089 + smlal v14.4s, ROW7L.4h, XFIX_1_175875602 /* ROW7L.4h <-> ROW3R.4h */
3090 + smlal v14.4s, ROW5R.4h, XFIX_1_175875602_MINUS_0_390180644
3091 + smlal v14.4s, ROW5L.4h, XFIX_1_175875602_MINUS_0_390180644 /* ROW5L.4h <-> ROW1R.4h */
3092 + ssubl v6.4s, ROW4L.4h, ROW4R.4h /* ROW4L.4h <-> ROW0R.4h */
3093 + smull v4.4s, ROW6L.4h, XFIX_0_541196100 /* ROW6L.4h <-> ROW2R.4h */
3094 + smlal v4.4s, ROW6R.4h, XFIX_0_541196100_MINUS_1_847759065
3095 + mov v8.16b, v12.16b
3096 + smlsl v12.4s, ROW5R.4h, XFIX_2_562915447
3097 + smlal v12.4s, ROW7L.4h, XFIX_3_072711026_MINUS_2_562915447 /* ROW7L.4h <-> ROW3R.4h */
3098 + shl v6.4s, v6.4s, #13
3099 + smlsl v8.4s, ROW5L.4h, XFIX_0_899976223 /* ROW5L.4h <-> ROW1R.4h */
3100 + add v2.4s, v6.4s, v4.4s
3101 + mov v10.16b, v14.16b
3102 + add v2.4s, v2.4s, v12.4s
3103 + smlsl v14.4s, ROW7R.4h, XFIX_0_899976223
3104 + smlal v14.4s, ROW5L.4h, XFIX_1_501321110_MINUS_0_899976223 /* ROW5L.4h <-> ROW1R.4h */
3105 + shrn ROW5L.4h, v2.4s, #16 /* ROW5L.4h <-> ROW1R.4h */
3106 + sub v2.4s, v2.4s, v12.4s
3107 + smlal v10.4s, ROW5R.4h, XFIX_2_053119869_MINUS_2_562915447
3108 + smlsl v10.4s, ROW7L.4h, XFIX_2_562915447 /* ROW7L.4h <-> ROW3R.4h */
3109 + sub v2.4s, v2.4s, v12.4s
3110 + smull v12.4s, ROW6L.4h, XFIX_0_541196100_PLUS_0_765366865 /* ROW6L.4h <-> ROW2R.4h */
3111 + smlal v12.4s, ROW6R.4h, XFIX_0_541196100
3112 + sub v6.4s, v6.4s, v4.4s
3113 + shrn ROW6R.4h, v2.4s, #16
3114 + add v2.4s, v6.4s, v10.4s
3115 + sub v6.4s, v6.4s, v10.4s
3116 + saddl v10.4s, ROW4L.4h, ROW4R.4h /* ROW4L.4h <-> ROW0R.4h */
3117 + shrn ROW6L.4h, v2.4s, #16 /* ROW6L.4h <-> ROW2R.4h */
3118 + shrn ROW5R.4h, v6.4s, #16
3119 + shl v10.4s, v10.4s, #13
3120 + smlal v8.4s, ROW7R.4h, XFIX_0_298631336_MINUS_0_899976223
3121 + add v4.4s, v10.4s, v12.4s
3122 + sub v2.4s, v10.4s, v12.4s
3123 + add v12.4s, v4.4s, v14.4s
3124 + sub v4.4s, v4.4s, v14.4s
3125 + add v10.4s, v2.4s, v8.4s
3126 + sub v6.4s, v2.4s, v8.4s
3127 + shrn ROW7R.4h, v4.4s, #16
3128 + shrn ROW7L.4h, v10.4s, #16 /* ROW7L.4h <-> ROW3R.4h */
3129 + shrn ROW4L.4h, v12.4s, #16 /* ROW4L.4h <-> ROW0R.4h */
3130 + shrn ROW4R.4h, v6.4s, #16
3131 +
3132 +2: /* Descale to 8-bit and range limit */
3133 + ins v16.2d[1], v17.2d[0]
3134 + ins v18.2d[1], v19.2d[0]
3135 + ins v20.2d[1], v21.2d[0]
3136 + ins v22.2d[1], v23.2d[0]
3137 + sqrshrn v16.8b, v16.8h, #2
3138 + sqrshrn2 v16.16b, v18.8h, #2
3139 + sqrshrn v18.8b, v20.8h, #2
3140 + sqrshrn2 v18.16b, v22.8h, #2
3141 +
3142 + /* vpop {v8.4h - d15.4h} */ /* restore NEON registers */
3143 + ld1 {v8.4h - v11.4h}, [sp], 32
3144 + ld1 {v12.4h - v15.4h}, [sp], 32
3145 + ins v24.2d[1], v25.2d[0]
3146 +
3147 + sqrshrn v20.8b, v24.8h, #2
3148 + /* Transpose the final 8-bit samples and do signed->unsigned conversion */
3149 + /* trn1 v16.8h, v16.8h, v18.8h */
3150 + transpose v16, v18, v3, .16b, .8h
3151 + ins v26.2d[1], v27.2d[0]
3152 + ins v28.2d[1], v29.2d[0]
3153 + ins v30.2d[1], v31.2d[0]
3154 + sqrshrn2 v20.16b, v26.8h, #2
3155 + sqrshrn v22.8b, v28.8h, #2
3156 + movi v0.16b, #(CENTERJSAMPLE)
3157 + sqrshrn2 v22.16b, v30.8h, #2
3158 + transpose_single v16, v17, v3, .2d, .8b
3159 + transpose_single v18, v19, v3, .2d, .8b
3160 + add v16.8b, v16.8b, v0.8b
3161 + add v17.8b, v17.8b, v0.8b
3162 + add v18.8b, v18.8b, v0.8b
3163 + add v19.8b, v19.8b, v0.8b
3164 + transpose v20, v22, v3, .16b, .8h
3165 + /* Store results to the output buffer */
3166 + ldp TMP1, TMP2, [OUTPUT_BUF], 16
3167 + add TMP1, TMP1, OUTPUT_COL
3168 + add TMP2, TMP2, OUTPUT_COL
3169 + st1 {v16.8b}, [TMP1]
3170 + transpose_single v20, v21, v3, .2d, .8b
3171 + st1 {v17.8b}, [TMP2]
3172 + ldp TMP1, TMP2, [OUTPUT_BUF], 16
3173 + add TMP1, TMP1, OUTPUT_COL
3174 + add TMP2, TMP2, OUTPUT_COL
3175 + st1 {v18.8b}, [TMP1]
3176 + add v20.8b, v20.8b, v0.8b
3177 + add v21.8b, v21.8b, v0.8b
3178 + st1 {v19.8b}, [TMP2]
3179 + ldp TMP1, TMP2, [OUTPUT_BUF], 16
3180 + ldp TMP3, TMP4, [OUTPUT_BUF]
3181 + add TMP1, TMP1, OUTPUT_COL
3182 + add TMP2, TMP2, OUTPUT_COL
3183 + add TMP3, TMP3, OUTPUT_COL
3184 + add TMP4, TMP4, OUTPUT_COL
3185 + transpose_single v22, v23, v3, .2d, .8b
3186 + st1 {v20.8b}, [TMP1]
3187 + add v22.8b, v22.8b, v0.8b
3188 + add v23.8b, v23.8b, v0.8b
3189 + st1 {v21.8b}, [TMP2]
3190 + st1 {v22.8b}, [TMP3]
3191 + st1 {v23.8b}, [TMP4]
3192 + ldr x15, [sp], 16
3193 + ld1 {v0.8b - v3.8b}, [sp], 32
3194 + ld1 {v4.8b - v7.8b}, [sp], 32
3195 + ld1 {v8.8b - v11.8b}, [sp], 32
3196 + ld1 {v12.8b - v15.8b}, [sp], 32
3197 + ld1 {v16.8b - v19.8b}, [sp], 32
3198 + ld1 {v20.8b - v23.8b}, [sp], 32
3199 + ld1 {v24.8b - v27.8b}, [sp], 32
3200 + ld1 {v28.8b - v31.8b}, [sp], 32
3201 + blr x30
3202 +
3203 +3: /* Left 4x8 half is done, right 4x8 half contains mostly zeros */
3204 +
3205 + /* Transpose left 4x8 half */
3206 + transpose ROW6L, ROW7L, v3, .16b, .4h
3207 + transpose ROW2L, ROW3L, v3, .16b, .4h
3208 + transpose ROW0L, ROW1L, v3, .16b, .4h
3209 + transpose ROW4L, ROW5L, v3, .16b, .4h
3210 + shl ROW0R.4h, ROW0R.4h, #2 /* PASS1_BITS */
3211 + transpose ROW1L, ROW3L, v3, .16b, .2s
3212 + transpose ROW4L, ROW6L, v3, .16b, .2s
3213 + transpose ROW0L, ROW2L, v3, .16b, .2s
3214 + transpose ROW5L, ROW7L, v3, .16b, .2s
3215 + cmp x0, #0
3216 + beq 4f /* Right 4x8 half has all zeros, go to 'sparse' second pass */
3217 +
3218 + /* Only row 0 is non-zero for the right 4x8 half */
3219 + dup ROW1R.4h, ROW0R.4h[1]
3220 + dup ROW2R.4h, ROW0R.4h[2]
3221 + dup ROW3R.4h, ROW0R.4h[3]
3222 + dup ROW4R.4h, ROW0R.4h[0]
3223 + dup ROW5R.4h, ROW0R.4h[1]
3224 + dup ROW6R.4h, ROW0R.4h[2]
3225 + dup ROW7R.4h, ROW0R.4h[3]
3226 + dup ROW0R.4h, ROW0R.4h[0]
3227 + b 1b /* Go to 'normal' second pass */
3228 +
3229 +4: /* 1-D IDCT, pass 2 (sparse variant with zero rows 4-7), left 4x8 half */
3230 + ld1 {v2.4h}, [x15] /* reload constants */
3231 + smull v12.4s, ROW1L.4h, XFIX_1_175875602
3232 + smlal v12.4s, ROW3L.4h, XFIX_1_175875602_MINUS_1_961570560
3233 + smull v14.4s, ROW3L.4h, XFIX_1_175875602
3234 + smlal v14.4s, ROW1L.4h, XFIX_1_175875602_MINUS_0_390180644
3235 + smull v4.4s, ROW2L.4h, XFIX_0_541196100
3236 + sshll v6.4s, ROW0L.4h, #13
3237 + mov v8.16b, v12.16b
3238 + smlal v12.4s, ROW3L.4h, XFIX_3_072711026_MINUS_2_562915447
3239 + smlsl v8.4s, ROW1L.4h, XFIX_0_899976223
3240 + add v2.4s, v6.4s, v4.4s
3241 + mov v10.16b, v14.16b
3242 + smlal v14.4s, ROW1L.4h, XFIX_1_501321110_MINUS_0_899976223
3243 + add v2.4s, v2.4s, v12.4s
3244 + add v12.4s, v12.4s, v12.4s
3245 + smlsl v10.4s, ROW3L.4h, XFIX_2_562915447
3246 + shrn ROW1L.4h, v2.4s, #16
3247 + sub v2.4s, v2.4s, v12.4s
3248 + smull v12.4s, ROW2L.4h, XFIX_0_541196100_PLUS_0_765366865
3249 + sub v6.4s, v6.4s, v4.4s
3250 + shrn ROW2R.4h, v2.4s, #16 /* ROW6L.4h <-> ROW2R.4h */
3251 + add v2.4s, v6.4s, v10.4s
3252 + sub v6.4s, v6.4s, v10.4s
3253 + sshll v10.4s, ROW0L.4h, #13
3254 + shrn ROW2L.4h, v2.4s, #16
3255 + shrn ROW1R.4h, v6.4s, #16 /* ROW5L.4h <-> ROW1R.4h */
3256 + add v4.4s, v10.4s, v12.4s
3257 + sub v2.4s, v10.4s, v12.4s
3258 + add v12.4s, v4.4s, v14.4s
3259 + sub v4.4s, v4.4s, v14.4s
3260 + add v10.4s, v2.4s, v8.4s
3261 + sub v6.4s, v2.4s, v8.4s
3262 + shrn ROW3R.4h, v4.4s, #16 /* ROW7L.4h <-> ROW3R.4h */
3263 + shrn ROW3L.4h, v10.4s, #16
3264 + shrn ROW0L.4h, v12.4s, #16
3265 + shrn ROW0R.4h, v6.4s, #16 /* ROW4L.4h <-> ROW0R.4h */
3266 + /* 1-D IDCT, pass 2 (sparse variant with zero rows 4-7), right 4x8 half */
3267 + ld1 {v2.4h}, [x15] /* reload constants */
3268 + smull v12.4s, ROW5L.4h, XFIX_1_175875602
3269 + smlal v12.4s, ROW7L.4h, XFIX_1_175875602_MINUS_1_961570560
3270 + smull v14.4s, ROW7L.4h, XFIX_1_175875602
3271 + smlal v14.4s, ROW5L.4h, XFIX_1_175875602_MINUS_0_390180644
3272 + smull v4.4s, ROW6L.4h, XFIX_0_541196100
3273 + sshll v6.4s, ROW4L.4h, #13
3274 + mov v8.16b, v12.16b
3275 + smlal v12.4s, ROW7L.4h, XFIX_3_072711026_MINUS_2_562915447
3276 + smlsl v8.4s, ROW5L.4h, XFIX_0_899976223
3277 + add v2.4s, v6.4s, v4.4s
3278 + mov v10.16b, v14.16b
3279 + smlal v14.4s, ROW5L.4h, XFIX_1_501321110_MINUS_0_899976223
3280 + add v2.4s, v2.4s, v12.4s
3281 + add v12.4s, v12.4s, v12.4s
3282 + smlsl v10.4s, ROW7L.4h, XFIX_2_562915447
3283 + shrn ROW5L.4h, v2.4s, #16 /* ROW5L.4h <-> ROW1R.4h */
3284 + sub v2.4s, v2.4s, v12.4s
3285 + smull v12.4s, ROW6L.4h, XFIX_0_541196100_PLUS_0_765366865
3286 + sub v6.4s, v6.4s, v4.4s
3287 + shrn ROW6R.4h, v2.4s, #16
3288 + add v2.4s, v6.4s, v10.4s
3289 + sub v6.4s, v6.4s, v10.4s
3290 + sshll v10.4s, ROW4L.4h, #13
3291 + shrn ROW6L.4h, v2.4s, #16 /* ROW6L.4h <-> ROW2R.4h */
3292 + shrn ROW5R.4h, v6.4s, #16
3293 + add v4.4s, v10.4s, v12.4s
3294 + sub v2.4s, v10.4s, v12.4s
3295 + add v12.4s, v4.4s, v14.4s
3296 + sub v4.4s, v4.4s, v14.4s
3297 + add v10.4s, v2.4s, v8.4s
3298 + sub v6.4s, v2.4s, v8.4s
3299 + shrn ROW7R.4h, v4.4s, #16
3300 + shrn ROW7L.4h, v10.4s, #16 /* ROW7L.4h <-> ROW3R.4h */
3301 + shrn ROW4L.4h, v12.4s, #16 /* ROW4L.4h <-> ROW0R.4h */
3302 + shrn ROW4R.4h, v6.4s, #16
3303 + b 2b /* Go to epilogue */
3304 +
3305 + .unreq DCT_TABLE
3306 + .unreq COEF_BLOCK
3307 + .unreq OUTPUT_BUF
3308 + .unreq OUTPUT_COL
3309 + .unreq TMP1
3310 + .unreq TMP2
3311 + .unreq TMP3
3312 + .unreq TMP4
3313 +
3314 + .unreq ROW0L
3315 + .unreq ROW0R
3316 + .unreq ROW1L
3317 + .unreq ROW1R
3318 + .unreq ROW2L
3319 + .unreq ROW2R
3320 + .unreq ROW3L
3321 + .unreq ROW3R
3322 + .unreq ROW4L
3323 + .unreq ROW4R
3324 + .unreq ROW5L
3325 + .unreq ROW5R
3326 + .unreq ROW6L
3327 + .unreq ROW6R
3328 + .unreq ROW7L
3329 + .unreq ROW7R
3330 +
3331 +
3332 +/*****************************************************************************/
3333 +
3334 +/*
3335 + * jsimd_idct_ifast_neon
3336 + *
3337 + * This function contains a fast, not so accurate integer implementation of
3338 + * the inverse DCT (Discrete Cosine Transform). It uses the same calculations
3339 + * and produces exactly the same output as IJG's original 'jpeg_idct_ifast'
3340 + * function from jidctfst.c
3341 + *
3342 + * Normally 1-D AAN DCT needs 5 multiplications and 29 additions.
3343 + * But in ARM NEON case some extra additions are required because VQDMULH
3344 + * instruction can't handle the constants larger than 1. So the expressions
3345 + * like "x * 1.082392200" have to be converted to "x * 0.082392200 + x",
3346 + * which introduces an extra addition. Overall, there are 6 extra additions
3347 + * per 1-D IDCT pass, totalling to 5 VQDMULH and 35 VADD/VSUB instructions.
3348 + */
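The "x * 0.082392200 + x" rewrite described in the comment above can be checked with a scalar model. The sketch below emulates one 16-bit lane of the saturating doubling multiply (the comment names VQDMULH; the AArch64 code uses `sqdmulh`) and applies it with the Q15 fractional constants from the table that follows. The helper names `sqdmulh_s16`, `mul_1_082392200` and `mul_2_613125930` are hypothetical and exist only for this illustration; the sketch also assumes that `>>` on a negative `int32_t` is an arithmetic shift.

```c
#include <stdint.h>

/* One lane of a saturating doubling multiply returning the high half:
 * result = saturate((2 * a * b) >> 16). */
static int16_t sqdmulh_s16(int16_t a, int16_t b)
{
    if (a == INT16_MIN && b == INT16_MIN)
        return INT16_MAX;   /* the only input pair that saturates */
    /* Assumption: arithmetic right shift for negative products. */
    return (int16_t)((2 * (int32_t)a * (int32_t)b) >> 16);
}

/* Fractional parts in Q15, written the same way as in the constant table. */
#define SKETCH_FRAC_1_082392200 (277 * 128 - 256 * 128)   /* 2688  */
#define SKETCH_FRAC_2_613125930 (669 * 128 - 512 * 128)   /* 20096 */

/* x * 1.082392200 as x + x * 0.082...: one multiply plus one extra add. */
static int16_t mul_1_082392200(int16_t x)
{
    return (int16_t)(x + sqdmulh_s16(x, SKETCH_FRAC_1_082392200));
}

/* x * 2.613125930 as 2x + x * 0.613...: the integer part becomes x + x. */
static int16_t mul_2_613125930(int16_t x)
{
    return (int16_t)(x + x + sqdmulh_s16(x, SKETCH_FRAC_2_613125930));
}
```

With x = 1000 the two helpers give 1082 and 2613, which is consistent with the multipliers the constants are named after.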
3349 +
3350 +#define XFIX_1_082392200 v0.4h[0]
3351 +#define XFIX_1_414213562 v0.4h[1]
3352 +#define XFIX_1_847759065 v0.4h[2]
3353 +#define XFIX_2_613125930 v0.4h[3]
3354 +
3355 +.balign 16
3356 +jsimd_idct_ifast_neon_consts:
3357 + .short (277 * 128 - 256 * 128) /* XFIX_1_082392200 */
3358 + .short (362 * 128 - 256 * 128) /* XFIX_1_414213562 */
3359 + .short (473 * 128 - 256 * 128) /* XFIX_1_847759065 */
3360 + .short (669 * 128 - 512 * 128) /* XFIX_2_613125930 */
3361 +
3362 +asm_function jsimd_idct_ifast_neon
3363 +
3364 + DCT_TABLE .req x0
3365 + COEF_BLOCK .req x1
3366 + OUTPUT_BUF .req x2
3367 + OUTPUT_COL .req x3
3368 + TMP1 .req x0
3369 + TMP2 .req x1
3370 + TMP3 .req x2
3371 + TMP4 .req x22
3372 + TMP5 .req x23
3373 +
3374 + /* Load and dequantize coefficients into NEON registers
3375 + * with the following allocation:
3376 + * 0 1 2 3 | 4 5 6 7
3377 + * ---------+--------
3378 + * 0 | d16 | d17 ( v8.8h )
3379 + * 1 | d18 | d19 ( v9.8h )
3380 + * 2 | d20 | d21 ( v10.8h )
3381 + * 3 | d22 | d23 ( v11.8h )
3382 + * 4 | d24 | d25 ( v12.8h )
3383 + * 5 | d26 | d27 ( v13.8h )
3384 + * 6 | d28 | d29 ( v14.8h )
3385 + * 7 | d30 | d31 ( v15.8h )
3386 + */
3387 + /* Save NEON registers used in fast IDCT */
3388 + sub sp, sp, #176
3389 + stp x22, x23, [sp], 16
3390 + adr x23, jsimd_idct_ifast_neon_consts
3391 + st1 {v0.8b - v3.8b}, [sp], 32
3392 + st1 {v4.8b - v7.8b}, [sp], 32
3393 + st1 {v8.8b - v11.8b}, [sp], 32
3394 + st1 {v12.8b - v15.8b}, [sp], 32
3395 + st1 {v16.8b - v19.8b}, [sp], 32
3396 + ld1 {v8.8h, v9.8h}, [COEF_BLOCK], 32
3397 + ld1 {v0.8h, v1.8h}, [DCT_TABLE], 32
3398 + ld1 {v10.8h, v11.8h}, [COEF_BLOCK], 32
3399 + mul v8.8h, v8.8h, v0.8h
3400 + ld1 {v2.8h, v3.8h}, [DCT_TABLE], 32
3401 + mul v9.8h, v9.8h, v1.8h
3402 + ld1 {v12.8h, v13.8h}, [COEF_BLOCK], 32
3403 + mul v10.8h, v10.8h, v2.8h
3404 + ld1 {v0.8h, v1.8h}, [DCT_TABLE], 32
3405 + mul v11.8h, v11.8h, v3.8h
3406 + ld1 {v14.8h, v15.8h}, [COEF_BLOCK], 32
3407 + mul v12.8h, v12.8h, v0.8h
3408 + ld1 {v2.8h, v3.8h}, [DCT_TABLE], 32
3409 + mul v14.8h, v14.8h, v2.8h
3410 + mul v13.8h, v13.8h, v1.8h
3411 + ld1 {v0.4h}, [x23] /* load constants */
3412 + mul v15.8h, v15.8h, v3.8h
3413 +
3414 + /* 1-D IDCT, pass 1 */
3415 + sub v2.8h, v10.8h, v14.8h
3416 + add v14.8h, v10.8h, v14.8h
3417 + sub v1.8h, v11.8h, v13.8h
3418 + add v13.8h, v11.8h, v13.8h
3419 + sub v5.8h, v9.8h, v15.8h
3420 + add v15.8h, v9.8h, v15.8h
3421 + sqdmulh v4.8h, v2.8h, XFIX_1_414213562
3422 + sqdmulh v6.8h, v1.8h, XFIX_2_613125930
3423 + add v3.8h, v1.8h, v1.8h
3424 + sub v1.8h, v5.8h, v1.8h
3425 + add v10.8h, v2.8h, v4.8h
3426 + sqdmulh v4.8h, v1.8h, XFIX_1_847759065
3427 + sub v2.8h, v15.8h, v13.8h
3428 + add v3.8h, v3.8h, v6.8h
3429 + sqdmulh v6.8h, v2.8h, XFIX_1_414213562
3430 + add v1.8h, v1.8h, v4.8h
3431 + sqdmulh v4.8h, v5.8h, XFIX_1_082392200
3432 + sub v10.8h, v10.8h, v14.8h
3433 + add v2.8h, v2.8h, v6.8h
3434 + sub v6.8h, v8.8h, v12.8h
3435 + add v12.8h, v8.8h, v12.8h
3436 + add v9.8h, v5.8h, v4.8h
3437 + add v5.8h, v6.8h, v10.8h
3438 + sub v10.8h, v6.8h, v10.8h
3439 + add v6.8h, v15.8h, v13.8h
3440 + add v8.8h, v12.8h, v14.8h
3441 + sub v3.8h, v6.8h, v3.8h
3442 + sub v12.8h, v12.8h, v14.8h
3443 + sub v3.8h, v3.8h, v1.8h
3444 + sub v1.8h, v9.8h, v1.8h
3445 + add v2.8h, v3.8h, v2.8h
3446 + sub v15.8h, v8.8h, v6.8h
3447 + add v1.8h, v1.8h, v2.8h
3448 + add v8.8h, v8.8h, v6.8h
3449 + add v14.8h, v5.8h, v3.8h
3450 + sub v9.8h, v5.8h, v3.8h
3451 + sub v13.8h, v10.8h, v2.8h
3452 + add v10.8h, v10.8h, v2.8h
3453 + /* Transpose q8-q9 */
3454 + mov v18.16b, v8.16b
3455 + trn1 v8.8h, v8.8h, v9.8h
3456 + trn2 v9.8h, v18.8h, v9.8h
3457 + sub v11.8h, v12.8h, v1.8h
3458 + /* Transpose q14-q15 */
3459 + mov v18.16b, v14.16b
3460 + trn1 v14.8h, v14.8h, v15.8h
3461 + trn2 v15.8h, v18.8h, v15.8h
3462 + add v12.8h, v12.8h, v1.8h
3463 + /* Transpose q10-q11 */
3464 + mov v18.16b, v10.16b
3465 + trn1 v10.8h, v10.8h, v11.8h
3466 + trn2 v11.8h, v18.8h, v11.8h
3467 + /* Transpose q12-q13 */
3468 + mov v18.16b, v12.16b
3469 + trn1 v12.8h, v12.8h, v13.8h
3470 + trn2 v13.8h, v18.8h, v13.8h
3471 + /* Transpose q9-q11 */
3472 + mov v18.16b, v9.16b
3473 + trn1 v9.4s, v9.4s, v11.4s
3474 + trn2 v11.4s, v18.4s, v11.4s
3475 + /* Transpose q12-q14 */
3476 + mov v18.16b, v12.16b
3477 + trn1 v12.4s, v12.4s, v14.4s
3478 + trn2 v14.4s, v18.4s, v14.4s
3479 + /* Transpose q8-q10 */
3480 + mov v18.16b, v8.16b
3481 + trn1 v8.4s, v8.4s, v10.4s
3482 + trn2 v10.4s, v18.4s, v10.4s
3483 + /* Transpose q13-q15 */
3484 + mov v18.16b, v13.16b
3485 + trn1 v13.4s, v13.4s, v15.4s
3486 + trn2 v15.4s, v18.4s, v15.4s
3487 + /* vswp v14.4h, v10-MSB.4h */
3488 + umov x22, v14.d[0]
3489 + ins v14.2d[0], v10.2d[1]
3490 + ins v10.2d[1], x22
3491 + /* vswp v13.4h, v9MSB.4h */
3492 +
3493 + umov x22, v13.d[0]
3494 + ins v13.2d[0], v9.2d[1]
3495 + ins v9.2d[1], x22
3496 + /* 1-D IDCT, pass 2 */
3497 + sub v2.8h, v10.8h, v14.8h
3498 + /* vswp v15.4h, v11MSB.4h */
3499 + umov x22, v15.d[0]
3500 + ins v15.2d[0], v11.2d[1]
3501 + ins v11.2d[1], x22
3502 + add v14.8h, v10.8h, v14.8h
3503 + /* vswp v12.4h, v8-MSB.4h */
3504 + umov x22, v12.d[0]
3505 + ins v12.2d[0], v8.2d[1]
3506 + ins v8.2d[1], x22
3507 + sub v1.8h, v11.8h, v13.8h
3508 + add v13.8h, v11.8h, v13.8h
3509 + sub v5.8h, v9.8h, v15.8h
3510 + add v15.8h, v9.8h, v15.8h
3511 + sqdmulh v4.8h, v2.8h, XFIX_1_414213562
3512 + sqdmulh v6.8h, v1.8h, XFIX_2_613125930
3513 + add v3.8h, v1.8h, v1.8h
3514 + sub v1.8h, v5.8h, v1.8h
3515 + add v10.8h, v2.8h, v4.8h
3516 + sqdmulh v4.8h, v1.8h, XFIX_1_847759065
3517 + sub v2.8h, v15.8h, v13.8h
3518 + add v3.8h, v3.8h, v6.8h
3519 + sqdmulh v6.8h, v2.8h, XFIX_1_414213562
3520 + add v1.8h, v1.8h, v4.8h
3521 + sqdmulh v4.8h, v5.8h, XFIX_1_082392200
3522 + sub v10.8h, v10.8h, v14.8h
3523 + add v2.8h, v2.8h, v6.8h
3524 + sub v6.8h, v8.8h, v12.8h
3525 + add v12.8h, v8.8h, v12.8h
3526 + add v9.8h, v5.8h, v4.8h
3527 + add v5.8h, v6.8h, v10.8h
3528 + sub v10.8h, v6.8h, v10.8h
3529 + add v6.8h, v15.8h, v13.8h
3530 + add v8.8h, v12.8h, v14.8h
3531 + sub v3.8h, v6.8h, v3.8h
3532 + sub v12.8h, v12.8h, v14.8h
3533 + sub v3.8h, v3.8h, v1.8h
3534 + sub v1.8h, v9.8h, v1.8h
3535 + add v2.8h, v3.8h, v2.8h
3536 + sub v15.8h, v8.8h, v6.8h
3537 + add v1.8h, v1.8h, v2.8h
3538 + add v8.8h, v8.8h, v6.8h
3539 + add v14.8h, v5.8h, v3.8h
3540 + sub v9.8h, v5.8h, v3.8h
3541 + sub v13.8h, v10.8h, v2.8h
3542 + add v10.8h, v10.8h, v2.8h
3543 + sub v11.8h, v12.8h, v1.8h
3544 + add v12.8h, v12.8h, v1.8h
3545 + /* Descale to 8-bit and range limit */
3546 + movi v0.16b, #0x80
3547 + sqshrn v8.8b, v8.8h, #5
3548 + sqshrn2 v8.16b, v9.8h, #5
3549 + sqshrn v9.8b, v10.8h, #5
3550 + sqshrn2 v9.16b, v11.8h, #5
3551 + sqshrn v10.8b, v12.8h, #5
3552 + sqshrn2 v10.16b, v13.8h, #5
3553 + sqshrn v11.8b, v14.8h, #5
3554 + sqshrn2 v11.16b, v15.8h, #5
3555 + add v8.16b, v8.16b, v0.16b
3556 + add v9.16b, v9.16b, v0.16b
3557 + add v10.16b, v10.16b, v0.16b
3558 + add v11.16b, v11.16b, v0.16b
3559 + /* Transpose the final 8-bit samples */
3560 + /* Transpose q8-q9 */
3561 + mov v18.16b, v8.16b
3562 + trn1 v8.8h, v8.8h, v9.8h
3563 + trn2 v9.8h, v18.8h, v9.8h
3564 + /* Transpose q10-q11 */
3565 + mov v18.16b, v10.16b
3566 + trn1 v10.8h, v10.8h, v11.8h
3567 + trn2 v11.8h, v18.8h, v11.8h
3568 + /* Transpose q8-q10 */
3569 + mov v18.16b, v8.16b
3570 + trn1 v8.4s, v8.4s, v10.4s
3571 + trn2 v10.4s, v18.4s, v10.4s
3572 + /* Transpose q9-q11 */
3573 + mov v18.16b, v9.16b
3574 + trn1 v9.4s, v9.4s, v11.4s
3575 + trn2 v11.4s, v18.4s, v11.4s
3576 + /* make copy */
3577 + ins v17.2d[0], v8.2d[1]
3578 + /* Transpose d16-d17-msb */
3579 + mov v18.16b, v8.16b
3580 + trn1 v8.8b, v8.8b, v17.8b
3581 + trn2 v17.8b, v18.8b, v17.8b
3582 + /* make copy */
3583 + ins v19.2d[0], v9.2d[1]
3584 + mov v18.16b, v9.16b
3585 + trn1 v9.8b, v9.8b, v19.8b
3586 + trn2 v19.8b, v18.8b, v19.8b
3587 + /* Store results to the output buffer */
3588 + ldp TMP1, TMP2, [OUTPUT_BUF], 16
3589 + add TMP1, TMP1, OUTPUT_COL
3590 + add TMP2, TMP2, OUTPUT_COL
3591 + st1 {v8.8b}, [TMP1]
3592 + st1 {v17.8b}, [TMP2]
3593 + ldp TMP1, TMP2, [OUTPUT_BUF], 16
3594 + add TMP1, TMP1, OUTPUT_COL
3595 + add TMP2, TMP2, OUTPUT_COL
3596 + st1 {v9.8b}, [TMP1]
3597 + /* make copy */
3598 + ins v7.2d[0], v10.2d[1]
3599 + mov v18.16b, v10.16b
3600 + trn1 v10.8b, v10.8b, v7.8b
3601 + trn2 v7.8b, v18.8b, v7.8b
3602 + st1 {v19.8b}, [TMP2]
3603 + ldp TMP1, TMP2, [OUTPUT_BUF], 16
3604 + ldp TMP4, TMP5, [OUTPUT_BUF], 16
3605 + add TMP1, TMP1, OUTPUT_COL
3606 + add TMP2, TMP2, OUTPUT_COL
3607 + add TMP4, TMP4, OUTPUT_COL
3608 + add TMP5, TMP5, OUTPUT_COL
3609 + st1 {v10.8b}, [TMP1]
3610 + /* make copy */
3611 + ins v16.2d[0], v11.2d[1]
3612 + mov v18.16b, v11.16b
3613 + trn1 v11.8b, v11.8b, v16.8b
3614 + trn2 v16.8b, v18.8b, v16.8b
3615 + st1 {v7.8b}, [TMP2]
3616 + st1 {v11.8b}, [TMP4]
3617 + st1 {v16.8b}, [TMP5]
3618 + sub sp, sp, #176
3619 + ldp x22, x23, [sp], 16
3620 + ld1 {v0.8b - v3.8b}, [sp], 32
3621 + ld1 {v4.8b - v7.8b}, [sp], 32
3622 + ld1 {v8.8b - v11.8b}, [sp], 32
3623 + ld1 {v12.8b - v15.8b}, [sp], 32
3624 + ld1 {v16.8b - v19.8b}, [sp], 32
3625 + blr x30
3626 +
3627 + .unreq DCT_TABLE
3628 + .unreq COEF_BLOCK
3629 + .unreq OUTPUT_BUF
3630 + .unreq OUTPUT_COL
3631 + .unreq TMP1
3632 + .unreq TMP2
3633 + .unreq TMP3
3634 + .unreq TMP4
3635 +
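The "Descale to 8-bit and range limit" step in jsimd_idct_ifast_neon can be modelled per sample as shown below. This is a rough C sketch, assuming that SQSHRN #5 behaves as an arithmetic right shift by 5 followed by saturation to a signed byte, and that the subsequent byte-wise add of 0x80 wraps modulo 256. The name `descale_and_limit_model` is hypothetical and does not appear in the patch.

```c
#include <stdint.h>

/* Per-sample model of "sqshrn #5" followed by "add 0x80" on bytes.
 * Hypothetical helper for illustration; not part of the patch. */
uint8_t descale_and_limit_model(int16_t v)
{
    /* Arithmetic shift right by 5 (floor division by 32), written
     * without relying on >> of a negative value. */
    int32_t s = (v >= 0) ? (v / 32) : -((-(int32_t)v + 31) / 32);

    /* Saturate to the signed 8-bit range, as the narrowing shift does. */
    if (s > 127)
        s = 127;
    if (s < -128)
        s = -128;

    /* Adding 0x80 modulo 256 maps [-128, 127] onto [0, 255]. */
    return (uint8_t)(s + 128);
}
```

The saturation does the range limiting, so no lookup table is needed: an IDCT result of 0 maps to the mid-grey sample 128, and anything outside the representable range is pinned to 0 or 255.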
3636 +
3637 +/*****************************************************************************/
3638 +
3639 +/*
3640 + * jsimd_idct_4x4_neon
3641 + *
3642 + * This function contains inverse-DCT code for getting reduced-size
3643 + * 4x4 pixels output from an 8x8 DCT block. It uses the same calculations
3644 + * and produces exactly the same output as IJG's original 'jpeg_idct_4x4'
3645 + * function from jpeg-6b (jidctred.c).
3646 + *
3647 + * NOTE: jpeg-8 has an improved implementation of 4x4 inverse-DCT, which
3648 + * requires far fewer arithmetic operations and hence should be faster.
3649 + * The primary purpose of this particular NEON optimized function is
3650 + * bit exact compatibility with jpeg-6b.
3651 + *
3652 + * TODO: slightly better instruction scheduling can be achieved by expanding
3653 + * idct_helper/transpose_4x4 macros and reordering instructions,
3654 + * but readability will suffer somewhat.
3655 + */
3656 +
3657 +#define CONST_BITS 13
3658 +
3659 +#define FIX_0_211164243 (1730) /* FIX(0.211164243) */
3660 +#define FIX_0_509795579 (4176) /* FIX(0.509795579) */
3661 +#define FIX_0_601344887 (4926) /* FIX(0.601344887) */
3662 +#define FIX_0_720959822 (5906) /* FIX(0.720959822) */
3663 +#define FIX_0_765366865 (6270) /* FIX(0.765366865) */
3664 +#define FIX_0_850430095 (6967) /* FIX(0.850430095) */
3665 +#define FIX_0_899976223 (7373) /* FIX(0.899976223) */
3666 +#define FIX_1_061594337 (8697) /* FIX(1.061594337) */
3667 +#define FIX_1_272758580 (10426) /* FIX(1.272758580) */
3668 +#define FIX_1_451774981 (11893) /* FIX(1.451774981) */
3669 +#define FIX_1_847759065 (15137) /* FIX(1.847759065) */
3670 +#define FIX_2_172734803 (17799) /* FIX(2.172734803) */
3671 +#define FIX_2_562915447 (20995) /* FIX(2.562915447) */
3672 +#define FIX_3_624509785 (29692) /* FIX(3.624509785) */
3673 +
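The FIX_* values above are the real-valued multipliers scaled by 2^CONST_BITS and rounded to the nearest integer. A small C sketch makes that relationship explicit. The helper name `fix13` is hypothetical and is not defined by the patch.

```c
#include <stdint.h>

#define CONST_BITS 13

/* Round x * 2^CONST_BITS to the nearest integer (x is assumed >= 0).
 * Hypothetical helper for illustration; not part of the patch. */
int32_t fix13(double x)
{
    return (int32_t)(x * (double)(1 << CONST_BITS) + 0.5);
}
```

For example, `fix13(0.211164243)` yields 1730 and `fix13(3.624509785)` yields 29692, matching the table. Every entry stays below 32768, so each one fits in the signed 16-bit `.short` slots of the constants table that follows.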
3674 +.balign 16
3675 +jsimd_idct_4x4_neon_consts:
3676 + .short FIX_1_847759065 /* v0.4h[0] */
3677 + .short -FIX_0_765366865 /* v0.4h[1] */
3678 + .short -FIX_0_211164243 /* v0.4h[2] */
3679 + .short FIX_1_451774981 /* v0.4h[3] */
3680 + .short -FIX_2_172734803 /* v1.4h[0] */
3681 + .short FIX_1_061594337 /* v1.4h[1] */
3682 + .short -FIX_0_509795579 /* v1.4h[2] */
3683 + .short -FIX_0_601344887 /* v1.4h[3] */
3684 + .short FIX_0_899976223 /* v2.4h[0] */
3685 + .short FIX_2_562915447 /* v2.4h[1] */
3686 + .short 1 << (CONST_BITS+1) /* v2.4h[2] */
3687 + .short 0 /* v2.4h[3] */
3688 +
3689 +.macro idct_helper x4, x6, x8, x10, x12, x14, x16, shift, y26, y27, y28, y29
3690 + smull v28.4s, \x4, v2.4h[2]
3691 + smlal v28.4s, \x8, v0.4h[0]
3692 + smlal v28.4s, \x14, v0.4h[1]
3693 +
3694 + smull v26.4s, \x16, v1.4h[2]
3695 + smlal v26.4s, \x12, v1.4h[3]
3696 + smlal v26.4s, \x10, v2.4h[0]
3697 + smlal v26.4s, \x6, v2.4h[1]
3698 +
3699 + smull v30.4s, \x4, v2.4h[2]
3700 + smlsl v30.4s, \x8, v0.4h[0]
3701 + smlsl v30.4s, \x14, v0.4h[1]
3702 +
3703 + smull v24.4s, \x16, v0.4h[2]
3704 + smlal v24.4s, \x12, v0.4h[3]
3705 + smlal v24.4s, \x10, v1.4h[0]
3706 + smlal v24.4s, \x6, v1.4h[1]
3707 +
3708 + add v20.4s, v28.4s, v26.4s
3709 + sub v28.4s, v28.4s, v26.4s
3710 +
3711 +.if \shift > 16
3712 + srshr v20.4s, v20.4s, #\shift
3713 + srshr v28.4s, v28.4s, #\shift
3714 + xtn \y26, v20.4s
3715 + xtn \y29, v28.4s
3716 +.else
3717 + rshrn \y26, v20.4s, #\shift
3718 + rshrn \y29, v28.4s, #\shift
3719 +.endif
3720 +
3721 + add v20.4s, v30.4s, v24.4s
3722 + sub v30.4s, v30.4s, v24.4s
3723 +
3724 +.if \shift > 16
3725 + srshr v20.4s, v20.4s, #\shift
3726 + srshr v30.4s, v30.4s, #\shift
3727 + xtn \y27, v20.4s
3728 + xtn \y28, v30.4s
3729 +.else
3730 + rshrn \y27, v20.4s, #\shift
3731 + rshrn \y28, v30.4s, #\shift
3732 +.endif
3733 +
3734 +.endm
3735 +
3736 +asm_function jsimd_idct_4x4_neon
3737 +
3738 + DCT_TABLE .req x0
3739 + COEF_BLOCK .req x1
3740 + OUTPUT_BUF .req x2
3741 + OUTPUT_COL .req x3
3742 + TMP1 .req x0
3743 + TMP2 .req x1
3744 + TMP3 .req x2
3745 + TMP4 .req x15
3746 +
3747 + /* Save all used NEON registers */
3748 + sub sp, sp, 272
3749 + str x15, [sp], 16
3750 + /* Load constants (v3.4h is just used for padding) */
3751 + adr TMP4, jsimd_idct_4x4_neon_consts
3752 + st1 {v0.8b - v3.8b}, [sp], 32
3753 + st1 {v4.8b - v7.8b}, [sp], 32
3754 + st1 {v8.8b - v11.8b}, [sp], 32
3755 + st1 {v12.8b - v15.8b}, [sp], 32
3756 + st1 {v16.8b - v19.8b}, [sp], 32
3757 + st1 {v20.8b - v23.8b}, [sp], 32
3758 + st1 {v24.8b - v27.8b}, [sp], 32
3759 + st1 {v28.8b - v31.8b}, [sp], 32
3760 + ld1 {v0.4h, v1.4h, v2.4h, v3.4h}, [TMP4]
3761 +
3762 + /* Load all COEF_BLOCK into NEON registers with the following allocation:
3763 + * 0 1 2 3 | 4 5 6 7
3764 + * ---------+--------
3765 + * 0 | v4.4h | v5.4h
3766 + * 1 | v6.4h | v7.4h
3767 + * 2 | v8.4h | v9.4h
3768 + * 3 | v10.4h | v11.4h
3769 + * 4 | - | -
3770 + * 5 | v12.4h | v13.4h
3771 + * 6 | v14.4h | v15.4h
3772 + * 7 | v16.4h | v17.4h
3773 + */
3774 + ld1 {v4.4h, v5.4h, v6.4h, v7.4h}, [COEF_BLOCK], 32
3775 + ld1 {v8.4h, v9.4h, v10.4h, v11.4h}, [COEF_BLOCK], 32
3776 + add COEF_BLOCK, COEF_BLOCK, #16
3777 + ld1 {v12.4h, v13.4h, v14.4h, v15.4h}, [COEF_BLOCK], 32
3778 + ld1 {v16.4h, v17.4h}, [COEF_BLOCK], 16
3779 + /* dequantize */
3780 + ld1 {v18.4h, v19.4h, v20.4h, v21.4h}, [DCT_TABLE], 32
3781 + mul v4.4h, v4.4h, v18.4h
3782 + mul v5.4h, v5.4h, v19.4h
3783 + ins v4.2d[1], v5.2d[0] /* 128 bit q4 */
3784 + ld1 {v22.4h, v23.4h, v24.4h, v25.4h}, [DCT_TABLE], 32
3785 + mul v6.4h, v6.4h, v20.4h
3786 + mul v7.4h, v7.4h, v21.4h
3787 + ins v6.2d[1], v7.2d[0] /* 128 bit q6 */
3788 + mul v8.4h, v8.4h, v22.4h
3789 + mul v9.4h, v9.4h, v23.4h
3790 + ins v8.2d[1], v9.2d[0] /* 128 bit q8 */
3791 + add DCT_TABLE, DCT_TABLE, #16
3792 + ld1 {v26.4h, v27.4h, v28.4h, v29.4h}, [DCT_TABLE], 32
3793 + mul v10.4h, v10.4h, v24.4h
3794 + mul v11.4h, v11.4h, v25.4h
3795 + ins v10.2d[1], v11.2d[0] /* 128 bit q10 */
3796 + mul v12.4h, v12.4h, v26.4h
3797 + mul v13.4h, v13.4h, v27.4h
3798 + ins v12.2d[1], v13.2d[0] /* 128 bit q12 */
3799 + ld1 {v30.4h, v31.4h}, [DCT_TABLE], 16
3800 + mul v14.4h, v14.4h, v28.4h
3801 + mul v15.4h, v15.4h, v29.4h
3802 + ins v14.2d[1], v15.2d[0] /* 128 bit q14 */
3803 + mul v16.4h, v16.4h, v30.4h
3804 + mul v17.4h, v17.4h, v31.4h
3805 + ins v16.2d[1], v17.2d[0] /* 128 bit q16 */
3806 +
3807 + /* Pass 1 */
3808 + idct_helper v4.4h, v6.4h, v8.4h, v10.4h, v12.4h, v14.4h, v16.4h, 12, v4.4h, v6.4h, v8.4h, v10.4h
3809 + transpose_4x4 v4, v6, v8, v10, v3
3810 + ins v10.2d[1], v11.2d[0]
3811 + idct_helper v5.4h, v7.4h, v9.4h, v11.4h, v13.4h, v15.4h, v17.4h, 12, v5.4h, v7.4h, v9.4h, v11.4h
3812 + transpose_4x4 v5, v7, v9, v11, v3
3813 + ins v10.2d[1], v11.2d[0]
3814 + /* Pass 2 */
3815 + idct_helper v4.4h, v6.4h, v8.4h, v10.4h, v7.4h, v9.4h, v11.4h, 19, v26.4h, v27.4h, v28.4h, v29.4h
3816 + transpose_4x4 v26, v27, v28, v29, v3
3817 +
3818 + /* Range limit */
3819 + movi v30.8h, #0x80
3820 + ins v26.2d[1], v27.2d[0]
3821 + ins v28.2d[1], v29.2d[0]
3822 + add v26.8h, v26.8h, v30.8h
3823 + add v28.8h, v28.8h, v30.8h
3824 + sqxtun v26.8b, v26.8h
3825 + sqxtun v27.8b, v28.8h
3826 +
3827 + /* Store results to the output buffer */
3828 + ldp TMP1, TMP2, [OUTPUT_BUF], 16
3829 + ldp TMP3, TMP4, [OUTPUT_BUF]
3830 + add TMP1, TMP1, OUTPUT_COL
3831 + add TMP2, TMP2, OUTPUT_COL
3832 + add TMP3, TMP3, OUTPUT_COL
3833 + add TMP4, TMP4, OUTPUT_COL
3834 +
3835 +#if defined(__ARMEL__) && !RESPECT_STRICT_ALIGNMENT
3836 + /* We can use far fewer instructions on little endian systems if the
3837 + * OS kernel is not configured to trap unaligned memory accesses
3838 + */
3839 + st1 {v26.s}[0], [TMP1], 4
3840 + st1 {v27.s}[0], [TMP3], 4
3841 + st1 {v26.s}[1], [TMP2], 4
3842 + st1 {v27.s}[1], [TMP4], 4
3843 +#else
3844 + st1 {v26.b}[0], [TMP1], 1
3845 + st1 {v27.b}[0], [TMP3], 1
3846 + st1 {v26.b}[1], [TMP1], 1
3847 + st1 {v27.b}[1], [TMP3], 1
3848 + st1 {v26.b}[2], [TMP1], 1
3849 + st1 {v27.b}[2], [TMP3], 1
3850 + st1 {v26.b}[3], [TMP1], 1
3851 + st1 {v27.b}[3], [TMP3], 1
3852 +
3853 + st1 {v26.b}[4], [TMP2], 1
3854 + st1 {v27.b}[4], [TMP4], 1
3855 + st1 {v26.b}[5], [TMP2], 1
3856 + st1 {v27.b}[5], [TMP4], 1
3857 + st1 {v26.b}[6], [TMP2], 1
3858 + st1 {v27.b}[6], [TMP4], 1
3859 + st1 {v26.b}[7], [TMP2], 1
3860 + st1 {v27.b}[7], [TMP4], 1
3861 +#endif
3862 +
3863 + /* vpop {v8.4h - v15.4h} ;not available */
3864 + sub sp, sp, #272
3865 + ldr x15, [sp], 16
3866 + ld1 {v0.8b - v3.8b}, [sp], 32
3867 + ld1 {v4.8b - v7.8b}, [sp], 32
3868 + ld1 {v8.8b - v11.8b}, [sp], 32
3869 + ld1 {v12.8b - v15.8b}, [sp], 32
3870 + ld1 {v16.8b - v19.8b}, [sp], 32
3871 + ld1 {v20.8b - v23.8b}, [sp], 32
3872 + ld1 {v24.8b - v27.8b}, [sp], 32
3873 + ld1 {v28.8b - v31.8b}, [sp], 32
3874 + blr x30
3875 +
3876 + .unreq DCT_TABLE
3877 + .unreq COEF_BLOCK
3878 + .unreq OUTPUT_BUF
3879 + .unreq OUTPUT_COL
3880 + .unreq TMP1
3881 + .unreq TMP2
3882 + .unreq TMP3
3883 + .unreq TMP4
3884 +
3885 +.purgem idct_helper
3886 +
3887 +
3888 +/*****************************************************************************/
3889 +
3890 +/*
3891 + * jsimd_idct_2x2_neon
3892 + *
3893 + * This function contains inverse-DCT code for getting reduced-size
3894 + * 2x2 pixels output from an 8x8 DCT block. It uses the same calculations
3895 + * and produces exactly the same output as IJG's original 'jpeg_idct_2x2'
3896 + * function from jpeg-6b (jidctred.c).
3897 + *
3898 + * NOTE: jpeg-8 has an improved implementation of 2x2 inverse-DCT, which
3899 + * requires far fewer arithmetic operations and hence should be faster.
3900 + * The primary purpose of this particular NEON optimized function is
3901 + * bit exact compatibility with jpeg-6b.
3902 + */
3903 +
3904 +.balign 8
3905 +jsimd_idct_2x2_neon_consts:
3906 + .short -FIX_0_720959822 /* v14[0] */
3907 + .short FIX_0_850430095 /* v14[1] */
3908 + .short -FIX_1_272758580 /* v14[2] */
3909 + .short FIX_3_624509785 /* v14[3] */
3910 +
3911 +.macro idct_helper x4, x6, x10, x12, x16, shift, y26, y27
3912 + sshll v15.4s, \x4, #15
3913 + smull v26.4s, \x6, v14.4h[3]
3914 + smlal v26.4s, \x10, v14.4h[2]
3915 + smlal v26.4s, \x12, v14.4h[1]
3916 + smlal v26.4s, \x16, v14.4h[0]
3917 +
3918 + add v20.4s, v15.4s, v26.4s
3919 + sub v15.4s, v15.4s, v26.4s
3920 +
3921 +.if \shift > 16
3922 + srshr v20.4s, v20.4s, #\shift
3923 + srshr v15.4s, v15.4s, #\shift
3924 + xtn \y26, v20.4s
3925 + xtn \y27, v15.4s
3926 +.else
3927 + rshrn \y26, v20.4s, #\shift
3928 + rshrn \y27, v15.4s, #\shift
3929 +.endif
3930 +
3931 +.endm
3932 +
3933 +asm_function jsimd_idct_2x2_neon
3934 +
3935 + DCT_TABLE .req x0
3936 + COEF_BLOCK .req x1
3937 + OUTPUT_BUF .req x2
3938 + OUTPUT_COL .req x3
3939 + TMP1 .req x0
3940 + TMP2 .req x15
3941 +
3942 + /* vpush {v8.4h - v15.4h} ; not available */
3943 + sub sp, sp, 208
3944 + str x15, [sp], 16
3945 +
3946 + /* Load constants */
3947 + adr TMP2, jsimd_idct_2x2_neon_consts
3948 + st1 {v4.8b - v7.8b}, [sp], 32
3949 + st1 {v8.8b - v11.8b}, [sp], 32
3950 + st1 {v12.8b - v15.8b}, [sp], 32
3951 + st1 {v16.8b - v19.8b}, [sp], 32
3952 + st1 {v21.8b - v22.8b}, [sp], 16
3953 + st1 {v24.8b - v27.8b}, [sp], 32
3954 + st1 {v30.8b - v31.8b}, [sp], 16
3955 + ld1 {v14.4h}, [TMP2]
3956 +
3957 + /* Load all COEF_BLOCK into NEON registers with the following allocation:
3958 + * 0 1 2 3 | 4 5 6 7
3959 + * ---------+--------
3960 + * 0 | v4.4h | v5.4h
3961 + * 1 | v6.4h | v7.4h
3962 + * 2 | - | -
3963 + * 3 | v10.4h | v11.4h
3964 + * 4 | - | -
3965 + * 5 | v12.4h | v13.4h
3966 + * 6 | - | -
3967 + * 7 | v16.4h | v17.4h
3968 + */
3969 + ld1 {v4.4h, v5.4h, v6.4h, v7.4h}, [COEF_BLOCK], 32
3970 + add COEF_BLOCK, COEF_BLOCK, #16
3971 + ld1 {v10.4h, v11.4h}, [COEF_BLOCK], 16
3972 + add COEF_BLOCK, COEF_BLOCK, #16
3973 + ld1 {v12.4h, v13.4h}, [COEF_BLOCK], 16
3974 + add COEF_BLOCK, COEF_BLOCK, #16
3975 + ld1 {v16.4h, v17.4h}, [COEF_BLOCK], 16
3976 + /* Dequantize */
3977 + ld1 {v18.4h, v19.4h, v20.4h, v21.4h}, [DCT_TABLE], 32
3978 + mul v4.4h, v4.4h, v18.4h
3979 + mul v5.4h, v5.4h, v19.4h
3980 + ins v4.2d[1], v5.2d[0]
3981 + mul v6.4h, v6.4h, v20.4h
3982 + mul v7.4h, v7.4h, v21.4h
3983 + ins v6.2d[1], v7.2d[0]
3984 + add DCT_TABLE, DCT_TABLE, #16
3985 + ld1 {v24.4h, v25.4h}, [DCT_TABLE], 16
3986 + mul v10.4h, v10.4h, v24.4h
3987 + mul v11.4h, v11.4h, v25.4h
3988 + ins v10.2d[1], v11.2d[0]
3989 + add DCT_TABLE, DCT_TABLE, #16
3990 + ld1 {v26.4h, v27.4h}, [DCT_TABLE], 16
3991 + mul v12.4h, v12.4h, v26.4h
3992 + mul v13.4h, v13.4h, v27.4h
3993 + ins v12.2d[1], v13.2d[0]
3994 + add DCT_TABLE, DCT_TABLE, #16
3995 + ld1 {v30.4h, v31.4h}, [DCT_TABLE], 16
3996 + mul v16.4h, v16.4h, v30.4h
3997 + mul v17.4h, v17.4h, v31.4h
3998 + ins v16.2d[1], v17.2d[0]
3999 +
4000 + /* Pass 1 */
4001 +#if 0
4002 + idct_helper v4.4h, v6.4h, v10.4h, v12.4h, v16.4h, 13, v4.4h, v6.4h
4003 + transpose_4x4 v4.4h, v6.4h, v8.4h, v10.4h
4004 + idct_helper v5.4h, v7.4h, v11.4h, v13.4h, v17.4h, 13, v5.4h, v7.4h
4005 + transpose_4x4 v5.4h, v7.4h, v9.4h, v11.4h
4006 +#else
4007 + smull v26.4s, v6.4h, v14.4h[3]
4008 + smlal v26.4s, v10.4h, v14.4h[2]
4009 + smlal v26.4s, v12.4h, v14.4h[1]
4010 + smlal v26.4s, v16.4h, v14.4h[0]
4011 + smull v24.4s, v7.4h, v14.4h[3]
4012 + smlal v24.4s, v11.4h, v14.4h[2]
4013 + smlal v24.4s, v13.4h, v14.4h[1]
4014 + smlal v24.4s, v17.4h, v14.4h[0]
4015 + sshll v15.4s, v4.4h, #15
4016 + sshll v30.4s, v5.4h, #15
4017 + add v20.4s, v15.4s, v26.4s
4018 + sub v15.4s, v15.4s, v26.4s
4019 + rshrn v4.4h, v20.4s, #13
4020 + rshrn v6.4h, v15.4s, #13
4021 + add v20.4s, v30.4s, v24.4s
4022 + sub v15.4s, v30.4s, v24.4s
4023 + rshrn v5.4h, v20.4s, #13
4024 + rshrn v7.4h, v15.4s, #13
4025 + ins v4.2d[1], v5.2d[0]
4026 + ins v6.2d[1], v7.2d[0]
4027 + transpose v4, v6, v3, .16b, .8h
4028 + transpose v6, v10, v3, .16b, .4s
4029 + ins v11.2d[0], v10.2d[1]
4030 + ins v7.2d[0], v6.2d[1]
4031 +#endif
4032 +
4033 + /* Pass 2 */
4034 + idct_helper v4.4h, v6.4h, v10.4h, v7.4h, v11.4h, 20, v26.4h, v27.4h
4035 +
4036 + /* Range limit */
4037 + movi v30.8h, #0x80
4038 + ins v26.2d[1], v27.2d[0]
4039 + add v26.8h, v26.8h, v30.8h
4040 + sqxtun v30.8b, v26.8h
4041 + ins v26.2d[0], v30.2d[0]
4042 + sqxtun v27.8b, v26.8h
4043 +
4044 + /* Store results to the output buffer */
4045 + ldp TMP1, TMP2, [OUTPUT_BUF]
4046 + add TMP1, TMP1, OUTPUT_COL
4047 + add TMP2, TMP2, OUTPUT_COL
4048 +
4049 + st1 {v26.b}[0], [TMP1], 1
4050 + st1 {v27.b}[4], [TMP1], 1
4051 + st1 {v26.b}[1], [TMP2], 1
4052 + st1 {v27.b}[5], [TMP2], 1
4053 +
4054 + sub sp, sp, #208
4055 + ldr x15, [sp], 16
4056 + ld1 {v4.8b - v7.8b}, [sp], 32
4057 + ld1 {v8.8b - v11.8b}, [sp], 32
4058 + ld1 {v12.8b - v15.8b}, [sp], 32
4059 + ld1 {v16.8b - v19.8b}, [sp], 32
4060 + ld1 {v21.8b - v22.8b}, [sp], 16
4061 + ld1 {v24.8b - v27.8b}, [sp], 32
4062 + ld1 {v30.8b - v31.8b}, [sp], 16
4063 + blr x30
4064 +
4065 + .unreq DCT_TABLE
4066 + .unreq COEF_BLOCK
4067 + .unreq OUTPUT_BUF
4068 + .unreq OUTPUT_COL
4069 + .unreq TMP1
4070 + .unreq TMP2
4071 +
4072 +.purgem idct_helper
4073 +
4074 +
4075 +/*****************************************************************************/
4076 +
4077 +/*
4078 + * jsimd_ycc_extrgb_convert_neon
4079 + * jsimd_ycc_extbgr_convert_neon
4080 + * jsimd_ycc_extrgbx_convert_neon
4081 + * jsimd_ycc_extbgrx_convert_neon
4082 + * jsimd_ycc_extxbgr_convert_neon
4083 + * jsimd_ycc_extxrgb_convert_neon
4084 + *
4085 + * Colorspace conversion YCbCr -> RGB
4086 + */
4087 +
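As a reading aid for the conversion macros that follow, here is a per-pixel C sketch of the same fixed-point arithmetic. It is an approximation written from the constants and shift amounts visible in the patch (22971, -11277, -23401, 29033, with rounding shifts by 14 and 15), not code taken from libjpeg-turbo. All function names are hypothetical.

```c
#include <stdint.h>

/* Rounding arithmetic shift right, modelling RSHRN for the value ranges
 * that occur here. Hypothetical helper; not part of the patch. */
int32_t rounding_shift_right_model(int32_t v, int n)
{
    int32_t t = v + ((int32_t)1 << (n - 1));
    int32_t mask = ((int32_t)1 << n) - 1;
    return (t >= 0) ? (t >> n) : -((-t + mask) >> n);
}

/* Clamp to [0, 255], modelling the saturating narrow (SQXTUN). */
uint8_t clamp_u8_model(int32_t v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* One YCbCr pixel to RGB using the constants from the patch. */
void ycc_to_rgb_model(uint8_t y, uint8_t cb, uint8_t cr, uint8_t rgb[3])
{
    int32_t u = (int32_t)cb - 128;
    int32_t v = (int32_t)cr - 128;

    /* 22971 is about 1.402 * 2^14 and 29033 about 1.772 * 2^14;
     * -11277 is about -0.34414 * 2^15 and -23401 about -0.71414 * 2^15. */
    int32_t r = y + rounding_shift_right_model(22971 * v, 14);
    int32_t g = y + rounding_shift_right_model(-11277 * u - 23401 * v, 15);
    int32_t b = y + rounding_shift_right_model(29033 * u, 14);

    rgb[0] = clamp_u8_model(r);
    rgb[1] = clamp_u8_model(g);
    rgb[2] = clamp_u8_model(b);
}
```

A neutral pixel (Cb = Cr = 128) passes Y straight through to all three channels, which is a quick sanity check for any reimplementation.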
4088 +
4089 +.macro do_load size
4090 + .if \size == 8
4091 + ld1 {v4.8b}, [U], 8
4092 + ld1 {v5.8b}, [V], 8
4093 + ld1 {v0.8b}, [Y], 8
4094 + prfm PLDL1KEEP, [U, #64]
4095 + prfm PLDL1KEEP, [V, #64]
4096 + prfm PLDL1KEEP, [Y, #64]
4097 + .elseif \size == 4
4098 + ld1 {v4.b}[0], [U], 1
4099 + ld1 {v4.b}[1], [U], 1
4100 + ld1 {v4.b}[2], [U], 1
4101 + ld1 {v4.b}[3], [U], 1
4102 + ld1 {v5.b}[0], [V], 1
4103 + ld1 {v5.b}[1], [V], 1
4104 + ld1 {v5.b}[2], [V], 1
4105 + ld1 {v5.b}[3], [V], 1
4106 + ld1 {v0.b}[0], [Y], 1
4107 + ld1 {v0.b}[1], [Y], 1
4108 + ld1 {v0.b}[2], [Y], 1
4109 + ld1 {v0.b}[3], [Y], 1
4110 + .elseif \size == 2
4111 + ld1 {v4.b}[4], [U], 1
4112 + ld1 {v4.b}[5], [U], 1
4113 + ld1 {v5.b}[4], [V], 1
4114 + ld1 {v5.b}[5], [V], 1
4115 + ld1 {v0.b}[4], [Y], 1
4116 + ld1 {v0.b}[5], [Y], 1
4117 + .elseif \size == 1
4118 + ld1 {v4.b}[6], [U], 1
4119 + ld1 {v5.b}[6], [V], 1
4120 + ld1 {v0.b}[6], [Y], 1
4121 + .else
4122 + .error unsupported macroblock size
4123 + .endif
4124 +.endm
4125 +
4126 +.macro do_store bpp, size
4127 + .if \bpp == 24
4128 + .if \size == 8
4129 + st3 {v10.8b, v11.8b, v12.8b}, [RGB], 24
4130 + .elseif \size == 4
4131 + st3 {v10.b, v11.b, v12.b}[0], [RGB], 3
4132 + st3 {v10.b, v11.b, v12.b}[1], [RGB], 3
4133 + st3 {v10.b, v11.b, v12.b}[2], [RGB], 3
4134 + st3 {v10.b, v11.b, v12.b}[3], [RGB], 3
4135 + .elseif \size == 2
4136 + st3 {v10.b, v11.b, v12.b}[4], [RGB], 3
4137 + st3 {v10.b, v11.b, v12.b}[5], [RGB], 3
4138 + .elseif \size == 1
4139 + st3 {v10.b, v11.b, v12.b}[6], [RGB], 3
4140 + .else
4141 + .error unsupported macroblock size
4142 + .endif
4143 + .elseif \bpp == 32
4144 + .if \size == 8
4145 + st4 {v10.8b, v11.8b, v12.8b, v13.8b}, [RGB], 32
4146 + .elseif \size == 4
4147 + st4 {v10.b, v11.b, v12.b, v13.b}[0], [RGB], 4
4148 + st4 {v10.b, v11.b, v12.b, v13.b}[1], [RGB], 4
4149 + st4 {v10.b, v11.b, v12.b, v13.b}[2], [RGB], 4
4150 + st4 {v10.b, v11.b, v12.b, v13.b}[3], [RGB], 4
4151 + .elseif \size == 2
4152 + st4 {v10.b, v11.b, v12.b, v13.b}[4], [RGB], 4
4153 + st4 {v10.b, v11.b, v12.b, v13.b}[5], [RGB], 4
4154 + .elseif \size == 1
4155 + st4 {v10.b, v11.b, v12.b, v13.b}[6], [RGB], 4
4156 + .else
4157 + .error unsupported macroblock size
4158 + .endif
4159 + .elseif \bpp==16
4160 + .if \size == 8
4161 + st1 {v25.8h}, [RGB],16
4162 + .elseif \size == 4
4163 + st1 {v25.4h}, [RGB],8
4164 + .elseif \size == 2
4165 + st1 {v25.h}[4], [RGB],2
4166 + st1 {v25.h}[5], [RGB],2
4167 + .elseif \size == 1
4168 + st1 {v25.h}[6], [RGB],2
4169 + .else
4170 + .error unsupported macroblock size
4171 + .endif
4172 + .else
4173 + .error unsupported bpp
4174 + .endif
4175 +.endm
4176 +
4177 +.macro generate_jsimd_ycc_rgb_convert_neon colorid, bpp, r_offs, rsize, g_offs, gsize, b_offs, bsize, defsize
4178 +
4179 +/*
4180 + * 2-stage pipelined YCbCr->RGB conversion
4181 + */
4182 +
4183 +.macro do_yuv_to_rgb_stage1
4184 + uaddw v6.8h, v2.8h, v4.8b /* q3 = u - 128 */
4185 + uaddw v8.8h, v2.8h, v5.8b /* q2 = v - 128 */
4186 + smull v20.4s, v6.4h, v1.4h[1] /* multiply by -11277 */
4187 + smlal v20.4s, v8.4h, v1.4h[2] /* multiply by -23401 */
4188 + smull2 v22.4s, v6.8h, v1.4h[1] /* multiply by -11277 */
4189 + smlal2 v22.4s, v8.8h, v1.4h[2] /* multiply by -23401 */
4190 + smull v24.4s, v8.4h, v1.4h[0] /* multiply by 22971 */
4191 + smull2 v26.4s, v8.8h, v1.4h[0] /* multiply by 22971 */
4192 + smull v28.4s, v6.4h, v1.4h[3] /* multiply by 29033 */
4193 + smull2 v30.4s, v6.8h, v1.4h[3] /* multiply by 29033 */
4194 +.endm
4195 +
4196 +.macro do_yuv_to_rgb_stage2
4197 + rshrn v20.4h, v20.4s, #15
4198 + rshrn2 v20.8h, v22.4s, #15
4199 + rshrn v24.4h, v24.4s, #14
4200 + rshrn2 v24.8h, v26.4s, #14
4201 + rshrn v28.4h, v28.4s, #14
4202 + rshrn2 v28.8h, v30.4s, #14
4203 + uaddw v20.8h, v20.8h, v0.8b
4204 + uaddw v24.8h, v24.8h, v0.8b
4205 + uaddw v28.8h, v28.8h, v0.8b
4206 +.if \bpp != 16
4207 + sqxtun v1\g_offs\defsize, v20.8h
4208 + sqxtun v1\r_offs\defsize, v24.8h
4209 + sqxtun v1\b_offs\defsize, v28.8h
4210 +.else
4211 + sqshlu v21.8h, v20.8h, #8
4212 + sqshlu v25.8h, v24.8h, #8
4213 + sqshlu v29.8h, v28.8h, #8
4214 + sri v25.8h, v21.8h, #5
4215 + sri v25.8h, v29.8h, #11
4216 +.endif
4217 +
4218 +.endm
4219 +
4220 +.macro do_yuv_to_rgb_stage2_store_load_stage1
4221 + rshrn v20.4h, v20.4s, #15
4222 + rshrn v24.4h, v24.4s, #14
4223 + rshrn v28.4h, v28.4s, #14
4224 + ld1 {v4.8b}, [U], 8
4225 + rshrn2 v20.8h, v22.4s, #15
4226 + rshrn2 v24.8h, v26.4s, #14
4227 + rshrn2 v28.8h, v30.4s, #14
4228 + ld1 {v5.8b}, [V], 8
4229 + uaddw v20.8h, v20.8h, v0.8b
4230 + uaddw v24.8h, v24.8h, v0.8b
4231 + uaddw v28.8h, v28.8h, v0.8b
4232 +.if \bpp != 16 /**************** rgb24/rgb32 *********************************/
4233 + sqxtun v1\g_offs\defsize, v20.8h
4234 + ld1 {v0.8b}, [Y], 8
4235 + sqxtun v1\r_offs\defsize, v24.8h
4236 + prfm PLDL1KEEP, [U, #64]
4237 + prfm PLDL1KEEP, [V, #64]
4238 + prfm PLDL1KEEP, [Y, #64]
4239 + sqxtun v1\b_offs\defsize, v28.8h
4240 + uaddw v6.8h, v2.8h, v4.8b /* v6.16b = u - 128 */
4241 + uaddw v8.8h, v2.8h, v5.8b /* q2 = v - 128 */
4242 + smull v20.4s, v6.4h, v1.4h[1] /* multiply by -11277 */
4243 + smlal v20.4s, v8.4h, v1.4h[2] /* multiply by -23401 */
4244 + smull2 v22.4s, v6.8h, v1.4h[1] /* multiply by -11277 */
4245 + smlal2 v22.4s, v8.8h, v1.4h[2] /* multiply by -23401 */
4246 + smull v24.4s, v8.4h, v1.4h[0] /* multiply by 22971 */
4247 + smull2 v26.4s, v8.8h, v1.4h[0] /* multiply by 22971 */
4248 +.else /**************************** rgb565 ***********************************/
4249 + sqshlu v21.8h, v20.8h, #8
4250 + sqshlu v25.8h, v24.8h, #8
4251 + sqshlu v29.8h, v28.8h, #8
4252 + uaddw v6.8h, v2.8h, v4.8b /* v6.16b = u - 128 */
4253 + uaddw v8.8h, v2.8h, v5.8b /* q2 = v - 128 */
4254 + ld1 {v0.8b}, [Y], 8
4255 + smull v20.4s, v6.4h, v1.4h[1] /* multiply by -11277 */
4256 + smlal v20.4s, v8.4h, v1.4h[2] /* multiply by -23401 */
4257 + smull2 v22.4s, v6.8h, v1.4h[1] /* multiply by -11277 */
4258 + smlal2 v22.4s, v8.8h, v1.4h[2] /* multiply by -23401 */
4259 + sri v25.8h, v21.8h, #5
4260 + smull v24.4s, v8.4h, v1.4h[0] /* multiply by 22971 */
4261 + smull2 v26.4s, v8.8h, v1.4h[0] /* multiply by 22971 */
4262 + prfm PLDL1KEEP, [U, #64]
4263 + prfm PLDL1KEEP, [V, #64]
4264 + prfm PLDL1KEEP, [Y, #64]
4265 + sri v25.8h, v29.8h, #11
4266 +.endif
4267 + do_store \bpp, 8
4268 + smull v28.4s, v6.4h, v1.4h[3] /* multiply by 29033 */
4269 + smull2 v30.4s, v6.8h, v1.4h[3] /* multiply by 29033 */
4270 +.endm
4271 +
4272 +.macro do_yuv_to_rgb
4273 + do_yuv_to_rgb_stage1
4274 + do_yuv_to_rgb_stage2
4275 +.endm
4276 +
4277 +/* Apple gas crashes on adrl, work around that by using adr.
4278 + * But this requires a copy of these constants for each function.
4279 + */
4280 +
4281 +.balign 16
4282 +jsimd_ycc_\colorid\()_neon_consts:
4283 + .short 0, 0, 0, 0
4284 + .short 22971, -11277, -23401, 29033
4285 + .short -128, -128, -128, -128
4286 + .short -128, -128, -128, -128
4287 +
4288 +asm_function jsimd_ycc_\colorid\()_convert_neon
4289 + OUTPUT_WIDTH .req x0
4290 + INPUT_BUF .req x1
4291 + INPUT_ROW .req x2
4292 + OUTPUT_BUF .req x3
4293 + NUM_ROWS .req x4
4294 +
4295 + INPUT_BUF0 .req x5
4296 + INPUT_BUF1 .req x6
4297 + INPUT_BUF2 .req INPUT_BUF
4298 +
4299 + RGB .req x7
4300 + Y .req x8
4301 + U .req x9
4302 + V .req x10
4303 + N .req x15
4304 +
4305 + sub sp, sp, 336
4306 + str x15, [sp], 16
4307 + /* Load constants to v1.4h and v2.8h (v0.4h is just used for padding) */
4308 + adr x15, jsimd_ycc_\colorid\()_neon_consts
4309 + /* Save NEON registers */
4310 + st1 {v0.8b - v3.8b}, [sp], 32
4311 + st1 {v4.8b - v7.8b}, [sp], 32
4312 + st1 {v8.8b - v11.8b}, [sp], 32
4313 + st1 {v12.8b - v15.8b}, [sp], 32
4314 + st1 {v16.8b - v19.8b}, [sp], 32
4315 + st1 {v20.8b - v23.8b}, [sp], 32
4316 + st1 {v24.8b - v27.8b}, [sp], 32
4317 + st1 {v28.8b - v31.8b}, [sp], 32
4318 + ld1 {v0.4h, v1.4h}, [x15], 16
4319 + ld1 {v2.8h}, [x15]
4320 +
4321 + /* Save ARM registers and handle input arguments */
4322 + /* push {x4, x5, x6, x7, x8, x9, x10, x30} */
4323 + stp x4, x5, [sp], 16
4324 + stp x6, x7, [sp], 16
4325 + stp x8, x9, [sp], 16
4326 + stp x10, x30, [sp], 16
4327 + ldr INPUT_BUF0, [INPUT_BUF]
4328 + ldr INPUT_BUF1, [INPUT_BUF, 8]
4329 + ldr INPUT_BUF2, [INPUT_BUF, 16]
4330 + .unreq INPUT_BUF
4331 +
4332 + /* Initially set v10, v11.4h, v12.8b, d13 to 0xFF */
4333 + movi v10.16b, #255
4334 + movi v13.16b, #255
4335 +
4336 + /* Outer loop over scanlines */
4337 + cmp NUM_ROWS, #1
4338 + blt 9f
4339 +0:
4340 + lsl x16, INPUT_ROW, #3
4341 + ldr Y, [INPUT_BUF0, x16]
4342 + ldr U, [INPUT_BUF1, x16]
4343 + mov N, OUTPUT_WIDTH
4344 + ldr V, [INPUT_BUF2, x16]
4345 + add INPUT_ROW, INPUT_ROW, #1
4346 + ldr RGB, [OUTPUT_BUF], #8
4347 +
4348 + /* Inner loop over pixels */
4349 + subs N, N, #8
4350 + blt 3f
4351 + do_load 8
4352 + do_yuv_to_rgb_stage1
4353 + subs N, N, #8
4354 + blt 2f
4355 +1:
4356 + do_yuv_to_rgb_stage2_store_load_stage1
4357 + subs N, N, #8
4358 + bge 1b
4359 +2:
4360 + do_yuv_to_rgb_stage2
4361 + do_store \bpp, 8
4362 + tst N, #7
4363 + beq 8f
4364 +3:
4365 + tst N, #4
4366 + beq 3f
4367 + do_load 4
4368 +3:
4369 + tst N, #2
4370 + beq 4f
4371 + do_load 2
4372 +4:
4373 + tst N, #1
4374 + beq 5f
4375 + do_load 1
4376 +5:
4377 + do_yuv_to_rgb
4378 + tst N, #4
4379 + beq 6f
4380 + do_store \bpp, 4
4381 +6:
4382 + tst N, #2
4383 + beq 7f
4384 + do_store \bpp, 2
4385 +7:
4386 + tst N, #1
4387 + beq 8f
4388 + do_store \bpp, 1
4389 +8:
4390 + subs NUM_ROWS, NUM_ROWS, #1
4391 + bgt 0b
4392 +9:
4393 + /* Restore all registers and return */
4394 + sub sp, sp, #336
4395 + ldr x15, [sp], 16
4396 + ld1 {v0.8b - v3.8b}, [sp], 32
4397 + ld1 {v4.8b - v7.8b}, [sp], 32
4398 + ld1 {v8.8b - v11.8b}, [sp], 32
4399 + ld1 {v12.8b - v15.8b}, [sp], 32
4400 + ld1 {v16.8b - v19.8b}, [sp], 32
4401 + ld1 {v20.8b - v23.8b}, [sp], 32
4402 + ld1 {v24.8b - v27.8b}, [sp], 32
4403 + ld1 {v28.8b - v31.8b}, [sp], 32
4404 + /* pop {x4, x5, x6, x7, x8, x9, x10, x30} */
4405 + ldp x4, x5, [sp], 16
4406 + ldp x6, x7, [sp], 16
4407 + ldp x8, x9, [sp], 16
4408 + ldp x10, x30, [sp], 16
4409 + br x30
4410 + .unreq OUTPUT_WIDTH
4411 + .unreq INPUT_ROW
4412 + .unreq OUTPUT_BUF
4413 + .unreq NUM_ROWS
4414 + .unreq INPUT_BUF0
4415 + .unreq INPUT_BUF1
4416 + .unreq INPUT_BUF2
4417 + .unreq RGB
4418 + .unreq Y
4419 + .unreq U
4420 + .unreq V
4421 + .unreq N
4422 +
4423 +.purgem do_yuv_to_rgb
4424 +.purgem do_yuv_to_rgb_stage1
4425 +.purgem do_yuv_to_rgb_stage2
4426 +.purgem do_yuv_to_rgb_stage2_store_load_stage1
4427 +.endm
4428 +
4429 +/*--------------------------------- id ----- bpp R rsize G gsize B bsize defsize */
4430 +generate_jsimd_ycc_rgb_convert_neon extrgb, 24, 0, .4h, 1, .4h, 2, .4h, .8b
4431 +generate_jsimd_ycc_rgb_convert_neon extbgr, 24, 2, .4h, 1, .4h, 0, .4h, .8b
4432 +generate_jsimd_ycc_rgb_convert_neon extrgbx, 32, 0, .4h, 1, .4h, 2, .4h, .8b
4433 +generate_jsimd_ycc_rgb_convert_neon extbgrx, 32, 2, .4h, 1, .4h, 0, .4h, .8b
4434 +generate_jsimd_ycc_rgb_convert_neon extxbgr, 32, 3, .4h, 2, .4h, 1, .4h, .8b
4435 +generate_jsimd_ycc_rgb_convert_neon extxrgb, 32, 1, .4h, 2, .4h, 3, .4h, .8b
4436 +generate_jsimd_ycc_rgb_convert_neon rgb565, 16, 0, .4h, 0, .4h, 0, .4h, .8b
4437 +.purgem do_load
4438 +.purgem do_store