google.patch - Issue 1258673007: Add jpeg_skip_scanlines() API to libjpeg-turbo

Side by Side Diff: google.patch

Issue 1258673007: Add jpeg_skip_scanlines() API to libjpeg-turbo (Closed) Base URL: https://chromium.googlesource.com/chromium/deps/libjpeg_turbo.git@master

Patch Set: Updating google.patch Created 5 years, 4 months ago

	1 Index: README

	2 ===================================================================

	3 --- README (revision 829)

	4 +++ README (working copy)

	5 @@ -1,26 +1,26 @@

	6 +libjpeg-turbo note: This file has been modified by The libjpeg-turbo Project

	7 +to include only information relevant to libjpeg-turbo, to wordsmith certain

	8 +sections, and to remove impolitic language that existed in the libjpeg v8

	9 +README. It is included only for reference. Please see README-turbo.txt for

	10 +information specific to libjpeg-turbo.

	11 +

	12 +

	13 The Independent JPEG Group's JPEG software

	14 ==========================================

	15

	16 -README for release 6b of 27-Mar-1998

	17 -====================================

	18 +This distribution contains a release of the Independent JPEG Group's free JPEG

	19 +software. You are welcome to redistribute this software and to use it for any

	20 +purpose, subject to the conditions under LEGAL ISSUES, below.

	21

	22 -This distribution contains the sixth public release of the Independent JPEG

	23 -Group's free JPEG software. You are welcome to redistribute this software and

	24 -to use it for any purpose, subject to the conditions under LEGAL ISSUES, below.

	25 +This software is the work of Tom Lane, Guido Vollbeding, Philip Gladstone,

	26 +Bill Allombert, Jim Boucher, Lee Crocker, Bob Friesenhahn, Ben Jackson,

	27 +Julian Minguillon, Luis Ortiz, George Phillips, Davide Rossi, Ge' Weijers,

	28 +and other members of the Independent JPEG Group.

	29

	30 -Serious users of this software (particularly those incorporating it into

	31 -larger programs) should contact IJG at jpeg-info@uunet.uu.net to be added to

	32 -our electronic mailing list. Mailing list members are notified of updates

	33 -and have a chance to participate in technical discussions, etc.

	34 +IJG is not affiliated with the ISO/IEC JTC1/SC29/WG1 standards committee

	35 +(also known as JPEG, together with ITU-T SG16).

	36

	37 -This software is the work of Tom Lane, Philip Gladstone, Jim Boucher,

	38 -Lee Crocker, Julian Minguillon, Luis Ortiz, George Phillips, Davide Rossi,

	39 -Guido Vollbeding, Ge' Weijers, and other members of the Independent JPEG

	40 -Group.

	41

	42 -IJG is not affiliated with the official ISO JPEG standards committee.

	43 -

	44 -

	45 DOCUMENTATION ROADMAP

	46 =====================

	47

	48 @@ -30,7 +30,6 @@

	49 LEGAL ISSUES Copyright, lack of warranty, terms of distribution.

	50 REFERENCES Where to learn more about JPEG.

	51 ARCHIVE LOCATIONS Where to find newer versions of this software.

	52 -RELATED SOFTWARE Other stuff you should get.

	53 FILE FORMAT WARS Software not to get.

	54 TO DO Plans for future IJG releases.

	55

	56 @@ -37,20 +36,19 @@

	57 Other documentation files in the distribution are:

	58

	59 User documentation:

	60 - install.doc How to configure and install the IJG software.

	61 - usage.doc Usage instructions for cjpeg, djpeg, jpegtran,

	62 + install.txt How to configure and install the IJG software.

	63 + usage.txt Usage instructions for cjpeg, djpeg, jpegtran,

	64 rdjpgcom, and wrjpgcom.

	65 - *.1 Unix-style man pages for programs (same info as usage.doc).

	66 - wizard.doc Advanced usage instructions for JPEG wizards only.

	67 + *.1 Unix-style man pages for programs (same info as usage.txt).

	68 + wizard.txt Advanced usage instructions for JPEG wizards only.

	69 change.log Version-to-version change highlights.

	70 Programmer and internal documentation:

	71 - libjpeg.doc How to use the JPEG library in your own programs.

	72 + libjpeg.txt How to use the JPEG library in your own programs.

	73 example.c Sample code for calling the JPEG library.

	74 - structure.doc Overview of the JPEG library's internal structure.

	75 - filelist.doc Road map of IJG files.

	76 - coderules.doc Coding style rules --- please read if you contribute code.

	77 + structure.txt Overview of the JPEG library's internal structure.

	78 + coderules.txt Coding style rules --- please read if you contribute code.

	79

	80 -Please read at least the files install.doc and usage.doc. Useful information

	81 +Please read at least the files install.txt and usage.txt. Some information

	82 can also be found in the JPEG FAQ (Frequently Asked Questions) article. See

	83 ARCHIVE LOCATIONS below to find out where to obtain the FAQ article.

	84

	85 @@ -62,24 +60,27 @@

	86 OVERVIEW

	87 ========

	88

	89 -This package contains C software to implement JPEG image compression and

	90 -decompression. JPEG (pronounced "jay-peg") is a standardized compression

	91 -method for full-color and gray-scale images. JPEG is intended for compressing

	92 -"real-world" scenes; line drawings, cartoons and other non-realistic images

	93 -are not its strong suit. JPEG is lossy, meaning that the output image is not

	94 -exactly identical to the input image. Hence you must not use JPEG if you

	95 -have to have identical output bits. However, on typical photographic images,

	96 -very good compression levels can be obtained with no visible change, and

	97 -remarkably high compression levels are possible if you can tolerate a

	98 -low-quality image. For more details, see the references, or just experiment

	99 -with various compression settings.

	100 +This package contains C software to implement JPEG image encoding, decoding,

	101 +and transcoding. JPEG (pronounced "jay-peg") is a standardized compression

	102 +method for full-color and gray-scale images. JPEG's strong suit is compressing

	103 +photographic images or other types of images that have smooth color and

	104 +brightness transitions between neighboring pixels. Images with sharp lines or

	105 +other abrupt features may not compress well with JPEG, and a higher JPEG

	106 +quality may have to be used to avoid visible compression artifacts with such

	107 +images.

	108

	109 +JPEG is lossy, meaning that the output pixels are not necessarily identical to

	110 +the input pixels. However, on photographic content and other "smooth" images,

	111 +very good compression ratios can be obtained with no visible compression

	112 +artifacts, and extremely high compression ratios are possible if you are

	113 +willing to sacrifice image quality (by reducing the "quality" setting in the

	114 +compressor.)

	115 +

	116 This software implements JPEG baseline, extended-sequential, and progressive

	117 compression processes. Provision is made for supporting all variants of these

	118 processes, although some uncommon parameter settings aren't implemented yet.

	119 -For legal reasons, we are not distributing code for the arithmetic-coding

	120 -variants of JPEG; see LEGAL ISSUES. We have made no provision for supporting

	121 -the hierarchical or lossless processes defined in the standard.

	122 +We have made no provision for supporting the hierarchical or lossless

	123 +processes defined in the standard.

	124

	125 We provide a set of library routines for reading and writing JPEG image files,

	126 plus two sample applications "cjpeg" and "djpeg", which use the library to

	127 @@ -91,11 +92,12 @@

	128 for example, the color quantization modules are not strictly part of JPEG

	129 decoding, but they are essential for output to colormapped file formats or

	130 colormapped displays. These extra functions can be compiled out of the

	131 -library if not required for a particular application. We have also included

	132 -"jpegtran", a utility for lossless transcoding between different JPEG

	133 -processes, and "rdjpgcom" and "wrjpgcom", two simple applications for

	134 -inserting and extracting textual comments in JFIF files.

	135 +library if not required for a particular application.

	136

	137 +We have also included "jpegtran", a utility for lossless transcoding between

	138 +different JPEG processes, and "rdjpgcom" and "wrjpgcom", two simple

	139 +applications for inserting and extracting textual comments in JFIF files.

	140 +

	141 The emphasis in designing this software has been on achieving portability and

	142 flexibility, while also making it fast enough to be useful. In particular,

	143 the software is not intended to be read as a tutorial on JPEG. (See the

	144 @@ -127,7 +129,7 @@

	145 fitness for a particular purpose. This software is provided "AS IS", and you,

	146 its user, assume the entire risk as to its quality and accuracy.

	147

	148 -This software is copyright (C) 1991-1998, Thomas G. Lane.

	149 +This software is copyright (C) 1991-2012, Thomas G. Lane, Guido Vollbeding.

	150 All Rights Reserved except as specified below.

	151

	152 Permission is hereby granted to use, copy, modify, and distribute this

	153 @@ -158,30 +160,12 @@

	154 assumed by the product vendor.

	155

	156

	157 -ansi2knr.c is included in this distribution by permission of L. Peter Deutsch,

	158 -sole proprietor of its copyright holder, Aladdin Enterprises of Menlo Park, CA.

	159 -ansi2knr.c is NOT covered by the above copyright and conditions, but instead

	160 -by the usual distribution terms of the Free Software Foundation; principally,

	161 -that you must include source code if you redistribute it. (See the file

	162 -ansi2knr.c for full details.) However, since ansi2knr.c is not needed as part

	163 -of any program generated from the IJG code, this does not limit you more than

	164 -the foregoing paragraphs do.

	165 -

	166 The Unix configuration script "configure" was produced with GNU Autoconf.

	167 It is copyright by the Free Software Foundation but is freely distributable.

	168 The same holds for its supporting scripts (config.guess, config.sub,

	169 -ltconfig, ltmain.sh). Another support script, install-sh, is copyright

	170 -by M.I.T. but is also freely distributable.

	171 +ltmain.sh). Another support script, install-sh, is copyright by X Consortium

	172 +but is also freely distributable.

	173

	174 -It appears that the arithmetic coding option of the JPEG spec is covered by

	175 -patents owned by IBM, AT&T, and Mitsubishi. Hence arithmetic coding cannot

	176 -legally be used without obtaining one or more licenses. For this reason,

	177 -support for arithmetic coding has been removed from the free JPEG software.

	178 -(Since arithmetic coding provides only a marginal gain over the unpatented

	179 -Huffman mode, it is unlikely that very many implementations will support it.)

	180 -So far as we are aware, there are no patent restrictions on the remaining

	181 -code.

	182 -

	183 The IJG distribution formerly included code to read and write GIF files.

	184 To avoid entanglement with the Unisys LZW patent, GIF reading support has

	185 been removed altogether, and the GIF writer has been simplified to produce

	186 @@ -198,7 +182,7 @@

	187 REFERENCES

	188 ==========

	189

	190 -We highly recommend reading one or more of these references before trying to

	191 +We recommend reading one or more of these references before trying to

	192 understand the innards of the JPEG software.

	193

	194 The best short technical introduction to the JPEG compression algorithm is

	195 @@ -207,7 +191,7 @@

	196 (Adjacent articles in that issue discuss MPEG motion picture compression,

	197 applications of JPEG, and related topics.) If you don't have the CACM issue

	198 handy, a PostScript file containing a revised version of Wallace's article is

	199 -available at ftp://ftp.uu.net/graphics/jpeg/wallace.ps.gz. The file (actually

	200 +available at http://www.ijg.org/files/wallace.ps.gz. The file (actually

	201 a preprint for an article that appeared in IEEE Trans. Consumer Electronics)

	202 omits the sample images that appeared in CACM, but it includes corrections

	203 and some added material. Note: the Wallace article is copyright ACM and IEEE,

	204 @@ -222,45 +206,29 @@

	205 sample code is far from industrial-strength, but when you are ready to look

	206 at a full implementation, you've got one here...

	207

	208 -The best full description of JPEG is the textbook "JPEG Still Image Data

	209 -Compression Standard" by William B. Pennebaker and Joan L. Mitchell, published

	210 -by Van Nostrand Reinhold, 1993, ISBN 0-442-01272-1. Price US$59.95, 638 pp.

	211 -The book includes the complete text of the ISO JPEG standards (DIS 10918-1

	212 -and draft DIS 10918-2). This is by far the most complete exposition of JPEG

	213 -in existence, and we highly recommend it.

	214 +The best currently available description of JPEG is the textbook "JPEG Still

	215 +Image Data Compression Standard" by William B. Pennebaker and Joan L.

	216 +Mitchell, published by Van Nostrand Reinhold, 1993, ISBN 0-442-01272-1.

	217 +Price US$59.95, 638 pp. The book includes the complete text of the ISO JPEG

	218 +standards (DIS 10918-1 and draft DIS 10918-2).

	219

	220 -The JPEG standard itself is not available electronically; you must order a

	221 -paper copy through ISO or ITU. (Unless you feel a need to own a certified

	222 -official copy, we recommend buying the Pennebaker and Mitchell book instead;

	223 -it's much cheaper and includes a great deal of useful explanatory material.)

	224 -In the USA, copies of the standard may be ordered from ANSI Sales at (212)

	225 -642-4900, or from Global Engineering Documents at (800) 854-7179. (ANSI

	226 -doesn't take credit card orders, but Global does.) It's not cheap: as of

	227 -1992, ANSI was charging $95 for Part 1 and $47 for Part 2, plus 7%

	228 -shipping/handling. The standard is divided into two parts, Part 1 being the

	229 -actual specification, while Part 2 covers compliance testing methods. Part 1

	230 -is titled "Digital Compression and Coding of Continuous-tone Still Images,

	231 +The original JPEG standard is divided into two parts, Part 1 being the actual

	232 +specification, while Part 2 covers compliance testing methods. Part 1 is

	233 +titled "Digital Compression and Coding of Continuous-tone Still Images,

	234 Part 1: Requirements and guidelines" and has document numbers ISO/IEC IS

	235 10918-1, ITU-T T.81. Part 2 is titled "Digital Compression and Coding of

	236 Continuous-tone Still Images, Part 2: Compliance testing" and has document

	237 numbers ISO/IEC IS 10918-2, ITU-T T.83.

	238

	239 -Some extensions to the original JPEG standard are defined in JPEG Part 3,

	240 -a newer ISO standard numbered ISO/IEC IS 10918-3 and ITU-T T.84. IJG

	241 -currently does not support any Part 3 extensions.

	242 -

	243 The JPEG standard does not specify all details of an interchangeable file

	244 format. For the omitted details we follow the "JFIF" conventions, revision

	245 -1.02. A copy of the JFIF spec is available from:

	246 - Literature Department

	247 - C-Cube Microsystems, Inc.

	248 - 1778 McCarthy Blvd.

	249 - Milpitas, CA 95035

	250 - phone (408) 944-6300, fax (408) 944-6314

	251 -A PostScript version of this document is available by FTP at

	252 -ftp://ftp.uu.net/graphics/jpeg/jfif.ps.gz. There is also a plain text

	253 -version at ftp://ftp.uu.net/graphics/jpeg/jfif.txt.gz, but it is missing

	254 -the figures.

	255 +1.02. JFIF 1.02 has been adopted as an Ecma International Technical Report

	256 +and thus received a formal publication status. It is available as a free

	257 +download in PDF format from

	258 +http://www.ecma-international.org/publications/techreports/E-TR-098.htm.

	259 +A PostScript version of the JFIF document is available at

	260 +http://www.ijg.org/files/jfif.ps.gz. There is also a plain text version at

	261 +http://www.ijg.org/files/jfif.txt.gz, but it is missing the figures.

	262

	263 The TIFF 6.0 file format specification can be obtained by FTP from

	264 ftp://ftp.sgi.com/graphics/tiff/TIFF6.ps.gz. The JPEG incorporation scheme

	265 @@ -267,37 +235,24 @@

	266 found in the TIFF 6.0 spec of 3-June-92 has a number of serious problems.

	267 IJG does not recommend use of the TIFF 6.0 design (TIFF Compression tag 6).

	268 Instead, we recommend the JPEG design proposed by TIFF Technical Note #2

	269 -(Compression tag 7). Copies of this Note can be obtained from ftp.sgi.com or

	270 -from ftp://ftp.uu.net/graphics/jpeg/. It is expected that the next revision

	271 +(Compression tag 7). Copies of this Note can be obtained from

	272 +http://www.ijg.org/files/. It is expected that the next revision

	273 of the TIFF spec will replace the 6.0 JPEG design with the Note's design.

	274 Although IJG's own code does not support TIFF/JPEG, the free libtiff library

	275 -uses our library to implement TIFF/JPEG per the Note. libtiff is available

	276 -from ftp://ftp.sgi.com/graphics/tiff/.

	277 +uses our library to implement TIFF/JPEG per the Note.

	278

	279

	280 ARCHIVE LOCATIONS

	281 =================

	282

	283 -The "official" archive site for this software is ftp.uu.net (Internet

	284 -address 192.48.96.9). The most recent released version can always be found

	285 -there in directory graphics/jpeg. This particular version will be archived

	286 -as ftp://ftp.uu.net/graphics/jpeg/jpegsrc.v6b.tar.gz. If you don't have

	287 -direct Internet access, UUNET's archives are also available via UUCP; contact

	288 -help@uunet.uu.net for information on retrieving files that way.

	289 +The "official" archive site for this software is www.ijg.org.

	290 +The most recent released version can always be found there in

	291 +directory "files". This particular version will be archived as

	292 +http://www.ijg.org/files/jpegsrc.v8d.tar.gz, and in Windows-compatible

	293 +"zip" archive format as http://www.ijg.org/files/jpegsr8d.zip.

	294

	295 -Numerous Internet sites maintain copies of the UUNET files. However, only

	296 -ftp.uu.net is guaranteed to have the latest official version.

	297 -

	298 -You can also obtain this software in DOS-compatible "zip" archive format from

	299 -the SimTel archives (ftp://ftp.simtel.net/pub/simtelnet/msdos/graphics/), or

	300 -on CompuServe in the Graphics Support forum (GO CIS:GRAPHSUP), library 12

	301 -"JPEG Tools". Again, these versions may sometimes lag behind the ftp.uu.net

	302 -release.

	303 -

	304 -The JPEG FAQ (Frequently Asked Questions) article is a useful source of

	305 -general information about JPEG. It is updated constantly and therefore is

	306 -not included in this distribution. The FAQ is posted every two weeks to

	307 -Usenet newsgroups comp.graphics.misc, news.answers, and other groups.

	308 +The JPEG FAQ (Frequently Asked Questions) article is a source of some

	309 +general information about JPEG.

	310 It is available on the World Wide Web at http://www.faqs.org/faqs/jpeg-faq/

	311 and other news.answers archive sites, including the official news.answers

	312 archive at rtfm.mit.edu: ftp://rtfm.mit.edu/pub/usenet/news.answers/jpeg-faq/.

	313 @@ -307,79 +262,21 @@

	314 send usenet/news.answers/jpeg-faq/part2

	315

	316

	317 -RELATED SOFTWARE

	318 -================

	319 -

	320 -Numerous viewing and image manipulation programs now support JPEG. (Quite a

	321 -few of them use this library to do so.) The JPEG FAQ described above lists

	322 -some of the more popular free and shareware viewers, and tells where to

	323 -obtain them on Internet.

	324 -

	325 -If you are on a Unix machine, we highly recommend Jef Poskanzer's free

	326 -PBMPLUS software, which provides many useful operations on PPM-format image

	327 -files. In particular, it can convert PPM images to and from a wide range of

	328 -other formats, thus making cjpeg/djpeg considerably more useful. The latest

	329 -version is distributed by the NetPBM group, and is available from numerous

	330 -sites, notably ftp://wuarchive.wustl.edu/graphics/graphics/packages/NetPBM/.

	331 -Unfortunately PBMPLUS/NETPBM is not nearly as portable as the IJG software is;

	332 -you are likely to have difficulty making it work on any non-Unix machine.

	333 -

	334 -A different free JPEG implementation, written by the PVRG group at Stanford,

	335 -is available from ftp://havefun.stanford.edu/pub/jpeg/. This program

	336 -is designed for research and experimentation rather than production use;

	337 -it is slower, harder to use, and less portable than the IJG code, but it

	338 -is easier to read and modify. Also, the PVRG code supports lossless JPEG,

	339 -which we do not. (On the other hand, it doesn't do progressive JPEG.)

	340 -

	341 -

	342 FILE FORMAT WARS

	343 ================

	344

	345 -Some JPEG programs produce files that are not compatible with our library.

	346 -The root of the problem is that the ISO JPEG committee failed to specify a

	347 -concrete file format. Some vendors "filled in the blanks" on their own,

	348 -creating proprietary formats that no one else could read. (For example, none

	349 -of the early commercial JPEG implementations for the Macintosh were able to

	350 -exchange compressed files.)

	351 +The ISO/IEC JTC1/SC29/WG1 standards committee (also known as JPEG, together

	352 +with ITU-T SG16) currently promotes different formats containing the name

	353 +"JPEG" which are incompatible with original DCT-based JPEG. IJG therefore does

	354 +not support these formats (see REFERENCES). Indeed, one of the original

	355 +reasons for developing this free software was to help force convergence on

	356 +common, interoperable format standards for JPEG files.

	357 +Don't use an incompatible file format!

	358 +(In any case, our decoder will remain capable of reading existing JPEG

	359 +image files indefinitely.)

	360

	361 -The file format we have adopted is called JFIF (see REFERENCES). This format

	362 -has been agreed to by a number of major commercial JPEG vendors, and it has

	363 -become the de facto standard. JFIF is a minimal or "low end" representation.

	364 -We recommend the use of TIFF/JPEG (TIFF revision 6.0 as modified by TIFF

	365 -Technical Note #2) for "high end" applications that need to record a lot of

	366 -additional data about an image. TIFF/JPEG is fairly new and not yet widely

	367 -supported, unfortunately.

	368

	369 -The upcoming JPEG Part 3 standard defines a file format called SPIFF.

	370 -SPIFF is interoperable with JFIF, in the sense that most JFIF decoders should

	371 -be able to read the most common variant of SPIFF. SPIFF has some technical

	372 -advantages over JFIF, but its major claim to fame is simply that it is an

	373 -official standard rather than an informal one. At this point it is unclear

	374 -whether SPIFF will supersede JFIF or whether JFIF will remain the de-facto

	375 -standard. IJG intends to support SPIFF once the standard is frozen, but we

	376 -have not decided whether it should become our default output format or not.

	377 -(In any case, our decoder will remain capable of reading JFIF indefinitely.)

	378 -

	379 -Various proprietary file formats incorporating JPEG compression also exist.

	380 -We have little or no sympathy for the existence of these formats. Indeed,

	381 -one of the original reasons for developing this free software was to help

	382 -force convergence on common, open format standards for JPEG files. Don't

	383 -use a proprietary file format!

	384 -

	385 -

	386 TO DO

	387 =====

	388

	389 -The major thrust for v7 will probably be improvement of visual quality.

	390 -The current method for scaling the quantization tables is known not to be

	391 -very good at low Q values. We also intend to investigate block boundary

	392 -smoothing, "poor man's variable quantization", and other means of improving

	393 -quality-vs-file-size performance without sacrificing compatibility.

	394 -

	395 -In future versions, we are considering supporting some of the upcoming JPEG

	396 -Part 3 extensions --- principally, variable quantization and the SPIFF file

	397 -format.

	398 -

	399 -As always, speeding things up is of great interest.

	400 -

	401 -Please send bug reports, offers of help, etc. to jpeg-info@uunet.uu.net.

	402 +Please send bug reports, offers of help, etc. to jpeg-info@jpegclub.org.
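The README's OVERVIEW says that much higher compression comes from lowering the compressor's "quality" setting. The minimal C sketch below shows what that setting does to a quantization table. It assumes the quality curve that, to our knowledge, IJG's jcparam.c uses in jpeg_quality_scaling() and jpeg_add_quant_table(). The helper names quality_to_scale(), scale_quantizer() and scaled_row() are hypothetical. Only the first row of the Annex K sample luminance table is used, and the clamp to 255 assumes baseline-compatible 8-bit quantizers.

```c
/* Illustrative sketch, not libjpeg code: map a 1..100 quality setting to
 * quantizer values using the IJG-style scaling curve (assumed). */

/* First row of the sample luminance quantization table in Annex K of the
 * JPEG standard (ITU-T T.81). */
static const unsigned int base_row[8] = { 16, 11, 10, 16, 24, 40, 51, 61 };

/* Clamp quality to 1..100 and convert it to a percentage scale factor:
 * quality 50 -> 100% (table unchanged); lower quality -> larger quantizers. */
static int quality_to_scale(int quality)
{
  if (quality <= 0) quality = 1;
  if (quality > 100) quality = 100;
  return quality < 50 ? 5000 / quality : 200 - quality * 2;
}

/* Scale one base quantizer by `scale` percent, rounding to nearest, then
 * clamp to 1..255 (a zero divisor is illegal; baseline needs 8 bits). */
static unsigned int scale_quantizer(unsigned int base, int scale)
{
  long q = ((long)base * scale + 50L) / 100L;
  if (q < 1L) q = 1L;
  if (q > 255L) q = 255L;
  return (unsigned int)q;
}

/* Fill out[0..7] with the scaled first table row for the given quality. */
static void scaled_row(int quality, unsigned int out[8])
{
  int scale = quality_to_scale(quality);
  for (int k = 0; k < 8; k++)
    out[k] = scale_quantizer(base_row[k], scale);
}
```

For example, quality 75 halves every quantizer (16 becomes 8). Quality 10 multiplies them by five and clamps the largest ones at 255. Larger quantizers discard more DCT detail, which is where both the extra compression and the visible artifacts come from.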

	403 Index: bmp.c

	404 ===================================================================

	405 --- bmp.c (revision 829)

	406 +++ bmp.c (working copy)

	407 @@ -1,370 +1,274 @@

	408 -/* Copyright (C)2004 Landmark Graphics Corporation

	409 - * Copyright (C)2005 Sun Microsystems, Inc.

	410 +/*

	411 + * Copyright (C)2011 D. R. Commander. All Rights Reserved.

	412 *

	413 - * This library is free software and may be redistributed and/or modified under

	414 - * the terms of the wxWindows Library License, Version 3.1 or (at your option)

	415 - * any later version. The full license is in the LICENSE.txt file included

	416 - * with this distribution.

	417 + * Redistribution and use in source and binary forms, with or without

	418 + * modification, are permitted provided that the following conditions are met:

	419 *

	420 - * This library is distributed in the hope that it will be useful,

	421 - * but WITHOUT ANY WARRANTY; without even the implied warranty of

	422 - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the

	423 - * wxWindows Library License for more details.

	424 -*/

	425 + * - Redistributions of source code must retain the above copyright notice,

	426 + * this list of conditions and the following disclaimer.

	427 + * - Redistributions in binary form must reproduce the above copyright notice,

	428 + * this list of conditions and the following disclaimer in the documentation

	429 + * and/or other materials provided with the distribution.

	430 + * - Neither the name of the libjpeg-turbo Project nor the names of its

	431 + * contributors may be used to endorse or promote products derived from this

	432 + * software without specific prior written permission.

	433 + *

	434 + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS",

	435 + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE

	436 + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE

	437 + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE

	438 + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR

	439 + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF

	440 + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS

	441 + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN

	442 + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)

	443 + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE

	444 + * POSSIBILITY OF SUCH DAMAGE.

	445 + */

	446

	447 -#include <fcntl.h>

	448 -#include <sys/types.h>

	449 -#include <sys/stat.h>

	450 -#include <errno.h>

	451 -#include <stdlib.h>

	452 #include <stdio.h>

	453 #include <string.h>

	454 -#ifdef _WIN32

	455 - #include <io.h>

	456 -#else

	457 - #include <unistd.h>

	458 -#endif

	459 -#include "./rrutil.h"

	460 -#include "./bmp.h"

	461 +#include <setjmp.h>

	462 +#include <errno.h>

	463 +#include "cdjpeg.h"

	464 +#include <jpeglib.h>

	465 +#include <jpegint.h>

	466 +#include "tjutil.h"

	467 +#include "bmp.h"

	468

	469 -#ifndef BI_BITFIELDS

	470 -#define BI_BITFIELDS 3L

	471 -#endif

	472 -#ifndef BI_RGB

	473 -#define BI_RGB 0L

	474 -#endif

	475

	476 -#define BMPHDRSIZE 54

	477 -typedef struct _bmphdr

	478 -{

	479 - unsigned short bfType;

	480 - unsigned int bfSize;

	481 - unsigned short bfReserved1, bfReserved2;

	482 - unsigned int bfOffBits;

	483 +/* This duplicates the functionality of the VirtualGL bitmap library using

	484 + the components from cjpeg and djpeg */

	485

	486 - unsigned int biSize;

	487 - int biWidth, biHeight;

	488 - unsigned short biPlanes, biBitCount;

	489 - unsigned int biCompression, biSizeImage;

	490 - int biXPelsPerMeter, biYPelsPerMeter;

	491 - unsigned int biClrUsed, biClrImportant;

	492 -} bmphdr;

	493

	494 -static const char *__bmperr="No error";

	495 +/* Error handling (based on example in example.c) */

	496

	497 -static const int ps[BMPPIXELFORMATS]={3, 4, 3, 4, 4, 4};

	498 -static const int roffset[BMPPIXELFORMATS]={0, 0, 2, 2, 3, 1};

	499 -static const int goffset[BMPPIXELFORMATS]={1, 1, 1, 1, 2, 2};

	500 -static const int boffset[BMPPIXELFORMATS]={2, 2, 0, 0, 1, 3};

	501 +static char errStr[JMSG_LENGTH_MAX]="No error";

	502

	503 -#define _throw(m) {__bmperr=m; retcode=-1; goto finally;}

	504 -#define _unix(f) {if((f)==-1) _throw(strerror(errno));}

	505 -#define _catch(f) {if((f)==-1) {retcode=-1; goto finally;}}

	506 +struct my_error_mgr

	507 +{

	508 + struct jpeg_error_mgr pub;

	509 + jmp_buf setjmp_buffer;

	510 +};

	511 +typedef struct my_error_mgr *my_error_ptr;

	512

	513 -#define readme(fd, addr, size) \

	514 - if((bytesread=read(fd, addr, (size)))==-1) _throw(strerror(errno)); \

	515 - if(bytesread!=(size)) _throw("Read error");

	516 -

	517 -void pixelconvert(unsigned char *srcbuf, enum BMPPIXELFORMAT srcformat,

	518 - int srcpitch, unsigned char *dstbuf, enum BMPPIXELFORMAT dstformat, int dstpitch,

	519 - int w, int h, int flip)

	520 +static void my_error_exit(j_common_ptr cinfo)

	521 {

	522 - unsigned char *srcptr, *srcptr0, *dstptr, *dstptr0;

	523 - int i, j;

	524 -

	525 - srcptr=flip? &srcbuf[srcpitch*(h-1)]:srcbuf;

	526 - for(j=0, dstptr=dstbuf; j<h; j++,

	527 - srcptr+=flip? -srcpitch:srcpitch, dstptr+=dstpitch)

	528 - {

	529 - for(i=0, srcptr0=srcptr, dstptr0=dstptr; i<w; i++,

	530 - srcptr0+=ps[srcformat], dstptr0+=ps[dstformat])

	531 - {

	532 - dstptr0[roffset[dstformat]]=srcptr0[roffset[srcformat]];

	533 - dstptr0[goffset[dstformat]]=srcptr0[goffset[srcformat]];

	534 - dstptr0[boffset[dstformat]]=srcptr0[boffset[srcformat]];

	535 - }

	536 - }

	537 + my_error_ptr myerr=(my_error_ptr)cinfo->err;

	538 + (*cinfo->err->output_message)(cinfo);

	539 + longjmp(myerr->setjmp_buffer, 1);

	540 }

	541

	542 -int loadppm(int *fd, unsigned char **buf, int *w, int *h,

	543 - enum BMPPIXELFORMAT f, int align, int dstbottomup, int ascii)

	544 +/* Based on output_message() in jerror.c */

	545 +

	546 +static void my_output_message(j_common_ptr cinfo)

	547 {

	548 - FILE *fs=NULL; int retcode=0, scalefactor, dstpitch;

	549 - unsigned char *tempbuf=NULL; char temps[255], temps2[255];

	550 - int numread=0, totalread=0, pixel[3], i, j;

	551 + (*cinfo->err->format_message)(cinfo, errStr);

	552 +}

	553

	554 - if((fs=fdopen(*fd, "r"))==NULL) _throw(strerror(errno));

	555 +#define _throw(m) {snprintf(errStr, JMSG_LENGTH_MAX, "%s", m); \

	556 + retval=-1; goto bailout;}

	557 +#define _throwunix(m) {snprintf(errStr, JMSG_LENGTH_MAX, "%s\n%s", m, \

	558 + strerror(errno)); retval=-1; goto bailout;}
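The new error handling follows the setjmp()/longjmp() scheme from libjpeg's example.c. my_error_exit() lets output_message() (here my_output_message()) format the text into errStr, then jumps back to the caller instead of exiting the process. The self-contained sketch below mimics that control flow without linking libjpeg. fake_error_mgr, decode_something(), load_image(), last_error_is() and MSG_LEN are hypothetical stand-ins for jpeg_error_mgr, a libjpeg call, a bmp.c loader and JMSG_LENGTH_MAX.

```c
#include <setjmp.h>
#include <stdio.h>
#include <string.h>

#define MSG_LEN 200 /* hypothetical stand-in for JMSG_LENGTH_MAX */

static char err_str[MSG_LEN] = "No error";

/* Hypothetical stand-in for struct jpeg_error_mgr: just an error_exit hook. */
struct fake_error_mgr {
  void (*error_exit)(struct fake_error_mgr *mgr, const char *msg);
};

/* Same layout idea as the patch's my_error_mgr: the public struct comes
 * first, so a pointer to it can be cast back to the wrapper. */
struct my_error_mgr {
  struct fake_error_mgr pub;
  jmp_buf setjmp_buffer;
};

/* Save the message (my_output_message's job), then unwind (my_error_exit's). */
static void my_error_exit(struct fake_error_mgr *mgr, const char *msg)
{
  struct my_error_mgr *myerr = (struct my_error_mgr *)mgr;
  snprintf(err_str, MSG_LEN, "%s", msg);
  longjmp(myerr->setjmp_buffer, 1);
}

/* Hypothetical library routine that reports fatal errors through the hook. */
static void decode_something(struct fake_error_mgr *err, int fail)
{
  if (fail)
    err->error_exit(err, "Corrupt JPEG data");
}

/* Returns 0 on success or -1 after a fatal error, like bmp.c's retval. */
static int load_image(int fail)
{
  struct my_error_mgr jerr;
  jerr.pub.error_exit = my_error_exit;
  if (setjmp(jerr.setjmp_buffer))
    return -1; /* reached via longjmp() from my_error_exit() */
  decode_something(&jerr.pub, fail);
  return 0;
}

/* Helper for checking the last recorded message. */
static int last_error_is(const char *expected)
{
  return strcmp(err_str, expected) == 0;
}
```

One caveat carries over to the real pattern. Automatic variables that are modified between setjmp() and longjmp() must be declared volatile to have defined values afterwards. This sketch sidesteps the issue by returning straight out of the setjmp() branch.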

	559

	560 - do

	561 - {

	562 - if(!fgets(temps, 255, fs)) _throw("Read error");

	563 - if(strlen(temps)==0 || temps[0]=='\n') continue;

	564 - if(sscanf(temps, "%s", temps2)==1 && temps2[1]=='#') continue;

	565 - switch(totalread)

	566 - {

	567 - case 0:

	568 - if((numread=sscanf(temps, "%d %d %d", w, h, &scalefactor))==EOF)

	569 - _throw("Read error");

	570 - break;

	571 - case 1:

	572 - if((numread=sscanf(temps, "%d %d", h, &scalefactor))==EOF)

	573 - _throw("Read error");

	574 - break;

	575 - case 2:

	576 - if((numread=sscanf(temps, "%d", &scalefactor))==EOF)

	577 - _throw("Read error");

	578 - break;

	579 - }

	580 - totalread+=numread;

	581 - } while(totalread<3);

	582 - if((*w)<1 || (*h)<1 || scalefactor<1) _throw("Corrupt PPM header");

	583

	584 - dstpitch=(((*w)*ps[f])+(align-1))&(~(align-1));

	585 - if((*buf=(unsigned char *)malloc(dstpitch*(*h)))==NULL)

	586 - _throw("Memory allocation error");

	587 - if(ascii)

	588 +static void pixelconvert(unsigned char *srcbuf, int srcpf, int srcbottomup,

	589 + unsigned char *dstbuf, int dstpf, int dstbottomup, int w, int h)

	590 +{

	591 + unsigned char *srcptr=srcbuf, *srcptr2;

	592 + int srcps=tjPixelSize[srcpf];

	593 + int srcstride=srcbottomup? -w*srcps:w*srcps;

	594 + unsigned char *dstptr=dstbuf, *dstptr2;

	595 + int dstps=tjPixelSize[dstpf];

	596 + int dststride=dstbottomup? -w*dstps:w*dstps;

	597 + int row, col;

	598 +

	599 + if(srcbottomup) srcptr=&srcbuf[w*srcps*(h-1)];

	600 + if(dstbottomup) dstptr=&dstbuf[w*dstps*(h-1)];

	601 + for(row=0; row<h; row++, srcptr+=srcstride, dstptr+=dststride)

	602 {

	603 - for(j=0; j<*h; j++)

	604 + for(col=0, srcptr2=srcptr, dstptr2=dstptr; col<w; col++, srcptr2+=srcps,

	605 + dstptr2+=dstps)

	606 {

	607 - for(i=0; i<*w; i++)

	608 - {

	609 - if(fscanf(fs, "%d%d%d", &pixel[0], &pixel[1], &pixel[2])!=3)

	610 - _throw("Read error");

	611 - (*buf)[j*dstpitch+i*ps[f]+roffset[f]]=(unsigned char)(pixel[0]*255/scalefactor);

	612 - (*buf)[j*dstpitch+i*ps[f]+goffset[f]]=(unsigned char)(pixel[1]*255/scalefactor);

	613 - (*buf)[j*dstpitch+i*ps[f]+boffset[f]]=(unsigned char)(pixel[2]*255/scalefactor);

	614 - }

	615 + dstptr2[tjRedOffset[dstpf]]=srcptr2[tjRedOffset[srcpf]];

	616 + dstptr2[tjGreenOffset[dstpf]]=srcptr2[tjGreenOffset[srcpf]];

	617 + dstptr2[tjBlueOffset[dstpf]]=srcptr2[tjBlueOffset[srcpf]];

	618 }

	619 }

	620 - else

	621 - {

	622 - if(scalefactor!=255)

	623 - _throw("Binary PPMs must have 8-bit components");

	624 - if((tempbuf=(unsigned char *)malloc((*w)*(*h)*3))==NULL)

	625 - _throw("Memory allocation error");

	626 - if(fread(tempbuf, (*w)*(*h)*3, 1, fs)!=1) _throw("Read error");

	627 - pixelconvert(tempbuf, BMP_RGB, (*w)*3, *buf, f, dstpitch, *w, *h, dstbottomup);

	628 - }

	629 -

	630 - finally:

	631 - if(fs) {fclose(fs); *fd=-1;}

	632 - if(tempbuf) free(tempbuf);

	633 - return retcode;

	634 }

	635

	636

	637 int loadbmp(char *filename, unsigned char **buf, int *w, int *h,

	638 - enum BMPPIXELFORMAT f, int align, int dstbottomup)

	639 + int dstpf, int bottomup)

	640 {

	641 - int fd=-1, bytesread, srcpitch, srcbottomup=1, srcps, dstpitch,

	642 - retcode=0;

	643 - unsigned char *tempbuf=NULL;

	644 - bmphdr bh; int flags=O_RDONLY;

	645 + int retval=0, dstps, srcpf, tempc;

	646 + struct jpeg_compress_struct cinfo;

	647 + struct my_error_mgr jerr;

	648 + cjpeg_source_ptr src;

	649 + FILE *file=NULL;

	650

	651 - dstbottomup=dstbottomup? 1:0;

	652 - #ifdef _WIN32

	653 - flags|=O_BINARY;

	654 - #endif

	655 - if(!filename || !buf || !w || !h || f<0 || f>BMPPIXELFORMATS-1 || align<1)

	656 - _throw("invalid argument to loadbmp()");

	657 - if((align&(align-1))!=0)

	658 - _throw("Alignment must be a power of 2");

	659 - _unix(fd=open(filename, flags));

	660 + memset(&cinfo, 0, sizeof(struct jpeg_compress_struct));

	661

	662 - readme(fd, &bh.bfType, sizeof(unsigned short));

	663 - if(!littleendian()) bh.bfType=byteswap16(bh.bfType);

	664 + if(!filename || !buf || !w || !h || dstpf<0 || dstpf>=TJ_NUMPF)

	665 + _throw("loadbmp(): Invalid argument");

	666

	667 - if(bh.bfType==0x3650)

	668 + if((file=fopen(filename, "rb"))==NULL)

	669 + _throwunix("loadbmp(): Cannot open input file");

	670 +

	671 + cinfo.err=jpeg_std_error(&jerr.pub);

	672 + jerr.pub.error_exit=my_error_exit;

	673 + jerr.pub.output_message=my_output_message;

	674 +

	675 + if(setjmp(jerr.setjmp_buffer))

	676 {

	677 - _catch(loadppm(&fd, buf, w, h, f, align, dstbottomup, 0));

	678 - goto finally;

	679 + /* If we get here, the JPEG code has signaled an error. */

	680 + retval=-1; goto bailout;

	681 }

	682 - if(bh.bfType==0x3350)

	683 - {

	684 - _catch(loadppm(&fd, buf, w, h, f, align, dstbottomup, 1));

	685 - goto finally;

	686 - }

	687

	688 - readme(fd, &bh.bfSize, sizeof(unsigned int));

	689 - readme(fd, &bh.bfReserved1, sizeof(unsigned short));

	690 - readme(fd, &bh.bfReserved2, sizeof(unsigned short));

	691 - readme(fd, &bh.bfOffBits, sizeof(unsigned int));

	692 - readme(fd, &bh.biSize, sizeof(unsigned int));

	693 - readme(fd, &bh.biWidth, sizeof(int));

	694 - readme(fd, &bh.biHeight, sizeof(int));

	695 - readme(fd, &bh.biPlanes, sizeof(unsigned short));

	696 - readme(fd, &bh.biBitCount, sizeof(unsigned short));

	697 - readme(fd, &bh.biCompression, sizeof(unsigned int));

	698 - readme(fd, &bh.biSizeImage, sizeof(unsigned int));

	699 - readme(fd, &bh.biXPelsPerMeter, sizeof(int));

	700 - readme(fd, &bh.biYPelsPerMeter, sizeof(int));

	701 - readme(fd, &bh.biClrUsed, sizeof(unsigned int));

	702 - readme(fd, &bh.biClrImportant, sizeof(unsigned int));

	703 + jpeg_create_compress(&cinfo);

	704 + if((tempc=getc(file))<0 || ungetc(tempc, file)==EOF)

	705 + _throwunix("loadbmp(): Could not read input file")

	706 + else if(tempc==EOF) _throw("loadbmp(): Input file contains no data");

	707

	708 - if(!littleendian())

	709 + if(tempc=='B')

	710 {

	711 - bh.bfSize=byteswap(bh.bfSize);

	712 - bh.bfOffBits=byteswap(bh.bfOffBits);

	713 - bh.biSize=byteswap(bh.biSize);

	714 - bh.biWidth=byteswap(bh.biWidth);

	715 - bh.biHeight=byteswap(bh.biHeight);

	716 - bh.biPlanes=byteswap16(bh.biPlanes);

	717 - bh.biBitCount=byteswap16(bh.biBitCount);

	718 - bh.biCompression=byteswap(bh.biCompression);

	719 - bh.biSizeImage=byteswap(bh.biSizeImage);

	720 - bh.biXPelsPerMeter=byteswap(bh.biXPelsPerMeter);

	721 - bh.biYPelsPerMeter=byteswap(bh.biYPelsPerMeter);

	722 - bh.biClrUsed=byteswap(bh.biClrUsed);

	723 - bh.biClrImportant=byteswap(bh.biClrImportant);

	724 + if((src=jinit_read_bmp(&cinfo))==NULL)

	725 + _throw("loadbmp(): Could not initialize bitmap loader");

	726 }

	727 + else if(tempc=='P')

	728 + {

	729 + if((src=jinit_read_ppm(&cinfo))==NULL)

	730 + _throw("loadbmp(): Could not initialize bitmap loader");

	731 + }

	732 + else _throw("loadbmp(): Unsupported file type");

	733

	734 - if(bh.bfType!=0x4d42 || bh.bfOffBits<BMPHDRSIZE

	735 - || bh.biWidth<1 || bh.biHeight==0)

	736 - _throw("Corrupt bitmap header");

	737 - if((bh.biBitCount!=24 && bh.biBitCount!=32) || bh.biCompression!=BI_RGB)

	738 - _throw("Only uncompessed RGB bitmaps are supported");

	739 + src->input_file=file;

	740 + (*src->start_input)(&cinfo, src);

	741 + (*cinfo.mem->realize_virt_arrays)((j_common_ptr)&cinfo);

	742

	743 - *w=bh.biWidth; *h=bh.biHeight; srcps=bh.biBitCount/8;

	744 - if(*h<0) {*h=-(*h); srcbottomup=0;}

	745 - srcpitch=(((*w)*srcps)+3)&(~3);

	746 - dstpitch=(((*w)*ps[f])+(align-1))&(~(align-1));

	747 + *w=cinfo.image_width; *h=cinfo.image_height;

	748

	749 - if(srcpitch*(*h)+bh.bfOffBits!=bh.bfSize) _throw("Corrupt bitmap header");

	750 - if((tempbuf=(unsigned char *)malloc(srcpitch*(*h)))==NULL

	751 - || (*buf=(unsigned char *)malloc(dstpitch*(*h)))==NULL)

	752 - _throw("Memory allocation error");

	753 - if(lseek(fd, (long)bh.bfOffBits, SEEK_SET)!=(long)bh.bfOffBits)

	754 - _throw(strerror(errno));

	755 - _unix(bytesread=read(fd, tempbuf, srcpitch*(*h)));

	756 - if(bytesread!=srcpitch*(*h)) _throw("Read error");

	757 + if(cinfo.input_components==1 && cinfo.in_color_space==JCS_RGB)

	758 + srcpf=TJPF_GRAY;

	759 + else srcpf=TJPF_RGB;

	760

	761 - pixelconvert(tempbuf, BMP_BGR, srcpitch, *buf, f, dstpitch, *w, *h,

	762 - srcbottomup!=dstbottomup);

	763 + dstps=tjPixelSize[dstpf];

	764 + if((*buf=(unsigned char *)malloc((*w)*(*h)*dstps))==NULL)

	765 + _throw("loadbmp(): Memory allocation failure");

	766

	767 - finally:

	768 - if(tempbuf) free(tempbuf);

	769 - if(fd!=-1) close(fd);

	770 - return retcode;

	771 + while(cinfo.next_scanline<cinfo.image_height)

	772 + {

	773 + int i, nlines=(*src->get_pixel_rows)(&cinfo, src);

	774 + for(i=0; i<nlines; i++)

	775 + {

	776 + unsigned char *outbuf; int row;

	777 + row=cinfo.next_scanline+i;

	778 + if(bottomup) outbuf=&(*buf)[((*h)-row-1)*(*w)*dstps];

	779 + else outbuf=&(*buf)[row*(*w)*dstps];

	780 + pixelconvert(src->buffer[i], srcpf, 0, outbuf, dstpf, bottomup, *w,

	781 + nlines);

	782 + }

	783 + cinfo.next_scanline+=nlines;

	784 + }

	785 +

	786 + (*src->finish_input)(&cinfo, src);

	787 +

	788 + bailout:

	789 + jpeg_destroy_compress(&cinfo);

	790 + if(file) fclose(file);

	791 + if(retval<0 && buf && *buf) {free(*buf); *buf=NULL;}

	792 + return retval;

	793 }

	794

	795 -#define writeme(fd, addr, size) \

	796 - if((byteswritten=write(fd, addr, (size)))==-1) _throw(strerror(errno)); \

	797 - if(byteswritten!=(size)) _throw("Write error");

	798

	799 -int saveppm(char *filename, unsigned char *buf, int w, int h,

	800 - enum BMPPIXELFORMAT f, int srcpitch, int srcbottomup)

	801 +int savebmp(char *filename, unsigned char *buf, int w, int h, int srcpf,

	802 + int bottomup)

	803 {

	804 - FILE *fs=NULL; int retcode=0;

	805 - unsigned char *tempbuf=NULL;

	806 + int retval=0, srcps, dstpf;

	807 + struct jpeg_decompress_struct dinfo;

	808 + struct my_error_mgr jerr;

	809 + djpeg_dest_ptr dst;

	810 + FILE *file=NULL;

	811 + char *ptr=NULL;

	812

	813 - if((fs=fopen(filename, "wb"))==NULL) _throw(strerror(errno));

	814 - if(fprintf(fs, "P6\n")<1) _throw("Write error");

	815 - if(fprintf(fs, "%d %d\n", w, h)<1) _throw("Write error");

	816 - if(fprintf(fs, "255\n")<1) _throw("Write error");

	817 + memset(&dinfo, 0, sizeof(struct jpeg_decompress_struct));

	818

	819 - if((tempbuf=(unsigned char *)malloc(w*h*3))==NULL)

	820 - _throw("Memory allocation error");

	821 + if(!filename || !buf || w<1 || h<1 || srcpf<0 || srcpf>=TJ_NUMPF)

	822 + _throw("savebmp(): Invalid argument");

	823

	824 - pixelconvert(buf, f, srcpitch, tempbuf, BMP_RGB, w*3, w, h,

	825 - srcbottomup);

	826 + if((file=fopen(filename, "wb"))==NULL)

	827 + _throwunix("savebmp(): Cannot open output file");

	828

	829 - if((fwrite(tempbuf, w*h*3, 1, fs))!=1) _throw("Write error");

	830 + dinfo.err=jpeg_std_error(&jerr.pub);

	831 + jerr.pub.error_exit=my_error_exit;

	832 + jerr.pub.output_message=my_output_message;

	833

	834 - finally:

	835 - if(tempbuf) free(tempbuf);

	836 - if(fs) fclose(fs);

	837 - return retcode;

	838 -}

	839 + if(setjmp(jerr.setjmp_buffer))

	840 + {

	841 + /* If we get here, the JPEG code has signaled an error. */

	842 + retval=-1; goto bailout;

	843 + }

	844

	845 -int savebmp(char *filename, unsigned char *buf, int w, int h,

	846 - enum BMPPIXELFORMAT f, int srcpitch, int srcbottomup)

	847 -{

	848 - int fd=-1, byteswritten, dstpitch, retcode=0;

	849 - int flags=O_RDWR|O_CREAT|O_TRUNC;

	850 - unsigned char *tempbuf=NULL; char *temp;

	851 - bmphdr bh; int mode;

	852 + jpeg_create_decompress(&dinfo);

	853 + if(srcpf==TJPF_GRAY)

	854 + {

	855 + dinfo.out_color_components=dinfo.output_components=1;

	856 + dinfo.out_color_space=JCS_GRAYSCALE;

	857 + }

	858 + else

	859 + {

	860 + dinfo.out_color_components=dinfo.output_components=3;

	861 + dinfo.out_color_space=JCS_RGB;

	862 + }

	863 + dinfo.image_width=w; dinfo.image_height=h;

	864 + dinfo.global_state=DSTATE_READY;

	865 + dinfo.scale_num=dinfo.scale_denom=1;

	866

	867 - #ifdef _WIN32

	868 - flags|=O_BINARY; mode=_S_IREAD|_S_IWRITE;

	869 - #else

	870 - mode=S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH;

	871 - #endif

	872 - if(!filename || !buf || w<1 || h<1 || f<0 || f>BMPPIXELFORMATS-1 || srcpitch<0)

	873 - _throw("bad argument to savebmp()");

	874 -

	875 - if(srcpitch==0) srcpitch=w*ps[f];

	876 -

	877 - if((temp=strrchr(filename, '.'))!=NULL)

	878 + ptr=strrchr(filename, '.');

	879 + if(ptr && !strcasecmp(ptr, ".bmp"))

	880 {

	881 - if(!stricmp(temp, ".ppm"))

	882 - return saveppm(filename, buf, w, h, f, srcpitch, srcbottomup);

	883 + if((dst=jinit_write_bmp(&dinfo, 0))==NULL)

	884 + _throw("savebmp(): Could not initialize bitmap writer");

	885 }

	886 + else

	887 + {

	888 + if((dst=jinit_write_ppm(&dinfo))==NULL)

	889 + _throw("savebmp(): Could not initialize PPM writer");

	890 + }

	891

	892 - _unix(fd=open(filename, flags, mode));

	893 - dstpitch=((w*3)+3)&(~3);

	894 + dst->output_file=file;

	895 + (*dst->start_output)(&dinfo, dst);

	896 + (*dinfo.mem->realize_virt_arrays)((j_common_ptr)&dinfo);

	897

	898 - bh.bfType=0x4d42;

	899 - bh.bfSize=BMPHDRSIZE+dstpitch*h;

	900 - bh.bfReserved1=0; bh.bfReserved2=0;

	901 - bh.bfOffBits=BMPHDRSIZE;

	902 - bh.biSize=40;

	903 - bh.biWidth=w; bh.biHeight=h;

	904 - bh.biPlanes=0; bh.biBitCount=24;

	905 - bh.biCompression=BI_RGB; bh.biSizeImage=0;

	906 - bh.biXPelsPerMeter=0; bh.biYPelsPerMeter=0;

	907 - bh.biClrUsed=0; bh.biClrImportant=0;

	908 + if(srcpf==TJPF_GRAY) dstpf=srcpf;

	909 + else dstpf=TJPF_RGB;

	910 + srcps=tjPixelSize[srcpf];

	911

	912 - if(!littleendian())

	913 + while(dinfo.output_scanline<dinfo.output_height)

	914 {

	915 - bh.bfType=byteswap16(bh.bfType);

	916 - bh.bfSize=byteswap(bh.bfSize);

	917 - bh.bfOffBits=byteswap(bh.bfOffBits);

	918 - bh.biSize=byteswap(bh.biSize);

	919 - bh.biWidth=byteswap(bh.biWidth);

	920 - bh.biHeight=byteswap(bh.biHeight);

	921 - bh.biPlanes=byteswap16(bh.biPlanes);

	922 - bh.biBitCount=byteswap16(bh.biBitCount);

	923 - bh.biCompression=byteswap(bh.biCompression);

	924 - bh.biSizeImage=byteswap(bh.biSizeImage);

	925 - bh.biXPelsPerMeter=byteswap(bh.biXPelsPerMeter);

	926 - bh.biYPelsPerMeter=byteswap(bh.biYPelsPerMeter);

	927 - bh.biClrUsed=byteswap(bh.biClrUsed);

	928 - bh.biClrImportant=byteswap(bh.biClrImportant);

	929 + int i, nlines=dst->buffer_height;

	930 + for(i=0; i<nlines; i++)

	931 + {

	932 + unsigned char *inbuf; int row;

	933 + row=dinfo.output_scanline+i;

	934 + if(bottomup) inbuf=&buf[(h-row-1)*w*srcps];

	935 + else inbuf=&buf[row*w*srcps];

	936 + pixelconvert(inbuf, srcpf, bottomup, dst->buffer[i], dstpf, 0, w,

	937 + nlines);

	938 + }

	939 + (*dst->put_pixel_rows)(&dinfo, dst, nlines);

	940 + dinfo.output_scanline+=nlines;

	941 }

	942

	943 - writeme(fd, &bh.bfType, sizeof(unsigned short));

	944 - writeme(fd, &bh.bfSize, sizeof(unsigned int));

	945 - writeme(fd, &bh.bfReserved1, sizeof(unsigned short));

	946 - writeme(fd, &bh.bfReserved2, sizeof(unsigned short));

	947 - writeme(fd, &bh.bfOffBits, sizeof(unsigned int));

	948 - writeme(fd, &bh.biSize, sizeof(unsigned int));

	949 - writeme(fd, &bh.biWidth, sizeof(int));

	950 - writeme(fd, &bh.biHeight, sizeof(int));

	951 - writeme(fd, &bh.biPlanes, sizeof(unsigned short));

	952 - writeme(fd, &bh.biBitCount, sizeof(unsigned short));

	953 - writeme(fd, &bh.biCompression, sizeof(unsigned int));

	954 - writeme(fd, &bh.biSizeImage, sizeof(unsigned int));

	955 - writeme(fd, &bh.biXPelsPerMeter, sizeof(int));

	956 - writeme(fd, &bh.biYPelsPerMeter, sizeof(int));

	957 - writeme(fd, &bh.biClrUsed, sizeof(unsigned int));

	958 - writeme(fd, &bh.biClrImportant, sizeof(unsigned int));

	959 + (*dst->finish_output)(&dinfo, dst);

	960

	961 - if((tempbuf=(unsigned char *)malloc(dstpitch*h))==NULL)

	962 - _throw("Memory allocation error");

	963 -

	964 - pixelconvert(buf, f, srcpitch, tempbuf, BMP_BGR, dstpitch, w, h,

	965 - !srcbottomup);

	966 -

	967 - if((byteswritten=write(fd, tempbuf, dstpitch*h))!=dstpitch*h)

	968 - _throw(strerror(errno));

	969 -

	970 - finally:

	971 - if(tempbuf) free(tempbuf);

	972 - if(fd!=-1) close(fd);

	973 - return retcode;

	974 + bailout:

	975 + jpeg_destroy_decompress(&dinfo);

	976 + if(file) fclose(file);

	977 + return retval;

	978 }

	979

	980 const char *bmpgeterr(void)

	981 {

	982 - return __bmperr;

	983 + return errStr;

	984 }

	985 Index: bmp.h

	986 ===================================================================

	987 --- bmp.h (revision 829)

	988 +++ bmp.h (working copy)

	989 @@ -1,48 +1,42 @@

	990 -/* Copyright (C)2004 Landmark Graphics Corporation

	991 - * Copyright (C)2005 Sun Microsystems, Inc.

	992 +/*

	993 + * Copyright (C)2011 D. R. Commander. All Rights Reserved.

	994 *

	995 - * This library is free software and may be redistributed and/or modified under

	996 - * the terms of the wxWindows Library License, Version 3.1 or (at your option)

	997 - * any later version. The full license is in the LICENSE.txt file included

	998 - * with this distribution.

	999 + * Redistribution and use in source and binary forms, with or without

	1000 + * modification, are permitted provided that the following conditions are met:

	1001 *

	1002 - * This library is distributed in the hope that it will be useful,

	1003 - * but WITHOUT ANY WARRANTY; without even the implied warranty of

	1004 - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the

	1005 - * wxWindows Library License for more details.

	1006 -*/

	1007 + * - Redistributions of source code must retain the above copyright notice,

	1008 + * this list of conditions and the following disclaimer.

	1009 + * - Redistributions in binary form must reproduce the above copyright notice,

	1010 + * this list of conditions and the following disclaimer in the documentation

	1011 + * and/or other materials provided with the distribution.

	1012 + * - Neither the name of the libjpeg-turbo Project nor the names of its

	1013 + * contributors may be used to endorse or promote products derived from this

	1014 + * software without specific prior written permission.

	1015 + *

	1016 + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS",

	1017 + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE

	1018 + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE

	1019 + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE

	1020 + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR

	1021 + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF

	1022 + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS

	1023 + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN

	1024 + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)

	1025 + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE

	1026 + * POSSIBILITY OF SUCH DAMAGE.

	1027 + */

	1028

	1029 -// This provides rudimentary facilities for loading and saving true color

	1030 -// BMP and PPM files

	1031 -

	1032 #ifndef __BMP_H__

	1033 #define __BMP_H__

	1034

	1035 -#define BMPPIXELFORMATS 6

	1036 -enum BMPPIXELFORMAT {BMP_RGB=0, BMP_RGBA, BMP_BGR, BMP_BGRA, BMP_ABGR, BMP_ARGB };

	1037 +#include "./turbojpeg.h"

	1038

	1039 -#ifdef __cplusplus

	1040 -extern "C" {

	1041 -#endif

	1042 +int loadbmp(char *filename, unsigned char **buf, int *w, int *h, int pf,

	1043 + int bottomup);

	1044

	1045 -// This will load a Windows bitmap from a file and return a buffer with the

	1046 -// specified pixel format, scanline alignment, and orientation. The width and

	1047 -// height are returned in w and h.

	1048 +int savebmp(char *filename, unsigned char *buf, int w, int h, int pf,

	1049 + int bottomup);

	1050

	1051 -int loadbmp(char *filename, unsigned char **buf, int *w, int *h,

	1052 - enum BMPPIXELFORMAT f, int align, int dstbottomup);

	1053 -

	1054 -// This will save a buffer with the specified pixel format, pitch, orientation,

	1055 -// width, and height as a 24-bit Windows bitmap or PPM (the filename determines

	1056 -// which format to use)

	1057 -

	1058 -int savebmp(char *filename, unsigned char *buf, int w, int h,

	1059 - enum BMPPIXELFORMAT f, int srcpitch, int srcbottomup);

	1060 -

	1061 const char *bmpgeterr(void);

	1062

	1063 -#ifdef __cplusplus

	1064 -}

	1065 #endif

	1066 -

	1067 -#endif

	1068 Index: cderror.h

	1069 ===================================================================

	1070 --- cderror.h (revision 829)

	1071 +++ cderror.h (working copy)

	1072 @@ -2,6 +2,7 @@

	1073 * cderror.h

	1074 *

	1075 * Copyright (C) 1994-1997, Thomas G. Lane.

	1076 + * Modified 2009 by Guido Vollbeding.

	1077 * This file is part of the Independent JPEG Group's software.

	1078 * For conditions of distribution and use, see the accompanying README file.

	1079 *

	1080 @@ -45,6 +46,7 @@

	1081 JMESSAGE(JERR_BMP_BADPLANES, "Invalid BMP file: biPlanes not equal to 1")

	1082 JMESSAGE(JERR_BMP_COLORSPACE, "BMP output must be grayscale or RGB")

	1083 JMESSAGE(JERR_BMP_COMPRESSED, "Sorry, compressed BMPs not yet supported")

	1084 +JMESSAGE(JERR_BMP_EMPTY, "Empty BMP image")

	1085 JMESSAGE(JERR_BMP_NOT, "Not a BMP file - does not start with BM")

	1086 JMESSAGE(JTRC_BMP, "%ux%u 24-bit BMP image")

	1087 JMESSAGE(JTRC_BMP_MAPPED, "%ux%u 8-bit colormapped BMP image")

	1088 Index: cdjpeg.h

	1089 ===================================================================

	1090 --- cdjpeg.h (revision 829)

	1091 +++ cdjpeg.h (working copy)

	1092 @@ -104,6 +104,7 @@

	1093 #define jinit_write_targa jIWrTarga

	1094 #define read_quant_tables RdQTables

	1095 #define read_scan_script RdScnScript

	1096 +#define set_quality_ratings SetQRates

	1097 #define set_quant_slots SetQSlots

	1098 #define set_sample_factors SetSFacts

	1099 #define read_color_map RdCMap

	1100 @@ -131,8 +132,10 @@

	1101 /* cjpeg support routines (in rdswitch.c) */

	1102

	1103 EXTERN(boolean) read_quant_tables JPP((j_compress_ptr cinfo, char * filename,

	1104 - int scale_factor, boolean force_baseline));

	1105 + boolean force_baseline));

	1106 EXTERN(boolean) read_scan_script JPP((j_compress_ptr cinfo, char * filename));

	1107 +EXTERN(boolean) set_quality_ratings JPP((j_compress_ptr cinfo, char *arg,

	1108 + boolean force_baseline));

	1109 EXTERN(boolean) set_quant_slots JPP((j_compress_ptr cinfo, char *arg));

	1110 EXTERN(boolean) set_sample_factors JPP((j_compress_ptr cinfo, char *arg));

	1111

	1112 Index: cjpeg.c

	1113 ===================================================================

	1114 --- cjpeg.c (revision 829)

	1115 +++ cjpeg.c (working copy)

	1116 @@ -1,8 +1,11 @@

	1117 /*

	1118 * cjpeg.c

	1119 *

	1120 + * This file was part of the Independent JPEG Group's software:

	1121 * Copyright (C) 1991-1998, Thomas G. Lane.

	1122 - * This file is part of the Independent JPEG Group's software.

	1123 + * Modified 2003-2011 by Guido Vollbeding.

	1124 + * libjpeg-turbo Modifications:

	1125 + * Copyright (C) 2010, 2013, D. R. Commander.

	1126 * For conditions of distribution and use, see the accompanying README file.

	1127 *

	1128 * This file contains a command-line user interface for the JPEG compressor.

	1129 @@ -25,6 +28,7 @@

	1130

	1131 #include "cdjpeg.h" /* Common decls for cjpeg/djpeg applications */

	1132 #include "jversion.h" /* for version message */

	1133 +#include "config.h"

	1134

	1135 #ifdef USE_CCOMMAND /* command-line reader for Macintosh */

	1136 #ifdef __MWERKS__

	1137 @@ -135,6 +139,7 @@

	1138

	1139 static const char * progname; /* program name for error messages */

	1140 static char * outfilename; /* for -outfile switch */

	1141 +boolean memdst; /* for -memdst switch */

	1142

	1143

	1144 LOCAL(void)

	1145 @@ -149,8 +154,9 @@

	1146 #endif

	1147

	1148 fprintf(stderr, "Switches (names may be abbreviated):\n");

	1149 - fprintf(stderr, " -quality N Compression quality (0..100; 5-95 is useful range)\n");

	1150 + fprintf(stderr, " -quality N[,...] Compression quality (0..100; 5-95 is useful range)\n");

	1151 fprintf(stderr, " -grayscale Create monochrome JPEG file\n");

	1152 + fprintf(stderr, " -rgb Create RGB JPEG file\n");

	1153 #ifdef ENTROPY_OPT_SUPPORTED

	1154 fprintf(stderr, " -optimize Optimize Huffman table (smaller file, but slow compression)\n");

	1155 #endif

	1156 @@ -161,6 +167,9 @@

	1157 fprintf(stderr, " -targa Input file is Targa format (usually not needed)\n");

	1158 #endif

	1159 fprintf(stderr, "Switches for advanced users:\n");

	1160 +#ifdef C_ARITH_CODING_SUPPORTED

	1161 + fprintf(stderr, " -arithmetic Use arithmetic coding\n");

	1162 +#endif

	1163 #ifdef DCT_ISLOW_SUPPORTED

	1164 fprintf(stderr, " -dct int Use integer DCT method%s\n",

	1165 (JDCT_DEFAULT == JDCT_ISLOW ? " (default)" : ""));

	1166 @@ -179,11 +188,11 @@

	1167 #endif

	1168 fprintf(stderr, " -maxmemory N Maximum memory to use (in kbytes)\n");

	1169 fprintf(stderr, " -outfile name Specify name for output file\n");

	1170 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED)

	1171 + fprintf(stderr, " -memdst Compress to memory instead of file (useful for benchmarking)\n");

	1172 +#endif

	1173 fprintf(stderr, " -verbose or -debug Emit debug output\n");

	1174 fprintf(stderr, "Switches for wizards:\n");

	1175 -#ifdef C_ARITH_CODING_SUPPORTED

	1176 - fprintf(stderr, " -arithmetic Use arithmetic coding\n");

	1177 -#endif

	1178 fprintf(stderr, " -baseline Force baseline quantization tables\n");

	1179 fprintf(stderr, " -qtables file Use quantization tables given in file\n");

	1180 fprintf(stderr, " -qslots N[,...] Set component quantization tables\n");

	1181 @@ -209,10 +218,9 @@

	1182 {

	1183 int argn;

	1184 char * arg;

	1185 - int quality; /* -quality parameter */

	1186 - int q_scale_factor; /* scaling percentage for -qtables */

	1187 boolean force_baseline;

	1188 boolean simple_progressive;

	1189 + char * qualityarg = NULL; /* saves -quality parm if any */

	1190 char * qtablefile = NULL; /* saves -qtables filename if any */

	1191 char * qslotsarg = NULL; /* saves -qslots parm if any */

	1192 char * samplearg = NULL; /* saves -sample parm if any */

	1193 @@ -219,15 +227,12 @@

	1194 char * scansarg = NULL; /* saves -scans parm if any */

	1195

	1196 /* Set up default JPEG parameters. */

	1197 - /* Note that default -quality level need not, and does not,

	1198 - * match the default scaling for an explicit -qtables argument.

	1199 - */

	1200 - quality = 75; /* default -quality value */

	1201 - q_scale_factor = 100; /* default to no scaling for -qtables */

	1202 +

	1203 force_baseline = FALSE; /* by default, allow 16-bit quantizers */

	1204 simple_progressive = FALSE;

	1205 is_targa = FALSE;

	1206 outfilename = NULL;

	1207 + memdst = FALSE;

	1208 cinfo->err->trace_level = 0;

	1209

	1210 /* Scan command line options, adjust parameters */

	1211 @@ -277,8 +282,11 @@

	1212 static boolean printed_version = FALSE;

	1213

	1214 if (! printed_version) {

	1215 - fprintf(stderr, "Independent JPEG Group's CJPEG, version %s\n%s\n",

	1216 - JVERSION, JCOPYRIGHT);

	1217 + fprintf(stderr, "%s version %s (build %s)\n",

	1218 + PACKAGE_NAME, VERSION, BUILD);

	1219 + fprintf(stderr, "%s\n\n", JCOPYRIGHT);

	1220 + fprintf(stderr, "Emulating The Independent JPEG Group's software, version %s\n\n",

	1221 + JVERSION);

	1222 printed_version = TRUE;

	1223 }

	1224 cinfo->err->trace_level++;

	1225 @@ -287,6 +295,10 @@

	1226 /* Force a monochrome JPEG file to be generated. */

	1227 jpeg_set_colorspace(cinfo, JCS_GRAYSCALE);

	1228

	1229 + } else if (keymatch(arg, "rgb", 3)) {

	1230 + /* Force an RGB JPEG file to be generated. */

	1231 + jpeg_set_colorspace(cinfo, JCS_RGB);

	1232 +

	1233 } else if (keymatch(arg, "maxmemory", 3)) {

	1234 /* Maximum memory in Kb (or Mb with 'm'). */

	1235 long lval;

	1236 @@ -305,7 +317,7 @@

	1237 #ifdef ENTROPY_OPT_SUPPORTED

	1238 cinfo->optimize_coding = TRUE;

	1239 #else

	1240 - fprintf(stderr, "%s: sorry, entropy optimization was not compiled\n",

	1241 + fprintf(stderr, "%s: sorry, entropy optimization was not compiled in\n",

	1242 progname);

	1243 exit(EXIT_FAILURE);

	1244 #endif

	1245 @@ -322,19 +334,26 @@

	1246 simple_progressive = TRUE;

	1247 /* We must postpone execution until num_components is known. */

	1248 #else

	1249 - fprintf(stderr, "%s: sorry, progressive output was not compiled\n",

	1250 + fprintf(stderr, "%s: sorry, progressive output was not compiled in\n",

	1251 progname);

	1252 exit(EXIT_FAILURE);

	1253 #endif

	1254

	1255 + } else if (keymatch(arg, "memdst", 2)) {

	1256 + /* Use in-memory destination manager */

	1257 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED)

	1258 + memdst = TRUE;

	1259 +#else

	1260 + fprintf(stderr, "%s: sorry, in-memory destination manager was not compiled in\n",

	1261 + progname);

	1262 + exit(EXIT_FAILURE);

	1263 +#endif

	1264 +

	1265 } else if (keymatch(arg, "quality", 1)) {

	1266 - /* Quality factor (quantization table scaling factor). */

	1267 + /* Quality ratings (quantization table scaling factors). */

	1268 if (++argn >= argc) /* advance to next argument */

	1269 usage();

	1270 - if (sscanf(argv[argn], "%d", &quality) != 1)

	1271 - usage();

	1272 - /* Change scale factor in case -qtables is present. */

	1273 - q_scale_factor = jpeg_quality_scaling(quality);

	1274 + qualityarg = argv[argn];

	1275

	1276 } else if (keymatch(arg, "qslots", 2)) {

	1277 /* Quantization table slot numbers. */

	1278 @@ -382,7 +401,7 @@

	1279 * default sampling factors.

	1280 */

	1281

	1282 - } else if (keymatch(arg, "scans", 2)) {

	1283 + } else if (keymatch(arg, "scans", 4)) {

	1284 /* Set scan script. */

	1285 #ifdef C_MULTISCAN_FILES_SUPPORTED

	1286 if (++argn >= argc) /* advance to next argument */

	1287 @@ -390,7 +409,7 @@

	1288 scansarg = argv[argn];

	1289 /* We must postpone reading the file in case -progressive appears. */

	1290 #else

	1291 - fprintf(stderr, "%s: sorry, multi-scan output was not compiled\n",

	1292 + fprintf(stderr, "%s: sorry, multi-scan output was not compiled in\n",

	1293 progname);

	1294 exit(EXIT_FAILURE);

	1295 #endif

	1296 @@ -422,11 +441,12 @@

	1297

	1298 /* Set quantization tables for selected quality. */

	1299 /* Some or all may be overridden if -qtables is present. */

	1300 - jpeg_set_quality(cinfo, quality, force_baseline);

	1301 + if (qualityarg != NULL) /* process -quality if it was present */

	1302 + if (! set_quality_ratings(cinfo, qualityarg, force_baseline))

	1303 + usage();

	1304

	1305 if (qtablefile != NULL) /* process -qtables if it was present */

	1306 - if (! read_quant_tables(cinfo, qtablefile,

	1307 - q_scale_factor, force_baseline))

	1308 + if (! read_quant_tables(cinfo, qtablefile, force_baseline))

	1309 usage();

	1310

	1311 if (qslotsarg != NULL) /* process -qslots if it was present */

	1312 @@ -468,7 +488,9 @@

	1313 int file_index;

	1314 cjpeg_source_ptr src_mgr;

	1315 FILE * input_file;

	1316 - FILE * output_file;

	1317 + FILE * output_file = NULL;

	1318 + unsigned char *outbuffer = NULL;

	1319 + unsigned long outsize = 0;

	1320 JDIMENSION num_scanlines;

	1321

	1322 /* On Mac, fetch a command line. */

	1323 @@ -511,20 +533,22 @@

	1324 file_index = parse_switches(&cinfo, argc, argv, 0, FALSE);

	1325

	1326 #ifdef TWO_FILE_COMMANDLINE

	1327 - /* Must have either -outfile switch or explicit output file name */

	1328 - if (outfilename == NULL) {

	1329 - if (file_index != argc-2) {

	1330 - fprintf(stderr, "%s: must name one input and one output file\n",

	1331 - progname);

	1332 - usage();

	1333 + if (!memdst) {

	1334 + /* Must have either -outfile switch or explicit output file name */

	1335 + if (outfilename == NULL) {

	1336 + if (file_index != argc-2) {

	1337 + fprintf(stderr, "%s: must name one input and one output file\n",

	1338 + progname);

	1339 + usage();

	1340 + }

	1341 + outfilename = argv[file_index+1];

	1342 + } else {

	1343 + if (file_index != argc-1) {

	1344 + fprintf(stderr, "%s: must name one input and one output file\n",

	1345 + progname);

	1346 + usage();

	1347 + }

	1348 }

	1349 - outfilename = argv[file_index+1];

	1350 - } else {

	1351 - if (file_index != argc-1) {

	1352 - fprintf(stderr, "%s: must name one input and one output file\n",

	1353 - progname);

	1354 - usage();

	1355 - }

	1356 }

	1357 #else

	1358 /* Unix style: expect zero or one file name */

	1359 @@ -551,7 +575,7 @@

	1360 fprintf(stderr, "%s: can't open %s\n", progname, outfilename);

	1361 exit(EXIT_FAILURE);

	1362 }

	1363 - } else {

	1364 + } else if (!memdst) {

	1365 /* default output file is stdout */

	1366 output_file = write_stdout();

	1367 }

	1368 @@ -574,7 +598,12 @@

	1369 file_index = parse_switches(&cinfo, argc, argv, 0, TRUE);

	1370

	1371 /* Specify data destination for compression */

	1372 - jpeg_stdio_dest(&cinfo, output_file);

	1373 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED)

	1374 + if (memdst)

	1375 + jpeg_mem_dest(&cinfo, &outbuffer, &outsize);

	1376 + else

	1377 +#endif

	1378 + jpeg_stdio_dest(&cinfo, output_file);

	1379

	1380 /* Start compressor */

	1381 jpeg_start_compress(&cinfo, TRUE);

	1382 @@ -593,7 +622,7 @@

	1383 /* Close files, if we opened them */

	1384 if (input_file != stdin)

	1385 fclose(input_file);

	1386 - if (output_file != stdout)

	1387 + if (output_file != stdout && output_file != NULL)

	1388 fclose(output_file);

	1389

	1390 #ifdef PROGRESS_REPORT

	1391 @@ -600,6 +629,12 @@

	1392 end_progress_monitor((j_common_ptr) &cinfo);

	1393 #endif

	1394

	1395 + if (memdst) {

	1396 + fprintf(stderr, "Compressed size: %lu bytes\n", outsize);

	1397 + if (outbuffer != NULL)

	1398 + free(outbuffer);

	1399 + }

	1400 +

	1401 /* All done. */

	1402 exit(jerr.num_warnings ? EXIT_WARNING : EXIT_SUCCESS);

	1403 return 0; /* suppress no-return-value warnings */
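When memdst is set, cjpeg passes jpeg_mem_dest() the addresses of outbuffer and outsize, lets the library fill the buffer, and then releases outbuffer itself with free(). The following is a minimal sketch of that ownership contract, written without libjpeg so that it stands alone. mem_sink, mem_sink_init() and mem_sink_write() are hypothetical names, and growth by doubling is an assumption of this sketch, not a description of libjpeg-turbo's destination manager.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory sink: the caller owns *buf and releases it with
 * free(), mirroring how cjpeg frees outbuffer after compression. */
typedef struct {
  unsigned char **buf;  /* caller's buffer pointer, starts out NULL */
  unsigned long *size;  /* caller's count of bytes written so far */
  unsigned long cap;    /* bytes currently allocated */
} mem_sink;

static void mem_sink_init(mem_sink *s, unsigned char **buf,
                          unsigned long *size)
{
  assert(s != NULL && buf != NULL && size != NULL);
  s->buf = buf;
  s->size = size;
  s->cap = 0;
  *buf = NULL;
  *size = 0;
}

/* Append n bytes, doubling the allocation whenever it is too small.
 * Returns 1 on success, 0 if realloc() fails (the old buffer stays valid). */
static int mem_sink_write(mem_sink *s, const void *data, unsigned long n)
{
  if (n == 0)
    return 1;
  if (*s->size + n > s->cap) {
    unsigned long newcap = s->cap ? s->cap : 4096;
    unsigned char *p;
    while (newcap < *s->size + n)
      newcap *= 2;
    p = (unsigned char *)realloc(*s->buf, newcap);
    if (p == NULL)
      return 0;
    *s->buf = p;
    s->cap = newcap;
  }
  memcpy(*s->buf + *s->size, data, n);
  *s->size += n;
  return 1;
}
```

As in cjpeg, the caller reads the final size from its own variable once writing is done, and frees the buffer after use.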

	1404 Index: djpeg.c

	1405 ===================================================================

	1406 --- djpeg.c (revision 829)

	1407 +++ djpeg.c (working copy)

	1408 @@ -1,8 +1,11 @@

	1409 /*

	1410 * djpeg.c

	1411 *

	1412 + * This file was part of the Independent JPEG Group's software:

	1413 * Copyright (C) 1991-1997, Thomas G. Lane.

	1414 - * This file is part of the Independent JPEG Group's software.

	1415 + * libjpeg-turbo Modifications:

	1416 + * Copyright (C) 2010-2011, 2013-2015, D. R. Commander.

	1417 + * Copyright (C) 2015, Google, Inc.

	1418 * For conditions of distribution and use, see the accompanying README file.

	1419 *

	1420 * This file contains a command-line user interface for the JPEG decompressor.

	1421 @@ -25,6 +28,7 @@

	1422

	1423 #include "cdjpeg.h" /* Common decls for cjpeg/djpeg applications */

	1424 #include "jversion.h" /* for version message */

	1425 +#include "config.h"

	1426

	1427 #include <ctype.h> /* to declare isprint() */

	1428

	1429 @@ -84,6 +88,10 @@

	1430

	1431 static const char * progname; /* program name for error messages */

	1432 static char * outfilename; /* for -outfile switch */

	1433 +boolean memsrc; /* for -memsrc switch */

	1434 +boolean strip, skip;

	1435 +JDIMENSION startY, endY;

	1436 +#define INPUT_BUF_SIZE 4096

	1437

	1438

	1439 LOCAL(void)

	1440 @@ -101,6 +109,7 @@

	1441 fprintf(stderr, " -colors N Reduce image to no more than N colors\n");

	1442 fprintf(stderr, " -fast Fast, low-quality processing\n");

	1443 fprintf(stderr, " -grayscale Force grayscale output\n");

	1444 + fprintf(stderr, " -rgb Force RGB output\n");

	1445 #ifdef IDCT_SCALING_SUPPORTED

	1446 fprintf(stderr, " -scale M/N Scale output image by fraction M/N, eg, 1/8 \n");

	1447 #endif

	1448 @@ -153,6 +162,12 @@

	1449 #endif

	1450 fprintf(stderr, " -maxmemory N Maximum memory to use (in kbytes)\n");

	1451 fprintf(stderr, " -outfile name Specify name for output file\n");

	1452 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED)

	1453 + fprintf(stderr, " -memsrc Load input file into memory before decompressing\n");

	1454 +#endif

	1455 +

	1456 + fprintf(stderr, " -skip Y0,Y1 Decode all rows except those between Y0 and Y1 (inclusive)\n");

	1457 + fprintf(stderr, " -strip Y0,Y1 Decode only rows between Y0 and Y1 (inclusive)\n");

	1458 fprintf(stderr, " -verbose or -debug Emit debug output\n");

	1459 exit(EXIT_FAILURE);

	1460 }

	1461 @@ -176,6 +191,9 @@

	1462 /* Set up default JPEG parameters. */

	1463 requested_fmt = DEFAULT_FMT; /* set default output file format */

	1464 outfilename = NULL;

	1465 + memsrc = FALSE;

	1466 + strip = FALSE;

	1467 + skip = FALSE;

	1468 cinfo->err->trace_level = 0;

	1469

	1470 /* Scan command line options, adjust parameters */

	1471 @@ -240,8 +258,11 @@

	1472 static boolean printed_version = FALSE;

	1473

	1474 if (! printed_version) {

	1475 - fprintf(stderr, "Independent JPEG Group's DJPEG, version %s\n%s\n",

	1476 - JVERSION, JCOPYRIGHT);

	1477 + fprintf(stderr, "%s version %s (build %s)\n",

	1478 + PACKAGE_NAME, VERSION, BUILD);

	1479 + fprintf(stderr, "%s\n\n", JCOPYRIGHT);

	1480 + fprintf(stderr, "Emulating The Independent JPEG Group's software, version %s\n\n",

	1481 + JVERSION);

	1482 printed_version = TRUE;

	1483 }

	1484 cinfo->err->trace_level++;

	1485 @@ -263,6 +284,10 @@

	1486 /* Force monochrome output. */

	1487 cinfo->out_color_space = JCS_GRAYSCALE;

	1488

	1489 + } else if (keymatch(arg, "rgb", 2)) {

	1490 + /* Force RGB output. */

	1491 + cinfo->out_color_space = JCS_RGB;

	1492 +

	1493 } else if (keymatch(arg, "map", 3)) {

	1494 /* Quantize to a color map taken from an input file. */

	1495 if (++argn >= argc) /* advance to next argument */

	1496 @@ -314,6 +339,16 @@

	1497 usage();

	1498 outfilename = argv[argn]; /* save it away for later use */

	1499

	1500 + } else if (keymatch(arg, "memsrc", 2)) {

	1501 + /* Use in-memory source manager */

	1502 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED)

	1503 + memsrc = TRUE;

	1504 +#else

	1505 + fprintf(stderr, "%s: sorry, in-memory source manager was not compiled in\n",

	1506 + progname);

	1507 + exit(EXIT_FAILURE);

	1508 +#endif

	1509 +

	1510 } else if (keymatch(arg, "pnm", 1) || keymatch(arg, "ppm", 1)) {

	1511 /* PPM/PGM output format. */

	1512 requested_fmt = FMT_PPM;

	1513 @@ -322,7 +357,7 @@

	1514 /* RLE output format. */

	1515 requested_fmt = FMT_RLE;

	1516

	1517 - } else if (keymatch(arg, "scale", 1)) {

	1518 + } else if (keymatch(arg, "scale", 2)) {

	1519 /* Scale the output image by a fraction M/N. */

	1520 if (++argn >= argc) /* advance to next argument */

	1521 usage();

	1522 @@ -330,6 +365,20 @@

	1523 &cinfo->scale_num, &cinfo->scale_denom) != 2)

	1524 usage();

	1525

	1526 + } else if (keymatch(arg, "strip", 2)) {

	1527 + if (++argn >= argc)

	1528 + usage();

	1529 + if (sscanf(argv[argn], "%d,%d", &startY, &endY) != 2 || startY > endY)

	1530 + usage();

	1531 + strip = TRUE;

	1532 +

	1533 + } else if (keymatch(arg, "skip", 2)) {

	1534 + if (++argn >= argc)

	1535 + usage();

	1536 + if (sscanf(argv[argn], "%d,%d", &startY, &endY) != 2 || startY > endY)

	1537 + usage();

	1538 + skip = TRUE;

	1539 +

	1540 } else if (keymatch(arg, "targa", 1)) {

	1541 /* Targa output format. */

	1542 requested_fmt = FMT_TARGA;

	1543 @@ -432,6 +481,8 @@

	1544 djpeg_dest_ptr dest_mgr = NULL;

	1545 FILE * input_file;

	1546 FILE * output_file;

	1547 + unsigned char *inbuffer = NULL;

	1548 + unsigned long insize = 0;

	1549 JDIMENSION num_scanlines;

	1550

	1551 /* On Mac, fetch a command line. */

	1552 @@ -455,7 +506,7 @@

	1553 * APP12 is used by some digital camera makers for textual info,

	1554 * so we provide the ability to display it as text.

	1555 * If you like, additional APPn marker types can be selected for display,

	1556 - * but don't try to override APP0 or APP14 this way (see libjpeg.doc).

	1557 + * but don't try to override APP0 or APP14 this way (see libjpeg.txt).

	1558 */

	1559 jpeg_set_marker_processor(&cinfo, JPEG_COM, print_text_marker);

	1560 jpeg_set_marker_processor(&cinfo, JPEG_APP0+12, print_text_marker);

	1561 @@ -526,7 +577,30 @@

	1562 #endif

	1563

	1564 /* Specify data source for decompression */

	1565 - jpeg_stdio_src(&cinfo, input_file);

	1566 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED)

	1567 + if (memsrc) {

	1568 + size_t nbytes;

	1569 + do {

	1570 + inbuffer = (unsigned char *)realloc(inbuffer, insize + INPUT_BUF_SIZE);

	1571 + if (inbuffer == NULL) {

	1572 + fprintf(stderr, "%s: memory allocation failure\n", progname);

	1573 + exit(EXIT_FAILURE);

	1574 + }

	1575 + nbytes = JFREAD(input_file, &inbuffer[insize], INPUT_BUF_SIZE);

	1576 + if (nbytes < INPUT_BUF_SIZE && ferror(input_file)) {

	1577 + if (file_index < argc)

	1578 + fprintf(stderr, "%s: can't read from %s\n", progname,

	1579 + argv[file_index]);

	1580 + else

	1581 + fprintf(stderr, "%s: can't read from stdin\n", progname);

	1582 + }

	1583 + insize += (unsigned long)nbytes;

	1584 + } while (nbytes == INPUT_BUF_SIZE);

	1585 + fprintf(stderr, "Compressed size: %lu bytes\n", insize);

	1586 + jpeg_mem_src(&cinfo, inbuffer, insize);

	1587 + } else

	1588 +#endif

	1589 + jpeg_stdio_src(&cinfo, input_file);

	1590

	1591 /* Read file header, set default decompression parameters */

	1592 (void) jpeg_read_header(&cinfo, TRUE);

	1593 @@ -575,14 +649,64 @@

	1594 /* Start decompressor */

	1595 (void) jpeg_start_decompress(&cinfo);

	1596

	1597 - /* Write output file header */

	1598 - (*dest_mgr->start_output) (&cinfo, dest_mgr);

	1599 + /* Strip decode */

	1600 + if (strip || skip) {

	1601 + JDIMENSION tmp;

	1602

	1603 - /* Process data */

	1604 - while (cinfo.output_scanline < cinfo.output_height) {

	1605 - num_scanlines = jpeg_read_scanlines(&cinfo, dest_mgr->buffer,

	1606 - dest_mgr->buffer_height);

	1607 - (*dest_mgr->put_pixel_rows) (&cinfo, dest_mgr, num_scanlines);

	1608 + /* Check for valid endY. We cannot check this value until after

	1609 + * jpeg_start_decompress() is called. Note that we have already verified

	1610 + * that startY <= endY.

	1611 + */

	1612 + if (endY > cinfo.output_height - 1) {

	1613 + fprintf(stderr, "%s: strip %d-%d exceeds image height %d\n", progname,

	1614 + startY, endY, cinfo.output_height);

	1615 + exit(EXIT_FAILURE);

	1616 + }

	1617 +

	1618 + /* Write output file header. This is a hack to ensure that the destination

	1619 + * manager creates an image of the proper size for the partial decode.

	1620 + */

	1621 + tmp = cinfo.output_height;

	1622 + cinfo.output_height = endY - startY + 1;

	1623 + if (skip)

	1624 + cinfo.output_height = tmp - cinfo.output_height;

	1625 + (*dest_mgr->start_output) (&cinfo, dest_mgr);

	1626 + cinfo.output_height = tmp;

	1627 +

	1628 + /* Process data */

	1629 + if (skip) {

	1630 + while (cinfo.output_scanline < startY) {

	1631 + num_scanlines = jpeg_read_scanlines(&cinfo, dest_mgr->buffer,

	1632 + dest_mgr->buffer_height);

	1633 + (*dest_mgr->put_pixel_rows) (&cinfo, dest_mgr, num_scanlines);

	1634 + }

	1635 + jpeg_skip_scanlines(&cinfo, endY - startY + 1);

	1636 + while (cinfo.output_scanline < cinfo.output_height) {

	1637 + num_scanlines = jpeg_read_scanlines(&cinfo, dest_mgr->buffer,

	1638 + dest_mgr->buffer_height);

	1639 + (*dest_mgr->put_pixel_rows) (&cinfo, dest_mgr, num_scanlines);

	1640 + }

	1641 + } else {

	1642 + jpeg_skip_scanlines(&cinfo, startY);

	1643 + while (cinfo.output_scanline <= endY) {

	1644 + num_scanlines = jpeg_read_scanlines(&cinfo, dest_mgr->buffer,

	1645 + dest_mgr->buffer_height);

	1646 + (*dest_mgr->put_pixel_rows) (&cinfo, dest_mgr, num_scanlines);

	1647 + }

	1648 + jpeg_skip_scanlines(&cinfo, cinfo.output_height - endY - 1);

	1649 + }

	1650 +

	1651 + /* Normal full image decode */

	1652 + } else {

	1653 + /* Write output file header */

	1654 + (*dest_mgr->start_output) (&cinfo, dest_mgr);

	1655 +

	1656 + /* Process data */

	1657 + while (cinfo.output_scanline < cinfo.output_height) {

	1658 + num_scanlines = jpeg_read_scanlines(&cinfo, dest_mgr->buffer,

	1659 + dest_mgr->buffer_height);

	1660 + (*dest_mgr->put_pixel_rows) (&cinfo, dest_mgr, num_scanlines);

	1661 + }

	1662 }

	1663

	1664 #ifdef PROGRESS_REPORT

	1665 @@ -610,6 +734,9 @@

	1666 end_progress_monitor((j_common_ptr) &cinfo);

	1667 #endif

	1668

	1669 + if (memsrc && inbuffer != NULL)

	1670 + free(inbuffer);

	1671 +

	1672 /* All done. */

	1673 exit(jerr.num_warnings ? EXIT_WARNING : EXIT_SUCCESS);

	1674 return 0; /* suppress no-return-value warnings */
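The strip/skip path added to djpeg treats the output as three bands: rows 0 to startY-1, rows startY to endY, and rows endY+1 to the bottom. -strip decodes only the middle band and passes the other two to jpeg_skip_scanlines(); -skip does the reverse, and the temporary output_height swap makes the destination manager size the output for whichever rows survive. The helper below is a hypothetical sketch (plan_bands is not part of libjpeg) that computes the band sizes and the advertised height; note that once row endY has been consumed, height - endY - 1 rows remain.

```c
#include <assert.h>

/* Hypothetical description of a -strip/-skip decode as three row bands:
 * band 0 = rows [0, startY), band 1 = rows [startY, endY],
 * band 2 = rows (endY, height). */
typedef struct {
  unsigned rows[3];   /* number of rows in each band */
  int decode[3];      /* 1: read with jpeg_read_scanlines(); 0: skip */
  unsigned out_rows;  /* height the output file header should advertise */
} band_plan;

static band_plan plan_bands(unsigned height, unsigned startY, unsigned endY,
                            int skip_mode)
{
  band_plan p;

  /* djpeg rejects startY > endY and endY > output_height - 1 up front. */
  assert(startY <= endY && endY < height);
  p.rows[0] = startY;
  p.rows[1] = endY - startY + 1;
  p.rows[2] = height - endY - 1;   /* rows left once row endY is consumed */
  p.decode[0] = p.decode[2] = skip_mode ? 1 : 0;
  p.decode[1] = skip_mode ? 0 : 1;
  p.out_rows = skip_mode ? height - p.rows[1] : p.rows[1];
  return p;
}
```

For a 100-row image with -strip 10,19, the bands are 10, 10 and 80 rows and the output is 10 rows tall; with -skip 10,19 the same bands yield a 90-row output.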

	1675 Index: jcapimin.c

	1676 ===================================================================

	1677 --- jcapimin.c (revision 829)

	1678 +++ jcapimin.c (working copy)

	1679 @@ -2,6 +2,7 @@

	1680 * jcapimin.c

	1681 *

	1682 * Copyright (C) 1994-1998, Thomas G. Lane.

	1683 + * Modified 2003-2010 by Guido Vollbeding.

	1684 * This file is part of the Independent JPEG Group's software.

	1685 * For conditions of distribution and use, see the accompanying README file.

	1686 *

	1687 @@ -63,8 +64,12 @@

	1688

	1689 cinfo->comp_info = NULL;

	1690

	1691 - for (i = 0; i < NUM_QUANT_TBLS; i++)

	1692 + for (i = 0; i < NUM_QUANT_TBLS; i++) {

	1693 cinfo->quant_tbl_ptrs[i] = NULL;

	1694 +#if JPEG_LIB_VERSION >= 70

	1695 + cinfo->q_scale_factor[i] = 100;

	1696 +#endif

	1697 + }

	1698

	1699 for (i = 0; i < NUM_HUFF_TBLS; i++) {

	1700 cinfo->dc_huff_tbl_ptrs[i] = NULL;

	1701 @@ -71,6 +76,13 @@

	1702 cinfo->ac_huff_tbl_ptrs[i] = NULL;

	1703 }

	1704

	1705 +#if JPEG_LIB_VERSION >= 80

	1706 + /* Must do it here for emit_dqt in case jpeg_write_tables is used */

	1707 + cinfo->block_size = DCTSIZE;

	1708 + cinfo->natural_order = jpeg_natural_order;

	1709 + cinfo->lim_Se = DCTSIZE2-1;

	1710 +#endif

	1711 +

	1712 cinfo->script_space = NULL;

	1713

	1714 cinfo->input_gamma = 1.0; /* in case application forgets */
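The jcapimin.c hunk points natural_order at jpeg_natural_order, the table that maps each coefficient's zigzag position to its row-major index in the 8x8 block, and sets lim_Se to DCTSIZE2-1 (63), the last zigzag index. As an illustration only, the same ordering can be regenerated by walking the block's anti-diagonals in alternating directions. build_zigzag_order() is a hypothetical helper, not a libjpeg function.

```c
#include <assert.h>

#define BLOCK_DIM 8                          /* DCTSIZE */
#define BLOCK_COEFS (BLOCK_DIM * BLOCK_DIM)  /* DCTSIZE2 */

/* Hypothetical helper: order[k] = row-major index of the k-th coefficient
 * in zigzag order, found by walking anti-diagonals s = row + col. */
static void build_zigzag_order(int order[BLOCK_COEFS])
{
  int k = 0, s, row;

  assert(order != 0);
  for (s = 0; s <= 2 * (BLOCK_DIM - 1); s++) {
    int lo = s > BLOCK_DIM - 1 ? s - (BLOCK_DIM - 1) : 0;  /* first row on s */
    int hi = s < BLOCK_DIM - 1 ? s : BLOCK_DIM - 1;        /* last row on s */
    if (s % 2 == 0) {
      /* even diagonals run from bottom-left to top-right */
      for (row = hi; row >= lo; row--)
        order[k++] = row * BLOCK_DIM + (s - row);
    } else {
      /* odd diagonals run from top-right to bottom-left */
      for (row = lo; row <= hi; row++)
        order[k++] = row * BLOCK_DIM + (s - row);
    }
  }
}
```

Entry 1 is 1 (row 0, column 1) and entry 2 is 8 (row 1, column 0), matching the start of the standard zigzag sequence; the last entry is 63.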

	1715 Index: jccolor.c

	1716 ===================================================================

	1717 --- jccolor.c (revision 829)

	1718 +++ jccolor.c (working copy)

	1719 @@ -1,10 +1,11 @@

	1720 /*

	1721 * jccolor.c

	1722 *

	1723 + * This file was part of the Independent JPEG Group's software:

	1724 * Copyright (C) 1991-1996, Thomas G. Lane.

	1725 + * libjpeg-turbo Modifications:

	1726 * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB

	1727 - * Copyright 2009 D. R. Commander

	1728 - * This file is part of the Independent JPEG Group's software.

	1729 + * Copyright (C) 2009-2012, D. R. Commander.

	1730 * For conditions of distribution and use, see the accompanying README file.

	1731 *

	1732 * This file contains input colorspace conversion routines.

	1733 @@ -14,6 +15,7 @@

	1734 #include "jinclude.h"

	1735 #include "jpeglib.h"

	1736 #include "jsimd.h"

	1737 +#include "config.h"

	1738

	1739

	1740 /* Private subobject */

	1741 @@ -81,6 +83,111 @@

	1742 #define TABLE_SIZE (8*(MAXJSAMPLE+1))

	1743

	1744

	1745 +/* Include inline routines for colorspace extensions */

	1746 +

	1747 +#include "jccolext.c"

	1748 +#undef RGB_RED

	1749 +#undef RGB_GREEN

	1750 +#undef RGB_BLUE

	1751 +#undef RGB_PIXELSIZE

	1752 +

	1753 +#define RGB_RED EXT_RGB_RED

	1754 +#define RGB_GREEN EXT_RGB_GREEN

	1755 +#define RGB_BLUE EXT_RGB_BLUE

	1756 +#define RGB_PIXELSIZE EXT_RGB_PIXELSIZE

	1757 +#define rgb_ycc_convert_internal extrgb_ycc_convert_internal

	1758 +#define rgb_gray_convert_internal extrgb_gray_convert_internal

	1759 +#define rgb_rgb_convert_internal extrgb_rgb_convert_internal

	1760 +#include "jccolext.c"

	1761 +#undef RGB_RED

	1762 +#undef RGB_GREEN

	1763 +#undef RGB_BLUE

	1764 +#undef RGB_PIXELSIZE

	1765 +#undef rgb_ycc_convert_internal

	1766 +#undef rgb_gray_convert_internal

	1767 +#undef rgb_rgb_convert_internal

	1768 +

	1769 +#define RGB_RED EXT_RGBX_RED

	1770 +#define RGB_GREEN EXT_RGBX_GREEN

	1771 +#define RGB_BLUE EXT_RGBX_BLUE

	1772 +#define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE

	1773 +#define rgb_ycc_convert_internal extrgbx_ycc_convert_internal

	1774 +#define rgb_gray_convert_internal extrgbx_gray_convert_internal

	1775 +#define rgb_rgb_convert_internal extrgbx_rgb_convert_internal

	1776 +#include "jccolext.c"

	1777 +#undef RGB_RED

	1778 +#undef RGB_GREEN

	1779 +#undef RGB_BLUE

	1780 +#undef RGB_PIXELSIZE

	1781 +#undef rgb_ycc_convert_internal

	1782 +#undef rgb_gray_convert_internal

	1783 +#undef rgb_rgb_convert_internal

	1784 +

	1785 +#define RGB_RED EXT_BGR_RED

	1786 +#define RGB_GREEN EXT_BGR_GREEN

	1787 +#define RGB_BLUE EXT_BGR_BLUE

	1788 +#define RGB_PIXELSIZE EXT_BGR_PIXELSIZE

	1789 +#define rgb_ycc_convert_internal extbgr_ycc_convert_internal

	1790 +#define rgb_gray_convert_internal extbgr_gray_convert_internal

	1791 +#define rgb_rgb_convert_internal extbgr_rgb_convert_internal

	1792 +#include "jccolext.c"

	1793 +#undef RGB_RED

	1794 +#undef RGB_GREEN

	1795 +#undef RGB_BLUE

	1796 +#undef RGB_PIXELSIZE

	1797 +#undef rgb_ycc_convert_internal

	1798 +#undef rgb_gray_convert_internal

	1799 +#undef rgb_rgb_convert_internal

	1800 +

	1801 +#define RGB_RED EXT_BGRX_RED

	1802 +#define RGB_GREEN EXT_BGRX_GREEN

	1803 +#define RGB_BLUE EXT_BGRX_BLUE

	1804 +#define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE

	1805 +#define rgb_ycc_convert_internal extbgrx_ycc_convert_internal

	1806 +#define rgb_gray_convert_internal extbgrx_gray_convert_internal

	1807 +#define rgb_rgb_convert_internal extbgrx_rgb_convert_internal

	1808 +#include "jccolext.c"

	1809 +#undef RGB_RED

	1810 +#undef RGB_GREEN

	1811 +#undef RGB_BLUE

	1812 +#undef RGB_PIXELSIZE

	1813 +#undef rgb_ycc_convert_internal

	1814 +#undef rgb_gray_convert_internal

	1815 +#undef rgb_rgb_convert_internal

	1816 +

	1817 +#define RGB_RED EXT_XBGR_RED

	1818 +#define RGB_GREEN EXT_XBGR_GREEN

	1819 +#define RGB_BLUE EXT_XBGR_BLUE

	1820 +#define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE

	1821 +#define rgb_ycc_convert_internal extxbgr_ycc_convert_internal

	1822 +#define rgb_gray_convert_internal extxbgr_gray_convert_internal

	1823 +#define rgb_rgb_convert_internal extxbgr_rgb_convert_internal

	1824 +#include "jccolext.c"

	1825 +#undef RGB_RED

	1826 +#undef RGB_GREEN

	1827 +#undef RGB_BLUE

	1828 +#undef RGB_PIXELSIZE

	1829 +#undef rgb_ycc_convert_internal

	1830 +#undef rgb_gray_convert_internal

	1831 +#undef rgb_rgb_convert_internal

	1832 +

	1833 +#define RGB_RED EXT_XRGB_RED

	1834 +#define RGB_GREEN EXT_XRGB_GREEN

	1835 +#define RGB_BLUE EXT_XRGB_BLUE

	1836 +#define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE

	1837 +#define rgb_ycc_convert_internal extxrgb_ycc_convert_internal

	1838 +#define rgb_gray_convert_internal extxrgb_gray_convert_internal

	1839 +#define rgb_rgb_convert_internal extxrgb_rgb_convert_internal

	1840 +#include "jccolext.c"

	1841 +#undef RGB_RED

	1842 +#undef RGB_GREEN

	1843 +#undef RGB_BLUE

	1844 +#undef RGB_PIXELSIZE

	1845 +#undef rgb_ycc_convert_internal

	1846 +#undef rgb_gray_convert_internal

	1847 +#undef rgb_rgb_convert_internal

	1848 +

	1849 +

	1850 /*

	1851 * Initialize for RGB->YCC colorspace conversion.

	1852 */

	1853 @@ -119,14 +226,6 @@

	1854

	1855 /*

	1856 * Convert some rows of samples to the JPEG colorspace.

	1857 - *

	1858 - * Note that we change from the application's interleaved-pixel format

	1859 - * to our internal noninterleaved, one-plane-per-component format.

	1860 - * The input buffer is therefore three times as wide as the output buffer.

	1861 - *

	1862 - * A starting row offset is provided only for the output buffer. The caller

	1863 - * can easily adjust the passed input_buf value to accommodate any row

	1864 - * offset required on that side.

	1865 */

	1866

	1867 METHODDEF(void)

	1868 @@ -134,43 +233,39 @@

	1869 JSAMPARRAY input_buf, JSAMPIMAGE output_buf,

	1870 JDIMENSION output_row, int num_rows)

	1871 {

	1872 - my_cconvert_ptr cconvert = (my_cconvert_ptr) cinfo->cconvert;

	1873 - register int r, g, b;

	1874 - register INT32 * ctab = cconvert->rgb_ycc_tab;

	1875 - register JSAMPROW inptr;

	1876 - register JSAMPROW outptr0, outptr1, outptr2;

	1877 - register JDIMENSION col;

	1878 - JDIMENSION num_cols = cinfo->image_width;

	1879 -

	1880 - while (--num_rows >= 0) {

	1881 - inptr = *input_buf++;

	1882 - outptr0 = output_buf[0][output_row];

	1883 - outptr1 = output_buf[1][output_row];

	1884 - outptr2 = output_buf[2][output_row];

	1885 - output_row++;

	1886 - for (col = 0; col < num_cols; col++) {

	1887 - r = GETJSAMPLE(inptr[rgb_red[cinfo->in_color_space]]);

	1888 - g = GETJSAMPLE(inptr[rgb_green[cinfo->in_color_space]]);

	1889 - b = GETJSAMPLE(inptr[rgb_blue[cinfo->in_color_space]]);

	1890 - inptr += rgb_pixelsize[cinfo->in_color_space];

	1891 - /* If the inputs are 0..MAXJSAMPLE, the outputs of these equations

	1892 - * must be too; we do not need an explicit range-limiting operation.

	1893 - * Hence the value being shifted is never negative, and we don't

	1894 - * need the general RIGHT_SHIFT macro.

	1895 - */

	1896 - /* Y */

	1897 - outptr0[col] = (JSAMPLE)

	1898 - ((ctab[r+R_Y_OFF] + ctab[g+G_Y_OFF] + ctab[b+B_Y_OFF])

	1899 - >> SCALEBITS);

	1900 - /* Cb */

	1901 - outptr1[col] = (JSAMPLE)

	1902 - ((ctab[r+R_CB_OFF] + ctab[g+G_CB_OFF] + ctab[b+B_CB_OFF])

	1903 - >> SCALEBITS);

	1904 - /* Cr */

	1905 - outptr2[col] = (JSAMPLE)

	1906 - ((ctab[r+R_CR_OFF] + ctab[g+G_CR_OFF] + ctab[b+B_CR_OFF])

	1907 - >> SCALEBITS);

	1908 - }

	1909 + switch (cinfo->in_color_space) {

	1910 + case JCS_EXT_RGB:

	1911 + extrgb_ycc_convert_internal(cinfo, input_buf, output_buf, output_row,

	1912 + num_rows);

	1913 + break;

	1914 + case JCS_EXT_RGBX:

	1915 + case JCS_EXT_RGBA:

	1916 + extrgbx_ycc_convert_internal(cinfo, input_buf, output_buf, output_row,

	1917 + num_rows);

	1918 + break;

	1919 + case JCS_EXT_BGR:

	1920 + extbgr_ycc_convert_internal(cinfo, input_buf, output_buf, output_row,

	1921 + num_rows);

	1922 + break;

	1923 + case JCS_EXT_BGRX:

	1924 + case JCS_EXT_BGRA:

	1925 + extbgrx_ycc_convert_internal(cinfo, input_buf, output_buf, output_row,

	1926 + num_rows);

	1927 + break;

	1928 + case JCS_EXT_XBGR:

	1929 + case JCS_EXT_ABGR:

	1930 + extxbgr_ycc_convert_internal(cinfo, input_buf, output_buf, output_row,

	1931 + num_rows);

	1932 + break;

	1933 + case JCS_EXT_XRGB:

	1934 + case JCS_EXT_ARGB:

	1935 + extxrgb_ycc_convert_internal(cinfo, input_buf, output_buf, output_row,

	1936 + num_rows);

	1937 + break;

	1938 + default:

	1939 + rgb_ycc_convert_internal(cinfo, input_buf, output_buf, output_row,

	1940 + num_rows);

	1941 + break;

	1942 }

	1943 }

	1944

	1945 @@ -180,9 +275,6 @@

	1946

	1947 /*

	1948 * Convert some rows of samples to the JPEG colorspace.

	1949 - * This version handles RGB->grayscale conversion, which is the same

	1950 - * as the RGB->Y portion of RGB->YCbCr.

	1951 - * We assume rgb_ycc_start has been called (we only use the Y tables).

	1952 */

	1953

	1954 METHODDEF(void)

	1955 @@ -190,28 +282,85 @@

	1956 JSAMPARRAY input_buf, JSAMPIMAGE output_buf,

	1957 JDIMENSION output_row, int num_rows)

	1958 {

	1959 - my_cconvert_ptr cconvert = (my_cconvert_ptr) cinfo->cconvert;

	1960 - register int r, g, b;

	1961 - register INT32 * ctab = cconvert->rgb_ycc_tab;

	1962 - register JSAMPROW inptr;

	1963 - register JSAMPROW outptr;

	1964 - register JDIMENSION col;

	1965 - JDIMENSION num_cols = cinfo->image_width;

	1966 + switch (cinfo->in_color_space) {

	1967 + case JCS_EXT_RGB:

	1968 + extrgb_gray_convert_internal(cinfo, input_buf, output_buf, output_row,

	1969 + num_rows);

	1970 + break;

	1971 + case JCS_EXT_RGBX:

	1972 + case JCS_EXT_RGBA:

	1973 + extrgbx_gray_convert_internal(cinfo, input_buf, output_buf, output_row,

	1974 + num_rows);

	1975 + break;

	1976 + case JCS_EXT_BGR:

	1977 + extbgr_gray_convert_internal(cinfo, input_buf, output_buf, output_row,

	1978 + num_rows);

	1979 + break;

	1980 + case JCS_EXT_BGRX:

	1981 + case JCS_EXT_BGRA:

	1982 + extbgrx_gray_convert_internal(cinfo, input_buf, output_buf, output_row,

	1983 + num_rows);

	1984 + break;

	1985 + case JCS_EXT_XBGR:

	1986 + case JCS_EXT_ABGR:

	1987 + extxbgr_gray_convert_internal(cinfo, input_buf, output_buf, output_row,

	1988 + num_rows);

	1989 + break;

	1990 + case JCS_EXT_XRGB:

	1991 + case JCS_EXT_ARGB:

	1992 + extxrgb_gray_convert_internal(cinfo, input_buf, output_buf, output_row,

	1993 + num_rows);

	1994 + break;

	1995 + default:

	1996 + rgb_gray_convert_internal(cinfo, input_buf, output_buf, output_row,

	1997 + num_rows);

	1998 + break;

	1999 + }

	2000 +}

	2001

	2002 - while (--num_rows >= 0) {

	2003 - inptr = *input_buf++;

	2004 - outptr = output_buf[0][output_row];

	2005 - output_row++;

	2006 - for (col = 0; col < num_cols; col++) {

	2007 - r = GETJSAMPLE(inptr[rgb_red[cinfo->in_color_space]]);

	2008 - g = GETJSAMPLE(inptr[rgb_green[cinfo->in_color_space]]);

	2009 - b = GETJSAMPLE(inptr[rgb_blue[cinfo->in_color_space]]);

	2010 - inptr += rgb_pixelsize[cinfo->in_color_space];

	2011 - /* Y */

	2012 - outptr[col] = (JSAMPLE)

	2013 - ((ctab[r+R_Y_OFF] + ctab[g+G_Y_OFF] + ctab[b+B_Y_OFF])

	2014 - >> SCALEBITS);

	2015 - }

	2016 +

	2017 +/*

	2018 + * Extended RGB to plain RGB conversion

	2019 + */

	2020 +

	2021 +METHODDEF(void)

	2022 +rgb_rgb_convert (j_compress_ptr cinfo,

	2023 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,

	2024 + JDIMENSION output_row, int num_rows)

	2025 +{

	2026 + switch (cinfo->in_color_space) {

	2027 + case JCS_EXT_RGB:

	2028 + extrgb_rgb_convert_internal(cinfo, input_buf, output_buf, output_row,

	2029 + num_rows);

	2030 + break;

	2031 + case JCS_EXT_RGBX:

	2032 + case JCS_EXT_RGBA:

	2033 + extrgbx_rgb_convert_internal(cinfo, input_buf, output_buf, output_row,

	2034 + num_rows);

	2035 + break;

	2036 + case JCS_EXT_BGR:

	2037 + extbgr_rgb_convert_internal(cinfo, input_buf, output_buf, output_row,

	2038 + num_rows);

	2039 + break;

	2040 + case JCS_EXT_BGRX:

	2041 + case JCS_EXT_BGRA:

	2042 + extbgrx_rgb_convert_internal(cinfo, input_buf, output_buf, output_row,

	2043 + num_rows);

	2044 + break;

	2045 + case JCS_EXT_XBGR:

	2046 + case JCS_EXT_ABGR:

	2047 + extxbgr_rgb_convert_internal(cinfo, input_buf, output_buf, output_row,

	2048 + num_rows);

	2049 + break;

	2050 + case JCS_EXT_XRGB:

	2051 + case JCS_EXT_ARGB:

	2052 + extxrgb_rgb_convert_internal(cinfo, input_buf, output_buf, output_row,

	2053 + num_rows);

	2054 + break;

	2055 + default:

	2056 + rgb_rgb_convert_internal(cinfo, input_buf, output_buf, output_row,

	2057 + num_rows);

	2058 + break;

	2059 }

	2060 }

	2061

	2062 @@ -377,6 +526,10 @@

	2063 case JCS_EXT_BGRX:

	2064 case JCS_EXT_XBGR:

	2065 case JCS_EXT_XRGB:

	2066 + case JCS_EXT_RGBA:

	2067 + case JCS_EXT_BGRA:

	2068 + case JCS_EXT_ABGR:

	2069 + case JCS_EXT_ARGB:

	2070 if (cinfo->input_components != rgb_pixelsize[cinfo->in_color_space])

	2071 ERREXIT(cinfo, JERR_BAD_IN_COLORSPACE);

	2072 break;

	2073 @@ -411,9 +564,17 @@

	2074 cinfo->in_color_space == JCS_EXT_BGR ||

	2075 cinfo->in_color_space == JCS_EXT_BGRX ||

	2076 cinfo->in_color_space == JCS_EXT_XBGR ||

	2077 - cinfo->in_color_space == JCS_EXT_XRGB) {

	2078 - cconvert->pub.start_pass = rgb_ycc_start;

	2079 - cconvert->pub.color_convert = rgb_gray_convert;

	2080 + cinfo->in_color_space == JCS_EXT_XRGB ||

	2081 + cinfo->in_color_space == JCS_EXT_RGBA ||

	2082 + cinfo->in_color_space == JCS_EXT_BGRA ||

	2083 + cinfo->in_color_space == JCS_EXT_ABGR ||

	2084 + cinfo->in_color_space == JCS_EXT_ARGB) {

	2085 + if (jsimd_can_rgb_gray())

	2086 + cconvert->pub.color_convert = jsimd_rgb_gray_convert;

	2087 + else {

	2088 + cconvert->pub.start_pass = rgb_ycc_start;

	2089 + cconvert->pub.color_convert = rgb_gray_convert;

	2090 + }

	2091 } else if (cinfo->in_color_space == JCS_YCbCr)

	2092 cconvert->pub.color_convert = grayscale_convert;

	2093 else

	2094 @@ -421,17 +582,25 @@

	2095 break;

	2096

	2097 case JCS_RGB:

	2098 - case JCS_EXT_RGB:

	2099 - case JCS_EXT_RGBX:

	2100 - case JCS_EXT_BGR:

	2101 - case JCS_EXT_BGRX:

	2102 - case JCS_EXT_XBGR:

	2103 - case JCS_EXT_XRGB:

	2104 if (cinfo->num_components != 3)

	2105 ERREXIT(cinfo, JERR_BAD_J_COLORSPACE);

	2106 - if (cinfo->in_color_space == cinfo->jpeg_color_space &&

	2107 - rgb_pixelsize[cinfo->in_color_space] == 3)

	2108 + if (rgb_red[cinfo->in_color_space] == 0 &&

	2109 + rgb_green[cinfo->in_color_space] == 1 &&

	2110 + rgb_blue[cinfo->in_color_space] == 2 &&

	2111 + rgb_pixelsize[cinfo->in_color_space] == 3)

	2112 cconvert->pub.color_convert = null_convert;

	2113 + else if (cinfo->in_color_space == JCS_RGB ||

	2114 + cinfo->in_color_space == JCS_EXT_RGB ||

	2115 + cinfo->in_color_space == JCS_EXT_RGBX ||

	2116 + cinfo->in_color_space == JCS_EXT_BGR ||

	2117 + cinfo->in_color_space == JCS_EXT_BGRX ||

	2118 + cinfo->in_color_space == JCS_EXT_XBGR ||

	2119 + cinfo->in_color_space == JCS_EXT_XRGB ||

	2120 + cinfo->in_color_space == JCS_EXT_RGBA ||

	2121 + cinfo->in_color_space == JCS_EXT_BGRA ||

	2122 + cinfo->in_color_space == JCS_EXT_ABGR ||

	2123 + cinfo->in_color_space == JCS_EXT_ARGB)

	2124 + cconvert->pub.color_convert = rgb_rgb_convert;

	2125 else

	2126 ERREXIT(cinfo, JERR_CONVERSION_NOTIMPL);

	2127 break;

	2128 @@ -445,7 +614,11 @@

	2129 cinfo->in_color_space == JCS_EXT_BGR ||

	2130 cinfo->in_color_space == JCS_EXT_BGRX ||

	2131 cinfo->in_color_space == JCS_EXT_XBGR ||

	2132 - cinfo->in_color_space == JCS_EXT_XRGB) {

	2133 + cinfo->in_color_space == JCS_EXT_XRGB ||

	2134 + cinfo->in_color_space == JCS_EXT_RGBA ||

	2135 + cinfo->in_color_space == JCS_EXT_BGRA ||

	2136 + cinfo->in_color_space == JCS_EXT_ABGR ||

	2137 + cinfo->in_color_space == JCS_EXT_ARGB) {

	2138 if (jsimd_can_rgb_ycc())

	2139 cconvert->pub.color_convert = jsimd_rgb_ycc_convert;

	2140 else {
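jccolor.c now generates one conversion routine per pixel layout by re-including jccolext.c with RGB_RED, RGB_GREEN, RGB_BLUE and RGB_PIXELSIZE redefined, then dispatches on in_color_space; the alpha layouts (RGBA, BGRA, ABGR, ARGB) share the X variants because the fourth byte is ignored. The sketch below imitates that pattern with a function-generating macro instead of a second source file. Its Y computation uses the fixed-point weights FIX(0.29900), FIX(0.58700) and FIX(0.11400) from the removed inline code, plus a ONE_HALF rounding term that is assumed here to match libjpeg's rgb_ycc table. The layout enum and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SCALEBITS 16
#define ONE_HALF ((long)1 << (SCALEBITS - 1))
#define FIX(x) ((long)((x) * (1L << SCALEBITS) + 0.5))

/* Stamp out one RGB->gray routine per pixel layout, the way jccolor.c
 * re-includes jccolext.c with the RGB_* offsets redefined. */
#define DEFINE_RGB_GRAY(name, RED, GREEN, BLUE, PIXELSIZE)               \
  static void name(const unsigned char *in, unsigned char *out,         \
                   size_t ncols)                                        \
  {                                                                     \
    size_t col;                                                         \
    for (col = 0; col < ncols; col++, in += (PIXELSIZE)) {              \
      long r = in[RED], g = in[GREEN], b = in[BLUE];                    \
      out[col] = (unsigned char)((FIX(0.29900) * r + FIX(0.58700) * g + \
                                  FIX(0.11400) * b + ONE_HALF) >>       \
                                 SCALEBITS);                            \
    }                                                                   \
  }

DEFINE_RGB_GRAY(extrgb_gray, 0, 1, 2, 3)   /* R,G,B */
DEFINE_RGB_GRAY(extbgrx_gray, 2, 1, 0, 4)  /* B,G,R,X and B,G,R,A */
DEFINE_RGB_GRAY(extxrgb_gray, 1, 2, 3, 4)  /* X,R,G,B and A,R,G,B */

/* Hypothetical stand-in for J_COLOR_SPACE. */
typedef enum { LAYOUT_RGB, LAYOUT_BGRX, LAYOUT_XRGB } pixel_layout;

static void rgb_gray_convert(pixel_layout layout, const unsigned char *in,
                             unsigned char *out, size_t ncols)
{
  assert(in != NULL && out != NULL);
  switch (layout) {
  case LAYOUT_BGRX:
    extbgrx_gray(in, out, ncols);
    break;
  case LAYOUT_XRGB:
    extxrgb_gray(in, out, ncols);
    break;
  default:
    extrgb_gray(in, out, ncols);
    break;
  }
}
```

The design moves the layout decision out of the per-pixel loop: each generated routine uses compile-time byte offsets, while the removed code looked up rgb_red[], rgb_green[] and rgb_blue[] for every pixel.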

	2141 Index: jcdctmgr.c

	2142 ===================================================================

	2143 --- jcdctmgr.c (revision 829)

	2144 +++ jcdctmgr.c (working copy)

	2145 @@ -1,10 +1,12 @@

	2146 /*

	2147 * jcdctmgr.c

	2148 *

	2149 + * This file was part of the Independent JPEG Group's software:

	2150 * Copyright (C) 1994-1996, Thomas G. Lane.

	2151 + * libjpeg-turbo Modifications:

	2152 * Copyright (C) 1999-2006, MIYASAKA Masaru.

	2153 * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB

	2154 - * This file is part of the Independent JPEG Group's software.

	2155 + * Copyright (C) 2011 D. R. Commander

	2156 * For conditions of distribution and use, see the accompanying README file.

	2157 *

	2158 * This file contains the forward-DCT management logic.

	2159 @@ -39,6 +41,8 @@

	2160 (JCOEFPTR coef_block, FAST_FLOAT * divisors,

	2161 FAST_FLOAT * workspace));

	2162

	2163 +METHODDEF(void) quantize (JCOEFPTR, DCTELEM *, DCTELEM *);

	2164 +

	2165 typedef struct {

	2166 struct jpeg_forward_dct pub; /* public fields */

	2167

	2168 @@ -73,7 +77,7 @@

	2169 * Find the highest bit in an integer through binary search.

	2170 */

	2171 LOCAL(int)

	2172 -fls (UINT16 val)

	2173 +flss (UINT16 val)

	2174 {

	2175 int bit;

	2176

	2177 @@ -160,7 +164,7 @@

	2178 * of in a consecutive manner, yet again in order to allow SIMD

	2179 * routines.

	2180 */

	2181 -LOCAL(void)

	2182 +LOCAL(int)

	2183 compute_reciprocal (UINT16 divisor, DCTELEM * dtbl)

	2184 {

	2185 UDCTELEM2 fq, fr;

	2186 @@ -167,7 +171,7 @@

	2187 UDCTELEM c;

	2188 int b, r;

	2189

	2190 - b = fls(divisor) - 1;

	2191 + b = flss(divisor) - 1;

	2192 r = sizeof(DCTELEM) * 8 + b;

	2193

	2194 fq = ((UDCTELEM2)1 << r) / divisor;

	2195 @@ -179,7 +183,7 @@

	2196 /* fq will be one bit too large to fit in DCTELEM, so adjust */

	2197 fq >>= 1;

	2198 r--;

	2199 - } else if (fr <= (divisor / 2)) { /* fractional part is < 0.5 */

	2200 + } else if (fr <= (divisor / 2U)) { /* fractional part is < 0.5 */

	2201 c++;

	2202 } else { /* fractional part is > 0.5 */

	2203 fq++;

	2204 @@ -189,6 +193,9 @@

	2205 dtbl[DCTSIZE2 * 1] = (DCTELEM) c; /* correction + roundfactor */

	2206 dtbl[DCTSIZE2 * 2] = (DCTELEM) (1 << (sizeof(DCTELEM)*8*2 - r)); /* scale */

	2207 dtbl[DCTSIZE2 * 3] = (DCTELEM) r - sizeof(DCTELEM)*8; /* shift */

	2208 +

	2209 + if(r <= 16) return 0;

	2210 + else return 1;

	2211 }

	2212

	2213 /*

	2214 @@ -232,7 +239,9 @@

	2215 }

	2216 dtbl = fdct->divisors[qtblno];

	2217 for (i = 0; i < DCTSIZE2; i++) {

	2218 - compute_reciprocal(qtbl->quantval[i] << 3, &dtbl[i]);

	2219 + if(!compute_reciprocal(qtbl->quantval[i] << 3, &dtbl[i])

	2220 + && fdct->quantize == jsimd_quantize)

	2221 + fdct->quantize = quantize;

	2222 }

	2223 break;

	2224 #endif

	2225 @@ -266,10 +275,12 @@

	2226 }

	2227 dtbl = fdct->divisors[qtblno];

	2228 for (i = 0; i < DCTSIZE2; i++) {

	2229 - compute_reciprocal(

	2230 + if(!compute_reciprocal(

	2231 DESCALE(MULTIPLY16V16((INT32) qtbl->quantval[i],

	2232 (INT32) aanscales[i]),

	2233 - CONST_BITS-3), &dtbl[i]);

	2234 + CONST_BITS-3), &dtbl[i])

	2235 + && fdct->quantize == jsimd_quantize)

	2236 + fdct->quantize = quantize;

	2237 }

	2238 }

	2239 break;
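In jcdctmgr.c, compute_reciprocal() now returns 0 when the shift r it settles on is 16 or less. The callers then replace jsimd_quantize with the C quantize(). A plausible reading is that the 16-bit scale entry, 1 << (32 - r), cannot be represented in that case.

Below is a minimal sketch of the reciprocal-multiplication division that these divisor tables encode. It makes these assumptions:

- DCTELEM is 16 bits wide.
- uint32_t and uint64_t stand in for UDCTELEM and UDCTELEM2.
- recip_entry, make_recip(), and quantize_one() are hypothetical names, not libjpeg-turbo API.

The sketch is meant to reproduce the original libjpeg rounding, (|x| + d/2) / d.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-divisor entry; libjpeg-turbo keeps these in DCTELEM tables. */
typedef struct {
  uint32_t recip;  /* reciprocal, at most 16 bits */
  uint32_t corr;   /* rounding term (divisor / 2), plus 1 when rounding down */
  int shift;       /* total right shift r */
} recip_entry;

/* 1-based index of the highest set bit, as flss() computes (0 for val == 0). */
static int highest_bit(uint16_t val)
{
  int bit = 0;
  while (val) { val >>= 1; bit++; }
  return bit;
}

/* Mirrors compute_reciprocal() for 16-bit elements.  Returns 0 when r <= 16,
 * the case in which the patch falls back to the C quantize(). */
static int make_recip(uint16_t divisor, recip_entry *e)
{
  int b, r;
  uint32_t fq, fr, c;

  assert(divisor != 0);
  b = highest_bit(divisor) - 1;                  /* floor(log2(divisor)) */
  r = 16 + b;
  fq = (uint32_t)(((uint64_t)1 << r) / divisor);
  fr = (uint32_t)(((uint64_t)1 << r) % divisor);
  c = divisor / 2;

  if (fr == 0) {                    /* power of two: fq would need 17 bits */
    fq >>= 1;
    r--;
  } else if (fr <= divisor / 2U) {  /* 2^r / divisor rounds down: bump the dividend */
    c++;
  } else {                          /* 2^r / divisor rounds up */
    fq++;
  }

  e->recip = fq;
  e->corr = c;
  e->shift = r;
  return r > 16;
}

/* Rounded division (halves away from zero) using one multiply and one shift.
 * Exact while |x| + divisor / 2 < 65536. */
static int quantize_one(int x, const recip_entry *e)
{
  int neg = x < 0;
  uint32_t mag = (uint32_t)(neg ? -x : x);
  uint64_t product = (uint64_t)(mag + e->corr) * e->recip;

  mag = (uint32_t)(product >> e->shift);
  return neg ? -(int)mag : (int)mag;
}
```

For example, make_recip(3, &e) gives recip = 43691, corr = 1, and shift = 17. quantize_one(5, &e) then computes (6 * 43691) >> 17 = 2, which matches (5 + 1) / 3. Only divisors 1 and 2 return 0 here, which is consistent with the fallback mattering mainly when small divisors reach the tables.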

	2240 Index: jchuff.c

	2241 ===================================================================

	2242 --- jchuff.c (revision 829)

	2243 +++ jchuff.c (working copy)

	2244 @@ -1,8 +1,10 @@

	2245 /*

	2246 * jchuff.c

	2247 *

	2248 + * This file was part of the Independent JPEG Group's software:

	2249 * Copyright (C) 1991-1997, Thomas G. Lane.

	2250 - * This file is part of the Independent JPEG Group's software.

	2251 + * libjpeg-turbo Modifications:

	2252 + * Copyright (C) 2009-2011, D. R. Commander.

	2253 * For conditions of distribution and use, see the accompanying README file.

	2254 *

	2255 * This file contains Huffman entropy encoding routines.

	2256 @@ -14,21 +16,6 @@

	2257 * permanent JPEG objects only upon successful completion of an MCU.

	2258 */

	2259

	2260 -/* Modifications:

	2261 - * Copyright (C)2007 Sun Microsystems, Inc.

	2262 - * Copyright (C)2009 D. R. Commander

	2263 - *

	2264 - * This library is free software and may be redistributed and/or modified under

	2265 - * the terms of the wxWindows Library License, Version 3.1 or (at your option)

	2266 - * any later version. The full license is in the LICENSE.txt file included

	2267 - * with this distribution.

	2268 - *

	2269 - * This library is distributed in the hope that it will be useful,

	2270 - * but WITHOUT ANY WARRANTY; without even the implied warranty of

	2271 - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the

	2272 - * wxWindows Library License for more details.

	2273 - */

	2274 -

	2275 #define JPEG_INTERNALS

	2276 #include "jinclude.h"

	2277 #include "jpeglib.h"

	2278 @@ -35,13 +22,42 @@

	2279 #include "jchuff.h" /* Declarations shared with jcphuff.c */

	2280 #include <limits.h>

	2281

	2282 -static unsigned char jpeg_first_bit_table[65536];

	2283 -int jpeg_first_bit_table_init=0;

	2284 +/*

	2285 + * NOTE: If USE_CLZ_INTRINSIC is defined, then clz/bsr instructions will be

	2286 + * used for bit counting rather than the lookup table. This will reduce the

	2287 + * memory footprint by 64k, which is important for some mobile applications

	2288 + * that create many isolated instances of libjpeg-turbo (web browsers, for

	2289 + * instance.) This may improve performance on some mobile platforms as well.

	2290 + * This feature is enabled by default only on ARM processors, because some x86

	2291 + * chips have a slow implementation of bsr, and the use of clz/bsr cannot be

	2292 + * shown to have a significant performance impact even on the x86 chips that

	2293 + * have a fast implementation of it. When building for ARMv6, you can

	2294 + * explicitly disable the use of clz/bsr by adding -mthumb to the compiler

	2295 + * flags (this defines __thumb__).

	2296 + */

	2297

	2298 +/* NOTE: Both GCC and Clang define __GNUC__ */

	2299 +#if defined __GNUC__ && defined __arm__

	2300 +#if !defined __thumb__ || defined __thumb2__

	2301 +#define USE_CLZ_INTRINSIC

	2302 +#endif

	2303 +#endif

	2304 +

	2305 +#ifdef USE_CLZ_INTRINSIC

	2306 +#define JPEG_NBITS_NONZERO(x) (32 - __builtin_clz(x))

	2307 +#define JPEG_NBITS(x) (x ? JPEG_NBITS_NONZERO(x) : 0)

	2308 +#else

	2309 +static unsigned char jpeg_nbits_table[65536];

	2310 +static int jpeg_nbits_table_init = 0;

	2311 +#define JPEG_NBITS(x) (jpeg_nbits_table[x])

	2312 +#define JPEG_NBITS_NONZERO(x) JPEG_NBITS(x)

	2313 +#endif

	2314 +

	2315 #ifndef min

	2316 #define min(a,b) ((a)<(b)?(a):(b))

	2317 #endif

	2318

	2319 +

	2320 /* Expanded entropy encoder object for Huffman encoding.

	2321 *

	2322 * The savable_state subrecord contains fields that change within an MCU,

	2323 @@ -49,7 +65,7 @@

	2324 */

	2325

	2326 typedef struct {

	2327 - long put_buffer; /* current bit-accumulation buffer */

	2328 + size_t put_buffer; /* current bit-accumulation buffer */

	2329 int put_bits; /* # of bits now in it */

	2330 int last_dc_val[MAX_COMPS_IN_SCAN]; /* last DC coef for each component */

	2331 } savable_state;
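savable_state carries the bit accumulator (put_buffer, put_bits) between MCUs. The patch widens put_buffer from long to size_t. Presumably this lets 64-bit Windows, where long is only 32 bits, also use the wide buffer chosen by the __WORDSIZE==64 / _WIN64 test later in this file.

Below is a minimal sketch of how such an accumulator turns variable-length codes into bytes. It includes the 0x00 stuffing after every 0xFF that EMIT_BYTE() performs and the 1-bit padding that flush_bits() applies. bitwriter, bw_put(), and bw_flush() are hypothetical names. The patch deliberately waits until 15, 31, or 47 bits are pending before draining; this version drains whole bytes after every call for clarity.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit writer with the same fields as savable_state plus a cursor. */
typedef struct {
  uint64_t put_buffer;   /* pending bits, right-justified */
  int put_bits;          /* number of pending bits */
  unsigned char *out;    /* next output byte */
} bitwriter;

/* Like EMIT_BYTE(): write the top 8 pending bits.  A 0xFF byte in
 * entropy-coded data must be followed by a stuffed 0x00. */
static void bw_emit_byte(bitwriter *bw)
{
  unsigned char c;

  bw->put_bits -= 8;
  c = (unsigned char)(bw->put_buffer >> bw->put_bits);
  *bw->out++ = c;
  if (c == 0xFF)
    *bw->out++ = 0;
}

/* Like PUT_BITS() followed by a drain: append the low `size` bits of code. */
static void bw_put(bitwriter *bw, uint32_t code, int size)
{
  assert(size >= 0 && size <= 16);
  code &= ((uint32_t)1 << size) - 1;  /* defensive; the patch masks value bits before PUT_BITS() */
  bw->put_buffer = (bw->put_buffer << size) | code;
  bw->put_bits += size;
  while (bw->put_bits >= 8)
    bw_emit_byte(bw);
}

/* Like flush_bits(): fill any partial byte with 1-bits, then reset. */
static void bw_flush(bitwriter *bw)
{
  bw_put(bw, 0x7F, 7);
  bw->put_buffer = 0;
  bw->put_bits = 0;
}
```

Writing the codes 11, 111111, and 001 and then flushing yields FF 00 3F. The first byte is all ones, so a zero byte is stuffed after it. The last byte is 001 padded with five 1-bits.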

	2332 @@ -181,7 +197,6 @@

	2333 }

	2334

	2335 /* Initialize bit buffer to empty */

	2336 -

	2337 entropy->saved.put_buffer = 0;

	2338 entropy->saved.put_bits = 0;

	2339

	2340 @@ -285,14 +300,16 @@

	2341 dtbl->ehufsi[i] = huffsize[p];

	2342 }

	2343

	2344 - if(!jpeg_first_bit_table_init) {

	2345 +#ifndef USE_CLZ_INTRINSIC

	2346 + if(!jpeg_nbits_table_init) {

	2347 for(i = 0; i < 65536; i++) {

	2348 - int bit = 0, val = i;

	2349 - while (val) {val >>= 1; bit++;}

	2350 - jpeg_first_bit_table[i] = bit;

	2351 + int nbits = 0, temp = i;

	2352 + while (temp) {temp >>= 1; nbits++;}

	2353 + jpeg_nbits_table[i] = nbits;

	2354 }

	2355 - jpeg_first_bit_table_init = 1;

	2356 + jpeg_nbits_table_init = 1;

	2357 }

	2358 +#endif

	2359 }

	2360

	2361
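JPEG_NBITS() returns the number of bits needed to represent a magnitude. It uses either a lookup into the 64 KiB jpeg_nbits_table built above, or 32 - __builtin_clz(x) when USE_CLZ_INTRINSIC is defined.

The small sketch below builds the table the same way and checks it against the intrinsic for every 16-bit input. __builtin_clz() is a GCC/Clang builtin whose result is undefined for 0, so the zero case is handled separately, as JPEG_NBITS() does. The helper names are hypothetical, and a plain loop stands in for compilers without the builtin.

```c
#include <assert.h>
#include <limits.h>

static unsigned char nbits_table[65536];   /* 64 KiB, like jpeg_nbits_table */

/* Fill the table the same way the patch does: count shifts until zero. */
static void init_nbits_table(void)
{
  int i;
  for (i = 0; i < 65536; i++) {
    int nbits = 0, temp = i;
    while (temp) { temp >>= 1; nbits++; }
    nbits_table[i] = (unsigned char)nbits;
  }
}

/* Count-leading-zeros form.  The patch writes 32 - __builtin_clz(x), which
 * assumes a 32-bit unsigned int; here the width is computed instead. */
static int nbits_clz(unsigned int x)
{
#if defined(__GNUC__)
  return x ? (int)(sizeof(unsigned int) * CHAR_BIT) - __builtin_clz(x) : 0;
#else
  int n = 0;                 /* fallback for compilers without the builtin */
  while (x) { x >>= 1; n++; }
  return n;
#endif
}
```

In the AC loop, the coefficient is already known to be nonzero, which is why the patch can call JPEG_NBITS_NONZERO() there and skip the zero test.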

	2362 @@ -312,8 +329,6 @@

	2363 {

	2364 struct jpeg_destination_mgr * dest = state->cinfo->dest;

	2365

	2366 - dest->free_in_buffer = state->free_in_buffer;

	2367 -

	2368 if (! (*dest->empty_output_buffer) (state->cinfo))

	2369 return FALSE;

	2370 /* After a successful buffer dump, must reset buffer pointers */

	2371 @@ -325,178 +340,133 @@

	2372

	2373 /* Outputting bits to the file */

	2374

	2375 -/* Only the right 24 bits of put_buffer are used; the valid bits are

	2376 - * left-justified in this part. At most 16 bits can be passed to emit_bits

	2377 - * in one call, and we never retain more than 7 bits in put_buffer

	2378 - * between calls, so 24 bits are sufficient.

	2379 +/* These macros perform the same task as the emit_bits() function in the

	2380 + * original libjpeg code. In addition to reducing overhead by explicitly

	2381 + * inlining the code, additional performance is achieved by taking into

	2382 + * account the size of the bit buffer and waiting until it is almost full

	2383 + * before emptying it. This mostly benefits 64-bit platforms, since 6

	2384 + * bytes can be stored in a 64-bit bit buffer before it has to be emptied.

	2385 */

	2386

	2387 -/***************************************************************/

	2388 -

	2389 -#define EMIT_BYTE() { \

	2390 - if (0xFF == (*buffer++ = (unsigned char)(put_buffer >> (put_bits -= 8)))) \

	2391 - *buffer++ = 0; \

	2392 +#define EMIT_BYTE() { \

	2393 + JOCTET c; \

	2394 + put_bits -= 8; \

	2395 + c = (JOCTET)GETJOCTET(put_buffer >> put_bits); \

	2396 + *buffer++ = c; \

	2397 + if (c == 0xFF) /* need to stuff a zero byte? */ \

	2398 + *buffer++ = 0; \

	2399 }

	2400

	2401 -/***************************************************************/

	2402 +#define PUT_BITS(code, size) { \

	2403 + put_bits += size; \

	2404 + put_buffer = (put_buffer << size) | code; \

	2405 +}

	2406

	2407 -#define DUMP_BITS_(code, size) { \

	2408 - put_bits += size; \

	2409 - put_buffer = (put_buffer << size) | code; \

	2410 - if (put_bits > 7) \

	2411 - while(put_bits > 7) \

	2412 - EMIT_BYTE() \

	2413 - }

	2414 -

	2415 -/***************************************************************/

	2416 -

	2417 -#define CHECKBUF15() { \

	2418 - if (put_bits > 15) { \

	2419 - EMIT_BYTE() \

	2420 - EMIT_BYTE() \

	2421 - } \

	2422 +#define CHECKBUF15() { \

	2423 + if (put_bits > 15) { \

	2424 + EMIT_BYTE() \

	2425 + EMIT_BYTE() \

	2426 + } \

	2427 }

	2428

	2429 -#define CHECKBUF47() { \

	2430 - if (put_bits > 47) { \

	2431 - EMIT_BYTE() \

	2432 - EMIT_BYTE() \

	2433 - EMIT_BYTE() \

	2434 - EMIT_BYTE() \

	2435 - EMIT_BYTE() \

	2436 - EMIT_BYTE() \

	2437 - } \

	2438 +#define CHECKBUF31() { \

	2439 + if (put_bits > 31) { \

	2440 + EMIT_BYTE() \

	2441 + EMIT_BYTE() \

	2442 + EMIT_BYTE() \

	2443 + EMIT_BYTE() \

	2444 + } \

	2445 }

	2446

	2447 -#define CHECKBUF31() { \

	2448 - if (put_bits > 31) { \

	2449 - EMIT_BYTE() \

	2450 - EMIT_BYTE() \

	2451 - EMIT_BYTE() \

	2452 - EMIT_BYTE() \

	2453 - } \

	2454 +#define CHECKBUF47() { \

	2455 + if (put_bits > 47) { \

	2456 + EMIT_BYTE() \

	2457 + EMIT_BYTE() \

	2458 + EMIT_BYTE() \

	2459 + EMIT_BYTE() \

	2460 + EMIT_BYTE() \

	2461 + EMIT_BYTE() \

	2462 + } \

	2463 }

	2464

	2465 -/***************************************************************/

	2466 +#if __WORDSIZE==64 || defined(_WIN64)

	2467

	2468 -#define DUMP_BITS_NOCHECK(code, size) { \

	2469 - put_bits += size; \

	2470 - put_buffer = (put_buffer << size) | code; \

	2471 - }

	2472 +#define EMIT_BITS(code, size) { \

	2473 + CHECKBUF47() \

	2474 + PUT_BITS(code, size) \

	2475 +}

	2476

	2477 -#if __WORDSIZE==64

	2478 -

	2479 -#define DUMP_BITS(code, size) { \

	2480 - CHECKBUF47() \

	2481 - put_bits += size; \

	2482 - put_buffer = (put_buffer << size) | code; \

	2483 +#define EMIT_CODE(code, size) { \

	2484 + temp2 &= (((INT32) 1)<<nbits) - 1; \

	2485 + CHECKBUF31() \

	2486 + PUT_BITS(code, size) \

	2487 + PUT_BITS(temp2, nbits) \

	2488 }

	2489

	2490 #else

	2491

	2492 -#define DUMP_BITS(code, size) { \

	2493 - put_bits += size; \

	2494 - put_buffer = (put_buffer << size) | code; \

	2495 - CHECKBUF15() \

	2496 - }

	2497 +#define EMIT_BITS(code, size) { \

	2498 + PUT_BITS(code, size) \

	2499 + CHECKBUF15() \

	2500 +}

	2501

	2502 -#endif

	2503 -

	2504 -/***************************************************************/

	2505 -

	2506 -#define DUMP_SINGLE_VALUE(ht, codevalue) { \

	2507 - size = ht->ehufsi[codevalue]; \

	2508 - code = ht->ehufco[codevalue]; \

	2509 - \

	2510 - DUMP_BITS(code, size) \

	2511 +#define EMIT_CODE(code, size) { \

	2512 + temp2 &= (((INT32) 1)<<nbits) - 1; \

	2513 + PUT_BITS(code, size) \

	2514 + CHECKBUF15() \

	2515 + PUT_BITS(temp2, nbits) \

	2516 + CHECKBUF15() \

	2517 }

	2518

	2519 -/***************************************************************/

	2520 -

	2521 -#define DUMP_VALUE_SLOW(ht, codevalue, t, nbits) { \

	2522 - size = ht->ehufsi[codevalue]; \

	2523 - code = ht->ehufco[codevalue]; \

	2524 - t &= ~(-1 << nbits); \

	2525 - DUMP_BITS_NOCHECK(code, size) \

	2526 - CHECKBUF15() \

	2527 - DUMP_BITS_NOCHECK(t, nbits) \

	2528 - CHECKBUF15() \

	2529 - }

	2530 -

	2531 -int _max=0;

	2532 -

	2533 -#if __WORDSIZE==64

	2534 -

	2535 -#define DUMP_VALUE(ht, codevalue, t, nbits) { \

	2536 - size = ht->ehufsi[codevalue]; \

	2537 - code = ht->ehufco[codevalue]; \

	2538 - t &= ~(-1 << nbits); \

	2539 - CHECKBUF31() \

	2540 - DUMP_BITS_NOCHECK(code, size) \

	2541 - DUMP_BITS_NOCHECK(t, nbits) \

	2542 - }

	2543 -

	2544 -#else

	2545 -

	2546 -#define DUMP_VALUE(ht, codevalue, t, nbits) { \

	2547 - size = ht->ehufsi[codevalue]; \

	2548 - code = ht->ehufco[codevalue]; \

	2549 - t &= ~(-1 << nbits); \

	2550 - DUMP_BITS_NOCHECK(code, size) \

	2551 - CHECKBUF15() \

	2552 - DUMP_BITS_NOCHECK(t, nbits) \

	2553 - CHECKBUF15() \

	2554 - }

	2555 -

	2556 #endif

	2557

	2558 -/***************************************************************/

	2559

	2560 #define BUFSIZE (DCTSIZE2 * 2)

	2561

	2562 -#define LOAD_BUFFER() { \

	2563 - if (state->free_in_buffer < BUFSIZE) { \

	2564 - localbuf = 1; \

	2565 - buffer = _buffer; \

	2566 - } \

	2567 - else buffer = state->next_output_byte; \

	2568 +#define LOAD_BUFFER() { \

	2569 + if (state->free_in_buffer < BUFSIZE) { \

	2570 + localbuf = 1; \

	2571 + buffer = _buffer; \

	2572 + } \

	2573 + else buffer = state->next_output_byte; \

	2574 }

	2575

	2576 -#define STORE_BUFFER() { \

	2577 - if (localbuf) { \

	2578 - bytes = buffer - _buffer; \

	2579 - buffer = _buffer; \

	2580 - while (bytes > 0) { \

	2581 - bytestocopy = min(bytes, state->free_in_buffer); \

	2582 - MEMCOPY(state->next_output_byte, buffer, bytestocopy); \

	2583 - state->next_output_byte += bytestocopy; \

	2584 - buffer += bytestocopy; \

	2585 - state->free_in_buffer -= bytestocopy; \

	2586 - if (state->free_in_buffer == 0) \

	2587 - if (! dump_buffer(state)) return FALSE; \

	2588 - bytes -= bytestocopy; \

	2589 - } \

	2590 - } \

	2591 - else { \

	2592 - state->free_in_buffer -= (buffer - state->next_output_byte); \

	2593 - state->next_output_byte = buffer; \

	2594 - } \

	2595 +#define STORE_BUFFER() { \

	2596 + if (localbuf) { \

	2597 + bytes = buffer - _buffer; \

	2598 + buffer = _buffer; \

	2599 + while (bytes > 0) { \

	2600 + bytestocopy = min(bytes, state->free_in_buffer); \

	2601 + MEMCOPY(state->next_output_byte, buffer, bytestocopy); \

	2602 + state->next_output_byte += bytestocopy; \

	2603 + buffer += bytestocopy; \

	2604 + state->free_in_buffer -= bytestocopy; \

	2605 + if (state->free_in_buffer == 0) \

	2606 + if (! dump_buffer(state)) return FALSE; \

	2607 + bytes -= bytestocopy; \

	2608 + } \

	2609 + } \

	2610 + else { \

	2611 + state->free_in_buffer -= (buffer - state->next_output_byte); \

	2612 + state->next_output_byte = buffer; \

	2613 + } \

	2614 }

	2615

	2616 -/***************************************************************/

	2617

	2618 LOCAL(boolean)

	2619 flush_bits (working_state * state)

	2620 {

	2621 - unsigned char _buffer[BUFSIZE], *buffer;

	2622 - long put_buffer; int put_bits;

	2623 - int bytes, bytestocopy, localbuf = 0;

	2624 + JOCTET _buffer[BUFSIZE], *buffer;

	2625 + size_t put_buffer; int put_bits;

	2626 + size_t bytes, bytestocopy; int localbuf = 0;

	2627

	2628 put_buffer = state->cur.put_buffer;

	2629 put_bits = state->cur.put_bits;

	2630 LOAD_BUFFER()

	2631

	2632 - DUMP_BITS_(0x7F, 7)

	2633 + /* fill any partial byte with ones */

	2634 + PUT_BITS(0x7F, 7)

	2635 + while (put_bits >= 8) EMIT_BYTE()

	2636

	2637 state->cur.put_buffer = 0; /* and reset bit-buffer to empty */

	2638 state->cur.put_bits = 0;

	2639 @@ -505,6 +475,7 @@

	2640 return TRUE;

	2641 }

	2642

	2643 +

	2644 /* Encode a single block's worth of coefficients */

	2645

	2646 LOCAL(boolean)

	2647 @@ -511,13 +482,13 @@

	2648 encode_one_block (working_state * state, JCOEFPTR block, int last_dc_val,

	2649 c_derived_tbl dctbl, c_derived_tbl actbl)

	2650 {

	2651 - int temp, temp2;

	2652 + int temp, temp2, temp3;

	2653 int nbits;

	2654 - int r, sflag, size, code;

	2655 - unsigned char _buffer[BUFSIZE], *buffer;

	2656 - long put_buffer; int put_bits;

	2657 + int r, code, size;

	2658 + JOCTET _buffer[BUFSIZE], *buffer;

	2659 + size_t put_buffer; int put_bits;

	2660 int code_0xf0 = actbl->ehufco[0xf0], size_0xf0 = actbl->ehufsi[0xf0];

	2661 - int bytes, bytestocopy, localbuf = 0;

	2662 + size_t bytes, bytestocopy; int localbuf = 0;

	2663

	2664 put_buffer = state->cur.put_buffer;

	2665 put_bits = state->cur.put_bits;

	2666 @@ -527,50 +498,88 @@

	2667

	2668 temp = temp2 = block[0] - last_dc_val;

	2669

	2670 - sflag = temp >> 31;

	2671 - temp -= ((temp + temp) & sflag);

	2672 - temp2 += sflag;

	2673 - nbits = jpeg_first_bit_table[temp];

	2674 - DUMP_VALUE_SLOW(dctbl, nbits, temp2, nbits)

	2675 + /* This is a well-known technique for obtaining the absolute value without a

	2676 + * branch. It is derived from an assembly language technique presented in

	2677 + * "How to Optimize for the Pentium Processors", Copyright (c) 1996, 1997 by

	2678 + * Agner Fog.

	2679 + */

	2680 + temp3 = temp >> (CHAR_BIT * sizeof(int) - 1);

	2681 + temp ^= temp3;

	2682 + temp -= temp3;

	2683

	2684 + /* For a negative input, want temp2 = bitwise complement of abs(input) */

	2685 + /* This code assumes we are on a two's complement machine */

	2686 + temp2 += temp3;

	2687 +

	2688 + /* Find the number of bits needed for the magnitude of the coefficient */

	2689 + nbits = JPEG_NBITS(temp);

	2690 +

	2691 + /* Emit the Huffman-coded symbol for the number of bits */

	2692 + code = dctbl->ehufco[nbits];

	2693 + size = dctbl->ehufsi[nbits];

	2694 + PUT_BITS(code, size)

	2695 + CHECKBUF15()

	2696 +

	2697 + /* Mask off any extra bits in code */

	2698 + temp2 &= (((INT32) 1)<<nbits) - 1;

	2699 +

	2700 + /* Emit that number of bits of the value, if positive, */

	2701 + /* or the complement of its magnitude, if negative. */

	2702 + PUT_BITS(temp2, nbits)

	2703 + CHECKBUF15()

	2704 +

	2705 /* Encode the AC coefficients per section F.1.2.2 */

	2706

	2707 r = 0; /* r = run length of zeros */

	2708

	2709 -#define innerloop(order) { \

	2710 - temp2 = *(JCOEF*)((unsigned char*)block + order); \

	2711 - if(temp2 == 0) r++; \

	2712 - else { \

	2713 - temp = (JCOEF)temp2; \

	2714 - sflag = temp >> 31; \

	2715 - temp = (temp ^ sflag) - sflag; \

	2716 - temp2 += sflag; \

	2717 - nbits = jpeg_first_bit_table[temp]; \

	2718 - for(; r > 15; r -= 16) DUMP_BITS(code_0xf0, size_0xf0) \

	2719 - sflag = (r << 4) + nbits; \

	2720 - DUMP_VALUE(actbl, sflag, temp2, nbits) \

	2721 +/* Manually unroll the k loop to eliminate the counter variable. This

	2722 + * improves performance greatly on systems with a limited number of

	2723 + * registers (such as x86.)

	2724 + */

	2725 +#define kloop(jpeg_natural_order_of_k) { \

	2726 + if ((temp = block[jpeg_natural_order_of_k]) == 0) { \

	2727 + r++; \

	2728 + } else { \

	2729 + temp2 = temp; \

	2730 + /* Branch-less absolute value, bitwise complement, etc., same as above */ \

	2731 + temp3 = temp >> (CHAR_BIT * sizeof(int) - 1); \

	2732 + temp ^= temp3; \

	2733 + temp -= temp3; \

	2734 + temp2 += temp3; \

	2735 + nbits = JPEG_NBITS_NONZERO(temp); \

	2736 + /* if run length > 15, must emit special run-length-16 codes (0xF0) */ \

	2737 + while (r > 15) { \

	2738 + EMIT_BITS(code_0xf0, size_0xf0) \

	2739 + r -= 16; \

	2740 + } \

	2741 + /* Emit Huffman symbol for run length / number of bits */ \

	2742 + temp3 = (r << 4) + nbits; \

	2743 + code = actbl->ehufco[temp3]; \

	2744 + size = actbl->ehufsi[temp3]; \

	2745 + EMIT_CODE(code, size) \

	2746 r = 0; \

	2747 - }}

	2748 + } \

	2749 +}

	2750

	2751 - innerloop(2*1); innerloop(2*8); innerloop(2*16); innerloop(2*9);

	2752 - innerloop(2*2); innerloop(2*3); innerloop(2*10); innerloop(2*17);

	2753 - innerloop(2*24); innerloop(2*32); innerloop(2*25); innerloop(2*18);

	2754 - innerloop(2*11); innerloop(2*4); innerloop(2*5); innerloop(2*12);

	2755 - innerloop(2*19); innerloop(2*26); innerloop(2*33); innerloop(2*40);

	2756 - innerloop(2*48); innerloop(2*41); innerloop(2*34); innerloop(2*27);

	2757 - innerloop(2*20); innerloop(2*13); innerloop(2*6); innerloop(2*7);

	2758 - innerloop(2*14); innerloop(2*21); innerloop(2*28); innerloop(2*35);

	2759 - innerloop(2*42); innerloop(2*49); innerloop(2*56); innerloop(2*57);

	2760 - innerloop(2*50); innerloop(2*43); innerloop(2*36); innerloop(2*29);

	2761 - innerloop(2*22); innerloop(2*15); innerloop(2*23); innerloop(2*30);

	2762 - innerloop(2*37); innerloop(2*44); innerloop(2*51); innerloop(2*58);

	2763 - innerloop(2*59); innerloop(2*52); innerloop(2*45); innerloop(2*38);

	2764 - innerloop(2*31); innerloop(2*39); innerloop(2*46); innerloop(2*53);

	2765 - innerloop(2*60); innerloop(2*61); innerloop(2*54); innerloop(2*47);

	2766 - innerloop(2*55); innerloop(2*62); innerloop(2*63);

	2767 + /* One iteration for each value in jpeg_natural_order[] */

	2768 + kloop(1); kloop(8); kloop(16); kloop(9); kloop(2); kloop(3);

	2769 + kloop(10); kloop(17); kloop(24); kloop(32); kloop(25); kloop(18);

	2770 + kloop(11); kloop(4); kloop(5); kloop(12); kloop(19); kloop(26);

	2771 + kloop(33); kloop(40); kloop(48); kloop(41); kloop(34); kloop(27);

	2772 + kloop(20); kloop(13); kloop(6); kloop(7); kloop(14); kloop(21);

	2773 + kloop(28); kloop(35); kloop(42); kloop(49); kloop(56); kloop(57);

	2774 + kloop(50); kloop(43); kloop(36); kloop(29); kloop(22); kloop(15);

	2775 + kloop(23); kloop(30); kloop(37); kloop(44); kloop(51); kloop(58);

	2776 + kloop(59); kloop(52); kloop(45); kloop(38); kloop(31); kloop(39);

	2777 + kloop(46); kloop(53); kloop(60); kloop(61); kloop(54); kloop(47);

	2778 + kloop(55); kloop(62); kloop(63);

	2779

	2780 /* If the last coef(s) were zero, emit an end-of-block code */

	2781 - if (r > 0) DUMP_SINGLE_VALUE(actbl, 0x0)

	2782 + if (r > 0) {

	2783 + code = actbl->ehufco[0];

	2784 + size = actbl->ehufsi[0];

	2785 + EMIT_BITS(code, size)

	2786 + }

	2787

	2788 state->cur.put_buffer = put_buffer;

	2789 state->cur.put_bits = put_bits;
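encode_one_block() turns each coefficient into a Huffman symbol plus extra bits, in these steps:

- The sign mask temp3 (0 or -1) yields the magnitude without a branch.
- temp2 + temp3 gives the one's complement of the magnitude for negative values.
- The AC symbol is (run << 4) + nbits.
- 0xF0 is emitted for every run of 16 zeros, and 0x00 (end of block) when the block ends in zeros.

The sketch below collects those symbols instead of Huffman-coding them. It makes these assumptions:

- ac_symbol, magnitude(), and encode_ac() are hypothetical names.
- The block is plain ints in natural order.
- Like the patch, it assumes two's complement ints with an arithmetic right shift of negative values, which is implementation-defined in C.

```c
#include <assert.h>
#include <limits.h>

/* Zigzag index k -> natural (row-major) index; entries 1..63 follow the kloop() calls. */
static const int natural_order[64] = {
   0,  1,  8, 16,  9,  2,  3, 10, 17, 24, 32, 25, 18, 11,  4,  5,
  12, 19, 26, 33, 40, 48, 41, 34, 27, 20, 13,  6,  7, 14, 21, 28,
  35, 42, 49, 56, 57, 50, 43, 36, 29, 22, 15, 23, 30, 37, 44, 51,
  58, 59, 52, 45, 38, 31, 39, 46, 53, 60, 61, 54, 47, 55, 62, 63
};

/* Hypothetical record for one symbol and the extra bits that follow it. */
typedef struct {
  int symbol;   /* (run << 4) + nbits; 0xF0 = ZRL, 0x00 = EOB */
  int bits;     /* value if positive, complement of |value| if negative */
  int nbits;
} ac_symbol;

/* Branch-free magnitude category, as in encode_one_block(); assumes |value| < 32768. */
static void magnitude(int value, int *bits, int *nbits)
{
  int temp = value, temp2 = value, n = 0, t;
  int temp3 = temp >> (CHAR_BIT * sizeof(int) - 1);  /* 0, or -1 if negative */

  temp ^= temp3;            /* ~value when negative ... */
  temp -= temp3;            /* ... plus 1, i.e. |value| */
  temp2 += temp3;           /* value - 1 == ~|value| when negative */
  for (t = temp; t; t >>= 1)
    n++;                    /* bit length, like JPEG_NBITS() */
  *nbits = n;
  *bits = temp2 & ((1 << n) - 1);
}

static void put_symbol(ac_symbol *s, int symbol, int bits, int nbits)
{
  s->symbol = symbol; s->bits = bits; s->nbits = nbits;
}

/* Collect the AC symbols for coefficients 1..63 of a natural-order block. */
static int encode_ac(const int block[64], ac_symbol out[64])
{
  int k, r = 0, count = 0;

  for (k = 1; k < 64; k++) {
    int value = block[natural_order[k]], bits, nbits;
    if (value == 0) {
      r++;
      continue;
    }
    while (r > 15) {        /* each run of 16 zeros becomes ZRL (0xF0) */
      put_symbol(&out[count++], 0xF0, 0, 0);
      r -= 16;
    }
    magnitude(value, &bits, &nbits);
    put_symbol(&out[count++], (r << 4) + nbits, bits, nbits);
    r = 0;
  }
  if (r > 0)                /* trailing zeros: end of block */
    put_symbol(&out[count++], 0x00, 0, 0);
  return count;
}
```

For example, magnitude(-5, ...) yields nbits = 3 and bits = 2 (binary 010, the complement of 101). The DC path applies the same step to block[0] - last_dc_val and uses nbits alone as the symbol.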

	2790 Index: jcinit.c

	2791 ===================================================================

	2792 --- jcinit.c (revision 829)

	2793 +++ jcinit.c (working copy)

	2794 @@ -42,7 +42,11 @@

	2795 jinit_forward_dct(cinfo);

	2796 /* Entropy encoding: either Huffman or arithmetic coding. */

	2797 if (cinfo->arith_code) {

	2798 +#ifdef C_ARITH_CODING_SUPPORTED

	2799 + jinit_arith_encoder(cinfo);

	2800 +#else

	2801 ERREXIT(cinfo, JERR_ARITH_NOTIMPL);

	2802 +#endif

	2803 } else {

	2804 if (cinfo->progressive_mode) {

	2805 #ifdef C_PROGRESSIVE_SUPPORTED

	2806 Index: jcmainct.c

	2807 ===================================================================

	2808 --- jcmainct.c (revision 829)

	2809 +++ jcmainct.c (working copy)

	2810 @@ -68,32 +68,32 @@

	2811 METHODDEF(void)

	2812 start_pass_main (j_compress_ptr cinfo, J_BUF_MODE pass_mode)

	2813 {

	2814 - my_main_ptr main = (my_main_ptr) cinfo->main;

	2815 + my_main_ptr main_ptr = (my_main_ptr) cinfo->main;

	2816

	2817 /* Do nothing in raw-data mode. */

	2818 if (cinfo->raw_data_in)

	2819 return;

	2820

	2821 - main->cur_iMCU_row = 0; /* initialize counters */

	2822 - main->rowgroup_ctr = 0;

	2823 - main->suspended = FALSE;

	2824 - main->pass_mode = pass_mode; /* save mode for use by process_data */

	2825 + main_ptr->cur_iMCU_row = 0; /* initialize counters */

	2826 + main_ptr->rowgroup_ctr = 0;

	2827 + main_ptr->suspended = FALSE;

	2828 + main_ptr->pass_mode = pass_mode; /* save mode for use by process_data */

	2829

	2830 switch (pass_mode) {

	2831 case JBUF_PASS_THRU:

	2832 #ifdef FULL_MAIN_BUFFER_SUPPORTED

	2833 - if (main->whole_image[0] != NULL)

	2834 + if (main_ptr->whole_image[0] != NULL)

	2835 ERREXIT(cinfo, JERR_BAD_BUFFER_MODE);

	2836 #endif

	2837 - main->pub.process_data = process_data_simple_main;

	2838 + main_ptr->pub.process_data = process_data_simple_main;

	2839 break;

	2840 #ifdef FULL_MAIN_BUFFER_SUPPORTED

	2841 case JBUF_SAVE_SOURCE:

	2842 case JBUF_CRANK_DEST:

	2843 case JBUF_SAVE_AND_PASS:

	2844 - if (main->whole_image[0] == NULL)

	2845 + if (main_ptr->whole_image[0] == NULL)

	2846 ERREXIT(cinfo, JERR_BAD_BUFFER_MODE);

	2847 - main->pub.process_data = process_data_buffer_main;

	2848 + main_ptr->pub.process_data = process_data_buffer_main;

	2849 break;

	2850 #endif

	2851 default:

	2852 @@ -114,14 +114,14 @@

	2853 JSAMPARRAY input_buf, JDIMENSION *in_row_ctr,

	2854 JDIMENSION in_rows_avail)

	2855 {

	2856 - my_main_ptr main = (my_main_ptr) cinfo->main;

	2857 + my_main_ptr main_ptr = (my_main_ptr) cinfo->main;

	2858

	2859 - while (main->cur_iMCU_row < cinfo->total_iMCU_rows) {

	2860 + while (main_ptr->cur_iMCU_row < cinfo->total_iMCU_rows) {

	2861 /* Read input data if we haven't filled the main buffer yet */

	2862 - if (main->rowgroup_ctr < DCTSIZE)

	2863 + if (main_ptr->rowgroup_ctr < DCTSIZE)

	2864 (*cinfo->prep->pre_process_data) (cinfo,

	2865 input_buf, in_row_ctr, in_rows_avail,

	2866 - main->buffer, &main->rowgroup_ctr,

	2867 + main_ptr->buffer, &main_ptr->rowgroup_ctr,

	2868 (JDIMENSION) DCTSIZE);

	2869

	2870 /* If we don't have a full iMCU row buffered, return to application for

	2871 @@ -128,11 +128,11 @@

	2872 * more data. Note that preprocessor will always pad to fill the iMCU row

	2873 * at the bottom of the image.

	2874 */

	2875 - if (main->rowgroup_ctr != DCTSIZE)

	2876 + if (main_ptr->rowgroup_ctr != DCTSIZE)

	2877 return;

	2878

	2879 /* Send the completed row to the compressor */

	2880 - if (! (*cinfo->coef->compress_data) (cinfo, main->buffer)) {

	2881 + if (! (*cinfo->coef->compress_data) (cinfo, main_ptr->buffer)) {

	2882 /* If compressor did not consume the whole row, then we must need to

	2883 * suspend processing and return to the application. In this situation

	2884 * we pretend we didn't yet consume the last input row; otherwise, if

	2885 @@ -139,9 +139,9 @@

	2886 * it happened to be the last row of the image, the application would

	2887 * think we were done.

	2888 */

	2889 - if (! main->suspended) {

	2890 + if (! main_ptr->suspended) {

	2891 (*in_row_ctr)--;

	2892 - main->suspended = TRUE;

	2893 + main_ptr->suspended = TRUE;

	2894 }

	2895 return;

	2896 }

	2897 @@ -148,12 +148,12 @@

	2898 /* We did finish the row. Undo our little suspension hack if a previous

	2899 * call suspended; then mark the main buffer empty.

	2900 */

	2901 - if (main->suspended) {

	2902 + if (main_ptr->suspended) {

	2903 (*in_row_ctr)++;

	2904 - main->suspended = FALSE;

	2905 + main_ptr->suspended = FALSE;

	2906 }

	2907 - main->rowgroup_ctr = 0;

	2908 - main->cur_iMCU_row++;

	2909 + main_ptr->rowgroup_ctr = 0;

	2910 + main_ptr->cur_iMCU_row++;

	2911 }

	2912 }

	2913
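compress_data() may be unable to take the whole iMCU row, for example because a suspending destination manager is out of space. In that case process_data_simple_main() decrements *in_row_ctr once, so the application does not conclude that the last row was consumed, and it restores the counter when a later call succeeds.

Below is a minimal simulation of that bookkeeping. The main_state struct is hypothetical, and a compress_ok flag stands in for the compressor's return value.

```c
#include <assert.h>

/* Hypothetical stand-in for the fields of my_main_controller used here. */
typedef struct {
  int suspended;
  unsigned int rowgroup_ctr;
  unsigned int cur_iMCU_row;
} main_state;

/* Handles one fully buffered iMCU row; compress_ok simulates compress_data(). */
static void finish_row(main_state *m, unsigned int *in_row_ctr, int compress_ok)
{
  if (!compress_ok) {
    /* Suspend: hide the last input row from the caller, but only once. */
    if (!m->suspended) {
      (*in_row_ctr)--;
      m->suspended = 1;
    }
    return;
  }
  /* Success: undo the hack if an earlier call suspended, then mark the buffer empty. */
  if (m->suspended) {
    (*in_row_ctr)++;
    m->suspended = 0;
  }
  m->rowgroup_ctr = 0;
  m->cur_iMCU_row++;
}
```

Repeated suspensions leave the counter one short, not several short, because the decrement happens only on the transition into the suspended state.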

	2914 @@ -170,25 +170,25 @@

	2915 JSAMPARRAY input_buf, JDIMENSION *in_row_ctr,

	2916 JDIMENSION in_rows_avail)

	2917 {

	2918 - my_main_ptr main = (my_main_ptr) cinfo->main;

	2919 + my_main_ptr main_ptr = (my_main_ptr) cinfo->main;

	2920 int ci;

	2921 jpeg_component_info *compptr;

	2922 - boolean writing = (main->pass_mode != JBUF_CRANK_DEST);

	2923 + boolean writing = (main_ptr->pass_mode != JBUF_CRANK_DEST);

	2924

	2925 - while (main->cur_iMCU_row < cinfo->total_iMCU_rows) {

	2926 + while (main_ptr->cur_iMCU_row < cinfo->total_iMCU_rows) {

	2927 /* Realign the virtual buffers if at the start of an iMCU row. */

	2928 - if (main->rowgroup_ctr == 0) {

	2929 + if (main_ptr->rowgroup_ctr == 0) {

	2930 for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components;

	2931 ci++, compptr++) {

	2932 - main->buffer[ci] = (*cinfo->mem->access_virt_sarray)

	2933 - ((j_common_ptr) cinfo, main->whole_image[ci],

	2934 - main->cur_iMCU_row * (compptr->v_samp_factor * DCTSIZE),

	2935 + main_ptr->buffer[ci] = (*cinfo->mem->access_virt_sarray)

	2936 + ((j_common_ptr) cinfo, main_ptr->whole_image[ci],

	2937 + main_ptr->cur_iMCU_row * (compptr->v_samp_factor * DCTSIZE),

	2938 (JDIMENSION) (compptr->v_samp_factor * DCTSIZE), writing);

	2939 }

	2940 /* In a read pass, pretend we just read some source data. */

	2941 if (! writing) {

	2942 *in_row_ctr += cinfo->max_v_samp_factor * DCTSIZE;

	2943 - main->rowgroup_ctr = DCTSIZE;

	2944 + main_ptr->rowgroup_ctr = DCTSIZE;

	2945 }

	2946 }

	2947

	2948 @@ -197,16 +197,16 @@

	2949 if (writing) {

	2950 (*cinfo->prep->pre_process_data) (cinfo,

	2951 input_buf, in_row_ctr, in_rows_avail,

	2952 - main->buffer, &main->rowgroup_ctr,

	2953 + main_ptr->buffer, &main_ptr->rowgroup_ctr,

	2954 (JDIMENSION) DCTSIZE);

	2955 /* Return to application if we need more data to fill the iMCU row. */

	2956 - if (main->rowgroup_ctr < DCTSIZE)

	2957 + if (main_ptr->rowgroup_ctr < DCTSIZE)

	2958 return;

	2959 }

	2960

	2961 /* Emit data, unless this is a sink-only pass. */

	2962 - if (main->pass_mode != JBUF_SAVE_SOURCE) {

	2963 - if (! (*cinfo->coef->compress_data) (cinfo, main->buffer)) {

	2964 + if (main_ptr->pass_mode != JBUF_SAVE_SOURCE) {

	2965 + if (! (*cinfo->coef->compress_data) (cinfo, main_ptr->buffer)) {

	2966 /* If compressor did not consume the whole row, then we must need to

	2967 * suspend processing and return to the application. In this situation

	2968 * we pretend we didn't yet consume the last input row; otherwise, if

	2969 @@ -213,9 +213,9 @@

	2970 * it happened to be the last row of the image, the application would

	2971 * think we were done.

	2972 */

	2973 - if (! main->suspended) {

	2974 + if (! main_ptr->suspended) {

	2975 (*in_row_ctr)--;

	2976 - main->suspended = TRUE;

	2977 + main_ptr->suspended = TRUE;

	2978 }

	2979 return;

	2980 }

	2981 @@ -222,15 +222,15 @@

	2982 /* We did finish the row. Undo our little suspension hack if a previous

	2983 * call suspended; then mark the main buffer empty.

	2984 */

	2985 - if (main->suspended) {

	2986 + if (main_ptr->suspended) {

	2987 (*in_row_ctr)++;

	2988 - main->suspended = FALSE;

	2989 + main_ptr->suspended = FALSE;

	2990 }

	2991 }

	2992

	2993 /* If get here, we are done with this iMCU row. Mark buffer empty. */

	2994 - main->rowgroup_ctr = 0;

	2995 - main->cur_iMCU_row++;

	2996 + main_ptr->rowgroup_ctr = 0;

	2997 + main_ptr->cur_iMCU_row++;

	2998 }

	2999 }

	3000

	3001 @@ -244,15 +244,15 @@

	3002 GLOBAL(void)

	3003 jinit_c_main_controller (j_compress_ptr cinfo, boolean need_full_buffer)

	3004 {

	3005 - my_main_ptr main;

	3006 + my_main_ptr main_ptr;

	3007 int ci;

	3008 jpeg_component_info *compptr;

	3009

	3010 - main = (my_main_ptr)

	3011 + main_ptr = (my_main_ptr)

	3012 (*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_IMAGE,

	3013 SIZEOF(my_main_controller));

	3014 - cinfo->main = (struct jpeg_c_main_controller *) main;

	3015 - main->pub.start_pass = start_pass_main;

	3016 + cinfo->main = (struct jpeg_c_main_controller *) main_ptr;

	3017 + main_ptr->pub.start_pass = start_pass_main;

	3018

	3019 /* We don't need to create a buffer in raw-data mode. */

	3020 if (cinfo->raw_data_in)

	3021 @@ -267,7 +267,7 @@

	3022 /* Note we pad the bottom to a multiple of the iMCU height */

	3023 for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components;

	3024 ci++, compptr++) {

	3025 - main->whole_image[ci] = (*cinfo->mem->request_virt_sarray)

	3026 + main_ptr->whole_image[ci] = (*cinfo->mem->request_virt_sarray)

	3027 ((j_common_ptr) cinfo, JPOOL_IMAGE, FALSE,

	3028 compptr->width_in_blocks * DCTSIZE,

	3029 (JDIMENSION) jround_up((long) compptr->height_in_blocks,

	3030 @@ -279,12 +279,12 @@

	3031 #endif

	3032 } else {

	3033 #ifdef FULL_MAIN_BUFFER_SUPPORTED

	3034 - main->whole_image[0] = NULL; /* flag for no virtual arrays */

	3035 + main_ptr->whole_image[0] = NULL; /* flag for no virtual arrays */

	3036 #endif

	3037 /* Allocate a strip buffer for each component */

	3038 for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components;

	3039 ci++, compptr++) {

	3040 - main->buffer[ci] = (*cinfo->mem->alloc_sarray)

	3041 + main_ptr->buffer[ci] = (*cinfo->mem->alloc_sarray)

	3042 ((j_common_ptr) cinfo, JPOOL_IMAGE,

	3043 compptr->width_in_blocks * DCTSIZE,

	3044 (JDIMENSION) (compptr->v_samp_factor * DCTSIZE));
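With a full-image buffer, each component's virtual array is padded at the bottom to a whole number of iMCU rows. process_data_buffer_main() then requests v_samp_factor * DCTSIZE sample rows starting at cur_iMCU_row * v_samp_factor * DCTSIZE.

The arithmetic can be sketched as follows. The jround_up() call is cut off in this view, so the sketch assumes the rounding multiple is v_samp_factor, the iMCU height in blocks. round_up() uses the same formula as libjpeg's jround_up(), and the other helpers are hypothetical.

```c
#include <assert.h>

#define DCTSIZE 8   /* block size, as in jpeglib.h */

/* Round a up to a multiple of b; same formula as libjpeg's jround_up(). */
static long round_up(long a, long b)
{
  assert(a >= 0 && b > 0);
  a += b - 1L;
  return a - (a % b);
}

/* Sample rows in a component's virtual array, bottom padded to whole iMCU rows
 * (assumes the rounding multiple is v_samp_factor). */
static long padded_rows(long height_in_blocks, int v_samp_factor)
{
  return round_up(height_in_blocks, v_samp_factor) * DCTSIZE;
}

/* First sample row requested from access_virt_sarray() for a given iMCU row. */
static long imcu_row_start(long cur_iMCU_row, int v_samp_factor)
{
  return cur_iMCU_row * (v_samp_factor * DCTSIZE);
}
```

Consider a component 5 blocks tall with v_samp_factor 2. Its array gets 48 rows rather than 40, so the third iMCU row (rows 32 to 47) can be accessed in full.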

	3045 Index: jcmarker.c

	3046 ===================================================================

	3047 --- jcmarker.c (revision 829)

	3048 +++ jcmarker.c (working copy)

	3049 @@ -1,8 +1,11 @@

	3050 /*

	3051 * jcmarker.c

	3052 *

	3053 + * This file was part of the Independent JPEG Group's software:

	3054 * Copyright (C) 1991-1998, Thomas G. Lane.

	3055 - * This file is part of the Independent JPEG Group's software.

	3056 + * Modified 2003-2010 by Guido Vollbeding.

	3057 + * libjpeg-turbo Modifications:

	3058 + * Copyright (C) 2010, D. R. Commander.

	3059 * For conditions of distribution and use, see the accompanying README file.

	3060 *

	3061 * This file contains routines to write JPEG datastream markers.

	3062 @@ -11,6 +14,7 @@

	3063 #define JPEG_INTERNALS

	3064 #include "jinclude.h"

	3065 #include "jpeglib.h"

	3066 +#include "jpegcomp.h"

	3067

	3068

	3069 typedef enum { /* JPEG marker codes */

	3070 @@ -18,24 +22,24 @@

	3071 M_SOF1 = 0xc1,

	3072 M_SOF2 = 0xc2,

	3073 M_SOF3 = 0xc3,

	3074 -

	3075 +

	3076 M_SOF5 = 0xc5,

	3077 M_SOF6 = 0xc6,

	3078 M_SOF7 = 0xc7,

	3079 -

	3080 +

	3081 M_JPG = 0xc8,

	3082 M_SOF9 = 0xc9,

	3083 M_SOF10 = 0xca,

	3084 M_SOF11 = 0xcb,

	3085 -

	3086 +

	3087 M_SOF13 = 0xcd,

	3088 M_SOF14 = 0xce,

	3089 M_SOF15 = 0xcf,

	3090 -

	3091 +

	3092 M_DHT = 0xc4,

	3093 -

	3094 +

	3095 M_DAC = 0xcc,

	3096 -

	3097 +

	3098 M_RST0 = 0xd0,

	3099 M_RST1 = 0xd1,

	3100 M_RST2 = 0xd2,

	3101 @@ -44,7 +48,7 @@

	3102 M_RST5 = 0xd5,

	3103 M_RST6 = 0xd6,

	3104 M_RST7 = 0xd7,

	3105 -

	3106 +

	3107 M_SOI = 0xd8,

	3108 M_EOI = 0xd9,

	3109 M_SOS = 0xda,

	3110 @@ -53,7 +57,7 @@

	3111 M_DRI = 0xdd,

	3112 M_DHP = 0xde,

	3113 M_EXP = 0xdf,

	3114 -

	3115 +

	3116 M_APP0 = 0xe0,

	3117 M_APP1 = 0xe1,

	3118 M_APP2 = 0xe2,

	3119 @@ -70,13 +74,13 @@

	3120 M_APP13 = 0xed,

	3121 M_APP14 = 0xee,

	3122 M_APP15 = 0xef,

	3123 -

	3124 +

	3125 M_JPG0 = 0xf0,

	3126 M_JPG13 = 0xfd,

	3127 M_COM = 0xfe,

	3128 -

	3129 +

	3130 M_TEM = 0x01,

	3131 -

	3132 +

	3133 M_ERROR = 0x100

	3134 } JPEG_MARKER;

	3135

	3136 @@ -229,33 +233,39 @@

	3137 char ac_in_use[NUM_ARITH_TBLS];

	3138 int length, i;

	3139 jpeg_component_info *compptr;

	3140 -

	3141 +

	3142 for (i = 0; i < NUM_ARITH_TBLS; i++)

	3143 dc_in_use[i] = ac_in_use[i] = 0;

	3144 -

	3145 +

	3146 for (i = 0; i < cinfo->comps_in_scan; i++) {

	3147 compptr = cinfo->cur_comp_info[i];

	3148 - dc_in_use[compptr->dc_tbl_no] = 1;

	3149 - ac_in_use[compptr->ac_tbl_no] = 1;

	3150 + /* DC needs no table for refinement scan */

	3151 + if (cinfo->Ss == 0 && cinfo->Ah == 0)

	3152 + dc_in_use[compptr->dc_tbl_no] = 1;

	3153 + /* AC needs no table when not present */

	3154 + if (cinfo->Se)

	3155 + ac_in_use[compptr->ac_tbl_no] = 1;

	3156 }

	3157 -

	3158 +

	3159 length = 0;

	3160 for (i = 0; i < NUM_ARITH_TBLS; i++)

	3161 length += dc_in_use[i] + ac_in_use[i];

	3162 -

	3163 - emit_marker(cinfo, M_DAC);

	3164 -

	3165 - emit_2bytes(cinfo, length*2 + 2);

	3166 -

	3167 - for (i = 0; i < NUM_ARITH_TBLS; i++) {

	3168 - if (dc_in_use[i]) {

	3169 - emit_byte(cinfo, i);

	3170 - emit_byte(cinfo, cinfo->arith_dc_L[i] + (cinfo->arith_dc_U[i]<<4));

	3171 +

	3172 + if (length) {

	3173 + emit_marker(cinfo, M_DAC);

	3174 +

	3175 + emit_2bytes(cinfo, length*2 + 2);

	3176 +

	3177 + for (i = 0; i < NUM_ARITH_TBLS; i++) {

	3178 + if (dc_in_use[i]) {

	3179 + emit_byte(cinfo, i);

	3180 + emit_byte(cinfo, cinfo->arith_dc_L[i] + (cinfo->arith_dc_U[i]<<4));

	3181 + }

	3182 + if (ac_in_use[i]) {

	3183 + emit_byte(cinfo, i + 0x10);

	3184 + emit_byte(cinfo, cinfo->arith_ac_K[i]);

	3185 + }

	3186 }

	3187 - if (ac_in_use[i]) {

	3188 - emit_byte(cinfo, i + 0x10);

	3189 - emit_byte(cinfo, cinfo->arith_ac_K[i]);

	3190 - }

	3191 }

	3192 #endif /* C_ARITH_CODING_SUPPORTED */

	3193 }

	3194 @@ -285,13 +295,13 @@

	3195 emit_2bytes(cinfo, 3 * cinfo->num_components + 2 + 5 + 1); /* length */

	3196

	3197 /* Make sure image isn't bigger than SOF field can handle */

	3198 - if ((long) cinfo->image_height > 65535L ||

	3199 - (long) cinfo->image_width > 65535L)

	3200 + if ((long) cinfo->_jpeg_height > 65535L ||

	3201 + (long) cinfo->_jpeg_width > 65535L)

	3202 ERREXIT1(cinfo, JERR_IMAGE_TOO_BIG, (unsigned int) 65535);

	3203

	3204 emit_byte(cinfo, cinfo->data_precision);

	3205 - emit_2bytes(cinfo, (int) cinfo->image_height);

	3206 - emit_2bytes(cinfo, (int) cinfo->image_width);

	3207 + emit_2bytes(cinfo, (int) cinfo->_jpeg_height);

	3208 + emit_2bytes(cinfo, (int) cinfo->_jpeg_width);

	3209

	3210 emit_byte(cinfo, cinfo->num_components);

	3211

	3212 @@ -320,22 +330,16 @@

	3213 for (i = 0; i < cinfo->comps_in_scan; i++) {

	3214 compptr = cinfo->cur_comp_info[i];

	3215 emit_byte(cinfo, compptr->component_id);

	3216 - td = compptr->dc_tbl_no;

	3217 - ta = compptr->ac_tbl_no;

	3218 - if (cinfo->progressive_mode) {

	3219 - /* Progressive mode: only DC or only AC tables are used in one scan;

	3220 - * furthermore, Huffman coding of DC refinement uses no table at all.

	3221 - * We emit 0 for unused field(s); this is recommended by the P&M text

	3222 - * but does not seem to be specified in the standard.

	3223 - */

	3224 - if (cinfo->Ss == 0) {

	3225 - ta = 0; /* DC scan */

	3226 - if (cinfo->Ah != 0 && !cinfo->arith_code)

	3227 - td = 0; /* no DC table either */

	3228 - } else {

	3229 - td = 0; /* AC scan */

	3230 - }

	3231 - }

	3232 +

	3233 + /* We emit 0 for unused field(s); this is recommended by the P&M text

	3234 + * but does not seem to be specified in the standard.

	3235 + */

	3236 +

	3237 + /* DC needs no table for refinement scan */

	3238 + td = cinfo->Ss == 0 && cinfo->Ah == 0 ? compptr->dc_tbl_no : 0;

	3239 + /* AC needs no table when not present */

	3240 + ta = cinfo->Se ? compptr->ac_tbl_no : 0;

	3241 +

	3242 emit_byte(cinfo, (td << 4) + ta);

	3243 }

	3244

	3245 @@ -529,7 +533,10 @@

	3246

	3247 /* Emit the proper SOF marker */

	3248 if (cinfo->arith_code) {

	3249 - emit_sof(cinfo, M_SOF9); /* SOF code for arithmetic coding */

	3250 + if (cinfo->progressive_mode)

	3251 + emit_sof(cinfo, M_SOF10); /* SOF code for progressive arithmetic */

	3252 + else

	3253 + emit_sof(cinfo, M_SOF9); /* SOF code for sequential arithmetic */

	3254 } else {

	3255 if (cinfo->progressive_mode)

	3256 emit_sof(cinfo, M_SOF2); /* SOF code for progressive Huffman */

	3257 @@ -566,19 +573,12 @@

	3258 */

	3259 for (i = 0; i < cinfo->comps_in_scan; i++) {

	3260 compptr = cinfo->cur_comp_info[i];

	3261 - if (cinfo->progressive_mode) {

	3262 - /* Progressive mode: only DC or only AC tables are used in one scan */

	3263 - if (cinfo->Ss == 0) {

	3264 - if (cinfo->Ah == 0) /* DC needs no table for refinement scan */

	3265 - emit_dht(cinfo, compptr->dc_tbl_no, FALSE);

	3266 - } else {

	3267 - emit_dht(cinfo, compptr->ac_tbl_no, TRUE);

	3268 - }

	3269 - } else {

	3270 - /* Sequential mode: need both DC and AC tables */

	3271 + /* DC needs no table for refinement scan */

	3272 + if (cinfo->Ss == 0 && cinfo->Ah == 0)

	3273 emit_dht(cinfo, compptr->dc_tbl_no, FALSE);

	3274 + /* AC needs no table when not present */

	3275 + if (cinfo->Se)

	3276 emit_dht(cinfo, compptr->ac_tbl_no, TRUE);

	3277 - }

	3278 }

	3279 }

	3280
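The jcmarker.c hunks above replace the old progressive/sequential branching with a single rule, applied when writing the DAC segment, the SOS table selectors, and the DHT segments: a component needs a DC table only when Ss == 0 and Ah == 0, and an AC table only when Se != 0. The following is a minimal sketch of that rule, the DAC length check, and the arithmetic-coding SOF choice. It assumes a hypothetical `scan_params` struct in place of the real `jpeg_compress_struct` fields, and the helper names are illustrative, not code from the patch.

```c
/* Hypothetical stand-in for the scan fields of jpeg_compress_struct
 * plus one component's table numbers. */
typedef struct {
  int Ss, Se;      /* spectral selection start/end */
  int Ah;          /* successive approximation high bit (0 = first pass) */
  int dc_tbl_no;   /* component's DC table slot */
  int ac_tbl_no;   /* component's AC table slot */
} scan_params;

/* DC needs no table for a refinement scan (Ah != 0) or an AC-only scan. */
static int needs_dc_table(const scan_params *s) {
  return s->Ss == 0 && s->Ah == 0;
}

/* AC needs no table when the scan carries no AC coefficients (Se == 0). */
static int needs_ac_table(const scan_params *s) {
  return s->Se != 0;
}

/* Per-component SOS selector byte: Td in the high nibble, Ta in the low,
 * with 0 written for an unused field. */
static int sos_table_byte(const scan_params *s) {
  int td = needs_dc_table(s) ? s->dc_tbl_no : 0;
  int ta = needs_ac_table(s) ? s->ac_tbl_no : 0;
  return (td << 4) + ta;
}

/* DAC segment length for n conditioning entries (2 bytes each plus the
 * 2-byte length field); 0 means the patched code emits no DAC marker. */
static int dac_segment_length(int n_tables_in_use) {
  return n_tables_in_use ? n_tables_in_use * 2 + 2 : 0;
}

/* SOF marker code chosen for arithmetic coding by the patched code. */
static int arith_sof_code(int progressive_mode) {
  return progressive_mode ? 0xca /* M_SOF10 */ : 0xc9 /* M_SOF9 */;
}
```

For a sequential scan using table slot 1 for both DC and AC, the selector byte is 0x11; a progressive DC refinement scan (Ah != 0, Se == 0) gets 0x00, matching the "emit 0 for unused field(s)" convention noted in the patch.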

	3281 Index: jcmaster.c

	3282 ===================================================================

	3283 --- jcmaster.c (revision 829)

	3284 +++ jcmaster.c (working copy)

	3285 @@ -1,8 +1,11 @@

	3286 /*

	3287 * jcmaster.c

	3288 *

	3289 + * This file was part of the Independent JPEG Group's software:

	3290 * Copyright (C) 1991-1997, Thomas G. Lane.

	3291 - * This file is part of the Independent JPEG Group's software.

	3292 + * Modified 2003-2010 by Guido Vollbeding.

	3293 + * libjpeg-turbo Modifications:

	3294 + * Copyright (C) 2010, D. R. Commander.

	3295 * For conditions of distribution and use, see the accompanying README file.

	3296 *

	3297 * This file contains master control logic for the JPEG compressor.

	3298 @@ -14,6 +17,7 @@

	3299 #define JPEG_INTERNALS

	3300 #include "jinclude.h"

	3301 #include "jpeglib.h"

	3302 +#include "jpegcomp.h"

	3303

	3304

	3305 /* Private state */

	3306 @@ -42,8 +46,28 @@

	3307 * Support routines that do various essential calculations.

	3308 */

	3309

	3310 +#if JPEG_LIB_VERSION >= 70

	3311 +/*

	3312 + * Compute JPEG image dimensions and related values.

	3313 + * NOTE: this is exported for possible use by application.

	3314 + * Hence it mustn't do anything that can't be done twice.

	3315 + */

	3316 +

	3317 +GLOBAL(void)

	3318 +jpeg_calc_jpeg_dimensions (j_compress_ptr cinfo)

	3319 +/* Do computations that are needed before master selection phase */

	3320 +{

	3321 + /* Hardwire it to "no scaling" */

	3322 + cinfo->jpeg_width = cinfo->image_width;

	3323 + cinfo->jpeg_height = cinfo->image_height;

	3324 + cinfo->min_DCT_h_scaled_size = DCTSIZE;

	3325 + cinfo->min_DCT_v_scaled_size = DCTSIZE;

	3326 +}

	3327 +#endif

	3328 +

	3329 +

	3330 LOCAL(void)

	3331 -initial_setup (j_compress_ptr cinfo)

	3332 +initial_setup (j_compress_ptr cinfo, boolean transcode_only)

	3333 /* Do computations that are needed before master selection phase */

	3334 {

	3335 int ci;

	3336 @@ -51,14 +75,21 @@

	3337 long samplesperrow;

	3338 JDIMENSION jd_samplesperrow;

	3339

	3340 +#if JPEG_LIB_VERSION >= 70

	3341 +#if JPEG_LIB_VERSION >= 80

	3342 + if (!transcode_only)

	3343 +#endif

	3344 + jpeg_calc_jpeg_dimensions(cinfo);

	3345 +#endif

	3346 +

	3347 /* Sanity check on image dimensions */

	3348 - if (cinfo->image_height <= 0 || cinfo->image_width <= 0

	3349 + if (cinfo->_jpeg_height <= 0 || cinfo->_jpeg_width <= 0

	3350 || cinfo->num_components <= 0 || cinfo->input_components <= 0)

	3351 ERREXIT(cinfo, JERR_EMPTY_IMAGE);

	3352

	3353 /* Make sure image isn't bigger than I can handle */

	3354 - if ((long) cinfo->image_height > (long) JPEG_MAX_DIMENSION ||

	3355 - (long) cinfo->image_width > (long) JPEG_MAX_DIMENSION)

	3356 + if ((long) cinfo->_jpeg_height > (long) JPEG_MAX_DIMENSION ||

	3357 + (long) cinfo->_jpeg_width > (long) JPEG_MAX_DIMENSION)

	3358 ERREXIT1(cinfo, JERR_IMAGE_TOO_BIG, (unsigned int) JPEG_MAX_DIMENSION);

	3359

	3360 /* Width of an input scanline must be representable as JDIMENSION. */

	3361 @@ -96,20 +127,24 @@

	3362 /* Fill in the correct component_index value; don't rely on application */

	3363 compptr->component_index = ci;

	3364 /* For compression, we never do DCT scaling. */

	3365 +#if JPEG_LIB_VERSION >= 70

	3366 + compptr->DCT_h_scaled_size = compptr->DCT_v_scaled_size = DCTSIZE;

	3367 +#else

	3368 compptr->DCT_scaled_size = DCTSIZE;

	3369 +#endif

	3370 /* Size in DCT blocks */

	3371 compptr->width_in_blocks = (JDIMENSION)

	3372 - jdiv_round_up((long) cinfo->image_width * (long) compptr->h_samp_factor,

	3373 + jdiv_round_up((long) cinfo->_jpeg_width * (long) compptr->h_samp_factor,

	3374 (long) (cinfo->max_h_samp_factor * DCTSIZE));

	3375 compptr->height_in_blocks = (JDIMENSION)

	3376 - jdiv_round_up((long) cinfo->image_height * (long) compptr->v_samp_factor,

	3377 + jdiv_round_up((long) cinfo->_jpeg_height * (long) compptr->v_samp_factor,

	3378 (long) (cinfo->max_v_samp_factor * DCTSIZE));

	3379 /* Size in samples */

	3380 compptr->downsampled_width = (JDIMENSION)

	3381 - jdiv_round_up((long) cinfo->image_width * (long) compptr->h_samp_factor,

	3382 + jdiv_round_up((long) cinfo->_jpeg_width * (long) compptr->h_samp_factor,

	3383 (long) cinfo->max_h_samp_factor);

	3384 compptr->downsampled_height = (JDIMENSION)

	3385 - jdiv_round_up((long) cinfo->image_height * (long) compptr->v_samp_factor,

	3386 + jdiv_round_up((long) cinfo->_jpeg_height * (long) compptr->v_samp_factor,

	3387 (long) cinfo->max_v_samp_factor);

	3388 /* Mark component needed (this flag isn't actually used for compression) */

	3389 compptr->component_needed = TRUE;

	3390 @@ -119,7 +154,7 @@

	3391 * main controller will call coefficient controller).

	3392 */

	3393 cinfo->total_iMCU_rows = (JDIMENSION)

	3394 - jdiv_round_up((long) cinfo->image_height,

	3395 + jdiv_round_up((long) cinfo->_jpeg_height,

	3396 (long) (cinfo->max_v_samp_factor*DCTSIZE));

	3397 }

	3398

	3399 @@ -347,10 +382,10 @@

	3400

	3401 /* Overall image size in MCUs */

	3402 cinfo->MCUs_per_row = (JDIMENSION)

	3403 - jdiv_round_up((long) cinfo->image_width,

	3404 + jdiv_round_up((long) cinfo->_jpeg_width,

	3405 (long) (cinfo->max_h_samp_factor*DCTSIZE));

	3406 cinfo->MCU_rows_in_scan = (JDIMENSION)

	3407 - jdiv_round_up((long) cinfo->image_height,

	3408 + jdiv_round_up((long) cinfo->_jpeg_height,

	3409 (long) (cinfo->max_v_samp_factor*DCTSIZE));

	3410

	3411 cinfo->blocks_in_MCU = 0;

	3412 @@ -554,7 +589,7 @@

	3413 master->pub.is_last_pass = FALSE;

	3414

	3415 /* Validate parameters, determine derived values */

	3416 - initial_setup(cinfo);

	3417 + initial_setup(cinfo, transcode_only);

	3418

	3419 if (cinfo->scan_info != NULL) {

	3420 #ifdef C_MULTISCAN_FILES_SUPPORTED

	3421 @@ -567,7 +602,7 @@

	3422 cinfo->num_scans = 1;

	3423 }

	3424

	3425 - if (cinfo->progressive_mode) /* TEMPORARY HACK ??? */

	3426 + if (cinfo->progressive_mode && !cinfo->arith_code) /* TEMPORARY HACK ??? */

	3427 cinfo->optimize_coding = TRUE; /* assume default tables no good for progressive mode */

	3428

	3429 /* Initialize my private state */
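The jcmaster.c changes swap image_width/image_height for the _jpeg_width/_jpeg_height macros in each size calculation; the rounding itself is unchanged. A minimal sketch of that arithmetic for a 4:2:0 image (luma sampled 2x2, chroma 1x1), assuming jdiv_round_up() is the plain ceiling division defined in libjpeg's jutils.c. The helper names are hypothetical and stand in for the inline expressions in the hunks above.

```c
#define DCTSIZE 8  /* libjpeg's DCT block size */

/* Ceiling division, as jdiv_round_up() in jutils.c computes it. */
static long div_round_up(long a, long b) {
  return (a + b - 1L) / b;
}

/* Component width in 8x8 DCT blocks (compptr->width_in_blocks). */
static long width_in_blocks(long jpeg_width, int h_samp, int max_h_samp) {
  return div_round_up(jpeg_width * h_samp, (long) max_h_samp * DCTSIZE);
}

/* Component width in samples (compptr->downsampled_width). */
static long downsampled_width(long jpeg_width, int h_samp, int max_h_samp) {
  return div_round_up(jpeg_width * h_samp, (long) max_h_samp);
}

/* Interleaved MCUs per row (cinfo->MCUs_per_row). */
static long mcus_per_row(long jpeg_width, int max_h_samp) {
  return div_round_up(jpeg_width, (long) max_h_samp * DCTSIZE);
}

/* iMCU rows in the image (cinfo->total_iMCU_rows); a partial last row counts. */
static long total_imcu_rows(long jpeg_height, int max_v_samp) {
  return div_round_up(jpeg_height, (long) max_v_samp * DCTSIZE);
}
```

For a 641x481 image, the luma plane needs 81 blocks per row but each chroma plane only 41, and the partial last iMCU row still counts, giving 31 iMCU rows.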

	3430 Index: jcparam.c

	3431 ===================================================================

	3432 --- jcparam.c (revision 829)

	3433 +++ jcparam.c (working copy)

	3434 @@ -1,9 +1,11 @@

	3435 /*

	3436 * jcparam.c

	3437 *

	3438 + * This file was part of the Independent JPEG Group's software:

	3439 * Copyright (C) 1991-1998, Thomas G. Lane.

	3440 - * Copyright (C) 2009, D. R. Commander.

	3441 - * This file is part of the Independent JPEG Group's software.

	3442 + * Modified 2003-2008 by Guido Vollbeding.

	3443 + * libjpeg-turbo Modifications:

	3444 + * Copyright (C) 2009-2011, D. R. Commander.

	3445 * For conditions of distribution and use, see the accompanying README file.

	3446 *

	3447 * This file contains optional default-setting code for the JPEG compressor.

	3448 @@ -61,7 +63,50 @@

	3449 }

	3450

	3451

	3452 +/* These are the sample quantization tables given in JPEG spec section K.1.

	3453 + * The spec says that the values given produce "good" quality, and

	3454 + * when divided by 2, "very good" quality.

	3455 + */

	3456 +static const unsigned int std_luminance_quant_tbl[DCTSIZE2] = {

	3457 + 16, 11, 10, 16, 24, 40, 51, 61,

	3458 + 12, 12, 14, 19, 26, 58, 60, 55,

	3459 + 14, 13, 16, 24, 40, 57, 69, 56,

	3460 + 14, 17, 22, 29, 51, 87, 80, 62,

	3461 + 18, 22, 37, 56, 68, 109, 103, 77,

	3462 + 24, 35, 55, 64, 81, 104, 113, 92,

	3463 + 49, 64, 78, 87, 103, 121, 120, 101,

	3464 + 72, 92, 95, 98, 112, 100, 103, 99

	3465 +};

	3466 +static const unsigned int std_chrominance_quant_tbl[DCTSIZE2] = {

	3467 + 17, 18, 24, 47, 99, 99, 99, 99,

	3468 + 18, 21, 26, 66, 99, 99, 99, 99,

	3469 + 24, 26, 56, 99, 99, 99, 99, 99,

	3470 + 47, 66, 99, 99, 99, 99, 99, 99,

	3471 + 99, 99, 99, 99, 99, 99, 99, 99,

	3472 + 99, 99, 99, 99, 99, 99, 99, 99,

	3473 + 99, 99, 99, 99, 99, 99, 99, 99,

	3474 + 99, 99, 99, 99, 99, 99, 99, 99

	3475 +};

	3476 +

	3477 +

	3478 +#if JPEG_LIB_VERSION >= 70

	3479 GLOBAL(void)

	3480 +jpeg_default_qtables (j_compress_ptr cinfo, boolean force_baseline)

	3481 +/* Set or change the 'quality' (quantization) setting, using default tables

	3482 + * and straight percentage-scaling quality scales.

	3483 + * This entry point allows different scalings for luminance and chrominance.

	3484 + */

	3485 +{

	3486 + /* Set up two quantization tables using the specified scaling */

	3487 + jpeg_add_quant_table(cinfo, 0, std_luminance_quant_tbl,

	3488 + cinfo->q_scale_factor[0], force_baseline);

	3489 + jpeg_add_quant_table(cinfo, 1, std_chrominance_quant_tbl,

	3490 + cinfo->q_scale_factor[1], force_baseline);

	3491 +}

	3492 +#endif

	3493 +

	3494 +

	3495 +GLOBAL(void)

	3496 jpeg_set_linear_quality (j_compress_ptr cinfo, int scale_factor,

	3497 boolean force_baseline)

	3498 /* Set or change the 'quality' (quantization) setting, using default tables

	3499 @@ -70,31 +115,6 @@

	3500 * applications that insist on a linear percentage scaling.

	3501 */

	3502 {

	3503 - /* These are the sample quantization tables given in JPEG spec section K.1.

	3504 - * The spec says that the values given produce "good" quality, and

	3505 - * when divided by 2, "very good" quality.

	3506 - */

	3507 - static const unsigned int std_luminance_quant_tbl[DCTSIZE2] = {

	3508 - 16, 11, 10, 16, 24, 40, 51, 61,

	3509 - 12, 12, 14, 19, 26, 58, 60, 55,

	3510 - 14, 13, 16, 24, 40, 57, 69, 56,

	3511 - 14, 17, 22, 29, 51, 87, 80, 62,

	3512 - 18, 22, 37, 56, 68, 109, 103, 77,

	3513 - 24, 35, 55, 64, 81, 104, 113, 92,

	3514 - 49, 64, 78, 87, 103, 121, 120, 101,

	3515 - 72, 92, 95, 98, 112, 100, 103, 99

	3516 - };

	3517 - static const unsigned int std_chrominance_quant_tbl[DCTSIZE2] = {

	3518 - 17, 18, 24, 47, 99, 99, 99, 99,

	3519 - 18, 21, 26, 66, 99, 99, 99, 99,

	3520 - 24, 26, 56, 99, 99, 99, 99, 99,

	3521 - 47, 66, 99, 99, 99, 99, 99, 99,

	3522 - 99, 99, 99, 99, 99, 99, 99, 99,

	3523 - 99, 99, 99, 99, 99, 99, 99, 99,

	3524 - 99, 99, 99, 99, 99, 99, 99, 99,

	3525 - 99, 99, 99, 99, 99, 99, 99, 99

	3526 - };

	3527 -

	3528 /* Set up two quantization tables using the specified scaling */

	3529 jpeg_add_quant_table(cinfo, 0, std_luminance_quant_tbl,

	3530 scale_factor, force_baseline);

	3531 @@ -285,6 +305,10 @@

	3532

	3533 /* Initialize everything not dependent on the color space */

	3534

	3535 +#if JPEG_LIB_VERSION >= 70

	3536 + cinfo->scale_num = 1; /* 1:1 scaling */

	3537 + cinfo->scale_denom = 1;

	3538 +#endif

	3539 cinfo->data_precision = BITS_IN_JSAMPLE;

	3540 /* Set up two quantization tables using default quality of 75 */

	3541 jpeg_set_quality(cinfo, 75, TRUE);

	3542 @@ -321,6 +345,11 @@

	3543 /* By default, use the simpler non-cosited sampling alignment */

	3544 cinfo->CCIR601_sampling = FALSE;

	3545

	3546 +#if JPEG_LIB_VERSION >= 70

	3547 + /* By default, apply fancy downsampling */

	3548 + cinfo->do_fancy_downsampling = TRUE;

	3549 +#endif

	3550 +

	3551 /* No input smoothing */

	3552 cinfo->smoothing_factor = 0;

	3553

	3554 @@ -370,6 +399,10 @@

	3555 case JCS_EXT_BGRX:

	3556 case JCS_EXT_XBGR:

	3557 case JCS_EXT_XRGB:

	3558 + case JCS_EXT_RGBA:

	3559 + case JCS_EXT_BGRA:

	3560 + case JCS_EXT_ABGR:

	3561 + case JCS_EXT_ARGB:

	3562 jpeg_set_colorspace(cinfo, JCS_YCbCr);

	3563 break;

	3564 case JCS_YCbCr:
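The default quality of 75 set in jpeg_set_defaults() and the per-component q_scale_factor[] used by the new jpeg_default_qtables() both end up as a percentage scale factor applied to the K.1 tables that the patch moves to file scope. A minimal sketch of that scaling, following our understanding of libjpeg's jpeg_quality_scaling() and jpeg_add_quant_table() (round to the nearest percent, clamp to 1..32767, cap at 255 when force_baseline is set). The function names are hypothetical and the code is not taken from the patch.

```c
#include <stddef.h>

/* Map a 1..100 quality rating to a percentage scale factor, following
 * the curve used by libjpeg's jpeg_quality_scaling(). */
static int quality_to_scale(int quality) {
  if (quality <= 0) quality = 1;
  if (quality > 100) quality = 100;
  return quality < 50 ? 5000 / quality : 200 - quality * 2;
}

/* Scale one base-table entry by scale_factor percent, rounding to nearest,
 * then clamp the result the way jpeg_add_quant_table() does. */
static unsigned int scale_quant_entry(unsigned int base, int scale_factor,
                                      int force_baseline) {
  long temp = ((long) base * scale_factor + 50L) / 100L;
  if (temp <= 0L) temp = 1L;                      /* quantizer must be >= 1 */
  if (temp > 32767L) temp = 32767L;               /* libjpeg's upper limit */
  if (force_baseline && temp > 255L) temp = 255L; /* 8-bit baseline range */
  return (unsigned int) temp;
}

/* Scale a whole table of n entries into out[]. */
static void scale_quant_table(const unsigned int *base, unsigned int *out,
                              size_t n, int scale_factor, int force_baseline) {
  for (size_t i = 0; i < n; i++)
    out[i] = scale_quant_entry(base[i], scale_factor, force_baseline);
}
```

At quality 75 the scale factor is 50, so the first luminance entries 16, 11, 10 become 8, 6, 5. At quality 10 (scale 500), a chrominance entry of 99 would become 495 and is capped at 255 for baseline output.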

	3565 Index: jctrans.c

	3566 ===================================================================

	3567 --- jctrans.c (revision 829)

	3568 +++ jctrans.c (working copy)

	3569 @@ -2,6 +2,7 @@

	3570 * jctrans.c

	3571 *

	3572 * Copyright (C) 1995-1998, Thomas G. Lane.

	3573 + * Modified 2000-2009 by Guido Vollbeding.

	3574 * This file is part of the Independent JPEG Group's software.

	3575 * For conditions of distribution and use, see the accompanying README file.

	3576 *

	3577 @@ -76,6 +77,12 @@

	3578 dstinfo->image_height = srcinfo->image_height;

	3579 dstinfo->input_components = srcinfo->num_components;

	3580 dstinfo->in_color_space = srcinfo->jpeg_color_space;

	3581 +#if JPEG_LIB_VERSION >= 70

	3582 + dstinfo->jpeg_width = srcinfo->output_width;

	3583 + dstinfo->jpeg_height = srcinfo->output_height;

	3584 + dstinfo->min_DCT_h_scaled_size = srcinfo->min_DCT_h_scaled_size;

	3585 + dstinfo->min_DCT_v_scaled_size = srcinfo->min_DCT_v_scaled_size;

	3586 +#endif

	3587 /* Initialize all parameters to default values */

	3588 jpeg_set_defaults(dstinfo);

	3589 /* jpeg_set_defaults may choose wrong colorspace, eg YCbCr if input is RGB.

	3590 @@ -167,7 +174,11 @@

	3591

	3592 /* Entropy encoding: either Huffman or arithmetic coding. */

	3593 if (cinfo->arith_code) {

	3594 +#ifdef C_ARITH_CODING_SUPPORTED

	3595 + jinit_arith_encoder(cinfo);

	3596 +#else

	3597 ERREXIT(cinfo, JERR_ARITH_NOTIMPL);

	3598 +#endif

	3599 } else {

	3600 if (cinfo->progressive_mode) {

	3601 #ifdef C_PROGRESSIVE_SUPPORTED

	3602 Index: jdapistd.c

	3603 ===================================================================

	3604 --- jdapistd.c (revision 829)

	3605 +++ jdapistd.c (working copy)

	3606 @@ -1,8 +1,11 @@

	3607 /*

	3608 * jdapistd.c

	3609 *

	3610 + * This file was part of the Independent JPEG Group's software:

	3611 * Copyright (C) 1994-1996, Thomas G. Lane.

	3612 - * This file is part of the Independent JPEG Group's software.

	3613 + * libjpeg-turbo Modifications:

	3614 + * Copyright (C) 2010, 2015, D. R. Commander.

	3615 + * Copyright (C) 2015, Google, Inc.

	3616 * For conditions of distribution and use, see the accompanying README file.

	3617 *

	3618 * This file contains application interface code for the decompression half

	3619 @@ -14,9 +17,10 @@

	3620 * whole decompression library into a transcoder.

	3621 */

	3622

	3623 -#define JPEG_INTERNALS

	3624 -#include "jinclude.h"

	3625 -#include "jpeglib.h"

	3626 +#include "jdmainct.h"

	3627 +#include "jdcoefct.h"

	3628 +#include "jdsample.h"

	3629 +#include "jmemsys.h"

	3630

	3631

	3632 /* Forward declarations */

	3633 @@ -176,7 +180,236 @@

	3634 }

	3635

	3636

	3637 +

	3638 +/* Dummy color convert function used by jpeg_skip_scanlines() */

	3639 +LOCAL(void)

	3640 +noop_convert (j_decompress_ptr cinfo, JSAMPIMAGE input_buf,

	3641 + JDIMENSION input_row, JSAMPARRAY output_buf, int num_rows)

	3642 +{

	3643 +}

	3644 +

	3645 +

	3646 /*

	3647 + * In some cases, it is best to call jpeg_read_scanlines() and discard the

	3648 + * output, rather than skipping the scanlines, because this allows us to

	3649 + * maintain the internal state of the context-based upsampler. In these cases,

	3650 + * we set up and tear down a dummy color converter in order to avoid valgrind

	3651 + * errors and to achieve the best possible performance.

	3652 + */

	3653 +LOCAL(void)

	3654 +read_and_discard_scanlines (j_decompress_ptr cinfo, JDIMENSION num_lines)

	3655 +{

	3656 + JDIMENSION n;

	3657 + void (*color_convert) (j_decompress_ptr cinfo, JSAMPIMAGE input_buf,

	3658 + JDIMENSION input_row, JSAMPARRAY output_buf,

	3659 + int num_rows);

	3660 +

	3661 + color_convert = cinfo->cconvert->color_convert;

	3662 + cinfo->cconvert->color_convert = noop_convert;

	3663 +

	3664 + for (n = 0; n < num_lines; n++)

	3665 + jpeg_read_scanlines(cinfo, NULL, 1);

	3666 +

	3667 + cinfo->cconvert->color_convert = color_convert;

	3668 +}

	3669 +

	3670 +/*

	3671 + * Called by jpeg_skip_scanlines(). This partially skips a decompress block by

	3672 + * incrementing the rowgroup counter.

	3673 + */

	3674 +

	3675 +LOCAL(void)

	3676 +increment_simple_rowgroup_ctr (j_decompress_ptr cinfo, JDIMENSION rows)

	3677 +{

	3678 + JDIMENSION rows_left;

	3679 + my_main_ptr main_ptr = (my_main_ptr) cinfo->main;

	3680 +

	3681 + /* Increment the counter to the next row group after the skipped rows. */

	3682 + main_ptr->rowgroup_ctr += rows / cinfo->max_v_samp_factor;

	3683 +

	3684 + /* Partially skipping a row group would involve modifying the internal state

	3685 + * of the upsampler, so read the remaining rows into a dummy buffer instead.

	3686 + */

	3687 + rows_left = rows % cinfo->max_v_samp_factor;

	3688 + cinfo->output_scanline += rows - rows_left;

	3689 +

	3690 + read_and_discard_scanlines(cinfo, rows_left);

	3691 +}

	3692 +

	3693 +/*

	3694 + * Skips some scanlines of data from the JPEG decompressor.

	3695 + *

	3696 + * The return value will be the number of lines actually skipped. If skipping

	3697 + * num_lines would move beyond the end of the image, then the actual number of

	3698 + * lines remaining in the image is returned. Otherwise, the return value will

	3699 + * be equal to num_lines.

	3700 + *

	3701 + * Refer to libjpeg.txt for more information.

	3702 + */

	3703 +

	3704 +GLOBAL(JDIMENSION)

	3705 +jpeg_skip_scanlines (j_decompress_ptr cinfo, JDIMENSION num_lines)

	3706 +{

	3707 + my_main_ptr main_ptr = (my_main_ptr) cinfo->main;

	3708 + my_coef_ptr coef = (my_coef_ptr) cinfo->coef;

	3709 + my_upsample_ptr upsample = (my_upsample_ptr) cinfo->upsample;

	3710 + JDIMENSION i, x;

	3711 + int y;

	3712 + JDIMENSION lines_per_iMCU_row, lines_left_in_iMCU_row, lines_after_iMCU_row;

	3713 + JDIMENSION lines_to_skip, lines_to_read;

	3714 +

	3715 + if (cinfo->global_state != DSTATE_SCANNING)

	3716 + ERREXIT1(cinfo, JERR_BAD_STATE, cinfo->global_state);

	3717 +

	3718 + /* Do not skip past the bottom of the image. */

	3719 + if (cinfo->output_scanline + num_lines >= cinfo->output_height) {

	3720 + num_lines = cinfo->output_height - cinfo->output_scanline;

	3721 + cinfo->output_scanline = cinfo->output_height;

	+ return num_lines;

	3722 + }

	3723 +

	3724 + if (num_lines == 0)

	3725 + return 0;

	3726 +

	3727 + lines_per_iMCU_row = cinfo->_min_DCT_scaled_size * cinfo->max_v_samp_factor;

	3728 + lines_left_in_iMCU_row =

	3729 + (lines_per_iMCU_row - (cinfo->output_scanline % lines_per_iMCU_row)) %

	3730 + lines_per_iMCU_row;

	3731 + lines_after_iMCU_row = num_lines - lines_left_in_iMCU_row;

	3732 +

	3733 + /* Skip the lines remaining in the current iMCU row. When upsampling

	3734 + * requires context rows, we need the previous and next rows in order to read

	3735 + * the current row. This adds some complexity.

	3736 + */

	3737 + if (cinfo->upsample->need_context_rows) {

	3738 + /* If the skipped lines would not move us past the current iMCU row, we

	3739 + * read the lines and ignore them. There might be a faster way of doing

	3740 + * this, but we are facing increasing complexity for diminishing returns.

	3741 + * The increasing complexity would be a by-product of meddling with the

	3742 + * state machine used to skip context rows. Near the end of an iMCU row,

	3743 + * the next iMCU row may have already been entropy-decoded. In this unique

	3744 + * case, we will read the next iMCU row if we cannot skip past it as well.

	3745 + */

	3746 + if ((num_lines < lines_left_in_iMCU_row + 1) ||

	3747 + (lines_left_in_iMCU_row <= 1 && main_ptr->buffer_full &&

	3748 + lines_after_iMCU_row < lines_per_iMCU_row + 1)) {

	3749 + read_and_discard_scanlines(cinfo, num_lines);

	3750 + return num_lines;

	3751 + }

	3752 +

	3753 + /* If the next iMCU row has already been entropy-decoded, make sure that

	3754 + * we do not skip too far.

	3755 + */

	3756 + if (lines_left_in_iMCU_row <= 1 && main_ptr->buffer_full) {

	3757 + cinfo->output_scanline += lines_left_in_iMCU_row + lines_per_iMCU_row;

	3758 + lines_after_iMCU_row -= lines_per_iMCU_row;

	3759 + } else {

	3760 + cinfo->output_scanline += lines_left_in_iMCU_row;

	3761 + }

	3762 +

	3763 + /* If we have just completed the first block, adjust the buffer pointers */

	3764 + if (main_ptr->iMCU_row_ctr == 0 ||

	3765 + (main_ptr->iMCU_row_ctr == 1 && lines_left_in_iMCU_row > 2))

	3766 + set_wraparound_pointers(cinfo);

	3767 + main_ptr->buffer_full = FALSE;

	3768 + main_ptr->rowgroup_ctr = 0;

	3769 + main_ptr->context_state = CTX_PREPARE_FOR_IMCU;

	3770 + upsample->next_row_out = cinfo->max_v_samp_factor;

	3771 + upsample->rows_to_go = cinfo->output_height - cinfo->output_scanline;

	3772 + }

	3773 +

	3774 + /* Skipping is much simpler when context rows are not required. */

	3775 + else {

	3776 + if (num_lines < lines_left_in_iMCU_row) {

	3777 + increment_simple_rowgroup_ctr(cinfo, num_lines);

	3778 + return num_lines;

	3779 + } else {

	3780 + cinfo->output_scanline += lines_left_in_iMCU_row;

	3781 + main_ptr->buffer_full = FALSE;

	3782 + main_ptr->rowgroup_ctr = 0;

	3783 + upsample->next_row_out = cinfo->max_v_samp_factor;

	3784 + upsample->rows_to_go = cinfo->output_height - cinfo->output_scanline;

	3785 + }

	3786 + }

	3787 +

	3788 + /* Calculate how many full iMCU rows we can skip. */

	3789 + if (cinfo->upsample->need_context_rows)

	3790 + lines_to_skip = ((lines_after_iMCU_row - 1) / lines_per_iMCU_row) *

	3791 + lines_per_iMCU_row;

	3792 + else

	3793 + lines_to_skip = (lines_after_iMCU_row / lines_per_iMCU_row) *

	3794 + lines_per_iMCU_row;

	3795 + /* Calculate the number of lines that remain to be skipped after skipping all

	3796 + * of the full iMCU rows that we can. We will not read these lines unless we

	3797 + * have to.

	3798 + */

	3799 + lines_to_read = lines_after_iMCU_row - lines_to_skip;

	3800 +

	3801 + /* For images requiring multiple scans (progressive, non-interleaved, etc.),

	3802 + * all of the entropy decoding occurs in jpeg_start_decompress(), assuming

	3803 + * that the input data source is non-suspending. This makes skipping easy.

	3804 + */

	3805 + if (cinfo->inputctl->has_multiple_scans) {

	3806 + if (cinfo->upsample->need_context_rows) {

	3807 + cinfo->output_scanline += lines_to_skip;

	3808 + cinfo->output_iMCU_row += lines_to_skip / lines_per_iMCU_row;

	3809 + main_ptr->iMCU_row_ctr += lines_after_iMCU_row / lines_per_iMCU_row;

	3810 + /* It is complex to properly move to the middle of a context block, so

	3811 + * read the remaining lines instead of skipping them.

	3812 + */

	3813 + read_and_discard_scanlines(cinfo, lines_to_read);

	3814 + } else {

	3815 + cinfo->output_scanline += lines_to_skip;

	3816 + cinfo->output_iMCU_row += lines_to_skip / lines_per_iMCU_row;

	3817 + increment_simple_rowgroup_ctr(cinfo, lines_to_read);

	3818 + }

	3819 + upsample->rows_to_go = cinfo->output_height - cinfo->output_scanline;

	3820 + return num_lines;

	3821 + }

	3822 +

	3823 + /* Skip the iMCU rows that we can safely skip. */

	3824 + for (i = 0; i < lines_to_skip; i += lines_per_iMCU_row) {

	3825 + for (y = 0; y < coef->MCU_rows_per_iMCU_row; y++) {

	3826 + for (x = 0; x < cinfo->MCUs_per_row; x++) {

	3827 + /* Calling decode_mcu() with a NULL pointer causes it to discard the

	3828 + * decoded coefficients. This is ~5% faster for large subsets, but

	3829 + * it's tough to tell a difference for smaller images.

	3830 + */

	3831 + (*cinfo->entropy->decode_mcu) (cinfo, NULL);

	3832 + }

	3833 + }

	3834 + cinfo->input_iMCU_row++;

	3835 + cinfo->output_iMCU_row++;

	3836 + if (cinfo->input_iMCU_row < cinfo->total_iMCU_rows)

	3837 + start_iMCU_row(cinfo);

	3838 + else

	3839 + (*cinfo->inputctl->finish_input_pass) (cinfo);

	3840 + }

	3841 + cinfo->output_scanline += lines_to_skip;

	3842 +

	3843 + if (cinfo->upsample->need_context_rows) {

	3844 + /* Context-based upsampling keeps track of iMCU rows. */

	3845 + main_ptr->iMCU_row_ctr += lines_to_skip / lines_per_iMCU_row;

	3846 +

	3847 + /* It is complex to properly move to the middle of a context block, so

	3848 + * read the remaining lines instead of skipping them.

	3849 + */

	3850 + read_and_discard_scanlines(cinfo, lines_to_read);

	3851 + } else {

	3852 + increment_simple_rowgroup_ctr(cinfo, lines_to_read);

	3853 + }

	3854 +

	3855 + /* Since skipping lines involves skipping the upsampling step, the value of

	3856 + * "rows_to_go" will become invalid unless we set it here. NOTE: This is a

	3857 + * bit odd, since "rows_to_go" seems to be redundantly keeping track of

	3858 + * output_scanline.

	3859 + */

	3860 + upsample->rows_to_go = cinfo->output_height - cinfo->output_scanline;

	3861 +

	3862 + /* Always skip the requested number of lines. */

	3863 + return num_lines;

	3864 +}

	3865 +

	3866 +/*

	3867 * Alternate entry point to read raw data.

	3868 * Processes exactly one iMCU row per call, unless suspended.

	3869 */

	3870 @@ -202,7 +435,7 @@

	3871 }

	3872

	3873 /* Verify that at least one iMCU row can be returned. */

	3874 - lines_per_iMCU_row = cinfo->max_v_samp_factor * cinfo->min_DCT_scaled_size;

	3875 + lines_per_iMCU_row = cinfo->max_v_samp_factor * cinfo->_min_DCT_scaled_size;

	3876 if (max_lines < lines_per_iMCU_row)

	3877 ERREXIT(cinfo, JERR_BUFFER_SIZE);

	3878
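jpeg_skip_scanlines() splits a request into three parts: the lines needed to finish the current iMCU row, a run of whole iMCU rows whose coefficients are entropy-decoded and discarded, and a remainder that is either read and discarded (context upsampling) or absorbed by the row-group counter. The sketch below repeats only that bookkeeping so the split can be checked by hand. `skip_plan` and `plan_skip` are hypothetical names. The sketch assumes the request stops short of the bottom of the image and ignores the `buffer_full` special case that the real function handles near the end of an iMCU row.

```c
/* JDIMENSION is typedef'd as unsigned int in libjpeg's jmorecfg.h. */
typedef unsigned int dim_t;

/* Hypothetical summary of how jpeg_skip_scanlines() splits num_lines.
 * Invariant: finish_row + lines_to_skip + lines_to_read == num_lines. */
typedef struct {
  dim_t lines_per_imcu_row;  /* min DCT scaled size * max_v_samp_factor */
  dim_t finish_row;          /* lines jumped to reach the next iMCU row */
  dim_t lines_to_skip;       /* whole iMCU rows decoded and discarded */
  dim_t lines_to_read;       /* remainder handled row by row */
} skip_plan;

static skip_plan plan_skip(dim_t output_scanline, dim_t num_lines,
                           dim_t min_dct_scaled_size, dim_t max_v_samp_factor,
                           int need_context_rows) {
  skip_plan p;
  dim_t left, after;

  p.lines_per_imcu_row = min_dct_scaled_size * max_v_samp_factor;
  left = (p.lines_per_imcu_row - output_scanline % p.lines_per_imcu_row) %
         p.lines_per_imcu_row;

  /* Request ends inside the current iMCU row: nothing is skipped wholesale.
   * With context rows, the real code also takes this path when num_lines
   * exactly reaches the row boundary. */
  if (num_lines < left + (need_context_rows ? 1u : 0u)) {
    p.finish_row = 0;
    p.lines_to_skip = 0;
    p.lines_to_read = num_lines;
    return p;
  }

  p.finish_row = left;
  after = num_lines - left;  /* >= 1 whenever need_context_rows is set */

  /* Context upsampling rounds (after - 1) down, so at least one line is
   * always left to be read through jpeg_read_scanlines(). */
  if (need_context_rows)
    p.lines_to_skip = ((after - 1) / p.lines_per_imcu_row) * p.lines_per_imcu_row;
  else
    p.lines_to_skip = (after / p.lines_per_imcu_row) * p.lines_per_imcu_row;

  p.lines_to_read = after - p.lines_to_skip;
  return p;
}
```

With 2x vertical subsampling (16 lines per iMCU row), skipping 32 lines from the top skips two whole iMCU rows without context upsampling but only one with it, because the function reads the final lines instead of moving into the middle of a context block.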

	3879 Index: jdatadst.c

	3880 ===================================================================

	3881 --- jdatadst.c (revision 829)

	3882 +++ jdatadst.c (working copy)

	3883 @@ -1,14 +1,17 @@

	3884 /*

	3885 * jdatadst.c

	3886 *

	3887 + * This file was part of the Independent JPEG Group's software:

	3888 * Copyright (C) 1994-1996, Thomas G. Lane.

	3889 - * This file is part of the Independent JPEG Group's software.

	3890 + * Modified 2009-2012 by Guido Vollbeding.

	3891 + * libjpeg-turbo Modifications:

	3892 + * Copyright (C) 2013, D. R. Commander.

	3893 * For conditions of distribution and use, see the accompanying README file.

	3894 *

	3895 * This file contains compression data destination routines for the case of

	3896 - * emitting JPEG data to a file (or any stdio stream). While these routines

	3897 - * are sufficient for most applications, some will want to use a different

	3898 - * destination manager.

	3899 + * emitting JPEG data to memory or to a file (or any stdio stream).

	3900 + * While these routines are sufficient for most applications,

	3901 + * some will want to use a different destination manager.

	3902 * IMPORTANT: we assume that fwrite() will correctly transcribe an array of

	3903 * JOCTETs into 8-bit-wide elements on external storage. If char is wider

	3904 * than 8 bits on your machine, you may need to do some tweaking.

	3905 @@ -19,7 +22,12 @@

	3906 #include "jpeglib.h"

	3907 #include "jerror.h"

	3908

	3909 +#ifndef HAVE_STDLIB_H /* <stdlib.h> should declare malloc(),free() */

	3910 +extern void * malloc JPP((size_t size));

	3911 +extern void free JPP((void *ptr));

	3912 +#endif

	3913

	3914 +

	3915 /* Expanded data destination object for stdio output */

	3916

	3917 typedef struct {

	3918 @@ -34,6 +42,23 @@

	3919 #define OUTPUT_BUF_SIZE 4096 /* choose an efficiently fwrite'able size */

	3920

	3921

	3922 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED)

	3923 +/* Expanded data destination object for memory output */

	3924 +

	3925 +typedef struct {

	3926 + struct jpeg_destination_mgr pub; /* public fields */

	3927 +

	3928 + unsigned char ** outbuffer; /* target buffer */

	3929 + unsigned long * outsize;

	3930 + unsigned char * newbuffer; /* newly allocated buffer */

	3931 + JOCTET * buffer; /* start of buffer */

	3932 + size_t bufsize;

	3933 +} my_mem_destination_mgr;

	3934 +

	3935 +typedef my_mem_destination_mgr * my_mem_dest_ptr;

	3936 +#endif

	3937 +

	3938 +

	3939 /*

	3940 * Initialize destination --- called by jpeg_start_compress

	3941 * before any data is actually written.

	3942 @@ -53,7 +78,15 @@

	3943 dest->pub.free_in_buffer = OUTPUT_BUF_SIZE;

	3944 }

	3945

	3946 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED)

	3947 +METHODDEF(void)

	3948 +init_mem_destination (j_compress_ptr cinfo)

	3949 +{

	3950 + /* no work necessary here */

	3951 +}

	3952 +#endif

	3953

	3954 +

	3955 /*

	3956 * Empty the output buffer --- called whenever buffer fills up.

	3957 *

	3958 @@ -92,7 +125,39 @@

	3959 return TRUE;

	3960 }

	3961

	3962 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED)

	3963 +METHODDEF(boolean)

	3964 +empty_mem_output_buffer (j_compress_ptr cinfo)

	3965 +{

	3966 + size_t nextsize;

	3967 + JOCTET * nextbuffer;

	3968 + my_mem_dest_ptr dest = (my_mem_dest_ptr) cinfo->dest;

	3969

	3970 + /* Try to allocate new buffer with double size */

	3971 + nextsize = dest->bufsize * 2;

	3972 + nextbuffer = (JOCTET *) malloc(nextsize);

	3973 +

	3974 + if (nextbuffer == NULL)

	3975 + ERREXIT1(cinfo, JERR_OUT_OF_MEMORY, 10);

	3976 +

	3977 + MEMCOPY(nextbuffer, dest->buffer, dest->bufsize);

	3978 +

	3979 + if (dest->newbuffer != NULL)

	3980 + free(dest->newbuffer);

	3981 +

	3982 + dest->newbuffer = nextbuffer;

	3983 +

	3984 + dest->pub.next_output_byte = nextbuffer + dest->bufsize;

	3985 + dest->pub.free_in_buffer = dest->bufsize;

	3986 +

	3987 + dest->buffer = nextbuffer;

	3988 + dest->bufsize = nextsize;

	3989 +

	3990 + return TRUE;

	3991 +}

	3992 +#endif
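empty_mem_output_buffer() grows the destination by doubling. It allocates twice the current size, copies the bytes already written, frees the previous buffer only if the library allocated it, and points the write cursor at the start of the new, still-empty upper half. The sketch below models that policy outside libjpeg. `struct mem_sink` and the `sink_*` functions are hypothetical, and `sink_size()` corresponds to the byte count that term_mem_destination() stores in `*outsize`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of my_mem_destination_mgr's bookkeeping. */
struct mem_sink {
  unsigned char *buffer;     /* start of the current buffer */
  size_t bufsize;            /* size of the current buffer */
  unsigned char *newbuffer;  /* buffer we allocated (NULL if caller's) */
  unsigned char *next;       /* next byte to write */
  size_t free_in_buffer;     /* bytes left in the current buffer */
};

/* Assumes size > 0; jpeg_mem_dest() ensures this for the real manager. */
static void sink_init(struct mem_sink *s, unsigned char *buf, size_t size)
{
  s->buffer = s->next = buf;
  s->bufsize = s->free_in_buffer = size;
  s->newbuffer = NULL;
}

/* Buffer full: double it, keep what was written, and free the previous
 * buffer only if we allocated it (a caller-supplied one is left alone). */
static int sink_grow(struct mem_sink *s)
{
  size_t nextsize = s->bufsize * 2;
  unsigned char *nextbuffer = malloc(nextsize);

  if (nextbuffer == NULL)
    return 0;                         /* libjpeg would ERREXIT here */
  memcpy(nextbuffer, s->buffer, s->bufsize);
  free(s->newbuffer);                 /* free(NULL) is a no-op */
  s->newbuffer = nextbuffer;
  s->next = nextbuffer + s->bufsize;  /* continue after the copied bytes */
  s->free_in_buffer = s->bufsize;     /* the new upper half is free */
  s->buffer = nextbuffer;
  s->bufsize = nextsize;
  return 1;
}

static int sink_put(struct mem_sink *s, unsigned char byte)
{
  if (s->free_in_buffer == 0 && !sink_grow(s))
    return 0;
  *s->next++ = byte;
  s->free_in_buffer--;
  return 1;
}

/* Same computation as term_mem_destination(): bytes actually written. */
static size_t sink_size(const struct mem_sink *s)
{
  return s->bufsize - s->free_in_buffer;
}
```

Doubling keeps the total copying cost linear in the output size, at the price of up to 2x slack in the final allocation.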

	3993 +

	3994 +

	3995 /*

	3996 * Terminate destination --- called by jpeg_finish_compress

	3997 * after all data has been written. Usually needs to flush buffer.

	3998 @@ -119,7 +184,18 @@

	3999 ERREXIT(cinfo, JERR_FILE_WRITE);

	4000 }

	4001

	4002 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED)

	4003 +METHODDEF(void)

	4004 +term_mem_destination (j_compress_ptr cinfo)

	4005 +{

	4006 + my_mem_dest_ptr dest = (my_mem_dest_ptr) cinfo->dest;

	4007

	4008 + *dest->outbuffer = dest->buffer;

	4009 + *dest->outsize = (unsigned long)(dest->bufsize - dest->pub.free_in_buffer);

	4010 +}

	4011 +#endif

	4012 +

	4013 +

	4014 /*

	4015 * Prepare for output to a stdio stream.

	4016 * The caller must have already opened the stream, and is responsible

	4017 @@ -149,3 +225,55 @@

	4018 dest->pub.term_destination = term_destination;

	4019 dest->outfile = outfile;

	4020 }

	4021 +

	4022 +

	4023 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED)

	4024 +/*

	4025 + * Prepare for output to a memory buffer.

	4026 + * The caller may supply an own initial buffer with appropriate size.

	4027 + * Otherwise, or when the actual data output exceeds the given size,

	4028 + * the library adapts the buffer size as necessary.

	4029 + * The standard library functions malloc/free are used for allocating

	4030 + * larger memory, so the buffer is available to the application after

	4031 + * finishing compression, and then the application is responsible for

	4032 + * freeing the requested memory.

	4033 + */

	4034 +

	4035 +GLOBAL(void)

	4036 +jpeg_mem_dest (j_compress_ptr cinfo,

	4037 + unsigned char ** outbuffer, unsigned long * outsize)

	4038 +{

	4039 + my_mem_dest_ptr dest;

	4040 +

	4041 + if (outbuffer == NULL || outsize == NULL) /* sanity check */

	4042 + ERREXIT(cinfo, JERR_BUFFER_SIZE);

	4043 +

	4044 + /* The destination object is made permanent so that multiple JPEG images

	4045 + * can be written to the same buffer without re-executing jpeg_mem_dest.

	4046 + */

	4047 + if (cinfo->dest == NULL) { /* first time for this JPEG object? */

	4048 + cinfo->dest = (struct jpeg_destination_mgr *)

	4049 + (*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_PERMANENT,

	4050 + SIZEOF(my_mem_destination_mgr));

	4051 + }

	4052 +

	4053 + dest = (my_mem_dest_ptr) cinfo->dest;

	4054 + dest->pub.init_destination = init_mem_destination;

	4055 + dest->pub.empty_output_buffer = empty_mem_output_buffer;

	4056 + dest->pub.term_destination = term_mem_destination;

	4057 + dest->outbuffer = outbuffer;

	4058 + dest->outsize = outsize;

	4059 + dest->newbuffer = NULL;

	4060 +

	4061 + if (*outbuffer == NULL || *outsize == 0) {

	4062 + /* Allocate initial buffer */

	4063 + dest->newbuffer = *outbuffer = (unsigned char *) malloc(OUTPUT_BUF_SIZE);

	4064 + if (dest->newbuffer == NULL)

	4065 + ERREXIT1(cinfo, JERR_OUT_OF_MEMORY, 10);

	4066 + *outsize = OUTPUT_BUF_SIZE;

	4067 + }

	4068 +

	4069 + dest->pub.next_output_byte = dest->buffer = *outbuffer;

	4070 + dest->pub.free_in_buffer = dest->bufsize = *outsize;

	4071 +}

	4072 +#endif

	4073 Index: jdatasrc.c

	4074 ===================================================================

	4075 --- jdatasrc.c (revision 829)

	4076 +++ jdatasrc.c (working copy)

	4077 @@ -1,14 +1,17 @@

	4078 /*

	4079 * jdatasrc.c

	4080 *

	4081 + * This file was part of the Independent JPEG Group's software:

	4082 * Copyright (C) 1994-1996, Thomas G. Lane.

	4083 - * This file is part of the Independent JPEG Group's software.

	4084 + * Modified 2009-2011 by Guido Vollbeding.

	4085 + * libjpeg-turbo Modifications:

	4086 + * Copyright (C) 2013, D. R. Commander.

	4087 * For conditions of distribution and use, see the accompanying README file.

	4088 *

	4089 * This file contains decompression data source routines for the case of

	4090 - * reading JPEG data from a file (or any stdio stream). While these routines

	4091 - * are sufficient for most applications, some will want to use a different

	4092 - * source manager.

	4093 + * reading JPEG data from memory or from a file (or any stdio stream).

	4094 + * While these routines are sufficient for most applications,

	4095 + * some will want to use a different source manager.

	4096 * IMPORTANT: we assume that fread() will correctly transcribe an array of

	4097 * JOCTETs from 8-bit-wide elements on external storage. If char is wider

	4098 * than 8 bits on your machine, you may need to do some tweaking.

	4099 @@ -52,7 +55,15 @@

	4100 src->start_of_file = TRUE;

	4101 }

	4102

	4103 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED)

	4104 +METHODDEF(void)

	4105 +init_mem_source (j_decompress_ptr cinfo)

	4106 +{

	4107 + /* no work necessary here */

	4108 +}

	4109 +#endif

	4110

	4111 +

	4112 /*

	4113 * Fill the input buffer --- called whenever buffer is emptied.

	4114 *

	4115 @@ -111,7 +122,30 @@

	4116 return TRUE;

	4117 }

	4118

	4119 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED)

	4120 +METHODDEF(boolean)

	4121 +fill_mem_input_buffer (j_decompress_ptr cinfo)

	4122 +{

	4123 + static const JOCTET mybuffer[4] = {

	4124 + (JOCTET) 0xFF, (JOCTET) JPEG_EOI, 0, 0

	4125 + };

	4126

	4127 + /* The whole JPEG data is expected to reside in the supplied memory

	4128 + * buffer, so any request for more data beyond the given buffer size

	4129 + * is treated as an error.

	4130 + */

	4131 + WARNMS(cinfo, JWRN_JPEG_EOF);

	4132 +

	4133 + /* Insert a fake EOI marker */

	4134 +

	4135 + cinfo->src->next_input_byte = mybuffer;

	4136 + cinfo->src->bytes_in_buffer = 2;

	4137 +

	4138 + return TRUE;

	4139 +}

	4140 +#endif
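Because jpeg_mem_src() hands the decoder the whole stream up front, any request for more data means the stream was truncated. fill_mem_input_buffer() therefore warns and feeds a fake EOI marker (0xFF 0xD9), so decoding ends cleanly instead of suspending or failing. The following standalone model illustrates that behaviour. `struct mem_source`, `mem_fill()` and `mem_read_byte()` are hypothetical, and the `warned_eof` flag stands in for `WARNMS(cinfo, JWRN_JPEG_EOF)`.

```c
#include <stddef.h>

/* Hypothetical model of the memory source; next/bytes mirror
 * next_input_byte and bytes_in_buffer in struct jpeg_source_mgr. */
struct mem_source {
  const unsigned char *next;
  size_t bytes;
  int warned_eof;  /* set where libjpeg would emit JWRN_JPEG_EOF */
};

/* Refill: there is never more data, so supply a fake End Of Image marker. */
static int mem_fill(struct mem_source *src)
{
  static const unsigned char fake_eoi[2] = { 0xFF, 0xD9 };

  src->warned_eof = 1;
  src->next = fake_eoi;
  src->bytes = 2;
  return 1;  /* TRUE: the memory source never suspends */
}

/* Read one byte, refilling when the buffer is exhausted. */
static unsigned char mem_read_byte(struct mem_source *src)
{
  if (src->bytes == 0)
    mem_fill(src);
  src->bytes--;
  return *src->next++;
}
```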

	4141 +

	4142 +

	4143 /*

	4144 * Skip data --- used to skip over a potentially large amount of

	4145 * uninteresting data (such as an APPn marker).

	4146 @@ -127,7 +161,7 @@

	4147 METHODDEF(void)

	4148 skip_input_data (j_decompress_ptr cinfo, long num_bytes)

	4149 {

	4150 - my_src_ptr src = (my_src_ptr) cinfo->src;

	4151 + struct jpeg_source_mgr * src = cinfo->src;

	4152

	4153 /* Just a dumb implementation for now. Could use fseek() except

	4154 * it doesn't work on pipes. Not clear that being smart is worth

	4155 @@ -134,15 +168,15 @@

	4156 * any trouble anyway --- large skips are infrequent.

	4157 */

	4158 if (num_bytes > 0) {

	4159 - while (num_bytes > (long) src->pub.bytes_in_buffer) {

	4160 - num_bytes -= (long) src->pub.bytes_in_buffer;

	4161 - (void) fill_input_buffer(cinfo);

	4162 + while (num_bytes > (long) src->bytes_in_buffer) {

	4163 + num_bytes -= (long) src->bytes_in_buffer;

	4164 + (void) (*src->fill_input_buffer) (cinfo);

	4165 /* note we assume that fill_input_buffer will never return FALSE,

	4166 * so suspension need not be handled.

	4167 */

	4168 }

	4169 - src->pub.next_input_byte += (size_t) num_bytes;

	4170 - src->pub.bytes_in_buffer -= (size_t) num_bytes;

	4171 + src->next_input_byte += (size_t) num_bytes;

	4172 + src->bytes_in_buffer -= (size_t) num_bytes;

	4173 }

	4174 }

	4175
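The change above makes skip_input_data() refill through the `src->fill_input_buffer` function pointer instead of calling the stdio-specific fill_input_buffer() directly. This lets jpeg_mem_src() reuse the same skip routine together with fill_mem_input_buffer(). Below is a standalone sketch of the algorithm: it discards whole buffers until the remainder fits, then advances inside the current buffer. `struct source`, `skip_bytes()` and the chunked test source are hypothetical.

```c
#include <stddef.h>

/* Hypothetical model of struct jpeg_source_mgr plus test-source state. */
struct source {
  const unsigned char *next_input_byte;
  size_t bytes_in_buffer;
  int (*fill_input_buffer)(struct source *src);
  const unsigned char *data;  /* backing data for chunked_fill() */
  size_t len, pos, chunk;
};

/* Same loop as skip_input_data(); like the original, it assumes the fill
 * routine never returns FALSE (no suspension). */
static void skip_bytes(struct source *src, long num_bytes)
{
  if (num_bytes > 0) {
    while (num_bytes > (long) src->bytes_in_buffer) {
      num_bytes -= (long) src->bytes_in_buffer;
      (void) (*src->fill_input_buffer)(src);  /* indirect, source-agnostic */
    }
    src->next_input_byte += (size_t) num_bytes;
    src->bytes_in_buffer -= (size_t) num_bytes;
  }
}

/* Example fill routine: hands out the data `chunk` bytes at a time and,
 * once exhausted, supplies a fake EOI so skipping always terminates. */
static int chunked_fill(struct source *src)
{
  static const unsigned char fake_eoi[2] = { 0xFF, 0xD9 };
  size_t n = src->len - src->pos;

  if (n == 0) {
    src->next_input_byte = fake_eoi;
    src->bytes_in_buffer = 2;
    return 1;
  }
  if (n > src->chunk)
    n = src->chunk;
  src->next_input_byte = src->data + src->pos;
  src->bytes_in_buffer = n;
  src->pos += n;
  return 1;
}
```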

	4176 @@ -210,3 +244,40 @@

	4177 src->pub.bytes_in_buffer = 0; /* forces fill_input_buffer on first read */

	4178 src->pub.next_input_byte = NULL; /* until buffer loaded */

	4179 }

	4180 +

	4181 +

	4182 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED)

	4183 +/*

	4184 + * Prepare for input from a supplied memory buffer.

	4185 + * The buffer must contain the whole JPEG data.

	4186 + */

	4187 +

	4188 +GLOBAL(void)

	4189 +jpeg_mem_src (j_decompress_ptr cinfo,

	4190 + unsigned char * inbuffer, unsigned long insize)

	4191 +{

	4192 + struct jpeg_source_mgr * src;

	4193 +

	4194 + if (inbuffer == NULL \|\| insize == 0) /* Treat empty input as fatal error */

	4195 + ERREXIT(cinfo, JERR_INPUT_EMPTY);

	4196 +

	4197 + /* The source object is made permanent so that a series of JPEG images

	4198 + * can be read from the same buffer by calling jpeg_mem_src only before

	4199 + * the first one.

	4200 + */

	4201 + if (cinfo->src == NULL) { /* first time for this JPEG object? */

	4202 + cinfo->src = (struct jpeg_source_mgr *)

	4203 + (*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_PERMANENT,

	4204 + SIZEOF(struct jpeg_source_mgr));

	4205 + }

	4206 +

	4207 + src = cinfo->src;

	4208 + src->init_source = init_mem_source;

	4209 + src->fill_input_buffer = fill_mem_input_buffer;

	4210 + src->skip_input_data = skip_input_data;

	4211 + src->resync_to_restart = jpeg_resync_to_restart; /* use default method */

	4212 + src->term_source = term_source;

	4213 + src->bytes_in_buffer = (size_t) insize;

	4214 + src->next_input_byte = (JOCTET *) inbuffer;

	4215 +}

	4216 +#endif

	4217 Index: jdcoefct.c

	4218 ===================================================================

	4219 --- jdcoefct.c (revision 829)

	4220 +++ jdcoefct.c (working copy)

	4221 @@ -1,8 +1,11 @@

	4222 /*

	4223 * jdcoefct.c

	4224 *

	4225 + * This file was part of the Independent JPEG Group's software:

	4226 * Copyright (C) 1994-1997, Thomas G. Lane.

	4227 - * This file is part of the Independent JPEG Group's software.

	4228 + * libjpeg-turbo Modifications:

	4229 + * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB

	4230 + * Copyright (C) 2010, D. R. Commander.

	4231 * For conditions of distribution and use, see the accompanying README file.

	4232 *

	4233 * This file contains the coefficient buffer controller for decompression.

	4234 @@ -14,56 +17,10 @@

	4235 * Also, the input side (only) is used when reading a file for transcoding.

	4236 */

	4237

	4238 -#define JPEG_INTERNALS

	4239 -#include "jinclude.h"

	4240 -#include "jpeglib.h"

	4241 +#include "jdcoefct.h"

	4242 +#include "jpegcomp.h"

	4243

	4244 -/* Block smoothing is only applicable for progressive JPEG, so: */

	4245 -#ifndef D_PROGRESSIVE_SUPPORTED

	4246 -#undef BLOCK_SMOOTHING_SUPPORTED

	4247 -#endif

	4248

	4249 -/* Private buffer controller object */

	4250 -

	4251 -typedef struct {

	4252 - struct jpeg_d_coef_controller pub; /* public fields */

	4253 -

	4254 - /* These variables keep track of the current location of the input side. */

	4255 - /* cinfo->input_iMCU_row is also used for this. */

	4256 - JDIMENSION MCU_ctr; /* counts MCUs processed in current row */

	4257 - int MCU_vert_offset; /* counts MCU rows within iMCU row */

	4258 - int MCU_rows_per_iMCU_row; /* number of such rows needed */

	4259 -

	4260 - /* The output side's location is represented by cinfo->output_iMCU_row. */

	4261 -

	4262 - /* In single-pass modes, it's sufficient to buffer just one MCU.

	4263 - * We allocate a workspace of D_MAX_BLOCKS_IN_MCU coefficient blocks,

	4264 - * and let the entropy decoder write into that workspace each time.

	4265 - * (On 80x86, the workspace is FAR even though it's not really very big;

	4266 - * this is to keep the module interfaces unchanged when a large coefficient

	4267 - * buffer is necessary.)

	4268 - * In multi-pass modes, this array points to the current MCU's blocks

	4269 - * within the virtual arrays; it is used only by the input side.

	4270 - */

	4271 - JBLOCKROW MCU_buffer[D_MAX_BLOCKS_IN_MCU];

	4272 -

	4273 - /* Temporary workspace for one MCU */

	4274 - JCOEF * workspace;

	4275 -

	4276 -#ifdef D_MULTISCAN_FILES_SUPPORTED

	4277 - /* In multi-pass modes, we need a virtual block array for each component. */

	4278 - jvirt_barray_ptr whole_image[MAX_COMPONENTS];

	4279 -#endif

	4280 -

	4281 -#ifdef BLOCK_SMOOTHING_SUPPORTED

	4282 - /* When doing block smoothing, we latch coefficient Al values here */

	4283 - int * coef_bits_latch;

	4284 -#define SAVED_COEFS 6 /* we save coef_bits[0..5] */

	4285 -#endif

	4286 -} my_coef_controller;

	4287 -

	4288 -typedef my_coef_controller * my_coef_ptr;

	4289 -

	4290 /* Forward declarations */

	4291 METHODDEF(int) decompress_onepass

	4292 JPP((j_decompress_ptr cinfo, JSAMPIMAGE output_buf));

	4293 @@ -78,30 +35,6 @@

	4294 #endif

	4295

	4296

	4297 -LOCAL(void)

	4298 -start_iMCU_row (j_decompress_ptr cinfo)

	4299 -/* Reset within-iMCU-row counters for a new row (input side) */

	4300 -{

	4301 - my_coef_ptr coef = (my_coef_ptr) cinfo->coef;

	4302 -

	4303 - /* In an interleaved scan, an MCU row is the same as an iMCU row.

	4304 - * In a noninterleaved scan, an iMCU row has v_samp_factor MCU rows.

	4305 - * But at the bottom of the image, process only what's left.

	4306 - */

	4307 - if (cinfo->comps_in_scan > 1) {

	4308 - coef->MCU_rows_per_iMCU_row = 1;

	4309 - } else {

	4310 - if (cinfo->input_iMCU_row < (cinfo->total_iMCU_rows-1))

	4311 - coef->MCU_rows_per_iMCU_row = cinfo->cur_comp_info[0]->v_samp_factor;

	4312 - else

	4313 - coef->MCU_rows_per_iMCU_row = cinfo->cur_comp_info[0]->last_row_height;

	4314 - }

	4315 -

	4316 - coef->MCU_ctr = 0;

	4317 - coef->MCU_vert_offset = 0;

	4318 -}

	4319 -

	4320 -

	4321 /*

	4322 * Initialize for an input processing pass.

	4323 */

	4324 @@ -190,7 +123,7 @@

	4325 useful_width = (MCU_col_num < last_MCU_col) ? compptr->MCU_width

	4326 : compptr->last_col_width;

	4327 output_ptr = output_buf[compptr->component_index] +

	4328 - yoffset * compptr->DCT_scaled_size;

	4329 + yoffset * compptr->_DCT_scaled_size;

	4330 start_col = MCU_col_num * compptr->MCU_sample_width;

	4331 for (yindex = 0; yindex < compptr->MCU_height; yindex++) {

	4332 if (cinfo->input_iMCU_row < last_iMCU_row ||

	4333 @@ -200,11 +133,11 @@

	4334 (*inverse_DCT) (cinfo, compptr,

	4335 (JCOEFPTR) coef->MCU_buffer[blkn+xindex],

	4336 output_ptr, output_col);

	4337 - output_col += compptr->DCT_scaled_size;

	4338 + output_col += compptr->_DCT_scaled_size;

	4339 }

	4340 }

	4341 blkn += compptr->MCU_width;

	4342 - output_ptr += compptr->DCT_scaled_size;

	4343 + output_ptr += compptr->_DCT_scaled_size;

	4344 }

	4345 }

	4346 }

	4347 @@ -365,9 +298,9 @@

	4348 (*inverse_DCT) (cinfo, compptr, (JCOEFPTR) buffer_ptr,

	4349 output_ptr, output_col);

	4350 buffer_ptr++;

	4351 - output_col += compptr->DCT_scaled_size;

	4352 + output_col += compptr->_DCT_scaled_size;

	4353 }

	4354 - output_ptr += compptr->DCT_scaled_size;

	4355 + output_ptr += compptr->_DCT_scaled_size;

	4356 }

	4357 }

	4358

	4359 @@ -660,9 +593,9 @@

	4360 DC4 = DC5; DC5 = DC6;

	4361 DC7 = DC8; DC8 = DC9;

	4362 buffer_ptr++, prev_block_row++, next_block_row++;

	4363 - output_col += compptr->DCT_scaled_size;

	4364 + output_col += compptr->_DCT_scaled_size;

	4365 }

	4366 - output_ptr += compptr->DCT_scaled_size;

	4367 + output_ptr += compptr->_DCT_scaled_size;

	4368 }

	4369 }

	4370
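The start_iMCU_row() body removed from jdcoefct.c above (the patch now pulls shared definitions from jdcoefct.h) decides how many MCU rows make up the current iMCU row. Here is a standalone sketch of that rule. The function name is hypothetical, and its parameters stand in for `cinfo->comps_in_scan`, `cinfo->input_iMCU_row`, `cinfo->total_iMCU_rows` and the first component's `v_samp_factor` and `last_row_height`.

```c
/* Interleaved scans: one MCU row per iMCU row. Noninterleaved scans:
 * v_samp_factor MCU rows, except at the bottom of the image, where only
 * the remaining last_row_height rows are processed. Assumes
 * total_imcu_rows >= 1. */
static int mcu_rows_per_imcu_row(int comps_in_scan, unsigned int input_imcu_row,
                                 unsigned int total_imcu_rows,
                                 int v_samp_factor, int last_row_height)
{
  if (comps_in_scan > 1)
    return 1;
  if (input_imcu_row < total_imcu_rows - 1)
    return v_samp_factor;
  return last_row_height;
}
```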

	4371 Index: jdcolor.c

	4372 ===================================================================

	4373 --- jdcolor.c (revision 829)

	4374 +++ jdcolor.c (working copy)

	4375 @@ -1,10 +1,12 @@

	4376 /*

	4377 * jdcolor.c

	4378 *

	4379 + * This file was part of the Independent JPEG Group's software:

	4380 * Copyright (C) 1991-1997, Thomas G. Lane.

	4381 + * Modified 2011 by Guido Vollbeding.

	4382 + * libjpeg-turbo Modifications:

	4383 * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB

	4384 - * Copyright (C) 2009, D. R. Commander.

	4385 - * This file is part of the Independent JPEG Group's software.

	4386 + * Copyright (C) 2009, 2011-2012, D. R. Commander.

	4387 * For conditions of distribution and use, see the accompanying README file.

	4388 *

	4389 * This file contains output colorspace conversion routines.

	4390 @@ -14,6 +16,7 @@

	4391 #include "jinclude.h"

	4392 #include "jpeglib.h"

	4393 #include "jsimd.h"

	4394 +#include "config.h"

	4395

	4396

	4397 /* Private subobject */

	4398 @@ -26,6 +29,9 @@

	4399 int * Cb_b_tab; /* => table for Cb to B conversion */

	4400 INT32 * Cr_g_tab; /* => table for Cr to G conversion */

	4401 INT32 * Cb_g_tab; /* => table for Cb to G conversion */

	4402 +

	4403 + /* Private state for RGB->Y conversion */

	4404 + INT32 * rgb_y_tab; /* => table for RGB to Y conversion */

	4405 } my_color_deconverter;

	4406

	4407 typedef my_color_deconverter * my_cconvert_ptr;

	4408 @@ -32,14 +38,19 @@

	4409

	4410

	4411 /************** YCbCr -> RGB conversion: most common case ************/

	4412 +/************** RGB -> Y conversion: less common case ************/

	4413

	4414 /*

	4415 * YCbCr is defined per CCIR 601-1, except that Cb and Cr are

	4416 * normalized to the range 0..MAXJSAMPLE rather than -0.5 .. 0.5.

	4417 * The conversion equations to be implemented are therefore

	4418 + *

	4419 * R = Y + 1.40200 * Cr

	4420 * G = Y - 0.34414 * Cb - 0.71414 * Cr

	4421 * B = Y + 1.77200 * Cb

	4422 + *

	4423 + * Y = 0.29900 * R + 0.58700 * G + 0.11400 * B

	4424 + *

	4425 * where Cb and Cr represent the incoming values less CENTERJSAMPLE.

	4426 * (These numbers are derived from TIFF 6.0 section 21, dated 3-June-92.)

	4427 *

	4428 @@ -64,7 +75,132 @@

	4429 #define ONE_HALF ((INT32) 1 << (SCALEBITS-1))

	4430 #define FIX(x) ((INT32) ((x) * (1L<<SCALEBITS) + 0.5))

	4431
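The YCbCr->RGB equations above are evaluated in fixed point: each coefficient is scaled by 2^SCALEBITS with FIX(), ONE_HALF is added for rounding, and the result is shifted back down. jdcolor.c precomputes these products in per-value tables. The sketch below instead evaluates the same formulas directly, as an illustration only. `ycc_to_rgb()` and `clamp255()` are hypothetical, and clamping stands in for the library's `range_limit` table.

```c
#include <stdint.h>

#define SCALEBITS 16
#define ONE_HALF ((int32_t) 1 << (SCALEBITS - 1))
#define FIX(x) ((int32_t) ((x) * (1L << SCALEBITS) + 0.5))
#define CENTERJSAMPLE 128

static int clamp255(int v)
{
  return v < 0 ? 0 : (v > 255 ? 255 : v);
}

/* Assumes >> on a negative int32_t is an arithmetic shift (true on
 * mainstream compilers); libjpeg wraps this in its RIGHT_SHIFT macro. */
static void ycc_to_rgb(int y, int cb, int cr, int rgb[3])
{
  int32_t cbx = cb - CENTERJSAMPLE, crx = cr - CENTERJSAMPLE;

  rgb[0] = clamp255(y + (int) ((FIX(1.40200) * crx + ONE_HALF) >> SCALEBITS));
  rgb[1] = clamp255(y + (int) ((-FIX(0.34414) * cbx - FIX(0.71414) * crx
                                + ONE_HALF) >> SCALEBITS));
  rgb[2] = clamp255(y + (int) ((FIX(1.77200) * cbx + ONE_HALF) >> SCALEBITS));
}
```

With neutral chroma (Cb = Cr = 128), every channel equals Y, which is a quick sanity check for any implementation of these formulas.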

	4432 +/* We allocate one big table for RGB->Y conversion and divide it up into

	4433 + * three parts, instead of doing three alloc_small requests. This lets us

	4434 + * use a single table base address, which can be held in a register in the

	4435 + * inner loops on many machines (more than can hold all three addresses,

	4436 + * anyway).

	4437 + */

	4438

	4439 +#define R_Y_OFF 0 /* offset to R => Y section */

	4440 +#define G_Y_OFF (1*(MAXJSAMPLE+1)) /* offset to G => Y section */

	4441 +#define B_Y_OFF (2*(MAXJSAMPLE+1)) /* etc. */

	4442 +#define TABLE_SIZE (3*(MAXJSAMPLE+1))

	4443 +

	4444 +

	4445 +/* Include inline routines for colorspace extensions */

	4446 +

	4447 +#include "jdcolext.c"

	4448 +#undef RGB_RED

	4449 +#undef RGB_GREEN

	4450 +#undef RGB_BLUE

	4451 +#undef RGB_PIXELSIZE

	4452 +

	4453 +#define RGB_RED EXT_RGB_RED

	4454 +#define RGB_GREEN EXT_RGB_GREEN

	4455 +#define RGB_BLUE EXT_RGB_BLUE

	4456 +#define RGB_PIXELSIZE EXT_RGB_PIXELSIZE

	4457 +#define ycc_rgb_convert_internal ycc_extrgb_convert_internal

	4458 +#define gray_rgb_convert_internal gray_extrgb_convert_internal

	4459 +#define rgb_rgb_convert_internal rgb_extrgb_convert_internal

	4460 +#include "jdcolext.c"

	4461 +#undef RGB_RED

	4462 +#undef RGB_GREEN

	4463 +#undef RGB_BLUE

	4464 +#undef RGB_PIXELSIZE

	4465 +#undef ycc_rgb_convert_internal

	4466 +#undef gray_rgb_convert_internal

	4467 +#undef rgb_rgb_convert_internal

	4468 +

	4469 +#define RGB_RED EXT_RGBX_RED

	4470 +#define RGB_GREEN EXT_RGBX_GREEN

	4471 +#define RGB_BLUE EXT_RGBX_BLUE

	4472 +#define RGB_ALPHA 3

	4473 +#define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE

	4474 +#define ycc_rgb_convert_internal ycc_extrgbx_convert_internal

	4475 +#define gray_rgb_convert_internal gray_extrgbx_convert_internal

	4476 +#define rgb_rgb_convert_internal rgb_extrgbx_convert_internal

	4477 +#include "jdcolext.c"

	4478 +#undef RGB_RED

	4479 +#undef RGB_GREEN

	4480 +#undef RGB_BLUE

	4481 +#undef RGB_ALPHA

	4482 +#undef RGB_PIXELSIZE

	4483 +#undef ycc_rgb_convert_internal

	4484 +#undef gray_rgb_convert_internal

	4485 +#undef rgb_rgb_convert_internal

	4486 +

	4487 +#define RGB_RED EXT_BGR_RED

	4488 +#define RGB_GREEN EXT_BGR_GREEN

	4489 +#define RGB_BLUE EXT_BGR_BLUE

	4490 +#define RGB_PIXELSIZE EXT_BGR_PIXELSIZE

	4491 +#define ycc_rgb_convert_internal ycc_extbgr_convert_internal

	4492 +#define gray_rgb_convert_internal gray_extbgr_convert_internal

	4493 +#define rgb_rgb_convert_internal rgb_extbgr_convert_internal

	4494 +#include "jdcolext.c"

	4495 +#undef RGB_RED

	4496 +#undef RGB_GREEN

	4497 +#undef RGB_BLUE

	4498 +#undef RGB_PIXELSIZE

	4499 +#undef ycc_rgb_convert_internal

	4500 +#undef gray_rgb_convert_internal

	4501 +#undef rgb_rgb_convert_internal

	4502 +

	4503 +#define RGB_RED EXT_BGRX_RED

	4504 +#define RGB_GREEN EXT_BGRX_GREEN

	4505 +#define RGB_BLUE EXT_BGRX_BLUE

	4506 +#define RGB_ALPHA 3

	4507 +#define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE

	4508 +#define ycc_rgb_convert_internal ycc_extbgrx_convert_internal

	4509 +#define gray_rgb_convert_internal gray_extbgrx_convert_internal

	4510 +#define rgb_rgb_convert_internal rgb_extbgrx_convert_internal

	4511 +#include "jdcolext.c"

	4512 +#undef RGB_RED

	4513 +#undef RGB_GREEN

	4514 +#undef RGB_BLUE

	4515 +#undef RGB_ALPHA

	4516 +#undef RGB_PIXELSIZE

	4517 +#undef ycc_rgb_convert_internal

	4518 +#undef gray_rgb_convert_internal

	4519 +#undef rgb_rgb_convert_internal

	4520 +

	4521 +#define RGB_RED EXT_XBGR_RED

	4522 +#define RGB_GREEN EXT_XBGR_GREEN

	4523 +#define RGB_BLUE EXT_XBGR_BLUE

	4524 +#define RGB_ALPHA 0

	4525 +#define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE

	4526 +#define ycc_rgb_convert_internal ycc_extxbgr_convert_internal

	4527 +#define gray_rgb_convert_internal gray_extxbgr_convert_internal

	4528 +#define rgb_rgb_convert_internal rgb_extxbgr_convert_internal

	4529 +#include "jdcolext.c"

	4530 +#undef RGB_RED

	4531 +#undef RGB_GREEN

	4532 +#undef RGB_BLUE

	4533 +#undef RGB_ALPHA

	4534 +#undef RGB_PIXELSIZE

	4535 +#undef ycc_rgb_convert_internal

	4536 +#undef gray_rgb_convert_internal

	4537 +#undef rgb_rgb_convert_internal

	4538 +

	4539 +#define RGB_RED EXT_XRGB_RED

	4540 +#define RGB_GREEN EXT_XRGB_GREEN

	4541 +#define RGB_BLUE EXT_XRGB_BLUE

	4542 +#define RGB_ALPHA 0

	4543 +#define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE

	4544 +#define ycc_rgb_convert_internal ycc_extxrgb_convert_internal

	4545 +#define gray_rgb_convert_internal gray_extxrgb_convert_internal

	4546 +#define rgb_rgb_convert_internal rgb_extxrgb_convert_internal

	4547 +#include "jdcolext.c"

	4548 +#undef RGB_RED

	4549 +#undef RGB_GREEN

	4550 +#undef RGB_BLUE

	4551 +#undef RGB_ALPHA

	4552 +#undef RGB_PIXELSIZE

	4553 +#undef ycc_rgb_convert_internal

	4554 +#undef gray_rgb_convert_internal

	4555 +#undef rgb_rgb_convert_internal

	4556 +
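The block above includes jdcolext.c several times, each time with different RGB_RED/RGB_GREEN/RGB_BLUE/RGB_ALPHA/RGB_PIXELSIZE values and renamed `*_convert_internal` functions. This stamps out one specialized converter per pixel layout, and the public converters then pick one with a `switch` on `cinfo->out_color_space`. A single file cannot demonstrate repeated inclusion, so the sketch below swaps in a function-generating macro to show the same idea. Everything in it is hypothetical: the `enum layout`, the macro, and the choice to write 0xFF into the filler byte of 4-byte layouts.

```c
#include <stddef.h>

/* Hypothetical subset of the extended output layouts. */
enum layout { LAYOUT_RGB, LAYOUT_RGBX, LAYOUT_BGR, LAYOUT_XRGB };

/* One loop body, instantiated per layout; the offsets play the roles of
 * RGB_RED ... RGB_PIXELSIZE. ALPHA is only written for 4-byte layouts. */
#define DEFINE_GRAY_CONVERTER(NAME, RED, GREEN, BLUE, ALPHA, PIXELSIZE)     \
  static void NAME(const unsigned char *in, unsigned char *out, size_t n)  \
  {                                                                        \
    for (size_t col = 0; col < n; col++, out += (PIXELSIZE)) {             \
      out[RED] = out[GREEN] = out[BLUE] = in[col];                         \
      if ((PIXELSIZE) == 4)                                                \
        out[ALPHA] = 0xFF;                                                 \
    }                                                                      \
  }

DEFINE_GRAY_CONVERTER(gray_to_rgb,  0, 1, 2, 0, 3)
DEFINE_GRAY_CONVERTER(gray_to_rgbx, 0, 1, 2, 3, 4)
DEFINE_GRAY_CONVERTER(gray_to_bgr,  2, 1, 0, 0, 3)
DEFINE_GRAY_CONVERTER(gray_to_xrgb, 1, 2, 3, 0, 4)

/* Run-time dispatch, analogous to the switch on cinfo->out_color_space. */
static void gray_convert(enum layout layout, const unsigned char *in,
                         unsigned char *out, size_t n)
{
  switch (layout) {
  case LAYOUT_RGBX: gray_to_rgbx(in, out, n); break;
  case LAYOUT_BGR:  gray_to_bgr(in, out, n);  break;
  case LAYOUT_XRGB: gray_to_xrgb(in, out, n); break;
  default:          gray_to_rgb(in, out, n);  break;
  }
}
```

Because the offsets are compile-time constants inside each generated function, the inner loop avoids the per-pixel table lookups (`rgb_red[cinfo->out_color_space]` and friends) that the removed code performed.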

	4557 +

	4558 /*

	4559 * Initialize tables for YCC->RGB colorspace conversion.

	4560 */

	4561 @@ -110,13 +246,6 @@

	4562

	4563 /*

	4564 * Convert some rows of samples to the output colorspace.

	4565 - *

	4566 - * Note that we change from noninterleaved, one-plane-per-component format

	4567 - * to interleaved-pixel format. The output buffer is therefore three times

	4568 - * as wide as the input buffer.

	4569 - * A starting row offset is provided only for the input buffer. The caller

	4570 - * can easily adjust the passed output_buf value to accommodate any row

	4571 - * offset required on that side.

	4572 */

	4573

	4574 METHODDEF(void)

	4575 @@ -124,19 +253,86 @@

	4576 JSAMPIMAGE input_buf, JDIMENSION input_row,

	4577 JSAMPARRAY output_buf, int num_rows)

	4578 {

	4579 + switch (cinfo->out_color_space) {

	4580 + case JCS_EXT_RGB:

	4581 + ycc_extrgb_convert_internal(cinfo, input_buf, input_row, output_buf,

	4582 + num_rows);

	4583 + break;

	4584 + case JCS_EXT_RGBX:

	4585 + case JCS_EXT_RGBA:

	4586 + ycc_extrgbx_convert_internal(cinfo, input_buf, input_row, output_buf,

	4587 + num_rows);

	4588 + break;

	4589 + case JCS_EXT_BGR:

	4590 + ycc_extbgr_convert_internal(cinfo, input_buf, input_row, output_buf,

	4591 + num_rows);

	4592 + break;

	4593 + case JCS_EXT_BGRX:

	4594 + case JCS_EXT_BGRA:

	4595 + ycc_extbgrx_convert_internal(cinfo, input_buf, input_row, output_buf,

	4596 + num_rows);

	4597 + break;

	4598 + case JCS_EXT_XBGR:

	4599 + case JCS_EXT_ABGR:

	4600 + ycc_extxbgr_convert_internal(cinfo, input_buf, input_row, output_buf,

	4601 + num_rows);

	4602 + break;

	4603 + case JCS_EXT_XRGB:

	4604 + case JCS_EXT_ARGB:

	4605 + ycc_extxrgb_convert_internal(cinfo, input_buf, input_row, output_buf,

	4606 + num_rows);

	4607 + break;

	4608 + default:

	4609 + ycc_rgb_convert_internal(cinfo, input_buf, input_row, output_buf,

	4610 + num_rows);

	4611 + break;

	4612 + }

	4613 +}

	4614 +

	4615 +

	4616 +/************** Cases other than YCbCr -> RGB ************/

	4617 +

	4618 +

	4619 +/*

	4620 + * Initialize for RGB->grayscale colorspace conversion.

	4621 + */

	4622 +

	4623 +LOCAL(void)

	4624 +build_rgb_y_table (j_decompress_ptr cinfo)

	4625 +{

	4626 my_cconvert_ptr cconvert = (my_cconvert_ptr) cinfo->cconvert;

	4627 - register int y, cb, cr;

	4628 + INT32 * rgb_y_tab;

	4629 + INT32 i;

	4630 +

	4631 + /* Allocate and fill in the conversion tables. */

	4632 + cconvert->rgb_y_tab = rgb_y_tab = (INT32 *)

	4633 + (*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_IMAGE,

	4634 + (TABLE_SIZE * SIZEOF(INT32)));

	4635 +

	4636 + for (i = 0; i <= MAXJSAMPLE; i++) {

	4637 + rgb_y_tab[i+R_Y_OFF] = FIX(0.29900) * i;

	4638 + rgb_y_tab[i+G_Y_OFF] = FIX(0.58700) * i;

	4639 + rgb_y_tab[i+B_Y_OFF] = FIX(0.11400) * i + ONE_HALF;

	4640 + }

	4641 +}

	4642 +

	4643 +

	4644 +/*

	4645 + * Convert RGB to grayscale.

	4646 + */

	4647 +

	4648 +METHODDEF(void)

	4649 +rgb_gray_convert (j_decompress_ptr cinfo,

	4650 + JSAMPIMAGE input_buf, JDIMENSION input_row,

	4651 + JSAMPARRAY output_buf, int num_rows)

	4652 +{

	4653 + my_cconvert_ptr cconvert = (my_cconvert_ptr) cinfo->cconvert;

	4654 + register int r, g, b;

	4655 + register INT32 * ctab = cconvert->rgb_y_tab;

	4656 register JSAMPROW outptr;

	4657 register JSAMPROW inptr0, inptr1, inptr2;

	4658 register JDIMENSION col;

	4659 JDIMENSION num_cols = cinfo->output_width;

	4660 - /* copy these pointers into registers if possible */

	4661 - register JSAMPLE * range_limit = cinfo->sample_range_limit;

	4662 - register int * Crrtab = cconvert->Cr_r_tab;

	4663 - register int * Cbbtab = cconvert->Cb_b_tab;

	4664 - register INT32 * Crgtab = cconvert->Cr_g_tab;

	4665 - register INT32 * Cbgtab = cconvert->Cb_g_tab;

	4666 - SHIFT_TEMPS

	4667

	4668 while (--num_rows >= 0) {

	4669 inptr0 = input_buf[0][input_row];

	4670 @@ -145,24 +341,18 @@

	4671 input_row++;

	4672 outptr = *output_buf++;

	4673 for (col = 0; col < num_cols; col++) {

	4674 - y = GETJSAMPLE(inptr0[col]);

	4675 - cb = GETJSAMPLE(inptr1[col]);

	4676 - cr = GETJSAMPLE(inptr2[col]);

	4677 - /* Range-limiting is essential due to noise introduced by DCT losses. */

	4678 - outptr[rgb_red[cinfo->out_color_space]] = range_limit[y + Crrtab[cr]];

	4679 - outptr[rgb_green[cinfo->out_color_space]] = range_limit[y +

	4680 - ((int) RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr],

	4681 - SCALEBITS))];

	4682 - outptr[rgb_blue[cinfo->out_color_space]] = range_limit[y + Cbbtab[cb]];

	4683 - outptr += rgb_pixelsize[cinfo->out_color_space];

	4684 + r = GETJSAMPLE(inptr0[col]);

	4685 + g = GETJSAMPLE(inptr1[col]);

	4686 + b = GETJSAMPLE(inptr2[col]);

	4687 + /* Y */

	4688 + outptr[col] = (JSAMPLE)

	4689 + ((ctab[r+R_Y_OFF] + ctab[g+G_Y_OFF] + ctab[b+B_Y_OFF])

	4690 + >> SCALEBITS);

	4691 }

	4692 }

	4693 }
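build_rgb_y_table() and rgb_gray_convert() compute Y = 0.299 R + 0.587 G + 0.114 B with one table of 3*(MAXJSAMPLE+1) fixed-point products. The rounding constant ONE_HALF is folded into the B section only, so each pixel costs three lookups, two additions and one shift. The following is a standalone sketch of the same scheme for 8-bit samples. `build_y_table()`, `rgb_to_y()` and the static table are hypothetical names.

```c
#include <stdint.h>

#define SCALEBITS 16
#define ONE_HALF ((int32_t) 1 << (SCALEBITS - 1))
#define FIX(x) ((int32_t) ((x) * (1L << SCALEBITS) + 0.5))
#define MAXJSAMPLE 255
#define R_Y_OFF 0                         /* offset to R => Y section */
#define G_Y_OFF (1 * (MAXJSAMPLE + 1))    /* offset to G => Y section */
#define B_Y_OFF (2 * (MAXJSAMPLE + 1))    /* offset to B => Y section */
#define TABLE_SIZE (3 * (MAXJSAMPLE + 1))

static int32_t y_tab[TABLE_SIZE];

static void build_y_table(void)
{
  for (int32_t i = 0; i <= MAXJSAMPLE; i++) {
    y_tab[i + R_Y_OFF] = FIX(0.29900) * i;
    y_tab[i + G_Y_OFF] = FIX(0.58700) * i;
    y_tab[i + B_Y_OFF] = FIX(0.11400) * i + ONE_HALF;  /* rounding, once */
  }
}

/* The three FIX() weights sum to exactly 1 << SCALEBITS, so the result
 * never exceeds MAXJSAMPLE and needs no range limiting. */
static unsigned char rgb_to_y(int r, int g, int b)
{
  return (unsigned char) ((y_tab[r + R_Y_OFF] + y_tab[g + G_Y_OFF] +
                           y_tab[b + B_Y_OFF]) >> SCALEBITS);
}
```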

	4694

	4695

	4696 -/************** Cases other than YCbCr -> RGB ************/

	4697 -

	4698 -

	4699 /*

	4700 * Color conversion for no colorspace change: just copy the data,

	4701 * converting from separate-planes to interleaved representation.

	4702 @@ -211,9 +401,7 @@

	4703

	4704

	4705 /*

	4706 - * Convert grayscale to RGB: just duplicate the graylevel three times.

	4707 - * This is provided to support applications that don't want to cope

	4708 - * with grayscale as a separate case.

	4709 + * Convert grayscale to RGB

	4710 */

	4711

	4712 METHODDEF(void)

	4713 @@ -221,20 +409,85 @@

	4714 JSAMPIMAGE input_buf, JDIMENSION input_row,

	4715 JSAMPARRAY output_buf, int num_rows)

	4716 {

	4717 - register JSAMPROW inptr, outptr;

	4718 - register JDIMENSION col;

	4719 - JDIMENSION num_cols = cinfo->output_width;

	4720 + switch (cinfo->out_color_space) {

	4721 + case JCS_EXT_RGB:

	4722 + gray_extrgb_convert_internal(cinfo, input_buf, input_row, output_buf,

	4723 + num_rows);

	4724 + break;

	4725 + case JCS_EXT_RGBX:

	4726 + case JCS_EXT_RGBA:

	4727 + gray_extrgbx_convert_internal(cinfo, input_buf, input_row, output_buf,

	4728 + num_rows);

	4729 + break;

	4730 + case JCS_EXT_BGR:

	4731 + gray_extbgr_convert_internal(cinfo, input_buf, input_row, output_buf,

	4732 + num_rows);

	4733 + break;

	4734 + case JCS_EXT_BGRX:

	4735 + case JCS_EXT_BGRA:

	4736 + gray_extbgrx_convert_internal(cinfo, input_buf, input_row, output_buf,

	4737 + num_rows);

	4738 + break;

	4739 + case JCS_EXT_XBGR:

	4740 + case JCS_EXT_ABGR:

	4741 + gray_extxbgr_convert_internal(cinfo, input_buf, input_row, output_buf,

	4742 + num_rows);

	4743 + break;

	4744 + case JCS_EXT_XRGB:

	4745 + case JCS_EXT_ARGB:

	4746 + gray_extxrgb_convert_internal(cinfo, input_buf, input_row, output_buf,

	4747 + num_rows);

	4748 + break;

	4749 + default:

	4750 + gray_rgb_convert_internal(cinfo, input_buf, input_row, output_buf,

	4751 + num_rows);

	4752 + break;

	4753 + }

	4754 +}

	4755

	4756 - while (--num_rows >= 0) {

	4757 - inptr = input_buf[0][input_row++];

	4758 - outptr = *output_buf++;

	4759 - for (col = 0; col < num_cols; col++) {

	4760 - /* We can dispense with GETJSAMPLE() here */

	4761 - outptr[rgb_red[cinfo->out_color_space]] =

	4762 - outptr[rgb_green[cinfo->out_color_space]] =

	4763 - outptr[rgb_blue[cinfo->out_color_space]] = inptr[col];

	4764 - outptr += rgb_pixelsize[cinfo->out_color_space];

	4765 - }

	4766 +

	4767 +/*

	4768 + * Convert plain RGB to extended RGB

	4769 + */

	4770 +

	4771 +METHODDEF(void)

	4772 +rgb_rgb_convert (j_decompress_ptr cinfo,

	4773 + JSAMPIMAGE input_buf, JDIMENSION input_row,

	4774 + JSAMPARRAY output_buf, int num_rows)

	4775 +{

	4776 + switch (cinfo->out_color_space) {

	4777 + case JCS_EXT_RGB:

	4778 + rgb_extrgb_convert_internal(cinfo, input_buf, input_row, output_buf,

	4779 + num_rows);

	4780 + break;

	4781 + case JCS_EXT_RGBX:

	4782 + case JCS_EXT_RGBA:

	4783 + rgb_extrgbx_convert_internal(cinfo, input_buf, input_row, output_buf,

	4784 + num_rows);

	4785 + break;

	4786 + case JCS_EXT_BGR:

	4787 + rgb_extbgr_convert_internal(cinfo, input_buf, input_row, output_buf,

	4788 + num_rows);

	4789 + break;

	4790 + case JCS_EXT_BGRX:

	4791 + case JCS_EXT_BGRA:

	4792 + rgb_extbgrx_convert_internal(cinfo, input_buf, input_row, output_buf,

	4793 + num_rows);

	4794 + break;

	4795 + case JCS_EXT_XBGR:

	4796 + case JCS_EXT_ABGR:

	4797 + rgb_extxbgr_convert_internal(cinfo, input_buf, input_row, output_buf,

	4798 + num_rows);

	4799 + break;

	4800 + case JCS_EXT_XRGB:

	4801 + case JCS_EXT_ARGB:

	4802 + rgb_extxrgb_convert_internal(cinfo, input_buf, input_row, output_buf,

	4803 + num_rows);

	4804 + break;

	4805 + default:

	4806 + rgb_rgb_convert_internal(cinfo, input_buf, input_row, output_buf,

	4807 + num_rows);

	4808 + break;

	4809 }

	4810 }

	4811

	4812 @@ -356,6 +609,9 @@

	4813 /* For color->grayscale conversion, only the Y (0) component is needed */

	4814 for (ci = 1; ci < cinfo->num_components; ci++)

	4815 cinfo->comp_info[ci].component_needed = FALSE;

	4816 + } else if (cinfo->jpeg_color_space == JCS_RGB) {

	4817 + cconvert->pub.color_convert = rgb_gray_convert;

	4818 + build_rgb_y_table(cinfo);

	4819 } else

	4820 ERREXIT(cinfo, JERR_CONVERSION_NOTIMPL);

	4821 break;

	4822 @@ -367,6 +623,10 @@

	4823 case JCS_EXT_BGRX:

	4824 case JCS_EXT_XBGR:

	4825 case JCS_EXT_XRGB:

	4826 + case JCS_EXT_RGBA:

	4827 + case JCS_EXT_BGRA:

	4828 + case JCS_EXT_ABGR:

	4829 + case JCS_EXT_ARGB:

	4830 cinfo->out_color_components = rgb_pixelsize[cinfo->out_color_space];

	4831 if (cinfo->jpeg_color_space == JCS_YCbCr) {

	4832 if (jsimd_can_ycc_rgb())

	4833 @@ -377,9 +637,14 @@

	4834 }

	4835 } else if (cinfo->jpeg_color_space == JCS_GRAYSCALE) {

	4836 cconvert->pub.color_convert = gray_rgb_convert;

	4837 - } else if (cinfo->jpeg_color_space == cinfo->out_color_space &&

	4838 - rgb_pixelsize[cinfo->out_color_space] == 3) {

	4839 - cconvert->pub.color_convert = null_convert;

	4840 + } else if (cinfo->jpeg_color_space == JCS_RGB) {

	4841 + if (rgb_red[cinfo->out_color_space] == 0 &&

	4842 + rgb_green[cinfo->out_color_space] == 1 &&

	4843 + rgb_blue[cinfo->out_color_space] == 2 &&

	4844 + rgb_pixelsize[cinfo->out_color_space] == 3)

	4845 + cconvert->pub.color_convert = null_convert;

	4846 + else

	4847 + cconvert->pub.color_convert = rgb_rgb_convert;

	4848 } else

	4849 ERREXIT(cinfo, JERR_CONVERSION_NOTIMPL);

	4850 break;

	4851 Index: jdct.h

	4852 ===================================================================

	4853 --- jdct.h (revision 829)

	4854 +++ jdct.h (working copy)

	4855 @@ -95,9 +95,21 @@

	4856 #define jpeg_idct_islow jRDislow

	4857 #define jpeg_idct_ifast jRDifast

	4858 #define jpeg_idct_float jRDfloat

	4859 +#define jpeg_idct_7x7 jRD7x7

	4860 +#define jpeg_idct_6x6 jRD6x6

	4861 +#define jpeg_idct_5x5 jRD5x5

	4862 #define jpeg_idct_4x4 jRD4x4

	4863 +#define jpeg_idct_3x3 jRD3x3

	4864 #define jpeg_idct_2x2 jRD2x2

	4865 #define jpeg_idct_1x1 jRD1x1

	4866 +#define jpeg_idct_9x9 jRD9x9

	4867 +#define jpeg_idct_10x10 jRD10x10

	4868 +#define jpeg_idct_11x11 jRD11x11

	4869 +#define jpeg_idct_12x12 jRD12x12

	4870 +#define jpeg_idct_13x13 jRD13x13

	4871 +#define jpeg_idct_14x14 jRD14x14

	4872 +#define jpeg_idct_15x15 jRD15x15

	4873 +#define jpeg_idct_16x16 jRD16x16

	4874 #endif /* NEED_SHORT_EXTERNAL_NAMES */

	4875

	4876 /* Extern declarations for the forward and inverse DCT routines. */

	4877 @@ -115,9 +127,21 @@

	4878 EXTERN(void) jpeg_idct_float

	4879 JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,

	4880 JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));

	4881 +EXTERN(void) jpeg_idct_7x7

	4882 + JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,

	4883 + JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));

	4884 +EXTERN(void) jpeg_idct_6x6

	4885 + JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,

	4886 + JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));

	4887 +EXTERN(void) jpeg_idct_5x5

	4888 + JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,

	4889 + JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));

	4890 EXTERN(void) jpeg_idct_4x4

	4891 JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,

	4892 JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));

	4893 +EXTERN(void) jpeg_idct_3x3

	4894 + JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,

	4895 + JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));

	4896 EXTERN(void) jpeg_idct_2x2

	4897 JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,

	4898 JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));

	4899 @@ -124,6 +148,30 @@

	4900 EXTERN(void) jpeg_idct_1x1

	4901 JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,

	4902 JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));

	4903 +EXTERN(void) jpeg_idct_9x9

	4904 + JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,

	4905 + JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));

	4906 +EXTERN(void) jpeg_idct_10x10

	4907 + JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,

	4908 + JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));

	4909 +EXTERN(void) jpeg_idct_11x11

	4910 + JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,

	4911 + JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));

	4912 +EXTERN(void) jpeg_idct_12x12

	4913 + JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,

	4914 + JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));

	4915 +EXTERN(void) jpeg_idct_13x13

	4916 + JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,

	4917 + JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));

	4918 +EXTERN(void) jpeg_idct_14x14

	4919 + JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,

	4920 + JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));

	4921 +EXTERN(void) jpeg_idct_15x15

	4922 + JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,

	4923 + JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));

	4924 +EXTERN(void) jpeg_idct_16x16

	4925 + JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,

	4926 + JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));

	4927

	4928

	4929 /*

	4930 Index: jddctmgr.c

	4931 ===================================================================

	4932 --- jddctmgr.c (revision 829)

	4933 +++ jddctmgr.c (working copy)

	4934 @@ -1,9 +1,12 @@

	4935 /*

	4936 * jddctmgr.c

	4937 *

	4938 + * This file was part of the Independent JPEG Group's software:

	4939 * Copyright (C) 1994-1996, Thomas G. Lane.

	4940 + * Modified 2002-2010 by Guido Vollbeding.

	4941 + * libjpeg-turbo Modifications:

	4942 * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB

	4943 - * This file is part of the Independent JPEG Group's software.

	4944 + * Copyright (C) 2010, D. R. Commander.

	4945 * For conditions of distribution and use, see the accompanying README file.

	4946 *

	4947 * This file contains the inverse-DCT management logic.

	4948 @@ -21,6 +24,7 @@

	4949 #include "jpeglib.h"

	4950 #include "jdct.h" /* Private declarations for DCT subsystem */

	4951 #include "jsimddct.h"

	4952 +#include "jpegcomp.h"

	4953

	4954

	4955 /*

	4956 @@ -100,7 +104,7 @@

	4957 for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components;

	4958 ci++, compptr++) {

	4959 /* Select the proper IDCT routine for this component's scaling */

	4960 - switch (compptr->DCT_scaled_size) {

	4961 + switch (compptr->_DCT_scaled_size) {

	4962 #ifdef IDCT_SCALING_SUPPORTED

	4963 case 1:

	4964 method_ptr = jpeg_idct_1x1;

	4965 @@ -113,6 +117,10 @@

	4966 method_ptr = jpeg_idct_2x2;

	4967 method = JDCT_ISLOW; /* jidctred uses islow-style table */

	4968 break;

	4969 + case 3:

	4970 + method_ptr = jpeg_idct_3x3;

	4971 + method = JDCT_ISLOW; /* jidctint uses islow-style table */

	4972 + break;

	4973 case 4:

	4974 if (jsimd_can_idct_4x4())

	4975 method_ptr = jsimd_idct_4x4;

	4976 @@ -120,6 +128,18 @@

	4977 method_ptr = jpeg_idct_4x4;

	4978 method = JDCT_ISLOW; /* jidctred uses islow-style table */

	4979 break;

	4980 + case 5:

	4981 + method_ptr = jpeg_idct_5x5;

	4982 + method = JDCT_ISLOW; /* jidctint uses islow-style table */

	4983 + break;

	4984 + case 6:

	4985 + method_ptr = jpeg_idct_6x6;

	4986 + method = JDCT_ISLOW; /* jidctint uses islow-style table */

	4987 + break;

	4988 + case 7:

	4989 + method_ptr = jpeg_idct_7x7;

	4990 + method = JDCT_ISLOW; /* jidctint uses islow-style table */

	4991 + break;

	4992 #endif

	4993 case DCTSIZE:

	4994 switch (cinfo->dct_method) {

	4995 @@ -155,8 +175,40 @@

	4996 break;

	4997 }

	4998 break;

	4999 + case 9:

	5000 + method_ptr = jpeg_idct_9x9;

	5001 + method = JDCT_ISLOW; /* jidctint uses islow-style table */

	5002 + break;

	5003 + case 10:

	5004 + method_ptr = jpeg_idct_10x10;

	5005 + method = JDCT_ISLOW; /* jidctint uses islow-style table */

	5006 + break;

	5007 + case 11:

	5008 + method_ptr = jpeg_idct_11x11;

	5009 + method = JDCT_ISLOW; /* jidctint uses islow-style table */

	5010 + break;

	5011 + case 12:

	5012 + method_ptr = jpeg_idct_12x12;

	5013 + method = JDCT_ISLOW; /* jidctint uses islow-style table */

	5014 + break;

	5015 + case 13:

	5016 + method_ptr = jpeg_idct_13x13;

	5017 + method = JDCT_ISLOW; /* jidctint uses islow-style table */

	5018 + break;

	5019 + case 14:

	5020 + method_ptr = jpeg_idct_14x14;

	5021 + method = JDCT_ISLOW; /* jidctint uses islow-style table */

	5022 + break;

	5023 + case 15:

	5024 + method_ptr = jpeg_idct_15x15;

	5025 + method = JDCT_ISLOW; /* jidctint uses islow-style table */

	5026 + break;

	5027 + case 16:

	5028 + method_ptr = jpeg_idct_16x16;

	5029 + method = JDCT_ISLOW; /* jidctint uses islow-style table */

	5030 + break;

	5031 default:

	5032 - ERREXIT1(cinfo, JERR_BAD_DCTSIZE, compptr->DCT_scaled_size);

	5033 + ERREXIT1(cinfo, JERR_BAD_DCTSIZE, compptr->_DCT_scaled_size);

	5034 break;

	5035 }

	5036 idct->pub.inverse_DCT[ci] = method_ptr;

	5037 Index: jdhuff.c

	5038 ===================================================================

	5039 --- jdhuff.c (revision 829)

	5040 +++ jdhuff.c (working copy)

	5041 @@ -1,8 +1,10 @@

	5042 /*

	5043 * jdhuff.c

	5044 *

	5045 + * This file was part of the Independent JPEG Group's software:

	5046 * Copyright (C) 1991-1997, Thomas G. Lane.

	5047 - * This file is part of the Independent JPEG Group's software.

	5048 + * libjpeg-turbo Modifications:

	5049 + * Copyright (C) 2009-2011, 2015, D. R. Commander.

	5050 * For conditions of distribution and use, see the accompanying README file.

	5051 *

	5052 * This file contains Huffman entropy decoding routines.

	5053 @@ -18,6 +20,7 @@

	5054 #include "jinclude.h"

	5055 #include "jpeglib.h"

	5056 #include "jdhuff.h" /* Declarations shared with jdphuff.c */

	5057 +#include "jpegcomp.h"

	5058

	5059

	5060 /*

	5061 @@ -122,7 +125,7 @@

	5062 if (compptr->component_needed) {

	5063 entropy->dc_needed[blkn] = TRUE;

	5064 /* we don't need the ACs if producing a 1/8th-size image */

	5065 - entropy->ac_needed[blkn] = (compptr->DCT_scaled_size > 1);

	5066 + entropy->ac_needed[blkn] = (compptr->_DCT_scaled_size > 1);

	5067 } else {

	5068 entropy->dc_needed[blkn] = entropy->ac_needed[blkn] = FALSE;

	5069 }

	5070 @@ -225,6 +228,7 @@

	5071 dtbl->maxcode[l] = -1; /* -1 if no codes of this length */

	5072 }

	5073 }

	5074 + dtbl->valoffset[17] = 0;

	5075 dtbl->maxcode[17] = 0xFFFFFL; /* ensures jpeg_huff_decode terminates */

	5076

	5077 /* Compute lookahead tables to speed up decoding.

	5078 @@ -234,7 +238,8 @@

	5079 * with that code.

	5080 */

	5081

	5082 - MEMZERO(dtbl->look_nbits, SIZEOF(dtbl->look_nbits));

	5083 + for (i = 0; i < (1 << HUFF_LOOKAHEAD); i++)

	5084 + dtbl->lookup[i] = (HUFF_LOOKAHEAD + 1) << HUFF_LOOKAHEAD;

	5085

	5086 p = 0;

	5087 for (l = 1; l <= HUFF_LOOKAHEAD; l++) {

	5088 @@ -243,8 +248,7 @@

	5089 /* Generate left-justified code followed by all possible bit sequences */

	5090 lookbits = huffcode[p] << (HUFF_LOOKAHEAD-l);

	5091 for (ctr = 1 << (HUFF_LOOKAHEAD-l); ctr > 0; ctr--) {

	5092 - dtbl->look_nbits[lookbits] = l;

	5093 - dtbl->look_sym[lookbits] = htbl->huffval[p];

	5094 + dtbl->lookup[lookbits] = (l << HUFF_LOOKAHEAD) | htbl->huffval[p];

	5095 lookbits++;

	5096 }

	5097 }

	5098 @@ -389,6 +393,50 @@

	5099 }

	5100

	5101

	5102 +/* Macro version of the above, which performs much better but does not

	5103 + handle markers. We have to hand off any blocks with markers to the

	5104 + slower routines. */

	5105 +

	5106 +#define GET_BYTE \

	5107 +{ \

	5108 + register int c0, c1; \

	5109 + c0 = GETJOCTET(*buffer++); \

	5110 + c1 = GETJOCTET(*buffer); \

	5111 + /* Pre-execute most common case */ \

	5112 + get_buffer = (get_buffer << 8) | c0; \

	5113 + bits_left += 8; \

	5114 + if (c0 == 0xFF) { \

	5115 + /* Pre-execute case of FF/00, which represents an FF data byte */ \

	5116 + buffer++; \

	5117 + if (c1 != 0) { \

	5118 + /* Oops, it's actually a marker indicating end of compressed data. */ \

	5119 + cinfo->unread_marker = c1; \

	5120 + /* Back out pre-execution and fill the buffer with zero bits */ \

	5121 + buffer -= 2; \

	5122 + get_buffer &= ~0xFF; \

	5123 + } \

	5124 + } \

	5125 +}

	5126 +

	5127 +#if __WORDSIZE == 64 || defined(_WIN64)

	5128 +

	5129 +/* Pre-fetch 48 bits (6 bytes), because the holding register is 64-bit */

	5130 +#define FILL_BIT_BUFFER_FAST \

	5131 + if (bits_left < 16) { \

	5132 + GET_BYTE GET_BYTE GET_BYTE GET_BYTE GET_BYTE GET_BYTE \

	5133 + }

	5134 +

	5135 +#else

	5136 +

	5137 +/* Pre-fetch 16 bits (2 bytes), because the holding register is 32-bit */

	5138 +#define FILL_BIT_BUFFER_FAST \

	5139 + if (bits_left < 16) { \

	5140 + GET_BYTE GET_BYTE \

	5141 + }

	5142 +

	5143 +#endif

	5144 +

	5145 +

	5146 /*

	5147 * Out-of-line code for Huffman code decoding.

	5148 * See jdhuff.h for info about usage.

	5149 @@ -438,9 +486,10 @@

	5150 * On some machines, a shift and add will be faster than a table lookup.

	5151 */

	5152

	5153 +#define AVOID_TABLES

	5154 #ifdef AVOID_TABLES

	5155

	5156 -#define HUFF_EXTEND(x,s) ((x) < (1<<((s)-1)) ? (x) + (((-1)<<(s)) + 1) : (x))

	5157 +#define HUFF_EXTEND(x,s) ((x) + ((((x) - (1<<((s)-1))) >> 31) & (((-1)<<(s)) + 1)))

	5158

	5159 #else

	5160

	5161 @@ -498,6 +547,191 @@

	5162 }

	5163

	5164

	5165 +LOCAL(boolean)

	5166 +decode_mcu_slow (j_decompress_ptr cinfo, JBLOCKROW *MCU_data)

	5167 +{

	5168 + huff_entropy_ptr entropy = (huff_entropy_ptr) cinfo->entropy;

	5169 + BITREAD_STATE_VARS;

	5170 + int blkn;

	5171 + savable_state state;

	5172 + /* Outer loop handles each block in the MCU */

	5173 +

	5174 + /* Load up working state */

	5175 + BITREAD_LOAD_STATE(cinfo,entropy->bitstate);

	5176 + ASSIGN_STATE(state, entropy->saved);

	5177 +

	5178 + for (blkn = 0; blkn < cinfo->blocks_in_MCU; blkn++) {

	5179 + JBLOCKROW block = MCU_data ? MCU_data[blkn] : NULL;

	5180 + d_derived_tbl * dctbl = entropy->dc_cur_tbls[blkn];

	5181 + d_derived_tbl * actbl = entropy->ac_cur_tbls[blkn];

	5182 + register int s, k, r;

	5183 +

	5184 + /* Decode a single block's worth of coefficients */

	5185 +

	5186 + /* Section F.2.2.1: decode the DC coefficient difference */

	5187 + HUFF_DECODE(s, br_state, dctbl, return FALSE, label1);

	5188 + if (s) {

	5189 + CHECK_BIT_BUFFER(br_state, s, return FALSE);

	5190 + r = GET_BITS(s);

	5191 + s = HUFF_EXTEND(r, s);

	5192 + }

	5193 +

	5194 + if (entropy->dc_needed[blkn]) {

	5195 + /* Convert DC difference to actual value, update last_dc_val */

	5196 + int ci = cinfo->MCU_membership[blkn];

	5197 + s += state.last_dc_val[ci];

	5198 + state.last_dc_val[ci] = s;

	5199 + if (block) {

	5200 + /* Output the DC coefficient (assumes jpeg_natural_order[0] = 0) */

	5201 + (*block)[0] = (JCOEF) s;

	5202 + }

	5203 + }

	5204 +

	5205 + if (entropy->ac_needed[blkn] && block) {

	5206 +

	5207 + /* Section F.2.2.2: decode the AC coefficients */

	5208 + /* Since zeroes are skipped, output area must be cleared beforehand */

	5209 + for (k = 1; k < DCTSIZE2; k++) {

	5210 + HUFF_DECODE(s, br_state, actbl, return FALSE, label2);

	5211 +

	5212 + r = s >> 4;

	5213 + s &= 15;

	5214 +

	5215 + if (s) {

	5216 + k += r;

	5217 + CHECK_BIT_BUFFER(br_state, s, return FALSE);

	5218 + r = GET_BITS(s);

	5219 + s = HUFF_EXTEND(r, s);

	5220 + /* Output coefficient in natural (dezigzagged) order.

	5221 + * Note: the extra entries in jpeg_natural_order[] will save us

	5222 + * if k >= DCTSIZE2, which could happen if the data is corrupted.

	5223 + */

	5224 + (*block)[jpeg_natural_order[k]] = (JCOEF) s;

	5225 + } else {

	5226 + if (r != 15)

	5227 + break;

	5228 + k += 15;

	5229 + }

	5230 + }

	5231 +

	5232 + } else {

	5233 +

	5234 + /* Section F.2.2.2: decode the AC coefficients */

	5235 + /* In this path we just discard the values */

	5236 + for (k = 1; k < DCTSIZE2; k++) {

	5237 + HUFF_DECODE(s, br_state, actbl, return FALSE, label3);

	5238 +

	5239 + r = s >> 4;

	5240 + s &= 15;

	5241 +

	5242 + if (s) {

	5243 + k += r;

	5244 + CHECK_BIT_BUFFER(br_state, s, return FALSE);

	5245 + DROP_BITS(s);

	5246 + } else {

	5247 + if (r != 15)

	5248 + break;

	5249 + k += 15;

	5250 + }

	5251 + }

	5252 + }

	5253 + }

	5254 +

	5255 + /* Completed MCU, so update state */

	5256 + BITREAD_SAVE_STATE(cinfo,entropy->bitstate);

	5257 + ASSIGN_STATE(entropy->saved, state);

	5258 + return TRUE;

	5259 +}

	5260 +

	5261 +

	5262 +LOCAL(boolean)

	5263 +decode_mcu_fast (j_decompress_ptr cinfo, JBLOCKROW *MCU_data)

	5264 +{

	5265 + huff_entropy_ptr entropy = (huff_entropy_ptr) cinfo->entropy;

	5266 + BITREAD_STATE_VARS;

	5267 + JOCTET *buffer;

	5268 + int blkn;

	5269 + savable_state state;

	5270 + /* Outer loop handles each block in the MCU */

	5271 +

	5272 + /* Load up working state */

	5273 + BITREAD_LOAD_STATE(cinfo,entropy->bitstate);

	5274 + buffer = (JOCTET *) br_state.next_input_byte;

	5275 + ASSIGN_STATE(state, entropy->saved);

	5276 +

	5277 + for (blkn = 0; blkn < cinfo->blocks_in_MCU; blkn++) {

	5278 + JBLOCKROW block = MCU_data[blkn];

	5279 + d_derived_tbl * dctbl = entropy->dc_cur_tbls[blkn];

	5280 + d_derived_tbl * actbl = entropy->ac_cur_tbls[blkn];

	5281 + register int s, k, r, l;

	5282 +

	5283 + HUFF_DECODE_FAST(s, l, dctbl, slow_decode_mcu);

	5284 + if (s) {

	5285 + FILL_BIT_BUFFER_FAST

	5286 + r = GET_BITS(s);

	5287 + s = HUFF_EXTEND(r, s);

	5288 + }

	5289 +

	5290 + if (entropy->dc_needed[blkn]) {

	5291 + int ci = cinfo->MCU_membership[blkn];

	5292 + s += state.last_dc_val[ci];

	5293 + state.last_dc_val[ci] = s;

	5294 + if (block)

	5295 + (*block)[0] = (JCOEF) s;

	5296 + }

	5297 +

	5298 + if (entropy->ac_needed[blkn] && block) {

	5299 +

	5300 + for (k = 1; k < DCTSIZE2; k++) {

	5301 + HUFF_DECODE_FAST(s, l, actbl, slow_decode_mcu);

	5302 + r = s >> 4;

	5303 + s &= 15;

	5304 +

	5305 + if (s) {

	5306 + k += r;

	5307 + FILL_BIT_BUFFER_FAST

	5308 + r = GET_BITS(s);

	5309 + s = HUFF_EXTEND(r, s);

	5310 + (*block)[jpeg_natural_order[k]] = (JCOEF) s;

	5311 + } else {

	5312 + if (r != 15) break;

	5313 + k += 15;

	5314 + }

	5315 + }

	5316 +

	5317 + } else {

	5318 +

	5319 + for (k = 1; k < DCTSIZE2; k++) {

	5320 + HUFF_DECODE_FAST(s, l, actbl, slow_decode_mcu);

	5321 + r = s >> 4;

	5322 + s &= 15;

	5323 +

	5324 + if (s) {

	5325 + k += r;

	5326 + FILL_BIT_BUFFER_FAST

	5327 + DROP_BITS(s);

	5328 + } else {

	5329 + if (r != 15) break;

	5330 + k += 15;

	5331 + }

	5332 + }

	5333 + }

	5334 + }

	5335 +

	5336 + if (cinfo->unread_marker != 0) {

	5337 +slow_decode_mcu:

	5338 + cinfo->unread_marker = 0;

	5339 + return FALSE;

	5340 + }

	5341 +

	5342 + br_state.bytes_in_buffer -= (buffer - br_state.next_input_byte);

	5343 + br_state.next_input_byte = buffer;

	5344 + BITREAD_SAVE_STATE(cinfo,entropy->bitstate);

	5345 + ASSIGN_STATE(entropy->saved, state);

	5346 + return TRUE;

	5347 +}

	5348 +

	5349 +

	5350 /*

	5351 * Decode and return one MCU's worth of Huffman-compressed coefficients.

	5352 * The coefficients are reordered from zigzag order into natural array order,

	5353 @@ -513,13 +747,13 @@

	5354 * this module, since we'll just re-assign them on the next call.)

	5355 */

	5356

	5357 +#define BUFSIZE (DCTSIZE2 * 2u)

	5358 +

	5359 METHODDEF(boolean)

	5360 decode_mcu (j_decompress_ptr cinfo, JBLOCKROW *MCU_data)

	5361 {

	5362 huff_entropy_ptr entropy = (huff_entropy_ptr) cinfo->entropy;

	5363 - int blkn;

	5364 - BITREAD_STATE_VARS;

	5365 - savable_state state;

	5366 + int usefast = 1;

	5367

	5368 /* Process restart marker if needed; may have to suspend */

	5369 if (cinfo->restart_interval) {

	5370 @@ -526,98 +760,26 @@

	5371 if (entropy->restarts_to_go == 0)

	5372 if (! process_restart(cinfo))

	5373 return FALSE;

	5374 + usefast = 0;

	5375 }

	5376

	5377 + if (cinfo->src->bytes_in_buffer < BUFSIZE * (size_t)cinfo->blocks_in_MCU

	5378 + \|\| cinfo->unread_marker != 0)

	5379 + usefast = 0;

	5380 +

	5381 /* If we've run out of data, just leave the MCU set to zeroes.

	5382 * This way, we return uniform gray for the remainder of the segment.

	5383 */

	5384 if (! entropy->pub.insufficient_data) {

	5385

	5386 - /* Load up working state */

	5387 - BITREAD_LOAD_STATE(cinfo,entropy->bitstate);

	5388 - ASSIGN_STATE(state, entropy->saved);

	5389 -

	5390 - /* Outer loop handles each block in the MCU */

	5391 -

	5392 - for (blkn = 0; blkn < cinfo->blocks_in_MCU; blkn++) {

	5393 - JBLOCKROW block = MCU_data[blkn];

	5394 - d_derived_tbl * dctbl = entropy->dc_cur_tbls[blkn];

	5395 - d_derived_tbl * actbl = entropy->ac_cur_tbls[blkn];

	5396 - register int s, k, r;

	5397 -

	5398 - /* Decode a single block's worth of coefficients */

	5399 -

	5400 - /* Section F.2.2.1: decode the DC coefficient difference */

	5401 - HUFF_DECODE(s, br_state, dctbl, return FALSE, label1);

	5402 - if (s) {

	5403 - CHECK_BIT_BUFFER(br_state, s, return FALSE);

	5404 - r = GET_BITS(s);

	5405 - s = HUFF_EXTEND(r, s);

	5406 - }

	5407 -

	5408 - if (entropy->dc_needed[blkn]) {

	5409 - /* Convert DC difference to actual value, update last_dc_val */

	5410 - int ci = cinfo->MCU_membership[blkn];

	5411 - s += state.last_dc_val[ci];

	5412 - state.last_dc_val[ci] = s;

	5413 - /* Output the DC coefficient (assumes jpeg_natural_order[0] = 0) */

	5414 - (*block)[0] = (JCOEF) s;

	5415 - }

	5416 -

	5417 - if (entropy->ac_needed[blkn]) {

	5418 -

	5419 - /* Section F.2.2.2: decode the AC coefficients */

	5420 - /* Since zeroes are skipped, output area must be cleared beforehand */

	5421 - for (k = 1; k < DCTSIZE2; k++) {

	5422 - HUFF_DECODE(s, br_state, actbl, return FALSE, label2);

	5423 -

	5424 - r = s >> 4;

	5425 - s &= 15;

	5426 -

	5427 - if (s) {

	5428 - k += r;

	5429 - CHECK_BIT_BUFFER(br_state, s, return FALSE);

	5430 - r = GET_BITS(s);

	5431 - s = HUFF_EXTEND(r, s);

	5432 - /* Output coefficient in natural (dezigzagged) order.

	5433 - * Note: the extra entries in jpeg_natural_order[] will save us

	5434 - * if k >= DCTSIZE2, which could happen if the data is corrupted.

	5435 - */

	5436 - (*block)[jpeg_natural_order[k]] = (JCOEF) s;

	5437 - } else {

	5438 - if (r != 15)

	5439 - break;

	5440 - k += 15;

	5441 - }

	5442 - }

	5443 -

	5444 - } else {

	5445 -

	5446 - /* Section F.2.2.2: decode the AC coefficients */

	5447 - /* In this path we just discard the values */

	5448 - for (k = 1; k < DCTSIZE2; k++) {

	5449 - HUFF_DECODE(s, br_state, actbl, return FALSE, label3);

	5450 -

	5451 - r = s >> 4;

	5452 - s &= 15;

	5453 -

	5454 - if (s) {

	5455 - k += r;

	5456 - CHECK_BIT_BUFFER(br_state, s, return FALSE);

	5457 - DROP_BITS(s);

	5458 - } else {

	5459 - if (r != 15)

	5460 - break;

	5461 - k += 15;

	5462 - }

	5463 - }

	5464 -

	5465 - }

	5466 + if (usefast) {

	5467 + if (!decode_mcu_fast(cinfo, MCU_data)) goto use_slow;

	5468 }

	5469 + else {

	5470 + use_slow:

	5471 + if (!decode_mcu_slow(cinfo, MCU_data)) return FALSE;

	5472 + }

	5473

	5474 - /* Completed MCU, so update state */

	5475 - BITREAD_SAVE_STATE(cinfo,entropy->bitstate);

	5476 - ASSIGN_STATE(entropy->saved, state);

	5477 }

	5478

	5479 /* Account for restart interval (no-op if not using restarts) */

	5480 Index: jdhuff.h

	5481 ===================================================================

	5482 --- jdhuff.h (revision 829)

	5483 +++ jdhuff.h (working copy)

	5484 @@ -1,8 +1,10 @@

	5485 /*

	5486 * jdhuff.h

	5487 *

	5488 + * This file was part of the Independent JPEG Group's software:

	5489 * Copyright (C) 1991-1997, Thomas G. Lane.

	5490 - * This file is part of the Independent JPEG Group's software.

	5491 + * Modifications:

	5492 + * Copyright (C) 2010-2011, D. R. Commander.

	5493 * For conditions of distribution and use, see the accompanying README file.

	5494 *

	5495 * This file contains declarations for Huffman entropy decoding routines

	5496 @@ -27,7 +29,7 @@

	5497 /* Basic tables: (element [0] of each array is unused) */

	5498 INT32 maxcode[18]; /* largest code of length k (-1 if none) */

	5499 /* (maxcode[17] is a sentinel to ensure jpeg_huff_decode terminates) */

	5500 - INT32 valoffset[17]; /* huffval[] offset for codes of length k */

	5501 + INT32 valoffset[18]; /* huffval[] offset for codes of length k */

	5502 /* valoffset[k] = huffval[] index of 1st symbol of code length k, less

	5503 * the smallest code of length k; so given a code of length k, the

	5504 * corresponding symbol is huffval[code + valoffset[k]]

	5505 @@ -36,13 +38,17 @@

	5506 /* Link to public Huffman table (needed only in jpeg_huff_decode) */

	5507 JHUFF_TBL *pub;

	5508

	5509 - /* Lookahead tables: indexed by the next HUFF_LOOKAHEAD bits of

	5510 + /* Lookahead table: indexed by the next HUFF_LOOKAHEAD bits of

	5511 * the input data stream. If the next Huffman code is no more

	5512 * than HUFF_LOOKAHEAD bits long, we can obtain its length and

	5513 - * the corresponding symbol directly from these tables.

	5514 + * the corresponding symbol directly from this table.

	5515 + *

	5516 + * The lower 8 bits of each table entry contain the symbol. The

	5517 + * bits above them contain the number of bits in the corresponding

	5518 + * Huffman code, or HUFF_LOOKAHEAD + 1 if the code is too long for

	5519 + * the lookahead table.

	5520 */

	5521 - int look_nbits[1<<HUFF_LOOKAHEAD]; /* # bits, or 0 if too long */

	5522 - UINT8 look_sym[1<<HUFF_LOOKAHEAD]; /* symbol, or unused */

	5523 + int lookup[1<<HUFF_LOOKAHEAD];

	5524 } d_derived_tbl;

	5525

	5526 /* Expand a Huffman table definition into the derived format */

	5527 @@ -69,9 +75,18 @@

	5528 * necessary.

	5529 */

	5530

	5531 +#if __WORDSIZE == 64 || defined(_WIN64)

	5532 +

	5533 +typedef size_t bit_buf_type; /* type of bit-extraction buffer */

	5534 +#define BIT_BUF_SIZE 64 /* size of buffer in bits */

	5535 +

	5536 +#else

	5537 +

	5538 typedef INT32 bit_buf_type; /* type of bit-extraction buffer */

	5539 -#define BIT_BUF_SIZE 32 /* size of buffer in bits */

	5540 +#define BIT_BUF_SIZE 32 /* size of buffer in bits */

	5541

	5542 +#endif

	5543 +

	5544 /* If long is > 32 bits on your machine, and shifting/masking longs is

	5545 * reasonably fast, making bit_buf_type be long and setting BIT_BUF_SIZE

	5546 * appropriately should be a win. Unfortunately we can't define the size

	5547 @@ -183,11 +198,10 @@

	5548 } \

	5549 } \

	5550 look = PEEK_BITS(HUFF_LOOKAHEAD); \

	5551 - if ((nb = htbl->look_nbits[look]) != 0) { \

	5552 + if ((nb = (htbl->lookup[look] >> HUFF_LOOKAHEAD)) <= HUFF_LOOKAHEAD) { \

	5553 DROP_BITS(nb); \

	5554 - result = htbl->look_sym[look]; \

	5555 + result = htbl->lookup[look] & ((1 << HUFF_LOOKAHEAD) - 1); \

	5556 } else { \

	5557 - nb = HUFF_LOOKAHEAD+1; \

	5558 slowlabel: \

	5559 if ((result=jpeg_huff_decode(&state,get_buffer,bits_left,htbl,nb)) < 0) \

	5560 { failaction; } \

	5561 @@ -195,6 +209,28 @@

	5562 } \

	5563 }

	5564

	5565 +#define HUFF_DECODE_FAST(s,nb,htbl,slowlabel) \

	5566 + FILL_BIT_BUFFER_FAST; \

	5567 + s = PEEK_BITS(HUFF_LOOKAHEAD); \

	5568 + s = htbl->lookup[s]; \

	5569 + nb = s >> HUFF_LOOKAHEAD; \

	5570 + /* Pre-execute the common case of nb <= HUFF_LOOKAHEAD */ \

	5571 + DROP_BITS(nb); \

	5572 + s = s & ((1 << HUFF_LOOKAHEAD) - 1); \

	5573 + if (nb > HUFF_LOOKAHEAD) { \

	5574 + /* Equivalent of jpeg_huff_decode() */ \

	5575 + /* Don't use GET_BITS() here because we don't want to modify bits_left */ \

	5576 + s = (get_buffer >> bits_left) & ((1 << (nb)) - 1); \

	5577 + while (s > htbl->maxcode[nb]) { \

	5578 + s <<= 1; \

	5579 + s |= GET_BITS(1); \

	5580 + nb++; \

	5581 + } \

	5582 + if (nb > 16) \

	5583 + goto slowlabel; \

	5584 + s = htbl->pub->huffval[ (int) (s + htbl->valoffset[nb]) ]; \

	5585 + }

	5586 +

	5587 /* Out-of-line case for Huffman code fetching */

	5588 EXTERN(int) jpeg_huff_decode

	5589 JPP((bitread_working_state * state, register bit_buf_type get_buffer,

	5590 Index: jdinput.c

	5591 ===================================================================

	5592 --- jdinput.c (revision 829)

	5593 +++ jdinput.c (working copy)

	5594 @@ -1,8 +1,10 @@

	5595 /*

	5596 * jdinput.c

	5597 *

	5598 + * This file was part of the Independent JPEG Group's software:

	5599 * Copyright (C) 1991-1997, Thomas G. Lane.

	5600 - * This file is part of the Independent JPEG Group's software.

	5601 + * libjpeg-turbo Modifications:

	5602 + * Copyright (C) 2010, D. R. Commander.

	5603 * For conditions of distribution and use, see the accompanying README file.

	5604 *

	5605 * This file contains input control logic for the JPEG decompressor.

	5606 @@ -14,6 +16,7 @@

	5607 #define JPEG_INTERNALS

	5608 #include "jinclude.h"

	5609 #include "jpeglib.h"

	5610 +#include "jpegcomp.h"

	5611

	5612

	5613 /* Private state */

	5614 @@ -70,16 +73,30 @@

	5615 compptr->v_samp_factor);

	5616 }

	5617

	5618 +#if JPEG_LIB_VERSION >= 80

	5619 + cinfo->block_size = DCTSIZE;

	5620 + cinfo->natural_order = jpeg_natural_order;

	5621 + cinfo->lim_Se = DCTSIZE2-1;

	5622 +#endif

	5623 +

	5624 /* We initialize DCT_scaled_size and min_DCT_scaled_size to DCTSIZE.

	5625 * In the full decompressor, this will be overridden by jdmaster.c;

	5626 * but in the transcoder, jdmaster.c is not used, so we must do it here.

	5627 */

	5628 +#if JPEG_LIB_VERSION >= 70

	5629 + cinfo->min_DCT_h_scaled_size = cinfo->min_DCT_v_scaled_size = DCTSIZE;

	5630 +#else

	5631 cinfo->min_DCT_scaled_size = DCTSIZE;

	5632 +#endif

	5633

	5634 /* Compute dimensions of components */

	5635 for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components;

	5636 ci++, compptr++) {

	5637 +#if JPEG_LIB_VERSION >= 70

	5638 + compptr->DCT_h_scaled_size = compptr->DCT_v_scaled_size = DCTSIZE;

	5639 +#else

	5640 compptr->DCT_scaled_size = DCTSIZE;

	5641 +#endif

	5642 /* Size in DCT blocks */

	5643 compptr->width_in_blocks = (JDIMENSION)

	5644 jdiv_round_up((long) cinfo->image_width * (long) compptr->h_samp_factor,

	5645 @@ -138,7 +155,7 @@

	5646 compptr->MCU_width = 1;

	5647 compptr->MCU_height = 1;

	5648 compptr->MCU_blocks = 1;

	5649 - compptr->MCU_sample_width = compptr->DCT_scaled_size;

	5650 + compptr->MCU_sample_width = compptr->_DCT_scaled_size;

	5651 compptr->last_col_width = 1;

	5652 /* For noninterleaved scans, it is convenient to define last_row_height

	5653 * as the number of block rows present in the last iMCU row.

	5654 @@ -174,7 +191,7 @@

	5655 compptr->MCU_width = compptr->h_samp_factor;

	5656 compptr->MCU_height = compptr->v_samp_factor;

	5657 compptr->MCU_blocks = compptr->MCU_width * compptr->MCU_height;

	5658 - compptr->MCU_sample_width = compptr->MCU_width * compptr->DCT_scaled_size;

	5659 + compptr->MCU_sample_width = compptr->MCU_width * compptr->_DCT_scaled_size;

	5660 /* Figure number of non-dummy blocks in last MCU column & row */

	5661 tmp = (int) (compptr->width_in_blocks % compptr->MCU_width);

	5662 if (tmp == 0) tmp = compptr->MCU_width;
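The jdinput.c hunks above size each component in DCT blocks and count the real (non-dummy) blocks in the last MCU column. The sketch below restates that arithmetic as standalone C, assuming DCTSIZE is 8. The helper names `div_round_up`, `width_in_blocks` and `last_col_width` are hypothetical and are not libjpeg functions.

```c
#include <assert.h>

#define DCTSIZE 8  /* libjpeg's DCT block size (assumed 8 here) */

/* Ceiling division for nonnegative a and positive b, in the spirit of
 * libjpeg's jdiv_round_up(). Hypothetical helper name. */
static long div_round_up(long a, long b)
{
  assert(a >= 0 && b > 0);
  return (a + b - 1L) / b;
}

/* Component width in DCT blocks: scale image_width by
 * h_samp / max_h_samp, then divide by DCTSIZE, rounding up. */
static long width_in_blocks(long image_width, int h_samp, int max_h_samp)
{
  return div_round_up(image_width * h_samp, (long) max_h_samp * DCTSIZE);
}

/* Real blocks in the last MCU column of an interleaved scan: the remainder
 * of blocks / MCU_width, or a full MCU_width when it divides evenly. */
static int last_col_width(long blocks, int mcu_width)
{
  int tmp = (int) (blocks % mcu_width);
  if (tmp == 0) tmp = mcu_width;
  return tmp;
}
```

For a 100-pixel-wide 4:2:0 image, luma is 13 blocks wide and chroma is 7. The last luma MCU column (MCU_width 2) then holds one real block.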

	5663 Index: jdmainct.c

	5664 ===================================================================

	5665 --- jdmainct.c (revision 829)

	5666 +++ jdmainct.c (working copy)

	5667 @@ -1,8 +1,10 @@

	5668 /*

	5669 * jdmainct.c

	5670 *

	5671 + * This file was part of the Independent JPEG Group's software:

	5672 * Copyright (C) 1994-1996, Thomas G. Lane.

	5673 - * This file is part of the Independent JPEG Group's software.

	5674 + * libjpeg-turbo Modifications:

	5675 + * Copyright (C) 2010, D. R. Commander.

	5676 * For conditions of distribution and use, see the accompanying README file.

	5677 *

	5678 * This file contains the main buffer controller for decompression.

	5679 @@ -13,9 +15,7 @@

	5680 * supplies the equivalent of the main buffer in that case.

	5681 */

	5682

	5683 -#define JPEG_INTERNALS

	5684 -#include "jinclude.h"

	5685 -#include "jpeglib.h"

	5686 +#include "jdmainct.h"

	5687

	5688

	5689 /*

	5690 @@ -109,36 +109,6 @@

	5691 */

	5692

	5693

	5694 -/* Private buffer controller object */

	5695 -

	5696 -typedef struct {

	5697 - struct jpeg_d_main_controller pub; /* public fields */

	5698 -

	5699 - /* Pointer to allocated workspace (M or M+2 row groups). */

	5700 - JSAMPARRAY buffer[MAX_COMPONENTS];

	5701 -

	5702 - boolean buffer_full; /* Have we gotten an iMCU row from decoder? */

	5703 - JDIMENSION rowgroup_ctr; /* counts row groups output to postprocessor */

	5704 -

	5705 - /* Remaining fields are only used in the context case. */

	5706 -

	5707 - /* These are the master pointers to the funny-order pointer lists. */

	5708 - JSAMPIMAGE xbuffer[2]; /* pointers to weird pointer lists */

	5709 -

	5710 - int whichptr; /* indicates which pointer set is now in use */

	5711 - int context_state; /* process_data state machine status */

	5712 - JDIMENSION rowgroups_avail; /* row groups available to postprocessor */

	5713 - JDIMENSION iMCU_row_ctr; /* counts iMCU rows to detect image top/bot */

	5714 -} my_main_controller;

	5715 -

	5716 -typedef my_main_controller * my_main_ptr;

	5717 -

	5718 -/* context_state values: */

	5719 -#define CTX_PREPARE_FOR_IMCU 0 /* need to prepare for MCU row */

	5720 -#define CTX_PROCESS_IMCU 1 /* feeding iMCU to postprocessor */

	5721 -#define CTX_POSTPONED_ROW 2 /* feeding postponed row group */

	5722 -

	5723 -

	5724 /* Forward declarations */

	5725 METHODDEF(void) process_data_simple_main

	5726 JPP((j_decompress_ptr cinfo, JSAMPARRAY output_buf,

	5727 @@ -159,9 +129,9 @@

	5728 * This is done only once, not once per pass.

	5729 */

	5730 {

	5731 - my_main_ptr main = (my_main_ptr) cinfo->main;

	5732 + my_main_ptr main_ptr = (my_main_ptr) cinfo->main;

	5733 int ci, rgroup;

	5734 - int M = cinfo->min_DCT_scaled_size;

	5735 + int M = cinfo->_min_DCT_scaled_size;

	5736 jpeg_component_info *compptr;

	5737 JSAMPARRAY xbuf;

	5738

	5739 @@ -168,15 +138,15 @@

	5740 /* Get top-level space for component array pointers.

	5741 * We alloc both arrays with one call to save a few cycles.

	5742 */

	5743 - main->xbuffer[0] = (JSAMPIMAGE)

	5744 + main_ptr->xbuffer[0] = (JSAMPIMAGE)

	5745 (*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_IMAGE,

	5746 cinfo->num_components * 2 * SIZEOF(JSAMPARRAY));

	5747 - main->xbuffer[1] = main->xbuffer[0] + cinfo->num_components;

	5748 + main_ptr->xbuffer[1] = main_ptr->xbuffer[0] + cinfo->num_components;

	5749

	5750 for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components;

	5751 ci++, compptr++) {

	5752 - rgroup = (compptr->v_samp_factor * compptr->DCT_scaled_size) /

	5753 - cinfo->min_DCT_scaled_size; /* height of a row group of component */

	5754 + rgroup = (compptr->v_samp_factor * compptr->_DCT_scaled_size) /

	5755 + cinfo->_min_DCT_scaled_size; /* height of a row group of component */

	5756 /* Get space for pointer lists --- M+4 row groups in each list.

	5757 * We alloc both pointer lists with one call to save a few cycles.

	5758 */

	5759 @@ -184,9 +154,9 @@

	5760 (*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_IMAGE,

	5761 2 * (rgroup * (M + 4)) * SIZEOF(JSAMPROW));

	5762 xbuf += rgroup; /* want one row group at negative offsets */

	5763 - main->xbuffer[0][ci] = xbuf;

	5764 + main_ptr->xbuffer[0][ci] = xbuf;

	5765 xbuf += rgroup * (M + 4);

	5766 - main->xbuffer[1][ci] = xbuf;

	5767 + main_ptr->xbuffer[1][ci] = xbuf;

	5768 }

	5769 }

	5770

	5771 @@ -194,26 +164,26 @@

	5772 LOCAL(void)

	5773 make_funny_pointers (j_decompress_ptr cinfo)

	5774 /* Create the funny pointer lists discussed in the comments above.

	5775 - * The actual workspace is already allocated (in main->buffer),

	5776 + * The actual workspace is already allocated (in main_ptr->buffer),

	5777 * and the space for the pointer lists is allocated too.

	5778 * This routine just fills in the curiously ordered lists.

	5779 * This will be repeated at the beginning of each pass.

	5780 */

	5781 {

	5782 - my_main_ptr main = (my_main_ptr) cinfo->main;

	5783 + my_main_ptr main_ptr = (my_main_ptr) cinfo->main;

	5784 int ci, i, rgroup;

	5785 - int M = cinfo->min_DCT_scaled_size;

	5786 + int M = cinfo->_min_DCT_scaled_size;

	5787 jpeg_component_info *compptr;

	5788 JSAMPARRAY buf, xbuf0, xbuf1;

	5789

	5790 for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components;

	5791 ci++, compptr++) {

	5792 - rgroup = (compptr->v_samp_factor * compptr->DCT_scaled_size) /

	5793 - cinfo->min_DCT_scaled_size; /* height of a row group of component */

	5794 - xbuf0 = main->xbuffer[0][ci];

	5795 - xbuf1 = main->xbuffer[1][ci];

	5796 + rgroup = (compptr->v_samp_factor * compptr->_DCT_scaled_size) /

	5797 + cinfo->_min_DCT_scaled_size; /* height of a row group of component */

	5798 + xbuf0 = main_ptr->xbuffer[0][ci];

	5799 + xbuf1 = main_ptr->xbuffer[1][ci];

	5800 /* First copy the workspace pointers as-is */

	5801 - buf = main->buffer[ci];

	5802 + buf = main_ptr->buffer[ci];

	5803 for (i = 0; i < rgroup * (M + 2); i++) {

	5804 xbuf0[i] = xbuf1[i] = buf[i];

	5805 }

	5806 @@ -235,34 +205,6 @@

	5807

	5808

	5809 LOCAL(void)

	5810 -set_wraparound_pointers (j_decompress_ptr cinfo)

	5811 -/* Set up the "wraparound" pointers at top and bottom of the pointer lists.

	5812 - * This changes the pointer list state from top-of-image to the normal state.

	5813 - */

	5814 -{

	5815 - my_main_ptr main = (my_main_ptr) cinfo->main;

	5816 - int ci, i, rgroup;

	5817 - int M = cinfo->min_DCT_scaled_size;

	5818 - jpeg_component_info *compptr;

	5819 - JSAMPARRAY xbuf0, xbuf1;

	5820 -

	5821 - for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components;

	5822 - ci++, compptr++) {

	5823 - rgroup = (compptr->v_samp_factor * compptr->DCT_scaled_size) /

	5824 - cinfo->min_DCT_scaled_size; /* height of a row group of component */

	5825 - xbuf0 = main->xbuffer[0][ci];

	5826 - xbuf1 = main->xbuffer[1][ci];

	5827 - for (i = 0; i < rgroup; i++) {

	5828 - xbuf0[i - rgroup] = xbuf0[rgroup*(M+1) + i];

	5829 - xbuf1[i - rgroup] = xbuf1[rgroup*(M+1) + i];

	5830 - xbuf0[rgroup*(M+2) + i] = xbuf0[i];

	5831 - xbuf1[rgroup*(M+2) + i] = xbuf1[i];

	5832 - }

	5833 - }

	5834 -}

	5835 -

	5836 -

	5837 -LOCAL(void)

	5838 set_bottom_pointers (j_decompress_ptr cinfo)

	5839 /* Change the pointer lists to duplicate the last sample row at the bottom

	5840 * of the image. whichptr indicates which xbuffer holds the final iMCU row.

	5841 @@ -269,7 +211,7 @@

	5842 * Also sets rowgroups_avail to indicate number of nondummy row groups in row.

	5843 */

	5844 {

	5845 - my_main_ptr main = (my_main_ptr) cinfo->main;

	5846 + my_main_ptr main_ptr = (my_main_ptr) cinfo->main;

	5847 int ci, i, rgroup, iMCUheight, rows_left;

	5848 jpeg_component_info *compptr;

	5849 JSAMPARRAY xbuf;

	5850 @@ -277,8 +219,8 @@

	5851 for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components;

	5852 ci++, compptr++) {

	5853 /* Count sample rows in one iMCU row and in one row group */

	5854 - iMCUheight = compptr->v_samp_factor * compptr->DCT_scaled_size;

	5855 - rgroup = iMCUheight / cinfo->min_DCT_scaled_size;

	5856 + iMCUheight = compptr->v_samp_factor * compptr->_DCT_scaled_size;

	5857 + rgroup = iMCUheight / cinfo->_min_DCT_scaled_size;

	5858 /* Count nondummy sample rows remaining for this component */

	5859 rows_left = (int) (compptr->downsampled_height % (JDIMENSION) iMCUheight);

	5860 if (rows_left == 0) rows_left = iMCUheight;

	5861 @@ -286,12 +228,12 @@

	5862 * so we need only do it once.

	5863 */

	5864 if (ci == 0) {

	5865 - main->rowgroups_avail = (JDIMENSION) ((rows_left-1) / rgroup + 1);

	5866 + main_ptr->rowgroups_avail = (JDIMENSION) ((rows_left-1) / rgroup + 1);

	5867 }

	5868 /* Duplicate the last real sample row rgroup*2 times; this pads out the

	5869 * last partial rowgroup and ensures at least one full rowgroup of context.

	5870 */

	5871 - xbuf = main->xbuffer[main->whichptr][ci];

	5872 + xbuf = main_ptr->xbuffer[main_ptr->whichptr][ci];

	5873 for (i = 0; i < rgroup * 2; i++) {

	5874 xbuf[rows_left + i] = xbuf[rows_left-1];

	5875 }

	5876 @@ -306,27 +248,27 @@

	5877 METHODDEF(void)

	5878 start_pass_main (j_decompress_ptr cinfo, J_BUF_MODE pass_mode)

	5879 {

	5880 - my_main_ptr main = (my_main_ptr) cinfo->main;

	5881 + my_main_ptr main_ptr = (my_main_ptr) cinfo->main;

	5882

	5883 switch (pass_mode) {

	5884 case JBUF_PASS_THRU:

	5885 if (cinfo->upsample->need_context_rows) {

	5886 - main->pub.process_data = process_data_context_main;

	5887 + main_ptr->pub.process_data = process_data_context_main;

	5888 make_funny_pointers(cinfo); /* Create the xbuffer[] lists */

	5889 - main->whichptr = 0; /* Read first iMCU row into xbuffer[0] */

	5890 - main->context_state = CTX_PREPARE_FOR_IMCU;

	5891 - main->iMCU_row_ctr = 0;

	5892 + main_ptr->whichptr = 0; /* Read first iMCU row into xbuffer[0] */

	5893 + main_ptr->context_state = CTX_PREPARE_FOR_IMCU;

	5894 + main_ptr->iMCU_row_ctr = 0;

	5895 } else {

	5896 /* Simple case with no context needed */

	5897 - main->pub.process_data = process_data_simple_main;

	5898 + main_ptr->pub.process_data = process_data_simple_main;

	5899 }

	5900 - main->buffer_full = FALSE; /* Mark buffer empty */

	5901 - main->rowgroup_ctr = 0;

	5902 + main_ptr->buffer_full = FALSE; /* Mark buffer empty */

	5903 + main_ptr->rowgroup_ctr = 0;

	5904 break;

	5905 #ifdef QUANT_2PASS_SUPPORTED

	5906 case JBUF_CRANK_DEST:

	5907 /* For last pass of 2-pass quantization, just crank the postprocessor */

	5908 - main->pub.process_data = process_data_crank_post;

	5909 + main_ptr->pub.process_data = process_data_crank_post;

	5910 break;

	5911 #endif

	5912 default:

	5913 @@ -346,18 +288,18 @@

	5914 JSAMPARRAY output_buf, JDIMENSION *out_row_ctr,

	5915 JDIMENSION out_rows_avail)

	5916 {

	5917 - my_main_ptr main = (my_main_ptr) cinfo->main;

	5918 + my_main_ptr main_ptr = (my_main_ptr) cinfo->main;

	5919 JDIMENSION rowgroups_avail;

	5920

	5921 /* Read input data if we haven't filled the main buffer yet */

	5922 - if (! main->buffer_full) {

	5923 - if (! (*cinfo->coef->decompress_data) (cinfo, main->buffer))

	5924 + if (! main_ptr->buffer_full) {

	5925 + if (! (*cinfo->coef->decompress_data) (cinfo, main_ptr->buffer))

	5926 return; /* suspension forced, can do nothing more */

	5927 - main->buffer_full = TRUE; /* OK, we have an iMCU row to work with */

	5928 + main_ptr->buffer_full = TRUE; /* OK, we have an iMCU row to work with */

	5929 }

	5930

	5931 /* There are always min_DCT_scaled_size row groups in an iMCU row. */

	5932 - rowgroups_avail = (JDIMENSION) cinfo->min_DCT_scaled_size;

	5933 + rowgroups_avail = (JDIMENSION) cinfo->_min_DCT_scaled_size;

	5934 /* Note: at the bottom of the image, we may pass extra garbage row groups

	5935 * to the postprocessor. The postprocessor has to check for bottom

	5936 * of image anyway (at row resolution), so no point in us doing it too.

	5937 @@ -364,14 +306,14 @@

	5938 */

	5939

	5940 /* Feed the postprocessor */

	5941 - (*cinfo->post->post_process_data) (cinfo, main->buffer,

	5942 - &main->rowgroup_ctr, rowgroups_avail,

	5943 + (*cinfo->post->post_process_data) (cinfo, main_ptr->buffer,

	5944 + &main_ptr->rowgroup_ctr, rowgroups_avail,

	5945 output_buf, out_row_ctr, out_rows_avail);

	5946

	5947 /* Has postprocessor consumed all the data yet? If so, mark buffer empty */

	5948 - if (main->rowgroup_ctr >= rowgroups_avail) {

	5949 - main->buffer_full = FALSE;

	5950 - main->rowgroup_ctr = 0;

	5951 + if (main_ptr->rowgroup_ctr >= rowgroups_avail) {

	5952 + main_ptr->buffer_full = FALSE;

	5953 + main_ptr->rowgroup_ctr = 0;

	5954 }

	5955 }

	5956

	5957 @@ -386,15 +328,15 @@

	5958 JSAMPARRAY output_buf, JDIMENSION *out_row_ctr,

	5959 JDIMENSION out_rows_avail)

	5960 {

	5961 - my_main_ptr main = (my_main_ptr) cinfo->main;

	5962 + my_main_ptr main_ptr = (my_main_ptr) cinfo->main;

	5963

	5964 /* Read input data if we haven't filled the main buffer yet */

	5965 - if (! main->buffer_full) {

	5966 + if (! main_ptr->buffer_full) {

	5967 if (! (*cinfo->coef->decompress_data) (cinfo,

	5968 - main->xbuffer[main->whichptr]))

	5969 + main_ptr->xbuffer[main_ptr->whichptr]))

	5970 return; /* suspension forced, can do nothing more */

	5971 - main->buffer_full = TRUE; /* OK, we have an iMCU row to work with */

	5972 - main->iMCU_row_ctr++; /* count rows received */

	5973 + main_ptr->buffer_full = TRUE; /* OK, we have an iMCU row to work with */

	5974 + main_ptr->iMCU_row_ctr++; /* count rows received */

	5975 }

	5976

	5977 /* Postprocessor typically will not swallow all the input data it is handed

	5978 @@ -402,47 +344,47 @@

	5979 * to exit and restart. This switch lets us keep track of how far we got.

	5980 * Note that each case falls through to the next on successful completion.

	5981 */

	5982 - switch (main->context_state) {

	5983 + switch (main_ptr->context_state) {

	5984 case CTX_POSTPONED_ROW:

	5985 /* Call postprocessor using previously set pointers for postponed row */

	5986 - (*cinfo->post->post_process_data) (cinfo, main->xbuffer[main->whichptr],

	5987 - &main->rowgroup_ctr, main->rowgroups_avail,

	5988 + (*cinfo->post->post_process_data) (cinfo, main_ptr->xbuffer[main_ptr->whichptr],

	5989 + &main_ptr->rowgroup_ctr, main_ptr->rowgroups_avail,

	5990 output_buf, out_row_ctr, out_rows_avail);

	5991 - if (main->rowgroup_ctr < main->rowgroups_avail)

	5992 + if (main_ptr->rowgroup_ctr < main_ptr->rowgroups_avail)

	5993 return; /* Need to suspend */

	5994 - main->context_state = CTX_PREPARE_FOR_IMCU;

	5995 + main_ptr->context_state = CTX_PREPARE_FOR_IMCU;

	5996 if (*out_row_ctr >= out_rows_avail)

	5997 return; /* Postprocessor exactly filled output buf */

	5998 /*FALLTHROUGH*/

	5999 case CTX_PREPARE_FOR_IMCU:

	6000 /* Prepare to process first M-1 row groups of this iMCU row */

	6001 - main->rowgroup_ctr = 0;

	6002 - main->rowgroups_avail = (JDIMENSION) (cinfo->min_DCT_scaled_size - 1);

	6003 + main_ptr->rowgroup_ctr = 0;

	6004 + main_ptr->rowgroups_avail = (JDIMENSION) (cinfo->_min_DCT_scaled_size - 1);

	6005 /* Check for bottom of image: if so, tweak pointers to "duplicate"

	6006 * the last sample row, and adjust rowgroups_avail to ignore padding rows.

	6007 */

	6008 - if (main->iMCU_row_ctr == cinfo->total_iMCU_rows)

	6009 + if (main_ptr->iMCU_row_ctr == cinfo->total_iMCU_rows)

	6010 set_bottom_pointers(cinfo);

	6011 - main->context_state = CTX_PROCESS_IMCU;

	6012 + main_ptr->context_state = CTX_PROCESS_IMCU;

	6013 /*FALLTHROUGH*/

	6014 case CTX_PROCESS_IMCU:

	6015 /* Call postprocessor using previously set pointers */

	6016 - (*cinfo->post->post_process_data) (cinfo, main->xbuffer[main->whichptr],

	6017 - &main->rowgroup_ctr, main->rowgroups_avail,

	6018 + (*cinfo->post->post_process_data) (cinfo, main_ptr->xbuffer[main_ptr->whichptr],

	6019 + &main_ptr->rowgroup_ctr, main_ptr->rowgroups_avail,

	6020 output_buf, out_row_ctr, out_rows_avail);

	6021 - if (main->rowgroup_ctr < main->rowgroups_avail)

	6022 + if (main_ptr->rowgroup_ctr < main_ptr->rowgroups_avail)

	6023 return; /* Need to suspend */

	6024 /* After the first iMCU, change wraparound pointers to normal state */

	6025 - if (main->iMCU_row_ctr == 1)

	6026 + if (main_ptr->iMCU_row_ctr == 1)

	6027 set_wraparound_pointers(cinfo);

	6028 /* Prepare to load new iMCU row using other xbuffer list */

	6029 - main->whichptr ^= 1; /* 0=>1 or 1=>0 */

	6030 - main->buffer_full = FALSE;

	6031 + main_ptr->whichptr ^= 1; /* 0=>1 or 1=>0 */

	6032 + main_ptr->buffer_full = FALSE;

	6033 /* Still need to process last row group of this iMCU row, */

	6034 /* which is saved at index M+1 of the other xbuffer */

	6035 - main->rowgroup_ctr = (JDIMENSION) (cinfo->min_DCT_scaled_size + 1);

	6036 - main->rowgroups_avail = (JDIMENSION) (cinfo->min_DCT_scaled_size + 2);

	6037 - main->context_state = CTX_POSTPONED_ROW;

	6038 + main_ptr->rowgroup_ctr = (JDIMENSION) (cinfo->_min_DCT_scaled_size + 1);

	6039 + main_ptr->rowgroups_avail = (JDIMENSION) (cinfo->_min_DCT_scaled_size + 2);

	6040 + main_ptr->context_state = CTX_POSTPONED_ROW;

	6041 }

	6042 }

	6043

	6044 @@ -475,15 +417,15 @@

	6045 GLOBAL(void)

	6046 jinit_d_main_controller (j_decompress_ptr cinfo, boolean need_full_buffer)

	6047 {

	6048 - my_main_ptr main;

	6049 + my_main_ptr main_ptr;

	6050 int ci, rgroup, ngroups;

	6051 jpeg_component_info *compptr;

	6052

	6053 - main = (my_main_ptr)

	6054 + main_ptr = (my_main_ptr)

	6055 (*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_IMAGE,

	6056 SIZEOF(my_main_controller));

	6057 - cinfo->main = (struct jpeg_d_main_controller *) main;

	6058 - main->pub.start_pass = start_pass_main;

	6059 + cinfo->main = (struct jpeg_d_main_controller *) main_ptr;

	6060 + main_ptr->pub.start_pass = start_pass_main;

	6061

	6062 if (need_full_buffer) /* shouldn't happen */

	6063 ERREXIT(cinfo, JERR_BAD_BUFFER_MODE);

	6064 @@ -492,21 +434,21 @@

	6065 * ngroups is the number of row groups we need.

	6066 */

	6067 if (cinfo->upsample->need_context_rows) {

	6068 - if (cinfo->min_DCT_scaled_size < 2) /* unsupported, see comments above */

	6069 + if (cinfo->_min_DCT_scaled_size < 2) /* unsupported, see comments above */

	6070 ERREXIT(cinfo, JERR_NOTIMPL);

	6071 alloc_funny_pointers(cinfo); /* Alloc space for xbuffer[] lists */

	6072 - ngroups = cinfo->min_DCT_scaled_size + 2;

	6073 + ngroups = cinfo->_min_DCT_scaled_size + 2;

	6074 } else {

	6075 - ngroups = cinfo->min_DCT_scaled_size;

	6076 + ngroups = cinfo->_min_DCT_scaled_size;

	6077 }

	6078

	6079 for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components;

	6080 ci++, compptr++) {

	6081 - rgroup = (compptr->v_samp_factor * compptr->DCT_scaled_size) /

	6082 - cinfo->min_DCT_scaled_size; /* height of a row group of component */

	6083 - main->buffer[ci] = (*cinfo->mem->alloc_sarray)

	6084 + rgroup = (compptr->v_samp_factor * compptr->_DCT_scaled_size) /

	6085 + cinfo->_min_DCT_scaled_size; /* height of a row group of component */

	6086 + main_ptr->buffer[ci] = (*cinfo->mem->alloc_sarray)

	6087 ((j_common_ptr) cinfo, JPOOL_IMAGE,

	6088 - compptr->width_in_blocks * compptr->DCT_scaled_size,

	6089 + compptr->width_in_blocks * compptr->_DCT_scaled_size,

	6090 (JDIMENSION) (rgroup * ngroups));

	6091 }

	6092 }
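The jdmainct.c code above works in "row groups": a component's iMCU height divided by min_DCT_scaled_size. At the bottom of the image, set_bottom_pointers() counts how many row groups still hold real data. The sketch below isolates that arithmetic as standalone C. The names `row_group_height` and `rowgroups_in_last_imcu` are hypothetical. In the real code, rowgroups_avail is computed from component 0 only.

```c
#include <assert.h>

/* Height in sample rows of one row group for a component, as in
 * alloc_funny_pointers() / make_funny_pointers(). Hypothetical helper. */
static int row_group_height(int v_samp, int dct_scaled, int min_dct_scaled)
{
  assert(min_dct_scaled > 0);
  return (v_samp * dct_scaled) / min_dct_scaled;
}

/* Row groups with real data in the final iMCU row, following
 * set_bottom_pointers(): rows_left is the number of nondummy sample rows
 * left, or a full iMCU row when the height divides evenly. */
static unsigned rowgroups_in_last_imcu(unsigned downsampled_height,
                                       int v_samp, int dct_scaled,
                                       int min_dct_scaled)
{
  int iMCUheight = v_samp * dct_scaled;
  int rgroup, rows_left;

  assert(min_dct_scaled > 0 && iMCUheight > 0);
  rgroup = iMCUheight / min_dct_scaled;
  assert(rgroup > 0);
  rows_left = (int) (downsampled_height % (unsigned) iMCUheight);
  if (rows_left == 0) rows_left = iMCUheight;
  return (unsigned) ((rows_left - 1) / rgroup + 1);
}
```

Take unscaled 4:2:0 luma (v_samp 2, DCT size 8). A row group is then 2 rows, and a 100-row image leaves 4 real rows, so 2 row groups, in its last 16-row iMCU row.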

1 Index: jdmarker.c	6093 Index: jdmarker.c

2 ===================================================================	6094 ===================================================================

3 --- jdmarker.c (revision 829)	6095 --- jdmarker.c (revision 829)

4 +++ jdmarker.c (working copy)	6096 +++ jdmarker.c (working copy)

5 @@ -910,7 +910,7 @@	6097 @@ -1,8 +1,10 @@

	6098 /*

	6099 * jdmarker.c

	6100 *

	6101 + * This file was part of the Independent JPEG Group's software:

	6102 * Copyright (C) 1991-1998, Thomas G. Lane.

	6103 - * This file is part of the Independent JPEG Group's software.

	6104 + * libjpeg-turbo Modifications:

	6105 + * Copyright (C) 2012, D. R. Commander.

	6106 * For conditions of distribution and use, see the accompanying README file.

	6107 *

	6108 * This file contains routines to decode JPEG datastream markers.

	6109 @@ -302,7 +304,7 @@

	6110 /* Process a SOS marker */

	6111 {

	6112 INT32 length;

	6113 - int i, ci, n, c, cc;

	6114 + int i, ci, n, c, cc, pi;

	6115 jpeg_component_info * compptr;

	6116 INPUT_VARS(cinfo);

	6117

	6118 @@ -322,13 +324,17 @@

	6119

	6120 /* Collect the component-spec parameters */

	6121

	6122 + for (i = 0; i < MAX_COMPS_IN_SCAN; i++)

	6123 + cinfo->cur_comp_info[i] = NULL;

	6124 +

	6125 for (i = 0; i < n; i++) {

	6126 INPUT_BYTE(cinfo, cc, return FALSE);

	6127 INPUT_BYTE(cinfo, c, return FALSE);

	6128

	6129 - for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components;

	6130 + for (ci = 0, compptr = cinfo->comp_info;

	6131 + ci < cinfo->num_components && ci < MAX_COMPS_IN_SCAN;

	6132 ci++, compptr++) {

	6133 - if (cc == compptr->component_id)

	6134 + if (cc == compptr->component_id && !cinfo->cur_comp_info[ci])

	6135 goto id_found;

	6136 }

	6137

	6138 @@ -342,6 +348,13 @@

	6139

	6140 TRACEMS3(cinfo, 1, JTRC_SOS_COMPONENT, cc,

	6141 compptr->dc_tbl_no, compptr->ac_tbl_no);

	6142 +

	6143 + /* This CSi (cc) should differ from the previous CSi */

	6144 + for (pi = 0; pi < i; pi++) {

	6145 + if (cinfo->cur_comp_info[pi] == compptr) {

	6146 + ERREXIT1(cinfo, JERR_BAD_COMPONENT_ID, cc);

	6147 + }

	6148 + }

	6149 }

	6150

	6151 /* Collect the additional scan parameters Ss, Se, Ah/Al. */
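The get_sos() hunk above rejects a scan that names the same component twice (JERR_BAD_COMPONENT_ID). The patch compares the resolved `compptr` pointers. The sketch below simplifies this to comparing the raw component selectors (CSi), which is equivalent when component IDs within a frame are unique. The function name `scan_selectors_unique` is hypothetical.

```c
#include <assert.h>

#define MAX_COMPS_IN_SCAN 4  /* JPEG limit on components per scan */

/* Returns 1 if every selector cs[0..n-1] is distinct, 0 if some CSi repeats
 * an earlier one. Same O(n^2) "compare with every previous entry" loop as
 * the patch's pi loop. Hypothetical helper. */
static int scan_selectors_unique(const int *cs, int n)
{
  int i, pi;

  assert(n >= 0 && n <= MAX_COMPS_IN_SCAN);
  for (i = 0; i < n; i++)
    for (pi = 0; pi < i; pi++)
      if (cs[pi] == cs[i])
        return 0;
  return 1;
}
```

A scan listing components 1, 2, 3 passes. A scan listing 1, 2, 1 fails.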

	6152 @@ -459,18 +472,21 @@

	6153 for (i = 0; i < count; i++)

	6154 INPUT_BYTE(cinfo, huffval[i], return FALSE);

	6155

	6156 + MEMZERO(&huffval[count], (256 - count) * SIZEOF(UINT8));

	6157 +

	6158 length -= count;

	6159

	6160 if (index & 0x10) { /* AC table definition */

	6161 index -= 0x10;

	6162 + if (index < 0 || index >= NUM_HUFF_TBLS)

	6163 + ERREXIT1(cinfo, JERR_DHT_INDEX, index);

	6164 htblptr = &cinfo->ac_huff_tbl_ptrs[index];

	6165 } else { /* DC table definition */

	6166 + if (index < 0 || index >= NUM_HUFF_TBLS)

	6167 + ERREXIT1(cinfo, JERR_DHT_INDEX, index);

	6168 htblptr = &cinfo->dc_huff_tbl_ptrs[index];

	6169 }

	6170

	6171 - if (index < 0 || index >= NUM_HUFF_TBLS)

	6172 - ERREXIT1(cinfo, JERR_DHT_INDEX, index);

	6173 -

	6174 if (*htblptr == NULL)

	6175 *htblptr = jpeg_alloc_huff_table((j_common_ptr) cinfo);

	6176
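The get_dht() hunk above moves the table-index bounds check so that it runs before `&cinfo->ac_huff_tbl_ptrs[index]` or `&cinfo->dc_huff_tbl_ptrs[index]` is formed. The sketch below shows that decode-then-validate order for the Tc/Th byte as standalone C. It assumes NUM_HUFF_TBLS is 4, as in libjpeg. The name `dht_table_slot` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_HUFF_TBLS 4  /* Huffman table slots 0..3 */

/* Decodes a DHT Tc/Th byte: bit 0x10 selects an AC table. Returns the slot
 * number, or -1 if it is out of range. Validation happens before the
 * caller would index any table array. Hypothetical helper. */
static int dht_table_slot(int index, int *is_ac)
{
  assert(is_ac != NULL);
  *is_ac = (index & 0x10) != 0;
  if (*is_ac)
    index -= 0x10;
  if (index < 0 || index >= NUM_HUFF_TBLS)
    return -1;  /* the patch calls ERREXIT1(..., JERR_DHT_INDEX, ...) here */
  return index;
}
```

Byte 0x13 maps to AC slot 3. Byte 0x14 would name AC slot 4, so it is rejected.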

	6177 @@ -906,7 +922,7 @@

6 }	6178 }

7	6179

8 if (cinfo->marker->discarded_bytes != 0) {	6180 if (cinfo->marker->discarded_bytes != 0) {

9 - WARNMS2(cinfo, JWRN_EXTRANEOUS_DATA, cinfo->marker->discarded_bytes, c);	6181 - WARNMS2(cinfo, JWRN_EXTRANEOUS_DATA, cinfo->marker->discarded_bytes, c);

10 + TRACEMS2(cinfo, 1, JWRN_EXTRANEOUS_DATA, cinfo->marker->discarded_bytes, c);	6182 + TRACEMS2(cinfo, 1, JWRN_EXTRANEOUS_DATA, cinfo->marker->discarded_bytes, c);

11 cinfo->marker->discarded_bytes = 0;	6183 cinfo->marker->discarded_bytes = 0;

12 }	6184 }

13	6185

14 @@ -944,7 +944,144 @@	6186 @@ -940,7 +956,144 @@

15 return TRUE;	6187 return TRUE;

16 }	6188 }

17	6189

18 +#ifdef MOTION_JPEG_SUPPORTED	6190 +#ifdef MOTION_JPEG_SUPPORTED

19	6191

20 +/* The default Huffman tables used by motion JPEG frames. When a motion JPEG	6192 +/* The default Huffman tables used by motion JPEG frames. When a motion JPEG

21 + * frame does not have DHT tables, we should use the huffman tables suggested by	6193 + * frame does not have DHT tables, we should use the huffman tables suggested by

22 + * the JPEG standard. Each of these tables represents a member of the JHUFF_TBLS	6194 + * the JPEG standard. Each of these tables represents a member of the JHUFF_TBLS

23 + * struct so we can just copy it to the according JHUFF_TBLS member.	6195 + * struct so we can just copy it to the according JHUFF_TBLS member.

24 + */	6196 + */

(...skipping 124 matching lines...)
149 +#else	6321 +#else

150 +	6322 +

151 +#define mjpg_load_huff_tables(cinfo)	6323 +#define mjpg_load_huff_tables(cinfo)

152 +	6324 +

153 +#endif /* MOTION_JPEG_SUPPORTED */	6325 +#endif /* MOTION_JPEG_SUPPORTED */

154 +	6326 +

155 +	6327 +

156 /*	6328 /*

157 * Read markers until SOS or EOI.	6329 * Read markers until SOS or EOI.

158 *	6330 *

159 @@ -1013,6 +1150,7 @@	6331 @@ -1009,6 +1162,7 @@

160 break;	6332 break;

161	6333

162 case M_SOS:	6334 case M_SOS:

163 + mjpg_load_huff_tables(cinfo);	6335 + mjpg_load_huff_tables(cinfo);

164 if (! get_sos(cinfo))	6336 if (! get_sos(cinfo))

165 return JPEG_SUSPENDED;	6337 return JPEG_SUSPENDED;

166 cinfo->unread_marker = 0; /* processed the marker */	6338 cinfo->unread_marker = 0; /* processed the marker */

	6339 Index: jdmaster.c

	6340 ===================================================================

	6341 --- jdmaster.c (revision 829)

	6342 +++ jdmaster.c (working copy)

	6343 @@ -1,9 +1,11 @@

	6344 /*

	6345 * jdmaster.c

	6346 *

	6347 + * This file was part of the Independent JPEG Group's software:

	6348 * Copyright (C) 1991-1997, Thomas G. Lane.

	6349 - * Copyright (C) 2009, D. R. Commander.

	6350 - * This file is part of the Independent JPEG Group's software.

	6351 + * Modified 2002-2009 by Guido Vollbeding.

	6352 + * libjpeg-turbo Modifications:

	6353 + * Copyright (C) 2009-2011, D. R. Commander.

	6354 * For conditions of distribution and use, see the accompanying README file.

	6355 *

	6356 * This file contains master control logic for the JPEG decompressor.

	6357 @@ -15,6 +17,7 @@

	6358 #define JPEG_INTERNALS

	6359 #include "jinclude.h"

	6360 #include "jpeglib.h"

	6361 +#include "jpegcomp.h"

	6362

	6363

	6364 /* Private state */

	6365 @@ -56,7 +59,11 @@

	6366 cinfo->out_color_space != JCS_EXT_BGR &&

	6367 cinfo->out_color_space != JCS_EXT_BGRX &&

	6368 cinfo->out_color_space != JCS_EXT_XBGR &&

	6369 - cinfo->out_color_space != JCS_EXT_XRGB) ||

	6370 + cinfo->out_color_space != JCS_EXT_XRGB &&

	6371 + cinfo->out_color_space != JCS_EXT_RGBA &&

	6372 + cinfo->out_color_space != JCS_EXT_BGRA &&

	6373 + cinfo->out_color_space != JCS_EXT_ABGR &&

	6374 + cinfo->out_color_space != JCS_EXT_ARGB) ||

	6375 cinfo->out_color_components != rgb_pixelsize[cinfo->out_color_space])

	6376 return FALSE;

	6377 /* and it only handles 2h1v or 2h2v sampling ratios */

	6378 @@ -68,9 +75,9 @@

	6379 cinfo->comp_info[2].v_samp_factor != 1)

	6380 return FALSE;

	6381 /* furthermore, it doesn't work if we've scaled the IDCTs differently */

	6382 - if (cinfo->comp_info[0].DCT_scaled_size != cinfo->min_DCT_scaled_size ||

	6383 - cinfo->comp_info[1].DCT_scaled_size != cinfo->min_DCT_scaled_size ||

	6384 - cinfo->comp_info[2].DCT_scaled_size != cinfo->min_DCT_scaled_size)

	6385 + if (cinfo->comp_info[0]._DCT_scaled_size != cinfo->_min_DCT_scaled_size ||

	6386 + cinfo->comp_info[1]._DCT_scaled_size != cinfo->_min_DCT_scaled_size ||

	6387 + cinfo->comp_info[2]._DCT_scaled_size != cinfo->_min_DCT_scaled_size)

	6388 return FALSE;

	6389 /* ??? also need to test for upsample-time rescaling, when & if supported */

	6390 return TRUE; /* by golly, it'll work... */
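The use_merged_upsample() hunks above accept the merged upsampler only for 2h1v or 2h2v luma with 1x1 chroma, and only when all three IDCTs are scaled identically. The sketch below covers just those sampling and scaling tests. The color-space and pixel-size checks are omitted. The sampling conditions are paraphrased from the surrounding libjpeg code, only partly visible in this hunk. `struct comp_params` and `merged_layout_ok` are hypothetical stand-ins for the `cinfo->comp_info[]` fields.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant jpeg_component_info fields. */
struct comp_params {
  int h_samp;           /* h_samp_factor */
  int v_samp;           /* v_samp_factor */
  int dct_scaled_size;  /* _DCT_scaled_size */
};

/* Returns 1 if the three components have a layout the merged upsampler
 * handles, else 0. Color-space checks are deliberately left out. */
static int merged_layout_ok(const struct comp_params c[3], int min_dct_scaled)
{
  int ci;

  assert(c != NULL);
  /* luma must be 2h1v or 2h2v; both chroma components must be 1x1 */
  if (c[0].h_samp != 2 || c[1].h_samp != 1 || c[2].h_samp != 1 ||
      c[0].v_samp > 2 || c[1].v_samp != 1 || c[2].v_samp != 1)
    return 0;
  /* every component's IDCT must be scaled to the same size */
  for (ci = 0; ci < 3; ci++)
    if (c[ci].dct_scaled_size != min_dct_scaled)
      return 0;
  return 1;
}
```

Unscaled 4:2:0 passes. 4:4:4 fails the sampling test. A layout whose luma IDCT is scaled differently from chroma fails the scaling test.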

	6391 @@ -84,6 +91,177 @@

	6392 * Compute output image dimensions and related values.

	6393 * NOTE: this is exported for possible use by application.

	6394 * Hence it mustn't do anything that can't be done twice.

	6395 + */

	6396 +

	6397 +#if JPEG_LIB_VERSION >= 80

	6398 +GLOBAL(void)

	6399 +#else

	6400 +LOCAL(void)

	6401 +#endif

	6402 +jpeg_core_output_dimensions (j_decompress_ptr cinfo)

	6403 +/* Do computations that are needed before master selection phase.

	6404 + * This function is used for transcoding and full decompression.

	6405 + */

	6406 +{

	6407 +#ifdef IDCT_SCALING_SUPPORTED

	6408 + int ci;

	6409 + jpeg_component_info *compptr;

	6410 +

	6411 + /* Compute actual output image dimensions and DCT scaling choices. */

	6412 + if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom) {

	6413 + /* Provide 1/block_size scaling */

	6414 + cinfo->output_width = (JDIMENSION)

	6415 + jdiv_round_up((long) cinfo->image_width, (long) DCTSIZE);

	6416 + cinfo->output_height = (JDIMENSION)

	6417 + jdiv_round_up((long) cinfo->image_height, (long) DCTSIZE);

	6418 + cinfo->_min_DCT_h_scaled_size = 1;

	6419 + cinfo->_min_DCT_v_scaled_size = 1;

	6420 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 2) {

	6421 + /* Provide 2/block_size scaling */

	6422 + cinfo->output_width = (JDIMENSION)

	6423 + jdiv_round_up((long) cinfo->image_width * 2L, (long) DCTSIZE);

	6424 + cinfo->output_height = (JDIMENSION)

	6425 + jdiv_round_up((long) cinfo->image_height * 2L, (long) DCTSIZE);

	6426 + cinfo->_min_DCT_h_scaled_size = 2;

	6427 + cinfo->_min_DCT_v_scaled_size = 2;

	6428 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 3) {

	6429 + /* Provide 3/block_size scaling */

	6430 + cinfo->output_width = (JDIMENSION)

	6431 + jdiv_round_up((long) cinfo->image_width * 3L, (long) DCTSIZE);

	6432 + cinfo->output_height = (JDIMENSION)

	6433 + jdiv_round_up((long) cinfo->image_height * 3L, (long) DCTSIZE);

	6434 + cinfo->_min_DCT_h_scaled_size = 3;

	6435 + cinfo->_min_DCT_v_scaled_size = 3;

	6436 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 4) {

	6437 + /* Provide 4/block_size scaling */

	6438 + cinfo->output_width = (JDIMENSION)

	6439 + jdiv_round_up((long) cinfo->image_width * 4L, (long) DCTSIZE);

	6440 + cinfo->output_height = (JDIMENSION)

	6441 + jdiv_round_up((long) cinfo->image_height * 4L, (long) DCTSIZE);

	6442 + cinfo->_min_DCT_h_scaled_size = 4;

	6443 + cinfo->_min_DCT_v_scaled_size = 4;

	6444 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 5) {

	6445 + /* Provide 5/block_size scaling */

	6446 + cinfo->output_width = (JDIMENSION)

	6447 + jdiv_round_up((long) cinfo->image_width * 5L, (long) DCTSIZE);

	6448 + cinfo->output_height = (JDIMENSION)

	6449 + jdiv_round_up((long) cinfo->image_height * 5L, (long) DCTSIZE);

	6450 + cinfo->_min_DCT_h_scaled_size = 5;

	6451 + cinfo->_min_DCT_v_scaled_size = 5;

	6452 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 6) {

	6453 + /* Provide 6/block_size scaling */

	6454 + cinfo->output_width = (JDIMENSION)

	6455 + jdiv_round_up((long) cinfo->image_width * 6L, (long) DCTSIZE);

	6456 + cinfo->output_height = (JDIMENSION)

	6457 + jdiv_round_up((long) cinfo->image_height * 6L, (long) DCTSIZE);

	6458 + cinfo->_min_DCT_h_scaled_size = 6;

	6459 + cinfo->_min_DCT_v_scaled_size = 6;

	6460 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 7) {

	6461 + /* Provide 7/block_size scaling */

	6462 + cinfo->output_width = (JDIMENSION)

	6463 + jdiv_round_up((long) cinfo->image_width * 7L, (long) DCTSIZE);

	6464 + cinfo->output_height = (JDIMENSION)

	6465 + jdiv_round_up((long) cinfo->image_height * 7L, (long) DCTSIZE);

	6466 + cinfo->_min_DCT_h_scaled_size = 7;

	6467 + cinfo->_min_DCT_v_scaled_size = 7;

	6468 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 8) {

	6469 + /* Provide 8/block_size scaling */

	6470 + cinfo->output_width = (JDIMENSION)

	6471 + jdiv_round_up((long) cinfo->image_width * 8L, (long) DCTSIZE);

	6472 + cinfo->output_height = (JDIMENSION)

	6473 + jdiv_round_up((long) cinfo->image_height * 8L, (long) DCTSIZE);

	6474 + cinfo->_min_DCT_h_scaled_size = 8;

	6475 + cinfo->_min_DCT_v_scaled_size = 8;

	6476 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 9) {

	6477 + /* Provide 9/block_size scaling */

	6478 + cinfo->output_width = (JDIMENSION)

	6479 + jdiv_round_up((long) cinfo->image_width * 9L, (long) DCTSIZE);

	6480 + cinfo->output_height = (JDIMENSION)

	6481 + jdiv_round_up((long) cinfo->image_height * 9L, (long) DCTSIZE);

	6482 + cinfo->_min_DCT_h_scaled_size = 9;

	6483 + cinfo->_min_DCT_v_scaled_size = 9;

	6484 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 10) {

	6485 + /* Provide 10/block_size scaling */

	6486 + cinfo->output_width = (JDIMENSION)

	6487 + jdiv_round_up((long) cinfo->image_width * 10L, (long) DCTSIZE);

	6488 + cinfo->output_height = (JDIMENSION)

	6489 + jdiv_round_up((long) cinfo->image_height * 10L, (long) DCTSIZE);

	6490 + cinfo->_min_DCT_h_scaled_size = 10;

	6491 + cinfo->_min_DCT_v_scaled_size = 10;

	6492 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 11) {

	6493 + /* Provide 11/block_size scaling */

	6494 + cinfo->output_width = (JDIMENSION)

	6495 + jdiv_round_up((long) cinfo->image_width * 11L, (long) DCTSIZE);

	6496 + cinfo->output_height = (JDIMENSION)

	6497 + jdiv_round_up((long) cinfo->image_height * 11L, (long) DCTSIZE);

	6498 + cinfo->_min_DCT_h_scaled_size = 11;

	6499 + cinfo->_min_DCT_v_scaled_size = 11;

	6500 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 12) {

	6501 + /* Provide 12/block_size scaling */

	6502 + cinfo->output_width = (JDIMENSION)

	6503 + jdiv_round_up((long) cinfo->image_width * 12L, (long) DCTSIZE);

	6504 + cinfo->output_height = (JDIMENSION)

	6505 + jdiv_round_up((long) cinfo->image_height * 12L, (long) DCTSIZE);

	6506 + cinfo->_min_DCT_h_scaled_size = 12;

	6507 + cinfo->_min_DCT_v_scaled_size = 12;

	6508 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 13) {

	6509 + /* Provide 13/block_size scaling */

	6510 + cinfo->output_width = (JDIMENSION)

	6511 + jdiv_round_up((long) cinfo->image_width * 13L, (long) DCTSIZE);

	6512 + cinfo->output_height = (JDIMENSION)

	6513 + jdiv_round_up((long) cinfo->image_height * 13L, (long) DCTSIZE);

	6514 + cinfo->_min_DCT_h_scaled_size = 13;

	6515 + cinfo->_min_DCT_v_scaled_size = 13;

	6516 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 14) {

	6517 + /* Provide 14/block_size scaling */

	6518 + cinfo->output_width = (JDIMENSION)

	6519 + jdiv_round_up((long) cinfo->image_width * 14L, (long) DCTSIZE);

	6520 + cinfo->output_height = (JDIMENSION)

	6521 + jdiv_round_up((long) cinfo->image_height * 14L, (long) DCTSIZE);

	6522 + cinfo->_min_DCT_h_scaled_size = 14;

	6523 + cinfo->_min_DCT_v_scaled_size = 14;

	6524 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 15) {

	6525 + /* Provide 15/block_size scaling */

	6526 + cinfo->output_width = (JDIMENSION)

	6527 + jdiv_round_up((long) cinfo->image_width * 15L, (long) DCTSIZE);

	6528 + cinfo->output_height = (JDIMENSION)

	6529 + jdiv_round_up((long) cinfo->image_height * 15L, (long) DCTSIZE);

	6530 + cinfo->_min_DCT_h_scaled_size = 15;

	6531 + cinfo->_min_DCT_v_scaled_size = 15;

	6532 + } else {

	6533 + /* Provide 16/block_size scaling */

	6534 + cinfo->output_width = (JDIMENSION)

	6535 + jdiv_round_up((long) cinfo->image_width * 16L, (long) DCTSIZE);

	6536 + cinfo->output_height = (JDIMENSION)

	6537 + jdiv_round_up((long) cinfo->image_height * 16L, (long) DCTSIZE);

	6538 + cinfo->_min_DCT_h_scaled_size = 16;

	6539 + cinfo->_min_DCT_v_scaled_size = 16;

	6540 + }

	6541 +

	6542 + /* Recompute dimensions of components */

	6543 + for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components;

	6544 + ci++, compptr++) {

	6545 + compptr->_DCT_h_scaled_size = cinfo->_min_DCT_h_scaled_size;

	6546 + compptr->_DCT_v_scaled_size = cinfo->_min_DCT_v_scaled_size;

	6547 + }

	6548 +

	6549 +#else /* !IDCT_SCALING_SUPPORTED */

	6550 +

	6551 + /* Hardwire it to "no scaling" */

	6552 + cinfo->output_width = cinfo->image_width;

	6553 + cinfo->output_height = cinfo->image_height;

	6554 + /* jdinput.c has already initialized DCT_scaled_size,

	6555 + * and has computed unscaled downsampled_width and downsampled_height.

	6556 + */

	6557 +

	6558 +#endif /* IDCT_SCALING_SUPPORTED */

	6559 +}

	6560 +

	6561 +

	6562 +/*

	6563 + * Compute output image dimensions and related values.

	6564 + * NOTE: this is exported for possible use by application.

	6565 + * Hence it mustn't do anything that can't be done twice.

	6566 * Also note that it may be called before the master module is initialized!

	6567 */

	6568

	6569 @@ -100,52 +278,31 @@

	6570 if (cinfo->global_state != DSTATE_READY)

	6571 ERREXIT1(cinfo, JERR_BAD_STATE, cinfo->global_state);

	6572

	6573 + /* Compute core output image dimensions and DCT scaling choices. */

	6574 + jpeg_core_output_dimensions(cinfo);

	6575 +

	6576 #ifdef IDCT_SCALING_SUPPORTED

	6577

	6578 - /* Compute actual output image dimensions and DCT scaling choices. */

	6579 - if (cinfo->scale_num * 8 <= cinfo->scale_denom) {

	6580 - /* Provide 1/8 scaling */

	6581 - cinfo->output_width = (JDIMENSION)

	6582 - jdiv_round_up((long) cinfo->image_width, 8L);

	6583 - cinfo->output_height = (JDIMENSION)

	6584 - jdiv_round_up((long) cinfo->image_height, 8L);

	6585 - cinfo->min_DCT_scaled_size = 1;

	6586 - } else if (cinfo->scale_num * 4 <= cinfo->scale_denom) {

	6587 - /* Provide 1/4 scaling */

	6588 - cinfo->output_width = (JDIMENSION)

	6589 - jdiv_round_up((long) cinfo->image_width, 4L);

	6590 - cinfo->output_height = (JDIMENSION)

	6591 - jdiv_round_up((long) cinfo->image_height, 4L);

	6592 - cinfo->min_DCT_scaled_size = 2;

	6593 - } else if (cinfo->scale_num * 2 <= cinfo->scale_denom) {

	6594 - /* Provide 1/2 scaling */

	6595 - cinfo->output_width = (JDIMENSION)

	6596 - jdiv_round_up((long) cinfo->image_width, 2L);

	6597 - cinfo->output_height = (JDIMENSION)

	6598 - jdiv_round_up((long) cinfo->image_height, 2L);

	6599 - cinfo->min_DCT_scaled_size = 4;

	6600 - } else {

	6601 - /* Provide 1/1 scaling */

	6602 - cinfo->output_width = cinfo->image_width;

	6603 - cinfo->output_height = cinfo->image_height;

	6604 - cinfo->min_DCT_scaled_size = DCTSIZE;

	6605 - }

	6606 /* In selecting the actual DCT scaling for each component, we try to

	6607 * scale up the chroma components via IDCT scaling rather than upsampling.

	6608 * This saves time if the upsampler gets to use 1:1 scaling.

	6609 - * Note this code assumes that the supported DCT scalings are powers of 2.

	6610 + * Note this code adapts subsampling ratios which are powers of 2.

	6611 */

	6612 for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components;

	6613 ci++, compptr++) {

	6614 - int ssize = cinfo->min_DCT_scaled_size;

	6615 + int ssize = cinfo->_min_DCT_scaled_size;

	6616 while (ssize < DCTSIZE &&

	6617 - (compptr->h_samp_factor * ssize * 2 <=

	6618 - cinfo->max_h_samp_factor * cinfo->min_DCT_scaled_size) &&

	6619 - (compptr->v_samp_factor * ssize * 2 <=

	6620 - cinfo->max_v_samp_factor * cinfo->min_DCT_scaled_size)) {

	6621 + ((cinfo->max_h_samp_factor * cinfo->_min_DCT_scaled_size) %

	6622 + (compptr->h_samp_factor * ssize * 2) == 0) &&

	6623 + ((cinfo->max_v_samp_factor * cinfo->_min_DCT_scaled_size) %

	6624 + (compptr->v_samp_factor * ssize * 2) == 0)) {

	6625 ssize = ssize * 2;

	6626 }

	6627 +#if JPEG_LIB_VERSION >= 70

	6628 + compptr->DCT_h_scaled_size = compptr->DCT_v_scaled_size = ssize;

	6629 +#else

	6630 compptr->DCT_scaled_size = ssize;

	6631 +#endif

	6632 }

	6633

	6634 /* Recompute downsampled dimensions of components;

	6635 @@ -156,11 +313,11 @@

	6636 /* Size in samples, after IDCT scaling */

	6637 compptr->downsampled_width = (JDIMENSION)

	6638 jdiv_round_up((long) cinfo->image_width *

	6639 - (long) (compptr->h_samp_factor * compptr->DCT_scaled_size),

	6640 + (long) (compptr->h_samp_factor * compptr->_DCT_scaled_size),

	6641 (long) (cinfo->max_h_samp_factor * DCTSIZE));

	6642 compptr->downsampled_height = (JDIMENSION)

	6643 jdiv_round_up((long) cinfo->image_height *

	6644 - (long) (compptr->v_samp_factor * compptr->DCT_scaled_size),

	6645 + (long) (compptr->v_samp_factor * compptr->_DCT_scaled_size),

	6646 (long) (cinfo->max_v_samp_factor * DCTSIZE));

	6647 }

	6648

	6649 @@ -188,6 +345,10 @@

	6650 case JCS_EXT_BGRX:

	6651 case JCS_EXT_XBGR:

	6652 case JCS_EXT_XRGB:

	6653 + case JCS_EXT_RGBA:

	6654 + case JCS_EXT_BGRA:

	6655 + case JCS_EXT_ABGR:

	6656 + case JCS_EXT_ARGB:

	6657 cinfo->out_color_components = rgb_pixelsize[cinfo->out_color_space];

	6658 break;

	6659 case JCS_YCbCr:

	6660 @@ -384,7 +545,11 @@

	6661 jinit_inverse_dct(cinfo);

	6662 /* Entropy decoding: either Huffman or arithmetic coding. */

	6663 if (cinfo->arith_code) {

	6664 +#ifdef D_ARITH_CODING_SUPPORTED

	6665 + jinit_arith_decoder(cinfo);

	6666 +#else

	6667 ERREXIT(cinfo, JERR_ARITH_NOTIMPL);

	6668 +#endif

	6669 } else {

	6670 if (cinfo->progressive_mode) {

	6671 #ifdef D_PROGRESSIVE_SUPPORTED
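The jdmaster.c hunks above make two changes. They replace the old 1/8, 1/4, 1/2 and 1/1 cases with one branch per output block size N. They also change the chroma loop from a `<=` comparison to a divisibility test. The C sketch below restates both rules as small functions.

It assumes that the branches for N = 1..3, which come before this excerpt, follow the same pattern as the ones shown. All function names are hypothetical. `div_round_up` only mirrors what `jdiv_round_up` computes.

```c
#include <assert.h>

#define DCTSIZE 8

/* ceil(a / b) for a >= 0, b > 0: what jdiv_round_up() computes. */
static long div_round_up(long a, long b) {
  assert(a >= 0 && b > 0);
  return (a + b - 1L) / b;
}

/* Smallest N in 1..16 with scale_num/scale_denom <= N/DCTSIZE, i.e. the
 * first branch of the if/else-if chain whose test succeeds; 16 if none. */
static int pick_scaled_block_size(unsigned scale_num, unsigned scale_denom) {
  int n;
  for (n = 1; n < 16; n++)
    if (scale_num * DCTSIZE <= scale_denom * (unsigned) n)
      return n;
  return 16;  /* the final "else": 16/8, a 2x enlargement */
}

/* One output dimension for N/DCTSIZE scaling, rounded up. */
static long scaled_dimension(long image_dim, int n) {
  return div_round_up(image_dim * n, (long) DCTSIZE);
}

/* Per-component IDCT size, following the patched loop: keep doubling
 * while both sampling ratios still divide evenly, so subsampled
 * components are enlarged by the IDCT rather than by the upsampler. */
static int component_scaled_size(int min_size, int max_h, int max_v,
                                 int h_samp, int v_samp) {
  int ssize = min_size;
  while (ssize < DCTSIZE &&
         (max_h * min_size) % (h_samp * ssize * 2) == 0 &&
         (max_v * min_size) % (v_samp * ssize * 2) == 0)
    ssize *= 2;
  return ssize;
}
```

Consider a 4:2:0 image decoded at 1/2 scale. The minimum block size is 4, so luma (sampling factors 2x2) keeps 4x4 output blocks while chroma (1x1) is promoted to 8x8. According to the comment in the patch, this lets the upsampler use 1:1 scaling.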

	6672 Index: jdmerge.c

	6673 ===================================================================

	6674 --- jdmerge.c (revision 829)

	6675 +++ jdmerge.c (working copy)

	6676 @@ -1,10 +1,11 @@

	6677 /*

	6678 * jdmerge.c

	6679 *

	6680 + * This file was part of the Independent JPEG Group's software:

	6681 * Copyright (C) 1994-1996, Thomas G. Lane.

	6682 * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB

	6683 - * Copyright (C) 2009, D. R. Commander.

	6684 - * This file is part of the Independent JPEG Group's software.

	6685 + * libjpeg-turbo Modifications:

	6686 + * Copyright (C) 2009, 2011, D. R. Commander.

	6687 * For conditions of distribution and use, see the accompanying README file.

	6688 *

	6689 * This file contains code for merged upsampling/color conversion.

	6690 @@ -38,6 +39,7 @@

	6691 #include "jinclude.h"

	6692 #include "jpeglib.h"

	6693 #include "jsimd.h"

	6694 +#include "config.h"

	6695

	6696 #ifdef UPSAMPLE_MERGING_SUPPORTED

	6697

	6698 @@ -77,6 +79,107 @@

	6699 #define FIX(x) ((INT32) ((x) * (1L<<SCALEBITS) + 0.5))

	6700

	6701

	6702 +/* Include inline routines for colorspace extensions */

	6703 +

	6704 +#include "jdmrgext.c"

	6705 +#undef RGB_RED

	6706 +#undef RGB_GREEN

	6707 +#undef RGB_BLUE

	6708 +#undef RGB_PIXELSIZE

	6709 +

	6710 +#define RGB_RED EXT_RGB_RED

	6711 +#define RGB_GREEN EXT_RGB_GREEN

	6712 +#define RGB_BLUE EXT_RGB_BLUE

	6713 +#define RGB_PIXELSIZE EXT_RGB_PIXELSIZE

	6714 +#define h2v1_merged_upsample_internal extrgb_h2v1_merged_upsample_internal

	6715 +#define h2v2_merged_upsample_internal extrgb_h2v2_merged_upsample_internal

	6716 +#include "jdmrgext.c"

	6717 +#undef RGB_RED

	6718 +#undef RGB_GREEN

	6719 +#undef RGB_BLUE

	6720 +#undef RGB_PIXELSIZE

	6721 +#undef h2v1_merged_upsample_internal

	6722 +#undef h2v2_merged_upsample_internal

	6723 +

	6724 +#define RGB_RED EXT_RGBX_RED

	6725 +#define RGB_GREEN EXT_RGBX_GREEN

	6726 +#define RGB_BLUE EXT_RGBX_BLUE

	6727 +#define RGB_ALPHA 3

	6728 +#define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE

	6729 +#define h2v1_merged_upsample_internal extrgbx_h2v1_merged_upsample_internal

	6730 +#define h2v2_merged_upsample_internal extrgbx_h2v2_merged_upsample_internal

	6731 +#include "jdmrgext.c"

	6732 +#undef RGB_RED

	6733 +#undef RGB_GREEN

	6734 +#undef RGB_BLUE

	6735 +#undef RGB_ALPHA

	6736 +#undef RGB_PIXELSIZE

	6737 +#undef h2v1_merged_upsample_internal

	6738 +#undef h2v2_merged_upsample_internal

	6739 +

	6740 +#define RGB_RED EXT_BGR_RED

	6741 +#define RGB_GREEN EXT_BGR_GREEN

	6742 +#define RGB_BLUE EXT_BGR_BLUE

	6743 +#define RGB_PIXELSIZE EXT_BGR_PIXELSIZE

	6744 +#define h2v1_merged_upsample_internal extbgr_h2v1_merged_upsample_internal

	6745 +#define h2v2_merged_upsample_internal extbgr_h2v2_merged_upsample_internal

	6746 +#include "jdmrgext.c"

	6747 +#undef RGB_RED

	6748 +#undef RGB_GREEN

	6749 +#undef RGB_BLUE

	6750 +#undef RGB_PIXELSIZE

	6751 +#undef h2v1_merged_upsample_internal

	6752 +#undef h2v2_merged_upsample_internal

	6753 +

	6754 +#define RGB_RED EXT_BGRX_RED

	6755 +#define RGB_GREEN EXT_BGRX_GREEN

	6756 +#define RGB_BLUE EXT_BGRX_BLUE

	6757 +#define RGB_ALPHA 3

	6758 +#define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE

	6759 +#define h2v1_merged_upsample_internal extbgrx_h2v1_merged_upsample_internal

	6760 +#define h2v2_merged_upsample_internal extbgrx_h2v2_merged_upsample_internal

	6761 +#include "jdmrgext.c"

	6762 +#undef RGB_RED

	6763 +#undef RGB_GREEN

	6764 +#undef RGB_BLUE

	6765 +#undef RGB_ALPHA

	6766 +#undef RGB_PIXELSIZE

	6767 +#undef h2v1_merged_upsample_internal

	6768 +#undef h2v2_merged_upsample_internal

	6769 +

	6770 +#define RGB_RED EXT_XBGR_RED

	6771 +#define RGB_GREEN EXT_XBGR_GREEN

	6772 +#define RGB_BLUE EXT_XBGR_BLUE

	6773 +#define RGB_ALPHA 0

	6774 +#define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE

	6775 +#define h2v1_merged_upsample_internal extxbgr_h2v1_merged_upsample_internal

	6776 +#define h2v2_merged_upsample_internal extxbgr_h2v2_merged_upsample_internal

	6777 +#include "jdmrgext.c"

	6778 +#undef RGB_RED

	6779 +#undef RGB_GREEN

	6780 +#undef RGB_BLUE

	6781 +#undef RGB_ALPHA

	6782 +#undef RGB_PIXELSIZE

	6783 +#undef h2v1_merged_upsample_internal

	6784 +#undef h2v2_merged_upsample_internal

	6785 +

	6786 +#define RGB_RED EXT_XRGB_RED

	6787 +#define RGB_GREEN EXT_XRGB_GREEN

	6788 +#define RGB_BLUE EXT_XRGB_BLUE

	6789 +#define RGB_ALPHA 0

	6790 +#define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE

	6791 +#define h2v1_merged_upsample_internal extxrgb_h2v1_merged_upsample_internal

	6792 +#define h2v2_merged_upsample_internal extxrgb_h2v2_merged_upsample_internal

	6793 +#include "jdmrgext.c"

	6794 +#undef RGB_RED

	6795 +#undef RGB_GREEN

	6796 +#undef RGB_BLUE

	6797 +#undef RGB_ALPHA

	6798 +#undef RGB_PIXELSIZE

	6799 +#undef h2v1_merged_upsample_internal

	6800 +#undef h2v2_merged_upsample_internal

	6801 +

	6802 +

	6803 /*

	6804 * Initialize tables for YCC->RGB colorspace conversion.

	6805 * This is taken directly from jdcolor.c; see that file for more info.

	6806 @@ -230,56 +333,40 @@

	6807 JSAMPIMAGE input_buf, JDIMENSION in_row_group_ctr,

	6808 JSAMPARRAY output_buf)

	6809 {

	6810 - my_upsample_ptr upsample = (my_upsample_ptr) cinfo->upsample;

	6811 - register int y, cred, cgreen, cblue;

	6812 - int cb, cr;

	6813 - register JSAMPROW outptr;

	6814 - JSAMPROW inptr0, inptr1, inptr2;

	6815 - JDIMENSION col;

	6816 - /* copy these pointers into registers if possible */

	6817 - register JSAMPLE * range_limit = cinfo->sample_range_limit;

	6818 - int * Crrtab = upsample->Cr_r_tab;

	6819 - int * Cbbtab = upsample->Cb_b_tab;

	6820 - INT32 * Crgtab = upsample->Cr_g_tab;

	6821 - INT32 * Cbgtab = upsample->Cb_g_tab;

	6822 - SHIFT_TEMPS

	6823 -

	6824 - inptr0 = input_buf[0][in_row_group_ctr];

	6825 - inptr1 = input_buf[1][in_row_group_ctr];

	6826 - inptr2 = input_buf[2][in_row_group_ctr];

	6827 - outptr = output_buf[0];

	6828 - /* Loop for each pair of output pixels */

	6829 - for (col = cinfo->output_width >> 1; col > 0; col--) {

	6830 - /* Do the chroma part of the calculation */

	6831 - cb = GETJSAMPLE(*inptr1++);

	6832 - cr = GETJSAMPLE(*inptr2++);

	6833 - cred = Crrtab[cr];

	6834 - cgreen = (int) RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS);

	6835 - cblue = Cbbtab[cb];

	6836 - /* Fetch 2 Y values and emit 2 pixels */

	6837 - y = GETJSAMPLE(*inptr0++);

	6838 - outptr[rgb_red[cinfo->out_color_space]] = range_limit[y + cred];

	6839 - outptr[rgb_green[cinfo->out_color_space]] = range_limit[y + cgreen];

	6840 - outptr[rgb_blue[cinfo->out_color_space]] = range_limit[y + cblue];

	6841 - outptr += rgb_pixelsize[cinfo->out_color_space];

	6842 - y = GETJSAMPLE(*inptr0++);

	6843 - outptr[rgb_red[cinfo->out_color_space]] = range_limit[y + cred];

	6844 - outptr[rgb_green[cinfo->out_color_space]] = range_limit[y + cgreen];

	6845 - outptr[rgb_blue[cinfo->out_color_space]] = range_limit[y + cblue];

	6846 - outptr += rgb_pixelsize[cinfo->out_color_space];

	6847 + switch (cinfo->out_color_space) {

	6848 + case JCS_EXT_RGB:

	6849 + extrgb_h2v1_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr,

	6850 + output_buf);

	6851 + break;

	6852 + case JCS_EXT_RGBX:

	6853 + case JCS_EXT_RGBA:

	6854 + extrgbx_h2v1_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr,

	6855 + output_buf);

	6856 + break;

	6857 + case JCS_EXT_BGR:

	6858 + extbgr_h2v1_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr,

	6859 + output_buf);

	6860 + break;

	6861 + case JCS_EXT_BGRX:

	6862 + case JCS_EXT_BGRA:

	6863 + extbgrx_h2v1_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr,

	6864 + output_buf);

	6865 + break;

	6866 + case JCS_EXT_XBGR:

	6867 + case JCS_EXT_ABGR:

	6868 + extxbgr_h2v1_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr,

	6869 + output_buf);

	6870 + break;

	6871 + case JCS_EXT_XRGB:

	6872 + case JCS_EXT_ARGB:

	6873 + extxrgb_h2v1_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr,

	6874 + output_buf);

	6875 + break;

	6876 + default:

	6877 + h2v1_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr,

	6878 + output_buf);

	6879 + break;

	6880 }

	6881 - /* If image width is odd, do the last output column separately */

	6882 - if (cinfo->output_width & 1) {

	6883 - cb = GETJSAMPLE(*inptr1);

	6884 - cr = GETJSAMPLE(*inptr2);

	6885 - cred = Crrtab[cr];

	6886 - cgreen = (int) RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS);

	6887 - cblue = Cbbtab[cb];

	6888 - y = GETJSAMPLE(*inptr0);

	6889 - outptr[rgb_red[cinfo->out_color_space]] = range_limit[y + cred];

	6890 - outptr[rgb_green[cinfo->out_color_space]] = range_limit[y + cgreen];

	6891 - outptr[rgb_blue[cinfo->out_color_space]] = range_limit[y + cblue];

	6892 - }

	6893 }

	6894

	6895

	6896 @@ -292,72 +379,40 @@

	6897 JSAMPIMAGE input_buf, JDIMENSION in_row_group_ctr,

	6898 JSAMPARRAY output_buf)

	6899 {

	6900 - my_upsample_ptr upsample = (my_upsample_ptr) cinfo->upsample;

	6901 - register int y, cred, cgreen, cblue;

	6902 - int cb, cr;

	6903 - register JSAMPROW outptr0, outptr1;

	6904 - JSAMPROW inptr00, inptr01, inptr1, inptr2;

	6905 - JDIMENSION col;

	6906 - /* copy these pointers into registers if possible */

	6907 - register JSAMPLE * range_limit = cinfo->sample_range_limit;

	6908 - int * Crrtab = upsample->Cr_r_tab;

	6909 - int * Cbbtab = upsample->Cb_b_tab;

	6910 - INT32 * Crgtab = upsample->Cr_g_tab;

	6911 - INT32 * Cbgtab = upsample->Cb_g_tab;

	6912 - SHIFT_TEMPS

	6913 -

	6914 - inptr00 = input_buf[0][in_row_group_ctr*2];

	6915 - inptr01 = input_buf[0][in_row_group_ctr*2 + 1];

	6916 - inptr1 = input_buf[1][in_row_group_ctr];

	6917 - inptr2 = input_buf[2][in_row_group_ctr];

	6918 - outptr0 = output_buf[0];

	6919 - outptr1 = output_buf[1];

	6920 - /* Loop for each group of output pixels */

	6921 - for (col = cinfo->output_width >> 1; col > 0; col--) {

	6922 - /* Do the chroma part of the calculation */

	6923 - cb = GETJSAMPLE(*inptr1++);

	6924 - cr = GETJSAMPLE(*inptr2++);

	6925 - cred = Crrtab[cr];

	6926 - cgreen = (int) RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS);

	6927 - cblue = Cbbtab[cb];

	6928 - /* Fetch 4 Y values and emit 4 pixels */

	6929 - y = GETJSAMPLE(*inptr00++);

	6930 - outptr0[rgb_red[cinfo->out_color_space]] = range_limit[y + cred];

	6931 - outptr0[rgb_green[cinfo->out_color_space]] = range_limit[y + cgreen];

	6932 - outptr0[rgb_blue[cinfo->out_color_space]] = range_limit[y + cblue];

	6933 - outptr0 += RGB_PIXELSIZE;

	6934 - y = GETJSAMPLE(*inptr00++);

	6935 - outptr0[rgb_red[cinfo->out_color_space]] = range_limit[y + cred];

	6936 - outptr0[rgb_green[cinfo->out_color_space]] = range_limit[y + cgreen];

	6937 - outptr0[rgb_blue[cinfo->out_color_space]] = range_limit[y + cblue];

	6938 - outptr0 += RGB_PIXELSIZE;

	6939 - y = GETJSAMPLE(*inptr01++);

	6940 - outptr1[rgb_red[cinfo->out_color_space]] = range_limit[y + cred];

	6941 - outptr1[rgb_green[cinfo->out_color_space]] = range_limit[y + cgreen];

	6942 - outptr1[rgb_blue[cinfo->out_color_space]] = range_limit[y + cblue];

	6943 - outptr1 += RGB_PIXELSIZE;

	6944 - y = GETJSAMPLE(*inptr01++);

	6945 - outptr1[rgb_red[cinfo->out_color_space]] = range_limit[y + cred];

	6946 - outptr1[rgb_green[cinfo->out_color_space]] = range_limit[y + cgreen];

	6947 - outptr1[rgb_blue[cinfo->out_color_space]] = range_limit[y + cblue];

	6948 - outptr1 += RGB_PIXELSIZE;

	6949 + switch (cinfo->out_color_space) {

	6950 + case JCS_EXT_RGB:

	6951 + extrgb_h2v2_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr,

	6952 + output_buf);

	6953 + break;

	6954 + case JCS_EXT_RGBX:

	6955 + case JCS_EXT_RGBA:

	6956 + extrgbx_h2v2_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr,

	6957 + output_buf);

	6958 + break;

	6959 + case JCS_EXT_BGR:

	6960 + extbgr_h2v2_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr,

	6961 + output_buf);

	6962 + break;

	6963 + case JCS_EXT_BGRX:

	6964 + case JCS_EXT_BGRA:

	6965 + extbgrx_h2v2_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr,

	6966 + output_buf);

	6967 + break;

	6968 + case JCS_EXT_XBGR:

	6969 + case JCS_EXT_ABGR:

	6970 + extxbgr_h2v2_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr,

	6971 + output_buf);

	6972 + break;

	6973 + case JCS_EXT_XRGB:

	6974 + case JCS_EXT_ARGB:

	6975 + extxrgb_h2v2_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr,

	6976 + output_buf);

	6977 + break;

	6978 + default:

	6979 + h2v2_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr,

	6980 + output_buf);

	6981 + break;

	6982 }

	6983 - /* If image width is odd, do the last output column separately */

	6984 - if (cinfo->output_width & 1) {

	6985 - cb = GETJSAMPLE(*inptr1);

	6986 - cr = GETJSAMPLE(*inptr2);

	6987 - cred = Crrtab[cr];

	6988 - cgreen = (int) RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS);

	6989 - cblue = Cbbtab[cb];

	6990 - y = GETJSAMPLE(*inptr00);

	6991 - outptr0[rgb_red[cinfo->out_color_space]] = range_limit[y + cred];

	6992 - outptr0[rgb_green[cinfo->out_color_space]] = range_limit[y + cgreen];

	6993 - outptr0[rgb_blue[cinfo->out_color_space]] = range_limit[y + cblue];

	6994 - y = GETJSAMPLE(*inptr01);

	6995 - outptr1[rgb_red[cinfo->out_color_space]] = range_limit[y + cred];

	6996 - outptr1[rgb_green[cinfo->out_color_space]] = range_limit[y + cgreen];

	6997 - outptr1[rgb_blue[cinfo->out_color_space]] = range_limit[y + cblue];

	6998 - }

	6999 }

	7000

	7001
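The jdmerge.c change includes jdmrgext.c several times under different RGB_* macros, which compiles one function body per output pixel layout. It then picks an instantiation with a switch on out_color_space. The sketch below shows the same idea in a single file, using a function-generating macro in place of the repeated #include.

Notes on the sketch:
- The fixed-point constants are the usual JFIF YCbCr-to-RGB coefficients.
- Writing 0xFF into the alpha byte is an assumption about what jdmrgext.c does when RGB_ALPHA is defined.
- For brevity, it recomputes the chroma terms for every pixel instead of once per pixel pair, as the removed loop did.
- It relies on `>>` of a negative long being an arithmetic shift.
- The layout enum and function names are hypothetical.

```c
#include <assert.h>

typedef unsigned char JSAMPLE;

#define SCALEBITS 16
#define ONE_HALF ((long) 1 << (SCALEBITS - 1))
#define FIX(x) ((long) ((x) * (1L << SCALEBITS) + 0.5))

static JSAMPLE clamp_sample(long v) {
  return (JSAMPLE) (v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* One h2v1 merged upsample + YCC->RGB body, instantiated per layout.
 * R, G, B, A are byte offsets within a pixel (A < 0: no alpha byte). */
#define DEFINE_H2V1_MERGED(name, R, G, B, A, PIXELSIZE)                     \
  static void name(const JSAMPLE *y, const JSAMPLE *cb, const JSAMPLE *cr,  \
                   unsigned width, JSAMPLE *out) {                           \
    unsigned col;                                                            \
    for (col = 0; col < width; col++) {                                      \
      /* h2v1: each chroma sample covers two horizontally adjacent pixels */ \
      long c_b = (long) cb[col / 2] - 128;                                   \
      long c_r = (long) cr[col / 2] - 128;                                   \
      long cred = (FIX(1.40200) * c_r + ONE_HALF) >> SCALEBITS;              \
      long cgreen = (-FIX(0.34414) * c_b - FIX(0.71414) * c_r + ONE_HALF)    \
                    >> SCALEBITS;                                            \
      long cblue = (FIX(1.77200) * c_b + ONE_HALF) >> SCALEBITS;             \
      out[R] = clamp_sample(y[col] + cred);                                  \
      out[G] = clamp_sample(y[col] + cgreen);                                \
      out[B] = clamp_sample(y[col] + cblue);                                 \
      if ((A) >= 0)                                                          \
        out[(A) >= 0 ? (A) : 0] = 0xFF; /* assumed opaque alpha */           \
      out += PIXELSIZE;                                                      \
    }                                                                        \
  }

DEFINE_H2V1_MERGED(extrgb_h2v1, 0, 1, 2, -1, 3)
DEFINE_H2V1_MERGED(extbgrx_h2v1, 2, 1, 0, 3, 4)

/* Hypothetical layout tags standing in for J_COLOR_SPACE values. */
enum demo_layout { DEMO_RGB, DEMO_BGRX, DEMO_BGRA };

static void h2v1_merged_upsample(enum demo_layout layout, const JSAMPLE *y,
                                 const JSAMPLE *cb, const JSAMPLE *cr,
                                 unsigned width, JSAMPLE *out) {
  switch (layout) {
  case DEMO_BGRX:
  case DEMO_BGRA:  /* X and A layouts share one instantiation */
    extbgrx_h2v1(y, cb, cr, width, out);
    break;
  case DEMO_RGB:
  default:
    extrgb_h2v1(y, cb, cr, width, out);
    break;
  }
}
```

Each layout gets its own copy of the body, so the channel offsets are compile-time constants inside the loop. The removed code instead indexed rgb_red[], rgb_green[] and rgb_blue[] for every pixel.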

	7002 Index: jdphuff.c

	7003 ===================================================================

	7004 --- jdphuff.c (revision 829)

	7005 +++ jdphuff.c (working copy)

	7006 @@ -198,6 +198,7 @@

	7007 * On some machines, a shift and add will be faster than a table lookup.

	7008 */

	7009

	7010 +#define AVOID_TABLES

	7011 #ifdef AVOID_TABLES

	7012

	7013 #define HUFF_EXTEND(x,s) ((x) < (1<<((s)-1)) ? (x) + (((-1)<<(s)) + 1) : (x))
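Defining AVOID_TABLES selects the shift-and-add form of HUFF_EXTEND shown in the context line, instead of the extend_test/extend_offset table lookups. HUFF_EXTEND turns an s-bit received value into a signed coefficient: a value whose top bit is clear stands for a negative number. The sketch below implements both forms so they can be checked against each other.

It writes `((-1) << s) + 1` as `1 - (1 << s)`. This gives the same value without left-shifting a negative number, which C leaves undefined. Here the tables are filled at run time rather than written out as constants.

```c
#include <assert.h>

/* Shift-and-add form (AVOID_TABLES).  An s-bit value whose top bit is
 * clear encodes a negative coefficient: x maps to x - (2^s - 1). */
static int huff_extend_shift(int x, int s) {
  assert(s >= 1 && s <= 15);
  return x < (1 << (s - 1)) ? x + (1 - (1 << s)) : x;
}

/* Table form: the two lookups that AVOID_TABLES replaces.
 * Index 0 stays zero, so x < extend_test[0] is never true. */
static int extend_test[16];   /* extend_test[s]   = 2^(s-1) */
static int extend_offset[16]; /* extend_offset[s] = 1 - 2^s */

static void init_extend_tables(void) {
  int s;
  for (s = 1; s < 16; s++) {
    extend_test[s] = 1 << (s - 1);
    extend_offset[s] = 1 - (1 << s);
  }
}

static int huff_extend_table(int x, int s) {
  return x < extend_test[s] ? x + extend_offset[s] : x;
}
```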

	7014 Index: jdsample.c

	7015 ===================================================================

	7016 --- jdsample.c (revision 829)

	7017 +++ jdsample.c (working copy)

	7018 @@ -1,9 +1,11 @@

	7019 /*

	7020 * jdsample.c

	7021 *

	7022 + * This file was part of the Independent JPEG Group's software:

	7023 * Copyright (C) 1991-1996, Thomas G. Lane.

	7024 + * libjpeg-turbo Modifications:

	7025 * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB

	7026 - * This file is part of the Independent JPEG Group's software.

	7027 + * Copyright (C) 2010, D. R. Commander.

	7028 * For conditions of distribution and use, see the accompanying README file.

	7029 *

	7030 * This file contains upsampling routines.

	7031 @@ -19,50 +21,12 @@

	7032 * Pub. by IEEE Computer Society Press, Los Alamitos, CA. ISBN 0-8186-8944-7.

	7033 */

	7034

	7035 -#define JPEG_INTERNALS

	7036 -#include "jinclude.h"

	7037 -#include "jpeglib.h"

	7038 +#include "jdsample.h"

	7039 #include "jsimd.h"

	7040 +#include "jpegcomp.h"

	7041

	7042

	7043 -/* Pointer to routine to upsample a single component */

	7044 -typedef JMETHOD(void, upsample1_ptr,

	7045 - (j_decompress_ptr cinfo, jpeg_component_info * compptr,

	7046 - JSAMPARRAY input_data, JSAMPARRAY * output_data_ptr));

	7047

	7048 -/* Private subobject */

	7049 -

	7050 -typedef struct {

	7051 - struct jpeg_upsampler pub; /* public fields */

	7052 -

	7053 - /* Color conversion buffer. When using separate upsampling and color

	7054 - * conversion steps, this buffer holds one upsampled row group until it

	7055 - * has been color converted and output.

	7056 - * Note: we do not allocate any storage for component(s) which are full-size,

	7057 - * ie do not need rescaling. The corresponding entry of color_buf[] is

	7058 - * simply set to point to the input data array, thereby avoiding copying.

	7059 - */

	7060 - JSAMPARRAY color_buf[MAX_COMPONENTS];

	7061 -

	7062 - /* Per-component upsampling method pointers */

	7063 - upsample1_ptr methods[MAX_COMPONENTS];

	7064 -

	7065 - int next_row_out; /* counts rows emitted from color_buf */

	7066 - JDIMENSION rows_to_go; /* counts rows remaining in image */

	7067 -

	7068 - /* Height of an input row group for each component. */

	7069 - int rowgroup_height[MAX_COMPONENTS];

	7070 -

	7071 - /* These arrays save pixel expansion factors so that int_expand need not

	7072 - * recompute them each time. They are unused for other upsampling methods.

	7073 - */

	7074 - UINT8 h_expand[MAX_COMPONENTS];

	7075 - UINT8 v_expand[MAX_COMPONENTS];

	7076 -} my_upsampler;

	7077 -

	7078 -typedef my_upsampler * my_upsample_ptr;

	7079 -

	7080 -

	7081 /*

	7082 * Initialize for an upsampling pass.

	7083 */

	7084 @@ -420,7 +384,7 @@

	7085 /* jdmainct.c doesn't support context rows when min_DCT_scaled_size = 1,

	7086 * so don't ask for it.

	7087 */

	7088 - do_fancy = cinfo->do_fancy_upsampling && cinfo->min_DCT_scaled_size > 1;

	7089 + do_fancy = cinfo->do_fancy_upsampling && cinfo->_min_DCT_scaled_size > 1;

	7090

	7091 /* Verify we can handle the sampling factors, select per-component methods,

	7092 * and create storage as needed.

	7093 @@ -430,10 +394,10 @@

	7094 /* Compute size of an "input group" after IDCT scaling. This many samples

	7095 * are to be converted to max_h_samp_factor * max_v_samp_factor pixels.

	7096 */

	7097 - h_in_group = (compptr->h_samp_factor * compptr->DCT_scaled_size) /

	7098 - cinfo->min_DCT_scaled_size;

	7099 - v_in_group = (compptr->v_samp_factor * compptr->DCT_scaled_size) /

	7100 - cinfo->min_DCT_scaled_size;

	7101 + h_in_group = (compptr->h_samp_factor * compptr->_DCT_scaled_size) /

	7102 + cinfo->_min_DCT_scaled_size;

	7103 + v_in_group = (compptr->v_samp_factor * compptr->_DCT_scaled_size) /

	7104 + cinfo->_min_DCT_scaled_size;

	7105 h_out_group = cinfo->max_h_samp_factor;

	7106 v_out_group = cinfo->max_v_samp_factor;

	7107 upsample->rowgroup_height[ci] = v_in_group; /* save for use later */
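In jdsample.c, each component's input group size now comes from the renamed _DCT_scaled_size fields. The ratio of output group to input group decides how much upsampling remains after IDCT scaling. The sketch below (hypothetical helper names) computes that ratio and the fancy-upsampling guard from the hunks above. It reports only integer expansion factors and returns 0 for a non-integer ratio; it does not model jdsample.c's method selection.

```c
#include <assert.h>

/* Samples per "input group" for one component after IDCT scaling, as in
 * the patched h_in_group / v_in_group computation. */
static int in_group_size(int samp_factor, int dct_scaled_size,
                         int min_dct_scaled_size) {
  return (samp_factor * dct_scaled_size) / min_dct_scaled_size;
}

/* Integer expansion still left to the upsampler (out_group is the max
 * sampling factor); 0 flags a non-integer ratio in this sketch. */
static int expansion_factor(int in_group, int out_group) {
  assert(in_group > 0);
  return out_group % in_group == 0 ? out_group / in_group : 0;
}

/* The patched do_fancy test: jdmainct.c provides no context rows when
 * the minimum DCT scaled size is 1, so fancy upsampling is disabled. */
static int use_fancy_upsampling(int do_fancy_upsampling,
                                int min_dct_scaled_size) {
  return do_fancy_upsampling && min_dct_scaled_size > 1;
}
```

Take a 4:2:0 image at 1/2 scale, where jdmaster.c chooses a minimum size of 4 and a chroma size of 8: luma and chroma both end up with a 1:1 ratio. At full size (all sizes 8), chroma still needs 2x expansion.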

	7108 Index: jdtrans.c

	7109 ===================================================================

	7110 --- jdtrans.c (revision 829)

	7111 +++ jdtrans.c (working copy)

	7112 @@ -99,9 +99,18 @@

	7113 /* This is effectively a buffered-image operation. */

	7114 cinfo->buffered_image = TRUE;

	7115

	7116 +#if JPEG_LIB_VERSION >= 80

	7117 + /* Compute output image dimensions and related values. */

	7118 + jpeg_core_output_dimensions(cinfo);

	7119 +#endif

	7120 +

	7121 /* Entropy decoding: either Huffman or arithmetic coding. */

	7122 if (cinfo->arith_code) {

	7123 +#ifdef D_ARITH_CODING_SUPPORTED

	7124 + jinit_arith_decoder(cinfo);

	7125 +#else

	7126 ERREXIT(cinfo, JERR_ARITH_NOTIMPL);

	7127 +#endif

	7128 } else {

	7129 if (cinfo->progressive_mode) {

	7130 #ifdef D_PROGRESSIVE_SUPPORTED

	7131 Index: jerror.h

	7132 ===================================================================

	7133 --- jerror.h (revision 829)

	7134 +++ jerror.h (working copy)

	7135 @@ -2,6 +2,7 @@

	7136 * jerror.h

	7137 *

	7138 * Copyright (C) 1994-1997, Thomas G. Lane.

	7139 + * Modified 1997-2009 by Guido Vollbeding.

	7140 * This file is part of the Independent JPEG Group's software.

	7141 * For conditions of distribution and use, see the accompanying README file.

	7142 *

	7143 @@ -39,14 +40,23 @@

	7144 JMESSAGE(JMSG_NOMESSAGE, "Bogus message code %d") /* Must be first entry! */

	7145

	7146 /* For maintenance convenience, list is alphabetical by message code name */

	7147 +#if JPEG_LIB_VERSION < 70

	7148 JMESSAGE(JERR_ARITH_NOTIMPL,

	7149 - "Sorry, there are legal restrictions on arithmetic coding")

	7150 + "Sorry, arithmetic coding is not implemented")

	7151 +#endif

	7152 JMESSAGE(JERR_BAD_ALIGN_TYPE, "ALIGN_TYPE is wrong, please fix")

	7153 JMESSAGE(JERR_BAD_ALLOC_CHUNK, "MAX_ALLOC_CHUNK is wrong, please fix")

	7154 JMESSAGE(JERR_BAD_BUFFER_MODE, "Bogus buffer control mode")

	7155 JMESSAGE(JERR_BAD_COMPONENT_ID, "Invalid component ID %d in SOS")

	7156 +#if JPEG_LIB_VERSION >= 70

	7157 +JMESSAGE(JERR_BAD_CROP_SPEC, "Invalid crop request")

	7158 +#endif

	7159 JMESSAGE(JERR_BAD_DCT_COEF, "DCT coefficient out of range")

	7160 JMESSAGE(JERR_BAD_DCTSIZE, "IDCT output block size %d not supported")

	7161 +#if JPEG_LIB_VERSION >= 70

	7162 +JMESSAGE(JERR_BAD_DROP_SAMPLING,

	7163 + "Component index %d: mismatching sampling ratio %d:%d, %d:%d, %c")

	7164 +#endif

	7165 JMESSAGE(JERR_BAD_HUFF_TABLE, "Bogus Huffman table definition")

	7166 JMESSAGE(JERR_BAD_IN_COLORSPACE, "Bogus input colorspace")

	7167 JMESSAGE(JERR_BAD_J_COLORSPACE, "Bogus JPEG colorspace")

	7168 @@ -93,6 +103,9 @@

	7169 JMESSAGE(JERR_MODE_CHANGE, "Invalid color quantization mode change")

	7170 JMESSAGE(JERR_NOTIMPL, "Not implemented yet")

	7171 JMESSAGE(JERR_NOT_COMPILED, "Requested feature was omitted at compile time")

	7172 +#if JPEG_LIB_VERSION >= 70

	7173 +JMESSAGE(JERR_NO_ARITH_TABLE, "Arithmetic table 0x%02x was not defined")

	7174 +#endif

	7175 JMESSAGE(JERR_NO_BACKING_STORE, "Backing store not supported")

	7176 JMESSAGE(JERR_NO_HUFF_TABLE, "Huffman table 0x%02x was not defined")

	7177 JMESSAGE(JERR_NO_IMAGE, "JPEG datastream contains no image")

	7178 @@ -170,6 +183,9 @@

	7179 JMESSAGE(JTRC_XMS_CLOSE, "Freed XMS handle %u")

	7180 JMESSAGE(JTRC_XMS_OPEN, "Obtained XMS handle %u")

	7181 JMESSAGE(JWRN_ADOBE_XFORM, "Unknown Adobe color transform code %d")

	7182 +#if JPEG_LIB_VERSION >= 70

	7183 +JMESSAGE(JWRN_ARITH_BAD_CODE, "Corrupt JPEG data: bad arithmetic code")

	7184 +#endif

	7185 JMESSAGE(JWRN_BOGUS_PROGRESSION,

	7186 "Inconsistent progression sequence for component %d coefficient %d")

	7187 JMESSAGE(JWRN_EXTRANEOUS_DATA,

	7188 @@ -182,6 +198,13 @@

	7189 "Corrupt JPEG data: found marker 0x%02x instead of RST%d")

	7190 JMESSAGE(JWRN_NOT_SEQUENTIAL, "Invalid SOS parameters for sequential JPEG")

	7191 JMESSAGE(JWRN_TOO_MUCH_DATA, "Application transferred too many scanlines")

	7192 +#if JPEG_LIB_VERSION < 70

	7193 +JMESSAGE(JERR_BAD_CROP_SPEC, "Invalid crop request")

	7194 +#if defined(C_ARITH_CODING_SUPPORTED) || defined(D_ARITH_CODING_SUPPORTED)

	7195 +JMESSAGE(JERR_NO_ARITH_TABLE, "Arithmetic table 0x%02x was not defined")

	7196 +JMESSAGE(JWRN_ARITH_BAD_CODE, "Corrupt JPEG data: bad arithmetic code")

	7197 +#endif

	7198 +#endif

	7199

	7200 #ifdef JMAKE_ENUM_LIST

	7201
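jerror.h builds both the message-code enum and the message string table from a single JMESSAGE list. It expands that list differently depending on whether JMAKE_ENUM_LIST is defined. Codes are numbered by position, and for JPEG_LIB_VERSION < 70 the patch appends the new messages after JWRN_TOO_MUCH_DATA instead of in alphabetical order. Presumably this keeps every existing code at its old value.

The sketch below shows the same X-macro idea in one file. It passes the expansion macro as a parameter instead of re-including a header. The list is a small excerpt, and the type, table and function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Excerpt of the message list; JMESSAGE is a parameter so the same list
 * can be expanded into an enum and into a string table. */
#define DEMO_MESSAGE_LIST(JMESSAGE)                                          \
  JMESSAGE(JMSG_NOMESSAGE, "Bogus message code %d")                          \
  JMESSAGE(JERR_ARITH_NOTIMPL, "Sorry, arithmetic coding is not implemented") \
  JMESSAGE(JERR_BAD_ALIGN_TYPE, "ALIGN_TYPE is wrong, please fix")           \
  JMESSAGE(JWRN_TOO_MUCH_DATA, "Application transferred too many scanlines") \
  /* appended after the last old entry, so earlier codes keep their values */ \
  JMESSAGE(JERR_BAD_CROP_SPEC, "Invalid crop request")

#define DEMO_AS_ENUM(code, string) code,
#define DEMO_AS_STRING(code, string) string,

typedef enum {
  DEMO_MESSAGE_LIST(DEMO_AS_ENUM)
  DEMO_LASTMSGCODE
} demo_message_code;

static const char *const demo_message_table[] = {
  DEMO_MESSAGE_LIST(DEMO_AS_STRING)
  NULL
};

/* Reverse lookup by message text; returns -1 if the text is absent. */
static int demo_find_code(const char *text) {
  int i;
  assert(text != NULL);
  for (i = 0; i < DEMO_LASTMSGCODE; i++)
    if (strcmp(demo_message_table[i], text) == 0)
      return i;
  return -1;
}
```

Inserting JERR_BAD_CROP_SPEC in alphabetical position would instead shift the values of every code after it.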

	7202 Index: jidctint.c

	7203 ===================================================================

	7204 --- jidctint.c (revision 829)

	7205 +++ jidctint.c (working copy)

	7206 @@ -2,6 +2,7 @@

	7207 * jidctint.c

	7208 *

	7209 * Copyright (C) 1991-1998, Thomas G. Lane.

	7210 + * Modification developed 2002-2009 by Guido Vollbeding.

	7211 * This file is part of the Independent JPEG Group's software.

	7212 * For conditions of distribution and use, see the accompanying README file.

	7213 *

	7214 @@ -23,6 +24,27 @@

	7215 * The advantage of this method is that no data path contains more than one

	7216 * multiplication; this allows a very simple and accurate implementation in

	7217 * scaled fixed-point arithmetic, with a minimal number of shifts.

	7218 + *

	7219 + * We also provide IDCT routines with various output sample block sizes for

	7220 + * direct resolution reduction or enlargement without additional resampling:

	7221 + * NxN (N=1...16) pixels for one 8x8 input DCT block.

	7222 + *

	7223 + * For N<8 we simply take the corresponding low-frequency coefficients of

	7224 + * the 8x8 input DCT block and apply an NxN point IDCT on the sub-block

	7225 + * to yield the downscaled outputs.

	7226 + * This can be seen as direct low-pass downsampling from the DCT domain

	7227 + * point of view rather than the usual spatial domain point of view,

	7228 + * yielding significant computational savings and results at least

	7229 + * as good as common bilinear (averaging) spatial downsampling.

	7230 + *

	7231 + * For N>8 we apply a partial NxN IDCT on the 8 input coefficients as

	7232 + * lower frequencies and higher frequencies assumed to be zero.

	7233 + * It turns out that the computational effort is similar to the 8x8 IDCT

	7234 + * regarding the output size.

	7235 + * Furthermore, the scaling and descaling is the same for all IDCT sizes.

	7236 + *

	7237 + * CAUTION: We rely on the FIX() macro except for the N=1,2,4,8 cases

	7238 + * since there would be too many additional constants to pre-calculate.

	7239 */

	7240

	7241 #define JPEG_INTERNALS

	7242 @@ -38,7 +60,7 @@

	7243 */

	7244

	7245 #if DCTSIZE != 8

	7246 - Sorry, this code only copes with 8x8 DCTs. /* deliberate syntax err */

	7247 + Sorry, this code only copes with 8x8 DCT blocks. /* deliberate syntax err */

	7248 #endif

	7249

	7250

	7251 @@ -386,4 +408,2216 @@

	7252 }

	7253 }

	7254

	7255 +#ifdef IDCT_SCALING_SUPPORTED

	7256 +

	7257 +

	7258 +/*

	7259 + * Perform dequantization and inverse DCT on one block of coefficients,

	7260 + * producing a 7x7 output block.

	7261 + *

	7262 + * Optimized algorithm with 12 multiplications in the 1-D kernel.

	7263 + * cK represents sqrt(2) * cos(K*pi/14).

	7264 + */

	7265 +

	7266 +GLOBAL(void)

	7267 +jpeg_idct_7x7 (j_decompress_ptr cinfo, jpeg_component_info * compptr,

	7268 + JCOEFPTR coef_block,

	7269 + JSAMPARRAY output_buf, JDIMENSION output_col)

	7270 +{

	7271 + INT32 tmp0, tmp1, tmp2, tmp10, tmp11, tmp12, tmp13;

	7272 + INT32 z1, z2, z3;

	7273 + JCOEFPTR inptr;

	7274 + ISLOW_MULT_TYPE * quantptr;

	7275 + int * wsptr;

	7276 + JSAMPROW outptr;

	7277 + JSAMPLE *range_limit = IDCT_range_limit(cinfo);

	7278 + int ctr;

	7279 + int workspace[7*7]; /* buffers data between passes */

	7280 + SHIFT_TEMPS

	7281 +

	7282 + /* Pass 1: process columns from input, store into work array. */

	7283 +

	7284 + inptr = coef_block;

	7285 + quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;

	7286 + wsptr = workspace;

	7287 + for (ctr = 0; ctr < 7; ctr++, inptr++, quantptr++, wsptr++) {

	7288 + /* Even part */

	7289 +

	7290 + tmp13 = DEQUANTIZE(inptr[DCTSIZE0], quantptr[DCTSIZE0]);

	7291 + tmp13 <<= CONST_BITS;

	7292 + /* Add fudge factor here for final descale. */

	7293 + tmp13 += ONE << (CONST_BITS-PASS1_BITS-1);

	7294 +

	7295 + z1 = DEQUANTIZE(inptr[DCTSIZE2], quantptr[DCTSIZE2]);

	7296 + z2 = DEQUANTIZE(inptr[DCTSIZE4], quantptr[DCTSIZE4]);

	7297 + z3 = DEQUANTIZE(inptr[DCTSIZE6], quantptr[DCTSIZE6]);

	7298 +

	7299 + tmp10 = MULTIPLY(z2 - z3, FIX(0.881747734)); /* c4 */

	7300 + tmp12 = MULTIPLY(z1 - z2, FIX(0.314692123)); /* c6 */

	7301 + tmp11 = tmp10 + tmp12 + tmp13 - MULTIPLY(z2, FIX(1.841218003)); /* c2+c4-c6 */

	7302 + tmp0 = z1 + z3;

	7303 + z2 -= tmp0;

	7304 + tmp0 = MULTIPLY(tmp0, FIX(1.274162392)) + tmp13; /* c2 */

	7305 + tmp10 += tmp0 - MULTIPLY(z3, FIX(0.077722536)); /* c2-c4-c6 */

	7306 + tmp12 += tmp0 - MULTIPLY(z1, FIX(2.470602249)); /* c2+c4+c6 */

	7307 + tmp13 += MULTIPLY(z2, FIX(1.414213562)); /* c0 */

	7308 +

	7309 + /* Odd part */

	7310 +

	7311 + z1 = DEQUANTIZE(inptr[DCTSIZE1], quantptr[DCTSIZE1]);

	7312 + z2 = DEQUANTIZE(inptr[DCTSIZE3], quantptr[DCTSIZE3]);

	7313 + z3 = DEQUANTIZE(inptr[DCTSIZE5], quantptr[DCTSIZE5]);

	7314 +

	7315 + tmp1 = MULTIPLY(z1 + z2, FIX(0.935414347)); /* (c3+c1-c5)/2 */

	7316 + tmp2 = MULTIPLY(z1 - z2, FIX(0.170262339)); /* (c3+c5-c1)/2 */

	7317 + tmp0 = tmp1 - tmp2;

	7318 + tmp1 += tmp2;

	7319 + tmp2 = MULTIPLY(z2 + z3, - FIX(1.378756276)); /* -c1 */

	7320 + tmp1 += tmp2;

	7321 + z2 = MULTIPLY(z1 + z3, FIX(0.613604268)); /* c5 */

	7322 + tmp0 += z2;

	7323 + tmp2 += z2 + MULTIPLY(z3, FIX(1.870828693)); /* c3+c1-c5 */

	7324 +

	7325 + /* Final output stage */

	7326 +

	7327 + wsptr[7*0] = (int) RIGHT_SHIFT(tmp10 + tmp0, CONST_BITS-PASS1_BITS);

	7328 + wsptr[7*6] = (int) RIGHT_SHIFT(tmp10 - tmp0, CONST_BITS-PASS1_BITS);

	7329 + wsptr[7*1] = (int) RIGHT_SHIFT(tmp11 + tmp1, CONST_BITS-PASS1_BITS);

	7330 + wsptr[7*5] = (int) RIGHT_SHIFT(tmp11 - tmp1, CONST_BITS-PASS1_BITS);

	7331 + wsptr[7*2] = (int) RIGHT_SHIFT(tmp12 + tmp2, CONST_BITS-PASS1_BITS);

	7332 + wsptr[7*4] = (int) RIGHT_SHIFT(tmp12 - tmp2, CONST_BITS-PASS1_BITS);

	7333 + wsptr[7*3] = (int) RIGHT_SHIFT(tmp13, CONST_BITS-PASS1_BITS);

	7334 + }

	7335 +

	7336 + /* Pass 2: process 7 rows from work array, store into output array. */

	7337 +

	7338 + wsptr = workspace;

	7339 + for (ctr = 0; ctr < 7; ctr++) {

	7340 + outptr = output_buf[ctr] + output_col;

	7341 +

	7342 + /* Even part */

	7343 +

	7344 + /* Add fudge factor here for final descale. */

	7345 + tmp13 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));

	7346 + tmp13 <<= CONST_BITS;

	7347 +

	7348 + z1 = (INT32) wsptr[2];

	7349 + z2 = (INT32) wsptr[4];

	7350 + z3 = (INT32) wsptr[6];

	7351 +

	7352 + tmp10 = MULTIPLY(z2 - z3, FIX(0.881747734)); /* c4 */

	7353 + tmp12 = MULTIPLY(z1 - z2, FIX(0.314692123)); /* c6 */

	7354 + tmp11 = tmp10 + tmp12 + tmp13 - MULTIPLY(z2, FIX(1.841218003)); /* c2+c4-c6 */

	7355 + tmp0 = z1 + z3;

	7356 + z2 -= tmp0;

	7357 + tmp0 = MULTIPLY(tmp0, FIX(1.274162392)) + tmp13; /* c2 */

	7358 + tmp10 += tmp0 - MULTIPLY(z3, FIX(0.077722536)); /* c2-c4-c6 */

	7359 + tmp12 += tmp0 - MULTIPLY(z1, FIX(2.470602249)); /* c2+c4+c6 */

	7360 + tmp13 += MULTIPLY(z2, FIX(1.414213562)); /* c0 */

	7361 +

	7362 + /* Odd part */

	7363 +

	7364 + z1 = (INT32) wsptr[1];

	7365 + z2 = (INT32) wsptr[3];

	7366 + z3 = (INT32) wsptr[5];

	7367 +

	7368 + tmp1 = MULTIPLY(z1 + z2, FIX(0.935414347)); /* (c3+c1-c5)/2 */

	7369 + tmp2 = MULTIPLY(z1 - z2, FIX(0.170262339)); /* (c3+c5-c1)/2 */

	7370 + tmp0 = tmp1 - tmp2;

	7371 + tmp1 += tmp2;

	7372 + tmp2 = MULTIPLY(z2 + z3, - FIX(1.378756276)); /* -c1 */

	7373 + tmp1 += tmp2;

	7374 + z2 = MULTIPLY(z1 + z3, FIX(0.613604268)); /* c5 */

	7375 + tmp0 += z2;

	7376 + tmp2 += z2 + MULTIPLY(z3, FIX(1.870828693)); /* c3+c1-c5 */

	7377 +

	7378 + /* Final output stage */

	7379 +

	7380 + outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp0,

	7381 + CONST_BITS+PASS1_BITS+3)

	7382 + & RANGE_MASK];

	7383 + outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp0,

	7384 + CONST_BITS+PASS1_BITS+3)

	7385 + & RANGE_MASK];

	7386 + outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp11 + tmp1,

	7387 + CONST_BITS+PASS1_BITS+3)

	7388 + & RANGE_MASK];

	7389 + outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp11 - tmp1,

	7390 + CONST_BITS+PASS1_BITS+3)

	7391 + & RANGE_MASK];

	7392 + outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp12 + tmp2,

	7393 + CONST_BITS+PASS1_BITS+3)

	7394 + & RANGE_MASK];

	7395 + outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp12 - tmp2,

	7396 + CONST_BITS+PASS1_BITS+3)

	7397 + & RANGE_MASK];

	7398 + outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp13,

	7399 + CONST_BITS+PASS1_BITS+3)

	7400 + & RANGE_MASK];

	7401 +

	7402 + wsptr += 7; /* advance pointer to next row */

	7403 + }

	7404 +}

	7405 +

	7406 +

	7407 +/*

	7408 + * Perform dequantization and inverse DCT on one block of coefficients,

	7409 + * producing a reduced-size 6x6 output block.

	7410 + *

	7411 + * Optimized algorithm with 3 multiplications in the 1-D kernel.

	7412 + * cK represents sqrt(2) * cos(K*pi/12).

	7413 + */

	7414 +

	7415 +GLOBAL(void)

	7416 +jpeg_idct_6x6 (j_decompress_ptr cinfo, jpeg_component_info * compptr,

	7417 + JCOEFPTR coef_block,

	7418 + JSAMPARRAY output_buf, JDIMENSION output_col)

	7419 +{

	7420 + INT32 tmp0, tmp1, tmp2, tmp10, tmp11, tmp12;

	7421 + INT32 z1, z2, z3;

	7422 + JCOEFPTR inptr;

	7423 + ISLOW_MULT_TYPE * quantptr;

	7424 + int * wsptr;

	7425 + JSAMPROW outptr;

	7426 + JSAMPLE *range_limit = IDCT_range_limit(cinfo);

	7427 + int ctr;

	7428 + int workspace[6*6]; /* buffers data between passes */

	7429 + SHIFT_TEMPS

	7430 +

	7431 + /* Pass 1: process columns from input, store into work array. */

	7432 +

	7433 + inptr = coef_block;

	7434 + quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;

	7435 + wsptr = workspace;

	7436 + for (ctr = 0; ctr < 6; ctr++, inptr++, quantptr++, wsptr++) {

	7437 + /* Even part */

	7438 +

	7439 + tmp0 = DEQUANTIZE(inptr[DCTSIZE0], quantptr[DCTSIZE0]);

	7440 + tmp0 <<= CONST_BITS;

	7441 + /* Add fudge factor here for final descale. */

	7442 + tmp0 += ONE << (CONST_BITS-PASS1_BITS-1);

	7443 + tmp2 = DEQUANTIZE(inptr[DCTSIZE4], quantptr[DCTSIZE4]);

	7444 + tmp10 = MULTIPLY(tmp2, FIX(0.707106781)); /* c4 */

	7445 + tmp1 = tmp0 + tmp10;

	7446 + tmp11 = RIGHT_SHIFT(tmp0 - tmp10 - tmp10, CONST_BITS-PASS1_BITS);

	7447 + tmp10 = DEQUANTIZE(inptr[DCTSIZE2], quantptr[DCTSIZE2]);

	7448 + tmp0 = MULTIPLY(tmp10, FIX(1.224744871)); /* c2 */

	7449 + tmp10 = tmp1 + tmp0;

	7450 + tmp12 = tmp1 - tmp0;

	7451 +

	7452 + /* Odd part */

	7453 +

	7454 + z1 = DEQUANTIZE(inptr[DCTSIZE1], quantptr[DCTSIZE1]);

	7455 + z2 = DEQUANTIZE(inptr[DCTSIZE3], quantptr[DCTSIZE3]);

	7456 + z3 = DEQUANTIZE(inptr[DCTSIZE5], quantptr[DCTSIZE5]);

	7457 + tmp1 = MULTIPLY(z1 + z3, FIX(0.366025404)); /* c5 */

	7458 + tmp0 = tmp1 + ((z1 + z2) << CONST_BITS);

	7459 + tmp2 = tmp1 + ((z3 - z2) << CONST_BITS);

	7460 + tmp1 = (z1 - z2 - z3) << PASS1_BITS;

	7461 +

	7462 + /* Final output stage */

	7463 +

	7464 + wsptr[6*0] = (int) RIGHT_SHIFT(tmp10 + tmp0, CONST_BITS-PASS1_BITS);

	7465 + wsptr[6*5] = (int) RIGHT_SHIFT(tmp10 - tmp0, CONST_BITS-PASS1_BITS);

	7466 + wsptr[6*1] = (int) (tmp11 + tmp1);

	7467 + wsptr[6*4] = (int) (tmp11 - tmp1);

	7468 + wsptr[6*2] = (int) RIGHT_SHIFT(tmp12 + tmp2, CONST_BITS-PASS1_BITS);

	7469 + wsptr[6*3] = (int) RIGHT_SHIFT(tmp12 - tmp2, CONST_BITS-PASS1_BITS);

	7470 + }

	7471 +

	7472 + /* Pass 2: process 6 rows from work array, store into output array. */

	7473 +

	7474 + wsptr = workspace;

	7475 + for (ctr = 0; ctr < 6; ctr++) {

	7476 + outptr = output_buf[ctr] + output_col;

	7477 +

	7478 + /* Even part */

	7479 +

	7480 + /* Add fudge factor here for final descale. */

	7481 + tmp0 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));

	7482 + tmp0 <<= CONST_BITS;

	7483 + tmp2 = (INT32) wsptr[4];

	7484 + tmp10 = MULTIPLY(tmp2, FIX(0.707106781)); /* c4 */

	7485 + tmp1 = tmp0 + tmp10;

	7486 + tmp11 = tmp0 - tmp10 - tmp10;

	7487 + tmp10 = (INT32) wsptr[2];

	7488 + tmp0 = MULTIPLY(tmp10, FIX(1.224744871)); /* c2 */

	7489 + tmp10 = tmp1 + tmp0;

	7490 + tmp12 = tmp1 - tmp0;

	7491 +

	7492 + /* Odd part */

	7493 +

	7494 + z1 = (INT32) wsptr[1];

	7495 + z2 = (INT32) wsptr[3];

	7496 + z3 = (INT32) wsptr[5];

	7497 + tmp1 = MULTIPLY(z1 + z3, FIX(0.366025404)); /* c5 */

	7498 + tmp0 = tmp1 + ((z1 + z2) << CONST_BITS);

	7499 + tmp2 = tmp1 + ((z3 - z2) << CONST_BITS);

	7500 + tmp1 = (z1 - z2 - z3) << CONST_BITS;

	7501 +

	7502 + /* Final output stage */

	7503 +

	7504 + outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp0,

	7505 + CONST_BITS+PASS1_BITS+3)

	7506 + & RANGE_MASK];

	7507 + outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp0,

	7508 + CONST_BITS+PASS1_BITS+3)

	7509 + & RANGE_MASK];

	7510 + outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp11 + tmp1,

	7511 + CONST_BITS+PASS1_BITS+3)

	7512 + & RANGE_MASK];

	7513 + outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp11 - tmp1,

	7514 + CONST_BITS+PASS1_BITS+3)

	7515 + & RANGE_MASK];

	7516 + outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp12 + tmp2,

	7517 + CONST_BITS+PASS1_BITS+3)

	7518 + & RANGE_MASK];

	7519 + outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp12 - tmp2,

	7520 + CONST_BITS+PASS1_BITS+3)

	7521 + & RANGE_MASK];

	7522 +

	7523 + wsptr += 6; /* advance pointer to next row */

	7524 + }

	7525 +}

	7526 +

	7527 +

	7528 +/*

	7529 + * Perform dequantization and inverse DCT on one block of coefficients,

	7530 + * producing a reduced-size 5x5 output block.

	7531 + *

	7532 + * Optimized algorithm with 5 multiplications in the 1-D kernel.

	7533 + * cK represents sqrt(2) * cos(K*pi/10).

	7534 + */

	7535 +

	7536 +GLOBAL(void)

	7537 +jpeg_idct_5x5 (j_decompress_ptr cinfo, jpeg_component_info * compptr,

	7538 + JCOEFPTR coef_block,

	7539 + JSAMPARRAY output_buf, JDIMENSION output_col)

	7540 +{

	7541 + INT32 tmp0, tmp1, tmp10, tmp11, tmp12;

	7542 + INT32 z1, z2, z3;

	7543 + JCOEFPTR inptr;

	7544 + ISLOW_MULT_TYPE * quantptr;

	7545 + int * wsptr;

	7546 + JSAMPROW outptr;

	7547 + JSAMPLE *range_limit = IDCT_range_limit(cinfo);

	7548 + int ctr;

	7549 + int workspace[5*5]; /* buffers data between passes */

	7550 + SHIFT_TEMPS

	7551 +

	7552 + /* Pass 1: process columns from input, store into work array. */

	7553 +

	7554 + inptr = coef_block;

	7555 + quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;

	7556 + wsptr = workspace;

	7557 + for (ctr = 0; ctr < 5; ctr++, inptr++, quantptr++, wsptr++) {

	7558 + /* Even part */

	7559 +

	7560 + tmp12 = DEQUANTIZE(inptr[DCTSIZE0], quantptr[DCTSIZE0]);

	7561 + tmp12 <<= CONST_BITS;

	7562 + /* Add fudge factor here for final descale. */

	7563 + tmp12 += ONE << (CONST_BITS-PASS1_BITS-1);

	7564 + tmp0 = DEQUANTIZE(inptr[DCTSIZE2], quantptr[DCTSIZE2]);

	7565 + tmp1 = DEQUANTIZE(inptr[DCTSIZE4], quantptr[DCTSIZE4]);

	7566 + z1 = MULTIPLY(tmp0 + tmp1, FIX(0.790569415)); /* (c2+c4)/2 */

	7567 + z2 = MULTIPLY(tmp0 - tmp1, FIX(0.353553391)); /* (c2-c4)/2 */

	7568 + z3 = tmp12 + z2;

	7569 + tmp10 = z3 + z1;

	7570 + tmp11 = z3 - z1;

	7571 + tmp12 -= z2 << 2;

	7572 +

	7573 + /* Odd part */

	7574 +

	7575 + z2 = DEQUANTIZE(inptr[DCTSIZE1], quantptr[DCTSIZE1]);

	7576 + z3 = DEQUANTIZE(inptr[DCTSIZE3], quantptr[DCTSIZE3]);

	7577 +

	7578 + z1 = MULTIPLY(z2 + z3, FIX(0.831253876)); /* c3 */

	7579 + tmp0 = z1 + MULTIPLY(z2, FIX(0.513743148)); /* c1-c3 */

	7580 + tmp1 = z1 - MULTIPLY(z3, FIX(2.176250899)); /* c1+c3 */

	7581 +

	7582 + /* Final output stage */

	7583 +

	7584 + wsptr[5*0] = (int) RIGHT_SHIFT(tmp10 + tmp0, CONST_BITS-PASS1_BITS);

	7585 + wsptr[5*4] = (int) RIGHT_SHIFT(tmp10 - tmp0, CONST_BITS-PASS1_BITS);

	7586 + wsptr[5*1] = (int) RIGHT_SHIFT(tmp11 + tmp1, CONST_BITS-PASS1_BITS);

	7587 + wsptr[5*3] = (int) RIGHT_SHIFT(tmp11 - tmp1, CONST_BITS-PASS1_BITS);

	7588 + wsptr[5*2] = (int) RIGHT_SHIFT(tmp12, CONST_BITS-PASS1_BITS);

	7589 + }

	7590 +

	7591 + /* Pass 2: process 5 rows from work array, store into output array. */

	7592 +

	7593 + wsptr = workspace;

	7594 + for (ctr = 0; ctr < 5; ctr++) {

	7595 + outptr = output_buf[ctr] + output_col;

	7596 +

	7597 + /* Even part */

	7598 +

	7599 + /* Add fudge factor here for final descale. */

	7600 + tmp12 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));

	7601 + tmp12 <<= CONST_BITS;

	7602 + tmp0 = (INT32) wsptr[2];

	7603 + tmp1 = (INT32) wsptr[4];

	7604 + z1 = MULTIPLY(tmp0 + tmp1, FIX(0.790569415)); /* (c2+c4)/2 */

	7605 + z2 = MULTIPLY(tmp0 - tmp1, FIX(0.353553391)); /* (c2-c4)/2 */

	7606 + z3 = tmp12 + z2;

	7607 + tmp10 = z3 + z1;

	7608 + tmp11 = z3 - z1;

	7609 + tmp12 -= z2 << 2;

	7610 +

	7611 + /* Odd part */

	7612 +

	7613 + z2 = (INT32) wsptr[1];

	7614 + z3 = (INT32) wsptr[3];

	7615 +

	7616 + z1 = MULTIPLY(z2 + z3, FIX(0.831253876)); /* c3 */

	7617 + tmp0 = z1 + MULTIPLY(z2, FIX(0.513743148)); /* c1-c3 */

	7618 + tmp1 = z1 - MULTIPLY(z3, FIX(2.176250899)); /* c1+c3 */

	7619 +

	7620 + /* Final output stage */

	7621 +

	7622 + outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp0,

	7623 + CONST_BITS+PASS1_BITS+3)

	7624 + & RANGE_MASK];

	7625 + outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp0,

	7626 + CONST_BITS+PASS1_BITS+3)

	7627 + & RANGE_MASK];

	7628 + outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp11 + tmp1,

	7629 + CONST_BITS+PASS1_BITS+3)

	7630 + & RANGE_MASK];

	7631 + outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp11 - tmp1,

	7632 + CONST_BITS+PASS1_BITS+3)

	7633 + & RANGE_MASK];

	7634 + outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp12,

	7635 + CONST_BITS+PASS1_BITS+3)

	7636 + & RANGE_MASK];

	7637 +

	7638 + wsptr += 5; /* advance pointer to next row */

	7639 + }

	7640 +}

	7641 +

	7642 +

	7643 +/*

	7644 + * Perform dequantization and inverse DCT on one block of coefficients,

	7645 + * producing a reduced-size 3x3 output block.

	7646 + *

	7647 + * Optimized algorithm with 2 multiplications in the 1-D kernel.

	7648 + * cK represents sqrt(2) * cos(K*pi/6).

	7649 + */

	7650 +

	7651 +GLOBAL(void)

	7652 +jpeg_idct_3x3 (j_decompress_ptr cinfo, jpeg_component_info * compptr,

	7653 + JCOEFPTR coef_block,

	7654 + JSAMPARRAY output_buf, JDIMENSION output_col)

	7655 +{

	7656 + INT32 tmp0, tmp2, tmp10, tmp12;

	7657 + JCOEFPTR inptr;

	7658 + ISLOW_MULT_TYPE * quantptr;

	7659 + int * wsptr;

	7660 + JSAMPROW outptr;

	7661 + JSAMPLE *range_limit = IDCT_range_limit(cinfo);

	7662 + int ctr;

	7663 + int workspace[3*3]; /* buffers data between passes */

	7664 + SHIFT_TEMPS

	7665 +

	7666 + /* Pass 1: process columns from input, store into work array. */

	7667 +

	7668 + inptr = coef_block;

	7669 + quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;

	7670 + wsptr = workspace;

	7671 + for (ctr = 0; ctr < 3; ctr++, inptr++, quantptr++, wsptr++) {

	7672 + /* Even part */

	7673 +

	7674 + tmp0 = DEQUANTIZE(inptr[DCTSIZE0], quantptr[DCTSIZE0]);

	7675 + tmp0 <<= CONST_BITS;

	7676 + /* Add fudge factor here for final descale. */

	7677 + tmp0 += ONE << (CONST_BITS-PASS1_BITS-1);

	7678 + tmp2 = DEQUANTIZE(inptr[DCTSIZE2], quantptr[DCTSIZE2]);

	7679 + tmp12 = MULTIPLY(tmp2, FIX(0.707106781)); /* c2 */

	7680 + tmp10 = tmp0 + tmp12;

	7681 + tmp2 = tmp0 - tmp12 - tmp12;

	7682 +

	7683 + /* Odd part */

	7684 +

	7685 + tmp12 = DEQUANTIZE(inptr[DCTSIZE1], quantptr[DCTSIZE1]);

	7686 + tmp0 = MULTIPLY(tmp12, FIX(1.224744871)); /* c1 */

	7687 +

	7688 + /* Final output stage */

	7689 +

	7690 + wsptr[3*0] = (int) RIGHT_SHIFT(tmp10 + tmp0, CONST_BITS-PASS1_BITS);

	7691 + wsptr[3*2] = (int) RIGHT_SHIFT(tmp10 - tmp0, CONST_BITS-PASS1_BITS);

	7692 + wsptr[3*1] = (int) RIGHT_SHIFT(tmp2, CONST_BITS-PASS1_BITS);

	7693 + }

	7694 +

	7695 + /* Pass 2: process 3 rows from work array, store into output array. */

	7696 +

	7697 + wsptr = workspace;

	7698 + for (ctr = 0; ctr < 3; ctr++) {

	7699 + outptr = output_buf[ctr] + output_col;

	7700 +

	7701 + /* Even part */

	7702 +

	7703 + /* Add fudge factor here for final descale. */

	7704 + tmp0 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));

	7705 + tmp0 <<= CONST_BITS;

	7706 + tmp2 = (INT32) wsptr[2];

	7707 + tmp12 = MULTIPLY(tmp2, FIX(0.707106781)); /* c2 */

	7708 + tmp10 = tmp0 + tmp12;

	7709 + tmp2 = tmp0 - tmp12 - tmp12;

	7710 +

	7711 + /* Odd part */

	7712 +

	7713 + tmp12 = (INT32) wsptr[1];

	7714 + tmp0 = MULTIPLY(tmp12, FIX(1.224744871)); /* c1 */

	7715 +

	7716 + /* Final output stage */

	7717 +

	7718 + outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp0,

	7719 + CONST_BITS+PASS1_BITS+3)

	7720 + & RANGE_MASK];

	7721 + outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp0,

	7722 + CONST_BITS+PASS1_BITS+3)

	7723 + & RANGE_MASK];

	7724 + outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp2,

	7725 + CONST_BITS+PASS1_BITS+3)

	7726 + & RANGE_MASK];

	7727 +

	7728 + wsptr += 3; /* advance pointer to next row */

	7729 + }

	7730 +}

	7731 +

	7732 +

	7733 +/*

	7734 + * Perform dequantization and inverse DCT on one block of coefficients,

	7735 + * producing a 9x9 output block.

	7736 + *

	7737 + * Optimized algorithm with 10 multiplications in the 1-D kernel.

	7738 + * cK represents sqrt(2) * cos(K*pi/18).

	7739 + */

	7740 +

	7741 +GLOBAL(void)

	7742 +jpeg_idct_9x9 (j_decompress_ptr cinfo, jpeg_component_info * compptr,

	7743 + JCOEFPTR coef_block,

	7744 + JSAMPARRAY output_buf, JDIMENSION output_col)

	7745 +{

	7746 + INT32 tmp0, tmp1, tmp2, tmp3, tmp10, tmp11, tmp12, tmp13, tmp14;

	7747 + INT32 z1, z2, z3, z4;

	7748 + JCOEFPTR inptr;

	7749 + ISLOW_MULT_TYPE * quantptr;

	7750 + int * wsptr;

	7751 + JSAMPROW outptr;

	7752 + JSAMPLE *range_limit = IDCT_range_limit(cinfo);

	7753 + int ctr;

	7754 + int workspace[8*9]; /* buffers data between passes */

	7755 + SHIFT_TEMPS

	7756 +

	7757 + /* Pass 1: process columns from input, store into work array. */

	7758 +

	7759 + inptr = coef_block;

	7760 + quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;

	7761 + wsptr = workspace;

	7762 + for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {

	7763 + /* Even part */

	7764 +

	7765 + tmp0 = DEQUANTIZE(inptr[DCTSIZE0], quantptr[DCTSIZE0]);

	7766 + tmp0 <<= CONST_BITS;

	7767 + /* Add fudge factor here for final descale. */

	7768 + tmp0 += ONE << (CONST_BITS-PASS1_BITS-1);

	7769 +

	7770 + z1 = DEQUANTIZE(inptr[DCTSIZE2], quantptr[DCTSIZE2]);

	7771 + z2 = DEQUANTIZE(inptr[DCTSIZE4], quantptr[DCTSIZE4]);

	7772 + z3 = DEQUANTIZE(inptr[DCTSIZE6], quantptr[DCTSIZE6]);

	7773 +

	7774 + tmp3 = MULTIPLY(z3, FIX(0.707106781)); /* c6 */

	7775 + tmp1 = tmp0 + tmp3;

	7776 + tmp2 = tmp0 - tmp3 - tmp3;

	7777 +

	7778 + tmp0 = MULTIPLY(z1 - z2, FIX(0.707106781)); /* c6 */

	7779 + tmp11 = tmp2 + tmp0;

	7780 + tmp14 = tmp2 - tmp0 - tmp0;

	7781 +

	7782 + tmp0 = MULTIPLY(z1 + z2, FIX(1.328926049)); /* c2 */

	7783 + tmp2 = MULTIPLY(z1, FIX(1.083350441)); /* c4 */

	7784 + tmp3 = MULTIPLY(z2, FIX(0.245575608)); /* c8 */

	7785 +

	7786 + tmp10 = tmp1 + tmp0 - tmp3;

	7787 + tmp12 = tmp1 - tmp0 + tmp2;

	7788 + tmp13 = tmp1 - tmp2 + tmp3;

	7789 +

	7790 + /* Odd part */

	7791 +

	7792 + z1 = DEQUANTIZE(inptr[DCTSIZE1], quantptr[DCTSIZE1]);

	7793 + z2 = DEQUANTIZE(inptr[DCTSIZE3], quantptr[DCTSIZE3]);

	7794 + z3 = DEQUANTIZE(inptr[DCTSIZE5], quantptr[DCTSIZE5]);

	7795 + z4 = DEQUANTIZE(inptr[DCTSIZE7], quantptr[DCTSIZE7]);

	7796 +

	7797 + z2 = MULTIPLY(z2, - FIX(1.224744871)); /* -c3 */

	7798 +

	7799 + tmp2 = MULTIPLY(z1 + z3, FIX(0.909038955)); /* c5 */

	7800 + tmp3 = MULTIPLY(z1 + z4, FIX(0.483689525)); /* c7 */

	7801 + tmp0 = tmp2 + tmp3 - z2;

	7802 + tmp1 = MULTIPLY(z3 - z4, FIX(1.392728481)); /* c1 */

	7803 + tmp2 += z2 - tmp1;

	7804 + tmp3 += z2 + tmp1;

	7805 + tmp1 = MULTIPLY(z1 - z3 - z4, FIX(1.224744871)); /* c3 */

	7806 +

	7807 + /* Final output stage */

	7808 +

	7809 + wsptr[8*0] = (int) RIGHT_SHIFT(tmp10 + tmp0, CONST_BITS-PASS1_BITS);

	7810 + wsptr[8*8] = (int) RIGHT_SHIFT(tmp10 - tmp0, CONST_BITS-PASS1_BITS);

	7811 + wsptr[8*1] = (int) RIGHT_SHIFT(tmp11 + tmp1, CONST_BITS-PASS1_BITS);

	7812 + wsptr[8*7] = (int) RIGHT_SHIFT(tmp11 - tmp1, CONST_BITS-PASS1_BITS);

	7813 + wsptr[8*2] = (int) RIGHT_SHIFT(tmp12 + tmp2, CONST_BITS-PASS1_BITS);

	7814 + wsptr[8*6] = (int) RIGHT_SHIFT(tmp12 - tmp2, CONST_BITS-PASS1_BITS);

	7815 + wsptr[8*3] = (int) RIGHT_SHIFT(tmp13 + tmp3, CONST_BITS-PASS1_BITS);

	7816 + wsptr[8*5] = (int) RIGHT_SHIFT(tmp13 - tmp3, CONST_BITS-PASS1_BITS);

	7817 + wsptr[8*4] = (int) RIGHT_SHIFT(tmp14, CONST_BITS-PASS1_BITS);

	7818 + }

	7819 +

	7820 + /* Pass 2: process 9 rows from work array, store into output array. */

	7821 +

	7822 + wsptr = workspace;

	7823 + for (ctr = 0; ctr < 9; ctr++) {

	7824 + outptr = output_buf[ctr] + output_col;

	7825 +

	7826 + /* Even part */

	7827 +

	7828 + /* Add fudge factor here for final descale. */

	7829 + tmp0 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));

	7830 + tmp0 <<= CONST_BITS;

	7831 +

	7832 + z1 = (INT32) wsptr[2];

	7833 + z2 = (INT32) wsptr[4];

	7834 + z3 = (INT32) wsptr[6];

	7835 +

	7836 + tmp3 = MULTIPLY(z3, FIX(0.707106781)); /* c6 */

	7837 + tmp1 = tmp0 + tmp3;

	7838 + tmp2 = tmp0 - tmp3 - tmp3;

	7839 +

	7840 + tmp0 = MULTIPLY(z1 - z2, FIX(0.707106781)); /* c6 */

	7841 + tmp11 = tmp2 + tmp0;

	7842 + tmp14 = tmp2 - tmp0 - tmp0;

	7843 +

	7844 + tmp0 = MULTIPLY(z1 + z2, FIX(1.328926049)); /* c2 */

	7845 + tmp2 = MULTIPLY(z1, FIX(1.083350441)); /* c4 */

	7846 + tmp3 = MULTIPLY(z2, FIX(0.245575608)); /* c8 */

	7847 +

	7848 + tmp10 = tmp1 + tmp0 - tmp3;

	7849 + tmp12 = tmp1 - tmp0 + tmp2;

	7850 + tmp13 = tmp1 - tmp2 + tmp3;

	7851 +

	7852 + /* Odd part */

	7853 +

	7854 + z1 = (INT32) wsptr[1];

	7855 + z2 = (INT32) wsptr[3];

	7856 + z3 = (INT32) wsptr[5];

	7857 + z4 = (INT32) wsptr[7];

	7858 +

	7859 + z2 = MULTIPLY(z2, - FIX(1.224744871)); /* -c3 */

	7860 +

	7861 + tmp2 = MULTIPLY(z1 + z3, FIX(0.909038955)); /* c5 */

	7862 + tmp3 = MULTIPLY(z1 + z4, FIX(0.483689525)); /* c7 */

	7863 + tmp0 = tmp2 + tmp3 - z2;

	7864 + tmp1 = MULTIPLY(z3 - z4, FIX(1.392728481)); /* c1 */

	7865 + tmp2 += z2 - tmp1;

	7866 + tmp3 += z2 + tmp1;

	7867 + tmp1 = MULTIPLY(z1 - z3 - z4, FIX(1.224744871)); /* c3 */

	7868 +

	7869 + /* Final output stage */

	7870 +

	7871 + outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp0,

	7872 + CONST_BITS+PASS1_BITS+3)

	7873 + & RANGE_MASK];

	7874 + outptr[8] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp0,

	7875 + CONST_BITS+PASS1_BITS+3)

	7876 + & RANGE_MASK];

	7877 + outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp11 + tmp1,

	7878 + CONST_BITS+PASS1_BITS+3)

	7879 + & RANGE_MASK];

	7880 + outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp11 - tmp1,

	7881 + CONST_BITS+PASS1_BITS+3)

	7882 + & RANGE_MASK];

	7883 + outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp12 + tmp2,

	7884 + CONST_BITS+PASS1_BITS+3)

	7885 + & RANGE_MASK];

	7886 + outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp12 - tmp2,

	7887 + CONST_BITS+PASS1_BITS+3)

	7888 + & RANGE_MASK];

	7889 + outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp13 + tmp3,

	7890 + CONST_BITS+PASS1_BITS+3)

	7891 + & RANGE_MASK];

	7892 + outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp13 - tmp3,

	7893 + CONST_BITS+PASS1_BITS+3)

	7894 + & RANGE_MASK];

	7895 + outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp14,

	7896 + CONST_BITS+PASS1_BITS+3)

	7897 + & RANGE_MASK];

	7898 +

	7899 + wsptr += 8; /* advance pointer to next row */

	7900 + }

	7901 +}

	7902 +

	7903 +

	7904 +/*

	7905 + * Perform dequantization and inverse DCT on one block of coefficients,

	7906 + * producing a 10x10 output block.

	7907 + *

	7908 + * Optimized algorithm with 12 multiplications in the 1-D kernel.

	7909 + * cK represents sqrt(2) * cos(K*pi/20).

	7910 + */

	7911 +

	7912 +GLOBAL(void)

	7913 +jpeg_idct_10x10 (j_decompress_ptr cinfo, jpeg_component_info * compptr,

	7914 + JCOEFPTR coef_block,

	7915 + JSAMPARRAY output_buf, JDIMENSION output_col)

	7916 +{

	7917 + INT32 tmp10, tmp11, tmp12, tmp13, tmp14;

	7918 + INT32 tmp20, tmp21, tmp22, tmp23, tmp24;

	7919 + INT32 z1, z2, z3, z4, z5;

	7920 + JCOEFPTR inptr;

	7921 + ISLOW_MULT_TYPE * quantptr;

	7922 + int * wsptr;

	7923 + JSAMPROW outptr;

	7924 + JSAMPLE *range_limit = IDCT_range_limit(cinfo);

	7925 + int ctr;

	7926 + int workspace[8*10]; /* buffers data between passes */

	7927 + SHIFT_TEMPS

	7928 +

	7929 + /* Pass 1: process columns from input, store into work array. */

	7930 +

	7931 + inptr = coef_block;

	7932 + quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;

	7933 + wsptr = workspace;

	7934 + for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {

	7935 + /* Even part */

	7936 +

	7937 + z3 = DEQUANTIZE(inptr[DCTSIZE0], quantptr[DCTSIZE0]);

	7938 + z3 <<= CONST_BITS;

	7939 + /* Add fudge factor here for final descale. */

	7940 + z3 += ONE << (CONST_BITS-PASS1_BITS-1);

	7941 + z4 = DEQUANTIZE(inptr[DCTSIZE4], quantptr[DCTSIZE4]);

	7942 + z1 = MULTIPLY(z4, FIX(1.144122806)); /* c4 */

	7943 + z2 = MULTIPLY(z4, FIX(0.437016024)); /* c8 */

	7944 + tmp10 = z3 + z1;

	7945 + tmp11 = z3 - z2;

	7946 +

	7947 + tmp22 = RIGHT_SHIFT(z3 - ((z1 - z2) << 1), /* c0 = (c4-c8)*2 */

	7948 + CONST_BITS-PASS1_BITS);

	7949 +

	7950 + z2 = DEQUANTIZE(inptr[DCTSIZE2], quantptr[DCTSIZE2]);

	7951 + z3 = DEQUANTIZE(inptr[DCTSIZE6], quantptr[DCTSIZE6]);

	7952 +

	7953 + z1 = MULTIPLY(z2 + z3, FIX(0.831253876)); /* c6 */

	7954 + tmp12 = z1 + MULTIPLY(z2, FIX(0.513743148)); /* c2-c6 */

	7955 + tmp13 = z1 - MULTIPLY(z3, FIX(2.176250899)); /* c2+c6 */

	7956 +

	7957 + tmp20 = tmp10 + tmp12;

	7958 + tmp24 = tmp10 - tmp12;

	7959 + tmp21 = tmp11 + tmp13;

	7960 + tmp23 = tmp11 - tmp13;

	7961 +

	7962 + /* Odd part */

	7963 +

	7964 + z1 = DEQUANTIZE(inptr[DCTSIZE1], quantptr[DCTSIZE1]);

	7965 + z2 = DEQUANTIZE(inptr[DCTSIZE3], quantptr[DCTSIZE3]);

	7966 + z3 = DEQUANTIZE(inptr[DCTSIZE5], quantptr[DCTSIZE5]);

	7967 + z4 = DEQUANTIZE(inptr[DCTSIZE7], quantptr[DCTSIZE7]);

	7968 +

	7969 + tmp11 = z2 + z4;

	7970 + tmp13 = z2 - z4;

	7971 +

	7972 + tmp12 = MULTIPLY(tmp13, FIX(0.309016994)); /* (c3-c7)/2 */

	7973 + z5 = z3 << CONST_BITS;

	7974 +

	7975 + z2 = MULTIPLY(tmp11, FIX(0.951056516)); /* (c3+c7)/2 */

	7976 + z4 = z5 + tmp12;

	7977 +

	7978 + tmp10 = MULTIPLY(z1, FIX(1.396802247)) + z2 + z4; /* c1 */

	7979 + tmp14 = MULTIPLY(z1, FIX(0.221231742)) - z2 + z4; /* c9 */

	7980 +

	7981 + z2 = MULTIPLY(tmp11, FIX(0.587785252)); /* (c1-c9)/2 */

	7982 + z4 = z5 - tmp12 - (tmp13 << (CONST_BITS - 1));

	7983 +

	7984 + tmp12 = (z1 - tmp13 - z3) << PASS1_BITS;

	7985 +

	7986 + tmp11 = MULTIPLY(z1, FIX(1.260073511)) - z2 - z4; /* c3 */

	7987 + tmp13 = MULTIPLY(z1, FIX(0.642039522)) - z2 + z4; /* c7 */

	7988 +

	7989 + /* Final output stage */

	7990 +

	7991 + wsptr[8*0] = (int) RIGHT_SHIFT(tmp20 + tmp10, CONST_BITS-PASS1_BITS);

	7992 + wsptr[8*9] = (int) RIGHT_SHIFT(tmp20 - tmp10, CONST_BITS-PASS1_BITS);

	7993 + wsptr[8*1] = (int) RIGHT_SHIFT(tmp21 + tmp11, CONST_BITS-PASS1_BITS);

	7994 + wsptr[8*8] = (int) RIGHT_SHIFT(tmp21 - tmp11, CONST_BITS-PASS1_BITS);

	7995 + wsptr[8*2] = (int) (tmp22 + tmp12);

	7996 + wsptr[8*7] = (int) (tmp22 - tmp12);

	7997 + wsptr[8*3] = (int) RIGHT_SHIFT(tmp23 + tmp13, CONST_BITS-PASS1_BITS);

	7998 + wsptr[8*6] = (int) RIGHT_SHIFT(tmp23 - tmp13, CONST_BITS-PASS1_BITS);

	7999 + wsptr[8*4] = (int) RIGHT_SHIFT(tmp24 + tmp14, CONST_BITS-PASS1_BITS);

	8000 + wsptr[8*5] = (int) RIGHT_SHIFT(tmp24 - tmp14, CONST_BITS-PASS1_BITS);

	8001 + }

	8002 +

	8003 + /* Pass 2: process 10 rows from work array, store into output array. */

	8004 +

	8005 + wsptr = workspace;

	8006 + for (ctr = 0; ctr < 10; ctr++) {

	8007 + outptr = output_buf[ctr] + output_col;

	8008 +

	8009 + /* Even part */

	8010 +

	8011 + /* Add fudge factor here for final descale. */

	8012 + z3 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));

	8013 + z3 <<= CONST_BITS;

	8014 + z4 = (INT32) wsptr[4];

	8015 + z1 = MULTIPLY(z4, FIX(1.144122806)); /* c4 */

	8016 + z2 = MULTIPLY(z4, FIX(0.437016024)); /* c8 */

	8017 + tmp10 = z3 + z1;

	8018 + tmp11 = z3 - z2;

	8019 +

	8020 + tmp22 = z3 - ((z1 - z2) << 1); /* c0 = (c4-c8)*2 */

	8021 +

	8022 + z2 = (INT32) wsptr[2];

	8023 + z3 = (INT32) wsptr[6];

	8024 +

	8025 + z1 = MULTIPLY(z2 + z3, FIX(0.831253876)); /* c6 */

	8026 + tmp12 = z1 + MULTIPLY(z2, FIX(0.513743148)); /* c2-c6 */

	8027 + tmp13 = z1 - MULTIPLY(z3, FIX(2.176250899)); /* c2+c6 */

	8028 +

	8029 + tmp20 = tmp10 + tmp12;

	8030 + tmp24 = tmp10 - tmp12;

	8031 + tmp21 = tmp11 + tmp13;

	8032 + tmp23 = tmp11 - tmp13;

	8033 +

	8034 + /* Odd part */

	8035 +

	8036 + z1 = (INT32) wsptr[1];

	8037 + z2 = (INT32) wsptr[3];

	8038 + z3 = (INT32) wsptr[5];

	8039 + z3 <<= CONST_BITS;

	8040 + z4 = (INT32) wsptr[7];

	8041 +

	8042 + tmp11 = z2 + z4;

	8043 + tmp13 = z2 - z4;

	8044 +

	8045 + tmp12 = MULTIPLY(tmp13, FIX(0.309016994)); /* (c3-c7)/2 */

	8046 +

	8047 + z2 = MULTIPLY(tmp11, FIX(0.951056516)); /* (c3+c7)/2 */

	8048 + z4 = z3 + tmp12;

	8049 +

	8050 + tmp10 = MULTIPLY(z1, FIX(1.396802247)) + z2 + z4; /* c1 */

	8051 + tmp14 = MULTIPLY(z1, FIX(0.221231742)) - z2 + z4; /* c9 */

	8052 +

	8053 + z2 = MULTIPLY(tmp11, FIX(0.587785252)); /* (c1-c9)/2 */

	8054 + z4 = z3 - tmp12 - (tmp13 << (CONST_BITS - 1));

	8055 +

	8056 + tmp12 = ((z1 - tmp13) << CONST_BITS) - z3;

	8057 +

	8058 + tmp11 = MULTIPLY(z1, FIX(1.260073511)) - z2 - z4; /* c3 */

	8059 + tmp13 = MULTIPLY(z1, FIX(0.642039522)) - z2 + z4; /* c7 */

	8060 +

	8061 + /* Final output stage */

	8062 +

	8063 + outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp10,

	8064 + CONST_BITS+PASS1_BITS+3)

	8065 + & RANGE_MASK];

	8066 + outptr[9] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp10,

	8067 + CONST_BITS+PASS1_BITS+3)

	8068 + & RANGE_MASK];

	8069 + outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp11,

	8070 + CONST_BITS+PASS1_BITS+3)

	8071 + & RANGE_MASK];

	8072 + outptr[8] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp11,

	8073 + CONST_BITS+PASS1_BITS+3)

	8074 + & RANGE_MASK];

	8075 + outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp12,

	8076 + CONST_BITS+PASS1_BITS+3)

	8077 + & RANGE_MASK];

	8078 + outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp12,

	8079 + CONST_BITS+PASS1_BITS+3)

	8080 + & RANGE_MASK];

	8081 + outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp13,

	8082 + CONST_BITS+PASS1_BITS+3)

	8083 + & RANGE_MASK];

	8084 + outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp13,

	8085 + CONST_BITS+PASS1_BITS+3)

	8086 + & RANGE_MASK];

	8087 + outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp14,

	8088 + CONST_BITS+PASS1_BITS+3)

	8089 + & RANGE_MASK];

	8090 + outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp14,

	8091 + CONST_BITS+PASS1_BITS+3)

	8092 + & RANGE_MASK];

	8093 +

	8094 + wsptr += 8; /* advance pointer to next row */

	8095 + }

	8096 +}
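The "Add fudge factor here for final descale" comments round to nearest by adding half of the divisor before the right shift. In pass 1 that constant is ONE << (CONST_BITS-PASS1_BITS-1). In pass 2 it is folded into the DC term (wsptr[0]) before the DC term is scaled by CONST_BITS. This works because the DC term enters every output sample exactly once, so ONE << (PASS1_BITS+2), once shifted left by CONST_BITS, equals half of the final divisor 2^(CONST_BITS+PASS1_BITS+3).

The C sketch below illustrates that equivalence. It assumes CONST_BITS = 13 and PASS1_BITS = 2, the values used for 8-bit samples. descale_round() and descale_with_dc_fudge() are hypothetical helpers, not libjpeg functions.

```c
#include <stdint.h>

/* Assumed values (8-bit samples); illustrative, not authoritative. */
#define CONST_BITS 13
#define PASS1_BITS 2
#define FINAL_SHIFT (CONST_BITS + PASS1_BITS + 3)

/* Round-to-nearest descale: add half of 2^n, then shift right by n.
 * Assumes an arithmetic right shift for negative x, which C leaves
 * implementation-defined (libjpeg's RIGHT_SHIFT macro covers that). */
static int32_t descale_round(int32_t x, int n)
{
  return (x + ((int32_t) 1 << (n - 1))) >> n;
}

/* Pass-2 style: fold the rounding constant into the DC value before it
 * is scaled by 2^CONST_BITS. 'dc' is the pass-1 DC value, 'rest' is the
 * sum of all other terms feeding this output. The scaling is written as
 * a multiply so that negative dc values stay well-defined. */
static int32_t descale_with_dc_fudge(int32_t dc, int32_t rest)
{
  int32_t z = (dc + ((int32_t) 1 << (PASS1_BITS + 2))) *
              ((int32_t) 1 << CONST_BITS);
  return (z + rest) >> FINAL_SHIFT;
}
```

For example, descale_round(20, 3) yields 3 because 2.5 rounds up. Adding the constant once to the DC term saves one addition per output sample compared with adding it inside every final RIGHT_SHIFT.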

	8097 +

	8098 +

	8099 +/*

	8100 + * Perform dequantization and inverse DCT on one block of coefficients,

	8101 + * producing a 11x11 output block.

	8102 + *

	8103 + * Optimized algorithm with 24 multiplications in the 1-D kernel.

	8104 + * cK represents sqrt(2) * cos(K*pi/22).

	8105 + */

	8106 +

	8107 +GLOBAL(void)

	8108 +jpeg_idct_11x11 (j_decompress_ptr cinfo, jpeg_component_info * compptr,

	8109 + JCOEFPTR coef_block,

	8110 + JSAMPARRAY output_buf, JDIMENSION output_col)

	8111 +{

	8112 + INT32 tmp10, tmp11, tmp12, tmp13, tmp14;

	8113 + INT32 tmp20, tmp21, tmp22, tmp23, tmp24, tmp25;

	8114 + INT32 z1, z2, z3, z4;

	8115 + JCOEFPTR inptr;

	8116 + ISLOW_MULT_TYPE * quantptr;

	8117 + int * wsptr;

	8118 + JSAMPROW outptr;

	8119 + JSAMPLE *range_limit = IDCT_range_limit(cinfo);

	8120 + int ctr;

	8121 + int workspace[8*11]; /* buffers data between passes */

	8122 + SHIFT_TEMPS

	8123 +

	8124 + /* Pass 1: process columns from input, store into work array. */

	8125 +

	8126 + inptr = coef_block;

	8127 + quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;

	8128 + wsptr = workspace;

	8129 + for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {

	8130 + /* Even part */

	8131 +

	8132 + tmp10 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);

	8133 + tmp10 <<= CONST_BITS;

	8134 + /* Add fudge factor here for final descale. */

	8135 + tmp10 += ONE << (CONST_BITS-PASS1_BITS-1);

	8136 +

	8137 + z1 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);

	8138 + z2 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);

	8139 + z3 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);

	8140 +

	8141 + tmp20 = MULTIPLY(z2 - z3, FIX(2.546640132)); /* c2+c4 */

	8142 + tmp23 = MULTIPLY(z2 - z1, FIX(0.430815045)); /* c2-c6 */

	8143 + z4 = z1 + z3;

	8144 + tmp24 = MULTIPLY(z4, - FIX(1.155664402)); /* -(c2-c10) */

	8145 + z4 -= z2;

	8146 + tmp25 = tmp10 + MULTIPLY(z4, FIX(1.356927976)); /* c2 */

	8147 + tmp21 = tmp20 + tmp23 + tmp25 -

	8148 + MULTIPLY(z2, FIX(1.821790775)); /* c2+c4+c10-c6 */

	8149 + tmp20 += tmp25 + MULTIPLY(z3, FIX(2.115825087)); /* c4+c6 */

	8150 + tmp23 += tmp25 - MULTIPLY(z1, FIX(1.513598477)); /* c6+c8 */

	8151 + tmp24 += tmp25;

	8152 + tmp22 = tmp24 - MULTIPLY(z3, FIX(0.788749120)); /* c8+c10 */

	8153 + tmp24 += MULTIPLY(z2, FIX(1.944413522)) - /* c2+c8 */

	8154 + MULTIPLY(z1, FIX(1.390975730)); /* c4+c10 */

	8155 + tmp25 = tmp10 - MULTIPLY(z4, FIX(1.414213562)); /* c0 */

	8156 +

	8157 + /* Odd part */

	8158 +

	8159 + z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);

	8160 + z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);

	8161 + z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);

	8162 + z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);

	8163 +

	8164 + tmp11 = z1 + z2;

	8165 + tmp14 = MULTIPLY(tmp11 + z3 + z4, FIX(0.398430003)); /* c9 */

	8166 + tmp11 = MULTIPLY(tmp11, FIX(0.887983902)); /* c3-c9 */

	8167 + tmp12 = MULTIPLY(z1 + z3, FIX(0.670361295)); /* c5-c9 */

	8168 + tmp13 = tmp14 + MULTIPLY(z1 + z4, FIX(0.366151574)); /* c7-c9 */

	8169 + tmp10 = tmp11 + tmp12 + tmp13 -

	8170 + MULTIPLY(z1, FIX(0.923107866)); /* c7+c5+c3-c1-2*c9 */

	8171 + z1 = tmp14 - MULTIPLY(z2 + z3, FIX(1.163011579)); /* c7+c9 */

	8172 + tmp11 += z1 + MULTIPLY(z2, FIX(2.073276588)); /* c1+c7+3*c9-c3 */

	8173 + tmp12 += z1 - MULTIPLY(z3, FIX(1.192193623)); /* c3+c5-c7-c9 */

	8174 + z1 = MULTIPLY(z2 + z4, - FIX(1.798248910)); /* -(c1+c9) */

	8175 + tmp11 += z1;

	8176 + tmp13 += z1 + MULTIPLY(z4, FIX(2.102458632)); /* c1+c5+c9-c7 */

	8177 + tmp14 += MULTIPLY(z2, - FIX(1.467221301)) + /* -(c5+c9) */

	8178 + MULTIPLY(z3, FIX(1.001388905)) - /* c1-c9 */

	8179 + MULTIPLY(z4, FIX(1.684843907)); /* c3+c9 */

	8180 +

	8181 + /* Final output stage */

	8182 +

	8183 + wsptr[8*0] = (int) RIGHT_SHIFT(tmp20 + tmp10, CONST_BITS-PASS1_BITS);

	8184 + wsptr[8*10] = (int) RIGHT_SHIFT(tmp20 - tmp10, CONST_BITS-PASS1_BITS);

	8185 + wsptr[8*1] = (int) RIGHT_SHIFT(tmp21 + tmp11, CONST_BITS-PASS1_BITS);

	8186 + wsptr[8*9] = (int) RIGHT_SHIFT(tmp21 - tmp11, CONST_BITS-PASS1_BITS);

	8187 + wsptr[8*2] = (int) RIGHT_SHIFT(tmp22 + tmp12, CONST_BITS-PASS1_BITS);

	8188 + wsptr[8*8] = (int) RIGHT_SHIFT(tmp22 - tmp12, CONST_BITS-PASS1_BITS);

	8189 + wsptr[8*3] = (int) RIGHT_SHIFT(tmp23 + tmp13, CONST_BITS-PASS1_BITS);

	8190 + wsptr[8*7] = (int) RIGHT_SHIFT(tmp23 - tmp13, CONST_BITS-PASS1_BITS);

	8191 + wsptr[8*4] = (int) RIGHT_SHIFT(tmp24 + tmp14, CONST_BITS-PASS1_BITS);

	8192 + wsptr[8*6] = (int) RIGHT_SHIFT(tmp24 - tmp14, CONST_BITS-PASS1_BITS);

	8193 + wsptr[8*5] = (int) RIGHT_SHIFT(tmp25, CONST_BITS-PASS1_BITS);

	8194 + }

	8195 +

	8196 + /* Pass 2: process 11 rows from work array, store into output array. */

	8197 +

	8198 + wsptr = workspace;

	8199 + for (ctr = 0; ctr < 11; ctr++) {

	8200 + outptr = output_buf[ctr] + output_col;

	8201 +

	8202 + /* Even part */

	8203 +

	8204 + /* Add fudge factor here for final descale. */

	8205 + tmp10 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));

	8206 + tmp10 <<= CONST_BITS;

	8207 +

	8208 + z1 = (INT32) wsptr[2];

	8209 + z2 = (INT32) wsptr[4];

	8210 + z3 = (INT32) wsptr[6];

	8211 +

	8212 + tmp20 = MULTIPLY(z2 - z3, FIX(2.546640132)); /* c2+c4 */

	8213 + tmp23 = MULTIPLY(z2 - z1, FIX(0.430815045)); /* c2-c6 */

	8214 + z4 = z1 + z3;

	8215 + tmp24 = MULTIPLY(z4, - FIX(1.155664402)); /* -(c2-c10) */

	8216 + z4 -= z2;

	8217 + tmp25 = tmp10 + MULTIPLY(z4, FIX(1.356927976)); /* c2 */

	8218 + tmp21 = tmp20 + tmp23 + tmp25 -

	8219 + MULTIPLY(z2, FIX(1.821790775)); /* c2+c4+c10-c6 */

	8220 + tmp20 += tmp25 + MULTIPLY(z3, FIX(2.115825087)); /* c4+c6 */

	8221 + tmp23 += tmp25 - MULTIPLY(z1, FIX(1.513598477)); /* c6+c8 */

	8222 + tmp24 += tmp25;

	8223 + tmp22 = tmp24 - MULTIPLY(z3, FIX(0.788749120)); /* c8+c10 */

	8224 + tmp24 += MULTIPLY(z2, FIX(1.944413522)) - /* c2+c8 */

	8225 + MULTIPLY(z1, FIX(1.390975730)); /* c4+c10 */

	8226 + tmp25 = tmp10 - MULTIPLY(z4, FIX(1.414213562)); /* c0 */

	8227 +

	8228 + /* Odd part */

	8229 +

	8230 + z1 = (INT32) wsptr[1];

	8231 + z2 = (INT32) wsptr[3];

	8232 + z3 = (INT32) wsptr[5];

	8233 + z4 = (INT32) wsptr[7];

	8234 +

	8235 + tmp11 = z1 + z2;

	8236 + tmp14 = MULTIPLY(tmp11 + z3 + z4, FIX(0.398430003)); /* c9 */

	8237 + tmp11 = MULTIPLY(tmp11, FIX(0.887983902)); /* c3-c9 */

	8238 + tmp12 = MULTIPLY(z1 + z3, FIX(0.670361295)); /* c5-c9 */

	8239 + tmp13 = tmp14 + MULTIPLY(z1 + z4, FIX(0.366151574)); /* c7-c9 */

	8240 + tmp10 = tmp11 + tmp12 + tmp13 -

	8241 + MULTIPLY(z1, FIX(0.923107866)); /* c7+c5+c3-c1-2*c9 */

	8242 + z1 = tmp14 - MULTIPLY(z2 + z3, FIX(1.163011579)); /* c7+c9 */

	8243 + tmp11 += z1 + MULTIPLY(z2, FIX(2.073276588)); /* c1+c7+3*c9-c3 */

	8244 + tmp12 += z1 - MULTIPLY(z3, FIX(1.192193623)); /* c3+c5-c7-c9 */

	8245 + z1 = MULTIPLY(z2 + z4, - FIX(1.798248910)); /* -(c1+c9) */

	8246 + tmp11 += z1;

	8247 + tmp13 += z1 + MULTIPLY(z4, FIX(2.102458632)); /* c1+c5+c9-c7 */

	8248 + tmp14 += MULTIPLY(z2, - FIX(1.467221301)) + /* -(c5+c9) */

	8249 + MULTIPLY(z3, FIX(1.001388905)) - /* c1-c9 */

	8250 + MULTIPLY(z4, FIX(1.684843907)); /* c3+c9 */

	8251 +

	8252 + /* Final output stage */

	8253 +

	8254 + outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp10,

	8255 + CONST_BITS+PASS1_BITS+3)

	8256 + & RANGE_MASK];

	8257 + outptr[10] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp10,

	8258 + CONST_BITS+PASS1_BITS+3)

	8259 + & RANGE_MASK];

	8260 + outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp11,

	8261 + CONST_BITS+PASS1_BITS+3)

	8262 + & RANGE_MASK];

	8263 + outptr[9] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp11,

	8264 + CONST_BITS+PASS1_BITS+3)

	8265 + & RANGE_MASK];

	8266 + outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp12,

	8267 + CONST_BITS+PASS1_BITS+3)

	8268 + & RANGE_MASK];

	8269 + outptr[8] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp12,

	8270 + CONST_BITS+PASS1_BITS+3)

	8271 + & RANGE_MASK];

	8272 + outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp13,

	8273 + CONST_BITS+PASS1_BITS+3)

	8274 + & RANGE_MASK];

	8275 + outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp13,

	8276 + CONST_BITS+PASS1_BITS+3)

	8277 + & RANGE_MASK];

	8278 + outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp14,

	8279 + CONST_BITS+PASS1_BITS+3)

	8280 + & RANGE_MASK];

	8281 + outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp14,

	8282 + CONST_BITS+PASS1_BITS+3)

	8283 + & RANGE_MASK];

	8284 + outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp25,

	8285 + CONST_BITS+PASS1_BITS+3)

	8286 + & RANGE_MASK];

	8287 +

	8288 + wsptr += 8; /* advance pointer to next row */

	8289 + }

	8290 +}

	8291 +

	8292 +

	8293 +/*

	8294 + * Perform dequantization and inverse DCT on one block of coefficients,

	8295 + * producing a 12x12 output block.

	8296 + *

	8297 + * Optimized algorithm with 15 multiplications in the 1-D kernel.

	8298 + * cK represents sqrt(2) * cos(K*pi/24).

	8299 + */

	8300 +

	8301 +GLOBAL(void)

	8302 +jpeg_idct_12x12 (j_decompress_ptr cinfo, jpeg_component_info * compptr,

	8303 + JCOEFPTR coef_block,

	8304 + JSAMPARRAY output_buf, JDIMENSION output_col)

	8305 +{

	8306 + INT32 tmp10, tmp11, tmp12, tmp13, tmp14, tmp15;

	8307 + INT32 tmp20, tmp21, tmp22, tmp23, tmp24, tmp25;

	8308 + INT32 z1, z2, z3, z4;

	8309 + JCOEFPTR inptr;

	8310 + ISLOW_MULT_TYPE * quantptr;

	8311 + int * wsptr;

	8312 + JSAMPROW outptr;

	8313 + JSAMPLE *range_limit = IDCT_range_limit(cinfo);

	8314 + int ctr;

	8315 + int workspace[8*12]; /* buffers data between passes */

	8316 + SHIFT_TEMPS

	8317 +

	8318 + /* Pass 1: process columns from input, store into work array. */

	8319 +

	8320 + inptr = coef_block;

	8321 + quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;

	8322 + wsptr = workspace;

	8323 + for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {

	8324 + /* Even part */

	8325 +

	8326 + z3 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);

	8327 + z3 <<= CONST_BITS;

	8328 + /* Add fudge factor here for final descale. */

	8329 + z3 += ONE << (CONST_BITS-PASS1_BITS-1);

	8330 +

	8331 + z4 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);

	8332 + z4 = MULTIPLY(z4, FIX(1.224744871)); /* c4 */

	8333 +

	8334 + tmp10 = z3 + z4;

	8335 + tmp11 = z3 - z4;

	8336 +

	8337 + z1 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);

	8338 + z4 = MULTIPLY(z1, FIX(1.366025404)); /* c2 */

	8339 + z1 <<= CONST_BITS;

	8340 + z2 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);

	8341 + z2 <<= CONST_BITS;

	8342 +

	8343 + tmp12 = z1 - z2;

	8344 +

	8345 + tmp21 = z3 + tmp12;

	8346 + tmp24 = z3 - tmp12;

	8347 +

	8348 + tmp12 = z4 + z2;

	8349 +

	8350 + tmp20 = tmp10 + tmp12;

	8351 + tmp25 = tmp10 - tmp12;

	8352 +

	8353 + tmp12 = z4 - z1 - z2;

	8354 +

	8355 + tmp22 = tmp11 + tmp12;

	8356 + tmp23 = tmp11 - tmp12;

	8357 +

	8358 + /* Odd part */

	8359 +

	8360 + z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);

	8361 + z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);

	8362 + z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);

	8363 + z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);

	8364 +

	8365 + tmp11 = MULTIPLY(z2, FIX(1.306562965)); /* c3 */

	8366 + tmp14 = MULTIPLY(z2, - FIX_0_541196100); /* -c9 */

	8367 +

	8368 + tmp10 = z1 + z3;

	8369 + tmp15 = MULTIPLY(tmp10 + z4, FIX(0.860918669)); /* c7 */

	8370 + tmp12 = tmp15 + MULTIPLY(tmp10, FIX(0.261052384)); /* c5-c7 */

	8371 + tmp10 = tmp12 + tmp11 + MULTIPLY(z1, FIX(0.280143716)); /* c1-c5 */

	8372 + tmp13 = MULTIPLY(z3 + z4, - FIX(1.045510580)); /* -(c7+c11) */

	8373 + tmp12 += tmp13 + tmp14 - MULTIPLY(z3, FIX(1.478575242)); /* c1+c5-c7-c11 */

	8374 + tmp13 += tmp15 - tmp11 + MULTIPLY(z4, FIX(1.586706681)); /* c1+c11 */

	8375 + tmp15 += tmp14 - MULTIPLY(z1, FIX(0.676326758)) - /* c7-c11 */

	8376 + MULTIPLY(z4, FIX(1.982889723)); /* c5+c7 */

	8377 +

	8378 + z1 -= z4;

	8379 + z2 -= z3;

	8380 + z3 = MULTIPLY(z1 + z2, FIX_0_541196100); /* c9 */

	8381 + tmp11 = z3 + MULTIPLY(z1, FIX_0_765366865); /* c3-c9 */

	8382 + tmp14 = z3 - MULTIPLY(z2, FIX_1_847759065); /* c3+c9 */

	8383 +

	8384 + /* Final output stage */

	8385 +

	8386 + wsptr[8*0] = (int) RIGHT_SHIFT(tmp20 + tmp10, CONST_BITS-PASS1_BITS);

	8387 + wsptr[8*11] = (int) RIGHT_SHIFT(tmp20 - tmp10, CONST_BITS-PASS1_BITS);

	8388 + wsptr[8*1] = (int) RIGHT_SHIFT(tmp21 + tmp11, CONST_BITS-PASS1_BITS);

	8389 + wsptr[8*10] = (int) RIGHT_SHIFT(tmp21 - tmp11, CONST_BITS-PASS1_BITS);

	8390 + wsptr[8*2] = (int) RIGHT_SHIFT(tmp22 + tmp12, CONST_BITS-PASS1_BITS);

	8391 + wsptr[8*9] = (int) RIGHT_SHIFT(tmp22 - tmp12, CONST_BITS-PASS1_BITS);

	8392 + wsptr[8*3] = (int) RIGHT_SHIFT(tmp23 + tmp13, CONST_BITS-PASS1_BITS);

	8393 + wsptr[8*8] = (int) RIGHT_SHIFT(tmp23 - tmp13, CONST_BITS-PASS1_BITS);

	8394 + wsptr[8*4] = (int) RIGHT_SHIFT(tmp24 + tmp14, CONST_BITS-PASS1_BITS);

	8395 + wsptr[8*7] = (int) RIGHT_SHIFT(tmp24 - tmp14, CONST_BITS-PASS1_BITS);

	8396 + wsptr[8*5] = (int) RIGHT_SHIFT(tmp25 + tmp15, CONST_BITS-PASS1_BITS);

	8397 + wsptr[8*6] = (int) RIGHT_SHIFT(tmp25 - tmp15, CONST_BITS-PASS1_BITS);

	8398 + }

	8399 +

	8400 + /* Pass 2: process 12 rows from work array, store into output array. */

	8401 +

	8402 + wsptr = workspace;

	8403 + for (ctr = 0; ctr < 12; ctr++) {

	8404 + outptr = output_buf[ctr] + output_col;

	8405 +

	8406 + /* Even part */

	8407 +

	8408 + /* Add fudge factor here for final descale. */

	8409 + z3 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));

	8410 + z3 <<= CONST_BITS;

	8411 +

	8412 + z4 = (INT32) wsptr[4];

	8413 + z4 = MULTIPLY(z4, FIX(1.224744871)); /* c4 */

	8414 +

	8415 + tmp10 = z3 + z4;

	8416 + tmp11 = z3 - z4;

	8417 +

	8418 + z1 = (INT32) wsptr[2];

	8419 + z4 = MULTIPLY(z1, FIX(1.366025404)); /* c2 */

	8420 + z1 <<= CONST_BITS;

	8421 + z2 = (INT32) wsptr[6];

	8422 + z2 <<= CONST_BITS;

	8423 +

	8424 + tmp12 = z1 - z2;

	8425 +

	8426 + tmp21 = z3 + tmp12;

	8427 + tmp24 = z3 - tmp12;

	8428 +

	8429 + tmp12 = z4 + z2;

	8430 +

	8431 + tmp20 = tmp10 + tmp12;

	8432 + tmp25 = tmp10 - tmp12;

	8433 +

	8434 + tmp12 = z4 - z1 - z2;

	8435 +

	8436 + tmp22 = tmp11 + tmp12;

	8437 + tmp23 = tmp11 - tmp12;

	8438 +

	8439 + /* Odd part */

	8440 +

	8441 + z1 = (INT32) wsptr[1];

	8442 + z2 = (INT32) wsptr[3];

	8443 + z3 = (INT32) wsptr[5];

	8444 + z4 = (INT32) wsptr[7];

	8445 +

	8446 + tmp11 = MULTIPLY(z2, FIX(1.306562965)); /* c3 */

	8447 + tmp14 = MULTIPLY(z2, - FIX_0_541196100); /* -c9 */

	8448 +

	8449 + tmp10 = z1 + z3;

	8450 + tmp15 = MULTIPLY(tmp10 + z4, FIX(0.860918669)); /* c7 */

	8451 + tmp12 = tmp15 + MULTIPLY(tmp10, FIX(0.261052384)); /* c5-c7 */

	8452 + tmp10 = tmp12 + tmp11 + MULTIPLY(z1, FIX(0.280143716)); /* c1-c5 */

	8453 + tmp13 = MULTIPLY(z3 + z4, - FIX(1.045510580)); /* -(c7+c11) */

	8454 + tmp12 += tmp13 + tmp14 - MULTIPLY(z3, FIX(1.478575242)); /* c1+c5-c7-c11 */

	8455 + tmp13 += tmp15 - tmp11 + MULTIPLY(z4, FIX(1.586706681)); /* c1+c11 */

	8456 + tmp15 += tmp14 - MULTIPLY(z1, FIX(0.676326758)) - /* c7-c11 */

	8457 + MULTIPLY(z4, FIX(1.982889723)); /* c5+c7 */

	8458 +

	8459 + z1 -= z4;

	8460 + z2 -= z3;

	8461 + z3 = MULTIPLY(z1 + z2, FIX_0_541196100); /* c9 */

	8462 + tmp11 = z3 + MULTIPLY(z1, FIX_0_765366865); /* c3-c9 */

	8463 + tmp14 = z3 - MULTIPLY(z2, FIX_1_847759065); /* c3+c9 */

	8464 +

	8465 + /* Final output stage */

	8466 +

	8467 + outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp10,

	8468 + CONST_BITS+PASS1_BITS+3)

	8469 + & RANGE_MASK];

	8470 + outptr[11] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp10,

	8471 + CONST_BITS+PASS1_BITS+3)

	8472 + & RANGE_MASK];

	8473 + outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp11,

	8474 + CONST_BITS+PASS1_BITS+3)

	8475 + & RANGE_MASK];

	8476 + outptr[10] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp11,

	8477 + CONST_BITS+PASS1_BITS+3)

	8478 + & RANGE_MASK];

	8479 + outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp12,

	8480 + CONST_BITS+PASS1_BITS+3)

	8481 + & RANGE_MASK];

	8482 + outptr[9] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp12,

	8483 + CONST_BITS+PASS1_BITS+3)

	8484 + & RANGE_MASK];

	8485 + outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp13,

	8486 + CONST_BITS+PASS1_BITS+3)

	8487 + & RANGE_MASK];

	8488 + outptr[8] = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp13,

	8489 + CONST_BITS+PASS1_BITS+3)

	8490 + & RANGE_MASK];

	8491 + outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp14,

	8492 + CONST_BITS+PASS1_BITS+3)

	8493 + & RANGE_MASK];

	8494 + outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp14,

	8495 + CONST_BITS+PASS1_BITS+3)

	8496 + & RANGE_MASK];

	8497 + outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp25 + tmp15,

	8498 + CONST_BITS+PASS1_BITS+3)

	8499 + & RANGE_MASK];

	8500 + outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp25 - tmp15,

	8501 + CONST_BITS+PASS1_BITS+3)

	8502 + & RANGE_MASK];

	8503 +

	8504 + wsptr += 8; /* advance pointer to next row */

	8505 + }

	8506 +}

	8507 +

	8508 +

	8509 +/*

	8510 + * Perform dequantization and inverse DCT on one block of coefficients,

	8511 + * producing a 13x13 output block.

	8512 + *

	8513 + * Optimized algorithm with 29 multiplications in the 1-D kernel.

	8514 + * cK represents sqrt(2) * cos(K*pi/26).

	8515 + */

	8516 +

	8517 +GLOBAL(void)

	8518 +jpeg_idct_13x13 (j_decompress_ptr cinfo, jpeg_component_info * compptr,

	8519 + JCOEFPTR coef_block,

	8520 + JSAMPARRAY output_buf, JDIMENSION output_col)

	8521 +{

	8522 + INT32 tmp10, tmp11, tmp12, tmp13, tmp14, tmp15;

	8523 + INT32 tmp20, tmp21, tmp22, tmp23, tmp24, tmp25, tmp26;

	8524 + INT32 z1, z2, z3, z4;

	8525 + JCOEFPTR inptr;

	8526 + ISLOW_MULT_TYPE * quantptr;

	8527 + int * wsptr;

	8528 + JSAMPROW outptr;

	8529 + JSAMPLE *range_limit = IDCT_range_limit(cinfo);

	8530 + int ctr;

	8531 + int workspace[8*13]; /* buffers data between passes */

	8532 + SHIFT_TEMPS

	8533 +

	8534 + /* Pass 1: process columns from input, store into work array. */

	8535 +

	8536 + inptr = coef_block;

	8537 + quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;

	8538 + wsptr = workspace;

	8539 + for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {

	8540 + /* Even part */

	8541 +

	8542 + z1 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);

	8543 + z1 <<= CONST_BITS;

	8544 + /* Add fudge factor here for final descale. */

	8545 + z1 += ONE << (CONST_BITS-PASS1_BITS-1);

	8546 +

	8547 + z2 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);

	8548 + z3 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);

	8549 + z4 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);

	8550 +

	8551 + tmp10 = z3 + z4;

	8552 + tmp11 = z3 - z4;

	8553 +

	8554 + tmp12 = MULTIPLY(tmp10, FIX(1.155388986)); /* (c4+c6)/2 */

	8555 + tmp13 = MULTIPLY(tmp11, FIX(0.096834934)) + z1; /* (c4-c6)/2 */

	8556 +

	8557 + tmp20 = MULTIPLY(z2, FIX(1.373119086)) + tmp12 + tmp13; /* c2 */

	8558 + tmp22 = MULTIPLY(z2, FIX(0.501487041)) - tmp12 + tmp13; /* c10 */

	8559 +

	8560 + tmp12 = MULTIPLY(tmp10, FIX(0.316450131)); /* (c8-c12)/2 */

	8561 + tmp13 = MULTIPLY(tmp11, FIX(0.486914739)) + z1; /* (c8+c12)/2 */

	8562 +

	8563 + tmp21 = MULTIPLY(z2, FIX(1.058554052)) - tmp12 + tmp13; /* c6 */

	8564 + tmp25 = MULTIPLY(z2, - FIX(1.252223920)) + tmp12 + tmp13; /* c4 */

	8565 +

	8566 + tmp12 = MULTIPLY(tmp10, FIX(0.435816023)); /* (c2-c10)/2 */

	8567 + tmp13 = MULTIPLY(tmp11, FIX(0.937303064)) - z1; /* (c2+c10)/2 */

	8568 +

	8569 + tmp23 = MULTIPLY(z2, - FIX(0.170464608)) - tmp12 - tmp13; /* c12 */

	8570 + tmp24 = MULTIPLY(z2, - FIX(0.803364869)) + tmp12 - tmp13; /* c8 */

	8571 +

	8572 + tmp26 = MULTIPLY(tmp11 - z2, FIX(1.414213562)) + z1; /* c0 */

	8573 +

	8574 + /* Odd part */

	8575 +

	8576 + z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);

	8577 + z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);

	8578 + z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);

	8579 + z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);

	8580 +

	8581 + tmp11 = MULTIPLY(z1 + z2, FIX(1.322312651)); /* c3 */

	8582 + tmp12 = MULTIPLY(z1 + z3, FIX(1.163874945)); /* c5 */

	8583 + tmp15 = z1 + z4;

	8584 + tmp13 = MULTIPLY(tmp15, FIX(0.937797057)); /* c7 */

	8585 + tmp10 = tmp11 + tmp12 + tmp13 -

	8586 + MULTIPLY(z1, FIX(2.020082300)); /* c7+c5+c3-c1 */

	8587 + tmp14 = MULTIPLY(z2 + z3, - FIX(0.338443458)); /* -c11 */

	8588 + tmp11 += tmp14 + MULTIPLY(z2, FIX(0.837223564)); /* c5+c9+c11-c3 */

	8589 + tmp12 += tmp14 - MULTIPLY(z3, FIX(1.572116027)); /* c1+c5-c9-c11 */

	8590 + tmp14 = MULTIPLY(z2 + z4, - FIX(1.163874945)); /* -c5 */

	8591 + tmp11 += tmp14;

	8592 + tmp13 += tmp14 + MULTIPLY(z4, FIX(2.205608352)); /* c3+c5+c9-c7 */

	8593 + tmp14 = MULTIPLY(z3 + z4, - FIX(0.657217813)); /* -c9 */

	8594 + tmp12 += tmp14;

	8595 + tmp13 += tmp14;

	8596 + tmp15 = MULTIPLY(tmp15, FIX(0.338443458)); /* c11 */

	8597 + tmp14 = tmp15 + MULTIPLY(z1, FIX(0.318774355)) - /* c9-c11 */

	8598 + MULTIPLY(z2, FIX(0.466105296)); /* c1-c7 */

	8599 + z1 = MULTIPLY(z3 - z2, FIX(0.937797057)); /* c7 */

	8600 + tmp14 += z1;

	8601 + tmp15 += z1 + MULTIPLY(z3, FIX(0.384515595)) - /* c3-c7 */

	8602 + MULTIPLY(z4, FIX(1.742345811)); /* c1+c11 */

	8603 +

	8604 + /* Final output stage */

	8605 +

	8606 + wsptr[8*0] = (int) RIGHT_SHIFT(tmp20 + tmp10, CONST_BITS-PASS1_BITS);

	8607 + wsptr[8*12] = (int) RIGHT_SHIFT(tmp20 - tmp10, CONST_BITS-PASS1_BITS);

	8608 + wsptr[8*1] = (int) RIGHT_SHIFT(tmp21 + tmp11, CONST_BITS-PASS1_BITS);

	8609 + wsptr[8*11] = (int) RIGHT_SHIFT(tmp21 - tmp11, CONST_BITS-PASS1_BITS);

	8610 + wsptr[8*2] = (int) RIGHT_SHIFT(tmp22 + tmp12, CONST_BITS-PASS1_BITS);

	8611 + wsptr[8*10] = (int) RIGHT_SHIFT(tmp22 - tmp12, CONST_BITS-PASS1_BITS);

	8612 + wsptr[8*3] = (int) RIGHT_SHIFT(tmp23 + tmp13, CONST_BITS-PASS1_BITS);

	8613 + wsptr[8*9] = (int) RIGHT_SHIFT(tmp23 - tmp13, CONST_BITS-PASS1_BITS);

	8614 + wsptr[8*4] = (int) RIGHT_SHIFT(tmp24 + tmp14, CONST_BITS-PASS1_BITS);

	8615 + wsptr[8*8] = (int) RIGHT_SHIFT(tmp24 - tmp14, CONST_BITS-PASS1_BITS);

	8616 + wsptr[8*5] = (int) RIGHT_SHIFT(tmp25 + tmp15, CONST_BITS-PASS1_BITS);

	8617 + wsptr[8*7] = (int) RIGHT_SHIFT(tmp25 - tmp15, CONST_BITS-PASS1_BITS);

	8618 + wsptr[8*6] = (int) RIGHT_SHIFT(tmp26, CONST_BITS-PASS1_BITS);

	8619 + }

	8620 +

	8621 + /* Pass 2: process 13 rows from work array, store into output array. */

	8622 +

	8623 + wsptr = workspace;

	8624 + for (ctr = 0; ctr < 13; ctr++) {

	8625 + outptr = output_buf[ctr] + output_col;

	8626 +

	8627 + /* Even part */

	8628 +

	8629 + /* Add fudge factor here for final descale. */

	8630 + z1 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));

	8631 + z1 <<= CONST_BITS;

	8632 +

	8633 + z2 = (INT32) wsptr[2];

	8634 + z3 = (INT32) wsptr[4];

	8635 + z4 = (INT32) wsptr[6];

	8636 +

	8637 + tmp10 = z3 + z4;

	8638 + tmp11 = z3 - z4;

	8639 +

	8640 + tmp12 = MULTIPLY(tmp10, FIX(1.155388986)); /* (c4+c6)/2 */

	8641 + tmp13 = MULTIPLY(tmp11, FIX(0.096834934)) + z1; /* (c4-c6)/2 */

	8642 +

	8643 + tmp20 = MULTIPLY(z2, FIX(1.373119086)) + tmp12 + tmp13; /* c2 */

	8644 + tmp22 = MULTIPLY(z2, FIX(0.501487041)) - tmp12 + tmp13; /* c10 */

	8645 +

	8646 + tmp12 = MULTIPLY(tmp10, FIX(0.316450131)); /* (c8-c12)/2 */

	8647 + tmp13 = MULTIPLY(tmp11, FIX(0.486914739)) + z1; /* (c8+c12)/2 */

	8648 +

	8649 + tmp21 = MULTIPLY(z2, FIX(1.058554052)) - tmp12 + tmp13; /* c6 */

	8650 + tmp25 = MULTIPLY(z2, - FIX(1.252223920)) + tmp12 + tmp13; /* c4 */

	8651 +

	8652 + tmp12 = MULTIPLY(tmp10, FIX(0.435816023)); /* (c2-c10)/2 */

	8653 + tmp13 = MULTIPLY(tmp11, FIX(0.937303064)) - z1; /* (c2+c10)/2 */

	8654 +

	8655 + tmp23 = MULTIPLY(z2, - FIX(0.170464608)) - tmp12 - tmp13; /* c12 */

	8656 + tmp24 = MULTIPLY(z2, - FIX(0.803364869)) + tmp12 - tmp13; /* c8 */

	8657 +

	8658 + tmp26 = MULTIPLY(tmp11 - z2, FIX(1.414213562)) + z1; /* c0 */

	8659 +

	8660 + /* Odd part */

	8661 +

	8662 + z1 = (INT32) wsptr[1];

	8663 + z2 = (INT32) wsptr[3];

	8664 + z3 = (INT32) wsptr[5];

	8665 + z4 = (INT32) wsptr[7];

	8666 +

	8667 + tmp11 = MULTIPLY(z1 + z2, FIX(1.322312651)); /* c3 */

	8668 + tmp12 = MULTIPLY(z1 + z3, FIX(1.163874945)); /* c5 */

	8669 + tmp15 = z1 + z4;

	8670 + tmp13 = MULTIPLY(tmp15, FIX(0.937797057)); /* c7 */

	8671 + tmp10 = tmp11 + tmp12 + tmp13 -

	8672 + MULTIPLY(z1, FIX(2.020082300)); /* c7+c5+c3-c1 */

	8673 + tmp14 = MULTIPLY(z2 + z3, - FIX(0.338443458)); /* -c11 */

	8674 + tmp11 += tmp14 + MULTIPLY(z2, FIX(0.837223564)); /* c5+c9+c11-c3 */

	8675 + tmp12 += tmp14 - MULTIPLY(z3, FIX(1.572116027)); /* c1+c5-c9-c11 */

	8676 + tmp14 = MULTIPLY(z2 + z4, - FIX(1.163874945)); /* -c5 */

	8677 + tmp11 += tmp14;

	8678 + tmp13 += tmp14 + MULTIPLY(z4, FIX(2.205608352)); /* c3+c5+c9-c7 */

	8679 + tmp14 = MULTIPLY(z3 + z4, - FIX(0.657217813)); /* -c9 */

	8680 + tmp12 += tmp14;

	8681 + tmp13 += tmp14;

	8682 + tmp15 = MULTIPLY(tmp15, FIX(0.338443458)); /* c11 */

	8683 + tmp14 = tmp15 + MULTIPLY(z1, FIX(0.318774355)) - /* c9-c11 */

	8684 + MULTIPLY(z2, FIX(0.466105296)); /* c1-c7 */

	8685 + z1 = MULTIPLY(z3 - z2, FIX(0.937797057)); /* c7 */

	8686 + tmp14 += z1;

	8687 + tmp15 += z1 + MULTIPLY(z3, FIX(0.384515595)) - /* c3-c7 */

	8688 + MULTIPLY(z4, FIX(1.742345811)); /* c1+c11 */

	8689 +

	8690 + /* Final output stage */

	8691 +

	8692 + outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp10,

	8693 + CONST_BITS+PASS1_BITS+3)

	8694 + & RANGE_MASK];

	8695 + outptr[12] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp10,

	8696 + CONST_BITS+PASS1_BITS+3)

	8697 + & RANGE_MASK];

	8698 + outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp11,

	8699 + CONST_BITS+PASS1_BITS+3)

	8700 + & RANGE_MASK];

	8701 + outptr[11] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp11,

	8702 + CONST_BITS+PASS1_BITS+3)

	8703 + & RANGE_MASK];

	8704 + outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp12,

	8705 + CONST_BITS+PASS1_BITS+3)

	8706 + & RANGE_MASK];

	8707 + outptr[10] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp12,

	8708 + CONST_BITS+PASS1_BITS+3)

	8709 + & RANGE_MASK];

	8710 + outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp13,

	8711 + CONST_BITS+PASS1_BITS+3)

	8712 + & RANGE_MASK];

	8713 + outptr[9] = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp13,

	8714 + CONST_BITS+PASS1_BITS+3)

	8715 + & RANGE_MASK];

	8716 + outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp14,

	8717 + CONST_BITS+PASS1_BITS+3)

	8718 + & RANGE_MASK];

	8719 + outptr[8] = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp14,

	8720 + CONST_BITS+PASS1_BITS+3)

	8721 + & RANGE_MASK];

	8722 + outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp25 + tmp15,

	8723 + CONST_BITS+PASS1_BITS+3)

	8724 + & RANGE_MASK];

	8725 + outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp25 - tmp15,

	8726 + CONST_BITS+PASS1_BITS+3)

	8727 + & RANGE_MASK];

	8728 + outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp26,

	8729 + CONST_BITS+PASS1_BITS+3)

	8730 + & RANGE_MASK];

	8731 +

	8732 + wsptr += 8; /* advance pointer to next row */

	8733 + }

	8734 +}
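The final output stage clamps each descaled value with range_limit[... & RANGE_MASK], where range_limit comes from IDCT_range_limit(cinfo). The C sketch below builds a stand-alone 1024-entry table intended to give the same lookup results. It assumes 8-bit samples (MAXJSAMPLE = 255, CENTERJSAMPLE = 128) and RANGE_MASK = MAXJSAMPLE * 4 + 3.

This sketch does not reproduce libjpeg's actual table layout. build_post_idct_table() and range_limit_lookup() are hypothetical names.

```c
#include <stdint.h>

#define MAXJSAMPLE 255                    /* assumed: 8-bit samples */
#define CENTERJSAMPLE 128
#define RANGE_MASK (MAXJSAMPLE * 4 + 3)   /* 1023 */

/* Entry i holds clamp(x + CENTERJSAMPLE, 0, MAXJSAMPLE), where x is i
 * read as a signed 10-bit value (0..511 -> 0..511, 512..1023 -> -512..-1). */
static uint8_t post_idct_table[RANGE_MASK + 1];

static void build_post_idct_table(void)
{
  for (int i = 0; i <= RANGE_MASK; i++) {
    int x = (i <= RANGE_MASK / 2) ? i : i - (RANGE_MASK + 1);
    int v = x + CENTERJSAMPLE;
    post_idct_table[i] =
      (uint8_t) (v < 0 ? 0 : (v > MAXJSAMPLE ? MAXJSAMPLE : v));
  }
}

/* Counterpart of range_limit[(int) value & RANGE_MASK]. The unsigned
 * conversion makes the mask well-defined for negative values. */
static uint8_t range_limit_lookup(int32_t x)
{
  return post_idct_table[(uint32_t) x & RANGE_MASK];
}
```

A single lookup thus performs both the level shift by CENTERJSAMPLE and the clamp to [0, 255]. The clamp is exact for descaled values in [-512, 511]. Outside that range, the mask makes the index wrap, so the result is no longer a saturation.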

	8735 +

	8736 +

	8737 +/*

	8738 + * Perform dequantization and inverse DCT on one block of coefficients,

	8739 + * producing a 14x14 output block.

	8740 + *

	8741 + * Optimized algorithm with 20 multiplications in the 1-D kernel.

	8742 + * cK represents sqrt(2) * cos(K*pi/28).

	8743 + */

	8744 +

	8745 +GLOBAL(void)

	8746 +jpeg_idct_14x14 (j_decompress_ptr cinfo, jpeg_component_info * compptr,

	8747 + JCOEFPTR coef_block,

	8748 + JSAMPARRAY output_buf, JDIMENSION output_col)

	8749 +{

	8750 + INT32 tmp10, tmp11, tmp12, tmp13, tmp14, tmp15, tmp16;

	8751 + INT32 tmp20, tmp21, tmp22, tmp23, tmp24, tmp25, tmp26;

	8752 + INT32 z1, z2, z3, z4;

	8753 + JCOEFPTR inptr;

	8754 + ISLOW_MULT_TYPE * quantptr;

	8755 + int * wsptr;

	8756 + JSAMPROW outptr;

	8757 + JSAMPLE *range_limit = IDCT_range_limit(cinfo);

	8758 + int ctr;

	8759 + int workspace[8*14]; /* buffers data between passes */

	8760 + SHIFT_TEMPS

	8761 +

	8762 + /* Pass 1: process columns from input, store into work array. */

	8763 +

	8764 + inptr = coef_block;

	8765 + quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;

	8766 + wsptr = workspace;

	8767 + for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {

	8768 + /* Even part */

	8769 +

	8770 + z1 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);

	8771 + z1 <<= CONST_BITS;

	8772 + /* Add fudge factor here for final descale. */

	8773 + z1 += ONE << (CONST_BITS-PASS1_BITS-1);

	8774 + z4 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);

	8775 + z2 = MULTIPLY(z4, FIX(1.274162392)); /* c4 */

	8776 + z3 = MULTIPLY(z4, FIX(0.314692123)); /* c12 */

	8777 + z4 = MULTIPLY(z4, FIX(0.881747734)); /* c8 */

	8778 +

	8779 + tmp10 = z1 + z2;

	8780 + tmp11 = z1 + z3;

	8781 + tmp12 = z1 - z4;

	8782 +

	8783 + tmp23 = RIGHT_SHIFT(z1 - ((z2 + z3 - z4) << 1), /* c0 = (c4+c12-c8)*2 */

	8784 + CONST_BITS-PASS1_BITS);

	8785 +

	8786 + z1 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);

	8787 + z2 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);

	8788 +

	8789 + z3 = MULTIPLY(z1 + z2, FIX(1.105676686)); /* c6 */

	8790 +

	8791 + tmp13 = z3 + MULTIPLY(z1, FIX(0.273079590)); /* c2-c6 */

	8792 + tmp14 = z3 - MULTIPLY(z2, FIX(1.719280954)); /* c6+c10 */

	8793 + tmp15 = MULTIPLY(z1, FIX(0.613604268)) - /* c10 */

	8794 + MULTIPLY(z2, FIX(1.378756276)); /* c2 */

	8795 +

	8796 + tmp20 = tmp10 + tmp13;

	8797 + tmp26 = tmp10 - tmp13;

	8798 + tmp21 = tmp11 + tmp14;

	8799 + tmp25 = tmp11 - tmp14;

	8800 + tmp22 = tmp12 + tmp15;

	8801 + tmp24 = tmp12 - tmp15;

	8802 +

	8803 + /* Odd part */

	8804 +

	8805 + z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);

	8806 + z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);

	8807 + z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);

	8808 + z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);

	8809 + tmp13 = z4 << CONST_BITS;

	8810 +

	8811 + tmp14 = z1 + z3;

	8812 + tmp11 = MULTIPLY(z1 + z2, FIX(1.334852607)); /* c3 */

	8813 + tmp12 = MULTIPLY(tmp14, FIX(1.197448846)); /* c5 */

	8814 + tmp10 = tmp11 + tmp12 + tmp13 - MULTIPLY(z1, FIX(1.126980169)); /* c3+c5-c1 */

	8815 + tmp14 = MULTIPLY(tmp14, FIX(0.752406978)); /* c9 */

	8816 + tmp16 = tmp14 - MULTIPLY(z1, FIX(1.061150426)); /* c9+c11-c13 */

	8817 + z1 -= z2;

	8818 + tmp15 = MULTIPLY(z1, FIX(0.467085129)) - tmp13; /* c11 */

	8819 + tmp16 += tmp15;

	8820 + z1 += z4;

	8821 + z4 = MULTIPLY(z2 + z3, - FIX(0.158341681)) - tmp13; /* -c13 */

	8822 + tmp11 += z4 - MULTIPLY(z2, FIX(0.424103948)); /* c3-c9-c13 */

	8823 + tmp12 += z4 - MULTIPLY(z3, FIX(2.373959773)); /* c3+c5-c13 */

	8824 + z4 = MULTIPLY(z3 - z2, FIX(1.405321284)); /* c1 */

	8825 + tmp14 += z4 + tmp13 - MULTIPLY(z3, FIX(1.6906431334)); /* c1+c9-c11 */

	8826 + tmp15 += z4 + MULTIPLY(z2, FIX(0.674957567)); /* c1+c11-c5 */

	8827 +

	8828 + tmp13 = (z1 - z3) << PASS1_BITS;

	8829 +

	8830 + /* Final output stage */

	8831 +

	8832 + wsptr[8*0] = (int) RIGHT_SHIFT(tmp20 + tmp10, CONST_BITS-PASS1_BITS);

	8833 + wsptr[8*13] = (int) RIGHT_SHIFT(tmp20 - tmp10, CONST_BITS-PASS1_BITS);

	8834 + wsptr[8*1] = (int) RIGHT_SHIFT(tmp21 + tmp11, CONST_BITS-PASS1_BITS);

	8835 + wsptr[8*12] = (int) RIGHT_SHIFT(tmp21 - tmp11, CONST_BITS-PASS1_BITS);

	8836 + wsptr[8*2] = (int) RIGHT_SHIFT(tmp22 + tmp12, CONST_BITS-PASS1_BITS);

	8837 + wsptr[8*11] = (int) RIGHT_SHIFT(tmp22 - tmp12, CONST_BITS-PASS1_BITS);

	8838 + wsptr[8*3] = (int) (tmp23 + tmp13);

	8839 + wsptr[8*10] = (int) (tmp23 - tmp13);

	8840 + wsptr[8*4] = (int) RIGHT_SHIFT(tmp24 + tmp14, CONST_BITS-PASS1_BITS);

	8841 + wsptr[8*9] = (int) RIGHT_SHIFT(tmp24 - tmp14, CONST_BITS-PASS1_BITS);

	8842 + wsptr[8*5] = (int) RIGHT_SHIFT(tmp25 + tmp15, CONST_BITS-PASS1_BITS);

	8843 + wsptr[8*8] = (int) RIGHT_SHIFT(tmp25 - tmp15, CONST_BITS-PASS1_BITS);

	8844 + wsptr[8*6] = (int) RIGHT_SHIFT(tmp26 + tmp16, CONST_BITS-PASS1_BITS);

	8845 + wsptr[8*7] = (int) RIGHT_SHIFT(tmp26 - tmp16, CONST_BITS-PASS1_BITS);

	8846 + }

	8847 +

	8848 + /* Pass 2: process 14 rows from work array, store into output array. */

	8849 +

	8850 + wsptr = workspace;

	8851 + for (ctr = 0; ctr < 14; ctr++) {

	8852 + outptr = output_buf[ctr] + output_col;

	8853 +

	8854 + /* Even part */

	8855 +

	8856 + /* Add fudge factor here for final descale. */

	8857 + z1 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));

	8858 + z1 <<= CONST_BITS;

	8859 + z4 = (INT32) wsptr[4];

	8860 + z2 = MULTIPLY(z4, FIX(1.274162392)); /* c4 */

	8861 + z3 = MULTIPLY(z4, FIX(0.314692123)); /* c12 */

	8862 + z4 = MULTIPLY(z4, FIX(0.881747734)); /* c8 */

	8863 +

	8864 + tmp10 = z1 + z2;

	8865 + tmp11 = z1 + z3;

	8866 + tmp12 = z1 - z4;

	8867 +

	8868 + tmp23 = z1 - ((z2 + z3 - z4) << 1); /* c0 = (c4+c12-c8)*2 */

	8869 +

	8870 + z1 = (INT32) wsptr[2];

	8871 + z2 = (INT32) wsptr[6];

	8872 +

	8873 + z3 = MULTIPLY(z1 + z2, FIX(1.105676686)); /* c6 */

	8874 +

	8875 + tmp13 = z3 + MULTIPLY(z1, FIX(0.273079590)); /* c2-c6 */

	8876 + tmp14 = z3 - MULTIPLY(z2, FIX(1.719280954)); /* c6+c10 */

	8877 + tmp15 = MULTIPLY(z1, FIX(0.613604268)) - /* c10 */

	8878 + MULTIPLY(z2, FIX(1.378756276)); /* c2 */

	8879 +

	8880 + tmp20 = tmp10 + tmp13;

	8881 + tmp26 = tmp10 - tmp13;

	8882 + tmp21 = tmp11 + tmp14;

	8883 + tmp25 = tmp11 - tmp14;

	8884 + tmp22 = tmp12 + tmp15;

	8885 + tmp24 = tmp12 - tmp15;

	8886 +

	8887 + /* Odd part */

	8888 +

	8889 + z1 = (INT32) wsptr[1];

	8890 + z2 = (INT32) wsptr[3];

	8891 + z3 = (INT32) wsptr[5];

	8892 + z4 = (INT32) wsptr[7];

	8893 + z4 <<= CONST_BITS;

	8894 +

	8895 + tmp14 = z1 + z3;

	8896 + tmp11 = MULTIPLY(z1 + z2, FIX(1.334852607)); /* c3 */

	8897 + tmp12 = MULTIPLY(tmp14, FIX(1.197448846)); /* c5 */

	8898 + tmp10 = tmp11 + tmp12 + z4 - MULTIPLY(z1, FIX(1.126980169)); /* c3+c5-c1 */

	8899 + tmp14 = MULTIPLY(tmp14, FIX(0.752406978)); /* c9 */

	8900 + tmp16 = tmp14 - MULTIPLY(z1, FIX(1.061150426)); /* c9+c11-c13 */

	8901 + z1 -= z2;

	8902 + tmp15 = MULTIPLY(z1, FIX(0.467085129)) - z4; /* c11 */

	8903 + tmp16 += tmp15;

	8904 + tmp13 = MULTIPLY(z2 + z3, - FIX(0.158341681)) - z4; /* -c13 */

	8905 + tmp11 += tmp13 - MULTIPLY(z2, FIX(0.424103948)); /* c3-c9-c13 */

	8906 + tmp12 += tmp13 - MULTIPLY(z3, FIX(2.373959773)); /* c3+c5-c13 */

	8907 + tmp13 = MULTIPLY(z3 - z2, FIX(1.405321284)); /* c1 */

	8908 + tmp14 += tmp13 + z4 - MULTIPLY(z3, FIX(1.6906431334)); /* c1+c9-c11 */

	8909 + tmp15 += tmp13 + MULTIPLY(z2, FIX(0.674957567)); /* c1+c11-c5 */

	8910 +

	8911 + tmp13 = ((z1 - z3) << CONST_BITS) + z4;

	8912 +

	8913 + /* Final output stage */

	8914 +

	8915 + outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp10,

	8916 + CONST_BITS+PASS1_BITS+3)

	8917 + & RANGE_MASK];

	8918 + outptr[13] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp10,

	8919 + CONST_BITS+PASS1_BITS+3)

	8920 + & RANGE_MASK];

	8921 + outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp11,

	8922 + CONST_BITS+PASS1_BITS+3)

	8923 + & RANGE_MASK];

	8924 + outptr[12] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp11,

	8925 + CONST_BITS+PASS1_BITS+3)

	8926 + & RANGE_MASK];

	8927 + outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp12,

	8928 + CONST_BITS+PASS1_BITS+3)

	8929 + & RANGE_MASK];

	8930 + outptr[11] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp12,

	8931 + CONST_BITS+PASS1_BITS+3)

	8932 + & RANGE_MASK];

	8933 + outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp13,

	8934 + CONST_BITS+PASS1_BITS+3)

	8935 + & RANGE_MASK];

	8936 + outptr[10] = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp13,

	8937 + CONST_BITS+PASS1_BITS+3)

	8938 + & RANGE_MASK];

	8939 + outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp14,

	8940 + CONST_BITS+PASS1_BITS+3)

	8941 + & RANGE_MASK];

	8942 + outptr[9] = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp14,

	8943 + CONST_BITS+PASS1_BITS+3)

	8944 + & RANGE_MASK];

	8945 + outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp25 + tmp15,

	8946 + CONST_BITS+PASS1_BITS+3)

	8947 + & RANGE_MASK];

	8948 + outptr[8] = range_limit[(int) RIGHT_SHIFT(tmp25 - tmp15,

	8949 + CONST_BITS+PASS1_BITS+3)

	8950 + & RANGE_MASK];

	8951 + outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp26 + tmp16,

	8952 + CONST_BITS+PASS1_BITS+3)

	8953 + & RANGE_MASK];

	8954 + outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp26 - tmp16,

	8955 + CONST_BITS+PASS1_BITS+3)

	8956 + & RANGE_MASK];

	8957 +

	8958 + wsptr += 8; /* advance pointer to next row */

	8959 + }

	8960 +}
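For reviewers less familiar with the scaled-integer style of these kernels: constants are stored as FIX(x), i.e. x scaled by 2^CONST_BITS, and each pass adds half of its divisor (the "fudge factor") before the right shift so that results round to nearest instead of truncating. The sketch below reproduces that pattern in isolation. It assumes CONST_BITS = 13 and PASS1_BITS = 2 (the jidctint.c values for 8-bit samples), uses simplified stand-ins for FIX, MULTIPLY and RIGHT_SHIFT, and the helpers pass1_descale and pass2_dc_descale are hypothetical names, not libjpeg functions.

```c
#include <stdint.h>

/* Assumed values (jidctint.c, 8-bit samples) and simplified macro stand-ins. */
#define CONST_BITS 13
#define PASS1_BITS 2
#define ONE ((int32_t) 1)
#define FIX(x) ((int32_t) ((x) * (ONE << CONST_BITS) + 0.5))
#define MULTIPLY(var, c) ((var) * (c))
/* Stand-in assumes an arithmetic shift; libjpeg's real macro also handles
   compilers where >> on negative values is unsigned. */
#define RIGHT_SHIFT(x, shft) ((x) >> (shft))

/* Pass-1 style descale: multiply by a FIX() constant, add half of the
   divisor, then shift, keeping PASS1_BITS fractional bits. */
static int32_t pass1_descale(int32_t coef, double constant)
{
  int32_t acc = MULTIPLY(coef, FIX(constant));
  acc += ONE << (CONST_BITS - PASS1_BITS - 1);      /* fudge factor */
  return RIGHT_SHIFT(acc, CONST_BITS - PASS1_BITS);
}

/* Pass-2 style descale of a DC-only workspace value, mirroring
   z1 = (wsptr[0] + (ONE << (PASS1_BITS+2))) << CONST_BITS and the final
   shift by CONST_BITS+PASS1_BITS+3. */
static int32_t pass2_dc_descale(int32_t ws0)
{
  int32_t z1 = (ws0 + (ONE << (PASS1_BITS + 2))) << CONST_BITS;
  return RIGHT_SHIFT(z1, CONST_BITS + PASS1_BITS + 3);
}
```

With these definitions a DC-only coefficient of 84 becomes 336 after pass 1 and 11 after pass 2, i.e. 84/8 = 10.5 rounded to nearest, before range_limit applies the +128 level shift and clamping. Dropping the fudge factor would turn 1 * 0.4 * 4 = 1.6 into 1 instead of 2.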

	8961 +

	8962 +

	8963 +/*

	8964 + * Perform dequantization and inverse DCT on one block of coefficients,

	8965 + * producing a 15x15 output block.

	8966 + *

	8967 + * Optimized algorithm with 22 multiplications in the 1-D kernel.

	8968 + * cK represents sqrt(2) * cos(K*pi/30).

	8969 + */

	8970 +

	8971 +GLOBAL(void)

	8972 +jpeg_idct_15x15 (j_decompress_ptr cinfo, jpeg_component_info * compptr,

	8973 + JCOEFPTR coef_block,

	8974 + JSAMPARRAY output_buf, JDIMENSION output_col)

	8975 +{

	8976 + INT32 tmp10, tmp11, tmp12, tmp13, tmp14, tmp15, tmp16;

	8977 + INT32 tmp20, tmp21, tmp22, tmp23, tmp24, tmp25, tmp26, tmp27;

	8978 + INT32 z1, z2, z3, z4;

	8979 + JCOEFPTR inptr;

	8980 + ISLOW_MULT_TYPE * quantptr;

	8981 + int * wsptr;

	8982 + JSAMPROW outptr;

	8983 + JSAMPLE *range_limit = IDCT_range_limit(cinfo);

	8984 + int ctr;

	8985 + int workspace[8*15]; /* buffers data between passes */

	8986 + SHIFT_TEMPS

	8987 +

	8988 + /* Pass 1: process columns from input, store into work array. */

	8989 +

	8990 + inptr = coef_block;

	8991 + quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;

	8992 + wsptr = workspace;

	8993 + for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {

	8994 + /* Even part */

	8995 +

	8996 + z1 = DEQUANTIZE(inptr[DCTSIZE0], quantptr[DCTSIZE0]);

	8997 + z1 <<= CONST_BITS;

	8998 + /* Add fudge factor here for final descale. */

	8999 + z1 += ONE << (CONST_BITS-PASS1_BITS-1);

	9000 +

	9001 + z2 = DEQUANTIZE(inptr[DCTSIZE2], quantptr[DCTSIZE2]);

	9002 + z3 = DEQUANTIZE(inptr[DCTSIZE4], quantptr[DCTSIZE4]);

	9003 + z4 = DEQUANTIZE(inptr[DCTSIZE6], quantptr[DCTSIZE6]);

	9004 +

	9005 + tmp10 = MULTIPLY(z4, FIX(0.437016024)); /* c12 */

	9006 + tmp11 = MULTIPLY(z4, FIX(1.144122806)); /* c6 */

	9007 +

	9008 + tmp12 = z1 - tmp10;

	9009 + tmp13 = z1 + tmp11;

	9010 + z1 -= (tmp11 - tmp10) << 1; /* c0 = (c6-c12)*2 */

	9011 +

	9012 + z4 = z2 - z3;

	9013 + z3 += z2;

	9014 + tmp10 = MULTIPLY(z3, FIX(1.337628990)); /* (c2+c4)/2 */

	9015 + tmp11 = MULTIPLY(z4, FIX(0.045680613)); /* (c2-c4)/2 */

	9016 + z2 = MULTIPLY(z2, FIX(1.439773946)); /* c4+c14 */

	9017 +

	9018 + tmp20 = tmp13 + tmp10 + tmp11;

	9019 + tmp23 = tmp12 - tmp10 + tmp11 + z2;

	9020 +

	9021 + tmp10 = MULTIPLY(z3, FIX(0.547059574)); /* (c8+c14)/2 */

	9022 + tmp11 = MULTIPLY(z4, FIX(0.399234004)); /* (c8-c14)/2 */

	9023 +

	9024 + tmp25 = tmp13 - tmp10 - tmp11;

	9025 + tmp26 = tmp12 + tmp10 - tmp11 - z2;

	9026 +

	9027 + tmp10 = MULTIPLY(z3, FIX(0.790569415)); /* (c6+c12)/2 */

	9028 + tmp11 = MULTIPLY(z4, FIX(0.353553391)); /* (c6-c12)/2 */

	9029 +

	9030 + tmp21 = tmp12 + tmp10 + tmp11;

	9031 + tmp24 = tmp13 - tmp10 + tmp11;

	9032 + tmp11 += tmp11;

	9033 + tmp22 = z1 + tmp11; /* c10 = c6-c12 */

	9034 + tmp27 = z1 - tmp11 - tmp11; /* c0 = (c6-c12)*2 */

	9035 +

	9036 + /* Odd part */

	9037 +

	9038 + z1 = DEQUANTIZE(inptr[DCTSIZE1], quantptr[DCTSIZE1]);

	9039 + z2 = DEQUANTIZE(inptr[DCTSIZE3], quantptr[DCTSIZE3]);

	9040 + z4 = DEQUANTIZE(inptr[DCTSIZE5], quantptr[DCTSIZE5]);

	9041 + z3 = MULTIPLY(z4, FIX(1.224744871)); /* c5 */

	9042 + z4 = DEQUANTIZE(inptr[DCTSIZE7], quantptr[DCTSIZE7]);

	9043 +

	9044 + tmp13 = z2 - z4;

	9045 + tmp15 = MULTIPLY(z1 + tmp13, FIX(0.831253876)); /* c9 */

	9046 + tmp11 = tmp15 + MULTIPLY(z1, FIX(0.513743148)); /* c3-c9 */

	9047 + tmp14 = tmp15 - MULTIPLY(tmp13, FIX(2.176250899)); /* c3+c9 */

	9048 +

	9049 + tmp13 = MULTIPLY(z2, - FIX(0.831253876)); /* -c9 */

	9050 + tmp15 = MULTIPLY(z2, - FIX(1.344997024)); /* -c3 */

	9051 + z2 = z1 - z4;

	9052 + tmp12 = z3 + MULTIPLY(z2, FIX(1.406466353)); /* c1 */

	9053 +

	9054 + tmp10 = tmp12 + MULTIPLY(z4, FIX(2.457431844)) - tmp15; /* c1+c7 */

	9055 + tmp16 = tmp12 - MULTIPLY(z1, FIX(1.112434820)) + tmp13; /* c1-c13 */

	9056 + tmp12 = MULTIPLY(z2, FIX(1.224744871)) - z3; /* c5 */

	9057 + z2 = MULTIPLY(z1 + z4, FIX(0.575212477)); /* c11 */

	9058 + tmp13 += z2 + MULTIPLY(z1, FIX(0.475753014)) - z3; /* c7-c11 */

	9059 + tmp15 += z2 - MULTIPLY(z4, FIX(0.869244010)) + z3; /* c11+c13 */

	9060 +

	9061 + /* Final output stage */

	9062 +

	9063 + wsptr[8*0] = (int) RIGHT_SHIFT(tmp20 + tmp10, CONST_BITS-PASS1_BITS);

	9064 + wsptr[8*14] = (int) RIGHT_SHIFT(tmp20 - tmp10, CONST_BITS-PASS1_BITS);

	9065 + wsptr[8*1] = (int) RIGHT_SHIFT(tmp21 + tmp11, CONST_BITS-PASS1_BITS);

	9066 + wsptr[8*13] = (int) RIGHT_SHIFT(tmp21 - tmp11, CONST_BITS-PASS1_BITS);

	9067 + wsptr[8*2] = (int) RIGHT_SHIFT(tmp22 + tmp12, CONST_BITS-PASS1_BITS);

	9068 + wsptr[8*12] = (int) RIGHT_SHIFT(tmp22 - tmp12, CONST_BITS-PASS1_BITS);

	9069 + wsptr[8*3] = (int) RIGHT_SHIFT(tmp23 + tmp13, CONST_BITS-PASS1_BITS);

	9070 + wsptr[8*11] = (int) RIGHT_SHIFT(tmp23 - tmp13, CONST_BITS-PASS1_BITS);

	9071 + wsptr[8*4] = (int) RIGHT_SHIFT(tmp24 + tmp14, CONST_BITS-PASS1_BITS);

	9072 + wsptr[8*10] = (int) RIGHT_SHIFT(tmp24 - tmp14, CONST_BITS-PASS1_BITS);

	9073 + wsptr[8*5] = (int) RIGHT_SHIFT(tmp25 + tmp15, CONST_BITS-PASS1_BITS);

	9074 + wsptr[8*9] = (int) RIGHT_SHIFT(tmp25 - tmp15, CONST_BITS-PASS1_BITS);

	9075 + wsptr[8*6] = (int) RIGHT_SHIFT(tmp26 + tmp16, CONST_BITS-PASS1_BITS);

	9076 + wsptr[8*8] = (int) RIGHT_SHIFT(tmp26 - tmp16, CONST_BITS-PASS1_BITS);

	9077 + wsptr[8*7] = (int) RIGHT_SHIFT(tmp27, CONST_BITS-PASS1_BITS);

	9078 + }

	9079 +

	9080 + /* Pass 2: process 15 rows from work array, store into output array. */

	9081 +

	9082 + wsptr = workspace;

	9083 + for (ctr = 0; ctr < 15; ctr++) {

	9084 + outptr = output_buf[ctr] + output_col;

	9085 +

	9086 + /* Even part */

	9087 +

	9088 + /* Add fudge factor here for final descale. */

	9089 + z1 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));

	9090 + z1 <<= CONST_BITS;

	9091 +

	9092 + z2 = (INT32) wsptr[2];

	9093 + z3 = (INT32) wsptr[4];

	9094 + z4 = (INT32) wsptr[6];

	9095 +

	9096 + tmp10 = MULTIPLY(z4, FIX(0.437016024)); /* c12 */

	9097 + tmp11 = MULTIPLY(z4, FIX(1.144122806)); /* c6 */

	9098 +

	9099 + tmp12 = z1 - tmp10;

	9100 + tmp13 = z1 + tmp11;

	9101 + z1 -= (tmp11 - tmp10) << 1; /* c0 = (c6-c12)*2 */

	9102 +

	9103 + z4 = z2 - z3;

	9104 + z3 += z2;

	9105 + tmp10 = MULTIPLY(z3, FIX(1.337628990)); /* (c2+c4)/2 */

	9106 + tmp11 = MULTIPLY(z4, FIX(0.045680613)); /* (c2-c4)/2 */

	9107 + z2 = MULTIPLY(z2, FIX(1.439773946)); /* c4+c14 */

	9108 +

	9109 + tmp20 = tmp13 + tmp10 + tmp11;

	9110 + tmp23 = tmp12 - tmp10 + tmp11 + z2;

	9111 +

	9112 + tmp10 = MULTIPLY(z3, FIX(0.547059574)); /* (c8+c14)/2 */

	9113 + tmp11 = MULTIPLY(z4, FIX(0.399234004)); /* (c8-c14)/2 */

	9114 +

	9115 + tmp25 = tmp13 - tmp10 - tmp11;

	9116 + tmp26 = tmp12 + tmp10 - tmp11 - z2;

	9117 +

	9118 + tmp10 = MULTIPLY(z3, FIX(0.790569415)); /* (c6+c12)/2 */

	9119 + tmp11 = MULTIPLY(z4, FIX(0.353553391)); /* (c6-c12)/2 */

	9120 +

	9121 + tmp21 = tmp12 + tmp10 + tmp11;

	9122 + tmp24 = tmp13 - tmp10 + tmp11;

	9123 + tmp11 += tmp11;

	9124 + tmp22 = z1 + tmp11; /* c10 = c6-c12 */

	9125 + tmp27 = z1 - tmp11 - tmp11; /* c0 = (c6-c12)*2 */

	9126 +

	9127 + /* Odd part */

	9128 +

	9129 + z1 = (INT32) wsptr[1];

	9130 + z2 = (INT32) wsptr[3];

	9131 + z4 = (INT32) wsptr[5];

	9132 + z3 = MULTIPLY(z4, FIX(1.224744871)); /* c5 */

	9133 + z4 = (INT32) wsptr[7];

	9134 +

	9135 + tmp13 = z2 - z4;

	9136 + tmp15 = MULTIPLY(z1 + tmp13, FIX(0.831253876)); /* c9 */

	9137 + tmp11 = tmp15 + MULTIPLY(z1, FIX(0.513743148)); /* c3-c9 */

	9138 + tmp14 = tmp15 - MULTIPLY(tmp13, FIX(2.176250899)); /* c3+c9 */

	9139 +

	9140 + tmp13 = MULTIPLY(z2, - FIX(0.831253876)); /* -c9 */

	9141 + tmp15 = MULTIPLY(z2, - FIX(1.344997024)); /* -c3 */

	9142 + z2 = z1 - z4;

	9143 + tmp12 = z3 + MULTIPLY(z2, FIX(1.406466353)); /* c1 */

	9144 +

	9145 + tmp10 = tmp12 + MULTIPLY(z4, FIX(2.457431844)) - tmp15; /* c1+c7 */

	9146 + tmp16 = tmp12 - MULTIPLY(z1, FIX(1.112434820)) + tmp13; /* c1-c13 */

	9147 + tmp12 = MULTIPLY(z2, FIX(1.224744871)) - z3; /* c5 */

	9148 + z2 = MULTIPLY(z1 + z4, FIX(0.575212477)); /* c11 */

	9149 + tmp13 += z2 + MULTIPLY(z1, FIX(0.475753014)) - z3; /* c7-c11 */

	9150 + tmp15 += z2 - MULTIPLY(z4, FIX(0.869244010)) + z3; /* c11+c13 */

	9151 +

	9152 + /* Final output stage */

	9153 +

	9154 + outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp10,

	9155 + CONST_BITS+PASS1_BITS+3)

	9156 + & RANGE_MASK];

	9157 + outptr[14] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp10,

	9158 + CONST_BITS+PASS1_BITS+3)

	9159 + & RANGE_MASK];

	9160 + outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp11,

	9161 + CONST_BITS+PASS1_BITS+3)

	9162 + & RANGE_MASK];

	9163 + outptr[13] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp11,

	9164 + CONST_BITS+PASS1_BITS+3)

	9165 + & RANGE_MASK];

	9166 + outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp12,

	9167 + CONST_BITS+PASS1_BITS+3)

	9168 + & RANGE_MASK];

	9169 + outptr[12] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp12,

	9170 + CONST_BITS+PASS1_BITS+3)

	9171 + & RANGE_MASK];

	9172 + outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp13,

	9173 + CONST_BITS+PASS1_BITS+3)

	9174 + & RANGE_MASK];

	9175 + outptr[11] = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp13,

	9176 + CONST_BITS+PASS1_BITS+3)

	9177 + & RANGE_MASK];

	9178 + outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp14,

	9179 + CONST_BITS+PASS1_BITS+3)

	9180 + & RANGE_MASK];

	9181 + outptr[10] = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp14,

	9182 + CONST_BITS+PASS1_BITS+3)

	9183 + & RANGE_MASK];

	9184 + outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp25 + tmp15,

	9185 + CONST_BITS+PASS1_BITS+3)

	9186 + & RANGE_MASK];

	9187 + outptr[9] = range_limit[(int) RIGHT_SHIFT(tmp25 - tmp15,

	9188 + CONST_BITS+PASS1_BITS+3)

	9189 + & RANGE_MASK];

	9190 + outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp26 + tmp16,

	9191 + CONST_BITS+PASS1_BITS+3)

	9192 + & RANGE_MASK];

	9193 + outptr[8] = range_limit[(int) RIGHT_SHIFT(tmp26 - tmp16,

	9194 + CONST_BITS+PASS1_BITS+3)

	9195 + & RANGE_MASK];

	9196 + outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp27,

	9197 + CONST_BITS+PASS1_BITS+3)

	9198 + & RANGE_MASK];

	9199 +

	9200 + wsptr += 8; /* advance pointer to next row */

	9201 + }

	9202 +}

	9203 +

	9204 +

	9205 +/*

	9206 + * Perform dequantization and inverse DCT on one block of coefficients,

	9207 + * producing a 16x16 output block.

	9208 + *

	9209 + * Optimized algorithm with 28 multiplications in the 1-D kernel.

	9210 + * cK represents sqrt(2) * cos(K*pi/32).

	9211 + */

	9212 +

	9213 +GLOBAL(void)

	9214 +jpeg_idct_16x16 (j_decompress_ptr cinfo, jpeg_component_info * compptr,

	9215 + JCOEFPTR coef_block,

	9216 + JSAMPARRAY output_buf, JDIMENSION output_col)

	9217 +{

	9218 + INT32 tmp0, tmp1, tmp2, tmp3, tmp10, tmp11, tmp12, tmp13;

	9219 + INT32 tmp20, tmp21, tmp22, tmp23, tmp24, tmp25, tmp26, tmp27;

	9220 + INT32 z1, z2, z3, z4;

	9221 + JCOEFPTR inptr;

	9222 + ISLOW_MULT_TYPE * quantptr;

	9223 + int * wsptr;

	9224 + JSAMPROW outptr;

	9225 + JSAMPLE *range_limit = IDCT_range_limit(cinfo);

	9226 + int ctr;

	9227 + int workspace[8*16]; /* buffers data between passes */

	9228 + SHIFT_TEMPS

	9229 +

	9230 + /* Pass 1: process columns from input, store into work array. */

	9231 +

	9232 + inptr = coef_block;

	9233 + quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table;

	9234 + wsptr = workspace;

	9235 + for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) {

	9236 + /* Even part */

	9237 +

	9238 + tmp0 = DEQUANTIZE(inptr[DCTSIZE0], quantptr[DCTSIZE0]);

	9239 + tmp0 <<= CONST_BITS;

	9240 + /* Add fudge factor here for final descale. */

	9241 + tmp0 += 1 << (CONST_BITS-PASS1_BITS-1);

	9242 +

	9243 + z1 = DEQUANTIZE(inptr[DCTSIZE4], quantptr[DCTSIZE4]);

	9244 + tmp1 = MULTIPLY(z1, FIX(1.306562965)); /* c4[16] = c2[8] */

	9245 + tmp2 = MULTIPLY(z1, FIX_0_541196100); /* c12[16] = c6[8] */

	9246 +

	9247 + tmp10 = tmp0 + tmp1;

	9248 + tmp11 = tmp0 - tmp1;

	9249 + tmp12 = tmp0 + tmp2;

	9250 + tmp13 = tmp0 - tmp2;

	9251 +

	9252 + z1 = DEQUANTIZE(inptr[DCTSIZE2], quantptr[DCTSIZE2]);

	9253 + z2 = DEQUANTIZE(inptr[DCTSIZE6], quantptr[DCTSIZE6]);

	9254 + z3 = z1 - z2;

	9255 + z4 = MULTIPLY(z3, FIX(0.275899379)); /* c14[16] = c7[8] */

	9256 + z3 = MULTIPLY(z3, FIX(1.387039845)); /* c2[16] = c1[8] */

	9257 +

	9258 + tmp0 = z3 + MULTIPLY(z2, FIX_2_562915447); /* (c6+c2)[16] = (c3+c1)[8] */

	9259 + tmp1 = z4 + MULTIPLY(z1, FIX_0_899976223); /* (c6-c14)[16] = (c3-c7)[8] */

	9260 + tmp2 = z3 - MULTIPLY(z1, FIX(0.601344887)); /* (c2-c10)[16] = (c1-c5)[8] */

	9261 + tmp3 = z4 - MULTIPLY(z2, FIX(0.509795579)); /* (c10-c14)[16] = (c5-c7)[8] */

	9262 +

	9263 + tmp20 = tmp10 + tmp0;

	9264 + tmp27 = tmp10 - tmp0;

	9265 + tmp21 = tmp12 + tmp1;

	9266 + tmp26 = tmp12 - tmp1;

	9267 + tmp22 = tmp13 + tmp2;

	9268 + tmp25 = tmp13 - tmp2;

	9269 + tmp23 = tmp11 + tmp3;

	9270 + tmp24 = tmp11 - tmp3;

	9271 +

	9272 + /* Odd part */

	9273 +

	9274 + z1 = DEQUANTIZE(inptr[DCTSIZE1], quantptr[DCTSIZE1]);

	9275 + z2 = DEQUANTIZE(inptr[DCTSIZE3], quantptr[DCTSIZE3]);

	9276 + z3 = DEQUANTIZE(inptr[DCTSIZE5], quantptr[DCTSIZE5]);

	9277 + z4 = DEQUANTIZE(inptr[DCTSIZE7], quantptr[DCTSIZE7]);

	9278 +

	9279 + tmp11 = z1 + z3;

	9280 +

	9281 + tmp1 = MULTIPLY(z1 + z2, FIX(1.353318001)); /* c3 */

	9282 + tmp2 = MULTIPLY(tmp11, FIX(1.247225013)); /* c5 */

	9283 + tmp3 = MULTIPLY(z1 + z4, FIX(1.093201867)); /* c7 */

	9284 + tmp10 = MULTIPLY(z1 - z4, FIX(0.897167586)); /* c9 */

	9285 + tmp11 = MULTIPLY(tmp11, FIX(0.666655658)); /* c11 */

	9286 + tmp12 = MULTIPLY(z1 - z2, FIX(0.410524528)); /* c13 */

	9287 + tmp0 = tmp1 + tmp2 + tmp3 -

	9288 + MULTIPLY(z1, FIX(2.286341144)); /* c7+c5+c3-c1 */

	9289 + tmp13 = tmp10 + tmp11 + tmp12 -

	9290 + MULTIPLY(z1, FIX(1.835730603)); /* c9+c11+c13-c15 */

	9291 + z1 = MULTIPLY(z2 + z3, FIX(0.138617169)); /* c15 */

	9292 + tmp1 += z1 + MULTIPLY(z2, FIX(0.071888074)); /* c9+c11-c3-c15 */

	9293 + tmp2 += z1 - MULTIPLY(z3, FIX(1.125726048)); /* c5+c7+c15-c3 */

	9294 + z1 = MULTIPLY(z3 - z2, FIX(1.407403738)); /* c1 */

	9295 + tmp11 += z1 - MULTIPLY(z3, FIX(0.766367282)); /* c1+c11-c9-c13 */

	9296 + tmp12 += z1 + MULTIPLY(z2, FIX(1.971951411)); /* c1+c5+c13-c7 */

	9297 + z2 += z4;

	9298 + z1 = MULTIPLY(z2, - FIX(0.666655658)); /* -c11 */

	9299 + tmp1 += z1;

	9300 + tmp3 += z1 + MULTIPLY(z4, FIX(1.065388962)); /* c3+c11+c15-c7 */

	9301 + z2 = MULTIPLY(z2, - FIX(1.247225013)); /* -c5 */

	9302 + tmp10 += z2 + MULTIPLY(z4, FIX(3.141271809)); /* c1+c5+c9-c13 */

	9303 + tmp12 += z2;

	9304 + z2 = MULTIPLY(z3 + z4, - FIX(1.353318001)); /* -c3 */

	9305 + tmp2 += z2;

	9306 + tmp3 += z2;

	9307 + z2 = MULTIPLY(z4 - z3, FIX(0.410524528)); /* c13 */

	9308 + tmp10 += z2;

	9309 + tmp11 += z2;

	9310 +

	9311 + /* Final output stage */

	9312 +

	9313 + wsptr[8*0] = (int) RIGHT_SHIFT(tmp20 + tmp0, CONST_BITS-PASS1_BITS);

	9314 + wsptr[8*15] = (int) RIGHT_SHIFT(tmp20 - tmp0, CONST_BITS-PASS1_BITS);

	9315 + wsptr[8*1] = (int) RIGHT_SHIFT(tmp21 + tmp1, CONST_BITS-PASS1_BITS);

	9316 + wsptr[8*14] = (int) RIGHT_SHIFT(tmp21 - tmp1, CONST_BITS-PASS1_BITS);

	9317 + wsptr[8*2] = (int) RIGHT_SHIFT(tmp22 + tmp2, CONST_BITS-PASS1_BITS);

	9318 + wsptr[8*13] = (int) RIGHT_SHIFT(tmp22 - tmp2, CONST_BITS-PASS1_BITS);

	9319 + wsptr[8*3] = (int) RIGHT_SHIFT(tmp23 + tmp3, CONST_BITS-PASS1_BITS);

	9320 + wsptr[8*12] = (int) RIGHT_SHIFT(tmp23 - tmp3, CONST_BITS-PASS1_BITS);

	9321 + wsptr[8*4] = (int) RIGHT_SHIFT(tmp24 + tmp10, CONST_BITS-PASS1_BITS);

	9322 + wsptr[8*11] = (int) RIGHT_SHIFT(tmp24 - tmp10, CONST_BITS-PASS1_BITS);

	9323 + wsptr[8*5] = (int) RIGHT_SHIFT(tmp25 + tmp11, CONST_BITS-PASS1_BITS);

	9324 + wsptr[8*10] = (int) RIGHT_SHIFT(tmp25 - tmp11, CONST_BITS-PASS1_BITS);

	9325 + wsptr[8*6] = (int) RIGHT_SHIFT(tmp26 + tmp12, CONST_BITS-PASS1_BITS);

	9326 + wsptr[8*9] = (int) RIGHT_SHIFT(tmp26 - tmp12, CONST_BITS-PASS1_BITS);

	9327 + wsptr[8*7] = (int) RIGHT_SHIFT(tmp27 + tmp13, CONST_BITS-PASS1_BITS);

	9328 + wsptr[8*8] = (int) RIGHT_SHIFT(tmp27 - tmp13, CONST_BITS-PASS1_BITS);

	9329 + }

	9330 +

	9331 + /* Pass 2: process 16 rows from work array, store into output array. */

	9332 +

	9333 + wsptr = workspace;

	9334 + for (ctr = 0; ctr < 16; ctr++) {

	9335 + outptr = output_buf[ctr] + output_col;

	9336 +

	9337 + /* Even part */

	9338 +

	9339 + /* Add fudge factor here for final descale. */

	9340 + tmp0 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2));

	9341 + tmp0 <<= CONST_BITS;

	9342 +

	9343 + z1 = (INT32) wsptr[4];

	9344 + tmp1 = MULTIPLY(z1, FIX(1.306562965)); /* c4[16] = c2[8] */

	9345 + tmp2 = MULTIPLY(z1, FIX_0_541196100); /* c12[16] = c6[8] */

	9346 +

	9347 + tmp10 = tmp0 + tmp1;

	9348 + tmp11 = tmp0 - tmp1;

	9349 + tmp12 = tmp0 + tmp2;

	9350 + tmp13 = tmp0 - tmp2;

	9351 +

	9352 + z1 = (INT32) wsptr[2];

	9353 + z2 = (INT32) wsptr[6];

	9354 + z3 = z1 - z2;

	9355 + z4 = MULTIPLY(z3, FIX(0.275899379)); /* c14[16] = c7[8] */

	9356 + z3 = MULTIPLY(z3, FIX(1.387039845)); /* c2[16] = c1[8] */

	9357 +

	9358 + tmp0 = z3 + MULTIPLY(z2, FIX_2_562915447); /* (c6+c2)[16] = (c3+c1)[8] */

	9359 + tmp1 = z4 + MULTIPLY(z1, FIX_0_899976223); /* (c6-c14)[16] = (c3-c7)[8] */

	9360 + tmp2 = z3 - MULTIPLY(z1, FIX(0.601344887)); /* (c2-c10)[16] = (c1-c5)[8] */

	9361 + tmp3 = z4 - MULTIPLY(z2, FIX(0.509795579)); /* (c10-c14)[16] = (c5-c7)[8] */

	9362 +

	9363 + tmp20 = tmp10 + tmp0;

	9364 + tmp27 = tmp10 - tmp0;

	9365 + tmp21 = tmp12 + tmp1;

	9366 + tmp26 = tmp12 - tmp1;

	9367 + tmp22 = tmp13 + tmp2;

	9368 + tmp25 = tmp13 - tmp2;

	9369 + tmp23 = tmp11 + tmp3;

	9370 + tmp24 = tmp11 - tmp3;

	9371 +

	9372 + /* Odd part */

	9373 +

	9374 + z1 = (INT32) wsptr[1];

	9375 + z2 = (INT32) wsptr[3];

	9376 + z3 = (INT32) wsptr[5];

	9377 + z4 = (INT32) wsptr[7];

	9378 +

	9379 + tmp11 = z1 + z3;

	9380 +

	9381 + tmp1 = MULTIPLY(z1 + z2, FIX(1.353318001)); /* c3 */

	9382 + tmp2 = MULTIPLY(tmp11, FIX(1.247225013)); /* c5 */

	9383 + tmp3 = MULTIPLY(z1 + z4, FIX(1.093201867)); /* c7 */

	9384 + tmp10 = MULTIPLY(z1 - z4, FIX(0.897167586)); /* c9 */

	9385 + tmp11 = MULTIPLY(tmp11, FIX(0.666655658)); /* c11 */

	9386 + tmp12 = MULTIPLY(z1 - z2, FIX(0.410524528)); /* c13 */

	9387 + tmp0 = tmp1 + tmp2 + tmp3 -

	9388 + MULTIPLY(z1, FIX(2.286341144)); /* c7+c5+c3-c1 */

	9389 + tmp13 = tmp10 + tmp11 + tmp12 -

	9390 + MULTIPLY(z1, FIX(1.835730603)); /* c9+c11+c13-c15 */

	9391 + z1 = MULTIPLY(z2 + z3, FIX(0.138617169)); /* c15 */

	9392 + tmp1 += z1 + MULTIPLY(z2, FIX(0.071888074)); /* c9+c11-c3-c15 */

	9393 + tmp2 += z1 - MULTIPLY(z3, FIX(1.125726048)); /* c5+c7+c15-c3 */

	9394 + z1 = MULTIPLY(z3 - z2, FIX(1.407403738)); /* c1 */

	9395 + tmp11 += z1 - MULTIPLY(z3, FIX(0.766367282)); /* c1+c11-c9-c13 */

	9396 + tmp12 += z1 + MULTIPLY(z2, FIX(1.971951411)); /* c1+c5+c13-c7 */

	9397 + z2 += z4;

	9398 + z1 = MULTIPLY(z2, - FIX(0.666655658)); /* -c11 */

	9399 + tmp1 += z1;

	9400 + tmp3 += z1 + MULTIPLY(z4, FIX(1.065388962)); /* c3+c11+c15-c7 */

	9401 + z2 = MULTIPLY(z2, - FIX(1.247225013)); /* -c5 */

	9402 + tmp10 += z2 + MULTIPLY(z4, FIX(3.141271809)); /* c1+c5+c9-c13 */

	9403 + tmp12 += z2;

	9404 + z2 = MULTIPLY(z3 + z4, - FIX(1.353318001)); /* -c3 */

	9405 + tmp2 += z2;

	9406 + tmp3 += z2;

	9407 + z2 = MULTIPLY(z4 - z3, FIX(0.410524528)); /* c13 */

	9408 + tmp10 += z2;

	9409 + tmp11 += z2;

	9410 +

	9411 + /* Final output stage */

	9412 +

	9413 + outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp0,

	9414 + CONST_BITS+PASS1_BITS+3)

	9415 + & RANGE_MASK];

	9416 + outptr[15] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp0,

	9417 + CONST_BITS+PASS1_BITS+3)

	9418 + & RANGE_MASK];

	9419 + outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp1,

	9420 + CONST_BITS+PASS1_BITS+3)

	9421 + & RANGE_MASK];

	9422 + outptr[14] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp1,

	9423 + CONST_BITS+PASS1_BITS+3)

	9424 + & RANGE_MASK];

	9425 + outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp2,

	9426 + CONST_BITS+PASS1_BITS+3)

	9427 + & RANGE_MASK];

	9428 + outptr[13] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp2,

	9429 + CONST_BITS+PASS1_BITS+3)

	9430 + & RANGE_MASK];

	9431 + outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp3,

	9432 + CONST_BITS+PASS1_BITS+3)

	9433 + & RANGE_MASK];

	9434 + outptr[12] = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp3,

	9435 + CONST_BITS+PASS1_BITS+3)

	9436 + & RANGE_MASK];

	9437 + outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp10,

	9438 + CONST_BITS+PASS1_BITS+3)

	9439 + & RANGE_MASK];

	9440 + outptr[11] = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp10,

	9441 + CONST_BITS+PASS1_BITS+3)

	9442 + & RANGE_MASK];

	9443 + outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp25 + tmp11,

	9444 + CONST_BITS+PASS1_BITS+3)

	9445 + & RANGE_MASK];

	9446 + outptr[10] = range_limit[(int) RIGHT_SHIFT(tmp25 - tmp11,

	9447 + CONST_BITS+PASS1_BITS+3)

	9448 + & RANGE_MASK];

	9449 + outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp26 + tmp12,

	9450 + CONST_BITS+PASS1_BITS+3)

	9451 + & RANGE_MASK];

	9452 + outptr[9] = range_limit[(int) RIGHT_SHIFT(tmp26 - tmp12,

	9453 + CONST_BITS+PASS1_BITS+3)

	9454 + & RANGE_MASK];

	9455 + outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp27 + tmp13,

	9456 + CONST_BITS+PASS1_BITS+3)

	9457 + & RANGE_MASK];

	9458 + outptr[8] = range_limit[(int) RIGHT_SHIFT(tmp27 - tmp13,

	9459 + CONST_BITS+PASS1_BITS+3)

	9460 + & RANGE_MASK];

	9461 +

	9462 + wsptr += 8; /* advance pointer to next row */

	9463 + }

	9464 +}

	9465 +

	9466 +#endif /* IDCT_SCALING_SUPPORTED */

	9467 #endif /* DCT_ISLOW_SUPPORTED */

	9468 Index: jmemmgr.c

	9469 ===================================================================

	9470 --- jmemmgr.c (revision 829)

	9471 +++ jmemmgr.c (working copy)

	9472 @@ -37,6 +37,15 @@

	9473 #endif

	9474

	9475

	9476 +LOCAL(size_t)

	9477 +round_up_pow2 (size_t a, size_t b)

	9478 +/* a rounded up to the next multiple of b, i.e. ceil(a/b)*b */

	9479 +/* Assumes a >= 0, b > 0, and b is a power of 2 */

	9480 +{

	9481 + return ((a + b - 1) & (~(b - 1)));

	9482 +}
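As its comment says, round_up_pow2() is only a drop-in replacement for jround_up() when b is a power of two; the mask trick gives wrong answers otherwise. The standalone sketch below compares the two forms. round_up_div is a size_t restatement of jround_up()'s arithmetic (the original takes and returns long), and agree_up_to is a hypothetical test helper.

```c
#include <stddef.h>

/* Same body as the patch's round_up_pow2(): valid only for power-of-2 b. */
static size_t round_up_pow2(size_t a, size_t b)
{
  return ((a + b - 1) & (~(b - 1)));
}

/* Division-based rounding in the style of jround_up(); works for any b > 0. */
static size_t round_up_div(size_t a, size_t b)
{
  a += b - 1;
  return a - (a % b);
}

/* Hypothetical helper: 1 if both forms agree for every a in [0, limit]. */
static int agree_up_to(size_t limit, size_t b)
{
  for (size_t a = 0; a <= limit; a++)
    if (round_up_pow2(a, b) != round_up_div(a, b))
      return 0;
  return 1;
}
```

For a non-power-of-two divisor the forms diverge: round_up_pow2(1, 24) yields 8, not 24. For the samplesperrow call, assuming ALIGN_SIZE is 16 and JSAMPLE is one byte, a 100-sample row is padded to round_up_pow2(100, 32) = 128 samples. Neither form guards against a + b - 1 wrapping when a is near the type's maximum.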

	9483 +

	9484 +

	9485 /*

	9486 * Some important notes:

	9487 * The allocation routines provided here must never return NULL.

	9488 @@ -122,7 +131,7 @@

	9489 jvirt_barray_ptr virt_barray_list;

	9490

	9491 /* This counts total space obtained from jpeg_get_small/large */

	9492 - long total_space_allocated;

	9493 + size_t total_space_allocated;

	9494

	9495 /* alloc_sarray and alloc_barray set this value for use by virtual

	9496 * array routines.

	9497 @@ -265,7 +274,7 @@

	9498 * and so that algorithms can straddle outside the proper area up

	9499 * to the next alignment.

	9500 */

	9501 - sizeofobject = jround_up(sizeofobject, ALIGN_SIZE);

	9502 + sizeofobject = round_up_pow2(sizeofobject, ALIGN_SIZE);

	9503

	9504 /* Check for unsatisfiable request (do now to ensure no overflow below) */

	9505 if ((SIZEOF(small_pool_hdr) + sizeofobject + ALIGN_SIZE - 1) > MAX_ALLOC_CHUNK)

	9506 @@ -317,8 +326,8 @@

	9507 /* OK, allocate the object from the current pool */

	9508 data_ptr = (char *) hdr_ptr; /* point to first data byte in pool... */

	9509 data_ptr += SIZEOF(small_pool_hdr); /* ...by skipping the header... */

	9510 - if ((unsigned long)data_ptr % ALIGN_SIZE) /* ...and adjust for alignment */

	9511 - data_ptr += ALIGN_SIZE - (unsigned long)data_ptr % ALIGN_SIZE;

	9512 + if ((size_t)data_ptr % ALIGN_SIZE) /* ...and adjust for alignment */

	9513 + data_ptr += ALIGN_SIZE - (size_t)data_ptr % ALIGN_SIZE;

	9514 data_ptr += hdr_ptr->bytes_used; /* point to place for object */

	9515 hdr_ptr->bytes_used += sizeofobject;

	9516 hdr_ptr->bytes_left -= sizeofobject;
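The alignment step in alloc_small() (and the identical one in alloc_large() below) now casts the pointer to size_t instead of unsigned long. On LLP64 targets such as 64-bit Windows, long is 32 bits, so the old cast narrowed the pointer; for a power-of-two ALIGN_SIZE the low bits, and therefore the result, are unchanged, but the new cast no longer truncates. A standalone sketch of the same adjustment follows; align_ptr_up is a hypothetical name, and it uses uintptr_t, the C99 type intended for pointer-to-integer conversion, rather than the patch's size_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Advance p to the next multiple of align (align > 0), mirroring the
   "...and adjust for alignment" step; already-aligned pointers are
   returned unchanged. */
static char *align_ptr_up(char *p, size_t align)
{
  size_t rem = (size_t) ((uintptr_t) p % align);
  if (rem)
    p += align - rem;
  return p;
}
```

The caller must reserve up to align - 1 spare bytes, which is why the size checks above include the ALIGN_SIZE - 1 term.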

	9517 @@ -354,7 +363,7 @@

	9518 * algorithms can straddle outside the proper area up to the next

	9519 * alignment.

	9520 */

	9521 - sizeofobject = jround_up(sizeofobject, ALIGN_SIZE);

	9522 + sizeofobject = round_up_pow2(sizeofobject, ALIGN_SIZE);

	9523

	9524 /* Check for unsatisfiable request (do now to ensure no overflow below) */

	9525 if ((SIZEOF(large_pool_hdr) + sizeofobject + ALIGN_SIZE - 1) > MAX_ALLOC_CHUNK)

	9526 @@ -382,8 +391,8 @@

	9527

	9528 data_ptr = (char *) hdr_ptr; /* point to first data byte in pool... */

	9529 data_ptr += SIZEOF(small_pool_hdr); /* ...by skipping the header... */

	9530 - if ((unsigned long)data_ptr % ALIGN_SIZE) /* ...and adjust for alignment */

	9531 - data_ptr += ALIGN_SIZE - (unsigned long)data_ptr % ALIGN_SIZE;

	9532 + if ((size_t)data_ptr % ALIGN_SIZE) /* ...and adjust for alignment */

	9533 + data_ptr += ALIGN_SIZE - (size_t)data_ptr % ALIGN_SIZE;

	9534

	9535 return (void FAR *) data_ptr;

	9536 }

	9537 @@ -420,7 +429,7 @@

	9538 /* Make sure each row is properly aligned */

	9539 if ((ALIGN_SIZE % SIZEOF(JSAMPLE)) != 0)

	9540 out_of_memory(cinfo, 5); /* safety check */

	9541 - samplesperrow = jround_up(samplesperrow, (2 * ALIGN_SIZE) / SIZEOF(JSAMPLE));

	9542 + samplesperrow = (JDIMENSION)round_up_pow2(samplesperrow, (2 * ALIGN_SIZE) / SIZEOF(JSAMPLE));

	9543

	9544 /* Calculate max # of rows allowed in one allocation chunk */

	9545 ltemp = (MAX_ALLOC_CHUNK-SIZEOF(large_pool_hdr)) /

	9546 @@ -608,8 +617,8 @@

	9547 /* Allocate the in-memory buffers for any unrealized virtual arrays */

	9548 {

	9549 my_mem_ptr mem = (my_mem_ptr) cinfo->mem;

	9550 - long space_per_minheight, maximum_space, avail_mem;

	9551 - long minheights, max_minheights;

	9552 + size_t space_per_minheight, maximum_space, avail_mem;

	9553 + size_t minheights, max_minheights;

	9554 jvirt_sarray_ptr sptr;

	9555 jvirt_barray_ptr bptr;

	9556

	9557 Index: jmemnobs.c

	9558 ===================================================================

	9559 --- jmemnobs.c (revision 829)

	9560 +++ jmemnobs.c (working copy)

	9561 @@ -69,9 +69,9 @@

	9562 * Here we always say, "we got all you want bud!"

	9563 */

	9564

	9565 -GLOBAL(long)

	9566 -jpeg_mem_available (j_common_ptr cinfo, long min_bytes_needed,

	9567 - long max_bytes_needed, long already_allocated)

	9568 +GLOBAL(size_t)

	9569 +jpeg_mem_available (j_common_ptr cinfo, size_t min_bytes_needed,

	9570 + size_t max_bytes_needed, size_t already_allocated)

	9571 {

	9572 return max_bytes_needed;

	9573 }

	9574 Index: jmemsys.h

	9575 ===================================================================

	9576 --- jmemsys.h (revision 829)

	9577 +++ jmemsys.h (working copy)

	9578 @@ -100,10 +100,10 @@

	9579 * Conversely, zero may be returned to always use the minimum amount of memory.

	9580 */

	9581

	9582 -EXTERN(long) jpeg_mem_available JPP((j_common_ptr cinfo,

	9583 - long min_bytes_needed,

	9584 - long max_bytes_needed,

	9585 - long already_allocated));

	9586 +EXTERN(size_t) jpeg_mem_available JPP((j_common_ptr cinfo,

	9587 + size_t min_bytes_needed,

	9588 + size_t max_bytes_needed,

	9589 + size_t already_allocated));

	9590

	9591

	9592 /*

167 Index: jmorecfg.h	9593 Index: jmorecfg.h

168 ===================================================================	9594 ===================================================================

169 --- jmorecfg.h (revision 829)	9595 --- jmorecfg.h (revision 829)

170 +++ jmorecfg.h (working copy)	9596 +++ jmorecfg.h (working copy)

171 @@ -153,14 +153,18 @@	9597 @@ -1,9 +1,10 @@

	9598 /*

	9599 * jmorecfg.h

	9600 *

	9601 + * This file was part of the Independent JPEG Group's software:

	9602 * Copyright (C) 1991-1997, Thomas G. Lane.

	9603 - * Copyright (C) 2009, D. R. Commander.

	9604 - * This file is part of the Independent JPEG Group's software.

	9605 + * Modifications:

	9606 + * Copyright (C) 2009, 2011, D. R. Commander.

	9607 * For conditions of distribution and use, see the accompanying README file.

	9608 *

	9609 * This file contains additional configuration options that customize the

	9610 @@ -153,14 +154,18 @@

172 /* INT16 must hold at least the values -32768..32767. */	9611 /* INT16 must hold at least the values -32768..32767. */

173	9612

174 #ifndef XMD_H /* X11/xmd.h correctly defines INT16 */	9613 #ifndef XMD_H /* X11/xmd.h correctly defines INT16 */

175 +#ifndef _BASETSD_H_ /* basetsd.h correctly defines INT32 */	9614 +#ifndef _BASETSD_H_ /* basetsd.h correctly defines INT32 */

176 typedef short INT16;	9615 typedef short INT16;

177 #endif	9616 #endif

178 +#endif	9617 +#endif

179	9618

180 /* INT32 must hold at least signed 32-bit values. */	9619 /* INT32 must hold at least signed 32-bit values. */

181	9620

182 #ifndef XMD_H /* X11/xmd.h correctly defines INT32 */	9621 #ifndef XMD_H /* X11/xmd.h correctly defines INT32 */

183 +#ifndef _BASETSD_H_ /* basetsd.h correctly defines INT32 */	9622 +#ifndef _BASETSD_H_ /* basetsd.h correctly defines INT32 */

184 typedef long INT32;	9623 typedef long INT32;

185 #endif	9624 #endif

186 +#endif	9625 +#endif

187	9626

188 /* Datatype used for image dimensions. The JPEG standard only supports	9627 /* Datatype used for image dimensions. The JPEG standard only supports

189 * images up to 64K*64K due to 16-bit fields in SOF markers. Therefore	9628 * images up to 64K*64K due to 16-bit fields in SOF markers. Therefore

190 @@ -210,11 +214,13 @@	9629 @@ -210,11 +215,16 @@

191 * explicit coding is needed; see uses of the NEED_FAR_POINTERS symbol.	9630 * explicit coding is needed; see uses of the NEED_FAR_POINTERS symbol.

192 */	9631 */

193	9632

194 +#ifndef FAR	9633 +#ifndef FAR

195 #ifdef NEED_FAR_POINTERS	9634 #ifdef NEED_FAR_POINTERS

	9635 +#ifndef FAR

196 #define FAR far	9636 #define FAR far

	9637 +#endif

197 #else	9638 #else

	9639 +#undef FAR

198 #define FAR	9640 #define FAR

199 #endif	9641 #endif

200 +#endif	9642 +#endif

201	9643

202	9644

203 /*	9645 /*

	9646 @@ -257,8 +267,6 @@

	9647 * (You may HAVE to do that if your compiler doesn't like null source files.)

	9648 */

	9649

	9650 -/* Arithmetic coding is unsupported for legal reasons. Complaints to IBM. */

	9651 -

	9652 /* Capability options common to encoder and decoder: */

	9653

	9654 #define DCT_ISLOW_SUPPORTED /* slow but accurate integer algorithm */

	9655 @@ -267,7 +275,6 @@

	9656

	9657 /* Encoder capability options: */

	9658

	9659 -#undef C_ARITH_CODING_SUPPORTED /* Arithmetic coding back end? */

	9660 #define C_MULTISCAN_FILES_SUPPORTED /* Multiple-scan JPEG files? */

	9661 #define C_PROGRESSIVE_SUPPORTED /* Progressive JPEG? (Requires MULTI SCAN)*/

	9662 #define ENTROPY_OPT_SUPPORTED /* Optimization of entropy coding parms? */

	9663 @@ -283,7 +290,6 @@

	9664

	9665 /* Decoder capability options: */

	9666

	9667 -#undef D_ARITH_CODING_SUPPORTED /* Arithmetic coding back end? */

	9668 #define D_MULTISCAN_FILES_SUPPORTED /* Multiple-scan JPEG files? */

	9669 #define D_PROGRESSIVE_SUPPORTED /* Progressive JPEG? (Requires MULTI SCAN)*/

	9670 #define SAVE_MARKERS_SUPPORTED /* jpeg_save_markers() needed? */

	9671 @@ -317,22 +323,60 @@

	9672 #define RGB_BLUE 2 /* Offset of Blue */

	9673 #define RGB_PIXELSIZE 3 /* JSAMPLEs per RGB scanline element */

	9674

	9675 -#define JPEG_NUMCS 12

	9676 +#define JPEG_NUMCS 16

	9677

	9678 +#define EXT_RGB_RED 0

	9679 +#define EXT_RGB_GREEN 1

	9680 +#define EXT_RGB_BLUE 2

	9681 +#define EXT_RGB_PIXELSIZE 3

	9682 +

	9683 +#define EXT_RGBX_RED 0

	9684 +#define EXT_RGBX_GREEN 1

	9685 +#define EXT_RGBX_BLUE 2

	9686 +#define EXT_RGBX_PIXELSIZE 4

	9687 +

	9688 +#define EXT_BGR_RED 2

	9689 +#define EXT_BGR_GREEN 1

	9690 +#define EXT_BGR_BLUE 0

	9691 +#define EXT_BGR_PIXELSIZE 3

	9692 +

	9693 +#define EXT_BGRX_RED 2

	9694 +#define EXT_BGRX_GREEN 1

	9695 +#define EXT_BGRX_BLUE 0

	9696 +#define EXT_BGRX_PIXELSIZE 4

	9697 +

	9698 +#define EXT_XBGR_RED 3

	9699 +#define EXT_XBGR_GREEN 2

	9700 +#define EXT_XBGR_BLUE 1

	9701 +#define EXT_XBGR_PIXELSIZE 4

	9702 +

	9703 +#define EXT_XRGB_RED 1

	9704 +#define EXT_XRGB_GREEN 2

	9705 +#define EXT_XRGB_BLUE 3

	9706 +#define EXT_XRGB_PIXELSIZE 4

	9707 +

	9708 static const int rgb_red[JPEG_NUMCS] = {

	9709 - -1, -1, RGB_RED, -1, -1, -1, 0, 0, 2, 2, 3, 1

	9710 + -1, -1, RGB_RED, -1, -1, -1, EXT_RGB_RED, EXT_RGBX_RED,

	9711 + EXT_BGR_RED, EXT_BGRX_RED, EXT_XBGR_RED, EXT_XRGB_RED,

	9712 + EXT_RGBX_RED, EXT_BGRX_RED, EXT_XBGR_RED, EXT_XRGB_RED

	9713 };

	9714

	9715 static const int rgb_green[JPEG_NUMCS] = {

	9716 - -1, -1, RGB_GREEN, -1, -1, -1, 1, 1, 1, 1, 2, 2

	9717 + -1, -1, RGB_GREEN, -1, -1, -1, EXT_RGB_GREEN, EXT_RGBX_GREEN,

	9718 + EXT_BGR_GREEN, EXT_BGRX_GREEN, EXT_XBGR_GREEN, EXT_XRGB_GREEN,

	9719 + EXT_RGBX_GREEN, EXT_BGRX_GREEN, EXT_XBGR_GREEN, EXT_XRGB_GREEN

	9720 };

	9721

	9722 static const int rgb_blue[JPEG_NUMCS] = {

	9723 - -1, -1, RGB_BLUE, -1, -1, -1, 2, 2, 0, 0, 1, 3

	9724 + -1, -1, RGB_BLUE, -1, -1, -1, EXT_RGB_BLUE, EXT_RGBX_BLUE,

	9725 + EXT_BGR_BLUE, EXT_BGRX_BLUE, EXT_XBGR_BLUE, EXT_XRGB_BLUE,

	9726 + EXT_RGBX_BLUE, EXT_BGRX_BLUE, EXT_XBGR_BLUE, EXT_XRGB_BLUE

	9727 };

	9728

	9729 static const int rgb_pixelsize[JPEG_NUMCS] = {

	9730 - -1, -1, RGB_PIXELSIZE, -1, -1, -1, 3, 4, 3, 4, 4, 4

	9731 + -1, -1, RGB_PIXELSIZE, -1, -1, -1, EXT_RGB_PIXELSIZE, EXT_RGBX_PIXELSIZE,

	9732 + EXT_BGR_PIXELSIZE, EXT_BGRX_PIXELSIZE, EXT_XBGR_PIXELSIZE, EXT_XRGB_PIXELSIZE,

	9733 + EXT_RGBX_PIXELSIZE, EXT_BGRX_PIXELSIZE, EXT_XBGR_PIXELSIZE, EXT_XRGB_PIXELSIZE

	9734 };

	9735

	9736 /* Definitions for speed-related optimizations. */

	9737 Index: jpegint.h

	9738 ===================================================================

	9739 --- jpegint.h (revision 829)

	9740 +++ jpegint.h (working copy)

	9741 @@ -2,6 +2,7 @@

	9742 * jpegint.h

	9743 *

	9744 * Copyright (C) 1991-1997, Thomas G. Lane.

	9745 + * Modified 1997-2009 by Guido Vollbeding.

	9746 * This file is part of the Independent JPEG Group's software.

	9747 * For conditions of distribution and use, see the accompanying README file.

	9748 *

	9749 @@ -304,6 +305,7 @@

	9750 #define jinit_forward_dct jIFDCT

	9751 #define jinit_huff_encoder jIHEncoder

	9752 #define jinit_phuff_encoder jIPHEncoder

	9753 +#define jinit_arith_encoder jIAEncoder

	9754 #define jinit_marker_writer jIMWriter

	9755 #define jinit_master_decompress jIDMaster

	9756 #define jinit_d_main_controller jIDMainC

	9757 @@ -313,6 +315,7 @@

	9758 #define jinit_marker_reader jIMReader

	9759 #define jinit_huff_decoder jIHDecoder

	9760 #define jinit_phuff_decoder jIPHDecoder

	9761 +#define jinit_arith_decoder jIADecoder

	9762 #define jinit_inverse_dct jIIDCT

	9763 #define jinit_upsampler jIUpsampler

	9764 #define jinit_color_deconverter jIDColor

	9765 @@ -327,6 +330,7 @@

	9766 #define jzero_far jZeroFar

	9767 #define jpeg_zigzag_order jZIGTable

	9768 #define jpeg_natural_order jZAGTable

	9769 +#define jpeg_aritab jAriTab

	9770 #endif /* NEED_SHORT_EXTERNAL_NAMES */

	9771

	9772

	9773 @@ -345,6 +349,7 @@

	9774 EXTERN(void) jinit_forward_dct JPP((j_compress_ptr cinfo));

	9775 EXTERN(void) jinit_huff_encoder JPP((j_compress_ptr cinfo));

	9776 EXTERN(void) jinit_phuff_encoder JPP((j_compress_ptr cinfo));

	9777 +EXTERN(void) jinit_arith_encoder JPP((j_compress_ptr cinfo));

	9778 EXTERN(void) jinit_marker_writer JPP((j_compress_ptr cinfo));

	9779 /* Decompression module initialization routines */

	9780 EXTERN(void) jinit_master_decompress JPP((j_decompress_ptr cinfo));

	9781 @@ -358,6 +363,7 @@

	9782 EXTERN(void) jinit_marker_reader JPP((j_decompress_ptr cinfo));

	9783 EXTERN(void) jinit_huff_decoder JPP((j_decompress_ptr cinfo));

	9784 EXTERN(void) jinit_phuff_decoder JPP((j_decompress_ptr cinfo));

	9785 +EXTERN(void) jinit_arith_decoder JPP((j_decompress_ptr cinfo));

	9786 EXTERN(void) jinit_inverse_dct JPP((j_decompress_ptr cinfo));

	9787 EXTERN(void) jinit_upsampler JPP((j_decompress_ptr cinfo));

	9788 EXTERN(void) jinit_color_deconverter JPP((j_decompress_ptr cinfo));

	9789 @@ -382,6 +388,9 @@

	9790 #endif

	9791 extern const int jpeg_natural_order[]; /* zigzag coef order to natural order */

	9792

	9793 +/* Arithmetic coding probability estimation tables in jaricom.c */

	9794 +extern const INT32 jpeg_aritab[];

	9795 +

	9796 /* Suppress undefined-structure complaints if necessary. */

	9797

	9798 #ifdef INCOMPLETE_TYPES_BROKEN

204 Index: jpeglib.h	9799 Index: jpeglib.h

205 ===================================================================	9800 ===================================================================

206 --- jpeglib.h (revision 829)	9801 --- jpeglib.h (revision 829)

207 +++ jpeglib.h (working copy)	9802 +++ jpeglib.h (working copy)

208 @@ -15,6 +15,10 @@	9803 @@ -1,9 +1,12 @@

	9804 /*

	9805 * jpeglib.h

	9806 *

	9807 + * This file was part of the Independent JPEG Group's software:

	9808 * Copyright (C) 1991-1998, Thomas G. Lane.

	9809 - * Copyright (C) 2009, D. R. Commander.

	9810 - * This file is part of the Independent JPEG Group's software.

	9811 + * Modified 2002-2009 by Guido Vollbeding.

	9812 + * Modifications:

	9813 + * Copyright (C) 2009-2011, 2013, D. R. Commander.

	9814 + * Copyright (C) 2015, Google, Inc.

	9815 * For conditions of distribution and use, see the accompanying README file.

	9816 *

	9817 * This file defines the application interface for the JPEG library.

	9818 @@ -14,6 +17,10 @@

209 #ifndef JPEGLIB_H	9819 #ifndef JPEGLIB_H

210 #define JPEGLIB_H	9820 #define JPEGLIB_H

211	9821

212 +/* Begin chromium edits */	9822 +/* Begin chromium edits */

213 +#include "jpeglibmangler.h"	9823 +#include "jpeglibmangler.h"

214 +/* End chromium edits */	9824 +/* End chromium edits */

215 +	9825 +

216 /*	9826 /*

217 * First we include the configuration files that record how this	9827 * First we include the configuration files that record how this

218 * installation of the JPEG library is set up. jconfig.h can be	9828 * installation of the JPEG library is set up. jconfig.h can be

	9829 @@ -27,13 +34,13 @@

	9830 #include "jmorecfg.h" /* seldom changed options */

	9831

	9832

	9833 -/* Version ID for the JPEG library.

	9834 - * Might be useful for tests like "#if JPEG_LIB_VERSION >= 60".

	9835 - */

	9836 +#ifdef __cplusplus

	9837 +#ifndef DONT_USE_EXTERN_C

	9838 +extern "C" {

	9839 +#endif

	9840 +#endif

	9841

	9842 -#define JPEG_LIB_VERSION 62 /* Version 6b */

	9843

	9844 -

	9845 /* Various constants determining the sizes of things.

	9846 * All of these are specified by the JPEG standard, so don't change them

	9847 * if you want to be compatible.

	9848 @@ -145,12 +152,17 @@

	9849 * Values of 1,2,4,8 are likely to be supported. Note that different

	9850 * components may receive different IDCT scalings.

	9851 */

	9852 +#if JPEG_LIB_VERSION >= 70

	9853 + int DCT_h_scaled_size;

	9854 + int DCT_v_scaled_size;

	9855 +#else

	9856 int DCT_scaled_size;

	9857 +#endif

	9858 /* The downsampled dimensions are the component's actual, unpadded number

	9859 * of samples at the main buffer (preprocessing/compression interface), thus

	9860 * downsampled_width = ceil(image_width * Hi/Hmax)

	9861 * and similarly for height. For decompression, IDCT scaling is included, so

	9862 - * downsampled_width = ceil(image_width * Hi/Hmax * DCT_scaled_size/DCTSIZE)

	9863 + * downsampled_width = ceil(image_width * Hi/Hmax * DCT_[h_]scaled_size/DCTSIZE)

	9864 */

	9865 JDIMENSION downsampled_width; /* actual width in samples */

	9866 JDIMENSION downsampled_height; /* actual height in samples */

	9867 @@ -165,7 +177,7 @@

	9868 int MCU_width; /* number of blocks per MCU, horizontally */

	9869 int MCU_height; /* number of blocks per MCU, vertically */

	9870 int MCU_blocks; /* MCU_width * MCU_height */

	9871 - int MCU_sample_width; /* MCU width in samples, MCU_width*DCT_scaled_size */

	9872 + int MCU_sample_width; /* MCU width in samples, MCU_width*DCT_[h_]scaled_size */

	9873 int last_col_width; /* # of non-dummy blocks across in last MCU */

	9874 int last_row_height; /* # of non-dummy blocks down in last MCU */

	9875

	9876 @@ -205,12 +217,13 @@

	9877 /* Known color spaces. */

	9878

	9879 #define JCS_EXTENSIONS 1

	9880 +#define JCS_ALPHA_EXTENSIONS 1

	9881

	9882 typedef enum {

	9883 JCS_UNKNOWN, /* error/unspecified */

	9884 JCS_GRAYSCALE, /* monochrome */

	9885 JCS_RGB, /* red/green/blue as specified by the RGB_RED, RGB_GREEN,

	9886 - RGB_BLUE, and RGB_PIXELSIZE macros */

	9887 + RGB_BLUE, and RGB_PIXELSIZE macros */

	9888 JCS_YCbCr, /* Y/Cb/Cr (also known as YUV) */

	9889 JCS_CMYK, /* C/M/Y/K */

	9890 JCS_YCCK, /* Y/Cb/Cr/K */

	9891 @@ -220,6 +233,17 @@

	9892 JCS_EXT_BGRX, /* blue/green/red/x */

	9893 JCS_EXT_XBGR, /* x/blue/green/red */

	9894 JCS_EXT_XRGB, /* x/red/green/blue */

	9895 + /* When out_color_space it set to JCS_EXT_RGBX, JCS_EXT_BGRX,

	9896 + JCS_EXT_XBGR, or JCS_EXT_XRGB during decompression, the X byte is

	9897 + undefined, and in order to ensure the best performance,

	9898 + libjpeg-turbo can set that byte to whatever value it wishes. Use

	9899 + the following colorspace constants to ensure that the X byte is set

	9900 + to 0xFF, so that it can be interpreted as an opaque alpha

	9901 + channel. */

	9902 + JCS_EXT_RGBA, /* red/green/blue/alpha */

	9903 + JCS_EXT_BGRA, /* blue/green/red/alpha */

	9904 + JCS_EXT_ABGR, /* alpha/blue/green/red */

	9905 + JCS_EXT_ARGB /* alpha/red/green/blue */

	9906 } J_COLOR_SPACE;

	9907

	9908 /* DCT/IDCT algorithm options. */

	9909 @@ -301,6 +325,19 @@

	9910 * helper routines to simplify changing parameters.

	9911 */

	9912

	9913 +#if JPEG_LIB_VERSION >= 70

	9914 + unsigned int scale_num, scale_denom; /* fraction by which to scale image */

	9915 +

	9916 + JDIMENSION jpeg_width; /* scaled JPEG image width */

	9917 + JDIMENSION jpeg_height; /* scaled JPEG image height */

	9918 + /* Dimensions of actual JPEG image that will be written to file,

	9919 + * derived from input dimensions by scaling factors above.

	9920 + * These fields are computed by jpeg_start_compress().

	9921 + * You can also use jpeg_calc_jpeg_dimensions() to determine these values

	9922 + * in advance of calling jpeg_start_compress().

	9923 + */

	9924 +#endif

	9925 +

	9926 int data_precision; /* bits of precision in image data */

	9927

	9928 int num_components; /* # of color components in JPEG image */

	9929 @@ -308,14 +345,19 @@

	9930

	9931 jpeg_component_info * comp_info;

	9932 /* comp_info[i] describes component that appears i'th in SOF */

	9933 -

	9934 +

	9935 JQUANT_TBL * quant_tbl_ptrs[NUM_QUANT_TBLS];

	9936 - /* ptrs to coefficient quantization tables, or NULL if not defined */

	9937 -

	9938 +#if JPEG_LIB_VERSION >= 70

	9939 + int q_scale_factor[NUM_QUANT_TBLS];

	9940 +#endif

	9941 + /* ptrs to coefficient quantization tables, or NULL if not defined,

	9942 + * and corresponding scale factors (percentage, initialized 100).

	9943 + */

	9944 +

	9945 JHUFF_TBL * dc_huff_tbl_ptrs[NUM_HUFF_TBLS];

	9946 JHUFF_TBL * ac_huff_tbl_ptrs[NUM_HUFF_TBLS];

	9947 /* ptrs to Huffman coding tables, or NULL if not defined */

	9948 -

	9949 +

	9950 UINT8 arith_dc_L[NUM_ARITH_TBLS]; /* L values for DC arith-coding tables */

	9951 UINT8 arith_dc_U[NUM_ARITH_TBLS]; /* U values for DC arith-coding tables */

	9952 UINT8 arith_ac_K[NUM_ARITH_TBLS]; /* Kx values for AC arith-coding tables */

	9953 @@ -331,6 +373,9 @@

	9954 boolean arith_code; /* TRUE=arithmetic coding, FALSE=Huffman */

	9955 boolean optimize_coding; /* TRUE=optimize entropy encoding parms */

	9956 boolean CCIR601_sampling; /* TRUE=first samples are cosited */

	9957 +#if JPEG_LIB_VERSION >= 70

	9958 + boolean do_fancy_downsampling; /* TRUE=apply fancy downsampling */

	9959 +#endif

	9960 int smoothing_factor; /* 1..100, or 0 for no input smoothing */

	9961 J_DCT_METHOD dct_method; /* DCT algorithm selector */

	9962

	9963 @@ -374,6 +419,11 @@

	9964 int max_h_samp_factor; /* largest h_samp_factor */

	9965 int max_v_samp_factor; /* largest v_samp_factor */

	9966

	9967 +#if JPEG_LIB_VERSION >= 70

	9968 + int min_DCT_h_scaled_size; /* smallest DCT_h_scaled_size of any component */

	9969 + int min_DCT_v_scaled_size; /* smallest DCT_v_scaled_size of any component */

	9970 +#endif

	9971 +

	9972 JDIMENSION total_iMCU_rows; /* # of iMCU rows to be input to coef ctlr */

	9973 /* The coefficient controller receives data in units of MCU rows as defined

	9974 * for fully interleaved scans (whether the JPEG file is interleaved or not).

	9975 @@ -399,6 +449,12 @@

	9976

	9977 int Ss, Se, Ah, Al; /* progressive JPEG parameters for scan */

	9978

	9979 +#if JPEG_LIB_VERSION >= 80

	9980 + int block_size; /* the basic DCT block size: 1..16 */

	9981 + const int * natural_order; /* natural-order position array */

	9982 + int lim_Se; /* min( Se, DCTSIZE2-1 ) */

	9983 +#endif

	9984 +

	9985 /*

	9986 * Links to compression subobjects (methods and private variables of modules)

	9987 */

	9988 @@ -545,6 +601,9 @@

	9989 jpeg_component_info * comp_info;

	9990 /* comp_info[i] describes component that appears i'th in SOF */

	9991

	9992 +#if JPEG_LIB_VERSION >= 80

	9993 + boolean is_baseline; /* TRUE if Baseline SOF0 encountered */

	9994 +#endif

	9995 boolean progressive_mode; /* TRUE if SOFn specifies progressive mode */

	9996 boolean arith_code; /* TRUE=arithmetic coding, FALSE=Huffman */

	9997

	9998 @@ -585,7 +644,12 @@

	9999 int max_h_samp_factor; /* largest h_samp_factor */

	10000 int max_v_samp_factor; /* largest v_samp_factor */

	10001

	10002 +#if JPEG_LIB_VERSION >= 70

	10003 + int min_DCT_h_scaled_size; /* smallest DCT_h_scaled_size of any component */

	10004 + int min_DCT_v_scaled_size; /* smallest DCT_v_scaled_size of any component */

	10005 +#else

	10006 int min_DCT_scaled_size; /* smallest DCT_scaled_size of any component */

	10007 +#endif

	10008

	10009 JDIMENSION total_iMCU_rows; /* # of iMCU rows in image */

	10010 /* The coefficient controller's input and output progress is measured in

	10011 @@ -593,7 +657,7 @@

	10012 * in fully interleaved JPEG scans, but are used whether the scan is

	10013 * interleaved or not. We define an iMCU row as v_samp_factor DCT block

	10014 * rows of each component. Therefore, the IDCT output contains

	10015 - * v_samp_factor*DCT_scaled_size sample rows of a component per iMCU row.

	10016 + * v_samp_factor*DCT_[v_]scaled_size sample rows of a component per iMCU row.

	10017 */

	10018

	10019 JSAMPLE * sample_range_limit; /* table for fast range-limiting */

	10020 @@ -617,6 +681,14 @@

	10021

	10022 int Ss, Se, Ah, Al; /* progressive JPEG parameters for scan */

	10023

	10024 +#if JPEG_LIB_VERSION >= 80

	10025 + /* These fields are derived from Se of first SOS marker.

	10026 + */

	10027 + int block_size; /* the basic DCT block size: 1..16 */

	10028 + const int * natural_order; /* natural-order position array for entropy decode */

	10029 + int lim_Se; /* min( Se, DCTSIZE2-1 ) for entropy decode */

	10030 +#endif

	10031 +

	10032 /* This field is shared between entropy decoder and marker parser.

	10033 * It is either zero or the code of a JPEG marker that has been

	10034 * read from the data source, but has not yet been processed.

	10035 @@ -846,11 +918,18 @@

	10036 #define jpeg_destroy_decompress jDestDecompress

	10037 #define jpeg_stdio_dest jStdDest

	10038 #define jpeg_stdio_src jStdSrc

	10039 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED)

	10040 +#define jpeg_mem_dest jMemDest

	10041 +#define jpeg_mem_src jMemSrc

	10042 +#endif

	10043 #define jpeg_set_defaults jSetDefaults

	10044 #define jpeg_set_colorspace jSetColorspace

	10045 #define jpeg_default_colorspace jDefColorspace

	10046 #define jpeg_set_quality jSetQuality

	10047 #define jpeg_set_linear_quality jSetLQuality

	10048 +#if JPEG_LIB_VERSION >= 70

	10049 +#define jpeg_default_qtables jDefQTables

	10050 +#endif

	10051 #define jpeg_add_quant_table jAddQuantTable

	10052 #define jpeg_quality_scaling jQualityScaling

	10053 #define jpeg_simple_progression jSimProgress

	10054 @@ -860,6 +939,9 @@

	10055 #define jpeg_start_compress jStrtCompress

	10056 #define jpeg_write_scanlines jWrtScanlines

	10057 #define jpeg_finish_compress jFinCompress

	10058 +#if JPEG_LIB_VERSION >= 70

	10059 +#define jpeg_calc_jpeg_dimensions jCjpegDimensions

	10060 +#endif

	10061 #define jpeg_write_raw_data jWrtRawData

	10062 #define jpeg_write_marker jWrtMarker

	10063 #define jpeg_write_m_header jWrtMHeader

	10064 @@ -876,6 +958,9 @@

	10065 #define jpeg_input_complete jInComplete

	10066 #define jpeg_new_colormap jNewCMap

	10067 #define jpeg_consume_input jConsumeInput

	10068 +#if JPEG_LIB_VERSION >= 80

	10069 +#define jpeg_core_output_dimensions jCoreDimensions

	10070 +#endif

	10071 #define jpeg_calc_output_dimensions jCalcDimensions

	10072 #define jpeg_save_markers jSaveMarkers

	10073 #define jpeg_set_marker_processor jSetMarker

	10074 @@ -920,6 +1005,16 @@

	10075 EXTERN(void) jpeg_stdio_dest JPP((j_compress_ptr cinfo, FILE * outfile));

	10076 EXTERN(void) jpeg_stdio_src JPP((j_decompress_ptr cinfo, FILE * infile));

	10077

	10078 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED)

	10079 +/* Data source and destination managers: memory buffers. */

	10080 +EXTERN(void) jpeg_mem_dest JPP((j_compress_ptr cinfo,

	10081 + unsigned char ** outbuffer,

	10082 + unsigned long * outsize));

	10083 +EXTERN(void) jpeg_mem_src JPP((j_decompress_ptr cinfo,

	10084 + unsigned char * inbuffer,

	10085 + unsigned long insize));

	10086 +#endif

	10087 +

	10088 /* Default parameter setup for compression */

	10089 EXTERN(void) jpeg_set_defaults JPP((j_compress_ptr cinfo));

	10090 /* Compression parameter setup aids */

	10091 @@ -931,6 +1026,10 @@

	10092 EXTERN(void) jpeg_set_linear_quality JPP((j_compress_ptr cinfo,

	10093 int scale_factor,

	10094 boolean force_baseline));

	10095 +#if JPEG_LIB_VERSION >= 70

	10096 +EXTERN(void) jpeg_default_qtables JPP((j_compress_ptr cinfo,

	10097 + boolean force_baseline));

	10098 +#endif

	10099 EXTERN(void) jpeg_add_quant_table JPP((j_compress_ptr cinfo, int which_tbl,

	10100 const unsigned int *basic_table,

	10101 int scale_factor,

	10102 @@ -950,12 +1049,17 @@

	10103 JDIMENSION num_lines));

	10104 EXTERN(void) jpeg_finish_compress JPP((j_compress_ptr cinfo));

	10105

	10106 +#if JPEG_LIB_VERSION >= 70

	10107 +/* Precalculate JPEG dimensions for current compression parameters. */

	10108 +EXTERN(void) jpeg_calc_jpeg_dimensions JPP((j_compress_ptr cinfo));

	10109 +#endif

	10110 +

	10111 /* Replaces jpeg_write_scanlines when writing raw downsampled data. */

	10112 EXTERN(JDIMENSION) jpeg_write_raw_data JPP((j_compress_ptr cinfo,

	10113 JSAMPIMAGE data,

	10114 JDIMENSION num_lines));

	10115

	10116 -/* Write a special marker. See libjpeg.doc concerning safe usage. */

	10117 +/* Write a special marker. See libjpeg.txt concerning safe usage. */

	10118 EXTERN(void) jpeg_write_marker

	10119 JPP((j_compress_ptr cinfo, int marker,

	10120 const JOCTET * dataptr, unsigned int datalen));

	10121 @@ -986,6 +1090,8 @@

	10122 EXTERN(JDIMENSION) jpeg_read_scanlines JPP((j_decompress_ptr cinfo,

	10123 JSAMPARRAY scanlines,

	10124 JDIMENSION max_lines));

	10125 +EXTERN(JDIMENSION) jpeg_skip_scanlines (j_decompress_ptr cinfo,

	10126 + JDIMENSION num_lines);

	10127 EXTERN(boolean) jpeg_finish_decompress JPP((j_decompress_ptr cinfo));

	10128

	10129 /* Replaces jpeg_read_scanlines when reading raw downsampled data. */

	10130 @@ -1009,6 +1115,9 @@

	10131 #define JPEG_SCAN_COMPLETED 4 /* Completed last iMCU row of a scan */

	10132

	10133 /* Precalculate output dimensions for current decompression parameters. */

	10134 +#if JPEG_LIB_VERSION >= 80

	10135 +EXTERN(void) jpeg_core_output_dimensions JPP((j_decompress_ptr cinfo));

	10136 +#endif

	10137 EXTERN(void) jpeg_calc_output_dimensions JPP((j_decompress_ptr cinfo));

	10138

	10139 /* Control saving of COM and APPn markers into marker_list. */

	10140 @@ -1103,4 +1212,10 @@

	10141 #include "jerror.h" /* fetch error codes too */

	10142 #endif

	10143

	10144 +#ifdef __cplusplus

	10145 +#ifndef DONT_USE_EXTERN_C

	10146 +}

	10147 +#endif

	10148 +#endif

	10149 +

	10150 #endif /* JPEGLIB_H */

219 Index: jpeglibmangler.h	10151 Index: jpeglibmangler.h

220 ===================================================================	10152 ===================================================================

221 --- jpeglibmangler.h (revision 0)	10153 --- jpeglibmangler.h (revision 0)

222 +++ jpeglibmangler.h (revision 0)	10154 +++ jpeglibmangler.h (working copy)

223 @@ -0,0 +1,113 @@	10155 @@ -0,0 +1,114 @@

224 +// Copyright (c) 2009 The Chromium Authors. All rights reserved.	10156 +// Copyright (c) 2009 The Chromium Authors. All rights reserved.

225 +// Use of this source code is governed by a BSD-style license that can be	10157 +// Use of this source code is governed by a BSD-style license that can be

226 +// found in the LICENSE file.	10158 +// found in the LICENSE file.

227 +	10159 +

228 +#ifndef THIRD_PARTY_LIBJPEG_TURBO_JPEGLIBMANGLER_H_	10160 +#ifndef THIRD_PARTY_LIBJPEG_TURBO_JPEGLIBMANGLER_H_

229 +#define THIRD_PARTY_LIBJPEG_TURBO_JPEGLIBMANGLER_H_	10161 +#define THIRD_PARTY_LIBJPEG_TURBO_JPEGLIBMANGLER_H_

230 +	10162 +

231 +// Mangle all externally visible function names so we can build our own libjpeg	10163 +// Mangle all externally visible function names so we can build our own libjpeg

232 +// without system libraries trying to use it.	10164 +// without system libraries trying to use it.

233 +	10165 +

(...skipping 64 matching lines...)
298 +#define jpeg_write_scanlines chromium_jpeg_write_scanlines	10230 +#define jpeg_write_scanlines chromium_jpeg_write_scanlines

299 +#define jpeg_finish_compress chromium_jpeg_finish_compress	10231 +#define jpeg_finish_compress chromium_jpeg_finish_compress

300 +#define jpeg_write_raw_data chromium_jpeg_write_raw_data	10232 +#define jpeg_write_raw_data chromium_jpeg_write_raw_data

301 +#define jpeg_write_marker chromium_jpeg_write_marker	10233 +#define jpeg_write_marker chromium_jpeg_write_marker

302 +#define jpeg_write_m_header chromium_jpeg_write_m_header	10234 +#define jpeg_write_m_header chromium_jpeg_write_m_header

303 +#define jpeg_write_m_byte chromium_jpeg_write_m_byte	10235 +#define jpeg_write_m_byte chromium_jpeg_write_m_byte

304 +#define jpeg_write_tables chromium_jpeg_write_tables	10236 +#define jpeg_write_tables chromium_jpeg_write_tables

305 +#define jpeg_read_header chromium_jpeg_read_header	10237 +#define jpeg_read_header chromium_jpeg_read_header

306 +#define jpeg_start_decompress chromium_jpeg_start_decompress	10238 +#define jpeg_start_decompress chromium_jpeg_start_decompress

307 +#define jpeg_read_scanlines chromium_jpeg_read_scanlines	10239 +#define jpeg_read_scanlines chromium_jpeg_read_scanlines

	10240 +#define jpeg_skip_scanlines chromium_jpeg_skip_scanlines

308 +#define jpeg_finish_decompress chromium_jpeg_finish_decompress	10241 +#define jpeg_finish_decompress chromium_jpeg_finish_decompress

309 +#define jpeg_read_raw_data chromium_jpeg_read_raw_data	10242 +#define jpeg_read_raw_data chromium_jpeg_read_raw_data

310 +#define jpeg_has_multiple_scans chromium_jpeg_has_multiple_scans	10243 +#define jpeg_has_multiple_scans chromium_jpeg_has_multiple_scans

311 +#define jpeg_start_output chromium_jpeg_start_output	10244 +#define jpeg_start_output chromium_jpeg_start_output

312 +#define jpeg_finish_output chromium_jpeg_finish_output	10245 +#define jpeg_finish_output chromium_jpeg_finish_output

313 +#define jpeg_input_complete chromium_jpeg_input_complete	10246 +#define jpeg_input_complete chromium_jpeg_input_complete

314 +#define jpeg_new_colormap chromium_jpeg_new_colormap	10247 +#define jpeg_new_colormap chromium_jpeg_new_colormap

315 +#define jpeg_consume_input chromium_jpeg_consume_input	10248 +#define jpeg_consume_input chromium_jpeg_consume_input

316 +#define jpeg_calc_output_dimensions chromium_jpeg_calc_output_dimensions	10249 +#define jpeg_calc_output_dimensions chromium_jpeg_calc_output_dimensions

317 +#define jpeg_save_markers chromium_jpeg_save_markers	10250 +#define jpeg_save_markers chromium_jpeg_save_markers

318 +#define jpeg_set_marker_processor chromium_jpeg_set_marker_processor	10251 +#define jpeg_set_marker_processor chromium_jpeg_set_marker_processor

319 +#define jpeg_read_coefficients chromium_jpeg_read_coefficients	10252 +#define jpeg_read_coefficients chromium_jpeg_read_coefficients

320 +#define jpeg_write_coefficients chromium_jpeg_write_coefficients	10253 +#define jpeg_write_coefficients chromium_jpeg_write_coefficients

321 +#define jpeg_copy_critical_parameters chromium_jpeg_copy_critical_parameters	10254 +#define jpeg_copy_critical_parameters chromium_jpeg_copy_critical_parameters

322 +#define jpeg_abort_compress chromium_jpeg_abort_compress	10255 +#define jpeg_abort_compress chromium_jpeg_abort_compress

323 +#define jpeg_abort_decompress chromium_jpeg_abort_decompress	10256 +#define jpeg_abort_decompress chromium_jpeg_abort_decompress

324 +#define jpeg_abort chromium_jpeg_abort	10257 +#define jpeg_abort chromium_jpeg_abort

325 +#define jpeg_destroy chromium_jpeg_destroy	10258 +#define jpeg_destroy chromium_jpeg_destroy

326 +#define jpeg_resync_to_restart chromium_jpeg_resync_to_restart	10259 +#define jpeg_resync_to_restart chromium_jpeg_resync_to_restart

327 +#define jpeg_get_small chromium_jpeg_get_small	10260 +#define jpeg_get_small chromium_jpeg_get_small

328 +#define jpeg_free_small chromium_jpeg_free_small	10261 +#define jpeg_free_small chromium_jpeg_free_small

329 +#define jpeg_get_large chromium_jpeg_get_large	10262 +#define jpeg_get_large chromium_jpeg_get_large

330 +#define jpeg_free_large chromium_jpeg_free_large	10263 +#define jpeg_free_large chromium_jpeg_free_large

331 +#define jpeg_mem_available chromium_jpeg_mem_available	10264 +#define jpeg_mem_available chromium_jpeg_mem_available

332 +#define jpeg_open_backing_store chromium_jpeg_open_backing_store	10265 +#define jpeg_open_backing_store chromium_jpeg_open_backing_store

333 +#define jpeg_mem_init chromium_jpeg_mem_init	10266 +#define jpeg_mem_init chromium_jpeg_mem_init

334 +#define jpeg_mem_term chromium_jpeg_mem_term	10267 +#define jpeg_mem_term chromium_jpeg_mem_term

335 +	10268 +

336 +#endif // THIRD_PARTY_LIBJPEG_TURBO_JPEGLIBMANGLER_H_	10269 +#endif // THIRD_PARTY_LIBJPEG_TURBO_JPEGLIBMANGLER_H_

337 Index: simd/jcgrass2-64.asm	10270 Index: jpegut.c

338 ===================================================================	10271 ===================================================================

339 --- simd/jcgrass2-64.asm (revision 829)	10272 --- jpegut.c (revision 829)

340 +++ simd/jcgrass2-64.asm (working copy)	10273 +++ jpegut.c (working copy)

341 @@ -30,7 +30,7 @@	10274 @@ -19,11 +19,14 @@

342 SECTION SEG_CONST	10275 #include "./rrtimer.h"

343	10276 #include "./turbojpeg.h"

344 alignz 16	10277

345 - global EXTN(jconst_rgb_gray_convert_sse2)	10278 -#define _catch(f) {if((f)==-1) {printf("TJPEG: %s\n", tjGetErrorStr()); goto finally;}}

346 + global EXTN(jconst_rgb_gray_convert_sse2) PRIVATE	10279 +#define _catch(f) {if((f)==-1) {printf("TJPEG: %s\n", tjGetErrorStr()); bailout();}}

347	10280

348 EXTN(jconst_rgb_gray_convert_sse2):	10281 const char *_subnamel[NUMSUBOPT]={"4:4:4", "4:2:2", "4:2:0", "GRAY"};

349	10282 const char *_subnames[NUMSUBOPT]={"444", "422", "420", "GRAY"};

350 Index: simd/jiss2fst.asm	10283

351 ===================================================================	10284 +int exitstatus=0;

352 --- simd/jiss2fst.asm (revision 829)	10285 +#define bailout() {exitstatus=-1; goto finally;}

353 +++ simd/jiss2fst.asm (working copy)	10286 +

354 @@ -59,7 +59,7 @@	10287 int pixels[9][3]=

355 %define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS)	10288 {

356	10289 {0, 255, 0},

357 alignz 16	10290 @@ -70,7 +73,7 @@

358 - global EXTN(jconst_idct_ifast_sse2)	10291 }

359 + global EXTN(jconst_idct_ifast_sse2) PRIVATE	10292 }

360	10293

361 EXTN(jconst_idct_ifast_sse2):	10294 -int dumpbuf(unsigned char *buf, int w, int h, int ps, int flags)

362	10295 +void dumpbuf(unsigned char *buf, int w, int h, int ps, int flags)

363 @@ -92,7 +92,7 @@	10296 {

364 %define WK_NUM 2	10297 int roffset=(flags&TJ_BGR)?2:0, goffset=1, boffset=(flags&TJ_BGR)?0:2, i,

	10298 j;

	10299 @@ -177,12 +180,12 @@

	10300 if((outfile=fopen(filename, "wb"))==NULL)

	10301 {

	10302 printf("ERROR: Could not open %s for writing.\n", filename);

	10303 - goto finally;

	10304 + bailout();

	10305 }

	10306 if(fwrite(jpegbuf, jpgbufsize, 1, outfile)!=1)

	10307 {

	10308 printf("ERROR: Could not write to %s.\n", filename);

	10309 - goto finally;

	10310 + bailout();

	10311 }

	10312

	10313 finally:

	10314 @@ -210,7 +213,7 @@

	10315

	10316 if((bmpbuf=(unsigned char *)malloc(w*h*ps+1))==NULL)

	10317 {

	10318 - printf("ERROR: Could not allocate buffer\n"); goto finally;

	10319 + printf("ERROR: Could not allocate buffer\n"); bailout();

	10320 }

	10321 initbuf(bmpbuf, w, h, ps, flags);

	10322 memset(jpegbuf, 0, TJBUFSIZE(w, h));

	10323 @@ -249,12 +252,12 @@

	10324 _catch(tjDecompressHeader(hnd, jpegbuf, jpegsize, &_w, &_h));

	10325 if(_w!=w || _h!=h)

	10326 {

	10327 - printf("Incorrect JPEG header\n"); goto finally;

	10328 + printf("Incorrect JPEG header\n"); bailout();

	10329 }

	10330

	10331 if((bmpbuf=(unsigned char *)malloc(w*h*ps+1))==NULL)

	10332 {

	10333 - printf("ERROR: Could not allocate buffer\n"); goto finally;

	10334 + printf("ERROR: Could not allocate buffer\n"); bailout();

	10335 }

	10336 memset(bmpbuf, 0, w*ps*h);

	10337

	10338 @@ -278,13 +281,13 @@

	10339

	10340 if((jpegbuf=(unsigned char *)malloc(TJBUFSIZE(w, h))) == NULL)

	10341 {

	10342 - puts("ERROR: Could not allocate buffer."); goto finally;

	10343 + puts("ERROR: Could not allocate buffer."); bailout();

	10344 }

	10345

	10346 if((hnd=tjInitCompress())==NULL)

	10347 - {printf("Error in tjInitCompress():\n%s\n", tjGetErrorStr()); goto finally;}

	10348 + {printf("Error in tjInitCompress():\n%s\n", tjGetErrorStr()); bailout();}

	10349 if((dhnd=tjInitDecompress())==NULL)

	10350 - {printf("Error in tjInitDecompress():\n%s\n", tjGetErrorStr()); goto finally;}

	10351 + {printf("Error in tjInitDecompress():\n%s\n", tjGetErrorStr()); bailout();}

	10352

	10353 gentestjpeg(hnd, jpegbuf, &size, w, h, ps, basefilename, subsamp, 100, 0);

	10354 gentestbmp(dhnd, jpegbuf, size, w, h, ps, basefilename, subsamp, 100, 0);

	10355 @@ -327,7 +330,7 @@

	10356 int i, j, i2; unsigned char *bmpbuf=NULL, *jpgbuf=NULL;

	10357 tjhandle hnd=NULL; unsigned long size;

	10358 if((hnd=tjInitCompress())==NULL)

	10359 - {printf("Error in tjInitCompress():\n%s\n", tjGetErrorStr()); goto finally;}

	10360 + {printf("Error in tjInitCompress():\n%s\n", tjGetErrorStr()); bailout();}

	10361 printf("Buffer size regression test\n");

	10362 for(j=1; j<48; j++)

	10363 {

	10364 @@ -337,7 +340,7 @@

	10365 if((bmpbuf=(unsigned char *)malloc(i*j*4))==NULL

	10366 || (jpgbuf=(unsigned char *)malloc(TJBUFSIZE(i, j)))==NULL)

	10367 {

	10368 - printf("Memory allocation failure\n"); goto finally;

	10369 + printf("Memory allocation failure\n"); bailout();

	10370 }

	10371 memset(bmpbuf, 0, i*j*4);

	10372 for(i2=0; i2<i*j; i2++)

	10373 @@ -353,7 +356,7 @@

	10374 if((bmpbuf=(unsigned char *)malloc(j*i*4))==NULL

	10375 || (jpgbuf=(unsigned char *)malloc(TJBUFSIZE(j, i)))==NULL)

	10376 {

	10377 - printf("Memory allocation failure\n"); goto finally;

	10378 + printf("Memory allocation failure\n"); bailout();

	10379 }

	10380 for(i2=0; i2<j*i*4; i2++)

	10381 {

	10382 @@ -380,5 +383,5 @@

	10383 dotest(35, 41, 4, TJ_GRAYSCALE, "test");

	10384 dotest1();

	10385

	10386 - return 0;

	10387 + return exitstatus;

	10388 }
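The jpegut.c hunks replace each bare goto finally with a bailout() macro that first sets exitstatus. This lets main() return a nonzero status when any test step fails, instead of always returning 0.

The following is a minimal, self-contained sketch of that idiom, not code from jpegut.c. The helper run_step() and its simulated failure are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Same shape as the patch: record the failure, then jump to shared cleanup. */
static int exitstatus = 0;
#define bailout() { exitstatus = -1; goto finally; }

/* Hypothetical test step: allocates a buffer and can simulate a failure. */
static void run_step(size_t size, int simulate_failure)
{
  unsigned char *buf = NULL;

  assert(size > 0);
  if ((buf = (unsigned char *)malloc(size)) == NULL) {
    printf("ERROR: Could not allocate buffer\n");  bailout();
  }
  if (simulate_failure) {
    printf("ERROR: simulated failure\n");  bailout();
  }

  finally:
  free(buf);  /* cleanup runs on both the success and the error path */
}
```

A test driver would call its steps and end with return exitstatus;, as the patched main() does. A failure partway through then still runs each step's cleanup and is reported to the shell.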

	10389 Index: jpgtest.cxx

	10390 ===================================================================

	10391 --- jpgtest.cxx (revision 829)

	10392 +++ jpgtest.cxx (working copy)

	10393 @@ -322,22 +322,22 @@

	10394 if(!stricmp(argv[i], "-tile")) dotile=1;

	10395 if(!stricmp(argv[i], "-forcesse3"))

	10396 {

	10397 - printf("Using SSE3 code in Intel compressor\n");

	10398 + printf("Using SSE3 code\n");

	10399 forcesse3=1;

	10400 }

	10401 if(!stricmp(argv[i], "-forcesse2"))

	10402 {

	10403 - printf("Using SSE2 code in Intel compressor\n");

	10404 + printf("Using SSE2 code\n");

	10405 forcesse2=1;

	10406 }

	10407 if(!stricmp(argv[i], "-forcesse"))

	10408 {

	10409 - printf("Using SSE code in Intel compressor\n");

	10410 + printf("Using SSE code\n");

	10411 forcesse=1;

	10412 }

	10413 if(!stricmp(argv[i], "-forcemmx"))

	10414 {

	10415 - printf("Using MMX code in Intel compressor\n");

	10416 + printf("Using MMX code\n");

	10417 forcemmx=1;

	10418 }

	10419 if(!stricmp(argv[i], "-fastupsample"))

	10420 Index: jquant1.c

	10421 ===================================================================

	10422 --- jquant1.c (revision 829)

	10423 +++ jquant1.c (working copy)

	10424 @@ -1,9 +1,10 @@

	10425 /*

	10426 * jquant1.c

	10427 *

	10428 + * This file was part of the Independent JPEG Group's software:

	10429 * Copyright (C) 1991-1996, Thomas G. Lane.

	10430 + * libjpeg-turbo Modifications:

	10431 * Copyright (C) 2009, D. R. Commander

	10432 - * This file is part of the Independent JPEG Group's software.

	10433 * For conditions of distribution and use, see the accompanying README file.

	10434 *

	10435 * This file contains 1-pass color quantization (color mapping) routines.

	10436 Index: jquant2.c

	10437 ===================================================================

	10438 --- jquant2.c (revision 829)

	10439 +++ jquant2.c (working copy)

	10440 @@ -1,9 +1,10 @@

	10441 /*

	10442 * jquant2.c

	10443 *

	10444 + * This file was part of the Independent JPEG Group's software:

	10445 * Copyright (C) 1991-1996, Thomas G. Lane.

	10446 + * libjpeg-turbo Modifications:

	10447 * Copyright (C) 2009, D. R. Commander.

	10448 - * This file is part of the Independent JPEG Group's software.

	10449 * For conditions of distribution and use, see the accompanying README file.

	10450 *

	10451 * This file contains 2-pass color quantization (color mapping) routines.

	10452 Index: jsimd.h

	10453 ===================================================================

	10454 --- jsimd.h (revision 829)

	10455 +++ jsimd.h (working copy)

	10456 @@ -2,9 +2,11 @@

	10457 * jsimd.h

	10458 *

	10459 * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB

	10460 + * Copyright 2011 D. R. Commander

	10461 *

	10462 * Based on the x86 SIMD extension for IJG JPEG library,

	10463 * Copyright (C) 1999-2006, MIYASAKA Masaru.

	10464 + * For conditions of distribution and use, see copyright notice in jsimdext.inc

	10465 *

	10466 */

	10467

	10468 @@ -12,8 +14,10 @@

	10469

	10470 #ifdef NEED_SHORT_EXTERNAL_NAMES

	10471 #define jsimd_can_rgb_ycc jSCanRgbYcc

	10472 +#define jsimd_can_rgb_gray jSCanRgbGry

	10473 #define jsimd_can_ycc_rgb jSCanYccRgb

	10474 #define jsimd_rgb_ycc_convert jSRgbYccConv

	10475 +#define jsimd_rgb_gray_convert jSRgbGryConv

	10476 #define jsimd_ycc_rgb_convert jSYccRgbConv

	10477 #define jsimd_can_h2v2_downsample jSCanH2V2Down

	10478 #define jsimd_can_h2v1_downsample jSCanH2V1Down

	10479 @@ -34,6 +38,7 @@

	10480 #endif /* NEED_SHORT_EXTERNAL_NAMES */

	10481

	10482 EXTERN(int) jsimd_can_rgb_ycc JPP((void));

	10483 +EXTERN(int) jsimd_can_rgb_gray JPP((void));

	10484 EXTERN(int) jsimd_can_ycc_rgb JPP((void));

	10485

	10486 EXTERN(void) jsimd_rgb_ycc_convert

	10487 @@ -40,6 +45,10 @@

	10488 JPP((j_compress_ptr cinfo,

	10489 JSAMPARRAY input_buf, JSAMPIMAGE output_buf,

	10490 JDIMENSION output_row, int num_rows));

	10491 +EXTERN(void) jsimd_rgb_gray_convert

	10492 + JPP((j_compress_ptr cinfo,

	10493 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,

	10494 + JDIMENSION output_row, int num_rows));

	10495 EXTERN(void) jsimd_ycc_rgb_convert

	10496 JPP((j_decompress_ptr cinfo,

	10497 JSAMPIMAGE input_buf, JDIMENSION input_row,

	10498 Index: jsimd_none.c

	10499 ===================================================================

	10500 --- jsimd_none.c (revision 829)

	10501 +++ jsimd_none.c (working copy)

	10502 @@ -2,10 +2,11 @@

	10503 * jsimd_none.c

	10504 *

	10505 * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB

	10506 - * Copyright 2009 D. R. Commander

	10507 + * Copyright 2009-2011 D. R. Commander

	10508 *

	10509 * Based on the x86 SIMD extension for IJG JPEG library,

	10510 * Copyright (C) 1999-2006, MIYASAKA Masaru.

	10511 + * For conditions of distribution and use, see copyright notice in jsimdext.inc

	10512 *

	10513 * This file contains stubs for when there is no SIMD support available.

	10514 */

	10515 @@ -24,6 +25,12 @@

	10516 }

	10517

	10518 GLOBAL(int)

	10519 +jsimd_can_rgb_gray (void)

	10520 +{

	10521 + return 0;

	10522 +}

	10523 +

	10524 +GLOBAL(int)

	10525 jsimd_can_ycc_rgb (void)

	10526 {

	10527 return 0;

	10528 @@ -37,6 +44,13 @@

	10529 }

	10530

	10531 GLOBAL(void)

	10532 +jsimd_rgb_gray_convert (j_compress_ptr cinfo,

	10533 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,

	10534 + JDIMENSION output_row, int num_rows)

	10535 +{

	10536 +}

	10537 +

	10538 +GLOBAL(void)

	10539 jsimd_ycc_rgb_convert (j_decompress_ptr cinfo,

	10540 JSAMPIMAGE input_buf, JDIMENSION input_row,

	10541 JSAMPARRAY output_buf, int num_rows)

	10542 Index: jsimddct.h

	10543 ===================================================================

	10544 --- jsimddct.h (revision 829)

	10545 +++ jsimddct.h (working copy)

	10546 @@ -5,6 +5,7 @@

	10547 *

	10548 * Based on the x86 SIMD extension for IJG JPEG library,

	10549 * Copyright (C) 1999-2006, MIYASAKA Masaru.

	10550 + * For conditions of distribution and use, see copyright notice in jsimdext.inc

	10551 *

	10552 */

	10553

	10554 Index: jversion.h

	10555 ===================================================================

	10556 --- jversion.h (revision 829)

	10557 +++ jversion.h (working copy)

	10558 @@ -1,8 +1,10 @@

	10559 /*

	10560 * jversion.h

	10561 *

	10562 - * Copyright (C) 1991-1998, Thomas G. Lane.

	10563 - * This file is part of the Independent JPEG Group's software.

	10564 + * This file was part of the Independent JPEG Group's software:

	10565 + * Copyright (C) 1991-2012, Thomas G. Lane, Guido Vollbeding.

	10566 + * Modifications:

	10567 + * Copyright (C) 2010, 2012-2014, D. R. Commander.

	10568 * For conditions of distribution and use, see the accompanying README file.

	10569 *

	10570 * This file contains software version identification.

	10571 @@ -9,6 +11,22 @@

	10572 */

	10573

	10574

	10575 +#if JPEG_LIB_VERSION >= 80

	10576 +

	10577 +#define JVERSION "8d 15-Jan-2012"

	10578 +

	10579 +#elif JPEG_LIB_VERSION >= 70

	10580 +

	10581 +#define JVERSION "7 27-Jun-2009"

	10582 +

	10583 +#else

	10584 +

	10585 #define JVERSION "6b 27-Mar-1998"

	10586

	10587 -#define JCOPYRIGHT "Copyright (C) 1998, Thomas G. Lane"

	10588 +#endif

	10589 +

	10590 +#define JCOPYRIGHT "Copyright (C) 1991-2012 Thomas G. Lane, Guido Vollbeding\n" \

	10591 + "Copyright (C) 1999-2006 MIYASAKA Masaru\n" \

	10592 + "Copyright (C) 2009 Pierre Ossman for Cendio AB\n" \

	10593 + "Copyright (C) 2009-2014 D. R. Commander\n" \

	10594 + "Copyright (C) 2009-2011 Nokia Corporation and/or its subsidiary(-ies)"

	10595 Index: rdbmp.c

	10596 ===================================================================

	10597 --- rdbmp.c (revision 829)

	10598 +++ rdbmp.c (working copy)

	10599 @@ -1,8 +1,11 @@

	10600 /*

	10601 * rdbmp.c

	10602 *

	10603 + * This file was part of the Independent JPEG Group's software:

	10604 * Copyright (C) 1994-1996, Thomas G. Lane.

	10605 - * This file is part of the Independent JPEG Group's software.

	10606 + * Modified 2009-2010 by Guido Vollbeding.

	10607 + * libjpeg-turbo Modifications:

	10608 + * Modified 2011 by Siarhei Siamashka.

	10609 * For conditions of distribution and use, see the accompanying README file.

	10610 *

	10611 * This file contains routines to read input images in Microsoft "BMP"

	10612 @@ -177,10 +180,41 @@

	10613 }

	10614

	10615

	10616 +METHODDEF(JDIMENSION)

	10617 +get_32bit_row (j_compress_ptr cinfo, cjpeg_source_ptr sinfo)

	10618 +/* This version is for reading 32-bit pixels */

	10619 +{

	10620 + bmp_source_ptr source = (bmp_source_ptr) sinfo;

	10621 + JSAMPARRAY image_ptr;

	10622 + register JSAMPROW inptr, outptr;

	10623 + register JDIMENSION col;

	10624 +

	10625 + /* Fetch next row from virtual array */

	10626 + source->source_row--;

	10627 + image_ptr = (*cinfo->mem->access_virt_sarray)

	10628 + ((j_common_ptr) cinfo, source->whole_image,

	10629 + source->source_row, (JDIMENSION) 1, FALSE);

	10630 + /* Transfer data. Note source values are in BGR order

	10631 + * (even though Microsoft's own documents say the opposite).

	10632 + */

	10633 + inptr = image_ptr[0];

	10634 + outptr = source->pub.buffer[0];

	10635 + for (col = cinfo->image_width; col > 0; col--) {

	10636 + outptr[2] = *inptr++; /* can omit GETJSAMPLE() safely */

	10637 + outptr[1] = *inptr++;

	10638 + outptr[0] = *inptr++;

	10639 + inptr++; /* skip the 4th byte (Alpha channel) */

	10640 + outptr += 3;

	10641 + }

	10642 +

	10643 + return 1;

	10644 +}

	10645 +

	10646 +

	10647 /*

	10648 * This method loads the image into whole_image during the first call on

	10649 * get_pixel_rows. The get_pixel_rows pointer is then adjusted to call

	10650 - * get_8bit_row or get_24bit_row on subsequent calls.

	10651 + * get_8bit_row, get_24bit_row, or get_32bit_row on subsequent calls.

	10652 */

	10653

	10654 METHODDEF(JDIMENSION)

	10655 @@ -188,10 +222,9 @@

	10656 {

	10657 bmp_source_ptr source = (bmp_source_ptr) sinfo;

	10658 register FILE *infile = source->pub.input_file;

	10659 - register int c;

	10660 register JSAMPROW out_ptr;

	10661 JSAMPARRAY image_ptr;

	10662 - JDIMENSION row, col;

	10663 + JDIMENSION row;

	10664 cd_progress_ptr progress = (cd_progress_ptr) cinfo->progress;

	10665

	10666 /* Read the data into a virtual array in input-file row order. */

	10667 @@ -205,11 +238,11 @@

	10668 ((j_common_ptr) cinfo, source->whole_image,

	10669 row, (JDIMENSION) 1, TRUE);

	10670 out_ptr = image_ptr[0];

	10671 - for (col = source->row_width; col > 0; col--) {

	10672 - /* inline copy of read_byte() for speed */

	10673 - if ((c = getc(infile)) == EOF)

	10674 - ERREXIT(cinfo, JERR_INPUT_EOF);

	10675 - *out_ptr++ = (JSAMPLE) c;

	10676 + if (fread(out_ptr, 1, source->row_width, infile) != source->row_width) {

	10677 + if (feof(infile))

	10678 + ERREXIT(cinfo, JERR_INPUT_EOF);

	10679 + else

	10680 + ERREXIT(cinfo, JERR_FILE_READ);

	10681 }

	10682 }

	10683 if (progress != NULL)

	10684 @@ -223,6 +256,9 @@

	10685 case 24:

	10686 source->pub.get_pixel_rows = get_24bit_row;

	10687 break;

	10688 + case 32:

	10689 + source->pub.get_pixel_rows = get_32bit_row;

	10690 + break;

	10691 default:

	10692 ERREXIT(cinfo, JERR_BMP_BADDEPTH);

	10693 }

	10694 @@ -251,8 +287,8 @@

	10695 (((INT32) UCH(array[offset+3])) << 24))

	10696 INT32 bfOffBits;

	10697 INT32 headerSize;

	10698 - INT32 biWidth = 0; /* initialize to avoid compiler warning */

	10699 - INT32 biHeight = 0;

	10700 + INT32 biWidth;

	10701 + INT32 biHeight;

	10702 unsigned int biPlanes;

	10703 INT32 biCompression;

	10704 INT32 biXPelsPerMeter,biYPelsPerMeter;

	10705 @@ -300,8 +336,6 @@

	10706 ERREXIT(cinfo, JERR_BMP_BADDEPTH);

	10707 break;

	10708 }

	10709 - if (biPlanes != 1)

	10710 - ERREXIT(cinfo, JERR_BMP_BADPLANES);

	10711 break;

	10712 case 40:

	10713 case 64:

	10714 @@ -325,12 +359,13 @@

	10715 case 24: /* RGB image */

	10716 TRACEMS2(cinfo, 1, JTRC_BMP, (int) biWidth, (int) biHeight);

	10717 break;

	10718 + case 32: /* RGB image + Alpha channel */

	10719 + TRACEMS2(cinfo, 1, JTRC_BMP, (int) biWidth, (int) biHeight);

	10720 + break;

	10721 default:

	10722 ERREXIT(cinfo, JERR_BMP_BADDEPTH);

	10723 break;

	10724 }

	10725 - if (biPlanes != 1)

	10726 - ERREXIT(cinfo, JERR_BMP_BADPLANES);

	10727 if (biCompression != 0)

	10728 ERREXIT(cinfo, JERR_BMP_COMPRESSED);

	10729

	10730 @@ -343,9 +378,14 @@

	10731 break;

	10732 default:

	10733 ERREXIT(cinfo, JERR_BMP_BADHEADER);

	10734 - break;

	10735 + return;

	10736 }

	10737

	10738 + if (biWidth <= 0 || biHeight <= 0)

	10739 + ERREXIT(cinfo, JERR_BMP_EMPTY);

	10740 + if (biPlanes != 1)

	10741 + ERREXIT(cinfo, JERR_BMP_BADPLANES);

	10742 +

	10743 /* Compute distance to bitmap data --- will adjust for colormap below */

	10744 bPad = bfOffBits - (headerSize + 14);

	10745

	10746 @@ -375,6 +415,8 @@

	10747 /* Compute row width in file, including padding to 4-byte boundary */

	10748 if (source->bits_per_pixel == 24)

	10749 row_width = (JDIMENSION) (biWidth * 3);

	10750 + else if (source->bits_per_pixel == 32)

	10751 + row_width = (JDIMENSION) (biWidth * 4);

	10752 else

	10753 row_width = (JDIMENSION) biWidth;

	10754 while ((row_width & 3) != 0) row_width++;
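The new get_32bit_row() reads each 4-byte BMP pixel as B, G, R followed by an ignored fourth (alpha) byte, and writes packed RGB. The header code likewise sizes 32-bit rows at four bytes per pixel before padding them to a 4-byte boundary.

The sketch below restates both steps without libjpeg's virtual-array plumbing. The helper names bgra_row_to_rgb() and bmp_row_stride() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Converts one row of 32-bit BMP pixels (file order B, G, R, A) to packed
 * RGB, dropping the fourth byte, in the same way as get_32bit_row(). */
static void bgra_row_to_rgb(const unsigned char *in, unsigned char *out,
                            size_t width)
{
  for (size_t col = 0; col < width; col++) {
    out[0] = in[2];  /* red   */
    out[1] = in[1];  /* green */
    out[2] = in[0];  /* blue  */
    in += 4;         /* skip B, G, R and the unused alpha byte */
    out += 3;
  }
}

/* Bytes per file row: width times bytes per pixel, rounded up to a multiple
 * of 4 (equivalent to the while ((row_width & 3) != 0) loop). */
static size_t bmp_row_stride(size_t width, int bits_per_pixel)
{
  size_t row_width;

  assert(bits_per_pixel == 8 || bits_per_pixel == 24 || bits_per_pixel == 32);
  row_width = width * (size_t)(bits_per_pixel / 8);
  return (row_width + 3) & ~(size_t)3;
}
```

For example, a 5-pixel-wide row occupies 16 bytes at 24 bits per pixel (15 rounded up) but exactly 20 bytes at 32 bits per pixel. 32-bit rows never need padding.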

	10755 Index: rdppm.c

	10756 ===================================================================

	10757 --- rdppm.c (revision 829)

	10758 +++ rdppm.c (working copy)

	10759 @@ -2,6 +2,7 @@

	10760 * rdppm.c

	10761 *

	10762 * Copyright (C) 1991-1997, Thomas G. Lane.

	10763 + * Modified 2009 by Bill Allombert, Guido Vollbeding.

	10764 * This file is part of the Independent JPEG Group's software.

	10765 * For conditions of distribution and use, see the accompanying README file.

	10766 *

	10767 @@ -250,8 +251,8 @@

	10768 bufferptr = source->iobuffer;

	10769 for (col = cinfo->image_width; col > 0; col--) {

	10770 register int temp;

	10771 - temp = UCH(*bufferptr++);

	10772 - temp |= UCH(*bufferptr++) << 8;

	10773 + temp = UCH(*bufferptr++) << 8;

	10774 + temp |= UCH(*bufferptr++);

	10775 *ptr++ = rescale[temp];

	10776 }

	10777 return 1;

	10778 @@ -274,14 +275,14 @@

	10779 bufferptr = source->iobuffer;

	10780 for (col = cinfo->image_width; col > 0; col--) {

	10781 register int temp;

	10782 - temp = UCH(*bufferptr++);

	10783 - temp |= UCH(*bufferptr++) << 8;

	10784 + temp = UCH(*bufferptr++) << 8;

	10785 + temp |= UCH(*bufferptr++);

	10786 *ptr++ = rescale[temp];

	10787 - temp = UCH(*bufferptr++);

	10788 - temp |= UCH(*bufferptr++) << 8;

	10789 + temp = UCH(*bufferptr++) << 8;

	10790 + temp |= UCH(*bufferptr++);

	10791 *ptr++ = rescale[temp];

	10792 - temp = UCH(*bufferptr++);

	10793 - temp |= UCH(*bufferptr++) << 8;

	10794 + temp = UCH(*bufferptr++) << 8;

	10795 + temp |= UCH(*bufferptr++);

	10796 *ptr++ = rescale[temp];

	10797 }

	10798 return 1;
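The rdppm.c hunks swap the byte order used for 16-bit samples. When maxval exceeds 255, binary PGM/PPM stores each sample in two bytes, most significant byte first, so the first byte must supply the high 8 bits.

The following is a standalone sketch of reading and rescaling such a row. The rounding formula (val * 255 + maxval / 2) / maxval is an assumption modeled on how rdppm.c builds its rescale table. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Netpbm 16-bit samples are big-endian: high byte first. */
static unsigned int read_be16(const unsigned char *p)
{
  return ((unsigned int)p[0] << 8) | p[1];
}

/* Maps a sample in [0, maxval] to [0, 255] with rounding (assumed formula). */
static unsigned char rescale_sample(unsigned int val, unsigned int maxval)
{
  assert(maxval > 0 && val <= maxval);
  return (unsigned char)((val * 255u + maxval / 2) / maxval);
}

/* Decodes one row of 16-bit grayscale samples into 8-bit output. */
static void decode_gray16_row(const unsigned char *buf, unsigned char *out,
                              size_t width, unsigned int maxval)
{
  for (size_t col = 0; col < width; col++, buf += 2)
    *out++ = rescale_sample(read_be16(buf), maxval);
}
```

For the bytes 0x01 0x02, the corrected order yields 258 (0x0102), whereas the pre-patch order would have read 513 (0x0201).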

	10799 Index: rdswitch.c

	10800 ===================================================================

	10801 --- rdswitch.c (revision 829)

	10802 +++ rdswitch.c (working copy)

	10803 @@ -1,8 +1,10 @@

	10804 /*

	10805 * rdswitch.c

	10806 *

	10807 + * This file was part of the Independent JPEG Group's software:

	10808 * Copyright (C) 1991-1996, Thomas G. Lane.

	10809 - * This file is part of the Independent JPEG Group's software.

	10810 + * libjpeg-turbo Modifications:

	10811 + * Copyright (C) 2010, D. R. Commander.

	10812 * For conditions of distribution and use, see the accompanying README file.

	10813 *

	10814 * This file contains routines to process some of cjpeg's more complicated

	10815 @@ -9,6 +11,7 @@

	10816 * command-line switches. Switches processed here are:

	10817 * -qtables file Read quantization tables from text file

	10818 * -scans file Read scan script from text file

	10819 + * -quality N[,N,...] Set quality ratings

	10820 * -qslots N[,N,...] Set component quantization table selectors

	10821 * -sample HxV[,HxV,...] Set component sampling factors

	10822 */

	10823 @@ -69,9 +72,12 @@

	10824 }

	10825

	10826

	10827 +#if JPEG_LIB_VERSION < 70

	10828 +static int q_scale_factor[NUM_QUANT_TBLS] = {100, 100, 100, 100};

	10829 +#endif

	10830 +

	10831 GLOBAL(boolean)

	10832 -read_quant_tables (j_compress_ptr cinfo, char * filename,

	10833 - int scale_factor, boolean force_baseline)

	10834 +read_quant_tables (j_compress_ptr cinfo, char * filename, boolean force_baseline)

	10835 /* Read a set of quantization tables from the specified file.

	10836 * The file is plain ASCII text: decimal numbers with whitespace between.

	10837 * Comments preceded by '#' may be included in the file.

	10838 @@ -108,7 +114,13 @@

	10839 }

	10840 table[i] = (unsigned int) val;

	10841 }

	10842 - jpeg_add_quant_table(cinfo, tblno, table, scale_factor, force_baseline);

	10843 +#if JPEG_LIB_VERSION >= 70

	10844 + jpeg_add_quant_table(cinfo, tblno, table, cinfo->q_scale_factor[tblno],

	10845 + force_baseline);

	10846 +#else

	10847 + jpeg_add_quant_table(cinfo, tblno, table, q_scale_factor[tblno],

	10848 + force_baseline);

	10849 +#endif

	10850 tblno++;

	10851 }

	10852

	10853 @@ -262,7 +274,85 @@

	10854 #endif /* C_MULTISCAN_FILES_SUPPORTED */

	10855

	10856

	10857 +#if JPEG_LIB_VERSION < 70

	10858 +/* These are the sample quantization tables given in JPEG spec section K.1.

	10859 + * The spec says that the values given produce "good" quality, and

	10860 + * when divided by 2, "very good" quality.

	10861 + */

	10862 +static const unsigned int std_luminance_quant_tbl[DCTSIZE2] = {

	10863 + 16, 11, 10, 16, 24, 40, 51, 61,

	10864 + 12, 12, 14, 19, 26, 58, 60, 55,

	10865 + 14, 13, 16, 24, 40, 57, 69, 56,

	10866 + 14, 17, 22, 29, 51, 87, 80, 62,

	10867 + 18, 22, 37, 56, 68, 109, 103, 77,

	10868 + 24, 35, 55, 64, 81, 104, 113, 92,

	10869 + 49, 64, 78, 87, 103, 121, 120, 101,

	10870 + 72, 92, 95, 98, 112, 100, 103, 99

	10871 +};

	10872 +static const unsigned int std_chrominance_quant_tbl[DCTSIZE2] = {

	10873 + 17, 18, 24, 47, 99, 99, 99, 99,

	10874 + 18, 21, 26, 66, 99, 99, 99, 99,

	10875 + 24, 26, 56, 99, 99, 99, 99, 99,

	10876 + 47, 66, 99, 99, 99, 99, 99, 99,

	10877 + 99, 99, 99, 99, 99, 99, 99, 99,

	10878 + 99, 99, 99, 99, 99, 99, 99, 99,

	10879 + 99, 99, 99, 99, 99, 99, 99, 99,

	10880 + 99, 99, 99, 99, 99, 99, 99, 99

	10881 +};

	10882 +

	10883 +

	10884 +LOCAL(void)

	10885 +jpeg_default_qtables (j_compress_ptr cinfo, boolean force_baseline)

	10886 +{

	10887 + jpeg_add_quant_table(cinfo, 0, std_luminance_quant_tbl,

	10888 + q_scale_factor[0], force_baseline);

	10889 + jpeg_add_quant_table(cinfo, 1, std_chrominance_quant_tbl,

	10890 + q_scale_factor[1], force_baseline);

	10891 +}

	10892 +#endif

	10893 +

	10894 +

	10895 GLOBAL(boolean)

	10896 +set_quality_ratings (j_compress_ptr cinfo, char *arg, boolean force_baseline)

	10897 +/* Process a quality-ratings parameter string, of the form

	10898 + * N[,N,...]

	10899 + * If there are more q-table slots than parameters, the last value is replicated.

	10900 + */

	10901 +{

	10902 + int val = 75; /* default value */

	10903 + int tblno;

	10904 + char ch;

	10905 +

	10906 + for (tblno = 0; tblno < NUM_QUANT_TBLS; tblno++) {

	10907 + if (*arg) {

	10908 + ch = ','; /* if not set by sscanf, will be ',' */

	10909 + if (sscanf(arg, "%d%c", &val, &ch) < 1)

	10910 + return FALSE;

	10911 + if (ch != ',') /* syntax check */

	10912 + return FALSE;

	10913 + /* Convert user 0-100 rating to percentage scaling */

	10914 +#if JPEG_LIB_VERSION >= 70

	10915 + cinfo->q_scale_factor[tblno] = jpeg_quality_scaling(val);

	10916 +#else

	10917 + q_scale_factor[tblno] = jpeg_quality_scaling(val);

	10918 +#endif

	10919 + while (*arg && *arg++ != ',') /* advance to next segment of arg string */

	10920 + ;

	10921 + } else {

	10922 + /* reached end of parameter, set remaining factors to last value */

	10923 +#if JPEG_LIB_VERSION >= 70

	10924 + cinfo->q_scale_factor[tblno] = jpeg_quality_scaling(val);

	10925 +#else

	10926 + q_scale_factor[tblno] = jpeg_quality_scaling(val);

	10927 +#endif

	10928 + }

	10929 + }

	10930 + jpeg_default_qtables(cinfo, force_baseline);

	10931 + return TRUE;

	10932 +}

	10933 +

	10934 +

	10935 +GLOBAL(boolean)

	10936 set_quant_slots (j_compress_ptr cinfo, char *arg)

	10937 /* Process a quantization-table-selectors parameter string, of the form

	10938 * N[,N,...]
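set_quality_ratings() accepts N[,N,...] and converts each rating with jpeg_quality_scaling(). If the string runs out before all quantization-table slots are filled, it replicates the last value into the remaining slots.

The sketch below isolates that parsing logic without libjpeg. quality_scaling() reproduces the formula of libjpeg's jpeg_quality_scaling(): clamp q to 1..100, then 5000/q below 50, otherwise 200 - 2q. parse_quality_ratings() is a hypothetical stand-in that fills a caller-supplied array instead of cinfo->q_scale_factor or the static q_scale_factor table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Same formula as libjpeg's jpeg_quality_scaling(). */
static int quality_scaling(int quality)
{
  if (quality <= 0) quality = 1;
  if (quality > 100) quality = 100;
  return (quality < 50) ? 5000 / quality : 200 - quality * 2;
}

/* Hypothetical: parses "N[,N,...]" into n percentage scale factors,
 * replicating the last rating into any remaining slots. */
static bool parse_quality_ratings(const char *arg, int scale[], int n)
{
  int val = 75;                     /* default rating, as in the patch */

  assert(arg != NULL && n > 0);
  for (int tblno = 0; tblno < n; tblno++) {
    if (*arg) {
      char ch = ',';                /* stays ',' if nothing follows the number */
      if (sscanf(arg, "%d%c", &val, &ch) < 1)
        return false;
      if (ch != ',')                /* reject trailing junk such as "80;" */
        return false;
      while (*arg && *arg++ != ',') /* advance to the next field */
        ;
    }
    scale[tblno] = quality_scaling(val);
  }
  return true;
}
```

With "90,50" and four slots, the scale factors are 20, 100, 100, 100. The trailing 50 is replicated into the last two slots.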

	10939 Index: rrutil.h

	10940 ===================================================================

	10941 --- rrutil.h (revision 829)

	10942 +++ rrutil.h (working copy)

	10943 @@ -1,5 +1,6 @@

	10944 /* Copyright (C)2004 Landmark Graphics Corporation

	10945 * Copyright (C)2005 Sun Microsystems, Inc.

	10946 + * Copyright (C)2010 D. R. Commander

	10947 *

	10948 * This library is free software and may be redistributed and/or modified under

	10949 * the terms of the wxWindows Library License, Version 3.1 or (at your option)

	10950 @@ -47,9 +48,9 @@

	10951 static __inline int numprocs(void)

	10952 {

	10953 #ifdef _WIN32

	10954 - DWORD ProcAff, SysAff, i; int count=0;

	10955 + DWORD_PTR ProcAff, SysAff, i; int count=0;

	10956 if(!GetProcessAffinityMask(GetCurrentProcess(), &ProcAff, &SysAff)) return(1);

	10957 - for(i=0; i<32; i++) if(ProcAff&(1<<i)) count++;

	10958 + for(i=0; i<sizeof(long*)*8; i++) if(ProcAff&(1LL<<i)) count++;

	10959 return(count);

	10960 #elif defined (__APPLE__)

	10961 return(1);
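The rrutil.h change makes three adjustments. It declares the affinity masks as DWORD_PTR (the pointer-sized type GetProcessAffinityMask() fills in). It loops over every bit of a pointer-sized value rather than a fixed 32. It shifts 1LL rather than a plain int 1, because shifting an int by 32 or more is undefined in C.

The following is a portable sketch of the same count. uintptr_t stands in for DWORD_PTR; this is an assumption made so the code builds off Windows. count_mask_bits() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Counts the CPUs enabled in a pointer-sized affinity mask by testing
 * every bit position, with the shift done at the mask's full width. */
static int count_mask_bits(uintptr_t mask)
{
  int count = 0;

  for (unsigned int i = 0; i < sizeof(mask) * 8; i++)
    if (mask & ((uintptr_t)1 << i))
      count++;
  return count;
}
```

Casting the 1 to the mask's own type keeps the shift width tied to the mask. This is the same concern the patch addresses with 1LL.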

	10962 Index: simd/jcclrmmx.asm

	10963 ===================================================================

	10964 --- simd/jcclrmmx.asm (revision 829)

	10965 +++ simd/jcclrmmx.asm (working copy)

	10966 @@ -19,8 +19,6 @@

	10967 %include "jcolsamp.inc"

	10968

	10969 ; --------------------------------------------------------------------------

	10970 - SECTION SEG_TEXT

	10971 - BITS 32

	10972 ;

	10973 ; Convert some rows of samples to the output colorspace.

	10974 ;

	10975 @@ -42,7 +40,7 @@

	10976 %define gotptr wk(0)-SIZEOF_POINTER ; void * gotptr

365	10977

366 align 16	10978 align 16

367 - global EXTN(jsimd_idct_ifast_sse2)	10979 - global EXTN(jsimd_rgb_ycc_convert_mmx)

368 + global EXTN(jsimd_idct_ifast_sse2) PRIVATE	10980 + global EXTN(jsimd_rgb_ycc_convert_mmx) PRIVATE

369	10981

370 EXTN(jsimd_idct_ifast_sse2):	10982 EXTN(jsimd_rgb_ycc_convert_mmx):

371 push ebp	10983 push ebp

	10984 @@ -474,3 +472,6 @@

	10985 pop ebp

	10986 ret

	10987

	10988 +; For some reason, the OS X linker does not honor the request to align the

	10989 +; segment unless we do this.

	10990 + align 16

372 Index: simd/jcclrss2-64.asm	10991 Index: simd/jcclrss2-64.asm

373 ===================================================================	10992 ===================================================================

374 --- simd/jcclrss2-64.asm (revision 829)	10993 --- simd/jcclrss2-64.asm (revision 829)

375 +++ simd/jcclrss2-64.asm (working copy)	10994 +++ simd/jcclrss2-64.asm (working copy)

376 @@ -37,7 +37,7 @@	10995 @@ -1,5 +1,5 @@

	10996 ;

	10997 -; jcclrss2.asm - colorspace conversion (64-bit SSE2)

	10998 +; jcclrss2-64.asm - colorspace conversion (64-bit SSE2)

	10999 ;

	11000 ; x86 SIMD extension for IJG JPEG library

	11001 ; Copyright (C) 1999-2006, MIYASAKA Masaru.

	11002 @@ -17,8 +17,6 @@

	11003 %include "jcolsamp.inc"

	11004

	11005 ; --------------------------------------------------------------------------

	11006 - SECTION SEG_TEXT

	11007 - BITS 64

	11008 ;

	11009 ; Convert some rows of samples to the output colorspace.

	11010 ;

	11011 @@ -39,7 +37,7 @@

377	11012

378 align 16	11013 align 16

379	11014

380 - global EXTN(jsimd_rgb_ycc_convert_sse2)	11015 - global EXTN(jsimd_rgb_ycc_convert_sse2)

381 + global EXTN(jsimd_rgb_ycc_convert_sse2) PRIVATE	11016 + global EXTN(jsimd_rgb_ycc_convert_sse2) PRIVATE

382	11017

383 EXTN(jsimd_rgb_ycc_convert_sse2):	11018 EXTN(jsimd_rgb_ycc_convert_sse2):

384 push rbp	11019 push rbp

385 Index: simd/jiss2red-64.asm	11020 @@ -49,8 +47,8 @@

386 ===================================================================	11021 mov [rsp],rax

387 --- simd/jiss2red-64.asm (revision 829)	11022 mov rbp,rsp ; rbp = aligned rbp

388 +++ simd/jiss2red-64.asm (working copy)	11023 lea rsp, [wk(0)]

389 @@ -73,7 +73,7 @@	11024 + collect_args

	11025 push rbx

	11026 - collect_args

	11027

	11028 mov rcx, r10

	11029 test rcx,rcx

	11030 @@ -70,7 +68,7 @@

	11031 pop rcx

	11032

	11033 mov rsi, r11

	11034 - mov rax, r14

	11035 + mov eax, r14d

	11036 test rax,rax

	11037 jle near .return

	11038 .rowloop:

	11039 @@ -475,10 +473,13 @@

	11040 jg near .rowloop

	11041

	11042 .return:

	11043 + pop rbx

	11044 uncollect_args

	11045 - pop rbx

	11046 mov rsp,rbp ; rsp <- aligned rbp

	11047 pop rsp ; rsp <- original rbp

	11048 pop rbp

	11049 ret

	11050

	11051 +; For some reason, the OS X linker does not honor the request to align the

	11052 +; segment unless we do this.

	11053 + align 16

	11054 Index: simd/jcclrss2.asm

	11055 ===================================================================

	11056 --- simd/jcclrss2.asm (revision 829)

	11057 +++ simd/jcclrss2.asm (working copy)

	11058 @@ -16,8 +16,6 @@

	11059 %include "jcolsamp.inc"

	11060

	11061 ; --------------------------------------------------------------------------

	11062 - SECTION SEG_TEXT

	11063 - BITS 32

	11064 ;

	11065 ; Convert some rows of samples to the output colorspace.

	11066 ;

	11067 @@ -40,7 +38,7 @@

	11068

	11069 align 16

	11070

	11071 - global EXTN(jsimd_rgb_ycc_convert_sse2)

	11072 + global EXTN(jsimd_rgb_ycc_convert_sse2) PRIVATE

	11073

	11074 EXTN(jsimd_rgb_ycc_convert_sse2):

	11075 push ebp

	11076 @@ -500,3 +498,6 @@

	11077 pop ebp

	11078 ret

	11079

	11080 +; For some reason, the OS X linker does not honor the request to align the

	11081 +; segment unless we do this.

	11082 + align 16

	11083 Index: simd/jccolmmx.asm

	11084 ===================================================================

	11085 --- simd/jccolmmx.asm (revision 829)

	11086 +++ simd/jccolmmx.asm (working copy)

	11087 @@ -37,7 +37,7 @@

390 SECTION SEG_CONST	11088 SECTION SEG_CONST

391	11089

392 alignz 16	11090 alignz 16

393 -» global» EXTN(jconst_idct_red_sse2)	11091 -» global» EXTN(jconst_rgb_ycc_convert_mmx)

394 +» global» EXTN(jconst_idct_red_sse2) PRIVATE	11092 +» global» EXTN(jconst_rgb_ycc_convert_mmx) PRIVATE

395	11093

396 EXTN(jconst_idct_red_sse2):	11094 EXTN(jconst_rgb_ycc_convert_mmx):

397	11095

398 @@ -114,7 +114,7 @@	11096 @@ -51,6 +51,9 @@

399 %define WK_NUM»» 2	11097 » alignz» 16

	11098

	11099 ; --------------------------------------------------------------------------

	11100 +» SECTION»SEG_TEXT

	11101 +» BITS» 32

	11102 +

	11103 %include "jcclrmmx.asm"

	11104

	11105 %undef RGB_RED

	11106 @@ -57,10 +60,10 @@

	11107 %undef RGB_GREEN

	11108 %undef RGB_BLUE

	11109 %undef RGB_PIXELSIZE

	11110 -%define RGB_RED 0

	11111 -%define RGB_GREEN 1

	11112 -%define RGB_BLUE 2

	11113 -%define RGB_PIXELSIZE 3

	11114 +%define RGB_RED EXT_RGB_RED

	11115 +%define RGB_GREEN EXT_RGB_GREEN

	11116 +%define RGB_BLUE EXT_RGB_BLUE

	11117 +%define RGB_PIXELSIZE EXT_RGB_PIXELSIZE

	11118 %define jsimd_rgb_ycc_convert_mmx jsimd_extrgb_ycc_convert_mmx

	11119 %include "jcclrmmx.asm"

	11120

	11121 @@ -68,10 +71,10 @@

	11122 %undef RGB_GREEN

	11123 %undef RGB_BLUE

	11124 %undef RGB_PIXELSIZE

	11125 -%define RGB_RED 0

	11126 -%define RGB_GREEN 1

	11127 -%define RGB_BLUE 2

	11128 -%define RGB_PIXELSIZE 4

	11129 +%define RGB_RED EXT_RGBX_RED

	11130 +%define RGB_GREEN EXT_RGBX_GREEN

	11131 +%define RGB_BLUE EXT_RGBX_BLUE

	11132 +%define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE

	11133 %define jsimd_rgb_ycc_convert_mmx jsimd_extrgbx_ycc_convert_mmx

	11134 %include "jcclrmmx.asm"

	11135

	11136 @@ -79,10 +82,10 @@

	11137 %undef RGB_GREEN

	11138 %undef RGB_BLUE

	11139 %undef RGB_PIXELSIZE

	11140 -%define RGB_RED 2

	11141 -%define RGB_GREEN 1

	11142 -%define RGB_BLUE 0

	11143 -%define RGB_PIXELSIZE 3

	11144 +%define RGB_RED EXT_BGR_RED

	11145 +%define RGB_GREEN EXT_BGR_GREEN

	11146 +%define RGB_BLUE EXT_BGR_BLUE

	11147 +%define RGB_PIXELSIZE EXT_BGR_PIXELSIZE

	11148 %define jsimd_rgb_ycc_convert_mmx jsimd_extbgr_ycc_convert_mmx

	11149 %include "jcclrmmx.asm"

	11150

	11151 @@ -90,10 +93,10 @@

	11152 %undef RGB_GREEN

	11153 %undef RGB_BLUE

	11154 %undef RGB_PIXELSIZE

	11155 -%define RGB_RED 2

	11156 -%define RGB_GREEN 1

	11157 -%define RGB_BLUE 0

	11158 -%define RGB_PIXELSIZE 4

	11159 +%define RGB_RED EXT_BGRX_RED

	11160 +%define RGB_GREEN EXT_BGRX_GREEN

	11161 +%define RGB_BLUE EXT_BGRX_BLUE

	11162 +%define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE

	11163 %define jsimd_rgb_ycc_convert_mmx jsimd_extbgrx_ycc_convert_mmx

	11164 %include "jcclrmmx.asm"

	11165

	11166 @@ -101,10 +104,10 @@

	11167 %undef RGB_GREEN

	11168 %undef RGB_BLUE

	11169 %undef RGB_PIXELSIZE

	11170 -%define RGB_RED 3

	11171 -%define RGB_GREEN 2

	11172 -%define RGB_BLUE 1

	11173 -%define RGB_PIXELSIZE 4

	11174 +%define RGB_RED EXT_XBGR_RED

	11175 +%define RGB_GREEN EXT_XBGR_GREEN

	11176 +%define RGB_BLUE EXT_XBGR_BLUE

	11177 +%define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE

	11178 %define jsimd_rgb_ycc_convert_mmx jsimd_extxbgr_ycc_convert_mmx

	11179 %include "jcclrmmx.asm"

	11180

	11181 @@ -112,9 +115,9 @@

	11182 %undef RGB_GREEN

	11183 %undef RGB_BLUE

	11184 %undef RGB_PIXELSIZE

	11185 -%define RGB_RED 1

	11186 -%define RGB_GREEN 2

	11187 -%define RGB_BLUE 3

	11188 -%define RGB_PIXELSIZE 4

	11189 +%define RGB_RED EXT_XRGB_RED

	11190 +%define RGB_GREEN EXT_XRGB_GREEN

	11191 +%define RGB_BLUE EXT_XRGB_BLUE

	11192 +%define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE

	11193 %define jsimd_rgb_ycc_convert_mmx jsimd_extxrgb_ycc_convert_mmx

	11194 %include "jcclrmmx.asm"
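The jccolmmx.asm hunks replace hard-coded channel offsets with named `EXT_*` constants before each `%include "jcclrmmx.asm"`. As a result, one conversion routine is assembled once per pixel layout under a different name. The removed literals spell out what those layouts are. The C sketch below restates them as a table and uses them the way the macro names imply: `RGB_PIXELSIZE` is the byte stride between pixels, and `RGB_RED`/`RGB_GREEN`/`RGB_BLUE` are byte offsets within a pixel. The sketch rests on several assumptions. The table and function names are hypothetical. The scalar luma formula (JFIF weights scaled by 2^16) stands in for the SIMD code. It also assumes the `EXT_*` constants keep the values of the literals they replace.

```c
#include <stddef.h>

/* Hypothetical table; the offsets are the literals removed by the patch. */
typedef struct {
  const char *name;
  int red, green, blue; /* byte offset of each channel within a pixel */
  int pixelsize;        /* bytes per pixel, i.e. the stride */
} pixel_layout;

static const pixel_layout layouts[] = {
  { "RGB",  0, 1, 2, 3 },
  { "RGBX", 0, 1, 2, 4 },
  { "BGR",  2, 1, 0, 3 },
  { "BGRX", 2, 1, 0, 4 },
  { "XBGR", 3, 2, 1, 4 },
  { "XRGB", 1, 2, 3, 4 },
};

/* Scalar luma for one row: Y = 0.299 R + 0.587 G + 0.114 B, with the
   weights scaled by 2^16 (they sum to 65536) and the result rounded. */
static void row_to_luma(const unsigned char *row, size_t width,
                        const pixel_layout *l, unsigned char *out)
{
  for (size_t i = 0; i < width; i++) {
    const unsigned char *px = row + i * (size_t)l->pixelsize;
    unsigned long y = 19595UL * px[l->red] + 38470UL * px[l->green] +
                      7471UL * px[l->blue];
    out[i] = (unsigned char)((y + 32768UL) >> 16);
  }
}
```

For example, the two-pixel XRGB row `{0, 255, 0, 0, 0, 0, 0, 255}` (pure red, then pure blue) maps to luma values 76 and 29. Only the table entry changes between layouts, which mirrors how the assembly reuses one routine body.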

	11195 Index: simd/jccolss2-64.asm

	11196 ===================================================================

	11197 --- simd/jccolss2-64.asm» (revision 829)

	11198 +++ simd/jccolss2-64.asm» (working copy)

	11199 @@ -1,5 +1,5 @@

	11200 ;

	11201 -; jccolss2.asm - colorspace conversion (64-bit SSE2)

	11202 +; jccolss2-64.asm - colorspace conversion (64-bit SSE2)

	11203 ;

	11204 ; x86 SIMD extension for IJG JPEG library

	11205 ; Copyright (C) 1999-2006, MIYASAKA Masaru.

	11206 @@ -34,7 +34,7 @@

	11207 » SECTION»SEG_CONST

	11208

	11209 » alignz» 16

	11210 -» global» EXTN(jconst_rgb_ycc_convert_sse2)

	11211 +» global» EXTN(jconst_rgb_ycc_convert_sse2) PRIVATE

	11212

	11213 EXTN(jconst_rgb_ycc_convert_sse2):

	11214

	11215 @@ -48,6 +48,9 @@

	11216 » alignz» 16

	11217

	11218 ; --------------------------------------------------------------------------

	11219 +» SECTION»SEG_TEXT

	11220 +» BITS» 64

	11221 +

	11222 %include "jcclrss2-64.asm"

	11223

	11224 %undef RGB_RED

	11225 @@ -54,10 +57,10 @@

	11226 %undef RGB_GREEN

	11227 %undef RGB_BLUE

	11228 %undef RGB_PIXELSIZE

	11229 -%define RGB_RED 0

	11230 -%define RGB_GREEN 1

	11231 -%define RGB_BLUE 2

	11232 -%define RGB_PIXELSIZE 3

	11233 +%define RGB_RED EXT_RGB_RED

	11234 +%define RGB_GREEN EXT_RGB_GREEN

	11235 +%define RGB_BLUE EXT_RGB_BLUE

	11236 +%define RGB_PIXELSIZE EXT_RGB_PIXELSIZE

	11237 %define jsimd_rgb_ycc_convert_sse2 jsimd_extrgb_ycc_convert_sse2

	11238 %include "jcclrss2-64.asm"

	11239

	11240 @@ -65,10 +68,10 @@

	11241 %undef RGB_GREEN

	11242 %undef RGB_BLUE

	11243 %undef RGB_PIXELSIZE

	11244 -%define RGB_RED 0

	11245 -%define RGB_GREEN 1

	11246 -%define RGB_BLUE 2

	11247 -%define RGB_PIXELSIZE 4

	11248 +%define RGB_RED EXT_RGBX_RED

	11249 +%define RGB_GREEN EXT_RGBX_GREEN

	11250 +%define RGB_BLUE EXT_RGBX_BLUE

	11251 +%define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE

	11252 %define jsimd_rgb_ycc_convert_sse2 jsimd_extrgbx_ycc_convert_sse2

	11253 %include "jcclrss2-64.asm"

	11254

	11255 @@ -76,10 +79,10 @@

	11256 %undef RGB_GREEN

	11257 %undef RGB_BLUE

	11258 %undef RGB_PIXELSIZE

	11259 -%define RGB_RED 2

	11260 -%define RGB_GREEN 1

	11261 -%define RGB_BLUE 0

	11262 -%define RGB_PIXELSIZE 3

	11263 +%define RGB_RED EXT_BGR_RED

	11264 +%define RGB_GREEN EXT_BGR_GREEN

	11265 +%define RGB_BLUE EXT_BGR_BLUE

	11266 +%define RGB_PIXELSIZE EXT_BGR_PIXELSIZE

	11267 %define jsimd_rgb_ycc_convert_sse2 jsimd_extbgr_ycc_convert_sse2

	11268 %include "jcclrss2-64.asm"

	11269

	11270 @@ -87,10 +90,10 @@

	11271 %undef RGB_GREEN

	11272 %undef RGB_BLUE

	11273 %undef RGB_PIXELSIZE

	11274 -%define RGB_RED 2

	11275 -%define RGB_GREEN 1

	11276 -%define RGB_BLUE 0

	11277 -%define RGB_PIXELSIZE 4

	11278 +%define RGB_RED EXT_BGRX_RED

	11279 +%define RGB_GREEN EXT_BGRX_GREEN

	11280 +%define RGB_BLUE EXT_BGRX_BLUE

	11281 +%define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE

	11282 %define jsimd_rgb_ycc_convert_sse2 jsimd_extbgrx_ycc_convert_sse2

	11283 %include "jcclrss2-64.asm"

	11284

	11285 @@ -98,10 +101,10 @@

	11286 %undef RGB_GREEN

	11287 %undef RGB_BLUE

	11288 %undef RGB_PIXELSIZE

	11289 -%define RGB_RED 3

	11290 -%define RGB_GREEN 2

	11291 -%define RGB_BLUE 1

	11292 -%define RGB_PIXELSIZE 4

	11293 +%define RGB_RED EXT_XBGR_RED

	11294 +%define RGB_GREEN EXT_XBGR_GREEN

	11295 +%define RGB_BLUE EXT_XBGR_BLUE

	11296 +%define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE

	11297 %define jsimd_rgb_ycc_convert_sse2 jsimd_extxbgr_ycc_convert_sse2

	11298 %include "jcclrss2-64.asm"

	11299

	11300 @@ -109,9 +112,9 @@

	11301 %undef RGB_GREEN

	11302 %undef RGB_BLUE

	11303 %undef RGB_PIXELSIZE

	11304 -%define RGB_RED 1

	11305 -%define RGB_GREEN 2

	11306 -%define RGB_BLUE 3

	11307 -%define RGB_PIXELSIZE 4

	11308 +%define RGB_RED EXT_XRGB_RED

	11309 +%define RGB_GREEN EXT_XRGB_GREEN

	11310 +%define RGB_BLUE EXT_XRGB_BLUE

	11311 +%define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE

	11312 %define jsimd_rgb_ycc_convert_sse2 jsimd_extxrgb_ycc_convert_sse2

	11313 %include "jcclrss2-64.asm"

	11314 Index: simd/jccolss2.asm

	11315 ===================================================================

	11316 --- simd/jccolss2.asm» (revision 829)

	11317 +++ simd/jccolss2.asm» (working copy)

	11318 @@ -34,7 +34,7 @@

	11319 » SECTION»SEG_CONST

	11320

	11321 » alignz» 16

	11322 -» global» EXTN(jconst_rgb_ycc_convert_sse2)

	11323 +» global» EXTN(jconst_rgb_ycc_convert_sse2) PRIVATE

	11324

	11325 EXTN(jconst_rgb_ycc_convert_sse2):

	11326

	11327 @@ -48,6 +48,9 @@

	11328 » alignz» 16

	11329

	11330 ; --------------------------------------------------------------------------

	11331 +» SECTION»SEG_TEXT

	11332 +» BITS» 32

	11333 +

	11334 %include "jcclrss2.asm"

	11335

	11336 %undef RGB_RED

	11337 @@ -54,10 +57,10 @@

	11338 %undef RGB_GREEN

	11339 %undef RGB_BLUE

	11340 %undef RGB_PIXELSIZE

	11341 -%define RGB_RED 0

	11342 -%define RGB_GREEN 1

	11343 -%define RGB_BLUE 2

	11344 -%define RGB_PIXELSIZE 3

	11345 +%define RGB_RED EXT_RGB_RED

	11346 +%define RGB_GREEN EXT_RGB_GREEN

	11347 +%define RGB_BLUE EXT_RGB_BLUE

	11348 +%define RGB_PIXELSIZE EXT_RGB_PIXELSIZE

	11349 %define jsimd_rgb_ycc_convert_sse2 jsimd_extrgb_ycc_convert_sse2

	11350 %include "jcclrss2.asm"

	11351

	11352 @@ -65,10 +68,10 @@

	11353 %undef RGB_GREEN

	11354 %undef RGB_BLUE

	11355 %undef RGB_PIXELSIZE

	11356 -%define RGB_RED 0

	11357 -%define RGB_GREEN 1

	11358 -%define RGB_BLUE 2

	11359 -%define RGB_PIXELSIZE 4

	11360 +%define RGB_RED EXT_RGBX_RED

	11361 +%define RGB_GREEN EXT_RGBX_GREEN

	11362 +%define RGB_BLUE EXT_RGBX_BLUE

	11363 +%define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE

	11364 %define jsimd_rgb_ycc_convert_sse2 jsimd_extrgbx_ycc_convert_sse2

	11365 %include "jcclrss2.asm"

	11366

	11367 @@ -76,10 +79,10 @@

	11368 %undef RGB_GREEN

	11369 %undef RGB_BLUE

	11370 %undef RGB_PIXELSIZE

	11371 -%define RGB_RED 2

	11372 -%define RGB_GREEN 1

	11373 -%define RGB_BLUE 0

	11374 -%define RGB_PIXELSIZE 3

	11375 +%define RGB_RED EXT_BGR_RED

	11376 +%define RGB_GREEN EXT_BGR_GREEN

	11377 +%define RGB_BLUE EXT_BGR_BLUE

	11378 +%define RGB_PIXELSIZE EXT_BGR_PIXELSIZE

	11379 %define jsimd_rgb_ycc_convert_sse2 jsimd_extbgr_ycc_convert_sse2

	11380 %include "jcclrss2.asm"

	11381

	11382 @@ -87,10 +90,10 @@

	11383 %undef RGB_GREEN

	11384 %undef RGB_BLUE

	11385 %undef RGB_PIXELSIZE

	11386 -%define RGB_RED 2

	11387 -%define RGB_GREEN 1

	11388 -%define RGB_BLUE 0

	11389 -%define RGB_PIXELSIZE 4

	11390 +%define RGB_RED EXT_BGRX_RED

	11391 +%define RGB_GREEN EXT_BGRX_GREEN

	11392 +%define RGB_BLUE EXT_BGRX_BLUE

	11393 +%define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE

	11394 %define jsimd_rgb_ycc_convert_sse2 jsimd_extbgrx_ycc_convert_sse2

	11395 %include "jcclrss2.asm"

	11396

	11397 @@ -98,10 +101,10 @@

	11398 %undef RGB_GREEN

	11399 %undef RGB_BLUE

	11400 %undef RGB_PIXELSIZE

	11401 -%define RGB_RED 3

	11402 -%define RGB_GREEN 2

	11403 -%define RGB_BLUE 1

	11404 -%define RGB_PIXELSIZE 4

	11405 +%define RGB_RED EXT_XBGR_RED

	11406 +%define RGB_GREEN EXT_XBGR_GREEN

	11407 +%define RGB_BLUE EXT_XBGR_BLUE

	11408 +%define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE

	11409 %define jsimd_rgb_ycc_convert_sse2 jsimd_extxbgr_ycc_convert_sse2

	11410 %include "jcclrss2.asm"

	11411

	11412 @@ -109,9 +112,9 @@

	11413 %undef RGB_GREEN

	11414 %undef RGB_BLUE

	11415 %undef RGB_PIXELSIZE

	11416 -%define RGB_RED 1

	11417 -%define RGB_GREEN 2

	11418 -%define RGB_BLUE 3

	11419 -%define RGB_PIXELSIZE 4

	11420 +%define RGB_RED EXT_XRGB_RED

	11421 +%define RGB_GREEN EXT_XRGB_GREEN

	11422 +%define RGB_BLUE EXT_XRGB_BLUE

	11423 +%define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE

	11424 %define jsimd_rgb_ycc_convert_sse2 jsimd_extxrgb_ycc_convert_sse2

	11425 %include "jcclrss2.asm"

	11426 Index: simd/jcqnt3dn.asm

	11427 ===================================================================

	11428 --- simd/jcqnt3dn.asm» (revision 829)

	11429 +++ simd/jcqnt3dn.asm» (working copy)

	11430 @@ -35,7 +35,7 @@

	11431 %define workspace» ebp+16» » ; FAST_FLOAT * workspace

400	11432

401 align 16	11433 align 16

402 -» global» EXTN(jsimd_idct_4x4_sse2)	11434 -» global» EXTN(jsimd_convsamp_float_3dnow)

403 +» global» EXTN(jsimd_idct_4x4_sse2) PRIVATE	11435 +» global» EXTN(jsimd_convsamp_float_3dnow) PRIVATE

404	11436

405 EXTN(jsimd_idct_4x4_sse2):	11437 EXTN(jsimd_convsamp_float_3dnow):

	11438 » push» ebp

	11439 @@ -138,7 +138,7 @@

	11440 %define workspace» ebp+16» » ; FAST_FLOAT * workspace

	11441

	11442 » align» 16

	11443 -» global» EXTN(jsimd_quantize_float_3dnow)

	11444 +» global» EXTN(jsimd_quantize_float_3dnow) PRIVATE

	11445

	11446 EXTN(jsimd_quantize_float_3dnow):

	11447 » push» ebp

	11448 @@ -228,3 +228,6 @@

	11449 » pop» ebp

	11450 » ret

	11451

	11452 +; For some reason, the OS X linker does not honor the request to align the

	11453 +; segment unless we do this.

	11454 +» align» 16

	11455 Index: simd/jcqntmmx.asm

	11456 ===================================================================

	11457 --- simd/jcqntmmx.asm» (revision 829)

	11458 +++ simd/jcqntmmx.asm» (working copy)

	11459 @@ -35,7 +35,7 @@

	11460 %define workspace» ebp+16» » ; DCTELEM * workspace

	11461

	11462 » align» 16

	11463 -» global» EXTN(jsimd_convsamp_mmx)

	11464 +» global» EXTN(jsimd_convsamp_mmx) PRIVATE

	11465

	11466 EXTN(jsimd_convsamp_mmx):

	11467 » push» ebp

	11468 @@ -140,7 +140,7 @@

	11469 %define workspace» ebp+16» » ; DCTELEM * workspace

	11470

	11471 » align» 16

	11472 -» global» EXTN(jsimd_quantize_mmx)

	11473 +» global» EXTN(jsimd_quantize_mmx) PRIVATE

	11474

	11475 EXTN(jsimd_quantize_mmx):

	11476 » push» ebp

	11477 @@ -269,3 +269,6 @@

	11478 » pop» ebp

	11479 » ret

	11480

	11481 +; For some reason, the OS X linker does not honor the request to align the

	11482 +; segment unless we do this.

	11483 +» align» 16

	11484 Index: simd/jcqnts2f-64.asm

	11485 ===================================================================

	11486 --- simd/jcqnts2f-64.asm» (revision 829)

	11487 +++ simd/jcqnts2f-64.asm» (working copy)

	11488 @@ -1,5 +1,5 @@

	11489 ;

	11490 -; jcqnts2f.asm - sample data conversion and quantization (64-bit SSE & SSE2)

	11491 +; jcqnts2f-64.asm - sample data conversion and quantization (64-bit SSE & SSE2)

	11492 ;

	11493 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB

	11494 ; Copyright 2009 D. R. Commander

	11495 @@ -36,13 +36,14 @@

	11496 ; r12 = FAST_FLOAT * workspace

	11497

	11498 » align» 16

	11499 -» global» EXTN(jsimd_convsamp_float_sse2)

	11500 +» global» EXTN(jsimd_convsamp_float_sse2) PRIVATE

	11501

	11502 EXTN(jsimd_convsamp_float_sse2):

406 push rbp	11503 push rbp

407 @@ -413,7 +413,7 @@	11504 +» mov» rax,rsp

408 ; r13 = JDIMENSION output_col	11505 » mov» rbp,rsp

	11506 +» collect_args

	11507 » push» rbx

	11508 -» collect_args

	11509

	11510 » pcmpeqw xmm7,xmm7

	11511 » psllw xmm7,7

	11512 @@ -89,8 +90,8 @@

	11513 » dec» rcx

	11514 » jnz» short .convloop

	11515

	11516 +» pop» rbx

	11517 » uncollect_args

	11518 -» pop» rbx

	11519 » pop» rbp

	11520 » ret

	11521

	11522 @@ -109,10 +110,11 @@

	11523 ; r12 = FAST_FLOAT * workspace

409	11524

410 align 16	11525 align 16

411 -» global» EXTN(jsimd_idct_2x2_sse2)	11526 -» global» EXTN(jsimd_quantize_float_sse2)

412 +» global» EXTN(jsimd_idct_2x2_sse2) PRIVATE	11527 +» global» EXTN(jsimd_quantize_float_sse2) PRIVATE

413	11528

414 EXTN(jsimd_idct_2x2_sse2):	11529 EXTN(jsimd_quantize_float_sse2):

415 push rbp	11530 push rbp

416 Index: simd/ji3dnflt.asm	11531 +» mov» rax,rsp

417 ===================================================================	11532 » mov» rbp,rsp

418 --- simd/ji3dnflt.asm» (revision 829)	11533 » collect_args

419 +++ simd/ji3dnflt.asm» (working copy)	11534

420 @@ -27,7 +27,7 @@	11535 @@ -150,3 +152,7 @@

421 » SECTION»SEG_CONST	11536 » uncollect_args

422	11537 » pop» rbp

423 » alignz» 16	11538 » ret

424 -» global» EXTN(jconst_idct_float_3dnow)	11539 +

425 +» global» EXTN(jconst_idct_float_3dnow) PRIVATE	11540 +; For some reason, the OS X linker does not honor the request to align the

426	11541 +; segment unless we do this.

427 EXTN(jconst_idct_float_3dnow):	11542 +» align» 16

428

429 @@ -63,7 +63,7 @@

430 » » » » » ; FAST_FLOAT workspace[DCTSIZE2]

431

432 » align» 16

433 -» global» EXTN(jsimd_idct_float_3dnow)

434 +» global» EXTN(jsimd_idct_float_3dnow) PRIVATE

435

436 EXTN(jsimd_idct_float_3dnow):

437 » push» ebp

438 Index: simd/jsimdcpu.asm

439 ===================================================================

440 --- simd/jsimdcpu.asm» (revision 829)

441 +++ simd/jsimdcpu.asm» (working copy)

442 @@ -29,7 +29,7 @@

443 ;

444

445 » align» 16

446 -» global» EXTN(jpeg_simd_cpu_support)

447 +» global» EXTN(jpeg_simd_cpu_support) PRIVATE

448

449 EXTN(jpeg_simd_cpu_support):

450 » push» ebx

451 Index: simd/jdmerss2-64.asm

452 ===================================================================

453 --- simd/jdmerss2-64.asm» (revision 829)

454 +++ simd/jdmerss2-64.asm» (working copy)

455 @@ -35,7 +35,7 @@

456 » SECTION»SEG_CONST

457

458 » alignz» 16

459 -» global» EXTN(jconst_merged_upsample_sse2)

460 +» global» EXTN(jconst_merged_upsample_sse2) PRIVATE

461

462 EXTN(jconst_merged_upsample_sse2):

463

464 Index: simd/jdsammmx.asm

465 ===================================================================

466 --- simd/jdsammmx.asm» (revision 829)

467 +++ simd/jdsammmx.asm» (working copy)

468 @@ -22,7 +22,7 @@

469 » SECTION»SEG_CONST

470

471 » alignz» 16

472 -» global» EXTN(jconst_fancy_upsample_mmx)

473 +» global» EXTN(jconst_fancy_upsample_mmx) PRIVATE

474

475 EXTN(jconst_fancy_upsample_mmx):

476

477 @@ -58,7 +58,7 @@

478 %define output_data_ptr(b)» (b)+20» » ; JSAMPARRAY * output_data_ptr

479

480 » align» 16

481 -» global» EXTN(jsimd_h2v1_fancy_upsample_mmx)

482 +» global» EXTN(jsimd_h2v1_fancy_upsample_mmx) PRIVATE

483

484 EXTN(jsimd_h2v1_fancy_upsample_mmx):

485 » push» ebp

486 @@ -216,7 +216,7 @@

487 %define gotptr»» wk(0)-SIZEOF_POINTER» ; void * gotptr

488

489 » align» 16

490 -» global» EXTN(jsimd_h2v2_fancy_upsample_mmx)

491 +» global» EXTN(jsimd_h2v2_fancy_upsample_mmx) PRIVATE

492

493 EXTN(jsimd_h2v2_fancy_upsample_mmx):

494 » push» ebp

495 @@ -542,7 +542,7 @@

496 %define output_data_ptr(b)» (b)+20» » ; JSAMPARRAY * output_data_ptr

497

498 » align» 16

499 -» global» EXTN(jsimd_h2v1_upsample_mmx)

500 +» global» EXTN(jsimd_h2v1_upsample_mmx) PRIVATE

501

502 EXTN(jsimd_h2v1_upsample_mmx):

503 » push» ebp

504 @@ -643,7 +643,7 @@

505 %define output_data_ptr(b)» (b)+20» » ; JSAMPARRAY * output_data_ptr

506

507 » align» 16

508 -» global» EXTN(jsimd_h2v2_upsample_mmx)

509 +» global» EXTN(jsimd_h2v2_upsample_mmx) PRIVATE

510

511 EXTN(jsimd_h2v2_upsample_mmx):

512 » push» ebp

513 Index: simd/jdmrgmmx.asm

514 ===================================================================

515 --- simd/jdmrgmmx.asm» (revision 829)

516 +++ simd/jdmrgmmx.asm» (working copy)

517 @@ -40,7 +40,7 @@

518 %define gotptr»» wk(0)-SIZEOF_POINTER» ; void * gotptr

519

520 » align» 16

521 -» global» EXTN(jsimd_h2v1_merged_upsample_mmx)

522 +» global» EXTN(jsimd_h2v1_merged_upsample_mmx) PRIVATE

523

524 EXTN(jsimd_h2v1_merged_upsample_mmx):

525 » push» ebp

526 @@ -409,7 +409,7 @@

527 %define output_buf(b)» » (b)+20» » ; JSAMPARRAY output_buf

528

529 » align» 16

530 -» global» EXTN(jsimd_h2v2_merged_upsample_mmx)

531 +» global» EXTN(jsimd_h2v2_merged_upsample_mmx) PRIVATE

532

533 EXTN(jsimd_h2v2_merged_upsample_mmx):

534 » push» ebp

535 Index: simd/jdsamss2.asm

536 ===================================================================

537 --- simd/jdsamss2.asm» (revision 829)

538 +++ simd/jdsamss2.asm» (working copy)

539 @@ -22,7 +22,7 @@

540 » SECTION»SEG_CONST

541

542 » alignz» 16

543 -» global» EXTN(jconst_fancy_upsample_sse2)

544 +» global» EXTN(jconst_fancy_upsample_sse2) PRIVATE

545

546 EXTN(jconst_fancy_upsample_sse2):

547

548 @@ -58,7 +58,7 @@

549 %define output_data_ptr(b)» (b)+20» » ; JSAMPARRAY * output_data_ptr

550

551 » align» 16

552 -» global» EXTN(jsimd_h2v1_fancy_upsample_sse2)

553 +» global» EXTN(jsimd_h2v1_fancy_upsample_sse2) PRIVATE

554

555 EXTN(jsimd_h2v1_fancy_upsample_sse2):

556 » push» ebp

557 @@ -214,7 +214,7 @@

558 %define gotptr»» wk(0)-SIZEOF_POINTER» ; void * gotptr

559

560 » align» 16

561 -» global» EXTN(jsimd_h2v2_fancy_upsample_sse2)

562 +» global» EXTN(jsimd_h2v2_fancy_upsample_sse2) PRIVATE

563

564 EXTN(jsimd_h2v2_fancy_upsample_sse2):

565 » push» ebp

566 @@ -538,7 +538,7 @@

567 %define output_data_ptr(b)» (b)+20» » ; JSAMPARRAY * output_data_ptr

568

569 » align» 16

570 -» global» EXTN(jsimd_h2v1_upsample_sse2)

571 +» global» EXTN(jsimd_h2v1_upsample_sse2) PRIVATE

572

573 EXTN(jsimd_h2v1_upsample_sse2):

574 » push» ebp

575 @@ -637,7 +637,7 @@

576 %define output_data_ptr(b)» (b)+20» » ; JSAMPARRAY * output_data_ptr

577

578 » align» 16

579 -» global» EXTN(jsimd_h2v2_upsample_sse2)

580 +» global» EXTN(jsimd_h2v2_upsample_sse2) PRIVATE

581

582 EXTN(jsimd_h2v2_upsample_sse2):

583 » push» ebp

584 Index: simd/jiss2flt-64.asm

585 ===================================================================

586 --- simd/jiss2flt-64.asm» (revision 829)

587 +++ simd/jiss2flt-64.asm» (working copy)

588 @@ -38,7 +38,7 @@

589 » SECTION»SEG_CONST

590

591 » alignz» 16

592 -» global» EXTN(jconst_idct_float_sse2)

593 +» global» EXTN(jconst_idct_float_sse2) PRIVATE

594

595 EXTN(jconst_idct_float_sse2):

596

597 @@ -74,7 +74,7 @@

598 » » » » » ; FAST_FLOAT workspace[DCTSIZE2]

599

600 » align» 16

601 -» global» EXTN(jsimd_idct_float_sse2)

602 +» global» EXTN(jsimd_idct_float_sse2) PRIVATE

603

604 EXTN(jsimd_idct_float_sse2):

605 » push» rbp

606 Index: simd/jfss2int-64.asm

607 ===================================================================

608 --- simd/jfss2int-64.asm» (revision 829)

609 +++ simd/jfss2int-64.asm» (working copy)

610 @@ -67,7 +67,7 @@

611 » SECTION»SEG_CONST

612

613 » alignz» 16

614 -» global» EXTN(jconst_fdct_islow_sse2)

615 +» global» EXTN(jconst_fdct_islow_sse2) PRIVATE

616

617 EXTN(jconst_fdct_islow_sse2):

618

619 @@ -101,7 +101,7 @@

620 %define WK_NUM»» 6

621

622 » align» 16

623 -» global» EXTN(jsimd_fdct_islow_sse2)

624 +» global» EXTN(jsimd_fdct_islow_sse2) PRIVATE

625

626 EXTN(jsimd_fdct_islow_sse2):

627 » push» rbp

628 Index: simd/jcqnts2f.asm	11543 Index: simd/jcqnts2f.asm

629 ===================================================================	11544 ===================================================================

630 --- simd/jcqnts2f.asm (revision 829)	11545 --- simd/jcqnts2f.asm (revision 829)

631 +++ simd/jcqnts2f.asm (working copy)	11546 +++ simd/jcqnts2f.asm (working copy)

632 @@ -35,7 +35,7 @@	11547 @@ -35,7 +35,7 @@

633 %define workspace ebp+16 ; FAST_FLOAT * workspace	11548 %define workspace ebp+16 ; FAST_FLOAT * workspace

634	11549

635 align 16	11550 align 16

636 - global EXTN(jsimd_convsamp_float_sse2)	11551 - global EXTN(jsimd_convsamp_float_sse2)

637 + global EXTN(jsimd_convsamp_float_sse2) PRIVATE	11552 + global EXTN(jsimd_convsamp_float_sse2) PRIVATE

638	11553

639 EXTN(jsimd_convsamp_float_sse2):	11554 EXTN(jsimd_convsamp_float_sse2):

640 push ebp	11555 push ebp

641 @@ -115,7 +115,7 @@	11556 @@ -115,7 +115,7 @@

642 %define workspace ebp+16 ; FAST_FLOAT * workspace	11557 %define workspace ebp+16 ; FAST_FLOAT * workspace

643	11558

644 align 16	11559 align 16

645 - global EXTN(jsimd_quantize_float_sse2)	11560 - global EXTN(jsimd_quantize_float_sse2)

646 + global EXTN(jsimd_quantize_float_sse2) PRIVATE	11561 + global EXTN(jsimd_quantize_float_sse2) PRIVATE

647	11562

648 EXTN(jsimd_quantize_float_sse2):	11563 EXTN(jsimd_quantize_float_sse2):

649 push ebp	11564 push ebp

650 Index: simd/jdmrgss2.asm	11565 @@ -166,3 +166,6 @@

651 ===================================================================	11566 » pop» ebp

652 --- simd/jdmrgss2.asm» (revision 829)	11567 » ret

653 +++ simd/jdmrgss2.asm» (working copy)	11568

654 @@ -40,7 +40,7 @@	11569 +; For some reason, the OS X linker does not honor the request to align the

655 %define gotptr»» wk(0)-SIZEOF_POINTER» ; void * gotptr	11570 +; segment unless we do this.

	11571 +» align» 16

	11572 Index: simd/jcqnts2i-64.asm

	11573 ===================================================================

	11574 --- simd/jcqnts2i-64.asm» (revision 829)

	11575 +++ simd/jcqnts2i-64.asm» (working copy)

	11576 @@ -1,5 +1,5 @@

	11577 ;

	11578 -; jcqnts2i.asm - sample data conversion and quantization (64-bit SSE2)

	11579 +; jcqnts2i-64.asm - sample data conversion and quantization (64-bit SSE2)

	11580 ;

	11581 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB

	11582 ; Copyright 2009 D. R. Commander

	11583 @@ -36,13 +36,14 @@

	11584 ; r12 = DCTELEM * workspace

656	11585

657 align 16	11586 align 16

658 -» global» EXTN(jsimd_h2v1_merged_upsample_sse2)	11587 -» global» EXTN(jsimd_convsamp_sse2)

659 +» global» EXTN(jsimd_h2v1_merged_upsample_sse2) PRIVATE	11588 +» global» EXTN(jsimd_convsamp_sse2) PRIVATE

660	11589

661 EXTN(jsimd_h2v1_merged_upsample_sse2):	11590 EXTN(jsimd_convsamp_sse2):

662 » push» ebp	11591 » push» rbp

663 @@ -560,7 +560,7 @@	11592 +» mov» rax,rsp

664 %define output_buf(b)» » (b)+20» » ; JSAMPARRAY output_buf	11593 » mov» rbp,rsp

	11594 +» collect_args

	11595 » push» rbx

	11596 -» collect_args

	11597

	11598 » pxor» xmm6,xmm6» » ; xmm6=(all 0's)

	11599 » pcmpeqw»xmm7,xmm7

	11600 @@ -84,8 +85,8 @@

	11601 » dec» rcx

	11602 » jnz» short .convloop

	11603

	11604 +» pop» rbx

	11605 » uncollect_args

	11606 -» pop» rbx

	11607 » pop» rbp

	11608 » ret

	11609

	11610 @@ -111,10 +112,11 @@

	11611 ; r12 = DCTELEM * workspace

665	11612

666 align 16	11613 align 16

667 -» global» EXTN(jsimd_h2v2_merged_upsample_sse2)	11614 -» global» EXTN(jsimd_quantize_sse2)

668 +» global» EXTN(jsimd_h2v2_merged_upsample_sse2) PRIVATE	11615 +» global» EXTN(jsimd_quantize_sse2) PRIVATE

669	11616

670 EXTN(jsimd_h2v2_merged_upsample_sse2):	11617 EXTN(jsimd_quantize_sse2):

671 » push» ebp

672 Index: simd/jfmmxint.asm

673 ===================================================================

674 --- simd/jfmmxint.asm» (revision 829)

675 +++ simd/jfmmxint.asm» (working copy)

676 @@ -66,7 +66,7 @@

677 » SECTION»SEG_CONST

678

679 » alignz» 16

680 -» global» EXTN(jconst_fdct_islow_mmx)

681 +» global» EXTN(jconst_fdct_islow_mmx) PRIVATE

682

683 EXTN(jconst_fdct_islow_mmx):

684

685 @@ -101,7 +101,7 @@

686 %define WK_NUM»» 2

687

688 » align» 16

689 -» global» EXTN(jsimd_fdct_islow_mmx)

690 +» global» EXTN(jsimd_fdct_islow_mmx) PRIVATE

691

692 EXTN(jsimd_fdct_islow_mmx):

693 » push» ebp

694 Index: simd/jcgryss2-64.asm

695 ===================================================================

696 --- simd/jcgryss2-64.asm» (revision 829)

697 +++ simd/jcgryss2-64.asm» (working copy)

698 @@ -37,7 +37,7 @@

699

700 » align» 16

701

702 -» global» EXTN(jsimd_rgb_gray_convert_sse2)

703 +» global» EXTN(jsimd_rgb_gray_convert_sse2) PRIVATE

704

705 EXTN(jsimd_rgb_gray_convert_sse2):

706 push rbp	11618 push rbp

	11619 + mov rax,rsp

	11620 mov rbp,rsp

	11621 collect_args

	11622

	11623 @@ -179,3 +181,7 @@

	11624 uncollect_args

	11625 pop rbp

	11626 ret

	11627 +

	11628 +; For some reason, the OS X linker does not honor the request to align the

	11629 +; segment unless we do this.

	11630 + align 16

707 Index: simd/jcqnts2i.asm	11631 Index: simd/jcqnts2i.asm

708 ===================================================================	11632 ===================================================================

709 --- simd/jcqnts2i.asm (revision 829)	11633 --- simd/jcqnts2i.asm (revision 829)

710 +++ simd/jcqnts2i.asm (working copy)	11634 +++ simd/jcqnts2i.asm (working copy)

711 @@ -35,7 +35,7 @@	11635 @@ -35,7 +35,7 @@

712 %define workspace ebp+16 ; DCTELEM * workspace	11636 %define workspace ebp+16 ; DCTELEM * workspace

713	11637

714 align 16	11638 align 16

715 - global EXTN(jsimd_convsamp_sse2)	11639 - global EXTN(jsimd_convsamp_sse2)

716 + global EXTN(jsimd_convsamp_sse2) PRIVATE	11640 + global EXTN(jsimd_convsamp_sse2) PRIVATE

717	11641

718 EXTN(jsimd_convsamp_sse2):	11642 EXTN(jsimd_convsamp_sse2):

719 push ebp	11643 push ebp

720 @@ -117,7 +117,7 @@	11644 @@ -117,7 +117,7 @@

721 %define workspace ebp+16 ; DCTELEM * workspace	11645 %define workspace ebp+16 ; DCTELEM * workspace

722	11646

723 align 16	11647 align 16

724 - global EXTN(jsimd_quantize_sse2)	11648 - global EXTN(jsimd_quantize_sse2)

725 + global EXTN(jsimd_quantize_sse2) PRIVATE	11649 + global EXTN(jsimd_quantize_sse2) PRIVATE

726	11650

727 EXTN(jsimd_quantize_sse2):	11651 EXTN(jsimd_quantize_sse2):

728 push ebp	11652 push ebp

729 Index: simd/jiss2fst-64.asm	11653 @@ -195,3 +195,6 @@

	11654 » pop» ebp

	11655 » ret

	11656

	11657 +; For some reason, the OS X linker does not honor the request to align the

	11658 +; segment unless we do this.

	11659 +» align» 16

	11660 Index: simd/jcqntsse.asm

730 ===================================================================	11661 ===================================================================

731 --- simd/jiss2fst-64.asm» (revision 829)	11662 --- simd/jcqntsse.asm» (revision 829)

732 +++ simd/jiss2fst-64.asm» (working copy)	11663 +++ simd/jcqntsse.asm» (working copy)

733 @@ -60,7 +60,7 @@	11664 @@ -35,7 +35,7 @@

734 %define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS)	11665 %define workspace» ebp+16» » ; FAST_FLOAT * workspace

735

736 » alignz» 16

737 -» global» EXTN(jconst_idct_ifast_sse2)

738 +» global» EXTN(jconst_idct_ifast_sse2) PRIVATE

739

740 EXTN(jconst_idct_ifast_sse2):

741

742 @@ -93,7 +93,7 @@

743 %define WK_NUM»» 2

744	11666

745 align 16	11667 align 16

746 -» global» EXTN(jsimd_idct_ifast_sse2)	11668 -» global» EXTN(jsimd_convsamp_float_sse)

747 +» global» EXTN(jsimd_idct_ifast_sse2) PRIVATE	11669 +» global» EXTN(jsimd_convsamp_float_sse) PRIVATE

748	11670

749 EXTN(jsimd_idct_ifast_sse2):	11671 EXTN(jsimd_convsamp_float_sse):

750 » push» rbp	11672 » push» ebp

751 Index: simd/jiss2flt.asm	11673 @@ -138,7 +138,7 @@

752 ===================================================================	11674 %define workspace» ebp+16» » ; FAST_FLOAT * workspace

753 --- simd/jiss2flt.asm» (revision 829)

754 +++ simd/jiss2flt.asm» (working copy)

755 @@ -37,7 +37,7 @@

756 » SECTION»SEG_CONST

757

758 » alignz» 16

759 -» global» EXTN(jconst_idct_float_sse2)

760 +» global» EXTN(jconst_idct_float_sse2) PRIVATE

761

762 EXTN(jconst_idct_float_sse2):

763

764 @@ -73,7 +73,7 @@

765 » » » » » ; FAST_FLOAT workspace[DCTSIZE2]

766	11675

767 align 16	11676 align 16

768 -» global» EXTN(jsimd_idct_float_sse2)	11677 -» global» EXTN(jsimd_quantize_float_sse)

769 +» global» EXTN(jsimd_idct_float_sse2) PRIVATE	11678 +» global» EXTN(jsimd_quantize_float_sse) PRIVATE

770	11679

771 EXTN(jsimd_idct_float_sse2):	11680 EXTN(jsimd_quantize_float_sse):

772 push ebp	11681 push ebp

773 Index: simd/jiss2int.asm	11682 @@ -206,3 +206,6 @@

	11683 » pop» ebp

	11684 » ret

	11685

	11686 +; For some reason, the OS X linker does not honor the request to align the

	11687 +; segment unless we do this.

	11688 +» align» 16

	11689 Index: simd/jcsammmx.asm

774 ===================================================================	11690 ===================================================================

775 --- simd/jiss2int.asm» (revision 829)	11691 --- simd/jcsammmx.asm» (revision 829)

776 +++ simd/jiss2int.asm» (working copy)	11692 +++ simd/jcsammmx.asm» (working copy)

777 @@ -66,7 +66,7 @@	11693 @@ -40,7 +40,7 @@

778 » SECTION»SEG_CONST	11694 %define output_data(b)»(b)+28» ; JSAMPARRAY output_data

779

780 » alignz» 16

781 -» global» EXTN(jconst_idct_islow_sse2)

782 +» global» EXTN(jconst_idct_islow_sse2) PRIVATE

783

784 EXTN(jconst_idct_islow_sse2):

785

786 @@ -105,7 +105,7 @@

787 %define WK_NUM»» 12

788	11695

789 align 16	11696 align 16

790 -» global» EXTN(jsimd_idct_islow_sse2)	11697 -» global» EXTN(jsimd_h2v1_downsample_mmx)

791 +» global» EXTN(jsimd_idct_islow_sse2) PRIVATE	11698 +» global» EXTN(jsimd_h2v1_downsample_mmx) PRIVATE

792	11699

793 EXTN(jsimd_idct_islow_sse2):	11700 EXTN(jsimd_h2v1_downsample_mmx):

794 push ebp	11701 push ebp

795 Index: simd/jfsseflt-64.asm	11702 @@ -95,7 +95,7 @@

796 ===================================================================

797 --- simd/jfsseflt-64.asm» (revision 829)

798 +++ simd/jfsseflt-64.asm» (working copy)

799 @@ -38,7 +38,7 @@

800 » SECTION»SEG_CONST

801	11703

802 » alignz» 16	11704 » mov» eax, JDIMENSION [v_samp(ebp)]» ; rowctr

803 -» global» EXTN(jconst_fdct_float_sse)	11705 » test» eax,eax

804 +» global» EXTN(jconst_fdct_float_sse) PRIVATE	11706 -» jle» short .return

	11707 +» jle» near .return

805	11708

806 EXTN(jconst_fdct_float_sse):	11709 » mov edx, 0x00010000» ; bias pattern

807	11710 » movd mm7,edx

808 @@ -65,7 +65,7 @@	11711 @@ -182,7 +182,7 @@

809 %define WK_NUM»» 2	11712 %define output_data(b)»(b)+28» ; JSAMPARRAY output_data

810	11713

811 align 16	11714 align 16

812 -» global» EXTN(jsimd_fdct_float_sse)	11715 -» global» EXTN(jsimd_h2v2_downsample_mmx)

813 +» global» EXTN(jsimd_fdct_float_sse) PRIVATE	11716 +» global» EXTN(jsimd_h2v2_downsample_mmx) PRIVATE

814	11717

815 EXTN(jsimd_fdct_float_sse):	11718 EXTN(jsimd_h2v2_downsample_mmx):

816 » push» rbp	11719 » push» ebp

817 Index: simd/jccolss2-64.asm	11720 @@ -319,3 +319,6 @@

818 ===================================================================	11721 » pop» ebp

819 --- simd/jccolss2-64.asm» (revision 829)	11722 » ret

820 +++ simd/jccolss2-64.asm» (working copy)

821 @@ -34,7 +34,7 @@

822 » SECTION»SEG_CONST

823	11723

824 » alignz» 16	11724 +; For some reason, the OS X linker does not honor the request to align the

825 -» global» EXTN(jconst_rgb_ycc_convert_sse2)	11725 +; segment unless we do this.

826 +» global» EXTN(jconst_rgb_ycc_convert_sse2) PRIVATE	11726 +» align» 16

827

828 EXTN(jconst_rgb_ycc_convert_sse2):

829

830 Index: simd/jcsamss2-64.asm	11727 Index: simd/jcsamss2-64.asm

831 ===================================================================	11728 ===================================================================

832 --- simd/jcsamss2-64.asm (revision 829)	11729 --- simd/jcsamss2-64.asm (revision 829)

833 +++ simd/jcsamss2-64.asm (working copy)	11730 +++ simd/jcsamss2-64.asm (working copy)

834 @@ -41,7 +41,7 @@	11731 @@ -1,5 +1,5 @@

	11732 ;

	11733 -; jcsamss2.asm - downsampling (64-bit SSE2)

	11734 +; jcsamss2-64.asm - downsampling (64-bit SSE2)

	11735 ;

	11736 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB

	11737 ; Copyright 2009 D. R. Commander

	11738 @@ -41,10 +41,11 @@

835 ; r15 = JSAMPARRAY output_data	11739 ; r15 = JSAMPARRAY output_data

836	11740

837 align 16	11741 align 16

838 - global EXTN(jsimd_h2v1_downsample_sse2)	11742 - global EXTN(jsimd_h2v1_downsample_sse2)

839 + global EXTN(jsimd_h2v1_downsample_sse2) PRIVATE	11743 + global EXTN(jsimd_h2v1_downsample_sse2) PRIVATE

840	11744

841 EXTN(jsimd_h2v1_downsample_sse2):	11745 EXTN(jsimd_h2v1_downsample_sse2):

842 push rbp	11746 push rbp

843 @@ -185,7 +185,7 @@	11747 +» mov» rax,rsp

	11748 » mov» rbp,rsp

	11749 » collect_args

	11750

	11751 @@ -184,10 +185,11 @@

844 ; r15 = JSAMPARRAY output_data	11752 ; r15 = JSAMPARRAY output_data

845	11753

846 align 16	11754 align 16

847 - global EXTN(jsimd_h2v2_downsample_sse2)	11755 - global EXTN(jsimd_h2v2_downsample_sse2)

848 + global EXTN(jsimd_h2v2_downsample_sse2) PRIVATE	11756 + global EXTN(jsimd_h2v2_downsample_sse2) PRIVATE

849	11757

850 EXTN(jsimd_h2v2_downsample_sse2):	11758 EXTN(jsimd_h2v2_downsample_sse2):

851 push rbp	11759 push rbp

	11760 + mov rax,rsp

	11761 mov rbp,rsp

	11762 collect_args

	11763

	11764 @@ -322,3 +324,7 @@

	11765 uncollect_args

	11766 pop rbp

	11767 ret

	11768 +

	11769 +; For some reason, the OS X linker does not honor the request to align the

	11770 +; segment unless we do this.

	11771 + align 16

	11772 Index: simd/jcsamss2.asm

	11773 ===================================================================

	11774 --- simd/jcsamss2.asm (revision 829)

	11775 +++ simd/jcsamss2.asm (working copy)

	11776 @@ -40,7 +40,7 @@

	11777 %define output_data(b) (b)+28 ; JSAMPARRAY output_data

	11778

	11779 align 16

	11780 - global EXTN(jsimd_h2v1_downsample_sse2)

	11781 + global EXTN(jsimd_h2v1_downsample_sse2) PRIVATE

	11782

	11783 EXTN(jsimd_h2v1_downsample_sse2):

	11784 push ebp

	11785 @@ -195,7 +195,7 @@

	11786 %define output_data(b) (b)+28 ; JSAMPARRAY output_data

	11787

	11788 align 16

	11789 - global EXTN(jsimd_h2v2_downsample_sse2)

	11790 + global EXTN(jsimd_h2v2_downsample_sse2) PRIVATE

	11791

	11792 EXTN(jsimd_h2v2_downsample_sse2):

	11793 push ebp

	11794 @@ -346,3 +346,6 @@

	11795 pop ebp

	11796 ret

	11797

	11798 +; For some reason, the OS X linker does not honor the request to align the

	11799 +; segment unless we do this.

	11800 + align 16

	11801 Index: simd/jdclrmmx.asm

	11802 ===================================================================

	11803 --- simd/jdclrmmx.asm (revision 829)

	11804 +++ simd/jdclrmmx.asm (working copy)

	11805 @@ -19,8 +19,6 @@

	11806 %include "jcolsamp.inc"

	11807

	11808 ; --------------------------------------------------------------------------

	11809 - SECTION SEG_TEXT

	11810 - BITS 32

	11811 ;

	11812 ; Convert some rows of samples to the output colorspace.

	11813 ;

	11814 @@ -42,7 +40,7 @@

	11815 %define gotptr wk(0)-SIZEOF_POINTER ; void * gotptr

	11816

	11817 align 16

	11818 - global EXTN(jsimd_ycc_rgb_convert_mmx)

	11819 + global EXTN(jsimd_ycc_rgb_convert_mmx) PRIVATE

	11820

	11821 EXTN(jsimd_ycc_rgb_convert_mmx):

	11822 push ebp

	11823 @@ -402,3 +400,6 @@

	11824 pop ebp

	11825 ret

	11826

	11827 +; For some reason, the OS X linker does not honor the request to align the

	11828 +; segment unless we do this.

	11829 + align 16

852 Index: simd/jdclrss2-64.asm	11830 Index: simd/jdclrss2-64.asm

853 ===================================================================	11831 ===================================================================

854 --- simd/jdclrss2-64.asm (revision 829)	11832 --- simd/jdclrss2-64.asm (revision 829)

855 +++ simd/jdclrss2-64.asm (working copy)	11833 +++ simd/jdclrss2-64.asm (working copy)

856 @@ -39,7 +39,7 @@	11834 @@ -1,8 +1,8 @@

	11835 ;

	11836 -; jdclrss2.asm - colorspace conversion (64-bit SSE2)

	11837 +; jdclrss2-64.asm - colorspace conversion (64-bit SSE2)

	11838 ;

	11839 -; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB

	11840 -; Copyright 2009 D. R. Commander

	11841 +; Copyright 2009, 2012 Pierre Ossman <ossman@cendio.se> for Cendio AB

	11842 +; Copyright 2009, 2012 D. R. Commander

	11843 ;

	11844 ; Based on

	11845 ; x86 SIMD extension for IJG JPEG library

	11846 @@ -20,8 +20,6 @@

	11847 %include "jcolsamp.inc"

	11848 » » » »

	11849 ; --------------------------------------------------------------------------

	11850 -» SECTION»SEG_TEXT

	11851 -» BITS» 64

	11852 ;

	11853 ; Convert some rows of samples to the output colorspace.

	11854 ;

	11855 @@ -41,7 +39,7 @@

857 %define WK_NUM 2	11856 %define WK_NUM 2

858	11857

859 align 16	11858 align 16

860 - global EXTN(jsimd_ycc_rgb_convert_sse2)	11859 - global EXTN(jsimd_ycc_rgb_convert_sse2)

861 + global EXTN(jsimd_ycc_rgb_convert_sse2) PRIVATE	11860 + global EXTN(jsimd_ycc_rgb_convert_sse2) PRIVATE

862	11861

863 EXTN(jsimd_ycc_rgb_convert_sse2):	11862 EXTN(jsimd_ycc_rgb_convert_sse2):

864 push rbp	11863 push rbp

	11864 @@ -51,8 +49,8 @@

	11865 mov [rsp],rax

	11866 mov rbp,rsp ; rbp = aligned rbp

	11867 lea rsp, [wk(0)]

	11868 + collect_args

	11869 push rbx

	11870 - collect_args

	11871

	11872 mov rcx, r10 ; num_cols

	11873 test rcx,rcx

	11874 @@ -72,7 +70,7 @@

	11875 pop rcx

	11876

	11877 mov rdi, r13

	11878 - mov rax, r14

	11879 + mov eax, r14d

	11880 test rax,rax

	11881 jle near .return

	11882 .rowloop:

	11883 @@ -253,17 +251,13 @@

	11884 movntdq XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA

	11885 movntdq XMMWORD [rdi+1*SIZEOF_XMMWORD], xmmD

	11886 movntdq XMMWORD [rdi+2*SIZEOF_XMMWORD], xmmF

	11887 - add rdi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr

	11888 jmp short .out0

	11889 .out1: ; --(unaligned)-----------------

	11890 - pcmpeqb xmmH,xmmH ; xmmH=(all 1's)

	11891 - maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [rdi], xmmA

	11892 - add rdi, byte SIZEOF_XMMWORD ; outptr

	11893 - maskmovdqu xmmD,xmmH ; movntdqu XMMWORD [rdi], xmmD

	11894 - add rdi, byte SIZEOF_XMMWORD ; outptr

	11895 - maskmovdqu xmmF,xmmH ; movntdqu XMMWORD [rdi], xmmF

	11896 - add rdi, byte SIZEOF_XMMWORD ; outptr

	11897 + movdqu XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA

	11898 + movdqu XMMWORD [rdi+1*SIZEOF_XMMWORD], xmmD

	11899 + movdqu XMMWORD [rdi+2*SIZEOF_XMMWORD], xmmF

	11900 .out0:

	11901 + add rdi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr

	11902 sub rcx, byte SIZEOF_XMMWORD

	11903 jz near .nextrow

	11904

	11905 @@ -273,14 +267,12 @@

	11906 jmp near .columnloop

	11907

	11908 .column_st32:

	11909 - pcmpeqb xmmH,xmmH ; xmmH=(all 1's)

	11910 lea rcx, [rcx+rcx*2] ; imul ecx, RGB_PIXELSIZE

	11911 cmp rcx, byte 2*SIZEOF_XMMWORD

	11912 jb short .column_st16

	11913 - maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [rdi], xmmA

	11914 - add rdi, byte SIZEOF_XMMWORD ; outptr

	11915 - maskmovdqu xmmD,xmmH ; movntdqu XMMWORD [rdi], xmmD

	11916 - add rdi, byte SIZEOF_XMMWORD ; outptr

	11917 + movdqu XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA

	11918 + movdqu XMMWORD [rdi+1*SIZEOF_XMMWORD], xmmD

	11919 + add rdi, byte 2*SIZEOF_XMMWORD ; outptr

	11920 movdqa xmmA,xmmF

	11921 sub rcx, byte 2*SIZEOF_XMMWORD

	11922 jmp short .column_st15

	11923 @@ -287,50 +279,44 @@

	11924 .column_st16:

	11925 cmp rcx, byte SIZEOF_XMMWORD

	11926 jb short .column_st15

	11927 - maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [rdi], xmmA

	11928 + movdqu XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA

	11929 add rdi, byte SIZEOF_XMMWORD ; outptr

	11930 movdqa xmmA,xmmD

	11931 sub rcx, byte SIZEOF_XMMWORD

	11932 .column_st15:

	11933 - mov rax,rcx

	11934 - xor rcx, byte 0x0F

	11935 - shl rcx, 2

	11936 - movd xmmB,ecx

	11937 - psrlq xmmH,4

	11938 - pcmpeqb xmmE,xmmE

	11939 - psrlq xmmH,xmmB

	11940 - psrlq xmmE,xmmB

	11941 - punpcklbw xmmE,xmmH

	11942 - ; ----------------

	11943 - mov rcx,rdi

	11944 - and rcx, byte SIZEOF_XMMWORD-1

	11945 - jz short .adj0

	11946 - add rax,rcx

	11947 - cmp rax, byte SIZEOF_XMMWORD

	11948 - ja short .adj0

	11949 - and rdi, byte (-SIZEOF_XMMWORD) ; align to 16-byte boundary

	11950 - shl rcx, 3 ; pslldq xmmA,ecx & pslldq xmmE,rcx

	11951 - movdqa xmmG,xmmA

	11952 - movdqa xmmC,xmmE

	11953 - pslldq xmmA, SIZEOF_XMMWORD/2

	11954 - pslldq xmmE, SIZEOF_XMMWORD/2

	11955 - movd xmmD,ecx

	11956 - sub rcx, byte (SIZEOF_XMMWORD/2)*BYTE_BIT

	11957 - jb short .adj1

	11958 - movd xmmF,ecx

	11959 - psllq xmmA,xmmF

	11960 - psllq xmmE,xmmF

	11961 - jmp short .adj0

	11962 -.adj1: neg ecx

	11963 - movd xmmF,ecx

	11964 - psrlq xmmA,xmmF

	11965 - psrlq xmmE,xmmF

	11966 - psllq xmmG,xmmD

	11967 - psllq xmmC,xmmD

	11968 - por xmmA,xmmG

	11969 - por xmmE,xmmC

	11970 -.adj0: ; ----------------

	11971 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [rdi], xmmA

	11972 + ; Store the lower 8 bytes of xmmA to the output when it has enough

	11973 + ; space.

	11974 + cmp rcx, byte SIZEOF_MMWORD

	11975 + jb short .column_st7

	11976 + movq XMM_MMWORD [rdi], xmmA

	11977 + add rdi, byte SIZEOF_MMWORD

	11978 + sub rcx, byte SIZEOF_MMWORD

	11979 + psrldq xmmA, SIZEOF_MMWORD

	11980 +.column_st7:

	11981 + ; Store the lower 4 bytes of xmmA to the output when it has enough

	11982 + ; space.

	11983 + cmp rcx, byte SIZEOF_DWORD

	11984 + jb short .column_st3

	11985 + movd XMM_DWORD [rdi], xmmA

	11986 + add rdi, byte SIZEOF_DWORD

	11987 + sub rcx, byte SIZEOF_DWORD

	11988 + psrldq xmmA, SIZEOF_DWORD

	11989 +.column_st3:

	11990 + ; Store the lower 2 bytes of rax to the output when it has enough

	11991 + ; space.

	11992 + movd eax, xmmA

	11993 + cmp rcx, byte SIZEOF_WORD

	11994 + jb short .column_st1

	11995 + mov WORD [rdi], ax

	11996 + add rdi, byte SIZEOF_WORD

	11997 + sub rcx, byte SIZEOF_WORD

	11998 + shr rax, 16

	11999 +.column_st1:

	12000 + ; Store the lower 1 byte of rax to the output when it has enough

	12001 + ; space.

	12002 + test rcx, rcx

	12003 + jz short .nextrow

	12004 + mov BYTE [rdi], al

	12005

	12006 %else ; RGB_PIXELSIZE == 4 ; -----------

	12007

	12008 @@ -375,19 +361,14 @@

	12009 movntdq XMMWORD [rdi+1*SIZEOF_XMMWORD], xmmD

	12010 movntdq XMMWORD [rdi+2*SIZEOF_XMMWORD], xmmC

	12011 movntdq XMMWORD [rdi+3*SIZEOF_XMMWORD], xmmH

	12012 - add rdi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr

	12013 jmp short .out0

	12014 .out1: ; --(unaligned)-----------------

	12015 - pcmpeqb xmmE,xmmE ; xmmE=(all 1's)

	12016 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [rdi], xmmA

	12017 - add rdi, byte SIZEOF_XMMWORD ; outptr

	12018 - maskmovdqu xmmD,xmmE ; movntdqu XMMWORD [rdi], xmmD

	12019 - add rdi, byte SIZEOF_XMMWORD ; outptr

	12020 - maskmovdqu xmmC,xmmE ; movntdqu XMMWORD [rdi], xmmC

	12021 - add rdi, byte SIZEOF_XMMWORD ; outptr

	12022 - maskmovdqu xmmH,xmmE ; movntdqu XMMWORD [rdi], xmmH

	12023 - add rdi, byte SIZEOF_XMMWORD ; outptr

	12024 + movdqu XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA

	12025 + movdqu XMMWORD [rdi+1*SIZEOF_XMMWORD], xmmD

	12026 + movdqu XMMWORD [rdi+2*SIZEOF_XMMWORD], xmmC

	12027 + movdqu XMMWORD [rdi+3*SIZEOF_XMMWORD], xmmH

	12028 .out0:

	12029 + add rdi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr

	12030 sub rcx, byte SIZEOF_XMMWORD

	12031 jz near .nextrow

	12032

	12033 @@ -397,13 +378,11 @@

	12034 jmp near .columnloop

	12035

	12036 .column_st32:

	12037 - pcmpeqb xmmE,xmmE ; xmmE=(all 1's)

	12038 cmp rcx, byte SIZEOF_XMMWORD/2

	12039 jb short .column_st16

	12040 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [rdi], xmmA

	12041 - add rdi, byte SIZEOF_XMMWORD ; outptr

	12042 - maskmovdqu xmmD,xmmE ; movntdqu XMMWORD [rdi], xmmD

	12043 - add rdi, byte SIZEOF_XMMWORD ; outptr

	12044 + movdqu XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA

	12045 + movdqu XMMWORD [rdi+1*SIZEOF_XMMWORD], xmmD

	12046 + add rdi, byte 2*SIZEOF_XMMWORD ; outptr

	12047 movdqa xmmA,xmmC

	12048 movdqa xmmD,xmmH

	12049 sub rcx, byte SIZEOF_XMMWORD/2

	12050 @@ -410,50 +389,25 @@

	12051 .column_st16:

	12052 cmp rcx, byte SIZEOF_XMMWORD/4

	12053 jb short .column_st15

	12054 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [rdi], xmmA

	12055 + movdqu XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA

	12056 add rdi, byte SIZEOF_XMMWORD ; outptr

	12057 movdqa xmmA,xmmD

	12058 sub rcx, byte SIZEOF_XMMWORD/4

	12059 .column_st15:

	12060 - cmp rcx, byte SIZEOF_XMMWORD/16

	12061 - jb near .nextrow

	12062 - mov rax,rcx

	12063 - xor rcx, byte 0x03

	12064 - inc rcx

	12065 - shl rcx, 4

	12066 - movd xmmF,ecx

	12067 - psrlq xmmE,xmmF

	12068 - punpcklbw xmmE,xmmE

	12069 - ; ----------------

	12070 - mov rcx,rdi

	12071 - and rcx, byte SIZEOF_XMMWORD-1

	12072 - jz short .adj0

	12073 - lea rax, [rcx+rax*4] ; RGB_PIXELSIZE

	12074 - cmp rax, byte SIZEOF_XMMWORD

	12075 - ja short .adj0

	12076 - and rdi, byte (-SIZEOF_XMMWORD) ; align to 16-byte boundary

	12077 - shl rcx, 3 ; pslldq xmmA,ecx & pslldq xmmE,ecx

	12078 - movdqa xmmB,xmmA

	12079 - movdqa xmmG,xmmE

	12080 - pslldq xmmA, SIZEOF_XMMWORD/2

	12081 - pslldq xmmE, SIZEOF_XMMWORD/2

	12082 - movd xmmC,ecx

	12083 - sub rcx, byte (SIZEOF_XMMWORD/2)*BYTE_BIT

	12084 - jb short .adj1

	12085 - movd xmmH,ecx

	12086 - psllq xmmA,xmmH

	12087 - psllq xmmE,xmmH

	12088 - jmp short .adj0

	12089 -.adj1: neg rcx

	12090 - movd xmmH,ecx

	12091 - psrlq xmmA,xmmH

	12092 - psrlq xmmE,xmmH

	12093 - psllq xmmB,xmmC

	12094 - psllq xmmG,xmmC

	12095 - por xmmA,xmmB

	12096 - por xmmE,xmmG

	12097 -.adj0: ; ----------------

	12098 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [rdi], xmmA

	12099 + ; Store two pixels (8 bytes) of xmmA to the output when it has enough

	12100 + ; space.

	12101 + cmp rcx, byte SIZEOF_XMMWORD/8

	12102 + jb short .column_st7

	12103 + movq MMWORD [rdi], xmmA

	12104 + add rdi, byte SIZEOF_XMMWORD/8*4

	12105 + sub rcx, byte SIZEOF_XMMWORD/8

	12106 + psrldq xmmA, SIZEOF_XMMWORD/8*4

	12107 +.column_st7:

	12108 + ; Store one pixel (4 bytes) of xmmA to the output when it has enough

	12109 + ; space.

	12110 + test rcx, rcx

	12111 + jz short .nextrow

	12112 + movd XMM_DWORD [rdi], xmmA

	12113

	12114 %endif ; RGB_PIXELSIZE ; ---------------

	12115

	12116 @@ -475,9 +429,13 @@

	12117 sfence ; flush the write buffer

	12118

	12119 .return:

	12120 + pop rbx

	12121 uncollect_args

	12122 - pop rbx

	12123 mov rsp,rbp ; rsp <- aligned rbp

	12124 pop rsp ; rsp <- original rbp

	12125 pop rbp

	12126 ret

	12127 +

	12128 +; For some reason, the OS X linker does not honor the request to align the

	12129 +; segment unless we do this.

	12130 + align 16
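In the `jdclrss2-64.asm` hunk above, the `maskmovdqu`-based partial store at `.column_st15` is replaced by a cascade of ordinary stores. The code writes 8 bytes (`movq`), then 4 (`movd`), then 2 and 1 (through `eax`/`rax`). After each step it shifts `xmmA` right, so the next store always reads the low bytes, and it never writes past the end of the output row. The C sketch below illustrates that idea. It is not the library's code: the function name `store_tail` is hypothetical, the remaining count `n` is assumed to be below 16 (as it is after `.column_st16`), and advancing a source pointer stands in for the `psrldq` shifts.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the first n bytes (0 <= n < 16) of the 16-byte
 * vector v to dst using 8-, 4-, 2- and 1-byte stores, never touching
 * dst[n] or beyond. */
static void store_tail(uint8_t *dst, const uint8_t v[16], size_t n)
{
  const uint8_t *src = v;  /* advancing src models psrldq on xmmA */

  if (n >= 8) { memcpy(dst, src, 8); dst += 8; src += 8; n -= 8; }  /* movq */
  if (n >= 4) { memcpy(dst, src, 4); dst += 4; src += 4; n -= 4; }  /* movd */
  if (n >= 2) { memcpy(dst, src, 2); dst += 2; src += 2; n -= 2; }  /* mov WORD */
  if (n)      { *dst = *src; }                                      /* mov BYTE */
}
```

For the `RGB_PIXELSIZE == 4` path, the same hunk only needs the 8-byte (two pixels) and 4-byte (one pixel) steps, because the remainder is always a whole number of 4-byte pixels.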

	12131 Index: simd/jdclrss2.asm

	12132 ===================================================================

	12133 --- simd/jdclrss2.asm (revision 829)

	12134 +++ simd/jdclrss2.asm (working copy)

	12135 @@ -1,7 +1,8 @@

	12136 ;

	12137 ; jdclrss2.asm - colorspace conversion (SSE2)

	12138 ;

	12139 -; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB

	12140 +; Copyright 2009, 2012 Pierre Ossman <ossman@cendio.se> for Cendio AB

	12141 +; Copyright 2012 D. R. Commander

	12142 ;

	12143 ; Based on

	12144 ; x86 SIMD extension for IJG JPEG library

	12145 @@ -19,8 +20,6 @@

	12146 %include "jcolsamp.inc"

	12147

	12148 ; --------------------------------------------------------------------------

	12149 - SECTION SEG_TEXT

	12150 - BITS 32

	12151 ;

	12152 ; Convert some rows of samples to the output colorspace.

	12153 ;

	12154 @@ -42,7 +41,7 @@

	12155 %define gotptr wk(0)-SIZEOF_POINTER ; void * gotptr

	12156

	12157 align 16

	12158 - global EXTN(jsimd_ycc_rgb_convert_sse2)

	12159 + global EXTN(jsimd_ycc_rgb_convert_sse2) PRIVATE

	12160

	12161 EXTN(jsimd_ycc_rgb_convert_sse2):

	12162 push ebp

	12163 @@ -264,17 +263,13 @@

	12164 movntdq XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA

	12165 movntdq XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD

	12166 movntdq XMMWORD [edi+2*SIZEOF_XMMWORD], xmmF

	12167 - add edi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr

	12168 jmp short .out0

	12169 .out1: ; --(unaligned)-----------------

	12170 - pcmpeqb xmmH,xmmH ; xmmH=(all 1's)

	12171 - maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [edi], xmmA

	12172 - add edi, byte SIZEOF_XMMWORD ; outptr

	12173 - maskmovdqu xmmD,xmmH ; movntdqu XMMWORD [edi], xmmD

	12174 - add edi, byte SIZEOF_XMMWORD ; outptr

	12175 - maskmovdqu xmmF,xmmH ; movntdqu XMMWORD [edi], xmmF

	12176 - add edi, byte SIZEOF_XMMWORD ; outptr

	12177 + movdqu XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA

	12178 + movdqu XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD

	12179 + movdqu XMMWORD [edi+2*SIZEOF_XMMWORD], xmmF

	12180 .out0:

	12181 + add edi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr

	12182 sub ecx, byte SIZEOF_XMMWORD

	12183 jz near .nextrow

	12184

	12185 @@ -285,14 +280,12 @@

	12186 alignx 16,7

	12187

	12188 .column_st32:

	12189 - pcmpeqb xmmH,xmmH ; xmmH=(all 1's)

	12190 lea ecx, [ecx+ecx*2] ; imul ecx, RGB_PIXELSIZE

	12191 cmp ecx, byte 2*SIZEOF_XMMWORD

	12192 jb short .column_st16

	12193 - maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [edi], xmmA

	12194 - add edi, byte SIZEOF_XMMWORD ; outptr

	12195 - maskmovdqu xmmD,xmmH ; movntdqu XMMWORD [edi], xmmD

	12196 - add edi, byte SIZEOF_XMMWORD ; outptr

	12197 + movdqu XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA

	12198 + movdqu XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD

	12199 + add edi, byte 2*SIZEOF_XMMWORD ; outptr

	12200 movdqa xmmA,xmmF

	12201 sub ecx, byte 2*SIZEOF_XMMWORD

	12202 jmp short .column_st15

	12203 @@ -299,50 +292,44 @@

	12204 .column_st16:

	12205 cmp ecx, byte SIZEOF_XMMWORD

	12206 jb short .column_st15

	12207 - maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [edi], xmmA

	12208 + movdqu XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA

	12209 add edi, byte SIZEOF_XMMWORD ; outptr

	12210 movdqa xmmA,xmmD

	12211 sub ecx, byte SIZEOF_XMMWORD

	12212 .column_st15:

	12213 - mov eax,ecx

	12214 - xor ecx, byte 0x0F

	12215 - shl ecx, 2

	12216 - movd xmmB,ecx

	12217 - psrlq xmmH,4

	12218 - pcmpeqb xmmE,xmmE

	12219 - psrlq xmmH,xmmB

	12220 - psrlq xmmE,xmmB

	12221 - punpcklbw xmmE,xmmH

	12222 - ; ----------------

	12223 - mov ecx,edi

	12224 - and ecx, byte SIZEOF_XMMWORD-1

	12225 - jz short .adj0

	12226 - add eax,ecx

	12227 - cmp eax, byte SIZEOF_XMMWORD

	12228 - ja short .adj0

	12229 - and edi, byte (-SIZEOF_XMMWORD) ; align to 16-byte boundary

	12230 - shl ecx, 3 ; pslldq xmmA,ecx & pslldq xmmE,ecx

	12231 - movdqa xmmG,xmmA

	12232 - movdqa xmmC,xmmE

	12233 - pslldq xmmA, SIZEOF_XMMWORD/2

	12234 - pslldq xmmE, SIZEOF_XMMWORD/2

	12235 - movd xmmD,ecx

	12236 - sub ecx, byte (SIZEOF_XMMWORD/2)*BYTE_BIT

	12237 - jb short .adj1

	12238 - movd xmmF,ecx

	12239 - psllq xmmA,xmmF

	12240 - psllq xmmE,xmmF

	12241 - jmp short .adj0

	12242 -.adj1: neg ecx

	12243 - movd xmmF,ecx

	12244 - psrlq xmmA,xmmF

	12245 - psrlq xmmE,xmmF

	12246 - psllq xmmG,xmmD

	12247 - psllq xmmC,xmmD

	12248 - por xmmA,xmmG

	12249 - por xmmE,xmmC

	12250 -.adj0: ; ----------------

	12251 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA

	12252 + ; Store the lower 8 bytes of xmmA to the output when it has enough

	12253 + ; space.

	12254 + cmp ecx, byte SIZEOF_MMWORD

	12255 + jb short .column_st7

	12256 + movq XMM_MMWORD [edi], xmmA

	12257 + add edi, byte SIZEOF_MMWORD

	12258 + sub ecx, byte SIZEOF_MMWORD

	12259 + psrldq xmmA, SIZEOF_MMWORD

	12260 +.column_st7:

	12261 + ; Store the lower 4 bytes of xmmA to the output when it has enough

	12262 + ; space.

	12263 + cmp ecx, byte SIZEOF_DWORD

	12264 + jb short .column_st3

	12265 + movd XMM_DWORD [edi], xmmA

	12266 + add edi, byte SIZEOF_DWORD

	12267 + sub ecx, byte SIZEOF_DWORD

	12268 + psrldq xmmA, SIZEOF_DWORD

	12269 +.column_st3:

	12270 + ; Store the lower 2 bytes of eax to the output when it has enough

	12271 + ; space.

	12272 + movd eax, xmmA

	12273 + cmp ecx, byte SIZEOF_WORD

	12274 + jb short .column_st1

	12275 + mov WORD [edi], ax

	12276 + add edi, byte SIZEOF_WORD

	12277 + sub ecx, byte SIZEOF_WORD

	12278 + shr eax, 16

	12279 +.column_st1:

	12280 + ; Store the lower 1 byte of eax to the output when it has enough

	12281 + ; space.

	12282 + test ecx, ecx

	12283 + jz short .nextrow

	12284 + mov BYTE [edi], al

	12285

	12286 %else ; RGB_PIXELSIZE == 4 ; -----------

	12287

	12288 @@ -387,19 +374,14 @@

	12289 movntdq XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD

	12290 movntdq XMMWORD [edi+2*SIZEOF_XMMWORD], xmmC

	12291 movntdq XMMWORD [edi+3*SIZEOF_XMMWORD], xmmH

	12292 - add edi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr

	12293 jmp short .out0

	12294 .out1: ; --(unaligned)-----------------

	12295 - pcmpeqb xmmE,xmmE ; xmmE=(all 1's)

	12296 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA

	12297 - add edi, byte SIZEOF_XMMWORD ; outptr

	12298 - maskmovdqu xmmD,xmmE ; movntdqu XMMWORD [edi], xmmD

	12299 - add edi, byte SIZEOF_XMMWORD ; outptr

	12300 - maskmovdqu xmmC,xmmE ; movntdqu XMMWORD [edi], xmmC

	12301 - add edi, byte SIZEOF_XMMWORD ; outptr

	12302 - maskmovdqu xmmH,xmmE ; movntdqu XMMWORD [edi], xmmH

	12303 - add edi, byte SIZEOF_XMMWORD ; outptr

	12304 + movdqu XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA

	12305 + movdqu XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD

	12306 + movdqu XMMWORD [edi+2*SIZEOF_XMMWORD], xmmC

	12307 + movdqu XMMWORD [edi+3*SIZEOF_XMMWORD], xmmH

	12308 .out0:

	12309 + add edi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr

	12310 sub ecx, byte SIZEOF_XMMWORD

	12311 jz near .nextrow

	12312

	12313 @@ -410,13 +392,11 @@

	12314 alignx 16,7

	12315

	12316 .column_st32:

	12317 - pcmpeqb xmmE,xmmE ; xmmE=(all 1's)

	12318 cmp ecx, byte SIZEOF_XMMWORD/2

	12319 jb short .column_st16

	12320 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA

	12321 - add edi, byte SIZEOF_XMMWORD ; outptr

	12322 - maskmovdqu xmmD,xmmE ; movntdqu XMMWORD [edi], xmmD

	12323 - add edi, byte SIZEOF_XMMWORD ; outptr

	12324 + movdqu XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA

	12325 + movdqu XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD

	12326 + add edi, byte 2*SIZEOF_XMMWORD ; outptr

	12327 movdqa xmmA,xmmC

	12328 movdqa xmmD,xmmH

	12329 sub ecx, byte SIZEOF_XMMWORD/2

	12330 @@ -423,50 +403,25 @@

	12331 .column_st16:

	12332 cmp ecx, byte SIZEOF_XMMWORD/4

	12333 jb short .column_st15

	12334 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA

	12335 + movdqu XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA

	12336 add edi, byte SIZEOF_XMMWORD ; outptr

	12337 movdqa xmmA,xmmD

	12338 sub ecx, byte SIZEOF_XMMWORD/4

	12339 .column_st15:

	12340 - cmp ecx, byte SIZEOF_XMMWORD/16

	12341 - jb short .nextrow

	12342 - mov eax,ecx

	12343 - xor ecx, byte 0x03

	12344 - inc ecx

	12345 - shl ecx, 4

	12346 - movd xmmF,ecx

	12347 - psrlq xmmE,xmmF

	12348 - punpcklbw xmmE,xmmE

	12349 - ; ----------------

	12350 - mov ecx,edi

	12351 - and ecx, byte SIZEOF_XMMWORD-1

	12352 - jz short .adj0

	12353 - lea eax, [ecx+eax*4] ; RGB_PIXELSIZE

	12354 - cmp eax, byte SIZEOF_XMMWORD

	12355 - ja short .adj0

	12356 - and edi, byte (-SIZEOF_XMMWORD) ; align to 16-byte boundary

	12357 - shl ecx, 3 ; pslldq xmmA,ecx & pslldq xmmE,ecx

	12358 - movdqa xmmB,xmmA

	12359 - movdqa xmmG,xmmE

	12360 - pslldq xmmA, SIZEOF_XMMWORD/2

	12361 - pslldq xmmE, SIZEOF_XMMWORD/2

	12362 - movd xmmC,ecx

	12363 - sub ecx, byte (SIZEOF_XMMWORD/2)*BYTE_BIT

	12364 - jb short .adj1

	12365 - movd xmmH,ecx

	12366 - psllq xmmA,xmmH

	12367 - psllq xmmE,xmmH

	12368 - jmp short .adj0

	12369 -.adj1: neg ecx

	12370 - movd xmmH,ecx

	12371 - psrlq xmmA,xmmH

	12372 - psrlq xmmE,xmmH

	12373 - psllq xmmB,xmmC

	12374 - psllq xmmG,xmmC

	12375 - por xmmA,xmmB

	12376 - por xmmE,xmmG

	12377 -.adj0: ; ----------------

	12378 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA

	12379 + ; Store two pixels (8 bytes) of xmmA to the output when it has enough

	12380 + ; space.

	12381 + cmp ecx, byte SIZEOF_XMMWORD/8

	12382 + jb short .column_st7

	12383 + movq XMM_MMWORD [edi], xmmA

	12384 + add edi, byte SIZEOF_XMMWORD/8*4

	12385 + sub ecx, byte SIZEOF_XMMWORD/8

	12386 + psrldq xmmA, SIZEOF_XMMWORD/8*4

	12387 +.column_st7:

	12388 + ; Store one pixel (4 bytes) of xmmA to the output when it has enough

	12389 + ; space.

	12390 + test ecx, ecx

	12391 + jz short .nextrow

	12392 + movd XMM_DWORD [edi], xmmA

	12393

	12394 %endif ; RGB_PIXELSIZE ; ---------------

	12395

	12396 @@ -500,3 +455,6 @@

	12397 pop ebp

	12398 ret

	12399

	12400 +; For some reason, the OS X linker does not honor the request to align the

	12401 +; segment unless we do this.

	12402 + align 16

865 Index: simd/jdcolmmx.asm	12403 Index: simd/jdcolmmx.asm

866 ===================================================================	12404 ===================================================================

867 --- simd/jdcolmmx.asm (revision 829)	12405 --- simd/jdcolmmx.asm (revision 829)

868 +++ simd/jdcolmmx.asm (working copy)	12406 +++ simd/jdcolmmx.asm (working copy)

869 @@ -35,7 +35,7 @@	12407 @@ -35,7 +35,7 @@

870 SECTION SEG_CONST	12408 SECTION SEG_CONST

871	12409

872 alignz 16	12410 alignz 16

873 - global EXTN(jconst_ycc_rgb_convert_mmx)	12411 - global EXTN(jconst_ycc_rgb_convert_mmx)

874 + global EXTN(jconst_ycc_rgb_convert_mmx) PRIVATE	12412 + global EXTN(jconst_ycc_rgb_convert_mmx) PRIVATE

875	12413

876 EXTN(jconst_ycc_rgb_convert_mmx):	12414 EXTN(jconst_ycc_rgb_convert_mmx):

877	12415

878 Index: simd/jcclrmmx.asm	12416 @@ -48,6 +48,9 @@

	12417 » alignz» 16

	12418

	12419 ; --------------------------------------------------------------------------

	12420 +» SECTION»SEG_TEXT

	12421 +» BITS» 32

	12422 +

	12423 %include "jdclrmmx.asm"

	12424

	12425 %undef RGB_RED

	12426 @@ -54,10 +57,10 @@

	12427 %undef RGB_GREEN

	12428 %undef RGB_BLUE

	12429 %undef RGB_PIXELSIZE

	12430 -%define RGB_RED 0

	12431 -%define RGB_GREEN 1

	12432 -%define RGB_BLUE 2

	12433 -%define RGB_PIXELSIZE 3

	12434 +%define RGB_RED EXT_RGB_RED

	12435 +%define RGB_GREEN EXT_RGB_GREEN

	12436 +%define RGB_BLUE EXT_RGB_BLUE

	12437 +%define RGB_PIXELSIZE EXT_RGB_PIXELSIZE

	12438 %define jsimd_ycc_rgb_convert_mmx jsimd_ycc_extrgb_convert_mmx

	12439 %include "jdclrmmx.asm"

	12440

	12441 @@ -65,10 +68,10 @@

	12442 %undef RGB_GREEN

	12443 %undef RGB_BLUE

	12444 %undef RGB_PIXELSIZE

	12445 -%define RGB_RED 0

	12446 -%define RGB_GREEN 1

	12447 -%define RGB_BLUE 2

	12448 -%define RGB_PIXELSIZE 4

	12449 +%define RGB_RED EXT_RGBX_RED

	12450 +%define RGB_GREEN EXT_RGBX_GREEN

	12451 +%define RGB_BLUE EXT_RGBX_BLUE

	12452 +%define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE

	12453 %define jsimd_ycc_rgb_convert_mmx jsimd_ycc_extrgbx_convert_mmx

	12454 %include "jdclrmmx.asm"

	12455

	12456 @@ -76,10 +79,10 @@

	12457 %undef RGB_GREEN

	12458 %undef RGB_BLUE

	12459 %undef RGB_PIXELSIZE

	12460 -%define RGB_RED 2

	12461 -%define RGB_GREEN 1

	12462 -%define RGB_BLUE 0

	12463 -%define RGB_PIXELSIZE 3

	12464 +%define RGB_RED EXT_BGR_RED

	12465 +%define RGB_GREEN EXT_BGR_GREEN

	12466 +%define RGB_BLUE EXT_BGR_BLUE

	12467 +%define RGB_PIXELSIZE EXT_BGR_PIXELSIZE

	12468 %define jsimd_ycc_rgb_convert_mmx jsimd_ycc_extbgr_convert_mmx

	12469 %include "jdclrmmx.asm"

	12470

	12471 @@ -87,10 +90,10 @@

	12472 %undef RGB_GREEN

	12473 %undef RGB_BLUE

	12474 %undef RGB_PIXELSIZE

	12475 -%define RGB_RED 2

	12476 -%define RGB_GREEN 1

	12477 -%define RGB_BLUE 0

	12478 -%define RGB_PIXELSIZE 4

	12479 +%define RGB_RED EXT_BGRX_RED

	12480 +%define RGB_GREEN EXT_BGRX_GREEN

	12481 +%define RGB_BLUE EXT_BGRX_BLUE

	12482 +%define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE

	12483 %define jsimd_ycc_rgb_convert_mmx jsimd_ycc_extbgrx_convert_mmx

	12484 %include "jdclrmmx.asm"

	12485

	12486 @@ -98,10 +101,10 @@

	12487 %undef RGB_GREEN

	12488 %undef RGB_BLUE

	12489 %undef RGB_PIXELSIZE

	12490 -%define RGB_RED 3

	12491 -%define RGB_GREEN 2

	12492 -%define RGB_BLUE 1

	12493 -%define RGB_PIXELSIZE 4

	12494 +%define RGB_RED EXT_XBGR_RED

	12495 +%define RGB_GREEN EXT_XBGR_GREEN

	12496 +%define RGB_BLUE EXT_XBGR_BLUE

	12497 +%define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE

	12498 %define jsimd_ycc_rgb_convert_mmx jsimd_ycc_extxbgr_convert_mmx

	12499 %include "jdclrmmx.asm"

	12500

	12501 @@ -109,9 +112,9 @@

	12502 %undef RGB_GREEN

	12503 %undef RGB_BLUE

	12504 %undef RGB_PIXELSIZE

	12505 -%define RGB_RED 1

	12506 -%define RGB_GREEN 2

	12507 -%define RGB_BLUE 3

	12508 -%define RGB_PIXELSIZE 4

	12509 +%define RGB_RED EXT_XRGB_RED

	12510 +%define RGB_GREEN EXT_XRGB_GREEN

	12511 +%define RGB_BLUE EXT_XRGB_BLUE

	12512 +%define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE

	12513 %define jsimd_ycc_rgb_convert_mmx jsimd_ycc_extxrgb_convert_mmx

	12514 %include "jdclrmmx.asm"

	12515 Index: simd/jdcolss2-64.asm

879 ===================================================================	12516 ===================================================================

880 --- simd/jcclrmmx.asm» (revision 829)	12517 --- simd/jdcolss2-64.asm» (revision 829)

881 +++ simd/jcclrmmx.asm» (working copy)	12518 +++ simd/jdcolss2-64.asm» (working copy)

882 @@ -40,7 +40,7 @@	12519 @@ -1,5 +1,5 @@

883 %define gotptr»» wk(0)-SIZEOF_POINTER» ; void * gotptr	12520 ;

884	12521 -; jdcolss2.asm - colorspace conversion (64-bit SSE2)

885 » align» 16	12522 +; jdcolss2-64.asm - colorspace conversion (64-bit SSE2)

886 -» global» EXTN(jsimd_rgb_ycc_convert_mmx)	12523 ;

887 +» global» EXTN(jsimd_rgb_ycc_convert_mmx) PRIVATE	12524 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB

888	12525 ; Copyright 2009 D. R. Commander

889 EXTN(jsimd_rgb_ycc_convert_mmx):	12526 @@ -35,7 +35,7 @@

890 » push» ebp

891 Index: simd/jfsseflt.asm

892 ===================================================================

893 --- simd/jfsseflt.asm» (revision 829)

894 +++ simd/jfsseflt.asm» (working copy)

895 @@ -37,7 +37,7 @@

896 SECTION SEG_CONST	12527 SECTION SEG_CONST

897	12528

898 alignz 16	12529 alignz 16

899 -» global» EXTN(jconst_fdct_float_sse)	12530 -» global» EXTN(jconst_ycc_rgb_convert_sse2)

900 +» global» EXTN(jconst_fdct_float_sse) PRIVATE	12531 +» global» EXTN(jconst_ycc_rgb_convert_sse2) PRIVATE

901	12532

902 EXTN(jconst_fdct_float_sse):	12533 EXTN(jconst_ycc_rgb_convert_sse2):

903	12534

904 @@ -65,7 +65,7 @@	12535 @@ -48,6 +48,9 @@

905 %define WK_NUM»» 2	12536 » alignz» 16

906	12537

907 » align» 16	12538 ; --------------------------------------------------------------------------

908 -» global» EXTN(jsimd_fdct_float_sse)	12539 +» SECTION»SEG_TEXT

909 +» global» EXTN(jsimd_fdct_float_sse) PRIVATE	12540 +» BITS» 64

910	12541 +

911 EXTN(jsimd_fdct_float_sse):	12542 %include "jdclrss2-64.asm"

912 » push» ebp	12543

913 Index: simd/jdmrgss2-64.asm	12544 %undef RGB_RED

914 ===================================================================	12545 @@ -54,10 +57,10 @@

915 --- simd/jdmrgss2-64.asm» (revision 829)	12546 %undef RGB_GREEN

916 +++ simd/jdmrgss2-64.asm» (working copy)	12547 %undef RGB_BLUE

917 @@ -39,7 +39,7 @@	12548 %undef RGB_PIXELSIZE

918 %define WK_NUM»» 3	12549 -%define RGB_RED 0

919	12550 -%define RGB_GREEN 1

920 » align» 16	12551 -%define RGB_BLUE 2

921 -» global» EXTN(jsimd_h2v1_merged_upsample_sse2)	12552 -%define RGB_PIXELSIZE 3

922 +» global» EXTN(jsimd_h2v1_merged_upsample_sse2) PRIVATE	12553 +%define RGB_RED EXT_RGB_RED

923	12554 +%define RGB_GREEN EXT_RGB_GREEN

924 EXTN(jsimd_h2v1_merged_upsample_sse2):	12555 +%define RGB_BLUE EXT_RGB_BLUE

925 » push» rbp	12556 +%define RGB_PIXELSIZE EXT_RGB_PIXELSIZE

926 @@ -543,7 +543,7 @@	12557 %define jsimd_ycc_rgb_convert_sse2 jsimd_ycc_extrgb_convert_sse2

927 ; r13 = JSAMPARRAY output_buf	12558 %include "jdclrss2-64.asm"

928	12559

929 » align» 16	12560 @@ -65,10 +68,10 @@

930 -» global» EXTN(jsimd_h2v2_merged_upsample_sse2)	12561 %undef RGB_GREEN

931 +» global» EXTN(jsimd_h2v2_merged_upsample_sse2) PRIVATE	12562 %undef RGB_BLUE

932	12563 %undef RGB_PIXELSIZE

933 EXTN(jsimd_h2v2_merged_upsample_sse2):	12564 -%define RGB_RED 0

934 » push» rbp	12565 -%define RGB_GREEN 1

	12566 -%define RGB_BLUE 2

	12567 -%define RGB_PIXELSIZE 4

	12568 +%define RGB_RED EXT_RGBX_RED

	12569 +%define RGB_GREEN EXT_RGBX_GREEN

	12570 +%define RGB_BLUE EXT_RGBX_BLUE

	12571 +%define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE

	12572 %define jsimd_ycc_rgb_convert_sse2 jsimd_ycc_extrgbx_convert_sse2

	12573 %include "jdclrss2-64.asm"

	12574

	12575 @@ -76,10 +79,10 @@

	12576 %undef RGB_GREEN

	12577 %undef RGB_BLUE

	12578 %undef RGB_PIXELSIZE

	12579 -%define RGB_RED 2

	12580 -%define RGB_GREEN 1

	12581 -%define RGB_BLUE 0

	12582 -%define RGB_PIXELSIZE 3

	12583 +%define RGB_RED EXT_BGR_RED

	12584 +%define RGB_GREEN EXT_BGR_GREEN

	12585 +%define RGB_BLUE EXT_BGR_BLUE

	12586 +%define RGB_PIXELSIZE EXT_BGR_PIXELSIZE

	12587 %define jsimd_ycc_rgb_convert_sse2 jsimd_ycc_extbgr_convert_sse2

	12588 %include "jdclrss2-64.asm"

	12589

	12590 @@ -87,10 +90,10 @@

	12591 %undef RGB_GREEN

	12592 %undef RGB_BLUE

	12593 %undef RGB_PIXELSIZE

	12594 -%define RGB_RED 2

	12595 -%define RGB_GREEN 1

	12596 -%define RGB_BLUE 0

	12597 -%define RGB_PIXELSIZE 4

	12598 +%define RGB_RED EXT_BGRX_RED

	12599 +%define RGB_GREEN EXT_BGRX_GREEN

	12600 +%define RGB_BLUE EXT_BGRX_BLUE

	12601 +%define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE

	12602 %define jsimd_ycc_rgb_convert_sse2 jsimd_ycc_extbgrx_convert_sse2

	12603 %include "jdclrss2-64.asm"

	12604

	12605 @@ -98,10 +101,10 @@

	12606 %undef RGB_GREEN

	12607 %undef RGB_BLUE

	12608 %undef RGB_PIXELSIZE

	12609 -%define RGB_RED 3

	12610 -%define RGB_GREEN 2

	12611 -%define RGB_BLUE 1

	12612 -%define RGB_PIXELSIZE 4

	12613 +%define RGB_RED EXT_XBGR_RED

	12614 +%define RGB_GREEN EXT_XBGR_GREEN

	12615 +%define RGB_BLUE EXT_XBGR_BLUE

	12616 +%define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE

	12617 %define jsimd_ycc_rgb_convert_sse2 jsimd_ycc_extxbgr_convert_sse2

	12618 %include "jdclrss2-64.asm"

	12619

	12620 @@ -109,9 +112,9 @@

	12621 %undef RGB_GREEN

	12622 %undef RGB_BLUE

	12623 %undef RGB_PIXELSIZE

	12624 -%define RGB_RED 1

	12625 -%define RGB_GREEN 2

	12626 -%define RGB_BLUE 3

	12627 -%define RGB_PIXELSIZE 4

	12628 +%define RGB_RED EXT_XRGB_RED

	12629 +%define RGB_GREEN EXT_XRGB_GREEN

	12630 +%define RGB_BLUE EXT_XRGB_BLUE

	12631 +%define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE

	12632 %define jsimd_ycc_rgb_convert_sse2 jsimd_ycc_extxrgb_convert_sse2

	12633 %include "jdclrss2-64.asm"

935 Index: simd/jdcolss2.asm	12634 Index: simd/jdcolss2.asm

936 ===================================================================	12635 ===================================================================

937 --- simd/jdcolss2.asm (revision 829)	12636 --- simd/jdcolss2.asm (revision 829)

938 +++ simd/jdcolss2.asm (working copy)	12637 +++ simd/jdcolss2.asm (working copy)

939 @@ -35,7 +35,7 @@	12638 @@ -35,7 +35,7 @@

940 SECTION SEG_CONST	12639 SECTION SEG_CONST

941	12640

942 alignz 16	12641 alignz 16

943 - global EXTN(jconst_ycc_rgb_convert_sse2)	12642 - global EXTN(jconst_ycc_rgb_convert_sse2)

944 + global EXTN(jconst_ycc_rgb_convert_sse2) PRIVATE	12643 + global EXTN(jconst_ycc_rgb_convert_sse2) PRIVATE

945	12644

946 EXTN(jconst_ycc_rgb_convert_sse2):	12645 EXTN(jconst_ycc_rgb_convert_sse2):

947	12646

	12647 @@ -48,6 +48,9 @@

	12648 alignz 16

	12649

	12650 ; --------------------------------------------------------------------------

	12651 + SECTION SEG_TEXT

	12652 + BITS 32

	12653 +

	12654 %include "jdclrss2.asm"

	12655

	12656 %undef RGB_RED

	12657 @@ -54,10 +57,10 @@

	12658 %undef RGB_GREEN

	12659 %undef RGB_BLUE

	12660 %undef RGB_PIXELSIZE

	12661 -%define RGB_RED 0

	12662 -%define RGB_GREEN 1

	12663 -%define RGB_BLUE 2

	12664 -%define RGB_PIXELSIZE 3

	12665 +%define RGB_RED EXT_RGB_RED

	12666 +%define RGB_GREEN EXT_RGB_GREEN

	12667 +%define RGB_BLUE EXT_RGB_BLUE

	12668 +%define RGB_PIXELSIZE EXT_RGB_PIXELSIZE

	12669 %define jsimd_ycc_rgb_convert_sse2 jsimd_ycc_extrgb_convert_sse2

	12670 %include "jdclrss2.asm"

	12671

	12672 @@ -65,10 +68,10 @@

	12673 %undef RGB_GREEN

	12674 %undef RGB_BLUE

	12675 %undef RGB_PIXELSIZE

	12676 -%define RGB_RED 0

	12677 -%define RGB_GREEN 1

	12678 -%define RGB_BLUE 2

	12679 -%define RGB_PIXELSIZE 4

	12680 +%define RGB_RED EXT_RGBX_RED

	12681 +%define RGB_GREEN EXT_RGBX_GREEN

	12682 +%define RGB_BLUE EXT_RGBX_BLUE

	12683 +%define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE

	12684 %define jsimd_ycc_rgb_convert_sse2 jsimd_ycc_extrgbx_convert_sse2

	12685 %include "jdclrss2.asm"

	12686

	12687 @@ -76,10 +79,10 @@

	12688 %undef RGB_GREEN

	12689 %undef RGB_BLUE

	12690 %undef RGB_PIXELSIZE

	12691 -%define RGB_RED 2

	12692 -%define RGB_GREEN 1

	12693 -%define RGB_BLUE 0

	12694 -%define RGB_PIXELSIZE 3

	12695 +%define RGB_RED EXT_BGR_RED

	12696 +%define RGB_GREEN EXT_BGR_GREEN

	12697 +%define RGB_BLUE EXT_BGR_BLUE

	12698 +%define RGB_PIXELSIZE EXT_BGR_PIXELSIZE

	12699 %define jsimd_ycc_rgb_convert_sse2 jsimd_ycc_extbgr_convert_sse2

	12700 %include "jdclrss2.asm"

	12701

	12702 @@ -87,10 +90,10 @@

	12703 %undef RGB_GREEN

	12704 %undef RGB_BLUE

	12705 %undef RGB_PIXELSIZE

	12706 -%define RGB_RED 2

	12707 -%define RGB_GREEN 1

	12708 -%define RGB_BLUE 0

	12709 -%define RGB_PIXELSIZE 4

	12710 +%define RGB_RED EXT_BGRX_RED

	12711 +%define RGB_GREEN EXT_BGRX_GREEN

	12712 +%define RGB_BLUE EXT_BGRX_BLUE

	12713 +%define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE

	12714 %define jsimd_ycc_rgb_convert_sse2 jsimd_ycc_extbgrx_convert_sse2

	12715 %include "jdclrss2.asm"

	12716

	12717 @@ -98,10 +101,10 @@

	12718 %undef RGB_GREEN

	12719 %undef RGB_BLUE

	12720 %undef RGB_PIXELSIZE

	12721 -%define RGB_RED 3

	12722 -%define RGB_GREEN 2

	12723 -%define RGB_BLUE 1

	12724 -%define RGB_PIXELSIZE 4

	12725 +%define RGB_RED EXT_XBGR_RED

	12726 +%define RGB_GREEN EXT_XBGR_GREEN

	12727 +%define RGB_BLUE EXT_XBGR_BLUE

	12728 +%define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE

	12729 %define jsimd_ycc_rgb_convert_sse2 jsimd_ycc_extxbgr_convert_sse2

	12730 %include "jdclrss2.asm"

	12731

	12732 @@ -109,9 +112,9 @@

	12733 %undef RGB_GREEN

	12734 %undef RGB_BLUE

	12735 %undef RGB_PIXELSIZE

	12736 -%define RGB_RED 1

	12737 -%define RGB_GREEN 2

	12738 -%define RGB_BLUE 3

	12739 -%define RGB_PIXELSIZE 4

	12740 +%define RGB_RED EXT_XRGB_RED

	12741 +%define RGB_GREEN EXT_XRGB_GREEN

	12742 +%define RGB_BLUE EXT_XRGB_BLUE

	12743 +%define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE

	12744 %define jsimd_ycc_rgb_convert_sse2 jsimd_ycc_extxrgb_convert_sse2

	12745 %include "jdclrss2.asm"

948 Index: simd/jdmermmx.asm	12746 Index: simd/jdmermmx.asm

949 ===================================================================	12747 ===================================================================

950 --- simd/jdmermmx.asm (revision 829)	12748 --- simd/jdmermmx.asm (revision 829)

951 +++ simd/jdmermmx.asm (working copy)	12749 +++ simd/jdmermmx.asm (working copy)

952 @@ -35,7 +35,7 @@	12750 @@ -35,7 +35,7 @@

953 SECTION SEG_CONST	12751 SECTION SEG_CONST

954	12752

955 alignz 16	12753 alignz 16

956 - global EXTN(jconst_merged_upsample_mmx)	12754 - global EXTN(jconst_merged_upsample_mmx)

957 + global EXTN(jconst_merged_upsample_mmx) PRIVATE	12755 + global EXTN(jconst_merged_upsample_mmx) PRIVATE

958	12756

959 EXTN(jconst_merged_upsample_mmx):	12757 EXTN(jconst_merged_upsample_mmx):

960	12758

961 Index: simd/jcclrss2.asm	12759 @@ -48,6 +48,9 @@

	12760 » alignz» 16

	12761

	12762 ; --------------------------------------------------------------------------

	12763 +» SECTION»SEG_TEXT

	12764 +» BITS» 32

	12765 +

	12766 %include "jdmrgmmx.asm"

	12767

	12768 %undef RGB_RED

	12769 @@ -54,10 +57,10 @@

	12770 %undef RGB_GREEN

	12771 %undef RGB_BLUE

	12772 %undef RGB_PIXELSIZE

	12773 -%define RGB_RED 0

	12774 -%define RGB_GREEN 1

	12775 -%define RGB_BLUE 2

	12776 -%define RGB_PIXELSIZE 3

	12777 +%define RGB_RED EXT_RGB_RED

	12778 +%define RGB_GREEN EXT_RGB_GREEN

	12779 +%define RGB_BLUE EXT_RGB_BLUE

	12780 +%define RGB_PIXELSIZE EXT_RGB_PIXELSIZE

	12781 %define jsimd_h2v1_merged_upsample_mmx jsimd_h2v1_extrgb_merged_upsample_mmx

	12782 %define jsimd_h2v2_merged_upsample_mmx jsimd_h2v2_extrgb_merged_upsample_mmx

	12783 %include "jdmrgmmx.asm"

	12784 @@ -66,10 +69,10 @@

	12785 %undef RGB_GREEN

	12786 %undef RGB_BLUE

	12787 %undef RGB_PIXELSIZE

	12788 -%define RGB_RED 0

	12789 -%define RGB_GREEN 1

	12790 -%define RGB_BLUE 2

	12791 -%define RGB_PIXELSIZE 4

	12792 +%define RGB_RED EXT_RGBX_RED

	12793 +%define RGB_GREEN EXT_RGBX_GREEN

	12794 +%define RGB_BLUE EXT_RGBX_BLUE

	12795 +%define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE

	12796 %define jsimd_h2v1_merged_upsample_mmx jsimd_h2v1_extrgbx_merged_upsample_mmx

	12797 %define jsimd_h2v2_merged_upsample_mmx jsimd_h2v2_extrgbx_merged_upsample_mmx

	12798 %include "jdmrgmmx.asm"

	12799 @@ -78,10 +81,10 @@

	12800 %undef RGB_GREEN

	12801 %undef RGB_BLUE

	12802 %undef RGB_PIXELSIZE

	12803 -%define RGB_RED 2

	12804 -%define RGB_GREEN 1

	12805 -%define RGB_BLUE 0

	12806 -%define RGB_PIXELSIZE 3

	12807 +%define RGB_RED EXT_BGR_RED

	12808 +%define RGB_GREEN EXT_BGR_GREEN

	12809 +%define RGB_BLUE EXT_BGR_BLUE

	12810 +%define RGB_PIXELSIZE EXT_BGR_PIXELSIZE

	12811 %define jsimd_h2v1_merged_upsample_mmx jsimd_h2v1_extbgr_merged_upsample_mmx

	12812 %define jsimd_h2v2_merged_upsample_mmx jsimd_h2v2_extbgr_merged_upsample_mmx

	12813 %include "jdmrgmmx.asm"

	12814 @@ -90,10 +93,10 @@

	12815 %undef RGB_GREEN

	12816 %undef RGB_BLUE

	12817 %undef RGB_PIXELSIZE

	12818 -%define RGB_RED 2

	12819 -%define RGB_GREEN 1

	12820 -%define RGB_BLUE 0

	12821 -%define RGB_PIXELSIZE 4

	12822 +%define RGB_RED EXT_BGRX_RED

	12823 +%define RGB_GREEN EXT_BGRX_GREEN

	12824 +%define RGB_BLUE EXT_BGRX_BLUE

	12825 +%define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE

	12826 %define jsimd_h2v1_merged_upsample_mmx jsimd_h2v1_extbgrx_merged_upsample_mmx

	12827 %define jsimd_h2v2_merged_upsample_mmx jsimd_h2v2_extbgrx_merged_upsample_mmx

	12828 %include "jdmrgmmx.asm"

	12829 @@ -102,10 +105,10 @@

	12830 %undef RGB_GREEN

	12831 %undef RGB_BLUE

	12832 %undef RGB_PIXELSIZE

	12833 -%define RGB_RED 3

	12834 -%define RGB_GREEN 2

	12835 -%define RGB_BLUE 1

	12836 -%define RGB_PIXELSIZE 4

	12837 +%define RGB_RED EXT_XBGR_RED

	12838 +%define RGB_GREEN EXT_XBGR_GREEN

	12839 +%define RGB_BLUE EXT_XBGR_BLUE

	12840 +%define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE

	12841 %define jsimd_h2v1_merged_upsample_mmx jsimd_h2v1_extxbgr_merged_upsample_mmx

	12842 %define jsimd_h2v2_merged_upsample_mmx jsimd_h2v2_extxbgr_merged_upsample_mmx

	12843 %include "jdmrgmmx.asm"

	12844 @@ -114,10 +117,10 @@

	12845 %undef RGB_GREEN

	12846 %undef RGB_BLUE

	12847 %undef RGB_PIXELSIZE

	12848 -%define RGB_RED 1

	12849 -%define RGB_GREEN 2

	12850 -%define RGB_BLUE 3

	12851 -%define RGB_PIXELSIZE 4

	12852 +%define RGB_RED EXT_XRGB_RED

	12853 +%define RGB_GREEN EXT_XRGB_GREEN

	12854 +%define RGB_BLUE EXT_XRGB_BLUE

	12855 +%define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE

	12856 %define jsimd_h2v1_merged_upsample_mmx jsimd_h2v1_extxrgb_merged_upsample_mmx

	12857 %define jsimd_h2v2_merged_upsample_mmx jsimd_h2v2_extxrgb_merged_upsample_mmx

	12858 %include "jdmrgmmx.asm"

	12859 Index: simd/jdmerss2-64.asm

962 ===================================================================	12860 ===================================================================

963 --- simd/jcclrss2.asm» (revision 829)	12861 --- simd/jdmerss2-64.asm» (revision 829)

964 +++ simd/jcclrss2.asm» (working copy)	12862 +++ simd/jdmerss2-64.asm» (working copy)

965 @@ -38,7 +38,7 @@	12863 @@ -1,5 +1,5 @@

966	12864 ;

967 » align» 16	12865 -; jdmerss2.asm - merged upsampling/color conversion (64-bit SSE2)

968	12866 +; jdmerss2-64.asm - merged upsampling/color conversion (64-bit SSE2)

969 -» global» EXTN(jsimd_rgb_ycc_convert_sse2)	12867 ;

970 +» global» EXTN(jsimd_rgb_ycc_convert_sse2) PRIVATE	12868 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB

971	12869 ; Copyright 2009 D. R. Commander

972 EXTN(jsimd_rgb_ycc_convert_sse2):	12870 @@ -35,7 +35,7 @@

973 » push» ebp

974 Index: simd/jiss2red.asm

975 ===================================================================

976 --- simd/jiss2red.asm» (revision 829)

977 +++ simd/jiss2red.asm» (working copy)

978 @@ -72,7 +72,7 @@

979 SECTION SEG_CONST	12871 SECTION SEG_CONST

980	12872

981 alignz 16	12873 alignz 16

982 -» global» EXTN(jconst_idct_red_sse2)	12874 -» global» EXTN(jconst_merged_upsample_sse2)

983 +» global» EXTN(jconst_idct_red_sse2) PRIVATE	12875 +» global» EXTN(jconst_merged_upsample_sse2) PRIVATE

984	12876

985 EXTN(jconst_idct_red_sse2):	12877 EXTN(jconst_merged_upsample_sse2):

986	12878

987 @@ -113,7 +113,7 @@	12879 @@ -48,6 +48,9 @@

988 %define WK_NUM»» 2	12880 » alignz» 16

989	12881

990 » align» 16	12882 ; --------------------------------------------------------------------------

991 -» global» EXTN(jsimd_idct_4x4_sse2)	12883 +» SECTION»SEG_TEXT

992 +» global» EXTN(jsimd_idct_4x4_sse2) PRIVATE	12884 +» BITS» 64

993	12885 +

994 EXTN(jsimd_idct_4x4_sse2):	12886 %include "jdmrgss2-64.asm"

995 » push» ebp	12887

996 @@ -424,7 +424,7 @@	12888 %undef RGB_RED

997 %define output_col(b)» (b)+20» » ; JDIMENSION output_col	12889 @@ -54,10 +57,10 @@

998	12890 %undef RGB_GREEN

999 » align» 16	12891 %undef RGB_BLUE

1000 -» global» EXTN(jsimd_idct_2x2_sse2)	12892 %undef RGB_PIXELSIZE

1001 +» global» EXTN(jsimd_idct_2x2_sse2) PRIVATE	12893 -%define RGB_RED 0

1002	12894 -%define RGB_GREEN 1

1003 EXTN(jsimd_idct_2x2_sse2):	12895 -%define RGB_BLUE 2

1004 » push» ebp	12896 -%define RGB_PIXELSIZE 3

	12897 +%define RGB_RED EXT_RGB_RED

	12898 +%define RGB_GREEN EXT_RGB_GREEN

	12899 +%define RGB_BLUE EXT_RGB_BLUE

	12900 +%define RGB_PIXELSIZE EXT_RGB_PIXELSIZE

	12901 %define jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_extrgb_merged_upsample_sse2

	12902 %define jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_extrgb_merged_upsample_sse2

	12903 %include "jdmrgss2-64.asm"

	12904 @@ -66,10 +69,10 @@

	12905 %undef RGB_GREEN

	12906 %undef RGB_BLUE

	12907 %undef RGB_PIXELSIZE

	12908 -%define RGB_RED 0

	12909 -%define RGB_GREEN 1

	12910 -%define RGB_BLUE 2

	12911 -%define RGB_PIXELSIZE 4

	12912 +%define RGB_RED EXT_RGBX_RED

	12913 +%define RGB_GREEN EXT_RGBX_GREEN

	12914 +%define RGB_BLUE EXT_RGBX_BLUE

	12915 +%define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE

	12916 %define jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_extrgbx_merged_upsample_sse2

	12917 %define jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_extrgbx_merged_upsample_sse2

	12918 %include "jdmrgss2-64.asm"

	12919 @@ -78,10 +81,10 @@

	12920 %undef RGB_GREEN

	12921 %undef RGB_BLUE

	12922 %undef RGB_PIXELSIZE

	12923 -%define RGB_RED 2

	12924 -%define RGB_GREEN 1

	12925 -%define RGB_BLUE 0

	12926 -%define RGB_PIXELSIZE 3

	12927 +%define RGB_RED EXT_BGR_RED

	12928 +%define RGB_GREEN EXT_BGR_GREEN

	12929 +%define RGB_BLUE EXT_BGR_BLUE

	12930 +%define RGB_PIXELSIZE EXT_BGR_PIXELSIZE

	12931 %define jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_extbgr_merged_upsample_sse2

	12932 %define jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_extbgr_merged_upsample_sse2

	12933 %include "jdmrgss2-64.asm"

	12934 @@ -90,10 +93,10 @@

	12935 %undef RGB_GREEN

	12936 %undef RGB_BLUE

	12937 %undef RGB_PIXELSIZE

	12938 -%define RGB_RED 2

	12939 -%define RGB_GREEN 1

	12940 -%define RGB_BLUE 0

	12941 -%define RGB_PIXELSIZE 4

	12942 +%define RGB_RED EXT_BGRX_RED

	12943 +%define RGB_GREEN EXT_BGRX_GREEN

	12944 +%define RGB_BLUE EXT_BGRX_BLUE

	12945 +%define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE

	12946 %define jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_extbgrx_merged_upsample_sse2

	12947 %define jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_extbgrx_merged_upsample_sse2

	12948 %include "jdmrgss2-64.asm"

	12949 @@ -102,10 +105,10 @@

	12950 %undef RGB_GREEN

	12951 %undef RGB_BLUE

	12952 %undef RGB_PIXELSIZE

	12953 -%define RGB_RED 3

	12954 -%define RGB_GREEN 2

	12955 -%define RGB_BLUE 1

	12956 -%define RGB_PIXELSIZE 4

	12957 +%define RGB_RED EXT_XBGR_RED

	12958 +%define RGB_GREEN EXT_XBGR_GREEN

	12959 +%define RGB_BLUE EXT_XBGR_BLUE

	12960 +%define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE

	12961 %define jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_extxbgr_merged_upsample_sse2

	12962 %define jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_extxbgr_merged_upsample_sse2

	12963 %include "jdmrgss2-64.asm"

	12964 @@ -114,10 +117,10 @@

	12965 %undef RGB_GREEN

	12966 %undef RGB_BLUE

	12967 %undef RGB_PIXELSIZE

	12968 -%define RGB_RED 1

	12969 -%define RGB_GREEN 2

	12970 -%define RGB_BLUE 3

	12971 -%define RGB_PIXELSIZE 4

	12972 +%define RGB_RED EXT_XRGB_RED

	12973 +%define RGB_GREEN EXT_XRGB_GREEN

	12974 +%define RGB_BLUE EXT_XRGB_BLUE

	12975 +%define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE

	12976 %define jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_extxrgb_merged_upsample_sse2

	12977 %define jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_extxrgb_merged_upsample_sse2

	12978 %include "jdmrgss2-64.asm"

1005 Index: simd/jdmerss2.asm	12979 Index: simd/jdmerss2.asm

1006 ===================================================================	12980 ===================================================================

1007 --- simd/jdmerss2.asm (revision 829)	12981 --- simd/jdmerss2.asm (revision 829)

1008 +++ simd/jdmerss2.asm (working copy)	12982 +++ simd/jdmerss2.asm (working copy)

1009 @@ -35,7 +35,7 @@	12983 @@ -35,7 +35,7 @@

1010 SECTION SEG_CONST	12984 SECTION SEG_CONST

1011	12985

1012 alignz 16	12986 alignz 16

1013 - global EXTN(jconst_merged_upsample_sse2)	12987 - global EXTN(jconst_merged_upsample_sse2)

1014 + global EXTN(jconst_merged_upsample_sse2) PRIVATE	12988 + global EXTN(jconst_merged_upsample_sse2) PRIVATE

1015	12989

1016 EXTN(jconst_merged_upsample_sse2):	12990 EXTN(jconst_merged_upsample_sse2):

1017	12991

1018 Index: simd/jfss2fst-64.asm	12992 @@ -48,6 +48,9 @@

	12993 » alignz» 16

	12994

	12995 ; --------------------------------------------------------------------------

	12996 +» SECTION»SEG_TEXT

	12997 +» BITS» 32

	12998 +

	12999 %include "jdmrgss2.asm"

	13000

	13001 %undef RGB_RED

	13002 @@ -54,10 +57,10 @@

	13003 %undef RGB_GREEN

	13004 %undef RGB_BLUE

	13005 %undef RGB_PIXELSIZE

	13006 -%define RGB_RED 0

	13007 -%define RGB_GREEN 1

	13008 -%define RGB_BLUE 2

	13009 -%define RGB_PIXELSIZE 3

	13010 +%define RGB_RED EXT_RGB_RED

	13011 +%define RGB_GREEN EXT_RGB_GREEN

	13012 +%define RGB_BLUE EXT_RGB_BLUE

	13013 +%define RGB_PIXELSIZE EXT_RGB_PIXELSIZE

	13014 %define jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_extrgb_merged_upsample_sse2

	13015 %define jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_extrgb_merged_upsample_sse2

	13016 %include "jdmrgss2.asm"

	13017 @@ -66,10 +69,10 @@

	13018 %undef RGB_GREEN

	13019 %undef RGB_BLUE

	13020 %undef RGB_PIXELSIZE

	13021 -%define RGB_RED 0

	13022 -%define RGB_GREEN 1

	13023 -%define RGB_BLUE 2

	13024 -%define RGB_PIXELSIZE 4

	13025 +%define RGB_RED EXT_RGBX_RED

	13026 +%define RGB_GREEN EXT_RGBX_GREEN

	13027 +%define RGB_BLUE EXT_RGBX_BLUE

	13028 +%define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE

	13029 %define jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_extrgbx_merged_upsample_sse2

	13030 %define jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_extrgbx_merged_upsample_sse2

	13031 %include "jdmrgss2.asm"

	13032 @@ -78,10 +81,10 @@

	13033 %undef RGB_GREEN

	13034 %undef RGB_BLUE

	13035 %undef RGB_PIXELSIZE

	13036 -%define RGB_RED 2

	13037 -%define RGB_GREEN 1

	13038 -%define RGB_BLUE 0

	13039 -%define RGB_PIXELSIZE 3

	13040 +%define RGB_RED EXT_BGR_RED

	13041 +%define RGB_GREEN EXT_BGR_GREEN

	13042 +%define RGB_BLUE EXT_BGR_BLUE

	13043 +%define RGB_PIXELSIZE EXT_BGR_PIXELSIZE

	13044 %define jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_extbgr_merged_upsample_sse2

	13045 %define jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_extbgr_merged_upsample_sse2

	13046 %include "jdmrgss2.asm"

	13047 @@ -90,10 +93,10 @@

	13048 %undef RGB_GREEN

	13049 %undef RGB_BLUE

	13050 %undef RGB_PIXELSIZE

	13051 -%define RGB_RED 2

	13052 -%define RGB_GREEN 1

	13053 -%define RGB_BLUE 0

	13054 -%define RGB_PIXELSIZE 4

	13055 +%define RGB_RED EXT_BGRX_RED

	13056 +%define RGB_GREEN EXT_BGRX_GREEN

	13057 +%define RGB_BLUE EXT_BGRX_BLUE

	13058 +%define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE

	13059 %define jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_extbgrx_merged_upsample_sse2

	13060 %define jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_extbgrx_merged_upsample_sse2

	13061 %include "jdmrgss2.asm"

	13062 @@ -102,10 +105,10 @@

	13063 %undef RGB_GREEN

	13064 %undef RGB_BLUE

	13065 %undef RGB_PIXELSIZE

	13066 -%define RGB_RED 3

	13067 -%define RGB_GREEN 2

	13068 -%define RGB_BLUE 1

	13069 -%define RGB_PIXELSIZE 4

	13070 +%define RGB_RED EXT_XBGR_RED

	13071 +%define RGB_GREEN EXT_XBGR_GREEN

	13072 +%define RGB_BLUE EXT_XBGR_BLUE

	13073 +%define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE

	13074 %define jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_extxbgr_merged_upsample_sse2

	13075 %define jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_extxbgr_merged_upsample_sse2

	13076 %include "jdmrgss2.asm"

	13077 @@ -114,10 +117,10 @@

	13078 %undef RGB_GREEN

	13079 %undef RGB_BLUE

	13080 %undef RGB_PIXELSIZE

	13081 -%define RGB_RED 1

	13082 -%define RGB_GREEN 2

	13083 -%define RGB_BLUE 3

	13084 -%define RGB_PIXELSIZE 4

	13085 +%define RGB_RED EXT_XRGB_RED

	13086 +%define RGB_GREEN EXT_XRGB_GREEN

	13087 +%define RGB_BLUE EXT_XRGB_BLUE

	13088 +%define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE

	13089 %define jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_extxrgb_merged_upsample_sse2

	13090 %define jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_extxrgb_merged_upsample_sse2

	13091 %include "jdmrgss2.asm"

	13092 Index: simd/jdmrgmmx.asm

1019 ===================================================================	13093 ===================================================================

1020 --- simd/jfss2fst-64.asm (revision 829)	13094 --- simd/jdmrgmmx.asm (revision 829)

1021 +++ simd/jfss2fst-64.asm (working copy)	13095 +++ simd/jdmrgmmx.asm (working copy)

1022 @@ -53,7 +53,7 @@	13096 @@ -19,8 +19,6 @@

1023 %define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS)	13097 %include "jcolsamp.inc"

	13098

	13099 ; --------------------------------------------------------------------------

	13100 - SECTION SEG_TEXT

	13101 - BITS 32

	13102 ;

	13103 ; Upsample and color convert for the case of 2:1 horizontal and 1:1 vertical.

	13104 ;

	13105 @@ -42,7 +40,7 @@

	13106 %define gotptr wk(0)-SIZEOF_POINTER ; void * gotptr

	13107

	13108 align 16

	13109 - global EXTN(jsimd_h2v1_merged_upsample_mmx)

	13110 + global EXTN(jsimd_h2v1_merged_upsample_mmx) PRIVATE

	13111

	13112 EXTN(jsimd_h2v1_merged_upsample_mmx):

	13113 push ebp

	13114 @@ -253,7 +251,7 @@

	13115 movq MMWORD [edi+2*SIZEOF_MMWORD], mmC

	13116

	13117 sub ecx, byte SIZEOF_MMWORD

	13118 - jz short .endcolumn

	13119 + jz near .endcolumn

	13120

	13121 add edi, byte RGB_PIXELSIZE*SIZEOF_MMWORD ; outptr

	13122 add esi, byte SIZEOF_MMWORD ; inptr0

	13123 @@ -411,7 +409,7 @@

	13124 %define output_buf(b) (b)+20 ; JSAMPARRAY output_buf

	13125

	13126 align 16

	13127 - global EXTN(jsimd_h2v2_merged_upsample_mmx)

	13128 + global EXTN(jsimd_h2v2_merged_upsample_mmx) PRIVATE

	13129

	13130 EXTN(jsimd_h2v2_merged_upsample_mmx):

	13131 push ebp

	13132 @@ -461,3 +459,6 @@

	13133 pop ebp

	13134 ret

	13135

	13136 +; For some reason, the OS X linker does not honor the request to align the

	13137 +; segment unless we do this.

	13138 + align 16
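In the jdmrgmmx.asm hunk above, `jz short .endcolumn` is widened to `jz near .endcolumn`.

- In 32-bit code, a short conditional jump is 2 bytes: an opcode plus a signed 8-bit displacement.
- A near conditional jump is 6 bytes: a 0F-prefixed opcode plus a 32-bit displacement.
- Both displacements are counted from the end of the instruction.

The patch does not state why the change was needed. Presumably `.endcolumn` ended up more than 127 bytes past the jump. The helper below, a hypothetical name that is not part of libjpeg-turbo or NASM, computes which encoding a given jump requires.

```c
#include <stdint.h>

/* Hypothetical helper: encoded size in bytes of an x86 Jcc located at
   address `from` and targeting `to`, in 32-bit code.
   Short form: opcode + rel8 (2 bytes).
   Near form:  0F + opcode + rel32 (6 bytes).
   Displacements are relative to the end of the instruction. */
static int jcc_size(int64_t from, int64_t to)
{
  int64_t short_disp = to - (from + 2);
  if (short_disp >= INT8_MIN && short_disp <= INT8_MAX)
    return 2;   /* "jz short" can encode this jump */
  return 6;     /* out of rel8 range: "jz near" is required */
}
```

For example, `jcc_size(0, 129)` returns 2 because the displacement is exactly 127. `jcc_size(0, 130)` returns 6 because the displacement of 128 no longer fits in a signed byte.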

	13139 Index: simd/jdmrgss2-64.asm

	13140 ===================================================================

	13141 --- simd/jdmrgss2-64.asm (revision 829)

	13142 +++ simd/jdmrgss2-64.asm (working copy)

	13143 @@ -1,8 +1,8 @@

	13144 ;

	13145 -; jdmrgss2.asm - merged upsampling/color conversion (64-bit SSE2)

	13146 +; jdmrgss2-64.asm - merged upsampling/color conversion (64-bit SSE2)

	13147 ;

	13148 -; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB

	13149 -; Copyright 2009 D. R. Commander

	13150 +; Copyright 2009, 2012 Pierre Ossman <ossman@cendio.se> for Cendio AB

	13151 +; Copyright 2009, 2012 D. R. Commander

	13152 ;

	13153 ; Based on

	13154 ; x86 SIMD extension for IJG JPEG library

	13155 @@ -20,8 +20,6 @@

	13156 %include "jcolsamp.inc"

	13157

	13158 ; --------------------------------------------------------------------------

	13159 - SECTION SEG_TEXT

	13160 - BITS 64

	13161 ;

	13162 ; Upsample and color convert for the case of 2:1 horizontal and 1:1 vertical.

	13163 ;

	13164 @@ -41,7 +39,7 @@

	13165 %define WK_NUM 3

	13166

	13167 align 16

	13168 - global EXTN(jsimd_h2v1_merged_upsample_sse2)

	13169 + global EXTN(jsimd_h2v1_merged_upsample_sse2) PRIVATE

	13170

	13171 EXTN(jsimd_h2v1_merged_upsample_sse2):

	13172 push rbp

	13173 @@ -51,8 +49,8 @@

	13174 mov [rsp],rax

	13175 mov rbp,rsp ; rbp = aligned rbp

	13176 lea rsp, [wk(0)]

	13177 + collect_args

	13178 push rbx

	13179 - collect_args

	13180

	13181 mov rcx, r10 ; col

	13182 test rcx,rcx

	13183 @@ -254,17 +252,13 @@

	13184 movntdq XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA

	13185 movntdq XMMWORD [rdi+1*SIZEOF_XMMWORD], xmmD

	13186 movntdq XMMWORD [rdi+2*SIZEOF_XMMWORD], xmmF

	13187 - add rdi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr

	13188 jmp short .out0

	13189 .out1: ; --(unaligned)-----------------

	13190 - pcmpeqb xmmH,xmmH ; xmmH=(all 1's)

	13191 - maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [rdi], xmmA

	13192 - add rdi, byte SIZEOF_XMMWORD ; outptr

	13193 - maskmovdqu xmmD,xmmH ; movntdqu XMMWORD [rdi], xmmD

	13194 - add rdi, byte SIZEOF_XMMWORD ; outptr

	13195 - maskmovdqu xmmF,xmmH ; movntdqu XMMWORD [rdi], xmmF

	13196 - add rdi, byte SIZEOF_XMMWORD ; outptr

	13197 + movdqu XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA

	13198 + movdqu XMMWORD [rdi+1*SIZEOF_XMMWORD], xmmD

	13199 + movdqu XMMWORD [rdi+2*SIZEOF_XMMWORD], xmmF

	13200 .out0:

	13201 + add rdi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr

	13202 sub rcx, byte SIZEOF_XMMWORD

	13203 jz near .endcolumn

	13204

	13205 @@ -277,14 +271,12 @@

	13206 jmp near .columnloop

	13207

	13208 .column_st32:

	13209 - pcmpeqb xmmH,xmmH ; xmmH=(all 1's)

	13210 lea rcx, [rcx+rcx*2] ; imul ecx, RGB_PIXELSIZE

	13211 cmp rcx, byte 2*SIZEOF_XMMWORD

	13212 jb short .column_st16

	13213 - maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [rdi], xmmA

	13214 - add rdi, byte SIZEOF_XMMWORD ; outptr

	13215 - maskmovdqu xmmD,xmmH ; movntdqu XMMWORD [rdi], xmmD

	13216 - add rdi, byte SIZEOF_XMMWORD ; outptr

	13217 + movdqu XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA

	13218 + movdqu XMMWORD [rdi+1*SIZEOF_XMMWORD], xmmD

	13219 + add rdi, byte 2*SIZEOF_XMMWORD ; outptr

	13220 movdqa xmmA,xmmF

	13221 sub rcx, byte 2*SIZEOF_XMMWORD

	13222 jmp short .column_st15

	13223 @@ -291,50 +283,44 @@

	13224 .column_st16:

	13225 cmp rcx, byte SIZEOF_XMMWORD

	13226 jb short .column_st15

	13227 - maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [rdi], xmmA

	13228 + movdqu XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA

	13229 add rdi, byte SIZEOF_XMMWORD ; outptr

	13230 movdqa xmmA,xmmD

	13231 sub rcx, byte SIZEOF_XMMWORD

	13232 .column_st15:

	13233 - mov rax,rcx

	13234 - xor rcx, byte 0x0F

	13235 - shl rcx, 2

	13236 - movd xmmB,ecx

	13237 - psrlq xmmH,4

	13238 - pcmpeqb xmmE,xmmE

	13239 - psrlq xmmH,xmmB

	13240 - psrlq xmmE,xmmB

	13241 - punpcklbw xmmE,xmmH

	13242 - ; ----------------

	13243 - mov rcx,rdi

	13244 - and rcx, byte SIZEOF_XMMWORD-1

	13245 - jz short .adj0

	13246 - add rax,rcx

	13247 - cmp rax, byte SIZEOF_XMMWORD

	13248 - ja short .adj0

	13249 - and rdi, byte (-SIZEOF_XMMWORD) ; align to 16-byte boundary

	13250 - shl rcx, 3 ; pslldq xmmA,ecx & pslldq xmmE,ecx

	13251 - movdqa xmmG,xmmA

	13252 - movdqa xmmC,xmmE

	13253 - pslldq xmmA, SIZEOF_XMMWORD/2

	13254 - pslldq xmmE, SIZEOF_XMMWORD/2

	13255 - movd xmmD,ecx

	13256 - sub rcx, byte (SIZEOF_XMMWORD/2)*BYTE_BIT

	13257 - jb short .adj1

	13258 - movd xmmF,ecx

	13259 - psllq xmmA,xmmF

	13260 - psllq xmmE,xmmF

	13261 - jmp short .adj0

	13262 -.adj1: neg rcx

	13263 - movd xmmF,ecx

	13264 - psrlq xmmA,xmmF

	13265 - psrlq xmmE,xmmF

	13266 - psllq xmmG,xmmD

	13267 - psllq xmmC,xmmD

	13268 - por xmmA,xmmG

	13269 - por xmmE,xmmC

	13270 -.adj0: ; ----------------

	13271 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA

	13272 + ; Store the lower 8 bytes of xmmA to the output when it has enough

	13273 + ; space.

	13274 + cmp rcx, byte SIZEOF_MMWORD

	13275 + jb short .column_st7

	13276 + movq XMM_MMWORD [rdi], xmmA

	13277 + add rdi, byte SIZEOF_MMWORD

	13278 + sub rcx, byte SIZEOF_MMWORD

	13279 + psrldq xmmA, SIZEOF_MMWORD

	13280 +.column_st7:

	13281 + ; Store the lower 4 bytes of xmmA to the output when it has enough

	13282 + ; space.

	13283 + cmp rcx, byte SIZEOF_DWORD

	13284 + jb short .column_st3

	13285 + movd XMM_DWORD [rdi], xmmA

	13286 + add rdi, byte SIZEOF_DWORD

	13287 + sub rcx, byte SIZEOF_DWORD

	13288 + psrldq xmmA, SIZEOF_DWORD

	13289 +.column_st3:

	13290 + ; Store the lower 2 bytes of rax to the output when it has enough

	13291 + ; space.

	13292 + movd eax, xmmA

	13293 + cmp rcx, byte SIZEOF_WORD

	13294 + jb short .column_st1

	13295 + mov WORD [rdi], ax

	13296 + add rdi, byte SIZEOF_WORD

	13297 + sub rcx, byte SIZEOF_WORD

	13298 + shr rax, 16

	13299 +.column_st1:

	13300 + ; Store the lower 1 byte of rax to the output when it has enough

	13301 + ; space.

	13302 + test rcx, rcx

	13303 + jz short .endcolumn

	13304 + mov BYTE [rdi], al

	13305

	13306 %else ; RGB_PIXELSIZE == 4 ; -----------

	13307

	13308 @@ -379,19 +365,14 @@

	13309 movntdq XMMWORD [rdi+1*SIZEOF_XMMWORD], xmmD

	13310 movntdq XMMWORD [rdi+2*SIZEOF_XMMWORD], xmmC

	13311 movntdq XMMWORD [rdi+3*SIZEOF_XMMWORD], xmmH

	13312 - add rdi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr

	13313 jmp short .out0

	13314 .out1: ; --(unaligned)-----------------

	13315 - pcmpeqb xmmE,xmmE ; xmmE=(all 1's)

	13316 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [rdi], xmmA

	13317 - add rdi, byte SIZEOF_XMMWORD ; outptr

	13318 - maskmovdqu xmmD,xmmE ; movntdqu XMMWORD [rdi], xmmD

	13319 - add rdi, byte SIZEOF_XMMWORD ; outptr

	13320 - maskmovdqu xmmC,xmmE ; movntdqu XMMWORD [rdi], xmmC

	13321 - add rdi, byte SIZEOF_XMMWORD ; outptr

	13322 - maskmovdqu xmmH,xmmE ; movntdqu XMMWORD [rdi], xmmH

	13323 - add rdi, byte SIZEOF_XMMWORD ; outptr

	13324 + movdqu XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA

	13325 + movdqu XMMWORD [rdi+1*SIZEOF_XMMWORD], xmmD

	13326 + movdqu XMMWORD [rdi+2*SIZEOF_XMMWORD], xmmC

	13327 + movdqu XMMWORD [rdi+3*SIZEOF_XMMWORD], xmmH

	13328 .out0:

	13329 + add rdi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr

	13330 sub rcx, byte SIZEOF_XMMWORD

	13331 jz near .endcolumn

	13332

	13333 @@ -404,13 +385,11 @@

	13334 jmp near .columnloop

	13335

	13336 .column_st32:

	13337 - pcmpeqb xmmE,xmmE ; xmmE=(all 1's)

	13338 cmp rcx, byte SIZEOF_XMMWORD/2

	13339 jb short .column_st16

	13340 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [rdi], xmmA

	13341 - add rdi, byte SIZEOF_XMMWORD ; outptr

	13342 - maskmovdqu xmmD,xmmE ; movntdqu XMMWORD [rdi], xmmD

	13343 - add rdi, byte SIZEOF_XMMWORD ; outptr

	13344 + movdqu XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA

	13345 + movdqu XMMWORD [rdi+1*SIZEOF_XMMWORD], xmmD

	13346 + add rdi, byte 2*SIZEOF_XMMWORD ; outptr

	13347 movdqa xmmA,xmmC

	13348 movdqa xmmD,xmmH

	13349 sub rcx, byte SIZEOF_XMMWORD/2

	13350 @@ -417,50 +396,25 @@

	13351 .column_st16:

	13352 cmp rcx, byte SIZEOF_XMMWORD/4

	13353 jb short .column_st15

	13354 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA

	13355 + movdqu XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA

	13356 add rdi, byte SIZEOF_XMMWORD ; outptr

	13357 movdqa xmmA,xmmD

	13358 sub rcx, byte SIZEOF_XMMWORD/4

	13359 .column_st15:

	13360 - cmp rcx, byte SIZEOF_XMMWORD/16

	13361 - jb near .endcolumn

	13362 - mov rax,rcx

	13363 - xor rcx, byte 0x03

	13364 - inc rcx

	13365 - shl rcx, 4

	13366 - movd xmmF,ecx

	13367 - psrlq xmmE,xmmF

	13368 - punpcklbw xmmE,xmmE

	13369 - ; ----------------

	13370 - mov rcx,rdi

	13371 - and rcx, byte SIZEOF_XMMWORD-1

	13372 - jz short .adj0

	13373 - lea rax, [rcx+rax*4] ; RGB_PIXELSIZE

	13374 - cmp rax, byte SIZEOF_XMMWORD

	13375 - ja short .adj0

	13376 - and rdi, byte (-SIZEOF_XMMWORD) ; align to 16-byte boundary

	13377 - shl rcx, 3 ; pslldq xmmA,ecx & pslldq xmmE,ecx

	13378 - movdqa xmmB,xmmA

	13379 - movdqa xmmG,xmmE

	13380 - pslldq xmmA, SIZEOF_XMMWORD/2

	13381 - pslldq xmmE, SIZEOF_XMMWORD/2

	13382 - movd xmmC,ecx

	13383 - sub rcx, byte (SIZEOF_XMMWORD/2)*BYTE_BIT

	13384 - jb short .adj1

	13385 - movd xmmH,ecx

	13386 - psllq xmmA,xmmH

	13387 - psllq xmmE,xmmH

	13388 - jmp short .adj0

	13389 -.adj1: neg rcx

	13390 - movd xmmH,ecx

	13391 - psrlq xmmA,xmmH

	13392 - psrlq xmmE,xmmH

	13393 - psllq xmmB,xmmC

	13394 - psllq xmmG,xmmC

	13395 - por xmmA,xmmB

	13396 - por xmmE,xmmG

	13397 -.adj0: ; ----------------

	13398 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA

	13399 + ; Store two pixels (8 bytes) of xmmA to the output when it has enough

	13400 + ; space.

	13401 + cmp rcx, byte SIZEOF_XMMWORD/8

	13402 + jb short .column_st7

	13403 + movq XMM_MMWORD [rdi], xmmA

	13404 + add rdi, byte SIZEOF_XMMWORD/8*4

	13405 + sub rcx, byte SIZEOF_XMMWORD/8

	13406 + psrldq xmmA, SIZEOF_XMMWORD/8*4

	13407 +.column_st7:

	13408 + ; Store one pixel (4 bytes) of xmmA to the output when it has enough

	13409 + ; space.

	13410 + test rcx, rcx

	13411 + jz short .endcolumn

	13412 + movd XMM_DWORD [rdi], xmmA

	13413

	13414 %endif ; RGB_PIXELSIZE ; ---------------

	13415

	13416 @@ -468,8 +422,8 @@

	13417 sfence ; flush the write buffer

	13418

	13419 .return:

	13420 + pop rbx

	13421 uncollect_args

	13422 - pop rbx

	13423 mov rsp,rbp ; rsp <- aligned rbp

	13424 pop rsp ; rsp <- original rbp

	13425 pop rbp

	13426 @@ -492,13 +446,14 @@

	13427 ; r13 = JSAMPARRAY output_buf

	13428

	13429 align 16

	13430 - global EXTN(jsimd_h2v2_merged_upsample_sse2)

	13431 + global EXTN(jsimd_h2v2_merged_upsample_sse2) PRIVATE

	13432

	13433 EXTN(jsimd_h2v2_merged_upsample_sse2):

	13434 push rbp

	13435 + mov rax,rsp

	13436 mov rbp,rsp

	13437 + collect_args

	13438 push rbx

	13439 - collect_args

	13440

	13441 mov rax, r10

	13442

	13443 @@ -519,10 +474,17 @@

	13444 push rcx

	13445 push rax

	13446

	13447 + %ifdef WIN64

	13448 + mov r8, rcx

	13449 + mov r9, rdi

	13450 + mov rcx, rax

	13451 + mov rdx, rbx

	13452 + %else

	13453 mov rdx, rcx

	13454 mov rcx, rdi

	13455 mov rdi, rax

	13456 mov rsi, rbx

	13457 + %endif

	13458

	13459 call EXTN(jsimd_h2v1_merged_upsample_sse2)

	13460

	13461 @@ -545,10 +507,17 @@

	13462 push rcx

	13463 push rax

	13464

	13465 + %ifdef WIN64

	13466 + mov r8, rcx

	13467 + mov r9, rdi

	13468 + mov rcx, rax

	13469 + mov rdx, rbx

	13470 + %else

	13471 mov rdx, rcx

	13472 mov rcx, rdi

	13473 mov rdi, rax

	13474 mov rsi, rbx

	13475 + %endif

	13476

	13477 call EXTN(jsimd_h2v1_merged_upsample_sse2)

	13478

	13479 @@ -559,7 +528,11 @@

	13480 pop rbx

	13481 pop rdx

	13482

	13483 + pop rbx

	13484 uncollect_args

	13485 - pop rbx

	13486 pop rbp

	13487 ret

	13488 +

	13489 +; For some reason, the OS X linker does not honor the request to align the

	13490 +; segment unless we do this.

	13491 + align 16
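The `%ifdef WIN64` blocks added to `jsimd_h2v2_merged_upsample_sse2` place the four values staged in RAX, RBX, RCX and RDI into argument registers before calling `jsimd_h2v1_merged_upsample_sse2`. The Microsoft x64 convention passes the first four integer arguments in RCX, RDX, R8 and R9. System V AMD64 uses RDI, RSI, RDX and RCX. The two branches therefore need different move sequences, and each sequence has to read a register before it overwrites it. The C sketch below is a toy model: the `enum reg` register file and the helper names are illustrative and are not part of libjpeg-turbo. It replays both move lists from the patch and checks that argument k receives the k-th staged value.

```c
#include <assert.h>

/* Toy register file: only the registers touched by the argument shuffle. */
enum reg { RAX, RBX, RCX, RDX, RSI, RDI, R8, R9, NREGS };

/* One "mov dst, src" instruction. */
struct move { enum reg dst, src; };

/* Move sequences copied from the %ifdef WIN64 and %else branches. */
static const struct move win64_moves[] = {
    { R8, RCX }, { R9, RDI }, { RCX, RAX }, { RDX, RBX }
};
static const struct move sysv_moves[] = {
    { RDX, RCX }, { RCX, RDI }, { RDI, RAX }, { RSI, RBX }
};
/* The same Win64 moves in a clobbering order, for contrast (hypothetical). */
static const struct move win64_bad_order[] = {
    { RCX, RAX }, { RDX, RBX }, { R8, RCX }, { R9, RDI }
};

/* First four integer-argument registers of each ABI. */
static const enum reg win64_args[4] = { RCX, RDX, R8, R9 };
static const enum reg sysv_args[4]  = { RDI, RSI, RDX, RCX };

/* Values staged before the shuffle become arguments 1..4, in this order. */
static const enum reg staged[4] = { RAX, RBX, RCX, RDI };

/* Replays the moves on a register file filled with distinct markers and
   returns 1 if argument k ends up holding the value staged in staged[k]. */
static int shuffle_ok(const struct move *moves, int n, const enum reg args[4])
{
    long regs[NREGS], expect[4];
    int i, k;

    for (i = 0; i < NREGS; i++)
        regs[i] = 100 + i;                 /* distinct marker per register */
    for (k = 0; k < 4; k++)
        expect[k] = regs[staged[k]];

    for (i = 0; i < n; i++) {
        assert(moves[i].dst < NREGS && moves[i].src < NREGS);
        regs[moves[i].dst] = regs[moves[i].src];   /* execute the mov */
    }

    for (k = 0; k < 4; k++)
        if (regs[args[k]] != expect[k])
            return 0;
    return 1;
}
```

With the move lists copied from the patch, both `shuffle_ok(win64_moves, 4, win64_args)` and `shuffle_ok(sysv_moves, 4, sysv_args)` return 1. Issuing `mov rcx, rax` before `mov r8, rcx` in the Win64 branch would instead hand RAX's value to the third argument, which is why R8 and R9 are loaded first.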

	13492 Index: simd/jdmrgss2.asm

	13493 ===================================================================

	13494 --- simd/jdmrgss2.asm (revision 829)

	13495 +++ simd/jdmrgss2.asm (working copy)

	13496 @@ -1,7 +1,8 @@

	13497 ;

	13498 ; jdmrgss2.asm - merged upsampling/color conversion (SSE2)

	13499 ;

	13500 -; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB

	13501 +; Copyright 2009, 2012 Pierre Ossman <ossman@cendio.se> for Cendio AB

	13502 +; Copyright 2012 D. R. Commander

	13503 ;

	13504 ; Based on

	13505 ; x86 SIMD extension for IJG JPEG library

	13506 @@ -19,8 +20,6 @@

	13507 %include "jcolsamp.inc"

	13508

	13509 ; --------------------------------------------------------------------------

	13510 - SECTION SEG_TEXT

	13511 - BITS 32

	13512 ;

	13513 ; Upsample and color convert for the case of 2:1 horizontal and 1:1 vertical.

	13514 ;

	13515 @@ -42,7 +41,7 @@

	13516 %define gotptr wk(0)-SIZEOF_POINTER ; void * gotptr

	13517

	13518 align 16

	13519 - global EXTN(jsimd_h2v1_merged_upsample_sse2)

	13520 + global EXTN(jsimd_h2v1_merged_upsample_sse2) PRIVATE

	13521

	13522 EXTN(jsimd_h2v1_merged_upsample_sse2):

	13523 push ebp

	13524 @@ -266,17 +265,13 @@

	13525 movntdq XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA

	13526 movntdq XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD

	13527 movntdq XMMWORD [edi+2*SIZEOF_XMMWORD], xmmF

	13528 - add edi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr

	13529 jmp short .out0

	13530 .out1: ; --(unaligned)-----------------

	13531 - pcmpeqb xmmH,xmmH ; xmmH=(all 1's)

	13532 - maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [edi], xmmA

	13533 - add edi, byte SIZEOF_XMMWORD ; outptr

	13534 - maskmovdqu xmmD,xmmH ; movntdqu XMMWORD [edi], xmmD

	13535 - add edi, byte SIZEOF_XMMWORD ; outptr

	13536 - maskmovdqu xmmF,xmmH ; movntdqu XMMWORD [edi], xmmF

	13537 - add edi, byte SIZEOF_XMMWORD ; outptr

	13538 + movdqu XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA

	13539 + movdqu XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD

	13540 + movdqu XMMWORD [edi+2*SIZEOF_XMMWORD], xmmF

	13541 .out0:

	13542 + add edi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr

	13543 sub ecx, byte SIZEOF_XMMWORD

	13544 jz near .endcolumn

	13545

	13546 @@ -290,14 +285,12 @@

	13547 alignx 16,7

	13548

	13549 .column_st32:

	13550 - pcmpeqb xmmH,xmmH ; xmmH=(all 1's)

	13551 lea ecx, [ecx+ecx*2] ; imul ecx, RGB_PIXELSIZE

	13552 cmp ecx, byte 2*SIZEOF_XMMWORD

	13553 jb short .column_st16

	13554 - maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [edi], xmmA

	13555 - add edi, byte SIZEOF_XMMWORD ; outptr

	13556 - maskmovdqu xmmD,xmmH ; movntdqu XMMWORD [edi], xmmD

	13557 - add edi, byte SIZEOF_XMMWORD ; outptr

	13558 + movdqu XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA

	13559 + movdqu XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD

	13560 + add edi, byte 2*SIZEOF_XMMWORD ; outptr

	13561 movdqa xmmA,xmmF

	13562 sub ecx, byte 2*SIZEOF_XMMWORD

	13563 jmp short .column_st15

	13564 @@ -304,50 +297,44 @@

	13565 .column_st16:

	13566 cmp ecx, byte SIZEOF_XMMWORD

	13567 jb short .column_st15

	13568 - maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [edi], xmmA

	13569 + movdqu XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA

	13570 add edi, byte SIZEOF_XMMWORD ; outptr

	13571 movdqa xmmA,xmmD

	13572 sub ecx, byte SIZEOF_XMMWORD

	13573 .column_st15:

	13574 - mov eax,ecx

	13575 - xor ecx, byte 0x0F

	13576 - shl ecx, 2

	13577 - movd xmmB,ecx

	13578 - psrlq xmmH,4

	13579 - pcmpeqb xmmE,xmmE

	13580 - psrlq xmmH,xmmB

	13581 - psrlq xmmE,xmmB

	13582 - punpcklbw xmmE,xmmH

	13583 - ; ----------------

	13584 - mov ecx,edi

	13585 - and ecx, byte SIZEOF_XMMWORD-1

	13586 - jz short .adj0

	13587 - add eax,ecx

	13588 - cmp eax, byte SIZEOF_XMMWORD

	13589 - ja short .adj0

	13590 - and edi, byte (-SIZEOF_XMMWORD) ; align to 16-byte boundary

	13591 - shl ecx, 3 ; pslldq xmmA,ecx & pslldq xmmE,ecx

	13592 - movdqa xmmG,xmmA

	13593 - movdqa xmmC,xmmE

	13594 - pslldq xmmA, SIZEOF_XMMWORD/2

	13595 - pslldq xmmE, SIZEOF_XMMWORD/2

	13596 - movd xmmD,ecx

	13597 - sub ecx, byte (SIZEOF_XMMWORD/2)*BYTE_BIT

	13598 - jb short .adj1

	13599 - movd xmmF,ecx

	13600 - psllq xmmA,xmmF

	13601 - psllq xmmE,xmmF

	13602 - jmp short .adj0

	13603 -.adj1: neg ecx

	13604 - movd xmmF,ecx

	13605 - psrlq xmmA,xmmF

	13606 - psrlq xmmE,xmmF

	13607 - psllq xmmG,xmmD

	13608 - psllq xmmC,xmmD

	13609 - por xmmA,xmmG

	13610 - por xmmE,xmmC

	13611 -.adj0: ; ----------------

	13612 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA

	13613 + ; Store the lower 8 bytes of xmmA to the output when it has enough

	13614 + ; space.

	13615 + cmp ecx, byte SIZEOF_MMWORD

	13616 + jb short .column_st7

	13617 + movq XMM_MMWORD [edi], xmmA

	13618 + add edi, byte SIZEOF_MMWORD

	13619 + sub ecx, byte SIZEOF_MMWORD

	13620 + psrldq xmmA, SIZEOF_MMWORD

	13621 +.column_st7:

	13622 + ; Store the lower 4 bytes of xmmA to the output when it has enough

	13623 + ; space.

	13624 + cmp ecx, byte SIZEOF_DWORD

	13625 + jb short .column_st3

	13626 + movd XMM_DWORD [edi], xmmA

	13627 + add edi, byte SIZEOF_DWORD

	13628 + sub ecx, byte SIZEOF_DWORD

	13629 + psrldq xmmA, SIZEOF_DWORD

	13630 +.column_st3:

	13631 + ; Store the lower 2 bytes of eax to the output when it has enough

	13632 + ; space.

	13633 + movd eax, xmmA

	13634 + cmp ecx, byte SIZEOF_WORD

	13635 + jb short .column_st1

	13636 + mov WORD [edi], ax

	13637 + add edi, byte SIZEOF_WORD

	13638 + sub ecx, byte SIZEOF_WORD

	13639 + shr eax, 16

	13640 +.column_st1:

	13641 + ; Store the lower 1 byte of eax to the output when it has enough

	13642 + ; space.

	13643 + test ecx, ecx

	13644 + jz short .endcolumn

	13645 + mov BYTE [edi], al

	13646

	13647 %else ; RGB_PIXELSIZE == 4 ; -----------

	13648

	13649 @@ -392,19 +379,14 @@

	13650 movntdq XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD

	13651 movntdq XMMWORD [edi+2*SIZEOF_XMMWORD], xmmC

	13652 movntdq XMMWORD [edi+3*SIZEOF_XMMWORD], xmmH

	13653 - add edi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr

	13654 jmp short .out0

	13655 .out1: ; --(unaligned)-----------------

	13656 - pcmpeqb xmmE,xmmE ; xmmE=(all 1's)

	13657 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA

	13658 - add edi, byte SIZEOF_XMMWORD ; outptr

	13659 - maskmovdqu xmmD,xmmE ; movntdqu XMMWORD [edi], xmmD

	13660 - add edi, byte SIZEOF_XMMWORD ; outptr

	13661 - maskmovdqu xmmC,xmmE ; movntdqu XMMWORD [edi], xmmC

	13662 - add edi, byte SIZEOF_XMMWORD ; outptr

	13663 - maskmovdqu xmmH,xmmE ; movntdqu XMMWORD [edi], xmmH

	13664 - add edi, byte SIZEOF_XMMWORD ; outptr

	13665 + movdqu XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA

	13666 + movdqu XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD

	13667 + movdqu XMMWORD [edi+2*SIZEOF_XMMWORD], xmmC

	13668 + movdqu XMMWORD [edi+3*SIZEOF_XMMWORD], xmmH

	13669 .out0:

	13670 + add edi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr

	13671 sub ecx, byte SIZEOF_XMMWORD

	13672 jz near .endcolumn

	13673

	13674 @@ -418,13 +400,11 @@

	13675 alignx 16,7

	13676

	13677 .column_st32:

	13678 - pcmpeqb xmmE,xmmE ; xmmE=(all 1's)

	13679 cmp ecx, byte SIZEOF_XMMWORD/2

	13680 jb short .column_st16

	13681 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA

	13682 - add edi, byte SIZEOF_XMMWORD ; outptr

	13683 - maskmovdqu xmmD,xmmE ; movntdqu XMMWORD [edi], xmmD

	13684 - add edi, byte SIZEOF_XMMWORD ; outptr

	13685 + movdqu XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA

	13686 + movdqu XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD

	13687 + add edi, byte 2*SIZEOF_XMMWORD ; outptr

	13688 movdqa xmmA,xmmC

	13689 movdqa xmmD,xmmH

	13690 sub ecx, byte SIZEOF_XMMWORD/2

	13691 @@ -431,50 +411,25 @@

	13692 .column_st16:

	13693 cmp ecx, byte SIZEOF_XMMWORD/4

	13694 jb short .column_st15

	13695 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA

	13696 + movdqu XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA

	13697 add edi, byte SIZEOF_XMMWORD ; outptr

	13698 movdqa xmmA,xmmD

	13699 sub ecx, byte SIZEOF_XMMWORD/4

	13700 .column_st15:

	13701 - cmp ecx, byte SIZEOF_XMMWORD/16

	13702 - jb short .endcolumn

	13703 - mov eax,ecx

	13704 - xor ecx, byte 0x03

	13705 - inc ecx

	13706 - shl ecx, 4

	13707 - movd xmmF,ecx

	13708 - psrlq xmmE,xmmF

	13709 - punpcklbw xmmE,xmmE

	13710 - ; ----------------

	13711 - mov ecx,edi

	13712 - and ecx, byte SIZEOF_XMMWORD-1

	13713 - jz short .adj0

	13714 - lea eax, [ecx+eax*4] ; RGB_PIXELSIZE

	13715 - cmp eax, byte SIZEOF_XMMWORD

	13716 - ja short .adj0

	13717 - and edi, byte (-SIZEOF_XMMWORD) ; align to 16-byte boundary

	13718 - shl ecx, 3 ; pslldq xmmA,ecx & pslldq xmmE,ecx

	13719 - movdqa xmmB,xmmA

	13720 - movdqa xmmG,xmmE

	13721 - pslldq xmmA, SIZEOF_XMMWORD/2

	13722 - pslldq xmmE, SIZEOF_XMMWORD/2

	13723 - movd xmmC,ecx

	13724 - sub ecx, byte (SIZEOF_XMMWORD/2)*BYTE_BIT

	13725 - jb short .adj1

	13726 - movd xmmH,ecx

	13727 - psllq xmmA,xmmH

	13728 - psllq xmmE,xmmH

	13729 - jmp short .adj0

	13730 -.adj1: neg ecx

	13731 - movd xmmH,ecx

	13732 - psrlq xmmA,xmmH

	13733 - psrlq xmmE,xmmH

	13734 - psllq xmmB,xmmC

	13735 - psllq xmmG,xmmC

	13736 - por xmmA,xmmB

	13737 - por xmmE,xmmG

	13738 -.adj0: ; ----------------

	13739 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA

	13740 + ; Store two pixels (8 bytes) of xmmA to the output when it has enough

	13741 + ; space.

	13742 + cmp ecx, byte SIZEOF_XMMWORD/8

	13743 + jb short .column_st7

	13744 + movq XMM_MMWORD [edi], xmmA

	13745 + add edi, byte SIZEOF_XMMWORD/8*4

	13746 + sub ecx, byte SIZEOF_XMMWORD/8

	13747 + psrldq xmmA, SIZEOF_XMMWORD/8*4

	13748 +.column_st7:

	13749 + ; Store one pixel (4 bytes) of xmmA to the output when it has enough

	13750 + ; space.

	13751 + test ecx, ecx

	13752 + jz short .endcolumn

	13753 + movd XMM_DWORD [edi], xmmA

	13754

	13755 %endif ; RGB_PIXELSIZE ; ---------------

	13756

	13757 @@ -509,7 +464,7 @@

	13758 %define output_buf(b) (b)+20 ; JSAMPARRAY output_buf

	13759

	13760 align 16

	13761 - global EXTN(jsimd_h2v2_merged_upsample_sse2)

	13762 + global EXTN(jsimd_h2v2_merged_upsample_sse2) PRIVATE

	13763

	13764 EXTN(jsimd_h2v2_merged_upsample_sse2):

	13765 push ebp

	13766 @@ -559,3 +514,6 @@

	13767 pop ebp

	13768 ret

	13769

	13770 +; For some reason, the OS X linker does not honor the request to align the

	13771 +; segment unless we do this.

	13772 + align 16
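In both merged-upsampling files the patch removes the tail path that realigned the output pointer, built a shifted byte mask and issued `maskmovdqu`. It replaces that path with a cascade of progressively smaller stores. For RGB_PIXELSIZE == 3, the byte count left at `.column_st15` (fewer than 16) is written as an optional 8-, 4-, 2- and 1-byte store, and xmmA (or eax) is shifted down after each store. For RGB_PIXELSIZE == 4, the pixel count left (fewer than 4) is written as an optional 2-pixel `movq` and a 1-pixel `movd`. The scalar C model below is only an illustration. A 16-byte array stands in for xmmA, `memcpy` stands in for the stores, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* RGB_PIXELSIZE == 3: `n` is the number of bytes left (< 16).
   `off` models the psrldq / shr that shifts consumed bytes out of xmmA.
   Returns the number of bytes written (always n). */
static size_t store_tail_bytes(uint8_t *dst, const uint8_t reg[16], size_t n)
{
    uint8_t *out = dst;
    size_t off = 0;

    assert(n < 16);
    if (n >= 8) { memcpy(out, reg + off, 8); out += 8; off += 8; n -= 8; } /* movq */
    if (n >= 4) { memcpy(out, reg + off, 4); out += 4; off += 4; n -= 4; } /* movd */
    if (n >= 2) { memcpy(out, reg + off, 2); out += 2; off += 2; n -= 2; } /* mov WORD */
    if (n)      { *out++ = reg[off]; }                                    /* mov BYTE */
    return (size_t)(out - dst);
}

/* RGB_PIXELSIZE == 4: `n` is the number of 4-byte pixels left (< 4).
   Returns the number of bytes written (always 4 * n). */
static size_t store_tail_pixels(uint8_t *dst, const uint8_t reg[16], size_t n)
{
    uint8_t *out = dst;
    size_t off = 0;

    assert(n < 4);
    if (n >= 2) { memcpy(out, reg + off, 8); out += 8; off += 8; n -= 2; } /* movq: 2 pixels */
    if (n)      { memcpy(out, reg + off, 4); out += 4; }                  /* movd: 1 pixel */
    return (size_t)(out - dst);
}
```

Each step handles one bit of the remaining count, so the cascade writes exactly the remaining bytes and nothing past them. For example, 11 remaining bytes are written as 8 + 2 + 1, and 3 remaining 4-byte pixels as one 8-byte store followed by one 4-byte store.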

	13773 Index: simd/jdsammmx.asm

	13774 ===================================================================

	13775 --- simd/jdsammmx.asm (revision 829)

	13776 +++ simd/jdsammmx.asm (working copy)

	13777 @@ -22,7 +22,7 @@

	13778 SECTION SEG_CONST

1024	13779

1025 alignz 16	13780 alignz 16

1026 -» global» EXTN(jconst_fdct_ifast_sse2)	13781 -» global» EXTN(jconst_fancy_upsample_mmx)

1027 +» global» EXTN(jconst_fdct_ifast_sse2) PRIVATE	13782 +» global» EXTN(jconst_fancy_upsample_mmx) PRIVATE

1028	13783

1029 EXTN(jconst_fdct_ifast_sse2):	13784 EXTN(jconst_fancy_upsample_mmx):

1030	13785

1031 @@ -80,7 +80,7 @@	13786 @@ -58,7 +58,7 @@

1032 %define WK_NUM»» 2	13787 %define output_data_ptr(b)» (b)+20» » ; JSAMPARRAY * output_data_ptr

1033	13788

1034 » align» 16	13789 » align» 16

1035 -» global» EXTN(jsimd_fdct_ifast_sse2)	13790 -» global» EXTN(jsimd_h2v1_fancy_upsample_mmx)

1036 +» global» EXTN(jsimd_fdct_ifast_sse2) PRIVATE	13791 +» global» EXTN(jsimd_h2v1_fancy_upsample_mmx) PRIVATE

1037	13792

1038 EXTN(jsimd_fdct_ifast_sse2):	13793 EXTN(jsimd_h2v1_fancy_upsample_mmx):

1039 » push» rbp

1040 Index: simd/jcqntmmx.asm

1041 ===================================================================

1042 --- simd/jcqntmmx.asm» (revision 829)

1043 +++ simd/jcqntmmx.asm» (working copy)

1044 @@ -35,7 +35,7 @@

1045 %define workspace» ebp+16» » ; DCTELEM * workspace

1046

1047 » align» 16

1048 -» global» EXTN(jsimd_convsamp_mmx)

1049 +» global» EXTN(jsimd_convsamp_mmx) PRIVATE

1050

1051 EXTN(jsimd_convsamp_mmx):

1052 push ebp	13794 push ebp

1053 @@ -140,7 +140,7 @@	13795 @@ -216,7 +216,7 @@

1054 %define workspace» ebp+16» » ; DCTELEM * workspace	13796 %define gotptr»» wk(0)-SIZEOF_POINTER» ; void * gotptr

1055	13797

1056 » align» 16	13798 » align» 16

1057 -» global» EXTN(jsimd_quantize_mmx)	13799 -» global» EXTN(jsimd_h2v2_fancy_upsample_mmx)

1058 +» global» EXTN(jsimd_quantize_mmx) PRIVATE	13800 +» global» EXTN(jsimd_h2v2_fancy_upsample_mmx) PRIVATE

1059	13801

1060 EXTN(jsimd_quantize_mmx):	13802 EXTN(jsimd_h2v2_fancy_upsample_mmx):

1061 push ebp	13803 push ebp

1062 Index: simd/jimmxfst.asm	13804 @@ -542,7 +542,7 @@

1063 ===================================================================	13805 %define output_data_ptr(b)» (b)+20» » ; JSAMPARRAY * output_data_ptr

1064 --- simd/jimmxfst.asm» (revision 829)	13806

1065 +++ simd/jimmxfst.asm» (working copy)	13807 » align» 16

1066 @@ -59,7 +59,7 @@	13808 -» global» EXTN(jsimd_h2v1_upsample_mmx)

1067 %define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS)	13809 +» global» EXTN(jsimd_h2v1_upsample_mmx) PRIVATE

1068	13810

1069 » alignz» 16	13811 EXTN(jsimd_h2v1_upsample_mmx):

1070 -» global» EXTN(jconst_idct_ifast_mmx)

1071 +» global» EXTN(jconst_idct_ifast_mmx) PRIVATE

1072

1073 EXTN(jconst_idct_ifast_mmx):

1074

1075 @@ -94,7 +94,7 @@

1076 » » » » » ; JCOEF workspace[DCTSIZE2]

1077

1078 » align» 16

1079 -» global» EXTN(jsimd_idct_ifast_mmx)

1080 +» global» EXTN(jsimd_idct_ifast_mmx) PRIVATE

1081

1082 EXTN(jsimd_idct_ifast_mmx):

1083 push ebp	13812 push ebp

1084 Index: simd/jfss2fst.asm	13813 @@ -643,7 +643,7 @@

1085 ===================================================================	13814 %define output_data_ptr(b)» (b)+20» » ; JSAMPARRAY * output_data_ptr

1086 --- simd/jfss2fst.asm» (revision 829)	13815

1087 +++ simd/jfss2fst.asm» (working copy)	13816 » align» 16

1088 @@ -52,7 +52,7 @@	13817 -» global» EXTN(jsimd_h2v2_upsample_mmx)

1089 %define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS)	13818 +» global» EXTN(jsimd_h2v2_upsample_mmx) PRIVATE

1090	13819

1091 » alignz» 16	13820 EXTN(jsimd_h2v2_upsample_mmx):

1092 -» global» EXTN(jconst_fdct_ifast_sse2)

1093 +» global» EXTN(jconst_fdct_ifast_sse2) PRIVATE

1094

1095 EXTN(jconst_fdct_ifast_sse2):

1096

1097 @@ -80,7 +80,7 @@

1098 %define WK_NUM»» 2

1099

1100 » align» 16

1101 -» global» EXTN(jsimd_fdct_ifast_sse2)

1102 +» global» EXTN(jsimd_fdct_ifast_sse2) PRIVATE

1103

1104 EXTN(jsimd_fdct_ifast_sse2):

1105 push ebp	13821 push ebp

1106 Index: simd/jcgrammx.asm	13822 @@ -732,3 +732,6 @@

1107 ===================================================================	13823 » pop» ebp

1108 --- simd/jcgrammx.asm» (revision 829)	13824 » ret

1109 +++ simd/jcgrammx.asm» (working copy)	13825

1110 @@ -33,7 +33,7 @@	13826 +; For some reason, the OS X linker does not honor the request to align the

1111 » SECTION»SEG_CONST	13827 +; segment unless we do this.

1112	13828 +» align» 16

1113 » alignz» 16

1114 -» global» EXTN(jconst_rgb_gray_convert_mmx)

1115 +» global» EXTN(jconst_rgb_gray_convert_mmx) PRIVATE

1116

1117 EXTN(jconst_rgb_gray_convert_mmx):

1118

1119 Index: simd/jdcolss2-64.asm

1120 ===================================================================

1121 --- simd/jdcolss2-64.asm» (revision 829)

1122 +++ simd/jdcolss2-64.asm» (working copy)

1123 @@ -35,7 +35,7 @@

1124 » SECTION»SEG_CONST

1125

1126 » alignz» 16

1127 -» global» EXTN(jconst_ycc_rgb_convert_sse2)

1128 +» global» EXTN(jconst_ycc_rgb_convert_sse2) PRIVATE

1129

1130 EXTN(jconst_ycc_rgb_convert_sse2):

1131

1132 Index: simd/jf3dnflt.asm

1133 ===================================================================

1134 --- simd/jf3dnflt.asm» (revision 829)

1135 +++ simd/jf3dnflt.asm» (working copy)

1136 @@ -27,7 +27,7 @@

1137 » SECTION»SEG_CONST

1138

1139 » alignz» 16

1140 -» global» EXTN(jconst_fdct_float_3dnow)

1141 +» global» EXTN(jconst_fdct_float_3dnow) PRIVATE

1142

1143 EXTN(jconst_fdct_float_3dnow):

1144

1145 @@ -55,7 +55,7 @@

1146 %define WK_NUM»» 2

1147

1148 » align» 16

1149 -» global» EXTN(jsimd_fdct_float_3dnow)

1150 +» global» EXTN(jsimd_fdct_float_3dnow) PRIVATE

1151

1152 EXTN(jsimd_fdct_float_3dnow):

1153 » push» ebp

1154 Index: simd/jdsamss2-64.asm	13829 Index: simd/jdsamss2-64.asm

1155 ===================================================================	13830 ===================================================================

1156 --- simd/jdsamss2-64.asm (revision 829)	13831 --- simd/jdsamss2-64.asm (revision 829)

1157 +++ simd/jdsamss2-64.asm (working copy)	13832 +++ simd/jdsamss2-64.asm (working copy)

	13833 @@ -1,5 +1,5 @@

	13834 ;

	13835 -; jdsamss2.asm - upsampling (64-bit SSE2)

	13836 +; jdsamss2-64.asm - upsampling (64-bit SSE2)

	13837 ;

	13838 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB

	13839 ; Copyright 2009 D. R. Commander

1158 @@ -23,7 +23,7 @@	13840 @@ -23,7 +23,7 @@

1159 SECTION SEG_CONST	13841 SECTION SEG_CONST

1160	13842

1161 alignz 16	13843 alignz 16

1162 - global EXTN(jconst_fancy_upsample_sse2)	13844 - global EXTN(jconst_fancy_upsample_sse2)

1163 + global EXTN(jconst_fancy_upsample_sse2) PRIVATE	13845 + global EXTN(jconst_fancy_upsample_sse2) PRIVATE

1164	13846

1165 EXTN(jconst_fancy_upsample_sse2):	13847 EXTN(jconst_fancy_upsample_sse2):

1166	13848

1167 @@ -59,7 +59,7 @@	13849 @@ -59,10 +59,11 @@

1168 ; r13 = JSAMPARRAY * output_data_ptr	13850 ; r13 = JSAMPARRAY * output_data_ptr

1169	13851

1170 align 16	13852 align 16

1171 - global EXTN(jsimd_h2v1_fancy_upsample_sse2)	13853 - global EXTN(jsimd_h2v1_fancy_upsample_sse2)

1172 + global EXTN(jsimd_h2v1_fancy_upsample_sse2) PRIVATE	13854 + global EXTN(jsimd_h2v1_fancy_upsample_sse2) PRIVATE

1173	13855

1174 EXTN(jsimd_h2v1_fancy_upsample_sse2):	13856 EXTN(jsimd_h2v1_fancy_upsample_sse2):

1175 push rbp	13857 push rbp

1176 @@ -201,7 +201,7 @@	13858 +» mov» rax,rsp

	13859 » mov» rbp,rsp

	13860 » collect_args

	13861

	13862 @@ -200,7 +201,7 @@

1177 %define WK_NUM 4	13863 %define WK_NUM 4

1178	13864

1179 align 16	13865 align 16

1180 - global EXTN(jsimd_h2v2_fancy_upsample_sse2)	13866 - global EXTN(jsimd_h2v2_fancy_upsample_sse2)

1181 + global EXTN(jsimd_h2v2_fancy_upsample_sse2) PRIVATE	13867 + global EXTN(jsimd_h2v2_fancy_upsample_sse2) PRIVATE

1182	13868

1183 EXTN(jsimd_h2v2_fancy_upsample_sse2):	13869 EXTN(jsimd_h2v2_fancy_upsample_sse2):

1184 push rbp	13870 push rbp

1185 @@ -498,7 +498,7 @@	13871 @@ -210,8 +211,8 @@

	13872 » mov» [rsp],rax

	13873 » mov» rbp,rsp»» » » ; rbp = aligned rbp

	13874 » lea» rsp, [wk(0)]

	13875 +» collect_args

	13876 » push» rbx

	13877 -» collect_args

	13878

	13879 » mov» rax, r11 ; colctr

	13880 » test» rax,rax

	13881 @@ -472,8 +473,8 @@

	13882 » jg» near .rowloop

	13883

	13884 .return:

	13885 +» pop» rbx

	13886 » uncollect_args

	13887 -» pop» rbx

	13888 » mov» rsp,rbp»» ; rsp <- aligned rbp

	13889 » pop» rsp» » ; rsp <- original rbp

	13890 » pop» rbp

	13891 @@ -497,10 +498,11 @@

1186 ; r13 = JSAMPARRAY * output_data_ptr	13892 ; r13 = JSAMPARRAY * output_data_ptr

1187	13893

1188 align 16	13894 align 16

1189 - global EXTN(jsimd_h2v1_upsample_sse2)	13895 - global EXTN(jsimd_h2v1_upsample_sse2)

1190 + global EXTN(jsimd_h2v1_upsample_sse2) PRIVATE	13896 + global EXTN(jsimd_h2v1_upsample_sse2) PRIVATE

1191	13897

1192 EXTN(jsimd_h2v1_upsample_sse2):	13898 EXTN(jsimd_h2v1_upsample_sse2):

1193 push rbp	13899 push rbp

1194 @@ -587,7 +587,7 @@	13900 +» mov» rax,rsp

	13901 » mov» rbp,rsp

	13902 » collect_args

	13903

	13904 @@ -585,13 +587,14 @@

1195 ; r13 = JSAMPARRAY * output_data_ptr	13905 ; r13 = JSAMPARRAY * output_data_ptr

1196	13906

1197 align 16	13907 align 16

1198 - global EXTN(jsimd_h2v2_upsample_sse2)	13908 - global EXTN(jsimd_h2v2_upsample_sse2)

1199 + global EXTN(jsimd_h2v2_upsample_sse2) PRIVATE	13909 + global EXTN(jsimd_h2v2_upsample_sse2) PRIVATE

1200	13910

1201 EXTN(jsimd_h2v2_upsample_sse2):	13911 EXTN(jsimd_h2v2_upsample_sse2):

1202 push rbp	13912 push rbp

1203 Index: simd/jcgrass2.asm	13913 +» mov» rax,rsp

1204 ===================================================================	13914 » mov» rbp,rsp

1205 --- simd/jcgrass2.asm» (revision 829)	13915 +» collect_args

1206 +++ simd/jcgrass2.asm» (working copy)	13916 » push» rbx

1207 @@ -30,7 +30,7 @@	13917 -» collect_args

1208 » SECTION»SEG_CONST	13918

1209	13919 » mov» rdx, r11

1210 » alignz» 16	13920 » add» rdx, byte (2*SIZEOF_XMMWORD)-1

1211 -» global» EXTN(jconst_rgb_gray_convert_sse2)	13921 @@ -658,7 +661,11 @@

1212 +» global» EXTN(jconst_rgb_gray_convert_sse2) PRIVATE	13922 » jg» near .rowloop

1213	13923

1214 EXTN(jconst_rgb_gray_convert_sse2):	13924 .return:

1215	13925 +» pop» rbx

1216 Index: simd/jcsammmx.asm	13926 » uncollect_args

1217 ===================================================================	13927 -» pop» rbx

1218 --- simd/jcsammmx.asm» (revision 829)	13928 » pop» rbp

1219 +++ simd/jcsammmx.asm» (working copy)	13929 » ret

1220 @@ -40,7 +40,7 @@

1221 %define output_data(b)»(b)+28» ; JSAMPARRAY output_data

1222

1223 » align» 16

1224 -» global» EXTN(jsimd_h2v1_downsample_mmx)

1225 +» global» EXTN(jsimd_h2v1_downsample_mmx) PRIVATE

1226

1227 EXTN(jsimd_h2v1_downsample_mmx):

1228 » push» ebp

1229 @@ -182,7 +182,7 @@

1230 %define output_data(b)»(b)+28» ; JSAMPARRAY output_data

1231

1232 » align» 16

1233 -» global» EXTN(jsimd_h2v2_downsample_mmx)

1234 +» global» EXTN(jsimd_h2v2_downsample_mmx) PRIVATE

1235

1236 EXTN(jsimd_h2v2_downsample_mmx):

1237 » push» ebp

1238 +Index: simd/jsimd_arm.c

1239 +===================================================================

1240 +--- simd/jsimd_arm.c (revision 272637)

1241 ++++ simd/jsimd_arm.c (working copy)

1242 +@@ -29,0 +29,0 @@

1243 +	13930 +

1244 + static unsigned int simd_support = ~0;	13931 +; For some reason, the OS X linker does not honor the request to align the

	13932 +; segment unless we do this.

	13933 +» align» 16

	13934 Index: simd/jdsamss2.asm

	13935 ===================================================================

	13936 --- simd/jdsamss2.asm» (revision 829)

	13937 +++ simd/jdsamss2.asm» (working copy)

	13938 @@ -22,7 +22,7 @@

	13939 » SECTION»SEG_CONST

	13940

	13941 » alignz» 16

	13942 -» global» EXTN(jconst_fancy_upsample_sse2)

	13943 +» global» EXTN(jconst_fancy_upsample_sse2) PRIVATE

	13944

	13945 EXTN(jconst_fancy_upsample_sse2):

	13946

	13947 @@ -58,7 +58,7 @@

	13948 %define output_data_ptr(b)» (b)+20» » ; JSAMPARRAY * output_data_ptr

	13949

	13950 » align» 16

	13951 -» global» EXTN(jsimd_h2v1_fancy_upsample_sse2)

	13952 +» global» EXTN(jsimd_h2v1_fancy_upsample_sse2) PRIVATE

	13953

	13954 EXTN(jsimd_h2v1_fancy_upsample_sse2):

	13955 » push» ebp

	13956 @@ -214,7 +214,7 @@

	13957 %define gotptr»» wk(0)-SIZEOF_POINTER» ; void * gotptr

	13958

	13959 » align» 16

	13960 -» global» EXTN(jsimd_h2v2_fancy_upsample_sse2)

	13961 +» global» EXTN(jsimd_h2v2_fancy_upsample_sse2) PRIVATE

	13962

	13963 EXTN(jsimd_h2v2_fancy_upsample_sse2):

	13964 » push» ebp

	13965 @@ -538,7 +538,7 @@

	13966 %define output_data_ptr(b)» (b)+20» » ; JSAMPARRAY * output_data_ptr

	13967

	13968 » align» 16

	13969 -» global» EXTN(jsimd_h2v1_upsample_sse2)

	13970 +» global» EXTN(jsimd_h2v1_upsample_sse2) PRIVATE

	13971

	13972 EXTN(jsimd_h2v1_upsample_sse2):

	13973 » push» ebp

	13974 @@ -637,7 +637,7 @@

	13975 %define output_data_ptr(b)» (b)+20» » ; JSAMPARRAY * output_data_ptr

	13976

	13977 » align» 16

	13978 -» global» EXTN(jsimd_h2v2_upsample_sse2)

	13979 +» global» EXTN(jsimd_h2v2_upsample_sse2) PRIVATE

	13980

	13981 EXTN(jsimd_h2v2_upsample_sse2):

	13982 » push» ebp

	13983 @@ -724,3 +724,6 @@

	13984 » pop» ebp

	13985 » ret

	13986

	13987 +; For some reason, the OS X linker does not honor the request to align the

	13988 +; segment unless we do this.

	13989 +» align» 16

	13990 Index: simd/jf3dnflt.asm

	13991 ===================================================================

	13992 --- simd/jf3dnflt.asm» (revision 829)

	13993 +++ simd/jf3dnflt.asm» (working copy)

	13994 @@ -27,7 +27,7 @@

	13995 » SECTION»SEG_CONST

	13996

	13997 » alignz» 16

	13998 -» global» EXTN(jconst_fdct_float_3dnow)

	13999 +» global» EXTN(jconst_fdct_float_3dnow) PRIVATE

	14000

	14001 EXTN(jconst_fdct_float_3dnow):

	14002

	14003 @@ -55,7 +55,7 @@

	14004 %define WK_NUM»» 2

	14005

	14006 » align» 16

	14007 -» global» EXTN(jsimd_fdct_float_3dnow)

	14008 +» global» EXTN(jsimd_fdct_float_3dnow) PRIVATE

	14009

	14010 EXTN(jsimd_fdct_float_3dnow):

	14011 » push» ebp

	14012 @@ -315,3 +315,6 @@

	14013 » pop» ebp

	14014 » ret

	14015

	14016 +; For some reason, the OS X linker does not honor the request to align the

	14017 +; segment unless we do this.

	14018 +» align» 16

	14019 Index: simd/jfmmxfst.asm

	14020 ===================================================================

	14021 --- simd/jfmmxfst.asm» (revision 829)

	14022 +++ simd/jfmmxfst.asm» (working copy)

	14023 @@ -52,7 +52,7 @@

	14024 %define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS)

	14025

	14026 » alignz» 16

	14027 -» global» EXTN(jconst_fdct_ifast_mmx)

	14028 +» global» EXTN(jconst_fdct_ifast_mmx) PRIVATE

	14029

	14030 EXTN(jconst_fdct_ifast_mmx):

	14031

	14032 @@ -80,7 +80,7 @@

	14033 %define WK_NUM»» 2

	14034

	14035 » align» 16

	14036 -» global» EXTN(jsimd_fdct_ifast_mmx)

	14037 +» global» EXTN(jsimd_fdct_ifast_mmx) PRIVATE

	14038

	14039 EXTN(jsimd_fdct_ifast_mmx):

	14040 » push» ebp

	14041 @@ -392,3 +392,6 @@

	14042 » pop» ebp

	14043 » ret

	14044

	14045 +; For some reason, the OS X linker does not honor the request to align the

	14046 +; segment unless we do this.

	14047 +» align» 16

	14048 Index: simd/jfmmxint.asm

	14049 ===================================================================

	14050 --- simd/jfmmxint.asm» (revision 829)

	14051 +++ simd/jfmmxint.asm» (working copy)

	14052 @@ -66,7 +66,7 @@

	14053 » SECTION»SEG_CONST

	14054

	14055 » alignz» 16

	14056 -» global» EXTN(jconst_fdct_islow_mmx)

	14057 +» global» EXTN(jconst_fdct_islow_mmx) PRIVATE

	14058

	14059 EXTN(jconst_fdct_islow_mmx):

	14060

	14061 @@ -101,7 +101,7 @@

	14062 %define WK_NUM»» 2

	14063

	14064 » align» 16

	14065 -» global» EXTN(jsimd_fdct_islow_mmx)

	14066 +» global» EXTN(jsimd_fdct_islow_mmx) PRIVATE

	14067

	14068 EXTN(jsimd_fdct_islow_mmx):

	14069 » push» ebp

	14070 @@ -617,3 +617,6 @@

	14071 » pop» ebp

	14072 » ret

	14073

	14074 +; For some reason, the OS X linker does not honor the request to align the

	14075 +; segment unless we do this.

	14076 +» align» 16

	14077 Index: simd/jfss2fst-64.asm

	14078 ===================================================================

	14079 --- simd/jfss2fst-64.asm» (revision 829)

	14080 +++ simd/jfss2fst-64.asm» (working copy)

	14081 @@ -1,5 +1,5 @@

	14082 ;

	14083 -; jfss2fst.asm - fast integer FDCT (64-bit SSE2)

	14084 +; jfss2fst-64.asm - fast integer FDCT (64-bit SSE2)

	14085 ;

	14086 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB

	14087 ; Copyright 2009 D. R. Commander

	14088 @@ -53,7 +53,7 @@

	14089 %define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS)

	14090

	14091 » alignz» 16

	14092 -» global» EXTN(jconst_fdct_ifast_sse2)

	14093 +» global» EXTN(jconst_fdct_ifast_sse2) PRIVATE

	14094

	14095 EXTN(jconst_fdct_ifast_sse2):

	14096

	14097 @@ -80,7 +80,7 @@

	14098 %define WK_NUM»» 2

	14099

	14100 » align» 16

	14101 -» global» EXTN(jsimd_fdct_ifast_sse2)

	14102 +» global» EXTN(jsimd_fdct_ifast_sse2) PRIVATE

	14103

	14104 EXTN(jsimd_fdct_ifast_sse2):

	14105 » push» rbp

	14106 @@ -386,3 +386,7 @@

	14107 » pop» rsp» » ; rsp <- original rbp

	14108 » pop» rbp

	14109 » ret

1245 +	14110 +

1246 +-#if defined(__linux__) || defined(ANDROID) || defined(__ANDROID__)	14111 +; For some reason, the OS X linker does not honor the request to align the

1247 ++#if !defined(__ARM_NEON__) && (defined(__linux__) || defined(ANDROID) || defined(__ANDROID__))	14112 +; segment unless we do this.

	14113 +» align» 16

	14114 Index: simd/jfss2fst.asm

	14115 ===================================================================

	14116 --- simd/jfss2fst.asm» (revision 829)

	14117 +++ simd/jfss2fst.asm» (working copy)

	14118 @@ -52,7 +52,7 @@

	14119 %define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS)

	14120

	14121 » alignz» 16

	14122 -» global» EXTN(jconst_fdct_ifast_sse2)

	14123 +» global» EXTN(jconst_fdct_ifast_sse2) PRIVATE

	14124

	14125 EXTN(jconst_fdct_ifast_sse2):

	14126

	14127 @@ -80,7 +80,7 @@

	14128 %define WK_NUM»» 2

	14129

	14130 » align» 16

	14131 -» global» EXTN(jsimd_fdct_ifast_sse2)

	14132 +» global» EXTN(jsimd_fdct_ifast_sse2) PRIVATE

	14133

	14134 EXTN(jsimd_fdct_ifast_sse2):

	14135 » push» ebp

	14136 @@ -399,3 +399,6 @@

	14137 » pop» ebp

	14138 » ret

	14139

	14140 +; For some reason, the OS X linker does not honor the request to align the

	14141 +; segment unless we do this.

	14142 +» align» 16

	14143 Index: simd/jfss2int-64.asm

	14144 ===================================================================

	14145 --- simd/jfss2int-64.asm» (revision 829)

	14146 +++ simd/jfss2int-64.asm» (working copy)

	14147 @@ -1,5 +1,5 @@

	14148 ;

	14149 -; jfss2int.asm - accurate integer FDCT (64-bit SSE2)

	14150 +; jfss2int-64.asm - accurate integer FDCT (64-bit SSE2)

	14151 ;

	14152 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB

	14153 ; Copyright 2009 D. R. Commander

	14154 @@ -67,7 +67,7 @@

	14155 » SECTION»SEG_CONST

	14156

	14157 » alignz» 16

	14158 -» global» EXTN(jconst_fdct_islow_sse2)

	14159 +» global» EXTN(jconst_fdct_islow_sse2) PRIVATE

	14160

	14161 EXTN(jconst_fdct_islow_sse2):

	14162

	14163 @@ -101,7 +101,7 @@

	14164 %define WK_NUM»» 6

	14165

	14166 » align» 16

	14167 -» global» EXTN(jsimd_fdct_islow_sse2)

	14168 +» global» EXTN(jsimd_fdct_islow_sse2) PRIVATE

	14169

	14170 EXTN(jsimd_fdct_islow_sse2):

	14171 » push» rbp

	14172 @@ -616,3 +616,7 @@

	14173 » pop» rsp» » ; rsp <- original rbp

	14174 » pop» rbp

	14175 » ret

1248 +	14176 +

1249 + #define SOMEWHAT_SANE_PROC_CPUINFO_SIZE_LIMIT (1024 * 1024)	14177 +; For some reason, the OS X linker does not honor the request to align the

1250 +	14178 +; segment unless we do this.

1251 +@@ -100,6 +100,6 @@	14179 +» align» 16

1252 + init_simd (void)

1253 + {

1254 + char *env = NULL;

1255 +-#if !defined(__ARM_NEON__) && defined(__linux__) || defined(ANDROID) || defined(__ANDROID__)

1256 ++#if !defined(__ARM_NEON__) && (defined(__linux__) || defined(ANDROID) || defined(__ANDROID__))

1257 + int bufsize = 1024; /* an initial guess for the line buffer size limit */

1258 + #endif

1259 +
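The `init_simd` hunk above only adds parentheses to the `#if` guard, but the change matters: `&&` binds tighter than `||`, so the old guard parsed as `(!defined(__ARM_NEON__) && defined(__linux__)) || defined(ANDROID) || defined(__ANDROID__)` and was true on every Android build, NEON or not. The following is a minimal sketch that models the guard as ordinary C expressions. The function names `old_guard`/`new_guard` and their parameters are hypothetical, each `bool` stands in for one `defined(...)` test, and `ANDROID`/`__ANDROID__` are collapsed into a single `android` flag for brevity.

```c
#include <stdbool.h>

/* Hypothetical model of the guard before the fix. Because && binds
   tighter than ||, this evaluates as (!neon && linux_) || android,
   so it is true on any Android build, including ARM NEON ones.
   (linux_ avoids the GNU C predefined macro named "linux".) */
static bool old_guard(bool neon, bool linux_, bool android) {
  return !neon && linux_ || android;
}

/* Hypothetical model of the guard after the fix: the platform tests
   are grouped first, so a NEON build never enables the guarded code. */
static bool new_guard(bool neon, bool linux_, bool android) {
  return !neon && (linux_ || android);
}
```

For an Android NEON configuration, `old_guard(true, false, true)` is true while `new_guard(true, false, true)` is false; for a non-NEON Linux build both agree. Compilers with `-Wparentheses` enabled typically warn about the unparenthesized form for exactly this reason.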

1260 Index: simd/jsimd_arm_neon.S

1261 ===================================================================

1262 --- simd/jsimd_arm_neon.S» (revision 272637)

1263 +++ simd/jsimd_arm_neon.S» (working copy)

1264 @@ -41,11 +41,9 @@

1265 /* Supplementary macro for setting function attributes */

1266 .macro asm_function fname

1267 #ifdef __APPLE__

1268 - .func _\fname

1269 .globl _\fname

1270 _\fname:

1271 #else

1272 - .func \fname

1273 .global \fname

1274 #ifdef __ELF__

1275 .hidden \fname

1276 @@ -670,7 +668,6 @@

1277 .unreq ROW6R

1278 .unreq ROW7L

1279 .unreq ROW7R

1280 -.endfunc

1281

1282

1283 /*****************************************************************************/

1284 @@ -895,7 +892,6 @@

1285 .unreq TMP2

1286 .unreq TMP3

1287 .unreq TMP4

1288 -.endfunc

1289

1290

1291 /*****************************************************************************/

1292 @@ -1108,7 +1104,6 @@

1293 .unreq TMP2

1294 .unreq TMP3

1295 .unreq TMP4

1296 -.endfunc

1297

1298 .purgem idct_helper

1299

1300 @@ -1263,7 +1258,6 @@

1301 .unreq OUTPUT_COL

1302 .unreq TMP1

1303 .unreq TMP2

1304 -.endfunc

1305

1306 .purgem idct_helper

1307

1308 @@ -1547,7 +1541,6 @@

1309 .unreq U

1310 .unreq V

1311 .unreq N

1312 -.endfunc

1313

1314 .purgem do_yuv_to_rgb

1315 .purgem do_yuv_to_rgb_stage1

1316 @@ -1858,7 +1851,6 @@

1317 .unreq U

1318 .unreq V

1319 .unreq N

1320 -.endfunc

1321

1322 .purgem do_rgb_to_yuv

1323 .purgem do_rgb_to_yuv_stage1

1324 @@ -1940,7 +1932,6 @@

1325 .unreq TMP2

1326 .unreq TMP3

1327 .unreq TMP4

1328 -.endfunc

1329

1330

1331 /*****************************************************************************/

1332 @@ -2064,7 +2055,6 @@

1333

1334 .unreq DATA

1335 .unreq TMP

1336 -.endfunc

1337

1338

1339 /*****************************************************************************/

1340 @@ -2166,7 +2156,6 @@

1341 .unreq CORRECTION

1342 .unreq SHIFT

1343 .unreq LOOP_COUNT

1344 -.endfunc

1345

1346

1347 /*****************************************************************************/

1348 @@ -2401,7 +2390,6 @@

1349 .unreq WIDTH

1350 .unreq TMP

1351

1352 -.endfunc

1353

1354 .purgem upsample16

1355 .purgem upsample32

1356 Index: simd/jsimd_i386.c

1357 ===================================================================

1358 --- simd/jsimd_i386.c» (revision 829)

1359 +++ simd/jsimd_i386.c» (working copy)

1360 @@ -61,6 +61,7 @@

1361 simd_support &= JSIMD_SSE2;

1362 }

1363

1364 +#ifndef JPEG_DECODE_ONLY

1365 GLOBAL(int)

1366 jsimd_can_rgb_ycc (void)

1367 {

1368 @@ -82,6 +83,7 @@

1369

1370 return 0;

1371 }

1372 +#endif

1373

1374 GLOBAL(int)

1375 jsimd_can_rgb_gray (void)

1376 @@ -127,6 +129,7 @@

1377 return 0;

1378 }

1379

1380 +#ifndef JPEG_DECODE_ONLY

1381 GLOBAL(void)

1382 jsimd_rgb_ycc_convert (j_compress_ptr cinfo,

1383 JSAMPARRAY input_buf, JSAMPIMAGE output_buf,

1384 @@ -179,6 +182,7 @@

1385 mmxfct(cinfo->image_width, input_buf,

1386 output_buf, output_row, num_rows);

1387 }

1388 +#endif

1389

1390 GLOBAL(void)

1391 jsimd_rgb_gray_convert (j_compress_ptr cinfo,

1392 @@ -286,6 +290,7 @@

1393 input_row, output_buf, num_rows);

1394 }

1395

1396 +#ifndef JPEG_DECODE_ONLY

1397 GLOBAL(int)

1398 jsimd_can_h2v2_downsample (void)

1399 {

1400 @@ -351,6 +356,7 @@

1401 compptr->v_samp_factor, compptr->width_in_blocks,

1402 input_data, output_data);

1403 }

1404 +#endif

1405

1406 GLOBAL(int)

1407 jsimd_can_h2v2_upsample (void)

1408 @@ -636,6 +642,7 @@

1409 in_row_group_ctr, output_buf);

1410 }

1411

1412 +#ifndef JPEG_DECODE_ONLY

1413 GLOBAL(int)

1414 jsimd_can_convsamp (void)

1415 {

1416 @@ -855,6 +862,7 @@

1417 else if (simd_support & JSIMD_3DNOW)

1418 jsimd_quantize_float_3dnow(coef_block, divisors, workspace);

1419 }

1420 +#endif

1421

1422 GLOBAL(int)

1423 jsimd_can_idct_2x2 (void)

1424 @@ -1045,4 +1053,3 @@

1425 jsimd_idct_float_3dnow(compptr->dct_table, coef_block,

1426 output_buf, output_col);

1427 }

1428 -

1429 Index: simd/jcqnts2f-64.asm

1430 ===================================================================

1431 --- simd/jcqnts2f-64.asm» (revision 829)

1432 +++ simd/jcqnts2f-64.asm» (working copy)

1433 @@ -36,7 +36,7 @@

1434 ; r12 = FAST_FLOAT * workspace

1435

1436 » align» 16

1437 -» global» EXTN(jsimd_convsamp_float_sse2)

1438 +» global» EXTN(jsimd_convsamp_float_sse2) PRIVATE

1439

1440 EXTN(jsimd_convsamp_float_sse2):

1441 » push» rbp

1442 @@ -110,7 +110,7 @@

1443 ; r12 = FAST_FLOAT * workspace

1444

1445 » align» 16

1446 -» global» EXTN(jsimd_quantize_float_sse2)

1447 +» global» EXTN(jsimd_quantize_float_sse2) PRIVATE

1448

1449 EXTN(jsimd_quantize_float_sse2):

1450 » push» rbp

1451 Index: simd/jcqnt3dn.asm

1452 ===================================================================

1453 --- simd/jcqnt3dn.asm» (revision 829)

1454 +++ simd/jcqnt3dn.asm» (working copy)

1455 @@ -35,7 +35,7 @@

1456 %define workspace» ebp+16» » ; FAST_FLOAT * workspace

1457

1458 » align» 16

1459 -» global» EXTN(jsimd_convsamp_float_3dnow)

1460 +» global» EXTN(jsimd_convsamp_float_3dnow) PRIVATE

1461

1462 EXTN(jsimd_convsamp_float_3dnow):

1463 » push» ebp

1464 @@ -138,7 +138,7 @@

1465 %define workspace» ebp+16» » ; FAST_FLOAT * workspace

1466

1467 » align» 16

1468 -» global» EXTN(jsimd_quantize_float_3dnow)

1469 +» global» EXTN(jsimd_quantize_float_3dnow) PRIVATE

1470

1471 EXTN(jsimd_quantize_float_3dnow):

1472 » push» ebp

1473 Index: simd/jcsamss2.asm

1474 ===================================================================

1475 --- simd/jcsamss2.asm» (revision 829)

1476 +++ simd/jcsamss2.asm» (working copy)

1477 @@ -40,7 +40,7 @@

1478 %define output_data(b)»(b)+28» » ; JSAMPARRAY output_data

1479

1480 » align» 16

1481 -» global» EXTN(jsimd_h2v1_downsample_sse2)

1482 +» global» EXTN(jsimd_h2v1_downsample_sse2) PRIVATE

1483

1484 EXTN(jsimd_h2v1_downsample_sse2):

1485 » push» ebp

1486 @@ -195,7 +195,7 @@

1487 %define output_data(b)»(b)+28» ; JSAMPARRAY output_data

1488

1489 » align» 16

1490 -» global» EXTN(jsimd_h2v2_downsample_sse2)

1491 +» global» EXTN(jsimd_h2v2_downsample_sse2) PRIVATE

1492

1493 EXTN(jsimd_h2v2_downsample_sse2):

1494 » push» ebp

1495 Index: simd/jsimd_x86_64.c

1496 ===================================================================

1497 --- simd/jsimd_x86_64.c»(revision 829)

1498 +++ simd/jsimd_x86_64.c»(working copy)

1499 @@ -29,6 +29,7 @@

1500

1501 #define IS_ALIGNED_SSE(ptr) (IS_ALIGNED(ptr, 4)) /* 16 byte alignment */

1502

1503 +#ifndef JPEG_DECODE_ONLY

1504 GLOBAL(int)

1505 jsimd_can_rgb_ycc (void)

1506 {

1507 @@ -45,6 +46,7 @@

1508

1509 return 1;

1510 }

1511 +#endif

1512

1513 GLOBAL(int)

1514 jsimd_can_rgb_gray (void)

1515 @@ -80,6 +82,7 @@

1516 return 1;

1517 }

1518

1519 +#ifndef JPEG_DECODE_ONLY

1520 GLOBAL(void)

1521 jsimd_rgb_ycc_convert (j_compress_ptr cinfo,

1522 JSAMPARRAY input_buf, JSAMPIMAGE output_buf,

1523 @@ -118,6 +121,7 @@

1524

1525 sse2fct(cinfo->image_width, input_buf, output_buf, output_row, num_rows);

1526 }

1527 +#endif

1528

1529 GLOBAL(void)

1530 jsimd_rgb_gray_convert (j_compress_ptr cinfo,

1531 @@ -197,6 +201,7 @@

1532 sse2fct(cinfo->output_width, input_buf, input_row, output_buf, num_rows);

1533 }

1534

1535 +#ifndef JPEG_DECODE_ONLY

1536 GLOBAL(int)

1537 jsimd_can_h2v2_downsample (void)

1538 {

1539 @@ -242,6 +247,7 @@

1540 compptr->width_in_blocks,

1541 input_data, output_data);

1542 }

1543 +#endif

1544

1545 GLOBAL(int)

1546 jsimd_can_h2v2_upsample (void)

1547 @@ -451,6 +457,7 @@

1548 sse2fct(cinfo->output_width, input_buf, in_row_group_ctr, output_buf);

1549 }

1550

1551 +#ifndef JPEG_DECODE_ONLY

1552 GLOBAL(int)

1553 jsimd_can_convsamp (void)

1554 {

1555 @@ -601,6 +608,7 @@

1556 {

1557 jsimd_quantize_float_sse2(coef_block, divisors, workspace);

1558 }

1559 +#endif

1560

1561 GLOBAL(int)

1562 jsimd_can_idct_2x2 (void)

1563 @@ -750,4 +758,3 @@

1564 jsimd_idct_float_sse2(compptr->dct_table, coef_block,

1565 output_buf, output_col);

1566 }

1567 -

1568 Index: simd/jimmxint.asm

1569 ===================================================================

1570 --- simd/jimmxint.asm» (revision 829)

1571 +++ simd/jimmxint.asm» (working copy)

1572 @@ -66,7 +66,7 @@

1573 » SECTION»SEG_CONST

1574

1575 » alignz» 16

1576 -» global» EXTN(jconst_idct_islow_mmx)

1577 +» global» EXTN(jconst_idct_islow_mmx) PRIVATE

1578

1579 EXTN(jconst_idct_islow_mmx):

1580

1581 @@ -107,7 +107,7 @@

1582 » » » » » ; JCOEF workspace[DCTSIZE2]

1583

1584 » align» 16

1585 -» global» EXTN(jsimd_idct_islow_mmx)

1586 +» global» EXTN(jsimd_idct_islow_mmx) PRIVATE

1587

1588 EXTN(jsimd_idct_islow_mmx):

1589 » push» ebp

1590 Index: simd/jcgrymmx.asm

1591 ===================================================================

1592 --- simd/jcgrymmx.asm» (revision 829)

1593 +++ simd/jcgrymmx.asm» (working copy)

1594 @@ -41,7 +41,7 @@

1595 %define gotptr»» wk(0)-SIZEOF_POINTER» ; void * gotptr

1596

1597 » align» 16

1598 -» global» EXTN(jsimd_rgb_gray_convert_mmx)

1599 +» global» EXTN(jsimd_rgb_gray_convert_mmx) PRIVATE

1600

1601 EXTN(jsimd_rgb_gray_convert_mmx):

1602 » push» ebp

1603 Index: simd/jfss2int.asm	14180 Index: simd/jfss2int.asm

1604 ===================================================================	14181 ===================================================================

1605 --- simd/jfss2int.asm (revision 829)	14182 --- simd/jfss2int.asm (revision 829)

1606 +++ simd/jfss2int.asm (working copy)	14183 +++ simd/jfss2int.asm (working copy)

1607 @@ -66,7 +66,7 @@	14184 @@ -66,7 +66,7 @@

1608 SECTION SEG_CONST	14185 SECTION SEG_CONST

1609	14186

1610 alignz 16	14187 alignz 16

1611 - global EXTN(jconst_fdct_islow_sse2)	14188 - global EXTN(jconst_fdct_islow_sse2)

1612 + global EXTN(jconst_fdct_islow_sse2) PRIVATE	14189 + global EXTN(jconst_fdct_islow_sse2) PRIVATE

1613	14190

1614 EXTN(jconst_fdct_islow_sse2):	14191 EXTN(jconst_fdct_islow_sse2):

1615	14192

1616 @@ -101,7 +101,7 @@	14193 @@ -101,7 +101,7 @@

1617 %define WK_NUM 6	14194 %define WK_NUM 6

1618	14195

1619 align 16	14196 align 16

1620 - global EXTN(jsimd_fdct_islow_sse2)	14197 - global EXTN(jsimd_fdct_islow_sse2)

1621 + global EXTN(jsimd_fdct_islow_sse2) PRIVATE	14198 + global EXTN(jsimd_fdct_islow_sse2) PRIVATE

1622	14199

1623 EXTN(jsimd_fdct_islow_sse2):	14200 EXTN(jsimd_fdct_islow_sse2):

1624 push ebp	14201 push ebp

1625 Index: simd/jcgryss2.asm	14202 @@ -629,3 +629,6 @@

	14203 » pop» ebp

	14204 » ret

	14205

	14206 +; For some reason, the OS X linker does not honor the request to align the

	14207 +; segment unless we do this.

	14208 +» align» 16

	14209 Index: simd/jfsseflt-64.asm

1626 ===================================================================	14210 ===================================================================

1627 --- simd/jcgryss2.asm» (revision 829)	14211 --- simd/jfsseflt-64.asm» (revision 829)

1628 +++ simd/jcgryss2.asm» (working copy)	14212 +++ simd/jfsseflt-64.asm» (working copy)

1629 @@ -39,7 +39,7 @@	14213 @@ -1,5 +1,5 @@

	14214 ;

	14215 -; jfsseflt.asm - floating-point FDCT (64-bit SSE)

	14216 +; jfsseflt-64.asm - floating-point FDCT (64-bit SSE)

	14217 ;

	14218 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB

	14219 ; Copyright 2009 D. R. Commander

	14220 @@ -38,7 +38,7 @@

	14221 » SECTION»SEG_CONST

	14222

	14223 » alignz» 16

	14224 -» global» EXTN(jconst_fdct_float_sse)

	14225 +» global» EXTN(jconst_fdct_float_sse) PRIVATE

	14226

	14227 EXTN(jconst_fdct_float_sse):

	14228

	14229 @@ -65,7 +65,7 @@

	14230 %define WK_NUM»» 2

1630	14231

1631 align 16	14232 align 16

	14233 - global EXTN(jsimd_fdct_float_sse)

	14234 + global EXTN(jsimd_fdct_float_sse) PRIVATE

1632	14235

1633 -» global» EXTN(jsimd_rgb_gray_convert_sse2)	14236 EXTN(jsimd_fdct_float_sse):

1634 +» global» EXTN(jsimd_rgb_gray_convert_sse2) PRIVATE	14237 » push» rbp

1635	14238 @@ -352,3 +352,7 @@

1636 EXTN(jsimd_rgb_gray_convert_sse2):	14239 » pop» rsp» » ; rsp <- original rbp

1637 » push» ebp	14240 » pop» rbp

1638 Index: simd/jccolmmx.asm	14241 » ret

	14242 +

	14243 +; For some reason, the OS X linker does not honor the request to align the

	14244 +; segment unless we do this.

	14245 +» align» 16

	14246 Index: simd/jfsseflt.asm

1639 ===================================================================	14247 ===================================================================

1640 --- simd/jccolmmx.asm» (revision 829)	14248 --- simd/jfsseflt.asm» (revision 829)

1641 +++ simd/jccolmmx.asm» (working copy)	14249 +++ simd/jfsseflt.asm» (working copy)

1642 @@ -37,7 +37,7 @@	14250 @@ -37,7 +37,7 @@

1643 SECTION SEG_CONST	14251 SECTION SEG_CONST

1644	14252

1645 alignz 16	14253 alignz 16

1646 -» global» EXTN(jconst_rgb_ycc_convert_mmx)	14254 -» global» EXTN(jconst_fdct_float_sse)

1647 +» global» EXTN(jconst_rgb_ycc_convert_mmx) PRIVATE	14255 +» global» EXTN(jconst_fdct_float_sse) PRIVATE

1648	14256

1649 EXTN(jconst_rgb_ycc_convert_mmx):	14257 EXTN(jconst_fdct_float_sse):

1650	14258

	14259 @@ -65,7 +65,7 @@

	14260 %define WK_NUM 2

	14261

	14262 align 16

	14263 - global EXTN(jsimd_fdct_float_sse)

	14264 + global EXTN(jsimd_fdct_float_sse) PRIVATE

	14265

	14266 EXTN(jsimd_fdct_float_sse):

	14267 push ebp

	14268 @@ -365,3 +365,6 @@

	14269 pop ebp

	14270 ret

	14271

	14272 +; For some reason, the OS X linker does not honor the request to align the

	14273 +; segment unless we do this.

	14274 + align 16

	14275 Index: simd/ji3dnflt.asm

	14276 ===================================================================

	14277 --- simd/ji3dnflt.asm (revision 829)

	14278 +++ simd/ji3dnflt.asm (working copy)

	14279 @@ -27,7 +27,7 @@

	14280 SECTION SEG_CONST

	14281

	14282 alignz 16

	14283 - global EXTN(jconst_idct_float_3dnow)

	14284 + global EXTN(jconst_idct_float_3dnow) PRIVATE

	14285

	14286 EXTN(jconst_idct_float_3dnow):

	14287

	14288 @@ -63,7 +63,7 @@

	14289 ; FAST_FLOAT workspace[DCTSIZE2]

	14290

	14291 align 16

	14292 - global EXTN(jsimd_idct_float_3dnow)

	14293 + global EXTN(jsimd_idct_float_3dnow) PRIVATE

	14294

	14295 EXTN(jsimd_idct_float_3dnow):

	14296 push ebp

	14297 @@ -447,3 +447,6 @@

	14298 pop ebp

	14299 ret

	14300

	14301 +; For some reason, the OS X linker does not honor the request to align the

	14302 +; segment unless we do this.

	14303 + align 16

	14304 Index: simd/jimmxfst.asm

	14305 ===================================================================

	14306 --- simd/jimmxfst.asm (revision 829)

	14307 +++ simd/jimmxfst.asm (working copy)

	14308 @@ -59,7 +59,7 @@

	14309 %define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS)

	14310

	14311 alignz 16

	14312 - global EXTN(jconst_idct_ifast_mmx)

	14313 + global EXTN(jconst_idct_ifast_mmx) PRIVATE

	14314

	14315 EXTN(jconst_idct_ifast_mmx):

	14316

	14317 @@ -94,7 +94,7 @@

	14318 ; JCOEF workspace[DCTSIZE2]

	14319

	14320 align 16

	14321 - global EXTN(jsimd_idct_ifast_mmx)

	14322 + global EXTN(jsimd_idct_ifast_mmx) PRIVATE

	14323

	14324 EXTN(jsimd_idct_ifast_mmx):

	14325 push ebp

	14326 @@ -495,3 +495,6 @@

	14327 pop ebp

	14328 ret

	14329

	14330 +; For some reason, the OS X linker does not honor the request to align the

	14331 +; segment unless we do this.

	14332 + align 16

	14333 Index: simd/jimmxint.asm

	14334 ===================================================================

	14335 --- simd/jimmxint.asm (revision 829)

	14336 +++ simd/jimmxint.asm (working copy)

	14337 @@ -66,7 +66,7 @@

	14338 SECTION SEG_CONST

	14339

	14340 alignz 16

	14341 - global EXTN(jconst_idct_islow_mmx)

	14342 + global EXTN(jconst_idct_islow_mmx) PRIVATE

	14343

	14344 EXTN(jconst_idct_islow_mmx):

	14345

	14346 @@ -107,7 +107,7 @@

	14347 ; JCOEF workspace[DCTSIZE2]

	14348

	14349 align 16

	14350 - global EXTN(jsimd_idct_islow_mmx)

	14351 + global EXTN(jsimd_idct_islow_mmx) PRIVATE

	14352

	14353 EXTN(jsimd_idct_islow_mmx):

	14354 push ebp

	14355 @@ -847,3 +847,6 @@

	14356 pop ebp

	14357 ret

	14358

	14359 +; For some reason, the OS X linker does not honor the request to align the

	14360 +; segment unless we do this.

	14361 + align 16

1651 Index: simd/jimmxred.asm	14362 Index: simd/jimmxred.asm

1652 ===================================================================	14363 ===================================================================

1653 --- simd/jimmxred.asm (revision 829)	14364 --- simd/jimmxred.asm (revision 829)

1654 +++ simd/jimmxred.asm (working copy)	14365 +++ simd/jimmxred.asm (working copy)

1655 @@ -72,7 +72,7 @@	14366 @@ -72,7 +72,7 @@

1656 SECTION SEG_CONST	14367 SECTION SEG_CONST

1657	14368

1658 alignz 16	14369 alignz 16

1659 - global EXTN(jconst_idct_red_mmx)	14370 - global EXTN(jconst_idct_red_mmx)

1660 + global EXTN(jconst_idct_red_mmx) PRIVATE	14371 + global EXTN(jconst_idct_red_mmx) PRIVATE

(...skipping 11 matching lines...)
1672 push ebp	14383 push ebp

1673 @@ -503,7 +503,7 @@	14384 @@ -503,7 +503,7 @@

1674 %define output_col(b) (b)+20 ; JDIMENSION output_col	14385 %define output_col(b) (b)+20 ; JDIMENSION output_col

1675	14386

1676 align 16	14387 align 16

1677 - global EXTN(jsimd_idct_2x2_mmx)	14388 - global EXTN(jsimd_idct_2x2_mmx)

1678 + global EXTN(jsimd_idct_2x2_mmx) PRIVATE	14389 + global EXTN(jsimd_idct_2x2_mmx) PRIVATE

1679	14390

1680 EXTN(jsimd_idct_2x2_mmx):	14391 EXTN(jsimd_idct_2x2_mmx):

1681 push ebp	14392 push ebp

1682 Index: simd/jsimdext.inc	14393 @@ -701,3 +701,6 @@

	14394 » pop» ebp

	14395 » ret

	14396

	14397 +; For some reason, the OS X linker does not honor the request to align the

	14398 +; segment unless we do this.

	14399 +» align» 16

	14400 Index: simd/jiss2flt-64.asm

1683 ===================================================================	14401 ===================================================================

1684 --- simd/jsimdext.inc» (revision 829)	14402 --- simd/jiss2flt-64.asm» (revision 829)

1685 +++ simd/jsimdext.inc» (working copy)	14403 +++ simd/jiss2flt-64.asm» (working copy)

1686 @@ -73,6 +73,9 @@	14404 @@ -1,5 +1,5 @@

1687 ; * *BSD family Unix using elf format

1688 ; * Unix System V, including Solaris x86, UnixWare and SCO Unix

1689

1690 +; PIC is the default on Linux

1691 +%define PIC

1692 +

1693 ; mark stack as non-executable

1694 section .note.GNU-stack noalloc noexec nowrite progbits

1695

1696 @@ -375,4 +378,14 @@

1697 ;	14405 ;

1698 %include "jsimdcfg.inc"	14406 -; jiss2flt.asm - floating-point IDCT (64-bit SSE & SSE2)

1699	14407 +; jiss2flt-64.asm - floating-point IDCT (64-bit SSE & SSE2)

1700 +; Begin chromium edits	14408 ;

1701 +%ifdef MACHO ; ----(nasm -fmacho -DMACHO ...)--------	14409 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB

1702 +%define PRIVATE :private_extern	14410 ; Copyright 2009 D. R. Commander

1703 +%elifdef ELF ; ----(nasm -felf[64] -DELF ...)------------	14411 @@ -38,7 +38,7 @@

1704 +%define PRIVATE :hidden

1705 +%else

1706 +%define PRIVATE

1707 +%endif

1708 +; End chromium edits

1709 +

1710 ; --------------------------------------------------------------------------

1711 Index: simd/jdclrmmx.asm

1712 ===================================================================

1713 --- simd/jdclrmmx.asm» (revision 829)

1714 +++ simd/jdclrmmx.asm» (working copy)

1715 @@ -40,7 +40,7 @@

1716 %define gotptr»» wk(0)-SIZEOF_POINTER» ; void * gotptr

1717

1718 » align» 16

1719 -» global» EXTN(jsimd_ycc_rgb_convert_mmx)

1720 +» global» EXTN(jsimd_ycc_rgb_convert_mmx) PRIVATE

1721

1722 EXTN(jsimd_ycc_rgb_convert_mmx):

1723 » push» ebp

1724 Index: simd/jccolss2.asm

1725 ===================================================================

1726 --- simd/jccolss2.asm» (revision 829)

1727 +++ simd/jccolss2.asm» (working copy)

1728 @@ -34,7 +34,7 @@

1729 SECTION SEG_CONST	14412 SECTION SEG_CONST

1730	14413

1731 alignz 16	14414 alignz 16

1732 -» global» EXTN(jconst_rgb_ycc_convert_sse2)	14415 -» global» EXTN(jconst_idct_float_sse2)

1733 +» global» EXTN(jconst_rgb_ycc_convert_sse2) PRIVATE	14416 +» global» EXTN(jconst_idct_float_sse2) PRIVATE

1734	14417

1735 EXTN(jconst_rgb_ycc_convert_sse2):	14418 EXTN(jconst_idct_float_sse2):

1736	14419

1737 Index: simd/jisseflt.asm	14420 @@ -74,7 +74,7 @@

	14421 » » » » » ; FAST_FLOAT workspace[DCTSIZE2]

	14422

	14423 » align» 16

	14424 -» global» EXTN(jsimd_idct_float_sse2)

	14425 +» global» EXTN(jsimd_idct_float_sse2) PRIVATE

	14426

	14427 EXTN(jsimd_idct_float_sse2):

	14428 » push» rbp

	14429 @@ -81,11 +81,11 @@

	14430 » mov» rax,rsp»» » » ; rax = original rbp

	14431 » sub» rsp, byte 4

	14432 » and» rsp, byte (-SIZEOF_XMMWORD)» ; align to 128 bits

	14433 -» mov» [rsp],eax

	14434 +» mov» [rsp],rax

	14435 » mov» rbp,rsp»» » » ; rbp = aligned rbp

	14436 » lea» rsp, [workspace]

	14437 +» collect_args

	14438 » push» rbx

	14439 -» collect_args

	14440

	14441 » ; ---- Pass 1: process columns from input, store into work array.

	14442

	14443 @@ -471,9 +471,13 @@

	14444 » dec» rcx» » » » ; ctr

	14445 » jnz» near .rowloop

	14446

	14447 +» pop» rbx

	14448 » uncollect_args

	14449 -» pop» rbx

	14450 » mov» rsp,rbp»» ; rsp <- aligned rbp

	14451 » pop» rsp» » ; rsp <- original rbp

	14452 » pop» rbp

	14453 » ret

	14454 +

	14455 +; For some reason, the OS X linker does not honor the request to align the

	14456 +; segment unless we do this.

	14457 +» align» 16

	14458 Index: simd/jiss2flt.asm

1738 ===================================================================	14459 ===================================================================

1739 --- simd/jisseflt.asm» (revision 829)	14460 --- simd/jiss2flt.asm» (revision 829)

1740 +++ simd/jisseflt.asm» (working copy)	14461 +++ simd/jiss2flt.asm» (working copy)

1741 @@ -37,7 +37,7 @@	14462 @@ -37,7 +37,7 @@

1742 SECTION SEG_CONST	14463 SECTION SEG_CONST

1743	14464

1744 alignz 16	14465 alignz 16

1745 -» global» EXTN(jconst_idct_float_sse)	14466 -» global» EXTN(jconst_idct_float_sse2)

1746 +» global» EXTN(jconst_idct_float_sse) PRIVATE	14467 +» global» EXTN(jconst_idct_float_sse2) PRIVATE

1747	14468

1748 EXTN(jconst_idct_float_sse):	14469 EXTN(jconst_idct_float_sse2):

1749	14470

1750 @@ -73,7 +73,7 @@	14471 @@ -73,7 +73,7 @@

1751 ; FAST_FLOAT workspace[DCTSIZE2]	14472 ; FAST_FLOAT workspace[DCTSIZE2]

1752	14473

1753 align 16	14474 align 16

1754 -» global» EXTN(jsimd_idct_float_sse)	14475 -» global» EXTN(jsimd_idct_float_sse2)

1755 +» global» EXTN(jsimd_idct_float_sse) PRIVATE	14476 +» global» EXTN(jsimd_idct_float_sse2) PRIVATE

1756	14477

1757 EXTN(jsimd_idct_float_sse):	14478 EXTN(jsimd_idct_float_sse2):

1758 push ebp	14479 push ebp

1759 Index: simd/jcqnts2i-64.asm	14480 @@ -493,3 +493,6 @@

	14481 » pop» ebp

	14482 » ret

	14483

	14484 +; For some reason, the OS X linker does not honor the request to align the

	14485 +; segment unless we do this.

	14486 +» align» 16

	14487 Index: simd/jiss2fst-64.asm

1760 ===================================================================	14488 ===================================================================

1761 --- simd/jcqnts2i-64.asm» (revision 829)	14489 --- simd/jiss2fst-64.asm» (revision 829)

1762 +++ simd/jcqnts2i-64.asm» (working copy)	14490 +++ simd/jiss2fst-64.asm» (working copy)

1763 @@ -36,7 +36,7 @@	14491 @@ -1,5 +1,5 @@

1764 ; r12 = DCTELEM * workspace	14492 ;

	14493 -; jiss2fst.asm - fast integer IDCT (64-bit SSE2)

	14494 +; jiss2fst-64.asm - fast integer IDCT (64-bit SSE2)

	14495 ;

	14496 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB

	14497 ; Copyright 2009 D. R. Commander

	14498 @@ -60,7 +60,7 @@

	14499 %define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS)

	14500

	14501 » alignz» 16

	14502 -» global» EXTN(jconst_idct_ifast_sse2)

	14503 +» global» EXTN(jconst_idct_ifast_sse2) PRIVATE

	14504

	14505 EXTN(jconst_idct_ifast_sse2):

	14506

	14507 @@ -93,7 +93,7 @@

	14508 %define WK_NUM»» 2

1765	14509

1766 align 16	14510 align 16

1767 -» global» EXTN(jsimd_convsamp_sse2)	14511 -» global» EXTN(jsimd_idct_ifast_sse2)

1768 +» global» EXTN(jsimd_convsamp_sse2) PRIVATE	14512 +» global» EXTN(jsimd_idct_ifast_sse2) PRIVATE

1769	14513

1770 EXTN(jsimd_convsamp_sse2):	14514 EXTN(jsimd_idct_ifast_sse2):

1771 push rbp	14515 push rbp

1772 @@ -112,7 +112,7 @@	14516 @@ -100,7 +100,7 @@

1773 ; r12 = DCTELEM * workspace	14517 » mov» rax,rsp»» » » ; rax = original rbp

	14518 » sub» rsp, byte 4

	14519 » and» rsp, byte (-SIZEOF_XMMWORD)» ; align to 128 bits

	14520 -» mov» [rsp],eax

	14521 +» mov» [rsp],rax

	14522 » mov» rbp,rsp»» » » ; rbp = aligned rbp

	14523 » lea» rsp, [wk(0)]

	14524 » collect_args

	14525 @@ -486,3 +486,7 @@

	14526 » pop» rbp

	14527 » ret

	14528 » ret

	14529 +

	14530 +; For some reason, the OS X linker does not honor the request to align the

	14531 +; segment unless we do this.

	14532 +» align» 16

	14533 Index: simd/jiss2fst.asm

	14534 ===================================================================

	14535 --- simd/jiss2fst.asm» (revision 829)

	14536 +++ simd/jiss2fst.asm» (working copy)

	14537 @@ -59,7 +59,7 @@

	14538 %define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS)

	14539

	14540 » alignz» 16

	14541 -» global» EXTN(jconst_idct_ifast_sse2)

	14542 +» global» EXTN(jconst_idct_ifast_sse2) PRIVATE

	14543

	14544 EXTN(jconst_idct_ifast_sse2):

	14545

	14546 @@ -92,7 +92,7 @@

	14547 %define WK_NUM»» 2

1774	14548

1775 align 16	14549 align 16

1776 -» global» EXTN(jsimd_quantize_sse2)	14550 -» global» EXTN(jsimd_idct_ifast_sse2)

1777 +» global» EXTN(jsimd_quantize_sse2) PRIVATE	14551 +» global» EXTN(jsimd_idct_ifast_sse2) PRIVATE

1778	14552

1779 EXTN(jsimd_quantize_sse2):	14553 EXTN(jsimd_idct_ifast_sse2):

1780 » push» rbp	14554 » push» ebp

1781 Index: simd/jdclrss2.asm	14555 @@ -497,3 +497,6 @@

1782 ===================================================================	14556 » pop» ebp

1783 --- simd/jdclrss2.asm» (revision 829)	14557 » ret

1784 +++ simd/jdclrss2.asm» (working copy)

1785 @@ -40,7 +40,7 @@

1786 %define gotptr»» wk(0)-SIZEOF_POINTER» ; void * gotptr

1787	14558

1788 » align» 16	14559 +; For some reason, the OS X linker does not honor the request to align the

1789 -» global» EXTN(jsimd_ycc_rgb_convert_sse2)	14560 +; segment unless we do this.

1790 +» global» EXTN(jsimd_ycc_rgb_convert_sse2) PRIVATE	14561 +» align» 16

1791

1792 EXTN(jsimd_ycc_rgb_convert_sse2):

1793 » push» ebp

1794 Index: simd/jcqntsse.asm

1795 ===================================================================

1796 --- simd/jcqntsse.asm» (revision 829)

1797 +++ simd/jcqntsse.asm» (working copy)

1798 @@ -35,7 +35,7 @@

1799 %define workspace» ebp+16» » ; FAST_FLOAT * workspace

1800

1801 » align» 16

1802 -» global» EXTN(jsimd_convsamp_float_sse)

1803 +» global» EXTN(jsimd_convsamp_float_sse) PRIVATE

1804

1805 EXTN(jsimd_convsamp_float_sse):

1806 » push» ebp

1807 @@ -138,7 +138,7 @@

1808 %define workspace» ebp+16» » ; FAST_FLOAT * workspace

1809

1810 » align» 16

1811 -» global» EXTN(jsimd_quantize_float_sse)

1812 +» global» EXTN(jsimd_quantize_float_sse) PRIVATE

1813

1814 EXTN(jsimd_quantize_float_sse):

1815 » push» ebp

1816 Index: simd/jiss2int-64.asm	14562 Index: simd/jiss2int-64.asm

1817 ===================================================================	14563 ===================================================================

1818 --- simd/jiss2int-64.asm (revision 829)	14564 --- simd/jiss2int-64.asm (revision 829)

1819 +++ simd/jiss2int-64.asm (working copy)	14565 +++ simd/jiss2int-64.asm (working copy)

	14566 @@ -1,5 +1,5 @@

	14567 ;

	14568 -; jiss2int.asm - accurate integer IDCT (64-bit SSE2)

	14569 +; jiss2int-64.asm - accurate integer IDCT (64-bit SSE2)

	14570 ;

	14571 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB

	14572 ; Copyright 2009 D. R. Commander

1820 @@ -67,7 +67,7 @@	14573 @@ -67,7 +67,7 @@

1821 SECTION SEG_CONST	14574 SECTION SEG_CONST

1822	14575

1823 alignz 16	14576 alignz 16

1824 - global EXTN(jconst_idct_islow_sse2)	14577 - global EXTN(jconst_idct_islow_sse2)

1825 + global EXTN(jconst_idct_islow_sse2) PRIVATE	14578 + global EXTN(jconst_idct_islow_sse2) PRIVATE

1826	14579

1827 EXTN(jconst_idct_islow_sse2):	14580 EXTN(jconst_idct_islow_sse2):

1828	14581

1829 @@ -106,7 +106,7 @@	14582 @@ -106,7 +106,7 @@

1830 %define WK_NUM 12	14583 %define WK_NUM 12

1831	14584

1832 align 16	14585 align 16

1833 - global EXTN(jsimd_idct_islow_sse2)	14586 - global EXTN(jsimd_idct_islow_sse2)

1834 + global EXTN(jsimd_idct_islow_sse2) PRIVATE	14587 + global EXTN(jsimd_idct_islow_sse2) PRIVATE

1835	14588

1836 EXTN(jsimd_idct_islow_sse2):	14589 EXTN(jsimd_idct_islow_sse2):

1837 push rbp	14590 push rbp

1838 Index: simd/jfmmxfst.asm	14591 @@ -842,3 +842,7 @@

	14592 » pop» rsp» » ; rsp <- original rbp

	14593 » pop» rbp

	14594 » ret

	14595 +

	14596 +; For some reason, the OS X linker does not honor the request to align the

	14597 +; segment unless we do this.

	14598 +» align» 16

	14599 Index: simd/jiss2int.asm

1839 ===================================================================	14600 ===================================================================

1840 --- simd/jfmmxfst.asm» (revision 829)	14601 --- simd/jiss2int.asm» (revision 829)

1841 +++ simd/jfmmxfst.asm» (working copy)	14602 +++ simd/jiss2int.asm» (working copy)

1842 @@ -52,7 +52,7 @@	14603 @@ -66,7 +66,7 @@

1843 %define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS)	14604 » SECTION»SEG_CONST

1844	14605

1845 alignz 16	14606 alignz 16

1846 -» global» EXTN(jconst_fdct_ifast_mmx)	14607 -» global» EXTN(jconst_idct_islow_sse2)

1847 +» global» EXTN(jconst_fdct_ifast_mmx) PRIVATE	14608 +» global» EXTN(jconst_idct_islow_sse2) PRIVATE

1848	14609

1849 EXTN(jconst_fdct_ifast_mmx):	14610 EXTN(jconst_idct_islow_sse2):

1850	14611

1851 @@ -80,7 +80,7 @@	14612 @@ -105,7 +105,7 @@

	14613 %define WK_NUM»» 12

	14614

	14615 » align» 16

	14616 -» global» EXTN(jsimd_idct_islow_sse2)

	14617 +» global» EXTN(jsimd_idct_islow_sse2) PRIVATE

	14618

	14619 EXTN(jsimd_idct_islow_sse2):

	14620 » push» ebp

	14621 @@ -854,3 +854,6 @@

	14622 » pop» ebp

	14623 » ret

	14624

	14625 +; For some reason, the OS X linker does not honor the request to align the

	14626 +; segment unless we do this.

	14627 +» align» 16

	14628 Index: simd/jiss2red-64.asm

	14629 ===================================================================

	14630 --- simd/jiss2red-64.asm» (revision 829)

	14631 +++ simd/jiss2red-64.asm» (working copy)

	14632 @@ -1,5 +1,5 @@

	14633 ;

	14634 -; jiss2red.asm - reduced-size IDCT (64-bit SSE2)

	14635 +; jiss2red-64.asm - reduced-size IDCT (64-bit SSE2)

	14636 ;

	14637 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB

	14638 ; Copyright 2009 D. R. Commander

	14639 @@ -73,7 +73,7 @@

	14640 » SECTION»SEG_CONST

	14641

	14642 » alignz» 16

	14643 -» global» EXTN(jconst_idct_red_sse2)

	14644 +» global» EXTN(jconst_idct_red_sse2) PRIVATE

	14645

	14646 EXTN(jconst_idct_red_sse2):

	14647

	14648 @@ -114,7 +114,7 @@

1852 %define WK_NUM 2	14649 %define WK_NUM 2

1853	14650

1854 align 16	14651 align 16

1855 -» global» EXTN(jsimd_fdct_ifast_mmx)	14652 -» global» EXTN(jsimd_idct_4x4_sse2)

1856 +» global» EXTN(jsimd_fdct_ifast_mmx) PRIVATE	14653 +» global» EXTN(jsimd_idct_4x4_sse2) PRIVATE

1857	14654

1858 EXTN(jsimd_fdct_ifast_mmx):	14655 EXTN(jsimd_idct_4x4_sse2):

	14656 » push» rbp

	14657 @@ -121,7 +121,7 @@

	14658 » mov» rax,rsp»» » » ; rax = original rbp

	14659 » sub» rsp, byte 4

	14660 » and» rsp, byte (-SIZEOF_XMMWORD)» ; align to 128 bits

	14661 -» mov» [rsp],eax

	14662 +» mov» [rsp],rax

	14663 » mov» rbp,rsp»» » » ; rbp = aligned rbp

	14664 » lea» rsp, [wk(0)]

	14665 » collect_args

	14666 @@ -413,13 +413,14 @@

	14667 ; r13 = JDIMENSION output_col

	14668

	14669 » align» 16

	14670 -» global» EXTN(jsimd_idct_2x2_sse2)

	14671 +» global» EXTN(jsimd_idct_2x2_sse2) PRIVATE

	14672

	14673 EXTN(jsimd_idct_2x2_sse2):

	14674 » push» rbp

	14675 +» mov» rax,rsp

	14676 » mov» rbp,rsp

	14677 +» collect_args

	14678 » push» rbx

	14679 -» collect_args

	14680

	14681 » ; ---- Pass 1: process columns from input.

	14682

	14683 @@ -565,7 +566,11 @@

	14684 » mov» WORD [rdx+rax*SIZEOF_JSAMPLE], bx

	14685 » mov» WORD [rsi+rax*SIZEOF_JSAMPLE], cx

	14686

	14687 +» pop» rbx

	14688 » uncollect_args

	14689 -» pop» rbx

	14690 » pop» rbp

	14691 » ret

	14692 +

	14693 +; For some reason, the OS X linker does not honor the request to align the

	14694 +; segment unless we do this.

	14695 +» align» 16

	14696 Index: simd/jiss2red.asm

	14697 ===================================================================

	14698 --- simd/jiss2red.asm» (revision 829)

	14699 +++ simd/jiss2red.asm» (working copy)

	14700 @@ -72,7 +72,7 @@

	14701 » SECTION»SEG_CONST

	14702

	14703 » alignz» 16

	14704 -» global» EXTN(jconst_idct_red_sse2)

	14705 +» global» EXTN(jconst_idct_red_sse2) PRIVATE

	14706

	14707 EXTN(jconst_idct_red_sse2):

	14708

	14709 @@ -113,7 +113,7 @@

	14710 %define WK_NUM»» 2

	14711

	14712 » align» 16

	14713 -» global» EXTN(jsimd_idct_4x4_sse2)

	14714 +» global» EXTN(jsimd_idct_4x4_sse2) PRIVATE

	14715

	14716 EXTN(jsimd_idct_4x4_sse2):

1859 push ebp	14717 push ebp

1860 Index: jdarith.c	14718 @@ -424,7 +424,7 @@

	14719 %define output_col(b)» (b)+20» » ; JDIMENSION output_col

	14720

	14721 » align» 16

	14722 -» global» EXTN(jsimd_idct_2x2_sse2)

	14723 +» global» EXTN(jsimd_idct_2x2_sse2) PRIVATE

	14724

	14725 EXTN(jsimd_idct_2x2_sse2):

	14726 » push» ebp

	14727 @@ -589,3 +589,6 @@

	14728 » pop» ebp

	14729 » ret

	14730

	14731 +; For some reason, the OS X linker does not honor the request to align the

	14732 +; segment unless we do this.

	14733 +» align» 16

	14734 Index: simd/jisseflt.asm

1861 ===================================================================	14735 ===================================================================

1862 --- jdarith.c» (revision 829)	14736 --- simd/jisseflt.asm» (revision 829)

1863 +++ jdarith.c» (working copy)	14737 +++ simd/jisseflt.asm» (working copy)

1864 @@ -150,8 +150,8 @@	14738 @@ -37,7 +37,7 @@

1865 */	14739 » SECTION»SEG_CONST

1866 sv = *st;	14740

1867 qe = jpeg_aritab[sv & 0x7F];»/* => Qe_Value */	14741 » alignz» 16

1868 - nl = qe & 0xFF; qe >>= 8;» /* Next_Index_LPS + Switch_MPS */	14742 -» global» EXTN(jconst_idct_float_sse)

1869 - nm = qe & 0xFF; qe >>= 8;» /* Next_Index_MPS */	14743 +» global» EXTN(jconst_idct_float_sse) PRIVATE

1870 + nl = (unsigned char) qe & 0xFF; qe >>= 8;» /* Next_Index_LPS + Switch_MPS */	14744

1871 + nm = (unsigned char) qe & 0xFF; qe >>= 8;» /* Next_Index_MPS */	14745 EXTN(jconst_idct_float_sse):

1872	14746

1873 /* Decode & estimation procedures per sections D.2.4 & D.2.5 */	14747 @@ -73,7 +73,7 @@

1874 temp = e->a - qe;	14748 » » » » » ; FAST_FLOAT workspace[DCTSIZE2]

1875 Index: jdhuff.c	14749

	14750 » align» 16

	14751 -» global» EXTN(jsimd_idct_float_sse)

	14752 +» global» EXTN(jsimd_idct_float_sse) PRIVATE

	14753

	14754 EXTN(jsimd_idct_float_sse):

	14755 » push» ebp

	14756 @@ -567,3 +567,6 @@

	14757 » pop» ebp

	14758 » ret

	14759

	14760 +; For some reason, the OS X linker does not honor the request to align the

	14761 +; segment unless we do this.

	14762 +» align» 16

	14763 Index: simd/jsimd.h

1876 ===================================================================	14764 ===================================================================

1877 --- jdhuff.c (revision 1541)	14765 --- simd/jsimd.h» (revision 829)

1878 +++ jdhuff.c (working copy)	14766 +++ simd/jsimd.h» (working copy)

1879 @@ -662,7 +662,7 @@	14767 @@ -2,19 +2,22 @@

1880 d_derived_tbl * actbl = entropy->ac_cur_tbls[blkn];	14768 * simd/jsimd.h

1881 register int s, k, r, l;	14769 *

1882	14770 * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB

1883 - HUFF_DECODE_FAST(s, l, dctbl);	14771 + * Copyright 2011 D. R. Commander

1884 + HUFF_DECODE_FAST(s, l, dctbl, slow_decode_mcu);	14772 *

1885 if (s) {	14773 * Based on the x86 SIMD extension for IJG JPEG library,

1886 FILL_BIT_BUFFER_FAST	14774 * Copyright (C) 1999-2006, MIYASAKA Masaru.

1887 r = GET_BITS(s);	14775 + * For conditions of distribution and use, see copyright notice in jsimdext.inc

1888 @@ -679,7 +679,7 @@	14776 *

1889 if (entropy->ac_needed[blkn]) {

1890

1891 for (k = 1; k < DCTSIZE2; k++) {

1892 - HUFF_DECODE_FAST(s, l, actbl);

1893 + HUFF_DECODE_FAST(s, l, actbl, slow_decode_mcu);

1894 r = s >> 4;

1895 s &= 15;

1896

1897 @@ -698,7 +698,7 @@

1898 } else {

1899

1900 for (k = 1; k < DCTSIZE2; k++) {

1901 - HUFF_DECODE_FAST(s, l, actbl);

1902 + HUFF_DECODE_FAST(s, l, actbl, slow_decode_mcu);

1903 r = s >> 4;

1904 s &= 15;

1905

1906 @@ -715,6 +715,7 @@

1907 }

1908

1909 if (cinfo->unread_marker != 0) {

1910 +slow_decode_mcu:

1911 cinfo->unread_marker = 0;

1912 return FALSE;

1913 }

1914 @@ -742,7 +743,7 @@

1915 * this module, since we'll just re-assign them on the next call.)

1916 */	14777 */

1917	14778

1918 -#define BUFSIZE (DCTSIZE2 * 2)	14779 /* Bitmask for supported acceleration methods */

1919 +#define BUFSIZE (DCTSIZE2 * 2u)	14780

1920	14781 -#define JSIMD_NONE 0x00

1921 METHODDEF(boolean)	14782 -#define JSIMD_MMX 0x01

1922 decode_mcu (j_decompress_ptr cinfo, JBLOCKROW *MCU_data)	14783 -#define JSIMD_3DNOW 0x02

1923 Index: jdhuff.h	14784 -#define JSIMD_SSE 0x04

	14785 -#define JSIMD_SSE2 0x08

	14786 +#define JSIMD_NONE 0x00

	14787 +#define JSIMD_MMX 0x01

	14788 +#define JSIMD_3DNOW 0x02

	14789 +#define JSIMD_SSE 0x04

	14790 +#define JSIMD_SSE2 0x08

	14791 +#define JSIMD_ARM_NEON 0x10

	14792

	14793 /* Short forms of external names for systems with brain-damaged linkers. */

	14794

	14795 @@ -27,6 +30,13 @@

	14796 #define jsimd_extbgrx_ycc_convert_mmx jSEXTBGRXYCCM

	14797 #define jsimd_extxbgr_ycc_convert_mmx jSEXTXBGRYCCM

	14798 #define jsimd_extxrgb_ycc_convert_mmx jSEXTXRGBYCCM

	14799 +#define jsimd_rgb_gray_convert_mmx jSRGBGRYM

	14800 +#define jsimd_extrgb_gray_convert_mmx jSEXTRGBGRYM

	14801 +#define jsimd_extrgbx_gray_convert_mmx jSEXTRGBXGRYM

	14802 +#define jsimd_extbgr_gray_convert_mmx jSEXTBGRGRYM

	14803 +#define jsimd_extbgrx_gray_convert_mmx jSEXTBGRXGRYM

	14804 +#define jsimd_extxbgr_gray_convert_mmx jSEXTXBGRGRYM

	14805 +#define jsimd_extxrgb_gray_convert_mmx jSEXTXRGBGRYM

	14806 #define jsimd_ycc_rgb_convert_mmx jSYCCRGBM

	14807 #define jsimd_ycc_extrgb_convert_mmx jSYCCEXTRGBM

	14808 #define jsimd_ycc_extrgbx_convert_mmx jSYCCEXTRGBXM

	14809 @@ -42,6 +52,14 @@

	14810 #define jsimd_extbgrx_ycc_convert_sse2 jSEXTBGRXYCCS2

	14811 #define jsimd_extxbgr_ycc_convert_sse2 jSEXTXBGRYCCS2

	14812 #define jsimd_extxrgb_ycc_convert_sse2 jSEXTXRGBYCCS2

	14813 +#define jconst_rgb_gray_convert_sse2 jSCRGBGRYS2

	14814 +#define jsimd_rgb_gray_convert_sse2 jSRGBGRYS2

	14815 +#define jsimd_extrgb_gray_convert_sse2 jSEXTRGBGRYS2

	14816 +#define jsimd_extrgbx_gray_convert_sse2 jSEXTRGBXGRYS2

	14817 +#define jsimd_extbgr_gray_convert_sse2 jSEXTBGRGRYS2

	14818 +#define jsimd_extbgrx_gray_convert_sse2 jSEXTBGRXGRYS2

	14819 +#define jsimd_extxbgr_gray_convert_sse2 jSEXTXBGRGRYS2

	14820 +#define jsimd_extxrgb_gray_convert_sse2 jSEXTXRGBGRYS2

	14821 #define jconst_ycc_rgb_convert_sse2 jSCYCCRGBS2

	14822 #define jsimd_ycc_rgb_convert_sse2 jSYCCRGBS2

	14823 #define jsimd_ycc_extrgb_convert_sse2 jSYCCEXTRGBS2

	14824 @@ -162,6 +180,35 @@

	14825 JSAMPARRAY input_buf, JSAMPIMAGE output_buf,

	14826 JDIMENSION output_row, int num_rows));

	14827

	14828 +EXTERN(void) jsimd_rgb_gray_convert_mmx

	14829 + JPP((JDIMENSION img_width,

	14830 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,

	14831 + JDIMENSION output_row, int num_rows));

	14832 +EXTERN(void) jsimd_extrgb_gray_convert_mmx

	14833 + JPP((JDIMENSION img_width,

	14834 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,

	14835 + JDIMENSION output_row, int num_rows));

	14836 +EXTERN(void) jsimd_extrgbx_gray_convert_mmx

	14837 + JPP((JDIMENSION img_width,

	14838 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,

	14839 + JDIMENSION output_row, int num_rows));

	14840 +EXTERN(void) jsimd_extbgr_gray_convert_mmx

	14841 + JPP((JDIMENSION img_width,

	14842 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,

	14843 + JDIMENSION output_row, int num_rows));

	14844 +EXTERN(void) jsimd_extbgrx_gray_convert_mmx

	14845 + JPP((JDIMENSION img_width,

	14846 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,

	14847 + JDIMENSION output_row, int num_rows));

	14848 +EXTERN(void) jsimd_extxbgr_gray_convert_mmx

	14849 + JPP((JDIMENSION img_width,

	14850 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,

	14851 + JDIMENSION output_row, int num_rows));

	14852 +EXTERN(void) jsimd_extxrgb_gray_convert_mmx

	14853 + JPP((JDIMENSION img_width,

	14854 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,

	14855 + JDIMENSION output_row, int num_rows));

	14856 +

	14857 EXTERN(void) jsimd_ycc_rgb_convert_mmx

	14858 JPP((JDIMENSION out_width,

	14859 JSAMPIMAGE input_buf, JDIMENSION input_row,

	14860 @@ -221,6 +268,36 @@

	14861 JSAMPARRAY input_buf, JSAMPIMAGE output_buf,

	14862 JDIMENSION output_row, int num_rows));

	14863

	14864 +extern const int jconst_rgb_gray_convert_sse2[];

	14865 +EXTERN(void) jsimd_rgb_gray_convert_sse2

	14866 + JPP((JDIMENSION img_width,

	14867 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,

	14868 + JDIMENSION output_row, int num_rows));

	14869 +EXTERN(void) jsimd_extrgb_gray_convert_sse2

	14870 + JPP((JDIMENSION img_width,

	14871 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,

	14872 + JDIMENSION output_row, int num_rows));

	14873 +EXTERN(void) jsimd_extrgbx_gray_convert_sse2

	14874 + JPP((JDIMENSION img_width,

	14875 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,

	14876 + JDIMENSION output_row, int num_rows));

	14877 +EXTERN(void) jsimd_extbgr_gray_convert_sse2

	14878 + JPP((JDIMENSION img_width,

	14879 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,

	14880 + JDIMENSION output_row, int num_rows));

	14881 +EXTERN(void) jsimd_extbgrx_gray_convert_sse2

	14882 + JPP((JDIMENSION img_width,

	14883 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,

	14884 + JDIMENSION output_row, int num_rows));

	14885 +EXTERN(void) jsimd_extxbgr_gray_convert_sse2

	14886 + JPP((JDIMENSION img_width,

	14887 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,

	14888 + JDIMENSION output_row, int num_rows));

	14889 +EXTERN(void) jsimd_extxrgb_gray_convert_sse2

	14890 + JPP((JDIMENSION img_width,

	14891 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,

	14892 + JDIMENSION output_row, int num_rows));

	14893 +

	14894 extern const int jconst_ycc_rgb_convert_sse2[];

	14895 EXTERN(void) jsimd_ycc_rgb_convert_sse2

	14896 JPP((JDIMENSION out_width,

	14897 @@ -251,6 +328,64 @@

	14898 JSAMPIMAGE input_buf, JDIMENSION input_row,

	14899 JSAMPARRAY output_buf, int num_rows));

	14900

	14901 +EXTERN(void) jsimd_rgb_ycc_convert_neon

	14902 + JPP((JDIMENSION img_width,

	14903 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,

	14904 + JDIMENSION output_row, int num_rows));

	14905 +EXTERN(void) jsimd_extrgb_ycc_convert_neon

	14906 + JPP((JDIMENSION img_width,

	14907 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,

	14908 + JDIMENSION output_row, int num_rows));

	14909 +EXTERN(void) jsimd_extrgbx_ycc_convert_neon

	14910 + JPP((JDIMENSION img_width,

	14911 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,

	14912 + JDIMENSION output_row, int num_rows));

	14913 +EXTERN(void) jsimd_extbgr_ycc_convert_neon

	14914 + JPP((JDIMENSION img_width,

	14915 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,

	14916 + JDIMENSION output_row, int num_rows));

	14917 +EXTERN(void) jsimd_extbgrx_ycc_convert_neon

	14918 + JPP((JDIMENSION img_width,

	14919 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,

	14920 + JDIMENSION output_row, int num_rows));

	14921 +EXTERN(void) jsimd_extxbgr_ycc_convert_neon

	14922 + JPP((JDIMENSION img_width,

	14923 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,

	14924 + JDIMENSION output_row, int num_rows));

	14925 +EXTERN(void) jsimd_extxrgb_ycc_convert_neon

	14926 + JPP((JDIMENSION img_width,

	14927 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,

	14928 + JDIMENSION output_row, int num_rows));

	14929 +

	14930 +EXTERN(void) jsimd_ycc_rgb_convert_neon

	14931 + JPP((JDIMENSION out_width,

	14932 + JSAMPIMAGE input_buf, JDIMENSION input_row,

	14933 + JSAMPARRAY output_buf, int num_rows));

	14934 +EXTERN(void) jsimd_ycc_extrgb_convert_neon

	14935 + JPP((JDIMENSION out_width,

	14936 + JSAMPIMAGE input_buf, JDIMENSION input_row,

	14937 + JSAMPARRAY output_buf, int num_rows));

	14938 +EXTERN(void) jsimd_ycc_extrgbx_convert_neon

	14939 + JPP((JDIMENSION out_width,

	14940 + JSAMPIMAGE input_buf, JDIMENSION input_row,

	14941 + JSAMPARRAY output_buf, int num_rows));

	14942 +EXTERN(void) jsimd_ycc_extbgr_convert_neon

	14943 + JPP((JDIMENSION out_width,

	14944 + JSAMPIMAGE input_buf, JDIMENSION input_row,

	14945 + JSAMPARRAY output_buf, int num_rows));

	14946 +EXTERN(void) jsimd_ycc_extbgrx_convert_neon

	14947 + JPP((JDIMENSION out_width,

	14948 + JSAMPIMAGE input_buf, JDIMENSION input_row,

	14949 + JSAMPARRAY output_buf, int num_rows));

	14950 +EXTERN(void) jsimd_ycc_extxbgr_convert_neon

	14951 + JPP((JDIMENSION out_width,

	14952 + JSAMPIMAGE input_buf, JDIMENSION input_row,

	14953 + JSAMPARRAY output_buf, int num_rows));

	14954 +EXTERN(void) jsimd_ycc_extxrgb_convert_neon

	14955 + JPP((JDIMENSION out_width,

	14956 + JSAMPIMAGE input_buf, JDIMENSION input_row,

	14957 + JSAMPARRAY output_buf, int num_rows));

	14958 +

	14959 /* SIMD Downsample */

	14960 EXTERN(void) jsimd_h2v2_downsample_mmx

	14961 JPP((JDIMENSION image_width, int max_v_samp_factor,

	14962 @@ -387,6 +522,10 @@

	14963 JPP((JDIMENSION output_width, JSAMPIMAGE input_buf,

	14964 JDIMENSION in_row_group_ctr, JSAMPARRAY output_buf));

	14965

	14966 +EXTERN(void) jsimd_h2v1_fancy_upsample_neon

	14967 + JPP((int max_v_samp_factor, JDIMENSION downsampled_width,

	14968 + JSAMPARRAY input_data, JSAMPARRAY * output_data_ptr));

	14969 +

	14970 /* SIMD Sample Conversion */

	14971 EXTERN(void) jsimd_convsamp_mmx JPP((JSAMPARRAY sample_data,

	14972 JDIMENSION start_col,

	14973 @@ -396,6 +535,10 @@

	14974 JDIMENSION start_col,

	14975 DCTELEM * workspace));

	14976

	14977 +EXTERN(void) jsimd_convsamp_neon JPP((JSAMPARRAY sample_data,

	14978 + JDIMENSION start_col,

	14979 + DCTELEM * workspace));

	14980 +

	14981 EXTERN(void) jsimd_convsamp_float_3dnow JPP((JSAMPARRAY sample_data,

	14982 JDIMENSION start_col,

	14983 FAST_FLOAT * workspace));

	14984 @@ -417,6 +560,8 @@

	14985 extern const int jconst_fdct_islow_sse2[];

	14986 EXTERN(void) jsimd_fdct_ifast_sse2 JPP((DCTELEM * data));

	14987

	14988 +EXTERN(void) jsimd_fdct_ifast_neon JPP((DCTELEM * data));

	14989 +

	14990 EXTERN(void) jsimd_fdct_float_3dnow JPP((FAST_FLOAT * data));

	14991

	14992 extern const int jconst_fdct_float_sse[];

	14993 @@ -431,6 +576,10 @@

	14994 DCTELEM * divisors,

	14995 DCTELEM * workspace));

	14996

	14997 +EXTERN(void) jsimd_quantize_neon JPP((JCOEFPTR coef_block,

	14998 + DCTELEM * divisors,

	14999 + DCTELEM * workspace));

	15000 +

	15001 EXTERN(void) jsimd_quantize_float_3dnow JPP((JCOEFPTR coef_block,

	15002 FAST_FLOAT * divisors,

	15003 FAST_FLOAT * workspace));

	15004 @@ -463,6 +612,15 @@

	15005 JSAMPARRAY output_buf,

	15006 JDIMENSION output_col));

	15007

	15008 +EXTERN(void) jsimd_idct_2x2_neon JPP((void * dct_table,

	15009 + JCOEFPTR coef_block,

	15010 + JSAMPARRAY output_buf,

	15011 + JDIMENSION output_col));

	15012 +EXTERN(void) jsimd_idct_4x4_neon JPP((void * dct_table,

	15013 + JCOEFPTR coef_block,

	15014 + JSAMPARRAY output_buf,

	15015 + JDIMENSION output_col));

	15016 +

	15017 /* SIMD Inverse DCT */

	15018 EXTERN(void) jsimd_idct_islow_mmx JPP((void * dct_table,

	15019 JCOEFPTR coef_block,

	15020 @@ -484,6 +642,15 @@

	15021 JSAMPARRAY output_buf,

	15022 JDIMENSION output_col));

	15023

	15024 +EXTERN(void) jsimd_idct_islow_neon JPP((void * dct_table,

	15025 + JCOEFPTR coef_block,

	15026 + JSAMPARRAY output_buf,

	15027 + JDIMENSION output_col));

	15028 +EXTERN(void) jsimd_idct_ifast_neon JPP((void * dct_table,

	15029 + JCOEFPTR coef_block,

	15030 + JSAMPARRAY output_buf,

	15031 + JDIMENSION output_col));

	15032 +

	15033 EXTERN(void) jsimd_idct_float_3dnow JPP((void * dct_table,

	15034 JCOEFPTR coef_block,

	15035 JSAMPARRAY output_buf,

	15036 Index: simd/jsimd_i386.c

1924 ===================================================================	15037 ===================================================================

1925 --- jdhuff.h (revision 1541)	15038 --- simd/jsimd_i386.c» (revision 829)

1926 +++ jdhuff.h (working copy)	15039 +++ simd/jsimd_i386.c» (working copy)

1927 @@ -208,7 +208,7 @@	15040 @@ -2,10 +2,11 @@

1928 } \	15041 * jsimd_i386.c

	15042 *

	15043 * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB

	15044 - * Copyright 2009 D. R. Commander

	15045 + * Copyright 2009-2011 D. R. Commander

	15046 *

	15047 * Based on the x86 SIMD extension for IJG JPEG library,

	15048 * Copyright (C) 1999-2006, MIYASAKA Masaru.

	15049 + * For conditions of distribution and use, see copyright notice in jsimdext.inc

	15050 *

	15051 * This file contains the interface between the "normal" portions

	15052 * of the library and the SIMD implementations when running on a

	15053 @@ -40,7 +41,7 @@

	15054 {

	15055 char *env = NULL;

	15056

	15057 - if (simd_support != ~0)

	15058 + if (simd_support != ~0U)

	15059 return;

	15060

	15061 simd_support = jpeg_simd_cpu_support();

	15062 @@ -51,15 +52,16 @@

	15063 simd_support &= JSIMD_MMX;

	15064 env = getenv("JSIMD_FORCE3DNOW");

	15065 if ((env != NULL) && (strcmp(env, "1") == 0))

	15066 - simd_support &= JSIMD_3DNOW;

	15067 + simd_support &= JSIMD_3DNOW|JSIMD_MMX;

	15068 env = getenv("JSIMD_FORCESSE");

	15069 if ((env != NULL) && (strcmp(env, "1") == 0))

	15070 - simd_support &= JSIMD_SSE;

	15071 + simd_support &= JSIMD_SSE|JSIMD_MMX;

	15072 env = getenv("JSIMD_FORCESSE2");

	15073 if ((env != NULL) && (strcmp(env, "1") == 0))

	15074 simd_support &= JSIMD_SSE2;

1929 }	15075 }

1930	15076
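In the jsimd_i386.c hunk, the JSIMD_FORCE3DNOW and JSIMD_FORCESSE overrides change from `simd_support &= JSIMD_3DNOW` and `simd_support &= JSIMD_SSE` to masks that also keep JSIMD_MMX. The sketch below isolates that bitmask arithmetic. The bit values come from the simd/jsimd.h hunk in this patch. `apply_force()` and its parameters are hypothetical: the real `init_simd()` calls `getenv()` inline rather than taking the name and value as arguments.

```c
#include <stddef.h>
#include <string.h>

/* Bit values copied from the simd/jsimd.h hunk in this patch. */
#define JSIMD_MMX   0x01
#define JSIMD_3DNOW 0x02
#define JSIMD_SSE   0x04
#define JSIMD_SSE2  0x08

/* Hypothetical helper: applies one JSIMD_FORCE* override to the detected
 * capability mask in the same way as the patched init_simd(). An override
 * takes effect only when its value is exactly "1". */
static unsigned int apply_force(unsigned int detected, const char *name,
                                const char *value)
{
  if (value == NULL || strcmp(value, "1") != 0)
    return detected;                              /* override not set */
  if (strcmp(name, "JSIMD_FORCEMMX") == 0)
    return detected & JSIMD_MMX;
  if (strcmp(name, "JSIMD_FORCE3DNOW") == 0)
    return detected & (JSIMD_3DNOW | JSIMD_MMX);  /* was: & JSIMD_3DNOW */
  if (strcmp(name, "JSIMD_FORCESSE") == 0)
    return detected & (JSIMD_SSE | JSIMD_MMX);    /* was: & JSIMD_SSE */
  if (strcmp(name, "JSIMD_FORCESSE2") == 0)
    return detected & JSIMD_SSE2;
  return detected;                                /* unrecognized name */
}
```

With all four bits detected (0x0F), forcing 3DNow! now leaves 0x03 instead of 0x02, and forcing SSE leaves 0x05 instead of 0x04. In both cases the MMX bit survives, so the MMX code paths remain eligible.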

1931 -#define HUFF_DECODE_FAST(s,nb,htbl) \	15077 +#ifndef JPEG_DECODE_ONLY

1932 +#define HUFF_DECODE_FAST(s,nb,htbl,slowlabel) \	15078 GLOBAL(int)

1933 FILL_BIT_BUFFER_FAST; \	15079 jsimd_can_rgb_ycc (void)

1934 s = PEEK_BITS(HUFF_LOOKAHEAD); \	15080 {

1935 s = htbl->lookup[s]; \	15081 @@ -81,8 +83,31 @@

1936 @@ -225,7 +225,9 @@	15082

1937 s |= GET_BITS(1); \	15083 return 0;

1938 nb++; \	15084 }

1939 } \

1940 - s = htbl->pub->huffval[ (int) (s + htbl->valoffset[nb]) & 0xFF ]; \

1941 + if (nb > 16) \

1942 + goto slowlabel; \

1943 + s = htbl->pub->huffval[ (int) (s + htbl->valoffset[nb]) ]; \

1944 }

1945

1946 /* Out-of-line case for Huffman code fetching */

1947
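The jdhuff.h hunk makes HUFF_DECODE_FAST jump to `slowlabel` (`slow_decode_mcu` in jdhuff.c, where the fast path returns FALSE) whenever the code length `nb` exceeds 16. Previously, it looked up `huffval` through `valoffset[nb]`, with only an `& 0xFF` mask as protection. JPEG Huffman codes are at most 16 bits long, so if the length loop runs all the way to the `maxcode[17]` sentinel, the data is corrupt.

The sketch below is a simplified, bit-at-a-time canonical Huffman decoder that performs the same length check. `toy_tbl`, `build_tbl()`, `decode()`, and the use of a string of '0'/'1' characters as the bit source are hypothetical. The table layout (per-length `maxcode`/`valoffset`, with `maxcode[17]` as a sentinel) follows the scheme described in libjpeg's derived tables.

```c
#include <stdint.h>

/* Hypothetical, simplified decoding table. Element [0] of each array is unused. */
typedef struct {
  int32_t maxcode[18];    /* largest code of length l, or -1 if none */
  int32_t valoffset[17];  /* huffval index of first length-l symbol minus its code */
  const uint8_t *huffval; /* symbols in order of increasing code length */
} toy_tbl;

/* counts[l] = number of codes of length l (l = 1..16), as in a DHT segment. */
static void build_tbl(toy_tbl *t, const uint8_t counts[17],
                      const uint8_t *huffval)
{
  int32_t code = 0;  /* next canonical code to assign */
  int32_t p = 0;     /* index of the next symbol in huffval */
  for (int l = 1; l <= 16; l++) {
    if (counts[l]) {
      t->valoffset[l] = p - code;
      code += counts[l];
      p += counts[l];
      t->maxcode[l] = code - 1;
    } else {
      t->maxcode[l] = -1;  /* no codes of this length */
    }
    code <<= 1;  /* codes of the next length are one bit longer */
  }
  t->maxcode[17] = 0xFFFFF;  /* sentinel: the loop below always stops at l == 17 */
  t->huffval = huffval;
}

/* Decodes one symbol from a string of '0'/'1' characters, advancing *in.
 * Returns -1 on truncated input or on a code longer than 16 bits. */
static int decode(const toy_tbl *t, const char **in)
{
  int32_t code = 0;
  int l = 0;
  do {
    if (**in == '\0')
      return -1;                              /* ran out of input */
    code = (code << 1) | (*(*in)++ - '0');    /* append the next bit */
    l++;
  } while (code > t->maxcode[l]);
  if (l > 16)
    return -1;  /* same idea as the patch's "if (nb > 16) goto slowlabel" */
  return t->huffval[code + t->valoffset[l]];
}
```

Consider counts {0, 0, 3, 1}, which define three 2-bit codes and one 3-bit code. The input 00 01 10 110 decodes to the four symbols in order. Seventeen 1-bits, on the other hand, reach the length-17 sentinel and are rejected without indexing `valoffset[17]`.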

1948 Index: jchuff.c

1949 ===================================================================

1950 --- jchuff.c» (revision 1219)

1951 +++ jchuff.c» (revision 1220)

1952 @@ -22,8 +22,36 @@

1953 #include "jchuff.h"» » /* Declarations shared with jcphuff.c */

1954 #include <limits.h>

1955

1956 +/*

1957 + * NOTE: If USE_CLZ_INTRINSIC is defined, then clz/bsr instructions will be

1958 + * used for bit counting rather than the lookup table. This will reduce the

1959 + * memory footprint by 64k, which is important for some mobile applications

1960 + * that create many isolated instances of libjpeg-turbo (web browsers, for

1961 + * instance.) This may improve performance on some mobile platforms as well.

1962 + * This feature is enabled by default only on ARM processors, because some x86

1963 + * chips have a slow implementation of bsr, and the use of clz/bsr cannot be

1964 + * shown to have a significant performance impact even on the x86 chips that

1965 + * have a fast implementation of it. When building for ARMv6, you can

1966 + * explicitly disable the use of clz/bsr by adding -mthumb to the compiler

1967 + * flags (this defines __thumb__).

1968 + */

1969 +

1970 +/* NOTE: Both GCC and Clang define __GNUC__ */

1971 +#if defined __GNUC__ && defined __arm__

1972 +#if !defined __thumb__ || defined __thumb2__

1973 +#define USE_CLZ_INTRINSIC

1974 +#endif	15085 +#endif

1975 +#endif	15086

1976 +	15087 GLOBAL(int)

1977 +#ifdef USE_CLZ_INTRINSIC

1978 +#define JPEG_NBITS_NONZERO(x) (32 - __builtin_clz(x))

1979 +#define JPEG_NBITS(x) (x ? JPEG_NBITS_NONZERO(x) : 0)

1980 +#else

1981 static unsigned char jpeg_nbits_table[65536];

1982 static int jpeg_nbits_table_init = 0;

1983 +#define JPEG_NBITS(x) (jpeg_nbits_table[x])

1984 +#define JPEG_NBITS_NONZERO(x) JPEG_NBITS(x)

1985 +#endif

1986

1987 #ifndef min

1988 #define min(a,b) ((a)<(b)?(a):(b))

1989 @@ -272,6 +300,7 @@

1990 dtbl->ehufsi[i] = huffsize[p];

1991 }

1992

1993 +#ifndef USE_CLZ_INTRINSIC

1994 if(!jpeg_nbits_table_init) {

1995 for(i = 0; i < 65536; i++) {

1996 int nbits = 0, temp = i;

1997 @@ -280,6 +309,7 @@

1998 }

1999 jpeg_nbits_table_init = 1;

2000 }

2001 +#endif

2002 }

2003

2004

2005 @@ -482,7 +512,7 @@

2006 temp2 += temp3;

2007

2008 /* Find the number of bits needed for the magnitude of the coefficient */

2009 - nbits = jpeg_nbits_table[temp];

2010 + nbits = JPEG_NBITS(temp);

2011

2012 /* Emit the Huffman-coded symbol for the number of bits */

2013 code = dctbl->ehufco[nbits];

2014 @@ -516,7 +546,7 @@

2015 temp ^= temp3; \

2016 temp -= temp3; \

2017 temp2 += temp3; \

2018 - nbits = jpeg_nbits_table[temp]; \

2019 + nbits = JPEG_NBITS_NONZERO(temp); \

2020 /* if run length > 15, must emit special run-length-16 codes (0xF0) */ \

2021 while (r > 15) { \

2022 EMIT_BITS(code_0xf0, size_0xf0) \
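The NOTE in the jchuff.c hunk explains that, when USE_CLZ_INTRINSIC is defined, the bit count comes from a count-leading-zeros instruction instead of the 64 KiB `jpeg_nbits_table`. The sketch below checks that the two formulations agree for every 16-bit input.

It assumes GCC or Clang (for `__builtin_clz`) and a 32-bit `unsigned int`, as the patch's `32 - __builtin_clz(x)` also does. The function names are hypothetical. The zero check matters because `__builtin_clz(0)` is undefined. This is why the patch defines `JPEG_NBITS` with a zero test for the DC difference and uses `JPEG_NBITS_NONZERO` in the AC loop, where the coefficient being coded is nonzero.

```c
/* Table approach (the non-USE_CLZ_INTRINSIC path): nbits_table[x] holds the
 * number of bits needed to represent x, with nbits_table[0] == 0. */
static unsigned char nbits_table[65536];

static void init_nbits_table(void)
{
  for (int i = 0; i < 65536; i++) {
    int nbits = 0, temp = i;
    while (temp) {      /* count the right shifts needed to reach zero */
      temp >>= 1;
      nbits++;
    }
    nbits_table[i] = (unsigned char) nbits;
  }
}

/* clz approach: position of the highest set bit, plus one. Assumes a 32-bit
 * unsigned int; __builtin_clz(0) is undefined, so zero is handled first. */
static int nbits_clz(unsigned int x)
{
  return x ? 32 - __builtin_clz(x) : 0;
}
```

The clz form also needs no one-time initialization pass, which is the other half of what the `#ifndef USE_CLZ_INTRINSIC` guard around the table-filling loop removes.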

2023 Index: simd/jsimd_arm64.c

2024 ===================================================================

2025 --- /dev/null

2026 +++ simd/jsimd_arm64.c

2027 @@ -0,0 +1,544 @@

2028 +/*

2029 + * jsimd_arm64.c

2030 + *

2031 + * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB

2032 + * Copyright 2009-2011, 2013-2014 D. R. Commander

2033 + *

2034 + * Based on the x86 SIMD extension for IJG JPEG library,

2035 + * Copyright (C) 1999-2006, MIYASAKA Masaru.

2036 + * For conditions of distribution and use, see copyright notice in jsimdext.inc

2037 + *

2038 + * This file contains the interface between the "normal" portions

2039 + * of the library and the SIMD implementations when running on a

2040 + * 64-bit ARM architecture.

2041 + */

2042 +

2043 +#define JPEG_INTERNALS

2044 +#include "../jinclude.h"

2045 +#include "../jpeglib.h"

2046 +#include "../jsimd.h"

2047 +#include "../jdct.h"

2048 +#include "../jsimddct.h"

2049 +#include "jsimd.h"

2050 +

2051 +#include <stdio.h>

2052 +#include <string.h>

2053 +#include <ctype.h>

2054 +

2055 +static unsigned int simd_support = ~0;

2056 +

2057 +/*

2058 + * Check what SIMD accelerations are supported.

2059 + *

2060 + * FIXME: This code is racy under a multi-threaded environment.

2061 + */

2062 +

2063 +/*

2064 + * ARMv8 architectures support NEON extensions by default.

2065 + * It is no longer optional as it was with ARMv7.

2066 + */

2067 +

2068 +

2069 +LOCAL(void)

2070 +init_simd (void)

2071 +{

2072 + char *env = NULL;

2073 +

2074 + if (simd_support != ~0U)

2075 + return;

2076 +

2077 + simd_support = 0;

2078 +

2079 + simd_support |= JSIMD_ARM_NEON;

2080 +

2081 + /* Force different settings through environment variables */

2082 + env = getenv("JSIMD_FORCENEON");

2083 + if ((env != NULL) && (strcmp(env, "1") == 0))

2084 + simd_support &= JSIMD_ARM_NEON;

2085 + env = getenv("JSIMD_FORCENONE");

2086 + if ((env != NULL) && (strcmp(env, "1") == 0))

2087 + simd_support = 0;

2088 +}

2089 +

2090 +GLOBAL(int)

2091 +jsimd_can_rgb_ycc (void)

2092 +{

2093 + init_simd();

2094 +

2095 + return 0;

2096 +}

2097 +

2098 +GLOBAL(int)

2099 +jsimd_can_rgb_gray (void)	15088 +jsimd_can_rgb_gray (void)

2100 +{	15089 +{

2101 + init_simd();	15090 + init_simd();

2102 +	15091 +

2103 + return 0;

2104 +}

2105 +

2106 +GLOBAL(int)

2107 +jsimd_can_ycc_rgb (void)

2108 +{

2109 + init_simd();

2110 +

2111 + /* The code is optimised for these values only */	15092 + /* The code is optimised for these values only */

2112 + if (BITS_IN_JSAMPLE != 8)	15093 + if (BITS_IN_JSAMPLE != 8)

2113 + return 0;	15094 + return 0;

2114 + if (sizeof(JDIMENSION) != 4)	15095 + if (sizeof(JDIMENSION) != 4)

2115 + return 0;	15096 + return 0;

2116 + if ((RGB_PIXELSIZE != 3) && (RGB_PIXELSIZE != 4))	15097 + if ((RGB_PIXELSIZE != 3) && (RGB_PIXELSIZE != 4))

2117 + return 0;	15098 + return 0;

2118 +	15099 +

2119 + if (simd_support & JSIMD_ARM_NEON)	15100 + if ((simd_support & JSIMD_SSE2) &&

	15101 + IS_ALIGNED_SSE(jconst_rgb_gray_convert_sse2))

	15102 + return 1;

	15103 + if (simd_support & JSIMD_MMX)

2120 + return 1;	15104 + return 1;

2121 +	15105 +

2122 + return 0;	15106 + return 0;

2123 +}	15107 +}

2124 +	15108 +

2125 +GLOBAL(int)	15109 +GLOBAL(int)

2126 +jsimd_can_ycc_rgb565 (void)	15110 jsimd_can_ycc_rgb (void)

	15111 {

	15112 init_simd();

	15113 @@ -104,6 +129,7 @@

	15114 return 0;

	15115 }

	15116

	15117 +#ifndef JPEG_DECODE_ONLY

	15118 GLOBAL(void)

	15119 jsimd_rgb_ycc_convert (j_compress_ptr cinfo,

	15120 JSAMPARRAY input_buf, JSAMPIMAGE output_buf,

	15121 @@ -119,6 +145,7 @@

	15122 mmxfct=jsimd_extrgb_ycc_convert_mmx;

	15123 break;

	15124 case JCS_EXT_RGBX:

	15125 + case JCS_EXT_RGBA:

	15126 sse2fct=jsimd_extrgbx_ycc_convert_sse2;

	15127 mmxfct=jsimd_extrgbx_ycc_convert_mmx;

	15128 break;

	15129 @@ -127,14 +154,17 @@

	15130 mmxfct=jsimd_extbgr_ycc_convert_mmx;

	15131 break;

	15132 case JCS_EXT_BGRX:

	15133 + case JCS_EXT_BGRA:

	15134 sse2fct=jsimd_extbgrx_ycc_convert_sse2;

	15135 mmxfct=jsimd_extbgrx_ycc_convert_mmx;

	15136 break;

	15137 case JCS_EXT_XBGR:

	15138 + case JCS_EXT_ABGR:

	15139 sse2fct=jsimd_extxbgr_ycc_convert_sse2;

	15140 mmxfct=jsimd_extxbgr_ycc_convert_mmx;

	15141 break;

	15142 case JCS_EXT_XRGB:

	15143 + case JCS_EXT_ARGB:

	15144 sse2fct=jsimd_extxrgb_ycc_convert_sse2;

	15145 mmxfct=jsimd_extxrgb_ycc_convert_mmx;

	15146 break;

	15147 @@ -152,8 +182,62 @@

	15148 mmxfct(cinfo->image_width, input_buf,

	15149 output_buf, output_row, num_rows);

	15150 }

	15151 +#endif

	15152

	15153 GLOBAL(void)

	15154 +jsimd_rgb_gray_convert (j_compress_ptr cinfo,

	15155 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,

	15156 + JDIMENSION output_row, int num_rows)

2127 +{	15157 +{

2128 + init_simd();	15158 + void (*sse2fct)(JDIMENSION, JSAMPARRAY, JSAMPIMAGE, JDIMENSION, int);

2129 +	15159 + void (*mmxfct)(JDIMENSION, JSAMPARRAY, JSAMPIMAGE, JDIMENSION, int);

	15160 +

	15161 + switch(cinfo->in_color_space)

	15162 + {

	15163 + case JCS_EXT_RGB:

	15164 + sse2fct=jsimd_extrgb_gray_convert_sse2;

	15165 + mmxfct=jsimd_extrgb_gray_convert_mmx;

	15166 + break;

	15167 + case JCS_EXT_RGBX:

	15168 + case JCS_EXT_RGBA:

	15169 + sse2fct=jsimd_extrgbx_gray_convert_sse2;

	15170 + mmxfct=jsimd_extrgbx_gray_convert_mmx;

	15171 + break;

	15172 + case JCS_EXT_BGR:

	15173 + sse2fct=jsimd_extbgr_gray_convert_sse2;

	15174 + mmxfct=jsimd_extbgr_gray_convert_mmx;

	15175 + break;

	15176 + case JCS_EXT_BGRX:

	15177 + case JCS_EXT_BGRA:

	15178 + sse2fct=jsimd_extbgrx_gray_convert_sse2;

	15179 + mmxfct=jsimd_extbgrx_gray_convert_mmx;

	15180 + break;

	15181 + case JCS_EXT_XBGR:

	15182 + case JCS_EXT_ABGR:

	15183 + sse2fct=jsimd_extxbgr_gray_convert_sse2;

	15184 + mmxfct=jsimd_extxbgr_gray_convert_mmx;

	15185 + break;

	15186 + case JCS_EXT_XRGB:

	15187 + case JCS_EXT_ARGB:

	15188 + sse2fct=jsimd_extxrgb_gray_convert_sse2;

	15189 + mmxfct=jsimd_extxrgb_gray_convert_mmx;

	15190 + break;

	15191 + default:

	15192 + sse2fct=jsimd_rgb_gray_convert_sse2;

	15193 + mmxfct=jsimd_rgb_gray_convert_mmx;

	15194 + break;

	15195 + }

	15196 +

	15197 + if ((simd_support & JSIMD_SSE2) &&

	15198 + IS_ALIGNED_SSE(jconst_rgb_gray_convert_sse2))

	15199 + sse2fct(cinfo->image_width, input_buf,

	15200 + output_buf, output_row, num_rows);

	15201 + else if (simd_support & JSIMD_MMX)

	15202 + mmxfct(cinfo->image_width, input_buf,

	15203 + output_buf, output_row, num_rows);

	15204 +}

	15205 +

	15206 +GLOBAL(void)

	15207 jsimd_ycc_rgb_convert (j_decompress_ptr cinfo,

	15208 JSAMPIMAGE input_buf, JDIMENSION input_row,

	15209 JSAMPARRAY output_buf, int num_rows)

	15210 @@ -168,6 +252,7 @@

	15211 mmxfct=jsimd_ycc_extrgb_convert_mmx;

	15212 break;

	15213 case JCS_EXT_RGBX:

	15214 + case JCS_EXT_RGBA:

	15215 sse2fct=jsimd_ycc_extrgbx_convert_sse2;

	15216 mmxfct=jsimd_ycc_extrgbx_convert_mmx;

	15217 break;

	15218 @@ -176,14 +261,17 @@

	15219 mmxfct=jsimd_ycc_extbgr_convert_mmx;

	15220 break;

	15221 case JCS_EXT_BGRX:

	15222 + case JCS_EXT_BGRA:

	15223 sse2fct=jsimd_ycc_extbgrx_convert_sse2;

	15224 mmxfct=jsimd_ycc_extbgrx_convert_mmx;

	15225 break;

	15226 case JCS_EXT_XBGR:

	15227 + case JCS_EXT_ABGR:

	15228 sse2fct=jsimd_ycc_extxbgr_convert_sse2;

	15229 mmxfct=jsimd_ycc_extxbgr_convert_mmx;

	15230 break;

	15231 case JCS_EXT_XRGB:

	15232 + case JCS_EXT_ARGB:

	15233 sse2fct=jsimd_ycc_extxrgb_convert_sse2;

	15234 mmxfct=jsimd_ycc_extxrgb_convert_mmx;

	15235 break;

	15236 @@ -202,6 +290,7 @@

	15237 input_row, output_buf, num_rows);

	15238 }

	15239

	15240 +#ifndef JPEG_DECODE_ONLY

	15241 GLOBAL(int)

	15242 jsimd_can_h2v2_downsample (void)

	15243 {

	15244 @@ -267,6 +356,7 @@

	15245 compptr->v_samp_factor, compptr->width_in_blocks,

	15246 input_data, output_data);

	15247 }

	15248 +#endif

	15249

	15250 GLOBAL(int)

	15251 jsimd_can_h2v2_upsample (void)

	15252 @@ -382,7 +472,7 @@

	15253 {

	15254 if ((simd_support & JSIMD_SSE2) &&

	15255 IS_ALIGNED_SSE(jconst_fancy_upsample_sse2))

	15256 - jsimd_h2v1_fancy_upsample_sse2(cinfo->max_v_samp_factor,

	15257 + jsimd_h2v2_fancy_upsample_sse2(cinfo->max_v_samp_factor,

	15258 compptr->downsampled_width, input_data, output_data_ptr);

	15259 else if (simd_support & JSIMD_MMX)

	15260 jsimd_h2v2_fancy_upsample_mmx(cinfo->max_v_samp_factor,

	15261 @@ -460,6 +550,7 @@

	15262 mmxfct=jsimd_h2v2_extrgb_merged_upsample_mmx;

	15263 break;

	15264 case JCS_EXT_RGBX:

	15265 + case JCS_EXT_RGBA:

	15266 sse2fct=jsimd_h2v2_extrgbx_merged_upsample_sse2;

	15267 mmxfct=jsimd_h2v2_extrgbx_merged_upsample_mmx;

	15268 break;

	15269 @@ -468,14 +559,17 @@

	15270 mmxfct=jsimd_h2v2_extbgr_merged_upsample_mmx;

	15271 break;

	15272 case JCS_EXT_BGRX:

	15273 + case JCS_EXT_BGRA:

	15274 sse2fct=jsimd_h2v2_extbgrx_merged_upsample_sse2;

	15275 mmxfct=jsimd_h2v2_extbgrx_merged_upsample_mmx;

	15276 break;

	15277 case JCS_EXT_XBGR:

	15278 + case JCS_EXT_ABGR:

	15279 sse2fct=jsimd_h2v2_extxbgr_merged_upsample_sse2;

	15280 mmxfct=jsimd_h2v2_extxbgr_merged_upsample_mmx;

	15281 break;

	15282 case JCS_EXT_XRGB:

	15283 + case JCS_EXT_ARGB:

	15284 sse2fct=jsimd_h2v2_extxrgb_merged_upsample_sse2;

	15285 mmxfct=jsimd_h2v2_extxrgb_merged_upsample_mmx;

	15286 break;

	15287 @@ -510,6 +604,7 @@

	15288 mmxfct=jsimd_h2v1_extrgb_merged_upsample_mmx;

	15289 break;

	15290 case JCS_EXT_RGBX:

	15291 + case JCS_EXT_RGBA:

	15292 sse2fct=jsimd_h2v1_extrgbx_merged_upsample_sse2;

	15293 mmxfct=jsimd_h2v1_extrgbx_merged_upsample_mmx;

	15294 break;

	15295 @@ -518,14 +613,17 @@

	15296 mmxfct=jsimd_h2v1_extbgr_merged_upsample_mmx;

	15297 break;

	15298 case JCS_EXT_BGRX:

	15299 + case JCS_EXT_BGRA:

	15300 sse2fct=jsimd_h2v1_extbgrx_merged_upsample_sse2;

	15301 mmxfct=jsimd_h2v1_extbgrx_merged_upsample_mmx;

	15302 break;

	15303 case JCS_EXT_XBGR:

	15304 + case JCS_EXT_ABGR:

	15305 sse2fct=jsimd_h2v1_extxbgr_merged_upsample_sse2;

	15306 mmxfct=jsimd_h2v1_extxbgr_merged_upsample_mmx;

	15307 break;

	15308 case JCS_EXT_XRGB:

	15309 + case JCS_EXT_ARGB:

	15310 sse2fct=jsimd_h2v1_extxrgb_merged_upsample_sse2;

	15311 mmxfct=jsimd_h2v1_extxrgb_merged_upsample_mmx;

	15312 break;

	15313 @@ -544,6 +642,7 @@

	15314 in_row_group_ctr, output_buf);

	15315 }

	15316

	15317 +#ifndef JPEG_DECODE_ONLY

	15318 GLOBAL(int)

	15319 jsimd_can_convsamp (void)

	15320 {

	15321 @@ -763,6 +862,7 @@

	15322 else if (simd_support & JSIMD_3DNOW)

	15323 jsimd_quantize_float_3dnow(coef_block, divisors, workspace);

	15324 }

	15325 +#endif

	15326

	15327 GLOBAL(int)

	15328 jsimd_can_idct_2x2 (void)

	15329 @@ -953,4 +1053,3 @@

	15330 jsimd_idct_float_3dnow(compptr->dct_table, coef_block,

	15331 output_buf, output_col);

	15332 }

	15333 -

	15334 Index: simd/jsimd_x86_64.c

	15335 ===================================================================

	15336 --- simd/jsimd_x86_64.c (revision 829)

	15337 +++ simd/jsimd_x86_64.c (working copy)

	15338 @@ -2,10 +2,11 @@

	15339 * jsimd_x86_64.c

	15340 *

	15341 * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB

	15342 - * Copyright 2009 D. R. Commander

	15343 + * Copyright 2009-2011 D. R. Commander

	15344 *

	15345 * Based on the x86 SIMD extension for IJG JPEG library,

	15346 * Copyright (C) 1999-2006, MIYASAKA Masaru.

	15347 + * For conditions of distribution and use, see copyright notice in jsimdext.inc

	15348 *

	15349 * This file contains the interface between the "normal" portions

	15350 * of the library and the SIMD implementations when running on a

	15351 @@ -18,16 +19,17 @@

	15352 #include "../jsimd.h"

	15353 #include "../jdct.h"

	15354 #include "../jsimddct.h"

	15355 -#include "simd/jsimd.h"

	15356 +#include "jsimd.h"

	15357

	15358 /*

	15359 * In the PIC cases, we have no guarantee that constants will keep

	15360 * their alignment. This macro allows us to verify it at runtime.

	15361 */

	15362 -#define IS_ALIGNED(ptr, order) (((unsigned)ptr & ((1 << order) - 1)) == 0)

	15363 +#define IS_ALIGNED(ptr, order) (((size_t)ptr & ((1 << order) - 1)) == 0)

	15364

	15365 #define IS_ALIGNED_SSE(ptr) (IS_ALIGNED(ptr, 4)) /* 16 byte alignment */

	15366

	15367 +#ifndef JPEG_DECODE_ONLY

	15368 GLOBAL(int)

	15369 jsimd_can_rgb_ycc (void)

	15370 {

	15371 @@ -44,8 +46,26 @@

	15372

	15373 return 1;

	15374 }

	15375 +#endif

	15376

	15377 GLOBAL(int)

	15378 +jsimd_can_rgb_gray (void)

	15379 +{

2130 + /* The code is optimised for these values only */	15380 + /* The code is optimised for these values only */

2131 + if (BITS_IN_JSAMPLE != 8)	15381 + if (BITS_IN_JSAMPLE != 8)

2132 + return 0;	15382 + return 0;

2133 + if (sizeof(JDIMENSION) != 4)	15383 + if (sizeof(JDIMENSION) != 4)

2134 + return 0;	15384 + return 0;

2135 +	15385 + if ((RGB_PIXELSIZE != 3) && (RGB_PIXELSIZE != 4))

2136 + if (simd_support & JSIMD_ARM_NEON)	15386 + return 0;

2137 + return 1;	15387 +

2138 +	15388 + if (!IS_ALIGNED_SSE(jconst_rgb_gray_convert_sse2))

2139 + return 0;	15389 + return 0;

	15390 +

	15391 + return 1;

2140 +}	15392 +}

2141 +	15393 +

2142 +GLOBAL(void)	15394 +GLOBAL(int)

2143 +jsimd_rgb_ycc_convert (j_compress_ptr cinfo,	15395 jsimd_can_ycc_rgb (void)

2144 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,	15396 {

2145 + JDIMENSION output_row, int num_rows)	15397 /* The code is optimised for these values only */

2146 +{	15398 @@ -62,6 +82,7 @@

2147 +}	15399 return 1;

2148 +	15400 }

2149 +GLOBAL(void)	15401

	15402 +#ifndef JPEG_DECODE_ONLY

	15403 GLOBAL(void)

	15404 jsimd_rgb_ycc_convert (j_compress_ptr cinfo,

	15405 JSAMPARRAY input_buf, JSAMPIMAGE output_buf,

	15406 @@ -75,6 +96,7 @@

	15407 sse2fct=jsimd_extrgb_ycc_convert_sse2;

	15408 break;

	15409 case JCS_EXT_RGBX:

	15410 + case JCS_EXT_RGBA:

	15411 sse2fct=jsimd_extrgbx_ycc_convert_sse2;

	15412 break;

	15413 case JCS_EXT_BGR:

	15414 @@ -81,12 +103,15 @@

	15415 sse2fct=jsimd_extbgr_ycc_convert_sse2;

	15416 break;

	15417 case JCS_EXT_BGRX:

	15418 + case JCS_EXT_BGRA:

	15419 sse2fct=jsimd_extbgrx_ycc_convert_sse2;

	15420 break;

	15421 case JCS_EXT_XBGR:

	15422 + case JCS_EXT_ABGR:

	15423 sse2fct=jsimd_extxbgr_ycc_convert_sse2;

	15424 break;

	15425 case JCS_EXT_XRGB:

	15426 + case JCS_EXT_ARGB:

	15427 sse2fct=jsimd_extxrgb_ycc_convert_sse2;

	15428 break;

	15429 default:

	15430 @@ -96,8 +121,48 @@

	15431

	15432 sse2fct(cinfo->image_width, input_buf, output_buf, output_row, num_rows);

	15433 }

	15434 +#endif

	15435

	15436 GLOBAL(void)

2150 +jsimd_rgb_gray_convert (j_compress_ptr cinfo,	15437 +jsimd_rgb_gray_convert (j_compress_ptr cinfo,

2151 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,	15438 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf,

2152 + JDIMENSION output_row, int num_rows)	15439 + JDIMENSION output_row, int num_rows)

2153 +{	15440 +{

2154 +}	15441 + void (*sse2fct)(JDIMENSION, JSAMPARRAY, JSAMPIMAGE, JDIMENSION, int);

2155 +	15442 +

2156 +GLOBAL(void)	15443 + switch(cinfo->in_color_space)

2157 +jsimd_ycc_rgb_convert (j_decompress_ptr cinfo,	15444 + {

2158 + JSAMPIMAGE input_buf, JDIMENSION input_row,

2159 + JSAMPARRAY output_buf, int num_rows)

2160 +{

2161 + void (*neonfct)(JDIMENSION, JSAMPIMAGE, JDIMENSION, JSAMPARRAY, int);

2162 +

2163 + switch(cinfo->out_color_space) {

2164 + case JCS_EXT_RGB:	15445 + case JCS_EXT_RGB:

2165 + neonfct=jsimd_ycc_extrgb_convert_neon;	15446 + sse2fct=jsimd_extrgb_gray_convert_sse2;

2166 + break;	15447 + break;

2167 + case JCS_EXT_RGBX:	15448 + case JCS_EXT_RGBX:

2168 + case JCS_EXT_RGBA:	15449 + case JCS_EXT_RGBA:

2169 + neonfct=jsimd_ycc_extrgbx_convert_neon;	15450 + sse2fct=jsimd_extrgbx_gray_convert_sse2;

2170 + break;	15451 + break;

2171 + case JCS_EXT_BGR:	15452 + case JCS_EXT_BGR:

2172 + neonfct=jsimd_ycc_extbgr_convert_neon;	15453 + sse2fct=jsimd_extbgr_gray_convert_sse2;

2173 + break;	15454 + break;

2174 + case JCS_EXT_BGRX:	15455 + case JCS_EXT_BGRX:

2175 + case JCS_EXT_BGRA:	15456 + case JCS_EXT_BGRA:

2176 + neonfct=jsimd_ycc_extbgrx_convert_neon;	15457 + sse2fct=jsimd_extbgrx_gray_convert_sse2;

2177 + break;	15458 + break;

2178 + case JCS_EXT_XBGR:	15459 + case JCS_EXT_XBGR:

2179 + case JCS_EXT_ABGR:	15460 + case JCS_EXT_ABGR:

2180 + neonfct=jsimd_ycc_extxbgr_convert_neon;	15461 + sse2fct=jsimd_extxbgr_gray_convert_sse2;

2181 + break;	15462 + break;

2182 + case JCS_EXT_XRGB:	15463 + case JCS_EXT_XRGB:

2183 + case JCS_EXT_ARGB:	15464 + case JCS_EXT_ARGB:

2184 + neonfct=jsimd_ycc_extxrgb_convert_neon;	15465 + sse2fct=jsimd_extxrgb_gray_convert_sse2;

2185 + break;	15466 + break;

2186 + default:	15467 + default:

2187 + neonfct=jsimd_ycc_extrgb_convert_neon;	15468 + sse2fct=jsimd_rgb_gray_convert_sse2;

2188 + break;	15469 + break;

2189 + }	15470 + }

2190 +	15471 +

2191 + if (simd_support & JSIMD_ARM_NEON)	15472 + sse2fct(cinfo->image_width, input_buf, output_buf, output_row, num_rows);

2192 + neonfct(cinfo->output_width, input_buf, input_row, output_buf, num_rows);

2193 +}	15473 +}

2194 +	15474 +

2195 +GLOBAL(void)	15475 +GLOBAL(void)

2196 +jsimd_ycc_rgb565_convert (j_decompress_ptr cinfo,	15476 jsimd_ycc_rgb_convert (j_decompress_ptr cinfo,

2197 + JSAMPIMAGE input_buf, JDIMENSION input_row,	15477 JSAMPIMAGE input_buf, JDIMENSION input_row,

2198 + JSAMPARRAY output_buf, int num_rows)	15478 JSAMPARRAY output_buf, int num_rows)

	15479 @@ -110,6 +175,7 @@

	15480 sse2fct=jsimd_ycc_extrgb_convert_sse2;

	15481 break;

	15482 case JCS_EXT_RGBX:

	15483 + case JCS_EXT_RGBA:

	15484 sse2fct=jsimd_ycc_extrgbx_convert_sse2;

	15485 break;

	15486 case JCS_EXT_BGR:

	15487 @@ -116,12 +182,15 @@

	15488 sse2fct=jsimd_ycc_extbgr_convert_sse2;

	15489 break;

	15490 case JCS_EXT_BGRX:

	15491 + case JCS_EXT_BGRA:

	15492 sse2fct=jsimd_ycc_extbgrx_convert_sse2;

	15493 break;

	15494 case JCS_EXT_XBGR:

	15495 + case JCS_EXT_ABGR:

	15496 sse2fct=jsimd_ycc_extxbgr_convert_sse2;

	15497 break;

	15498 case JCS_EXT_XRGB:

	15499 + case JCS_EXT_ARGB:

	15500 sse2fct=jsimd_ycc_extxrgb_convert_sse2;

	15501 break;

	15502 default:

	15503 @@ -132,6 +201,7 @@

	15504 sse2fct(cinfo->output_width, input_buf, input_row, output_buf, num_rows);

	15505 }

	15506

	15507 +#ifndef JPEG_DECODE_ONLY

	15508 GLOBAL(int)

	15509 jsimd_can_h2v2_downsample (void)

	15510 {

	15511 @@ -177,6 +247,7 @@

	15512 compptr->width_in_blocks,

	15513 input_data, output_data);

	15514 }

	15515 +#endif

	15516

	15517 GLOBAL(int)

	15518 jsimd_can_h2v2_upsample (void)

	15519 @@ -260,7 +331,7 @@

	15520 JSAMPARRAY input_data,

	15521 JSAMPARRAY * output_data_ptr)

	15522 {

	15523 - jsimd_h2v1_fancy_upsample_sse2(cinfo->max_v_samp_factor,

	15524 + jsimd_h2v2_fancy_upsample_sse2(cinfo->max_v_samp_factor,

	15525 compptr->downsampled_width,

	15526 input_data, output_data_ptr);

	15527 }

	15528 @@ -320,6 +391,7 @@

	15529 sse2fct=jsimd_h2v2_extrgb_merged_upsample_sse2;

	15530 break;

	15531 case JCS_EXT_RGBX:

	15532 + case JCS_EXT_RGBA:

	15533 sse2fct=jsimd_h2v2_extrgbx_merged_upsample_sse2;

	15534 break;

	15535 case JCS_EXT_BGR:

	15536 @@ -326,12 +398,15 @@

	15537 sse2fct=jsimd_h2v2_extbgr_merged_upsample_sse2;

	15538 break;

	15539 case JCS_EXT_BGRX:

	15540 + case JCS_EXT_BGRA:

	15541 sse2fct=jsimd_h2v2_extbgrx_merged_upsample_sse2;

	15542 break;

	15543 case JCS_EXT_XBGR:

	15544 + case JCS_EXT_ABGR:

	15545 sse2fct=jsimd_h2v2_extxbgr_merged_upsample_sse2;

	15546 break;

	15547 case JCS_EXT_XRGB:

	15548 + case JCS_EXT_ARGB:

	15549 sse2fct=jsimd_h2v2_extxrgb_merged_upsample_sse2;

	15550 break;

	15551 default:

	15552 @@ -356,6 +431,7 @@

	15553 sse2fct=jsimd_h2v1_extrgb_merged_upsample_sse2;

	15554 break;

	15555 case JCS_EXT_RGBX:

	15556 + case JCS_EXT_RGBA:

	15557 sse2fct=jsimd_h2v1_extrgbx_merged_upsample_sse2;

	15558 break;

	15559 case JCS_EXT_BGR:

	15560 @@ -362,12 +438,15 @@

	15561 sse2fct=jsimd_h2v1_extbgr_merged_upsample_sse2;

	15562 break;

	15563 case JCS_EXT_BGRX:

	15564 + case JCS_EXT_BGRA:

	15565 sse2fct=jsimd_h2v1_extbgrx_merged_upsample_sse2;

	15566 break;

	15567 case JCS_EXT_XBGR:

	15568 + case JCS_EXT_ABGR:

	15569 sse2fct=jsimd_h2v1_extxbgr_merged_upsample_sse2;

	15570 break;

	15571 case JCS_EXT_XRGB:

	15572 + case JCS_EXT_ARGB:

	15573 sse2fct=jsimd_h2v1_extxrgb_merged_upsample_sse2;

	15574 break;

	15575 default:

	15576 @@ -378,6 +457,7 @@

	15577 sse2fct(cinfo->output_width, input_buf, in_row_group_ctr, output_buf);

	15578 }

	15579

	15580 +#ifndef JPEG_DECODE_ONLY

	15581 GLOBAL(int)

	15582 jsimd_can_convsamp (void)

	15583 {

	15584 @@ -528,6 +608,7 @@

	15585 {

	15586 jsimd_quantize_float_sse2(coef_block, divisors, workspace);

	15587 }

	15588 +#endif

	15589

	15590 GLOBAL(int)

	15591 jsimd_can_idct_2x2 (void)

	15592 @@ -677,4 +758,3 @@

	15593 jsimd_idct_float_sse2(compptr->dct_table, coef_block,

	15594 output_buf, output_col);

	15595 }

	15596 -

	15597 Index: simd/jsimdcfg.inc.h

	15598 ===================================================================

	15599 --- simd/jsimdcfg.inc.h (revision 829)

	15600 +++ simd/jsimdcfg.inc.h (working copy)

	15601 @@ -15,26 +15,54 @@

	15602 #include "../jmorecfg.h"

	15603 #include "jsimd.h"

	15604

	15605 -#define define(var) %define _cpp_protection_##var

	15606 -#define definev(var) %define _cpp_protection_##var var

	15607 -

	15608 ;

	15609 ; -- jpeglib.h

	15610 ;

	15611

	15612 -definev(DCTSIZE)

	15613 -definev(DCTSIZE2)

	15614 +%define _cpp_protection_DCTSIZE DCTSIZE

	15615 +%define _cpp_protection_DCTSIZE2 DCTSIZE2

	15616

	15617 ;

	15618 ; -- jmorecfg.h

	15619 ;

	15620

	15621 -definev(RGB_RED)

	15622 -definev(RGB_GREEN)

	15623 -definev(RGB_BLUE)

	15624 +%define _cpp_protection_RGB_RED RGB_RED

	15625 +%define _cpp_protection_RGB_GREEN RGB_GREEN

	15626 +%define _cpp_protection_RGB_BLUE RGB_BLUE

	15627 +%define _cpp_protection_RGB_PIXELSIZE RGB_PIXELSIZE

	15628

	15629 -definev(RGB_PIXELSIZE)

	15630 +%define _cpp_protection_EXT_RGB_RED EXT_RGB_RED

	15631 +%define _cpp_protection_EXT_RGB_GREEN EXT_RGB_GREEN

	15632 +%define _cpp_protection_EXT_RGB_BLUE EXT_RGB_BLUE

	15633 +%define _cpp_protection_EXT_RGB_PIXELSIZE EXT_RGB_PIXELSIZE

	15634

	15635 +%define _cpp_protection_EXT_RGBX_RED EXT_RGBX_RED

	15636 +%define _cpp_protection_EXT_RGBX_GREEN EXT_RGBX_GREEN

	15637 +%define _cpp_protection_EXT_RGBX_BLUE EXT_RGBX_BLUE

	15638 +%define _cpp_protection_EXT_RGBX_PIXELSIZE EXT_RGBX_PIXELSIZE

	15639 +

	15640 +%define _cpp_protection_EXT_BGR_RED EXT_BGR_RED

	15641 +%define _cpp_protection_EXT_BGR_GREEN EXT_BGR_GREEN

	15642 +%define _cpp_protection_EXT_BGR_BLUE EXT_BGR_BLUE

	15643 +%define _cpp_protection_EXT_BGR_PIXELSIZE EXT_BGR_PIXELSIZE

	15644 +

	15645 +%define _cpp_protection_EXT_BGRX_RED EXT_BGRX_RED

	15646 +%define _cpp_protection_EXT_BGRX_GREEN EXT_BGRX_GREEN

	15647 +%define _cpp_protection_EXT_BGRX_BLUE EXT_BGRX_BLUE

	15648 +%define _cpp_protection_EXT_BGRX_PIXELSIZE EXT_BGRX_PIXELSIZE

	15649 +

	15650 +%define _cpp_protection_EXT_XBGR_RED EXT_XBGR_RED

	15651 +%define _cpp_protection_EXT_XBGR_GREEN EXT_XBGR_GREEN

	15652 +%define _cpp_protection_EXT_XBGR_BLUE EXT_XBGR_BLUE

	15653 +%define _cpp_protection_EXT_XBGR_PIXELSIZE EXT_XBGR_PIXELSIZE

	15654 +

	15655 +%define _cpp_protection_EXT_XRGB_RED EXT_XRGB_RED

	15656 +%define _cpp_protection_EXT_XRGB_GREEN EXT_XRGB_GREEN

	15657 +%define _cpp_protection_EXT_XRGB_BLUE EXT_XRGB_BLUE

	15658 +%define _cpp_protection_EXT_XRGB_PIXELSIZE EXT_XRGB_PIXELSIZE

	15659 +

	15660 +%define RGBX_FILLER_0XFF 1

	15661 +

	15662 ; Representation of a single sample (pixel element value).

	15663 ; On this SIMD implementation, this must be 'unsigned char'.

	15664 ;

	15665 @@ -42,7 +70,7 @@

	15666 %define JSAMPLE byte ; unsigned char

	15667 %define SIZEOF_JSAMPLE SIZEOF_BYTE ; sizeof(JSAMPLE)

	15668

	15669 -definev(CENTERJSAMPLE)

	15670 +%define _cpp_protection_CENTERJSAMPLE CENTERJSAMPLE

	15671

	15672 ; Representation of a DCT frequency coefficient.

	15673 ; On this SIMD implementation, this must be 'short'.

	15674 @@ -95,74 +123,74 @@

	15675 ; -- jsimd.h

	15676 ;

	15677

	15678 -definev(JSIMD_NONE)

	15679 -definev(JSIMD_MMX)

	15680 -definev(JSIMD_3DNOW)

	15681 -definev(JSIMD_SSE)

	15682 -definev(JSIMD_SSE2)

	15683 +%define _cpp_protection_JSIMD_NONE JSIMD_NONE

	15684 +%define _cpp_protection_JSIMD_MMX JSIMD_MMX

	15685 +%define _cpp_protection_JSIMD_3DNOW JSIMD_3DNOW

	15686 +%define _cpp_protection_JSIMD_SSE JSIMD_SSE

	15687 +%define _cpp_protection_JSIMD_SSE2 JSIMD_SSE2

	15688

	15689 ; Short forms of external names for systems with brain-damaged linkers.

	15690 ;

	15691 #ifdef NEED_SHORT_EXTERNAL_NAMES

	15692 -definev(jpeg_simd_cpu_support)

	15693 -definev(jsimd_rgb_ycc_convert_mmx)

	15694 -definev(jsimd_ycc_rgb_convert_mmx)

	15695 -definev(jconst_rgb_ycc_convert_sse2)

	15696 -definev(jsimd_rgb_ycc_convert_sse2)

	15697 -definev(jconst_ycc_rgb_convert_sse2)

	15698 -definev(jsimd_ycc_rgb_convert_sse2)

	15699 -definev(jsimd_h2v2_downsample_mmx)

	15700 -definev(jsimd_h2v1_downsample_mmx)

	15701 -definev(jsimd_h2v2_downsample_sse2)

	15702 -definev(jsimd_h2v1_downsample_sse2)

	15703 -definev(jsimd_h2v2_upsample_mmx)

	15704 -definev(jsimd_h2v1_upsample_mmx)

	15705 -definev(jsimd_h2v1_fancy_upsample_mmx)

	15706 -definev(jsimd_h2v2_fancy_upsample_mmx)

	15707 -definev(jsimd_h2v1_merged_upsample_mmx)

	15708 -definev(jsimd_h2v2_merged_upsample_mmx)

	15709 -definev(jsimd_h2v2_upsample_sse2)

	15710 -definev(jsimd_h2v1_upsample_sse2)

	15711 -definev(jconst_fancy_upsample_sse2)

	15712 -definev(jsimd_h2v1_fancy_upsample_sse2)

	15713 -definev(jsimd_h2v2_fancy_upsample_sse2)

	15714 -definev(jconst_merged_upsample_sse2)

	15715 -definev(jsimd_h2v1_merged_upsample_sse2)

	15716 -definev(jsimd_h2v2_merged_upsample_sse2)

	15717 -definev(jsimd_convsamp_mmx)

	15718 -definev(jsimd_convsamp_sse2)

	15719 -definev(jsimd_convsamp_float_3dnow)

	15720 -definev(jsimd_convsamp_float_sse)

	15721 -definev(jsimd_convsamp_float_sse2)

	15722 -definev(jsimd_fdct_islow_mmx)

	15723 -definev(jsimd_fdct_ifast_mmx)

	15724 -definev(jconst_fdct_islow_sse2)

	15725 -definev(jsimd_fdct_islow_sse2)

	15726 -definev(jconst_fdct_ifast_sse2)

	15727 -definev(jsimd_fdct_ifast_sse2)

	15728 -definev(jsimd_fdct_float_3dnow)

	15729 -definev(jconst_fdct_float_sse)

	15730 -definev(jsimd_fdct_float_sse)

	15731 -definev(jsimd_quantize_mmx)

	15732 -definev(jsimd_quantize_sse2)

	15733 -definev(jsimd_quantize_float_3dnow)

	15734 -definev(jsimd_quantize_float_sse)

	15735 -definev(jsimd_quantize_float_sse2)

	15736 -definev(jsimd_idct_2x2_mmx)

	15737 -definev(jsimd_idct_4x4_mmx)

	15738 -definev(jconst_idct_red_sse2)

	15739 -definev(jsimd_idct_2x2_sse2)

	15740 -definev(jsimd_idct_4x4_sse2)

	15741 -definev(jsimd_idct_islow_mmx)

	15742 -definev(jsimd_idct_ifast_mmx)

	15743 -definev(jconst_idct_islow_sse2)

	15744 -definev(jsimd_idct_islow_sse2)

	15745 -definev(jconst_idct_ifast_sse2)

	15746 -definev(jsimd_idct_ifast_sse2)

	15747 -definev(jsimd_idct_float_3dnow)

	15748 -definev(jconst_idct_float_sse)

	15749 -definev(jsimd_idct_float_sse)

	15750 -definev(jconst_idct_float_sse2)

	15751 -definev(jsimd_idct_float_sse2)

	15752 +%define _cpp_protection_jpeg_simd_cpu_support jpeg_simd_cpu_support

	15753 +%define _cpp_protection_jsimd_rgb_ycc_convert_mmx jsimd_rgb_ycc_convert_mmx

	15754 +%define _cpp_protection_jsimd_ycc_rgb_convert_mmx jsimd_ycc_rgb_convert_mmx

	15755 +%define _cpp_protection_jconst_rgb_ycc_convert_sse2 jconst_rgb_ycc_convert_sse2

	15756 +%define _cpp_protection_jsimd_rgb_ycc_convert_sse2 jsimd_rgb_ycc_convert_sse2

	15757 +%define _cpp_protection_jconst_ycc_rgb_convert_sse2 jconst_ycc_rgb_convert_sse2

	15758 +%define _cpp_protection_jsimd_ycc_rgb_convert_sse2 jsimd_ycc_rgb_convert_sse2

	15759 +%define _cpp_protection_jsimd_h2v2_downsample_mmx jsimd_h2v2_downsample_mmx

	15760 +%define _cpp_protection_jsimd_h2v1_downsample_mmx jsimd_h2v1_downsample_mmx

	15761 +%define _cpp_protection_jsimd_h2v2_downsample_sse2 jsimd_h2v2_downsample_sse2

	15762 +%define _cpp_protection_jsimd_h2v1_downsample_sse2 jsimd_h2v1_downsample_sse2

	15763 +%define _cpp_protection_jsimd_h2v2_upsample_mmx jsimd_h2v2_upsample_mmx

	15764 +%define _cpp_protection_jsimd_h2v1_upsample_mmx jsimd_h2v1_upsample_mmx

	15765 +%define _cpp_protection_jsimd_h2v1_fancy_upsample_mmx jsimd_h2v1_fancy_upsample_mmx

	15766 +%define _cpp_protection_jsimd_h2v2_fancy_upsample_mmx jsimd_h2v2_fancy_upsample_mmx

	15767 +%define _cpp_protection_jsimd_h2v1_merged_upsample_mmx jsimd_h2v1_merged_upsample_mmx

	15768 +%define _cpp_protection_jsimd_h2v2_merged_upsample_mmx jsimd_h2v2_merged_upsample_mmx

	15769 +%define _cpp_protection_jsimd_h2v2_upsample_sse2 jsimd_h2v2_upsample_sse2

	15770 +%define _cpp_protection_jsimd_h2v1_upsample_sse2 jsimd_h2v1_upsample_sse2

	15771 +%define _cpp_protection_jconst_fancy_upsample_sse2 jconst_fancy_upsample_sse2

	15772 +%define _cpp_protection_jsimd_h2v1_fancy_upsample_sse2 jsimd_h2v1_fancy_upsample_sse2

	15773 +%define _cpp_protection_jsimd_h2v2_fancy_upsample_sse2 jsimd_h2v2_fancy_upsample_sse2

	15774 +%define _cpp_protection_jconst_merged_upsample_sse2 jconst_merged_upsample_sse2

	15775 +%define _cpp_protection_jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_merged_upsample_sse2

	15776 +%define _cpp_protection_jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_merged_upsample_sse2

	15777 +%define _cpp_protection_jsimd_convsamp_mmx jsimd_convsamp_mmx

	15778 +%define _cpp_protection_jsimd_convsamp_sse2 jsimd_convsamp_sse2

	15779 +%define _cpp_protection_jsimd_convsamp_float_3dnow jsimd_convsamp_float_3dnow

	15780 +%define _cpp_protection_jsimd_convsamp_float_sse jsimd_convsamp_float_sse

	15781 +%define _cpp_protection_jsimd_convsamp_float_sse2 jsimd_convsamp_float_sse2

	15782 +%define _cpp_protection_jsimd_fdct_islow_mmx jsimd_fdct_islow_mmx

	15783 +%define _cpp_protection_jsimd_fdct_ifast_mmx jsimd_fdct_ifast_mmx

	15784 +%define _cpp_protection_jconst_fdct_islow_sse2 jconst_fdct_islow_sse2

	15785 +%define _cpp_protection_jsimd_fdct_islow_sse2 jsimd_fdct_islow_sse2

	15786 +%define _cpp_protection_jconst_fdct_ifast_sse2 jconst_fdct_ifast_sse2

	15787 +%define _cpp_protection_jsimd_fdct_ifast_sse2 jsimd_fdct_ifast_sse2

	15788 +%define _cpp_protection_jsimd_fdct_float_3dnow jsimd_fdct_float_3dnow

	15789 +%define _cpp_protection_jconst_fdct_float_sse jconst_fdct_float_sse

	15790 +%define _cpp_protection_jsimd_fdct_float_sse jsimd_fdct_float_sse

	15791 +%define _cpp_protection_jsimd_quantize_mmx jsimd_quantize_mmx

	15792 +%define _cpp_protection_jsimd_quantize_sse2 jsimd_quantize_sse2

	15793 +%define _cpp_protection_jsimd_quantize_float_3dnow jsimd_quantize_float_3dnow

	15794 +%define _cpp_protection_jsimd_quantize_float_sse jsimd_quantize_float_sse

	15795 +%define _cpp_protection_jsimd_quantize_float_sse2 jsimd_quantize_float_sse2

	15796 +%define _cpp_protection_jsimd_idct_2x2_mmx jsimd_idct_2x2_mmx

	15797 +%define _cpp_protection_jsimd_idct_4x4_mmx jsimd_idct_4x4_mmx

	15798 +%define _cpp_protection_jconst_idct_red_sse2 jconst_idct_red_sse2

	15799 +%define _cpp_protection_jsimd_idct_2x2_sse2 jsimd_idct_2x2_sse2

	15800 +%define _cpp_protection_jsimd_idct_4x4_sse2 jsimd_idct_4x4_sse2

	15801 +%define _cpp_protection_jsimd_idct_islow_mmx jsimd_idct_islow_mmx

	15802 +%define _cpp_protection_jsimd_idct_ifast_mmx jsimd_idct_ifast_mmx

	15803 +%define _cpp_protection_jconst_idct_islow_sse2 jconst_idct_islow_sse2

	15804 +%define _cpp_protection_jsimd_idct_islow_sse2 jsimd_idct_islow_sse2

	15805 +%define _cpp_protection_jconst_idct_ifast_sse2 jconst_idct_ifast_sse2

	15806 +%define _cpp_protection_jsimd_idct_ifast_sse2 jsimd_idct_ifast_sse2

	15807 +%define _cpp_protection_jsimd_idct_float_3dnow jsimd_idct_float_3dnow

	15808 +%define _cpp_protection_jconst_idct_float_sse jconst_idct_float_sse

	15809 +%define _cpp_protection_jsimd_idct_float_sse jsimd_idct_float_sse

	15810 +%define _cpp_protection_jconst_idct_float_sse2 jconst_idct_float_sse2

	15811 +%define _cpp_protection_jsimd_idct_float_sse2 jsimd_idct_float_sse2

	15812 #endif /* NEED_SHORT_EXTERNAL_NAMES */

	15813

	15814 Index: simd/jsimdcpu.asm

	15815 ===================================================================

	15816 --- simd/jsimdcpu.asm (revision 829)

	15817 +++ simd/jsimdcpu.asm (working copy)

	15818 @@ -29,7 +29,7 @@

	15819 ;

	15820

	15821 align 16

	15822 - global EXTN(jpeg_simd_cpu_support)

	15823 + global EXTN(jpeg_simd_cpu_support) PRIVATE

	15824

	15825 EXTN(jpeg_simd_cpu_support):

	15826 push ebx

	15827 @@ -100,3 +100,6 @@

	15828 pop ebx

	15829 ret

	15830

	15831 +; For some reason, the OS X linker does not honor the request to align the

	15832 +; segment unless we do this.

	15833 + align 16

	15834 Index: simd/jsimdext.inc

	15835 ===================================================================

	15836 --- simd/jsimdext.inc (revision 829)

	15837 +++ simd/jsimdext.inc (working copy)

	15838 @@ -2,6 +2,7 @@

	15839 ; jsimdext.inc - common declarations

	15840 ;

	15841 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB

	15842 +; Copyright 2010 D. R. Commander

	15843 ;

	15844 ; Based on

	15845 ; x86 SIMD extension for IJG JPEG library - version 1.02

	15846 @@ -37,9 +38,28 @@

	15847

	15848 ; -- segment definition --

	15849 ;

	15850 +%ifdef __YASM_VER__

	15851 +%define SEG_TEXT .text align=16

	15852 +%define SEG_CONST .rdata align=16

	15853 +%else

	15854 %define SEG_TEXT .text align=16 public use32 class=CODE

	15855 %define SEG_CONST .rdata align=16 public use32 class=CONST

	15856 +%endif

	15857

	15858 +%elifdef WIN64 ; ----(nasm -fwin64 -DWIN64 ...)--------

	15859 +; * Microsoft Visual C++

	15860 +

	15861 +; -- segment definition --

	15862 +;

	15863 +%ifdef __YASM_VER__

	15864 +%define SEG_TEXT .text align=16

	15865 +%define SEG_CONST .rdata align=16

	15866 +%else

	15867 +%define SEG_TEXT .text align=16 public use64 class=CODE

	15868 +%define SEG_CONST .rdata align=16 public use64 class=CONST

	15869 +%endif

	15870 +%define EXTN(name) name ; foo() -> foo

	15871 +

	15872 %elifdef OBJ32 ; ----(nasm -fobj -DOBJ32 ...)----------

	15873 ; * Borland C++ (Win32)

	15874

	15875 @@ -53,6 +73,12 @@

	15876 ; * *BSD family Unix using elf format

	15877 ; * Unix System V, including Solaris x86, UnixWare and SCO Unix

	15878

	15879 +; PIC is the default on Linux

	15880 +%define PIC

	15881 +

	15882 +; mark stack as non-executable

	15883 +section .note.GNU-stack noalloc noexec nowrite progbits

	15884 +

	15885 ; -- segment definition --

	15886 ;

	15887 %ifdef __x86_64__

	15888 @@ -280,7 +306,44 @@

	15889 %endmacro

	15890

	15891 %ifdef __x86_64__

	15892 +

	15893 +%ifdef WIN64

	15894 +

	15895 %imacro collect_args 0

	15896 + push r12

	15897 + push r13

	15898 + push r14

	15899 + push r15

	15900 + mov r10, rcx

	15901 + mov r11, rdx

	15902 + mov r12, r8

	15903 + mov r13, r9

	15904 + mov r14, [rax+48]

	15905 + mov r15, [rax+56]

	15906 + push rsi

	15907 + push rdi

	15908 + sub rsp, SIZEOF_XMMWORD

	15909 + movaps XMMWORD [rsp], xmm6

	15910 + sub rsp, SIZEOF_XMMWORD

	15911 + movaps XMMWORD [rsp], xmm7

	15912 +%endmacro

	15913 +

	15914 +%imacro uncollect_args 0

	15915 + movaps xmm7, XMMWORD [rsp]

	15916 + add rsp, SIZEOF_XMMWORD

	15917 + movaps xmm6, XMMWORD [rsp]

	15918 + add rsp, SIZEOF_XMMWORD

	15919 + pop rdi

	15920 + pop rsi

	15921 + pop r15

	15922 + pop r14

	15923 + pop r13

	15924 + pop r12

	15925 +%endmacro

	15926 +

	15927 +%else

	15928 +

	15929 +%imacro collect_args 0

	15930 push r10

	15931 push r11

	15932 push r12

	15933 @@ -306,9 +369,21 @@

	15934

	15935 %endif

	15936

	15937 +%endif

	15938 +

	15939 ; --------------------------------------------------------------------------

	15940 ; Defines picked up from the C headers

	15941 ;

	15942 %include "jsimdcfg.inc"

	15943

	15944 +; Begin chromium edits

	15945 +%ifdef MACHO ; ----(nasm -fmacho -DMACHO ...)--------

	15946 +%define PRIVATE :private_extern

	15947 +%elifdef ELF ; ----(nasm -felf[64] -DELF ...)------------

	15948 +%define PRIVATE :hidden

	15949 +%else

	15950 +%define PRIVATE

	15951 +%endif

	15952 +; End chromium edits

	15953 +

	15954 ; --------------------------------------------------------------------------

	15955 Index: turbojpeg.h

	15956 ===================================================================

	15957 --- turbojpeg.h (revision 829)

	15958 +++ turbojpeg.h (working copy)

	15959 @@ -1,231 +1,932 @@

	15960 -/* Copyright (C)2004 Landmark Graphics Corporation

	15961 - * Copyright (C)2005, 2006 Sun Microsystems, Inc.

	15962 - * Copyright (C)2009 D. R. Commander

	15963 +/*

	15964 + * Copyright (C)2009-2013 D. R. Commander. All Rights Reserved.

	15965 *

	15966 - * This library is free software and may be redistributed and/or modified under

	15967 - * the terms of the wxWindows Library License, Version 3.1 or (at your option)

	15968 - * any later version. The full license is in the LICENSE.txt file included

	15969 - * with this distribution.

	15970 + * Redistribution and use in source and binary forms, with or without

	15971 + * modification, are permitted provided that the following conditions are met:

	15972 *

	15973 - * This library is distributed in the hope that it will be useful,

	15974 - * but WITHOUT ANY WARRANTY; without even the implied warranty of

	15975 - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the

	15976 - * wxWindows Library License for more details.

	15977 + * - Redistributions of source code must retain the above copyright notice,

	15978 + * this list of conditions and the following disclaimer.

	15979 + * - Redistributions in binary form must reproduce the above copyright notice,

	15980 + * this list of conditions and the following disclaimer in the documentation

	15981 + * and/or other materials provided with the distribution.

	15982 + * - Neither the name of the libjpeg-turbo Project nor the names of its

	15983 + * contributors may be used to endorse or promote products derived from this

	15984 + * software without specific prior written permission.

	15985 + *

	15986 + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS",

	15987 + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE

	15988 + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE

	15989 + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE

	15990 + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR

	15991 + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF

	15992 + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS

	15993 + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN

	15994 + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)

	15995 + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE

	15996 + * POSSIBILITY OF SUCH DAMAGE.

	15997 */

	15998

	15999 -#if (defined(_MSC_VER) || defined(__CYGWIN__) || defined(__MINGW32__)) && defined(_WIN32) && defined(DLLDEFINE)

	16000 +#ifndef __TURBOJPEG_H__

	16001 +#define __TURBOJPEG_H__

	16002 +

	16003 +#if defined(_WIN32) && defined(DLLDEFINE)

	16004 #define DLLEXPORT __declspec(dllexport)

	16005 #else

	16006 #define DLLEXPORT

	16007 #endif

	16008 -

	16009 #define DLLCALL

	16010

	16011 -/* Subsampling */

	16012 -#define NUMSUBOPT 4

	16013

	16014 -enum {TJ_444=0, TJ_422, TJ_420, TJ_GRAYSCALE};

	16015 +/**

	16016 + * @addtogroup TurboJPEG

	16017 + * TurboJPEG API. This API provides an interface for generating, decoding, and

	16018 + * transforming planar YUV and JPEG images in memory.

	16019 + *

	16020 + * @{

	16021 + */

	16022

	16023 -/* Flags */

	16024 -#define TJ_BGR 1

	16025 -#define TJ_BOTTOMUP 2

	16026 -#define TJ_FORCEMMX 8 /* Force IPP to use MMX code even if SSE available */

	16027 -#define TJ_FORCESSE 16 /* Force IPP to use SSE1 code even if SSE2 available */

	16028 -#define TJ_FORCESSE2 32 /* Force IPP to use SSE2 code (useful if auto-detect is not working properly) */

	16029 -#define TJ_ALPHAFIRST 64 /* BGR buffer is ABGR and RGB buffer is ARGB */

	16030 -#define TJ_FORCESSE3 128 /* Force IPP to use SSE3 code (useful if auto-detect is not working properly) */

	16031 -#define TJ_FASTUPSAMPLE 256 /* Use fast, inaccurate 4:2:2 and 4:2:0 YUV upsampling routines in libjpeg decompressor */

	16032

	16033 +/**

	16034 + * The number of chrominance subsampling options

	16035 + */

	16036 +#define TJ_NUMSAMP 5

	16037 +

	16038 +/**

	16039 + * Chrominance subsampling options.

	16040 + * When an image is converted from the RGB to the YCbCr colorspace as part of

	16041 + * the JPEG compression process, some of the Cb and Cr (chrominance) components

	16042 + * can be discarded or averaged together to produce a smaller image with little

	16043 + * perceptible loss of image clarity (the human eye is more sensitive to small

	16044 + * changes in brightness than small changes in color.) This is called

	16045 + * "chrominance subsampling".

	16046 + * <p>

	16047 + * NOTE: Technically, the JPEG format uses the YCbCr colorspace, but per the

	16048 + * convention of the digital video community, the TurboJPEG API uses "YUV" to

	16049 + * refer to an image format consisting of Y, Cb, and Cr image planes.

	16050 + */

	16051 +enum TJSAMP

2199 +{	16052 +{

2200 + if (simd_support & JSIMD_ARM_NEON)	16053 + /**

2201 + jsimd_ycc_rgb565_convert_neon(cinfo->output_width, input_buf, input_row,	16054 + * 4:4:4 chrominance subsampling (no chrominance subsampling). The JPEG or

2202 + output_buf, num_rows);	16055 + * YUV image will contain one chrominance component for every pixel in the

2203 +}	16056 + * source image.

2204 +	16057 + */

2205 +GLOBAL(int)	16058 + TJSAMP_444=0,

2206 +jsimd_can_h2v2_downsample (void)	16059 + /**

	16060 + * 4:2:2 chrominance subsampling. The JPEG or YUV image will contain one

	16061 + * chrominance component for every 2x1 block of pixels in the source image.

	16062 + */

	16063 + TJSAMP_422,

	16064 + /**

	16065 + * 4:2:0 chrominance subsampling. The JPEG or YUV image will contain one

	16066 + * chrominance component for every 2x2 block of pixels in the source image.

	16067 + */

	16068 + TJSAMP_420,

	16069 + /**

	16070 + * Grayscale. The JPEG or YUV image will contain no chrominance components.

	16071 + */

	16072 + TJSAMP_GRAY,

	16073 + /**

	16074 + * 4:4:0 chrominance subsampling. The JPEG or YUV image will contain one

	16075 + * chrominance component for every 1x2 block of pixels in the source image.

	16076 + * Note that 4:4:0 subsampling is not fully accelerated in libjpeg-turbo.

	16077 + */

	16078 + TJSAMP_440

	16079 +};

	16080 +

	16081 +/**

	16082 + * MCU block width (in pixels) for a given level of chrominance subsampling.

	16083 + * MCU block sizes:

	16084 + * - 8x8 for no subsampling or grayscale

	16085 + * - 16x8 for 4:2:2

	16086 + * - 8x16 for 4:4:0

	16087 + * - 16x16 for 4:2:0

	16088 + */

	16089 +static const int tjMCUWidth[TJ_NUMSAMP] = {8, 16, 16, 8, 8};

	16090 +

	16091 +/**

	16092 + * MCU block height (in pixels) for a given level of chrominance subsampling.

	16093 + * MCU block sizes:

	16094 + * - 8x8 for no subsampling or grayscale

	16095 + * - 16x8 for 4:2:2

	16096 + * - 8x16 for 4:4:0

	16097 + * - 16x16 for 4:2:0

	16098 + */

	16099 +static const int tjMCUHeight[TJ_NUMSAMP] = {8, 8, 16, 8, 16};
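As a worked illustration of the two MCU tables above, the sketch below rounds an image dimension up to whole MCU blocks. It also reports whether a dimension leaves a partial MCU block on the trailing edge. The table values are copied from this header into local arrays so the sketch compiles without turbojpeg.h. The `SAMP_*` names and both helper functions are hypothetical and are not part of the TurboJPEG API.

```c
/* Local mirror of enum TJSAMP and the tjMCUWidth/tjMCUHeight tables above
   (same order: 4:4:4, 4:2:2, 4:2:0, grayscale, 4:4:0). */
enum { SAMP_444, SAMP_422, SAMP_420, SAMP_GRAY, SAMP_440, NUM_SAMP };
static const int mcuWidth[NUM_SAMP]  = {8, 16, 16, 8, 8};
static const int mcuHeight[NUM_SAMP] = {8, 8, 16, 8, 16};

/* Round dim up to the next multiple of the MCU block size (hypothetical). */
static int mcu_padded_dim(int dim, int mcuSize)
{
  return (dim + mcuSize - 1) / mcuSize * mcuSize;
}

/* Return 1 if dim is not a multiple of the MCU size, i.e. the right or
   bottom edge ends in a partial MCU block (hypothetical). */
static int has_partial_mcu(int dim, int mcuSize)
{
  return dim % mcuSize != 0;
}
```

For example, a 227x149 image compressed with 4:2:0 subsampling occupies 240x160 pixels' worth of 16x16 MCU blocks. Both of its trailing edges contain partial blocks.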

	16100 +

	16101 +

	16102 +/**

	16103 + * The number of pixel formats

	16104 + */

	16105 +#define TJ_NUMPF 11

	16106 +

	16107 +/**

	16108 + * Pixel formats

	16109 + */

	16110 +enum TJPF

2207 +{	16111 +{

2208 + init_simd();	16112 + /**

2209 +	16113 + * RGB pixel format. The red, green, and blue components in the image are

2210 + return 0;	16114 + * stored in 3-byte pixels in the order R, G, B from lowest to highest byte

2211 +}	16115 + * address within each pixel.

2212 +	16116 + */

2213 +GLOBAL(int)	16117 + TJPF_RGB=0,

2214 +jsimd_can_h2v1_downsample (void)	16118 + /**

	16119 + * BGR pixel format. The red, green, and blue components in the image are

	16120 + * stored in 3-byte pixels in the order B, G, R from lowest to highest byte

	16121 + * address within each pixel.

	16122 + */

	16123 + TJPF_BGR,

	16124 + /**

	16125 + * RGBX pixel format. The red, green, and blue components in the image are

	16126 + * stored in 4-byte pixels in the order R, G, B from lowest to highest byte

	16127 + * address within each pixel. The X component is ignored when compressing

	16128 + * and undefined when decompressing.

	16129 + */

	16130 + TJPF_RGBX,

	16131 + /**

	16132 + * BGRX pixel format. The red, green, and blue components in the image are

	16133 + * stored in 4-byte pixels in the order B, G, R from lowest to highest byte

	16134 + * address within each pixel. The X component is ignored when compressing

	16135 + * and undefined when decompressing.

	16136 + */

	16137 + TJPF_BGRX,

	16138 + /**

	16139 + * XBGR pixel format. The red, green, and blue components in the image are

	16140 + * stored in 4-byte pixels in the order R, G, B from highest to lowest byte

	16141 + * address within each pixel. The X component is ignored when compressing

	16142 + * and undefined when decompressing.

	16143 + */

	16144 + TJPF_XBGR,

	16145 + /**

	16146 + * XRGB pixel format. The red, green, and blue components in the image are

	16147 + * stored in 4-byte pixels in the order B, G, R from highest to lowest byte

	16148 + * address within each pixel. The X component is ignored when compressing

	16149 + * and undefined when decompressing.

	16150 + */

	16151 + TJPF_XRGB,

	16152 + /**

	16153 + * Grayscale pixel format. Each 1-byte pixel represents a luminance

	16154 + * (brightness) level from 0 to 255.

	16155 + */

	16156 + TJPF_GRAY,

	16157 + /**

	16158 + * RGBA pixel format. This is the same as @ref TJPF_RGBX, except that when

	16159 + * decompressing, the X component is guaranteed to be 0xFF, which can be

	16160 + * interpreted as an opaque alpha channel.

	16161 + */

	16162 + TJPF_RGBA,

	16163 + /**

	16164 + * BGRA pixel format. This is the same as @ref TJPF_BGRX, except that when

	16165 + * decompressing, the X component is guaranteed to be 0xFF, which can be

	16166 + * interpreted as an opaque alpha channel.

	16167 + */

	16168 + TJPF_BGRA,

	16169 + /**

	16170 + * ABGR pixel format. This is the same as @ref TJPF_XBGR, except that when

	16171 + * decompressing, the X component is guaranteed to be 0xFF, which can be

	16172 + * interpreted as an opaque alpha channel.

	16173 + */

	16174 + TJPF_ABGR,

	16175 + /**

	16176 + * ARGB pixel format. This is the same as @ref TJPF_XRGB, except that when

	16177 + * decompressing, the X component is guaranteed to be 0xFF, which can be

	16178 + * interpreted as an opaque alpha channel.

	16179 + */

	16180 + TJPF_ARGB

	16181 +};

	16182 +

	16183 +/**

	16184 + * Red offset (in bytes) for a given pixel format. This specifies the number

	16185 + * of bytes that the red component is offset from the start of the pixel. For

	16186 + * instance, if a pixel of format TJPF_BGRX is stored in <tt>char pixel[]</tt>,

	16187 + * then the red component will be <tt>pixel[tjRedOffset[TJPF_BGRX]]</tt>.

	16188 + */

	16189 +static const int tjRedOffset[TJ_NUMPF] = {0, 2, 0, 2, 3, 1, 0, 0, 2, 3, 1};

	16190 +/**

	16191 + * Green offset (in bytes) for a given pixel format. This specifies the number

	16192 + * of bytes that the green component is offset from the start of the pixel.

	16193 + * For instance, if a pixel of format TJPF_BGRX is stored in

	16194 + * <tt>char pixel[]</tt>, then the green component will be

	16195 + * <tt>pixel[tjGreenOffset[TJPF_BGRX]]</tt>.

	16196 + */

	16197 +static const int tjGreenOffset[TJ_NUMPF] = {1, 1, 1, 1, 2, 2, 0, 1, 1, 2, 2};

	16198 +/**

	16199 + * Blue offset (in bytes) for a given pixel format. This specifies the number

	16200 + * of bytes that the blue component is offset from the start of the pixel. For

	16201 + * instance, if a pixel of format TJPF_BGRX is stored in <tt>char pixel[]</tt>,

	16202 + * then the blue component will be <tt>pixel[tjBlueOffset[TJPF_BGRX]]</tt>.

	16203 + */

	16204 +static const int tjBlueOffset[TJ_NUMPF] = {2, 0, 2, 0, 1, 3, 0, 2, 0, 1, 3};

	16205 +

	16206 +/**

	16207 + * Pixel size (in bytes) for a given pixel format.

	16208 + */

	16209 +static const int tjPixelSize[TJ_NUMPF] = {3, 3, 4, 4, 4, 4, 1, 4, 4, 4, 4};
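The offset and pixel-size tables above are enough to read any pixel format generically. The sketch below uses them to fetch the red, green, and blue components of pixel (x, y) from a packed buffer whose rows are `pitch` bytes apart. The tables are copied locally in the same order as enum TJPF. The `PF_*` names and `get_rgb()` are hypothetical illustrations, not TurboJPEG functions. For the grayscale format, all three offsets are 0, so each component returns the luminance value.

```c
#include <stddef.h>

/* Local mirror of enum TJPF and the tables above (same order). */
enum {
  PF_RGB, PF_BGR, PF_RGBX, PF_BGRX, PF_XBGR, PF_XRGB, PF_GRAY,
  PF_RGBA, PF_BGRA, PF_ABGR, PF_ARGB, NUM_PF
};
static const int redOffset[NUM_PF]   = {0, 2, 0, 2, 3, 1, 0, 0, 2, 3, 1};
static const int greenOffset[NUM_PF] = {1, 1, 1, 1, 2, 2, 0, 1, 1, 2, 2};
static const int blueOffset[NUM_PF]  = {2, 0, 2, 0, 1, 3, 0, 2, 0, 1, 3};
static const int pixelSize[NUM_PF]   = {3, 3, 4, 4, 4, 4, 1, 4, 4, 4, 4};

/* Read the components of pixel (x, y) in pixel format pf (hypothetical). */
static void get_rgb(const unsigned char *buf, int pitch, int pf, int x, int y,
                    unsigned char *r, unsigned char *g, unsigned char *b)
{
  const unsigned char *pixel =
      buf + (size_t)y * (size_t)pitch + (size_t)x * (size_t)pixelSize[pf];

  *r = pixel[redOffset[pf]];
  *g = pixel[greenOffset[pf]];
  *b = pixel[blueOffset[pf]];
}
```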

	16210 +

	16211 +

	16212 +/**

	16213 + * The uncompressed source/destination image is stored in bottom-up (Windows,

	16214 + * OpenGL) order, not top-down (X11) order.

	16215 + */

	16216 +#define TJFLAG_BOTTOMUP 2

	16217 +/**

	16218 + * Turn off CPU auto-detection and force TurboJPEG to use MMX code (if the

	16219 + * underlying codec supports it.)

	16220 + */

	16221 +#define TJFLAG_FORCEMMX 8

	16222 +/**

	16223 + * Turn off CPU auto-detection and force TurboJPEG to use SSE code (if the

	16224 + * underlying codec supports it.)

	16225 + */

	16226 +#define TJFLAG_FORCESSE 16

	16227 +/**

	16228 + * Turn off CPU auto-detection and force TurboJPEG to use SSE2 code (if the

	16229 + * underlying codec supports it.)

	16230 + */

	16231 +#define TJFLAG_FORCESSE2 32

	16232 +/**

	16233 + * Turn off CPU auto-detection and force TurboJPEG to use SSE3 code (if the

	16234 + * underlying codec supports it.)

	16235 + */

	16236 +#define TJFLAG_FORCESSE3 128

	16237 +/**

	16238 + * When decompressing an image that was compressed using chrominance

	16239 + * subsampling, use the fastest chrominance upsampling algorithm available in

	16240 + * the underlying codec. The default is to use smooth upsampling, which

	16241 + * creates a smooth transition between neighboring chrominance components in

	16242 + * order to reduce upsampling artifacts in the decompressed image.

	16243 + */

	16244 +#define TJFLAG_FASTUPSAMPLE 256

	16245 +/**

	16246 + * Disable buffer (re)allocation. If passed to #tjCompress2() or

	16247 + * #tjTransform(), this flag will cause those functions to generate an error if

	16248 + * the JPEG image buffer is invalid or too small rather than attempting to

	16249 + * allocate or reallocate that buffer. This reproduces the behavior of earlier

	16250 + * versions of TurboJPEG.

	16251 + */

	16252 +#define TJFLAG_NOREALLOC 1024

	16253 +/**

	16254 + * Use the fastest DCT/IDCT algorithm available in the underlying codec. The

	16255 + * default if this flag is not specified is implementation-specific. For

	16256 + * example, the implementation of TurboJPEG for libjpeg[-turbo] uses the fast

	16257 + * algorithm by default when compressing, because this has been shown to have

	16258 + * only a very slight effect on accuracy, but it uses the accurate algorithm

	16259 + * when decompressing, because this has been shown to have a larger effect.

	16260 + */

	16261 +#define TJFLAG_FASTDCT 2048

	16262 +/**

	16263 + * Use the most accurate DCT/IDCT algorithm available in the underlying codec.

	16264 + * The default if this flag is not specified is implementation-specific. For

	16265 + * example, the implementation of TurboJPEG for libjpeg[-turbo] uses the fast

	16266 + * algorithm by default when compressing, because this has been shown to have

	16267 + * only a very slight effect on accuracy, but it uses the accurate algorithm

	16268 + * when decompressing, because this has been shown to have a larger effect.

	16269 + */

	16270 +#define TJFLAG_ACCURATEDCT 4096

	16271 +

	16272 +

	16273 +/**

	16274 + * The number of transform operations

	16275 + */

	16276 +#define TJ_NUMXOP 8

	16277 +

	16278 +/**

	16279 + * Transform operations for #tjTransform()

	16280 + */

	16281 +enum TJXOP

2215 +{	16282 +{

2216 + init_simd();	16283 + /**

2217 +	16284 + * Do not transform the position of the image pixels

2218 + return 0;	16285 + */

2219 +}	16286 + TJXOP_NONE=0,

2220 +	16287 + /**

2221 +GLOBAL(void)	16288 + * Flip (mirror) image horizontally. This transform is imperfect if there

2222 +jsimd_h2v2_downsample (j_compress_ptr cinfo, jpeg_component_info * compptr,	16289 + * are any partial MCU blocks on the right edge (see #TJXOPT_PERFECT.)

2223 + JSAMPARRAY input_data, JSAMPARRAY output_data)	16290 + */

	16291 + TJXOP_HFLIP,

	16292 + /**

	16293 + * Flip (mirror) image vertically. This transform is imperfect if there are

	16294 + * any partial MCU blocks on the bottom edge (see #TJXOPT_PERFECT.)

	16295 + */

	16296 + TJXOP_VFLIP,

	16297 + /**

	16298 + * Transpose image (flip/mirror along upper left to lower right axis.) This

	16299 + * transform is always perfect.

	16300 + */

	16301 + TJXOP_TRANSPOSE,

	16302 + /**

	16303 + * Transverse transpose image (flip/mirror along upper right to lower left

	16304 + * axis.) This transform is imperfect if there are any partial MCU blocks in

	16305 + * the image (see #TJXOPT_PERFECT.)

	16306 + */

	16307 + TJXOP_TRANSVERSE,

	16308 + /**

	16309 + * Rotate image clockwise by 90 degrees. This transform is imperfect if

	16310 + * there are any partial MCU blocks on the bottom edge (see

	16311 + * #TJXOPT_PERFECT.)

	16312 + */

	16313 + TJXOP_ROT90,

	16314 + /**

	16315 + * Rotate image 180 degrees. This transform is imperfect if there are any

	16316 + * partial MCU blocks in the image (see #TJXOPT_PERFECT.)

	16317 + */

	16318 + TJXOP_ROT180,

	16319 + /**

	16320 + * Rotate image counter-clockwise by 90 degrees. This transform is imperfect

	16321 + * if there are any partial MCU blocks on the right edge (see

	16322 + * #TJXOPT_PERFECT.)

	16323 + */

	16324 + TJXOP_ROT270

	16325 +};

	16326 +

	16327 +

	16328 +/**

	16329 + * This option will cause #tjTransform() to return an error if the transform is

	16330 + * not perfect. Lossless transforms operate on MCU blocks, whose size depends

	16331 + * on the level of chrominance subsampling used (see #tjMCUWidth

	16332 + * and #tjMCUHeight.) If the image's width or height is not evenly divisible

	16333 + * by the MCU block size, then there will be partial MCU blocks on the right

	16334 + * and/or bottom edges. It is not possible to move these partial MCU blocks to

	16335 + * the top or left of the image, so any transform that would require that is

	16336 + * "imperfect." If this option is not specified, then any partial MCU blocks

	16337 + * that cannot be transformed will be left in place, which will create

	16338 + * odd-looking strips on the right or bottom edge of the image.

	16339 + */

	16340 +#define TJXOPT_PERFECT 1

	16341 +/**

	16342 + * This option will cause #tjTransform() to discard any partial MCU blocks that

	16343 + * cannot be transformed.

	16344 + */

	16345 +#define TJXOPT_TRIM 2

	16346 +/**

	16347 + * This option will enable lossless cropping. See #tjTransform() for more

	16348 + * information.

	16349 + */

	16350 +#define TJXOPT_CROP 4

	16351 +/**

	16352 + * This option will discard the color data in the input image and produce

	16353 + * a grayscale output image.

	16354 + */

	16355 +#define TJXOPT_GRAY 8

	16356 +/**

	16357 + * This option will prevent #tjTransform() from outputting a JPEG image for

	16358 + * this particular transform (this can be used in conjunction with a custom

	16359 + * filter to capture the transformed DCT coefficients without transcoding

	16360 + * them.)

	16361 + */

	16362 +#define TJXOPT_NOOUTPUT 16
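The per-operation notes in enum TJXOP, together with the TJXOPT_PERFECT description, define when a lossless transform is "perfect". The sketch below encodes those documented rules. A caller could use it to decide in advance whether to pass TJXOPT_PERFECT or TJXOPT_TRIM. The `XOP_*` names and `transform_is_perfect()` are hypothetical. The sketch only restates the header comments; it is not the check the library itself performs.

```c
/* Local mirror of enum TJXOP (same order as above). */
enum {
  XOP_NONE, XOP_HFLIP, XOP_VFLIP, XOP_TRANSPOSE, XOP_TRANSVERSE,
  XOP_ROT90, XOP_ROT180, XOP_ROT270
};

/* Return 1 if applying op to a width x height image with the given MCU
   block size is perfect, per the header comments (hypothetical helper). */
static int transform_is_perfect(int op, int width, int height,
                                int mcuWidth, int mcuHeight)
{
  int partialRight = width % mcuWidth != 0;
  int partialBottom = height % mcuHeight != 0;

  switch (op) {
    case XOP_NONE:
    case XOP_TRANSPOSE:
      return 1;                                /* always perfect */
    case XOP_HFLIP:
    case XOP_ROT270:
      return !partialRight;                    /* right edge must be whole */
    case XOP_VFLIP:
    case XOP_ROT90:
      return !partialBottom;                   /* bottom edge must be whole */
    case XOP_TRANSVERSE:
    case XOP_ROT180:
      return !partialRight && !partialBottom;  /* no partial MCUs at all */
    default:
      return 0;                                /* unknown operation */
  }
}
```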

	16363 +

	16364 +

	16365 +/**

	16366 + * Scaling factor

	16367 + */

	16368 +typedef struct

2224 +{	16369 +{

2225 +}	16370 + /**

2226 +	16371 + * Numerator

2227 +GLOBAL(void)	16372 + */

2228 +jsimd_h2v1_downsample (j_compress_ptr cinfo, jpeg_component_info * compptr,	16373 + int num;

2229 + JSAMPARRAY input_data, JSAMPARRAY output_data)	16374 + /**

	16375 + * Denominator

	16376 + */

	16377 + int denom;

	16378 +} tjscalingfactor;

	16379 +

	16380 +/**

	16381 + * Cropping region

	16382 + */

	16383 +typedef struct

2230 +{	16384 +{

2231 +}	16385 + /**

2232 +	16386 + * The left boundary of the cropping region. This must be evenly divisible

2233 +GLOBAL(int)	16387 + * by the MCU block width (see #tjMCUWidth.)

2234 +jsimd_can_h2v2_upsample (void)	16388 + */

	16389 + int x;

	16390 + /**

	16391 + * The upper boundary of the cropping region. This must be evenly divisible

	16392 + * by the MCU block height (see #tjMCUHeight.)

	16393 + */

	16394 + int y;

	16395 + /**

	16396 + * The width of the cropping region. Setting this to 0 is the equivalent of

	16397 + * setting it to the width of the source JPEG image - x.

	16398 + */

	16399 + int w;

	16400 + /**

	16401 + * The height of the cropping region. Setting this to 0 is the equivalent of

	16402 + * setting it to the height of the source JPEG image - y.

	16403 + */

	16404 + int h;

	16405 +} tjregion;
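The tjregion comments require the crop origin to lie on the MCU grid and treat a width or height of 0 as "the rest of the image". The sketch below shows one way a caller might snap an arbitrary crop request to those rules before filling in a real tjregion. It rounds x and y down to MCU boundaries, keeps the requested right/bottom edges, and clamps them to the image. The struct copy and `align_crop()` are hypothetical. The sketch assumes non-negative inputs with x and y inside the image.

```c
/* Local mirror of the tjregion struct declared above. */
typedef struct { int x, y, w, h; } crop_region;

/* Expand w/h == 0 to the rest of the image, round x/y down to the MCU grid
   (keeping the requested right/bottom edges), then clamp to the image. */
static crop_region align_crop(crop_region r, int imgW, int imgH,
                              int mcuW, int mcuH)
{
  int right, bottom;

  if (r.w == 0) r.w = imgW - r.x;
  if (r.h == 0) r.h = imgH - r.y;
  right = r.x + r.w;
  bottom = r.y + r.h;

  r.x -= r.x % mcuW;               /* must be divisible by MCU width */
  r.y -= r.y % mcuH;               /* must be divisible by MCU height */
  if (right > imgW) right = imgW;
  if (bottom > imgH) bottom = imgH;
  r.w = right - r.x;
  r.h = bottom - r.y;
  return r;
}
```

Growing the region toward the origin, rather than shifting it, keeps every pixel that was originally requested inside the crop.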

	16406 +

	16407 +/**

	16408 + * Lossless transform

	16409 + */

	16410 +typedef struct tjtransform

2235 +{	16411 +{

2236 + init_simd();	16412 + /**

2237 +	16413 + * Cropping region

2238 + return 0;	16414 + */

2239 +}	16415 + tjregion r;

2240 +	16416 + /**

2241 +GLOBAL(int)	16417 + * One of the @ref TJXOP "transform operations"

2242 +jsimd_can_h2v1_upsample (void)	16418 + */

2243 +{	16419 + int op;

2244 + init_simd();	16420 + /**

2245 +	16421 + * The bitwise OR of one or more of the @ref TJXOPT_CROP "transform options"

2246 + return 0;	16422 + */

2247 +}	16423 + int options;

2248 +	16424 + /**

2249 +GLOBAL(void)	16425 + * Arbitrary data that can be accessed within the body of the callback

2250 +jsimd_h2v2_upsample (j_decompress_ptr cinfo,	16426 + * function

2251 + jpeg_component_info * compptr,	16427 + */

2252 + JSAMPARRAY input_data,	16428 + void *data;

2253 + JSAMPARRAY * output_data_ptr)	16429 + /**

2254 +{	16430 + * A callback function that can be used to modify the DCT coefficients

2255 +}	16431 + * after they are losslessly transformed but before they are transcoded to a

2256 +	16432 + * new JPEG image. This allows for custom filters or other transformations

2257 +GLOBAL(void)	16433 + * to be applied in the frequency domain.

2258 +jsimd_h2v1_upsample (j_decompress_ptr cinfo,	16434 + *

2259 + jpeg_component_info * compptr,	16435 + * @param coeffs pointer to an array of transformed DCT coefficients. (NOTE:

2260 + JSAMPARRAY input_data,	16436 + * this pointer is not guaranteed to be valid once the callback

2261 + JSAMPARRAY * output_data_ptr)	16437 + * returns, so applications wishing to hand off the DCT coefficients

2262 +{	16438 + * to another function or library should make a copy of them within

2263 +}	16439 + * the body of the callback.)

2264 +	16440 + * @param arrayRegion #tjregion structure containing the width and height of

2265 +GLOBAL(int)	16441 + * the array pointed to by <tt>coeffs</tt> as well as its offset

2266 +jsimd_can_h2v2_fancy_upsample (void)	16442 + * relative to the component plane. TurboJPEG implementations may

2267 +{	16443 + * choose to split each component plane into multiple DCT coefficient

2268 + init_simd();	16444 + * arrays and call the callback function once for each array.

2269 +	16445 + * @param planeRegion #tjregion structure containing the width and height of

2270 + return 0;	16446 + * the component plane to which <tt>coeffs</tt> belongs

2271 +}	16447 + * @param componentIndex ID number of the component plane to which

2272 +	16448 + * <tt>coeffs</tt> belongs (Y, Cb, and Cr have, respectively, IDs of

2273 +GLOBAL(int)	16449 + * 0, 1, and 2 in typical JPEG images.)

2274 +jsimd_can_h2v1_fancy_upsample (void)	16450 + * @param transformIndex ID number of the transformed image to which

2275 +{	16451 + * <tt>coeffs</tt> belongs. This is the same as the index of the

2276 + init_simd();	16452 + * transform in the <tt>transforms</tt> array that was passed to

2277 +	16453 + * #tjTransform().

2278 + return 0;	16454 + * @param transform a pointer to a #tjtransform structure that specifies the

2279 +}	16455 + * parameters and/or cropping region for this transform

2280 +	16456 + *

2281 +GLOBAL(void)	16457 + * @return 0 if the callback was successful, or -1 if an error occurred.

2282 +jsimd_h2v2_fancy_upsample (j_decompress_ptr cinfo,	16458 + */

2283 + jpeg_component_info * compptr,	16459 + int (*customFilter)(short *coeffs, tjregion arrayRegion,

2284 + JSAMPARRAY input_data,	16460 + tjregion planeRegion, int componentIndex, int transformIndex,

2285 + JSAMPARRAY * output_data_ptr)	16461 + struct tjtransform *transform);

2286 +{	16462 +} tjtransform;

2287 +}	16463 +
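To make the customFilter contract concrete, here is a sketch of a filter with the same parameter list. It zeroes every AC coefficient and keeps each block's DC term, which should flatten every 8x8 block to a single average value. The coefficient layout is an assumption: the code assumes `coeffs` holds `arrayRegion.w * arrayRegion.h` values stored as consecutive 64-entry blocks with the DC term first. The `coef_region` and `struct xform` stand-ins, and `keep_dc_only()` itself, are hypothetical. With turbojpeg.h included, you would use tjregion and struct tjtransform directly.

```c
/* Stand-ins for tjregion and struct tjtransform from the header above. */
typedef struct { int x, y, w, h; } coef_region;
struct xform;

/* Same shape as tjtransform.customFilter: keep DC, zero all AC terms.
   ASSUMPTION: consecutive 64-coefficient blocks, DC at index 0 of each. */
static int keep_dc_only(short *coeffs, coef_region arrayRegion,
                        coef_region planeRegion, int componentIndex,
                        int transformIndex, struct xform *transform)
{
  long i, n = (long)arrayRegion.w * arrayRegion.h;

  (void)planeRegion; (void)componentIndex;
  (void)transformIndex; (void)transform;

  for (i = 0; i < n; i++)
    if (i % 64 != 0)
      coeffs[i] = 0;
  return 0;  /* 0 = success, -1 = error, per the callback docs above */
}
```

Because the callback may be invoked once per coefficient array, the sketch touches only the `n` values it is given and keeps no state between calls.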

2288 +	16464 +/**

2289 +GLOBAL(void)	16465 + * TurboJPEG instance handle

2290 +jsimd_h2v1_fancy_upsample (j_decompress_ptr cinfo,	16466 + */

2291 + jpeg_component_info * compptr,	16467 typedef void* tjhandle;

2292 + JSAMPARRAY input_data,	16468

2293 + JSAMPARRAY * output_data_ptr)	16469 -#define TJPAD(p) (((p)+3)&(~3))

2294 +{	16470 -#ifndef max

2295 +}	16471 - #define max(a,b) ((a)>(b)?(a):(b))

2296 +	16472 -#endif

2297 +GLOBAL(int)	16473

2298 +jsimd_can_h2v2_merged_upsample (void)	16474 +/**

2299 +{	16475 + * Pad the given width to the nearest 32-bit boundary

2300 + init_simd();	16476 + */

2301 +	16477 +#define TJPAD(width) (((width)+3)&(~3))

2302 + return 0;	16478 +

2303 +}	16479 +/**

2304 +	16480 + * Compute the scaled value of <tt>dimension</tt> using the given scaling

2305 +GLOBAL(int)	16481 + * factor. This macro performs the integer equivalent of <tt>ceil(dimension *

2306 +jsimd_can_h2v1_merged_upsample (void)	16482 + * scalingFactor)</tt>.

2307 +{	16483 + */

2308 + init_simd();	16484 +#define TJSCALED(dimension, scalingFactor) ((dimension * scalingFactor.num \

2309 +	16485 + + scalingFactor.denom - 1) / scalingFactor.denom)

2310 + return 0;	16486 +

2311 +}	16487 +
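The TJSCALED comment describes an integer ceiling of `dimension * num / denom`. The sketch below reproduces that arithmetic with local copies of tjscalingfactor and the macro. Unlike the macro as written in this header, it parenthesizes `dimension`. With the header version, an expression argument such as `TJSCALED(100 + 1, sf)` and a 3/8 factor expands to `(100 + 1 * 3 + 8 - 1) / 8`, which is 13 rather than the intended 38. The names `scalingfactor` and `SCALED` are hypothetical.

```c
/* Local copy of tjscalingfactor from the header above. */
typedef struct { int num, denom; } scalingfactor;

/* Integer ceil(dimension * num / denom); arguments are parenthesized so
   that expressions such as (w + 1) expand correctly. */
#define SCALED(dimension, sf) \
  (((dimension) * (sf).num + (sf).denom - 1) / (sf).denom)
```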

2312 +	16488 #ifdef __cplusplus

2313 +GLOBAL(void)	16489 extern "C" {

2314 +jsimd_h2v2_merged_upsample (j_decompress_ptr cinfo,	16490 #endif

2315 + JSAMPIMAGE input_buf,	16491

2316 + JDIMENSION in_row_group_ctr,	16492 -/* API follows */

2317 + JSAMPARRAY output_buf)	16493

2318 +{	16494 +/**

2319 +}	16495 + * Create a TurboJPEG compressor instance.

2320 +	16496 + *

2321 +GLOBAL(void)	16497 + * @return a handle to the newly-created instance, or NULL if an error

2322 +jsimd_h2v1_merged_upsample (j_decompress_ptr cinfo,	16498 + * occurred (see #tjGetErrorStr().)

2323 + JSAMPIMAGE input_buf,	16499 + */

2324 + JDIMENSION in_row_group_ctr,	16500 +DLLEXPORT tjhandle DLLCALL tjInitCompress(void);

2325 + JSAMPARRAY output_buf)	16501

2326 +{	16502 -/*

2327 +}	16503 - tjhandle tjInitCompress(void)

2328 +	16504

2329 +GLOBAL(int)	16505 - Creates a new JPEG compressor instance, allocates memory for the structures,

2330 +jsimd_can_convsamp (void)	16506 - and returns a handle to the instance. Most applications will only

2331 +{	16507 - need to call this once at the beginning of the program or once for each

2332 + init_simd();	16508 - concurrent thread. Don't try to create a new instance every time you

2333 +	16509 - compress an image, because this will cause performance to suffer.

2334 + return 0;	16510 -

2335 +}	16511 - RETURNS: NULL on error

2336 +	16512 +/**

2337 +GLOBAL(int)	16513 + * Compress an RGB or grayscale image into a JPEG image.

2338 +jsimd_can_convsamp_float (void)	16514 + *

2339 +{	16515 + * @param handle a handle to a TurboJPEG compressor or transformer instance

2340 + init_simd();	16516 + * @param srcBuf pointer to an image buffer containing RGB or grayscale pixels

2341 +	16517 + * to be compressed

2342 + return 0;	16518 + * @param width width (in pixels) of the source image

2343 +}	16519 + * @param pitch bytes per line of the source image. Normally, this should be

2344 +	16520 + * <tt>width * #tjPixelSize[pixelFormat]</tt> if the image is unpadded,

2345 +GLOBAL(void)	16521 + * or <tt>#TJPAD(width * #tjPixelSize[pixelFormat])</tt> if each line of

2346 +jsimd_convsamp (JSAMPARRAY sample_data, JDIMENSION start_col,	16522 + * the image is padded to the nearest 32-bit boundary, as is the case

2347 + DCTELEM * workspace)	16523 + * for Windows bitmaps. You can also be clever and use this parameter

2348 +{	16524 + * to skip lines, etc. Setting this parameter to 0 is the equivalent of

2349 +}	16525 + * setting it to <tt>width * #tjPixelSize[pixelFormat]</tt>.

2350 +	16526 + * @param height height (in pixels) of the source image

2351 +GLOBAL(void)	16527 + * @param pixelFormat pixel format of the source image (see @ref TJPF

2352 +jsimd_convsamp_float (JSAMPARRAY sample_data, JDIMENSION start_col,	16528 + * "Pixel formats".)

2353 + FAST_FLOAT * workspace)	16529 + * @param jpegBuf address of a pointer to an image buffer that will receive the

2354 +{	16530 + * JPEG image. TurboJPEG has the ability to reallocate the JPEG buffer

2355 +}	16531 + * to accommodate the size of the JPEG image. Thus, you can choose to:

2356 +	16532 + * -# pre-allocate the JPEG buffer with an arbitrary size using

2357 +GLOBAL(int)	16533 + * #tjAlloc() and let TurboJPEG grow the buffer as needed,

2358 +jsimd_can_fdct_islow (void)	16534 + * -# set <tt>*jpegBuf</tt> to NULL to tell TurboJPEG to allocate the

2359 +{	16535 + * buffer for you, or

2360 + init_simd();	16536 + * -# pre-allocate the buffer to a "worst case" size determined by

2361 +	16537 + * calling #tjBufSize(). This should ensure that the buffer never has

2362 + return 0;	16538 + * to be re-allocated (setting #TJFLAG_NOREALLOC guarantees this.)

2363 +}	16539 + * .

2364 +	16540 + * If you choose option 1, <tt>*jpegSize</tt> should be set to the

2365 +GLOBAL(int)	16541 + * size of your pre-allocated buffer. In any case, unless you have

2366 +jsimd_can_fdct_ifast (void)	16542 + * set #TJFLAG_NOREALLOC, you should always check <tt>*jpegBuf</tt> upon

2367 +{	16543 + * return from this function, as it may have changed.

2368 + init_simd();	16544 + * @param jpegSize pointer to an unsigned long variable that holds the size of

2369 +	16545 + * the JPEG image buffer. If <tt>*jpegBuf</tt> points to a

2370 + return 0;	16546 + * pre-allocated buffer, then <tt>*jpegSize</tt> should be set to the

2371 +}	16547 + * size of the buffer. Upon return, <tt>*jpegSize</tt> will contain the

2372 +	16548 + * size of the JPEG image (in bytes.)

2373 +GLOBAL(int)	16549 + * @param jpegSubsamp the level of chrominance subsampling to be used when

2374 +jsimd_can_fdct_float (void)	16550 + * generating the JPEG image (see @ref TJSAMP

2375 +{	16551 + * "Chrominance subsampling options".)

2376 + init_simd();	16552 + * @param jpegQual the image quality of the generated JPEG image (1 = worst,

2377 +	16553 + * 100 = best)

2378 + return 0;	16554 + * @param flags the bitwise OR of one or more of the @ref TJFLAG_BOTTOMUP

2379 +}	16555 + * "flags".

2380 +	16556 + *

2381 +GLOBAL(void)	16557 + * @return 0 if successful, or -1 if an error occurred (see #tjGetErrorStr().)

2382 +jsimd_fdct_islow (DCTELEM * data)	16558 */

2383 +{	16559 -DLLEXPORT tjhandle DLLCALL tjInitCompress(void);

2384 +}	16560 +DLLEXPORT int DLLCALL tjCompress2(tjhandle handle, unsigned char *srcBuf,

2385 +	16561 + int width, int pitch, int height, int pixelFormat, unsigned char **jpegBuf,

2386 +GLOBAL(void)	16562 + unsigned long *jpegSize, int jpegSubsamp, int jpegQual, int flags);

2387 +jsimd_fdct_ifast (DCTELEM * data)	16563

2388 +{	16564

2389 +}	16565 -/*

2390 +	16566 - int tjCompress(tjhandle j,

2391 +GLOBAL(void)	16567 - unsigned char *srcbuf, int width, int pitch, int height, int pixelsize,

2392 +jsimd_fdct_float (FAST_FLOAT * data)	16568 - unsigned char *dstbuf, unsigned long *size,

2393 +{	16569 - int jpegsubsamp, int jpegqual, int flags)

2394 +}	16570 +/**

2395 +	16571 + * The maximum size of the buffer (in bytes) required to hold a JPEG image with

2396 +GLOBAL(int)	16572 + * the given parameters. The number of bytes returned by this function is

2397 +jsimd_can_quantize (void)	16573 + * larger than the size of the uncompressed source image. The reason for this

2398 +{	16574 + * is that the JPEG format uses 16-bit coefficients, and it is thus possible

2399 + init_simd();	16575 + * for a very high-quality JPEG image with very high-frequency content to

2400 +	16576 + * expand rather than compress when converted to the JPEG format. Such images

2401 + return 0;	16577 + * represent a very rare corner case, but since there is no way to predict the

2402 +}	16578 + * size of a JPEG image prior to compression, the corner case has to be

2403 +	16579 + * handled.

2404 +GLOBAL(int)	16580 + *

2405 +jsimd_can_quantize_float (void)	16581 + * @param width width of the image (in pixels)

2406 +{	16582 + * @param height height of the image (in pixels)

2407 + init_simd();	16583 + * @param jpegSubsamp the level of chrominance subsampling to be used when

2408 +	16584 + * generating the JPEG image (see @ref TJSAMP

2409 + return 0;	16585 + * "Chrominance subsampling options".)

2410 +}	16586 + *

2411 +	16587 + * @return the maximum size of the buffer (in bytes) required to hold the

2412 +GLOBAL(void)	16588 + * image, or -1 if the arguments are out of bounds.

2413 +jsimd_quantize (JCOEFPTR coef_block, DCTELEM * divisors,	16589 + */

2414 + DCTELEM * workspace)	16590 +DLLEXPORT unsigned long DLLCALL tjBufSize(int width, int height,

2415 +{	16591 + int jpegSubsamp);

2416 +}	16592
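The `pitch` parameter of tjCompress2() is simply the distance in bytes between the starts of consecutive rows. The sketch below shows the three cases the documentation mentions: unpadded rows, rows padded to a 32-bit boundary with TJPAD, and a pitch of 0 meaning `width * pixelSize`. A 227-pixel RGB row is 681 bytes unpadded and 684 bytes padded. The "skip lines" remark is read here as doubling the pitch, which (with the height halved accordingly) makes the compressor see only every other row; that interpretation is an assumption. `PAD4` and `row_ptr()` are hypothetical local names.

```c
#include <stddef.h>

/* Local copy of TJPAD: round width up to a multiple of 4 bytes. */
#define PAD4(width) (((width) + 3) & (~3))

/* Start of row y in an image whose rows are pitch bytes apart; pitch == 0
   means width * pixelSize, as in the tjCompress2() documentation. */
static const unsigned char *row_ptr(const unsigned char *buf, int width,
                                    int pixelSize, int pitch, int y)
{
  if (pitch == 0)
    pitch = width * pixelSize;
  return buf + (size_t)y * (size_t)pitch;
}
```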

2417 +	16593 - [INPUT] j = instance handle previously returned from a call to

2418 +GLOBAL(void)	16594 - tjInitCompress()

2419 +jsimd_quantize_float (JCOEFPTR coef_block, FAST_FLOAT * divisors,	16595 - [INPUT] srcbuf = pointer to user-allocated image buffer containing pixels in

2420 + FAST_FLOAT * workspace)	16596 - RGB(A) or BGR(A) form

2421 +{	16597 - [INPUT] width = width (in pixels) of the source image

2422 +}	16598 - [INPUT] pitch = bytes per line of the source image (width*pixelsize if the

2423 +	16599 - bitmap is unpadded, else TJPAD(width*pixelsize) if each line of the bitmap

2424 +GLOBAL(int)	16600 - is padded to the nearest 32-bit boundary, such as is the case for Windows

2425 +jsimd_can_idct_2x2 (void)	16601 - bitmaps. You can also be clever and use this parameter to skip lines, etc.,

2426 +{	16602 - as long as the pitch is greater than 0.)

2427 + init_simd();	16603 - [INPUT] height = height (in pixels) of the source image

2428 +	16604 - [INPUT] pixelsize = size (in bytes) of each pixel in the source image

2429 + /* The code is optimised for these values only */	16605 - RGBA and BGRA: 4, RGB and BGR: 3

2430 + if (DCTSIZE != 8)	16606 - [INPUT] dstbuf = pointer to user-allocated image buffer which will receive

2431 + return 0;	16607 - the JPEG image. Use the macro TJBUFSIZE(width, height) to determine

2432 + if (sizeof(JCOEF) != 2)	16608 - the appropriate size for this buffer based on the image width and height.

2433 + return 0;	16609 - [OUTPUT] size = pointer to unsigned long which receives the size (in bytes)

2434 + if (BITS_IN_JSAMPLE != 8)	16610 - of the compressed image

2435 + return 0;	16611 - [INPUT] jpegsubsamp = Specifies either 4:2:0, 4:2:2, or 4:4:4 subsampling.

2436 + if (sizeof(JDIMENSION) != 4)	16612 - When the image is converted from the RGB to YCbCr colorspace as part of the

2437 + return 0;	16613 - JPEG compression process, every other Cb and Cr (chrominance) pixel can be

2438 + if (sizeof(ISLOW_MULT_TYPE) != 2)	16614 - discarded to produce a smaller image with little perceptible loss of

2439 + return 0;	16615 - image clarity (the human eye is more sensitive to small changes in

2440 +	16616 - brightness than small changes in color.)

2441 + if (simd_support & JSIMD_ARM_NEON)	16617

2442 + return 1;	16618 - TJ_420: 4:2:0 subsampling. Discards every other Cb, Cr pixel in both

2443 +	16619 - horizontal and vertical directions.

2444 + return 0;	16620 - TJ_422: 4:2:2 subsampling. Discards every other Cb, Cr pixel only in

2445 +}	16621 - the horizontal direction.

2446 +	16622 - TJ_444: no subsampling.

2447 +GLOBAL(int)	16623 - TJ_GRAYSCALE: Generate grayscale JPEG image

2448 +jsimd_can_idct_4x4 (void)	16624 +/**

2449 +{	16625 + * The size of the buffer (in bytes) required to hold a YUV planar image with

2450 + init_simd();	16626 + * the given parameters.

2451 +	16627 + *

2452 + /* The code is optimised for these values only */	16628 + * @param width width of the image (in pixels)

2453 + if (DCTSIZE != 8)	16629 + * @param height height of the image (in pixels)

2454 + return 0;	16630 + * @param subsamp level of chrominance subsampling in the image (see

2455 + if (sizeof(JCOEF) != 2)	16631 + * @ref TJSAMP "Chrominance subsampling options".)

2456 + return 0;	16632 + *

2457 + if (BITS_IN_JSAMPLE != 8)	16633 + * @return the size of the buffer (in bytes) required to hold the image, or

2458 + return 0;	16634 + * -1 if the arguments are out of bounds.

2459 + if (sizeof(JDIMENSION) != 4)	16635 + */

2460 + return 0;	16636 +DLLEXPORT unsigned long DLLCALL tjBufSizeYUV(int width, int height,

2461 + if (sizeof(ISLOW_MULT_TYPE) != 2)	16637 + int subsamp);

2462 + return 0;	16638

2463 +	16639 - [INPUT] jpegqual = JPEG quality (an integer between 0 and 100 inclusive.)

2464 + if (simd_support & JSIMD_ARM_NEON)	16640 - [INPUT] flags = the bitwise OR of one or more of the following

2465 + return 1;	16641

2466 +	16642 - TJ_BGR: The components of each pixel in the source image are stored in

2467 + return 0;	16643 - B,G,R order, not R,G,B

2468 +}	16644 - TJ_BOTTOMUP: The source image is stored in bottom-up (Windows) order,

2469 +	16645 - not top-down

2470 +GLOBAL(void)	16646 - TJ_FORCEMMX: Valid only for the Intel Performance Primitives implementation

2471 +jsimd_idct_2x2 (j_decompress_ptr cinfo, jpeg_component_info * compptr,	16647 - of this codec-- force IPP to use MMX code (bypass CPU auto-detection)

2472 + JCOEFPTR coef_block, JSAMPARRAY output_buf,	16648 - TJ_FORCESSE: Valid only for the Intel Performance Primitives implementation

2473 + JDIMENSION output_col)	16649 - of this codec-- force IPP to use SSE code (bypass CPU auto-detection)

2474 +{	16650 - TJ_FORCESSE2: Valid only for the Intel Performance Primitives implementation

2475 + if (simd_support & JSIMD_ARM_NEON)	16651 - of this codec-- force IPP to use SSE2 code (bypass CPU auto-detection)

2476 + jsimd_idct_2x2_neon(compptr->dct_table, coef_block, output_buf,	16652 - TJ_FORCESSE3: Valid only for the Intel Performance Primitives implementation

2477 + output_col);	16653 - of this codec-- force IPP to use SSE3 code (bypass CPU auto-detection)

2478 +}	16654 +/**

2479 +	16655 + * Encode an RGB or grayscale image into a YUV planar image. This function

2480 +GLOBAL(void)	16656 + * uses the accelerated color conversion routines in TurboJPEG's underlying

2481 +jsimd_idct_4x4 (j_decompress_ptr cinfo, jpeg_component_info * compptr,	16657 + * codec to produce a planar YUV image that is suitable for X Video.

2482 + JCOEFPTR coef_block, JSAMPARRAY output_buf,	16658 + * Specifically, if the chrominance components are subsampled along the

2483 + JDIMENSION output_col)	16659 + * horizontal dimension, then the width of the luminance plane is padded to the

2484 +{	16660 + * nearest multiple of 2 in the output image (same goes for the height of the

2485 + if (simd_support & JSIMD_ARM_NEON)	16661 + * luminance plane, if the chrominance components are subsampled along the

2486 + jsimd_idct_4x4_neon(compptr->dct_table, coef_block, output_buf,	16662 + * vertical dimension.) Also, each line of each plane in the output image is

2487 + output_col);	16663 + * padded to 4 bytes. Although this will work with any subsampling option, it

2488 +}	16664 + * is really only useful in combination with TJ_420, which produces an image

2489 +	16665 + * compatible with the I420 (AKA "YUV420P") format.

2490 +GLOBAL(int)	16666 + * <p>

2491 +jsimd_can_idct_islow (void)	16667 + * NOTE: Technically, the JPEG format uses the YCbCr colorspace, but per the

2492 +{	16668 + * convention of the digital video community, the TurboJPEG API uses "YUV" to

2493 + init_simd();	16669 + * refer to an image format consisting of Y, Cb, and Cr image planes.

2494 +	16670 + *

2495 + /* The code is optimised for these values only */	16671 + * @param handle a handle to a TurboJPEG compressor or transformer instance

2496 + if (DCTSIZE != 8)	16672 + * @param srcBuf pointer to an image buffer containing RGB or grayscale pixels

2497 + return 0;	16673 + * to be encoded

2498 + if (sizeof(JCOEF) != 2)	16674 + * @param width width (in pixels) of the source image

2499 + return 0;	16675 + * @param pitch bytes per line of the source image. Normally, this should be

2500 + if (BITS_IN_JSAMPLE != 8)	16676 + * <tt>width * #tjPixelSize[pixelFormat]</tt> if the image is unpadded,

2501 + return 0;	16677 + * or <tt>#TJPAD(width * #tjPixelSize[pixelFormat])</tt> if each line of

2502 + if (sizeof(JDIMENSION) != 4)	16678 + * the image is padded to the nearest 32-bit boundary, as is the case

2503 + return 0;	16679 + * for Windows bitmaps. You can also be clever and use this parameter

2504 + if (sizeof(ISLOW_MULT_TYPE) != 2)	16680 + * to skip lines, etc. Setting this parameter to 0 is the equivalent of

2505 + return 0;	16681 + * setting it to <tt>width * #tjPixelSize[pixelFormat]</tt>.

2506 +	16682 + * @param height height (in pixels) of the source image

2507 + if (simd_support & JSIMD_ARM_NEON)	16683 + * @param pixelFormat pixel format of the source image (see @ref TJPF

2508 + return 1;	16684 + * "Pixel formats".)

2509 +	16685 + * @param dstBuf pointer to an image buffer that will receive the YUV image.

2510 + return 0;	16686 + * Use #tjBufSizeYUV() to determine the appropriate size for this buffer

2511 +}	16687 + * based on the image width, height, and level of chrominance

2512 +	16688 + * subsampling.

2513 +GLOBAL(int)	16689 + * @param subsamp the level of chrominance subsampling to be used when

2514 +jsimd_can_idct_ifast (void)	16690 + * generating the YUV image (see @ref TJSAMP

2515 +{	16691 + * "Chrominance subsampling options".)

2516 + init_simd();	16692 + * @param flags the bitwise OR of one or more of the @ref TJFLAG_BOTTOMUP

2517 +	16693 + * "flags".

2518 + /* The code is optimised for these values only */	16694 + *

2519 + if (DCTSIZE != 8)	16695 + * @return 0 if successful, or -1 if an error occurred (see #tjGetErrorStr().)

2520 + return 0;	16696 +*/

2521 + if (sizeof(JCOEF) != 2)	16697 +DLLEXPORT int DLLCALL tjEncodeYUV2(tjhandle handle,

2522 + return 0;	16698 + unsigned char *srcBuf, int width, int pitch, int height, int pixelFormat,

2523 + if (BITS_IN_JSAMPLE != 8)	16699 + unsigned char *dstBuf, int subsamp, int flags);

2524 + return 0;	16700

2525 + if (sizeof(JDIMENSION) != 4)	16701 - RETURNS: 0 on success, -1 on error

2526 + return 0;	16702 +

2527 + if (sizeof(IFAST_MULT_TYPE) != 2)	16703 +/**

2528 + return 0;	16704 + * Create a TurboJPEG decompressor instance.

2529 + if (IFAST_SCALE_BITS != 2)	16705 + *

2530 + return 0;	16706 + * @return a handle to the newly-created instance, or NULL if an error

2531 +	16707 + * occurred (see #tjGetErrorStr().)

2532 + if (simd_support & JSIMD_ARM_NEON)	16708 */

2533 + return 1;	16709 -DLLEXPORT int DLLCALL tjCompress(tjhandle j,

2534 +	16710 - unsigned char *srcbuf, int width, int pitch, int height, int pixelsize,

2535 + return 0;	16711 - unsigned char *dstbuf, unsigned long *size,

2536 +}	16712 - int jpegsubsamp, int jpegqual, int flags);

2537 +	16713 +DLLEXPORT tjhandle DLLCALL tjInitDecompress(void);

2538 +GLOBAL(int)	16714

2539 +jsimd_can_idct_float (void)	16715 -DLLEXPORT unsigned long DLLCALL TJBUFSIZE(int width, int height);

2540 +{	16716

2541 + init_simd();	16717 -/*

2542 +	16718 - tjhandle tjInitDecompress(void)

2543 + return 0;	16719 +/**

2544 +}	16720 + * Retrieve information about a JPEG image without decompressing it.

2545 +	16721 + *

2546 +GLOBAL(void)	16722 + * @param handle a handle to a TurboJPEG decompressor or transformer instance

2547 +jsimd_idct_islow (j_decompress_ptr cinfo, jpeg_component_info * compptr,	16723 + * @param jpegBuf pointer to a buffer containing a JPEG image

2548 + JCOEFPTR coef_block, JSAMPARRAY output_buf,	16724 + * @param jpegSize size of the JPEG image (in bytes)

2549 + JDIMENSION output_col)	16725 + * @param width pointer to an integer variable that will receive the width (in

2550 +{	16726 + * pixels) of the JPEG image

2551 + if (simd_support & JSIMD_ARM_NEON)	16727 + * @param height pointer to an integer variable that will receive the height

2552 + jsimd_idct_islow_neon(compptr->dct_table, coef_block, output_buf,	16728 + * (in pixels) of the JPEG image

2553 + output_col);	16729 + * @param jpegSubsamp pointer to an integer variable that will receive the

2554 +}	16730 + * level of chrominance subsampling used when compressing the JPEG image

2555 +	16731 + * (see @ref TJSAMP "Chrominance subsampling options".)

2556 +GLOBAL(void)	16732 + *

2557 +jsimd_idct_ifast (j_decompress_ptr cinfo, jpeg_component_info * compptr,	16733 + * @return 0 if successful, or -1 if an error occurred (see #tjGetErrorStr().)

2558 + JCOEFPTR coef_block, JSAMPARRAY output_buf,	16734 +*/

2559 + JDIMENSION output_col)	16735 +DLLEXPORT int DLLCALL tjDecompressHeader2(tjhandle handle,

2560 +{	16736 + unsigned char *jpegBuf, unsigned long jpegSize, int *width, int *height,

2561 + if (simd_support & JSIMD_ARM_NEON)	16737 + int *jpegSubsamp);

2562 + jsimd_idct_ifast_neon(compptr->dct_table, coef_block, output_buf,	16738

2563 + output_col);	16739 - Creates a new JPEG decompressor instance, allocates memory for the

2564 +}	16740 - structures, and returns a handle to the instance. Most applications will

2565 +	16741 - only need to call this once at the beginning of the program or once for each

2566 +GLOBAL(void)	16742 - concurrent thread. Don't try to create a new instance every time you

2567 +jsimd_idct_float (j_decompress_ptr cinfo, jpeg_component_info * compptr,	16743 - decompress an image, because this will cause performance to suffer.

2568 + JCOEFPTR coef_block, JSAMPARRAY output_buf,	16744

2569 + JDIMENSION output_col)	16745 - RETURNS: NULL on error

2570 +{	16746 +/**

2571 +}	16747 + * Returns a list of fractional scaling factors that the JPEG decompressor in

2572 Index: simd/jsimd_arm64_neon.S	16748 + * this implementation of TurboJPEG supports.

2573 new file mode 100644	16749 + *

	16750 + * @param numscalingfactors pointer to an integer variable that will receive

	16751 + * the number of elements in the list

	16752 + *

	16753 + * @return a pointer to a list of fractional scaling factors, or NULL if an

	16754 + * error is encountered (see #tjGetErrorStr().)

	16755 */

	16756 -DLLEXPORT tjhandle DLLCALL tjInitDecompress(void);

	16757 +DLLEXPORT tjscalingfactor* DLLCALL tjGetScalingFactors(int *numscalingfactors);

	16758

	16759

	16760 -/*

	16761 - int tjDecompressHeader(tjhandle j,

	16762 - unsigned char *srcbuf, unsigned long size,

	16763 - int *width, int *height)

	16764 +/**

	16765 + * Decompress a JPEG image to an RGB or grayscale image.

	16766 + *

	16767 + * @param handle a handle to a TurboJPEG decompressor or transformer instance

	16768 + * @param jpegBuf pointer to a buffer containing the JPEG image to decompress

	16769 + * @param jpegSize size of the JPEG image (in bytes)

	16770 + * @param dstBuf pointer to an image buffer that will receive the decompressed

	16771 + * image. This buffer should normally be <tt>pitch * scaledHeight</tt>

	16772 + * bytes in size, where <tt>scaledHeight</tt> can be determined by

	16773 + * calling #TJSCALED() with the JPEG image height and one of the scaling

	16774 + * factors returned by #tjGetScalingFactors(). The <tt>dstBuf</tt>

	16775 + * pointer may also be used to decompress into a specific region of a

	16776 + * larger buffer.

	16777 + * @param width desired width (in pixels) of the destination image. If this is

	16778 + * different than the width of the JPEG image being decompressed, then

	16779 + * TurboJPEG will use scaling in the JPEG decompressor to generate the

	16780 + * largest possible image that will fit within the desired width. If

	16781 + * <tt>width</tt> is set to 0, then only the height will be considered

	16782 + * when determining the scaled image size.

	16783 + * @param pitch bytes per line of the destination image. Normally, this is

	16784 + * <tt>scaledWidth * #tjPixelSize[pixelFormat]</tt> if the decompressed

	16785 + * image is unpadded, else <tt>#TJPAD(scaledWidth *

	16786 + * #tjPixelSize[pixelFormat])</tt> if each line of the decompressed

	16787 + * image is padded to the nearest 32-bit boundary, as is the case for

	16788 + * Windows bitmaps. (NOTE: <tt>scaledWidth</tt> can be determined by

	16789 + * calling #TJSCALED() with the JPEG image width and one of the scaling

	16790 + * factors returned by #tjGetScalingFactors().) You can also be clever

	16791 + * and use the pitch parameter to skip lines, etc. Setting this

	16792 + * parameter to 0 is the equivalent of setting it to <tt>scaledWidth

	16793 + * * #tjPixelSize[pixelFormat]</tt>.

	16794 + * @param height desired height (in pixels) of the destination image. If this

	16795 + * is different than the height of the JPEG image being decompressed,

	16796 + * then TurboJPEG will use scaling in the JPEG decompressor to generate

	16797 + * the largest possible image that will fit within the desired height.

	16798 + * If <tt>height</tt> is set to 0, then only the width will be

	16799 + * considered when determining the scaled image size.

	16800 + * @param pixelFormat pixel format of the destination image (see @ref

	16801 + * TJPF "Pixel formats".)

	16802 + * @param flags the bitwise OR of one or more of the @ref TJFLAG_BOTTOMUP

	16803 + * "flags".

	16804 + *

	16805 + * @return 0 if successful, or -1 if an error occurred (see #tjGetErrorStr().)

	16806 + */

	16807 +DLLEXPORT int DLLCALL tjDecompress2(tjhandle handle,

	16808 + unsigned char *jpegBuf, unsigned long jpegSize, unsigned char *dstBuf,

	16809 + int width, int pitch, int height, int pixelFormat, int flags);

	16810

	16811 - [INPUT] j = instance handle previously returned from a call to

	16812 - tjInitDecompress()

	16813 - [INPUT] srcbuf = pointer to a user-allocated buffer containing the JPEG image

	16814 - to decompress

	16815 - [INPUT] size = size of the JPEG image buffer (in bytes)

	16816 - [OUTPUT] width = width (in pixels) of the JPEG image

	16817 - [OUTPUT] height = height (in pixels) of the JPEG image

	16818

	16819 - RETURNS: 0 on success, -1 on error

	16820 -*/

	16821 -DLLEXPORT int DLLCALL tjDecompressHeader(tjhandle j,

	16822 - unsigned char *srcbuf, unsigned long size,

	16823 - int *width, int *height);

	16824 +/**

	16825 + * Decompress a JPEG image to a YUV planar image. This function performs JPEG

	16826 + * decompression but leaves out the color conversion step, so a planar YUV

	16827 + * image is generated instead of an RGB image. The padding of the planes in

	16828 + * this image is the same as in the images generated by #tjEncodeYUV2(). Note

	16829 + * that, if the width or height of the image is not an even multiple of the MCU

	16830 + * block size (see #tjMCUWidth and #tjMCUHeight), then an intermediate buffer

	16831 + * copy will be performed within TurboJPEG.

	16832 + * <p>

	16833 + * NOTE: Technically, the JPEG format uses the YCbCr colorspace, but per the

	16834 + * convention of the digital video community, the TurboJPEG API uses "YUV" to

	16835 + * refer to an image format consisting of Y, Cb, and Cr image planes.

	16836 + *

	16837 + * @param handle a handle to a TurboJPEG decompressor or transformer instance

	16838 + * @param jpegBuf pointer to a buffer containing the JPEG image to decompress

	16839 + * @param jpegSize size of the JPEG image (in bytes)

	16840 + * @param dstBuf pointer to an image buffer that will receive the YUV image.

	16841 + * Use #tjBufSizeYUV() to determine the appropriate size for this buffer

	16842 + * based on the image width, height, and level of subsampling.

	16843 + * @param flags the bitwise OR of one or more of the @ref TJFLAG_BOTTOMUP

	16844 + * "flags".

	16845 + *

	16846 + * @return 0 if successful, or -1 if an error occurred (see #tjGetErrorStr().)

	16847 + */

	16848 +DLLEXPORT int DLLCALL tjDecompressToYUV(tjhandle handle,

	16849 + unsigned char *jpegBuf, unsigned long jpegSize, unsigned char *dstBuf,

	16850 + int flags);

	16851

	16852

	16853 -/*

	16854 - int tjDecompress(tjhandle j,

	16855 - unsigned char *srcbuf, unsigned long size,

	16856 - unsigned char *dstbuf, int width, int pitch, int height, int pixelsize,

	16857 - int flags)

	16858 +/**

	16859 + * Create a new TurboJPEG transformer instance.

	16860 + *

	16861 + * @return a handle to the newly-created instance, or NULL if an error

	16862 + * occurred (see #tjGetErrorStr().)

	16863 + */

	16864 +DLLEXPORT tjhandle DLLCALL tjInitTransform(void);

	16865

	16866 - [INPUT] j = instance handle previously returned from a call to

	16867 - tjInitDecompress()

	16868 - [INPUT] srcbuf = pointer to a user-allocated buffer containing the JPEG image

	16869 - to decompress

	16870 - [INPUT] size = size of the JPEG image buffer (in bytes)

	16871 - [INPUT] dstbuf = pointer to user-allocated image buffer which will receive

	16872 - the bitmap image. This buffer should normally be pitch*height

	16873 - bytes in size, although this pointer may also be used to decompress into

	16874 - a specific region of a larger buffer.

	16875 - [INPUT] width = width (in pixels) of the destination image

	16876 - [INPUT] pitch = bytes per line of the destination image (width*pixelsize if the

	16877 - bitmap is unpadded, else TJPAD(width*pixelsize) if each line of the bitmap

	16878 - is padded to the nearest 32-bit boundary, such as is the case for Windows

	16879 - bitmaps. You can also be clever and use this parameter to skip lines, etc.,

	16880 - as long as the pitch is greater than 0.)

	16881 - [INPUT] height = height (in pixels) of the destination image

	16882 - [INPUT] pixelsize = size (in bytes) of each pixel in the destination image

	16883 - RGBA/RGBx and BGRA/BGRx: 4, RGB and BGR: 3

	16884 - [INPUT] flags = the bitwise OR of one or more of the following

	16885

	16886 - TJ_BGR: The components of each pixel in the destination image should be

	16887 - written in B,G,R order, not R,G,B

	16888 - TJ_BOTTOMUP: The destination image should be stored in bottom-up

	16889 - (Windows) order, not top-down

	16890 - TJ_FORCEMMX: Valid only for the Intel Performance Primitives implementation

	16891 - of this codec-- force IPP to use MMX code (bypass CPU auto-detection)

	16892 - TJ_FORCESSE: Valid only for the Intel Performance Primitives implementation

	16893 - of this codec-- force IPP to use SSE code (bypass CPU auto-detection)

	16894 - TJ_FORCESSE2: Valid only for the Intel Performance Primitives implementation

	16895 - of this codec-- force IPP to use SSE2 code (bypass CPU auto-detection)

	16896 +/**

	16897 + * Losslessly transform a JPEG image into another JPEG image. Lossless

	16898 + * transforms work by moving the raw coefficients from one JPEG image structure

	16899 + * to another without altering the values of the coefficients. While this is

	16900 + * typically faster than decompressing the image, transforming it, and

	16901 + * re-compressing it, lossless transforms are not free. Each lossless

	16902 + * transform requires reading and performing Huffman decoding on all of the

	16903 + * coefficients in the source image, regardless of the size of the destination

	16904 + * image. Thus, this function provides a means of generating multiple

	16905 + * transformed images from the same source or applying multiple

	16906 + * transformations simultaneously, in order to eliminate the need to read the

	16907 + * source coefficients multiple times.

	16908 + *

	16909 + * @param handle a handle to a TurboJPEG transformer instance

	16910 + * @param jpegBuf pointer to a buffer containing the JPEG image to transform

	16911 + * @param jpegSize size of the JPEG image (in bytes)

	16912 + * @param n the number of transformed JPEG images to generate

	16913 + * @param dstBufs pointer to an array of n image buffers. <tt>dstBufs[i]</tt>

	16914 + * will receive a JPEG image that has been transformed using the

	16915 + * parameters in <tt>transforms[i]</tt>. TurboJPEG has the ability to

	16916 + * reallocate the JPEG buffer to accommodate the size of the JPEG image.

	16917 + * Thus, you can choose to:

	16918 + * -# pre-allocate the JPEG buffer with an arbitrary size using

	16919 + * #tjAlloc() and let TurboJPEG grow the buffer as needed,

	16920 + * -# set <tt>dstBufs[i]</tt> to NULL to tell TurboJPEG to allocate the

	16921 + * buffer for you, or

	16922 + * -# pre-allocate the buffer to a "worst case" size determined by

	16923 + * calling #tjBufSize() with the transformed or cropped width and

	16924 + * height. This should ensure that the buffer never has to be

	16925 + * re-allocated (setting #TJFLAG_NOREALLOC guarantees this.)

	16926 + * .

	16927 + * If you choose option 1, <tt>dstSizes[i]</tt> should be set to

	16928 + * the size of your pre-allocated buffer. In any case, unless you have

	16929 + * set #TJFLAG_NOREALLOC, you should always check <tt>dstBufs[i]</tt>

	16930 + * upon return from this function, as it may have changed.

	16931 + * @param dstSizes pointer to an array of n unsigned long variables that will

	16932 + * receive the actual sizes (in bytes) of each transformed JPEG image.

	16933 + * If <tt>dstBufs[i]</tt> points to a pre-allocated buffer, then

	16934 + * <tt>dstSizes[i]</tt> should be set to the size of the buffer. Upon

	16935 + * return, <tt>dstSizes[i]</tt> will contain the size of the JPEG image

	16936 + * (in bytes.)

	16937 + * @param transforms pointer to an array of n #tjtransform structures, each of

	16938 + * which specifies the transform parameters and/or cropping region for

	16939 + * the corresponding transformed output image.

	16940 + * @param flags the bitwise OR of one or more of the @ref TJFLAG_BOTTOMUP

	16941 + * "flags".

	16942 + *

	16943 + * @return 0 if successful, or -1 if an error occurred (see #tjGetErrorStr().)

	16944 + */

	16945 +DLLEXPORT int DLLCALL tjTransform(tjhandle handle, unsigned char *jpegBuf,

	16946 + unsigned long jpegSize, int n, unsigned char **dstBufs,

	16947 + unsigned long *dstSizes, tjtransform *transforms, int flags);

	16948

	16949 - RETURNS: 0 on success, -1 on error

	16950 -*/

	16951 -DLLEXPORT int DLLCALL tjDecompress(tjhandle j,

	16952 - unsigned char *srcbuf, unsigned long size,

	16953 - unsigned char *dstbuf, int width, int pitch, int height, int pixelsize,

	16954 - int flags);

	16955

	16956 +/**

	16957 + * Destroy a TurboJPEG compressor, decompressor, or transformer instance.

	16958 + *

	16959 + * @param handle a handle to a TurboJPEG compressor, decompressor or

	16960 + * transformer instance

	16961 + *

	16962 + * @return 0 if successful, or -1 if an error occurred (see #tjGetErrorStr().)

	16963 + */

	16964 +DLLEXPORT int DLLCALL tjDestroy(tjhandle handle);

	16965

	16966 -/*

	16967 - int tjDestroy(tjhandle h)

	16968

	16969 - Frees structures associated with a compression or decompression instance

	16970 -

	16971 - [INPUT] h = instance handle (returned from a previous call to

	16972 - tjInitCompress() or tjInitDecompress()

	16973 +/**

	16974 + * Allocate an image buffer for use with TurboJPEG. You should always use

	16975 + * this function to allocate the JPEG destination buffer(s) for #tjCompress2()

	16976 + * and #tjTransform() unless you are disabling automatic buffer

	16977 + * (re)allocation (by setting #TJFLAG_NOREALLOC.)

	16978 + *

	16979 + * @param bytes the number of bytes to allocate

	16980 + *

	16981 + * @return a pointer to a newly-allocated buffer with the specified number of

	16982 + * bytes

	16983 + *

	16984 + * @sa tjFree()

	16985 + */

	16986 +DLLEXPORT unsigned char* DLLCALL tjAlloc(int bytes);

	16987

	16988 - RETURNS: 0 on success, -1 on error

	16989 -*/

	16990 -DLLEXPORT int DLLCALL tjDestroy(tjhandle h);

	16991

	16992 +/**

	16993 + * Free an image buffer previously allocated by TurboJPEG. You should always

	16994 + * use this function to free JPEG destination buffer(s) that were automatically

	16995 + * (re)allocated by #tjCompress2() or #tjTransform() or that were manually

	16996 + * allocated using #tjAlloc().

	16997 + *

	16998 + * @param buffer address of the buffer to free

	16999 + *

	17000 + * @sa tjAlloc()

	17001 + */

	17002 +DLLEXPORT void DLLCALL tjFree(unsigned char *buffer);

	17003

	17004 -/*

	17005 - char *tjGetErrorStr(void)

	17006 -

	17007 - Returns a descriptive error message explaining why the last command failed

	17008 -*/

	17009 +

	17010 +/**

	17011 + * Returns a descriptive error message explaining why the last command failed.

	17012 + *

	17013 + * @return a descriptive error message explaining why the last command failed.

	17014 + */

	17015 DLLEXPORT char* DLLCALL tjGetErrorStr(void);

	17016

	17017 +

	17018 +/* Backward compatibility functions and macros (nothing to see here) */

	17019 +#define NUMSUBOPT TJ_NUMSAMP

	17020 +#define TJ_444 TJSAMP_444

	17021 +#define TJ_422 TJSAMP_422

	17022 +#define TJ_420 TJSAMP_420

	17023 +#define TJ_411 TJSAMP_420

	17024 +#define TJ_GRAYSCALE TJSAMP_GRAY

	17025 +

	17026 +#define TJ_BGR 1

	17027 +#define TJ_BOTTOMUP TJFLAG_BOTTOMUP

	17028 +#define TJ_FORCEMMX TJFLAG_FORCEMMX

	17029 +#define TJ_FORCESSE TJFLAG_FORCESSE

	17030 +#define TJ_FORCESSE2 TJFLAG_FORCESSE2

	17031 +#define TJ_ALPHAFIRST 64

	17032 +#define TJ_FORCESSE3 TJFLAG_FORCESSE3

	17033 +#define TJ_FASTUPSAMPLE TJFLAG_FASTUPSAMPLE

	17034 +#define TJ_YUV 512

	17035 +

	17036 +DLLEXPORT unsigned long DLLCALL TJBUFSIZE(int width, int height);

	17037 +

	17038 +DLLEXPORT unsigned long DLLCALL TJBUFSIZEYUV(int width, int height,

	17039 + int jpegSubsamp);

	17040 +

	17041 +DLLEXPORT int DLLCALL tjCompress(tjhandle handle, unsigned char *srcBuf,

	17042 + int width, int pitch, int height, int pixelSize, unsigned char *dstBuf,

	17043 + unsigned long *compressedSize, int jpegSubsamp, int jpegQual, int flags);

	17044 +

	17045 +DLLEXPORT int DLLCALL tjEncodeYUV(tjhandle handle,

	17046 + unsigned char *srcBuf, int width, int pitch, int height, int pixelSize,

	17047 + unsigned char *dstBuf, int subsamp, int flags);

	17048 +

	17049 +DLLEXPORT int DLLCALL tjDecompressHeader(tjhandle handle,

	17050 + unsigned char *jpegBuf, unsigned long jpegSize, int *width, int *height);

	17051 +

	17052 +DLLEXPORT int DLLCALL tjDecompress(tjhandle handle,

	17053 + unsigned char *jpegBuf, unsigned long jpegSize, unsigned char *dstBuf,

	17054 + int width, int pitch, int height, int pixelSize, int flags);

	17055 +

	17056 +

	17057 +/**

	17058 + * @}

	17059 + */

	17060 +

	17061 #ifdef __cplusplus

	17062 }

	17063 #endif

	17064 +

	17065 +#endif
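The turbojpeg.h comments above describe how to size destination buffers: tjDecompress2() generates the largest scaled image that fits the requested width and height (a requested dimension of 0 is ignored) and expects roughly pitch * scaledHeight bytes, while tjEncodeYUV2() pads the I420 luminance plane to even dimensions and pads every line of every plane to 4 bytes. The following is a minimal, library-free sketch of that arithmetic, not TurboJPEG's own implementation. It assumes a factor list of 1/1, 1/2, 1/4, 1/8 ordered from largest to smallest (the real list comes from tjGetScalingFactors()). The names scaling_factor, scaled_dim, pad4, pick_scaling_factor, padded_dst_size and i420_size are hypothetical helpers; scaled_dim and pad4 are assumed to match the ceiling-division and 4-byte rounding that the comments attribute to TJSCALED() and TJPAD().

```c
/* Hypothetical stand-in for tjscalingfactor: the fraction num/denom. */
typedef struct { int num, denom; } scaling_factor;

/* Scaled dimension, rounded up (assumed to match TJSCALED()). */
static int scaled_dim(int dim, scaling_factor sf)
{
  return (dim * sf.num + sf.denom - 1) / sf.denom;
}

/* Round a byte count up to the next 4-byte (32-bit) boundary, like TJPAD(). */
static int pad4(int bytes)
{
  return (bytes + 3) & ~3;
}

/*
 * Return the index of the first (largest) factor whose scaled size fits
 * within desired_w x desired_h, or -1 if none fits.  A desired dimension
 * of 0 is treated as "unconstrained", per the tjDecompress2() comment.
 * Assumes factors[] is ordered from largest to smallest.
 */
static int pick_scaling_factor(int jpeg_w, int jpeg_h, int desired_w,
                               int desired_h, const scaling_factor *factors,
                               int n)
{
  if (desired_w == 0) desired_w = jpeg_w;
  if (desired_h == 0) desired_h = jpeg_h;
  for (int i = 0; i < n; i++) {
    if (scaled_dim(jpeg_w, factors[i]) <= desired_w &&
        scaled_dim(jpeg_h, factors[i]) <= desired_h)
      return i;
  }
  return -1;
}

/* pitch * scaledHeight, with each row padded to a 4-byte boundary. */
static unsigned long padded_dst_size(int scaled_w, int scaled_h,
                                     int pixel_size)
{
  return (unsigned long)pad4(scaled_w * pixel_size) * (unsigned long)scaled_h;
}

/*
 * I420 (4:2:0) planar size as described for tjEncodeYUV2(): luminance
 * width and height padded to a multiple of 2, chroma planes half size in
 * each direction, every line of every plane padded to 4 bytes.
 */
static unsigned long i420_size(int width, int height)
{
  int yw = (width + 1) & ~1, yh = (height + 1) & ~1;
  int cw = yw / 2, ch = yh / 2;
  return (unsigned long)pad4(yw) * yh + 2UL * pad4(cw) * ch;
}
```

With the assumed factors, a 1920x1080 JPEG requested at 800x600 scales by 1/4 to 480x270, giving a 1440-byte RGB pitch and a 388800-byte buffer, and a 640x480 I420 image needs 460800 bytes. In real code, prefer tjGetScalingFactors() and tjBufSizeYUV(), since the library's own rules take precedence over this sketch.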

	17066 Index: turbojpegl.c

2574 ===================================================================	17067 ===================================================================

2575 --- /dev/null	17068 --- turbojpegl.c (revision 829)

2576 +++ simd/jsimd_arm64_neon.S	17069 +++ turbojpegl.c (working copy)

2577 @@ -0,0 +1,1861 @@	17070 @@ -149,6 +149,10 @@

2578 +/*	17071 #error "TurboJPEG requires JPEG colorspace extensions"

2579 + * ARMv8 NEON optimizations for libjpeg-turbo	17072 #endif

2580 + *	17073

2581 + * Copyright (C) 2009-2011 Nokia Corporation and/or its subsidiary(-ies).	17074 + if(flags&TJ_FORCEMMX) putenv("JSIMD_FORCEMMX=1");

2582 + * All rights reserved.	17075 + else if(flags&TJ_FORCESSE) putenv("JSIMD_FORCESSE=1");

2583 + * Author: Siarhei Siamashka <siarhei.siamashka@nokia.com>	17076 + else if(flags&TJ_FORCESSE2) putenv("JSIMD_FORCESSE2=1");

2584 + * Copyright (C) 2013-2014, Linaro Limited	17077 +

2585 + * Author: Ragesh Radhakrishnan <ragesh.r@linaro.org>	17078 if(setjmp(j->jerr.jb))

2586 + *	17079 { // this will execute if LIBJPEG has an error

2587 + * This software is provided 'as-is', without any express or implied	17080 if(row_pointer) free(row_pointer);

2588 + * warranty. In no event will the authors be held liable for any damages	17081 @@ -188,7 +192,8 @@

2589 + * arising from the use of this software.	17082 j->cinfo.image_height-j->cinfo.next_scanline);

2590 + *	17083 }

2591 + * Permission is granted to anyone to use this software for any purpose,	17084 jpeg_finish_compress(&j->cinfo);

2592 + * including commercial applications, and to alter it and redistribute it	17085 - *size=TJBUFSIZE(j->cinfo.image_width, j->cinfo.image_height)-(j->jdms.free_in_buffer);

2593 + * freely, subject to the following restrictions:	17086 + *size=TJBUFSIZE(j->cinfo.image_width, j->cinfo.image_height)

2594 + *	17087 + -(unsigned long)(j->jdms.free_in_buffer);

2595 + * 1. The origin of this software must not be misrepresented; you must not	17088

2596 + * claim that you wrote the original software. If you use this software	17089 if(row_pointer) free(row_pointer);

2597 + * in a product, an acknowledgment in the product documentation would be	17090 return 0;

2598 + * appreciated but is not required.	17091 @@ -287,6 +292,10 @@

2599 + * 2. Altered source versions must be plainly marked as such, and must not be	17092

2600 + * misrepresented as being the original software.	17093 if(pitch==0) pitch=width*ps;

2601 + * 3. This notice may not be removed or altered from any source distribution.	17094

2602 + */	17095 + if(flags&TJ_FORCEMMX) putenv("JSIMD_FORCEMMX=1");

2603 +	17096 + else if(flags&TJ_FORCESSE) putenv("JSIMD_FORCESSE=1");

2604 +#if defined(__linux__) && defined(__ELF__)	17097 + else if(flags&TJ_FORCESSE2) putenv("JSIMD_FORCESSE2=1");

2605 +.section .note.GNU-stack,"",%progbits /* mark stack as non-executable */	17098 +

2606 +#endif	17099 if(setjmp(j->jerr.jb))

2607 +	17100 { // this will execute if LIBJPEG has an error

2608 +.text	17101 if(row_pointer) free(row_pointer);

2609 +.arch armv8-a+fp+simd	17102 Index: wrppm.c

2610 +	17103 ===================================================================

2611 +	17104 --- wrppm.c (revision 829)

2612 +#define RESPECT_STRICT_ALIGNMENT 1	17105 +++ wrppm.c (working copy)

2613 +	17106 @@ -2,6 +2,7 @@

2614 +	17107 * wrppm.c

2615 +/*****************************************************************************/	17108 *

2616 +	17109 * Copyright (C) 1991-1996, Thomas G. Lane.

2617 +/* Supplementary macro for setting function attributes */	17110 + * Modified 2009 by Guido Vollbeding.

2618 +.macro asm_function fname	17111 * This file is part of the Independent JPEG Group's software.

2619 +#ifdef __APPLE__	17112 * For conditions of distribution and use, see the accompanying README file.

2620 + .globl _\fname	17113 *

2621 +_\fname:	17114 @@ -40,11 +41,11 @@

2622 +#else	17115 #define BYTESPERSAMPLE 1

2623 + .global \fname	17116 #define PPM_MAXVAL 255

2624 +#ifdef __ELF__	17117 #else

2625 + .hidden \fname	17118 -/* The word-per-sample format always puts the LSB first. */

2626 + .type \fname, %function	17119 +/* The word-per-sample format always puts the MSB first. */

2627 +#endif	17120 #define PUTPPMSAMPLE(ptr,v) \

2628 +\fname:	17121 { register int val_ = v; \

2629 +#endif	17122 + *ptr++ = (char) ((val_ >> 8) & 0xFF); \

2630 +.endm	17123 *ptr++ = (char) (val_ & 0xFF); \

2631 +	17124 - *ptr++ = (char) ((val_ >> 8) & 0xFF); \

2632 +/* Transpose elements of single 128 bit registers */	17125 }

2633 +.macro transpose_single x0,x1,xi,xilen,literal	17126 #define BYTESPERSAMPLE 2

2634 + ins \xi\xilen[0], \x0\xilen[0]	17127 #define PPM_MAXVAL ((1<<BITS_IN_JSAMPLE)-1)

2635 + ins \x1\xilen[0], \x0\xilen[1]

2636 + trn1 \x0\literal, \x0\literal, \x1\literal

2637 + trn2 \x1\literal, \xi\literal, \x1\literal

2638 +.endm

2639 +

2640 +/* Transpose elements of 2 different registers */

2641 +.macro transpose x0,x1,xi,xilen,literal

2642 + mov \xi\xilen, \x0\xilen

2643 + trn1 \x0\literal, \x0\literal, \x1\literal

2644 + trn2 \x1\literal, \xi\literal, \x1\literal

2645 +.endm
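The `transpose` macro above swaps elements between two registers with a single TRN1/TRN2 pair. It needs the scratch register `\xi` because TRN2 must read the original contents of `\x0` after TRN1 has already overwritten it. The following C model of the two instructions on the `.4h` arrangement may help when reading the macro. The helper names (`trn1_4h`, `trn2_4h`, `transpose_pair_4h`) are hypothetical. The lane semantics follow the A64 definition: TRN1 interleaves the even-numbered lanes of both sources, and TRN2 interleaves the odd-numbered lanes.

```c
#include <stdint.h>

/* Hypothetical model of A64 TRN1 on .4h: even lanes of a and b, interleaved. */
static void trn1_4h(int16_t d[4], const int16_t a[4], const int16_t b[4])
{
  for (int i = 0; i < 2; i++) {
    d[2 * i]     = a[2 * i];
    d[2 * i + 1] = b[2 * i];
  }
}

/* Hypothetical model of A64 TRN2 on .4h: odd lanes of a and b, interleaved. */
static void trn2_4h(int16_t d[4], const int16_t a[4], const int16_t b[4])
{
  for (int i = 0; i < 2; i++) {
    d[2 * i]     = a[2 * i + 1];
    d[2 * i + 1] = b[2 * i + 1];
  }
}

/* Mirrors "transpose x0, x1, xi, ...": xi preserves the old x0 so that
 * TRN2 still sees it after TRN1 has overwritten x0 in place. */
static void transpose_pair_4h(int16_t x0[4], int16_t x1[4])
{
  int16_t xi[4];
  for (int i = 0; i < 4; i++)
    xi[i] = x0[i];            /* mov  xi, x0     */
  trn1_4h(x0, x0, x1);        /* trn1 x0, x0, x1 */
  trn2_4h(x1, xi, x1);        /* trn2 x1, xi, x1 */
}
```

With `x0 = {0, 1, 2, 3}` and `x1 = {10, 11, 12, 13}`, the pair becomes `{0, 10, 2, 12}` and `{1, 11, 3, 13}`. Each 2x2 group of lanes is transposed.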

2646 +

2647 +/* Transpose a block of 4x4 coefficients in four 64-bit registers */

2648 +.macro transpose_4x4_32 x0,x0len x1,x1len x2,x2len x3,x3len,xi,xilen

2649 + mov \xi\xilen, \x0\xilen

2650 + trn1 \x0\x0len, \x0\x0len, \x2\x2len

2651 + trn2 \x2\x2len, \xi\x0len, \x2\x2len

2652 + mov \xi\xilen, \x1\xilen

2653 + trn1 \x1\x1len, \x1\x1len, \x3\x3len

2654 + trn2 \x3\x3len, \xi\x1len, \x3\x3len

2655 +.endm

2656 +

2657 +.macro transpose_4x4_16 x0,x0len x1,x1len, x2,x2len, x3,x3len,xi,xilen

2658 + mov \xi\xilen, \x0\xilen

2659 + trn1 \x0\x0len, \x0\x0len, \x1\x1len

2660 + trn2 \x1\x2len, \xi\x0len, \x1\x2len

2661 + mov \xi\xilen, \x2\xilen

2662 + trn1 \x2\x2len, \x2\x2len, \x3\x3len

2663 + trn2 \x3\x2len, \xi\x1len, \x3\x3len

2664 +.endm

2665 +

2666 +.macro transpose_4x4 x0, x1, x2, x3,x5

2667 + transpose_4x4_16 \x0,.4h, \x1,.4h, \x2,.4h,\x3,.4h,\x5,.16b

2668 + transpose_4x4_32 \x0,.2s, \x1,.2s, \x2,.2s,\x3,.2s,\x5,.16b

2669 +.endm
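`transpose_4x4` composes two TRN stages. The first (`transpose_4x4_16`) works on 16-bit lanes of row pairs (0,1) and (2,3). The second (`transpose_4x4_32`) works on 32-bit lanes, that is, pairs of 16-bit values, of row pairs (0,2) and (1,3). The C sketch below reproduces that data movement on a plain 4x4 array to show that the result is a full transpose. The `trn` helper and its `w`/`odd` parameters are hypothetical conveniences, not part of the patch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical TRN model for a 4 x int16 register viewed as lanes of
 * w int16 elements (w = 1 for .4h, w = 2 for .2s).
 * odd = 0 models TRN1 (even lanes), odd = 1 models TRN2 (odd lanes). */
static void trn(int16_t d[4], const int16_t a[4], const int16_t b[4],
                int w, int odd)
{
  int16_t out[4];
  int lanes = 4 / w;
  for (int i = 0; i < lanes / 2; i++) {
    int src = (2 * i + odd) * w;
    memcpy(&out[(2 * i) * w],     &a[src], (size_t) w * sizeof(int16_t));
    memcpy(&out[(2 * i + 1) * w], &b[src], (size_t) w * sizeof(int16_t));
  }
  memcpy(d, out, sizeof(out));
}

/* Same data movement as transpose_4x4: rows r[0..3] become columns. */
static void transpose_4x4(int16_t r[4][4])
{
  int16_t t0[4], t1[4], t2[4], t3[4];
  /* transpose_4x4_16: 16-bit TRN on row pairs (0,1) and (2,3) */
  trn(t0, r[0], r[1], 1, 0);
  trn(t1, r[0], r[1], 1, 1);
  trn(t2, r[2], r[3], 1, 0);
  trn(t3, r[2], r[3], 1, 1);
  /* transpose_4x4_32: 32-bit TRN on row pairs (0,2) and (1,3) */
  trn(r[0], t0, t2, 2, 0);
  trn(r[2], t0, t2, 2, 1);
  trn(r[1], t1, t3, 2, 0);
  trn(r[3], t1, t3, 2, 1);
}
```

The 16-bit stage transposes each 2x2 sub-block. The 32-bit stage then swaps the off-diagonal 2x2 sub-blocks, and together the two stages complete the 4x4 transpose.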

2670 +

2671 +

2672 +#define CENTERJSAMPLE 128

2673 +

2674 +/*****************************************************************************/

2675 +

2676 +/*

2677 + * Perform dequantization and inverse DCT on one block of coefficients.

2678 + *

2679 + * GLOBAL(void)

2680 + * jsimd_idct_islow_neon (void * dct_table, JCOEFPTR coef_block,

2681 + * JSAMPARRAY output_buf, JDIMENSION output_col)

2682 + */

2683 +

2684 +#define FIX_0_298631336 (2446)

2685 +#define FIX_0_390180644 (3196)

2686 +#define FIX_0_541196100 (4433)

2687 +#define FIX_0_765366865 (6270)

2688 +#define FIX_0_899976223 (7373)

2689 +#define FIX_1_175875602 (9633)

2690 +#define FIX_1_501321110 (12299)

2691 +#define FIX_1_847759065 (15137)

2692 +#define FIX_1_961570560 (16069)

2693 +#define FIX_2_053119869 (16819)

2694 +#define FIX_2_562915447 (20995)

2695 +#define FIX_3_072711026 (25172)

2696 +

2697 +#define FIX_1_175875602_MINUS_1_961570560 (FIX_1_175875602 - FIX_1_961570560)

2698 +#define FIX_1_175875602_MINUS_0_390180644 (FIX_1_175875602 - FIX_0_390180644)

2699 +#define FIX_0_541196100_MINUS_1_847759065 (FIX_0_541196100 - FIX_1_847759065)

2700 +#define FIX_3_072711026_MINUS_2_562915447 (FIX_3_072711026 - FIX_2_562915447)

2701 +#define FIX_0_298631336_MINUS_0_899976223 (FIX_0_298631336 - FIX_0_899976223)

2702 +#define FIX_1_501321110_MINUS_0_899976223 (FIX_1_501321110 - FIX_0_899976223)

2703 +#define FIX_2_053119869_MINUS_2_562915447 (FIX_2_053119869 - FIX_2_562915447)

2704 +#define FIX_0_541196100_PLUS_0_765366865 (FIX_0_541196100 + FIX_0_765366865)
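The integer constants above are the named cosine factors in fixed point, scaled by 2^13. This matches `CONST_BITS = 13` in IJG's jidctint.c and the `<< 13` shifts in `REF_1D_IDCT`. The check below uses the `FIX()` rounding convention from IJG's jdct.h. `fits_in_s16` is a hypothetical helper showing that the pre-combined sums and differences still fit the 16-bit lanes used by `.short` and the lane-indexed `smull`/`smlal` operands.

```c
#include <stdint.h>

/* Fixed-point convention from IJG jdct.h / jidctint.c:
 * a real factor x is stored as round(x * 2^CONST_BITS). */
#define CONST_BITS 13
#define FIX(x) ((int32_t) ((x) * (1 << CONST_BITS) + 0.5))

/* Hypothetical helper: the NEON table stores every constant
 * (including the pre-combined ones) in a signed 16-bit lane. */
static int fits_in_s16(int32_t v)
{
  return v >= INT16_MIN && v <= INT16_MAX;
}
```

For example, `FIX(0.298631336)` evaluates to 2446, and `FIX_1_175875602_MINUS_1_961570560` is 9633 - 16069 = -6436, which is well within the 16-bit range.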

2705 +

2706 +/*

2707 + * Reference SIMD-friendly 1-D ISLOW iDCT C implementation.

2708 + * Uses some ideas from the comments in 'simd/jiss2int-64.asm'

2709 + */

2710 +#define REF_1D_IDCT(xrow0, xrow1, xrow2, xrow3, xrow4, xrow5, xrow6, xrow7) \

2711 +{ \

2712 + DCTELEM row0, row1, row2, row3, row4, row5, row6, row7; \

2713 + INT32 q1, q2, q3, q4, q5, q6, q7; \

2714 + INT32 tmp11_plus_tmp2, tmp11_minus_tmp2; \

2715 + \

2716 + /* 1-D iDCT input data */ \

2717 + row0 = xrow0; \

2718 + row1 = xrow1; \

2719 + row2 = xrow2; \

2720 + row3 = xrow3; \

2721 + row4 = xrow4; \

2722 + row5 = xrow5; \

2723 + row6 = xrow6; \

2724 + row7 = xrow7; \

2725 + \

2726 + q5 = row7 + row3; \

2727 + q4 = row5 + row1; \

2728 + q6 = MULTIPLY(q5, FIX_1_175875602_MINUS_1_961570560) + \

2729 + MULTIPLY(q4, FIX_1_175875602); \

2730 + q7 = MULTIPLY(q5, FIX_1_175875602) + \

2731 + MULTIPLY(q4, FIX_1_175875602_MINUS_0_390180644); \

2732 + q2 = MULTIPLY(row2, FIX_0_541196100) + \

2733 + MULTIPLY(row6, FIX_0_541196100_MINUS_1_847759065); \

2734 + q4 = q6; \

2735 + q3 = ((INT32) row0 - (INT32) row4) << 13; \

2736 + q6 += MULTIPLY(row5, -FIX_2_562915447) + \

2737 + MULTIPLY(row3, FIX_3_072711026_MINUS_2_562915447); \

2738 + /* now we can use q1 (reloadable constants have been used up) */ \

2739 + q1 = q3 + q2; \

2740 + q4 += MULTIPLY(row7, FIX_0_298631336_MINUS_0_899976223) + \

2741 + MULTIPLY(row1, -FIX_0_899976223); \

2742 + q5 = q7; \

2743 + q1 = q1 + q6; \

2744 + q7 += MULTIPLY(row7, -FIX_0_899976223) + \

2745 + MULTIPLY(row1, FIX_1_501321110_MINUS_0_899976223); \

2746 + \

2747 + /* (tmp11 + tmp2) has been calculated (out_row1 before descale) */ \

2748 + tmp11_plus_tmp2 = q1; \

2749 + row1 = 0; \

2750 + \

2751 + q1 = q1 - q6; \

2752 + q5 += MULTIPLY(row5, FIX_2_053119869_MINUS_2_562915447) + \

2753 + MULTIPLY(row3, -FIX_2_562915447); \

2754 + q1 = q1 - q6; \

2755 + q6 = MULTIPLY(row2, FIX_0_541196100_PLUS_0_765366865) + \

2756 + MULTIPLY(row6, FIX_0_541196100); \

2757 + q3 = q3 - q2; \

2758 + \

2759 + /* (tmp11 - tmp2) has been calculated (out_row6 before descale) */ \

2760 + tmp11_minus_tmp2 = q1; \

2761 + \

2762 + q1 = ((INT32) row0 + (INT32) row4) << 13; \

2763 + q2 = q1 + q6; \

2764 + q1 = q1 - q6; \

2765 + \

2766 + /* pick up the results */ \

2767 + tmp0 = q4; \

2768 + tmp1 = q5; \

2769 + tmp2 = (tmp11_plus_tmp2 - tmp11_minus_tmp2) / 2; \

2770 + tmp3 = q7; \

2771 + tmp10 = q2; \

2772 + tmp11 = (tmp11_plus_tmp2 + tmp11_minus_tmp2) / 2; \

2773 + tmp12 = q3; \

2774 + tmp13 = q1; \

2775 +}

2776 +

2777 +#define XFIX_0_899976223 v0.4h[0]

2778 +#define XFIX_0_541196100 v0.4h[1]

2779 +#define XFIX_2_562915447 v0.4h[2]

2780 +#define XFIX_0_298631336_MINUS_0_899976223 v0.4h[3]

2781 +#define XFIX_1_501321110_MINUS_0_899976223 v1.4h[0]

2782 +#define XFIX_2_053119869_MINUS_2_562915447 v1.4h[1]

2783 +#define XFIX_0_541196100_PLUS_0_765366865 v1.4h[2]

2784 +#define XFIX_1_175875602 v1.4h[3]

2785 +#define XFIX_1_175875602_MINUS_0_390180644 v2.4h[0]

2786 +#define XFIX_0_541196100_MINUS_1_847759065 v2.4h[1]

2787 +#define XFIX_3_072711026_MINUS_2_562915447 v2.4h[2]

2788 +#define XFIX_1_175875602_MINUS_1_961570560 v2.4h[3]

2789 +

2790 +.balign 16

2791 +jsimd_idct_islow_neon_consts:

2792 + .short FIX_0_899976223 /* d0[0] */

2793 + .short FIX_0_541196100 /* d0[1] */

2794 + .short FIX_2_562915447 /* d0[2] */

2795 + .short FIX_0_298631336_MINUS_0_899976223 /* d0[3] */

2796 + .short FIX_1_501321110_MINUS_0_899976223 /* d1[0] */

2797 + .short FIX_2_053119869_MINUS_2_562915447 /* d1[1] */

2798 + .short FIX_0_541196100_PLUS_0_765366865 /* d1[2] */

2799 + .short FIX_1_175875602 /* d1[3] */

2800 + /* reloadable constants */

2801 + .short FIX_1_175875602_MINUS_0_390180644 /* d2[0] */

2802 + .short FIX_0_541196100_MINUS_1_847759065 /* d2[1] */

2803 + .short FIX_3_072711026_MINUS_2_562915447 /* d2[2] */

2804 + .short FIX_1_175875602_MINUS_1_961570560 /* d2[3] */

2805 +

2806 +asm_function jsimd_idct_islow_neon

2807 +

2808 + DCT_TABLE .req x0

2809 + COEF_BLOCK .req x1

2810 + OUTPUT_BUF .req x2

2811 + OUTPUT_COL .req x3

2812 + TMP1 .req x0

2813 + TMP2 .req x1

2814 + TMP3 .req x2

2815 + TMP4 .req x15

2816 +

2817 + ROW0L .req v16

2818 + ROW0R .req v17

2819 + ROW1L .req v18

2820 + ROW1R .req v19

2821 + ROW2L .req v20

2822 + ROW2R .req v21

2823 + ROW3L .req v22

2824 + ROW3R .req v23

2825 + ROW4L .req v24

2826 + ROW4R .req v25

2827 + ROW5L .req v26

2828 + ROW5R .req v27

2829 + ROW6L .req v28

2830 + ROW6R .req v29

2831 + ROW7L .req v30

2832 + ROW7R .req v31

2833 + /* Save all NEON registers and x15 (32 NEON registers * 8 bytes + 16) */

2834 + sub sp, sp, 272

2835 + str x15, [sp], 16

2836 + adr x15, jsimd_idct_islow_neon_consts

2837 + st1 {v0.8b - v3.8b}, [sp], 32

2838 + st1 {v4.8b - v7.8b}, [sp], 32

2839 + st1 {v8.8b - v11.8b}, [sp], 32

2840 + st1 {v12.8b - v15.8b}, [sp], 32

2841 + st1 {v16.8b - v19.8b}, [sp], 32

2842 + st1 {v20.8b - v23.8b}, [sp], 32

2843 + st1 {v24.8b - v27.8b}, [sp], 32

2844 + st1 {v28.8b - v31.8b}, [sp], 32

2845 + ld1 {v16.4h, v17.4h, v18.4h, v19.4h}, [COEF_BLOCK], 32

2846 + ld1 {v0.4h, v1.4h, v2.4h, v3.4h}, [DCT_TABLE], 32

2847 + ld1 {v20.4h, v21.4h, v22.4h, v23.4h}, [COEF_BLOCK], 32

2848 + mul v16.4h, v16.4h, v0.4h

2849 + mul v17.4h, v17.4h, v1.4h

2850 + ins v16.2d[1], v17.2d[0] /* 128 bit q8 */

2851 + ld1 {v4.4h, v5.4h, v6.4h, v7.4h}, [DCT_TABLE], 32

2852 + mul v18.4h, v18.4h, v2.4h

2853 + mul v19.4h, v19.4h, v3.4h

2854 + ins v18.2d[1], v19.2d[0] /* 128 bit q9 */

2855 + ld1 {v24.4h, v25.4h, v26.4h, v27.4h}, [COEF_BLOCK], 32

2856 + mul v20.4h, v20.4h, v4.4h

2857 + mul v21.4h, v21.4h, v5.4h

2858 + ins v20.2d[1], v21.2d[0] /* 128 bit q10 */

2859 + ld1 {v0.4h, v1.4h, v2.4h, v3.4h}, [DCT_TABLE], 32

2860 + mul v22.4h, v22.4h, v6.4h

2861 + mul v23.4h, v23.4h, v7.4h

2862 + ins v22.2d[1], v23.2d[0] /* 128 bit q11 */

2863 + ld1 {v28.4h, v29.4h, v30.4h, v31.4h}, [COEF_BLOCK]

2864 + mul v24.4h, v24.4h, v0.4h

2865 + mul v25.4h, v25.4h, v1.4h

2866 + ins v24.2d[1], v25.2d[0] /* 128 bit q12 */

2867 + ld1 {v4.4h, v5.4h, v6.4h, v7.4h}, [DCT_TABLE], 32

2868 + mul v28.4h, v28.4h, v4.4h

2869 + mul v29.4h, v29.4h, v5.4h

2870 + ins v28.2d[1], v29.2d[0] /* 128 bit q14 */

2871 + mul v26.4h, v26.4h, v2.4h

2872 + mul v27.4h, v27.4h, v3.4h

2873 + ins v26.2d[1], v27.2d[0] /* 128 bit q13 */

2874 + ld1 {v0.4h, v1.4h, v2.4h, v3.4h}, [x15] /* load constants */

2875 + add x15, x15, #16

2876 + mul v30.4h, v30.4h, v6.4h

2877 + mul v31.4h, v31.4h, v7.4h

2878 + ins v30.2d[1], v31.2d[0] /* 128 bit q15 */
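The interleaved `ld1`/`mul` sequence above performs dequantization. It multiplies each of the 64 coefficients lane-wise by the matching entry loaded from `DCT_TABLE`: each `ld1` of four `.4h` registers covers 16 consecutive entries. `MUL` on `.4h` lanes keeps only the low 16 bits of each product. The following C sketch models that step under this reading. `dequantize_block` is a hypothetical name, not a libjpeg-turbo function.

```c
#include <stdint.h>

#define DCTSIZE2 64

/* Hypothetical model of the ld1/mul dequantization: out[i] is the low
 * 16 bits of coef[i] * quant[i], as produced by MUL on .4h lanes. */
static void dequantize_block(int16_t out[DCTSIZE2],
                             const int16_t coef[DCTSIZE2],
                             const int16_t quant[DCTSIZE2])
{
  for (int i = 0; i < DCTSIZE2; i++) {
    int32_t p = (int32_t) coef[i] * quant[i];   /* exact 32-bit product */
    /* Sign-extend the low 16 bits (wraps like the 16-bit MUL). */
    out[i] = (int16_t) (((p & 0xFFFF) ^ 0x8000) - 0x8000);
  }
}
```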

2879 + /* Go to the bottom of the stack */

2880 + sub sp, sp, 352

2881 + stp x4, x5, [sp], 16

2882 + st1 {v8.4h - v11.4h}, [sp], 32 /* save NEON registers */

2883 + st1 {v12.4h - v15.4h}, [sp], 32

2884 + /* 1-D IDCT, pass 1, left 4x8 half */

2885 + add v4.4h, ROW7L.4h, ROW3L.4h

2886 + add v5.4h, ROW5L.4h, ROW1L.4h

2887 + smull v12.4s, v4.4h, XFIX_1_175875602_MINUS_1_961570560

2888 + smlal v12.4s, v5.4h, XFIX_1_175875602

2889 + smull v14.4s, v4.4h, XFIX_1_175875602

2890 + /* Check for the zero coefficients in the right 4x8 half */

2891 + smlal v14.4s, v5.4h, XFIX_1_175875602_MINUS_0_390180644

2892 + ssubl v6.4s, ROW0L.4h, ROW4L.4h

2893 + ldp w4, w5, [COEF_BLOCK, #(-96 + 2 * (4 + 1 * 8))]

2894 + smull v4.4s, ROW2L.4h, XFIX_0_541196100

2895 + smlal v4.4s, ROW6L.4h, XFIX_0_541196100_MINUS_1_847759065

2896 + orr x0, x4, x5

2897 + mov v8.16b, v12.16b

2898 + smlsl v12.4s, ROW5L.4h, XFIX_2_562915447

2899 + ldp w4, w5, [COEF_BLOCK, #(-96 + 2 * (4 + 2 * 8))]

2900 + smlal v12.4s, ROW3L.4h, XFIX_3_072711026_MINUS_2_562915447

2901 + shl v6.4s, v6.4s, #13

2902 + orr x0, x0, x4

2903 + smlsl v8.4s, ROW1L.4h, XFIX_0_899976223

2904 + orr x0, x0 , x5

2905 + add v2.4s, v6.4s, v4.4s

2906 + ldp w4, w5, [COEF_BLOCK, #(-96 + 2 * (4 + 3 * 8))]

2907 + mov v10.16b, v14.16b

2908 + add v2.4s, v2.4s, v12.4s

2909 + orr x0, x0, x4

2910 + smlsl v14.4s, ROW7L.4h, XFIX_0_899976223

2911 + orr x0, x0, x5

2912 + smlal v14.4s, ROW1L.4h, XFIX_1_501321110_MINUS_0_899976223

2913 + rshrn ROW1L.4h, v2.4s, #11

2914 + ldp w4, w5, [COEF_BLOCK, #(-96 + 2 * (4 + 4 * 8))]

2915 + sub v2.4s, v2.4s, v12.4s

2916 + smlal v10.4s, ROW5L.4h, XFIX_2_053119869_MINUS_2_562915447

2917 + orr x0, x0, x4

2918 + smlsl v10.4s, ROW3L.4h, XFIX_2_562915447

2919 + orr x0, x0, x5

2920 + sub v2.4s, v2.4s, v12.4s

2921 + smull v12.4s, ROW2L.4h, XFIX_0_541196100_PLUS_0_765366865

2922 + ldp w4, w5, [COEF_BLOCK, #(-96 + 2 * (4 + 5 * 8))]

2923 + smlal v12.4s, ROW6L.4h, XFIX_0_541196100

2924 + sub v6.4s, v6.4s, v4.4s

2925 + orr x0, x0, x4

2926 + rshrn ROW6L.4h, v2.4s, #11

2927 + orr x0, x0, x5

2928 + add v2.4s, v6.4s, v10.4s

2929 + ldp w4, w5, [COEF_BLOCK, #(-96 + 2 * (4 + 6 * 8))]

2930 + sub v6.4s, v6.4s, v10.4s

2931 + saddl v10.4s, ROW0L.4h, ROW4L.4h

2932 + orr x0, x0, x4

2933 + rshrn ROW2L.4h, v2.4s, #11

2934 + orr x0, x0, x5

2935 + rshrn ROW5L.4h, v6.4s, #11

2936 + ldp w4, w5, [COEF_BLOCK, #(-96 + 2 * (4 + 7 * 8))]

2937 + shl v10.4s, v10.4s, #13

2938 + smlal v8.4s, ROW7L.4h, XFIX_0_298631336_MINUS_0_899976223

2939 + orr x0, x0, x4

2940 + add v4.4s, v10.4s, v12.4s

2941 + orr x0, x0, x5

2942 + cmp x0, #0 /* orrs instruction removed */

2943 + sub v2.4s, v10.4s, v12.4s

2944 + add v12.4s, v4.4s, v14.4s

2945 + ldp w4, w5, [COEF_BLOCK, #(-96 + 2 * (4 + 0 * 8))]

2946 + sub v4.4s, v4.4s, v14.4s

2947 + add v10.4s, v2.4s, v8.4s

2948 + orr x0, x4, x5

2949 + sub v6.4s, v2.4s, v8.4s

2950 + /* pop {x4, x5} */

2951 + sub sp, sp, 80

2952 + ldp x4, x5, [sp], 16

2953 + rshrn ROW7L.4h, v4.4s, #11

2954 + rshrn ROW3L.4h, v10.4s, #11

2955 + rshrn ROW0L.4h, v12.4s, #11

2956 + rshrn ROW4L.4h, v6.4s, #11

2957 +

2958 + beq 3f /* Go to do some special handling for the sparse right 4x8 half */

2959 +

2960 + /* 1-D IDCT, pass 1, right 4x8 half */

2961 + ld1 {v2.4h}, [x15] /* reload constants */

2962 + add v10.4h, ROW7R.4h, ROW3R.4h

2963 + add v8.4h, ROW5R.4h, ROW1R.4h

2964 + /* Transpose ROW6L <-> ROW7L (v3 available free register) */

2965 + transpose ROW6L, ROW7L, v3, .16b, .4h

2966 + smull v12.4s, v10.4h, XFIX_1_175875602_MINUS_1_961570560

2967 + smlal v12.4s, v8.4h, XFIX_1_175875602

2968 + /* Transpose ROW2L <-> ROW3L (v3 available free register) */

2969 + transpose ROW2L, ROW3L, v3, .16b, .4h

2970 + smull v14.4s, v10.4h, XFIX_1_175875602

2971 + smlal v14.4s, v8.4h, XFIX_1_175875602_MINUS_0_390180644

2972 + /* Transpose ROW0L <-> ROW1L (v3 available free register) */

2973 + transpose ROW0L, ROW1L, v3, .16b, .4h

2974 + ssubl v6.4s, ROW0R.4h, ROW4R.4h

2975 + smull v4.4s, ROW2R.4h, XFIX_0_541196100

2976 + smlal v4.4s, ROW6R.4h, XFIX_0_541196100_MINUS_1_847759065

2977 + /* Transpose ROW4L <-> ROW5L (v3 available free register) */

2978 + transpose ROW4L, ROW5L, v3, .16b, .4h

2979 + mov v8.16b, v12.16b

2980 + smlsl v12.4s, ROW5R.4h, XFIX_2_562915447

2981 + smlal v12.4s, ROW3R.4h, XFIX_3_072711026_MINUS_2_562915447

2982 + /* Transpose ROW1L <-> ROW3L (v3 available free register) */

2983 + transpose ROW1L, ROW3L, v3, .16b, .2s

2984 + shl v6.4s, v6.4s, #13

2985 + smlsl v8.4s, ROW1R.4h, XFIX_0_899976223

2986 + /* Transpose ROW4L <-> ROW6L (v3 available free register) */

2987 + transpose ROW4L, ROW6L, v3, .16b, .2s

2988 + add v2.4s, v6.4s, v4.4s

2989 + mov v10.16b, v14.16b

2990 + add v2.4s, v2.4s, v12.4s

2991 + /* Transpose ROW0L <-> ROW2L (v3 available free register) */

2992 + transpose ROW0L, ROW2L, v3, .16b, .2s

2993 + smlsl v14.4s, ROW7R.4h, XFIX_0_899976223

2994 + smlal v14.4s, ROW1R.4h, XFIX_1_501321110_MINUS_0_899976223

2995 + rshrn ROW1R.4h, v2.4s, #11

2996 + /* Transpose ROW5L <-> ROW7L (v3 available free register) */

2997 + transpose ROW5L, ROW7L, v3, .16b, .2s

2998 + sub v2.4s, v2.4s, v12.4s

2999 + smlal v10.4s, ROW5R.4h, XFIX_2_053119869_MINUS_2_562915447

3000 + smlsl v10.4s, ROW3R.4h, XFIX_2_562915447

3001 + sub v2.4s, v2.4s, v12.4s

3002 + smull v12.4s, ROW2R.4h, XFIX_0_541196100_PLUS_0_765366865

3003 + smlal v12.4s, ROW6R.4h, XFIX_0_541196100

3004 + sub v6.4s, v6.4s, v4.4s

3005 + rshrn ROW6R.4h, v2.4s, #11

3006 + add v2.4s, v6.4s, v10.4s

3007 + sub v6.4s, v6.4s, v10.4s

3008 + saddl v10.4s, ROW0R.4h, ROW4R.4h

3009 + rshrn ROW2R.4h, v2.4s, #11

3010 + rshrn ROW5R.4h, v6.4s, #11

3011 + shl v10.4s, v10.4s, #13

3012 + smlal v8.4s, ROW7R.4h, XFIX_0_298631336_MINUS_0_899976223

3013 + add v4.4s, v10.4s, v12.4s

3014 + sub v2.4s, v10.4s, v12.4s

3015 + add v12.4s, v4.4s, v14.4s

3016 + sub v4.4s, v4.4s, v14.4s

3017 + add v10.4s, v2.4s, v8.4s

3018 + sub v6.4s, v2.4s, v8.4s

3019 + rshrn ROW7R.4h, v4.4s, #11

3020 + rshrn ROW3R.4h, v10.4s, #11

3021 + rshrn ROW0R.4h, v12.4s, #11

3022 + rshrn ROW4R.4h, v6.4s, #11

3023 + /* Transpose right 4x8 half */

3024 + transpose ROW6R, ROW7R, v3, .16b, .4h

3025 + transpose ROW2R, ROW3R, v3, .16b, .4h

3026 + transpose ROW0R, ROW1R, v3, .16b, .4h

3027 + transpose ROW4R, ROW5R, v3, .16b, .4h

3028 + transpose ROW1R, ROW3R, v3, .16b, .2s

3029 + transpose ROW4R, ROW6R, v3, .16b, .2s

3030 + transpose ROW0R, ROW2R, v3, .16b, .2s

3031 + transpose ROW5R, ROW7R, v3, .16b, .2s

3032 +

3033 +1: /* 1-D IDCT, pass 2 (normal variant), left 4x8 half */

3034 + ld1 {v2.4h}, [x15] /* reload constants */

3035 + smull v12.4S, ROW1R.4h, XFIX_1_175875602 /* ROW5L.4h <-> ROW1R.4h */

3036 + smlal v12.4s, ROW1L.4h, XFIX_1_175875602

3037 + smlal v12.4s, ROW3R.4h, XFIX_1_175875602_MINUS_1_961570560 /* ROW7L.4h <-> ROW3R.4h */

3038 + smlal v12.4s, ROW3L.4h, XFIX_1_175875602_MINUS_1_961570560

3039 + smull v14.4s, ROW3R.4h, XFIX_1_175875602 /* ROW7L.4h <-> ROW3R.4h */

3040 + smlal v14.4s, ROW3L.4h, XFIX_1_175875602

3041 + smlal v14.4s, ROW1R.4h, XFIX_1_175875602_MINUS_0_390180644 /* ROW5L.4h <-> ROW1R.4h */

3042 + smlal v14.4s, ROW1L.4h, XFIX_1_175875602_MINUS_0_390180644

3043 + ssubl v6.4s, ROW0L.4h, ROW0R.4h /* ROW4L.4h <-> ROW0R.4h */

3044 + smull v4.4s, ROW2L.4h, XFIX_0_541196100

3045 + smlal v4.4s, ROW2R.4h, XFIX_0_541196100_MINUS_1_847759065 /* ROW6L.4h <-> ROW2R.4h */

3046 + mov v8.16b, v12.16b

3047 + smlsl v12.4s, ROW1R.4h, XFIX_2_562915447 /* ROW5L.4h <-> ROW1R.4h */

3048 + smlal v12.4s, ROW3L.4h, XFIX_3_072711026_MINUS_2_562915447

3049 + shl v6.4s, v6.4s, #13

3050 + smlsl v8.4s, ROW1L.4h, XFIX_0_899976223

3051 + add v2.4s, v6.4s, v4.4s

3052 + mov v10.16b, v14.16b

3053 + add v2.4s, v2.4s, v12.4s

3054 + smlsl v14.4s, ROW3R.4h, XFIX_0_899976223 /* ROW7L.4h <-> ROW3R.4h */

3055 + smlal v14.4s, ROW1L.4h, XFIX_1_501321110_MINUS_0_899976223

3056 + shrn ROW1L.4h, v2.4s, #16

3057 + sub v2.4s, v2.4s, v12.4s

3058 + smlal v10.4s, ROW1R.4h, XFIX_2_053119869_MINUS_2_562915447 /* ROW5L.4h <-> ROW1R.4h */

3059 + smlsl v10.4s, ROW3L.4h, XFIX_2_562915447

3060 + sub v2.4s, v2.4s, v12.4s

3061 + smull v12.4s, ROW2L.4h, XFIX_0_541196100_PLUS_0_765366865

3062 + smlal v12.4s, ROW2R.4h, XFIX_0_541196100 /* ROW6L.4h <-> ROW2R.4h */

3063 + sub v6.4s, v6.4s, v4.4s

3064 + shrn ROW2R.4h, v2.4s, #16 /* ROW6L.4h <-> ROW2R.4h */

3065 + add v2.4s, v6.4s, v10.4s

3066 + sub v6.4s, v6.4s, v10.4s

3067 + saddl v10.4s, ROW0L.4h, ROW0R.4h /* ROW4L.4h <-> ROW0R.4h */

3068 + shrn ROW2L.4h, v2.4s, #16

3069 + shrn ROW1R.4h, v6.4s, #16 /* ROW5L.4h <-> ROW1R.4h */

3070 + shl v10.4s, v10.4s, #13

3071 + smlal v8.4s, ROW3R.4h, XFIX_0_298631336_MINUS_0_899976223 /* ROW7L.4h <-> ROW3R.4h */

3072 + add v4.4s, v10.4s, v12.4s

3073 + sub v2.4s, v10.4s, v12.4s

3074 + add v12.4s, v4.4s, v14.4s

3075 + sub v4.4s, v4.4s, v14.4s

3076 + add v10.4s, v2.4s, v8.4s

3077 + sub v6.4s, v2.4s, v8.4s

3078 + shrn ROW3R.4h, v4.4s, #16 /* ROW7L.4h <-> ROW3R.4h */

3079 + shrn ROW3L.4h, v10.4s, #16

3080 + shrn ROW0L.4h, v12.4s, #16

3081 + shrn ROW0R.4h, v6.4s, #16 /* ROW4L.4h <-> ROW0R.4h */

3082 + /* 1-D IDCT, pass 2, right 4x8 half */

3083 + ld1 {v2.4h}, [x15] /* reload constants */

3084 + smull v12.4s, ROW5R.4h, XFIX_1_175875602

3085 + smlal v12.4s, ROW5L.4h, XFIX_1_175875602 /* ROW5L.4h <-> ROW1R.4h */

3086 + smlal v12.4s, ROW7R.4h, XFIX_1_175875602_MINUS_1_961570560

3087 + smlal v12.4s, ROW7L.4h, XFIX_1_175875602_MINUS_1_961570560 /* ROW7L.4h <-> ROW3R.4h */

3088 + smull v14.4s, ROW7R.4h, XFIX_1_175875602

3089 + smlal v14.4s, ROW7L.4h, XFIX_1_175875602 /* ROW7L.4h <-> ROW3R.4h */

3090 + smlal v14.4s, ROW5R.4h, XFIX_1_175875602_MINUS_0_390180644

3091 + smlal v14.4s, ROW5L.4h, XFIX_1_175875602_MINUS_0_390180644 /* ROW5L.4h <-> ROW1R.4h */

3092 + ssubl v6.4s, ROW4L.4h, ROW4R.4h /* ROW4L.4h <-> ROW0R.4h */

3093 + smull v4.4s, ROW6L.4h, XFIX_0_541196100 /* ROW6L.4h <-> ROW2R.4h */

3094 + smlal v4.4s, ROW6R.4h, XFIX_0_541196100_MINUS_1_847759065

3095 + mov v8.16b, v12.16b

3096 + smlsl v12.4s, ROW5R.4h, XFIX_2_562915447

3097 + smlal v12.4s, ROW7L.4h, XFIX_3_072711026_MINUS_2_562915447 /* ROW7L.4h <-> ROW3R.4h */

3098 + shl v6.4s, v6.4s, #13

3099 + smlsl v8.4s, ROW5L.4h, XFIX_0_899976223 /* ROW5L.4h <-> ROW1R.4h */

3100 + add v2.4s, v6.4s, v4.4s

3101 + mov v10.16b, v14.16b

3102 + add v2.4s, v2.4s, v12.4s

3103 + smlsl v14.4s, ROW7R.4h, XFIX_0_899976223

3104 + smlal v14.4s, ROW5L.4h, XFIX_1_501321110_MINUS_0_899976223 /* ROW5L.4h <-> ROW1R.4h */

3105 + shrn ROW5L.4h, v2.4s, #16 /* ROW5L.4h <-> ROW1R.4h */

3106 + sub v2.4s, v2.4s, v12.4s

3107 + smlal v10.4s, ROW5R.4h, XFIX_2_053119869_MINUS_2_562915447

3108 + smlsl v10.4s, ROW7L.4h, XFIX_2_562915447 /* ROW7L.4h <-> ROW3R.4h */

3109 + sub v2.4s, v2.4s, v12.4s

3110 + smull v12.4s, ROW6L.4h, XFIX_0_541196100_PLUS_0_765366865 /* ROW6L.4h <-> ROW2R.4h */

3111 + smlal v12.4s, ROW6R.4h, XFIX_0_541196100

3112 + sub v6.4s, v6.4s, v4.4s

3113 + shrn ROW6R.4h, v2.4s, #16

3114 + add v2.4s, v6.4s, v10.4s

3115 + sub v6.4s, v6.4s, v10.4s

3116 + saddl v10.4s, ROW4L.4h, ROW4R.4h /* ROW4L.4h <-> ROW0R.4h */

3117 + shrn ROW6L.4h, v2.4s, #16 /* ROW6L.4h <-> ROW2R.4h */

3118 + shrn ROW5R.4h, v6.4s, #16

3119 + shl v10.4s, v10.4s, #13

3120 + smlal v8.4s, ROW7R.4h, XFIX_0_298631336_MINUS_0_899976223

3121 + add v4.4s, v10.4s, v12.4s

3122 + sub v2.4s, v10.4s, v12.4s

3123 + add v12.4s, v4.4s, v14.4s

3124 + sub v4.4s, v4.4s, v14.4s

3125 + add v10.4s, v2.4s, v8.4s

3126 + sub v6.4s, v2.4s, v8.4s

3127 + shrn ROW7R.4h, v4.4s, #16

3128 + shrn ROW7L.4h, v10.4s, #16 /* ROW7L.4h <-> ROW3R.4h */

3129 + shrn ROW4L.4h, v12.4s, #16 /* ROW4L.4h <-> ROW0R.4h */

3130 + shrn ROW4R.4h, v6.4s, #16

3131 +

3132 +2: /* Descale to 8-bit and range limit */

3133 + ins v16.2d[1], v17.2d[0]

3134 + ins v18.2d[1], v19.2d[0]

3135 + ins v20.2d[1], v21.2d[0]

3136 + ins v22.2d[1], v23.2d[0]

3137 + sqrshrn v16.8b, v16.8h, #2

3138 + sqrshrn2 v16.16b, v18.8h, #2

3139 + sqrshrn v18.8b, v20.8h, #2

3140 + sqrshrn2 v18.16b, v22.8h, #2

3141 +

3142 + /* vpop {v8.4h - d15.4h} // restore NEON registers */

3143 + ld1 {v8.4h - v11.4h}, [sp], 32

3144 + ld1 {v12.4h - v15.4h}, [sp], 32

3145 + ins v24.2d[1], v25.2d[0]

3146 +

3147 + sqrshrn v20.8b, v24.8h, #2

3148 + /* Transpose the final 8-bit samples and do signed->unsigned conversion */

3149 + /* trn1 v16.8h, v16.8h, v18.8h */

3150 + transpose v16, v18, v3, .16b, .8h

3151 + ins v26.2d[1], v27.2d[0]

3152 + ins v28.2d[1], v29.2d[0]

3153 + ins v30.2d[1], v31.2d[0]

3154 + sqrshrn2 v20.16b, v26.8h, #2

3155 + sqrshrn v22.8b, v28.8h, #2

3156 + movi v0.16b, #(CENTERJSAMPLE)

3157 + sqrshrn2 v22.16b, v30.8h, #2

3158 + transpose_single v16, v17, v3, .2d, .8b

3159 + transpose_single v18, v19, v3, .2d, .8b

3160 + add v16.8b, v16.8b, v0.8b

3161 + add v17.8b, v17.8b, v0.8b

3162 + add v18.8b, v18.8b, v0.8b

3163 + add v19.8b, v19.8b, v0.8b

3164 + transpose v20, v22, v3, .16b, .8h

3165 + /* Store results to the output buffer */

3166 + ldp TMP1, TMP2, [OUTPUT_BUF], 16

3167 + add TMP1, TMP1, OUTPUT_COL

3168 + add TMP2, TMP2, OUTPUT_COL

3169 + st1 {v16.8b}, [TMP1]

3170 + transpose_single v20, v21, v3, .2d, .8b

3171 + st1 {v17.8b}, [TMP2]

3172 + ldp TMP1, TMP2, [OUTPUT_BUF], 16

3173 + add TMP1, TMP1, OUTPUT_COL

3174 + add TMP2, TMP2, OUTPUT_COL

3175 + st1 {v18.8b}, [TMP1]

3176 + add v20.8b, v20.8b, v0.8b

3177 + add v21.8b, v21.8b, v0.8b

3178 + st1 {v19.8b}, [TMP2]

3179 + ldp TMP1, TMP2, [OUTPUT_BUF], 16

3180 + ldp TMP3, TMP4, [OUTPUT_BUF]

3181 + add TMP1, TMP1, OUTPUT_COL

3182 + add TMP2, TMP2, OUTPUT_COL

3183 + add TMP3, TMP3, OUTPUT_COL

3184 + add TMP4, TMP4, OUTPUT_COL

3185 + transpose_single v22, v23, v3, .2d, .8b

3186 + st1 {v20.8b}, [TMP1]

3187 + add v22.8b, v22.8b, v0.8b

3188 + add v23.8b, v23.8b, v0.8b

3189 + st1 {v21.8b}, [TMP2]

3190 + st1 {v22.8b}, [TMP3]

3191 + st1 {v23.8b}, [TMP4]

3192 + ldr x15, [sp], 16

3193 + ld1 {v0.8b - v3.8b}, [sp], 32

3194 + ld1 {v4.8b - v7.8b}, [sp], 32

3195 + ld1 {v8.8b - v11.8b}, [sp], 32

3196 + ld1 {v12.8b - v15.8b}, [sp], 32

3197 + ld1 {v16.8b - v19.8b}, [sp], 32

3198 + ld1 {v20.8b - v23.8b}, [sp], 32

3199 + ld1 {v24.8b - v27.8b}, [sp], 32

3200 + ld1 {v28.8b - v31.8b}, [sp], 32

3201 + blr x30
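In pass 2, the 32-bit accumulators are narrowed first with `SHRN #16` (truncating) and then with `SQRSHRN #2` (rounding and saturating to a signed byte). The total right shift is 18 bits, matching `CONST_BITS + PASS1_BITS + 3 = 13 + 2 + 3` in IJG's jidctint.c. `CENTERJSAMPLE` (128) is then added on the 8-bit lanes to produce an unsigned sample. The per-sample C model below follows that sequence. `descale_sample`, `asr32` and `wrap16` are hypothetical helpers. This is an illustration of the NEON path only and makes no claim of bit-exactness with the C `jpeg_idct_islow` range-limit table.

```c
#include <stdint.h>

#define CENTERJSAMPLE 128

/* Flooring arithmetic right shift that does not depend on the
 * implementation-defined behaviour of >> on negative values. */
static int32_t asr32(int32_t x, int n)
{
  return x >= 0 ? x >> n : ~(~x >> n);
}

/* Keep the low 16 bits as a signed value (SHRN narrows without saturating). */
static int32_t wrap16(int32_t v)
{
  return ((v & 0xFFFF) ^ 0x8000) - 0x8000;
}

/* Hypothetical per-sample model of the islow output path:
 * shrn #16, then sqrshrn #2 (round, saturate to int8), then + 128. */
static uint8_t descale_sample(int32_t acc)
{
  int32_t v = wrap16(asr32(acc, 16));   /* shrn    #16          */
  v = asr32(v + 2, 2);                  /* sqrshrn #2: rounding */
  if (v > 127)                          /* ...and saturation    */
    v = 127;
  if (v < -128)
    v = -128;
  return (uint8_t) (v + CENTERJSAMPLE); /* signed -> unsigned   */
}
```

Splitting the 18-bit shift this way lets the intermediate values stay in 16-bit lanes. Because saturation to [-128, 127] happens before the +128 offset, the final 8-bit addition cannot wrap.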

3202 +

3203 +3: /* Left 4x8 half is done, right 4x8 half contains mostly zeros */

3204 +

3205 + /* Transpose left 4x8 half */

3206 + transpose ROW6L, ROW7L, v3, .16b, .4h

3207 + transpose ROW2L, ROW3L, v3, .16b, .4h

3208 + transpose ROW0L, ROW1L, v3, .16b, .4h

3209 + transpose ROW4L, ROW5L, v3, .16b, .4h

3210 + shl ROW0R.4h, ROW0R.4h, #2 /* PASS1_BITS */

3211 + transpose ROW1L, ROW3L, v3, .16b, .2s

3212 + transpose ROW4L, ROW6L, v3, .16b, .2s

3213 + transpose ROW0L, ROW2L, v3, .16b, .2s

3214 + transpose ROW5L, ROW7L, v3, .16b, .2s

3215 + cmp x0, #0

3216 + beq 4f /* Right 4x8 half has all zeros, go to 'sparse' second pass */

3217 +

3218 + /* Only row 0 is non-zero for the right 4x8 half */

3219 + dup ROW1R.4h, ROW0R.4h[1]

3220 + dup ROW2R.4h, ROW0R.4h[2]

3221 + dup ROW3R.4h, ROW0R.4h[3]

3222 + dup ROW4R.4h, ROW0R.4h[0]

3223 + dup ROW5R.4h, ROW0R.4h[1]

3224 + dup ROW6R.4h, ROW0R.4h[2]

3225 + dup ROW7R.4h, ROW0R.4h[3]

3226 + dup ROW0R.4h, ROW0R.4h[0]

3227 + b 1b /* Go to 'normal' second pass */
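The zero checks interleaved with pass 1 (the `ldp w4, w5` / `orr x0` sequence) read the raw coefficients back from `COEF_BLOCK`. The offset `-96 + 2 * (4 + r * 8)` compensates for the 96 bytes that `COEF_BLOCK` has already advanced and selects columns 4-7 of row r. First they OR together the right half of rows 1-7. Then they check row 0 separately, which leads to three paths: the full right-half pass 1, the "only row 0 is non-zero" `dup` shortcut, or the sparse second pass at label 4. The C sketch below expresses that decision. The enum and function names are hypothetical labels, not identifiers from the patch.

```c
#include <stdint.h>

/* Hypothetical labels for the three paths taken after the left-half pass 1. */
enum right_half_kind {
  RIGHT_HALF_DENSE,     /* some of rows 1-7 non-zero: full right-half pass 1 */
  RIGHT_HALF_ROW0_ONLY, /* label 3, then the dup block and "b 1b"           */
  RIGHT_HALF_ZERO       /* label 3, then "beq 4f" (sparse second pass)      */
};

/* coef is the 8x8 block in row-major order (row r, column c at r*8 + c). */
static enum right_half_kind classify_right_half(const int16_t coef[64])
{
  uint32_t rows1to7 = 0, row0 = 0;
  for (int r = 1; r < 8; r++)
    for (int c = 4; c < 8; c++)
      rows1to7 |= (uint16_t) coef[r * 8 + c];  /* like the orr x0 chain */
  for (int c = 4; c < 8; c++)
    row0 |= (uint16_t) coef[c];                /* reloaded row 0 check  */

  if (rows1to7 != 0)
    return RIGHT_HALF_DENSE;
  return row0 != 0 ? RIGHT_HALF_ROW0_ONLY : RIGHT_HALF_ZERO;
}
```

The check runs on the pre-dequantization values. A zero coefficient stays zero after multiplication, so skipping work on that basis is safe.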

3228 +

3229 +4: /* 1-D IDCT, pass 2 (sparse variant with zero rows 4-7), left 4x8 half */

3230 + ld1 {v2.4h}, [x15] /* reload constants */

3231 + smull v12.4s, ROW1L.4h, XFIX_1_175875602

3232 + smlal v12.4s, ROW3L.4h, XFIX_1_175875602_MINUS_1_961570560

3233 + smull v14.4s, ROW3L.4h, XFIX_1_175875602

3234 + smlal v14.4s, ROW1L.4h, XFIX_1_175875602_MINUS_0_390180644

3235 + smull v4.4s, ROW2L.4h, XFIX_0_541196100

3236 + sshll v6.4s, ROW0L.4h, #13

3237 + mov v8.16b, v12.16b

3238 + smlal v12.4s, ROW3L.4h, XFIX_3_072711026_MINUS_2_562915447

3239 + smlsl v8.4s, ROW1L.4h, XFIX_0_899976223

3240 + add v2.4s, v6.4s, v4.4s

3241 + mov v10.16b, v14.16b

3242 + smlal v14.4s, ROW1L.4h, XFIX_1_501321110_MINUS_0_899976223

3243 + add v2.4s, v2.4s, v12.4s

3244 + add v12.4s, v12.4s, v12.4s

3245 + smlsl v10.4s, ROW3L.4h, XFIX_2_562915447

3246 + shrn ROW1L.4h, v2.4s, #16

3247 + sub v2.4s, v2.4s, v12.4s

3248 + smull v12.4s, ROW2L.4h, XFIX_0_541196100_PLUS_0_765366865

3249 + sub v6.4s, v6.4s, v4.4s

3250 + shrn ROW2R.4h, v2.4s, #16 /* ROW6L.4h <-> ROW2R.4h */

3251 + add v2.4s, v6.4s, v10.4s

3252 + sub v6.4s, v6.4s, v10.4s

3253 + sshll v10.4s, ROW0L.4h, #13

3254 + shrn ROW2L.4h, v2.4s, #16

3255 + shrn ROW1R.4h, v6.4s, #16 /* ROW5L.4h <-> ROW1R.4h */

3256 + add v4.4s, v10.4s, v12.4s

3257 + sub v2.4s, v10.4s, v12.4s

3258 + add v12.4s, v4.4s, v14.4s

3259 + sub v4.4s, v4.4s, v14.4s

3260 + add v10.4s, v2.4s, v8.4s

3261 + sub v6.4s, v2.4s, v8.4s

3262 + shrn ROW3R.4h, v4.4s, #16 /* ROW7L.4h <-> ROW3R.4h */

3263 + shrn ROW3L.4h, v10.4s, #16

3264 + shrn ROW0L.4h, v12.4s, #16

3265 + shrn ROW0R.4h, v6.4s, #16 /* ROW4L.4h <-> ROW0R.4h */

3266 + /* 1-D IDCT, pass 2 (sparse variant with zero rows 4-7), right 4x8 half */

3267 + ld1 {v2.4h}, [x15] /* reload constants */

3268 + smull v12.4s, ROW5L.4h, XFIX_1_175875602

3269 + smlal v12.4s, ROW7L.4h, XFIX_1_175875602_MINUS_1_961570560

3270 + smull v14.4s, ROW7L.4h, XFIX_1_175875602

3271 + smlal v14.4s, ROW5L.4h, XFIX_1_175875602_MINUS_0_390180644

3272 + smull v4.4s, ROW6L.4h, XFIX_0_541196100

3273 + sshll v6.4s, ROW4L.4h, #13

3274 + mov v8.16b, v12.16b

3275 + smlal v12.4s, ROW7L.4h, XFIX_3_072711026_MINUS_2_562915447

3276 + smlsl v8.4s, ROW5L.4h, XFIX_0_899976223

3277 + add v2.4s, v6.4s, v4.4s

3278 + mov v10.16b, v14.16b

3279 + smlal v14.4s, ROW5L.4h, XFIX_1_501321110_MINUS_0_899976223

3280 + add v2.4s, v2.4s, v12.4s

3281 + add v12.4s, v12.4s, v12.4s

3282 + smlsl v10.4s, ROW7L.4h, XFIX_2_562915447

3283 + shrn ROW5L.4h, v2.4s, #16 /* ROW5L.4h <-> ROW1R.4h */

3284 + sub v2.4s, v2.4s, v12.4s

3285 + smull v12.4s, ROW6L.4h, XFIX_0_541196100_PLUS_0_765366865

3286 + sub v6.4s, v6.4s, v4.4s

3287 + shrn ROW6R.4h, v2.4s, #16

3288 + add v2.4s, v6.4s, v10.4s

3289 + sub v6.4s, v6.4s, v10.4s

3290 + sshll v10.4s, ROW4L.4h, #13

3291 + shrn ROW6L.4h, v2.4s, #16 /* ROW6L.4h <-> ROW2R.4h */

3292 + shrn ROW5R.4h, v6.4s, #16

3293 + add v4.4s, v10.4s, v12.4s

3294 + sub v2.4s, v10.4s, v12.4s

3295 + add v12.4s, v4.4s, v14.4s

3296 + sub v4.4s, v4.4s, v14.4s

3297 + add v10.4s, v2.4s, v8.4s

3298 + sub v6.4s, v2.4s, v8.4s

3299 + shrn ROW7R.4h, v4.4s, #16

3300 + shrn ROW7L.4h, v10.4s, #16 /* ROW7L.4h <-> ROW3R.4h */

3301 + shrn ROW4L.4h, v12.4s, #16 /* ROW4L.4h <-> ROW0R.4h */

3302 + shrn ROW4R.4h, v6.4s, #16

3303 + b 2b /* Go to epilogue */

3304 +

3305 + .unreq DCT_TABLE

3306 + .unreq COEF_BLOCK

3307 + .unreq OUTPUT_BUF

3308 + .unreq OUTPUT_COL

3309 + .unreq TMP1

3310 + .unreq TMP2

3311 + .unreq TMP3

3312 + .unreq TMP4

3313 +

3314 + .unreq ROW0L

3315 + .unreq ROW0R

3316 + .unreq ROW1L

3317 + .unreq ROW1R

3318 + .unreq ROW2L

3319 + .unreq ROW2R

3320 + .unreq ROW3L

3321 + .unreq ROW3R

3322 + .unreq ROW4L

3323 + .unreq ROW4R

3324 + .unreq ROW5L

3325 + .unreq ROW5R

3326 + .unreq ROW6L

3327 + .unreq ROW6R

3328 + .unreq ROW7L

3329 + .unreq ROW7R

3330 +

3331 +

3332 +/*****************************************************************************/

3333 +

3334 +/*

3335 + * jsimd_idct_ifast_neon

3336 + *

3337 + * This function contains a fast, not so accurate integer implementation of

3338 + * the inverse DCT (Discrete Cosine Transform). It uses the same calculations

3339 + * and produces exactly the same output as IJG's original 'jpeg_idct_ifast'

3340 + * function from jidctfst.c

3341 + *

3342 + * Normally, a 1-D AAN DCT needs 5 multiplications and 29 additions.

3343 + * But in the ARM NEON case some extra additions are required because the VQDMULH

3344 + * instruction can't handle constants larger than 1. So expressions

3345 + * like "x * 1.082392200" have to be converted to "x * 0.082392200 + x",

3346 + * which introduces an extra addition. Overall, there are 6 extra additions

3347 + * per 1-D IDCT pass, totalling 5 VQDMULH and 35 VADD/VSUB instructions.

3348 + */
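The comment above describes splitting a multiplier larger than 1 into a fractional part handled by VQDMULH plus one or more extra additions. The table entries below encode that split. For example, `(277 * 128 - 256 * 128)` is the fractional part of 277/256 ≈ 1.082392200, prescaled so that the doubling in SQDMULH yields a division by 256. The C sketch models one SQDMULH lane and checks that "SQDMULH, then add x" equals `floor(x * 277 / 256)`. That is the value jidctfst.c's `MULTIPLY` produces with `CONST_BITS = 8` and its default truncating `DESCALE`, assuming `USE_ACCURATE_ROUNDING` is not defined. For 2.613125930 the table subtracts 512 * 128, so this sketch assumes x is added back twice. The function names are hypothetical, and results are kept in 32 bits here while the NEON code works in 16-bit lanes.

```c
#include <stdint.h>

/* Flooring arithmetic right shift, independent of how >> treats negatives. */
static int32_t asr32(int32_t x, int n)
{
  return x >= 0 ? x >> n : ~(~x >> n);
}

/* Model of SQDMULH on one 16-bit lane: the high half of the doubled
 * product, saturated (only -32768 * -32768 can overflow). */
static int16_t sqdmulh16(int16_t a, int16_t b)
{
  if (a == INT16_MIN && b == INT16_MIN)
    return INT16_MAX;
  return (int16_t) asr32(2 * (int32_t) a * b, 16);
}

/* x * 1.082392200: fractional part via SQDMULH, integer part via one add. */
static int32_t mul_1_082392200(int16_t x)
{
  return sqdmulh16(x, 277 * 128 - 256 * 128) + x;
}

/* x * 2.613125930: the entry drops 512/256 = 2, restored by two adds. */
static int32_t mul_2_613125930(int16_t x)
{
  return sqdmulh16(x, 669 * 128 - 512 * 128) + x + x;
}
```

Because 2 * 21 * 128 = 21 * 256, the SQDMULH step computes exactly `floor(21 * x / 256)`, and adding `x` gives `floor(277 * x / 256)` with no rounding difference.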

3349 +

3350 +#define XFIX_1_082392200 v0.4h[0]

3351 +#define XFIX_1_414213562 v0.4h[1]

3352 +#define XFIX_1_847759065 v0.4h[2]

3353 +#define XFIX_2_613125930 v0.4h[3]

3354 +

3355 +.balign 16

3356 +jsimd_idct_ifast_neon_consts:

3357 + .short (277 * 128 - 256 * 128) /* XFIX_1_082392200 */

3358 + .short (362 * 128 - 256 * 128) /* XFIX_1_414213562 */

3359 + .short (473 * 128 - 256 * 128) /* XFIX_1_847759065 */

3360 + .short (669 * 128 - 512 * 128) /* XFIX_2_613125930 */

3361 +
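The comment above splits each multiplier larger than 1 into an integer part, handled with additions, and a fractional part, handled with SQDMULH. The following is a minimal scalar C sketch of that trick for XFIX_1_082392200, not code from the patch. `sqdmulh_s16` is a hypothetical helper that models one lane of the instruction (double the product, keep the high 16 bits rounding toward minus infinity, saturate), and the reference it is compared against is a truncating 8-bit fixed-point multiply by 277/256, which is what the `277 * 128 - 256 * 128` constant encodes once rescaled to Q15.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one SQDMULH lane:
 * (2 * a * b) >> 16, rounded toward minus infinity, saturated to int16. */
static int16_t sqdmulh_s16(int16_t a, int16_t b)
{
  int64_t p = 2 * (int64_t)a * b;
  /* floor division by 2^16 without relying on signed right shift */
  int64_t q = (p >= 0) ? (p >> 16) : -((-p + 65535) >> 16);
  if (q > INT16_MAX)          /* only a = b = -32768 overflows */
    q = INT16_MAX;
  return (int16_t)q;
}

/* x * 1.082392200 the NEON way: x + x * (21 / 256) */
static int16_t mul_1_082_neon(int16_t x)
{
  const int16_t xfix = 277 * 128 - 256 * 128;  /* XFIX_1_082392200 = 2688 */
  return (int16_t)(x + sqdmulh_s16(x, xfix));
}

/* Reference: truncating 8-bit fixed-point multiply by 277/256 */
static int32_t mul_1_082_ref(int32_t x)
{
  int32_t p = x * 277;
  return (p >= 0) ? (p >> 8) : -((-p + 255) >> 8);
}

/* Returns 1 if both forms agree for every x in [lo, hi]. */
static int check_range(int lo, int hi)
{
  assert(lo >= -30000 && hi <= 30000);  /* keeps x * 1.08 inside int16 */
  for (int x = lo; x <= hi; x++)
    if (mul_1_082_neon((int16_t)x) != mul_1_082_ref(x))
      return 0;
  return 1;
}
```

Because x + floor(21x/256) equals floor(277x/256) for every integer x, the split costs one extra addition but no precision relative to the 8-bit constant.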

+asm_function jsimd_idct_ifast_neon
+
+ DCT_TABLE .req x0
+ COEF_BLOCK .req x1
+ OUTPUT_BUF .req x2
+ OUTPUT_COL .req x3
+ TMP1 .req x0
+ TMP2 .req x1
+ TMP3 .req x2
+ TMP4 .req x22
+ TMP5 .req x23
+
+ /* Load and dequantize coefficients into NEON registers
+ * with the following allocation:
+ * 0 1 2 3 | 4 5 6 7
+ * ---------+--------
+ * 0 | d16 | d17 ( v8.8h )
+ * 1 | d18 | d19 ( v9.8h )
+ * 2 | d20 | d21 ( v10.8h )
+ * 3 | d22 | d23 ( v11.8h )
+ * 4 | d24 | d25 ( v12.8h )
+ * 5 | d26 | d27 ( v13.8h )
+ * 6 | d28 | d29 ( v14.8h )
+ * 7 | d30 | d31 ( v15.8h )
+ */
+ /* Save NEON registers used in fast IDCT */
+ sub sp, sp, #176
+ stp x22, x23, [sp], 16
+ adr x23, jsimd_idct_ifast_neon_consts
+ st1 {v0.8b - v3.8b}, [sp], 32
+ st1 {v4.8b - v7.8b}, [sp], 32
+ st1 {v8.8b - v11.8b}, [sp], 32
+ st1 {v12.8b - v15.8b}, [sp], 32
+ st1 {v16.8b - v19.8b}, [sp], 32
+ ld1 {v8.8h, v9.8h}, [COEF_BLOCK], 32
+ ld1 {v0.8h, v1.8h}, [DCT_TABLE], 32
+ ld1 {v10.8h, v11.8h}, [COEF_BLOCK], 32
+ mul v8.8h, v8.8h, v0.8h
+ ld1 {v2.8h, v3.8h}, [DCT_TABLE], 32
+ mul v9.8h, v9.8h, v1.8h
+ ld1 {v12.8h, v13.8h}, [COEF_BLOCK], 32
+ mul v10.8h, v10.8h, v2.8h
+ ld1 {v0.8h, v1.8h}, [DCT_TABLE], 32
+ mul v11.8h, v11.8h, v3.8h
+ ld1 {v14.8h, v15.8h}, [COEF_BLOCK], 32
+ mul v12.8h, v12.8h, v0.8h
+ ld1 {v2.8h, v3.8h}, [DCT_TABLE], 32
+ mul v14.8h, v14.8h, v2.8h
+ mul v13.8h, v13.8h, v1.8h
+ ld1 {v0.4h}, [x23] /* load constants */
+ mul v15.8h, v15.8h, v3.8h
+
+ /* 1-D IDCT, pass 1 */
+ sub v2.8h, v10.8h, v14.8h
+ add v14.8h, v10.8h, v14.8h
+ sub v1.8h, v11.8h, v13.8h
+ add v13.8h, v11.8h, v13.8h
+ sub v5.8h, v9.8h, v15.8h
+ add v15.8h, v9.8h, v15.8h
+ sqdmulh v4.8h, v2.8h, XFIX_1_414213562
+ sqdmulh v6.8h, v1.8h, XFIX_2_613125930
+ add v3.8h, v1.8h, v1.8h
+ sub v1.8h, v5.8h, v1.8h
+ add v10.8h, v2.8h, v4.8h
+ sqdmulh v4.8h, v1.8h, XFIX_1_847759065
+ sub v2.8h, v15.8h, v13.8h
+ add v3.8h, v3.8h, v6.8h
+ sqdmulh v6.8h, v2.8h, XFIX_1_414213562
+ add v1.8h, v1.8h, v4.8h
+ sqdmulh v4.8h, v5.8h, XFIX_1_082392200
+ sub v10.8h, v10.8h, v14.8h
+ add v2.8h, v2.8h, v6.8h
+ sub v6.8h, v8.8h, v12.8h
+ add v12.8h, v8.8h, v12.8h
+ add v9.8h, v5.8h, v4.8h
+ add v5.8h, v6.8h, v10.8h
+ sub v10.8h, v6.8h, v10.8h
+ add v6.8h, v15.8h, v13.8h
+ add v8.8h, v12.8h, v14.8h
+ sub v3.8h, v6.8h, v3.8h
+ sub v12.8h, v12.8h, v14.8h
+ sub v3.8h, v3.8h, v1.8h
+ sub v1.8h, v9.8h, v1.8h
+ add v2.8h, v3.8h, v2.8h
+ sub v15.8h, v8.8h, v6.8h
+ add v1.8h, v1.8h, v2.8h
+ add v8.8h, v8.8h, v6.8h
+ add v14.8h, v5.8h, v3.8h
+ sub v9.8h, v5.8h, v3.8h
+ sub v13.8h, v10.8h, v2.8h
+ add v10.8h, v10.8h, v2.8h
+ /* Transpose q8-q9 */
+ mov v18.16b, v8.16b
+ trn1 v8.8h, v8.8h, v9.8h
+ trn2 v9.8h, v18.8h, v9.8h
+ sub v11.8h, v12.8h, v1.8h
+ /* Transpose q14-q15 */
+ mov v18.16b, v14.16b
+ trn1 v14.8h, v14.8h, v15.8h
+ trn2 v15.8h, v18.8h, v15.8h
+ add v12.8h, v12.8h, v1.8h
+ /* Transpose q10-q11 */
+ mov v18.16b, v10.16b
+ trn1 v10.8h, v10.8h, v11.8h
+ trn2 v11.8h, v18.8h, v11.8h
+ /* Transpose q12-q13 */
+ mov v18.16b, v12.16b
+ trn1 v12.8h, v12.8h, v13.8h
+ trn2 v13.8h, v18.8h, v13.8h
+ /* Transpose q9-q11 */
+ mov v18.16b, v9.16b
+ trn1 v9.4s, v9.4s, v11.4s
+ trn2 v11.4s, v18.4s, v11.4s
+ /* Transpose q12-q14 */
+ mov v18.16b, v12.16b
+ trn1 v12.4s, v12.4s, v14.4s
+ trn2 v14.4s, v18.4s, v14.4s
+ /* Transpose q8-q10 */
+ mov v18.16b, v8.16b
+ trn1 v8.4s, v8.4s, v10.4s
+ trn2 v10.4s, v18.4s, v10.4s
+ /* Transpose q13-q15 */
+ mov v18.16b, v13.16b
+ trn1 v13.4s, v13.4s, v15.4s
+ trn2 v15.4s, v18.4s, v15.4s
+ /* vswp v14.4h, v10-MSB.4h */
+ umov x22, v14.d[0]
+ ins v14.2d[0], v10.2d[1]
+ ins v10.2d[1], x22
+ /* vswp v13.4h, v9-MSB.4h */
+
+ umov x22, v13.d[0]
+ ins v13.2d[0], v9.2d[1]
+ ins v9.2d[1], x22
+ /* 1-D IDCT, pass 2 */
+ sub v2.8h, v10.8h, v14.8h
+ /* vswp v15.4h, v11-MSB.4h */
+ umov x22, v15.d[0]
+ ins v15.2d[0], v11.2d[1]
+ ins v11.2d[1], x22
+ add v14.8h, v10.8h, v14.8h
+ /* vswp v12.4h, v8-MSB.4h */
+ umov x22, v12.d[0]
+ ins v12.2d[0], v8.2d[1]
+ ins v8.2d[1], x22
+ sub v1.8h, v11.8h, v13.8h
+ add v13.8h, v11.8h, v13.8h
+ sub v5.8h, v9.8h, v15.8h
+ add v15.8h, v9.8h, v15.8h
+ sqdmulh v4.8h, v2.8h, XFIX_1_414213562
+ sqdmulh v6.8h, v1.8h, XFIX_2_613125930
+ add v3.8h, v1.8h, v1.8h
+ sub v1.8h, v5.8h, v1.8h
+ add v10.8h, v2.8h, v4.8h
+ sqdmulh v4.8h, v1.8h, XFIX_1_847759065
+ sub v2.8h, v15.8h, v13.8h
+ add v3.8h, v3.8h, v6.8h
+ sqdmulh v6.8h, v2.8h, XFIX_1_414213562
+ add v1.8h, v1.8h, v4.8h
+ sqdmulh v4.8h, v5.8h, XFIX_1_082392200
+ sub v10.8h, v10.8h, v14.8h
+ add v2.8h, v2.8h, v6.8h
+ sub v6.8h, v8.8h, v12.8h
+ add v12.8h, v8.8h, v12.8h
+ add v9.8h, v5.8h, v4.8h
+ add v5.8h, v6.8h, v10.8h
+ sub v10.8h, v6.8h, v10.8h
+ add v6.8h, v15.8h, v13.8h
+ add v8.8h, v12.8h, v14.8h
+ sub v3.8h, v6.8h, v3.8h
+ sub v12.8h, v12.8h, v14.8h
+ sub v3.8h, v3.8h, v1.8h
+ sub v1.8h, v9.8h, v1.8h
+ add v2.8h, v3.8h, v2.8h
+ sub v15.8h, v8.8h, v6.8h
+ add v1.8h, v1.8h, v2.8h
+ add v8.8h, v8.8h, v6.8h
+ add v14.8h, v5.8h, v3.8h
+ sub v9.8h, v5.8h, v3.8h
+ sub v13.8h, v10.8h, v2.8h
+ add v10.8h, v10.8h, v2.8h
+ sub v11.8h, v12.8h, v1.8h
+ add v12.8h, v12.8h, v1.8h
+ /* Descale to 8-bit and range limit */
+ movi v0.16b, #0x80
+ sqshrn v8.8b, v8.8h, #5
+ sqshrn2 v8.16b, v9.8h, #5
+ sqshrn v9.8b, v10.8h, #5
+ sqshrn2 v9.16b, v11.8h, #5
+ sqshrn v10.8b, v12.8h, #5
+ sqshrn2 v10.16b, v13.8h, #5
+ sqshrn v11.8b, v14.8h, #5
+ sqshrn2 v11.16b, v15.8h, #5
+ add v8.16b, v8.16b, v0.16b
+ add v9.16b, v9.16b, v0.16b
+ add v10.16b, v10.16b, v0.16b
+ add v11.16b, v11.16b, v0.16b
+ /* Transpose the final 8-bit samples */
+ /* Transpose q8-q9 */
+ mov v18.16b, v8.16b
+ trn1 v8.8h, v8.8h, v9.8h
+ trn2 v9.8h, v18.8h, v9.8h
+ /* Transpose q10-q11 */
+ mov v18.16b, v10.16b
+ trn1 v10.8h, v10.8h, v11.8h
+ trn2 v11.8h, v18.8h, v11.8h
+ /* Transpose q8-q10 */
+ mov v18.16b, v8.16b
+ trn1 v8.4s, v8.4s, v10.4s
+ trn2 v10.4s, v18.4s, v10.4s
+ /* Transpose q9-q11 */
+ mov v18.16b, v9.16b
+ trn1 v9.4s, v9.4s, v11.4s
+ trn2 v11.4s, v18.4s, v11.4s
+ /* make copy */
+ ins v17.2d[0], v8.2d[1]
+ /* Transpose d16-d17-msb */
+ mov v18.16b, v8.16b
+ trn1 v8.8b, v8.8b, v17.8b
+ trn2 v17.8b, v18.8b, v17.8b
+ /* make copy */
+ ins v19.2d[0], v9.2d[1]
+ mov v18.16b, v9.16b
+ trn1 v9.8b, v9.8b, v19.8b
+ trn2 v19.8b, v18.8b, v19.8b
+ /* Store results to the output buffer */
+ ldp TMP1, TMP2, [OUTPUT_BUF], 16
+ add TMP1, TMP1, OUTPUT_COL
+ add TMP2, TMP2, OUTPUT_COL
+ st1 {v8.8b}, [TMP1]
+ st1 {v17.8b}, [TMP2]
+ ldp TMP1, TMP2, [OUTPUT_BUF], 16
+ add TMP1, TMP1, OUTPUT_COL
+ add TMP2, TMP2, OUTPUT_COL
+ st1 {v9.8b}, [TMP1]
+ /* make copy */
+ ins v7.2d[0], v10.2d[1]
+ mov v18.16b, v10.16b
+ trn1 v10.8b, v10.8b, v7.8b
+ trn2 v7.8b, v18.8b, v7.8b
+ st1 {v19.8b}, [TMP2]
+ ldp TMP1, TMP2, [OUTPUT_BUF], 16
+ ldp TMP4, TMP5, [OUTPUT_BUF], 16
+ add TMP1, TMP1, OUTPUT_COL
+ add TMP2, TMP2, OUTPUT_COL
+ add TMP4, TMP4, OUTPUT_COL
+ add TMP5, TMP5, OUTPUT_COL
+ st1 {v10.8b}, [TMP1]
+ /* make copy */
+ ins v16.2d[0], v11.2d[1]
+ mov v18.16b, v11.16b
+ trn1 v11.8b, v11.8b, v16.8b
+ trn2 v16.8b, v18.8b, v16.8b
+ st1 {v7.8b}, [TMP2]
+ st1 {v11.8b}, [TMP4]
+ st1 {v16.8b}, [TMP5]
+ sub sp, sp, #176
+ ldp x22, x23, [sp], 16
+ ld1 {v0.8b - v3.8b}, [sp], 32
+ ld1 {v4.8b - v7.8b}, [sp], 32
+ ld1 {v8.8b - v11.8b}, [sp], 32
+ ld1 {v12.8b - v15.8b}, [sp], 32
+ ld1 {v16.8b - v19.8b}, [sp], 32
+ blr x30
+
+ .unreq DCT_TABLE
+ .unreq COEF_BLOCK
+ .unreq OUTPUT_BUF
+ .unreq OUTPUT_COL
+ .unreq TMP1
+ .unreq TMP2
+ .unreq TMP3
+ .unreq TMP4
+
+
+/*****************************************************************************/
+
+/*
+ * jsimd_idct_4x4_neon
+ *
+ * This function contains inverse-DCT code for getting reduced-size
+ * 4x4 pixel output from an 8x8 DCT block. It uses the same calculations
+ * and produces exactly the same output as IJG's original 'jpeg_idct_4x4'
+ * function from jpeg-6b (jidctred.c).
+ *
+ * NOTE: jpeg-8 has an improved implementation of 4x4 inverse-DCT, which
+ * requires far fewer arithmetic operations and hence should be faster.
+ * The primary purpose of this particular NEON-optimized function is
+ * bit-exact compatibility with jpeg-6b.
+ *
+ * TODO: slightly better instruction scheduling can be achieved by expanding
+ * the idct_helper/transpose_4x4 macros and reordering instructions,
+ * but readability will suffer somewhat.
+ */
+
+#define CONST_BITS 13
+
+#define FIX_0_211164243 (1730) /* FIX(0.211164243) */
+#define FIX_0_509795579 (4176) /* FIX(0.509795579) */
+#define FIX_0_601344887 (4926) /* FIX(0.601344887) */
+#define FIX_0_720959822 (5906) /* FIX(0.720959822) */
+#define FIX_0_765366865 (6270) /* FIX(0.765366865) */
+#define FIX_0_850430095 (6967) /* FIX(0.850430095) */
+#define FIX_0_899976223 (7373) /* FIX(0.899976223) */
+#define FIX_1_061594337 (8697) /* FIX(1.061594337) */
+#define FIX_1_272758580 (10426) /* FIX(1.272758580) */
+#define FIX_1_451774981 (11893) /* FIX(1.451774981) */
+#define FIX_1_847759065 (15137) /* FIX(1.847759065) */
+#define FIX_2_172734803 (17799) /* FIX(2.172734803) */
+#define FIX_2_562915447 (20995) /* FIX(2.562915447) */
+#define FIX_3_624509785 (29692) /* FIX(3.624509785) */
+
+.balign 16
+jsimd_idct_4x4_neon_consts:
+ .short FIX_1_847759065 /* v0.4h[0] */
+ .short -FIX_0_765366865 /* v0.4h[1] */
+ .short -FIX_0_211164243 /* v0.4h[2] */
+ .short FIX_1_451774981 /* v0.4h[3] */
+ .short -FIX_2_172734803 /* v1.4h[0] */
+ .short FIX_1_061594337 /* v1.4h[1] */
+ .short -FIX_0_509795579 /* v1.4h[2] */
+ .short -FIX_0_601344887 /* v1.4h[3] */
+ .short FIX_0_899976223 /* v2.4h[0] */
+ .short FIX_2_562915447 /* v2.4h[1] */
+ .short 1 << (CONST_BITS+1) /* v2.4h[2] */
+ .short 0 /* v2.4h[3] */
+
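Each FIX_* value above is its constant scaled by 2^CONST_BITS (8192) and rounded to the nearest integer, the same rounding IJG's FIX() macro applies. The C sketch below recomputes a few of them; `fix13` is a hypothetical helper name. Some entries of jsimd_idct_4x4_neon_consts are stored negated so that the odd-part sums in idct_helper can be built with SMULL/SMLAL alone, and `1 << (CONST_BITS+1)` is the scale applied to the DC term.

```c
#include <assert.h>
#include <stdint.h>

#define CONST_BITS 13

/* Scale by 2^CONST_BITS, add 0.5 and truncate, like IJG's FIX() macro.
 * Only valid for the non-negative constants listed above. */
static int32_t fix13(double x)
{
  assert(x >= 0.0);
  return (int32_t)(x * (1 << CONST_BITS) + 0.5);
}
```

For example, 0.211164243 × 8192 ≈ 1729.86, which rounds to the 1730 used for FIX_0_211164243.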

+.macro idct_helper x4, x6, x8, x10, x12, x14, x16, shift, y26, y27, y28, y29
+ smull v28.4s, \x4, v2.4h[2]
+ smlal v28.4s, \x8, v0.4h[0]
+ smlal v28.4s, \x14, v0.4h[1]
+
+ smull v26.4s, \x16, v1.4h[2]
+ smlal v26.4s, \x12, v1.4h[3]
+ smlal v26.4s, \x10, v2.4h[0]
+ smlal v26.4s, \x6, v2.4h[1]
+
+ smull v30.4s, \x4, v2.4h[2]
+ smlsl v30.4s, \x8, v0.4h[0]
+ smlsl v30.4s, \x14, v0.4h[1]
+
+ smull v24.4s, \x16, v0.4h[2]
+ smlal v24.4s, \x12, v0.4h[3]
+ smlal v24.4s, \x10, v1.4h[0]
+ smlal v24.4s, \x6, v1.4h[1]
+
+ add v20.4s, v28.4s, v26.4s
+ sub v28.4s, v28.4s, v26.4s
+
+.if \shift > 16
+ srshr v20.4s, v20.4s, #\shift
+ srshr v28.4s, v28.4s, #\shift
+ xtn \y26, v20.4s
+ xtn \y29, v28.4s
+.else
+ rshrn \y26, v20.4s, #\shift
+ rshrn \y29, v28.4s, #\shift
+.endif
+
+ add v20.4s, v30.4s, v24.4s
+ sub v30.4s, v30.4s, v24.4s
+
+.if \shift > 16
+ srshr v20.4s, v20.4s, #\shift
+ srshr v30.4s, v30.4s, #\shift
+ xtn \y27, v20.4s
+ xtn \y28, v30.4s
+.else
+ rshrn \y27, v20.4s, #\shift
+ rshrn \y28, v30.4s, #\shift
+.endif
+
+.endm
+
+asm_function jsimd_idct_4x4_neon
+
+ DCT_TABLE .req x0
+ COEF_BLOCK .req x1
+ OUTPUT_BUF .req x2
+ OUTPUT_COL .req x3
+ TMP1 .req x0
+ TMP2 .req x1
+ TMP3 .req x2
+ TMP4 .req x15
+
+ /* Save all used NEON registers */
+ sub sp, sp, 272
+ str x15, [sp], 16
+ /* Load constants (v3.4h is just used for padding) */
+ adr TMP4, jsimd_idct_4x4_neon_consts
+ st1 {v0.8b - v3.8b}, [sp], 32
+ st1 {v4.8b - v7.8b}, [sp], 32
+ st1 {v8.8b - v11.8b}, [sp], 32
+ st1 {v12.8b - v15.8b}, [sp], 32
+ st1 {v16.8b - v19.8b}, [sp], 32
+ st1 {v20.8b - v23.8b}, [sp], 32
+ st1 {v24.8b - v27.8b}, [sp], 32
+ st1 {v28.8b - v31.8b}, [sp], 32
+ ld1 {v0.4h, v1.4h, v2.4h, v3.4h}, [TMP4]
+
+ /* Load all COEF_BLOCK into NEON registers with the following allocation:
+ * 0 1 2 3 | 4 5 6 7
+ * ---------+--------
+ * 0 | v4.4h | v5.4h
+ * 1 | v6.4h | v7.4h
+ * 2 | v8.4h | v9.4h
+ * 3 | v10.4h | v11.4h
+ * 4 | - | -
+ * 5 | v12.4h | v13.4h
+ * 6 | v14.4h | v15.4h
+ * 7 | v16.4h | v17.4h
+ */
+ ld1 {v4.4h, v5.4h, v6.4h, v7.4h}, [COEF_BLOCK], 32
+ ld1 {v8.4h, v9.4h, v10.4h, v11.4h}, [COEF_BLOCK], 32
+ add COEF_BLOCK, COEF_BLOCK, #16
+ ld1 {v12.4h, v13.4h, v14.4h, v15.4h}, [COEF_BLOCK], 32
+ ld1 {v16.4h, v17.4h}, [COEF_BLOCK], 16
+ /* dequantize */
+ ld1 {v18.4h, v19.4h, v20.4h, v21.4h}, [DCT_TABLE], 32
+ mul v4.4h, v4.4h, v18.4h
+ mul v5.4h, v5.4h, v19.4h
+ ins v4.2d[1], v5.2d[0] /* 128 bit q4 */
+ ld1 {v22.4h, v23.4h, v24.4h, v25.4h}, [DCT_TABLE], 32
+ mul v6.4h, v6.4h, v20.4h
+ mul v7.4h, v7.4h, v21.4h
+ ins v6.2d[1], v7.2d[0] /* 128 bit q6 */
+ mul v8.4h, v8.4h, v22.4h
+ mul v9.4h, v9.4h, v23.4h
+ ins v8.2d[1], v9.2d[0] /* 128 bit q8 */
+ add DCT_TABLE, DCT_TABLE, #16
+ ld1 {v26.4h, v27.4h, v28.4h, v29.4h}, [DCT_TABLE], 32
+ mul v10.4h, v10.4h, v24.4h
+ mul v11.4h, v11.4h, v25.4h
+ ins v10.2d[1], v11.2d[0] /* 128 bit q10 */
+ mul v12.4h, v12.4h, v26.4h
+ mul v13.4h, v13.4h, v27.4h
+ ins v12.2d[1], v13.2d[0] /* 128 bit q12 */
+ ld1 {v30.4h, v31.4h}, [DCT_TABLE], 16
+ mul v14.4h, v14.4h, v28.4h
+ mul v15.4h, v15.4h, v29.4h
+ ins v14.2d[1], v15.2d[0] /* 128 bit q14 */
+ mul v16.4h, v16.4h, v30.4h
+ mul v17.4h, v17.4h, v31.4h
+ ins v16.2d[1], v17.2d[0] /* 128 bit q16 */
+
+ /* Pass 1 */
+ idct_helper v4.4h, v6.4h, v8.4h, v10.4h, v12.4h, v14.4h, v16.4h, 12, v4.4h, v6.4h, v8.4h, v10.4h
+ transpose_4x4 v4, v6, v8, v10, v3
+ ins v10.2d[1], v11.2d[0]
+ idct_helper v5.4h, v7.4h, v9.4h, v11.4h, v13.4h, v15.4h, v17.4h, 12, v5.4h, v7.4h, v9.4h, v11.4h
+ transpose_4x4 v5, v7, v9, v11, v3
+ ins v10.2d[1], v11.2d[0]
+ /* Pass 2 */
+ idct_helper v4.4h, v6.4h, v8.4h, v10.4h, v7.4h, v9.4h, v11.4h, 19, v26.4h, v27.4h, v28.4h, v29.4h
+ transpose_4x4 v26, v27, v28, v29, v3
+
+ /* Range limit */
+ movi v30.8h, #0x80
+ ins v26.2d[1], v27.2d[0]
+ ins v28.2d[1], v29.2d[0]
+ add v26.8h, v26.8h, v30.8h
+ add v28.8h, v28.8h, v30.8h
+ sqxtun v26.8b, v26.8h
+ sqxtun v27.8b, v28.8h
+
+ /* Store results to the output buffer */
+ ldp TMP1, TMP2, [OUTPUT_BUF], 16
+ ldp TMP3, TMP4, [OUTPUT_BUF]
+ add TMP1, TMP1, OUTPUT_COL
+ add TMP2, TMP2, OUTPUT_COL
+ add TMP3, TMP3, OUTPUT_COL
+ add TMP4, TMP4, OUTPUT_COL
+
+#if defined(__ARMEL__) && !RESPECT_STRICT_ALIGNMENT
+ /* We can use far fewer instructions on little-endian systems if the
+ * OS kernel is not configured to trap unaligned memory accesses
+ */
+ st1 {v26.s}[0], [TMP1], 4
+ st1 {v27.s}[0], [TMP3], 4
+ st1 {v26.s}[1], [TMP2], 4
+ st1 {v27.s}[1], [TMP4], 4
+#else
+ st1 {v26.b}[0], [TMP1], 1
+ st1 {v27.b}[0], [TMP3], 1
+ st1 {v26.b}[1], [TMP1], 1
+ st1 {v27.b}[1], [TMP3], 1
+ st1 {v26.b}[2], [TMP1], 1
+ st1 {v27.b}[2], [TMP3], 1
+ st1 {v26.b}[3], [TMP1], 1
+ st1 {v27.b}[3], [TMP3], 1
+
+ st1 {v26.b}[4], [TMP2], 1
+ st1 {v27.b}[4], [TMP4], 1
+ st1 {v26.b}[5], [TMP2], 1
+ st1 {v27.b}[5], [TMP4], 1
+ st1 {v26.b}[6], [TMP2], 1
+ st1 {v27.b}[6], [TMP4], 1
+ st1 {v26.b}[7], [TMP2], 1
+ st1 {v27.b}[7], [TMP4], 1
+#endif
+
+ /* vpop {v8.4h - v15.4h} ;not available */
+ sub sp, sp, #272
+ ldr x15, [sp], 16
+ ld1 {v0.8b - v3.8b}, [sp], 32
+ ld1 {v4.8b - v7.8b}, [sp], 32
+ ld1 {v8.8b - v11.8b}, [sp], 32
+ ld1 {v12.8b - v15.8b}, [sp], 32
+ ld1 {v16.8b - v19.8b}, [sp], 32
+ ld1 {v20.8b - v23.8b}, [sp], 32
+ ld1 {v24.8b - v27.8b}, [sp], 32
+ ld1 {v28.8b - v31.8b}, [sp], 32
+ blr x30
+
+ .unreq DCT_TABLE
+ .unreq COEF_BLOCK
+ .unreq OUTPUT_BUF
+ .unreq OUTPUT_COL
+ .unreq TMP1
+ .unreq TMP2
+ .unreq TMP3
+ .unreq TMP4
+
+.purgem idct_helper
+
+
+/*****************************************************************************/
+
+/*
+ * jsimd_idct_2x2_neon
+ *
+ * This function contains inverse-DCT code for getting reduced-size
+ * 2x2 pixel output from an 8x8 DCT block. It uses the same calculations
+ * and produces exactly the same output as IJG's original 'jpeg_idct_2x2'
+ * function from jpeg-6b (jidctred.c).
+ *
+ * NOTE: jpeg-8 has an improved implementation of 2x2 inverse-DCT, which
+ * requires far fewer arithmetic operations and hence should be faster.
+ * The primary purpose of this particular NEON-optimized function is
+ * bit-exact compatibility with jpeg-6b.
+ */
+
+.balign 8
+jsimd_idct_2x2_neon_consts:
+ .short -FIX_0_720959822 /* v14.4h[0] */
+ .short FIX_0_850430095 /* v14.4h[1] */
+ .short -FIX_1_272758580 /* v14.4h[2] */
+ .short FIX_3_624509785 /* v14.4h[3] */
+
+.macro idct_helper x4, x6, x10, x12, x16, shift, y26, y27
+ sshll v15.4s, \x4, #15
+ smull v26.4s, \x6, v14.4h[3]
+ smlal v26.4s, \x10, v14.4h[2]
+ smlal v26.4s, \x12, v14.4h[1]
+ smlal v26.4s, \x16, v14.4h[0]
+
+ add v20.4s, v15.4s, v26.4s
+ sub v15.4s, v15.4s, v26.4s
+
+.if \shift > 16
+ srshr v20.4s, v20.4s, #\shift
+ srshr v15.4s, v15.4s, #\shift
+ xtn \y26, v20.4s
+ xtn \y27, v15.4s
+.else
+ rshrn \y26, v20.4s, #\shift
+ rshrn \y27, v15.4s, #\shift
+.endif
+
+.endm
+
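Each 1-D pass of the 2x2 reduced IDCT forms an even term from row 0 and a single odd sum from rows 1, 3, 5 and 7, then emits their sum and difference with a rounding shift (13 in pass 1 and 20 in pass 2 of jsimd_idct_2x2_neon). Below is a scalar C model of one lane of the idct_helper macro above, offered as a sketch rather than a port; `round_shift` and `idct_2pt` are hypothetical names, and the model uses 64-bit intermediates to sidestep overflow, whereas the NEON code accumulates in 32-bit lanes.

```c
#include <assert.h>
#include <stdint.h>

/* floor((x + 2^(n-1)) / 2^n): the rounding done by RSHRN/SRSHR */
static int64_t round_shift(int64_t x, int n)
{
  assert(n >= 1 && n < 62);
  int64_t d = (int64_t)1 << n;
  int64_t t = x + d / 2;
  return (t >= 0) ? t / d : -((-t + d - 1) / d);
}

/* Hypothetical one-lane model of the 2x2 idct_helper macro.
 * col[] is one dequantized column; rows 2, 4 and 6 are not used.
 * which = 0 returns the first output (y26), which = 1 the second (y27). */
static int32_t idct_2pt(const int16_t col[8], int shift, int which)
{
  int64_t even = (int64_t)col[0] * 32768;   /* sshll #15 */
  int64_t odd = (int64_t)col[1] * 29692     /* FIX_3_624509785 */
              - (int64_t)col[3] * 10426     /* FIX_1_272758580 */
              + (int64_t)col[5] * 6967      /* FIX_0_850430095 */
              - (int64_t)col[7] * 5906;     /* FIX_0_720959822 */
  return (int32_t)round_shift(which == 0 ? even + odd : even - odd, shift);
}
```

With only a DC coefficient, both pass-1 outputs equal 4 × DC; a lone row-1 coefficient of 100 gives 362 and -362, since the odd sum enters the two outputs with opposite signs, matching the `add v20`/`sub v15` pair in the macro.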

+asm_function jsimd_idct_2x2_neon
+
+ DCT_TABLE .req x0
+ COEF_BLOCK .req x1
+ OUTPUT_BUF .req x2
+ OUTPUT_COL .req x3
+ TMP1 .req x0
+ TMP2 .req x15
+
+ /* vpush {v8.4h - v15.4h} ; not available */
+ sub sp, sp, 208
+ str x15, [sp], 16
+
+ /* Load constants */
+ adr TMP2, jsimd_idct_2x2_neon_consts
+ st1 {v4.8b - v7.8b}, [sp], 32
+ st1 {v8.8b - v11.8b}, [sp], 32
+ st1 {v12.8b - v15.8b}, [sp], 32
+ st1 {v16.8b - v19.8b}, [sp], 32
+ st1 {v21.8b - v22.8b}, [sp], 16
+ st1 {v24.8b - v27.8b}, [sp], 32
+ st1 {v30.8b - v31.8b}, [sp], 16
+ ld1 {v14.4h}, [TMP2]
+
+ /* Load all COEF_BLOCK into NEON registers with the following allocation:
+ * 0 1 2 3 | 4 5 6 7
+ * ---------+--------
+ * 0 | v4.4h | v5.4h
+ * 1 | v6.4h | v7.4h
+ * 2 | - | -
+ * 3 | v10.4h | v11.4h
+ * 4 | - | -
+ * 5 | v12.4h | v13.4h
+ * 6 | - | -
+ * 7 | v16.4h | v17.4h
+ */
+ ld1 {v4.4h, v5.4h, v6.4h, v7.4h}, [COEF_BLOCK], 32
+ add COEF_BLOCK, COEF_BLOCK, #16
+ ld1 {v10.4h, v11.4h}, [COEF_BLOCK], 16
+ add COEF_BLOCK, COEF_BLOCK, #16
+ ld1 {v12.4h, v13.4h}, [COEF_BLOCK], 16
+ add COEF_BLOCK, COEF_BLOCK, #16
+ ld1 {v16.4h, v17.4h}, [COEF_BLOCK], 16
+ /* Dequantize */
+ ld1 {v18.4h, v19.4h, v20.4h, v21.4h}, [DCT_TABLE], 32
+ mul v4.4h, v4.4h, v18.4h
+ mul v5.4h, v5.4h, v19.4h
+ ins v4.2d[1], v5.2d[0]
+ mul v6.4h, v6.4h, v20.4h
+ mul v7.4h, v7.4h, v21.4h
+ ins v6.2d[1], v7.2d[0]
+ add DCT_TABLE, DCT_TABLE, #16
+ ld1 {v24.4h, v25.4h}, [DCT_TABLE], 16
+ mul v10.4h, v10.4h, v24.4h
+ mul v11.4h, v11.4h, v25.4h
+ ins v10.2d[1], v11.2d[0]
+ add DCT_TABLE, DCT_TABLE, #16
+ ld1 {v26.4h, v27.4h}, [DCT_TABLE], 16
+ mul v12.4h, v12.4h, v26.4h
+ mul v13.4h, v13.4h, v27.4h
+ ins v12.2d[1], v13.2d[0]
+ add DCT_TABLE, DCT_TABLE, #16
+ ld1 {v30.4h, v31.4h}, [DCT_TABLE], 16
+ mul v16.4h, v16.4h, v30.4h
+ mul v17.4h, v17.4h, v31.4h
+ ins v16.2d[1], v17.2d[0]
+
+ /* Pass 1 */
+#if 0
+ idct_helper v4.4h, v6.4h, v10.4h, v12.4h, v16.4h, 13, v4.4h, v6.4h
+ transpose_4x4 v4.4h, v6.4h, v8.4h, v10.4h
+ idct_helper v5.4h, v7.4h, v11.4h, v13.4h, v17.4h, 13, v5.4h, v7.4h
+ transpose_4x4 v5.4h, v7.4h, v9.4h, v11.4h
+#else
+ smull v26.4s, v6.4h, v14.4h[3]
+ smlal v26.4s, v10.4h, v14.4h[2]
+ smlal v26.4s, v12.4h, v14.4h[1]
+ smlal v26.4s, v16.4h, v14.4h[0]
+ smull v24.4s, v7.4h, v14.4h[3]
+ smlal v24.4s, v11.4h, v14.4h[2]
+ smlal v24.4s, v13.4h, v14.4h[1]
+ smlal v24.4s, v17.4h, v14.4h[0]
+ sshll v15.4s, v4.4h, #15
+ sshll v30.4s, v5.4h, #15
+ add v20.4s, v15.4s, v26.4s
+ sub v15.4s, v15.4s, v26.4s
+ rshrn v4.4h, v20.4s, #13
+ rshrn v6.4h, v15.4s, #13
+ add v20.4s, v30.4s, v24.4s
+ sub v15.4s, v30.4s, v24.4s
+ rshrn v5.4h, v20.4s, #13
+ rshrn v7.4h, v15.4s, #13
+ ins v4.2d[1], v5.2d[0]
+ ins v6.2d[1], v7.2d[0]
+ transpose v4, v6, v3, .16b, .8h
+ transpose v6, v10, v3, .16b, .4s
+ ins v11.2d[0], v10.2d[1]
+ ins v7.2d[0], v6.2d[1]
+#endif
+
+ /* Pass 2 */
+ idct_helper v4.4h, v6.4h, v10.4h, v7.4h, v11.4h, 20, v26.4h, v27.4h
+
+ /* Range limit */
+ movi v30.8h, #0x80
+ ins v26.2d[1], v27.2d[0]
+ add v26.8h, v26.8h, v30.8h
+ sqxtun v30.8b, v26.8h
+ ins v26.2d[0], v30.2d[0]
+ sqxtun v27.8b, v26.8h
+
+ /* Store results to the output buffer */
+ ldp TMP1, TMP2, [OUTPUT_BUF]
+ add TMP1, TMP1, OUTPUT_COL
+ add TMP2, TMP2, OUTPUT_COL
+
+ st1 {v26.b}[0], [TMP1], 1
+ st1 {v27.b}[4], [TMP1], 1
+ st1 {v26.b}[1], [TMP2], 1
+ st1 {v27.b}[5], [TMP2], 1
+
+ sub sp, sp, #208
+ ldr x15, [sp], 16
+ ld1 {v4.8b - v7.8b}, [sp], 32
+ ld1 {v8.8b - v11.8b}, [sp], 32
+ ld1 {v12.8b - v15.8b}, [sp], 32
+ ld1 {v16.8b - v19.8b}, [sp], 32
+ ld1 {v21.8b - v22.8b}, [sp], 16
+ ld1 {v24.8b - v27.8b}, [sp], 32
+ ld1 {v30.8b - v31.8b}, [sp], 16
+ blr x30
+
+ .unreq DCT_TABLE
+ .unreq COEF_BLOCK
+ .unreq OUTPUT_BUF
+ .unreq OUTPUT_COL
+ .unreq TMP1
+ .unreq TMP2
+
+.purgem idct_helper
+
+
+/*****************************************************************************/
+
+/*
+ * jsimd_ycc_extrgb_convert_neon
+ * jsimd_ycc_extbgr_convert_neon
+ * jsimd_ycc_extrgbx_convert_neon
+ * jsimd_ycc_extbgrx_convert_neon
+ * jsimd_ycc_extxbgr_convert_neon
+ * jsimd_ycc_extxrgb_convert_neon
+ *
+ * Colorspace conversion YCbCr -> RGB
+ */
+
+
+.macro do_load size
+ .if \size == 8
+ ld1 {v4.8b}, [U], 8
+ ld1 {v5.8b}, [V], 8
+ ld1 {v0.8b}, [Y], 8
+ prfm PLDL1KEEP, [U, #64]
+ prfm PLDL1KEEP, [V, #64]
+ prfm PLDL1KEEP, [Y, #64]
+ .elseif \size == 4
+ ld1 {v4.b}[0], [U], 1
+ ld1 {v4.b}[1], [U], 1
+ ld1 {v4.b}[2], [U], 1
+ ld1 {v4.b}[3], [U], 1
+ ld1 {v5.b}[0], [V], 1
+ ld1 {v5.b}[1], [V], 1
+ ld1 {v5.b}[2], [V], 1
+ ld1 {v5.b}[3], [V], 1
+ ld1 {v0.b}[0], [Y], 1
+ ld1 {v0.b}[1], [Y], 1
+ ld1 {v0.b}[2], [Y], 1
+ ld1 {v0.b}[3], [Y], 1
+ .elseif \size == 2
+ ld1 {v4.b}[4], [U], 1
+ ld1 {v4.b}[5], [U], 1
+ ld1 {v5.b}[4], [V], 1
+ ld1 {v5.b}[5], [V], 1
+ ld1 {v0.b}[4], [Y], 1
+ ld1 {v0.b}[5], [Y], 1
+ .elseif \size == 1
+ ld1 {v4.b}[6], [U], 1
+ ld1 {v5.b}[6], [V], 1
+ ld1 {v0.b}[6], [Y], 1
+ .else
+ .error unsupported macroblock size
+ .endif
+.endm
+
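In do_load (and in do_store further down), a full block of 8 pixels uses lanes 0-7, while the partial sizes use fixed lanes: size 4 fills lanes 0-3, size 2 fills lanes 4-5 and size 1 fills lane 6. The C sketch below assumes that the caller, which lies outside this excerpt, handles a remainder of fewer than 8 pixels by issuing the 4-, 2- and 1-pixel variants according to the bits of the remainder; under that assumption it computes which lanes end up in use. `tail_lane_mask` and `lanes_used` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Bit i is set when lane i is loaded for a tail of n pixels (n < 8),
 * assuming do_load 4, 2 and 1 are issued according to the bits of n. */
static uint8_t tail_lane_mask(unsigned n)
{
  assert(n < 8);
  uint8_t mask = 0;
  if (n & 4) mask |= 0x0F;   /* do_load 4: lanes 0-3 */
  if (n & 2) mask |= 0x30;   /* do_load 2: lanes 4-5 */
  if (n & 1) mask |= 0x40;   /* do_load 1: lane 6 */
  return mask;
}

/* Number of lanes in use for a tail of n pixels. */
static int lanes_used(unsigned n)
{
  int count = 0;
  for (uint8_t m = tail_lane_mask(n); m != 0; m &= (uint8_t)(m - 1))
    count++;                 /* clear the lowest set bit each time */
  return count;
}
```

The occupied lanes need not start at 0 (a remainder of 2 uses lanes 4-5), but their count always equals the remainder, and because do_store uses the same lane numbers for each size, the pixels are written back in their original order.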

+.macro do_store bpp, size
+ .if \bpp == 24
+ .if \size == 8
+ st3 {v10.8b, v11.8b, v12.8b}, [RGB], 24
+ .elseif \size == 4
+ st3 {v10.b, v11.b, v12.b}[0], [RGB], 3
+ st3 {v10.b, v11.b, v12.b}[1], [RGB], 3
+ st3 {v10.b, v11.b, v12.b}[2], [RGB], 3
+ st3 {v10.b, v11.b, v12.b}[3], [RGB], 3
+ .elseif \size == 2
+ st3 {v10.b, v11.b, v12.b}[4], [RGB], 3
+ st3 {v10.b, v11.b, v12.b}[5], [RGB], 3
+ .elseif \size == 1
+ st3 {v10.b, v11.b, v12.b}[6], [RGB], 3
+ .else
+ .error unsupported macroblock size
+ .endif
+ .elseif \bpp == 32
+ .if \size == 8
+ st4 {v10.8b, v11.8b, v12.8b, v13.8b}, [RGB], 32
+ .elseif \size == 4
+ st4 {v10.b, v11.b, v12.b, v13.b}[0], [RGB], 4
+ st4 {v10.b, v11.b, v12.b, v13.b}[1], [RGB], 4
+ st4 {v10.b, v11.b, v12.b, v13.b}[2], [RGB], 4
+ st4 {v10.b, v11.b, v12.b, v13.b}[3], [RGB], 4
+ .elseif \size == 2
+ st4 {v10.b, v11.b, v12.b, v13.b}[4], [RGB], 4
+ st4 {v10.b, v11.b, v12.b, v13.b}[5], [RGB], 4
+ .elseif \size == 1
+ st4 {v10.b, v11.b, v12.b, v13.b}[6], [RGB], 4
+ .else
+ .error unsupported macroblock size
+ .endif
+ .elseif \bpp == 16
+ .if \size == 8
+ st1 {v25.8h}, [RGB], 16
+ .elseif \size == 4
+ st1 {v25.4h}, [RGB], 8
+ .elseif \size == 2
+ st1 {v25.h}[4], [RGB], 2
+ st1 {v25.h}[5], [RGB], 2
+ .elseif \size == 1
+ st1 {v25.h}[6], [RGB], 2
+ .else
+ .error unsupported macroblock size
+ .endif
+ .else
+ .error unsupported bpp
+ .endif
+.endm
+
+.macro generate_jsimd_ycc_rgb_convert_neon colorid, bpp, r_offs, rsize, g_offs, gsize, b_offs, bsize, defsize
+
+/*
+ * 2-stage pipelined YCbCr->RGB conversion
+ */
+
+.macro do_yuv_to_rgb_stage1
+ uaddw v6.8h, v2.8h, v4.8b /* v6.8h = u - 128 */
+ uaddw v8.8h, v2.8h, v5.8b /* v8.8h = v - 128 */
+ smull v20.4s, v6.4h, v1.4h[1] /* multiply by -11277 */
+ smlal v20.4s, v8.4h, v1.4h[2] /* multiply by -23401 */
+ smull2 v22.4s, v6.8h, v1.4h[1] /* multiply by -11277 */
+ smlal2 v22.4s, v8.8h, v1.4h[2] /* multiply by -23401 */
+ smull v24.4s, v8.4h, v1.4h[0] /* multiply by 22971 */
+ smull2 v26.4s, v8.8h, v1.4h[0] /* multiply by 22971 */
+ smull v28.4s, v6.4h, v1.4h[3] /* multiply by 29033 */
+ smull2 v30.4s, v6.8h, v1.4h[3] /* multiply by 29033 */
+.endm
+
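Taken together, do_yuv_to_rgb_stage1 and do_yuv_to_rgb_stage2 evaluate the usual JFIF YCbCr->RGB equations in fixed point. The constants 22971, -11277, -23401 and 29033 from the consts table are approximately 1.402 × 2^14, -0.344 × 2^15, -0.714 × 2^15 and 1.772 × 2^14, which is why the green term is shifted right by 15 and the red and blue terms by 14. The following is a scalar C sketch of one pixel under that reading; the function names are hypothetical, `rshr` mirrors the rounding of RSHRN and `clamp_u8` mirrors the saturation of SQXTUN.

```c
#include <assert.h>
#include <stdint.h>

/* floor((x + 2^(n-1)) / 2^n), the rounding performed by RSHRN */
static int32_t rshr(int32_t x, int n)
{
  assert(n >= 1 && n <= 30);
  int64_t d = (int64_t)1 << n;
  int64_t t = (int64_t)x + d / 2;
  return (int32_t)((t >= 0) ? t / d : -((-t + d - 1) / d));
}

/* Saturate to 0..255, as SQXTUN does */
static uint32_t clamp_u8(int32_t v)
{
  return (uint32_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical scalar model of one pixel; returns 0xRRGGBB. */
static uint32_t ycc_to_rgb(uint8_t y, uint8_t cb, uint8_t cr)
{
  int32_t u = (int32_t)cb - 128;                  /* uaddw with the -128 lanes */
  int32_t v = (int32_t)cr - 128;
  int32_t g = rshr(u * -11277 + v * -23401, 15);  /* smull/smlal, rshrn #15 */
  int32_t r = rshr(v * 22971, 14);                /* smull, rshrn #14 */
  int32_t b = rshr(u * 29033, 14);                /* smull, rshrn #14 */
  return (clamp_u8(y + r) << 16) | (clamp_u8(y + g) << 8) | clamp_u8(y + b);
}
```

For Y = 100, Cb = 128, Cr = 200 this gives R = 201, G = 49, B = 100.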

+.macro do_yuv_to_rgb_stage2
+ rshrn v20.4h, v20.4s, #15
+ rshrn2 v20.8h, v22.4s, #15
+ rshrn v24.4h, v24.4s, #14
+ rshrn2 v24.8h, v26.4s, #14
+ rshrn v28.4h, v28.4s, #14
+ rshrn2 v28.8h, v30.4s, #14
+ uaddw v20.8h, v20.8h, v0.8b
+ uaddw v24.8h, v24.8h, v0.8b
+ uaddw v28.8h, v28.8h, v0.8b
+.if \bpp != 16
+ sqxtun v1\g_offs\defsize, v20.8h
+ sqxtun v1\r_offs\defsize, v24.8h
+ sqxtun v1\b_offs\defsize, v28.8h
+.else
+ sqshlu v21.8h, v20.8h, #8
+ sqshlu v25.8h, v24.8h, #8
+ sqshlu v29.8h, v28.8h, #8
+ sri v25.8h, v21.8h, #5
+ sri v25.8h, v29.8h, #11
+.endif
+
+.endm
+
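For 16-bpp (RGB565) output, stage 2 packs the three 16-bit channel values with SQSHLU #8 followed by two SRI instructions. SQSHLU #8 clamps a signed value at 0 and moves it into the top byte, saturating at 0xFFFF, and SRI #n shifts its source right by n and inserts it while keeping the destination's top n bits. The C sketch below models one lane of that sequence; `sqshlu8`, `sri16`, `pack_rgb565` and `matches_reference` are hypothetical names, and the check confirms that the result is the usual R5G6B5 layout.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of SQSHLU #8: shift a signed 16-bit value left by 8,
 * saturating the result to the unsigned range 0..0xFFFF. */
static uint16_t sqshlu8(int16_t x)
{
  int32_t v = (int32_t)x * 256;
  return (uint16_t)(v < 0 ? 0 : (v > 0xFFFF ? 0xFFFF : v));
}

/* One lane of SRI #n: shift src right by n and insert it into dst,
 * keeping the top n bits of dst. */
static uint16_t sri16(uint16_t dst, uint16_t src, int n)
{
  assert(n >= 1 && n <= 15);
  uint16_t keep = (uint16_t)(0xFFFFu << (16 - n));
  return (uint16_t)((dst & keep) | (src >> n));
}

/* Hypothetical model of the rgb565 branch of do_yuv_to_rgb_stage2:
 * r, g, b are the 16-bit channel values before saturation. */
static uint16_t pack_rgb565(int16_t r, int16_t g, int16_t b)
{
  uint16_t out = sqshlu8(r);            /* sqshlu v25.8h, v24.8h, #8 */
  out = sri16(out, sqshlu8(g), 5);      /* sri v25.8h, v21.8h, #5 */
  out = sri16(out, sqshlu8(b), 11);     /* sri v25.8h, v29.8h, #11 */
  return out;
}

/* 1 if pack_rgb565 equals (r >> 3) << 11 | (g >> 2) << 5 | (b >> 3) for all 0..255 inputs */
static int matches_reference(void)
{
  for (int r = 0; r < 256; r++)
    for (int g = 0; g < 256; g++)
      for (int b = 0; b < 256; b++) {
        unsigned expect = ((unsigned)(r >> 3) << 11) | ((unsigned)(g >> 2) << 5) | (unsigned)(b >> 3);
        if (pack_rgb565((int16_t)r, (int16_t)g, (int16_t)b) != expect)
          return 0;
      }
  return 1;
}
```

An out-of-range channel saturates to 0 or 0xFFFF, whose top bits match those of 0 or 255, so overflowing values behave exactly as if they had been clamped first.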

4220 +.macro do_yuv_to_rgb_stage2_store_load_stage1

4221 + rshrn v20.4h, v20.4s, #15

4222 + rshrn v24.4h, v24.4s, #14

4223 + rshrn v28.4h, v28.4s, #14

4224 + ld1 {v4.8b}, [U], 8

4225 + rshrn2 v20.8h, v22.4s, #15

4226 + rshrn2 v24.8h, v26.4s, #14

4227 + rshrn2 v28.8h, v30.4s, #14

4228 + ld1 {v5.8b}, [V], 8

4229 + uaddw v20.8h, v20.8h, v0.8b

4230 + uaddw v24.8h, v24.8h, v0.8b

4231 + uaddw v28.8h, v28.8h, v0.8b

4232 +.if \bpp != 16 /************** rgb24/rgb32 *******************************/

4233 + sqxtun v1\g_offs\defsize, v20.8h

4234 + ld1 {v0.8b}, [Y], 8

4235 + sqxtun v1\r_offs\defsize, v24.8h

4236 + prfm PLDL1KEEP, [U, #64]

4237 + prfm PLDL1KEEP, [V, #64]

4238 + prfm PLDL1KEEP, [Y, #64]

4239 + sqxtun v1\b_offs\defsize, v28.8h

4240 + uaddw v6.8h, v2.8h, v4.8b /* v6.16b = u - 128 */

4241 + uaddw v8.8h, v2.8h, v5.8b /* q2 = v - 128 */

4242 + smull v20.4s, v6.4h, v1.4h[1] /* multiply by -11277 */

4243 + smlal v20.4s, v8.4h, v1.4h[2] /* multiply by -23401 */

4244 + smull2 v22.4s, v6.8h, v1.4h[1] /* multiply by -11277 */

4245 + smlal2 v22.4s, v8.8h, v1.4h[2] /* multiply by -23401 */

4246 + smull v24.4s, v8.4h, v1.4h[0] /* multiply by 22971 */

4247 + smull2 v26.4s, v8.8h, v1.4h[0] /* multiply by 22971 */

4248 +.else /************************** rgb565 *********************************/

4249 + sqshlu v21.8h, v20.8h, #8

4250 + sqshlu v25.8h, v24.8h, #8

4251 + sqshlu v29.8h, v28.8h, #8

4252 + uaddw v6.8h, v2.8h, v4.8b /* v6.16b = u - 128 */

4253 + uaddw v8.8h, v2.8h, v5.8b /* q2 = v - 128 */

4254 + ld1 {v0.8b}, [Y], 8

4255 + smull v20.4s, v6.4h, v1.4h[1] /* multiply by -11277 */

4256 + smlal v20.4s, v8.4h, v1.4h[2] /* multiply by -23401 */

4257 + smull2 v22.4s, v6.8h, v1.4h[1] /* multiply by -11277 */

4258 + smlal2 v22.4s, v8.8h, v1.4h[2] /* multiply by -23401 */

4259 + sri v25.8h, v21.8h, #5

4260 + smull v24.4s, v8.4h, v1.4h[0] /* multiply by 22971 */

4261 + smull2 v26.4s, v8.8h, v1.4h[0] /* multiply by 22971 */

4262 + prfm PLDL1KEEP, [U, #64]

4263 + prfm PLDL1KEEP, [V, #64]

4264 + prfm PLDL1KEEP, [Y, #64]

4265 + sri v25.8h, v29.8h, #11

4266 +.endif

4267 + do_store \bpp, 8

4268 + smull v28.4s, v6.4h, v1.4h[3] /* multiply by 29033 */

4269 + smull2 v30.4s, v6.8h, v1.4h[3] /* multiply by 29033 */

4270 +.endm

4271 +

4272 +.macro do_yuv_to_rgb

4273 + do_yuv_to_rgb_stage1

4274 + do_yuv_to_rgb_stage2

4275 +.endm

4276 +

4277 +/* Apple gas crashes on adrl, work around that by using adr.

4278 + * But this requires a copy of these constants for each function.

4279 + */

4280 +

4281 +.balign 16

4282 +jsimd_ycc_\colorid\()_neon_consts:

4283 + .short 0, 0, 0, 0

4284 + .short 22971, -11277, -23401, 29033

4285 + .short -128, -128, -128, -128

4286 + .short -128, -128, -128, -128

4287 +

4288 +asm_function jsimd_ycc_\colorid\()_convert_neon

4289 + OUTPUT_WIDTH .req x0

4290 + INPUT_BUF .req x1

4291 + INPUT_ROW .req x2

4292 + OUTPUT_BUF .req x3

4293 + NUM_ROWS .req x4

4294 +

4295 + INPUT_BUF0 .req x5

4296 + INPUT_BUF1 .req x6

4297 + INPUT_BUF2 .req INPUT_BUF

4298 +

4299 + RGB .req x7

4300 + Y .req x8

4301 + U .req x9

4302 + V .req x10

4303 + N .req x15

4304 +

4305 + sub sp, sp, 336

4306 + str x15, [sp], 16

4307 + /* Load constants to v1.4h and v2.8h (v0.4h is just used for padding) */

4308 + adr x15, jsimd_ycc_\colorid\()_neon_consts

4309 + /* Save NEON registers */

4310 + st1 {v0.8b - v3.8b}, [sp], 32

4311 + st1 {v4.8b - v7.8b}, [sp], 32

4312 + st1 {v8.8b - v11.8b}, [sp], 32

4313 + st1 {v12.8b - v15.8b}, [sp], 32

4314 + st1 {v16.8b - v19.8b}, [sp], 32

4315 + st1 {v20.8b - v23.8b}, [sp], 32

4316 + st1 {v24.8b - v27.8b}, [sp], 32

4317 + st1 {v28.8b - v31.8b}, [sp], 32

4318 + ld1 {v0.4h, v1.4h}, [x15], 16

4319 + ld1 {v2.8h}, [x15]

4320 +

4321 + /* Save ARM registers and handle input arguments */

4322 + /* push {x4, x5, x6, x7, x8, x9, x10, x30} */

4323 + stp x4, x5, [sp], 16

4324 + stp x6, x7, [sp], 16

4325 + stp x8, x9, [sp], 16

4326 + stp x10, x30, [sp], 16

4327 + ldr INPUT_BUF0, [INPUT_BUF]

4328 + ldr INPUT_BUF1, [INPUT_BUF, 8]

4329 + ldr INPUT_BUF2, [INPUT_BUF, 16]

4330 + .unreq INPUT_BUF

4331 +

4332 + /* Initially set v10 and v13 to 0xFF (the X channel of the 32-bpp formats) */

4333 + movi v10.16b, #255

4334 + movi v13.16b, #255

4335 +

4336 + /* Outer loop over scanlines */

4337 + cmp NUM_ROWS, #1

4338 + blt 9f

4339 +0:

4340 + lsl x16, INPUT_ROW, #3

4341 + ldr Y, [INPUT_BUF0, x16]

4342 + ldr U, [INPUT_BUF1, x16]

4343 + mov N, OUTPUT_WIDTH

4344 + ldr V, [INPUT_BUF2, x16]

4345 + add INPUT_ROW, INPUT_ROW, #1

4346 + ldr RGB, [OUTPUT_BUF], #8

4347 +

4348 + /* Inner loop over pixels */

4349 + subs N, N, #8

4350 + blt 3f

4351 + do_load 8

4352 + do_yuv_to_rgb_stage1

4353 + subs N, N, #8

4354 + blt 2f

4355 +1:

4356 + do_yuv_to_rgb_stage2_store_load_stage1

4357 + subs N, N, #8

4358 + bge 1b

4359 +2:

4360 + do_yuv_to_rgb_stage2

4361 + do_store \bpp, 8

4362 + tst N, #7

4363 + beq 8f

4364 +3:

4365 + tst N, #4

4366 + beq 3f

4367 + do_load 4

4368 +3:

4369 + tst N, #2

4370 + beq 4f

4371 + do_load 2

4372 +4:

4373 + tst N, #1

4374 + beq 5f

4375 + do_load 1

4376 +5:

4377 + do_yuv_to_rgb

4378 + tst N, #4

4379 + beq 6f

4380 + do_store \bpp, 4

4381 +6:

4382 + tst N, #2

4383 + beq 7f

4384 + do_store \bpp, 2

4385 +7:

4386 + tst N, #1

4387 + beq 8f

4388 + do_store \bpp, 1

4389 +8:

4390 + subs NUM_ROWS, NUM_ROWS, #1

4391 + bgt 0b

4392 +9:

4393 + /* Restore all registers and return */

4394 + sub sp, sp, #336

4395 + ldr x15, [sp], 16

4396 + ld1 {v0.8b - v3.8b}, [sp], 32

4397 + ld1 {v4.8b - v7.8b}, [sp], 32

4398 + ld1 {v8.8b - v11.8b}, [sp], 32

4399 + ld1 {v12.8b - v15.8b}, [sp], 32

4400 + ld1 {v16.8b - v19.8b}, [sp], 32

4401 + ld1 {v20.8b - v23.8b}, [sp], 32

4402 + ld1 {v24.8b - v27.8b}, [sp], 32

4403 + ld1 {v28.8b - v31.8b}, [sp], 32

4404 + /* pop {x4, x5, x6, x7, x8, x9, x10, x30} */

4405 + ldp x4, x5, [sp], 16

4406 + ldp x6, x7, [sp], 16

4407 + ldp x8, x9, [sp], 16

4408 + ldp x10, x30, [sp], 16

4409 + ret

4410 + .unreq OUTPUT_WIDTH

4411 + .unreq INPUT_ROW

4412 + .unreq OUTPUT_BUF

4413 + .unreq NUM_ROWS

4414 + .unreq INPUT_BUF0

4415 + .unreq INPUT_BUF1

4416 + .unreq INPUT_BUF2

4417 + .unreq RGB

4418 + .unreq Y

4419 + .unreq U

4420 + .unreq V

4421 + .unreq N

4422 +

4423 +.purgem do_yuv_to_rgb

4424 +.purgem do_yuv_to_rgb_stage1

4425 +.purgem do_yuv_to_rgb_stage2

4426 +.purgem do_yuv_to_rgb_stage2_store_load_stage1

4427 +.endm

4428 +
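For reference, the per-pixel arithmetic that the stage-1 and stage-2 macros vectorize can be written as scalar C. This is a minimal sketch under stated assumptions, not code from the patch: the helper names (clamp_u8, round_shift, ycc_to_rgb_pixel, ycc_to_rgb_row) are hypothetical, the output is assumed to be packed R, G, B (the extrgb layout), and the rounding right shifts by 15 for G and by 14 for R and B are inferred from the constants table (11277/2^15 ≈ 0.34415, 23401/2^15 ≈ 0.71414, 22971/2^14 ≈ 1.40204, 29033/2^14 ≈ 1.77203), because the stage-2 macro that performs them is not shown in this excerpt. The row function mirrors the assembly's control flow: full 8-pixel blocks, then a 4/2/1 split of the remainder selected by the `tst N, #4`, `#2`, `#1` tests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Saturate to [0, 255], like the final sqxtun narrowing step. */
static uint8_t clamp_u8(int32_t x) {
  return (uint8_t)(x < 0 ? 0 : (x > 255 ? 255 : x));
}

/* Rounding right shift by n: add 2^(n-1), then floor-divide by 2^n.
 * Written with division so it is well defined for negative inputs. */
static int32_t round_shift(int32_t x, int n) {
  assert(n >= 1 && n <= 30);
  int32_t d = (int32_t)1 << n;
  int32_t t = x + d / 2;
  return t >= 0 ? t / d : -((-t + d - 1) / d);
}

/* Convert one pixel, writing R, G, B to out[0..2] (assumed extrgb order). */
static void ycc_to_rgb_pixel(uint8_t y, uint8_t cb, uint8_t cr, uint8_t *out) {
  int32_t u = (int32_t)cb - 128;          /* uaddw with the -128 constant */
  int32_t v = (int32_t)cr - 128;
  int32_t g_acc = -11277 * u - 23401 * v; /* G terms, scaled by 2^15 */
  int32_t r_acc = 22971 * v;              /* R term, scaled by 2^14 */
  int32_t b_acc = 29033 * u;              /* B term, scaled by 2^14 */
  out[0] = clamp_u8(y + round_shift(r_acc, 14));
  out[1] = clamp_u8(y + round_shift(g_acc, 15));
  out[2] = clamp_u8(y + round_shift(b_acc, 14));
}

/* Convert one row with the same block structure as the NEON loop:
 * full 8-pixel blocks, then 4-, 2- and 1-pixel chunks for the tail. */
static void ycc_to_rgb_row(const uint8_t *y, const uint8_t *cb,
                           const uint8_t *cr, uint8_t *rgb, size_t width) {
  size_t i = 0;
  for (; i + 8 <= width; i += 8)          /* the pipelined 8-pixel loop */
    for (size_t k = 0; k < 8; k++)
      ycc_to_rgb_pixel(y[i + k], cb[i + k], cr[i + k], &rgb[3 * (i + k)]);
  size_t rem = width - i;                 /* 0..7 pixels left */
  for (size_t chunk = 4; chunk >= 1; chunk >>= 1) {
    if (rem & chunk) {                    /* tst N, #4 / #2 / #1 */
      for (size_t k = 0; k < chunk; k++)
        ycc_to_rgb_pixel(y[i + k], cb[i + k], cr[i + k], &rgb[3 * (i + k)]);
      i += chunk;
    }
  }
}
```

A scalar model like this is mainly useful as a reference when testing a SIMD path: the NEON code should produce byte-identical output for every input row.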

4429 +/*--------------------------------- id ----- bpp R rsize G gsize B bsize defsize */

4430 +generate_jsimd_ycc_rgb_convert_neon extrgb, 24, 0, .4h, 1, .4h, 2, .4h, .8b

4431 +generate_jsimd_ycc_rgb_convert_neon extbgr, 24, 2, .4h, 1, .4h, 0, .4h, .8b

4432 +generate_jsimd_ycc_rgb_convert_neon extrgbx, 32, 0, .4h, 1, .4h, 2, .4h, .8b

4433 +generate_jsimd_ycc_rgb_convert_neon extbgrx, 32, 2, .4h, 1, .4h, 0, .4h, .8b

4434 +generate_jsimd_ycc_rgb_convert_neon extxbgr, 32, 3, .4h, 2, .4h, 1, .4h, .8b

4435 +generate_jsimd_ycc_rgb_convert_neon extxrgb, 32, 1, .4h, 2, .4h, 3, .4h, .8b

4436 +generate_jsimd_ycc_rgb_convert_neon rgb565, 16, 0, .4h, 0, .4h, 0, .4h, .8b
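In the table above, the R, G and B columns select which of v10..v13 receives each channel (`sqxtun v1\r_offs\defsize` and friends), so, assuming do_store (defined earlier in the patch, not shown here) interleaves those registers into consecutive bytes, each column is the byte offset of that channel within an output pixel. The C sketch below models that reading; the struct and function names are hypothetical, and the 0xFF filler byte mirrors the v10/v13 initialization in the function body.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One table row: bytes per pixel and the byte offset of each channel.
 * Field names are hypothetical; values are copied from the table above. */
typedef struct {
  const char *id;
  int bytes_per_pixel;
  int r_offs, g_offs, b_offs;
} pixel_layout;

static const pixel_layout layouts[] = {
  {"extrgb", 3, 0, 1, 2},  {"extbgr", 3, 2, 1, 0},
  {"extrgbx", 4, 0, 1, 2}, {"extbgrx", 4, 2, 1, 0},
  {"extxbgr", 4, 3, 2, 1}, {"extxrgb", 4, 1, 2, 3},
};

/* Write one pixel; in a 4-byte layout the unused byte stays 0xFF. */
static void store_pixel(const pixel_layout *L, uint8_t r, uint8_t g,
                        uint8_t b, uint8_t *dst) {
  assert(L->bytes_per_pixel == 3 || L->bytes_per_pixel == 4);
  memset(dst, 0xFF, (size_t)L->bytes_per_pixel);
  dst[L->r_offs] = r;
  dst[L->g_offs] = g;
  dst[L->b_offs] = b;
}
```

The rgb565 row is not modeled here: its offsets are all 0 because that path packs the channels itself instead of storing them byte by byte.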

4437 +.purgem do_load

4438 +.purgem do_store
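The rgb565 branch shown earlier packs pixels without separate masking. Each `sqshlu ... #8` moves a channel into the top byte of a 16-bit lane, saturating negative values to 0 and values above 255 to 0xFFFF. Two `sri` (shift right and insert) instructions then merge the top 6 bits of G and the top 5 bits of B beneath the top 5 bits of R. The C below models those two instructions on single lanes to show that the result equals the conventional `(R >> 3) << 11 | (G >> 2) << 5 | (B >> 3)` packing. It is an illustrative sketch only: the helper names are hypothetical, and it assumes the lanes hold the 16-bit, not-yet-saturated channel values left by stage 2.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "sqshlu Vd.8h, Vn.8h, #8" on one lane: signed 16-bit input,
 * shifted left by 8 and saturated to the unsigned 16-bit range. */
static uint16_t sqshlu8(int16_t v) {
  if (v < 0) return 0;
  int32_t s = (int32_t)v << 8;
  return s > 0xFFFF ? 0xFFFF : (uint16_t)s;
}

/* Model of "sri Vd.8h, Vn.8h, #n" on one lane: keep the top n bits of dst
 * and fill the low 16 - n bits with src >> n. */
static uint16_t sri16(uint16_t dst, uint16_t src, int n) {
  assert(n >= 1 && n <= 16);
  uint16_t low_mask = (uint16_t)(0xFFFFu >> n);
  return (uint16_t)((dst & (uint16_t)~low_mask) | (src >> n));
}

/* Pack signed 16-bit R, G, B lanes the way the rgb565 path does. */
static uint16_t rgb565_neon_style(int16_t r, int16_t g, int16_t b) {
  uint16_t out = sqshlu8(r);          /* R in the top byte (v25) */
  out = sri16(out, sqshlu8(g), 5);    /* G's top 6 bits -> bits 10..5 */
  out = sri16(out, sqshlu8(b), 11);   /* B's top 5 bits -> bits 4..0 */
  return out;
}

/* Conventional reference packing for in-range 8-bit channels. */
static uint16_t rgb565_reference(uint8_t r, uint8_t g, uint8_t b) {
  return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}
```

Because saturation happens in the shift itself, the packing also clamps out-of-range channel values, so no separate sqxtun step is needed on this path.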
