OLD | NEW |
| 1 Index: README |
| 2 =================================================================== |
| 3 --- README (revision 829) |
| 4 +++ README (working copy) |
| 5 @@ -1,26 +1,26 @@ |
| 6 +libjpeg-turbo note: This file has been modified by The libjpeg-turbo Project |
| 7 +to include only information relevant to libjpeg-turbo, to wordsmith certain |
| 8 +sections, and to remove impolitic language that existed in the libjpeg v8 |
| 9 +README. It is included only for reference. Please see README-turbo.txt for |
| 10 +information specific to libjpeg-turbo. |
| 11 + |
| 12 + |
| 13 The Independent JPEG Group's JPEG software |
| 14 ========================================== |
| 15 |
| 16 -README for release 6b of 27-Mar-1998 |
| 17 -==================================== |
| 18 +This distribution contains a release of the Independent JPEG Group's free JPEG |
| 19 +software. You are welcome to redistribute this software and to use it for any |
| 20 +purpose, subject to the conditions under LEGAL ISSUES, below. |
| 21 |
| 22 -This distribution contains the sixth public release of the Independent JPEG |
| 23 -Group's free JPEG software. You are welcome to redistribute this software and |
| 24 -to use it for any purpose, subject to the conditions under LEGAL ISSUES, below. |
| 25 +This software is the work of Tom Lane, Guido Vollbeding, Philip Gladstone, |
| 26 +Bill Allombert, Jim Boucher, Lee Crocker, Bob Friesenhahn, Ben Jackson, |
| 27 +Julian Minguillon, Luis Ortiz, George Phillips, Davide Rossi, Ge' Weijers, |
| 28 +and other members of the Independent JPEG Group. |
| 29 |
| 30 -Serious users of this software (particularly those incorporating it into |
| 31 -larger programs) should contact IJG at jpeg-info@uunet.uu.net to be added to |
| 32 -our electronic mailing list. Mailing list members are notified of updates |
| 33 -and have a chance to participate in technical discussions, etc. |
| 34 +IJG is not affiliated with the ISO/IEC JTC1/SC29/WG1 standards committee |
| 35 +(also known as JPEG, together with ITU-T SG16). |
| 36 |
| 37 -This software is the work of Tom Lane, Philip Gladstone, Jim Boucher, |
| 38 -Lee Crocker, Julian Minguillon, Luis Ortiz, George Phillips, Davide Rossi, |
| 39 -Guido Vollbeding, Ge' Weijers, and other members of the Independent JPEG |
| 40 -Group. |
| 41 |
| 42 -IJG is not affiliated with the official ISO JPEG standards committee. |
| 43 - |
| 44 - |
| 45 DOCUMENTATION ROADMAP |
| 46 ===================== |
| 47 |
| 48 @@ -30,7 +30,6 @@ |
| 49 LEGAL ISSUES Copyright, lack of warranty, terms of distribution. |
| 50 REFERENCES Where to learn more about JPEG. |
| 51 ARCHIVE LOCATIONS Where to find newer versions of this software. |
| 52 -RELATED SOFTWARE Other stuff you should get. |
| 53 FILE FORMAT WARS Software *not* to get. |
| 54 TO DO Plans for future IJG releases. |
| 55 |
| 56 @@ -37,20 +36,19 @@ |
| 57 Other documentation files in the distribution are: |
| 58 |
| 59 User documentation: |
| 60 - install.doc How to configure and install the IJG software. |
| 61 - usage.doc Usage instructions for cjpeg, djpeg, jpegtran, |
| 62 + install.txt How to configure and install the IJG software. |
| 63 + usage.txt Usage instructions for cjpeg, djpeg, jpegtran, |
| 64 rdjpgcom, and wrjpgcom. |
| 65 - *.1 Unix-style man pages for programs (same info as usage.doc). |
| 66 - wizard.doc Advanced usage instructions for JPEG wizards only. |
| 67 + *.1 Unix-style man pages for programs (same info as usage.txt). |
| 68 + wizard.txt Advanced usage instructions for JPEG wizards only. |
| 69 change.log Version-to-version change highlights. |
| 70 Programmer and internal documentation: |
| 71 - libjpeg.doc How to use the JPEG library in your own programs. |
| 72 + libjpeg.txt How to use the JPEG library in your own programs. |
| 73 example.c Sample code for calling the JPEG library. |
| 74 - structure.doc Overview of the JPEG library's internal structure. |
| 75 - filelist.doc Road map of IJG files. |
| 76 - coderules.doc Coding style rules --- please read if you contribute code. |
| 77 + structure.txt Overview of the JPEG library's internal structure. |
| 78 + coderules.txt Coding style rules --- please read if you contribute code. |
| 79 |
| 80 -Please read at least the files install.doc and usage.doc. Useful information |
| 81 +Please read at least the files install.txt and usage.txt. Some information |
| 82 can also be found in the JPEG FAQ (Frequently Asked Questions) article. See |
| 83 ARCHIVE LOCATIONS below to find out where to obtain the FAQ article. |
| 84 |
| 85 @@ -62,24 +60,27 @@ |
| 86 OVERVIEW |
| 87 ======== |
| 88 |
| 89 -This package contains C software to implement JPEG image compression and |
| 90 -decompression. JPEG (pronounced "jay-peg") is a standardized compression |
| 91 -method for full-color and gray-scale images. JPEG is intended for compressing |
| 92 -"real-world" scenes; line drawings, cartoons and other non-realistic images |
| 93 -are not its strong suit. JPEG is lossy, meaning that the output image is not |
| 94 -exactly identical to the input image. Hence you must not use JPEG if you |
| 95 -have to have identical output bits. However, on typical photographic images, |
| 96 -very good compression levels can be obtained with no visible change, and |
| 97 -remarkably high compression levels are possible if you can tolerate a |
| 98 -low-quality image. For more details, see the references, or just experiment |
| 99 -with various compression settings. |
| 100 +This package contains C software to implement JPEG image encoding, decoding, |
| 101 +and transcoding. JPEG (pronounced "jay-peg") is a standardized compression |
| 102 +method for full-color and gray-scale images. JPEG's strong suit is compressing |
| 103 +photographic images or other types of images that have smooth color and |
| 104 +brightness transitions between neighboring pixels. Images with sharp lines or |
| 105 +other abrupt features may not compress well with JPEG, and a higher JPEG |
| 106 +quality may have to be used to avoid visible compression artifacts with such |
| 107 +images. |
| 108 |
| 109 +JPEG is lossy, meaning that the output pixels are not necessarily identical to |
| 110 +the input pixels. However, on photographic content and other "smooth" images, |
| 111 +very good compression ratios can be obtained with no visible compression |
| 112 +artifacts, and extremely high compression ratios are possible if you are |
| 113 +willing to sacrifice image quality (by reducing the "quality" setting in the |
| 114 +compressor.) |
| 115 + |
| 116 This software implements JPEG baseline, extended-sequential, and progressive |
| 117 compression processes. Provision is made for supporting all variants of these |
| 118 processes, although some uncommon parameter settings aren't implemented yet. |
| 119 -For legal reasons, we are not distributing code for the arithmetic-coding |
| 120 -variants of JPEG; see LEGAL ISSUES. We have made no provision for supporting |
| 121 -the hierarchical or lossless processes defined in the standard. |
| 122 +We have made no provision for supporting the hierarchical or lossless |
| 123 +processes defined in the standard. |
| 124 |
| 125 We provide a set of library routines for reading and writing JPEG image files, |
| 126 plus two sample applications "cjpeg" and "djpeg", which use the library to |
| 127 @@ -91,11 +92,12 @@ |
| 128 for example, the color quantization modules are not strictly part of JPEG |
| 129 decoding, but they are essential for output to colormapped file formats or |
| 130 colormapped displays. These extra functions can be compiled out of the |
| 131 -library if not required for a particular application. We have also included |
| 132 -"jpegtran", a utility for lossless transcoding between different JPEG |
| 133 -processes, and "rdjpgcom" and "wrjpgcom", two simple applications for |
| 134 -inserting and extracting textual comments in JFIF files. |
| 135 +library if not required for a particular application. |
| 136 |
| 137 +We have also included "jpegtran", a utility for lossless transcoding between |
| 138 +different JPEG processes, and "rdjpgcom" and "wrjpgcom", two simple |
| 139 +applications for inserting and extracting textual comments in JFIF files. |
| 140 + |
| 141 The emphasis in designing this software has been on achieving portability and |
| 142 flexibility, while also making it fast enough to be useful. In particular, |
| 143 the software is not intended to be read as a tutorial on JPEG. (See the |
| 144 @@ -127,7 +129,7 @@ |
| 145 fitness for a particular purpose. This software is provided "AS IS", and you, |
| 146 its user, assume the entire risk as to its quality and accuracy. |
| 147 |
| 148 -This software is copyright (C) 1991-1998, Thomas G. Lane. |
| 149 +This software is copyright (C) 1991-2012, Thomas G. Lane, Guido Vollbeding. |
| 150 All Rights Reserved except as specified below. |
| 151 |
| 152 Permission is hereby granted to use, copy, modify, and distribute this |
| 153 @@ -158,30 +160,12 @@ |
| 154 assumed by the product vendor. |
| 155 |
| 156 |
| 157 -ansi2knr.c is included in this distribution by permission of L. Peter Deutsch, |
| 158 -sole proprietor of its copyright holder, Aladdin Enterprises of Menlo Park, CA. |
| 159 -ansi2knr.c is NOT covered by the above copyright and conditions, but instead |
| 160 -by the usual distribution terms of the Free Software Foundation; principally, |
| 161 -that you must include source code if you redistribute it. (See the file |
| 162 -ansi2knr.c for full details.) However, since ansi2knr.c is not needed as part |
| 163 -of any program generated from the IJG code, this does not limit you more than |
| 164 -the foregoing paragraphs do. |
| 165 - |
| 166 The Unix configuration script "configure" was produced with GNU Autoconf. |
| 167 It is copyright by the Free Software Foundation but is freely distributable. |
| 168 The same holds for its supporting scripts (config.guess, config.sub, |
| 169 -ltconfig, ltmain.sh). Another support script, install-sh, is copyright |
| 170 -by M.I.T. but is also freely distributable. |
| 171 +ltmain.sh). Another support script, install-sh, is copyright by X Consortium |
| 172 +but is also freely distributable. |
| 173 |
| 174 -It appears that the arithmetic coding option of the JPEG spec is covered by |
| 175 -patents owned by IBM, AT&T, and Mitsubishi. Hence arithmetic coding cannot |
| 176 -legally be used without obtaining one or more licenses. For this reason, |
| 177 -support for arithmetic coding has been removed from the free JPEG software. |
| 178 -(Since arithmetic coding provides only a marginal gain over the unpatented |
| 179 -Huffman mode, it is unlikely that very many implementations will support it.) |
| 180 -So far as we are aware, there are no patent restrictions on the remaining |
| 181 -code. |
| 182 - |
| 183 The IJG distribution formerly included code to read and write GIF files. |
| 184 To avoid entanglement with the Unisys LZW patent, GIF reading support has |
| 185 been removed altogether, and the GIF writer has been simplified to produce |
| 186 @@ -198,7 +182,7 @@ |
| 187 REFERENCES |
| 188 ========== |
| 189 |
| 190 -We highly recommend reading one or more of these references before trying to |
| 191 +We recommend reading one or more of these references before trying to |
| 192 understand the innards of the JPEG software. |
| 193 |
| 194 The best short technical introduction to the JPEG compression algorithm is |
| 195 @@ -207,7 +191,7 @@ |
| 196 (Adjacent articles in that issue discuss MPEG motion picture compression, |
| 197 applications of JPEG, and related topics.) If you don't have the CACM issue |
| 198 handy, a PostScript file containing a revised version of Wallace's article is |
| 199 -available at ftp://ftp.uu.net/graphics/jpeg/wallace.ps.gz. The file (actually |
| 200 +available at http://www.ijg.org/files/wallace.ps.gz. The file (actually |
| 201 a preprint for an article that appeared in IEEE Trans. Consumer Electronics) |
| 202 omits the sample images that appeared in CACM, but it includes corrections |
| 203 and some added material. Note: the Wallace article is copyright ACM and IEEE, |
| 204 @@ -222,45 +206,29 @@ |
| 205 sample code is far from industrial-strength, but when you are ready to look |
| 206 at a full implementation, you've got one here... |
| 207 |
| 208 -The best full description of JPEG is the textbook "JPEG Still Image Data |
| 209 -Compression Standard" by William B. Pennebaker and Joan L. Mitchell, published |
| 210 -by Van Nostrand Reinhold, 1993, ISBN 0-442-01272-1. Price US$59.95, 638 pp. |
| 211 -The book includes the complete text of the ISO JPEG standards (DIS 10918-1 |
| 212 -and draft DIS 10918-2). This is by far the most complete exposition of JPEG |
| 213 -in existence, and we highly recommend it. |
| 214 +The best currently available description of JPEG is the textbook "JPEG Still |
| 215 +Image Data Compression Standard" by William B. Pennebaker and Joan L. |
| 216 +Mitchell, published by Van Nostrand Reinhold, 1993, ISBN 0-442-01272-1. |
| 217 +Price US$59.95, 638 pp. The book includes the complete text of the ISO JPEG |
| 218 +standards (DIS 10918-1 and draft DIS 10918-2). |
| 219 |
| 220 -The JPEG standard itself is not available electronically; you must order a |
| 221 -paper copy through ISO or ITU. (Unless you feel a need to own a certified |
| 222 -official copy, we recommend buying the Pennebaker and Mitchell book instead; |
| 223 -it's much cheaper and includes a great deal of useful explanatory material.) |
| 224 -In the USA, copies of the standard may be ordered from ANSI Sales at (212) |
| 225 -642-4900, or from Global Engineering Documents at (800) 854-7179. (ANSI |
| 226 -doesn't take credit card orders, but Global does.) It's not cheap: as of |
| 227 -1992, ANSI was charging $95 for Part 1 and $47 for Part 2, plus 7% |
| 228 -shipping/handling. The standard is divided into two parts, Part 1 being the |
| 229 -actual specification, while Part 2 covers compliance testing methods. Part 1 |
| 230 -is titled "Digital Compression and Coding of Continuous-tone Still Images, |
| 231 +The original JPEG standard is divided into two parts, Part 1 being the actual |
| 232 +specification, while Part 2 covers compliance testing methods. Part 1 is |
| 233 +titled "Digital Compression and Coding of Continuous-tone Still Images, |
| 234 Part 1: Requirements and guidelines" and has document numbers ISO/IEC IS |
| 235 10918-1, ITU-T T.81. Part 2 is titled "Digital Compression and Coding of |
| 236 Continuous-tone Still Images, Part 2: Compliance testing" and has document |
| 237 numbers ISO/IEC IS 10918-2, ITU-T T.83. |
| 238 |
| 239 -Some extensions to the original JPEG standard are defined in JPEG Part 3, |
| 240 -a newer ISO standard numbered ISO/IEC IS 10918-3 and ITU-T T.84. IJG |
| 241 -currently does not support any Part 3 extensions. |
| 242 - |
| 243 The JPEG standard does not specify all details of an interchangeable file |
| 244 format. For the omitted details we follow the "JFIF" conventions, revision |
| 245 -1.02. A copy of the JFIF spec is available from: |
| 246 - Literature Department |
| 247 - C-Cube Microsystems, Inc. |
| 248 - 1778 McCarthy Blvd. |
| 249 - Milpitas, CA 95035 |
| 250 - phone (408) 944-6300, fax (408) 944-6314 |
| 251 -A PostScript version of this document is available by FTP at |
| 252 -ftp://ftp.uu.net/graphics/jpeg/jfif.ps.gz. There is also a plain text |
| 253 -version at ftp://ftp.uu.net/graphics/jpeg/jfif.txt.gz, but it is missing |
| 254 -the figures. |
| 255 +1.02. JFIF 1.02 has been adopted as an Ecma International Technical Report |
| 256 +and thus received a formal publication status. It is available as a free |
| 257 +download in PDF format from |
| 258 +http://www.ecma-international.org/publications/techreports/E-TR-098.htm. |
| 259 +A PostScript version of the JFIF document is available at |
| 260 +http://www.ijg.org/files/jfif.ps.gz. There is also a plain text version at |
| 261 +http://www.ijg.org/files/jfif.txt.gz, but it is missing the figures. |
| 262 |
| 263 The TIFF 6.0 file format specification can be obtained by FTP from |
| 264 ftp://ftp.sgi.com/graphics/tiff/TIFF6.ps.gz. The JPEG incorporation scheme |
| 265 @@ -267,37 +235,24 @@ |
| 266 found in the TIFF 6.0 spec of 3-June-92 has a number of serious problems. |
| 267 IJG does not recommend use of the TIFF 6.0 design (TIFF Compression tag 6). |
| 268 Instead, we recommend the JPEG design proposed by TIFF Technical Note #2 |
| 269 -(Compression tag 7). Copies of this Note can be obtained from ftp.sgi.com or |
| 270 -from ftp://ftp.uu.net/graphics/jpeg/. It is expected that the next revision |
| 271 +(Compression tag 7). Copies of this Note can be obtained from |
| 272 +http://www.ijg.org/files/. It is expected that the next revision |
| 273 of the TIFF spec will replace the 6.0 JPEG design with the Note's design. |
| 274 Although IJG's own code does not support TIFF/JPEG, the free libtiff library |
| 275 -uses our library to implement TIFF/JPEG per the Note. libtiff is available |
| 276 -from ftp://ftp.sgi.com/graphics/tiff/. |
| 277 +uses our library to implement TIFF/JPEG per the Note. |
| 278 |
| 279 |
| 280 ARCHIVE LOCATIONS |
| 281 ================= |
| 282 |
| 283 -The "official" archive site for this software is ftp.uu.net (Internet |
| 284 -address 192.48.96.9). The most recent released version can always be found |
| 285 -there in directory graphics/jpeg. This particular version will be archived |
| 286 -as ftp://ftp.uu.net/graphics/jpeg/jpegsrc.v6b.tar.gz. If you don't have |
| 287 -direct Internet access, UUNET's archives are also available via UUCP; contact |
| 288 -help@uunet.uu.net for information on retrieving files that way. |
| 289 +The "official" archive site for this software is www.ijg.org. |
| 290 +The most recent released version can always be found there in |
| 291 +directory "files". This particular version will be archived as |
| 292 +http://www.ijg.org/files/jpegsrc.v8d.tar.gz, and in Windows-compatible |
| 293 +"zip" archive format as http://www.ijg.org/files/jpegsr8d.zip. |
| 294 |
| 295 -Numerous Internet sites maintain copies of the UUNET files. However, only |
| 296 -ftp.uu.net is guaranteed to have the latest official version. |
| 297 - |
| 298 -You can also obtain this software in DOS-compatible "zip" archive format from |
| 299 -the SimTel archives (ftp://ftp.simtel.net/pub/simtelnet/msdos/graphics/), or |
| 300 -on CompuServe in the Graphics Support forum (GO CIS:GRAPHSUP), library 12 |
| 301 -"JPEG Tools". Again, these versions may sometimes lag behind the ftp.uu.net |
| 302 -release. |
| 303 - |
| 304 -The JPEG FAQ (Frequently Asked Questions) article is a useful source of |
| 305 -general information about JPEG. It is updated constantly and therefore is |
| 306 -not included in this distribution. The FAQ is posted every two weeks to |
| 307 -Usenet newsgroups comp.graphics.misc, news.answers, and other groups. |
| 308 +The JPEG FAQ (Frequently Asked Questions) article is a source of some |
| 309 +general information about JPEG. |
| 310 It is available on the World Wide Web at http://www.faqs.org/faqs/jpeg-faq/ |
| 311 and other news.answers archive sites, including the official news.answers |
| 312 archive at rtfm.mit.edu: ftp://rtfm.mit.edu/pub/usenet/news.answers/jpeg-faq/. |
| 313 @@ -307,79 +262,21 @@ |
| 314 send usenet/news.answers/jpeg-faq/part2 |
| 315 |
| 316 |
| 317 -RELATED SOFTWARE |
| 318 -================ |
| 319 - |
| 320 -Numerous viewing and image manipulation programs now support JPEG. (Quite a |
| 321 -few of them use this library to do so.) The JPEG FAQ described above lists |
| 322 -some of the more popular free and shareware viewers, and tells where to |
| 323 -obtain them on Internet. |
| 324 - |
| 325 -If you are on a Unix machine, we highly recommend Jef Poskanzer's free |
| 326 -PBMPLUS software, which provides many useful operations on PPM-format image |
| 327 -files. In particular, it can convert PPM images to and from a wide range of |
| 328 -other formats, thus making cjpeg/djpeg considerably more useful. The latest |
| 329 -version is distributed by the NetPBM group, and is available from numerous |
| 330 -sites, notably ftp://wuarchive.wustl.edu/graphics/graphics/packages/NetPBM/. |
| 331 -Unfortunately PBMPLUS/NETPBM is not nearly as portable as the IJG software is; |
| 332 -you are likely to have difficulty making it work on any non-Unix machine. |
| 333 - |
| 334 -A different free JPEG implementation, written by the PVRG group at Stanford, |
| 335 -is available from ftp://havefun.stanford.edu/pub/jpeg/. This program |
| 336 -is designed for research and experimentation rather than production use; |
| 337 -it is slower, harder to use, and less portable than the IJG code, but it |
| 338 -is easier to read and modify. Also, the PVRG code supports lossless JPEG, |
| 339 -which we do not. (On the other hand, it doesn't do progressive JPEG.) |
| 340 - |
| 341 - |
| 342 FILE FORMAT WARS |
| 343 ================ |
| 344 |
| 345 -Some JPEG programs produce files that are not compatible with our library. |
| 346 -The root of the problem is that the ISO JPEG committee failed to specify a |
| 347 -concrete file format. Some vendors "filled in the blanks" on their own, |
| 348 -creating proprietary formats that no one else could read. (For example, none |
| 349 -of the early commercial JPEG implementations for the Macintosh were able to |
| 350 -exchange compressed files.) |
| 351 +The ISO/IEC JTC1/SC29/WG1 standards committee (also known as JPEG, together |
| 352 +with ITU-T SG16) currently promotes different formats containing the name |
| 353 +"JPEG" which are incompatible with original DCT-based JPEG. IJG therefore does |
| 354 +not support these formats (see REFERENCES). Indeed, one of the original |
| 355 +reasons for developing this free software was to help force convergence on |
| 356 +common, interoperable format standards for JPEG files. |
| 357 +Don't use an incompatible file format! |
| 358 +(In any case, our decoder will remain capable of reading existing JPEG |
| 359 +image files indefinitely.) |
| 360 |
| 361 -The file format we have adopted is called JFIF (see REFERENCES). This format |
| 362 -has been agreed to by a number of major commercial JPEG vendors, and it has |
| 363 -become the de facto standard. JFIF is a minimal or "low end" representation. |
| 364 -We recommend the use of TIFF/JPEG (TIFF revision 6.0 as modified by TIFF |
| 365 -Technical Note #2) for "high end" applications that need to record a lot of |
| 366 -additional data about an image. TIFF/JPEG is fairly new and not yet widely |
| 367 -supported, unfortunately. |
| 368 |
| 369 -The upcoming JPEG Part 3 standard defines a file format called SPIFF. |
| 370 -SPIFF is interoperable with JFIF, in the sense that most JFIF decoders should |
| 371 -be able to read the most common variant of SPIFF. SPIFF has some technical |
| 372 -advantages over JFIF, but its major claim to fame is simply that it is an |
| 373 -official standard rather than an informal one. At this point it is unclear |
| 374 -whether SPIFF will supersede JFIF or whether JFIF will remain the de-facto |
| 375 -standard. IJG intends to support SPIFF once the standard is frozen, but we |
| 376 -have not decided whether it should become our default output format or not. |
| 377 -(In any case, our decoder will remain capable of reading JFIF indefinitely.) |
| 378 - |
| 379 -Various proprietary file formats incorporating JPEG compression also exist. |
| 380 -We have little or no sympathy for the existence of these formats. Indeed, |
| 381 -one of the original reasons for developing this free software was to help |
| 382 -force convergence on common, open format standards for JPEG files. Don't |
| 383 -use a proprietary file format! |
| 384 - |
| 385 - |
| 386 TO DO |
| 387 ===== |
| 388 |
| 389 -The major thrust for v7 will probably be improvement of visual quality. |
| 390 -The current method for scaling the quantization tables is known not to be |
| 391 -very good at low Q values. We also intend to investigate block boundary |
| 392 -smoothing, "poor man's variable quantization", and other means of improving |
| 393 -quality-vs-file-size performance without sacrificing compatibility. |
| 394 - |
| 395 -In future versions, we are considering supporting some of the upcoming JPEG |
| 396 -Part 3 extensions --- principally, variable quantization and the SPIFF file |
| 397 -format. |
| 398 - |
| 399 -As always, speeding things up is of great interest. |
| 400 - |
| 401 -Please send bug reports, offers of help, etc. to jpeg-info@uunet.uu.net. |
| 402 +Please send bug reports, offers of help, etc. to jpeg-info@jpegclub.org. |
| 403 Index: bmp.c |
| 404 =================================================================== |
| 405 --- bmp.c (revision 829) |
| 406 +++ bmp.c (working copy) |
| 407 @@ -1,370 +1,274 @@ |
| 408 -/* Copyright (C)2004 Landmark Graphics Corporation |
| 409 - * Copyright (C)2005 Sun Microsystems, Inc. |
| 410 +/* |
| 411 + * Copyright (C)2011 D. R. Commander. All Rights Reserved. |
| 412 * |
| 413 - * This library is free software and may be redistributed and/or modified under |
| 414 - * the terms of the wxWindows Library License, Version 3.1 or (at your option) |
| 415 - * any later version. The full license is in the LICENSE.txt file included |
| 416 - * with this distribution. |
| 417 + * Redistribution and use in source and binary forms, with or without |
| 418 + * modification, are permitted provided that the following conditions are met: |
| 419 * |
| 420 - * This library is distributed in the hope that it will be useful, |
| 421 - * but WITHOUT ANY WARRANTY; without even the implied warranty of |
| 422 - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
| 423 - * wxWindows Library License for more details. |
| 424 -*/ |
| 425 + * - Redistributions of source code must retain the above copyright notice, |
| 426 + * this list of conditions and the following disclaimer. |
| 427 + * - Redistributions in binary form must reproduce the above copyright notice, |
| 428 + * this list of conditions and the following disclaimer in the documentation |
| 429 + * and/or other materials provided with the distribution. |
| 430 + * - Neither the name of the libjpeg-turbo Project nor the names of its |
| 431 + * contributors may be used to endorse or promote products derived from this |
| 432 + * software without specific prior written permission. |
| 433 + * |
| 434 + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS", |
| 435 + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE |
| 436 + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE |
| 437 + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE |
| 438 + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR |
| 439 + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF |
| 440 + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS |
| 441 + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN |
| 442 + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) |
| 443 + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE |
| 444 + * POSSIBILITY OF SUCH DAMAGE. |
| 445 + */ |
| 446 |
| 447 -#include <fcntl.h> |
| 448 -#include <sys/types.h> |
| 449 -#include <sys/stat.h> |
| 450 -#include <errno.h> |
| 451 -#include <stdlib.h> |
| 452 #include <stdio.h> |
| 453 #include <string.h> |
| 454 -#ifdef _WIN32 |
| 455 - #include <io.h> |
| 456 -#else |
| 457 - #include <unistd.h> |
| 458 -#endif |
| 459 -#include "./rrutil.h" |
| 460 -#include "./bmp.h" |
| 461 +#include <setjmp.h> |
| 462 +#include <errno.h> |
| 463 +#include "cdjpeg.h" |
| 464 +#include <jpeglib.h> |
| 465 +#include <jpegint.h> |
| 466 +#include "tjutil.h" |
| 467 +#include "bmp.h" |
| 468 |
| 469 -#ifndef BI_BITFIELDS |
| 470 -#define BI_BITFIELDS 3L |
| 471 -#endif |
| 472 -#ifndef BI_RGB |
| 473 -#define BI_RGB 0L |
| 474 -#endif |
| 475 |
| 476 -#define BMPHDRSIZE 54 |
| 477 -typedef struct _bmphdr |
| 478 -{ |
| 479 - unsigned short bfType; |
| 480 - unsigned int bfSize; |
| 481 - unsigned short bfReserved1, bfReserved2; |
| 482 - unsigned int bfOffBits; |
| 483 +/* This duplicates the functionality of the VirtualGL bitmap library using |
| 484 + the components from cjpeg and djpeg */ |
| 485 |
| 486 - unsigned int biSize; |
| 487 - int biWidth, biHeight; |
| 488 - unsigned short biPlanes, biBitCount; |
| 489 - unsigned int biCompression, biSizeImage; |
| 490 - int biXPelsPerMeter, biYPelsPerMeter; |
| 491 - unsigned int biClrUsed, biClrImportant; |
| 492 -} bmphdr; |
| 493 |
| 494 -static const char *__bmperr="No error"; |
| 495 +/* Error handling (based on example in example.c) */ |
| 496 |
| 497 -static const int ps[BMPPIXELFORMATS]={3, 4, 3, 4, 4, 4}; |
| 498 -static const int roffset[BMPPIXELFORMATS]={0, 0, 2, 2, 3, 1}; |
| 499 -static const int goffset[BMPPIXELFORMATS]={1, 1, 1, 1, 2, 2}; |
| 500 -static const int boffset[BMPPIXELFORMATS]={2, 2, 0, 0, 1, 3}; |
| 501 +static char errStr[JMSG_LENGTH_MAX]="No error"; |
| 502 |
| 503 -#define _throw(m) {__bmperr=m; retcode=-1; goto finally;} |
| 504 -#define _unix(f) {if((f)==-1) _throw(strerror(errno));} |
| 505 -#define _catch(f) {if((f)==-1) {retcode=-1; goto finally;}} |
| 506 +struct my_error_mgr |
| 507 +{ |
| 508 + struct jpeg_error_mgr pub; |
| 509 + jmp_buf setjmp_buffer; |
| 510 +}; |
| 511 +typedef struct my_error_mgr *my_error_ptr; |
| 512 |
| 513 -#define readme(fd, addr, size) \ |
| 514 - if((bytesread=read(fd, addr, (size)))==-1) _throw(strerror(errno)); \ |
| 515 - if(bytesread!=(size)) _throw("Read error"); |
| 516 - |
| 517 -void pixelconvert(unsigned char *srcbuf, enum BMPPIXELFORMAT srcformat, |
| 518 - int srcpitch, unsigned char *dstbuf, enum BMPPIXELFORMAT dstformat, int dstpitch, |
| 519 - int w, int h, int flip) |
| 520 +static void my_error_exit(j_common_ptr cinfo) |
| 521 { |
| 522 - unsigned char *srcptr, *srcptr0, *dstptr, *dstptr0; |
| 523 - int i, j; |
| 524 - |
| 525 - srcptr=flip? &srcbuf[srcpitch*(h-1)]:srcbuf; |
| 526 - for(j=0, dstptr=dstbuf; j<h; j++, |
| 527 - srcptr+=flip? -srcpitch:srcpitch, dstptr+=dstpitch) |
| 528 - { |
| 529 - for(i=0, srcptr0=srcptr, dstptr0=dstptr; i<w; i++, |
| 530 - srcptr0+=ps[srcformat], dstptr0+=ps[dstformat]) |
| 531 - { |
| 532 - dstptr0[roffset[dstformat]]=srcptr0[roffset[srcformat]]; |
| 533 - dstptr0[goffset[dstformat]]=srcptr0[goffset[srcformat]]; |
| 534 - dstptr0[boffset[dstformat]]=srcptr0[boffset[srcformat]]; |
| 535 - } |
| 536 - } |
| 537 + my_error_ptr myerr=(my_error_ptr)cinfo->err; |
| 538 + (*cinfo->err->output_message)(cinfo); |
| 539 + longjmp(myerr->setjmp_buffer, 1); |
| 540 } |
| 541 |
| 542 -int loadppm(int *fd, unsigned char **buf, int *w, int *h, |
| 543 - enum BMPPIXELFORMAT f, int align, int dstbottomup, int ascii) |
| 544 +/* Based on output_message() in jerror.c */ |
| 545 + |
| 546 +static void my_output_message(j_common_ptr cinfo) |
| 547 { |
| 548 - FILE *fs=NULL; int retcode=0, scalefactor, dstpitch; |
| 549 - unsigned char *tempbuf=NULL; char temps[255], temps2[255]; |
| 550 - int numread=0, totalread=0, pixel[3], i, j; |
| 551 + (*cinfo->err->format_message)(cinfo, errStr); |
| 552 +} |
| 553 |
| 554 - if((fs=fdopen(*fd, "r"))==NULL) _throw(strerror(errno)); |
| 555 +#define _throw(m) {snprintf(errStr, JMSG_LENGTH_MAX, "%s", m); \ |
| 556 + retval=-1; goto bailout;} |
| 557 +#define _throwunix(m) {snprintf(errStr, JMSG_LENGTH_MAX, "%s\n%s", m, \ |
| 558 + strerror(errno)); retval=-1; goto bailout;} |
| 559 |
| 560 - do |
| 561 - { |
| 562 - if(!fgets(temps, 255, fs)) _throw("Read error"); |
| 563 - if(strlen(temps)==0 || temps[0]=='\n') continue; |
| 564 - if(sscanf(temps, "%s", temps2)==1 && temps2[1]=='#') continue; |
| 565 - switch(totalread) |
| 566 - { |
| 567 - case 0: |
| 568 - if((numread=sscanf(temps, "%d %d %d", w, h, &scalefactor))==EOF) |
| 569 - _throw("Read error"); |
| 570 - break; |
| 571 - case 1: |
| 572 - if((numread=sscanf(temps, "%d %d", h, &scalefactor))==EOF) |
| 573 - _throw("Read error"); |
| 574 - break; |
| 575 - case 2: |
| 576 - if((numread=sscanf(temps, "%d", &scalefactor))==EOF) |
| 577 - _throw("Read error"); |
| 578 - break; |
| 579 - } |
| 580 - totalread+=numread; |
| 581 - } while(totalread<3); |
| 582 - if((*w)<1 || (*h)<1 || scalefactor<1) _throw("Corrupt PPM header"); |
| 583 |
| 584 - dstpitch=(((*w)*ps[f])+(align-1))&(~(align-1)); |
| 585 - if((*buf=(unsigned char *)malloc(dstpitch*(*h)))==NULL) |
| 586 - _throw("Memory allocation error"); |
| 587 - if(ascii) |
| 588 +static void pixelconvert(unsigned char *srcbuf, int srcpf, int srcbottomup, |
| 589 + unsigned char *dstbuf, int dstpf, int dstbottomup, int w, int h) |
| 590 +{ |
| 591 + unsigned char *srcptr=srcbuf, *srcptr2; |
| 592 + int srcps=tjPixelSize[srcpf]; |
| 593 + int srcstride=srcbottomup? -w*srcps:w*srcps; |
| 594 + unsigned char *dstptr=dstbuf, *dstptr2; |
| 595 + int dstps=tjPixelSize[dstpf]; |
| 596 + int dststride=dstbottomup? -w*dstps:w*dstps; |
| 597 + int row, col; |
| 598 + |
| 599 + if(srcbottomup) srcptr=&srcbuf[w*srcps*(h-1)]; |
| 600 + if(dstbottomup) dstptr=&dstbuf[w*dstps*(h-1)]; |
| 601 + for(row=0; row<h; row++, srcptr+=srcstride, dstptr+=dststride) |
| 602 { |
| 603 - for(j=0; j<*h; j++) |
| 604 + for(col=0, srcptr2=srcptr, dstptr2=dstptr; col<w; col++, srcptr2+=srcps, |
| 605 + dstptr2+=dstps) |
| 606 { |
| 607 - for(i=0; i<*w; i++) |
| 608 - { |
| 609 - if(fscanf(fs, "%d%d%d", &pixel[0], &pixel[1], &pixel[2])!=3) |
| 610 - _throw("Read error"); |
| 611 - (*buf)[j*dstpitch+i*ps[f]+roffset[f]]=(unsigned char)(pixel[0]*255/scalefactor); |
| 612 - (*buf)[j*dstpitch+i*ps[f]+goffset[f]]=(unsigned char)(pixel[1]*255/scalefactor); |
| 613 - (*buf)[j*dstpitch+i*ps[f]+boffset[f]]=(unsigned char)(pixel[2]*255/scalefactor); |
| 614 - } |
| 615 + dstptr2[tjRedOffset[dstpf]]=srcptr2[tjRedOffset[srcpf]]; |
| 616 + dstptr2[tjGreenOffset[dstpf]]=srcptr2[tjGreenOffset[srcpf]]; |
| 617 + dstptr2[tjBlueOffset[dstpf]]=srcptr2[tjBlueOffset[srcpf]]; |
| 618 } |
| 619 } |
| 620 - else |
| 621 - { |
| 622 - if(scalefactor!=255) |
| 623 - _throw("Binary PPMs must have 8-bit components"); |
| 624 - if((tempbuf=(unsigned char *)malloc((*w)*(*h)*3))==NULL) |
| 625 - _throw("Memory allocation error"); |
| 626 - if(fread(tempbuf, (*w)*(*h)*3, 1, fs)!=1) _throw("Read error"); |
| 627 - pixelconvert(tempbuf, BMP_RGB, (*w)*3, *buf, f, dstpitch, *w, *h, dstbottomup); |
| 628 - } |
| 629 - |
| 630 - finally: |
| 631 - if(fs) {fclose(fs); *fd=-1;} |
| 632 - if(tempbuf) free(tempbuf); |
| 633 - return retcode; |
| 634 } |
| 635 |
| 636 |
| 637 int loadbmp(char *filename, unsigned char **buf, int *w, int *h, |
| 638 - enum BMPPIXELFORMAT f, int align, int dstbottomup) |
| 639 + int dstpf, int bottomup) |
| 640 { |
| 641 - int fd=-1, bytesread, srcpitch, srcbottomup=1, srcps, dstpitch, |
| 642 - retcode=0; |
| 643 - unsigned char *tempbuf=NULL; |
| 644 - bmphdr bh; int flags=O_RDONLY; |
| 645 + int retval=0, dstps, srcpf, tempc; |
| 646 + struct jpeg_compress_struct cinfo; |
| 647 + struct my_error_mgr jerr; |
| 648 + cjpeg_source_ptr src; |
| 649 + FILE *file=NULL; |
| 650 |
| 651 - dstbottomup=dstbottomup? 1:0; |
| 652 - #ifdef _WIN32 |
| 653 - flags|=O_BINARY; |
| 654 - #endif |
| 655 - if(!filename || !buf || !w || !h || f<0 || f>BMPPIXELFORMATS-1 || align<1) |
| 656 - _throw("invalid argument to loadbmp()"); |
| 657 - if((align&(align-1))!=0) |
| 658 - _throw("Alignment must be a power of 2"); |
| 659 - _unix(fd=open(filename, flags)); |
| 660 + memset(&cinfo, 0, sizeof(struct jpeg_compress_struct)); |
| 661 |
| 662 - readme(fd, &bh.bfType, sizeof(unsigned short)); |
| 663 - if(!littleendian()) bh.bfType=byteswap16(bh.bfType); |
| 664 + if(!filename || !buf || !w || !h || dstpf<0 || dstpf>=TJ_NUMPF) |
| 665 + _throw("loadbmp(): Invalid argument"); |
| 666 |
| 667 - if(bh.bfType==0x3650) |
| 668 + if((file=fopen(filename, "rb"))==NULL) |
| 669 + _throwunix("loadbmp(): Cannot open input file"); |
| 670 + |
| 671 + cinfo.err=jpeg_std_error(&jerr.pub); |
| 672 + jerr.pub.error_exit=my_error_exit; |
| 673 + jerr.pub.output_message=my_output_message; |
| 674 + |
| 675 + if(setjmp(jerr.setjmp_buffer)) |
| 676 { |
| 677 - _catch(loadppm(&fd, buf, w, h, f, align, dstbottomup, 0)); |
| 678 - goto finally; |
| 679 + /* If we get here, the JPEG code has signaled an error. */ |
| 680 + retval=-1; goto bailout; |
| 681 } |
| 682 - if(bh.bfType==0x3350) |
| 683 - { |
| 684 - _catch(loadppm(&fd, buf, w, h, f, align, dstbottomup, 1)); |
| 685 - goto finally; |
| 686 - } |
| 687 |
| 688 - readme(fd, &bh.bfSize, sizeof(unsigned int)); |
| 689 - readme(fd, &bh.bfReserved1, sizeof(unsigned short)); |
| 690 - readme(fd, &bh.bfReserved2, sizeof(unsigned short)); |
| 691 - readme(fd, &bh.bfOffBits, sizeof(unsigned int)); |
| 692 - readme(fd, &bh.biSize, sizeof(unsigned int)); |
| 693 - readme(fd, &bh.biWidth, sizeof(int)); |
| 694 - readme(fd, &bh.biHeight, sizeof(int)); |
| 695 - readme(fd, &bh.biPlanes, sizeof(unsigned short)); |
| 696 - readme(fd, &bh.biBitCount, sizeof(unsigned short)); |
| 697 - readme(fd, &bh.biCompression, sizeof(unsigned int)); |
| 698 - readme(fd, &bh.biSizeImage, sizeof(unsigned int)); |
| 699 - readme(fd, &bh.biXPelsPerMeter, sizeof(int)); |
| 700 - readme(fd, &bh.biYPelsPerMeter, sizeof(int)); |
| 701 - readme(fd, &bh.biClrUsed, sizeof(unsigned int)); |
| 702 - readme(fd, &bh.biClrImportant, sizeof(unsigned int)); |
| 703 + jpeg_create_compress(&cinfo); |
| 704 + if((tempc=getc(file))<0 || ungetc(tempc, file)==EOF) |
| 705 + _throwunix("loadbmp(): Could not read input file") |
| 706 + else if(tempc==EOF) _throw("loadbmp(): Input file contains no data"); |
| 707 |
| 708 - if(!littleendian()) |
| 709 + if(tempc=='B') |
| 710 { |
| 711 - bh.bfSize=byteswap(bh.bfSize); |
| 712 - bh.bfOffBits=byteswap(bh.bfOffBits); |
| 713 - bh.biSize=byteswap(bh.biSize); |
| 714 - bh.biWidth=byteswap(bh.biWidth); |
| 715 - bh.biHeight=byteswap(bh.biHeight); |
| 716 - bh.biPlanes=byteswap16(bh.biPlanes); |
| 717 - bh.biBitCount=byteswap16(bh.biBitCount); |
| 718 - bh.biCompression=byteswap(bh.biCompression); |
| 719 - bh.biSizeImage=byteswap(bh.biSizeImage); |
| 720 - bh.biXPelsPerMeter=byteswap(bh.biXPelsPerMeter); |
| 721 - bh.biYPelsPerMeter=byteswap(bh.biYPelsPerMeter); |
| 722 - bh.biClrUsed=byteswap(bh.biClrUsed); |
| 723 - bh.biClrImportant=byteswap(bh.biClrImportant); |
| 724 + if((src=jinit_read_bmp(&cinfo))==NULL) |
| 725 + _throw("loadbmp(): Could not initialize bitmap loader"); |
| 726 } |
| 727 + else if(tempc=='P') |
| 728 + { |
| 729 + if((src=jinit_read_ppm(&cinfo))==NULL) |
| 730 + _throw("loadbmp(): Could not initialize bitmap loader"); |
| 731 + } |
| 732 + else _throw("loadbmp(): Unsupported file type"); |
| 733 |
| 734 - if(bh.bfType!=0x4d42 || bh.bfOffBits<BMPHDRSIZE |
| 735 - || bh.biWidth<1 || bh.biHeight==0) |
| 736 - _throw("Corrupt bitmap header"); |
| 737 - if((bh.biBitCount!=24 && bh.biBitCount!=32) || bh.biCompression!=BI_RGB) |
| 738 - _throw("Only uncompessed RGB bitmaps are supported"); |
| 739 + src->input_file=file; |
| 740 + (*src->start_input)(&cinfo, src); |
| 741 + (*cinfo.mem->realize_virt_arrays)((j_common_ptr)&cinfo); |
| 742 |
| 743 - *w=bh.biWidth; *h=bh.biHeight; srcps=bh.biBitCount/8; |
| 744 - if(*h<0) {*h=-(*h); srcbottomup=0;} |
| 745 - srcpitch=(((*w)*srcps)+3)&(~3); |
| 746 - dstpitch=(((*w)*ps[f])+(align-1))&(~(align-1)); |
| 747 + *w=cinfo.image_width; *h=cinfo.image_height; |
| 748 |
| 749 - if(srcpitch*(*h)+bh.bfOffBits!=bh.bfSize) _throw("Corrupt bitmap header"); |
| 750 - if((tempbuf=(unsigned char *)malloc(srcpitch*(*h)))==NULL |
| 751 - || (*buf=(unsigned char *)malloc(dstpitch*(*h)))==NULL) |
| 752 - _throw("Memory allocation error"); |
| 753 - if(lseek(fd, (long)bh.bfOffBits, SEEK_SET)!=(long)bh.bfOffBits) |
| 754 - _throw(strerror(errno)); |
| 755 - _unix(bytesread=read(fd, tempbuf, srcpitch*(*h))); |
| 756 - if(bytesread!=srcpitch*(*h)) _throw("Read error"); |
| 757 + if(cinfo.input_components==1 && cinfo.in_color_space==JCS_RGB) |
| 758 + srcpf=TJPF_GRAY; |
| 759 + else srcpf=TJPF_RGB; |
| 760 |
| 761 - pixelconvert(tempbuf, BMP_BGR, srcpitch, *buf, f, dstpitch, *w, *h, |
| 762 - srcbottomup!=dstbottomup); |
| 763 + dstps=tjPixelSize[dstpf]; |
| 764 + if((*buf=(unsigned char *)malloc((*w)*(*h)*dstps))==NULL) |
| 765 + _throw("loadbmp(): Memory allocation failure"); |
| 766 |
| 767 - finally: |
| 768 - if(tempbuf) free(tempbuf); |
| 769 - if(fd!=-1) close(fd); |
| 770 - return retcode; |
| 771 + while(cinfo.next_scanline<cinfo.image_height) |
| 772 + { |
| 773 + int i, nlines=(*src->get_pixel_rows)(&cinfo, src); |
| 774 + for(i=0; i<nlines; i++) |
| 775 + { |
| 776 + unsigned char *outbuf; int row; |
| 777 + row=cinfo.next_scanline+i; |
| 778 + if(bottomup) outbuf=&(*buf)[((*h)-row-1)*(*w)*dstps]; |
| 779 + else outbuf=&(*buf)[row*(*w)*dstps]; |
| 780 + pixelconvert(src->buffer[i], srcpf, 0, outbuf, dstpf, bottomup, *w, |
| 781 + nlines); |
| 782 + } |
| 783 + cinfo.next_scanline+=nlines; |
| 784 + } |
| 785 + |
| 786 + (*src->finish_input)(&cinfo, src); |
| 787 + |
| 788 + bailout: |
| 789 + jpeg_destroy_compress(&cinfo); |
| 790 + if(file) fclose(file); |
| 791 + if(retval<0 && buf && *buf) {free(*buf); *buf=NULL;} |
| 792 + return retval; |
| 793 } |
| 794 |
| 795 -#define writeme(fd, addr, size) \ |
| 796 - if((byteswritten=write(fd, addr, (size)))==-1) _throw(strerror(errno)); \ |
| 797 - if(byteswritten!=(size)) _throw("Write error"); |
| 798 |
| 799 -int saveppm(char *filename, unsigned char *buf, int w, int h, |
| 800 - enum BMPPIXELFORMAT f, int srcpitch, int srcbottomup) |
| 801 +int savebmp(char *filename, unsigned char *buf, int w, int h, int srcpf, |
| 802 + int bottomup) |
| 803 { |
| 804 - FILE *fs=NULL; int retcode=0; |
| 805 - unsigned char *tempbuf=NULL; |
| 806 + int retval=0, srcps, dstpf; |
| 807 + struct jpeg_decompress_struct dinfo; |
| 808 + struct my_error_mgr jerr; |
| 809 + djpeg_dest_ptr dst; |
| 810 + FILE *file=NULL; |
| 811 + char *ptr=NULL; |
| 812 |
| 813 - if((fs=fopen(filename, "wb"))==NULL) _throw(strerror(errno)); |
| 814 - if(fprintf(fs, "P6\n")<1) _throw("Write error"); |
| 815 - if(fprintf(fs, "%d %d\n", w, h)<1) _throw("Write error"); |
| 816 - if(fprintf(fs, "255\n")<1) _throw("Write error"); |
| 817 + memset(&dinfo, 0, sizeof(struct jpeg_decompress_struct)); |
| 818 |
| 819 - if((tempbuf=(unsigned char *)malloc(w*h*3))==NULL) |
| 820 - _throw("Memory allocation error"); |
| 821 + if(!filename || !buf || w<1 || h<1 || srcpf<0 || srcpf>=TJ_NUMPF) |
| 822 + _throw("savebmp(): Invalid argument"); |
| 823 |
| 824 - pixelconvert(buf, f, srcpitch, tempbuf, BMP_RGB, w*3, w, h, |
| 825 - srcbottomup); |
| 826 + if((file=fopen(filename, "wb"))==NULL) |
| 827 + _throwunix("savebmp(): Cannot open output file"); |
| 828 |
| 829 - if((fwrite(tempbuf, w*h*3, 1, fs))!=1) _throw("Write error"); |
| 830 + dinfo.err=jpeg_std_error(&jerr.pub); |
| 831 + jerr.pub.error_exit=my_error_exit; |
| 832 + jerr.pub.output_message=my_output_message; |
| 833 |
| 834 - finally: |
| 835 - if(tempbuf) free(tempbuf); |
| 836 - if(fs) fclose(fs); |
| 837 - return retcode; |
| 838 -} |
| 839 + if(setjmp(jerr.setjmp_buffer)) |
| 840 + { |
| 841 + /* If we get here, the JPEG code has signaled an error. */ |
| 842 + retval=-1; goto bailout; |
| 843 + } |
| 844 |
| 845 -int savebmp(char *filename, unsigned char *buf, int w, int h, |
| 846 - enum BMPPIXELFORMAT f, int srcpitch, int srcbottomup) |
| 847 -{ |
| 848 - int fd=-1, byteswritten, dstpitch, retcode=0; |
| 849 - int flags=O_RDWR|O_CREAT|O_TRUNC; |
| 850 - unsigned char *tempbuf=NULL; char *temp; |
| 851 - bmphdr bh; int mode; |
| 852 + jpeg_create_decompress(&dinfo); |
| 853 + if(srcpf==TJPF_GRAY) |
| 854 + { |
| 855 + dinfo.out_color_components=dinfo.output_components=1; |
| 856 + dinfo.out_color_space=JCS_GRAYSCALE; |
| 857 + } |
| 858 + else |
| 859 + { |
| 860 + dinfo.out_color_components=dinfo.output_components=3; |
| 861 + dinfo.out_color_space=JCS_RGB; |
| 862 + } |
| 863 + dinfo.image_width=w; dinfo.image_height=h; |
| 864 + dinfo.global_state=DSTATE_READY; |
| 865 + dinfo.scale_num=dinfo.scale_denom=1; |
| 866 |
| 867 - #ifdef _WIN32 |
| 868 - flags|=O_BINARY; mode=_S_IREAD|_S_IWRITE; |
| 869 - #else |
| 870 - mode=S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH; |
| 871 - #endif |
| 872 - if(!filename || !buf || w<1 || h<1 || f<0 || f>BMPPIXELFORMATS-1 || srcpitch<0) |
| 873 - _throw("bad argument to savebmp()"); |
| 874 - |
| 875 - if(srcpitch==0) srcpitch=w*ps[f]; |
| 876 - |
| 877 - if((temp=strrchr(filename, '.'))!=NULL) |
| 878 + ptr=strrchr(filename, '.'); |
| 879 + if(ptr && !strcasecmp(ptr, ".bmp")) |
| 880 { |
| 881 - if(!stricmp(temp, ".ppm")) |
| 882 - return saveppm(filename, buf, w, h, f, srcpitch, srcbottomup); |
| 883 + if((dst=jinit_write_bmp(&dinfo, 0))==NULL) |
| 884 + _throw("savebmp(): Could not initialize bitmap writer"); |
| 885 } |
| 886 + else |
| 887 + { |
| 888 + if((dst=jinit_write_ppm(&dinfo))==NULL) |
| 889 + _throw("savebmp(): Could not initialize PPM writer"); |
| 890 + } |
| 891 |
| 892 - _unix(fd=open(filename, flags, mode)); |
| 893 - dstpitch=((w*3)+3)&(~3); |
| 894 + dst->output_file=file; |
| 895 + (*dst->start_output)(&dinfo, dst); |
| 896 + (*dinfo.mem->realize_virt_arrays)((j_common_ptr)&dinfo); |
| 897 |
| 898 - bh.bfType=0x4d42; |
| 899 - bh.bfSize=BMPHDRSIZE+dstpitch*h; |
| 900 - bh.bfReserved1=0; bh.bfReserved2=0; |
| 901 - bh.bfOffBits=BMPHDRSIZE; |
| 902 - bh.biSize=40; |
| 903 - bh.biWidth=w; bh.biHeight=h; |
| 904 - bh.biPlanes=0; bh.biBitCount=24; |
| 905 - bh.biCompression=BI_RGB; bh.biSizeImage=0; |
| 906 - bh.biXPelsPerMeter=0; bh.biYPelsPerMeter=0; |
| 907 - bh.biClrUsed=0; bh.biClrImportant=0; |
| 908 + if(srcpf==TJPF_GRAY) dstpf=srcpf; |
| 909 + else dstpf=TJPF_RGB; |
| 910 + srcps=tjPixelSize[srcpf]; |
| 911 |
| 912 - if(!littleendian()) |
| 913 + while(dinfo.output_scanline<dinfo.output_height) |
| 914 { |
| 915 - bh.bfType=byteswap16(bh.bfType); |
| 916 - bh.bfSize=byteswap(bh.bfSize); |
| 917 - bh.bfOffBits=byteswap(bh.bfOffBits); |
| 918 - bh.biSize=byteswap(bh.biSize); |
| 919 - bh.biWidth=byteswap(bh.biWidth); |
| 920 - bh.biHeight=byteswap(bh.biHeight); |
| 921 - bh.biPlanes=byteswap16(bh.biPlanes); |
| 922 - bh.biBitCount=byteswap16(bh.biBitCount); |
| 923 - bh.biCompression=byteswap(bh.biCompression); |
| 924 - bh.biSizeImage=byteswap(bh.biSizeImage); |
| 925 - bh.biXPelsPerMeter=byteswap(bh.biXPelsPerMeter); |
| 926 - bh.biYPelsPerMeter=byteswap(bh.biYPelsPerMeter); |
| 927 - bh.biClrUsed=byteswap(bh.biClrUsed); |
| 928 - bh.biClrImportant=byteswap(bh.biClrImportant); |
| 929 + int i, nlines=dst->buffer_height; |
| 930 + for(i=0; i<nlines; i++) |
| 931 + { |
| 932 + unsigned char *inbuf; int row; |
| 933 + row=dinfo.output_scanline+i; |
| 934 + if(bottomup) inbuf=&buf[(h-row-1)*w*srcps]; |
| 935 + else inbuf=&buf[row*w*srcps]; |
| 936 + pixelconvert(inbuf, srcpf, bottomup, dst->buffer[i], dstpf, 0, w, |
| 937 + nlines); |
| 938 + } |
| 939 + (*dst->put_pixel_rows)(&dinfo, dst, nlines); |
| 940 + dinfo.output_scanline+=nlines; |
| 941 } |
| 942 |
| 943 - writeme(fd, &bh.bfType, sizeof(unsigned short)); |
| 944 - writeme(fd, &bh.bfSize, sizeof(unsigned int)); |
| 945 - writeme(fd, &bh.bfReserved1, sizeof(unsigned short)); |
| 946 - writeme(fd, &bh.bfReserved2, sizeof(unsigned short)); |
| 947 - writeme(fd, &bh.bfOffBits, sizeof(unsigned int)); |
| 948 - writeme(fd, &bh.biSize, sizeof(unsigned int)); |
| 949 - writeme(fd, &bh.biWidth, sizeof(int)); |
| 950 - writeme(fd, &bh.biHeight, sizeof(int)); |
| 951 - writeme(fd, &bh.biPlanes, sizeof(unsigned short)); |
| 952 - writeme(fd, &bh.biBitCount, sizeof(unsigned short)); |
| 953 - writeme(fd, &bh.biCompression, sizeof(unsigned int)); |
| 954 - writeme(fd, &bh.biSizeImage, sizeof(unsigned int)); |
| 955 - writeme(fd, &bh.biXPelsPerMeter, sizeof(int)); |
| 956 - writeme(fd, &bh.biYPelsPerMeter, sizeof(int)); |
| 957 - writeme(fd, &bh.biClrUsed, sizeof(unsigned int)); |
| 958 - writeme(fd, &bh.biClrImportant, sizeof(unsigned int)); |
| 959 + (*dst->finish_output)(&dinfo, dst); |
| 960 |
| 961 - if((tempbuf=(unsigned char *)malloc(dstpitch*h))==NULL) |
| 962 - _throw("Memory allocation error"); |
| 963 - |
| 964 - pixelconvert(buf, f, srcpitch, tempbuf, BMP_BGR, dstpitch, w, h, |
| 965 - !srcbottomup); |
| 966 - |
| 967 - if((byteswritten=write(fd, tempbuf, dstpitch*h))!=dstpitch*h) |
| 968 - _throw(strerror(errno)); |
| 969 - |
| 970 - finally: |
| 971 - if(tempbuf) free(tempbuf); |
| 972 - if(fd!=-1) close(fd); |
| 973 - return retcode; |
| 974 + bailout: |
| 975 + jpeg_destroy_decompress(&dinfo); |
| 976 + if(file) fclose(file); |
| 977 + return retval; |
| 978 } |
| 979 |
| 980 const char *bmpgeterr(void) |
| 981 { |
| 982 - return __bmperr; |
| 983 + return errStr; |
| 984 } |
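
The new loadbmp() and savebmp() above share one recovery pattern: an error_exit hook calls longjmp(), the caller guards its work with setjmp(), and every exit path funnels through a single cleanup label. The sketch below illustrates that pattern in isolation, without libjpeg. It is only an illustration: `demo_error_mgr`, `demo_fail()`, and `demo_load()` are hypothetical names that do not appear in the patch or in libjpeg.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for my_error_mgr: a jump buffer plus a message. */
struct demo_error_mgr {
  jmp_buf setjmp_buffer;
  char msg[64];
};

/* Plays the role of an error_exit hook: record a message, then unwind. */
static void demo_fail(struct demo_error_mgr *err, const char *why)
{
  snprintf(err->msg, sizeof(err->msg), "%s", why);
  longjmp(err->setjmp_buffer, 1);
}

/* Returns 0 on success and -1 on failure; all cleanup happens at bailout. */
int demo_load(int should_fail, struct demo_error_mgr *err)
{
  /* volatile: these are modified after setjmp() and read after longjmp() */
  unsigned char *volatile buf = NULL;
  volatile int retval = 0;

  assert(err != NULL);
  err->msg[0] = '\0';

  if (setjmp(err->setjmp_buffer)) {
    /* Control arrives here only through longjmp() in demo_fail(). */
    retval = -1;
    goto bailout;
  }

  buf = malloc(16);
  if (buf == NULL) demo_fail(err, "Memory allocation failure");
  if (should_fail) demo_fail(err, "Simulated decoder error");

bailout:
  free(buf);
  return retval;
}
```

In this sketch the locals that change between setjmp() and longjmp() are declared volatile, which is what the C standard requires for their values to be reliable after the jump.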
| 985 Index: bmp.h |
| 986 =================================================================== |
| 987 --- bmp.h (revision 829) |
| 988 +++ bmp.h (working copy) |
| 989 @@ -1,48 +1,42 @@ |
| 990 -/* Copyright (C)2004 Landmark Graphics Corporation |
| 991 - * Copyright (C)2005 Sun Microsystems, Inc. |
| 992 +/* |
| 993 + * Copyright (C)2011 D. R. Commander. All Rights Reserved. |
| 994 * |
| 995 - * This library is free software and may be redistributed and/or modified under |
| 996 - * the terms of the wxWindows Library License, Version 3.1 or (at your option) |
| 997 - * any later version. The full license is in the LICENSE.txt file included |
| 998 - * with this distribution. |
| 999 + * Redistribution and use in source and binary forms, with or without |
| 1000 + * modification, are permitted provided that the following conditions are met: |
| 1001 * |
| 1002 - * This library is distributed in the hope that it will be useful, |
| 1003 - * but WITHOUT ANY WARRANTY; without even the implied warranty of |
| 1004 - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
| 1005 - * wxWindows Library License for more details. |
| 1006 -*/ |
| 1007 + * - Redistributions of source code must retain the above copyright notice, |
| 1008 + * this list of conditions and the following disclaimer. |
| 1009 + * - Redistributions in binary form must reproduce the above copyright notice, |
| 1010 + * this list of conditions and the following disclaimer in the documentation |
| 1011 + * and/or other materials provided with the distribution. |
| 1012 + * - Neither the name of the libjpeg-turbo Project nor the names of its |
| 1013 + * contributors may be used to endorse or promote products derived from this |
| 1014 + * software without specific prior written permission. |
| 1015 + * |
| 1016 + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS", |
| 1017 + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE |
| 1018 + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE |
| 1019 + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE |
| 1020 + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR |
| 1021 + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF |
| 1022 + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS |
| 1023 + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN |
| 1024 + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) |
| 1025 + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE |
| 1026 + * POSSIBILITY OF SUCH DAMAGE. |
| 1027 + */ |
| 1028 |
| 1029 -// This provides rudimentary facilities for loading and saving true color |
| 1030 -// BMP and PPM files |
| 1031 - |
| 1032 #ifndef __BMP_H__ |
| 1033 #define __BMP_H__ |
| 1034 |
| 1035 -#define BMPPIXELFORMATS 6 |
| 1036 -enum BMPPIXELFORMAT {BMP_RGB=0, BMP_RGBA, BMP_BGR, BMP_BGRA, BMP_ABGR, BMP_ARGB}; |
| 1037 +#include "./turbojpeg.h" |
| 1038 |
| 1039 -#ifdef __cplusplus |
| 1040 -extern "C" { |
| 1041 -#endif |
| 1042 +int loadbmp(char *filename, unsigned char **buf, int *w, int *h, int pf, |
| 1043 + int bottomup); |
| 1044 |
| 1045 -// This will load a Windows bitmap from a file and return a buffer with the |
| 1046 -// specified pixel format, scanline alignment, and orientation. The width and |
| 1047 -// height are returned in w and h. |
| 1048 +int savebmp(char *filename, unsigned char *buf, int w, int h, int pf, |
| 1049 + int bottomup); |
| 1050 |
| 1051 -int loadbmp(char *filename, unsigned char **buf, int *w, int *h, |
| 1052 - enum BMPPIXELFORMAT f, int align, int dstbottomup); |
| 1053 - |
| 1054 -// This will save a buffer with the specified pixel format, pitch, orientation, |
| 1055 -// width, and height as a 24-bit Windows bitmap or PPM (the filename determines |
| 1056 -// which format to use) |
| 1057 - |
| 1058 -int savebmp(char *filename, unsigned char *buf, int w, int h, |
| 1059 - enum BMPPIXELFORMAT f, int srcpitch, int srcbottomup); |
| 1060 - |
| 1061 const char *bmpgeterr(void); |
| 1062 |
| 1063 -#ifdef __cplusplus |
| 1064 -} |
| 1065 #endif |
| 1066 - |
| 1067 -#endif |
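
The revised bmp.h describes an image by a pixel format index (`pf`) and a `bottomup` flag instead of an explicit pitch and alignment. The sketch below shows one way those two parameters can drive a conversion: per-format offset tables remap the channels, and the row index is mirrored for bottom-up buffers. It is a hedged illustration, not the patch's pixelconvert(). The `demo_*` tables and `demo_convert()` are hypothetical and cover only two formats, whereas the patch relies on the TurboJPEG `tjPixelSize`, `tjRedOffset`, `tjGreenOffset`, and `tjBlueOffset` arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-format table (packed RGB and BGRX). */
enum { DEMO_RGB = 0, DEMO_BGRX = 1 };
static const int demo_size[]  = { 3, 4 };  /* bytes per pixel */
static const int demo_red[]   = { 0, 2 };  /* byte offset of each channel */
static const int demo_green[] = { 1, 1 };
static const int demo_blue[]  = { 2, 0 };

/* Copy a w x h image with no row padding, remapping channels and
   honoring the orientation of each buffer. */
void demo_convert(const unsigned char *src, int srcpf, int src_bottomup,
                  unsigned char *dst, int dstpf, int dst_bottomup,
                  int w, int h)
{
  int row, col;

  assert(src != NULL && dst != NULL && w > 0 && h > 0);
  for (row = 0; row < h; row++) {
    /* "row" counts from the top of the picture; a bottom-up buffer
       stores that row at the mirrored position. */
    int srow = src_bottomup ? h - 1 - row : row;
    int drow = dst_bottomup ? h - 1 - row : row;
    const unsigned char *s = src + (size_t)srow * w * demo_size[srcpf];
    unsigned char *d = dst + (size_t)drow * w * demo_size[dstpf];

    for (col = 0; col < w; col++) {
      d[demo_red[dstpf]]   = s[demo_red[srcpf]];
      d[demo_green[dstpf]] = s[demo_green[srcpf]];
      d[demo_blue[dstpf]]  = s[demo_blue[srcpf]];
      s += demo_size[srcpf];
      d += demo_size[dstpf];
    }
  }
}
```

The padding byte of a 4-byte destination format is left untouched here, so a caller that cares about it should zero the buffer first.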
| 1068 Index: cderror.h |
| 1069 =================================================================== |
| 1070 --- cderror.h (revision 829) |
| 1071 +++ cderror.h (working copy) |
| 1072 @@ -2,6 +2,7 @@ |
| 1073 * cderror.h |
| 1074 * |
| 1075 * Copyright (C) 1994-1997, Thomas G. Lane. |
| 1076 + * Modified 2009 by Guido Vollbeding. |
| 1077 * This file is part of the Independent JPEG Group's software. |
| 1078 * For conditions of distribution and use, see the accompanying README file. |
| 1079 * |
| 1080 @@ -45,6 +46,7 @@ |
| 1081 JMESSAGE(JERR_BMP_BADPLANES, "Invalid BMP file: biPlanes not equal to 1") |
| 1082 JMESSAGE(JERR_BMP_COLORSPACE, "BMP output must be grayscale or RGB") |
| 1083 JMESSAGE(JERR_BMP_COMPRESSED, "Sorry, compressed BMPs not yet supported") |
| 1084 +JMESSAGE(JERR_BMP_EMPTY, "Empty BMP image") |
| 1085 JMESSAGE(JERR_BMP_NOT, "Not a BMP file - does not start with BM") |
| 1086 JMESSAGE(JTRC_BMP, "%ux%u 24-bit BMP image") |
| 1087 JMESSAGE(JTRC_BMP_MAPPED, "%ux%u 8-bit colormapped BMP image") |
| 1088 Index: cdjpeg.h |
| 1089 =================================================================== |
| 1090 --- cdjpeg.h (revision 829) |
| 1091 +++ cdjpeg.h (working copy) |
| 1092 @@ -104,6 +104,7 @@ |
| 1093 #define jinit_write_targa jIWrTarga |
| 1094 #define read_quant_tables RdQTables |
| 1095 #define read_scan_script RdScnScript |
| 1096 +#define set_quality_ratings SetQRates |
| 1097 #define set_quant_slots SetQSlots |
| 1098 #define set_sample_factors SetSFacts |
| 1099 #define read_color_map RdCMap |
| 1100 @@ -131,8 +132,10 @@ |
| 1101 /* cjpeg support routines (in rdswitch.c) */ |
| 1102 |
| 1103 EXTERN(boolean) read_quant_tables JPP((j_compress_ptr cinfo, char * filename, |
| 1104 - int scale_factor, boolean force_baseline)); |
| 1105 + boolean force_baseline)); |
| 1106 EXTERN(boolean) read_scan_script JPP((j_compress_ptr cinfo, char * filename)); |
| 1107 +EXTERN(boolean) set_quality_ratings JPP((j_compress_ptr cinfo, char *arg, |
| 1108 + boolean force_baseline)); |
| 1109 EXTERN(boolean) set_quant_slots JPP((j_compress_ptr cinfo, char *arg)); |
| 1110 EXTERN(boolean) set_sample_factors JPP((j_compress_ptr cinfo, char *arg)); |
| 1111 |
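
The cdjpeg.h hunk above declares set_quality_ratings(), which takes the raw `-quality` argument as a string, and the cjpeg usage text later in this patch documents that argument as `N[,...]`. The body of set_quality_ratings() lives in rdswitch.c and is not part of this diff, so the following is only a sketch of how such a comma-separated list could be parsed. `demo_parse_quality()` is a hypothetical helper, and the 0..100 range check is taken from the usage text rather than from the real implementation.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: parse "N[,N...]" into out[0..max-1].
   Returns the number of values read, or -1 on a malformed argument. */
int demo_parse_quality(const char *arg, int *out, int max)
{
  int n = 0;

  assert(arg != NULL && out != NULL && max > 0);
  while (*arg != '\0') {
    int val, used = 0;

    if (n >= max) return -1;                 /* more values than slots */
    if (sscanf(arg, "%d%n", &val, &used) != 1) return -1;
    if (val < 0 || val > 100) return -1;     /* usage text says 0..100 */
    out[n++] = val;
    arg += used;
    if (*arg == ',') {
      arg++;
      if (*arg == '\0') return -1;           /* trailing comma */
    } else if (*arg != '\0') {
      return -1;                             /* junk after the number */
    }
  }
  return n > 0 ? n : -1;                     /* empty string is an error */
}
```

Keeping the argument as a string until all switches are parsed, as the patched parse_switches() does with `qualityarg`, lets the values be applied once `force_baseline` is known.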
| 1112 Index: cjpeg.c |
| 1113 =================================================================== |
| 1114 --- cjpeg.c (revision 829) |
| 1115 +++ cjpeg.c (working copy) |
| 1116 @@ -1,8 +1,11 @@ |
| 1117 /* |
| 1118 * cjpeg.c |
| 1119 * |
| 1120 + * This file was part of the Independent JPEG Group's software: |
| 1121 * Copyright (C) 1991-1998, Thomas G. Lane. |
| 1122 - * This file is part of the Independent JPEG Group's software. |
| 1123 + * Modified 2003-2011 by Guido Vollbeding. |
| 1124 + * libjpeg-turbo Modifications: |
| 1125 + * Copyright (C) 2010, 2013, D. R. Commander. |
| 1126 * For conditions of distribution and use, see the accompanying README file. |
| 1127 * |
| 1128 * This file contains a command-line user interface for the JPEG compressor. |
| 1129 @@ -25,6 +28,7 @@ |
| 1130 |
| 1131 #include "cdjpeg.h" /* Common decls for cjpeg/djpeg applications */ |
| 1132 #include "jversion.h" /* for version message */ |
| 1133 +#include "config.h" |
| 1134 |
| 1135 #ifdef USE_CCOMMAND /* command-line reader for Macintosh */ |
| 1136 #ifdef __MWERKS__ |
| 1137 @@ -135,6 +139,7 @@ |
| 1138 |
| 1139 static const char * progname; /* program name for error messages */ |
| 1140 static char * outfilename; /* for -outfile switch */ |
| 1141 +boolean memdst; /* for -memdst switch */ |
| 1142 |
| 1143 |
| 1144 LOCAL(void) |
| 1145 @@ -149,8 +154,9 @@ |
| 1146 #endif |
| 1147 |
| 1148 fprintf(stderr, "Switches (names may be abbreviated):\n"); |
| 1149 - fprintf(stderr, " -quality N Compression quality (0..100; 5-95 is useful range)\n"); |
| 1150 + fprintf(stderr, " -quality N[,...] Compression quality (0..100; 5-95 is useful range)\n"); |
| 1151 fprintf(stderr, " -grayscale Create monochrome JPEG file\n"); |
| 1152 + fprintf(stderr, " -rgb Create RGB JPEG file\n"); |
| 1153 #ifdef ENTROPY_OPT_SUPPORTED |
| 1154 fprintf(stderr, " -optimize Optimize Huffman table (smaller file, but slow compression)\n"); |
| 1155 #endif |
| 1156 @@ -161,6 +167,9 @@ |
| 1157 fprintf(stderr, " -targa Input file is Targa format (usually not needed)\n"); |
| 1158 #endif |
| 1159 fprintf(stderr, "Switches for advanced users:\n"); |
| 1160 +#ifdef C_ARITH_CODING_SUPPORTED |
| 1161 + fprintf(stderr, " -arithmetic Use arithmetic coding\n"); |
| 1162 +#endif |
| 1163 #ifdef DCT_ISLOW_SUPPORTED |
| 1164 fprintf(stderr, " -dct int Use integer DCT method%s\n", |
| 1165 (JDCT_DEFAULT == JDCT_ISLOW ? " (default)" : "")); |
| 1166 @@ -179,11 +188,11 @@ |
| 1167 #endif |
| 1168 fprintf(stderr, " -maxmemory N Maximum memory to use (in kbytes)\n"); |
| 1169 fprintf(stderr, " -outfile name Specify name for output file\n"); |
| 1170 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED) |
| 1171 + fprintf(stderr, " -memdst Compress to memory instead of file (useful for benchmarking)\n"); |
| 1172 +#endif |
| 1173 fprintf(stderr, " -verbose or -debug Emit debug output\n"); |
| 1174 fprintf(stderr, "Switches for wizards:\n"); |
| 1175 -#ifdef C_ARITH_CODING_SUPPORTED |
| 1176 - fprintf(stderr, " -arithmetic Use arithmetic coding\n"); |
| 1177 -#endif |
| 1178 fprintf(stderr, " -baseline Force baseline quantization tables\n"); |
| 1179 fprintf(stderr, " -qtables file Use quantization tables given in file\n"); |
| 1180 fprintf(stderr, " -qslots N[,...] Set component quantization tables\n"); |
| 1181 @@ -209,10 +218,9 @@ |
| 1182 { |
| 1183 int argn; |
| 1184 char * arg; |
| 1185 - int quality; /* -quality parameter */ |
| 1186 - int q_scale_factor; /* scaling percentage for -qtables */ |
| 1187 boolean force_baseline; |
| 1188 boolean simple_progressive; |
| 1189 + char * qualityarg = NULL; /* saves -quality parm if any */ |
| 1190 char * qtablefile = NULL; /* saves -qtables filename if any */ |
| 1191 char * qslotsarg = NULL; /* saves -qslots parm if any */ |
| 1192 char * samplearg = NULL; /* saves -sample parm if any */ |
| 1193 @@ -219,15 +227,12 @@ |
| 1194 char * scansarg = NULL; /* saves -scans parm if any */ |
| 1195 |
| 1196 /* Set up default JPEG parameters. */ |
| 1197 - /* Note that default -quality level need not, and does not, |
| 1198 - * match the default scaling for an explicit -qtables argument. |
| 1199 - */ |
| 1200 - quality = 75; /* default -quality value */ |
| 1201 - q_scale_factor = 100; /* default to no scaling for -qtables */ |
| 1202 + |
| 1203 force_baseline = FALSE; /* by default, allow 16-bit quantizers */ |
| 1204 simple_progressive = FALSE; |
| 1205 is_targa = FALSE; |
| 1206 outfilename = NULL; |
| 1207 + memdst = FALSE; |
| 1208 cinfo->err->trace_level = 0; |
| 1209 |
| 1210 /* Scan command line options, adjust parameters */ |
| 1211 @@ -277,8 +282,11 @@ |
| 1212 static boolean printed_version = FALSE; |
| 1213 |
| 1214 if (! printed_version) { |
| 1215 - fprintf(stderr, "Independent JPEG Group's CJPEG, version %s\n%s\n", |
| 1216 - JVERSION, JCOPYRIGHT); |
| 1217 + fprintf(stderr, "%s version %s (build %s)\n", |
| 1218 + PACKAGE_NAME, VERSION, BUILD); |
| 1219 + fprintf(stderr, "%s\n\n", JCOPYRIGHT); |
| 1220 + fprintf(stderr, "Emulating The Independent JPEG Group's software, version %s\n\n", |
| 1221 + JVERSION); |
| 1222 printed_version = TRUE; |
| 1223 } |
| 1224 cinfo->err->trace_level++; |
| 1225 @@ -287,6 +295,10 @@ |
| 1226 /* Force a monochrome JPEG file to be generated. */ |
| 1227 jpeg_set_colorspace(cinfo, JCS_GRAYSCALE); |
| 1228 |
| 1229 + } else if (keymatch(arg, "rgb", 3)) { |
| 1230 + /* Force an RGB JPEG file to be generated. */ |
| 1231 + jpeg_set_colorspace(cinfo, JCS_RGB); |
| 1232 + |
| 1233 } else if (keymatch(arg, "maxmemory", 3)) { |
| 1234 /* Maximum memory in Kb (or Mb with 'm'). */ |
| 1235 long lval; |
| 1236 @@ -305,7 +317,7 @@ |
| 1237 #ifdef ENTROPY_OPT_SUPPORTED |
| 1238 cinfo->optimize_coding = TRUE; |
| 1239 #else |
| 1240 - fprintf(stderr, "%s: sorry, entropy optimization was not compiled\n", |
| 1241 + fprintf(stderr, "%s: sorry, entropy optimization was not compiled in\n", |
| 1242 progname); |
| 1243 exit(EXIT_FAILURE); |
| 1244 #endif |
| 1245 @@ -322,19 +334,26 @@ |
| 1246 simple_progressive = TRUE; |
| 1247 /* We must postpone execution until num_components is known. */ |
| 1248 #else |
| 1249 - fprintf(stderr, "%s: sorry, progressive output was not compiled\n", |
| 1250 + fprintf(stderr, "%s: sorry, progressive output was not compiled in\n", |
| 1251 progname); |
| 1252 exit(EXIT_FAILURE); |
| 1253 #endif |
| 1254 |
| 1255 + } else if (keymatch(arg, "memdst", 2)) { |
| 1256 + /* Use in-memory destination manager */ |
| 1257 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED) |
| 1258 + memdst = TRUE; |
| 1259 +#else |
| 1260 + fprintf(stderr, "%s: sorry, in-memory destination manager was not compiled in\n", |
| 1261 + progname); |
| 1262 + exit(EXIT_FAILURE); |
| 1263 +#endif |
| 1264 + |
| 1265 } else if (keymatch(arg, "quality", 1)) { |
| 1266 - /* Quality factor (quantization table scaling factor). */ |
| 1267 + /* Quality ratings (quantization table scaling factors). */ |
| 1268 if (++argn >= argc) /* advance to next argument */ |
| 1269 usage(); |
| 1270 - if (sscanf(argv[argn], "%d", &quality) != 1) |
| 1271 - usage(); |
| 1272 - /* Change scale factor in case -qtables is present. */ |
| 1273 - q_scale_factor = jpeg_quality_scaling(quality); |
| 1274 + qualityarg = argv[argn]; |
| 1275 |
| 1276 } else if (keymatch(arg, "qslots", 2)) { |
| 1277 /* Quantization table slot numbers. */ |
| 1278 @@ -382,7 +401,7 @@ |
| 1279 * default sampling factors. |
| 1280 */ |
| 1281 |
| 1282 - } else if (keymatch(arg, "scans", 2)) { |
| 1283 + } else if (keymatch(arg, "scans", 4)) { |
| 1284 /* Set scan script. */ |
| 1285 #ifdef C_MULTISCAN_FILES_SUPPORTED |
| 1286 if (++argn >= argc) /* advance to next argument */ |
| 1287 @@ -390,7 +409,7 @@ |
| 1288 scansarg = argv[argn]; |
| 1289 /* We must postpone reading the file in case -progressive appears. */ |
| 1290 #else |
| 1291 - fprintf(stderr, "%s: sorry, multi-scan output was not compiled\n", |
| 1292 + fprintf(stderr, "%s: sorry, multi-scan output was not compiled in\n", |
| 1293 progname); |
| 1294 exit(EXIT_FAILURE); |
| 1295 #endif |
| 1296 @@ -422,11 +441,12 @@ |
| 1297 |
| 1298 /* Set quantization tables for selected quality. */ |
| 1299 /* Some or all may be overridden if -qtables is present. */ |
| 1300 - jpeg_set_quality(cinfo, quality, force_baseline); |
| 1301 + if (qualityarg != NULL) /* process -quality if it was present */ |
| 1302 + if (! set_quality_ratings(cinfo, qualityarg, force_baseline)) |
| 1303 + usage(); |
| 1304 |
| 1305 if (qtablefile != NULL) /* process -qtables if it was present */ |
| 1306 - if (! read_quant_tables(cinfo, qtablefile, |
| 1307 - q_scale_factor, force_baseline)) |
| 1308 + if (! read_quant_tables(cinfo, qtablefile, force_baseline)) |
| 1309 usage(); |
| 1310 |
| 1311 if (qslotsarg != NULL) /* process -qslots if it was present */ |
| 1312 @@ -468,7 +488,9 @@ |
| 1313 int file_index; |
| 1314 cjpeg_source_ptr src_mgr; |
| 1315 FILE * input_file; |
| 1316 - FILE * output_file; |
| 1317 + FILE * output_file = NULL; |
| 1318 + unsigned char *outbuffer = NULL; |
| 1319 + unsigned long outsize = 0; |
| 1320 JDIMENSION num_scanlines; |
| 1321 |
| 1322 /* On Mac, fetch a command line. */ |
| 1323 @@ -511,20 +533,22 @@ |
| 1324 file_index = parse_switches(&cinfo, argc, argv, 0, FALSE); |
| 1325 |
| 1326 #ifdef TWO_FILE_COMMANDLINE |
| 1327 - /* Must have either -outfile switch or explicit output file name */ |
| 1328 - if (outfilename == NULL) { |
| 1329 - if (file_index != argc-2) { |
| 1330 - fprintf(stderr, "%s: must name one input and one output file\n", |
| 1331 - progname); |
| 1332 - usage(); |
| 1333 + if (!memdst) { |
| 1334 + /* Must have either -outfile switch or explicit output file name */ |
| 1335 + if (outfilename == NULL) { |
| 1336 + if (file_index != argc-2) { |
| 1337 + fprintf(stderr, "%s: must name one input and one output file\n", |
| 1338 + progname); |
| 1339 + usage(); |
| 1340 + } |
| 1341 + outfilename = argv[file_index+1]; |
| 1342 + } else { |
| 1343 + if (file_index != argc-1) { |
| 1344 + fprintf(stderr, "%s: must name one input and one output file\n", |
| 1345 + progname); |
| 1346 + usage(); |
| 1347 + } |
| 1348 } |
| 1349 - outfilename = argv[file_index+1]; |
| 1350 - } else { |
| 1351 - if (file_index != argc-1) { |
| 1352 - fprintf(stderr, "%s: must name one input and one output file\n", |
| 1353 - progname); |
| 1354 - usage(); |
| 1355 - } |
| 1356 } |
| 1357 #else |
| 1358 /* Unix style: expect zero or one file name */ |
| 1359 @@ -551,7 +575,7 @@ |
| 1360 fprintf(stderr, "%s: can't open %s\n", progname, outfilename); |
| 1361 exit(EXIT_FAILURE); |
| 1362 } |
| 1363 - } else { |
| 1364 + } else if (!memdst) { |
| 1365 /* default output file is stdout */ |
| 1366 output_file = write_stdout(); |
| 1367 } |
| 1368 @@ -574,7 +598,12 @@ |
| 1369 file_index = parse_switches(&cinfo, argc, argv, 0, TRUE); |
| 1370 |
| 1371 /* Specify data destination for compression */ |
| 1372 - jpeg_stdio_dest(&cinfo, output_file); |
| 1373 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED) |
| 1374 + if (memdst) |
| 1375 + jpeg_mem_dest(&cinfo, &outbuffer, &outsize); |
| 1376 + else |
| 1377 +#endif |
| 1378 + jpeg_stdio_dest(&cinfo, output_file); |
| 1379 |
| 1380 /* Start compressor */ |
| 1381 jpeg_start_compress(&cinfo, TRUE); |
| 1382 @@ -593,7 +622,7 @@ |
| 1383 /* Close files, if we opened them */ |
| 1384 if (input_file != stdin) |
| 1385 fclose(input_file); |
| 1386 - if (output_file != stdout) |
| 1387 + if (output_file != stdout && output_file != NULL) |
| 1388 fclose(output_file); |
| 1389 |
| 1390 #ifdef PROGRESS_REPORT |
| 1391 @@ -600,6 +629,12 @@ |
| 1392 end_progress_monitor((j_common_ptr) &cinfo); |
| 1393 #endif |
| 1394 |
| 1395 + if (memdst) { |
| 1396 + fprintf(stderr, "Compressed size: %lu bytes\n", outsize); |
| 1397 + if (outbuffer != NULL) |
| 1398 + free(outbuffer); |
| 1399 + } |
| 1400 + |
| 1401 /* All done. */ |
| 1402 exit(jerr.num_warnings ? EXIT_WARNING : EXIT_SUCCESS); |
| 1403 return 0; /* suppress no-return-value warnings */ |
| 1404 Index: djpeg.c |
| 1405 =================================================================== |
| 1406 --- djpeg.c (revision 829) |
| 1407 +++ djpeg.c (working copy) |
| 1408 @@ -1,8 +1,11 @@ |
| 1409 /* |
| 1410 * djpeg.c |
| 1411 * |
| 1412 + * This file was part of the Independent JPEG Group's software: |
| 1413 * Copyright (C) 1991-1997, Thomas G. Lane. |
| 1414 - * This file is part of the Independent JPEG Group's software. |
| 1415 + * libjpeg-turbo Modifications: |
| 1416 + * Copyright (C) 2010-2011, 2013-2015, D. R. Commander. |
| 1417 + * Copyright (C) 2015, Google, Inc. |
| 1418 * For conditions of distribution and use, see the accompanying README file. |
| 1419 * |
| 1420 * This file contains a command-line user interface for the JPEG decompressor. |
| 1421 @@ -25,6 +28,7 @@ |
| 1422 |
| 1423 #include "cdjpeg.h" /* Common decls for cjpeg/djpeg applications */ |
| 1424 #include "jversion.h" /* for version message */ |
| 1425 +#include "config.h" |
| 1426 |
| 1427 #include <ctype.h> /* to declare isprint() */ |
| 1428 |
| 1429 @@ -84,6 +88,10 @@ |
| 1430 |
| 1431 static const char * progname; /* program name for error messages */ |
| 1432 static char * outfilename; /* for -outfile switch */ |
| 1433 +boolean memsrc; /* for -memsrc switch */ |
| 1434 +boolean strip, skip; |
| 1435 +JDIMENSION startY, endY; |
| 1436 +#define INPUT_BUF_SIZE 4096 |
| 1437 |
| 1438 |
| 1439 LOCAL(void) |
| 1440 @@ -101,6 +109,7 @@ |
| 1441 fprintf(stderr, " -colors N Reduce image to no more than N colors\n"); |
| 1442 fprintf(stderr, " -fast Fast, low-quality processing\n"); |
| 1443 fprintf(stderr, " -grayscale Force grayscale output\n"); |
| 1444 + fprintf(stderr, " -rgb Force RGB output\n"); |
| 1445 #ifdef IDCT_SCALING_SUPPORTED |
| 1446 fprintf(stderr, " -scale M/N Scale output image by fraction M/N, eg, 1/8\n"); |
| 1447 #endif |
| 1448 @@ -153,6 +162,12 @@ |
| 1449 #endif |
| 1450 fprintf(stderr, " -maxmemory N Maximum memory to use (in kbytes)\n"); |
| 1451 fprintf(stderr, " -outfile name Specify name for output file\n"); |
| 1452 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED) |
| 1453 + fprintf(stderr, " -memsrc Load input file into memory before decompressing\n"); |
| 1454 +#endif |
| 1455 + |
| 1456 + fprintf(stderr, " -skip Y0,Y1 Decode all rows except those between Y0 and Y1 (inclusive)\n"); |
| 1457 + fprintf(stderr, " -strip Y0,Y1 Decode only rows between Y0 and Y1 (inclusive)\n"); |
| 1458 fprintf(stderr, " -verbose or -debug Emit debug output\n"); |
| 1459 exit(EXIT_FAILURE); |
| 1460 } |
| 1461 @@ -176,6 +191,9 @@ |
| 1462 /* Set up default JPEG parameters. */ |
| 1463 requested_fmt = DEFAULT_FMT; /* set default output file format */ |
| 1464 outfilename = NULL; |
| 1465 + memsrc = FALSE; |
| 1466 + strip = FALSE; |
| 1467 + skip = FALSE; |
| 1468 cinfo->err->trace_level = 0; |
| 1469 |
| 1470 /* Scan command line options, adjust parameters */ |
| 1471 @@ -240,8 +258,11 @@ |
| 1472 static boolean printed_version = FALSE; |
| 1473 |
| 1474 if (! printed_version) { |
| 1475 - fprintf(stderr, "Independent JPEG Group's DJPEG, version %s\n%s\n", |
| 1476 - JVERSION, JCOPYRIGHT); |
| 1477 + fprintf(stderr, "%s version %s (build %s)\n", |
| 1478 + PACKAGE_NAME, VERSION, BUILD); |
| 1479 + fprintf(stderr, "%s\n\n", JCOPYRIGHT); |
| 1480 + fprintf(stderr, "Emulating The Independent JPEG Group's software, version %s\n\n", |
| 1481 + JVERSION); |
| 1482 printed_version = TRUE; |
| 1483 } |
| 1484 cinfo->err->trace_level++; |
| 1485 @@ -263,6 +284,10 @@ |
| 1486 /* Force monochrome output. */ |
| 1487 cinfo->out_color_space = JCS_GRAYSCALE; |
| 1488 |
| 1489 + } else if (keymatch(arg, "rgb", 2)) { |
| 1490 + /* Force RGB output. */ |
| 1491 + cinfo->out_color_space = JCS_RGB; |
| 1492 + |
| 1493 } else if (keymatch(arg, "map", 3)) { |
| 1494 /* Quantize to a color map taken from an input file. */ |
| 1495 if (++argn >= argc) /* advance to next argument */ |
| 1496 @@ -314,6 +339,16 @@ |
| 1497 usage(); |
| 1498 outfilename = argv[argn]; /* save it away for later use */ |
| 1499 |
| 1500 + } else if (keymatch(arg, "memsrc", 2)) { |
| 1501 + /* Use in-memory source manager */ |
| 1502 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED) |
| 1503 + memsrc = TRUE; |
| 1504 +#else |
| 1505 + fprintf(stderr, "%s: sorry, in-memory source manager was not compiled in\n", |
| 1506 + progname); |
| 1507 + exit(EXIT_FAILURE); |
| 1508 +#endif |
| 1509 + |
| 1510 } else if (keymatch(arg, "pnm", 1) || keymatch(arg, "ppm", 1)) { |
| 1511 /* PPM/PGM output format. */ |
| 1512 requested_fmt = FMT_PPM; |
| 1513 @@ -322,7 +357,7 @@ |
| 1514 /* RLE output format. */ |
| 1515 requested_fmt = FMT_RLE; |
| 1516 |
| 1517 - } else if (keymatch(arg, "scale", 1)) { |
| 1518 + } else if (keymatch(arg, "scale", 2)) { |
| 1519 /* Scale the output image by a fraction M/N. */ |
| 1520 if (++argn >= argc) /* advance to next argument */ |
| 1521 usage(); |
| 1522 @@ -330,6 +365,20 @@ |
| 1523 &cinfo->scale_num, &cinfo->scale_denom) != 2) |
| 1524 usage(); |
| 1525 |
| 1526 + } else if (keymatch(arg, "strip", 2)) { |
| 1527 + if (++argn >= argc) |
| 1528 + usage(); |
| 1529 + if (sscanf(argv[argn], "%d,%d", &startY, &endY) != 2 || startY > endY) |
| 1530 + usage(); |
| 1531 + strip = TRUE; |
| 1532 + |
| 1533 + } else if (keymatch(arg, "skip", 2)) { |
| 1534 + if (++argn >= argc) |
| 1535 + usage(); |
| 1536 + if (sscanf(argv[argn], "%d,%d", &startY, &endY) != 2 || startY > endY) |
| 1537 + usage(); |
| 1538 + skip = TRUE; |
| 1539 + |
| 1540 } else if (keymatch(arg, "targa", 1)) { |
| 1541 /* Targa output format. */ |
| 1542 requested_fmt = FMT_TARGA; |
| 1543 @@ -432,6 +481,8 @@ |
| 1544 djpeg_dest_ptr dest_mgr = NULL; |
| 1545 FILE * input_file; |
| 1546 FILE * output_file; |
| 1547 + unsigned char *inbuffer = NULL; |
| 1548 + unsigned long insize = 0; |
| 1549 JDIMENSION num_scanlines; |
| 1550 |
| 1551 /* On Mac, fetch a command line. */ |
| 1552 @@ -455,7 +506,7 @@ |
| 1553 * APP12 is used by some digital camera makers for textual info, |
| 1554 * so we provide the ability to display it as text. |
| 1555 * If you like, additional APPn marker types can be selected for display, |
| 1556 - * but don't try to override APP0 or APP14 this way (see libjpeg.doc). |
| 1557 + * but don't try to override APP0 or APP14 this way (see libjpeg.txt). |
| 1558 */ |
| 1559 jpeg_set_marker_processor(&cinfo, JPEG_COM, print_text_marker); |
| 1560 jpeg_set_marker_processor(&cinfo, JPEG_APP0+12, print_text_marker); |
| 1561 @@ -526,7 +577,30 @@ |
| 1562 #endif |
| 1563 |
| 1564 /* Specify data source for decompression */ |
| 1565 - jpeg_stdio_src(&cinfo, input_file); |
| 1566 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED) |
| 1567 + if (memsrc) { |
| 1568 + size_t nbytes; |
| 1569 + do { |
| 1570 + inbuffer = (unsigned char *)realloc(inbuffer, insize + INPUT_BUF_SIZE); |
| 1571 + if (inbuffer == NULL) { |
| 1572 + fprintf(stderr, "%s: memory allocation failure\n", progname); |
| 1573 + exit(EXIT_FAILURE); |
| 1574 + } |
| 1575 + nbytes = JFREAD(input_file, &inbuffer[insize], INPUT_BUF_SIZE); |
| 1576 + if (nbytes < INPUT_BUF_SIZE && ferror(input_file)) { |
| 1577 + if (file_index < argc) |
| 1578 + fprintf(stderr, "%s: can't read from %s\n", progname, |
| 1579 + argv[file_index]); |
| 1580 + else |
| 1581 + fprintf(stderr, "%s: can't read from stdin\n", progname); |
| 1582 + } |
| 1583 + insize += (unsigned long)nbytes; |
| 1584 + } while (nbytes == INPUT_BUF_SIZE); |
| 1585 + fprintf(stderr, "Compressed size: %lu bytes\n", insize); |
| 1586 + jpeg_mem_src(&cinfo, inbuffer, insize); |
| 1587 + } else |
| 1588 +#endif |
| 1589 + jpeg_stdio_src(&cinfo, input_file); |
| 1590 |
| 1591 /* Read file header, set default decompression parameters */ |
| 1592 (void) jpeg_read_header(&cinfo, TRUE); |
| 1593 @@ -575,14 +649,64 @@ |
| 1594 /* Start decompressor */ |
| 1595 (void) jpeg_start_decompress(&cinfo); |
| 1596 |
| 1597 - /* Write output file header */ |
| 1598 - (*dest_mgr->start_output) (&cinfo, dest_mgr); |
| 1599 + /* Strip decode */ |
| 1600 + if (strip || skip) { |
| 1601 + JDIMENSION tmp; |
| 1602 |
| 1603 - /* Process data */ |
| 1604 - while (cinfo.output_scanline < cinfo.output_height) { |
| 1605 - num_scanlines = jpeg_read_scanlines(&cinfo, dest_mgr->buffer, |
| 1606 - dest_mgr->buffer_height); |
| 1607 - (*dest_mgr->put_pixel_rows) (&cinfo, dest_mgr, num_scanlines); |
| 1608 + /* Check for valid endY. We cannot check this value until after |
| 1609 + * jpeg_start_decompress() is called. Note that we have already verified |
| 1610 + * that startY <= endY. |
| 1611 + */ |
| 1612 + if (endY > cinfo.output_height - 1) { |
| 1613 + fprintf(stderr, "%s: strip %d-%d exceeds image height %d\n", progname, |
| 1614 + startY, endY, cinfo.output_height); |
| 1615 + exit(EXIT_FAILURE); |
| 1616 + } |
| 1617 + |
| 1618 + /* Write output file header. This is a hack to ensure that the destination |
| 1619 + * manager creates an image of the proper size for the partial decode. |
| 1620 + */ |
| 1621 + tmp = cinfo.output_height; |
| 1622 + cinfo.output_height = endY - startY + 1; |
| 1623 + if (skip) |
| 1624 + cinfo.output_height = tmp - cinfo.output_height; |
| 1625 + (*dest_mgr->start_output) (&cinfo, dest_mgr); |
| 1626 + cinfo.output_height = tmp; |
| 1627 + |
| 1628 + /* Process data */ |
| 1629 + if (skip) { |
| 1630 + while (cinfo.output_scanline < startY) { |
| 1631 + num_scanlines = jpeg_read_scanlines(&cinfo, dest_mgr->buffer, |
| 1632 + dest_mgr->buffer_height); |
| 1633 + (*dest_mgr->put_pixel_rows) (&cinfo, dest_mgr, num_scanlines); |
| 1634 + } |
| 1635 + jpeg_skip_scanlines(&cinfo, endY - startY + 1); |
| 1636 + while (cinfo.output_scanline < cinfo.output_height) { |
| 1637 + num_scanlines = jpeg_read_scanlines(&cinfo, dest_mgr->buffer, |
| 1638 + dest_mgr->buffer_height); |
| 1639 + (*dest_mgr->put_pixel_rows) (&cinfo, dest_mgr, num_scanlines); |
| 1640 + } |
| 1641 + } else { |
| 1642 + jpeg_skip_scanlines(&cinfo, startY); |
| 1643 + while (cinfo.output_scanline <= endY) { |
| 1644 + num_scanlines = jpeg_read_scanlines(&cinfo, dest_mgr->buffer, |
| 1645 + dest_mgr->buffer_height); |
| 1646 + (*dest_mgr->put_pixel_rows) (&cinfo, dest_mgr, num_scanlines); |
| 1647 + } |
| 1648 + jpeg_skip_scanlines(&cinfo, cinfo.output_height - endY + 1); |
| 1649 + } |
| 1650 + |
| 1651 + /* Normal full image decode */ |
| 1652 + } else { |
| 1653 + /* Write output file header */ |
| 1654 + (*dest_mgr->start_output) (&cinfo, dest_mgr); |
| 1655 + |
| 1656 + /* Process data */ |
| 1657 + while (cinfo.output_scanline < cinfo.output_height) { |
| 1658 + num_scanlines = jpeg_read_scanlines(&cinfo, dest_mgr->buffer, |
| 1659 + dest_mgr->buffer_height); |
| 1660 + (*dest_mgr->put_pixel_rows) (&cinfo, dest_mgr, num_scanlines); |
| 1661 + } |
| 1662 } |
| 1663 |
| 1664 #ifdef PROGRESS_REPORT |
| 1665 @@ -610,6 +734,9 @@ |
| 1666 end_progress_monitor((j_common_ptr) &cinfo); |
| 1667 #endif |
| 1668 |
| 1669 + if (memsrc && inbuffer != NULL) |
| 1670 + free(inbuffer); |
| 1671 + |
| 1672 /* All done. */ |
| 1673 exit(jerr.num_warnings ? EXIT_WARNING : EXIT_SUCCESS); |
| 1674 return 0; /* suppress no-return-value warnings */ |
| 1675 Index: jcapimin.c |
| 1676 =================================================================== |
| 1677 --- jcapimin.c (revision 829) |
| 1678 +++ jcapimin.c (working copy) |
| 1679 @@ -2,6 +2,7 @@ |
| 1680 * jcapimin.c |
| 1681 * |
| 1682 * Copyright (C) 1994-1998, Thomas G. Lane. |
| 1683 + * Modified 2003-2010 by Guido Vollbeding. |
| 1684 * This file is part of the Independent JPEG Group's software. |
| 1685 * For conditions of distribution and use, see the accompanying README file. |
| 1686 * |
| 1687 @@ -63,8 +64,12 @@ |
| 1688 |
| 1689 cinfo->comp_info = NULL; |
| 1690 |
| 1691 - for (i = 0; i < NUM_QUANT_TBLS; i++) |
| 1692 + for (i = 0; i < NUM_QUANT_TBLS; i++) { |
| 1693 cinfo->quant_tbl_ptrs[i] = NULL; |
| 1694 +#if JPEG_LIB_VERSION >= 70 |
| 1695 + cinfo->q_scale_factor[i] = 100; |
| 1696 +#endif |
| 1697 + } |
| 1698 |
| 1699 for (i = 0; i < NUM_HUFF_TBLS; i++) { |
| 1700 cinfo->dc_huff_tbl_ptrs[i] = NULL; |
| 1701 @@ -71,6 +76,13 @@ |
| 1702 cinfo->ac_huff_tbl_ptrs[i] = NULL; |
| 1703 } |
| 1704 |
| 1705 +#if JPEG_LIB_VERSION >= 80 |
| 1706 + /* Must do it here for emit_dqt in case jpeg_write_tables is used */ |
| 1707 + cinfo->block_size = DCTSIZE; |
| 1708 + cinfo->natural_order = jpeg_natural_order; |
| 1709 + cinfo->lim_Se = DCTSIZE2-1; |
| 1710 +#endif |
| 1711 + |
| 1712 cinfo->script_space = NULL; |
| 1713 |
| 1714 cinfo->input_gamma = 1.0; /* in case application forgets */ |
| 1715 Index: jccolor.c |
| 1716 =================================================================== |
| 1717 --- jccolor.c (revision 829) |
| 1718 +++ jccolor.c (working copy) |
| 1719 @@ -1,10 +1,11 @@ |
| 1720 /* |
| 1721 * jccolor.c |
| 1722 * |
| 1723 + * This file was part of the Independent JPEG Group's software: |
| 1724 * Copyright (C) 1991-1996, Thomas G. Lane. |
| 1725 + * libjpeg-turbo Modifications: |
| 1726 * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB |
| 1727 - * Copyright 2009 D. R. Commander |
| 1728 - * This file is part of the Independent JPEG Group's software. |
| 1729 + * Copyright (C) 2009-2012, D. R. Commander. |
| 1730 * For conditions of distribution and use, see the accompanying README file. |
| 1731 * |
| 1732 * This file contains input colorspace conversion routines. |
| 1733 @@ -14,6 +15,7 @@ |
| 1734 #include "jinclude.h" |
| 1735 #include "jpeglib.h" |
| 1736 #include "jsimd.h" |
| 1737 +#include "config.h" |
| 1738 |
| 1739 |
| 1740 /* Private subobject */ |
| 1741 @@ -81,6 +83,111 @@ |
| 1742 #define TABLE_SIZE (8*(MAXJSAMPLE+1)) |
| 1743 |
| 1744 |
| 1745 +/* Include inline routines for colorspace extensions */ |
| 1746 + |
| 1747 +#include "jccolext.c" |
| 1748 +#undef RGB_RED |
| 1749 +#undef RGB_GREEN |
| 1750 +#undef RGB_BLUE |
| 1751 +#undef RGB_PIXELSIZE |
| 1752 + |
| 1753 +#define RGB_RED EXT_RGB_RED |
| 1754 +#define RGB_GREEN EXT_RGB_GREEN |
| 1755 +#define RGB_BLUE EXT_RGB_BLUE |
| 1756 +#define RGB_PIXELSIZE EXT_RGB_PIXELSIZE |
| 1757 +#define rgb_ycc_convert_internal extrgb_ycc_convert_internal |
| 1758 +#define rgb_gray_convert_internal extrgb_gray_convert_internal |
| 1759 +#define rgb_rgb_convert_internal extrgb_rgb_convert_internal |
| 1760 +#include "jccolext.c" |
| 1761 +#undef RGB_RED |
| 1762 +#undef RGB_GREEN |
| 1763 +#undef RGB_BLUE |
| 1764 +#undef RGB_PIXELSIZE |
| 1765 +#undef rgb_ycc_convert_internal |
| 1766 +#undef rgb_gray_convert_internal |
| 1767 +#undef rgb_rgb_convert_internal |
| 1768 + |
| 1769 +#define RGB_RED EXT_RGBX_RED |
| 1770 +#define RGB_GREEN EXT_RGBX_GREEN |
| 1771 +#define RGB_BLUE EXT_RGBX_BLUE |
| 1772 +#define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE |
| 1773 +#define rgb_ycc_convert_internal extrgbx_ycc_convert_internal |
| 1774 +#define rgb_gray_convert_internal extrgbx_gray_convert_internal |
| 1775 +#define rgb_rgb_convert_internal extrgbx_rgb_convert_internal |
| 1776 +#include "jccolext.c" |
| 1777 +#undef RGB_RED |
| 1778 +#undef RGB_GREEN |
| 1779 +#undef RGB_BLUE |
| 1780 +#undef RGB_PIXELSIZE |
| 1781 +#undef rgb_ycc_convert_internal |
| 1782 +#undef rgb_gray_convert_internal |
| 1783 +#undef rgb_rgb_convert_internal |
| 1784 + |
| 1785 +#define RGB_RED EXT_BGR_RED |
| 1786 +#define RGB_GREEN EXT_BGR_GREEN |
| 1787 +#define RGB_BLUE EXT_BGR_BLUE |
| 1788 +#define RGB_PIXELSIZE EXT_BGR_PIXELSIZE |
| 1789 +#define rgb_ycc_convert_internal extbgr_ycc_convert_internal |
| 1790 +#define rgb_gray_convert_internal extbgr_gray_convert_internal |
| 1791 +#define rgb_rgb_convert_internal extbgr_rgb_convert_internal |
| 1792 +#include "jccolext.c" |
| 1793 +#undef RGB_RED |
| 1794 +#undef RGB_GREEN |
| 1795 +#undef RGB_BLUE |
| 1796 +#undef RGB_PIXELSIZE |
| 1797 +#undef rgb_ycc_convert_internal |
| 1798 +#undef rgb_gray_convert_internal |
| 1799 +#undef rgb_rgb_convert_internal |
| 1800 + |
| 1801 +#define RGB_RED EXT_BGRX_RED |
| 1802 +#define RGB_GREEN EXT_BGRX_GREEN |
| 1803 +#define RGB_BLUE EXT_BGRX_BLUE |
| 1804 +#define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE |
| 1805 +#define rgb_ycc_convert_internal extbgrx_ycc_convert_internal |
| 1806 +#define rgb_gray_convert_internal extbgrx_gray_convert_internal |
| 1807 +#define rgb_rgb_convert_internal extbgrx_rgb_convert_internal |
| 1808 +#include "jccolext.c" |
| 1809 +#undef RGB_RED |
| 1810 +#undef RGB_GREEN |
| 1811 +#undef RGB_BLUE |
| 1812 +#undef RGB_PIXELSIZE |
| 1813 +#undef rgb_ycc_convert_internal |
| 1814 +#undef rgb_gray_convert_internal |
| 1815 +#undef rgb_rgb_convert_internal |
| 1816 + |
| 1817 +#define RGB_RED EXT_XBGR_RED |
| 1818 +#define RGB_GREEN EXT_XBGR_GREEN |
| 1819 +#define RGB_BLUE EXT_XBGR_BLUE |
| 1820 +#define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE |
| 1821 +#define rgb_ycc_convert_internal extxbgr_ycc_convert_internal |
| 1822 +#define rgb_gray_convert_internal extxbgr_gray_convert_internal |
| 1823 +#define rgb_rgb_convert_internal extxbgr_rgb_convert_internal |
| 1824 +#include "jccolext.c" |
| 1825 +#undef RGB_RED |
| 1826 +#undef RGB_GREEN |
| 1827 +#undef RGB_BLUE |
| 1828 +#undef RGB_PIXELSIZE |
| 1829 +#undef rgb_ycc_convert_internal |
| 1830 +#undef rgb_gray_convert_internal |
| 1831 +#undef rgb_rgb_convert_internal |
| 1832 + |
| 1833 +#define RGB_RED EXT_XRGB_RED |
| 1834 +#define RGB_GREEN EXT_XRGB_GREEN |
| 1835 +#define RGB_BLUE EXT_XRGB_BLUE |
| 1836 +#define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE |
| 1837 +#define rgb_ycc_convert_internal extxrgb_ycc_convert_internal |
| 1838 +#define rgb_gray_convert_internal extxrgb_gray_convert_internal |
| 1839 +#define rgb_rgb_convert_internal extxrgb_rgb_convert_internal |
| 1840 +#include "jccolext.c" |
| 1841 +#undef RGB_RED |
| 1842 +#undef RGB_GREEN |
| 1843 +#undef RGB_BLUE |
| 1844 +#undef RGB_PIXELSIZE |
| 1845 +#undef rgb_ycc_convert_internal |
| 1846 +#undef rgb_gray_convert_internal |
| 1847 +#undef rgb_rgb_convert_internal |
| 1848 + |
| 1849 + |
| 1850 /* |
| 1851 * Initialize for RGB->YCC colorspace conversion. |
| 1852 */ |
| 1853 @@ -119,14 +226,6 @@ |
| 1854 |
| 1855 /* |
| 1856 * Convert some rows of samples to the JPEG colorspace. |
| 1857 - * |
| 1858 - * Note that we change from the application's interleaved-pixel format |
| 1859 - * to our internal noninterleaved, one-plane-per-component format. |
| 1860 - * The input buffer is therefore three times as wide as the output buffer. |
| 1861 - * |
| 1862 - * A starting row offset is provided only for the output buffer. The caller |
| 1863 - * can easily adjust the passed input_buf value to accommodate any row |
| 1864 - * offset required on that side. |
| 1865 */ |
| 1866 |
| 1867 METHODDEF(void) |
| 1868 @@ -134,43 +233,39 @@ |
| 1869 JSAMPARRAY input_buf, JSAMPIMAGE output_buf, |
| 1870 JDIMENSION output_row, int num_rows) |
| 1871 { |
| 1872 - my_cconvert_ptr cconvert = (my_cconvert_ptr) cinfo->cconvert; |
| 1873 - register int r, g, b; |
| 1874 - register INT32 * ctab = cconvert->rgb_ycc_tab; |
| 1875 - register JSAMPROW inptr; |
| 1876 - register JSAMPROW outptr0, outptr1, outptr2; |
| 1877 - register JDIMENSION col; |
| 1878 - JDIMENSION num_cols = cinfo->image_width; |
| 1879 - |
| 1880 - while (--num_rows >= 0) { |
| 1881 - inptr = *input_buf++; |
| 1882 - outptr0 = output_buf[0][output_row]; |
| 1883 - outptr1 = output_buf[1][output_row]; |
| 1884 - outptr2 = output_buf[2][output_row]; |
| 1885 - output_row++; |
| 1886 - for (col = 0; col < num_cols; col++) { |
| 1887 - r = GETJSAMPLE(inptr[rgb_red[cinfo->in_color_space]]); |
| 1888 - g = GETJSAMPLE(inptr[rgb_green[cinfo->in_color_space]]); |
| 1889 - b = GETJSAMPLE(inptr[rgb_blue[cinfo->in_color_space]]); |
| 1890 - inptr += rgb_pixelsize[cinfo->in_color_space]; |
| 1891 - /* If the inputs are 0..MAXJSAMPLE, the outputs of these equations |
| 1892 - * must be too; we do not need an explicit range-limiting operation. |
| 1893 - * Hence the value being shifted is never negative, and we don't |
| 1894 - * need the general RIGHT_SHIFT macro. |
| 1895 - */ |
| 1896 - /* Y */ |
| 1897 - outptr0[col] = (JSAMPLE) |
| 1898 - ((ctab[r+R_Y_OFF] + ctab[g+G_Y_OFF] + ctab[b+B_Y_OFF]) |
| 1899 - >> SCALEBITS); |
| 1900 - /* Cb */ |
| 1901 - outptr1[col] = (JSAMPLE) |
| 1902 - ((ctab[r+R_CB_OFF] + ctab[g+G_CB_OFF] + ctab[b+B_CB_OFF]) |
| 1903 - >> SCALEBITS); |
| 1904 - /* Cr */ |
| 1905 - outptr2[col] = (JSAMPLE) |
| 1906 - ((ctab[r+R_CR_OFF] + ctab[g+G_CR_OFF] + ctab[b+B_CR_OFF]) |
| 1907 - >> SCALEBITS); |
| 1908 - } |
| 1909 + switch (cinfo->in_color_space) { |
| 1910 + case JCS_EXT_RGB: |
| 1911 + extrgb_ycc_convert_internal(cinfo, input_buf, output_buf, output_row, |
| 1912 + num_rows); |
| 1913 + break; |
| 1914 + case JCS_EXT_RGBX: |
| 1915 + case JCS_EXT_RGBA: |
| 1916 + extrgbx_ycc_convert_internal(cinfo, input_buf, output_buf, output_row, |
| 1917 + num_rows); |
| 1918 + break; |
| 1919 + case JCS_EXT_BGR: |
| 1920 + extbgr_ycc_convert_internal(cinfo, input_buf, output_buf, output_row, |
| 1921 + num_rows); |
| 1922 + break; |
| 1923 + case JCS_EXT_BGRX: |
| 1924 + case JCS_EXT_BGRA: |
| 1925 + extbgrx_ycc_convert_internal(cinfo, input_buf, output_buf, output_row, |
| 1926 + num_rows); |
| 1927 + break; |
| 1928 + case JCS_EXT_XBGR: |
| 1929 + case JCS_EXT_ABGR: |
| 1930 + extxbgr_ycc_convert_internal(cinfo, input_buf, output_buf, output_row, |
| 1931 + num_rows); |
| 1932 + break; |
| 1933 + case JCS_EXT_XRGB: |
| 1934 + case JCS_EXT_ARGB: |
| 1935 + extxrgb_ycc_convert_internal(cinfo, input_buf, output_buf, output_row, |
| 1936 + num_rows); |
| 1937 + break; |
| 1938 + default: |
| 1939 + rgb_ycc_convert_internal(cinfo, input_buf, output_buf, output_row, |
| 1940 + num_rows); |
| 1941 + break; |
| 1942 } |
| 1943 } |
| 1944 |
| 1945 @@ -180,9 +275,6 @@ |
| 1946 |
| 1947 /* |
| 1948 * Convert some rows of samples to the JPEG colorspace. |
| 1949 - * This version handles RGB->grayscale conversion, which is the same |
| 1950 - * as the RGB->Y portion of RGB->YCbCr. |
| 1951 - * We assume rgb_ycc_start has been called (we only use the Y tables). |
| 1952 */ |
| 1953 |
| 1954 METHODDEF(void) |
| 1955 @@ -190,28 +282,85 @@ |
| 1956 JSAMPARRAY input_buf, JSAMPIMAGE output_buf, |
| 1957 JDIMENSION output_row, int num_rows) |
| 1958 { |
| 1959 - my_cconvert_ptr cconvert = (my_cconvert_ptr) cinfo->cconvert; |
| 1960 - register int r, g, b; |
| 1961 - register INT32 * ctab = cconvert->rgb_ycc_tab; |
| 1962 - register JSAMPROW inptr; |
| 1963 - register JSAMPROW outptr; |
| 1964 - register JDIMENSION col; |
| 1965 - JDIMENSION num_cols = cinfo->image_width; |
| 1966 + switch (cinfo->in_color_space) { |
| 1967 + case JCS_EXT_RGB: |
| 1968 + extrgb_gray_convert_internal(cinfo, input_buf, output_buf, output_row, |
| 1969 + num_rows); |
| 1970 + break; |
| 1971 + case JCS_EXT_RGBX: |
| 1972 + case JCS_EXT_RGBA: |
| 1973 + extrgbx_gray_convert_internal(cinfo, input_buf, output_buf, output_row, |
| 1974 + num_rows); |
| 1975 + break; |
| 1976 + case JCS_EXT_BGR: |
| 1977 + extbgr_gray_convert_internal(cinfo, input_buf, output_buf, output_row, |
| 1978 + num_rows); |
| 1979 + break; |
| 1980 + case JCS_EXT_BGRX: |
| 1981 + case JCS_EXT_BGRA: |
| 1982 + extbgrx_gray_convert_internal(cinfo, input_buf, output_buf, output_row, |
| 1983 + num_rows); |
| 1984 + break; |
| 1985 + case JCS_EXT_XBGR: |
| 1986 + case JCS_EXT_ABGR: |
| 1987 + extxbgr_gray_convert_internal(cinfo, input_buf, output_buf, output_row, |
| 1988 + num_rows); |
| 1989 + break; |
| 1990 + case JCS_EXT_XRGB: |
| 1991 + case JCS_EXT_ARGB: |
| 1992 + extxrgb_gray_convert_internal(cinfo, input_buf, output_buf, output_row, |
| 1993 + num_rows); |
| 1994 + break; |
| 1995 + default: |
| 1996 + rgb_gray_convert_internal(cinfo, input_buf, output_buf, output_row, |
| 1997 + num_rows); |
| 1998 + break; |
| 1999 + } |
| 2000 +} |
| 2001 |
| 2002 - while (--num_rows >= 0) { |
| 2003 - inptr = *input_buf++; |
| 2004 - outptr = output_buf[0][output_row]; |
| 2005 - output_row++; |
| 2006 - for (col = 0; col < num_cols; col++) { |
| 2007 - r = GETJSAMPLE(inptr[rgb_red[cinfo->in_color_space]]); |
| 2008 - g = GETJSAMPLE(inptr[rgb_green[cinfo->in_color_space]]); |
| 2009 - b = GETJSAMPLE(inptr[rgb_blue[cinfo->in_color_space]]); |
| 2010 - inptr += rgb_pixelsize[cinfo->in_color_space]; |
| 2011 - /* Y */ |
| 2012 - outptr[col] = (JSAMPLE) |
| 2013 - ((ctab[r+R_Y_OFF] + ctab[g+G_Y_OFF] + ctab[b+B_Y_OFF]) |
| 2014 - >> SCALEBITS); |
| 2015 - } |
| 2016 + |
| 2017 +/* |
| 2018 + * Extended RGB to plain RGB conversion |
| 2019 + */ |
| 2020 + |
| 2021 +METHODDEF(void) |
| 2022 +rgb_rgb_convert (j_compress_ptr cinfo, |
| 2023 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf, |
| 2024 + JDIMENSION output_row, int num_rows) |
| 2025 +{ |
| 2026 + switch (cinfo->in_color_space) { |
| 2027 + case JCS_EXT_RGB: |
| 2028 + extrgb_rgb_convert_internal(cinfo, input_buf, output_buf, output_row, |
| 2029 + num_rows); |
| 2030 + break; |
| 2031 + case JCS_EXT_RGBX: |
| 2032 + case JCS_EXT_RGBA: |
| 2033 + extrgbx_rgb_convert_internal(cinfo, input_buf, output_buf, output_row, |
| 2034 + num_rows); |
| 2035 + break; |
| 2036 + case JCS_EXT_BGR: |
| 2037 + extbgr_rgb_convert_internal(cinfo, input_buf, output_buf, output_row, |
| 2038 + num_rows); |
| 2039 + break; |
| 2040 + case JCS_EXT_BGRX: |
| 2041 + case JCS_EXT_BGRA: |
| 2042 + extbgrx_rgb_convert_internal(cinfo, input_buf, output_buf, output_row, |
| 2043 + num_rows); |
| 2044 + break; |
| 2045 + case JCS_EXT_XBGR: |
| 2046 + case JCS_EXT_ABGR: |
| 2047 + extxbgr_rgb_convert_internal(cinfo, input_buf, output_buf, output_row, |
| 2048 + num_rows); |
| 2049 + break; |
| 2050 + case JCS_EXT_XRGB: |
| 2051 + case JCS_EXT_ARGB: |
| 2052 + extxrgb_rgb_convert_internal(cinfo, input_buf, output_buf, output_row, |
| 2053 + num_rows); |
| 2054 + break; |
| 2055 + default: |
| 2056 + rgb_rgb_convert_internal(cinfo, input_buf, output_buf, output_row, |
| 2057 + num_rows); |
| 2058 + break; |
| 2059 } |
| 2060 } |
| 2061 |
| 2062 @@ -377,6 +526,10 @@ |
| 2063 case JCS_EXT_BGRX: |
| 2064 case JCS_EXT_XBGR: |
| 2065 case JCS_EXT_XRGB: |
| 2066 + case JCS_EXT_RGBA: |
| 2067 + case JCS_EXT_BGRA: |
| 2068 + case JCS_EXT_ABGR: |
| 2069 + case JCS_EXT_ARGB: |
| 2070 if (cinfo->input_components != rgb_pixelsize[cinfo->in_color_space]) |
| 2071 ERREXIT(cinfo, JERR_BAD_IN_COLORSPACE); |
| 2072 break; |
| 2073 @@ -411,9 +564,17 @@ |
| 2074 cinfo->in_color_space == JCS_EXT_BGR || |
| 2075 cinfo->in_color_space == JCS_EXT_BGRX || |
| 2076 cinfo->in_color_space == JCS_EXT_XBGR || |
| 2077 - cinfo->in_color_space == JCS_EXT_XRGB) { |
| 2078 - cconvert->pub.start_pass = rgb_ycc_start; |
| 2079 - cconvert->pub.color_convert = rgb_gray_convert; |
| 2080 + cinfo->in_color_space == JCS_EXT_XRGB || |
| 2081 + cinfo->in_color_space == JCS_EXT_RGBA || |
| 2082 + cinfo->in_color_space == JCS_EXT_BGRA || |
| 2083 + cinfo->in_color_space == JCS_EXT_ABGR || |
| 2084 + cinfo->in_color_space == JCS_EXT_ARGB) { |
| 2085 + if (jsimd_can_rgb_gray()) |
| 2086 + cconvert->pub.color_convert = jsimd_rgb_gray_convert; |
| 2087 + else { |
| 2088 + cconvert->pub.start_pass = rgb_ycc_start; |
| 2089 + cconvert->pub.color_convert = rgb_gray_convert; |
| 2090 + } |
| 2091 } else if (cinfo->in_color_space == JCS_YCbCr) |
| 2092 cconvert->pub.color_convert = grayscale_convert; |
| 2093 else |
| 2094 @@ -421,17 +582,25 @@ |
| 2095 break; |
| 2096 |
| 2097 case JCS_RGB: |
| 2098 - case JCS_EXT_RGB: |
| 2099 - case JCS_EXT_RGBX: |
| 2100 - case JCS_EXT_BGR: |
| 2101 - case JCS_EXT_BGRX: |
| 2102 - case JCS_EXT_XBGR: |
| 2103 - case JCS_EXT_XRGB: |
| 2104 if (cinfo->num_components != 3) |
| 2105 ERREXIT(cinfo, JERR_BAD_J_COLORSPACE); |
| 2106 - if (cinfo->in_color_space == cinfo->jpeg_color_space && |
| 2107 - rgb_pixelsize[cinfo->in_color_space] == 3) |
| 2108 + if (rgb_red[cinfo->in_color_space] == 0 && |
| 2109 + rgb_green[cinfo->in_color_space] == 1 && |
| 2110 + rgb_blue[cinfo->in_color_space] == 2 && |
| 2111 + rgb_pixelsize[cinfo->in_color_space] == 3) |
| 2112 cconvert->pub.color_convert = null_convert; |
| 2113 + else if (cinfo->in_color_space == JCS_RGB || |
| 2114 + cinfo->in_color_space == JCS_EXT_RGB || |
| 2115 + cinfo->in_color_space == JCS_EXT_RGBX || |
| 2116 + cinfo->in_color_space == JCS_EXT_BGR || |
| 2117 + cinfo->in_color_space == JCS_EXT_BGRX || |
| 2118 + cinfo->in_color_space == JCS_EXT_XBGR || |
| 2119 + cinfo->in_color_space == JCS_EXT_XRGB || |
| 2120 + cinfo->in_color_space == JCS_EXT_RGBA || |
| 2121 + cinfo->in_color_space == JCS_EXT_BGRA || |
| 2122 + cinfo->in_color_space == JCS_EXT_ABGR || |
| 2123 + cinfo->in_color_space == JCS_EXT_ARGB) |
| 2124 + cconvert->pub.color_convert = rgb_rgb_convert; |
| 2125 else |
| 2126 ERREXIT(cinfo, JERR_CONVERSION_NOTIMPL); |
| 2127 break; |
| 2128 @@ -445,7 +614,11 @@ |
| 2129 cinfo->in_color_space == JCS_EXT_BGR || |
| 2130 cinfo->in_color_space == JCS_EXT_BGRX || |
| 2131 cinfo->in_color_space == JCS_EXT_XBGR || |
| 2132 - cinfo->in_color_space == JCS_EXT_XRGB) { |
| 2133 + cinfo->in_color_space == JCS_EXT_XRGB || |
| 2134 + cinfo->in_color_space == JCS_EXT_RGBA || |
| 2135 + cinfo->in_color_space == JCS_EXT_BGRA || |
| 2136 + cinfo->in_color_space == JCS_EXT_ABGR || |
| 2137 + cinfo->in_color_space == JCS_EXT_ARGB) { |
| 2138 if (jsimd_can_rgb_ycc()) |
| 2139 cconvert->pub.color_convert = jsimd_rgb_ycc_convert; |
| 2140 else { |
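Reviewer note on the jccolor.c hunks above: the null_convert test now checks the component offsets and pixel size of the input layout instead of comparing colorspace enums, and every other RGB-family layout is routed through rgb_rgb_convert. The following is a minimal sketch of that table-driven idea, not the patch's actual code. The pixel_layout struct, the two layout constants, and both function names are hypothetical and exist only for illustration.

```c
#include <stddef.h>

/* Hypothetical description of one packed pixel format: byte offsets of the
 * red, green and blue samples, and the total bytes per pixel. */
typedef struct {
  int red, green, blue, pixelsize;
} pixel_layout;

static const pixel_layout LAYOUT_RGB  = { 0, 1, 2, 3 };
static const pixel_layout LAYOUT_BGRX = { 2, 1, 0, 4 };

/* Mirrors the condition in the hunk: only a tightly packed R,G,B layout
 * can skip reordering. */
static int is_plain_rgb(const pixel_layout *l)
{
  return l->red == 0 && l->green == 1 && l->blue == 2 && l->pixelsize == 3;
}

/* Reorders any supported layout into separate R, G and B planes.
 * Padding or alpha bytes are skipped. */
static void split_rgb_planes(const unsigned char *in, size_t num_pixels,
                             const pixel_layout *l, unsigned char *r,
                             unsigned char *g, unsigned char *b)
{
  size_t i;

  for (i = 0; i < num_pixels; i++) {
    r[i] = in[l->red];
    g[i] = in[l->green];
    b[i] = in[l->blue];
    in += l->pixelsize;
  }
}
```

Checking offsets rather than enum identity means any enum value that happens to describe plain 3-byte RGB takes the fast path, which appears to be the intent of the change.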
| 2141 Index: jcdctmgr.c |
| 2142 =================================================================== |
| 2143 --- jcdctmgr.c (revision 829) |
| 2144 +++ jcdctmgr.c (working copy) |
| 2145 @@ -1,10 +1,12 @@ |
| 2146 /* |
| 2147 * jcdctmgr.c |
| 2148 * |
| 2149 + * This file was part of the Independent JPEG Group's software: |
| 2150 * Copyright (C) 1994-1996, Thomas G. Lane. |
| 2151 + * libjpeg-turbo Modifications: |
| 2152 * Copyright (C) 1999-2006, MIYASAKA Masaru. |
| 2153 * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB |
| 2154 - * This file is part of the Independent JPEG Group's software. |
| 2155 + * Copyright (C) 2011 D. R. Commander |
| 2156 * For conditions of distribution and use, see the accompanying README file. |
| 2157 * |
| 2158 * This file contains the forward-DCT management logic. |
| 2159 @@ -39,6 +41,8 @@ |
| 2160 (JCOEFPTR coef_block, FAST_FLOAT * divisors, |
| 2161 FAST_FLOAT * workspace)); |
| 2162 |
| 2163 +METHODDEF(void) quantize (JCOEFPTR, DCTELEM *, DCTELEM *); |
| 2164 + |
| 2165 typedef struct { |
| 2166 struct jpeg_forward_dct pub; /* public fields */ |
| 2167 |
| 2168 @@ -73,7 +77,7 @@ |
| 2169 * Find the highest bit in an integer through binary search. |
| 2170 */ |
| 2171 LOCAL(int) |
| 2172 -fls (UINT16 val) |
| 2173 +flss (UINT16 val) |
| 2174 { |
| 2175 int bit; |
| 2176 |
| 2177 @@ -160,7 +164,7 @@ |
| 2178 * of in a consecutive manner, yet again in order to allow SIMD |
| 2179 * routines. |
| 2180 */ |
| 2181 -LOCAL(void) |
| 2182 +LOCAL(int) |
| 2183 compute_reciprocal (UINT16 divisor, DCTELEM * dtbl) |
| 2184 { |
| 2185 UDCTELEM2 fq, fr; |
| 2186 @@ -167,7 +171,7 @@ |
| 2187 UDCTELEM c; |
| 2188 int b, r; |
| 2189 |
| 2190 - b = fls(divisor) - 1; |
| 2191 + b = flss(divisor) - 1; |
| 2192 r = sizeof(DCTELEM) * 8 + b; |
| 2193 |
| 2194 fq = ((UDCTELEM2)1 << r) / divisor; |
| 2195 @@ -179,7 +183,7 @@ |
| 2196 /* fq will be one bit too large to fit in DCTELEM, so adjust */ |
| 2197 fq >>= 1; |
| 2198 r--; |
| 2199 - } else if (fr <= (divisor / 2)) { /* fractional part is < 0.5 */ |
| 2200 + } else if (fr <= (divisor / 2U)) { /* fractional part is < 0.5 */ |
| 2201 c++; |
| 2202 } else { /* fractional part is > 0.5 */ |
| 2203 fq++; |
| 2204 @@ -189,6 +193,9 @@ |
| 2205 dtbl[DCTSIZE2 * 1] = (DCTELEM) c; /* correction + roundfactor */ |
| 2206 dtbl[DCTSIZE2 * 2] = (DCTELEM) (1 << (sizeof(DCTELEM)*8*2 - r)); /* scale */ |
| 2207 dtbl[DCTSIZE2 * 3] = (DCTELEM) r - sizeof(DCTELEM)*8; /* shift */ |
| 2208 + |
| 2209 + if(r <= 16) return 0; |
| 2210 + else return 1; |
| 2211 } |
| 2212 |
| 2213 /* |
| 2214 @@ -232,7 +239,9 @@ |
| 2215 } |
| 2216 dtbl = fdct->divisors[qtblno]; |
| 2217 for (i = 0; i < DCTSIZE2; i++) { |
| 2218 - compute_reciprocal(qtbl->quantval[i] << 3, &dtbl[i]); |
| 2219 + if(!compute_reciprocal(qtbl->quantval[i] << 3, &dtbl[i]) |
| 2220 + && fdct->quantize == jsimd_quantize) |
| 2221 + fdct->quantize = quantize; |
| 2222 } |
| 2223 break; |
| 2224 #endif |
| 2225 @@ -266,10 +275,12 @@ |
| 2226 } |
| 2227 dtbl = fdct->divisors[qtblno]; |
| 2228 for (i = 0; i < DCTSIZE2; i++) { |
| 2229 - compute_reciprocal( |
| 2230 + if(!compute_reciprocal( |
| 2231 DESCALE(MULTIPLY16V16((INT32) qtbl->quantval[i], |
| 2232 (INT32) aanscales[i]), |
| 2233 - CONST_BITS-3), &dtbl[i]); |
| 2234 + CONST_BITS-3), &dtbl[i]) |
| 2235 + && fdct->quantize == jsimd_quantize) |
| 2236 + fdct->quantize = quantize; |
| 2237 } |
| 2238 } |
| 2239 break; |
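Reviewer note on the jcdctmgr.c hunks above: the helper renamed from fls to flss (presumably to avoid clashing with a platform-provided fls(), though the patch does not say so) finds the highest set bit of a 16-bit value by binary search. Its body is not part of the diff, so the version below is only a sketch of the technique under that description. The name highest_bit16 is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: returns the 1-based position of the highest set bit
 * in a 16-bit value, or 0 if no bit is set. Each step halves the range that
 * can still contain the top bit. */
static int highest_bit16(uint16_t val)
{
  int bit = 16;

  if (val == 0)
    return 0;
  if (!(val & 0xff00)) { bit -= 8; val <<= 8; }  /* top bit in low byte */
  if (!(val & 0xf000)) { bit -= 4; val <<= 4; }
  if (!(val & 0xc000)) { bit -= 2; val <<= 2; }
  if (!(val & 0x8000)) { bit -= 1; val <<= 1; }
  return bit;
}
```

In the patch, the result minus one becomes b, and r = sizeof(DCTELEM) * 8 + b. The new return value of compute_reciprocal is 0 when r <= 16, in which case the caller replaces jsimd_quantize with the C quantize routine.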
| 2240 Index: jchuff.c |
| 2241 =================================================================== |
| 2242 --- jchuff.c (revision 829) |
| 2243 +++ jchuff.c (working copy) |
| 2244 @@ -1,8 +1,10 @@ |
| 2245 /* |
| 2246 * jchuff.c |
| 2247 * |
| 2248 + * This file was part of the Independent JPEG Group's software: |
| 2249 * Copyright (C) 1991-1997, Thomas G. Lane. |
| 2250 - * This file is part of the Independent JPEG Group's software. |
| 2251 + * libjpeg-turbo Modifications: |
| 2252 + * Copyright (C) 2009-2011, D. R. Commander. |
| 2253 * For conditions of distribution and use, see the accompanying README file. |
| 2254 * |
| 2255 * This file contains Huffman entropy encoding routines. |
| 2256 @@ -14,21 +16,6 @@ |
| 2257 * permanent JPEG objects only upon successful completion of an MCU. |
| 2258 */ |
| 2259 |
| 2260 -/* Modifications: |
| 2261 - * Copyright (C)2007 Sun Microsystems, Inc. |
| 2262 - * Copyright (C)2009 D. R. Commander |
| 2263 - * |
| 2264 - * This library is free software and may be redistributed and/or modified under |
| 2265 - * the terms of the wxWindows Library License, Version 3.1 or (at your option) |
| 2266 - * any later version. The full license is in the LICENSE.txt file included |
| 2267 - * with this distribution. |
| 2268 - * |
| 2269 - * This library is distributed in the hope that it will be useful, |
| 2270 - * but WITHOUT ANY WARRANTY; without even the implied warranty of |
| 2271 - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
| 2272 - * wxWindows Library License for more details. |
| 2273 - */ |
| 2274 - |
| 2275 #define JPEG_INTERNALS |
| 2276 #include "jinclude.h" |
| 2277 #include "jpeglib.h" |
| 2278 @@ -35,13 +22,42 @@ |
| 2279 #include "jchuff.h" /* Declarations shared with jcphuff.c */ |
| 2280 #include <limits.h> |
| 2281 |
| 2282 -static unsigned char jpeg_first_bit_table[65536]; |
| 2283 -int jpeg_first_bit_table_init=0; |
| 2284 +/* |
| 2285 + * NOTE: If USE_CLZ_INTRINSIC is defined, then clz/bsr instructions will be |
| 2286 + * used for bit counting rather than the lookup table. This will reduce the |
| 2287 + * memory footprint by 64k, which is important for some mobile applications |
| 2288 + * that create many isolated instances of libjpeg-turbo (web browsers, for |
| 2289 + * instance.) This may improve performance on some mobile platforms as well. |
| 2290 + * This feature is enabled by default only on ARM processors, because some x86 |
| 2291 + * chips have a slow implementation of bsr, and the use of clz/bsr cannot be |
| 2292 + * shown to have a significant performance impact even on the x86 chips that |
| 2293 + * have a fast implementation of it. When building for ARMv6, you can |
| 2294 + * explicitly disable the use of clz/bsr by adding -mthumb to the compiler |
| 2295 + * flags (this defines __thumb__). |
| 2296 + */ |
| 2297 |
| 2298 +/* NOTE: Both GCC and Clang define __GNUC__ */ |
| 2299 +#if defined __GNUC__ && defined __arm__ |
| 2300 +#if !defined __thumb__ || defined __thumb2__ |
| 2301 +#define USE_CLZ_INTRINSIC |
| 2302 +#endif |
| 2303 +#endif |
| 2304 + |
| 2305 +#ifdef USE_CLZ_INTRINSIC |
| 2306 +#define JPEG_NBITS_NONZERO(x) (32 - __builtin_clz(x)) |
| 2307 +#define JPEG_NBITS(x) (x ? JPEG_NBITS_NONZERO(x) : 0) |
| 2308 +#else |
| 2309 +static unsigned char jpeg_nbits_table[65536]; |
| 2310 +static int jpeg_nbits_table_init = 0; |
| 2311 +#define JPEG_NBITS(x) (jpeg_nbits_table[x]) |
| 2312 +#define JPEG_NBITS_NONZERO(x) JPEG_NBITS(x) |
| 2313 +#endif |
| 2314 + |
| 2315 #ifndef min |
| 2316 #define min(a,b) ((a)<(b)?(a):(b)) |
| 2317 #endif |
| 2318 |
| 2319 + |
| 2320 /* Expanded entropy encoder object for Huffman encoding. |
| 2321 * |
| 2322 * The savable_state subrecord contains fields that change within an MCU, |
| 2323 @@ -49,7 +65,7 @@ |
| 2324 */ |
| 2325 |
| 2326 typedef struct { |
| 2327 - long put_buffer; /* current bit-accumulation buffer */ |
| 2328 + size_t put_buffer; /* current bit-accumulation buffer */ |
| 2329 int put_bits; /* # of bits now in it */ |
| 2330 int last_dc_val[MAX_COMPS_IN_SCAN]; /* last DC coef for each component */ |
| 2331 } savable_state; |
| 2332 @@ -181,7 +197,6 @@ |
| 2333 } |
| 2334 |
| 2335 /* Initialize bit buffer to empty */ |
| 2336 - |
| 2337 entropy->saved.put_buffer = 0; |
| 2338 entropy->saved.put_bits = 0; |
| 2339 |
| 2340 @@ -285,14 +300,16 @@ |
| 2341 dtbl->ehufsi[i] = huffsize[p]; |
| 2342 } |
| 2343 |
| 2344 - if(!jpeg_first_bit_table_init) { |
| 2345 +#ifndef USE_CLZ_INTRINSIC |
| 2346 + if(!jpeg_nbits_table_init) { |
| 2347 for(i = 0; i < 65536; i++) { |
| 2348 - int bit = 0, val = i; |
| 2349 - while (val) {val >>= 1; bit++;} |
| 2350 - jpeg_first_bit_table[i] = bit; |
| 2351 + int nbits = 0, temp = i; |
| 2352 + while (temp) {temp >>= 1; nbits++;} |
| 2353 + jpeg_nbits_table[i] = nbits; |
| 2354 } |
| 2355 - jpeg_first_bit_table_init = 1; |
| 2356 + jpeg_nbits_table_init = 1; |
| 2357 } |
| 2358 +#endif |
| 2359 } |
| 2360 |
| 2361 |
| 2362 @@ -312,8 +329,6 @@ |
| 2363 { |
| 2364 struct jpeg_destination_mgr * dest = state->cinfo->dest; |
| 2365 |
| 2366 - dest->free_in_buffer = state->free_in_buffer; |
| 2367 - |
| 2368 if (! (*dest->empty_output_buffer) (state->cinfo)) |
| 2369 return FALSE; |
| 2370 /* After a successful buffer dump, must reset buffer pointers */ |
| 2371 @@ -325,178 +340,133 @@ |
| 2372 |
| 2373 /* Outputting bits to the file */ |
| 2374 |
| 2375 -/* Only the right 24 bits of put_buffer are used; the valid bits are |
| 2376 - * left-justified in this part. At most 16 bits can be passed to emit_bits |
| 2377 - * in one call, and we never retain more than 7 bits in put_buffer |
| 2378 - * between calls, so 24 bits are sufficient. |
| 2379 +/* These macros perform the same task as the emit_bits() function in the |
| 2380 + * original libjpeg code. In addition to reducing overhead by explicitly |
| 2381 + * inlining the code, additional performance is achieved by taking into |
| 2382 + * account the size of the bit buffer and waiting until it is almost full |
| 2383 + * before emptying it. This mostly benefits 64-bit platforms, since 6 |
| 2384 + * bytes can be stored in a 64-bit bit buffer before it has to be emptied. |
| 2385 */ |
| 2386 |
| 2387 -/***************************************************************/ |
| 2388 - |
| 2389 -#define EMIT_BYTE() { \ |
| 2390 - if (0xFF == (*buffer++ = (unsigned char)(put_buffer >> (put_bits -= 8)))) \ |
| 2391 - *buffer++ = 0; \ |
| 2392 +#define EMIT_BYTE() { \ |
| 2393 + JOCTET c; \ |
| 2394 + put_bits -= 8; \ |
| 2395 + c = (JOCTET)GETJOCTET(put_buffer >> put_bits); \ |
| 2396 + *buffer++ = c; \ |
| 2397 + if (c == 0xFF) /* need to stuff a zero byte? */ \ |
| 2398 + *buffer++ = 0; \ |
| 2399 } |
| 2400 |
| 2401 -/***************************************************************/ |
| 2402 +#define PUT_BITS(code, size) { \ |
| 2403 + put_bits += size; \ |
| 2404 + put_buffer = (put_buffer << size) | code; \ |
| 2405 +} |
| 2406 |
| 2407 -#define DUMP_BITS_(code, size) { \ |
| 2408 - put_bits += size; \ |
| 2409 - put_buffer = (put_buffer << size) | code; \ |
| 2410 - if (put_bits > 7) \ |
| 2411 - while(put_bits > 7) \ |
| 2412 - EMIT_BYTE() \ |
| 2413 - } |
| 2414 - |
| 2415 -/***************************************************************/ |
| 2416 - |
| 2417 -#define CHECKBUF15() { \ |
| 2418 - if (put_bits > 15) { \ |
| 2419 - EMIT_BYTE() \ |
| 2420 - EMIT_BYTE() \ |
| 2421 - } \ |
| 2422 +#define CHECKBUF15() { \ |
| 2423 + if (put_bits > 15) { \ |
| 2424 + EMIT_BYTE() \ |
| 2425 + EMIT_BYTE() \ |
| 2426 + } \ |
| 2427 } |
| 2428 |
| 2429 -#define CHECKBUF47() { \ |
| 2430 - if (put_bits > 47) { \ |
| 2431 - EMIT_BYTE() \ |
| 2432 - EMIT_BYTE() \ |
| 2433 - EMIT_BYTE() \ |
| 2434 - EMIT_BYTE() \ |
| 2435 - EMIT_BYTE() \ |
| 2436 - EMIT_BYTE() \ |
| 2437 - } \ |
| 2438 +#define CHECKBUF31() { \ |
| 2439 + if (put_bits > 31) { \ |
| 2440 + EMIT_BYTE() \ |
| 2441 + EMIT_BYTE() \ |
| 2442 + EMIT_BYTE() \ |
| 2443 + EMIT_BYTE() \ |
| 2444 + } \ |
| 2445 } |
| 2446 |
| 2447 -#define CHECKBUF31() { \ |
| 2448 - if (put_bits > 31) { \ |
| 2449 - EMIT_BYTE() \ |
| 2450 - EMIT_BYTE() \ |
| 2451 - EMIT_BYTE() \ |
| 2452 - EMIT_BYTE() \ |
| 2453 - } \ |
| 2454 +#define CHECKBUF47() { \ |
| 2455 + if (put_bits > 47) { \ |
| 2456 + EMIT_BYTE() \ |
| 2457 + EMIT_BYTE() \ |
| 2458 + EMIT_BYTE() \ |
| 2459 + EMIT_BYTE() \ |
| 2460 + EMIT_BYTE() \ |
| 2461 + EMIT_BYTE() \ |
| 2462 + } \ |
| 2463 } |
| 2464 |
| 2465 -/***************************************************************/ |
| 2466 +#if __WORDSIZE==64 || defined(_WIN64) |
| 2467 |
| 2468 -#define DUMP_BITS_NOCHECK(code, size) { \ |
| 2469 - put_bits += size; \ |
| 2470 - put_buffer = (put_buffer << size) | code; \ |
| 2471 - } |
| 2472 +#define EMIT_BITS(code, size) { \ |
| 2473 + CHECKBUF47() \ |
| 2474 + PUT_BITS(code, size) \ |
| 2475 +} |
| 2476 |
| 2477 -#if __WORDSIZE==64 |
| 2478 - |
| 2479 -#define DUMP_BITS(code, size) { \ |
| 2480 - CHECKBUF47() \ |
| 2481 - put_bits += size; \ |
| 2482 - put_buffer = (put_buffer << size) | code; \ |
| 2483 +#define EMIT_CODE(code, size) { \ |
| 2484 + temp2 &= (((INT32) 1)<<nbits) - 1; \ |
| 2485 + CHECKBUF31() \ |
| 2486 + PUT_BITS(code, size) \ |
| 2487 + PUT_BITS(temp2, nbits) \ |
| 2488 } |
| 2489 |
| 2490 #else |
| 2491 |
| 2492 -#define DUMP_BITS(code, size) { \ |
| 2493 - put_bits += size; \ |
| 2494 - put_buffer = (put_buffer << size) | code; \ |
| 2495 - CHECKBUF15() \ |
| 2496 - } |
| 2497 +#define EMIT_BITS(code, size) { \ |
| 2498 + PUT_BITS(code, size) \ |
| 2499 + CHECKBUF15() \ |
| 2500 +} |
| 2501 |
| 2502 -#endif |
| 2503 - |
| 2504 -/***************************************************************/ |
| 2505 - |
| 2506 -#define DUMP_SINGLE_VALUE(ht, codevalue) { \ |
| 2507 - size = ht->ehufsi[codevalue]; \ |
| 2508 - code = ht->ehufco[codevalue]; \ |
| 2509 - \ |
| 2510 - DUMP_BITS(code, size) \ |
| 2511 +#define EMIT_CODE(code, size) { \ |
| 2512 + temp2 &= (((INT32) 1)<<nbits) - 1; \ |
| 2513 + PUT_BITS(code, size) \ |
| 2514 + CHECKBUF15() \ |
| 2515 + PUT_BITS(temp2, nbits) \ |
| 2516 + CHECKBUF15() \ |
| 2517 } |
| 2518 |
| 2519 -/***************************************************************/ |
| 2520 - |
| 2521 -#define DUMP_VALUE_SLOW(ht, codevalue, t, nbits) { \ |
| 2522 - size = ht->ehufsi[codevalue]; \ |
| 2523 - code = ht->ehufco[codevalue]; \ |
| 2524 - t &= ~(-1 << nbits); \ |
| 2525 - DUMP_BITS_NOCHECK(code, size) \ |
| 2526 - CHECKBUF15() \ |
| 2527 - DUMP_BITS_NOCHECK(t, nbits) \ |
| 2528 - CHECKBUF15() \ |
| 2529 - } |
| 2530 - |
| 2531 -int _max=0; |
| 2532 - |
| 2533 -#if __WORDSIZE==64 |
| 2534 - |
| 2535 -#define DUMP_VALUE(ht, codevalue, t, nbits) { \ |
| 2536 - size = ht->ehufsi[codevalue]; \ |
| 2537 - code = ht->ehufco[codevalue]; \ |
| 2538 - t &= ~(-1 << nbits); \ |
| 2539 - CHECKBUF31() \ |
| 2540 - DUMP_BITS_NOCHECK(code, size) \ |
| 2541 - DUMP_BITS_NOCHECK(t, nbits) \ |
| 2542 - } |
| 2543 - |
| 2544 -#else |
| 2545 - |
| 2546 -#define DUMP_VALUE(ht, codevalue, t, nbits) { \ |
| 2547 - size = ht->ehufsi[codevalue]; \ |
| 2548 - code = ht->ehufco[codevalue]; \ |
| 2549 - t &= ~(-1 << nbits); \ |
| 2550 - DUMP_BITS_NOCHECK(code, size) \ |
| 2551 - CHECKBUF15() \ |
| 2552 - DUMP_BITS_NOCHECK(t, nbits) \ |
| 2553 - CHECKBUF15() \ |
| 2554 - } |
| 2555 - |
| 2556 #endif |
| 2557 |
| 2558 -/***************************************************************/ |
| 2559 |
| 2560 #define BUFSIZE (DCTSIZE2 * 2) |
| 2561 |
| 2562 -#define LOAD_BUFFER() { \ |
| 2563 - if (state->free_in_buffer < BUFSIZE) { \ |
| 2564 - localbuf = 1; \ |
| 2565 - buffer = _buffer; \ |
| 2566 - } \ |
| 2567 - else buffer = state->next_output_byte; \ |
| 2568 +#define LOAD_BUFFER() { \ |
| 2569 + if (state->free_in_buffer < BUFSIZE) { \ |
| 2570 + localbuf = 1; \ |
| 2571 + buffer = _buffer; \ |
| 2572 + } \ |
| 2573 + else buffer = state->next_output_byte; \ |
| 2574 } |
| 2575 |
| 2576 -#define STORE_BUFFER() { \ |
| 2577 - if (localbuf) { \ |
| 2578 - bytes = buffer - _buffer; \ |
| 2579 - buffer = _buffer; \ |
| 2580 - while (bytes > 0) { \ |
| 2581 - bytestocopy = min(bytes, state->free_in_buffer); \ |
| 2582 - MEMCOPY(state->next_output_byte, buffer, bytestocopy); \ |
| 2583 - state->next_output_byte += bytestocopy; \ |
| 2584 - buffer += bytestocopy; \ |
| 2585 - state->free_in_buffer -= bytestocopy; \ |
| 2586 - if (state->free_in_buffer == 0) \ |
| 2587 - if (! dump_buffer(state)) return FALSE; \ |
| 2588 - bytes -= bytestocopy; \ |
| 2589 - } \ |
| 2590 - } \ |
| 2591 - else { \ |
| 2592 - state->free_in_buffer -= (buffer - state->next_output_byte); \ |
| 2593 - state->next_output_byte = buffer; \ |
| 2594 - } \ |
| 2595 +#define STORE_BUFFER() { \ |
| 2596 + if (localbuf) { \ |
| 2597 + bytes = buffer - _buffer; \ |
| 2598 + buffer = _buffer; \ |
| 2599 + while (bytes > 0) { \ |
| 2600 + bytestocopy = min(bytes, state->free_in_buffer); \ |
| 2601 + MEMCOPY(state->next_output_byte, buffer, bytestocopy); \ |
| 2602 + state->next_output_byte += bytestocopy; \ |
| 2603 + buffer += bytestocopy; \ |
| 2604 + state->free_in_buffer -= bytestocopy; \ |
| 2605 + if (state->free_in_buffer == 0) \ |
| 2606 + if (! dump_buffer(state)) return FALSE; \ |
| 2607 + bytes -= bytestocopy; \ |
| 2608 + } \ |
| 2609 + } \ |
| 2610 + else { \ |
| 2611 + state->free_in_buffer -= (buffer - state->next_output_byte); \ |
| 2612 + state->next_output_byte = buffer; \ |
| 2613 + } \ |
| 2614 } |
| 2615 |
| 2616 -/***************************************************************/ |
| 2617 |
| 2618 LOCAL(boolean) |
| 2619 flush_bits (working_state * state) |
| 2620 { |
| 2621 - unsigned char _buffer[BUFSIZE], *buffer; |
| 2622 - long put_buffer; int put_bits; |
| 2623 - int bytes, bytestocopy, localbuf = 0; |
| 2624 + JOCTET _buffer[BUFSIZE], *buffer; |
| 2625 + size_t put_buffer; int put_bits; |
| 2626 + size_t bytes, bytestocopy; int localbuf = 0; |
| 2627 |
| 2628 put_buffer = state->cur.put_buffer; |
| 2629 put_bits = state->cur.put_bits; |
| 2630 LOAD_BUFFER() |
| 2631 |
| 2632 - DUMP_BITS_(0x7F, 7) |
| 2633 + /* fill any partial byte with ones */ |
| 2634 + PUT_BITS(0x7F, 7) |
| 2635 + while (put_bits >= 8) EMIT_BYTE() |
| 2636 |
| 2637 state->cur.put_buffer = 0; /* and reset bit-buffer to empty */ |
| 2638 state->cur.put_bits = 0; |
| 2639 @@ -505,6 +475,7 @@ |
| 2640 return TRUE; |
| 2641 } |
| 2642 |
| 2643 + |
| 2644 /* Encode a single block's worth of coefficients */ |
| 2645 |
| 2646 LOCAL(boolean) |
| 2647 @@ -511,13 +482,13 @@ |
| 2648 encode_one_block (working_state * state, JCOEFPTR block, int last_dc_val, |
| 2649 c_derived_tbl *dctbl, c_derived_tbl *actbl) |
| 2650 { |
| 2651 - int temp, temp2; |
| 2652 + int temp, temp2, temp3; |
| 2653 int nbits; |
| 2654 - int r, sflag, size, code; |
| 2655 - unsigned char _buffer[BUFSIZE], *buffer; |
| 2656 - long put_buffer; int put_bits; |
| 2657 + int r, code, size; |
| 2658 + JOCTET _buffer[BUFSIZE], *buffer; |
| 2659 + size_t put_buffer; int put_bits; |
| 2660 int code_0xf0 = actbl->ehufco[0xf0], size_0xf0 = actbl->ehufsi[0xf0]; |
| 2661 - int bytes, bytestocopy, localbuf = 0; |
| 2662 + size_t bytes, bytestocopy; int localbuf = 0; |
| 2663 |
| 2664 put_buffer = state->cur.put_buffer; |
| 2665 put_bits = state->cur.put_bits; |
| 2666 @@ -527,50 +498,88 @@ |
| 2667 |
| 2668 temp = temp2 = block[0] - last_dc_val; |
| 2669 |
| 2670 - sflag = temp >> 31; |
| 2671 - temp -= ((temp + temp) & sflag); |
| 2672 - temp2 += sflag; |
| 2673 - nbits = jpeg_first_bit_table[temp]; |
| 2674 - DUMP_VALUE_SLOW(dctbl, nbits, temp2, nbits) |
| 2675 + /* This is a well-known technique for obtaining the absolute value without a |
| 2676 + * branch. It is derived from an assembly language technique presented in |
| 2677 + * "How to Optimize for the Pentium Processors", Copyright (c) 1996, 1997 by |
| 2678 + * Agner Fog. |
| 2679 + */ |
| 2680 + temp3 = temp >> (CHAR_BIT * sizeof(int) - 1); |
| 2681 + temp ^= temp3; |
| 2682 + temp -= temp3; |
| 2683 |
| 2684 + /* For a negative input, want temp2 = bitwise complement of abs(input) */ |
| 2685 + /* This code assumes we are on a two's complement machine */ |
| 2686 + temp2 += temp3; |
| 2687 + |
| 2688 + /* Find the number of bits needed for the magnitude of the coefficient */ |
| 2689 + nbits = JPEG_NBITS(temp); |
| 2690 + |
| 2691 + /* Emit the Huffman-coded symbol for the number of bits */ |
| 2692 + code = dctbl->ehufco[nbits]; |
| 2693 + size = dctbl->ehufsi[nbits]; |
| 2694 + PUT_BITS(code, size) |
| 2695 + CHECKBUF15() |
| 2696 + |
| 2697 + /* Mask off any extra bits in code */ |
| 2698 + temp2 &= (((INT32) 1)<<nbits) - 1; |
| 2699 + |
| 2700 + /* Emit that number of bits of the value, if positive, */ |
| 2701 + /* or the complement of its magnitude, if negative. */ |
| 2702 + PUT_BITS(temp2, nbits) |
| 2703 + CHECKBUF15() |
| 2704 + |
| 2705 /* Encode the AC coefficients per section F.1.2.2 */ |
| 2706 |
| 2707 r = 0; /* r = run length of zeros */ |
| 2708 |
| 2709 -#define innerloop(order) { \ |
| 2710 - temp2 = *(JCOEF*)((unsigned char*)block + order); \ |
| 2711 - if(temp2 == 0) r++; \ |
| 2712 - else { \ |
| 2713 - temp = (JCOEF)temp2; \ |
| 2714 - sflag = temp >> 31; \ |
| 2715 - temp = (temp ^ sflag) - sflag; \ |
| 2716 - temp2 += sflag; \ |
| 2717 - nbits = jpeg_first_bit_table[temp]; \ |
| 2718 - for(; r > 15; r -= 16) DUMP_BITS(code_0xf0, size_0xf0) \ |
| 2719 - sflag = (r << 4) + nbits; \ |
| 2720 - DUMP_VALUE(actbl, sflag, temp2, nbits) \ |
| 2721 +/* Manually unroll the k loop to eliminate the counter variable. This |
| 2722 + * improves performance greatly on systems with a limited number of |
| 2723 + * registers (such as x86.) |
| 2724 + */ |
| 2725 +#define kloop(jpeg_natural_order_of_k) { \ |
| 2726 + if ((temp = block[jpeg_natural_order_of_k]) == 0) { \ |
| 2727 + r++; \ |
| 2728 + } else { \ |
| 2729 + temp2 = temp; \ |
| 2730 + /* Branch-less absolute value, bitwise complement, etc., same as above */ \ |
| 2731 + temp3 = temp >> (CHAR_BIT * sizeof(int) - 1); \ |
| 2732 + temp ^= temp3; \ |
| 2733 + temp -= temp3; \ |
| 2734 + temp2 += temp3; \ |
| 2735 + nbits = JPEG_NBITS_NONZERO(temp); \ |
| 2736 + /* if run length > 15, must emit special run-length-16 codes (0xF0) */ \ |
| 2737 + while (r > 15) { \ |
| 2738 + EMIT_BITS(code_0xf0, size_0xf0) \ |
| 2739 + r -= 16; \ |
| 2740 + } \ |
| 2741 + /* Emit Huffman symbol for run length / number of bits */ \ |
| 2742 + temp3 = (r << 4) + nbits; \ |
| 2743 + code = actbl->ehufco[temp3]; \ |
| 2744 + size = actbl->ehufsi[temp3]; \ |
| 2745 + EMIT_CODE(code, size) \ |
| 2746 r = 0; \ |
| 2747 - }} |
| 2748 + } \ |
| 2749 +} |
| 2750 |
| 2751 - innerloop(2*1); innerloop(2*8); innerloop(2*16); innerloop(2*9); |
| 2752 - innerloop(2*2); innerloop(2*3); innerloop(2*10); innerloop(2*17); |
| 2753 - innerloop(2*24); innerloop(2*32); innerloop(2*25); innerloop(2*18); |
| 2754 - innerloop(2*11); innerloop(2*4); innerloop(2*5); innerloop(2*12); |
| 2755 - innerloop(2*19); innerloop(2*26); innerloop(2*33); innerloop(2*40); |
| 2756 - innerloop(2*48); innerloop(2*41); innerloop(2*34); innerloop(2*27); |
| 2757 - innerloop(2*20); innerloop(2*13); innerloop(2*6); innerloop(2*7); |
| 2758 - innerloop(2*14); innerloop(2*21); innerloop(2*28); innerloop(2*35); |
| 2759 - innerloop(2*42); innerloop(2*49); innerloop(2*56); innerloop(2*57); |
| 2760 - innerloop(2*50); innerloop(2*43); innerloop(2*36); innerloop(2*29); |
| 2761 - innerloop(2*22); innerloop(2*15); innerloop(2*23); innerloop(2*30); |
| 2762 - innerloop(2*37); innerloop(2*44); innerloop(2*51); innerloop(2*58); |
| 2763 - innerloop(2*59); innerloop(2*52); innerloop(2*45); innerloop(2*38); |
| 2764 - innerloop(2*31); innerloop(2*39); innerloop(2*46); innerloop(2*53); |
| 2765 - innerloop(2*60); innerloop(2*61); innerloop(2*54); innerloop(2*47); |
| 2766 - innerloop(2*55); innerloop(2*62); innerloop(2*63); |
| 2767 + /* One iteration for each value in jpeg_natural_order[] */ |
| 2768 + kloop(1); kloop(8); kloop(16); kloop(9); kloop(2); kloop(3); |
| 2769 + kloop(10); kloop(17); kloop(24); kloop(32); kloop(25); kloop(18); |
| 2770 + kloop(11); kloop(4); kloop(5); kloop(12); kloop(19); kloop(26); |
| 2771 + kloop(33); kloop(40); kloop(48); kloop(41); kloop(34); kloop(27); |
| 2772 + kloop(20); kloop(13); kloop(6); kloop(7); kloop(14); kloop(21); |
| 2773 + kloop(28); kloop(35); kloop(42); kloop(49); kloop(56); kloop(57); |
| 2774 + kloop(50); kloop(43); kloop(36); kloop(29); kloop(22); kloop(15); |
| 2775 + kloop(23); kloop(30); kloop(37); kloop(44); kloop(51); kloop(58); |
| 2776 + kloop(59); kloop(52); kloop(45); kloop(38); kloop(31); kloop(39); |
| 2777 + kloop(46); kloop(53); kloop(60); kloop(61); kloop(54); kloop(47); |
| 2778 + kloop(55); kloop(62); kloop(63); |
| 2779 |
| 2780 /* If the last coef(s) were zero, emit an end-of-block code */ |
| 2781 - if (r > 0) DUMP_SINGLE_VALUE(actbl, 0x0) |
| 2782 + if (r > 0) { |
| 2783 + code = actbl->ehufco[0]; |
| 2784 + size = actbl->ehufsi[0]; |
| 2785 + EMIT_BITS(code, size) |
| 2786 + } |
| 2787 |
| 2788 state->cur.put_buffer = put_buffer; |
| 2789 state->cur.put_bits = put_bits; |
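Reviewer note on the jchuff.c hunks above: encode_one_block now computes the magnitude category with a branch-free absolute value and emits bits through PUT_BITS and EMIT_BYTE. The sketch below restates both steps as plain functions, assuming a two's complement machine with arithmetic right shift (the same assumption the patch comments make). The names coef_bits, split_coefficient, bit_writer and put_bits are hypothetical, and unlike the patch's CHECKBUF macros this writer flushes whole bytes immediately rather than waiting until the buffer is nearly full.

```c
#include <limits.h>

/* Hypothetical result type: what follows the Huffman symbol for a value. */
typedef struct {
  int nbits;          /* bits needed for the magnitude (the size category) */
  unsigned int bits;  /* low nbits bits appended after the Huffman code */
} coef_bits;

static coef_bits split_coefficient(int coef)
{
  coef_bits out;
  int temp = coef, temp2 = coef;
  /* 0 for non-negative input, -1 for negative input (arithmetic shift) */
  int sign = temp >> (CHAR_BIT * sizeof(int) - 1);

  temp ^= sign;   /* together with the next line: temp = abs(coef) */
  temp -= sign;
  temp2 += sign;  /* negative input: low bits become complement of abs */

  out.nbits = 0;
  while (temp) { temp >>= 1; out.nbits++; }
  out.bits = (unsigned int) temp2 & ((1u << out.nbits) - 1u);
  return out;
}

/* Hypothetical bit writer; the caller must supply a large enough buffer. */
typedef struct {
  unsigned long long acc;  /* accumulator, newest bits at the low end */
  int count;               /* number of valid bits in acc */
  unsigned char *out;      /* next output byte */
} bit_writer;

/* Appends the low `size` bits of `code` (size <= 16, code < 2^size). */
static void put_bits(bit_writer *w, unsigned int code, int size)
{
  w->acc = (w->acc << size) | code;
  w->count += size;
  while (w->count >= 8) {
    unsigned char c;

    w->count -= 8;
    c = (unsigned char) ((w->acc >> w->count) & 0xFF);
    *w->out++ = c;
    if (c == 0xFF)   /* JPEG byte stuffing: 0xFF is followed by 0x00 */
      *w->out++ = 0;
  }
}
```

For example, split_coefficient(-5) yields nbits = 3 and bits = 2 (binary 010, the complement of 101), which is the encoding the DC and AC paths in the patch feed to PUT_BITS.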
| 2790 Index: jcinit.c |
| 2791 =================================================================== |
| 2792 --- jcinit.c (revision 829) |
| 2793 +++ jcinit.c (working copy) |
| 2794 @@ -42,7 +42,11 @@ |
| 2795 jinit_forward_dct(cinfo); |
| 2796 /* Entropy encoding: either Huffman or arithmetic coding. */ |
| 2797 if (cinfo->arith_code) { |
| 2798 +#ifdef C_ARITH_CODING_SUPPORTED |
| 2799 + jinit_arith_encoder(cinfo); |
| 2800 +#else |
| 2801 ERREXIT(cinfo, JERR_ARITH_NOTIMPL); |
| 2802 +#endif |
| 2803 } else { |
| 2804 if (cinfo->progressive_mode) { |
| 2805 #ifdef C_PROGRESSIVE_SUPPORTED |
| 2806 Index: jcmainct.c |
| 2807 =================================================================== |
| 2808 --- jcmainct.c (revision 829) |
| 2809 +++ jcmainct.c (working copy) |
| 2810 @@ -68,32 +68,32 @@ |
| 2811 METHODDEF(void) |
| 2812 start_pass_main (j_compress_ptr cinfo, J_BUF_MODE pass_mode) |
| 2813 { |
| 2814 - my_main_ptr main = (my_main_ptr) cinfo->main; |
| 2815 + my_main_ptr main_ptr = (my_main_ptr) cinfo->main; |
| 2816 |
| 2817 /* Do nothing in raw-data mode. */ |
| 2818 if (cinfo->raw_data_in) |
| 2819 return; |
| 2820 |
| 2821 - main->cur_iMCU_row = 0; /* initialize counters */ |
| 2822 - main->rowgroup_ctr = 0; |
| 2823 - main->suspended = FALSE; |
| 2824 - main->pass_mode = pass_mode; /* save mode for use by process_data */ |
| 2825 + main_ptr->cur_iMCU_row = 0; /* initialize counters */ |
| 2826 + main_ptr->rowgroup_ctr = 0; |
| 2827 + main_ptr->suspended = FALSE; |
| 2828 + main_ptr->pass_mode = pass_mode; /* save mode for use by process_data */ |
| 2829 |
| 2830 switch (pass_mode) { |
| 2831 case JBUF_PASS_THRU: |
| 2832 #ifdef FULL_MAIN_BUFFER_SUPPORTED |
| 2833 - if (main->whole_image[0] != NULL) |
| 2834 + if (main_ptr->whole_image[0] != NULL) |
| 2835 ERREXIT(cinfo, JERR_BAD_BUFFER_MODE); |
| 2836 #endif |
| 2837 - main->pub.process_data = process_data_simple_main; |
| 2838 + main_ptr->pub.process_data = process_data_simple_main; |
| 2839 break; |
| 2840 #ifdef FULL_MAIN_BUFFER_SUPPORTED |
| 2841 case JBUF_SAVE_SOURCE: |
| 2842 case JBUF_CRANK_DEST: |
| 2843 case JBUF_SAVE_AND_PASS: |
| 2844 - if (main->whole_image[0] == NULL) |
| 2845 + if (main_ptr->whole_image[0] == NULL) |
| 2846 ERREXIT(cinfo, JERR_BAD_BUFFER_MODE); |
| 2847 - main->pub.process_data = process_data_buffer_main; |
| 2848 + main_ptr->pub.process_data = process_data_buffer_main; |
| 2849 break; |
| 2850 #endif |
| 2851 default: |
| 2852 @@ -114,14 +114,14 @@ |
| 2853 JSAMPARRAY input_buf, JDIMENSION *in_row_ctr, |
| 2854 JDIMENSION in_rows_avail) |
| 2855 { |
| 2856 - my_main_ptr main = (my_main_ptr) cinfo->main; |
| 2857 + my_main_ptr main_ptr = (my_main_ptr) cinfo->main; |
| 2858 |
| 2859 - while (main->cur_iMCU_row < cinfo->total_iMCU_rows) { |
| 2860 + while (main_ptr->cur_iMCU_row < cinfo->total_iMCU_rows) { |
| 2861 /* Read input data if we haven't filled the main buffer yet */ |
| 2862 - if (main->rowgroup_ctr < DCTSIZE) |
| 2863 + if (main_ptr->rowgroup_ctr < DCTSIZE) |
| 2864 (*cinfo->prep->pre_process_data) (cinfo, |
| 2865 input_buf, in_row_ctr, in_rows_avail, |
| 2866 - main->buffer, &main->rowgroup_ctr, |
| 2867 + main_ptr->buffer, &main_ptr->rowgroup_ctr, |
| 2868 (JDIMENSION) DCTSIZE); |
| 2869 |
| 2870 /* If we don't have a full iMCU row buffered, return to application for |
| 2871 @@ -128,11 +128,11 @@ |
| 2872 * more data. Note that preprocessor will always pad to fill the iMCU row |
| 2873 * at the bottom of the image. |
| 2874 */ |
| 2875 - if (main->rowgroup_ctr != DCTSIZE) |
| 2876 + if (main_ptr->rowgroup_ctr != DCTSIZE) |
| 2877 return; |
| 2878 |
| 2879 /* Send the completed row to the compressor */ |
| 2880 - if (! (*cinfo->coef->compress_data) (cinfo, main->buffer)) { |
| 2881 + if (! (*cinfo->coef->compress_data) (cinfo, main_ptr->buffer)) { |
| 2882 /* If compressor did not consume the whole row, then we must need to |
| 2883 * suspend processing and return to the application. In this situation |
| 2884 * we pretend we didn't yet consume the last input row; otherwise, if |
| 2885 @@ -139,9 +139,9 @@ |
| 2886 * it happened to be the last row of the image, the application would |
| 2887 * think we were done. |
| 2888 */ |
| 2889 - if (! main->suspended) { |
| 2890 + if (! main_ptr->suspended) { |
| 2891 (*in_row_ctr)--; |
| 2892 - main->suspended = TRUE; |
| 2893 + main_ptr->suspended = TRUE; |
| 2894 } |
| 2895 return; |
| 2896 } |
| 2897 @@ -148,12 +148,12 @@ |
| 2898 /* We did finish the row. Undo our little suspension hack if a previous |
| 2899 * call suspended; then mark the main buffer empty. |
| 2900 */ |
| 2901 - if (main->suspended) { |
| 2902 + if (main_ptr->suspended) { |
| 2903 (*in_row_ctr)++; |
| 2904 - main->suspended = FALSE; |
| 2905 + main_ptr->suspended = FALSE; |
| 2906 } |
| 2907 - main->rowgroup_ctr = 0; |
| 2908 - main->cur_iMCU_row++; |
| 2909 + main_ptr->rowgroup_ctr = 0; |
| 2910 + main_ptr->cur_iMCU_row++; |
| 2911 } |
| 2912 } |
| 2913 |
| 2914 @@ -170,25 +170,25 @@ |
| 2915 JSAMPARRAY input_buf, JDIMENSION *in_row_ctr, |
| 2916 JDIMENSION in_rows_avail) |
| 2917 { |
| 2918 - my_main_ptr main = (my_main_ptr) cinfo->main; |
| 2919 + my_main_ptr main_ptr = (my_main_ptr) cinfo->main; |
| 2920 int ci; |
| 2921 jpeg_component_info *compptr; |
| 2922 - boolean writing = (main->pass_mode != JBUF_CRANK_DEST); |
| 2923 + boolean writing = (main_ptr->pass_mode != JBUF_CRANK_DEST); |
| 2924 |
| 2925 - while (main->cur_iMCU_row < cinfo->total_iMCU_rows) { |
| 2926 + while (main_ptr->cur_iMCU_row < cinfo->total_iMCU_rows) { |
| 2927 /* Realign the virtual buffers if at the start of an iMCU row. */ |
| 2928 - if (main->rowgroup_ctr == 0) { |
| 2929 + if (main_ptr->rowgroup_ctr == 0) { |
| 2930 for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components; |
| 2931 ci++, compptr++) { |
| 2932 - main->buffer[ci] = (*cinfo->mem->access_virt_sarray) |
| 2933 - ((j_common_ptr) cinfo, main->whole_image[ci], |
| 2934 - main->cur_iMCU_row * (compptr->v_samp_factor * DCTSIZE), |
| 2935 + main_ptr->buffer[ci] = (*cinfo->mem->access_virt_sarray) |
| 2936 + ((j_common_ptr) cinfo, main_ptr->whole_image[ci], |
| 2937 + main_ptr->cur_iMCU_row * (compptr->v_samp_factor * DCTSIZE), |
| 2938 (JDIMENSION) (compptr->v_samp_factor * DCTSIZE), writing); |
| 2939 } |
| 2940 /* In a read pass, pretend we just read some source data. */ |
| 2941 if (! writing) { |
| 2942 *in_row_ctr += cinfo->max_v_samp_factor * DCTSIZE; |
| 2943 - main->rowgroup_ctr = DCTSIZE; |
| 2944 + main_ptr->rowgroup_ctr = DCTSIZE; |
| 2945 } |
| 2946 } |
| 2947 |
| 2948 @@ -197,16 +197,16 @@ |
| 2949 if (writing) { |
| 2950 (*cinfo->prep->pre_process_data) (cinfo, |
| 2951 input_buf, in_row_ctr, in_rows_avail, |
| 2952 - main->buffer, &main->rowgroup_ctr, |
| 2953 + main_ptr->buffer, &main_ptr->rowgroup_ctr, |
| 2954 (JDIMENSION) DCTSIZE); |
| 2955 /* Return to application if we need more data to fill the iMCU row. */ |
| 2956 - if (main->rowgroup_ctr < DCTSIZE) |
| 2957 + if (main_ptr->rowgroup_ctr < DCTSIZE) |
| 2958 return; |
| 2959 } |
| 2960 |
| 2961 /* Emit data, unless this is a sink-only pass. */ |
| 2962 - if (main->pass_mode != JBUF_SAVE_SOURCE) { |
| 2963 - if (! (*cinfo->coef->compress_data) (cinfo, main->buffer)) { |
| 2964 + if (main_ptr->pass_mode != JBUF_SAVE_SOURCE) { |
| 2965 + if (! (*cinfo->coef->compress_data) (cinfo, main_ptr->buffer)) { |
| 2966 /* If compressor did not consume the whole row, then we must need to |
| 2967 * suspend processing and return to the application. In this situation |
| 2968 * we pretend we didn't yet consume the last input row; otherwise, if |
| 2969 @@ -213,9 +213,9 @@ |
| 2970 * it happened to be the last row of the image, the application would |
| 2971 * think we were done. |
| 2972 */ |
| 2973 - if (! main->suspended) { |
| 2974 + if (! main_ptr->suspended) { |
| 2975 (*in_row_ctr)--; |
| 2976 - main->suspended = TRUE; |
| 2977 + main_ptr->suspended = TRUE; |
| 2978 } |
| 2979 return; |
| 2980 } |
| 2981 @@ -222,15 +222,15 @@ |
| 2982 /* We did finish the row. Undo our little suspension hack if a previous |
| 2983 * call suspended; then mark the main buffer empty. |
| 2984 */ |
| 2985 - if (main->suspended) { |
| 2986 + if (main_ptr->suspended) { |
| 2987 (*in_row_ctr)++; |
| 2988 - main->suspended = FALSE; |
| 2989 + main_ptr->suspended = FALSE; |
| 2990 } |
| 2991 } |
| 2992 |
| 2993 /* If get here, we are done with this iMCU row. Mark buffer empty. */ |
| 2994 - main->rowgroup_ctr = 0; |
| 2995 - main->cur_iMCU_row++; |
| 2996 + main_ptr->rowgroup_ctr = 0; |
| 2997 + main_ptr->cur_iMCU_row++; |
| 2998 } |
| 2999 } |
| 3000 |
| 3001 @@ -244,15 +244,15 @@ |
| 3002 GLOBAL(void) |
| 3003 jinit_c_main_controller (j_compress_ptr cinfo, boolean need_full_buffer) |
| 3004 { |
| 3005 - my_main_ptr main; |
| 3006 + my_main_ptr main_ptr; |
| 3007 int ci; |
| 3008 jpeg_component_info *compptr; |
| 3009 |
| 3010 - main = (my_main_ptr) |
| 3011 + main_ptr = (my_main_ptr) |
| 3012 (*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_IMAGE, |
| 3013 SIZEOF(my_main_controller)); |
| 3014 - cinfo->main = (struct jpeg_c_main_controller *) main; |
| 3015 - main->pub.start_pass = start_pass_main; |
| 3016 + cinfo->main = (struct jpeg_c_main_controller *) main_ptr; |
| 3017 + main_ptr->pub.start_pass = start_pass_main; |
| 3018 |
| 3019 /* We don't need to create a buffer in raw-data mode. */ |
| 3020 if (cinfo->raw_data_in) |
| 3021 @@ -267,7 +267,7 @@ |
| 3022 /* Note we pad the bottom to a multiple of the iMCU height */ |
| 3023 for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components; |
| 3024 ci++, compptr++) { |
| 3025 - main->whole_image[ci] = (*cinfo->mem->request_virt_sarray) |
| 3026 + main_ptr->whole_image[ci] = (*cinfo->mem->request_virt_sarray) |
| 3027 ((j_common_ptr) cinfo, JPOOL_IMAGE, FALSE, |
| 3028 compptr->width_in_blocks * DCTSIZE, |
| 3029 (JDIMENSION) jround_up((long) compptr->height_in_blocks, |
| 3030 @@ -279,12 +279,12 @@ |
| 3031 #endif |
| 3032 } else { |
| 3033 #ifdef FULL_MAIN_BUFFER_SUPPORTED |
| 3034 - main->whole_image[0] = NULL; /* flag for no virtual arrays */ |
| 3035 + main_ptr->whole_image[0] = NULL; /* flag for no virtual arrays */ |
| 3036 #endif |
| 3037 /* Allocate a strip buffer for each component */ |
| 3038 for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components; |
| 3039 ci++, compptr++) { |
| 3040 - main->buffer[ci] = (*cinfo->mem->alloc_sarray) |
| 3041 + main_ptr->buffer[ci] = (*cinfo->mem->alloc_sarray) |
| 3042 ((j_common_ptr) cinfo, JPOOL_IMAGE, |
| 3043 compptr->width_in_blocks * DCTSIZE, |
| 3044 (JDIMENSION) (compptr->v_samp_factor * DCTSIZE)); |
| 3045 Index: jcmarker.c |
| 3046 =================================================================== |
| 3047 --- jcmarker.c (revision 829) |
| 3048 +++ jcmarker.c (working copy) |
| 3049 @@ -1,8 +1,11 @@ |
| 3050 /* |
| 3051 * jcmarker.c |
| 3052 * |
| 3053 + * This file was part of the Independent JPEG Group's software: |
| 3054 * Copyright (C) 1991-1998, Thomas G. Lane. |
| 3055 - * This file is part of the Independent JPEG Group's software. |
| 3056 + * Modified 2003-2010 by Guido Vollbeding. |
| 3057 + * libjpeg-turbo Modifications: |
| 3058 + * Copyright (C) 2010, D. R. Commander. |
| 3059 * For conditions of distribution and use, see the accompanying README file. |
| 3060 * |
| 3061 * This file contains routines to write JPEG datastream markers. |
| 3062 @@ -11,6 +14,7 @@ |
| 3063 #define JPEG_INTERNALS |
| 3064 #include "jinclude.h" |
| 3065 #include "jpeglib.h" |
| 3066 +#include "jpegcomp.h" |
| 3067 |
| 3068 |
| 3069 typedef enum { /* JPEG marker codes */ |
| 3070 @@ -18,24 +22,24 @@ |
| 3071 M_SOF1 = 0xc1, |
| 3072 M_SOF2 = 0xc2, |
| 3073 M_SOF3 = 0xc3, |
| 3074 - |
| 3075 + |
| 3076 M_SOF5 = 0xc5, |
| 3077 M_SOF6 = 0xc6, |
| 3078 M_SOF7 = 0xc7, |
| 3079 - |
| 3080 + |
| 3081 M_JPG = 0xc8, |
| 3082 M_SOF9 = 0xc9, |
| 3083 M_SOF10 = 0xca, |
| 3084 M_SOF11 = 0xcb, |
| 3085 - |
| 3086 + |
| 3087 M_SOF13 = 0xcd, |
| 3088 M_SOF14 = 0xce, |
| 3089 M_SOF15 = 0xcf, |
| 3090 - |
| 3091 + |
| 3092 M_DHT = 0xc4, |
| 3093 - |
| 3094 + |
| 3095 M_DAC = 0xcc, |
| 3096 - |
| 3097 + |
| 3098 M_RST0 = 0xd0, |
| 3099 M_RST1 = 0xd1, |
| 3100 M_RST2 = 0xd2, |
| 3101 @@ -44,7 +48,7 @@ |
| 3102 M_RST5 = 0xd5, |
| 3103 M_RST6 = 0xd6, |
| 3104 M_RST7 = 0xd7, |
| 3105 - |
| 3106 + |
| 3107 M_SOI = 0xd8, |
| 3108 M_EOI = 0xd9, |
| 3109 M_SOS = 0xda, |
| 3110 @@ -53,7 +57,7 @@ |
| 3111 M_DRI = 0xdd, |
| 3112 M_DHP = 0xde, |
| 3113 M_EXP = 0xdf, |
| 3114 - |
| 3115 + |
| 3116 M_APP0 = 0xe0, |
| 3117 M_APP1 = 0xe1, |
| 3118 M_APP2 = 0xe2, |
| 3119 @@ -70,13 +74,13 @@ |
| 3120 M_APP13 = 0xed, |
| 3121 M_APP14 = 0xee, |
| 3122 M_APP15 = 0xef, |
| 3123 - |
| 3124 + |
| 3125 M_JPG0 = 0xf0, |
| 3126 M_JPG13 = 0xfd, |
| 3127 M_COM = 0xfe, |
| 3128 - |
| 3129 + |
| 3130 M_TEM = 0x01, |
| 3131 - |
| 3132 + |
| 3133 M_ERROR = 0x100 |
| 3134 } JPEG_MARKER; |
| 3135 |
| 3136 @@ -229,33 +233,39 @@ |
| 3137 char ac_in_use[NUM_ARITH_TBLS]; |
| 3138 int length, i; |
| 3139 jpeg_component_info *compptr; |
| 3140 - |
| 3141 + |
| 3142 for (i = 0; i < NUM_ARITH_TBLS; i++) |
| 3143 dc_in_use[i] = ac_in_use[i] = 0; |
| 3144 - |
| 3145 + |
| 3146 for (i = 0; i < cinfo->comps_in_scan; i++) { |
| 3147 compptr = cinfo->cur_comp_info[i]; |
| 3148 - dc_in_use[compptr->dc_tbl_no] = 1; |
| 3149 - ac_in_use[compptr->ac_tbl_no] = 1; |
| 3150 + /* DC needs no table for refinement scan */ |
| 3151 + if (cinfo->Ss == 0 && cinfo->Ah == 0) |
| 3152 + dc_in_use[compptr->dc_tbl_no] = 1; |
| 3153 + /* AC needs no table when not present */ |
| 3154 + if (cinfo->Se) |
| 3155 + ac_in_use[compptr->ac_tbl_no] = 1; |
| 3156 } |
| 3157 - |
| 3158 + |
| 3159 length = 0; |
| 3160 for (i = 0; i < NUM_ARITH_TBLS; i++) |
| 3161 length += dc_in_use[i] + ac_in_use[i]; |
| 3162 - |
| 3163 - emit_marker(cinfo, M_DAC); |
| 3164 - |
| 3165 - emit_2bytes(cinfo, length*2 + 2); |
| 3166 - |
| 3167 - for (i = 0; i < NUM_ARITH_TBLS; i++) { |
| 3168 - if (dc_in_use[i]) { |
| 3169 - emit_byte(cinfo, i); |
| 3170 - emit_byte(cinfo, cinfo->arith_dc_L[i] + (cinfo->arith_dc_U[i]<<4)); |
| 3171 + |
| 3172 + if (length) { |
| 3173 + emit_marker(cinfo, M_DAC); |
| 3174 + |
| 3175 + emit_2bytes(cinfo, length*2 + 2); |
| 3176 + |
| 3177 + for (i = 0; i < NUM_ARITH_TBLS; i++) { |
| 3178 + if (dc_in_use[i]) { |
| 3179 + emit_byte(cinfo, i); |
| 3180 + emit_byte(cinfo, cinfo->arith_dc_L[i] + (cinfo->arith_dc_U[i]<<4)); |
| 3181 + } |
| 3182 + if (ac_in_use[i]) { |
| 3183 + emit_byte(cinfo, i + 0x10); |
| 3184 + emit_byte(cinfo, cinfo->arith_ac_K[i]); |
| 3185 + } |
| 3186 } |
| 3187 - if (ac_in_use[i]) { |
| 3188 - emit_byte(cinfo, i + 0x10); |
| 3189 - emit_byte(cinfo, cinfo->arith_ac_K[i]); |
| 3190 - } |
| 3191 } |
| 3192 #endif /* C_ARITH_CODING_SUPPORTED */ |
| 3193 } |
| 3194 @@ -285,13 +295,13 @@ |
| 3195 emit_2bytes(cinfo, 3 * cinfo->num_components + 2 + 5 + 1); /* length */ |
| 3196 |
| 3197 /* Make sure image isn't bigger than SOF field can handle */ |
| 3198 - if ((long) cinfo->image_height > 65535L || |
| 3199 - (long) cinfo->image_width > 65535L) |
| 3200 + if ((long) cinfo->_jpeg_height > 65535L || |
| 3201 + (long) cinfo->_jpeg_width > 65535L) |
| 3202 ERREXIT1(cinfo, JERR_IMAGE_TOO_BIG, (unsigned int) 65535); |
| 3203 |
| 3204 emit_byte(cinfo, cinfo->data_precision); |
| 3205 - emit_2bytes(cinfo, (int) cinfo->image_height); |
| 3206 - emit_2bytes(cinfo, (int) cinfo->image_width); |
| 3207 + emit_2bytes(cinfo, (int) cinfo->_jpeg_height); |
| 3208 + emit_2bytes(cinfo, (int) cinfo->_jpeg_width); |
| 3209 |
| 3210 emit_byte(cinfo, cinfo->num_components); |
| 3211 |
| 3212 @@ -320,22 +330,16 @@ |
| 3213 for (i = 0; i < cinfo->comps_in_scan; i++) { |
| 3214 compptr = cinfo->cur_comp_info[i]; |
| 3215 emit_byte(cinfo, compptr->component_id); |
| 3216 - td = compptr->dc_tbl_no; |
| 3217 - ta = compptr->ac_tbl_no; |
| 3218 - if (cinfo->progressive_mode) { |
| 3219 - /* Progressive mode: only DC or only AC tables are used in one scan; |
| 3220 - * furthermore, Huffman coding of DC refinement uses no table at all. |
| 3221 - * We emit 0 for unused field(s); this is recommended by the P&M text |
| 3222 - * but does not seem to be specified in the standard. |
| 3223 - */ |
| 3224 - if (cinfo->Ss == 0) { |
| 3225 - ta = 0; /* DC scan */ |
| 3226 - if (cinfo->Ah != 0 && !cinfo->arith_code) |
| 3227 - td = 0; /* no DC table either */ |
| 3228 - } else { |
| 3229 - td = 0; /* AC scan */ |
| 3230 - } |
| 3231 - } |
| 3232 + |
| 3233 + /* We emit 0 for unused field(s); this is recommended by the P&M text |
| 3234 + * but does not seem to be specified in the standard. |
| 3235 + */ |
| 3236 + |
| 3237 + /* DC needs no table for refinement scan */ |
| 3238 + td = cinfo->Ss == 0 && cinfo->Ah == 0 ? compptr->dc_tbl_no : 0; |
| 3239 + /* AC needs no table when not present */ |
| 3240 + ta = cinfo->Se ? compptr->ac_tbl_no : 0; |
| 3241 + |
| 3242 emit_byte(cinfo, (td << 4) + ta); |
| 3243 } |
| 3244 |
| 3245 @@ -529,7 +533,10 @@ |
| 3246 |
| 3247 /* Emit the proper SOF marker */ |
| 3248 if (cinfo->arith_code) { |
| 3249 - emit_sof(cinfo, M_SOF9); /* SOF code for arithmetic coding */ |
| 3250 + if (cinfo->progressive_mode) |
| 3251 + emit_sof(cinfo, M_SOF10); /* SOF code for progressive arithmetic */ |
| 3252 + else |
| 3253 + emit_sof(cinfo, M_SOF9); /* SOF code for sequential arithmetic */ |
| 3254 } else { |
| 3255 if (cinfo->progressive_mode) |
| 3256 emit_sof(cinfo, M_SOF2); /* SOF code for progressive Huffman */ |
| 3257 @@ -566,19 +573,12 @@ |
| 3258 */ |
| 3259 for (i = 0; i < cinfo->comps_in_scan; i++) { |
| 3260 compptr = cinfo->cur_comp_info[i]; |
| 3261 - if (cinfo->progressive_mode) { |
| 3262 - /* Progressive mode: only DC or only AC tables are used in one scan */ |
| 3263 - if (cinfo->Ss == 0) { |
| 3264 - if (cinfo->Ah == 0) /* DC needs no table for refinement scan */ |
| 3265 - emit_dht(cinfo, compptr->dc_tbl_no, FALSE); |
| 3266 - } else { |
| 3267 - emit_dht(cinfo, compptr->ac_tbl_no, TRUE); |
| 3268 - } |
| 3269 - } else { |
| 3270 - /* Sequential mode: need both DC and AC tables */ |
| 3271 + /* DC needs no table for refinement scan */ |
| 3272 + if (cinfo->Ss == 0 && cinfo->Ah == 0) |
| 3273 emit_dht(cinfo, compptr->dc_tbl_no, FALSE); |
| 3274 + /* AC needs no table when not present */ |
| 3275 + if (cinfo->Se) |
| 3276 emit_dht(cinfo, compptr->ac_tbl_no, TRUE); |
| 3277 - } |
| 3278 } |
| 3279 } |
| 3280 |
| 3281 Index: jcmaster.c |
| 3282 =================================================================== |
| 3283 --- jcmaster.c (revision 829) |
| 3284 +++ jcmaster.c (working copy) |
| 3285 @@ -1,8 +1,11 @@ |
| 3286 /* |
| 3287 * jcmaster.c |
| 3288 * |
| 3289 + * This file was part of the Independent JPEG Group's software: |
| 3290 * Copyright (C) 1991-1997, Thomas G. Lane. |
| 3291 - * This file is part of the Independent JPEG Group's software. |
| 3292 + * Modified 2003-2010 by Guido Vollbeding. |
| 3293 + * libjpeg-turbo Modifications: |
| 3294 + * Copyright (C) 2010, D. R. Commander. |
| 3295 * For conditions of distribution and use, see the accompanying README file. |
| 3296 * |
| 3297 * This file contains master control logic for the JPEG compressor. |
| 3298 @@ -14,6 +17,7 @@ |
| 3299 #define JPEG_INTERNALS |
| 3300 #include "jinclude.h" |
| 3301 #include "jpeglib.h" |
| 3302 +#include "jpegcomp.h" |
| 3303 |
| 3304 |
| 3305 /* Private state */ |
| 3306 @@ -42,8 +46,28 @@ |
| 3307 * Support routines that do various essential calculations. |
| 3308 */ |
| 3309 |
| 3310 +#if JPEG_LIB_VERSION >= 70 |
| 3311 +/* |
| 3312 + * Compute JPEG image dimensions and related values. |
| 3313 + * NOTE: this is exported for possible use by application. |
| 3314 + * Hence it mustn't do anything that can't be done twice. |
| 3315 + */ |
| 3316 + |
| 3317 +GLOBAL(void) |
| 3318 +jpeg_calc_jpeg_dimensions (j_compress_ptr cinfo) |
| 3319 +/* Do computations that are needed before master selection phase */ |
| 3320 +{ |
| 3321 + /* Hardwire it to "no scaling" */ |
| 3322 + cinfo->jpeg_width = cinfo->image_width; |
| 3323 + cinfo->jpeg_height = cinfo->image_height; |
| 3324 + cinfo->min_DCT_h_scaled_size = DCTSIZE; |
| 3325 + cinfo->min_DCT_v_scaled_size = DCTSIZE; |
| 3326 +} |
| 3327 +#endif |
| 3328 + |
| 3329 + |
| 3330 LOCAL(void) |
| 3331 -initial_setup (j_compress_ptr cinfo) |
| 3332 +initial_setup (j_compress_ptr cinfo, boolean transcode_only) |
| 3333 /* Do computations that are needed before master selection phase */ |
| 3334 { |
| 3335 int ci; |
| 3336 @@ -51,14 +75,21 @@ |
| 3337 long samplesperrow; |
| 3338 JDIMENSION jd_samplesperrow; |
| 3339 |
| 3340 +#if JPEG_LIB_VERSION >= 70 |
| 3341 +#if JPEG_LIB_VERSION >= 80 |
| 3342 + if (!transcode_only) |
| 3343 +#endif |
| 3344 + jpeg_calc_jpeg_dimensions(cinfo); |
| 3345 +#endif |
| 3346 + |
| 3347 /* Sanity check on image dimensions */ |
| 3348 - if (cinfo->image_height <= 0 || cinfo->image_width <= 0 |
| 3349 + if (cinfo->_jpeg_height <= 0 || cinfo->_jpeg_width <= 0 |
| 3350 || cinfo->num_components <= 0 || cinfo->input_components <= 0) |
| 3351 ERREXIT(cinfo, JERR_EMPTY_IMAGE); |
| 3352 |
| 3353 /* Make sure image isn't bigger than I can handle */ |
| 3354 - if ((long) cinfo->image_height > (long) JPEG_MAX_DIMENSION || |
| 3355 - (long) cinfo->image_width > (long) JPEG_MAX_DIMENSION) |
| 3356 + if ((long) cinfo->_jpeg_height > (long) JPEG_MAX_DIMENSION || |
| 3357 + (long) cinfo->_jpeg_width > (long) JPEG_MAX_DIMENSION) |
| 3358 ERREXIT1(cinfo, JERR_IMAGE_TOO_BIG, (unsigned int) JPEG_MAX_DIMENSION); |
| 3359 |
| 3360 /* Width of an input scanline must be representable as JDIMENSION. */ |
| 3361 @@ -96,20 +127,24 @@ |
| 3362 /* Fill in the correct component_index value; don't rely on application */ |
| 3363 compptr->component_index = ci; |
| 3364 /* For compression, we never do DCT scaling. */ |
| 3365 +#if JPEG_LIB_VERSION >= 70 |
| 3366 + compptr->DCT_h_scaled_size = compptr->DCT_v_scaled_size = DCTSIZE; |
| 3367 +#else |
| 3368 compptr->DCT_scaled_size = DCTSIZE; |
| 3369 +#endif |
| 3370 /* Size in DCT blocks */ |
| 3371 compptr->width_in_blocks = (JDIMENSION) |
| 3372 - jdiv_round_up((long) cinfo->image_width * (long) compptr->h_samp_factor, |
| 3373 + jdiv_round_up((long) cinfo->_jpeg_width * (long) compptr->h_samp_factor, |
| 3374 (long) (cinfo->max_h_samp_factor * DCTSIZE)); |
| 3375 compptr->height_in_blocks = (JDIMENSION) |
| 3376 - jdiv_round_up((long) cinfo->image_height * (long) compptr->v_samp_factor, |
| 3377 + jdiv_round_up((long) cinfo->_jpeg_height * (long) compptr->v_samp_factor, |
| 3378 (long) (cinfo->max_v_samp_factor * DCTSIZE)); |
| 3379 /* Size in samples */ |
| 3380 compptr->downsampled_width = (JDIMENSION) |
| 3381 - jdiv_round_up((long) cinfo->image_width * (long) compptr->h_samp_factor, |
| 3382 + jdiv_round_up((long) cinfo->_jpeg_width * (long) compptr->h_samp_factor, |
| 3383 (long) cinfo->max_h_samp_factor); |
| 3384 compptr->downsampled_height = (JDIMENSION) |
| 3385 - jdiv_round_up((long) cinfo->image_height * (long) compptr->v_samp_factor, |
| 3386 + jdiv_round_up((long) cinfo->_jpeg_height * (long) compptr->v_samp_factor, |
| 3387 (long) cinfo->max_v_samp_factor); |
| 3388 /* Mark component needed (this flag isn't actually used for compression) */ |
| 3389 compptr->component_needed = TRUE; |
| 3390 @@ -119,7 +154,7 @@ |
| 3391 * main controller will call coefficient controller). |
| 3392 */ |
| 3393 cinfo->total_iMCU_rows = (JDIMENSION) |
| 3394 - jdiv_round_up((long) cinfo->image_height, |
| 3395 + jdiv_round_up((long) cinfo->_jpeg_height, |
| 3396 (long) (cinfo->max_v_samp_factor*DCTSIZE)); |
| 3397 } |
| 3398 |
| 3399 @@ -347,10 +382,10 @@ |
| 3400 |
| 3401 /* Overall image size in MCUs */ |
| 3402 cinfo->MCUs_per_row = (JDIMENSION) |
| 3403 - jdiv_round_up((long) cinfo->image_width, |
| 3404 + jdiv_round_up((long) cinfo->_jpeg_width, |
| 3405 (long) (cinfo->max_h_samp_factor*DCTSIZE)); |
| 3406 cinfo->MCU_rows_in_scan = (JDIMENSION) |
| 3407 - jdiv_round_up((long) cinfo->image_height, |
| 3408 + jdiv_round_up((long) cinfo->_jpeg_height, |
| 3409 (long) (cinfo->max_v_samp_factor*DCTSIZE)); |
| 3410 |
| 3411 cinfo->blocks_in_MCU = 0; |
| 3412 @@ -554,7 +589,7 @@ |
| 3413 master->pub.is_last_pass = FALSE; |
| 3414 |
| 3415 /* Validate parameters, determine derived values */ |
| 3416 - initial_setup(cinfo); |
| 3417 + initial_setup(cinfo, transcode_only); |
| 3418 |
| 3419 if (cinfo->scan_info != NULL) { |
| 3420 #ifdef C_MULTISCAN_FILES_SUPPORTED |
| 3421 @@ -567,7 +602,7 @@ |
| 3422 cinfo->num_scans = 1; |
| 3423 } |
| 3424 |
| 3425 - if (cinfo->progressive_mode) /* TEMPORARY HACK ??? */ |
| 3426 + if (cinfo->progressive_mode && !cinfo->arith_code) /* TEMPORARY HACK ??? */ |
| 3427 cinfo->optimize_coding = TRUE; /* assume default tables no good for progressive mode */ |
| 3428 |
| 3429 /* Initialize my private state */ |
| 3430 Index: jcparam.c |
| 3431 =================================================================== |
| 3432 --- jcparam.c (revision 829) |
| 3433 +++ jcparam.c (working copy) |
| 3434 @@ -1,9 +1,11 @@ |
| 3435 /* |
| 3436 * jcparam.c |
| 3437 * |
| 3438 + * This file was part of the Independent JPEG Group's software: |
| 3439 * Copyright (C) 1991-1998, Thomas G. Lane. |
| 3440 - * Copyright (C) 2009, D. R. Commander. |
| 3441 - * This file is part of the Independent JPEG Group's software. |
| 3442 + * Modified 2003-2008 by Guido Vollbeding. |
| 3443 + * libjpeg-turbo Modifications: |
| 3444 + * Copyright (C) 2009-2011, D. R. Commander. |
| 3445 * For conditions of distribution and use, see the accompanying README file. |
| 3446 * |
| 3447 * This file contains optional default-setting code for the JPEG compressor. |
| 3448 @@ -61,7 +63,50 @@ |
| 3449 } |
| 3450 |
| 3451 |
| 3452 +/* These are the sample quantization tables given in JPEG spec section K.1. |
| 3453 + * The spec says that the values given produce "good" quality, and |
| 3454 + * when divided by 2, "very good" quality. |
| 3455 + */ |
| 3456 +static const unsigned int std_luminance_quant_tbl[DCTSIZE2] = { |
| 3457 + 16, 11, 10, 16, 24, 40, 51, 61, |
| 3458 + 12, 12, 14, 19, 26, 58, 60, 55, |
| 3459 + 14, 13, 16, 24, 40, 57, 69, 56, |
| 3460 + 14, 17, 22, 29, 51, 87, 80, 62, |
| 3461 + 18, 22, 37, 56, 68, 109, 103, 77, |
| 3462 + 24, 35, 55, 64, 81, 104, 113, 92, |
| 3463 + 49, 64, 78, 87, 103, 121, 120, 101, |
| 3464 + 72, 92, 95, 98, 112, 100, 103, 99 |
| 3465 +}; |
| 3466 +static const unsigned int std_chrominance_quant_tbl[DCTSIZE2] = { |
| 3467 + 17, 18, 24, 47, 99, 99, 99, 99, |
| 3468 + 18, 21, 26, 66, 99, 99, 99, 99, |
| 3469 + 24, 26, 56, 99, 99, 99, 99, 99, |
| 3470 + 47, 66, 99, 99, 99, 99, 99, 99, |
| 3471 + 99, 99, 99, 99, 99, 99, 99, 99, |
| 3472 + 99, 99, 99, 99, 99, 99, 99, 99, |
| 3473 + 99, 99, 99, 99, 99, 99, 99, 99, |
| 3474 + 99, 99, 99, 99, 99, 99, 99, 99 |
| 3475 +}; |
| 3476 + |
| 3477 + |
| 3478 +#if JPEG_LIB_VERSION >= 70 |
| 3479 GLOBAL(void) |
| 3480 +jpeg_default_qtables (j_compress_ptr cinfo, boolean force_baseline) |
| 3481 +/* Set or change the 'quality' (quantization) setting, using default tables |
| 3482 + * and straight percentage-scaling quality scales. |
| 3483 + * This entry point allows different scalings for luminance and chrominance. |
| 3484 + */ |
| 3485 +{ |
| 3486 + /* Set up two quantization tables using the specified scaling */ |
| 3487 + jpeg_add_quant_table(cinfo, 0, std_luminance_quant_tbl, |
| 3488 + cinfo->q_scale_factor[0], force_baseline); |
| 3489 + jpeg_add_quant_table(cinfo, 1, std_chrominance_quant_tbl, |
| 3490 + cinfo->q_scale_factor[1], force_baseline); |
| 3491 +} |
| 3492 +#endif |
| 3493 + |
| 3494 + |
| 3495 +GLOBAL(void) |
| 3496 jpeg_set_linear_quality (j_compress_ptr cinfo, int scale_factor, |
| 3497 boolean force_baseline) |
| 3498 /* Set or change the 'quality' (quantization) setting, using default tables |
| 3499 @@ -70,31 +115,6 @@ |
| 3500 * applications that insist on a linear percentage scaling. |
| 3501 */ |
| 3502 { |
| 3503 - /* These are the sample quantization tables given in JPEG spec section K.1. |
| 3504 - * The spec says that the values given produce "good" quality, and |
| 3505 - * when divided by 2, "very good" quality. |
| 3506 - */ |
| 3507 - static const unsigned int std_luminance_quant_tbl[DCTSIZE2] = { |
| 3508 - 16, 11, 10, 16, 24, 40, 51, 61, |
| 3509 - 12, 12, 14, 19, 26, 58, 60, 55, |
| 3510 - 14, 13, 16, 24, 40, 57, 69, 56, |
| 3511 - 14, 17, 22, 29, 51, 87, 80, 62, |
| 3512 - 18, 22, 37, 56, 68, 109, 103, 77, |
| 3513 - 24, 35, 55, 64, 81, 104, 113, 92, |
| 3514 - 49, 64, 78, 87, 103, 121, 120, 101, |
| 3515 - 72, 92, 95, 98, 112, 100, 103, 99 |
| 3516 - }; |
| 3517 - static const unsigned int std_chrominance_quant_tbl[DCTSIZE2] = { |
| 3518 - 17, 18, 24, 47, 99, 99, 99, 99, |
| 3519 - 18, 21, 26, 66, 99, 99, 99, 99, |
| 3520 - 24, 26, 56, 99, 99, 99, 99, 99, |
| 3521 - 47, 66, 99, 99, 99, 99, 99, 99, |
| 3522 - 99, 99, 99, 99, 99, 99, 99, 99, |
| 3523 - 99, 99, 99, 99, 99, 99, 99, 99, |
| 3524 - 99, 99, 99, 99, 99, 99, 99, 99, |
| 3525 - 99, 99, 99, 99, 99, 99, 99, 99 |
| 3526 - }; |
| 3527 - |
| 3528 /* Set up two quantization tables using the specified scaling */ |
| 3529 jpeg_add_quant_table(cinfo, 0, std_luminance_quant_tbl, |
| 3530 scale_factor, force_baseline); |
| 3531 @@ -285,6 +305,10 @@ |
| 3532 |
| 3533 /* Initialize everything not dependent on the color space */ |
| 3534 |
| 3535 +#if JPEG_LIB_VERSION >= 70 |
| 3536 + cinfo->scale_num = 1; /* 1:1 scaling */ |
| 3537 + cinfo->scale_denom = 1; |
| 3538 +#endif |
| 3539 cinfo->data_precision = BITS_IN_JSAMPLE; |
| 3540 /* Set up two quantization tables using default quality of 75 */ |
| 3541 jpeg_set_quality(cinfo, 75, TRUE); |
| 3542 @@ -321,6 +345,11 @@ |
| 3543 /* By default, use the simpler non-cosited sampling alignment */ |
| 3544 cinfo->CCIR601_sampling = FALSE; |
| 3545 |
| 3546 +#if JPEG_LIB_VERSION >= 70 |
| 3547 + /* By default, apply fancy downsampling */ |
| 3548 + cinfo->do_fancy_downsampling = TRUE; |
| 3549 +#endif |
| 3550 + |
| 3551 /* No input smoothing */ |
| 3552 cinfo->smoothing_factor = 0; |
| 3553 |
| 3554 @@ -370,6 +399,10 @@ |
| 3555 case JCS_EXT_BGRX: |
| 3556 case JCS_EXT_XBGR: |
| 3557 case JCS_EXT_XRGB: |
| 3558 + case JCS_EXT_RGBA: |
| 3559 + case JCS_EXT_BGRA: |
| 3560 + case JCS_EXT_ABGR: |
| 3561 + case JCS_EXT_ARGB: |
| 3562 jpeg_set_colorspace(cinfo, JCS_YCbCr); |
| 3563 break; |
| 3564 case JCS_YCbCr: |
| 3565 Index: jctrans.c |
| 3566 =================================================================== |
| 3567 --- jctrans.c (revision 829) |
| 3568 +++ jctrans.c (working copy) |
| 3569 @@ -2,6 +2,7 @@ |
| 3570 * jctrans.c |
| 3571 * |
| 3572 * Copyright (C) 1995-1998, Thomas G. Lane. |
| 3573 + * Modified 2000-2009 by Guido Vollbeding. |
| 3574 * This file is part of the Independent JPEG Group's software. |
| 3575 * For conditions of distribution and use, see the accompanying README file. |
| 3576 * |
| 3577 @@ -76,6 +77,12 @@ |
| 3578 dstinfo->image_height = srcinfo->image_height; |
| 3579 dstinfo->input_components = srcinfo->num_components; |
| 3580 dstinfo->in_color_space = srcinfo->jpeg_color_space; |
| 3581 +#if JPEG_LIB_VERSION >= 70 |
| 3582 + dstinfo->jpeg_width = srcinfo->output_width; |
| 3583 + dstinfo->jpeg_height = srcinfo->output_height; |
| 3584 + dstinfo->min_DCT_h_scaled_size = srcinfo->min_DCT_h_scaled_size; |
| 3585 + dstinfo->min_DCT_v_scaled_size = srcinfo->min_DCT_v_scaled_size; |
| 3586 +#endif |
| 3587 /* Initialize all parameters to default values */ |
| 3588 jpeg_set_defaults(dstinfo); |
| 3589 /* jpeg_set_defaults may choose wrong colorspace, eg YCbCr if input is RGB. |
| 3590 @@ -167,7 +174,11 @@ |
| 3591 |
| 3592 /* Entropy encoding: either Huffman or arithmetic coding. */ |
| 3593 if (cinfo->arith_code) { |
| 3594 +#ifdef C_ARITH_CODING_SUPPORTED |
| 3595 + jinit_arith_encoder(cinfo); |
| 3596 +#else |
| 3597 ERREXIT(cinfo, JERR_ARITH_NOTIMPL); |
| 3598 +#endif |
| 3599 } else { |
| 3600 if (cinfo->progressive_mode) { |
| 3601 #ifdef C_PROGRESSIVE_SUPPORTED |
| 3602 Index: jdapistd.c |
| 3603 =================================================================== |
| 3604 --- jdapistd.c (revision 829) |
| 3605 +++ jdapistd.c (working copy) |
| 3606 @@ -1,8 +1,11 @@ |
| 3607 /* |
| 3608 * jdapistd.c |
| 3609 * |
| 3610 + * This file was part of the Independent JPEG Group's software: |
| 3611 * Copyright (C) 1994-1996, Thomas G. Lane. |
| 3612 - * This file is part of the Independent JPEG Group's software. |
| 3613 + * libjpeg-turbo Modifications: |
| 3614 + * Copyright (C) 2010, 2015, D. R. Commander. |
| 3615 + * Copyright (C) 2015, Google, Inc. |
| 3616 * For conditions of distribution and use, see the accompanying README file. |
| 3617 * |
| 3618 * This file contains application interface code for the decompression half |
| 3619 @@ -14,9 +17,10 @@ |
| 3620 * whole decompression library into a transcoder. |
| 3621 */ |
| 3622 |
| 3623 -#define JPEG_INTERNALS |
| 3624 -#include "jinclude.h" |
| 3625 -#include "jpeglib.h" |
| 3626 +#include "jdmainct.h" |
| 3627 +#include "jdcoefct.h" |
| 3628 +#include "jdsample.h" |
| 3629 +#include "jmemsys.h" |
| 3630 |
| 3631 |
| 3632 /* Forward declarations */ |
| 3633 @@ -176,7 +180,236 @@ |
| 3634 } |
| 3635 |
| 3636 |
| 3637 + |
| 3638 +/* Dummy color convert function used by jpeg_skip_scanlines() */ |
| 3639 +LOCAL(void) |
| 3640 +noop_convert (j_decompress_ptr cinfo, JSAMPIMAGE input_buf, |
| 3641 + JDIMENSION input_row, JSAMPARRAY output_buf, int num_rows) |
| 3642 +{ |
| 3643 +} |
| 3644 + |
| 3645 + |
| 3646 /* |
| 3647 + * In some cases, it is best to call jpeg_read_scanlines() and discard the |
| 3648 + * output, rather than skipping the scanlines, because this allows us to |
| 3649 + * maintain the internal state of the context-based upsampler. In these cases, |
| 3650 + * we set up and tear down a dummy color converter in order to avoid valgrind |
| 3651 + * errors and to achieve the best possible performance. |
| 3652 + */ |
| 3653 +LOCAL(void) |
| 3654 +read_and_discard_scanlines (j_decompress_ptr cinfo, JDIMENSION num_lines) |
| 3655 +{ |
| 3656 + JDIMENSION n; |
| 3657 + void (*color_convert) (j_decompress_ptr cinfo, JSAMPIMAGE input_buf, |
| 3658 + JDIMENSION input_row, JSAMPARRAY output_buf, |
| 3659 + int num_rows); |
| 3660 + |
| 3661 + color_convert = cinfo->cconvert->color_convert; |
| 3662 + cinfo->cconvert->color_convert = noop_convert; |
| 3663 + |
| 3664 + for (n = 0; n < num_lines; n++) |
| 3665 + jpeg_read_scanlines(cinfo, NULL, 1); |
| 3666 + |
| 3667 + cinfo->cconvert->color_convert = color_convert; |
| 3668 +} |
| 3669 + |
| 3670 +/* |
| 3671 + * Called by jpeg_skip_scanlines(). This partially skips a decompress block by |
| 3672 + * incrementing the rowgroup counter. |
| 3673 + */ |
| 3674 + |
| 3675 +LOCAL(void) |
| 3676 +increment_simple_rowgroup_ctr (j_decompress_ptr cinfo, JDIMENSION rows) |
| 3677 +{ |
| 3678 + JDIMENSION rows_left; |
| 3679 + my_main_ptr main_ptr = (my_main_ptr) cinfo->main; |
| 3680 + |
| 3681 + /* Increment the counter to the next row group after the skipped rows. */ |
| 3682 + main_ptr->rowgroup_ctr += rows / cinfo->max_v_samp_factor; |
| 3683 + |
| 3684 + /* Partially skipping a row group would involve modifying the internal state |
| 3685 + * of the upsampler, so read the remaining rows into a dummy buffer instead. |
| 3686 + */ |
| 3687 + rows_left = rows % cinfo->max_v_samp_factor; |
| 3688 + cinfo->output_scanline += rows - rows_left; |
| 3689 + |
| 3690 + read_and_discard_scanlines(cinfo, rows_left); |
| 3691 +} |
| 3692 + |
| 3693 +/* |
| 3694 + * Skips some scanlines of data from the JPEG decompressor. |
| 3695 + * |
| 3696 + * The return value will be the number of lines actually skipped. If skipping |
| 3697 + * num_lines would move beyond the end of the image, then the actual number of |
| 3698 + * lines remaining in the image is returned. Otherwise, the return value will |
| 3699 + * be equal to num_lines. |
| 3700 + * |
| 3701 + * Refer to libjpeg.txt for more information. |
| 3702 + */ |
| 3703 + |
| 3704 +GLOBAL(JDIMENSION) |
| 3705 +jpeg_skip_scanlines (j_decompress_ptr cinfo, JDIMENSION num_lines) |
| 3706 +{ |
| 3707 + my_main_ptr main_ptr = (my_main_ptr) cinfo->main; |
| 3708 + my_coef_ptr coef = (my_coef_ptr) cinfo->coef; |
| 3709 + my_upsample_ptr upsample = (my_upsample_ptr) cinfo->upsample; |
| 3710 + JDIMENSION i, x; |
| 3711 + int y; |
| 3712 + JDIMENSION lines_per_iMCU_row, lines_left_in_iMCU_row, lines_after_iMCU_row; |
| 3713 + JDIMENSION lines_to_skip, lines_to_read; |
| 3714 + |
| 3715 + if (cinfo->global_state != DSTATE_SCANNING) |
| 3716 + ERREXIT1(cinfo, JERR_BAD_STATE, cinfo->global_state); |
| 3717 + |
| 3718 + /* Do not skip past the bottom of the image. */ |
| 3719 + if (cinfo->output_scanline + num_lines >= cinfo->output_height) { |
| 3720 + cinfo->output_scanline = cinfo->output_height; |
| 3721 + return cinfo->output_height - cinfo->output_scanline; |
| 3722 + } |
| 3723 + |
| 3724 + if (num_lines == 0) |
| 3725 + return 0; |
| 3726 + |
| 3727 + lines_per_iMCU_row = cinfo->_min_DCT_scaled_size * cinfo->max_v_samp_factor; |
| 3728 + lines_left_in_iMCU_row = |
| 3729 + (lines_per_iMCU_row - (cinfo->output_scanline % lines_per_iMCU_row)) % |
| 3730 + lines_per_iMCU_row; |
| 3731 + lines_after_iMCU_row = num_lines - lines_left_in_iMCU_row; |
| 3732 + |
| 3733 + /* Skip the lines remaining in the current iMCU row. When upsampling |
| 3734 + * requires context rows, we need the previous and next rows in order to read |
| 3735 + * the current row. This adds some complexity. |
| 3736 + */ |
| 3737 + if (cinfo->upsample->need_context_rows) { |
| 3738 + /* If the skipped lines would not move us past the current iMCU row, we |
| 3739 + * read the lines and ignore them. There might be a faster way of doing |
| 3740 + * this, but we are facing increasing complexity for diminishing returns. |
| 3741 + * The increasing complexity would be a by-product of meddling with the |
| 3742 + * state machine used to skip context rows. Near the end of an iMCU row, |
| 3743 + * the next iMCU row may have already been entropy-decoded. In this unique |
| 3744 + * case, we will read the next iMCU row if we cannot skip past it as well. |
| 3745 + */ |
| 3746 + if ((num_lines < lines_left_in_iMCU_row + 1) || |
| 3747 + (lines_left_in_iMCU_row <= 1 && main_ptr->buffer_full && |
| 3748 + lines_after_iMCU_row < lines_per_iMCU_row + 1)) { |
| 3749 + read_and_discard_scanlines(cinfo, num_lines); |
| 3750 + return num_lines; |
| 3751 + } |
| 3752 + |
| 3753 + /* If the next iMCU row has already been entropy-decoded, make sure that |
| 3754 + * we do not skip too far. |
| 3755 + */ |
| 3756 + if (lines_left_in_iMCU_row <= 1 && main_ptr->buffer_full) { |
| 3757 + cinfo->output_scanline += lines_left_in_iMCU_row + lines_per_iMCU_row; |
| 3758 + lines_after_iMCU_row -= lines_per_iMCU_row; |
| 3759 + } else { |
| 3760 + cinfo->output_scanline += lines_left_in_iMCU_row; |
| 3761 + } |
| 3762 + |
| 3763 + /* If we have just completed the first block, adjust the buffer pointers */ |
| 3764 + if (main_ptr->iMCU_row_ctr == 0 || |
| 3765 + (main_ptr->iMCU_row_ctr == 1 && lines_left_in_iMCU_row > 2)) |
| 3766 + set_wraparound_pointers(cinfo); |
| 3767 + main_ptr->buffer_full = FALSE; |
| 3768 + main_ptr->rowgroup_ctr = 0; |
| 3769 + main_ptr->context_state = CTX_PREPARE_FOR_IMCU; |
| 3770 + upsample->next_row_out = cinfo->max_v_samp_factor; |
| 3771 + upsample->rows_to_go = cinfo->output_height - cinfo->output_scanline; |
| 3772 + } |
| 3773 + |
| 3774 + /* Skipping is much simpler when context rows are not required. */ |
| 3775 + else { |
| 3776 + if (num_lines < lines_left_in_iMCU_row) { |
| 3777 + increment_simple_rowgroup_ctr(cinfo, num_lines); |
| 3778 + return num_lines; |
| 3779 + } else { |
| 3780 + cinfo->output_scanline += lines_left_in_iMCU_row; |
| 3781 + main_ptr->buffer_full = FALSE; |
| 3782 + main_ptr->rowgroup_ctr = 0; |
| 3783 + upsample->next_row_out = cinfo->max_v_samp_factor; |
| 3784 + upsample->rows_to_go = cinfo->output_height - cinfo->output_scanline; |
| 3785 + } |
| 3786 + } |
| 3787 + |
| 3788 + /* Calculate how many full iMCU rows we can skip. */ |
| 3789 + if (cinfo->upsample->need_context_rows) |
| 3790 + lines_to_skip = ((lines_after_iMCU_row - 1) / lines_per_iMCU_row) * |
| 3791 + lines_per_iMCU_row; |
| 3792 + else |
| 3793 + lines_to_skip = (lines_after_iMCU_row / lines_per_iMCU_row) * |
| 3794 + lines_per_iMCU_row; |
| 3795 + /* Calculate the number of lines that remain to be skipped after skipping all |
| 3796 + * of the full iMCU rows that we can. We will not read these lines unless we |
| 3797 + * have to. |
| 3798 + */ |
| 3799 + lines_to_read = lines_after_iMCU_row - lines_to_skip; |
| 3800 + |
| 3801 + /* For images requiring multiple scans (progressive, non-interleaved, etc.), |
| 3802 + * all of the entropy decoding occurs in jpeg_start_decompress(), assuming |
| 3803 + * that the input data source is non-suspending. This makes skipping easy. |
| 3804 + */ |
| 3805 + if (cinfo->inputctl->has_multiple_scans) { |
| 3806 + if (cinfo->upsample->need_context_rows) { |
| 3807 + cinfo->output_scanline += lines_to_skip; |
| 3808 + cinfo->output_iMCU_row += lines_to_skip / lines_per_iMCU_row; |
| 3809 + main_ptr->iMCU_row_ctr += lines_after_iMCU_row / lines_per_iMCU_row; |
| 3810 + /* It is complex to properly move to the middle of a context block, so |
| 3811 + * read the remaining lines instead of skipping them. |
| 3812 + */ |
| 3813 + read_and_discard_scanlines(cinfo, lines_to_read); |
| 3814 + } else { |
| 3815 + cinfo->output_scanline += lines_to_skip; |
| 3816 + cinfo->output_iMCU_row += lines_to_skip / lines_per_iMCU_row; |
| 3817 + increment_simple_rowgroup_ctr(cinfo, lines_to_read); |
| 3818 + } |
| 3819 + upsample->rows_to_go = cinfo->output_height - cinfo->output_scanline; |
| 3820 + return num_lines; |
| 3821 + } |
| 3822 + |
| 3823 + /* Skip the iMCU rows that we can safely skip. */ |
| 3824 + for (i = 0; i < lines_to_skip; i += lines_per_iMCU_row) { |
| 3825 + for (y = 0; y < coef->MCU_rows_per_iMCU_row; y++) { |
| 3826 + for (x = 0; x < cinfo->MCUs_per_row; x++) { |
| 3827 + /* Calling decode_mcu() with a NULL pointer causes it to discard the |
| 3828 + * decoded coefficients. This is ~5% faster for large subsets, but |
| 3829 + * it's tough to tell a difference for smaller images. |
| 3830 + */ |
| 3831 + (*cinfo->entropy->decode_mcu) (cinfo, NULL); |
| 3832 + } |
| 3833 + } |
| 3834 + cinfo->input_iMCU_row++; |
| 3835 + cinfo->output_iMCU_row++; |
| 3836 + if (cinfo->input_iMCU_row < cinfo->total_iMCU_rows) |
| 3837 + start_iMCU_row(cinfo); |
| 3838 + else |
| 3839 + (*cinfo->inputctl->finish_input_pass) (cinfo); |
| 3840 + } |
| 3841 + cinfo->output_scanline += lines_to_skip; |
| 3842 + |
| 3843 + if (cinfo->upsample->need_context_rows) { |
| 3844 + /* Context-based upsampling keeps track of iMCU rows. */ |
| 3845 + main_ptr->iMCU_row_ctr += lines_to_skip / lines_per_iMCU_row; |
| 3846 + |
| 3847 + /* It is complex to properly move to the middle of a context block, so |
| 3848 + * read the remaining lines instead of skipping them. |
| 3849 + */ |
| 3850 + read_and_discard_scanlines(cinfo, lines_to_read); |
| 3851 + } else { |
| 3852 + increment_simple_rowgroup_ctr(cinfo, lines_to_read); |
| 3853 + } |
| 3854 + |
| 3855 + /* Since skipping lines involves skipping the upsampling step, the value of |
| 3856 + * "rows_to_go" will become invalid unless we set it here. NOTE: This is a |
| 3857 + * bit odd, since "rows_to_go" seems to be redundantly keeping track of |
| 3858 + * output_scanline. |
| 3859 + */ |
| 3860 + upsample->rows_to_go = cinfo->output_height - cinfo->output_scanline; |
| 3861 + |
| 3862 + /* Always skip the requested number of lines. */ |
| 3863 + return num_lines; |
| 3864 +} |
| 3865 + |
| 3866 +/* |
| 3867 * Alternate entry point to read raw data. |
| 3868 * Processes exactly one iMCU row per call, unless suspended. |
| 3869 */ |
| 3870 @@ -202,7 +435,7 @@ |
| 3871 } |
| 3872 |
| 3873 /* Verify that at least one iMCU row can be returned. */ |
| 3874 - lines_per_iMCU_row = cinfo->max_v_samp_factor * cinfo->min_DCT_scaled_size; |
| 3875 + lines_per_iMCU_row = cinfo->max_v_samp_factor * cinfo->_min_DCT_scaled_size; |
| 3876 if (max_lines < lines_per_iMCU_row) |
| 3877 ERREXIT(cinfo, JERR_BUFFER_SIZE); |
| 3878 |
| 3879 Index: jdatadst.c |
| 3880 =================================================================== |
| 3881 --- jdatadst.c (revision 829) |
| 3882 +++ jdatadst.c (working copy) |
| 3883 @@ -1,14 +1,17 @@ |
| 3884 /* |
| 3885 * jdatadst.c |
| 3886 * |
| 3887 + * This file was part of the Independent JPEG Group's software: |
| 3888 * Copyright (C) 1994-1996, Thomas G. Lane. |
| 3889 - * This file is part of the Independent JPEG Group's software. |
| 3890 + * Modified 2009-2012 by Guido Vollbeding. |
| 3891 + * libjpeg-turbo Modifications: |
| 3892 + * Copyright (C) 2013, D. R. Commander. |
| 3893 * For conditions of distribution and use, see the accompanying README file. |
| 3894 * |
| 3895 * This file contains compression data destination routines for the case of |
| 3896 - * emitting JPEG data to a file (or any stdio stream). While these routines |
| 3897 - * are sufficient for most applications, some will want to use a different |
| 3898 - * destination manager. |
| 3899 + * emitting JPEG data to memory or to a file (or any stdio stream). |
| 3900 + * While these routines are sufficient for most applications, |
| 3901 + * some will want to use a different destination manager. |
| 3902 * IMPORTANT: we assume that fwrite() will correctly transcribe an array of |
| 3903 * JOCTETs into 8-bit-wide elements on external storage. If char is wider |
| 3904 * than 8 bits on your machine, you may need to do some tweaking. |
| 3905 @@ -19,7 +22,12 @@ |
| 3906 #include "jpeglib.h" |
| 3907 #include "jerror.h" |
| 3908 |
| 3909 +#ifndef HAVE_STDLIB_H /* <stdlib.h> should declare malloc(),free() */ |
| 3910 +extern void * malloc JPP((size_t size)); |
| 3911 +extern void free JPP((void *ptr)); |
| 3912 +#endif |
| 3913 |
| 3914 + |
| 3915 /* Expanded data destination object for stdio output */ |
| 3916 |
| 3917 typedef struct { |
| 3918 @@ -34,6 +42,23 @@ |
| 3919 #define OUTPUT_BUF_SIZE 4096 /* choose an efficiently fwrite'able size */ |
| 3920 |
| 3921 |
| 3922 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED) |
| 3923 +/* Expanded data destination object for memory output */ |
| 3924 + |
| 3925 +typedef struct { |
| 3926 + struct jpeg_destination_mgr pub; /* public fields */ |
| 3927 + |
| 3928 + unsigned char ** outbuffer; /* target buffer */ |
| 3929 + unsigned long * outsize; |
| 3930 + unsigned char * newbuffer; /* newly allocated buffer */ |
| 3931 + JOCTET * buffer; /* start of buffer */ |
| 3932 + size_t bufsize; |
| 3933 +} my_mem_destination_mgr; |
| 3934 + |
| 3935 +typedef my_mem_destination_mgr * my_mem_dest_ptr; |
| 3936 +#endif |
| 3937 + |
| 3938 + |
| 3939 /* |
| 3940 * Initialize destination --- called by jpeg_start_compress |
| 3941 * before any data is actually written. |
| 3942 @@ -53,7 +78,15 @@ |
| 3943 dest->pub.free_in_buffer = OUTPUT_BUF_SIZE; |
| 3944 } |
| 3945 |
| 3946 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED) |
| 3947 +METHODDEF(void) |
| 3948 +init_mem_destination (j_compress_ptr cinfo) |
| 3949 +{ |
| 3950 + /* no work necessary here */ |
| 3951 +} |
| 3952 +#endif |
| 3953 |
| 3954 + |
| 3955 /* |
| 3956 * Empty the output buffer --- called whenever buffer fills up. |
| 3957 * |
| 3958 @@ -92,7 +125,39 @@ |
| 3959 return TRUE; |
| 3960 } |
| 3961 |
| 3962 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED) |
| 3963 +METHODDEF(boolean) |
| 3964 +empty_mem_output_buffer (j_compress_ptr cinfo) |
| 3965 +{ |
| 3966 + size_t nextsize; |
| 3967 + JOCTET * nextbuffer; |
| 3968 + my_mem_dest_ptr dest = (my_mem_dest_ptr) cinfo->dest; |
| 3969 |
| 3970 + /* Try to allocate new buffer with double size */ |
| 3971 + nextsize = dest->bufsize * 2; |
| 3972 + nextbuffer = (JOCTET *) malloc(nextsize); |
| 3973 + |
| 3974 + if (nextbuffer == NULL) |
| 3975 + ERREXIT1(cinfo, JERR_OUT_OF_MEMORY, 10); |
| 3976 + |
| 3977 + MEMCOPY(nextbuffer, dest->buffer, dest->bufsize); |
| 3978 + |
| 3979 + if (dest->newbuffer != NULL) |
| 3980 + free(dest->newbuffer); |
| 3981 + |
| 3982 + dest->newbuffer = nextbuffer; |
| 3983 + |
| 3984 + dest->pub.next_output_byte = nextbuffer + dest->bufsize; |
| 3985 + dest->pub.free_in_buffer = dest->bufsize; |
| 3986 + |
| 3987 + dest->buffer = nextbuffer; |
| 3988 + dest->bufsize = nextsize; |
| 3989 + |
| 3990 + return TRUE; |
| 3991 +} |
| 3992 +#endif |
| 3993 + |
| 3994 + |
| 3995 /* |
| 3996 * Terminate destination --- called by jpeg_finish_compress |
| 3997 * after all data has been written. Usually needs to flush buffer. |
| 3998 @@ -119,7 +184,18 @@ |
| 3999 ERREXIT(cinfo, JERR_FILE_WRITE); |
| 4000 } |
| 4001 |
| 4002 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED) |
| 4003 +METHODDEF(void) |
| 4004 +term_mem_destination (j_compress_ptr cinfo) |
| 4005 +{ |
| 4006 + my_mem_dest_ptr dest = (my_mem_dest_ptr) cinfo->dest; |
| 4007 |
| 4008 + *dest->outbuffer = dest->buffer; |
| 4009 + *dest->outsize = (unsigned long)(dest->bufsize - dest->pub.free_in_buffer); |
| 4010 +} |
| 4011 +#endif |
| 4012 + |
| 4013 + |
| 4014 /* |
| 4015 * Prepare for output to a stdio stream. |
| 4016 * The caller must have already opened the stream, and is responsible |
| 4017 @@ -149,3 +225,55 @@ |
| 4018 dest->pub.term_destination = term_destination; |
| 4019 dest->outfile = outfile; |
| 4020 } |
| 4021 + |
| 4022 + |
| 4023 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED) |
| 4024 +/* |
| 4025 + * Prepare for output to a memory buffer. |
| 4026 + * The caller may supply its own initial buffer of an appropriate size. |
| 4027 + * Otherwise, or when the actual data output exceeds the given size, |
| 4028 + * the library adapts the buffer size as necessary. |
| 4029 + * The standard library functions malloc/free are used for allocating |
| 4030 + * larger memory, so the buffer is available to the application after |
| 4031 + * finishing compression, and then the application is responsible for |
| 4032 + * freeing the requested memory. |
| 4033 + */ |
| 4034 + |
| 4035 +GLOBAL(void) |
| 4036 +jpeg_mem_dest (j_compress_ptr cinfo, |
| 4037 + unsigned char ** outbuffer, unsigned long * outsize) |
| 4038 +{ |
| 4039 + my_mem_dest_ptr dest; |
| 4040 + |
| 4041 + if (outbuffer == NULL || outsize == NULL) /* sanity check */ |
| 4042 + ERREXIT(cinfo, JERR_BUFFER_SIZE); |
| 4043 + |
| 4044 + /* The destination object is made permanent so that multiple JPEG images |
| 4045 + * can be written to the same buffer without re-executing jpeg_mem_dest. |
| 4046 + */ |
| 4047 + if (cinfo->dest == NULL) { /* first time for this JPEG object? */ |
| 4048 + cinfo->dest = (struct jpeg_destination_mgr *) |
| 4049 + (*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_PERMANENT, |
| 4050 + SIZEOF(my_mem_destination_mgr)); |
| 4051 + } |
| 4052 + |
| 4053 + dest = (my_mem_dest_ptr) cinfo->dest; |
| 4054 + dest->pub.init_destination = init_mem_destination; |
| 4055 + dest->pub.empty_output_buffer = empty_mem_output_buffer; |
| 4056 + dest->pub.term_destination = term_mem_destination; |
| 4057 + dest->outbuffer = outbuffer; |
| 4058 + dest->outsize = outsize; |
| 4059 + dest->newbuffer = NULL; |
| 4060 + |
| 4061 + if (*outbuffer == NULL || *outsize == 0) { |
| 4062 + /* Allocate initial buffer */ |
| 4063 + dest->newbuffer = *outbuffer = (unsigned char *) malloc(OUTPUT_BUF_SIZE); |
| 4064 + if (dest->newbuffer == NULL) |
| 4065 + ERREXIT1(cinfo, JERR_OUT_OF_MEMORY, 10); |
| 4066 + *outsize = OUTPUT_BUF_SIZE; |
| 4067 + } |
| 4068 + |
| 4069 + dest->pub.next_output_byte = dest->buffer = *outbuffer; |
| 4070 + dest->pub.free_in_buffer = dest->bufsize = *outsize; |
| 4071 +} |
| 4072 +#endif |
| 4073 Index: jdatasrc.c |
| 4074 =================================================================== |
| 4075 --- jdatasrc.c (revision 829) |
| 4076 +++ jdatasrc.c (working copy) |
| 4077 @@ -1,14 +1,17 @@ |
| 4078 /* |
| 4079 * jdatasrc.c |
| 4080 * |
| 4081 + * This file was part of the Independent JPEG Group's software: |
| 4082 * Copyright (C) 1994-1996, Thomas G. Lane. |
| 4083 - * This file is part of the Independent JPEG Group's software. |
| 4084 + * Modified 2009-2011 by Guido Vollbeding. |
| 4085 + * libjpeg-turbo Modifications: |
| 4086 + * Copyright (C) 2013, D. R. Commander. |
| 4087 * For conditions of distribution and use, see the accompanying README file. |
| 4088 * |
| 4089 * This file contains decompression data source routines for the case of |
| 4090 - * reading JPEG data from a file (or any stdio stream). While these routines |
| 4091 - * are sufficient for most applications, some will want to use a different |
| 4092 - * source manager. |
| 4093 + * reading JPEG data from memory or from a file (or any stdio stream). |
| 4094 + * While these routines are sufficient for most applications, |
| 4095 + * some will want to use a different source manager. |
| 4096 * IMPORTANT: we assume that fread() will correctly transcribe an array of |
| 4097 * JOCTETs from 8-bit-wide elements on external storage. If char is wider |
| 4098 * than 8 bits on your machine, you may need to do some tweaking. |
| 4099 @@ -52,7 +55,15 @@ |
| 4100 src->start_of_file = TRUE; |
| 4101 } |
| 4102 |
| 4103 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED) |
| 4104 +METHODDEF(void) |
| 4105 +init_mem_source (j_decompress_ptr cinfo) |
| 4106 +{ |
| 4107 + /* no work necessary here */ |
| 4108 +} |
| 4109 +#endif |
| 4110 |
| 4111 + |
| 4112 /* |
| 4113 * Fill the input buffer --- called whenever buffer is emptied. |
| 4114 * |
| 4115 @@ -111,7 +122,30 @@ |
| 4116 return TRUE; |
| 4117 } |
| 4118 |
| 4119 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED) |
| 4120 +METHODDEF(boolean) |
| 4121 +fill_mem_input_buffer (j_decompress_ptr cinfo) |
| 4122 +{ |
| 4123 + static const JOCTET mybuffer[4] = { |
| 4124 + (JOCTET) 0xFF, (JOCTET) JPEG_EOI, 0, 0 |
| 4125 + }; |
| 4126 |
| 4127 + /* The whole JPEG data is expected to reside in the supplied memory |
| 4128 + * buffer, so any request for more data beyond the given buffer size |
| 4129 + * is treated as an error. |
| 4130 + */ |
| 4131 + WARNMS(cinfo, JWRN_JPEG_EOF); |
| 4132 + |
| 4133 + /* Insert a fake EOI marker */ |
| 4134 + |
| 4135 + cinfo->src->next_input_byte = mybuffer; |
| 4136 + cinfo->src->bytes_in_buffer = 2; |
| 4137 + |
| 4138 + return TRUE; |
| 4139 +} |
| 4140 +#endif |
| 4141 + |
| 4142 + |
| 4143 /* |
| 4144 * Skip data --- used to skip over a potentially large amount of |
| 4145 * uninteresting data (such as an APPn marker). |
| 4146 @@ -127,7 +161,7 @@ |
| 4147 METHODDEF(void) |
| 4148 skip_input_data (j_decompress_ptr cinfo, long num_bytes) |
| 4149 { |
| 4150 - my_src_ptr src = (my_src_ptr) cinfo->src; |
| 4151 + struct jpeg_source_mgr * src = cinfo->src; |
| 4152 |
| 4153 /* Just a dumb implementation for now. Could use fseek() except |
| 4154 * it doesn't work on pipes. Not clear that being smart is worth |
| 4155 @@ -134,15 +168,15 @@ |
| 4156 * any trouble anyway --- large skips are infrequent. |
| 4157 */ |
| 4158 if (num_bytes > 0) { |
| 4159 - while (num_bytes > (long) src->pub.bytes_in_buffer) { |
| 4160 - num_bytes -= (long) src->pub.bytes_in_buffer; |
| 4161 - (void) fill_input_buffer(cinfo); |
| 4162 + while (num_bytes > (long) src->bytes_in_buffer) { |
| 4163 + num_bytes -= (long) src->bytes_in_buffer; |
| 4164 + (void) (*src->fill_input_buffer) (cinfo); |
| 4165 /* note we assume that fill_input_buffer will never return FALSE, |
| 4166 * so suspension need not be handled. |
| 4167 */ |
| 4168 } |
| 4169 - src->pub.next_input_byte += (size_t) num_bytes; |
| 4170 - src->pub.bytes_in_buffer -= (size_t) num_bytes; |
| 4171 + src->next_input_byte += (size_t) num_bytes; |
| 4172 + src->bytes_in_buffer -= (size_t) num_bytes; |
| 4173 } |
| 4174 } |
| 4175 |
| 4176 @@ -210,3 +244,40 @@ |
| 4177 src->pub.bytes_in_buffer = 0; /* forces fill_input_buffer on first read */ |
| 4178 src->pub.next_input_byte = NULL; /* until buffer loaded */ |
| 4179 } |
| 4180 + |
| 4181 + |
| 4182 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED) |
| 4183 +/* |
| 4184 + * Prepare for input from a supplied memory buffer. |
| 4185 + * The buffer must contain the whole JPEG data. |
| 4186 + */ |
| 4187 + |
| 4188 +GLOBAL(void) |
| 4189 +jpeg_mem_src (j_decompress_ptr cinfo, |
| 4190 + unsigned char * inbuffer, unsigned long insize) |
| 4191 +{ |
| 4192 + struct jpeg_source_mgr * src; |
| 4193 + |
| 4194 + if (inbuffer == NULL || insize == 0) /* Treat empty input as fatal error */ |
| 4195 + ERREXIT(cinfo, JERR_INPUT_EMPTY); |
| 4196 + |
| 4197 + /* The source object is made permanent so that a series of JPEG images |
| 4198 + * can be read from the same buffer by calling jpeg_mem_src only before |
| 4199 + * the first one. |
| 4200 + */ |
| 4201 + if (cinfo->src == NULL) { /* first time for this JPEG object? */ |
| 4202 + cinfo->src = (struct jpeg_source_mgr *) |
| 4203 + (*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_PERMANENT, |
| 4204 + SIZEOF(struct jpeg_source_mgr)); |
| 4205 + } |
| 4206 + |
| 4207 + src = cinfo->src; |
| 4208 + src->init_source = init_mem_source; |
| 4209 + src->fill_input_buffer = fill_mem_input_buffer; |
| 4210 + src->skip_input_data = skip_input_data; |
| 4211 + src->resync_to_restart = jpeg_resync_to_restart; /* use default method */ |
| 4212 + src->term_source = term_source; |
| 4213 + src->bytes_in_buffer = (size_t) insize; |
| 4214 + src->next_input_byte = (JOCTET *) inbuffer; |
| 4215 +} |
| 4216 +#endif |
| 4217 Index: jdcoefct.c |
| 4218 =================================================================== |
| 4219 --- jdcoefct.c (revision 829) |
| 4220 +++ jdcoefct.c (working copy) |
| 4221 @@ -1,8 +1,11 @@ |
| 4222 /* |
| 4223 * jdcoefct.c |
| 4224 * |
| 4225 + * This file was part of the Independent JPEG Group's software: |
| 4226 * Copyright (C) 1994-1997, Thomas G. Lane. |
| 4227 - * This file is part of the Independent JPEG Group's software. |
| 4228 + * libjpeg-turbo Modifications: |
| 4229 + * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB |
| 4230 + * Copyright (C) 2010, D. R. Commander. |
| 4231 * For conditions of distribution and use, see the accompanying README file. |
| 4232 * |
| 4233 * This file contains the coefficient buffer controller for decompression. |
| 4234 @@ -14,56 +17,10 @@ |
| 4235 * Also, the input side (only) is used when reading a file for transcoding. |
| 4236 */ |
| 4237 |
| 4238 -#define JPEG_INTERNALS |
| 4239 -#include "jinclude.h" |
| 4240 -#include "jpeglib.h" |
| 4241 +#include "jdcoefct.h" |
| 4242 +#include "jpegcomp.h" |
| 4243 |
| 4244 -/* Block smoothing is only applicable for progressive JPEG, so: */ |
| 4245 -#ifndef D_PROGRESSIVE_SUPPORTED |
| 4246 -#undef BLOCK_SMOOTHING_SUPPORTED |
| 4247 -#endif |
| 4248 |
| 4249 -/* Private buffer controller object */ |
| 4250 - |
| 4251 -typedef struct { |
| 4252 - struct jpeg_d_coef_controller pub; /* public fields */ |
| 4253 - |
| 4254 - /* These variables keep track of the current location of the input side. */ |
| 4255 - /* cinfo->input_iMCU_row is also used for this. */ |
| 4256 - JDIMENSION MCU_ctr; /* counts MCUs processed in current row */ |
| 4257 - int MCU_vert_offset; /* counts MCU rows within iMCU row */ |
| 4258 - int MCU_rows_per_iMCU_row; /* number of such rows needed */ |
| 4259 - |
| 4260 - /* The output side's location is represented by cinfo->output_iMCU_row. */ |
| 4261 - |
| 4262 - /* In single-pass modes, it's sufficient to buffer just one MCU. |
| 4263 - * We allocate a workspace of D_MAX_BLOCKS_IN_MCU coefficient blocks, |
| 4264 - * and let the entropy decoder write into that workspace each time. |
| 4265 - * (On 80x86, the workspace is FAR even though it's not really very big; |
| 4266 - * this is to keep the module interfaces unchanged when a large coefficient |
| 4267 - * buffer is necessary.) |
| 4268 - * In multi-pass modes, this array points to the current MCU's blocks |
| 4269 - * within the virtual arrays; it is used only by the input side. |
| 4270 - */ |
| 4271 - JBLOCKROW MCU_buffer[D_MAX_BLOCKS_IN_MCU]; |
| 4272 - |
| 4273 - /* Temporary workspace for one MCU */ |
| 4274 - JCOEF * workspace; |
| 4275 - |
| 4276 -#ifdef D_MULTISCAN_FILES_SUPPORTED |
| 4277 - /* In multi-pass modes, we need a virtual block array for each component. */ |
| 4278 - jvirt_barray_ptr whole_image[MAX_COMPONENTS]; |
| 4279 -#endif |
| 4280 - |
| 4281 -#ifdef BLOCK_SMOOTHING_SUPPORTED |
| 4282 - /* When doing block smoothing, we latch coefficient Al values here */ |
| 4283 - int * coef_bits_latch; |
| 4284 -#define SAVED_COEFS 6 /* we save coef_bits[0..5] */ |
| 4285 -#endif |
| 4286 -} my_coef_controller; |
| 4287 - |
| 4288 -typedef my_coef_controller * my_coef_ptr; |
| 4289 - |
| 4290 /* Forward declarations */ |
| 4291 METHODDEF(int) decompress_onepass |
| 4292 JPP((j_decompress_ptr cinfo, JSAMPIMAGE output_buf)); |
| 4293 @@ -78,30 +35,6 @@ |
| 4294 #endif |
| 4295 |
| 4296 |
| 4297 -LOCAL(void) |
| 4298 -start_iMCU_row (j_decompress_ptr cinfo) |
| 4299 -/* Reset within-iMCU-row counters for a new row (input side) */ |
| 4300 -{ |
| 4301 - my_coef_ptr coef = (my_coef_ptr) cinfo->coef; |
| 4302 - |
| 4303 - /* In an interleaved scan, an MCU row is the same as an iMCU row. |
| 4304 - * In a noninterleaved scan, an iMCU row has v_samp_factor MCU rows. |
| 4305 - * But at the bottom of the image, process only what's left. |
| 4306 - */ |
| 4307 - if (cinfo->comps_in_scan > 1) { |
| 4308 - coef->MCU_rows_per_iMCU_row = 1; |
| 4309 - } else { |
| 4310 - if (cinfo->input_iMCU_row < (cinfo->total_iMCU_rows-1)) |
| 4311 - coef->MCU_rows_per_iMCU_row = cinfo->cur_comp_info[0]->v_samp_factor; |
| 4312 - else |
| 4313 - coef->MCU_rows_per_iMCU_row = cinfo->cur_comp_info[0]->last_row_height; |
| 4314 - } |
| 4315 - |
| 4316 - coef->MCU_ctr = 0; |
| 4317 - coef->MCU_vert_offset = 0; |
| 4318 -} |
| 4319 - |
| 4320 - |
| 4321 /* |
| 4322 * Initialize for an input processing pass. |
| 4323 */ |
| 4324 @@ -190,7 +123,7 @@ |
| 4325 useful_width = (MCU_col_num < last_MCU_col) ? compptr->MCU_width |
| 4326 : compptr->last_col_width; |
| 4327 output_ptr = output_buf[compptr->component_index] + |
| 4328 - yoffset * compptr->DCT_scaled_size; |
| 4329 + yoffset * compptr->_DCT_scaled_size; |
| 4330 start_col = MCU_col_num * compptr->MCU_sample_width; |
| 4331 for (yindex = 0; yindex < compptr->MCU_height; yindex++) { |
| 4332 if (cinfo->input_iMCU_row < last_iMCU_row || |
| 4333 @@ -200,11 +133,11 @@ |
| 4334 (*inverse_DCT) (cinfo, compptr, |
| 4335 (JCOEFPTR) coef->MCU_buffer[blkn+xindex], |
| 4336 output_ptr, output_col); |
| 4337 - output_col += compptr->DCT_scaled_size; |
| 4338 + output_col += compptr->_DCT_scaled_size; |
| 4339 } |
| 4340 } |
| 4341 blkn += compptr->MCU_width; |
| 4342 - output_ptr += compptr->DCT_scaled_size; |
| 4343 + output_ptr += compptr->_DCT_scaled_size; |
| 4344 } |
| 4345 } |
| 4346 } |
| 4347 @@ -365,9 +298,9 @@ |
| 4348 (*inverse_DCT) (cinfo, compptr, (JCOEFPTR) buffer_ptr, |
| 4349 output_ptr, output_col); |
| 4350 buffer_ptr++; |
| 4351 - output_col += compptr->DCT_scaled_size; |
| 4352 + output_col += compptr->_DCT_scaled_size; |
| 4353 } |
| 4354 - output_ptr += compptr->DCT_scaled_size; |
| 4355 + output_ptr += compptr->_DCT_scaled_size; |
| 4356 } |
| 4357 } |
| 4358 |
| 4359 @@ -660,9 +593,9 @@ |
| 4360 DC4 = DC5; DC5 = DC6; |
| 4361 DC7 = DC8; DC8 = DC9; |
| 4362 buffer_ptr++, prev_block_row++, next_block_row++; |
| 4363 - output_col += compptr->DCT_scaled_size; |
| 4364 + output_col += compptr->_DCT_scaled_size; |
| 4365 } |
| 4366 - output_ptr += compptr->DCT_scaled_size; |
| 4367 + output_ptr += compptr->_DCT_scaled_size; |
| 4368 } |
| 4369 } |
| 4370 |
| 4371 Index: jdcolor.c |
| 4372 =================================================================== |
| 4373 --- jdcolor.c (revision 829) |
| 4374 +++ jdcolor.c (working copy) |
| 4375 @@ -1,10 +1,12 @@ |
| 4376 /* |
| 4377 * jdcolor.c |
| 4378 * |
| 4379 + * This file was part of the Independent JPEG Group's software: |
| 4380 * Copyright (C) 1991-1997, Thomas G. Lane. |
| 4381 + * Modified 2011 by Guido Vollbeding. |
| 4382 + * libjpeg-turbo Modifications: |
| 4383 * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB |
| 4384 - * Copyright (C) 2009, D. R. Commander. |
| 4385 - * This file is part of the Independent JPEG Group's software. |
| 4386 + * Copyright (C) 2009, 2011-2012, D. R. Commander. |
| 4387 * For conditions of distribution and use, see the accompanying README file. |
| 4388 * |
| 4389 * This file contains output colorspace conversion routines. |
| 4390 @@ -14,6 +16,7 @@ |
| 4391 #include "jinclude.h" |
| 4392 #include "jpeglib.h" |
| 4393 #include "jsimd.h" |
| 4394 +#include "config.h" |
| 4395 |
| 4396 |
| 4397 /* Private subobject */ |
| 4398 @@ -26,6 +29,9 @@ |
| 4399 int * Cb_b_tab; /* => table for Cb to B conversion */ |
| 4400 INT32 * Cr_g_tab; /* => table for Cr to G conversion */ |
| 4401 INT32 * Cb_g_tab; /* => table for Cb to G conversion */ |
| 4402 + |
| 4403 + /* Private state for RGB->Y conversion */ |
| 4404 + INT32 * rgb_y_tab; /* => table for RGB to Y conversion */ |
| 4405 } my_color_deconverter; |
| 4406 |
| 4407 typedef my_color_deconverter * my_cconvert_ptr; |
| 4408 @@ -32,14 +38,19 @@ |
| 4409 |
| 4410 |
| 4411 /**************** YCbCr -> RGB conversion: most common case **************/ |
| 4412 +/**************** RGB -> Y conversion: less common case **************/ |
| 4413 |
| 4414 /* |
| 4415 * YCbCr is defined per CCIR 601-1, except that Cb and Cr are |
| 4416 * normalized to the range 0..MAXJSAMPLE rather than -0.5 .. 0.5. |
| 4417 * The conversion equations to be implemented are therefore |
| 4418 + * |
| 4419 * R = Y + 1.40200 * Cr |
| 4420 * G = Y - 0.34414 * Cb - 0.71414 * Cr |
| 4421 * B = Y + 1.77200 * Cb |
| 4422 + * |
| 4423 + * Y = 0.29900 * R + 0.58700 * G + 0.11400 * B |
| 4424 + * |
| 4425 * where Cb and Cr represent the incoming values less CENTERJSAMPLE. |
| 4426 * (These numbers are derived from TIFF 6.0 section 21, dated 3-June-92.) |
| 4427 * |
| 4428 @@ -64,7 +75,132 @@ |
| 4429 #define ONE_HALF ((INT32) 1 << (SCALEBITS-1)) |
| 4430 #define FIX(x) ((INT32) ((x) * (1L<<SCALEBITS) + 0.5)) |
| 4431 |
| 4432 +/* We allocate one big table for RGB->Y conversion and divide it up into |
| 4433 + * three parts, instead of doing three alloc_small requests. This lets us |
| 4434 + * use a single table base address, which can be held in a register in the |
| 4435 + * inner loops on many machines (more than can hold all three addresses, |
| 4436 + * anyway). |
| 4437 + */ |
| 4438 |
| 4439 +#define R_Y_OFF 0 /* offset to R => Y section */ |
| 4440 +#define G_Y_OFF (1*(MAXJSAMPLE+1)) /* offset to G => Y section */ |
| 4441 +#define B_Y_OFF (2*(MAXJSAMPLE+1)) /* etc. */ |
| 4442 +#define TABLE_SIZE (3*(MAXJSAMPLE+1)) |
| 4443 + |
| 4444 + |
| 4445 +/* Include inline routines for colorspace extensions */ |
| 4446 + |
| 4447 +#include "jdcolext.c" |
| 4448 +#undef RGB_RED |
| 4449 +#undef RGB_GREEN |
| 4450 +#undef RGB_BLUE |
| 4451 +#undef RGB_PIXELSIZE |
| 4452 + |
| 4453 +#define RGB_RED EXT_RGB_RED |
| 4454 +#define RGB_GREEN EXT_RGB_GREEN |
| 4455 +#define RGB_BLUE EXT_RGB_BLUE |
| 4456 +#define RGB_PIXELSIZE EXT_RGB_PIXELSIZE |
| 4457 +#define ycc_rgb_convert_internal ycc_extrgb_convert_internal |
| 4458 +#define gray_rgb_convert_internal gray_extrgb_convert_internal |
| 4459 +#define rgb_rgb_convert_internal rgb_extrgb_convert_internal |
| 4460 +#include "jdcolext.c" |
| 4461 +#undef RGB_RED |
| 4462 +#undef RGB_GREEN |
| 4463 +#undef RGB_BLUE |
| 4464 +#undef RGB_PIXELSIZE |
| 4465 +#undef ycc_rgb_convert_internal |
| 4466 +#undef gray_rgb_convert_internal |
| 4467 +#undef rgb_rgb_convert_internal |
| 4468 + |
| 4469 +#define RGB_RED EXT_RGBX_RED |
| 4470 +#define RGB_GREEN EXT_RGBX_GREEN |
| 4471 +#define RGB_BLUE EXT_RGBX_BLUE |
| 4472 +#define RGB_ALPHA 3 |
| 4473 +#define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE |
| 4474 +#define ycc_rgb_convert_internal ycc_extrgbx_convert_internal |
| 4475 +#define gray_rgb_convert_internal gray_extrgbx_convert_internal |
| 4476 +#define rgb_rgb_convert_internal rgb_extrgbx_convert_internal |
| 4477 +#include "jdcolext.c" |
| 4478 +#undef RGB_RED |
| 4479 +#undef RGB_GREEN |
| 4480 +#undef RGB_BLUE |
| 4481 +#undef RGB_ALPHA |
| 4482 +#undef RGB_PIXELSIZE |
| 4483 +#undef ycc_rgb_convert_internal |
| 4484 +#undef gray_rgb_convert_internal |
| 4485 +#undef rgb_rgb_convert_internal |
| 4486 + |
| 4487 +#define RGB_RED EXT_BGR_RED |
| 4488 +#define RGB_GREEN EXT_BGR_GREEN |
| 4489 +#define RGB_BLUE EXT_BGR_BLUE |
| 4490 +#define RGB_PIXELSIZE EXT_BGR_PIXELSIZE |
| 4491 +#define ycc_rgb_convert_internal ycc_extbgr_convert_internal |
| 4492 +#define gray_rgb_convert_internal gray_extbgr_convert_internal |
| 4493 +#define rgb_rgb_convert_internal rgb_extbgr_convert_internal |
| 4494 +#include "jdcolext.c" |
| 4495 +#undef RGB_RED |
| 4496 +#undef RGB_GREEN |
| 4497 +#undef RGB_BLUE |
| 4498 +#undef RGB_PIXELSIZE |
| 4499 +#undef ycc_rgb_convert_internal |
| 4500 +#undef gray_rgb_convert_internal |
| 4501 +#undef rgb_rgb_convert_internal |
| 4502 + |
| 4503 +#define RGB_RED EXT_BGRX_RED |
| 4504 +#define RGB_GREEN EXT_BGRX_GREEN |
| 4505 +#define RGB_BLUE EXT_BGRX_BLUE |
| 4506 +#define RGB_ALPHA 3 |
| 4507 +#define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE |
| 4508 +#define ycc_rgb_convert_internal ycc_extbgrx_convert_internal |
| 4509 +#define gray_rgb_convert_internal gray_extbgrx_convert_internal |
| 4510 +#define rgb_rgb_convert_internal rgb_extbgrx_convert_internal |
| 4511 +#include "jdcolext.c" |
| 4512 +#undef RGB_RED |
| 4513 +#undef RGB_GREEN |
| 4514 +#undef RGB_BLUE |
| 4515 +#undef RGB_ALPHA |
| 4516 +#undef RGB_PIXELSIZE |
| 4517 +#undef ycc_rgb_convert_internal |
| 4518 +#undef gray_rgb_convert_internal |
| 4519 +#undef rgb_rgb_convert_internal |
| 4520 + |
| 4521 +#define RGB_RED EXT_XBGR_RED |
| 4522 +#define RGB_GREEN EXT_XBGR_GREEN |
| 4523 +#define RGB_BLUE EXT_XBGR_BLUE |
| 4524 +#define RGB_ALPHA 0 |
| 4525 +#define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE |
| 4526 +#define ycc_rgb_convert_internal ycc_extxbgr_convert_internal |
| 4527 +#define gray_rgb_convert_internal gray_extxbgr_convert_internal |
| 4528 +#define rgb_rgb_convert_internal rgb_extxbgr_convert_internal |
| 4529 +#include "jdcolext.c" |
| 4530 +#undef RGB_RED |
| 4531 +#undef RGB_GREEN |
| 4532 +#undef RGB_BLUE |
| 4533 +#undef RGB_ALPHA |
| 4534 +#undef RGB_PIXELSIZE |
| 4535 +#undef ycc_rgb_convert_internal |
| 4536 +#undef gray_rgb_convert_internal |
| 4537 +#undef rgb_rgb_convert_internal |
| 4538 + |
| 4539 +#define RGB_RED EXT_XRGB_RED |
| 4540 +#define RGB_GREEN EXT_XRGB_GREEN |
| 4541 +#define RGB_BLUE EXT_XRGB_BLUE |
| 4542 +#define RGB_ALPHA 0 |
| 4543 +#define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE |
| 4544 +#define ycc_rgb_convert_internal ycc_extxrgb_convert_internal |
| 4545 +#define gray_rgb_convert_internal gray_extxrgb_convert_internal |
| 4546 +#define rgb_rgb_convert_internal rgb_extxrgb_convert_internal |
| 4547 +#include "jdcolext.c" |
| 4548 +#undef RGB_RED |
| 4549 +#undef RGB_GREEN |
| 4550 +#undef RGB_BLUE |
| 4551 +#undef RGB_ALPHA |
| 4552 +#undef RGB_PIXELSIZE |
| 4553 +#undef ycc_rgb_convert_internal |
| 4554 +#undef gray_rgb_convert_internal |
| 4555 +#undef rgb_rgb_convert_internal |
| 4556 + |
| 4557 + |
| 4558 /* |
| 4559 * Initialize tables for YCC->RGB colorspace conversion. |
| 4560 */ |
| 4561 @@ -110,13 +246,6 @@ |
| 4562 |
| 4563 /* |
| 4564 * Convert some rows of samples to the output colorspace. |
| 4565 - * |
| 4566 - * Note that we change from noninterleaved, one-plane-per-component format |
| 4567 - * to interleaved-pixel format. The output buffer is therefore three times |
| 4568 - * as wide as the input buffer. |
| 4569 - * A starting row offset is provided only for the input buffer. The caller |
| 4570 - * can easily adjust the passed output_buf value to accommodate any row |
| 4571 - * offset required on that side. |
| 4572 */ |
| 4573 |
| 4574 METHODDEF(void) |
| 4575 @@ -124,19 +253,86 @@ |
| 4576 JSAMPIMAGE input_buf, JDIMENSION input_row, |
| 4577 JSAMPARRAY output_buf, int num_rows) |
| 4578 { |
| 4579 + switch (cinfo->out_color_space) { |
| 4580 + case JCS_EXT_RGB: |
| 4581 + ycc_extrgb_convert_internal(cinfo, input_buf, input_row, output_buf, |
| 4582 + num_rows); |
| 4583 + break; |
| 4584 + case JCS_EXT_RGBX: |
| 4585 + case JCS_EXT_RGBA: |
| 4586 + ycc_extrgbx_convert_internal(cinfo, input_buf, input_row, output_buf, |
| 4587 + num_rows); |
| 4588 + break; |
| 4589 + case JCS_EXT_BGR: |
| 4590 + ycc_extbgr_convert_internal(cinfo, input_buf, input_row, output_buf, |
| 4591 + num_rows); |
| 4592 + break; |
| 4593 + case JCS_EXT_BGRX: |
| 4594 + case JCS_EXT_BGRA: |
| 4595 + ycc_extbgrx_convert_internal(cinfo, input_buf, input_row, output_buf, |
| 4596 + num_rows); |
| 4597 + break; |
| 4598 + case JCS_EXT_XBGR: |
| 4599 + case JCS_EXT_ABGR: |
| 4600 + ycc_extxbgr_convert_internal(cinfo, input_buf, input_row, output_buf, |
| 4601 + num_rows); |
| 4602 + break; |
| 4603 + case JCS_EXT_XRGB: |
| 4604 + case JCS_EXT_ARGB: |
| 4605 + ycc_extxrgb_convert_internal(cinfo, input_buf, input_row, output_buf, |
| 4606 + num_rows); |
| 4607 + break; |
| 4608 + default: |
| 4609 + ycc_rgb_convert_internal(cinfo, input_buf, input_row, output_buf, |
| 4610 + num_rows); |
| 4611 + break; |
| 4612 + } |
| 4613 +} |
| 4614 + |
| 4615 + |
| 4616 +/**************** Cases other than YCbCr -> RGB **************/ |
| 4617 + |
| 4618 + |
| 4619 +/* |
| 4620 + * Initialize for RGB->grayscale colorspace conversion. |
| 4621 + */ |
| 4622 + |
| 4623 +LOCAL(void) |
| 4624 +build_rgb_y_table (j_decompress_ptr cinfo) |
| 4625 +{ |
| 4626 my_cconvert_ptr cconvert = (my_cconvert_ptr) cinfo->cconvert; |
| 4627 - register int y, cb, cr; |
| 4628 + INT32 * rgb_y_tab; |
| 4629 + INT32 i; |
| 4630 + |
| 4631 + /* Allocate and fill in the conversion tables. */ |
| 4632 + cconvert->rgb_y_tab = rgb_y_tab = (INT32 *) |
| 4633 + (*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_IMAGE, |
| 4634 + (TABLE_SIZE * SIZEOF(INT32))); |
| 4635 + |
| 4636 + for (i = 0; i <= MAXJSAMPLE; i++) { |
| 4637 + rgb_y_tab[i+R_Y_OFF] = FIX(0.29900) * i; |
| 4638 + rgb_y_tab[i+G_Y_OFF] = FIX(0.58700) * i; |
| 4639 + rgb_y_tab[i+B_Y_OFF] = FIX(0.11400) * i + ONE_HALF; |
| 4640 + } |
| 4641 +} |
| 4642 + |
| 4643 + |
| 4644 +/* |
| 4645 + * Convert RGB to grayscale. |
| 4646 + */ |
| 4647 + |
| 4648 +METHODDEF(void) |
| 4649 +rgb_gray_convert (j_decompress_ptr cinfo, |
| 4650 + JSAMPIMAGE input_buf, JDIMENSION input_row, |
| 4651 + JSAMPARRAY output_buf, int num_rows) |
| 4652 +{ |
| 4653 + my_cconvert_ptr cconvert = (my_cconvert_ptr) cinfo->cconvert; |
| 4654 + register int r, g, b; |
| 4655 + register INT32 * ctab = cconvert->rgb_y_tab; |
| 4656 register JSAMPROW outptr; |
| 4657 register JSAMPROW inptr0, inptr1, inptr2; |
| 4658 register JDIMENSION col; |
| 4659 JDIMENSION num_cols = cinfo->output_width; |
| 4660 - /* copy these pointers into registers if possible */ |
| 4661 - register JSAMPLE * range_limit = cinfo->sample_range_limit; |
| 4662 - register int * Crrtab = cconvert->Cr_r_tab; |
| 4663 - register int * Cbbtab = cconvert->Cb_b_tab; |
| 4664 - register INT32 * Crgtab = cconvert->Cr_g_tab; |
| 4665 - register INT32 * Cbgtab = cconvert->Cb_g_tab; |
| 4666 - SHIFT_TEMPS |
| 4667 |
| 4668 while (--num_rows >= 0) { |
| 4669 inptr0 = input_buf[0][input_row]; |
| 4670 @@ -145,24 +341,18 @@ |
| 4671 input_row++; |
| 4672 outptr = *output_buf++; |
| 4673 for (col = 0; col < num_cols; col++) { |
| 4674 - y = GETJSAMPLE(inptr0[col]); |
| 4675 - cb = GETJSAMPLE(inptr1[col]); |
| 4676 - cr = GETJSAMPLE(inptr2[col]); |
| 4677 - /* Range-limiting is essential due to noise introduced by DCT losses. */ |
| 4678 - outptr[rgb_red[cinfo->out_color_space]] = range_limit[y + Crrtab[cr]]; |
| 4679 - outptr[rgb_green[cinfo->out_color_space]] = range_limit[y + |
| 4680 - ((int) RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], |
| 4681 - SCALEBITS))]; |
| 4682 - outptr[rgb_blue[cinfo->out_color_space]] = range_limit[y + Cbbtab[cb]]; |
| 4683 - outptr += rgb_pixelsize[cinfo->out_color_space]; |
| 4684 + r = GETJSAMPLE(inptr0[col]); |
| 4685 + g = GETJSAMPLE(inptr1[col]); |
| 4686 + b = GETJSAMPLE(inptr2[col]); |
| 4687 + /* Y */ |
| 4688 + outptr[col] = (JSAMPLE) |
| 4689 + ((ctab[r+R_Y_OFF] + ctab[g+G_Y_OFF] + ctab[b+B_Y_OFF]) |
| 4690 + >> SCALEBITS); |
| 4691 } |
| 4692 } |
| 4693 } |
| 4694 |
| 4695 |
| 4696 -/**************** Cases other than YCbCr -> RGB **************/ |
| 4697 - |
| 4698 - |
| 4699 /* |
| 4700 * Color conversion for no colorspace change: just copy the data, |
| 4701 * converting from separate-planes to interleaved representation. |
| 4702 @@ -211,9 +401,7 @@ |
| 4703 |
| 4704 |
| 4705 /* |
| 4706 - * Convert grayscale to RGB: just duplicate the graylevel three times. |
| 4707 - * This is provided to support applications that don't want to cope |
| 4708 - * with grayscale as a separate case. |
| 4709 + * Convert grayscale to RGB |
| 4710 */ |
| 4711 |
| 4712 METHODDEF(void) |
| 4713 @@ -221,20 +409,85 @@ |
| 4714 JSAMPIMAGE input_buf, JDIMENSION input_row, |
| 4715 JSAMPARRAY output_buf, int num_rows) |
| 4716 { |
| 4717 - register JSAMPROW inptr, outptr; |
| 4718 - register JDIMENSION col; |
| 4719 - JDIMENSION num_cols = cinfo->output_width; |
| 4720 + switch (cinfo->out_color_space) { |
| 4721 + case JCS_EXT_RGB: |
| 4722 + gray_extrgb_convert_internal(cinfo, input_buf, input_row, output_buf, |
| 4723 + num_rows); |
| 4724 + break; |
| 4725 + case JCS_EXT_RGBX: |
| 4726 + case JCS_EXT_RGBA: |
| 4727 + gray_extrgbx_convert_internal(cinfo, input_buf, input_row, output_buf, |
| 4728 + num_rows); |
| 4729 + break; |
| 4730 + case JCS_EXT_BGR: |
| 4731 + gray_extbgr_convert_internal(cinfo, input_buf, input_row, output_buf, |
| 4732 + num_rows); |
| 4733 + break; |
| 4734 + case JCS_EXT_BGRX: |
| 4735 + case JCS_EXT_BGRA: |
| 4736 + gray_extbgrx_convert_internal(cinfo, input_buf, input_row, output_buf, |
| 4737 + num_rows); |
| 4738 + break; |
| 4739 + case JCS_EXT_XBGR: |
| 4740 + case JCS_EXT_ABGR: |
| 4741 + gray_extxbgr_convert_internal(cinfo, input_buf, input_row, output_buf, |
| 4742 + num_rows); |
| 4743 + break; |
| 4744 + case JCS_EXT_XRGB: |
| 4745 + case JCS_EXT_ARGB: |
| 4746 + gray_extxrgb_convert_internal(cinfo, input_buf, input_row, output_buf, |
| 4747 + num_rows); |
| 4748 + break; |
| 4749 + default: |
| 4750 + gray_rgb_convert_internal(cinfo, input_buf, input_row, output_buf, |
| 4751 + num_rows); |
| 4752 + break; |
| 4753 + } |
| 4754 +} |
| 4755 |
| 4756 - while (--num_rows >= 0) { |
| 4757 - inptr = input_buf[0][input_row++]; |
| 4758 - outptr = *output_buf++; |
| 4759 - for (col = 0; col < num_cols; col++) { |
| 4760 - /* We can dispense with GETJSAMPLE() here */ |
| 4761 - outptr[rgb_red[cinfo->out_color_space]] = |
| 4762 - outptr[rgb_green[cinfo->out_color_space]] = |
| 4763 - outptr[rgb_blue[cinfo->out_color_space]] = inptr[col]; |
| 4764 - outptr += rgb_pixelsize[cinfo->out_color_space]; |
| 4765 - } |
| 4766 + |
| 4767 +/* |
| 4768 + * Convert plain RGB to extended RGB |
| 4769 + */ |
| 4770 + |
| 4771 +METHODDEF(void) |
| 4772 +rgb_rgb_convert (j_decompress_ptr cinfo, |
| 4773 + JSAMPIMAGE input_buf, JDIMENSION input_row, |
| 4774 + JSAMPARRAY output_buf, int num_rows) |
| 4775 +{ |
| 4776 + switch (cinfo->out_color_space) { |
| 4777 + case JCS_EXT_RGB: |
| 4778 + rgb_extrgb_convert_internal(cinfo, input_buf, input_row, output_buf, |
| 4779 + num_rows); |
| 4780 + break; |
| 4781 + case JCS_EXT_RGBX: |
| 4782 + case JCS_EXT_RGBA: |
| 4783 + rgb_extrgbx_convert_internal(cinfo, input_buf, input_row, output_buf, |
| 4784 + num_rows); |
| 4785 + break; |
| 4786 + case JCS_EXT_BGR: |
| 4787 + rgb_extbgr_convert_internal(cinfo, input_buf, input_row, output_buf, |
| 4788 + num_rows); |
| 4789 + break; |
| 4790 + case JCS_EXT_BGRX: |
| 4791 + case JCS_EXT_BGRA: |
| 4792 + rgb_extbgrx_convert_internal(cinfo, input_buf, input_row, output_buf, |
| 4793 + num_rows); |
| 4794 + break; |
| 4795 + case JCS_EXT_XBGR: |
| 4796 + case JCS_EXT_ABGR: |
| 4797 + rgb_extxbgr_convert_internal(cinfo, input_buf, input_row, output_buf, |
| 4798 + num_rows); |
| 4799 + break; |
| 4800 + case JCS_EXT_XRGB: |
| 4801 + case JCS_EXT_ARGB: |
| 4802 + rgb_extxrgb_convert_internal(cinfo, input_buf, input_row, output_buf, |
| 4803 + num_rows); |
| 4804 + break; |
| 4805 + default: |
| 4806 + rgb_rgb_convert_internal(cinfo, input_buf, input_row, output_buf, |
| 4807 + num_rows); |
| 4808 + break; |
| 4809 } |
| 4810 } |
| 4811 |
| 4812 @@ -356,6 +609,9 @@ |
| 4813 /* For color->grayscale conversion, only the Y (0) component is needed */ |
| 4814 for (ci = 1; ci < cinfo->num_components; ci++) |
| 4815 cinfo->comp_info[ci].component_needed = FALSE; |
| 4816 + } else if (cinfo->jpeg_color_space == JCS_RGB) { |
| 4817 + cconvert->pub.color_convert = rgb_gray_convert; |
| 4818 + build_rgb_y_table(cinfo); |
| 4819 } else |
| 4820 ERREXIT(cinfo, JERR_CONVERSION_NOTIMPL); |
| 4821 break; |
| 4822 @@ -367,6 +623,10 @@ |
| 4823 case JCS_EXT_BGRX: |
| 4824 case JCS_EXT_XBGR: |
| 4825 case JCS_EXT_XRGB: |
| 4826 + case JCS_EXT_RGBA: |
| 4827 + case JCS_EXT_BGRA: |
| 4828 + case JCS_EXT_ABGR: |
| 4829 + case JCS_EXT_ARGB: |
| 4830 cinfo->out_color_components = rgb_pixelsize[cinfo->out_color_space]; |
| 4831 if (cinfo->jpeg_color_space == JCS_YCbCr) { |
| 4832 if (jsimd_can_ycc_rgb()) |
| 4833 @@ -377,9 +637,14 @@ |
| 4834 } |
| 4835 } else if (cinfo->jpeg_color_space == JCS_GRAYSCALE) { |
| 4836 cconvert->pub.color_convert = gray_rgb_convert; |
| 4837 - } else if (cinfo->jpeg_color_space == cinfo->out_color_space && |
| 4838 - rgb_pixelsize[cinfo->out_color_space] == 3) { |
| 4839 - cconvert->pub.color_convert = null_convert; |
| 4840 + } else if (cinfo->jpeg_color_space == JCS_RGB) { |
| 4841 + if (rgb_red[cinfo->out_color_space] == 0 && |
| 4842 + rgb_green[cinfo->out_color_space] == 1 && |
| 4843 + rgb_blue[cinfo->out_color_space] == 2 && |
| 4844 + rgb_pixelsize[cinfo->out_color_space] == 3) |
| 4845 + cconvert->pub.color_convert = null_convert; |
| 4846 + else |
| 4847 + cconvert->pub.color_convert = rgb_rgb_convert; |
| 4848 } else |
| 4849 ERREXIT(cinfo, JERR_CONVERSION_NOTIMPL); |
| 4850 break; |
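
The `build_rgb_y_table` / `rgb_gray_convert` pair added above uses a fixed-point lookup table to compute luminance. The following is a minimal standalone sketch of that technique, not the library's code. The values of `SCALEBITS`, `ONE_HALF`, `FIX`, the `*_Y_OFF` offsets and `TABLE_SIZE` are not shown in this diff and are assumed here; the `*_sketch` function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed definitions: these macros are not visible in the diff above. */
#define SCALEBITS   16
#define ONE_HALF    ((int32_t) 1 << (SCALEBITS - 1))
#define FIX(x)      ((int32_t) ((x) * (1L << SCALEBITS) + 0.5))
#define MAXJSAMPLE  255
#define R_Y_OFF     0
#define G_Y_OFF     (1 * (MAXJSAMPLE + 1))
#define B_Y_OFF     (2 * (MAXJSAMPLE + 1))
#define TABLE_SIZE  (3 * (MAXJSAMPLE + 1))

static int32_t rgb_y_tab[TABLE_SIZE];

/* Precompute the three scaled products once per image. The rounding
 * constant ONE_HALF is folded into the blue entries so that it is added
 * exactly once per pixel. */
static void build_rgb_y_table_sketch(void)
{
  int32_t i;

  for (i = 0; i <= MAXJSAMPLE; i++) {
    rgb_y_tab[i + R_Y_OFF] = FIX(0.29900) * i;
    rgb_y_tab[i + G_Y_OFF] = FIX(0.58700) * i;
    rgb_y_tab[i + B_Y_OFF] = FIX(0.11400) * i + ONE_HALF;
  }
}

/* Per pixel: three table lookups, two additions and one shift. */
static uint8_t rgb_to_gray_sketch(int r, int g, int b)
{
  assert(r >= 0 && r <= MAXJSAMPLE);
  assert(g >= 0 && g <= MAXJSAMPLE);
  assert(b >= 0 && b <= MAXJSAMPLE);
  return (uint8_t) ((rgb_y_tab[r + R_Y_OFF] + rgb_y_tab[g + G_Y_OFF] +
                     rgb_y_tab[b + B_Y_OFF]) >> SCALEBITS);
}
```

With these assumed constants the three scaled weights sum to exactly `1 << SCALEBITS`, so a pure white input maps back to `MAXJSAMPLE` without overflow.
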
| 4851 Index: jdct.h |
| 4852 =================================================================== |
| 4853 --- jdct.h (revision 829) |
| 4854 +++ jdct.h (working copy) |
| 4855 @@ -95,9 +95,21 @@ |
| 4856 #define jpeg_idct_islow jRDislow |
| 4857 #define jpeg_idct_ifast jRDifast |
| 4858 #define jpeg_idct_float jRDfloat |
| 4859 +#define jpeg_idct_7x7 jRD7x7 |
| 4860 +#define jpeg_idct_6x6 jRD6x6 |
| 4861 +#define jpeg_idct_5x5 jRD5x5 |
| 4862 #define jpeg_idct_4x4 jRD4x4 |
| 4863 +#define jpeg_idct_3x3 jRD3x3 |
| 4864 #define jpeg_idct_2x2 jRD2x2 |
| 4865 #define jpeg_idct_1x1 jRD1x1 |
| 4866 +#define jpeg_idct_9x9 jRD9x9 |
| 4867 +#define jpeg_idct_10x10 jRD10x10 |
| 4868 +#define jpeg_idct_11x11 jRD11x11 |
| 4869 +#define jpeg_idct_12x12 jRD12x12 |
| 4870 +#define jpeg_idct_13x13 jRD13x13 |
| 4871 +#define jpeg_idct_14x14 jRD14x14 |
| 4872 +#define jpeg_idct_15x15 jRD15x15 |
| 4873 +#define jpeg_idct_16x16 jRD16x16 |
| 4874 #endif /* NEED_SHORT_EXTERNAL_NAMES */ |
| 4875 |
| 4876 /* Extern declarations for the forward and inverse DCT routines. */ |
| 4877 @@ -115,9 +127,21 @@ |
| 4878 EXTERN(void) jpeg_idct_float |
| 4879 JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr, |
| 4880 JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col)); |
| 4881 +EXTERN(void) jpeg_idct_7x7 |
| 4882 + JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr, |
| 4883 + JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col)); |
| 4884 +EXTERN(void) jpeg_idct_6x6 |
| 4885 + JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr, |
| 4886 + JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col)); |
| 4887 +EXTERN(void) jpeg_idct_5x5 |
| 4888 + JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr, |
| 4889 + JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col)); |
| 4890 EXTERN(void) jpeg_idct_4x4 |
| 4891 JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr, |
| 4892 JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col)); |
| 4893 +EXTERN(void) jpeg_idct_3x3 |
| 4894 + JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr, |
| 4895 + JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col)); |
| 4896 EXTERN(void) jpeg_idct_2x2 |
| 4897 JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr, |
| 4898 JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col)); |
| 4899 @@ -124,6 +148,30 @@ |
| 4900 EXTERN(void) jpeg_idct_1x1 |
| 4901 JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr, |
| 4902 JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col)); |
| 4903 +EXTERN(void) jpeg_idct_9x9 |
| 4904 + JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr, |
| 4905 + JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col)); |
| 4906 +EXTERN(void) jpeg_idct_10x10 |
| 4907 + JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr, |
| 4908 + JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col)); |
| 4909 +EXTERN(void) jpeg_idct_11x11 |
| 4910 + JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr, |
| 4911 + JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col)); |
| 4912 +EXTERN(void) jpeg_idct_12x12 |
| 4913 + JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr, |
| 4914 + JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col)); |
| 4915 +EXTERN(void) jpeg_idct_13x13 |
| 4916 + JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr, |
| 4917 + JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col)); |
| 4918 +EXTERN(void) jpeg_idct_14x14 |
| 4919 + JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr, |
| 4920 + JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col)); |
| 4921 +EXTERN(void) jpeg_idct_15x15 |
| 4922 + JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr, |
| 4923 + JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col)); |
| 4924 +EXTERN(void) jpeg_idct_16x16 |
| 4925 + JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr, |
| 4926 + JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col)); |
| 4927 |
| 4928 |
| 4929 /* |
| 4930 Index: jddctmgr.c |
| 4931 =================================================================== |
| 4932 --- jddctmgr.c (revision 829) |
| 4933 +++ jddctmgr.c (working copy) |
| 4934 @@ -1,9 +1,12 @@ |
| 4935 /* |
| 4936 * jddctmgr.c |
| 4937 * |
| 4938 + * This file was part of the Independent JPEG Group's software: |
| 4939 * Copyright (C) 1994-1996, Thomas G. Lane. |
| 4940 + * Modified 2002-2010 by Guido Vollbeding. |
| 4941 + * libjpeg-turbo Modifications: |
| 4942 * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB |
| 4943 - * This file is part of the Independent JPEG Group's software. |
| 4944 + * Copyright (C) 2010, D. R. Commander. |
| 4945 * For conditions of distribution and use, see the accompanying README file. |
| 4946 * |
| 4947 * This file contains the inverse-DCT management logic. |
| 4948 @@ -21,6 +24,7 @@ |
| 4949 #include "jpeglib.h" |
| 4950 #include "jdct.h" /* Private declarations for DCT subsystem */ |
| 4951 #include "jsimddct.h" |
| 4952 +#include "jpegcomp.h" |
| 4953 |
| 4954 |
| 4955 /* |
| 4956 @@ -100,7 +104,7 @@ |
| 4957 for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components; |
| 4958 ci++, compptr++) { |
| 4959 /* Select the proper IDCT routine for this component's scaling */ |
| 4960 - switch (compptr->DCT_scaled_size) { |
| 4961 + switch (compptr->_DCT_scaled_size) { |
| 4962 #ifdef IDCT_SCALING_SUPPORTED |
| 4963 case 1: |
| 4964 method_ptr = jpeg_idct_1x1; |
| 4965 @@ -113,6 +117,10 @@ |
| 4966 method_ptr = jpeg_idct_2x2; |
| 4967 method = JDCT_ISLOW; /* jidctred uses islow-style table */ |
| 4968 break; |
| 4969 + case 3: |
| 4970 + method_ptr = jpeg_idct_3x3; |
| 4971 + method = JDCT_ISLOW; /* jidctint uses islow-style table */ |
| 4972 + break; |
| 4973 case 4: |
| 4974 if (jsimd_can_idct_4x4()) |
| 4975 method_ptr = jsimd_idct_4x4; |
| 4976 @@ -120,6 +128,18 @@ |
| 4977 method_ptr = jpeg_idct_4x4; |
| 4978 method = JDCT_ISLOW; /* jidctred uses islow-style table */ |
| 4979 break; |
| 4980 + case 5: |
| 4981 + method_ptr = jpeg_idct_5x5; |
| 4982 + method = JDCT_ISLOW; /* jidctint uses islow-style table */ |
| 4983 + break; |
| 4984 + case 6: |
| 4985 + method_ptr = jpeg_idct_6x6; |
| 4986 + method = JDCT_ISLOW; /* jidctint uses islow-style table */ |
| 4987 + break; |
| 4988 + case 7: |
| 4989 + method_ptr = jpeg_idct_7x7; |
| 4990 + method = JDCT_ISLOW; /* jidctint uses islow-style table */ |
| 4991 + break; |
| 4992 #endif |
| 4993 case DCTSIZE: |
| 4994 switch (cinfo->dct_method) { |
| 4995 @@ -155,8 +175,40 @@ |
| 4996 break; |
| 4997 } |
| 4998 break; |
| 4999 + case 9: |
| 5000 + method_ptr = jpeg_idct_9x9; |
| 5001 + method = JDCT_ISLOW; /* jidctint uses islow-style table */ |
| 5002 + break; |
| 5003 + case 10: |
| 5004 + method_ptr = jpeg_idct_10x10; |
| 5005 + method = JDCT_ISLOW; /* jidctint uses islow-style table */ |
| 5006 + break; |
| 5007 + case 11: |
| 5008 + method_ptr = jpeg_idct_11x11; |
| 5009 + method = JDCT_ISLOW; /* jidctint uses islow-style table */ |
| 5010 + break; |
| 5011 + case 12: |
| 5012 + method_ptr = jpeg_idct_12x12; |
| 5013 + method = JDCT_ISLOW; /* jidctint uses islow-style table */ |
| 5014 + break; |
| 5015 + case 13: |
| 5016 + method_ptr = jpeg_idct_13x13; |
| 5017 + method = JDCT_ISLOW; /* jidctint uses islow-style table */ |
| 5018 + break; |
| 5019 + case 14: |
| 5020 + method_ptr = jpeg_idct_14x14; |
| 5021 + method = JDCT_ISLOW; /* jidctint uses islow-style table */ |
| 5022 + break; |
| 5023 + case 15: |
| 5024 + method_ptr = jpeg_idct_15x15; |
| 5025 + method = JDCT_ISLOW; /* jidctint uses islow-style table */ |
| 5026 + break; |
| 5027 + case 16: |
| 5028 + method_ptr = jpeg_idct_16x16; |
| 5029 + method = JDCT_ISLOW; /* jidctint uses islow-style table */ |
| 5030 + break; |
| 5031 default: |
| 5032 - ERREXIT1(cinfo, JERR_BAD_DCTSIZE, compptr->DCT_scaled_size); |
| 5033 + ERREXIT1(cinfo, JERR_BAD_DCTSIZE, compptr->_DCT_scaled_size); |
| 5034 break; |
| 5035 } |
| 5036 idct->pub.inverse_DCT[ci] = method_ptr; |
| 5037 Index: jdhuff.c |
| 5038 =================================================================== |
| 5039 --- jdhuff.c (revision 829) |
| 5040 +++ jdhuff.c (working copy) |
| 5041 @@ -1,8 +1,10 @@ |
| 5042 /* |
| 5043 * jdhuff.c |
| 5044 * |
| 5045 + * This file was part of the Independent JPEG Group's software: |
| 5046 * Copyright (C) 1991-1997, Thomas G. Lane. |
| 5047 - * This file is part of the Independent JPEG Group's software. |
| 5048 + * libjpeg-turbo Modifications: |
| 5049 + * Copyright (C) 2009-2011, 2015, D. R. Commander. |
| 5050 * For conditions of distribution and use, see the accompanying README file. |
| 5051 * |
| 5052 * This file contains Huffman entropy decoding routines. |
| 5053 @@ -18,6 +20,7 @@ |
| 5054 #include "jinclude.h" |
| 5055 #include "jpeglib.h" |
| 5056 #include "jdhuff.h" /* Declarations shared with jdphuff.c */ |
| 5057 +#include "jpegcomp.h" |
| 5058 |
| 5059 |
| 5060 /* |
| 5061 @@ -122,7 +125,7 @@ |
| 5062 if (compptr->component_needed) { |
| 5063 entropy->dc_needed[blkn] = TRUE; |
| 5064 /* we don't need the ACs if producing a 1/8th-size image */ |
| 5065 - entropy->ac_needed[blkn] = (compptr->DCT_scaled_size > 1); |
| 5066 + entropy->ac_needed[blkn] = (compptr->_DCT_scaled_size > 1); |
| 5067 } else { |
| 5068 entropy->dc_needed[blkn] = entropy->ac_needed[blkn] = FALSE; |
| 5069 } |
| 5070 @@ -225,6 +228,7 @@ |
| 5071 dtbl->maxcode[l] = -1; /* -1 if no codes of this length */ |
| 5072 } |
| 5073 } |
| 5074 + dtbl->valoffset[17] = 0; |
| 5075 dtbl->maxcode[17] = 0xFFFFFL; /* ensures jpeg_huff_decode terminates */ |
| 5076 |
| 5077 /* Compute lookahead tables to speed up decoding. |
| 5078 @@ -234,7 +238,8 @@ |
| 5079 * with that code. |
| 5080 */ |
| 5081 |
| 5082 - MEMZERO(dtbl->look_nbits, SIZEOF(dtbl->look_nbits)); |
| 5083 + for (i = 0; i < (1 << HUFF_LOOKAHEAD); i++) |
| 5084 + dtbl->lookup[i] = (HUFF_LOOKAHEAD + 1) << HUFF_LOOKAHEAD; |
| 5085 |
| 5086 p = 0; |
| 5087 for (l = 1; l <= HUFF_LOOKAHEAD; l++) { |
| 5088 @@ -243,8 +248,7 @@ |
| 5089 /* Generate left-justified code followed by all possible bit sequences */ |
| 5090 lookbits = huffcode[p] << (HUFF_LOOKAHEAD-l); |
| 5091 for (ctr = 1 << (HUFF_LOOKAHEAD-l); ctr > 0; ctr--) { |
| 5092 - dtbl->look_nbits[lookbits] = l; |
| 5093 - dtbl->look_sym[lookbits] = htbl->huffval[p]; |
| 5094 + dtbl->lookup[lookbits] = (l << HUFF_LOOKAHEAD) | htbl->huffval[p]; |
| 5095 lookbits++; |
| 5096 } |
| 5097 } |
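
The hunk above merges the separate `look_nbits` and `look_sym` arrays into one `lookup` array whose entries hold the code length in the high bits and the symbol in the low bits. The sketch below illustrates that packing in isolation. The value of `HUFF_LOOKAHEAD` and all helper names are assumptions for illustration, and the unpacking side is not part of this diff.

```c
#include <assert.h>

#define HUFF_LOOKAHEAD 8   /* assumed value; defined in jdhuff.h */

/* Pack a code length and an 8-bit symbol into one table entry. */
static int pack_entry(int nbits, int symbol)
{
  assert(nbits >= 1 && nbits <= HUFF_LOOKAHEAD + 1);
  assert(symbol >= 0 && symbol <= 255);
  return (nbits << HUFF_LOOKAHEAD) | symbol;
}

/* Entry used to pre-fill the table: a length of HUFF_LOOKAHEAD + 1 means
 * "no code of HUFF_LOOKAHEAD bits or fewer matches this bit pattern". */
static int miss_entry(void)
{
  return (HUFF_LOOKAHEAD + 1) << HUFF_LOOKAHEAD;
}

static int entry_nbits(int entry)
{
  return entry >> HUFF_LOOKAHEAD;
}

static int entry_symbol(int entry)
{
  return entry & ((1 << HUFF_LOOKAHEAD) - 1);
}

/* A decoder can test for a miss with a single comparison. */
static int entry_is_miss(int entry)
{
  return entry_nbits(entry) > HUFF_LOOKAHEAD;
}
```

One apparent benefit of this layout is that a single table read yields both the length and the symbol, and the pre-filled miss value replaces the earlier `MEMZERO` convention of using a zero length to mean "not found".
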
| 5098 @@ -389,6 +393,50 @@ |
| 5099 } |
| 5100 |
| 5101 |
| 5102 +/* Macro version of the above, which performs much better but does not |
| 5103 + handle markers. We have to hand off any blocks with markers to the |
| 5104 + slower routines. */ |
| 5105 + |
| 5106 +#define GET_BYTE \ |
| 5107 +{ \ |
| 5108 + register int c0, c1; \ |
| 5109 + c0 = GETJOCTET(*buffer++); \ |
| 5110 + c1 = GETJOCTET(*buffer); \ |
| 5111 + /* Pre-execute most common case */ \ |
| 5112 + get_buffer = (get_buffer << 8) | c0; \ |
| 5113 + bits_left += 8; \ |
| 5114 + if (c0 == 0xFF) { \ |
| 5115 + /* Pre-execute case of FF/00, which represents an FF data byte */ \ |
| 5116 + buffer++; \ |
| 5117 + if (c1 != 0) { \ |
| 5118 + /* Oops, it's actually a marker indicating end of compressed data. */ \ |
| 5119 + cinfo->unread_marker = c1; \ |
| 5120 + /* Back out pre-execution and fill the buffer with zero bits */ \ |
| 5121 + buffer -= 2; \ |
| 5122 + get_buffer &= ~0xFF; \ |
| 5123 + } \ |
| 5124 + } \ |
| 5125 +} |
| 5126 + |
| 5127 +#if __WORDSIZE == 64 || defined(_WIN64) |
| 5128 + |
| 5129 +/* Pre-fetch 48 bits (6 bytes), because the holding register is 64-bit */ |
| 5130 +#define FILL_BIT_BUFFER_FAST \ |
| 5131 + if (bits_left < 16) { \ |
| 5132 + GET_BYTE GET_BYTE GET_BYTE GET_BYTE GET_BYTE GET_BYTE \ |
| 5133 + } |
| 5134 + |
| 5135 +#else |
| 5136 + |
| 5137 +/* Pre-fetch 16 bits (2 bytes), because the holding register is 32-bit */ |
| 5138 +#define FILL_BIT_BUFFER_FAST \ |
| 5139 + if (bits_left < 16) { \ |
| 5140 + GET_BYTE GET_BYTE \ |
| 5141 + } |
| 5142 + |
| 5143 +#endif |
| 5144 + |
| 5145 + |
| 5146 /* |
| 5147 * Out-of-line code for Huffman code decoding. |
| 5148 * See jdhuff.h for info about usage. |
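
The `GET_BYTE` macro added in the hunk above speculatively shifts a byte into the bit buffer and only afterwards checks for the `FF 00` stuffing sequence or a marker. The following function-style sketch mirrors that logic so it can be stepped through on a small buffer. The `bitreader_sketch` struct and its field names are hypothetical; the real macro operates on local variables of the enclosing function. Like the macro, it reads one byte ahead, so the caller must guarantee that byte exists.

```c
#include <assert.h>

/* Hypothetical container for the state that GET_BYTE keeps in locals. */
typedef struct {
  const unsigned char *buffer;  /* next input byte */
  unsigned long get_buffer;     /* bit accumulator */
  int bits_left;                /* number of valid bits in get_buffer */
  int unread_marker;            /* nonzero once a marker has been seen */
} bitreader_sketch;

static void get_byte_sketch(bitreader_sketch *br)
{
  int c0, c1;

  assert(br != 0 && br->buffer != 0);
  c0 = *br->buffer++;
  c1 = *br->buffer;             /* look ahead; caller ensures it is readable */

  /* Pre-execute the most common case: an ordinary data byte. */
  br->get_buffer = (br->get_buffer << 8) | (unsigned long) c0;
  br->bits_left += 8;

  if (c0 == 0xFF) {
    /* Pre-execute FF/00, which encodes a literal FF data byte. */
    br->buffer++;
    if (c1 != 0) {
      /* It was a marker: record it, rewind to the FF, and replace the
       * byte just shifted in with zero bits. */
      br->unread_marker = c1;
      br->buffer -= 2;
      br->get_buffer &= ~0xFFUL;
    }
  }
}
```

After a marker is seen the read pointer stays on the `FF` byte, so every later call takes the marker path again and keeps feeding zero bits.
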
| 5149 @@ -438,9 +486,10 @@ |
| 5150 * On some machines, a shift and add will be faster than a table lookup. |
| 5151 */ |
| 5152 |
| 5153 +#define AVOID_TABLES |
| 5154 #ifdef AVOID_TABLES |
| 5155 |
| 5156 -#define HUFF_EXTEND(x,s) ((x) < (1<<((s)-1)) ? (x) + (((-1)<<(s)) + 1) : (x)) |
| 5156 +#define HUFF_EXTEND(x,s) ((x) + ((((x) - (1<<((s)-1))) >> 31) & (((-1)<<(s)) + 1))) |
| 5158 |
| 5159 #else |
| 5160 |
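
The hunk above replaces the conditional `HUFF_EXTEND` with a branch-free form that builds an all-ones or all-zeros mask from the sign of `x - (1 << (s-1))`. The sketch below places the two forms side by side as functions. It assumes a 32-bit `int` and an arithmetic right shift of negative values (implementation-defined in C, though common), and it writes the offset as `1 - (1 << s)` instead of `((-1) << s) + 1` to avoid left-shifting a negative number. The function names are hypothetical.

```c
#include <assert.h>

/* Conditional form, equivalent to the macro removed by the patch. */
static int huff_extend_branch(int x, int s)
{
  assert(s >= 1 && s <= 15);
  return x < (1 << (s - 1)) ? x + (1 - (1 << s)) : x;
}

/* Branch-free form in the style of the macro added by the patch.
 * (x - half) >> 31 is -1 (all ones) when x < half and 0 otherwise,
 * assuming 32-bit int and arithmetic right shift. */
static int huff_extend_branchless(int x, int s)
{
  int half, mask;

  assert(s >= 1 && s <= 15);
  half = 1 << (s - 1);
  mask = (x - half) >> 31;
  return x + (mask & (1 - (1 << s)));
}
```

For example, with `s = 3` the raw value 2 lies below the midpoint 4 and extends to -5, while the raw value 5 is returned unchanged.
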
| 5161 @@ -498,6 +547,191 @@ |
| 5162 } |
| 5163 |
| 5164 |
| 5165 +LOCAL(boolean) |
| 5166 +decode_mcu_slow (j_decompress_ptr cinfo, JBLOCKROW *MCU_data) |
| 5167 +{ |
| 5168 + huff_entropy_ptr entropy = (huff_entropy_ptr) cinfo->entropy; |
| 5169 + BITREAD_STATE_VARS; |
| 5170 + int blkn; |
| 5171 + savable_state state; |
| 5172 + /* Outer loop handles each block in the MCU */ |
| 5173 + |
| 5174 + /* Load up working state */ |
| 5175 + BITREAD_LOAD_STATE(cinfo,entropy->bitstate); |
| 5176 + ASSIGN_STATE(state, entropy->saved); |
| 5177 + |
| 5178 + for (blkn = 0; blkn < cinfo->blocks_in_MCU; blkn++) { |
| 5179 + JBLOCKROW block = MCU_data ? MCU_data[blkn] : NULL; |
| 5180 + d_derived_tbl * dctbl = entropy->dc_cur_tbls[blkn]; |
| 5181 + d_derived_tbl * actbl = entropy->ac_cur_tbls[blkn]; |
| 5182 + register int s, k, r; |
| 5183 + |
| 5184 + /* Decode a single block's worth of coefficients */ |
| 5185 + |
| 5186 + /* Section F.2.2.1: decode the DC coefficient difference */ |
| 5187 + HUFF_DECODE(s, br_state, dctbl, return FALSE, label1); |
| 5188 + if (s) { |
| 5189 + CHECK_BIT_BUFFER(br_state, s, return FALSE); |
| 5190 + r = GET_BITS(s); |
| 5191 + s = HUFF_EXTEND(r, s); |
| 5192 + } |
| 5193 + |
| 5194 + if (entropy->dc_needed[blkn]) { |
| 5195 + /* Convert DC difference to actual value, update last_dc_val */ |
| 5196 + int ci = cinfo->MCU_membership[blkn]; |
| 5197 + s += state.last_dc_val[ci]; |
| 5198 + state.last_dc_val[ci] = s; |
| 5199 + if (block) { |
| 5200 + /* Output the DC coefficient (assumes jpeg_natural_order[0] = 0) */ |
| 5201 + (*block)[0] = (JCOEF) s; |
| 5202 + } |
| 5203 + } |
| 5204 + |
| 5205 + if (entropy->ac_needed[blkn] && block) { |
| 5206 + |
| 5207 + /* Section F.2.2.2: decode the AC coefficients */ |
| 5208 + /* Since zeroes are skipped, output area must be cleared beforehand */ |
| 5209 + for (k = 1; k < DCTSIZE2; k++) { |
| 5210 + HUFF_DECODE(s, br_state, actbl, return FALSE, label2); |
| 5211 + |
| 5212 + r = s >> 4; |
| 5213 + s &= 15; |
| 5214 + |
| 5215 + if (s) { |
| 5216 + k += r; |
| 5217 + CHECK_BIT_BUFFER(br_state, s, return FALSE); |
| 5218 + r = GET_BITS(s); |
| 5219 + s = HUFF_EXTEND(r, s); |
| 5220 + /* Output coefficient in natural (dezigzagged) order. |
| 5221 + * Note: the extra entries in jpeg_natural_order[] will save us |
| 5222 + * if k >= DCTSIZE2, which could happen if the data is corrupted. |
| 5223 + */ |
| 5224 + (*block)[jpeg_natural_order[k]] = (JCOEF) s; |
| 5225 + } else { |
| 5226 + if (r != 15) |
| 5227 + break; |
| 5228 + k += 15; |
| 5229 + } |
| 5230 + } |
| 5231 + |
| 5232 + } else { |
| 5233 + |
| 5234 + /* Section F.2.2.2: decode the AC coefficients */ |
| 5235 + /* In this path we just discard the values */ |
| 5236 + for (k = 1; k < DCTSIZE2; k++) { |
| 5237 + HUFF_DECODE(s, br_state, actbl, return FALSE, label3); |
| 5238 + |
| 5239 + r = s >> 4; |
| 5240 + s &= 15; |
| 5241 + |
| 5242 + if (s) { |
| 5243 + k += r; |
| 5244 + CHECK_BIT_BUFFER(br_state, s, return FALSE); |
| 5245 + DROP_BITS(s); |
| 5246 + } else { |
| 5247 + if (r != 15) |
| 5248 + break; |
| 5249 + k += 15; |
| 5250 + } |
| 5251 + } |
| 5252 + } |
| 5253 + } |
| 5254 + |
| 5255 + /* Completed MCU, so update state */ |
| 5256 + BITREAD_SAVE_STATE(cinfo,entropy->bitstate); |
| 5257 + ASSIGN_STATE(entropy->saved, state); |
| 5258 + return TRUE; |
| 5259 +} |
| 5260 + |
| 5261 + |
| 5262 +LOCAL(boolean) |
| 5263 +decode_mcu_fast (j_decompress_ptr cinfo, JBLOCKROW *MCU_data) |
| 5264 +{ |
| 5265 + huff_entropy_ptr entropy = (huff_entropy_ptr) cinfo->entropy; |
| 5266 + BITREAD_STATE_VARS; |
| 5267 + JOCTET *buffer; |
| 5268 + int blkn; |
| 5269 + savable_state state; |
| 5270 + /* Outer loop handles each block in the MCU */ |
| 5271 + |
| 5272 + /* Load up working state */ |
| 5273 + BITREAD_LOAD_STATE(cinfo,entropy->bitstate); |
| 5274 + buffer = (JOCTET *) br_state.next_input_byte; |
| 5275 + ASSIGN_STATE(state, entropy->saved); |
| 5276 + |
| 5277 + for (blkn = 0; blkn < cinfo->blocks_in_MCU; blkn++) { |
| 5278 + JBLOCKROW block = MCU_data[blkn]; |
| 5279 + d_derived_tbl * dctbl = entropy->dc_cur_tbls[blkn]; |
| 5280 + d_derived_tbl * actbl = entropy->ac_cur_tbls[blkn]; |
| 5281 + register int s, k, r, l; |
| 5282 + |
| 5283 + HUFF_DECODE_FAST(s, l, dctbl, slow_decode_mcu); |
| 5284 + if (s) { |
| 5285 + FILL_BIT_BUFFER_FAST |
| 5286 + r = GET_BITS(s); |
| 5287 + s = HUFF_EXTEND(r, s); |
| 5288 + } |
| 5289 + |
| 5290 + if (entropy->dc_needed[blkn]) { |
| 5291 + int ci = cinfo->MCU_membership[blkn]; |
| 5292 + s += state.last_dc_val[ci]; |
| 5293 + state.last_dc_val[ci] = s; |
| 5294 + if (block) |
| 5295 + (*block)[0] = (JCOEF) s; |
| 5296 + } |
| 5297 + |
| 5298 + if (entropy->ac_needed[blkn] && block) { |
| 5299 + |
| 5300 + for (k = 1; k < DCTSIZE2; k++) { |
| 5301 + HUFF_DECODE_FAST(s, l, actbl, slow_decode_mcu); |
| 5302 + r = s >> 4; |
| 5303 + s &= 15; |
| 5304 + |
| 5305 + if (s) { |
| 5306 + k += r; |
| 5307 + FILL_BIT_BUFFER_FAST |
| 5308 + r = GET_BITS(s); |
| 5309 + s = HUFF_EXTEND(r, s); |
| 5310 + (*block)[jpeg_natural_order[k]] = (JCOEF) s; |
| 5311 + } else { |
| 5312 + if (r != 15) break; |
| 5313 + k += 15; |
| 5314 + } |
| 5315 + } |
| 5316 + |
| 5317 + } else { |
| 5318 + |
| 5319 + for (k = 1; k < DCTSIZE2; k++) { |
| 5320 + HUFF_DECODE_FAST(s, l, actbl, slow_decode_mcu); |
| 5321 + r = s >> 4; |
| 5322 + s &= 15; |
| 5323 + |
| 5324 + if (s) { |
| 5325 + k += r; |
| 5326 + FILL_BIT_BUFFER_FAST |
| 5327 + DROP_BITS(s); |
| 5328 + } else { |
| 5329 + if (r != 15) break; |
| 5330 + k += 15; |
| 5331 + } |
| 5332 + } |
| 5333 + } |
| 5334 + } |
| 5335 + |
| 5336 + if (cinfo->unread_marker != 0) { |
| 5337 +slow_decode_mcu: |
| 5338 + cinfo->unread_marker = 0; |
| 5339 + return FALSE; |
| 5340 + } |
| 5341 + |
| 5342 + br_state.bytes_in_buffer -= (buffer - br_state.next_input_byte); |
| 5343 + br_state.next_input_byte = buffer; |
| 5344 + BITREAD_SAVE_STATE(cinfo,entropy->bitstate); |
| 5345 + ASSIGN_STATE(entropy->saved, state); |
| 5346 + return TRUE; |
| 5347 +} |
| 5348 + |
| 5349 + |
| 5350 /* |
| 5351 * Decode and return one MCU's worth of Huffman-compressed coefficients. |
| 5352 * The coefficients are reordered from zigzag order into natural array order, |
| 5353 @@ -513,13 +747,13 @@ |
| 5354 * this module, since we'll just re-assign them on the next call.) |
| 5355 */ |
| 5356 |
| 5357 +#define BUFSIZE (DCTSIZE2 * 2u) |
| 5358 + |
| 5359 METHODDEF(boolean) |
| 5360 decode_mcu (j_decompress_ptr cinfo, JBLOCKROW *MCU_data) |
| 5361 { |
| 5362 huff_entropy_ptr entropy = (huff_entropy_ptr) cinfo->entropy; |
| 5363 - int blkn; |
| 5364 - BITREAD_STATE_VARS; |
| 5365 - savable_state state; |
| 5366 + int usefast = 1; |
| 5367 |
| 5368 /* Process restart marker if needed; may have to suspend */ |
| 5369 if (cinfo->restart_interval) { |
| 5370 @@ -526,98 +760,26 @@ |
| 5371 if (entropy->restarts_to_go == 0) |
| 5372 if (! process_restart(cinfo)) |
| 5373 return FALSE; |
| 5374 + usefast = 0; |
| 5375 } |
| 5376 |
| 5377 + if (cinfo->src->bytes_in_buffer < BUFSIZE * (size_t)cinfo->blocks_in_MCU |
| 5378 + || cinfo->unread_marker != 0) |
| 5379 + usefast = 0; |
| 5380 + |
| 5381 /* If we've run out of data, just leave the MCU set to zeroes. |
| 5382 * This way, we return uniform gray for the remainder of the segment. |
| 5383 */ |
| 5384 if (! entropy->pub.insufficient_data) { |
| 5385 |
| 5386 - /* Load up working state */ |
| 5387 - BITREAD_LOAD_STATE(cinfo,entropy->bitstate); |
| 5388 - ASSIGN_STATE(state, entropy->saved); |
| 5389 - |
| 5390 - /* Outer loop handles each block in the MCU */ |
| 5391 - |
| 5392 - for (blkn = 0; blkn < cinfo->blocks_in_MCU; blkn++) { |
| 5393 - JBLOCKROW block = MCU_data[blkn]; |
| 5394 - d_derived_tbl * dctbl = entropy->dc_cur_tbls[blkn]; |
| 5395 - d_derived_tbl * actbl = entropy->ac_cur_tbls[blkn]; |
| 5396 - register int s, k, r; |
| 5397 - |
| 5398 - /* Decode a single block's worth of coefficients */ |
| 5399 - |
| 5400 - /* Section F.2.2.1: decode the DC coefficient difference */ |
| 5401 - HUFF_DECODE(s, br_state, dctbl, return FALSE, label1); |
| 5402 - if (s) { |
| 5403 - CHECK_BIT_BUFFER(br_state, s, return FALSE); |
| 5404 - r = GET_BITS(s); |
| 5405 - s = HUFF_EXTEND(r, s); |
| 5406 - } |
| 5407 - |
| 5408 - if (entropy->dc_needed[blkn]) { |
| 5409 - /* Convert DC difference to actual value, update last_dc_val */ |
| 5410 - int ci = cinfo->MCU_membership[blkn]; |
| 5411 - s += state.last_dc_val[ci]; |
| 5412 - state.last_dc_val[ci] = s; |
| 5413 - /* Output the DC coefficient (assumes jpeg_natural_order[0] = 0) */ |
| 5414 - (*block)[0] = (JCOEF) s; |
| 5415 - } |
| 5416 - |
| 5417 - if (entropy->ac_needed[blkn]) { |
| 5418 - |
| 5419 - /* Section F.2.2.2: decode the AC coefficients */ |
| 5420 - /* Since zeroes are skipped, output area must be cleared beforehand */ |
| 5421 - for (k = 1; k < DCTSIZE2; k++) { |
| 5422 - HUFF_DECODE(s, br_state, actbl, return FALSE, label2); |
| 5423 - |
| 5424 - r = s >> 4; |
| 5425 - s &= 15; |
| 5426 - |
| 5427 - if (s) { |
| 5428 - k += r; |
| 5429 - CHECK_BIT_BUFFER(br_state, s, return FALSE); |
| 5430 - r = GET_BITS(s); |
| 5431 - s = HUFF_EXTEND(r, s); |
| 5432 - /* Output coefficient in natural (dezigzagged) order. |
| 5433 - * Note: the extra entries in jpeg_natural_order[] will save us |
| 5434 - * if k >= DCTSIZE2, which could happen if the data is corrupted. |
| 5435 - */ |
| 5436 - (*block)[jpeg_natural_order[k]] = (JCOEF) s; |
| 5437 - } else { |
| 5438 - if (r != 15) |
| 5439 - break; |
| 5440 - k += 15; |
| 5441 - } |
| 5442 - } |
| 5443 - |
| 5444 - } else { |
| 5445 - |
| 5446 - /* Section F.2.2.2: decode the AC coefficients */ |
| 5447 - /* In this path we just discard the values */ |
| 5448 - for (k = 1; k < DCTSIZE2; k++) { |
| 5449 - HUFF_DECODE(s, br_state, actbl, return FALSE, label3); |
| 5450 - |
| 5451 - r = s >> 4; |
| 5452 - s &= 15; |
| 5453 - |
| 5454 - if (s) { |
| 5455 - k += r; |
| 5456 - CHECK_BIT_BUFFER(br_state, s, return FALSE); |
| 5457 - DROP_BITS(s); |
| 5458 - } else { |
| 5459 - if (r != 15) |
| 5460 - break; |
| 5461 - k += 15; |
| 5462 - } |
| 5463 - } |
| 5464 - |
| 5465 - } |
| 5466 + if (usefast) { |
| 5467 + if (!decode_mcu_fast(cinfo, MCU_data)) goto use_slow; |
| 5468 } |
| 5469 + else { |
| 5470 + use_slow: |
| 5471 + if (!decode_mcu_slow(cinfo, MCU_data)) return FALSE; |
| 5472 + } |
| 5473 |
| 5474 - /* Completed MCU, so update state */ |
| 5475 - BITREAD_SAVE_STATE(cinfo,entropy->bitstate); |
| 5476 - ASSIGN_STATE(entropy->saved, state); |
| 5477 } |
| 5478 |
| 5479 /* Account for restart interval (no-op if not using restarts) */ |
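Reviewer sketch (not part of the patch): the dispatch that `decode_mcu` gains in this hunk can be modelled in isolation. The fast path is attempted only when enough input is buffered for a whole MCU and no marker is pending, and a fast-path failure falls through to the slow path. Every `demo_*` name below is hypothetical, the value 128 assumes `DCTSIZE2 == 64`, and the restart-interval check from the patch is deliberately left out.

```c
#include <stddef.h>

/* Hypothetical stand-in for the decoder state that decode_mcu consults. */
typedef struct {
  size_t bytes_in_buffer;  /* bytes currently available in the input buffer */
  int unread_marker;       /* nonzero if a marker is pending */
  int blocks_in_mcu;       /* number of blocks per MCU */
  int fast_fails;          /* demo knob: force the fast path to bail out */
  int fast_calls;          /* counts fast-path attempts */
  int slow_calls;          /* counts slow-path invocations */
} demo_state;

/* Mirrors BUFSIZE (DCTSIZE2 * 2u), assuming DCTSIZE2 == 64. */
#define DEMO_BUFSIZE 128u

/* Stub fast path: succeeds unless the demo knob says otherwise. */
static int demo_decode_fast(demo_state *st)
{
  st->fast_calls++;
  return !st->fast_fails;
}

/* Stub slow path: always succeeds in this sketch. */
static int demo_decode_slow(demo_state *st)
{
  st->slow_calls++;
  return 1;
}

/* Try the fast path when it is safe; otherwise, or on failure, go slow. */
static int demo_decode_mcu(demo_state *st)
{
  int usefast = 1;

  if (st->bytes_in_buffer < DEMO_BUFSIZE * (size_t) st->blocks_in_mcu ||
      st->unread_marker != 0)
    usefast = 0;

  if (usefast && demo_decode_fast(st))
    return 1;
  return demo_decode_slow(st);
}
```

The size check is what lets a fast path skip per-read bounds checks: if at least `DEMO_BUFSIZE` bytes per block are buffered, the fast decoder is assumed never to run off the end of the input while decoding one MCU.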
| 5480 Index: jdhuff.h |
| 5481 =================================================================== |
| 5482 --- jdhuff.h (revision 829) |
| 5483 +++ jdhuff.h (working copy) |
| 5484 @@ -1,8 +1,10 @@ |
| 5485 /* |
| 5486 * jdhuff.h |
| 5487 * |
| 5488 + * This file was part of the Independent JPEG Group's software: |
| 5489 * Copyright (C) 1991-1997, Thomas G. Lane. |
| 5490 - * This file is part of the Independent JPEG Group's software. |
| 5491 + * Modifications: |
| 5492 + * Copyright (C) 2010-2011, D. R. Commander. |
| 5493 * For conditions of distribution and use, see the accompanying README file. |
| 5494 * |
| 5495 * This file contains declarations for Huffman entropy decoding routines |
| 5496 @@ -27,7 +29,7 @@ |
| 5497 /* Basic tables: (element [0] of each array is unused) */ |
| 5498 INT32 maxcode[18]; /* largest code of length k (-1 if none) */ |
| 5499 /* (maxcode[17] is a sentinel to ensure jpeg_huff_decode terminates) */ |
| 5500 - INT32 valoffset[17]; /* huffval[] offset for codes of length k */ |
| 5501 + INT32 valoffset[18]; /* huffval[] offset for codes of length k */ |
| 5502 /* valoffset[k] = huffval[] index of 1st symbol of code length k, less |
| 5503 * the smallest code of length k; so given a code of length k, the |
| 5504 * corresponding symbol is huffval[code + valoffset[k]] |
| 5505 @@ -36,13 +38,17 @@ |
| 5506 /* Link to public Huffman table (needed only in jpeg_huff_decode) */ |
| 5507 JHUFF_TBL *pub; |
| 5508 |
| 5509 - /* Lookahead tables: indexed by the next HUFF_LOOKAHEAD bits of |
| 5510 + /* Lookahead table: indexed by the next HUFF_LOOKAHEAD bits of |
| 5511 * the input data stream. If the next Huffman code is no more |
| 5512 * than HUFF_LOOKAHEAD bits long, we can obtain its length and |
| 5513 - * the corresponding symbol directly from these tables. |
| 5514 + * the corresponding symbol directly from this table. |
| 5515 + * |
| 5516 + * The lower HUFF_LOOKAHEAD bits of each table entry contain the |
| 5517 + * symbol. The bits above those contain the number of bits in |
| 5518 + * the corresponding Huffman code, or HUFF_LOOKAHEAD + 1 if too |
| 5519 + * long. |
| 5520 */ |
| 5521 - int look_nbits[1<<HUFF_LOOKAHEAD]; /* # bits, or 0 if too long */ |
| 5522 - UINT8 look_sym[1<<HUFF_LOOKAHEAD]; /* symbol, or unused */ |
| 5523 + int lookup[1<<HUFF_LOOKAHEAD]; |
| 5524 } d_derived_tbl; |
| 5525 |
| 5526 /* Expand a Huffman table definition into the derived format */ |
| 5527 @@ -69,9 +75,18 @@ |
| 5528 * necessary. |
| 5529 */ |
| 5530 |
| 5531 +#if __WORDSIZE == 64 || defined(_WIN64) |
| 5532 + |
| 5533 +typedef size_t bit_buf_type; /* type of bit-extraction buffer */ |
| 5534 +#define BIT_BUF_SIZE 64 /* size of buffer in bits */ |
| 5535 + |
| 5536 +#else |
| 5537 + |
| 5538 typedef INT32 bit_buf_type; /* type of bit-extraction buffer */ |
| 5539 -#define BIT_BUF_SIZE 32 /* size of buffer in bits */ |
| 5540 +#define BIT_BUF_SIZE 32 /* size of buffer in bits */ |
| 5541 |
| 5542 +#endif |
| 5543 + |
| 5544 /* If long is > 32 bits on your machine, and shifting/masking longs is |
| 5545 * reasonably fast, making bit_buf_type be long and setting BIT_BUF_SIZE |
| 5546 * appropriately should be a win. Unfortunately we can't define the size |
| 5547 @@ -183,11 +198,10 @@ |
| 5548 } \ |
| 5549 } \ |
| 5550 look = PEEK_BITS(HUFF_LOOKAHEAD); \ |
| 5551 - if ((nb = htbl->look_nbits[look]) != 0) { \ |
| 5552 + if ((nb = (htbl->lookup[look] >> HUFF_LOOKAHEAD)) <= HUFF_LOOKAHEAD) { \ |
| 5553 DROP_BITS(nb); \ |
| 5554 - result = htbl->look_sym[look]; \ |
| 5555 + result = htbl->lookup[look] & ((1 << HUFF_LOOKAHEAD) - 1); \ |
| 5556 } else { \ |
| 5557 - nb = HUFF_LOOKAHEAD+1; \ |
| 5558 slowlabel: \ |
| 5559 if ((result=jpeg_huff_decode(&state,get_buffer,bits_left,htbl,nb)) < 0) \ |
| 5560 { failaction; } \ |
| 5561 @@ -195,6 +209,28 @@ |
| 5562 } \ |
| 5563 } |
| 5564 |
| 5565 +#define HUFF_DECODE_FAST(s,nb,htbl,slowlabel) \ |
| 5566 + FILL_BIT_BUFFER_FAST; \ |
| 5567 + s = PEEK_BITS(HUFF_LOOKAHEAD); \ |
| 5568 + s = htbl->lookup[s]; \ |
| 5569 + nb = s >> HUFF_LOOKAHEAD; \ |
| 5570 + /* Pre-execute the common case of nb <= HUFF_LOOKAHEAD */ \ |
| 5571 + DROP_BITS(nb); \ |
| 5572 + s = s & ((1 << HUFF_LOOKAHEAD) - 1); \ |
| 5573 + if (nb > HUFF_LOOKAHEAD) { \ |
| 5574 + /* Equivalent of jpeg_huff_decode() */ \ |
| 5575 + /* Don't use GET_BITS() here because we don't want to modify bits_left */ \ |
| 5576 + s = (get_buffer >> bits_left) & ((1 << (nb)) - 1); \ |
| 5577 + while (s > htbl->maxcode[nb]) { \ |
| 5578 + s <<= 1; \ |
| 5579 + s |= GET_BITS(1); \ |
| 5580 + nb++; \ |
| 5581 + } \ |
| 5582 + if (nb > 16) \ |
| 5583 + goto slowlabel; \ |
| 5584 + s = htbl->pub->huffval[ (int) (s + htbl->valoffset[nb]) ]; \ |
| 5585 + } |
| 5586 + |
| 5587 /* Out-of-line case for Huffman code fetching */ |
| 5588 EXTERN(int) jpeg_huff_decode |
| 5589 JPP((bitread_working_state * state, register bit_buf_type get_buffer, |
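Reviewer sketch (not part of the patch): the merged `lookup[]` table replaces the separate `look_nbits[]` and `look_sym[]` arrays with one packed integer per entry. As the `HUFF_DECODE` and `HUFF_DECODE_FAST` macros read it, the code length sits above the low `HUFF_LOOKAHEAD` bits and the symbol occupies those low bits. The helpers below are hypothetical and assume `HUFF_LOOKAHEAD` is 8.

```c
#include <stdint.h>

#define LOOKAHEAD 8  /* assumed value of HUFF_LOOKAHEAD */

/* Pack a code length and a symbol into one table entry (hypothetical helper). */
static int pack_entry(int nbits, uint8_t symbol)
{
  return (nbits << LOOKAHEAD) | symbol;
}

/* Code length: the bits above the low LOOKAHEAD bits. */
static int entry_nbits(int entry)
{
  return entry >> LOOKAHEAD;
}

/* Symbol: the low LOOKAHEAD bits. */
static int entry_symbol(int entry)
{
  return entry & ((1 << LOOKAHEAD) - 1);
}

/* Nonzero when the code is longer than LOOKAHEAD bits, so the
 * bit-by-bit search against maxcode[] is needed instead. */
static int entry_needs_slow_path(int entry)
{
  return entry_nbits(entry) > LOOKAHEAD;
}
```

Storing `LOOKAHEAD + 1` as the length of a too-long code means one comparison both detects the miss and yields the starting length for the longer search, which is presumably why the patch drops the explicit `nb = HUFF_LOOKAHEAD+1` assignment from `HUFF_DECODE`.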
| 5590 Index: jdinput.c |
| 5591 =================================================================== |
| 5592 --- jdinput.c (revision 829) |
| 5593 +++ jdinput.c (working copy) |
| 5594 @@ -1,8 +1,10 @@ |
| 5595 /* |
| 5596 * jdinput.c |
| 5597 * |
| 5598 + * This file was part of the Independent JPEG Group's software: |
| 5599 * Copyright (C) 1991-1997, Thomas G. Lane. |
| 5600 - * This file is part of the Independent JPEG Group's software. |
| 5601 + * libjpeg-turbo Modifications: |
| 5602 + * Copyright (C) 2010, D. R. Commander. |
| 5603 * For conditions of distribution and use, see the accompanying README file. |
| 5604 * |
| 5605 * This file contains input control logic for the JPEG decompressor. |
| 5606 @@ -14,6 +16,7 @@ |
| 5607 #define JPEG_INTERNALS |
| 5608 #include "jinclude.h" |
| 5609 #include "jpeglib.h" |
| 5610 +#include "jpegcomp.h" |
| 5611 |
| 5612 |
| 5613 /* Private state */ |
| 5614 @@ -70,16 +73,30 @@ |
| 5615 compptr->v_samp_factor); |
| 5616 } |
| 5617 |
| 5618 +#if JPEG_LIB_VERSION >=80 |
| 5619 + cinfo->block_size = DCTSIZE; |
| 5620 + cinfo->natural_order = jpeg_natural_order; |
| 5621 + cinfo->lim_Se = DCTSIZE2-1; |
| 5622 +#endif |
| 5623 + |
| 5624 /* We initialize DCT_scaled_size and min_DCT_scaled_size to DCTSIZE. |
| 5625 * In the full decompressor, this will be overridden by jdmaster.c; |
| 5626 * but in the transcoder, jdmaster.c is not used, so we must do it here. |
| 5627 */ |
| 5628 +#if JPEG_LIB_VERSION >= 70 |
| 5629 + cinfo->min_DCT_h_scaled_size = cinfo->min_DCT_v_scaled_size = DCTSIZE; |
| 5630 +#else |
| 5631 cinfo->min_DCT_scaled_size = DCTSIZE; |
| 5632 +#endif |
| 5633 |
| 5634 /* Compute dimensions of components */ |
| 5635 for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components; |
| 5636 ci++, compptr++) { |
| 5637 +#if JPEG_LIB_VERSION >= 70 |
| 5638 + compptr->DCT_h_scaled_size = compptr->DCT_v_scaled_size = DCTSIZE; |
| 5639 +#else |
| 5640 compptr->DCT_scaled_size = DCTSIZE; |
| 5641 +#endif |
| 5642 /* Size in DCT blocks */ |
| 5643 compptr->width_in_blocks = (JDIMENSION) |
| 5644 jdiv_round_up((long) cinfo->image_width * (long) compptr->h_samp_factor, |
| 5645 @@ -138,7 +155,7 @@ |
| 5646 compptr->MCU_width = 1; |
| 5647 compptr->MCU_height = 1; |
| 5648 compptr->MCU_blocks = 1; |
| 5649 - compptr->MCU_sample_width = compptr->DCT_scaled_size; |
| 5650 + compptr->MCU_sample_width = compptr->_DCT_scaled_size; |
| 5651 compptr->last_col_width = 1; |
| 5652 /* For noninterleaved scans, it is convenient to define last_row_height |
| 5653 * as the number of block rows present in the last iMCU row. |
| 5654 @@ -174,7 +191,7 @@ |
| 5655 compptr->MCU_width = compptr->h_samp_factor; |
| 5656 compptr->MCU_height = compptr->v_samp_factor; |
| 5657 compptr->MCU_blocks = compptr->MCU_width * compptr->MCU_height; |
| 5658 - compptr->MCU_sample_width = compptr->MCU_width * compptr->DCT_scaled_size
; |
| 5659 + compptr->MCU_sample_width = compptr->MCU_width * compptr->_DCT_scaled_siz
e; |
| 5660 /* Figure number of non-dummy blocks in last MCU column & row */ |
| 5661 tmp = (int) (compptr->width_in_blocks % compptr->MCU_width); |
| 5662 if (tmp == 0) tmp = compptr->MCU_width; |
| 5663 Index: jdmainct.c |
| 5664 =================================================================== |
| 5665 --- jdmainct.c (revision 829) |
| 5666 +++ jdmainct.c (working copy) |
| 5667 @@ -1,8 +1,10 @@ |
| 5668 /* |
| 5669 * jdmainct.c |
| 5670 * |
| 5671 + * This file was part of the Independent JPEG Group's software: |
| 5672 * Copyright (C) 1994-1996, Thomas G. Lane. |
| 5673 - * This file is part of the Independent JPEG Group's software. |
| 5674 + * libjpeg-turbo Modifications: |
| 5675 + * Copyright (C) 2010, D. R. Commander. |
| 5676 * For conditions of distribution and use, see the accompanying README file. |
| 5677 * |
| 5678 * This file contains the main buffer controller for decompression. |
| 5679 @@ -13,9 +15,7 @@ |
| 5680 * supplies the equivalent of the main buffer in that case. |
| 5681 */ |
| 5682 |
| 5683 -#define JPEG_INTERNALS |
| 5684 -#include "jinclude.h" |
| 5685 -#include "jpeglib.h" |
| 5686 +#include "jdmainct.h" |
| 5687 |
| 5688 |
| 5689 /* |
| 5690 @@ -109,36 +109,6 @@ |
| 5691 */ |
| 5692 |
| 5693 |
| 5694 -/* Private buffer controller object */ |
| 5695 - |
| 5696 -typedef struct { |
| 5697 - struct jpeg_d_main_controller pub; /* public fields */ |
| 5698 - |
| 5699 - /* Pointer to allocated workspace (M or M+2 row groups). */ |
| 5700 - JSAMPARRAY buffer[MAX_COMPONENTS]; |
| 5701 - |
| 5702 - boolean buffer_full; /* Have we gotten an iMCU row from decoder? */ |
| 5703 - JDIMENSION rowgroup_ctr; /* counts row groups output to postprocessor */ |
| 5704 - |
| 5705 - /* Remaining fields are only used in the context case. */ |
| 5706 - |
| 5707 - /* These are the master pointers to the funny-order pointer lists. */ |
| 5708 - JSAMPIMAGE xbuffer[2]; /* pointers to weird pointer lists */ |
| 5709 - |
| 5710 - int whichptr; /* indicates which pointer set is now in use */ |
| 5711 - int context_state; /* process_data state machine status */ |
| 5712 - JDIMENSION rowgroups_avail; /* row groups available to postprocessor */ |
| 5713 - JDIMENSION iMCU_row_ctr; /* counts iMCU rows to detect image top/bot */ |
| 5714 -} my_main_controller; |
| 5715 - |
| 5716 -typedef my_main_controller * my_main_ptr; |
| 5717 - |
| 5718 -/* context_state values: */ |
| 5719 -#define CTX_PREPARE_FOR_IMCU 0 /* need to prepare for MCU row */ |
| 5720 -#define CTX_PROCESS_IMCU 1 /* feeding iMCU to postprocessor */ |
| 5721 -#define CTX_POSTPONED_ROW 2 /* feeding postponed row group */ |
| 5722 - |
| 5723 - |
| 5724 /* Forward declarations */ |
| 5725 METHODDEF(void) process_data_simple_main |
| 5726 JPP((j_decompress_ptr cinfo, JSAMPARRAY output_buf, |
| 5727 @@ -159,9 +129,9 @@ |
| 5728 * This is done only once, not once per pass. |
| 5729 */ |
| 5730 { |
| 5731 - my_main_ptr main = (my_main_ptr) cinfo->main; |
| 5732 + my_main_ptr main_ptr = (my_main_ptr) cinfo->main; |
| 5733 int ci, rgroup; |
| 5734 - int M = cinfo->min_DCT_scaled_size; |
| 5735 + int M = cinfo->_min_DCT_scaled_size; |
| 5736 jpeg_component_info *compptr; |
| 5737 JSAMPARRAY xbuf; |
| 5738 |
| 5739 @@ -168,15 +138,15 @@ |
| 5740 /* Get top-level space for component array pointers. |
| 5741 * We alloc both arrays with one call to save a few cycles. |
| 5742 */ |
| 5743 - main->xbuffer[0] = (JSAMPIMAGE) |
| 5744 + main_ptr->xbuffer[0] = (JSAMPIMAGE) |
| 5745 (*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_IMAGE, |
| 5746 cinfo->num_components * 2 * SIZEOF(JSAMPARRAY)); |
| 5747 - main->xbuffer[1] = main->xbuffer[0] + cinfo->num_components; |
| 5748 + main_ptr->xbuffer[1] = main_ptr->xbuffer[0] + cinfo->num_components; |
| 5749 |
| 5750 for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components; |
| 5751 ci++, compptr++) { |
| 5752 - rgroup = (compptr->v_samp_factor * compptr->DCT_scaled_size) / |
| 5753 - cinfo->min_DCT_scaled_size; /* height of a row group of component */ |
| 5754 + rgroup = (compptr->v_samp_factor * compptr->_DCT_scaled_size) / |
| 5755 + cinfo->_min_DCT_scaled_size; /* height of a row group of component */ |
| 5756 /* Get space for pointer lists --- M+4 row groups in each list. |
| 5757 * We alloc both pointer lists with one call to save a few cycles. |
| 5758 */ |
| 5759 @@ -184,9 +154,9 @@ |
| 5760 (*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_IMAGE, |
| 5761 2 * (rgroup * (M + 4)) * SIZEOF(JSAMPROW)); |
| 5762 xbuf += rgroup; /* want one row group at negative offsets */ |
| 5763 - main->xbuffer[0][ci] = xbuf; |
| 5764 + main_ptr->xbuffer[0][ci] = xbuf; |
| 5765 xbuf += rgroup * (M + 4); |
| 5766 - main->xbuffer[1][ci] = xbuf; |
| 5767 + main_ptr->xbuffer[1][ci] = xbuf; |
| 5768 } |
| 5769 } |
| 5770 |
| 5771 @@ -194,26 +164,26 @@ |
| 5772 LOCAL(void) |
| 5773 make_funny_pointers (j_decompress_ptr cinfo) |
| 5774 /* Create the funny pointer lists discussed in the comments above. |
| 5775 - * The actual workspace is already allocated (in main->buffer), |
| 5776 + * The actual workspace is already allocated (in main_ptr->buffer), |
| 5777 * and the space for the pointer lists is allocated too. |
| 5778 * This routine just fills in the curiously ordered lists. |
| 5779 * This will be repeated at the beginning of each pass. |
| 5780 */ |
| 5781 { |
| 5782 - my_main_ptr main = (my_main_ptr) cinfo->main; |
| 5783 + my_main_ptr main_ptr = (my_main_ptr) cinfo->main; |
| 5784 int ci, i, rgroup; |
| 5785 - int M = cinfo->min_DCT_scaled_size; |
| 5786 + int M = cinfo->_min_DCT_scaled_size; |
| 5787 jpeg_component_info *compptr; |
| 5788 JSAMPARRAY buf, xbuf0, xbuf1; |
| 5789 |
| 5790 for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components; |
| 5791 ci++, compptr++) { |
| 5792 - rgroup = (compptr->v_samp_factor * compptr->DCT_scaled_size) / |
| 5793 - cinfo->min_DCT_scaled_size; /* height of a row group of component */ |
| 5794 - xbuf0 = main->xbuffer[0][ci]; |
| 5795 - xbuf1 = main->xbuffer[1][ci]; |
| 5796 + rgroup = (compptr->v_samp_factor * compptr->_DCT_scaled_size) / |
| 5797 + cinfo->_min_DCT_scaled_size; /* height of a row group of component */ |
| 5798 + xbuf0 = main_ptr->xbuffer[0][ci]; |
| 5799 + xbuf1 = main_ptr->xbuffer[1][ci]; |
| 5800 /* First copy the workspace pointers as-is */ |
| 5801 - buf = main->buffer[ci]; |
| 5802 + buf = main_ptr->buffer[ci]; |
| 5803 for (i = 0; i < rgroup * (M + 2); i++) { |
| 5804 xbuf0[i] = xbuf1[i] = buf[i]; |
| 5805 } |
| 5806 @@ -235,34 +205,6 @@ |
| 5807 |
| 5808 |
| 5809 LOCAL(void) |
| 5810 -set_wraparound_pointers (j_decompress_ptr cinfo) |
| 5811 -/* Set up the "wraparound" pointers at top and bottom of the pointer lists. |
| 5812 - * This changes the pointer list state from top-of-image to the normal state. |
| 5813 - */ |
| 5814 -{ |
| 5815 - my_main_ptr main = (my_main_ptr) cinfo->main; |
| 5816 - int ci, i, rgroup; |
| 5817 - int M = cinfo->min_DCT_scaled_size; |
| 5818 - jpeg_component_info *compptr; |
| 5819 - JSAMPARRAY xbuf0, xbuf1; |
| 5820 - |
| 5821 - for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components; |
| 5822 - ci++, compptr++) { |
| 5823 - rgroup = (compptr->v_samp_factor * compptr->DCT_scaled_size) / |
| 5824 - cinfo->min_DCT_scaled_size; /* height of a row group of component */ |
| 5825 - xbuf0 = main->xbuffer[0][ci]; |
| 5826 - xbuf1 = main->xbuffer[1][ci]; |
| 5827 - for (i = 0; i < rgroup; i++) { |
| 5828 - xbuf0[i - rgroup] = xbuf0[rgroup*(M+1) + i]; |
| 5829 - xbuf1[i - rgroup] = xbuf1[rgroup*(M+1) + i]; |
| 5830 - xbuf0[rgroup*(M+2) + i] = xbuf0[i]; |
| 5831 - xbuf1[rgroup*(M+2) + i] = xbuf1[i]; |
| 5832 - } |
| 5833 - } |
| 5834 -} |
| 5835 - |
| 5836 - |
| 5837 -LOCAL(void) |
| 5838 set_bottom_pointers (j_decompress_ptr cinfo) |
| 5839 /* Change the pointer lists to duplicate the last sample row at the bottom |
| 5840 * of the image. whichptr indicates which xbuffer holds the final iMCU row. |
| 5841 @@ -269,7 +211,7 @@ |
| 5842 * Also sets rowgroups_avail to indicate number of nondummy row groups in row. |
| 5843 */ |
| 5844 { |
| 5845 - my_main_ptr main = (my_main_ptr) cinfo->main; |
| 5846 + my_main_ptr main_ptr = (my_main_ptr) cinfo->main; |
| 5847 int ci, i, rgroup, iMCUheight, rows_left; |
| 5848 jpeg_component_info *compptr; |
| 5849 JSAMPARRAY xbuf; |
| 5850 @@ -277,8 +219,8 @@ |
| 5851 for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components; |
| 5852 ci++, compptr++) { |
| 5853 /* Count sample rows in one iMCU row and in one row group */ |
| 5854 - iMCUheight = compptr->v_samp_factor * compptr->DCT_scaled_size; |
| 5855 - rgroup = iMCUheight / cinfo->min_DCT_scaled_size; |
| 5856 + iMCUheight = compptr->v_samp_factor * compptr->_DCT_scaled_size; |
| 5857 + rgroup = iMCUheight / cinfo->_min_DCT_scaled_size; |
| 5858 /* Count nondummy sample rows remaining for this component */ |
| 5859 rows_left = (int) (compptr->downsampled_height % (JDIMENSION) iMCUheight); |
| 5860 if (rows_left == 0) rows_left = iMCUheight; |
| 5861 @@ -286,12 +228,12 @@ |
| 5862 * so we need only do it once. |
| 5863 */ |
| 5864 if (ci == 0) { |
| 5865 - main->rowgroups_avail = (JDIMENSION) ((rows_left-1) / rgroup + 1); |
| 5866 + main_ptr->rowgroups_avail = (JDIMENSION) ((rows_left-1) / rgroup + 1); |
| 5867 } |
| 5868 /* Duplicate the last real sample row rgroup*2 times; this pads out the |
| 5869 * last partial rowgroup and ensures at least one full rowgroup of context. |
| 5870 */ |
| 5871 - xbuf = main->xbuffer[main->whichptr][ci]; |
| 5872 + xbuf = main_ptr->xbuffer[main_ptr->whichptr][ci]; |
| 5873 for (i = 0; i < rgroup * 2; i++) { |
| 5874 xbuf[rows_left + i] = xbuf[rows_left-1]; |
| 5875 } |
| 5876 @@ -306,27 +248,27 @@ |
| 5877 METHODDEF(void) |
| 5878 start_pass_main (j_decompress_ptr cinfo, J_BUF_MODE pass_mode) |
| 5879 { |
| 5880 - my_main_ptr main = (my_main_ptr) cinfo->main; |
| 5881 + my_main_ptr main_ptr = (my_main_ptr) cinfo->main; |
| 5882 |
| 5883 switch (pass_mode) { |
| 5884 case JBUF_PASS_THRU: |
| 5885 if (cinfo->upsample->need_context_rows) { |
| 5886 - main->pub.process_data = process_data_context_main; |
| 5887 + main_ptr->pub.process_data = process_data_context_main; |
| 5888 make_funny_pointers(cinfo); /* Create the xbuffer[] lists */ |
| 5889 - main->whichptr = 0; /* Read first iMCU row into xbuffer[0] */ |
| 5890 - main->context_state = CTX_PREPARE_FOR_IMCU; |
| 5891 - main->iMCU_row_ctr = 0; |
| 5892 + main_ptr->whichptr = 0; /* Read first iMCU row into xbuffer[0] */ |
| 5893 + main_ptr->context_state = CTX_PREPARE_FOR_IMCU; |
| 5894 + main_ptr->iMCU_row_ctr = 0; |
| 5895 } else { |
| 5896 /* Simple case with no context needed */ |
| 5897 - main->pub.process_data = process_data_simple_main; |
| 5898 + main_ptr->pub.process_data = process_data_simple_main; |
| 5899 } |
| 5900 - main->buffer_full = FALSE; /* Mark buffer empty */ |
| 5901 - main->rowgroup_ctr = 0; |
| 5902 + main_ptr->buffer_full = FALSE; /* Mark buffer empty */ |
| 5903 + main_ptr->rowgroup_ctr = 0; |
| 5904 break; |
| 5905 #ifdef QUANT_2PASS_SUPPORTED |
| 5906 case JBUF_CRANK_DEST: |
| 5907 /* For last pass of 2-pass quantization, just crank the postprocessor */ |
| 5908 - main->pub.process_data = process_data_crank_post; |
| 5909 + main_ptr->pub.process_data = process_data_crank_post; |
| 5910 break; |
| 5911 #endif |
| 5912 default: |
| 5913 @@ -346,18 +288,18 @@ |
| 5914 JSAMPARRAY output_buf, JDIMENSION *out_row_ctr, |
| 5915 JDIMENSION out_rows_avail) |
| 5916 { |
| 5917 - my_main_ptr main = (my_main_ptr) cinfo->main; |
| 5918 + my_main_ptr main_ptr = (my_main_ptr) cinfo->main; |
| 5919 JDIMENSION rowgroups_avail; |
| 5920 |
| 5921 /* Read input data if we haven't filled the main buffer yet */ |
| 5922 - if (! main->buffer_full) { |
| 5923 - if (! (*cinfo->coef->decompress_data) (cinfo, main->buffer)) |
| 5924 + if (! main_ptr->buffer_full) { |
| 5925 + if (! (*cinfo->coef->decompress_data) (cinfo, main_ptr->buffer)) |
| 5926 return; /* suspension forced, can do nothing more */ |
| 5927 - main->buffer_full = TRUE; /* OK, we have an iMCU row to work with */ |
| 5928 + main_ptr->buffer_full = TRUE; /* OK, we have an iMCU row to work with */ |
| 5929 } |
| 5930 |
| 5931 /* There are always min_DCT_scaled_size row groups in an iMCU row. */ |
| 5932 - rowgroups_avail = (JDIMENSION) cinfo->min_DCT_scaled_size; |
| 5933 + rowgroups_avail = (JDIMENSION) cinfo->_min_DCT_scaled_size; |
| 5934 /* Note: at the bottom of the image, we may pass extra garbage row groups |
| 5935 * to the postprocessor. The postprocessor has to check for bottom |
| 5936 * of image anyway (at row resolution), so no point in us doing it too. |
| 5937 @@ -364,14 +306,14 @@ |
| 5938 */ |
| 5939 |
| 5940 /* Feed the postprocessor */ |
| 5941 - (*cinfo->post->post_process_data) (cinfo, main->buffer, |
| 5942 - &main->rowgroup_ctr, rowgroups_avail, |
| 5943 + (*cinfo->post->post_process_data) (cinfo, main_ptr->buffer, |
| 5944 + &main_ptr->rowgroup_ctr, rowgroups_avail, |
| 5945 output_buf, out_row_ctr, out_rows_avail); |
| 5946 |
| 5947 /* Has postprocessor consumed all the data yet? If so, mark buffer empty */ |
| 5948 - if (main->rowgroup_ctr >= rowgroups_avail) { |
| 5949 - main->buffer_full = FALSE; |
| 5950 - main->rowgroup_ctr = 0; |
| 5951 + if (main_ptr->rowgroup_ctr >= rowgroups_avail) { |
| 5952 + main_ptr->buffer_full = FALSE; |
| 5953 + main_ptr->rowgroup_ctr = 0; |
| 5954 } |
| 5955 } |
| 5956 |
| 5957 @@ -386,15 +328,15 @@ |
| 5958 JSAMPARRAY output_buf, JDIMENSION *out_row_ctr, |
| 5959 JDIMENSION out_rows_avail) |
| 5960 { |
| 5961 - my_main_ptr main = (my_main_ptr) cinfo->main; |
| 5962 + my_main_ptr main_ptr = (my_main_ptr) cinfo->main; |
| 5963 |
| 5964 /* Read input data if we haven't filled the main buffer yet */ |
| 5965 - if (! main->buffer_full) { |
| 5966 + if (! main_ptr->buffer_full) { |
| 5967 if (! (*cinfo->coef->decompress_data) (cinfo, |
| 5968 - main->xbuffer[main->whichptr])) |
| 5969 + main_ptr->xbuffer[main_ptr->whichptr])) |
| 5970 return; /* suspension forced, can do nothing more */ |
| 5971 - main->buffer_full = TRUE; /* OK, we have an iMCU row to work with */ |
| 5972 - main->iMCU_row_ctr++; /* count rows received */ |
| 5973 + main_ptr->buffer_full = TRUE; /* OK, we have an iMCU row to work with */ |
| 5974 + main_ptr->iMCU_row_ctr++; /* count rows received */ |
| 5975 } |
| 5976 |
| 5977 /* Postprocessor typically will not swallow all the input data it is handed |
| 5978 @@ -402,47 +344,47 @@ |
| 5979 * to exit and restart. This switch lets us keep track of how far we got. |
| 5980 * Note that each case falls through to the next on successful completion. |
| 5981 */ |
| 5982 - switch (main->context_state) { |
| 5983 + switch (main_ptr->context_state) { |
| 5984 case CTX_POSTPONED_ROW: |
| 5985 /* Call postprocessor using previously set pointers for postponed row */ |
| 5986 - (*cinfo->post->post_process_data) (cinfo, main->xbuffer[main->whichptr], |
| 5987 - &main->rowgroup_ctr, main->rowgroups_avail, |
| 5988 + (*cinfo->post->post_process_data) (cinfo, main_ptr->xbuffer[main_ptr->whichptr], |
| 5989 + &main_ptr->rowgroup_ctr, main_ptr->rowgroups_avail, |
| 5990 output_buf, out_row_ctr, out_rows_avail); |
| 5991 - if (main->rowgroup_ctr < main->rowgroups_avail) |
| 5992 + if (main_ptr->rowgroup_ctr < main_ptr->rowgroups_avail) |
| 5993 return; /* Need to suspend */ |
| 5994 - main->context_state = CTX_PREPARE_FOR_IMCU; |
| 5995 + main_ptr->context_state = CTX_PREPARE_FOR_IMCU; |
| 5996 if (*out_row_ctr >= out_rows_avail) |
| 5997 return; /* Postprocessor exactly filled output buf */ |
| 5998 /*FALLTHROUGH*/ |
| 5999 case CTX_PREPARE_FOR_IMCU: |
| 6000 /* Prepare to process first M-1 row groups of this iMCU row */ |
| 6001 - main->rowgroup_ctr = 0; |
| 6002 - main->rowgroups_avail = (JDIMENSION) (cinfo->min_DCT_scaled_size - 1); |
| 6003 + main_ptr->rowgroup_ctr = 0; |
| 6004 + main_ptr->rowgroups_avail = (JDIMENSION) (cinfo->_min_DCT_scaled_size - 1); |
| 6005 /* Check for bottom of image: if so, tweak pointers to "duplicate" |
| 6006 * the last sample row, and adjust rowgroups_avail to ignore padding rows. |
| 6007 */ |
| 6008 - if (main->iMCU_row_ctr == cinfo->total_iMCU_rows) |
| 6009 + if (main_ptr->iMCU_row_ctr == cinfo->total_iMCU_rows) |
| 6010 set_bottom_pointers(cinfo); |
| 6011 - main->context_state = CTX_PROCESS_IMCU; |
| 6012 + main_ptr->context_state = CTX_PROCESS_IMCU; |
| 6013 /*FALLTHROUGH*/ |
| 6014 case CTX_PROCESS_IMCU: |
| 6015 /* Call postprocessor using previously set pointers */ |
| 6016 - (*cinfo->post->post_process_data) (cinfo, main->xbuffer[main->whichptr], |
| 6017 - &main->rowgroup_ctr, main->rowgroups_avail, |
| 6018 + (*cinfo->post->post_process_data) (cinfo, main_ptr->xbuffer[main_ptr->whichptr], |
| 6019 + &main_ptr->rowgroup_ctr, main_ptr->rowgroups_avail, |
| 6020 output_buf, out_row_ctr, out_rows_avail); |
| 6021 - if (main->rowgroup_ctr < main->rowgroups_avail) |
| 6022 + if (main_ptr->rowgroup_ctr < main_ptr->rowgroups_avail) |
| 6023 return; /* Need to suspend */ |
| 6024 /* After the first iMCU, change wraparound pointers to normal state */ |
| 6025 - if (main->iMCU_row_ctr == 1) |
| 6026 + if (main_ptr->iMCU_row_ctr == 1) |
| 6027 set_wraparound_pointers(cinfo); |
| 6028 /* Prepare to load new iMCU row using other xbuffer list */ |
| 6029 - main->whichptr ^= 1; /* 0=>1 or 1=>0 */ |
| 6030 - main->buffer_full = FALSE; |
| 6031 + main_ptr->whichptr ^= 1; /* 0=>1 or 1=>0 */ |
| 6032 + main_ptr->buffer_full = FALSE; |
| 6033 /* Still need to process last row group of this iMCU row, */ |
| 6034 /* which is saved at index M+1 of the other xbuffer */ |
| 6035 - main->rowgroup_ctr = (JDIMENSION) (cinfo->min_DCT_scaled_size + 1); |
| 6036 - main->rowgroups_avail = (JDIMENSION) (cinfo->min_DCT_scaled_size + 2); |
| 6037 - main->context_state = CTX_POSTPONED_ROW; |
| 6038 + main_ptr->rowgroup_ctr = (JDIMENSION) (cinfo->_min_DCT_scaled_size + 1); |
| 6039 + main_ptr->rowgroups_avail = (JDIMENSION) (cinfo->_min_DCT_scaled_size + 2); |
| 6040 + main_ptr->context_state = CTX_POSTPONED_ROW; |
| 6041 } |
| 6042 } |
| 6043 |
| 6044 @@ -475,15 +417,15 @@ |
| 6045 GLOBAL(void) |
| 6046 jinit_d_main_controller (j_decompress_ptr cinfo, boolean need_full_buffer) |
| 6047 { |
| 6048 - my_main_ptr main; |
| 6049 + my_main_ptr main_ptr; |
| 6050 int ci, rgroup, ngroups; |
| 6051 jpeg_component_info *compptr; |
| 6052 |
| 6053 - main = (my_main_ptr) |
| 6054 + main_ptr = (my_main_ptr) |
| 6055 (*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_IMAGE, |
| 6056 SIZEOF(my_main_controller)); |
| 6057 - cinfo->main = (struct jpeg_d_main_controller *) main; |
| 6058 - main->pub.start_pass = start_pass_main; |
| 6059 + cinfo->main = (struct jpeg_d_main_controller *) main_ptr; |
| 6060 + main_ptr->pub.start_pass = start_pass_main; |
| 6061 |
| 6062 if (need_full_buffer) /* shouldn't happen */ |
| 6063 ERREXIT(cinfo, JERR_BAD_BUFFER_MODE); |
| 6064 @@ -492,21 +434,21 @@ |
| 6065 * ngroups is the number of row groups we need. |
| 6066 */ |
| 6067 if (cinfo->upsample->need_context_rows) { |
| 6068 - if (cinfo->min_DCT_scaled_size < 2) /* unsupported, see comments above */ |
| 6069 + if (cinfo->_min_DCT_scaled_size < 2) /* unsupported, see comments above */ |
| 6070 ERREXIT(cinfo, JERR_NOTIMPL); |
| 6071 alloc_funny_pointers(cinfo); /* Alloc space for xbuffer[] lists */ |
| 6072 - ngroups = cinfo->min_DCT_scaled_size + 2; |
| 6073 + ngroups = cinfo->_min_DCT_scaled_size + 2; |
| 6074 } else { |
| 6075 - ngroups = cinfo->min_DCT_scaled_size; |
| 6076 + ngroups = cinfo->_min_DCT_scaled_size; |
| 6077 } |
| 6078 |
| 6079 for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components; |
| 6080 ci++, compptr++) { |
| 6081 - rgroup = (compptr->v_samp_factor * compptr->DCT_scaled_size) / |
| 6082 - cinfo->min_DCT_scaled_size; /* height of a row group of component */ |
| 6083 - main->buffer[ci] = (*cinfo->mem->alloc_sarray) |
| 6084 + rgroup = (compptr->v_samp_factor * compptr->_DCT_scaled_size) / |
| 6085 + cinfo->_min_DCT_scaled_size; /* height of a row group of component */ |
| 6086 + main_ptr->buffer[ci] = (*cinfo->mem->alloc_sarray) |
| 6087 ((j_common_ptr) cinfo, JPOOL_IMAGE, |
| 6088 - compptr->width_in_blocks * compptr->DCT_scaled_size, |
| 6089 + compptr->width_in_blocks * compptr->_DCT_scaled_size, |
| 6090 (JDIMENSION) (rgroup * ngroups)); |
| 6091 } |
| 6092 } |
1 Index: jdmarker.c | 6093 Index: jdmarker.c |
2 =================================================================== | 6094 =================================================================== |
3 --- jdmarker.c (revision 829) | 6095 --- jdmarker.c (revision 829) |
4 +++ jdmarker.c (working copy) | 6096 +++ jdmarker.c (working copy) |
5 @@ -910,7 +910,7 @@ | 6097 @@ -1,8 +1,10 @@ |
| 6098 /* |
| 6099 * jdmarker.c |
| 6100 * |
| 6101 + * This file was part of the Independent JPEG Group's software: |
| 6102 * Copyright (C) 1991-1998, Thomas G. Lane. |
| 6103 - * This file is part of the Independent JPEG Group's software. |
| 6104 + * libjpeg-turbo Modifications: |
| 6105 + * Copyright (C) 2012, D. R. Commander. |
| 6106 * For conditions of distribution and use, see the accompanying README file. |
| 6107 * |
| 6108 * This file contains routines to decode JPEG datastream markers. |
| 6109 @@ -302,7 +304,7 @@ |
| 6110 /* Process a SOS marker */ |
| 6111 { |
| 6112 INT32 length; |
| 6113 - int i, ci, n, c, cc; |
| 6114 + int i, ci, n, c, cc, pi; |
| 6115 jpeg_component_info * compptr; |
| 6116 INPUT_VARS(cinfo); |
| 6117 |
| 6118 @@ -322,13 +324,17 @@ |
| 6119 |
| 6120 /* Collect the component-spec parameters */ |
| 6121 |
| 6122 + for (i = 0; i < MAX_COMPS_IN_SCAN; i++) |
| 6123 + cinfo->cur_comp_info[i] = NULL; |
| 6124 + |
| 6125 for (i = 0; i < n; i++) { |
| 6126 INPUT_BYTE(cinfo, cc, return FALSE); |
| 6127 INPUT_BYTE(cinfo, c, return FALSE); |
| 6128 |
| 6129 - for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components; |
| 6130 + for (ci = 0, compptr = cinfo->comp_info; |
| 6131 +» ci < cinfo->num_components && ci < MAX_COMPS_IN_SCAN; |
| 6132 » ci++, compptr++) { |
| 6133 - if (cc == compptr->component_id) |
| 6134 + if (cc == compptr->component_id && !cinfo->cur_comp_info[ci]) |
| 6135 » goto id_found; |
| 6136 } |
| 6137 |
| 6138 @@ -342,6 +348,13 @@ |
| 6139 |
| 6140 TRACEMS3(cinfo, 1, JTRC_SOS_COMPONENT, cc, |
| 6141 » compptr->dc_tbl_no, compptr->ac_tbl_no); |
| 6142 + |
| 6143 + /* This CSi (cc) should differ from the previous CSi */ |
| 6144 + for (pi = 0; pi < i; pi++) { |
| 6145 + if (cinfo->cur_comp_info[pi] == compptr) { |
| 6146 + ERREXIT1(cinfo, JERR_BAD_COMPONENT_ID, cc); |
| 6147 + } |
| 6148 + } |
| 6149 } |
| 6150 |
| 6151 /* Collect the additional scan parameters Ss, Se, Ah/Al. */ |
| 6152 @@ -459,18 +472,21 @@ |
| 6153 for (i = 0; i < count; i++) |
| 6154 INPUT_BYTE(cinfo, huffval[i], return FALSE); |
| 6155 |
| 6156 + MEMZERO(&huffval[count], (256 - count) * SIZEOF(UINT8)); |
| 6157 + |
| 6158 length -= count; |
| 6159 |
| 6160 if (index & 0x10) {» » /* AC table definition */ |
| 6161 index -= 0x10; |
| 6162 + if (index < 0 || index >= NUM_HUFF_TBLS) |
| 6163 + ERREXIT1(cinfo, JERR_DHT_INDEX, index); |
| 6164 htblptr = &cinfo->ac_huff_tbl_ptrs[index]; |
| 6165 } else {» » » /* DC table definition */ |
| 6166 + if (index < 0 || index >= NUM_HUFF_TBLS) |
| 6167 + ERREXIT1(cinfo, JERR_DHT_INDEX, index); |
| 6168 htblptr = &cinfo->dc_huff_tbl_ptrs[index]; |
| 6169 } |
| 6170 |
| 6171 - if (index < 0 || index >= NUM_HUFF_TBLS) |
| 6172 - ERREXIT1(cinfo, JERR_DHT_INDEX, index); |
| 6173 - |
| 6174 if (*htblptr == NULL) |
| 6175 *htblptr = jpeg_alloc_huff_table((j_common_ptr) cinfo); |
| 6176 |
| 6177 @@ -906,7 +922,7 @@ |
6 } | 6178 } |
7 | 6179 |
8 if (cinfo->marker->discarded_bytes != 0) { | 6180 if (cinfo->marker->discarded_bytes != 0) { |
9 - WARNMS2(cinfo, JWRN_EXTRANEOUS_DATA, cinfo->marker->discarded_bytes, c); | 6181 - WARNMS2(cinfo, JWRN_EXTRANEOUS_DATA, cinfo->marker->discarded_bytes, c); |
 10 + TRACEMS2(cinfo, 1, JWRN_EXTRANEOUS_DATA, cinfo->marker->discarded_bytes, c); | 6182 + TRACEMS2(cinfo, 1, JWRN_EXTRANEOUS_DATA, cinfo->marker->discarded_bytes, c); |
11 cinfo->marker->discarded_bytes = 0; | 6183 cinfo->marker->discarded_bytes = 0; |
12 } | 6184 } |
13 | 6185 |
14 @@ -944,7 +944,144 @@ | 6186 @@ -940,7 +956,144 @@ |
15 return TRUE; | 6187 return TRUE; |
16 } | 6188 } |
17 | 6189 |
18 +#ifdef MOTION_JPEG_SUPPORTED | 6190 +#ifdef MOTION_JPEG_SUPPORTED |
19 | 6191 |
20 +/* The default Huffman tables used by motion JPEG frames. When a motion JPEG | 6192 +/* The default Huffman tables used by motion JPEG frames. When a motion JPEG |
 21 + * frame does not have DHT tables, we should use the huffman tables suggested by | 6193 + * frame does not have DHT tables, we should use the huffman tables suggested by |
 22 + * the JPEG standard. Each of these tables represents a member of the JHUFF_TBLS | 6194 + * the JPEG standard. Each of these tables represents a member of the JHUFF_TBLS |
23 + * struct so we can just copy it to the according JHUFF_TBLS member. | 6195 + * struct so we can just copy it to the according JHUFF_TBLS member. |
24 + */ | 6196 + */ |
 (...skipping 124 matching lines...) |
149 +#else | 6321 +#else |
150 + | 6322 + |
151 +#define mjpg_load_huff_tables(cinfo) | 6323 +#define mjpg_load_huff_tables(cinfo) |
152 + | 6324 + |
153 +#endif /* MOTION_JPEG_SUPPORTED */ | 6325 +#endif /* MOTION_JPEG_SUPPORTED */ |
154 + | 6326 + |
155 + | 6327 + |
156 /* | 6328 /* |
157 * Read markers until SOS or EOI. | 6329 * Read markers until SOS or EOI. |
158 * | 6330 * |
159 @@ -1013,6 +1150,7 @@ | 6331 @@ -1009,6 +1162,7 @@ |
160 break; | 6332 break; |
161 | 6333 |
162 case M_SOS: | 6334 case M_SOS: |
163 + mjpg_load_huff_tables(cinfo); | 6335 + mjpg_load_huff_tables(cinfo); |
164 if (! get_sos(cinfo)) | 6336 if (! get_sos(cinfo)) |
165 return JPEG_SUSPENDED; | 6337 return JPEG_SUSPENDED; |
166 cinfo->unread_marker = 0; /* processed the marker */ | 6338 cinfo->unread_marker = 0; /* processed the marker */ |
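
The DHT hunk in this file moves the range check ahead of the point where the table-slot address is formed, so an out-of-range index is rejected before any pointer into the table array exists. A minimal sketch of that ordering, assuming simplified stand-in types (`huff_tbl`, `decoder`, `select_slot`, and `NUM_TBLS` are hypothetical names, not libjpeg identifiers; the real code calls `ERREXIT1` instead of returning NULL):

```c
#include <stddef.h>

#define NUM_TBLS 4                 /* stand-in for NUM_HUFF_TBLS */

typedef struct { int unused; } huff_tbl;

typedef struct {
  huff_tbl *dc[NUM_TBLS];
  huff_tbl *ac[NUM_TBLS];
} decoder;

/* Map a DHT table-class/index byte to the slot it selects.
 * Bit 0x10 selects the AC tables; the remaining value is the slot number.
 * The range check runs before the slot address is computed. */
static huff_tbl **select_slot(decoder *d, int index)
{
  int is_ac = (index & 0x10) != 0;

  if (is_ac)
    index -= 0x10;
  if (index < 0 || index >= NUM_TBLS)
    return NULL;                   /* reject before touching the arrays */
  return is_ac ? &d->ac[index] : &d->dc[index];
}
```

With the check placed first, a byte such as 0x14 never produces an address one past the last valid slot.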
| 6339 Index: jdmaster.c |
| 6340 =================================================================== |
| 6341 --- jdmaster.c (revision 829) |
| 6342 +++ jdmaster.c (working copy) |
| 6343 @@ -1,9 +1,11 @@ |
| 6344 /* |
| 6345 * jdmaster.c |
| 6346 * |
| 6347 + * This file was part of the Independent JPEG Group's software: |
| 6348 * Copyright (C) 1991-1997, Thomas G. Lane. |
| 6349 - * Copyright (C) 2009, D. R. Commander. |
| 6350 - * This file is part of the Independent JPEG Group's software. |
| 6351 + * Modified 2002-2009 by Guido Vollbeding. |
| 6352 + * libjpeg-turbo Modifications: |
| 6353 + * Copyright (C) 2009-2011, D. R. Commander. |
| 6354 * For conditions of distribution and use, see the accompanying README file. |
| 6355 * |
| 6356 * This file contains master control logic for the JPEG decompressor. |
| 6357 @@ -15,6 +17,7 @@ |
| 6358 #define JPEG_INTERNALS |
| 6359 #include "jinclude.h" |
| 6360 #include "jpeglib.h" |
| 6361 +#include "jpegcomp.h" |
| 6362 |
| 6363 |
| 6364 /* Private state */ |
| 6365 @@ -56,7 +59,11 @@ |
| 6366 cinfo->out_color_space != JCS_EXT_BGR && |
| 6367 cinfo->out_color_space != JCS_EXT_BGRX && |
| 6368 cinfo->out_color_space != JCS_EXT_XBGR && |
| 6369 - cinfo->out_color_space != JCS_EXT_XRGB) || |
| 6370 + cinfo->out_color_space != JCS_EXT_XRGB && |
| 6371 + cinfo->out_color_space != JCS_EXT_RGBA && |
| 6372 + cinfo->out_color_space != JCS_EXT_BGRA && |
| 6373 + cinfo->out_color_space != JCS_EXT_ABGR && |
| 6374 + cinfo->out_color_space != JCS_EXT_ARGB) || |
| 6375 cinfo->out_color_components != rgb_pixelsize[cinfo->out_color_space]) |
| 6376 return FALSE; |
| 6377 /* and it only handles 2h1v or 2h2v sampling ratios */ |
| 6378 @@ -68,9 +75,9 @@ |
| 6379 cinfo->comp_info[2].v_samp_factor != 1) |
| 6380 return FALSE; |
| 6381 /* furthermore, it doesn't work if we've scaled the IDCTs differently */ |
| 6382 - if (cinfo->comp_info[0].DCT_scaled_size != cinfo->min_DCT_scaled_size || |
| 6383 - cinfo->comp_info[1].DCT_scaled_size != cinfo->min_DCT_scaled_size || |
| 6384 - cinfo->comp_info[2].DCT_scaled_size != cinfo->min_DCT_scaled_size) |
| 6385 + if (cinfo->comp_info[0]._DCT_scaled_size != cinfo->_min_DCT_scaled_size || |
| 6386 + cinfo->comp_info[1]._DCT_scaled_size != cinfo->_min_DCT_scaled_size || |
| 6387 + cinfo->comp_info[2]._DCT_scaled_size != cinfo->_min_DCT_scaled_size) |
| 6388 return FALSE; |
| 6389 /* ??? also need to test for upsample-time rescaling, when & if supported */ |
| 6390 return TRUE; /* by golly, it'll work... */ |
| 6391 @@ -84,6 +91,177 @@ |
| 6392 * Compute output image dimensions and related values. |
| 6393 * NOTE: this is exported for possible use by application. |
| 6394 * Hence it mustn't do anything that can't be done twice. |
| 6395 + */ |
| 6396 + |
| 6397 +#if JPEG_LIB_VERSION >= 80 |
| 6398 +GLOBAL(void) |
| 6399 +#else |
| 6400 +LOCAL(void) |
| 6401 +#endif |
| 6402 +jpeg_core_output_dimensions (j_decompress_ptr cinfo) |
| 6403 +/* Do computations that are needed before master selection phase. |
| 6404 + * This function is used for transcoding and full decompression. |
| 6405 + */ |
| 6406 +{ |
| 6407 +#ifdef IDCT_SCALING_SUPPORTED |
| 6408 + int ci; |
| 6409 + jpeg_component_info *compptr; |
| 6410 + |
| 6411 + /* Compute actual output image dimensions and DCT scaling choices. */ |
| 6412 + if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom) { |
| 6413 + /* Provide 1/block_size scaling */ |
| 6414 + cinfo->output_width = (JDIMENSION) |
| 6415 + jdiv_round_up((long) cinfo->image_width, (long) DCTSIZE); |
| 6416 + cinfo->output_height = (JDIMENSION) |
| 6417 + jdiv_round_up((long) cinfo->image_height, (long) DCTSIZE); |
| 6418 + cinfo->_min_DCT_h_scaled_size = 1; |
| 6419 + cinfo->_min_DCT_v_scaled_size = 1; |
| 6420 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 2) { |
| 6421 + /* Provide 2/block_size scaling */ |
| 6422 + cinfo->output_width = (JDIMENSION) |
| 6423 + jdiv_round_up((long) cinfo->image_width * 2L, (long) DCTSIZE); |
| 6424 + cinfo->output_height = (JDIMENSION) |
| 6425 + jdiv_round_up((long) cinfo->image_height * 2L, (long) DCTSIZE); |
| 6426 + cinfo->_min_DCT_h_scaled_size = 2; |
| 6427 + cinfo->_min_DCT_v_scaled_size = 2; |
| 6428 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 3) { |
| 6429 + /* Provide 3/block_size scaling */ |
| 6430 + cinfo->output_width = (JDIMENSION) |
| 6431 + jdiv_round_up((long) cinfo->image_width * 3L, (long) DCTSIZE); |
| 6432 + cinfo->output_height = (JDIMENSION) |
| 6433 + jdiv_round_up((long) cinfo->image_height * 3L, (long) DCTSIZE); |
| 6434 + cinfo->_min_DCT_h_scaled_size = 3; |
| 6435 + cinfo->_min_DCT_v_scaled_size = 3; |
| 6436 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 4) { |
| 6437 + /* Provide 4/block_size scaling */ |
| 6438 + cinfo->output_width = (JDIMENSION) |
| 6439 + jdiv_round_up((long) cinfo->image_width * 4L, (long) DCTSIZE); |
| 6440 + cinfo->output_height = (JDIMENSION) |
| 6441 + jdiv_round_up((long) cinfo->image_height * 4L, (long) DCTSIZE); |
| 6442 + cinfo->_min_DCT_h_scaled_size = 4; |
| 6443 + cinfo->_min_DCT_v_scaled_size = 4; |
| 6444 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 5) { |
| 6445 + /* Provide 5/block_size scaling */ |
| 6446 + cinfo->output_width = (JDIMENSION) |
| 6447 + jdiv_round_up((long) cinfo->image_width * 5L, (long) DCTSIZE); |
| 6448 + cinfo->output_height = (JDIMENSION) |
| 6449 + jdiv_round_up((long) cinfo->image_height * 5L, (long) DCTSIZE); |
| 6450 + cinfo->_min_DCT_h_scaled_size = 5; |
| 6451 + cinfo->_min_DCT_v_scaled_size = 5; |
| 6452 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 6) { |
| 6453 + /* Provide 6/block_size scaling */ |
| 6454 + cinfo->output_width = (JDIMENSION) |
| 6455 + jdiv_round_up((long) cinfo->image_width * 6L, (long) DCTSIZE); |
| 6456 + cinfo->output_height = (JDIMENSION) |
| 6457 + jdiv_round_up((long) cinfo->image_height * 6L, (long) DCTSIZE); |
| 6458 + cinfo->_min_DCT_h_scaled_size = 6; |
| 6459 + cinfo->_min_DCT_v_scaled_size = 6; |
| 6460 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 7) { |
| 6461 + /* Provide 7/block_size scaling */ |
| 6462 + cinfo->output_width = (JDIMENSION) |
| 6463 + jdiv_round_up((long) cinfo->image_width * 7L, (long) DCTSIZE); |
| 6464 + cinfo->output_height = (JDIMENSION) |
| 6465 + jdiv_round_up((long) cinfo->image_height * 7L, (long) DCTSIZE); |
| 6466 + cinfo->_min_DCT_h_scaled_size = 7; |
| 6467 + cinfo->_min_DCT_v_scaled_size = 7; |
| 6468 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 8) { |
| 6469 + /* Provide 8/block_size scaling */ |
| 6470 + cinfo->output_width = (JDIMENSION) |
| 6471 + jdiv_round_up((long) cinfo->image_width * 8L, (long) DCTSIZE); |
| 6472 + cinfo->output_height = (JDIMENSION) |
| 6473 + jdiv_round_up((long) cinfo->image_height * 8L, (long) DCTSIZE); |
| 6474 + cinfo->_min_DCT_h_scaled_size = 8; |
| 6475 + cinfo->_min_DCT_v_scaled_size = 8; |
| 6476 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 9) { |
| 6477 + /* Provide 9/block_size scaling */ |
| 6478 + cinfo->output_width = (JDIMENSION) |
| 6479 + jdiv_round_up((long) cinfo->image_width * 9L, (long) DCTSIZE); |
| 6480 + cinfo->output_height = (JDIMENSION) |
| 6481 + jdiv_round_up((long) cinfo->image_height * 9L, (long) DCTSIZE); |
| 6482 + cinfo->_min_DCT_h_scaled_size = 9; |
| 6483 + cinfo->_min_DCT_v_scaled_size = 9; |
| 6484 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 10) { |
| 6485 + /* Provide 10/block_size scaling */ |
| 6486 + cinfo->output_width = (JDIMENSION) |
| 6487 + jdiv_round_up((long) cinfo->image_width * 10L, (long) DCTSIZE); |
| 6488 + cinfo->output_height = (JDIMENSION) |
| 6489 + jdiv_round_up((long) cinfo->image_height * 10L, (long) DCTSIZE); |
| 6490 + cinfo->_min_DCT_h_scaled_size = 10; |
| 6491 + cinfo->_min_DCT_v_scaled_size = 10; |
| 6492 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 11) { |
| 6493 + /* Provide 11/block_size scaling */ |
| 6494 + cinfo->output_width = (JDIMENSION) |
| 6495 + jdiv_round_up((long) cinfo->image_width * 11L, (long) DCTSIZE); |
| 6496 + cinfo->output_height = (JDIMENSION) |
| 6497 + jdiv_round_up((long) cinfo->image_height * 11L, (long) DCTSIZE); |
| 6498 + cinfo->_min_DCT_h_scaled_size = 11; |
| 6499 + cinfo->_min_DCT_v_scaled_size = 11; |
| 6500 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 12) { |
| 6501 + /* Provide 12/block_size scaling */ |
| 6502 + cinfo->output_width = (JDIMENSION) |
| 6503 + jdiv_round_up((long) cinfo->image_width * 12L, (long) DCTSIZE); |
| 6504 + cinfo->output_height = (JDIMENSION) |
| 6505 + jdiv_round_up((long) cinfo->image_height * 12L, (long) DCTSIZE); |
| 6506 + cinfo->_min_DCT_h_scaled_size = 12; |
| 6507 + cinfo->_min_DCT_v_scaled_size = 12; |
| 6508 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 13) { |
| 6509 + /* Provide 13/block_size scaling */ |
| 6510 + cinfo->output_width = (JDIMENSION) |
| 6511 + jdiv_round_up((long) cinfo->image_width * 13L, (long) DCTSIZE); |
| 6512 + cinfo->output_height = (JDIMENSION) |
| 6513 + jdiv_round_up((long) cinfo->image_height * 13L, (long) DCTSIZE); |
| 6514 + cinfo->_min_DCT_h_scaled_size = 13; |
| 6515 + cinfo->_min_DCT_v_scaled_size = 13; |
| 6516 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 14) { |
| 6517 + /* Provide 14/block_size scaling */ |
| 6518 + cinfo->output_width = (JDIMENSION) |
| 6519 + jdiv_round_up((long) cinfo->image_width * 14L, (long) DCTSIZE); |
| 6520 + cinfo->output_height = (JDIMENSION) |
| 6521 + jdiv_round_up((long) cinfo->image_height * 14L, (long) DCTSIZE); |
| 6522 + cinfo->_min_DCT_h_scaled_size = 14; |
| 6523 + cinfo->_min_DCT_v_scaled_size = 14; |
| 6524 + } else if (cinfo->scale_num * DCTSIZE <= cinfo->scale_denom * 15) { |
| 6525 + /* Provide 15/block_size scaling */ |
| 6526 + cinfo->output_width = (JDIMENSION) |
| 6527 + jdiv_round_up((long) cinfo->image_width * 15L, (long) DCTSIZE); |
| 6528 + cinfo->output_height = (JDIMENSION) |
| 6529 + jdiv_round_up((long) cinfo->image_height * 15L, (long) DCTSIZE); |
| 6530 + cinfo->_min_DCT_h_scaled_size = 15; |
| 6531 + cinfo->_min_DCT_v_scaled_size = 15; |
| 6532 + } else { |
| 6533 + /* Provide 16/block_size scaling */ |
| 6534 + cinfo->output_width = (JDIMENSION) |
| 6535 + jdiv_round_up((long) cinfo->image_width * 16L, (long) DCTSIZE); |
| 6536 + cinfo->output_height = (JDIMENSION) |
| 6537 + jdiv_round_up((long) cinfo->image_height * 16L, (long) DCTSIZE); |
| 6538 + cinfo->_min_DCT_h_scaled_size = 16; |
| 6539 + cinfo->_min_DCT_v_scaled_size = 16; |
| 6540 + } |
| 6541 + |
| 6542 + /* Recompute dimensions of components */ |
| 6543 + for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components; |
| 6544 + ci++, compptr++) { |
| 6545 + compptr->_DCT_h_scaled_size = cinfo->_min_DCT_h_scaled_size; |
| 6546 + compptr->_DCT_v_scaled_size = cinfo->_min_DCT_v_scaled_size; |
| 6547 + } |
| 6548 + |
| 6549 +#else /* !IDCT_SCALING_SUPPORTED */ |
| 6550 + |
| 6551 + /* Hardwire it to "no scaling" */ |
| 6552 + cinfo->output_width = cinfo->image_width; |
| 6553 + cinfo->output_height = cinfo->image_height; |
| 6554 + /* jdinput.c has already initialized DCT_scaled_size, |
| 6555 + * and has computed unscaled downsampled_width and downsampled_height. |
| 6556 + */ |
| 6557 + |
| 6558 +#endif /* IDCT_SCALING_SUPPORTED */ |
| 6559 +} |
| 6560 + |
| 6561 + |
| 6562 +/* |
| 6563 + * Compute output image dimensions and related values. |
| 6564 + * NOTE: this is exported for possible use by application. |
| 6565 + * Hence it mustn't do anything that can't be done twice. |
| 6566 * Also note that it may be called before the master module is initialized! |
| 6567 */ |
| 6568 |
| 6569 @@ -100,52 +278,31 @@ |
| 6570 if (cinfo->global_state != DSTATE_READY) |
| 6571 ERREXIT1(cinfo, JERR_BAD_STATE, cinfo->global_state); |
| 6572 |
| 6573 + /* Compute core output image dimensions and DCT scaling choices. */ |
| 6574 + jpeg_core_output_dimensions(cinfo); |
| 6575 + |
| 6576 #ifdef IDCT_SCALING_SUPPORTED |
| 6577 |
| 6578 - /* Compute actual output image dimensions and DCT scaling choices. */ |
| 6579 - if (cinfo->scale_num * 8 <= cinfo->scale_denom) { |
| 6580 - /* Provide 1/8 scaling */ |
| 6581 - cinfo->output_width = (JDIMENSION) |
| 6582 - jdiv_round_up((long) cinfo->image_width, 8L); |
| 6583 - cinfo->output_height = (JDIMENSION) |
| 6584 - jdiv_round_up((long) cinfo->image_height, 8L); |
| 6585 - cinfo->min_DCT_scaled_size = 1; |
| 6586 - } else if (cinfo->scale_num * 4 <= cinfo->scale_denom) { |
| 6587 - /* Provide 1/4 scaling */ |
| 6588 - cinfo->output_width = (JDIMENSION) |
| 6589 - jdiv_round_up((long) cinfo->image_width, 4L); |
| 6590 - cinfo->output_height = (JDIMENSION) |
| 6591 - jdiv_round_up((long) cinfo->image_height, 4L); |
| 6592 - cinfo->min_DCT_scaled_size = 2; |
| 6593 - } else if (cinfo->scale_num * 2 <= cinfo->scale_denom) { |
| 6594 - /* Provide 1/2 scaling */ |
| 6595 - cinfo->output_width = (JDIMENSION) |
| 6596 - jdiv_round_up((long) cinfo->image_width, 2L); |
| 6597 - cinfo->output_height = (JDIMENSION) |
| 6598 - jdiv_round_up((long) cinfo->image_height, 2L); |
| 6599 - cinfo->min_DCT_scaled_size = 4; |
| 6600 - } else { |
| 6601 - /* Provide 1/1 scaling */ |
| 6602 - cinfo->output_width = cinfo->image_width; |
| 6603 - cinfo->output_height = cinfo->image_height; |
| 6604 - cinfo->min_DCT_scaled_size = DCTSIZE; |
| 6605 - } |
| 6606 /* In selecting the actual DCT scaling for each component, we try to |
| 6607 * scale up the chroma components via IDCT scaling rather than upsampling. |
| 6608 * This saves time if the upsampler gets to use 1:1 scaling. |
| 6609 - * Note this code assumes that the supported DCT scalings are powers of 2. |
| 6610 + * Note this code adapts subsampling ratios which are powers of 2. |
| 6611 */ |
| 6612 for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components; |
| 6613 ci++, compptr++) { |
| 6614 - int ssize = cinfo->min_DCT_scaled_size; |
| 6615 + int ssize = cinfo->_min_DCT_scaled_size; |
| 6616 while (ssize < DCTSIZE && |
| 6617 - (compptr->h_samp_factor * ssize * 2 <= |
| 6618 - cinfo->max_h_samp_factor * cinfo->min_DCT_scaled_size) && |
| 6619 - (compptr->v_samp_factor * ssize * 2 <= |
| 6620 - cinfo->max_v_samp_factor * cinfo->min_DCT_scaled_size)) { |
| 6621 + ((cinfo->max_h_samp_factor * cinfo->_min_DCT_scaled_size) % |
| 6622 + (compptr->h_samp_factor * ssize * 2) == 0) && |
| 6623 + ((cinfo->max_v_samp_factor * cinfo->_min_DCT_scaled_size) % |
| 6624 + (compptr->v_samp_factor * ssize * 2) == 0)) { |
| 6625 ssize = ssize * 2; |
| 6626 } |
| 6627 +#if JPEG_LIB_VERSION >= 70 |
| 6628 + compptr->DCT_h_scaled_size = compptr->DCT_v_scaled_size = ssize; |
| 6629 +#else |
| 6630 compptr->DCT_scaled_size = ssize; |
| 6631 +#endif |
| 6632 } |
| 6633 |
| 6634 /* Recompute downsampled dimensions of components; |
| 6635 @@ -156,11 +313,11 @@ |
| 6636 /* Size in samples, after IDCT scaling */ |
| 6637 compptr->downsampled_width = (JDIMENSION) |
| 6638 jdiv_round_up((long) cinfo->image_width * |
| 6639 - (long) (compptr->h_samp_factor * compptr->DCT_scaled_size), |
| 6640 + (long) (compptr->h_samp_factor * compptr->_DCT_scaled_size), |
| 6641 (long) (cinfo->max_h_samp_factor * DCTSIZE)); |
| 6642 compptr->downsampled_height = (JDIMENSION) |
| 6643 jdiv_round_up((long) cinfo->image_height * |
| 6644 - (long) (compptr->v_samp_factor * compptr->DCT_scaled_size), |
| 6645 + (long) (compptr->v_samp_factor * compptr->_DCT_scaled_size), |
| 6646 (long) (cinfo->max_v_samp_factor * DCTSIZE)); |
| 6647 } |
| 6648 |
| 6649 @@ -188,6 +345,10 @@ |
| 6650 case JCS_EXT_BGRX: |
| 6651 case JCS_EXT_XBGR: |
| 6652 case JCS_EXT_XRGB: |
| 6653 + case JCS_EXT_RGBA: |
| 6654 + case JCS_EXT_BGRA: |
| 6655 + case JCS_EXT_ABGR: |
| 6656 + case JCS_EXT_ARGB: |
| 6657 cinfo->out_color_components = rgb_pixelsize[cinfo->out_color_space]; |
| 6658 break; |
| 6659 case JCS_YCbCr: |
| 6660 @@ -384,7 +545,11 @@ |
| 6661 jinit_inverse_dct(cinfo); |
| 6662 /* Entropy decoding: either Huffman or arithmetic coding. */ |
| 6663 if (cinfo->arith_code) { |
| 6664 +#ifdef D_ARITH_CODING_SUPPORTED |
| 6665 + jinit_arith_decoder(cinfo); |
| 6666 +#else |
| 6667 ERREXIT(cinfo, JERR_ARITH_NOTIMPL); |
| 6668 +#endif |
| 6669 } else { |
| 6670 if (cinfo->progressive_mode) { |
| 6671 #ifdef D_PROGRESSIVE_SUPPORTED |
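
The sixteen-branch ladder added in `jpeg_core_output_dimensions` picks the smallest N in 1..15 for which `scale_num * DCTSIZE <= scale_denom * N`, falls back to 16 otherwise, and then sets each output dimension to the image dimension times N, divided by DCTSIZE and rounded up. A compact sketch of the same selection, assuming DCTSIZE is 8 (`pick_scaled_size`, `scaled_dimension`, `div_round_up`, `BLOCK_SIZE`, and `MAX_SCALED_SIZE` are hypothetical names used only for illustration):

```c
#define BLOCK_SIZE 8        /* stand-in for DCTSIZE */
#define MAX_SCALED_SIZE 16  /* largest scaling numerator in the ladder */

/* Ceiling division for non-negative a and positive b. */
static long div_round_up(long a, long b)
{
  return (a + b - 1L) / b;
}

/* Smallest N in 1..15 with scale_num * BLOCK_SIZE <= scale_denom * N,
 * or MAX_SCALED_SIZE when no such N exists (the ladder's final else). */
static int pick_scaled_size(unsigned int scale_num, unsigned int scale_denom)
{
  int n;

  for (n = 1; n < MAX_SCALED_SIZE; n++) {
    if ((unsigned long) scale_num * BLOCK_SIZE <=
        (unsigned long) scale_denom * (unsigned long) n)
      return n;
  }
  return MAX_SCALED_SIZE;
}

/* Output dimension for a given image dimension and scaled size N. */
static long scaled_dimension(long image_dim, int n)
{
  return div_round_up(image_dim * (long) n, (long) BLOCK_SIZE);
}
```

For example, a 1/2 request selects N = 4, so a 100-pixel dimension becomes 50. The loop is only a reading aid; the patch itself spells out every branch.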
| 6672 Index: jdmerge.c |
| 6673 =================================================================== |
| 6674 --- jdmerge.c (revision 829) |
| 6675 +++ jdmerge.c (working copy) |
| 6676 @@ -1,10 +1,11 @@ |
| 6677 /* |
| 6678 * jdmerge.c |
| 6679 * |
| 6680 + * This file was part of the Independent JPEG Group's software: |
| 6681 * Copyright (C) 1994-1996, Thomas G. Lane. |
| 6682 * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB |
| 6683 - * Copyright (C) 2009, D. R. Commander. |
| 6684 - * This file is part of the Independent JPEG Group's software. |
| 6685 + * libjpeg-turbo Modifications: |
| 6686 + * Copyright (C) 2009, 2011, D. R. Commander. |
| 6687 * For conditions of distribution and use, see the accompanying README file. |
| 6688 * |
| 6689 * This file contains code for merged upsampling/color conversion. |
| 6690 @@ -38,6 +39,7 @@ |
| 6691 #include "jinclude.h" |
| 6692 #include "jpeglib.h" |
| 6693 #include "jsimd.h" |
| 6694 +#include "config.h" |
| 6695 |
| 6696 #ifdef UPSAMPLE_MERGING_SUPPORTED |
| 6697 |
| 6698 @@ -77,6 +79,107 @@ |
| 6699 #define FIX(x) ((INT32) ((x) * (1L<<SCALEBITS) + 0.5)) |
| 6700 |
| 6701 |
| 6702 +/* Include inline routines for colorspace extensions */ |
| 6703 + |
| 6704 +#include "jdmrgext.c" |
| 6705 +#undef RGB_RED |
| 6706 +#undef RGB_GREEN |
| 6707 +#undef RGB_BLUE |
| 6708 +#undef RGB_PIXELSIZE |
| 6709 + |
| 6710 +#define RGB_RED EXT_RGB_RED |
| 6711 +#define RGB_GREEN EXT_RGB_GREEN |
| 6712 +#define RGB_BLUE EXT_RGB_BLUE |
| 6713 +#define RGB_PIXELSIZE EXT_RGB_PIXELSIZE |
| 6714 +#define h2v1_merged_upsample_internal extrgb_h2v1_merged_upsample_internal |
| 6715 +#define h2v2_merged_upsample_internal extrgb_h2v2_merged_upsample_internal |
| 6716 +#include "jdmrgext.c" |
| 6717 +#undef RGB_RED |
| 6718 +#undef RGB_GREEN |
| 6719 +#undef RGB_BLUE |
| 6720 +#undef RGB_PIXELSIZE |
| 6721 +#undef h2v1_merged_upsample_internal |
| 6722 +#undef h2v2_merged_upsample_internal |
| 6723 + |
| 6724 +#define RGB_RED EXT_RGBX_RED |
| 6725 +#define RGB_GREEN EXT_RGBX_GREEN |
| 6726 +#define RGB_BLUE EXT_RGBX_BLUE |
| 6727 +#define RGB_ALPHA 3 |
| 6728 +#define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE |
| 6729 +#define h2v1_merged_upsample_internal extrgbx_h2v1_merged_upsample_internal |
| 6730 +#define h2v2_merged_upsample_internal extrgbx_h2v2_merged_upsample_internal |
| 6731 +#include "jdmrgext.c" |
| 6732 +#undef RGB_RED |
| 6733 +#undef RGB_GREEN |
| 6734 +#undef RGB_BLUE |
| 6735 +#undef RGB_ALPHA |
| 6736 +#undef RGB_PIXELSIZE |
| 6737 +#undef h2v1_merged_upsample_internal |
| 6738 +#undef h2v2_merged_upsample_internal |
| 6739 + |
| 6740 +#define RGB_RED EXT_BGR_RED |
| 6741 +#define RGB_GREEN EXT_BGR_GREEN |
| 6742 +#define RGB_BLUE EXT_BGR_BLUE |
| 6743 +#define RGB_PIXELSIZE EXT_BGR_PIXELSIZE |
| 6744 +#define h2v1_merged_upsample_internal extbgr_h2v1_merged_upsample_internal |
| 6745 +#define h2v2_merged_upsample_internal extbgr_h2v2_merged_upsample_internal |
| 6746 +#include "jdmrgext.c" |
| 6747 +#undef RGB_RED |
| 6748 +#undef RGB_GREEN |
| 6749 +#undef RGB_BLUE |
| 6750 +#undef RGB_PIXELSIZE |
| 6751 +#undef h2v1_merged_upsample_internal |
| 6752 +#undef h2v2_merged_upsample_internal |
| 6753 + |
| 6754 +#define RGB_RED EXT_BGRX_RED |
| 6755 +#define RGB_GREEN EXT_BGRX_GREEN |
| 6756 +#define RGB_BLUE EXT_BGRX_BLUE |
| 6757 +#define RGB_ALPHA 3 |
| 6758 +#define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE |
| 6759 +#define h2v1_merged_upsample_internal extbgrx_h2v1_merged_upsample_internal |
| 6760 +#define h2v2_merged_upsample_internal extbgrx_h2v2_merged_upsample_internal |
| 6761 +#include "jdmrgext.c" |
| 6762 +#undef RGB_RED |
| 6763 +#undef RGB_GREEN |
| 6764 +#undef RGB_BLUE |
| 6765 +#undef RGB_ALPHA |
| 6766 +#undef RGB_PIXELSIZE |
| 6767 +#undef h2v1_merged_upsample_internal |
| 6768 +#undef h2v2_merged_upsample_internal |
| 6769 + |
| 6770 +#define RGB_RED EXT_XBGR_RED |
| 6771 +#define RGB_GREEN EXT_XBGR_GREEN |
| 6772 +#define RGB_BLUE EXT_XBGR_BLUE |
| 6773 +#define RGB_ALPHA 0 |
| 6774 +#define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE |
| 6775 +#define h2v1_merged_upsample_internal extxbgr_h2v1_merged_upsample_internal |
| 6776 +#define h2v2_merged_upsample_internal extxbgr_h2v2_merged_upsample_internal |
| 6777 +#include "jdmrgext.c" |
| 6778 +#undef RGB_RED |
| 6779 +#undef RGB_GREEN |
| 6780 +#undef RGB_BLUE |
| 6781 +#undef RGB_ALPHA |
| 6782 +#undef RGB_PIXELSIZE |
| 6783 +#undef h2v1_merged_upsample_internal |
| 6784 +#undef h2v2_merged_upsample_internal |
| 6785 + |
| 6786 +#define RGB_RED EXT_XRGB_RED |
| 6787 +#define RGB_GREEN EXT_XRGB_GREEN |
| 6788 +#define RGB_BLUE EXT_XRGB_BLUE |
| 6789 +#define RGB_ALPHA 0 |
| 6790 +#define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE |
| 6791 +#define h2v1_merged_upsample_internal extxrgb_h2v1_merged_upsample_internal |
| 6792 +#define h2v2_merged_upsample_internal extxrgb_h2v2_merged_upsample_internal |
| 6793 +#include "jdmrgext.c" |
| 6794 +#undef RGB_RED |
| 6795 +#undef RGB_GREEN |
| 6796 +#undef RGB_BLUE |
| 6797 +#undef RGB_ALPHA |
| 6798 +#undef RGB_PIXELSIZE |
| 6799 +#undef h2v1_merged_upsample_internal |
| 6800 +#undef h2v2_merged_upsample_internal |
| 6801 + |
| 6802 + |
| 6803 /* |
| 6804 * Initialize tables for YCC->RGB colorspace conversion. |
| 6805 * This is taken directly from jdcolor.c; see that file for more info. |
| 6806 @@ -230,56 +333,40 @@ |
| 6807 JSAMPIMAGE input_buf, JDIMENSION in_row_group_ctr, |
| 6808 JSAMPARRAY output_buf) |
| 6809 { |
| 6810 - my_upsample_ptr upsample = (my_upsample_ptr) cinfo->upsample; |
| 6811 - register int y, cred, cgreen, cblue; |
| 6812 - int cb, cr; |
| 6813 - register JSAMPROW outptr; |
| 6814 - JSAMPROW inptr0, inptr1, inptr2; |
| 6815 - JDIMENSION col; |
| 6816 - /* copy these pointers into registers if possible */ |
| 6817 - register JSAMPLE * range_limit = cinfo->sample_range_limit; |
| 6818 - int * Crrtab = upsample->Cr_r_tab; |
| 6819 - int * Cbbtab = upsample->Cb_b_tab; |
| 6820 - INT32 * Crgtab = upsample->Cr_g_tab; |
| 6821 - INT32 * Cbgtab = upsample->Cb_g_tab; |
| 6822 - SHIFT_TEMPS |
| 6823 - |
| 6824 - inptr0 = input_buf[0][in_row_group_ctr]; |
| 6825 - inptr1 = input_buf[1][in_row_group_ctr]; |
| 6826 - inptr2 = input_buf[2][in_row_group_ctr]; |
| 6827 - outptr = output_buf[0]; |
| 6828 - /* Loop for each pair of output pixels */ |
| 6829 - for (col = cinfo->output_width >> 1; col > 0; col--) { |
| 6830 - /* Do the chroma part of the calculation */ |
| 6831 - cb = GETJSAMPLE(*inptr1++); |
| 6832 - cr = GETJSAMPLE(*inptr2++); |
| 6833 - cred = Crrtab[cr]; |
| 6834 - cgreen = (int) RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS); |
| 6835 - cblue = Cbbtab[cb]; |
| 6836 - /* Fetch 2 Y values and emit 2 pixels */ |
| 6837 - y = GETJSAMPLE(*inptr0++); |
| 6838 - outptr[rgb_red[cinfo->out_color_space]] = range_limit[y + cred]; |
| 6839 - outptr[rgb_green[cinfo->out_color_space]] = range_limit[y + cgreen]; |
| 6840 - outptr[rgb_blue[cinfo->out_color_space]] = range_limit[y + cblue]; |
| 6841 - outptr += rgb_pixelsize[cinfo->out_color_space]; |
| 6842 - y = GETJSAMPLE(*inptr0++); |
| 6843 - outptr[rgb_red[cinfo->out_color_space]] = range_limit[y + cred]; |
| 6844 - outptr[rgb_green[cinfo->out_color_space]] = range_limit[y + cgreen]; |
| 6845 - outptr[rgb_blue[cinfo->out_color_space]] = range_limit[y + cblue]; |
| 6846 - outptr += rgb_pixelsize[cinfo->out_color_space]; |
| 6847 + switch (cinfo->out_color_space) { |
| 6848 + case JCS_EXT_RGB: |
| 6849 + extrgb_h2v1_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr, |
| 6850 + output_buf); |
| 6851 + break; |
| 6852 + case JCS_EXT_RGBX: |
| 6853 + case JCS_EXT_RGBA: |
| 6854 + extrgbx_h2v1_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr, |
| 6855 + output_buf); |
| 6856 + break; |
| 6857 + case JCS_EXT_BGR: |
| 6858 + extbgr_h2v1_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr, |
| 6859 + output_buf); |
| 6860 + break; |
| 6861 + case JCS_EXT_BGRX: |
| 6862 + case JCS_EXT_BGRA: |
| 6863 + extbgrx_h2v1_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr, |
| 6864 + output_buf); |
| 6865 + break; |
| 6866 + case JCS_EXT_XBGR: |
| 6867 + case JCS_EXT_ABGR: |
| 6868 + extxbgr_h2v1_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr, |
| 6869 + output_buf); |
| 6870 + break; |
| 6871 + case JCS_EXT_XRGB: |
| 6872 + case JCS_EXT_ARGB: |
| 6873 + extxrgb_h2v1_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr, |
| 6874 + output_buf); |
| 6875 + break; |
| 6876 + default: |
| 6877 + h2v1_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr, |
| 6878 + output_buf); |
| 6879 + break; |
| 6880 } |
| 6881 - /* If image width is odd, do the last output column separately */ |
| 6882 - if (cinfo->output_width & 1) { |
| 6883 - cb = GETJSAMPLE(*inptr1); |
| 6884 - cr = GETJSAMPLE(*inptr2); |
| 6885 - cred = Crrtab[cr]; |
| 6886 - cgreen = (int) RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS); |
| 6887 - cblue = Cbbtab[cb]; |
| 6888 - y = GETJSAMPLE(*inptr0); |
| 6889 - outptr[rgb_red[cinfo->out_color_space]] = range_limit[y + cred]; |
| 6890 - outptr[rgb_green[cinfo->out_color_space]] = range_limit[y + cgreen]; |
| 6891 - outptr[rgb_blue[cinfo->out_color_space]] = range_limit[y + cblue]; |
| 6892 - } |
| 6893 } |
| 6894 |
| 6895 |
| 6896 @@ -292,72 +379,40 @@ |
| 6897 JSAMPIMAGE input_buf, JDIMENSION in_row_group_ctr, |
| 6898 JSAMPARRAY output_buf) |
| 6899 { |
| 6900 - my_upsample_ptr upsample = (my_upsample_ptr) cinfo->upsample; |
| 6901 - register int y, cred, cgreen, cblue; |
| 6902 - int cb, cr; |
| 6903 - register JSAMPROW outptr0, outptr1; |
| 6904 - JSAMPROW inptr00, inptr01, inptr1, inptr2; |
| 6905 - JDIMENSION col; |
| 6906 - /* copy these pointers into registers if possible */ |
| 6907 - register JSAMPLE * range_limit = cinfo->sample_range_limit; |
| 6908 - int * Crrtab = upsample->Cr_r_tab; |
| 6909 - int * Cbbtab = upsample->Cb_b_tab; |
| 6910 - INT32 * Crgtab = upsample->Cr_g_tab; |
| 6911 - INT32 * Cbgtab = upsample->Cb_g_tab; |
| 6912 - SHIFT_TEMPS |
| 6913 - |
| 6914 - inptr00 = input_buf[0][in_row_group_ctr*2]; |
| 6915 - inptr01 = input_buf[0][in_row_group_ctr*2 + 1]; |
| 6916 - inptr1 = input_buf[1][in_row_group_ctr]; |
| 6917 - inptr2 = input_buf[2][in_row_group_ctr]; |
| 6918 - outptr0 = output_buf[0]; |
| 6919 - outptr1 = output_buf[1]; |
| 6920 - /* Loop for each group of output pixels */ |
| 6921 - for (col = cinfo->output_width >> 1; col > 0; col--) { |
| 6922 - /* Do the chroma part of the calculation */ |
| 6923 - cb = GETJSAMPLE(*inptr1++); |
| 6924 - cr = GETJSAMPLE(*inptr2++); |
| 6925 - cred = Crrtab[cr]; |
| 6926 - cgreen = (int) RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS); |
| 6927 - cblue = Cbbtab[cb]; |
| 6928 - /* Fetch 4 Y values and emit 4 pixels */ |
| 6929 - y = GETJSAMPLE(*inptr00++); |
| 6930 - outptr0[rgb_red[cinfo->out_color_space]] = range_limit[y + cred]; |
| 6931 - outptr0[rgb_green[cinfo->out_color_space]] = range_limit[y + cgreen]; |
| 6932 - outptr0[rgb_blue[cinfo->out_color_space]] = range_limit[y + cblue]; |
| 6933 - outptr0 += RGB_PIXELSIZE; |
| 6934 - y = GETJSAMPLE(*inptr00++); |
| 6935 - outptr0[rgb_red[cinfo->out_color_space]] = range_limit[y + cred]; |
| 6936 - outptr0[rgb_green[cinfo->out_color_space]] = range_limit[y + cgreen]; |
| 6937 - outptr0[rgb_blue[cinfo->out_color_space]] = range_limit[y + cblue]; |
| 6938 - outptr0 += RGB_PIXELSIZE; |
| 6939 - y = GETJSAMPLE(*inptr01++); |
| 6940 - outptr1[rgb_red[cinfo->out_color_space]] = range_limit[y + cred]; |
| 6941 - outptr1[rgb_green[cinfo->out_color_space]] = range_limit[y + cgreen]; |
| 6942 - outptr1[rgb_blue[cinfo->out_color_space]] = range_limit[y + cblue]; |
| 6943 - outptr1 += RGB_PIXELSIZE; |
| 6944 - y = GETJSAMPLE(*inptr01++); |
| 6945 - outptr1[rgb_red[cinfo->out_color_space]] = range_limit[y + cred]; |
| 6946 - outptr1[rgb_green[cinfo->out_color_space]] = range_limit[y + cgreen]; |
| 6947 - outptr1[rgb_blue[cinfo->out_color_space]] = range_limit[y + cblue]; |
| 6948 - outptr1 += RGB_PIXELSIZE; |
| 6949 + switch (cinfo->out_color_space) { |
| 6950 + case JCS_EXT_RGB: |
| 6951 + extrgb_h2v2_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr, |
| 6952 + output_buf); |
| 6953 + break; |
| 6954 + case JCS_EXT_RGBX: |
| 6955 + case JCS_EXT_RGBA: |
| 6956 + extrgbx_h2v2_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr, |
| 6957 + output_buf); |
| 6958 + break; |
| 6959 + case JCS_EXT_BGR: |
| 6960 + extbgr_h2v2_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr, |
| 6961 + output_buf); |
| 6962 + break; |
| 6963 + case JCS_EXT_BGRX: |
| 6964 + case JCS_EXT_BGRA: |
| 6965 + extbgrx_h2v2_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr, |
| 6966 + output_buf); |
| 6967 + break; |
| 6968 + case JCS_EXT_XBGR: |
| 6969 + case JCS_EXT_ABGR: |
| 6970 + extxbgr_h2v2_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr, |
| 6971 + output_buf); |
| 6972 + break; |
| 6973 + case JCS_EXT_XRGB: |
| 6974 + case JCS_EXT_ARGB: |
| 6975 + extxrgb_h2v2_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr, |
| 6976 + output_buf); |
| 6977 + break; |
| 6978 + default: |
| 6979 + h2v2_merged_upsample_internal(cinfo, input_buf, in_row_group_ctr, |
| 6980 + output_buf); |
| 6981 + break; |
| 6982 } |
| 6983 - /* If image width is odd, do the last output column separately */ |
| 6984 - if (cinfo->output_width & 1) { |
| 6985 - cb = GETJSAMPLE(*inptr1); |
| 6986 - cr = GETJSAMPLE(*inptr2); |
| 6987 - cred = Crrtab[cr]; |
| 6988 - cgreen = (int) RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS); |
| 6989 - cblue = Cbbtab[cb]; |
| 6990 - y = GETJSAMPLE(*inptr00); |
| 6991 - outptr0[rgb_red[cinfo->out_color_space]] = range_limit[y + cred]; |
| 6992 - outptr0[rgb_green[cinfo->out_color_space]] = range_limit[y + cgreen]; |
| 6993 - outptr0[rgb_blue[cinfo->out_color_space]] = range_limit[y + cblue]; |
| 6994 - y = GETJSAMPLE(*inptr01); |
| 6995 - outptr1[rgb_red[cinfo->out_color_space]] = range_limit[y + cred]; |
| 6996 - outptr1[rgb_green[cinfo->out_color_space]] = range_limit[y + cgreen]; |
| 6997 - outptr1[rgb_blue[cinfo->out_color_space]] = range_limit[y + cblue]; |
| 6998 - } |
| 6999 } |
| 7000 |
| 7001 |
| 7002 Index: jdphuff.c |
| 7003 =================================================================== |
| 7004 --- jdphuff.c (revision 829) |
| 7005 +++ jdphuff.c (working copy) |
| 7006 @@ -198,6 +198,7 @@ |
| 7007 * On some machines, a shift and add will be faster than a table lookup. |
| 7008 */ |
| 7009 |
| 7010 +#define AVOID_TABLES |
| 7011 #ifdef AVOID_TABLES |
| 7012 |
| 7013 #define HUFF_EXTEND(x,s) ((x) < (1<<((s)-1)) ? (x) + (((-1)<<(s)) + 1) : (x)) |
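Note on the hunk above: with `AVOID_TABLES` defined, `HUFF_EXTEND(x,s)` sign-extends an `s`-bit raw value arithmetically instead of by table lookup. The following is a minimal sketch of both forms, not the code in jdphuff.c. The function names and the table layout are hypothetical, and the shift form is rewritten as `x - (2^s - 1)` so that no negative value is left-shifted.

```c
#include <assert.h>

/* Shift-and-add form: arithmetically the same as the AVOID_TABLES macro,
 * written as x - (2^s - 1) so that no negative value is left-shifted. */
static int huff_extend_shift(int x, int s)
{
  assert(s >= 1 && s <= 15);
  assert(x >= 0 && x < (1 << s));
  /* Raw values below 2^(s-1) stand for the negative half of the range. */
  return x < (1 << (s - 1)) ? x - ((1 << s) - 1) : x;
}

/* Table-lookup form (hypothetical layout): one threshold and one offset
 * per bit count, filled in on first use. */
static int huff_extend_table(int x, int s)
{
  static int threshold[16];
  static int offset[16];
  static int ready = 0;
  int i;

  if (!ready) {
    for (i = 1; i < 16; i++) {
      threshold[i] = 1 << (i - 1);   /* 2^(s-1) */
      offset[i] = 1 - (1 << i);      /* -(2^s - 1) */
    }
    ready = 1;
  }
  assert(s >= 1 && s <= 15);
  return x < threshold[s] ? x + offset[s] : x;
}
```

For s = 3, raw values 0..3 map to -7..-4 and raw values 4..7 map to themselves; both functions agree on every input in range.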
| 7014 Index: jdsample.c |
| 7015 =================================================================== |
| 7016 --- jdsample.c (revision 829) |
| 7017 +++ jdsample.c (working copy) |
| 7018 @@ -1,9 +1,11 @@ |
| 7019 /* |
| 7020 * jdsample.c |
| 7021 * |
| 7022 + * This file was part of the Independent JPEG Group's software: |
| 7023 * Copyright (C) 1991-1996, Thomas G. Lane. |
| 7024 + * libjpeg-turbo Modifications: |
| 7025 * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB |
| 7026 - * This file is part of the Independent JPEG Group's software. |
| 7027 + * Copyright (C) 2010, D. R. Commander. |
| 7028 * For conditions of distribution and use, see the accompanying README file. |
| 7029 * |
| 7030 * This file contains upsampling routines. |
| 7031 @@ -19,50 +21,12 @@ |
| 7032 * Pub. by IEEE Computer Society Press, Los Alamitos, CA. ISBN 0-8186-8944-7. |
| 7033 */ |
| 7034 |
| 7035 -#define JPEG_INTERNALS |
| 7036 -#include "jinclude.h" |
| 7037 -#include "jpeglib.h" |
| 7038 +#include "jdsample.h" |
| 7039 #include "jsimd.h" |
| 7040 +#include "jpegcomp.h" |
| 7041 |
| 7042 |
| 7043 -/* Pointer to routine to upsample a single component */ |
| 7044 -typedef JMETHOD(void, upsample1_ptr, |
| 7045 - (j_decompress_ptr cinfo, jpeg_component_info * compptr, |
| 7046 - JSAMPARRAY input_data, JSAMPARRAY * output_data_ptr)); |
| 7047 |
| 7048 -/* Private subobject */ |
| 7049 - |
| 7050 -typedef struct { |
| 7051 - struct jpeg_upsampler pub; /* public fields */ |
| 7052 - |
| 7053 - /* Color conversion buffer. When using separate upsampling and color |
| 7054 - * conversion steps, this buffer holds one upsampled row group until it |
| 7055 - * has been color converted and output. |
| 7056 - * Note: we do not allocate any storage for component(s) which are full-size, |
| 7057 - * ie do not need rescaling. The corresponding entry of color_buf[] is |
| 7058 - * simply set to point to the input data array, thereby avoiding copying. |
| 7059 - */ |
| 7060 - JSAMPARRAY color_buf[MAX_COMPONENTS]; |
| 7061 - |
| 7062 - /* Per-component upsampling method pointers */ |
| 7063 - upsample1_ptr methods[MAX_COMPONENTS]; |
| 7064 - |
| 7065 - int next_row_out; /* counts rows emitted from color_buf */ |
| 7066 - JDIMENSION rows_to_go; /* counts rows remaining in image */ |
| 7067 - |
| 7068 - /* Height of an input row group for each component. */ |
| 7069 - int rowgroup_height[MAX_COMPONENTS]; |
| 7070 - |
| 7071 - /* These arrays save pixel expansion factors so that int_expand need not |
| 7072 - * recompute them each time. They are unused for other upsampling methods. |
| 7073 - */ |
| 7074 - UINT8 h_expand[MAX_COMPONENTS]; |
| 7075 - UINT8 v_expand[MAX_COMPONENTS]; |
| 7076 -} my_upsampler; |
| 7077 - |
| 7078 -typedef my_upsampler * my_upsample_ptr; |
| 7079 - |
| 7080 - |
| 7081 /* |
| 7082 * Initialize for an upsampling pass. |
| 7083 */ |
| 7084 @@ -420,7 +384,7 @@ |
| 7085 /* jdmainct.c doesn't support context rows when min_DCT_scaled_size = 1, |
| 7086 * so don't ask for it. |
| 7087 */ |
| 7088 - do_fancy = cinfo->do_fancy_upsampling && cinfo->min_DCT_scaled_size > 1; |
| 7089 + do_fancy = cinfo->do_fancy_upsampling && cinfo->_min_DCT_scaled_size > 1; |
| 7090 |
| 7091 /* Verify we can handle the sampling factors, select per-component methods, |
| 7092 * and create storage as needed. |
| 7093 @@ -430,10 +394,10 @@ |
| 7094 /* Compute size of an "input group" after IDCT scaling. This many samples |
| 7095 * are to be converted to max_h_samp_factor * max_v_samp_factor pixels. |
| 7096 */ |
| 7097 - h_in_group = (compptr->h_samp_factor * compptr->DCT_scaled_size) / |
| 7098 - cinfo->min_DCT_scaled_size; |
| 7099 - v_in_group = (compptr->v_samp_factor * compptr->DCT_scaled_size) / |
| 7100 - cinfo->min_DCT_scaled_size; |
| 7101 + h_in_group = (compptr->h_samp_factor * compptr->_DCT_scaled_size) / |
| 7102 + cinfo->_min_DCT_scaled_size; |
| 7103 + v_in_group = (compptr->v_samp_factor * compptr->_DCT_scaled_size) / |
| 7104 + cinfo->_min_DCT_scaled_size; |
| 7105 h_out_group = cinfo->max_h_samp_factor; |
| 7106 v_out_group = cinfo->max_v_samp_factor; |
| 7107 upsample->rowgroup_height[ci] = v_in_group; /* save for use later */ |
| 7108 Index: jdtrans.c |
| 7109 =================================================================== |
| 7110 --- jdtrans.c (revision 829) |
| 7111 +++ jdtrans.c (working copy) |
| 7112 @@ -99,9 +99,18 @@ |
| 7113 /* This is effectively a buffered-image operation. */ |
| 7114 cinfo->buffered_image = TRUE; |
| 7115 |
| 7116 +#if JPEG_LIB_VERSION >= 80 |
| 7117 + /* Compute output image dimensions and related values. */ |
| 7118 + jpeg_core_output_dimensions(cinfo); |
| 7119 +#endif |
| 7120 + |
| 7121 /* Entropy decoding: either Huffman or arithmetic coding. */ |
| 7122 if (cinfo->arith_code) { |
| 7123 +#ifdef D_ARITH_CODING_SUPPORTED |
| 7124 + jinit_arith_decoder(cinfo); |
| 7125 +#else |
| 7126 ERREXIT(cinfo, JERR_ARITH_NOTIMPL); |
| 7127 +#endif |
| 7128 } else { |
| 7129 if (cinfo->progressive_mode) { |
| 7130 #ifdef D_PROGRESSIVE_SUPPORTED |
| 7131 Index: jerror.h |
| 7132 =================================================================== |
| 7133 --- jerror.h (revision 829) |
| 7134 +++ jerror.h (working copy) |
| 7135 @@ -2,6 +2,7 @@ |
| 7136 * jerror.h |
| 7137 * |
| 7138 * Copyright (C) 1994-1997, Thomas G. Lane. |
| 7139 + * Modified 1997-2009 by Guido Vollbeding. |
| 7140 * This file is part of the Independent JPEG Group's software. |
| 7141 * For conditions of distribution and use, see the accompanying README file. |
| 7142 * |
| 7143 @@ -39,14 +40,23 @@ |
| 7144 JMESSAGE(JMSG_NOMESSAGE, "Bogus message code %d") /* Must be first entry! */ |
| 7145 |
| 7146 /* For maintenance convenience, list is alphabetical by message code name */ |
| 7147 +#if JPEG_LIB_VERSION < 70 |
| 7148 JMESSAGE(JERR_ARITH_NOTIMPL, |
| 7149 - "Sorry, there are legal restrictions on arithmetic coding") |
| 7150 + "Sorry, arithmetic coding is not implemented") |
| 7151 +#endif |
| 7152 JMESSAGE(JERR_BAD_ALIGN_TYPE, "ALIGN_TYPE is wrong, please fix") |
| 7153 JMESSAGE(JERR_BAD_ALLOC_CHUNK, "MAX_ALLOC_CHUNK is wrong, please fix") |
| 7154 JMESSAGE(JERR_BAD_BUFFER_MODE, "Bogus buffer control mode") |
| 7155 JMESSAGE(JERR_BAD_COMPONENT_ID, "Invalid component ID %d in SOS") |
| 7156 +#if JPEG_LIB_VERSION >= 70 |
| 7157 +JMESSAGE(JERR_BAD_CROP_SPEC, "Invalid crop request") |
| 7158 +#endif |
| 7159 JMESSAGE(JERR_BAD_DCT_COEF, "DCT coefficient out of range") |
| 7160 JMESSAGE(JERR_BAD_DCTSIZE, "IDCT output block size %d not supported") |
| 7161 +#if JPEG_LIB_VERSION >= 70 |
| 7162 +JMESSAGE(JERR_BAD_DROP_SAMPLING, |
| 7163 + "Component index %d: mismatching sampling ratio %d:%d, %d:%d, %c") |
| 7164 +#endif |
| 7165 JMESSAGE(JERR_BAD_HUFF_TABLE, "Bogus Huffman table definition") |
| 7166 JMESSAGE(JERR_BAD_IN_COLORSPACE, "Bogus input colorspace") |
| 7167 JMESSAGE(JERR_BAD_J_COLORSPACE, "Bogus JPEG colorspace") |
| 7168 @@ -93,6 +103,9 @@ |
| 7169 JMESSAGE(JERR_MODE_CHANGE, "Invalid color quantization mode change") |
| 7170 JMESSAGE(JERR_NOTIMPL, "Not implemented yet") |
| 7171 JMESSAGE(JERR_NOT_COMPILED, "Requested feature was omitted at compile time") |
| 7172 +#if JPEG_LIB_VERSION >= 70 |
| 7173 +JMESSAGE(JERR_NO_ARITH_TABLE, "Arithmetic table 0x%02x was not defined") |
| 7174 +#endif |
| 7175 JMESSAGE(JERR_NO_BACKING_STORE, "Backing store not supported") |
| 7176 JMESSAGE(JERR_NO_HUFF_TABLE, "Huffman table 0x%02x was not defined") |
| 7177 JMESSAGE(JERR_NO_IMAGE, "JPEG datastream contains no image") |
| 7178 @@ -170,6 +183,9 @@ |
| 7179 JMESSAGE(JTRC_XMS_CLOSE, "Freed XMS handle %u") |
| 7180 JMESSAGE(JTRC_XMS_OPEN, "Obtained XMS handle %u") |
| 7181 JMESSAGE(JWRN_ADOBE_XFORM, "Unknown Adobe color transform code %d") |
| 7182 +#if JPEG_LIB_VERSION >= 70 |
| 7183 +JMESSAGE(JWRN_ARITH_BAD_CODE, "Corrupt JPEG data: bad arithmetic code") |
| 7184 +#endif |
| 7185 JMESSAGE(JWRN_BOGUS_PROGRESSION, |
| 7186 "Inconsistent progression sequence for component %d coefficient %d") |
| 7187 JMESSAGE(JWRN_EXTRANEOUS_DATA, |
| 7188 @@ -182,6 +198,13 @@ |
| 7189 "Corrupt JPEG data: found marker 0x%02x instead of RST%d") |
| 7190 JMESSAGE(JWRN_NOT_SEQUENTIAL, "Invalid SOS parameters for sequential JPEG") |
| 7191 JMESSAGE(JWRN_TOO_MUCH_DATA, "Application transferred too many scanlines") |
| 7192 +#if JPEG_LIB_VERSION < 70 |
| 7193 +JMESSAGE(JERR_BAD_CROP_SPEC, "Invalid crop request") |
| 7194 +#if defined(C_ARITH_CODING_SUPPORTED) || defined(D_ARITH_CODING_SUPPORTED) |
| 7195 +JMESSAGE(JERR_NO_ARITH_TABLE, "Arithmetic table 0x%02x was not defined") |
| 7196 +JMESSAGE(JWRN_ARITH_BAD_CODE, "Corrupt JPEG data: bad arithmetic code") |
| 7197 +#endif |
| 7198 +#endif |
| 7199 |
| 7200 #ifdef JMAKE_ENUM_LIST |
| 7201 |
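Note on the jerror.h hunks above: the same message codes are listed in alphabetical position when `JPEG_LIB_VERSION >= 70` and appended at the end of the list otherwise. One plausible reading is that appending keeps the numeric values of the pre-existing codes unchanged for the older API level. The sketch below illustrates that effect with a small macro-generated list. Every `DEMO_` name is hypothetical, and the real header builds its list by re-including the file with `JMESSAGE` redefined rather than by the helper macros used here.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_LIB_VERSION 62   /* hypothetical stand-in for JPEG_LIB_VERSION */

/* Place the newer code either in alphabetical position or at the end. */
#if DEMO_LIB_VERSION >= 70
#define DEMO_V7_IN_PLACE(X) X(DEMO_ERR_BAD_CROP_SPEC, "Invalid crop request")
#define DEMO_V7_AT_END(X)
#else
#define DEMO_V7_IN_PLACE(X)
#define DEMO_V7_AT_END(X) X(DEMO_ERR_BAD_CROP_SPEC, "Invalid crop request")
#endif

#define DEMO_MESSAGES(X) \
  X(DEMO_MSG_NOMESSAGE, "Bogus message code %d") \
  X(DEMO_ERR_BAD_COMPONENT_ID, "Invalid component ID %d in SOS") \
  DEMO_V7_IN_PLACE(X) \
  X(DEMO_ERR_BAD_DCT_COEF, "DCT coefficient out of range") \
  X(DEMO_WRN_TOO_MUCH_DATA, "Application transferred too many scanlines") \
  DEMO_V7_AT_END(X)

#define DEMO_AS_ENUM(code, text) code,
#define DEMO_AS_TEXT(code, text) text,

/* The same list expands once into codes and once into message strings. */
enum demo_code { DEMO_MESSAGES(DEMO_AS_ENUM) DEMO_MSG_LASTCODE };
static const char *const demo_text[] = { DEMO_MESSAGES(DEMO_AS_TEXT) NULL };

/* Look up the message string for a code. */
static const char *demo_message(enum demo_code code)
{
  assert(code >= DEMO_MSG_NOMESSAGE && code < DEMO_MSG_LASTCODE);
  return demo_text[code];
}
```

With `DEMO_LIB_VERSION` set to 62, `DEMO_ERR_BAD_DCT_COEF` keeps the value 2 it would have without the new entry, and `DEMO_ERR_BAD_CROP_SPEC` takes the next free value after the older codes. Setting the version to 70 or higher moves the new code into alphabetical position and shifts the codes that follow it.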
| 7202 Index: jidctint.c |
| 7203 =================================================================== |
| 7204 --- jidctint.c (revision 829) |
| 7205 +++ jidctint.c (working copy) |
| 7206 @@ -2,6 +2,7 @@ |
| 7207 * jidctint.c |
| 7208 * |
| 7209 * Copyright (C) 1991-1998, Thomas G. Lane. |
| 7210 + * Modification developed 2002-2009 by Guido Vollbeding. |
| 7211 * This file is part of the Independent JPEG Group's software. |
| 7212 * For conditions of distribution and use, see the accompanying README file. |
| 7213 * |
| 7214 @@ -23,6 +24,27 @@ |
| 7215 * The advantage of this method is that no data path contains more than one |
| 7216 * multiplication; this allows a very simple and accurate implementation in |
| 7217 * scaled fixed-point arithmetic, with a minimal number of shifts. |
| 7218 + * |
| 7219 + * We also provide IDCT routines with various output sample block sizes for |
| 7220 + * direct resolution reduction or enlargement without additional resampling: |
| 7221 + * NxN (N=1...16) pixels for one 8x8 input DCT block. |
| 7222 + * |
| 7223 + * For N<8 we simply take the corresponding low-frequency coefficients of |
| 7224 + * the 8x8 input DCT block and apply an NxN point IDCT on the sub-block |
| 7225 + * to yield the downscaled outputs. |
| 7226 + * This can be seen as direct low-pass downsampling from the DCT domain |
| 7227 + * point of view rather than the usual spatial domain point of view, |
| 7228 + * yielding significant computational savings and results at least |
| 7229 + * as good as common bilinear (averaging) spatial downsampling. |
| 7230 + * |
| 7231 + * For N>8 we apply a partial NxN IDCT on the 8 input coefficients as |
| 7232 + * lower frequencies and higher frequencies assumed to be zero. |
| 7233 + * It turns out that the computational effort is similar to the 8x8 IDCT |
| 7234 + * regarding the output size. |
| 7235 + * Furthermore, the scaling and descaling is the same for all IDCT sizes. |
| 7236 + * |
| 7237 + * CAUTION: We rely on the FIX() macro except for the N=1,2,4,8 cases |
| 7238 + * since there would be too many additional constants to pre-calculate. |
| 7239 */ |
| 7240 |
| 7241 #define JPEG_INTERNALS |
| 7242 @@ -38,7 +60,7 @@ |
| 7243 */ |
| 7244 |
| 7245 #if DCTSIZE != 8 |
| 7246 - Sorry, this code only copes with 8x8 DCTs. /* deliberate syntax err */ |
| 7247 + Sorry, this code only copes with 8x8 DCT blocks. /* deliberate syntax err */ |
| 7248 #endif |
| 7249 |
| 7250 |
| 7251 @@ -386,4 +408,2216 @@ |
| 7252 } |
| 7253 } |
| 7254 |
| 7255 +#ifdef IDCT_SCALING_SUPPORTED |
| 7256 + |
| 7257 + |
| 7258 +/* |
| 7259 + * Perform dequantization and inverse DCT on one block of coefficients, |
| 7260 + * producing a 7x7 output block. |
| 7261 + * |
| 7262 + * Optimized algorithm with 12 multiplications in the 1-D kernel. |
| 7263 + * cK represents sqrt(2) * cos(K*pi/14). |
| 7264 + */ |
| 7265 + |
| 7266 +GLOBAL(void) |
| 7267 +jpeg_idct_7x7 (j_decompress_ptr cinfo, jpeg_component_info * compptr, |
| 7268 + JCOEFPTR coef_block, |
| 7269 + JSAMPARRAY output_buf, JDIMENSION output_col) |
| 7270 +{ |
| 7271 + INT32 tmp0, tmp1, tmp2, tmp10, tmp11, tmp12, tmp13; |
| 7272 + INT32 z1, z2, z3; |
| 7273 + JCOEFPTR inptr; |
| 7274 + ISLOW_MULT_TYPE * quantptr; |
| 7275 + int * wsptr; |
| 7276 + JSAMPROW outptr; |
| 7277 + JSAMPLE *range_limit = IDCT_range_limit(cinfo); |
| 7278 + int ctr; |
| 7279 + int workspace[7*7]; /* buffers data between passes */ |
| 7280 + SHIFT_TEMPS |
| 7281 + |
| 7282 + /* Pass 1: process columns from input, store into work array. */ |
| 7283 + |
| 7284 + inptr = coef_block; |
| 7285 + quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table; |
| 7286 + wsptr = workspace; |
| 7287 + for (ctr = 0; ctr < 7; ctr++, inptr++, quantptr++, wsptr++) { |
| 7288 + /* Even part */ |
| 7289 + |
| 7290 + tmp13 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]); |
| 7291 + tmp13 <<= CONST_BITS; |
| 7292 + /* Add fudge factor here for final descale. */ |
| 7293 + tmp13 += ONE << (CONST_BITS-PASS1_BITS-1); |
| 7294 + |
| 7295 + z1 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]); |
| 7296 + z2 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]); |
| 7297 + z3 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]); |
| 7298 + |
| 7299 + tmp10 = MULTIPLY(z2 - z3, FIX(0.881747734)); /* c4 */ |
| 7300 + tmp12 = MULTIPLY(z1 - z2, FIX(0.314692123)); /* c6 */ |
| 7301 + tmp11 = tmp10 + tmp12 + tmp13 - MULTIPLY(z2, FIX(1.841218003)); /* c2+c4-c6 */ |
| 7302 + tmp0 = z1 + z3; |
| 7303 + z2 -= tmp0; |
| 7304 + tmp0 = MULTIPLY(tmp0, FIX(1.274162392)) + tmp13; /* c2 */ |
| 7305 + tmp10 += tmp0 - MULTIPLY(z3, FIX(0.077722536)); /* c2-c4-c6 */ |
| 7306 + tmp12 += tmp0 - MULTIPLY(z1, FIX(2.470602249)); /* c2+c4+c6 */ |
| 7307 + tmp13 += MULTIPLY(z2, FIX(1.414213562)); /* c0 */ |
| 7308 + |
| 7309 + /* Odd part */ |
| 7310 + |
| 7311 + z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]); |
| 7312 + z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]); |
| 7313 + z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]); |
| 7314 + |
| 7315 + tmp1 = MULTIPLY(z1 + z2, FIX(0.935414347)); /* (c3+c1-c5)/2 */ |
| 7316 + tmp2 = MULTIPLY(z1 - z2, FIX(0.170262339)); /* (c3+c5-c1)/2 */ |
| 7317 + tmp0 = tmp1 - tmp2; |
| 7318 + tmp1 += tmp2; |
| 7319 + tmp2 = MULTIPLY(z2 + z3, - FIX(1.378756276)); /* -c1 */ |
| 7320 + tmp1 += tmp2; |
| 7321 + z2 = MULTIPLY(z1 + z3, FIX(0.613604268)); /* c5 */ |
| 7322 + tmp0 += z2; |
| 7323 + tmp2 += z2 + MULTIPLY(z3, FIX(1.870828693)); /* c3+c1-c5 */ |
| 7324 + |
| 7325 + /* Final output stage */ |
| 7326 + |
| 7327 + wsptr[7*0] = (int) RIGHT_SHIFT(tmp10 + tmp0, CONST_BITS-PASS1_BITS); |
| 7328 + wsptr[7*6] = (int) RIGHT_SHIFT(tmp10 - tmp0, CONST_BITS-PASS1_BITS); |
| 7329 + wsptr[7*1] = (int) RIGHT_SHIFT(tmp11 + tmp1, CONST_BITS-PASS1_BITS); |
| 7330 + wsptr[7*5] = (int) RIGHT_SHIFT(tmp11 - tmp1, CONST_BITS-PASS1_BITS); |
| 7331 + wsptr[7*2] = (int) RIGHT_SHIFT(tmp12 + tmp2, CONST_BITS-PASS1_BITS); |
| 7332 + wsptr[7*4] = (int) RIGHT_SHIFT(tmp12 - tmp2, CONST_BITS-PASS1_BITS); |
| 7333 + wsptr[7*3] = (int) RIGHT_SHIFT(tmp13, CONST_BITS-PASS1_BITS); |
| 7334 + } |
| 7335 + |
| 7336 + /* Pass 2: process 7 rows from work array, store into output array. */ |
| 7337 + |
| 7338 + wsptr = workspace; |
| 7339 + for (ctr = 0; ctr < 7; ctr++) { |
| 7340 + outptr = output_buf[ctr] + output_col; |
| 7341 + |
| 7342 + /* Even part */ |
| 7343 + |
| 7344 + /* Add fudge factor here for final descale. */ |
| 7345 + tmp13 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2)); |
| 7346 + tmp13 <<= CONST_BITS; |
| 7347 + |
| 7348 + z1 = (INT32) wsptr[2]; |
| 7349 + z2 = (INT32) wsptr[4]; |
| 7350 + z3 = (INT32) wsptr[6]; |
| 7351 + |
| 7352 + tmp10 = MULTIPLY(z2 - z3, FIX(0.881747734)); /* c4 */ |
| 7353 + tmp12 = MULTIPLY(z1 - z2, FIX(0.314692123)); /* c6 */ |
| 7354 + tmp11 = tmp10 + tmp12 + tmp13 - MULTIPLY(z2, FIX(1.841218003)); /* c2+c4-c6 */ |
| 7355 + tmp0 = z1 + z3; |
| 7356 + z2 -= tmp0; |
| 7357 + tmp0 = MULTIPLY(tmp0, FIX(1.274162392)) + tmp13; /* c2 */ |
| 7358 + tmp10 += tmp0 - MULTIPLY(z3, FIX(0.077722536)); /* c2-c4-c6 */ |
| 7359 + tmp12 += tmp0 - MULTIPLY(z1, FIX(2.470602249)); /* c2+c4+c6 */ |
| 7360 + tmp13 += MULTIPLY(z2, FIX(1.414213562)); /* c0 */ |
| 7361 + |
| 7362 + /* Odd part */ |
| 7363 + |
| 7364 + z1 = (INT32) wsptr[1]; |
| 7365 + z2 = (INT32) wsptr[3]; |
| 7366 + z3 = (INT32) wsptr[5]; |
| 7367 + |
| 7368 + tmp1 = MULTIPLY(z1 + z2, FIX(0.935414347)); /* (c3+c1-c5)/2 */ |
| 7369 + tmp2 = MULTIPLY(z1 - z2, FIX(0.170262339)); /* (c3+c5-c1)/2 */ |
| 7370 + tmp0 = tmp1 - tmp2; |
| 7371 + tmp1 += tmp2; |
| 7372 + tmp2 = MULTIPLY(z2 + z3, - FIX(1.378756276)); /* -c1 */ |
| 7373 + tmp1 += tmp2; |
| 7374 + z2 = MULTIPLY(z1 + z3, FIX(0.613604268)); /* c5 */ |
| 7375 + tmp0 += z2; |
| 7376 + tmp2 += z2 + MULTIPLY(z3, FIX(1.870828693)); /* c3+c1-c5 */ |
| 7377 + |
| 7378 + /* Final output stage */ |
| 7379 + |
| 7380 + outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp0, |
| 7381 + CONST_BITS+PASS1_BITS+3) |
| 7382 + & RANGE_MASK]; |
| 7383 + outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp0, |
| 7384 + CONST_BITS+PASS1_BITS+3) |
| 7385 + & RANGE_MASK]; |
| 7386 + outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp11 + tmp1, |
| 7387 + CONST_BITS+PASS1_BITS+3) |
| 7388 + & RANGE_MASK]; |
| 7389 + outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp11 - tmp1, |
| 7390 + CONST_BITS+PASS1_BITS+3) |
| 7391 + & RANGE_MASK]; |
| 7392 + outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp12 + tmp2, |
| 7393 + CONST_BITS+PASS1_BITS+3) |
| 7394 + & RANGE_MASK]; |
| 7395 + outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp12 - tmp2, |
| 7396 + CONST_BITS+PASS1_BITS+3) |
| 7397 + & RANGE_MASK]; |
| 7398 + outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp13, |
| 7399 + CONST_BITS+PASS1_BITS+3) |
| 7400 + & RANGE_MASK]; |
| 7401 + |
| 7402 + wsptr += 7; /* advance pointer to next row */ |
| 7403 + } |
| 7404 +} |
| 7405 + |
| 7406 + |
| 7407 +/* |
| 7408 + * Perform dequantization and inverse DCT on one block of coefficients, |
| 7409 + * producing a reduced-size 6x6 output block. |
| 7410 + * |
| 7411 + * Optimized algorithm with 3 multiplications in the 1-D kernel. |
| 7412 + * cK represents sqrt(2) * cos(K*pi/12). |
| 7413 + */ |
| 7414 + |
| 7415 +GLOBAL(void) |
| 7416 +jpeg_idct_6x6 (j_decompress_ptr cinfo, jpeg_component_info * compptr, |
| 7417 + JCOEFPTR coef_block, |
| 7418 + JSAMPARRAY output_buf, JDIMENSION output_col) |
| 7419 +{ |
| 7420 + INT32 tmp0, tmp1, tmp2, tmp10, tmp11, tmp12; |
| 7421 + INT32 z1, z2, z3; |
| 7422 + JCOEFPTR inptr; |
| 7423 + ISLOW_MULT_TYPE * quantptr; |
| 7424 + int * wsptr; |
| 7425 + JSAMPROW outptr; |
| 7426 + JSAMPLE *range_limit = IDCT_range_limit(cinfo); |
| 7427 + int ctr; |
| 7428 + int workspace[6*6]; /* buffers data between passes */ |
| 7429 + SHIFT_TEMPS |
| 7430 + |
| 7431 + /* Pass 1: process columns from input, store into work array. */ |
| 7432 + |
| 7433 + inptr = coef_block; |
| 7434 + quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table; |
| 7435 + wsptr = workspace; |
| 7436 + for (ctr = 0; ctr < 6; ctr++, inptr++, quantptr++, wsptr++) { |
| 7437 + /* Even part */ |
| 7438 + |
| 7439 + tmp0 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]); |
| 7440 + tmp0 <<= CONST_BITS; |
| 7441 + /* Add fudge factor here for final descale. */ |
| 7442 + tmp0 += ONE << (CONST_BITS-PASS1_BITS-1); |
| 7443 + tmp2 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]); |
| 7444 + tmp10 = MULTIPLY(tmp2, FIX(0.707106781)); /* c4 */ |
| 7445 + tmp1 = tmp0 + tmp10; |
| 7446 + tmp11 = RIGHT_SHIFT(tmp0 - tmp10 - tmp10, CONST_BITS-PASS1_BITS); |
| 7447 + tmp10 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]); |
| 7448 + tmp0 = MULTIPLY(tmp10, FIX(1.224744871)); /* c2 */ |
| 7449 + tmp10 = tmp1 + tmp0; |
| 7450 + tmp12 = tmp1 - tmp0; |
| 7451 + |
| 7452 + /* Odd part */ |
| 7453 + |
| 7454 + z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]); |
| 7455 + z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]); |
| 7456 + z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]); |
| 7457 + tmp1 = MULTIPLY(z1 + z3, FIX(0.366025404)); /* c5 */ |
| 7458 + tmp0 = tmp1 + ((z1 + z2) << CONST_BITS); |
| 7459 + tmp2 = tmp1 + ((z3 - z2) << CONST_BITS); |
| 7460 + tmp1 = (z1 - z2 - z3) << PASS1_BITS; |
| 7461 + |
| 7462 + /* Final output stage */ |
| 7463 + |
| 7464 + wsptr[6*0] = (int) RIGHT_SHIFT(tmp10 + tmp0, CONST_BITS-PASS1_BITS); |
| 7465 + wsptr[6*5] = (int) RIGHT_SHIFT(tmp10 - tmp0, CONST_BITS-PASS1_BITS); |
| 7466 + wsptr[6*1] = (int) (tmp11 + tmp1); |
| 7467 + wsptr[6*4] = (int) (tmp11 - tmp1); |
| 7468 + wsptr[6*2] = (int) RIGHT_SHIFT(tmp12 + tmp2, CONST_BITS-PASS1_BITS); |
| 7469 + wsptr[6*3] = (int) RIGHT_SHIFT(tmp12 - tmp2, CONST_BITS-PASS1_BITS); |
| 7470 + } |
| 7471 + |
| 7472 + /* Pass 2: process 6 rows from work array, store into output array. */ |
| 7473 + |
| 7474 + wsptr = workspace; |
| 7475 + for (ctr = 0; ctr < 6; ctr++) { |
| 7476 + outptr = output_buf[ctr] + output_col; |
| 7477 + |
| 7478 + /* Even part */ |
| 7479 + |
| 7480 + /* Add fudge factor here for final descale. */ |
| 7481 + tmp0 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2)); |
| 7482 + tmp0 <<= CONST_BITS; |
| 7483 + tmp2 = (INT32) wsptr[4]; |
| 7484 + tmp10 = MULTIPLY(tmp2, FIX(0.707106781)); /* c4 */ |
| 7485 + tmp1 = tmp0 + tmp10; |
| 7486 + tmp11 = tmp0 - tmp10 - tmp10; |
| 7487 + tmp10 = (INT32) wsptr[2]; |
| 7488 + tmp0 = MULTIPLY(tmp10, FIX(1.224744871)); /* c2 */ |
| 7489 + tmp10 = tmp1 + tmp0; |
| 7490 + tmp12 = tmp1 - tmp0; |
| 7491 + |
| 7492 + /* Odd part */ |
| 7493 + |
| 7494 + z1 = (INT32) wsptr[1]; |
| 7495 + z2 = (INT32) wsptr[3]; |
| 7496 + z3 = (INT32) wsptr[5]; |
| 7497 + tmp1 = MULTIPLY(z1 + z3, FIX(0.366025404)); /* c5 */ |
| 7498 + tmp0 = tmp1 + ((z1 + z2) << CONST_BITS); |
| 7499 + tmp2 = tmp1 + ((z3 - z2) << CONST_BITS); |
| 7500 + tmp1 = (z1 - z2 - z3) << CONST_BITS; |
| 7501 + |
| 7502 + /* Final output stage */ |
| 7503 + |
| 7504 + outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp0, |
| 7505 + CONST_BITS+PASS1_BITS+3) |
| 7506 + & RANGE_MASK]; |
| 7507 + outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp0, |
| 7508 + CONST_BITS+PASS1_BITS+3) |
| 7509 + & RANGE_MASK]; |
| 7510 + outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp11 + tmp1, |
| 7511 + CONST_BITS+PASS1_BITS+3) |
| 7512 + & RANGE_MASK]; |
| 7513 + outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp11 - tmp1, |
| 7514 + CONST_BITS+PASS1_BITS+3) |
| 7515 + & RANGE_MASK]; |
| 7516 + outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp12 + tmp2, |
| 7517 + CONST_BITS+PASS1_BITS+3) |
| 7518 + & RANGE_MASK]; |
| 7519 + outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp12 - tmp2, |
| 7520 + CONST_BITS+PASS1_BITS+3) |
| 7521 + & RANGE_MASK]; |
| 7522 + |
| 7523 + wsptr += 6; /* advance pointer to next row */ |
| 7524 + } |
| 7525 +} |
| 7526 + |
| 7527 + |
| 7528 +/* |
| 7529 + * Perform dequantization and inverse DCT on one block of coefficients, |
| 7530 + * producing a reduced-size 5x5 output block. |
| 7531 + * |
| 7532 + * Optimized algorithm with 5 multiplications in the 1-D kernel. |
| 7533 + * cK represents sqrt(2) * cos(K*pi/10). |
| 7534 + */ |
| 7535 + |
| 7536 +GLOBAL(void) |
| 7537 +jpeg_idct_5x5 (j_decompress_ptr cinfo, jpeg_component_info * compptr, |
| 7538 + JCOEFPTR coef_block, |
| 7539 + JSAMPARRAY output_buf, JDIMENSION output_col) |
| 7540 +{ |
| 7541 + INT32 tmp0, tmp1, tmp10, tmp11, tmp12; |
| 7542 + INT32 z1, z2, z3; |
| 7543 + JCOEFPTR inptr; |
| 7544 + ISLOW_MULT_TYPE * quantptr; |
| 7545 + int * wsptr; |
| 7546 + JSAMPROW outptr; |
| 7547 + JSAMPLE *range_limit = IDCT_range_limit(cinfo); |
| 7548 + int ctr; |
| 7549 + int workspace[5*5]; /* buffers data between passes */ |
| 7550 + SHIFT_TEMPS |
| 7551 + |
| 7552 + /* Pass 1: process columns from input, store into work array. */ |
| 7553 + |
| 7554 + inptr = coef_block; |
| 7555 + quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table; |
| 7556 + wsptr = workspace; |
| 7557 + for (ctr = 0; ctr < 5; ctr++, inptr++, quantptr++, wsptr++) { |
| 7558 + /* Even part */ |
| 7559 + |
| 7560 + tmp12 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]); |
| 7561 + tmp12 <<= CONST_BITS; |
| 7562 + /* Add fudge factor here for final descale. */ |
| 7563 + tmp12 += ONE << (CONST_BITS-PASS1_BITS-1); |
| 7564 + tmp0 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]); |
| 7565 + tmp1 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]); |
| 7566 + z1 = MULTIPLY(tmp0 + tmp1, FIX(0.790569415)); /* (c2+c4)/2 */ |
| 7567 + z2 = MULTIPLY(tmp0 - tmp1, FIX(0.353553391)); /* (c2-c4)/2 */ |
| 7568 + z3 = tmp12 + z2; |
| 7569 + tmp10 = z3 + z1; |
| 7570 + tmp11 = z3 - z1; |
| 7571 + tmp12 -= z2 << 2; |
| 7572 + |
| 7573 + /* Odd part */ |
| 7574 + |
| 7575 + z2 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]); |
| 7576 + z3 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]); |
| 7577 + |
| 7578 + z1 = MULTIPLY(z2 + z3, FIX(0.831253876)); /* c3 */ |
| 7579 + tmp0 = z1 + MULTIPLY(z2, FIX(0.513743148)); /* c1-c3 */ |
| 7580 + tmp1 = z1 - MULTIPLY(z3, FIX(2.176250899)); /* c1+c3 */ |
| 7581 + |
| 7582 + /* Final output stage */ |
| 7583 + |
| 7584 + wsptr[5*0] = (int) RIGHT_SHIFT(tmp10 + tmp0, CONST_BITS-PASS1_BITS); |
| 7585 + wsptr[5*4] = (int) RIGHT_SHIFT(tmp10 - tmp0, CONST_BITS-PASS1_BITS); |
| 7586 + wsptr[5*1] = (int) RIGHT_SHIFT(tmp11 + tmp1, CONST_BITS-PASS1_BITS); |
| 7587 + wsptr[5*3] = (int) RIGHT_SHIFT(tmp11 - tmp1, CONST_BITS-PASS1_BITS); |
| 7588 + wsptr[5*2] = (int) RIGHT_SHIFT(tmp12, CONST_BITS-PASS1_BITS); |
| 7589 + } |
| 7590 + |
| 7591 + /* Pass 2: process 5 rows from work array, store into output array. */ |
| 7592 + |
| 7593 + wsptr = workspace; |
| 7594 + for (ctr = 0; ctr < 5; ctr++) { |
| 7595 + outptr = output_buf[ctr] + output_col; |
| 7596 + |
| 7597 + /* Even part */ |
| 7598 + |
| 7599 + /* Add fudge factor here for final descale. */ |
| 7600 + tmp12 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2)); |
| 7601 + tmp12 <<= CONST_BITS; |
| 7602 + tmp0 = (INT32) wsptr[2]; |
| 7603 + tmp1 = (INT32) wsptr[4]; |
| 7604 + z1 = MULTIPLY(tmp0 + tmp1, FIX(0.790569415)); /* (c2+c4)/2 */ |
| 7605 + z2 = MULTIPLY(tmp0 - tmp1, FIX(0.353553391)); /* (c2-c4)/2 */ |
| 7606 + z3 = tmp12 + z2; |
| 7607 + tmp10 = z3 + z1; |
| 7608 + tmp11 = z3 - z1; |
| 7609 + tmp12 -= z2 << 2; |
| 7610 + |
| 7611 + /* Odd part */ |
| 7612 + |
| 7613 + z2 = (INT32) wsptr[1]; |
| 7614 + z3 = (INT32) wsptr[3]; |
| 7615 + |
| 7616 + z1 = MULTIPLY(z2 + z3, FIX(0.831253876)); /* c3 */ |
| 7617 + tmp0 = z1 + MULTIPLY(z2, FIX(0.513743148)); /* c1-c3 */ |
| 7618 + tmp1 = z1 - MULTIPLY(z3, FIX(2.176250899)); /* c1+c3 */ |
| 7619 + |
| 7620 + /* Final output stage */ |
| 7621 + |
| 7622 + outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp0, |
| 7623 + CONST_BITS+PASS1_BITS+3) |
| 7624 + & RANGE_MASK]; |
| 7625 + outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp0, |
| 7626 + CONST_BITS+PASS1_BITS+3) |
| 7627 + & RANGE_MASK]; |
| 7628 + outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp11 + tmp1, |
| 7629 + CONST_BITS+PASS1_BITS+3) |
| 7630 + & RANGE_MASK]; |
| 7631 + outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp11 - tmp1, |
| 7632 + CONST_BITS+PASS1_BITS+3) |
| 7633 + & RANGE_MASK]; |
| 7634 + outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp12, |
| 7635 + CONST_BITS+PASS1_BITS+3) |
| 7636 + & RANGE_MASK]; |
| 7637 + |
| 7638 + wsptr += 5; /* advance pointer to next row */ |
| 7639 + } |
| 7640 +} |
| 7641 + |
| 7642 + |
| 7643 +/* |
| 7644 + * Perform dequantization and inverse DCT on one block of coefficients, |
| 7645 + * producing a reduced-size 3x3 output block. |
| 7646 + * |
| 7647 + * Optimized algorithm with 2 multiplications in the 1-D kernel. |
| 7648 + * cK represents sqrt(2) * cos(K*pi/6). |
| 7649 + */ |
| 7650 + |
| 7651 +GLOBAL(void) |
| 7652 +jpeg_idct_3x3 (j_decompress_ptr cinfo, jpeg_component_info * compptr, |
| 7653 + JCOEFPTR coef_block, |
| 7654 + JSAMPARRAY output_buf, JDIMENSION output_col) |
| 7655 +{ |
| 7656 + INT32 tmp0, tmp2, tmp10, tmp12; |
| 7657 + JCOEFPTR inptr; |
| 7658 + ISLOW_MULT_TYPE * quantptr; |
| 7659 + int * wsptr; |
| 7660 + JSAMPROW outptr; |
| 7661 + JSAMPLE *range_limit = IDCT_range_limit(cinfo); |
| 7662 + int ctr; |
| 7663 + int workspace[3*3]; /* buffers data between passes */ |
| 7664 + SHIFT_TEMPS |
| 7665 + |
| 7666 + /* Pass 1: process columns from input, store into work array. */ |
| 7667 + |
| 7668 + inptr = coef_block; |
| 7669 + quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table; |
| 7670 + wsptr = workspace; |
| 7671 + for (ctr = 0; ctr < 3; ctr++, inptr++, quantptr++, wsptr++) { |
| 7672 + /* Even part */ |
| 7673 + |
| 7674 + tmp0 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]); |
| 7675 + tmp0 <<= CONST_BITS; |
| 7676 + /* Add fudge factor here for final descale. */ |
| 7677 + tmp0 += ONE << (CONST_BITS-PASS1_BITS-1); |
| 7678 + tmp2 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]); |
| 7679 + tmp12 = MULTIPLY(tmp2, FIX(0.707106781)); /* c2 */ |
| 7680 + tmp10 = tmp0 + tmp12; |
| 7681 + tmp2 = tmp0 - tmp12 - tmp12; |
| 7682 + |
| 7683 + /* Odd part */ |
| 7684 + |
| 7685 + tmp12 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]); |
| 7686 + tmp0 = MULTIPLY(tmp12, FIX(1.224744871)); /* c1 */ |
| 7687 + |
| 7688 + /* Final output stage */ |
| 7689 + |
| 7690 + wsptr[3*0] = (int) RIGHT_SHIFT(tmp10 + tmp0, CONST_BITS-PASS1_BITS); |
| 7691 + wsptr[3*2] = (int) RIGHT_SHIFT(tmp10 - tmp0, CONST_BITS-PASS1_BITS); |
| 7692 + wsptr[3*1] = (int) RIGHT_SHIFT(tmp2, CONST_BITS-PASS1_BITS); |
| 7693 + } |
| 7694 + |
| 7695 + /* Pass 2: process 3 rows from work array, store into output array. */ |
| 7696 + |
| 7697 + wsptr = workspace; |
| 7698 + for (ctr = 0; ctr < 3; ctr++) { |
| 7699 + outptr = output_buf[ctr] + output_col; |
| 7700 + |
| 7701 + /* Even part */ |
| 7702 + |
| 7703 + /* Add fudge factor here for final descale. */ |
| 7704 + tmp0 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2)); |
| 7705 + tmp0 <<= CONST_BITS; |
| 7706 + tmp2 = (INT32) wsptr[2]; |
| 7707 + tmp12 = MULTIPLY(tmp2, FIX(0.707106781)); /* c2 */ |
| 7708 + tmp10 = tmp0 + tmp12; |
| 7709 + tmp2 = tmp0 - tmp12 - tmp12; |
| 7710 + |
| 7711 + /* Odd part */ |
| 7712 + |
| 7713 + tmp12 = (INT32) wsptr[1]; |
| 7714 + tmp0 = MULTIPLY(tmp12, FIX(1.224744871)); /* c1 */ |
| 7715 + |
| 7716 + /* Final output stage */ |
| 7717 + |
| 7718 + outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp0, |
| 7719 + CONST_BITS+PASS1_BITS+3) |
| 7720 + & RANGE_MASK]; |
| 7721 + outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp0, |
| 7722 + CONST_BITS+PASS1_BITS+3) |
| 7723 + & RANGE_MASK]; |
| 7724 + outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp2, |
| 7725 + CONST_BITS+PASS1_BITS+3) |
| 7726 + & RANGE_MASK]; |
| 7727 + |
| 7728 + wsptr += 3; /* advance pointer to next row */ |
| 7729 + } |
| 7730 +} |
| 7731 + |
| 7732 + |
| 7733 +/* |
| 7734 + * Perform dequantization and inverse DCT on one block of coefficients, |
| 7735 + * producing a 9x9 output block. |
| 7736 + * |
| 7737 + * Optimized algorithm with 10 multiplications in the 1-D kernel. |
| 7738 + * cK represents sqrt(2) * cos(K*pi/18). |
| 7739 + */ |
| 7740 + |
| 7741 +GLOBAL(void) |
| 7742 +jpeg_idct_9x9 (j_decompress_ptr cinfo, jpeg_component_info * compptr, |
| 7743 + JCOEFPTR coef_block, |
| 7744 + JSAMPARRAY output_buf, JDIMENSION output_col) |
| 7745 +{ |
| 7746 + INT32 tmp0, tmp1, tmp2, tmp3, tmp10, tmp11, tmp12, tmp13, tmp14; |
| 7747 + INT32 z1, z2, z3, z4; |
| 7748 + JCOEFPTR inptr; |
| 7749 + ISLOW_MULT_TYPE * quantptr; |
| 7750 + int * wsptr; |
| 7751 + JSAMPROW outptr; |
| 7752 + JSAMPLE *range_limit = IDCT_range_limit(cinfo); |
| 7753 + int ctr; |
| 7754 + int workspace[8*9]; /* buffers data between passes */ |
| 7755 + SHIFT_TEMPS |
| 7756 + |
| 7757 + /* Pass 1: process columns from input, store into work array. */ |
| 7758 + |
| 7759 + inptr = coef_block; |
| 7760 + quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table; |
| 7761 + wsptr = workspace; |
| 7762 + for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) { |
| 7763 + /* Even part */ |
| 7764 + |
| 7765 + tmp0 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]); |
| 7766 + tmp0 <<= CONST_BITS; |
| 7767 + /* Add fudge factor here for final descale. */ |
| 7768 + tmp0 += ONE << (CONST_BITS-PASS1_BITS-1); |
| 7769 + |
| 7770 + z1 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]); |
| 7771 + z2 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]); |
| 7772 + z3 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]); |
| 7773 + |
| 7774 + tmp3 = MULTIPLY(z3, FIX(0.707106781)); /* c6 */ |
| 7775 + tmp1 = tmp0 + tmp3; |
| 7776 + tmp2 = tmp0 - tmp3 - tmp3; |
| 7777 + |
| 7778 + tmp0 = MULTIPLY(z1 - z2, FIX(0.707106781)); /* c6 */ |
| 7779 + tmp11 = tmp2 + tmp0; |
| 7780 + tmp14 = tmp2 - tmp0 - tmp0; |
| 7781 + |
| 7782 + tmp0 = MULTIPLY(z1 + z2, FIX(1.328926049)); /* c2 */ |
| 7783 + tmp2 = MULTIPLY(z1, FIX(1.083350441)); /* c4 */ |
| 7784 + tmp3 = MULTIPLY(z2, FIX(0.245575608)); /* c8 */ |
| 7785 + |
| 7786 + tmp10 = tmp1 + tmp0 - tmp3; |
| 7787 + tmp12 = tmp1 - tmp0 + tmp2; |
| 7788 + tmp13 = tmp1 - tmp2 + tmp3; |
| 7789 + |
| 7790 + /* Odd part */ |
| 7791 + |
| 7792 + z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]); |
| 7793 + z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]); |
| 7794 + z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]); |
| 7795 + z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]); |
| 7796 + |
| 7797 + z2 = MULTIPLY(z2, - FIX(1.224744871)); /* -c3 */ |
| 7798 + |
| 7799 + tmp2 = MULTIPLY(z1 + z3, FIX(0.909038955)); /* c5 */ |
| 7800 + tmp3 = MULTIPLY(z1 + z4, FIX(0.483689525)); /* c7 */ |
| 7801 + tmp0 = tmp2 + tmp3 - z2; |
| 7802 + tmp1 = MULTIPLY(z3 - z4, FIX(1.392728481)); /* c1 */ |
| 7803 + tmp2 += z2 - tmp1; |
| 7804 + tmp3 += z2 + tmp1; |
| 7805 + tmp1 = MULTIPLY(z1 - z3 - z4, FIX(1.224744871)); /* c3 */ |
| 7806 + |
| 7807 + /* Final output stage */ |
| 7808 + |
| 7809 + wsptr[8*0] = (int) RIGHT_SHIFT(tmp10 + tmp0, CONST_BITS-PASS1_BITS); |
| 7810 + wsptr[8*8] = (int) RIGHT_SHIFT(tmp10 - tmp0, CONST_BITS-PASS1_BITS); |
| 7811 + wsptr[8*1] = (int) RIGHT_SHIFT(tmp11 + tmp1, CONST_BITS-PASS1_BITS); |
| 7812 + wsptr[8*7] = (int) RIGHT_SHIFT(tmp11 - tmp1, CONST_BITS-PASS1_BITS); |
| 7813 + wsptr[8*2] = (int) RIGHT_SHIFT(tmp12 + tmp2, CONST_BITS-PASS1_BITS); |
| 7814 + wsptr[8*6] = (int) RIGHT_SHIFT(tmp12 - tmp2, CONST_BITS-PASS1_BITS); |
| 7815 + wsptr[8*3] = (int) RIGHT_SHIFT(tmp13 + tmp3, CONST_BITS-PASS1_BITS); |
| 7816 + wsptr[8*5] = (int) RIGHT_SHIFT(tmp13 - tmp3, CONST_BITS-PASS1_BITS); |
| 7817 + wsptr[8*4] = (int) RIGHT_SHIFT(tmp14, CONST_BITS-PASS1_BITS); |
| 7818 + } |
| 7819 + |
| 7820 + /* Pass 2: process 9 rows from work array, store into output array. */ |
| 7821 + |
| 7822 + wsptr = workspace; |
| 7823 + for (ctr = 0; ctr < 9; ctr++) { |
| 7824 + outptr = output_buf[ctr] + output_col; |
| 7825 + |
| 7826 + /* Even part */ |
| 7827 + |
| 7828 + /* Add fudge factor here for final descale. */ |
| 7829 + tmp0 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2)); |
| 7830 + tmp0 <<= CONST_BITS; |
| 7831 + |
| 7832 + z1 = (INT32) wsptr[2]; |
| 7833 + z2 = (INT32) wsptr[4]; |
| 7834 + z3 = (INT32) wsptr[6]; |
| 7835 + |
| 7836 + tmp3 = MULTIPLY(z3, FIX(0.707106781)); /* c6 */ |
| 7837 + tmp1 = tmp0 + tmp3; |
| 7838 + tmp2 = tmp0 - tmp3 - tmp3; |
| 7839 + |
| 7840 + tmp0 = MULTIPLY(z1 - z2, FIX(0.707106781)); /* c6 */ |
| 7841 + tmp11 = tmp2 + tmp0; |
| 7842 + tmp14 = tmp2 - tmp0 - tmp0; |
| 7843 + |
| 7844 + tmp0 = MULTIPLY(z1 + z2, FIX(1.328926049)); /* c2 */ |
| 7845 + tmp2 = MULTIPLY(z1, FIX(1.083350441)); /* c4 */ |
| 7846 + tmp3 = MULTIPLY(z2, FIX(0.245575608)); /* c8 */ |
| 7847 + |
| 7848 + tmp10 = tmp1 + tmp0 - tmp3; |
| 7849 + tmp12 = tmp1 - tmp0 + tmp2; |
| 7850 + tmp13 = tmp1 - tmp2 + tmp3; |
| 7851 + |
| 7852 + /* Odd part */ |
| 7853 + |
| 7854 + z1 = (INT32) wsptr[1]; |
| 7855 + z2 = (INT32) wsptr[3]; |
| 7856 + z3 = (INT32) wsptr[5]; |
| 7857 + z4 = (INT32) wsptr[7]; |
| 7858 + |
| 7859 + z2 = MULTIPLY(z2, - FIX(1.224744871)); /* -c3 */ |
| 7860 + |
| 7861 + tmp2 = MULTIPLY(z1 + z3, FIX(0.909038955)); /* c5 */ |
| 7862 + tmp3 = MULTIPLY(z1 + z4, FIX(0.483689525)); /* c7 */ |
| 7863 + tmp0 = tmp2 + tmp3 - z2; |
| 7864 + tmp1 = MULTIPLY(z3 - z4, FIX(1.392728481)); /* c1 */ |
| 7865 + tmp2 += z2 - tmp1; |
| 7866 + tmp3 += z2 + tmp1; |
| 7867 + tmp1 = MULTIPLY(z1 - z3 - z4, FIX(1.224744871)); /* c3 */ |
| 7868 + |
| 7869 + /* Final output stage */ |
| 7870 + |
| 7871 + outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp10 + tmp0, |
| 7872 + CONST_BITS+PASS1_BITS+3) |
| 7873 + & RANGE_MASK]; |
| 7874 + outptr[8] = range_limit[(int) RIGHT_SHIFT(tmp10 - tmp0, |
| 7875 + CONST_BITS+PASS1_BITS+3) |
| 7876 + & RANGE_MASK]; |
| 7877 + outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp11 + tmp1, |
| 7878 + CONST_BITS+PASS1_BITS+3) |
| 7879 + & RANGE_MASK]; |
| 7880 + outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp11 - tmp1, |
| 7881 + CONST_BITS+PASS1_BITS+3) |
| 7882 + & RANGE_MASK]; |
| 7883 + outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp12 + tmp2, |
| 7884 + CONST_BITS+PASS1_BITS+3) |
| 7885 + & RANGE_MASK]; |
| 7886 + outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp12 - tmp2, |
| 7887 + CONST_BITS+PASS1_BITS+3) |
| 7888 + & RANGE_MASK]; |
| 7889 + outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp13 + tmp3, |
| 7890 + CONST_BITS+PASS1_BITS+3) |
| 7891 + & RANGE_MASK]; |
| 7892 + outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp13 - tmp3, |
| 7893 + CONST_BITS+PASS1_BITS+3) |
| 7894 + & RANGE_MASK]; |
| 7895 + outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp14, |
| 7896 + CONST_BITS+PASS1_BITS+3) |
| 7897 + & RANGE_MASK]; |
| 7898 + |
| 7899 + wsptr += 8; /* advance pointer to next row */ |
| 7900 + } |
| 7901 +} |
| 7902 + |
| 7903 + |
| 7904 +/* |
| 7905 + * Perform dequantization and inverse DCT on one block of coefficients, |
| 7906 + * producing a 10x10 output block. |
| 7907 + * |
| 7908 + * Optimized algorithm with 12 multiplications in the 1-D kernel. |
| 7909 + * cK represents sqrt(2) * cos(K*pi/20). |
| 7910 + */ |
| 7911 + |
| 7912 +GLOBAL(void) |
| 7913 +jpeg_idct_10x10 (j_decompress_ptr cinfo, jpeg_component_info * compptr, |
| 7914 + JCOEFPTR coef_block, |
| 7915 + JSAMPARRAY output_buf, JDIMENSION output_col) |
| 7916 +{ |
| 7917 + INT32 tmp10, tmp11, tmp12, tmp13, tmp14; |
| 7918 + INT32 tmp20, tmp21, tmp22, tmp23, tmp24; |
| 7919 + INT32 z1, z2, z3, z4, z5; |
| 7920 + JCOEFPTR inptr; |
| 7921 + ISLOW_MULT_TYPE * quantptr; |
| 7922 + int * wsptr; |
| 7923 + JSAMPROW outptr; |
| 7924 + JSAMPLE *range_limit = IDCT_range_limit(cinfo); |
| 7925 + int ctr; |
| 7926 + int workspace[8*10]; /* buffers data between passes */ |
| 7927 + SHIFT_TEMPS |
| 7928 + |
| 7929 + /* Pass 1: process columns from input, store into work array. */ |
| 7930 + |
| 7931 + inptr = coef_block; |
| 7932 + quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table; |
| 7933 + wsptr = workspace; |
| 7934 + for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) { |
| 7935 + /* Even part */ |
| 7936 + |
| 7937 + z3 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]); |
| 7938 + z3 <<= CONST_BITS; |
| 7939 + /* Add fudge factor here for final descale. */ |
| 7940 + z3 += ONE << (CONST_BITS-PASS1_BITS-1); |
| 7941 + z4 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]); |
| 7942 + z1 = MULTIPLY(z4, FIX(1.144122806)); /* c4 */ |
| 7943 + z2 = MULTIPLY(z4, FIX(0.437016024)); /* c8 */ |
| 7944 + tmp10 = z3 + z1; |
| 7945 + tmp11 = z3 - z2; |
| 7946 + |
| 7947 + tmp22 = RIGHT_SHIFT(z3 - ((z1 - z2) << 1), /* c0 = (c4-c8)*2 */ |
| 7948 + CONST_BITS-PASS1_BITS); |
| 7949 + |
| 7950 + z2 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]); |
| 7951 + z3 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]); |
| 7952 + |
| 7953 + z1 = MULTIPLY(z2 + z3, FIX(0.831253876)); /* c6 */ |
| 7954 + tmp12 = z1 + MULTIPLY(z2, FIX(0.513743148)); /* c2-c6 */ |
| 7955 + tmp13 = z1 - MULTIPLY(z3, FIX(2.176250899)); /* c2+c6 */ |
| 7956 + |
| 7957 + tmp20 = tmp10 + tmp12; |
| 7958 + tmp24 = tmp10 - tmp12; |
| 7959 + tmp21 = tmp11 + tmp13; |
| 7960 + tmp23 = tmp11 - tmp13; |
| 7961 + |
| 7962 + /* Odd part */ |
| 7963 + |
| 7964 + z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]); |
| 7965 + z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]); |
| 7966 + z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]); |
| 7967 + z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]); |
| 7968 + |
| 7969 + tmp11 = z2 + z4; |
| 7970 + tmp13 = z2 - z4; |
| 7971 + |
| 7972 + tmp12 = MULTIPLY(tmp13, FIX(0.309016994)); /* (c3-c7)/2 */ |
| 7973 + z5 = z3 << CONST_BITS; |
| 7974 + |
| 7975 + z2 = MULTIPLY(tmp11, FIX(0.951056516)); /* (c3+c7)/2 */ |
| 7976 + z4 = z5 + tmp12; |
| 7977 + |
| 7978 + tmp10 = MULTIPLY(z1, FIX(1.396802247)) + z2 + z4; /* c1 */ |
| 7979 + tmp14 = MULTIPLY(z1, FIX(0.221231742)) - z2 + z4; /* c9 */ |
| 7980 + |
| 7981 + z2 = MULTIPLY(tmp11, FIX(0.587785252)); /* (c1-c9)/2 */ |
| 7982 + z4 = z5 - tmp12 - (tmp13 << (CONST_BITS - 1)); |
| 7983 + |
| 7984 + tmp12 = (z1 - tmp13 - z3) << PASS1_BITS; |
| 7985 + |
| 7986 + tmp11 = MULTIPLY(z1, FIX(1.260073511)) - z2 - z4; /* c3 */ |
| 7987 + tmp13 = MULTIPLY(z1, FIX(0.642039522)) - z2 + z4; /* c7 */ |
| 7988 + |
| 7989 + /* Final output stage */ |
| 7990 + |
| 7991 + wsptr[8*0] = (int) RIGHT_SHIFT(tmp20 + tmp10, CONST_BITS-PASS1_BITS); |
| 7992 + wsptr[8*9] = (int) RIGHT_SHIFT(tmp20 - tmp10, CONST_BITS-PASS1_BITS); |
| 7993 + wsptr[8*1] = (int) RIGHT_SHIFT(tmp21 + tmp11, CONST_BITS-PASS1_BITS); |
| 7994 + wsptr[8*8] = (int) RIGHT_SHIFT(tmp21 - tmp11, CONST_BITS-PASS1_BITS); |
| 7995 + wsptr[8*2] = (int) (tmp22 + tmp12); |
| 7996 + wsptr[8*7] = (int) (tmp22 - tmp12); |
| 7997 + wsptr[8*3] = (int) RIGHT_SHIFT(tmp23 + tmp13, CONST_BITS-PASS1_BITS); |
| 7998 + wsptr[8*6] = (int) RIGHT_SHIFT(tmp23 - tmp13, CONST_BITS-PASS1_BITS); |
| 7999 + wsptr[8*4] = (int) RIGHT_SHIFT(tmp24 + tmp14, CONST_BITS-PASS1_BITS); |
| 8000 + wsptr[8*5] = (int) RIGHT_SHIFT(tmp24 - tmp14, CONST_BITS-PASS1_BITS); |
| 8001 + } |
| 8002 + |
| 8003 + /* Pass 2: process 10 rows from work array, store into output array. */ |
| 8004 + |
| 8005 + wsptr = workspace; |
| 8006 + for (ctr = 0; ctr < 10; ctr++) { |
| 8007 + outptr = output_buf[ctr] + output_col; |
| 8008 + |
| 8009 + /* Even part */ |
| 8010 + |
| 8011 + /* Add fudge factor here for final descale. */ |
| 8012 + z3 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2)); |
| 8013 + z3 <<= CONST_BITS; |
| 8014 + z4 = (INT32) wsptr[4]; |
| 8015 + z1 = MULTIPLY(z4, FIX(1.144122806)); /* c4 */ |
| 8016 + z2 = MULTIPLY(z4, FIX(0.437016024)); /* c8 */ |
| 8017 + tmp10 = z3 + z1; |
| 8018 + tmp11 = z3 - z2; |
| 8019 + |
| 8020 + tmp22 = z3 - ((z1 - z2) << 1); /* c0 = (c4-c8)*2 */ |
| 8021 + |
| 8022 + z2 = (INT32) wsptr[2]; |
| 8023 + z3 = (INT32) wsptr[6]; |
| 8024 + |
| 8025 + z1 = MULTIPLY(z2 + z3, FIX(0.831253876)); /* c6 */ |
| 8026 + tmp12 = z1 + MULTIPLY(z2, FIX(0.513743148)); /* c2-c6 */ |
| 8027 + tmp13 = z1 - MULTIPLY(z3, FIX(2.176250899)); /* c2+c6 */ |
| 8028 + |
| 8029 + tmp20 = tmp10 + tmp12; |
| 8030 + tmp24 = tmp10 - tmp12; |
| 8031 + tmp21 = tmp11 + tmp13; |
| 8032 + tmp23 = tmp11 - tmp13; |
| 8033 + |
| 8034 + /* Odd part */ |
| 8035 + |
| 8036 + z1 = (INT32) wsptr[1]; |
| 8037 + z2 = (INT32) wsptr[3]; |
| 8038 + z3 = (INT32) wsptr[5]; |
| 8039 + z3 <<= CONST_BITS; |
| 8040 + z4 = (INT32) wsptr[7]; |
| 8041 + |
| 8042 + tmp11 = z2 + z4; |
| 8043 + tmp13 = z2 - z4; |
| 8044 + |
| 8045 + tmp12 = MULTIPLY(tmp13, FIX(0.309016994)); /* (c3-c7)/2 */ |
| 8046 + |
| 8047 + z2 = MULTIPLY(tmp11, FIX(0.951056516)); /* (c3+c7)/2 */ |
| 8048 + z4 = z3 + tmp12; |
| 8049 + |
| 8050 + tmp10 = MULTIPLY(z1, FIX(1.396802247)) + z2 + z4; /* c1 */ |
| 8051 + tmp14 = MULTIPLY(z1, FIX(0.221231742)) - z2 + z4; /* c9 */ |
| 8052 + |
| 8053 + z2 = MULTIPLY(tmp11, FIX(0.587785252)); /* (c1-c9)/2 */ |
| 8054 + z4 = z3 - tmp12 - (tmp13 << (CONST_BITS - 1)); |
| 8055 + |
| 8056 + tmp12 = ((z1 - tmp13) << CONST_BITS) - z3; |
| 8057 + |
| 8058 + tmp11 = MULTIPLY(z1, FIX(1.260073511)) - z2 - z4; /* c3 */ |
| 8059 + tmp13 = MULTIPLY(z1, FIX(0.642039522)) - z2 + z4; /* c7 */ |
| 8060 + |
| 8061 + /* Final output stage */ |
| 8062 + |
| 8063 + outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp10, |
| 8064 + CONST_BITS+PASS1_BITS+3) |
| 8065 + & RANGE_MASK]; |
| 8066 + outptr[9] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp10, |
| 8067 + CONST_BITS+PASS1_BITS+3) |
| 8068 + & RANGE_MASK]; |
| 8069 + outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp11, |
| 8070 + CONST_BITS+PASS1_BITS+3) |
| 8071 + & RANGE_MASK]; |
| 8072 + outptr[8] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp11, |
| 8073 + CONST_BITS+PASS1_BITS+3) |
| 8074 + & RANGE_MASK]; |
| 8075 + outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp12, |
| 8076 + CONST_BITS+PASS1_BITS+3) |
| 8077 + & RANGE_MASK]; |
| 8078 + outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp12, |
| 8079 + CONST_BITS+PASS1_BITS+3) |
| 8080 + & RANGE_MASK]; |
| 8081 + outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp13, |
| 8082 + CONST_BITS+PASS1_BITS+3) |
| 8083 + & RANGE_MASK]; |
| 8084 + outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp13, |
| 8085 + CONST_BITS+PASS1_BITS+3) |
| 8086 + & RANGE_MASK]; |
| 8087 + outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp14, |
| 8088 + CONST_BITS+PASS1_BITS+3) |
| 8089 + & RANGE_MASK]; |
| 8090 + outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp14, |
| 8091 + CONST_BITS+PASS1_BITS+3) |
| 8092 + & RANGE_MASK]; |
| 8093 + |
| 8094 + wsptr += 8; /* advance pointer to next row */ |
| 8095 + } |
| 8096 +} |
| 8097 + |
| 8098 + |
| 8099 +/* |
| 8100 + * Perform dequantization and inverse DCT on one block of coefficients, |
| 8101 + * producing an 11x11 output block. |
| 8102 + * |
| 8103 + * Optimized algorithm with 24 multiplications in the 1-D kernel. |
| 8104 + * cK represents sqrt(2) * cos(K*pi/22). |
| 8105 + */ |
| 8106 + |
| 8107 +GLOBAL(void) |
| 8108 +jpeg_idct_11x11 (j_decompress_ptr cinfo, jpeg_component_info * compptr, |
| 8109 + JCOEFPTR coef_block, |
| 8110 + JSAMPARRAY output_buf, JDIMENSION output_col) |
| 8111 +{ |
| 8112 + INT32 tmp10, tmp11, tmp12, tmp13, tmp14; |
| 8113 + INT32 tmp20, tmp21, tmp22, tmp23, tmp24, tmp25; |
| 8114 + INT32 z1, z2, z3, z4; |
| 8115 + JCOEFPTR inptr; |
| 8116 + ISLOW_MULT_TYPE * quantptr; |
| 8117 + int * wsptr; |
| 8118 + JSAMPROW outptr; |
| 8119 + JSAMPLE *range_limit = IDCT_range_limit(cinfo); |
| 8120 + int ctr; |
| 8121 + int workspace[8*11]; /* buffers data between passes */ |
| 8122 + SHIFT_TEMPS |
| 8123 + |
| 8124 + /* Pass 1: process columns from input, store into work array. */ |
| 8125 + |
| 8126 + inptr = coef_block; |
| 8127 + quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table; |
| 8128 + wsptr = workspace; |
| 8129 + for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) { |
| 8130 + /* Even part */ |
| 8131 + |
| 8132 + tmp10 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]); |
| 8133 + tmp10 <<= CONST_BITS; |
| 8134 + /* Add fudge factor here for final descale. */ |
| 8135 + tmp10 += ONE << (CONST_BITS-PASS1_BITS-1); |
| 8136 + |
| 8137 + z1 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]); |
| 8138 + z2 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]); |
| 8139 + z3 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]); |
| 8140 + |
| 8141 + tmp20 = MULTIPLY(z2 - z3, FIX(2.546640132)); /* c2+c4 */ |
| 8142 + tmp23 = MULTIPLY(z2 - z1, FIX(0.430815045)); /* c2-c6 */ |
| 8143 + z4 = z1 + z3; |
| 8144 + tmp24 = MULTIPLY(z4, - FIX(1.155664402)); /* -(c2-c10) */ |
| 8145 + z4 -= z2; |
| 8146 + tmp25 = tmp10 + MULTIPLY(z4, FIX(1.356927976)); /* c2 */ |
| 8147 + tmp21 = tmp20 + tmp23 + tmp25 - |
| 8148 + MULTIPLY(z2, FIX(1.821790775)); /* c2+c4+c10-c6 */ |
| 8149 + tmp20 += tmp25 + MULTIPLY(z3, FIX(2.115825087)); /* c4+c6 */ |
| 8150 + tmp23 += tmp25 - MULTIPLY(z1, FIX(1.513598477)); /* c6+c8 */ |
| 8151 + tmp24 += tmp25; |
| 8152 + tmp22 = tmp24 - MULTIPLY(z3, FIX(0.788749120)); /* c8+c10 */ |
| 8153 + tmp24 += MULTIPLY(z2, FIX(1.944413522)) - /* c2+c8 */ |
| 8154 + MULTIPLY(z1, FIX(1.390975730)); /* c4+c10 */ |
| 8155 + tmp25 = tmp10 - MULTIPLY(z4, FIX(1.414213562)); /* c0 */ |
| 8156 + |
| 8157 + /* Odd part */ |
| 8158 + |
| 8159 + z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]); |
| 8160 + z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]); |
| 8161 + z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]); |
| 8162 + z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]); |
| 8163 + |
| 8164 + tmp11 = z1 + z2; |
| 8165 + tmp14 = MULTIPLY(tmp11 + z3 + z4, FIX(0.398430003)); /* c9 */ |
| 8166 + tmp11 = MULTIPLY(tmp11, FIX(0.887983902)); /* c3-c9 */ |
| 8167 + tmp12 = MULTIPLY(z1 + z3, FIX(0.670361295)); /* c5-c9 */ |
| 8168 + tmp13 = tmp14 + MULTIPLY(z1 + z4, FIX(0.366151574)); /* c7-c9 */ |
| 8169 + tmp10 = tmp11 + tmp12 + tmp13 - |
| 8170 + MULTIPLY(z1, FIX(0.923107866)); /* c7+c5+c3-c1-2*c9 */ |
| 8171 + z1 = tmp14 - MULTIPLY(z2 + z3, FIX(1.163011579)); /* c7+c9 */ |
| 8172 + tmp11 += z1 + MULTIPLY(z2, FIX(2.073276588)); /* c1+c7+3*c9-c3 */ |
| 8173 + tmp12 += z1 - MULTIPLY(z3, FIX(1.192193623)); /* c3+c5-c7-c9 */ |
| 8174 + z1 = MULTIPLY(z2 + z4, - FIX(1.798248910)); /* -(c1+c9) */ |
| 8175 + tmp11 += z1; |
| 8176 + tmp13 += z1 + MULTIPLY(z4, FIX(2.102458632)); /* c1+c5+c9-c7 */ |
| 8177 + tmp14 += MULTIPLY(z2, - FIX(1.467221301)) + /* -(c5+c9) */ |
| 8178 + MULTIPLY(z3, FIX(1.001388905)) - /* c1-c9 */ |
| 8179 + MULTIPLY(z4, FIX(1.684843907)); /* c3+c9 */ |
| 8180 + |
| 8181 + /* Final output stage */ |
| 8182 + |
| 8183 + wsptr[8*0] = (int) RIGHT_SHIFT(tmp20 + tmp10, CONST_BITS-PASS1_BITS); |
| 8184 + wsptr[8*10] = (int) RIGHT_SHIFT(tmp20 - tmp10, CONST_BITS-PASS1_BITS); |
| 8185 + wsptr[8*1] = (int) RIGHT_SHIFT(tmp21 + tmp11, CONST_BITS-PASS1_BITS); |
| 8186 + wsptr[8*9] = (int) RIGHT_SHIFT(tmp21 - tmp11, CONST_BITS-PASS1_BITS); |
| 8187 + wsptr[8*2] = (int) RIGHT_SHIFT(tmp22 + tmp12, CONST_BITS-PASS1_BITS); |
| 8188 + wsptr[8*8] = (int) RIGHT_SHIFT(tmp22 - tmp12, CONST_BITS-PASS1_BITS); |
| 8189 + wsptr[8*3] = (int) RIGHT_SHIFT(tmp23 + tmp13, CONST_BITS-PASS1_BITS); |
| 8190 + wsptr[8*7] = (int) RIGHT_SHIFT(tmp23 - tmp13, CONST_BITS-PASS1_BITS); |
| 8191 + wsptr[8*4] = (int) RIGHT_SHIFT(tmp24 + tmp14, CONST_BITS-PASS1_BITS); |
| 8192 + wsptr[8*6] = (int) RIGHT_SHIFT(tmp24 - tmp14, CONST_BITS-PASS1_BITS); |
| 8193 + wsptr[8*5] = (int) RIGHT_SHIFT(tmp25, CONST_BITS-PASS1_BITS); |
| 8194 + } |
| 8195 + |
| 8196 + /* Pass 2: process 11 rows from work array, store into output array. */ |
| 8197 + |
| 8198 + wsptr = workspace; |
| 8199 + for (ctr = 0; ctr < 11; ctr++) { |
| 8200 + outptr = output_buf[ctr] + output_col; |
| 8201 + |
| 8202 + /* Even part */ |
| 8203 + |
| 8204 + /* Add fudge factor here for final descale. */ |
| 8205 + tmp10 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2)); |
| 8206 + tmp10 <<= CONST_BITS; |
| 8207 + |
| 8208 + z1 = (INT32) wsptr[2]; |
| 8209 + z2 = (INT32) wsptr[4]; |
| 8210 + z3 = (INT32) wsptr[6]; |
| 8211 + |
| 8212 + tmp20 = MULTIPLY(z2 - z3, FIX(2.546640132)); /* c2+c4 */ |
| 8213 + tmp23 = MULTIPLY(z2 - z1, FIX(0.430815045)); /* c2-c6 */ |
| 8214 + z4 = z1 + z3; |
| 8215 + tmp24 = MULTIPLY(z4, - FIX(1.155664402)); /* -(c2-c10) */ |
| 8216 + z4 -= z2; |
| 8217 + tmp25 = tmp10 + MULTIPLY(z4, FIX(1.356927976)); /* c2 */ |
| 8218 + tmp21 = tmp20 + tmp23 + tmp25 - |
| 8219 + MULTIPLY(z2, FIX(1.821790775)); /* c2+c4+c10-c6 */ |
| 8220 + tmp20 += tmp25 + MULTIPLY(z3, FIX(2.115825087)); /* c4+c6 */ |
| 8221 + tmp23 += tmp25 - MULTIPLY(z1, FIX(1.513598477)); /* c6+c8 */ |
| 8222 + tmp24 += tmp25; |
| 8223 + tmp22 = tmp24 - MULTIPLY(z3, FIX(0.788749120)); /* c8+c10 */ |
| 8224 + tmp24 += MULTIPLY(z2, FIX(1.944413522)) - /* c2+c8 */ |
| 8225 + MULTIPLY(z1, FIX(1.390975730)); /* c4+c10 */ |
| 8226 + tmp25 = tmp10 - MULTIPLY(z4, FIX(1.414213562)); /* c0 */ |
| 8227 + |
| 8228 + /* Odd part */ |
| 8229 + |
| 8230 + z1 = (INT32) wsptr[1]; |
| 8231 + z2 = (INT32) wsptr[3]; |
| 8232 + z3 = (INT32) wsptr[5]; |
| 8233 + z4 = (INT32) wsptr[7]; |
| 8234 + |
| 8235 + tmp11 = z1 + z2; |
| 8236 + tmp14 = MULTIPLY(tmp11 + z3 + z4, FIX(0.398430003)); /* c9 */ |
| 8237 + tmp11 = MULTIPLY(tmp11, FIX(0.887983902)); /* c3-c9 */ |
| 8238 + tmp12 = MULTIPLY(z1 + z3, FIX(0.670361295)); /* c5-c9 */ |
| 8239 + tmp13 = tmp14 + MULTIPLY(z1 + z4, FIX(0.366151574)); /* c7-c9 */ |
| 8240 + tmp10 = tmp11 + tmp12 + tmp13 - |
| 8241 + MULTIPLY(z1, FIX(0.923107866)); /* c7+c5+c3-c1-2*c9 */ |
| 8242 + z1 = tmp14 - MULTIPLY(z2 + z3, FIX(1.163011579)); /* c7+c9 */ |
| 8243 + tmp11 += z1 + MULTIPLY(z2, FIX(2.073276588)); /* c1+c7+3*c9-c3 */ |
| 8244 + tmp12 += z1 - MULTIPLY(z3, FIX(1.192193623)); /* c3+c5-c7-c9 */ |
| 8245 + z1 = MULTIPLY(z2 + z4, - FIX(1.798248910)); /* -(c1+c9) */ |
| 8246 + tmp11 += z1; |
| 8247 + tmp13 += z1 + MULTIPLY(z4, FIX(2.102458632)); /* c1+c5+c9-c7 */ |
| 8248 + tmp14 += MULTIPLY(z2, - FIX(1.467221301)) + /* -(c5+c9) */ |
| 8249 + MULTIPLY(z3, FIX(1.001388905)) - /* c1-c9 */ |
| 8250 + MULTIPLY(z4, FIX(1.684843907)); /* c3+c9 */ |
| 8251 + |
| 8252 + /* Final output stage */ |
| 8253 + |
| 8254 + outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp10, |
| 8255 + CONST_BITS+PASS1_BITS+3) |
| 8256 + & RANGE_MASK]; |
| 8257 + outptr[10] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp10, |
| 8258 + CONST_BITS+PASS1_BITS+3) |
| 8259 + & RANGE_MASK]; |
| 8260 + outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp11, |
| 8261 + CONST_BITS+PASS1_BITS+3) |
| 8262 + & RANGE_MASK]; |
| 8263 + outptr[9] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp11, |
| 8264 + CONST_BITS+PASS1_BITS+3) |
| 8265 + & RANGE_MASK]; |
| 8266 + outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp12, |
| 8267 + CONST_BITS+PASS1_BITS+3) |
| 8268 + & RANGE_MASK]; |
| 8269 + outptr[8] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp12, |
| 8270 + CONST_BITS+PASS1_BITS+3) |
| 8271 + & RANGE_MASK]; |
| 8272 + outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp13, |
| 8273 + CONST_BITS+PASS1_BITS+3) |
| 8274 + & RANGE_MASK]; |
| 8275 + outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp13, |
| 8276 + CONST_BITS+PASS1_BITS+3) |
| 8277 + & RANGE_MASK]; |
| 8278 + outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp14, |
| 8279 + CONST_BITS+PASS1_BITS+3) |
| 8280 + & RANGE_MASK]; |
| 8281 + outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp14, |
| 8282 + CONST_BITS+PASS1_BITS+3) |
| 8283 + & RANGE_MASK]; |
| 8284 + outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp25, |
| 8285 + CONST_BITS+PASS1_BITS+3) |
| 8286 + & RANGE_MASK]; |
| 8287 + |
| 8288 + wsptr += 8; /* advance pointer to next row */ |
| 8289 + } |
| 8290 +} |
| 8291 + |
| 8292 + |
| 8293 +/* |
| 8294 + * Perform dequantization and inverse DCT on one block of coefficients, |
| 8295 + * producing a 12x12 output block. |
| 8296 + * |
| 8297 + * Optimized algorithm with 15 multiplications in the 1-D kernel. |
| 8298 + * cK represents sqrt(2) * cos(K*pi/24). |
| 8299 + */ |
| 8300 + |
| 8301 +GLOBAL(void) |
| 8302 +jpeg_idct_12x12 (j_decompress_ptr cinfo, jpeg_component_info * compptr, |
| 8303 + JCOEFPTR coef_block, |
| 8304 + JSAMPARRAY output_buf, JDIMENSION output_col) |
| 8305 +{ |
| 8306 + INT32 tmp10, tmp11, tmp12, tmp13, tmp14, tmp15; |
| 8307 + INT32 tmp20, tmp21, tmp22, tmp23, tmp24, tmp25; |
| 8308 + INT32 z1, z2, z3, z4; |
| 8309 + JCOEFPTR inptr; |
| 8310 + ISLOW_MULT_TYPE * quantptr; |
| 8311 + int * wsptr; |
| 8312 + JSAMPROW outptr; |
| 8313 + JSAMPLE *range_limit = IDCT_range_limit(cinfo); |
| 8314 + int ctr; |
| 8315 + int workspace[8*12]; /* buffers data between passes */ |
| 8316 + SHIFT_TEMPS |
| 8317 + |
| 8318 + /* Pass 1: process columns from input, store into work array. */ |
| 8319 + |
| 8320 + inptr = coef_block; |
| 8321 + quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table; |
| 8322 + wsptr = workspace; |
| 8323 + for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) { |
| 8324 + /* Even part */ |
| 8325 + |
| 8326 + z3 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]); |
| 8327 + z3 <<= CONST_BITS; |
| 8328 + /* Add fudge factor here for final descale. */ |
| 8329 + z3 += ONE << (CONST_BITS-PASS1_BITS-1); |
| 8330 + |
| 8331 + z4 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]); |
| 8332 + z4 = MULTIPLY(z4, FIX(1.224744871)); /* c4 */ |
| 8333 + |
| 8334 + tmp10 = z3 + z4; |
| 8335 + tmp11 = z3 - z4; |
| 8336 + |
| 8337 + z1 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]); |
| 8338 + z4 = MULTIPLY(z1, FIX(1.366025404)); /* c2 */ |
| 8339 + z1 <<= CONST_BITS; |
| 8340 + z2 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]); |
| 8341 + z2 <<= CONST_BITS; |
| 8342 + |
| 8343 + tmp12 = z1 - z2; |
| 8344 + |
| 8345 + tmp21 = z3 + tmp12; |
| 8346 + tmp24 = z3 - tmp12; |
| 8347 + |
| 8348 + tmp12 = z4 + z2; |
| 8349 + |
| 8350 + tmp20 = tmp10 + tmp12; |
| 8351 + tmp25 = tmp10 - tmp12; |
| 8352 + |
| 8353 + tmp12 = z4 - z1 - z2; |
| 8354 + |
| 8355 + tmp22 = tmp11 + tmp12; |
| 8356 + tmp23 = tmp11 - tmp12; |
| 8357 + |
| 8358 + /* Odd part */ |
| 8359 + |
| 8360 + z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]); |
| 8361 + z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]); |
| 8362 + z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]); |
| 8363 + z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]); |
| 8364 + |
| 8365 + tmp11 = MULTIPLY(z2, FIX(1.306562965)); /* c3 */ |
| 8366 + tmp14 = MULTIPLY(z2, - FIX_0_541196100); /* -c9 */ |
| 8367 + |
| 8368 + tmp10 = z1 + z3; |
| 8369 + tmp15 = MULTIPLY(tmp10 + z4, FIX(0.860918669)); /* c7 */ |
| 8370 + tmp12 = tmp15 + MULTIPLY(tmp10, FIX(0.261052384)); /* c5-c7 */ |
| 8371 + tmp10 = tmp12 + tmp11 + MULTIPLY(z1, FIX(0.280143716)); /* c1-c5 */ |
| 8372 + tmp13 = MULTIPLY(z3 + z4, - FIX(1.045510580)); /* -(c7+c11) */ |
| 8373 + tmp12 += tmp13 + tmp14 - MULTIPLY(z3, FIX(1.478575242)); /* c1+c5-c7-c11 */ |
| 8374 + tmp13 += tmp15 - tmp11 + MULTIPLY(z4, FIX(1.586706681)); /* c1+c11 */ |
| 8375 + tmp15 += tmp14 - MULTIPLY(z1, FIX(0.676326758)) - /* c7-c11 */ |
| 8376 + MULTIPLY(z4, FIX(1.982889723)); /* c5+c7 */ |
| 8377 + |
| 8378 + z1 -= z4; |
| 8379 + z2 -= z3; |
| 8380 + z3 = MULTIPLY(z1 + z2, FIX_0_541196100); /* c9 */ |
| 8381 + tmp11 = z3 + MULTIPLY(z1, FIX_0_765366865); /* c3-c9 */ |
| 8382 + tmp14 = z3 - MULTIPLY(z2, FIX_1_847759065); /* c3+c9 */ |
| 8383 + |
| 8384 + /* Final output stage */ |
| 8385 + |
| 8386 + wsptr[8*0] = (int) RIGHT_SHIFT(tmp20 + tmp10, CONST_BITS-PASS1_BITS); |
| 8387 + wsptr[8*11] = (int) RIGHT_SHIFT(tmp20 - tmp10, CONST_BITS-PASS1_BITS); |
| 8388 + wsptr[8*1] = (int) RIGHT_SHIFT(tmp21 + tmp11, CONST_BITS-PASS1_BITS); |
| 8389 + wsptr[8*10] = (int) RIGHT_SHIFT(tmp21 - tmp11, CONST_BITS-PASS1_BITS); |
| 8390 + wsptr[8*2] = (int) RIGHT_SHIFT(tmp22 + tmp12, CONST_BITS-PASS1_BITS); |
| 8391 + wsptr[8*9] = (int) RIGHT_SHIFT(tmp22 - tmp12, CONST_BITS-PASS1_BITS); |
| 8392 + wsptr[8*3] = (int) RIGHT_SHIFT(tmp23 + tmp13, CONST_BITS-PASS1_BITS); |
| 8393 + wsptr[8*8] = (int) RIGHT_SHIFT(tmp23 - tmp13, CONST_BITS-PASS1_BITS); |
| 8394 + wsptr[8*4] = (int) RIGHT_SHIFT(tmp24 + tmp14, CONST_BITS-PASS1_BITS); |
| 8395 + wsptr[8*7] = (int) RIGHT_SHIFT(tmp24 - tmp14, CONST_BITS-PASS1_BITS); |
| 8396 + wsptr[8*5] = (int) RIGHT_SHIFT(tmp25 + tmp15, CONST_BITS-PASS1_BITS); |
| 8397 + wsptr[8*6] = (int) RIGHT_SHIFT(tmp25 - tmp15, CONST_BITS-PASS1_BITS); |
| 8398 + } |
| 8399 + |
| 8400 + /* Pass 2: process 12 rows from work array, store into output array. */ |
| 8401 + |
| 8402 + wsptr = workspace; |
| 8403 + for (ctr = 0; ctr < 12; ctr++) { |
| 8404 + outptr = output_buf[ctr] + output_col; |
| 8405 + |
| 8406 + /* Even part */ |
| 8407 + |
| 8408 + /* Add fudge factor here for final descale. */ |
| 8409 + z3 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2)); |
| 8410 + z3 <<= CONST_BITS; |
| 8411 + |
| 8412 + z4 = (INT32) wsptr[4]; |
| 8413 + z4 = MULTIPLY(z4, FIX(1.224744871)); /* c4 */ |
| 8414 + |
| 8415 + tmp10 = z3 + z4; |
| 8416 + tmp11 = z3 - z4; |
| 8417 + |
| 8418 + z1 = (INT32) wsptr[2]; |
| 8419 + z4 = MULTIPLY(z1, FIX(1.366025404)); /* c2 */ |
| 8420 + z1 <<= CONST_BITS; |
| 8421 + z2 = (INT32) wsptr[6]; |
| 8422 + z2 <<= CONST_BITS; |
| 8423 + |
| 8424 + tmp12 = z1 - z2; |
| 8425 + |
| 8426 + tmp21 = z3 + tmp12; |
| 8427 + tmp24 = z3 - tmp12; |
| 8428 + |
| 8429 + tmp12 = z4 + z2; |
| 8430 + |
| 8431 + tmp20 = tmp10 + tmp12; |
| 8432 + tmp25 = tmp10 - tmp12; |
| 8433 + |
| 8434 + tmp12 = z4 - z1 - z2; |
| 8435 + |
| 8436 + tmp22 = tmp11 + tmp12; |
| 8437 + tmp23 = tmp11 - tmp12; |
| 8438 + |
| 8439 + /* Odd part */ |
| 8440 + |
| 8441 + z1 = (INT32) wsptr[1]; |
| 8442 + z2 = (INT32) wsptr[3]; |
| 8443 + z3 = (INT32) wsptr[5]; |
| 8444 + z4 = (INT32) wsptr[7]; |
| 8445 + |
| 8446 + tmp11 = MULTIPLY(z2, FIX(1.306562965)); /* c3 */ |
| 8447 + tmp14 = MULTIPLY(z2, - FIX_0_541196100); /* -c9 */ |
| 8448 + |
| 8449 + tmp10 = z1 + z3; |
| 8450 + tmp15 = MULTIPLY(tmp10 + z4, FIX(0.860918669)); /* c7 */ |
| 8451 + tmp12 = tmp15 + MULTIPLY(tmp10, FIX(0.261052384)); /* c5-c7 */ |
| 8452 + tmp10 = tmp12 + tmp11 + MULTIPLY(z1, FIX(0.280143716)); /* c1-c5 */ |
| 8453 + tmp13 = MULTIPLY(z3 + z4, - FIX(1.045510580)); /* -(c7+c11) */ |
| 8454 + tmp12 += tmp13 + tmp14 - MULTIPLY(z3, FIX(1.478575242)); /* c1+c5-c7-c11 */ |
| 8455 + tmp13 += tmp15 - tmp11 + MULTIPLY(z4, FIX(1.586706681)); /* c1+c11 */ |
| 8456 + tmp15 += tmp14 - MULTIPLY(z1, FIX(0.676326758)) - /* c7-c11 */ |
| 8457 + MULTIPLY(z4, FIX(1.982889723)); /* c5+c7 */ |
| 8458 + |
| 8459 + z1 -= z4; |
| 8460 + z2 -= z3; |
| 8461 + z3 = MULTIPLY(z1 + z2, FIX_0_541196100); /* c9 */ |
| 8462 + tmp11 = z3 + MULTIPLY(z1, FIX_0_765366865); /* c3-c9 */ |
| 8463 + tmp14 = z3 - MULTIPLY(z2, FIX_1_847759065); /* c3+c9 */ |
| 8464 + |
| 8465 + /* Final output stage */ |
| 8466 + |
| 8467 + outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp10, |
| 8468 + CONST_BITS+PASS1_BITS+3) |
| 8469 + & RANGE_MASK]; |
| 8470 + outptr[11] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp10, |
| 8471 + CONST_BITS+PASS1_BITS+3) |
| 8472 + & RANGE_MASK]; |
| 8473 + outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp11, |
| 8474 + CONST_BITS+PASS1_BITS+3) |
| 8475 + & RANGE_MASK]; |
| 8476 + outptr[10] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp11, |
| 8477 + CONST_BITS+PASS1_BITS+3) |
| 8478 + & RANGE_MASK]; |
| 8479 + outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp12, |
| 8480 + CONST_BITS+PASS1_BITS+3) |
| 8481 + & RANGE_MASK]; |
| 8482 + outptr[9] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp12, |
| 8483 + CONST_BITS+PASS1_BITS+3) |
| 8484 + & RANGE_MASK]; |
| 8485 + outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp13, |
| 8486 + CONST_BITS+PASS1_BITS+3) |
| 8487 + & RANGE_MASK]; |
| 8488 + outptr[8] = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp13, |
| 8489 + CONST_BITS+PASS1_BITS+3) |
| 8490 + & RANGE_MASK]; |
| 8491 + outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp14, |
| 8492 + CONST_BITS+PASS1_BITS+3) |
| 8493 + & RANGE_MASK]; |
| 8494 + outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp14, |
| 8495 + CONST_BITS+PASS1_BITS+3) |
| 8496 + & RANGE_MASK]; |
| 8497 + outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp25 + tmp15, |
| 8498 + CONST_BITS+PASS1_BITS+3) |
| 8499 + & RANGE_MASK]; |
| 8500 + outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp25 - tmp15, |
| 8501 + CONST_BITS+PASS1_BITS+3) |
| 8502 + & RANGE_MASK]; |
| 8503 + |
| 8504 + wsptr += 8; /* advance pointer to next row */ |
| 8505 + } |
| 8506 +} |
| 8507 + |
| 8508 + |
| 8509 +/* |
| 8510 + * Perform dequantization and inverse DCT on one block of coefficients, |
| 8511 + * producing a 13x13 output block. |
| 8512 + * |
| 8513 + * Optimized algorithm with 29 multiplications in the 1-D kernel. |
| 8514 + * cK represents sqrt(2) * cos(K*pi/26). |
| 8515 + */ |
| 8516 + |
| 8517 +GLOBAL(void) |
| 8518 +jpeg_idct_13x13 (j_decompress_ptr cinfo, jpeg_component_info * compptr, |
| 8519 + JCOEFPTR coef_block, |
| 8520 + JSAMPARRAY output_buf, JDIMENSION output_col) |
| 8521 +{ |
| 8522 + INT32 tmp10, tmp11, tmp12, tmp13, tmp14, tmp15; |
| 8523 + INT32 tmp20, tmp21, tmp22, tmp23, tmp24, tmp25, tmp26; |
| 8524 + INT32 z1, z2, z3, z4; |
| 8525 + JCOEFPTR inptr; |
| 8526 + ISLOW_MULT_TYPE * quantptr; |
| 8527 + int * wsptr; |
| 8528 + JSAMPROW outptr; |
| 8529 + JSAMPLE *range_limit = IDCT_range_limit(cinfo); |
| 8530 + int ctr; |
| 8531 + int workspace[8*13]; /* buffers data between passes */ |
| 8532 + SHIFT_TEMPS |
| 8533 + |
| 8534 + /* Pass 1: process columns from input, store into work array. */ |
| 8535 + |
| 8536 + inptr = coef_block; |
| 8537 + quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table; |
| 8538 + wsptr = workspace; |
| 8539 + for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) { |
| 8540 + /* Even part */ |
| 8541 + |
| 8542 + z1 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]); |
| 8543 + z1 <<= CONST_BITS; |
| 8544 + /* Add fudge factor here for final descale. */ |
| 8545 + z1 += ONE << (CONST_BITS-PASS1_BITS-1); |
| 8546 + |
| 8547 + z2 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]); |
| 8548 + z3 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]); |
| 8549 + z4 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]); |
| 8550 + |
| 8551 + tmp10 = z3 + z4; |
| 8552 + tmp11 = z3 - z4; |
| 8553 + |
| 8554 + tmp12 = MULTIPLY(tmp10, FIX(1.155388986)); /* (c4+c6)/2 */ |
| 8555 + tmp13 = MULTIPLY(tmp11, FIX(0.096834934)) + z1; /* (c4-c6)/2 */ |
| 8556 + |
| 8557 + tmp20 = MULTIPLY(z2, FIX(1.373119086)) + tmp12 + tmp13; /* c2 */ |
| 8558 + tmp22 = MULTIPLY(z2, FIX(0.501487041)) - tmp12 + tmp13; /* c10 */ |
| 8559 + |
| 8560 + tmp12 = MULTIPLY(tmp10, FIX(0.316450131)); /* (c8-c12)/2 */ |
| 8561 + tmp13 = MULTIPLY(tmp11, FIX(0.486914739)) + z1; /* (c8+c12)/2 */ |
| 8562 + |
| 8563 + tmp21 = MULTIPLY(z2, FIX(1.058554052)) - tmp12 + tmp13; /* c6 */ |
| 8564 + tmp25 = MULTIPLY(z2, - FIX(1.252223920)) + tmp12 + tmp13; /* c4 */ |
| 8565 + |
| 8566 + tmp12 = MULTIPLY(tmp10, FIX(0.435816023)); /* (c2-c10)/2 */ |
| 8567 + tmp13 = MULTIPLY(tmp11, FIX(0.937303064)) - z1; /* (c2+c10)/2 */ |
| 8568 + |
| 8569 + tmp23 = MULTIPLY(z2, - FIX(0.170464608)) - tmp12 - tmp13; /* c12 */ |
| 8570 + tmp24 = MULTIPLY(z2, - FIX(0.803364869)) + tmp12 - tmp13; /* c8 */ |
| 8571 + |
| 8572 + tmp26 = MULTIPLY(tmp11 - z2, FIX(1.414213562)) + z1; /* c0 */ |
| 8573 + |
| 8574 + /* Odd part */ |
| 8575 + |
| 8576 + z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]); |
| 8577 + z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]); |
| 8578 + z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]); |
| 8579 + z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]); |
| 8580 + |
| 8581 + tmp11 = MULTIPLY(z1 + z2, FIX(1.322312651)); /* c3 */ |
| 8582 + tmp12 = MULTIPLY(z1 + z3, FIX(1.163874945)); /* c5 */ |
| 8583 + tmp15 = z1 + z4; |
| 8584 + tmp13 = MULTIPLY(tmp15, FIX(0.937797057)); /* c7 */ |
| 8585 + tmp10 = tmp11 + tmp12 + tmp13 - |
| 8586 + MULTIPLY(z1, FIX(2.020082300)); /* c7+c5+c3-c1 */ |
| 8587 + tmp14 = MULTIPLY(z2 + z3, - FIX(0.338443458)); /* -c11 */ |
| 8588 + tmp11 += tmp14 + MULTIPLY(z2, FIX(0.837223564)); /* c5+c9+c11-c3 */ |
| 8589 + tmp12 += tmp14 - MULTIPLY(z3, FIX(1.572116027)); /* c1+c5-c9-c11 */ |
| 8590 + tmp14 = MULTIPLY(z2 + z4, - FIX(1.163874945)); /* -c5 */ |
| 8591 + tmp11 += tmp14; |
| 8592 + tmp13 += tmp14 + MULTIPLY(z4, FIX(2.205608352)); /* c3+c5+c9-c7 */ |
| 8593 + tmp14 = MULTIPLY(z3 + z4, - FIX(0.657217813)); /* -c9 */ |
| 8594 + tmp12 += tmp14; |
| 8595 + tmp13 += tmp14; |
| 8596 + tmp15 = MULTIPLY(tmp15, FIX(0.338443458)); /* c11 */ |
| 8597 + tmp14 = tmp15 + MULTIPLY(z1, FIX(0.318774355)) - /* c9-c11 */ |
| 8598 + MULTIPLY(z2, FIX(0.466105296)); /* c1-c7 */ |
| 8599 + z1 = MULTIPLY(z3 - z2, FIX(0.937797057)); /* c7 */ |
| 8600 + tmp14 += z1; |
| 8601 + tmp15 += z1 + MULTIPLY(z3, FIX(0.384515595)) - /* c3-c7 */ |
| 8602 + MULTIPLY(z4, FIX(1.742345811)); /* c1+c11 */ |
| 8603 + |
| 8604 + /* Final output stage */ |
| 8605 + |
| 8606 + wsptr[8*0] = (int) RIGHT_SHIFT(tmp20 + tmp10, CONST_BITS-PASS1_BITS); |
| 8607 + wsptr[8*12] = (int) RIGHT_SHIFT(tmp20 - tmp10, CONST_BITS-PASS1_BITS); |
| 8608 + wsptr[8*1] = (int) RIGHT_SHIFT(tmp21 + tmp11, CONST_BITS-PASS1_BITS); |
| 8609 + wsptr[8*11] = (int) RIGHT_SHIFT(tmp21 - tmp11, CONST_BITS-PASS1_BITS); |
| 8610 + wsptr[8*2] = (int) RIGHT_SHIFT(tmp22 + tmp12, CONST_BITS-PASS1_BITS); |
| 8611 + wsptr[8*10] = (int) RIGHT_SHIFT(tmp22 - tmp12, CONST_BITS-PASS1_BITS); |
| 8612 + wsptr[8*3] = (int) RIGHT_SHIFT(tmp23 + tmp13, CONST_BITS-PASS1_BITS); |
| 8613 + wsptr[8*9] = (int) RIGHT_SHIFT(tmp23 - tmp13, CONST_BITS-PASS1_BITS); |
| 8614 + wsptr[8*4] = (int) RIGHT_SHIFT(tmp24 + tmp14, CONST_BITS-PASS1_BITS); |
| 8615 + wsptr[8*8] = (int) RIGHT_SHIFT(tmp24 - tmp14, CONST_BITS-PASS1_BITS); |
| 8616 + wsptr[8*5] = (int) RIGHT_SHIFT(tmp25 + tmp15, CONST_BITS-PASS1_BITS); |
| 8617 + wsptr[8*7] = (int) RIGHT_SHIFT(tmp25 - tmp15, CONST_BITS-PASS1_BITS); |
| 8618 + wsptr[8*6] = (int) RIGHT_SHIFT(tmp26, CONST_BITS-PASS1_BITS); |
| 8619 + } |
| 8620 + |
| 8621 + /* Pass 2: process 13 rows from work array, store into output array. */ |
| 8622 + |
| 8623 + wsptr = workspace; |
| 8624 + for (ctr = 0; ctr < 13; ctr++) { |
| 8625 + outptr = output_buf[ctr] + output_col; |
| 8626 + |
| 8627 + /* Even part */ |
| 8628 + |
| 8629 + /* Add fudge factor here for final descale. */ |
| 8630 + z1 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2)); |
| 8631 + z1 <<= CONST_BITS; |
| 8632 + |
| 8633 + z2 = (INT32) wsptr[2]; |
| 8634 + z3 = (INT32) wsptr[4]; |
| 8635 + z4 = (INT32) wsptr[6]; |
| 8636 + |
| 8637 + tmp10 = z3 + z4; |
| 8638 + tmp11 = z3 - z4; |
| 8639 + |
| 8640 + tmp12 = MULTIPLY(tmp10, FIX(1.155388986)); /* (c4+c6)/2 */ |
| 8641 + tmp13 = MULTIPLY(tmp11, FIX(0.096834934)) + z1; /* (c4-c6)/2 */ |
| 8642 + |
| 8643 + tmp20 = MULTIPLY(z2, FIX(1.373119086)) + tmp12 + tmp13; /* c2 */ |
| 8644 + tmp22 = MULTIPLY(z2, FIX(0.501487041)) - tmp12 + tmp13; /* c10 */ |
| 8645 + |
| 8646 + tmp12 = MULTIPLY(tmp10, FIX(0.316450131)); /* (c8-c12)/2 */ |
| 8647 + tmp13 = MULTIPLY(tmp11, FIX(0.486914739)) + z1; /* (c8+c12)/2 */ |
| 8648 + |
| 8649 + tmp21 = MULTIPLY(z2, FIX(1.058554052)) - tmp12 + tmp13; /* c6 */ |
| 8650 + tmp25 = MULTIPLY(z2, - FIX(1.252223920)) + tmp12 + tmp13; /* c4 */ |
| 8651 + |
| 8652 + tmp12 = MULTIPLY(tmp10, FIX(0.435816023)); /* (c2-c10)/2 */ |
| 8653 + tmp13 = MULTIPLY(tmp11, FIX(0.937303064)) - z1; /* (c2+c10)/2 */ |
| 8654 + |
| 8655 + tmp23 = MULTIPLY(z2, - FIX(0.170464608)) - tmp12 - tmp13; /* c12 */ |
| 8656 + tmp24 = MULTIPLY(z2, - FIX(0.803364869)) + tmp12 - tmp13; /* c8 */ |
| 8657 + |
| 8658 + tmp26 = MULTIPLY(tmp11 - z2, FIX(1.414213562)) + z1; /* c0 */ |
| 8659 + |
| 8660 + /* Odd part */ |
| 8661 + |
| 8662 + z1 = (INT32) wsptr[1]; |
| 8663 + z2 = (INT32) wsptr[3]; |
| 8664 + z3 = (INT32) wsptr[5]; |
| 8665 + z4 = (INT32) wsptr[7]; |
| 8666 + |
| 8667 + tmp11 = MULTIPLY(z1 + z2, FIX(1.322312651)); /* c3 */ |
| 8668 + tmp12 = MULTIPLY(z1 + z3, FIX(1.163874945)); /* c5 */ |
| 8669 + tmp15 = z1 + z4; |
| 8670 + tmp13 = MULTIPLY(tmp15, FIX(0.937797057)); /* c7 */ |
| 8671 + tmp10 = tmp11 + tmp12 + tmp13 - |
| 8672 + MULTIPLY(z1, FIX(2.020082300)); /* c7+c5+c3-c1 */ |
| 8673 + tmp14 = MULTIPLY(z2 + z3, - FIX(0.338443458)); /* -c11 */ |
| 8674 + tmp11 += tmp14 + MULTIPLY(z2, FIX(0.837223564)); /* c5+c9+c11-c3 */ |
| 8675 + tmp12 += tmp14 - MULTIPLY(z3, FIX(1.572116027)); /* c1+c5-c9-c11 */ |
| 8676 + tmp14 = MULTIPLY(z2 + z4, - FIX(1.163874945)); /* -c5 */ |
| 8677 + tmp11 += tmp14; |
| 8678 + tmp13 += tmp14 + MULTIPLY(z4, FIX(2.205608352)); /* c3+c5+c9-c7 */ |
| 8679 + tmp14 = MULTIPLY(z3 + z4, - FIX(0.657217813)); /* -c9 */ |
| 8680 + tmp12 += tmp14; |
| 8681 + tmp13 += tmp14; |
| 8682 + tmp15 = MULTIPLY(tmp15, FIX(0.338443458)); /* c11 */ |
| 8683 + tmp14 = tmp15 + MULTIPLY(z1, FIX(0.318774355)) - /* c9-c11 */ |
| 8684 + MULTIPLY(z2, FIX(0.466105296)); /* c1-c7 */ |
| 8685 + z1 = MULTIPLY(z3 - z2, FIX(0.937797057)); /* c7 */ |
| 8686 + tmp14 += z1; |
| 8687 + tmp15 += z1 + MULTIPLY(z3, FIX(0.384515595)) - /* c3-c7 */ |
| 8688 + MULTIPLY(z4, FIX(1.742345811)); /* c1+c11 */ |
| 8689 + |
| 8690 + /* Final output stage */ |
| 8691 + |
| 8692 + outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp10, |
| 8693 + CONST_BITS+PASS1_BITS+3) |
| 8694 + & RANGE_MASK]; |
| 8695 + outptr[12] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp10, |
| 8696 + CONST_BITS+PASS1_BITS+3) |
| 8697 + & RANGE_MASK]; |
| 8698 + outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp11, |
| 8699 + CONST_BITS+PASS1_BITS+3) |
| 8700 + & RANGE_MASK]; |
| 8701 + outptr[11] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp11, |
| 8702 + CONST_BITS+PASS1_BITS+3) |
| 8703 + & RANGE_MASK]; |
| 8704 + outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp12, |
| 8705 + CONST_BITS+PASS1_BITS+3) |
| 8706 + & RANGE_MASK]; |
| 8707 + outptr[10] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp12, |
| 8708 + CONST_BITS+PASS1_BITS+3) |
| 8709 + & RANGE_MASK]; |
| 8710 + outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp13, |
| 8711 + CONST_BITS+PASS1_BITS+3) |
| 8712 + & RANGE_MASK]; |
| 8713 + outptr[9] = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp13, |
| 8714 + CONST_BITS+PASS1_BITS+3) |
| 8715 + & RANGE_MASK]; |
| 8716 + outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp14, |
| 8717 + CONST_BITS+PASS1_BITS+3) |
| 8718 + & RANGE_MASK]; |
| 8719 + outptr[8] = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp14, |
| 8720 + CONST_BITS+PASS1_BITS+3) |
| 8721 + & RANGE_MASK]; |
| 8722 + outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp25 + tmp15, |
| 8723 + CONST_BITS+PASS1_BITS+3) |
| 8724 + & RANGE_MASK]; |
| 8725 + outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp25 - tmp15, |
| 8726 + CONST_BITS+PASS1_BITS+3) |
| 8727 + & RANGE_MASK]; |
| 8728 + outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp26, |
| 8729 + CONST_BITS+PASS1_BITS+3) |
| 8730 + & RANGE_MASK]; |
| 8731 + |
| 8732 + wsptr += 8; /* advance pointer to next row */ |
| 8733 + } |
| 8734 +} |
| 8735 + |
| 8736 + |
| 8737 +/* |
| 8738 + * Perform dequantization and inverse DCT on one block of coefficients, |
| 8739 + * producing a 14x14 output block. |
| 8740 + * |
| 8741 + * Optimized algorithm with 20 multiplications in the 1-D kernel. |
| 8742 + * cK represents sqrt(2) * cos(K*pi/28). |
| 8743 + */ |
| 8744 + |
| 8745 +GLOBAL(void) |
| 8746 +jpeg_idct_14x14 (j_decompress_ptr cinfo, jpeg_component_info * compptr, |
| 8747 + JCOEFPTR coef_block, |
| 8748 + JSAMPARRAY output_buf, JDIMENSION output_col) |
| 8749 +{ |
| 8750 + INT32 tmp10, tmp11, tmp12, tmp13, tmp14, tmp15, tmp16; |
| 8751 + INT32 tmp20, tmp21, tmp22, tmp23, tmp24, tmp25, tmp26; |
| 8752 + INT32 z1, z2, z3, z4; |
| 8753 + JCOEFPTR inptr; |
| 8754 + ISLOW_MULT_TYPE * quantptr; |
| 8755 + int * wsptr; |
| 8756 + JSAMPROW outptr; |
| 8757 + JSAMPLE *range_limit = IDCT_range_limit(cinfo); |
| 8758 + int ctr; |
| 8759 + int workspace[8*14]; /* buffers data between passes */ |
| 8760 + SHIFT_TEMPS |
| 8761 + |
| 8762 + /* Pass 1: process columns from input, store into work array. */ |
| 8763 + |
| 8764 + inptr = coef_block; |
| 8765 + quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table; |
| 8766 + wsptr = workspace; |
| 8767 + for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) { |
| 8768 + /* Even part */ |
| 8769 + |
| 8770 + z1 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]); |
| 8771 + z1 <<= CONST_BITS; |
| 8772 + /* Add fudge factor here for final descale. */ |
| 8773 + z1 += ONE << (CONST_BITS-PASS1_BITS-1); |
| 8774 + z4 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]); |
| 8775 + z2 = MULTIPLY(z4, FIX(1.274162392)); /* c4 */ |
| 8776 + z3 = MULTIPLY(z4, FIX(0.314692123)); /* c12 */ |
| 8777 + z4 = MULTIPLY(z4, FIX(0.881747734)); /* c8 */ |
| 8778 + |
| 8779 + tmp10 = z1 + z2; |
| 8780 + tmp11 = z1 + z3; |
| 8781 + tmp12 = z1 - z4; |
| 8782 + |
| 8783 + tmp23 = RIGHT_SHIFT(z1 - ((z2 + z3 - z4) << 1), /* c0 = (c4+c12-c8)*2 */ |
| 8784 + CONST_BITS-PASS1_BITS); |
| 8785 + |
| 8786 + z1 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]); |
| 8787 + z2 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]); |
| 8788 + |
| 8789 + z3 = MULTIPLY(z1 + z2, FIX(1.105676686)); /* c6 */ |
| 8790 + |
| 8791 + tmp13 = z3 + MULTIPLY(z1, FIX(0.273079590)); /* c2-c6 */ |
| 8792 + tmp14 = z3 - MULTIPLY(z2, FIX(1.719280954)); /* c6+c10 */ |
| 8793 + tmp15 = MULTIPLY(z1, FIX(0.613604268)) - /* c10 */ |
| 8794 + MULTIPLY(z2, FIX(1.378756276)); /* c2 */ |
| 8795 + |
| 8796 + tmp20 = tmp10 + tmp13; |
| 8797 + tmp26 = tmp10 - tmp13; |
| 8798 + tmp21 = tmp11 + tmp14; |
| 8799 + tmp25 = tmp11 - tmp14; |
| 8800 + tmp22 = tmp12 + tmp15; |
| 8801 + tmp24 = tmp12 - tmp15; |
| 8802 + |
| 8803 + /* Odd part */ |
| 8804 + |
| 8805 + z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]); |
| 8806 + z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]); |
| 8807 + z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]); |
| 8808 + z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]); |
| 8809 + tmp13 = z4 << CONST_BITS; |
| 8810 + |
| 8811 + tmp14 = z1 + z3; |
| 8812 + tmp11 = MULTIPLY(z1 + z2, FIX(1.334852607)); /* c3 */ |
| 8813 + tmp12 = MULTIPLY(tmp14, FIX(1.197448846)); /* c5 */ |
| 8814 + tmp10 = tmp11 + tmp12 + tmp13 - MULTIPLY(z1, FIX(1.126980169)); /* c3+c5-c1 */ |
| 8815 + tmp14 = MULTIPLY(tmp14, FIX(0.752406978)); /* c9 */ |
| 8816 + tmp16 = tmp14 - MULTIPLY(z1, FIX(1.061150426)); /* c9+c11-c13 */ |
| 8817 + z1 -= z2; |
| 8818 + tmp15 = MULTIPLY(z1, FIX(0.467085129)) - tmp13; /* c11 */ |
| 8819 + tmp16 += tmp15; |
| 8820 + z1 += z4; |
| 8821 + z4 = MULTIPLY(z2 + z3, - FIX(0.158341681)) - tmp13; /* -c13 */ |
| 8822 + tmp11 += z4 - MULTIPLY(z2, FIX(0.424103948)); /* c3-c9-c13 */ |
| 8823 + tmp12 += z4 - MULTIPLY(z3, FIX(2.373959773)); /* c3+c5-c13 */ |
| 8824 + z4 = MULTIPLY(z3 - z2, FIX(1.405321284)); /* c1 */ |
| 8825 + tmp14 += z4 + tmp13 - MULTIPLY(z3, FIX(1.6906431334)); /* c1+c9-c11 */ |
| 8826 + tmp15 += z4 + MULTIPLY(z2, FIX(0.674957567)); /* c1+c11-c5 */ |
| 8827 + |
| 8828 + tmp13 = (z1 - z3) << PASS1_BITS; |
| 8829 + |
| 8830 + /* Final output stage */ |
| 8831 + |
| 8832 + wsptr[8*0] = (int) RIGHT_SHIFT(tmp20 + tmp10, CONST_BITS-PASS1_BITS); |
| 8833 + wsptr[8*13] = (int) RIGHT_SHIFT(tmp20 - tmp10, CONST_BITS-PASS1_BITS); |
| 8834 + wsptr[8*1] = (int) RIGHT_SHIFT(tmp21 + tmp11, CONST_BITS-PASS1_BITS); |
| 8835 + wsptr[8*12] = (int) RIGHT_SHIFT(tmp21 - tmp11, CONST_BITS-PASS1_BITS); |
| 8836 + wsptr[8*2] = (int) RIGHT_SHIFT(tmp22 + tmp12, CONST_BITS-PASS1_BITS); |
| 8837 + wsptr[8*11] = (int) RIGHT_SHIFT(tmp22 - tmp12, CONST_BITS-PASS1_BITS); |
| 8838 + wsptr[8*3] = (int) (tmp23 + tmp13); |
| 8839 + wsptr[8*10] = (int) (tmp23 - tmp13); |
| 8840 + wsptr[8*4] = (int) RIGHT_SHIFT(tmp24 + tmp14, CONST_BITS-PASS1_BITS); |
| 8841 + wsptr[8*9] = (int) RIGHT_SHIFT(tmp24 - tmp14, CONST_BITS-PASS1_BITS); |
| 8842 + wsptr[8*5] = (int) RIGHT_SHIFT(tmp25 + tmp15, CONST_BITS-PASS1_BITS); |
| 8843 + wsptr[8*8] = (int) RIGHT_SHIFT(tmp25 - tmp15, CONST_BITS-PASS1_BITS); |
| 8844 + wsptr[8*6] = (int) RIGHT_SHIFT(tmp26 + tmp16, CONST_BITS-PASS1_BITS); |
| 8845 + wsptr[8*7] = (int) RIGHT_SHIFT(tmp26 - tmp16, CONST_BITS-PASS1_BITS); |
| 8846 + } |
| 8847 + |
| 8848 + /* Pass 2: process 14 rows from work array, store into output array. */ |
| 8849 + |
| 8850 + wsptr = workspace; |
| 8851 + for (ctr = 0; ctr < 14; ctr++) { |
| 8852 + outptr = output_buf[ctr] + output_col; |
| 8853 + |
| 8854 + /* Even part */ |
| 8855 + |
| 8856 + /* Add fudge factor here for final descale. */ |
| 8857 + z1 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2)); |
| 8858 + z1 <<= CONST_BITS; |
| 8859 + z4 = (INT32) wsptr[4]; |
| 8860 + z2 = MULTIPLY(z4, FIX(1.274162392)); /* c4 */ |
| 8861 + z3 = MULTIPLY(z4, FIX(0.314692123)); /* c12 */ |
| 8862 + z4 = MULTIPLY(z4, FIX(0.881747734)); /* c8 */ |
| 8863 + |
| 8864 + tmp10 = z1 + z2; |
| 8865 + tmp11 = z1 + z3; |
| 8866 + tmp12 = z1 - z4; |
| 8867 + |
| 8868 + tmp23 = z1 - ((z2 + z3 - z4) << 1); /* c0 = (c4+c12-c8)*2 */ |
| 8869 + |
| 8870 + z1 = (INT32) wsptr[2]; |
| 8871 + z2 = (INT32) wsptr[6]; |
| 8872 + |
| 8873 + z3 = MULTIPLY(z1 + z2, FIX(1.105676686)); /* c6 */ |
| 8874 + |
| 8875 + tmp13 = z3 + MULTIPLY(z1, FIX(0.273079590)); /* c2-c6 */ |
| 8876 + tmp14 = z3 - MULTIPLY(z2, FIX(1.719280954)); /* c6+c10 */ |
| 8877 + tmp15 = MULTIPLY(z1, FIX(0.613604268)) - /* c10 */ |
| 8878 + MULTIPLY(z2, FIX(1.378756276)); /* c2 */ |
| 8879 + |
| 8880 + tmp20 = tmp10 + tmp13; |
| 8881 + tmp26 = tmp10 - tmp13; |
| 8882 + tmp21 = tmp11 + tmp14; |
| 8883 + tmp25 = tmp11 - tmp14; |
| 8884 + tmp22 = tmp12 + tmp15; |
| 8885 + tmp24 = tmp12 - tmp15; |
| 8886 + |
| 8887 + /* Odd part */ |
| 8888 + |
| 8889 + z1 = (INT32) wsptr[1]; |
| 8890 + z2 = (INT32) wsptr[3]; |
| 8891 + z3 = (INT32) wsptr[5]; |
| 8892 + z4 = (INT32) wsptr[7]; |
| 8893 + z4 <<= CONST_BITS; |
| 8894 + |
| 8895 + tmp14 = z1 + z3; |
| 8896 + tmp11 = MULTIPLY(z1 + z2, FIX(1.334852607)); /* c3 */ |
| 8897 + tmp12 = MULTIPLY(tmp14, FIX(1.197448846)); /* c5 */ |
| 8898 + tmp10 = tmp11 + tmp12 + z4 - MULTIPLY(z1, FIX(1.126980169)); /* c3+c5-c1 */ |
| 8899 + tmp14 = MULTIPLY(tmp14, FIX(0.752406978)); /* c9 */ |
| 8900 + tmp16 = tmp14 - MULTIPLY(z1, FIX(1.061150426)); /* c9+c11-c13 */ |
| 8901 + z1 -= z2; |
| 8902 + tmp15 = MULTIPLY(z1, FIX(0.467085129)) - z4; /* c11 */ |
| 8903 + tmp16 += tmp15; |
| 8904 + tmp13 = MULTIPLY(z2 + z3, - FIX(0.158341681)) - z4; /* -c13 */ |
| 8905 + tmp11 += tmp13 - MULTIPLY(z2, FIX(0.424103948)); /* c3-c9-c13 */ |
| 8906 + tmp12 += tmp13 - MULTIPLY(z3, FIX(2.373959773)); /* c3+c5-c13 */ |
| 8907 + tmp13 = MULTIPLY(z3 - z2, FIX(1.405321284)); /* c1 */ |
| 8908 + tmp14 += tmp13 + z4 - MULTIPLY(z3, FIX(1.6906431334)); /* c1+c9-c11 */ |
| 8909 + tmp15 += tmp13 + MULTIPLY(z2, FIX(0.674957567)); /* c1+c11-c5 */ |
| 8910 + |
| 8911 + tmp13 = ((z1 - z3) << CONST_BITS) + z4; |
| 8912 + |
| 8913 + /* Final output stage */ |
| 8914 + |
| 8915 + outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp10, |
| 8916 + CONST_BITS+PASS1_BITS+3) |
| 8917 + & RANGE_MASK]; |
| 8918 + outptr[13] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp10, |
| 8919 + CONST_BITS+PASS1_BITS+3) |
| 8920 + & RANGE_MASK]; |
| 8921 + outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp11, |
| 8922 + CONST_BITS+PASS1_BITS+3) |
| 8923 + & RANGE_MASK]; |
| 8924 + outptr[12] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp11, |
| 8925 + CONST_BITS+PASS1_BITS+3) |
| 8926 + & RANGE_MASK]; |
| 8927 + outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp12, |
| 8928 + CONST_BITS+PASS1_BITS+3) |
| 8929 + & RANGE_MASK]; |
| 8930 + outptr[11] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp12, |
| 8931 + CONST_BITS+PASS1_BITS+3) |
| 8932 + & RANGE_MASK]; |
| 8933 + outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp13, |
| 8934 + CONST_BITS+PASS1_BITS+3) |
| 8935 + & RANGE_MASK]; |
| 8936 + outptr[10] = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp13, |
| 8937 + CONST_BITS+PASS1_BITS+3) |
| 8938 + & RANGE_MASK]; |
| 8939 + outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp14, |
| 8940 + CONST_BITS+PASS1_BITS+3) |
| 8941 + & RANGE_MASK]; |
| 8942 + outptr[9] = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp14, |
| 8943 + CONST_BITS+PASS1_BITS+3) |
| 8944 + & RANGE_MASK]; |
| 8945 + outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp25 + tmp15, |
| 8946 + CONST_BITS+PASS1_BITS+3) |
| 8947 + & RANGE_MASK]; |
| 8948 + outptr[8] = range_limit[(int) RIGHT_SHIFT(tmp25 - tmp15, |
| 8949 + CONST_BITS+PASS1_BITS+3) |
| 8950 + & RANGE_MASK]; |
| 8951 + outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp26 + tmp16, |
| 8952 + CONST_BITS+PASS1_BITS+3) |
| 8953 + & RANGE_MASK]; |
| 8954 + outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp26 - tmp16, |
| 8955 + CONST_BITS+PASS1_BITS+3) |
| 8956 + & RANGE_MASK]; |
| 8957 + |
| 8958 + wsptr += 8; /* advance pointer to next row */ |
| 8959 + } |
| 8960 +} |
| 8961 + |
| 8962 + |
| 8963 +/* |
| 8964 + * Perform dequantization and inverse DCT on one block of coefficients, |
| 8965 + * producing a 15x15 output block. |
| 8966 + * |
| 8967 + * Optimized algorithm with 22 multiplications in the 1-D kernel. |
| 8968 + * cK represents sqrt(2) * cos(K*pi/30). |
| 8969 + */ |
| 8970 + |
| 8971 +GLOBAL(void) |
| 8972 +jpeg_idct_15x15 (j_decompress_ptr cinfo, jpeg_component_info * compptr, |
| 8973 + JCOEFPTR coef_block, |
| 8974 + JSAMPARRAY output_buf, JDIMENSION output_col) |
| 8975 +{ |
| 8976 + INT32 tmp10, tmp11, tmp12, tmp13, tmp14, tmp15, tmp16; |
| 8977 + INT32 tmp20, tmp21, tmp22, tmp23, tmp24, tmp25, tmp26, tmp27; |
| 8978 + INT32 z1, z2, z3, z4; |
| 8979 + JCOEFPTR inptr; |
| 8980 + ISLOW_MULT_TYPE * quantptr; |
| 8981 + int * wsptr; |
| 8982 + JSAMPROW outptr; |
| 8983 + JSAMPLE *range_limit = IDCT_range_limit(cinfo); |
| 8984 + int ctr; |
| 8985 + int workspace[8*15]; /* buffers data between passes */ |
| 8986 + SHIFT_TEMPS |
| 8987 + |
| 8988 + /* Pass 1: process columns from input, store into work array. */ |
| 8989 + |
| 8990 + inptr = coef_block; |
| 8991 + quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table; |
| 8992 + wsptr = workspace; |
| 8993 + for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) { |
| 8994 + /* Even part */ |
| 8995 + |
| 8996 + z1 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]); |
| 8997 + z1 <<= CONST_BITS; |
| 8998 + /* Add fudge factor here for final descale. */ |
| 8999 + z1 += ONE << (CONST_BITS-PASS1_BITS-1); |
| 9000 + |
| 9001 + z2 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]); |
| 9002 + z3 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]); |
| 9003 + z4 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]); |
| 9004 + |
| 9005 + tmp10 = MULTIPLY(z4, FIX(0.437016024)); /* c12 */ |
| 9006 + tmp11 = MULTIPLY(z4, FIX(1.144122806)); /* c6 */ |
| 9007 + |
| 9008 + tmp12 = z1 - tmp10; |
| 9009 + tmp13 = z1 + tmp11; |
| 9010 + z1 -= (tmp11 - tmp10) << 1; /* c0 = (c6-c12)*2 */ |
| 9011 + |
| 9012 + z4 = z2 - z3; |
| 9013 + z3 += z2; |
| 9014 + tmp10 = MULTIPLY(z3, FIX(1.337628990)); /* (c2+c4)/2 */ |
| 9015 + tmp11 = MULTIPLY(z4, FIX(0.045680613)); /* (c2-c4)/2 */ |
| 9016 + z2 = MULTIPLY(z2, FIX(1.439773946)); /* c4+c14 */ |
| 9017 + |
| 9018 + tmp20 = tmp13 + tmp10 + tmp11; |
| 9019 + tmp23 = tmp12 - tmp10 + tmp11 + z2; |
| 9020 + |
| 9021 + tmp10 = MULTIPLY(z3, FIX(0.547059574)); /* (c8+c14)/2 */ |
| 9022 + tmp11 = MULTIPLY(z4, FIX(0.399234004)); /* (c8-c14)/2 */ |
| 9023 + |
| 9024 + tmp25 = tmp13 - tmp10 - tmp11; |
| 9025 + tmp26 = tmp12 + tmp10 - tmp11 - z2; |
| 9026 + |
| 9027 + tmp10 = MULTIPLY(z3, FIX(0.790569415)); /* (c6+c12)/2 */ |
| 9028 + tmp11 = MULTIPLY(z4, FIX(0.353553391)); /* (c6-c12)/2 */ |
| 9029 + |
| 9030 + tmp21 = tmp12 + tmp10 + tmp11; |
| 9031 + tmp24 = tmp13 - tmp10 + tmp11; |
| 9032 + tmp11 += tmp11; |
| 9033 + tmp22 = z1 + tmp11; /* c10 = c6-c12 */ |
| 9034 + tmp27 = z1 - tmp11 - tmp11; /* c0 = (c6-c12)*2 */ |
| 9035 + |
| 9036 + /* Odd part */ |
| 9037 + |
| 9038 + z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]); |
| 9039 + z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]); |
| 9040 + z4 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]); |
| 9041 + z3 = MULTIPLY(z4, FIX(1.224744871)); /* c5 */ |
| 9042 + z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]); |
| 9043 + |
| 9044 + tmp13 = z2 - z4; |
| 9045 + tmp15 = MULTIPLY(z1 + tmp13, FIX(0.831253876)); /* c9 */ |
| 9046 + tmp11 = tmp15 + MULTIPLY(z1, FIX(0.513743148)); /* c3-c9 */ |
| 9047 + tmp14 = tmp15 - MULTIPLY(tmp13, FIX(2.176250899)); /* c3+c9 */ |
| 9048 + |
| 9049 + tmp13 = MULTIPLY(z2, - FIX(0.831253876)); /* -c9 */ |
| 9050 + tmp15 = MULTIPLY(z2, - FIX(1.344997024)); /* -c3 */ |
| 9051 + z2 = z1 - z4; |
| 9052 + tmp12 = z3 + MULTIPLY(z2, FIX(1.406466353)); /* c1 */ |
| 9053 + |
| 9054 + tmp10 = tmp12 + MULTIPLY(z4, FIX(2.457431844)) - tmp15; /* c1+c7 */ |
| 9055 + tmp16 = tmp12 - MULTIPLY(z1, FIX(1.112434820)) + tmp13; /* c1-c13 */ |
| 9056 + tmp12 = MULTIPLY(z2, FIX(1.224744871)) - z3; /* c5 */ |
| 9057 + z2 = MULTIPLY(z1 + z4, FIX(0.575212477)); /* c11 */ |
| 9058 + tmp13 += z2 + MULTIPLY(z1, FIX(0.475753014)) - z3; /* c7-c11 */ |
| 9059 + tmp15 += z2 - MULTIPLY(z4, FIX(0.869244010)) + z3; /* c11+c13 */ |
| 9060 + |
| 9061 + /* Final output stage */ |
| 9062 + |
| 9063 + wsptr[8*0] = (int) RIGHT_SHIFT(tmp20 + tmp10, CONST_BITS-PASS1_BITS); |
| 9064 + wsptr[8*14] = (int) RIGHT_SHIFT(tmp20 - tmp10, CONST_BITS-PASS1_BITS); |
| 9065 + wsptr[8*1] = (int) RIGHT_SHIFT(tmp21 + tmp11, CONST_BITS-PASS1_BITS); |
| 9066 + wsptr[8*13] = (int) RIGHT_SHIFT(tmp21 - tmp11, CONST_BITS-PASS1_BITS); |
| 9067 + wsptr[8*2] = (int) RIGHT_SHIFT(tmp22 + tmp12, CONST_BITS-PASS1_BITS); |
| 9068 + wsptr[8*12] = (int) RIGHT_SHIFT(tmp22 - tmp12, CONST_BITS-PASS1_BITS); |
| 9069 + wsptr[8*3] = (int) RIGHT_SHIFT(tmp23 + tmp13, CONST_BITS-PASS1_BITS); |
| 9070 + wsptr[8*11] = (int) RIGHT_SHIFT(tmp23 - tmp13, CONST_BITS-PASS1_BITS); |
| 9071 + wsptr[8*4] = (int) RIGHT_SHIFT(tmp24 + tmp14, CONST_BITS-PASS1_BITS); |
| 9072 + wsptr[8*10] = (int) RIGHT_SHIFT(tmp24 - tmp14, CONST_BITS-PASS1_BITS); |
| 9073 + wsptr[8*5] = (int) RIGHT_SHIFT(tmp25 + tmp15, CONST_BITS-PASS1_BITS); |
| 9074 + wsptr[8*9] = (int) RIGHT_SHIFT(tmp25 - tmp15, CONST_BITS-PASS1_BITS); |
| 9075 + wsptr[8*6] = (int) RIGHT_SHIFT(tmp26 + tmp16, CONST_BITS-PASS1_BITS); |
| 9076 + wsptr[8*8] = (int) RIGHT_SHIFT(tmp26 - tmp16, CONST_BITS-PASS1_BITS); |
| 9077 + wsptr[8*7] = (int) RIGHT_SHIFT(tmp27, CONST_BITS-PASS1_BITS); |
| 9078 + } |
| 9079 + |
| 9080 + /* Pass 2: process 15 rows from work array, store into output array. */ |
| 9081 + |
| 9082 + wsptr = workspace; |
| 9083 + for (ctr = 0; ctr < 15; ctr++) { |
| 9084 + outptr = output_buf[ctr] + output_col; |
| 9085 + |
| 9086 + /* Even part */ |
| 9087 + |
| 9088 + /* Add fudge factor here for final descale. */ |
| 9089 + z1 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2)); |
| 9090 + z1 <<= CONST_BITS; |
| 9091 + |
| 9092 + z2 = (INT32) wsptr[2]; |
| 9093 + z3 = (INT32) wsptr[4]; |
| 9094 + z4 = (INT32) wsptr[6]; |
| 9095 + |
| 9096 + tmp10 = MULTIPLY(z4, FIX(0.437016024)); /* c12 */ |
| 9097 + tmp11 = MULTIPLY(z4, FIX(1.144122806)); /* c6 */ |
| 9098 + |
| 9099 + tmp12 = z1 - tmp10; |
| 9100 + tmp13 = z1 + tmp11; |
| 9101 + z1 -= (tmp11 - tmp10) << 1; /* c0 = (c6-c12)*2 */ |
| 9102 + |
| 9103 + z4 = z2 - z3; |
| 9104 + z3 += z2; |
| 9105 + tmp10 = MULTIPLY(z3, FIX(1.337628990)); /* (c2+c4)/2 */ |
| 9106 + tmp11 = MULTIPLY(z4, FIX(0.045680613)); /* (c2-c4)/2 */ |
| 9107 + z2 = MULTIPLY(z2, FIX(1.439773946)); /* c4+c14 */ |
| 9108 + |
| 9109 + tmp20 = tmp13 + tmp10 + tmp11; |
| 9110 + tmp23 = tmp12 - tmp10 + tmp11 + z2; |
| 9111 + |
| 9112 + tmp10 = MULTIPLY(z3, FIX(0.547059574)); /* (c8+c14)/2 */ |
| 9113 + tmp11 = MULTIPLY(z4, FIX(0.399234004)); /* (c8-c14)/2 */ |
| 9114 + |
| 9115 + tmp25 = tmp13 - tmp10 - tmp11; |
| 9116 + tmp26 = tmp12 + tmp10 - tmp11 - z2; |
| 9117 + |
| 9118 + tmp10 = MULTIPLY(z3, FIX(0.790569415)); /* (c6+c12)/2 */ |
| 9119 + tmp11 = MULTIPLY(z4, FIX(0.353553391)); /* (c6-c12)/2 */ |
| 9120 + |
| 9121 + tmp21 = tmp12 + tmp10 + tmp11; |
| 9122 + tmp24 = tmp13 - tmp10 + tmp11; |
| 9123 + tmp11 += tmp11; |
| 9124 + tmp22 = z1 + tmp11; /* c10 = c6-c12 */ |
| 9125 + tmp27 = z1 - tmp11 - tmp11; /* c0 = (c6-c12)*2 */ |
| 9126 + |
| 9127 + /* Odd part */ |
| 9128 + |
| 9129 + z1 = (INT32) wsptr[1]; |
| 9130 + z2 = (INT32) wsptr[3]; |
| 9131 + z4 = (INT32) wsptr[5]; |
| 9132 + z3 = MULTIPLY(z4, FIX(1.224744871)); /* c5 */ |
| 9133 + z4 = (INT32) wsptr[7]; |
| 9134 + |
| 9135 + tmp13 = z2 - z4; |
| 9136 + tmp15 = MULTIPLY(z1 + tmp13, FIX(0.831253876)); /* c9 */ |
| 9137 + tmp11 = tmp15 + MULTIPLY(z1, FIX(0.513743148)); /* c3-c9 */ |
| 9138 + tmp14 = tmp15 - MULTIPLY(tmp13, FIX(2.176250899)); /* c3+c9 */ |
| 9139 + |
| 9140 + tmp13 = MULTIPLY(z2, - FIX(0.831253876)); /* -c9 */ |
| 9141 + tmp15 = MULTIPLY(z2, - FIX(1.344997024)); /* -c3 */ |
| 9142 + z2 = z1 - z4; |
| 9143 + tmp12 = z3 + MULTIPLY(z2, FIX(1.406466353)); /* c1 */ |
| 9144 + |
| 9145 + tmp10 = tmp12 + MULTIPLY(z4, FIX(2.457431844)) - tmp15; /* c1+c7 */ |
| 9146 + tmp16 = tmp12 - MULTIPLY(z1, FIX(1.112434820)) + tmp13; /* c1-c13 */ |
| 9147 + tmp12 = MULTIPLY(z2, FIX(1.224744871)) - z3; /* c5 */ |
| 9148 + z2 = MULTIPLY(z1 + z4, FIX(0.575212477)); /* c11 */ |
| 9149 + tmp13 += z2 + MULTIPLY(z1, FIX(0.475753014)) - z3; /* c7-c11 */ |
| 9150 + tmp15 += z2 - MULTIPLY(z4, FIX(0.869244010)) + z3; /* c11+c13 */ |
| 9151 + |
| 9152 + /* Final output stage */ |
| 9153 + |
| 9154 + outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp10, |
| 9155 + CONST_BITS+PASS1_BITS+3) |
| 9156 + & RANGE_MASK]; |
| 9157 + outptr[14] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp10, |
| 9158 + CONST_BITS+PASS1_BITS+3) |
| 9159 + & RANGE_MASK]; |
| 9160 + outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp11, |
| 9161 + CONST_BITS+PASS1_BITS+3) |
| 9162 + & RANGE_MASK]; |
| 9163 + outptr[13] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp11, |
| 9164 + CONST_BITS+PASS1_BITS+3) |
| 9165 + & RANGE_MASK]; |
| 9166 + outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp12, |
| 9167 + CONST_BITS+PASS1_BITS+3) |
| 9168 + & RANGE_MASK]; |
| 9169 + outptr[12] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp12, |
| 9170 + CONST_BITS+PASS1_BITS+3) |
| 9171 + & RANGE_MASK]; |
| 9172 + outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp13, |
| 9173 + CONST_BITS+PASS1_BITS+3) |
| 9174 + & RANGE_MASK]; |
| 9175 + outptr[11] = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp13, |
| 9176 + CONST_BITS+PASS1_BITS+3) |
| 9177 + & RANGE_MASK]; |
| 9178 + outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp14, |
| 9179 + CONST_BITS+PASS1_BITS+3) |
| 9180 + & RANGE_MASK]; |
| 9181 + outptr[10] = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp14, |
| 9182 + CONST_BITS+PASS1_BITS+3) |
| 9183 + & RANGE_MASK]; |
| 9184 + outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp25 + tmp15, |
| 9185 + CONST_BITS+PASS1_BITS+3) |
| 9186 + & RANGE_MASK]; |
| 9187 + outptr[9] = range_limit[(int) RIGHT_SHIFT(tmp25 - tmp15, |
| 9188 + CONST_BITS+PASS1_BITS+3) |
| 9189 + & RANGE_MASK]; |
| 9190 + outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp26 + tmp16, |
| 9191 + CONST_BITS+PASS1_BITS+3) |
| 9192 + & RANGE_MASK]; |
| 9193 + outptr[8] = range_limit[(int) RIGHT_SHIFT(tmp26 - tmp16, |
| 9194 + CONST_BITS+PASS1_BITS+3) |
| 9195 + & RANGE_MASK]; |
| 9196 + outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp27, |
| 9197 + CONST_BITS+PASS1_BITS+3) |
| 9198 + & RANGE_MASK]; |
| 9199 + |
| 9200 + wsptr += 8; /* advance pointer to next row */ |
| 9201 + } |
| 9202 +} |
| 9203 + |
| 9204 + |
| 9205 +/* |
| 9206 + * Perform dequantization and inverse DCT on one block of coefficients, |
| 9207 + * producing a 16x16 output block. |
| 9208 + * |
| 9209 + * Optimized algorithm with 28 multiplications in the 1-D kernel. |
| 9210 + * cK represents sqrt(2) * cos(K*pi/32). |
| 9211 + */ |
| 9212 + |
| 9213 +GLOBAL(void) |
| 9214 +jpeg_idct_16x16 (j_decompress_ptr cinfo, jpeg_component_info * compptr, |
| 9215 + JCOEFPTR coef_block, |
| 9216 + JSAMPARRAY output_buf, JDIMENSION output_col) |
| 9217 +{ |
| 9218 + INT32 tmp0, tmp1, tmp2, tmp3, tmp10, tmp11, tmp12, tmp13; |
| 9219 + INT32 tmp20, tmp21, tmp22, tmp23, tmp24, tmp25, tmp26, tmp27; |
| 9220 + INT32 z1, z2, z3, z4; |
| 9221 + JCOEFPTR inptr; |
| 9222 + ISLOW_MULT_TYPE * quantptr; |
| 9223 + int * wsptr; |
| 9224 + JSAMPROW outptr; |
| 9225 + JSAMPLE *range_limit = IDCT_range_limit(cinfo); |
| 9226 + int ctr; |
| 9227 + int workspace[8*16]; /* buffers data between passes */ |
| 9228 + SHIFT_TEMPS |
| 9229 + |
| 9230 + /* Pass 1: process columns from input, store into work array. */ |
| 9231 + |
| 9232 + inptr = coef_block; |
| 9233 + quantptr = (ISLOW_MULT_TYPE *) compptr->dct_table; |
| 9234 + wsptr = workspace; |
| 9235 + for (ctr = 0; ctr < 8; ctr++, inptr++, quantptr++, wsptr++) { |
| 9236 + /* Even part */ |
| 9237 + |
| 9238 + tmp0 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]); |
| 9239 + tmp0 <<= CONST_BITS; |
| 9240 + /* Add fudge factor here for final descale. */ |
| 9241 + tmp0 += 1 << (CONST_BITS-PASS1_BITS-1); |
| 9242 + |
| 9243 + z1 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]); |
| 9244 + tmp1 = MULTIPLY(z1, FIX(1.306562965)); /* c4[16] = c2[8] */ |
| 9245 + tmp2 = MULTIPLY(z1, FIX_0_541196100); /* c12[16] = c6[8] */ |
| 9246 + |
| 9247 + tmp10 = tmp0 + tmp1; |
| 9248 + tmp11 = tmp0 - tmp1; |
| 9249 + tmp12 = tmp0 + tmp2; |
| 9250 + tmp13 = tmp0 - tmp2; |
| 9251 + |
| 9252 + z1 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]); |
| 9253 + z2 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]); |
| 9254 + z3 = z1 - z2; |
| 9255 + z4 = MULTIPLY(z3, FIX(0.275899379)); /* c14[16] = c7[8] */ |
| 9256 + z3 = MULTIPLY(z3, FIX(1.387039845)); /* c2[16] = c1[8] */ |
| 9257 + |
| 9258 + tmp0 = z3 + MULTIPLY(z2, FIX_2_562915447); /* (c6+c2)[16] = (c3+c1)[8] */ |
| 9259 + tmp1 = z4 + MULTIPLY(z1, FIX_0_899976223); /* (c6-c14)[16] = (c3-c7)[8] */ |
| 9260 + tmp2 = z3 - MULTIPLY(z1, FIX(0.601344887)); /* (c2-c10)[16] = (c1-c5)[8] */ |
| 9261 + tmp3 = z4 - MULTIPLY(z2, FIX(0.509795579)); /* (c10-c14)[16] = (c5-c7)[8] */ |
| 9262 + |
| 9263 + tmp20 = tmp10 + tmp0; |
| 9264 + tmp27 = tmp10 - tmp0; |
| 9265 + tmp21 = tmp12 + tmp1; |
| 9266 + tmp26 = tmp12 - tmp1; |
| 9267 + tmp22 = tmp13 + tmp2; |
| 9268 + tmp25 = tmp13 - tmp2; |
| 9269 + tmp23 = tmp11 + tmp3; |
| 9270 + tmp24 = tmp11 - tmp3; |
| 9271 + |
| 9272 + /* Odd part */ |
| 9273 + |
| 9274 + z1 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]); |
| 9275 + z2 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]); |
| 9276 + z3 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]); |
| 9277 + z4 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]); |
| 9278 + |
| 9279 + tmp11 = z1 + z3; |
| 9280 + |
| 9281 + tmp1 = MULTIPLY(z1 + z2, FIX(1.353318001)); /* c3 */ |
| 9282 + tmp2 = MULTIPLY(tmp11, FIX(1.247225013)); /* c5 */ |
| 9283 + tmp3 = MULTIPLY(z1 + z4, FIX(1.093201867)); /* c7 */ |
| 9284 + tmp10 = MULTIPLY(z1 - z4, FIX(0.897167586)); /* c9 */ |
| 9285 + tmp11 = MULTIPLY(tmp11, FIX(0.666655658)); /* c11 */ |
| 9286 + tmp12 = MULTIPLY(z1 - z2, FIX(0.410524528)); /* c13 */ |
| 9287 + tmp0 = tmp1 + tmp2 + tmp3 - |
| 9288 + MULTIPLY(z1, FIX(2.286341144)); /* c7+c5+c3-c1 */ |
| 9289 + tmp13 = tmp10 + tmp11 + tmp12 - |
| 9290 + MULTIPLY(z1, FIX(1.835730603)); /* c9+c11+c13-c15 */ |
| 9291 + z1 = MULTIPLY(z2 + z3, FIX(0.138617169)); /* c15 */ |
| 9292 + tmp1 += z1 + MULTIPLY(z2, FIX(0.071888074)); /* c9+c11-c3-c15 */ |
| 9293 + tmp2 += z1 - MULTIPLY(z3, FIX(1.125726048)); /* c5+c7+c15-c3 */ |
| 9294 + z1 = MULTIPLY(z3 - z2, FIX(1.407403738)); /* c1 */ |
| 9295 + tmp11 += z1 - MULTIPLY(z3, FIX(0.766367282)); /* c1+c11-c9-c13 */ |
| 9296 + tmp12 += z1 + MULTIPLY(z2, FIX(1.971951411)); /* c1+c5+c13-c7 */ |
| 9297 + z2 += z4; |
| 9298 + z1 = MULTIPLY(z2, - FIX(0.666655658)); /* -c11 */ |
| 9299 + tmp1 += z1; |
| 9300 + tmp3 += z1 + MULTIPLY(z4, FIX(1.065388962)); /* c3+c11+c15-c7 */ |
| 9301 + z2 = MULTIPLY(z2, - FIX(1.247225013)); /* -c5 */ |
| 9302 + tmp10 += z2 + MULTIPLY(z4, FIX(3.141271809)); /* c1+c5+c9-c13 */ |
| 9303 + tmp12 += z2; |
| 9304 + z2 = MULTIPLY(z3 + z4, - FIX(1.353318001)); /* -c3 */ |
| 9305 + tmp2 += z2; |
| 9306 + tmp3 += z2; |
| 9307 + z2 = MULTIPLY(z4 - z3, FIX(0.410524528)); /* c13 */ |
| 9308 + tmp10 += z2; |
| 9309 + tmp11 += z2; |
| 9310 + |
| 9311 + /* Final output stage */ |
| 9312 + |
| 9313 + wsptr[8*0] = (int) RIGHT_SHIFT(tmp20 + tmp0, CONST_BITS-PASS1_BITS); |
| 9314 + wsptr[8*15] = (int) RIGHT_SHIFT(tmp20 - tmp0, CONST_BITS-PASS1_BITS); |
| 9315 + wsptr[8*1] = (int) RIGHT_SHIFT(tmp21 + tmp1, CONST_BITS-PASS1_BITS); |
| 9316 + wsptr[8*14] = (int) RIGHT_SHIFT(tmp21 - tmp1, CONST_BITS-PASS1_BITS); |
| 9317 + wsptr[8*2] = (int) RIGHT_SHIFT(tmp22 + tmp2, CONST_BITS-PASS1_BITS); |
| 9318 + wsptr[8*13] = (int) RIGHT_SHIFT(tmp22 - tmp2, CONST_BITS-PASS1_BITS); |
| 9319 + wsptr[8*3] = (int) RIGHT_SHIFT(tmp23 + tmp3, CONST_BITS-PASS1_BITS); |
| 9320 + wsptr[8*12] = (int) RIGHT_SHIFT(tmp23 - tmp3, CONST_BITS-PASS1_BITS); |
| 9321 + wsptr[8*4] = (int) RIGHT_SHIFT(tmp24 + tmp10, CONST_BITS-PASS1_BITS); |
| 9322 + wsptr[8*11] = (int) RIGHT_SHIFT(tmp24 - tmp10, CONST_BITS-PASS1_BITS); |
| 9323 + wsptr[8*5] = (int) RIGHT_SHIFT(tmp25 + tmp11, CONST_BITS-PASS1_BITS); |
| 9324 + wsptr[8*10] = (int) RIGHT_SHIFT(tmp25 - tmp11, CONST_BITS-PASS1_BITS); |
| 9325 + wsptr[8*6] = (int) RIGHT_SHIFT(tmp26 + tmp12, CONST_BITS-PASS1_BITS); |
| 9326 + wsptr[8*9] = (int) RIGHT_SHIFT(tmp26 - tmp12, CONST_BITS-PASS1_BITS); |
| 9327 + wsptr[8*7] = (int) RIGHT_SHIFT(tmp27 + tmp13, CONST_BITS-PASS1_BITS); |
| 9328 + wsptr[8*8] = (int) RIGHT_SHIFT(tmp27 - tmp13, CONST_BITS-PASS1_BITS); |
| 9329 + } |
| 9330 + |
| 9331 + /* Pass 2: process 16 rows from work array, store into output array. */ |
| 9332 + |
| 9333 + wsptr = workspace; |
| 9334 + for (ctr = 0; ctr < 16; ctr++) { |
| 9335 + outptr = output_buf[ctr] + output_col; |
| 9336 + |
| 9337 + /* Even part */ |
| 9338 + |
| 9339 + /* Add fudge factor here for final descale. */ |
| 9340 + tmp0 = (INT32) wsptr[0] + (ONE << (PASS1_BITS+2)); |
| 9341 + tmp0 <<= CONST_BITS; |
| 9342 + |
| 9343 + z1 = (INT32) wsptr[4]; |
| 9344 + tmp1 = MULTIPLY(z1, FIX(1.306562965)); /* c4[16] = c2[8] */ |
| 9345 + tmp2 = MULTIPLY(z1, FIX_0_541196100); /* c12[16] = c6[8] */ |
| 9346 + |
| 9347 + tmp10 = tmp0 + tmp1; |
| 9348 + tmp11 = tmp0 - tmp1; |
| 9349 + tmp12 = tmp0 + tmp2; |
| 9350 + tmp13 = tmp0 - tmp2; |
| 9351 + |
| 9352 + z1 = (INT32) wsptr[2]; |
| 9353 + z2 = (INT32) wsptr[6]; |
| 9354 + z3 = z1 - z2; |
| 9355 + z4 = MULTIPLY(z3, FIX(0.275899379)); /* c14[16] = c7[8] */ |
| 9356 + z3 = MULTIPLY(z3, FIX(1.387039845)); /* c2[16] = c1[8] */ |
| 9357 + |
| 9358 + tmp0 = z3 + MULTIPLY(z2, FIX_2_562915447); /* (c6+c2)[16] = (c3+c1)[8] */ |
| 9359 + tmp1 = z4 + MULTIPLY(z1, FIX_0_899976223); /* (c6-c14)[16] = (c3-c7)[8] */ |
| 9360 + tmp2 = z3 - MULTIPLY(z1, FIX(0.601344887)); /* (c2-c10)[16] = (c1-c5)[8] */ |
| 9361 + tmp3 = z4 - MULTIPLY(z2, FIX(0.509795579)); /* (c10-c14)[16] = (c5-c7)[8] */ |
| 9362 + |
| 9363 + tmp20 = tmp10 + tmp0; |
| 9364 + tmp27 = tmp10 - tmp0; |
| 9365 + tmp21 = tmp12 + tmp1; |
| 9366 + tmp26 = tmp12 - tmp1; |
| 9367 + tmp22 = tmp13 + tmp2; |
| 9368 + tmp25 = tmp13 - tmp2; |
| 9369 + tmp23 = tmp11 + tmp3; |
| 9370 + tmp24 = tmp11 - tmp3; |
| 9371 + |
| 9372 + /* Odd part */ |
| 9373 + |
| 9374 + z1 = (INT32) wsptr[1]; |
| 9375 + z2 = (INT32) wsptr[3]; |
| 9376 + z3 = (INT32) wsptr[5]; |
| 9377 + z4 = (INT32) wsptr[7]; |
| 9378 + |
| 9379 + tmp11 = z1 + z3; |
| 9380 + |
| 9381 + tmp1 = MULTIPLY(z1 + z2, FIX(1.353318001)); /* c3 */ |
| 9382 + tmp2 = MULTIPLY(tmp11, FIX(1.247225013)); /* c5 */ |
| 9383 + tmp3 = MULTIPLY(z1 + z4, FIX(1.093201867)); /* c7 */ |
| 9384 + tmp10 = MULTIPLY(z1 - z4, FIX(0.897167586)); /* c9 */ |
| 9385 + tmp11 = MULTIPLY(tmp11, FIX(0.666655658)); /* c11 */ |
| 9386 + tmp12 = MULTIPLY(z1 - z2, FIX(0.410524528)); /* c13 */ |
| 9387 + tmp0 = tmp1 + tmp2 + tmp3 - |
| 9388 + MULTIPLY(z1, FIX(2.286341144)); /* c7+c5+c3-c1 */ |
| 9389 + tmp13 = tmp10 + tmp11 + tmp12 - |
| 9390 + MULTIPLY(z1, FIX(1.835730603)); /* c9+c11+c13-c15 */ |
| 9391 + z1 = MULTIPLY(z2 + z3, FIX(0.138617169)); /* c15 */ |
| 9392 + tmp1 += z1 + MULTIPLY(z2, FIX(0.071888074)); /* c9+c11-c3-c15 */ |
| 9393 + tmp2 += z1 - MULTIPLY(z3, FIX(1.125726048)); /* c5+c7+c15-c3 */ |
| 9394 + z1 = MULTIPLY(z3 - z2, FIX(1.407403738)); /* c1 */ |
| 9395 + tmp11 += z1 - MULTIPLY(z3, FIX(0.766367282)); /* c1+c11-c9-c13 */ |
| 9396 + tmp12 += z1 + MULTIPLY(z2, FIX(1.971951411)); /* c1+c5+c13-c7 */ |
| 9397 + z2 += z4; |
| 9398 + z1 = MULTIPLY(z2, - FIX(0.666655658)); /* -c11 */ |
| 9399 + tmp1 += z1; |
| 9400 + tmp3 += z1 + MULTIPLY(z4, FIX(1.065388962)); /* c3+c11+c15-c7 */ |
| 9401 + z2 = MULTIPLY(z2, - FIX(1.247225013)); /* -c5 */ |
| 9402 + tmp10 += z2 + MULTIPLY(z4, FIX(3.141271809)); /* c1+c5+c9-c13 */ |
| 9403 + tmp12 += z2; |
| 9404 + z2 = MULTIPLY(z3 + z4, - FIX(1.353318001)); /* -c3 */ |
| 9405 + tmp2 += z2; |
| 9406 + tmp3 += z2; |
| 9407 + z2 = MULTIPLY(z4 - z3, FIX(0.410524528)); /* c13 */ |
| 9408 + tmp10 += z2; |
| 9409 + tmp11 += z2; |
| 9410 + |
| 9411 + /* Final output stage */ |
| 9412 + |
| 9413 + outptr[0] = range_limit[(int) RIGHT_SHIFT(tmp20 + tmp0, |
| 9414 + CONST_BITS+PASS1_BITS+3) |
| 9415 + & RANGE_MASK]; |
| 9416 + outptr[15] = range_limit[(int) RIGHT_SHIFT(tmp20 - tmp0, |
| 9417 + CONST_BITS+PASS1_BITS+3) |
| 9418 + & RANGE_MASK]; |
| 9419 + outptr[1] = range_limit[(int) RIGHT_SHIFT(tmp21 + tmp1, |
| 9420 + CONST_BITS+PASS1_BITS+3) |
| 9421 + & RANGE_MASK]; |
| 9422 + outptr[14] = range_limit[(int) RIGHT_SHIFT(tmp21 - tmp1, |
| 9423 + CONST_BITS+PASS1_BITS+3) |
| 9424 + & RANGE_MASK]; |
| 9425 + outptr[2] = range_limit[(int) RIGHT_SHIFT(tmp22 + tmp2, |
| 9426 + CONST_BITS+PASS1_BITS+3) |
| 9427 + & RANGE_MASK]; |
| 9428 + outptr[13] = range_limit[(int) RIGHT_SHIFT(tmp22 - tmp2, |
| 9429 + CONST_BITS+PASS1_BITS+3) |
| 9430 + & RANGE_MASK]; |
| 9431 + outptr[3] = range_limit[(int) RIGHT_SHIFT(tmp23 + tmp3, |
| 9432 + CONST_BITS+PASS1_BITS+3) |
| 9433 + & RANGE_MASK]; |
| 9434 + outptr[12] = range_limit[(int) RIGHT_SHIFT(tmp23 - tmp3, |
| 9435 + CONST_BITS+PASS1_BITS+3) |
| 9436 + & RANGE_MASK]; |
| 9437 + outptr[4] = range_limit[(int) RIGHT_SHIFT(tmp24 + tmp10, |
| 9438 + CONST_BITS+PASS1_BITS+3) |
| 9439 + & RANGE_MASK]; |
| 9440 + outptr[11] = range_limit[(int) RIGHT_SHIFT(tmp24 - tmp10, |
| 9441 + CONST_BITS+PASS1_BITS+3) |
| 9442 + & RANGE_MASK]; |
| 9443 + outptr[5] = range_limit[(int) RIGHT_SHIFT(tmp25 + tmp11, |
| 9444 + CONST_BITS+PASS1_BITS+3) |
| 9445 + & RANGE_MASK]; |
| 9446 + outptr[10] = range_limit[(int) RIGHT_SHIFT(tmp25 - tmp11, |
| 9447 + CONST_BITS+PASS1_BITS+3) |
| 9448 + & RANGE_MASK]; |
| 9449 + outptr[6] = range_limit[(int) RIGHT_SHIFT(tmp26 + tmp12, |
| 9450 + CONST_BITS+PASS1_BITS+3) |
| 9451 + & RANGE_MASK]; |
| 9452 + outptr[9] = range_limit[(int) RIGHT_SHIFT(tmp26 - tmp12, |
| 9453 + CONST_BITS+PASS1_BITS+3) |
| 9454 + & RANGE_MASK]; |
| 9455 + outptr[7] = range_limit[(int) RIGHT_SHIFT(tmp27 + tmp13, |
| 9456 + CONST_BITS+PASS1_BITS+3) |
| 9457 + & RANGE_MASK]; |
| 9458 + outptr[8] = range_limit[(int) RIGHT_SHIFT(tmp27 - tmp13, |
| 9459 + CONST_BITS+PASS1_BITS+3) |
| 9460 + & RANGE_MASK]; |
| 9461 + |
| 9462 + wsptr += 8; /* advance pointer to next row */ |
| 9463 + } |
| 9464 +} |
| 9465 + |
| 9466 +#endif /* IDCT_SCALING_SUPPORTED */ |
| 9467 #endif /* DCT_ISLOW_SUPPORTED */ |
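The scaled IDCT routines above rely on fixed-point arithmetic: constants are pre-scaled with FIX(), products are formed with MULTIPLY(), and a "fudge factor" of half the final divisor is added before the right shift so that the result is rounded rather than truncated. The following is a minimal sketch of that idea, not the library's code. `fix_const` and `descale` are hypothetical stand-ins for the FIX() and descaling macros, and `CONST_BITS` is assumed to be 13, which this patch does not state.

```c
#include <assert.h>
#include <stdint.h>

#define CONST_BITS 13  /* assumed number of fractional bits; not shown in this patch */

/* Hypothetical stand-in for FIX(): scale a non-negative constant to
 * CONST_BITS fractional bits, rounding to the nearest integer. */
static int32_t fix_const(double x)
{
  assert(x >= 0.0);
  return (int32_t) (x * (double) ((int32_t) 1 << CONST_BITS) + 0.5);
}

/* Hypothetical descale: divide by 2^n with rounding. Adding half of the
 * divisor first is the "fudge factor" mentioned in the comments above.
 * Assumes >> on a negative value is an arithmetic shift, which is
 * implementation-defined in C. */
static int32_t descale(int32_t x, int n)
{
  assert(n > 0 && n < 31);
  return (x + ((int32_t) 1 << (n - 1))) >> n;
}
```

Under these assumptions, `fix_const(0.541196100)` yields 4433, and `descale(fix_const(1.306562965) * 100, CONST_BITS)` yields 131, i.e. 100 * 1.3066 rounded to the nearest integer.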
| 9468 Index: jmemmgr.c |
| 9469 =================================================================== |
| 9470 --- jmemmgr.c (revision 829) |
| 9471 +++ jmemmgr.c (working copy) |
| 9472 @@ -37,6 +37,15 @@ |
| 9473 #endif |
| 9474 |
| 9475 |
| 9476 +LOCAL(size_t) |
| 9477 +round_up_pow2 (size_t a, size_t b) |
| 9478 +/* a rounded up to the next multiple of b, i.e. ceil(a/b)*b */ |
| 9479 +/* Assumes a >= 0, b > 0, and b is a power of 2 */ |
| 9480 +{ |
| 9481 + return ((a + b - 1) & (~(b - 1))); |
| 9482 +} |
| 9483 + |
| 9484 + |
| 9485 /* |
| 9486 * Some important notes: |
| 9487 * The allocation routines provided here must never return NULL. |
| 9488 @@ -122,7 +131,7 @@ |
| 9489 jvirt_barray_ptr virt_barray_list; |
| 9490 |
| 9491 /* This counts total space obtained from jpeg_get_small/large */ |
| 9492 - long total_space_allocated; |
| 9493 + size_t total_space_allocated; |
| 9494 |
| 9495 /* alloc_sarray and alloc_barray set this value for use by virtual |
| 9496 * array routines. |
| 9497 @@ -265,7 +274,7 @@ |
| 9498 * and so that algorithms can straddle outside the proper area up |
| 9499 * to the next alignment. |
| 9500 */ |
| 9501 - sizeofobject = jround_up(sizeofobject, ALIGN_SIZE); |
| 9502 + sizeofobject = round_up_pow2(sizeofobject, ALIGN_SIZE); |
| 9503 |
| 9504 /* Check for unsatisfiable request (do now to ensure no overflow below) */ |
| 9505 if ((SIZEOF(small_pool_hdr) + sizeofobject + ALIGN_SIZE - 1) > MAX_ALLOC_CHUNK) |
| 9506 @@ -317,8 +326,8 @@ |
| 9507 /* OK, allocate the object from the current pool */ |
| 9508 data_ptr = (char *) hdr_ptr; /* point to first data byte in pool... */ |
| 9509 data_ptr += SIZEOF(small_pool_hdr); /* ...by skipping the header... */ |
| 9510 - if ((unsigned long)data_ptr % ALIGN_SIZE) /* ...and adjust for alignment */ |
| 9511 - data_ptr += ALIGN_SIZE - (unsigned long)data_ptr % ALIGN_SIZE; |
| 9512 + if ((size_t)data_ptr % ALIGN_SIZE) /* ...and adjust for alignment */ |
| 9513 + data_ptr += ALIGN_SIZE - (size_t)data_ptr % ALIGN_SIZE; |
| 9514 data_ptr += hdr_ptr->bytes_used; /* point to place for object */ |
| 9515 hdr_ptr->bytes_used += sizeofobject; |
| 9516 hdr_ptr->bytes_left -= sizeofobject; |
| 9517 @@ -354,7 +363,7 @@ |
| 9518 * algorithms can straddle outside the proper area up to the next |
| 9519 * alignment. |
| 9520 */ |
| 9521 - sizeofobject = jround_up(sizeofobject, ALIGN_SIZE); |
| 9522 + sizeofobject = round_up_pow2(sizeofobject, ALIGN_SIZE); |
| 9523 |
| 9524 /* Check for unsatisfiable request (do now to ensure no overflow below) */ |
| 9525 if ((SIZEOF(large_pool_hdr) + sizeofobject + ALIGN_SIZE - 1) > MAX_ALLOC_CHUNK) |
| 9526 @@ -382,8 +391,8 @@ |
| 9527 |
| 9528 data_ptr = (char *) hdr_ptr; /* point to first data byte in pool... */ |
| 9529 data_ptr += SIZEOF(small_pool_hdr); /* ...by skipping the header... */ |
| 9530 - if ((unsigned long)data_ptr % ALIGN_SIZE) /* ...and adjust for alignment */ |
| 9531 - data_ptr += ALIGN_SIZE - (unsigned long)data_ptr % ALIGN_SIZE; |
| 9532 + if ((size_t)data_ptr % ALIGN_SIZE) /* ...and adjust for alignment */ |
| 9533 + data_ptr += ALIGN_SIZE - (size_t)data_ptr % ALIGN_SIZE; |
| 9534 |
| 9535 return (void FAR *) data_ptr; |
| 9536 } |
| 9537 @@ -420,7 +429,7 @@ |
| 9538 /* Make sure each row is properly aligned */ |
| 9539 if ((ALIGN_SIZE % SIZEOF(JSAMPLE)) != 0) |
| 9540 out_of_memory(cinfo, 5); /* safety check */ |
| 9541 - samplesperrow = jround_up(samplesperrow, (2 * ALIGN_SIZE) / SIZEOF(JSAMPLE)); |
| 9542 + samplesperrow = (JDIMENSION)round_up_pow2(samplesperrow, (2 * ALIGN_SIZE) / SIZEOF(JSAMPLE)); |
| 9543 |
| 9544 /* Calculate max # of rows allowed in one allocation chunk */ |
| 9545 ltemp = (MAX_ALLOC_CHUNK-SIZEOF(large_pool_hdr)) / |
| 9546 @@ -608,8 +617,8 @@ |
| 9547 /* Allocate the in-memory buffers for any unrealized virtual arrays */ |
| 9548 { |
| 9549 my_mem_ptr mem = (my_mem_ptr) cinfo->mem; |
| 9550 - long space_per_minheight, maximum_space, avail_mem; |
| 9551 - long minheights, max_minheights; |
| 9552 + size_t space_per_minheight, maximum_space, avail_mem; |
| 9553 + size_t minheights, max_minheights; |
| 9554 jvirt_sarray_ptr sptr; |
| 9555 jvirt_barray_ptr bptr; |
| 9556 |
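The round_up_pow2() helper added to jmemmgr.c replaces a divide-and-multiply round-up with a bitmask, which is only valid when the alignment is a power of two. The sketch below illustrates why the two forms agree in that case. The function names `round_up_pow2_sketch` and `round_up_div_sketch` are hypothetical and chosen for this illustration only; the asserts are added here to make the precondition explicit and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Mask-based round-up: valid only when b is a power of two.
 * ~(b - 1) clears the low log2(b) bits of (a + b - 1). */
static size_t round_up_pow2_sketch(size_t a, size_t b)
{
  assert(b != 0 && (b & (b - 1)) == 0);  /* b must be a power of two */
  return (a + b - 1) & ~(b - 1);
}

/* General round-up, ceil(a/b)*b, which works for any b > 0 but needs
 * a division. Shown only for comparison. */
static size_t round_up_div_sketch(size_t a, size_t b)
{
  assert(b != 0);
  return ((a + b - 1) / b) * b;
}
```

For example, with an alignment of 16, inputs of 13, 16, and 17 round up to 16, 16, and 32 in both forms. Neither version guards against overflow of `a + b - 1`, so callers are assumed to pass sizes well below SIZE_MAX.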
| 9557 Index: jmemnobs.c |
| 9558 =================================================================== |
| 9559 --- jmemnobs.c (revision 829) |
| 9560 +++ jmemnobs.c (working copy) |
| 9561 @@ -69,9 +69,9 @@ |
| 9562 * Here we always say, "we got all you want bud!" |
| 9563 */ |
| 9564 |
| 9565 -GLOBAL(long) |
| 9566 -jpeg_mem_available (j_common_ptr cinfo, long min_bytes_needed, |
| 9567 - long max_bytes_needed, long already_allocated) |
| 9568 +GLOBAL(size_t) |
| 9569 +jpeg_mem_available (j_common_ptr cinfo, size_t min_bytes_needed, |
| 9570 + size_t max_bytes_needed, size_t already_allocated) |
| 9571 { |
| 9572 return max_bytes_needed; |
| 9573 } |
| 9574 Index: jmemsys.h |
| 9575 =================================================================== |
| 9576 --- jmemsys.h (revision 829) |
| 9577 +++ jmemsys.h (working copy) |
| 9578 @@ -100,10 +100,10 @@ |
| 9579 * Conversely, zero may be returned to always use the minimum amount of memory. |
| 9580 */ |
| 9581 |
| 9582 -EXTERN(long) jpeg_mem_available JPP((j_common_ptr cinfo, |
| 9583 - long min_bytes_needed, |
| 9584 - long max_bytes_needed, |
| 9585 - long already_allocated)); |
| 9586 +EXTERN(size_t) jpeg_mem_available JPP((j_common_ptr cinfo, |
| 9587 + size_t min_bytes_needed, |
| 9588 + size_t max_bytes_needed, |
| 9589 + size_t already_allocated)); |
| 9590 |
| 9591 |
| 9592 /* |
167 Index: jmorecfg.h | 9593 Index: jmorecfg.h |
168 =================================================================== | 9594 =================================================================== |
169 --- jmorecfg.h (revision 829) | 9595 --- jmorecfg.h (revision 829) |
170 +++ jmorecfg.h (working copy) | 9596 +++ jmorecfg.h (working copy) |
171 @@ -153,14 +153,18 @@ | 9597 @@ -1,9 +1,10 @@ |
| 9598 /* |
| 9599 * jmorecfg.h |
| 9600 * |
| 9601 + * This file was part of the Independent JPEG Group's software: |
| 9602 * Copyright (C) 1991-1997, Thomas G. Lane. |
| 9603 - * Copyright (C) 2009, D. R. Commander. |
| 9604 - * This file is part of the Independent JPEG Group's software. |
| 9605 + * Modifications: |
| 9606 + * Copyright (C) 2009, 2011, D. R. Commander. |
| 9607 * For conditions of distribution and use, see the accompanying README file. |
| 9608 * |
| 9609 * This file contains additional configuration options that customize the |
| 9610 @@ -153,14 +154,18 @@ |
172 /* INT16 must hold at least the values -32768..32767. */ | 9611 /* INT16 must hold at least the values -32768..32767. */ |
173 | 9612 |
174 #ifndef XMD_H /* X11/xmd.h correctly defines INT16 */ | 9613 #ifndef XMD_H /* X11/xmd.h correctly defines INT16 */ |
175 +#ifndef _BASETSD_H_ /* basetsd.h correctly defines INT32 */ | 9614 +#ifndef _BASETSD_H_ /* basetsd.h correctly defines INT32 */ |
176 typedef short INT16; | 9615 typedef short INT16; |
177 #endif | 9616 #endif |
178 +#endif | 9617 +#endif |
179 | 9618 |
180 /* INT32 must hold at least signed 32-bit values. */ | 9619 /* INT32 must hold at least signed 32-bit values. */ |
181 | 9620 |
182 #ifndef XMD_H /* X11/xmd.h correctly defines INT32 */ | 9621 #ifndef XMD_H /* X11/xmd.h correctly defines INT32 */ |
183 +#ifndef _BASETSD_H_ /* basetsd.h correctly defines INT32 */ | 9622 +#ifndef _BASETSD_H_ /* basetsd.h correctly defines INT32 */ |
184 typedef long INT32; | 9623 typedef long INT32; |
185 #endif | 9624 #endif |
186 +#endif | 9625 +#endif |
187 | 9626 |
188 /* Datatype used for image dimensions. The JPEG standard only supports | 9627 /* Datatype used for image dimensions. The JPEG standard only supports |
189 * images up to 64K*64K due to 16-bit fields in SOF markers. Therefore | 9628 * images up to 64K*64K due to 16-bit fields in SOF markers. Therefore |
190 @@ -210,11 +214,13 @@ | 9629 @@ -210,11 +215,16 @@ |
191 * explicit coding is needed; see uses of the NEED_FAR_POINTERS symbol. | 9630 * explicit coding is needed; see uses of the NEED_FAR_POINTERS symbol. |
192 */ | 9631 */ |
193 | 9632 |
194 +#ifndef FAR | 9633 +#ifndef FAR |
195 #ifdef NEED_FAR_POINTERS | 9634 #ifdef NEED_FAR_POINTERS |
| 9635 +#ifndef FAR |
196 #define FAR far | 9636 #define FAR far |
| 9637 +#endif |
197 #else | 9638 #else |
| 9639 +#undef FAR |
198 #define FAR | 9640 #define FAR |
199 #endif | 9641 #endif |
200 +#endif | 9642 +#endif |
201 | 9643 |
202 | 9644 |
203 /* | 9645 /* |
| 9646 @@ -257,8 +267,6 @@ |
| 9647 * (You may HAVE to do that if your compiler doesn't like null source files.) |
| 9648 */ |
| 9649 |
| 9650 -/* Arithmetic coding is unsupported for legal reasons. Complaints to IBM. */ |
| 9651 - |
| 9652 /* Capability options common to encoder and decoder: */ |
| 9653 |
| 9654 #define DCT_ISLOW_SUPPORTED /* slow but accurate integer algorithm */ |
| 9655 @@ -267,7 +275,6 @@ |
| 9656 |
| 9657 /* Encoder capability options: */ |
| 9658 |
| 9659 -#undef C_ARITH_CODING_SUPPORTED /* Arithmetic coding back end? */ |
| 9660 #define C_MULTISCAN_FILES_SUPPORTED /* Multiple-scan JPEG files? */ |
| 9661 #define C_PROGRESSIVE_SUPPORTED /* Progressive JPEG? (Requires MULTISCAN)*/ |
| 9662 #define ENTROPY_OPT_SUPPORTED /* Optimization of entropy coding parms? */ |
| 9663 @@ -283,7 +290,6 @@ |
| 9664 |
| 9665 /* Decoder capability options: */ |
| 9666 |
| 9667 -#undef D_ARITH_CODING_SUPPORTED /* Arithmetic coding back end? */ |
| 9668 #define D_MULTISCAN_FILES_SUPPORTED /* Multiple-scan JPEG files? */ |
| 9669 #define D_PROGRESSIVE_SUPPORTED /* Progressive JPEG? (Requires MULTISCAN)*/ |
| 9670 #define SAVE_MARKERS_SUPPORTED /* jpeg_save_markers() needed? */ |
| 9671 @@ -317,22 +323,60 @@ |
| 9672 #define RGB_BLUE 2 /* Offset of Blue */ |
| 9673 #define RGB_PIXELSIZE 3 /* JSAMPLEs per RGB scanline element */ |
| 9674 |
| 9675 -#define JPEG_NUMCS 12 |
| 9676 +#define JPEG_NUMCS 16 |
| 9677 |
| 9678 +#define EXT_RGB_RED 0 |
| 9679 +#define EXT_RGB_GREEN 1 |
| 9680 +#define EXT_RGB_BLUE 2 |
| 9681 +#define EXT_RGB_PIXELSIZE 3 |
| 9682 + |
| 9683 +#define EXT_RGBX_RED 0 |
| 9684 +#define EXT_RGBX_GREEN 1 |
| 9685 +#define EXT_RGBX_BLUE 2 |
| 9686 +#define EXT_RGBX_PIXELSIZE 4 |
| 9687 + |
| 9688 +#define EXT_BGR_RED 2 |
| 9689 +#define EXT_BGR_GREEN 1 |
| 9690 +#define EXT_BGR_BLUE 0 |
| 9691 +#define EXT_BGR_PIXELSIZE 3 |
| 9692 + |
| 9693 +#define EXT_BGRX_RED 2 |
| 9694 +#define EXT_BGRX_GREEN 1 |
| 9695 +#define EXT_BGRX_BLUE 0 |
| 9696 +#define EXT_BGRX_PIXELSIZE 4 |
| 9697 + |
| 9698 +#define EXT_XBGR_RED 3 |
| 9699 +#define EXT_XBGR_GREEN 2 |
| 9700 +#define EXT_XBGR_BLUE 1 |
| 9701 +#define EXT_XBGR_PIXELSIZE 4 |
| 9702 + |
| 9703 +#define EXT_XRGB_RED 1 |
| 9704 +#define EXT_XRGB_GREEN 2 |
| 9705 +#define EXT_XRGB_BLUE 3 |
| 9706 +#define EXT_XRGB_PIXELSIZE 4 |
| 9707 + |
| 9708 static const int rgb_red[JPEG_NUMCS] = { |
| 9709 - -1, -1, RGB_RED, -1, -1, -1, 0, 0, 2, 2, 3, 1 |
| 9710 + -1, -1, RGB_RED, -1, -1, -1, EXT_RGB_RED, EXT_RGBX_RED, |
| 9711 + EXT_BGR_RED, EXT_BGRX_RED, EXT_XBGR_RED, EXT_XRGB_RED, |
| 9712 + EXT_RGBX_RED, EXT_BGRX_RED, EXT_XBGR_RED, EXT_XRGB_RED |
| 9713 }; |
| 9714 |
| 9715 static const int rgb_green[JPEG_NUMCS] = { |
| 9716 - -1, -1, RGB_GREEN, -1, -1, -1, 1, 1, 1, 1, 2, 2 |
| 9717 + -1, -1, RGB_GREEN, -1, -1, -1, EXT_RGB_GREEN, EXT_RGBX_GREEN, |
| 9718 + EXT_BGR_GREEN, EXT_BGRX_GREEN, EXT_XBGR_GREEN, EXT_XRGB_GREEN, |
| 9719 + EXT_RGBX_GREEN, EXT_BGRX_GREEN, EXT_XBGR_GREEN, EXT_XRGB_GREEN |
| 9720 }; |
| 9721 |
| 9722 static const int rgb_blue[JPEG_NUMCS] = { |
| 9723 - -1, -1, RGB_BLUE, -1, -1, -1, 2, 2, 0, 0, 1, 3 |
| 9724 + -1, -1, RGB_BLUE, -1, -1, -1, EXT_RGB_BLUE, EXT_RGBX_BLUE, |
| 9725 + EXT_BGR_BLUE, EXT_BGRX_BLUE, EXT_XBGR_BLUE, EXT_XRGB_BLUE, |
| 9726 + EXT_RGBX_BLUE, EXT_BGRX_BLUE, EXT_XBGR_BLUE, EXT_XRGB_BLUE |
| 9727 }; |
| 9728 |
| 9729 static const int rgb_pixelsize[JPEG_NUMCS] = { |
| 9730 - -1, -1, RGB_PIXELSIZE, -1, -1, -1, 3, 4, 3, 4, 4, 4 |
| 9731 + -1, -1, RGB_PIXELSIZE, -1, -1, -1, EXT_RGB_PIXELSIZE, EXT_RGBX_PIXELSIZE, |
| 9732 + EXT_BGR_PIXELSIZE, EXT_BGRX_PIXELSIZE, EXT_XBGR_PIXELSIZE, EXT_XRGB_PIXELSIZE, |
| 9733 + EXT_RGBX_PIXELSIZE, EXT_BGRX_PIXELSIZE, EXT_XBGR_PIXELSIZE, EXT_XRGB_PIXELSIZE |
| 9734 }; |
| 9735 |
| 9736 /* Definitions for speed-related optimizations. */ |
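The EXT_* macros and the rgb_red / rgb_green / rgb_blue / rgb_pixelsize tables in jmorecfg.h describe, for each extended color space, the byte offset of every component within a pixel and the pixel stride. A minimal sketch of how such offsets can be used to store one pixel follows. The `pixel_layout` struct and `store_pixel` function are hypothetical names for this illustration and do not exist in the library; the offset values are copied from the EXT_BGRX_* and EXT_XRGB_* macros in the hunk above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor holding one column of the rgb_* tables. */
typedef struct {
  int red, green, blue;  /* byte offsets of each component within a pixel */
  int pixelsize;         /* bytes per pixel */
} pixel_layout;

/* Values taken from the EXT_BGRX_* and EXT_XRGB_* macros. */
static const pixel_layout LAYOUT_BGRX = { 2, 1, 0, 4 };
static const pixel_layout LAYOUT_XRGB = { 1, 2, 3, 4 };

/* Store one pixel at column x of a row, leaving any filler byte untouched. */
static void store_pixel(unsigned char *row, size_t x, pixel_layout layout,
                        unsigned char r, unsigned char g, unsigned char b)
{
  unsigned char *p;

  assert(row != NULL);
  p = row + x * (size_t) layout.pixelsize;
  p[layout.red] = r;
  p[layout.green] = g;
  p[layout.blue] = b;
}
```

Because the layout is data rather than code, the same routine handles every component ordering; only the descriptor changes. The library itself is not claimed to work this way internally.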
| 9737 Index: jpegint.h |
| 9738 =================================================================== |
| 9739 --- jpegint.h (revision 829) |
| 9740 +++ jpegint.h (working copy) |
| 9741 @@ -2,6 +2,7 @@ |
| 9742 * jpegint.h |
| 9743 * |
| 9744 * Copyright (C) 1991-1997, Thomas G. Lane. |
| 9745 + * Modified 1997-2009 by Guido Vollbeding. |
| 9746 * This file is part of the Independent JPEG Group's software. |
| 9747 * For conditions of distribution and use, see the accompanying README file. |
| 9748 * |
| 9749 @@ -304,6 +305,7 @@ |
| 9750 #define jinit_forward_dct jIFDCT |
| 9751 #define jinit_huff_encoder jIHEncoder |
| 9752 #define jinit_phuff_encoder jIPHEncoder |
| 9753 +#define jinit_arith_encoder jIAEncoder |
| 9754 #define jinit_marker_writer jIMWriter |
| 9755 #define jinit_master_decompress jIDMaster |
| 9756 #define jinit_d_main_controller jIDMainC |
| 9757 @@ -313,6 +315,7 @@ |
| 9758 #define jinit_marker_reader jIMReader |
| 9759 #define jinit_huff_decoder jIHDecoder |
| 9760 #define jinit_phuff_decoder jIPHDecoder |
| 9761 +#define jinit_arith_decoder jIADecoder |
| 9762 #define jinit_inverse_dct jIIDCT |
| 9763 #define jinit_upsampler jIUpsampler |
| 9764 #define jinit_color_deconverter jIDColor |
| 9765 @@ -327,6 +330,7 @@ |
| 9766 #define jzero_far jZeroFar |
| 9767 #define jpeg_zigzag_order jZIGTable |
| 9768 #define jpeg_natural_order jZAGTable |
| 9769 +#define jpeg_aritab jAriTab |
| 9770 #endif /* NEED_SHORT_EXTERNAL_NAMES */ |
| 9771 |
| 9772 |
| 9773 @@ -345,6 +349,7 @@ |
| 9774 EXTERN(void) jinit_forward_dct JPP((j_compress_ptr cinfo)); |
| 9775 EXTERN(void) jinit_huff_encoder JPP((j_compress_ptr cinfo)); |
| 9776 EXTERN(void) jinit_phuff_encoder JPP((j_compress_ptr cinfo)); |
| 9777 +EXTERN(void) jinit_arith_encoder JPP((j_compress_ptr cinfo)); |
| 9778 EXTERN(void) jinit_marker_writer JPP((j_compress_ptr cinfo)); |
| 9779 /* Decompression module initialization routines */ |
| 9780 EXTERN(void) jinit_master_decompress JPP((j_decompress_ptr cinfo)); |
| 9781 @@ -358,6 +363,7 @@ |
| 9782 EXTERN(void) jinit_marker_reader JPP((j_decompress_ptr cinfo)); |
| 9783 EXTERN(void) jinit_huff_decoder JPP((j_decompress_ptr cinfo)); |
| 9784 EXTERN(void) jinit_phuff_decoder JPP((j_decompress_ptr cinfo)); |
| 9785 +EXTERN(void) jinit_arith_decoder JPP((j_decompress_ptr cinfo)); |
| 9786 EXTERN(void) jinit_inverse_dct JPP((j_decompress_ptr cinfo)); |
| 9787 EXTERN(void) jinit_upsampler JPP((j_decompress_ptr cinfo)); |
| 9788 EXTERN(void) jinit_color_deconverter JPP((j_decompress_ptr cinfo)); |
| 9789 @@ -382,6 +388,9 @@ |
| 9790 #endif |
| 9791 extern const int jpeg_natural_order[]; /* zigzag coef order to natural order */ |
| 9792 |
| 9793 +/* Arithmetic coding probability estimation tables in jaricom.c */ |
| 9794 +extern const INT32 jpeg_aritab[]; |
| 9795 + |
| 9796 /* Suppress undefined-structure complaints if necessary. */ |
| 9797 |
| 9798 #ifdef INCOMPLETE_TYPES_BROKEN |
204 Index: jpeglib.h | 9799 Index: jpeglib.h |
205 =================================================================== | 9800 =================================================================== |
206 --- jpeglib.h (revision 829) | 9801 --- jpeglib.h (revision 829) |
207 +++ jpeglib.h (working copy) | 9802 +++ jpeglib.h (working copy) |
208 @@ -15,6 +15,10 @@ | 9803 @@ -1,9 +1,12 @@ |
| 9804 /* |
| 9805 * jpeglib.h |
| 9806 * |
| 9807 + * This file was part of the Independent JPEG Group's software: |
| 9808 * Copyright (C) 1991-1998, Thomas G. Lane. |
| 9809 - * Copyright (C) 2009, D. R. Commander. |
| 9810 - * This file is part of the Independent JPEG Group's software. |
| 9811 + * Modified 2002-2009 by Guido Vollbeding. |
| 9812 + * Modifications: |
| 9813 + * Copyright (C) 2009-2011, 2013, D. R. Commander. |
| 9814 + * Copyright (C) 2015, Google, Inc. |
| 9815 * For conditions of distribution and use, see the accompanying README file. |
| 9816 * |
| 9817 * This file defines the application interface for the JPEG library. |
| 9818 @@ -14,6 +17,10 @@ |
209 #ifndef JPEGLIB_H | 9819 #ifndef JPEGLIB_H |
210 #define JPEGLIB_H | 9820 #define JPEGLIB_H |
211 | 9821 |
212 +/* Begin chromium edits */ | 9822 +/* Begin chromium edits */ |
213 +#include "jpeglibmangler.h" | 9823 +#include "jpeglibmangler.h" |
214 +/* End chromium edits */ | 9824 +/* End chromium edits */ |
215 + | 9825 + |
216 /* | 9826 /* |
217 * First we include the configuration files that record how this | 9827 * First we include the configuration files that record how this |
218 * installation of the JPEG library is set up. jconfig.h can be | 9828 * installation of the JPEG library is set up. jconfig.h can be |
| 9829 @@ -27,13 +34,13 @@ |
| 9830 #include "jmorecfg.h" /* seldom changed options */ |
| 9831 |
| 9832 |
| 9833 -/* Version ID for the JPEG library. |
| 9834 - * Might be useful for tests like "#if JPEG_LIB_VERSION >= 60". |
| 9835 - */ |
| 9836 +#ifdef __cplusplus |
| 9837 +#ifndef DONT_USE_EXTERN_C |
| 9838 +extern "C" { |
| 9839 +#endif |
| 9840 +#endif |
| 9841 |
| 9842 -#define JPEG_LIB_VERSION 62 /* Version 6b */ |
| 9843 |
| 9844 - |
| 9845 /* Various constants determining the sizes of things. |
| 9846 * All of these are specified by the JPEG standard, so don't change them |
| 9847 * if you want to be compatible. |
| 9848 @@ -145,12 +152,17 @@ |
| 9849 * Values of 1,2,4,8 are likely to be supported. Note that different |
| 9850 * components may receive different IDCT scalings. |
| 9851 */ |
| 9852 +#if JPEG_LIB_VERSION >= 70 |
| 9853 + int DCT_h_scaled_size; |
| 9854 + int DCT_v_scaled_size; |
| 9855 +#else |
| 9856 int DCT_scaled_size; |
| 9857 +#endif |
| 9858 /* The downsampled dimensions are the component's actual, unpadded number |
| 9859 * of samples at the main buffer (preprocessing/compression interface), thus |
| 9860 * downsampled_width = ceil(image_width * Hi/Hmax) |
| 9861 * and similarly for height. For decompression, IDCT scaling is included, so |
| 9862 - * downsampled_width = ceil(image_width * Hi/Hmax * DCT_scaled_size/DCTSIZE) |
| 9863 + * downsampled_width = ceil(image_width * Hi/Hmax * DCT_[h_]scaled_size/DCTSIZE) |
| 9864 */ |
| 9865 JDIMENSION downsampled_width; /* actual width in samples */ |
| 9866 JDIMENSION downsampled_height; /* actual height in samples */ |
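Reviewer sketch (not part of the patch): the downsampled_width formula in the comment above can be written with integer arithmetic only. The function names here are hypothetical and DCTSIZE is redefined locally as 8, its value in jpeglib.h; the same expression with the vertical factors (Vi, Vmax, DCT_v_scaled_size) would give the height.

```c
#include <assert.h>

#define DCTSIZE 8  /* basic DCT block size, as defined in jpeglib.h */

/* Integer ceil(a / b) for non-negative a and positive b. */
long div_round_up(long a, long b)
{
  assert(a >= 0 && b > 0);
  return (a + b - 1) / b;
}

/* ceil(image_width * Hi/Hmax * DCT_h_scaled_size/DCTSIZE), computed as one
 * integer division so that no intermediate rounding is lost. */
long downsampled_width(long image_width, int h_samp_factor,
                       int max_h_samp_factor, int dct_h_scaled_size)
{
  return div_round_up(image_width * h_samp_factor * dct_h_scaled_size,
                      (long)max_h_samp_factor * DCTSIZE);
}
```

For a 227-sample-wide image, a chroma component with Hi=1 and Hmax=2 at full IDCT size (8) gets 114 samples per row, and 57 when the IDCT is scaled to 4.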
| 9867 @@ -165,7 +177,7 @@ |
| 9868 int MCU_width; /* number of blocks per MCU, horizontally */ |
| 9869 int MCU_height; /* number of blocks per MCU, vertically */ |
| 9870 int MCU_blocks; /* MCU_width * MCU_height */ |
| 9871 - int MCU_sample_width; /* MCU width in samples, MCU_width*DCT_scaled_size */ |
| 9872 + int MCU_sample_width; /* MCU width in samples, MCU_width*DCT_[h_]scaled_size */ |
| 9873 int last_col_width; /* # of non-dummy blocks across in last MCU */ |
| 9874 int last_row_height; /* # of non-dummy blocks down in last MCU */ |
| 9875 |
| 9876 @@ -205,12 +217,13 @@ |
| 9877 /* Known color spaces. */ |
| 9878 |
| 9879 #define JCS_EXTENSIONS 1 |
| 9880 +#define JCS_ALPHA_EXTENSIONS 1 |
| 9881 |
| 9882 typedef enum { |
| 9883 JCS_UNKNOWN, /* error/unspecified */ |
| 9884 JCS_GRAYSCALE, /* monochrome */ |
| 9885 JCS_RGB, /* red/green/blue as specified by the RGB_RED, RGB_GREEN, |
| 9886 - RGB_BLUE, and RGB_PIXELSIZE macros */ |
| 9887 + RGB_BLUE, and RGB_PIXELSIZE macros */ |
| 9888 JCS_YCbCr, /* Y/Cb/Cr (also known as YUV) */ |
| 9889 JCS_CMYK, /* C/M/Y/K */ |
| 9890 JCS_YCCK, /* Y/Cb/Cr/K */ |
| 9891 @@ -220,6 +233,17 @@ |
| 9892 JCS_EXT_BGRX, /* blue/green/red/x */ |
| 9893 JCS_EXT_XBGR, /* x/blue/green/red */ |
| 9894 JCS_EXT_XRGB, /* x/red/green/blue */ |
| 9895 + /* When out_color_space is set to JCS_EXT_RGBX, JCS_EXT_BGRX, |
| 9896 + JCS_EXT_XBGR, or JCS_EXT_XRGB during decompression, the X byte is |
| 9897 + undefined, and in order to ensure the best performance, |
| 9898 + libjpeg-turbo can set that byte to whatever value it wishes. Use |
| 9899 + the following colorspace constants to ensure that the X byte is set |
| 9900 + to 0xFF, so that it can be interpreted as an opaque alpha |
| 9901 + channel. */ |
| 9902 + JCS_EXT_RGBA, /* red/green/blue/alpha */ |
| 9903 + JCS_EXT_BGRA, /* blue/green/red/alpha */ |
| 9904 + JCS_EXT_ABGR, /* alpha/blue/green/red */ |
| 9905 + JCS_EXT_ARGB /* alpha/red/green/blue */ |
| 9906 } J_COLOR_SPACE; |
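Reviewer sketch (not part of the patch): the comment above says the X byte is undefined for the JCS_EXT_*X* color spaces and is guaranteed to be 0xFF only for the new alpha variants. A caller that cannot rely on JCS_ALPHA_EXTENSIONS being defined would presumably have to fix up the padding byte itself after decompression. The helper below is hypothetical and works on a plain 4-bytes-per-pixel buffer; alpha_offset is assumed to be 3 for RGBX/BGRX layouts and 0 for XBGR/XRGB layouts.

```c
#include <assert.h>
#include <stddef.h>

/* Set the padding byte of every 4-byte pixel to 0xFF so that the buffer can
 * be interpreted as having an opaque alpha channel. */
void force_opaque_alpha(unsigned char *pixels, size_t num_pixels,
                        int alpha_offset)
{
  size_t i;

  assert(pixels != NULL);
  assert(alpha_offset >= 0 && alpha_offset < 4);
  for (i = 0; i < num_pixels; i++)
    pixels[i * 4 + alpha_offset] = 0xFF;
}
```

Requesting JCS_EXT_RGBA (or a sibling) as out_color_space makes this extra pass unnecessary, which is the point of the new enum values.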
| 9907 |
| 9908 /* DCT/IDCT algorithm options. */ |
| 9909 @@ -301,6 +325,19 @@ |
| 9910 * helper routines to simplify changing parameters. |
| 9911 */ |
| 9912 |
| 9913 +#if JPEG_LIB_VERSION >= 70 |
| 9914 + unsigned int scale_num, scale_denom; /* fraction by which to scale image */ |
| 9915 + |
| 9916 + JDIMENSION jpeg_width; /* scaled JPEG image width */ |
| 9917 + JDIMENSION jpeg_height; /* scaled JPEG image height */ |
| 9918 + /* Dimensions of actual JPEG image that will be written to file, |
| 9919 + * derived from input dimensions by scaling factors above. |
| 9920 + * These fields are computed by jpeg_start_compress(). |
| 9921 + * You can also use jpeg_calc_jpeg_dimensions() to determine these values |
| 9922 + * in advance of calling jpeg_start_compress(). |
| 9923 + */ |
| 9924 +#endif |
| 9925 + |
| 9926 int data_precision; /* bits of precision in image data */ |
| 9927 |
| 9928 int num_components; /* # of color components in JPEG image */ |
| 9929 @@ -308,14 +345,19 @@ |
| 9930 |
| 9931 jpeg_component_info * comp_info; |
| 9932 /* comp_info[i] describes component that appears i'th in SOF */ |
| 9933 - |
| 9934 + |
| 9935 JQUANT_TBL * quant_tbl_ptrs[NUM_QUANT_TBLS]; |
| 9936 - /* ptrs to coefficient quantization tables, or NULL if not defined */ |
| 9937 - |
| 9938 +#if JPEG_LIB_VERSION >= 70 |
| 9939 + int q_scale_factor[NUM_QUANT_TBLS]; |
| 9940 +#endif |
| 9941 + /* ptrs to coefficient quantization tables, or NULL if not defined, |
| 9942 + * and corresponding scale factors (percentage, initialized 100). |
| 9943 + */ |
| 9944 + |
| 9945 JHUFF_TBL * dc_huff_tbl_ptrs[NUM_HUFF_TBLS]; |
| 9946 JHUFF_TBL * ac_huff_tbl_ptrs[NUM_HUFF_TBLS]; |
| 9947 /* ptrs to Huffman coding tables, or NULL if not defined */ |
| 9948 - |
| 9949 + |
| 9950 UINT8 arith_dc_L[NUM_ARITH_TBLS]; /* L values for DC arith-coding tables */ |
| 9951 UINT8 arith_dc_U[NUM_ARITH_TBLS]; /* U values for DC arith-coding tables */ |
| 9952 UINT8 arith_ac_K[NUM_ARITH_TBLS]; /* Kx values for AC arith-coding tables */ |
| 9953 @@ -331,6 +373,9 @@ |
| 9954 boolean arith_code; /* TRUE=arithmetic coding, FALSE=Huffman */ |
| 9955 boolean optimize_coding; /* TRUE=optimize entropy encoding parms */ |
| 9956 boolean CCIR601_sampling; /* TRUE=first samples are cosited */ |
| 9957 +#if JPEG_LIB_VERSION >= 70 |
| 9958 + boolean do_fancy_downsampling; /* TRUE=apply fancy downsampling */ |
| 9959 +#endif |
| 9960 int smoothing_factor; /* 1..100, or 0 for no input smoothing */ |
| 9961 J_DCT_METHOD dct_method; /* DCT algorithm selector */ |
| 9962 |
| 9963 @@ -374,6 +419,11 @@ |
| 9964 int max_h_samp_factor; /* largest h_samp_factor */ |
| 9965 int max_v_samp_factor; /* largest v_samp_factor */ |
| 9966 |
| 9967 +#if JPEG_LIB_VERSION >= 70 |
| 9968 + int min_DCT_h_scaled_size; /* smallest DCT_h_scaled_size of any component */ |
| 9969 + int min_DCT_v_scaled_size; /* smallest DCT_v_scaled_size of any component */ |
| 9970 +#endif |
| 9971 + |
| 9972 JDIMENSION total_iMCU_rows; /* # of iMCU rows to be input to coef ctlr */ |
| 9973 /* The coefficient controller receives data in units of MCU rows as defined |
| 9974 * for fully interleaved scans (whether the JPEG file is interleaved or not). |
| 9975 @@ -399,6 +449,12 @@ |
| 9976 |
| 9977 int Ss, Se, Ah, Al; /* progressive JPEG parameters for scan */ |
| 9978 |
| 9979 +#if JPEG_LIB_VERSION >= 80 |
| 9980 + int block_size; /* the basic DCT block size: 1..16 */ |
| 9981 + const int * natural_order; /* natural-order position array */ |
| 9982 + int lim_Se; /* min( Se, DCTSIZE2-1 ) */ |
| 9983 +#endif |
| 9984 + |
| 9985 /* |
| 9986 * Links to compression subobjects (methods and private variables of modules) |
| 9987 */ |
| 9988 @@ -545,6 +601,9 @@ |
| 9989 jpeg_component_info * comp_info; |
| 9990 /* comp_info[i] describes component that appears i'th in SOF */ |
| 9991 |
| 9992 +#if JPEG_LIB_VERSION >= 80 |
| 9993 + boolean is_baseline; /* TRUE if Baseline SOF0 encountered */ |
| 9994 +#endif |
| 9995 boolean progressive_mode; /* TRUE if SOFn specifies progressive mode */ |
| 9996 boolean arith_code; /* TRUE=arithmetic coding, FALSE=Huffman */ |
| 9997 |
| 9998 @@ -585,7 +644,12 @@ |
| 9999 int max_h_samp_factor; /* largest h_samp_factor */ |
| 10000 int max_v_samp_factor; /* largest v_samp_factor */ |
| 10001 |
| 10002 +#if JPEG_LIB_VERSION >= 70 |
| 10003 + int min_DCT_h_scaled_size; /* smallest DCT_h_scaled_size of any component */ |
| 10004 + int min_DCT_v_scaled_size; /* smallest DCT_v_scaled_size of any component */ |
| 10005 +#else |
| 10006 int min_DCT_scaled_size; /* smallest DCT_scaled_size of any component */ |
| 10007 +#endif |
| 10008 |
| 10009 JDIMENSION total_iMCU_rows; /* # of iMCU rows in image */ |
| 10010 /* The coefficient controller's input and output progress is measured in |
| 10011 @@ -593,7 +657,7 @@ |
| 10012 * in fully interleaved JPEG scans, but are used whether the scan is |
| 10013 * interleaved or not. We define an iMCU row as v_samp_factor DCT block |
| 10014 * rows of each component. Therefore, the IDCT output contains |
| 10015 - * v_samp_factor*DCT_scaled_size sample rows of a component per iMCU row. |
| 10016 + * v_samp_factor*DCT_[v_]scaled_size sample rows of a component per iMCU row. |
| 10017 */ |
| 10018 |
| 10019 JSAMPLE * sample_range_limit; /* table for fast range-limiting */ |
| 10020 @@ -617,6 +681,14 @@ |
| 10021 |
| 10022 int Ss, Se, Ah, Al; /* progressive JPEG parameters for scan */ |
| 10023 |
| 10024 +#if JPEG_LIB_VERSION >= 80 |
| 10025 + /* These fields are derived from Se of first SOS marker. |
| 10026 + */ |
| 10027 + int block_size; /* the basic DCT block size: 1..16 */ |
| 10028 + const int * natural_order; /* natural-order position array for entropy decode */ |
| 10029 + int lim_Se; /* min( Se, DCTSIZE2-1 ) for entropy decode */ |
| 10030 +#endif |
| 10031 + |
| 10032 /* This field is shared between entropy decoder and marker parser. |
| 10033 * It is either zero or the code of a JPEG marker that has been |
| 10034 * read from the data source, but has not yet been processed. |
| 10035 @@ -846,11 +918,18 @@ |
| 10036 #define jpeg_destroy_decompress jDestDecompress |
| 10037 #define jpeg_stdio_dest jStdDest |
| 10038 #define jpeg_stdio_src jStdSrc |
| 10039 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED) |
| 10040 +#define jpeg_mem_dest jMemDest |
| 10041 +#define jpeg_mem_src jMemSrc |
| 10042 +#endif |
| 10043 #define jpeg_set_defaults jSetDefaults |
| 10044 #define jpeg_set_colorspace jSetColorspace |
| 10045 #define jpeg_default_colorspace jDefColorspace |
| 10046 #define jpeg_set_quality jSetQuality |
| 10047 #define jpeg_set_linear_quality jSetLQuality |
| 10048 +#if JPEG_LIB_VERSION >= 70 |
| 10049 +#define jpeg_default_qtables jDefQTables |
| 10050 +#endif |
| 10051 #define jpeg_add_quant_table jAddQuantTable |
| 10052 #define jpeg_quality_scaling jQualityScaling |
| 10053 #define jpeg_simple_progression jSimProgress |
| 10054 @@ -860,6 +939,9 @@ |
| 10055 #define jpeg_start_compress jStrtCompress |
| 10056 #define jpeg_write_scanlines jWrtScanlines |
| 10057 #define jpeg_finish_compress jFinCompress |
| 10058 +#if JPEG_LIB_VERSION >= 70 |
| 10059 +#define jpeg_calc_jpeg_dimensions jCjpegDimensions |
| 10060 +#endif |
| 10061 #define jpeg_write_raw_data jWrtRawData |
| 10062 #define jpeg_write_marker jWrtMarker |
| 10063 #define jpeg_write_m_header jWrtMHeader |
| 10064 @@ -876,6 +958,9 @@ |
| 10065 #define jpeg_input_complete jInComplete |
| 10066 #define jpeg_new_colormap jNewCMap |
| 10067 #define jpeg_consume_input jConsumeInput |
| 10068 +#if JPEG_LIB_VERSION >= 80 |
| 10069 +#define jpeg_core_output_dimensions jCoreDimensions |
| 10070 +#endif |
| 10071 #define jpeg_calc_output_dimensions jCalcDimensions |
| 10072 #define jpeg_save_markers jSaveMarkers |
| 10073 #define jpeg_set_marker_processor jSetMarker |
| 10074 @@ -920,6 +1005,16 @@ |
| 10075 EXTERN(void) jpeg_stdio_dest JPP((j_compress_ptr cinfo, FILE * outfile)); |
| 10076 EXTERN(void) jpeg_stdio_src JPP((j_decompress_ptr cinfo, FILE * infile)); |
| 10077 |
| 10078 +#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED) |
| 10079 +/* Data source and destination managers: memory buffers. */ |
| 10080 +EXTERN(void) jpeg_mem_dest JPP((j_compress_ptr cinfo, |
| 10081 + unsigned char ** outbuffer, |
| 10082 + unsigned long * outsize)); |
| 10083 +EXTERN(void) jpeg_mem_src JPP((j_decompress_ptr cinfo, |
| 10084 + unsigned char * inbuffer, |
| 10085 + unsigned long insize)); |
| 10086 +#endif |
| 10087 + |
| 10088 /* Default parameter setup for compression */ |
| 10089 EXTERN(void) jpeg_set_defaults JPP((j_compress_ptr cinfo)); |
| 10090 /* Compression parameter setup aids */ |
| 10091 @@ -931,6 +1026,10 @@ |
| 10092 EXTERN(void) jpeg_set_linear_quality JPP((j_compress_ptr cinfo, |
| 10093 int scale_factor, |
| 10094 boolean force_baseline)); |
| 10095 +#if JPEG_LIB_VERSION >= 70 |
| 10096 +EXTERN(void) jpeg_default_qtables JPP((j_compress_ptr cinfo, |
| 10097 + boolean force_baseline)); |
| 10098 +#endif |
| 10099 EXTERN(void) jpeg_add_quant_table JPP((j_compress_ptr cinfo, int which_tbl, |
| 10100 const unsigned int *basic_table, |
| 10101 int scale_factor, |
| 10102 @@ -950,12 +1049,17 @@ |
| 10103 JDIMENSION num_lines)); |
| 10104 EXTERN(void) jpeg_finish_compress JPP((j_compress_ptr cinfo)); |
| 10105 |
| 10106 +#if JPEG_LIB_VERSION >= 70 |
| 10107 +/* Precalculate JPEG dimensions for current compression parameters. */ |
| 10108 +EXTERN(void) jpeg_calc_jpeg_dimensions JPP((j_compress_ptr cinfo)); |
| 10109 +#endif |
| 10110 + |
| 10111 /* Replaces jpeg_write_scanlines when writing raw downsampled data. */ |
| 10112 EXTERN(JDIMENSION) jpeg_write_raw_data JPP((j_compress_ptr cinfo, |
| 10113 JSAMPIMAGE data, |
| 10114 JDIMENSION num_lines)); |
| 10115 |
| 10116 -/* Write a special marker. See libjpeg.doc concerning safe usage. */ |
| 10117 +/* Write a special marker. See libjpeg.txt concerning safe usage. */ |
| 10118 EXTERN(void) jpeg_write_marker |
| 10119 JPP((j_compress_ptr cinfo, int marker, |
| 10120 const JOCTET * dataptr, unsigned int datalen)); |
| 10121 @@ -986,6 +1090,8 @@ |
| 10122 EXTERN(JDIMENSION) jpeg_read_scanlines JPP((j_decompress_ptr cinfo, |
| 10123 JSAMPARRAY scanlines, |
| 10124 JDIMENSION max_lines)); |
| 10125 +EXTERN(JDIMENSION) jpeg_skip_scanlines (j_decompress_ptr cinfo, |
| 10126 + JDIMENSION num_lines); |
| 10127 EXTERN(boolean) jpeg_finish_decompress JPP((j_decompress_ptr cinfo)); |
| 10128 |
| 10129 /* Replaces jpeg_read_scanlines when reading raw downsampled data. */ |
| 10130 @@ -1009,6 +1115,9 @@ |
| 10131 #define JPEG_SCAN_COMPLETED 4 /* Completed last iMCU row of a scan */ |
| 10132 |
| 10133 /* Precalculate output dimensions for current decompression parameters. */ |
| 10134 +#if JPEG_LIB_VERSION >= 80 |
| 10135 +EXTERN(void) jpeg_core_output_dimensions JPP((j_decompress_ptr cinfo)); |
| 10136 +#endif |
| 10137 EXTERN(void) jpeg_calc_output_dimensions JPP((j_decompress_ptr cinfo)); |
| 10138 |
| 10139 /* Control saving of COM and APPn markers into marker_list. */ |
| 10140 @@ -1103,4 +1212,10 @@ |
| 10141 #include "jerror.h" /* fetch error codes too */ |
| 10142 #endif |
| 10143 |
| 10144 +#ifdef __cplusplus |
| 10145 +#ifndef DONT_USE_EXTERN_C |
| 10146 +} |
| 10147 +#endif |
| 10148 +#endif |
| 10149 + |
| 10150 #endif /* JPEGLIB_H */ |
219 Index: jpeglibmangler.h | 10151 Index: jpeglibmangler.h |
220 =================================================================== | 10152 =================================================================== |
221 --- jpeglibmangler.h (revision 0) | 10153 --- jpeglibmangler.h (revision 0) |
222 +++ jpeglibmangler.h (revision 0) | 10154 +++ jpeglibmangler.h (working copy) |
223 @@ -0,0 +1,113 @@ | 10155 @@ -0,0 +1,114 @@ |
224 +// Copyright (c) 2009 The Chromium Authors. All rights reserved. | 10156 +// Copyright (c) 2009 The Chromium Authors. All rights reserved. |
225 +// Use of this source code is governed by a BSD-style license that can be | 10157 +// Use of this source code is governed by a BSD-style license that can be |
226 +// found in the LICENSE file. | 10158 +// found in the LICENSE file. |
227 + | 10159 + |
228 +#ifndef THIRD_PARTY_LIBJPEG_TURBO_JPEGLIBMANGLER_H_ | 10160 +#ifndef THIRD_PARTY_LIBJPEG_TURBO_JPEGLIBMANGLER_H_ |
229 +#define THIRD_PARTY_LIBJPEG_TURBO_JPEGLIBMANGLER_H_ | 10161 +#define THIRD_PARTY_LIBJPEG_TURBO_JPEGLIBMANGLER_H_ |
230 + | 10162 + |
231 +// Mangle all externally visible function names so we can build our own libjpeg | 10163 +// Mangle all externally visible function names so we can build our own libjpeg |
232 +// without system libraries trying to use it. | 10164 +// without system libraries trying to use it. |
233 + | 10165 + |
(...skipping 64 matching lines...) |
298 +#define jpeg_write_scanlines chromium_jpeg_write_scanlines | 10230 +#define jpeg_write_scanlines chromium_jpeg_write_scanlines |
299 +#define jpeg_finish_compress chromium_jpeg_finish_compress | 10231 +#define jpeg_finish_compress chromium_jpeg_finish_compress |
300 +#define jpeg_write_raw_data chromium_jpeg_write_raw_data | 10232 +#define jpeg_write_raw_data chromium_jpeg_write_raw_data |
301 +#define jpeg_write_marker chromium_jpeg_write_marker | 10233 +#define jpeg_write_marker chromium_jpeg_write_marker |
302 +#define jpeg_write_m_header chromium_jpeg_write_m_header | 10234 +#define jpeg_write_m_header chromium_jpeg_write_m_header |
303 +#define jpeg_write_m_byte chromium_jpeg_write_m_byte | 10235 +#define jpeg_write_m_byte chromium_jpeg_write_m_byte |
304 +#define jpeg_write_tables chromium_jpeg_write_tables | 10236 +#define jpeg_write_tables chromium_jpeg_write_tables |
305 +#define jpeg_read_header chromium_jpeg_read_header | 10237 +#define jpeg_read_header chromium_jpeg_read_header |
306 +#define jpeg_start_decompress chromium_jpeg_start_decompress | 10238 +#define jpeg_start_decompress chromium_jpeg_start_decompress |
307 +#define jpeg_read_scanlines chromium_jpeg_read_scanlines | 10239 +#define jpeg_read_scanlines chromium_jpeg_read_scanlines |
| 10240 +#define jpeg_skip_scanlines chromium_jpeg_skip_scanlines |
308 +#define jpeg_finish_decompress chromium_jpeg_finish_decompress | 10241 +#define jpeg_finish_decompress chromium_jpeg_finish_decompress |
309 +#define jpeg_read_raw_data chromium_jpeg_read_raw_data | 10242 +#define jpeg_read_raw_data chromium_jpeg_read_raw_data |
310 +#define jpeg_has_multiple_scans chromium_jpeg_has_multiple_scans | 10243 +#define jpeg_has_multiple_scans chromium_jpeg_has_multiple_scans |
311 +#define jpeg_start_output chromium_jpeg_start_output | 10244 +#define jpeg_start_output chromium_jpeg_start_output |
312 +#define jpeg_finish_output chromium_jpeg_finish_output | 10245 +#define jpeg_finish_output chromium_jpeg_finish_output |
313 +#define jpeg_input_complete chromium_jpeg_input_complete | 10246 +#define jpeg_input_complete chromium_jpeg_input_complete |
314 +#define jpeg_new_colormap chromium_jpeg_new_colormap | 10247 +#define jpeg_new_colormap chromium_jpeg_new_colormap |
315 +#define jpeg_consume_input chromium_jpeg_consume_input | 10248 +#define jpeg_consume_input chromium_jpeg_consume_input |
316 +#define jpeg_calc_output_dimensions chromium_jpeg_calc_output_dimensions | 10249 +#define jpeg_calc_output_dimensions chromium_jpeg_calc_output_dimensions |
317 +#define jpeg_save_markers chromium_jpeg_save_markers | 10250 +#define jpeg_save_markers chromium_jpeg_save_markers |
318 +#define jpeg_set_marker_processor chromium_jpeg_set_marker_processor | 10251 +#define jpeg_set_marker_processor chromium_jpeg_set_marker_processor |
319 +#define jpeg_read_coefficients chromium_jpeg_read_coefficients | 10252 +#define jpeg_read_coefficients chromium_jpeg_read_coefficients |
320 +#define jpeg_write_coefficients chromium_jpeg_write_coefficients | 10253 +#define jpeg_write_coefficients chromium_jpeg_write_coefficients |
321 +#define jpeg_copy_critical_parameters chromium_jpeg_copy_critical_parameters | 10254 +#define jpeg_copy_critical_parameters chromium_jpeg_copy_critical_parameters |
322 +#define jpeg_abort_compress chromium_jpeg_abort_compress | 10255 +#define jpeg_abort_compress chromium_jpeg_abort_compress |
323 +#define jpeg_abort_decompress chromium_jpeg_abort_decompress | 10256 +#define jpeg_abort_decompress chromium_jpeg_abort_decompress |
324 +#define jpeg_abort chromium_jpeg_abort | 10257 +#define jpeg_abort chromium_jpeg_abort |
325 +#define jpeg_destroy chromium_jpeg_destroy | 10258 +#define jpeg_destroy chromium_jpeg_destroy |
326 +#define jpeg_resync_to_restart chromium_jpeg_resync_to_restart | 10259 +#define jpeg_resync_to_restart chromium_jpeg_resync_to_restart |
327 +#define jpeg_get_small chromium_jpeg_get_small | 10260 +#define jpeg_get_small chromium_jpeg_get_small |
328 +#define jpeg_free_small chromium_jpeg_free_small | 10261 +#define jpeg_free_small chromium_jpeg_free_small |
329 +#define jpeg_get_large chromium_jpeg_get_large | 10262 +#define jpeg_get_large chromium_jpeg_get_large |
330 +#define jpeg_free_large chromium_jpeg_free_large | 10263 +#define jpeg_free_large chromium_jpeg_free_large |
331 +#define jpeg_mem_available chromium_jpeg_mem_available | 10264 +#define jpeg_mem_available chromium_jpeg_mem_available |
332 +#define jpeg_open_backing_store chromium_jpeg_open_backing_store | 10265 +#define jpeg_open_backing_store chromium_jpeg_open_backing_store |
333 +#define jpeg_mem_init chromium_jpeg_mem_init | 10266 +#define jpeg_mem_init chromium_jpeg_mem_init |
334 +#define jpeg_mem_term chromium_jpeg_mem_term | 10267 +#define jpeg_mem_term chromium_jpeg_mem_term |
335 + | 10268 + |
336 +#endif // THIRD_PARTY_LIBJPEG_TURBO_JPEGLIBMANGLER_H_ | 10269 +#endif // THIRD_PARTY_LIBJPEG_TURBO_JPEGLIBMANGLER_H_ |
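Reviewer sketch (not part of the patch): the mangling header works purely through the preprocessor, so every use of a public name, including its definition, is rewritten to the chromium_-prefixed symbol before the compiler sees it. The example below uses a hypothetical function, demo_read_header, rather than a real libjpeg entry point, and stringifies the macro after expansion to show which name ends up in the object file.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one line of jpeglibmangler.h. */
#define demo_read_header chromium_demo_read_header

/* After preprocessing this defines chromium_demo_read_header().
 * It only checks for the JPEG SOI marker (0xFF 0xD8). */
int demo_read_header(const unsigned char *data)
{
  assert(data != NULL);
  return data[0] == 0xFF && data[1] == 0xD8;
}

/* Two-level stringification: the argument is macro-expanded first. */
#define DEMO_STR_(x) #x
#define DEMO_STR(x) DEMO_STR_(x)

/* Returns the name the linker actually sees. */
const char *demo_symbol_name(void)
{
  return DEMO_STR(demo_read_header);
}
```

Callers keep writing demo_read_header(); because both sides include the same header, the rename is invisible in source and a system libjpeg exporting the unprefixed name cannot collide with it.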
337 Index: simd/jcgrass2-64.asm | 10270 Index: jpegut.c |
338 =================================================================== | 10271 =================================================================== |
339 --- simd/jcgrass2-64.asm (revision 829) | 10272 --- jpegut.c (revision 829) |
340 +++ simd/jcgrass2-64.asm (working copy) | 10273 +++ jpegut.c (working copy) |
341 @@ -30,7 +30,7 @@ | 10274 @@ -19,11 +19,14 @@ |
342 SECTION SEG_CONST | 10275 #include "./rrtimer.h" |
343 | 10276 #include "./turbojpeg.h" |
344 alignz 16 | 10277 |
345 - global EXTN(jconst_rgb_gray_convert_sse2) | 10278 -#define _catch(f) {if((f)==-1) {printf("TJPEG: %s\n", tjGetErrorStr()); goto finally;}} |
346 + global EXTN(jconst_rgb_gray_convert_sse2) PRIVATE | 10279 +#define _catch(f) {if((f)==-1) {printf("TJPEG: %s\n", tjGetErrorStr()); bailout();}} |
347 | 10280 |
348 EXTN(jconst_rgb_gray_convert_sse2): | 10281 const char *_subnamel[NUMSUBOPT]={"4:4:4", "4:2:2", "4:2:0", "GRAY"}; |
349 | 10282 const char *_subnames[NUMSUBOPT]={"444", "422", "420", "GRAY"}; |
350 Index: simd/jiss2fst.asm | 10283 |
351 =================================================================== | 10284 +int exitstatus=0; |
352 --- simd/jiss2fst.asm (revision 829) | 10285 +#define bailout() {exitstatus=-1; goto finally;} |
353 +++ simd/jiss2fst.asm (working copy) | 10286 + |
354 @@ -59,7 +59,7 @@ | 10287 int pixels[9][3]= |
355 %define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS) | 10288 { |
356 | 10289 {0, 255, 0}, |
357 alignz 16 | 10290 @@ -70,7 +73,7 @@ |
358 - global EXTN(jconst_idct_ifast_sse2) | 10291 } |
359 + global EXTN(jconst_idct_ifast_sse2) PRIVATE | 10292 } |
360 | 10293 |
361 EXTN(jconst_idct_ifast_sse2): | 10294 -int dumpbuf(unsigned char *buf, int w, int h, int ps, int flags) |
362 | 10295 +void dumpbuf(unsigned char *buf, int w, int h, int ps, int flags) |
363 @@ -92,7 +92,7 @@ | 10296 { |
364 %define WK_NUM 2 | 10297 int roffset=(flags&TJ_BGR)?2:0, goffset=1, boffset=(flags&TJ_BGR)?0:2, i, |
| 10298 j; |
| 10299 @@ -177,12 +180,12 @@ |
| 10300 if((outfile=fopen(filename, "wb"))==NULL) |
| 10301 { |
| 10302 printf("ERROR: Could not open %s for writing.\n", filename); |
| 10303 - goto finally; |
| 10304 + bailout(); |
| 10305 } |
| 10306 if(fwrite(jpegbuf, jpgbufsize, 1, outfile)!=1) |
| 10307 { |
| 10308 printf("ERROR: Could not write to %s.\n", filename); |
| 10309 - goto finally; |
| 10310 + bailout(); |
| 10311 } |
| 10312 |
| 10313 finally: |
| 10314 @@ -210,7 +213,7 @@ |
| 10315 |
| 10316 if((bmpbuf=(unsigned char *)malloc(w*h*ps+1))==NULL) |
| 10317 { |
| 10318 - printf("ERROR: Could not allocate buffer\n"); goto finally; |
| 10319 + printf("ERROR: Could not allocate buffer\n"); bailout(); |
| 10320 } |
| 10321 initbuf(bmpbuf, w, h, ps, flags); |
| 10322 memset(jpegbuf, 0, TJBUFSIZE(w, h)); |
| 10323 @@ -249,12 +252,12 @@ |
| 10324 _catch(tjDecompressHeader(hnd, jpegbuf, jpegsize, &_w, &_h)); |
| 10325 if(_w!=w || _h!=h) |
| 10326 { |
| 10327 - printf("Incorrect JPEG header\n"); goto finally; |
| 10328 + printf("Incorrect JPEG header\n"); bailout(); |
| 10329 } |
| 10330 |
| 10331 if((bmpbuf=(unsigned char *)malloc(w*h*ps+1))==NULL) |
| 10332 { |
| 10333 - printf("ERROR: Could not allocate buffer\n"); goto finally; |
| 10334 + printf("ERROR: Could not allocate buffer\n"); bailout(); |
| 10335 } |
| 10336 memset(bmpbuf, 0, w*ps*h); |
| 10337 |
| 10338 @@ -278,13 +281,13 @@ |
| 10339 |
| 10340 if((jpegbuf=(unsigned char *)malloc(TJBUFSIZE(w, h))) == NULL) |
| 10341 { |
| 10342 - puts("ERROR: Could not allocate buffer."); goto finally; |
| 10343 + puts("ERROR: Could not allocate buffer."); bailout(); |
| 10344 } |
| 10345 |
| 10346 if((hnd=tjInitCompress())==NULL) |
| 10347 - {printf("Error in tjInitCompress():\n%s\n", tjGetErrorStr()); goto finally;} |
| 10348 + {printf("Error in tjInitCompress():\n%s\n", tjGetErrorStr()); bailout();} |
| 10349 if((dhnd=tjInitDecompress())==NULL) |
| 10350 - {printf("Error in tjInitDecompress():\n%s\n", tjGetErrorStr()); goto finally;} |
| 10351 + {printf("Error in tjInitDecompress():\n%s\n", tjGetErrorStr()); bailout();} |
| 10352 |
| 10353 gentestjpeg(hnd, jpegbuf, &size, w, h, ps, basefilename, subsamp, 100, 0); |
| 10354 gentestbmp(dhnd, jpegbuf, size, w, h, ps, basefilename, subsamp, 100, 0); |
| 10355 @@ -327,7 +330,7 @@ |
| 10356 int i, j, i2; unsigned char *bmpbuf=NULL, *jpgbuf=NULL; |
| 10357 tjhandle hnd=NULL; unsigned long size; |
| 10358 if((hnd=tjInitCompress())==NULL) |
| 10359 - {printf("Error in tjInitCompress():\n%s\n", tjGetErrorStr()); goto finally;} |
| 10360 + {printf("Error in tjInitCompress():\n%s\n", tjGetErrorStr()); bailout();} |
| 10361 printf("Buffer size regression test\n"); |
| 10362 for(j=1; j<48; j++) |
| 10363 { |
| 10364 @@ -337,7 +340,7 @@ |
| 10365 if((bmpbuf=(unsigned char *)malloc(i*j*4))==NULL |
| 10366 || (jpgbuf=(unsigned char *)malloc(TJBUFSIZE(i, j)))==NULL) |
| 10367 { |
| 10368 - printf("Memory allocation failure\n"); goto finally; |
| 10369 + printf("Memory allocation failure\n"); bailout(); |
| 10370 } |
| 10371 memset(bmpbuf, 0, i*j*4); |
| 10372 for(i2=0; i2<i*j; i2++) |
| 10373 @@ -353,7 +356,7 @@ |
| 10374 if((bmpbuf=(unsigned char *)malloc(j*i*4))==NULL |
| 10375 || (jpgbuf=(unsigned char *)malloc(TJBUFSIZE(j, i)))==NULL) |
| 10376 { |
| 10377 - printf("Memory allocation failure\n"); goto finally; |
| 10378 + printf("Memory allocation failure\n"); bailout(); |
| 10379 } |
| 10380 for(i2=0; i2<j*i*4; i2++) |
| 10381 { |
| 10382 @@ -380,5 +383,5 @@ |
| 10383 dotest(35, 41, 4, TJ_GRAYSCALE, "test"); |
| 10384 dotest1(); |
| 10385 |
| 10386 - return 0; |
| 10387 + return exitstatus; |
| 10388 } |
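Reviewer sketch (not part of the patch): the jpegut.c change replaces bare "goto finally" with a bailout() macro that records the failure in a global before jumping to cleanup, so that main() can return a non-zero status instead of always returning 0. A minimal standalone version of the pattern, with a hypothetical run_step() and a simulate_failure flag standing in for the real test routines, might look like this:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

int exitstatus = 0;
/* Record the failure, then jump to the function's cleanup label. */
#define bailout() { exitstatus = -1; goto finally; }

void run_step(size_t nbytes, int simulate_failure)
{
  unsigned char *buf = NULL;

  assert(nbytes > 0);
  if ((buf = (unsigned char *)malloc(nbytes)) == NULL) {
    printf("ERROR: Could not allocate buffer\n");  bailout();
  }
  if (simulate_failure) {
    printf("ERROR: simulated failure\n");  bailout();
  }

finally:
  free(buf);  /* cleanup runs on both the success and the failure path */
}
```

Every function that uses bailout() needs its own finally: label, as the routines in jpegut.c already have; a test driver can then end with "return exitstatus;" so a failing run is visible to the build system.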
| 10389 Index: jpgtest.cxx |
| 10390 =================================================================== |
| 10391 --- jpgtest.cxx (revision 829) |
| 10392 +++ jpgtest.cxx (working copy) |
| 10393 @@ -322,22 +322,22 @@ |
| 10394 if(!stricmp(argv[i], "-tile")) dotile=1; |
| 10395 if(!stricmp(argv[i], "-forcesse3")) |
| 10396 { |
| 10397 - printf("Using SSE3 code in Intel compressor\n"); |
| 10398 + printf("Using SSE3 code\n"); |
| 10399 forcesse3=1; |
| 10400 } |
| 10401 if(!stricmp(argv[i], "-forcesse2")) |
| 10402 { |
| 10403 - printf("Using SSE2 code in Intel compressor\n"); |
| 10404 + printf("Using SSE2 code\n"); |
| 10405 forcesse2=1; |
| 10406 } |
| 10407 if(!stricmp(argv[i], "-forcesse")) |
| 10408 { |
| 10409 - printf("Using SSE code in Intel compressor\n"); |
| 10410 + printf("Using SSE code\n"); |
| 10411 forcesse=1; |
| 10412 } |
| 10413 if(!stricmp(argv[i], "-forcemmx")) |
| 10414 { |
| 10415 - printf("Using MMX code in Intel compressor\n"); |
| 10416 + printf("Using MMX code\n"); |
| 10417 forcemmx=1; |
| 10418 } |
| 10419 if(!stricmp(argv[i], "-fastupsample")) |
| 10420 Index: jquant1.c |
| 10421 =================================================================== |
| 10422 --- jquant1.c (revision 829) |
| 10423 +++ jquant1.c (working copy) |
| 10424 @@ -1,9 +1,10 @@ |
| 10425 /* |
| 10426 * jquant1.c |
| 10427 * |
| 10428 + * This file was part of the Independent JPEG Group's software: |
| 10429 * Copyright (C) 1991-1996, Thomas G. Lane. |
| 10430 + * libjpeg-turbo Modifications: |
| 10431 * Copyright (C) 2009, D. R. Commander |
| 10432 - * This file is part of the Independent JPEG Group's software. |
| 10433 * For conditions of distribution and use, see the accompanying README file. |
| 10434 * |
| 10435 * This file contains 1-pass color quantization (color mapping) routines. |
| 10436 Index: jquant2.c |
| 10437 =================================================================== |
| 10438 --- jquant2.c (revision 829) |
| 10439 +++ jquant2.c (working copy) |
| 10440 @@ -1,9 +1,10 @@ |
| 10441 /* |
| 10442 * jquant2.c |
| 10443 * |
| 10444 + * This file was part of the Independent JPEG Group's software: |
| 10445 * Copyright (C) 1991-1996, Thomas G. Lane. |
| 10446 + * libjpeg-turbo Modifications: |
| 10447 * Copyright (C) 2009, D. R. Commander. |
| 10448 - * This file is part of the Independent JPEG Group's software. |
| 10449 * For conditions of distribution and use, see the accompanying README file. |
| 10450 * |
| 10451 * This file contains 2-pass color quantization (color mapping) routines. |
| 10452 Index: jsimd.h |
| 10453 =================================================================== |
| 10454 --- jsimd.h (revision 829) |
| 10455 +++ jsimd.h (working copy) |
| 10456 @@ -2,9 +2,11 @@ |
| 10457 * jsimd.h |
| 10458 * |
| 10459 * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB |
| 10460 + * Copyright 2011 D. R. Commander |
| 10461 * |
| 10462 * Based on the x86 SIMD extension for IJG JPEG library, |
| 10463 * Copyright (C) 1999-2006, MIYASAKA Masaru. |
| 10464 + * For conditions of distribution and use, see copyright notice in jsimdext.inc |
| 10465 * |
| 10466 */ |
| 10467 |
| 10468 @@ -12,8 +14,10 @@ |
| 10469 |
| 10470 #ifdef NEED_SHORT_EXTERNAL_NAMES |
| 10471 #define jsimd_can_rgb_ycc jSCanRgbYcc |
| 10472 +#define jsimd_can_rgb_gray jSCanRgbGry |
| 10473 #define jsimd_can_ycc_rgb jSCanYccRgb |
| 10474 #define jsimd_rgb_ycc_convert jSRgbYccConv |
| 10475 +#define jsimd_rgb_gray_convert jSRgbGryConv |
| 10476 #define jsimd_ycc_rgb_convert jSYccRgbConv |
| 10477 #define jsimd_can_h2v2_downsample jSCanH2V2Down |
| 10478 #define jsimd_can_h2v1_downsample jSCanH2V1Down |
| 10479 @@ -34,6 +38,7 @@ |
| 10480 #endif /* NEED_SHORT_EXTERNAL_NAMES */ |
| 10481 |
| 10482 EXTERN(int) jsimd_can_rgb_ycc JPP((void)); |
| 10483 +EXTERN(int) jsimd_can_rgb_gray JPP((void)); |
| 10484 EXTERN(int) jsimd_can_ycc_rgb JPP((void)); |
| 10485 |
| 10486 EXTERN(void) jsimd_rgb_ycc_convert |
| 10487 @@ -40,6 +45,10 @@ |
| 10488 JPP((j_compress_ptr cinfo, |
| 10489 JSAMPARRAY input_buf, JSAMPIMAGE output_buf, |
| 10490 JDIMENSION output_row, int num_rows)); |
| 10491 +EXTERN(void) jsimd_rgb_gray_convert |
| 10492 + JPP((j_compress_ptr cinfo, |
| 10493 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf, |
| 10494 + JDIMENSION output_row, int num_rows)); |
| 10495 EXTERN(void) jsimd_ycc_rgb_convert |
| 10496 JPP((j_decompress_ptr cinfo, |
| 10497 JSAMPIMAGE input_buf, JDIMENSION input_row, |
| 10498 Index: jsimd_none.c |
| 10499 =================================================================== |
| 10500 --- jsimd_none.c (revision 829) |
| 10501 +++ jsimd_none.c (working copy) |
| 10502 @@ -2,10 +2,11 @@ |
| 10503 * jsimd_none.c |
| 10504 * |
| 10505 * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB |
| 10506 - * Copyright 2009 D. R. Commander |
| 10507 + * Copyright 2009-2011 D. R. Commander |
| 10508 * |
| 10509 * Based on the x86 SIMD extension for IJG JPEG library, |
| 10510 * Copyright (C) 1999-2006, MIYASAKA Masaru. |
| 10511 + * For conditions of distribution and use, see copyright notice in jsimdext.inc |
| 10512 * |
| 10513 * This file contains stubs for when there is no SIMD support available. |
| 10514 */ |
| 10515 @@ -24,6 +25,12 @@ |
| 10516 } |
| 10517 |
| 10518 GLOBAL(int) |
| 10519 +jsimd_can_rgb_gray (void) |
| 10520 +{ |
| 10521 + return 0; |
| 10522 +} |
| 10523 + |
| 10524 +GLOBAL(int) |
| 10525 jsimd_can_ycc_rgb (void) |
| 10526 { |
| 10527 return 0; |
| 10528 @@ -37,6 +44,13 @@ |
| 10529 } |
| 10530 |
| 10531 GLOBAL(void) |
| 10532 +jsimd_rgb_gray_convert (j_compress_ptr cinfo, |
| 10533 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf, |
| 10534 + JDIMENSION output_row, int num_rows) |
| 10535 +{ |
| 10536 +} |
| 10537 + |
| 10538 +GLOBAL(void) |
| 10539 jsimd_ycc_rgb_convert (j_decompress_ptr cinfo, |
| 10540 JSAMPIMAGE input_buf, JDIMENSION input_row, |
| 10541 JSAMPARRAY output_buf, int num_rows) |
| 10542 Index: jsimddct.h |
| 10543 =================================================================== |
| 10544 --- jsimddct.h (revision 829) |
| 10545 +++ jsimddct.h (working copy) |
| 10546 @@ -5,6 +5,7 @@ |
| 10547 * |
| 10548 * Based on the x86 SIMD extension for IJG JPEG library, |
| 10549 * Copyright (C) 1999-2006, MIYASAKA Masaru. |
| 10550 + * For conditions of distribution and use, see copyright notice in jsimdext.inc |
| 10551 * |
| 10552 */ |
| 10553 |
| 10554 Index: jversion.h |
| 10555 =================================================================== |
| 10556 --- jversion.h (revision 829) |
| 10557 +++ jversion.h (working copy) |
| 10558 @@ -1,8 +1,10 @@ |
| 10559 /* |
| 10560 * jversion.h |
| 10561 * |
| 10562 - * Copyright (C) 1991-1998, Thomas G. Lane. |
| 10563 - * This file is part of the Independent JPEG Group's software. |
| 10564 + * This file was part of the Independent JPEG Group's software: |
| 10565 + * Copyright (C) 1991-2012, Thomas G. Lane, Guido Vollbeding. |
| 10566 + * Modifications: |
| 10567 + * Copyright (C) 2010, 2012-2014, D. R. Commander. |
| 10568 * For conditions of distribution and use, see the accompanying README file. |
| 10569 * |
| 10570 * This file contains software version identification. |
| 10571 @@ -9,6 +11,22 @@ |
| 10572 */ |
| 10573 |
| 10574 |
| 10575 +#if JPEG_LIB_VERSION >= 80 |
| 10576 + |
| 10577 +#define JVERSION "8d 15-Jan-2012" |
| 10578 + |
| 10579 +#elif JPEG_LIB_VERSION >= 70 |
| 10580 + |
| 10581 +#define JVERSION "7 27-Jun-2009" |
| 10582 + |
| 10583 +#else |
| 10584 + |
| 10585 #define JVERSION "6b 27-Mar-1998" |
| 10586 |
| 10587 -#define JCOPYRIGHT "Copyright (C) 1998, Thomas G. Lane" |
| 10588 +#endif |
| 10589 + |
| 10590 +#define JCOPYRIGHT "Copyright (C) 1991-2012 Thomas G. Lane, Guido Vollbeding\n" \ |
| 10591 + "Copyright (C) 1999-2006 MIYASAKA Masaru\n" \ |
| 10592 + "Copyright (C) 2009 Pierre Ossman for Cendio AB\n" \ |
| 10593 + "Copyright (C) 2009-2014 D. R. Commander\n" \ |
| 10594 + "Copyright (C) 2009-2011 Nokia Corporation and/or its subsidiary(-ies)" |
| 10595 Index: rdbmp.c |
| 10596 =================================================================== |
| 10597 --- rdbmp.c (revision 829) |
| 10598 +++ rdbmp.c (working copy) |
| 10599 @@ -1,8 +1,11 @@ |
| 10600 /* |
| 10601 * rdbmp.c |
| 10602 * |
| 10603 + * This file was part of the Independent JPEG Group's software: |
| 10604 * Copyright (C) 1994-1996, Thomas G. Lane. |
| 10605 - * This file is part of the Independent JPEG Group's software. |
| 10606 + * Modified 2009-2010 by Guido Vollbeding. |
| 10607 + * libjpeg-turbo Modifications: |
| 10608 + * Modified 2011 by Siarhei Siamashka. |
| 10609 * For conditions of distribution and use, see the accompanying README file. |
| 10610 * |
| 10611 * This file contains routines to read input images in Microsoft "BMP" |
| 10612 @@ -177,10 +180,41 @@ |
| 10613 } |
| 10614 |
| 10615 |
| 10616 +METHODDEF(JDIMENSION) |
| 10617 +get_32bit_row (j_compress_ptr cinfo, cjpeg_source_ptr sinfo) |
| 10618 +/* This version is for reading 32-bit pixels */ |
| 10619 +{ |
| 10620 + bmp_source_ptr source = (bmp_source_ptr) sinfo; |
| 10621 + JSAMPARRAY image_ptr; |
| 10622 + register JSAMPROW inptr, outptr; |
| 10623 + register JDIMENSION col; |
| 10624 + |
| 10625 + /* Fetch next row from virtual array */ |
| 10626 + source->source_row--; |
| 10627 + image_ptr = (*cinfo->mem->access_virt_sarray) |
| 10628 + ((j_common_ptr) cinfo, source->whole_image, |
| 10629 + source->source_row, (JDIMENSION) 1, FALSE); |
| 10630 + /* Transfer data. Note source values are in BGR order |
| 10631 + * (even though Microsoft's own documents say the opposite). |
| 10632 + */ |
| 10633 + inptr = image_ptr[0]; |
| 10634 + outptr = source->pub.buffer[0]; |
| 10635 + for (col = cinfo->image_width; col > 0; col--) { |
| 10636 + outptr[2] = *inptr++; /* can omit GETJSAMPLE() safely */ |
| 10637 + outptr[1] = *inptr++; |
| 10638 + outptr[0] = *inptr++; |
| 10639 + inptr++; /* skip the 4th byte (Alpha channel) */ |
| 10640 + outptr += 3; |
| 10641 + } |
| 10642 + |
| 10643 + return 1; |
| 10644 +} |
| 10645 + |
| 10646 + |
| 10647 /* |
| 10648 * This method loads the image into whole_image during the first call on |
| 10649 * get_pixel_rows. The get_pixel_rows pointer is then adjusted to call |
| 10650 - * get_8bit_row or get_24bit_row on subsequent calls. |
| 10651 + * get_8bit_row, get_24bit_row, or get_32bit_row on subsequent calls. |
| 10652 */ |
| 10653 |
| 10654 METHODDEF(JDIMENSION) |
| 10655 @@ -188,10 +222,9 @@ |
| 10656 { |
| 10657 bmp_source_ptr source = (bmp_source_ptr) sinfo; |
| 10658 register FILE *infile = source->pub.input_file; |
| 10659 - register int c; |
| 10660 register JSAMPROW out_ptr; |
| 10661 JSAMPARRAY image_ptr; |
| 10662 - JDIMENSION row, col; |
| 10663 + JDIMENSION row; |
| 10664 cd_progress_ptr progress = (cd_progress_ptr) cinfo->progress; |
| 10665 |
| 10666 /* Read the data into a virtual array in input-file row order. */ |
| 10667 @@ -205,11 +238,11 @@ |
| 10668 ((j_common_ptr) cinfo, source->whole_image, |
| 10669 row, (JDIMENSION) 1, TRUE); |
| 10670 out_ptr = image_ptr[0]; |
| 10671 - for (col = source->row_width; col > 0; col--) { |
| 10672 - /* inline copy of read_byte() for speed */ |
| 10673 - if ((c = getc(infile)) == EOF) |
| 10674 - ERREXIT(cinfo, JERR_INPUT_EOF); |
| 10675 - *out_ptr++ = (JSAMPLE) c; |
| 10676 + if (fread(out_ptr, 1, source->row_width, infile) != source->row_width) { |
| 10677 + if (feof(infile)) |
| 10678 + ERREXIT(cinfo, JERR_INPUT_EOF); |
| 10679 + else |
| 10680 + ERREXIT(cinfo, JERR_FILE_READ); |
| 10681 } |
| 10682 } |
| 10683 if (progress != NULL) |
| 10684 @@ -223,6 +256,9 @@ |
| 10685 case 24: |
| 10686 source->pub.get_pixel_rows = get_24bit_row; |
| 10687 break; |
| 10688 + case 32: |
| 10689 + source->pub.get_pixel_rows = get_32bit_row; |
| 10690 + break; |
| 10691 default: |
| 10692 ERREXIT(cinfo, JERR_BMP_BADDEPTH); |
| 10693 } |
| 10694 @@ -251,8 +287,8 @@ |
| 10695 (((INT32) UCH(array[offset+3])) << 24)) |
| 10696 INT32 bfOffBits; |
| 10697 INT32 headerSize; |
| 10698 - INT32 biWidth = 0; /* initialize to avoid compiler warning */ |
| 10699 - INT32 biHeight = 0; |
| 10700 + INT32 biWidth; |
| 10701 + INT32 biHeight; |
| 10702 unsigned int biPlanes; |
| 10703 INT32 biCompression; |
| 10704 INT32 biXPelsPerMeter,biYPelsPerMeter; |
| 10705 @@ -300,8 +336,6 @@ |
| 10706 ERREXIT(cinfo, JERR_BMP_BADDEPTH); |
| 10707 break; |
| 10708 } |
| 10709 - if (biPlanes != 1) |
| 10710 - ERREXIT(cinfo, JERR_BMP_BADPLANES); |
| 10711 break; |
| 10712 case 40: |
| 10713 case 64: |
| 10714 @@ -325,12 +359,13 @@ |
| 10715 case 24: /* RGB image */ |
| 10716 TRACEMS2(cinfo, 1, JTRC_BMP, (int) biWidth, (int) biHeight); |
| 10717 break; |
| 10718 + case 32: /* RGB image + Alpha channel */ |
| 10719 + TRACEMS2(cinfo, 1, JTRC_BMP, (int) biWidth, (int) biHeight); |
| 10720 + break; |
| 10721 default: |
| 10722 ERREXIT(cinfo, JERR_BMP_BADDEPTH); |
| 10723 break; |
| 10724 } |
| 10725 - if (biPlanes != 1) |
| 10726 - ERREXIT(cinfo, JERR_BMP_BADPLANES); |
| 10727 if (biCompression != 0) |
| 10728 ERREXIT(cinfo, JERR_BMP_COMPRESSED); |
| 10729 |
| 10730 @@ -343,9 +378,14 @@ |
| 10731 break; |
| 10732 default: |
| 10733 ERREXIT(cinfo, JERR_BMP_BADHEADER); |
| 10734 - break; |
| 10735 + return; |
| 10736 } |
| 10737 |
| 10738 + if (biWidth <= 0 || biHeight <= 0) |
| 10739 + ERREXIT(cinfo, JERR_BMP_EMPTY); |
| 10740 + if (biPlanes != 1) |
| 10741 + ERREXIT(cinfo, JERR_BMP_BADPLANES); |
| 10742 + |
| 10743 /* Compute distance to bitmap data --- will adjust for colormap below */ |
| 10744 bPad = bfOffBits - (headerSize + 14); |
| 10745 |
| 10746 @@ -375,6 +415,8 @@ |
| 10747 /* Compute row width in file, including padding to 4-byte boundary */ |
| 10748 if (source->bits_per_pixel == 24) |
| 10749 row_width = (JDIMENSION) (biWidth * 3); |
| 10750 + else if (source->bits_per_pixel == 32) |
| 10751 + row_width = (JDIMENSION) (biWidth * 4); |
| 10752 else |
| 10753 row_width = (JDIMENSION) biWidth; |
| 10754 while ((row_width & 3) != 0) row_width++; |
| 10755 Index: rdppm.c |
| 10756 =================================================================== |
| 10757 --- rdppm.c (revision 829) |
| 10758 +++ rdppm.c (working copy) |
| 10759 @@ -2,6 +2,7 @@ |
| 10760 * rdppm.c |
| 10761 * |
| 10762 * Copyright (C) 1991-1997, Thomas G. Lane. |
| 10763 + * Modified 2009 by Bill Allombert, Guido Vollbeding. |
| 10764 * This file is part of the Independent JPEG Group's software. |
| 10765 * For conditions of distribution and use, see the accompanying README file. |
| 10766 * |
| 10767 @@ -250,8 +251,8 @@ |
| 10768 bufferptr = source->iobuffer; |
| 10769 for (col = cinfo->image_width; col > 0; col--) { |
| 10770 register int temp; |
| 10771 - temp = UCH(*bufferptr++); |
| 10772 - temp |= UCH(*bufferptr++) << 8; |
| 10773 + temp = UCH(*bufferptr++) << 8; |
| 10774 + temp |= UCH(*bufferptr++); |
| 10775 *ptr++ = rescale[temp]; |
| 10776 } |
| 10777 return 1; |
| 10778 @@ -274,14 +275,14 @@ |
| 10779 bufferptr = source->iobuffer; |
| 10780 for (col = cinfo->image_width; col > 0; col--) { |
| 10781 register int temp; |
| 10782 - temp = UCH(*bufferptr++); |
| 10783 - temp |= UCH(*bufferptr++) << 8; |
| 10784 + temp = UCH(*bufferptr++) << 8; |
| 10785 + temp |= UCH(*bufferptr++); |
| 10786 *ptr++ = rescale[temp]; |
| 10787 - temp = UCH(*bufferptr++); |
| 10788 - temp |= UCH(*bufferptr++) << 8; |
| 10789 + temp = UCH(*bufferptr++) << 8; |
| 10790 + temp |= UCH(*bufferptr++); |
| 10791 *ptr++ = rescale[temp]; |
| 10792 - temp = UCH(*bufferptr++); |
| 10793 - temp |= UCH(*bufferptr++) << 8; |
| 10794 + temp = UCH(*bufferptr++) << 8; |
| 10795 + temp |= UCH(*bufferptr++); |
| 10796 *ptr++ = rescale[temp]; |
| 10797 } |
| 10798 return 1; |
| 10799 Index: rdswitch.c |
| 10800 =================================================================== |
| 10801 --- rdswitch.c (revision 829) |
| 10802 +++ rdswitch.c (working copy) |
| 10803 @@ -1,8 +1,10 @@ |
| 10804 /* |
| 10805 * rdswitch.c |
| 10806 * |
| 10807 + * This file was part of the Independent JPEG Group's software: |
| 10808 * Copyright (C) 1991-1996, Thomas G. Lane. |
| 10809 - * This file is part of the Independent JPEG Group's software. |
| 10810 + * libjpeg-turbo Modifications: |
| 10811 + * Copyright (C) 2010, D. R. Commander. |
| 10812 * For conditions of distribution and use, see the accompanying README file. |
| 10813 * |
| 10814 * This file contains routines to process some of cjpeg's more complicated |
| 10815 @@ -9,6 +11,7 @@ |
| 10816 * command-line switches. Switches processed here are: |
| 10817 * -qtables file Read quantization tables from text file |
| 10818 * -scans file Read scan script from text file |
| 10819 + * -quality N[,N,...] Set quality ratings |
| 10820 * -qslots N[,N,...] Set component quantization table selectors |
| 10821 * -sample HxV[,HxV,...] Set component sampling factors |
| 10822 */ |
| 10823 @@ -69,9 +72,12 @@ |
| 10824 } |
| 10825 |
| 10826 |
| 10827 +#if JPEG_LIB_VERSION < 70 |
| 10828 +static int q_scale_factor[NUM_QUANT_TBLS] = {100, 100, 100, 100}; |
| 10829 +#endif |
| 10830 + |
| 10831 GLOBAL(boolean) |
| 10832 -read_quant_tables (j_compress_ptr cinfo, char * filename, |
| 10833 - int scale_factor, boolean force_baseline) |
| 10834 +read_quant_tables (j_compress_ptr cinfo, char * filename, boolean force_baseline) |
| 10835 /* Read a set of quantization tables from the specified file. |
| 10836 * The file is plain ASCII text: decimal numbers with whitespace between. |
| 10837 * Comments preceded by '#' may be included in the file. |
| 10838 @@ -108,7 +114,13 @@ |
| 10839 } |
| 10840 table[i] = (unsigned int) val; |
| 10841 } |
| 10842 - jpeg_add_quant_table(cinfo, tblno, table, scale_factor, force_baseline); |
| 10843 +#if JPEG_LIB_VERSION >= 70 |
| 10844 + jpeg_add_quant_table(cinfo, tblno, table, cinfo->q_scale_factor[tblno], |
| 10845 + force_baseline); |
| 10846 +#else |
| 10847 + jpeg_add_quant_table(cinfo, tblno, table, q_scale_factor[tblno], |
| 10848 + force_baseline); |
| 10849 +#endif |
| 10850 tblno++; |
| 10851 } |
| 10852 |
| 10853 @@ -262,7 +274,85 @@ |
| 10854 #endif /* C_MULTISCAN_FILES_SUPPORTED */ |
| 10855 |
| 10856 |
| 10857 +#if JPEG_LIB_VERSION < 70 |
| 10858 +/* These are the sample quantization tables given in JPEG spec section K.1. |
| 10859 + * The spec says that the values given produce "good" quality, and |
| 10860 + * when divided by 2, "very good" quality. |
| 10861 + */ |
| 10862 +static const unsigned int std_luminance_quant_tbl[DCTSIZE2] = { |
| 10863 + 16, 11, 10, 16, 24, 40, 51, 61, |
| 10864 + 12, 12, 14, 19, 26, 58, 60, 55, |
| 10865 + 14, 13, 16, 24, 40, 57, 69, 56, |
| 10866 + 14, 17, 22, 29, 51, 87, 80, 62, |
| 10867 + 18, 22, 37, 56, 68, 109, 103, 77, |
| 10868 + 24, 35, 55, 64, 81, 104, 113, 92, |
| 10869 + 49, 64, 78, 87, 103, 121, 120, 101, |
| 10870 + 72, 92, 95, 98, 112, 100, 103, 99 |
| 10871 +}; |
| 10872 +static const unsigned int std_chrominance_quant_tbl[DCTSIZE2] = { |
| 10873 + 17, 18, 24, 47, 99, 99, 99, 99, |
| 10874 + 18, 21, 26, 66, 99, 99, 99, 99, |
| 10875 + 24, 26, 56, 99, 99, 99, 99, 99, |
| 10876 + 47, 66, 99, 99, 99, 99, 99, 99, |
| 10877 + 99, 99, 99, 99, 99, 99, 99, 99, |
| 10878 + 99, 99, 99, 99, 99, 99, 99, 99, |
| 10879 + 99, 99, 99, 99, 99, 99, 99, 99, |
| 10880 + 99, 99, 99, 99, 99, 99, 99, 99 |
| 10881 +}; |
| 10882 + |
| 10883 + |
| 10884 +LOCAL(void) |
| 10885 +jpeg_default_qtables (j_compress_ptr cinfo, boolean force_baseline) |
| 10886 +{ |
| 10887 + jpeg_add_quant_table(cinfo, 0, std_luminance_quant_tbl, |
| 10888 + q_scale_factor[0], force_baseline); |
| 10889 + jpeg_add_quant_table(cinfo, 1, std_chrominance_quant_tbl, |
| 10890 + q_scale_factor[1], force_baseline); |
| 10891 +} |
| 10892 +#endif |
| 10893 + |
| 10894 + |
| 10895 GLOBAL(boolean) |
| 10896 +set_quality_ratings (j_compress_ptr cinfo, char *arg, boolean force_baseline) |
| 10897 +/* Process a quality-ratings parameter string, of the form |
| 10898 + * N[,N,...] |
| 10899 + * If there are more q-table slots than parameters, the last value is replicated. |
| 10900 + */ |
| 10901 +{ |
| 10902 + int val = 75; /* default value */ |
| 10903 + int tblno; |
| 10904 + char ch; |
| 10905 + |
| 10906 + for (tblno = 0; tblno < NUM_QUANT_TBLS; tblno++) { |
| 10907 + if (*arg) { |
| 10908 + ch = ','; /* if not set by sscanf, will be ',' */ |
| 10909 + if (sscanf(arg, "%d%c", &val, &ch) < 1) |
| 10910 + return FALSE; |
| 10911 + if (ch != ',') /* syntax check */ |
| 10912 + return FALSE; |
| 10913 + /* Convert user 0-100 rating to percentage scaling */ |
| 10914 +#if JPEG_LIB_VERSION >= 70 |
| 10915 + cinfo->q_scale_factor[tblno] = jpeg_quality_scaling(val); |
| 10916 +#else |
| 10917 + q_scale_factor[tblno] = jpeg_quality_scaling(val); |
| 10918 +#endif |
| 10919 + while (*arg && *arg++ != ',') /* advance to next segment of arg string */ |
| 10920 + ; |
| 10921 + } else { |
| 10922 + /* reached end of parameter, set remaining factors to last value */ |
| 10923 +#if JPEG_LIB_VERSION >= 70 |
| 10924 + cinfo->q_scale_factor[tblno] = jpeg_quality_scaling(val); |
| 10925 +#else |
| 10926 + q_scale_factor[tblno] = jpeg_quality_scaling(val); |
| 10927 +#endif |
| 10928 + } |
| 10929 + } |
| 10930 + jpeg_default_qtables(cinfo, force_baseline); |
| 10931 + return TRUE; |
| 10932 +} |
| 10933 + |
| 10934 + |
| 10935 +GLOBAL(boolean) |
| 10936 set_quant_slots (j_compress_ptr cinfo, char *arg) |
| 10937 /* Process a quantization-table-selectors parameter string, of the form |
| 10938 * N[,N,...] |
| 10939 Index: rrutil.h |
| 10940 =================================================================== |
| 10941 --- rrutil.h (revision 829) |
| 10942 +++ rrutil.h (working copy) |
| 10943 @@ -1,5 +1,6 @@ |
| 10944 /* Copyright (C)2004 Landmark Graphics Corporation |
| 10945 * Copyright (C)2005 Sun Microsystems, Inc. |
| 10946 + * Copyright (C)2010 D. R. Commander |
| 10947 * |
| 10948 * This library is free software and may be redistributed and/or modified under |
| 10949 * the terms of the wxWindows Library License, Version 3.1 or (at your option) |
| 10950 @@ -47,9 +48,9 @@ |
| 10951 static __inline int numprocs(void) |
| 10952 { |
| 10953 #ifdef _WIN32 |
| 10954 - DWORD ProcAff, SysAff, i; int count=0; |
| 10955 + DWORD_PTR ProcAff, SysAff, i; int count=0; |
| 10956 if(!GetProcessAffinityMask(GetCurrentProcess(), &ProcAff, &SysAff)) return(1); |
| 10957 - for(i=0; i<32; i++) if(ProcAff&(1<<i)) count++; |
| 10958 + for(i=0; i<sizeof(long*)*8; i++) if(ProcAff&(1LL<<i)) count++; |
| 10959 return(count); |
| 10960 #elif defined (__APPLE__) |
| 10961 return(1); |
| 10962 Index: simd/jcclrmmx.asm |
| 10963 =================================================================== |
| 10964 --- simd/jcclrmmx.asm (revision 829) |
| 10965 +++ simd/jcclrmmx.asm (working copy) |
| 10966 @@ -19,8 +19,6 @@ |
| 10967 %include "jcolsamp.inc" |
| 10968 |
| 10969 ; -------------------------------------------------------------------------- |
| 10970 - SECTION SEG_TEXT |
| 10971 - BITS 32 |
| 10972 ; |
| 10973 ; Convert some rows of samples to the output colorspace. |
| 10974 ; |
| 10975 @@ -42,7 +40,7 @@ |
| 10976 %define gotptr wk(0)-SIZEOF_POINTER ; void * gotptr |
| 10977 |
| 10978 align 16 |
| 10979 -» global» EXTN(jsimd_rgb_ycc_convert_mmx) |
| 10980 +» global» EXTN(jsimd_rgb_ycc_convert_mmx) PRIVATE |
| 10981 |
| 10982 EXTN(jsimd_rgb_ycc_convert_mmx): |
| 10983 push ebp |
| 10984 @@ -474,3 +472,6 @@ |
| 10985 pop ebp |
| 10986 ret |
| 10987 |
| 10988 +; For some reason, the OS X linker does not honor the request to align the |
| 10989 +; segment unless we do this. |
| 10990 + align 16 |
| 10991 Index: simd/jcclrss2-64.asm |
| 10992 =================================================================== |
| 10993 --- simd/jcclrss2-64.asm (revision 829) |
| 10994 +++ simd/jcclrss2-64.asm (working copy) |
| 10995 @@ -1,5 +1,5 @@ |
| 10996 ; |
| 10997 -; jcclrss2.asm - colorspace conversion (64-bit SSE2) |
| 10998 +; jcclrss2-64.asm - colorspace conversion (64-bit SSE2) |
| 10999 ; |
| 11000 ; x86 SIMD extension for IJG JPEG library |
| 11001 ; Copyright (C) 1999-2006, MIYASAKA Masaru. |
| 11002 @@ -17,8 +17,6 @@ |
| 11003 %include "jcolsamp.inc" |
| 11004 |
| 11005 ; -------------------------------------------------------------------------- |
| 11006 -» SECTION»SEG_TEXT |
| 11007 -» BITS» 64 |
| 11008 ; |
| 11009 ; Convert some rows of samples to the output colorspace. |
| 11010 ; |
| 11011 @@ -39,7 +37,7 @@ |
| 11012 |
| 11013 align 16 |
| 11014 |
| 11015 - global EXTN(jsimd_rgb_ycc_convert_sse2) |
| 11016 + global EXTN(jsimd_rgb_ycc_convert_sse2) PRIVATE |
| 11017 |
| 11018 EXTN(jsimd_rgb_ycc_convert_sse2): |
| 11019 push rbp |
| 11020 @@ -49,8 +47,8 @@ |
| 11021 » mov» [rsp],rax |
| 11022 » mov» rbp,rsp»» » » ; rbp = aligned rbp |
| 11023 » lea» rsp, [wk(0)] |
| 11024 +» collect_args |
| 11025 » push» rbx |
| 11026 -» collect_args |
| 11027 |
| 11028 » mov» rcx, r10 |
| 11029 » test» rcx,rcx |
| 11030 @@ -70,7 +68,7 @@ |
| 11031 » pop» rcx |
| 11032 |
| 11033 » mov rsi, r11 |
| 11034 -» mov» rax, r14 |
| 11035 +» mov» eax, r14d |
| 11036 » test» rax,rax |
| 11037 » jle» near .return |
| 11038 .rowloop: |
| 11039 @@ -475,10 +473,13 @@ |
| 11040 » jg» near .rowloop |
| 11041 |
| 11042 .return: |
| 11043 +» pop» rbx |
| 11044 » uncollect_args |
| 11045 -» pop» rbx |
| 11046 » mov» rsp,rbp»» ; rsp <- aligned rbp |
| 11047 » pop» rsp» » ; rsp <- original rbp |
| 11048 » pop» rbp |
| 11049 » ret |
| 11050 |
| 11051 +; For some reason, the OS X linker does not honor the request to align the |
| 11052 +; segment unless we do this. |
| 11053 +» align» 16 |
| 11054 Index: simd/jcclrss2.asm |
| 11055 =================================================================== |
| 11056 --- simd/jcclrss2.asm» (revision 829) |
| 11057 +++ simd/jcclrss2.asm» (working copy) |
| 11058 @@ -16,8 +16,6 @@ |
| 11059 %include "jcolsamp.inc" |
| 11060 |
| 11061 ; -------------------------------------------------------------------------- |
| 11062 -» SECTION»SEG_TEXT |
| 11063 -» BITS» 32 |
| 11064 ; |
| 11065 ; Convert some rows of samples to the output colorspace. |
| 11066 ; |
| 11067 @@ -40,7 +38,7 @@ |
| 11068 |
| 11069 » align» 16 |
| 11070 |
| 11071 -» global» EXTN(jsimd_rgb_ycc_convert_sse2) |
| 11072 +» global» EXTN(jsimd_rgb_ycc_convert_sse2) PRIVATE |
| 11073 |
| 11074 EXTN(jsimd_rgb_ycc_convert_sse2): |
| 11075 » push» ebp |
| 11076 @@ -500,3 +498,6 @@ |
| 11077 » pop» ebp |
| 11078 » ret |
| 11079 |
| 11080 +; For some reason, the OS X linker does not honor the request to align the |
| 11081 +; segment unless we do this. |
| 11082 +» align» 16 |
| 11083 Index: simd/jccolmmx.asm |
| 11084 =================================================================== |
| 11085 --- simd/jccolmmx.asm» (revision 829) |
| 11086 +++ simd/jccolmmx.asm» (working copy) |
| 11087 @@ -37,7 +37,7 @@ |
| 11088 SECTION SEG_CONST |
| 11089 |
| 11090 alignz 16 |
| 11091 -» global» EXTN(jconst_rgb_ycc_convert_mmx) |
| 11092 +» global» EXTN(jconst_rgb_ycc_convert_mmx) PRIVATE |
| 11093 |
| 11094 EXTN(jconst_rgb_ycc_convert_mmx): |
| 11095 |
| 11096 @@ -51,6 +51,9 @@ |
| 11097 » alignz» 16 |
| 11098 |
| 11099 ; -------------------------------------------------------------------------- |
| 11100 +» SECTION»SEG_TEXT |
| 11101 +» BITS» 32 |
| 11102 + |
| 11103 %include "jcclrmmx.asm" |
| 11104 |
| 11105 %undef RGB_RED |
| 11106 @@ -57,10 +60,10 @@ |
| 11107 %undef RGB_GREEN |
| 11108 %undef RGB_BLUE |
| 11109 %undef RGB_PIXELSIZE |
| 11110 -%define RGB_RED 0 |
| 11111 -%define RGB_GREEN 1 |
| 11112 -%define RGB_BLUE 2 |
| 11113 -%define RGB_PIXELSIZE 3 |
| 11114 +%define RGB_RED EXT_RGB_RED |
| 11115 +%define RGB_GREEN EXT_RGB_GREEN |
| 11116 +%define RGB_BLUE EXT_RGB_BLUE |
| 11117 +%define RGB_PIXELSIZE EXT_RGB_PIXELSIZE |
| 11118 %define jsimd_rgb_ycc_convert_mmx jsimd_extrgb_ycc_convert_mmx |
| 11119 %include "jcclrmmx.asm" |
| 11120 |
| 11121 @@ -68,10 +71,10 @@ |
| 11122 %undef RGB_GREEN |
| 11123 %undef RGB_BLUE |
| 11124 %undef RGB_PIXELSIZE |
| 11125 -%define RGB_RED 0 |
| 11126 -%define RGB_GREEN 1 |
| 11127 -%define RGB_BLUE 2 |
| 11128 -%define RGB_PIXELSIZE 4 |
| 11129 +%define RGB_RED EXT_RGBX_RED |
| 11130 +%define RGB_GREEN EXT_RGBX_GREEN |
| 11131 +%define RGB_BLUE EXT_RGBX_BLUE |
| 11132 +%define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE |
| 11133 %define jsimd_rgb_ycc_convert_mmx jsimd_extrgbx_ycc_convert_mmx |
| 11134 %include "jcclrmmx.asm" |
| 11135 |
| 11136 @@ -79,10 +82,10 @@ |
| 11137 %undef RGB_GREEN |
| 11138 %undef RGB_BLUE |
| 11139 %undef RGB_PIXELSIZE |
| 11140 -%define RGB_RED 2 |
| 11141 -%define RGB_GREEN 1 |
| 11142 -%define RGB_BLUE 0 |
| 11143 -%define RGB_PIXELSIZE 3 |
| 11144 +%define RGB_RED EXT_BGR_RED |
| 11145 +%define RGB_GREEN EXT_BGR_GREEN |
| 11146 +%define RGB_BLUE EXT_BGR_BLUE |
| 11147 +%define RGB_PIXELSIZE EXT_BGR_PIXELSIZE |
| 11148 %define jsimd_rgb_ycc_convert_mmx jsimd_extbgr_ycc_convert_mmx |
| 11149 %include "jcclrmmx.asm" |
| 11150 |
| 11151 @@ -90,10 +93,10 @@ |
| 11152 %undef RGB_GREEN |
| 11153 %undef RGB_BLUE |
| 11154 %undef RGB_PIXELSIZE |
| 11155 -%define RGB_RED 2 |
| 11156 -%define RGB_GREEN 1 |
| 11157 -%define RGB_BLUE 0 |
| 11158 -%define RGB_PIXELSIZE 4 |
| 11159 +%define RGB_RED EXT_BGRX_RED |
| 11160 +%define RGB_GREEN EXT_BGRX_GREEN |
| 11161 +%define RGB_BLUE EXT_BGRX_BLUE |
| 11162 +%define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE |
| 11163 %define jsimd_rgb_ycc_convert_mmx jsimd_extbgrx_ycc_convert_mmx |
| 11164 %include "jcclrmmx.asm" |
| 11165 |
| 11166 @@ -101,10 +104,10 @@ |
| 11167 %undef RGB_GREEN |
| 11168 %undef RGB_BLUE |
| 11169 %undef RGB_PIXELSIZE |
| 11170 -%define RGB_RED 3 |
| 11171 -%define RGB_GREEN 2 |
| 11172 -%define RGB_BLUE 1 |
| 11173 -%define RGB_PIXELSIZE 4 |
| 11174 +%define RGB_RED EXT_XBGR_RED |
| 11175 +%define RGB_GREEN EXT_XBGR_GREEN |
| 11176 +%define RGB_BLUE EXT_XBGR_BLUE |
| 11177 +%define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE |
| 11178 %define jsimd_rgb_ycc_convert_mmx jsimd_extxbgr_ycc_convert_mmx |
| 11179 %include "jcclrmmx.asm" |
| 11180 |
| 11181 @@ -112,9 +115,9 @@ |
| 11182 %undef RGB_GREEN |
| 11183 %undef RGB_BLUE |
| 11184 %undef RGB_PIXELSIZE |
| 11185 -%define RGB_RED 1 |
| 11186 -%define RGB_GREEN 2 |
| 11187 -%define RGB_BLUE 3 |
| 11188 -%define RGB_PIXELSIZE 4 |
| 11189 +%define RGB_RED EXT_XRGB_RED |
| 11190 +%define RGB_GREEN EXT_XRGB_GREEN |
| 11191 +%define RGB_BLUE EXT_XRGB_BLUE |
| 11192 +%define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE |
| 11193 %define jsimd_rgb_ycc_convert_mmx jsimd_extxrgb_ycc_convert_mmx |
| 11194 %include "jcclrmmx.asm" |
| 11195 Index: simd/jccolss2-64.asm |
| 11196 =================================================================== |
| 11197 --- simd/jccolss2-64.asm» (revision 829) |
| 11198 +++ simd/jccolss2-64.asm» (working copy) |
| 11199 @@ -1,5 +1,5 @@ |
| 11200 ; |
| 11201 -; jccolss2.asm - colorspace conversion (64-bit SSE2) |
| 11202 +; jccolss2-64.asm - colorspace conversion (64-bit SSE2) |
| 11203 ; |
| 11204 ; x86 SIMD extension for IJG JPEG library |
| 11205 ; Copyright (C) 1999-2006, MIYASAKA Masaru. |
| 11206 @@ -34,7 +34,7 @@ |
| 11207 » SECTION»SEG_CONST |
| 11208 |
| 11209 » alignz» 16 |
| 11210 -» global» EXTN(jconst_rgb_ycc_convert_sse2) |
| 11211 +» global» EXTN(jconst_rgb_ycc_convert_sse2) PRIVATE |
| 11212 |
| 11213 EXTN(jconst_rgb_ycc_convert_sse2): |
| 11214 |
| 11215 @@ -48,6 +48,9 @@ |
| 11216 » alignz» 16 |
| 11217 |
| 11218 ; -------------------------------------------------------------------------- |
| 11219 +» SECTION»SEG_TEXT |
| 11220 +» BITS» 64 |
| 11221 + |
| 11222 %include "jcclrss2-64.asm" |
| 11223 |
| 11224 %undef RGB_RED |
| 11225 @@ -54,10 +57,10 @@ |
| 11226 %undef RGB_GREEN |
| 11227 %undef RGB_BLUE |
| 11228 %undef RGB_PIXELSIZE |
| 11229 -%define RGB_RED 0 |
| 11230 -%define RGB_GREEN 1 |
| 11231 -%define RGB_BLUE 2 |
| 11232 -%define RGB_PIXELSIZE 3 |
| 11233 +%define RGB_RED EXT_RGB_RED |
| 11234 +%define RGB_GREEN EXT_RGB_GREEN |
| 11235 +%define RGB_BLUE EXT_RGB_BLUE |
| 11236 +%define RGB_PIXELSIZE EXT_RGB_PIXELSIZE |
| 11237 %define jsimd_rgb_ycc_convert_sse2 jsimd_extrgb_ycc_convert_sse2 |
| 11238 %include "jcclrss2-64.asm" |
| 11239 |
| 11240 @@ -65,10 +68,10 @@ |
| 11241 %undef RGB_GREEN |
| 11242 %undef RGB_BLUE |
| 11243 %undef RGB_PIXELSIZE |
| 11244 -%define RGB_RED 0 |
| 11245 -%define RGB_GREEN 1 |
| 11246 -%define RGB_BLUE 2 |
| 11247 -%define RGB_PIXELSIZE 4 |
| 11248 +%define RGB_RED EXT_RGBX_RED |
| 11249 +%define RGB_GREEN EXT_RGBX_GREEN |
| 11250 +%define RGB_BLUE EXT_RGBX_BLUE |
| 11251 +%define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE |
| 11252 %define jsimd_rgb_ycc_convert_sse2 jsimd_extrgbx_ycc_convert_sse2 |
| 11253 %include "jcclrss2-64.asm" |
| 11254 |
| 11255 @@ -76,10 +79,10 @@ |
| 11256 %undef RGB_GREEN |
| 11257 %undef RGB_BLUE |
| 11258 %undef RGB_PIXELSIZE |
| 11259 -%define RGB_RED 2 |
| 11260 -%define RGB_GREEN 1 |
| 11261 -%define RGB_BLUE 0 |
| 11262 -%define RGB_PIXELSIZE 3 |
| 11263 +%define RGB_RED EXT_BGR_RED |
| 11264 +%define RGB_GREEN EXT_BGR_GREEN |
| 11265 +%define RGB_BLUE EXT_BGR_BLUE |
| 11266 +%define RGB_PIXELSIZE EXT_BGR_PIXELSIZE |
| 11267 %define jsimd_rgb_ycc_convert_sse2 jsimd_extbgr_ycc_convert_sse2 |
| 11268 %include "jcclrss2-64.asm" |
| 11269 |
| 11270 @@ -87,10 +90,10 @@ |
| 11271 %undef RGB_GREEN |
| 11272 %undef RGB_BLUE |
| 11273 %undef RGB_PIXELSIZE |
| 11274 -%define RGB_RED 2 |
| 11275 -%define RGB_GREEN 1 |
| 11276 -%define RGB_BLUE 0 |
| 11277 -%define RGB_PIXELSIZE 4 |
| 11278 +%define RGB_RED EXT_BGRX_RED |
| 11279 +%define RGB_GREEN EXT_BGRX_GREEN |
| 11280 +%define RGB_BLUE EXT_BGRX_BLUE |
| 11281 +%define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE |
| 11282 %define jsimd_rgb_ycc_convert_sse2 jsimd_extbgrx_ycc_convert_sse2 |
| 11283 %include "jcclrss2-64.asm" |
| 11284 |
| 11285 @@ -98,10 +101,10 @@ |
| 11286 %undef RGB_GREEN |
| 11287 %undef RGB_BLUE |
| 11288 %undef RGB_PIXELSIZE |
| 11289 -%define RGB_RED 3 |
| 11290 -%define RGB_GREEN 2 |
| 11291 -%define RGB_BLUE 1 |
| 11292 -%define RGB_PIXELSIZE 4 |
| 11293 +%define RGB_RED EXT_XBGR_RED |
| 11294 +%define RGB_GREEN EXT_XBGR_GREEN |
| 11295 +%define RGB_BLUE EXT_XBGR_BLUE |
| 11296 +%define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE |
| 11297 %define jsimd_rgb_ycc_convert_sse2 jsimd_extxbgr_ycc_convert_sse2 |
| 11298 %include "jcclrss2-64.asm" |
| 11299 |
| 11300 @@ -109,9 +112,9 @@ |
| 11301 %undef RGB_GREEN |
| 11302 %undef RGB_BLUE |
| 11303 %undef RGB_PIXELSIZE |
| 11304 -%define RGB_RED 1 |
| 11305 -%define RGB_GREEN 2 |
| 11306 -%define RGB_BLUE 3 |
| 11307 -%define RGB_PIXELSIZE 4 |
| 11308 +%define RGB_RED EXT_XRGB_RED |
| 11309 +%define RGB_GREEN EXT_XRGB_GREEN |
| 11310 +%define RGB_BLUE EXT_XRGB_BLUE |
| 11311 +%define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE |
| 11312 %define jsimd_rgb_ycc_convert_sse2 jsimd_extxrgb_ycc_convert_sse2 |
| 11313 %include "jcclrss2-64.asm" |
| 11314 Index: simd/jccolss2.asm |
| 11315 =================================================================== |
| 11316 --- simd/jccolss2.asm» (revision 829) |
| 11317 +++ simd/jccolss2.asm» (working copy) |
| 11318 @@ -34,7 +34,7 @@ |
| 11319 » SECTION»SEG_CONST |
| 11320 |
| 11321 » alignz» 16 |
| 11322 -» global» EXTN(jconst_rgb_ycc_convert_sse2) |
| 11323 +» global» EXTN(jconst_rgb_ycc_convert_sse2) PRIVATE |
| 11324 |
| 11325 EXTN(jconst_rgb_ycc_convert_sse2): |
| 11326 |
| 11327 @@ -48,6 +48,9 @@ |
| 11328 » alignz» 16 |
| 11329 |
| 11330 ; -------------------------------------------------------------------------- |
| 11331 +» SECTION»SEG_TEXT |
| 11332 +» BITS» 32 |
| 11333 + |
| 11334 %include "jcclrss2.asm" |
| 11335 |
| 11336 %undef RGB_RED |
| 11337 @@ -54,10 +57,10 @@ |
| 11338 %undef RGB_GREEN |
| 11339 %undef RGB_BLUE |
| 11340 %undef RGB_PIXELSIZE |
| 11341 -%define RGB_RED 0 |
| 11342 -%define RGB_GREEN 1 |
| 11343 -%define RGB_BLUE 2 |
| 11344 -%define RGB_PIXELSIZE 3 |
| 11345 +%define RGB_RED EXT_RGB_RED |
| 11346 +%define RGB_GREEN EXT_RGB_GREEN |
| 11347 +%define RGB_BLUE EXT_RGB_BLUE |
| 11348 +%define RGB_PIXELSIZE EXT_RGB_PIXELSIZE |
| 11349 %define jsimd_rgb_ycc_convert_sse2 jsimd_extrgb_ycc_convert_sse2 |
| 11350 %include "jcclrss2.asm" |
| 11351 |
| 11352 @@ -65,10 +68,10 @@ |
| 11353 %undef RGB_GREEN |
| 11354 %undef RGB_BLUE |
| 11355 %undef RGB_PIXELSIZE |
| 11356 -%define RGB_RED 0 |
| 11357 -%define RGB_GREEN 1 |
| 11358 -%define RGB_BLUE 2 |
| 11359 -%define RGB_PIXELSIZE 4 |
| 11360 +%define RGB_RED EXT_RGBX_RED |
| 11361 +%define RGB_GREEN EXT_RGBX_GREEN |
| 11362 +%define RGB_BLUE EXT_RGBX_BLUE |
| 11363 +%define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE |
| 11364 %define jsimd_rgb_ycc_convert_sse2 jsimd_extrgbx_ycc_convert_sse2 |
| 11365 %include "jcclrss2.asm" |
| 11366 |
| 11367 @@ -76,10 +79,10 @@ |
| 11368 %undef RGB_GREEN |
| 11369 %undef RGB_BLUE |
| 11370 %undef RGB_PIXELSIZE |
| 11371 -%define RGB_RED 2 |
| 11372 -%define RGB_GREEN 1 |
| 11373 -%define RGB_BLUE 0 |
| 11374 -%define RGB_PIXELSIZE 3 |
| 11375 +%define RGB_RED EXT_BGR_RED |
| 11376 +%define RGB_GREEN EXT_BGR_GREEN |
| 11377 +%define RGB_BLUE EXT_BGR_BLUE |
| 11378 +%define RGB_PIXELSIZE EXT_BGR_PIXELSIZE |
| 11379 %define jsimd_rgb_ycc_convert_sse2 jsimd_extbgr_ycc_convert_sse2 |
| 11380 %include "jcclrss2.asm" |
| 11381 |
| 11382 @@ -87,10 +90,10 @@ |
| 11383 %undef RGB_GREEN |
| 11384 %undef RGB_BLUE |
| 11385 %undef RGB_PIXELSIZE |
| 11386 -%define RGB_RED 2 |
| 11387 -%define RGB_GREEN 1 |
| 11388 -%define RGB_BLUE 0 |
| 11389 -%define RGB_PIXELSIZE 4 |
| 11390 +%define RGB_RED EXT_BGRX_RED |
| 11391 +%define RGB_GREEN EXT_BGRX_GREEN |
| 11392 +%define RGB_BLUE EXT_BGRX_BLUE |
| 11393 +%define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE |
| 11394 %define jsimd_rgb_ycc_convert_sse2 jsimd_extbgrx_ycc_convert_sse2 |
| 11395 %include "jcclrss2.asm" |
| 11396 |
| 11397 @@ -98,10 +101,10 @@ |
| 11398 %undef RGB_GREEN |
| 11399 %undef RGB_BLUE |
| 11400 %undef RGB_PIXELSIZE |
| 11401 -%define RGB_RED 3 |
| 11402 -%define RGB_GREEN 2 |
| 11403 -%define RGB_BLUE 1 |
| 11404 -%define RGB_PIXELSIZE 4 |
| 11405 +%define RGB_RED EXT_XBGR_RED |
| 11406 +%define RGB_GREEN EXT_XBGR_GREEN |
| 11407 +%define RGB_BLUE EXT_XBGR_BLUE |
| 11408 +%define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE |
| 11409 %define jsimd_rgb_ycc_convert_sse2 jsimd_extxbgr_ycc_convert_sse2 |
| 11410 %include "jcclrss2.asm" |
| 11411 |
| 11412 @@ -109,9 +112,9 @@ |
| 11413 %undef RGB_GREEN |
| 11414 %undef RGB_BLUE |
| 11415 %undef RGB_PIXELSIZE |
| 11416 -%define RGB_RED 1 |
| 11417 -%define RGB_GREEN 2 |
| 11418 -%define RGB_BLUE 3 |
| 11419 -%define RGB_PIXELSIZE 4 |
| 11420 +%define RGB_RED EXT_XRGB_RED |
| 11421 +%define RGB_GREEN EXT_XRGB_GREEN |
| 11422 +%define RGB_BLUE EXT_XRGB_BLUE |
| 11423 +%define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE |
| 11424 %define jsimd_rgb_ycc_convert_sse2 jsimd_extxrgb_ycc_convert_sse2 |
| 11425 %include "jcclrss2.asm" |
| 11426 Index: simd/jcqnt3dn.asm |
| 11427 =================================================================== |
| 11428 --- simd/jcqnt3dn.asm» (revision 829) |
| 11429 +++ simd/jcqnt3dn.asm» (working copy) |
| 11430 @@ -35,7 +35,7 @@ |
| 11431 %define workspace» ebp+16» » ; FAST_FLOAT * workspace |
400 | 11432 |
401 align 16 | 11433 align 16 |
402 -» global» EXTN(jsimd_idct_4x4_sse2) | 11434 -» global» EXTN(jsimd_convsamp_float_3dnow) |
403 +» global» EXTN(jsimd_idct_4x4_sse2) PRIVATE | 11435 +» global» EXTN(jsimd_convsamp_float_3dnow) PRIVATE |
404 | 11436 |
405 EXTN(jsimd_idct_4x4_sse2): | 11437 EXTN(jsimd_convsamp_float_3dnow): |
| 11438 » push» ebp |
| 11439 @@ -138,7 +138,7 @@ |
| 11440 %define workspace» ebp+16» » ; FAST_FLOAT * workspace |
| 11441 |
| 11442 » align» 16 |
| 11443 -» global» EXTN(jsimd_quantize_float_3dnow) |
| 11444 +» global» EXTN(jsimd_quantize_float_3dnow) PRIVATE |
| 11445 |
| 11446 EXTN(jsimd_quantize_float_3dnow): |
| 11447 » push» ebp |
| 11448 @@ -228,3 +228,6 @@ |
| 11449 » pop» ebp |
| 11450 » ret |
| 11451 |
| 11452 +; For some reason, the OS X linker does not honor the request to align the |
| 11453 +; segment unless we do this. |
| 11454 +» align» 16 |
| 11455 Index: simd/jcqntmmx.asm |
| 11456 =================================================================== |
| 11457 --- simd/jcqntmmx.asm» (revision 829) |
| 11458 +++ simd/jcqntmmx.asm» (working copy) |
| 11459 @@ -35,7 +35,7 @@ |
| 11460 %define workspace» ebp+16» » ; DCTELEM * workspace |
| 11461 |
| 11462 » align» 16 |
| 11463 -» global» EXTN(jsimd_convsamp_mmx) |
| 11464 +» global» EXTN(jsimd_convsamp_mmx) PRIVATE |
| 11465 |
| 11466 EXTN(jsimd_convsamp_mmx): |
| 11467 » push» ebp |
| 11468 @@ -140,7 +140,7 @@ |
| 11469 %define workspace» ebp+16» » ; DCTELEM * workspace |
| 11470 |
| 11471 » align» 16 |
| 11472 -» global» EXTN(jsimd_quantize_mmx) |
| 11473 +» global» EXTN(jsimd_quantize_mmx) PRIVATE |
| 11474 |
| 11475 EXTN(jsimd_quantize_mmx): |
| 11476 » push» ebp |
| 11477 @@ -269,3 +269,6 @@ |
| 11478 » pop» ebp |
| 11479 » ret |
| 11480 |
| 11481 +; For some reason, the OS X linker does not honor the request to align the |
| 11482 +; segment unless we do this. |
| 11483 +» align» 16 |
| 11484 Index: simd/jcqnts2f-64.asm |
| 11485 =================================================================== |
| 11486 --- simd/jcqnts2f-64.asm» (revision 829) |
| 11487 +++ simd/jcqnts2f-64.asm» (working copy) |
| 11488 @@ -1,5 +1,5 @@ |
| 11489 ; |
| 11490 -; jcqnts2f.asm - sample data conversion and quantization (64-bit SSE & SSE2) |
| 11491 +; jcqnts2f-64.asm - sample data conversion and quantization (64-bit SSE & SSE2) |
| 11492 ; |
| 11493 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB |
| 11494 ; Copyright 2009 D. R. Commander |
| 11495 @@ -36,13 +36,14 @@ |
| 11496 ; r12 = FAST_FLOAT * workspace |
| 11497 |
| 11498 » align» 16 |
| 11499 -» global» EXTN(jsimd_convsamp_float_sse2) |
| 11500 +» global» EXTN(jsimd_convsamp_float_sse2) PRIVATE |
| 11501 |
| 11502 EXTN(jsimd_convsamp_float_sse2): |
406 push rbp | 11503 push rbp |
407 @@ -413,7 +413,7 @@ | 11504 +» mov» rax,rsp |
408 ; r13 = JDIMENSION output_col | 11505 » mov» rbp,rsp |
| 11506 +» collect_args |
| 11507 » push» rbx |
| 11508 -» collect_args |
| 11509 |
| 11510 » pcmpeqw xmm7,xmm7 |
| 11511 » psllw xmm7,7 |
| 11512 @@ -89,8 +90,8 @@ |
| 11513 » dec» rcx |
| 11514 » jnz» short .convloop |
| 11515 |
| 11516 +» pop» rbx |
| 11517 » uncollect_args |
| 11518 -» pop» rbx |
| 11519 » pop» rbp |
| 11520 » ret |
| 11521 |
| 11522 @@ -109,10 +110,11 @@ |
| 11523 ; r12 = FAST_FLOAT * workspace |
409 | 11524 |
410 align 16 | 11525 align 16 |
411 -» global» EXTN(jsimd_idct_2x2_sse2) | 11526 -» global» EXTN(jsimd_quantize_float_sse2) |
412 +» global» EXTN(jsimd_idct_2x2_sse2) PRIVATE | 11527 +» global» EXTN(jsimd_quantize_float_sse2) PRIVATE |
413 | 11528 |
414 EXTN(jsimd_idct_2x2_sse2): | 11529 EXTN(jsimd_quantize_float_sse2): |
415 push rbp | 11530 push rbp |
416 Index: simd/ji3dnflt.asm | 11531 +» mov» rax,rsp |
417 =================================================================== | 11532 » mov» rbp,rsp |
418 --- simd/ji3dnflt.asm» (revision 829) | 11533 » collect_args |
419 +++ simd/ji3dnflt.asm» (working copy) | 11534 |
420 @@ -27,7 +27,7 @@ | 11535 @@ -150,3 +152,7 @@ |
421 » SECTION»SEG_CONST | 11536 » uncollect_args |
422 | 11537 » pop» rbp |
423 » alignz» 16 | 11538 » ret |
424 -» global» EXTN(jconst_idct_float_3dnow) | 11539 + |
425 +» global» EXTN(jconst_idct_float_3dnow) PRIVATE | 11540 +; For some reason, the OS X linker does not honor the request to align the |
426 | 11541 +; segment unless we do this. |
427 EXTN(jconst_idct_float_3dnow): | 11542 +» align» 16 |
428 | |
429 @@ -63,7 +63,7 @@ | |
430 » » » » » ; FAST_FLOAT workspace[DCTSIZE2] | |
431 | |
432 » align» 16 | |
433 -» global» EXTN(jsimd_idct_float_3dnow) | |
434 +» global» EXTN(jsimd_idct_float_3dnow) PRIVATE | |
435 | |
436 EXTN(jsimd_idct_float_3dnow): | |
437 » push» ebp | |
438 Index: simd/jsimdcpu.asm | |
439 =================================================================== | |
440 --- simd/jsimdcpu.asm» (revision 829) | |
441 +++ simd/jsimdcpu.asm» (working copy) | |
442 @@ -29,7 +29,7 @@ | |
443 ; | |
444 | |
445 » align» 16 | |
446 -» global» EXTN(jpeg_simd_cpu_support) | |
447 +» global» EXTN(jpeg_simd_cpu_support) PRIVATE | |
448 | |
449 EXTN(jpeg_simd_cpu_support): | |
450 » push» ebx | |
451 Index: simd/jdmerss2-64.asm | |
452 =================================================================== | |
453 --- simd/jdmerss2-64.asm» (revision 829) | |
454 +++ simd/jdmerss2-64.asm» (working copy) | |
455 @@ -35,7 +35,7 @@ | |
456 » SECTION»SEG_CONST | |
457 | |
458 » alignz» 16 | |
459 -» global» EXTN(jconst_merged_upsample_sse2) | |
460 +» global» EXTN(jconst_merged_upsample_sse2) PRIVATE | |
461 | |
462 EXTN(jconst_merged_upsample_sse2): | |
463 | |
464 Index: simd/jdsammmx.asm | |
465 =================================================================== | |
466 --- simd/jdsammmx.asm» (revision 829) | |
467 +++ simd/jdsammmx.asm» (working copy) | |
468 @@ -22,7 +22,7 @@ | |
469 » SECTION»SEG_CONST | |
470 | |
471 » alignz» 16 | |
472 -» global» EXTN(jconst_fancy_upsample_mmx) | |
473 +» global» EXTN(jconst_fancy_upsample_mmx) PRIVATE | |
474 | |
475 EXTN(jconst_fancy_upsample_mmx): | |
476 | |
477 @@ -58,7 +58,7 @@ | |
478 %define output_data_ptr(b)» (b)+20» » ; JSAMPARRAY * output_data_ptr | |
479 | |
480 » align» 16 | |
481 -» global» EXTN(jsimd_h2v1_fancy_upsample_mmx) | |
482 +» global» EXTN(jsimd_h2v1_fancy_upsample_mmx) PRIVATE | |
483 | |
484 EXTN(jsimd_h2v1_fancy_upsample_mmx): | |
485 » push» ebp | |
486 @@ -216,7 +216,7 @@ | |
487 %define gotptr»» wk(0)-SIZEOF_POINTER» ; void * gotptr | |
488 | |
489 » align» 16 | |
490 -» global» EXTN(jsimd_h2v2_fancy_upsample_mmx) | |
491 +» global» EXTN(jsimd_h2v2_fancy_upsample_mmx) PRIVATE | |
492 | |
493 EXTN(jsimd_h2v2_fancy_upsample_mmx): | |
494 » push» ebp | |
495 @@ -542,7 +542,7 @@ | |
496 %define output_data_ptr(b)» (b)+20» » ; JSAMPARRAY * output_data_ptr | |
497 | |
498 » align» 16 | |
499 -» global» EXTN(jsimd_h2v1_upsample_mmx) | |
500 +» global» EXTN(jsimd_h2v1_upsample_mmx) PRIVATE | |
501 | |
502 EXTN(jsimd_h2v1_upsample_mmx): | |
503 » push» ebp | |
504 @@ -643,7 +643,7 @@ | |
505 %define output_data_ptr(b)» (b)+20» » ; JSAMPARRAY * output_data_ptr | |
506 | |
507 » align» 16 | |
508 -» global» EXTN(jsimd_h2v2_upsample_mmx) | |
509 +» global» EXTN(jsimd_h2v2_upsample_mmx) PRIVATE | |
510 | |
511 EXTN(jsimd_h2v2_upsample_mmx): | |
512 » push» ebp | |
513 Index: simd/jdmrgmmx.asm | |
514 =================================================================== | |
515 --- simd/jdmrgmmx.asm» (revision 829) | |
516 +++ simd/jdmrgmmx.asm» (working copy) | |
517 @@ -40,7 +40,7 @@ | |
518 %define gotptr»» wk(0)-SIZEOF_POINTER» ; void * gotptr | |
519 | |
520 » align» 16 | |
521 -» global» EXTN(jsimd_h2v1_merged_upsample_mmx) | |
522 +» global» EXTN(jsimd_h2v1_merged_upsample_mmx) PRIVATE | |
523 | |
524 EXTN(jsimd_h2v1_merged_upsample_mmx): | |
525 » push» ebp | |
526 @@ -409,7 +409,7 @@ | |
527 %define output_buf(b)» » (b)+20» » ; JSAMPARRAY output_buf | |
528 | |
529 » align» 16 | |
530 -» global» EXTN(jsimd_h2v2_merged_upsample_mmx) | |
531 +» global» EXTN(jsimd_h2v2_merged_upsample_mmx) PRIVATE | |
532 | |
533 EXTN(jsimd_h2v2_merged_upsample_mmx): | |
534 » push» ebp | |
535 Index: simd/jdsamss2.asm | |
536 =================================================================== | |
537 --- simd/jdsamss2.asm» (revision 829) | |
538 +++ simd/jdsamss2.asm» (working copy) | |
539 @@ -22,7 +22,7 @@ | |
540 » SECTION»SEG_CONST | |
541 | |
542 » alignz» 16 | |
543 -» global» EXTN(jconst_fancy_upsample_sse2) | |
544 +» global» EXTN(jconst_fancy_upsample_sse2) PRIVATE | |
545 | |
546 EXTN(jconst_fancy_upsample_sse2): | |
547 | |
548 @@ -58,7 +58,7 @@ | |
549 %define output_data_ptr(b)» (b)+20» » ; JSAMPARRAY * output_data_ptr | |
550 | |
551 » align» 16 | |
552 -» global» EXTN(jsimd_h2v1_fancy_upsample_sse2) | |
553 +» global» EXTN(jsimd_h2v1_fancy_upsample_sse2) PRIVATE | |
554 | |
555 EXTN(jsimd_h2v1_fancy_upsample_sse2): | |
556 » push» ebp | |
557 @@ -214,7 +214,7 @@ | |
558 %define gotptr»» wk(0)-SIZEOF_POINTER» ; void * gotptr | |
559 | |
560 » align» 16 | |
561 -» global» EXTN(jsimd_h2v2_fancy_upsample_sse2) | |
562 +» global» EXTN(jsimd_h2v2_fancy_upsample_sse2) PRIVATE | |
563 | |
564 EXTN(jsimd_h2v2_fancy_upsample_sse2): | |
565 » push» ebp | |
566 @@ -538,7 +538,7 @@ | |
567 %define output_data_ptr(b)» (b)+20» » ; JSAMPARRAY * output_data_ptr | |
568 | |
569 » align» 16 | |
570 -» global» EXTN(jsimd_h2v1_upsample_sse2) | |
571 +» global» EXTN(jsimd_h2v1_upsample_sse2) PRIVATE | |
572 | |
573 EXTN(jsimd_h2v1_upsample_sse2): | |
574 » push» ebp | |
575 @@ -637,7 +637,7 @@ | |
576 %define output_data_ptr(b)» (b)+20» » ; JSAMPARRAY * output_data_ptr | |
577 | |
578 » align» 16 | |
579 -» global» EXTN(jsimd_h2v2_upsample_sse2) | |
580 +» global» EXTN(jsimd_h2v2_upsample_sse2) PRIVATE | |
581 | |
582 EXTN(jsimd_h2v2_upsample_sse2): | |
583 » push» ebp | |
584 Index: simd/jiss2flt-64.asm | |
585 =================================================================== | |
586 --- simd/jiss2flt-64.asm» (revision 829) | |
587 +++ simd/jiss2flt-64.asm» (working copy) | |
588 @@ -38,7 +38,7 @@ | |
589 » SECTION»SEG_CONST | |
590 | |
591 » alignz» 16 | |
592 -» global» EXTN(jconst_idct_float_sse2) | |
593 +» global» EXTN(jconst_idct_float_sse2) PRIVATE | |
594 | |
595 EXTN(jconst_idct_float_sse2): | |
596 | |
597 @@ -74,7 +74,7 @@ | |
598 » » » » » ; FAST_FLOAT workspace[DCTSIZE2] | |
599 | |
600 » align» 16 | |
601 -» global» EXTN(jsimd_idct_float_sse2) | |
602 +» global» EXTN(jsimd_idct_float_sse2) PRIVATE | |
603 | |
604 EXTN(jsimd_idct_float_sse2): | |
605 » push» rbp | |
606 Index: simd/jfss2int-64.asm | |
607 =================================================================== | |
608 --- simd/jfss2int-64.asm» (revision 829) | |
609 +++ simd/jfss2int-64.asm» (working copy) | |
610 @@ -67,7 +67,7 @@ | |
611 » SECTION»SEG_CONST | |
612 | |
613 » alignz» 16 | |
614 -» global» EXTN(jconst_fdct_islow_sse2) | |
615 +» global» EXTN(jconst_fdct_islow_sse2) PRIVATE | |
616 | |
617 EXTN(jconst_fdct_islow_sse2): | |
618 | |
619 @@ -101,7 +101,7 @@ | |
620 %define WK_NUM»» 6 | |
621 | |
622 » align» 16 | |
623 -» global» EXTN(jsimd_fdct_islow_sse2) | |
624 +» global» EXTN(jsimd_fdct_islow_sse2) PRIVATE | |
625 | |
626 EXTN(jsimd_fdct_islow_sse2): | |
627 » push» rbp | |
628 Index: simd/jcqnts2f.asm | 11543 Index: simd/jcqnts2f.asm |
629 =================================================================== | 11544 =================================================================== |
630 --- simd/jcqnts2f.asm (revision 829) | 11545 --- simd/jcqnts2f.asm (revision 829) |
631 +++ simd/jcqnts2f.asm (working copy) | 11546 +++ simd/jcqnts2f.asm (working copy) |
632 @@ -35,7 +35,7 @@ | 11547 @@ -35,7 +35,7 @@ |
633 %define workspace ebp+16 ; FAST_FLOAT * workspace | 11548 %define workspace ebp+16 ; FAST_FLOAT * workspace |
634 | 11549 |
635 align 16 | 11550 align 16 |
636 - global EXTN(jsimd_convsamp_float_sse2) | 11551 - global EXTN(jsimd_convsamp_float_sse2) |
637 + global EXTN(jsimd_convsamp_float_sse2) PRIVATE | 11552 + global EXTN(jsimd_convsamp_float_sse2) PRIVATE |
638 | 11553 |
639 EXTN(jsimd_convsamp_float_sse2): | 11554 EXTN(jsimd_convsamp_float_sse2): |
640 push ebp | 11555 push ebp |
641 @@ -115,7 +115,7 @@ | 11556 @@ -115,7 +115,7 @@ |
642 %define workspace ebp+16 ; FAST_FLOAT * workspace | 11557 %define workspace ebp+16 ; FAST_FLOAT * workspace |
643 | 11558 |
644 align 16 | 11559 align 16 |
645 - global EXTN(jsimd_quantize_float_sse2) | 11560 - global EXTN(jsimd_quantize_float_sse2) |
646 + global EXTN(jsimd_quantize_float_sse2) PRIVATE | 11561 + global EXTN(jsimd_quantize_float_sse2) PRIVATE |
647 | 11562 |
648 EXTN(jsimd_quantize_float_sse2): | 11563 EXTN(jsimd_quantize_float_sse2): |
649 push ebp | 11564 push ebp |
650 Index: simd/jdmrgss2.asm | 11565 @@ -166,3 +166,6 @@ |
651 =================================================================== | 11566 » pop» ebp |
652 --- simd/jdmrgss2.asm» (revision 829) | 11567 » ret |
653 +++ simd/jdmrgss2.asm» (working copy) | 11568 |
654 @@ -40,7 +40,7 @@ | 11569 +; For some reason, the OS X linker does not honor the request to align the |
655 %define gotptr»» wk(0)-SIZEOF_POINTER» ; void * gotptr | 11570 +; segment unless we do this. |
| 11571 +» align» 16 |
| 11572 Index: simd/jcqnts2i-64.asm |
| 11573 =================================================================== |
| 11574 --- simd/jcqnts2i-64.asm» (revision 829) |
| 11575 +++ simd/jcqnts2i-64.asm» (working copy) |
| 11576 @@ -1,5 +1,5 @@ |
| 11577 ; |
| 11578 -; jcqnts2i.asm - sample data conversion and quantization (64-bit SSE2) |
| 11579 +; jcqnts2i-64.asm - sample data conversion and quantization (64-bit SSE2) |
| 11580 ; |
| 11581 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB |
| 11582 ; Copyright 2009 D. R. Commander |
| 11583 @@ -36,13 +36,14 @@ |
| 11584 ; r12 = DCTELEM * workspace |
656 | 11585 |
657 align 16 | 11586 align 16 |
658 -» global» EXTN(jsimd_h2v1_merged_upsample_sse2) | 11587 -» global» EXTN(jsimd_convsamp_sse2) |
659 +» global» EXTN(jsimd_h2v1_merged_upsample_sse2) PRIVATE | 11588 +» global» EXTN(jsimd_convsamp_sse2) PRIVATE |
660 | 11589 |
661 EXTN(jsimd_h2v1_merged_upsample_sse2): | 11590 EXTN(jsimd_convsamp_sse2): |
662 » push» ebp | 11591 » push» rbp |
663 @@ -560,7 +560,7 @@ | 11592 +» mov» rax,rsp |
664 %define output_buf(b)» » (b)+20» » ; JSAMPARRAY output_buf | 11593 » mov» rbp,rsp |
| 11594 +» collect_args |
| 11595 » push» rbx |
| 11596 -» collect_args |
| 11597 |
| 11598 » pxor» xmm6,xmm6» » ; xmm6=(all 0's) |
| 11599 » pcmpeqw»xmm7,xmm7 |
| 11600 @@ -84,8 +85,8 @@ |
| 11601 » dec» rcx |
| 11602 » jnz» short .convloop |
| 11603 |
| 11604 +» pop» rbx |
| 11605 » uncollect_args |
| 11606 -» pop» rbx |
| 11607 » pop» rbp |
| 11608 » ret |
| 11609 |
| 11610 @@ -111,10 +112,11 @@ |
| 11611 ; r12 = DCTELEM * workspace |
665 | 11612 |
666 align 16 | 11613 align 16 |
667 -» global» EXTN(jsimd_h2v2_merged_upsample_sse2) | 11614 -» global» EXTN(jsimd_quantize_sse2) |
668 +» global» EXTN(jsimd_h2v2_merged_upsample_sse2) PRIVATE | 11615 +» global» EXTN(jsimd_quantize_sse2) PRIVATE |
669 | 11616 |
670 EXTN(jsimd_h2v2_merged_upsample_sse2): | 11617 EXTN(jsimd_quantize_sse2): |
671 » push» ebp | |
672 Index: simd/jfmmxint.asm | |
673 =================================================================== | |
674 --- simd/jfmmxint.asm» (revision 829) | |
675 +++ simd/jfmmxint.asm» (working copy) | |
676 @@ -66,7 +66,7 @@ | |
677 » SECTION»SEG_CONST | |
678 | |
679 » alignz» 16 | |
680 -» global» EXTN(jconst_fdct_islow_mmx) | |
681 +» global» EXTN(jconst_fdct_islow_mmx) PRIVATE | |
682 | |
683 EXTN(jconst_fdct_islow_mmx): | |
684 | |
685 @@ -101,7 +101,7 @@ | |
686 %define WK_NUM»» 2 | |
687 | |
688 » align» 16 | |
689 -» global» EXTN(jsimd_fdct_islow_mmx) | |
690 +» global» EXTN(jsimd_fdct_islow_mmx) PRIVATE | |
691 | |
692 EXTN(jsimd_fdct_islow_mmx): | |
693 » push» ebp | |
694 Index: simd/jcgryss2-64.asm | |
695 =================================================================== | |
696 --- simd/jcgryss2-64.asm» (revision 829) | |
697 +++ simd/jcgryss2-64.asm» (working copy) | |
698 @@ -37,7 +37,7 @@ | |
699 | |
700 » align» 16 | |
701 | |
702 -» global» EXTN(jsimd_rgb_gray_convert_sse2) | |
703 +» global» EXTN(jsimd_rgb_gray_convert_sse2) PRIVATE | |
704 | |
705 EXTN(jsimd_rgb_gray_convert_sse2): | |
706 push rbp | 11618 push rbp |
| 11619 + mov rax,rsp |
| 11620 mov rbp,rsp |
| 11621 collect_args |
| 11622 |
| 11623 @@ -179,3 +181,7 @@ |
| 11624 uncollect_args |
| 11625 pop rbp |
| 11626 ret |
| 11627 + |
| 11628 +; For some reason, the OS X linker does not honor the request to align the |
| 11629 +; segment unless we do this. |
| 11630 + align 16 |
707 Index: simd/jcqnts2i.asm | 11631 Index: simd/jcqnts2i.asm |
708 =================================================================== | 11632 =================================================================== |
709 --- simd/jcqnts2i.asm (revision 829) | 11633 --- simd/jcqnts2i.asm (revision 829) |
710 +++ simd/jcqnts2i.asm (working copy) | 11634 +++ simd/jcqnts2i.asm (working copy) |
711 @@ -35,7 +35,7 @@ | 11635 @@ -35,7 +35,7 @@ |
712 %define workspace ebp+16 ; DCTELEM * workspace | 11636 %define workspace ebp+16 ; DCTELEM * workspace |
713 | 11637 |
714 align 16 | 11638 align 16 |
715 - global EXTN(jsimd_convsamp_sse2) | 11639 - global EXTN(jsimd_convsamp_sse2) |
716 + global EXTN(jsimd_convsamp_sse2) PRIVATE | 11640 + global EXTN(jsimd_convsamp_sse2) PRIVATE |
717 | 11641 |
718 EXTN(jsimd_convsamp_sse2): | 11642 EXTN(jsimd_convsamp_sse2): |
719 push ebp | 11643 push ebp |
720 @@ -117,7 +117,7 @@ | 11644 @@ -117,7 +117,7 @@ |
721 %define workspace ebp+16 ; DCTELEM * workspace | 11645 %define workspace ebp+16 ; DCTELEM * workspace |
722 | 11646 |
723 align 16 | 11647 align 16 |
724 - global EXTN(jsimd_quantize_sse2) | 11648 - global EXTN(jsimd_quantize_sse2) |
725 + global EXTN(jsimd_quantize_sse2) PRIVATE | 11649 + global EXTN(jsimd_quantize_sse2) PRIVATE |
726 | 11650 |
727 EXTN(jsimd_quantize_sse2): | 11651 EXTN(jsimd_quantize_sse2): |
728 push ebp | 11652 push ebp |
729 Index: simd/jiss2fst-64.asm | 11653 @@ -195,3 +195,6 @@ |
| 11654 » pop» ebp |
| 11655 » ret |
| 11656 |
| 11657 +; For some reason, the OS X linker does not honor the request to align the |
| 11658 +; segment unless we do this. |
| 11659 +» align» 16 |
| 11660 Index: simd/jcqntsse.asm |
730 =================================================================== | 11661 =================================================================== |
731 --- simd/jiss2fst-64.asm» (revision 829) | 11662 --- simd/jcqntsse.asm» (revision 829) |
732 +++ simd/jiss2fst-64.asm» (working copy) | 11663 +++ simd/jcqntsse.asm» (working copy) |
733 @@ -60,7 +60,7 @@ | 11664 @@ -35,7 +35,7 @@ |
734 %define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS) | 11665 %define workspace» ebp+16» » ; FAST_FLOAT * workspace |
735 | |
736 » alignz» 16 | |
737 -» global» EXTN(jconst_idct_ifast_sse2) | |
738 +» global» EXTN(jconst_idct_ifast_sse2) PRIVATE | |
739 | |
740 EXTN(jconst_idct_ifast_sse2): | |
741 | |
742 @@ -93,7 +93,7 @@ | |
743 %define WK_NUM»» 2 | |
744 | 11666 |
745 align 16 | 11667 align 16 |
746 -» global» EXTN(jsimd_idct_ifast_sse2) | 11668 -» global» EXTN(jsimd_convsamp_float_sse) |
747 +» global» EXTN(jsimd_idct_ifast_sse2) PRIVATE | 11669 +» global» EXTN(jsimd_convsamp_float_sse) PRIVATE |
748 | 11670 |
749 EXTN(jsimd_idct_ifast_sse2): | 11671 EXTN(jsimd_convsamp_float_sse): |
750 » push» rbp | 11672 » push» ebp |
751 Index: simd/jiss2flt.asm | 11673 @@ -138,7 +138,7 @@ |
752 =================================================================== | 11674 %define workspace» ebp+16» » ; FAST_FLOAT * workspace |
753 --- simd/jiss2flt.asm» (revision 829) | |
754 +++ simd/jiss2flt.asm» (working copy) | |
755 @@ -37,7 +37,7 @@ | |
756 » SECTION»SEG_CONST | |
757 | |
758 » alignz» 16 | |
759 -» global» EXTN(jconst_idct_float_sse2) | |
760 +» global» EXTN(jconst_idct_float_sse2) PRIVATE | |
761 | |
762 EXTN(jconst_idct_float_sse2): | |
763 | |
764 @@ -73,7 +73,7 @@ | |
765 » » » » » ; FAST_FLOAT workspace[DCTSIZE2] | |
766 | 11675 |
767 align 16 | 11676 align 16 |
768 -» global» EXTN(jsimd_idct_float_sse2) | 11677 -» global» EXTN(jsimd_quantize_float_sse) |
769 +» global» EXTN(jsimd_idct_float_sse2) PRIVATE | 11678 +» global» EXTN(jsimd_quantize_float_sse) PRIVATE |
770 | 11679 |
771 EXTN(jsimd_idct_float_sse2): | 11680 EXTN(jsimd_quantize_float_sse): |
772 push ebp | 11681 push ebp |
773 Index: simd/jiss2int.asm | 11682 @@ -206,3 +206,6 @@ |
| 11683 » pop» ebp |
| 11684 » ret |
| 11685 |
| 11686 +; For some reason, the OS X linker does not honor the request to align the |
| 11687 +; segment unless we do this. |
| 11688 +» align» 16 |
| 11689 Index: simd/jcsammmx.asm |
774 =================================================================== | 11690 =================================================================== |
775 --- simd/jiss2int.asm» (revision 829) | 11691 --- simd/jcsammmx.asm» (revision 829) |
776 +++ simd/jiss2int.asm» (working copy) | 11692 +++ simd/jcsammmx.asm» (working copy) |
777 @@ -66,7 +66,7 @@ | 11693 @@ -40,7 +40,7 @@ |
778 » SECTION»SEG_CONST | 11694 %define output_data(b)»(b)+28» ; JSAMPARRAY output_data |
779 | |
780 » alignz» 16 | |
781 -» global» EXTN(jconst_idct_islow_sse2) | |
782 +» global» EXTN(jconst_idct_islow_sse2) PRIVATE | |
783 | |
784 EXTN(jconst_idct_islow_sse2): | |
785 | |
786 @@ -105,7 +105,7 @@ | |
787 %define WK_NUM»» 12 | |
788 | 11695 |
789 align 16 | 11696 align 16 |
790 -» global» EXTN(jsimd_idct_islow_sse2) | 11697 -» global» EXTN(jsimd_h2v1_downsample_mmx) |
791 +» global» EXTN(jsimd_idct_islow_sse2) PRIVATE | 11698 +» global» EXTN(jsimd_h2v1_downsample_mmx) PRIVATE |
792 | 11699 |
793 EXTN(jsimd_idct_islow_sse2): | 11700 EXTN(jsimd_h2v1_downsample_mmx): |
794 push ebp | 11701 push ebp |
795 Index: simd/jfsseflt-64.asm | 11702 @@ -95,7 +95,7 @@ |
796 =================================================================== | |
797 --- simd/jfsseflt-64.asm» (revision 829) | |
798 +++ simd/jfsseflt-64.asm» (working copy) | |
799 @@ -38,7 +38,7 @@ | |
800 » SECTION»SEG_CONST | |
801 | 11703 |
802 » alignz» 16 | 11704 » mov» eax, JDIMENSION [v_samp(ebp)]» ; rowctr |
803 -» global» EXTN(jconst_fdct_float_sse) | 11705 » test» eax,eax |
804 +» global» EXTN(jconst_fdct_float_sse) PRIVATE | 11706 -» jle» short .return |
| 11707 +» jle» near .return |
805 | 11708 |
806 EXTN(jconst_fdct_float_sse): | 11709 » mov edx, 0x00010000» ; bias pattern |
807 | 11710 » movd mm7,edx |
808 @@ -65,7 +65,7 @@ | 11711 @@ -182,7 +182,7 @@ |
809 %define WK_NUM»» 2 | 11712 %define output_data(b)»(b)+28» ; JSAMPARRAY output_data |
810 | 11713 |
811 align 16 | 11714 align 16 |
812 -» global» EXTN(jsimd_fdct_float_sse) | 11715 -» global» EXTN(jsimd_h2v2_downsample_mmx) |
813 +» global» EXTN(jsimd_fdct_float_sse) PRIVATE | 11716 +» global» EXTN(jsimd_h2v2_downsample_mmx) PRIVATE |
814 | 11717 |
815 EXTN(jsimd_fdct_float_sse): | 11718 EXTN(jsimd_h2v2_downsample_mmx): |
816 » push» rbp | 11719 » push» ebp |
817 Index: simd/jccolss2-64.asm | 11720 @@ -319,3 +319,6 @@ |
818 =================================================================== | 11721 » pop» ebp |
819 --- simd/jccolss2-64.asm» (revision 829) | 11722 » ret |
820 +++ simd/jccolss2-64.asm» (working copy) | |
821 @@ -34,7 +34,7 @@ | |
822 » SECTION»SEG_CONST | |
823 | 11723 |
824 » alignz» 16 | 11724 +; For some reason, the OS X linker does not honor the request to align the |
825 -» global» EXTN(jconst_rgb_ycc_convert_sse2) | 11725 +; segment unless we do this. |
826 +» global» EXTN(jconst_rgb_ycc_convert_sse2) PRIVATE | 11726 +» align» 16 |
827 | |
828 EXTN(jconst_rgb_ycc_convert_sse2): | |
829 | |
830 Index: simd/jcsamss2-64.asm | 11727 Index: simd/jcsamss2-64.asm |
831 =================================================================== | 11728 =================================================================== |
832 --- simd/jcsamss2-64.asm (revision 829) | 11729 --- simd/jcsamss2-64.asm (revision 829) |
833 +++ simd/jcsamss2-64.asm (working copy) | 11730 +++ simd/jcsamss2-64.asm (working copy) |
834 @@ -41,7 +41,7 @@ | 11731 @@ -1,5 +1,5 @@ |
| 11732 ; |
| 11733 -; jcsamss2.asm - downsampling (64-bit SSE2) |
| 11734 +; jcsamss2-64.asm - downsampling (64-bit SSE2) |
| 11735 ; |
| 11736 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB |
| 11737 ; Copyright 2009 D. R. Commander |
| 11738 @@ -41,10 +41,11 @@ |
835 ; r15 = JSAMPARRAY output_data | 11739 ; r15 = JSAMPARRAY output_data |
836 | 11740 |
837 align 16 | 11741 align 16 |
838 - global EXTN(jsimd_h2v1_downsample_sse2) | 11742 - global EXTN(jsimd_h2v1_downsample_sse2) |
839 + global EXTN(jsimd_h2v1_downsample_sse2) PRIVATE | 11743 + global EXTN(jsimd_h2v1_downsample_sse2) PRIVATE |
840 | 11744 |
841 EXTN(jsimd_h2v1_downsample_sse2): | 11745 EXTN(jsimd_h2v1_downsample_sse2): |
842 push rbp | 11746 push rbp |
843 @@ -185,7 +185,7 @@ | 11747 +» mov» rax,rsp |
| 11748 » mov» rbp,rsp |
| 11749 » collect_args |
| 11750 |
| 11751 @@ -184,10 +185,11 @@ |
844 ; r15 = JSAMPARRAY output_data | 11752 ; r15 = JSAMPARRAY output_data |
845 | 11753 |
846 align 16 | 11754 align 16 |
847 - global EXTN(jsimd_h2v2_downsample_sse2) | 11755 - global EXTN(jsimd_h2v2_downsample_sse2) |
848 + global EXTN(jsimd_h2v2_downsample_sse2) PRIVATE | 11756 + global EXTN(jsimd_h2v2_downsample_sse2) PRIVATE |
849 | 11757 |
850 EXTN(jsimd_h2v2_downsample_sse2): | 11758 EXTN(jsimd_h2v2_downsample_sse2): |
851 push rbp | 11759 push rbp |
| 11760 + mov rax,rsp |
| 11761 mov rbp,rsp |
| 11762 collect_args |
| 11763 |
| 11764 @@ -322,3 +324,7 @@ |
| 11765 uncollect_args |
| 11766 pop rbp |
| 11767 ret |
| 11768 + |
| 11769 +; For some reason, the OS X linker does not honor the request to align the |
| 11770 +; segment unless we do this. |
| 11771 + align 16 |
| 11772 Index: simd/jcsamss2.asm |
| 11773 =================================================================== |
| 11774 --- simd/jcsamss2.asm (revision 829) |
| 11775 +++ simd/jcsamss2.asm (working copy) |
| 11776 @@ -40,7 +40,7 @@ |
| 11777 %define output_data(b) (b)+28 ; JSAMPARRAY output_data |
| 11778 |
| 11779 align 16 |
| 11780 - global EXTN(jsimd_h2v1_downsample_sse2) |
| 11781 + global EXTN(jsimd_h2v1_downsample_sse2) PRIVATE |
| 11782 |
| 11783 EXTN(jsimd_h2v1_downsample_sse2): |
| 11784 push ebp |
| 11785 @@ -195,7 +195,7 @@ |
| 11786 %define output_data(b) (b)+28 ; JSAMPARRAY output_data |
| 11787 |
| 11788 align 16 |
| 11789 - global EXTN(jsimd_h2v2_downsample_sse2) |
| 11790 + global EXTN(jsimd_h2v2_downsample_sse2) PRIVATE |
| 11791 |
| 11792 EXTN(jsimd_h2v2_downsample_sse2): |
| 11793 push ebp |
| 11794 @@ -346,3 +346,6 @@ |
| 11795 pop ebp |
| 11796 ret |
| 11797 |
| 11798 +; For some reason, the OS X linker does not honor the request to align the |
| 11799 +; segment unless we do this. |
| 11800 + align 16 |
| 11801 Index: simd/jdclrmmx.asm |
| 11802 =================================================================== |
| 11803 --- simd/jdclrmmx.asm (revision 829) |
| 11804 +++ simd/jdclrmmx.asm (working copy) |
| 11805 @@ -19,8 +19,6 @@ |
| 11806 %include "jcolsamp.inc" |
| 11807 |
| 11808 ; -------------------------------------------------------------------------- |
| 11809 - SECTION SEG_TEXT |
| 11810 - BITS 32 |
| 11811 ; |
| 11812 ; Convert some rows of samples to the output colorspace. |
| 11813 ; |
| 11814 @@ -42,7 +40,7 @@ |
| 11815 %define gotptr wk(0)-SIZEOF_POINTER ; void * gotptr |
| 11816 |
| 11817 align 16 |
| 11818 - global EXTN(jsimd_ycc_rgb_convert_mmx) |
| 11819 + global EXTN(jsimd_ycc_rgb_convert_mmx) PRIVATE |
| 11820 |
| 11821 EXTN(jsimd_ycc_rgb_convert_mmx): |
| 11822 push ebp |
| 11823 @@ -402,3 +400,6 @@ |
| 11824 pop ebp |
| 11825 ret |
| 11826 |
| 11827 +; For some reason, the OS X linker does not honor the request to align the |
| 11828 +; segment unless we do this. |
| 11829 + align 16 |
852 Index: simd/jdclrss2-64.asm | 11830 Index: simd/jdclrss2-64.asm |
853 =================================================================== | 11831 =================================================================== |
854 --- simd/jdclrss2-64.asm (revision 829) | 11832 --- simd/jdclrss2-64.asm (revision 829) |
855 +++ simd/jdclrss2-64.asm (working copy) | 11833 +++ simd/jdclrss2-64.asm (working copy) |
856 @@ -39,7 +39,7 @@ | 11834 @@ -1,8 +1,8 @@ |
| 11835 ; |
| 11836 -; jdclrss2.asm - colorspace conversion (64-bit SSE2) |
| 11837 +; jdclrss2-64.asm - colorspace conversion (64-bit SSE2) |
| 11838 ; |
| 11839 -; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB |
| 11840 -; Copyright 2009 D. R. Commander |
| 11841 +; Copyright 2009, 2012 Pierre Ossman <ossman@cendio.se> for Cendio AB |
| 11842 +; Copyright 2009, 2012 D. R. Commander |
| 11843 ; |
| 11844 ; Based on |
| 11845 ; x86 SIMD extension for IJG JPEG library |
| 11846 @@ -20,8 +20,6 @@ |
| 11847 %include "jcolsamp.inc" |
| 11848 » » » » |
| 11849 ; -------------------------------------------------------------------------- |
| 11850 -» SECTION»SEG_TEXT |
| 11851 -» BITS» 64 |
| 11852 ; |
| 11853 ; Convert some rows of samples to the output colorspace. |
| 11854 ; |
| 11855 @@ -41,7 +39,7 @@ |
857 %define WK_NUM 2 | 11856 %define WK_NUM 2 |
858 | 11857 |
859 align 16 | 11858 align 16 |
860 - global EXTN(jsimd_ycc_rgb_convert_sse2) | 11859 - global EXTN(jsimd_ycc_rgb_convert_sse2) |
861 + global EXTN(jsimd_ycc_rgb_convert_sse2) PRIVATE | 11860 + global EXTN(jsimd_ycc_rgb_convert_sse2) PRIVATE |
862 | 11861 |
863 EXTN(jsimd_ycc_rgb_convert_sse2): | 11862 EXTN(jsimd_ycc_rgb_convert_sse2): |
864 push rbp | 11863 push rbp |
| 11864 @@ -51,8 +49,8 @@ |
| 11865 mov [rsp],rax |
| 11866 mov rbp,rsp ; rbp = aligned rbp |
| 11867 lea rsp, [wk(0)] |
| 11868 + collect_args |
| 11869 push rbx |
| 11870 - collect_args |
| 11871 |
| 11872 mov rcx, r10 ; num_cols |
| 11873 test rcx,rcx |
| 11874 @@ -72,7 +70,7 @@ |
| 11875 pop rcx |
| 11876 |
| 11877 mov rdi, r13 |
| 11878 - mov rax, r14 |
| 11879 + mov eax, r14d |
| 11880 test rax,rax |
| 11881 jle near .return |
| 11882 .rowloop: |
| 11883 @@ -253,17 +251,13 @@ |
| 11884 movntdq XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA |
| 11885 movntdq XMMWORD [rdi+1*SIZEOF_XMMWORD], xmmD |
| 11886 movntdq XMMWORD [rdi+2*SIZEOF_XMMWORD], xmmF |
| 11887 - add rdi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr |
| 11888 jmp short .out0 |
| 11889 .out1: ; --(unaligned)----------------- |
| 11890 - pcmpeqb xmmH,xmmH ; xmmH=(all 1's) |
| 11891 - maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [rdi], xmmA |
| 11892 - add rdi, byte SIZEOF_XMMWORD ; outptr |
| 11893 - maskmovdqu xmmD,xmmH ; movntdqu XMMWORD [rdi], xmmD |
| 11894 - add rdi, byte SIZEOF_XMMWORD ; outptr |
| 11895 - maskmovdqu xmmF,xmmH ; movntdqu XMMWORD [rdi], xmmF |
| 11896 - add rdi, byte SIZEOF_XMMWORD ; outptr |
| 11897 + movdqu XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA |
| 11898 + movdqu XMMWORD [rdi+1*SIZEOF_XMMWORD], xmmD |
| 11899 + movdqu XMMWORD [rdi+2*SIZEOF_XMMWORD], xmmF |
| 11900 .out0: |
| 11901 + add rdi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr |
| 11902 sub rcx, byte SIZEOF_XMMWORD |
| 11903 jz near .nextrow |
| 11904 |
| 11905 @@ -273,14 +267,12 @@ |
| 11906 jmp near .columnloop |
| 11907 |
| 11908 .column_st32: |
| 11909 - pcmpeqb xmmH,xmmH ; xmmH=(all 1's) |
| 11910 lea rcx, [rcx+rcx*2] ; imul ecx, RGB_PIXELSIZE |
| 11911 cmp rcx, byte 2*SIZEOF_XMMWORD |
| 11912 jb short .column_st16 |
| 11913 - maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [rdi], xmmA |
| 11914 - add rdi, byte SIZEOF_XMMWORD ; outptr |
| 11915 - maskmovdqu xmmD,xmmH ; movntdqu XMMWORD [rdi], xmmD |
| 11916 - add rdi, byte SIZEOF_XMMWORD ; outptr |
| 11917 + movdqu XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA |
| 11918 + movdqu XMMWORD [rdi+1*SIZEOF_XMMWORD], xmmD |
| 11919 + add rdi, byte 2*SIZEOF_XMMWORD ; outptr |
| 11920 movdqa xmmA,xmmF |
| 11921 sub rcx, byte 2*SIZEOF_XMMWORD |
| 11922 jmp short .column_st15 |
| 11923 @@ -287,50 +279,44 @@ |
| 11924 .column_st16: |
| 11925 cmp rcx, byte SIZEOF_XMMWORD |
| 11926 jb short .column_st15 |
| 11927 - maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [rdi], xmmA |
| 11928 + movdqu XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA |
| 11929 add rdi, byte SIZEOF_XMMWORD ; outptr |
| 11930 movdqa xmmA,xmmD |
| 11931 sub rcx, byte SIZEOF_XMMWORD |
| 11932 .column_st15: |
| 11933 - mov rax,rcx |
| 11934 - xor rcx, byte 0x0F |
| 11935 - shl rcx, 2 |
| 11936 - movd xmmB,ecx |
| 11937 - psrlq xmmH,4 |
| 11938 - pcmpeqb xmmE,xmmE |
| 11939 - psrlq xmmH,xmmB |
| 11940 - psrlq xmmE,xmmB |
| 11941 - punpcklbw xmmE,xmmH |
| 11942 - ; ---------------- |
| 11943 - mov rcx,rdi |
| 11944 - and rcx, byte SIZEOF_XMMWORD-1 |
| 11945 - jz short .adj0 |
| 11946 - add rax,rcx |
| 11947 - cmp rax, byte SIZEOF_XMMWORD |
| 11948 - ja short .adj0 |
| 11949 - and rdi, byte (-SIZEOF_XMMWORD) ; align to 16-byte boundary |
| 11950 - shl rcx, 3 ; pslldq xmmA,ecx & pslldq xmmE,rcx |
| 11951 - movdqa xmmG,xmmA |
| 11952 - movdqa xmmC,xmmE |
| 11953 - pslldq xmmA, SIZEOF_XMMWORD/2 |
| 11954 - pslldq xmmE, SIZEOF_XMMWORD/2 |
| 11955 - movd xmmD,ecx |
| 11956 - sub rcx, byte (SIZEOF_XMMWORD/2)*BYTE_BIT |
| 11957 - jb short .adj1 |
| 11958 - movd xmmF,ecx |
| 11959 - psllq xmmA,xmmF |
| 11960 - psllq xmmE,xmmF |
| 11961 - jmp short .adj0 |
| 11962 -.adj1: neg ecx |
| 11963 - movd xmmF,ecx |
| 11964 - psrlq xmmA,xmmF |
| 11965 - psrlq xmmE,xmmF |
| 11966 - psllq xmmG,xmmD |
| 11967 - psllq xmmC,xmmD |
| 11968 - por xmmA,xmmG |
| 11969 - por xmmE,xmmC |
| 11970 -.adj0: ; ---------------- |
| 11971 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [rdi], xmmA |
| 11972 + ; Store the lower 8 bytes of xmmA to the output when it has enough |
| 11973 + ; space. |
| 11974 + cmp rcx, byte SIZEOF_MMWORD |
| 11975 + jb short .column_st7 |
| 11976 + movq XMM_MMWORD [rdi], xmmA |
| 11977 + add rdi, byte SIZEOF_MMWORD |
| 11978 + sub rcx, byte SIZEOF_MMWORD |
| 11979 + psrldq xmmA, SIZEOF_MMWORD |
| 11980 +.column_st7: |
| 11981 + ; Store the lower 4 bytes of xmmA to the output when it has enough |
| 11982 + ; space. |
| 11983 + cmp rcx, byte SIZEOF_DWORD |
| 11984 + jb short .column_st3 |
| 11985 + movd XMM_DWORD [rdi], xmmA |
| 11986 + add rdi, byte SIZEOF_DWORD |
| 11987 + sub rcx, byte SIZEOF_DWORD |
| 11988 + psrldq xmmA, SIZEOF_DWORD |
| 11989 +.column_st3: |
| 11990 + ; Store the lower 2 bytes of rax to the output when it has enough |
| 11991 + ; space. |
| 11992 + movd eax, xmmA |
| 11993 + cmp rcx, byte SIZEOF_WORD |
| 11994 + jb short .column_st1 |
| 11995 + mov WORD [rdi], ax |
| 11996 + add rdi, byte SIZEOF_WORD |
| 11997 + sub rcx, byte SIZEOF_WORD |
| 11998 + shr rax, 16 |
| 11999 +.column_st1: |
| 12000 + ; Store the lower 1 byte of rax to the output when it has enough |
| 12001 + ; space. |
| 12002 + test rcx, rcx |
| 12003 + jz short .nextrow |
| 12004 + mov BYTE [rdi], al |
| 12005 |
| 12006 %else ; RGB_PIXELSIZE == 4 ; ----------- |
| 12007 |
| 12008 @@ -375,19 +361,14 @@ |
| 12009 movntdq XMMWORD [rdi+1*SIZEOF_XMMWORD], xmmD |
| 12010 movntdq XMMWORD [rdi+2*SIZEOF_XMMWORD], xmmC |
| 12011 movntdq XMMWORD [rdi+3*SIZEOF_XMMWORD], xmmH |
| 12012 - add rdi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr |
| 12013 jmp short .out0 |
| 12014 .out1: ; --(unaligned)----------------- |
| 12015 - pcmpeqb xmmE,xmmE ; xmmE=(all 1's) |
| 12016 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [rdi], xmmA |
| 12017 - add rdi, byte SIZEOF_XMMWORD ; outptr |
| 12018 - maskmovdqu xmmD,xmmE ; movntdqu XMMWORD [rdi], xmmD |
| 12019 - add rdi, byte SIZEOF_XMMWORD ; outptr |
| 12020 - maskmovdqu xmmC,xmmE ; movntdqu XMMWORD [rdi], xmmC |
| 12021 - add rdi, byte SIZEOF_XMMWORD ; outptr |
| 12022 - maskmovdqu xmmH,xmmE ; movntdqu XMMWORD [rdi], xmmH |
| 12023 - add rdi, byte SIZEOF_XMMWORD ; outptr |
| 12024 + movdqu XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA |
| 12025 + movdqu XMMWORD [rdi+1*SIZEOF_XMMWORD], xmmD |
| 12026 + movdqu XMMWORD [rdi+2*SIZEOF_XMMWORD], xmmC |
| 12027 + movdqu XMMWORD [rdi+3*SIZEOF_XMMWORD], xmmH |
| 12028 .out0: |
| 12029 + add rdi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr |
| 12030 sub rcx, byte SIZEOF_XMMWORD |
| 12031 jz near .nextrow |
| 12032 |
| 12033 @@ -397,13 +378,11 @@ |
| 12034 jmp near .columnloop |
| 12035 |
| 12036 .column_st32: |
| 12037 - pcmpeqb xmmE,xmmE ; xmmE=(all 1's) |
| 12038 cmp rcx, byte SIZEOF_XMMWORD/2 |
| 12039 jb short .column_st16 |
| 12040 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [rdi], xmmA |
| 12041 - add rdi, byte SIZEOF_XMMWORD ; outptr |
| 12042 - maskmovdqu xmmD,xmmE ; movntdqu XMMWORD [rdi], xmmD |
| 12043 - add rdi, byte SIZEOF_XMMWORD ; outptr |
| 12044 + movdqu XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA |
| 12045 + movdqu XMMWORD [rdi+1*SIZEOF_XMMWORD], xmmD |
| 12046 + add rdi, byte 2*SIZEOF_XMMWORD ; outptr |
| 12047 movdqa xmmA,xmmC |
| 12048 movdqa xmmD,xmmH |
| 12049 sub rcx, byte SIZEOF_XMMWORD/2 |
| 12050 @@ -410,50 +389,25 @@ |
| 12051 .column_st16: |
| 12052 cmp rcx, byte SIZEOF_XMMWORD/4 |
| 12053 jb short .column_st15 |
| 12054 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [rdi], xmmA |
| 12055 + movdqu XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA |
| 12056 add rdi, byte SIZEOF_XMMWORD ; outptr |
| 12057 movdqa xmmA,xmmD |
| 12058 sub rcx, byte SIZEOF_XMMWORD/4 |
| 12059 .column_st15: |
| 12060 - cmp rcx, byte SIZEOF_XMMWORD/16 |
| 12061 - jb near .nextrow |
| 12062 - mov rax,rcx |
| 12063 - xor rcx, byte 0x03 |
| 12064 - inc rcx |
| 12065 - shl rcx, 4 |
| 12066 - movd xmmF,ecx |
| 12067 - psrlq xmmE,xmmF |
| 12068 - punpcklbw xmmE,xmmE |
| 12069 - ; ---------------- |
| 12070 - mov rcx,rdi |
| 12071 - and rcx, byte SIZEOF_XMMWORD-1 |
| 12072 - jz short .adj0 |
| 12073 - lea rax, [rcx+rax*4] ; RGB_PIXELSIZE |
| 12074 - cmp rax, byte SIZEOF_XMMWORD |
| 12075 - ja short .adj0 |
| 12076 - and rdi, byte (-SIZEOF_XMMWORD) ; align to 16-byte boundary |
| 12077 - shl rcx, 3 ; pslldq xmmA,ecx & pslldq xmmE,ecx |
| 12078 - movdqa xmmB,xmmA |
| 12079 - movdqa xmmG,xmmE |
| 12080 - pslldq xmmA, SIZEOF_XMMWORD/2 |
| 12081 - pslldq xmmE, SIZEOF_XMMWORD/2 |
| 12082 - movd xmmC,ecx |
| 12083 - sub rcx, byte (SIZEOF_XMMWORD/2)*BYTE_BIT |
| 12084 - jb short .adj1 |
| 12085 - movd xmmH,ecx |
| 12086 - psllq xmmA,xmmH |
| 12087 - psllq xmmE,xmmH |
| 12088 - jmp short .adj0 |
| 12089 -.adj1: neg rcx |
| 12090 - movd xmmH,ecx |
| 12091 - psrlq xmmA,xmmH |
| 12092 - psrlq xmmE,xmmH |
| 12093 - psllq xmmB,xmmC |
| 12094 - psllq xmmG,xmmC |
| 12095 - por xmmA,xmmB |
| 12096 - por xmmE,xmmG |
| 12097 -.adj0: ; ---------------- |
| 12098 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [rdi], xmmA |
| 12099 + ; Store two pixels (8 bytes) of xmmA to the output when it has enough |
| 12100 + ; space. |
| 12101 + cmp rcx, byte SIZEOF_XMMWORD/8 |
| 12102 + jb short .column_st7 |
| 12103 + movq MMWORD [rdi], xmmA |
| 12104 + add rdi, byte SIZEOF_XMMWORD/8*4 |
| 12105 + sub rcx, byte SIZEOF_XMMWORD/8 |
| 12106 + psrldq xmmA, SIZEOF_XMMWORD/8*4 |
| 12107 +.column_st7: |
| 12108 + ; Store one pixel (4 bytes) of xmmA to the output when it has enough |
| 12109 + ; space. |
| 12110 + test rcx, rcx |
| 12111 + jz short .nextrow |
| 12112 + movd XMM_DWORD [rdi], xmmA |
| 12113 |
| 12114 %endif ; RGB_PIXELSIZE ; --------------- |
| 12115 |
| 12116 @@ -475,9 +429,13 @@ |
| 12117 sfence ; flush the write buffer |
| 12118 |
| 12119 .return: |
| 12120 + pop rbx |
| 12121 uncollect_args |
| 12122 - pop rbx |
| 12123 mov rsp,rbp ; rsp <- aligned rbp |
| 12124 pop rsp ; rsp <- original rbp |
| 12125 pop rbp |
| 12126 ret |
| 12127 + |
| 12128 +; For some reason, the OS X linker does not honor the request to align the |
| 12129 +; segment unless we do this. |
| 12130 + align 16 |
| 12131 Index: simd/jdclrss2.asm |
| 12132 =================================================================== |
| 12133 --- simd/jdclrss2.asm (revision 829) |
| 12134 +++ simd/jdclrss2.asm (working copy) |
| 12135 @@ -1,7 +1,8 @@ |
| 12136 ; |
| 12137 ; jdclrss2.asm - colorspace conversion (SSE2) |
| 12138 ; |
| 12139 -; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB |
| 12140 +; Copyright 2009, 2012 Pierre Ossman <ossman@cendio.se> for Cendio AB |
| 12141 +; Copyright 2012 D. R. Commander |
| 12142 ; |
| 12143 ; Based on |
| 12144 ; x86 SIMD extension for IJG JPEG library |
| 12145 @@ -19,8 +20,6 @@ |
| 12146 %include "jcolsamp.inc" |
| 12147 |
| 12148 ; -------------------------------------------------------------------------- |
| 12149 - SECTION SEG_TEXT |
| 12150 - BITS 32 |
| 12151 ; |
| 12152 ; Convert some rows of samples to the output colorspace. |
| 12153 ; |
| 12154 @@ -42,7 +41,7 @@ |
| 12155 %define gotptr wk(0)-SIZEOF_POINTER ; void * gotptr |
| 12156 |
| 12157 align 16 |
| 12158 - global EXTN(jsimd_ycc_rgb_convert_sse2) |
| 12159 + global EXTN(jsimd_ycc_rgb_convert_sse2) PRIVATE |
| 12160 |
| 12161 EXTN(jsimd_ycc_rgb_convert_sse2): |
| 12162 push ebp |
| 12163 @@ -264,17 +263,13 @@ |
| 12164 movntdq XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA |
| 12165 movntdq XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD |
| 12166 movntdq XMMWORD [edi+2*SIZEOF_XMMWORD], xmmF |
| 12167 - add edi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr |
| 12168 jmp short .out0 |
| 12169 .out1: ; --(unaligned)----------------- |
| 12170 - pcmpeqb xmmH,xmmH ; xmmH=(all 1's) |
| 12171 - maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [edi], xmmA |
| 12172 - add edi, byte SIZEOF_XMMWORD ; outptr |
| 12173 - maskmovdqu xmmD,xmmH ; movntdqu XMMWORD [edi], xmmD |
| 12174 - add edi, byte SIZEOF_XMMWORD ; outptr |
| 12175 - maskmovdqu xmmF,xmmH ; movntdqu XMMWORD [edi], xmmF |
| 12176 - add edi, byte SIZEOF_XMMWORD ; outptr |
| 12177 + movdqu XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA |
| 12178 + movdqu XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD |
| 12179 + movdqu XMMWORD [edi+2*SIZEOF_XMMWORD], xmmF |
| 12180 .out0: |
| 12181 + add edi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr |
| 12182 sub ecx, byte SIZEOF_XMMWORD |
| 12183 jz near .nextrow |
| 12184 |
| 12185 @@ -285,14 +280,12 @@ |
| 12186 alignx 16,7 |
| 12187 |
| 12188 .column_st32: |
| 12189 - pcmpeqb xmmH,xmmH ; xmmH=(all 1's) |
| 12190 lea ecx, [ecx+ecx*2] ; imul ecx, RGB_PIXELSIZE |
| 12191 cmp ecx, byte 2*SIZEOF_XMMWORD |
| 12192 jb short .column_st16 |
| 12193 - maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [edi], xmmA |
| 12194 - add edi, byte SIZEOF_XMMWORD ; outptr |
| 12195 - maskmovdqu xmmD,xmmH ; movntdqu XMMWORD [edi], xmmD |
| 12196 - add edi, byte SIZEOF_XMMWORD ; outptr |
| 12197 + movdqu XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA |
| 12198 + movdqu XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD |
| 12199 + add edi, byte 2*SIZEOF_XMMWORD ; outptr |
| 12200 movdqa xmmA,xmmF |
| 12201 sub ecx, byte 2*SIZEOF_XMMWORD |
| 12202 jmp short .column_st15 |
| 12203 @@ -299,50 +292,44 @@ |
| 12204 .column_st16: |
| 12205 cmp ecx, byte SIZEOF_XMMWORD |
| 12206 jb short .column_st15 |
| 12207 - maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [edi], xmmA |
| 12208 + movdqu XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA |
| 12209 add edi, byte SIZEOF_XMMWORD ; outptr |
| 12210 movdqa xmmA,xmmD |
| 12211 sub ecx, byte SIZEOF_XMMWORD |
| 12212 .column_st15: |
| 12213 - mov eax,ecx |
| 12214 - xor ecx, byte 0x0F |
| 12215 - shl ecx, 2 |
| 12216 - movd xmmB,ecx |
| 12217 - psrlq xmmH,4 |
| 12218 - pcmpeqb xmmE,xmmE |
| 12219 - psrlq xmmH,xmmB |
| 12220 - psrlq xmmE,xmmB |
| 12221 - punpcklbw xmmE,xmmH |
| 12222 - ; ---------------- |
| 12223 - mov ecx,edi |
| 12224 - and ecx, byte SIZEOF_XMMWORD-1 |
| 12225 - jz short .adj0 |
| 12226 - add eax,ecx |
| 12227 - cmp eax, byte SIZEOF_XMMWORD |
| 12228 - ja short .adj0 |
| 12229 - and edi, byte (-SIZEOF_XMMWORD) ; align to 16-byte boundary |
| 12230 - shl ecx, 3 ; pslldq xmmA,ecx & pslldq xmmE,ecx |
| 12231 - movdqa xmmG,xmmA |
| 12232 - movdqa xmmC,xmmE |
| 12233 - pslldq xmmA, SIZEOF_XMMWORD/2 |
| 12234 - pslldq xmmE, SIZEOF_XMMWORD/2 |
| 12235 - movd xmmD,ecx |
| 12236 - sub ecx, byte (SIZEOF_XMMWORD/2)*BYTE_BIT |
| 12237 - jb short .adj1 |
| 12238 - movd xmmF,ecx |
| 12239 - psllq xmmA,xmmF |
| 12240 - psllq xmmE,xmmF |
| 12241 - jmp short .adj0 |
| 12242 -.adj1: neg ecx |
| 12243 - movd xmmF,ecx |
| 12244 - psrlq xmmA,xmmF |
| 12245 - psrlq xmmE,xmmF |
| 12246 - psllq xmmG,xmmD |
| 12247 - psllq xmmC,xmmD |
| 12248 - por xmmA,xmmG |
| 12249 - por xmmE,xmmC |
| 12250 -.adj0: ; ---------------- |
| 12251 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA |
| 12252 + ; Store the lower 8 bytes of xmmA to the output when it has enough |
| 12253 + ; space. |
| 12254 + cmp ecx, byte SIZEOF_MMWORD |
| 12255 + jb short .column_st7 |
| 12256 + movq XMM_MMWORD [edi], xmmA |
| 12257 + add edi, byte SIZEOF_MMWORD |
| 12258 + sub ecx, byte SIZEOF_MMWORD |
| 12259 + psrldq xmmA, SIZEOF_MMWORD |
| 12260 +.column_st7: |
| 12261 + ; Store the lower 4 bytes of xmmA to the output when it has enough |
| 12262 + ; space. |
| 12263 + cmp ecx, byte SIZEOF_DWORD |
| 12264 + jb short .column_st3 |
| 12265 + movd XMM_DWORD [edi], xmmA |
| 12266 + add edi, byte SIZEOF_DWORD |
| 12267 + sub ecx, byte SIZEOF_DWORD |
| 12268 + psrldq xmmA, SIZEOF_DWORD |
| 12269 +.column_st3: |
| 12270 + ; Store the lower 2 bytes of eax to the output when it has enough |
| 12271 + ; space. |
| 12272 + movd eax, xmmA |
| 12273 + cmp ecx, byte SIZEOF_WORD |
| 12274 + jb short .column_st1 |
| 12275 + mov WORD [edi], ax |
| 12276 + add edi, byte SIZEOF_WORD |
| 12277 + sub ecx, byte SIZEOF_WORD |
| 12278 + shr eax, 16 |
| 12279 +.column_st1: |
| 12280 + ; Store the lower 1 byte of eax to the output when it has enough |
| 12281 + ; space. |
| 12282 + test ecx, ecx |
| 12283 + jz short .nextrow |
| 12284 + mov BYTE [edi], al |
| 12285 |
| 12286 %else ; RGB_PIXELSIZE == 4 ; ----------- |
| 12287 |
| 12288 @@ -387,19 +374,14 @@ |
| 12289 movntdq XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD |
| 12290 movntdq XMMWORD [edi+2*SIZEOF_XMMWORD], xmmC |
| 12291 movntdq XMMWORD [edi+3*SIZEOF_XMMWORD], xmmH |
| 12292 - add edi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr |
| 12293 jmp short .out0 |
| 12294 .out1: ; --(unaligned)----------------- |
| 12295 - pcmpeqb xmmE,xmmE ; xmmE=(all 1's) |
| 12296 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA |
| 12297 - add edi, byte SIZEOF_XMMWORD ; outptr |
| 12298 - maskmovdqu xmmD,xmmE ; movntdqu XMMWORD [edi], xmmD |
| 12299 - add edi, byte SIZEOF_XMMWORD ; outptr |
| 12300 - maskmovdqu xmmC,xmmE ; movntdqu XMMWORD [edi], xmmC |
| 12301 - add edi, byte SIZEOF_XMMWORD ; outptr |
| 12302 - maskmovdqu xmmH,xmmE ; movntdqu XMMWORD [edi], xmmH |
| 12303 - add edi, byte SIZEOF_XMMWORD ; outptr |
| 12304 + movdqu XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA |
| 12305 + movdqu XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD |
| 12306 + movdqu XMMWORD [edi+2*SIZEOF_XMMWORD], xmmC |
| 12307 + movdqu XMMWORD [edi+3*SIZEOF_XMMWORD], xmmH |
| 12308 .out0: |
| 12309 + add edi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr |
| 12310 sub ecx, byte SIZEOF_XMMWORD |
| 12311 jz near .nextrow |
| 12312 |
| 12313 @@ -410,13 +392,11 @@ |
| 12314 alignx 16,7 |
| 12315 |
| 12316 .column_st32: |
| 12317 - pcmpeqb xmmE,xmmE ; xmmE=(all 1's) |
| 12318 cmp ecx, byte SIZEOF_XMMWORD/2 |
| 12319 jb short .column_st16 |
| 12320 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA |
| 12321 - add edi, byte SIZEOF_XMMWORD ; outptr |
| 12322 - maskmovdqu xmmD,xmmE ; movntdqu XMMWORD [edi], xmmD |
| 12323 - add edi, byte SIZEOF_XMMWORD ; outptr |
| 12324 + movdqu XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA |
| 12325 + movdqu XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD |
| 12326 + add edi, byte 2*SIZEOF_XMMWORD ; outptr |
| 12327 movdqa xmmA,xmmC |
| 12328 movdqa xmmD,xmmH |
| 12329 sub ecx, byte SIZEOF_XMMWORD/2 |
| 12330 @@ -423,50 +403,25 @@ |
| 12331 .column_st16: |
| 12332 cmp ecx, byte SIZEOF_XMMWORD/4 |
| 12333 jb short .column_st15 |
| 12334 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA |
| 12335 + movdqu XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA |
| 12336 add edi, byte SIZEOF_XMMWORD ; outptr |
| 12337 movdqa xmmA,xmmD |
| 12338 sub ecx, byte SIZEOF_XMMWORD/4 |
| 12339 .column_st15: |
| 12340 - cmp ecx, byte SIZEOF_XMMWORD/16 |
| 12341 - jb short .nextrow |
| 12342 - mov eax,ecx |
| 12343 - xor ecx, byte 0x03 |
| 12344 - inc ecx |
| 12345 - shl ecx, 4 |
| 12346 - movd xmmF,ecx |
| 12347 - psrlq xmmE,xmmF |
| 12348 - punpcklbw xmmE,xmmE |
| 12349 - ; ---------------- |
| 12350 - mov ecx,edi |
| 12351 - and ecx, byte SIZEOF_XMMWORD-1 |
| 12352 - jz short .adj0 |
| 12353 - lea eax, [ecx+eax*4] ; RGB_PIXELSIZE |
| 12354 - cmp eax, byte SIZEOF_XMMWORD |
| 12355 - ja short .adj0 |
| 12356 - and edi, byte (-SIZEOF_XMMWORD) ; align to 16-byte boundary |
| 12357 - shl ecx, 3 ; pslldq xmmA,ecx & pslldq xmmE,ecx |
| 12358 - movdqa xmmB,xmmA |
| 12359 - movdqa xmmG,xmmE |
| 12360 - pslldq xmmA, SIZEOF_XMMWORD/2 |
| 12361 - pslldq xmmE, SIZEOF_XMMWORD/2 |
| 12362 - movd xmmC,ecx |
| 12363 - sub ecx, byte (SIZEOF_XMMWORD/2)*BYTE_BIT |
| 12364 - jb short .adj1 |
| 12365 - movd xmmH,ecx |
| 12366 - psllq xmmA,xmmH |
| 12367 - psllq xmmE,xmmH |
| 12368 - jmp short .adj0 |
| 12369 -.adj1: neg ecx |
| 12370 - movd xmmH,ecx |
| 12371 - psrlq xmmA,xmmH |
| 12372 - psrlq xmmE,xmmH |
| 12373 - psllq xmmB,xmmC |
| 12374 - psllq xmmG,xmmC |
| 12375 - por xmmA,xmmB |
| 12376 - por xmmE,xmmG |
| 12377 -.adj0: ; ---------------- |
| 12378 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA |
| 12379 + ; Store two pixels (8 bytes) of xmmA to the output when it has enough |
| 12380 + ; space. |
| 12381 + cmp ecx, byte SIZEOF_XMMWORD/8 |
| 12382 + jb short .column_st7 |
| 12383 + movq XMM_MMWORD [edi], xmmA |
| 12384 + add edi, byte SIZEOF_XMMWORD/8*4 |
| 12385 + sub ecx, byte SIZEOF_XMMWORD/8 |
| 12386 + psrldq xmmA, SIZEOF_XMMWORD/8*4 |
| 12387 +.column_st7: |
| 12388 + ; Store one pixel (4 bytes) of xmmA to the output when it has enough |
| 12389 + ; space. |
| 12390 + test ecx, ecx |
| 12391 + jz short .nextrow |
| 12392 + movd XMM_DWORD [edi], xmmA |
| 12393 |
| 12394 %endif ; RGB_PIXELSIZE ; --------------- |
| 12395 |
| 12396 @@ -500,3 +455,6 @@ |
| 12397 pop ebp |
| 12398 ret |
| 12399 |
| 12400 +; For some reason, the OS X linker does not honor the request to align the |
| 12401 +; segment unless we do this. |
| 12402 + align 16 |
865 Index: simd/jdcolmmx.asm | 12403 Index: simd/jdcolmmx.asm |
866 =================================================================== | 12404 =================================================================== |
867 --- simd/jdcolmmx.asm (revision 829) | 12405 --- simd/jdcolmmx.asm (revision 829) |
868 +++ simd/jdcolmmx.asm (working copy) | 12406 +++ simd/jdcolmmx.asm (working copy) |
869 @@ -35,7 +35,7 @@ | 12407 @@ -35,7 +35,7 @@ |
870 SECTION SEG_CONST | 12408 SECTION SEG_CONST |
871 | 12409 |
872 alignz 16 | 12410 alignz 16 |
873 - global EXTN(jconst_ycc_rgb_convert_mmx) | 12411 - global EXTN(jconst_ycc_rgb_convert_mmx) |
874 + global EXTN(jconst_ycc_rgb_convert_mmx) PRIVATE | 12412 + global EXTN(jconst_ycc_rgb_convert_mmx) PRIVATE |
875 | 12413 |
876 EXTN(jconst_ycc_rgb_convert_mmx): | 12414 EXTN(jconst_ycc_rgb_convert_mmx): |
877 | 12415 |
878 Index: simd/jcclrmmx.asm | 12416 @@ -48,6 +48,9 @@ |
| 12417 » alignz» 16 |
| 12418 |
| 12419 ; -------------------------------------------------------------------------- |
| 12420 +» SECTION»SEG_TEXT |
| 12421 +» BITS» 32 |
| 12422 + |
| 12423 %include "jdclrmmx.asm" |
| 12424 |
| 12425 %undef RGB_RED |
| 12426 @@ -54,10 +57,10 @@ |
| 12427 %undef RGB_GREEN |
| 12428 %undef RGB_BLUE |
| 12429 %undef RGB_PIXELSIZE |
| 12430 -%define RGB_RED 0 |
| 12431 -%define RGB_GREEN 1 |
| 12432 -%define RGB_BLUE 2 |
| 12433 -%define RGB_PIXELSIZE 3 |
| 12434 +%define RGB_RED EXT_RGB_RED |
| 12435 +%define RGB_GREEN EXT_RGB_GREEN |
| 12436 +%define RGB_BLUE EXT_RGB_BLUE |
| 12437 +%define RGB_PIXELSIZE EXT_RGB_PIXELSIZE |
| 12438 %define jsimd_ycc_rgb_convert_mmx jsimd_ycc_extrgb_convert_mmx |
| 12439 %include "jdclrmmx.asm" |
| 12440 |
| 12441 @@ -65,10 +68,10 @@ |
| 12442 %undef RGB_GREEN |
| 12443 %undef RGB_BLUE |
| 12444 %undef RGB_PIXELSIZE |
| 12445 -%define RGB_RED 0 |
| 12446 -%define RGB_GREEN 1 |
| 12447 -%define RGB_BLUE 2 |
| 12448 -%define RGB_PIXELSIZE 4 |
| 12449 +%define RGB_RED EXT_RGBX_RED |
| 12450 +%define RGB_GREEN EXT_RGBX_GREEN |
| 12451 +%define RGB_BLUE EXT_RGBX_BLUE |
| 12452 +%define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE |
| 12453 %define jsimd_ycc_rgb_convert_mmx jsimd_ycc_extrgbx_convert_mmx |
| 12454 %include "jdclrmmx.asm" |
| 12455 |
| 12456 @@ -76,10 +79,10 @@ |
| 12457 %undef RGB_GREEN |
| 12458 %undef RGB_BLUE |
| 12459 %undef RGB_PIXELSIZE |
| 12460 -%define RGB_RED 2 |
| 12461 -%define RGB_GREEN 1 |
| 12462 -%define RGB_BLUE 0 |
| 12463 -%define RGB_PIXELSIZE 3 |
| 12464 +%define RGB_RED EXT_BGR_RED |
| 12465 +%define RGB_GREEN EXT_BGR_GREEN |
| 12466 +%define RGB_BLUE EXT_BGR_BLUE |
| 12467 +%define RGB_PIXELSIZE EXT_BGR_PIXELSIZE |
| 12468 %define jsimd_ycc_rgb_convert_mmx jsimd_ycc_extbgr_convert_mmx |
| 12469 %include "jdclrmmx.asm" |
| 12470 |
| 12471 @@ -87,10 +90,10 @@ |
| 12472 %undef RGB_GREEN |
| 12473 %undef RGB_BLUE |
| 12474 %undef RGB_PIXELSIZE |
| 12475 -%define RGB_RED 2 |
| 12476 -%define RGB_GREEN 1 |
| 12477 -%define RGB_BLUE 0 |
| 12478 -%define RGB_PIXELSIZE 4 |
| 12479 +%define RGB_RED EXT_BGRX_RED |
| 12480 +%define RGB_GREEN EXT_BGRX_GREEN |
| 12481 +%define RGB_BLUE EXT_BGRX_BLUE |
| 12482 +%define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE |
| 12483 %define jsimd_ycc_rgb_convert_mmx jsimd_ycc_extbgrx_convert_mmx |
| 12484 %include "jdclrmmx.asm" |
| 12485 |
| 12486 @@ -98,10 +101,10 @@ |
| 12487 %undef RGB_GREEN |
| 12488 %undef RGB_BLUE |
| 12489 %undef RGB_PIXELSIZE |
| 12490 -%define RGB_RED 3 |
| 12491 -%define RGB_GREEN 2 |
| 12492 -%define RGB_BLUE 1 |
| 12493 -%define RGB_PIXELSIZE 4 |
| 12494 +%define RGB_RED EXT_XBGR_RED |
| 12495 +%define RGB_GREEN EXT_XBGR_GREEN |
| 12496 +%define RGB_BLUE EXT_XBGR_BLUE |
| 12497 +%define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE |
| 12498 %define jsimd_ycc_rgb_convert_mmx jsimd_ycc_extxbgr_convert_mmx |
| 12499 %include "jdclrmmx.asm" |
| 12500 |
| 12501 @@ -109,9 +112,9 @@ |
| 12502 %undef RGB_GREEN |
| 12503 %undef RGB_BLUE |
| 12504 %undef RGB_PIXELSIZE |
| 12505 -%define RGB_RED 1 |
| 12506 -%define RGB_GREEN 2 |
| 12507 -%define RGB_BLUE 3 |
| 12508 -%define RGB_PIXELSIZE 4 |
| 12509 +%define RGB_RED EXT_XRGB_RED |
| 12510 +%define RGB_GREEN EXT_XRGB_GREEN |
| 12511 +%define RGB_BLUE EXT_XRGB_BLUE |
| 12512 +%define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE |
| 12513 %define jsimd_ycc_rgb_convert_mmx jsimd_ycc_extxrgb_convert_mmx |
| 12514 %include "jdclrmmx.asm" |
| 12515 Index: simd/jdcolss2-64.asm |
879 =================================================================== | 12516 =================================================================== |
880 --- simd/jcclrmmx.asm» (revision 829) | 12517 --- simd/jdcolss2-64.asm» (revision 829) |
881 +++ simd/jcclrmmx.asm» (working copy) | 12518 +++ simd/jdcolss2-64.asm» (working copy) |
882 @@ -40,7 +40,7 @@ | 12519 @@ -1,5 +1,5 @@ |
883 %define gotptr»» wk(0)-SIZEOF_POINTER» ; void * gotptr | 12520 ; |
884 | 12521 -; jdcolss2.asm - colorspace conversion (64-bit SSE2) |
885 » align» 16 | 12522 +; jdcolss2-64.asm - colorspace conversion (64-bit SSE2) |
886 -» global» EXTN(jsimd_rgb_ycc_convert_mmx) | 12523 ; |
887 +» global» EXTN(jsimd_rgb_ycc_convert_mmx) PRIVATE | 12524 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB |
888 | 12525 ; Copyright 2009 D. R. Commander |
889 EXTN(jsimd_rgb_ycc_convert_mmx): | 12526 @@ -35,7 +35,7 @@ |
890 » push» ebp | |
891 Index: simd/jfsseflt.asm | |
892 =================================================================== | |
893 --- simd/jfsseflt.asm» (revision 829) | |
894 +++ simd/jfsseflt.asm» (working copy) | |
895 @@ -37,7 +37,7 @@ | |
896 SECTION SEG_CONST | 12527 SECTION SEG_CONST |
897 | 12528 |
898 alignz 16 | 12529 alignz 16 |
899 -» global» EXTN(jconst_fdct_float_sse) | 12530 -» global» EXTN(jconst_ycc_rgb_convert_sse2) |
900 +» global» EXTN(jconst_fdct_float_sse) PRIVATE | 12531 +» global» EXTN(jconst_ycc_rgb_convert_sse2) PRIVATE |
901 | 12532 |
902 EXTN(jconst_fdct_float_sse): | 12533 EXTN(jconst_ycc_rgb_convert_sse2): |
903 | 12534 |
904 @@ -65,7 +65,7 @@ | 12535 @@ -48,6 +48,9 @@ |
905 %define WK_NUM»» 2 | 12536 » alignz» 16 |
906 | 12537 |
907 » align» 16 | 12538 ; -------------------------------------------------------------------------- |
908 -» global» EXTN(jsimd_fdct_float_sse) | 12539 +» SECTION»SEG_TEXT |
909 +» global» EXTN(jsimd_fdct_float_sse) PRIVATE | 12540 +» BITS» 64 |
910 | 12541 + |
911 EXTN(jsimd_fdct_float_sse): | 12542 %include "jdclrss2-64.asm" |
912 » push» ebp | 12543 |
913 Index: simd/jdmrgss2-64.asm | 12544 %undef RGB_RED |
914 =================================================================== | 12545 @@ -54,10 +57,10 @@ |
915 --- simd/jdmrgss2-64.asm» (revision 829) | 12546 %undef RGB_GREEN |
916 +++ simd/jdmrgss2-64.asm» (working copy) | 12547 %undef RGB_BLUE |
917 @@ -39,7 +39,7 @@ | 12548 %undef RGB_PIXELSIZE |
918 %define WK_NUM»» 3 | 12549 -%define RGB_RED 0 |
919 | 12550 -%define RGB_GREEN 1 |
920 » align» 16 | 12551 -%define RGB_BLUE 2 |
921 -» global» EXTN(jsimd_h2v1_merged_upsample_sse2) | 12552 -%define RGB_PIXELSIZE 3 |
922 +» global» EXTN(jsimd_h2v1_merged_upsample_sse2) PRIVATE | 12553 +%define RGB_RED EXT_RGB_RED |
923 | 12554 +%define RGB_GREEN EXT_RGB_GREEN |
924 EXTN(jsimd_h2v1_merged_upsample_sse2): | 12555 +%define RGB_BLUE EXT_RGB_BLUE |
925 » push» rbp | 12556 +%define RGB_PIXELSIZE EXT_RGB_PIXELSIZE |
926 @@ -543,7 +543,7 @@ | 12557 %define jsimd_ycc_rgb_convert_sse2 jsimd_ycc_extrgb_convert_sse2 |
927 ; r13 = JSAMPARRAY output_buf | 12558 %include "jdclrss2-64.asm" |
928 | 12559 |
929 » align» 16 | 12560 @@ -65,10 +68,10 @@ |
930 -» global» EXTN(jsimd_h2v2_merged_upsample_sse2) | 12561 %undef RGB_GREEN |
931 +» global» EXTN(jsimd_h2v2_merged_upsample_sse2) PRIVATE | 12562 %undef RGB_BLUE |
932 | 12563 %undef RGB_PIXELSIZE |
933 EXTN(jsimd_h2v2_merged_upsample_sse2): | 12564 -%define RGB_RED 0 |
934 » push» rbp | 12565 -%define RGB_GREEN 1 |
| 12566 -%define RGB_BLUE 2 |
| 12567 -%define RGB_PIXELSIZE 4 |
| 12568 +%define RGB_RED EXT_RGBX_RED |
| 12569 +%define RGB_GREEN EXT_RGBX_GREEN |
| 12570 +%define RGB_BLUE EXT_RGBX_BLUE |
| 12571 +%define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE |
| 12572 %define jsimd_ycc_rgb_convert_sse2 jsimd_ycc_extrgbx_convert_sse2 |
| 12573 %include "jdclrss2-64.asm" |
| 12574 |
| 12575 @@ -76,10 +79,10 @@ |
| 12576 %undef RGB_GREEN |
| 12577 %undef RGB_BLUE |
| 12578 %undef RGB_PIXELSIZE |
| 12579 -%define RGB_RED 2 |
| 12580 -%define RGB_GREEN 1 |
| 12581 -%define RGB_BLUE 0 |
| 12582 -%define RGB_PIXELSIZE 3 |
| 12583 +%define RGB_RED EXT_BGR_RED |
| 12584 +%define RGB_GREEN EXT_BGR_GREEN |
| 12585 +%define RGB_BLUE EXT_BGR_BLUE |
| 12586 +%define RGB_PIXELSIZE EXT_BGR_PIXELSIZE |
| 12587 %define jsimd_ycc_rgb_convert_sse2 jsimd_ycc_extbgr_convert_sse2 |
| 12588 %include "jdclrss2-64.asm" |
| 12589 |
| 12590 @@ -87,10 +90,10 @@ |
| 12591 %undef RGB_GREEN |
| 12592 %undef RGB_BLUE |
| 12593 %undef RGB_PIXELSIZE |
| 12594 -%define RGB_RED 2 |
| 12595 -%define RGB_GREEN 1 |
| 12596 -%define RGB_BLUE 0 |
| 12597 -%define RGB_PIXELSIZE 4 |
| 12598 +%define RGB_RED EXT_BGRX_RED |
| 12599 +%define RGB_GREEN EXT_BGRX_GREEN |
| 12600 +%define RGB_BLUE EXT_BGRX_BLUE |
| 12601 +%define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE |
| 12602 %define jsimd_ycc_rgb_convert_sse2 jsimd_ycc_extbgrx_convert_sse2 |
| 12603 %include "jdclrss2-64.asm" |
| 12604 |
| 12605 @@ -98,10 +101,10 @@ |
| 12606 %undef RGB_GREEN |
| 12607 %undef RGB_BLUE |
| 12608 %undef RGB_PIXELSIZE |
| 12609 -%define RGB_RED 3 |
| 12610 -%define RGB_GREEN 2 |
| 12611 -%define RGB_BLUE 1 |
| 12612 -%define RGB_PIXELSIZE 4 |
| 12613 +%define RGB_RED EXT_XBGR_RED |
| 12614 +%define RGB_GREEN EXT_XBGR_GREEN |
| 12615 +%define RGB_BLUE EXT_XBGR_BLUE |
| 12616 +%define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE |
| 12617 %define jsimd_ycc_rgb_convert_sse2 jsimd_ycc_extxbgr_convert_sse2 |
| 12618 %include "jdclrss2-64.asm" |
| 12619 |
| 12620 @@ -109,9 +112,9 @@ |
| 12621 %undef RGB_GREEN |
| 12622 %undef RGB_BLUE |
| 12623 %undef RGB_PIXELSIZE |
| 12624 -%define RGB_RED 1 |
| 12625 -%define RGB_GREEN 2 |
| 12626 -%define RGB_BLUE 3 |
| 12627 -%define RGB_PIXELSIZE 4 |
| 12628 +%define RGB_RED EXT_XRGB_RED |
| 12629 +%define RGB_GREEN EXT_XRGB_GREEN |
| 12630 +%define RGB_BLUE EXT_XRGB_BLUE |
| 12631 +%define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE |
| 12632 %define jsimd_ycc_rgb_convert_sse2 jsimd_ycc_extxrgb_convert_sse2 |
| 12633 %include "jdclrss2-64.asm" |
935 Index: simd/jdcolss2.asm | 12634 Index: simd/jdcolss2.asm |
936 =================================================================== | 12635 =================================================================== |
937 --- simd/jdcolss2.asm (revision 829) | 12636 --- simd/jdcolss2.asm (revision 829) |
938 +++ simd/jdcolss2.asm (working copy) | 12637 +++ simd/jdcolss2.asm (working copy) |
939 @@ -35,7 +35,7 @@ | 12638 @@ -35,7 +35,7 @@ |
940 SECTION SEG_CONST | 12639 SECTION SEG_CONST |
941 | 12640 |
942 alignz 16 | 12641 alignz 16 |
943 - global EXTN(jconst_ycc_rgb_convert_sse2) | 12642 - global EXTN(jconst_ycc_rgb_convert_sse2) |
944 + global EXTN(jconst_ycc_rgb_convert_sse2) PRIVATE | 12643 + global EXTN(jconst_ycc_rgb_convert_sse2) PRIVATE |
945 | 12644 |
946 EXTN(jconst_ycc_rgb_convert_sse2): | 12645 EXTN(jconst_ycc_rgb_convert_sse2): |
947 | 12646 |
| 12647 @@ -48,6 +48,9 @@ |
| 12648 alignz 16 |
| 12649 |
| 12650 ; -------------------------------------------------------------------------- |
| 12651 + SECTION SEG_TEXT |
| 12652 + BITS 32 |
| 12653 + |
| 12654 %include "jdclrss2.asm" |
| 12655 |
| 12656 %undef RGB_RED |
| 12657 @@ -54,10 +57,10 @@ |
| 12658 %undef RGB_GREEN |
| 12659 %undef RGB_BLUE |
| 12660 %undef RGB_PIXELSIZE |
| 12661 -%define RGB_RED 0 |
| 12662 -%define RGB_GREEN 1 |
| 12663 -%define RGB_BLUE 2 |
| 12664 -%define RGB_PIXELSIZE 3 |
| 12665 +%define RGB_RED EXT_RGB_RED |
| 12666 +%define RGB_GREEN EXT_RGB_GREEN |
| 12667 +%define RGB_BLUE EXT_RGB_BLUE |
| 12668 +%define RGB_PIXELSIZE EXT_RGB_PIXELSIZE |
| 12669 %define jsimd_ycc_rgb_convert_sse2 jsimd_ycc_extrgb_convert_sse2 |
| 12670 %include "jdclrss2.asm" |
| 12671 |
| 12672 @@ -65,10 +68,10 @@ |
| 12673 %undef RGB_GREEN |
| 12674 %undef RGB_BLUE |
| 12675 %undef RGB_PIXELSIZE |
| 12676 -%define RGB_RED 0 |
| 12677 -%define RGB_GREEN 1 |
| 12678 -%define RGB_BLUE 2 |
| 12679 -%define RGB_PIXELSIZE 4 |
| 12680 +%define RGB_RED EXT_RGBX_RED |
| 12681 +%define RGB_GREEN EXT_RGBX_GREEN |
| 12682 +%define RGB_BLUE EXT_RGBX_BLUE |
| 12683 +%define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE |
| 12684 %define jsimd_ycc_rgb_convert_sse2 jsimd_ycc_extrgbx_convert_sse2 |
| 12685 %include "jdclrss2.asm" |
| 12686 |
| 12687 @@ -76,10 +79,10 @@ |
| 12688 %undef RGB_GREEN |
| 12689 %undef RGB_BLUE |
| 12690 %undef RGB_PIXELSIZE |
| 12691 -%define RGB_RED 2 |
| 12692 -%define RGB_GREEN 1 |
| 12693 -%define RGB_BLUE 0 |
| 12694 -%define RGB_PIXELSIZE 3 |
| 12695 +%define RGB_RED EXT_BGR_RED |
| 12696 +%define RGB_GREEN EXT_BGR_GREEN |
| 12697 +%define RGB_BLUE EXT_BGR_BLUE |
| 12698 +%define RGB_PIXELSIZE EXT_BGR_PIXELSIZE |
| 12699 %define jsimd_ycc_rgb_convert_sse2 jsimd_ycc_extbgr_convert_sse2 |
| 12700 %include "jdclrss2.asm" |
| 12701 |
| 12702 @@ -87,10 +90,10 @@ |
| 12703 %undef RGB_GREEN |
| 12704 %undef RGB_BLUE |
| 12705 %undef RGB_PIXELSIZE |
| 12706 -%define RGB_RED 2 |
| 12707 -%define RGB_GREEN 1 |
| 12708 -%define RGB_BLUE 0 |
| 12709 -%define RGB_PIXELSIZE 4 |
| 12710 +%define RGB_RED EXT_BGRX_RED |
| 12711 +%define RGB_GREEN EXT_BGRX_GREEN |
| 12712 +%define RGB_BLUE EXT_BGRX_BLUE |
| 12713 +%define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE |
| 12714 %define jsimd_ycc_rgb_convert_sse2 jsimd_ycc_extbgrx_convert_sse2 |
| 12715 %include "jdclrss2.asm" |
| 12716 |
| 12717 @@ -98,10 +101,10 @@ |
| 12718 %undef RGB_GREEN |
| 12719 %undef RGB_BLUE |
| 12720 %undef RGB_PIXELSIZE |
| 12721 -%define RGB_RED 3 |
| 12722 -%define RGB_GREEN 2 |
| 12723 -%define RGB_BLUE 1 |
| 12724 -%define RGB_PIXELSIZE 4 |
| 12725 +%define RGB_RED EXT_XBGR_RED |
| 12726 +%define RGB_GREEN EXT_XBGR_GREEN |
| 12727 +%define RGB_BLUE EXT_XBGR_BLUE |
| 12728 +%define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE |
| 12729 %define jsimd_ycc_rgb_convert_sse2 jsimd_ycc_extxbgr_convert_sse2 |
| 12730 %include "jdclrss2.asm" |
| 12731 |
| 12732 @@ -109,9 +112,9 @@ |
| 12733 %undef RGB_GREEN |
| 12734 %undef RGB_BLUE |
| 12735 %undef RGB_PIXELSIZE |
| 12736 -%define RGB_RED 1 |
| 12737 -%define RGB_GREEN 2 |
| 12738 -%define RGB_BLUE 3 |
| 12739 -%define RGB_PIXELSIZE 4 |
| 12740 +%define RGB_RED EXT_XRGB_RED |
| 12741 +%define RGB_GREEN EXT_XRGB_GREEN |
| 12742 +%define RGB_BLUE EXT_XRGB_BLUE |
| 12743 +%define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE |
| 12744 %define jsimd_ycc_rgb_convert_sse2 jsimd_ycc_extxrgb_convert_sse2 |
| 12745 %include "jdclrss2.asm" |
948 Index: simd/jdmermmx.asm | 12746 Index: simd/jdmermmx.asm |
949 =================================================================== | 12747 =================================================================== |
950 --- simd/jdmermmx.asm (revision 829) | 12748 --- simd/jdmermmx.asm (revision 829) |
951 +++ simd/jdmermmx.asm (working copy) | 12749 +++ simd/jdmermmx.asm (working copy) |
952 @@ -35,7 +35,7 @@ | 12750 @@ -35,7 +35,7 @@ |
953 SECTION SEG_CONST | 12751 SECTION SEG_CONST |
954 | 12752 |
955 alignz 16 | 12753 alignz 16 |
956 - global EXTN(jconst_merged_upsample_mmx) | 12754 - global EXTN(jconst_merged_upsample_mmx) |
957 + global EXTN(jconst_merged_upsample_mmx) PRIVATE | 12755 + global EXTN(jconst_merged_upsample_mmx) PRIVATE |
958 | 12756 |
959 EXTN(jconst_merged_upsample_mmx): | 12757 EXTN(jconst_merged_upsample_mmx): |
960 | 12758 |
961 Index: simd/jcclrss2.asm | 12759 @@ -48,6 +48,9 @@ |
| 12760 alignz 16 |
| 12761 |
| 12762 ; -------------------------------------------------------------------------- |
| 12763 + SECTION SEG_TEXT |
| 12764 + BITS 32 |
| 12765 + |
| 12766 %include "jdmrgmmx.asm" |
| 12767 |
| 12768 %undef RGB_RED |
| 12769 @@ -54,10 +57,10 @@ |
| 12770 %undef RGB_GREEN |
| 12771 %undef RGB_BLUE |
| 12772 %undef RGB_PIXELSIZE |
| 12773 -%define RGB_RED 0 |
| 12774 -%define RGB_GREEN 1 |
| 12775 -%define RGB_BLUE 2 |
| 12776 -%define RGB_PIXELSIZE 3 |
| 12777 +%define RGB_RED EXT_RGB_RED |
| 12778 +%define RGB_GREEN EXT_RGB_GREEN |
| 12779 +%define RGB_BLUE EXT_RGB_BLUE |
| 12780 +%define RGB_PIXELSIZE EXT_RGB_PIXELSIZE |
| 12781 %define jsimd_h2v1_merged_upsample_mmx jsimd_h2v1_extrgb_merged_upsample_mmx |
| 12782 %define jsimd_h2v2_merged_upsample_mmx jsimd_h2v2_extrgb_merged_upsample_mmx |
| 12783 %include "jdmrgmmx.asm" |
| 12784 @@ -66,10 +69,10 @@ |
| 12785 %undef RGB_GREEN |
| 12786 %undef RGB_BLUE |
| 12787 %undef RGB_PIXELSIZE |
| 12788 -%define RGB_RED 0 |
| 12789 -%define RGB_GREEN 1 |
| 12790 -%define RGB_BLUE 2 |
| 12791 -%define RGB_PIXELSIZE 4 |
| 12792 +%define RGB_RED EXT_RGBX_RED |
| 12793 +%define RGB_GREEN EXT_RGBX_GREEN |
| 12794 +%define RGB_BLUE EXT_RGBX_BLUE |
| 12795 +%define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE |
| 12796 %define jsimd_h2v1_merged_upsample_mmx jsimd_h2v1_extrgbx_merged_upsample_mmx |
| 12797 %define jsimd_h2v2_merged_upsample_mmx jsimd_h2v2_extrgbx_merged_upsample_mmx |
| 12798 %include "jdmrgmmx.asm" |
| 12799 @@ -78,10 +81,10 @@ |
| 12800 %undef RGB_GREEN |
| 12801 %undef RGB_BLUE |
| 12802 %undef RGB_PIXELSIZE |
| 12803 -%define RGB_RED 2 |
| 12804 -%define RGB_GREEN 1 |
| 12805 -%define RGB_BLUE 0 |
| 12806 -%define RGB_PIXELSIZE 3 |
| 12807 +%define RGB_RED EXT_BGR_RED |
| 12808 +%define RGB_GREEN EXT_BGR_GREEN |
| 12809 +%define RGB_BLUE EXT_BGR_BLUE |
| 12810 +%define RGB_PIXELSIZE EXT_BGR_PIXELSIZE |
| 12811 %define jsimd_h2v1_merged_upsample_mmx jsimd_h2v1_extbgr_merged_upsample_mmx |
| 12812 %define jsimd_h2v2_merged_upsample_mmx jsimd_h2v2_extbgr_merged_upsample_mmx |
| 12813 %include "jdmrgmmx.asm" |
| 12814 @@ -90,10 +93,10 @@ |
| 12815 %undef RGB_GREEN |
| 12816 %undef RGB_BLUE |
| 12817 %undef RGB_PIXELSIZE |
| 12818 -%define RGB_RED 2 |
| 12819 -%define RGB_GREEN 1 |
| 12820 -%define RGB_BLUE 0 |
| 12821 -%define RGB_PIXELSIZE 4 |
| 12822 +%define RGB_RED EXT_BGRX_RED |
| 12823 +%define RGB_GREEN EXT_BGRX_GREEN |
| 12824 +%define RGB_BLUE EXT_BGRX_BLUE |
| 12825 +%define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE |
| 12826 %define jsimd_h2v1_merged_upsample_mmx jsimd_h2v1_extbgrx_merged_upsample_mmx |
| 12827 %define jsimd_h2v2_merged_upsample_mmx jsimd_h2v2_extbgrx_merged_upsample_mmx |
| 12828 %include "jdmrgmmx.asm" |
| 12829 @@ -102,10 +105,10 @@ |
| 12830 %undef RGB_GREEN |
| 12831 %undef RGB_BLUE |
| 12832 %undef RGB_PIXELSIZE |
| 12833 -%define RGB_RED 3 |
| 12834 -%define RGB_GREEN 2 |
| 12835 -%define RGB_BLUE 1 |
| 12836 -%define RGB_PIXELSIZE 4 |
| 12837 +%define RGB_RED EXT_XBGR_RED |
| 12838 +%define RGB_GREEN EXT_XBGR_GREEN |
| 12839 +%define RGB_BLUE EXT_XBGR_BLUE |
| 12840 +%define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE |
| 12841 %define jsimd_h2v1_merged_upsample_mmx jsimd_h2v1_extxbgr_merged_upsample_mmx |
| 12842 %define jsimd_h2v2_merged_upsample_mmx jsimd_h2v2_extxbgr_merged_upsample_mmx |
| 12843 %include "jdmrgmmx.asm" |
| 12844 @@ -114,10 +117,10 @@ |
| 12845 %undef RGB_GREEN |
| 12846 %undef RGB_BLUE |
| 12847 %undef RGB_PIXELSIZE |
| 12848 -%define RGB_RED 1 |
| 12849 -%define RGB_GREEN 2 |
| 12850 -%define RGB_BLUE 3 |
| 12851 -%define RGB_PIXELSIZE 4 |
| 12852 +%define RGB_RED EXT_XRGB_RED |
| 12853 +%define RGB_GREEN EXT_XRGB_GREEN |
| 12854 +%define RGB_BLUE EXT_XRGB_BLUE |
| 12855 +%define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE |
| 12856 %define jsimd_h2v1_merged_upsample_mmx jsimd_h2v1_extxrgb_merged_upsample_mmx |
| 12857 %define jsimd_h2v2_merged_upsample_mmx jsimd_h2v2_extxrgb_merged_upsample_mmx |
| 12858 %include "jdmrgmmx.asm" |
| 12859 Index: simd/jdmerss2-64.asm |
962 =================================================================== | 12860 =================================================================== |
963 --- simd/jcclrss2.asm (revision 829) | 12861 --- simd/jdmerss2-64.asm (revision 829) |
964 +++ simd/jcclrss2.asm (working copy) | 12862 +++ simd/jdmerss2-64.asm (working copy) |
965 @@ -38,7 +38,7 @@ | 12863 @@ -1,5 +1,5 @@ |
966 | 12864 ; |
967 align 16 | 12865 -; jdmerss2.asm - merged upsampling/color conversion (64-bit SSE2) |
968 | 12866 +; jdmerss2-64.asm - merged upsampling/color conversion (64-bit SSE2) |
969 - global EXTN(jsimd_rgb_ycc_convert_sse2) | 12867 ; |
970 + global EXTN(jsimd_rgb_ycc_convert_sse2) PRIVATE | 12868 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB |
971 | 12869 ; Copyright 2009 D. R. Commander |
972 EXTN(jsimd_rgb_ycc_convert_sse2): | 12870 @@ -35,7 +35,7 @@ |
973 push ebp | |
974 Index: simd/jiss2red.asm | |
975 =================================================================== | |
976 --- simd/jiss2red.asm (revision 829) | |
977 +++ simd/jiss2red.asm (working copy) | |
978 @@ -72,7 +72,7 @@ | |
979 SECTION SEG_CONST | 12871 SECTION SEG_CONST |
980 | 12872 |
981 alignz 16 | 12873 alignz 16 |
982 - global EXTN(jconst_idct_red_sse2) | 12874 - global EXTN(jconst_merged_upsample_sse2) |
983 + global EXTN(jconst_idct_red_sse2) PRIVATE | 12875 + global EXTN(jconst_merged_upsample_sse2) PRIVATE |
984 | 12876 |
985 EXTN(jconst_idct_red_sse2): | 12877 EXTN(jconst_merged_upsample_sse2): |
986 | 12878 |
987 @@ -113,7 +113,7 @@ | 12879 @@ -48,6 +48,9 @@ |
988 %define WK_NUM 2 | 12880 alignz 16 |
989 | 12881 |
990 align 16 | 12882 ; -------------------------------------------------------------------------- |
991 - global EXTN(jsimd_idct_4x4_sse2) | 12883 + SECTION SEG_TEXT |
992 + global EXTN(jsimd_idct_4x4_sse2) PRIVATE | 12884 + BITS 64 |
993 | 12885 + |
994 EXTN(jsimd_idct_4x4_sse2): | 12886 %include "jdmrgss2-64.asm" |
995 push ebp | 12887 |
996 @@ -424,7 +424,7 @@ | 12888 %undef RGB_RED |
997 %define output_col(b) (b)+20 ; JDIMENSION output_col | 12889 @@ -54,10 +57,10 @@ |
998 | 12890 %undef RGB_GREEN |
999 align 16 | 12891 %undef RGB_BLUE |
1000 - global EXTN(jsimd_idct_2x2_sse2) | 12892 %undef RGB_PIXELSIZE |
1001 + global EXTN(jsimd_idct_2x2_sse2) PRIVATE | 12893 -%define RGB_RED 0 |
1002 | 12894 -%define RGB_GREEN 1 |
1003 EXTN(jsimd_idct_2x2_sse2): | 12895 -%define RGB_BLUE 2 |
1004 push ebp | 12896 -%define RGB_PIXELSIZE 3 |
| 12897 +%define RGB_RED EXT_RGB_RED |
| 12898 +%define RGB_GREEN EXT_RGB_GREEN |
| 12899 +%define RGB_BLUE EXT_RGB_BLUE |
| 12900 +%define RGB_PIXELSIZE EXT_RGB_PIXELSIZE |
| 12901 %define jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_extrgb_merged_upsample_sse2 |
| 12902 %define jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_extrgb_merged_upsample_sse2 |
| 12903 %include "jdmrgss2-64.asm" |
| 12904 @@ -66,10 +69,10 @@ |
| 12905 %undef RGB_GREEN |
| 12906 %undef RGB_BLUE |
| 12907 %undef RGB_PIXELSIZE |
| 12908 -%define RGB_RED 0 |
| 12909 -%define RGB_GREEN 1 |
| 12910 -%define RGB_BLUE 2 |
| 12911 -%define RGB_PIXELSIZE 4 |
| 12912 +%define RGB_RED EXT_RGBX_RED |
| 12913 +%define RGB_GREEN EXT_RGBX_GREEN |
| 12914 +%define RGB_BLUE EXT_RGBX_BLUE |
| 12915 +%define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE |
| 12916 %define jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_extrgbx_merged_upsample_sse2 |
| 12917 %define jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_extrgbx_merged_upsample_sse2 |
| 12918 %include "jdmrgss2-64.asm" |
| 12919 @@ -78,10 +81,10 @@ |
| 12920 %undef RGB_GREEN |
| 12921 %undef RGB_BLUE |
| 12922 %undef RGB_PIXELSIZE |
| 12923 -%define RGB_RED 2 |
| 12924 -%define RGB_GREEN 1 |
| 12925 -%define RGB_BLUE 0 |
| 12926 -%define RGB_PIXELSIZE 3 |
| 12927 +%define RGB_RED EXT_BGR_RED |
| 12928 +%define RGB_GREEN EXT_BGR_GREEN |
| 12929 +%define RGB_BLUE EXT_BGR_BLUE |
| 12930 +%define RGB_PIXELSIZE EXT_BGR_PIXELSIZE |
| 12931 %define jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_extbgr_merged_upsample_sse2 |
| 12932 %define jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_extbgr_merged_upsample_sse2 |
| 12933 %include "jdmrgss2-64.asm" |
| 12934 @@ -90,10 +93,10 @@ |
| 12935 %undef RGB_GREEN |
| 12936 %undef RGB_BLUE |
| 12937 %undef RGB_PIXELSIZE |
| 12938 -%define RGB_RED 2 |
| 12939 -%define RGB_GREEN 1 |
| 12940 -%define RGB_BLUE 0 |
| 12941 -%define RGB_PIXELSIZE 4 |
| 12942 +%define RGB_RED EXT_BGRX_RED |
| 12943 +%define RGB_GREEN EXT_BGRX_GREEN |
| 12944 +%define RGB_BLUE EXT_BGRX_BLUE |
| 12945 +%define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE |
| 12946 %define jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_extbgrx_merged_upsample_sse2 |
| 12947 %define jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_extbgrx_merged_upsample_sse2 |
| 12948 %include "jdmrgss2-64.asm" |
| 12949 @@ -102,10 +105,10 @@ |
| 12950 %undef RGB_GREEN |
| 12951 %undef RGB_BLUE |
| 12952 %undef RGB_PIXELSIZE |
| 12953 -%define RGB_RED 3 |
| 12954 -%define RGB_GREEN 2 |
| 12955 -%define RGB_BLUE 1 |
| 12956 -%define RGB_PIXELSIZE 4 |
| 12957 +%define RGB_RED EXT_XBGR_RED |
| 12958 +%define RGB_GREEN EXT_XBGR_GREEN |
| 12959 +%define RGB_BLUE EXT_XBGR_BLUE |
| 12960 +%define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE |
| 12961 %define jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_extxbgr_merged_upsample_sse2 |
| 12962 %define jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_extxbgr_merged_upsample_sse2 |
| 12963 %include "jdmrgss2-64.asm" |
| 12964 @@ -114,10 +117,10 @@ |
| 12965 %undef RGB_GREEN |
| 12966 %undef RGB_BLUE |
| 12967 %undef RGB_PIXELSIZE |
| 12968 -%define RGB_RED 1 |
| 12969 -%define RGB_GREEN 2 |
| 12970 -%define RGB_BLUE 3 |
| 12971 -%define RGB_PIXELSIZE 4 |
| 12972 +%define RGB_RED EXT_XRGB_RED |
| 12973 +%define RGB_GREEN EXT_XRGB_GREEN |
| 12974 +%define RGB_BLUE EXT_XRGB_BLUE |
| 12975 +%define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE |
| 12976 %define jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_extxrgb_merged_upsample_sse2 |
| 12977 %define jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_extxrgb_merged_upsample_sse2 |
| 12978 %include "jdmrgss2-64.asm" |
1005 Index: simd/jdmerss2.asm | 12979 Index: simd/jdmerss2.asm |
1006 =================================================================== | 12980 =================================================================== |
1007 --- simd/jdmerss2.asm (revision 829) | 12981 --- simd/jdmerss2.asm (revision 829) |
1008 +++ simd/jdmerss2.asm (working copy) | 12982 +++ simd/jdmerss2.asm (working copy) |
1009 @@ -35,7 +35,7 @@ | 12983 @@ -35,7 +35,7 @@ |
1010 SECTION SEG_CONST | 12984 SECTION SEG_CONST |
1011 | 12985 |
1012 alignz 16 | 12986 alignz 16 |
1013 - global EXTN(jconst_merged_upsample_sse2) | 12987 - global EXTN(jconst_merged_upsample_sse2) |
1014 + global EXTN(jconst_merged_upsample_sse2) PRIVATE | 12988 + global EXTN(jconst_merged_upsample_sse2) PRIVATE |
1015 | 12989 |
1016 EXTN(jconst_merged_upsample_sse2): | 12990 EXTN(jconst_merged_upsample_sse2): |
1017 | 12991 |
1018 Index: simd/jfss2fst-64.asm | 12992 @@ -48,6 +48,9 @@ |
| 12993 alignz 16 |
| 12994 |
| 12995 ; -------------------------------------------------------------------------- |
| 12996 + SECTION SEG_TEXT |
| 12997 + BITS 32 |
| 12998 + |
| 12999 %include "jdmrgss2.asm" |
| 13000 |
| 13001 %undef RGB_RED |
| 13002 @@ -54,10 +57,10 @@ |
| 13003 %undef RGB_GREEN |
| 13004 %undef RGB_BLUE |
| 13005 %undef RGB_PIXELSIZE |
| 13006 -%define RGB_RED 0 |
| 13007 -%define RGB_GREEN 1 |
| 13008 -%define RGB_BLUE 2 |
| 13009 -%define RGB_PIXELSIZE 3 |
| 13010 +%define RGB_RED EXT_RGB_RED |
| 13011 +%define RGB_GREEN EXT_RGB_GREEN |
| 13012 +%define RGB_BLUE EXT_RGB_BLUE |
| 13013 +%define RGB_PIXELSIZE EXT_RGB_PIXELSIZE |
| 13014 %define jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_extrgb_merged_upsample_sse2 |
| 13015 %define jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_extrgb_merged_upsample_sse2 |
| 13016 %include "jdmrgss2.asm" |
| 13017 @@ -66,10 +69,10 @@ |
| 13018 %undef RGB_GREEN |
| 13019 %undef RGB_BLUE |
| 13020 %undef RGB_PIXELSIZE |
| 13021 -%define RGB_RED 0 |
| 13022 -%define RGB_GREEN 1 |
| 13023 -%define RGB_BLUE 2 |
| 13024 -%define RGB_PIXELSIZE 4 |
| 13025 +%define RGB_RED EXT_RGBX_RED |
| 13026 +%define RGB_GREEN EXT_RGBX_GREEN |
| 13027 +%define RGB_BLUE EXT_RGBX_BLUE |
| 13028 +%define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE |
| 13029 %define jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_extrgbx_merged_upsample_sse2 |
| 13030 %define jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_extrgbx_merged_upsample_sse2 |
| 13031 %include "jdmrgss2.asm" |
| 13032 @@ -78,10 +81,10 @@ |
| 13033 %undef RGB_GREEN |
| 13034 %undef RGB_BLUE |
| 13035 %undef RGB_PIXELSIZE |
| 13036 -%define RGB_RED 2 |
| 13037 -%define RGB_GREEN 1 |
| 13038 -%define RGB_BLUE 0 |
| 13039 -%define RGB_PIXELSIZE 3 |
| 13040 +%define RGB_RED EXT_BGR_RED |
| 13041 +%define RGB_GREEN EXT_BGR_GREEN |
| 13042 +%define RGB_BLUE EXT_BGR_BLUE |
| 13043 +%define RGB_PIXELSIZE EXT_BGR_PIXELSIZE |
| 13044 %define jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_extbgr_merged_upsample_sse2 |
| 13045 %define jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_extbgr_merged_upsample_sse2 |
| 13046 %include "jdmrgss2.asm" |
| 13047 @@ -90,10 +93,10 @@ |
| 13048 %undef RGB_GREEN |
| 13049 %undef RGB_BLUE |
| 13050 %undef RGB_PIXELSIZE |
| 13051 -%define RGB_RED 2 |
| 13052 -%define RGB_GREEN 1 |
| 13053 -%define RGB_BLUE 0 |
| 13054 -%define RGB_PIXELSIZE 4 |
| 13055 +%define RGB_RED EXT_BGRX_RED |
| 13056 +%define RGB_GREEN EXT_BGRX_GREEN |
| 13057 +%define RGB_BLUE EXT_BGRX_BLUE |
| 13058 +%define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE |
| 13059 %define jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_extbgrx_merged_upsample_sse2 |
| 13060 %define jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_extbgrx_merged_upsample_sse2 |
| 13061 %include "jdmrgss2.asm" |
| 13062 @@ -102,10 +105,10 @@ |
| 13063 %undef RGB_GREEN |
| 13064 %undef RGB_BLUE |
| 13065 %undef RGB_PIXELSIZE |
| 13066 -%define RGB_RED 3 |
| 13067 -%define RGB_GREEN 2 |
| 13068 -%define RGB_BLUE 1 |
| 13069 -%define RGB_PIXELSIZE 4 |
| 13070 +%define RGB_RED EXT_XBGR_RED |
| 13071 +%define RGB_GREEN EXT_XBGR_GREEN |
| 13072 +%define RGB_BLUE EXT_XBGR_BLUE |
| 13073 +%define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE |
| 13074 %define jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_extxbgr_merged_upsample_sse2 |
| 13075 %define jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_extxbgr_merged_upsample_sse2 |
| 13076 %include "jdmrgss2.asm" |
| 13077 @@ -114,10 +117,10 @@ |
| 13078 %undef RGB_GREEN |
| 13079 %undef RGB_BLUE |
| 13080 %undef RGB_PIXELSIZE |
| 13081 -%define RGB_RED 1 |
| 13082 -%define RGB_GREEN 2 |
| 13083 -%define RGB_BLUE 3 |
| 13084 -%define RGB_PIXELSIZE 4 |
| 13085 +%define RGB_RED EXT_XRGB_RED |
| 13086 +%define RGB_GREEN EXT_XRGB_GREEN |
| 13087 +%define RGB_BLUE EXT_XRGB_BLUE |
| 13088 +%define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE |
| 13089 %define jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_extxrgb_merged_upsample_sse2 |
| 13090 %define jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_extxrgb_merged_upsample_sse2 |
| 13091 %include "jdmrgss2.asm" |
| 13092 Index: simd/jdmrgmmx.asm |
1019 =================================================================== | 13093 =================================================================== |
1020 --- simd/jfss2fst-64.asm (revision 829) | 13094 --- simd/jdmrgmmx.asm (revision 829) |
1021 +++ simd/jfss2fst-64.asm (working copy) | 13095 +++ simd/jdmrgmmx.asm (working copy) |
1022 @@ -53,7 +53,7 @@ | 13096 @@ -19,8 +19,6 @@ |
1023 %define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS) | 13097 %include "jcolsamp.inc" |
| 13098 |
| 13099 ; -------------------------------------------------------------------------- |
| 13100 - SECTION SEG_TEXT |
| 13101 - BITS 32 |
| 13102 ; |
| 13103 ; Upsample and color convert for the case of 2:1 horizontal and 1:1 vertical. |
| 13104 ; |
| 13105 @@ -42,7 +40,7 @@ |
| 13106 %define gotptr wk(0)-SIZEOF_POINTER ; void * gotptr |
| 13107 |
| 13108 align 16 |
| 13109 - global EXTN(jsimd_h2v1_merged_upsample_mmx) |
| 13110 + global EXTN(jsimd_h2v1_merged_upsample_mmx) PRIVATE |
| 13111 |
| 13112 EXTN(jsimd_h2v1_merged_upsample_mmx): |
| 13113 push ebp |
| 13114 @@ -253,7 +251,7 @@ |
| 13115 movq MMWORD [edi+2*SIZEOF_MMWORD], mmC |
| 13116 |
| 13117 sub ecx, byte SIZEOF_MMWORD |
| 13118 - jz short .endcolumn |
| 13119 + jz near .endcolumn |
| 13120 |
| 13121 add edi, byte RGB_PIXELSIZE*SIZEOF_MMWORD ; outptr |
| 13122 add esi, byte SIZEOF_MMWORD ; inptr0 |
| 13123 @@ -411,7 +409,7 @@ |
| 13124 %define output_buf(b) (b)+20 ; JSAMPARRAY output_buf |
| 13125 |
| 13126 align 16 |
| 13127 - global EXTN(jsimd_h2v2_merged_upsample_mmx) |
| 13128 + global EXTN(jsimd_h2v2_merged_upsample_mmx) PRIVATE |
| 13129 |
| 13130 EXTN(jsimd_h2v2_merged_upsample_mmx): |
| 13131 push ebp |
| 13132 @@ -461,3 +459,6 @@ |
| 13133 pop ebp |
| 13134 ret |
| 13135 |
| 13136 +; For some reason, the OS X linker does not honor the request to align the |
| 13137 +; segment unless we do this. |
| 13138 + align 16 |
| 13139 Index: simd/jdmrgss2-64.asm |
| 13140 =================================================================== |
| 13141 --- simd/jdmrgss2-64.asm (revision 829) |
| 13142 +++ simd/jdmrgss2-64.asm (working copy) |
| 13143 @@ -1,8 +1,8 @@ |
| 13144 ; |
| 13145 -; jdmrgss2.asm - merged upsampling/color conversion (64-bit SSE2) |
| 13146 +; jdmrgss2-64.asm - merged upsampling/color conversion (64-bit SSE2) |
| 13147 ; |
| 13148 -; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB |
| 13149 -; Copyright 2009 D. R. Commander |
| 13150 +; Copyright 2009, 2012 Pierre Ossman <ossman@cendio.se> for Cendio AB |
| 13151 +; Copyright 2009, 2012 D. R. Commander |
| 13152 ; |
| 13153 ; Based on |
| 13154 ; x86 SIMD extension for IJG JPEG library |
| 13155 @@ -20,8 +20,6 @@ |
| 13156 %include "jcolsamp.inc" |
| 13157 |
| 13158 ; -------------------------------------------------------------------------- |
| 13159 - SECTION SEG_TEXT |
| 13160 - BITS 64 |
| 13161 ; |
| 13162 ; Upsample and color convert for the case of 2:1 horizontal and 1:1 vertical. |
| 13163 ; |
| 13164 @@ -41,7 +39,7 @@ |
| 13165 %define WK_NUM 3 |
| 13166 |
| 13167 align 16 |
| 13168 - global EXTN(jsimd_h2v1_merged_upsample_sse2) |
| 13169 + global EXTN(jsimd_h2v1_merged_upsample_sse2) PRIVATE |
| 13170 |
| 13171 EXTN(jsimd_h2v1_merged_upsample_sse2): |
| 13172 push rbp |
| 13173 @@ -51,8 +49,8 @@ |
| 13174 mov [rsp],rax |
| 13175 mov rbp,rsp ; rbp = aligned rbp |
| 13176 lea rsp, [wk(0)] |
| 13177 + collect_args |
| 13178 push rbx |
| 13179 - collect_args |
| 13180 |
| 13181 mov rcx, r10 ; col |
| 13182 test rcx,rcx |
| 13183 @@ -254,17 +252,13 @@ |
| 13184 movntdq XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA |
| 13185 movntdq XMMWORD [rdi+1*SIZEOF_XMMWORD], xmmD |
| 13186 movntdq XMMWORD [rdi+2*SIZEOF_XMMWORD], xmmF |
| 13187 - add rdi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr |
| 13188 jmp short .out0 |
| 13189 .out1: ; --(unaligned)----------------- |
| 13190 - pcmpeqb xmmH,xmmH ; xmmH=(all 1's) |
| 13191 - maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [rdi], xmmA |
| 13192 - add rdi, byte SIZEOF_XMMWORD ; outptr |
| 13193 - maskmovdqu xmmD,xmmH ; movntdqu XMMWORD [rdi], xmmD |
| 13194 - add rdi, byte SIZEOF_XMMWORD ; outptr |
| 13195 - maskmovdqu xmmF,xmmH ; movntdqu XMMWORD [rdi], xmmF |
| 13196 - add rdi, byte SIZEOF_XMMWORD ; outptr |
| 13197 + movdqu XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA |
| 13198 + movdqu XMMWORD [rdi+1*SIZEOF_XMMWORD], xmmD |
| 13199 + movdqu XMMWORD [rdi+2*SIZEOF_XMMWORD], xmmF |
| 13200 .out0: |
| 13201 + add rdi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr |
| 13202 sub rcx, byte SIZEOF_XMMWORD |
| 13203 jz near .endcolumn |
| 13204 |
| 13205 @@ -277,14 +271,12 @@ |
| 13206 jmp near .columnloop |
| 13207 |
| 13208 .column_st32: |
| 13209 - pcmpeqb xmmH,xmmH ; xmmH=(all 1's) |
| 13210 lea rcx, [rcx+rcx*2] ; imul ecx, RGB_PIXELSIZE |
| 13211 cmp rcx, byte 2*SIZEOF_XMMWORD |
| 13212 jb short .column_st16 |
| 13213 - maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [rdi], xmmA |
| 13214 - add rdi, byte SIZEOF_XMMWORD ; outptr |
| 13215 - maskmovdqu xmmD,xmmH ; movntdqu XMMWORD [rdi], xmmD |
| 13216 - add rdi, byte SIZEOF_XMMWORD ; outptr |
| 13217 + movdqu XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA |
| 13218 + movdqu XMMWORD [rdi+1*SIZEOF_XMMWORD], xmmD |
| 13219 + add rdi, byte 2*SIZEOF_XMMWORD ; outptr |
| 13220 movdqa xmmA,xmmF |
| 13221 sub rcx, byte 2*SIZEOF_XMMWORD |
| 13222 jmp short .column_st15 |
| 13223 @@ -291,50 +283,44 @@ |
| 13224 .column_st16: |
| 13225 cmp rcx, byte SIZEOF_XMMWORD |
| 13226 jb short .column_st15 |
| 13227 - maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [rdi], xmmA |
| 13228 + movdqu XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA |
| 13229 add rdi, byte SIZEOF_XMMWORD ; outptr |
| 13230 movdqa xmmA,xmmD |
| 13231 sub rcx, byte SIZEOF_XMMWORD |
| 13232 .column_st15: |
| 13233 - mov rax,rcx |
| 13234 - xor rcx, byte 0x0F |
| 13235 - shl rcx, 2 |
| 13236 - movd xmmB,ecx |
| 13237 - psrlq xmmH,4 |
| 13238 - pcmpeqb xmmE,xmmE |
| 13239 - psrlq xmmH,xmmB |
| 13240 - psrlq xmmE,xmmB |
| 13241 - punpcklbw xmmE,xmmH |
| 13242 - ; ---------------- |
| 13243 - mov rcx,rdi |
| 13244 - and rcx, byte SIZEOF_XMMWORD-1 |
| 13245 - jz short .adj0 |
| 13246 - add rax,rcx |
| 13247 - cmp rax, byte SIZEOF_XMMWORD |
| 13248 - ja short .adj0 |
| 13249 - and rdi, byte (-SIZEOF_XMMWORD) ; align to 16-byte boundary |
| 13250 - shl rcx, 3 ; pslldq xmmA,ecx & pslldq xmmE,ecx |
| 13251 - movdqa xmmG,xmmA |
| 13252 - movdqa xmmC,xmmE |
| 13253 - pslldq xmmA, SIZEOF_XMMWORD/2 |
| 13254 - pslldq xmmE, SIZEOF_XMMWORD/2 |
| 13255 - movd xmmD,ecx |
| 13256 - sub rcx, byte (SIZEOF_XMMWORD/2)*BYTE_BIT |
| 13257 - jb short .adj1 |
| 13258 - movd xmmF,ecx |
| 13259 - psllq xmmA,xmmF |
| 13260 - psllq xmmE,xmmF |
| 13261 - jmp short .adj0 |
| 13262 -.adj1: neg rcx |
| 13263 - movd xmmF,ecx |
| 13264 - psrlq xmmA,xmmF |
| 13265 - psrlq xmmE,xmmF |
| 13266 - psllq xmmG,xmmD |
| 13267 - psllq xmmC,xmmD |
| 13268 - por xmmA,xmmG |
| 13269 - por xmmE,xmmC |
| 13270 -.adj0: ; ---------------- |
| 13271 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA |
| 13272 + ; Store the lower 8 bytes of xmmA to the output when it has enough |
| 13273 + ; space. |
| 13274 + cmp rcx, byte SIZEOF_MMWORD |
| 13275 + jb short .column_st7 |
| 13276 + movq XMM_MMWORD [rdi], xmmA |
| 13277 + add rdi, byte SIZEOF_MMWORD |
| 13278 + sub rcx, byte SIZEOF_MMWORD |
| 13279 + psrldq xmmA, SIZEOF_MMWORD |
| 13280 +.column_st7: |
| 13281 + ; Store the lower 4 bytes of xmmA to the output when it has enough |
| 13282 + ; space. |
| 13283 + cmp rcx, byte SIZEOF_DWORD |
| 13284 + jb short .column_st3 |
| 13285 + movd XMM_DWORD [rdi], xmmA |
| 13286 + add rdi, byte SIZEOF_DWORD |
| 13287 + sub rcx, byte SIZEOF_DWORD |
| 13288 + psrldq xmmA, SIZEOF_DWORD |
| 13289 +.column_st3: |
| 13290 + ; Store the lower 2 bytes of rax to the output when it has enough |
| 13291 + ; space. |
| 13292 + movd eax, xmmA |
| 13293 + cmp rcx, byte SIZEOF_WORD |
| 13294 + jb short .column_st1 |
| 13295 + mov WORD [rdi], ax |
| 13296 + add rdi, byte SIZEOF_WORD |
| 13297 + sub rcx, byte SIZEOF_WORD |
| 13298 + shr rax, 16 |
| 13299 +.column_st1: |
| 13300 + ; Store the lower 1 byte of rax to the output when it has enough |
| 13301 + ; space. |
| 13302 + test rcx, rcx |
| 13303 + jz short .endcolumn |
| 13304 + mov BYTE [rdi], al |
| 13305 |
| 13306 %else ; RGB_PIXELSIZE == 4 ; ----------- |
| 13307 |
| 13308 @@ -379,19 +365,14 @@ |
| 13309 movntdq XMMWORD [rdi+1*SIZEOF_XMMWORD], xmmD |
| 13310 movntdq XMMWORD [rdi+2*SIZEOF_XMMWORD], xmmC |
| 13311 movntdq XMMWORD [rdi+3*SIZEOF_XMMWORD], xmmH |
| 13312 - add rdi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr |
| 13313 jmp short .out0 |
| 13314 .out1: ; --(unaligned)----------------- |
| 13315 - pcmpeqb xmmE,xmmE ; xmmE=(all 1's) |
| 13316 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [rdi], xmmA |
| 13317 - add rdi, byte SIZEOF_XMMWORD ; outptr |
| 13318 - maskmovdqu xmmD,xmmE ; movntdqu XMMWORD [rdi], xmmD |
| 13319 - add rdi, byte SIZEOF_XMMWORD ; outptr |
| 13320 - maskmovdqu xmmC,xmmE ; movntdqu XMMWORD [rdi], xmmC |
| 13321 - add rdi, byte SIZEOF_XMMWORD ; outptr |
| 13322 - maskmovdqu xmmH,xmmE ; movntdqu XMMWORD [rdi], xmmH |
| 13323 - add rdi, byte SIZEOF_XMMWORD ; outptr |
| 13324 + movdqu XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA |
| 13325 + movdqu XMMWORD [rdi+1*SIZEOF_XMMWORD], xmmD |
| 13326 + movdqu XMMWORD [rdi+2*SIZEOF_XMMWORD], xmmC |
| 13327 + movdqu XMMWORD [rdi+3*SIZEOF_XMMWORD], xmmH |
| 13328 .out0: |
| 13329 + add rdi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr |
| 13330 sub rcx, byte SIZEOF_XMMWORD |
| 13331 jz near .endcolumn |
| 13332 |
| 13333 @@ -404,13 +385,11 @@ |
| 13334 jmp near .columnloop |
| 13335 |
| 13336 .column_st32: |
| 13337 - pcmpeqb xmmE,xmmE ; xmmE=(all 1's) |
| 13338 cmp rcx, byte SIZEOF_XMMWORD/2 |
| 13339 jb short .column_st16 |
| 13340 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [rdi], xmmA |
| 13341 - add rdi, byte SIZEOF_XMMWORD ; outptr |
| 13342 - maskmovdqu xmmD,xmmE ; movntdqu XMMWORD [rdi], xmmD |
| 13343 - add rdi, byte SIZEOF_XMMWORD ; outptr |
| 13344 + movdqu XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA |
| 13345 + movdqu XMMWORD [rdi+1*SIZEOF_XMMWORD], xmmD |
| 13346 + add rdi, byte 2*SIZEOF_XMMWORD ; outptr |
| 13347 movdqa xmmA,xmmC |
| 13348 movdqa xmmD,xmmH |
| 13349 sub rcx, byte SIZEOF_XMMWORD/2 |
| 13350 @@ -417,50 +396,25 @@ |
| 13351 .column_st16: |
| 13352 cmp rcx, byte SIZEOF_XMMWORD/4 |
| 13353 jb short .column_st15 |
| 13354 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA |
| 13355 + movdqu XMMWORD [rdi+0*SIZEOF_XMMWORD], xmmA |
| 13356 add rdi, byte SIZEOF_XMMWORD ; outptr |
| 13357 movdqa xmmA,xmmD |
| 13358 sub rcx, byte SIZEOF_XMMWORD/4 |
| 13359 .column_st15: |
| 13360 - cmp rcx, byte SIZEOF_XMMWORD/16 |
| 13361 - jb near .endcolumn |
| 13362 - mov rax,rcx |
| 13363 - xor rcx, byte 0x03 |
| 13364 - inc rcx |
| 13365 - shl rcx, 4 |
| 13366 - movd xmmF,ecx |
| 13367 - psrlq xmmE,xmmF |
| 13368 - punpcklbw xmmE,xmmE |
| 13369 - ; ---------------- |
| 13370 - mov rcx,rdi |
| 13371 - and rcx, byte SIZEOF_XMMWORD-1 |
| 13372 - jz short .adj0 |
| 13373 - lea rax, [rcx+rax*4] ; RGB_PIXELSIZE |
| 13374 - cmp rax, byte SIZEOF_XMMWORD |
| 13375 - ja short .adj0 |
| 13376 - and rdi, byte (-SIZEOF_XMMWORD) ; align to 16-byte boundary |
| 13377 - shl rcx, 3 ; pslldq xmmA,ecx & pslldq xmmE,ecx |
| 13378 - movdqa xmmB,xmmA |
| 13379 - movdqa xmmG,xmmE |
| 13380 - pslldq xmmA, SIZEOF_XMMWORD/2 |
| 13381 - pslldq xmmE, SIZEOF_XMMWORD/2 |
| 13382 - movd xmmC,ecx |
| 13383 - sub rcx, byte (SIZEOF_XMMWORD/2)*BYTE_BIT |
| 13384 - jb short .adj1 |
| 13385 - movd xmmH,ecx |
| 13386 - psllq xmmA,xmmH |
| 13387 - psllq xmmE,xmmH |
| 13388 - jmp short .adj0 |
| 13389 -.adj1: neg rcx |
| 13390 - movd xmmH,ecx |
| 13391 - psrlq xmmA,xmmH |
| 13392 - psrlq xmmE,xmmH |
| 13393 - psllq xmmB,xmmC |
| 13394 - psllq xmmG,xmmC |
| 13395 - por xmmA,xmmB |
| 13396 - por xmmE,xmmG |
| 13397 -.adj0: ; ---------------- |
| 13398 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA |
| 13399 + ; Store two pixels (8 bytes) of xmmA to the output when it has enough |
| 13400 + ; space. |
| 13401 + cmp rcx, byte SIZEOF_XMMWORD/8 |
| 13402 + jb short .column_st7 |
| 13403 + movq XMM_MMWORD [rdi], xmmA |
| 13404 + add rdi, byte SIZEOF_XMMWORD/8*4 |
| 13405 + sub rcx, byte SIZEOF_XMMWORD/8 |
| 13406 + psrldq xmmA, SIZEOF_XMMWORD/8*4 |
| 13407 +.column_st7: |
| 13408 + ; Store one pixel (4 bytes) of xmmA to the output when it has enough |
| 13409 + ; space. |
| 13410 + test rcx, rcx |
| 13411 + jz short .endcolumn |
| 13412 + movd XMM_DWORD [rdi], xmmA |
| 13413 |
| 13414 %endif ; RGB_PIXELSIZE ; --------------- |
| 13415 |
| 13416 @@ -468,8 +422,8 @@ |
| 13417 sfence ; flush the write buffer |
| 13418 |
| 13419 .return: |
| 13420 + pop rbx |
| 13421 uncollect_args |
| 13422 - pop rbx |
| 13423 mov rsp,rbp ; rsp <- aligned rbp |
| 13424 pop rsp ; rsp <- original rbp |
| 13425 pop rbp |
| 13426 @@ -492,13 +446,14 @@ |
| 13427 ; r13 = JSAMPARRAY output_buf |
| 13428 |
| 13429 align 16 |
| 13430 - global EXTN(jsimd_h2v2_merged_upsample_sse2) |
| 13431 + global EXTN(jsimd_h2v2_merged_upsample_sse2) PRIVATE |
| 13432 |
| 13433 EXTN(jsimd_h2v2_merged_upsample_sse2): |
| 13434 push rbp |
| 13435 + mov rax,rsp |
| 13436 mov rbp,rsp |
| 13437 + collect_args |
| 13438 push rbx |
| 13439 - collect_args |
| 13440 |
| 13441 mov rax, r10 |
| 13442 |
| 13443 @@ -519,10 +474,17 @@ |
| 13444 push rcx |
| 13445 push rax |
| 13446 |
| 13447 + %ifdef WIN64 |
| 13448 + mov r8, rcx |
| 13449 + mov r9, rdi |
| 13450 + mov rcx, rax |
| 13451 + mov rdx, rbx |
| 13452 + %else |
| 13453 mov rdx, rcx |
| 13454 mov rcx, rdi |
| 13455 mov rdi, rax |
| 13456 mov rsi, rbx |
| 13457 + %endif |
| 13458 |
| 13459 call EXTN(jsimd_h2v1_merged_upsample_sse2) |
| 13460 |
| 13461 @@ -545,10 +507,17 @@ |
| 13462 push rcx |
| 13463 push rax |
| 13464 |
| 13465 + %ifdef WIN64 |
| 13466 + mov r8, rcx |
| 13467 + mov r9, rdi |
| 13468 + mov rcx, rax |
| 13469 + mov rdx, rbx |
| 13470 + %else |
| 13471 mov rdx, rcx |
| 13472 mov rcx, rdi |
| 13473 mov rdi, rax |
| 13474 mov rsi, rbx |
| 13475 + %endif |
| 13476 |
| 13477 call EXTN(jsimd_h2v1_merged_upsample_sse2) |
| 13478 |
| 13479 @@ -559,7 +528,11 @@ |
| 13480 pop rbx |
| 13481 pop rdx |
| 13482 |
| 13483 + pop rbx |
| 13484 uncollect_args |
| 13485 - pop rbx |
| 13486 pop rbp |
| 13487 ret |
| 13488 + |
| 13489 +; For some reason, the OS X linker does not honor the request to align the |
| 13490 +; segment unless we do this. |
| 13491 + align 16 |
| 13492 Index: simd/jdmrgss2.asm |
| 13493 =================================================================== |
| 13494 --- simd/jdmrgss2.asm (revision 829) |
| 13495 +++ simd/jdmrgss2.asm (working copy) |
| 13496 @@ -1,7 +1,8 @@ |
| 13497 ; |
| 13498 ; jdmrgss2.asm - merged upsampling/color conversion (SSE2) |
| 13499 ; |
| 13500 -; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB |
| 13501 +; Copyright 2009, 2012 Pierre Ossman <ossman@cendio.se> for Cendio AB |
| 13502 +; Copyright 2012 D. R. Commander |
| 13503 ; |
| 13504 ; Based on |
| 13505 ; x86 SIMD extension for IJG JPEG library |
| 13506 @@ -19,8 +20,6 @@ |
| 13507 %include "jcolsamp.inc" |
| 13508 |
| 13509 ; -------------------------------------------------------------------------- |
| 13510 - SECTION SEG_TEXT |
| 13511 - BITS 32 |
| 13512 ; |
| 13513 ; Upsample and color convert for the case of 2:1 horizontal and 1:1 vertical. |
| 13514 ; |
| 13515 @@ -42,7 +41,7 @@ |
| 13516 %define gotptr wk(0)-SIZEOF_POINTER ; void * gotptr |
| 13517 |
| 13518 align 16 |
| 13519 - global EXTN(jsimd_h2v1_merged_upsample_sse2) |
| 13520 + global EXTN(jsimd_h2v1_merged_upsample_sse2) PRIVATE |
| 13521 |
| 13522 EXTN(jsimd_h2v1_merged_upsample_sse2): |
| 13523 push ebp |
| 13524 @@ -266,17 +265,13 @@ |
| 13525 movntdq XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA |
| 13526 movntdq XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD |
| 13527 movntdq XMMWORD [edi+2*SIZEOF_XMMWORD], xmmF |
| 13528 - add edi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr |
| 13529 jmp short .out0 |
| 13530 .out1: ; --(unaligned)----------------- |
| 13531 - pcmpeqb xmmH,xmmH ; xmmH=(all 1's) |
| 13532 - maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [edi], xmmA |
| 13533 - add edi, byte SIZEOF_XMMWORD ; outptr |
| 13534 - maskmovdqu xmmD,xmmH ; movntdqu XMMWORD [edi], xmmD |
| 13535 - add edi, byte SIZEOF_XMMWORD ; outptr |
| 13536 - maskmovdqu xmmF,xmmH ; movntdqu XMMWORD [edi], xmmF |
| 13537 - add edi, byte SIZEOF_XMMWORD ; outptr |
| 13538 + movdqu XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA |
| 13539 + movdqu XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD |
| 13540 + movdqu XMMWORD [edi+2*SIZEOF_XMMWORD], xmmF |
| 13541 .out0: |
| 13542 + add edi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr |
| 13543 sub ecx, byte SIZEOF_XMMWORD |
| 13544 jz near .endcolumn |
| 13545 |
| 13546 @@ -290,14 +285,12 @@ |
| 13547 alignx 16,7 |
| 13548 |
| 13549 .column_st32: |
| 13550 - pcmpeqb xmmH,xmmH ; xmmH=(all 1's) |
| 13551 lea ecx, [ecx+ecx*2] ; imul ecx, RGB_PIXELSIZE |
| 13552 cmp ecx, byte 2*SIZEOF_XMMWORD |
| 13553 jb short .column_st16 |
| 13554 - maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [edi], xmmA |
| 13555 - add edi, byte SIZEOF_XMMWORD ; outptr |
| 13556 - maskmovdqu xmmD,xmmH ; movntdqu XMMWORD [edi], xmmD |
| 13557 - add edi, byte SIZEOF_XMMWORD ; outptr |
| 13558 + movdqu XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA |
| 13559 + movdqu XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD |
| 13560 + add edi, byte 2*SIZEOF_XMMWORD ; outptr |
| 13561 movdqa xmmA,xmmF |
| 13562 sub ecx, byte 2*SIZEOF_XMMWORD |
| 13563 jmp short .column_st15 |
| 13564 @@ -304,50 +297,44 @@ |
| 13565 .column_st16: |
| 13566 cmp ecx, byte SIZEOF_XMMWORD |
| 13567 jb short .column_st15 |
| 13568 - maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [edi], xmmA |
| 13569 + movdqu XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA |
| 13570 add edi, byte SIZEOF_XMMWORD ; outptr |
| 13571 movdqa xmmA,xmmD |
| 13572 sub ecx, byte SIZEOF_XMMWORD |
| 13573 .column_st15: |
| 13574 - mov eax,ecx |
| 13575 - xor ecx, byte 0x0F |
| 13576 - shl ecx, 2 |
| 13577 - movd xmmB,ecx |
| 13578 - psrlq xmmH,4 |
| 13579 - pcmpeqb xmmE,xmmE |
| 13580 - psrlq xmmH,xmmB |
| 13581 - psrlq xmmE,xmmB |
| 13582 - punpcklbw xmmE,xmmH |
| 13583 - ; ---------------- |
| 13584 - mov ecx,edi |
| 13585 - and ecx, byte SIZEOF_XMMWORD-1 |
| 13586 - jz short .adj0 |
| 13587 - add eax,ecx |
| 13588 - cmp eax, byte SIZEOF_XMMWORD |
| 13589 - ja short .adj0 |
| 13590 - and edi, byte (-SIZEOF_XMMWORD) ; align to 16-byte boundary |
| 13591 - shl ecx, 3 ; pslldq xmmA,ecx & pslldq xmmE,ecx |
| 13592 - movdqa xmmG,xmmA |
| 13593 - movdqa xmmC,xmmE |
| 13594 - pslldq xmmA, SIZEOF_XMMWORD/2 |
| 13595 - pslldq xmmE, SIZEOF_XMMWORD/2 |
| 13596 - movd xmmD,ecx |
| 13597 - sub ecx, byte (SIZEOF_XMMWORD/2)*BYTE_BIT |
| 13598 - jb short .adj1 |
| 13599 - movd xmmF,ecx |
| 13600 - psllq xmmA,xmmF |
| 13601 - psllq xmmE,xmmF |
| 13602 - jmp short .adj0 |
| 13603 -.adj1: neg ecx |
| 13604 - movd xmmF,ecx |
| 13605 - psrlq xmmA,xmmF |
| 13606 - psrlq xmmE,xmmF |
| 13607 - psllq xmmG,xmmD |
| 13608 - psllq xmmC,xmmD |
| 13609 - por xmmA,xmmG |
| 13610 - por xmmE,xmmC |
| 13611 -.adj0: ; ---------------- |
| 13612 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA |
| 13613 + ; Store the lower 8 bytes of xmmA to the output when it has enough |
| 13614 + ; space. |
| 13615 + cmp ecx, byte SIZEOF_MMWORD |
| 13616 + jb short .column_st7 |
| 13617 + movq XMM_MMWORD [edi], xmmA |
| 13618 + add edi, byte SIZEOF_MMWORD |
| 13619 + sub ecx, byte SIZEOF_MMWORD |
| 13620 + psrldq xmmA, SIZEOF_MMWORD |
| 13621 +.column_st7: |
| 13622 + ; Store the lower 4 bytes of xmmA to the output when it has enough |
| 13623 + ; space. |
| 13624 + cmp ecx, byte SIZEOF_DWORD |
| 13625 + jb short .column_st3 |
| 13626 + movd XMM_DWORD [edi], xmmA |
| 13627 + add edi, byte SIZEOF_DWORD |
| 13628 + sub ecx, byte SIZEOF_DWORD |
| 13629 + psrldq xmmA, SIZEOF_DWORD |
| 13630 +.column_st3: |
| 13631 + ; Store the lower 2 bytes of eax to the output when it has enough |
| 13632 + ; space. |
| 13633 + movd eax, xmmA |
| 13634 + cmp ecx, byte SIZEOF_WORD |
| 13635 + jb short .column_st1 |
| 13636 + mov WORD [edi], ax |
| 13637 + add edi, byte SIZEOF_WORD |
| 13638 + sub ecx, byte SIZEOF_WORD |
| 13639 + shr eax, 16 |
| 13640 +.column_st1: |
| 13641 + ; Store the lower 1 byte of eax to the output when it has enough |
| 13642 + ; space. |
| 13643 + test ecx, ecx |
| 13644 + jz short .endcolumn |
| 13645 + mov BYTE [edi], al |
| 13646 |
| 13647 %else ; RGB_PIXELSIZE == 4 ; ----------- |
| 13648 |
| 13649 @@ -392,19 +379,14 @@ |
| 13650 movntdq XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD |
| 13651 movntdq XMMWORD [edi+2*SIZEOF_XMMWORD], xmmC |
| 13652 movntdq XMMWORD [edi+3*SIZEOF_XMMWORD], xmmH |
| 13653 - add edi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr |
| 13654 jmp short .out0 |
| 13655 .out1: ; --(unaligned)----------------- |
| 13656 - pcmpeqb xmmE,xmmE ; xmmE=(all 1's) |
| 13657 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA |
| 13658 - add edi, byte SIZEOF_XMMWORD ; outptr |
| 13659 - maskmovdqu xmmD,xmmE ; movntdqu XMMWORD [edi], xmmD |
| 13660 - add edi, byte SIZEOF_XMMWORD ; outptr |
| 13661 - maskmovdqu xmmC,xmmE ; movntdqu XMMWORD [edi], xmmC |
| 13662 - add edi, byte SIZEOF_XMMWORD ; outptr |
| 13663 - maskmovdqu xmmH,xmmE ; movntdqu XMMWORD [edi], xmmH |
| 13664 - add edi, byte SIZEOF_XMMWORD ; outptr |
| 13665 + movdqu XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA |
| 13666 + movdqu XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD |
| 13667 + movdqu XMMWORD [edi+2*SIZEOF_XMMWORD], xmmC |
| 13668 + movdqu XMMWORD [edi+3*SIZEOF_XMMWORD], xmmH |
| 13669 .out0: |
| 13670 + add edi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr |
| 13671 sub ecx, byte SIZEOF_XMMWORD |
| 13672 jz near .endcolumn |
| 13673 |
| 13674 @@ -418,13 +400,11 @@ |
| 13675 alignx 16,7 |
| 13676 |
| 13677 .column_st32: |
| 13678 - pcmpeqb xmmE,xmmE ; xmmE=(all 1's) |
| 13679 cmp ecx, byte SIZEOF_XMMWORD/2 |
| 13680 jb short .column_st16 |
| 13681 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA |
| 13682 - add edi, byte SIZEOF_XMMWORD ; outptr |
| 13683 - maskmovdqu xmmD,xmmE ; movntdqu XMMWORD [edi], xmmD |
| 13684 - add edi, byte SIZEOF_XMMWORD ; outptr |
| 13685 + movdqu XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA |
| 13686 + movdqu XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD |
| 13687 + add edi, byte 2*SIZEOF_XMMWORD ; outptr |
| 13688 movdqa xmmA,xmmC |
| 13689 movdqa xmmD,xmmH |
| 13690 sub ecx, byte SIZEOF_XMMWORD/2 |
| 13691 @@ -431,50 +411,25 @@ |
| 13692 .column_st16: |
| 13693 cmp ecx, byte SIZEOF_XMMWORD/4 |
| 13694 jb short .column_st15 |
| 13695 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA |
| 13696 + movdqu XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA |
| 13697 add edi, byte SIZEOF_XMMWORD ; outptr |
| 13698 movdqa xmmA,xmmD |
| 13699 sub ecx, byte SIZEOF_XMMWORD/4 |
| 13700 .column_st15: |
| 13701 - cmp ecx, byte SIZEOF_XMMWORD/16 |
| 13702 - jb short .endcolumn |
| 13703 - mov eax,ecx |
| 13704 - xor ecx, byte 0x03 |
| 13705 - inc ecx |
| 13706 - shl ecx, 4 |
| 13707 - movd xmmF,ecx |
| 13708 - psrlq xmmE,xmmF |
| 13709 - punpcklbw xmmE,xmmE |
| 13710 - ; ---------------- |
| 13711 - mov ecx,edi |
| 13712 - and ecx, byte SIZEOF_XMMWORD-1 |
| 13713 - jz short .adj0 |
| 13714 - lea eax, [ecx+eax*4] ; RGB_PIXELSIZE |
| 13715 - cmp eax, byte SIZEOF_XMMWORD |
| 13716 - ja short .adj0 |
| 13717 - and edi, byte (-SIZEOF_XMMWORD) ; align to 16-byte boundary |
| 13718 - shl ecx, 3 ; pslldq xmmA,ecx & pslldq xmmE,ecx |
| 13719 - movdqa xmmB,xmmA |
| 13720 - movdqa xmmG,xmmE |
| 13721 - pslldq xmmA, SIZEOF_XMMWORD/2 |
| 13722 - pslldq xmmE, SIZEOF_XMMWORD/2 |
| 13723 - movd xmmC,ecx |
| 13724 - sub ecx, byte (SIZEOF_XMMWORD/2)*BYTE_BIT |
| 13725 - jb short .adj1 |
| 13726 - movd xmmH,ecx |
| 13727 - psllq xmmA,xmmH |
| 13728 - psllq xmmE,xmmH |
| 13729 - jmp short .adj0 |
| 13730 -.adj1: neg ecx |
| 13731 - movd xmmH,ecx |
| 13732 - psrlq xmmA,xmmH |
| 13733 - psrlq xmmE,xmmH |
| 13734 - psllq xmmB,xmmC |
| 13735 - psllq xmmG,xmmC |
| 13736 - por xmmA,xmmB |
| 13737 - por xmmE,xmmG |
| 13738 -.adj0: ; ---------------- |
| 13739 - maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA |
| 13740 + ; Store two pixels (8 bytes) of xmmA to the output when it has enough |
| 13741 + ; space. |
| 13742 + cmp ecx, byte SIZEOF_XMMWORD/8 |
| 13743 + jb short .column_st7 |
| 13744 + movq XMM_MMWORD [edi], xmmA |
| 13745 + add edi, byte SIZEOF_XMMWORD/8*4 |
| 13746 + sub ecx, byte SIZEOF_XMMWORD/8 |
| 13747 + psrldq xmmA, SIZEOF_XMMWORD/8*4 |
| 13748 +.column_st7: |
| 13749 + ; Store one pixel (4 bytes) of xmmA to the output when it has enough |
| 13750 + ; space. |
| 13751 + test ecx, ecx |
| 13752 + jz short .endcolumn |
| 13753 + movd XMM_DWORD [edi], xmmA |
| 13754 |
| 13755 %endif ; RGB_PIXELSIZE ; --------------- |
| 13756 |
| 13757 @@ -509,7 +464,7 @@ |
| 13758 %define output_buf(b) (b)+20 ; JSAMPARRAY output_buf |
| 13759 |
| 13760 align 16 |
| 13761 - global EXTN(jsimd_h2v2_merged_upsample_sse2) |
| 13762 + global EXTN(jsimd_h2v2_merged_upsample_sse2) PRIVATE |
| 13763 |
| 13764 EXTN(jsimd_h2v2_merged_upsample_sse2): |
| 13765 push ebp |
| 13766 @@ -559,3 +514,6 @@ |
| 13767 pop ebp |
| 13768 ret |
| 13769 |
| 13770 +; For some reason, the OS X linker does not honor the request to align the |
| 13771 +; segment unless we do this. |
| 13772 + align 16 |
| 13773 Index: simd/jdsammmx.asm |
| 13774 =================================================================== |
| 13775 --- simd/jdsammmx.asm (revision 829) |
| 13776 +++ simd/jdsammmx.asm (working copy) |
| 13777 @@ -22,7 +22,7 @@ |
| 13778 SECTION SEG_CONST |
1024 | 13779 |
1025 alignz 16 | 13780 alignz 16 |
1026 -» global» EXTN(jconst_fdct_ifast_sse2) | 13781 -» global» EXTN(jconst_fancy_upsample_mmx) |
1027 +» global» EXTN(jconst_fdct_ifast_sse2) PRIVATE | 13782 +» global» EXTN(jconst_fancy_upsample_mmx) PRIVATE |
1028 | 13783 |
1029 EXTN(jconst_fdct_ifast_sse2): | 13784 EXTN(jconst_fancy_upsample_mmx): |
1030 | 13785 |
1031 @@ -80,7 +80,7 @@ | 13786 @@ -58,7 +58,7 @@ |
1032 %define WK_NUM»» 2 | 13787 %define output_data_ptr(b)» (b)+20» » ; JSAMPARRAY * output_data_ptr |
1033 | 13788 |
1034 » align» 16 | 13789 » align» 16 |
1035 -» global» EXTN(jsimd_fdct_ifast_sse2) | 13790 -» global» EXTN(jsimd_h2v1_fancy_upsample_mmx) |
1036 +» global» EXTN(jsimd_fdct_ifast_sse2) PRIVATE | 13791 +» global» EXTN(jsimd_h2v1_fancy_upsample_mmx) PRIVATE |
1037 | 13792 |
1038 EXTN(jsimd_fdct_ifast_sse2): | 13793 EXTN(jsimd_h2v1_fancy_upsample_mmx): |
1039 » push» rbp | |
1040 Index: simd/jcqntmmx.asm | |
1041 =================================================================== | |
1042 --- simd/jcqntmmx.asm» (revision 829) | |
1043 +++ simd/jcqntmmx.asm» (working copy) | |
1044 @@ -35,7 +35,7 @@ | |
1045 %define workspace» ebp+16» » ; DCTELEM * workspace | |
1046 | |
1047 » align» 16 | |
1048 -» global» EXTN(jsimd_convsamp_mmx) | |
1049 +» global» EXTN(jsimd_convsamp_mmx) PRIVATE | |
1050 | |
1051 EXTN(jsimd_convsamp_mmx): | |
1052 push ebp | 13794 push ebp |
1053 @@ -140,7 +140,7 @@ | 13795 @@ -216,7 +216,7 @@ |
1054 %define workspace» ebp+16» » ; DCTELEM * workspace | 13796 %define gotptr»» wk(0)-SIZEOF_POINTER» ; void * gotptr |
1055 | 13797 |
1056 » align» 16 | 13798 » align» 16 |
1057 -» global» EXTN(jsimd_quantize_mmx) | 13799 -» global» EXTN(jsimd_h2v2_fancy_upsample_mmx) |
1058 +» global» EXTN(jsimd_quantize_mmx) PRIVATE | 13800 +» global» EXTN(jsimd_h2v2_fancy_upsample_mmx) PRIVATE |
1059 | 13801 |
1060 EXTN(jsimd_quantize_mmx): | 13802 EXTN(jsimd_h2v2_fancy_upsample_mmx): |
1061 push ebp | 13803 push ebp |
1062 Index: simd/jimmxfst.asm | 13804 @@ -542,7 +542,7 @@ |
1063 =================================================================== | 13805 %define output_data_ptr(b)» (b)+20» » ; JSAMPARRAY * output_data_ptr |
1064 --- simd/jimmxfst.asm» (revision 829) | 13806 |
1065 +++ simd/jimmxfst.asm» (working copy) | 13807 » align» 16 |
1066 @@ -59,7 +59,7 @@ | 13808 -» global» EXTN(jsimd_h2v1_upsample_mmx) |
1067 %define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS) | 13809 +» global» EXTN(jsimd_h2v1_upsample_mmx) PRIVATE |
1068 | 13810 |
1069 » alignz» 16 | 13811 EXTN(jsimd_h2v1_upsample_mmx): |
1070 -» global» EXTN(jconst_idct_ifast_mmx) | |
1071 +» global» EXTN(jconst_idct_ifast_mmx) PRIVATE | |
1072 | |
1073 EXTN(jconst_idct_ifast_mmx): | |
1074 | |
1075 @@ -94,7 +94,7 @@ | |
1076 » » » » » ; JCOEF workspace[DCTSIZE2] | |
1077 | |
1078 » align» 16 | |
1079 -» global» EXTN(jsimd_idct_ifast_mmx) | |
1080 +» global» EXTN(jsimd_idct_ifast_mmx) PRIVATE | |
1081 | |
1082 EXTN(jsimd_idct_ifast_mmx): | |
1083 push ebp | 13812 push ebp |
1084 Index: simd/jfss2fst.asm | 13813 @@ -643,7 +643,7 @@ |
1085 =================================================================== | 13814 %define output_data_ptr(b)» (b)+20» » ; JSAMPARRAY * output_data_ptr |
1086 --- simd/jfss2fst.asm» (revision 829) | 13815 |
1087 +++ simd/jfss2fst.asm» (working copy) | 13816 » align» 16 |
1088 @@ -52,7 +52,7 @@ | 13817 -» global» EXTN(jsimd_h2v2_upsample_mmx) |
1089 %define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS) | 13818 +» global» EXTN(jsimd_h2v2_upsample_mmx) PRIVATE |
1090 | 13819 |
1091 » alignz» 16 | 13820 EXTN(jsimd_h2v2_upsample_mmx): |
1092 -» global» EXTN(jconst_fdct_ifast_sse2) | |
1093 +» global» EXTN(jconst_fdct_ifast_sse2) PRIVATE | |
1094 | |
1095 EXTN(jconst_fdct_ifast_sse2): | |
1096 | |
1097 @@ -80,7 +80,7 @@ | |
1098 %define WK_NUM»» 2 | |
1099 | |
1100 » align» 16 | |
1101 -» global» EXTN(jsimd_fdct_ifast_sse2) | |
1102 +» global» EXTN(jsimd_fdct_ifast_sse2) PRIVATE | |
1103 | |
1104 EXTN(jsimd_fdct_ifast_sse2): | |
1105 push ebp | 13821 push ebp |
1106 Index: simd/jcgrammx.asm | 13822 @@ -732,3 +732,6 @@ |
1107 =================================================================== | 13823 » pop» ebp |
1108 --- simd/jcgrammx.asm» (revision 829) | 13824 » ret |
1109 +++ simd/jcgrammx.asm» (working copy) | 13825 |
1110 @@ -33,7 +33,7 @@ | 13826 +; For some reason, the OS X linker does not honor the request to align the |
1111 » SECTION»SEG_CONST | 13827 +; segment unless we do this. |
1112 | 13828 +» align» 16 |
1113 » alignz» 16 | |
1114 -» global» EXTN(jconst_rgb_gray_convert_mmx) | |
1115 +» global» EXTN(jconst_rgb_gray_convert_mmx) PRIVATE | |
1116 | |
1117 EXTN(jconst_rgb_gray_convert_mmx): | |
1118 | |
1119 Index: simd/jdcolss2-64.asm | |
1120 =================================================================== | |
1121 --- simd/jdcolss2-64.asm» (revision 829) | |
1122 +++ simd/jdcolss2-64.asm» (working copy) | |
1123 @@ -35,7 +35,7 @@ | |
1124 » SECTION»SEG_CONST | |
1125 | |
1126 » alignz» 16 | |
1127 -» global» EXTN(jconst_ycc_rgb_convert_sse2) | |
1128 +» global» EXTN(jconst_ycc_rgb_convert_sse2) PRIVATE | |
1129 | |
1130 EXTN(jconst_ycc_rgb_convert_sse2): | |
1131 | |
1132 Index: simd/jf3dnflt.asm | |
1133 =================================================================== | |
1134 --- simd/jf3dnflt.asm» (revision 829) | |
1135 +++ simd/jf3dnflt.asm» (working copy) | |
1136 @@ -27,7 +27,7 @@ | |
1137 » SECTION»SEG_CONST | |
1138 | |
1139 » alignz» 16 | |
1140 -» global» EXTN(jconst_fdct_float_3dnow) | |
1141 +» global» EXTN(jconst_fdct_float_3dnow) PRIVATE | |
1142 | |
1143 EXTN(jconst_fdct_float_3dnow): | |
1144 | |
1145 @@ -55,7 +55,7 @@ | |
1146 %define WK_NUM»» 2 | |
1147 | |
1148 » align» 16 | |
1149 -» global» EXTN(jsimd_fdct_float_3dnow) | |
1150 +» global» EXTN(jsimd_fdct_float_3dnow) PRIVATE | |
1151 | |
1152 EXTN(jsimd_fdct_float_3dnow): | |
1153 » push» ebp | |
1154 Index: simd/jdsamss2-64.asm | 13829 Index: simd/jdsamss2-64.asm |
1155 =================================================================== | 13830 =================================================================== |
1156 --- simd/jdsamss2-64.asm (revision 829) | 13831 --- simd/jdsamss2-64.asm (revision 829) |
1157 +++ simd/jdsamss2-64.asm (working copy) | 13832 +++ simd/jdsamss2-64.asm (working copy) |
| 13833 @@ -1,5 +1,5 @@ |
| 13834 ; |
| 13835 -; jdsamss2.asm - upsampling (64-bit SSE2) |
| 13836 +; jdsamss2-64.asm - upsampling (64-bit SSE2) |
| 13837 ; |
| 13838 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB |
| 13839 ; Copyright 2009 D. R. Commander |
1158 @@ -23,7 +23,7 @@ | 13840 @@ -23,7 +23,7 @@ |
1159 SECTION SEG_CONST | 13841 SECTION SEG_CONST |
1160 | 13842 |
1161 alignz 16 | 13843 alignz 16 |
1162 - global EXTN(jconst_fancy_upsample_sse2) | 13844 - global EXTN(jconst_fancy_upsample_sse2) |
1163 + global EXTN(jconst_fancy_upsample_sse2) PRIVATE | 13845 + global EXTN(jconst_fancy_upsample_sse2) PRIVATE |
1164 | 13846 |
1165 EXTN(jconst_fancy_upsample_sse2): | 13847 EXTN(jconst_fancy_upsample_sse2): |
1166 | 13848 |
1167 @@ -59,7 +59,7 @@ | 13849 @@ -59,10 +59,11 @@ |
1168 ; r13 = JSAMPARRAY * output_data_ptr | 13850 ; r13 = JSAMPARRAY * output_data_ptr |
1169 | 13851 |
1170 align 16 | 13852 align 16 |
1171 - global EXTN(jsimd_h2v1_fancy_upsample_sse2) | 13853 - global EXTN(jsimd_h2v1_fancy_upsample_sse2) |
1172 + global EXTN(jsimd_h2v1_fancy_upsample_sse2) PRIVATE | 13854 + global EXTN(jsimd_h2v1_fancy_upsample_sse2) PRIVATE |
1173 | 13855 |
1174 EXTN(jsimd_h2v1_fancy_upsample_sse2): | 13856 EXTN(jsimd_h2v1_fancy_upsample_sse2): |
1175 push rbp | 13857 push rbp |
1176 @@ -201,7 +201,7 @@ | 13858 +» mov» rax,rsp |
| 13859 » mov» rbp,rsp |
| 13860 » collect_args |
| 13861 |
| 13862 @@ -200,7 +201,7 @@ |
1177 %define WK_NUM 4 | 13863 %define WK_NUM 4 |
1178 | 13864 |
1179 align 16 | 13865 align 16 |
1180 - global EXTN(jsimd_h2v2_fancy_upsample_sse2) | 13866 - global EXTN(jsimd_h2v2_fancy_upsample_sse2) |
1181 + global EXTN(jsimd_h2v2_fancy_upsample_sse2) PRIVATE | 13867 + global EXTN(jsimd_h2v2_fancy_upsample_sse2) PRIVATE |
1182 | 13868 |
1183 EXTN(jsimd_h2v2_fancy_upsample_sse2): | 13869 EXTN(jsimd_h2v2_fancy_upsample_sse2): |
1184 push rbp | 13870 push rbp |
1185 @@ -498,7 +498,7 @@ | 13871 @@ -210,8 +211,8 @@ |
| 13872 » mov» [rsp],rax |
| 13873 » mov» rbp,rsp»» » » ; rbp = aligned rbp |
| 13874 » lea» rsp, [wk(0)] |
| 13875 +» collect_args |
| 13876 » push» rbx |
| 13877 -» collect_args |
| 13878 |
| 13879 » mov» rax, r11 ; colctr |
| 13880 » test» rax,rax |
| 13881 @@ -472,8 +473,8 @@ |
| 13882 » jg» near .rowloop |
| 13883 |
| 13884 .return: |
| 13885 +» pop» rbx |
| 13886 » uncollect_args |
| 13887 -» pop» rbx |
| 13888 » mov» rsp,rbp»» ; rsp <- aligned rbp |
| 13889 » pop» rsp» » ; rsp <- original rbp |
| 13890 » pop» rbp |
| 13891 @@ -497,10 +498,11 @@ |
1186 ; r13 = JSAMPARRAY * output_data_ptr | 13892 ; r13 = JSAMPARRAY * output_data_ptr |
1187 | 13893 |
1188 align 16 | 13894 align 16 |
1189 - global EXTN(jsimd_h2v1_upsample_sse2) | 13895 - global EXTN(jsimd_h2v1_upsample_sse2) |
1190 + global EXTN(jsimd_h2v1_upsample_sse2) PRIVATE | 13896 + global EXTN(jsimd_h2v1_upsample_sse2) PRIVATE |
1191 | 13897 |
1192 EXTN(jsimd_h2v1_upsample_sse2): | 13898 EXTN(jsimd_h2v1_upsample_sse2): |
1193 push rbp | 13899 push rbp |
1194 @@ -587,7 +587,7 @@ | 13900 +» mov» rax,rsp |
| 13901 » mov» rbp,rsp |
| 13902 » collect_args |
| 13903 |
| 13904 @@ -585,13 +587,14 @@ |
1195 ; r13 = JSAMPARRAY * output_data_ptr | 13905 ; r13 = JSAMPARRAY * output_data_ptr |
1196 | 13906 |
1197 align 16 | 13907 align 16 |
1198 - global EXTN(jsimd_h2v2_upsample_sse2) | 13908 - global EXTN(jsimd_h2v2_upsample_sse2) |
1199 + global EXTN(jsimd_h2v2_upsample_sse2) PRIVATE | 13909 + global EXTN(jsimd_h2v2_upsample_sse2) PRIVATE |
1200 | 13910 |
1201 EXTN(jsimd_h2v2_upsample_sse2): | 13911 EXTN(jsimd_h2v2_upsample_sse2): |
1202 push rbp | 13912 push rbp |
1203 Index: simd/jcgrass2.asm | 13913 +» mov» rax,rsp |
1204 =================================================================== | 13914 » mov» rbp,rsp |
1205 --- simd/jcgrass2.asm» (revision 829) | 13915 +» collect_args |
1206 +++ simd/jcgrass2.asm» (working copy) | 13916 » push» rbx |
1207 @@ -30,7 +30,7 @@ | 13917 -» collect_args |
1208 » SECTION»SEG_CONST | 13918 |
1209 | 13919 » mov» rdx, r11 |
1210 » alignz» 16 | 13920 » add» rdx, byte (2*SIZEOF_XMMWORD)-1 |
1211 -» global» EXTN(jconst_rgb_gray_convert_sse2) | 13921 @@ -658,7 +661,11 @@ |
1212 +» global» EXTN(jconst_rgb_gray_convert_sse2) PRIVATE | 13922 » jg» near .rowloop |
1213 | 13923 |
1214 EXTN(jconst_rgb_gray_convert_sse2): | 13924 .return: |
1215 | 13925 +» pop» rbx |
1216 Index: simd/jcsammmx.asm | 13926 » uncollect_args |
1217 =================================================================== | 13927 -» pop» rbx |
1218 --- simd/jcsammmx.asm» (revision 829) | 13928 » pop» rbp |
1219 +++ simd/jcsammmx.asm» (working copy) | 13929 » ret |
1220 @@ -40,7 +40,7 @@ | |
1221 %define output_data(b)»(b)+28» ; JSAMPARRAY output_data | |
1222 | |
1223 » align» 16 | |
1224 -» global» EXTN(jsimd_h2v1_downsample_mmx) | |
1225 +» global» EXTN(jsimd_h2v1_downsample_mmx) PRIVATE | |
1226 | |
1227 EXTN(jsimd_h2v1_downsample_mmx): | |
1228 » push» ebp | |
1229 @@ -182,7 +182,7 @@ | |
1230 %define output_data(b)»(b)+28» ; JSAMPARRAY output_data | |
1231 | |
1232 » align» 16 | |
1233 -» global» EXTN(jsimd_h2v2_downsample_mmx) | |
1234 +» global» EXTN(jsimd_h2v2_downsample_mmx) PRIVATE | |
1235 | |
1236 EXTN(jsimd_h2v2_downsample_mmx): | |
1237 » push» ebp | |
1238 +Index: simd/jsimd_arm.c | |
1239 +=================================================================== | |
1240 +--- simd/jsimd_arm.c (revision 272637) | |
1241 ++++ simd/jsimd_arm.c (working copy) | |
1242 +@@ -29,0 +29,0 @@ | |
1243 + | 13930 + |
1244 + static unsigned int simd_support = ~0; | 13931 +; For some reason, the OS X linker does not honor the request to align the |
| 13932 +; segment unless we do this. |
| 13933 +» align» 16 |
| 13934 Index: simd/jdsamss2.asm |
| 13935 =================================================================== |
| 13936 --- simd/jdsamss2.asm» (revision 829) |
| 13937 +++ simd/jdsamss2.asm» (working copy) |
| 13938 @@ -22,7 +22,7 @@ |
| 13939 » SECTION»SEG_CONST |
| 13940 |
| 13941 » alignz» 16 |
| 13942 -» global» EXTN(jconst_fancy_upsample_sse2) |
| 13943 +» global» EXTN(jconst_fancy_upsample_sse2) PRIVATE |
| 13944 |
| 13945 EXTN(jconst_fancy_upsample_sse2): |
| 13946 |
| 13947 @@ -58,7 +58,7 @@ |
| 13948 %define output_data_ptr(b)» (b)+20» » ; JSAMPARRAY * output_data_ptr |
| 13949 |
| 13950 » align» 16 |
| 13951 -» global» EXTN(jsimd_h2v1_fancy_upsample_sse2) |
| 13952 +» global» EXTN(jsimd_h2v1_fancy_upsample_sse2) PRIVATE |
| 13953 |
| 13954 EXTN(jsimd_h2v1_fancy_upsample_sse2): |
| 13955 » push» ebp |
| 13956 @@ -214,7 +214,7 @@ |
| 13957 %define gotptr»» wk(0)-SIZEOF_POINTER» ; void * gotptr |
| 13958 |
| 13959 » align» 16 |
| 13960 -» global» EXTN(jsimd_h2v2_fancy_upsample_sse2) |
| 13961 +» global» EXTN(jsimd_h2v2_fancy_upsample_sse2) PRIVATE |
| 13962 |
| 13963 EXTN(jsimd_h2v2_fancy_upsample_sse2): |
| 13964 » push» ebp |
| 13965 @@ -538,7 +538,7 @@ |
| 13966 %define output_data_ptr(b)» (b)+20» » ; JSAMPARRAY * output_data_ptr |
| 13967 |
| 13968 » align» 16 |
| 13969 -» global» EXTN(jsimd_h2v1_upsample_sse2) |
| 13970 +» global» EXTN(jsimd_h2v1_upsample_sse2) PRIVATE |
| 13971 |
| 13972 EXTN(jsimd_h2v1_upsample_sse2): |
| 13973 » push» ebp |
| 13974 @@ -637,7 +637,7 @@ |
| 13975 %define output_data_ptr(b)» (b)+20» » ; JSAMPARRAY * output_data_ptr |
| 13976 |
| 13977 » align» 16 |
| 13978 -» global» EXTN(jsimd_h2v2_upsample_sse2) |
| 13979 +» global» EXTN(jsimd_h2v2_upsample_sse2) PRIVATE |
| 13980 |
| 13981 EXTN(jsimd_h2v2_upsample_sse2): |
| 13982 » push» ebp |
| 13983 @@ -724,3 +724,6 @@ |
| 13984 » pop» ebp |
| 13985 » ret |
| 13986 |
| 13987 +; For some reason, the OS X linker does not honor the request to align the |
| 13988 +; segment unless we do this. |
| 13989 +» align» 16 |
| 13990 Index: simd/jf3dnflt.asm |
| 13991 =================================================================== |
| 13992 --- simd/jf3dnflt.asm» (revision 829) |
| 13993 +++ simd/jf3dnflt.asm» (working copy) |
| 13994 @@ -27,7 +27,7 @@ |
| 13995 » SECTION»SEG_CONST |
| 13996 |
| 13997 » alignz» 16 |
| 13998 -» global» EXTN(jconst_fdct_float_3dnow) |
| 13999 +» global» EXTN(jconst_fdct_float_3dnow) PRIVATE |
| 14000 |
| 14001 EXTN(jconst_fdct_float_3dnow): |
| 14002 |
| 14003 @@ -55,7 +55,7 @@ |
| 14004 %define WK_NUM»» 2 |
| 14005 |
| 14006 » align» 16 |
| 14007 -» global» EXTN(jsimd_fdct_float_3dnow) |
| 14008 +» global» EXTN(jsimd_fdct_float_3dnow) PRIVATE |
| 14009 |
| 14010 EXTN(jsimd_fdct_float_3dnow): |
| 14011 » push» ebp |
| 14012 @@ -315,3 +315,6 @@ |
| 14013 » pop» ebp |
| 14014 » ret |
| 14015 |
| 14016 +; For some reason, the OS X linker does not honor the request to align the |
| 14017 +; segment unless we do this. |
| 14018 +» align» 16 |
| 14019 Index: simd/jfmmxfst.asm |
| 14020 =================================================================== |
| 14021 --- simd/jfmmxfst.asm» (revision 829) |
| 14022 +++ simd/jfmmxfst.asm» (working copy) |
| 14023 @@ -52,7 +52,7 @@ |
| 14024 %define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS) |
| 14025 |
| 14026 » alignz» 16 |
| 14027 -» global» EXTN(jconst_fdct_ifast_mmx) |
| 14028 +» global» EXTN(jconst_fdct_ifast_mmx) PRIVATE |
| 14029 |
| 14030 EXTN(jconst_fdct_ifast_mmx): |
| 14031 |
| 14032 @@ -80,7 +80,7 @@ |
| 14033 %define WK_NUM»» 2 |
| 14034 |
| 14035 » align» 16 |
| 14036 -» global» EXTN(jsimd_fdct_ifast_mmx) |
| 14037 +» global» EXTN(jsimd_fdct_ifast_mmx) PRIVATE |
| 14038 |
| 14039 EXTN(jsimd_fdct_ifast_mmx): |
| 14040 » push» ebp |
| 14041 @@ -392,3 +392,6 @@ |
| 14042 » pop» ebp |
| 14043 » ret |
| 14044 |
| 14045 +; For some reason, the OS X linker does not honor the request to align the |
| 14046 +; segment unless we do this. |
| 14047 +» align» 16 |
| 14048 Index: simd/jfmmxint.asm |
| 14049 =================================================================== |
| 14050 --- simd/jfmmxint.asm» (revision 829) |
| 14051 +++ simd/jfmmxint.asm» (working copy) |
| 14052 @@ -66,7 +66,7 @@ |
| 14053 » SECTION»SEG_CONST |
| 14054 |
| 14055 » alignz» 16 |
| 14056 -» global» EXTN(jconst_fdct_islow_mmx) |
| 14057 +» global» EXTN(jconst_fdct_islow_mmx) PRIVATE |
| 14058 |
| 14059 EXTN(jconst_fdct_islow_mmx): |
| 14060 |
| 14061 @@ -101,7 +101,7 @@ |
| 14062 %define WK_NUM»» 2 |
| 14063 |
| 14064 » align» 16 |
| 14065 -» global» EXTN(jsimd_fdct_islow_mmx) |
| 14066 +» global» EXTN(jsimd_fdct_islow_mmx) PRIVATE |
| 14067 |
| 14068 EXTN(jsimd_fdct_islow_mmx): |
| 14069 » push» ebp |
| 14070 @@ -617,3 +617,6 @@ |
| 14071 » pop» ebp |
| 14072 » ret |
| 14073 |
| 14074 +; For some reason, the OS X linker does not honor the request to align the |
| 14075 +; segment unless we do this. |
| 14076 +» align» 16 |
| 14077 Index: simd/jfss2fst-64.asm |
| 14078 =================================================================== |
| 14079 --- simd/jfss2fst-64.asm» (revision 829) |
| 14080 +++ simd/jfss2fst-64.asm» (working copy) |
| 14081 @@ -1,5 +1,5 @@ |
| 14082 ; |
| 14083 -; jfss2fst.asm - fast integer FDCT (64-bit SSE2) |
| 14084 +; jfss2fst-64.asm - fast integer FDCT (64-bit SSE2) |
| 14085 ; |
| 14086 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB |
| 14087 ; Copyright 2009 D. R. Commander |
| 14088 @@ -53,7 +53,7 @@ |
| 14089 %define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS) |
| 14090 |
| 14091 » alignz» 16 |
| 14092 -» global» EXTN(jconst_fdct_ifast_sse2) |
| 14093 +» global» EXTN(jconst_fdct_ifast_sse2) PRIVATE |
| 14094 |
| 14095 EXTN(jconst_fdct_ifast_sse2): |
| 14096 |
| 14097 @@ -80,7 +80,7 @@ |
| 14098 %define WK_NUM»» 2 |
| 14099 |
| 14100 » align» 16 |
| 14101 -» global» EXTN(jsimd_fdct_ifast_sse2) |
| 14102 +» global» EXTN(jsimd_fdct_ifast_sse2) PRIVATE |
| 14103 |
| 14104 EXTN(jsimd_fdct_ifast_sse2): |
| 14105 » push» rbp |
| 14106 @@ -386,3 +386,7 @@ |
| 14107 » pop» rsp» » ; rsp <- original rbp |
| 14108 » pop» rbp |
| 14109 » ret |
1245 + | 14110 + |
1246 +-#if defined(__linux__) || defined(ANDROID) || defined(__ANDROID__) | 14111 +; For some reason, the OS X linker does not honor the request to align the |
 1247 ++#if !defined(__ARM_NEON__) && (defined(__linux__) || defined(ANDROID) || defined(__ANDROID__)) | 14112 +; segment unless we do this. |
| 14113 +» align» 16 |
| 14114 Index: simd/jfss2fst.asm |
| 14115 =================================================================== |
| 14116 --- simd/jfss2fst.asm» (revision 829) |
| 14117 +++ simd/jfss2fst.asm» (working copy) |
| 14118 @@ -52,7 +52,7 @@ |
| 14119 %define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS) |
| 14120 |
| 14121 » alignz» 16 |
| 14122 -» global» EXTN(jconst_fdct_ifast_sse2) |
| 14123 +» global» EXTN(jconst_fdct_ifast_sse2) PRIVATE |
| 14124 |
| 14125 EXTN(jconst_fdct_ifast_sse2): |
| 14126 |
| 14127 @@ -80,7 +80,7 @@ |
| 14128 %define WK_NUM»» 2 |
| 14129 |
| 14130 » align» 16 |
| 14131 -» global» EXTN(jsimd_fdct_ifast_sse2) |
| 14132 +» global» EXTN(jsimd_fdct_ifast_sse2) PRIVATE |
| 14133 |
| 14134 EXTN(jsimd_fdct_ifast_sse2): |
| 14135 » push» ebp |
| 14136 @@ -399,3 +399,6 @@ |
| 14137 » pop» ebp |
| 14138 » ret |
| 14139 |
| 14140 +; For some reason, the OS X linker does not honor the request to align the |
| 14141 +; segment unless we do this. |
| 14142 +» align» 16 |
| 14143 Index: simd/jfss2int-64.asm |
| 14144 =================================================================== |
| 14145 --- simd/jfss2int-64.asm» (revision 829) |
| 14146 +++ simd/jfss2int-64.asm» (working copy) |
| 14147 @@ -1,5 +1,5 @@ |
| 14148 ; |
| 14149 -; jfss2int.asm - accurate integer FDCT (64-bit SSE2) |
| 14150 +; jfss2int-64.asm - accurate integer FDCT (64-bit SSE2) |
| 14151 ; |
| 14152 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB |
| 14153 ; Copyright 2009 D. R. Commander |
| 14154 @@ -67,7 +67,7 @@ |
| 14155 » SECTION»SEG_CONST |
| 14156 |
| 14157 » alignz» 16 |
| 14158 -» global» EXTN(jconst_fdct_islow_sse2) |
| 14159 +» global» EXTN(jconst_fdct_islow_sse2) PRIVATE |
| 14160 |
| 14161 EXTN(jconst_fdct_islow_sse2): |
| 14162 |
| 14163 @@ -101,7 +101,7 @@ |
| 14164 %define WK_NUM»» 6 |
| 14165 |
| 14166 » align» 16 |
| 14167 -» global» EXTN(jsimd_fdct_islow_sse2) |
| 14168 +» global» EXTN(jsimd_fdct_islow_sse2) PRIVATE |
| 14169 |
| 14170 EXTN(jsimd_fdct_islow_sse2): |
| 14171 » push» rbp |
| 14172 @@ -616,3 +616,7 @@ |
| 14173 » pop» rsp» » ; rsp <- original rbp |
| 14174 » pop» rbp |
| 14175 » ret |
1248 + | 14176 + |
1249 + #define SOMEWHAT_SANE_PROC_CPUINFO_SIZE_LIMIT (1024 * 1024) | 14177 +; For some reason, the OS X linker does not honor the request to align the |
1250 + | 14178 +; segment unless we do this. |
1251 +@@ -100,6 +100,6 @@ | 14179 +» align» 16 |
1252 + init_simd (void) | |
1253 + { | |
1254 + char *env = NULL; | |
1255 +-#if !defined(__ARM_NEON__) && defined(__linux__) || defined(ANDROID) || defined(__ANDROID__) | |
1256 ++#if !defined(__ARM_NEON__) && (defined(__linux__) || defined(ANDROID) || defined(__ANDROID__)) | |
1257 + int bufsize = 1024; /* an initial guess for the line buffer size limit */ | |
1258 + #endif | |
1259 + | |
1260 Index: simd/jsimd_arm_neon.S | |
1261 =================================================================== | |
1262 --- simd/jsimd_arm_neon.S» (revision 272637) | |
1263 +++ simd/jsimd_arm_neon.S» (working copy) | |
1264 @@ -41,11 +41,9 @@ | |
1265 /* Supplementary macro for setting function attributes */ | |
1266 .macro asm_function fname | |
1267 #ifdef __APPLE__ | |
1268 - .func _\fname | |
1269 .globl _\fname | |
1270 _\fname: | |
1271 #else | |
1272 - .func \fname | |
1273 .global \fname | |
1274 #ifdef __ELF__ | |
1275 .hidden \fname | |
1276 @@ -670,7 +668,6 @@ | |
1277 .unreq ROW6R | |
1278 .unreq ROW7L | |
1279 .unreq ROW7R | |
1280 -.endfunc | |
1281 | |
1282 | |
1283 /*****************************************************************************/ | |
1284 @@ -895,7 +892,6 @@ | |
1285 .unreq TMP2 | |
1286 .unreq TMP3 | |
1287 .unreq TMP4 | |
1288 -.endfunc | |
1289 | |
1290 | |
1291 /*****************************************************************************/ | |
1292 @@ -1108,7 +1104,6 @@ | |
1293 .unreq TMP2 | |
1294 .unreq TMP3 | |
1295 .unreq TMP4 | |
1296 -.endfunc | |
1297 | |
1298 .purgem idct_helper | |
1299 | |
1300 @@ -1263,7 +1258,6 @@ | |
1301 .unreq OUTPUT_COL | |
1302 .unreq TMP1 | |
1303 .unreq TMP2 | |
1304 -.endfunc | |
1305 | |
1306 .purgem idct_helper | |
1307 | |
1308 @@ -1547,7 +1541,6 @@ | |
1309 .unreq U | |
1310 .unreq V | |
1311 .unreq N | |
1312 -.endfunc | |
1313 | |
1314 .purgem do_yuv_to_rgb | |
1315 .purgem do_yuv_to_rgb_stage1 | |
1316 @@ -1858,7 +1851,6 @@ | |
1317 .unreq U | |
1318 .unreq V | |
1319 .unreq N | |
1320 -.endfunc | |
1321 | |
1322 .purgem do_rgb_to_yuv | |
1323 .purgem do_rgb_to_yuv_stage1 | |
1324 @@ -1940,7 +1932,6 @@ | |
1325 .unreq TMP2 | |
1326 .unreq TMP3 | |
1327 .unreq TMP4 | |
1328 -.endfunc | |
1329 | |
1330 | |
1331 /*****************************************************************************/ | |
1332 @@ -2064,7 +2055,6 @@ | |
1333 | |
1334 .unreq DATA | |
1335 .unreq TMP | |
1336 -.endfunc | |
1337 | |
1338 | |
1339 /*****************************************************************************/ | |
1340 @@ -2166,7 +2156,6 @@ | |
1341 .unreq CORRECTION | |
1342 .unreq SHIFT | |
1343 .unreq LOOP_COUNT | |
1344 -.endfunc | |
1345 | |
1346 | |
1347 /*****************************************************************************/ | |
1348 @@ -2401,7 +2390,6 @@ | |
1349 .unreq WIDTH | |
1350 .unreq TMP | |
1351 | |
1352 -.endfunc | |
1353 | |
1354 .purgem upsample16 | |
1355 .purgem upsample32 | |
1356 Index: simd/jsimd_i386.c | |
1357 =================================================================== | |
1358 --- simd/jsimd_i386.c» (revision 829) | |
1359 +++ simd/jsimd_i386.c» (working copy) | |
1360 @@ -61,6 +61,7 @@ | |
1361 simd_support &= JSIMD_SSE2; | |
1362 } | |
1363 | |
1364 +#ifndef JPEG_DECODE_ONLY | |
1365 GLOBAL(int) | |
1366 jsimd_can_rgb_ycc (void) | |
1367 { | |
1368 @@ -82,6 +83,7 @@ | |
1369 | |
1370 return 0; | |
1371 } | |
1372 +#endif | |
1373 | |
1374 GLOBAL(int) | |
1375 jsimd_can_rgb_gray (void) | |
1376 @@ -127,6 +129,7 @@ | |
1377 return 0; | |
1378 } | |
1379 | |
1380 +#ifndef JPEG_DECODE_ONLY | |
1381 GLOBAL(void) | |
1382 jsimd_rgb_ycc_convert (j_compress_ptr cinfo, | |
1383 JSAMPARRAY input_buf, JSAMPIMAGE output_buf, | |
1384 @@ -179,6 +182,7 @@ | |
1385 mmxfct(cinfo->image_width, input_buf, | |
1386 output_buf, output_row, num_rows); | |
1387 } | |
1388 +#endif | |
1389 | |
1390 GLOBAL(void) | |
1391 jsimd_rgb_gray_convert (j_compress_ptr cinfo, | |
1392 @@ -286,6 +290,7 @@ | |
1393 input_row, output_buf, num_rows); | |
1394 } | |
1395 | |
1396 +#ifndef JPEG_DECODE_ONLY | |
1397 GLOBAL(int) | |
1398 jsimd_can_h2v2_downsample (void) | |
1399 { | |
1400 @@ -351,6 +356,7 @@ | |
1401 compptr->v_samp_factor, compptr->width_in_blocks, | |
1402 input_data, output_data); | |
1403 } | |
1404 +#endif | |
1405 | |
1406 GLOBAL(int) | |
1407 jsimd_can_h2v2_upsample (void) | |
1408 @@ -636,6 +642,7 @@ | |
1409 in_row_group_ctr, output_buf); | |
1410 } | |
1411 | |
1412 +#ifndef JPEG_DECODE_ONLY | |
1413 GLOBAL(int) | |
1414 jsimd_can_convsamp (void) | |
1415 { | |
1416 @@ -855,6 +862,7 @@ | |
1417 else if (simd_support & JSIMD_3DNOW) | |
1418 jsimd_quantize_float_3dnow(coef_block, divisors, workspace); | |
1419 } | |
1420 +#endif | |
1421 | |
1422 GLOBAL(int) | |
1423 jsimd_can_idct_2x2 (void) | |
1424 @@ -1045,4 +1053,3 @@ | |
1425 jsimd_idct_float_3dnow(compptr->dct_table, coef_block, | |
1426 output_buf, output_col); | |
1427 } | |
1428 - | |
1429 Index: simd/jcqnts2f-64.asm | |
1430 =================================================================== | |
1431 --- simd/jcqnts2f-64.asm» (revision 829) | |
1432 +++ simd/jcqnts2f-64.asm» (working copy) | |
1433 @@ -36,7 +36,7 @@ | |
1434 ; r12 = FAST_FLOAT * workspace | |
1435 | |
1436 » align» 16 | |
1437 -» global» EXTN(jsimd_convsamp_float_sse2) | |
1438 +» global» EXTN(jsimd_convsamp_float_sse2) PRIVATE | |
1439 | |
1440 EXTN(jsimd_convsamp_float_sse2): | |
1441 » push» rbp | |
1442 @@ -110,7 +110,7 @@ | |
1443 ; r12 = FAST_FLOAT * workspace | |
1444 | |
1445 » align» 16 | |
1446 -» global» EXTN(jsimd_quantize_float_sse2) | |
1447 +» global» EXTN(jsimd_quantize_float_sse2) PRIVATE | |
1448 | |
1449 EXTN(jsimd_quantize_float_sse2): | |
1450 » push» rbp | |
1451 Index: simd/jcqnt3dn.asm | |
1452 =================================================================== | |
1453 --- simd/jcqnt3dn.asm» (revision 829) | |
1454 +++ simd/jcqnt3dn.asm» (working copy) | |
1455 @@ -35,7 +35,7 @@ | |
1456 %define workspace» ebp+16» » ; FAST_FLOAT * workspace | |
1457 | |
1458 » align» 16 | |
1459 -» global» EXTN(jsimd_convsamp_float_3dnow) | |
1460 +» global» EXTN(jsimd_convsamp_float_3dnow) PRIVATE | |
1461 | |
1462 EXTN(jsimd_convsamp_float_3dnow): | |
1463 » push» ebp | |
1464 @@ -138,7 +138,7 @@ | |
1465 %define workspace» ebp+16» » ; FAST_FLOAT * workspace | |
1466 | |
1467 » align» 16 | |
1468 -» global» EXTN(jsimd_quantize_float_3dnow) | |
1469 +» global» EXTN(jsimd_quantize_float_3dnow) PRIVATE | |
1470 | |
1471 EXTN(jsimd_quantize_float_3dnow): | |
1472 » push» ebp | |
1473 Index: simd/jcsamss2.asm | |
1474 =================================================================== | |
1475 --- simd/jcsamss2.asm» (revision 829) | |
1476 +++ simd/jcsamss2.asm» (working copy) | |
1477 @@ -40,7 +40,7 @@ | |
1478 %define output_data(b)»(b)+28» » ; JSAMPARRAY output_data | |
1479 | |
1480 » align» 16 | |
1481 -» global» EXTN(jsimd_h2v1_downsample_sse2) | |
1482 +» global» EXTN(jsimd_h2v1_downsample_sse2) PRIVATE | |
1483 | |
1484 EXTN(jsimd_h2v1_downsample_sse2): | |
1485 » push» ebp | |
1486 @@ -195,7 +195,7 @@ | |
1487 %define output_data(b)»(b)+28» ; JSAMPARRAY output_data | |
1488 | |
1489 » align» 16 | |
1490 -» global» EXTN(jsimd_h2v2_downsample_sse2) | |
1491 +» global» EXTN(jsimd_h2v2_downsample_sse2) PRIVATE | |
1492 | |
1493 EXTN(jsimd_h2v2_downsample_sse2): | |
1494 » push» ebp | |
1495 Index: simd/jsimd_x86_64.c | |
1496 =================================================================== | |
1497 --- simd/jsimd_x86_64.c»(revision 829) | |
1498 +++ simd/jsimd_x86_64.c»(working copy) | |
1499 @@ -29,6 +29,7 @@ | |
1500 | |
1501 #define IS_ALIGNED_SSE(ptr) (IS_ALIGNED(ptr, 4)) /* 16 byte alignment */ | |
1502 | |
1503 +#ifndef JPEG_DECODE_ONLY | |
1504 GLOBAL(int) | |
1505 jsimd_can_rgb_ycc (void) | |
1506 { | |
1507 @@ -45,6 +46,7 @@ | |
1508 | |
1509 return 1; | |
1510 } | |
1511 +#endif | |
1512 | |
1513 GLOBAL(int) | |
1514 jsimd_can_rgb_gray (void) | |
1515 @@ -80,6 +82,7 @@ | |
1516 return 1; | |
1517 } | |
1518 | |
1519 +#ifndef JPEG_DECODE_ONLY | |
1520 GLOBAL(void) | |
1521 jsimd_rgb_ycc_convert (j_compress_ptr cinfo, | |
1522 JSAMPARRAY input_buf, JSAMPIMAGE output_buf, | |
1523 @@ -118,6 +121,7 @@ | |
1524 | |
1525 sse2fct(cinfo->image_width, input_buf, output_buf, output_row, num_rows); | |
1526 } | |
1527 +#endif | |
1528 | |
1529 GLOBAL(void) | |
1530 jsimd_rgb_gray_convert (j_compress_ptr cinfo, | |
1531 @@ -197,6 +201,7 @@ | |
1532 sse2fct(cinfo->output_width, input_buf, input_row, output_buf, num_rows); | |
1533 } | |
1534 | |
1535 +#ifndef JPEG_DECODE_ONLY | |
1536 GLOBAL(int) | |
1537 jsimd_can_h2v2_downsample (void) | |
1538 { | |
1539 @@ -242,6 +247,7 @@ | |
1540 compptr->width_in_blocks, | |
1541 input_data, output_data); | |
1542 } | |
1543 +#endif | |
1544 | |
1545 GLOBAL(int) | |
1546 jsimd_can_h2v2_upsample (void) | |
1547 @@ -451,6 +457,7 @@ | |
1548 sse2fct(cinfo->output_width, input_buf, in_row_group_ctr, output_buf); | |
1549 } | |
1550 | |
1551 +#ifndef JPEG_DECODE_ONLY | |
1552 GLOBAL(int) | |
1553 jsimd_can_convsamp (void) | |
1554 { | |
1555 @@ -601,6 +608,7 @@ | |
1556 { | |
1557 jsimd_quantize_float_sse2(coef_block, divisors, workspace); | |
1558 } | |
1559 +#endif | |
1560 | |
1561 GLOBAL(int) | |
1562 jsimd_can_idct_2x2 (void) | |
1563 @@ -750,4 +758,3 @@ | |
1564 jsimd_idct_float_sse2(compptr->dct_table, coef_block, | |
1565 output_buf, output_col); | |
1566 } | |
1567 - | |
1568 Index: simd/jimmxint.asm | |
1569 =================================================================== | |
1570 --- simd/jimmxint.asm» (revision 829) | |
1571 +++ simd/jimmxint.asm» (working copy) | |
1572 @@ -66,7 +66,7 @@ | |
1573 » SECTION»SEG_CONST | |
1574 | |
1575 » alignz» 16 | |
1576 -» global» EXTN(jconst_idct_islow_mmx) | |
1577 +» global» EXTN(jconst_idct_islow_mmx) PRIVATE | |
1578 | |
1579 EXTN(jconst_idct_islow_mmx): | |
1580 | |
1581 @@ -107,7 +107,7 @@ | |
1582 » » » » » ; JCOEF workspace[DCTSIZE2] | |
1583 | |
1584 » align» 16 | |
1585 -» global» EXTN(jsimd_idct_islow_mmx) | |
1586 +» global» EXTN(jsimd_idct_islow_mmx) PRIVATE | |
1587 | |
1588 EXTN(jsimd_idct_islow_mmx): | |
1589 » push» ebp | |
1590 Index: simd/jcgrymmx.asm | |
1591 =================================================================== | |
1592 --- simd/jcgrymmx.asm» (revision 829) | |
1593 +++ simd/jcgrymmx.asm» (working copy) | |
1594 @@ -41,7 +41,7 @@ | |
1595 %define gotptr»» wk(0)-SIZEOF_POINTER» ; void * gotptr | |
1596 | |
1597 » align» 16 | |
1598 -» global» EXTN(jsimd_rgb_gray_convert_mmx) | |
1599 +» global» EXTN(jsimd_rgb_gray_convert_mmx) PRIVATE | |
1600 | |
1601 EXTN(jsimd_rgb_gray_convert_mmx): | |
1602 » push» ebp | |
1603 Index: simd/jfss2int.asm | 14180 Index: simd/jfss2int.asm |
1604 =================================================================== | 14181 =================================================================== |
1605 --- simd/jfss2int.asm (revision 829) | 14182 --- simd/jfss2int.asm (revision 829) |
1606 +++ simd/jfss2int.asm (working copy) | 14183 +++ simd/jfss2int.asm (working copy) |
1607 @@ -66,7 +66,7 @@ | 14184 @@ -66,7 +66,7 @@ |
1608 SECTION SEG_CONST | 14185 SECTION SEG_CONST |
1609 | 14186 |
1610 alignz 16 | 14187 alignz 16 |
1611 - global EXTN(jconst_fdct_islow_sse2) | 14188 - global EXTN(jconst_fdct_islow_sse2) |
1612 + global EXTN(jconst_fdct_islow_sse2) PRIVATE | 14189 + global EXTN(jconst_fdct_islow_sse2) PRIVATE |
1613 | 14190 |
1614 EXTN(jconst_fdct_islow_sse2): | 14191 EXTN(jconst_fdct_islow_sse2): |
1615 | 14192 |
1616 @@ -101,7 +101,7 @@ | 14193 @@ -101,7 +101,7 @@ |
1617 %define WK_NUM 6 | 14194 %define WK_NUM 6 |
1618 | 14195 |
1619 align 16 | 14196 align 16 |
1620 - global EXTN(jsimd_fdct_islow_sse2) | 14197 - global EXTN(jsimd_fdct_islow_sse2) |
1621 + global EXTN(jsimd_fdct_islow_sse2) PRIVATE | 14198 + global EXTN(jsimd_fdct_islow_sse2) PRIVATE |
1622 | 14199 |
1623 EXTN(jsimd_fdct_islow_sse2): | 14200 EXTN(jsimd_fdct_islow_sse2): |
1624 push ebp | 14201 push ebp |
1625 Index: simd/jcgryss2.asm | 14202 @@ -629,3 +629,6 @@ |
| 14203 » pop» ebp |
| 14204 » ret |
| 14205 |
| 14206 +; For some reason, the OS X linker does not honor the request to align the |
| 14207 +; segment unless we do this. |
| 14208 +» align» 16 |
| 14209 Index: simd/jfsseflt-64.asm |
1626 =================================================================== | 14210 =================================================================== |
1627 --- simd/jcgryss2.asm» (revision 829) | 14211 --- simd/jfsseflt-64.asm» (revision 829) |
1628 +++ simd/jcgryss2.asm» (working copy) | 14212 +++ simd/jfsseflt-64.asm» (working copy) |
1629 @@ -39,7 +39,7 @@ | 14213 @@ -1,5 +1,5 @@ |
| 14214 ; |
| 14215 -; jfsseflt.asm - floating-point FDCT (64-bit SSE) |
| 14216 +; jfsseflt-64.asm - floating-point FDCT (64-bit SSE) |
| 14217 ; |
| 14218 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB |
| 14219 ; Copyright 2009 D. R. Commander |
| 14220 @@ -38,7 +38,7 @@ |
| 14221 » SECTION»SEG_CONST |
| 14222 |
| 14223 » alignz» 16 |
| 14224 -» global» EXTN(jconst_fdct_float_sse) |
| 14225 +» global» EXTN(jconst_fdct_float_sse) PRIVATE |
| 14226 |
| 14227 EXTN(jconst_fdct_float_sse): |
| 14228 |
| 14229 @@ -65,7 +65,7 @@ |
| 14230 %define WK_NUM»» 2 |
1630 | 14231 |
1631 align 16 | 14232 align 16 |
| 14233 - global EXTN(jsimd_fdct_float_sse) |
| 14234 + global EXTN(jsimd_fdct_float_sse) PRIVATE |
1632 | 14235 |
1633 -» global» EXTN(jsimd_rgb_gray_convert_sse2) | 14236 EXTN(jsimd_fdct_float_sse): |
1634 +» global» EXTN(jsimd_rgb_gray_convert_sse2) PRIVATE | 14237 » push» rbp |
1635 | 14238 @@ -352,3 +352,7 @@ |
1636 EXTN(jsimd_rgb_gray_convert_sse2): | 14239 » pop» rsp» » ; rsp <- original rbp |
1637 » push» ebp | 14240 » pop» rbp |
1638 Index: simd/jccolmmx.asm | 14241 » ret |
| 14242 + |
| 14243 +; For some reason, the OS X linker does not honor the request to align the |
| 14244 +; segment unless we do this. |
| 14245 +» align» 16 |
| 14246 Index: simd/jfsseflt.asm |
1639 =================================================================== | 14247 =================================================================== |
1640 --- simd/jccolmmx.asm» (revision 829) | 14248 --- simd/jfsseflt.asm» (revision 829) |
1641 +++ simd/jccolmmx.asm» (working copy) | 14249 +++ simd/jfsseflt.asm» (working copy) |
1642 @@ -37,7 +37,7 @@ | 14250 @@ -37,7 +37,7 @@ |
1643 SECTION SEG_CONST | 14251 SECTION SEG_CONST |
1644 | 14252 |
1645 alignz 16 | 14253 alignz 16 |
1646 -» global» EXTN(jconst_rgb_ycc_convert_mmx) | 14254 -» global» EXTN(jconst_fdct_float_sse) |
1647 +» global» EXTN(jconst_rgb_ycc_convert_mmx) PRIVATE | 14255 +» global» EXTN(jconst_fdct_float_sse) PRIVATE |
1648 | 14256 |
1649 EXTN(jconst_rgb_ycc_convert_mmx): | 14257 EXTN(jconst_fdct_float_sse): |
1650 | 14258 |
| 14259 @@ -65,7 +65,7 @@ |
| 14260 %define WK_NUM 2 |
| 14261 |
| 14262 align 16 |
| 14263 - global EXTN(jsimd_fdct_float_sse) |
| 14264 + global EXTN(jsimd_fdct_float_sse) PRIVATE |
| 14265 |
| 14266 EXTN(jsimd_fdct_float_sse): |
| 14267 push ebp |
| 14268 @@ -365,3 +365,6 @@ |
| 14269 pop ebp |
| 14270 ret |
| 14271 |
| 14272 +; For some reason, the OS X linker does not honor the request to align the |
| 14273 +; segment unless we do this. |
| 14274 + align 16 |
| 14275 Index: simd/ji3dnflt.asm |
| 14276 =================================================================== |
| 14277 --- simd/ji3dnflt.asm (revision 829) |
| 14278 +++ simd/ji3dnflt.asm (working copy) |
| 14279 @@ -27,7 +27,7 @@ |
| 14280 SECTION SEG_CONST |
| 14281 |
| 14282 alignz 16 |
| 14283 - global EXTN(jconst_idct_float_3dnow) |
| 14284 + global EXTN(jconst_idct_float_3dnow) PRIVATE |
| 14285 |
| 14286 EXTN(jconst_idct_float_3dnow): |
| 14287 |
| 14288 @@ -63,7 +63,7 @@ |
| 14289 ; FAST_FLOAT workspace[DCTSIZE2] |
| 14290 |
| 14291 align 16 |
| 14292 - global EXTN(jsimd_idct_float_3dnow) |
| 14293 + global EXTN(jsimd_idct_float_3dnow) PRIVATE |
| 14294 |
| 14295 EXTN(jsimd_idct_float_3dnow): |
| 14296 push ebp |
| 14297 @@ -447,3 +447,6 @@ |
| 14298 pop ebp |
| 14299 ret |
| 14300 |
| 14301 +; For some reason, the OS X linker does not honor the request to align the |
| 14302 +; segment unless we do this. |
| 14303 + align 16 |
| 14304 Index: simd/jimmxfst.asm |
| 14305 =================================================================== |
| 14306 --- simd/jimmxfst.asm (revision 829) |
| 14307 +++ simd/jimmxfst.asm (working copy) |
| 14308 @@ -59,7 +59,7 @@ |
| 14309 %define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS) |
| 14310 |
| 14311 alignz 16 |
| 14312 - global EXTN(jconst_idct_ifast_mmx) |
| 14313 + global EXTN(jconst_idct_ifast_mmx) PRIVATE |
| 14314 |
| 14315 EXTN(jconst_idct_ifast_mmx): |
| 14316 |
| 14317 @@ -94,7 +94,7 @@ |
| 14318 ; JCOEF workspace[DCTSIZE2] |
| 14319 |
| 14320 align 16 |
| 14321 - global EXTN(jsimd_idct_ifast_mmx) |
| 14322 + global EXTN(jsimd_idct_ifast_mmx) PRIVATE |
| 14323 |
| 14324 EXTN(jsimd_idct_ifast_mmx): |
| 14325 push ebp |
| 14326 @@ -495,3 +495,6 @@ |
| 14327 pop ebp |
| 14328 ret |
| 14329 |
| 14330 +; For some reason, the OS X linker does not honor the request to align the |
| 14331 +; segment unless we do this. |
| 14332 + align 16 |
| 14333 Index: simd/jimmxint.asm |
| 14334 =================================================================== |
| 14335 --- simd/jimmxint.asm (revision 829) |
| 14336 +++ simd/jimmxint.asm (working copy) |
| 14337 @@ -66,7 +66,7 @@ |
| 14338 SECTION SEG_CONST |
| 14339 |
| 14340 alignz 16 |
| 14341 - global EXTN(jconst_idct_islow_mmx) |
| 14342 + global EXTN(jconst_idct_islow_mmx) PRIVATE |
| 14343 |
| 14344 EXTN(jconst_idct_islow_mmx): |
| 14345 |
| 14346 @@ -107,7 +107,7 @@ |
| 14347 ; JCOEF workspace[DCTSIZE2] |
| 14348 |
| 14349 align 16 |
| 14350 - global EXTN(jsimd_idct_islow_mmx) |
| 14351 + global EXTN(jsimd_idct_islow_mmx) PRIVATE |
| 14352 |
| 14353 EXTN(jsimd_idct_islow_mmx): |
| 14354 push ebp |
| 14355 @@ -847,3 +847,6 @@ |
| 14356 pop ebp |
| 14357 ret |
| 14358 |
| 14359 +; For some reason, the OS X linker does not honor the request to align the |
| 14360 +; segment unless we do this. |
| 14361 + align 16 |
1651 Index: simd/jimmxred.asm | 14362 Index: simd/jimmxred.asm |
1652 =================================================================== | 14363 =================================================================== |
1653 --- simd/jimmxred.asm (revision 829) | 14364 --- simd/jimmxred.asm (revision 829) |
1654 +++ simd/jimmxred.asm (working copy) | 14365 +++ simd/jimmxred.asm (working copy) |
1655 @@ -72,7 +72,7 @@ | 14366 @@ -72,7 +72,7 @@ |
1656 SECTION SEG_CONST | 14367 SECTION SEG_CONST |
1657 | 14368 |
1658 alignz 16 | 14369 alignz 16 |
1659 - global EXTN(jconst_idct_red_mmx) | 14370 - global EXTN(jconst_idct_red_mmx) |
1660 + global EXTN(jconst_idct_red_mmx) PRIVATE | 14371 + global EXTN(jconst_idct_red_mmx) PRIVATE |
(...skipping 11 matching lines...) |
1672 push ebp | 14383 push ebp |
1673 @@ -503,7 +503,7 @@ | 14384 @@ -503,7 +503,7 @@ |
1674 %define output_col(b) (b)+20 ; JDIMENSION output_col | 14385 %define output_col(b) (b)+20 ; JDIMENSION output_col |
1675 | 14386 |
1676 align 16 | 14387 align 16 |
1677 - global EXTN(jsimd_idct_2x2_mmx) | 14388 - global EXTN(jsimd_idct_2x2_mmx) |
1678 + global EXTN(jsimd_idct_2x2_mmx) PRIVATE | 14389 + global EXTN(jsimd_idct_2x2_mmx) PRIVATE |
1679 | 14390 |
1680 EXTN(jsimd_idct_2x2_mmx): | 14391 EXTN(jsimd_idct_2x2_mmx): |
1681 push ebp | 14392 push ebp |
1682 Index: simd/jsimdext.inc | 14393 @@ -701,3 +701,6 @@ |
| 14394 » pop» ebp |
| 14395 » ret |
| 14396 |
| 14397 +; For some reason, the OS X linker does not honor the request to align the |
| 14398 +; segment unless we do this. |
| 14399 +» align» 16 |
| 14400 Index: simd/jiss2flt-64.asm |
1683 =================================================================== | 14401 =================================================================== |
1684 --- simd/jsimdext.inc» (revision 829) | 14402 --- simd/jiss2flt-64.asm» (revision 829) |
1685 +++ simd/jsimdext.inc» (working copy) | 14403 +++ simd/jiss2flt-64.asm» (working copy) |
1686 @@ -73,6 +73,9 @@ | 14404 @@ -1,5 +1,5 @@ |
1687 ; * *BSD family Unix using elf format | |
1688 ; * Unix System V, including Solaris x86, UnixWare and SCO Unix | |
1689 | |
1690 +; PIC is the default on Linux | |
1691 +%define PIC | |
1692 + | |
1693 ; mark stack as non-executable | |
1694 section .note.GNU-stack noalloc noexec nowrite progbits | |
1695 | |
1696 @@ -375,4 +378,14 @@ | |
1697 ; | 14405 ; |
1698 %include "jsimdcfg.inc" | 14406 -; jiss2flt.asm - floating-point IDCT (64-bit SSE & SSE2) |
1699 | 14407 +; jiss2flt-64.asm - floating-point IDCT (64-bit SSE & SSE2) |
1700 +; Begin chromium edits | 14408 ; |
1701 +%ifdef MACHO ; ----(nasm -fmacho -DMACHO ...)-------- | 14409 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB |
1702 +%define PRIVATE :private_extern | 14410 ; Copyright 2009 D. R. Commander |
1703 +%elifdef ELF ; ----(nasm -felf[64] -DELF ...)------------ | 14411 @@ -38,7 +38,7 @@ |
1704 +%define PRIVATE :hidden | |
1705 +%else | |
1706 +%define PRIVATE | |
1707 +%endif | |
1708 +; End chromium edits | |
1709 + | |
1710 ; -------------------------------------------------------------------------- | |
1711 Index: simd/jdclrmmx.asm | |
1712 =================================================================== | |
1713 --- simd/jdclrmmx.asm» (revision 829) | |
1714 +++ simd/jdclrmmx.asm» (working copy) | |
1715 @@ -40,7 +40,7 @@ | |
1716 %define gotptr»» wk(0)-SIZEOF_POINTER» ; void * gotptr | |
1717 | |
1718 » align» 16 | |
1719 -» global» EXTN(jsimd_ycc_rgb_convert_mmx) | |
1720 +» global» EXTN(jsimd_ycc_rgb_convert_mmx) PRIVATE | |
1721 | |
1722 EXTN(jsimd_ycc_rgb_convert_mmx): | |
1723 » push» ebp | |
1724 Index: simd/jccolss2.asm | |
1725 =================================================================== | |
1726 --- simd/jccolss2.asm» (revision 829) | |
1727 +++ simd/jccolss2.asm» (working copy) | |
1728 @@ -34,7 +34,7 @@ | |
1729 SECTION SEG_CONST | 14412 SECTION SEG_CONST |
1730 | 14413 |
1731 alignz 16 | 14414 alignz 16 |
1732 -» global» EXTN(jconst_rgb_ycc_convert_sse2) | 14415 -» global» EXTN(jconst_idct_float_sse2) |
1733 +» global» EXTN(jconst_rgb_ycc_convert_sse2) PRIVATE | 14416 +» global» EXTN(jconst_idct_float_sse2) PRIVATE |
1734 | 14417 |
1735 EXTN(jconst_rgb_ycc_convert_sse2): | 14418 EXTN(jconst_idct_float_sse2): |
1736 | 14419 |
1737 Index: simd/jisseflt.asm | 14420 @@ -74,7 +74,7 @@ |
| 14421 » » » » » ; FAST_FLOAT workspace[DCTSIZE2] |
| 14422 |
| 14423 » align» 16 |
| 14424 -» global» EXTN(jsimd_idct_float_sse2) |
| 14425 +» global» EXTN(jsimd_idct_float_sse2) PRIVATE |
| 14426 |
| 14427 EXTN(jsimd_idct_float_sse2): |
| 14428 » push» rbp |
| 14429 @@ -81,11 +81,11 @@ |
| 14430 » mov» rax,rsp»» » » ; rax = original rbp |
| 14431 » sub» rsp, byte 4 |
| 14432 » and» rsp, byte (-SIZEOF_XMMWORD)» ; align to 128 bits |
| 14433 -» mov» [rsp],eax |
| 14434 +» mov» [rsp],rax |
| 14435 » mov» rbp,rsp»» » » ; rbp = aligned rbp |
| 14436 » lea» rsp, [workspace] |
| 14437 +» collect_args |
| 14438 » push» rbx |
| 14439 -» collect_args |
| 14440 |
| 14441 » ; ---- Pass 1: process columns from input, store into work array. |
| 14442 |
| 14443 @@ -471,9 +471,13 @@ |
| 14444 » dec» rcx» » » » ; ctr |
| 14445 » jnz» near .rowloop |
| 14446 |
| 14447 +» pop» rbx |
| 14448 » uncollect_args |
| 14449 -» pop» rbx |
| 14450 » mov» rsp,rbp»» ; rsp <- aligned rbp |
| 14451 » pop» rsp» » ; rsp <- original rbp |
| 14452 » pop» rbp |
| 14453 » ret |
| 14454 + |
| 14455 +; For some reason, the OS X linker does not honor the request to align the |
| 14456 +; segment unless we do this. |
| 14457 +» align» 16 |
| 14458 Index: simd/jiss2flt.asm |
1738 =================================================================== | 14459 =================================================================== |
1739 --- simd/jisseflt.asm» (revision 829) | 14460 --- simd/jiss2flt.asm» (revision 829) |
1740 +++ simd/jisseflt.asm» (working copy) | 14461 +++ simd/jiss2flt.asm» (working copy) |
1741 @@ -37,7 +37,7 @@ | 14462 @@ -37,7 +37,7 @@ |
1742 SECTION SEG_CONST | 14463 SECTION SEG_CONST |
1743 | 14464 |
1744 alignz 16 | 14465 alignz 16 |
1745 -» global» EXTN(jconst_idct_float_sse) | 14466 -» global» EXTN(jconst_idct_float_sse2) |
1746 +» global» EXTN(jconst_idct_float_sse) PRIVATE | 14467 +» global» EXTN(jconst_idct_float_sse2) PRIVATE |
1747 | 14468 |
1748 EXTN(jconst_idct_float_sse): | 14469 EXTN(jconst_idct_float_sse2): |
1749 | 14470 |
1750 @@ -73,7 +73,7 @@ | 14471 @@ -73,7 +73,7 @@ |
1751 ; FAST_FLOAT workspace[DCTSIZE2] | 14472 ; FAST_FLOAT workspace[DCTSIZE2] |
1752 | 14473 |
1753 align 16 | 14474 align 16 |
1754 -» global» EXTN(jsimd_idct_float_sse) | 14475 -» global» EXTN(jsimd_idct_float_sse2) |
1755 +» global» EXTN(jsimd_idct_float_sse) PRIVATE | 14476 +» global» EXTN(jsimd_idct_float_sse2) PRIVATE |
1756 | 14477 |
1757 EXTN(jsimd_idct_float_sse): | 14478 EXTN(jsimd_idct_float_sse2): |
1758 push ebp | 14479 push ebp |
1759 Index: simd/jcqnts2i-64.asm | 14480 @@ -493,3 +493,6 @@ |
| 14481 » pop» ebp |
| 14482 » ret |
| 14483 |
| 14484 +; For some reason, the OS X linker does not honor the request to align the |
| 14485 +; segment unless we do this. |
| 14486 +» align» 16 |
| 14487 Index: simd/jiss2fst-64.asm |
1760 =================================================================== | 14488 =================================================================== |
1761 --- simd/jcqnts2i-64.asm» (revision 829) | 14489 --- simd/jiss2fst-64.asm» (revision 829) |
1762 +++ simd/jcqnts2i-64.asm» (working copy) | 14490 +++ simd/jiss2fst-64.asm» (working copy) |
1763 @@ -36,7 +36,7 @@ | 14491 @@ -1,5 +1,5 @@ |
1764 ; r12 = DCTELEM * workspace | 14492 ; |
| 14493 -; jiss2fst.asm - fast integer IDCT (64-bit SSE2) |
| 14494 +; jiss2fst-64.asm - fast integer IDCT (64-bit SSE2) |
| 14495 ; |
| 14496 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB |
| 14497 ; Copyright 2009 D. R. Commander |
| 14498 @@ -60,7 +60,7 @@ |
| 14499 %define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS) |
| 14500 |
| 14501 » alignz» 16 |
| 14502 -» global» EXTN(jconst_idct_ifast_sse2) |
| 14503 +» global» EXTN(jconst_idct_ifast_sse2) PRIVATE |
| 14504 |
| 14505 EXTN(jconst_idct_ifast_sse2): |
| 14506 |
| 14507 @@ -93,7 +93,7 @@ |
| 14508 %define WK_NUM»» 2 |
1765 | 14509 |
1766 align 16 | 14510 align 16 |
1767 -» global» EXTN(jsimd_convsamp_sse2) | 14511 -» global» EXTN(jsimd_idct_ifast_sse2) |
1768 +» global» EXTN(jsimd_convsamp_sse2) PRIVATE | 14512 +» global» EXTN(jsimd_idct_ifast_sse2) PRIVATE |
1769 | 14513 |
1770 EXTN(jsimd_convsamp_sse2): | 14514 EXTN(jsimd_idct_ifast_sse2): |
1771 push rbp | 14515 push rbp |
1772 @@ -112,7 +112,7 @@ | 14516 @@ -100,7 +100,7 @@ |
1773 ; r12 = DCTELEM * workspace | 14517 » mov» rax,rsp»» » » ; rax = original rbp |
| 14518 » sub» rsp, byte 4 |
| 14519 » and» rsp, byte (-SIZEOF_XMMWORD)» ; align to 128 bits |
| 14520 -» mov» [rsp],eax |
| 14521 +» mov» [rsp],rax |
| 14522 » mov» rbp,rsp»» » » ; rbp = aligned rbp |
| 14523 » lea» rsp, [wk(0)] |
| 14524 » collect_args |
| 14525 @@ -486,3 +486,7 @@ |
| 14526 » pop» rbp |
| 14527 » ret |
| 14528 » ret |
| 14529 + |
| 14530 +; For some reason, the OS X linker does not honor the request to align the |
| 14531 +; segment unless we do this. |
| 14532 +» align» 16 |
| 14533 Index: simd/jiss2fst.asm |
| 14534 =================================================================== |
| 14535 --- simd/jiss2fst.asm» (revision 829) |
| 14536 +++ simd/jiss2fst.asm» (working copy) |
| 14537 @@ -59,7 +59,7 @@ |
| 14538 %define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS) |
| 14539 |
| 14540 » alignz» 16 |
| 14541 -» global» EXTN(jconst_idct_ifast_sse2) |
| 14542 +» global» EXTN(jconst_idct_ifast_sse2) PRIVATE |
| 14543 |
| 14544 EXTN(jconst_idct_ifast_sse2): |
| 14545 |
| 14546 @@ -92,7 +92,7 @@ |
| 14547 %define WK_NUM»» 2 |
1774 | 14548 |
1775 align 16 | 14549 align 16 |
1776 -» global» EXTN(jsimd_quantize_sse2) | 14550 -» global» EXTN(jsimd_idct_ifast_sse2) |
1777 +» global» EXTN(jsimd_quantize_sse2) PRIVATE | 14551 +» global» EXTN(jsimd_idct_ifast_sse2) PRIVATE |
1778 | 14552 |
1779 EXTN(jsimd_quantize_sse2): | 14553 EXTN(jsimd_idct_ifast_sse2): |
1780 » push» rbp | 14554 » push» ebp |
1781 Index: simd/jdclrss2.asm | 14555 @@ -497,3 +497,6 @@ |
1782 =================================================================== | 14556 » pop» ebp |
1783 --- simd/jdclrss2.asm» (revision 829) | 14557 » ret |
1784 +++ simd/jdclrss2.asm» (working copy) | |
1785 @@ -40,7 +40,7 @@ | |
1786 %define gotptr»» wk(0)-SIZEOF_POINTER» ; void * gotptr | |
1787 | 14558 |
1788 » align» 16 | 14559 +; For some reason, the OS X linker does not honor the request to align the |
1789 -» global» EXTN(jsimd_ycc_rgb_convert_sse2) | 14560 +; segment unless we do this. |
1790 +» global» EXTN(jsimd_ycc_rgb_convert_sse2) PRIVATE | 14561 +» align» 16 |
1791 | |
1792 EXTN(jsimd_ycc_rgb_convert_sse2): | |
1793 » push» ebp | |
1794 Index: simd/jcqntsse.asm | |
1795 =================================================================== | |
1796 --- simd/jcqntsse.asm» (revision 829) | |
1797 +++ simd/jcqntsse.asm» (working copy) | |
1798 @@ -35,7 +35,7 @@ | |
1799 %define workspace» ebp+16» » ; FAST_FLOAT * workspace | |
1800 | |
1801 » align» 16 | |
1802 -» global» EXTN(jsimd_convsamp_float_sse) | |
1803 +» global» EXTN(jsimd_convsamp_float_sse) PRIVATE | |
1804 | |
1805 EXTN(jsimd_convsamp_float_sse): | |
1806 » push» ebp | |
1807 @@ -138,7 +138,7 @@ | |
1808 %define workspace» ebp+16» » ; FAST_FLOAT * workspace | |
1809 | |
1810 » align» 16 | |
1811 -» global» EXTN(jsimd_quantize_float_sse) | |
1812 +» global» EXTN(jsimd_quantize_float_sse) PRIVATE | |
1813 | |
1814 EXTN(jsimd_quantize_float_sse): | |
1815 » push» ebp | |
1816 Index: simd/jiss2int-64.asm | 14562 Index: simd/jiss2int-64.asm |
1817 =================================================================== | 14563 =================================================================== |
1818 --- simd/jiss2int-64.asm (revision 829) | 14564 --- simd/jiss2int-64.asm (revision 829) |
1819 +++ simd/jiss2int-64.asm (working copy) | 14565 +++ simd/jiss2int-64.asm (working copy) |
| 14566 @@ -1,5 +1,5 @@ |
| 14567 ; |
| 14568 -; jiss2int.asm - accurate integer IDCT (64-bit SSE2) |
| 14569 +; jiss2int-64.asm - accurate integer IDCT (64-bit SSE2) |
| 14570 ; |
| 14571 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB |
| 14572 ; Copyright 2009 D. R. Commander |
1820 @@ -67,7 +67,7 @@ | 14573 @@ -67,7 +67,7 @@ |
1821 SECTION SEG_CONST | 14574 SECTION SEG_CONST |
1822 | 14575 |
1823 alignz 16 | 14576 alignz 16 |
1824 - global EXTN(jconst_idct_islow_sse2) | 14577 - global EXTN(jconst_idct_islow_sse2) |
1825 + global EXTN(jconst_idct_islow_sse2) PRIVATE | 14578 + global EXTN(jconst_idct_islow_sse2) PRIVATE |
1826 | 14579 |
1827 EXTN(jconst_idct_islow_sse2): | 14580 EXTN(jconst_idct_islow_sse2): |
1828 | 14581 |
1829 @@ -106,7 +106,7 @@ | 14582 @@ -106,7 +106,7 @@ |
1830 %define WK_NUM 12 | 14583 %define WK_NUM 12 |
1831 | 14584 |
1832 align 16 | 14585 align 16 |
1833 - global EXTN(jsimd_idct_islow_sse2) | 14586 - global EXTN(jsimd_idct_islow_sse2) |
1834 + global EXTN(jsimd_idct_islow_sse2) PRIVATE | 14587 + global EXTN(jsimd_idct_islow_sse2) PRIVATE |
1835 | 14588 |
1836 EXTN(jsimd_idct_islow_sse2): | 14589 EXTN(jsimd_idct_islow_sse2): |
1837 push rbp | 14590 push rbp |
1838 Index: simd/jfmmxfst.asm | 14591 @@ -842,3 +842,7 @@ |
| 14592 » pop» rsp» » ; rsp <- original rbp |
| 14593 » pop» rbp |
| 14594 » ret |
| 14595 + |
| 14596 +; For some reason, the OS X linker does not honor the request to align the |
| 14597 +; segment unless we do this. |
| 14598 +» align» 16 |
| 14599 Index: simd/jiss2int.asm |
1839 =================================================================== | 14600 =================================================================== |
1840 --- simd/jfmmxfst.asm» (revision 829) | 14601 --- simd/jiss2int.asm» (revision 829) |
1841 +++ simd/jfmmxfst.asm» (working copy) | 14602 +++ simd/jiss2int.asm» (working copy) |
1842 @@ -52,7 +52,7 @@ | 14603 @@ -66,7 +66,7 @@ |
1843 %define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS) | 14604 » SECTION»SEG_CONST |
1844 | 14605 |
1845 alignz 16 | 14606 alignz 16 |
1846 -» global» EXTN(jconst_fdct_ifast_mmx) | 14607 -» global» EXTN(jconst_idct_islow_sse2) |
1847 +» global» EXTN(jconst_fdct_ifast_mmx) PRIVATE | 14608 +» global» EXTN(jconst_idct_islow_sse2) PRIVATE |
1848 | 14609 |
1849 EXTN(jconst_fdct_ifast_mmx): | 14610 EXTN(jconst_idct_islow_sse2): |
1850 | 14611 |
1851 @@ -80,7 +80,7 @@ | 14612 @@ -105,7 +105,7 @@ |
| 14613 %define WK_NUM»» 12 |
| 14614 |
| 14615 » align» 16 |
| 14616 -» global» EXTN(jsimd_idct_islow_sse2) |
| 14617 +» global» EXTN(jsimd_idct_islow_sse2) PRIVATE |
| 14618 |
| 14619 EXTN(jsimd_idct_islow_sse2): |
| 14620 » push» ebp |
| 14621 @@ -854,3 +854,6 @@ |
| 14622 » pop» ebp |
| 14623 » ret |
| 14624 |
| 14625 +; For some reason, the OS X linker does not honor the request to align the |
| 14626 +; segment unless we do this. |
| 14627 +» align» 16 |
| 14628 Index: simd/jiss2red-64.asm |
| 14629 =================================================================== |
| 14630 --- simd/jiss2red-64.asm» (revision 829) |
| 14631 +++ simd/jiss2red-64.asm» (working copy) |
| 14632 @@ -1,5 +1,5 @@ |
| 14633 ; |
| 14634 -; jiss2red.asm - reduced-size IDCT (64-bit SSE2) |
| 14635 +; jiss2red-64.asm - reduced-size IDCT (64-bit SSE2) |
| 14636 ; |
| 14637 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB |
| 14638 ; Copyright 2009 D. R. Commander |
| 14639 @@ -73,7 +73,7 @@ |
| 14640 » SECTION»SEG_CONST |
| 14641 |
| 14642 » alignz» 16 |
| 14643 -» global» EXTN(jconst_idct_red_sse2) |
| 14644 +» global» EXTN(jconst_idct_red_sse2) PRIVATE |
| 14645 |
| 14646 EXTN(jconst_idct_red_sse2): |
| 14647 |
| 14648 @@ -114,7 +114,7 @@ |
1852 %define WK_NUM 2 | 14649 %define WK_NUM 2 |
1853 | 14650 |
1854 align 16 | 14651 align 16 |
1855 -» global» EXTN(jsimd_fdct_ifast_mmx) | 14652 -» global» EXTN(jsimd_idct_4x4_sse2) |
1856 +» global» EXTN(jsimd_fdct_ifast_mmx) PRIVATE | 14653 +» global» EXTN(jsimd_idct_4x4_sse2) PRIVATE |
1857 | 14654 |
1858 EXTN(jsimd_fdct_ifast_mmx): | 14655 EXTN(jsimd_idct_4x4_sse2): |
| 14656 » push» rbp |
| 14657 @@ -121,7 +121,7 @@ |
| 14658 » mov» rax,rsp»» » » ; rax = original rbp |
| 14659 » sub» rsp, byte 4 |
| 14660 » and» rsp, byte (-SIZEOF_XMMWORD)» ; align to 128 bits |
| 14661 -» mov» [rsp],eax |
| 14662 +» mov» [rsp],rax |
| 14663 » mov» rbp,rsp»» » » ; rbp = aligned rbp |
| 14664 » lea» rsp, [wk(0)] |
| 14665 » collect_args |
| 14666 @@ -413,13 +413,14 @@ |
| 14667 ; r13 = JDIMENSION output_col |
| 14668 |
| 14669 » align» 16 |
| 14670 -» global» EXTN(jsimd_idct_2x2_sse2) |
| 14671 +» global» EXTN(jsimd_idct_2x2_sse2) PRIVATE |
| 14672 |
| 14673 EXTN(jsimd_idct_2x2_sse2): |
| 14674 » push» rbp |
| 14675 +» mov» rax,rsp |
| 14676 » mov» rbp,rsp |
| 14677 +» collect_args |
| 14678 » push» rbx |
| 14679 -» collect_args |
| 14680 |
| 14681 » ; ---- Pass 1: process columns from input. |
| 14682 |
| 14683 @@ -565,7 +566,11 @@ |
| 14684 » mov» WORD [rdx+rax*SIZEOF_JSAMPLE], bx |
| 14685 » mov» WORD [rsi+rax*SIZEOF_JSAMPLE], cx |
| 14686 |
| 14687 +» pop» rbx |
| 14688 » uncollect_args |
| 14689 -» pop» rbx |
| 14690 » pop» rbp |
| 14691 » ret |
| 14692 + |
| 14693 +; For some reason, the OS X linker does not honor the request to align the |
| 14694 +; segment unless we do this. |
| 14695 +» align» 16 |
| 14696 Index: simd/jiss2red.asm |
| 14697 =================================================================== |
| 14698 --- simd/jiss2red.asm» (revision 829) |
| 14699 +++ simd/jiss2red.asm» (working copy) |
| 14700 @@ -72,7 +72,7 @@ |
| 14701 » SECTION»SEG_CONST |
| 14702 |
| 14703 » alignz» 16 |
| 14704 -» global» EXTN(jconst_idct_red_sse2) |
| 14705 +» global» EXTN(jconst_idct_red_sse2) PRIVATE |
| 14706 |
| 14707 EXTN(jconst_idct_red_sse2): |
| 14708 |
| 14709 @@ -113,7 +113,7 @@ |
| 14710 %define WK_NUM»» 2 |
| 14711 |
| 14712 » align» 16 |
| 14713 -» global» EXTN(jsimd_idct_4x4_sse2) |
| 14714 +» global» EXTN(jsimd_idct_4x4_sse2) PRIVATE |
| 14715 |
| 14716 EXTN(jsimd_idct_4x4_sse2): |
1859 push ebp | 14717 push ebp |
1860 Index: jdarith.c | 14718 @@ -424,7 +424,7 @@ |
| 14719 %define output_col(b)» (b)+20» » ; JDIMENSION output_col |
| 14720 |
| 14721 » align» 16 |
| 14722 -» global» EXTN(jsimd_idct_2x2_sse2) |
| 14723 +» global» EXTN(jsimd_idct_2x2_sse2) PRIVATE |
| 14724 |
| 14725 EXTN(jsimd_idct_2x2_sse2): |
| 14726 » push» ebp |
| 14727 @@ -589,3 +589,6 @@ |
| 14728 » pop» ebp |
| 14729 » ret |
| 14730 |
| 14731 +; For some reason, the OS X linker does not honor the request to align the |
| 14732 +; segment unless we do this. |
| 14733 +» align» 16 |
| 14734 Index: simd/jisseflt.asm |
1861 =================================================================== | 14735 =================================================================== |
1862 --- jdarith.c» (revision 829) | 14736 --- simd/jisseflt.asm» (revision 829) |
1863 +++ jdarith.c» (working copy) | 14737 +++ simd/jisseflt.asm» (working copy) |
1864 @@ -150,8 +150,8 @@ | 14738 @@ -37,7 +37,7 @@ |
1865 */ | 14739 » SECTION»SEG_CONST |
1866 sv = *st; | 14740 |
1867 qe = jpeg_aritab[sv & 0x7F];»/* => Qe_Value */ | 14741 » alignz» 16 |
1868 - nl = qe & 0xFF; qe >>= 8;» /* Next_Index_LPS + Switch_MPS */ | 14742 -» global» EXTN(jconst_idct_float_sse) |
1869 - nm = qe & 0xFF; qe >>= 8;» /* Next_Index_MPS */ | 14743 +» global» EXTN(jconst_idct_float_sse) PRIVATE |
1870 + nl = (unsigned char) qe & 0xFF; qe >>= 8;» /* Next_Index_LPS + Switch_MPS */ | 14744 |
1871 + nm = (unsigned char) qe & 0xFF; qe >>= 8;» /* Next_Index_MPS */ | 14745 EXTN(jconst_idct_float_sse): |
1872 | 14746 |
1873 /* Decode & estimation procedures per sections D.2.4 & D.2.5 */ | 14747 @@ -73,7 +73,7 @@ |
1874 temp = e->a - qe; | 14748 » » » » » ; FAST_FLOAT workspace[DCTSIZE2] |
1875 Index: jdhuff.c | 14749 |
| 14750 » align» 16 |
| 14751 -» global» EXTN(jsimd_idct_float_sse) |
| 14752 +» global» EXTN(jsimd_idct_float_sse) PRIVATE |
| 14753 |
| 14754 EXTN(jsimd_idct_float_sse): |
| 14755 » push» ebp |
| 14756 @@ -567,3 +567,6 @@ |
| 14757 » pop» ebp |
| 14758 » ret |
| 14759 |
| 14760 +; For some reason, the OS X linker does not honor the request to align the |
| 14761 +; segment unless we do this. |
| 14762 +» align» 16 |
| 14763 Index: simd/jsimd.h |
1876 =================================================================== | 14764 =================================================================== |
1877 --- jdhuff.c (revision 1541) | 14765 --- simd/jsimd.h» (revision 829) |
1878 +++ jdhuff.c (working copy) | 14766 +++ simd/jsimd.h» (working copy) |
1879 @@ -662,7 +662,7 @@ | 14767 @@ -2,19 +2,22 @@ |
1880 d_derived_tbl * actbl = entropy->ac_cur_tbls[blkn]; | 14768 * simd/jsimd.h |
1881 register int s, k, r, l; | 14769 * |
1882 | 14770 * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB |
1883 - HUFF_DECODE_FAST(s, l, dctbl); | 14771 + * Copyright 2011 D. R. Commander |
1884 + HUFF_DECODE_FAST(s, l, dctbl, slow_decode_mcu); | 14772 * |
1885 if (s) { | 14773 * Based on the x86 SIMD extension for IJG JPEG library, |
1886 FILL_BIT_BUFFER_FAST | 14774 * Copyright (C) 1999-2006, MIYASAKA Masaru. |
1887 r = GET_BITS(s); | 14775 + * For conditions of distribution and use, see copyright notice in jsimdext.inc |
1888 @@ -679,7 +679,7 @@ | 14776 * |
1889 if (entropy->ac_needed[blkn]) { | |
1890 | |
1891 for (k = 1; k < DCTSIZE2; k++) { | |
1892 - HUFF_DECODE_FAST(s, l, actbl); | |
1893 + HUFF_DECODE_FAST(s, l, actbl, slow_decode_mcu); | |
1894 r = s >> 4; | |
1895 s &= 15; | |
1896 | |
1897 @@ -698,7 +698,7 @@ | |
1898 } else { | |
1899 | |
1900 for (k = 1; k < DCTSIZE2; k++) { | |
1901 - HUFF_DECODE_FAST(s, l, actbl); | |
1902 + HUFF_DECODE_FAST(s, l, actbl, slow_decode_mcu); | |
1903 r = s >> 4; | |
1904 s &= 15; | |
1905 | |
1906 @@ -715,6 +715,7 @@ | |
1907 } | |
1908 | |
1909 if (cinfo->unread_marker != 0) { | |
1910 +slow_decode_mcu: | |
1911 cinfo->unread_marker = 0; | |
1912 return FALSE; | |
1913 } | |
1914 @@ -742,7 +743,7 @@ | |
1915 * this module, since we'll just re-assign them on the next call.) | |
1916 */ | 14777 */ |
1917 | 14778 |
1918 -#define BUFSIZE (DCTSIZE2 * 2) | 14779 /* Bitmask for supported acceleration methods */ |
1919 +#define BUFSIZE (DCTSIZE2 * 2u) | 14780 |
1920 | 14781 -#define JSIMD_NONE 0x00 |
1921 METHODDEF(boolean) | 14782 -#define JSIMD_MMX 0x01 |
1922 decode_mcu (j_decompress_ptr cinfo, JBLOCKROW *MCU_data) | 14783 -#define JSIMD_3DNOW 0x02 |
1923 Index: jdhuff.h | 14784 -#define JSIMD_SSE 0x04 |
| 14785 -#define JSIMD_SSE2 0x08 |
| 14786 +#define JSIMD_NONE 0x00 |
| 14787 +#define JSIMD_MMX 0x01 |
| 14788 +#define JSIMD_3DNOW 0x02 |
| 14789 +#define JSIMD_SSE 0x04 |
| 14790 +#define JSIMD_SSE2 0x08 |
| 14791 +#define JSIMD_ARM_NEON 0x10 |
| 14792 |
| 14793 /* Short forms of external names for systems with brain-damaged linkers. */ |
| 14794 |
| 14795 @@ -27,6 +30,13 @@ |
| 14796 #define jsimd_extbgrx_ycc_convert_mmx jSEXTBGRXYCCM |
| 14797 #define jsimd_extxbgr_ycc_convert_mmx jSEXTXBGRYCCM |
| 14798 #define jsimd_extxrgb_ycc_convert_mmx jSEXTXRGBYCCM |
| 14799 +#define jsimd_rgb_gray_convert_mmx jSRGBGRYM |
| 14800 +#define jsimd_extrgb_gray_convert_mmx jSEXTRGBGRYM |
| 14801 +#define jsimd_extrgbx_gray_convert_mmx jSEXTRGBXGRYM |
| 14802 +#define jsimd_extbgr_gray_convert_mmx jSEXTBGRGRYM |
| 14803 +#define jsimd_extbgrx_gray_convert_mmx jSEXTBGRXGRYM |
| 14804 +#define jsimd_extxbgr_gray_convert_mmx jSEXTXBGRGRYM |
| 14805 +#define jsimd_extxrgb_gray_convert_mmx jSEXTXRGBGRYM |
| 14806 #define jsimd_ycc_rgb_convert_mmx jSYCCRGBM |
| 14807 #define jsimd_ycc_extrgb_convert_mmx jSYCCEXTRGBM |
| 14808 #define jsimd_ycc_extrgbx_convert_mmx jSYCCEXTRGBXM |
| 14809 @@ -42,6 +52,14 @@ |
| 14810 #define jsimd_extbgrx_ycc_convert_sse2 jSEXTBGRXYCCS2 |
| 14811 #define jsimd_extxbgr_ycc_convert_sse2 jSEXTXBGRYCCS2 |
| 14812 #define jsimd_extxrgb_ycc_convert_sse2 jSEXTXRGBYCCS2 |
| 14813 +#define jconst_rgb_gray_convert_sse2 jSCRGBGRYS2 |
| 14814 +#define jsimd_rgb_gray_convert_sse2 jSRGBGRYS2 |
| 14815 +#define jsimd_extrgb_gray_convert_sse2 jSEXTRGBGRYS2 |
| 14816 +#define jsimd_extrgbx_gray_convert_sse2 jSEXTRGBXGRYS2 |
| 14817 +#define jsimd_extbgr_gray_convert_sse2 jSEXTBGRGRYS2 |
| 14818 +#define jsimd_extbgrx_gray_convert_sse2 jSEXTBGRXGRYS2 |
| 14819 +#define jsimd_extxbgr_gray_convert_sse2 jSEXTXBGRGRYS2 |
| 14820 +#define jsimd_extxrgb_gray_convert_sse2 jSEXTXRGBGRYS2 |
| 14821 #define jconst_ycc_rgb_convert_sse2 jSCYCCRGBS2 |
| 14822 #define jsimd_ycc_rgb_convert_sse2 jSYCCRGBS2 |
| 14823 #define jsimd_ycc_extrgb_convert_sse2 jSYCCEXTRGBS2 |
| 14824 @@ -162,6 +180,35 @@ |
| 14825 JSAMPARRAY input_buf, JSAMPIMAGE output_buf, |
| 14826 JDIMENSION output_row, int num_rows)); |
| 14827 |
| 14828 +EXTERN(void) jsimd_rgb_gray_convert_mmx |
| 14829 + JPP((JDIMENSION img_width, |
| 14830 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf, |
| 14831 + JDIMENSION output_row, int num_rows)); |
| 14832 +EXTERN(void) jsimd_extrgb_gray_convert_mmx |
| 14833 + JPP((JDIMENSION img_width, |
| 14834 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf, |
| 14835 + JDIMENSION output_row, int num_rows)); |
| 14836 +EXTERN(void) jsimd_extrgbx_gray_convert_mmx |
| 14837 + JPP((JDIMENSION img_width, |
| 14838 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf, |
| 14839 + JDIMENSION output_row, int num_rows)); |
| 14840 +EXTERN(void) jsimd_extbgr_gray_convert_mmx |
| 14841 + JPP((JDIMENSION img_width, |
| 14842 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf, |
| 14843 + JDIMENSION output_row, int num_rows)); |
| 14844 +EXTERN(void) jsimd_extbgrx_gray_convert_mmx |
| 14845 + JPP((JDIMENSION img_width, |
| 14846 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf, |
| 14847 + JDIMENSION output_row, int num_rows)); |
| 14848 +EXTERN(void) jsimd_extxbgr_gray_convert_mmx |
| 14849 + JPP((JDIMENSION img_width, |
| 14850 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf, |
| 14851 + JDIMENSION output_row, int num_rows)); |
| 14852 +EXTERN(void) jsimd_extxrgb_gray_convert_mmx |
| 14853 + JPP((JDIMENSION img_width, |
| 14854 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf, |
| 14855 + JDIMENSION output_row, int num_rows)); |
| 14856 + |
| 14857 EXTERN(void) jsimd_ycc_rgb_convert_mmx |
| 14858 JPP((JDIMENSION out_width, |
| 14859 JSAMPIMAGE input_buf, JDIMENSION input_row, |
| 14860 @@ -221,6 +268,36 @@ |
| 14861 JSAMPARRAY input_buf, JSAMPIMAGE output_buf, |
| 14862 JDIMENSION output_row, int num_rows)); |
| 14863 |
| 14864 +extern const int jconst_rgb_gray_convert_sse2[]; |
| 14865 +EXTERN(void) jsimd_rgb_gray_convert_sse2 |
| 14866 + JPP((JDIMENSION img_width, |
| 14867 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf, |
| 14868 + JDIMENSION output_row, int num_rows)); |
| 14869 +EXTERN(void) jsimd_extrgb_gray_convert_sse2 |
| 14870 + JPP((JDIMENSION img_width, |
| 14871 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf, |
| 14872 + JDIMENSION output_row, int num_rows)); |
| 14873 +EXTERN(void) jsimd_extrgbx_gray_convert_sse2 |
| 14874 + JPP((JDIMENSION img_width, |
| 14875 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf, |
| 14876 + JDIMENSION output_row, int num_rows)); |
| 14877 +EXTERN(void) jsimd_extbgr_gray_convert_sse2 |
| 14878 + JPP((JDIMENSION img_width, |
| 14879 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf, |
| 14880 + JDIMENSION output_row, int num_rows)); |
| 14881 +EXTERN(void) jsimd_extbgrx_gray_convert_sse2 |
| 14882 + JPP((JDIMENSION img_width, |
| 14883 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf, |
| 14884 + JDIMENSION output_row, int num_rows)); |
| 14885 +EXTERN(void) jsimd_extxbgr_gray_convert_sse2 |
| 14886 + JPP((JDIMENSION img_width, |
| 14887 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf, |
| 14888 + JDIMENSION output_row, int num_rows)); |
| 14889 +EXTERN(void) jsimd_extxrgb_gray_convert_sse2 |
| 14890 + JPP((JDIMENSION img_width, |
| 14891 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf, |
| 14892 + JDIMENSION output_row, int num_rows)); |
| 14893 + |
| 14894 extern const int jconst_ycc_rgb_convert_sse2[]; |
| 14895 EXTERN(void) jsimd_ycc_rgb_convert_sse2 |
| 14896 JPP((JDIMENSION out_width, |
| 14897 @@ -251,6 +328,64 @@ |
| 14898 JSAMPIMAGE input_buf, JDIMENSION input_row, |
| 14899 JSAMPARRAY output_buf, int num_rows)); |
| 14900 |
| 14901 +EXTERN(void) jsimd_rgb_ycc_convert_neon |
| 14902 + JPP((JDIMENSION img_width, |
| 14903 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf, |
| 14904 + JDIMENSION output_row, int num_rows)); |
| 14905 +EXTERN(void) jsimd_extrgb_ycc_convert_neon |
| 14906 + JPP((JDIMENSION img_width, |
| 14907 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf, |
| 14908 + JDIMENSION output_row, int num_rows)); |
| 14909 +EXTERN(void) jsimd_extrgbx_ycc_convert_neon |
| 14910 + JPP((JDIMENSION img_width, |
| 14911 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf, |
| 14912 + JDIMENSION output_row, int num_rows)); |
| 14913 +EXTERN(void) jsimd_extbgr_ycc_convert_neon |
| 14914 + JPP((JDIMENSION img_width, |
| 14915 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf, |
| 14916 + JDIMENSION output_row, int num_rows)); |
| 14917 +EXTERN(void) jsimd_extbgrx_ycc_convert_neon |
| 14918 + JPP((JDIMENSION img_width, |
| 14919 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf, |
| 14920 + JDIMENSION output_row, int num_rows)); |
| 14921 +EXTERN(void) jsimd_extxbgr_ycc_convert_neon |
| 14922 + JPP((JDIMENSION img_width, |
| 14923 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf, |
| 14924 + JDIMENSION output_row, int num_rows)); |
| 14925 +EXTERN(void) jsimd_extxrgb_ycc_convert_neon |
| 14926 + JPP((JDIMENSION img_width, |
| 14927 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf, |
| 14928 + JDIMENSION output_row, int num_rows)); |
| 14929 + |
| 14930 +EXTERN(void) jsimd_ycc_rgb_convert_neon |
| 14931 + JPP((JDIMENSION out_width, |
| 14932 + JSAMPIMAGE input_buf, JDIMENSION input_row, |
| 14933 + JSAMPARRAY output_buf, int num_rows)); |
| 14934 +EXTERN(void) jsimd_ycc_extrgb_convert_neon |
| 14935 + JPP((JDIMENSION out_width, |
| 14936 + JSAMPIMAGE input_buf, JDIMENSION input_row, |
| 14937 + JSAMPARRAY output_buf, int num_rows)); |
| 14938 +EXTERN(void) jsimd_ycc_extrgbx_convert_neon |
| 14939 + JPP((JDIMENSION out_width, |
| 14940 + JSAMPIMAGE input_buf, JDIMENSION input_row, |
| 14941 + JSAMPARRAY output_buf, int num_rows)); |
| 14942 +EXTERN(void) jsimd_ycc_extbgr_convert_neon |
| 14943 + JPP((JDIMENSION out_width, |
| 14944 + JSAMPIMAGE input_buf, JDIMENSION input_row, |
| 14945 + JSAMPARRAY output_buf, int num_rows)); |
| 14946 +EXTERN(void) jsimd_ycc_extbgrx_convert_neon |
| 14947 + JPP((JDIMENSION out_width, |
| 14948 + JSAMPIMAGE input_buf, JDIMENSION input_row, |
| 14949 + JSAMPARRAY output_buf, int num_rows)); |
| 14950 +EXTERN(void) jsimd_ycc_extxbgr_convert_neon |
| 14951 + JPP((JDIMENSION out_width, |
| 14952 + JSAMPIMAGE input_buf, JDIMENSION input_row, |
| 14953 + JSAMPARRAY output_buf, int num_rows)); |
| 14954 +EXTERN(void) jsimd_ycc_extxrgb_convert_neon |
| 14955 + JPP((JDIMENSION out_width, |
| 14956 + JSAMPIMAGE input_buf, JDIMENSION input_row, |
| 14957 + JSAMPARRAY output_buf, int num_rows)); |
| 14958 + |
| 14959 /* SIMD Downsample */ |
| 14960 EXTERN(void) jsimd_h2v2_downsample_mmx |
| 14961 JPP((JDIMENSION image_width, int max_v_samp_factor, |
| 14962 @@ -387,6 +522,10 @@ |
| 14963 JPP((JDIMENSION output_width, JSAMPIMAGE input_buf, |
| 14964 JDIMENSION in_row_group_ctr, JSAMPARRAY output_buf)); |
| 14965 |
| 14966 +EXTERN(void) jsimd_h2v1_fancy_upsample_neon |
| 14967 + JPP((int max_v_samp_factor, JDIMENSION downsampled_width, |
| 14968 + JSAMPARRAY input_data, JSAMPARRAY * output_data_ptr)); |
| 14969 + |
| 14970 /* SIMD Sample Conversion */ |
| 14971 EXTERN(void) jsimd_convsamp_mmx JPP((JSAMPARRAY sample_data, |
| 14972 JDIMENSION start_col, |
| 14973 @@ -396,6 +535,10 @@ |
| 14974 JDIMENSION start_col, |
| 14975 DCTELEM * workspace)); |
| 14976 |
| 14977 +EXTERN(void) jsimd_convsamp_neon JPP((JSAMPARRAY sample_data, |
| 14978 + JDIMENSION start_col, |
| 14979 + DCTELEM * workspace)); |
| 14980 + |
| 14981 EXTERN(void) jsimd_convsamp_float_3dnow JPP((JSAMPARRAY sample_data, |
| 14982 JDIMENSION start_col, |
| 14983 FAST_FLOAT * workspace)); |
| 14984 @@ -417,6 +560,8 @@ |
| 14985 extern const int jconst_fdct_islow_sse2[]; |
| 14986 EXTERN(void) jsimd_fdct_ifast_sse2 JPP((DCTELEM * data)); |
| 14987 |
| 14988 +EXTERN(void) jsimd_fdct_ifast_neon JPP((DCTELEM * data)); |
| 14989 + |
| 14990 EXTERN(void) jsimd_fdct_float_3dnow JPP((FAST_FLOAT * data)); |
| 14991 |
| 14992 extern const int jconst_fdct_float_sse[]; |
| 14993 @@ -431,6 +576,10 @@ |
| 14994 DCTELEM * divisors, |
| 14995 DCTELEM * workspace)); |
| 14996 |
| 14997 +EXTERN(void) jsimd_quantize_neon JPP((JCOEFPTR coef_block, |
| 14998 + DCTELEM * divisors, |
| 14999 + DCTELEM * workspace)); |
| 15000 + |
| 15001 EXTERN(void) jsimd_quantize_float_3dnow JPP((JCOEFPTR coef_block, |
| 15002 FAST_FLOAT * divisors, |
| 15003 FAST_FLOAT * workspace)); |
| 15004 @@ -463,6 +612,15 @@ |
| 15005 JSAMPARRAY output_buf, |
| 15006 JDIMENSION output_col)); |
| 15007 |
| 15008 +EXTERN(void) jsimd_idct_2x2_neon JPP((void * dct_table, |
| 15009 + JCOEFPTR coef_block, |
| 15010 + JSAMPARRAY output_buf, |
| 15011 + JDIMENSION output_col)); |
| 15012 +EXTERN(void) jsimd_idct_4x4_neon JPP((void * dct_table, |
| 15013 + JCOEFPTR coef_block, |
| 15014 + JSAMPARRAY output_buf, |
| 15015 + JDIMENSION output_col)); |
| 15016 + |
| 15017 /* SIMD Inverse DCT */ |
| 15018 EXTERN(void) jsimd_idct_islow_mmx JPP((void * dct_table, |
| 15019 JCOEFPTR coef_block, |
| 15020 @@ -484,6 +642,15 @@ |
| 15021 JSAMPARRAY output_buf, |
| 15022 JDIMENSION output_col)); |
| 15023 |
| 15024 +EXTERN(void) jsimd_idct_islow_neon JPP((void * dct_table, |
| 15025 + JCOEFPTR coef_block, |
| 15026 + JSAMPARRAY output_buf, |
| 15027 + JDIMENSION output_col)); |
| 15028 +EXTERN(void) jsimd_idct_ifast_neon JPP((void * dct_table, |
| 15029 + JCOEFPTR coef_block, |
| 15030 + JSAMPARRAY output_buf, |
| 15031 + JDIMENSION output_col)); |
| 15032 + |
| 15033 EXTERN(void) jsimd_idct_float_3dnow JPP((void * dct_table, |
| 15034 JCOEFPTR coef_block, |
| 15035 JSAMPARRAY output_buf, |
| 15036 Index: simd/jsimd_i386.c |
1924 =================================================================== | 15037 =================================================================== |
1925 --- jdhuff.h (revision 1541) | 15038 --- simd/jsimd_i386.c» (revision 829) |
1926 +++ jdhuff.h (working copy) | 15039 +++ simd/jsimd_i386.c» (working copy) |
1927 @@ -208,7 +208,7 @@ | 15040 @@ -2,10 +2,11 @@ |
1928 } \ | 15041 * jsimd_i386.c |
| 15042 * |
| 15043 * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB |
| 15044 - * Copyright 2009 D. R. Commander |
| 15045 + * Copyright 2009-2011 D. R. Commander |
| 15046 * |
| 15047 * Based on the x86 SIMD extension for IJG JPEG library, |
| 15048 * Copyright (C) 1999-2006, MIYASAKA Masaru. |
| 15049 + * For conditions of distribution and use, see copyright notice in jsimdext.inc |
| 15050 * |
| 15051 * This file contains the interface between the "normal" portions |
| 15052 * of the library and the SIMD implementations when running on a |
| 15053 @@ -40,7 +41,7 @@ |
| 15054 { |
| 15055 char *env = NULL; |
| 15056 |
| 15057 - if (simd_support != ~0) |
| 15058 + if (simd_support != ~0U) |
| 15059 return; |
| 15060 |
| 15061 simd_support = jpeg_simd_cpu_support(); |
| 15062 @@ -51,15 +52,16 @@ |
| 15063 simd_support &= JSIMD_MMX; |
| 15064 env = getenv("JSIMD_FORCE3DNOW"); |
| 15065 if ((env != NULL) && (strcmp(env, "1") == 0)) |
| 15066 - simd_support &= JSIMD_3DNOW; |
| 15067 + simd_support &= JSIMD_3DNOW|JSIMD_MMX; |
| 15068 env = getenv("JSIMD_FORCESSE"); |
| 15069 if ((env != NULL) && (strcmp(env, "1") == 0)) |
| 15070 - simd_support &= JSIMD_SSE; |
| 15071 + simd_support &= JSIMD_SSE|JSIMD_MMX; |
| 15072 env = getenv("JSIMD_FORCESSE2"); |
| 15073 if ((env != NULL) && (strcmp(env, "1") == 0)) |
| 15074 simd_support &= JSIMD_SSE2; |
1929 } | 15075 } |
1930 | 15076 |
1931 -#define HUFF_DECODE_FAST(s,nb,htbl) \ | 15077 +#ifndef JPEG_DECODE_ONLY |
1932 +#define HUFF_DECODE_FAST(s,nb,htbl,slowlabel) \ | 15078 GLOBAL(int) |
1933 FILL_BIT_BUFFER_FAST; \ | 15079 jsimd_can_rgb_ycc (void) |
1934 s = PEEK_BITS(HUFF_LOOKAHEAD); \ | 15080 { |
1935 s = htbl->lookup[s]; \ | 15081 @@ -81,8 +83,31 @@ |
1936 @@ -225,7 +225,9 @@ | 15082 |
1937 s |= GET_BITS(1); \ | 15083 return 0; |
1938 nb++; \ | 15084 } |
1939 } \ | |
1940 - s = htbl->pub->huffval[ (int) (s + htbl->valoffset[nb]) & 0xFF ]; \ | |
1941 + if (nb > 16) \ | |
1942 + goto slowlabel; \ | |
1943 + s = htbl->pub->huffval[ (int) (s + htbl->valoffset[nb]) ]; \ | |
1944 } | |
1945 | |
1946 /* Out-of-line case for Huffman code fetching */ | |
1947 | |
1948 Index: jchuff.c | |
1949 =================================================================== | |
1950 --- jchuff.c» (revision 1219) | |
1951 +++ jchuff.c» (revision 1220) | |
1952 @@ -22,8 +22,36 @@ | |
1953 #include "jchuff.h"» » /* Declarations shared with jcphuff.c */ | |
1954 #include <limits.h> | |
1955 | |
1956 +/* | |
1957 + * NOTE: If USE_CLZ_INTRINSIC is defined, then clz/bsr instructions will be | |
1958 + * used for bit counting rather than the lookup table. This will reduce the | |
1959 + * memory footprint by 64k, which is important for some mobile applications | |
1960 + * that create many isolated instances of libjpeg-turbo (web browsers, for | |
1961 + * instance.) This may improve performance on some mobile platforms as well. | |
1962 + * This feature is enabled by default only on ARM processors, because some x86 | |
1963 + * chips have a slow implementation of bsr, and the use of clz/bsr cannot be | |
1964 + * shown to have a significant performance impact even on the x86 chips that | |
1965 + * have a fast implementation of it. When building for ARMv6, you can | |
1966 + * explicitly disable the use of clz/bsr by adding -mthumb to the compiler | |
1967 + * flags (this defines __thumb__). | |
1968 + */ | |
1969 + | |
1970 +/* NOTE: Both GCC and Clang define __GNUC__ */ | |
1971 +#if defined __GNUC__ && defined __arm__ | |
1972 +#if !defined __thumb__ || defined __thumb2__ | |
1973 +#define USE_CLZ_INTRINSIC | |
1974 +#endif | 15085 +#endif |
1975 +#endif | 15086 |
1976 + | 15087 GLOBAL(int) |
1977 +#ifdef USE_CLZ_INTRINSIC | |
1978 +#define JPEG_NBITS_NONZERO(x) (32 - __builtin_clz(x)) | |
1979 +#define JPEG_NBITS(x) (x ? JPEG_NBITS_NONZERO(x) : 0) | |
1980 +#else | |
1981 static unsigned char jpeg_nbits_table[65536]; | |
1982 static int jpeg_nbits_table_init = 0; | |
1983 +#define JPEG_NBITS(x) (jpeg_nbits_table[x]) | |
1984 +#define JPEG_NBITS_NONZERO(x) JPEG_NBITS(x) | |
1985 +#endif | |
1986 | |
1987 #ifndef min | |
1988 #define min(a,b) ((a)<(b)?(a):(b)) | |
1989 @@ -272,6 +300,7 @@ | |
1990 dtbl->ehufsi[i] = huffsize[p]; | |
1991 } | |
1992 | |
1993 +#ifndef USE_CLZ_INTRINSIC | |
1994 if(!jpeg_nbits_table_init) { | |
1995 for(i = 0; i < 65536; i++) { | |
1996 int nbits = 0, temp = i; | |
1997 @@ -280,6 +309,7 @@ | |
1998 } | |
1999 jpeg_nbits_table_init = 1; | |
2000 } | |
2001 +#endif | |
2002 } | |
2003 | |
2004 | |
2005 @@ -482,7 +512,7 @@ | |
2006 temp2 += temp3; | |
2007 | |
2008 /* Find the number of bits needed for the magnitude of the coefficient */ | |
2009 - nbits = jpeg_nbits_table[temp]; | |
2010 + nbits = JPEG_NBITS(temp); | |
2011 | |
2012 /* Emit the Huffman-coded symbol for the number of bits */ | |
2013 code = dctbl->ehufco[nbits]; | |
2014 @@ -516,7 +546,7 @@ | |
2015 temp ^= temp3; \ | |
2016 temp -= temp3; \ | |
2017 temp2 += temp3; \ | |
2018 - nbits = jpeg_nbits_table[temp]; \ | |
2019 + nbits = JPEG_NBITS_NONZERO(temp); \ | |
2020 /* if run length > 15, must emit special run-length-16 codes (0xF0) */ \ | |
2021 while (r > 15) { \ | |
2022 EMIT_BITS(code_0xf0, size_0xf0) \ | |
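Note on the jchuff.c hunk above: the patch comment says that clz/bsr can stand in for the 64k lookup table when counting magnitude bits. The sketch below is an illustration of that equivalence, not code from the patch. The names nbits_table, nbits_clz, nbits_methods_agree and nbits_self_check are hypothetical, the table-fill loop is an assumed reconstruction of the part the hunk elides, and the code assumes GCC or Clang (for __builtin_clz) with a 32-bit unsigned int.

```c
#include <assert.h>

/* Hypothetical stand-in for jpeg_nbits_table in jchuff.c (64k entries). */
static unsigned char nbits_table[65536];

/* Fill the table by counting shifts until the value reaches zero.
 * Assumption: this mirrors the loop that the hunk above only partly shows. */
static void nbits_table_init(void)
{
  int i;
  for (i = 0; i < 65536; i++) {
    int nbits = 0, temp = i;
    while (temp) {
      temp >>= 1;
      nbits++;
    }
    nbits_table[i] = (unsigned char) nbits;
  }
}

/* clz-based bit count.  __builtin_clz(0) is undefined, so zero is
 * special-cased, as JPEG_NBITS does in the patch. */
static int nbits_clz(unsigned int x)
{
  return x ? 32 - __builtin_clz(x) : 0;
}

/* Returns 1 if the table and the clz method agree on every 16-bit input. */
static int nbits_methods_agree(void)
{
  unsigned int x;
  nbits_table_init();
  for (x = 0; x < 65536; x++) {
    if (nbits_table[x] != nbits_clz(x))
      return 0;
  }
  return 1;
}

/* Convenience check that aborts if the two methods ever disagree. */
static void nbits_self_check(void)
{
  assert(nbits_methods_agree());
}
```

Calling nbits_self_check() once at startup would confirm the two paths are interchangeable on a given toolchain. The zero special case is why the patch keeps a separate JPEG_NBITS_NONZERO macro for call sites where the coefficient is already known to be nonzero.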
2023 Index: simd/jsimd_arm64.c | |
2024 =================================================================== | |
2025 --- /dev/null | |
2026 +++ simd/jsimd_arm64.c | |
2027 @@ -0,0 +1,544 @@ | |
2028 +/* | |
2029 + * jsimd_arm64.c | |
2030 + * | |
2031 + * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB | |
2032 + * Copyright 2009-2011, 2013-2014 D. R. Commander | |
2033 + * | |
2034 + * Based on the x86 SIMD extension for IJG JPEG library, | |
2035 + * Copyright (C) 1999-2006, MIYASAKA Masaru. | |
2036 + * For conditions of distribution and use, see copyright notice in jsimdext.inc | |
2037 + * | |
2038 + * This file contains the interface between the "normal" portions | |
2039 + * of the library and the SIMD implementations when running on a | |
2040 + * 64-bit ARM architecture. | |
2041 + */ | |
2042 + | |
2043 +#define JPEG_INTERNALS | |
2044 +#include "../jinclude.h" | |
2045 +#include "../jpeglib.h" | |
2046 +#include "../jsimd.h" | |
2047 +#include "../jdct.h" | |
2048 +#include "../jsimddct.h" | |
2049 +#include "jsimd.h" | |
2050 + | |
2051 +#include <stdio.h> | |
2052 +#include <string.h> | |
2053 +#include <ctype.h> | |
2054 + | |
2055 +static unsigned int simd_support = ~0; | |
2056 + | |
2057 +/* | |
2058 + * Check what SIMD accelerations are supported. | |
2059 + * | |
2060 + * FIXME: This code is racy under a multi-threaded environment. | |
2061 + */ | |
2062 + | |
2063 +/* | |
2064 + * ARMv8 architectures support NEON extensions by default. | |
2065 + * It is no longer optional as it was with ARMv7. | |
2066 + */ | |
2067 + | |
2068 + | |
2069 +LOCAL(void) | |
2070 +init_simd (void) | |
2071 +{ | |
2072 + char *env = NULL; | |
2073 + | |
2074 + if (simd_support != ~0U) | |
2075 + return; | |
2076 + | |
2077 + simd_support = 0; | |
2078 + | |
2079 + simd_support |= JSIMD_ARM_NEON; | |
2080 + | |
2081 + /* Force different settings through environment variables */ | |
2082 + env = getenv("JSIMD_FORCENEON"); | |
2083 + if ((env != NULL) && (strcmp(env, "1") == 0)) | |
2084 + simd_support &= JSIMD_ARM_NEON; | |
2085 + env = getenv("JSIMD_FORCENONE"); | |
2086 + if ((env != NULL) && (strcmp(env, "1") == 0)) | |
2087 + simd_support = 0; | |
2088 +} | |
2089 + | |
2090 +GLOBAL(int) | |
2091 +jsimd_can_rgb_ycc (void) | |
2092 +{ | |
2093 + init_simd(); | |
2094 + | |
2095 + return 0; | |
2096 +} | |
2097 + | |
2098 +GLOBAL(int) | |
2099 +jsimd_can_rgb_gray (void) | 15088 +jsimd_can_rgb_gray (void) |
2100 +{ | 15089 +{ |
2101 + init_simd(); | 15090 + init_simd(); |
2102 + | 15091 + |
2103 + return 0; | |
2104 +} | |
2105 + | |
2106 +GLOBAL(int) | |
2107 +jsimd_can_ycc_rgb (void) | |
2108 +{ | |
2109 + init_simd(); | |
2110 + | |
2111 + /* The code is optimised for these values only */ | 15092 + /* The code is optimised for these values only */ |
2112 + if (BITS_IN_JSAMPLE != 8) | 15093 + if (BITS_IN_JSAMPLE != 8) |
2113 + return 0; | 15094 + return 0; |
2114 + if (sizeof(JDIMENSION) != 4) | 15095 + if (sizeof(JDIMENSION) != 4) |
2115 + return 0; | 15096 + return 0; |
2116 + if ((RGB_PIXELSIZE != 3) && (RGB_PIXELSIZE != 4)) | 15097 + if ((RGB_PIXELSIZE != 3) && (RGB_PIXELSIZE != 4)) |
2117 + return 0; | 15098 + return 0; |
2118 + | 15099 + |
2119 + if (simd_support & JSIMD_ARM_NEON) | 15100 + if ((simd_support & JSIMD_SSE2) && |
| 15101 + IS_ALIGNED_SSE(jconst_rgb_gray_convert_sse2)) |
| 15102 + return 1; |
| 15103 + if (simd_support & JSIMD_MMX) |
2120 + return 1; | 15104 + return 1; |
2121 + | 15105 + |
2122 + return 0; | 15106 + return 0; |
2123 +} | 15107 +} |
2124 + | 15108 + |
2125 +GLOBAL(int) | 15109 +GLOBAL(int) |
2126 +jsimd_can_ycc_rgb565 (void) | 15110 jsimd_can_ycc_rgb (void) |
| 15111 { |
| 15112 init_simd(); |
| 15113 @@ -104,6 +129,7 @@ |
| 15114 return 0; |
| 15115 } |
| 15116 |
| 15117 +#ifndef JPEG_DECODE_ONLY |
| 15118 GLOBAL(void) |
| 15119 jsimd_rgb_ycc_convert (j_compress_ptr cinfo, |
| 15120 JSAMPARRAY input_buf, JSAMPIMAGE output_buf, |
| 15121 @@ -119,6 +145,7 @@ |
| 15122 mmxfct=jsimd_extrgb_ycc_convert_mmx; |
| 15123 break; |
| 15124 case JCS_EXT_RGBX: |
| 15125 + case JCS_EXT_RGBA: |
| 15126 sse2fct=jsimd_extrgbx_ycc_convert_sse2; |
| 15127 mmxfct=jsimd_extrgbx_ycc_convert_mmx; |
| 15128 break; |
| 15129 @@ -127,14 +154,17 @@ |
| 15130 mmxfct=jsimd_extbgr_ycc_convert_mmx; |
| 15131 break; |
| 15132 case JCS_EXT_BGRX: |
| 15133 + case JCS_EXT_BGRA: |
| 15134 sse2fct=jsimd_extbgrx_ycc_convert_sse2; |
| 15135 mmxfct=jsimd_extbgrx_ycc_convert_mmx; |
| 15136 break; |
| 15137 case JCS_EXT_XBGR: |
| 15138 + case JCS_EXT_ABGR: |
| 15139 sse2fct=jsimd_extxbgr_ycc_convert_sse2; |
| 15140 mmxfct=jsimd_extxbgr_ycc_convert_mmx; |
| 15141 break; |
| 15142 case JCS_EXT_XRGB: |
| 15143 + case JCS_EXT_ARGB: |
| 15144 sse2fct=jsimd_extxrgb_ycc_convert_sse2; |
| 15145 mmxfct=jsimd_extxrgb_ycc_convert_mmx; |
| 15146 break; |
| 15147 @@ -152,8 +182,62 @@ |
| 15148 mmxfct(cinfo->image_width, input_buf, |
| 15149 output_buf, output_row, num_rows); |
| 15150 } |
| 15151 +#endif |
| 15152 |
| 15153 GLOBAL(void) |
| 15154 +jsimd_rgb_gray_convert (j_compress_ptr cinfo, |
| 15155 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf, |
| 15156 + JDIMENSION output_row, int num_rows) |
2127 +{ | 15157 +{ |
2128 + init_simd(); | 15158 + void (*sse2fct)(JDIMENSION, JSAMPARRAY, JSAMPIMAGE, JDIMENSION, int); |
2129 + | 15159 + void (*mmxfct)(JDIMENSION, JSAMPARRAY, JSAMPIMAGE, JDIMENSION, int); |
| 15160 + |
| 15161 + switch(cinfo->in_color_space) |
| 15162 + { |
| 15163 + case JCS_EXT_RGB: |
| 15164 + sse2fct=jsimd_extrgb_gray_convert_sse2; |
| 15165 + mmxfct=jsimd_extrgb_gray_convert_mmx; |
| 15166 + break; |
| 15167 + case JCS_EXT_RGBX: |
| 15168 + case JCS_EXT_RGBA: |
| 15169 + sse2fct=jsimd_extrgbx_gray_convert_sse2; |
| 15170 + mmxfct=jsimd_extrgbx_gray_convert_mmx; |
| 15171 + break; |
| 15172 + case JCS_EXT_BGR: |
| 15173 + sse2fct=jsimd_extbgr_gray_convert_sse2; |
| 15174 + mmxfct=jsimd_extbgr_gray_convert_mmx; |
| 15175 + break; |
| 15176 + case JCS_EXT_BGRX: |
| 15177 + case JCS_EXT_BGRA: |
| 15178 + sse2fct=jsimd_extbgrx_gray_convert_sse2; |
| 15179 + mmxfct=jsimd_extbgrx_gray_convert_mmx; |
| 15180 + break; |
| 15181 + case JCS_EXT_XBGR: |
| 15182 + case JCS_EXT_ABGR: |
| 15183 + sse2fct=jsimd_extxbgr_gray_convert_sse2; |
| 15184 + mmxfct=jsimd_extxbgr_gray_convert_mmx; |
| 15185 + break; |
| 15186 + case JCS_EXT_XRGB: |
| 15187 + case JCS_EXT_ARGB: |
| 15188 + sse2fct=jsimd_extxrgb_gray_convert_sse2; |
| 15189 + mmxfct=jsimd_extxrgb_gray_convert_mmx; |
| 15190 + break; |
| 15191 + default: |
| 15192 + sse2fct=jsimd_rgb_gray_convert_sse2; |
| 15193 + mmxfct=jsimd_rgb_gray_convert_mmx; |
| 15194 + break; |
| 15195 + } |
| 15196 + |
| 15197 + if ((simd_support & JSIMD_SSE2) && |
| 15198 + IS_ALIGNED_SSE(jconst_rgb_gray_convert_sse2)) |
| 15199 + sse2fct(cinfo->image_width, input_buf, |
| 15200 + output_buf, output_row, num_rows); |
| 15201 + else if (simd_support & JSIMD_MMX) |
| 15202 + mmxfct(cinfo->image_width, input_buf, |
| 15203 + output_buf, output_row, num_rows); |
| 15204 +} |
| 15205 + |
| 15206 +GLOBAL(void) |
| 15207 jsimd_ycc_rgb_convert (j_decompress_ptr cinfo, |
| 15208 JSAMPIMAGE input_buf, JDIMENSION input_row, |
| 15209 JSAMPARRAY output_buf, int num_rows) |
| 15210 @@ -168,6 +252,7 @@ |
| 15211 mmxfct=jsimd_ycc_extrgb_convert_mmx; |
| 15212 break; |
| 15213 case JCS_EXT_RGBX: |
| 15214 + case JCS_EXT_RGBA: |
| 15215 sse2fct=jsimd_ycc_extrgbx_convert_sse2; |
| 15216 mmxfct=jsimd_ycc_extrgbx_convert_mmx; |
| 15217 break; |
| 15218 @@ -176,14 +261,17 @@ |
| 15219 mmxfct=jsimd_ycc_extbgr_convert_mmx; |
| 15220 break; |
| 15221 case JCS_EXT_BGRX: |
| 15222 + case JCS_EXT_BGRA: |
| 15223 sse2fct=jsimd_ycc_extbgrx_convert_sse2; |
| 15224 mmxfct=jsimd_ycc_extbgrx_convert_mmx; |
| 15225 break; |
| 15226 case JCS_EXT_XBGR: |
| 15227 + case JCS_EXT_ABGR: |
| 15228 sse2fct=jsimd_ycc_extxbgr_convert_sse2; |
| 15229 mmxfct=jsimd_ycc_extxbgr_convert_mmx; |
| 15230 break; |
| 15231 case JCS_EXT_XRGB: |
| 15232 + case JCS_EXT_ARGB: |
| 15233 sse2fct=jsimd_ycc_extxrgb_convert_sse2; |
| 15234 mmxfct=jsimd_ycc_extxrgb_convert_mmx; |
| 15235 break; |
| 15236 @@ -202,6 +290,7 @@ |
| 15237 input_row, output_buf, num_rows); |
| 15238 } |
| 15239 |
| 15240 +#ifndef JPEG_DECODE_ONLY |
| 15241 GLOBAL(int) |
| 15242 jsimd_can_h2v2_downsample (void) |
| 15243 { |
| 15244 @@ -267,6 +356,7 @@ |
| 15245 compptr->v_samp_factor, compptr->width_in_blocks, |
| 15246 input_data, output_data); |
| 15247 } |
| 15248 +#endif |
| 15249 |
| 15250 GLOBAL(int) |
| 15251 jsimd_can_h2v2_upsample (void) |
| 15252 @@ -382,7 +472,7 @@ |
| 15253 { |
| 15254 if ((simd_support & JSIMD_SSE2) && |
| 15255 IS_ALIGNED_SSE(jconst_fancy_upsample_sse2)) |
| 15256 - jsimd_h2v1_fancy_upsample_sse2(cinfo->max_v_samp_factor, |
| 15257 + jsimd_h2v2_fancy_upsample_sse2(cinfo->max_v_samp_factor, |
| 15258 compptr->downsampled_width, input_data, output_data_ptr); |
| 15259 else if (simd_support & JSIMD_MMX) |
| 15260 jsimd_h2v2_fancy_upsample_mmx(cinfo->max_v_samp_factor, |
| 15261 @@ -460,6 +550,7 @@ |
| 15262 mmxfct=jsimd_h2v2_extrgb_merged_upsample_mmx; |
| 15263 break; |
| 15264 case JCS_EXT_RGBX: |
| 15265 + case JCS_EXT_RGBA: |
| 15266 sse2fct=jsimd_h2v2_extrgbx_merged_upsample_sse2; |
| 15267 mmxfct=jsimd_h2v2_extrgbx_merged_upsample_mmx; |
| 15268 break; |
| 15269 @@ -468,14 +559,17 @@ |
| 15270 mmxfct=jsimd_h2v2_extbgr_merged_upsample_mmx; |
| 15271 break; |
| 15272 case JCS_EXT_BGRX: |
| 15273 + case JCS_EXT_BGRA: |
| 15274 sse2fct=jsimd_h2v2_extbgrx_merged_upsample_sse2; |
| 15275 mmxfct=jsimd_h2v2_extbgrx_merged_upsample_mmx; |
| 15276 break; |
| 15277 case JCS_EXT_XBGR: |
| 15278 + case JCS_EXT_ABGR: |
| 15279 sse2fct=jsimd_h2v2_extxbgr_merged_upsample_sse2; |
| 15280 mmxfct=jsimd_h2v2_extxbgr_merged_upsample_mmx; |
| 15281 break; |
| 15282 case JCS_EXT_XRGB: |
| 15283 + case JCS_EXT_ARGB: |
| 15284 sse2fct=jsimd_h2v2_extxrgb_merged_upsample_sse2; |
| 15285 mmxfct=jsimd_h2v2_extxrgb_merged_upsample_mmx; |
| 15286 break; |
| 15287 @@ -510,6 +604,7 @@ |
| 15288 mmxfct=jsimd_h2v1_extrgb_merged_upsample_mmx; |
| 15289 break; |
| 15290 case JCS_EXT_RGBX: |
| 15291 + case JCS_EXT_RGBA: |
| 15292 sse2fct=jsimd_h2v1_extrgbx_merged_upsample_sse2; |
| 15293 mmxfct=jsimd_h2v1_extrgbx_merged_upsample_mmx; |
| 15294 break; |
| 15295 @@ -518,14 +613,17 @@ |
| 15296 mmxfct=jsimd_h2v1_extbgr_merged_upsample_mmx; |
| 15297 break; |
| 15298 case JCS_EXT_BGRX: |
| 15299 + case JCS_EXT_BGRA: |
| 15300 sse2fct=jsimd_h2v1_extbgrx_merged_upsample_sse2; |
| 15301 mmxfct=jsimd_h2v1_extbgrx_merged_upsample_mmx; |
| 15302 break; |
| 15303 case JCS_EXT_XBGR: |
| 15304 + case JCS_EXT_ABGR: |
| 15305 sse2fct=jsimd_h2v1_extxbgr_merged_upsample_sse2; |
| 15306 mmxfct=jsimd_h2v1_extxbgr_merged_upsample_mmx; |
| 15307 break; |
| 15308 case JCS_EXT_XRGB: |
| 15309 + case JCS_EXT_ARGB: |
| 15310 sse2fct=jsimd_h2v1_extxrgb_merged_upsample_sse2; |
| 15311 mmxfct=jsimd_h2v1_extxrgb_merged_upsample_mmx; |
| 15312 break; |
| 15313 @@ -544,6 +642,7 @@ |
| 15314 in_row_group_ctr, output_buf); |
| 15315 } |
| 15316 |
| 15317 +#ifndef JPEG_DECODE_ONLY |
| 15318 GLOBAL(int) |
| 15319 jsimd_can_convsamp (void) |
| 15320 { |
| 15321 @@ -763,6 +862,7 @@ |
| 15322 else if (simd_support & JSIMD_3DNOW) |
| 15323 jsimd_quantize_float_3dnow(coef_block, divisors, workspace); |
| 15324 } |
| 15325 +#endif |
| 15326 |
| 15327 GLOBAL(int) |
| 15328 jsimd_can_idct_2x2 (void) |
| 15329 @@ -953,4 +1053,3 @@ |
| 15330 jsimd_idct_float_3dnow(compptr->dct_table, coef_block, |
| 15331 output_buf, output_col); |
| 15332 } |
| 15333 - |
| 15334 Index: simd/jsimd_x86_64.c |
| 15335 =================================================================== |
| 15336 --- simd/jsimd_x86_64.c (revision 829) |
| 15337 +++ simd/jsimd_x86_64.c (working copy) |
| 15338 @@ -2,10 +2,11 @@ |
| 15339 * jsimd_x86_64.c |
| 15340 * |
| 15341 * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB |
| 15342 - * Copyright 2009 D. R. Commander |
| 15343 + * Copyright 2009-2011 D. R. Commander |
| 15344 * |
| 15345 * Based on the x86 SIMD extension for IJG JPEG library, |
| 15346 * Copyright (C) 1999-2006, MIYASAKA Masaru. |
| 15347 + * For conditions of distribution and use, see copyright notice in jsimdext.inc |
| 15348 * |
| 15349 * This file contains the interface between the "normal" portions |
| 15350 * of the library and the SIMD implementations when running on a |
| 15351 @@ -18,16 +19,17 @@ |
| 15352 #include "../jsimd.h" |
| 15353 #include "../jdct.h" |
| 15354 #include "../jsimddct.h" |
| 15355 -#include "simd/jsimd.h" |
| 15356 +#include "jsimd.h" |
| 15357 |
| 15358 /* |
| 15359 * In the PIC cases, we have no guarantee that constants will keep |
| 15360 * their alignment. This macro allows us to verify it at runtime. |
| 15361 */ |
| 15362 -#define IS_ALIGNED(ptr, order) (((unsigned)ptr & ((1 << order) - 1)) == 0) |
| 15363 +#define IS_ALIGNED(ptr, order) (((size_t)ptr & ((1 << order) - 1)) == 0) |
| 15364 |
| 15365 #define IS_ALIGNED_SSE(ptr) (IS_ALIGNED(ptr, 4)) /* 16 byte alignment */ |
| 15366 |
| 15367 +#ifndef JPEG_DECODE_ONLY |
| 15368 GLOBAL(int) |
| 15369 jsimd_can_rgb_ycc (void) |
| 15370 { |
| 15371 @@ -44,8 +46,26 @@ |
| 15372 |
| 15373 return 1; |
| 15374 } |
| 15375 +#endif |
| 15376 |
| 15377 GLOBAL(int) |
| 15378 +jsimd_can_rgb_gray (void) |
| 15379 +{ |
2130 + /* The code is optimised for these values only */ | 15380 + /* The code is optimised for these values only */ |
2131 + if (BITS_IN_JSAMPLE != 8) | 15381 + if (BITS_IN_JSAMPLE != 8) |
2132 + return 0; | 15382 + return 0; |
2133 + if (sizeof(JDIMENSION) != 4) | 15383 + if (sizeof(JDIMENSION) != 4) |
2134 + return 0; | 15384 + return 0; |
2135 + | 15385 + if ((RGB_PIXELSIZE != 3) && (RGB_PIXELSIZE != 4)) |
2136 + if (simd_support & JSIMD_ARM_NEON) | 15386 + return 0; |
2137 + return 1; | 15387 + |
2138 + | 15388 + if (!IS_ALIGNED_SSE(jconst_rgb_gray_convert_sse2)) |
2139 + return 0; | 15389 + return 0; |
| 15390 + |
| 15391 + return 1; |
2140 +} | 15392 +} |
2141 + | 15393 + |
2142 +GLOBAL(void) | 15394 +GLOBAL(int) |
2143 +jsimd_rgb_ycc_convert (j_compress_ptr cinfo, | 15395 jsimd_can_ycc_rgb (void) |
2144 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf, | 15396 { |
2145 + JDIMENSION output_row, int num_rows) | 15397 /* The code is optimised for these values only */ |
2146 +{ | 15398 @@ -62,6 +82,7 @@ |
2147 +} | 15399 return 1; |
2148 + | 15400 } |
2149 +GLOBAL(void) | 15401 |
| 15402 +#ifndef JPEG_DECODE_ONLY |
| 15403 GLOBAL(void) |
| 15404 jsimd_rgb_ycc_convert (j_compress_ptr cinfo, |
| 15405 JSAMPARRAY input_buf, JSAMPIMAGE output_buf, |
| 15406 @@ -75,6 +96,7 @@ |
| 15407 sse2fct=jsimd_extrgb_ycc_convert_sse2; |
| 15408 break; |
| 15409 case JCS_EXT_RGBX: |
| 15410 + case JCS_EXT_RGBA: |
| 15411 sse2fct=jsimd_extrgbx_ycc_convert_sse2; |
| 15412 break; |
| 15413 case JCS_EXT_BGR: |
| 15414 @@ -81,12 +103,15 @@ |
| 15415 sse2fct=jsimd_extbgr_ycc_convert_sse2; |
| 15416 break; |
| 15417 case JCS_EXT_BGRX: |
| 15418 + case JCS_EXT_BGRA: |
| 15419 sse2fct=jsimd_extbgrx_ycc_convert_sse2; |
| 15420 break; |
| 15421 case JCS_EXT_XBGR: |
| 15422 + case JCS_EXT_ABGR: |
| 15423 sse2fct=jsimd_extxbgr_ycc_convert_sse2; |
| 15424 break; |
| 15425 case JCS_EXT_XRGB: |
| 15426 + case JCS_EXT_ARGB: |
| 15427 sse2fct=jsimd_extxrgb_ycc_convert_sse2; |
| 15428 break; |
| 15429 default: |
| 15430 @@ -96,8 +121,48 @@ |
| 15431 |
| 15432 sse2fct(cinfo->image_width, input_buf, output_buf, output_row, num_rows); |
| 15433 } |
| 15434 +#endif |
| 15435 |
| 15436 GLOBAL(void) |
2150 +jsimd_rgb_gray_convert (j_compress_ptr cinfo, | 15437 +jsimd_rgb_gray_convert (j_compress_ptr cinfo, |
2151 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf, | 15438 + JSAMPARRAY input_buf, JSAMPIMAGE output_buf, |
2152 + JDIMENSION output_row, int num_rows) | 15439 + JDIMENSION output_row, int num_rows) |
2153 +{ | 15440 +{ |
2154 +} | 15441 + void (*sse2fct)(JDIMENSION, JSAMPARRAY, JSAMPIMAGE, JDIMENSION, int); |
2155 + | 15442 + |
2156 +GLOBAL(void) | 15443 + switch(cinfo->in_color_space) |
2157 +jsimd_ycc_rgb_convert (j_decompress_ptr cinfo, | 15444 + { |
2158 + JSAMPIMAGE input_buf, JDIMENSION input_row, | |
2159 + JSAMPARRAY output_buf, int num_rows) | |
2160 +{ | |
2161 + void (*neonfct)(JDIMENSION, JSAMPIMAGE, JDIMENSION, JSAMPARRAY, int); | |
2162 + | |
2163 + switch(cinfo->out_color_space) { | |
2164 + case JCS_EXT_RGB: | 15445 + case JCS_EXT_RGB: |
2165 + neonfct=jsimd_ycc_extrgb_convert_neon; | 15446 + sse2fct=jsimd_extrgb_gray_convert_sse2; |
2166 + break; | 15447 + break; |
2167 + case JCS_EXT_RGBX: | 15448 + case JCS_EXT_RGBX: |
2168 + case JCS_EXT_RGBA: | 15449 + case JCS_EXT_RGBA: |
2169 + neonfct=jsimd_ycc_extrgbx_convert_neon; | 15450 + sse2fct=jsimd_extrgbx_gray_convert_sse2; |
2170 + break; | 15451 + break; |
2171 + case JCS_EXT_BGR: | 15452 + case JCS_EXT_BGR: |
2172 + neonfct=jsimd_ycc_extbgr_convert_neon; | 15453 + sse2fct=jsimd_extbgr_gray_convert_sse2; |
2173 + break; | 15454 + break; |
2174 + case JCS_EXT_BGRX: | 15455 + case JCS_EXT_BGRX: |
2175 + case JCS_EXT_BGRA: | 15456 + case JCS_EXT_BGRA: |
2176 + neonfct=jsimd_ycc_extbgrx_convert_neon; | 15457 + sse2fct=jsimd_extbgrx_gray_convert_sse2; |
2177 + break; | 15458 + break; |
2178 + case JCS_EXT_XBGR: | 15459 + case JCS_EXT_XBGR: |
2179 + case JCS_EXT_ABGR: | 15460 + case JCS_EXT_ABGR: |
2180 + neonfct=jsimd_ycc_extxbgr_convert_neon; | 15461 + sse2fct=jsimd_extxbgr_gray_convert_sse2; |
2181 + break; | 15462 + break; |
2182 + case JCS_EXT_XRGB: | 15463 + case JCS_EXT_XRGB: |
2183 + case JCS_EXT_ARGB: | 15464 + case JCS_EXT_ARGB: |
2184 + neonfct=jsimd_ycc_extxrgb_convert_neon; | 15465 + sse2fct=jsimd_extxrgb_gray_convert_sse2; |
2185 + break; | 15466 + break; |
2186 + default: | 15467 + default: |
2187 + neonfct=jsimd_ycc_extrgb_convert_neon; | 15468 + sse2fct=jsimd_rgb_gray_convert_sse2; |
2188 + break; | 15469 + break; |
2189 + } | 15470 + } |
2190 + | 15471 + |
2191 + if (simd_support & JSIMD_ARM_NEON) | 15472 + sse2fct(cinfo->image_width, input_buf, output_buf, output_row, num_rows); |
2192 + neonfct(cinfo->output_width, input_buf, input_row, output_buf, num_rows); | |
2193 +} | 15473 +} |
2194 + | 15474 + |
2195 +GLOBAL(void) | 15475 +GLOBAL(void) |
2196 +jsimd_ycc_rgb565_convert (j_decompress_ptr cinfo, | 15476 jsimd_ycc_rgb_convert (j_decompress_ptr cinfo, |
2197 + JSAMPIMAGE input_buf, JDIMENSION input_row, | 15477 JSAMPIMAGE input_buf, JDIMENSION input_row, |
2198 + JSAMPARRAY output_buf, int num_rows) | 15478 JSAMPARRAY output_buf, int num_rows) |
| 15479 @@ -110,6 +175,7 @@ |
| 15480 sse2fct=jsimd_ycc_extrgb_convert_sse2; |
| 15481 break; |
| 15482 case JCS_EXT_RGBX: |
| 15483 + case JCS_EXT_RGBA: |
| 15484 sse2fct=jsimd_ycc_extrgbx_convert_sse2; |
| 15485 break; |
| 15486 case JCS_EXT_BGR: |
| 15487 @@ -116,12 +182,15 @@ |
| 15488 sse2fct=jsimd_ycc_extbgr_convert_sse2; |
| 15489 break; |
| 15490 case JCS_EXT_BGRX: |
| 15491 + case JCS_EXT_BGRA: |
| 15492 sse2fct=jsimd_ycc_extbgrx_convert_sse2; |
| 15493 break; |
| 15494 case JCS_EXT_XBGR: |
| 15495 + case JCS_EXT_ABGR: |
| 15496 sse2fct=jsimd_ycc_extxbgr_convert_sse2; |
| 15497 break; |
| 15498 case JCS_EXT_XRGB: |
| 15499 + case JCS_EXT_ARGB: |
| 15500 sse2fct=jsimd_ycc_extxrgb_convert_sse2; |
| 15501 break; |
| 15502 default: |
| 15503 @@ -132,6 +201,7 @@ |
| 15504 sse2fct(cinfo->output_width, input_buf, input_row, output_buf, num_rows); |
| 15505 } |
| 15506 |
| 15507 +#ifndef JPEG_DECODE_ONLY |
| 15508 GLOBAL(int) |
| 15509 jsimd_can_h2v2_downsample (void) |
| 15510 { |
| 15511 @@ -177,6 +247,7 @@ |
| 15512 compptr->width_in_blocks, |
| 15513 input_data, output_data); |
| 15514 } |
| 15515 +#endif |
| 15516 |
| 15517 GLOBAL(int) |
| 15518 jsimd_can_h2v2_upsample (void) |
| 15519 @@ -260,7 +331,7 @@ |
| 15520 JSAMPARRAY input_data, |
| 15521 JSAMPARRAY * output_data_ptr) |
| 15522 { |
| 15523 - jsimd_h2v1_fancy_upsample_sse2(cinfo->max_v_samp_factor, |
| 15524 + jsimd_h2v2_fancy_upsample_sse2(cinfo->max_v_samp_factor, |
| 15525 compptr->downsampled_width, |
| 15526 input_data, output_data_ptr); |
| 15527 } |
| 15528 @@ -320,6 +391,7 @@ |
| 15529 sse2fct=jsimd_h2v2_extrgb_merged_upsample_sse2; |
| 15530 break; |
| 15531 case JCS_EXT_RGBX: |
| 15532 + case JCS_EXT_RGBA: |
| 15533 sse2fct=jsimd_h2v2_extrgbx_merged_upsample_sse2; |
| 15534 break; |
| 15535 case JCS_EXT_BGR: |
| 15536 @@ -326,12 +398,15 @@ |
| 15537 sse2fct=jsimd_h2v2_extbgr_merged_upsample_sse2; |
| 15538 break; |
| 15539 case JCS_EXT_BGRX: |
| 15540 + case JCS_EXT_BGRA: |
| 15541 sse2fct=jsimd_h2v2_extbgrx_merged_upsample_sse2; |
| 15542 break; |
| 15543 case JCS_EXT_XBGR: |
| 15544 + case JCS_EXT_ABGR: |
| 15545 sse2fct=jsimd_h2v2_extxbgr_merged_upsample_sse2; |
| 15546 break; |
| 15547 case JCS_EXT_XRGB: |
| 15548 + case JCS_EXT_ARGB: |
| 15549 sse2fct=jsimd_h2v2_extxrgb_merged_upsample_sse2; |
| 15550 break; |
| 15551 default: |
| 15552 @@ -356,6 +431,7 @@ |
| 15553 sse2fct=jsimd_h2v1_extrgb_merged_upsample_sse2; |
| 15554 break; |
| 15555 case JCS_EXT_RGBX: |
| 15556 + case JCS_EXT_RGBA: |
| 15557 sse2fct=jsimd_h2v1_extrgbx_merged_upsample_sse2; |
| 15558 break; |
| 15559 case JCS_EXT_BGR: |
| 15560 @@ -362,12 +438,15 @@ |
| 15561 sse2fct=jsimd_h2v1_extbgr_merged_upsample_sse2; |
| 15562 break; |
| 15563 case JCS_EXT_BGRX: |
| 15564 + case JCS_EXT_BGRA: |
| 15565 sse2fct=jsimd_h2v1_extbgrx_merged_upsample_sse2; |
| 15566 break; |
| 15567 case JCS_EXT_XBGR: |
| 15568 + case JCS_EXT_ABGR: |
| 15569 sse2fct=jsimd_h2v1_extxbgr_merged_upsample_sse2; |
| 15570 break; |
| 15571 case JCS_EXT_XRGB: |
| 15572 + case JCS_EXT_ARGB: |
| 15573 sse2fct=jsimd_h2v1_extxrgb_merged_upsample_sse2; |
| 15574 break; |
| 15575 default: |
| 15576 @@ -378,6 +457,7 @@ |
| 15577 sse2fct(cinfo->output_width, input_buf, in_row_group_ctr, output_buf); |
| 15578 } |
| 15579 |
| 15580 +#ifndef JPEG_DECODE_ONLY |
| 15581 GLOBAL(int) |
| 15582 jsimd_can_convsamp (void) |
| 15583 { |
| 15584 @@ -528,6 +608,7 @@ |
| 15585 { |
| 15586 jsimd_quantize_float_sse2(coef_block, divisors, workspace); |
| 15587 } |
| 15588 +#endif |
| 15589 |
| 15590 GLOBAL(int) |
| 15591 jsimd_can_idct_2x2 (void) |
| 15592 @@ -677,4 +758,3 @@ |
| 15593 jsimd_idct_float_sse2(compptr->dct_table, coef_block, |
| 15594 output_buf, output_col); |
| 15595 } |
| 15596 - |
| 15597 Index: simd/jsimdcfg.inc.h |
| 15598 =================================================================== |
| 15599 --- simd/jsimdcfg.inc.h (revision 829) |
| 15600 +++ simd/jsimdcfg.inc.h (working copy) |
| 15601 @@ -15,26 +15,54 @@ |
| 15602 #include "../jmorecfg.h" |
| 15603 #include "jsimd.h" |
| 15604 |
| 15605 -#define define(var) %define _cpp_protection_##var |
| 15606 -#define definev(var) %define _cpp_protection_##var var |
| 15607 - |
| 15608 ; |
| 15609 ; -- jpeglib.h |
| 15610 ; |
| 15611 |
| 15612 -definev(DCTSIZE) |
| 15613 -definev(DCTSIZE2) |
| 15614 +%define _cpp_protection_DCTSIZE DCTSIZE |
| 15615 +%define _cpp_protection_DCTSIZE2 DCTSIZE2 |
| 15616 |
| 15617 ; |
| 15618 ; -- jmorecfg.h |
| 15619 ; |
| 15620 |
| 15621 -definev(RGB_RED) |
| 15622 -definev(RGB_GREEN) |
| 15623 -definev(RGB_BLUE) |
| 15624 +%define _cpp_protection_RGB_RED RGB_RED |
| 15625 +%define _cpp_protection_RGB_GREEN RGB_GREEN |
| 15626 +%define _cpp_protection_RGB_BLUE RGB_BLUE |
| 15627 +%define _cpp_protection_RGB_PIXELSIZE RGB_PIXELSIZE |
| 15628 |
| 15629 -definev(RGB_PIXELSIZE) |
| 15630 +%define _cpp_protection_EXT_RGB_RED EXT_RGB_RED |
| 15631 +%define _cpp_protection_EXT_RGB_GREEN EXT_RGB_GREEN |
| 15632 +%define _cpp_protection_EXT_RGB_BLUE EXT_RGB_BLUE |
| 15633 +%define _cpp_protection_EXT_RGB_PIXELSIZE EXT_RGB_PIXELSIZE |
| 15634 |
| 15635 +%define _cpp_protection_EXT_RGBX_RED EXT_RGBX_RED |
| 15636 +%define _cpp_protection_EXT_RGBX_GREEN EXT_RGBX_GREEN |
| 15637 +%define _cpp_protection_EXT_RGBX_BLUE EXT_RGBX_BLUE |
| 15638 +%define _cpp_protection_EXT_RGBX_PIXELSIZE EXT_RGBX_PIXELSIZE |
| 15639 + |
| 15640 +%define _cpp_protection_EXT_BGR_RED EXT_BGR_RED |
| 15641 +%define _cpp_protection_EXT_BGR_GREEN EXT_BGR_GREEN |
| 15642 +%define _cpp_protection_EXT_BGR_BLUE EXT_BGR_BLUE |
| 15643 +%define _cpp_protection_EXT_BGR_PIXELSIZE EXT_BGR_PIXELSIZE |
| 15644 + |
| 15645 +%define _cpp_protection_EXT_BGRX_RED EXT_BGRX_RED |
| 15646 +%define _cpp_protection_EXT_BGRX_GREEN EXT_BGRX_GREEN |
| 15647 +%define _cpp_protection_EXT_BGRX_BLUE EXT_BGRX_BLUE |
| 15648 +%define _cpp_protection_EXT_BGRX_PIXELSIZE EXT_BGRX_PIXELSIZE |
| 15649 + |
| 15650 +%define _cpp_protection_EXT_XBGR_RED EXT_XBGR_RED |
| 15651 +%define _cpp_protection_EXT_XBGR_GREEN EXT_XBGR_GREEN |
| 15652 +%define _cpp_protection_EXT_XBGR_BLUE EXT_XBGR_BLUE |
| 15653 +%define _cpp_protection_EXT_XBGR_PIXELSIZE EXT_XBGR_PIXELSIZE |
| 15654 + |
| 15655 +%define _cpp_protection_EXT_XRGB_RED EXT_XRGB_RED |
| 15656 +%define _cpp_protection_EXT_XRGB_GREEN EXT_XRGB_GREEN |
| 15657 +%define _cpp_protection_EXT_XRGB_BLUE EXT_XRGB_BLUE |
| 15658 +%define _cpp_protection_EXT_XRGB_PIXELSIZE EXT_XRGB_PIXELSIZE |
| 15659 + |
| 15660 +%define RGBX_FILLER_0XFF 1 |
| 15661 + |
| 15662 ; Representation of a single sample (pixel element value). |
| 15663 ; On this SIMD implementation, this must be 'unsigned char'. |
| 15664 ; |
| 15665 @@ -42,7 +70,7 @@ |
| 15666 %define JSAMPLE byte ; unsigned char |
| 15667 %define SIZEOF_JSAMPLE SIZEOF_BYTE ; sizeof(JSAMPLE) |
| 15668 |
| 15669 -definev(CENTERJSAMPLE) |
| 15670 +%define _cpp_protection_CENTERJSAMPLE CENTERJSAMPLE |
| 15671 |
| 15672 ; Representation of a DCT frequency coefficient. |
| 15673 ; On this SIMD implementation, this must be 'short'. |
| 15674 @@ -95,74 +123,74 @@ |
| 15675 ; -- jsimd.h |
| 15676 ; |
| 15677 |
| 15678 -definev(JSIMD_NONE) |
| 15679 -definev(JSIMD_MMX) |
| 15680 -definev(JSIMD_3DNOW) |
| 15681 -definev(JSIMD_SSE) |
| 15682 -definev(JSIMD_SSE2) |
| 15683 +%define _cpp_protection_JSIMD_NONE JSIMD_NONE |
| 15684 +%define _cpp_protection_JSIMD_MMX JSIMD_MMX |
| 15685 +%define _cpp_protection_JSIMD_3DNOW JSIMD_3DNOW |
| 15686 +%define _cpp_protection_JSIMD_SSE JSIMD_SSE |
| 15687 +%define _cpp_protection_JSIMD_SSE2 JSIMD_SSE2 |
| 15688 |
| 15689 ; Short forms of external names for systems with brain-damaged linkers. |
| 15690 ; |
| 15691 #ifdef NEED_SHORT_EXTERNAL_NAMES |
| 15692 -definev(jpeg_simd_cpu_support) |
| 15693 -definev(jsimd_rgb_ycc_convert_mmx) |
| 15694 -definev(jsimd_ycc_rgb_convert_mmx) |
| 15695 -definev(jconst_rgb_ycc_convert_sse2) |
| 15696 -definev(jsimd_rgb_ycc_convert_sse2) |
| 15697 -definev(jconst_ycc_rgb_convert_sse2) |
| 15698 -definev(jsimd_ycc_rgb_convert_sse2) |
| 15699 -definev(jsimd_h2v2_downsample_mmx) |
| 15700 -definev(jsimd_h2v1_downsample_mmx) |
| 15701 -definev(jsimd_h2v2_downsample_sse2) |
| 15702 -definev(jsimd_h2v1_downsample_sse2) |
| 15703 -definev(jsimd_h2v2_upsample_mmx) |
| 15704 -definev(jsimd_h2v1_upsample_mmx) |
| 15705 -definev(jsimd_h2v1_fancy_upsample_mmx) |
| 15706 -definev(jsimd_h2v2_fancy_upsample_mmx) |
| 15707 -definev(jsimd_h2v1_merged_upsample_mmx) |
| 15708 -definev(jsimd_h2v2_merged_upsample_mmx) |
| 15709 -definev(jsimd_h2v2_upsample_sse2) |
| 15710 -definev(jsimd_h2v1_upsample_sse2) |
| 15711 -definev(jconst_fancy_upsample_sse2) |
| 15712 -definev(jsimd_h2v1_fancy_upsample_sse2) |
| 15713 -definev(jsimd_h2v2_fancy_upsample_sse2) |
| 15714 -definev(jconst_merged_upsample_sse2) |
| 15715 -definev(jsimd_h2v1_merged_upsample_sse2) |
| 15716 -definev(jsimd_h2v2_merged_upsample_sse2) |
| 15717 -definev(jsimd_convsamp_mmx) |
| 15718 -definev(jsimd_convsamp_sse2) |
| 15719 -definev(jsimd_convsamp_float_3dnow) |
| 15720 -definev(jsimd_convsamp_float_sse) |
| 15721 -definev(jsimd_convsamp_float_sse2) |
| 15722 -definev(jsimd_fdct_islow_mmx) |
| 15723 -definev(jsimd_fdct_ifast_mmx) |
| 15724 -definev(jconst_fdct_islow_sse2) |
| 15725 -definev(jsimd_fdct_islow_sse2) |
| 15726 -definev(jconst_fdct_ifast_sse2) |
| 15727 -definev(jsimd_fdct_ifast_sse2) |
| 15728 -definev(jsimd_fdct_float_3dnow) |
| 15729 -definev(jconst_fdct_float_sse) |
| 15730 -definev(jsimd_fdct_float_sse) |
| 15731 -definev(jsimd_quantize_mmx) |
| 15732 -definev(jsimd_quantize_sse2) |
| 15733 -definev(jsimd_quantize_float_3dnow) |
| 15734 -definev(jsimd_quantize_float_sse) |
| 15735 -definev(jsimd_quantize_float_sse2) |
| 15736 -definev(jsimd_idct_2x2_mmx) |
| 15737 -definev(jsimd_idct_4x4_mmx) |
| 15738 -definev(jconst_idct_red_sse2) |
| 15739 -definev(jsimd_idct_2x2_sse2) |
| 15740 -definev(jsimd_idct_4x4_sse2) |
| 15741 -definev(jsimd_idct_islow_mmx) |
| 15742 -definev(jsimd_idct_ifast_mmx) |
| 15743 -definev(jconst_idct_islow_sse2) |
| 15744 -definev(jsimd_idct_islow_sse2) |
| 15745 -definev(jconst_idct_ifast_sse2) |
| 15746 -definev(jsimd_idct_ifast_sse2) |
| 15747 -definev(jsimd_idct_float_3dnow) |
| 15748 -definev(jconst_idct_float_sse) |
| 15749 -definev(jsimd_idct_float_sse) |
| 15750 -definev(jconst_idct_float_sse2) |
| 15751 -definev(jsimd_idct_float_sse2) |
| 15752 +%define _cpp_protection_jpeg_simd_cpu_support jpeg_simd_cpu_support |
| 15753 +%define _cpp_protection_jsimd_rgb_ycc_convert_mmx jsimd_rgb_ycc_convert_mmx |
| 15754 +%define _cpp_protection_jsimd_ycc_rgb_convert_mmx jsimd_ycc_rgb_convert_mmx |
| 15755 +%define _cpp_protection_jconst_rgb_ycc_convert_sse2 jconst_rgb_ycc_convert_sse2 |
| 15756 +%define _cpp_protection_jsimd_rgb_ycc_convert_sse2 jsimd_rgb_ycc_convert_sse2 |
| 15757 +%define _cpp_protection_jconst_ycc_rgb_convert_sse2 jconst_ycc_rgb_convert_sse2 |
| 15758 +%define _cpp_protection_jsimd_ycc_rgb_convert_sse2 jsimd_ycc_rgb_convert_sse2 |
| 15759 +%define _cpp_protection_jsimd_h2v2_downsample_mmx jsimd_h2v2_downsample_mmx |
| 15760 +%define _cpp_protection_jsimd_h2v1_downsample_mmx jsimd_h2v1_downsample_mmx |
| 15761 +%define _cpp_protection_jsimd_h2v2_downsample_sse2 jsimd_h2v2_downsample_sse2 |
| 15762 +%define _cpp_protection_jsimd_h2v1_downsample_sse2 jsimd_h2v1_downsample_sse2 |
| 15763 +%define _cpp_protection_jsimd_h2v2_upsample_mmx jsimd_h2v2_upsample_mmx |
| 15764 +%define _cpp_protection_jsimd_h2v1_upsample_mmx jsimd_h2v1_upsample_mmx |
| 15765 +%define _cpp_protection_jsimd_h2v1_fancy_upsample_mmx jsimd_h2v1_fancy_upsample_mmx |
| 15766 +%define _cpp_protection_jsimd_h2v2_fancy_upsample_mmx jsimd_h2v2_fancy_upsample_mmx |
| 15767 +%define _cpp_protection_jsimd_h2v1_merged_upsample_mmx jsimd_h2v1_merged_upsample_mmx |
| 15768 +%define _cpp_protection_jsimd_h2v2_merged_upsample_mmx jsimd_h2v2_merged_upsample_mmx |
| 15769 +%define _cpp_protection_jsimd_h2v2_upsample_sse2 jsimd_h2v2_upsample_sse2 |
| 15770 +%define _cpp_protection_jsimd_h2v1_upsample_sse2 jsimd_h2v1_upsample_sse2 |
| 15771 +%define _cpp_protection_jconst_fancy_upsample_sse2 jconst_fancy_upsample_sse2 |
| 15772 +%define _cpp_protection_jsimd_h2v1_fancy_upsample_sse2 jsimd_h2v1_fancy_upsample_sse2 |
| 15773 +%define _cpp_protection_jsimd_h2v2_fancy_upsample_sse2 jsimd_h2v2_fancy_upsample_sse2 |
| 15774 +%define _cpp_protection_jconst_merged_upsample_sse2 jconst_merged_upsample_sse2 |
| 15775 +%define _cpp_protection_jsimd_h2v1_merged_upsample_sse2 jsimd_h2v1_merged_upsample_sse2 |
| 15776 +%define _cpp_protection_jsimd_h2v2_merged_upsample_sse2 jsimd_h2v2_merged_upsample_sse2 |
| 15777 +%define _cpp_protection_jsimd_convsamp_mmx jsimd_convsamp_mmx |
| 15778 +%define _cpp_protection_jsimd_convsamp_sse2 jsimd_convsamp_sse2 |
| 15779 +%define _cpp_protection_jsimd_convsamp_float_3dnow jsimd_convsamp_float_3dnow |
| 15780 +%define _cpp_protection_jsimd_convsamp_float_sse jsimd_convsamp_float_sse |
| 15781 +%define _cpp_protection_jsimd_convsamp_float_sse2 jsimd_convsamp_float_sse2 |
| 15782 +%define _cpp_protection_jsimd_fdct_islow_mmx jsimd_fdct_islow_mmx |
| 15783 +%define _cpp_protection_jsimd_fdct_ifast_mmx jsimd_fdct_ifast_mmx |
| 15784 +%define _cpp_protection_jconst_fdct_islow_sse2 jconst_fdct_islow_sse2 |
| 15785 +%define _cpp_protection_jsimd_fdct_islow_sse2 jsimd_fdct_islow_sse2 |
| 15786 +%define _cpp_protection_jconst_fdct_ifast_sse2 jconst_fdct_ifast_sse2 |
| 15787 +%define _cpp_protection_jsimd_fdct_ifast_sse2 jsimd_fdct_ifast_sse2 |
| 15788 +%define _cpp_protection_jsimd_fdct_float_3dnow jsimd_fdct_float_3dnow |
| 15789 +%define _cpp_protection_jconst_fdct_float_sse jconst_fdct_float_sse |
| 15790 +%define _cpp_protection_jsimd_fdct_float_sse jsimd_fdct_float_sse |
| 15791 +%define _cpp_protection_jsimd_quantize_mmx jsimd_quantize_mmx |
| 15792 +%define _cpp_protection_jsimd_quantize_sse2 jsimd_quantize_sse2 |
| 15793 +%define _cpp_protection_jsimd_quantize_float_3dnow jsimd_quantize_float_3dnow |
| 15794 +%define _cpp_protection_jsimd_quantize_float_sse jsimd_quantize_float_sse |
| 15795 +%define _cpp_protection_jsimd_quantize_float_sse2 jsimd_quantize_float_sse2 |
| 15796 +%define _cpp_protection_jsimd_idct_2x2_mmx jsimd_idct_2x2_mmx |
| 15797 +%define _cpp_protection_jsimd_idct_4x4_mmx jsimd_idct_4x4_mmx |
| 15798 +%define _cpp_protection_jconst_idct_red_sse2 jconst_idct_red_sse2 |
| 15799 +%define _cpp_protection_jsimd_idct_2x2_sse2 jsimd_idct_2x2_sse2 |
| 15800 +%define _cpp_protection_jsimd_idct_4x4_sse2 jsimd_idct_4x4_sse2 |
| 15801 +%define _cpp_protection_jsimd_idct_islow_mmx jsimd_idct_islow_mmx |
| 15802 +%define _cpp_protection_jsimd_idct_ifast_mmx jsimd_idct_ifast_mmx |
| 15803 +%define _cpp_protection_jconst_idct_islow_sse2 jconst_idct_islow_sse2 |
| 15804 +%define _cpp_protection_jsimd_idct_islow_sse2 jsimd_idct_islow_sse2 |
| 15805 +%define _cpp_protection_jconst_idct_ifast_sse2 jconst_idct_ifast_sse2 |
| 15806 +%define _cpp_protection_jsimd_idct_ifast_sse2 jsimd_idct_ifast_sse2 |
| 15807 +%define _cpp_protection_jsimd_idct_float_3dnow jsimd_idct_float_3dnow |
| 15808 +%define _cpp_protection_jconst_idct_float_sse jconst_idct_float_sse |
| 15809 +%define _cpp_protection_jsimd_idct_float_sse jsimd_idct_float_sse |
| 15810 +%define _cpp_protection_jconst_idct_float_sse2 jconst_idct_float_sse2 |
| 15811 +%define _cpp_protection_jsimd_idct_float_sse2 jsimd_idct_float_sse2 |
| 15812 #endif /* NEED_SHORT_EXTERNAL_NAMES */ |
| 15813 |
| 15814 Index: simd/jsimdcpu.asm |
| 15815 =================================================================== |
| 15816 --- simd/jsimdcpu.asm (revision 829) |
| 15817 +++ simd/jsimdcpu.asm (working copy) |
| 15818 @@ -29,7 +29,7 @@ |
| 15819 ; |
| 15820 |
| 15821 align 16 |
| 15822 - global EXTN(jpeg_simd_cpu_support) |
| 15823 + global EXTN(jpeg_simd_cpu_support) PRIVATE |
| 15824 |
| 15825 EXTN(jpeg_simd_cpu_support): |
| 15826 push ebx |
| 15827 @@ -100,3 +100,6 @@ |
| 15828 pop ebx |
| 15829 ret |
| 15830 |
| 15831 +; For some reason, the OS X linker does not honor the request to align the |
| 15832 +; segment unless we do this. |
| 15833 + align 16 |
| 15834 Index: simd/jsimdext.inc |
| 15835 =================================================================== |
| 15836 --- simd/jsimdext.inc (revision 829) |
| 15837 +++ simd/jsimdext.inc (working copy) |
| 15838 @@ -2,6 +2,7 @@ |
| 15839 ; jsimdext.inc - common declarations |
| 15840 ; |
| 15841 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB |
| 15842 +; Copyright 2010 D. R. Commander |
| 15843 ; |
| 15844 ; Based on |
| 15845 ; x86 SIMD extension for IJG JPEG library - version 1.02 |
| 15846 @@ -37,9 +38,28 @@ |
| 15847 |
| 15848 ; -- segment definition -- |
| 15849 ; |
| 15850 +%ifdef __YASM_VER__ |
| 15851 +%define SEG_TEXT .text align=16 |
| 15852 +%define SEG_CONST .rdata align=16 |
| 15853 +%else |
| 15854 %define SEG_TEXT .text align=16 public use32 class=CODE |
| 15855 %define SEG_CONST .rdata align=16 public use32 class=CONST |
| 15856 +%endif |
| 15857 |
| 15858 +%elifdef WIN64 ; ----(nasm -fwin64 -DWIN64 ...)-------- |
| 15859 +; * Microsoft Visual C++ |
| 15860 + |
| 15861 +; -- segment definition -- |
| 15862 +; |
| 15863 +%ifdef __YASM_VER__ |
| 15864 +%define SEG_TEXT .text align=16 |
| 15865 +%define SEG_CONST .rdata align=16 |
| 15866 +%else |
| 15867 +%define SEG_TEXT .text align=16 public use64 class=CODE |
| 15868 +%define SEG_CONST .rdata align=16 public use64 class=CONST |
| 15869 +%endif |
| 15870 +%define EXTN(name) name ; foo() -> foo |
| 15871 + |
| 15872 %elifdef OBJ32 ; ----(nasm -fobj -DOBJ32 ...)---------- |
| 15873 ; * Borland C++ (Win32) |
| 15874 |
| 15875 @@ -53,6 +73,12 @@ |
| 15876 ; * *BSD family Unix using elf format |
| 15877 ; * Unix System V, including Solaris x86, UnixWare and SCO Unix |
| 15878 |
| 15879 +; PIC is the default on Linux |
| 15880 +%define PIC |
| 15881 + |
| 15882 +; mark stack as non-executable |
| 15883 +section .note.GNU-stack noalloc noexec nowrite progbits |
| 15884 + |
| 15885 ; -- segment definition -- |
| 15886 ; |
| 15887 %ifdef __x86_64__ |
| 15888 @@ -280,7 +306,44 @@ |
| 15889 %endmacro |
| 15890 |
| 15891 %ifdef __x86_64__ |
| 15892 + |
| 15893 +%ifdef WIN64 |
| 15894 + |
| 15895 %imacro collect_args 0 |
| 15896 + push r12 |
| 15897 + push r13 |
| 15898 + push r14 |
| 15899 + push r15 |
| 15900 + mov r10, rcx |
| 15901 + mov r11, rdx |
| 15902 + mov r12, r8 |
| 15903 + mov r13, r9 |
| 15904 + mov r14, [rax+48] |
| 15905 + mov r15, [rax+56] |
| 15906 + push rsi |
| 15907 + push rdi |
| 15908 + sub rsp, SIZEOF_XMMWORD |
| 15909 + movaps XMMWORD [rsp], xmm6 |
| 15910 + sub rsp, SIZEOF_XMMWORD |
| 15911 + movaps XMMWORD [rsp], xmm7 |
| 15912 +%endmacro |
| 15913 + |
| 15914 +%imacro uncollect_args 0 |
| 15915 + movaps xmm7, XMMWORD [rsp] |
| 15916 + add rsp, SIZEOF_XMMWORD |
| 15917 + movaps xmm6, XMMWORD [rsp] |
| 15918 + add rsp, SIZEOF_XMMWORD |
| 15919 + pop rdi |
| 15920 + pop rsi |
| 15921 + pop r15 |
| 15922 + pop r14 |
| 15923 + pop r13 |
| 15924 + pop r12 |
| 15925 +%endmacro |
| 15926 + |
| 15927 +%else |
| 15928 + |
| 15929 +%imacro collect_args 0 |
| 15930 push r10 |
| 15931 push r11 |
| 15932 push r12 |
| 15933 @@ -306,9 +369,21 @@ |
| 15934 |
| 15935 %endif |
| 15936 |
| 15937 +%endif |
| 15938 + |
| 15939 ; -------------------------------------------------------------------------- |
| 15940 ; Defines picked up from the C headers |
| 15941 ; |
| 15942 %include "jsimdcfg.inc" |
| 15943 |
| 15944 +; Begin chromium edits |
| 15945 +%ifdef MACHO ; ----(nasm -fmacho -DMACHO ...)-------- |
| 15946 +%define PRIVATE :private_extern |
| 15947 +%elifdef ELF ; ----(nasm -felf[64] -DELF ...)------------ |
| 15948 +%define PRIVATE :hidden |
| 15949 +%else |
| 15950 +%define PRIVATE |
| 15951 +%endif |
| 15952 +; End chromium edits |
| 15953 + |
| 15954 ; -------------------------------------------------------------------------- |
| 15955 Index: turbojpeg.h |
| 15956 =================================================================== |
| 15957 --- turbojpeg.h (revision 829) |
| 15958 +++ turbojpeg.h (working copy) |
| 15959 @@ -1,231 +1,932 @@ |
| 15960 -/* Copyright (C)2004 Landmark Graphics Corporation |
| 15961 - * Copyright (C)2005, 2006 Sun Microsystems, Inc. |
| 15962 - * Copyright (C)2009 D. R. Commander |
| 15963 +/* |
| 15964 + * Copyright (C)2009-2013 D. R. Commander. All Rights Reserved. |
| 15965 * |
| 15966 - * This library is free software and may be redistributed and/or modified under |
| 15967 - * the terms of the wxWindows Library License, Version 3.1 or (at your option) |
| 15968 - * any later version. The full license is in the LICENSE.txt file included |
| 15969 - * with this distribution. |
| 15970 + * Redistribution and use in source and binary forms, with or without |
| 15971 + * modification, are permitted provided that the following conditions are met: |
| 15972 * |
| 15973 - * This library is distributed in the hope that it will be useful, |
| 15974 - * but WITHOUT ANY WARRANTY; without even the implied warranty of |
| 15975 - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
| 15976 - * wxWindows Library License for more details. |
| 15977 + * - Redistributions of source code must retain the above copyright notice, |
| 15978 + * this list of conditions and the following disclaimer. |
| 15979 + * - Redistributions in binary form must reproduce the above copyright notice, |
| 15980 + * this list of conditions and the following disclaimer in the documentation |
| 15981 + * and/or other materials provided with the distribution. |
| 15982 + * - Neither the name of the libjpeg-turbo Project nor the names of its |
| 15983 + * contributors may be used to endorse or promote products derived from this |
| 15984 + * software without specific prior written permission. |
| 15985 + * |
| 15986 + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS", |
| 15987 + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE |
| 15988 + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE |
| 15989 + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE |
| 15990 + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR |
| 15991 + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF |
| 15992 + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS |
| 15993 + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN |
| 15994 + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) |
| 15995 + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE |
| 15996 + * POSSIBILITY OF SUCH DAMAGE. |
| 15997 */ |
| 15998 |
| 15999 -#if (defined(_MSC_VER) || defined(__CYGWIN__) || defined(__MINGW32__)) && defined(_WIN32) && defined(DLLDEFINE) |
| 16000 +#ifndef __TURBOJPEG_H__ |
| 16001 +#define __TURBOJPEG_H__ |
| 16002 + |
| 16003 +#if defined(_WIN32) && defined(DLLDEFINE) |
| 16004 #define DLLEXPORT __declspec(dllexport) |
| 16005 #else |
| 16006 #define DLLEXPORT |
| 16007 #endif |
| 16008 - |
| 16009 #define DLLCALL |
| 16010 |
| 16011 -/* Subsampling */ |
| 16012 -#define NUMSUBOPT 4 |
| 16013 |
| 16014 -enum {TJ_444=0, TJ_422, TJ_420, TJ_GRAYSCALE}; |
| 16015 +/** |
| 16016 + * @addtogroup TurboJPEG |
| 16017 + * TurboJPEG API. This API provides an interface for generating, decoding, and |
| 16018 + * transforming planar YUV and JPEG images in memory. |
| 16019 + * |
| 16020 + * @{ |
| 16021 + */ |
| 16022 |
| 16023 -/* Flags */ |
| 16024 -#define TJ_BGR 1 |
| 16025 -#define TJ_BOTTOMUP 2 |
| 16026 -#define TJ_FORCEMMX 8 /* Force IPP to use MMX code even if SSE available */ |
| 16027 -#define TJ_FORCESSE 16 /* Force IPP to use SSE1 code even if SSE2 available */ |
| 16028 -#define TJ_FORCESSE2 32 /* Force IPP to use SSE2 code (useful if auto-detect is not working properly) */ |
| 16029 -#define TJ_ALPHAFIRST 64 /* BGR buffer is ABGR and RGB buffer is ARGB */ |
| 16030 -#define TJ_FORCESSE3 128 /* Force IPP to use SSE3 code (useful if auto-detect is not working properly) */ |
| 16031 -#define TJ_FASTUPSAMPLE 256 /* Use fast, inaccurate 4:2:2 and 4:2:0 YUV upsampling routines in libjpeg decompressor */ |
| 16032 |
| 16033 +/** |
| 16034 + * The number of chrominance subsampling options |
| 16035 + */ |
| 16036 +#define TJ_NUMSAMP 5 |
| 16037 + |
| 16038 +/** |
| 16039 + * Chrominance subsampling options. |
| 16040 + * When an image is converted from the RGB to the YCbCr colorspace as part of |
| 16041 + * the JPEG compression process, some of the Cb and Cr (chrominance) components |
| 16042 + * can be discarded or averaged together to produce a smaller image with little |
| 16043 + * perceptible loss of image clarity (the human eye is more sensitive to small |
| 16044 + * changes in brightness than small changes in color.) This is called |
| 16045 + * "chrominance subsampling". |
| 16046 + * <p> |
| 16047 + * NOTE: Technically, the JPEG format uses the YCbCr colorspace, but per the |
| 16048 + * convention of the digital video community, the TurboJPEG API uses "YUV" to |
| 16049 + * refer to an image format consisting of Y, Cb, and Cr image planes. |
| 16050 + */ |
| 16051 +enum TJSAMP |
2199 +{ | 16052 +{ |
2200 + if (simd_support & JSIMD_ARM_NEON) | 16053 + /** |
2201 + jsimd_ycc_rgb565_convert_neon(cinfo->output_width, input_buf, input_row, | 16054 + * 4:4:4 chrominance subsampling (no chrominance subsampling). The JPEG or |
2202 + output_buf, num_rows); | 16055 + * YUV image will contain one chrominance component for every pixel in the |
2203 +} | 16056 + * source image. |
2204 + | 16057 + */ |
2205 +GLOBAL(int) | 16058 + TJSAMP_444=0, |
2206 +jsimd_can_h2v2_downsample (void) | 16059 + /** |
| 16060 + * 4:2:2 chrominance subsampling. The JPEG or YUV image will contain one |
| 16061 + * chrominance component for every 2x1 block of pixels in the source image. |
| 16062 + */ |
| 16063 + TJSAMP_422, |
| 16064 + /** |
| 16065 + * 4:2:0 chrominance subsampling. The JPEG or YUV image will contain one |
| 16066 + * chrominance component for every 2x2 block of pixels in the source image. |
| 16067 + */ |
| 16068 + TJSAMP_420, |
| 16069 + /** |
| 16070 + * Grayscale. The JPEG or YUV image will contain no chrominance components. |
| 16071 + */ |
| 16072 + TJSAMP_GRAY, |
| 16073 + /** |
| 16074 + * 4:4:0 chrominance subsampling. The JPEG or YUV image will contain one |
| 16075 + * chrominance component for every 1x2 block of pixels in the source image. |
| 16076 + * Note that 4:4:0 subsampling is not fully accelerated in libjpeg-turbo. |
| 16077 + */ |
| 16078 + TJSAMP_440 |
| 16079 +}; |
| 16080 + |
| 16081 +/** |
| 16082 + * MCU block width (in pixels) for a given level of chrominance subsampling. |
| 16083 + * MCU block sizes: |
| 16084 + * - 8x8 for no subsampling or grayscale |
| 16085 + * - 16x8 for 4:2:2 |
| 16086 + * - 8x16 for 4:4:0 |
| 16087 + * - 16x16 for 4:2:0 |
| 16088 + */ |
| 16089 +static const int tjMCUWidth[TJ_NUMSAMP] = {8, 16, 16, 8, 8}; |
| 16090 + |
| 16091 +/** |
| 16092 + * MCU block height (in pixels) for a given level of chrominance subsampling. |
| 16093 + * MCU block sizes: |
| 16094 + * - 8x8 for no subsampling or grayscale |
| 16095 + * - 16x8 for 4:2:2 |
| 16096 + * - 8x16 for 4:4:0 |
| 16097 + * - 16x16 for 4:2:0 |
| 16098 + */ |
| 16099 +static const int tjMCUHeight[TJ_NUMSAMP] = {8, 8, 16, 8, 16}; |
| 16100 + |
| 16101 + |
| 16102 +/** |
| 16103 + * The number of pixel formats |
| 16104 + */ |
| 16105 +#define TJ_NUMPF 11 |
| 16106 + |
| 16107 +/** |
| 16108 + * Pixel formats |
| 16109 + */ |
| 16110 +enum TJPF |
2207 +{ | 16111 +{ |
2208 + init_simd(); | 16112 + /** |
2209 + | 16113 + * RGB pixel format. The red, green, and blue components in the image are |
2210 + return 0; | 16114 + * stored in 3-byte pixels in the order R, G, B from lowest to highest byte |
2211 +} | 16115 + * address within each pixel. |
2212 + | 16116 + */ |
2213 +GLOBAL(int) | 16117 + TJPF_RGB=0, |
2214 +jsimd_can_h2v1_downsample (void) | 16118 + /** |
| 16119 + * BGR pixel format. The red, green, and blue components in the image are |
| 16120 + * stored in 3-byte pixels in the order B, G, R from lowest to highest byte |
| 16121 + * address within each pixel. |
| 16122 + */ |
| 16123 + TJPF_BGR, |
| 16124 + /** |
| 16125 + * RGBX pixel format. The red, green, and blue components in the image are |
| 16126 + * stored in 4-byte pixels in the order R, G, B from lowest to highest byte |
| 16127 + * address within each pixel. The X component is ignored when compressing |
| 16128 + * and undefined when decompressing. |
| 16129 + */ |
| 16130 + TJPF_RGBX, |
| 16131 + /** |
| 16132 + * BGRX pixel format. The red, green, and blue components in the image are |
| 16133 + * stored in 4-byte pixels in the order B, G, R from lowest to highest byte |
| 16134 + * address within each pixel. The X component is ignored when compressing |
| 16135 + * and undefined when decompressing. |
| 16136 + */ |
| 16137 + TJPF_BGRX, |
| 16138 + /** |
| 16139 + * XBGR pixel format. The red, green, and blue components in the image are |
| 16140 + * stored in 4-byte pixels in the order R, G, B from highest to lowest byte |
| 16141 + * address within each pixel. The X component is ignored when compressing |
| 16142 + * and undefined when decompressing. |
| 16143 + */ |
| 16144 + TJPF_XBGR, |
| 16145 + /** |
| 16146 + * XRGB pixel format. The red, green, and blue components in the image are |
| 16147 + * stored in 4-byte pixels in the order B, G, R from highest to lowest byte |
| 16148 + * address within each pixel. The X component is ignored when compressing |
| 16149 + * and undefined when decompressing. |
| 16150 + */ |
| 16151 + TJPF_XRGB, |
| 16152 + /** |
| 16153 + * Grayscale pixel format. Each 1-byte pixel represents a luminance |
| 16154 + * (brightness) level from 0 to 255. |
| 16155 + */ |
| 16156 + TJPF_GRAY, |
| 16157 + /** |
| 16158 + * RGBA pixel format. This is the same as @ref TJPF_RGBX, except that when |
| 16159 + * decompressing, the X component is guaranteed to be 0xFF, which can be |
| 16160 + * interpreted as an opaque alpha channel. |
| 16161 + */ |
| 16162 + TJPF_RGBA, |
| 16163 + /** |
| 16164 + * BGRA pixel format. This is the same as @ref TJPF_BGRX, except that when |
| 16165 + * decompressing, the X component is guaranteed to be 0xFF, which can be |
| 16166 + * interpreted as an opaque alpha channel. |
| 16167 + */ |
| 16168 + TJPF_BGRA, |
| 16169 + /** |
| 16170 + * ABGR pixel format. This is the same as @ref TJPF_XBGR, except that when |
| 16171 + * decompressing, the X component is guaranteed to be 0xFF, which can be |
| 16172 + * interpreted as an opaque alpha channel. |
| 16173 + */ |
| 16174 + TJPF_ABGR, |
| 16175 + /** |
| 16176 + * ARGB pixel format. This is the same as @ref TJPF_XRGB, except that when |
| 16177 + * decompressing, the X component is guaranteed to be 0xFF, which can be |
| 16178 + * interpreted as an opaque alpha channel. |
| 16179 + */ |
| 16180 + TJPF_ARGB |
| 16181 +}; |
| 16182 + |
| 16183 +/** |
| 16184 + * Red offset (in bytes) for a given pixel format. This specifies the number |
| 16185 + * of bytes that the red component is offset from the start of the pixel. For |
| 16186 + * instance, if a pixel of format TJPF_BGRX is stored in <tt>char pixel[]</tt>, |
| 16187 + * then the red component will be <tt>pixel[tjRedOffset[TJPF_BGRX]]</tt>. |
| 16188 + */ |
| 16189 +static const int tjRedOffset[TJ_NUMPF] = {0, 2, 0, 2, 3, 1, 0, 0, 2, 3, 1}; |
| 16190 +/** |
| 16191 + * Green offset (in bytes) for a given pixel format. This specifies the number |
| 16192 + * of bytes that the green component is offset from the start of the pixel. |
| 16193 + * For instance, if a pixel of format TJPF_BGRX is stored in |
| 16194 + * <tt>char pixel[]</tt>, then the green component will be |
| 16195 + * <tt>pixel[tjGreenOffset[TJPF_BGRX]]</tt>. |
| 16196 + */ |
| 16197 +static const int tjGreenOffset[TJ_NUMPF] = {1, 1, 1, 1, 2, 2, 0, 1, 1, 2, 2}; |
| 16198 +/** |
| 16199 + * Blue offset (in bytes) for a given pixel format. This specifies the number |
| 16200 + * of bytes that the blue component is offset from the start of the pixel. For |
| 16201 + * instance, if a pixel of format TJPF_BGRX is stored in <tt>char pixel[]</tt>, |
| 16202 + * then the blue component will be <tt>pixel[tjBlueOffset[TJPF_BGRX]]</tt>. |
| 16203 + */ |
| 16204 +static const int tjBlueOffset[TJ_NUMPF] = {2, 0, 2, 0, 1, 3, 0, 2, 0, 1, 3}; |
| 16205 + |
| 16206 +/** |
| 16207 + * Pixel size (in bytes) for a given pixel format. |
| 16208 + */ |
| 16209 +static const int tjPixelSize[TJ_NUMPF] = {3, 3, 4, 4, 4, 4, 1, 4, 4, 4, 4}; |
| 16210 + |
| 16211 + |
| 16212 +/** |
| 16213 + * The uncompressed source/destination image is stored in bottom-up (Windows, |
| 16214 + * OpenGL) order, not top-down (X11) order. |
| 16215 + */ |
| 16216 +#define TJFLAG_BOTTOMUP 2 |
| 16217 +/** |
| 16218 + * Turn off CPU auto-detection and force TurboJPEG to use MMX code (if the |
| 16219 + * underlying codec supports it.) |
| 16220 + */ |
| 16221 +#define TJFLAG_FORCEMMX 8 |
| 16222 +/** |
| 16223 + * Turn off CPU auto-detection and force TurboJPEG to use SSE code (if the |
| 16224 + * underlying codec supports it.) |
| 16225 + */ |
| 16226 +#define TJFLAG_FORCESSE 16 |
| 16227 +/** |
| 16228 + * Turn off CPU auto-detection and force TurboJPEG to use SSE2 code (if the |
| 16229 + * underlying codec supports it.) |
| 16230 + */ |
| 16231 +#define TJFLAG_FORCESSE2 32 |
| 16232 +/** |
| 16233 + * Turn off CPU auto-detection and force TurboJPEG to use SSE3 code (if the |
| 16234 + * underlying codec supports it.) |
| 16235 + */ |
| 16236 +#define TJFLAG_FORCESSE3 128 |
| 16237 +/** |
| 16238 + * When decompressing an image that was compressed using chrominance |
| 16239 + * subsampling, use the fastest chrominance upsampling algorithm available in |
| 16240 + * the underlying codec. The default is to use smooth upsampling, which |
| 16241 + * creates a smooth transition between neighboring chrominance components in |
| 16242 + * order to reduce upsampling artifacts in the decompressed image. |
| 16243 + */ |
| 16244 +#define TJFLAG_FASTUPSAMPLE 256 |
| 16245 +/** |
| 16246 + * Disable buffer (re)allocation. If passed to #tjCompress2() or |
| 16247 + * #tjTransform(), this flag will cause those functions to generate an error if |
| 16248 + * the JPEG image buffer is invalid or too small rather than attempting to |
| 16249 + * allocate or reallocate that buffer. This reproduces the behavior of earlier |
| 16250 + * versions of TurboJPEG. |
| 16251 + */ |
| 16252 +#define TJFLAG_NOREALLOC 1024 |
| 16253 +/** |
| 16254 + * Use the fastest DCT/IDCT algorithm available in the underlying codec. The |
| 16255 + * default if this flag is not specified is implementation-specific. For |
| 16256 + * example, the implementation of TurboJPEG for libjpeg[-turbo] uses the fast |
| 16257 + * algorithm by default when compressing, because this has been shown to have |
| 16258 + * only a very slight effect on accuracy, but it uses the accurate algorithm |
| 16259 + * when decompressing, because this has been shown to have a larger effect. |
| 16260 + */ |
| 16261 +#define TJFLAG_FASTDCT 2048 |
| 16262 +/** |
| 16263 + * Use the most accurate DCT/IDCT algorithm available in the underlying codec. |
| 16264 + * The default if this flag is not specified is implementation-specific. For |
| 16265 + * example, the implementation of TurboJPEG for libjpeg[-turbo] uses the fast |
| 16266 + * algorithm by default when compressing, because this has been shown to have |
| 16267 + * only a very slight effect on accuracy, but it uses the accurate algorithm |
| 16268 + * when decompressing, because this has been shown to have a larger effect. |
| 16269 + */ |
| 16270 +#define TJFLAG_ACCURATEDCT 4096 |
| 16271 + |
| 16272 + |
| 16273 +/** |
| 16274 + * The number of transform operations |
| 16275 + */ |
| 16276 +#define TJ_NUMXOP 8 |
| 16277 + |
| 16278 +/** |
| 16279 + * Transform operations for #tjTransform() |
| 16280 + */ |
| 16281 +enum TJXOP |
2215 +{ | 16282 +{ |
2216 + init_simd(); | 16283 + /** |
2217 + | 16284 + * Do not transform the position of the image pixels |
2218 + return 0; | 16285 + */ |
2219 +} | 16286 + TJXOP_NONE=0, |
2220 + | 16287 + /** |
2221 +GLOBAL(void) | 16288 + * Flip (mirror) image horizontally. This transform is imperfect if there |
2222 +jsimd_h2v2_downsample (j_compress_ptr cinfo, jpeg_component_info * compptr, | 16289 + * are any partial MCU blocks on the right edge (see #TJXOPT_PERFECT.) |
2223 + JSAMPARRAY input_data, JSAMPARRAY output_data) | 16290 + */ |
| 16291 + TJXOP_HFLIP, |
| 16292 + /** |
| 16293 + * Flip (mirror) image vertically. This transform is imperfect if there are |
| 16294 + * any partial MCU blocks on the bottom edge (see #TJXOPT_PERFECT.) |
| 16295 + */ |
| 16296 + TJXOP_VFLIP, |
| 16297 + /** |
| 16298 + * Transpose image (flip/mirror along upper left to lower right axis.) This |
| 16299 + * transform is always perfect. |
| 16300 + */ |
| 16301 + TJXOP_TRANSPOSE, |
| 16302 + /** |
| 16303 + * Transverse transpose image (flip/mirror along upper right to lower left |
| 16304 + * axis.) This transform is imperfect if there are any partial MCU blocks in |
| 16305 + * the image (see #TJXOPT_PERFECT.) |
| 16306 + */ |
| 16307 + TJXOP_TRANSVERSE, |
| 16308 + /** |
| 16309 + * Rotate image clockwise by 90 degrees. This transform is imperfect if |
| 16310 + * there are any partial MCU blocks on the bottom edge (see |
| 16311 + * #TJXOPT_PERFECT.) |
| 16312 + */ |
| 16313 + TJXOP_ROT90, |
| 16314 + /** |
| 16315 + * Rotate image 180 degrees. This transform is imperfect if there are any |
| 16316 + * partial MCU blocks in the image (see #TJXOPT_PERFECT.) |
| 16317 + */ |
| 16318 + TJXOP_ROT180, |
| 16319 + /** |
| 16320 + * Rotate image counter-clockwise by 90 degrees. This transform is imperfect |
| 16321 + * if there are any partial MCU blocks on the right edge (see |
| 16322 + * #TJXOPT_PERFECT.) |
| 16323 + */ |
| 16324 + TJXOP_ROT270 |
| 16325 +}; |
| 16326 + |
| 16327 + |
| 16328 +/** |
| 16329 + * This option will cause #tjTransform() to return an error if the transform is |
| 16330 + * not perfect. Lossless transforms operate on MCU blocks, whose size depends |
| 16331 + * on the level of chrominance subsampling used (see #tjMCUWidth |
| 16332 + * and #tjMCUHeight.) If the image's width or height is not evenly divisible |
| 16333 + * by the MCU block size, then there will be partial MCU blocks on the right |
| 16334 + * and/or bottom edges. It is not possible to move these partial MCU blocks to |
| 16335 + * the top or left of the image, so any transform that would require that is |
| 16336 + * "imperfect." If this option is not specified, then any partial MCU blocks |
| 16337 + * that cannot be transformed will be left in place, which will create |
| 16338 + * odd-looking strips on the right or bottom edge of the image. |
| 16339 + */ |
| 16340 +#define TJXOPT_PERFECT 1 |
| 16341 +/** |
| 16342 + * This option will cause #tjTransform() to discard any partial MCU blocks that |
| 16343 + * cannot be transformed. |
| 16344 + */ |
| 16345 +#define TJXOPT_TRIM 2 |
| 16346 +/** |
| 16347 + * This option will enable lossless cropping. See #tjTransform() for more |
| 16348 + * information. |
| 16349 + */ |
| 16350 +#define TJXOPT_CROP 4 |
| 16351 +/** |
| 16352 + * This option will discard the color data in the input image and produce |
| 16353 + * a grayscale output image. |
| 16354 + */ |
| 16355 +#define TJXOPT_GRAY 8 |
| 16356 +/** |
| 16357 + * This option will prevent #tjTransform() from outputting a JPEG image for |
| 16358 + * this particular transform (this can be used in conjunction with a custom |
| 16359 + * filter to capture the transformed DCT coefficients without transcoding |
| 16360 + * them.) |
| 16361 + */ |
| 16362 +#define TJXOPT_NOOUTPUT 16 |
| 16363 + |
| 16364 + |
| 16365 +/** |
| 16366 + * Scaling factor |
| 16367 + */ |
| 16368 +typedef struct |
2224 +{ | 16369 +{ |
2225 +} | 16370 + /** |
2226 + | 16371 + * Numerator |
2227 +GLOBAL(void) | 16372 + */ |
2228 +jsimd_h2v1_downsample (j_compress_ptr cinfo, jpeg_component_info * compptr, | 16373 + int num; |
2229 + JSAMPARRAY input_data, JSAMPARRAY output_data) | 16374 + /** |
| 16375 + * Denominator |
| 16376 + */ |
| 16377 + int denom; |
| 16378 +} tjscalingfactor; |
| 16379 + |
| 16380 +/** |
| 16381 + * Cropping region |
| 16382 + */ |
| 16383 +typedef struct |
2230 +{ | 16384 +{ |
2231 +} | 16385 + /** |
2232 + | 16386 + * The left boundary of the cropping region. This must be evenly divisible |
2233 +GLOBAL(int) | 16387 + * by the MCU block width (see #tjMCUWidth.) |
2234 +jsimd_can_h2v2_upsample (void) | 16388 + */ |
| 16389 + int x; |
| 16390 + /** |
| 16391 + * The upper boundary of the cropping region. This must be evenly divisible |
| 16392 + * by the MCU block height (see #tjMCUHeight.) |
| 16393 + */ |
| 16394 + int y; |
| 16395 + /** |
| 16396 + * The width of the cropping region. Setting this to 0 is the equivalent of |
| 16397 + * setting it to the width of the source JPEG image - x. |
| 16398 + */ |
| 16399 + int w; |
| 16400 + /** |
| 16401 + * The height of the cropping region. Setting this to 0 is the equivalent of |
| 16402 + * setting it to the height of the source JPEG image - y. |
| 16403 + */ |
| 16404 + int h; |
| 16405 +} tjregion; |
| 16406 + |
| 16407 +/** |
| 16408 + * Lossless transform |
| 16409 + */ |
| 16410 +typedef struct tjtransform |
2235 +{ | 16411 +{ |
2236 + init_simd(); | 16412 + /** |
2237 + | 16413 + * Cropping region |
2238 + return 0; | 16414 + */ |
2239 +} | 16415 + tjregion r; |
2240 + | 16416 + /** |
2241 +GLOBAL(int) | 16417 + * One of the @ref TJXOP "transform operations" |
2242 +jsimd_can_h2v1_upsample (void) | 16418 + */ |
2243 +{ | 16419 + int op; |
2244 + init_simd(); | 16420 + /** |
2245 + | 16421 + * The bitwise OR of one or more of the @ref TJXOPT_CROP "transform options" |
2246 + return 0; | 16422 + */ |
2247 +} | 16423 + int options; |
2248 + | 16424 + /** |
2249 +GLOBAL(void) | 16425 + * Arbitrary data that can be accessed within the body of the callback |
2250 +jsimd_h2v2_upsample (j_decompress_ptr cinfo, | 16426 + * function |
2251 + jpeg_component_info * compptr, | 16427 + */ |
2252 + JSAMPARRAY input_data, | 16428 + void *data; |
2253 + JSAMPARRAY * output_data_ptr) | 16429 + /** |
2254 +{ | 16430 + * A callback function that can be used to modify the DCT coefficients |
2255 +} | 16431 + * after they are losslessly transformed but before they are transcoded to a |
2256 + | 16432 + * new JPEG image. This allows for custom filters or other transformations |
2257 +GLOBAL(void) | 16433 + * to be applied in the frequency domain. |
2258 +jsimd_h2v1_upsample (j_decompress_ptr cinfo, | 16434 + * |
2259 + jpeg_component_info * compptr, | 16435 + * @param coeffs pointer to an array of transformed DCT coefficients. (NOTE: |
2260 + JSAMPARRAY input_data, | 16436 + * this pointer is not guaranteed to be valid once the callback |
2261 + JSAMPARRAY * output_data_ptr) | 16437 + * returns, so applications wishing to hand off the DCT coefficients |
2262 +{ | 16438 + * to another function or library should make a copy of them within |
2263 +} | 16439 + * the body of the callback.) |
2264 + | 16440 + * @param arrayRegion #tjregion structure containing the width and height of |
2265 +GLOBAL(int) | 16441 + * the array pointed to by <tt>coeffs</tt> as well as its offset |
2266 +jsimd_can_h2v2_fancy_upsample (void) | 16442 + * relative to the component plane. TurboJPEG implementations may |
2267 +{ | 16443 + * choose to split each component plane into multiple DCT coefficient |
2268 + init_simd(); | 16444 + * arrays and call the callback function once for each array. |
2269 + | 16445 + * @param planeRegion #tjregion structure containing the width and height of |
2270 + return 0; | 16446 + * the component plane to which <tt>coeffs</tt> belongs |
2271 +} | 16447 + * @param componentIndex ID number of the component plane to which |
2272 + | 16448 + * <tt>coeffs</tt> belongs (Y, Cb, and Cr have, respectively, IDs of |
2273 +GLOBAL(int) | 16449 + * 0, 1, and 2 in typical JPEG images.) |
2274 +jsimd_can_h2v1_fancy_upsample (void) | 16450 + * @param transformIndex ID number of the transformed image to which |
2275 +{ | 16451 + * <tt>coeffs</tt> belongs. This is the same as the index of the |
2276 + init_simd(); | 16452 + * transform in the <tt>transforms</tt> array that was passed to |
2277 + | 16453 + * #tjTransform(). |
2278 + return 0; | 16454 + * @param transform a pointer to a #tjtransform structure that specifies the |
2279 +} | 16455 + * parameters and/or cropping region for this transform |
2280 + | 16456 + * |
2281 +GLOBAL(void) | 16457 + * @return 0 if the callback was successful, or -1 if an error occurred. |
2282 +jsimd_h2v2_fancy_upsample (j_decompress_ptr cinfo, | 16458 + */ |
2283 + jpeg_component_info * compptr, | 16459 + int (*customFilter)(short *coeffs, tjregion arrayRegion, |
2284 + JSAMPARRAY input_data, | 16460 + tjregion planeRegion, int componentIndex, int transformIndex, |
2285 + JSAMPARRAY * output_data_ptr) | 16461 + struct tjtransform *transform); |
2286 +{ | 16462 +} tjtransform; |
2287 +} | 16463 + |
2288 + | 16464 +/** |
2289 +GLOBAL(void) | 16465 + * TurboJPEG instance handle |
2290 +jsimd_h2v1_fancy_upsample (j_decompress_ptr cinfo, | 16466 + */ |
2291 + jpeg_component_info * compptr, | 16467 typedef void* tjhandle; |
2292 + JSAMPARRAY input_data, | 16468 |
2293 + JSAMPARRAY * output_data_ptr) | 16469 -#define TJPAD(p) (((p)+3)&(~3)) |
2294 +{ | 16470 -#ifndef max |
2295 +} | 16471 - #define max(a,b) ((a)>(b)?(a):(b)) |
2296 + | 16472 -#endif |
2297 +GLOBAL(int) | 16473 |
2298 +jsimd_can_h2v2_merged_upsample (void) | 16474 +/** |
2299 +{ | 16475 + * Pad the given width to the nearest 32-bit boundary |
2300 + init_simd(); | 16476 + */ |
2301 + | 16477 +#define TJPAD(width) (((width)+3)&(~3)) |
2302 + return 0; | 16478 + |
2303 +} | 16479 +/** |
2304 + | 16480 + * Compute the scaled value of <tt>dimension</tt> using the given scaling |
2305 +GLOBAL(int) | 16481 + * factor. This macro performs the integer equivalent of <tt>ceil(dimension * |
2306 +jsimd_can_h2v1_merged_upsample (void) | 16482 + * scalingFactor)</tt>. |
2307 +{ | 16483 + */ |
2308 + init_simd(); | 16484 +#define TJSCALED(dimension, scalingFactor) ((dimension * scalingFactor.num \ |
2309 + | 16485 + + scalingFactor.denom - 1) / scalingFactor.denom) |
2310 + return 0; | 16486 + |
2311 +} | 16487 + |
2312 + | 16488 #ifdef __cplusplus |
2313 +GLOBAL(void) | 16489 extern "C" { |
2314 +jsimd_h2v2_merged_upsample (j_decompress_ptr cinfo, | 16490 #endif |
2315 + JSAMPIMAGE input_buf, | 16491 |
2316 + JDIMENSION in_row_group_ctr, | 16492 -/* API follows */ |
2317 + JSAMPARRAY output_buf) | 16493 |
2318 +{ | 16494 +/** |
2319 +} | 16495 + * Create a TurboJPEG compressor instance. |
2320 + | 16496 + * |
2321 +GLOBAL(void) | 16497 + * @return a handle to the newly-created instance, or NULL if an error |
2322 +jsimd_h2v1_merged_upsample (j_decompress_ptr cinfo, | 16498 + * occurred (see #tjGetErrorStr().) |
2323 + JSAMPIMAGE input_buf, | 16499 + */ |
2324 + JDIMENSION in_row_group_ctr, | 16500 +DLLEXPORT tjhandle DLLCALL tjInitCompress(void); |
2325 + JSAMPARRAY output_buf) | 16501 |
2326 +{ | 16502 -/* |
2327 +} | 16503 - tjhandle tjInitCompress(void) |
2328 + | 16504 |
2329 +GLOBAL(int) | 16505 - Creates a new JPEG compressor instance, allocates memory for the structures, |
2330 +jsimd_can_convsamp (void) | 16506 - and returns a handle to the instance. Most applications will only |
2331 +{ | 16507 - need to call this once at the beginning of the program or once for each |
2332 + init_simd(); | 16508 - concurrent thread. Don't try to create a new instance every time you |
2333 + | 16509 - compress an image, because this will cause performance to suffer. |
2334 + return 0; | 16510 - |
2335 +} | 16511 - RETURNS: NULL on error |
2336 + | 16512 +/** |
2337 +GLOBAL(int) | 16513 + * Compress an RGB or grayscale image into a JPEG image. |
2338 +jsimd_can_convsamp_float (void) | 16514 + * |
2339 +{ | 16515 + * @param handle a handle to a TurboJPEG compressor or transformer instance |
2340 + init_simd(); | 16516 + * @param srcBuf pointer to an image buffer containing RGB or grayscale pixels |
2341 + | 16517 + * to be compressed |
2342 + return 0; | 16518 + * @param width width (in pixels) of the source image |
2343 +} | 16519 + * @param pitch bytes per line of the source image. Normally, this should be |
2344 + | 16520 + * <tt>width * #tjPixelSize[pixelFormat]</tt> if the image is unpadded, |
2345 +GLOBAL(void) | 16521 + * or <tt>#TJPAD(width * #tjPixelSize[pixelFormat])</tt> if each line of |
2346 +jsimd_convsamp (JSAMPARRAY sample_data, JDIMENSION start_col, | 16522 + * the image is padded to the nearest 32-bit boundary, as is the case |
2347 + DCTELEM * workspace) | 16523 + * for Windows bitmaps. You can also be clever and use this parameter |
2348 +{ | 16524 + * to skip lines, etc. Setting this parameter to 0 is the equivalent of |
2349 +} | 16525 + * setting it to <tt>width * #tjPixelSize[pixelFormat]</tt>. |
2350 + | 16526 + * @param height height (in pixels) of the source image |
2351 +GLOBAL(void) | 16527 + * @param pixelFormat pixel format of the source image (see @ref TJPF |
2352 +jsimd_convsamp_float (JSAMPARRAY sample_data, JDIMENSION start_col, | 16528 + * "Pixel formats".) |
2353 + FAST_FLOAT * workspace) | 16529 + * @param jpegBuf address of a pointer to an image buffer that will receive the |
2354 +{ | 16530 + * JPEG image. TurboJPEG has the ability to reallocate the JPEG buffer |
2355 +} | 16531 + * to accommodate the size of the JPEG image. Thus, you can choose to: |
2356 + | 16532 + * -# pre-allocate the JPEG buffer with an arbitrary size using |
2357 +GLOBAL(int) | 16533 + * #tjAlloc() and let TurboJPEG grow the buffer as needed, |
2358 +jsimd_can_fdct_islow (void) | 16534 + * -# set <tt>*jpegBuf</tt> to NULL to tell TurboJPEG to allocate the |
2359 +{ | 16535 + * buffer for you, or |
2360 + init_simd(); | 16536 + * -# pre-allocate the buffer to a "worst case" size determined by |
2361 + | 16537 + * calling #tjBufSize(). This should ensure that the buffer never has |
2362 + return 0; | 16538 + * to be re-allocated (setting #TJFLAG_NOREALLOC guarantees this.) |
2363 +} | 16539 + * . |
2364 + | 16540 + * If you choose option 1, <tt>*jpegSize</tt> should be set to the |
2365 +GLOBAL(int) | 16541 + * size of your pre-allocated buffer. In any case, unless you have |
2366 +jsimd_can_fdct_ifast (void) | 16542 + * set #TJFLAG_NOREALLOC, you should always check <tt>*jpegBuf</tt> upon |
2367 +{ | 16543 + * return from this function, as it may have changed. |
2368 + init_simd(); | 16544 + * @param jpegSize pointer to an unsigned long variable that holds the size of |
2369 + | 16545 + * the JPEG image buffer. If <tt>*jpegBuf</tt> points to a |
2370 + return 0; | 16546 + * pre-allocated buffer, then <tt>*jpegSize</tt> should be set to the |
2371 +} | 16547 + * size of the buffer. Upon return, <tt>*jpegSize</tt> will contain the |
2372 + | 16548 + * size of the JPEG image (in bytes.) |
2373 +GLOBAL(int) | 16549 + * @param jpegSubsamp the level of chrominance subsampling to be used when |
2374 +jsimd_can_fdct_float (void) | 16550 + * generating the JPEG image (see @ref TJSAMP |
2375 +{ | 16551 + * "Chrominance subsampling options".) |
2376 + init_simd(); | 16552 + * @param jpegQual the image quality of the generated JPEG image (1 = worst, |
2377 + | 16553 + 100 = best) |
2378 + return 0; | 16554 + * @param flags the bitwise OR of one or more of the @ref TJFLAG_BOTTOMUP |
2379 +} | 16555 + * "flags". |
2380 + | 16556 + * |
2381 +GLOBAL(void) | 16557 + * @return 0 if successful, or -1 if an error occurred (see #tjGetErrorStr().) |
2382 +jsimd_fdct_islow (DCTELEM * data) | 16558 */ |
2383 +{ | 16559 -DLLEXPORT tjhandle DLLCALL tjInitCompress(void); |
2384 +} | 16560 +DLLEXPORT int DLLCALL tjCompress2(tjhandle handle, unsigned char *srcBuf, |
2385 + | 16561 + int width, int pitch, int height, int pixelFormat, unsigned char **jpegBuf, |
2386 +GLOBAL(void) | 16562 + unsigned long *jpegSize, int jpegSubsamp, int jpegQual, int flags); |
2387 +jsimd_fdct_ifast (DCTELEM * data) | 16563 |
2388 +{ | 16564 |
2389 +} | 16565 -/* |
2390 + | 16566 - int tjCompress(tjhandle j, |
2391 +GLOBAL(void) | 16567 - unsigned char *srcbuf, int width, int pitch, int height, int pixelsize, |
2392 +jsimd_fdct_float (FAST_FLOAT * data) | 16568 - unsigned char *dstbuf, unsigned long *size, |
2393 +{ | 16569 - int jpegsubsamp, int jpegqual, int flags) |
2394 +} | 16570 +/** |
2395 + | 16571 + * The maximum size of the buffer (in bytes) required to hold a JPEG image with |
2396 +GLOBAL(int) | 16572 + * the given parameters. The number of bytes returned by this function is |
2397 +jsimd_can_quantize (void) | 16573 + * larger than the size of the uncompressed source image. The reason for this |
2398 +{ | 16574 + * is that the JPEG format uses 16-bit coefficients, and it is thus possible |
2399 + init_simd(); | 16575 + * for a very high-quality JPEG image with very high-frequency content to |
2400 + | 16576 + * expand rather than compress when converted to the JPEG format. Such images |
2401 + return 0; | 16577 + * represent a very rare corner case, but since there is no way to predict the |
2402 +} | 16578 + * size of a JPEG image prior to compression, the corner case has to be |
2403 + | 16579 + * handled. |
2404 +GLOBAL(int) | 16580 + * |
2405 +jsimd_can_quantize_float (void) | 16581 + * @param width width of the image (in pixels) |
2406 +{ | 16582 + * @param height height of the image (in pixels) |
2407 + init_simd(); | 16583 + * @param jpegSubsamp the level of chrominance subsampling to be used when |
2408 + | 16584 + * generating the JPEG image (see @ref TJSAMP |
2409 + return 0; | 16585 + * "Chrominance subsampling options".) |
2410 +} | 16586 + * |
2411 + | 16587 + * @return the maximum size of the buffer (in bytes) required to hold the |
2412 +GLOBAL(void) | 16588 + * image, or -1 if the arguments are out of bounds. |
2413 +jsimd_quantize (JCOEFPTR coef_block, DCTELEM * divisors, | 16589 + */ |
2414 + DCTELEM * workspace) | 16590 +DLLEXPORT unsigned long DLLCALL tjBufSize(int width, int height, |
2415 +{ | 16591 + int jpegSubsamp); |
2416 +} | 16592 |
2417 + | 16593 - [INPUT] j = instance handle previously returned from a call to |
2418 +GLOBAL(void) | 16594 - tjInitCompress() |
2419 +jsimd_quantize_float (JCOEFPTR coef_block, FAST_FLOAT * divisors, | 16595 - [INPUT] srcbuf = pointer to user-allocated image buffer containing pixels in |
2420 + FAST_FLOAT * workspace) | 16596 - RGB(A) or BGR(A) form |
2421 +{ | 16597 - [INPUT] width = width (in pixels) of the source image |
2422 +} | 16598 - [INPUT] pitch = bytes per line of the source image (width*pixelsize if the |
2423 + | 16599 - bitmap is unpadded, else TJPAD(width*pixelsize) if each line of the bitmap |
2424 +GLOBAL(int) | 16600 - is padded to the nearest 32-bit boundary, such as is the case for Windows |
2425 +jsimd_can_idct_2x2 (void) | 16601 - bitmaps. You can also be clever and use this parameter to skip lines, etc., |
2426 +{ | 16602 - as long as the pitch is greater than 0.) |
2427 + init_simd(); | 16603 - [INPUT] height = height (in pixels) of the source image |
2428 + | 16604 - [INPUT] pixelsize = size (in bytes) of each pixel in the source image |
2429 + /* The code is optimised for these values only */ | 16605 - RGBA and BGRA: 4, RGB and BGR: 3 |
2430 + if (DCTSIZE != 8) | 16606 - [INPUT] dstbuf = pointer to user-allocated image buffer which will receive |
2431 + return 0; | 16607 - the JPEG image. Use the macro TJBUFSIZE(width, height) to determine |
2432 + if (sizeof(JCOEF) != 2) | 16608 - the appropriate size for this buffer based on the image width and height. |
2433 + return 0; | 16609 - [OUTPUT] size = pointer to unsigned long which receives the size (in bytes) |
2434 + if (BITS_IN_JSAMPLE != 8) | 16610 - of the compressed image |
2435 + return 0; | 16611 - [INPUT] jpegsubsamp = Specifies either 4:2:0, 4:2:2, or 4:4:4 subsampling. |
2436 + if (sizeof(JDIMENSION) != 4) | 16612 - When the image is converted from the RGB to YCbCr colorspace as part of the |
2437 + return 0; | 16613 - JPEG compression process, every other Cb and Cr (chrominance) pixel can be |
2438 + if (sizeof(ISLOW_MULT_TYPE) != 2) | 16614 - discarded to produce a smaller image with little perceptible loss of |
2439 + return 0; | 16615 - image clarity (the human eye is more sensitive to small changes in |
2440 + | 16616 - brightness than small changes in color.) |
2441 + if (simd_support & JSIMD_ARM_NEON) | 16617 |
2442 + return 1; | 16618 - TJ_420: 4:2:0 subsampling. Discards every other Cb, Cr pixel in both |
2443 + | 16619 - horizontal and vertical directions. |
2444 + return 0; | 16620 - TJ_422: 4:2:2 subsampling. Discards every other Cb, Cr pixel only in |
2445 +} | 16621 - the horizontal direction. |
2446 + | 16622 - TJ_444: no subsampling. |
2447 +GLOBAL(int) | 16623 - TJ_GRAYSCALE: Generate grayscale JPEG image |
2448 +jsimd_can_idct_4x4 (void) | 16624 +/** |
2449 +{ | 16625 + * The size of the buffer (in bytes) required to hold a YUV planar image with |
2450 + init_simd(); | 16626 + * the given parameters. |
2451 + | 16627 + * |
2452 + /* The code is optimised for these values only */ | 16628 + * @param width width of the image (in pixels) |
2453 + if (DCTSIZE != 8) | 16629 + * @param height height of the image (in pixels) |
2454 + return 0; | 16630 + * @param subsamp level of chrominance subsampling in the image (see |
2455 + if (sizeof(JCOEF) != 2) | 16631 + * @ref TJSAMP "Chrominance subsampling options".) |
2456 + return 0; | 16632 + * |
2457 + if (BITS_IN_JSAMPLE != 8) | 16633 + * @return the size of the buffer (in bytes) required to hold the image, or |
2458 + return 0; | 16634 + * -1 if the arguments are out of bounds. |
2459 + if (sizeof(JDIMENSION) != 4) | 16635 + */ |
2460 + return 0; | 16636 +DLLEXPORT unsigned long DLLCALL tjBufSizeYUV(int width, int height, |
2461 + if (sizeof(ISLOW_MULT_TYPE) != 2) | 16637 + int subsamp); |
2462 + return 0; | 16638 |
2463 + | 16639 - [INPUT] jpegqual = JPEG quality (an integer between 0 and 100 inclusive.) |
2464 + if (simd_support & JSIMD_ARM_NEON) | 16640 - [INPUT] flags = the bitwise OR of one or more of the following |
2465 + return 1; | 16641 |
2466 + | 16642 - TJ_BGR: The components of each pixel in the source image are stored in |
2467 + return 0; | 16643 - B,G,R order, not R,G,B |
2468 +} | 16644 - TJ_BOTTOMUP: The source image is stored in bottom-up (Windows) order, |
2469 + | 16645 - not top-down |
2470 +GLOBAL(void) | 16646 - TJ_FORCEMMX: Valid only for the Intel Performance Primitives implementation |
2471 +jsimd_idct_2x2 (j_decompress_ptr cinfo, jpeg_component_info * compptr, | 16647 - of this codec-- force IPP to use MMX code (bypass CPU auto-detection) |
2472 + JCOEFPTR coef_block, JSAMPARRAY output_buf, | 16648 - TJ_FORCESSE: Valid only for the Intel Performance Primitives implementation |
2473 + JDIMENSION output_col) | 16649 - of this codec-- force IPP to use SSE code (bypass CPU auto-detection) |
2474 +{ | 16650 - TJ_FORCESSE2: Valid only for the Intel Performance Primitives implementation |
2475 + if (simd_support & JSIMD_ARM_NEON) | 16651 - of this codec-- force IPP to use SSE2 code (bypass CPU auto-detection) |
2476 + jsimd_idct_2x2_neon(compptr->dct_table, coef_block, output_buf, | 16652 - TJ_FORCESSE3: Valid only for the Intel Performance Primitives implementation |
2477 + output_col); | 16653 - of this codec-- force IPP to use SSE3 code (bypass CPU auto-detection) |
2478 +} | 16654 +/** |
2479 + | 16655 + * Encode an RGB or grayscale image into a YUV planar image. This function |
2480 +GLOBAL(void) | 16656 + * uses the accelerated color conversion routines in TurboJPEG's underlying |
2481 +jsimd_idct_4x4 (j_decompress_ptr cinfo, jpeg_component_info * compptr, | 16657 + * codec to produce a planar YUV image that is suitable for X Video. |
2482 + JCOEFPTR coef_block, JSAMPARRAY output_buf, | 16658 + * Specifically, if the chrominance components are subsampled along the |
2483 + JDIMENSION output_col) | 16659 + * horizontal dimension, then the width of the luminance plane is padded to the |
2484 +{ | 16660 + * nearest multiple of 2 in the output image (same goes for the height of the |
2485 + if (simd_support & JSIMD_ARM_NEON) | 16661 + * luminance plane, if the chrominance components are subsampled along the |
2486 + jsimd_idct_4x4_neon(compptr->dct_table, coef_block, output_buf, | 16662 + * vertical dimension.) Also, each line of each plane in the output image is |
2487 + output_col); | 16663 + * padded to 4 bytes. Although this will work with any subsampling option, it |
2488 +} | 16664 + * is really only useful in combination with TJ_420, which produces an image |
2489 + | 16665 + * compatible with the I420 (AKA "YUV420P") format. |
2490 +GLOBAL(int) | 16666 + * <p> |
2491 +jsimd_can_idct_islow (void) | 16667 + * NOTE: Technically, the JPEG format uses the YCbCr colorspace, but per the |
2492 +{ | 16668 + * convention of the digital video community, the TurboJPEG API uses "YUV" to |
2493 + init_simd(); | 16669 + * refer to an image format consisting of Y, Cb, and Cr image planes. |
2494 + | 16670 + * |
2495 + /* The code is optimised for these values only */ | 16671 + * @param handle a handle to a TurboJPEG compressor or transformer instance |
2496 + if (DCTSIZE != 8) | 16672 + * @param srcBuf pointer to an image buffer containing RGB or grayscale pixels |
2497 + return 0; | 16673 + * to be encoded |
2498 + if (sizeof(JCOEF) != 2) | 16674 + * @param width width (in pixels) of the source image |
2499 + return 0; | 16675 + * @param pitch bytes per line of the source image. Normally, this should be |
2500 + if (BITS_IN_JSAMPLE != 8) | 16676 + * <tt>width * #tjPixelSize[pixelFormat]</tt> if the image is unpadded, |
2501 + return 0; | 16677 + * or <tt>#TJPAD(width * #tjPixelSize[pixelFormat])</tt> if each line of |
2502 + if (sizeof(JDIMENSION) != 4) | 16678 + * the image is padded to the nearest 32-bit boundary, as is the case |
2503 + return 0; | 16679 + * for Windows bitmaps. You can also be clever and use this parameter |
2504 + if (sizeof(ISLOW_MULT_TYPE) != 2) | 16680 + * to skip lines, etc. Setting this parameter to 0 is the equivalent of |
2505 + return 0; | 16681 + * setting it to <tt>width * #tjPixelSize[pixelFormat]</tt>. |
2506 + | 16682 + * @param height height (in pixels) of the source image |
2507 + if (simd_support & JSIMD_ARM_NEON) | 16683 + * @param pixelFormat pixel format of the source image (see @ref TJPF |
2508 + return 1; | 16684 + * "Pixel formats".) |
2509 + | 16685 + * @param dstBuf pointer to an image buffer that will receive the YUV image. |
2510 + return 0; | 16686 + * Use #tjBufSizeYUV() to determine the appropriate size for this buffer |
2511 +} | 16687 + * based on the image width, height, and level of chrominance |
2512 + | 16688 + * subsampling. |
2513 +GLOBAL(int) | 16689 + * @param subsamp the level of chrominance subsampling to be used when |
2514 +jsimd_can_idct_ifast (void) | 16690 + * generating the YUV image (see @ref TJSAMP |
2515 +{ | 16691 + * "Chrominance subsampling options".) |
2516 + init_simd(); | 16692 + * @param flags the bitwise OR of one or more of the @ref TJFLAG_BOTTOMUP |
2517 + | 16693 + * "flags". |
2518 + /* The code is optimised for these values only */ | 16694 + * |
2519 + if (DCTSIZE != 8) | 16695 + * @return 0 if successful, or -1 if an error occurred (see #tjGetErrorStr().) |
2520 + return 0; | 16696 +*/ |
2521 + if (sizeof(JCOEF) != 2) | 16697 +DLLEXPORT int DLLCALL tjEncodeYUV2(tjhandle handle, |
2522 + return 0; | 16698 + unsigned char *srcBuf, int width, int pitch, int height, int pixelFormat, |
2523 + if (BITS_IN_JSAMPLE != 8) | 16699 + unsigned char *dstBuf, int subsamp, int flags); |
2524 + return 0; | 16700 |
2525 + if (sizeof(JDIMENSION) != 4) | 16701 - RETURNS: 0 on success, -1 on error |
2526 + return 0; | 16702 + |
2527 + if (sizeof(IFAST_MULT_TYPE) != 2) | 16703 +/** |
2528 + return 0; | 16704 + * Create a TurboJPEG decompressor instance. |
2529 + if (IFAST_SCALE_BITS != 2) | 16705 + * |
2530 + return 0; | 16706 + * @return a handle to the newly-created instance, or NULL if an error |
2531 + | 16707 + * occurred (see #tjGetErrorStr().) |
2532 + if (simd_support & JSIMD_ARM_NEON) | 16708 */ |
2533 + return 1; | 16709 -DLLEXPORT int DLLCALL tjCompress(tjhandle j, |
2534 + | 16710 - unsigned char *srcbuf, int width, int pitch, int height, int pixelsize, |
2535 + return 0; | 16711 - unsigned char *dstbuf, unsigned long *size, |
2536 +} | 16712 - int jpegsubsamp, int jpegqual, int flags); |
2537 + | 16713 +DLLEXPORT tjhandle DLLCALL tjInitDecompress(void); |
2538 +GLOBAL(int) | 16714 |
2539 +jsimd_can_idct_float (void) | 16715 -DLLEXPORT unsigned long DLLCALL TJBUFSIZE(int width, int height); |
2540 +{ | 16716 |
2541 + init_simd(); | 16717 -/* |
2542 + | 16718 - tjhandle tjInitDecompress(void) |
2543 + return 0; | 16719 +/** |
2544 +} | 16720 + * Retrieve information about a JPEG image without decompressing it. |
2545 + | 16721 + * |
2546 +GLOBAL(void) | 16722 + * @param handle a handle to a TurboJPEG decompressor or transformer instance |
2547 +jsimd_idct_islow (j_decompress_ptr cinfo, jpeg_component_info * compptr, | 16723 + * @param jpegBuf pointer to a buffer containing a JPEG image |
2548 + JCOEFPTR coef_block, JSAMPARRAY output_buf, | 16724 + * @param jpegSize size of the JPEG image (in bytes) |
2549 + JDIMENSION output_col) | 16725 + * @param width pointer to an integer variable that will receive the width (in |
2550 +{ | 16726 + * pixels) of the JPEG image |
2551 + if (simd_support & JSIMD_ARM_NEON) | 16727 + * @param height pointer to an integer variable that will receive the height |
2552 + jsimd_idct_islow_neon(compptr->dct_table, coef_block, output_buf, | 16728 + * (in pixels) of the JPEG image |
2553 + output_col); | 16729 + * @param jpegSubsamp pointer to an integer variable that will receive the |
2554 +} | 16730 + * level of chrominance subsampling used when compressing the JPEG image |
2555 + | 16731 + * (see @ref TJSAMP "Chrominance subsampling options".) |
2556 +GLOBAL(void) | 16732 + * |
2557 +jsimd_idct_ifast (j_decompress_ptr cinfo, jpeg_component_info * compptr, | 16733 + * @return 0 if successful, or -1 if an error occurred (see #tjGetErrorStr().) |
2558 + JCOEFPTR coef_block, JSAMPARRAY output_buf, | 16734 +*/ |
2559 + JDIMENSION output_col) | 16735 +DLLEXPORT int DLLCALL tjDecompressHeader2(tjhandle handle, |
2560 +{ | 16736 + unsigned char *jpegBuf, unsigned long jpegSize, int *width, int *height, |
2561 + if (simd_support & JSIMD_ARM_NEON) | 16737 + int *jpegSubsamp); |
2562 + jsimd_idct_ifast_neon(compptr->dct_table, coef_block, output_buf, | 16738 |
2563 + output_col); | 16739 - Creates a new JPEG decompressor instance, allocates memory for the |
2564 +} | 16740 - structures, and returns a handle to the instance. Most applications will |
2565 + | 16741 - only need to call this once at the beginning of the program or once for each |
2566 +GLOBAL(void) | 16742 - concurrent thread. Don't try to create a new instance every time you |
2567 +jsimd_idct_float (j_decompress_ptr cinfo, jpeg_component_info * compptr, | 16743 - decompress an image, because this will cause performance to suffer. |
2568 + JCOEFPTR coef_block, JSAMPARRAY output_buf, | 16744 |
2569 + JDIMENSION output_col) | 16745 - RETURNS: NULL on error |
2570 +{ | 16746 +/** |
2571 +} | 16747 + * Returns a list of fractional scaling factors that the JPEG decompressor in |
2572 Index: simd/jsimd_arm64_neon.S | 16748 + * this implementation of TurboJPEG supports. |
2573 new file mode 100644 | 16749 + * |
| 16750 + * @param numscalingfactors pointer to an integer variable that will receive |
| 16751 + * the number of elements in the list |
| 16752 + * |
| 16753 + * @return a pointer to a list of fractional scaling factors, or NULL if an |
| 16754 + * error is encountered (see #tjGetErrorStr().) |
| 16755 */ |
| 16756 -DLLEXPORT tjhandle DLLCALL tjInitDecompress(void); |
| 16757 +DLLEXPORT tjscalingfactor* DLLCALL tjGetScalingFactors(int *numscalingfactors); |
| 16758 |
| 16759 |
| 16760 -/* |
| 16761 - int tjDecompressHeader(tjhandle j, |
| 16762 - unsigned char *srcbuf, unsigned long size, |
| 16763 - int *width, int *height) |
| 16764 +/** |
| 16765 + * Decompress a JPEG image to an RGB or grayscale image. |
| 16766 + * |
| 16767 + * @param handle a handle to a TurboJPEG decompressor or transformer instance |
| 16768 + * @param jpegBuf pointer to a buffer containing the JPEG image to decompress |
| 16769 + * @param jpegSize size of the JPEG image (in bytes) |
| 16770 + * @param dstBuf pointer to an image buffer that will receive the decompressed |
| 16771 + * image. This buffer should normally be <tt>pitch * scaledHeight</tt> |
| 16772 + * bytes in size, where <tt>scaledHeight</tt> can be determined by |
| 16773 + * calling #TJSCALED() with the JPEG image height and one of the scaling |
| 16774 + * factors returned by #tjGetScalingFactors(). The <tt>dstBuf</tt> |
| 16775 + * pointer may also be used to decompress into a specific region of a |
| 16776 + * larger buffer. |
| 16777 + * @param width desired width (in pixels) of the destination image. If this is |
| 16778 + * different than the width of the JPEG image being decompressed, then |
| 16779 + * TurboJPEG will use scaling in the JPEG decompressor to generate the |
| 16780 + * largest possible image that will fit within the desired width. If |
| 16781 + * <tt>width</tt> is set to 0, then only the height will be considered |
| 16782 + * when determining the scaled image size. |
| 16783 + * @param pitch bytes per line of the destination image. Normally, this is |
| 16784 + * <tt>scaledWidth * #tjPixelSize[pixelFormat]</tt> if the decompressed |
| 16785 + * image is unpadded, else <tt>#TJPAD(scaledWidth * |
| 16786 + * #tjPixelSize[pixelFormat])</tt> if each line of the decompressed |
| 16787 + * image is padded to the nearest 32-bit boundary, as is the case for |
| 16788 + * Windows bitmaps. (NOTE: <tt>scaledWidth</tt> can be determined by |
| 16789 + * calling #TJSCALED() with the JPEG image width and one of the scaling |
| 16790 + * factors returned by #tjGetScalingFactors().) You can also be clever |
| 16791 + * and use the pitch parameter to skip lines, etc. Setting this |
| 16792 + * parameter to 0 is the equivalent of setting it to <tt>scaledWidth |
| 16793 + * * #tjPixelSize[pixelFormat]</tt>. |
| 16794 + * @param height desired height (in pixels) of the destination image. If this |
| 16795 + * is different than the height of the JPEG image being decompressed, |
| 16796 + * then TurboJPEG will use scaling in the JPEG decompressor to generate |
| 16797 + * the largest possible image that will fit within the desired height. |
| 16798 + * If <tt>height</tt> is set to 0, then only the width will be |
| 16799 + * considered when determining the scaled image size. |
| 16800 + * @param pixelFormat pixel format of the destination image (see @ref |
| 16801 + * TJPF "Pixel formats".) |
| 16802 + * @param flags the bitwise OR of one or more of the @ref TJFLAG_BOTTOMUP |
| 16803 + * "flags". |
| 16804 + * |
| 16805 + * @return 0 if successful, or -1 if an error occurred (see #tjGetErrorStr().) |
| 16806 + */ |
| 16807 +DLLEXPORT int DLLCALL tjDecompress2(tjhandle handle, |
| 16808 + unsigned char *jpegBuf, unsigned long jpegSize, unsigned char *dstBuf, |
| 16809 + int width, int pitch, int height, int pixelFormat, int flags); |
| 16810 |
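The `dstBuf`, `width`, `pitch`, and `height` notes for `tjDecompress2()` above amount to a small size calculation. The following is a minimal sketch of that arithmetic only, not of the library call itself. `scaling_factor`, `pad4()`, `scaled()`, and `dst_buf_size()` are hypothetical local stand-ins (they are not part of turbojpeg.h) for `tjscalingfactor`, `TJPAD()`, and `TJSCALED()`, and the round-up behaviour of `scaled()` is an assumption that should be checked against the header before relying on it.

```c
#include <stddef.h>

/* Hypothetical stand-in for tjscalingfactor (a num/denom fraction). */
typedef struct { int num; int denom; } scaling_factor;

/* Round a byte count up to the next multiple of 4, i.e. the 32-bit line
   padding described for Windows bitmaps (assumed to mirror TJPAD()). */
int pad4(int bytes)
{
  return (bytes + 3) & ~3;
}

/* Scale one dimension by num/denom, rounding up (assumed to mirror
   TJSCALED()). */
int scaled(int dimension, scaling_factor sf)
{
  return (dimension * sf.num + sf.denom - 1) / sf.denom;
}

/* Bytes needed for dstBuf: pitch * scaledHeight.  pixel_size plays the role
   of tjPixelSize[pixelFormat]; padded selects 32-bit line padding.  The pitch
   that would be passed to tjDecompress2() is stored in *pitch_out if that
   pointer is non-NULL. */
size_t dst_buf_size(int jpeg_width, int jpeg_height, scaling_factor sf,
                    int pixel_size, int padded, int *pitch_out)
{
  int scaled_width = scaled(jpeg_width, sf);
  int scaled_height = scaled(jpeg_height, sf);
  int pitch = scaled_width * pixel_size;

  if (padded)
    pitch = pad4(pitch);
  if (pitch_out)
    *pitch_out = pitch;
  return (size_t)pitch * (size_t)scaled_height;
}
```

In real code the scaling factor would come from the list returned by `tjGetScalingFactors()`, and the computed pitch and scaled dimensions would be passed on to `tjDecompress2()`.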
| 16811 - [INPUT] j = instance handle previously returned from a call to |
| 16812 - tjInitDecompress() |
| 16813 - [INPUT] srcbuf = pointer to a user-allocated buffer containing the JPEG image |
| 16814 - to decompress |
| 16815 - [INPUT] size = size of the JPEG image buffer (in bytes) |
| 16816 - [OUTPUT] width = width (in pixels) of the JPEG image |
| 16817 - [OUTPUT] height = height (in pixels) of the JPEG image |
| 16818 |
| 16819 - RETURNS: 0 on success, -1 on error |
| 16820 -*/ |
| 16821 -DLLEXPORT int DLLCALL tjDecompressHeader(tjhandle j, |
| 16822 - unsigned char *srcbuf, unsigned long size, |
| 16823 - int *width, int *height); |
| 16824 +/** |
| 16825 + * Decompress a JPEG image to a YUV planar image. This function performs JPEG |
| 16826 + * decompression but leaves out the color conversion step, so a planar YUV |
| 16827 + * image is generated instead of an RGB image. The padding of the planes in |
| 16828 + * this image is the same as in the images generated by #tjEncodeYUV2(). Note |
| 16829 + * that, if the width or height of the image is not an even multiple of the MCU |
| 16830 + * block size (see #tjMCUWidth and #tjMCUHeight), then an intermediate buffer |
| 16831 + * copy will be performed within TurboJPEG. |
| 16832 + * <p> |
| 16833 + * NOTE: Technically, the JPEG format uses the YCbCr colorspace, but per the |
| 16834 + * convention of the digital video community, the TurboJPEG API uses "YUV" to |
| 16835 + * refer to an image format consisting of Y, Cb, and Cr image planes. |
| 16836 + * |
| 16837 + * @param handle a handle to a TurboJPEG decompressor or transformer instance |
| 16838 + * @param jpegBuf pointer to a buffer containing the JPEG image to decompress |
| 16839 + * @param jpegSize size of the JPEG image (in bytes) |
| 16840 + * @param dstBuf pointer to an image buffer that will receive the YUV image. |
| 16841 + * Use #tjBufSizeYUV() to determine the appropriate size for this buffer |
| 16842 + * based on the image width, height, and level of subsampling. |
| 16843 + * @param flags the bitwise OR of one or more of the @ref TJFLAG_BOTTOMUP |
| 16844 + * "flags". |
| 16845 + * |
| 16846 + * @return 0 if successful, or -1 if an error occurred (see #tjGetErrorStr().) |
| 16847 + */ |
| 16848 +DLLEXPORT int DLLCALL tjDecompressToYUV(tjhandle handle, |
| 16849 + unsigned char *jpegBuf, unsigned long jpegSize, unsigned char *dstBuf, |
| 16850 + int flags); |
| 16851 |
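The plane padding rules stated for `tjEncodeYUV2()` (and reused by `tjDecompressToYUV()`) can be turned into a buffer-size estimate. The sketch below handles only the 4:2:0 case and is derived from the prose above, not from the library source; `pad_to()` and `yuv420_buf_size()` are hypothetical names. In real code, `tjBufSizeYUV()` is the function to call. Unlike that function, this sketch returns 0 for out-of-range arguments.

```c
#include <stddef.h>

/* Round v up to the next multiple of `multiple` (multiple > 0). */
int pad_to(int v, int multiple)
{
  return (v + multiple - 1) / multiple * multiple;
}

/* Estimated size in bytes of a planar YUV 4:2:0 image laid out as the
   tjEncodeYUV2() comment describes:
     - the luminance plane is padded to an even width and an even height,
       because chrominance is subsampled in both directions;
     - each chrominance plane has half that width and half that height;
     - every line of every plane is padded to 4 bytes. */
size_t yuv420_buf_size(int width, int height)
{
  int y_width, y_height, c_width, c_height;

  if (width < 1 || height < 1)
    return 0;

  y_width = pad_to(width, 2);
  y_height = pad_to(height, 2);
  c_width = y_width / 2;
  c_height = y_height / 2;

  return (size_t)pad_to(y_width, 4) * (size_t)y_height +
         2 * (size_t)pad_to(c_width, 4) * (size_t)c_height;
}
```

For a 5x3 source, the luminance plane becomes 6x4 with 8-byte lines (32 bytes), and each chrominance plane is 3x2 with 4-byte lines (8 bytes), for 48 bytes in total.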
| 16852 |
| 16853 -/* |
| 16854 - int tjDecompress(tjhandle j, |
| 16855 - unsigned char *srcbuf, unsigned long size, |
| 16856 - unsigned char *dstbuf, int width, int pitch, int height, int pixelsize, |
| 16857 - int flags) |
| 16858 +/** |
| 16859 + * Create a new TurboJPEG transformer instance. |
| 16860 + * |
| 16861 + * @return a handle to the newly-created instance, or NULL if an error |
| 16862 + * occurred (see #tjGetErrorStr().) |
| 16863 + */ |
| 16864 +DLLEXPORT tjhandle DLLCALL tjInitTransform(void); |
| 16865 |
| 16866 - [INPUT] j = instance handle previously returned from a call to |
| 16867 - tjInitDecompress() |
| 16868 - [INPUT] srcbuf = pointer to a user-allocated buffer containing the JPEG image |
| 16869 - to decompress |
| 16870 - [INPUT] size = size of the JPEG image buffer (in bytes) |
| 16871 - [INPUT] dstbuf = pointer to user-allocated image buffer which will receive |
| 16872 - the bitmap image. This buffer should normally be pitch*height |
| 16873 - bytes in size, although this pointer may also be used to decompress into |
| 16874 - a specific region of a larger buffer. |
| 16875 - [INPUT] width = width (in pixels) of the destination image |
| 16876 - [INPUT] pitch = bytes per line of the destination image (width*pixelsize if the |
| 16877 - bitmap is unpadded, else TJPAD(width*pixelsize) if each line of the bitmap |
| 16878 - is padded to the nearest 32-bit boundary, such as is the case for Windows |
| 16879 - bitmaps. You can also be clever and use this parameter to skip lines, etc., |
| 16880 - as long as the pitch is greater than 0.) |
| 16881 - [INPUT] height = height (in pixels) of the destination image |
| 16882 - [INPUT] pixelsize = size (in bytes) of each pixel in the destination image |
| 16883 - RGBA/RGBx and BGRA/BGRx: 4, RGB and BGR: 3 |
| 16884 - [INPUT] flags = the bitwise OR of one or more of the following |
| 16885 |
| 16886 - TJ_BGR: The components of each pixel in the destination image should be |
| 16887 - written in B,G,R order, not R,G,B |
| 16888 - TJ_BOTTOMUP: The destination image should be stored in bottom-up |
| 16889 - (Windows) order, not top-down |
| 16890 - TJ_FORCEMMX: Valid only for the Intel Performance Primitives implementation |
| 16891 - of this codec-- force IPP to use MMX code (bypass CPU auto-detection) |
| 16892 - TJ_FORCESSE: Valid only for the Intel Performance Primitives implementation |
| 16893 - of this codec-- force IPP to use SSE code (bypass CPU auto-detection) |
| 16894 - TJ_FORCESSE2: Valid only for the Intel Performance Primitives implementation |
| 16895 - of this codec-- force IPP to use SSE2 code (bypass CPU auto-detection) |
| 16896 +/** |
| 16897 + * Losslessly transform a JPEG image into another JPEG image. Lossless |
| 16898 + * transforms work by moving the raw coefficients from one JPEG image structure |
| 16899 + * to another without altering the values of the coefficients. While this is |
| 16900 + * typically faster than decompressing the image, transforming it, and |
| 16901 + * re-compressing it, lossless transforms are not free. Each lossless |
| 16902 + * transform requires reading and performing Huffman decoding on all of the |
| 16903 + * coefficients in the source image, regardless of the size of the destination |
| 16904 + * image. Thus, this function provides a means of generating multiple |
| 16905 + * transformed images from the same source or applying multiple |
| 16906 + * transformations simultaneously, in order to eliminate the need to read the |
| 16907 + * source coefficients multiple times. |
| 16908 + * |
| 16909 + * @param handle a handle to a TurboJPEG transformer instance |
| 16910 + * @param jpegBuf pointer to a buffer containing the JPEG image to transform |
| 16911 + * @param jpegSize size of the JPEG image (in bytes) |
| 16912 + * @param n the number of transformed JPEG images to generate |
| 16913 + * @param dstBufs pointer to an array of n image buffers. <tt>dstBufs[i]</tt> |
| 16914 + * will receive a JPEG image that has been transformed using the |
| 16915 + * parameters in <tt>transforms[i]</tt>. TurboJPEG has the ability to |
| 16916 + * reallocate the JPEG buffer to accommodate the size of the JPEG image. |
| 16917 + * Thus, you can choose to: |
| 16918 + * -# pre-allocate the JPEG buffer with an arbitrary size using |
| 16919 + * #tjAlloc() and let TurboJPEG grow the buffer as needed, |
| 16920 + * -# set <tt>dstBufs[i]</tt> to NULL to tell TurboJPEG to allocate the |
| 16921 + * buffer for you, or |
| 16922 + * -# pre-allocate the buffer to a "worst case" size determined by |
| 16923 + * calling #tjBufSize() with the transformed or cropped width and |
| 16924 + * height. This should ensure that the buffer never has to be |
| 16925 + * re-allocated (setting #TJFLAG_NOREALLOC guarantees this.) |
| 16926 + * . |
| 16927 + * If you choose option 1, <tt>dstSizes[i]</tt> should be set to |
| 16928 + * the size of your pre-allocated buffer. In any case, unless you have |
| 16929 + * set #TJFLAG_NOREALLOC, you should always check <tt>dstBufs[i]</tt> |
| 16930 + * upon return from this function, as it may have changed. |
| 16931 + * @param dstSizes pointer to an array of n unsigned long variables that will |
| 16932 + * receive the actual sizes (in bytes) of each transformed JPEG image. |
| 16933 + * If <tt>dstBufs[i]</tt> points to a pre-allocated buffer, then |
| 16934 + * <tt>dstSizes[i]</tt> should be set to the size of the buffer. Upon |
| 16935 + * return, <tt>dstSizes[i]</tt> will contain the size of the JPEG image |
| 16936 + * (in bytes.) |
| 16937 + * @param transforms pointer to an array of n #tjtransform structures, each of |
| 16938 + * which specifies the transform parameters and/or cropping region for |
| 16939 + * the corresponding transformed output image. |
| 16940 + * @param flags the bitwise OR of one or more of the @ref TJFLAG_BOTTOMUP |
| 16941 + * "flags". |
| 16942 + * |
| 16943 + * @return 0 if successful, or -1 if an error occurred (see #tjGetErrorStr().) |
| 16944 + */ |
| 16945 +DLLEXPORT int DLLCALL tjTransform(tjhandle handle, unsigned char *jpegBuf, |
| 16946 + unsigned long jpegSize, int n, unsigned char **dstBufs, |
| 16947 + unsigned long *dstSizes, tjtransform *transforms, int flags); |
| 16948 |
| 16949 - RETURNS: 0 on success, -1 on error |
| 16950 -*/ |
| 16951 -DLLEXPORT int DLLCALL tjDecompress(tjhandle j, |
| 16952 - unsigned char *srcbuf, unsigned long size, |
| 16953 - unsigned char *dstbuf, int width, int pitch, int height, int pixelsize, |
| 16954 - int flags); |
| 16955 |
| 16956 +/** |
| 16957 + * Destroy a TurboJPEG compressor, decompressor, or transformer instance. |
| 16958 + * |
| 16959 + * @param handle a handle to a TurboJPEG compressor, decompressor or |
| 16960 + * transformer instance |
| 16961 + * |
| 16962 + * @return 0 if successful, or -1 if an error occurred (see #tjGetErrorStr().) |
| 16963 + */ |
| 16964 +DLLEXPORT int DLLCALL tjDestroy(tjhandle handle); |
| 16965 |
| 16966 -/* |
| 16967 - int tjDestroy(tjhandle h) |
| 16968 |
| 16969 - Frees structures associated with a compression or decompression instance |
| 16970 - |
| 16971 - [INPUT] h = instance handle (returned from a previous call to |
| 16972 - tjInitCompress() or tjInitDecompress() |
| 16973 +/** |
| 16974 + * Allocate an image buffer for use with TurboJPEG. You should always use |
| 16975 + * this function to allocate the JPEG destination buffer(s) for #tjCompress2() |
| 16976 + * and #tjTransform() unless you are disabling automatic buffer |
| 16977 + * (re)allocation (by setting #TJFLAG_NOREALLOC.) |
| 16978 + * |
| 16979 + * @param bytes the number of bytes to allocate |
| 16980 + * |
| 16981 + * @return a pointer to a newly-allocated buffer with the specified number of |
| 16982 + * bytes |
| 16983 + * |
| 16984 + * @sa tjFree() |
| 16985 + */ |
| 16986 +DLLEXPORT unsigned char* DLLCALL tjAlloc(int bytes); |
| 16987 |
| 16988 - RETURNS: 0 on success, -1 on error |
| 16989 -*/ |
| 16990 -DLLEXPORT int DLLCALL tjDestroy(tjhandle h); |
| 16991 |
| 16992 +/** |
| 16993 + * Free an image buffer previously allocated by TurboJPEG. You should always |
| 16994 + * use this function to free JPEG destination buffer(s) that were automatically |
| 16995 + * (re)allocated by #tjCompress2() or #tjTransform() or that were manually |
| 16996 + * allocated using #tjAlloc(). |
| 16997 + * |
| 16998 + * @param buffer address of the buffer to free |
| 16999 + * |
| 17000 + * @sa tjAlloc() |
| 17001 + */ |
| 17002 +DLLEXPORT void DLLCALL tjFree(unsigned char *buffer); |
| 17003 |
| 17004 -/* |
| 17005 - char *tjGetErrorStr(void) |
| 17006 - |
| 17007 - Returns a descriptive error message explaining why the last command failed |
| 17008 -*/ |
| 17009 + |
| 17010 +/** |
| 17011 + * Returns a descriptive error message explaining why the last command failed. |
| 17012 + * |
| 17013 + * @return a descriptive error message explaining why the last command failed. |
| 17014 + */ |
| 17015 DLLEXPORT char* DLLCALL tjGetErrorStr(void); |
| 17016 |
| 17017 + |
| 17018 +/* Backward compatibility functions and macros (nothing to see here) */ |
| 17019 +#define NUMSUBOPT TJ_NUMSAMP |
| 17020 +#define TJ_444 TJSAMP_444 |
| 17021 +#define TJ_422 TJSAMP_422 |
| 17022 +#define TJ_420 TJSAMP_420 |
| 17023 +#define TJ_411 TJSAMP_420 |
| 17024 +#define TJ_GRAYSCALE TJSAMP_GRAY |
| 17025 + |
| 17026 +#define TJ_BGR 1 |
| 17027 +#define TJ_BOTTOMUP TJFLAG_BOTTOMUP |
| 17028 +#define TJ_FORCEMMX TJFLAG_FORCEMMX |
| 17029 +#define TJ_FORCESSE TJFLAG_FORCESSE |
| 17030 +#define TJ_FORCESSE2 TJFLAG_FORCESSE2 |
| 17031 +#define TJ_ALPHAFIRST 64 |
| 17032 +#define TJ_FORCESSE3 TJFLAG_FORCESSE3 |
| 17033 +#define TJ_FASTUPSAMPLE TJFLAG_FASTUPSAMPLE |
| 17034 +#define TJ_YUV 512 |
| 17035 + |
| 17036 +DLLEXPORT unsigned long DLLCALL TJBUFSIZE(int width, int height); |
| 17037 + |
| 17038 +DLLEXPORT unsigned long DLLCALL TJBUFSIZEYUV(int width, int height, |
| 17039 + int jpegSubsamp); |
| 17040 + |
| 17041 +DLLEXPORT int DLLCALL tjCompress(tjhandle handle, unsigned char *srcBuf, |
| 17042 + int width, int pitch, int height, int pixelSize, unsigned char *dstBuf, |
| 17043 + unsigned long *compressedSize, int jpegSubsamp, int jpegQual, int flags); |
| 17044 + |
| 17045 +DLLEXPORT int DLLCALL tjEncodeYUV(tjhandle handle, |
| 17046 + unsigned char *srcBuf, int width, int pitch, int height, int pixelSize, |
| 17047 + unsigned char *dstBuf, int subsamp, int flags); |
| 17048 + |
| 17049 +DLLEXPORT int DLLCALL tjDecompressHeader(tjhandle handle, |
| 17050 + unsigned char *jpegBuf, unsigned long jpegSize, int *width, int *height); |
| 17051 + |
| 17052 +DLLEXPORT int DLLCALL tjDecompress(tjhandle handle, |
| 17053 + unsigned char *jpegBuf, unsigned long jpegSize, unsigned char *dstBuf, |
| 17054 + int width, int pitch, int height, int pixelSize, int flags); |
| 17055 + |
| 17056 + |
| 17057 +/** |
| 17058 + * @} |
| 17059 + */ |
| 17060 + |
| 17061 #ifdef __cplusplus |
| 17062 } |
| 17063 #endif |
| 17064 + |
| 17065 +#endif |
| 17066 Index: turbojpegl.c |
2574 =================================================================== | 17067 =================================================================== |
2575 --- /dev/null | 17068 --- turbojpegl.c (revision 829) |
2576 +++ simd/jsimd_arm64_neon.S | 17069 +++ turbojpegl.c (working copy) |
2577 @@ -0,0 +1,1861 @@ | 17070 @@ -149,6 +149,10 @@ |
2578 +/* | 17071 #error "TurboJPEG requires JPEG colorspace extensions" |
2579 + * ARMv8 NEON optimizations for libjpeg-turbo | 17072 #endif |
2580 + * | 17073 |
2581 + * Copyright (C) 2009-2011 Nokia Corporation and/or its subsidiary(-ies). | 17074 + if(flags&TJ_FORCEMMX) putenv("JSIMD_FORCEMMX=1"); |
2582 + * All rights reserved. | 17075 + else if(flags&TJ_FORCESSE) putenv("JSIMD_FORCESSE=1"); |
2583 + * Author: Siarhei Siamashka <siarhei.siamashka@nokia.com> | 17076 + else if(flags&TJ_FORCESSE2) putenv("JSIMD_FORCESSE2=1"); |
2584 + * Copyright (C) 2013-2014, Linaro Limited | 17077 + |
2585 + * Author: Ragesh Radhakrishnan <ragesh.r@linaro.org> | 17078 if(setjmp(j->jerr.jb)) |
2586 + * | 17079 { // this will execute if LIBJPEG has an error |
2587 + * This software is provided 'as-is', without any express or implied | 17080 if(row_pointer) free(row_pointer); |
2588 + * warranty. In no event will the authors be held liable for any damages | 17081 @@ -188,7 +192,8 @@ |
2589 + * arising from the use of this software. | 17082 j->cinfo.image_height-j->cinfo.next_scanline); |
2590 + * | 17083 } |
2591 + * Permission is granted to anyone to use this software for any purpose, | 17084 jpeg_finish_compress(&j->cinfo); |
2592 + * including commercial applications, and to alter it and redistribute it | 17085 - *size=TJBUFSIZE(j->cinfo.image_width, j->cinfo.image_height)-(j->jdms.free_in_buffer); |
2593 + * freely, subject to the following restrictions: | 17086 + *size=TJBUFSIZE(j->cinfo.image_width, j->cinfo.image_height) |
2594 + * | 17087 + -(unsigned long)(j->jdms.free_in_buffer); |
2595 + * 1. The origin of this software must not be misrepresented; you must not | 17088 |
2596 + * claim that you wrote the original software. If you use this software | 17089 if(row_pointer) free(row_pointer); |
2597 + * in a product, an acknowledgment in the product documentation would be | 17090 return 0; |
2598 + * appreciated but is not required. | 17091 @@ -287,6 +292,10 @@ |
2599 + * 2. Altered source versions must be plainly marked as such, and must not be | 17092 |
2600 + * misrepresented as being the original software. | 17093 if(pitch==0) pitch=width*ps; |
2601 + * 3. This notice may not be removed or altered from any source distribution. | 17094 |
2602 + */ | 17095 + if(flags&TJ_FORCEMMX) putenv("JSIMD_FORCEMMX=1"); |
2603 + | 17096 + else if(flags&TJ_FORCESSE) putenv("JSIMD_FORCESSE=1"); |
2604 +#if defined(__linux__) && defined(__ELF__) | 17097 + else if(flags&TJ_FORCESSE2) putenv("JSIMD_FORCESSE2=1"); |
2605 +.section .note.GNU-stack,"",%progbits /* mark stack as non-executable */ | 17098 + |
2606 +#endif | 17099 if(setjmp(j->jerr.jb)) |
2607 + | 17100 { // this will execute if LIBJPEG has an error |
2608 +.text | 17101 if(row_pointer) free(row_pointer); |
2609 +.arch armv8-a+fp+simd | 17102 Index: wrppm.c |
2610 + | 17103 =================================================================== |
2611 + | 17104 --- wrppm.c (revision 829) |
2612 +#define RESPECT_STRICT_ALIGNMENT 1 | 17105 +++ wrppm.c (working copy) |
2613 + | 17106 @@ -2,6 +2,7 @@ |
2614 + | 17107 * wrppm.c |
2615 +/*****************************************************************************/ | 17108 * |
2616 + | 17109 * Copyright (C) 1991-1996, Thomas G. Lane. |
2617 +/* Supplementary macro for setting function attributes */ | 17110 + * Modified 2009 by Guido Vollbeding. |
2618 +.macro asm_function fname | 17111 * This file is part of the Independent JPEG Group's software. |
2619 +#ifdef __APPLE__ | 17112 * For conditions of distribution and use, see the accompanying README file. |
2620 + .globl _\fname | 17113 * |
2621 +_\fname: | 17114 @@ -40,11 +41,11 @@ |
2622 +#else | 17115 #define BYTESPERSAMPLE 1 |
2623 + .global \fname | 17116 #define PPM_MAXVAL 255 |
2624 +#ifdef __ELF__ | 17117 #else |
2625 + .hidden \fname | 17118 -/* The word-per-sample format always puts the LSB first. */ |
2626 + .type \fname, %function | 17119 +/* The word-per-sample format always puts the MSB first. */ |
2627 +#endif | 17120 #define PUTPPMSAMPLE(ptr,v) \ |
2628 +\fname: | 17121 { register int val_ = v; \ |
2629 +#endif | 17122 + *ptr++ = (char) ((val_ >> 8) & 0xFF); \ |
2630 +.endm | 17123 *ptr++ = (char) (val_ & 0xFF); \ |
2631 + | 17124 - *ptr++ = (char) ((val_ >> 8) & 0xFF); \ |
2632 +/* Transpose elements of single 128 bit registers */ | 17125 } |
2633 +.macro transpose_single x0,x1,xi,xilen,literal | 17126 #define BYTESPERSAMPLE 2 |
2634 + ins \xi\xilen[0], \x0\xilen[0] | 17127 #define PPM_MAXVAL ((1<<BITS_IN_JSAMPLE)-1) |
2635 + ins \x1\xilen[0], \x0\xilen[1] | |
2636 + trn1 \x0\literal, \x0\literal, \x1\literal | |
2637 + trn2 \x1\literal, \xi\literal, \x1\literal | |
2638 +.endm | |
2639 + | |
2640 +/* Transpose elements of 2 different registers */ | |
2641 +.macro transpose x0,x1,xi,xilen,literal | |
2642 + mov \xi\xilen, \x0\xilen | |
2643 + trn1 \x0\literal, \x0\literal, \x1\literal | |
2644 + trn2 \x1\literal, \xi\literal, \x1\literal | |
2645 +.endm | |
2646 + | |
2647 +/* Transpose a block of 4x4 coefficients in four 64-bit registers */ | |
2648 +.macro transpose_4x4_32 x0,x0len x1,x1len x2,x2len x3,x3len,xi,xilen | |
2649 + mov \xi\xilen, \x0\xilen | |
2650 + trn1 \x0\x0len, \x0\x0len, \x2\x2len | |
2651 + trn2 \x2\x2len, \xi\x0len, \x2\x2len | |
2652 + mov \xi\xilen, \x1\xilen | |
2653 + trn1 \x1\x1len, \x1\x1len, \x3\x3len | |
2654 + trn2 \x3\x3len, \xi\x1len, \x3\x3len | |
2655 +.endm | |
2656 + | |
2657 +.macro transpose_4x4_16 x0,x0len x1,x1len, x2,x2len, x3,x3len,xi,xilen | |
2658 + mov \xi\xilen, \x0\xilen | |
2659 + trn1 \x0\x0len, \x0\x0len, \x1\x1len | |
2660 + trn2 \x1\x2len, \xi\x0len, \x1\x2len | |
2661 + mov \xi\xilen, \x2\xilen | |
2662 + trn1 \x2\x2len, \x2\x2len, \x3\x3len | |
2663 + trn2 \x3\x2len, \xi\x1len, \x3\x3len | |
2664 +.endm | |
2665 + | |
2666 +.macro transpose_4x4 x0, x1, x2, x3,x5 | |
2667 + transpose_4x4_16 \x0,.4h, \x1,.4h, \x2,.4h,\x3,.4h,\x5,.16b | |
2668 + transpose_4x4_32 \x0,.2s, \x1,.2s, \x2,.2s,\x3,.2s,\x5,.16b | |
2669 +.endm | |
2670 + | |
2671 + | |
2672 +#define CENTERJSAMPLE 128 | |
2673 + | |
2674 +/*****************************************************************************/ | |
2675 + | |
2676 +/* | |
2677 + * Perform dequantization and inverse DCT on one block of coefficients. | |
2678 + * | |
2679 + * GLOBAL(void) | |
2680 + * jsimd_idct_islow_neon (void * dct_table, JCOEFPTR coef_block, | |
2681 + * JSAMPARRAY output_buf, JDIMENSION output_col) | |
2682 + */ | |
2683 + | |
2684 +#define FIX_0_298631336 (2446) | |
2685 +#define FIX_0_390180644 (3196) | |
2686 +#define FIX_0_541196100 (4433) | |
2687 +#define FIX_0_765366865 (6270) | |
2688 +#define FIX_0_899976223 (7373) | |
2689 +#define FIX_1_175875602 (9633) | |
2690 +#define FIX_1_501321110 (12299) | |
2691 +#define FIX_1_847759065 (15137) | |
2692 +#define FIX_1_961570560 (16069) | |
2693 +#define FIX_2_053119869 (16819) | |
2694 +#define FIX_2_562915447 (20995) | |
2695 +#define FIX_3_072711026 (25172) | |
2696 + | |
2697 +#define FIX_1_175875602_MINUS_1_961570560 (FIX_1_175875602 - FIX_1_961570560) | |
2698 +#define FIX_1_175875602_MINUS_0_390180644 (FIX_1_175875602 - FIX_0_390180644) | |
2699 +#define FIX_0_541196100_MINUS_1_847759065 (FIX_0_541196100 - FIX_1_847759065) | |
2700 +#define FIX_3_072711026_MINUS_2_562915447 (FIX_3_072711026 - FIX_2_562915447) | |
2701 +#define FIX_0_298631336_MINUS_0_899976223 (FIX_0_298631336 - FIX_0_899976223) | |
2702 +#define FIX_1_501321110_MINUS_0_899976223 (FIX_1_501321110 - FIX_0_899976223) | |
2703 +#define FIX_2_053119869_MINUS_2_562915447 (FIX_2_053119869 - FIX_2_562915447) | |
2704 +#define FIX_0_541196100_PLUS_0_765366865 (FIX_0_541196100 + FIX_0_765366865) | |
2705 + | |
2706 +/* | |
2707 + * Reference SIMD-friendly 1-D ISLOW iDCT C implementation. | |
2708 + * Uses some ideas from the comments in 'simd/jiss2int-64.asm' | |
2709 + */ | |
2710 +#define REF_1D_IDCT(xrow0, xrow1, xrow2, xrow3, xrow4, xrow5, xrow6, xrow7) \ | |
2711 +{ \ | |
2712 + DCTELEM row0, row1, row2, row3, row4, row5, row6, row7; \ | |
2713 + INT32 q1, q2, q3, q4, q5, q6, q7; \ | |
2714 + INT32 tmp11_plus_tmp2, tmp11_minus_tmp2; \ | |
2715 + \ | |
2716 + /* 1-D iDCT input data */ \ | |
2717 + row0 = xrow0; \ | |
2718 + row1 = xrow1; \ | |
2719 + row2 = xrow2; \ | |
2720 + row3 = xrow3; \ | |
2721 + row4 = xrow4; \ | |
2722 + row5 = xrow5; \ | |
2723 + row6 = xrow6; \ | |
2724 + row7 = xrow7; \ | |
2725 + \ | |
2726 + q5 = row7 + row3; \ | |
2727 + q4 = row5 + row1; \ | |
2728 + q6 = MULTIPLY(q5, FIX_1_175875602_MINUS_1_961570560) + \ | |
2729 + MULTIPLY(q4, FIX_1_175875602); \ | |
2730 + q7 = MULTIPLY(q5, FIX_1_175875602) + \ | |
2731 + MULTIPLY(q4, FIX_1_175875602_MINUS_0_390180644); \ | |
2732 + q2 = MULTIPLY(row2, FIX_0_541196100) + \ | |
2733 + MULTIPLY(row6, FIX_0_541196100_MINUS_1_847759065); \ | |
2734 + q4 = q6; \ | |
2735 + q3 = ((INT32) row0 - (INT32) row4) << 13; \ | |
2736 + q6 += MULTIPLY(row5, -FIX_2_562915447) + \ | |
2737 + MULTIPLY(row3, FIX_3_072711026_MINUS_2_562915447); \ | |
2738 + /* now we can use q1 (reloadable constants have been used up) */ \ | |
2739 + q1 = q3 + q2; \ | |
2740 + q4 += MULTIPLY(row7, FIX_0_298631336_MINUS_0_899976223) + \ | |
2741 + MULTIPLY(row1, -FIX_0_899976223); \ | |
2742 + q5 = q7; \ | |
2743 + q1 = q1 + q6; \ | |
2744 + q7 += MULTIPLY(row7, -FIX_0_899976223) + \ | |
2745 + MULTIPLY(row1, FIX_1_501321110_MINUS_0_899976223); \ | |
2746 + \ | |
2747 + /* (tmp11 + tmp2) has been calculated (out_row1 before descale) */ \ | |
2748 + tmp11_plus_tmp2 = q1; \ | |
2749 + row1 = 0; \ | |
2750 + \ | |
2751 + q1 = q1 - q6; \ | |
2752 + q5 += MULTIPLY(row5, FIX_2_053119869_MINUS_2_562915447) + \ | |
2753 + MULTIPLY(row3, -FIX_2_562915447); \ | |
2754 + q1 = q1 - q6; \ | |
2755 + q6 = MULTIPLY(row2, FIX_0_541196100_PLUS_0_765366865) + \ | |
2756 + MULTIPLY(row6, FIX_0_541196100); \ | |
2757 + q3 = q3 - q2; \ | |
2758 + \ | |
2759 + /* (tmp11 - tmp2) has been calculated (out_row6 before descale) */ \ | |
2760 + tmp11_minus_tmp2 = q1; \ | |
2761 + \ | |
2762 + q1 = ((INT32) row0 + (INT32) row4) << 13; \ | |
2763 + q2 = q1 + q6; \ | |
2764 + q1 = q1 - q6; \ | |
2765 + \ | |
2766 + /* pick up the results */ \ | |
2767 + tmp0 = q4; \ | |
2768 + tmp1 = q5; \ | |
2769 + tmp2 = (tmp11_plus_tmp2 - tmp11_minus_tmp2) / 2; \ | |
2770 + tmp3 = q7; \ | |
2771 + tmp10 = q2; \ | |
2772 + tmp11 = (tmp11_plus_tmp2 + tmp11_minus_tmp2) / 2; \ | |
2773 + tmp12 = q3; \ | |
2774 + tmp13 = q1; \ | |
2775 +} | |
2776 + | |
2777 +#define XFIX_0_899976223 v0.4h[0] | |
2778 +#define XFIX_0_541196100 v0.4h[1] | |
2779 +#define XFIX_2_562915447 v0.4h[2] | |
2780 +#define XFIX_0_298631336_MINUS_0_899976223 v0.4h[3] | |
2781 +#define XFIX_1_501321110_MINUS_0_899976223 v1.4h[0] | |
2782 +#define XFIX_2_053119869_MINUS_2_562915447 v1.4h[1] | |
2783 +#define XFIX_0_541196100_PLUS_0_765366865 v1.4h[2] | |
2784 +#define XFIX_1_175875602 v1.4h[3] | |
2785 +#define XFIX_1_175875602_MINUS_0_390180644 v2.4h[0] | |
2786 +#define XFIX_0_541196100_MINUS_1_847759065 v2.4h[1] | |
2787 +#define XFIX_3_072711026_MINUS_2_562915447 v2.4h[2] | |
2788 +#define XFIX_1_175875602_MINUS_1_961570560 v2.4h[3] | |
2789 + | |
2790 +.balign 16 | |
2791 +jsimd_idct_islow_neon_consts: | |
2792 + .short FIX_0_899976223 /* d0[0] */ | |
2793 + .short FIX_0_541196100 /* d0[1] */ | |
2794 + .short FIX_2_562915447 /* d0[2] */ | |
2795 + .short FIX_0_298631336_MINUS_0_899976223 /* d0[3] */ | |
2796 + .short FIX_1_501321110_MINUS_0_899976223 /* d1[0] */ | |
2797 + .short FIX_2_053119869_MINUS_2_562915447 /* d1[1] */ | |
2798 + .short FIX_0_541196100_PLUS_0_765366865 /* d1[2] */ | |
2799 + .short FIX_1_175875602 /* d1[3] */ | |
2800 + /* reloadable constants */ | |
2801 + .short FIX_1_175875602_MINUS_0_390180644 /* d2[0] */ | |
2802 + .short FIX_0_541196100_MINUS_1_847759065 /* d2[1] */ | |
2803 + .short FIX_3_072711026_MINUS_2_562915447 /* d2[2] */ | |
2804 + .short FIX_1_175875602_MINUS_1_961570560 /* d2[3] */ | |
2805 + | |
2806 +asm_function jsimd_idct_islow_neon | |
2807 + | |
2808 + DCT_TABLE .req x0 | |
2809 + COEF_BLOCK .req x1 | |
2810 + OUTPUT_BUF .req x2 | |
2811 + OUTPUT_COL .req x3 | |
2812 + TMP1 .req x0 | |
2813 + TMP2 .req x1 | |
2814 + TMP3 .req x2 | |
2815 + TMP4 .req x15 | |
2816 + | |
2817 + ROW0L .req v16 | |
2818 + ROW0R .req v17 | |
2819 + ROW1L .req v18 | |
2820 + ROW1R .req v19 | |
2821 + ROW2L .req v20 | |
2822 + ROW2R .req v21 | |
2823 + ROW3L .req v22 | |
2824 + ROW3R .req v23 | |
2825 + ROW4L .req v24 | |
2826 + ROW4R .req v25 | |
2827 + ROW5L .req v26 | |
2828 + ROW5R .req v27 | |
2829 + ROW6L .req v28 | |
2830 + ROW6R .req v29 | |
2831 + ROW7L .req v30 | |
2832 + ROW7R .req v31 | |
2833 + /* Save all NEON registers and x15 (32 NEON registers * 8 bytes + 16) */ | |
2834 + sub sp, sp, 272 | |
2835 + str x15, [sp], 16 | |
2836 + adr x15, jsimd_idct_islow_neon_consts | |
2837 + st1 {v0.8b - v3.8b}, [sp], 32 | |
2838 + st1 {v4.8b - v7.8b}, [sp], 32 | |
2839 + st1 {v8.8b - v11.8b}, [sp], 32 | |
2840 + st1 {v12.8b - v15.8b}, [sp], 32 | |
2841 + st1 {v16.8b - v19.8b}, [sp], 32 | |
2842 + st1 {v20.8b - v23.8b}, [sp], 32 | |
2843 + st1 {v24.8b - v27.8b}, [sp], 32 | |
2844 + st1 {v28.8b - v31.8b}, [sp], 32 | |
2845 + ld1 {v16.4h, v17.4h, v18.4h, v19.4h}, [COEF_BLOCK], 32 | |
2846 + ld1 {v0.4h, v1.4h, v2.4h, v3.4h}, [DCT_TABLE], 32 | |
2847 + ld1 {v20.4h, v21.4h, v22.4h, v23.4h}, [COEF_BLOCK], 32 | |
2848 + mul v16.4h, v16.4h, v0.4h | |
2849 + mul v17.4h, v17.4h, v1.4h | |
2850 + ins v16.2d[1], v17.2d[0] /* 128 bit q8 */ | |
2851 + ld1 {v4.4h, v5.4h, v6.4h, v7.4h}, [DCT_TABLE], 32 | |
2852 + mul v18.4h, v18.4h, v2.4h | |
2853 + mul v19.4h, v19.4h, v3.4h | |
2854 + ins v18.2d[1], v19.2d[0] /* 128 bit q9 */ | |
2855 + ld1 {v24.4h, v25.4h, v26.4h, v27.4h}, [COEF_BLOCK], 32 | |
2856 + mul v20.4h, v20.4h, v4.4h | |
2857 + mul v21.4h, v21.4h, v5.4h | |
2858 + ins v20.2d[1], v21.2d[0] /* 128 bit q10 */ | |
2859 + ld1 {v0.4h, v1.4h, v2.4h, v3.4h}, [DCT_TABLE], 32 | |
2860 + mul v22.4h, v22.4h, v6.4h | |
2861 + mul v23.4h, v23.4h, v7.4h | |
2862 + ins v22.2d[1], v23.2d[0] /* 128 bit q11 */ | |
2863 + ld1 {v28.4h, v29.4h, v30.4h, v31.4h}, [COEF_BLOCK] | |
2864 + mul v24.4h, v24.4h, v0.4h | |
2865 + mul v25.4h, v25.4h, v1.4h | |
2866 + ins v24.2d[1], v25.2d[0] /* 128 bit q12 */ | |
2867 + ld1 {v4.4h, v5.4h, v6.4h, v7.4h}, [DCT_TABLE], 32 | |
2868 + mul v28.4h, v28.4h, v4.4h | |
2869 + mul v29.4h, v29.4h, v5.4h | |
2870 + ins v28.2d[1], v29.2d[0] /* 128 bit q14 */ | |
2871 + mul v26.4h, v26.4h, v2.4h | |
2872 + mul v27.4h, v27.4h, v3.4h | |
2873 + ins v26.2d[1], v27.2d[0] /* 128 bit q13 */ | |
2874 + ld1 {v0.4h, v1.4h, v2.4h, v3.4h}, [x15] /* load constants */ | |
2875 + add x15, x15, #16 | |
2876 + mul v30.4h, v30.4h, v6.4h | |
2877 + mul v31.4h, v31.4h, v7.4h | |
2878 + ins v30.2d[1], v31.2d[0] /* 128 bit q15 */ | |
2879 + /* Go to the bottom of the stack */ | |
2880 + sub sp, sp, 352 | |
2881 + stp x4, x5, [sp], 16 | |
2882 + st1 {v8.4h - v11.4h}, [sp], 32 /* save NEON registers */ | |
2883 + st1 {v12.4h - v15.4h}, [sp], 32 | |
2884 + /* 1-D IDCT, pass 1, left 4x8 half */ | |
2885 + add v4.4h, ROW7L.4h, ROW3L.4h | |
2886 + add v5.4h, ROW5L.4h, ROW1L.4h | |
2887 + smull v12.4s, v4.4h, XFIX_1_175875602_MINUS_1_961570560 | |
2888 + smlal v12.4s, v5.4h, XFIX_1_175875602 | |
2889 + smull v14.4s, v4.4h, XFIX_1_175875602 | |
2890 + /* Check for the zero coefficients in the right 4x8 half */ | |
2891 + smlal v14.4s, v5.4h, XFIX_1_175875602_MINUS_0_390180644 | |
2892 + ssubl v6.4s, ROW0L.4h, ROW4L.4h | |
2893 + ldp w4, w5, [COEF_BLOCK, #(-96 + 2 * (4 + 1 * 8))] | |
2894 + smull v4.4s, ROW2L.4h, XFIX_0_541196100 | |
2895 + smlal v4.4s, ROW6L.4h, XFIX_0_541196100_MINUS_1_847759065 | |
2896 + orr x0, x4, x5 | |
2897 + mov v8.16b, v12.16b | |
2898 + smlsl v12.4s, ROW5L.4h, XFIX_2_562915447 | |
2899 + ldp w4, w5, [COEF_BLOCK, #(-96 + 2 * (4 + 2 * 8))] | |
2900 + smlal v12.4s, ROW3L.4h, XFIX_3_072711026_MINUS_2_562915447 | |
2901 + shl v6.4s, v6.4s, #13 | |
2902 + orr x0, x0, x4 | |
2903 + smlsl v8.4s, ROW1L.4h, XFIX_0_899976223 | |
2904 + orr x0, x0 , x5 | |
2905 + add v2.4s, v6.4s, v4.4s | |
2906 + ldp w4, w5, [COEF_BLOCK, #(-96 + 2 * (4 + 3 * 8))] | |
2907 + mov v10.16b, v14.16b | |
2908 + add v2.4s, v2.4s, v12.4s | |
2909 + orr x0, x0, x4 | |
2910 + smlsl v14.4s, ROW7L.4h, XFIX_0_899976223 | |
2911 + orr x0, x0, x5 | |
2912 + smlal v14.4s, ROW1L.4h, XFIX_1_501321110_MINUS_0_899976223 | |
2913 + rshrn ROW1L.4h, v2.4s, #11 | |
2914 + ldp w4, w5, [COEF_BLOCK, #(-96 + 2 * (4 + 4 * 8))] | |
2915 + sub v2.4s, v2.4s, v12.4s | |
2916 + smlal v10.4s, ROW5L.4h, XFIX_2_053119869_MINUS_2_562915447 | |
2917 + orr x0, x0, x4 | |
2918 + smlsl v10.4s, ROW3L.4h, XFIX_2_562915447 | |
2919 + orr x0, x0, x5 | |
2920 + sub v2.4s, v2.4s, v12.4s | |
2921 + smull v12.4s, ROW2L.4h, XFIX_0_541196100_PLUS_0_765366865 | |
2922 + ldp w4, w5, [COEF_BLOCK, #(-96 + 2 * (4 + 5 * 8))] | |
2923 + smlal v12.4s, ROW6L.4h, XFIX_0_541196100 | |
2924 + sub v6.4s, v6.4s, v4.4s | |
2925 + orr x0, x0, x4 | |
2926 + rshrn ROW6L.4h, v2.4s, #11 | |
2927 + orr x0, x0, x5 | |
2928 + add v2.4s, v6.4s, v10.4s | |
2929 + ldp w4, w5, [COEF_BLOCK, #(-96 + 2 * (4 + 6 * 8))] | |
2930 + sub v6.4s, v6.4s, v10.4s | |
2931 + saddl v10.4s, ROW0L.4h, ROW4L.4h | |
2932 + orr x0, x0, x4 | |
2933 + rshrn ROW2L.4h, v2.4s, #11 | |
2934 + orr x0, x0, x5 | |
2935 + rshrn ROW5L.4h, v6.4s, #11 | |
2936 + ldp w4, w5, [COEF_BLOCK, #(-96 + 2 * (4 + 7 * 8))] | |
2937 + shl v10.4s, v10.4s, #13 | |
2938 + smlal v8.4s, ROW7L.4h, XFIX_0_298631336_MINUS_0_899976223 | |
2939 + orr x0, x0, x4 | |
2940 + add v4.4s, v10.4s, v12.4s | |
2941 + orr x0, x0, x5 | |
2942 + cmp x0, #0 /* orrs instruction removed */ | |
2943 + sub v2.4s, v10.4s, v12.4s | |
2944 + add v12.4s, v4.4s, v14.4s | |
2945 + ldp w4, w5, [COEF_BLOCK, #(-96 + 2 * (4 + 0 * 8))] | |
2946 + sub v4.4s, v4.4s, v14.4s | |
2947 + add v10.4s, v2.4s, v8.4s | |
2948 + orr x0, x4, x5 | |
2949 + sub v6.4s, v2.4s, v8.4s | |
2950 + /* pop {x4, x5} */ | |
2951 + sub sp, sp, 80 | |
2952 + ldp x4, x5, [sp], 16 | |
2953 + rshrn ROW7L.4h, v4.4s, #11 | |
2954 + rshrn ROW3L.4h, v10.4s, #11 | |
2955 + rshrn ROW0L.4h, v12.4s, #11 | |
2956 + rshrn ROW4L.4h, v6.4s, #11 | |
2957 + | |
2958 + beq 3f /* Go to do some special handling for the sparse right 4x8 half */ | |
2959 + | |
2960 + /* 1-D IDCT, pass 1, right 4x8 half */ | |
2961 + ld1 {v2.4h}, [x15] /* reload constants */ | |
2962 + add v10.4h, ROW7R.4h, ROW3R.4h | |
2963 + add v8.4h, ROW5R.4h, ROW1R.4h | |
2964 + /* Transpose ROW6L <-> ROW7L (v3 available free register) */ | |
2965 + transpose ROW6L, ROW7L, v3, .16b, .4h | |
2966 + smull v12.4s, v10.4h, XFIX_1_175875602_MINUS_1_961570560 | |
2967 + smlal v12.4s, v8.4h, XFIX_1_175875602 | |
2968 + /* Transpose ROW2L <-> ROW3L (v3 available free register) */ | |
2969 + transpose ROW2L, ROW3L, v3, .16b, .4h | |
2970 + smull v14.4s, v10.4h, XFIX_1_175875602 | |
2971 + smlal v14.4s, v8.4h, XFIX_1_175875602_MINUS_0_390180644 | |
2972 + /* Transpose ROW0L <-> ROW1L (v3 available free register) */ | |
2973 + transpose ROW0L, ROW1L, v3, .16b, .4h | |
2974 + ssubl v6.4s, ROW0R.4h, ROW4R.4h | |
2975 + smull v4.4s, ROW2R.4h, XFIX_0_541196100 | |
2976 + smlal v4.4s, ROW6R.4h, XFIX_0_541196100_MINUS_1_847759065 | |
2977 + /* Transpose ROW4L <-> ROW5L (v3 available free register) */ | |
2978 + transpose ROW4L, ROW5L, v3, .16b, .4h | |
2979 + mov v8.16b, v12.16b | |
2980 + smlsl v12.4s, ROW5R.4h, XFIX_2_562915447 | |
2981 + smlal v12.4s, ROW3R.4h, XFIX_3_072711026_MINUS_2_562915447 | |
2982 + /* Transpose ROW1L <-> ROW3L (v3 available free register) */ | |
2983 + transpose ROW1L, ROW3L, v3, .16b, .2s | |
2984 + shl v6.4s, v6.4s, #13 | |
2985 + smlsl v8.4s, ROW1R.4h, XFIX_0_899976223 | |
2986 + /* Transpose ROW4L <-> ROW6L (v3 available free register) */ | |
2987 + transpose ROW4L, ROW6L, v3, .16b, .2s | |
2988 + add v2.4s, v6.4s, v4.4s | |
2989 + mov v10.16b, v14.16b | |
2990 + add v2.4s, v2.4s, v12.4s | |
2991 + /* Transpose ROW0L <-> ROW2L (v3 available free register) */ | |
2992 + transpose ROW0L, ROW2L, v3, .16b, .2s | |
2993 + smlsl v14.4s, ROW7R.4h, XFIX_0_899976223 | |
2994 + smlal v14.4s, ROW1R.4h, XFIX_1_501321110_MINUS_0_899976223 | |
2995 + rshrn ROW1R.4h, v2.4s, #11 | |
2996 + /* Transpose ROW5L <-> ROW7L (v3 available free register) */ | |
2997 + transpose ROW5L, ROW7L, v3, .16b, .2s | |
2998 + sub v2.4s, v2.4s, v12.4s | |
2999 + smlal v10.4s, ROW5R.4h, XFIX_2_053119869_MINUS_2_562915447 | |
3000 + smlsl v10.4s, ROW3R.4h, XFIX_2_562915447 | |
3001 + sub v2.4s, v2.4s, v12.4s | |
3002 + smull v12.4s, ROW2R.4h, XFIX_0_541196100_PLUS_0_765366865 | |
3003 + smlal v12.4s, ROW6R.4h, XFIX_0_541196100 | |
3004 + sub v6.4s, v6.4s, v4.4s | |
3005 + rshrn ROW6R.4h, v2.4s, #11 | |
3006 + add v2.4s, v6.4s, v10.4s | |
3007 + sub v6.4s, v6.4s, v10.4s | |
3008 + saddl v10.4s, ROW0R.4h, ROW4R.4h | |
3009 + rshrn ROW2R.4h, v2.4s, #11 | |
3010 + rshrn ROW5R.4h, v6.4s, #11 | |
3011 + shl v10.4s, v10.4s, #13 | |
3012 + smlal v8.4s, ROW7R.4h, XFIX_0_298631336_MINUS_0_899976223 | |
3013 + add v4.4s, v10.4s, v12.4s | |
3014 + sub v2.4s, v10.4s, v12.4s | |
3015 + add v12.4s, v4.4s, v14.4s | |
3016 + sub v4.4s, v4.4s, v14.4s | |
3017 + add v10.4s, v2.4s, v8.4s | |
3018 + sub v6.4s, v2.4s, v8.4s | |
3019 + rshrn ROW7R.4h, v4.4s, #11 | |
3020 + rshrn ROW3R.4h, v10.4s, #11 | |
3021 + rshrn ROW0R.4h, v12.4s, #11 | |
3022 + rshrn ROW4R.4h, v6.4s, #11 | |
3023 + /* Transpose right 4x8 half */ | |
3024 + transpose ROW6R, ROW7R, v3, .16b, .4h | |
3025 + transpose ROW2R, ROW3R, v3, .16b, .4h | |
3026 + transpose ROW0R, ROW1R, v3, .16b, .4h | |
3027 + transpose ROW4R, ROW5R, v3, .16b, .4h | |
3028 + transpose ROW1R, ROW3R, v3, .16b, .2s | |
3029 + transpose ROW4R, ROW6R, v3, .16b, .2s | |
3030 + transpose ROW0R, ROW2R, v3, .16b, .2s | |
3031 + transpose ROW5R, ROW7R, v3, .16b, .2s | |
3032 + | |
3033 +1: /* 1-D IDCT, pass 2 (normal variant), left 4x8 half */ | |
3034 + ld1 {v2.4h}, [x15] /* reload constants */ | |
3035 + smull v12.4S, ROW1R.4h, XFIX_1_175875602 /* ROW5L.4h <-> ROW1R.4h */ | |
3036 + smlal v12.4s, ROW1L.4h, XFIX_1_175875602 | |
3037 + smlal v12.4s, ROW3R.4h, XFIX_1_175875602_MINUS_1_961570560 /* ROW7L.4h <-> ROW3R.4h */ | |
3038 + smlal v12.4s, ROW3L.4h, XFIX_1_175875602_MINUS_1_961570560 | |
3039 + smull v14.4s, ROW3R.4h, XFIX_1_175875602 /* ROW7L.4h <-> ROW3R.4h */ | |
3040 + smlal v14.4s, ROW3L.4h, XFIX_1_175875602 | |
3041 + smlal v14.4s, ROW1R.4h, XFIX_1_175875602_MINUS_0_390180644 /* ROW5L.4h <-> ROW1R.4h */ | |
3042 + smlal v14.4s, ROW1L.4h, XFIX_1_175875602_MINUS_0_390180644 | |
3043 + ssubl v6.4s, ROW0L.4h, ROW0R.4h /* ROW4L.4h <-> ROW0R.4h */ | |
3044 + smull v4.4s, ROW2L.4h, XFIX_0_541196100 | |
3045 + smlal v4.4s, ROW2R.4h, XFIX_0_541196100_MINUS_1_847759065 /* ROW6L.4h <-> ROW2R.4h */ | |
3046 + mov v8.16b, v12.16b | |
3047 + smlsl v12.4s, ROW1R.4h, XFIX_2_562915447 /* ROW5L.4h <-> ROW1R.4h */ | |
3048 + smlal v12.4s, ROW3L.4h, XFIX_3_072711026_MINUS_2_562915447 | |
3049 + shl v6.4s, v6.4s, #13 | |
3050 + smlsl v8.4s, ROW1L.4h, XFIX_0_899976223 | |
3051 + add v2.4s, v6.4s, v4.4s | |
3052 + mov v10.16b, v14.16b | |
3053 + add v2.4s, v2.4s, v12.4s | |
3054 + smlsl v14.4s, ROW3R.4h, XFIX_0_899976223 /* ROW7L.4h <-> ROW3R.4h */ | |
3055 + smlal v14.4s, ROW1L.4h, XFIX_1_501321110_MINUS_0_899976223 | |
3056 + shrn ROW1L.4h, v2.4s, #16 | |
3057 + sub v2.4s, v2.4s, v12.4s | |
3058 + smlal v10.4s, ROW1R.4h, XFIX_2_053119869_MINUS_2_562915447 /* ROW5L.4h <-> ROW1R.4h */ | |
3059 + smlsl v10.4s, ROW3L.4h, XFIX_2_562915447 | |
3060 + sub v2.4s, v2.4s, v12.4s | |
3061 + smull v12.4s, ROW2L.4h, XFIX_0_541196100_PLUS_0_765366865 | |
3062 + smlal v12.4s, ROW2R.4h, XFIX_0_541196100 /* ROW6L.4h <-> ROW2R.4h */ | |
3063 + sub v6.4s, v6.4s, v4.4s | |
3064 + shrn ROW2R.4h, v2.4s, #16 /* ROW6L.4h <-> ROW2R.4h */ | |
3065 + add v2.4s, v6.4s, v10.4s | |
3066 + sub v6.4s, v6.4s, v10.4s | |
3067 + saddl v10.4s, ROW0L.4h, ROW0R.4h /* ROW4L.4h <-> ROW0R.4h */ | |
3068 + shrn ROW2L.4h, v2.4s, #16 | |
3069 + shrn ROW1R.4h, v6.4s, #16 /* ROW5L.4h <-> ROW1R.4h */ | |
3070 + shl v10.4s, v10.4s, #13 | |
3071 + smlal v8.4s, ROW3R.4h, XFIX_0_298631336_MINUS_0_899976223 /* ROW7L.4h <-> ROW3R.4h */ | |
3072 + add v4.4s, v10.4s, v12.4s | |
3073 + sub v2.4s, v10.4s, v12.4s | |
3074 + add v12.4s, v4.4s, v14.4s | |
3075 + sub v4.4s, v4.4s, v14.4s | |
3076 + add v10.4s, v2.4s, v8.4s | |
3077 + sub v6.4s, v2.4s, v8.4s | |
3078 + shrn ROW3R.4h, v4.4s, #16 /* ROW7L.4h <-> ROW3R.4h */ | |
3079 + shrn ROW3L.4h, v10.4s, #16 | |
3080 + shrn ROW0L.4h, v12.4s, #16 | |
3081 + shrn ROW0R.4h, v6.4s, #16 /* ROW4L.4h <-> ROW0R.4h */ | |
3082 + /* 1-D IDCT, pass 2, right 4x8 half */ | |
3083 + ld1 {v2.4h}, [x15] /* reload constants */ | |
3084 + smull v12.4s, ROW5R.4h, XFIX_1_175875602 | |
3085 + smlal v12.4s, ROW5L.4h, XFIX_1_175875602 /* ROW5L.4h <-> ROW1R.4h */ | |
3086 + smlal v12.4s, ROW7R.4h, XFIX_1_175875602_MINUS_1_961570560 | |
3087 + smlal v12.4s, ROW7L.4h, XFIX_1_175875602_MINUS_1_961570560 /* ROW7L.4h <-> ROW3R.4h */ | |
3088 + smull v14.4s, ROW7R.4h, XFIX_1_175875602 | |
3089 + smlal v14.4s, ROW7L.4h, XFIX_1_175875602 /* ROW7L.4h <-> ROW3R.4h */ | |
3090 + smlal v14.4s, ROW5R.4h, XFIX_1_175875602_MINUS_0_390180644 | |
3091 + smlal v14.4s, ROW5L.4h, XFIX_1_175875602_MINUS_0_390180644 /* ROW5L.4h <-> ROW1R.4h */ | |
3092 + ssubl v6.4s, ROW4L.4h, ROW4R.4h /* ROW4L.4h <-> ROW0R.4h */ | |
3093 + smull v4.4s, ROW6L.4h, XFIX_0_541196100 /* ROW6L.4h <-> ROW2R.4h */ | |
3094 + smlal v4.4s, ROW6R.4h, XFIX_0_541196100_MINUS_1_847759065 | |
3095 + mov v8.16b, v12.16b | |
3096 + smlsl v12.4s, ROW5R.4h, XFIX_2_562915447 | |
3097 + smlal v12.4s, ROW7L.4h, XFIX_3_072711026_MINUS_2_562915447 /* ROW7L.4h <-> ROW3R.4h */ | |
3098 + shl v6.4s, v6.4s, #13 | |
3099 + smlsl v8.4s, ROW5L.4h, XFIX_0_899976223 /* ROW5L.4h <-> ROW1R.4h */ | |
3100 + add v2.4s, v6.4s, v4.4s | |
3101 + mov v10.16b, v14.16b | |
3102 + add v2.4s, v2.4s, v12.4s | |
3103 + smlsl v14.4s, ROW7R.4h, XFIX_0_899976223 | |
3104 + smlal v14.4s, ROW5L.4h, XFIX_1_501321110_MINUS_0_899976223 /* ROW5L.4h <-> ROW1R.4h */ | |
3105 + shrn ROW5L.4h, v2.4s, #16 /* ROW5L.4h <-> ROW1R.4h */ | |
3106 + sub v2.4s, v2.4s, v12.4s | |
3107 + smlal v10.4s, ROW5R.4h, XFIX_2_053119869_MINUS_2_562915447 | |
3108 + smlsl v10.4s, ROW7L.4h, XFIX_2_562915447 /* ROW7L.4h <-> ROW3R.4h */ | |
3109 + sub v2.4s, v2.4s, v12.4s | |
3110 + smull v12.4s, ROW6L.4h, XFIX_0_541196100_PLUS_0_765366865 /* ROW6L.4h <-> ROW2R.4h */ | |
3111 + smlal v12.4s, ROW6R.4h, XFIX_0_541196100 | |
3112 + sub v6.4s, v6.4s, v4.4s | |
3113 + shrn ROW6R.4h, v2.4s, #16 | |
3114 + add v2.4s, v6.4s, v10.4s | |
3115 + sub v6.4s, v6.4s, v10.4s | |
3116 + saddl v10.4s, ROW4L.4h, ROW4R.4h /* ROW4L.4h <-> ROW0R.4h */ | |
3117 + shrn ROW6L.4h, v2.4s, #16 /* ROW6L.4h <-> ROW2R.4h */ | |
3118 + shrn ROW5R.4h, v6.4s, #16 | |
3119 + shl v10.4s, v10.4s, #13 | |
3120 + smlal v8.4s, ROW7R.4h, XFIX_0_298631336_MINUS_0_899976223 | |
3121 + add v4.4s, v10.4s, v12.4s | |
3122 + sub v2.4s, v10.4s, v12.4s | |
3123 + add v12.4s, v4.4s, v14.4s | |
3124 + sub v4.4s, v4.4s, v14.4s | |
3125 + add v10.4s, v2.4s, v8.4s | |
3126 + sub v6.4s, v2.4s, v8.4s | |
3127 + shrn ROW7R.4h, v4.4s, #16 | |
3128 + shrn ROW7L.4h, v10.4s, #16 /* ROW7L.4h <-> ROW3R.4h */ | |
3129 + shrn ROW4L.4h, v12.4s, #16 /* ROW4L.4h <-> ROW0R.4h */ | |
3130 + shrn ROW4R.4h, v6.4s, #16 | |
3131 + | |
3132 +2: /* Descale to 8-bit and range limit */ | |
3133 + ins v16.2d[1], v17.2d[0] | |
3134 + ins v18.2d[1], v19.2d[0] | |
3135 + ins v20.2d[1], v21.2d[0] | |
3136 + ins v22.2d[1], v23.2d[0] | |
3137 + sqrshrn v16.8b, v16.8h, #2 | |
3138 + sqrshrn2 v16.16b, v18.8h, #2 | |
3139 + sqrshrn v18.8b, v20.8h, #2 | |
3140 + sqrshrn2 v18.16b, v22.8h, #2 | |
3141 + | |
3142 + /* vpop {v8.4h - d15.4h} */ /* restore NEON registers */ | |
3143 + ld1 {v8.4h - v11.4h}, [sp], 32 | |
3144 + ld1 {v12.4h - v15.4h}, [sp], 32 | |
3145 + ins v24.2d[1], v25.2d[0] | |
3146 + | |
3147 + sqrshrn v20.8b, v24.8h, #2 | |
3148 + /* Transpose the final 8-bit samples and do signed->unsigned conversion */ | |
3149 + /* trn1 v16.8h, v16.8h, v18.8h */ | |
3150 + transpose v16, v18, v3, .16b, .8h | |
3151 + ins v26.2d[1], v27.2d[0] | |
3152 + ins v28.2d[1], v29.2d[0] | |
3153 + ins v30.2d[1], v31.2d[0] | |
3154 + sqrshrn2 v20.16b, v26.8h, #2 | |
3155 + sqrshrn v22.8b, v28.8h, #2 | |
3156 + movi v0.16b, #(CENTERJSAMPLE) | |
3157 + sqrshrn2 v22.16b, v30.8h, #2 | |
3158 + transpose_single v16, v17, v3, .2d, .8b | |
3159 + transpose_single v18, v19, v3, .2d, .8b | |
3160 + add v16.8b, v16.8b, v0.8b | |
3161 + add v17.8b, v17.8b, v0.8b | |
3162 + add v18.8b, v18.8b, v0.8b | |
3163 + add v19.8b, v19.8b, v0.8b | |
3164 + transpose v20, v22, v3, .16b, .8h | |
3165 + /* Store results to the output buffer */ | |
3166 + ldp TMP1, TMP2, [OUTPUT_BUF], 16 | |
3167 + add TMP1, TMP1, OUTPUT_COL | |
3168 + add TMP2, TMP2, OUTPUT_COL | |
3169 + st1 {v16.8b}, [TMP1] | |
3170 + transpose_single v20, v21, v3, .2d, .8b | |
3171 + st1 {v17.8b}, [TMP2] | |
3172 + ldp TMP1, TMP2, [OUTPUT_BUF], 16 | |
3173 + add TMP1, TMP1, OUTPUT_COL | |
3174 + add TMP2, TMP2, OUTPUT_COL | |
3175 + st1 {v18.8b}, [TMP1] | |
3176 + add v20.8b, v20.8b, v0.8b | |
3177 + add v21.8b, v21.8b, v0.8b | |
3178 + st1 {v19.8b}, [TMP2] | |
3179 + ldp TMP1, TMP2, [OUTPUT_BUF], 16 | |
3180 + ldp TMP3, TMP4, [OUTPUT_BUF] | |
3181 + add TMP1, TMP1, OUTPUT_COL | |
3182 + add TMP2, TMP2, OUTPUT_COL | |
3183 + add TMP3, TMP3, OUTPUT_COL | |
3184 + add TMP4, TMP4, OUTPUT_COL | |
3185 + transpose_single v22, v23, v3, .2d, .8b | |
3186 + st1 {v20.8b}, [TMP1] | |
3187 + add v22.8b, v22.8b, v0.8b | |
3188 + add v23.8b, v23.8b, v0.8b | |
3189 + st1 {v21.8b}, [TMP2] | |
3190 + st1 {v22.8b}, [TMP3] | |
3191 + st1 {v23.8b}, [TMP4] | |
3192 + ldr x15, [sp], 16 | |
3193 + ld1 {v0.8b - v3.8b}, [sp], 32 | |
3194 + ld1 {v4.8b - v7.8b}, [sp], 32 | |
3195 + ld1 {v8.8b - v11.8b}, [sp], 32 | |
3196 + ld1 {v12.8b - v15.8b}, [sp], 32 | |
3197 + ld1 {v16.8b - v19.8b}, [sp], 32 | |
3198 + ld1 {v20.8b - v23.8b}, [sp], 32 | |
3199 + ld1 {v24.8b - v27.8b}, [sp], 32 | |
3200 + ld1 {v28.8b - v31.8b}, [sp], 32 | |
3201 + blr x30 | |
3202 + | |
3203 +3: /* Left 4x8 half is done, right 4x8 half contains mostly zeros */ | |
3204 + | |
3205 + /* Transpose left 4x8 half */ | |
3206 + transpose ROW6L, ROW7L, v3, .16b, .4h | |
3207 + transpose ROW2L, ROW3L, v3, .16b, .4h | |
3208 + transpose ROW0L, ROW1L, v3, .16b, .4h | |
3209 + transpose ROW4L, ROW5L, v3, .16b, .4h | |
3210 + shl ROW0R.4h, ROW0R.4h, #2 /* PASS1_BITS */ | |
3211 + transpose ROW1L, ROW3L, v3, .16b, .2s | |
3212 + transpose ROW4L, ROW6L, v3, .16b, .2s | |
3213 + transpose ROW0L, ROW2L, v3, .16b, .2s | |
3214 + transpose ROW5L, ROW7L, v3, .16b, .2s | |
3215 + cmp x0, #0 | |
3216 + beq 4f /* Right 4x8 half has all zeros, go to 'sparse' second pass */ | |
3217 + | |
3218 + /* Only row 0 is non-zero for the right 4x8 half */ | |
3219 + dup ROW1R.4h, ROW0R.4h[1] | |
3220 + dup ROW2R.4h, ROW0R.4h[2] | |
3221 + dup ROW3R.4h, ROW0R.4h[3] | |
3222 + dup ROW4R.4h, ROW0R.4h[0] | |
3223 + dup ROW5R.4h, ROW0R.4h[1] | |
3224 + dup ROW6R.4h, ROW0R.4h[2] | |
3225 + dup ROW7R.4h, ROW0R.4h[3] | |
3226 + dup ROW0R.4h, ROW0R.4h[0] | |
3227 + b 1b /* Go to 'normal' second pass */ | |
3228 + | |
3229 +4: /* 1-D IDCT, pass 2 (sparse variant with zero rows 4-7), left 4x8 half */ | |
3230 + ld1 {v2.4h}, [x15] /* reload constants */ | |
3231 + smull v12.4s, ROW1L.4h, XFIX_1_175875602 | |
3232 + smlal v12.4s, ROW3L.4h, XFIX_1_175875602_MINUS_1_961570560 | |
3233 + smull v14.4s, ROW3L.4h, XFIX_1_175875602 | |
3234 + smlal v14.4s, ROW1L.4h, XFIX_1_175875602_MINUS_0_390180644 | |
3235 + smull v4.4s, ROW2L.4h, XFIX_0_541196100 | |
3236 + sshll v6.4s, ROW0L.4h, #13 | |
3237 + mov v8.16b, v12.16b | |
3238 + smlal v12.4s, ROW3L.4h, XFIX_3_072711026_MINUS_2_562915447 | |
3239 + smlsl v8.4s, ROW1L.4h, XFIX_0_899976223 | |
3240 + add v2.4s, v6.4s, v4.4s | |
3241 + mov v10.16b, v14.16b | |
3242 + smlal v14.4s, ROW1L.4h, XFIX_1_501321110_MINUS_0_899976223 | |
3243 + add v2.4s, v2.4s, v12.4s | |
3244 + add v12.4s, v12.4s, v12.4s | |
3245 + smlsl v10.4s, ROW3L.4h, XFIX_2_562915447 | |
3246 + shrn ROW1L.4h, v2.4s, #16 | |
3247 + sub v2.4s, v2.4s, v12.4s | |
3248 + smull v12.4s, ROW2L.4h, XFIX_0_541196100_PLUS_0_765366865 | |
3249 + sub v6.4s, v6.4s, v4.4s | |
3250 + shrn ROW2R.4h, v2.4s, #16 /* ROW6L.4h <-> ROW2R.4h */ | |
3251 + add v2.4s, v6.4s, v10.4s | |
3252 + sub v6.4s, v6.4s, v10.4s | |
3253 + sshll v10.4s, ROW0L.4h, #13 | |
3254 + shrn ROW2L.4h, v2.4s, #16 | |
3255 + shrn ROW1R.4h, v6.4s, #16 /* ROW5L.4h <-> ROW1R.4h */ | |
3256 + add v4.4s, v10.4s, v12.4s | |
3257 + sub v2.4s, v10.4s, v12.4s | |
3258 + add v12.4s, v4.4s, v14.4s | |
3259 + sub v4.4s, v4.4s, v14.4s | |
3260 + add v10.4s, v2.4s, v8.4s | |
3261 + sub v6.4s, v2.4s, v8.4s | |
3262 + shrn ROW3R.4h, v4.4s, #16 /* ROW7L.4h <-> ROW3R.4h */ | |
3263 + shrn ROW3L.4h, v10.4s, #16 | |
3264 + shrn ROW0L.4h, v12.4s, #16 | |
3265 + shrn ROW0R.4h, v6.4s, #16 /* ROW4L.4h <-> ROW0R.4h */ | |
3266 + /* 1-D IDCT, pass 2 (sparse variant with zero rows 4-7), right 4x8 half */ | |
3267 + ld1 {v2.4h}, [x15] /* reload constants */ | |
3268 + smull v12.4s, ROW5L.4h, XFIX_1_175875602 | |
3269 + smlal v12.4s, ROW7L.4h, XFIX_1_175875602_MINUS_1_961570560 | |
3270 + smull v14.4s, ROW7L.4h, XFIX_1_175875602 | |
3271 + smlal v14.4s, ROW5L.4h, XFIX_1_175875602_MINUS_0_390180644 | |
3272 + smull v4.4s, ROW6L.4h, XFIX_0_541196100 | |
3273 + sshll v6.4s, ROW4L.4h, #13 | |
3274 + mov v8.16b, v12.16b | |
3275 + smlal v12.4s, ROW7L.4h, XFIX_3_072711026_MINUS_2_562915447 | |
3276 + smlsl v8.4s, ROW5L.4h, XFIX_0_899976223 | |
3277 + add v2.4s, v6.4s, v4.4s | |
3278 + mov v10.16b, v14.16b | |
3279 + smlal v14.4s, ROW5L.4h, XFIX_1_501321110_MINUS_0_899976223 | |
3280 + add v2.4s, v2.4s, v12.4s | |
3281 + add v12.4s, v12.4s, v12.4s | |
3282 + smlsl v10.4s, ROW7L.4h, XFIX_2_562915447 | |
3283 + shrn ROW5L.4h, v2.4s, #16 /* ROW5L.4h <-> ROW1R.4h */ | |
3284 + sub v2.4s, v2.4s, v12.4s | |
3285 + smull v12.4s, ROW6L.4h, XFIX_0_541196100_PLUS_0_765366865 | |
3286 + sub v6.4s, v6.4s, v4.4s | |
3287 + shrn ROW6R.4h, v2.4s, #16 | |
3288 + add v2.4s, v6.4s, v10.4s | |
3289 + sub v6.4s, v6.4s, v10.4s | |
3290 + sshll v10.4s, ROW4L.4h, #13 | |
3291 + shrn ROW6L.4h, v2.4s, #16 /* ROW6L.4h <-> ROW2R.4h */ | |
3292 + shrn ROW5R.4h, v6.4s, #16 | |
3293 + add v4.4s, v10.4s, v12.4s | |
3294 + sub v2.4s, v10.4s, v12.4s | |
3295 + add v12.4s, v4.4s, v14.4s | |
3296 + sub v4.4s, v4.4s, v14.4s | |
3297 + add v10.4s, v2.4s, v8.4s | |
3298 + sub v6.4s, v2.4s, v8.4s | |
3299 + shrn ROW7R.4h, v4.4s, #16 | |
3300 + shrn ROW7L.4h, v10.4s, #16 /* ROW7L.4h <-> ROW3R.4h */ | |
3301 + shrn ROW4L.4h, v12.4s, #16 /* ROW4L.4h <-> ROW0R.4h */ | |
3302 + shrn ROW4R.4h, v6.4s, #16 | |
3303 + b 2b /* Go to epilogue */ | |
3304 + | |
3305 + .unreq DCT_TABLE | |
3306 + .unreq COEF_BLOCK | |
3307 + .unreq OUTPUT_BUF | |
3308 + .unreq OUTPUT_COL | |
3309 + .unreq TMP1 | |
3310 + .unreq TMP2 | |
3311 + .unreq TMP3 | |
3312 + .unreq TMP4 | |
3313 + | |
3314 + .unreq ROW0L | |
3315 + .unreq ROW0R | |
3316 + .unreq ROW1L | |
3317 + .unreq ROW1R | |
3318 + .unreq ROW2L | |
3319 + .unreq ROW2R | |
3320 + .unreq ROW3L | |
3321 + .unreq ROW3R | |
3322 + .unreq ROW4L | |
3323 + .unreq ROW4R | |
3324 + .unreq ROW5L | |
3325 + .unreq ROW5R | |
3326 + .unreq ROW6L | |
3327 + .unreq ROW6R | |
3328 + .unreq ROW7L | |
3329 + .unreq ROW7R | |
3330 + | |
3331 + | |
3332 +/*****************************************************************************/ | |
3333 + | |
3334 +/* | |
3335 + * jsimd_idct_ifast_neon | |
3336 + * | |
3337 + * This function contains a fast, less accurate integer implementation of | |
3338 + * the inverse DCT (Discrete Cosine Transform). It uses the same calculations | |
3339 + * and produces exactly the same output as IJG's original 'jpeg_idct_ifast' | |
3340 + * function from jidctfst.c. | |
3341 + * | |
3342 + * Normally a 1-D AAN DCT needs 5 multiplications and 29 additions. | |
3343 + * On ARM NEON, however, some extra additions are required because the VQDMULH | |
3344 + * instruction can't handle constants larger than 1. Expressions such as | |
3345 + * "x * 1.082392200" therefore have to be converted to "x * 0.082392200 + x", | |
3346 + * which introduces an extra addition. Overall, there are 6 extra additions | |
3347 + * per 1-D IDCT pass, for a total of 5 VQDMULH and 35 VADD/VSUB instructions. | |
3348 + */ | |
3349 + | |
3350 +#define XFIX_1_082392200 v0.4h[0] | |
3351 +#define XFIX_1_414213562 v0.4h[1] | |
3352 +#define XFIX_1_847759065 v0.4h[2] | |
3353 +#define XFIX_2_613125930 v0.4h[3] | |
3354 + | |
3355 +.balign 16 | |
3356 +jsimd_idct_ifast_neon_consts: | |
3357 + .short (277 * 128 - 256 * 128) /* XFIX_1_082392200 */ | |
3358 + .short (362 * 128 - 256 * 128) /* XFIX_1_414213562 */ | |
3359 + .short (473 * 128 - 256 * 128) /* XFIX_1_847759065 */ | |
3360 + .short (669 * 128 - 512 * 128) /* XFIX_2_613125930 */ | |
3361 + | |
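The constant table above stores only the part of each multiplier that exceeds 1 (or 2), pre-scaled for the doubling multiply. The following is a minimal scalar sketch in C of that trick, not the project's code: the helper names `sqdmulh_s16`, `mul_1_082392200` and `mul_2_613125930` are hypothetical, and the model covers a single 16-bit lane and ignores 16-bit wrap-around in the final addition.

```c
#include <stdint.h>

/* Scalar model of one SQDMULH lane: saturate((2 * a * b) >> 16). */
static int16_t sqdmulh_s16(int16_t a, int16_t b)
{
    if (a == INT16_MIN && b == INT16_MIN)
        return INT16_MAX;               /* the only input pair that saturates */
    int64_t p = 2 * (int64_t)a * (int64_t)b;
    /* Floor division models the arithmetic right shift portably. */
    int64_t q = (p >= 0) ? (p / 65536) : -((-p + 65535) / 65536);
    return (int16_t)q;
}

/* Table entries from the listing: only the fractional part is stored. */
#define XFIX_1_082392200 ((int16_t)(277 * 128 - 256 * 128))  /* 2688  */
#define XFIX_2_613125930 ((int16_t)(669 * 128 - 512 * 128))  /* 20096 */

/* x * 1.082392200 computed as x * 0.082... + x */
static int16_t mul_1_082392200(int16_t x)
{
    return (int16_t)(sqdmulh_s16(x, XFIX_1_082392200) + x);
}

/* x * 2.613125930 computed as x * 0.613... + 2 * x */
static int16_t mul_2_613125930(int16_t x)
{
    return (int16_t)(sqdmulh_s16(x, XFIX_2_613125930) + 2 * x);
}
```

The factor of 128 converts an 8-bit fraction (such as 21/256) into the Q15 scale that the doubling multiply expects, since 21 * 128 / 32768 equals 21/256.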
3362 +asm_function jsimd_idct_ifast_neon | |
3363 + | |
3364 + DCT_TABLE .req x0 | |
3365 + COEF_BLOCK .req x1 | |
3366 + OUTPUT_BUF .req x2 | |
3367 + OUTPUT_COL .req x3 | |
3368 + TMP1 .req x0 | |
3369 + TMP2 .req x1 | |
3370 + TMP3 .req x2 | |
3371 + TMP4 .req x22 | |
3372 + TMP5 .req x23 | |
3373 + | |
3374 + /* Load and dequantize coefficients into NEON registers | |
3375 + * with the following allocation: | |
3376 + * 0 1 2 3 | 4 5 6 7 | |
3377 + * ---------+-------- | |
3378 + * 0 | d16 | d17 ( v8.8h ) | |
3379 + * 1 | d18 | d19 ( v9.8h ) | |
3380 + * 2 | d20 | d21 ( v10.8h ) | |
3381 + * 3 | d22 | d23 ( v11.8h ) | |
3382 + * 4 | d24 | d25 ( v12.8h ) | |
3383 + * 5 | d26 | d27 ( v13.8h ) | |
3384 + * 6 | d28 | d29 ( v14.8h ) | |
3385 + * 7 | d30 | d31 ( v15.8h ) | |
3386 + */ | |
3387 + /* Save NEON registers used in fast IDCT */ | |
3388 + sub sp, sp, #176 | |
3389 + stp x22, x23, [sp], 16 | |
3390 + adr x23, jsimd_idct_ifast_neon_consts | |
3391 + st1 {v0.8b - v3.8b}, [sp], 32 | |
3392 + st1 {v4.8b - v7.8b}, [sp], 32 | |
3393 + st1 {v8.8b - v11.8b}, [sp], 32 | |
3394 + st1 {v12.8b - v15.8b}, [sp], 32 | |
3395 + st1 {v16.8b - v19.8b}, [sp], 32 | |
3396 + ld1 {v8.8h, v9.8h}, [COEF_BLOCK], 32 | |
3397 + ld1 {v0.8h, v1.8h}, [DCT_TABLE], 32 | |
3398 + ld1 {v10.8h, v11.8h}, [COEF_BLOCK], 32 | |
3399 + mul v8.8h, v8.8h, v0.8h | |
3400 + ld1 {v2.8h, v3.8h}, [DCT_TABLE], 32 | |
3401 + mul v9.8h, v9.8h, v1.8h | |
3402 + ld1 {v12.8h, v13.8h}, [COEF_BLOCK], 32 | |
3403 + mul v10.8h, v10.8h, v2.8h | |
3404 + ld1 {v0.8h, v1.8h}, [DCT_TABLE], 32 | |
3405 + mul v11.8h, v11.8h, v3.8h | |
3406 + ld1 {v14.8h, v15.8h}, [COEF_BLOCK], 32 | |
3407 + mul v12.8h, v12.8h, v0.8h | |
3408 + ld1 {v2.8h, v3.8h}, [DCT_TABLE], 32 | |
3409 + mul v14.8h, v14.8h, v2.8h | |
3410 + mul v13.8h, v13.8h, v1.8h | |
3411 + ld1 {v0.4h}, [x23] /* load constants */ | |
3412 + mul v15.8h, v15.8h, v3.8h | |
3413 + | |
3414 + /* 1-D IDCT, pass 1 */ | |
3415 + sub v2.8h, v10.8h, v14.8h | |
3416 + add v14.8h, v10.8h, v14.8h | |
3417 + sub v1.8h, v11.8h, v13.8h | |
3418 + add v13.8h, v11.8h, v13.8h | |
3419 + sub v5.8h, v9.8h, v15.8h | |
3420 + add v15.8h, v9.8h, v15.8h | |
3421 + sqdmulh v4.8h, v2.8h, XFIX_1_414213562 | |
3422 + sqdmulh v6.8h, v1.8h, XFIX_2_613125930 | |
3423 + add v3.8h, v1.8h, v1.8h | |
3424 + sub v1.8h, v5.8h, v1.8h | |
3425 + add v10.8h, v2.8h, v4.8h | |
3426 + sqdmulh v4.8h, v1.8h, XFIX_1_847759065 | |
3427 + sub v2.8h, v15.8h, v13.8h | |
3428 + add v3.8h, v3.8h, v6.8h | |
3429 + sqdmulh v6.8h, v2.8h, XFIX_1_414213562 | |
3430 + add v1.8h, v1.8h, v4.8h | |
3431 + sqdmulh v4.8h, v5.8h, XFIX_1_082392200 | |
3432 + sub v10.8h, v10.8h, v14.8h | |
3433 + add v2.8h, v2.8h, v6.8h | |
3434 + sub v6.8h, v8.8h, v12.8h | |
3435 + add v12.8h, v8.8h, v12.8h | |
3436 + add v9.8h, v5.8h, v4.8h | |
3437 + add v5.8h, v6.8h, v10.8h | |
3438 + sub v10.8h, v6.8h, v10.8h | |
3439 + add v6.8h, v15.8h, v13.8h | |
3440 + add v8.8h, v12.8h, v14.8h | |
3441 + sub v3.8h, v6.8h, v3.8h | |
3442 + sub v12.8h, v12.8h, v14.8h | |
3443 + sub v3.8h, v3.8h, v1.8h | |
3444 + sub v1.8h, v9.8h, v1.8h | |
3445 + add v2.8h, v3.8h, v2.8h | |
3446 + sub v15.8h, v8.8h, v6.8h | |
3447 + add v1.8h, v1.8h, v2.8h | |
3448 + add v8.8h, v8.8h, v6.8h | |
3449 + add v14.8h, v5.8h, v3.8h | |
3450 + sub v9.8h, v5.8h, v3.8h | |
3451 + sub v13.8h, v10.8h, v2.8h | |
3452 + add v10.8h, v10.8h, v2.8h | |
3453 + /* Transpose q8-q9 */ | |
3454 + mov v18.16b, v8.16b | |
3455 + trn1 v8.8h, v8.8h, v9.8h | |
3456 + trn2 v9.8h, v18.8h, v9.8h | |
3457 + sub v11.8h, v12.8h, v1.8h | |
3458 + /* Transpose q14-q15 */ | |
3459 + mov v18.16b, v14.16b | |
3460 + trn1 v14.8h, v14.8h, v15.8h | |
3461 + trn2 v15.8h, v18.8h, v15.8h | |
3462 + add v12.8h, v12.8h, v1.8h | |
3463 + /* Transpose q10-q11 */ | |
3464 + mov v18.16b, v10.16b | |
3465 + trn1 v10.8h, v10.8h, v11.8h | |
3466 + trn2 v11.8h, v18.8h, v11.8h | |
3467 + /* Transpose q12-q13 */ | |
3468 + mov v18.16b, v12.16b | |
3469 + trn1 v12.8h, v12.8h, v13.8h | |
3470 + trn2 v13.8h, v18.8h, v13.8h | |
3471 + /* Transpose q9-q11 */ | |
3472 + mov v18.16b, v9.16b | |
3473 + trn1 v9.4s, v9.4s, v11.4s | |
3474 + trn2 v11.4s, v18.4s, v11.4s | |
3475 + /* Transpose q12-q14 */ | |
3476 + mov v18.16b, v12.16b | |
3477 + trn1 v12.4s, v12.4s, v14.4s | |
3478 + trn2 v14.4s, v18.4s, v14.4s | |
3479 + /* Transpose q8-q10 */ | |
3480 + mov v18.16b, v8.16b | |
3481 + trn1 v8.4s, v8.4s, v10.4s | |
3482 + trn2 v10.4s, v18.4s, v10.4s | |
3483 + /* Transpose q13-q15 */ | |
3484 + mov v18.16b, v13.16b | |
3485 + trn1 v13.4s, v13.4s, v15.4s | |
3486 + trn2 v15.4s, v18.4s, v15.4s | |
3487 + /* vswp v14.4h, v10-MSB.4h */ | |
3488 + umov x22, v14.d[0] | |
3489 + ins v14.2d[0], v10.2d[1] | |
3490 + ins v10.2d[1], x22 | |
3491 + /* vswp v13.4h, v9MSB.4h */ | |
3492 + | |
3493 + umov x22, v13.d[0] | |
3494 + ins v13.2d[0], v9.2d[1] | |
3495 + ins v9.2d[1], x22 | |
3496 + /* 1-D IDCT, pass 2 */ | |
3497 + sub v2.8h, v10.8h, v14.8h | |
3498 + /* vswp v15.4h, v11MSB.4h */ | |
3499 + umov x22, v15.d[0] | |
3500 + ins v15.2d[0], v11.2d[1] | |
3501 + ins v11.2d[1], x22 | |
3502 + add v14.8h, v10.8h, v14.8h | |
3503 + /* vswp v12.4h, v8-MSB.4h */ | |
3504 + umov x22, v12.d[0] | |
3505 + ins v12.2d[0], v8.2d[1] | |
3506 + ins v8.2d[1], x22 | |
3507 + sub v1.8h, v11.8h, v13.8h | |
3508 + add v13.8h, v11.8h, v13.8h | |
3509 + sub v5.8h, v9.8h, v15.8h | |
3510 + add v15.8h, v9.8h, v15.8h | |
3511 + sqdmulh v4.8h, v2.8h, XFIX_1_414213562 | |
3512 + sqdmulh v6.8h, v1.8h, XFIX_2_613125930 | |
3513 + add v3.8h, v1.8h, v1.8h | |
3514 + sub v1.8h, v5.8h, v1.8h | |
3515 + add v10.8h, v2.8h, v4.8h | |
3516 + sqdmulh v4.8h, v1.8h, XFIX_1_847759065 | |
3517 + sub v2.8h, v15.8h, v13.8h | |
3518 + add v3.8h, v3.8h, v6.8h | |
3519 + sqdmulh v6.8h, v2.8h, XFIX_1_414213562 | |
3520 + add v1.8h, v1.8h, v4.8h | |
3521 + sqdmulh v4.8h, v5.8h, XFIX_1_082392200 | |
3522 + sub v10.8h, v10.8h, v14.8h | |
3523 + add v2.8h, v2.8h, v6.8h | |
3524 + sub v6.8h, v8.8h, v12.8h | |
3525 + add v12.8h, v8.8h, v12.8h | |
3526 + add v9.8h, v5.8h, v4.8h | |
3527 + add v5.8h, v6.8h, v10.8h | |
3528 + sub v10.8h, v6.8h, v10.8h | |
3529 + add v6.8h, v15.8h, v13.8h | |
3530 + add v8.8h, v12.8h, v14.8h | |
3531 + sub v3.8h, v6.8h, v3.8h | |
3532 + sub v12.8h, v12.8h, v14.8h | |
3533 + sub v3.8h, v3.8h, v1.8h | |
3534 + sub v1.8h, v9.8h, v1.8h | |
3535 + add v2.8h, v3.8h, v2.8h | |
3536 + sub v15.8h, v8.8h, v6.8h | |
3537 + add v1.8h, v1.8h, v2.8h | |
3538 + add v8.8h, v8.8h, v6.8h | |
3539 + add v14.8h, v5.8h, v3.8h | |
3540 + sub v9.8h, v5.8h, v3.8h | |
3541 + sub v13.8h, v10.8h, v2.8h | |
3542 + add v10.8h, v10.8h, v2.8h | |
3543 + sub v11.8h, v12.8h, v1.8h | |
3544 + add v12.8h, v12.8h, v1.8h | |
3545 + /* Descale to 8-bit and range limit */ | |
3546 + movi v0.16b, #0x80 | |
3547 + sqshrn v8.8b, v8.8h, #5 | |
3548 + sqshrn2 v8.16b, v9.8h, #5 | |
3549 + sqshrn v9.8b, v10.8h, #5 | |
3550 + sqshrn2 v9.16b, v11.8h, #5 | |
3551 + sqshrn v10.8b, v12.8h, #5 | |
3552 + sqshrn2 v10.16b, v13.8h, #5 | |
3553 + sqshrn v11.8b, v14.8h, #5 | |
3554 + sqshrn2 v11.16b, v15.8h, #5 | |
3555 + add v8.16b, v8.16b, v0.16b | |
3556 + add v9.16b, v9.16b, v0.16b | |
3557 + add v10.16b, v10.16b, v0.16b | |
3558 + add v11.16b, v11.16b, v0.16b | |
3559 + /* Transpose the final 8-bit samples */ | |
3560 + /* Transpose q8-q9 */ | |
3561 + mov v18.16b, v8.16b | |
3562 + trn1 v8.8h, v8.8h, v9.8h | |
3563 + trn2 v9.8h, v18.8h, v9.8h | |
3564 + /* Transpose q10-q11 */ | |
3565 + mov v18.16b, v10.16b | |
3566 + trn1 v10.8h, v10.8h, v11.8h | |
3567 + trn2 v11.8h, v18.8h, v11.8h | |
3568 + /* Transpose q8-q10 */ | |
3569 + mov v18.16b, v8.16b | |
3570 + trn1 v8.4s, v8.4s, v10.4s | |
3571 + trn2 v10.4s, v18.4s, v10.4s | |
3572 + /* Transpose q9-q11 */ | |
3573 + mov v18.16b, v9.16b | |
3574 + trn1 v9.4s, v9.4s, v11.4s | |
3575 + trn2 v11.4s, v18.4s, v11.4s | |
3576 + /* make copy */ | |
3577 + ins v17.2d[0], v8.2d[1] | |
3578 + /* Transpose d16-d17-msb */ | |
3579 + mov v18.16b, v8.16b | |
3580 + trn1 v8.8b, v8.8b, v17.8b | |
3581 + trn2 v17.8b, v18.8b, v17.8b | |
3582 + /* make copy */ | |
3583 + ins v19.2d[0], v9.2d[1] | |
3584 + mov v18.16b, v9.16b | |
3585 + trn1 v9.8b, v9.8b, v19.8b | |
3586 + trn2 v19.8b, v18.8b, v19.8b | |
3587 + /* Store results to the output buffer */ | |
3588 + ldp TMP1, TMP2, [OUTPUT_BUF], 16 | |
3589 + add TMP1, TMP1, OUTPUT_COL | |
3590 + add TMP2, TMP2, OUTPUT_COL | |
3591 + st1 {v8.8b}, [TMP1] | |
3592 + st1 {v17.8b}, [TMP2] | |
3593 + ldp TMP1, TMP2, [OUTPUT_BUF], 16 | |
3594 + add TMP1, TMP1, OUTPUT_COL | |
3595 + add TMP2, TMP2, OUTPUT_COL | |
3596 + st1 {v9.8b}, [TMP1] | |
3597 + /* make copy */ | |
3598 + ins v7.2d[0], v10.2d[1] | |
3599 + mov v18.16b, v10.16b | |
3600 + trn1 v10.8b, v10.8b, v7.8b | |
3601 + trn2 v7.8b, v18.8b, v7.8b | |
3602 + st1 {v19.8b}, [TMP2] | |
3603 + ldp TMP1, TMP2, [OUTPUT_BUF], 16 | |
3604 + ldp TMP4, TMP5, [OUTPUT_BUF], 16 | |
3605 + add TMP1, TMP1, OUTPUT_COL | |
3606 + add TMP2, TMP2, OUTPUT_COL | |
3607 + add TMP4, TMP4, OUTPUT_COL | |
3608 + add TMP5, TMP5, OUTPUT_COL | |
3609 + st1 {v10.8b}, [TMP1] | |
3610 + /* make copy */ | |
3611 + ins v16.2d[0], v11.2d[1] | |
3612 + mov v18.16b, v11.16b | |
3613 + trn1 v11.8b, v11.8b, v16.8b | |
3614 + trn2 v16.8b, v18.8b, v16.8b | |
3615 + st1 {v7.8b}, [TMP2] | |
3616 + st1 {v11.8b}, [TMP4] | |
3617 + st1 {v16.8b}, [TMP5] | |
3618 + sub sp, sp, #176 | |
3619 + ldp x22, x23, [sp], 16 | |
3620 + ld1 {v0.8b - v3.8b}, [sp], 32 | |
3621 + ld1 {v4.8b - v7.8b}, [sp], 32 | |
3622 + ld1 {v8.8b - v11.8b}, [sp], 32 | |
3623 + ld1 {v12.8b - v15.8b}, [sp], 32 | |
3624 + ld1 {v16.8b - v19.8b}, [sp], 32 | |
3625 + blr x30 | |
3626 + | |
3627 + .unreq DCT_TABLE | |
3628 + .unreq COEF_BLOCK | |
3629 + .unreq OUTPUT_BUF | |
3630 + .unreq OUTPUT_COL | |
3631 + .unreq TMP1 | |
3632 + .unreq TMP2 | |
3633 + .unreq TMP3 | |
3634 + .unreq TMP4 | |
3635 + | |
3636 + | |
3637 +/*****************************************************************************/ | |
3638 + | |
3639 +/* | |
3640 + * jsimd_idct_4x4_neon | |
3641 + * | |
3642 + * This function contains inverse-DCT code for getting reduced-size | |
3643 + * 4x4 pixels output from an 8x8 DCT block. It uses the same calculations | |
3644 + * and produces exactly the same output as IJG's original 'jpeg_idct_4x4' | |
3645 + * function from jpeg-6b (jidctred.c). | |
3646 + * | |
3647 + * NOTE: jpeg-8 has an improved implementation of 4x4 inverse-DCT, which | |
3648 + * requires far fewer arithmetic operations and hence should be faster. | |
3649 + * The primary purpose of this particular NEON-optimized function is | |
3650 + * bit-exact compatibility with jpeg-6b. | |
3651 + * | |
3652 + * TODO: slightly better instruction scheduling can be achieved by expanding | |
3653 + * the idct_helper/transpose_4x4 macros and reordering instructions, | |
3654 + * but readability will suffer somewhat. | |
3655 + */ | |
3656 + | |
3657 +#define CONST_BITS 13 | |
3658 + | |
3659 +#define FIX_0_211164243 (1730) /* FIX(0.211164243) */ | |
3660 +#define FIX_0_509795579 (4176) /* FIX(0.509795579) */ | |
3661 +#define FIX_0_601344887 (4926) /* FIX(0.601344887) */ | |
3662 +#define FIX_0_720959822 (5906) /* FIX(0.720959822) */ | |
3663 +#define FIX_0_765366865 (6270) /* FIX(0.765366865) */ | |
3664 +#define FIX_0_850430095 (6967) /* FIX(0.850430095) */ | |
3665 +#define FIX_0_899976223 (7373) /* FIX(0.899976223) */ | |
3666 +#define FIX_1_061594337 (8697) /* FIX(1.061594337) */ | |
3667 +#define FIX_1_272758580 (10426) /* FIX(1.272758580) */ | |
3668 +#define FIX_1_451774981 (11893) /* FIX(1.451774981) */ | |
3669 +#define FIX_1_847759065 (15137) /* FIX(1.847759065) */ | |
3670 +#define FIX_2_172734803 (17799) /* FIX(2.172734803) */ | |
3671 +#define FIX_2_562915447 (20995) /* FIX(2.562915447) */ | |
3672 +#define FIX_3_624509785 (29692) /* FIX(3.624509785) */ | |
3673 + | |
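The integer values in the FIX_* definitions above follow from scaling each real constant by 2^CONST_BITS and rounding to nearest. The sketch below reproduces that conversion in C for checking purposes; the `FIX` macro is written here in the style of the IJG sources, and the wrapper `fix13` is a hypothetical name used only for illustration.

```c
#include <stdint.h>

#define CONST_BITS 13

/* Round-to-nearest conversion of a positive real constant to fixed point. */
#define FIX(x) ((int32_t)((x) * (1 << CONST_BITS) + 0.5))

/* Function form of the same conversion, convenient for run-time checks. */
static int32_t fix13(double x)
{
    return (int32_t)(x * (double)(1 << CONST_BITS) + 0.5);
}
```

With CONST_BITS equal to 13, the largest constant, FIX(3.624509785) = 29692, still fits in a signed 16-bit value, which is why the tables can be emitted with `.short`.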
3674 +.balign 16 | |
3675 +jsimd_idct_4x4_neon_consts: | |
3676 + .short FIX_1_847759065 /* v0.4h[0] */ | |
3677 + .short -FIX_0_765366865 /* v0.4h[1] */ | |
3678 + .short -FIX_0_211164243 /* v0.4h[2] */ | |
3679 + .short FIX_1_451774981 /* v0.4h[3] */ | |
3680 + .short -FIX_2_172734803 /* v1.4h[0] */ | |
3681 + .short FIX_1_061594337 /* v1.4h[1] */ | |
3682 + .short -FIX_0_509795579 /* v1.4h[2] */ | |
3683 + .short -FIX_0_601344887 /* v1.4h[3] */ | |
3684 + .short FIX_0_899976223 /* v2.4h[0] */ | |
3685 + .short FIX_2_562915447 /* v2.4h[1] */ | |
3686 + .short 1 << (CONST_BITS+1) /* v2.4h[2] */ | |
3687 + .short 0 /* v2.4h[3] */ | |
3688 + | |
3689 +.macro idct_helper x4, x6, x8, x10, x12, x14, x16, shift, y26, y27, y28, y29 | |
3690 + smull v28.4s, \x4, v2.4h[2] | |
3691 + smlal v28.4s, \x8, v0.4h[0] | |
3692 + smlal v28.4s, \x14, v0.4h[1] | |
3693 + | |
3694 + smull v26.4s, \x16, v1.4h[2] | |
3695 + smlal v26.4s, \x12, v1.4h[3] | |
3696 + smlal v26.4s, \x10, v2.4h[0] | |
3697 + smlal v26.4s, \x6, v2.4h[1] | |
3698 + | |
3699 + smull v30.4s, \x4, v2.4h[2] | |
3700 + smlsl v30.4s, \x8, v0.4h[0] | |
3701 + smlsl v30.4s, \x14, v0.4h[1] | |
3702 + | |
3703 + smull v24.4s, \x16, v0.4h[2] | |
3704 + smlal v24.4s, \x12, v0.4h[3] | |
3705 + smlal v24.4s, \x10, v1.4h[0] | |
3706 + smlal v24.4s, \x6, v1.4h[1] | |
3707 + | |
3708 + add v20.4s, v28.4s, v26.4s | |
3709 + sub v28.4s, v28.4s, v26.4s | |
3710 + | |
3711 +.if \shift > 16 | |
3712 + srshr v20.4s, v20.4s, #\shift | |
3713 + srshr v28.4s, v28.4s, #\shift | |
3714 + xtn \y26, v20.4s | |
3715 + xtn \y29, v28.4s | |
3716 +.else | |
3717 + rshrn \y26, v20.4s, #\shift | |
3718 + rshrn \y29, v28.4s, #\shift | |
3719 +.endif | |
3720 + | |
3721 + add v20.4s, v30.4s, v24.4s | |
3722 + sub v30.4s, v30.4s, v24.4s | |
3723 + | |
3724 +.if \shift > 16 | |
3725 + srshr v20.4s, v20.4s, #\shift | |
3726 + srshr v30.4s, v30.4s, #\shift | |
3727 + xtn \y27, v20.4s | |
3728 + xtn \y28, v30.4s | |
3729 +.else | |
3730 + rshrn \y27, v20.4s, #\shift | |
3731 + rshrn \y28, v30.4s, #\shift | |
3732 +.endif | |
3733 + | |
3734 +.endm | |
3735 + | |
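The two branches of `.if \shift > 16` in the macro above perform the same descaling step. The narrowing form (`rshrn`) only accepts shift amounts up to 16 when producing 16-bit lanes, so larger shifts are split into a rounding shift (`srshr`) followed by a truncating narrow (`xtn`). A scalar sketch in C of that operation is shown below, assuming results fit in 16 bits; `floor_shift` and `descale_narrow` are hypothetical names.

```c
#include <stdint.h>

/* Floor of x / 2^shift, written without relying on >> of negative values. */
static int64_t floor_shift(int64_t x, unsigned shift)
{
    int64_t d = (int64_t)1 << shift;
    return (x >= 0) ? (x / d) : -((-x + d - 1) / d);
}

/* Rounding right shift of a 32-bit lane, then narrowing to 16 bits.
 * Models either "rshrn" (shift <= 16) or "srshr" + "xtn" (shift > 16).
 * Assumes 1 <= shift <= 31 and that the result fits in int16_t. */
static int16_t descale_narrow(int32_t x, unsigned shift)
{
    int64_t bias = (int64_t)1 << (shift - 1);   /* rounding constant */
    return (int16_t)floor_shift((int64_t)x + bias, shift);
}
```

In the 4x4 routine shown here, pass 1 uses a shift of 12 and pass 2 a shift of 19, so both branches of the macro are exercised.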
3736 +asm_function jsimd_idct_4x4_neon | |
3737 + | |
3738 + DCT_TABLE .req x0 | |
3739 + COEF_BLOCK .req x1 | |
3740 + OUTPUT_BUF .req x2 | |
3741 + OUTPUT_COL .req x3 | |
3742 + TMP1 .req x0 | |
3743 + TMP2 .req x1 | |
3744 + TMP3 .req x2 | |
3745 + TMP4 .req x15 | |
3746 + | |
3747 + /* Save all used NEON registers */ | |
3748 + sub sp, sp, 272 | |
3749 + str x15, [sp], 16 | |
3750 + /* Load constants (v3.4h is just used for padding) */ | |
3751 + adr TMP4, jsimd_idct_4x4_neon_consts | |
3752 + st1 {v0.8b - v3.8b}, [sp], 32 | |
3753 + st1 {v4.8b - v7.8b}, [sp], 32 | |
3754 + st1 {v8.8b - v11.8b}, [sp], 32 | |
3755 + st1 {v12.8b - v15.8b}, [sp], 32 | |
3756 + st1 {v16.8b - v19.8b}, [sp], 32 | |
3757 + st1 {v20.8b - v23.8b}, [sp], 32 | |
3758 + st1 {v24.8b - v27.8b}, [sp], 32 | |
3759 + st1 {v28.8b - v31.8b}, [sp], 32 | |
3760 + ld1 {v0.4h, v1.4h, v2.4h, v3.4h}, [TMP4] | |
3761 + | |
3762 + /* Load all COEF_BLOCK into NEON registers with the following allocation: | |
3763 + * 0 1 2 3 | 4 5 6 7 | |
3764 + * ---------+-------- | |
3765 + * 0 | v4.4h | v5.4h | |
3766 + * 1 | v6.4h | v7.4h | |
3767 + * 2 | v8.4h | v9.4h | |
3768 + * 3 | v10.4h | v11.4h | |
3769 + * 4 | - | - | |
3770 + * 5 | v12.4h | v13.4h | |
3771 + * 6 | v14.4h | v15.4h | |
3772 + * 7 | v16.4h | v17.4h | |
3773 + */ | |
3774 + ld1 {v4.4h, v5.4h, v6.4h, v7.4h}, [COEF_BLOCK], 32 | |
3775 + ld1 {v8.4h, v9.4h, v10.4h, v11.4h}, [COEF_BLOCK], 32 | |
3776 + add COEF_BLOCK, COEF_BLOCK, #16 | |
3777 + ld1 {v12.4h, v13.4h, v14.4h, v15.4h}, [COEF_BLOCK], 32 | |
3778 + ld1 {v16.4h, v17.4h}, [COEF_BLOCK], 16 | |
3779 + /* dequantize */ | |
3780 + ld1 {v18.4h, v19.4h, v20.4h, v21.4h}, [DCT_TABLE], 32 | |
3781 + mul v4.4h, v4.4h, v18.4h | |
3782 + mul v5.4h, v5.4h, v19.4h | |
3783 + ins v4.2d[1], v5.2d[0] /* 128 bit q4 */ | |
3784 + ld1 {v22.4h, v23.4h, v24.4h, v25.4h}, [DCT_TABLE], 32 | |
3785 + mul v6.4h, v6.4h, v20.4h | |
3786 + mul v7.4h, v7.4h, v21.4h | |
3787 + ins v6.2d[1], v7.2d[0] /* 128 bit q6 */ | |
3788 + mul v8.4h, v8.4h, v22.4h | |
3789 + mul v9.4h, v9.4h, v23.4h | |
3790 + ins v8.2d[1], v9.2d[0] /* 128 bit q8 */ | |
3791 + add DCT_TABLE, DCT_TABLE, #16 | |
3792 + ld1 {v26.4h, v27.4h, v28.4h, v29.4h}, [DCT_TABLE], 32 | |
3793 + mul v10.4h, v10.4h, v24.4h | |
3794 + mul v11.4h, v11.4h, v25.4h | |
3795 + ins v10.2d[1], v11.2d[0] /* 128 bit q10 */ | |
3796 + mul v12.4h, v12.4h, v26.4h | |
3797 + mul v13.4h, v13.4h, v27.4h | |
3798 + ins v12.2d[1], v13.2d[0] /* 128 bit q12 */ | |
3799 + ld1 {v30.4h, v31.4h}, [DCT_TABLE], 16 | |
3800 + mul v14.4h, v14.4h, v28.4h | |
3801 + mul v15.4h, v15.4h, v29.4h | |
3802 + ins v14.2d[1], v15.2d[0] /* 128 bit q14 */ | |
3803 + mul v16.4h, v16.4h, v30.4h | |
3804 + mul v17.4h, v17.4h, v31.4h | |
3805 + ins v16.2d[1], v17.2d[0] /* 128 bit q16 */ | |
3806 + | |
3807 + /* Pass 1 */ | |
3808 + idct_helper v4.4h, v6.4h, v8.4h, v10.4h, v12.4h, v14.4h, v16.4h, 12, v4.4h, v6.4h, v8.4h, v10.4h | |
3809 + transpose_4x4 v4, v6, v8, v10, v3 | |
3810 + ins v10.2d[1], v11.2d[0] | |
3811 + idct_helper v5.4h, v7.4h, v9.4h, v11.4h, v13.4h, v15.4h, v17.4h, 12, v5.4h, v7.4h, v9.4h, v11.4h | |
3812 + transpose_4x4 v5, v7, v9, v11, v3 | |
3813 + ins v10.2d[1], v11.2d[0] | |
3814 + /* Pass 2 */ | |
3815 + idct_helper v4.4h, v6.4h, v8.4h, v10.4h, v7.4h, v9.4h, v11.4h, 19, v26.4h, v27.4h, v28.4h, v29.4h | |
3816 + transpose_4x4 v26, v27, v28, v29, v3 | |
3817 + | |
3818 + /* Range limit */ | |
3819 + movi v30.8h, #0x80 | |
3820 + ins v26.2d[1], v27.2d[0] | |
3821 + ins v28.2d[1], v29.2d[0] | |
3822 + add v26.8h, v26.8h, v30.8h | |
3823 + add v28.8h, v28.8h, v30.8h | |
3824 + sqxtun v26.8b, v26.8h | |
3825 + sqxtun v27.8b, v28.8h | |
3826 + | |
3827 + /* Store results to the output buffer */ | |
3828 + ldp TMP1, TMP2, [OUTPUT_BUF], 16 | |
3829 + ldp TMP3, TMP4, [OUTPUT_BUF] | |
3830 + add TMP1, TMP1, OUTPUT_COL | |
3831 + add TMP2, TMP2, OUTPUT_COL | |
3832 + add TMP3, TMP3, OUTPUT_COL | |
3833 + add TMP4, TMP4, OUTPUT_COL | |
3834 + | |
3835 +#if defined(__ARMEL__) && !RESPECT_STRICT_ALIGNMENT | |
3836 + /* We can use far fewer instructions on little-endian systems if the | |
3837 + * OS kernel is not configured to trap unaligned memory accesses. | |
3838 + */ | |
3839 + st1 {v26.s}[0], [TMP1], 4 | |
3840 + st1 {v27.s}[0], [TMP3], 4 | |
3841 + st1 {v26.s}[1], [TMP2], 4 | |
3842 + st1 {v27.s}[1], [TMP4], 4 | |
3843 +#else | |
3844 + st1 {v26.b}[0], [TMP1], 1 | |
3845 + st1 {v27.b}[0], [TMP3], 1 | |
3846 + st1 {v26.b}[1], [TMP1], 1 | |
3847 + st1 {v27.b}[1], [TMP3], 1 | |
3848 + st1 {v26.b}[2], [TMP1], 1 | |
3849 + st1 {v27.b}[2], [TMP3], 1 | |
3850 + st1 {v26.b}[3], [TMP1], 1 | |
3851 + st1 {v27.b}[3], [TMP3], 1 | |
3852 + | |
3853 + st1 {v26.b}[4], [TMP2], 1 | |
3854 + st1 {v27.b}[4], [TMP4], 1 | |
3855 + st1 {v26.b}[5], [TMP2], 1 | |
3856 + st1 {v27.b}[5], [TMP4], 1 | |
3857 + st1 {v26.b}[6], [TMP2], 1 | |
3858 + st1 {v27.b}[6], [TMP4], 1 | |
3859 + st1 {v26.b}[7], [TMP2], 1 | |
3860 + st1 {v27.b}[7], [TMP4], 1 | |
3861 +#endif | |
3862 + | |
3863 + /* vpop {v8.4h - v15.4h} ;not available */ | |
3864 + sub sp, sp, #272 | |
3865 + ldr x15, [sp], 16 | |
3866 + ld1 {v0.8b - v3.8b}, [sp], 32 | |
3867 + ld1 {v4.8b - v7.8b}, [sp], 32 | |
3868 + ld1 {v8.8b - v11.8b}, [sp], 32 | |
3869 + ld1 {v12.8b - v15.8b}, [sp], 32 | |
3870 + ld1 {v16.8b - v19.8b}, [sp], 32 | |
3871 + ld1 {v20.8b - v23.8b}, [sp], 32 | |
3872 + ld1 {v24.8b - v27.8b}, [sp], 32 | |
3873 + ld1 {v28.8b - v31.8b}, [sp], 32 | |
3874 + blr x30 | |
3875 + | |
3876 + .unreq DCT_TABLE | |
3877 + .unreq COEF_BLOCK | |
3878 + .unreq OUTPUT_BUF | |
3879 + .unreq OUTPUT_COL | |
3880 + .unreq TMP1 | |
3881 + .unreq TMP2 | |
3882 + .unreq TMP3 | |
3883 + .unreq TMP4 | |
3884 + | |
3885 +.purgem idct_helper | |
3886 + | |
3887 + | |
3888 +/*****************************************************************************/ | |
3889 + | |
3890 +/* | |
3891 + * jsimd_idct_2x2_neon | |
3892 + * | |
3893 + * This function contains inverse-DCT code for getting reduced-size | |
3894 + * 2x2 pixels output from an 8x8 DCT block. It uses the same calculations | |
3895 + * and produces exactly the same output as IJG's original 'jpeg_idct_2x2' | |
3896 + * function from jpeg-6b (jidctred.c). | |
3897 + * | |
3898 + * NOTE: jpeg-8 has an improved implementation of 2x2 inverse-DCT, which | |
3899 + * requires far fewer arithmetic operations and hence should be faster. | |
3900 + * The primary purpose of this particular NEON-optimized function is | |
3901 + * bit-exact compatibility with jpeg-6b. | |
3902 + */ | |
3903 + | |
3904 +.balign 8 | |
3905 +jsimd_idct_2x2_neon_consts: | |
3906 + .short -FIX_0_720959822 /* v14[0] */ | |
3907 + .short FIX_0_850430095 /* v14[1] */ | |
3908 + .short -FIX_1_272758580 /* v14[2] */ | |
3909 + .short FIX_3_624509785 /* v14[3] */ | |
3910 + | |
3911 +.macro idct_helper x4, x6, x10, x12, x16, shift, y26, y27 | |
3912 + sshll v15.4s, \x4, #15 | |
3913 + smull v26.4s, \x6, v14.4h[3] | |
3914 + smlal v26.4s, \x10, v14.4h[2] | |
3915 + smlal v26.4s, \x12, v14.4h[1] | |
3916 + smlal v26.4s, \x16, v14.4h[0] | |
3917 + | |
3918 + add v20.4s, v15.4s, v26.4s | |
3919 + sub v15.4s, v15.4s, v26.4s | |
3920 + | |
3921 +.if \shift > 16 | |
3922 + srshr v20.4s, v20.4s, #\shift | |
3923 + srshr v15.4s, v15.4s, #\shift | |
3924 + xtn \y26, v20.4s | |
3925 + xtn \y27, v15.4s | |
3926 +.else | |
3927 + rshrn \y26, v20.4s, #\shift | |
3928 + rshrn \y27, v15.4s, #\shift | |
3929 +.endif | |
3930 + | |
3931 +.endm | |
3932 + | |
3933 +asm_function jsimd_idct_2x2_neon | |
3934 + | |
3935 + DCT_TABLE .req x0 | |
3936 + COEF_BLOCK .req x1 | |
3937 + OUTPUT_BUF .req x2 | |
3938 + OUTPUT_COL .req x3 | |
3939 + TMP1 .req x0 | |
3940 + TMP2 .req x15 | |
3941 + | |
3942 + /* vpush {v8.4h - v15.4h} ; not available */ | |
3943 + sub sp, sp, 208 | |
3944 + str x15, [sp], 16 | |
3945 + | |
3946 + /* Load constants */ | |
3947 + adr TMP2, jsimd_idct_2x2_neon_consts | |
3948 + st1 {v4.8b - v7.8b}, [sp], 32 | |
3949 + st1 {v8.8b - v11.8b}, [sp], 32 | |
3950 + st1 {v12.8b - v15.8b}, [sp], 32 | |
3951 + st1 {v16.8b - v19.8b}, [sp], 32 | |
3952 + st1 {v21.8b - v22.8b}, [sp], 16 | |
3953 + st1 {v24.8b - v27.8b}, [sp], 32 | |
3954 + st1 {v30.8b - v31.8b}, [sp], 16 | |
3955 + ld1 {v14.4h}, [TMP2] | |
3956 + | |
3957 + /* Load all COEF_BLOCK into NEON registers with the following allocation: | |
3958 + * 0 1 2 3 | 4 5 6 7 | |
3959 + * ---------+-------- | |
3960 + * 0 | v4.4h | v5.4h | |
3961 + * 1 | v6.4h | v7.4h | |
3962 + * 2 | - | - | |
3963 + * 3 | v10.4h | v11.4h | |
3964 + * 4 | - | - | |
3965 + * 5 | v12.4h | v13.4h | |
3966 + * 6 | - | - | |
3967 + * 7 | v16.4h | v17.4h | |
3968 + */ | |
3969 + ld1 {v4.4h, v5.4h, v6.4h, v7.4h}, [COEF_BLOCK], 32 | |
3970 + add COEF_BLOCK, COEF_BLOCK, #16 | |
3971 + ld1 {v10.4h, v11.4h}, [COEF_BLOCK], 16 | |
3972 + add COEF_BLOCK, COEF_BLOCK, #16 | |
3973 + ld1 {v12.4h, v13.4h}, [COEF_BLOCK], 16 | |
3974 + add COEF_BLOCK, COEF_BLOCK, #16 | |
3975 + ld1 {v16.4h, v17.4h}, [COEF_BLOCK], 16 | |
3976 + /* Dequantize */ | |
3977 + ld1 {v18.4h, v19.4h, v20.4h, v21.4h}, [DCT_TABLE], 32 | |
3978 + mul v4.4h, v4.4h, v18.4h | |
3979 + mul v5.4h, v5.4h, v19.4h | |
3980 + ins v4.2d[1], v5.2d[0] | |
3981 + mul v6.4h, v6.4h, v20.4h | |
3982 + mul v7.4h, v7.4h, v21.4h | |
3983 + ins v6.2d[1], v7.2d[0] | |
3984 + add DCT_TABLE, DCT_TABLE, #16 | |
3985 + ld1 {v24.4h, v25.4h}, [DCT_TABLE], 16 | |
3986 + mul v10.4h, v10.4h, v24.4h | |
3987 + mul v11.4h, v11.4h, v25.4h | |
3988 + ins v10.2d[1], v11.2d[0] | |
3989 + add DCT_TABLE, DCT_TABLE, #16 | |
3990 + ld1 {v26.4h, v27.4h}, [DCT_TABLE], 16 | |
3991 + mul v12.4h, v12.4h, v26.4h | |
3992 + mul v13.4h, v13.4h, v27.4h | |
3993 + ins v12.2d[1], v13.2d[0] | |
3994 + add DCT_TABLE, DCT_TABLE, #16 | |
3995 + ld1 {v30.4h, v31.4h}, [DCT_TABLE], 16 | |
3996 + mul v16.4h, v16.4h, v30.4h | |
3997 + mul v17.4h, v17.4h, v31.4h | |
3998 + ins v16.2d[1], v17.2d[0] | |
3999 + | |
4000 + /* Pass 1 */ | |
4001 +#if 0 | |
4002 + idct_helper v4.4h, v6.4h, v10.4h, v12.4h, v16.4h, 13, v4.4h, v6.4h | |
4003 + transpose_4x4 v4.4h, v6.4h, v8.4h, v10.4h | |
4004 + idct_helper v5.4h, v7.4h, v11.4h, v13.4h, v17.4h, 13, v5.4h, v7.4h | |
4005 + transpose_4x4 v5.4h, v7.4h, v9.4h, v11.4h | |
4006 +#else | |
4007 + smull v26.4s, v6.4h, v14.4h[3] | |
4008 + smlal v26.4s, v10.4h, v14.4h[2] | |
4009 + smlal v26.4s, v12.4h, v14.4h[1] | |
4010 + smlal v26.4s, v16.4h, v14.4h[0] | |
4011 + smull v24.4s, v7.4h, v14.4h[3] | |
4012 + smlal v24.4s, v11.4h, v14.4h[2] | |
4013 + smlal v24.4s, v13.4h, v14.4h[1] | |
4014 + smlal v24.4s, v17.4h, v14.4h[0] | |
4015 + sshll v15.4s, v4.4h, #15 | |
4016 + sshll v30.4s, v5.4h, #15 | |
4017 + add v20.4s, v15.4s, v26.4s | |
4018 + sub v15.4s, v15.4s, v26.4s | |
4019 + rshrn v4.4h, v20.4s, #13 | |
4020 + rshrn v6.4h, v15.4s, #13 | |
4021 + add v20.4s, v30.4s, v24.4s | |
4022 + sub v15.4s, v30.4s, v24.4s | |
4023 + rshrn v5.4h, v20.4s, #13 | |
4024 + rshrn v7.4h, v15.4s, #13 | |
4025 + ins v4.2d[1], v5.2d[0] | |
4026 + ins v6.2d[1], v7.2d[0] | |
4027 + transpose v4, v6, v3, .16b, .8h | |
4028 + transpose v6, v10, v3, .16b, .4s | |
4029 + ins v11.2d[0], v10.2d[1] | |
4030 + ins v7.2d[0], v6.2d[1] | |
4031 +#endif | |
4032 + | |
4033 + /* Pass 2 */ | |
4034 + idct_helper v4.4h, v6.4h, v10.4h, v7.4h, v11.4h, 20, v26.4h, v27.4h | |
4035 + | |
4036 + /* Range limit */ | |
4037 + movi v30.8h, #0x80 | |
4038 + ins v26.2d[1], v27.2d[0] | |
4039 + add v26.8h, v26.8h, v30.8h | |
4040 + sqxtun v30.8b, v26.8h | |
4041 + ins v26.2d[0], v30.2d[0] | |
4042 + sqxtun v27.8b, v26.8h | |
4043 + | |
4044 + /* Store results to the output buffer */ | |
4045 + ldp TMP1, TMP2, [OUTPUT_BUF] | |
4046 + add TMP1, TMP1, OUTPUT_COL | |
4047 + add TMP2, TMP2, OUTPUT_COL | |
4048 + | |
4049 + st1 {v26.b}[0], [TMP1], 1 | |
4050 + st1 {v27.b}[4], [TMP1], 1 | |
4051 + st1 {v26.b}[1], [TMP2], 1 | |
4052 + st1 {v27.b}[5], [TMP2], 1 | |
4053 + | |
4054 + sub sp, sp, #208 | |
4055 + ldr x15, [sp], 16 | |
4056 + ld1 {v4.8b - v7.8b}, [sp], 32 | |
4057 + ld1 {v8.8b - v11.8b}, [sp], 32 | |
4058 + ld1 {v12.8b - v15.8b}, [sp], 32 | |
4059 + ld1 {v16.8b - v19.8b}, [sp], 32 | |
4060 + ld1 {v21.8b - v22.8b}, [sp], 16 | |
4061 + ld1 {v24.8b - v27.8b}, [sp], 32 | |
4062 + ld1 {v30.8b - v31.8b}, [sp], 16 | |
4063 + br x30 | |
4064 + | |
4065 + .unreq DCT_TABLE | |
4066 + .unreq COEF_BLOCK | |
4067 + .unreq OUTPUT_BUF | |
4068 + .unreq OUTPUT_COL | |
4069 + .unreq TMP1 | |
4070 + .unreq TMP2 | |
4071 + | |
4072 +.purgem idct_helper | |
4073 + | |
4074 + | |
4075 +/*****************************************************************************/ | |
4076 + | |
4077 +/* | |
4078 + * jsimd_ycc_extrgb_convert_neon | |
4079 + * jsimd_ycc_extbgr_convert_neon | |
4080 + * jsimd_ycc_extrgbx_convert_neon | |
4081 + * jsimd_ycc_extbgrx_convert_neon | |
4082 + * jsimd_ycc_extxbgr_convert_neon | |
4083 + * jsimd_ycc_extxrgb_convert_neon | |
4084 + * | |
4085 + * Colorspace conversion YCbCr -> RGB | |
4086 + */ | |
4087 + | |
4088 + | |
4089 +.macro do_load size | |
4090 + .if \size == 8 | |
4091 + ld1 {v4.8b}, [U], 8 | |
4092 + ld1 {v5.8b}, [V], 8 | |
4093 + ld1 {v0.8b}, [Y], 8 | |
4094 + prfm PLDL1KEEP, [U, #64] | |
4095 + prfm PLDL1KEEP, [V, #64] | |
4096 + prfm PLDL1KEEP, [Y, #64] | |
4097 + .elseif \size == 4 | |
4098 + ld1 {v4.b}[0], [U], 1 | |
4099 + ld1 {v4.b}[1], [U], 1 | |
4100 + ld1 {v4.b}[2], [U], 1 | |
4101 + ld1 {v4.b}[3], [U], 1 | |
4102 + ld1 {v5.b}[0], [V], 1 | |
4103 + ld1 {v5.b}[1], [V], 1 | |
4104 + ld1 {v5.b}[2], [V], 1 | |
4105 + ld1 {v5.b}[3], [V], 1 | |
4106 + ld1 {v0.b}[0], [Y], 1 | |
4107 + ld1 {v0.b}[1], [Y], 1 | |
4108 + ld1 {v0.b}[2], [Y], 1 | |
4109 + ld1 {v0.b}[3], [Y], 1 | |
4110 + .elseif \size == 2 | |
4111 + ld1 {v4.b}[4], [U], 1 | |
4112 + ld1 {v4.b}[5], [U], 1 | |
4113 + ld1 {v5.b}[4], [V], 1 | |
4114 + ld1 {v5.b}[5], [V], 1 | |
4115 + ld1 {v0.b}[4], [Y], 1 | |
4116 + ld1 {v0.b}[5], [Y], 1 | |
4117 + .elseif \size == 1 | |
4118 + ld1 {v4.b}[6], [U], 1 | |
4119 + ld1 {v5.b}[6], [V], 1 | |
4120 + ld1 {v0.b}[6], [Y], 1 | |
4121 + .else | |
4122 + .error unsupported macroblock size | |
4123 + .endif | |
4124 +.endm | |
4125 + | |
4126 +.macro do_store bpp, size | |
4127 + .if \bpp == 24 | |
4128 + .if \size == 8 | |
4129 + st3 {v10.8b, v11.8b, v12.8b}, [RGB], 24 | |
4130 + .elseif \size == 4 | |
4131 + st3 {v10.b, v11.b, v12.b}[0], [RGB], 3 | |
4132 + st3 {v10.b, v11.b, v12.b}[1], [RGB], 3 | |
4133 + st3 {v10.b, v11.b, v12.b}[2], [RGB], 3 | |
4134 + st3 {v10.b, v11.b, v12.b}[3], [RGB], 3 | |
4135 + .elseif \size == 2 | |
4136 + st3 {v10.b, v11.b, v12.b}[4], [RGB], 3 | |
4137 + st3 {v10.b, v11.b, v12.b}[5], [RGB], 3 | |
4138 + .elseif \size == 1 | |
4139 + st3 {v10.b, v11.b, v12.b}[6], [RGB], 3 | |
4140 + .else | |
4141 + .error unsupported macroblock size | |
4142 + .endif | |
4143 + .elseif \bpp == 32 | |
4144 + .if \size == 8 | |
4145 + st4 {v10.8b, v11.8b, v12.8b, v13.8b}, [RGB], 32 | |
4146 + .elseif \size == 4 | |
4147 + st4 {v10.b, v11.b, v12.b, v13.b}[0], [RGB], 4 | |
4148 + st4 {v10.b, v11.b, v12.b, v13.b}[1], [RGB], 4 | |
4149 + st4 {v10.b, v11.b, v12.b, v13.b}[2], [RGB], 4 | |
4150 + st4 {v10.b, v11.b, v12.b, v13.b}[3], [RGB], 4 | |
4151 + .elseif \size == 2 | |
4152 + st4 {v10.b, v11.b, v12.b, v13.b}[4], [RGB], 4 | |
4153 + st4 {v10.b, v11.b, v12.b, v13.b}[5], [RGB], 4 | |
4154 + .elseif \size == 1 | |
4155 + st4 {v10.b, v11.b, v12.b, v13.b}[6], [RGB], 4 | |
4156 + .else | |
4157 + .error unsupported macroblock size | |
4158 + .endif | |
4159 + .elseif \bpp==16 | |
4160 + .if \size == 8 | |
4161 + st1 {v25.8h}, [RGB],16 | |
4162 + .elseif \size == 4 | |
4163 + st1 {v25.4h}, [RGB],8 | |
4164 + .elseif \size == 2 | |
4165 + st1 {v25.h}[4], [RGB],2 | |
4166 + st1 {v25.h}[5], [RGB],2 | |
4167 + .elseif \size == 1 | |
4168 + st1 {v25.h}[6], [RGB],2 | |
4169 + .else | |
4170 + .error unsupported macroblock size | |
4171 + .endif | |
4172 + .else | |
4173 + .error unsupported bpp | |
4174 + .endif | |
4175 +.endm | |
4176 + | |
4177 +.macro generate_jsimd_ycc_rgb_convert_neon colorid, bpp, r_offs, rsize, g_offs, gsize, b_offs, bsize, defsize | |
4178 + | |
4179 +/* | |
4180 + * 2-stage pipelined YCbCr->RGB conversion | |
4181 + */ | |
4182 + | |
4183 +.macro do_yuv_to_rgb_stage1 | |
4184 + uaddw v6.8h, v2.8h, v4.8b /* v6.8h = u - 128 */ | |
4185 + uaddw v8.8h, v2.8h, v5.8b /* v8.8h = v - 128 */ | |
4186 + smull v20.4s, v6.4h, v1.4h[1] /* multiply by -11277 */ | |
4187 + smlal v20.4s, v8.4h, v1.4h[2] /* multiply by -23401 */ | |
4188 + smull2 v22.4s, v6.8h, v1.4h[1] /* multiply by -11277 */ | |
4189 + smlal2 v22.4s, v8.8h, v1.4h[2] /* multiply by -23401 */ | |
4190 + smull v24.4s, v8.4h, v1.4h[0] /* multiply by 22971 */ | |
4191 + smull2 v26.4s, v8.8h, v1.4h[0] /* multiply by 22971 */ | |
4192 + smull v28.4s, v6.4h, v1.4h[3] /* multiply by 29033 */ | |
4193 + smull2 v30.4s, v6.8h, v1.4h[3] /* multiply by 29033 */ | |
4194 +.endm | |
4195 + | |
4196 +.macro do_yuv_to_rgb_stage2 | |
4197 + rshrn v20.4h, v20.4s, #15 | |
4198 + rshrn2 v20.8h, v22.4s, #15 | |
4199 + rshrn v24.4h, v24.4s, #14 | |
4200 + rshrn2 v24.8h, v26.4s, #14 | |
4201 + rshrn v28.4h, v28.4s, #14 | |
4202 + rshrn2 v28.8h, v30.4s, #14 | |
4203 + uaddw v20.8h, v20.8h, v0.8b | |
4204 + uaddw v24.8h, v24.8h, v0.8b | |
4205 + uaddw v28.8h, v28.8h, v0.8b | |
4206 +.if \bpp != 16 | |
4207 + sqxtun v1\g_offs\defsize, v20.8h | |
4208 + sqxtun v1\r_offs\defsize, v24.8h | |
4209 + sqxtun v1\b_offs\defsize, v28.8h | |
4210 +.else | |
4211 + sqshlu v21.8h, v20.8h, #8 | |
4212 + sqshlu v25.8h, v24.8h, #8 | |
4213 + sqshlu v29.8h, v28.8h, #8 | |
4214 + sri v25.8h, v21.8h, #5 | |
4215 + sri v25.8h, v29.8h, #11 | |
4216 +.endif | |
4217 + | |
4218 +.endm | |
4219 + | |
4220 +.macro do_yuv_to_rgb_stage2_store_load_stage1 | |
4221 + rshrn v20.4h, v20.4s, #15 | |
4222 + rshrn v24.4h, v24.4s, #14 | |
4223 + rshrn v28.4h, v28.4s, #14 | |
4224 + ld1 {v4.8b}, [U], 8 | |
4225 + rshrn2 v20.8h, v22.4s, #15 | |
4226 + rshrn2 v24.8h, v26.4s, #14 | |
4227 + rshrn2 v28.8h, v30.4s, #14 | |
4228 + ld1 {v5.8b}, [V], 8 | |
4229 + uaddw v20.8h, v20.8h, v0.8b | |
4230 + uaddw v24.8h, v24.8h, v0.8b | |
4231 + uaddw v28.8h, v28.8h, v0.8b | |
4232 +.if \bpp != 16 /**************** rgb24/rgb32 *********************************/ | |
4233 + sqxtun v1\g_offs\defsize, v20.8h | |
4234 + ld1 {v0.8b}, [Y], 8 | |
4235 + sqxtun v1\r_offs\defsize, v24.8h | |
4236 + prfm PLDL1KEEP, [U, #64] | |
4237 + prfm PLDL1KEEP, [V, #64] | |
4238 + prfm PLDL1KEEP, [Y, #64] | |
4239 + sqxtun v1\b_offs\defsize, v28.8h | |
4240 + uaddw v6.8h, v2.8h, v4.8b /* v6.8h = u - 128 */ | |
4241 + uaddw v8.8h, v2.8h, v5.8b /* v8.8h = v - 128 */ | |
4242 + smull v20.4s, v6.4h, v1.4h[1] /* multiply by -11277 */ | |
4243 + smlal v20.4s, v8.4h, v1.4h[2] /* multiply by -23401 */ | |
4244 + smull2 v22.4s, v6.8h, v1.4h[1] /* multiply by -11277 */ | |
4245 + smlal2 v22.4s, v8.8h, v1.4h[2] /* multiply by -23401 */ | |
4246 + smull v24.4s, v8.4h, v1.4h[0] /* multiply by 22971 */ | |
4247 + smull2 v26.4s, v8.8h, v1.4h[0] /* multiply by 22971 */ | |
4248 +.else /**************************** rgb565 ***********************************/ | |
4249 + sqshlu v21.8h, v20.8h, #8 | |
4250 + sqshlu v25.8h, v24.8h, #8 | |
4251 + sqshlu v29.8h, v28.8h, #8 | |
4252 + uaddw v6.8h, v2.8h, v4.8b /* v6.8h = u - 128 */ | |
4253 + uaddw v8.8h, v2.8h, v5.8b /* v8.8h = v - 128 */ | |
4254 + ld1 {v0.8b}, [Y], 8 | |
4255 + smull v20.4s, v6.4h, v1.4h[1] /* multiply by -11277 */ | |
4256 + smlal v20.4s, v8.4h, v1.4h[2] /* multiply by -23401 */ | |
4257 + smull2 v22.4s, v6.8h, v1.4h[1] /* multiply by -11277 */ | |
4258 + smlal2 v22.4s, v8.8h, v1.4h[2] /* multiply by -23401 */ | |
4259 + sri v25.8h, v21.8h, #5 | |
4260 + smull v24.4s, v8.4h, v1.4h[0] /* multiply by 22971 */ | |
4261 + smull2 v26.4s, v8.8h, v1.4h[0] /* multiply by 22971 */ | |
4262 + prfm PLDL1KEEP, [U, #64] | |
4263 + prfm PLDL1KEEP, [V, #64] | |
4264 + prfm PLDL1KEEP, [Y, #64] | |
4265 + sri v25.8h, v29.8h, #11 | |
4266 +.endif | |
4267 + do_store \bpp, 8 | |
4268 + smull v28.4s, v6.4h, v1.4h[3] /* multiply by 29033 */ | |
4269 + smull2 v30.4s, v6.8h, v1.4h[3] /* multiply by 29033 */ | |
4270 +.endm | |
4271 + | |
4272 +.macro do_yuv_to_rgb | |
4273 + do_yuv_to_rgb_stage1 | |
4274 + do_yuv_to_rgb_stage2 | |
4275 +.endm | |
4276 + | |
4277 +/* Apple gas crashes on adrl, work around that by using adr. | |
4278 + * But this requires a copy of these constants for each function. | |
4279 + */ | |
4280 + | |
4281 +.balign 16 | |
4282 +jsimd_ycc_\colorid\()_neon_consts: | |
4283 + .short 0, 0, 0, 0 | |
4284 + .short 22971, -11277, -23401, 29033 | |
4285 + .short -128, -128, -128, -128 | |
4286 + .short -128, -128, -128, -128 | |
4287 + | |
4288 +asm_function jsimd_ycc_\colorid\()_convert_neon | |
4289 + OUTPUT_WIDTH .req x0 | |
4290 + INPUT_BUF .req x1 | |
4291 + INPUT_ROW .req x2 | |
4292 + OUTPUT_BUF .req x3 | |
4293 + NUM_ROWS .req x4 | |
4294 + | |
4295 + INPUT_BUF0 .req x5 | |
4296 + INPUT_BUF1 .req x6 | |
4297 + INPUT_BUF2 .req INPUT_BUF | |
4298 + | |
4299 + RGB .req x7 | |
4300 + Y .req x8 | |
4301 + U .req x9 | |
4302 + V .req x10 | |
4303 + N .req x15 | |
4304 + | |
4305 + sub sp, sp, 336 | |
4306 + str x15, [sp], 16 | |
4307 + /* Load constants to v1.4h and v2.8h (v0.4h is just used for padding) */ | |
4308 + adr x15, jsimd_ycc_\colorid\()_neon_consts | |
4309 + /* Save NEON registers */ | |
4310 + st1 {v0.8b - v3.8b}, [sp], 32 | |
4311 + st1 {v4.8b - v7.8b}, [sp], 32 | |
4312 + st1 {v8.8b - v11.8b}, [sp], 32 | |
4313 + st1 {v12.8b - v15.8b}, [sp], 32 | |
4314 + st1 {v16.8b - v19.8b}, [sp], 32 | |
4315 + st1 {v20.8b - v23.8b}, [sp], 32 | |
4316 + st1 {v24.8b - v27.8b}, [sp], 32 | |
4317 + st1 {v28.8b - v31.8b}, [sp], 32 | |
4318 + ld1 {v0.4h, v1.4h}, [x15], 16 | |
4319 + ld1 {v2.8h}, [x15] | |
4320 + | |
4321 + /* Save ARM registers and handle input arguments */ | |
4322 + /* push {x4, x5, x6, x7, x8, x9, x10, x30} */ | |
4323 + stp x4, x5, [sp], 16 | |
4324 + stp x6, x7, [sp], 16 | |
4325 + stp x8, x9, [sp], 16 | |
4326 + stp x10, x30, [sp], 16 | |
4327 + ldr INPUT_BUF0, [INPUT_BUF] | |
4328 + ldr INPUT_BUF1, [INPUT_BUF, 8] | |
4329 + ldr INPUT_BUF2, [INPUT_BUF, 16] | |
4330 + .unreq INPUT_BUF | |
4331 + | |
4332 + /* Initially set v10.16b and v13.16b to 0xFF */ | |
4333 + movi v10.16b, #255 | |
4334 + movi v13.16b, #255 | |
4335 + | |
4336 + /* Outer loop over scanlines */ | |
4337 + cmp NUM_ROWS, #1 | |
4338 + blt 9f | |
4339 +0: | |
4340 + lsl x16, INPUT_ROW, #3 | |
4341 + ldr Y, [INPUT_BUF0, x16] | |
4342 + ldr U, [INPUT_BUF1, x16] | |
4343 + mov N, OUTPUT_WIDTH | |
4344 + ldr V, [INPUT_BUF2, x16] | |
4345 + add INPUT_ROW, INPUT_ROW, #1 | |
4346 + ldr RGB, [OUTPUT_BUF], #8 | |
4347 + | |
4348 + /* Inner loop over pixels */ | |
4349 + subs N, N, #8 | |
4350 + blt 3f | |
4351 + do_load 8 | |
4352 + do_yuv_to_rgb_stage1 | |
4353 + subs N, N, #8 | |
4354 + blt 2f | |
4355 +1: | |
4356 + do_yuv_to_rgb_stage2_store_load_stage1 | |
4357 + subs N, N, #8 | |
4358 + bge 1b | |
4359 +2: | |
4360 + do_yuv_to_rgb_stage2 | |
4361 + do_store \bpp, 8 | |
4362 + tst N, #7 | |
4363 + beq 8f | |
4364 +3: | |
4365 + tst N, #4 | |
4366 + beq 3f | |
4367 + do_load 4 | |
4368 +3: | |
4369 + tst N, #2 | |
4370 + beq 4f | |
4371 + do_load 2 | |
4372 +4: | |
4373 + tst N, #1 | |
4374 + beq 5f | |
4375 + do_load 1 | |
4376 +5: | |
4377 + do_yuv_to_rgb | |
4378 + tst N, #4 | |
4379 + beq 6f | |
4380 + do_store \bpp, 4 | |
4381 +6: | |
4382 + tst N, #2 | |
4383 + beq 7f | |
4384 + do_store \bpp, 2 | |
4385 +7: | |
4386 + tst N, #1 | |
4387 + beq 8f | |
4388 + do_store \bpp, 1 | |
4389 +8: | |
4390 + subs NUM_ROWS, NUM_ROWS, #1 | |
4391 + bgt 0b | |
4392 +9: | |
4393 + /* Restore all registers and return */ | |
4394 + sub sp, sp, #336 | |
4395 + ldr x15, [sp], 16 | |
4396 + ld1 {v0.8b - v3.8b}, [sp], 32 | |
4397 + ld1 {v4.8b - v7.8b}, [sp], 32 | |
4398 + ld1 {v8.8b - v11.8b}, [sp], 32 | |
4399 + ld1 {v12.8b - v15.8b}, [sp], 32 | |
4400 + ld1 {v16.8b - v19.8b}, [sp], 32 | |
4401 + ld1 {v20.8b - v23.8b}, [sp], 32 | |
4402 + ld1 {v24.8b - v27.8b}, [sp], 32 | |
4403 + ld1 {v28.8b - v31.8b}, [sp], 32 | |
4404 + /* pop {x4, x5, x6, x7, x8, x9, x10, x30} */ | |
4405 + ldp x4, x5, [sp], 16 | |
4406 + ldp x6, x7, [sp], 16 | |
4407 + ldp x8, x9, [sp], 16 | |
4408 + ldp x10, x30, [sp], 16 | |
4409 + br x30 | |
4410 + .unreq OUTPUT_WIDTH | |
4411 + .unreq INPUT_ROW | |
4412 + .unreq OUTPUT_BUF | |
4413 + .unreq NUM_ROWS | |
4414 + .unreq INPUT_BUF0 | |
4415 + .unreq INPUT_BUF1 | |
4416 + .unreq INPUT_BUF2 | |
4417 + .unreq RGB | |
4418 + .unreq Y | |
4419 + .unreq U | |
4420 + .unreq V | |
4421 + .unreq N | |
4422 + | |
4423 +.purgem do_yuv_to_rgb | |
4424 +.purgem do_yuv_to_rgb_stage1 | |
4425 +.purgem do_yuv_to_rgb_stage2 | |
4426 +.purgem do_yuv_to_rgb_stage2_store_load_stage1 | |
4427 +.endm | |
4428 + | |
4429 +/*--------------------------------- id ----- bpp R rsize G gsize B bsize defsize */ | |
4430 +generate_jsimd_ycc_rgb_convert_neon extrgb, 24, 0, .4h, 1, .4h, 2, .4h, .8b | |
4431 +generate_jsimd_ycc_rgb_convert_neon extbgr, 24, 2, .4h, 1, .4h, 0, .4h, .8b | |
4432 +generate_jsimd_ycc_rgb_convert_neon extrgbx, 32, 0, .4h, 1, .4h, 2, .4h, .8b | |
4433 +generate_jsimd_ycc_rgb_convert_neon extbgrx, 32, 2, .4h, 1, .4h, 0, .4h, .8b | |
4434 +generate_jsimd_ycc_rgb_convert_neon extxbgr, 32, 3, .4h, 2, .4h, 1, .4h, .8b | |
4435 +generate_jsimd_ycc_rgb_convert_neon extxrgb, 32, 1, .4h, 2, .4h, 3, .4h, .8b | |
4436 +generate_jsimd_ycc_rgb_convert_neon rgb565, 16, 0, .4h, 0, .4h, 0, .4h, .8b | |
4437 +.purgem do_load | |
4438 +.purgem do_store | |
OLD | NEW |