OLD | NEW |
(Empty) | |
| 1 |
| 2 History of LZMA Utils and XZ Utils |
| 3 ================================== |
| 4 |
| 5 Tukaani distribution |
| 6 |
| 7 In 2005, there was a small group working on the Tukaani distribution, which |
| 8 was a Slackware fork. One of the project goals was to fit the distro on |
| 9 a single 700 MiB ISO-9660 image. Using LZMA instead of gzip helped a |
| 10 lot. Roughly speaking, one could fit data that took 1000 MiB in gzipped |
| 11 form into 700 MiB with LZMA. Naturally, the compression ratio varied across |
| 12 packages, but this was what we got on average. |
| 13 |
| 14 Slackware packages have traditionally had .tgz as the filename suffix, |
| 15 which is an abbreviation of .tar.gz. A logical naming for LZMA |
| 16 compressed packages was .tlz, being an abbreviation of .tar.lzma. |
| 17 |
| 18 At the end of the year 2007, there was no distribution under the |
| 19 Tukaani project anymore, but development of LZMA Utils was kept going. |
| 20 Still, there were .tlz packages around, because at least Vector Linux |
| 21 (a Slackware-based distribution) used LZMA for its packages. |
| 22 |
| 23 First versions of the modified pkgtools used the LZMA_Alone tool from |
| 24 Igor Pavlov's LZMA SDK as is. It was fine, because users wouldn't need |
| 25 to interact with LZMA_Alone directly. But people soon wanted to use |
| 26 LZMA for other files too, and the interface of LZMA_Alone wasn't |
| 27 comfortable for those used to gzip and bzip2. |
| 28 |
| 29 |
| 30 First steps of LZMA Utils |
| 31 |
| 32 The first version of LZMA Utils (4.22.0) included a shell script called |
| 33 lzmash. It was a wrapper that had a gzip-like command line interface. It |
| 34 used the LZMA_Alone tool from LZMA SDK to do all the real work. zgrep, |
| 35 zdiff, and related scripts from gzip were adapted to work with LZMA and |
| 36 were part of the first LZMA Utils release too. |
| 37 |
| 38 LZMA Utils 4.22.0 also included lzmadec, which was a small (less than |
| 39 10 KiB) decoder-only command line tool. It was written on top of the |
| 40 decoder-only C code found in the LZMA SDK. lzmadec was convenient in |
| 41 situations where LZMA_Alone (a few hundred KiB) would be too big. |
| 42 |
| 43 lzmash and lzmadec were written by Lasse Collin. |
| 44 |
| 45 |
| 46 Second generation |
| 47 |
| 48 The lzmash script was an ugly and not very secure hack. The last |
| 49 version of LZMA Utils to use lzmash was 4.27.1. |
| 50 |
| 51 LZMA Utils 4.32.0beta1 introduced a new lzma command line tool written |
| 52 by Ville Koskinen. It was written in C++, and used the encoder and |
| 53 decoder from the C++ LZMA SDK with few modifications. This tool replaced |
| 54 both the lzmash script and the LZMA_Alone command line tool in LZMA |
| 55 Utils. |
| 56 |
| 57 Introducing this new tool caused some temporary incompatibilities, |
| 58 because the LZMA_Alone executable was simply named lzma like the new |
| 59 command line tool, but they had completely different command line |
| 60 interfaces. The file format was still the same. |
| 61 |
| 62 Lasse wrote liblzmadec, which was a small decoder-only library based |
| 63 on the C code found in the LZMA SDK. liblzmadec had an API similar to zlib's, |
| 64 although there were some significant differences, which made it |
| 65 non-trivial to use in some applications designed for zlib and |
| 66 libbzip2. |
| 67 |
| 68 The lzmadec command line tool was converted to use liblzmadec. |
| 69 |
| 70 Alexandre Sauvé helped convert the build system to use GNU Autotools. |
| 71 This made it easier to test for certain less portable features needed |
| 72 by the new command line tool. |
| 73 |
| 74 Since the new command line tool never got completely finished (for |
| 75 example, it didn't support the LZMA_OPT environment variable), the intent |
| 76 was to not call 4.32.x stable. Similarly, liblzmadec wasn't polished, |
| 77 but appeared to work well enough, so some people started using it too. |
| 78 |
| 79 Because the development of the third generation of LZMA Utils was |
| 80 delayed considerably (3-4 years), the 4.32.x branch had to be kept |
| 81 maintained. It got some bug fixes now and then, and finally it was |
| 82 decided to call it stable, although most of the missing features were |
| 83 never added. |
| 84 |
| 85 |
| 86 File format problems |
| 87 |
| 88 The file format used by LZMA_Alone was primitive. It was designed with |
| 89 embedded systems in mind, and thus provided only a minimal set of |
| 90 features. The two biggest problems for non-embedded use were the lack of |
| 91 magic bytes and of an integrity check. |
| 92 |
| 93 Igor and Lasse started developing a new file format with some help |
| 94 from Ville Koskinen. Mark Adler, Mikko Pouru, H. Peter Anvin, and |
| 95 Lars Wirzenius also helped with some minor things at some point during |
| 96 development. Designing the new format took quite a long time (actually, |
| 97 "too long" would be a more appropriate expression). This was mostly |
| 98 because Lasse was quite slow at getting things done due to personal |
| 99 reasons. |
| 100 |
| 101 Originally the new format was supposed to use the same .lzma suffix |
| 102 that was already used by the old file format. Switching to the new |
| 103 format wouldn't have caused much trouble while the old format wasn't |
| 104 used by many people. But since the development of the new format took |
| 105 so long, the old format got quite popular, and it was decided |
| 106 that the new file format must use a different suffix. |
| 107 |
| 108 It was decided to use .xz as the suffix of the new file format. The |
| 109 first stable .xz file format specification was finally released in |
| 110 December 2008. In addition to fixing the most obvious problems of |
| 111 the old .lzma format, the .xz format added some new features like |
| 112 support for multiple filters (compression algorithms), filter chaining |
| 113 (like piping on the command line), and limited random-access reading. |
| 114 |
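The difference in magic bytes between the two formats can be observed
with a small experiment. The following is only a sketch, assuming
Python's standard lzma module (a binding to liblzma) is available; the
sample data and the constant names XZ_HEADER_MAGIC and XZ_FOOTER_MAGIC
are made up for illustration and are not part of any API.

```python
import lzma

# Arbitrary sample data, chosen only for illustration.
data = b"Tukaani " * 1000

# Old .lzma container (the LZMA_Alone format): no magic bytes and no
# integrity check are stored in the file.
legacy = lzma.compress(data, format=lzma.FORMAT_ALONE)

# .xz container: magic bytes at both ends of the stream plus an
# integrity check (CRC64 is requested explicitly here).
xz = lzma.compress(data, format=lzma.FORMAT_XZ, check=lzma.CHECK_CRC64)

# Illustrative names for the magic bytes defined by the .xz format.
XZ_HEADER_MAGIC = b"\xfd7zXZ\x00"  # first six bytes of a .xz stream
XZ_FOOTER_MAGIC = b"YZ"            # last two bytes of a .xz stream

print("xz starts with magic:    ", xz.startswith(XZ_HEADER_MAGIC))
print("xz ends with magic:      ", xz.endswith(XZ_FOOTER_MAGIC))
print("legacy starts with magic:", legacy.startswith(XZ_HEADER_MAGIC))

# Both containers still round-trip the same data.
assert lzma.decompress(legacy, format=lzma.FORMAT_ALONE) == data
assert lzma.decompress(xz, format=lzma.FORMAT_XZ) == data
```

Because a .lzma file has no fixed signature, a tool has to guess the
format from the header fields or rely on the filename suffix, whereas a
.xz file can be recognized from its first six bytes.
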
| 115 Currently the primary compression algorithm used in .xz is LZMA2. |
| 116 It is an extension on top of the original LZMA that fixes some practical |
| 117 problems: LZMA2 adds support for flushing the encoder and for uncompressed |
| 118 chunks, eases stateful decoder implementations, and improves support |
| 119 for multithreading. Since LZMA2 is better than the original LZMA, the |
| 120 original LZMA is not supported in .xz. |
| 121 |
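Filter chaining with LZMA2 as the last filter might look roughly like
the sketch below. It assumes Python's standard lzma module; the sample
data, the delta distance of 4, and the preset are arbitrary choices for
illustration, not recommendations.

```python
import lzma
import struct

# 1000 slowly increasing 32-bit little-endian integers: the kind of
# input where a delta step in front of LZMA2 may help.
samples = struct.pack("<1000I", *range(0, 4000, 4))

# A two-stage chain: the delta filter runs first, and its output is fed
# to LZMA2, much like piping one command into another.
chain = [
    {"id": lzma.FILTER_DELTA, "dist": 4},
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]

packed = lzma.compress(samples, format=lzma.FORMAT_XZ, filters=chain)

# The chain is recorded in the .xz file, so the decoder needs no
# extra options to reverse it.
restored = lzma.decompress(packed)
assert restored == samples

print(len(samples), "->", len(packed), "bytes")
```
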
| 122 |
| 123 Transition to XZ Utils |
| 124 |
| 125 The early versions of XZ Utils were called LZMA Utils. The first |
| 126 releases were 4.42.0alphas. They dropped the rest of the C++ LZMA SDK. |
| 127 The code was still directly based on LZMA SDK but ported to C and |
| 128 converted from a callback API to a stateful API. Later, Igor Pavlov made |
| 129 a C version of the LZMA encoder too; these ports from C++ to C were |
| 130 independent in LZMA SDK and LZMA Utils. |
| 131 |
| 132 The core of the new LZMA Utils was liblzma, a compression library with |
| 133 a zlib-like API. liblzma supported both the old and new file formats. The |
| 134 gzip-like lzma command line tool was rewritten to use liblzma. |
| 135 |
| 136 The new LZMA Utils code base was renamed to XZ Utils when the name |
| 137 of the new file format had been decided. The liblzma compression |
| 138 library retained its name though, because changing it would have |
| 139 caused unnecessary breakage in applications already using the early |
| 140 liblzma snapshots. |
| 141 |
| 142 The xz command line tool can emulate the gzip-like lzma tool by |
| 143 creating appropriate symlinks (e.g. lzma -> xz). Thus, practically |
| 144 all scripts using the lzma tool from LZMA Utils will work as is with |
| 145 XZ Utils (and will keep using the old .lzma format). Still, the .lzma |
| 146 format is more or less deprecated. XZ Utils will keep supporting it, |
| 147 but new applications should use the .xz format, and migrating old |
| 148 applications to .xz is often a good idea too. |
| 149 |