OLD | NEW |
(Empty) | |
| 1 |
| 2 The .lzma File Format |
| 3 ===================== |
| 4 |
| 5 0. Preface |
| 6 0.1. Notices and Acknowledgements |
| 7 0.2. Changes |
| 8 1. File Format |
| 9 1.1. Header |
| 10 1.1.1. Properties |
| 11 1.1.2. Dictionary Size |
| 12 1.1.3. Uncompressed Size |
| 13 1.2. LZMA Compressed Data |
| 14 2. References |
| 15 |
| 16 |
| 17 0. Preface |
| 18 |
| 19 This document describes the .lzma file format, which is |
| 20 sometimes also called LZMA_Alone format. It is a legacy file |
| 21 format, which is being or has been replaced by the .xz format. |
| 22 The MIME type of the .lzma format is `application/x-lzma'. |
| 23 |
| 24 The most commonly used tools for handling .lzma files are |
| 25 LZMA SDK, LZMA Utils, 7-Zip, and XZ Utils. This document |
| 26 describes some of the differences between these implementations |
| 27 and gives hints about which subset of the .lzma format is the |
| 28 most portable. |
| 29 |
| 30 |
| 31 0.1. Notices and Acknowledgements |
| 32 |
| 33 This file format was designed by Igor Pavlov for use in |
| 34 LZMA SDK. This document was written by Lasse Collin |
| 35 <lasse.collin@tukaani.org> using the documentation found |
| 36 in the LZMA SDK. |
| 37 |
| 38 This document has been put into the public domain. |
| 39 |
| 40 |
| 41 0.2. Changes |
| 42 |
| 43 Last modified: 2009-05-01 11:15+0300 |
| 44 |
| 45 |
| 46 1. File Format |
| 47 |
| 48 +-+-+-+-+-+-+-+-+-+-+-+-+-+==========================+ |
| 49 | Header | LZMA Compressed Data | |
| 50 +-+-+-+-+-+-+-+-+-+-+-+-+-+==========================+ |
| 51 |
| 52 A .lzma file consists of a 13-byte Header followed by |
| 53 the LZMA Compressed Data. |
| 54 |
| 55 Unlike the .gz, .bz2, and .xz formats, it is not possible to |
| 56 concatenate multiple .lzma files as is and expect the |
| 57 decompression tool to decode the resulting file as if it were |
| 58 a single .lzma file. |
| 59 |
| 60 For example, the command line tools from LZMA Utils and |
| 61 LZMA SDK silently ignore all the data after the first .lzma |
| 62 stream. In contrast, the command line tool from XZ Utils |
| 63 considers the .lzma file to be corrupt if there is data after |
| 64 the first .lzma stream. |
| 65 |
| 66 |
| 67 1.1. Header |
| 68 |
| 69 +------------+----+----+----+----+--+--+--+--+--+--+--+--+ |
| 70 | Properties | Dictionary Size | Uncompressed Size | |
| 71 +------------+----+----+----+----+--+--+--+--+--+--+--+--+ |
| 72 |
| 73 |
| 74 1.1.1. Properties |
| 75 |
| 76 The Properties field contains three properties. An abbreviation |
| 77 is given in parentheses, followed by the value range of the |
| 78 property. The field consists of |
| 79 |
| 80 1) the number of literal context bits (lc, [0, 8]); |
| 81 2) the number of literal position bits (lp, [0, 4]); and |
| 82 3) the number of position bits (pb, [0, 4]). |
| 83 |
| 84 The properties are encoded using the following formula: |
| 85 |
| 86 Properties = (pb * 5 + lp) * 9 + lc |
| 87 |
| 88 The following C code illustrates a straightforward way to |
| 89 decode the Properties field: |
| 90 |
| 91 uint8_t lc, lp, pb; |
| 92 uint8_t prop = get_lzma_properties(); |
| 93 if (prop > (4 * 5 + 4) * 9 + 8) |
| 94 return LZMA_PROPERTIES_ERROR; |
| 95 |
| 96 pb = prop / (9 * 5); |
| 97 prop -= pb * 9 * 5; |
| 98 lp = prop / 9; |
| 99 lc = prop - lp * 9; |
| 100 |
| 101 XZ Utils has an additional requirement: lc + lp <= 4. Files |
| 102 which don't follow this requirement cannot be decompressed |
| 103 with XZ Utils. Usually this isn't a problem since the most |
| 104 common lc/lp/pb values are 3/0/2. It is the only lc/lp/pb |
| 105 combination that the files created by LZMA Utils can have, |
| 106 but LZMA Utils can decompress files with any lc/lp/pb. |
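The encoding direction of the formula in 1.1.1 can be sketched in the same style as the decoder above. The helper names `props_encode` and `props_xz_compatible` are hypothetical and do not come from any of the implementations mentioned in this document; the range checks follow the [0, 8], [0, 4], [0, 4] ranges listed above, and the second helper only restates the lc + lp <= 4 requirement.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: pack lc/lp/pb into the one-byte Properties field.
 * Returns false if a value is outside the ranges given in section 1.1.1. */
static bool props_encode(uint8_t lc, uint8_t lp, uint8_t pb, uint8_t *out)
{
    if (lc > 8 || lp > 4 || pb > 4)
        return false;

    /* Properties = (pb * 5 + lp) * 9 + lc; the maximum is 224. */
    *out = (uint8_t)((pb * 5 + lp) * 9 + lc);
    return true;
}

/* Hypothetical helper: true if the lc/lp pair satisfies the additional
 * requirement lc + lp <= 4 that the text attributes to XZ Utils. */
static bool props_xz_compatible(uint8_t lc, uint8_t lp)
{
    return lc + lp <= 4;
}
```

For the common lc/lp/pb values 3/0/2, the formula gives (2 * 5 + 0) * 9 + 3 = 93, so the Properties byte is 0x5D.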
| 107 |
| 108 |
| 109 1.1.2. Dictionary Size |
| 110 |
| 111 Dictionary Size is stored as an unsigned 32-bit little endian |
| 112 integer. Any 32-bit value is possible, but for maximum |
| 113 portability, only sizes of 2^n and 2^n + 2^(n-1) should be |
| 114 used. |
| 115 |
| 116 LZMA Utils creates only files with dictionary size 2^n, |
| 117 16 <= n <= 25. LZMA Utils can decompress files with any |
| 118 dictionary size. |
| 119 |
| 120 XZ Utils creates and decompresses .lzma files only with |
| 121 dictionary sizes 2^n and 2^n + 2^(n-1). If some other |
| 122 dictionary size is specified when compressing, the value |
| 123 stored in the Dictionary Size field is rounded up, but the |
| 124 specified value is still used in the actual compression code. |
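A minimal sketch of the portability rule in 1.1.2: one function tests whether a value has the form 2^n or 2^n + 2^(n-1), and another rounds a value up to the next such size. Both function names are hypothetical. The text does not say how XZ Utils rounds the stored value, so rounding to the next value of this form is an assumption, as is accepting every n that fits in 32 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if d is 2^n or 2^n + 2^(n-1). */
static bool dict_size_is_portable(uint32_t d)
{
    if (d == 0)
        return false;

    /* 2^n has exactly one bit set. */
    if ((d & (d - 1)) == 0)
        return true;

    /* 2^n + 2^(n-1) equals three times its lowest set bit.
     * The lowest bit is at most 2^30 here, so low * 3 cannot overflow. */
    uint32_t low = d & (0u - d);
    return d == low * 3;
}

/* Hypothetical helper: smallest portable size >= d (assumed rounding rule).
 * Returns false if no such size fits in 32 bits. */
static bool dict_size_round_up(uint32_t d, uint32_t *out)
{
    for (unsigned n = 0; n < 32; ++n) {
        uint32_t pow2 = UINT32_C(1) << n;
        if (pow2 >= d) {
            *out = pow2;
            return true;
        }
        if (n >= 1) {
            /* 2^n + 2^(n-1); for n = 31 this is 3 * 2^30, which still fits. */
            uint32_t mid = pow2 + (pow2 >> 1);
            if (mid >= d) {
                *out = mid;
                return true;
            }
        }
    }
    return false;
}
```

With these definitions, a requested size of 100000 bytes is not portable as is and rounds up to 131072 (2^17).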
| 125 |
| 126 |
| 127 1.1.3. Uncompressed Size |
| 128 |
| 129 Uncompressed Size is stored as an unsigned 64-bit little endian |
| 130 integer. A special value of 0xFFFF_FFFF_FFFF_FFFF indicates |
| 131 that Uncompressed Size is unknown. End of Payload Marker (*) |
| 132 is used if and only if Uncompressed Size is unknown. |
| 133 |
| 134 XZ Utils rejects files whose Uncompressed Size field specifies |
| 135 a known size that is 256 GiB or more. This is to reject false |
| 136 positives when trying to guess if the input file is in the |
| 137 .lzma format. When Uncompressed Size is unknown, there is no |
| 138 limit for the uncompressed size of the file. |
| 139 |
| 140 (*) Some tools use the term End of Stream (EOS) marker |
| 141 instead of End of Payload Marker. |
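The header layout from sections 1.1.1 to 1.1.3 can be read with a few lines of C. This is a sketch under stated assumptions, not code taken from any of the implementations named in this document: the struct, the constants, and the function names are hypothetical. The little-endian reads are assembled byte by byte so that the code does not depend on host endianness, and the last helper restates the 256 GiB (2^38 bytes) limit that the text attributes to XZ Utils.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ALONE_HEADER_SIZE 13            /* 1 + 4 + 8 bytes */
#define ALONE_SIZE_UNKNOWN UINT64_MAX   /* 0xFFFF_FFFF_FFFF_FFFF */

/* Hypothetical container for the three header fields. */
struct alone_header {
    uint8_t  properties;
    uint32_t dict_size;
    uint64_t uncompressed_size;
};

/* Hypothetical helper: parse the 13-byte header from buf.
 * Returns false if fewer than 13 bytes are available. */
static bool alone_header_parse(const uint8_t *buf, size_t len,
                               struct alone_header *h)
{
    if (len < ALONE_HEADER_SIZE)
        return false;

    h->properties = buf[0];

    /* Dictionary Size: unsigned 32-bit little endian at offset 1. */
    h->dict_size = 0;
    for (int i = 0; i < 4; ++i)
        h->dict_size |= (uint32_t)buf[1 + i] << (8 * i);

    /* Uncompressed Size: unsigned 64-bit little endian at offset 5. */
    h->uncompressed_size = 0;
    for (int i = 0; i < 8; ++i)
        h->uncompressed_size |= (uint64_t)buf[5 + i] << (8 * i);

    return true;
}

/* True if the size is known, which means no End of Payload Marker
 * is expected in the compressed data. */
static bool alone_size_is_known(const struct alone_header *h)
{
    return h->uncompressed_size != ALONE_SIZE_UNKNOWN;
}

/* Restates the limit described above: a known size of 256 GiB or more
 * is rejected, an unknown size is always accepted. */
static bool alone_size_within_xz_limit(const struct alone_header *h)
{
    return !alone_size_is_known(h)
        || h->uncompressed_size < (UINT64_C(1) << 38);
}
```

A header starting with the bytes 5D 00 00 80 00 followed by eight FF bytes would parse as Properties 0x5D, an 8 MiB dictionary, and an unknown Uncompressed Size.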
| 142 |
| 143 |
| 144 1.2. LZMA Compressed Data |
| 145 |
| 146 A detailed description of the format of this field is outside |
| 147 the scope of this document. |
| 148 |
| 149 |
| 150 2. References |
| 151 |
| 152 LZMA SDK - The original LZMA implementation |
| 153 http://7-zip.org/sdk.html |
| 154 |
| 155 7-Zip |
| 156 http://7-zip.org/ |
| 157 |
| 158 LZMA Utils - LZMA adapted to POSIX-like systems |
| 159 http://tukaani.org/lzma/ |
| 160 |
| 161 XZ Utils - The next generation of LZMA Utils |
| 162 http://tukaani.org/xz/ |
| 163 |
| 164 The .xz file format - The successor of the .lzma format |
| 165 http://tukaani.org/xz/xz-file-format.txt |
| 166 |