Index: src/xz/xz.1 |
=================================================================== |
--- src/xz/xz.1 (revision 50504) |
+++ src/xz/xz.1 (working copy) |
@@ -5,9 +5,11 @@ |
.\" This file has been put into the public domain. |
.\" You can do whatever you want with this file. |
.\" |
-.TH XZ 1 "2010-06-15" "Tukaani" "XZ Utils" |
+.TH XZ 1 "2010-10-04" "Tukaani" "XZ Utils" |
+. |
.SH NAME |
xz, unxz, xzcat, lzma, unlzma, lzcat \- Compress or decompress .xz and .lzma files |
+. |
.SH SYNOPSIS |
.B xz |
.RI [ option ]... |
@@ -33,8 +35,8 @@ |
is equivalent to |
.BR "xz \-\-format=lzma \-\-decompress \-\-stdout" . |
.PP |
-When writing scripts that need to decompress files, it is recommended to |
-always use the name |
+When writing scripts that need to decompress files, |
+it is recommended to always use the name |
.B xz |
with appropriate arguments |
.RB ( "xz \-d" |
@@ -43,19 +45,22 @@ |
instead of the names |
.B unxz |
and |
-.BR xzcat. |
+.BR xzcat . |
+. |
.SH DESCRIPTION |
.B xz |
-is a general-purpose data compression tool with command line syntax similar to |
+is a general-purpose data compression tool with |
+command line syntax similar to |
.BR gzip (1) |
and |
.BR bzip2 (1). |
The native file format is the |
.B .xz |
-format, but also the legacy |
+format, but the legacy |
.B .lzma |
-format and raw compressed streams with no container format headers |
-are supported. |
+format used by LZMA Utils and |
+raw compressed streams with no container format headers |
+are also supported. |
.PP |
.B xz |
compresses or decompresses each |
@@ -68,13 +73,16 @@ |
is |
.BR \- , |
.B xz |
-reads from standard input and writes the processed data to standard output. |
+reads from standard input and writes the processed data |
+to standard output. |
.B xz |
will refuse (display an error and skip the |
.IR file ) |
-to write compressed data to standard output if it is a terminal. Similarly, |
+to write compressed data to standard output if it is a terminal. |
+Similarly, |
.B xz |
-will refuse to read compressed data from standard input if it is a terminal. |
+will refuse to read compressed data |
+from standard input if it is a terminal. |
.PP |
Unless |
.B \-\-stdout |
@@ -117,8 +125,9 @@ |
if any of the following applies: |
.IP \(bu 3 |
.I File |
-is not a regular file. Symbolic links are not followed, thus they |
-are not considered to be regular files. |
+is not a regular file. |
+Symbolic links are not followed, |
+and thus they are not considered to be regular files. |
.IP \(bu 3 |
.I File |
has more than one hard link. |
@@ -126,7 +135,7 @@ |
.I File |
has setuid, setgid, or sticky bit set. |
.IP \(bu 3 |
-The operation mode is set to compress, and the |
+The operation mode is set to compress and the |
.I file |
already has a suffix of the target file format |
.RB ( .xz |
@@ -142,7 +151,7 @@ |
.B .lzma |
format). |
.IP \(bu 3 |
-The operation mode is set to decompress, and the |
+The operation mode is set to decompress and the |
.I file |
doesn't have a suffix of any of the supported file formats |
.RB ( .xz , |
@@ -154,12 +163,13 @@ |
After successfully compressing or decompressing the |
.IR file , |
.B xz |
-copies the owner, group, permissions, access time, and modification time |
-from the source |
+copies the owner, group, permissions, access time, |
+and modification time from the source |
.I file |
-to the target file. If copying the group fails, the permissions are modified |
-so that the target file doesn't become accessible to users who didn't have |
-permission to access the source |
+to the target file. |
+If copying the group fails, the permissions are modified |
+so that the target file doesn't become accessible to users |
+who didn't have permission to access the source |
.IR file . |
.B xz |
doesn't support copying other metadata like access control lists |
@@ -169,7 +179,8 @@ |
.I file |
is removed unless |
.B \-\-keep |
-was specified. The source |
+was specified. |
+The source |
.I file |
is never removed if the output is written to standard output. |
.PP |
@@ -180,61 +191,78 @@ |
to the |
.B xz |
process makes it print progress information to standard error. |
-This has only limited use since when standard error is a terminal, using |
+This has only limited use since when standard error |
+is a terminal, using |
.B \-\-verbose |
will display an automatically updating progress indicator. |
+. |
.SS "Memory usage" |
The memory usage of |
.B xz |
-varies from a few hundred kilobytes to several gigabytes depending on |
-the compression settings. The settings used when compressing a file |
-affect also the memory usage of the decompressor. Typically the decompressor |
-needs only 5\ % to 20\ % of the amount of RAM that the compressor needed when |
-creating the file. Still, the worst-case memory usage of the decompressor |
-is several gigabytes. |
+varies from a few hundred kilobytes to several gigabytes |
+depending on the compression settings. |
+The settings used when compressing a file determine |
+the memory requirements of the decompressor. |
+Typically the decompressor needs 5\ % to 20\ % of |
+the amount of memory that the compressor needed when |
+creating the file. |
+For example, decompressing a file created with |
+.B xz \-9 |
+currently requires 65\ MiB of memory. |
+Still, it is possible to have |
+.B .xz |
+files that require several gigabytes of memory to decompress. |
.PP |
-To prevent uncomfortable surprises caused by huge memory usage, |
+Users of older systems in particular may find |
+the possibility of very large memory usage annoying. |
+To prevent uncomfortable surprises, |
.B xz |
-has a built-in memory usage limiter. While some operating systems provide |
-ways to limit the memory usage of processes, relying on it wasn't deemed |
-to be flexible enough. The default limit depends on the total amount of |
-physical RAM: |
-.IP \(bu 3 |
-If 40\ % of RAM is at least 80 MiB, 40\ % of RAM is used as the limit. |
-.IP \(bu 3 |
-If 80\ % of RAM is less than 80 MiB, 80\ % of RAM is used as the limit. |
-.IP \(bu 3 |
-Otherwise 80 MiB is used as the limit. |
+has a built-in memory usage limiter, which is disabled by default. |
+While some operating systems provide ways to limit |
+the memory usage of processes, relying on them |
+wasn't deemed to be flexible enough (e.g. using |
+.BR ulimit (1) |
+to limit virtual memory tends to cripple |
+.BR mmap (2)). |
.PP |
-When compressing, if the selected compression settings exceed the memory |
-usage limit, the settings are automatically adjusted downwards and a notice |
-about this is displayed. As an exception, if the memory usage limit is |
-exceeded when compressing with |
-.B \-\-format=raw |
-or |
-.BR \-\-no\-adjust , |
-an error is displayed and |
+The memory usage limiter can be enabled with |
+the command line option \fB\-\-memlimit=\fIlimit\fR. |
+Often it is more convenient to enable the limiter |
+by default by setting the environment variable |
+.BR XZ_DEFAULTS , |
+e.g.\& |
+.BR XZ_DEFAULTS=\-\-memlimit=150MiB . |
+It is possible to set the limits separately |
+for compression and decompression |
+by using \fB\-\-memlimit\-compress=\fIlimit\fR and |
+\fB\-\-memlimit\-decompress=\fIlimit\fR. |
+Using these two options outside |
+.B XZ_DEFAULTS |
+is rarely useful because a single run of |
.B xz |
-will exit with exit status |
-.BR 1 . |
+cannot do both compression and decompression and |
+.BI \-\-memlimit= limit |
+(or \fB\-M\fR \fIlimit\fR) |
+is shorter to type on the command line. |
.PP |
-If source |
-.I file |
-cannot be decompressed without exceeding the memory usage limit, an error |
-message is displayed and the file is skipped. Note that compressed files |
-may contain many blocks, which may have been compressed with different |
-settings. Typically all blocks will have roughly the same memory requirements, |
-but it is possible that a block later in the file will exceed the memory usage |
-limit, and an error about too low memory usage limit gets displayed after some |
-data has already been decompressed. |
-.PP |
-The absolute value of the active memory usage limit can be seen with |
-.B \-\-info-memory |
-or near the bottom of the output of |
-.BR \-\-long\-help . |
-The default limit can be overridden with |
-\fB\-\-memory=\fIlimit\fR. |
-.SS Concatenation and padding with .xz files |
+If the specified memory usage limit is exceeded when decompressing, |
+.B xz |
+will display an error and decompressing the file will fail. |
+If the limit is exceeded when compressing, |
+.B xz |
+will try to scale the settings down so that the limit |
+is no longer exceeded (except when using \fB\-\-format=raw\fR |
+or \fB\-\-no\-adjust\fR). |
+This way the operation won't fail unless the limit is very small. |
+The scaling of the settings is done in steps that don't |
+match the compression level presets, e.g. if the limit is |
+only slightly less than the amount required for |
+.BR "xz \-9" , |
+the settings will be scaled down only a little, |
+not all the way down to |
+.BR "xz \-8" . |
+. |
+.SS "Concatenation and padding with .xz files" |
It is possible to concatenate |
.B .xz |
files as is. |
@@ -243,23 +271,28 @@ |
.B .xz |
file. |
.PP |
-It is possible to insert padding between the concenated parts |
-or after the last part. The padding must be null bytes and the size |
-of the padding must be a multiple of four bytes. This can be useful |
-if the .xz file is stored on a medium that stores file sizes |
-e.g. as 512-byte blocks. |
+It is possible to insert padding between the concatenated parts |
+or after the last part. |
+The padding must consist of null bytes and the size |
+of the padding must be a multiple of four bytes. |
+This can be useful e.g. if the |
+.B .xz |
+file is stored on a medium that measures file sizes |
+in 512-byte blocks. |
.PP |
Concatenation and padding are not allowed with |
.B .lzma |
files or raw streams. |
+. |
.SH OPTIONS |
+. |
.SS "Integer suffixes and special values" |
-In most places where an integer argument is expected, an optional suffix |
-is supported to easily indicate large integers. There must be no space |
-between the integer and the suffix. |
+In most places where an integer argument is expected, |
+an optional suffix is supported to easily indicate large integers. |
+There must be no space between the integer and the suffix. |
.TP |
.B KiB |
-The integer is multiplied by 1,024 (2^10). Also |
+Multiply the integer by 1,024 (2^10). |
.BR Ki , |
.BR k , |
.BR kB , |
@@ -270,7 +303,7 @@ |
.BR KiB . |
.TP |
.B MiB |
-The integer is multiplied by 1,048,576 (2^20). Also |
+Multiply the integer by 1,048,576 (2^20). |
.BR Mi , |
.BR m , |
.BR M , |
@@ -280,7 +313,7 @@ |
.BR MiB . |
.TP |
.B GiB |
-The integer is multiplied by 1,073,741,824 (2^30). Also |
+Multiply the integer by 1,073,741,824 (2^30). |
.BR Gi , |
.BR g , |
.BR G , |
@@ -289,16 +322,20 @@ |
are accepted as synonyms for |
.BR GiB . |
.PP |
-A special value |
+The special value |
.B max |
-can be used to indicate the maximum integer value supported by the option. |
+can be used to indicate the maximum integer value |
+supported by the option. |
+. |
.SS "Operation mode" |
-If multiple operation mode options are given, the last one takes effect. |
+If multiple operation mode options are given, |
+the last one takes effect. |
.TP |
.BR \-z ", " \-\-compress |
-Compress. This is the default operation mode when no operation mode option |
-is specified, and no other operation mode is implied from the command name |
-(for example, |
+Compress. |
+This is the default operation mode when no operation mode option |
+is specified and no other operation mode is implied from |
+the command name (for example, |
.B unxz |
implies |
.BR \-\-decompress ). |
@@ -309,62 +346,73 @@ |
.BR \-t ", " \-\-test |
Test the integrity of compressed |
.IR files . |
-No files are created or removed. This option is equivalent to |
+This option is equivalent to |
.B "\-\-decompress \-\-stdout" |
except that the decompressed data is discarded instead of being |
written to standard output. |
+No files are created or removed. |
.TP |
.BR \-l ", " \-\-list |
-List information about compressed |
+Print information about compressed |
.IR files . |
-No uncompressed output is produced, and no files are created or removed. |
-In list mode, the program cannot read the compressed data from standard |
+No uncompressed output is produced, |
+and no files are created or removed. |
+In list mode, the program cannot read |
+the compressed data from standard |
input or from other unseekable sources. |
-.IP |
+.IP "" |
The default listing shows basic information about |
.IR files , |
-one file per line. To get more detailed information, use also the |
+one file per line. |
+To get more detailed information, also use the |
.B \-\-verbose |
-option. For even more information, use |
+option. |
+For even more information, use |
.B \-\-verbose |
-twice, but note that it may be slow, because getting all the extra |
-information requires many seeks. The width of verbose output exceeds |
-80 characters, so piping the output to e.g. |
+twice, but note that this may be slow, because getting all the extra |
+information requires many seeks. |
+The width of verbose output exceeds |
+80 characters, so piping the output to e.g.\& |
.B "less\ \-S" |
may be convenient if the terminal isn't wide enough. |
-.IP |
+.IP "" |
The exact output may vary between |
.B xz |
-versions and different locales. To get machine-readable output, |
+versions and different locales. |
+For machine-readable output, |
.B \-\-robot \-\-list |
should be used. |
+. |
.SS "Operation modifiers" |
.TP |
.BR \-k ", " \-\-keep |
-Keep (don't delete) the input files. |
+Don't delete the input files. |
.TP |
.BR \-f ", " \-\-force |
This option has several effects: |
.RS |
.IP \(bu 3 |
-If the target file already exists, delete it before compressing or |
-decompressing. |
+If the target file already exists, |
+delete it before compressing or decompressing. |
.IP \(bu 3 |
-Compress or decompress even if the input is a symbolic link to a regular file, |
-has more than one hard link, or has setuid, setgid, or sticky bit set. |
-The setuid, setgid, and sticky bits are not copied to the target file. |
+Compress or decompress even if the input is |
+a symbolic link to a regular file, |
+has more than one hard link, |
+or has the setuid, setgid, or sticky bit set. |
+The setuid, setgid, and sticky bits are not copied |
+to the target file. |
.IP \(bu 3 |
-If combined with |
+When used with |
.B \-\-decompress |
.BR \-\-stdout |
and |
.B xz |
-doesn't recognize the type of the source file, |
-.B xz |
-will copy the source file as is to standard output. This allows using |
+cannot recognize the type of the source file, |
+copy the source file as is to standard output. |
+This allows |
.B xzcat |
-.B \--force |
-like |
+.B \-\-force |
+to be used like |
.BR cat (1) |
for files that have not been compressed with |
.BR xz . |
@@ -380,21 +428,23 @@ |
to decompress only a single file format. |
.RE |
.TP |
-.BR \-c ", " \-\-stdout ", " \-\-to-stdout |
-Write the compressed or decompressed data to standard output instead of |
-a file. This implies |
+.BR \-c ", " \-\-stdout ", " \-\-to\-stdout |
+Write the compressed or decompressed data to |
+standard output instead of a file. |
+This implies |
.BR \-\-keep . |
.TP |
.B \-\-no\-sparse |
-Disable creation of sparse files. By default, if decompressing into |
-a regular file, |
+Disable creation of sparse files. |
+By default, if decompressing into a regular file, |
.B xz |
-tries to make the file sparse if the decompressed data contains long |
-sequences of binary zeros. It works also when writing to standard output |
-as long as standard output is connected to a regular file, and certain |
-additional conditions are met to make it safe. Creating sparse files may |
-save disk space and speed up the decompression by reducing the amount of |
-disk I/O. |
+tries to make the file sparse if the decompressed data contains |
+long sequences of binary zeros. |
+It also works when writing to standard output |
+as long as standard output is connected to a regular file |
+and certain additional conditions are met to make it safe. |
+Creating sparse files may save disk space and speed up |
+the decompression by reducing the amount of disk I/O. |
.TP |
\fB\-S\fR \fI.suf\fR, \fB\-\-suffix=\fI.suf |
When compressing, use |
@@ -403,11 +453,12 @@ |
.B .xz |
or |
.BR .lzma . |
-If not writing to standard output and the source file already has the suffix |
+If not writing to standard output and |
+the source file already has the suffix |
.IR .suf , |
a warning is displayed and the file is skipped. |
-.IP |
-When decompressing, recognize also files with the suffix |
+.IP "" |
+When decompressing, recognize files with the suffix |
.I .suf |
in addition to files with the |
.BR .xz , |
@@ -415,13 +466,15 @@ |
.BR .lzma , |
or |
.B .tlz |
-suffix. If the source file has the suffix |
+suffix. |
+If the source file has the suffix |
.IR .suf , |
the suffix is removed to get the target filename. |
-.IP |
+.IP "" |
When compressing or decompressing raw streams |
.RB ( \-\-format=raw ), |
-the suffix must always be specified unless writing to standard output, |
+the suffix must always be specified unless |
+writing to standard output, |
because there is no default suffix for raw streams. |
.TP |
\fB\-\-files\fR[\fB=\fIfile\fR] |
@@ -429,8 +482,9 @@ |
.IR file ; |
if |
.I file |
-is omitted, filenames are read from standard input. Filenames must be |
-terminated with the newline character. A dash |
+is omitted, filenames are read from standard input. |
+Filenames must be terminated with the newline character. |
+A dash |
.RB ( \- ) |
is taken as a regular filename; it doesn't mean standard input. |
If filenames are given also as command line arguments, they are |
@@ -438,296 +492,469 @@ |
.IR file . |
.TP |
\fB\-\-files0\fR[\fB=\fIfile\fR] |
-This is identical to \fB\-\-files\fR[\fB=\fIfile\fR] except that the |
-filenames must be terminated with the null character. |
+This is identical to \fB\-\-files\fR[\fB=\fIfile\fR] except |
+that each filename must be terminated with the null character. |
+. |
.SS "Basic file format and compression options" |
.TP |
\fB\-F\fR \fIformat\fR, \fB\-\-format=\fIformat |
-Specify the file format to compress or decompress: |
+Specify the file |
+.I format |
+to compress or decompress: |
.RS |
-.IP \(bu 3 |
-.BR auto : |
-This is the default. When compressing, |
+.TP |
.B auto |
+This is the default. |
+When compressing, |
+.B auto |
is equivalent to |
.BR xz . |
-When decompressing, the format of the input file is automatically detected. |
+When decompressing, |
+the format of the input file is automatically detected. |
Note that raw streams (created with |
.BR \-\-format=raw ) |
cannot be auto-detected. |
-.IP \(bu 3 |
-.BR xz : |
+.TP |
+.B xz |
Compress to the |
.B .xz |
file format, or accept only |
.B .xz |
files when decompressing. |
-.IP \(bu 3 |
-.B lzma |
-or |
-.BR alone : |
+.TP |
+.BR lzma ", " alone |
Compress to the legacy |
.B .lzma |
file format, or accept only |
.B .lzma |
-files when decompressing. The alternative name |
+files when decompressing. |
+The alternative name |
.B alone |
is provided for backwards compatibility with LZMA Utils. |
-.IP \(bu 3 |
-.BR raw : |
-Compress or uncompress a raw stream (no headers). This is meant for advanced |
-users only. To decode raw streams, you need to set not only |
+.TP |
+.B raw |
+Compress or uncompress a raw stream (no headers). |
+This is meant for advanced users only. |
+To decode raw streams, you need to use |
.B \-\-format=raw |
-but also specify the filter chain, which would normally be stored in the |
-container format headers. |
+and explicitly specify the filter chain, |
+which normally would have been stored in the container headers. |
.RE |
.TP |
\fB\-C\fR \fIcheck\fR, \fB\-\-check=\fIcheck |
-Specify the type of the integrity check, which is calculated from the |
-uncompressed data. This option has an effect only when compressing into the |
+Specify the type of the integrity check. |
+The check is calculated from the uncompressed data and |
+stored in the |
.B .xz |
+file. |
+This option has an effect only when compressing into the |
+.B .xz |
format; the |
.B .lzma |
format doesn't support integrity checks. |
The integrity check (if any) is verified when the |
.B .xz |
file is decompressed. |
-.IP |
+.IP "" |
Supported |
.I check |
types: |
.RS |
-.IP \(bu 3 |
-.BR none : |
-Don't calculate an integrity check at all. This is usually a bad idea. This |
-can be useful when integrity of the data is verified by other means anyway. |
-.IP \(bu 3 |
-.BR crc32 : |
+.TP |
+.B none |
+Don't calculate an integrity check at all. |
+This is usually a bad idea. |
+This can be useful when integrity of the data is verified |
+by other means anyway. |
+.TP |
+.B crc32 |
Calculate CRC32 using the polynomial from IEEE-802.3 (Ethernet). |
-.IP \(bu 3 |
-.BR crc64 : |
-Calculate CRC64 using the polynomial from ECMA-182. This is the default, since |
-it is slightly better than CRC32 at detecting damaged files and the speed |
-difference is negligible. |
-.IP \(bu 3 |
-.BR sha256 : |
-Calculate SHA-256. This is somewhat slower than CRC32 and CRC64. |
+.TP |
+.B crc64 |
+Calculate CRC64 using the polynomial from ECMA-182. |
+This is the default, since it is slightly better than CRC32 |
+at detecting damaged files and the speed difference is negligible. |
+.TP |
+.B sha256 |
+Calculate SHA-256. |
+This is somewhat slower than CRC32 and CRC64. |
.RE |
-.IP |
+.IP "" |
Integrity of the |
.B .xz |
-headers is always verified with CRC32. It is not possible to change or |
-disable it. |
+headers is always verified with CRC32. |
+It is not possible to change or disable it. |
.TP |
.BR \-0 " ... " \-9 |
-Select compression preset. If a preset level is specified multiple times, |
+Select a compression preset level. |
+The default is |
+.BR \-6 . |
+If multiple preset levels are specified, |
the last one takes effect. |
-.IP |
-The compression preset levels can be categorised roughly into three |
-categories: |
-.RS |
-.IP "\fB\-0\fR ... \fB\-2" |
-Fast presets with relatively low memory usage. |
-.B \-1 |
+If a custom filter chain was already specified, setting |
+a compression preset level clears the custom filter chain. |
+.IP "" |
+The differences between the presets are more significant than with |
+.BR gzip (1) |
and |
-.B \-2 |
-should give compression speed and ratios comparable to |
-.B "bzip2 \-1" |
+.BR bzip2 (1). |
+The selected compression settings determine |
+the memory requirements of the decompressor, |
+thus using too high a preset level might make it painful |
+to decompress the file on an old system with little RAM. |
+Specifically, |
+.B "it's not a good idea to blindly use \-9 for everything" |
+like it often is with |
+.BR gzip (1) |
and |
-.BR "bzip2 \-9" , |
-respectively. |
-Currently |
+.BR bzip2 (1). |
+.RS |
+.TP |
+.BR "\-0" " ... " "\-3" |
+These are somewhat fast presets. |
.B \-0 |
-is not very good (not much faster than |
-.B \-1 |
-but much worse compression). In future, |
-.B \-0 |
-may be indicate some fast algorithm instead of LZMA2. |
-.IP "\fB\-3\fR ... \fB\-5" |
-Good compression ratio with low to medium memory usage. |
-These are significantly slower than levels 0\-2. |
-.IP "\fB\-6\fR ... \fB\-9" |
-Excellent compression with medium to high memory usage. These are also |
-slower than the lower preset levels. The default is |
-.BR \-6 . |
-Unless you want to maximize the compression ratio, you probably don't want |
-a higher preset level than |
-.B \-7 |
-due to speed and memory usage. |
+is sometimes faster than |
+.B "gzip \-9" |
+while compressing much better. |
+The higher ones often have speed comparable to |
+.BR bzip2 (1) |
+with comparable or better compression ratio, |
+although the results |
+depend a lot on the type of data being compressed. |
+.TP |
+.BR "\-4" " ... " "\-6" |
+Good to very good compression while keeping |
+decompressor memory usage reasonable even for old systems. |
+.B \-6 |
+is the default, which is usually a good choice |
+e.g. for distributing files that need to be decompressible |
+even on systems with only 16\ MiB RAM. |
+.RB ( \-5e |
+or |
+.B \-6e |
+may be worth considering too. |
+See |
+.BR \-\-extreme .) |
+.TP |
+.BR "\-7" " ... " "\-9" |
+These are like |
+.B \-6 |
+but with higher compressor and decompressor memory requirements. |
+These are useful only when compressing files bigger than |
+8\ MiB, 16\ MiB, and 32\ MiB, respectively. |
.RE |
-.IP |
-The exact compression settings (filter chain) used by each preset may |
-vary between |
-.B xz |
-versions. The settings may also vary between files being compressed, if |
-.B xz |
-determines that modified settings will probably give better compression |
-ratio without significantly affecting compression time or memory usage. |
-.IP |
-Because the settings may vary, the memory usage may vary too. The following |
-table lists the maximum memory usage of each preset level, which won't be |
-exceeded even in future versions of |
-.BR xz . |
-.IP |
-.B "FIXME: The table below is just a rough idea." |
+.IP "" |
+On the same hardware, the decompression speed is approximately |
+a constant number of bytes of compressed data per second. |
+In other words, the better the compression, |
+the faster the decompression will usually be. |
+This also means that the amount of uncompressed output |
+produced per second can vary a lot. |
+.IP "" |
+The following table summarises the features of the presets: |
.RS |
.RS |
+.PP |
.TS |
tab(;); |
-c c c |
-n n n. |
-Preset;Compression;Decompression |
-\-0;6 MiB;1 MiB |
-\-1;6 MiB;1 MiB |
-\-2;10 MiB;1 MiB |
-\-3;20 MiB;2 MiB |
-\-4;30 MiB;3 MiB |
-\-5;60 MiB;6 MiB |
-\-6;100 MiB;10 MiB |
-\-7;200 MiB;20 MiB |
-\-8;400 MiB;40 MiB |
-\-9;800 MiB;80 MiB |
+c c c c c |
+n n n n n. |
+Preset;DictSize;CompCPU;CompMem;DecMem |
+\-0;256 KiB;0;3 MiB;1 MiB |
+\-1;1 MiB;1;9 MiB;2 MiB |
+\-2;2 MiB;2;17 MiB;3 MiB |
+\-3;4 MiB;3;32 MiB;5 MiB |
+\-4;4 MiB;4;48 MiB;5 MiB |
+\-5;8 MiB;5;94 MiB;9 MiB |
+\-6;8 MiB;6;94 MiB;9 MiB |
+\-7;16 MiB;6;186 MiB;17 MiB |
+\-8;32 MiB;6;370 MiB;33 MiB |
+\-9;64 MiB;6;674 MiB;65 MiB |
.TE |
.RE |
.RE |
-.IP |
-When compressing, |
+.IP "" |
+Column descriptions: |
+.RS |
+.IP \(bu 3 |
+DictSize is the LZMA2 dictionary size. |
+It is a waste of memory to use a dictionary bigger than |
+the size of the uncompressed file. |
+This is why it is good to avoid using the presets |
+.BR \-7 " ... " \-9 |
+when there's no real need for them. |
+At |
+.B \-6 |
+and lower, the amount of memory wasted is |
+usually low enough to not matter. |
+.IP \(bu 3 |
+CompCPU is a simplified representation of the LZMA2 settings |
+that affect compression speed. |
+The dictionary size affects speed too, |
+so while CompCPU is the same for levels |
+.BR \-6 " ... " \-9 , |
+higher levels still tend to be a little slower. |
+To get even slower and thus possibly better compression, see |
+.BR \-\-extreme . |
+.IP \(bu 3 |
+CompMem contains the compressor memory requirements |
+in the single-threaded mode. |
+It may vary slightly between |
.B xz |
-automatically adjusts the compression settings downwards if |
-the memory usage limit would be exceeded, so it is safe to specify |
-a high preset level even on systems that don't have lots of RAM. |
+versions. |
+Memory requirements of some of the future multithreaded modes may |
+be dramatically higher than those of the single-threaded mode. |
+.IP \(bu 3 |
+DecMem contains the decompressor memory requirements. |
+That is, the compression settings determine |
+the memory requirements of the decompressor. |
+The exact decompressor memory usage is slightly more than |
+the LZMA2 dictionary size, but the values in the table |
+have been rounded up to the next full MiB. |
+.RE |
.TP |
-.BR \-\-fast " and " \-\-best |
+.BR \-e ", " \-\-extreme |
+Use a slower variant of the selected compression preset level |
+.RB ( \-0 " ... " \-9 ) |
+to hopefully get a little bit better compression ratio, |
+but with bad luck this can also make it worse. |
+Decompressor memory usage is not affected, |
+but compressor memory usage increases a little at preset levels |
+.BR \-0 " ... " \-3 . |
+.IP "" |
+Since two presets share each of the dictionary sizes |
+4\ MiB and 8\ MiB, the presets |
+.B \-3e |
+and |
+.B \-5e |
+use slightly faster settings (lower CompCPU) than |
+.B \-4e |
+and |
+.BR \-6e , |
+respectively. |
+That way no two presets are identical. |
+.RS |
+.RS |
+.PP |
+.TS |
+tab(;); |
+c c c c c |
+n n n n n. |
+Preset;DictSize;CompCPU;CompMem;DecMem |
+\-0e;256 KiB;8;4 MiB;1 MiB |
+\-1e;1 MiB;8;13 MiB;2 MiB |
+\-2e;2 MiB;8;25 MiB;3 MiB |
+\-3e;4 MiB;7;48 MiB;5 MiB |
+\-4e;4 MiB;8;48 MiB;5 MiB |
+\-5e;8 MiB;7;94 MiB;9 MiB |
+\-6e;8 MiB;8;94 MiB;9 MiB |
+\-7e;16 MiB;8;186 MiB;17 MiB |
+\-8e;32 MiB;8;370 MiB;33 MiB |
+\-9e;64 MiB;8;674 MiB;65 MiB |
+.TE |
+.RE |
+.RE |
+.IP "" |
+For example, there are a total of four presets that use |
+an 8\ MiB dictionary, whose order from the fastest to the slowest is |
+.BR \-5 , |
+.BR \-6 , |
+.BR \-5e , |
+and |
+.BR \-6e . |
+.TP |
+.B \-\-fast |
+.PD 0 |
+.TP |
+.B \-\-best |
+.PD |
These are somewhat misleading aliases for |
.B \-0 |
and |
.BR \-9 , |
respectively. |
-These are provided only for backwards compatibility with LZMA Utils. |
+These are provided only for backwards compatibility |
+with LZMA Utils. |
Avoid using these options. |
-.IP |
-Especially the name of |
-.B \-\-best |
-is misleading, because the definition of best depends on the input data, |
-and that usually people don't want the very best compression ratio anyway, |
-because it would be very slow. |
.TP |
-.BR \-e ", " \-\-extreme |
-Modify the compression preset (\fB\-0\fR ... \fB\-9\fR) so that a little bit |
-better compression ratio can be achieved without increasing memory usage |
-of the compressor or decompressor (exception: compressor memory usage may |
-increase a little with presets \fB\-0\fR ... \fB\-2\fR). The downside is that |
-the compression time will increase dramatically (it can easily double). |
-.TP |
+.BI \-\-memlimit\-compress= limit |
+Set a memory usage limit for compression. |
+If this option is specified multiple times, |
+the last one takes effect. |
+.IP "" |
+If the compression settings exceed the |
+.IR limit , |
+.B xz |
+will adjust the settings downwards so that |
+the limit is no longer exceeded and display a notice that |
+automatic adjustment was done. |
+Such adjustments are not made when compressing with |
+.B \-\-format=raw |
+or if |
.B \-\-no\-adjust |
-Display an error and exit if the compression settings exceed the |
-the memory usage limit. The default is to adjust the settings downwards so |
-that the memory usage limit is not exceeded. Automatic adjusting is |
-always disabled when creating raw streams |
-.RB ( \-\-format=raw ). |
-.TP |
-\fB\-M\fR \fIlimit\fR, \fB\-\-memory=\fIlimit |
-Set the memory usage limit. If this option is specified multiple times, |
-the last one takes effect. The |
+has been specified. |
+In those cases, an error is displayed and |
+.B xz |
+will exit with exit status 1. |
+.IP "" |
+The |
.I limit |
can be specified in multiple ways: |
.RS |
.IP \(bu 3 |
The |
.I limit |
-can be an absolute value in bytes. Using an integer suffix like |
+can be an absolute value in bytes. |
+Using an integer suffix like |
.B MiB |
-can be useful. Example: |
-.B "\-\-memory=80MiB" |
+can be useful. |
+Example: |
+.B "\-\-memlimit\-compress=80MiB" |
.IP \(bu 3 |
The |
.I limit |
-can be specified as a percentage of physical RAM. Example: |
-.B "\-\-memory=70%" |
+can be specified as a percentage of total physical memory (RAM). |
+This can be useful especially when setting the |
+.B XZ_DEFAULTS |
+environment variable in a shell initialization script |
+that is shared between different computers. |
+That way the limit is automatically bigger |
+on systems with more memory. |
+Example: |
+.B "\-\-memlimit\-compress=70%" |
.IP \(bu 3 |
The |
.I limit |
can be reset back to its default value by setting it to |
.BR 0 . |
-See the section |
-.B "Memory usage" |
-for how the default limit is defined. |
-.IP \(bu 3 |
-The memory usage limiting can be effectively disabled by setting |
+This is currently equivalent to setting the |
.I limit |
to |
-.BR max . |
-This isn't recommended. It's usually better to use, for example, |
-.BR \-\-memory=90% . |
+.B max |
+(no memory usage limit). |
+Once multithreading support has been implemented, |
+there may be a difference between |
+.B 0 |
+and |
+.B max |
+for the multithreaded case, so it is recommended to use |
+.B 0 |
+instead of |
+.B max |
+until the details have been decided. |
.RE |
-.IP |
-The current |
-.I limit |
-can be seen near the bottom of the output of the |
-.B \-\-long-help |
-option. |
+.IP "" |
+See also the section |
+.BR "Memory usage" . |
.TP |
+.BI \-\-memlimit\-decompress= limit |
+Set a memory usage limit for decompression. |
+This also affects the |
+.B \-\-list |
+mode. |
+If the operation is not possible without exceeding the |
+.IR limit , |
+.B xz |
+will display an error and decompressing the file will fail. |
+See |
+.BI \-\-memlimit\-compress= limit |
+for possible ways to specify the |
+.IR limit . |
+.TP |
+\fB\-M\fR \fIlimit\fR, \fB\-\-memlimit=\fIlimit\fR, \fB\-\-memory=\fIlimit |
+This is equivalent to specifying \fB\-\-memlimit\-compress=\fIlimit |
+\fB\-\-memlimit\-decompress=\fIlimit\fR. |
+.TP |
+.B \-\-no\-adjust |
+Display an error and exit if the compression settings exceed |
+the memory usage limit. |
+The default is to adjust the settings downwards so |
+that the memory usage limit is not exceeded. |
+Automatic adjusting is always disabled when creating raw streams |
+.RB ( \-\-format=raw ). |
+.TP |
\fB\-T\fR \fIthreads\fR, \fB\-\-threads=\fIthreads |
-Specify the maximum number of worker threads to use. The default is |
-the number of available CPU cores. You can see the current value of |
+Specify the number of worker threads to use. |
+The actual number of threads can be less than |
.I threads |
-near the end of the output of the |
-.B \-\-long\-help |
-option. |
-.IP |
-The actual number of worker threads can be less than |
-.I threads |
if using more threads would exceed the memory usage limit. |
-In addition to CPU-intensive worker threads, |
-.B xz |
-may use a few auxiliary threads, which don't use a lot of CPU time. |
-.IP |
-.B "Multithreaded compression and decompression are not implemented yet," |
-.B "so this option has no effect for now." |
-.SS Custom compressor filter chains |
-A custom filter chain allows specifying the compression settings in detail |
-instead of relying on the settings associated to the preset levels. |
-When a custom filter chain is specified, the compression preset level options |
-(\fB\-0\fR ... \fB\-9\fR and \fB\-\-extreme\fR) are silently ignored. |
+.IP "" |
+.B "Multithreaded compression and decompression are not" |
+.B "implemented yet, so this option has no effect for now." |
+.IP "" |
+.B "As of writing (2010-09-27), it hasn't been decided" |
+.B "if threads will be used by default on multicore systems" |
+.B "once support for threading has been implemented." |
+.B "Comments are welcome." |
+The complicating factor is that using many threads |
+will increase the memory usage dramatically. |
+Note that if multithreading becomes the default, |
+it will probably be done so that single-threaded and |
+multithreaded modes produce the same output, |
+so compression ratio won't be significantly affected |
+if threading is enabled by default. |
+. |
+.SS "Custom compressor filter chains" |
+A custom filter chain allows specifying |
+the compression settings in detail instead of relying on |
+the settings associated with the preset levels. |
+When a custom filter chain is specified, |
+the compression preset level options |
+(\fB\-0\fR ... \fB\-9\fR and \fB\-\-extreme\fR) are |
+silently ignored. |
.PP |
-A filter chain is comparable to piping on the UN*X command line. |
-When compressing, the uncompressed input goes to the first filter, whose |
-output goes to the next filter (if any). The output of the last filter |
-gets written to the compressed file. The maximum number of filters in |
-the chain is four, but typically a filter chain has only one or two filters. |
+A filter chain is comparable to piping on the command line. |
+When compressing, the uncompressed input goes to the first filter, |
+whose output goes to the next filter (if any). |
+The output of the last filter gets written to the compressed file. |
+The maximum number of filters in the chain is four, |
+but typically a filter chain has only one or two filters. |
.PP |
-Many filters have limitations where they can be in the filter chain: |
-some filters can work only as the last filter in the chain, some only |
-as a non-last filter, and some work in any position in the chain. Depending |
-on the filter, this limitation is either inherent to the filter design or |
-exists to prevent security issues. |
+Many filters have limitations on where they can be |
+in the filter chain: |
+some filters can work only as the last filter in the chain, |
+some only as a non-last filter, and some work in any position |
+in the chain. |
+Depending on the filter, this limitation is either inherent to |
+the filter design or exists to prevent security issues. |
.PP |
-A custom filter chain is specified by using one or more filter options in |
-the order they are wanted in the filter chain. That is, the order of filter |
-options is significant! When decoding raw streams |
+A custom filter chain is specified by using one or more |
+filter options in the order they are wanted in the filter chain. |
+That is, the order of filter options is significant! |
+When decoding raw streams |
.RB ( \-\-format=raw ), |
-the filter chain is specified in the same order as it was specified when |
-compressing. |
+the filter chain is specified in the same order as |
+it was specified when compressing. |
.PP |
Filters take filter-specific |
.I options |
-as a comma-separated list. Extra commas in |
+as a comma-separated list. |
+Extra commas in |
.I options |
-are ignored. Every option has a default value, so you need to |
+are ignored. |
+Every option has a default value, so you need to |
specify only those you want to change. |
.TP |
-\fB\-\-lzma1\fR[\fB=\fIoptions\fR], \fB\-\-lzma2\fR[\fB=\fIoptions\fR] |
-Add LZMA1 or LZMA2 filter to the filter chain. These filter can be used |
-only as the last filter in the chain. |
-.IP |
-LZMA1 is a legacy filter, which is supported almost solely due to the legacy |
+\fB\-\-lzma1\fR[\fB=\fIoptions\fR] |
+.PD 0 |
+.TP |
+\fB\-\-lzma2\fR[\fB=\fIoptions\fR] |
+.PD |
+Add LZMA1 or LZMA2 filter to the filter chain. |
+These filters can be used only as the last filter in the chain. |
+.IP "" |
+LZMA1 is a legacy filter, |
+which is supported almost solely due to the legacy |
.B .lzma |
-file format, which supports only LZMA1. LZMA2 is an updated |
-version of LZMA1 to fix some practical issues of LZMA1. The |
+file format, which supports only LZMA1. |
+LZMA2 is an updated |
+version of LZMA1 to fix some practical issues of LZMA1. |
+The |
.B .xz |
-format uses LZMA2, and doesn't support LZMA1 at all. Compression speed and |
-ratios of LZMA1 and LZMA2 are practically the same. |
-.IP |
+format uses LZMA2 and doesn't support LZMA1 at all. |
+Compression speed and ratios of LZMA1 and LZMA2 |
+are practically the same. |
+.IP "" |
LZMA1 and LZMA2 share the same set of |
.IR options : |
.RS |
@@ -738,8 +965,9 @@ |
to |
.IR preset . |
.I Preset |
-consist of an integer, which may be followed by single-letter preset |
-modifiers. The integer can be from |
+consists of an integer, which may be followed by single-letter |
+preset modifiers. |
+The integer can be from |
.B 0 |
to |
.BR 9 , |
@@ -748,7 +976,6 @@ |
.BR e , |
which matches |
.BR \-\-extreme . |
-.IP |
The default |
.I preset |
is |
@@ -758,84 +985,155 @@ |
are taken. |
.TP |
.BI dict= size |
-Dictionary (history buffer) size indicates how many bytes of the recently |
-processed uncompressed data is kept in memory. One method to reduce size of |
-the uncompressed data is to store distance-length pairs, which |
-indicate what data to repeat from the dictionary buffer. The bigger |
-the dictionary, the better the compression ratio usually is, |
-but dictionaries bigger than the uncompressed data are waste of RAM. |
-.IP |
-Typical dictionary size is from 64 KiB to 64 MiB. The minimum is 4 KiB. |
-The maximum for compression is currently 1.5 GiB. The decompressor already |
-supports dictionaries up to one byte less than 4 GiB, which is the |
-maximum for LZMA1 and LZMA2 stream formats. |
-.IP |
-Dictionary size has the biggest effect on compression ratio. |
-Dictionary size and match finder together determine the memory usage of |
-the LZMA1 or LZMA2 encoder. The same dictionary size is required |
-for decompressing that was used when compressing, thus the memory usage of |
-the decoder is determined by the dictionary size used when compressing. |
+Dictionary (history buffer) |
+.I size |
+indicates how many bytes of the recently processed |
+uncompressed data are kept in memory. |
+The algorithm tries to find repeating byte sequences (matches) in |
+the uncompressed data, and replace them with references |
+to the data currently in the dictionary. |
+The bigger the dictionary, the higher the chance |
+of finding a match. |
+Thus, increasing dictionary |
+.I size |
+usually improves compression ratio, but |
+a dictionary bigger than the uncompressed file is a waste of memory. |
+.IP "" |
+Typical dictionary |
+.I size |
+is from 64\ KiB to 64\ MiB. |
+The minimum is 4\ KiB. |
+The maximum for compression is currently 1.5\ GiB (1536\ MiB). |
+The decompressor already supports dictionaries up to |
+one byte less than 4\ GiB, which is the maximum for |
+the LZMA1 and LZMA2 stream formats. |
+.IP "" |
+Dictionary |
+.I size |
+and match finder |
+.RI ( mf ) |
+together determine the memory usage of the LZMA1 or LZMA2 encoder. |
+Decompressing requires a dictionary |
+.I size |
+at least as big as the one used when compressing, |
+thus the memory usage of the decoder is determined |
+by the dictionary size used when compressing. |
+The |
+.B .xz |
+headers store the dictionary |
+.I size |
+either as |
+.RI "2^" n |
+or |
+.RI "2^" n " + 2^(" n "\-1)," |
+so these |
+.I sizes |
+are somewhat preferred for compression. |
+Other |
+.I sizes |
+will get rounded up when stored in the |
+.B .xz |
+headers. |
.TP |
.BI lc= lc |
-Specify the number of literal context bits. The minimum is |
-.B 0 |
-and the maximum is |
-.BR 4 ; |
-the default is |
-.BR 3 . |
+Specify the number of literal context bits. |
+The minimum is 0 and the maximum is 4; the default is 3. |
In addition, the sum of |
.I lc |
and |
.I lp |
-must not exceed |
-.BR 4 . |
+must not exceed 4. |
+.IP "" |
+All bytes that cannot be encoded as matches |
+are encoded as literals. |
+That is, literals are simply 8-bit bytes |
+that are encoded one at a time. |
+.IP "" |
+The literal coding makes an assumption that the highest |
+.I lc |
+bits of the previous uncompressed byte correlate |
+with the next byte. |
+E.g. in typical English text, an upper-case letter is |
+often followed by a lower-case letter, and a lower-case |
+letter is usually followed by another lower-case letter. |
+In the US-ASCII character set, the highest three bits are 010 |
+for upper-case letters and 011 for lower-case letters. |
+When |
+.I lc |
+is at least 3, the literal coding can take advantage of |
+this property in the uncompressed data. |
+.IP "" |
+The default value (3) is usually good. |
+If you want maximum compression, test |
+.BR lc=4 . |
+Sometimes it helps a little, and |
+sometimes it makes compression worse. |
+If it makes it worse, test e.g.\& |
+.B lc=2 |
+too. |
.TP |
.BI lp= lp |
-Specify the number of literal position bits. The minimum is |
-.B 0 |
-and the maximum is |
-.BR 4 ; |
-the default is |
-.BR 0 . |
+Specify the number of literal position bits. |
+The minimum is 0 and the maximum is 4; the default is 0. |
+.IP "" |
+.I Lp |
+affects what kind of alignment in the uncompressed data is |
+assumed when encoding literals. |
+See |
+.I pb |
+below for more information about alignment. |
.TP |
.BI pb= pb |
-Specify the number of position bits. The minimum is |
-.B 0 |
-and the maximum is |
-.BR 4 ; |
-the default is |
-.BR 2 . |
-.TP |
-.BI mode= mode |
-Compression |
-.I mode |
-specifies the function used to analyze the data produced by the match finder. |
-Supported |
-.I modes |
-are |
-.B fast |
+Specify the number of position bits. |
+The minimum is 0 and the maximum is 4; the default is 2. |
+.IP "" |
+.I Pb |
+affects what kind of alignment in the uncompressed data is |
+assumed in general. |
+The default means four-byte alignment |
+.RI (2^ pb =2^2=4), |
+which is often a good choice when there's no better guess. |
+.IP "" |
+When the alignment is known, setting |
+.I pb |
+accordingly may reduce the file size a little. |
+E.g. with text files having one-byte |
+alignment (US-ASCII, ISO-8859-*, UTF-8), setting |
+.B pb=0 |
+can improve compression slightly. |
+For UTF-16 text, |
+.B pb=1 |
+is a good choice. |
+If the alignment is an odd number like 3 bytes, |
+.B pb=0 |
+might be the best choice. |
+.IP "" |
+Even though the assumed alignment can be adjusted with |
+.I pb |
and |
-.BR normal . |
-The default is |
-.B fast |
-for |
-.I presets |
-.BR 0 \- 2 |
-and |
-.B normal |
-for |
-.I presets |
-.BR 3 \- 9 . |
+.IR lp , |
+LZMA1 and LZMA2 still slightly favor 16-byte alignment. |
+It might be worth taking into account when designing file formats |
+that are likely to be often compressed with LZMA1 or LZMA2. |
.TP |
.BI mf= mf |
-Match finder has a major effect on encoder speed, memory usage, and |
-compression ratio. Usually Hash Chain match finders are faster than |
-Binary Tree match finders. Hash Chains are usually used together with |
-.B mode=fast |
-and Binary Trees with |
-.BR mode=normal . |
-The memory usage formulas are only rough estimates, |
-which are closest to reality when |
+Match finder has a major effect on encoder speed, |
+memory usage, and compression ratio. |
+Usually Hash Chain match finders are faster than Binary Tree |
+match finders. |
+The default depends on the |
+.IR preset : |
+0 uses |
+.BR hc3 , |
+1\-3 |
+use |
+.BR hc4 , |
+and the rest use |
+.BR bt4 . |
+.IP "" |
+The following match finders are supported. |
+The memory usage formulas below are rough approximations, |
+which are closest to the reality when |
.I dict |
is a power of two. |
.RS |
@@ -848,6 +1146,7 @@ |
3 |
.br |
Memory usage: |
+.br |
.I dict |
* 7.5 (if |
.I dict |
@@ -866,8 +1165,16 @@ |
4 |
.br |
Memory usage: |
+.br |
.I dict |
-* 7.5 |
+* 7.5 (if |
+.I dict |
+<= 32 MiB); |
+.br |
+.I dict |
+* 6.5 (if |
+.I dict |
+> 32 MiB) |
.TP |
.B bt2 |
Binary Tree with 2-byte hashing |
@@ -888,6 +1195,7 @@ |
3 |
.br |
Memory usage: |
+.br |
.I dict |
* 11.5 (if |
.I dict |
@@ -906,53 +1214,96 @@ |
4 |
.br |
Memory usage: |
+.br |
.I dict |
-* 11.5 |
+* 11.5 (if |
+.I dict |
+<= 32 MiB); |
+.br |
+.I dict |
+* 10.5 (if |
+.I dict |
+> 32 MiB) |
.RE |
.TP |
+.BI mode= mode |
+Compression |
+.I mode |
+specifies the method to analyze |
+the data produced by the match finder. |
+Supported |
+.I modes |
+are |
+.B fast |
+and |
+.BR normal . |
+The default is |
+.B fast |
+for |
+.I presets |
+0\-3 and |
+.B normal |
+for |
+.I presets |
+4\-9. |
+.IP "" |
+Usually |
+.B fast |
+is used with Hash Chain match finders and |
+.B normal |
+with Binary Tree match finders. |
+This is also what the |
+.I presets |
+do. |
+.TP |
.BI nice= nice |
-Specify what is considered to be a nice length for a match. Once a match |
-of at least |
+Specify what is considered to be a nice length for a match. |
+Once a match of at least |
.I nice |
-bytes is found, the algorithm stops looking for possibly better matches. |
-.IP |
-.I nice |
-can be 2\-273 bytes. Higher values tend to give better compression ratio |
-at expense of speed. The default depends on the |
-.I preset |
-level. |
+bytes is found, the algorithm stops |
+looking for possibly better matches. |
+.IP "" |
+.I Nice |
+can be 2\-273 bytes. |
+Higher values tend to give better compression ratio |
+at the expense of speed. |
+The default depends on the |
+.IR preset . |
.TP |
.BI depth= depth |
-Specify the maximum search depth in the match finder. The default is the |
-special value |
-.BR 0 , |
+Specify the maximum search depth in the match finder. |
+The default is the special value of 0, |
which makes the compressor determine a reasonable |
.I depth |
from |
.I mf |
and |
.IR nice . |
-.IP |
+.IP "" |
+Reasonable |
+.I depth |
+is 4\-100 for Hash Chains and 16\-1000 for Binary Trees. |
Using very high values for |
.I depth |
-can make the encoder extremely slow with carefully crafted files. |
+can make the encoder extremely slow with some files. |
Avoid setting the |
.I depth |
-over 1000 unless you are prepared to interrupt the compression in case it |
-is taking too long. |
+over 1000 unless you are prepared to interrupt |
+the compression in case it is taking far too long. |
.RE |
-.IP |
+.IP "" |
When decoding raw streams |
.RB ( \-\-format=raw ), |
-LZMA2 needs only the value of |
-.BR dict . |
+LZMA2 needs only the dictionary |
+.IR size . |
LZMA1 needs also |
-.BR lc , |
-.BR lp , |
+.IR lc , |
+.IR lp , |
and |
-.BR pb. |
+.IR pb . |
.TP |
\fB\-\-x86\fR[\fB=\fIoptions\fR] |
+.PD 0 |
.TP |
\fB\-\-powerpc\fR[\fB=\fIoptions\fR] |
.TP |
@@ -963,28 +1314,72 @@ |
\fB\-\-armthumb\fR[\fB=\fIoptions\fR] |
.TP |
\fB\-\-sparc\fR[\fB=\fIoptions\fR] |
-Add a branch/call/jump (BCJ) filter to the filter chain. These filters |
-can be used only as non-last filter in the filter chain. |
-.IP |
-A BCJ filter converts relative addresses in the machine code to their |
-absolute counterparts. This doesn't change the size of the data, but |
-it increases redundancy, which allows e.g. LZMA2 to get better |
-compression ratio. |
-.IP |
-The BCJ filters are always reversible, so using a BCJ filter for wrong |
-type of data doesn't cause any data loss. However, applying a BCJ filter |
-for wrong type of data is a bad idea, because it tends to make the |
-compression ratio worse. |
-.IP |
+.PD |
+Add a branch/call/jump (BCJ) filter to the filter chain. |
+These filters can be used only as a non-last filter |
+in the filter chain. |
+.IP "" |
+A BCJ filter converts relative addresses in |
+the machine code to their absolute counterparts. |
+This doesn't change the size of the data, |
+but it increases redundancy, |
+which can help LZMA2 to produce a 0\-15\ % smaller |
+.B .xz |
+file. |
+The BCJ filters are always reversible, |
+so using a BCJ filter for the wrong type of data |
+doesn't cause any data loss, although it may make |
+the compression ratio slightly worse. |
+.IP "" |
+It is fine to apply a BCJ filter on a whole executable; |
+there's no need to apply it only on the executable section. |
+Applying a BCJ filter on an archive that contains both executable |
+and non-executable files may or may not give good results, |
+so it generally isn't good to blindly apply a BCJ filter when |
+compressing binary packages for distribution. |
+.IP "" |
+These BCJ filters are very fast and |
+use an insignificant amount of memory. |
+If a BCJ filter improves compression ratio of a file, |
+it can improve decompression speed at the same time. |
+This is because, on the same hardware, |
+the decompression speed of LZMA2 is roughly |
+a fixed number of bytes of compressed data per second. |
+.IP "" |
+These BCJ filters have known problems related to |
+the compression ratio: |
+.RS |
+.IP \(bu 3 |
+Some types of files containing executable code |
+(e.g. object files, static libraries, and Linux kernel modules) |
+have the addresses in the instructions filled with filler values. |
+These BCJ filters will still do the address conversion, |
+which will make the compression worse with these files. |
+.IP \(bu 3 |
+Applying a BCJ filter on an archive containing multiple similar |
+executables can make the compression ratio worse than not using |
+a BCJ filter. |
+This is because the BCJ filter doesn't detect the boundaries |
+of the executable files, and doesn't reset |
+the address conversion counter for each executable. |
+.RE |
+.IP "" |
+Both of the above problems will be fixed |
+in the future in a new filter. |
+The old BCJ filters will still be useful in embedded systems, |
+because the decoder of the new filter will be bigger |
+and use more memory. |
+.IP "" |
Different instruction sets have have different alignment: |
.RS |
.RS |
+.PP |
.TS |
tab(;); |
l n l |
l n l. |
Filter;Alignment;Notes |
-x86;1;32-bit and 64-bit x86 |
+x86;1;32-bit or 64-bit x86 |
PowerPC;4;Big endian only |
ARM;4;Little endian only |
ARM-Thumb;2;Little endian only |
@@ -993,15 +1388,18 @@ |
.TE |
.RE |
.RE |
-.IP |
-Since the BCJ-filtered data is usually compressed with LZMA2, the compression |
-ratio may be improved slightly if the LZMA2 options are set to match the |
-alignment of the selected BCJ filter. For example, with the IA-64 filter, |
-it's good to set |
+.IP "" |
+Since the BCJ-filtered data is usually compressed with LZMA2, |
+the compression ratio may be improved slightly if |
+the LZMA2 options are set to match the |
+alignment of the selected BCJ filter. |
+For example, with the IA-64 filter, it's good to set |
.B pb=4 |
-with LZMA2 (2^4=16). The x86 filter is an exception; it's usually good to |
-stick to LZMA2's default four-byte alignment when compressing x86 executables. |
-.IP |
+with LZMA2 (2^4=16). |
+The x86 filter is an exception; |
+it's usually good to stick to LZMA2's default |
+four-byte alignment when compressing x86 executables. |
+.IP "" |
All BCJ filters support the same |
.IR options : |
.RS |
@@ -1009,36 +1407,32 @@ |
.BI start= offset |
Specify the start |
.I offset |
-that is used when converting between relative and absolute addresses. |
+that is used when converting between relative |
+and absolute addresses. |
The |
.I offset |
-must be a multiple of the alignment of the filter (see the table above). |
-The default is zero. In practice, the default is good; specifying |
-a custom |
+must be a multiple of the alignment of the filter |
+(see the table above). |
+The default is zero. |
+In practice, the default is good; specifying a custom |
.I offset |
is almost never useful. |
-.IP |
-Specifying a non-zero start |
-.I offset |
-is probably useful only if the executable has multiple sections, and there |
-are many cross-section jumps or calls. Applying a BCJ filter separately for |
-each section with proper start offset and then compressing the result as |
-a single chunk may give some improvement in compression ratio compared |
-to applying the BCJ filter with the default |
-.I offset |
-for the whole executable. |
.RE |
.TP |
\fB\-\-delta\fR[\fB=\fIoptions\fR] |
-Add Delta filter to the filter chain. The Delta filter |
-can be used only as non-last filter in the filter chain. |
-.IP |
-Currently only simple byte-wise delta calculation is supported. It can |
-be useful when compressing e.g. uncompressed bitmap images or uncompressed |
-PCM audio. However, special purpose algorithms may give significantly better |
-results than Delta + LZMA2. This is true especially with audio, which |
-compresses faster and better e.g. with FLAC. |
-.IP |
+Add the Delta filter to the filter chain. |
+The Delta filter can be used only as a non-last filter |
+in the filter chain. |
+.IP "" |
+Currently only simple byte-wise delta calculation is supported. |
+It can be useful when compressing e.g. uncompressed bitmap images |
+or uncompressed PCM audio. |
+However, special purpose algorithms may give significantly better |
+results than Delta + LZMA2. |
+This is true especially with audio, |
+which compresses faster and better e.g. with |
+.BR flac (1). |
+.IP "" |
Supported |
.IR options : |
.RS |
@@ -1046,99 +1440,111 @@ |
.BI dist= distance |
Specify the |
.I distance |
-of the delta calculation as bytes. |
+of the delta calculation in bytes. |
.I distance |
-must be 1\-256. The default is 1. |
-.IP |
+must be 1\-256. |
+The default is 1. |
+.IP "" |
For example, with |
.B dist=2 |
and eight-byte input A1 B1 A2 B3 A3 B5 A4 B7, the output will be |
A1 B1 01 02 01 02 01 02. |
.RE |
+. |
.SS "Other options" |
.TP |
.BR \-q ", " \-\-quiet |
-Suppress warnings and notices. Specify this twice to suppress errors too. |
-This option has no effect on the exit status. That is, even if a warning |
-was suppressed, the exit status to indicate a warning is still used. |
+Suppress warnings and notices. |
+Specify this twice to suppress errors too. |
+This option has no effect on the exit status. |
+That is, even if a warning was suppressed, |
+the exit status to indicate a warning is still used. |
.TP |
.BR \-v ", " \-\-verbose |
-Be verbose. If standard error is connected to a terminal, |
+Be verbose. |
+If standard error is connected to a terminal, |
.B xz |
will display a progress indicator. |
Specifying |
.B \-\-verbose |
-twice will give even more verbose output (useful mostly for debugging). |
-.IP |
+twice will give even more verbose output. |
+.IP "" |
The progress indicator shows the following information: |
.RS |
.IP \(bu 3 |
-Completion percentage is shown if the size of the input file is known. |
-That is, percentage cannot be shown in pipes. |
+Completion percentage is shown |
+if the size of the input file is known. |
+That is, the percentage cannot be shown in pipes. |
.IP \(bu 3 |
-Amount of compressed data produced (compressing) or consumed (decompressing). |
+Amount of compressed data produced (compressing) |
+or consumed (decompressing). |
.IP \(bu 3 |
-Amount of uncompressed data consumed (compressing) or produced |
-(decompressing). |
+Amount of uncompressed data consumed (compressing) |
+or produced (decompressing). |
.IP \(bu 3 |
-Compression ratio, which is calculated by dividing the amount of |
-compressed data processed so far by the amount of uncompressed data |
-processed so far. |
+Compression ratio, which is calculated by dividing |
+the amount of compressed data processed so far by |
+the amount of uncompressed data processed so far. |
.IP \(bu 3 |
-Compression or decompression speed. This is measured as the amount of |
-uncompressed data consumed (compression) or produced (decompression) |
-per second. It is shown once a few seconds have passed since |
+Compression or decompression speed. |
+This is measured as the amount of uncompressed data consumed |
+(compression) or produced (decompression) per second. |
+It is shown after a few seconds have passed since |
.B xz |
started processing the file. |
.IP \(bu 3 |
-Elapsed time or estimated time remaining. |
-Elapsed time is displayed in the format M:SS or H:MM:SS. |
-The estimated remaining time is displayed in a less precise format |
-which never has colons, for example, 2 min 30 s. The estimate can |
-be shown only when the size of the input file is known and a couple of |
-seconds have already passed since |
+Elapsed time in the format M:SS or H:MM:SS. |
+.IP \(bu 3 |
+Estimated remaining time is shown |
+only when the size of the input file is |
+known and a couple of seconds have already passed since |
.B xz |
started processing the file. |
+The time is shown in a less precise format which |
+never has any colons, e.g. 2 min 30 s. |
.RE |
-.IP |
+.IP "" |
When standard error is not a terminal, |
.B \-\-verbose |
will make |
.B xz |
-print the filename, compressed size, uncompressed size, compression ratio, |
-speed, and elapsed time on a single line to standard error after |
-compressing or decompressing the file. If operating took at least a few |
-seconds, also the speed and elapsed time are printed. If the operation |
-didn't finish, for example due to user interruption, also the completion |
-percentage is printed if the size of the input file is known. |
+print the filename, compressed size, uncompressed size, |
+compression ratio, and possibly also the speed and elapsed time |
+on a single line to standard error after compressing or |
+decompressing the file. |
+The speed and elapsed time are included only when |
+the operation took at least a few seconds. |
+If the operation didn't finish, e.g. due to user interruption, |
+the completion percentage is also printed |
+if the size of the input file is known. |
.TP |
.BR \-Q ", " \-\-no\-warn |
-Don't set the exit status to |
-.B 2 |
-even if a condition worth a warning was detected. This option doesn't affect |
-the verbosity level, thus both |
+Don't set the exit status to 2 |
+even if a condition worth a warning was detected. |
+This option doesn't affect the verbosity level, thus both |
.B \-\-quiet |
and |
.B \-\-no\-warn |
-have to be used to not display warnings and to not alter the exit status. |
+have to be used to not display warnings and |
+to not alter the exit status. |
.TP |
.B \-\-robot |
-Print messages in a machine-parsable format. This is intended to ease |
-writing frontends that want to use |
+Print messages in a machine-parsable format. |
+This is intended to ease writing frontends that want to use |
.B xz |
-instead of liblzma, which may be the case with various scripts. The output |
-with this option enabled is meant to be stable across |
+instead of liblzma, which may be the case with various scripts. |
+The output with this option enabled is meant to be stable across |
.B xz |
-releases. See the section |
+releases. |
+See the section |
.B "ROBOT MODE" |
for details. |
.TP |
-.BR \-\-info-memory |
-Display the current memory usage limit in human-readable format on |
-a single line, and exit successfully. To see how much RAM |
+.BR \-\-info\-memory |
+Display, in human-readable format, how much physical memory (RAM) |
.B xz |
-thinks your system has, use |
-.BR "\-\-memory=100% \-\-info\-memory" . |
+thinks the system has and the memory usage limits for compression |
+and decompression, and exit successfully. |
.TP |
.BR \-h ", " \-\-help |
Display a help message describing the most commonly used options, |
@@ -1152,24 +1558,29 @@ |
.BR \-V ", " \-\-version |
Display the version number of |
.B xz |
-and liblzma in human readable format. To get machine-parsable output, specify |
+and liblzma in human-readable format. |
+To get machine-parsable output, specify |
.B \-\-robot |
before |
.BR \-\-version . |
-.SH ROBOT MODE |
+. |
+.SH "ROBOT MODE" |
The robot mode is activated with the |
.B \-\-robot |
-option. It makes the output of |
+option. |
+It makes the output of |
.B xz |
-easier to parse by other programs. Currently |
+easier to parse by other programs. |
+Currently |
.B \-\-robot |
is supported only together with |
.BR \-\-version , |
-.BR \-\-info-memory , |
+.BR \-\-info\-memory , |
and |
.BR \-\-list . |
-It will be supported for normal compression and decompression in the future. |
-.PP |
+It will be supported for normal compression and |
+decompression in the future. |
+. |
.SS Version |
.B "xz \-\-robot \-\-version" |
will print the version number of |
@@ -1184,24 +1595,19 @@ |
Major version. |
.TP |
.I YYY |
-Minor version. Even numbers are stable. |
+Minor version. |
+Even numbers are stable. |
Odd numbers are alpha or beta versions. |
.TP |
.I ZZZ |
-Patch level for stable releases or just a counter for development releases. |
+Patch level for stable releases or |
+just a counter for development releases. |
.TP |
.I S |
Stability. |
-.B 0 |
-is alpha, |
-.B 1 |
-is beta, and |
-.B 2 |
-is stable. |
+0 is alpha, 1 is beta, and 2 is stable. |
.I S |
-should be always |
-.B 2 |
-when |
+should always be 2 when |
.I YYY |
is even. |
.PP |
@@ -1215,31 +1621,48 @@ |
and |
5.0.0 is |
.BR 50000002 . |
-.SS Memory limit information |
-.B "xz \-\-robot \-\-info-memory" |
-prints the current memory usage limit as bytes on a single line. |
-To get the total amount of installed RAM, use |
-.BR "xz \-\-robot \-\-memory=100% \-\-info-memory" . |
-.SS List mode |
+. |
+.SS "Memory limit information" |
+.B "xz \-\-robot \-\-info\-memory" |
+prints a single line with three tab-separated columns: |
+.IP 1. 4 |
+Total amount of physical memory (RAM) in bytes. |
+.IP 2. 4 |
+Memory usage limit for compression in bytes. |
+A special value of zero indicates the default setting, |
+which for single-threaded mode is the same as no limit. |
+.IP 3. 4 |
+Memory usage limit for decompression in bytes. |
+A special value of zero indicates the default setting, |
+which for single-threaded mode is the same as no limit. |
+.PP |
+In the future, the output of |
+.B "xz \-\-robot \-\-info\-memory" |
+may have more columns, but never more than a single line. |
+. |
+.SS "List mode" |
.B "xz \-\-robot \-\-list" |
-uses tab-separated output. The first column of every line has a string |
+uses tab-separated output. |
+The first column of every line has a string |
that indicates the type of the information found on that line: |
.TP |
.B name |
-This is always the first line when starting to list a file. The second |
-column on the line is the filename. |
+This is always the first line when starting to list a file. |
+The second column on the line is the filename. |
.TP |
.B file |
This line contains overall information about the |
.B .xz |
-file. This line is always printed after the |
+file. |
+This line is always printed after the |
.B name |
line. |
.TP |
.B stream |
This line type is used only when |
.B \-\-verbose |
-was specified. There are as many |
+was specified. |
+There are as many |
.B stream |
lines as there are streams in the |
.B .xz |
@@ -1248,11 +1671,13 @@ |
.B block |
This line type is used only when |
.B \-\-verbose |
-was specified. There are as many |
+was specified. |
+There are as many |
.B block |
lines as there are blocks in the |
.B .xz |
-file. The |
+file. |
+The |
.B block |
lines are shown after all the |
.B stream |
@@ -1261,9 +1686,11 @@ |
.B summary |
This line type is used only when |
.B \-\-verbose |
-was specified twice. This line is printed after all |
+was specified twice. |
+This line is printed after all |
.B block |
-lines. Like the |
+lines. |
+Like the |
.B file |
line, the |
.B summary |
@@ -1272,12 +1699,13 @@ |
file. |
.TP |
.B totals |
-This line is always the very last line of the list output. It shows |
-the total counts and sizes. |
+This line is always the very last line of the list output. |
+It shows the total counts and sizes. |
.PP |
The columns of the |
.B file |
lines: |
+.PD 0 |
.RS |
.IP 2. 4 |
Number of streams in the file |
@@ -1294,8 +1722,8 @@ |
.RB ( \-\-\- ) |
are displayed instead of the ratio. |
.IP 7. 4 |
-Comma-separated list of integrity check names. The following strings are |
-used for the known check types: |
+Comma-separated list of integrity check names. |
+The following strings are used for the known check types: |
.BR None , |
.BR CRC32 , |
.BR CRC64 , |
@@ -1309,10 +1737,12 @@ |
.IP 8. 4 |
Total size of stream padding in the file |
.RE |
+.PD |
.PP |
The columns of the |
.B stream |
lines: |
+.PD 0 |
.RS |
.IP 2. 4 |
Stream number (the first stream is 1) |
@@ -1333,15 +1763,18 @@ |
.IP 10. 4 |
Size of stream padding |
.RE |
+.PD |
.PP |
The columns of the |
.B block |
lines: |
+.PD 0 |
.RS |
.IP 2. 4 |
Number of the stream containing this block |
.IP 3. 4 |
-Block number relative to the beginning of the stream (the first block is 1) |
+Block number relative to the beginning of the stream |
+(the first block is 1) |
.IP 4. 4 |
Block number relative to the beginning of the file |
.IP 5. 4 |
@@ -1357,14 +1790,18 @@ |
.IP 10. 4 |
Name of the integrity check |
.RE |
+.PD |
.PP |
If |
.B \-\-verbose |
was specified twice, additional columns are included on the |
.B block |
-lines. These are not displayed with a single |
+lines. |
+These are not displayed with a single |
.BR \-\-verbose , |
-because getting this information requires many seeks and can thus be slow: |
+because getting this information requires many seeks |
+and can thus be slow: |
+.PD 0 |
.RS |
.IP 11. 4 |
Value of the integrity check in hexadecimal |
@@ -1378,26 +1815,30 @@ |
indicates that uncompressed size is present. |
If the flag is not set, a dash |
.RB ( \- ) |
-is shown instead to keep the string length fixed. New flags may be added |
-to the end of the string in the future. |
+is shown instead to keep the string length fixed. |
+New flags may be added to the end of the string in the future. |
.IP 14. 4 |
Size of the actual compressed data in the block (this excludes |
the block header, block padding, and check fields) |
.IP 15. 4 |
-Amount of memory (as bytes) required to decompress this block with this |
+Amount of memory (in bytes) required to decompress |
+this block with this |
.B xz |
version |
.IP 16. 4 |
-Filter chain. Note that most of the options used at compression time cannot |
-be known, because only the options that are needed for decompression are |
-stored in the |
+Filter chain. |
+Note that most of the options used at compression time |
+cannot be known, because only the options |
+that are needed for decompression are stored in the |
.B .xz |
headers. |
.RE |
+.PD |
.PP |
The columns of the |
.B totals |
line: |
+.PD 0 |
.RS |
.IP 2. 4 |
Number of streams |
@@ -1410,14 +1851,17 @@ |
.IP 6. 4 |
Average compression ratio |
.IP 7. 4 |
-Comma-separated list of integrity check names that were present in the files |
+Comma-separated list of integrity check names |
+that were present in the files |
.IP 8. 4 |
Stream padding size |
.IP 9. 4 |
-Number of files. This is here to keep the order of the earlier columns |
-the same as on |
+Number of files. |
+This is here to |
+keep the order of the earlier columns the same as on |
.B file |
lines. |
+.PD |
.RE |
.PP |
If |
@@ -1425,10 +1869,11 @@ |
was specified twice, additional columns are included on the |
.B totals |
line: |
+.PD 0 |
.RS |
.IP 10. 4 |
-Maximum amount of memory (as bytes) required to decompress the files |
-with this |
+Maximum amount of memory (in bytes) required to decompress |
+the files with this |
.B xz |
version |
.IP 11. 4 |
@@ -1438,9 +1883,12 @@ |
indicating if all block headers have both compressed size and |
uncompressed size stored in them |
.RE |
+.PD |
.PP |
-Future versions may add new line types and new columns can be added to |
-the existing line types, but the existing columns won't be changed. |
+Future versions may add new line types and |
+new columns can be added to the existing line types, |
+but the existing columns won't be changed. |
+. |
.SH "EXIT STATUS" |
.TP |
.B 0 |
@@ -1450,21 +1898,76 @@ |
An error occurred. |
.TP |
.B 2 |
-Something worth a warning occurred, but no actual errors occurred. |
+Something worth a warning occurred, |
+but no actual errors occurred. |
.PP |
-Notices (not warnings or errors) printed on standard error don't affect |
-the exit status. |
+Notices (not warnings or errors) printed on standard error |
+don't affect the exit status. |
+. |
.SH ENVIRONMENT |
+.B xz |
+parses space-separated lists of options |
+from the environment variables |
+.B XZ_DEFAULTS |
+and |
+.BR XZ_OPT , |
+in this order, before parsing the options from the command line. |
+Note that only options are parsed from the environment variables; |
+all non-options are silently ignored. |
+Parsing is done with |
+.BR getopt_long (3) |
+which is also used for the command line arguments. |
.TP |
+.B XZ_DEFAULTS |
+User-specific or system-wide default options. |
+Typically this is set in a shell initialization script to enable |
+.BR xz 's |
+memory usage limiter by default. |
+Excluding shell initialization scripts |
+and similar special cases, scripts must never set or unset |
+.BR XZ_DEFAULTS . |
+.TP |
.B XZ_OPT |
-A space-separated list of options is parsed from |
+This is for passing options to |
+.B xz |
+when it is not possible to set the options directly on the |
+.B xz |
+command line. |
+This is the case when |
+.B xz |
+is run by a script or tool, e.g. GNU |
+.BR tar (1): |
+.RS |
+.RS |
+.PP |
+.nf |
+.ft CW |
+XZ_OPT=\-2v tar caf foo.tar.xz foo |
+.ft R |
+.fi |
+.RE |
+.RE |
+.IP "" |
+Scripts may use |
.B XZ_OPT |
-before parsing the options given on the command line. Note that only |
-options are parsed from |
-.BR XZ_OPT ; |
-all non-options are silently ignored. Parsing is done with |
-.BR getopt_long (3) |
-which is used also for the command line arguments. |
+e.g. to set script-specific default compression options. |
+It is still recommended to allow users to override |
+.B XZ_OPT |
+if that is reasonable, e.g. in |
+.BR sh (1) |
+scripts one may use something like this: |
+.RS |
+.RS |
+.PP |
+.nf |
+.ft CW |
+XZ_OPT=${XZ_OPT\-"\-7e"} |
+export XZ_OPT |
+.ft R |
+.fi |
+.RE |
+.RE |
+. |
.SH "LZMA UTILS COMPATIBILITY" |
The command line syntax of |
.B xz |
@@ -1473,26 +1976,32 @@ |
.BR unlzma , |
and |
.BR lzcat |
-as found from LZMA Utils 4.32.x. In most cases, it is possible to replace |
-LZMA Utils with XZ Utils without breaking existing scripts. There are some |
-incompatibilities though, which may sometimes cause problems. |
+as found in LZMA Utils 4.32.x. |
+In most cases, it is possible to replace |
+LZMA Utils with XZ Utils without breaking existing scripts. |
+There are some incompatibilities though, |
+which may sometimes cause problems. |
+. |
.SS "Compression preset levels" |
The numbering of the compression level presets is not identical in |
.B xz |
and LZMA Utils. |
-The most important difference is how dictionary sizes are mapped to different |
-presets. Dictionary size is roughly equal to the decompressor memory usage. |
+The most important difference is how dictionary sizes |
+are mapped to different presets. |
+Dictionary size is roughly equal to the decompressor memory usage. |
.RS |
+.PP |
.TS |
tab(;); |
c c c |
c n n. |
Level;xz;LZMA Utils |
-\-1;64 KiB;64 KiB |
-\-2;512 KiB;1 MiB |
-\-3;1 MiB;512 KiB |
-\-4;2 MiB;1 MiB |
-\-5;4 MiB;2 MiB |
+\-0;256 KiB;N/A |
+\-1;1 MiB;64 KiB |
+\-2;2 MiB;1 MiB |
+\-3;4 MiB;512 KiB |
+\-4;4 MiB;1 MiB |
+\-5;8 MiB;2 MiB |
\-6;8 MiB;4 MiB |
\-7;16 MiB;8 MiB |
\-8;32 MiB;16 MiB |
@@ -1500,20 +2009,24 @@ |
.TE |
.RE |
.PP |
-The dictionary size differences affect the compressor memory usage too, |
-but there are some other differences between LZMA Utils and XZ Utils, which |
+The dictionary size differences affect |
+the compressor memory usage too, |
+but there are some other differences between |
+LZMA Utils and XZ Utils, which |
make the difference even bigger: |
.RS |
+.PP |
.TS |
tab(;); |
c c c |
c n n. |
Level;xz;LZMA Utils 4.32.x |
-\-1;2 MiB;2 MiB |
-\-2;5 MiB;12 MiB |
-\-3;13 MiB;12 MiB |
-\-4;25 MiB;16 MiB |
-\-5;48 MiB;26 MiB |
+\-0;3 MiB;N/A |
+\-1;9 MiB;2 MiB |
+\-2;17 MiB;12 MiB |
+\-3;32 MiB;12 MiB |
+\-4;48 MiB;16 MiB |
+\-5;94 MiB;26 MiB |
\-6;94 MiB;45 MiB |
\-7;186 MiB;83 MiB |
\-8;370 MiB;159 MiB |
@@ -1525,33 +2038,40 @@ |
.B \-7 |
while in XZ Utils it is |
.BR \-6 , |
-so both use 8 MiB dictionary by default. |
+so both use an 8 MiB dictionary by default. |
+. |
.SS "Streamed vs. non-streamed .lzma files" |
-Uncompressed size of the file can be stored in the |
+The uncompressed size of the file can be stored in the |
.B .lzma |
-header. LZMA Utils does that when compressing regular files. |
-The alternative is to mark that uncompressed size is unknown and |
-use end of payload marker to indicate where the decompressor should stop. |
-LZMA Utils uses this method when uncompressed size isn't known, which is |
-the case for example in pipes. |
+header. |
+LZMA Utils does that when compressing regular files. |
+The alternative is to mark the uncompressed size as unknown |
+and use an end-of-payload marker to indicate |
+where the decompressor should stop. |
+LZMA Utils uses this method when the uncompressed size isn't known, |
+which is the case for example in pipes. |
.PP |
.B xz |
supports decompressing |
.B .lzma |
-files with or without end of payload marker, but all |
+files with or without an end-of-payload marker, but all |
.B .lzma |
files created by |
.B xz |
-will use end of payload marker and have uncompressed size marked as unknown |
-in the |
+will use an end-of-payload marker and have the uncompressed size |
+marked as unknown in the |
.B .lzma |
-header. This may be a problem in some (uncommon) situations. For example, a |
+header. |
+This may be a problem in some uncommon situations. |
+For example, a |
.B .lzma |
-decompressor in an embedded device might work only with files that have known |
-uncompressed size. If you hit this problem, you need to use LZMA Utils or |
-LZMA SDK to create |
+decompressor in an embedded device might work |
+only with files that have a known uncompressed size. |
+If you hit this problem, you need to use LZMA Utils |
+or LZMA SDK to create |
.B .lzma |
files with known uncompressed size. |
+. |
.SS "Unsupported .lzma files" |
The |
.B .lzma |
@@ -1559,7 +2079,8 @@ |
.I lc |
values up to 8, and |
.I lp |
-values up to 4. LZMA Utils can decompress files with any |
+values up to 4. |
+LZMA Utils can decompress files with any |
.I lc |
and |
.IR lp , |
@@ -1575,24 +2096,25 @@ |
.B xz |
and with LZMA SDK. |
.PP |
-The implementation of the LZMA1 filter in liblzma requires |
-that the sum of |
+The implementation of the LZMA1 filter in liblzma |
+requires that the sum of |
.I lc |
and |
.I lp |
-must not exceed 4. Thus, |
+must not exceed 4. |
+Thus, |
.B .lzma |
-files which exceed this limitation, cannot be decompressed with |
+files that exceed this limitation cannot be decompressed with |
.BR xz . |
.PP |
LZMA Utils creates only |
.B .lzma |
-files which have dictionary size of |
+files which have a dictionary size of |
.RI "2^" n |
-(a power of 2), but accepts files with any dictionary size. |
+(a power of 2) but accepts files with any dictionary size. |
liblzma accepts only |
.B .lzma |
-files which have dictionary size of |
+files which have a dictionary size of |
.RI "2^" n |
or |
.RI "2^" n " + 2^(" n "\-1)." |
@@ -1600,13 +2122,18 @@ |
.B .lzma |
files. |
.PP |
-These limitations shouldn't be a problem in practice, since practically all |
+These limitations shouldn't be a problem in practice, |
+since practically all |
.B .lzma |
files have been compressed with settings that liblzma will accept. |
+. |
.SS "Trailing garbage" |
-When decompressing, LZMA Utils silently ignore everything after the first |
+When decompressing, |
+LZMA Utils silently ignore everything after the first |
.B .lzma |
-stream. In most situations, this is a bug. This also means that LZMA Utils |
+stream. |
+In most situations, this is a bug. |
+This also means that LZMA Utils |
don't support decompressing concatenated |
.B .lzma |
files. |
@@ -1615,34 +2142,46 @@ |
.B .lzma |
stream, |
.B xz |
-considers the file to be corrupt. This may break obscure scripts which have |
+considers the file to be corrupt. |
+This may break obscure scripts which have |
assumed that trailing garbage is ignored. |
+. |
.SH NOTES |
-.SS Compressed output may vary |
-The exact compressed output produced from the same uncompressed input file |
-may vary between XZ Utils versions even if compression options are identical. |
-This is because the encoder can be improved (faster or better compression) |
-without affecting the file format. The output can vary even between different |
-builds of the same XZ Utils version, if different build options are used. |
+. |
+.SS "Compressed output may vary" |
+The exact compressed output produced from |
+the same uncompressed input file |
+may vary between XZ Utils versions even if |
+compression options are identical. |
+This is because the encoder can be improved |
+(faster or better compression) |
+without affecting the file format. |
+The output can vary even between different |
+builds of the same XZ Utils version, |
+if different build options are used. |
.PP |
The above means that implementing |
.B \-\-rsyncable |
to create rsyncable |
.B .xz |
-files is not going to happen without freezing a part of the encoder |
+files is not going to happen without |
+freezing a part of the encoder |
implementation, which can then be used with |
.BR \-\-rsyncable . |
-.SS Embedded .xz decompressors |
+. |
+.SS "Embedded .xz decompressors" |
Embedded |
.B .xz |
-decompressor implementations like XZ Embedded don't necessarily support files |
-created with |
+decompressor implementations like XZ Embedded don't necessarily |
+support files created with integrity |
.I check |
types other than |
.B none |
and |
.BR crc32 . |
-Since the default is \fB\-\-check=\fIcrc64\fR, you must use |
+Since the default is |
+.BR \-\-check=crc64 , |
+you must use |
.B \-\-check=none |
or |
.B \-\-check=crc32 |
@@ -1652,53 +2191,374 @@ |
.B .xz |
format decompressors support all the |
.I check |
-types, or at least are able to decompress the file without verifying the |
+types, or at least are able to decompress |
+the file without verifying the |
integrity check if the particular |
.I check |
is not supported. |
.PP |
-XZ Embedded supports BCJ filters, but only with the default start offset. |
+XZ Embedded supports BCJ filters, |
+but only with the default start offset. |
+. |
.SH EXAMPLES |
+. |
.SS Basics |
+Compress the file |
+.I foo |
+into |
+.I foo.xz |
+using the default compression level |
+.RB ( \-6 ), |
+and remove |
+.I foo |
+if compression is successful: |
+.RS |
+.PP |
+.nf |
+.ft CW |
+xz foo |
+.ft R |
+.fi |
+.RE |
+.PP |
+Decompress |
+.I bar.xz |
+into |
+.I bar |
+and don't remove |
+.I bar.xz |
+even if decompression is successful: |
+.RS |
+.PP |
+.nf |
+.ft CW |
+xz \-dk bar.xz |
+.ft R |
+.fi |
+.RE |
+.PP |
+Create |
+.I baz.tar.xz |
+with the preset |
+.B \-4e |
+.RB ( "\-4 \-\-extreme" ), |
+which is slower than e.g. the default |
+.BR \-6 , |
+but needs less memory for compression and decompression (48\ MiB |
+and 5\ MiB, respectively): |
+.RS |
+.PP |
+.nf |
+.ft CW |
+tar cf \- baz | xz \-4e > baz.tar.xz |
+.ft R |
+.fi |
+.RE |
+.PP |
A mix of compressed and uncompressed files can be decompressed |
to standard output with a single command: |
-.IP |
-.B "xz -dcf a.txt b.txt.xz c.txt d.txt.xz > abcd.txt" |
-.SS Parallel compression of many files |
+.RS |
+.PP |
+.nf |
+.ft CW |
+xz \-dcf a.txt b.txt.xz c.txt d.txt.lzma > abcd.txt |
+.ft R |
+.fi |
+.RE |
+. |
+.SS "Parallel compression of many files" |
On GNU and *BSD, |
.BR find (1) |
and |
.BR xargs (1) |
-can be used to parallellize compression of many files: |
+can be used to parallelize compression of many files: |
+.RS |
.PP |
-.IP |
-.B "find . \-type f \e! \-name '*.xz' \-print0 | xargs \-0r \-P4 \-n16 xz" |
+.nf |
+.ft CW |
+find . \-type f \e! \-name '*.xz' \-print0 \e |
+ | xargs \-0r \-P4 \-n16 xz \-T1 |
+.ft R |
+.fi |
+.RE |
.PP |
The |
.B \-P |
-option sets the number of parallel |
+option to |
+.BR xargs (1) |
+sets the number of parallel |
.B xz |
-processes. The best value for the |
+processes. |
+The best value for the |
.B \-n |
option depends on how many files there are to be compressed. |
-If there are only a couple of files, the value should probably be |
-.BR 1 ; |
+If there are only a couple of files, |
+the value should probably be 1; |
with tens of thousands of files, |
-.B 100 |
-or even more may be appropriate to reduce the number of |
+100 or even more may be appropriate to reduce the number of |
.B xz |
processes that |
.BR xargs (1) |
will eventually create. |
-.SS Robot mode examples |
-Calculating how many bytes have been saved in total after compressing |
-multiple files: |
-.IP |
-.B "xz --robot --list *.xz | awk '/^totals/{print $5\-$4}'" |
+.PP |
+The option |
+.B \-T1 |
+for |
+.B xz |
+is there to force it to single-threaded mode, because |
+.BR xargs (1) |
+is used to control the amount of parallelization. |
+. |
+.SS "Robot mode" |
+Calculate how many bytes have been saved in total |
+after compressing multiple files: |
+.RS |
+.PP |
+.nf |
+.ft CW |
+xz \-\-robot \-\-list *.xz | awk '/^totals/{print $5\-$4}' |
+.ft R |
+.fi |
+.RE |
+.PP |
+A script may want to know that it is using a new enough |
+.BR xz . |
+The following |
+.BR sh (1) |
+script checks that the version number of the |
+.B xz |
+tool is at least 5.0.0. |
+This method is compatible with old beta versions, |
+which didn't support the |
+.B \-\-robot |
+option: |
+.RS |
+.PP |
+.nf |
+.ft CW |
+if ! eval "$(xz \-\-robot \-\-version 2> /dev/null)" || |
+ [ "$XZ_VERSION" \-lt 50000002 ]; then |
+ echo "Your xz is too old." |
+fi |
+unset XZ_VERSION LIBLZMA_VERSION |
+.ft R |
+.fi |
+.RE |
+.PP |
+Set a memory usage limit for decompression using |
+.BR XZ_OPT , |
+but if a limit has already been set, don't increase it: |
+.RS |
+.PP |
+.nf |
+.ft CW |
+NEWLIM=$((123 << 20)) # 123 MiB |
+OLDLIM=$(xz \-\-robot \-\-info\-memory | cut \-f3) |
+if [ "$OLDLIM" \-eq 0 ] || [ "$OLDLIM" \-gt "$NEWLIM" ]; then |
+ XZ_OPT="$XZ_OPT \-\-memlimit\-decompress=$NEWLIM" |
+ export XZ_OPT |
+fi |
+.ft R |
+.fi |
+.RE |
+. |
+.SS "Custom compressor filter chains" |
+The simplest use for custom filter chains is |
+customizing an LZMA2 preset. |
+This can be useful, |
+because the presets cover only a subset of the |
+potentially useful combinations of compression settings. |
+.PP |
+The CompCPU columns of the tables |
+from the descriptions of the options |
+.BR "\-0" " ... " "\-9" |
+and |
+.B \-\-extreme |
+are useful when customizing LZMA2 presets. |
+Here are the relevant parts collected from those two tables: |
+.RS |
+.PP |
+.TS |
+tab(;); |
+c c |
+n n. |
+Preset;CompCPU |
+\-0;0 |
+\-1;1 |
+\-2;2 |
+\-3;3 |
+\-4;4 |
+\-5;5 |
+\-6;6 |
+\-5e;7 |
+\-6e;8 |
+.TE |
+.RE |
+.PP |
+If you know that a file requires |
+a somewhat big dictionary (e.g. 32 MiB) to compress well, |
+but you want to compress it quicker than |
+.B "xz \-8" |
+would do, a preset with a low CompCPU value (e.g. 1) |
+can be modified to use a bigger dictionary: |
+.RS |
+.PP |
+.nf |
+.ft CW |
+xz \-\-lzma2=preset=1,dict=32MiB foo.tar |
+.ft R |
+.fi |
+.RE |
+.PP |
+With certain files, the above command may be faster than |
+.B "xz \-6" |
+while compressing significantly better. |
+However, it must be emphasized that only some files benefit from |
+a big dictionary while keeping the CompCPU value low. |
+The most obvious situation, |
+where a big dictionary can help a lot, |
+is an archive containing very similar files |
+of at least a few megabytes each. |
+The dictionary size has to be significantly bigger |
+than any individual file to allow LZMA2 to take |
+full advantage of the similarities between consecutive files. |
+.PP |
+If very high compressor and decompressor memory usage is fine, |
+and the file being compressed is |
+at least several hundred megabytes, it may be useful |
+to use an even bigger dictionary than the 64 MiB that |
+.B "xz \-9" |
+would use: |
+.RS |
+.PP |
+.nf |
+.ft CW |
+xz \-vv \-\-lzma2=dict=192MiB big_foo.tar |
+.ft R |
+.fi |
+.RE |
+.PP |
+Using |
+.B \-vv |
+.RB ( "\-\-verbose \-\-verbose" ) |
+like in the above example can be useful |
+to see the memory requirements |
+of the compressor and decompressor. |
+Remember that using a dictionary bigger than |
+the size of the uncompressed file is a waste of memory, |
+so the above command isn't useful for small files. |
+.PP |
+Sometimes the compression time doesn't matter, |
+but the decompressor memory usage has to be kept low |
+e.g. to make it possible to decompress the file on |
+an embedded system. |
+The following command uses |
+.B \-6e |
+.RB ( "\-6 \-\-extreme" ) |
+as a base and sets the dictionary to only 64\ KiB. |
+The resulting file can be decompressed with XZ Embedded |
+(that's why there is |
+.BR \-\-check=crc32 ) |
+using about 100\ KiB of memory. |
+.RS |
+.PP |
+.nf |
+.ft CW |
+xz \-\-check=crc32 \-\-lzma2=preset=6e,dict=64KiB foo |
+.ft R |
+.fi |
+.RE |
+.PP |
+If you want to squeeze out as many bytes as possible, |
+adjusting the number of literal context bits |
+.RI ( lc ) |
+and number of position bits |
+.RI ( pb ) |
+can sometimes help. |
+Adjusting the number of literal position bits |
+.RI ( lp ) |
+might help too, but usually |
+.I lc |
+and |
+.I pb |
+are more important. |
+E.g. a source code archive contains mostly US-ASCII text, |
+so something like the following might give |
+a slightly (like 0.1\ %) smaller file than |
+.B "xz \-6e" |
+(try also without |
+.BR lc=4 ): |
+.RS |
+.PP |
+.nf |
+.ft CW |
+xz \-\-lzma2=preset=6e,pb=0,lc=4 source_code.tar |
+.ft R |
+.fi |
+.RE |
+.PP |
+Using another filter together with LZMA2 can improve |
+compression with certain file types. |
+E.g. to compress an x86-32 or x86-64 shared library |
+using the x86 BCJ filter: |
+.RS |
+.PP |
+.nf |
+.ft CW |
+xz \-\-x86 \-\-lzma2 libfoo.so |
+.ft R |
+.fi |
+.RE |
+.PP |
+Note that the order of the filter options is significant. |
+If |
+.B \-\-x86 |
+is specified after |
+.BR \-\-lzma2 , |
+.B xz |
+will give an error, |
+because there cannot be any filter after LZMA2, |
+and also because the x86 BCJ filter cannot be used |
+as the last filter in the chain. |
+.PP |
+The Delta filter together with LZMA2 |
+can give good results with bitmap images. |
+It should usually beat PNG, |
+which has a few more advanced filters than simple |
+delta but uses Deflate for the actual compression. |
+.PP |
+The image has to be saved in uncompressed format, |
+e.g. as uncompressed TIFF. |
+The distance parameter of the Delta filter is set |
+to match the number of bytes per pixel in the image. |
+E.g. a 24-bit RGB bitmap needs |
+.BR dist=3 , |
+and it is also good to pass |
+.B pb=0 |
+to LZMA2 to accommodate the three-byte alignment: |
+.RS |
+.PP |
+.nf |
+.ft CW |
+xz \-\-delta=dist=3 \-\-lzma2=pb=0 foo.tiff |
+.ft R |
+.fi |
+.RE |
+.PP |
+If multiple images have been put into a single archive (e.g.\& |
+.BR .tar ), |
+the Delta filter will work on that too as long as all images |
+have the same number of bytes per pixel. |
+. |
.SH "SEE ALSO" |
.BR xzdec (1), |
+.BR xzdiff (1), |
+.BR xzgrep (1), |
+.BR xzless (1), |
+.BR xzmore (1), |
.BR gzip (1), |
-.BR bzip2 (1) |
+.BR bzip2 (1), |
+.BR 7z (1) |
.PP |
XZ Utils: <http://tukaani.org/xz/> |
.br |