Chromium Code Reviews

Unified Diff: src/xz/xz.1

Issue 7109015: Update XZ Utils to 5.0.3 (in deps) (Closed) Base URL: svn://svn.chromium.org/chrome/trunk/deps/third_party/xz/
Patch Set: Created 9 years, 7 months ago
Index: src/xz/xz.1
===================================================================
--- src/xz/xz.1 (revision 50504)
+++ src/xz/xz.1 (working copy)
@@ -5,9 +5,11 @@
.\" This file has been put into the public domain.
.\" You can do whatever you want with this file.
.\"
-.TH XZ 1 "2010-06-15" "Tukaani" "XZ Utils"
+.TH XZ 1 "2010-10-04" "Tukaani" "XZ Utils"
+.
.SH NAME
xz, unxz, xzcat, lzma, unlzma, lzcat \- Compress or decompress .xz and .lzma files
+.
.SH SYNOPSIS
.B xz
.RI [ option ]...
@@ -33,8 +35,8 @@
is equivalent to
.BR "xz \-\-format=lzma \-\-decompress \-\-stdout" .
.PP
-When writing scripts that need to decompress files, it is recommended to
-always use the name
+When writing scripts that need to decompress files,
+it is recommended to always use the name
.B xz
with appropriate arguments
.RB ( "xz \-d"
@@ -43,19 +45,22 @@
instead of the names
.B unxz
and
-.BR xzcat.
+.BR xzcat .
+.
.SH DESCRIPTION
.B xz
-is a general-purpose data compression tool with command line syntax similar to
+is a general-purpose data compression tool with
+command line syntax similar to
.BR gzip (1)
and
.BR bzip2 (1).
The native file format is the
.B .xz
-format, but also the legacy
+format, but the legacy
.B .lzma
-format and raw compressed streams with no container format headers
-are supported.
+format used by LZMA Utils and
+raw compressed streams with no container format headers
+are also supported.
.PP
.B xz
compresses or decompresses each
@@ -68,13 +73,16 @@
is
.BR \- ,
.B xz
-reads from standard input and writes the processed data to standard output.
+reads from standard input and writes the processed data
+to standard output.
.B xz
will refuse (display an error and skip the
.IR file )
-to write compressed data to standard output if it is a terminal. Similarly,
+to write compressed data to standard output if it is a terminal.
+Similarly,
.B xz
-will refuse to read compressed data from standard input if it is a terminal.
+will refuse to read compressed data
+from standard input if it is a terminal.
.PP
Unless
.B \-\-stdout
@@ -117,8 +125,9 @@
if any of the following applies:
.IP \(bu 3
.I File
-is not a regular file. Symbolic links are not followed, thus they
-are not considered to be regular files.
+is not a regular file.
+Symbolic links are not followed,
+and thus they are not considered to be regular files.
.IP \(bu 3
.I File
has more than one hard link.
@@ -126,7 +135,7 @@
.I File
has setuid, setgid, or sticky bit set.
.IP \(bu 3
-The operation mode is set to compress, and the
+The operation mode is set to compress and the
.I file
already has a suffix of the target file format
.RB ( .xz
@@ -142,7 +151,7 @@
.B .lzma
format).
.IP \(bu 3
-The operation mode is set to decompress, and the
+The operation mode is set to decompress and the
.I file
doesn't have a suffix of any of the supported file formats
.RB ( .xz ,
@@ -154,12 +163,13 @@
After successfully compressing or decompressing the
.IR file ,
.B xz
-copies the owner, group, permissions, access time, and modification time
-from the source
+copies the owner, group, permissions, access time,
+and modification time from the source
.I file
-to the target file. If copying the group fails, the permissions are modified
-so that the target file doesn't become accessible to users who didn't have
-permission to access the source
+to the target file.
+If copying the group fails, the permissions are modified
+so that the target file doesn't become accessible to users
+who didn't have permission to access the source
.IR file .
.B xz
doesn't support copying other metadata like access control lists
@@ -169,7 +179,8 @@
.I file
is removed unless
.B \-\-keep
-was specified. The source
+was specified.
+The source
.I file
is never removed if the output is written to standard output.
.PP
@@ -180,61 +191,78 @@
to the
.B xz
process makes it print progress information to standard error.
-This has only limited use since when standard error is a terminal, using
+This has only limited use since when standard error
+is a terminal, using
.B \-\-verbose
will display an automatically updating progress indicator.
+.
.SS "Memory usage"
The memory usage of
.B xz
-varies from a few hundred kilobytes to several gigabytes depending on
-the compression settings. The settings used when compressing a file
-affect also the memory usage of the decompressor. Typically the decompressor
-needs only 5\ % to 20\ % of the amount of RAM that the compressor needed when
-creating the file. Still, the worst-case memory usage of the decompressor
-is several gigabytes.
+varies from a few hundred kilobytes to several gigabytes
+depending on the compression settings.
+The settings used when compressing a file determine
+the memory requirements of the decompressor.
+Typically the decompressor needs 5\ % to 20\ % of
+the amount of memory that the compressor needed when
+creating the file.
+For example, decompressing a file created with
+.B xz \-9
+currently requires 65\ MiB of memory.
+Still, it is possible to have
+.B .xz
+files that require several gigabytes of memory to decompress.
.PP
-To prevent uncomfortable surprises caused by huge memory usage,
+Especially users of older systems may find
+the possibility of very large memory usage annoying.
+To prevent uncomfortable surprises,
.B xz
-has a built-in memory usage limiter. While some operating systems provide
-ways to limit the memory usage of processes, relying on it wasn't deemed
-to be flexible enough. The default limit depends on the total amount of
-physical RAM:
-.IP \(bu 3
-If 40\ % of RAM is at least 80 MiB, 40\ % of RAM is used as the limit.
-.IP \(bu 3
-If 80\ % of RAM is less than 80 MiB, 80\ % of RAM is used as the limit.
-.IP \(bu 3
-Otherwise 80 MiB is used as the limit.
+has a built-in memory usage limiter, which is disabled by default.
+While some operating systems provide ways to limit
+the memory usage of processes, relying on it
+wasn't deemed to be flexible enough (e.g. using
+.BR ulimit (1)
+to limit virtual memory tends to cripple
+.BR mmap (2)).
.PP
-When compressing, if the selected compression settings exceed the memory
-usage limit, the settings are automatically adjusted downwards and a notice
-about this is displayed. As an exception, if the memory usage limit is
-exceeded when compressing with
-.B \-\-format=raw
-or
-.BR \-\-no\-adjust ,
-an error is displayed and
+The memory usage limiter can be enabled with
+the command line option \fB\-\-memlimit=\fIlimit\fR.
+Often it is more convenient to enable the limiter
+by default by setting the environment variable
+.BR XZ_DEFAULTS ,
+e.g.\&
+.BR XZ_DEFAULTS=\-\-memlimit=150MiB .
+It is possible to set the limits separately
+for compression and decompression
+by using \fB\-\-memlimit\-compress=\fIlimit\fR and
+\fB\-\-memlimit\-decompress=\fIlimit\fR.
+Using these two options outside
+.B XZ_DEFAULTS
+is rarely useful because a single run of
.B xz
-will exit with exit status
-.BR 1 .
+cannot do both compression and decompression and
+.BI \-\-memlimit= limit
+(or \fB\-M\fR \fIlimit\fR)
+is shorter to type on the command line.
.PP
-If source
-.I file
-cannot be decompressed without exceeding the memory usage limit, an error
-message is displayed and the file is skipped. Note that compressed files
-may contain many blocks, which may have been compressed with different
-settings. Typically all blocks will have roughly the same memory requirements,
-but it is possible that a block later in the file will exceed the memory usage
-limit, and an error about too low memory usage limit gets displayed after some
-data has already been decompressed.
-.PP
-The absolute value of the active memory usage limit can be seen with
-.B \-\-info-memory
-or near the bottom of the output of
-.BR \-\-long\-help .
-The default limit can be overridden with
-\fB\-\-memory=\fIlimit\fR.
-.SS Concatenation and padding with .xz files
+If the specified memory usage limit is exceeded when decompressing,
+.B xz
+will display an error and decompressing the file will fail.
+If the limit is exceeded when compressing,
+.B xz
+will try to scale the settings down so that the limit
+is no longer exceeded (except when using \fB\-\-format=raw\fR
+or \fB\-\-no\-adjust\fR).
+This way the operation won't fail unless the limit is very small.
+The scaling of the settings is done in steps that don't
+match the compression level presets, e.g. if the limit is
+only slightly less than the amount required for
+.BR "xz \-9" ,
+the settings will be scaled down only a little,
+not all the way down to
+.BR "xz \-8" .
+.
+.SS "Concatenation and padding with .xz files"
It is possible to concatenate
.B .xz
files as is.
@@ -243,23 +271,28 @@
.B .xz
file.
.PP
-It is possible to insert padding between the concenated parts
-or after the last part. The padding must be null bytes and the size
-of the padding must be a multiple of four bytes. This can be useful
-if the .xz file is stored on a medium that stores file sizes
-e.g. as 512-byte blocks.
+It is possible to insert padding between the concatenated parts
+or after the last part.
+The padding must consist of null bytes and the size
+of the padding must be a multiple of four bytes.
+This can be useful e.g. if the
+.B .xz
+file is stored on a medium that measures file sizes
+in 512-byte blocks.
.PP
Concatenation and padding are not allowed with
.B .lzma
files or raw streams.
+.
.SH OPTIONS
+.
.SS "Integer suffixes and special values"
-In most places where an integer argument is expected, an optional suffix
-is supported to easily indicate large integers. There must be no space
-between the integer and the suffix.
+In most places where an integer argument is expected,
+an optional suffix is supported to easily indicate large integers.
+There must be no space between the integer and the suffix.
.TP
.B KiB
-The integer is multiplied by 1,024 (2^10). Also
+Multiply the integer by 1,024 (2^10).
.BR Ki ,
.BR k ,
.BR kB ,
@@ -270,7 +303,7 @@
.BR KiB .
.TP
.B MiB
-The integer is multiplied by 1,048,576 (2^20). Also
+Multiply the integer by 1,048,576 (2^20).
.BR Mi ,
.BR m ,
.BR M ,
@@ -280,7 +313,7 @@
.BR MiB .
.TP
.B GiB
-The integer is multiplied by 1,073,741,824 (2^30). Also
+Multiply the integer by 1,073,741,824 (2^30).
.BR Gi ,
.BR g ,
.BR G ,
@@ -289,16 +322,20 @@
are accepted as synonyms for
.BR GiB .
.PP
-A special value
+The special value
.B max
-can be used to indicate the maximum integer value supported by the option.
+can be used to indicate the maximum integer value
+supported by the option.
+.
.SS "Operation mode"
-If multiple operation mode options are given, the last one takes effect.
+If multiple operation mode options are given,
+the last one takes effect.
.TP
.BR \-z ", " \-\-compress
-Compress. This is the default operation mode when no operation mode option
-is specified, and no other operation mode is implied from the command name
-(for example,
+Compress.
+This is the default operation mode when no operation mode option
+is specified and no other operation mode is implied from
+the command name (for example,
.B unxz
implies
.BR \-\-decompress ).
@@ -309,62 +346,73 @@
.BR \-t ", " \-\-test
Test the integrity of compressed
.IR files .
-No files are created or removed. This option is equivalent to
+This option is equivalent to
.B "\-\-decompress \-\-stdout"
except that the decompressed data is discarded instead of being
written to standard output.
+No files are created or removed.
.TP
.BR \-l ", " \-\-list
-List information about compressed
+Print information about compressed
.IR files .
-No uncompressed output is produced, and no files are created or removed.
-In list mode, the program cannot read the compressed data from standard
+No uncompressed output is produced,
+and no files are created or removed.
+In list mode, the program cannot read
+the compressed data from standard
input or from other unseekable sources.
-.IP
+.IP ""
The default listing shows basic information about
.IR files ,
-one file per line. To get more detailed information, use also the
+one file per line.
+To get more detailed information, use also the
.B \-\-verbose
-option. For even more information, use
+option.
+For even more information, use
.B \-\-verbose
-twice, but note that it may be slow, because getting all the extra
-information requires many seeks. The width of verbose output exceeds
-80 characters, so piping the output to e.g.
+twice, but note that this may be slow, because getting all the extra
+information requires many seeks.
+The width of verbose output exceeds
+80 characters, so piping the output to e.g.\&
.B "less\ \-S"
may be convenient if the terminal isn't wide enough.
-.IP
+.IP ""
The exact output may vary between
.B xz
-versions and different locales. To get machine-readable output,
+versions and different locales.
+For machine-readable output,
.B \-\-robot \-\-list
should be used.
+.
.SS "Operation modifiers"
.TP
.BR \-k ", " \-\-keep
-Keep (don't delete) the input files.
+Don't delete the input files.
.TP
.BR \-f ", " \-\-force
This option has several effects:
.RS
.IP \(bu 3
-If the target file already exists, delete it before compressing or
-decompressing.
+If the target file already exists,
+delete it before compressing or decompressing.
.IP \(bu 3
-Compress or decompress even if the input is a symbolic link to a regular file,
-has more than one hard link, or has setuid, setgid, or sticky bit set.
-The setuid, setgid, and sticky bits are not copied to the target file.
+Compress or decompress even if the input is
+a symbolic link to a regular file,
+has more than one hard link,
+or has the setuid, setgid, or sticky bit set.
+The setuid, setgid, and sticky bits are not copied
+to the target file.
.IP \(bu 3
-If combined with
+When used with
.B \-\-decompress
.BR \-\-stdout
and
.B xz
-doesn't recognize the type of the source file,
-.B xz
-will copy the source file as is to standard output. This allows using
+cannot recognize the type of the source file,
+copy the source file as is to standard output.
+This allows
.B xzcat
-.B \--force
-like
+.B \-\-force
+to be used like
.BR cat (1)
for files that have not been compressed with
.BR xz .
@@ -380,21 +428,23 @@
to decompress only a single file format.
.RE
.TP
-.BR \-c ", " \-\-stdout ", " \-\-to-stdout
-Write the compressed or decompressed data to standard output instead of
-a file. This implies
+.BR \-c ", " \-\-stdout ", " \-\-to\-stdout
+Write the compressed or decompressed data to
+standard output instead of a file.
+This implies
.BR \-\-keep .
.TP
.B \-\-no\-sparse
-Disable creation of sparse files. By default, if decompressing into
-a regular file,
+Disable creation of sparse files.
+By default, if decompressing into a regular file,
.B xz
-tries to make the file sparse if the decompressed data contains long
-sequences of binary zeros. It works also when writing to standard output
-as long as standard output is connected to a regular file, and certain
-additional conditions are met to make it safe. Creating sparse files may
-save disk space and speed up the decompression by reducing the amount of
-disk I/O.
+tries to make the file sparse if the decompressed data contains
+long sequences of binary zeros.
+It also works when writing to standard output
+as long as standard output is connected to a regular file
+and certain additional conditions are met to make it safe.
+Creating sparse files may save disk space and speed up
+the decompression by reducing the amount of disk I/O.
.TP
\fB\-S\fR \fI.suf\fR, \fB\-\-suffix=\fI.suf
When compressing, use
@@ -403,11 +453,12 @@
.B .xz
or
.BR .lzma .
-If not writing to standard output and the source file already has the suffix
+If not writing to standard output and
+the source file already has the suffix
.IR .suf ,
a warning is displayed and the file is skipped.
-.IP
-When decompressing, recognize also files with the suffix
+.IP ""
+When decompressing, recognize files with the suffix
.I .suf
in addition to files with the
.BR .xz ,
@@ -415,13 +466,15 @@
.BR .lzma ,
or
.B .tlz
-suffix. If the source file has the suffix
+suffix.
+If the source file has the suffix
.IR .suf ,
the suffix is removed to get the target filename.
-.IP
+.IP ""
When compressing or decompressing raw streams
.RB ( \-\-format=raw ),
-the suffix must always be specified unless writing to standard output,
+the suffix must always be specified unless
+writing to standard output,
because there is no default suffix for raw streams.
.TP
\fB\-\-files\fR[\fB=\fIfile\fR]
@@ -429,8 +482,9 @@
.IR file ;
if
.I file
-is omitted, filenames are read from standard input. Filenames must be
-terminated with the newline character. A dash
+is omitted, filenames are read from standard input.
+Filenames must be terminated with the newline character.
+A dash
.RB ( \- )
is taken as a regular filename; it doesn't mean standard input.
If filenames are given also as command line arguments, they are
@@ -438,296 +492,469 @@
.IR file .
.TP
\fB\-\-files0\fR[\fB=\fIfile\fR]
-This is identical to \fB\-\-files\fR[\fB=\fIfile\fR] except that the
-filenames must be terminated with the null character.
+This is identical to \fB\-\-files\fR[\fB=\fIfile\fR] except
+that each filename must be terminated with the null character.
+.
.SS "Basic file format and compression options"
.TP
\fB\-F\fR \fIformat\fR, \fB\-\-format=\fIformat
-Specify the file format to compress or decompress:
+Specify the file
+.I format
+to compress or decompress:
.RS
-.IP \(bu 3
-.BR auto :
-This is the default. When compressing,
+.TP
.B auto
+This is the default.
+When compressing,
+.B auto
is equivalent to
.BR xz .
-When decompressing, the format of the input file is automatically detected.
+When decompressing,
+the format of the input file is automatically detected.
Note that raw streams (created with
.BR \-\-format=raw )
cannot be auto-detected.
-.IP \(bu 3
-.BR xz :
+.TP
+.B xz
Compress to the
.B .xz
file format, or accept only
.B .xz
files when decompressing.
-.IP \(bu 3
-.B lzma
-or
-.BR alone :
+.TP
+.BR lzma ", " alone
Compress to the legacy
.B .lzma
file format, or accept only
.B .lzma
-files when decompressing. The alternative name
+files when decompressing.
+The alternative name
.B alone
is provided for backwards compatibility with LZMA Utils.
-.IP \(bu 3
-.BR raw :
-Compress or uncompress a raw stream (no headers). This is meant for advanced
-users only. To decode raw streams, you need to set not only
+.TP
+.B raw
+Compress or uncompress a raw stream (no headers).
+This is meant for advanced users only.
+To decode raw streams, you need to use
.B \-\-format=raw
-but also specify the filter chain, which would normally be stored in the
-container format headers.
+and explicitly specify the filter chain,
+which normally would have been stored in the container headers.
.RE
.TP
\fB\-C\fR \fIcheck\fR, \fB\-\-check=\fIcheck
-Specify the type of the integrity check, which is calculated from the
-uncompressed data. This option has an effect only when compressing into the
+Specify the type of the integrity check.
+The check is calculated from the uncompressed data and
+stored in the
.B .xz
+file.
+This option has an effect only when compressing into the
+.B .xz
format; the
.B .lzma
format doesn't support integrity checks.
The integrity check (if any) is verified when the
.B .xz
file is decompressed.
-.IP
+.IP ""
Supported
.I check
types:
.RS
-.IP \(bu 3
-.BR none :
-Don't calculate an integrity check at all. This is usually a bad idea. This
-can be useful when integrity of the data is verified by other means anyway.
-.IP \(bu 3
-.BR crc32 :
+.TP
+.B none
+Don't calculate an integrity check at all.
+This is usually a bad idea.
+This can be useful when integrity of the data is verified
+by other means anyway.
+.TP
+.B crc32
Calculate CRC32 using the polynomial from IEEE-802.3 (Ethernet).
-.IP \(bu 3
-.BR crc64 :
-Calculate CRC64 using the polynomial from ECMA-182. This is the default, since
-it is slightly better than CRC32 at detecting damaged files and the speed
-difference is negligible.
-.IP \(bu 3
-.BR sha256 :
-Calculate SHA-256. This is somewhat slower than CRC32 and CRC64.
+.TP
+.B crc64
+Calculate CRC64 using the polynomial from ECMA-182.
+This is the default, since it is slightly better than CRC32
+at detecting damaged files and the speed difference is negligible.
+.TP
+.B sha256
+Calculate SHA-256.
+This is somewhat slower than CRC32 and CRC64.
.RE
-.IP
+.IP ""
Integrity of the
.B .xz
-headers is always verified with CRC32. It is not possible to change or
-disable it.
+headers is always verified with CRC32.
+It is not possible to change or disable it.
.TP
.BR \-0 " ... " \-9
-Select compression preset. If a preset level is specified multiple times,
+Select a compression preset level.
+The default is
+.BR \-6 .
+If multiple preset levels are specified,
the last one takes effect.
-.IP
-The compression preset levels can be categorised roughly into three
-categories:
-.RS
-.IP "\fB\-0\fR ... \fB\-2"
-Fast presets with relatively low memory usage.
-.B \-1
+If a custom filter chain was already specified, setting
+a compression preset level clears the custom filter chain.
+.IP ""
+The differences between the presets are more significant than with
+.BR gzip (1)
and
-.B \-2
-should give compression speed and ratios comparable to
-.B "bzip2 \-1"
+.BR bzip2 (1).
+The selected compression settings determine
+the memory requirements of the decompressor,
+thus using a too high preset level might make it painful
+to decompress the file on an old system with little RAM.
+Specifically,
+.B "it's not a good idea to blindly use \-9 for everything"
+like it often is with
+.BR gzip (1)
and
-.BR "bzip2 \-9" ,
-respectively.
-Currently
+.BR bzip2 (1).
+.RS
+.TP
+.BR "\-0" " ... " "\-3"
+These are somewhat fast presets.
.B \-0
-is not very good (not much faster than
-.B \-1
-but much worse compression). In future,
-.B \-0
-may be indicate some fast algorithm instead of LZMA2.
-.IP "\fB\-3\fR ... \fB\-5"
-Good compression ratio with low to medium memory usage.
-These are significantly slower than levels 0\-2.
-.IP "\fB\-6\fR ... \fB\-9"
-Excellent compression with medium to high memory usage. These are also
-slower than the lower preset levels. The default is
-.BR \-6 .
-Unless you want to maximize the compression ratio, you probably don't want
-a higher preset level than
-.B \-7
-due to speed and memory usage.
+is sometimes faster than
+.B "gzip \-9"
+while compressing much better.
+The higher ones often have speed comparable to
+.BR bzip2 (1)
+with comparable or better compression ratio,
+although the results
+depend a lot on the type of data being compressed.
+.TP
+.BR "\-4" " ... " "\-6"
+Good to very good compression while keeping
+decompressor memory usage reasonable even for old systems.
+.B \-6
+is the default, which is usually a good choice
+e.g. for distributing files that need to be decompressible
+even on systems with only 16\ MiB RAM.
+.RB ( \-5e
+or
+.B \-6e
+may be worth considering too.
+See
+.BR \-\-extreme .)
+.TP
+.B "\-7 ... \-9"
+These are like
+.B \-6
+but with higher compressor and decompressor memory requirements.
+These are useful only when compressing files bigger than
+8\ MiB, 16\ MiB, and 32\ MiB, respectively.
.RE
-.IP
-The exact compression settings (filter chain) used by each preset may
-vary between
-.B xz
-versions. The settings may also vary between files being compressed, if
-.B xz
-determines that modified settings will probably give better compression
-ratio without significantly affecting compression time or memory usage.
-.IP
-Because the settings may vary, the memory usage may vary too. The following
-table lists the maximum memory usage of each preset level, which won't be
-exceeded even in future versions of
-.BR xz .
-.IP
-.B "FIXME: The table below is just a rough idea."
+.IP ""
+On the same hardware, the decompression speed is approximately
+a constant number of bytes of compressed data per second.
+In other words, the better the compression,
+the faster the decompression will usually be.
+This also means that the amount of uncompressed output
+produced per second can vary a lot.
+.IP ""
+The following table summarises the features of the presets:
.RS
.RS
+.PP
.TS
tab(;);
-c c c
-n n n.
-Preset;Compression;Decompression
-\-0;6 MiB;1 MiB
-\-1;6 MiB;1 MiB
-\-2;10 MiB;1 MiB
-\-3;20 MiB;2 MiB
-\-4;30 MiB;3 MiB
-\-5;60 MiB;6 MiB
-\-6;100 MiB;10 MiB
-\-7;200 MiB;20 MiB
-\-8;400 MiB;40 MiB
-\-9;800 MiB;80 MiB
+c c c c c
+n n n n n.
+Preset;DictSize;CompCPU;CompMem;DecMem
+\-0;256 KiB;0;3 MiB;1 MiB
+\-1;1 MiB;1;9 MiB;2 MiB
+\-2;2 MiB;2;17 MiB;3 MiB
+\-3;4 MiB;3;32 MiB;5 MiB
+\-4;4 MiB;4;48 MiB;5 MiB
+\-5;8 MiB;5;94 MiB;9 MiB
+\-6;8 MiB;6;94 MiB;9 MiB
+\-7;16 MiB;6;186 MiB;17 MiB
+\-8;32 MiB;6;370 MiB;33 MiB
+\-9;64 MiB;6;674 MiB;65 MiB
.TE
.RE
.RE
-.IP
-When compressing,
+.IP ""
+Column descriptions:
+.RS
+.IP \(bu 3
+DictSize is the LZMA2 dictionary size.
+It is a waste of memory to use a dictionary bigger than
+the size of the uncompressed file.
+This is why it is good to avoid using the presets
+.BR \-7 " ... " \-9
+when there's no real need for them.
+At
+.B \-6
+and lower, the amount of memory wasted is
+usually low enough to not matter.
+.IP \(bu 3
+CompCPU is a simplified representation of the LZMA2 settings
+that affect compression speed.
+The dictionary size affects speed too,
+so while CompCPU is the same for levels
+.BR \-6 " ... " \-9 ,
+higher levels still tend to be a little slower.
+To get even slower and thus possibly better compression, see
+.BR \-\-extreme .
+.IP \(bu 3
+CompMem contains the compressor memory requirements
+in the single-threaded mode.
+It may vary slightly between
.B xz
-automatically adjusts the compression settings downwards if
-the memory usage limit would be exceeded, so it is safe to specify
-a high preset level even on systems that don't have lots of RAM.
+versions.
+Memory requirements of some of the future multithreaded modes may
+be dramatically higher than that of the single-threaded mode.
+.IP \(bu 3
+DecMem contains the decompressor memory requirements.
+That is, the compression settings determine
+the memory requirements of the decompressor.
+The exact decompressor memory usage is slightly more than
+the LZMA2 dictionary size, but the values in the table
+have been rounded up to the next full MiB.
+.RE
.TP
-.BR \-\-fast " and " \-\-best
+.BR \-e ", " \-\-extreme
+Use a slower variant of the selected compression preset level
+.RB ( \-0 " ... " \-9 )
+to hopefully get a little bit better compression ratio,
+but with bad luck this can also make it worse.
+Decompressor memory usage is not affected,
+but compressor memory usage increases a little at preset levels
+.BR \-0 " ... " \-3 .
+.IP ""
+Since there are two presets with dictionary sizes
+4\ MiB and 8\ MiB, the presets
+.B \-3e
+and
+.B \-5e
+use slightly faster settings (lower CompCPU) than
+.B \-4e
+and
+.BR \-6e ,
+respectively.
+That way no two presets are identical.
+.RS
+.RS
+.PP
+.TS
+tab(;);
+c c c c c
+n n n n n.
+Preset;DictSize;CompCPU;CompMem;DecMem
+\-0e;256 KiB;8;4 MiB;1 MiB
+\-1e;1 MiB;8;13 MiB;2 MiB
+\-2e;2 MiB;8;25 MiB;3 MiB
+\-3e;4 MiB;7;48 MiB;5 MiB
+\-4e;4 MiB;8;48 MiB;5 MiB
+\-5e;8 MiB;7;94 MiB;9 MiB
+\-6e;8 MiB;8;94 MiB;9 MiB
+\-7e;16 MiB;8;186 MiB;17 MiB
+\-8e;32 MiB;8;370 MiB;33 MiB
+\-9e;64 MiB;8;674 MiB;65 MiB
+.TE
+.RE
+.RE
+.IP ""
+For example, there are a total of four presets that use
+8\ MiB dictionary, whose order from the fastest to the slowest is
+.BR \-5 ,
+.BR \-6 ,
+.BR \-5e ,
+and
+.BR \-6e .
+.TP
+.B \-\-fast
+.PD 0
+.TP
+.B \-\-best
+.PD
These are somewhat misleading aliases for
.B \-0
and
.BR \-9 ,
respectively.
-These are provided only for backwards compatibility with LZMA Utils.
+These are provided only for backwards compatibility
+with LZMA Utils.
Avoid using these options.
-.IP
-Especially the name of
-.B \-\-best
-is misleading, because the definition of best depends on the input data,
-and that usually people don't want the very best compression ratio anyway,
-because it would be very slow.
.TP
-.BR \-e ", " \-\-extreme
-Modify the compression preset (\fB\-0\fR ... \fB\-9\fR) so that a little bit
-better compression ratio can be achieved without increasing memory usage
-of the compressor or decompressor (exception: compressor memory usage may
-increase a little with presets \fB\-0\fR ... \fB\-2\fR). The downside is that
-the compression time will increase dramatically (it can easily double).
-.TP
+.BI \-\-memlimit\-compress= limit
+Set a memory usage limit for compression.
+If this option is specified multiple times,
+the last one takes effect.
+.IP ""
+If the compression settings exceed the
+.IR limit ,
+.B xz
+will adjust the settings downwards so that
+the limit is no longer exceeded and display a notice that
+automatic adjustment was done.
+Such adjustments are not made when compressing with
+.B \-\-format=raw
+or if
.B \-\-no\-adjust
-Display an error and exit if the compression settings exceed the
-the memory usage limit. The default is to adjust the settings downwards so
-that the memory usage limit is not exceeded. Automatic adjusting is
-always disabled when creating raw streams
-.RB ( \-\-format=raw ).
-.TP
-\fB\-M\fR \fIlimit\fR, \fB\-\-memory=\fIlimit
-Set the memory usage limit. If this option is specified multiple times,
-the last one takes effect. The
+has been specified.
+In those cases, an error is displayed and
+.B xz
+will exit with exit status 1.
+.IP ""
+The
.I limit
can be specified in multiple ways:
.RS
.IP \(bu 3
The
.I limit
-can be an absolute value in bytes. Using an integer suffix like
+can be an absolute value in bytes.
+Using an integer suffix like
.B MiB
-can be useful. Example:
-.B "\-\-memory=80MiB"
+can be useful.
+Example:
+.B "\-\-memlimit\-compress=80MiB"
.IP \(bu 3
The
.I limit
-can be specified as a percentage of physical RAM. Example:
-.B "\-\-memory=70%"
+can be specified as a percentage of total physical memory (RAM).
+This can be useful especially when setting the
+.B XZ_DEFAULTS
+environment variable in a shell initialization script
+that is shared between different computers.
+That way the limit is automatically bigger
+on systems with more memory.
+Example:
+.B "\-\-memlimit\-compress=70%"
.IP \(bu 3
The
.I limit
can be reset back to its default value by setting it to
.BR 0 .
-See the section
-.B "Memory usage"
-for how the default limit is defined.
-.IP \(bu 3
-The memory usage limiting can be effectively disabled by setting
+This is currently equivalent to setting the
.I limit
to
-.BR max .
-This isn't recommended. It's usually better to use, for example,
-.BR \-\-memory=90% .
+.B max
+(no memory usage limit).
+Once multithreading support has been implemented,
+there may be a difference between
+.B 0
+and
+.B max
+for the multithreaded case, so it is recommended to use
+.B 0
+instead of
+.B max
+until the details have been decided.
.RE
-.IP
-The current
-.I limit
-can be seen near the bottom of the output of the
-.B \-\-long-help
-option.
+.IP ""
+See also the section
+.BR "Memory usage" .
.TP
+.BI \-\-memlimit\-decompress= limit
+Set a memory usage limit for decompression.
+This also affects the
+.B \-\-list
+mode.
+If the operation is not possible without exceeding the
+.IR limit ,
+.B xz
+will display an error and decompressing the file will fail.
+See
+.BI \-\-memlimit\-compress= limit
+for possible ways to specify the
+.IR limit .
+.TP
+\fB\-M\fR \fIlimit\fR, \fB\-\-memlimit=\fIlimit\fR, \fB\-\-memory=\fIlimit
+This is equivalent to specifying \fB\-\-memlimit\-compress=\fIlimit
+\fB\-\-memlimit\-decompress=\fIlimit\fR.
+.TP
+.B \-\-no\-adjust
+Display an error and exit if the compression settings exceed
+the memory usage limit.
+The default is to adjust the settings downwards so
+that the memory usage limit is not exceeded.
+Automatic adjusting is always disabled when creating raw streams
+.RB ( \-\-format=raw ).
+.TP
\fB\-T\fR \fIthreads\fR, \fB\-\-threads=\fIthreads
-Specify the maximum number of worker threads to use. The default is
-the number of available CPU cores. You can see the current value of
+Specify the number of worker threads to use.
+The actual number of threads can be less than
.I threads
-near the end of the output of the
-.B \-\-long\-help
-option.
-.IP
-The actual number of worker threads can be less than
-.I threads
if using more threads would exceed the memory usage limit.
-In addition to CPU-intensive worker threads,
-.B xz
-may use a few auxiliary threads, which don't use a lot of CPU time.
-.IP
-.B "Multithreaded compression and decompression are not implemented yet,"
-.B "so this option has no effect for now."
-.SS Custom compressor filter chains
-A custom filter chain allows specifying the compression settings in detail
-instead of relying on the settings associated to the preset levels.
-When a custom filter chain is specified, the compression preset level options
-(\fB\-0\fR ... \fB\-9\fR and \fB\-\-extreme\fR) are silently ignored.
+.IP ""
+.B "Multithreaded compression and decompression are not"
+.B "implemented yet, so this option has no effect for now."
+.IP ""
+.B "As of writing (2010-09-27), it hasn't been decided"
+.B "if threads will be used by default on multicore systems"
+.B "once support for threading has been implemented."
+.B "Comments are welcome."
+The complicating factor is that using many threads
+will increase the memory usage dramatically.
+Note that if multithreading becomes the default,
+it will probably be done so that single-threaded and
+multithreaded modes produce the same output,
+so the compression ratio won't be significantly affected
+if threading is enabled by default.
+.
+.SS "Custom compressor filter chains"
+A custom filter chain allows specifying
+the compression settings in detail instead of relying on
+the settings associated with the preset levels.
+When a custom filter chain is specified,
+the compression preset level options
+(\fB\-0\fR ... \fB\-9\fR and \fB\-\-extreme\fR) are
+silently ignored.
.PP
-A filter chain is comparable to piping on the UN*X command line.
-When compressing, the uncompressed input goes to the first filter, whose
-output goes to the next filter (if any). The output of the last filter
-gets written to the compressed file. The maximum number of filters in
-the chain is four, but typically a filter chain has only one or two filters.
+A filter chain is comparable to piping on the command line.
+When compressing, the uncompressed input goes to the first filter,
+whose output goes to the next filter (if any).
+The output of the last filter gets written to the compressed file.
+The maximum number of filters in the chain is four,
+but typically a filter chain has only one or two filters.
.PP
-Many filters have limitations where they can be in the filter chain:
-some filters can work only as the last filter in the chain, some only
-as a non-last filter, and some work in any position in the chain. Depending
-on the filter, this limitation is either inherent to the filter design or
-exists to prevent security issues.
+Many filters have limitations on where they can be
+in the filter chain:
+some filters can work only as the last filter in the chain,
+some only as a non-last filter, and some work in any position
+in the chain.
+Depending on the filter, this limitation is either inherent to
+the filter design or exists to prevent security issues.
.PP
-A custom filter chain is specified by using one or more filter options in
-the order they are wanted in the filter chain. That is, the order of filter
-options is significant! When decoding raw streams
+A custom filter chain is specified by using one or more
+filter options in the order they are wanted in the filter chain.
+That is, the order of filter options is significant!
+When decoding raw streams
.RB ( \-\-format=raw ),
-the filter chain is specified in the same order as it was specified when
-compressing.
+the filter chain is specified in the same order as
+it was specified when compressing.
.PP
Filters take filter-specific
.I options
-as a comma-separated list. Extra commas in
+as a comma-separated list.
+Extra commas in
.I options
-are ignored. Every option has a default value, so you need to
+are ignored.
+Every option has a default value, so you need to
specify only those you want to change.
.TP
-\fB\-\-lzma1\fR[\fB=\fIoptions\fR], \fB\-\-lzma2\fR[\fB=\fIoptions\fR]
-Add LZMA1 or LZMA2 filter to the filter chain. These filter can be used
-only as the last filter in the chain.
-.IP
-LZMA1 is a legacy filter, which is supported almost solely due to the legacy
+\fB\-\-lzma1\fR[\fB=\fIoptions\fR]
+.PD 0
+.TP
+\fB\-\-lzma2\fR[\fB=\fIoptions\fR]
+.PD
+Add LZMA1 or LZMA2 filter to the filter chain.
+These filters can be used only as the last filter in the chain.
+.IP ""
+LZMA1 is a legacy filter,
+which is supported almost solely due to the legacy
.B .lzma
-file format, which supports only LZMA1. LZMA2 is an updated
-version of LZMA1 to fix some practical issues of LZMA1. The
+file format, which supports only LZMA1.
+LZMA2 is an updated
+version of LZMA1 to fix some practical issues of LZMA1.
+The
.B .xz
-format uses LZMA2, and doesn't support LZMA1 at all. Compression speed and
-ratios of LZMA1 and LZMA2 are practically the same.
-.IP
+format uses LZMA2 and doesn't support LZMA1 at all.
+Compression speed and ratios of LZMA1 and LZMA2
+are practically the same.
+.IP ""
LZMA1 and LZMA2 share the same set of
.IR options :
.RS
@@ -738,8 +965,9 @@
to
.IR preset .
.I Preset
-consist of an integer, which may be followed by single-letter preset
-modifiers. The integer can be from
+consists of an integer, which may be followed by single-letter
+preset modifiers.
+The integer can be from
.B 0
to
.BR 9 ,
@@ -748,7 +976,6 @@
.BR e ,
which matches
.BR \-\-extreme .
-.IP
The default
.I preset
is
@@ -758,84 +985,155 @@
are taken.
.TP
.BI dict= size
-Dictionary (history buffer) size indicates how many bytes of the recently
-processed uncompressed data is kept in memory. One method to reduce size of
-the uncompressed data is to store distance-length pairs, which
-indicate what data to repeat from the dictionary buffer. The bigger
-the dictionary, the better the compression ratio usually is,
-but dictionaries bigger than the uncompressed data are waste of RAM.
-.IP
-Typical dictionary size is from 64 KiB to 64 MiB. The minimum is 4 KiB.
-The maximum for compression is currently 1.5 GiB. The decompressor already
-supports dictionaries up to one byte less than 4 GiB, which is the
-maximum for LZMA1 and LZMA2 stream formats.
-.IP
-Dictionary size has the biggest effect on compression ratio.
-Dictionary size and match finder together determine the memory usage of
-the LZMA1 or LZMA2 encoder. The same dictionary size is required
-for decompressing that was used when compressing, thus the memory usage of
-the decoder is determined by the dictionary size used when compressing.
+Dictionary (history buffer)
+.I size
+indicates how many bytes of the recently processed
+uncompressed data is kept in memory.
+The algorithm tries to find repeating byte sequences (matches) in
+the uncompressed data, and replace them with references
+to the data currently in the dictionary.
+The bigger the dictionary, the higher the chance
+of finding a match.
+Thus, increasing dictionary
+.I size
+usually improves compression ratio, but
+a dictionary bigger than the uncompressed file is a waste of memory.
+.IP ""
+Typical dictionary
+.I size
+is from 64\ KiB to 64\ MiB.
+The minimum is 4\ KiB.
+The maximum for compression is currently 1.5\ GiB (1536\ MiB).
+The decompressor already supports dictionaries up to
+one byte less than 4\ GiB, which is the maximum for
+the LZMA1 and LZMA2 stream formats.
+.IP ""
+Dictionary
+.I size
+and match finder
+.RI ( mf )
+together determine the memory usage of the LZMA1 or LZMA2 encoder.
+The same (or bigger) dictionary
+.I size
+is required for decompressing as was used when compressing,
+thus the memory usage of the decoder is determined
+by the dictionary size used when compressing.
+The
+.B .xz
+headers store the dictionary
+.I size
+either as
+.RI "2^" n
+or
+.RI "2^" n " + 2^(" n "\-1),"
+so these
+.I sizes
+are somewhat preferred for compression.
+Other
+.I sizes
+will get rounded up when stored in the
+.B .xz
+headers.
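The rounding rule described above (sizes are stored as 2^n or 2^n + 2^(n-1)) can be illustrated with a small Python sketch. The function name `round_up_dict_size` is hypothetical, and the code follows only the rule as stated in the text; it is not taken from liblzma.

```python
def round_up_dict_size(size: int) -> int:
    """Smallest value >= size of the form 2^n or 2^n + 2^(n-1).

    Hypothetical helper illustrating the rule stated in the text.
    """
    if size < 1:
        raise ValueError("size must be positive")
    n = (size - 1).bit_length()      # smallest n with 2^n >= size
    if n >= 2:
        # 2^(n-1) + 2^(n-2) lies between 2^(n-1) and 2^n.
        candidate = 3 << (n - 2)
        if candidate >= size:
            return candidate
    return 1 << n
```

Under this rule a 5 MiB dictionary would be stored as 6 MiB, while 8 MiB and 1536 MiB are already representable and stay unchanged.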
.TP
.BI lc= lc
-Specify the number of literal context bits. The minimum is
-.B 0
-and the maximum is
-.BR 4 ;
-the default is
-.BR 3 .
+Specify the number of literal context bits.
+The minimum is 0 and the maximum is 4; the default is 3.
In addition, the sum of
.I lc
and
.I lp
-must not exceed
-.BR 4 .
+must not exceed 4.
+.IP ""
+All bytes that cannot be encoded as matches
+are encoded as literals.
+That is, literals are simply 8-bit bytes
+that are encoded one at a time.
+.IP ""
+The literal coding makes an assumption that the highest
+.I lc
+bits of the previous uncompressed byte correlate
+with the next byte.
+E.g. in typical English text, an upper-case letter is
+often followed by a lower-case letter, and a lower-case
+letter is usually followed by another lower-case letter.
+In the US-ASCII character set, the highest three bits are 010
+for upper-case letters and 011 for lower-case letters.
+When
+.I lc
+is at least 3, the literal coding can take advantage of
+this property in the uncompressed data.
+.IP ""
+The default value (3) is usually good.
+If you want maximum compression, test
+.BR lc=4 .
+Sometimes it helps a little, and
+sometimes it makes compression worse.
+If it makes it worse, test e.g.\&
+.B lc=2
+too.
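The US-ASCII observation above can be checked directly. The following Python sketch uses a hypothetical helper, `literal_context`, that extracts the highest `lc` bits of the previous byte; it illustrates the idea only and is not how the LZMA encoder is implemented.

```python
def literal_context(prev_byte: int, lc: int = 3) -> int:
    """Return the highest lc bits of the previous uncompressed byte."""
    if not 0 <= lc <= 4:
        raise ValueError("lc must be 0-4")
    return prev_byte >> (8 - lc)


# All upper-case ASCII letters share one context, all lower-case another.
upper = {literal_context(ord(c)) for c in "ABCXYZ"}
lower = {literal_context(ord(c)) for c in "abcxyz"}
```

With `lc=3`, `upper` contains only `0b010` and `lower` only `0b011`, which is the property the literal coding can exploit.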
.TP
.BI lp= lp
-Specify the number of literal position bits. The minimum is
-.B 0
-and the maximum is
-.BR 4 ;
-the default is
-.BR 0 .
+Specify the number of literal position bits.
+The minimum is 0 and the maximum is 4; the default is 0.
+.IP ""
+.I Lp
+affects what kind of alignment in the uncompressed data is
+assumed when encoding literals.
+See
+.I pb
+below for more information about alignment.
.TP
.BI pb= pb
-Specify the number of position bits. The minimum is
-.B 0
-and the maximum is
-.BR 4 ;
-the default is
-.BR 2 .
-.TP
-.BI mode= mode
-Compression
-.I mode
-specifies the function used to analyze the data produced by the match finder.
-Supported
-.I modes
-are
-.B fast
+Specify the number of position bits.
+The minimum is 0 and the maximum is 4; the default is 2.
+.IP ""
+.I Pb
+affects what kind of alignment in the uncompressed data is
+assumed in general.
+The default means four-byte alignment
+.RI (2^ pb =2^2=4),
+which is often a good choice when there's no better guess.
+.IP ""
+When the alignment is known, setting
+.I pb
+accordingly may reduce the file size a little.
+E.g. with text files having one-byte
+alignment (US-ASCII, ISO-8859-*, UTF-8), setting
+.B pb=0
+can improve compression slightly.
+For UTF-16 text,
+.B pb=1
+is a good choice.
+If the alignment is an odd number like 3 bytes,
+.B pb=0
+might be the best choice.
+.IP ""
+Even though the assumed alignment can be adjusted with
+.I pb
and
-.BR normal .
-The default is
-.B fast
-for
-.I presets
-.BR 0 \- 2
-and
-.B normal
-for
-.I presets
-.BR 3 \- 9 .
+.IR lp ,
+LZMA1 and LZMA2 still slightly favor 16-byte alignment.
+It might be worth taking into account when designing file formats
+that are likely to be often compressed with LZMA1 or LZMA2.
.TP
.BI mf= mf
-Match finder has a major effect on encoder speed, memory usage, and
-compression ratio. Usually Hash Chain match finders are faster than
-Binary Tree match finders. Hash Chains are usually used together with
-.B mode=fast
-and Binary Trees with
-.BR mode=normal .
-The memory usage formulas are only rough estimates,
-which are closest to reality when
+Match finder has a major effect on encoder speed,
+memory usage, and compression ratio.
+Usually Hash Chain match finders are faster than Binary Tree
+match finders.
+The default depends on the
+.IR preset :
+0 uses
+.BR hc3 ,
+1\-3
+use
+.BR hc4 ,
+and the rest use
+.BR bt4 .
+.IP ""
+The following match finders are supported.
+The memory usage formulas below are rough approximations,
+which are closest to the reality when
.I dict
is a power of two.
.RS
@@ -848,6 +1146,7 @@
3
.br
Memory usage:
+.br
.I dict
* 7.5 (if
.I dict
@@ -866,8 +1165,16 @@
4
.br
Memory usage:
+.br
.I dict
-* 7.5
+* 7.5 (if
+.I dict
+<= 32 MiB);
+.br
+.I dict
+* 6.5 (if
+.I dict
+> 32 MiB)
.TP
.B bt2
Binary Tree with 2-byte hashing
@@ -888,6 +1195,7 @@
3
.br
Memory usage:
+.br
.I dict
* 11.5 (if
.I dict
@@ -906,53 +1214,96 @@
4
.br
Memory usage:
+.br
.I dict
-* 11.5
+* 11.5 (if
+.I dict
+<= 32 MiB);
+.br
+.I dict
+* 10.5 (if
+.I dict
+> 32 MiB)
.RE
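As a worked example of the rough memory usage formulas for `hc4` and `bt4` listed above, here is a short Python sketch. The names `FACTORS` and `encoder_memory_estimate` are hypothetical; the factors and the 32 MiB threshold are copied from the text, and the result is only the approximation the man page describes.

```python
MIB = 1024 * 1024

# (factor when dict <= 32 MiB, factor when dict > 32 MiB), from the text.
FACTORS = {
    "hc4": (7.5, 6.5),
    "bt4": (11.5, 10.5),
}


def encoder_memory_estimate(mf: str, dict_size: int) -> float:
    """Rough encoder memory usage in bytes for the given match finder."""
    small, large = FACTORS[mf]
    factor = small if dict_size <= 32 * MIB else large
    return dict_size * factor
```

For example, `bt4` with an 8 MiB dictionary gives roughly 92 MiB, and with a 64 MiB dictionary roughly 672 MiB.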
.TP
+.BI mode= mode
+Compression
+.I mode
+specifies the method to analyze
+the data produced by the match finder.
+Supported
+.I modes
+are
+.B fast
+and
+.BR normal .
+The default is
+.B fast
+for
+.I presets
+0\-3 and
+.B normal
+for
+.I presets
+4\-9.
+.IP ""
+Usually
+.B fast
+is used with Hash Chain match finders and
+.B normal
+with Binary Tree match finders.
+This is also what the
+.I presets
+do.
+.TP
.BI nice= nice
-Specify what is considered to be a nice length for a match. Once a match
-of at least
+Specify what is considered to be a nice length for a match.
+Once a match of at least
.I nice
-bytes is found, the algorithm stops looking for possibly better matches.
-.IP
-.I nice
-can be 2\-273 bytes. Higher values tend to give better compression ratio
-at expense of speed. The default depends on the
-.I preset
-level.
+bytes is found, the algorithm stops
+looking for possibly better matches.
+.IP ""
+.I Nice
+can be 2\-273 bytes.
+Higher values tend to give better compression ratio
+at the expense of speed.
+The default depends on the
+.IR preset .
.TP
.BI depth= depth
-Specify the maximum search depth in the match finder. The default is the
-special value
-.BR 0 ,
+Specify the maximum search depth in the match finder.
+The default is the special value of 0,
which makes the compressor determine a reasonable
.I depth
from
.I mf
and
.IR nice .
-.IP
+.IP ""
+Reasonable
+.I depth
+for Hash Chains is 4\-100 and 16\-1000 for Binary Trees.
Using very high values for
.I depth
-can make the encoder extremely slow with carefully crafted files.
+can make the encoder extremely slow with some files.
Avoid setting the
.I depth
-over 1000 unless you are prepared to interrupt the compression in case it
-is taking too long.
+over 1000 unless you are prepared to interrupt
+the compression in case it is taking far too long.
.RE
-.IP
+.IP ""
When decoding raw streams
.RB ( \-\-format=raw ),
-LZMA2 needs only the value of
-.BR dict .
+LZMA2 needs only the dictionary
+.IR size .
LZMA1 needs also
-.BR lc ,
-.BR lp ,
+.IR lc ,
+.IR lp ,
and
-.BR pb.
+.IR pb .
.TP
\fB\-\-x86\fR[\fB=\fIoptions\fR]
+.PD 0
.TP
\fB\-\-powerpc\fR[\fB=\fIoptions\fR]
.TP
@@ -963,28 +1314,72 @@
\fB\-\-armthumb\fR[\fB=\fIoptions\fR]
.TP
\fB\-\-sparc\fR[\fB=\fIoptions\fR]
-Add a branch/call/jump (BCJ) filter to the filter chain. These filters
-can be used only as non-last filter in the filter chain.
-.IP
-A BCJ filter converts relative addresses in the machine code to their
-absolute counterparts. This doesn't change the size of the data, but
-it increases redundancy, which allows e.g. LZMA2 to get better
-compression ratio.
-.IP
-The BCJ filters are always reversible, so using a BCJ filter for wrong
-type of data doesn't cause any data loss. However, applying a BCJ filter
-for wrong type of data is a bad idea, because it tends to make the
-compression ratio worse.
-.IP
+.PD
+Add a branch/call/jump (BCJ) filter to the filter chain.
+These filters can be used only as a non-last filter
+in the filter chain.
+.IP ""
+A BCJ filter converts relative addresses in
+the machine code to their absolute counterparts.
+This doesn't change the size of the data,
+but it increases redundancy,
+which can help LZMA2 to produce a 0\-15\ % smaller
+.B .xz
+file.
+The BCJ filters are always reversible,
+so using a BCJ filter for the wrong type of data
+doesn't cause any data loss, although it may make
+the compression ratio slightly worse.
+.IP ""
+It is fine to apply a BCJ filter on a whole executable;
+there's no need to apply it only on the executable section.
+Applying a BCJ filter on an archive that contains both executable
+and non-executable files may or may not give good results,
+so it generally isn't good to blindly apply a BCJ filter when
+compressing binary packages for distribution.
+.IP ""
+These BCJ filters are very fast and
+use an insignificant amount of memory.
+If a BCJ filter improves compression ratio of a file,
+it can improve decompression speed at the same time.
+This is because, on the same hardware,
+the decompression speed of LZMA2 is roughly
+a fixed number of bytes of compressed data per second.
+.IP ""
+These BCJ filters have known problems related to
+the compression ratio:
+.RS
+.IP \(bu 3
+Some types of files containing executable code
+(e.g. object files, static libraries, and Linux kernel modules)
+have the addresses in the instructions filled with filler values.
+These BCJ filters will still do the address conversion,
+which will make the compression worse with these files.
+.IP \(bu 3
+Applying a BCJ filter on an archive containing multiple similar
+executables can make the compression ratio worse than not using
+a BCJ filter.
+This is because the BCJ filter doesn't detect the boundaries
+of the executable files, and doesn't reset
+the address conversion counter for each executable.
+.RE
+.IP ""
+Both of the above problems will be fixed
+in the future in a new filter.
+The old BCJ filters will still be useful in embedded systems,
+because the decoder of the new filter will be bigger
+and use more memory.
+.IP ""
Different instruction sets have have different alignment:
.RS
.RS
+.PP
.TS
tab(;);
l n l
l n l.
Filter;Alignment;Notes
-x86;1;32-bit and 64-bit x86
+x86;1;32-bit or 64-bit x86
PowerPC;4;Big endian only
ARM;4;Little endian only
ARM-Thumb;2;Little endian only
@@ -993,15 +1388,18 @@
.TE
.RE
.RE
-.IP
-Since the BCJ-filtered data is usually compressed with LZMA2, the compression
-ratio may be improved slightly if the LZMA2 options are set to match the
-alignment of the selected BCJ filter. For example, with the IA-64 filter,
-it's good to set
+.IP ""
+Since the BCJ-filtered data is usually compressed with LZMA2,
+the compression ratio may be improved slightly if
+the LZMA2 options are set to match the
+alignment of the selected BCJ filter.
+For example, with the IA-64 filter, it's good to set
.B pb=4
-with LZMA2 (2^4=16). The x86 filter is an exception; it's usually good to
-stick to LZMA2's default four-byte alignment when compressing x86 executables.
-.IP
+with LZMA2 (2^4=16).
+The x86 filter is an exception;
+it's usually good to stick to LZMA2's default
+four-byte alignment when compressing x86 executables.
+.IP ""
All BCJ filters support the same
.IR options :
.RS
@@ -1009,36 +1407,32 @@
.BI start= offset
Specify the start
.I offset
-that is used when converting between relative and absolute addresses.
+that is used when converting between relative
+and absolute addresses.
The
.I offset
-must be a multiple of the alignment of the filter (see the table above).
-The default is zero. In practice, the default is good; specifying
-a custom
+must be a multiple of the alignment of the filter
+(see the table above).
+The default is zero.
+In practice, the default is good; specifying a custom
.I offset
is almost never useful.
-.IP
-Specifying a non-zero start
-.I offset
-is probably useful only if the executable has multiple sections, and there
-are many cross-section jumps or calls. Applying a BCJ filter separately for
-each section with proper start offset and then compressing the result as
-a single chunk may give some improvement in compression ratio compared
-to applying the BCJ filter with the default
-.I offset
-for the whole executable.
.RE
.TP
\fB\-\-delta\fR[\fB=\fIoptions\fR]
-Add Delta filter to the filter chain. The Delta filter
-can be used only as non-last filter in the filter chain.
-.IP
-Currently only simple byte-wise delta calculation is supported. It can
-be useful when compressing e.g. uncompressed bitmap images or uncompressed
-PCM audio. However, special purpose algorithms may give significantly better
-results than Delta + LZMA2. This is true especially with audio, which
-compresses faster and better e.g. with FLAC.
-.IP
+Add the Delta filter to the filter chain.
+The Delta filter can be used only as a non-last filter
+in the filter chain.
+.IP ""
+Currently only simple byte-wise delta calculation is supported.
+It can be useful when compressing e.g. uncompressed bitmap images
+or uncompressed PCM audio.
+However, special purpose algorithms may give significantly better
+results than Delta + LZMA2.
+This is true especially with audio,
+which compresses faster and better e.g. with
+.BR flac (1).
+.IP ""
Supported
.IR options :
.RS
@@ -1046,99 +1440,111 @@
.BI dist= distance
Specify the
.I distance
-of the delta calculation as bytes.
+of the delta calculation in bytes.
.I distance
-must be 1\-256. The default is 1.
-.IP
+must be 1\-256.
+The default is 1.
+.IP ""
For example, with
.B dist=2
and eight-byte input A1 B1 A2 B3 A3 B5 A4 B7, the output will be
A1 B1 01 02 01 02 01 02.
.RE
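The `dist=2` example above can be reproduced with a minimal byte-wise delta sketch in Python. The function names are hypothetical and the code is not the liblzma implementation; it assumes, consistently with the example, that bytes before the start of the input count as zero.

```python
def delta_encode(data: bytes, dist: int = 1) -> bytes:
    """Subtract from each byte the byte dist positions earlier (mod 256)."""
    if not 1 <= dist <= 256:
        raise ValueError("dist must be 1-256")
    out = bytearray(len(data))
    for i, b in enumerate(data):
        prev = data[i - dist] if i >= dist else 0  # assumed zero history
        out[i] = (b - prev) & 0xFF
    return bytes(out)


def delta_decode(data: bytes, dist: int = 1) -> bytes:
    """Inverse of delta_encode: add back the already decoded byte."""
    if not 1 <= dist <= 256:
        raise ValueError("dist must be 1-256")
    out = bytearray(len(data))
    for i, b in enumerate(data):
        prev = out[i - dist] if i >= dist else 0
        out[i] = (b + prev) & 0xFF
    return bytes(out)
```

Encoding the bytes A1 B1 A2 B3 A3 B5 A4 B7 with `dist=2` yields A1 B1 01 02 01 02 01 02, and decoding restores the input.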
+.
.SS "Other options"
.TP
.BR \-q ", " \-\-quiet
-Suppress warnings and notices. Specify this twice to suppress errors too.
-This option has no effect on the exit status. That is, even if a warning
-was suppressed, the exit status to indicate a warning is still used.
+Suppress warnings and notices.
+Specify this twice to suppress errors too.
+This option has no effect on the exit status.
+That is, even if a warning was suppressed,
+the exit status to indicate a warning is still used.
.TP
.BR \-v ", " \-\-verbose
-Be verbose. If standard error is connected to a terminal,
+Be verbose.
+If standard error is connected to a terminal,
.B xz
will display a progress indicator.
Specifying
.B \-\-verbose
-twice will give even more verbose output (useful mostly for debugging).
-.IP
+twice will give even more verbose output.
+.IP ""
The progress indicator shows the following information:
.RS
.IP \(bu 3
-Completion percentage is shown if the size of the input file is known.
-That is, percentage cannot be shown in pipes.
+Completion percentage is shown
+if the size of the input file is known.
+That is, the percentage cannot be shown in pipes.
.IP \(bu 3
-Amount of compressed data produced (compressing) or consumed (decompressing).
+Amount of compressed data produced (compressing)
+or consumed (decompressing).
.IP \(bu 3
-Amount of uncompressed data consumed (compressing) or produced
-(decompressing).
+Amount of uncompressed data consumed (compressing)
+or produced (decompressing).
.IP \(bu 3
-Compression ratio, which is calculated by dividing the amount of
-compressed data processed so far by the amount of uncompressed data
-processed so far.
+Compression ratio, which is calculated by dividing
+the amount of compressed data processed so far by
+the amount of uncompressed data processed so far.
.IP \(bu 3
-Compression or decompression speed. This is measured as the amount of
-uncompressed data consumed (compression) or produced (decompression)
-per second. It is shown once a few seconds have passed since
+Compression or decompression speed.
+This is measured as the amount of uncompressed data consumed
+(compression) or produced (decompression) per second.
+It is shown after a few seconds have passed since
.B xz
started processing the file.
.IP \(bu 3
-Elapsed time or estimated time remaining.
-Elapsed time is displayed in the format M:SS or H:MM:SS.
-The estimated remaining time is displayed in a less precise format
-which never has colons, for example, 2 min 30 s. The estimate can
-be shown only when the size of the input file is known and a couple of
-seconds have already passed since
+Elapsed time in the format M:SS or H:MM:SS.
+.IP \(bu 3
+Estimated remaining time is shown
+only when the size of the input file is
+known and a couple of seconds have already passed since
.B xz
started processing the file.
+The time is shown in a less precise format which
+never has any colons, e.g. 2 min 30 s.
.RE
-.IP
+.IP ""
When standard error is not a terminal,
.B \-\-verbose
will make
.B xz
-print the filename, compressed size, uncompressed size, compression ratio,
-speed, and elapsed time on a single line to standard error after
-compressing or decompressing the file. If operating took at least a few
-seconds, also the speed and elapsed time are printed. If the operation
-didn't finish, for example due to user interruption, also the completion
-percentage is printed if the size of the input file is known.
+print the filename, compressed size, uncompressed size,
+compression ratio, and possibly also the speed and elapsed time
+on a single line to standard error after compressing or
+decompressing the file.
+The speed and elapsed time are included only when
+the operation took at least a few seconds.
+If the operation didn't finish, e.g. due to user interruption,
+also the completion percentage is printed
+if the size of the input file is known.
.TP
.BR \-Q ", " \-\-no\-warn
-Don't set the exit status to
-.B 2
-even if a condition worth a warning was detected. This option doesn't affect
-the verbosity level, thus both
+Don't set the exit status to 2
+even if a condition worth a warning was detected.
+This option doesn't affect the verbosity level, thus both
.B \-\-quiet
and
.B \-\-no\-warn
-have to be used to not display warnings and to not alter the exit status.
+have to be used to not display warnings and
+to not alter the exit status.
.TP
.B \-\-robot
-Print messages in a machine-parsable format. This is intended to ease
-writing frontends that want to use
+Print messages in a machine-parsable format.
+This is intended to ease writing frontends that want to use
.B xz
-instead of liblzma, which may be the case with various scripts. The output
-with this option enabled is meant to be stable across
+instead of liblzma, which may be the case with various scripts.
+The output with this option enabled is meant to be stable across
.B xz
-releases. See the section
+releases.
+See the section
.B "ROBOT MODE"
for details.
.TP
-.BR \-\-info-memory
-Display the current memory usage limit in human-readable format on
-a single line, and exit successfully. To see how much RAM
+.BR \-\-info\-memory
+Display, in human-readable format, how much physical memory (RAM)
.B xz
-thinks your system has, use
-.BR "\-\-memory=100% \-\-info\-memory" .
+thinks the system has and the memory usage limits for compression
+and decompression, and exit successfully.
.TP
.BR \-h ", " \-\-help
Display a help message describing the most commonly used options,
@@ -1152,24 +1558,29 @@
.BR \-V ", " \-\-version
Display the version number of
.B xz
-and liblzma in human readable format. To get machine-parsable output, specify
+and liblzma in human-readable format.
+To get machine-parsable output, specify
.B \-\-robot
before
.BR \-\-version .
-.SH ROBOT MODE
+.
+.SH "ROBOT MODE"
The robot mode is activated with the
.B \-\-robot
-option. It makes the output of
+option.
+It makes the output of
.B xz
-easier to parse by other programs. Currently
+easier to parse by other programs.
+Currently
.B \-\-robot
is supported only together with
.BR \-\-version ,
-.BR \-\-info-memory ,
+.BR \-\-info\-memory ,
and
.BR \-\-list .
-It will be supported for normal compression and decompression in the future.
-.PP
+It will be supported for normal compression and
+decompression in the future.
+.
.SS Version
.B "xz \-\-robot \-\-version"
will print the version number of
@@ -1184,24 +1595,19 @@
Major version.
.TP
.I YYY
-Minor version. Even numbers are stable.
+Minor version.
+Even numbers are stable.
Odd numbers are alpha or beta versions.
.TP
.I ZZZ
-Patch level for stable releases or just a counter for development releases.
+Patch level for stable releases or
+just a counter for development releases.
.TP
.I S
Stability.
-.B 0
-is alpha,
-.B 1
-is beta, and
-.B 2
-is stable.
+0 is alpha, 1 is beta, and 2 is stable.
.I S
-should be always
-.B 2
-when
+should always be 2 when
.I YYY
is even.
.PP
@@ -1215,31 +1621,48 @@
and
5.0.0 is
.BR 50000002 .
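A script that consumes the robot-mode version number could split it into its fields roughly as follows. This Python sketch assumes the digit layout implied by the field descriptions and the 5.0.0 example (one or more digits of major version, three of minor, three of patch level, one of stability); `parse_robot_version` is a hypothetical helper name.

```python
STABILITY = {0: "alpha", 1: "beta", 2: "stable"}


def parse_robot_version(number: int):
    """Split a version integer into (major, minor, patch, stability)."""
    stability = number % 10            # S
    patch = number // 10 % 1000        # ZZZ
    minor = number // 10_000 % 1000    # YYY
    major = number // 10_000_000       # X
    return major, minor, patch, STABILITY[stability]
```

With this layout, `parse_robot_version(50000002)` gives `(5, 0, 0, "stable")`.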
-.SS Memory limit information
-.B "xz \-\-robot \-\-info-memory"
-prints the current memory usage limit as bytes on a single line.
-To get the total amount of installed RAM, use
-.BR "xz \-\-robot \-\-memory=100% \-\-info-memory" .
-.SS List mode
+.
+.SS "Memory limit information"
+.B "xz \-\-robot \-\-info\-memory"
+prints a single line with three tab-separated columns:
+.IP 1. 4
+Total amount of physical memory (RAM) in bytes
+.IP 2. 4
+Memory usage limit for compression in bytes.
+A special value of zero indicates the default setting,
+which for single-threaded mode is the same as no limit.
+.IP 3. 4
+Memory usage limit for decompression in bytes.
+A special value of zero indicates the default setting,
+which for single-threaded mode is the same as no limit.
+.PP
+In the future, the output of
+.B "xz \-\-robot \-\-info\-memory"
+may have more columns, but never more than a single line.
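The three-column line described above could be parsed as in the following Python sketch. `parse_info_memory` and the dictionary keys are hypothetical names, and the sample line in the usage note is made up for illustration. Extra columns are ignored, since the text says more may be added later, and a value of zero is mapped to `None` to reflect "default setting".

```python
def parse_info_memory(line: str) -> dict:
    """Parse one line of `xz --robot --info-memory` output."""
    fields = line.rstrip("\n").split("\t")
    total, compress, decompress = (int(f) for f in fields[:3])
    return {
        "total_ram": total,
        # Zero means the default, which is currently the same as no limit.
        "compress_limit": compress or None,
        "decompress_limit": decompress or None,
    }
```

For a hypothetical output line `"4294967296\t0\t0\n"`, the result has `total_ram` set to 4294967296 and both limits set to `None`.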
+.
+.SS "List mode"
.B "xz \-\-robot \-\-list"
-uses tab-separated output. The first column of every line has a string
+uses tab-separated output.
+The first column of every line has a string
that indicates the type of the information found on that line:
.TP
.B name
-This is always the first line when starting to list a file. The second
-column on the line is the filename.
+This is always the first line when starting to list a file.
+The second column on the line is the filename.
.TP
.B file
This line contains overall information about the
.B .xz
-file. This line is always printed after the
+file.
+This line is always printed after the
.B name
line.
.TP
.B stream
This line type is used only when
.B \-\-verbose
-was specified. There are as many
+was specified.
+There are as many
.B stream
lines as there are streams in the
.B .xz
@@ -1248,11 +1671,13 @@
.B block
This line type is used only when
.B \-\-verbose
-was specified. There are as many
+was specified.
+There are as many
.B block
lines as there are blocks in the
.B .xz
-file. The
+file.
+The
.B block
lines are shown after all the
.B stream
@@ -1261,9 +1686,11 @@
.B summary
This line type is used only when
.B \-\-verbose
-was specified twice. This line is printed after all
+was specified twice.
+This line is printed after all
.B block
-lines. Like the
+lines.
+Like the
.B file
line, the
.B summary
@@ -1272,12 +1699,13 @@
file.
.TP
.B totals
-This line is always the very last line of the list output. It shows
-the total counts and sizes.
+This line is always the very last line of the list output.
+It shows the total counts and sizes.
.PP
The columns of the
.B file
lines:
+.PD 0
.RS
.IP 2. 4
Number of streams in the file
@@ -1294,8 +1722,8 @@
.RB ( \-\-\- )
are displayed instead of the ratio.
.IP 7. 4
-Comma-separated list of integrity check names. The following strings are
-used for the known check types:
+Comma-separated list of integrity check names.
+The following strings are used for the known check types:
.BR None ,
.BR CRC32 ,
.BR CRC64 ,
@@ -1309,10 +1737,12 @@
.IP 8. 4
Total size of stream padding in the file
.RE
+.PD
.PP
The columns of the
.B stream
lines:
+.PD 0
.RS
.IP 2. 4
Stream number (the first stream is 1)
@@ -1333,15 +1763,18 @@
.IP 10. 4
Size of stream padding
.RE
+.PD
.PP
The columns of the
.B block
lines:
+.PD 0
.RS
.IP 2. 4
Number of the stream containing this block
.IP 3. 4
-Block number relative to the beginning of the stream (the first block is 1)
+Block number relative to the beginning of the stream
+(the first block is 1)
.IP 4. 4
Block number relative to the beginning of the file
.IP 5. 4
@@ -1357,14 +1790,18 @@
.IP 10. 4
Name of the integrity check
.RE
+.PD
.PP
If
.B \-\-verbose
was specified twice, additional columns are included on the
.B block
-lines. These are not displayed with a single
+lines.
+These are not displayed with a single
.BR \-\-verbose ,
-because getting this information requires many seeks and can thus be slow:
+because getting this information requires many seeks
+and can thus be slow:
+.PD 0
.RS
.IP 11. 4
Value of the integrity check in hexadecimal
@@ -1378,26 +1815,30 @@
indicates that uncompressed size is present.
If the flag is not set, a dash
.RB ( \- )
-is shown instead to keep the string length fixed. New flags may be added
-to the end of the string in the future.
+is shown instead to keep the string length fixed.
+New flags may be added to the end of the string in the future.
.IP 14. 4
Size of the actual compressed data in the block (this excludes
the block header, block padding, and check fields)
.IP 15. 4
-Amount of memory (as bytes) required to decompress this block with this
+Amount of memory (in bytes) required to decompress
+this block with this
.B xz
version
.IP 16. 4
-Filter chain. Note that most of the options used at compression time cannot
-be known, because only the options that are needed for decompression are
-stored in the
+Filter chain.
+Note that most of the options used at compression time
+cannot be known, because only the options
+that are needed for decompression are stored in the
.B .xz
headers.
.RE
+.PD
.PP
The columns of the
.B totals
line:
+.PD 0
.RS
.IP 2. 4
Number of streams
@@ -1410,14 +1851,17 @@
.IP 6. 4
Average compression ratio
.IP 7. 4
-Comma-separated list of integrity check names that were present in the files
+Comma-separated list of integrity check names
+that were present in the files
.IP 8. 4
Stream padding size
.IP 9. 4
-Number of files. This is here to keep the order of the earlier columns
-the same as on
+Number of files.
+This is here to
+keep the order of the earlier columns the same as on
.B file
lines.
+.PD
.RE
.PP
If
@@ -1425,10 +1869,11 @@
was specified twice, additional columns are included on the
.B totals
line:
+.PD 0
.RS
.IP 10. 4
-Maximum amount of memory (as bytes) required to decompress the files
-with this
+Maximum amount of memory (in bytes) required to decompress
+the files with this
.B xz
version
.IP 11. 4
@@ -1438,9 +1883,12 @@
indicating if all block headers have both compressed size and
uncompressed size stored in them
.RE
+.PD
.PP
-Future versions may add new line types and new columns can be added to
-the existing line types, but the existing columns won't be changed.
+Future versions may add new line types and
+new columns can be added to the existing line types,
+but the existing columns won't be changed.
+.
.SH "EXIT STATUS"
.TP
.B 0
@@ -1450,21 +1898,76 @@
An error occurred.
.TP
.B 2
-Something worth a warning occurred, but no actual errors occurred.
+Something worth a warning occurred,
+but no actual errors occurred.
.PP
-Notices (not warnings or errors) printed on standard error don't affect
-the exit status.
+Notices (not warnings or errors) printed on standard error
+don't affect the exit status.
+.
.SH ENVIRONMENT
+.B xz
+parses space-separated lists of options
+from the environment variables
+.B XZ_DEFAULTS
+and
+.BR XZ_OPT ,
+in this order, before parsing the options from the command line.
+Note that only options are parsed from the environment variables;
+all non-options are silently ignored.
+Parsing is done with
+.BR getopt_long (3)
+which is used also for the command line arguments.
.TP
+.B XZ_DEFAULTS
+User-specific or system-wide default options.
+Typically this is set in a shell initialization script to enable
+.BR xz 's
+memory usage limiter by default.
+Excluding shell initialization scripts
+and similar special cases, scripts must never set or unset
+.BR XZ_DEFAULTS .
+.TP
.B XZ_OPT
-A space-separated list of options is parsed from
+This is for passing options to
+.B xz
+when it is not possible to set the options directly on the
+.B xz
+command line.
+This is the case e.g. when
+.B xz
+is run by a script or tool, e.g. GNU
+.BR tar (1):
+.RS
+.RS
+.PP
+.nf
+.ft CW
+XZ_OPT=\-2v tar caf foo.tar.xz foo
+.ft R
+.fi
+.RE
+.RE
+.IP ""
+Scripts may use
.B XZ_OPT
-before parsing the options given on the command line. Note that only
-options are parsed from
-.BR XZ_OPT ;
-all non-options are silently ignored. Parsing is done with
-.BR getopt_long (3)
-which is used also for the command line arguments.
+e.g. to set script-specific default compression options.
+It is still recommended to allow users to override
+.B XZ_OPT
+if that is reasonable, e.g. in
+.BR sh (1)
+scripts one may use something like this:
+.RS
+.RS
+.PP
+.nf
+.ft CW
+XZ_OPT=${XZ_OPT\-"\-7e"}
+export XZ_OPT
+.ft R
+.fi
+.RE
+.RE
+.
.SH "LZMA UTILS COMPATIBILITY"
The command line syntax of
.B xz
@@ -1473,26 +1976,32 @@
.BR unlzma ,
and
.BR lzcat
-as found from LZMA Utils 4.32.x. In most cases, it is possible to replace
-LZMA Utils with XZ Utils without breaking existing scripts. There are some
-incompatibilities though, which may sometimes cause problems.
+as found in LZMA Utils 4.32.x.
+In most cases, it is possible to replace
+LZMA Utils with XZ Utils without breaking existing scripts.
+There are some incompatibilities though,
+which may sometimes cause problems.
+.
.SS "Compression preset levels"
The numbering of the compression level presets is not identical in
.B xz
and LZMA Utils.
-The most important difference is how dictionary sizes are mapped to different
-presets. Dictionary size is roughly equal to the decompressor memory usage.
+The most important difference is how dictionary sizes
+are mapped to different presets.
+Dictionary size is roughly equal to the decompressor memory usage.
.RS
+.PP
.TS
tab(;);
c c c
c n n.
Level;xz;LZMA Utils
-\-1;64 KiB;64 KiB
-\-2;512 KiB;1 MiB
-\-3;1 MiB;512 KiB
-\-4;2 MiB;1 MiB
-\-5;4 MiB;2 MiB
+\-0;256 KiB;N/A
+\-1;1 MiB;64 KiB
+\-2;2 MiB;1 MiB
+\-3;4 MiB;512 KiB
+\-4;4 MiB;1 MiB
+\-5;8 MiB;2 MiB
\-6;8 MiB;4 MiB
\-7;16 MiB;8 MiB
\-8;32 MiB;16 MiB
@@ -1500,20 +2009,24 @@
.TE
.RE
.PP
-The dictionary size differences affect the compressor memory usage too,
-but there are some other differences between LZMA Utils and XZ Utils, which
+The dictionary size differences affect
+the compressor memory usage too,
+but there are some other differences between
+LZMA Utils and XZ Utils, which
make the difference even bigger:
.RS
+.PP
.TS
tab(;);
c c c
c n n.
Level;xz;LZMA Utils 4.32.x
-\-1;2 MiB;2 MiB
-\-2;5 MiB;12 MiB
-\-3;13 MiB;12 MiB
-\-4;25 MiB;16 MiB
-\-5;48 MiB;26 MiB
+\-0;3 MiB;N/A
+\-1;9 MiB;2 MiB
+\-2;17 MiB;12 MiB
+\-3;32 MiB;12 MiB
+\-4;48 MiB;16 MiB
+\-5;94 MiB;26 MiB
\-6;94 MiB;45 MiB
\-7;186 MiB;83 MiB
\-8;370 MiB;159 MiB
@@ -1525,33 +2038,40 @@
.B \-7
while in XZ Utils it is
.BR \-6 ,
-so both use 8 MiB dictionary by default.
+so both use an 8 MiB dictionary by default.
+.
.SS "Streamed vs. non-streamed .lzma files"
-Uncompressed size of the file can be stored in the
+The uncompressed size of the file can be stored in the
.B .lzma
-header. LZMA Utils does that when compressing regular files.
-The alternative is to mark that uncompressed size is unknown and
-use end of payload marker to indicate where the decompressor should stop.
-LZMA Utils uses this method when uncompressed size isn't known, which is
-the case for example in pipes.
+header.
+LZMA Utils does that when compressing regular files.
+The alternative is to mark that uncompressed size is unknown
+and use an end-of-payload marker to indicate
+where the decompressor should stop.
+LZMA Utils uses this method when uncompressed size isn't known,
+which is the case for example in pipes.
.PP
.B xz
supports decompressing
.B .lzma
-files with or without end of payload marker, but all
+files with or without an end-of-payload marker, but all
.B .lzma
files created by
.B xz
-will use end of payload marker and have uncompressed size marked as unknown
-in the
+will use an end-of-payload marker and have uncompressed size
+marked as unknown in the
.B .lzma
-header. This may be a problem in some (uncommon) situations. For example, a
+header.
+This may be a problem in some uncommon situations.
+For example, a
.B .lzma
-decompressor in an embedded device might work only with files that have known
-uncompressed size. If you hit this problem, you need to use LZMA Utils or
-LZMA SDK to create
+decompressor in an embedded device might work
+only with files that have known uncompressed size.
+If you hit this problem, you need to use LZMA Utils
+or LZMA SDK to create
.B .lzma
files with known uncompressed size.
+.
.SS "Unsupported .lzma files"
The
.B .lzma
@@ -1559,7 +2079,8 @@
.I lc
values up to 8, and
.I lp
-values up to 4. LZMA Utils can decompress files with any
+values up to 4.
+LZMA Utils can decompress files with any
.I lc
and
.IR lp ,
@@ -1575,24 +2096,25 @@
.B xz
and with LZMA SDK.
.PP
-The implementation of the LZMA1 filter in liblzma requires
-that the sum of
+The implementation of the LZMA1 filter in liblzma
+requires that the sum of
.I lc
and
.I lp
-must not exceed 4. Thus,
+must not exceed 4.
+Thus,
.B .lzma
-files which exceed this limitation, cannot be decompressed with
+files that exceed this limitation cannot be decompressed with
.BR xz .
.PP
LZMA Utils creates only
.B .lzma
-files which have dictionary size of
+files which have a dictionary size of
.RI "2^" n
-(a power of 2), but accepts files with any dictionary size.
+(a power of 2) but accepts files with any dictionary size.
liblzma accepts only
.B .lzma
-files which have dictionary size of
+files which have a dictionary size of
.RI "2^" n
or
.RI "2^" n " + 2^(" n "\-1)."
@@ -1600,13 +2122,18 @@
.B .lzma
files.
.PP
-These limitations shouldn't be a problem in practice, since practically all
+These limitations shouldn't be a problem in practice,
+since practically all
.B .lzma
files have been compressed with settings that liblzma will accept.
+.
.SS "Trailing garbage"
-When decompressing, LZMA Utils silently ignore everything after the first
+When decompressing,
+LZMA Utils silently ignore everything after the first
.B .lzma
-stream. In most situations, this is a bug. This also means that LZMA Utils
+stream.
+In most situations, this is a bug.
+This also means that LZMA Utils
don't support decompressing concatenated
.B .lzma
files.
@@ -1615,34 +2142,46 @@
.B .lzma
stream,
.B xz
-considers the file to be corrupt. This may break obscure scripts which have
+considers the file to be corrupt.
+This may break obscure scripts which have
assumed that trailing garbage is ignored.
+.
.SH NOTES
-.SS Compressed output may vary
-The exact compressed output produced from the same uncompressed input file
-may vary between XZ Utils versions even if compression options are identical.
-This is because the encoder can be improved (faster or better compression)
-without affecting the file format. The output can vary even between different
-builds of the same XZ Utils version, if different build options are used.
+.
+.SS "Compressed output may vary"
+The exact compressed output produced from
+the same uncompressed input file
+may vary between XZ Utils versions even if
+compression options are identical.
+This is because the encoder can be improved
+(faster or better compression)
+without affecting the file format.
+The output can vary even between different
+builds of the same XZ Utils version,
+if different build options are used.
.PP
The above means that implementing
.B \-\-rsyncable
to create rsyncable
.B .xz
-files is not going to happen without freezing a part of the encoder
+files is not going to happen without
+freezing a part of the encoder
implementation, which can then be used with
.BR \-\-rsyncable .
-.SS Embedded .xz decompressors
+.
+.SS "Embedded .xz decompressors"
Embedded
.B .xz
-decompressor implementations like XZ Embedded don't necessarily support files
-created with
+decompressor implementations like XZ Embedded don't necessarily
+support files created with integrity
.I check
types other than
.B none
and
.BR crc32 .
-Since the default is \fB\-\-check=\fIcrc64\fR, you must use
+Since the default is
+.BR \-\-check=crc64 ,
+you must use
.B \-\-check=none
or
.B \-\-check=crc32
@@ -1652,53 +2191,374 @@
.B .xz
format decompressors support all the
.I check
-types, or at least are able to decompress the file without verifying the
+types, or at least are able to decompress
+the file without verifying the
integrity check if the particular
.I check
is not supported.
.PP
-XZ Embedded supports BCJ filters, but only with the default start offset.
+XZ Embedded supports BCJ filters,
+but only with the default start offset.
+.
.SH EXAMPLES
+.
.SS Basics
+Compress the file
+.I foo
+into
+.I foo.xz
+using the default compression level
+.RB ( \-6 ),
+and remove
+.I foo
+if compression is successful:
+.RS
+.PP
+.nf
+.ft CW
+xz foo
+.ft R
+.fi
+.RE
+.PP
+Decompress
+.I bar.xz
+into
+.I bar
+and don't remove
+.I bar.xz
+even if decompression is successful:
+.RS
+.PP
+.nf
+.ft CW
+xz \-dk bar.xz
+.ft R
+.fi
+.RE
+.PP
+Create
+.I baz.tar.xz
+with the preset
+.B \-4e
+.RB ( "\-4 \-\-extreme" ),
+which is slower than e.g. the default
+.BR \-6 ,
+but needs less memory for compression and decompression (48\ MiB
+and 5\ MiB, respectively):
+.RS
+.PP
+.nf
+.ft CW
+tar cf \- baz | xz \-4e > baz.tar.xz
+.ft R
+.fi
+.RE
+.PP
A mix of compressed and uncompressed files can be decompressed
to standard output with a single command:
-.IP
-.B "xz -dcf a.txt b.txt.xz c.txt d.txt.xz > abcd.txt"
-.SS Parallel compression of many files
+.RS
+.PP
+.nf
+.ft CW
+xz \-dcf a.txt b.txt.xz c.txt d.txt.lzma > abcd.txt
+.ft R
+.fi
+.RE
+.
+.SS "Parallel compression of many files"
On GNU and *BSD,
.BR find (1)
and
.BR xargs (1)
-can be used to parallellize compression of many files:
+can be used to parallelize compression of many files:
+.RS
.PP
-.IP
-.B "find . \-type f \e! \-name '*.xz' \-print0 | xargs \-0r \-P4 \-n16 xz"
+.nf
+.ft CW
+find . \-type f \e! \-name '*.xz' \-print0 \e
+ | xargs \-0r \-P4 \-n16 xz \-T1
+.ft R
+.fi
+.RE
.PP
The
.B \-P
-option sets the number of parallel
+option to
+.BR xargs (1)
+sets the number of parallel
.B xz
-processes. The best value for the
+processes.
+The best value for the
.B \-n
option depends on how many files there are to be compressed.
-If there are only a couple of files, the value should probably be
-.BR 1 ;
+If there are only a couple of files,
+the value should probably be 1;
with tens of thousands of files,
-.B 100
-or even more may be appropriate to reduce the number of
+100 or even more may be appropriate to reduce the number of
.B xz
processes that
.BR xargs (1)
will eventually create.
-.SS Robot mode examples
-Calculating how many bytes have been saved in total after compressing
-multiple files:
-.IP
-.B "xz --robot --list *.xz | awk '/^totals/{print $5\-$4}'"
+.PP
+The option
+.B \-T1
+for
+.B xz
+is there to force it to single-threaded mode, because
+.BR xargs (1)
+is used to control the amount of parallelization.
+.
+.SS "Robot mode"
+Calculate how many bytes have been saved in total
+after compressing multiple files:
+.RS
+.PP
+.nf
+.ft CW
+xz \-\-robot \-\-list *.xz | awk '/^totals/{print $5\-$4}'
+.ft R
+.fi
+.RE
+.PP
+A script may want to know that it is using a new enough
+.BR xz .
+The following
+.BR sh (1)
+script checks that the version number of the
+.B xz
+tool is at least 5.0.0.
+This method is compatible with old beta versions,
+which didn't support the
+.B \-\-robot
+option:
+.RS
+.PP
+.nf
+.ft CW
+if ! eval "$(xz \-\-robot \-\-version 2> /dev/null)" ||
+ [ "$XZ_VERSION" \-lt 50000002 ]; then
+ echo "Your xz is too old."
+fi
+unset XZ_VERSION LIBLZMA_VERSION
+.ft R
+.fi
+.RE
+.PP
+Set a memory usage limit for decompression using
+.BR XZ_OPT ,
+but if a limit has already been set, don't increase it:
+.RS
+.PP
+.nf
+.ft CW
+NEWLIM=$((123 << 20)) # 123 MiB
+OLDLIM=$(xz \-\-robot \-\-info\-memory | cut \-f3)
+if [ $OLDLIM \-eq 0 \-o $OLDLIM \-gt $NEWLIM ]; then
+ XZ_OPT="$XZ_OPT \-\-memlimit\-decompress=$NEWLIM"
+ export XZ_OPT
+fi
+.ft R
+.fi
+.RE
+.
+.SS "Custom compressor filter chains"
+The simplest use for custom filter chains is
+customizing an LZMA2 preset.
+This can be useful,
+because the presets cover only a subset of the
+potentially useful combinations of compression settings.
+.PP
+The CompCPU columns of the tables
+from the descriptions of the options
+.BR "\-0" " ... " "\-9"
+and
+.B \-\-extreme
+are useful when customizing LZMA2 presets.
+Here are the relevant parts collected from those two tables:
+.RS
+.PP
+.TS
+tab(;);
+c c
+n n.
+Preset;CompCPU
+\-0;0
+\-1;1
+\-2;2
+\-3;3
+\-4;4
+\-5;5
+\-6;6
+\-5e;7
+\-6e;8
+.TE
+.RE
+.PP
+If you know that a file requires
+a somewhat big dictionary (e.g. 32 MiB) to compress well,
+but you want to compress it quicker than
+.B "xz \-8"
+would do, a preset with a low CompCPU value (e.g. 1)
+can be modified to use a bigger dictionary:
+.RS
+.PP
+.nf
+.ft CW
+xz \-\-lzma2=preset=1,dict=32MiB foo.tar
+.ft R
+.fi
+.RE
+.PP
+With certain files, the above command may be faster than
+.B "xz \-6"
+while compressing significantly better.
+However, it must be emphasized that only some files benefit from
+a big dictionary while keeping the CompCPU value low.
+The most obvious situation,
+where a big dictionary can help a lot,
+is an archive containing very similar files
+of at least a few megabytes each.
+The dictionary size has to be significantly bigger
+than any individual file to allow LZMA2 to take
+full advantage of the similarities between consecutive files.
+.PP
+If very high compressor and decompressor memory usage is fine,
+and the file being compressed is
+at least several hundred megabytes, it may be useful
+to use an even bigger dictionary than the 64 MiB that
+.B "xz \-9"
+would use:
+.RS
+.PP
+.nf
+.ft CW
+xz \-vv \-\-lzma2=dict=192MiB big_foo.tar
+.ft R
+.fi
+.RE
+.PP
+Using
+.B \-vv
+.RB ( "\-\-verbose \-\-verbose" )
+like in the above example can be useful
+to see the memory requirements
+of the compressor and decompressor.
+Remember that using a dictionary bigger than
+the size of the uncompressed file is a waste of memory,
+so the above command isn't useful for small files.
+.PP
+Sometimes the compression time doesn't matter,
+but the decompressor memory usage has to be kept low
+e.g. to make it possible to decompress the file on
+an embedded system.
+The following command uses
+.B \-6e
+.RB ( "\-6 \-\-extreme" )
+as a base and sets the dictionary to only 64\ KiB.
+The resulting file can be decompressed with XZ Embedded
+(that's why there is
+.BR \-\-check=crc32 )
+using about 100\ KiB of memory.
+.RS
+.PP
+.nf
+.ft CW
+xz \-\-check=crc32 \-\-lzma2=preset=6e,dict=64KiB foo
+.ft R
+.fi
+.RE
+.PP
+If you want to squeeze out as many bytes as possible,
+adjusting the number of literal context bits
+.RI ( lc )
+and number of position bits
+.RI ( pb )
+can sometimes help.
+Adjusting the number of literal position bits
+.RI ( lp )
+might help too, but usually
+.I lc
+and
+.I pb
+are more important.
+E.g. a source code archive contains mostly US-ASCII text,
+so something like the following might give
+a slightly (about 0.1\ %) smaller file than
+.B "xz \-6e"
+(try also without
+.BR lc=4 ):
+.RS
+.PP
+.nf
+.ft CW
+xz \-\-lzma2=preset=6e,pb=0,lc=4 source_code.tar
+.ft R
+.fi
+.RE
+.PP
+Using another filter together with LZMA2 can improve
+compression with certain file types.
+E.g. to compress an x86-32 or x86-64 shared library
+using the x86 BCJ filter:
+.RS
+.PP
+.nf
+.ft CW
+xz \-\-x86 \-\-lzma2 libfoo.so
+.ft R
+.fi
+.RE
+.PP
+Note that the order of the filter options is significant.
+If
+.B \-\-x86
+is specified after
+.BR \-\-lzma2 ,
+.B xz
+will give an error,
+because there cannot be any filter after LZMA2,
+and also because the x86 BCJ filter cannot be used
+as the last filter in the chain.
+.PP
+The Delta filter together with LZMA2
+can give good results with bitmap images.
+It should usually beat PNG,
+which has a few more advanced filters than simple
+delta but uses Deflate for the actual compression.
+.PP
+The image has to be saved in uncompressed format,
+e.g. as uncompressed TIFF.
+The distance parameter of the Delta filter is set
+to match the number of bytes per pixel in the image.
+E.g. a 24-bit RGB bitmap needs
+.BR dist=3 ,
+and it is also good to pass
+.B pb=0
+to LZMA2 to accommodate the three-byte alignment:
+.RS
+.PP
+.nf
+.ft CW
+xz \-\-delta=dist=3 \-\-lzma2=pb=0 foo.tiff
+.ft R
+.fi
+.RE
+.PP
+If multiple images have been put into a single archive (e.g.\&
+.BR .tar ),
+the Delta filter will work on that too as long as all images
+have the same number of bytes per pixel.
+.
.SH "SEE ALSO"
.BR xzdec (1),
+.BR xzdiff (1),
+.BR xzgrep (1),
+.BR xzless (1),
+.BR xzmore (1),
.BR gzip (1),
-.BR bzip2 (1)
+.BR bzip2 (1),
+.BR 7z (1)
.PP
XZ Utils: <http://tukaani.org/xz/>
.br
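
Note on the robot-mode "totals" line documented in this patch: the sketch below shows one way a script might pull several columns out of that line at once, rather than only the bytes-saved figure from the awk example in the EXAMPLES section. It assumes the robot output is tab-separated, that column 2 is the number of streams and column 9 the number of files (as listed in the patch), and that columns 4 and 5 are the compressed and uncompressed sizes (inferred from the `$5-$4` expression in the existing example). The function name `summarize_totals` and the sample line are made up for illustration; they are not real xz output.

```shell
# Hypothetical helper: read "xz --robot --list" output on stdin and
# summarize the totals line. Assumes tab-separated columns.
summarize_totals() {
    awk -F '\t' '$1 == "totals" {
        # $2 = streams, $4 = compressed size, $5 = uncompressed size,
        # $9 = number of files (column meanings as assumed above)
        printf "files=%d streams=%d saved=%d\n", $9, $2, $5 - $4
    }'
}

# Made-up sample line standing in for real xz output.
printf 'totals\t2\t2\t1000\t4000\t0.250\tCRC64\t0\t2\n' | summarize_totals
```

In real use the input would come from something like `xz --robot --list *.xz | summarize_totals`. Matching on the first field rather than on a `/^totals/` regex avoids accidentally matching a `name` line whose filename happens to contain the word.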