src/xz/xz.1 - Issue 7109015: Update XZ Utils to 5.0.3 (in deps)

Unified Diff: src/xz/xz.1

Issue 7109015: Update XZ Utils to 5.0.3 (in deps) (Closed) Base URL: svn://svn.chromium.org/chrome/trunk/deps/third_party/xz/

Patch Set: Created 9 years, 7 months ago

Index: src/xz/xz.1

===================================================================

--- src/xz/xz.1 (revision 50504)

+++ src/xz/xz.1 (working copy)

@@ -5,9 +5,11 @@

.\" This file has been put into the public domain.

.\" You can do whatever you want with this file.

.\"

-.TH XZ 1 "2010-06-15" "Tukaani" "XZ Utils"

+.TH XZ 1 "2010-10-04" "Tukaani" "XZ Utils"

.SH NAME

xz, unxz, xzcat, lzma, unlzma, lzcat \- Compress or decompress .xz and .lzma files

.SH SYNOPSIS

.B xz

.RI [ option ]...

@@ -33,8 +35,8 @@

is equivalent to

.BR "xz \-\-format=lzma \-\-decompress \-\-stdout" .

.PP

-When writing scripts that need to decompress files, it is recommended to

-always use the name

+When writing scripts that need to decompress files,

+it is recommended to always use the name

.B xz

with appropriate arguments

.RB ( "xz \-d"

@@ -43,19 +45,22 @@

instead of the names

.B unxz

and

-.BR xzcat.

+.BR xzcat .

.SH DESCRIPTION

.B xz

-is a general-purpose data compression tool with command line syntax similar to

+is a general-purpose data compression tool with

+command line syntax similar to

.BR gzip (1)

and

.BR bzip2 (1).

The native file format is the

.B .xz

-format, but also the legacy

+format, but the legacy

.B .lzma

-format and raw compressed streams with no container format headers

-are supported.

+format used by LZMA Utils and

+raw compressed streams with no container format headers

+are also supported.

.PP

.B xz

compresses or decompresses each

@@ -68,13 +73,16 @@

.BR \- ,

.B xz

-reads from standard input and writes the processed data to standard output.

+reads from standard input and writes the processed data

+to standard output.

.B xz

will refuse (display an error and skip the

.IR file )

-to write compressed data to standard output if it is a terminal. Similarly,

+to write compressed data to standard output if it is a terminal.

+Similarly,

.B xz

-will refuse to read compressed data from standard input if it is a terminal.

+will refuse to read compressed data

+from standard input if it is a terminal.

.PP

Unless

.B \-\-stdout

@@ -117,8 +125,9 @@

if any of the following applies:

.IP \(bu 3

.I File

-is not a regular file. Symbolic links are not followed, thus they

-are not considered to be regular files.

+is not a regular file.

+Symbolic links are not followed,

+and thus they are not considered to be regular files.

.IP \(bu 3

.I File

has more than one hard link.

@@ -126,7 +135,7 @@

.I File

has setuid, setgid, or sticky bit set.

.IP \(bu 3

-The operation mode is set to compress, and the

+The operation mode is set to compress and the

.I file

already has a suffix of the target file format

.RB ( .xz

@@ -142,7 +151,7 @@

.B .lzma

format).

.IP \(bu 3

-The operation mode is set to decompress, and the

+The operation mode is set to decompress and the

.I file

doesn't have a suffix of any of the supported file formats

.RB ( .xz ,

@@ -154,12 +163,13 @@

After successfully compressing or decompressing the

.IR file ,

.B xz

-copies the owner, group, permissions, access time, and modification time

-from the source

+copies the owner, group, permissions, access time,

+and modification time from the source

.I file

-to the target file. If copying the group fails, the permissions are modified

-so that the target file doesn't become accessible to users who didn't have

-permission to access the source

+to the target file.

+If copying the group fails, the permissions are modified

+so that the target file doesn't become accessible to users

+who didn't have permission to access the source

.IR file .

.B xz

doesn't support copying other metadata like access control lists

@@ -169,7 +179,8 @@

.I file

is removed unless

.B \-\-keep

-was specified. The source

+was specified.

+The source

.I file

is never removed if the output is written to standard output.

.PP

@@ -180,61 +191,78 @@

to the

.B xz

process makes it print progress information to standard error.

-This has only limited use since when standard error is a terminal, using

+This has only limited use since when standard error

+is a terminal, using

.B \-\-verbose

will display an automatically updating progress indicator.

.SS "Memory usage"

The memory usage of

.B xz

-varies from a few hundred kilobytes to several gigabytes depending on

-the compression settings. The settings used when compressing a file

-affect also the memory usage of the decompressor. Typically the decompressor

-needs only 5\ % to 20\ % of the amount of RAM that the compressor needed when

-creating the file. Still, the worst-case memory usage of the decompressor

-is several gigabytes.

+varies from a few hundred kilobytes to several gigabytes

+depending on the compression settings.

+The settings used when compressing a file determine

+the memory requirements of the decompressor.

+Typically the decompressor needs 5\ % to 20\ % of

+the amount of memory that the compressor needed when

+creating the file.

+For example, decompressing a file created with

+.B xz \-9

+currently requires 65\ MiB of memory.

+Still, it is possible to have

+.B .xz

+files that require several gigabytes of memory to decompress.

.PP

-To prevent uncomfortable surprises caused by huge memory usage,

+Especially users of older systems may find

+the possibility of very large memory usage annoying.

+To prevent uncomfortable surprises,

.B xz

-has a built-in memory usage limiter. While some operating systems provide

-ways to limit the memory usage of processes, relying on it wasn't deemed

-to be flexible enough. The default limit depends on the total amount of

-physical RAM:

-.IP \(bu 3

-If 40\ % of RAM is at least 80 MiB, 40\ % of RAM is used as the limit.

-.IP \(bu 3

-If 80\ % of RAM is less than 80 MiB, 80\ % of RAM is used as the limit.

-.IP \(bu 3

-Otherwise 80 MiB is used as the limit.

+has a built-in memory usage limiter, which is disabled by default.

+While some operating systems provide ways to limit

+the memory usage of processes, relying on it

+wasn't deemed to be flexible enough (e.g. using

+.BR ulimit (1)

+to limit virtual memory tends to cripple

+.BR mmap (2)).

.PP

-When compressing, if the selected compression settings exceed the memory

-usage limit, the settings are automatically adjusted downwards and a notice

-about this is displayed. As an exception, if the memory usage limit is

-exceeded when compressing with

-.B \-\-format=raw

-or

-.BR \-\-no\-adjust ,

-an error is displayed and

+The memory usage limiter can be enabled with

+the command line option \fB\-\-memlimit=\fIlimit\fR.

+Often it is more convenient to enable the limiter

+by default by setting the environment variable

+.BR XZ_DEFAULTS ,

+e.g.\&

+.BR XZ_DEFAULTS=\-\-memlimit=150MiB .

+It is possible to set the limits separately

+for compression and decompression

+by using \fB\-\-memlimit\-compress=\fIlimit\fR and

+\fB\-\-memlimit\-decompress=\fIlimit\fR.

+Using these two options outside

+.B XZ_DEFAULTS

+is rarely useful because a single run of

.B xz

-will exit with exit status

-.BR 1 .

+cannot do both compression and decompression and

+.BI \-\-memlimit= limit

+(or \fB\-M\fR \fIlimit\fR)

+is shorter to type on the command line.

.PP

-If source

-.I file

-cannot be decompressed without exceeding the memory usage limit, an error

-message is displayed and the file is skipped. Note that compressed files

-may contain many blocks, which may have been compressed with different

-settings. Typically all blocks will have roughly the same memory requirements,

-but it is possible that a block later in the file will exceed the memory usage

-limit, and an error about too low memory usage limit gets displayed after some

-data has already been decompressed.

-.PP

-The absolute value of the active memory usage limit can be seen with

-.B \-\-info-memory

-or near the bottom of the output of

-.BR \-\-long\-help .

-The default limit can be overridden with

-\fB\-\-memory=\fIlimit\fR.

-.SS Concatenation and padding with .xz files

+If the specified memory usage limit is exceeded when decompressing,

+.B xz

+will display an error and decompressing the file will fail.

+If the limit is exceeded when compressing,

+.B xz

+will try to scale the settings down so that the limit

+is no longer exceeded (except when using \fB\-\-format=raw\fR

+or \fB\-\-no\-adjust\fR).

+This way the operation won't fail unless the limit is very small.

+The scaling of the settings is done in steps that don't

+match the compression level presets, e.g. if the limit is

+only slightly less than the amount required for

+.BR "xz \-9" ,

+the settings will be scaled down only a little,

+not all the way down to

+.BR "xz \-8" .
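
The decompression side of the limiter can be tried from Python. The standard `lzma` module wraps liblzma and accepts a `memlimit` argument. The sketch below assumes liblzma's preset 3 behaves like `xz \-3` in the preset table further down this patch (4 MiB dictionary, about 5 MiB to decompress). `try_decompress` is a hypothetical helper, not part of xz.

```python
import lzma

MiB = 1024 * 1024

# Preset 3 stores a 4 MiB dictionary size in the .xz headers, so the
# decompressor needs a little more than 4 MiB no matter how small the data is.
data = lzma.compress(b"example payload " * 64, format=lzma.FORMAT_XZ, preset=3)

def try_decompress(blob: bytes, limit: int):
    """Return the decompressed bytes, or None if liblzma refuses to
    decompress within `limit` bytes of memory."""
    try:
        return lzma.decompress(blob, memlimit=limit)
    except lzma.LZMAError:
        return None

print(try_decompress(data, 2 * MiB))               # None: limit too low
print(try_decompress(data, 16 * MiB) is not None)  # True
```

As with `xz`, the requirement comes from the settings recorded at compression time, not from the size of the data.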

+.SS "Concatenation and padding with .xz files"

It is possible to concatenate

.B .xz

files as is.

@@ -243,23 +271,28 @@

.B .xz

file.

.PP

-It is possible to insert padding between the concenated parts

-or after the last part. The padding must be null bytes and the size

-of the padding must be a multiple of four bytes. This can be useful

-if the .xz file is stored on a medium that stores file sizes

-e.g. as 512-byte blocks.

+It is possible to insert padding between the concatenated parts

+or after the last part.

+The padding must consist of null bytes and the size

+of the padding must be a multiple of four bytes.

+This can be useful e.g. if the

+.B .xz

+file is stored on a medium that measures file sizes

+in 512-byte blocks.

.PP

Concatenation and padding are not allowed with

.B .lzma

files or raw streams.
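
The concatenation and padding rules can be illustrated with Python's `lzma` module. `padding_for` is a hypothetical helper that computes how many null bytes bring a file up to a 512-byte boundary. It relies on the fact that every .xz stream is a multiple of four bytes long, which keeps the padding a multiple of four. The padded file is not decoded here because `lzma.decompress` is only a convenience wrapper and is not assumed to handle stream padding the way `xz` does.

```python
import lzma

def padding_for(size: int, block: int = 512) -> int:
    """Null bytes needed after `size` bytes of .xz data to reach the next
    multiple of `block`. The padding must be a multiple of four bytes,
    so both sizes are required to be multiples of four."""
    if size % 4 or block % 4:
        raise ValueError("sizes must be multiples of four bytes")
    return -size % block

part1 = lzma.compress(b"first part\n", format=lzma.FORMAT_XZ, preset=0)
part2 = lzma.compress(b"second part\n", format=lzma.FORMAT_XZ, preset=0)
joined = part1 + part2  # concatenated as is

print(lzma.decompress(joined))  # b'first part\nsecond part\n'
padded = joined + b"\0" * padding_for(len(joined))
print(len(padded) % 512 == 0)   # True
```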

.SH OPTIONS

.SS "Integer suffixes and special values"

-In most places where an integer argument is expected, an optional suffix

-is supported to easily indicate large integers. There must be no space

-between the integer and the suffix.

+In most places where an integer argument is expected,

+an optional suffix is supported to easily indicate large integers.

+There must be no space between the integer and the suffix.

.TP

.B KiB

-The integer is multiplied by 1,024 (2^10). Also

+Multiply the integer by 1,024 (2^10).

.BR Ki ,

.BR k ,

.BR kB ,

@@ -270,7 +303,7 @@

.BR KiB .

.TP

.B MiB

-The integer is multiplied by 1,048,576 (2^20). Also

+Multiply the integer by 1,048,576 (2^20).

.BR Mi ,

.BR m ,

.BR M ,

@@ -280,7 +313,7 @@

.BR MiB .

.TP

.B GiB

-The integer is multiplied by 1,073,741,824 (2^30). Also

+Multiply the integer by 1,073,741,824 (2^30).

.BR Gi ,

.BR g ,

.BR G ,

@@ -289,16 +322,20 @@

are accepted as synonyms for

.BR GiB .

.PP

-A special value

+The special value

.B max

-can be used to indicate the maximum integer value supported by the option.

+can be used to indicate the maximum integer value

+supported by the option.

.SS "Operation mode"

-If multiple operation mode options are given, the last one takes effect.

+If multiple operation mode options are given,

+the last one takes effect.

.TP

.BR \-z ", " \-\-compress

-Compress. This is the default operation mode when no operation mode option

-is specified, and no other operation mode is implied from the command name

-(for example,

+Compress.

+This is the default operation mode when no operation mode option

+is specified and no other operation mode is implied from

+the command name (for example,

.B unxz

implies

.BR \-\-decompress ).
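
The integer suffix rules above can be sketched as a small parser. `parse_size` is a hypothetical helper, not xz's own code. Its suffix table holds only the synonyms visible in this diff, because the hunks elide part of each list.

```python
import re

# Suffixes visible in this diff; the full synonym lists are partly elided.
SUFFIXES = {
    "": 1,
    "KiB": 2**10, "Ki": 2**10, "k": 2**10, "kB": 2**10,
    "MiB": 2**20, "Mi": 2**20, "m": 2**20, "M": 2**20,
    "GiB": 2**30, "Gi": 2**30, "g": 2**30, "G": 2**30,
}

def parse_size(text: str, max_value: int) -> int:
    """Parse an argument such as "80MiB" or "max"."""
    if text == "max":
        return max_value  # the maximum value supported by the option
    # No space is allowed between the integer and the suffix.
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", text)
    if match is None or match.group(2) not in SUFFIXES:
        raise ValueError(f"invalid integer argument: {text!r}")
    value = int(match.group(1)) * SUFFIXES[match.group(2)]
    if value > max_value:
        raise ValueError(f"value out of range: {text!r}")
    return value
```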

@@ -309,62 +346,73 @@

.BR \-t ", " \-\-test

Test the integrity of compressed

.IR files .

-No files are created or removed. This option is equivalent to

+This option is equivalent to

.B "\-\-decompress \-\-stdout"

except that the decompressed data is discarded instead of being

written to standard output.

+No files are created or removed.

.TP

.BR \-l ", " \-\-list

-List information about compressed

+Print information about compressed

.IR files .

-No uncompressed output is produced, and no files are created or removed.

-In list mode, the program cannot read the compressed data from standard

+No uncompressed output is produced,

+and no files are created or removed.

+In list mode, the program cannot read

+the compressed data from standard

input or from other unseekable sources.

-.IP

+.IP ""

The default listing shows basic information about

.IR files ,

-one file per line. To get more detailed information, use also the

+one file per line.

+To get more detailed information, use also the

.B \-\-verbose

-option. For even more information, use

+option.

+For even more information, use

.B \-\-verbose

-twice, but note that it may be slow, because getting all the extra

-information requires many seeks. The width of verbose output exceeds

-80 characters, so piping the output to e.g.

+twice, but note that this may be slow, because getting all the extra

+information requires many seeks.

+The width of verbose output exceeds

+80 characters, so piping the output to e.g.\&

.B "less\ \-S"

may be convenient if the terminal isn't wide enough.

-.IP

+.IP ""

The exact output may vary between

.B xz

-versions and different locales. To get machine-readable output,

+versions and different locales.

+For machine-readable output,

.B \-\-robot \-\-list

should be used.

.SS "Operation modifiers"

.TP

.BR \-k ", " \-\-keep

-Keep (don't delete) the input files.

+Don't delete the input files.

.TP

.BR \-f ", " \-\-force

This option has several effects:

.RS

.IP \(bu 3

-If the target file already exists, delete it before compressing or

-decompressing.

+If the target file already exists,

+delete it before compressing or decompressing.

.IP \(bu 3

-Compress or decompress even if the input is a symbolic link to a regular file,

-has more than one hard link, or has setuid, setgid, or sticky bit set.

-The setuid, setgid, and sticky bits are not copied to the target file.

+Compress or decompress even if the input is

+a symbolic link to a regular file,

+has more than one hard link,

+or has the setuid, setgid, or sticky bit set.

+The setuid, setgid, and sticky bits are not copied

+to the target file.

.IP \(bu 3

-If combined with

+When used with

.B \-\-decompress

.BR \-\-stdout

and

.B xz

-doesn't recognize the type of the source file,

-.B xz

-will copy the source file as is to standard output. This allows using

+cannot recognize the type of the source file,

+copy the source file as is to standard output.

+This allows

.B xzcat

-.B \--force

-like

+.B \-\-force

+to be used like

.BR cat (1)

for files that have not been compressed with

.BR xz .

@@ -380,21 +428,23 @@

to decompress only a single file format.

.RE

.TP

-.BR \-c ", " \-\-stdout ", " \-\-to-stdout

-Write the compressed or decompressed data to standard output instead of

-a file. This implies

+.BR \-c ", " \-\-stdout ", " \-\-to\-stdout

+Write the compressed or decompressed data to

+standard output instead of a file.

+This implies

.BR \-\-keep .

.TP

.B \-\-no\-sparse

-Disable creation of sparse files. By default, if decompressing into

-a regular file,

+Disable creation of sparse files.

+By default, if decompressing into a regular file,

.B xz

-tries to make the file sparse if the decompressed data contains long

-sequences of binary zeros. It works also when writing to standard output

-as long as standard output is connected to a regular file, and certain

-additional conditions are met to make it safe. Creating sparse files may

-save disk space and speed up the decompression by reducing the amount of

-disk I/O.

+tries to make the file sparse if the decompressed data contains

+long sequences of binary zeros.

+It also works when writing to standard output

+as long as standard output is connected to a regular file

+and certain additional conditions are met to make it safe.

+Creating sparse files may save disk space and speed up

+the decompression by reducing the amount of disk I/O.

.TP

\fB\-S\fR \fI.suf\fR, \fB\-\-suffix=\fI.suf

When compressing, use

@@ -403,11 +453,12 @@

.B .xz

.BR .lzma .

-If not writing to standard output and the source file already has the suffix

+If not writing to standard output and

+the source file already has the suffix

.IR .suf ,

a warning is displayed and the file is skipped.

-.IP

-When decompressing, recognize also files with the suffix

+.IP ""

+When decompressing, recognize files with the suffix

.I .suf

in addition to files with the

.BR .xz ,

@@ -415,13 +466,15 @@

.BR .lzma ,

.B .tlz

-suffix. If the source file has the suffix

+suffix.

+If the source file has the suffix

.IR .suf ,

the suffix is removed to get the target filename.

-.IP

+.IP ""

When compressing or decompressing raw streams

.RB ( \-\-format=raw ),

-the suffix must always be specified unless writing to standard output,

+the suffix must always be specified unless

+writing to standard output,

because there is no default suffix for raw streams.
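
The decompression-side suffix handling can be sketched as follows. `target_name` is a hypothetical helper. Part of the built-in suffix list is elided between hunks. The mapping of `.txz` and `.tlz` to `.tar` is an assumption based on XZ Utils behaviour, not something shown in this diff.

```python
from typing import List, Optional, Tuple

# (suffix, replacement) pairs; .txz/.tlz -> .tar is assumed, see above.
BUILTIN: List[Tuple[str, str]] = [
    (".xz", ""), (".txz", ".tar"), (".lzma", ""), (".tlz", ".tar"),
]

def target_name(src: str, custom: Optional[str] = None) -> Optional[str]:
    """Return the decompressed filename, or None if no supported suffix
    matches and the file would be skipped."""
    candidates = BUILTIN + ([(custom, "")] if custom else [])
    for suffix, replacement in candidates:
        # The name must be longer than the suffix itself.
        if src.endswith(suffix) and len(src) > len(suffix):
            return src[: -len(suffix)] + replacement
    return None
```

For example, `target_name("data.gz")` returns `None`, while `target_name("data.gz", ".gz")` returns `"data"`, mirroring `\-\-suffix=.gz`.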

.TP

\fB\-\-files\fR[\fB=\fIfile\fR]

@@ -429,8 +482,9 @@

.IR file ;

.I file

-is omitted, filenames are read from standard input. Filenames must be

-terminated with the newline character. A dash

+is omitted, filenames are read from standard input.

+Filenames must be terminated with the newline character.

+A dash

.RB ( \- )

is taken as a regular filename; it doesn't mean standard input.

If filenames are given also as command line arguments, they are

@@ -438,296 +492,469 @@

.IR file .

.TP

\fB\-\-files0\fR[\fB=\fIfile\fR]

-This is identical to \fB\-\-files\fR[\fB=\fIfile\fR] except that the

-filenames must be terminated with the null character.

+This is identical to \fB\-\-files\fR[\fB=\fIfile\fR] except

+that each filename must be terminated with the null character.
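
The difference between `\-\-files` and `\-\-files0` matters for names that contain newlines. `encode_file_list` is a hypothetical helper that builds the list either way and refuses names that contain their own terminator.

```python
import os

def encode_file_list(names, null_terminated: bool) -> bytes:
    """Build input for --files (newline-terminated names) or
    --files0 (null-terminated names)."""
    terminator = b"\0" if null_terminated else b"\n"
    out = bytearray()
    for name in names:
        raw = os.fsencode(name)
        # A name containing the terminator cannot be represented.
        if terminator in raw:
            raise ValueError(f"cannot encode {name!r} with this terminator")
        out += raw + terminator
    return bytes(out)
```

The result could then be passed on standard input, e.g. `subprocess.run(["xz", "\-\-files0"], input=data, check=True)`, since the filenames are read from standard input when `file` is omitted.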

.SS "Basic file format and compression options"

.TP

\fB\-F\fR \fIformat\fR, \fB\-\-format=\fIformat

-Specify the file format to compress or decompress:

+Specify the file

+.I format

+to compress or decompress:

.RS

-.IP \(bu 3

-.BR auto :

-This is the default. When compressing,

+.TP

.B auto

+This is the default.

+When compressing,

+.B auto

is equivalent to

.BR xz .

-When decompressing, the format of the input file is automatically detected.

+When decompressing,

+the format of the input file is automatically detected.

Note that raw streams (created with

.BR \-\-format=raw )

cannot be auto-detected.

-.IP \(bu 3

-.BR xz :

+.TP

+.B xz

Compress to the

.B .xz

file format, or accept only

.B .xz

files when decompressing.

-.IP \(bu 3

-.B lzma

-or

-.BR alone :

+.TP

+.BR lzma ", " alone

Compress to the legacy

.B .lzma

file format, or accept only

.B .lzma

-files when decompressing. The alternative name

+files when decompressing.

+The alternative name

.B alone

is provided for backwards compatibility with LZMA Utils.

-.IP \(bu 3

-.BR raw :

-Compress or uncompress a raw stream (no headers). This is meant for advanced

-users only. To decode raw streams, you need to set not only

+.TP

+.B raw

+Compress or uncompress a raw stream (no headers).

+This is meant for advanced users only.

+To decode raw streams, you need to use

.B \-\-format=raw

-but also specify the filter chain, which would normally be stored in the

-container format headers.

+and explicitly specify the filter chain,

+which normally would have been stored in the container headers.

.RE

.TP

\fB\-C\fR \fIcheck\fR, \fB\-\-check=\fIcheck

-Specify the type of the integrity check, which is calculated from the

-uncompressed data. This option has an effect only when compressing into the

+Specify the type of the integrity check.

+The check is calculated from the uncompressed data and

+stored in the

.B .xz

+file.

+This option has an effect only when compressing into the

+.B .xz

format; the

.B .lzma

format doesn't support integrity checks.

The integrity check (if any) is verified when the

.B .xz

file is decompressed.

-.IP

+.IP ""

Supported

.I check

types:

.RS

-.IP \(bu 3

-.BR none :

-Don't calculate an integrity check at all. This is usually a bad idea. This

-can be useful when integrity of the data is verified by other means anyway.

-.IP \(bu 3

-.BR crc32 :

+.TP

+.B none

+Don't calculate an integrity check at all.

+This is usually a bad idea.

+This can be useful when integrity of the data is verified

+by other means anyway.

+.TP

+.B crc32

Calculate CRC32 using the polynomial from IEEE-802.3 (Ethernet).

-.IP \(bu 3

-.BR crc64 :

-Calculate CRC64 using the polynomial from ECMA-182. This is the default, since

-it is slightly better than CRC32 at detecting damaged files and the speed

-difference is negligible.

-.IP \(bu 3

-.BR sha256 :

-Calculate SHA-256. This is somewhat slower than CRC32 and CRC64.

+.TP

+.B crc64

+Calculate CRC64 using the polynomial from ECMA-182.

+This is the default, since it is slightly better than CRC32

+at detecting damaged files and the speed difference is negligible.

+.TP

+.B sha256

+Calculate SHA-256.

+This is somewhat slower than CRC32 and CRC64.

.RE

-.IP

+.IP ""

Integrity of the

.B .xz

-headers is always verified with CRC32. It is not possible to change or

-disable it.

+headers is always verified with CRC32.

+It is not possible to change or disable it.
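
The selected check is recorded in the .xz Stream Header. In that header, six magic bytes are followed by two Stream Flags bytes, and the second byte holds the check ID. The sketch reads it back after compressing with Python's `lzma` module, whose `CHECK_*` constants are assumed to mirror the `none`, `crc32`, `crc64`, and `sha256` types above. `stream_check_id` is a hypothetical helper.

```python
import lzma

CHECKS = {
    "none": lzma.CHECK_NONE,
    "crc32": lzma.CHECK_CRC32,
    "crc64": lzma.CHECK_CRC64,
    "sha256": lzma.CHECK_SHA256,
}

def stream_check_id(blob: bytes) -> int:
    """Return the check ID stored in an .xz Stream Header."""
    if blob[:6] != b"\xfd7zXZ\x00":
        raise ValueError("not an .xz stream")
    return blob[7] & 0x0F  # low four bits of the second Stream Flags byte

for name, check in CHECKS.items():
    if not lzma.is_check_supported(check):
        continue
    blob = lzma.compress(b"payload", format=lzma.FORMAT_XZ, check=check, preset=0)
    assert lzma.decompress(blob) == b"payload"  # the check is verified here
    print(name, stream_check_id(blob))  # none 0, crc32 1, crc64 4, sha256 10
```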

.TP

.BR \-0 " ... " \-9

-Select compression preset. If a preset level is specified multiple times,

+Select a compression preset level.

+The default is

+.BR \-6 .

+If multiple preset levels are specified,

the last one takes effect.

-.IP

-The compression preset levels can be categorised roughly into three

-categories:

-.RS

-.IP "\fB\-0\fR ... \fB\-2"

-Fast presets with relatively low memory usage.

-.B \-1

+If a custom filter chain was already specified, setting

+a compression preset level clears the custom filter chain.

+.IP ""

+The differences between the presets are more significant than with

+.BR gzip (1)

and

-.B \-2

-should give compression speed and ratios comparable to

-.B "bzip2 \-1"

+.BR bzip2 (1).

+The selected compression settings determine

+the memory requirements of the decompressor,

+thus using too high a preset level might make it painful

+to decompress the file on an old system with little RAM.

+Specifically,

+.B "it's not a good idea to blindly use \-9 for everything"

+like it often is with

+.BR gzip (1)

and

-.BR "bzip2 \-9" ,

-respectively.

-Currently

+.BR bzip2 (1).

+.RS

+.TP

+.BR "\-0" " ... " "\-3"

+These are somewhat fast presets.

.B \-0

-is not very good (not much faster than

-.B \-1

-but much worse compression). In future,

-.B \-0

-may be indicate some fast algorithm instead of LZMA2.

-.IP "\fB\-3\fR ... \fB\-5"

-Good compression ratio with low to medium memory usage.

-These are significantly slower than levels 0\-2.

-.IP "\fB\-6\fR ... \fB\-9"

-Excellent compression with medium to high memory usage. These are also

-slower than the lower preset levels. The default is

-.BR \-6 .

-Unless you want to maximize the compression ratio, you probably don't want

-a higher preset level than

-.B \-7

-due to speed and memory usage.

+is sometimes faster than

+.B "gzip \-9"

+while compressing much better.

+The higher ones often have speed comparable to

+.BR bzip2 (1)

+with comparable or better compression ratio,

+although the results

+depend a lot on the type of data being compressed.

+.TP

+.BR "\-4" " ... " "\-6"

+Good to very good compression while keeping

+decompressor memory usage reasonable even for old systems.

+.B \-6

+is the default, which is usually a good choice

+e.g. for distributing files that need to be decompressible

+even on systems with only 16\ MiB RAM.

+.RB ( \-5e

+or

+.B \-6e

+may be worth considering too.

+See

+.BR \-\-extreme .)

+.TP

+.B "\-7 ... \-9"

+These are like

+.B \-6

+but with higher compressor and decompressor memory requirements.

+These are useful only when compressing files bigger than

+8\ MiB, 16\ MiB, and 32\ MiB, respectively.

.RE

-.IP

-The exact compression settings (filter chain) used by each preset may

-vary between

-.B xz

-versions. The settings may also vary between files being compressed, if

-.B xz

-determines that modified settings will probably give better compression

-ratio without significantly affecting compression time or memory usage.

-.IP

-Because the settings may vary, the memory usage may vary too. The following

-table lists the maximum memory usage of each preset level, which won't be

-exceeded even in future versions of

-.BR xz .

-.IP

-.B "FIXME: The table below is just a rough idea."

+.IP ""

+On the same hardware, the decompression speed is approximately

+a constant number of bytes of compressed data per second.

+In other words, the better the compression,

+the faster the decompression will usually be.

+This also means that the amount of uncompressed output

+produced per second can vary a lot.

+.IP ""

+The following table summarises the features of the presets:

.RS

+.PP

.TS

tab(;);

-c c c

-n n n.

-Preset;Compression;Decompression

-\-0;6 MiB;1 MiB

-\-1;6 MiB;1 MiB

-\-2;10 MiB;1 MiB

-\-3;20 MiB;2 MiB

-\-4;30 MiB;3 MiB

-\-5;60 MiB;6 MiB

-\-6;100 MiB;10 MiB

-\-7;200 MiB;20 MiB

-\-8;400 MiB;40 MiB

-\-9;800 MiB;80 MiB

+c c c c c

+n n n n n.

+Preset;DictSize;CompCPU;CompMem;DecMem

+\-0;256 KiB;0;3 MiB;1 MiB

+\-1;1 MiB;1;9 MiB;2 MiB

+\-2;2 MiB;2;17 MiB;3 MiB

+\-3;4 MiB;3;32 MiB;5 MiB

+\-4;4 MiB;4;48 MiB;5 MiB

+\-5;8 MiB;5;94 MiB;9 MiB

+\-6;8 MiB;6;94 MiB;9 MiB

+\-7;16 MiB;6;186 MiB;17 MiB

+\-8;32 MiB;6;370 MiB;33 MiB

+\-9;64 MiB;6;674 MiB;65 MiB

.TE

.RE

-.IP

-When compressing,

+.IP ""

+Column descriptions:

+.RS

+.IP \(bu 3

+DictSize is the LZMA2 dictionary size.

+It is a waste of memory to use a dictionary bigger than

+the size of the uncompressed file.

+This is why it is good to avoid using the presets

+.BR \-7 " ... " \-9

+when there's no real need for them.

+At

+.B \-6

+and lower, the amount of memory wasted is

+usually low enough to not matter.

+.IP \(bu 3

+CompCPU is a simplified representation of the LZMA2 settings

+that affect compression speed.

+The dictionary size affects speed too,

+so while CompCPU is the same for levels

+.BR \-6 " ... " \-9 ,

+higher levels still tend to be a little slower.

+To get even slower and thus possibly better compression, see

+.BR \-\-extreme .

+.IP \(bu 3

+CompMem contains the compressor memory requirements

+in the single-threaded mode.

+It may vary slightly between

.B xz

-automatically adjusts the compression settings downwards if

-the memory usage limit would be exceeded, so it is safe to specify

-a high preset level even on systems that don't have lots of RAM.

+versions.

+Memory requirements of some of the future multithreaded modes may

+be dramatically higher than those of the single-threaded mode.

+.IP \(bu 3

+DecMem contains the decompressor memory requirements.

+That is, the compression settings determine

+the memory requirements of the decompressor.

+The exact decompressor memory usage is slightly more than

+the LZMA2 dictionary size, but the values in the table

+have been rounded up to the next full MiB.

+.RE
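
The table and column notes can be turned into a rough selection rule. `pick_preset` is a hypothetical helper, not an xz feature. It picks the highest level whose DecMem fits a decompressor memory budget. It uses `\-7` ... `\-9` only when the input is bigger than 8, 16, and 32 MiB, respectively, as the text advises. The values are copied from the table above.

```python
MiB = 1024 * 1024

# level: (DictSize, DecMem), copied from the preset table.
PRESETS = {
    0: (256 * 1024, 1 * MiB), 1: (1 * MiB, 2 * MiB),
    2: (2 * MiB, 3 * MiB), 3: (4 * MiB, 5 * MiB),
    4: (4 * MiB, 5 * MiB), 5: (8 * MiB, 9 * MiB),
    6: (8 * MiB, 9 * MiB), 7: (16 * MiB, 17 * MiB),
    8: (32 * MiB, 33 * MiB), 9: (64 * MiB, 65 * MiB),
}

def pick_preset(input_size: int, dec_mem_budget: int) -> int:
    """Highest preset level fitting the budget (falls back to 0)."""
    best = 0
    for level, (_, dec_mem) in PRESETS.items():
        if dec_mem > dec_mem_budget:
            continue
        # -7 ... -9 only pay off when the input exceeds the dictionary
        # of the level below; otherwise the extra memory is wasted.
        if level >= 7 and input_size <= PRESETS[level - 1][0]:
            continue
        best = level
    return best
```

For example, a 1 MiB input gets the default `\-6`, and a 20 MiB input with a generous budget gets `\-8`.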

.TP

-.BR \-\-fast " and " \-\-best

+.BR \-e ", " \-\-extreme

+Use a slower variant of the selected compression preset level

+.RB ( \-0 " ... " \-9 )

+to hopefully get a little bit better compression ratio,

+but with bad luck this can also make it worse.

+Decompressor memory usage is not affected,

+but compressor memory usage increases a little at preset levels

+.BR \-0 " ... " \-3 .

+.IP ""

+Since there are two presets with dictionary sizes

+4\ MiB and 8\ MiB, the presets

+.B \-3e

+and

+.B \-5e

+use slightly faster settings (lower CompCPU) than

+.B \-4e

+and

+.BR \-6e ,

+respectively.

+That way no two presets are identical.

+.RS

+.PP

+.TS

+tab(;);

+c c c c c

+n n n n n.

+Preset;DictSize;CompCPU;CompMem;DecMem

+\-0e;256 KiB;8;4 MiB;1 MiB

+\-1e;1 MiB;8;13 MiB;2 MiB

+\-2e;2 MiB;8;25 MiB;3 MiB

+\-3e;4 MiB;7;48 MiB;5 MiB

+\-4e;4 MiB;8;48 MiB;5 MiB

+\-5e;8 MiB;7;94 MiB;9 MiB

+\-6e;8 MiB;8;94 MiB;9 MiB

+\-7e;16 MiB;8;186 MiB;17 MiB

+\-8e;32 MiB;8;370 MiB;33 MiB

+\-9e;64 MiB;8;674 MiB;65 MiB

+.TE

+.RE

+.IP ""

+For example, there are a total of four presets that use

+an 8\ MiB dictionary, whose order from fastest to slowest is

+.BR \-5 ,

+.BR \-6 ,

+.BR \-5e ,

+and

+.BR \-6e .
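
Python's `lzma` module accepts a preset level OR-ed with `lzma.PRESET_EXTREME`. This is assumed here to correspond to `xz \-6e`. The sketch compares the output sizes without predicting them, because, as noted above, the extreme variant is not guaranteed to compress better.

```python
import lzma

data = b"The quick brown fox jumps over the lazy dog. " * 2000

normal = lzma.compress(data, preset=6)                         # like xz -6
extreme = lzma.compress(data, preset=6 | lzma.PRESET_EXTREME)  # like xz -6e

# Sizes depend on the data; -e may or may not win.
print(len(normal), len(extreme))
# Decompression is unaffected by the choice.
assert lzma.decompress(extreme) == data
```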

+.TP

+.B \-\-fast

+.PD 0

+.TP

+.B \-\-best

+.PD

These are somewhat misleading aliases for

.B \-0

and

.BR \-9 ,

respectively.

-These are provided only for backwards compatibility with LZMA Utils.

+These are provided only for backwards compatibility

+with LZMA Utils.

Avoid using these options.

-.IP

-Especially the name of

-.B \-\-best

-is misleading, because the definition of best depends on the input data,

-and that usually people don't want the very best compression ratio anyway,

-because it would be very slow.

.TP

-.BR \-e ", " \-\-extreme

-Modify the compression preset (\fB\-0\fR ... \fB\-9\fR) so that a little bit

-better compression ratio can be achieved without increasing memory usage

-of the compressor or decompressor (exception: compressor memory usage may

-increase a little with presets \fB\-0\fR ... \fB\-2\fR). The downside is that

-the compression time will increase dramatically (it can easily double).

-.TP

+.BI \-\-memlimit\-compress= limit

+Set a memory usage limit for compression.

+If this option is specified multiple times,

+the last one takes effect.

+.IP ""

+If the compression settings exceed the

+.IR limit ,

+.B xz

+will adjust the settings downwards so that

+the limit is no longer exceeded and display a notice that

+automatic adjustment was done.

+Such adjustments are not made when compressing with

+.B \-\-format=raw

+or if

.B \-\-no\-adjust

-Display an error and exit if the compression settings exceed the

-the memory usage limit. The default is to adjust the settings downwards so

-that the memory usage limit is not exceeded. Automatic adjusting is

-always disabled when creating raw streams

-.RB ( \-\-format=raw ).

-.TP

-\fB\-M\fR \fIlimit\fR, \fB\-\-memory=\fIlimit

-Set the memory usage limit. If this option is specified multiple times,

-the last one takes effect. The

+has been specified.

+In those cases, an error is displayed and

+.B xz

+will exit with exit status 1.

+.IP ""

+The

.I limit

can be specified in multiple ways:

.RS

.IP \(bu 3

The

.I limit

-can be an absolute value in bytes. Using an integer suffix like

+can be an absolute value in bytes.

+Using an integer suffix like

.B MiB

-can be useful. Example:

-.B "\-\-memory=80MiB"

+can be useful.

+Example:

+.B "\-\-memlimit\-compress=80MiB"

.IP \(bu 3

The

.I limit

-can be specified as a percentage of physical RAM. Example:

-.B "\-\-memory=70%"

+can be specified as a percentage of total physical memory (RAM).

+This can be useful especially when setting the

+.B XZ_DEFAULTS

+environment variable in a shell initialization script

+that is shared between different computers.

+That way the limit is automatically bigger

+on systems with more memory.

+Example:

+.B "\-\-memlimit\-compress=70%"

.IP \(bu 3

The

.I limit

can be reset back to its default value by setting it to

.BR 0 .

-See the section

-.B "Memory usage"

-for how the default limit is defined.

-.IP \(bu 3

-The memory usage limiting can be effectively disabled by setting

+This is currently equivalent to setting the

.I limit

-.BR max .

-This isn't recommended. It's usually better to use, for example,

-.BR \-\-memory=90% .

+.B max

+(no memory usage limit).

+Once multithreading support has been implemented,

+there may be a difference between

+.B 0

+and

+.B max

+for the multithreaded case, so it is recommended to use

+.B 0

+instead of

+.B max

+until the details have been decided.

.RE

-.IP

-The current

-.I limit

-can be seen near the bottom of the output of the

-.B \-\-long-help

-option.

+.IP ""

+See also the section

+.BR "Memory usage" .
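The ways of writing a limit described above (absolute bytes with an optional suffix, a percentage of RAM, or 0/max) can be illustrated with a small Python sketch. This is not xz's own parser. The function name `parse_limit`, the accepted suffixes (only `KiB`, `MiB`, `GiB`), the 1-100 percentage range, and the use of `None` for "no limit" are all assumptions made for illustration.

```python
from typing import Optional

# Assumed binary suffixes only; xz itself may accept other spellings.
SUFFIXES = {"KiB": 1 << 10, "MiB": 1 << 20, "GiB": 1 << 30}


def parse_limit(text: str, total_ram: int) -> Optional[int]:
    """Hypothetical parser for a memory usage limit string.

    Returns a byte count, or None meaning "no limit".
    """
    if text in ("0", "max"):
        # 0 resets to the default, which is currently the same as max.
        return None
    if text.endswith("%"):
        percent = int(text[:-1])
        if not 1 <= percent <= 100:  # assumed valid range
            raise ValueError("percentage must be 1-100")
        return total_ram * percent // 100
    for suffix, multiplier in SUFFIXES.items():
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * multiplier
    return int(text)  # plain byte count


ram = 4 << 30  # pretend the machine has 4 GiB of RAM
print(parse_limit("80MiB", ram))  # 83886080
print(parse_limit("70%", ram))    # 3006477107
print(parse_limit("0", ram))      # None
```

Because the percentage form is resolved against the RAM of whichever machine runs the command, the same setting scales automatically across computers, which is the point made above about sharing a shell initialization script.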

.TP

+.BI \-\-memlimit\-decompress= limit

+Set a memory usage limit for decompression.

+This also affects the

+.B \-\-list

+mode.

+If the operation is not possible without exceeding the

+.IR limit ,

+.B xz

+will display an error and decompressing the file will fail.

+See

+.BI \-\-memlimit\-compress= limit

+for possible ways to specify the

+.IR limit .

+.TP

+\fB\-M\fR \fIlimit\fR, \fB\-\-memlimit=\fIlimit\fR, \fB\-\-memory=\fIlimit

+This is equivalent to specifying \fB\-\-memlimit\-compress=\fIlimit

+\fB\-\-memlimit\-decompress=\fIlimit\fR.

+.TP

+.B \-\-no\-adjust

+Display an error and exit if the compression settings exceed

+the memory usage limit.

+The default is to adjust the settings downwards so

+that the memory usage limit is not exceeded.

+Automatic adjusting is always disabled when creating raw streams

+.RB ( \-\-format=raw ).

+.TP

\fB\-T\fR \fIthreads\fR, \fB\-\-threads=\fIthreads

-Specify the maximum number of worker threads to use. The default is

-the number of available CPU cores. You can see the current value of

+Specify the number of worker threads to use.

+The actual number of threads can be less than

.I threads

-near the end of the output of the

-.B \-\-long\-help

-option.

-.IP

-The actual number of worker threads can be less than

-.I threads

if using more threads would exceed the memory usage limit.

-In addition to CPU-intensive worker threads,

-.B xz

-may use a few auxiliary threads, which don't use a lot of CPU time.

-.IP

-.B "Multithreaded compression and decompression are not implemented yet,"

-.B "so this option has no effect for now."

-.SS Custom compressor filter chains

-A custom filter chain allows specifying the compression settings in detail

-instead of relying on the settings associated to the preset levels.

-When a custom filter chain is specified, the compression preset level options

-(\fB\-0\fR ... \fB\-9\fR and \fB\-\-extreme\fR) are silently ignored.

+.IP ""

+.B "Multithreaded compression and decompression are not"

+.B "implemented yet, so this option has no effect for now."

+.IP ""

+.B "As of writing (2010-09-27), it hasn't been decided"

+.B "if threads will be used by default on multicore systems"

+.B "once support for threading has been implemented."

+.B "Comments are welcome."

+The complicating factor is that using many threads

+will increase the memory usage dramatically.

+Note that if multithreading becomes the default,

+it will probably be done so that single-threaded and

+multithreaded modes produce the same output,

+so compression ratio won't be significantly affected

+if threading is enabled by default.

+.SS "Custom compressor filter chains"

+A custom filter chain allows specifying

+the compression settings in detail instead of relying on

+the settings associated with the preset levels.

+When a custom filter chain is specified,

+the compression preset level options

+(\fB\-0\fR ... \fB\-9\fR and \fB\-\-extreme\fR) are

+silently ignored.

.PP

-A filter chain is comparable to piping on the UN*X command line.

-When compressing, the uncompressed input goes to the first filter, whose

-output goes to the next filter (if any). The output of the last filter

-gets written to the compressed file. The maximum number of filters in

-the chain is four, but typically a filter chain has only one or two filters.

+A filter chain is comparable to piping on the command line.

+When compressing, the uncompressed input goes to the first filter,

+whose output goes to the next filter (if any).

+The output of the last filter gets written to the compressed file.

+The maximum number of filters in the chain is four,

+but typically a filter chain has only one or two filters.

.PP

-Many filters have limitations where they can be in the filter chain:

-some filters can work only as the last filter in the chain, some only

-as a non-last filter, and some work in any position in the chain. Depending

-on the filter, this limitation is either inherent to the filter design or

-exists to prevent security issues.

+Many filters have limitations on where they can be

+in the filter chain:

+some filters can work only as the last filter in the chain,

+some only as a non-last filter, and some work in any position

+in the chain.

+Depending on the filter, this limitation is either inherent to

+the filter design or exists to prevent security issues.

.PP

-A custom filter chain is specified by using one or more filter options in

-the order they are wanted in the filter chain. That is, the order of filter

-options is significant! When decoding raw streams

+A custom filter chain is specified by using one or more

+filter options in the order they are wanted in the filter chain.

+That is, the order of filter options is significant!

+When decoding raw streams

.RB ( \-\-format=raw ),

-the filter chain is specified in the same order as it was specified when

-compressing.

+the filter chain is specified in the same order as

+it was specified when compressing.

.PP

Filters take filter-specific

.I options

-as a comma-separated list. Extra commas in

+as a comma-separated list.

+Extra commas in

.I options

-are ignored. Every option has a default value, so you need to

+are ignored.

+Every option has a default value, so you need to

specify only those you want to change.

.TP

-\fB\-\-lzma1\fR[\fB=\fIoptions\fR], \fB\-\-lzma2\fR[\fB=\fIoptions\fR]

-Add LZMA1 or LZMA2 filter to the filter chain. These filter can be used

-only as the last filter in the chain.

-.IP

-LZMA1 is a legacy filter, which is supported almost solely due to the legacy

+\fB\-\-lzma1\fR[\fB=\fIoptions\fR]

+.PD 0

+.TP

+\fB\-\-lzma2\fR[\fB=\fIoptions\fR]

+.PD

+Add an LZMA1 or LZMA2 filter to the filter chain.

+These filters can be used only as the last filter in the chain.

+.IP ""

+LZMA1 is a legacy filter,

+which is supported almost solely due to the legacy

.B .lzma

-file format, which supports only LZMA1. LZMA2 is an updated

-version of LZMA1 to fix some practical issues of LZMA1. The

+file format, which supports only LZMA1.

+LZMA2 is an updated

+version of LZMA1 to fix some practical issues of LZMA1.

+The

.B .xz

-format uses LZMA2, and doesn't support LZMA1 at all. Compression speed and

-ratios of LZMA1 and LZMA2 are practically the same.

-.IP

+format uses LZMA2 and doesn't support LZMA1 at all.

+Compression speed and ratios of LZMA1 and LZMA2

+are practically the same.

+.IP ""

LZMA1 and LZMA2 share the same set of

.IR options :

.RS

@@ -738,8 +965,9 @@

.IR preset .

.I Preset

-consist of an integer, which may be followed by single-letter preset

-modifiers. The integer can be from

+consists of an integer, which may be followed by single-letter

+preset modifiers.

+The integer can be from

.B 0

.BR 9 ,

@@ -748,7 +976,6 @@

.BR e ,

which matches

.BR \-\-extreme .

-.IP

The default

.I preset

@@ -758,84 +985,155 @@

are taken.

.TP

.BI dict= size

-Dictionary (history buffer) size indicates how many bytes of the recently

-processed uncompressed data is kept in memory. One method to reduce size of

-the uncompressed data is to store distance-length pairs, which

-indicate what data to repeat from the dictionary buffer. The bigger

-the dictionary, the better the compression ratio usually is,

-but dictionaries bigger than the uncompressed data are waste of RAM.

-.IP

-Typical dictionary size is from 64 KiB to 64 MiB. The minimum is 4 KiB.

-The maximum for compression is currently 1.5 GiB. The decompressor already

-supports dictionaries up to one byte less than 4 GiB, which is the

-maximum for LZMA1 and LZMA2 stream formats.

-.IP

-Dictionary size has the biggest effect on compression ratio.

-Dictionary size and match finder together determine the memory usage of

-the LZMA1 or LZMA2 encoder. The same dictionary size is required

-for decompressing that was used when compressing, thus the memory usage of

-the decoder is determined by the dictionary size used when compressing.

+Dictionary (history buffer)

+.I size

+indicates how many bytes of the recently processed

+uncompressed data are kept in memory.

+The algorithm tries to find repeating byte sequences (matches) in

+the uncompressed data, and replace them with references

+to the data currently in the dictionary.

+The bigger the dictionary, the higher the chance

+of finding a match.

+Thus, increasing dictionary

+.I size

+usually improves compression ratio, but

+a dictionary bigger than the uncompressed file is a waste of memory.

+.IP ""

+Typical dictionary

+.I size

+is from 64\ KiB to 64\ MiB.

+The minimum is 4\ KiB.

+The maximum for compression is currently 1.5\ GiB (1536\ MiB).

+The decompressor already supports dictionaries up to

+one byte less than 4\ GiB, which is the maximum for

+the LZMA1 and LZMA2 stream formats.

+.IP ""

+Dictionary

+.I size

+and match finder

+.RI ( mf )

+together determine the memory usage of the LZMA1 or LZMA2 encoder.

+Decompressing requires the same (or a bigger) dictionary

+.I size

+as was used when compressing,

+thus the memory usage of the decoder is determined

+by the dictionary size used when compressing.

+The

+.B .xz

+headers store the dictionary

+.I size

+either as

+.RI "2^" n

+or

+.RI "2^" n " + 2^(" n "\-1),"

+so these

+.I sizes

+are somewhat preferred for compression.

+Other

+.I sizes

+will get rounded up when stored in the

+.B .xz

+headers.
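The rounding rule above (sizes stored as 2^n or 2^n + 2^(n-1), others rounded up) can be sketched in Python. The function name is hypothetical, and the sketch assumes the documented 4 KiB minimum while ignoring the upper limit just below 4 GiB.

```python
MIN_DICT = 4 << 10  # 4 KiB, the documented minimum


def stored_dict_size(size: int) -> int:
    """Round size up to the next 2^n or 2^n + 2^(n-1)."""
    if size <= MIN_DICT:
        return MIN_DICT
    n = size.bit_length() - 1  # 2^n <= size < 2^(n+1)
    power = 1 << n
    if size == power:
        return size
    between = power + (power >> 1)  # 2^n + 2^(n-1)
    return between if size <= between else power << 1


MiB = 1 << 20
for mib in (3, 5, 7, 8):
    print(mib, "MiB ->", stored_dict_size(mib * MiB) // MiB, "MiB")
# 3 MiB -> 3 MiB
# 5 MiB -> 6 MiB
# 7 MiB -> 8 MiB
# 8 MiB -> 8 MiB
```

So asking for a 5 MiB dictionary costs the decompressor as much memory as a 6 MiB one would, which is why the exactly representable sizes are preferred.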

.TP

.BI lc= lc

-Specify the number of literal context bits. The minimum is

-.B 0

-and the maximum is

-.BR 4 ;

-the default is

-.BR 3 .

+Specify the number of literal context bits.

+The minimum is 0 and the maximum is 4; the default is 3.

In addition, the sum of

.I lc

and

.I lp

-must not exceed

-.BR 4 .

+must not exceed 4.

+.IP ""

+All bytes that cannot be encoded as matches

+are encoded as literals.

+That is, literals are simply 8-bit bytes

+that are encoded one at a time.

+.IP ""

+The literal coding makes an assumption that the highest

+.I lc

+bits of the previous uncompressed byte correlate

+with the next byte.

+E.g. in typical English text, an upper-case letter is

+often followed by a lower-case letter, and a lower-case

+letter is usually followed by another lower-case letter.

+In the US-ASCII character set, the highest three bits are 010

+for upper-case letters and 011 for lower-case letters.

+When

+.I lc

+is at least 3, the literal coding can take advantage of

+this property in the uncompressed data.
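The bit pattern described above can be checked with a few lines of Python. This only shows which bits of the previous byte form the context when lc=3; how LZMA uses that value to select probability tables is not modeled, and `literal_context` is a hypothetical name.

```python
def literal_context(prev_byte: int, lc: int) -> int:
    """Highest `lc` bits of the previous byte (0 <= lc <= 4)."""
    return prev_byte >> (8 - lc)


for ch in "AZaz":
    print(ch, format(literal_context(ord(ch), 3), "03b"))
# A 010
# Z 010
# a 011
# z 011
```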

+.IP ""

+The default value (3) is usually good.

+If you want maximum compression, test

+.BR lc=4 .

+Sometimes it helps a little, and

+sometimes it makes compression worse.

+If it makes it worse, test e.g.\&

+.B lc=2

+too.

.TP

.BI lp= lp

-Specify the number of literal position bits. The minimum is

-.B 0

-and the maximum is

-.BR 4 ;

-the default is

-.BR 0 .

+Specify the number of literal position bits.

+The minimum is 0 and the maximum is 4; the default is 0.

+.IP ""

+.I Lp

+affects what kind of alignment in the uncompressed data is

+assumed when encoding literals.

+See

+.I pb

+below for more information about alignment.

.TP

.BI pb= pb

-Specify the number of position bits. The minimum is

-.B 0

-and the maximum is

-.BR 4 ;

-the default is

-.BR 2 .

-.TP

-.BI mode= mode

-Compression

-.I mode

-specifies the function used to analyze the data produced by the match finder.

-Supported

-.I modes

-are

-.B fast

+Specify the number of position bits.

+The minimum is 0 and the maximum is 4; the default is 2.

+.IP ""

+.I Pb

+affects what kind of alignment in the uncompressed data is

+assumed in general.

+The default means four-byte alignment

+.RI (2^ pb =2^2=4),

+which is often a good choice when there's no better guess.

+.IP ""

+When the alignment is known, setting

+.I pb

+accordingly may reduce the file size a little.

+E.g. with text files having one-byte

+alignment (US-ASCII, ISO-8859-*, UTF-8), setting

+.B pb=0

+can improve compression slightly.

+For UTF-16 text,

+.B pb=1

+is a good choice.

+If the alignment is an odd number like 3 bytes,

+.B pb=0

+might be the best choice.
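The guidance above amounts to "use log2 of the alignment, and fall back to 0 when the alignment is not a power of two". A hypothetical helper can express that; taking the largest power of two that divides the alignment (so 3 gives 0 and 6 gives 1) is an assumption that generalizes the examples in the text, not a rule stated by xz.

```python
def suggest_pb(alignment: int) -> int:
    """Hypothetical helper: pick pb for data aligned to `alignment` bytes."""
    if alignment < 1:
        raise ValueError("alignment must be positive")
    # Largest power of two dividing the alignment (3 -> 1, 6 -> 2, 16 -> 16).
    power_of_two = alignment & -alignment
    # pb = log2 of that power, limited to the maximum allowed value of 4.
    return min(power_of_two.bit_length() - 1, 4)


# US-ASCII text, UTF-16 text, 3-byte records, 16-byte aligned data
print([suggest_pb(a) for a in (1, 2, 3, 16)])  # [0, 1, 0, 4]
```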

+.IP ""

+Even though the assumed alignment can be adjusted with

+.I pb

and

-.BR normal .

-The default is

-.B fast

-for

-.I presets

-.BR 0 \- 2

-and

-.B normal

-for

-.I presets

-.BR 3 \- 9 .

+.IR lp ,

+LZMA1 and LZMA2 still slightly favor 16-byte alignment.

+It might be worth taking into account when designing file formats

+that are likely to be often compressed with LZMA1 or LZMA2.

.TP

.BI mf= mf

-Match finder has a major effect on encoder speed, memory usage, and

-compression ratio. Usually Hash Chain match finders are faster than

-Binary Tree match finders. Hash Chains are usually used together with

-.B mode=fast

-and Binary Trees with

-.BR mode=normal .

-The memory usage formulas are only rough estimates,

-which are closest to reality when

+Match finder has a major effect on encoder speed,

+memory usage, and compression ratio.

+Usually Hash Chain match finders are faster than Binary Tree

+match finders.

+The default depends on the

+.IR preset :

+0 uses

+.BR hc3 ,

+1\-3

+use

+.BR hc4 ,

+and the rest use

+.BR bt4 .

+.IP ""

+The following match finders are supported.

+The memory usage formulas below are rough approximations,

+which are closest to reality when

.I dict

is a power of two.

.RS

@@ -848,6 +1146,7 @@

.br

Memory usage:

+.br

.I dict

* 7.5 (if

.I dict

@@ -866,8 +1165,16 @@

.br

Memory usage:

+.br

.I dict

-* 7.5

+* 7.5 (if

+.I dict

+<= 32 MiB);

+.br

+.I dict

+* 6.5 (if

+.I dict

+> 32 MiB)

.TP

.B bt2

Binary Tree with 2-byte hashing

@@ -888,6 +1195,7 @@

.br

Memory usage:

+.br

.I dict

* 11.5 (if

.I dict

@@ -906,53 +1214,96 @@

.br

Memory usage:

+.br

.I dict

-* 11.5

+* 11.5 (if

+.I dict

+<= 32 MiB);

+.br

+.I dict

+* 10.5 (if

+.I dict

+> 32 MiB)

.RE
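The hc4 and bt4 formulas above can be turned into a quick estimator. Only those two match finders are included because their complete formulas appear in this excerpt; the function name is hypothetical and the results are the same rough approximations as the formulas themselves.

```python
# Multipliers copied from the hc4 and bt4 entries above:
# (multiplier when dict <= 32 MiB, multiplier when dict > 32 MiB)
FACTORS = {"hc4": (7.5, 6.5), "bt4": (11.5, 10.5)}


def encoder_memory_mib(mf: str, dict_mib: float) -> float:
    """Rough encoder memory usage in MiB for the given match finder."""
    small, large = FACTORS[mf]
    return dict_mib * (small if dict_mib <= 32 else large)


for mf in FACTORS:
    for d in (8, 64):
        print(mf, d, encoder_memory_mib(mf, d))
# hc4 8 60.0
# hc4 64 416.0
# bt4 8 92.0
# bt4 64 672.0
```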

.TP

+.BI mode= mode

+Compression

+.I mode

+specifies the method to analyze

+the data produced by the match finder.

+Supported

+.I modes

+are

+.B fast

+and

+.BR normal .

+The default is

+.B fast

+for

+.I presets

+0\-3 and

+.B normal

+for

+.I presets

+4\-9.

+.IP ""

+Usually

+.B fast

+is used with Hash Chain match finders and

+.B normal

+with Binary Tree match finders.

+This is also what the

+.I presets

+do.

+.TP

.BI nice= nice

-Specify what is considered to be a nice length for a match. Once a match

-of at least

+Specify what is considered to be a nice length for a match.

+Once a match of at least

.I nice

-bytes is found, the algorithm stops looking for possibly better matches.

-.IP

-.I nice

-can be 2\-273 bytes. Higher values tend to give better compression ratio

-at expense of speed. The default depends on the

-.I preset

-level.

+bytes is found, the algorithm stops

+looking for possibly better matches.

+.IP ""

+.I Nice

+can be 2\-273 bytes.

+Higher values tend to give better compression ratio

+at the expense of speed.

+The default depends on the

+.IR preset .

.TP

.BI depth= depth

-Specify the maximum search depth in the match finder. The default is the

-special value

-.BR 0 ,

+Specify the maximum search depth in the match finder.

+The default is the special value of 0,

which makes the compressor determine a reasonable

.I depth

from

.I mf

and

.IR nice .

-.IP

+.IP ""

+A reasonable

+.I depth

+is 4\-100 for Hash Chains and 16\-1000 for Binary Trees.

Using very high values for

.I depth

-can make the encoder extremely slow with carefully crafted files.

+can make the encoder extremely slow with some files.

Avoid setting the

.I depth

-over 1000 unless you are prepared to interrupt the compression in case it

-is taking too long.

+over 1000 unless you are prepared to interrupt

+the compression in case it is taking far too long.

.RE

-.IP

+.IP ""

When decoding raw streams

.RB ( \-\-format=raw ),

-LZMA2 needs only the value of

-.BR dict .

+LZMA2 needs only the dictionary

+.IR size .

LZMA1 needs also

-.BR lc ,

-.BR lp ,

+.IR lc ,

+.IR lp ,

and

-.BR pb.

+.IR pb .

.TP

\fB\-\-x86\fR[\fB=\fIoptions\fR]

+.PD 0

.TP

\fB\-\-powerpc\fR[\fB=\fIoptions\fR]

.TP

@@ -963,28 +1314,72 @@

\fB\-\-armthumb\fR[\fB=\fIoptions\fR]

.TP

\fB\-\-sparc\fR[\fB=\fIoptions\fR]

-Add a branch/call/jump (BCJ) filter to the filter chain. These filters

-can be used only as non-last filter in the filter chain.

-.IP

-A BCJ filter converts relative addresses in the machine code to their

-absolute counterparts. This doesn't change the size of the data, but

-it increases redundancy, which allows e.g. LZMA2 to get better

-compression ratio.

-.IP

-The BCJ filters are always reversible, so using a BCJ filter for wrong

-type of data doesn't cause any data loss. However, applying a BCJ filter

-for wrong type of data is a bad idea, because it tends to make the

-compression ratio worse.

-.IP

+.PD

+Add a branch/call/jump (BCJ) filter to the filter chain.

+These filters can be used only as a non-last filter

+in the filter chain.

+.IP ""

+A BCJ filter converts relative addresses in

+the machine code to their absolute counterparts.

+This doesn't change the size of the data,

+but it increases redundancy,

+which can help LZMA2 produce a 0\-15\ % smaller

+.B .xz

+file.

+The BCJ filters are always reversible,

+so using a BCJ filter for wrong type of data

+doesn't cause any data loss, although it may make

+the compression ratio slightly worse.
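Why converting relative addresses to absolute ones increases redundancy can be shown with a toy Python example. This is not the BCJ algorithm: the real x86 filter has to locate instructions in raw bytes and handles more cases. It only assumes the usual x86 `CALL rel32` encoding (5 bytes, displacement relative to the next instruction), with made-up offsets.

```python
def absolute_target(insn_offset: int, rel32: int) -> int:
    """Target of an x86 CALL rel32 located at insn_offset.

    The 5-byte instruction (opcode E8 + 4-byte displacement) is
    relative to the address of the next instruction.
    """
    return insn_offset + 5 + rel32


# Three calls to the same function at offset 0x1000 from different places.
call_sites = [0x0100, 0x0500, 0x0A00]
relative = [0x1000 - (site + 5) for site in call_sites]
absolute = [absolute_target(s, r) for s, r in zip(call_sites, relative)]

print([hex(r) for r in relative])  # ['0xefb', '0xafb', '0x5fb']
print([hex(a) for a in absolute])  # ['0x1000', '0x1000', '0x1000']
```

After conversion, the three calls carry identical operands, which gives LZMA2 repeated byte sequences to match; before conversion, every call site encodes a different number.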

+.IP ""

+It is fine to apply a BCJ filter on a whole executable;

+there's no need to apply it only on the executable section.

+Applying a BCJ filter on an archive that contains both executable

+and non-executable files may or may not give good results,

+so it generally isn't good to blindly apply a BCJ filter when

+compressing binary packages for distribution.

+.IP ""

+These BCJ filters are very fast and

+use an insignificant amount of memory.

+If a BCJ filter improves compression ratio of a file,

+it can improve decompression speed at the same time.

+This is because, on the same hardware,

+the decompression speed of LZMA2 is roughly

+a fixed number of bytes of compressed data per second.

+.IP ""

+These BCJ filters have known problems related to

+the compression ratio:

+.RS

+.IP \(bu 3

+Some types of files containing executable code

+(e.g. object files, static libraries, and Linux kernel modules)

+have the addresses in the instructions filled with filler values.

+These BCJ filters will still do the address conversion,

+which will make the compression worse with these files.

+.IP \(bu 3

+Applying a BCJ filter on an archive containing multiple similar

+executables can make the compression ratio worse than not using

+a BCJ filter.

+This is because the BCJ filter doesn't detect the boundaries

+of the executable files, and doesn't reset

+the address conversion counter for each executable.

+.RE

+.IP ""

+Both of the above problems will be fixed

+in the future in a new filter.

+The old BCJ filters will still be useful in embedded systems,

+because the decoder of the new filter will be bigger

+and use more memory.

+.IP ""

 Different instruction sets have different alignment:

.RS

+.PP

.TS

tab(;);

l n l

l n l.

Filter;Alignment;Notes

-x86;1;32-bit and 64-bit x86

+x86;1;32-bit or 64-bit x86

PowerPC;4;Big endian only

ARM;4;Little endian only

ARM-Thumb;2;Little endian only

@@ -993,15 +1388,18 @@

.TE

.RE

-.IP

-Since the BCJ-filtered data is usually compressed with LZMA2, the compression

-ratio may be improved slightly if the LZMA2 options are set to match the

-alignment of the selected BCJ filter. For example, with the IA-64 filter,

-it's good to set

+.IP ""

+Since the BCJ-filtered data is usually compressed with LZMA2,

+the compression ratio may be improved slightly if

+the LZMA2 options are set to match the

+alignment of the selected BCJ filter.

+For example, with the IA-64 filter, it's good to set

.B pb=4

-with LZMA2 (2^4=16). The x86 filter is an exception; it's usually good to

-stick to LZMA2's default four-byte alignment when compressing x86 executables.

-.IP

+with LZMA2 (2^4=16).

+The x86 filter is an exception;

+it's usually good to stick to LZMA2's default

+four-byte alignment when compressing x86 executables.

+.IP ""

All BCJ filters support the same

.IR options :

.RS

@@ -1009,36 +1407,32 @@

.BI start= offset

Specify the start

.I offset

-that is used when converting between relative and absolute addresses.

+that is used when converting between relative

+and absolute addresses.

The

.I offset

-must be a multiple of the alignment of the filter (see the table above).

-The default is zero. In practice, the default is good; specifying

-a custom

+must be a multiple of the alignment of the filter

+(see the table above).

+The default is zero.

+In practice, the default is good; specifying a custom

.I offset

is almost never useful.

-.IP

-Specifying a non-zero start

-.I offset

-is probably useful only if the executable has multiple sections, and there

-are many cross-section jumps or calls. Applying a BCJ filter separately for

-each section with proper start offset and then compressing the result as

-a single chunk may give some improvement in compression ratio compared

-to applying the BCJ filter with the default

-.I offset

-for the whole executable.

.RE
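The rule that
.I offset
must be a multiple of the filter's alignment can be expressed as a small check. The dictionary keys and the function name are hypothetical; the alignments come from the table above, with IA-64's 16 taken from the pb=4 remark.

```python
# Alignments from the table above (IA-64's 16 comes from the pb=4 note).
BCJ_ALIGNMENT = {"x86": 1, "powerpc": 4, "arm": 4, "armthumb": 2, "ia64": 16}


def check_start_offset(filter_name: str, offset: int) -> None:
    """Raise ValueError if offset is not a multiple of the filter alignment."""
    align = BCJ_ALIGNMENT[filter_name]
    if offset % align != 0:
        raise ValueError(
            f"start={offset} is not a multiple of {align} for {filter_name}"
        )


check_start_offset("arm", 8)  # accepted: 8 is a multiple of 4
check_start_offset("x86", 7)  # accepted: x86 alignment is 1
```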

.TP

\fB\-\-delta\fR[\fB=\fIoptions\fR]

-Add Delta filter to the filter chain. The Delta filter

-can be used only as non-last filter in the filter chain.

-.IP

-Currently only simple byte-wise delta calculation is supported. It can

-be useful when compressing e.g. uncompressed bitmap images or uncompressed

-PCM audio. However, special purpose algorithms may give significantly better

-results than Delta + LZMA2. This is true especially with audio, which

-compresses faster and better e.g. with FLAC.

-.IP

+Add the Delta filter to the filter chain.

+The Delta filter can be used only as a non-last filter

+in the filter chain.

+.IP ""

+Currently only simple byte-wise delta calculation is supported.

+It can be useful when compressing e.g. uncompressed bitmap images

+or uncompressed PCM audio.

+However, special purpose algorithms may give significantly better

+results than Delta + LZMA2.

+This is true especially with audio,

+which compresses faster and better e.g. with

+.BR flac (1).

+.IP ""

Supported

.IR options :

.RS

@@ -1046,99 +1440,111 @@

.BI dist= distance

Specify the

.I distance

-of the delta calculation as bytes.

+of the delta calculation in bytes.

.I distance

-must be 1\-256. The default is 1.

-.IP

+must be 1\-256.

+The default is 1.

+.IP ""

For example, with

.B dist=2

and eight-byte input A1 B1 A2 B3 A3 B5 A4 B7, the output will be

A1 B1 01 02 01 02 01 02.

.RE
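The
.B dist=2
example above can be reproduced with a straightforward byte-wise delta in Python. The sketch assumes bytes before the start of the input count as zero, which matches the first two output bytes being unchanged; the function names are hypothetical.

```python
def delta_encode(data: bytes, dist: int = 1) -> bytes:
    """Byte-wise delta: out[i] = data[i] - data[i - dist] (mod 256)."""
    if not 1 <= dist <= 256:
        raise ValueError("dist must be 1-256")
    out = bytearray(len(data))
    for i, byte in enumerate(data):
        prev = data[i - dist] if i >= dist else 0
        out[i] = (byte - prev) & 0xFF
    return bytes(out)


def delta_decode(data: bytes, dist: int = 1) -> bytes:
    """Inverse of delta_encode: add back the already-decoded byte."""
    out = bytearray(len(data))
    for i, byte in enumerate(data):
        prev = out[i - dist] if i >= dist else 0
        out[i] = (byte + prev) & 0xFF
    return bytes(out)


sample = bytes.fromhex("A1 B1 A2 B3 A3 B5 A4 B7")
encoded = delta_encode(sample, dist=2)
print(encoded.hex(" ").upper())  # A1 B1 01 02 01 02 01 02
```

The repeating `01 02` pattern in the output is what makes the data easier for the following LZMA2 filter to compress.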

.SS "Other options"

.TP

.BR \-q ", " \-\-quiet

-Suppress warnings and notices. Specify this twice to suppress errors too.

-This option has no effect on the exit status. That is, even if a warning

-was suppressed, the exit status to indicate a warning is still used.

+Suppress warnings and notices.

+Specify this twice to suppress errors too.

+This option has no effect on the exit status.

+That is, even if a warning was suppressed,

+the exit status to indicate a warning is still used.

.TP

.BR \-v ", " \-\-verbose

-Be verbose. If standard error is connected to a terminal,

+Be verbose.

+If standard error is connected to a terminal,

.B xz

will display a progress indicator.

Specifying

.B \-\-verbose

-twice will give even more verbose output (useful mostly for debugging).

-.IP

+twice will give even more verbose output.

+.IP ""

The progress indicator shows the following information:

.RS

.IP \(bu 3

-Completion percentage is shown if the size of the input file is known.

-That is, percentage cannot be shown in pipes.

+Completion percentage is shown

+if the size of the input file is known.

+That is, the percentage cannot be shown in pipes.

.IP \(bu 3

-Amount of compressed data produced (compressing) or consumed (decompressing).

+Amount of compressed data produced (compressing)

+or consumed (decompressing).

.IP \(bu 3

-Amount of uncompressed data consumed (compressing) or produced

-(decompressing).

+Amount of uncompressed data consumed (compressing)

+or produced (decompressing).

.IP \(bu 3

-Compression ratio, which is calculated by dividing the amount of

-compressed data processed so far by the amount of uncompressed data

-processed so far.

+Compression ratio, which is calculated by dividing

+the amount of compressed data processed so far by

+the amount of uncompressed data processed so far.

.IP \(bu 3

-Compression or decompression speed. This is measured as the amount of

-uncompressed data consumed (compression) or produced (decompression)

-per second. It is shown once a few seconds have passed since

+Compression or decompression speed.

+This is measured as the amount of uncompressed data consumed

+(compression) or produced (decompression) per second.

+It is shown after a few seconds have passed since

.B xz

started processing the file.

.IP \(bu 3

-Elapsed time or estimated time remaining.

-Elapsed time is displayed in the format M:SS or H:MM:SS.

-The estimated remaining time is displayed in a less precise format

-which never has colons, for example, 2 min 30 s. The estimate can

-be shown only when the size of the input file is known and a couple of

-seconds have already passed since

+Elapsed time in the format M:SS or H:MM:SS.

+.IP \(bu 3

+Estimated remaining time is shown

+only when the size of the input file is

+known and a couple of seconds have already passed since

.B xz

started processing the file.

+The time is shown in a less precise format which

+never has any colons, e.g. 2 min 30 s.

.RE
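The elapsed-time format and the compression ratio described in the list above can be sketched in Python. Switching to H:MM:SS at exactly one hour is an assumption, and the sketch makes no claim about how xz rounds or prints the ratio; both function names are hypothetical.

```python
def format_elapsed(seconds: int) -> str:
    """Format elapsed time as M:SS, or H:MM:SS from one hour up."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def compression_ratio(compressed: int, uncompressed: int) -> float:
    """Compressed bytes divided by uncompressed bytes processed so far."""
    return compressed / uncompressed


print(format_elapsed(150))           # 2:30
print(format_elapsed(3725))          # 1:02:05
print(compression_ratio(250, 1000))  # 0.25
```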

-.IP

+.IP ""

When standard error is not a terminal,

.B \-\-verbose

will make

.B xz

-print the filename, compressed size, uncompressed size, compression ratio,

-speed, and elapsed time on a single line to standard error after

-compressing or decompressing the file. If operating took at least a few

-seconds, also the speed and elapsed time are printed. If the operation

-didn't finish, for example due to user interruption, also the completion

-percentage is printed if the size of the input file is known.

+print the filename, compressed size, uncompressed size,

+compression ratio, and possibly also the speed and elapsed time

+on a single line to standard error after compressing or

+decompressing the file.

+The speed and elapsed time are included only when

+the operation took at least a few seconds.

+If the operation didn't finish, e.g. due to user interruption,

+also the completion percentage is printed

+if the size of the input file is known.

.TP

.BR \-Q ", " \-\-no\-warn

-Don't set the exit status to

-.B 2

-even if a condition worth a warning was detected. This option doesn't affect

-the verbosity level, thus both

+Don't set the exit status to 2

+even if a condition worth a warning was detected.

+This option doesn't affect the verbosity level, thus both

.B \-\-quiet

and

.B \-\-no\-warn

-have to be used to not display warnings and to not alter the exit status.

+have to be used to not display warnings and

+to not alter the exit status.

.TP

.B \-\-robot

-Print messages in a machine-parsable format. This is intended to ease

-writing frontends that want to use

+Print messages in a machine-parsable format.

+This is intended to ease writing frontends that want to use

.B xz

-instead of liblzma, which may be the case with various scripts. The output

-with this option enabled is meant to be stable across

+instead of liblzma, which may be the case with various scripts.

+The output with this option enabled is meant to be stable across

.B xz

-releases. See the section

+releases.

+See the section

.B "ROBOT MODE"

for details.

.TP

-.BR \-\-info-memory

-Display the current memory usage limit in human-readable format on

-a single line, and exit successfully. To see how much RAM

+.BR \-\-info\-memory

+Display, in human-readable format, how much physical memory (RAM)

.B xz

-thinks your system has, use

-.BR "\-\-memory=100% \-\-info\-memory" .

+thinks the system has and the memory usage limits for compression

+and decompression, and exit successfully.

.TP

.BR \-h ", " \-\-help

Display a help message describing the most commonly used options,

@@ -1152,24 +1558,29 @@

.BR \-V ", " \-\-version

Display the version number of

.B xz

-and liblzma in human readable format. To get machine-parsable output, specify

+and liblzma in human readable format.

+To get machine-parsable output, specify

.B \-\-robot

before

.BR \-\-version .

-.SH ROBOT MODE

+.SH "ROBOT MODE"

The robot mode is activated with the

.B \-\-robot

-option. It makes the output of

+option.

+It makes the output of

.B xz

-easier to parse by other programs. Currently

+easier to parse by other programs.

+Currently

.B \-\-robot

is supported only together with

.BR \-\-version ,

-.BR \-\-info-memory ,

+.BR \-\-info\-memory ,

and

.BR \-\-list .

-It will be supported for normal compression and decompression in the future.

-.PP

+It will be supported for normal compression and

+decompression in the future.

.SS Version

.B "xz \-\-robot \-\-version"

will print the version number of

@@ -1184,24 +1595,19 @@

Major version.

.TP

.I YYY

-Minor version. Even numbers are stable.

+Minor version.

+Even numbers are stable.

Odd numbers are alpha or beta versions.

.TP

.I ZZZ

-Patch level for stable releases or just a counter for development releases.

+Patch level for stable releases or

+just a counter for development releases.

.TP

.I S

Stability.

-.B 0

-is alpha,

-.B 1

-is beta, and

-.B 2

-is stable.

+0 is alpha, 1 is beta, and 2 is stable.

.I S

-should be always

-.B 2

-when

+should always be 2 when

.I YYY

is even.

.PP

@@ -1215,31 +1621,48 @@

and

5.0.0 is

.BR 50000002 .
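The XYYYZZZS layout can be packed and unpacked with integer arithmetic. The function names are hypothetical; the worked value is the 5.0.0 example given above.

```python
def encode_version(major: int, minor: int, patch: int, stability: int) -> int:
    """Pack X.YYY.ZZZ and stability S into the XYYYZZZS integer."""
    return ((major * 1000 + minor) * 1000 + patch) * 10 + stability


def decode_version(number: int):
    """Split an XYYYZZZS integer into (major, minor, patch, stability)."""
    number, stability = divmod(number, 10)
    number, patch = divmod(number, 1000)
    major, minor = divmod(number, 1000)
    return major, minor, patch, stability


print(encode_version(5, 0, 0, 2))  # 50000002
print(decode_version(50000002))    # (5, 0, 0, 2)
```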

-.SS Memory limit information

-.B "xz \-\-robot \-\-info-memory"

-prints the current memory usage limit as bytes on a single line.

-To get the total amount of installed RAM, use

-.BR "xz \-\-robot \-\-memory=100% \-\-info-memory" .

-.SS List mode

+.SS "Memory limit information"

+.B "xz \-\-robot \-\-info\-memory"

+prints a single line with three tab-separated columns:

+.IP 1. 4

+Total amount of physical memory (RAM) in bytes

+.IP 2. 4

+Memory usage limit for compression in bytes.

+A special value of zero indicates the default setting,

+which for single-threaded mode is the same as no limit.

+.IP 3. 4

+Memory usage limit for decompression in bytes.

+A special value of zero indicates the default setting,

+which for single-threaded mode is the same as no limit.

+.PP

+In the future, the output of

+.B "xz \-\-robot \-\-info\-memory"

+may have more columns, but never more than a single line.
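A frontend could read this line as sketched below. The parser reads only the first three columns, so the extra columns that future versions may add are ignored. The class and function names are hypothetical, and the sample line is made up rather than captured from a real run.

```python
from typing import NamedTuple


class MemoryInfo(NamedTuple):
    total_ram: int
    compress_limit: int    # 0 means the default setting
    decompress_limit: int  # 0 means the default setting


def parse_info_memory(line: str) -> MemoryInfo:
    """Parse one line of robot-mode --info-memory output."""
    fields = line.rstrip("\n").split("\t")
    # Only the first three columns are used; later columns are ignored.
    return MemoryInfo(*(int(f) for f in fields[:3]))


# Made-up sample line: 4 GiB of RAM, both limits at their defaults.
info = parse_info_memory("4294967296\t0\t0\n")
print(info.total_ram, info.compress_limit, info.decompress_limit)
```

In practice the input line would come from running `xz --robot --info-memory`, for example through the `subprocess` module.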

+.SS "List mode"

.B "xz \-\-robot \-\-list"

-uses tab-separated output. The first column of every line has a string

+uses tab-separated output.

+The first column of every line has a string

that indicates the type of the information found on that line:

.TP

.B name

-This is always the first line when starting to list a file. The second

-column on the line is the filename.

+This is always the first line when starting to list a file.

+The second column on the line is the filename.

.TP

.B file

This line contains overall information about the

.B .xz

-file. This line is always printed after the

+file.

+This line is always printed after the

.B name

line.

.TP

.B stream

This line type is used only when

.B \-\-verbose

-was specified. There are as many

+was specified.

+There are as many

.B stream

lines as there are streams in the

.B .xz

@@ -1248,11 +1671,13 @@

.B block

This line type is used only when

.B \-\-verbose

-was specified. There are as many

+was specified.

+There are as many

.B block

lines as there are blocks in the

.B .xz

-file. The

+file.

+The

.B block

lines are shown after all the

.B stream

@@ -1261,9 +1686,11 @@

.B summary

This line type is used only when

.B \-\-verbose

-was specified twice. This line is printed after all

+was specified twice.

+This line is printed after all

.B block

-lines. Like the

+lines.

+Like the

.B file

line, the

.B summary

@@ -1272,12 +1699,13 @@

file.

.TP

.B totals

-This line is always the very last line of the list output. It shows

-the total counts and sizes.

+This line is always the very last line of the list output.

+It shows the total counts and sizes.

.PP

The columns of the

.B file

lines:

+.PD 0

.RS

.IP 2. 4

Number of streams in the file

@@ -1294,8 +1722,8 @@

.RB ( \-\-\- )

are displayed instead of the ratio.

.IP 7. 4

-Comma-separated list of integrity check names. The following strings are

-used for the known check types:

+Comma-separated list of integrity check names.

+The following strings are used for the known check types:

.BR None ,

.BR CRC32 ,

.BR CRC64 ,

@@ -1309,10 +1737,12 @@

.IP 8. 4

Total size of stream padding in the file

.RE

+.PD

.PP

The columns of the

.B stream

lines:

+.PD 0

.RS

.IP 2. 4

Stream number (the first stream is 1)

@@ -1333,15 +1763,18 @@

.IP 10. 4

Size of stream padding

.RE

+.PD

.PP

The columns of the

.B block

lines:

+.PD 0

.RS

.IP 2. 4

Number of the stream containing this block

.IP 3. 4

-Block number relative to the beginning of the stream (the first block is 1)

+Block number relative to the beginning of the stream

+(the first block is 1)

.IP 4. 4

Block number relative to the beginning of the file

.IP 5. 4

@@ -1357,14 +1790,18 @@

.IP 10. 4

Name of the integrity check

.RE

+.PD

.PP

If

.B \-\-verbose

was specified twice, additional columns are included on the

.B block

-lines. These are not displayed with a single

+lines.

+These are not displayed with a single

.BR \-\-verbose ,

-because getting this information requires many seeks and can thus be slow:

+because getting this information requires many seeks

+and can thus be slow:

+.PD 0

.RS

.IP 11. 4

Value of the integrity check in hexadecimal

@@ -1378,26 +1815,30 @@

indicates that uncompressed size is present.

If the flag is not set, a dash

.RB ( \- )

-is shown instead to keep the string length fixed. New flags may be added

-to the end of the string in the future.

+is shown instead to keep the string length fixed.

+New flags may be added to the end of the string in the future.

.IP 14. 4

Size of the actual compressed data in the block (this excludes

the block header, block padding, and check fields)

.IP 15. 4

-Amount of memory (as bytes) required to decompress this block with this

+Amount of memory (in bytes) required to decompress

+this block with this

.B xz

version

.IP 16. 4

-Filter chain. Note that most of the options used at compression time cannot

-be known, because only the options that are needed for decompression are

-stored in the

+Filter chain.

+Note that most of the options used at compression time

+cannot be known, because only the options

+that are needed for decompression are stored in the

.B .xz

headers.

.RE

+.PD

.PP

The columns of the

.B totals

line:

+.PD 0

.RS

.IP 2. 4

Number of streams

@@ -1410,14 +1851,17 @@

.IP 6. 4

Average compression ratio

.IP 7. 4

-Comma-separated list of integrity check names that were present in the files

+Comma-separated list of integrity check names

+that were present in the files

.IP 8. 4

Stream padding size

.IP 9. 4

-Number of files. This is here to keep the order of the earlier columns

-the same as on

+Number of files.

+This is here to

+keep the order of the earlier columns the same as on

.B file

lines.

+.PD

.RE

.PP

@@ -1425,10 +1869,11 @@

was specified twice, additional columns are included on the

.B totals

line:

+.PD 0

.RS

.IP 10. 4

-Maximum amount of memory (as bytes) required to decompress the files

-with this

+Maximum amount of memory (in bytes) required to decompress

+the files with this

.B xz

version

.IP 11. 4

@@ -1438,9 +1883,12 @@

indicating if all block headers have both compressed size and

uncompressed size stored in them

.RE

+.PD

.PP

-Future versions may add new line types and new columns can be added to

-the existing line types, but the existing columns won't be changed.

+Future versions may add new line types and

+new columns can be added to the existing line types,

+but the existing columns won't be changed.
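Every line starts with its type, so `awk` with a tab field separator can select just the lines it needs. This sketch pairs each `name` line with the `file` line that follows it. It prints the filename, the compressed size, the uncompressed size, and the difference.

It assumes that columns 4 and 5 of a `file` line are the compressed and uncompressed sizes. That matches the `totals` column order used by the `$5\-$4` robot-mode example. `list_sizes` is a hypothetical name.

```sh
#!/bin/sh
# Sketch: summarize `xz --robot --list` output read from stdin.
# Prints: filename, compressed size, uncompressed size, bytes saved.
# Assumes "file" line column 4 = compressed, column 5 = uncompressed.
list_sizes() {
    # -F '\t' keeps filenames containing spaces in a single field
    awk -F '\t' '
        $1 == "name" { name = $2 }
        $1 == "file" { printf "%s\t%s\t%s\t%s\n", name, $4, $5, $5 - $4 }
    '
}

# Usage: xz --robot --list *.xz | list_sizes
```

Lines of other types, such as `stream`, `block`, and `totals` with `\-\-verbose`, are skipped because only `name` and `file` are matched.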

.SH "EXIT STATUS"

.TP

.B 0

@@ -1450,21 +1898,76 @@

An error occurred.

.TP

.B 2

-Something worth a warning occurred, but no actual errors occurred.

+Something worth a warning occurred,

+but no actual errors occurred.

.PP

-Notices (not warnings or errors) printed on standard error don't affect

-the exit status.

+Notices (not warnings or errors) printed on standard error

+don't affect the exit status.

.SH ENVIRONMENT

+.B xz

+parses space-separated lists of options

+from the environment variables

+.B XZ_DEFAULTS

+and

+.BR XZ_OPT ,

+in this order, before parsing the options from the command line.

+Note that only options are parsed from the environment variables;

+all non-options are silently ignored.

+Parsing is done with

+.BR getopt_long (3)

+which is also used for the command line arguments.

.TP

+.B XZ_DEFAULTS

+User-specific or system-wide default options.

+Typically this is set in a shell initialization script to enable

+.BR xz 's

+memory usage limiter by default.

+Excluding shell initialization scripts

+and similar special cases, scripts must never set or unset

+.BR XZ_DEFAULTS .

+.TP

.B XZ_OPT

-A space-separated list of options is parsed from

+This is for passing options to

+.B xz

+when it is not possible to set the options directly on the

+.B xz

+command line.

+This is the case, for example, when

+.B xz

+is run by a script or tool such as GNU

+.BR tar (1):

+.RS

+.PP

+.nf

+.ft CW

+XZ_OPT=\-2v tar caf foo.tar.xz foo

+.ft R

+.fi

+.RE

+.IP ""

+Scripts may use

.B XZ_OPT

-before parsing the options given on the command line. Note that only

-options are parsed from

-.BR XZ_OPT ;

-all non-options are silently ignored. Parsing is done with

-.BR getopt_long (3)

-which is used also for the command line arguments.

+e.g. to set script-specific default compression options.

+It is still recommended to allow users to override

+.B XZ_OPT

+if that is reasonable, e.g. in

+.BR sh (1)

+scripts one may use something like this:

+.RS

+.PP

+.nf

+.ft CW

+XZ_OPT=${XZ_OPT\-"\-7e"}

+export XZ_OPT

+.ft R

+.fi

+.RE
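The `${XZ_OPT\-...}` form in the example above supplies the default only when `XZ_OPT` is unset. A user who deliberately sets `XZ_OPT` to an empty string therefore keeps it empty. The sketch below contrasts this with `${XZ_OPT:\-...}`, which also replaces an empty value. `show_expansions` and `script_default` are illustrative names.

```sh
#!/bin/sh
# Compare the two default-value expansions for an unset, an empty, and a
# user-set XZ_OPT. Each case runs in a subshell so nothing leaks out.
script_default=-7e

show_expansions() {
    for state in unset empty set; do
        (
            unset XZ_OPT
            case $state in
                empty) XZ_OPT= ;;
                set) XZ_OPT=-2 ;;
            esac
            printf '%s: dash gives "%s", colon-dash gives "%s"\n' \
                "$state" "${XZ_OPT-$script_default}" "${XZ_OPT:-$script_default}"
        )
    done
}

show_expansions
```

Only the `empty` case differs: the dash form yields an empty string, while the colon-dash form yields `-7e`.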

.SH "LZMA UTILS COMPATIBILITY"

The command line syntax of

.B xz

@@ -1473,26 +1976,32 @@

.BR unlzma ,

and

.BR lzcat

-as found from LZMA Utils 4.32.x. In most cases, it is possible to replace

-LZMA Utils with XZ Utils without breaking existing scripts. There are some

-incompatibilities though, which may sometimes cause problems.

+as found in LZMA Utils 4.32.x.

+In most cases, it is possible to replace

+LZMA Utils with XZ Utils without breaking existing scripts.

+There are some incompatibilities though,

+which may sometimes cause problems.

.SS "Compression preset levels"

The numbering of the compression level presets is not identical in

.B xz

and LZMA Utils.

-The most important difference is how dictionary sizes are mapped to different

-presets. Dictionary size is roughly equal to the decompressor memory usage.

+The most important difference is how dictionary sizes

+are mapped to different presets.

+Dictionary size is roughly equal to the decompressor memory usage.

.RS

+.PP

.TS

tab(;);

c c c

c n n.

Level;xz;LZMA Utils

-\-1;64 KiB;64 KiB

-\-2;512 KiB;1 MiB

-\-3;1 MiB;512 KiB

-\-4;2 MiB;1 MiB

-\-5;4 MiB;2 MiB

+\-0;256 KiB;N/A

+\-1;1 MiB;64 KiB

+\-2;2 MiB;1 MiB

+\-3;4 MiB;512 KiB

+\-4;4 MiB;1 MiB

+\-5;8 MiB;2 MiB

\-6;8 MiB;4 MiB

\-7;16 MiB;8 MiB

\-8;32 MiB;16 MiB

@@ -1500,20 +2009,24 @@

.TE

.RE

.PP

-The dictionary size differences affect the compressor memory usage too,

-but there are some other differences between LZMA Utils and XZ Utils, which

+The dictionary size differences affect

+the compressor memory usage too,

+but there are some other differences between

+LZMA Utils and XZ Utils, which

make the difference even bigger:

.RS

+.PP

.TS

tab(;);

c c c

c n n.

Level;xz;LZMA Utils 4.32.x

-\-1;2 MiB;2 MiB

-\-2;5 MiB;12 MiB

-\-3;13 MiB;12 MiB

-\-4;25 MiB;16 MiB

-\-5;48 MiB;26 MiB

+\-0;3 MiB;N/A

+\-1;9 MiB;2 MiB

+\-2;17 MiB;12 MiB

+\-3;32 MiB;12 MiB

+\-4;48 MiB;16 MiB

+\-5;94 MiB;26 MiB

\-6;94 MiB;45 MiB

\-7;186 MiB;83 MiB

\-8;370 MiB;159 MiB

@@ -1525,33 +2038,40 @@

.B \-7

while in XZ Utils it is

.BR \-6 ,

-so both use 8 MiB dictionary by default.

+so both use an 8 MiB dictionary by default.
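Sometimes a .lzma file must decompress with the same memory as one made by a given LZMA Utils level. In that case the dictionary size is what to match, since it roughly equals decompressor memory usage.

The lookup below copies the `\-1` to `\-8` rows of the first table above. The `\-9` row is not reproduced here, so level 9 is rejected. The helper name and the `\-\-lzma1=preset=...,dict=...` usage are a sketch. The remaining encoder settings still follow xz's own preset.

```sh
#!/bin/sh
# Dictionary size LZMA Utils 4.32.x uses for a level (-1 ... -8 rows only)
lzma_utils_dict() {
    case $1 in
        1) echo 64KiB ;;
        2|4) echo 1MiB ;;
        3) echo 512KiB ;;
        5) echo 2MiB ;;
        6) echo 4MiB ;;
        7) echo 8MiB ;;
        8) echo 16MiB ;;
        *) echo "level $1 not covered" >&2; return 1 ;;
    esac
}

# Usage: .lzma output whose dictionary matches what "lzma -3" would use
#   xz --format=lzma --lzma1=preset=3,dict="$(lzma_utils_dict 3)" foo
```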

.SS "Streamed vs. non-streamed .lzma files"

-Uncompressed size of the file can be stored in the

+The uncompressed size of the file can be stored in the

.B .lzma

-header. LZMA Utils does that when compressing regular files.

-The alternative is to mark that uncompressed size is unknown and

-use end of payload marker to indicate where the decompressor should stop.

-LZMA Utils uses this method when uncompressed size isn't known, which is

-the case for example in pipes.

+header.

+LZMA Utils does that when compressing regular files.

+The alternative is to mark the uncompressed size as unknown

+and use an end-of-payload marker to indicate

+where the decompressor should stop.

+LZMA Utils uses this method when the uncompressed size isn't known,

+which is the case for example in pipes.

.PP

.B xz

supports decompressing

.B .lzma

-files with or without end of payload marker, but all

+files with or without an end-of-payload marker, but all

.B .lzma

files created by

.B xz

-will use end of payload marker and have uncompressed size marked as unknown

-in the

+will use an end-of-payload marker and have the uncompressed size

+marked as unknown in the

.B .lzma

-header. This may be a problem in some (uncommon) situations. For example, a

+header.

+This may be a problem in some uncommon situations.

+For example, a

.B .lzma

-decompressor in an embedded device might work only with files that have known

-uncompressed size. If you hit this problem, you need to use LZMA Utils or

-LZMA SDK to create

+decompressor in an embedded device might work

+only with files that have known uncompressed size.

+If you hit this problem, you need to use LZMA Utils

+or LZMA SDK to create

.B .lzma

files with known uncompressed size.
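To see which kind of .lzma file you have, you can inspect its header. This sketch assumes the usual 13-byte .lzma header layout:

- one properties byte,
- a 4-byte dictionary size,
- an 8-byte little-endian uncompressed size, where all bytes 0xFF mean "unknown".

It uses only `od` and `tr`. `lzma_size_field` is a hypothetical helper.

```sh
#!/bin/sh
# Sketch: report whether a .lzma header stores the uncompressed size.
# Reads header bytes 5..12 (the assumed 8-byte uncompressed size field).
lzma_size_field() {
    hex=$(od -An -tx1 -j5 -N8 "$1" | tr -d ' \n')
    if [ ${#hex} -ne 16 ]; then
        echo "$1: too short for a .lzma header" >&2
        return 2
    fi
    if [ "$hex" = ffffffffffffffff ]; then
        echo "unknown (end-of-payload marker)"
    else
        echo "known"
    fi
}

# Usage: lzma_size_field foo.lzma
```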

.SS "Unsupported .lzma files"

The

.B .lzma

@@ -1559,7 +2079,8 @@

.I lc

values up to 8, and

.I lp

-values up to 4. LZMA Utils can decompress files with any

+values up to 4.

+LZMA Utils can decompress files with any

.I lc

and

.IR lp ,

@@ -1575,24 +2096,25 @@

.B xz

and with LZMA SDK.

.PP

-The implementation of the LZMA1 filter in liblzma requires

-that the sum of

+The implementation of the LZMA1 filter in liblzma

+requires that the sum of

.I lc

and

.I lp

-must not exceed 4. Thus,

+not exceed 4.

+Thus,

.B .lzma

-files which exceed this limitation, cannot be decompressed with

+files that exceed this limitation cannot be decompressed with

.BR xz .

.PP

LZMA Utils creates only

.B .lzma

-files which have dictionary size of

+files which have a dictionary size of

.RI "2^" n

-(a power of 2), but accepts files with any dictionary size.

+(a power of 2) but accepts files with any dictionary size.

liblzma accepts only

.B .lzma

-files which have dictionary size of

+files which have a dictionary size of

.RI "2^" n

or

.RI "2^" n " + 2^(" n "\-1)."

@@ -1600,13 +2122,18 @@

.B .lzma

files.

.PP

-These limitations shouldn't be a problem in practice, since practically all

+These limitations shouldn't be a problem in practice,

+since practically all

.B .lzma

files have been compressed with settings that liblzma will accept.
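The same header also shows whether a file hits the liblzma limits described above. This sketch assumes two things about the header:

- the properties byte encodes `(pb * 5 + lp) * 9 + lc`;
- bytes 1 to 4 hold the dictionary size in little-endian order.

It checks only the two rules stated in this section. Treat a pass as a hint, not proof, that xz will decode the file. The helper names are hypothetical.

```sh
#!/bin/sh
# Sketch: check lc + lp <= 4 and dict = 2^n or 2^n + 2^(n-1) in a .lzma header.
is_pow2() {
    [ "$1" -gt 0 ] && [ $(( $1 & ($1 - 1) )) -eq 0 ]
}

check_lzma_header() {
    set -- $(od -An -tu1 -N5 "$1")   # props byte + 4 dictionary-size bytes
    if [ $# -ne 5 ] || [ "$1" -ge 225 ]; then
        echo "not a valid .lzma header" >&2
        return 2
    fi
    lc=$(( $1 % 9 ))
    lp=$(( $1 / 9 % 5 ))
    pb=$(( $1 / 45 ))
    dict=$(( $2 + ($3 << 8) + ($4 << 16) + ($5 << 24) ))
    echo "lc=$lc lp=$lp pb=$pb dict=$dict"
    rc=0
    if [ $((lc + lp)) -gt 4 ]; then
        echo "lc + lp exceeds 4"
        rc=1
    fi
    # 2^n + 2^(n-1) equals 3 * 2^(n-1)
    if ! is_pow2 "$dict" &&
       ! { [ $((dict % 3)) -eq 0 ] && is_pow2 $((dict / 3)); }; then
        echo "dictionary size is neither 2^n nor 2^n + 2^(n-1)"
        rc=1
    fi
    return $rc
}

# Usage: check_lzma_header foo.lzma
```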

.SS "Trailing garbage"

-When decompressing, LZMA Utils silently ignore everything after the first

+When decompressing,

+LZMA Utils silently ignores everything after the first

.B .lzma

-stream. In most situations, this is a bug. This also means that LZMA Utils

+stream.

+In most situations, this is a bug.

+This also means that LZMA Utils

doesn't support decompressing concatenated

.B .lzma

files.

@@ -1615,34 +2142,46 @@

.B .lzma

stream,

.B xz

-considers the file to be corrupt. This may break obscure scripts which have

+considers the file to be corrupt.

+This may break obscure scripts which have

assumed that trailing garbage is ignored.

.SH NOTES

-.SS Compressed output may vary

-The exact compressed output produced from the same uncompressed input file

-may vary between XZ Utils versions even if compression options are identical.

-This is because the encoder can be improved (faster or better compression)

-without affecting the file format. The output can vary even between different

-builds of the same XZ Utils version, if different build options are used.

+.SS "Compressed output may vary"

+The exact compressed output produced from

+the same uncompressed input file

+may vary between XZ Utils versions even if

+compression options are identical.

+This is because the encoder can be improved

+(faster or better compression)

+without affecting the file format.

+The output can vary even between different

+builds of the same XZ Utils version,

+if different build options are used.

.PP

The above means that implementing

.B \-\-rsyncable

to create rsyncable

.B .xz

-files is not going to happen without freezing a part of the encoder

+files is not going to happen without

+freezing a part of the encoder

implementation, which can then be used with

.BR \-\-rsyncable .

-.SS Embedded .xz decompressors

+.SS "Embedded .xz decompressors"

Embedded

.B .xz

-decompressor implementations like XZ Embedded don't necessarily support files

-created with

+decompressor implementations like XZ Embedded don't necessarily

+support files created with integrity

.I check

types other than

.B none

and

.BR crc32 .

-Since the default is \fB\-\-check=\fIcrc64\fR, you must use

+Since the default is

+.BR \-\-check=crc64 ,

+you must use

.B \-\-check=none

or

.B \-\-check=crc32

@@ -1652,53 +2191,374 @@

.B .xz

format decompressors support all the

.I check

-types, or at least are able to decompress the file without verifying the

+types, or at least are able to decompress

+the file without verifying the

integrity check if the particular

.I check

is not supported.

.PP

-XZ Embedded supports BCJ filters, but only with the default start offset.

+XZ Embedded supports BCJ filters,

+but only with the default start offset.
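Before shipping files to an embedded decoder, it can help to see which check type they use. The supported route is `xz \-\-robot \-\-list`, whose `file` lines list the check names.

The sketch below instead reads the first stream header with `od`. It assumes the .xz layout: six magic bytes FD 37 7A 58 5A 00, then stream flags whose second byte holds the check ID in its low four bits. It sees only the first stream of a concatenated file. `xz_check_type` is a hypothetical helper.

```sh
#!/bin/sh
# Sketch: print the integrity check of the first .xz stream header.
# Assumed check IDs: 0 = none, 1 = crc32, 4 = crc64, 10 = sha256.
xz_check_type() {
    set -- $(od -An -tu1 -N8 "$1")
    if [ $# -ne 8 ] || [ "$1 $2 $3 $4 $5 $6" != "253 55 122 88 90 0" ]; then
        echo "not an .xz file" >&2
        return 2
    fi
    id=$(( $8 & 15 ))
    case $id in
        0) echo none ;;
        1) echo crc32 ;;
        4) echo crc64 ;;
        10) echo sha256 ;;
        *) echo "other (ID $id)" ;;
    esac
}

# Usage: flag files an embedded decoder might not accept
#   for f in *.xz; do
#       case $(xz_check_type "$f") in
#           none|crc32) ;;
#           *) echo "$f: consider recompressing with --check=crc32" ;;
#       esac
#   done
```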

.SH EXAMPLES

.SS Basics

+Compress the file

+.I foo

+into

+.I foo.xz

+using the default compression level

+.RB ( \-6 ),

+and remove

+.I foo

+if compression is successful:

+.RS

+.PP

+.nf

+.ft CW

+xz foo

+.ft R

+.fi

+.RE

+.PP

+Decompress

+.I bar.xz

+into

+.I bar

+and don't remove

+.I bar.xz

+even if decompression is successful:

+.RS

+.PP

+.nf

+.ft CW

+xz \-dk bar.xz

+.ft R

+.fi

+.RE

+.PP

+Create

+.I baz.tar.xz

+with the preset

+.B \-4e

+.RB ( "\-4 \-\-extreme" ),

+which is slower than e.g. the default

+.BR \-6 ,

+but needs less memory for compression and decompression (48\ MiB

+and 5\ MiB, respectively):

+.RS

+.PP

+.nf

+.ft CW

+tar cf \- baz | xz \-4e > baz.tar.xz

+.ft R

+.fi

+.RE

+.PP

A mix of compressed and uncompressed files can be decompressed

to standard output with a single command:

-.IP

-.B "xz -dcf a.txt b.txt.xz c.txt d.txt.xz > abcd.txt"

-.SS Parallel compression of many files

+.RS

+.PP

+.nf

+.ft CW

+xz \-dcf a.txt b.txt.xz c.txt d.txt.lzma > abcd.txt

+.ft R

+.fi

+.RE

+.SS "Parallel compression of many files"

On GNU and *BSD,

.BR find (1)

and

.BR xargs (1)

-can be used to parallellize compression of many files:

+can be used to parallelize compression of many files:

+.RS

.PP

-.IP

-.B "find . \-type f \e! \-name '*.xz' \-print0 | xargs \-0r \-P4 \-n16 xz"

+.nf

+.ft CW

+find . \-type f \e! \-name '*.xz' \-print0 \e

+ | xargs \-0r \-P4 \-n16 xz \-T1

+.ft R

+.fi

+.RE

.PP

The

.B \-P

-option sets the number of parallel

+option to

+.BR xargs (1)

+sets the number of parallel

.B xz

-processes. The best value for the

+processes.

+The best value for the

.B \-n

option depends on how many files there are to be compressed.

-If there are only a couple of files, the value should probably be

-.BR 1 ;

+If there are only a couple of files,

+the value should probably be 1;

with tens of thousands of files,

-.B 100

-or even more may be appropriate to reduce the number of

+100 or even more may be appropriate to reduce the number of

.B xz

processes that

.BR xargs (1)

will eventually create.

-.SS Robot mode examples

-Calculating how many bytes have been saved in total after compressing

-multiple files:

-.IP

-.B "xz --robot --list *.xz | awk '/^totals/{print $5\-$4}'"

+.PP

+The option

+.B \-T1

+for

+.B xz

+is there to force it to single-threaded mode, because

+.BR xargs (1)

+is used to control the amount of parallelization.

+.SS "Robot mode"

+Calculate how many bytes have been saved in total

+after compressing multiple files:

+.RS

+.PP

+.nf

+.ft CW

+xz \-\-robot \-\-list *.xz | awk '/^totals/{print $5\-$4}'

+.ft R

+.fi

+.RE

+.PP

+A script may want to know that it is using a new enough

+.BR xz .

+The following

+.BR sh (1)

+script checks that the version number of the

+.B xz

+tool is at least 5.0.0.

+This method is compatible with old beta versions,

+which didn't support the

+.B \-\-robot

+option:

+.RS

+.PP

+.nf

+.ft CW

+if ! eval "$(xz \-\-robot \-\-version 2> /dev/null)" ||

+ [ "$XZ_VERSION" \-lt 50000002 ]; then

+ echo "Your xz is too old."

+fi

+unset XZ_VERSION LIBLZMA_VERSION

+.ft R

+.fi

+.RE

+.PP

+Set a memory usage limit for decompression using

+.BR XZ_OPT ,

+but if a limit has already been set, don't increase it:

+.RS

+.PP

+.nf

+.ft CW

+NEWLIM=$((123 << 20)) # 123 MiB

+OLDLIM=$(xz \-\-robot \-\-info\-memory | cut \-f3)

+if [ $OLDLIM \-eq 0 \-o $OLDLIM \-gt $NEWLIM ]; then

+ XZ_OPT="$XZ_OPT \-\-memlimit\-decompress=$NEWLIM"

+ export XZ_OPT

+fi

+.ft R

+.fi

+.RE

+.SS "Custom compressor filter chains"

+The simplest use for custom filter chains is

+customizing an LZMA2 preset.

+This can be useful,

+because the presets cover only a subset of the

+potentially useful combinations of compression settings.

+.PP

+The CompCPU columns of the tables

+from the descriptions of the options

+.BR "\-0" " ... " "\-9"

+and

+.B \-\-extreme

+are useful when customizing LZMA2 presets.

+Here are the relevant parts collected from those two tables:

+.RS

+.PP

+.TS

+tab(;);

+c c

+n n.

+Preset;CompCPU

+\-0;0

+\-1;1

+\-2;2

+\-3;3

+\-4;4

+\-5;5

+\-6;6

+\-5e;7

+\-6e;8

+.TE

+.RE

+.PP

+If you know that a file requires

+a somewhat big dictionary (e.g. 32 MiB) to compress well,

+but you want to compress it quicker than

+.B "xz \-8"

+would do, a preset with a low CompCPU value (e.g. 1)

+can be modified to use a bigger dictionary:

+.RS

+.PP

+.nf

+.ft CW

+xz \-\-lzma2=preset=1,dict=32MiB foo.tar

+.ft R

+.fi

+.RE

+.PP

+With certain files, the above command may be faster than

+.B "xz \-6"

+while compressing significantly better.

+However, it must be emphasized that only some files benefit from

+a big dictionary while keeping the CompCPU value low.

+The most obvious situation

+where a big dictionary can help a lot

+is an archive containing very similar files

+of at least a few megabytes each.

+The dictionary size has to be significantly bigger

+than any individual file to allow LZMA2 to take

+full advantage of the similarities between consecutive files.

+.PP

+If very high compressor and decompressor memory usage is fine,

+and the file being compressed is

+at least several hundred megabytes, it may be useful

+to use an even bigger dictionary than the 64 MiB that

+.B "xz \-9"

+would use:

+.RS

+.PP

+.nf

+.ft CW

+xz \-vv \-\-lzma2=dict=192MiB big_foo.tar

+.ft R

+.fi

+.RE

+.PP

+Using

+.B \-vv

+.RB ( "\-\-verbose \-\-verbose" )

+like in the above example can be useful

+to see the memory requirements

+of the compressor and decompressor.

+Remember that using a dictionary bigger than

+the size of the uncompressed file is a waste of memory,

+so the above command isn't useful for small files.
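To avoid wasting memory on small inputs, a script can size the dictionary from the input instead of hard-coding it. This sketch starts at 4 KiB, assumed to be the LZMA2 minimum. It doubles until the dictionary covers the file or reaches a caller-chosen cap. `pick_dict` and the cap value are illustrative.

```sh
#!/bin/sh
# Sketch: smallest power-of-two dictionary >= input size, clamped to a cap.
pick_dict() {
    size=$1  # uncompressed size in bytes
    cap=$2   # largest dictionary the caller accepts, in bytes
    dict=4096
    while [ "$dict" -lt "$size" ] && [ "$dict" -lt "$cap" ]; do
        dict=$((dict * 2))
    done
    if [ "$dict" -gt "$cap" ]; then
        dict=$cap
    fi
    echo "$dict"
}

# Usage, with the 192 MiB cap from the example above:
#   size=$(wc -c < big_foo.tar | tr -d ' ')
#   xz -vv --lzma2=dict="$(pick_dict "$size" $((192 << 20)))" big_foo.tar
```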

+.PP

+Sometimes the compression time doesn't matter,

+but the decompressor memory usage has to be kept low

+e.g. to make it possible to decompress the file on

+an embedded system.

+The following command uses

+.B \-6e

+.RB ( "\-6 \-\-extreme" )

+as a base and sets the dictionary to only 64\ KiB.

+The resulting file can be decompressed with XZ Embedded

+(that's why there is

+.BR \-\-check=crc32 )

+using about 100\ KiB of memory.

+.RS

+.PP

+.nf

+.ft CW

+xz \-\-check=crc32 \-\-lzma2=preset=6e,dict=64KiB foo

+.ft R

+.fi

+.RE

+.PP

+If you want to squeeze out as many bytes as possible,

+adjusting the number of literal context bits

+.RI ( lc )

+and number of position bits

+.RI ( pb )

+can sometimes help.

+Adjusting the number of literal position bits

+.RI ( lp )

+might help too, but usually

+.I lc

+and

+.I pb

+are more important.

+E.g. a source code archive contains mostly US-ASCII text,

+so something like the following might give

+a slightly (around 0.1\ %) smaller file than

+.B "xz \-6e"

+(try also without

+.BR lc=4 ):

+.RS

+.PP

+.nf

+.ft CW

+xz \-\-lzma2=preset=6e,pb=0,lc=4 source_code.tar

+.ft R

+.fi

+.RE
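Whether `lc=4` helps depends on the data, so a small loop can try each value and keep the smallest result. This is a brute-force sketch, not an xz feature. It compresses the input once per candidate, so it takes about five times as long as a single run. `lp` is left at the preset's value, assumed to be 0, so `lc + lp` stays within 4. `best_lc` is a hypothetical helper.

```sh
#!/bin/sh
# Sketch: try lc = 0..4 on top of preset 6e with pb=0; report the smallest.
# Output goes to a pipe and is only counted, never written to disk.
best_lc() {
    best= best_size=
    for lc in 0 1 2 3 4; do
        size=$(xz -c --lzma2=preset=6e,pb=0,lc="$lc" "$1" | wc -c | tr -d ' ')
        if [ -z "$best_size" ] || [ "$size" -lt "$best_size" ]; then
            best=$lc
            best_size=$size
        fi
    done
    echo "lc=$best: $best_size bytes"
}

# Usage: best_lc source_code.tar
```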

+.PP

+Using another filter together with LZMA2 can improve

+compression with certain file types.

+E.g. to compress an x86-32 or x86-64 shared library

+using the x86 BCJ filter:

+.RS

+.PP

+.nf

+.ft CW

+xz \-\-x86 \-\-lzma2 libfoo.so

+.ft R

+.fi

+.RE

+.PP

+Note that the order of the filter options is significant.

+If

+.B \-\-x86

+is specified after

+.BR \-\-lzma2 ,

+.B xz

+will give an error,

+because there cannot be any filter after LZMA2,

+and also because the x86 BCJ filter cannot be used

+as the last filter in the chain.

+.PP

+The Delta filter together with LZMA2

+can give good results with bitmap images.

+It should usually beat PNG,

+which has a few more advanced filters than simple

+delta but uses Deflate for the actual compression.

+.PP

+The image has to be saved in uncompressed format,

+e.g. as uncompressed TIFF.

+The distance parameter of the Delta filter is set

+to match the number of bytes per pixel in the image.

+E.g. a 24-bit RGB bitmap needs

+.BR dist=3 ,

+and it is also good to pass

+.B pb=0

+to LZMA2 to accommodate the three-byte alignment:

+.RS

+.PP

+.nf

+.ft CW

+xz \-\-delta=dist=3 \-\-lzma2=pb=0 foo.tiff

+.ft R

+.fi

+.RE

+.PP

+If multiple images have been put into a single archive (e.g.\&

+.BR .tar ),

+the Delta filter will work on that too as long as all images

+have the same number of bytes per pixel.
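The `dist` value, and a matching `pb`, can be derived from the bits per pixel. This sketch reproduces the `dist=3`, `pb=0` choice above for 24-bit images. For other depths it sets `pb` to log2 of the pixel size when that is a power of two up to 16 bytes. That rule is an assumed heuristic, not advice from this page. The 256-byte upper limit on `dist` is also an assumption, and `delta_opts` is a hypothetical name.

```sh
#!/bin/sh
# Sketch: Delta/LZMA2 options for an uncompressed image of a given depth.
delta_opts() {
    bits=$1
    if [ $((bits % 8)) -ne 0 ] || [ "$bits" -lt 8 ] || [ "$bits" -gt 2048 ]; then
        echo "unsupported bits per pixel: $bits" >&2
        return 1
    fi
    dist=$((bits / 8))            # bytes per pixel
    case $dist in
        1) pb=0 ;; 2) pb=1 ;; 4) pb=2 ;; 8) pb=3 ;; 16) pb=4 ;;
        *) pb=0 ;;                # non-power-of-two alignment, as for RGB24
    esac
    printf '%s\n' "--delta=dist=$dist --lzma2=pb=$pb"
}

# Usage: xz $(delta_opts 24) foo.tiff
```

With 24 bits per pixel, the usage line expands to the same command as the example above.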

.SH "SEE ALSO"

.BR xzdec (1),

+.BR xzdiff (1),

+.BR xzgrep (1),

+.BR xzless (1),

+.BR xzmore (1),

.BR gzip (1),

-.BR bzip2 (1)

+.BR bzip2 (1),

+.BR 7z (1)

.PP

XZ Utils: <http://tukaani.org/xz/>

.br
