Adding upstream version 0.27.

Signed-off-by: Daniel Baumann <daniel@debian.org>
Daniel Baumann 2025-03-04 07:39:25 +01:00
parent 701564a854
commit ee83909940
Signed by: daniel
GPG key ID: FBB4F0E80A80222F
83 changed files with 980 additions and 726 deletions


@ -1,5 +1,5 @@
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.49.2.
.TH TARLZ "1" "December 2024" "tarlz 0.26" "User Commands"
.TH TARLZ "1" "March 2025" "tarlz 0.27" "User Commands"
.SH NAME
tarlz \- creates tar archives with multimember lzip compression
.SH SYNOPSIS
@ -19,7 +19,7 @@ compressed archives.
.PP
Keeping the alignment between tar members and lzip members has two
advantages. It adds an indexed lzip layer on top of the tar archive, making
it possible to decode the archive safely in parallel. It also minimizes the
it possible to decode the archive safely in parallel. It also reduces the
amount of data lost in case of corruption.
.PP
The tarlz file format is a safe POSIX\-style backup format. In case of
@ -160,12 +160,12 @@ Report bugs to lzip\-bug@nongnu.org
.br
Tarlz home page: http://www.nongnu.org/lzip/tarlz.html
.SH COPYRIGHT
Copyright \(co 2024 Antonio Diaz Diaz.
Copyright \(co 2025 Antonio Diaz Diaz.
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html>
.br
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Using lzlib 1.15\-rc1
Using lzlib 1.15
Using LZ_API_VERSION = 1015
.SH "SEE ALSO"
The full documentation for


@ -11,13 +11,14 @@ File: tarlz.info, Node: Top, Next: Introduction, Up: (dir)
Tarlz Manual
************
This manual is for Tarlz (version 0.26, 7 December 2024).
This manual is for Tarlz (version 0.27, 28 February 2025).
* Menu:
* Introduction:: Purpose and features of tarlz
* Invoking tarlz:: Command-line interface
* Argument syntax:: By convention, options start with a hyphen
* Creating backups safely:: Checking integrity and accuracy of archives
* Portable character set:: POSIX portable filename character set
* File format:: Detailed format of the compressed archive
* Amendments to pax format:: The reasons for the differences with pax
@ -29,7 +30,7 @@ This manual is for Tarlz (version 0.26, 7 December 2024).
* Concept index:: Index of concepts
Copyright (C) 2013-2024 Antonio Diaz Diaz.
Copyright (C) 2013-2025 Antonio Diaz Diaz.
This manual is free documentation: you have unlimited permission to copy,
distribute, and modify it.
@ -53,7 +54,7 @@ compressed archives.
Keeping the alignment between tar members and lzip members has two
advantages. It adds an indexed lzip layer on top of the tar archive, making
it possible to decode the archive safely in parallel. It also minimizes the
it possible to decode the archive safely in parallel. It also reduces the
amount of data lost in case of corruption. Compressing a tar archive with
plzip may even double the amount of files lost for each lzip member damaged
because it does not keep the members aligned.
@ -216,7 +217,7 @@ tarlz supports the following operations:
'-t'
'--list'
List the contents of an archive. If FILES are given, list only the
FILES given.
FILES given. *Note mt-listing::.
'-x'
'--extract'
@ -227,20 +228,23 @@ tarlz supports the following operations:
directories unconditionally before extracting over them. Other than
that, it does not make any special effort to extract a file over an
incompatible type of file. For example, extracting a file over a
non-empty directory usually fails.
non-empty directory usually fails. *Note mt-extraction::.
'-z'
'--compress'
Compress existing POSIX tar archives aligning the lzip members to the
tar members with choice of granularity ('--bsolid' by default,
'--dsolid' works like '--asolid'). Exit with error status 2 if any
input archive is an empty file. The input archives are kept unchanged.
Existing compressed archives are not overwritten. A hyphen '-' used as
the name of an input archive reads from standard input and writes to
standard output (unless the option '--output' is used). Tarlz can be
used as compressor for GNU tar by using a command like
'tar -c -Hustar foo | tarlz -z -o foo.tar.lz'. Tarlz can be used as
compressor for zupdate (zutils) by using a command like
'--dsolid' works like '--asolid'). Each input archive is compressed to
a file with the extension '.lz' added unless the option '--output' is
used. If no archives are specified, or if a hyphen '-' is used as the
name of an archive, tarlz reads from standard input and writes to
standard output (unless the option '--output' is used). When
'--output' is used, only one input archive can be specified. Exit with
error status 2 if any input archive is an empty file. The input
archives are kept unchanged. Existing compressed archives are not
overwritten. Tarlz can be used as compressor for GNU tar by using a
command like 'tar -c -Hustar foo | tarlz -z -o foo.tar.lz'. Tarlz can
be used as compressor for zupdate (zutils) by using a command like
'zupdate --lz='tarlz -z' foo.tar.gz'. Note that tarlz only works
reliably on archives without global headers, or with global headers
whose content can be ignored.
@ -251,11 +255,8 @@ tarlz supports the following operations:
archive. Unless solid compression is requested, the end-of-archive
blocks are compressed in a lzip member separated from the preceding
members and from any nonzero garbage following the end-of-archive
blocks. '--compress' implies plzip argument style, not tar style. Each
input archive is compressed to a file with the extension '.lz' added
unless the option '--output' is used. When '--output' is used, only
one input archive can be specified. '-f' can't be used with
'--compress'.
blocks. '--compress' implies plzip argument style, not tar style. '-f'
can't be used with '--compress'.
'--check-lib'
Compare the version of lzlib used to compile tarlz with the version
@ -276,7 +277,9 @@ tarlz supports the following options: *Note Argument syntax::.
Set target size of input data blocks for the option '--bsolid'. *Note
--bsolid::. Valid values range from 8 KiB to 1 GiB. Default value is
two times the dictionary size, except for option '-0' where it
defaults to 1 MiB. *Note Minimum archive sizes::.
defaults to 1 MiB. *Note Minimum archive sizes::. Tarlz does not split
tar members. If a file is larger than BYTES, tarlz will create a lzip
member large enough to contain the file.
'-C DIR'
'--directory=DIR'
@ -424,12 +427,12 @@ tarlz supports the following options: *Note Argument syntax::.
group ID.
'--exclude=PATTERN'
Exclude files matching a shell pattern like '*.o'. A file is considered
to match if any component of the file name matches. For example, '*.o'
matches 'foo.o', 'foo.o/bar' and 'foo/bar.o'. If PATTERN contains a
'/', it matches a corresponding '/' in the file name. For example,
'foo/*.o' matches 'foo/bar.o'. Multiple '--exclude' options can be
specified.
Exclude files matching a shell pattern like '*.o', even if the files
are specified in the command line. A file is considered to match if any
component of the file name matches. For example, '*.o' matches
'foo.o', 'foo.o/bar' and 'foo/bar.o'. If PATTERN contains a '/', it
matches a corresponding '/' in the file name. For example, 'foo/*.o'
matches 'foo/bar.o'. Multiple '--exclude' options can be specified.
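The matching rule described for '--exclude' can be illustrated with a small Python sketch. It is only a model that agrees with the examples in the text, not the code tarlz uses. The function name 'is_excluded' is hypothetical, and the assumption that a pattern containing '/' is compared against consecutive components of the file name is ours.

```python
from fnmatch import fnmatchcase


def is_excluded(pattern: str, filename: str) -> bool:
    """Return True if `pattern` matches a run of components of `filename`."""
    pat_parts = pattern.split("/")
    name_parts = [part for part in filename.split("/") if part]
    width = len(pat_parts)
    # Slide a window of `width` components over the file name so that each
    # '/' in the pattern lines up with a '/' in the name.
    for start in range(len(name_parts) - width + 1):
        window = name_parts[start:start + width]
        if all(fnmatchcase(n, p) for n, p in zip(window, pat_parts)):
            return True
    return False
```

With this model, '*.o' matches 'foo.o', 'foo.o/bar' and 'foo/bar.o', and 'foo/*.o' matches 'foo/bar.o' but not a bare 'bar.o'.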
'--ignore-ids'
Make '--diff' ignore differences in owner and group IDs. This option is
@ -486,10 +489,10 @@ tarlz supports the following options: *Note Argument syntax::.
During archive creation, warn if any file being archived has a
modification time newer than the archive creation time. This option
may slow archive creation somewhat because it makes an extra call to
'stat' after archiving each file, but it guarantees that file contents
were not modified during the creation of the archive. Note that the
file must be at least one second newer than the archive for it to be
detected as newer.
'stat' after archiving each file, but it nearly guarantees that file
contents were not modified during the creation of the archive. Note
that the file must be at least one second newer than the archive for
it to be detected as newer.
Exit status: 0 for a normal exit, 1 for environmental problems (file not
@ -498,7 +501,7 @@ indicate a corrupt or invalid input file, 3 for an internal consistency
error (e.g., bug) which caused tarlz to panic.

File: tarlz.info, Node: Argument syntax, Next: Portable character set, Prev: Invoking tarlz, Up: Top
File: tarlz.info, Node: Argument syntax, Next: Creating backups safely, Prev: Invoking tarlz, Up: Top
3 Syntax of command-line arguments
**********************************
@ -541,9 +544,55 @@ GNU adds "long options" to these conventions:
Thus, '--foo bar' and '--foo=bar' are equivalent.

File: tarlz.info, Node: Portable character set, Next: File format, Prev: Argument syntax, Up: Top
File: tarlz.info, Node: Creating backups safely, Next: Portable character set, Prev: Argument syntax, Up: Top
4 POSIX portable filename character set
4 Checking the integrity and accuracy of tar.lz archives
********************************************************
Uncompressed tar archives do not offer any integrity checking for the files
they store. The pax format even fails to offer integrity checking for some
of the metadata. *Note crc32::. The integrity checking of tar archives is
usually provided by a compression layer or by an external hash.
Lzip compression provides safe integrity checking to tar archives. But it
does not matter how safe is the archiving format if the archive is created
corrupt because of a concurrent modification of the files being archived, a
faulty RAM, or a bug in the archiving tool. The only way of guaranteeing
that a backup archive is correct is to check its integrity and accuracy
after creating it.
Testing the integrity of the archive with 'lzip -tv' guarantees that the
compression layer of the archive is valid, but it does not guarantee that
the tar layer is valid nor that the files in the archive match the files in
the file system. For example, if the RAM is faulty and a bit flip happens
in the input buffer before tarlz compresses it, the archive will not match
the files. It is safer to check the archive with 'tarlz -d' just after
creation because it checks the compression layer and the tar layer, and it
compares the files in the archive with the files in the file system:
tarlz -cf archive.tar.lz somedir # create the archive
tarlz -df archive.tar.lz # check the archive
Once the integrity and accuracy of an archive have been verified as in
the example above, they can be verified again anywhere at any time with
'tarlz -t -n0'. It is important to disable multi-threading with '-n0'
because multi-threaded listing does not detect corruption in the tar member
data of multimember archives: *Note mt-listing::.
tarlz -t -n0 -f archive.tar.lz > /dev/null
'lzip -tv' checks the integrity of the compression layer, and therefore
the integrity and accuracy of any archive created and verified as explained
above. This test is reliable for solidly compressed archives, but it does
not detect a truncated multimember archive if the truncation happens just
at a member boundary:
lzip -tv archive.tar.lz

File: tarlz.info, Node: Portable character set, Next: File format, Prev: Creating backups safely, Up: Top
5 POSIX portable filename character set
***************************************
The set of characters from which portable file names are constructed.
@ -561,7 +610,7 @@ names use only the portable character set without spaces added.

File: tarlz.info, Node: File format, Next: Amendments to pax format, Prev: Portable character set, Up: Top
5 File format
6 File format
*************
In the diagram below, a box like this:
@ -632,7 +681,7 @@ tar.lz
| member | member | member |
+===============+=================================================+========+
5.1 Pax header block
6.1 Pax header block
====================
The pax header block is identical to the ustar header block described below
@ -676,7 +725,7 @@ space, equal-sign, and newline.
previously archived. This record overrides the field 'linkname' in the
following ustar header block. The following ustar header block
determines the type of link created. If typeflag of the following
header block is 1, a hard link is created. If typeflag is 2, a
header block is '1', a hard link is created. If typeflag is '2', a
symbolic link is created and the linkpath value is used as the
contents of the symbolic link. The linkpath record is created only for
links with a link name that does not fit in the space provided by the
@ -716,17 +765,17 @@ space, equal-sign, and newline.
CRC32-C (Castagnoli) of the extended header data excluding the 8 bytes
representing the CRC <value> itself. The <value> is represented as 8
hexadecimal digits in big endian order, '22 GNU.crc32=00000000\n'. The
keyword of the CRC record is protected by the CRC to guarantee that
corruption is always detected when using '--missing-crc' (except in
case of CRC collision). A CRC was chosen because a checksum is too
weak for a potentially large list of variable sized records. A
option '--missing-crc' guarantees that corruption is always detected
(except in case of CRC collision). A CRC was chosen because a checksum
is too weak for a potentially large list of variable sized records. A
checksum can't detect simple errors like the swapping of two bytes.
*Note --missing-crc::.
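As an illustration of the CRC record described above, the following Python sketch computes a CRC32-C and appends a '22 GNU.crc32=...' record to a block of extended records. Several details are assumptions made for the example and are not stated in the text: the record is placed last, the 8 value bytes are simply left out of the CRC input, and the hexadecimal digits are upper case. The helper names are hypothetical.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC32-C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


CRC_KEY = b"22 GNU.crc32="          # length, space, keyword, equal-sign


def add_crc_record(records: bytes) -> bytes:
    """Append a CRC record covering all the data except its own value."""
    covered = records + CRC_KEY + b"\n"      # the keyword is protected too
    return records + CRC_KEY + b"%08X" % crc32c(covered) + b"\n"


def check_crc_record(extended: bytes) -> bool:
    """Recompute the CRC, skipping the 8 bytes that store the value."""
    pos = extended.find(CRC_KEY)
    if pos < 0:
        return False                         # record is missing
    start = pos + len(CRC_KEY)
    stored = extended[start:start + 8]
    covered = extended[:start] + extended[start + 8:]
    return stored == b"%08X" % crc32c(covered)
```

The record is 22 bytes long because the leading decimal length counts itself: 2 digits, a space, the 9-byte keyword, the equal-sign, 8 hexadecimal digits, and the newline.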
At verbosity level 1 or higher tarlz prints a diagnostic for each unknown
extended header keyword found in an archive, once per keyword.
5.2 Ustar header block
6.2 Ustar header block
======================
The ustar header block has a length of 512 bytes and is structured as shown
@ -750,6 +799,7 @@ gname 297 32
devmajor 329 8
devminor 337 8
prefix 345 155
padding 500 12
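The added 'padding' row makes the table account for all 512 bytes of the block. A short Python check of this is sketched below. The rows before 'gname' are not visible in this hunk and are filled in from the standard POSIX ustar layout. The name 'layout_is_complete' is hypothetical.

```python
# (field, offset, length) for the ustar header block
USTAR_FIELDS = [
    ("name", 0, 100), ("mode", 100, 8), ("uid", 108, 8), ("gid", 116, 8),
    ("size", 124, 12), ("mtime", 136, 12), ("chksum", 148, 8),
    ("typeflag", 156, 1), ("linkname", 157, 100), ("magic", 257, 6),
    ("version", 263, 2), ("uname", 265, 32), ("gname", 297, 32),
    ("devmajor", 329, 8), ("devminor", 337, 8), ("prefix", 345, 155),
    ("padding", 500, 12),
]


def layout_is_complete(fields, block_size=512) -> bool:
    """Return True if the fields are contiguous and fill the whole block."""
    offset = 0
    for _name, start, length in fields:
        if start != offset:          # gap or overlap before this field
            return False
        offset += length
    return offset == block_size
```

Without the 'padding' row the fields end at offset 500, so the check fails for the table as it stood before this change.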
All characters in the header block are coded using the ISO/IEC 646:1991
(ASCII) standard, except in fields storing names for files, users, and
@ -839,7 +889,7 @@ file archived:
''7''
Reserved to represent a file to which an implementation has associated
some high-performance attribute (contiguous file). Tarlz treats this
type of file as a regular file (type 0).
type of file as a regular file (type '0').
The field 'magic' contains the ASCII null-terminated string "ustar". The
@ -848,13 +898,13 @@ field 'version' contains the characters "00" (0x30,0x30). The fields
characters in the array contain non-null characters including the last
character. Each numeric field contains a leading space- or zero-filled,
optionally null-terminated octal number using digits from the ISO/IEC
646:1991 (ASCII) standard. Tarlz is able to decode numeric fields 1 byte
646:1991 (ASCII) standard. Tarlz is able to decode numeric fields one byte
longer than standard ustar by not requiring a terminating null character.
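A minimal Python sketch of decoding such a numeric field follows. It accepts leading spaces or zeros and does not require a terminating null, so an 8-byte field may hold 8 octal digits. How tarlz treats bytes after the number is not stated here. This sketch simply stops at the first byte that is not an octal digit, and the name 'parse_octal_field' is hypothetical.

```python
def parse_octal_field(field: bytes) -> int:
    """Decode a space- or zero-filled octal number from a ustar field."""
    i = 0
    while i < len(field) and field[i] == 0x20:       # skip leading spaces
        i += 1
    value = 0
    digits = 0
    while i < len(field) and 0x30 <= field[i] <= 0x37:   # '0' .. '7'
        value = value * 8 + (field[i] - 0x30)
        i += 1
        digits += 1
    if digits == 0:
        raise ValueError("no octal digits in field")
    return value
```

For example, an 8-byte 'mode' field containing '0000644' followed by a null decodes to 0o644, and a field filled with eight '7' digits and no null decodes to 0o77777777.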

File: tarlz.info, Node: Amendments to pax format, Next: Program design, Prev: File format, Up: Top
6 The reasons for the differences with pax
7 The reasons for the differences with pax
******************************************
Tarlz creates safe archives that allow the reliable detection of invalid or
@ -865,7 +915,7 @@ achieve this goal and avoid some other flaws in the pax format, tarlz makes
some changes to the variant of the pax format that it uses. This chapter
describes these changes and the concrete reasons to implement them.
6.1 Add a CRC of the extended records
7.1 Add a CRC of the extended records
=====================================
The POSIX pax format has a serious flaw. The metadata stored in pax extended
@ -892,7 +942,7 @@ place.
Redundancy Check (CRC) in a way compatible with standard tar tools. *Note
key_crc32::.
6.2 Remove flawed backward compatibility
7.2 Remove flawed backward compatibility
========================================
In order to allow the extraction of pax archives by a tar utility conforming
@ -925,7 +975,7 @@ trying to extract the file or link. This also makes easier during parallel
decoding the detection of a tar member split between two lzip members at
the boundary between the extended header and the ustar header.
6.3 As simple as possible (but not simpler)
7.3 As simple as possible (but not simpler)
===========================================
The tarlz format is mainly ustar. Extended pax headers are used only when
@ -940,7 +990,7 @@ corruption.
ignored. Some operations may not behave as expected if the archive contains
global headers.
6.4 Improve reproducibility
7.4 Improve reproducibility
===========================
Pax includes by default the process ID of the pax process in the ustar name
@ -952,7 +1002,7 @@ extended records, making it easier to produce reproducible archives.
ten; '99<97_bytes>' or '100<97_bytes>'. Tarlz minimizes the length of the
record and always produces a length of x-1 in these cases.
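The choice of the shorter length can be sketched in Python as follows. The decimal length at the start of a pax record counts its own digits, so the sketch looks for the smallest number of digits that is consistent with the total. The function names are hypothetical and the code is an illustration, not the one used by tarlz.

```python
def record_length(keyword: str, value: bytes) -> int:
    """Smallest valid length for the record '<length> <keyword>=<value>\\n'."""
    # Bytes following the decimal length: space, keyword, '=', value, newline.
    rest = 1 + len(keyword.encode()) + 1 + len(value) + 1
    digits = 1
    while len(str(rest + digits)) > digits:
        digits += 1
    return rest + digits


def build_record(keyword: str, value: bytes) -> bytes:
    """Build the complete record using the minimal length."""
    body = b" " + keyword.encode() + b"=" + value + b"\n"
    return str(record_length(keyword, value)).encode() + body
```

When 97 bytes follow the length field, both 99 and 100 would describe a well-formed record. The loop stops at two digits and returns 99.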
6.5 No data in hard links
7.5 No data in hard links
=========================
Tarlz does not allow data in hard link members. The data (if any) must be in
@ -961,27 +1011,26 @@ the names of a file are stored as hard links, the type of the file is lost.
Not allowing data in hard links also prevents invalid actions like
extracting file data for a hard link to a symbolic link or to a directory.
6.6 Avoid misconversions to/from UTF-8
7.6 Avoid misconversions to/from UTF-8
======================================
There is no portable way to tell what charset a text string is coded into.
Therefore, tarlz stores all fields representing text strings unmodified,
without conversion to UTF-8 nor any other transformation. This prevents
accidental double UTF-8 conversions. If the need arises this behavior will
be adjusted with a command-line option in the future.
accidental double UTF-8 conversions.

File: tarlz.info, Node: Program design, Next: Multi-threaded decoding, Prev: Amendments to pax format, Up: Top
7 Internal structure of tarlz
8 Internal structure of tarlz
*****************************
The parts of tarlz related to sequential processing of the archive are more
or less similar to any other tar and won't be described here. The
interesting parts described here are those related to Multi-threaded
interesting parts described here are those related to multi-threaded
processing.
The structure of the part of tarlz performing Multi-threaded archive
The structure of the part of tarlz performing multi-threaded archive
creation is somewhat similar to that of plzip with the added complication
of the solidity levels. *Note Program design: (plzip)Program design. A
grouper thread and several worker threads are created, acting the main
@ -1053,7 +1102,7 @@ error be avoided.

File: tarlz.info, Node: Multi-threaded decoding, Next: Minimum archive sizes, Prev: Program design, Up: Top
8 Limitations of parallel tar decoding
9 Limitations of parallel tar decoding
**************************************
Safely decoding a tar archive in parallel is only possible if one decodes
@ -1093,11 +1142,14 @@ tar.lz archives, keeping backwards compatibility. If tarlz finds a member
misalignment during multi-threaded decoding, it switches to single-threaded
mode and continues decoding the archive.
If the files in the archive are large, multi-threaded '--list' on a
regular (seekable) tar.lz archive can be hundreds of times faster than
sequential '--list' because, in addition to using several processors, it
only needs to decompress part of each lzip member. See the following
example listing the Silesia corpus on a dual core machine:
9.1 Multi-threaded listing
==========================
If the files in the archive are large, multi-threaded '--list' on a regular
(seekable) tar.lz archive can be hundreds of times faster than sequential
'--list' because, in addition to using several processors, it only needs to
decompress part of each lzip member. See the following example listing the
Silesia corpus on a dual core machine:
tarlz -9 --no-solid -cf silesia.tar.lz silesia
time lzip -cd silesia.tar.lz | tar -tf - (5.032s)
@ -1106,10 +1158,12 @@ example listing the Silesia corpus on a dual core machine:
On the other hand, multi-threaded '--list' won't detect corruption in
the tar member data because it only decodes the part of each lzip member
corresponding to the tar member header. This is another reason why the tar
headers must provide their own integrity checking.
corresponding to the tar member header. Partial decoding of a lzip member
can't guarantee the integrity of the data decoded. This is another reason
why the tar headers (including the extended records) must provide their own
integrity checking.
8.1 Limitations of multi-threaded extraction
9.2 Limitations of multi-threaded extraction
============================================
Multi-threaded extraction may produce different output than single-threaded
@ -1139,8 +1193,8 @@ links to.

File: tarlz.info, Node: Minimum archive sizes, Next: Examples, Prev: Multi-threaded decoding, Up: Top
9 Minimum archive sizes required for multi-threaded block compression
*********************************************************************
10 Minimum archive sizes required for multi-threaded block compression
**********************************************************************
When creating or appending to a compressed archive using multi-threaded
block compression, tarlz puts tar members together in blocks and compresses
@ -1177,7 +1231,7 @@ Level

File: tarlz.info, Node: Examples, Next: Problems, Prev: Minimum archive sizes, Up: Top
10 A small tutorial with examples
11 A small tutorial with examples
*********************************
Example 1: Create a multimember compressed archive 'archive.tar.lz'
@ -1233,10 +1287,12 @@ other members can still be extracted).
tarlz -z --no-solid archive.tar
Example 10: Compress the archive 'archive.tar' and write the output to
'foo.tar.lz'.
Example 10: Recompress the archive 'archive.tar.lz' with different
solidity, write the output to 'archive-ns.tar.lz', and compare both
archives.
tarlz -z -o foo.tar.lz archive.tar
lzip -cd archive.tar.lz | tarlz -9z --no-solid -o archive-ns.tar.lz
zcmp archive.tar.lz archive-ns.tar.lz
Example 11: Concatenate and compress two archives 'archive1.tar' and
'archive2.tar', and write the output to 'foo.tar.lz'.
@ -1246,7 +1302,7 @@ Example 11: Concatenate and compress two archives 'archive1.tar' and

File: tarlz.info, Node: Problems, Next: Concept index, Prev: Examples, Up: Top
11 Reporting bugs
12 Reporting bugs
*****************
There are probably bugs in tarlz. There are certainly errors and omissions
@ -1270,6 +1326,7 @@ Concept index
* Amendments to pax format: Amendments to pax format. (line 6)
* argument syntax: Argument syntax. (line 6)
* bugs: Problems. (line 6)
* creating backups: Creating backups safely. (line 6)
* examples: Examples. (line 6)
* file format: File format. (line 6)
* getting help: Problems. (line 6)
@ -1287,26 +1344,29 @@ Concept index

Tag Table:
Node: Top216
Node: Introduction1281
Node: Invoking tarlz4106
Ref: --data-size13109
Ref: --bsolid17626
Node: Argument syntax23539
Node: Portable character set25314
Node: File format25958
Ref: key_crc3233001
Ref: ustar-uid-gid36305
Ref: ustar-mtime37112
Node: Amendments to pax format39115
Ref: crc3239823
Ref: flawed-compat41134
Node: Program design45211
Node: Multi-threaded decoding49138
Ref: mt-extraction52407
Node: Minimum archive sizes53713
Node: Examples55840
Node: Problems58199
Node: Concept index58754
Node: Introduction1356
Node: Invoking tarlz4179
Ref: --data-size13265
Ref: --bsolid17924
Ref: --missing-crc21532
Node: Argument syntax23897
Node: Creating backups safely25673
Node: Portable character set28057
Node: File format28709
Ref: key_crc3235756
Ref: ustar-uid-gid39052
Ref: ustar-mtime39859
Node: Amendments to pax format41866
Ref: crc3242574
Ref: flawed-compat43885
Node: Program design47870
Node: Multi-threaded decoding51797
Ref: mt-listing54198
Ref: mt-extraction55236
Node: Minimum archive sizes56542
Node: Examples58671
Node: Problems61166
Node: Concept index61721

End Tag Table


@ -6,8 +6,8 @@
@finalout
@c %**end of header
@set UPDATED 7 December 2024
@set VERSION 0.26
@set UPDATED 28 February 2025
@set VERSION 0.27
@dircategory Archiving
@direntry
@ -39,6 +39,7 @@ This manual is for Tarlz (version @value{VERSION}, @value{UPDATED}).
* Introduction:: Purpose and features of tarlz
* Invoking tarlz:: Command-line interface
* Argument syntax:: By convention, options start with a hyphen
* Creating backups safely:: Checking integrity and accuracy of archives
* Portable character set:: POSIX portable filename character set
* File format:: Detailed format of the compressed archive
* Amendments to pax format:: The reasons for the differences with pax
@ -51,7 +52,7 @@ This manual is for Tarlz (version @value{VERSION}, @value{UPDATED}).
@end menu
@sp 1
Copyright @copyright{} 2013-2024 Antonio Diaz Diaz.
Copyright @copyright{} 2013-2025 Antonio Diaz Diaz.
This manual is free documentation: you have unlimited permission to copy,
distribute, and modify it.
@ -76,7 +77,7 @@ compressed archives.
Keeping the alignment between tar members and lzip members has two
advantages. It adds an indexed lzip layer on top of the tar archive, making
it possible to decode the archive safely in parallel. It also minimizes the
it possible to decode the archive safely in parallel. It also reduces the
amount of data lost in case of corruption. Compressing a tar archive with
plzip may even double the amount of files lost for each lzip member damaged
because it does not keep the members aligned.
@ -254,7 +255,7 @@ during multi-threaded extraction. @xref{mt-extraction}.
@item -t
@itemx --list
List the contents of an archive. If @var{files} are given, list only the
@var{files} given.
@var{files} given. @xref{mt-listing}.
@item -x
@itemx --extract
@ -265,20 +266,23 @@ directory without extracting the files under it, use
empty directories unconditionally before extracting over them. Other than
that, it does not make any special effort to extract a file over an
incompatible type of file. For example, extracting a file over a non-empty
directory usually fails.
directory usually fails. @xref{mt-extraction}.
@item -z
@itemx --compress
Compress existing POSIX tar archives aligning the lzip members to the tar
members with choice of granularity (@option{--bsolid} by default,
@option{--dsolid} works like @option{--asolid}). Exit with error status 2 if
any input archive is an empty file. The input archives are kept unchanged.
Existing compressed archives are not overwritten. A hyphen @samp{-} used as
the name of an input archive reads from standard input and writes to
standard output (unless the option @option{--output} is used). Tarlz can be
used as compressor for GNU tar by using a command like
@w{@samp{tar -c -Hustar foo | tarlz -z -o foo.tar.lz}}. Tarlz can be used as
compressor for zupdate (zutils) by using a command like
@option{--dsolid} works like @option{--asolid}). Each input archive is
compressed to a file with the extension @file{.lz} added unless the option
@option{--output} is used. If no archives are specified, or if a hyphen
@samp{-} is used as the name of an archive, tarlz reads from standard input
and writes to standard output (unless the option @option{--output} is used).
When @option{--output} is used, only one input archive can be specified.
Exit with error status 2 if any input archive is an empty file. The input
archives are kept unchanged. Existing compressed archives are not
overwritten. Tarlz can be used as compressor for GNU tar by using a command
like @w{@samp{tar -c -Hustar foo | tarlz -z -o foo.tar.lz}}. Tarlz can be
used as compressor for zupdate (zutils) by using a command like
@w{@samp{zupdate --lz='tarlz -z' foo.tar.gz}}. Note that tarlz only works
reliably on archives without global headers, or with global headers whose
content can be ignored.
@ -289,10 +293,8 @@ block is found, and then compresses the rest of the archive. Unless solid
compression is requested, the end-of-archive blocks are compressed in a lzip
member separated from the preceding members and from any nonzero garbage
following the end-of-archive blocks. @option{--compress} implies plzip
argument style, not tar style. Each input archive is compressed to a file
with the extension @file{.lz} added unless the option @option{--output} is
used. When @option{--output} is used, only one input archive can be specified.
@option{-f} can't be used with @option{--compress}.
argument style, not tar style. @option{-f} can't be used with
@option{--compress}.
@item --check-lib
Compare the
@ -319,8 +321,10 @@ tarlz supports the following options: @xref{Argument syntax}.
@itemx --data-size=@var{bytes}
Set target size of input data blocks for the option @option{--bsolid}.
@xref{--bsolid}. Valid values range from @w{8 KiB} to @w{1 GiB}. Default
value is two times the dictionary size, except for option @option{-0} where it
defaults to @w{1 MiB}. @xref{Minimum archive sizes}.
value is two times the dictionary size, except for option @option{-0} where
it defaults to @w{1 MiB}. @xref{Minimum archive sizes}. Tarlz does not split
tar members. If a file is larger than @var{bytes}, tarlz will create a lzip
member large enough to contain the file.
@item -C @var{dir}
@itemx --directory=@var{dir}
@ -465,12 +469,13 @@ If @var{group} is not a valid group name, it is decoded as a decimal numeric
group ID.
@item --exclude=@var{pattern}
Exclude files matching a shell pattern like @file{*.o}. A file is considered
to match if any component of the file name matches. For example, @file{*.o}
matches @file{foo.o}, @file{foo.o/bar} and @file{foo/bar.o}. If
@var{pattern} contains a @samp{/}, it matches a corresponding @samp{/} in
the file name. For example, @file{foo/*.o} matches @file{foo/bar.o}.
Multiple @option{--exclude} options can be specified.
Exclude files matching a shell pattern like @file{*.o}, even if the files
are specified in the command line. A file is considered to match if any
component of the file name matches. For example, @file{*.o} matches
@file{foo.o}, @file{foo.o/bar} and @file{foo/bar.o}. If @var{pattern}
contains a @samp{/}, it matches a corresponding @samp{/} in the file name.
For example, @file{foo/*.o} matches @file{foo/bar.o}. Multiple
@option{--exclude} options can be specified.
@item --ignore-ids
Make @option{--diff} ignore differences in owner and group IDs. This option is
@ -493,6 +498,7 @@ recover as much data as possible from each damaged member. It is recommended
to run tarlz in single-threaded mode (@option{--threads=0}) when using this
option.
@anchor{--missing-crc}
@item --missing-crc
Exit with error status 2 if the CRC of the extended records is missing. When
this option is used, tarlz detects any corruption in the extended records
@ -525,9 +531,9 @@ values range from 1 to 1024. The default value is 64.
During archive creation, warn if any file being archived has a modification
time newer than the archive creation time. This option may slow archive
creation somewhat because it makes an extra call to @samp{stat} after
archiving each file, but it guarantees that file contents were not modified
during the creation of the archive. Note that the file must be at least one
second newer than the archive for it to be detected as newer.
archiving each file, but it nearly guarantees that file contents were not
modified during the creation of the archive. Note that the file must be at
least one second newer than the archive for it to be detected as newer.
@ignore
@item --permissive
@ -591,6 +597,58 @@ Thus, @w{@option{--foo bar}} and @option{--foo=bar} are equivalent.
@end itemize
@node Creating backups safely
@chapter Checking the integrity and accuracy of tar.lz archives
@cindex creating backups
Uncompressed tar archives do not offer any integrity checking for the files
they store. The pax format even fails to offer integrity checking for some
of the metadata. @xref{crc32}. The integrity checking of tar archives is
usually provided by a compression layer or by an external hash.
Lzip compression provides safe integrity checking for tar archives. But it
does not matter how safe the archiving format is if the archive is created
corrupt because of a concurrent modification of the files being archived,
faulty RAM, or a bug in the archiving tool. The only way of guaranteeing
that a backup archive is correct is to check its integrity and accuracy
after creating it.
Testing the integrity of the archive with @w{@samp{lzip -tv}} guarantees
that the compression layer of the archive is valid, but it does not
guarantee that the tar layer is valid nor that the files in the archive
match the files in the file system. For example, if the RAM is faulty and a
bit flip happens in the input buffer before tarlz compresses it, the archive
will not match the files. It is safer to check the archive with
@w{@samp{tarlz -d}} just after creation because it checks the compression
layer and the tar layer, and it compares the files in the archive with the
files in the file system:
@example
tarlz -cf archive.tar.lz somedir # create the archive
tarlz -df archive.tar.lz # check the archive
@end example
Once the integrity and accuracy of an archive have been verified as in the
example above, they can be verified again anywhere at any time with
@w{@samp{tarlz -t -n0}}. It is important to disable multi-threading with
@option{-n0} because multi-threaded listing does not detect corruption in
the tar member data of multimember archives. @xref{mt-listing}.
@example
tarlz -t -n0 -f archive.tar.lz > /dev/null
@end example
@w{@samp{lzip -tv}} checks the integrity of the compression layer, and
therefore the integrity and accuracy of any archive created and verified as
explained above. This test is reliable for solidly compressed archives, but
it does not detect a truncated multimember archive if the truncation happens
exactly at a member boundary:
@example
lzip -tv archive.tar.lz
@end example
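
The create-then-verify sequence described in this chapter can be scripted.
The following is a minimal Python sketch, not part of tarlz: the function
names and the injectable runner are hypothetical, and only the tarlz options
already shown above are used. It assumes that tarlz exits with a nonzero
status when a step fails and that it is run from the directory where the
archive was created.

```python
import subprocess

def verification_commands(archive, source_dir):
    """Return the three tarlz invocations shown in this chapter."""
    return [
        ["tarlz", "-cf", archive, source_dir],   # create the archive
        ["tarlz", "-df", archive],               # compare archive with files
        ["tarlz", "-t", "-n0", "-f", archive],   # single-threaded listing
    ]

def create_and_verify(archive, source_dir, run=subprocess.run):
    """Run each step in order; stop at the first nonzero exit status."""
    for cmd in verification_commands(archive, source_dir):
        # Discard only the listing output, as in the example above.
        out = subprocess.DEVNULL if "-t" in cmd else None
        result = run(cmd, stdout=out)
        if result.returncode != 0:
            return False
    return True
```

Calling create_and_verify("archive.tar.lz", "somedir") returns True only if
creation, comparison, and listing all succeed.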
@node Portable character set
@chapter POSIX portable filename character set
@cindex portable character set
@ -641,7 +699,7 @@ are not allowed in multimember files.
Each lzip member contains one or more tar members in a simplified POSIX pax
interchange format. The only pax typeflag value supported by tarlz (in
addition to the typeflag values defined by the ustar format) is @samp{x}.
addition to the typeflag values defined by the ustar format) is 'x'.
The pax format is an extension on top of the ustar format that removes the
size limitations of the ustar format.
@ -654,7 +712,7 @@ An optional extended header block followed by one or more blocks that
contain the extended header records as if they were the contents of a file;
i.e., the extended header records are included as the data for this header
block. This header block is of the form described in pax header block, with
a typeflag value of @samp{x}.
a typeflag value of 'x'.
@item
A header block in ustar format that describes the file. Any fields defined
@ -713,7 +771,7 @@ An extended header just before the end-of-archive blocks.
@section Pax header block
The pax header block is identical to the ustar header block described below
except that the typeflag has the value @samp{x} (extended). The field
except that the typeflag has the value 'x' (extended). The field
@samp{size} is the size of the extended header data in bytes. Most other
fields in the pax header block are zeroed on archive creation to prevent
trouble if the archive is read by a ustar tool, and are ignored by tarlz on
@ -752,8 +810,8 @@ greater than 2_097_151 @w{(octal 7_777_777)}. @xref{ustar-uid-gid}.
The file name of a link being created to another file, of any type,
previously archived. This record overrides the field @samp{linkname} in the
following ustar header block. The following ustar header block determines
the type of link created. If typeflag of the following header block is 1, a
hard link is created. If typeflag is 2, a symbolic link is created and the
the type of link created. If typeflag of the following header block is '1', a
hard link is created. If typeflag is '2', a symbolic link is created and the
linkpath value is used as the contents of the symbolic link. The linkpath
record is created only for links with a link name that does not fit in the
space provided by the ustar header.
@ -789,13 +847,12 @@ greater than 2_097_151 @w{(octal 7_777_777)}. @xref{ustar-uid-gid}.
@item GNU.crc32
CRC32-C (Castagnoli) of the extended header data excluding the 8 bytes
representing the CRC <value> itself. The <value> is represented as 8
hexadecimal digits in big endian order,
@w{@samp{22 GNU.crc32=00000000\n}}. The keyword of the CRC record is
protected by the CRC to guarantee that corruption is always detected when
using @option{--missing-crc} (except in case of CRC collision). A CRC was
chosen because a checksum is too weak for a potentially large list of
variable sized records. A checksum can't detect simple errors like the
swapping of two bytes.
hexadecimal digits in big endian order, @w{@samp{22 GNU.crc32=00000000\n}}.
The option @option{--missing-crc} guarantees that corruption is always
detected (except in case of CRC collision). A CRC was chosen because a
checksum is too weak for a potentially large list of variable sized records.
A checksum can't detect simple errors like the swapping of two bytes.
@xref{--missing-crc}.
@end table
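
As an illustration of the GNU.crc32 record described above, here is a
minimal Python sketch. It is not the tarlz implementation: the function
names are hypothetical, the bitwise loop is the textbook CRC32-C algorithm,
the use of uppercase hexadecimal digits is an assumption, and the sketch
assumes that the extended data contains exactly one GNU.crc32 record and that
the keyword does not appear inside any other record.

```python
KEYWORD = b"GNU.crc32="

def crc32c(data):
    """Bitwise CRC32-C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0x82F63B78
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF

def crc_record(value):
    """Build the 22-byte record; the leading 22 counts the whole record."""
    # Uppercase hex digits are an assumption made for this sketch.
    return b"22 " + KEYWORD + b"%08X" % value + b"\n"

def crc_excluding_value(ext_data):
    """CRC of the extended data without the 8 digits of the CRC value."""
    start = ext_data.index(KEYWORD) + len(KEYWORD)
    return crc32c(ext_data[:start] + ext_data[start + 8:])
```

Because the 8 digits are skipped, the result of crc_excluding_value does not
depend on the value currently stored in the record, while the keyword itself
remains covered by the CRC.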
@ -825,6 +882,7 @@ shown in the following table. All lengths and offsets are in decimal:
@item devmajor @tab 329 @tab 8
@item devminor @tab 337 @tab 8
@item prefix @tab 345 @tab 155
@item padding @tab 500 @tab 12
@end multitable
All characters in the header block are coded using the ISO/IEC 646:1991
@ -919,7 +977,7 @@ FIFO special file.
@item '7'
Reserved to represent a file to which an implementation has associated some
high-performance attribute (contiguous file). Tarlz treats this type of file
as a regular file (type 0).
as a regular file (type '0').
@end table
@ -930,8 +988,8 @@ except when all characters in the array contain non-null characters
including the last character. Each numeric field contains a leading space-
or zero-filled, optionally null-terminated octal number using digits from
the ISO/IEC 646:1991 (ASCII) standard. Tarlz is able to decode numeric
fields 1 byte longer than standard ustar by not requiring a terminating null
character.
fields one byte longer than standard ustar by not requiring a terminating
null character.
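
The decoding of numeric fields described above can be sketched in Python as
follows. This is an illustrative sketch, not the code used by tarlz; the
function name is hypothetical, and raising an error on an empty or malformed
field is a choice made here for simplicity.

```python
def parse_octal_field(field):
    """Decode a ustar numeric field given as a bytes object.

    The field may be padded with leading spaces or zeros and may or may
    not end with a null character.
    """
    # Keep only the bytes before the first null, if there is one.
    text = field.split(b"\0", 1)[0]
    # Drop space padding; leading zeros are harmless in octal.
    text = text.strip(b" ")
    # int() raises ValueError on an empty field or a non-octal digit.
    return int(text, 8)
```

For example, an 8-byte field holding 7 octal digits and a terminating null
can represent values up to 2_097_151, whereas the same field holding 8 octal
digits and no terminator can represent values up to 16_777_215.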
@node Amendments to pax format
@ -1044,8 +1102,7 @@ extracting file data for a hard link to a symbolic link or to a directory.
There is no portable way to tell what charset a text string is coded into.
Therefore, tarlz stores all fields representing text strings unmodified,
without conversion to UTF-8 nor any other transformation. This prevents
accidental double UTF-8 conversions. If the need arises this behavior will
be adjusted with a command-line option in the future.
accidental double UTF-8 conversions.
@node Program design
@ -1054,12 +1111,12 @@ be adjusted with a command-line option in the future.
The parts of tarlz related to sequential processing of the archive are more
or less similar to any other tar and won't be described here. The interesting
parts described here are those related to Multi-threaded processing.
parts described here are those related to multi-threaded processing.
The structure of the part of tarlz performing Multi-threaded archive
The structure of the part of tarlz performing multi-threaded archive
creation is somewhat similar to that of
@uref{http://www.nongnu.org/lzip/plzip.html#Program-design,,plzip} with the
added complication of the solidity levels.
@uref{http://www.nongnu.org/lzip/manual/plzip_manual.html#Program-design,,plzip}
with the added complication of the solidity levels.
@ifnothtml
@xref{Program design,,,plzip}.
@end ifnothtml
@ -1174,6 +1231,9 @@ tar.lz archives, keeping backwards compatibility. If tarlz finds a member
misalignment during multi-threaded decoding, it switches to single-threaded
mode and continues decoding the archive.
@anchor{mt-listing}
@section Multi-threaded listing
If the files in the archive are large, multi-threaded @option{--list} on a
regular (seekable) tar.lz archive can be hundreds of times faster than
sequential @option{--list} because, in addition to using several processors,
@ -1189,8 +1249,10 @@ time tarlz -tf silesia.tar.lz (0.020s)
On the other hand, multi-threaded @option{--list} won't detect corruption in
the tar member data because it only decodes the part of each lzip member
corresponding to the tar member header. This is another reason why the tar
headers must provide their own integrity checking.
corresponding to the tar member header. Partial decoding of a lzip member
can't guarantee the integrity of the data decoded. This is another reason
why the tar headers (including the extended records) must provide their own
integrity checking.
@anchor{mt-extraction}
@section Limitations of multi-threaded extraction
@ -1344,11 +1406,13 @@ tarlz -z --no-solid archive.tar
@end example
@noindent
Example 10: Compress the archive @file{archive.tar} and write the output to
@file{foo.tar.lz}.
Example 10: Recompress the archive @file{archive.tar.lz} with different
solidity, write the output to @file{archive-ns.tar.lz}, and compare both
archives.
@example
tarlz -z -o foo.tar.lz archive.tar
lzip -cd archive.tar.lz | tarlz -9z --no-solid -o archive-ns.tar.lz
zcmp archive.tar.lz archive-ns.tar.lz
@end example
@noindent