Adding upstream version 0.27.
Signed-off-by: Daniel Baumann <daniel@debian.org>
parent 701564a854
commit ee83909940
83 changed files with 980 additions and 726 deletions
@@ -1,5 +1,5 @@
 .\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.49.2.
-.TH TARLZ "1" "December 2024" "tarlz 0.26" "User Commands"
+.TH TARLZ "1" "March 2025" "tarlz 0.27" "User Commands"
 .SH NAME
 tarlz \- creates tar archives with multimember lzip compression
 .SH SYNOPSIS
@@ -19,7 +19,7 @@ compressed archives.
 .PP
 Keeping the alignment between tar members and lzip members has two
 advantages. It adds an indexed lzip layer on top of the tar archive, making
-it possible to decode the archive safely in parallel. It also minimizes the
+it possible to decode the archive safely in parallel. It also reduces the
 amount of data lost in case of corruption.
 .PP
 The tarlz file format is a safe POSIX\-style backup format. In case of
@@ -160,12 +160,12 @@ Report bugs to lzip\-bug@nongnu.org
 .br
 Tarlz home page: http://www.nongnu.org/lzip/tarlz.html
 .SH COPYRIGHT
-Copyright \(co 2024 Antonio Diaz Diaz.
+Copyright \(co 2025 Antonio Diaz Diaz.
 License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html>
 .br
 This is free software: you are free to change and redistribute it.
 There is NO WARRANTY, to the extent permitted by law.
-Using lzlib 1.15\-rc1
+Using lzlib 1.15
 Using LZ_API_VERSION = 1015
 .SH "SEE ALSO"
 The full documentation for
240
doc/tarlz.info
|
@ -11,13 +11,14 @@ File: tarlz.info, Node: Top, Next: Introduction, Up: (dir)
|
|||
Tarlz Manual
|
||||
************
|
||||
|
||||
This manual is for Tarlz (version 0.26, 7 December 2024).
|
||||
This manual is for Tarlz (version 0.27, 28 February 2025).
|
||||
|
||||
* Menu:
|
||||
|
||||
* Introduction:: Purpose and features of tarlz
|
||||
* Invoking tarlz:: Command-line interface
|
||||
* Argument syntax:: By convention, options start with a hyphen
|
||||
* Creating backups safely:: Checking integrity and accuracy of archives
|
||||
* Portable character set:: POSIX portable filename character set
|
||||
* File format:: Detailed format of the compressed archive
|
||||
* Amendments to pax format:: The reasons for the differences with pax
|
||||
|
@ -29,7 +30,7 @@ This manual is for Tarlz (version 0.26, 7 December 2024).
|
|||
* Concept index:: Index of concepts
|
||||
|
||||
|
||||
Copyright (C) 2013-2024 Antonio Diaz Diaz.
|
||||
Copyright (C) 2013-2025 Antonio Diaz Diaz.
|
||||
|
||||
This manual is free documentation: you have unlimited permission to copy,
|
||||
distribute, and modify it.
|
||||
|
@ -53,7 +54,7 @@ compressed archives.
|
|||
|
||||
Keeping the alignment between tar members and lzip members has two
|
||||
advantages. It adds an indexed lzip layer on top of the tar archive, making
|
||||
it possible to decode the archive safely in parallel. It also minimizes the
|
||||
it possible to decode the archive safely in parallel. It also reduces the
|
||||
amount of data lost in case of corruption. Compressing a tar archive with
|
||||
plzip may even double the amount of files lost for each lzip member damaged
|
||||
because it does not keep the members aligned.
|
||||
|
@ -216,7 +217,7 @@ tarlz supports the following operations:
|
|||
'-t'
|
||||
'--list'
|
||||
List the contents of an archive. If FILES are given, list only the
|
||||
FILES given.
|
||||
FILES given. *Note mt-listing::.
|
||||
|
||||
'-x'
|
||||
'--extract'
|
||||
|
@ -227,20 +228,23 @@ tarlz supports the following operations:
|
|||
directories unconditionally before extracting over them. Other than
|
||||
that, it does not make any special effort to extract a file over an
|
||||
incompatible type of file. For example, extracting a file over a
|
||||
non-empty directory usually fails.
|
||||
non-empty directory usually fails. *Note mt-extraction::.
|
||||
|
||||
'-z'
|
||||
'--compress'
|
||||
Compress existing POSIX tar archives aligning the lzip members to the
|
||||
tar members with choice of granularity ('--bsolid' by default,
|
||||
'--dsolid' works like '--asolid'). Exit with error status 2 if any
|
||||
input archive is an empty file. The input archives are kept unchanged.
|
||||
Existing compressed archives are not overwritten. A hyphen '-' used as
|
||||
the name of an input archive reads from standard input and writes to
|
||||
standard output (unless the option '--output' is used). Tarlz can be
|
||||
used as compressor for GNU tar by using a command like
|
||||
'tar -c -Hustar foo | tarlz -z -o foo.tar.lz'. Tarlz can be used as
|
||||
compressor for zupdate (zutils) by using a command like
|
||||
'--dsolid' works like '--asolid'). Each input archive is compressed to
|
||||
a file with the extension '.lz' added unless the option '--output' is
|
||||
used. If no archives are specified, or if a hyphen '-' is used as the
|
||||
name of an archive, tarlz reads from standard input and writes to
|
||||
standard output (unless the option '--output' is used). When
|
||||
'--output' is used, only one input archive can be specified. Exit with
|
||||
error status 2 if any input archive is an empty file. The input
|
||||
archives are kept unchanged. Existing compressed archives are not
|
||||
overwritten. Tarlz can be used as compressor for GNU tar by using a
|
||||
command like 'tar -c -Hustar foo | tarlz -z -o foo.tar.lz'. Tarlz can
|
||||
be used as compressor for zupdate (zutils) by using a command like
|
||||
'zupdate --lz='tarlz -z' foo.tar.gz'. Note that tarlz only works
|
||||
reliably on archives without global headers, or with global headers
|
||||
whose content can be ignored.
|
||||
|
@ -251,11 +255,8 @@ tarlz supports the following operations:
|
|||
archive. Unless solid compression is requested, the end-of-archive
|
||||
blocks are compressed in a lzip member separated from the preceding
|
||||
members and from any nonzero garbage following the end-of-archive
|
||||
blocks. '--compress' implies plzip argument style, not tar style. Each
|
||||
input archive is compressed to a file with the extension '.lz' added
|
||||
unless the option '--output' is used. When '--output' is used, only
|
||||
one input archive can be specified. '-f' can't be used with
|
||||
'--compress'.
|
||||
blocks. '--compress' implies plzip argument style, not tar style. '-f'
|
||||
can't be used with '--compress'.
|
||||
|
||||
'--check-lib'
|
||||
Compare the version of lzlib used to compile tarlz with the version
|
||||
|
@ -276,7 +277,9 @@ tarlz supports the following options: *Note Argument syntax::.
|
|||
Set target size of input data blocks for the option '--bsolid'. *Note
|
||||
--bsolid::. Valid values range from 8 KiB to 1 GiB. Default value is
|
||||
two times the dictionary size, except for option '-0' where it
|
||||
defaults to 1 MiB. *Note Minimum archive sizes::.
|
||||
defaults to 1 MiB. *Note Minimum archive sizes::. Tarlz does not split
|
||||
tar members. If a file is larger than BYTES, tarlz will create a lzip
|
||||
member large enough to contain the file.
|
||||
|
||||
'-C DIR'
|
||||
'--directory=DIR'
|
||||
|
@ -424,12 +427,12 @@ tarlz supports the following options: *Note Argument syntax::.
|
|||
group ID.
|
||||
|
||||
'--exclude=PATTERN'
|
||||
Exclude files matching a shell pattern like '*.o'. A file is considered
|
||||
to match if any component of the file name matches. For example, '*.o'
|
||||
matches 'foo.o', 'foo.o/bar' and 'foo/bar.o'. If PATTERN contains a
|
||||
'/', it matches a corresponding '/' in the file name. For example,
|
||||
'foo/*.o' matches 'foo/bar.o'. Multiple '--exclude' options can be
|
||||
specified.
|
||||
Exclude files matching a shell pattern like '*.o', even if the files
|
||||
are specified in the command line. A file is considered to match if any
|
||||
component of the file name matches. For example, '*.o' matches
|
||||
'foo.o', 'foo.o/bar' and 'foo/bar.o'. If PATTERN contains a '/', it
|
||||
matches a corresponding '/' in the file name. For example, 'foo/*.o'
|
||||
matches 'foo/bar.o'. Multiple '--exclude' options can be specified.
|
||||
|
||||
'--ignore-ids'
|
||||
Make '--diff' ignore differences in owner and group IDs. This option is
|
||||
|
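The matching rule given in the new '--exclude' text of the hunk above can be illustrated with a small Python sketch. It is not tarlz's implementation: the function name is_excluded is hypothetical, and Python's fnmatch stands in for shell pattern matching on the assumption that '*' may also match '/'. The sketch tests the pattern against every run of consecutive file name components.

```python
from fnmatch import fnmatchcase


def is_excluded(name: str, pattern: str) -> bool:
    """Return True if any run of consecutive components of name matches pattern.

    Hypothetical helper for illustration only; not taken from tarlz.
    """
    parts = [p for p in name.split("/") if p]
    for start in range(len(parts)):
        for end in range(start + 1, len(parts) + 1):
            # Candidate is components start..end-1 joined back with '/'.
            if fnmatchcase("/".join(parts[start:end]), pattern):
                return True
    return False


if __name__ == "__main__":
    for name in ("foo.o", "foo.o/bar", "foo/bar.o", "foo/bar.c"):
        print(name, is_excluded(name, "*.o"))
```

With the pattern '*.o' the first three names from the manual's example are reported as excluded and 'foo/bar.c' is not; the pattern 'foo/*.o' likewise matches 'foo/bar.o'.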
@ -486,10 +489,10 @@ tarlz supports the following options: *Note Argument syntax::.
|
|||
During archive creation, warn if any file being archived has a
|
||||
modification time newer than the archive creation time. This option
|
||||
may slow archive creation somewhat because it makes an extra call to
|
||||
'stat' after archiving each file, but it guarantees that file contents
|
||||
were not modified during the creation of the archive. Note that the
|
||||
file must be at least one second newer than the archive for it to be
|
||||
detected as newer.
|
||||
'stat' after archiving each file, but it nearly guarantees that file
|
||||
contents were not modified during the creation of the archive. Note
|
||||
that the file must be at least one second newer than the archive for
|
||||
it to be detected as newer.
|
||||
|
||||
|
||||
Exit status: 0 for a normal exit, 1 for environmental problems (file not
|
||||
|
@ -498,7 +501,7 @@ indicate a corrupt or invalid input file, 3 for an internal consistency
|
|||
error (e.g., bug) which caused tarlz to panic.
|
||||
|
||||
|
||||
File: tarlz.info, Node: Argument syntax, Next: Portable character set, Prev: Invoking tarlz, Up: Top
|
||||
File: tarlz.info, Node: Argument syntax, Next: Creating backups safely, Prev: Invoking tarlz, Up: Top
|
||||
|
||||
3 Syntax of command-line arguments
|
||||
**********************************
|
||||
|
@ -541,9 +544,55 @@ GNU adds "long options" to these conventions:
|
|||
Thus, '--foo bar' and '--foo=bar' are equivalent.
|
||||
|
||||
|
||||
File: tarlz.info, Node: Portable character set, Next: File format, Prev: Argument syntax, Up: Top
|
||||
File: tarlz.info, Node: Creating backups safely, Next: Portable character set, Prev: Argument syntax, Up: Top
|
||||
|
||||
4 POSIX portable filename character set
|
||||
4 Checking the integrity and accuracy of tar.lz archives
|
||||
********************************************************
|
||||
|
||||
Uncompressed tar archives do not offer any integrity checking for the files
|
||||
they store. The pax format even fails to offer integrity checking for some
|
||||
of the metadata. *Note crc32::. The integrity checking of tar archives is
|
||||
usually provided by a compression layer or by an external hash.
|
||||
|
||||
Lzip compression provides safe integrity checking to tar archives. But it
|
||||
does not matter how safe is the archiving format if the archive is created
|
||||
corrupt because of a concurrent modification of the files being archived, a
|
||||
faulty RAM, or a bug in the archiving tool. The only way of guaranteeing
|
||||
that a backup archive is correct is to check its integrity and accuracy
|
||||
after creating it.
|
||||
|
||||
Testing the integrity of the archive with 'lzip -tv' guarantees that the
|
||||
compression layer of the archive is valid, but it does not guarantee that
|
||||
the tar layer is valid nor that the files in the archive match the files in
|
||||
the file system. For example, if the RAM is faulty and a bit flip happens
|
||||
in the input buffer before tarlz compresses it, the archive will not match
|
||||
the files. It is safer to check the archive with 'tarlz -d' just after
|
||||
creation because it checks the compression layer and the tar layer, and it
|
||||
compares the files in the archive with the files in the file system:
|
||||
|
||||
tarlz -cf archive.tar.lz somedir # create the archive
|
||||
tarlz -df archive.tar.lz # check the archive
|
||||
|
||||
Once the integrity and accuracy of an archive have been verified as in
|
||||
the example above, they can be verified again anywhere at any time with
|
||||
'tarlz -t -n0'. It is important to disable multi-threading with '-n0'
|
||||
because multi-threaded listing does not detect corruption in the tar member
|
||||
data of multimember archives: *Note mt-listing::.
|
||||
|
||||
tarlz -t -n0 -f archive.tar.lz > /dev/null
|
||||
|
||||
'lzip -tv' checks the integrity of the compression layer, and therefore
|
||||
the integrity and accuracy of any archive created and verified as explained
|
||||
above. This test is reliable for solidly compressed archives, but it does
|
||||
not detect a truncated multimember archive if the truncation happens just
|
||||
at a member boundary:
|
||||
|
||||
lzip -tv archive.tar.lz
|
||||
|
||||
|
||||
File: tarlz.info, Node: Portable character set, Next: File format, Prev: Creating backups safely, Up: Top
|
||||
|
||||
5 POSIX portable filename character set
|
||||
***************************************
|
||||
|
||||
The set of characters from which portable file names are constructed.
|
||||
|
@ -561,7 +610,7 @@ names use only the portable character set without spaces added.
|
|||
|
||||
File: tarlz.info, Node: File format, Next: Amendments to pax format, Prev: Portable character set, Up: Top
|
||||
|
||||
5 File format
|
||||
6 File format
|
||||
*************
|
||||
|
||||
In the diagram below, a box like this:
|
||||
|
@ -632,7 +681,7 @@ tar.lz
|
|||
| member | member | member |
|
||||
+===============+=================================================+========+
|
||||
|
||||
5.1 Pax header block
|
||||
6.1 Pax header block
|
||||
====================
|
||||
|
||||
The pax header block is identical to the ustar header block described below
|
||||
|
@ -676,7 +725,7 @@ space, equal-sign, and newline.
|
|||
previously archived. This record overrides the field 'linkname' in the
|
||||
following ustar header block. The following ustar header block
|
||||
determines the type of link created. If typeflag of the following
|
||||
header block is 1, a hard link is created. If typeflag is 2, a
|
||||
header block is '1', a hard link is created. If typeflag is '2', a
|
||||
symbolic link is created and the linkpath value is used as the
|
||||
contents of the symbolic link. The linkpath record is created only for
|
||||
links with a link name that does not fit in the space provided by the
|
||||
|
@ -716,17 +765,17 @@ space, equal-sign, and newline.
|
|||
CRC32-C (Castagnoli) of the extended header data excluding the 8 bytes
|
||||
representing the CRC <value> itself. The <value> is represented as 8
|
||||
hexadecimal digits in big endian order, '22 GNU.crc32=00000000\n'. The
|
||||
keyword of the CRC record is protected by the CRC to guarantee that
|
||||
corruption is always detected when using '--missing-crc' (except in
|
||||
case of CRC collision). A CRC was chosen because a checksum is too
|
||||
weak for a potentially large list of variable sized records. A
|
||||
option '--missing-crc' guarantees that corruption is always detected
|
||||
(except in case of CRC collision). A CRC was chosen because a checksum
|
||||
is too weak for a potentially large list of variable sized records. A
|
||||
checksum can't detect simple errors like the swapping of two bytes.
|
||||
*Note --missing-crc::.
|
||||
|
||||
|
||||
At verbosity level 1 or higher tarlz prints a diagnostic for each unknown
|
||||
extended header keyword found in an archive, once per keyword.
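As an illustration of the 'GNU.crc32' record described above, the following Python sketch computes a CRC32-C over extended header data while skipping the 8 digits of the CRC value. It is a sketch under stated assumptions, not tarlz's code: the helper names are hypothetical, "excluding the 8 bytes" is read as skipping those bytes, and the use of uppercase hexadecimal digits is a guess.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC32-C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def crc_record_value(extended_data: bytes) -> str:
    """Return 8 hex digits (most significant first) for the CRC record.

    Assumption: the CRC covers all of extended_data except the 8 bytes
    that hold the CRC value itself.
    """
    key = b"GNU.crc32="
    pos = extended_data.index(key) + len(key)
    covered = extended_data[:pos] + extended_data[pos + 8:]
    return f"{crc32c(covered):08X}"


if __name__ == "__main__":
    print(crc_record_value(b"22 GNU.crc32=00000000\n"))
```

Because the 8 value bytes are skipped, the result does not depend on the placeholder digits, while the keyword 'GNU.crc32=' itself is covered by the CRC.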
|
||||
|
||||
5.2 Ustar header block
|
||||
6.2 Ustar header block
|
||||
======================
|
||||
|
||||
The ustar header block has a length of 512 bytes and is structured as shown
|
||||
|
@ -750,6 +799,7 @@ gname 297 32
|
|||
devmajor 329 8
|
||||
devminor 337 8
|
||||
prefix 345 155
|
||||
padding 500 12
|
||||
|
||||
All characters in the header block are coded using the ISO/IEC 646:1991
|
||||
(ASCII) standard, except in fields storing names for files, users, and
|
||||
|
@ -839,7 +889,7 @@ file archived:
|
|||
''7''
|
||||
Reserved to represent a file to which an implementation has associated
|
||||
some high-performance attribute (contiguous file). Tarlz treats this
|
||||
type of file as a regular file (type 0).
|
||||
type of file as a regular file (type '0').
|
||||
|
||||
|
||||
The field 'magic' contains the ASCII null-terminated string "ustar". The
|
||||
|
@ -848,13 +898,13 @@ field 'version' contains the characters "00" (0x30,0x30). The fields
|
|||
characters in the array contain non-null characters including the last
|
||||
character. Each numeric field contains a leading space- or zero-filled,
|
||||
optionally null-terminated octal number using digits from the ISO/IEC
|
||||
646:1991 (ASCII) standard. Tarlz is able to decode numeric fields 1 byte
|
||||
646:1991 (ASCII) standard. Tarlz is able to decode numeric fields one byte
|
||||
longer than standard ustar by not requiring a terminating null character.
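The decoding of numeric fields described above can be sketched in Python as follows. This is a minimal illustration, not tarlz's decoder: the function name parse_octal_field is hypothetical, and tolerating trailing spaces is an assumption beyond what the text states.

```python
def parse_octal_field(field: bytes) -> int:
    """Decode a ustar numeric field (for example mode, size, or mtime)."""
    # Stop at the first NUL if there is one; the terminator is optional,
    # so a field may be filled with digits up to its last byte.
    digits = field.split(b"\0", 1)[0].strip(b" ")
    if not digits or any(c not in b"01234567" for c in digits):
        raise ValueError(f"invalid octal field: {field!r}")
    return int(digits, 8)


if __name__ == "__main__":
    print(parse_octal_field(b"0000644\0"))   # zero-filled, null-terminated
    print(parse_octal_field(b"77777777"))    # 8 digits, no terminator
```

An 8-byte field without a terminating null character can hold eight octal digits instead of seven, which is the extra byte the text refers to.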
|
||||
|
||||
|
||||
File: tarlz.info, Node: Amendments to pax format, Next: Program design, Prev: File format, Up: Top
|
||||
|
||||
6 The reasons for the differences with pax
|
||||
7 The reasons for the differences with pax
|
||||
******************************************
|
||||
|
||||
Tarlz creates safe archives that allow the reliable detection of invalid or
|
||||
|
@ -865,7 +915,7 @@ achieve this goal and avoid some other flaws in the pax format, tarlz makes
|
|||
some changes to the variant of the pax format that it uses. This chapter
|
||||
describes these changes and the concrete reasons to implement them.
|
||||
|
||||
6.1 Add a CRC of the extended records
|
||||
7.1 Add a CRC of the extended records
|
||||
=====================================
|
||||
|
||||
The POSIX pax format has a serious flaw. The metadata stored in pax extended
|
||||
|
@ -892,7 +942,7 @@ place.
|
|||
Redundancy Check (CRC) in a way compatible with standard tar tools. *Note
|
||||
key_crc32::.
|
||||
|
||||
6.2 Remove flawed backward compatibility
|
||||
7.2 Remove flawed backward compatibility
|
||||
========================================
|
||||
|
||||
In order to allow the extraction of pax archives by a tar utility conforming
|
||||
|
@ -925,7 +975,7 @@ trying to extract the file or link. This also makes easier during parallel
|
|||
decoding the detection of a tar member split between two lzip members at
|
||||
the boundary between the extended header and the ustar header.
|
||||
|
||||
6.3 As simple as possible (but not simpler)
|
||||
7.3 As simple as possible (but not simpler)
|
||||
===========================================
|
||||
|
||||
The tarlz format is mainly ustar. Extended pax headers are used only when
|
||||
|
@ -940,7 +990,7 @@ corruption.
|
|||
ignored. Some operations may not behave as expected if the archive contains
|
||||
global headers.
|
||||
|
||||
6.4 Improve reproducibility
|
||||
7.4 Improve reproducibility
|
||||
===========================
|
||||
|
||||
Pax includes by default the process ID of the pax process in the ustar name
|
||||
|
@ -952,7 +1002,7 @@ extended records, making it easier to produce reproducible archives.
|
|||
ten; '99<97_bytes>' or '100<97_bytes>'. Tarlz minimizes the length of the
|
||||
record and always produces a length of x-1 in these cases.
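The length ambiguity mentioned above can be reproduced with a short Python sketch. It assumes the usual pax record layout '<length> <keyword>=<value>\n', in which <length> counts the whole record including its own digits; the function name make_record is hypothetical and the code is not taken from tarlz. Choosing the smallest self-consistent length yields 99 rather than 100 when 97 bytes follow the length digits.

```python
def make_record(keyword: bytes, value: bytes) -> bytes:
    """Build a pax extended record using the smallest valid length."""
    rest = b" " + keyword + b"=" + value + b"\n"
    # The length field counts its own decimal digits, so search upward
    # for the first length that can hold both the digits and the rest.
    length = len(rest) + 1
    while len(str(length)) + len(rest) > length:
        length += 1
    return str(length).encode("ascii") + rest


if __name__ == "__main__":
    record = make_record(b"path", b"x" * 90)  # 97 bytes after the digits
    print(record[:8], len(record))
```

For this input both '99' and '100' would describe a consistent record; the loop stops at the smaller one.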
|
||||
|
||||
6.5 No data in hard links
|
||||
7.5 No data in hard links
|
||||
=========================
|
||||
|
||||
Tarlz does not allow data in hard link members. The data (if any) must be in
|
||||
|
@ -961,27 +1011,26 @@ the names of a file are stored as hard links, the type of the file is lost.
|
|||
Not allowing data in hard links also prevents invalid actions like
|
||||
extracting file data for a hard link to a symbolic link or to a directory.
|
||||
|
||||
6.6 Avoid misconversions to/from UTF-8
|
||||
7.6 Avoid misconversions to/from UTF-8
|
||||
======================================
|
||||
|
||||
There is no portable way to tell what charset a text string is coded into.
|
||||
Therefore, tarlz stores all fields representing text strings unmodified,
|
||||
without conversion to UTF-8 nor any other transformation. This prevents
|
||||
accidental double UTF-8 conversions. If the need arises this behavior will
|
||||
be adjusted with a command-line option in the future.
|
||||
accidental double UTF-8 conversions.
|
||||
|
||||
|
||||
File: tarlz.info, Node: Program design, Next: Multi-threaded decoding, Prev: Amendments to pax format, Up: Top
|
||||
|
||||
7 Internal structure of tarlz
|
||||
8 Internal structure of tarlz
|
||||
*****************************
|
||||
|
||||
The parts of tarlz related to sequential processing of the archive are more
|
||||
or less similar to any other tar and won't be described here. The
|
||||
interesting parts described here are those related to Multi-threaded
|
||||
interesting parts described here are those related to multi-threaded
|
||||
processing.
|
||||
|
||||
The structure of the part of tarlz performing Multi-threaded archive
|
||||
The structure of the part of tarlz performing multi-threaded archive
|
||||
creation is somewhat similar to that of plzip with the added complication
|
||||
of the solidity levels. *Note Program design: (plzip)Program design. A
|
||||
grouper thread and several worker threads are created, acting the main
|
||||
|
@ -1053,7 +1102,7 @@ error be avoided.
|
|||
|
||||
File: tarlz.info, Node: Multi-threaded decoding, Next: Minimum archive sizes, Prev: Program design, Up: Top
|
||||
|
||||
8 Limitations of parallel tar decoding
|
||||
9 Limitations of parallel tar decoding
|
||||
**************************************
|
||||
|
||||
Safely decoding a tar archive in parallel is only possible if one decodes
|
||||
|
@ -1093,11 +1142,14 @@ tar.lz archives, keeping backwards compatibility. If tarlz finds a member
|
|||
misalignment during multi-threaded decoding, it switches to single-threaded
|
||||
mode and continues decoding the archive.
|
||||
|
||||
If the files in the archive are large, multi-threaded '--list' on a
|
||||
regular (seekable) tar.lz archive can be hundreds of times faster than
|
||||
sequential '--list' because, in addition to using several processors, it
|
||||
only needs to decompress part of each lzip member. See the following
|
||||
example listing the Silesia corpus on a dual core machine:
|
||||
9.1 Multi-threaded listing
|
||||
==========================
|
||||
|
||||
If the files in the archive are large, multi-threaded '--list' on a regular
|
||||
(seekable) tar.lz archive can be hundreds of times faster than sequential
|
||||
'--list' because, in addition to using several processors, it only needs to
|
||||
decompress part of each lzip member. See the following example listing the
|
||||
Silesia corpus on a dual core machine:
|
||||
|
||||
tarlz -9 --no-solid -cf silesia.tar.lz silesia
|
||||
time lzip -cd silesia.tar.lz | tar -tf - (5.032s)
|
||||
|
@ -1106,10 +1158,12 @@ example listing the Silesia corpus on a dual core machine:
|
|||
|
||||
On the other hand, multi-threaded '--list' won't detect corruption in
|
||||
the tar member data because it only decodes the part of each lzip member
|
||||
corresponding to the tar member header. This is another reason why the tar
|
||||
headers must provide their own integrity checking.
|
||||
corresponding to the tar member header. Partial decoding of a lzip member
|
||||
can't guarantee the integrity of the data decoded. This is another reason
|
||||
why the tar headers (including the extended records) must provide their own
|
||||
integrity checking.
|
||||
|
||||
8.1 Limitations of multi-threaded extraction
|
||||
9.2 Limitations of multi-threaded extraction
|
||||
============================================
|
||||
|
||||
Multi-threaded extraction may produce different output than single-threaded
|
||||
|
@ -1139,8 +1193,8 @@ links to.
|
|||
|
||||
File: tarlz.info, Node: Minimum archive sizes, Next: Examples, Prev: Multi-threaded decoding, Up: Top
|
||||
|
||||
9 Minimum archive sizes required for multi-threaded block compression
|
||||
*********************************************************************
|
||||
10 Minimum archive sizes required for multi-threaded block compression
|
||||
**********************************************************************
|
||||
|
||||
When creating or appending to a compressed archive using multi-threaded
|
||||
block compression, tarlz puts tar members together in blocks and compresses
|
||||
|
@ -1177,7 +1231,7 @@ Level
|
|||
|
||||
File: tarlz.info, Node: Examples, Next: Problems, Prev: Minimum archive sizes, Up: Top
|
||||
|
||||
10 A small tutorial with examples
|
||||
11 A small tutorial with examples
|
||||
*********************************
|
||||
|
||||
Example 1: Create a multimember compressed archive 'archive.tar.lz'
|
||||
|
@ -1233,10 +1287,12 @@ other members can still be extracted).
|
|||
|
||||
tarlz -z --no-solid archive.tar
|
||||
|
||||
Example 10: Compress the archive 'archive.tar' and write the output to
|
||||
'foo.tar.lz'.
|
||||
Example 10: Recompress the archive 'archive.tar.lz' with different
|
||||
solidity, write the output to 'archive-ns.tar.lz', and compare both
|
||||
archives.
|
||||
|
||||
tarlz -z -o foo.tar.lz archive.tar
|
||||
lzip -cd archive.tar.lz | tarlz -9z --no-solid -o archive-ns.tar.lz
|
||||
zcmp archive.tar.lz archive-ns.tar.lz
|
||||
|
||||
Example 11: Concatenate and compress two archives 'archive1.tar' and
|
||||
'archive2.tar', and write the output to 'foo.tar.lz'.
|
||||
|
@ -1246,7 +1302,7 @@ Example 11: Concatenate and compress two archives 'archive1.tar' and
|
|||
|
||||
File: tarlz.info, Node: Problems, Next: Concept index, Prev: Examples, Up: Top
|
||||
|
||||
11 Reporting bugs
|
||||
12 Reporting bugs
|
||||
*****************
|
||||
|
||||
There are probably bugs in tarlz. There are certainly errors and omissions
|
||||
|
@ -1270,6 +1326,7 @@ Concept index
|
|||
* Amendments to pax format: Amendments to pax format. (line 6)
|
||||
* argument syntax: Argument syntax. (line 6)
|
||||
* bugs: Problems. (line 6)
|
||||
* creating backups: Creating backups safely. (line 6)
|
||||
* examples: Examples. (line 6)
|
||||
* file format: File format. (line 6)
|
||||
* getting help: Problems. (line 6)
|
||||
|
@ -1287,26 +1344,29 @@ Concept index
|
|||
|
||||
Tag Table:
|
||||
Node: Top216
|
||||
Node: Introduction1281
|
||||
Node: Invoking tarlz4106
|
||||
Ref: --data-size13109
|
||||
Ref: --bsolid17626
|
||||
Node: Argument syntax23539
|
||||
Node: Portable character set25314
|
||||
Node: File format25958
|
||||
Ref: key_crc3233001
|
||||
Ref: ustar-uid-gid36305
|
||||
Ref: ustar-mtime37112
|
||||
Node: Amendments to pax format39115
|
||||
Ref: crc3239823
|
||||
Ref: flawed-compat41134
|
||||
Node: Program design45211
|
||||
Node: Multi-threaded decoding49138
|
||||
Ref: mt-extraction52407
|
||||
Node: Minimum archive sizes53713
|
||||
Node: Examples55840
|
||||
Node: Problems58199
|
||||
Node: Concept index58754
|
||||
Node: Introduction1356
|
||||
Node: Invoking tarlz4179
|
||||
Ref: --data-size13265
|
||||
Ref: --bsolid17924
|
||||
Ref: --missing-crc21532
|
||||
Node: Argument syntax23897
|
||||
Node: Creating backups safely25673
|
||||
Node: Portable character set28057
|
||||
Node: File format28709
|
||||
Ref: key_crc3235756
|
||||
Ref: ustar-uid-gid39052
|
||||
Ref: ustar-mtime39859
|
||||
Node: Amendments to pax format41866
|
||||
Ref: crc3242574
|
||||
Ref: flawed-compat43885
|
||||
Node: Program design47870
|
||||
Node: Multi-threaded decoding51797
|
||||
Ref: mt-listing54198
|
||||
Ref: mt-extraction55236
|
||||
Node: Minimum archive sizes56542
|
||||
Node: Examples58671
|
||||
Node: Problems61166
|
||||
Node: Concept index61721
|
||||
|
||||
End Tag Table
|
||||
|
||||
|
|
174
doc/tarlz.texi
|
@ -6,8 +6,8 @@
|
|||
@finalout
|
||||
@c %**end of header
|
||||
|
||||
@set UPDATED 7 December 2024
|
||||
@set VERSION 0.26
|
||||
@set UPDATED 28 February 2025
|
||||
@set VERSION 0.27
|
||||
|
||||
@dircategory Archiving
|
||||
@direntry
|
||||
|
@ -39,6 +39,7 @@ This manual is for Tarlz (version @value{VERSION}, @value{UPDATED}).
|
|||
* Introduction:: Purpose and features of tarlz
|
||||
* Invoking tarlz:: Command-line interface
|
||||
* Argument syntax:: By convention, options start with a hyphen
|
||||
* Creating backups safely:: Checking integrity and accuracy of archives
|
||||
* Portable character set:: POSIX portable filename character set
|
||||
* File format:: Detailed format of the compressed archive
|
||||
* Amendments to pax format:: The reasons for the differences with pax
|
||||
|
@ -51,7 +52,7 @@ This manual is for Tarlz (version @value{VERSION}, @value{UPDATED}).
|
|||
@end menu
|
||||
|
||||
@sp 1
|
||||
Copyright @copyright{} 2013-2024 Antonio Diaz Diaz.
|
||||
Copyright @copyright{} 2013-2025 Antonio Diaz Diaz.
|
||||
|
||||
This manual is free documentation: you have unlimited permission to copy,
|
||||
distribute, and modify it.
|
||||
|
@ -76,7 +77,7 @@ compressed archives.
|
|||
|
||||
Keeping the alignment between tar members and lzip members has two
|
||||
advantages. It adds an indexed lzip layer on top of the tar archive, making
|
||||
it possible to decode the archive safely in parallel. It also minimizes the
|
||||
it possible to decode the archive safely in parallel. It also reduces the
|
||||
amount of data lost in case of corruption. Compressing a tar archive with
|
||||
plzip may even double the amount of files lost for each lzip member damaged
|
||||
because it does not keep the members aligned.
|
||||
|
@ -254,7 +255,7 @@ during multi-threaded extraction. @xref{mt-extraction}.
|
|||
@item -t
|
||||
@itemx --list
|
||||
List the contents of an archive. If @var{files} are given, list only the
|
||||
@var{files} given.
|
||||
@var{files} given. @xref{mt-listing}.
|
||||
|
||||
@item -x
|
||||
@itemx --extract
|
||||
|
@ -265,20 +266,23 @@ directory without extracting the files under it, use
|
|||
empty directories unconditionally before extracting over them. Other than
|
||||
that, it does not make any special effort to extract a file over an
|
||||
incompatible type of file. For example, extracting a file over a non-empty
|
||||
directory usually fails.
|
||||
directory usually fails. @xref{mt-extraction}.
|
||||
|
||||
@item -z
|
||||
@itemx --compress
|
||||
Compress existing POSIX tar archives aligning the lzip members to the tar
|
||||
members with choice of granularity (@option{--bsolid} by default,
|
||||
@option{--dsolid} works like @option{--asolid}). Exit with error status 2 if
|
||||
any input archive is an empty file. The input archives are kept unchanged.
|
||||
Existing compressed archives are not overwritten. A hyphen @samp{-} used as
|
||||
the name of an input archive reads from standard input and writes to
|
||||
standard output (unless the option @option{--output} is used). Tarlz can be
|
||||
used as compressor for GNU tar by using a command like
|
||||
@w{@samp{tar -c -Hustar foo | tarlz -z -o foo.tar.lz}}. Tarlz can be used as
|
||||
compressor for zupdate (zutils) by using a command like
|
||||
@option{--dsolid} works like @option{--asolid}). Each input archive is
|
||||
compressed to a file with the extension @file{.lz} added unless the option
|
||||
@option{--output} is used. If no archives are specified, or if a hyphen
|
||||
@samp{-} is used as the name of an archive, tarlz reads from standard input
|
||||
and writes to standard output (unless the option @option{--output} is used).
|
||||
When @option{--output} is used, only one input archive can be specified.
|
||||
Exit with error status 2 if any input archive is an empty file. The input
|
||||
archives are kept unchanged. Existing compressed archives are not
|
||||
overwritten. Tarlz can be used as compressor for GNU tar by using a command
|
||||
like @w{@samp{tar -c -Hustar foo | tarlz -z -o foo.tar.lz}}. Tarlz can be
|
||||
used as compressor for zupdate (zutils) by using a command like
|
||||
@w{@samp{zupdate --lz='tarlz -z' foo.tar.gz}}. Note that tarlz only works
|
||||
reliably on archives without global headers, or with global headers whose
|
||||
content can be ignored.
|
||||
|
@ -289,10 +293,8 @@ block is found, and then compresses the rest of the archive. Unless solid
|
|||
compression is requested, the end-of-archive blocks are compressed in a lzip
|
||||
member separated from the preceding members and from any nonzero garbage
|
||||
following the end-of-archive blocks. @option{--compress} implies plzip
|
||||
argument style, not tar style. Each input archive is compressed to a file
|
||||
with the extension @file{.lz} added unless the option @option{--output} is
|
||||
used. When @option{--output} is used, only one input archive can be specified.
|
||||
@option{-f} can't be used with @option{--compress}.
|
||||
argument style, not tar style. @option{-f} can't be used with
|
||||
@option{--compress}.
|
||||
|
||||
@item --check-lib
|
||||
Compare the
|
||||
|
@ -319,8 +321,10 @@ tarlz supports the following options: @xref{Argument syntax}.
|
|||
@itemx --data-size=@var{bytes}
|
||||
Set target size of input data blocks for the option @option{--bsolid}.
|
||||
@xref{--bsolid}. Valid values range from @w{8 KiB} to @w{1 GiB}. Default
|
||||
value is two times the dictionary size, except for option @option{-0} where it
|
||||
defaults to @w{1 MiB}. @xref{Minimum archive sizes}.
|
||||
value is two times the dictionary size, except for option @option{-0} where
|
||||
it defaults to @w{1 MiB}. @xref{Minimum archive sizes}. Tarlz does not split
|
||||
tar members. If a file is larger than @var{bytes}, tarlz will create a lzip
|
||||
member large enough to contain the file.
|
||||
|
||||
@item -C @var{dir}
|
||||
@itemx --directory=@var{dir}
|
||||
|
@ -465,12 +469,13 @@ If @var{group} is not a valid group name, it is decoded as a decimal numeric
|
|||
group ID.
|
||||
|
||||
@item --exclude=@var{pattern}
|
||||
Exclude files matching a shell pattern like @file{*.o}. A file is considered
|
||||
to match if any component of the file name matches. For example, @file{*.o}
|
||||
matches @file{foo.o}, @file{foo.o/bar} and @file{foo/bar.o}. If
|
||||
@var{pattern} contains a @samp{/}, it matches a corresponding @samp{/} in
|
||||
the file name. For example, @file{foo/*.o} matches @file{foo/bar.o}.
|
||||
Multiple @option{--exclude} options can be specified.
|
||||
Exclude files matching a shell pattern like @file{*.o}, even if the files
|
||||
are specified in the command line. A file is considered to match if any
|
||||
component of the file name matches. For example, @file{*.o} matches
|
||||
@file{foo.o}, @file{foo.o/bar} and @file{foo/bar.o}. If @var{pattern}
|
||||
contains a @samp{/}, it matches a corresponding @samp{/} in the file name.
|
||||
For example, @file{foo/*.o} matches @file{foo/bar.o}. Multiple
|
||||
@option{--exclude} options can be specified.
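
The component-matching rule described above can be approximated with the following minimal Python sketch. It is an illustration only, not tarlz's implementation: the function name @samp{is_excluded} is hypothetical, and Python's @samp{fnmatchcase} is assumed to be close enough to shell pattern matching for these examples.

```python
from fnmatch import fnmatchcase

def is_excluded(filename: str, pattern: str) -> bool:
    """Return True if any run of consecutive components of filename
    matches the shell pattern (hypothetical helper, not tarlz code)."""
    parts = [p for p in filename.split("/") if p]
    for i in range(len(parts)):
        for j in range(i, len(parts)):
            # Candidate is one or more consecutive components joined by '/'.
            if fnmatchcase("/".join(parts[i:j + 1]), pattern):
                return True
    return False
```

With this sketch, @file{*.o} excludes @file{foo.o}, @file{foo.o/bar} and @file{foo/bar.o}, and @file{foo/*.o} excludes @file{foo/bar.o}, as in the examples given in the text.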
|
||||
|
||||
@item --ignore-ids
|
||||
Make @option{--diff} ignore differences in owner and group IDs. This option is
|
||||
|
@ -493,6 +498,7 @@ recover as much data as possible from each damaged member. It is recommended
|
|||
to run tarlz in single-threaded mode (@option{--threads=0}) when using this
|
||||
option.
|
||||
|
||||
@anchor{--missing-crc}
|
||||
@item --missing-crc
|
||||
Exit with error status 2 if the CRC of the extended records is missing. When
|
||||
this option is used, tarlz detects any corruption in the extended records
|
||||
|
@ -525,9 +531,9 @@ values range from 1 to 1024. The default value is 64.
|
|||
During archive creation, warn if any file being archived has a modification
|
||||
time newer than the archive creation time. This option may slow archive
|
||||
creation somewhat because it makes an extra call to @samp{stat} after
|
||||
archiving each file, but it guarantees that file contents were not modified
|
||||
during the creation of the archive. Note that the file must be at least one
|
||||
second newer than the archive for it to be detected as newer.
|
||||
archiving each file, but it nearly guarantees that file contents were not
|
||||
modified during the creation of the archive. Note that the file must be at
|
||||
least one second newer than the archive for it to be detected as newer.
|
||||
|
||||
@ignore
|
||||
@item --permissive
|
||||
|
@ -591,6 +597,58 @@ Thus, @w{@option{--foo bar}} and @option{--foo=bar} are equivalent.
|
|||
@end itemize
|
||||
|
||||
|
||||
@node Creating backups safely
|
||||
@chapter Checking the integrity and accuracy of tar.lz archives
|
||||
@cindex creating backups
|
||||
|
||||
Uncompressed tar archives do not offer any integrity checking for the files
|
||||
they store. The pax format even fails to offer integrity checking for some
|
||||
of the metadata. @xref{crc32}. The integrity checking of tar archives is
|
||||
usually provided by a compression layer or by an external hash.
|
||||
|
||||
Lzip compression provides safe integrity checking to tar archives. But it
|
||||
does not matter how safe the archiving format is if the archive is created
|
||||
corrupt because of a concurrent modification of the files being archived, a
|
||||
faulty RAM, or a bug in the archiving tool. The only way of guaranteeing
|
||||
that a backup archive is correct is to check its integrity and accuracy
|
||||
after creating it.
|
||||
|
||||
Testing the integrity of the archive with @w{@samp{lzip -tv}} guarantees
|
||||
that the compression layer of the archive is valid, but it does not
|
||||
guarantee that the tar layer is valid nor that the files in the archive
|
||||
match the files in the file system. For example, if the RAM is faulty and a
|
||||
bit flip happens in the input buffer before tarlz compresses it, the archive
|
||||
will not match the files. It is safer to check the archive with
|
||||
@w{@samp{tarlz -d}} just after creation because it checks the compression
|
||||
layer and the tar layer, and it compares the files in the archive with the
|
||||
files in the file system:
|
||||
|
||||
@example
|
||||
tarlz -cf archive.tar.lz somedir # create the archive
|
||||
tarlz -df archive.tar.lz # check the archive
|
||||
@end example
|
||||
|
||||
Once the integrity and accuracy of an archive have been verified as in the
|
||||
example above, they can be verified again anywhere at any time with
|
||||
@w{@samp{tarlz -t -n0}}. It is important to disable multi-threading with
|
||||
@option{-n0} because multi-threaded listing does not detect corruption in
|
||||
the tar member data of multimember archives: @xref{mt-listing}.
|
||||
|
||||
@example
|
||||
tarlz -t -n0 -f archive.tar.lz > /dev/null
|
||||
@end example
|
||||
|
||||
@w{@samp{lzip -tv}} checks the integrity of the compression layer, and
|
||||
therefore the integrity and accuracy of any archive created and verified as
|
||||
explained above. This test is reliable for solidly compressed archives, but
|
||||
it does not detect a truncated multimember archive if the truncation happens
|
||||
just at a member boundary:
|
||||
|
||||
@example
|
||||
lzip -tv archive.tar.lz
|
||||
@end example
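
The create-then-check sequence shown in the examples above could be scripted as in the following Python sketch. The function names are hypothetical, and the sketch assumes that @samp{tarlz} is in the PATH and exits with nonzero status when the check fails. Only the command lines already shown in this chapter are used.

```python
import subprocess

def backup_commands(archive: str, paths: list[str]) -> list[list[str]]:
    """Command lines to create an archive and then compare it with the
    files in the file system (same commands as the example above)."""
    return [
        ["tarlz", "-cf", archive, *paths],   # create the archive
        ["tarlz", "-df", archive],           # check the archive
    ]

def recheck_command(archive: str) -> list[str]:
    """Command line for a later single-threaded check of the archive."""
    return ["tarlz", "-t", "-n0", "-f", archive]

def create_and_check(archive: str, paths: list[str]) -> None:
    """Run the commands in order; raise CalledProcessError on failure
    (assumes tarlz reports problems through its exit status)."""
    for argv in backup_commands(archive, paths):
        subprocess.run(argv, check=True)
```

Calling @samp{create_and_check("archive.tar.lz", ["somedir"])} would run both steps and stop at the first one that fails.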
|
||||
|
||||
|
||||
@node Portable character set
|
||||
@chapter POSIX portable filename character set
|
||||
@cindex portable character set
|
||||
|
@ -641,7 +699,7 @@ are not allowed in multimember files.
|
|||
|
||||
Each lzip member contains one or more tar members in a simplified POSIX pax
|
||||
interchange format. The only pax typeflag value supported by tarlz (in
|
||||
addition to the typeflag values defined by the ustar format) is @samp{x}.
|
||||
addition to the typeflag values defined by the ustar format) is 'x'.
|
||||
The pax format is an extension on top of the ustar format that removes the
|
||||
size limitations of the ustar format.
|
||||
|
||||
|
@ -654,7 +712,7 @@ An optional extended header block followed by one or more blocks that
|
|||
contain the extended header records as if they were the contents of a file;
|
||||
i.e., the extended header records are included as the data for this header
|
||||
block. This header block is of the form described in pax header block, with
|
||||
a typeflag value of @samp{x}.
|
||||
a typeflag value of 'x'.
|
||||
|
||||
@item
|
||||
A header block in ustar format that describes the file. Any fields defined
|
||||
|
@ -713,7 +771,7 @@ An extended header just before the end-of-archive blocks.
|
|||
@section Pax header block
|
||||
|
||||
The pax header block is identical to the ustar header block described below
|
||||
except that the typeflag has the value @samp{x} (extended). The field
|
||||
except that the typeflag has the value 'x' (extended). The field
|
||||
@samp{size} is the size of the extended header data in bytes. Most other
|
||||
fields in the pax header block are zeroed on archive creation to prevent
|
||||
trouble if the archive is read by a ustar tool, and are ignored by tarlz on
|
||||
|
@ -752,8 +810,8 @@ greater than 2_097_151 @w{(octal 7_777_777)}. @xref{ustar-uid-gid}.
|
|||
The file name of a link being created to another file, of any type,
|
||||
previously archived. This record overrides the field @samp{linkname} in the
|
||||
following ustar header block. The following ustar header block determines
|
||||
the type of link created. If typeflag of the following header block is 1, a
|
||||
hard link is created. If typeflag is 2, a symbolic link is created and the
|
||||
the type of link created. If typeflag of the following header block is '1', a
|
||||
hard link is created. If typeflag is '2', a symbolic link is created and the
|
||||
linkpath value is used as the contents of the symbolic link. The linkpath
|
||||
record is created only for links with a link name that does not fit in the
|
||||
space provided by the ustar header.
|
||||
|
@ -789,13 +847,12 @@ greater than 2_097_151 @w{(octal 7_777_777)}. @xref{ustar-uid-gid}.
|
|||
@item GNU.crc32
|
||||
CRC32-C (Castagnoli) of the extended header data excluding the 8 bytes
|
||||
representing the CRC <value> itself. The <value> is represented as 8
|
||||
hexadecimal digits in big endian order,
|
||||
@w{@samp{22 GNU.crc32=00000000\n}}. The keyword of the CRC record is
|
||||
protected by the CRC to guarantee that corruption is always detected when
|
||||
using @option{--missing-crc} (except in case of CRC collision). A CRC was
|
||||
chosen because a checksum is too weak for a potentially large list of
|
||||
variable sized records. A checksum can't detect simple errors like the
|
||||
swapping of two bytes.
|
||||
hexadecimal digits in big endian order, @w{@samp{22 GNU.crc32=00000000\n}}.
|
||||
The option @option{--missing-crc} guarantees that corruption is always
|
||||
detected (except in case of CRC collision). A CRC was chosen because a
|
||||
checksum is too weak for a potentially large list of variable sized records.
|
||||
A checksum can't detect simple errors like the swapping of two bytes.
|
||||
@xref{--missing-crc}.
|
||||
|
||||
@end table
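
As an illustration of the GNU.crc32 record described above, the following Python sketch appends a CRC record to a block of extended records and checks it. It is a sketch under stated assumptions, not tarlz's code: it assumes the standard CRC32-C parameters (reflected polynomial 0x82F63B78, initial value and final XOR of 0xFFFFFFFF), assumes that the CRC record is the last record, and assumes uppercase hexadecimal digits. The function names are hypothetical.

```python
CRC_RECORD = b"22 GNU.crc32=00000000\n"   # template from the text, 22 bytes

def crc32c(data: bytes) -> int:
    """Bitwise CRC32-C (Castagnoli) with the standard parameters."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def add_crc_record(records: bytes) -> bytes:
    """Append a CRC record computed over everything except the 8 bytes
    of the CRC value itself (assumption: the CRC record goes last)."""
    data = records + CRC_RECORD
    start, end = len(data) - 9, len(data) - 1   # the 8 hex digits
    crc = crc32c(data[:start] + data[end:])
    return data[:start] + b"%08X" % crc + data[end:]

def check_crc_record(data: bytes) -> bool:
    """Recompute the CRC, skipping the 8 value bytes, and compare."""
    start, end = len(data) - 9, len(data) - 1
    stored = int(data[start:end], 16)
    return stored == crc32c(data[:start] + data[end:])
```

Because the keyword and the record length are included in the data being checked, a corrupted keyword is detected in the same way as corruption in any other record.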
|
||||
|
||||
|
@ -825,6 +882,7 @@ shown in the following table. All lengths and offsets are in decimal:
|
|||
@item devmajor @tab 329 @tab 8
|
||||
@item devminor @tab 337 @tab 8
|
||||
@item prefix @tab 345 @tab 155
|
||||
@item padding @tab 500 @tab 12
|
||||
@end multitable
|
||||
|
||||
All characters in the header block are coded using the ISO/IEC 646:1991
|
||||
|
@ -919,7 +977,7 @@ FIFO special file.
|
|||
@item '7'
|
||||
Reserved to represent a file to which an implementation has associated some
|
||||
high-performance attribute (contiguous file). Tarlz treats this type of file
|
||||
as a regular file (type 0).
|
||||
as a regular file (type '0').
|
||||
|
||||
@end table
|
||||
|
||||
|
@ -930,8 +988,8 @@ except when all characters in the array contain non-null characters
|
|||
including the last character. Each numeric field contains a leading space-
|
||||
or zero-filled, optionally null-terminated octal number using digits from
|
||||
the ISO/IEC 646:1991 (ASCII) standard. Tarlz is able to decode numeric
|
||||
fields 1 byte longer than standard ustar by not requiring a terminating null
|
||||
character.
|
||||
fields one byte longer than standard ustar by not requiring a terminating
|
||||
null character.
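
The decoding rule for numeric fields can be sketched in Python as follows. This is an illustration only, with a hypothetical function name. It assumes that a space or a null character ends the number, and that any other non-octal character is an error.

```python
def parse_octal_field(field: bytes) -> int:
    """Decode a ustar numeric field: optional leading spaces, then octal
    digits, optionally terminated by a space or a null character."""
    value = 0
    seen_digit = False
    for byte in field.lstrip(b" "):
        if byte in b"01234567":
            value = value * 8 + (byte - ord("0"))
            seen_digit = True
        elif byte in b" \0":
            break                      # terminator: stop decoding
        else:
            raise ValueError("invalid character in numeric field")
    if not seen_digit:
        raise ValueError("empty numeric field")
    return value
```

An 8-byte field with a terminating null holds at most 7 octal digits (2_097_151), while the same field without the terminator can hold 8 digits, which is the extra digit mentioned above.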
|
||||
|
||||
|
||||
@node Amendments to pax format
|
||||
|
@ -1044,8 +1102,7 @@ extracting file data for a hard link to a symbolic link or to a directory.
|
|||
There is no portable way to tell what charset a text string is coded into.
|
||||
Therefore, tarlz stores all fields representing text strings unmodified,
|
||||
without conversion to UTF-8 nor any other transformation. This prevents
|
||||
accidental double UTF-8 conversions. If the need arises this behavior will
|
||||
be adjusted with a command-line option in the future.
|
||||
accidental double UTF-8 conversions.
|
||||
|
||||
|
||||
@node Program design
|
||||
|
@ -1054,12 +1111,12 @@ be adjusted with a command-line option in the future.
|
|||
|
||||
The parts of tarlz related to sequential processing of the archive are more
|
||||
or less similar to any other tar and won't be described here. The interesting
|
||||
parts described here are those related to Multi-threaded processing.
|
||||
parts described here are those related to multi-threaded processing.
|
||||
|
||||
The structure of the part of tarlz performing Multi-threaded archive
|
||||
The structure of the part of tarlz performing multi-threaded archive
|
||||
creation is somewhat similar to that of
|
||||
@uref{http://www.nongnu.org/lzip/plzip.html#Program-design,,plzip} with the
|
||||
added complication of the solidity levels.
|
||||
@uref{http://www.nongnu.org/lzip/manual/plzip_manual.html#Program-design,,plzip}
|
||||
with the added complication of the solidity levels.
|
||||
@ifnothtml
|
||||
@xref{Program design,,,plzip}.
|
||||
@end ifnothtml
|
||||
|
@ -1174,6 +1231,9 @@ tar.lz archives, keeping backwards compatibility. If tarlz finds a member
|
|||
misalignment during multi-threaded decoding, it switches to single-threaded
|
||||
mode and continues decoding the archive.
|
||||
|
||||
@anchor{mt-listing}
|
||||
@section Multi-threaded listing
|
||||
|
||||
If the files in the archive are large, multi-threaded @option{--list} on a
|
||||
regular (seekable) tar.lz archive can be hundreds of times faster than
|
||||
sequential @option{--list} because, in addition to using several processors,
|
||||
|
@ -1189,8 +1249,10 @@ time tarlz -tf silesia.tar.lz (0.020s)
|
|||
|
||||
On the other hand, multi-threaded @option{--list} won't detect corruption in
|
||||
the tar member data because it only decodes the part of each lzip member
|
||||
corresponding to the tar member header. This is another reason why the tar
|
||||
headers must provide their own integrity checking.
|
||||
corresponding to the tar member header. Partial decoding of a lzip member
|
||||
can't guarantee the integrity of the data decoded. This is another reason
|
||||
why the tar headers (including the extended records) must provide their own
|
||||
integrity checking.
|
||||
|
||||
@anchor{mt-extraction}
|
||||
@section Limitations of multi-threaded extraction
|
||||
|
@ -1344,11 +1406,13 @@ tarlz -z --no-solid archive.tar
|
|||
@end example
|
||||
|
||||
@noindent
|
||||
Example 10: Compress the archive @file{archive.tar} and write the output to
|
||||
@file{foo.tar.lz}.
|
||||
Example 10: Recompress the archive @file{archive.tar.lz} with different
|
||||
solidity, write the output to @file{archive-ns.tar.lz}, and compare both
|
||||
archives.
|
||||
|
||||
@example
|
||||
tarlz -z -o foo.tar.lz archive.tar
|
||||
lzip -cd archive.tar.lz | tarlz -9z --no-solid -o archive-ns.tar.lz
|
||||
zcmp archive.tar.lz archive-ns.tar.lz
|
||||
@end example
|
||||
|
||||
@noindent
|
||||
|
|