Description

Tarlz is a massively parallel (multi-threaded) combined implementation of
the tar archiver and the lzip compressor. Tarlz creates, lists and extracts
archives in a simplified and safer variant of the POSIX pax format
compressed with lzip, keeping the alignment between tar members and lzip
members. The resulting multimember tar.lz archive is fully backward
compatible with standard tar tools like GNU tar, which treat it like any
other tar.lz archive. Tarlz can append files to the end of such compressed
archives.

Keeping the alignment between tar members and lzip members has two
advantages. It adds an indexed lzip layer on top of the tar archive, making
it possible to decode the archive safely in parallel. It also minimizes the
amount of data lost in case of corruption. Compressing a tar archive with
plzip may even double the number of files lost for each damaged lzip member
because plzip does not keep the members aligned.

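The effect of alignment on data loss can be illustrated with a small
model. The sketch below is not part of tarlz; it ignores tar headers and
works on uncompressed offsets only, and the file sizes and member size are
made-up values chosen so that fixed-size members straddle file boundaries.

```python
def files_lost(file_sizes, member_bounds, damaged_member):
    """Count the files that overlap the byte range of the damaged member."""
    start, end = member_bounds[damaged_member]
    lost, offset = 0, 0
    for size in file_sizes:
        file_start, file_end = offset, offset + size
        if file_start < end and file_end > start:
            lost += 1
        offset = file_end
    return lost


def aligned_bounds(file_sizes):
    """One member per file: every member ends on a file boundary."""
    bounds, offset = [], 0
    for size in file_sizes:
        bounds.append((offset, offset + size))
        offset += size
    return bounds


def fixed_bounds(total_size, member_size):
    """Members cut at fixed offsets, ignoring file boundaries."""
    return [(start, min(start + member_size, total_size))
            for start in range(0, total_size, member_size)]


# Hypothetical archive: one 100-byte file followed by four 200-byte files.
file_sizes = [100, 200, 200, 200, 200]
aligned = aligned_bounds(file_sizes)
unaligned = fixed_bounds(sum(file_sizes), 200)

lost_aligned = files_lost(file_sizes, aligned, 1)      # damage member 1
lost_unaligned = files_lost(file_sizes, unaligned, 1)  # damage member 1
print(lost_aligned, lost_unaligned)
```

In this model the damaged aligned member takes exactly one file with it,
while the damaged fixed-size member overlaps two neighboring files.
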
Tarlz can create tar archives with five levels of compression granularity:
per file (--no-solid), per block (--bsolid, default), per directory
(--dsolid), appendable solid (--asolid), and solid (--solid).

Of course, compressing each file (or each directory) individually can't
achieve a compression ratio as high as solidly compressing the whole tar
archive, but it has the following advantages:

* The resulting multimember tar.lz archive can be decompressed in
  parallel, multiplying the decompression speed.

* New members can be appended to the archive (by removing the EOF
  member), and unwanted members can be deleted from the archive, just
  like with an uncompressed tar archive.

* It is a safe POSIX-style backup format. In case of corruption,
  tarlz can extract all the undamaged members from the tar.lz
  archive, skipping over the damaged members, just like the standard
  (uncompressed) tar. Moreover, the option '--keep-damaged' can be
  used to recover as much data as possible from each damaged member,
  and lziprecover can be used to recover some of the damaged members.

* A multimember tar.lz archive is usually smaller than the
  corresponding solidly compressed tar.gz archive, except when
  individually compressing files smaller than about 32 KiB.

Note that the POSIX pax format has a serious flaw. The metadata stored in
pax extended records is not protected by any kind of check sequence.
Corruption in a long file name may cause the extraction of the file in the
wrong place without warning. Corruption in a large file size may cause the
truncation of the file or the appending of garbage to the file, both
followed by a spurious warning about a corrupt header far from the place of
the undetected corruption.

Metadata like file name and file size must always be protected in an archive
format because of the adverse effects of undetected corruption in them,
potentially much worse than undetected corruption in the data. Even more so
in the case of pax because the amount of metadata it stores is potentially
large, making undetected corruption more probable.

Headers and metadata must be protected separately from data because the
integrity checking of lzip may not be able to detect the corruption before
the metadata has been used, for example, to create a new file in the wrong
place.

Because of the above, tarlz protects the extended records with a CRC in a
way compatible with standard tar tools.

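The idea of guarding extended records with a check sequence can be sketched
as follows. This is an illustration only, not the tarlz implementation: the
record layout "<length> <keyword>=<value>\n" is the one defined by POSIX
pax, but zlib.crc32 is used here merely as a stand-in CRC, the file name is
made up, and how tarlz stores its CRC inside the extended header is not
shown.

```python
import zlib


def pax_record(keyword: str, value: str) -> bytes:
    """Build one pax extended record: b"<length> <keyword>=<value>\\n".

    The decimal length counts the whole record, including its own digits.
    """
    body = f" {keyword}={value}\n".encode("utf-8")
    length = len(body)
    # Grow the length until it also accounts for its own decimal digits.
    while len(str(length)) + len(body) != length:
        length = len(str(length)) + len(body)
    return str(length).encode("ascii") + body


def checksum(records: bytes) -> int:
    # Stand-in CRC (assumption); the variant used by tarlz is not stated here.
    return zlib.crc32(records) & 0xFFFFFFFF


def is_intact(records: bytes, stored_crc: int) -> bool:
    """Check the records before any of their metadata is acted upon."""
    return checksum(records) == stored_crc


# Hypothetical long file name stored in an extended record.
records = pax_record("path", "backup/2019/report.txt")
stored_crc = checksum(records)

# Simulate one corrupted byte inside the file name.
damaged = records.replace(b"backup", b"backuq")

print(is_intact(records, stored_crc), is_intact(damaged, stored_crc))
```

Without the check, the damaged record would still parse and the file would
be extracted under the wrong name; with it, the mismatch is detected before
the name is used.
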
Tarlz does not understand other tar formats like gnu, oldgnu, star or v7.
'tarlz -tf archive.tar.lz > /dev/null' can be used to verify that the format
of the archive is compatible with tarlz.

The diagram below shows the correspondence between each tar member (formed
by one or two headers plus optional data) in the tar archive and each lzip
member in the resulting multimember tar.lz archive, when per file
compression is used:

tar
+========+======+=================+===============+========+======+========+
| header | data | extended header | extended data | header | data |   EOF  |
+========+======+=================+===============+========+======+========+

tar.lz
+===============+=================================================+========+
|     member    |                      member                     | member |
+===============+=================================================+========+

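The layout in the diagram can be reproduced with a little arithmetic. The
sketch below is not taken from tarlz; it assumes the standard 512-byte tar
block size, an end-of-archive marker made of two zero blocks with no extra
padding, and hypothetical file names and sizes. It returns the uncompressed
byte range that each lzip member would cover with per file compression.

```python
BLOCK = 512  # tar block size in bytes


def padded(size: int) -> int:
    """Round a size up to a whole number of 512-byte tar blocks."""
    return -(-size // BLOCK) * BLOCK


def member_spans(files):
    """Return (label, start, end) byte ranges, one per lzip member.

    `files` is a list of (name, data_size, extended_size) tuples, where
    extended_size is 0 when the file needs no extended header.
    """
    spans, offset = [], 0
    for name, data_size, extended_size in files:
        length = BLOCK + padded(data_size)           # header + data
        if extended_size:
            # extended header + extended data precede the normal header
            length += BLOCK + padded(extended_size)
        spans.append((name, offset, offset + length))
        offset += length
    spans.append(("EOF", offset, offset + 2 * BLOCK))  # two zero blocks
    return spans


# Hypothetical files matching the diagram: the second one has extended data.
spans = member_spans([("short.txt", 100, 0), ("long-name.txt", 1000, 50)])
for label, start, end in spans:
    print(f"{label:14} {start:5} {end:5}")
```

Each range starts exactly where the previous one ends, so every lzip
member holds one complete tar member and the EOF blocks sit in a member of
their own.
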
Copyright (C) 2013-2019 Antonio Diaz Diaz.

This file is free documentation: you have unlimited permission to copy,
distribute and modify it.

The file Makefile.in is a data file used by configure to produce the
Makefile. It has the same copyright owner and permissions as configure
itself.