
Merging upstream version 0.27.

Signed-off-by: Daniel Baumann <daniel@debian.org>
Daniel Baumann 2025-03-04 07:39:30 +01:00
parent 619358407d
commit 5e422e043e
Signed by: daniel
GPG key ID: FBB4F0E80A80222F
83 changed files with 980 additions and 726 deletions

View file

@@ -1,3 +1,20 @@
+2025-02-28 Antonio Diaz Diaz <antonio@gnu.org>
+* Version 0.27 released.
+* common_decode.cc (format_member_name): Print seconds since epoch
+if date is out of range. Use at least 4 digits to print years.
+Print typeflag after the member name if unknown file type.
+(make_dirs): stat last dir before trying to create directories.
+* decode.cc (skip_warn): Diagnose a corrupt tar header.
+* extended.cc (Extended::parse): Diagnose a CRC mismatch.
+New argument 'msg_vecp' for multi-threaded diagnostics.
+* Many small fixes and improvements to the code and the manual.
+* tarlz.texi: New chapter 'Creating backups safely'.
+(Suggested by Aren Tyr).
+* check.sh: Require lzip. Create .tar files from .tar.lz files.
+Limit '--mtime' test to safe dates. (Reported by Aren Tyr).
+* testsuite: Add 5 new test files.
 2024-12-07 Antonio Diaz Diaz <antonio@gnu.org>
 * Version 0.26 released.
@@ -73,8 +90,7 @@
 * Lzlib 1.12 or newer is now required.
 * decode.cc (decode): Skip members without name except when listing.
 decode_lz.cc (dworker): Likewise. (Reported by Florian Schmaus).
-* New options '-z, --compress' and '-o, --output'.
-* New option '--warn-newer'.
+* New options '-z, --compress', '-o, --output', and '--warn-newer'.
 * tarlz.texi (Invoking tarlz): Document concatenation to stdout.
 * check.sh: Fix the '--diff' test on OS/2. (Reported by Elbert Pol).
@@ -97,10 +113,10 @@
 2020-07-30 Antonio Diaz Diaz <antonio@gnu.org>
 * Version 0.17 released.
-* New option '--mtime'.
-* New option '-p, --preserve-permissions'.
+* New options '--mtime' and '-p, --preserve-permissions'.
 * Implement multi-threaded '-d, --diff'.
 * list_lz.cc: Rename to decode_lz.cc.
+(decode_lz): Limit num_workers to number of members.
 * main.cc (main): Report an error if a file name is empty or if the
 archive is specified more than once.
 * lzip_index.cc: Improve messages for corruption in last header.
@@ -125,8 +141,7 @@
 2019-03-12 Antonio Diaz Diaz <antonio@gnu.org>
 * Version 0.14 released.
-* New option '--exclude'.
-* New option '-h, --dereference'.
+* New options '--exclude' and '-h, --dereference'.
 * Short option name '-h' no longer means '--help'.
 * create.cc: Implement '-A, --concatenate' and '-r, --append' to
 uncompressed archives and to standard output.
@@ -145,8 +160,7 @@
 * create.cc (fill_headers): Fix use of st_rdev instead of st_dev.
 * Save just numerical uid/gid if user or group not in database.
 * extract.cc (format_member_name): Print devmajor and devminor.
-* New option '-d, --diff'.
-* New option '--ignore-ids'.
+* New options '-d, --diff' and '--ignore-ids'.
 * extract.cc: Fast '-t, --list' on seekable uncompressed archives.
 2019-02-13 Antonio Diaz Diaz <antonio@gnu.org>
@@ -161,8 +175,7 @@
 2019-01-31 Antonio Diaz Diaz <antonio@gnu.org>
 * Version 0.10 released.
-* New option '--bsolid'.
-* New option '-B, --data-size'.
+* New options '--bsolid' and '-B, --data-size'.
 * create.cc: Set ustar name to zero if extended header is used.
 2019-01-22 Antonio Diaz Diaz <antonio@gnu.org>
@@ -185,8 +198,7 @@
 2018-11-23 Antonio Diaz Diaz <antonio@gnu.org>
 * Version 0.7 released.
-* New option '--keep-damaged'.
-* New option '--no-solid'.
+* New options '--keep-damaged' and '--no-solid'.
 * create.cc (archive_write): Minimize dictionary size.
 Detect and skip archive in '-A', '-c', and '-r'.
 * main.cc (show_version): Show the version of lzlib being used.
@@ -195,7 +207,7 @@
 * Version 0.6 released.
 * New option '-A, --concatenate'.
-* Option '--ignore-crc' replaced with '--missing-crc'.
+* Replace option '--ignore-crc' with '--missing-crc'.
 * create.cc (add_member): Check that uid, gid, mtime, devmajor,
 and devminor are in ustar range.
 * configure: Accept appending to CXXFLAGS; 'CXXFLAGS+=OPTIONS'.
@@ -220,11 +232,10 @@
 * Version 0.3 released.
 * Rename project to 'tarlz' from 'pmtar' (Poor Man's Tar).
-* New option '-C, --directory'.
-* Implement lzip compression of members at archive creation.
-* New option '-r, --append'.
+* New options '-C, --directory' and '-r, --append'.
 * New options '--owner' and '--group'.
 * New options '--asolid', '--dsolid', and '--solid'.
+* Implement lzip compression of members at archive creation.
 * Implement file appending to compressed archive.
 * Implement transparent decompression of the archive.
 * Implement skipping over damaged (un)compressed members.
@@ -242,7 +253,7 @@
 * Version 0.1 released.
-Copyright (C) 2013-2024 Antonio Diaz Diaz.
+Copyright (C) 2013-2025 Antonio Diaz Diaz.
 This file is a collection of facts, and thus it is not copyrightable, but just
 in case, you have unlimited permission to copy, distribute, and modify it.

INSTALL

@@ -4,10 +4,12 @@ You will need a C++98 compiler with support for 'long long', and the
 compression library lzlib installed. (gcc 3.3.6 or newer is recommended).
 I use gcc 6.1.0 and 3.3.6, but the code should compile with any standards
 compliant compiler.
-Gcc is available at http://gcc.gnu.org.
-Lzlib is available at http://www.nongnu.org/lzip/lzlib.html.
 Lzlib must be version 1.12 or newer.
+Gcc is available at http://gcc.gnu.org
+Lzlib is available at http://www.nongnu.org/lzip/lzlib.html
+Lzip is required to run the tests.
+Lzip is available at http://www.nongnu.org/lzip/lzip.html
 The operating system must allow signal handlers read access to objects with
 static storage duration so that the cleanup handler for Control-C can delete
@@ -18,6 +20,8 @@ Procedure
 ---------
 1. Unpack the archive if you have not done so already:
+tarlz -xf tarlz[version].tar.lz
+or
 tar -xf tarlz[version].tar.lz
 or
 lzip -cd tarlz[version].tar.lz | tar -xf -
@@ -74,7 +78,7 @@ After running 'configure', you can run 'make' and 'make install' as
 explained above.
-Copyright (C) 2013-2024 Antonio Diaz Diaz.
+Copyright (C) 2013-2025 Antonio Diaz Diaz.
 This file is free documentation: you have unlimited permission to copy,
 distribute, and modify it.

View file

@@ -135,42 +135,34 @@ dist : doc
 $(DISTNAME)/*.cc \
 $(DISTNAME)/testsuite/check.sh \
 $(DISTNAME)/testsuite/test.txt \
-$(DISTNAME)/testsuite/test.txt.tar \
 $(DISTNAME)/testsuite/test_bad1.txt.tar \
 $(DISTNAME)/testsuite/test_bad[12].txt \
 $(DISTNAME)/testsuite/rfoo \
 $(DISTNAME)/testsuite/rbar \
 $(DISTNAME)/testsuite/rbaz \
-$(DISTNAME)/testsuite/test3.tar \
-$(DISTNAME)/testsuite/test3_nn.tar \
-$(DISTNAME)/testsuite/test3_eoa[1-4].tar \
-$(DISTNAME)/testsuite/test3_gh[1-4].tar \
 $(DISTNAME)/testsuite/test3_bad[1-5].tar \
-$(DISTNAME)/testsuite/test3_dir.tar \
-$(DISTNAME)/testsuite/t155.tar \
-$(DISTNAME)/testsuite/t155_fv[1-3].tar \
-$(DISTNAME)/testsuite/eoa_blocks.tar \
 $(DISTNAME)/testsuite/em.lz \
-$(DISTNAME)/testsuite/test.txt.lz \
 $(DISTNAME)/testsuite/test.txt.tar.lz \
 $(DISTNAME)/testsuite/test_bad[12].txt.tar.lz \
 $(DISTNAME)/testsuite/test3.tar.lz \
 $(DISTNAME)/testsuite/test3_eoa[1-5].tar.lz \
-$(DISTNAME)/testsuite/test3_gh[1-6].tar.lz \
+$(DISTNAME)/testsuite/test3_gh[1-8].tar.lz \
 $(DISTNAME)/testsuite/test3_nn.tar.lz \
 $(DISTNAME)/testsuite/test3_sm[1-4].tar.lz \
 $(DISTNAME)/testsuite/test3_bad[1-6].tar.lz \
+$(DISTNAME)/testsuite/test3_crc.tar.lz \
+$(DISTNAME)/testsuite/test3_uk.tar.lz \
 $(DISTNAME)/testsuite/test3_dir.tar.lz \
 $(DISTNAME)/testsuite/test3_dot.tar.lz \
 $(DISTNAME)/testsuite/tar_in_tlz[12].tar.lz \
 $(DISTNAME)/testsuite/tlz_in_tar[12].tar \
 $(DISTNAME)/testsuite/ts_in_link.tar.lz \
 $(DISTNAME)/testsuite/t155.tar.lz \
-$(DISTNAME)/testsuite/t155_fv[1-6].tar.lz \
+$(DISTNAME)/testsuite/t155_fv[1-7].tar.lz \
 $(DISTNAME)/testsuite/dotdot[1-5].tar.lz \
 $(DISTNAME)/testsuite/ug32767.tar.lz \
 $(DISTNAME)/testsuite/ug32chars.tar.lz \
-$(DISTNAME)/testsuite/eoa_blocks.tar.lz
+$(DISTNAME)/testsuite/eoa_blocks.lz
 rm -f $(DISTNAME)
 clean :

NEWS

@@ -1,13 +1,33 @@
-Changes in version 0.26:
+Changes in version 0.27:

-tarlz now exits with error status 2 if any empty lzip member is found in a
-multimember compressed archive.
+tarlz now prints seconds since epoch if a file date is out of range.

-Scalability of parallel compressed creation and decoding has been increased.
+tarlz now uses at least 4 digits to print years.

-A diagnostic message for read error has been improved.
+'tarlz -tv' now prints the value of typeflag after the member name for
+unknown file types.

-The chapter 'Syntax of command-line arguments' has been added to the manual.
+tarlz now prints a diagnostic when it finds a corrupt tar header (or random
+data where a tar header is expected).

-'make check' now skips time stamps out of range or not recognized by system
-tools. (Reported by J Dean).
+tarlz now diagnoses CRC mismatches in extended records separately.
+
+Multi-threaded decoding now prints diagnostics about CRC mismatches and
+unknown keywords in extended records in the correct order.
+
+Many small fixes and improvements have been made to the code and the manual.
+
+The chapter 'Creating backups safely' has been added to the manual.
+(Suggested by Aren Tyr).
+
+Lzip is now required to run the tests because I have not found any other
+portable and reliable way to tell compressed archives from non-compressed.
+
+Where possible, .tar archives for the testsuite are now decompressed from
+their .tar.lz versions instead of distributed.
+
+'make check' no longer tests '--mtime' with extreme dates to avoid test
+failures caused by differences with the system tool 'touch'.
+(Reported by Aren Tyr).
+
+5 new test files have been added to the testsuite.
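
The first two NEWS items (seconds since epoch for out-of-range dates, at least 4 digits for years) can be illustrated with a small standalone sketch. The function name `format_mtime` is hypothetical and not part of tarlz; the sketch assumes a POSIX system with a 64-bit `time_t` and only mirrors the fallback order that the change to `format_member_name` suggests (local time, then UTC, then raw seconds).

```cpp
#include <cstdio>
#include <ctime>
#include <string>

// Hypothetical helper, not tarlz's own function: format a modification time
// as "YYYY-MM-DD HH:MM" with a year of at least 4 digits. If neither local
// time nor UTC can represent the date, print the seconds since the epoch.
// Assumes a 64-bit time_t.
std::string format_mtime( const long long sec )
  {
  const time_t mtime = static_cast< time_t >( sec );
  struct tm t;
  char buf[32];
  if( localtime_r( &mtime, &t ) || gmtime_r( &mtime, &t ) )
    snprintf( buf, sizeof buf, "%04lld-%02d-%02d %02d:%02d",
              1900LL + t.tm_year, 1 + t.tm_mon, t.tm_mday,
              t.tm_hour, t.tm_min );
  else
    snprintf( buf, sizeof buf, "%lld", sec );   // date out of range
  return buf;
  }
```

A date in the year 987 would be printed with a leading zero, while a value such as the largest 64-bit integer, whose year does not fit in `struct tm`, would be printed as a plain number of seconds.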

README

@@ -15,7 +15,7 @@ compressed archives.
 Keeping the alignment between tar members and lzip members has two
 advantages. It adds an indexed lzip layer on top of the tar archive, making
-it possible to decode the archive safely in parallel. It also minimizes the
+it possible to decode the archive safely in parallel. It also reduces the
 amount of data lost in case of corruption. Compressing a tar archive with
 plzip may even double the amount of files lost for each lzip member damaged
 because it does not keep the members aligned.
@@ -93,7 +93,7 @@ Tarlz uses Arg_parser for command-line argument parsing:
 http://www.nongnu.org/arg-parser/arg_parser.html
-Copyright (C) 2013-2024 Antonio Diaz Diaz.
+Copyright (C) 2013-2025 Antonio Diaz Diaz.
 This file is free documentation: you have unlimited permission to copy,
 distribute, and modify it.

View file

@@ -1,5 +1,5 @@
 /* Tarlz - Archiver with multimember lzip compression
-Copyright (C) 2013-2024 Antonio Diaz Diaz.
+Copyright (C) 2013-2025 Antonio Diaz Diaz.
 This program is free software: you can redistribute it and/or modify
 it under the terms of the GNU General Public License as published by
@@ -82,10 +82,9 @@ Archive_descriptor::Archive_descriptor( const std::string & archive_name )
 int Archive_reader_base::parse_records( Extended & extended,
-const Tar_header header,
-Resizable_buffer & rbuf,
-const char * const default_msg,
-const bool permissive )
+const Tar_header header, Resizable_buffer & rbuf,
+const char * const default_msg, const bool permissive,
+std::vector< std::string > * const msg_vecp )
 {
 const long long edsize = parse_octal( header + size_o, size_l );
 const long long bufsize = round_up( edsize );
@@ -95,7 +94,7 @@ int Archive_reader_base::parse_records( Extended & extended,
 if( !rbuf.resize( bufsize ) ) return err( -1, mem_msg );
 e_msg_ = ""; e_code_ = 0;
 int retval = read( rbuf.u8(), bufsize ); // extended records buffer
-if( retval == 0 && !extended.parse( rbuf(), edsize, permissive ) )
+if( retval == 0 && !extended.parse( rbuf(), edsize, permissive, msg_vecp ) )
 retval = 2;
 if( retval && !*e_msg_ ) e_msg_ = default_msg;
 return retval;
@@ -156,16 +155,9 @@ int Archive_reader::read( uint8_t * const buf, const int size )
 const int rd = LZ_decompress_read( decoder, buf + sz, size - sz );
 if( rd < 0 )
 {
-const unsigned long long old_pos = LZ_decompress_total_in_size( decoder );
 if( LZ_decompress_sync_to_member( decoder ) < 0 )
 internal_error( "library error (LZ_decompress_sync_to_member)." );
-e_skip_ = true; set_error_status( 2 );
-const unsigned long long new_pos = LZ_decompress_total_in_size( decoder );
-// lzlib < 1.8 does not update total_in_size when syncing to member
-if( new_pos >= old_pos && new_pos < LLONG_MAX )
-return err( 2, "", 0, sz, true );
-return err( -1, "Skipping to next header failed. "
-"Lzlib 1.8 or newer required.", 0, sz );
+e_skip_ = true; set_error_status( 2 ); return err( 2, "", 0, sz, true );
 }
 if( rd == 0 && LZ_decompress_finished( decoder ) == 1 )
 { return err( -2, end_msg, 0, sz ); }

View file

@@ -1,5 +1,5 @@
 /* Tarlz - Archiver with multimember lzip compression
-Copyright (C) 2013-2024 Antonio Diaz Diaz.
+Copyright (C) 2013-2025 Antonio Diaz Diaz.
 This program is free software: you can redistribute it and/or modify
 it under the terms of the GNU General Public License as published by
@@ -65,9 +65,10 @@ public:
 If !OK, fills all the e_* variables. */
 virtual int read( uint8_t * const buf, const int size ) = 0;
-int parse_records( Extended & extended, const Tar_header header,
-Resizable_buffer & rbuf, const char * const default_msg,
-const bool permissive );
+int parse_records( Extended & extended,
+const Tar_header header, Resizable_buffer & rbuf,
+const char * const default_msg, const bool permissive,
+std::vector< std::string > * const msg_vecp = 0 );
 };
@@ -112,7 +113,7 @@ public:
 long long mdata_end() const { return mdata_end_; }
 bool at_member_end() const { return data_pos_ == mdata_end_; }
-// Resets decoder and sets position to the start of the member.
+// Reset decoder and set position to the start of the member.
 void set_member( const long i );
 int read( uint8_t * const buf, const int size );
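
The new defaulted parameter `msg_vecp` is described in the ChangeLog as being "for multi-threaded diagnostics". One plausible reading, sketched below, is an optional message sink: with a null pointer the diagnostic is printed at once, otherwise it is queued so that the caller can print it later in archive order. The names `report` and `flush_messages` are hypothetical and do not appear in the diff.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper illustrating an optional message sink. With the default
// null pointer the message goes straight to stderr (serial use); with a vector
// it is stored for the caller to print later (multi-threaded use).
void report( const std::string & msg,
             std::vector< std::string > * const msg_vecp = 0 )
  {
  if( msg_vecp ) msg_vecp->push_back( msg );
  else std::fprintf( stderr, "%s\n", msg.c_str() );
  }

// Print the queued diagnostics in the order they were stored, then empty
// the queue.
void flush_messages( std::vector< std::string > & msg_vec )
  {
  for( unsigned i = 0; i < msg_vec.size(); ++i )
    std::fprintf( stderr, "%s\n", msg_vec[i].c_str() );
  msg_vec.clear();
  }
```

Because the argument defaults to 0, existing callers that pass only the first five arguments would keep compiling unchanged, which is presumably why the declaration gives it a default value.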

View file

@@ -1,5 +1,5 @@
 /* Arg_parser - POSIX/GNU command-line argument parser. (C++ version)
-Copyright (C) 2006-2024 Antonio Diaz Diaz.
+Copyright (C) 2006-2025 Antonio Diaz Diaz.
 This library is free software. Redistribution and use in source and
 binary forms, with or without modification, are permitted provided

View file

@@ -1,5 +1,5 @@
 /* Arg_parser - POSIX/GNU command-line argument parser. (C++ version)
-Copyright (C) 2006-2024 Antonio Diaz Diaz.
+Copyright (C) 2006-2025 Antonio Diaz Diaz.
 This library is free software. Redistribution and use in source and
 binary forms, with or without modification, are permitted provided

View file

@@ -1,5 +1,5 @@
 /* Tarlz - Archiver with multimember lzip compression
-Copyright (C) 2013-2024 Antonio Diaz Diaz.
+Copyright (C) 2013-2025 Antonio Diaz Diaz.
 This program is free software: you can redistribute it and/or modify
 it under the terms of the GNU General Public License as published by

View file

@@ -1,5 +1,5 @@
 /* Tarlz - Archiver with multimember lzip compression
-Copyright (C) 2013-2024 Antonio Diaz Diaz.
+Copyright (C) 2013-2025 Antonio Diaz Diaz.
 This program is free software: you can redistribute it and/or modify
 it under the terms of the GNU General Public License as published by
@@ -117,53 +117,56 @@ bool block_is_zero( const uint8_t * const buf, const int size )
 bool format_member_name( const Extended & extended, const Tar_header header,
 Resizable_buffer & rbuf, const bool long_format )
 {
-if( long_format )
-{
-format_mode_string( header, rbuf() );
-const int group_string_len =
-format_user_group_string( extended, header, rbuf() + mode_string_size );
-int offset = mode_string_size + group_string_len;
-const time_t mtime = extended.mtime().sec();
-struct tm t;
-if( !localtime_r( &mtime, &t ) ) // if local time fails
-{ time_t z = 0; if( !gmtime_r( &z, &t ) ) // use UTC, the epoch
-{ t.tm_year = 70; t.tm_mon = t.tm_hour = t.tm_min = 0; t.tm_mday = 1; } }
-const Typeflag typeflag = (Typeflag)header[typeflag_o];
-const bool islink = typeflag == tf_link || typeflag == tf_symlink;
-const char * const link_string = !islink ? "" :
-( ( typeflag == tf_link ) ? " link to " : " -> " );
-// print "user/group size" in a field of width 19 with 8 or more for size
-if( typeflag == tf_chardev || typeflag == tf_blockdev )
-{
-const unsigned devmajor = parse_octal( header + devmajor_o, devmajor_l );
-const unsigned devminor = parse_octal( header + devminor_o, devminor_l );
-const int width = std::max( 1,
-std::max( 8, 19 - group_string_len ) - 1 - decimal_digits( devminor ) );
-offset += snprintf( rbuf() + offset, rbuf.size() - offset, " %*u,%u",
-width, devmajor, devminor );
-}
-else
-{
-const int width = std::max( 8, 19 - group_string_len );
-offset += snprintf( rbuf() + offset, rbuf.size() - offset, " %*llu",
-width, extended.file_size() );
-}
-for( int i = 0; i < 2; ++i ) // resize rbuf if not large enough
-{
-const int len = snprintf( rbuf() + offset, rbuf.size() - offset,
-" %4d-%02u-%02u %02u:%02u %s%s%s\n",
-1900 + t.tm_year, 1 + t.tm_mon, t.tm_mday, t.tm_hour,
-t.tm_min, extended.path().c_str(), link_string,
-islink ? extended.linkpath().c_str() : "" );
-if( len + offset < (int)rbuf.size() ) break;
-if( !rbuf.resize( len + offset + 1 ) ) return false;
-}
-}
-else
-{
-if( rbuf.size() < extended.path().size() + 2 &&
-!rbuf.resize( extended.path().size() + 2 ) ) return false;
-snprintf( rbuf(), rbuf.size(), "%s\n", extended.path().c_str() );
+if( !long_format )
+{
+if( !rbuf.resize( extended.path().size() + 2 ) ) return false;
+snprintf( rbuf(), rbuf.size(), "%s\n", extended.path().c_str() );
+return true;
+}
+format_mode_string( header, rbuf() );
+const int group_string_len =
+format_user_group_string( extended, header, rbuf() + mode_string_size );
+int offset = mode_string_size + group_string_len;
+const time_t mtime = extended.mtime().sec();
+struct tm t;
+char buf[32]; // if local time and UTC fail, use seconds since epoch
+if( localtime_r( &mtime, &t ) || gmtime_r( &mtime, &t ) )
+snprintf( buf, sizeof buf, "%04d-%02u-%02u %02u:%02u", 1900 + t.tm_year,
+1 + t.tm_mon, t.tm_mday, t.tm_hour, t.tm_min );
+else snprintf( buf, sizeof buf, "%lld", extended.mtime().sec() );
+const Typeflag typeflag = (Typeflag)header[typeflag_o];
+const bool islink = typeflag == tf_link || typeflag == tf_symlink;
+const char * const link_string = !islink ? "" :
+( ( typeflag == tf_link ) ? " link to " : " -> " );
+// print "user/group size" in a field of width 19 with 8 or more for size
+if( typeflag == tf_chardev || typeflag == tf_blockdev )
+{
+const unsigned devmajor = parse_octal( header + devmajor_o, devmajor_l );
+const unsigned devminor = parse_octal( header + devminor_o, devminor_l );
+const int width = std::max( 1,
+std::max( 8, 19 - group_string_len ) - 1 - decimal_digits( devminor ) );
+offset += snprintf( rbuf() + offset, rbuf.size() - offset, " %*u,%u",
+width, devmajor, devminor );
+}
+else
+{
+const int width = std::max( 8, 19 - group_string_len );
+offset += snprintf( rbuf() + offset, rbuf.size() - offset, " %*llu",
+width, extended.file_size() );
+}
+for( int i = 0; i < 2; ++i ) // resize rbuf if not large enough
+{
+const int len = snprintf( rbuf() + offset, rbuf.size() - offset,
+" %s %s%s%s\n", buf, extended.path().c_str(), link_string,
+islink ? extended.linkpath().c_str() : "" );
+if( len + offset < (int)rbuf.size() ) { offset += len; break; }
+if( !rbuf.resize( len + offset + 1 ) ) return false;
+}
+if( rbuf()[0] == '?' )
+{
+if( !rbuf.resize( offset + 25 + 1 ) ) return false;
+snprintf( rbuf() + offset - 1, rbuf.size() - offset,
+": Unknown file type 0x%02X\n", typeflag );
 }
 return true;
 }
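
The last step of the new `format_member_name` overwrites the trailing newline of the listing line with a note about the unknown typeflag. A simplified sketch of that step using `std::string` instead of the resizable buffer is given below. The name `append_unknown_type` is hypothetical, and the sketch assumes, as the check `rbuf()[0] == '?'` suggests, that the mode string starts with '?' when the file type is not recognized.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: if the listing line begins with '?', replace its
// trailing newline with ": Unknown file type 0xNN" followed by a newline.
// Lines for known file types are left untouched.
void append_unknown_type( std::string & line, const unsigned char typeflag )
  {
  if( line.empty() || line[0] != '?' ) return;
  if( line[line.size()-1] == '\n' ) line.erase( line.size() - 1 );
  char buf[32];                 // the note is 25 characters plus terminator
  std::snprintf( buf, sizeof buf, ": Unknown file type 0x%02X\n",
                 static_cast< unsigned >( typeflag ) );
  line += buf;
  }
```

The note has a fixed length of 25 characters, which matches the `offset + 25 + 1` passed to `resize` in the diff.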
@@ -183,9 +186,12 @@ bool show_member_name( const Extended & extended, const Tar_header header,
 }
+/* Return true if file must be skipped.
+Execute -C options if cwd_fd >= 0 (diff or extract). */
 bool check_skip_filename( const Cl_options & cl_opts,
 std::vector< char > & name_pending,
-const char * const filename, const int chdir_fd )
+const char * const filename, const int cwd_fd,
+std::string * const msgp )
 {
 static int c_idx = -1; // parser index of last -C executed
 if( Exclude::excluded( filename ) ) return true; // skip excluded files
@@ -197,18 +203,19 @@ bool check_skip_filename( const Cl_options & cl_opts,
 {
 if( cl_opts.parser.code( i ) == 'C' ) { chdir_pending = true; continue; }
 if( !nonempty_arg( cl_opts.parser, i ) ) continue; // skip opts, empty names
-std::string removed_prefix;
+std::string removed_prefix; // prefix of cl argument
 const char * const name = remove_leading_dotslash(
 cl_opts.parser.argument( i ).c_str(), &removed_prefix );
 if( compare_prefix_dir( name, filename ) ||
 compare_tslash( name, filename ) )
 {
-print_removed_prefix( removed_prefix );
+print_removed_prefix( removed_prefix, msgp );
 skip = false; name_pending[i] = false;
-if( chdir_pending && chdir_fd >= 0 )
+// only serial decoder sets cwd_fd >= 0 to process -C options
+if( chdir_pending && cwd_fd >= 0 )
 {
 if( c_idx > i )
-{ if( fchdir( chdir_fd ) != 0 )
+{ if( fchdir( cwd_fd ) != 0 )
 { show_error( "Error changing to initial working directory", errno );
 throw Chdir_error(); } c_idx = -1; }
 for( int j = c_idx + 1; j < i; ++j )
@@ -234,7 +241,11 @@ bool make_dirs( const std::string & name )
 while( i > 0 && name[i-1] != '/' ) --i; // remove last component
 while( i > 0 && name[i-1] == '/' ) --i; // remove more slashes
 const int dirsize = i; // first slash before last component
+struct stat st;
+if( dirsize > 0 && lstat( std::string( name, 0, dirsize ).c_str(), &st ) == 0 )
+{ if( !S_ISDIR( st.st_mode ) ) { errno = ENOTDIR; return false; }
+return true; }
 for( i = 0; i < dirsize; ) // if dirsize == 0, dirname is '/' or empty
 {
 while( i < dirsize && name[i] == '/' ) ++i;
@@ -244,7 +255,6 @@ bool make_dirs( const std::string & name )
 {
 const std::string partial( name, 0, i );
 const mode_t mode = S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH;
-struct stat st;
 if( lstat( partial.c_str(), &st ) == 0 )
 { if( !S_ISDIR( st.st_mode ) ) { errno = ENOTDIR; return false; } }
 else if( mkdir( partial.c_str(), mode ) != 0 && errno != EEXIST )
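
The ChangeLog entry "(make_dirs): stat last dir before trying to create directories" corresponds to the early `lstat` added above: if the full parent directory already exists, the per-component loop can be skipped. The standalone sketch below isolates only that check. The function name `parent_dir_status` and its three-way return value are assumptions made for illustration; the real function returns `bool` and sets `errno`.

```cpp
#include <string>
#include <sys/stat.h>

// Hypothetical helper: examine the parent directory of 'name' with a single
// lstat call. Returns 1 if the parent exists and is a directory (or if 'name'
// has no parent part), 0 if lstat fails (for example because the parent does
// not exist yet), and -1 if the parent exists but is not a directory.
int parent_dir_status( const std::string & name )
  {
  int i = name.size();
  while( i > 0 && name[i-1] == '/' ) --i;   // remove trailing slashes
  while( i > 0 && name[i-1] != '/' ) --i;   // remove last component
  while( i > 0 && name[i-1] == '/' ) --i;   // remove more slashes
  if( i == 0 ) return 1;                    // parent is '/' or empty
  struct stat st;
  if( lstat( std::string( name, 0, i ).c_str(), &st ) != 0 ) return 0;
  return S_ISDIR( st.st_mode ) ? 1 : -1;
  }
```

In the common case of extracting many files into the same directory, a result of 1 means no directories need to be created at all, so only a result of 0 would require walking the path component by component.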

View file

@@ -1,5 +1,5 @@
 /* Tarlz - Archiver with multimember lzip compression
-Copyright (C) 2013-2024 Antonio Diaz Diaz.
+Copyright (C) 2013-2025 Antonio Diaz Diaz.
 This program is free software: you can redistribute it and/or modify
 it under the terms of the GNU General Public License as published by
@@ -131,10 +131,10 @@ bool print_removed_prefix( const std::string & prefix,
 if( prefixes[i] == prefix )
 { xunlock( &mutex ); if( msgp ) msgp->clear(); return false; }
 prefixes.push_back( prefix );
-xunlock( &mutex );
 std::string msg( "Removing leading '" ); msg += prefix;
-msg += "' from member names.";
+msg += "' from member names."; // from archive or command line
 if( msgp ) *msgp = msg; else show_error( msg.c_str() );
+xunlock( &mutex ); // put here to prevent mixing calls to show_error
 return true;
 }
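
The change to `print_removed_prefix` moves the unlock after the call to `show_error`, so that messages from different threads cannot interleave. A minimal sketch of the same idea follows. It uses `std::mutex` and `std::lock_guard` from C++11 rather than the project's own lock helpers, and the name `note_removed_prefix` is hypothetical.

```cpp
#include <cstdio>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical sketch, not the tarlz function. Returns true and produces the
// message the first time a prefix is seen; returns false (clearing *msgp) on
// repeats. The lock is held until return, so printing happens under the lock.
bool note_removed_prefix( const std::string & prefix, std::string * const msgp )
  {
  static std::mutex mtx;
  static std::vector< std::string > prefixes;   // prefixes already reported
  std::lock_guard< std::mutex > lock( mtx );
  for( unsigned i = 0; i < prefixes.size(); ++i )
    if( prefixes[i] == prefix ) { if( msgp ) msgp->clear(); return false; }
  prefixes.push_back( prefix );
  std::string msg( "Removing leading '" ); msg += prefix;
  msg += "' from member names.";
  if( msgp ) *msgp = msg;
  else std::fprintf( stderr, "%s\n", msg.c_str() );
  return true;
  }
```

A scoped lock releases the mutex on every return path, which avoids having to repeat the unlock call before each `return` as the original code does.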

View file

@@ -1,5 +1,5 @@
 /* Tarlz - Archiver with multimember lzip compression
-Copyright (C) 2013-2024 Antonio Diaz Diaz.
+Copyright (C) 2013-2025 Antonio Diaz Diaz.
 This program is free software: you can redistribute it and/or modify
 it under the terms of the GNU General Public License as published by

View file

@@ -1,5 +1,5 @@
 /* Tarlz - Archiver with multimember lzip compression
-Copyright (C) 2013-2024 Antonio Diaz Diaz.
+Copyright (C) 2013-2025 Antonio Diaz Diaz.
 This program is free software: you can redistribute it and/or modify
 it under the terms of the GNU General Public License as published by
@@ -212,7 +212,6 @@ int compress_archive( const Cl_options & cl_opts,
 Extended extended; // metadata from extended records
 Resizable_buffer rbuf; // headers and extended records buffer
 if( !rbuf.size() ) { show_error( mem_msg ); return 1; }
-const char * const rderr_msg = "Read error";
 bool first_header = true;
 while( true ) // process one tar member per iteration
@@ -224,7 +223,7 @@ int compress_archive( const Cl_options & cl_opts,
 show_file_error( filename, "Archive is empty." );
 close( infd ); return 2; }
 if( rd != header_size )
-{ show_file_error( filename, rderr_msg, errno ); close( infd ); return 1; }
+{ show_file_error( filename, rd_err_msg, errno ); close( infd ); return 1; }
 first_header = false;
 const bool is_header = check_ustar_chksum( rbuf.u8() );
@@ -259,7 +258,8 @@ int compress_archive( const Cl_options & cl_opts,
 if( !rbuf.resize( total_header_size + bufsize ) )
 { show_file_error( filename, mem_msg ); close( infd ); return 1; }
 if( readblock( infd, rbuf.u8() + total_header_size, bufsize ) != bufsize )
-{ show_file_error( filename, rderr_msg, errno ); close( infd ); return 1; }
+{ show_file_error( filename, rd_err_msg, errno );
+close( infd ); return 1; }
 total_header_size += bufsize;
if( typeflag == tf_extended ) // do not parse global headers if( typeflag == tf_extended ) // do not parse global headers
{ {
@ -269,7 +269,7 @@ int compress_archive( const Cl_options & cl_opts,
if( !rbuf.resize( total_header_size + header_size ) ) if( !rbuf.resize( total_header_size + header_size ) )
{ show_file_error( filename, mem_msg ); close( infd ); return 1; } { show_file_error( filename, mem_msg ); close( infd ); return 1; }
if( readblock( infd, rbuf.u8() + total_header_size, header_size ) != header_size ) if( readblock( infd, rbuf.u8() + total_header_size, header_size ) != header_size )
{ show_file_error( filename, errno ? rderr_msg : end_msg, errno ); { show_file_error( filename, errno ? rd_err_msg : end_msg, errno );
close( infd ); return errno ? 1 : 2; } close( infd ); return errno ? 1 : 2; }
if( !check_ustar_chksum( rbuf.u8() ) ) if( !check_ustar_chksum( rbuf.u8() ) )
{ show_file_error( filename, bad_hdr_msg ); close( infd ); return 2; } { show_file_error( filename, bad_hdr_msg ); close( infd ); return 2; }

configure

@ -1,12 +1,12 @@
#! /bin/sh #! /bin/sh
# configure script for Tarlz - Archiver with multimember lzip compression # configure script for Tarlz - Archiver with multimember lzip compression
# Copyright (C) 2013-2024 Antonio Diaz Diaz. # Copyright (C) 2013-2025 Antonio Diaz Diaz.
# #
# This configure script is free software: you have unlimited permission # This configure script is free software: you have unlimited permission
# to copy, distribute, and modify it. # to copy, distribute, and modify it.
pkgname=tarlz pkgname=tarlz
pkgversion=0.26 pkgversion=0.27
progname=tarlz progname=tarlz
srctrigger=doc/${pkgname}.texi srctrigger=doc/${pkgname}.texi
@ -175,7 +175,7 @@ echo "MAKEINFO = ${MAKEINFO}"
rm -f Makefile rm -f Makefile
cat > Makefile << EOF cat > Makefile << EOF
# Makefile for Tarlz - Archiver with multimember lzip compression # Makefile for Tarlz - Archiver with multimember lzip compression
# Copyright (C) 2013-2024 Antonio Diaz Diaz. # Copyright (C) 2013-2025 Antonio Diaz Diaz.
# This file was generated automatically by configure. Don't edit. # This file was generated automatically by configure. Don't edit.
# #
# This Makefile is free software: you have unlimited permission # This Makefile is free software: you have unlimited permission


@ -1,5 +1,5 @@
/* Tarlz - Archiver with multimember lzip compression /* Tarlz - Archiver with multimember lzip compression
Copyright (C) 2013-2024 Antonio Diaz Diaz. Copyright (C) 2013-2025 Antonio Diaz Diaz.
This program is free software: you can redistribute it and/or modify This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by it under the terms of the GNU General Public License as published by
@ -231,7 +231,7 @@ bool store_name( const char * const filename, Extended & extended,
} }
// add one tar member to the archive // add one tar member to the archive and print filename
int add_member( const char * const filename, const struct stat *, int add_member( const char * const filename, const struct stat *,
const int flag, struct FTW * ) const int flag, struct FTW * )
{ {
@ -323,7 +323,7 @@ bool copy_file( const int infd, const int outfd, const char * const filename,
if( max_size >= 0 ) rest -= size; if( max_size >= 0 ) rest -= size;
const int rd = readblock( infd, buffer, size ); const int rd = readblock( infd, buffer, size );
if( rd != size && errno ) if( rd != size && errno )
{ show_file_error( filename, "Read error", errno ); error = true; break; } { show_file_error( filename, rd_err_msg, errno ); error = true; break; }
if( rd > 0 ) if( rd > 0 )
{ {
if( !writeblock_wrapper( outfd, buffer, rd ) ) { error = true; break; } if( !writeblock_wrapper( outfd, buffer, rd ) ) { error = true; break; }
@ -455,7 +455,7 @@ bool fill_headers( const char * const filename, Extended & extended,
show_file_error( filename, "Wrong size reading symbolic link.\n" show_file_error( filename, "Wrong size reading symbolic link.\n"
"Please, send a bug report to the maintainers of your filesystem, " "Please, send a bug report to the maintainers of your filesystem, "
"mentioning\n'wrong st_size of symbolic link'.\nSee " "mentioning\n'wrong st_size of symbolic link'.\nSee "
"http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_stat.h.html" ); "http://pubs.opengroup.org/onlinepubs/9799919799/basedefs/sys_stat.h.html" );
set_error_status( 1 ); return false; set_error_status( 1 ); return false;
} }
} }
@ -607,7 +607,7 @@ int concatenate( const Cl_options & cl_opts )
"Not an appendable tar archive." ); "Not an appendable tar archive." );
close( infd ); retval = 2; break; } close( infd ); retval = 2; break; }
if( !copy_file( infd, outfd, filename, size ) || close( infd ) != 0 ) if( !copy_file( infd, outfd, filename, size ) || close( infd ) != 0 )
{ show_file_error( filename, "Error copying archive", errno ); { show_file_error( filename, "Error concatenating archive", errno );
eoa_pending = false; retval = 1; break; } eoa_pending = false; retval = 1; break; }
eoa_pending = true; eoa_pending = true;
if( verbosity >= 1 ) std::fprintf( stderr, "%s\n", filename ); if( verbosity >= 1 ) std::fprintf( stderr, "%s\n", filename );
@ -621,6 +621,34 @@ int concatenate( const Cl_options & cl_opts )
} }
// Return value: 0 = skip arg, 1 = error, 2 = arg done
int parse_cl_arg( const Cl_options & cl_opts, const int i,
int (* add_memberp)( const char * const filename,
const struct stat *, const int flag, struct FTW * ) )
{
const int code = cl_opts.parser.code( i );
const std::string & arg = cl_opts.parser.argument( i );
const char * filename = arg.c_str(); // filename from command line
if( code == 'C' && chdir( filename ) != 0 )
{ show_file_error( filename, chdir_msg, errno ); return 1; }
if( code ) return 0; // skip options
if( cl_opts.parser.argument( i ).empty() ) return 0; // skip empty names
std::string deslashed; // filename without trailing slashes
unsigned len = arg.size();
while( len > 1 && arg[len-1] == '/' ) --len;
if( len < arg.size() )
{ deslashed.assign( arg, 0, len ); filename = deslashed.c_str(); }
if( Exclude::excluded( filename ) ) return 0; // skip excluded files
struct stat st;
if( lstat( filename, &st ) != 0 )
{ show_file_error( filename, cant_stat, errno );
set_error_status( 1 ); return 0; }
if( nftw( filename, add_memberp, 16, cl_opts.dereference ? 0 : FTW_PHYS ) )
return 1; // write error or OOM
return 2;
}
int encode( const Cl_options & cl_opts ) int encode( const Cl_options & cl_opts )
{ {
if( !grbuf.size() ) { show_error( mem_msg ); return 1; } if( !grbuf.size() ) { show_error( mem_msg ); return 1; }
@ -698,27 +726,11 @@ int encode( const Cl_options & cl_opts )
int retval = 0; int retval = 0;
for( int i = 0; i < cl_opts.parser.arguments(); ++i ) // parse command line for( int i = 0; i < cl_opts.parser.arguments(); ++i ) // parse command line
{ {
const int code = cl_opts.parser.code( i ); const int ret = parse_cl_arg( cl_opts, i, add_member );
const std::string & arg = cl_opts.parser.argument( i ); if( ret == 0 ) continue; // skip arg
const char * filename = arg.c_str(); if( ret == 1 ) { retval = 1; break; } // error
if( code == 'C' && chdir( filename ) != 0 ) if( encoder && cl_opts.solidity == dsolid && // end of group
{ show_file_error( filename, chdir_msg, errno ); retval = 1; break; } !archive_write( 0, 0 ) ) { retval = 1; break; }
if( code ) continue; // skip options
if( cl_opts.parser.argument( i ).empty() ) continue; // skip empty names
std::string deslashed; // arg without trailing slashes
unsigned len = arg.size();
while( len > 1 && arg[len-1] == '/' ) --len;
if( len < arg.size() )
{ deslashed.assign( arg, 0, len ); filename = deslashed.c_str(); }
if( Exclude::excluded( filename ) ) continue; // skip excluded files
struct stat st;
if( lstat( filename, &st ) != 0 ) // filename from command line
{ show_file_error( filename, cant_stat, errno ); set_error_status( 1 ); }
else if( ( retval = nftw( filename, add_member, 16,
cl_opts.dereference ? 0 : FTW_PHYS ) ) != 0 )
break; // write error
else if( encoder && cl_opts.solidity == dsolid && !archive_write( 0, 0 ) )
{ retval = 1; break; }
} }
if( retval == 0 ) // write End-Of-Archive records if( retval == 0 ) // write End-Of-Archive records
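
The new `parse_cl_arg` factors out the per-argument logic shared by `encode` and `grouper`: it strips trailing slashes from the name and reports one of three outcomes (0 = skip arg, 1 = error, 2 = arg done). A minimal sketch of the slash handling and of how a caller might fold the result is shown below. `strip_trailing_slashes`, `Arg_result`, and `keep_going` are hypothetical names used only for illustration.

```cpp
#include <string>

// Return codes mirroring the comment on parse_cl_arg in the diff.
enum Arg_result { arg_skip = 0, arg_error = 1, arg_done = 2 };

// Remove trailing slashes, but keep a lone "/" intact (len never drops to 0).
std::string strip_trailing_slashes( const std::string & arg )
  {
  std::string::size_type len = arg.size();
  while( len > 1 && arg[len-1] == '/' ) --len;
  return arg.substr( 0, len );
  }

// Hypothetical caller-side helper: record an error in retval and tell the
// loop whether it may continue with the next argument.
bool keep_going( const Arg_result r, int & retval )
  {
  if( r == arg_error ) { retval = 1; return false; }
  return true;				// arg_skip and arg_done both continue
  }
```

The `len > 1` guard is what lets the root directory survive as `/` rather than collapsing to an empty name.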


@ -1,5 +1,5 @@
/* Tarlz - Archiver with multimember lzip compression /* Tarlz - Archiver with multimember lzip compression
Copyright (C) 2013-2024 Antonio Diaz Diaz. Copyright (C) 2013-2025 Antonio Diaz Diaz.
This program is free software: you can redistribute it and/or modify This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by it under the terms of the GNU General Public License as published by
@ -45,3 +45,8 @@ public:
extern Archive_attrs archive_attrs; extern Archive_attrs archive_attrs;
const char * const cant_stat = "Can't stat input file"; const char * const cant_stat = "Can't stat input file";
// defined in create.cc
int parse_cl_arg( const Cl_options & cl_opts, const int i,
int (* add_memberp)( const char * const filename,
const struct stat *, const int flag, struct FTW * ) );


@ -1,5 +1,5 @@
/* Tarlz - Archiver with multimember lzip compression /* Tarlz - Archiver with multimember lzip compression
Copyright (C) 2013-2024 Antonio Diaz Diaz. Copyright (C) 2013-2025 Antonio Diaz Diaz.
This program is free software: you can redistribute it and/or modify This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by it under the terms of the GNU General Public License as published by
@ -250,7 +250,7 @@ public:
}; };
// send one ipacket with tar member metadata to courier // send one ipacket with tar member metadata to courier and print filename
int add_member_lz( const char * const filename, const struct stat *, int add_member_lz( const char * const filename, const struct stat *,
const int flag, struct FTW * ) const int flag, struct FTW * )
{ {
@ -300,26 +300,10 @@ extern "C" void * grouper( void * arg )
for( int i = 0; i < cl_opts.parser.arguments(); ++i ) // parse command line for( int i = 0; i < cl_opts.parser.arguments(); ++i ) // parse command line
{ {
const int code = cl_opts.parser.code( i ); const int ret = parse_cl_arg( cl_opts, i, add_member_lz );
const std::string & arg = cl_opts.parser.argument( i ); if( ret == 0 ) continue; // skip arg
const char * filename = arg.c_str(); if( ret == 1 ) exit_fail_mt(); // error
if( code == 'C' && chdir( filename ) != 0 ) if( cl_opts.solidity == dsolid ) // end of group
{ show_file_error( filename, chdir_msg, errno ); exit_fail_mt(); }
if( code ) continue; // skip options
if( cl_opts.parser.argument( i ).empty() ) continue; // skip empty names
std::string deslashed; // arg without trailing slashes
unsigned len = arg.size();
while( len > 1 && arg[len-1] == '/' ) --len;
if( len < arg.size() )
{ deslashed.assign( arg, 0, len ); filename = deslashed.c_str(); }
if( Exclude::excluded( filename ) ) continue; // skip excluded files
struct stat st;
if( lstat( filename, &st ) != 0 ) // filename from command line
{ show_file_error( filename, cant_stat, errno ); set_error_status( 1 ); }
else if( nftw( filename, add_member_lz, 16,
cl_opts.dereference ? 0 : FTW_PHYS ) != 0 )
exit_fail_mt(); // write error or OOM
else if( cl_opts.solidity == dsolid ) // end of group
courier.receive_packet( new Ipacket ); courier.receive_packet( new Ipacket );
} }


@ -1,5 +1,5 @@
/* Tarlz - Archiver with multimember lzip compression /* Tarlz - Archiver with multimember lzip compression
Copyright (C) 2013-2024 Antonio Diaz Diaz. Copyright (C) 2013-2025 Antonio Diaz Diaz.
This program is free software: you can redistribute it and/or modify This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by it under the terms of the GNU General Public License as published by
@ -48,14 +48,18 @@ namespace {
Resizable_buffer grbuf; Resizable_buffer grbuf;
bool skip_warn( const bool reset = false ) // avoid duplicate warnings bool skip_warn( const bool reset = false, const unsigned chksum = 0 )
{ {
static bool skipping = false; static bool skipping = false; // avoid duplicate warnings
if( reset ) skipping = false; if( reset ) { skipping = false; return false; }
else if( !skipping ) if( skipping ) return false;
{ skipping = true; show_error( "Skipping to next header." ); return true; } skipping = true;
return false; if( chksum != 0 )
{ if( verbosity < 1 ) show_error( "Corrupt header." );
else std::fprintf( stderr, "%s: Corrupt header: ustar chksum = %06o\n",
program_name, chksum ); }
show_error( "Skipping to next header." ); return true;
} }
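
The rewritten `skip_warn` acts as a warn-once latch: the first call after a reset prints the diagnostics and returns true, later calls stay silent, and a call with `reset = true` re-arms the latch. A minimal sketch of that behavior, assuming plain `fprintf` output instead of tarlz's `show_error` and ignoring `verbosity`, could look like this. `warn_once` is a hypothetical name.

```cpp
#include <cstdio>

// Hypothetical warn-once latch modeled on the new skip_warn.
// reset = true re-arms the latch and prints nothing.
// Otherwise only the first call after a reset prints and returns true.
bool warn_once( const bool reset = false, const unsigned chksum = 0 )
  {
  static bool skipping = false;		// true while warnings are suppressed
  if( reset ) { skipping = false; return false; }
  if( skipping ) return false;
  skipping = true;
  if( chksum != 0 )			// a nonzero checksum marks a corrupt header
    std::fprintf( stderr, "Corrupt header: ustar chksum = %06o\n", chksum );
  std::fputs( "Skipping to next header.\n", stderr );
  return true;
  }
```

Returning early on `reset` keeps the reset call from ever printing, which the old version achieved with an `else` branch.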
@ -127,20 +131,20 @@ int extract_member( const Cl_options & cl_opts, Archive_reader & ar,
int outfd = -1; int outfd = -1;
if( !show_member_name( extended, header, 1, grbuf ) ) return 1; if( !show_member_name( extended, header, 1, grbuf ) ) return 1;
// remove file (or empty dir) before extraction to prevent following links
std::remove( filename );
if( !make_dirs( filename ) ) if( !make_dirs( filename ) )
{ {
show_file_error( filename, intdir_msg, errno ); show_file_error( filename, intdir_msg, errno );
set_error_status( 1 ); set_error_status( 1 );
return skip_member( ar, extended, typeflag ); return skip_member( ar, extended, typeflag );
} }
// remove file or empty dir before extraction to prevent following links
std::remove( filename );
switch( typeflag ) switch( typeflag )
{ {
case tf_regular: case tf_regular:
case tf_hiperf: case tf_hiperf:
outfd = open_outstream( filename ); outfd = open_outstream( filename, true, 0, false );
if( outfd < 0 ) if( outfd < 0 )
{ set_error_status( 1 ); return skip_member( ar, extended, typeflag ); } { set_error_status( 1 ); return skip_member( ar, extended, typeflag ); }
break; break;
@ -223,7 +227,7 @@ int extract_member( const Cl_options & cl_opts, Archive_reader & ar,
if( cl_opts.keep_damaged ) if( cl_opts.keep_damaged )
{ writeblock( outfd, buf, std::min( rest, (long long)ar.e_size() ) ); { writeblock( outfd, buf, std::min( rest, (long long)ar.e_size() ) );
close( outfd ); } close( outfd ); }
else { close( outfd ); std::remove( filename ); } else { close( outfd ); unlink( filename ); }
} }
if( ar.fatal() ) return ret; else return 0; if( ar.fatal() ) return ret; else return 0;
} }
@ -386,7 +390,7 @@ bool compare_file_contents( std::string & estr, std::string & ostr,
const int rd = readblock( infd2, buf2, rsize2 ); const int rd = readblock( infd2, buf2, rsize2 );
if( rd != rsize2 ) if( rd != rsize2 )
{ {
if( errno ) format_file_error( estr, filename, "Read error", errno ); if( errno ) format_file_error( estr, filename, rd_err_msg, errno );
else format_file_diff( ostr, filename, "EOF found in file" ); else format_file_diff( ostr, filename, "EOF found in file" );
diff = true; diff = true;
} }
@ -420,8 +424,8 @@ int decode( const Cl_options & cl_opts )
const bool c_after_name = c_present && const bool c_after_name = c_present &&
option_C_after_filename( cl_opts.parser ); option_C_after_filename( cl_opts.parser );
// save current working directory for sequential decoding // save current working directory for sequential decoding
const int chdir_fd = c_after_name ? open( ".", O_RDONLY | O_DIRECTORY ) : -1; const int cwd_fd = c_after_name ? open( ".", O_RDONLY | O_DIRECTORY ) : -1;
if( c_after_name && chdir_fd < 0 ) if( c_after_name && cwd_fd < 0 )
{ show_error( "Can't save current working directory", errno ); return 1; } { show_error( "Can't save current working directory", errno ); return 1; }
if( c_present && !c_after_name ) // execute all -C options if( c_present && !c_after_name ) // execute all -C options
for( int i = 0; i < cl_opts.parser.arguments(); ++i ) for( int i = 0; i < cl_opts.parser.arguments(); ++i )
@ -464,8 +468,7 @@ int decode( const Cl_options & cl_opts )
show_file_error( ad.namep, fv_msg1 ); show_file_error( ad.namep, fv_msg1 );
retval = 2; break; retval = 2; break;
} }
if( skip_warn() && verbosity >= 2 ) skip_warn( false, ustar_chksum( header ) );
std::fprintf( stderr, "ustar chksum = %07o\n", ustar_chksum( header ) );
set_error_status( 2 ); continue; set_error_status( 2 ); continue;
} }
skip_warn( true ); // reset warning skip_warn( true ); // reset warning
@ -480,7 +483,7 @@ int decode( const Cl_options & cl_opts )
if( ret != 0 ) if( ret != 0 )
{ show_file_error( ad.namep, ar.e_msg(), ar.e_code() ); { show_file_error( ad.namep, ar.e_msg(), ar.e_code() );
if( ar.fatal() ) { retval = ret; break; } if( ar.fatal() ) { retval = ret; break; }
skip_warn(); set_error_status( ret ); } set_error_status( ret ); }
continue; continue;
} }
if( typeflag == tf_extended ) if( typeflag == tf_extended )
@ -492,7 +495,7 @@ int decode( const Cl_options & cl_opts )
if( ret != 0 ) if( ret != 0 )
{ show_file_error( ad.namep, ar.e_msg(), ar.e_code() ); { show_file_error( ad.namep, ar.e_msg(), ar.e_code() );
if( ar.fatal() ) { retval = ret; break; } if( ar.fatal() ) { retval = ret; break; }
skip_warn(); extended.reset(); set_error_status( ret ); } extended.reset(); set_error_status( ret ); }
else if( !extended.crc_present() && cl_opts.missing_crc ) else if( !extended.crc_present() && cl_opts.missing_crc )
{ show_file_error( ad.namep, miscrc_msg ); retval = 2; break; } { show_file_error( ad.namep, miscrc_msg ); retval = 2; break; }
prev_extended = true; continue; prev_extended = true; continue;
@ -504,7 +507,7 @@ int decode( const Cl_options & cl_opts )
try { try {
// members without name are skipped except when listing // members without name are skipped except when listing
if( check_skip_filename( cl_opts, name_pending, extended.path().c_str(), if( check_skip_filename( cl_opts, name_pending, extended.path().c_str(),
chdir_fd ) ) retval = skip_member( ar, extended, typeflag ); cwd_fd ) ) retval = skip_member( ar, extended, typeflag );
else else
{ {
print_removed_prefix( extended.removed_prefix ); print_removed_prefix( extended.removed_prefix );


@ -1,5 +1,5 @@
/* Tarlz - Archiver with multimember lzip compression /* Tarlz - Archiver with multimember lzip compression
Copyright (C) 2013-2024 Antonio Diaz Diaz. Copyright (C) 2013-2025 Antonio Diaz Diaz.
This program is free software: you can redistribute it and/or modify This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by it under the terms of the GNU General Public License as published by
@ -16,7 +16,7 @@
*/ */
inline bool data_may_follow( const Typeflag typeflag ) inline bool data_may_follow( const Typeflag typeflag )
{ return typeflag <= 0 || typeflag >= 7; } { return typeflag == tf_regular || typeflag == tf_hiperf; }
inline bool uid_gid_in_range( const long long uid, const long long gid ) inline bool uid_gid_in_range( const long long uid, const long long gid )
{ return uid == (long long)( (uid_t)uid ) && { return uid == (long long)( (uid_t)uid ) &&
@ -27,7 +27,7 @@ const char * const cantln_msg = "Can't %slink '%s' to '%s'";
const char * const mkdir_msg = "Can't create directory"; const char * const mkdir_msg = "Can't create directory";
const char * const mknod_msg = "Can't create device node"; const char * const mknod_msg = "Can't create device node";
const char * const mkfifo_msg = "Can't create FIFO file"; const char * const mkfifo_msg = "Can't create FIFO file";
const char * const uftype_msg = "%s: Unknown file type '%c', skipping."; const char * const uftype_msg = "%s: Unknown file type 0x%02X, skipping.";
const char * const chown_msg = "Can't change file owner"; const char * const chown_msg = "Can't change file owner";
mode_t get_umask(); mode_t get_umask();
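
Two changes in this file lend themselves to a short illustration: `data_may_follow` now names the two typeflags that carry data instead of using a numeric range, and `uftype_msg` prints an unknown typeflag as hex so that unprintable bytes remain readable. The sketch below assumes the ustar typeflag values `'0'`, `'5'`, and `'7'` for the enumerators; `unknown_type_msg` is a hypothetical helper, not a tarlz function.

```cpp
#include <cstdio>
#include <string>

// Typeflag values as defined by the ustar format; names follow the diff.
enum Typeflag { tf_regular = '0', tf_directory = '5', tf_hiperf = '7' };

inline bool data_may_follow( const Typeflag typeflag )
  { return typeflag == tf_regular || typeflag == tf_hiperf; }

// Format an unknown typeflag as two hex digits, as in the new uftype_msg.
std::string unknown_type_msg( const char * const name,
                              const unsigned char typeflag )
  {
  char buf[256];
  std::snprintf( buf, sizeof buf, "%s: Unknown file type 0x%02X, skipping.",
                 name, (unsigned)typeflag );
  return buf;
  }
```

With `'%c'`, a typeflag such as byte 0x01 would have produced an invisible character in the message; the hex form avoids that.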


@ -1,5 +1,5 @@
/* Tarlz - Archiver with multimember lzip compression /* Tarlz - Archiver with multimember lzip compression
Copyright (C) 2013-2024 Antonio Diaz Diaz. Copyright (C) 2013-2025 Antonio Diaz Diaz.
This program is free software: you can redistribute it and/or modify This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by it under the terms of the GNU General Public License as published by
@ -41,7 +41,8 @@
#include "common_mutex.h" #include "common_mutex.h"
#include "decode.h" #include "decode.h"
/* When a problem is detected by any worker: /* Parallel decode does not skip; it exits at the first error.
When a problem is detected by any worker:
- the worker requests mastership and returns. - the worker requests mastership and returns.
- the courier discards new packets received or collected. - the courier discards new packets received or collected.
- the other workers return. - the other workers return.
@ -49,9 +50,9 @@
namespace { namespace {
const char * const other_msg = "Other worker found an error."; const char * const other_msg = "Another worker found an error.";
/* line is preformatted and newline terminated except for prefix, error. /* line is preformatted and newline terminated except for prefix and errors.
ok with an empty line is a no-op. */ ok with an empty line is a no-op. */
struct Packet // member name and metadata or error message struct Packet // member name and metadata or error message
{ {
@ -60,7 +61,7 @@ struct Packet // member name and metadata or error message
long member_id; // lzip member containing the header of this tar member long member_id; // lzip member containing the header of this tar member
std::string line; // member name and metadata ready to print, if any std::string line; // member name and metadata ready to print, if any
Status status; // diagnostics and errors go to stderr Status status; // diagnostics and errors go to stderr
int errcode; // for error int errcode; // for errors
Packet( const long i, const char * const msg, const Status s, const int e ) Packet( const long i, const char * const msg, const Status s, const int e )
: member_id( i ), line( msg ), status( s ), errcode( e ) {} : member_id( i ), line( msg ), status( s ), errcode( e ) {}
}; };
@ -119,11 +120,12 @@ public:
{ xunlock( &omutex ); return master_id == worker_id; } { xunlock( &omutex ); return master_id == worker_id; }
if( error_member_id < 0 || error_member_id > member_id ) if( error_member_id < 0 || error_member_id > member_id )
error_member_id = member_id; error_member_id = member_id;
while( !mastership_granted() && ( worker_id != deliver_id || while( !mastership_granted() &&
!opacket_queues[deliver_id].empty() ) ) ( worker_id != deliver_id || !opacket_queues[deliver_id].empty() ) )
xwait( &check_master, &omutex ); xwait( &check_master, &omutex );
if( !mastership_granted() && worker_id == deliver_id && if( !mastership_granted() &&
opacket_queues[deliver_id].empty() ) // redundant conditions useful for the compiler
worker_id == deliver_id && opacket_queues[deliver_id].empty() )
{ {
master_id = worker_id; // grant mastership master_id = worker_id; // grant mastership
for( int i = 0; i < num_workers; ++i ) // delete all packets for( int i = 0; i < num_workers; ++i ) // delete all packets
@ -356,10 +358,9 @@ Trival extract_member_lz( const Cl_options & cl_opts,
if( !courier.collect_packet( member_id, worker_id, rbuf(), Packet::ok ) ) if( !courier.collect_packet( member_id, worker_id, rbuf(), Packet::ok ) )
return Trival( other_msg, 0, 1 ); return Trival( other_msg, 0, 1 );
} }
/* Remove file before extraction to prevent following links. struct stat st;
Don't remove an empty dir because other thread may need it. */ bool exists = lstat( filename, &st ) == 0;
if( typeflag != tf_directory ) std::remove( filename ); if( !exists && !make_dirs( filename ) )
if( !make_dirs( filename ) )
{ {
if( format_file_error( rbuf, filename, intdir_msg, errno ) && if( format_file_error( rbuf, filename, intdir_msg, errno ) &&
!courier.collect_packet( member_id, worker_id, rbuf(), Packet::diag ) ) !courier.collect_packet( member_id, worker_id, rbuf(), Packet::diag ) )
@ -367,12 +368,16 @@ Trival extract_member_lz( const Cl_options & cl_opts,
set_error_status( 1 ); set_error_status( 1 );
return skip_member_lz( ar, courier, extended, member_id, worker_id, typeflag ); return skip_member_lz( ar, courier, extended, member_id, worker_id, typeflag );
} }
/* Remove file before extraction to prevent following links.
Don't remove an empty dir; another thread may need it. */
if( exists && ( typeflag != tf_directory || !S_ISDIR( st.st_mode ) ) )
{ exists = false; std::remove( filename ); }
switch( typeflag ) switch( typeflag )
{ {
case tf_regular: case tf_regular:
case tf_hiperf: case tf_hiperf:
outfd = open_outstream( filename, true, &rbuf ); outfd = open_outstream( filename, true, &rbuf, false );
if( outfd < 0 ) if( outfd < 0 )
{ {
if( verbosity >= 0 && if( verbosity >= 0 &&
@ -399,11 +404,6 @@ Trival extract_member_lz( const Cl_options & cl_opts,
} }
} break; } break;
case tf_directory: case tf_directory:
{
struct stat st;
bool exists = stat( filename, &st ) == 0;
if( exists && !S_ISDIR( st.st_mode ) )
{ exists = false; std::remove( filename ); }
if( !exists && mkdir( filename, mode ) != 0 && errno != EEXIST ) if( !exists && mkdir( filename, mode ) != 0 && errno != EEXIST )
{ {
if( format_file_error( rbuf, filename, mkdir_msg, errno ) && if( format_file_error( rbuf, filename, mkdir_msg, errno ) &&
@ -411,7 +411,7 @@ Trival extract_member_lz( const Cl_options & cl_opts,
return Trival( other_msg, 0, 1 ); return Trival( other_msg, 0, 1 );
set_error_status( 1 ); set_error_status( 1 );
} }
} break; break;
case tf_chardev: case tf_chardev:
case tf_blockdev: case tf_blockdev:
{ {
@ -483,7 +483,7 @@ Trival extract_member_lz( const Cl_options & cl_opts,
if( cl_opts.keep_damaged ) if( cl_opts.keep_damaged )
{ writeblock( outfd, buf, std::min( rest, (long long)ar.e_size() ) ); { writeblock( outfd, buf, std::min( rest, (long long)ar.e_size() ) );
close( outfd ); } close( outfd ); }
else { close( outfd ); std::remove( filename ); } else { close( outfd ); unlink( filename ); }
} }
return Trival( ar.e_msg(), ar.e_code(), ret ); return Trival( ar.e_msg(), ar.e_code(), ret );
} }
@ -614,13 +614,18 @@ extern "C" void * dworker( void * arg )
} }
if( typeflag == tf_extended ) if( typeflag == tf_extended )
{ {
const char * msg = 0; int ret = 2; std::vector< std::string > msg_vec;
const char * msg = 0; int ret = 2; bool good = false;
if( prev_extended && !cl_opts.permissive ) msg = fv_msg3; if( prev_extended && !cl_opts.permissive ) msg = fv_msg3;
else if( ( ret = ar.parse_records( extended, header, rbuf, extrec_msg, else if( ( ret = ar.parse_records( extended, header, rbuf, extrec_msg,
cl_opts.permissive ) ) != 0 ) msg = ar.e_msg(); cl_opts.permissive, &msg_vec ) ) != 0 ) msg = ar.e_msg();
else if( !extended.crc_present() && cl_opts.missing_crc ) else if( !extended.crc_present() && cl_opts.missing_crc )
{ msg = miscrc_msg; ret = 2; } { msg = miscrc_msg; ret = 2; }
else { prev_extended = true; continue; } else { prev_extended = true; good = true; }
for( unsigned j = 0; j < msg_vec.size(); ++j )
if( !courier.collect_packet( i, worker_id, msg_vec[j].c_str(),
Packet::diag ) ) { good = false; break; }
if( good ) continue;
if( courier.request_mastership( i, worker_id ) ) if( courier.request_mastership( i, worker_id ) )
courier.collect_packet( i, worker_id, msg, ( ret == 1 ) ? courier.collect_packet( i, worker_id, msg, ( ret == 1 ) ?
Packet::error1 : Packet::error2 ); Packet::error1 : Packet::error2 );
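
In this hunk the worker passes `&msg_vec` to `parse_records` and then forwards each collected string through the courier as a `Packet::diag`, so diagnostics from parsing reach stderr in member order rather than being printed directly by the worker thread. A minimal sketch of the underlying "optional message vector" idea is given below. `report_diag` and `check_crc` are hypothetical, and the message text is invented for the example.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical diagnostic sink: with a vector, the message is queued for the
// caller to forward later; without one, it is printed to stderr at once.
void report_diag( const std::string & msg,
                  std::vector<std::string> * const msg_vecp = nullptr )
  {
  if( msg_vecp ) msg_vecp->push_back( msg );
  else std::fprintf( stderr, "%s\n", msg.c_str() );
  }

// Hypothetical check that reports a mismatch through the sink above.
bool check_crc( const unsigned stored, const unsigned computed,
                std::vector<std::string> * const msg_vecp = nullptr )
  {
  if( stored == computed ) return true;
  report_diag( "CRC mismatch (example message)", msg_vecp );
  return false;
  }
```

A single-threaded caller can omit the vector and get immediate output, while a worker thread passes one and decides afterwards how to deliver the queued lines.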
@ -632,13 +637,18 @@ extern "C" void * dworker( void * arg )
/* Skip members with an empty name in the ustar header. If there is an /* Skip members with an empty name in the ustar header. If there is an
extended header in a previous lzip member, its worker will request extended header in a previous lzip member, its worker will request
mastership. Else the ustar-only unnamed member will be ignored. */ mastership and the skip may fail here. Else the ustar-only unnamed
member will be ignored. */
std::string rpmsg; // removed prefix
Trival trival; Trival trival;
if( check_skip_filename( cl_opts, name_pending, extended.path().c_str() ) ) if( check_skip_filename( cl_opts, name_pending, extended.path().c_str(),
-1, &rpmsg ) )
trival = skip_member_lz( ar, courier, extended, i, worker_id, typeflag ); trival = skip_member_lz( ar, courier, extended, i, worker_id, typeflag );
else else
{ {
std::string rpmsg; if( verbosity >= 0 && rpmsg.size() &&
!courier.collect_packet( i, worker_id, rpmsg.c_str(), Packet::prefix ) )
{ trival = Trival( other_msg, 0, 1 ); goto fatal; }
if( print_removed_prefix( extended.removed_prefix, &rpmsg ) && if( print_removed_prefix( extended.removed_prefix, &rpmsg ) &&
!courier.collect_packet( i, worker_id, rpmsg.c_str(), Packet::prefix ) ) !courier.collect_packet( i, worker_id, rpmsg.c_str(), Packet::prefix ) )
{ trival = Trival( other_msg, 0, 1 ); goto fatal; } { trival = Trival( other_msg, 0, 1 ); goto fatal; }
@ -667,14 +677,14 @@ done:
} }
/* Get from courier the processed and sorted packets, and print /* Get from courier the processed and sorted packets.
the member lines on stdout or the diagnostics and errors on stderr. Print the member lines on stdout and the diagnostics and errors on stderr.
*/ */
void muxer( const char * const archive_namep, Packet_courier & courier ) int muxer( const char * const archive_namep, Packet_courier & courier )
{ {
std::vector< const Packet * > opacket_vector; std::vector< const Packet * > opacket_vector;
int retval = 0; int retval = 0;
while( retval == 0 ) while( retval == 0 ) // exit loop at first error packet
{ {
courier.deliver_packets( opacket_vector ); courier.deliver_packets( opacket_vector );
if( opacket_vector.empty() ) break; // queue is empty. all workers exited if( opacket_vector.empty() ) break; // queue is empty. all workers exited
@ -698,7 +708,7 @@ void muxer( const char * const archive_namep, Packet_courier & courier )
} }
if( retval == 0 && !courier.eoa_found() ) // no worker found EOA blocks if( retval == 0 && !courier.eoa_found() ) // no worker found EOA blocks
{ show_file_error( archive_namep, end_msg ); retval = 2; } { show_file_error( archive_namep, end_msg ); retval = 2; }
if( retval ) exit_fail_mt( retval ); return retval;
} }
} // end namespace } // end namespace
@ -715,8 +725,6 @@ int decode_lz( const Cl_options & cl_opts, const Archive_descriptor & ad,
Name_monitor Name_monitor
name_monitor( ( cl_opts.program_mode == m_extract ) ? num_workers : 0 ); name_monitor( ( cl_opts.program_mode == m_extract ) ? num_workers : 0 );
/* If an error happens after any threads have been started, exit must be
called before courier goes out of scope. */
Packet_courier courier( num_workers, out_slots ); Packet_courier courier( num_workers, out_slots );
Worker_arg * worker_args = new( std::nothrow ) Worker_arg[num_workers]; Worker_arg * worker_args = new( std::nothrow ) Worker_arg[num_workers];
@ -737,7 +745,7 @@ int decode_lz( const Cl_options & cl_opts, const Archive_descriptor & ad,
{ show_error( "Can't create worker threads", errcode ); exit_fail_mt(); } { show_error( "Can't create worker threads", errcode ); exit_fail_mt(); }
} }
muxer( ad.namep, courier ); int retval = muxer( ad.namep, courier );
for( int i = num_workers - 1; i >= 0; --i ) for( int i = num_workers - 1; i >= 0; --i )
{ {
@ -748,9 +756,8 @@ int decode_lz( const Cl_options & cl_opts, const Archive_descriptor & ad,
delete[] worker_threads; delete[] worker_threads;
delete[] worker_args; delete[] worker_args;
int retval = 0;
if( close( ad.infd ) != 0 ) if( close( ad.infd ) != 0 )
{ show_file_error( ad.namep, eclosa_msg, errno ); retval = 1; } { show_file_error( ad.namep, eclosa_msg, errno ); set_retval( retval, 1 ); }
if( retval == 0 ) if( retval == 0 )
for( int i = 0; i < cl_opts.parser.arguments(); ++i ) for( int i = 0; i < cl_opts.parser.arguments(); ++i )

@ -1,5 +1,5 @@
/* Tarlz - Archiver with multimember lzip compression /* Tarlz - Archiver with multimember lzip compression
Copyright (C) 2013-2024 Antonio Diaz Diaz. Copyright (C) 2013-2025 Antonio Diaz Diaz.
This program is free software: you can redistribute it and/or modify This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by it under the terms of the GNU General Public License as published by

@ -1,5 +1,5 @@
/* Tarlz - Archiver with multimember lzip compression /* Tarlz - Archiver with multimember lzip compression
Copyright (C) 2013-2024 Antonio Diaz Diaz. Copyright (C) 2013-2025 Antonio Diaz Diaz.
This program is free software: you can redistribute it and/or modify This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by it under the terms of the GNU General Public License as published by

@ -1,5 +1,5 @@
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.49.2. .\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.49.2.
.TH TARLZ "1" "December 2024" "tarlz 0.26" "User Commands" .TH TARLZ "1" "March 2025" "tarlz 0.27" "User Commands"
.SH NAME .SH NAME
tarlz \- creates tar archives with multimember lzip compression tarlz \- creates tar archives with multimember lzip compression
.SH SYNOPSIS .SH SYNOPSIS
@ -19,7 +19,7 @@ compressed archives.
.PP .PP
Keeping the alignment between tar members and lzip members has two Keeping the alignment between tar members and lzip members has two
advantages. It adds an indexed lzip layer on top of the tar archive, making advantages. It adds an indexed lzip layer on top of the tar archive, making
it possible to decode the archive safely in parallel. It also minimizes the it possible to decode the archive safely in parallel. It also reduces the
amount of data lost in case of corruption. amount of data lost in case of corruption.
.PP .PP
The tarlz file format is a safe POSIX\-style backup format. In case of The tarlz file format is a safe POSIX\-style backup format. In case of
@ -160,12 +160,12 @@ Report bugs to lzip\-bug@nongnu.org
.br .br
Tarlz home page: http://www.nongnu.org/lzip/tarlz.html Tarlz home page: http://www.nongnu.org/lzip/tarlz.html
.SH COPYRIGHT .SH COPYRIGHT
Copyright \(co 2024 Antonio Diaz Diaz. Copyright \(co 2025 Antonio Diaz Diaz.
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html> License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html>
.br .br
This is free software: you are free to change and redistribute it. This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. There is NO WARRANTY, to the extent permitted by law.
Using lzlib 1.15\-rc1 Using lzlib 1.15
Using LZ_API_VERSION = 1015 Using LZ_API_VERSION = 1015
.SH "SEE ALSO" .SH "SEE ALSO"
The full documentation for The full documentation for

@ -11,13 +11,14 @@ File: tarlz.info, Node: Top, Next: Introduction, Up: (dir)
Tarlz Manual Tarlz Manual
************ ************
This manual is for Tarlz (version 0.26, 7 December 2024). This manual is for Tarlz (version 0.27, 28 February 2025).
* Menu: * Menu:
* Introduction:: Purpose and features of tarlz * Introduction:: Purpose and features of tarlz
* Invoking tarlz:: Command-line interface * Invoking tarlz:: Command-line interface
* Argument syntax:: By convention, options start with a hyphen * Argument syntax:: By convention, options start with a hyphen
* Creating backups safely:: Checking integrity and accuracy of archives
* Portable character set:: POSIX portable filename character set * Portable character set:: POSIX portable filename character set
* File format:: Detailed format of the compressed archive * File format:: Detailed format of the compressed archive
* Amendments to pax format:: The reasons for the differences with pax * Amendments to pax format:: The reasons for the differences with pax
@ -29,7 +30,7 @@ This manual is for Tarlz (version 0.26, 7 December 2024).
* Concept index:: Index of concepts * Concept index:: Index of concepts
Copyright (C) 2013-2024 Antonio Diaz Diaz. Copyright (C) 2013-2025 Antonio Diaz Diaz.
This manual is free documentation: you have unlimited permission to copy, This manual is free documentation: you have unlimited permission to copy,
distribute, and modify it. distribute, and modify it.
@ -53,7 +54,7 @@ compressed archives.
Keeping the alignment between tar members and lzip members has two Keeping the alignment between tar members and lzip members has two
advantages. It adds an indexed lzip layer on top of the tar archive, making advantages. It adds an indexed lzip layer on top of the tar archive, making
it possible to decode the archive safely in parallel. It also minimizes the it possible to decode the archive safely in parallel. It also reduces the
amount of data lost in case of corruption. Compressing a tar archive with amount of data lost in case of corruption. Compressing a tar archive with
plzip may even double the amount of files lost for each lzip member damaged plzip may even double the amount of files lost for each lzip member damaged
because it does not keep the members aligned. because it does not keep the members aligned.
@ -216,7 +217,7 @@ tarlz supports the following operations:
'-t' '-t'
'--list' '--list'
List the contents of an archive. If FILES are given, list only the List the contents of an archive. If FILES are given, list only the
FILES given. FILES given. *Note mt-listing::.
'-x' '-x'
'--extract' '--extract'
@ -227,20 +228,23 @@ tarlz supports the following operations:
directories unconditionally before extracting over them. Other than directories unconditionally before extracting over them. Other than
that, it does not make any special effort to extract a file over an that, it does not make any special effort to extract a file over an
incompatible type of file. For example, extracting a file over a incompatible type of file. For example, extracting a file over a
non-empty directory usually fails. non-empty directory usually fails. *Note mt-extraction::.
'-z' '-z'
'--compress' '--compress'
Compress existing POSIX tar archives aligning the lzip members to the Compress existing POSIX tar archives aligning the lzip members to the
tar members with choice of granularity ('--bsolid' by default, tar members with choice of granularity ('--bsolid' by default,
'--dsolid' works like '--asolid'). Exit with error status 2 if any '--dsolid' works like '--asolid'). Each input archive is compressed to
input archive is an empty file. The input archives are kept unchanged. a file with the extension '.lz' added unless the option '--output' is
Existing compressed archives are not overwritten. A hyphen '-' used as used. If no archives are specified, or if a hyphen '-' is used as the
the name of an input archive reads from standard input and writes to name of an archive, tarlz reads from standard input and writes to
standard output (unless the option '--output' is used). Tarlz can be standard output (unless the option '--output' is used). When
used as compressor for GNU tar by using a command like '--output' is used, only one input archive can be specified. Exit with
'tar -c -Hustar foo | tarlz -z -o foo.tar.lz'. Tarlz can be used as error status 2 if any input archive is an empty file. The input
compressor for zupdate (zutils) by using a command like archives are kept unchanged. Existing compressed archives are not
overwritten. Tarlz can be used as compressor for GNU tar by using a
command like 'tar -c -Hustar foo | tarlz -z -o foo.tar.lz'. Tarlz can
be used as compressor for zupdate (zutils) by using a command like
'zupdate --lz='tarlz -z' foo.tar.gz'. Note that tarlz only works 'zupdate --lz='tarlz -z' foo.tar.gz'. Note that tarlz only works
reliably on archives without global headers, or with global headers reliably on archives without global headers, or with global headers
whose content can be ignored. whose content can be ignored.
@ -251,11 +255,8 @@ tarlz supports the following operations:
archive. Unless solid compression is requested, the end-of-archive archive. Unless solid compression is requested, the end-of-archive
blocks are compressed in a lzip member separated from the preceding blocks are compressed in a lzip member separated from the preceding
members and from any nonzero garbage following the end-of-archive members and from any nonzero garbage following the end-of-archive
blocks. '--compress' implies plzip argument style, not tar style. Each blocks. '--compress' implies plzip argument style, not tar style. '-f'
input archive is compressed to a file with the extension '.lz' added can't be used with '--compress'.
unless the option '--output' is used. When '--output' is used, only
one input archive can be specified. '-f' can't be used with
'--compress'.
'--check-lib' '--check-lib'
Compare the version of lzlib used to compile tarlz with the version Compare the version of lzlib used to compile tarlz with the version
@ -276,7 +277,9 @@ tarlz supports the following options: *Note Argument syntax::.
Set target size of input data blocks for the option '--bsolid'. *Note Set target size of input data blocks for the option '--bsolid'. *Note
--bsolid::. Valid values range from 8 KiB to 1 GiB. Default value is --bsolid::. Valid values range from 8 KiB to 1 GiB. Default value is
two times the dictionary size, except for option '-0' where it two times the dictionary size, except for option '-0' where it
defaults to 1 MiB. *Note Minimum archive sizes::. defaults to 1 MiB. *Note Minimum archive sizes::. Tarlz does not split
tar members. If a file is larger than BYTES, tarlz will create a lzip
member large enough to contain the file.
'-C DIR' '-C DIR'
'--directory=DIR' '--directory=DIR'
@ -424,12 +427,12 @@ tarlz supports the following options: *Note Argument syntax::.
group ID. group ID.
'--exclude=PATTERN' '--exclude=PATTERN'
Exclude files matching a shell pattern like '*.o'. A file is considered Exclude files matching a shell pattern like '*.o', even if the files
to match if any component of the file name matches. For example, '*.o' are specified in the command line. A file is considered to match if any
matches 'foo.o', 'foo.o/bar' and 'foo/bar.o'. If PATTERN contains a component of the file name matches. For example, '*.o' matches
'/', it matches a corresponding '/' in the file name. For example, 'foo.o', 'foo.o/bar' and 'foo/bar.o'. If PATTERN contains a '/', it
'foo/*.o' matches 'foo/bar.o'. Multiple '--exclude' options can be matches a corresponding '/' in the file name. For example, 'foo/*.o'
specified. matches 'foo/bar.o'. Multiple '--exclude' options can be specified.
'--ignore-ids' '--ignore-ids'
Make '--diff' ignore differences in owner and group IDs. This option is Make '--diff' ignore differences in owner and group IDs. This option is
@ -486,10 +489,10 @@ tarlz supports the following options: *Note Argument syntax::.
During archive creation, warn if any file being archived has a During archive creation, warn if any file being archived has a
modification time newer than the archive creation time. This option modification time newer than the archive creation time. This option
may slow archive creation somewhat because it makes an extra call to may slow archive creation somewhat because it makes an extra call to
'stat' after archiving each file, but it guarantees that file contents 'stat' after archiving each file, but it nearly guarantees that file
were not modified during the creation of the archive. Note that the contents were not modified during the creation of the archive. Note
file must be at least one second newer than the archive for it to be that the file must be at least one second newer than the archive for
detected as newer. it to be detected as newer.
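The component-matching rule described for '--exclude' can be modelled in a few lines. This is only a sketch of the documented behaviour, not tarlz's implementation: the function name `excluded` is hypothetical, and the sketch assumes that a pattern is compared against consecutive file name components, with each '/' in the pattern matching a '/' in the name.

```python
from fnmatch import fnmatchcase

def excluded(file_name: str, pattern: str) -> bool:
    """Return True if any run of name components matches the pattern.

    Hypothetical model of the rule in the manual: the pattern is split
    at '/', and its parts must match consecutive components of the name.
    """
    pat_parts = [p for p in pattern.split("/") if p]
    name_parts = [c for c in file_name.split("/") if c]
    n = len(pat_parts)
    if n == 0:
        return False
    # Slide the pattern over every run of n consecutive components.
    for start in range(len(name_parts) - n + 1):
        window = name_parts[start:start + n]
        if all(fnmatchcase(c, p) for c, p in zip(window, pat_parts)):
            return True
    return False
```

With this model, '*.o' matches 'foo.o', 'foo.o/bar' and 'foo/bar.o', and 'foo/*.o' matches 'foo/bar.o', as in the examples given for the option. Corner cases may differ from the real program.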
Exit status: 0 for a normal exit, 1 for environmental problems (file not Exit status: 0 for a normal exit, 1 for environmental problems (file not
@ -498,7 +501,7 @@ indicate a corrupt or invalid input file, 3 for an internal consistency
error (e.g., bug) which caused tarlz to panic. error (e.g., bug) which caused tarlz to panic.
 
File: tarlz.info, Node: Argument syntax, Next: Portable character set, Prev: Invoking tarlz, Up: Top File: tarlz.info, Node: Argument syntax, Next: Creating backups safely, Prev: Invoking tarlz, Up: Top
3 Syntax of command-line arguments 3 Syntax of command-line arguments
********************************** **********************************
@ -541,9 +544,55 @@ GNU adds "long options" to these conventions:
Thus, '--foo bar' and '--foo=bar' are equivalent. Thus, '--foo bar' and '--foo=bar' are equivalent.
 
File: tarlz.info, Node: Portable character set, Next: File format, Prev: Argument syntax, Up: Top File: tarlz.info, Node: Creating backups safely, Next: Portable character set, Prev: Argument syntax, Up: Top
4 POSIX portable filename character set 4 Checking the integrity and accuracy of tar.lz archives
********************************************************
Uncompressed tar archives do not offer any integrity checking for the files
they store. The pax format even fails to offer integrity checking for some
of the metadata. *Note crc32::. The integrity checking of tar archives is
usually provided by a compression layer or by an external hash.
Lzip compression provides safe integrity checking to tar archives. But it
 does not matter how safe the archiving format is if the archive is created
 corrupt because of a concurrent modification of the files being archived,
 faulty RAM, or a bug in the archiving tool. The only way of guaranteeing
that a backup archive is correct is to check its integrity and accuracy
after creating it.
Testing the integrity of the archive with 'lzip -tv' guarantees that the
compression layer of the archive is valid, but it does not guarantee that
the tar layer is valid nor that the files in the archive match the files in
the file system. For example, if the RAM is faulty and a bit flip happens
in the input buffer before tarlz compresses it, the archive will not match
the files. It is safer to check the archive with 'tarlz -d' just after
creation because it checks the compression layer and the tar layer, and it
compares the files in the archive with the files in the file system:
tarlz -cf archive.tar.lz somedir # create the archive
tarlz -df archive.tar.lz # check the archive
Once the integrity and accuracy of an archive have been verified as in
the example above, they can be verified again anywhere at any time with
'tarlz -t -n0'. It is important to disable multi-threading with '-n0'
because multi-threaded listing does not detect corruption in the tar member
data of multimember archives: *Note mt-listing::.
tarlz -t -n0 -f archive.tar.lz > /dev/null
'lzip -tv' checks the integrity of the compression layer, and therefore
the integrity and accuracy of any archive created and verified as explained
above. This test is reliable for solidly compressed archives, but it does
not detect a truncated multimember archive if the truncation happens just
at a member boundary:
lzip -tv archive.tar.lz

File: tarlz.info, Node: Portable character set, Next: File format, Prev: Creating backups safely, Up: Top
5 POSIX portable filename character set
*************************************** ***************************************
The set of characters from which portable file names are constructed. The set of characters from which portable file names are constructed.
@ -561,7 +610,7 @@ names use only the portable character set without spaces added.
 
File: tarlz.info, Node: File format, Next: Amendments to pax format, Prev: Portable character set, Up: Top File: tarlz.info, Node: File format, Next: Amendments to pax format, Prev: Portable character set, Up: Top
5 File format 6 File format
************* *************
In the diagram below, a box like this: In the diagram below, a box like this:
@ -632,7 +681,7 @@ tar.lz
| member | member | member | | member | member | member |
+===============+=================================================+========+ +===============+=================================================+========+
5.1 Pax header block 6.1 Pax header block
==================== ====================
The pax header block is identical to the ustar header block described below The pax header block is identical to the ustar header block described below
@ -676,7 +725,7 @@ space, equal-sign, and newline.
previously archived. This record overrides the field 'linkname' in the previously archived. This record overrides the field 'linkname' in the
following ustar header block. The following ustar header block following ustar header block. The following ustar header block
determines the type of link created. If typeflag of the following determines the type of link created. If typeflag of the following
header block is 1, a hard link is created. If typeflag is 2, a header block is '1', a hard link is created. If typeflag is '2', a
symbolic link is created and the linkpath value is used as the symbolic link is created and the linkpath value is used as the
contents of the symbolic link. The linkpath record is created only for contents of the symbolic link. The linkpath record is created only for
links with a link name that does not fit in the space provided by the links with a link name that does not fit in the space provided by the
@ -716,17 +765,17 @@ space, equal-sign, and newline.
CRC32-C (Castagnoli) of the extended header data excluding the 8 bytes CRC32-C (Castagnoli) of the extended header data excluding the 8 bytes
representing the CRC <value> itself. The <value> is represented as 8 representing the CRC <value> itself. The <value> is represented as 8
hexadecimal digits in big endian order, '22 GNU.crc32=00000000\n'. The hexadecimal digits in big endian order, '22 GNU.crc32=00000000\n'. The
keyword of the CRC record is protected by the CRC to guarantee that option '--missing-crc' guarantees that corruption is always detected
corruption is always detected when using '--missing-crc' (except in (except in case of CRC collision). A CRC was chosen because a checksum
case of CRC collision). A CRC was chosen because a checksum is too is too weak for a potentially large list of variable sized records. A
weak for a potentially large list of variable sized records. A
checksum can't detect simple errors like the swapping of two bytes. checksum can't detect simple errors like the swapping of two bytes.
*Note --missing-crc::.
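The CRC record described above can be illustrated with a small sketch. It assumes that "excluding the 8 bytes" means the 8 value digits are skipped when the CRC32-C is computed over the rest of the extended header data, and that the digits are written in uppercase hexadecimal; both points are assumptions, not statements about tarlz's code. The helper names `fill_crc_record` and `check_crc_record` are hypothetical.

```python
CRC_KEY = b"GNU.crc32="

def crc32c(data: bytes) -> int:
    """Bitwise CRC32-C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def fill_crc_record(extended: bytes) -> bytes:
    """Replace the 8 placeholder digits of the CRC record with the CRC."""
    pos = extended.index(CRC_KEY) + len(CRC_KEY)
    crc = crc32c(extended[:pos] + extended[pos + 8:])  # skip the 8 digits
    return extended[:pos] + b"%08X" % crc + extended[pos + 8:]

def check_crc_record(extended: bytes) -> bool:
    """Recompute the CRC and compare it with the stored value."""
    pos = extended.find(CRC_KEY)
    if pos < 0:
        return False                       # no CRC record present
    pos += len(CRC_KEY)
    try:
        stored = int(extended[pos:pos + 8], 16)
    except ValueError:
        return False                       # value is not hexadecimal
    return stored == crc32c(extended[:pos] + extended[pos + 8:])
```

Because the keyword and every other record are covered by the CRC, changing any byte outside the 8 digits makes `check_crc_record` fail, except in case of CRC collision.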
At verbosity level 1 or higher tarlz prints a diagnostic for each unknown At verbosity level 1 or higher tarlz prints a diagnostic for each unknown
extended header keyword found in an archive, once per keyword. extended header keyword found in an archive, once per keyword.
5.2 Ustar header block 6.2 Ustar header block
====================== ======================
The ustar header block has a length of 512 bytes and is structured as shown The ustar header block has a length of 512 bytes and is structured as shown
@ -750,6 +799,7 @@ gname 297 32
devmajor 329 8 devmajor 329 8
devminor 337 8 devminor 337 8
prefix 345 155 prefix 345 155
padding 500 12
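As a cross-check of the table, the following sketch lists the full ustar layout (the fields before 'gname' come from the POSIX ustar definition, not from the lines shown here) and verifies that the fields are contiguous and add up to 512 bytes once the 12 bytes of padding are included. The names `USTAR_FIELDS`, `layout_is_contiguous` and `split_header` are hypothetical.

```python
# (field name, offset, size) for a 512-byte ustar header block
USTAR_FIELDS = [
    ("name", 0, 100), ("mode", 100, 8), ("uid", 108, 8), ("gid", 116, 8),
    ("size", 124, 12), ("mtime", 136, 12), ("chksum", 148, 8),
    ("typeflag", 156, 1), ("linkname", 157, 100), ("magic", 257, 6),
    ("version", 263, 2), ("uname", 265, 32), ("gname", 297, 32),
    ("devmajor", 329, 8), ("devminor", 337, 8), ("prefix", 345, 155),
    ("padding", 500, 12),
]

def layout_is_contiguous(fields, block_size=512) -> bool:
    """True if each field starts where the previous one ends and the
    fields fill the block exactly."""
    offset = 0
    for _name, start, size in fields:
        if start != offset:
            return False
        offset += size
    return offset == block_size

def split_header(block: bytes) -> dict:
    """Split a 512-byte header block into its raw fields."""
    if len(block) != 512:
        raise ValueError("a ustar header block has exactly 512 bytes")
    return {name: block[start:start + size]
            for name, start, size in USTAR_FIELDS}
```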
All characters in the header block are coded using the ISO/IEC 646:1991 All characters in the header block are coded using the ISO/IEC 646:1991
(ASCII) standard, except in fields storing names for files, users, and (ASCII) standard, except in fields storing names for files, users, and
@ -839,7 +889,7 @@ file archived:
''7'' ''7''
Reserved to represent a file to which an implementation has associated Reserved to represent a file to which an implementation has associated
some high-performance attribute (contiguous file). Tarlz treats this some high-performance attribute (contiguous file). Tarlz treats this
type of file as a regular file (type 0). type of file as a regular file (type '0').
The field 'magic' contains the ASCII null-terminated string "ustar". The The field 'magic' contains the ASCII null-terminated string "ustar". The
@ -848,13 +898,13 @@ field 'version' contains the characters "00" (0x30,0x30). The fields
characters in the array contain non-null characters including the last characters in the array contain non-null characters including the last
character. Each numeric field contains a leading space- or zero-filled, character. Each numeric field contains a leading space- or zero-filled,
optionally null-terminated octal number using digits from the ISO/IEC optionally null-terminated octal number using digits from the ISO/IEC
646:1991 (ASCII) standard. Tarlz is able to decode numeric fields 1 byte 646:1991 (ASCII) standard. Tarlz is able to decode numeric fields one byte
longer than standard ustar by not requiring a terminating null character. longer than standard ustar by not requiring a terminating null character.
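A minimal sketch of decoding such a numeric field follows. It assumes that leading spaces are skipped and that decoding stops at the first byte that is not an octal digit, so a terminating null is accepted but not required. The function name `parse_octal` is used here for illustration only.

```python
def parse_octal(field: bytes) -> int:
    """Decode a space- or zero-filled octal field.

    The terminating null (or space) is optional, so a field may use
    all of its bytes for digits.
    """
    i = 0
    while i < len(field) and field[i:i + 1].isspace():
        i += 1                                  # skip leading spaces
    result = 0
    while i < len(field) and 0x30 <= field[i] <= 0x37:   # '0'..'7'
        result = result * 8 + (field[i] - 0x30)
        i += 1
    return result
```

An 8-byte 'mode' field can therefore hold either 7 digits plus a null or 8 digits with no terminator; both forms decode to the same value when the extra digit is a leading zero.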
 
File: tarlz.info, Node: Amendments to pax format, Next: Program design, Prev: File format, Up: Top File: tarlz.info, Node: Amendments to pax format, Next: Program design, Prev: File format, Up: Top
6 The reasons for the differences with pax 7 The reasons for the differences with pax
****************************************** ******************************************
Tarlz creates safe archives that allow the reliable detection of invalid or Tarlz creates safe archives that allow the reliable detection of invalid or
@ -865,7 +915,7 @@ achieve this goal and avoid some other flaws in the pax format, tarlz makes
some changes to the variant of the pax format that it uses. This chapter some changes to the variant of the pax format that it uses. This chapter
describes these changes and the concrete reasons to implement them. describes these changes and the concrete reasons to implement them.
6.1 Add a CRC of the extended records 7.1 Add a CRC of the extended records
===================================== =====================================
The POSIX pax format has a serious flaw. The metadata stored in pax extended The POSIX pax format has a serious flaw. The metadata stored in pax extended
@ -892,7 +942,7 @@ place.
Redundancy Check (CRC) in a way compatible with standard tar tools. *Note Redundancy Check (CRC) in a way compatible with standard tar tools. *Note
key_crc32::. key_crc32::.
6.2 Remove flawed backward compatibility 7.2 Remove flawed backward compatibility
======================================== ========================================
In order to allow the extraction of pax archives by a tar utility conforming In order to allow the extraction of pax archives by a tar utility conforming
@ -925,7 +975,7 @@ trying to extract the file or link. This also makes easier during parallel
decoding the detection of a tar member split between two lzip members at decoding the detection of a tar member split between two lzip members at
the boundary between the extended header and the ustar header. the boundary between the extended header and the ustar header.
6.3 As simple as possible (but not simpler) 7.3 As simple as possible (but not simpler)
=========================================== ===========================================
The tarlz format is mainly ustar. Extended pax headers are used only when The tarlz format is mainly ustar. Extended pax headers are used only when
@ -940,7 +990,7 @@ corruption.
ignored. Some operations may not behave as expected if the archive contains ignored. Some operations may not behave as expected if the archive contains
global headers. global headers.
6.4 Improve reproducibility 7.4 Improve reproducibility
=========================== ===========================
Pax includes by default the process ID of the pax process in the ustar name Pax includes by default the process ID of the pax process in the ustar name
@ -952,7 +1002,7 @@ extended records, making it easier to produce reproducible archives.
ten; '99<97_bytes>' or '100<97_bytes>'. Tarlz minimizes the length of the ten; '99<97_bytes>' or '100<97_bytes>'. Tarlz minimizes the length of the
record and always produces a length of x-1 in these cases. record and always produces a length of x-1 in these cases.
6.5 No data in hard links 7.5 No data in hard links
========================= =========================
Tarlz does not allow data in hard link members. The data (if any) must be in Tarlz does not allow data in hard link members. The data (if any) must be in
@ -961,27 +1011,26 @@ the names of a file are stored as hard links, the type of the file is lost.
Not allowing data in hard links also prevents invalid actions like Not allowing data in hard links also prevents invalid actions like
extracting file data for a hard link to a symbolic link or to a directory. extracting file data for a hard link to a symbolic link or to a directory.
6.6 Avoid misconversions to/from UTF-8 7.6 Avoid misconversions to/from UTF-8
====================================== ======================================
There is no portable way to tell what charset a text string is coded into. There is no portable way to tell what charset a text string is coded into.
Therefore, tarlz stores all fields representing text strings unmodified, Therefore, tarlz stores all fields representing text strings unmodified,
without conversion to UTF-8 nor any other transformation. This prevents without conversion to UTF-8 nor any other transformation. This prevents
accidental double UTF-8 conversions. If the need arises this behavior will accidental double UTF-8 conversions.
be adjusted with a command-line option in the future.
 
File: tarlz.info, Node: Program design, Next: Multi-threaded decoding, Prev: Amendments to pax format, Up: Top File: tarlz.info, Node: Program design, Next: Multi-threaded decoding, Prev: Amendments to pax format, Up: Top
7 Internal structure of tarlz 8 Internal structure of tarlz
***************************** *****************************
The parts of tarlz related to sequential processing of the archive are more The parts of tarlz related to sequential processing of the archive are more
or less similar to any other tar and won't be described here. The or less similar to any other tar and won't be described here. The
interesting parts described here are those related to Multi-threaded interesting parts described here are those related to multi-threaded
processing. processing.
The structure of the part of tarlz performing Multi-threaded archive The structure of the part of tarlz performing multi-threaded archive
creation is somewhat similar to that of plzip with the added complication creation is somewhat similar to that of plzip with the added complication
of the solidity levels. *Note Program design: (plzip)Program design. A of the solidity levels. *Note Program design: (plzip)Program design. A
grouper thread and several worker threads are created, acting the main grouper thread and several worker threads are created, acting the main
@ -1053,7 +1102,7 @@ error be avoided.
 
File: tarlz.info, Node: Multi-threaded decoding, Next: Minimum archive sizes, Prev: Program design, Up: Top File: tarlz.info, Node: Multi-threaded decoding, Next: Minimum archive sizes, Prev: Program design, Up: Top
8 Limitations of parallel tar decoding 9 Limitations of parallel tar decoding
************************************** **************************************
Safely decoding a tar archive in parallel is only possible if one decodes Safely decoding a tar archive in parallel is only possible if one decodes
@ -1093,11 +1142,14 @@ tar.lz archives, keeping backwards compatibility. If tarlz finds a member
misalignment during multi-threaded decoding, it switches to single-threaded misalignment during multi-threaded decoding, it switches to single-threaded
mode and continues decoding the archive. mode and continues decoding the archive.
If the files in the archive are large, multi-threaded '--list' on a 9.1 Multi-threaded listing
regular (seekable) tar.lz archive can be hundreds of times faster than ==========================
sequential '--list' because, in addition to using several processors, it
only needs to decompress part of each lzip member. See the following If the files in the archive are large, multi-threaded '--list' on a regular
example listing the Silesia corpus on a dual core machine: (seekable) tar.lz archive can be hundreds of times faster than sequential
'--list' because, in addition to using several processors, it only needs to
decompress part of each lzip member. See the following example listing the
Silesia corpus on a dual core machine:
tarlz -9 --no-solid -cf silesia.tar.lz silesia tarlz -9 --no-solid -cf silesia.tar.lz silesia
time lzip -cd silesia.tar.lz | tar -tf - (5.032s) time lzip -cd silesia.tar.lz | tar -tf - (5.032s)
@ -1106,10 +1158,12 @@ example listing the Silesia corpus on a dual core machine:
On the other hand, multi-threaded '--list' won't detect corruption in On the other hand, multi-threaded '--list' won't detect corruption in
the tar member data because it only decodes the part of each lzip member the tar member data because it only decodes the part of each lzip member
corresponding to the tar member header. This is another reason why the tar corresponding to the tar member header. Partial decoding of a lzip member
headers must provide their own integrity checking. can't guarantee the integrity of the data decoded. This is another reason
why the tar headers (including the extended records) must provide their own
integrity checking.
8.1 Limitations of multi-threaded extraction 9.2 Limitations of multi-threaded extraction
============================================ ============================================
Multi-threaded extraction may produce different output than single-threaded Multi-threaded extraction may produce different output than single-threaded
@ -1139,8 +1193,8 @@ links to.
 
File: tarlz.info, Node: Minimum archive sizes, Next: Examples, Prev: Multi-threaded decoding, Up: Top File: tarlz.info, Node: Minimum archive sizes, Next: Examples, Prev: Multi-threaded decoding, Up: Top
9 Minimum archive sizes required for multi-threaded block compression 10 Minimum archive sizes required for multi-threaded block compression
********************************************************************* **********************************************************************
When creating or appending to a compressed archive using multi-threaded When creating or appending to a compressed archive using multi-threaded
block compression, tarlz puts tar members together in blocks and compresses block compression, tarlz puts tar members together in blocks and compresses
@ -1177,7 +1231,7 @@ Level
 
File: tarlz.info, Node: Examples, Next: Problems, Prev: Minimum archive sizes, Up: Top File: tarlz.info, Node: Examples, Next: Problems, Prev: Minimum archive sizes, Up: Top
10 A small tutorial with examples 11 A small tutorial with examples
********************************* *********************************
Example 1: Create a multimember compressed archive 'archive.tar.lz' Example 1: Create a multimember compressed archive 'archive.tar.lz'
@ -1233,10 +1287,12 @@ other members can still be extracted).
tarlz -z --no-solid archive.tar tarlz -z --no-solid archive.tar
Example 10: Compress the archive 'archive.tar' and write the output to Example 10: Recompress the archive 'archive.tar.lz' with different
'foo.tar.lz'. solidity, write the output to 'archive-ns.tar.lz', and compare both
archives.
tarlz -z -o foo.tar.lz archive.tar lzip -cd archive.tar.lz | tarlz -9z --no-solid -o archive-ns.tar.lz
zcmp archive.tar.lz archive-ns.tar.lz
Example 11: Concatenate and compress two archives 'archive1.tar' and Example 11: Concatenate and compress two archives 'archive1.tar' and
'archive2.tar', and write the output to 'foo.tar.lz'. 'archive2.tar', and write the output to 'foo.tar.lz'.
@ -1246,7 +1302,7 @@ Example 11: Concatenate and compress two archives 'archive1.tar' and
 
File: tarlz.info, Node: Problems, Next: Concept index, Prev: Examples, Up: Top File: tarlz.info, Node: Problems, Next: Concept index, Prev: Examples, Up: Top
11 Reporting bugs 12 Reporting bugs
***************** *****************
There are probably bugs in tarlz. There are certainly errors and omissions There are probably bugs in tarlz. There are certainly errors and omissions
@ -1270,6 +1326,7 @@ Concept index
* Amendments to pax format: Amendments to pax format. (line 6) * Amendments to pax format: Amendments to pax format. (line 6)
* argument syntax: Argument syntax. (line 6) * argument syntax: Argument syntax. (line 6)
* bugs: Problems. (line 6) * bugs: Problems. (line 6)
* creating backups: Creating backups safely. (line 6)
* examples: Examples. (line 6) * examples: Examples. (line 6)
* file format: File format. (line 6) * file format: File format. (line 6)
* getting help: Problems. (line 6) * getting help: Problems. (line 6)
@ -1287,26 +1344,29 @@ Concept index
 
Tag Table: Tag Table:
Node: Top216 Node: Top216
Node: Introduction1281 Node: Introduction1356
Node: Invoking tarlz4106 Node: Invoking tarlz4179
Ref: --data-size13109 Ref: --data-size13265
Ref: --bsolid17626 Ref: --bsolid17924
Node: Argument syntax23539 Ref: --missing-crc21532
Node: Portable character set25314 Node: Argument syntax23897
Node: File format25958 Node: Creating backups safely25673
Ref: key_crc3233001 Node: Portable character set28057
Ref: ustar-uid-gid36305 Node: File format28709
Ref: ustar-mtime37112 Ref: key_crc3235756
Node: Amendments to pax format39115 Ref: ustar-uid-gid39052
Ref: crc3239823 Ref: ustar-mtime39859
Ref: flawed-compat41134 Node: Amendments to pax format41866
Node: Program design45211 Ref: crc3242574
Node: Multi-threaded decoding49138 Ref: flawed-compat43885
Ref: mt-extraction52407 Node: Program design47870
Node: Minimum archive sizes53713 Node: Multi-threaded decoding51797
Node: Examples55840 Ref: mt-listing54198
Node: Problems58199 Ref: mt-extraction55236
Node: Concept index58754 Node: Minimum archive sizes56542
Node: Examples58671
Node: Problems61166
Node: Concept index61721
 
End Tag Table End Tag Table


@ -6,8 +6,8 @@
@finalout @finalout
@c %**end of header @c %**end of header
@set UPDATED 7 December 2024 @set UPDATED 28 February 2025
@set VERSION 0.26 @set VERSION 0.27
@dircategory Archiving @dircategory Archiving
@direntry @direntry
@ -39,6 +39,7 @@ This manual is for Tarlz (version @value{VERSION}, @value{UPDATED}).
* Introduction:: Purpose and features of tarlz * Introduction:: Purpose and features of tarlz
* Invoking tarlz:: Command-line interface * Invoking tarlz:: Command-line interface
* Argument syntax:: By convention, options start with a hyphen * Argument syntax:: By convention, options start with a hyphen
* Creating backups safely:: Checking integrity and accuracy of archives
* Portable character set:: POSIX portable filename character set * Portable character set:: POSIX portable filename character set
* File format:: Detailed format of the compressed archive * File format:: Detailed format of the compressed archive
* Amendments to pax format:: The reasons for the differences with pax * Amendments to pax format:: The reasons for the differences with pax
@ -51,7 +52,7 @@ This manual is for Tarlz (version @value{VERSION}, @value{UPDATED}).
@end menu @end menu
@sp 1 @sp 1
Copyright @copyright{} 2013-2024 Antonio Diaz Diaz. Copyright @copyright{} 2013-2025 Antonio Diaz Diaz.
This manual is free documentation: you have unlimited permission to copy, This manual is free documentation: you have unlimited permission to copy,
distribute, and modify it. distribute, and modify it.
@ -76,7 +77,7 @@ compressed archives.
Keeping the alignment between tar members and lzip members has two Keeping the alignment between tar members and lzip members has two
advantages. It adds an indexed lzip layer on top of the tar archive, making advantages. It adds an indexed lzip layer on top of the tar archive, making
it possible to decode the archive safely in parallel. It also minimizes the it possible to decode the archive safely in parallel. It also reduces the
amount of data lost in case of corruption. Compressing a tar archive with amount of data lost in case of corruption. Compressing a tar archive with
plzip may even double the amount of files lost for each lzip member damaged plzip may even double the amount of files lost for each lzip member damaged
because it does not keep the members aligned. because it does not keep the members aligned.
@ -254,7 +255,7 @@ during multi-threaded extraction. @xref{mt-extraction}.
@item -t @item -t
@itemx --list @itemx --list
List the contents of an archive. If @var{files} are given, list only the List the contents of an archive. If @var{files} are given, list only the
@var{files} given. @var{files} given. @xref{mt-listing}.
@item -x @item -x
@itemx --extract @itemx --extract
@ -265,20 +266,23 @@ directory without extracting the files under it, use
empty directories unconditionally before extracting over them. Other than empty directories unconditionally before extracting over them. Other than
that, it does not make any special effort to extract a file over an that, it does not make any special effort to extract a file over an
incompatible type of file. For example, extracting a file over a non-empty incompatible type of file. For example, extracting a file over a non-empty
directory usually fails. directory usually fails. @xref{mt-extraction}.
@item -z @item -z
@itemx --compress @itemx --compress
Compress existing POSIX tar archives aligning the lzip members to the tar Compress existing POSIX tar archives aligning the lzip members to the tar
members with choice of granularity (@option{--bsolid} by default, members with choice of granularity (@option{--bsolid} by default,
@option{--dsolid} works like @option{--asolid}). Exit with error status 2 if @option{--dsolid} works like @option{--asolid}). Each input archive is
any input archive is an empty file. The input archives are kept unchanged. compressed to a file with the extension @file{.lz} added unless the option
Existing compressed archives are not overwritten. A hyphen @samp{-} used as @option{--output} is used. If no archives are specified, or if a hyphen
the name of an input archive reads from standard input and writes to @samp{-} is used as the name of an archive, tarlz reads from standard input
standard output (unless the option @option{--output} is used). Tarlz can be and writes to standard output (unless the option @option{--output} is used).
used as compressor for GNU tar by using a command like When @option{--output} is used, only one input archive can be specified.
@w{@samp{tar -c -Hustar foo | tarlz -z -o foo.tar.lz}}. Tarlz can be used as Exit with error status 2 if any input archive is an empty file. The input
compressor for zupdate (zutils) by using a command like archives are kept unchanged. Existing compressed archives are not
overwritten. Tarlz can be used as compressor for GNU tar by using a command
like @w{@samp{tar -c -Hustar foo | tarlz -z -o foo.tar.lz}}. Tarlz can be
used as compressor for zupdate (zutils) by using a command like
@w{@samp{zupdate --lz='tarlz -z' foo.tar.gz}}. Note that tarlz only works @w{@samp{zupdate --lz='tarlz -z' foo.tar.gz}}. Note that tarlz only works
reliably on archives without global headers, or with global headers whose reliably on archives without global headers, or with global headers whose
content can be ignored. content can be ignored.
@ -289,10 +293,8 @@ block is found, and then compresses the rest of the archive. Unless solid
compression is requested, the end-of-archive blocks are compressed in a lzip compression is requested, the end-of-archive blocks are compressed in a lzip
member separated from the preceding members and from any nonzero garbage member separated from the preceding members and from any nonzero garbage
following the end-of-archive blocks. @option{--compress} implies plzip following the end-of-archive blocks. @option{--compress} implies plzip
argument style, not tar style. Each input archive is compressed to a file argument style, not tar style. @option{-f} can't be used with
with the extension @file{.lz} added unless the option @option{--output} is @option{--compress}.
used. When @option{--output} is used, only one input archive can be specified.
@option{-f} can't be used with @option{--compress}.
@item --check-lib @item --check-lib
Compare the Compare the
@ -319,8 +321,10 @@ tarlz supports the following options: @xref{Argument syntax}.
@itemx --data-size=@var{bytes} @itemx --data-size=@var{bytes}
Set target size of input data blocks for the option @option{--bsolid}. Set target size of input data blocks for the option @option{--bsolid}.
@xref{--bsolid}. Valid values range from @w{8 KiB} to @w{1 GiB}. Default @xref{--bsolid}. Valid values range from @w{8 KiB} to @w{1 GiB}. Default
value is two times the dictionary size, except for option @option{-0} where it value is two times the dictionary size, except for option @option{-0} where
defaults to @w{1 MiB}. @xref{Minimum archive sizes}. it defaults to @w{1 MiB}. @xref{Minimum archive sizes}. Tarlz does not split
tar members. If a file is larger than @var{bytes}, tarlz will create a lzip
member large enough to contain the file.
@item -C @var{dir} @item -C @var{dir}
@itemx --directory=@var{dir} @itemx --directory=@var{dir}
@ -465,12 +469,13 @@ If @var{group} is not a valid group name, it is decoded as a decimal numeric
group ID. group ID.
@item --exclude=@var{pattern} @item --exclude=@var{pattern}
Exclude files matching a shell pattern like @file{*.o}. A file is considered Exclude files matching a shell pattern like @file{*.o}, even if the files
to match if any component of the file name matches. For example, @file{*.o} are specified in the command line. A file is considered to match if any
matches @file{foo.o}, @file{foo.o/bar} and @file{foo/bar.o}. If component of the file name matches. For example, @file{*.o} matches
@var{pattern} contains a @samp{/}, it matches a corresponding @samp{/} in @file{foo.o}, @file{foo.o/bar} and @file{foo/bar.o}. If @var{pattern}
the file name. For example, @file{foo/*.o} matches @file{foo/bar.o}. contains a @samp{/}, it matches a corresponding @samp{/} in the file name.
Multiple @option{--exclude} options can be specified. For example, @file{foo/*.o} matches @file{foo/bar.o}. Multiple
@option{--exclude} options can be specified.
@item --ignore-ids @item --ignore-ids
Make @option{--diff} ignore differences in owner and group IDs. This option is Make @option{--diff} ignore differences in owner and group IDs. This option is
@ -493,6 +498,7 @@ recover as much data as possible from each damaged member. It is recommended
to run tarlz in single-threaded mode (@option{--threads=0}) when using this to run tarlz in single-threaded mode (@option{--threads=0}) when using this
option. option.
@anchor{--missing-crc}
@item --missing-crc @item --missing-crc
Exit with error status 2 if the CRC of the extended records is missing. When Exit with error status 2 if the CRC of the extended records is missing. When
this option is used, tarlz detects any corruption in the extended records this option is used, tarlz detects any corruption in the extended records
@ -525,9 +531,9 @@ values range from 1 to 1024. The default value is 64.
During archive creation, warn if any file being archived has a modification During archive creation, warn if any file being archived has a modification
time newer than the archive creation time. This option may slow archive time newer than the archive creation time. This option may slow archive
creation somewhat because it makes an extra call to @samp{stat} after creation somewhat because it makes an extra call to @samp{stat} after
archiving each file, but it guarantees that file contents were not modified archiving each file, but it nearly guarantees that file contents were not
during the creation of the archive. Note that the file must be at least one modified during the creation of the archive. Note that the file must be at
second newer than the archive for it to be detected as newer. least one second newer than the archive for it to be detected as newer.
@ignore @ignore
@item --permissive @item --permissive
@ -591,6 +597,58 @@ Thus, @w{@option{--foo bar}} and @option{--foo=bar} are equivalent.
@end itemize @end itemize
@node Creating backups safely
@chapter Checking the integrity and accuracy of tar.lz archives
@cindex creating backups
Uncompressed tar archives do not offer any integrity checking for the files
they store. The pax format even fails to offer integrity checking for some
of the metadata. @xref{crc32}. The integrity checking of tar archives is
usually provided by a compression layer or by an external hash.
Lzip compression provides safe integrity checking to tar archives. But it
does not matter how safe is the archiving format if the archive is created
corrupt because of a concurrent modification of the files being archived, a
faulty RAM, or a bug in the archiving tool. The only way of guaranteeing
that a backup archive is correct is to check its integrity and accuracy
after creating it.
Testing the integrity of the archive with @w{@samp{lzip -tv}} guarantees
that the compression layer of the archive is valid, but it does not
guarantee that the tar layer is valid nor that the files in the archive
match the files in the file system. For example, if the RAM is faulty and a
bit flip happens in the input buffer before tarlz compresses it, the archive
will not match the files. It is safer to check the archive with
@w{@samp{tarlz -d}} just after creation because it checks the compression
layer and the tar layer, and it compares the files in the archive with the
files in the file system:
@example
tarlz -cf archive.tar.lz somedir # create the archive
tarlz -df archive.tar.lz # check the archive
@end example
Once the integrity and accuracy of an archive have been verified as in the
example above, they can be verified again anywhere at any time with
@w{@samp{tarlz -t -n0}}. It is important to disable multi-threading with
@option{-n0} because multi-threaded listing does not detect corruption in
the tar member data of multimember archives: @xref{mt-listing}.
@example
tarlz -t -n0 -f archive.tar.lz > /dev/null
@end example
@w{@samp{lzip -tv}} checks the integrity of the compression layer, and
therefore the integrity and accuracy of any archive created and verified as
explained above. This test is reliable for solidly compressed archives, but
it does not detect a truncated multimember archive if the truncation happens
just at a member boundary:
@example
lzip -tv archive.tar.lz
@end example
@node Portable character set @node Portable character set
@chapter POSIX portable filename character set @chapter POSIX portable filename character set
@cindex portable character set @cindex portable character set
@ -641,7 +699,7 @@ are not allowed in multimember files.
Each lzip member contains one or more tar members in a simplified POSIX pax Each lzip member contains one or more tar members in a simplified POSIX pax
interchange format. The only pax typeflag value supported by tarlz (in interchange format. The only pax typeflag value supported by tarlz (in
addition to the typeflag values defined by the ustar format) is @samp{x}. addition to the typeflag values defined by the ustar format) is 'x'.
The pax format is an extension on top of the ustar format that removes the The pax format is an extension on top of the ustar format that removes the
size limitations of the ustar format. size limitations of the ustar format.
@ -654,7 +712,7 @@ An optional extended header block followed by one or more blocks that
contain the extended header records as if they were the contents of a file; contain the extended header records as if they were the contents of a file;
i.e., the extended header records are included as the data for this header i.e., the extended header records are included as the data for this header
block. This header block is of the form described in pax header block, with block. This header block is of the form described in pax header block, with
a typeflag value of @samp{x}. a typeflag value of 'x'.
@item @item
A header block in ustar format that describes the file. Any fields defined A header block in ustar format that describes the file. Any fields defined
@ -713,7 +771,7 @@ An extended header just before the end-of-archive blocks.
@section Pax header block @section Pax header block
The pax header block is identical to the ustar header block described below The pax header block is identical to the ustar header block described below
except that the typeflag has the value @samp{x} (extended). The field except that the typeflag has the value 'x' (extended). The field
@samp{size} is the size of the extended header data in bytes. Most other @samp{size} is the size of the extended header data in bytes. Most other
fields in the pax header block are zeroed on archive creation to prevent fields in the pax header block are zeroed on archive creation to prevent
trouble if the archive is read by a ustar tool, and are ignored by tarlz on trouble if the archive is read by a ustar tool, and are ignored by tarlz on
@ -752,8 +810,8 @@ greater than 2_097_151 @w{(octal 7_777_777)}. @xref{ustar-uid-gid}.
The file name of a link being created to another file, of any type, The file name of a link being created to another file, of any type,
previously archived. This record overrides the field @samp{linkname} in the previously archived. This record overrides the field @samp{linkname} in the
following ustar header block. The following ustar header block determines following ustar header block. The following ustar header block determines
the type of link created. If typeflag of the following header block is 1, a the type of link created. If typeflag of the following header block is '1', a
hard link is created. If typeflag is 2, a symbolic link is created and the hard link is created. If typeflag is '2', a symbolic link is created and the
linkpath value is used as the contents of the symbolic link. The linkpath linkpath value is used as the contents of the symbolic link. The linkpath
record is created only for links with a link name that does not fit in the record is created only for links with a link name that does not fit in the
space provided by the ustar header. space provided by the ustar header.
@ -789,13 +847,12 @@ greater than 2_097_151 @w{(octal 7_777_777)}. @xref{ustar-uid-gid}.
@item GNU.crc32 @item GNU.crc32
CRC32-C (Castagnoli) of the extended header data excluding the 8 bytes CRC32-C (Castagnoli) of the extended header data excluding the 8 bytes
representing the CRC <value> itself. The <value> is represented as 8 representing the CRC <value> itself. The <value> is represented as 8
hexadecimal digits in big endian order, hexadecimal digits in big endian order, @w{@samp{22 GNU.crc32=00000000\n}}.
@w{@samp{22 GNU.crc32=00000000\n}}. The keyword of the CRC record is The option @option{--missing-crc} guarantees that corruption is always
protected by the CRC to guarantee that corruption is always detected when detected (except in case of CRC collision). A CRC was chosen because a
using @option{--missing-crc} (except in case of CRC collision). A CRC was checksum is too weak for a potentially large list of variable sized records.
chosen because a checksum is too weak for a potentially large list of A checksum can't detect simple errors like the swapping of two bytes.
variable sized records. A checksum can't detect simple errors like the @xref{--missing-crc}.
swapping of two bytes.
@end table @end table
@ -825,6 +882,7 @@ shown in the following table. All lengths and offsets are in decimal:
@item devmajor @tab 329 @tab 8 @item devmajor @tab 329 @tab 8
@item devminor @tab 337 @tab 8 @item devminor @tab 337 @tab 8
@item prefix @tab 345 @tab 155 @item prefix @tab 345 @tab 155
@item padding @tab 500 @tab 12
@end multitable @end multitable
All characters in the header block are coded using the ISO/IEC 646:1991 All characters in the header block are coded using the ISO/IEC 646:1991
@ -919,7 +977,7 @@ FIFO special file.
@item '7' @item '7'
Reserved to represent a file to which an implementation has associated some Reserved to represent a file to which an implementation has associated some
high-performance attribute (contiguous file). Tarlz treats this type of file high-performance attribute (contiguous file). Tarlz treats this type of file
as a regular file (type 0). as a regular file (type '0').
@end table @end table
@ -930,8 +988,8 @@ except when all characters in the array contain non-null characters
including the last character. Each numeric field contains a leading space- including the last character. Each numeric field contains a leading space-
or zero-filled, optionally null-terminated octal number using digits from or zero-filled, optionally null-terminated octal number using digits from
the ISO/IEC 646:1991 (ASCII) standard. Tarlz is able to decode numeric the ISO/IEC 646:1991 (ASCII) standard. Tarlz is able to decode numeric
fields 1 byte longer than standard ustar by not requiring a terminating null fields one byte longer than standard ustar by not requiring a terminating
character. null character.
@node Amendments to pax format @node Amendments to pax format
@ -1044,8 +1102,7 @@ extracting file data for a hard link to a symbolic link or to a directory.
There is no portable way to tell what charset a text string is coded into. There is no portable way to tell what charset a text string is coded into.
Therefore, tarlz stores all fields representing text strings unmodified, Therefore, tarlz stores all fields representing text strings unmodified,
without conversion to UTF-8 nor any other transformation. This prevents without conversion to UTF-8 nor any other transformation. This prevents
accidental double UTF-8 conversions. If the need arises this behavior will accidental double UTF-8 conversions.
be adjusted with a command-line option in the future.
@node Program design @node Program design
@ -1054,12 +1111,12 @@ be adjusted with a command-line option in the future.
The parts of tarlz related to sequential processing of the archive are more The parts of tarlz related to sequential processing of the archive are more
or less similar to any other tar and won't be described here. The interesting or less similar to any other tar and won't be described here. The interesting
parts described here are those related to Multi-threaded processing. parts described here are those related to multi-threaded processing.
The structure of the part of tarlz performing Multi-threaded archive The structure of the part of tarlz performing multi-threaded archive
creation is somewhat similar to that of creation is somewhat similar to that of
@uref{http://www.nongnu.org/lzip/plzip.html#Program-design,,plzip} with the @uref{http://www.nongnu.org/lzip/manual/plzip_manual.html#Program-design,,plzip}
added complication of the solidity levels. with the added complication of the solidity levels.
@ifnothtml @ifnothtml
@xref{Program design,,,plzip}. @xref{Program design,,,plzip}.
@end ifnothtml @end ifnothtml
@ -1174,6 +1231,9 @@ tar.lz archives, keeping backwards compatibility. If tarlz finds a member
misalignment during multi-threaded decoding, it switches to single-threaded misalignment during multi-threaded decoding, it switches to single-threaded
mode and continues decoding the archive. mode and continues decoding the archive.
@anchor{mt-listing}
@section Multi-threaded listing
If the files in the archive are large, multi-threaded @option{--list} on a If the files in the archive are large, multi-threaded @option{--list} on a
regular (seekable) tar.lz archive can be hundreds of times faster than regular (seekable) tar.lz archive can be hundreds of times faster than
sequential @option{--list} because, in addition to using several processors, sequential @option{--list} because, in addition to using several processors,
@ -1189,8 +1249,10 @@ time tarlz -tf silesia.tar.lz (0.020s)
On the other hand, multi-threaded @option{--list} won't detect corruption in On the other hand, multi-threaded @option{--list} won't detect corruption in
the tar member data because it only decodes the part of each lzip member the tar member data because it only decodes the part of each lzip member
corresponding to the tar member header. This is another reason why the tar corresponding to the tar member header. Partial decoding of a lzip member
headers must provide their own integrity checking. can't guarantee the integrity of the data decoded. This is another reason
why the tar headers (including the extended records) must provide their own
integrity checking.
@anchor{mt-extraction} @anchor{mt-extraction}
@section Limitations of multi-threaded extraction @section Limitations of multi-threaded extraction
@ -1344,11 +1406,13 @@ tarlz -z --no-solid archive.tar
@end example @end example
@noindent @noindent
Example 10: Compress the archive @file{archive.tar} and write the output to Example 10: Recompress the archive @file{archive.tar.lz} with different
@file{foo.tar.lz}. solidity, write the output to @file{archive-ns.tar.lz}, and compare both
archives.
@example @example
tarlz -z -o foo.tar.lz archive.tar lzip -cd archive.tar.lz | tarlz -9z --no-solid -o archive-ns.tar.lz
zcmp archive.tar.lz archive-ns.tar.lz
@end example @end example
@noindent @noindent


@ -1,5 +1,5 @@
/* Tarlz - Archiver with multimember lzip compression /* Tarlz - Archiver with multimember lzip compression
Copyright (C) 2013-2024 Antonio Diaz Diaz. Copyright (C) 2013-2025 Antonio Diaz Diaz.
This program is free software: you can redistribute it and/or modify This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by it under the terms of the GNU General Public License as published by


@ -1,5 +1,5 @@
/* Tarlz - Archiver with multimember lzip compression /* Tarlz - Archiver with multimember lzip compression
Copyright (C) 2013-2024 Antonio Diaz Diaz. Copyright (C) 2013-2025 Antonio Diaz Diaz.
This program is free software: you can redistribute it and/or modify This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by it under the terms of the GNU General Public License as published by
@ -22,6 +22,7 @@
#include <cstdio> #include <cstdio>
#include "tarlz.h" #include "tarlz.h"
#include "common_mutex.h"
const CRC32 crc32c( true ); const CRC32 crc32c( true );
@ -58,9 +59,9 @@ long long parse_decimal( const char * const ptr, const char ** const tailp,
} }
uint32_t parse_record_crc( const char * const ptr ) unsigned parse_record_crc( const char * const ptr )
{ {
uint32_t crc = 0; unsigned crc = 0;
for( int i = 0; i < 8; ++i ) for( int i = 0; i < 8; ++i )
{ {
crc <<= 4; crc <<= 4;
@ -201,16 +202,25 @@ void Extended::calculate_sizes() const
// print a diagnostic for each unknown keyword once per keyword // print a diagnostic for each unknown keyword once per keyword
void Extended::unknown_keyword( const char * const buf, const int size ) const void Extended::unknown_keyword( const char * const buf, const int size,
std::vector< std::string > * const msg_vecp ) const
{ {
// prevent two threads from modifying the list of keywords at the same time
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
int eq_pos = 0; // position of '=' in buf int eq_pos = 0; // position of '=' in buf
while( eq_pos < size && buf[eq_pos] != '=' ) ++eq_pos; while( eq_pos < size && buf[eq_pos] != '=' ) ++eq_pos;
const std::string keyword( buf, eq_pos ); const std::string keyword( buf, eq_pos );
xlock( &mutex );
for( unsigned i = 0; i < unknown_keywords.size(); ++i ) for( unsigned i = 0; i < unknown_keywords.size(); ++i )
if( keyword == unknown_keywords[i] ) return; if( keyword == unknown_keywords[i] ) { xunlock( &mutex ); return; }
unknown_keywords.push_back( keyword ); unknown_keywords.push_back( keyword );
print_error( 0, "Ignoring unknown extended header keyword '%s'", xunlock( &mutex );
keyword.c_str() ); const char * str = "Ignoring unknown extended header keyword '%s'";
if( !msg_vecp ) print_error( 0, str, keyword.c_str() );
else
{ msg_vecp->push_back( std::string() );
format_error( msg_vecp->back(), 0, str, keyword.c_str() ); }
} }
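
The hunk above makes the once-per-keyword diagnostic safe to call from several threads and, when a message vector is supplied, queues the text instead of printing it. A minimal sketch of that pattern follows. It is not the tarlz code: `first_sighting` and `report_unknown` are hypothetical names, `std::mutex` stands in for the `xlock`/`xunlock` pthread wrappers, and plain string concatenation stands in for `format_error`.

```cpp
#include <cstdio>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical helper, not part of tarlz.
// Returns true the first time a keyword is seen, false on later calls.
bool first_sighting( const std::string & keyword )
  {
  static std::mutex mutex;			// guards the list below
  static std::vector< std::string > seen;
  std::lock_guard< std::mutex > lock( mutex );	// released on every return path
  for( unsigned i = 0; i < seen.size(); ++i )
    if( seen[i] == keyword ) return false;
  seen.push_back( keyword );
  return true;
  }

// Hypothetical helper, not part of tarlz.
// Reports an unknown keyword at most once. If msg_vecp is null the message
// goes to stderr; otherwise it is queued so that the caller can print the
// messages of each worker in order.
void report_unknown( const std::string & keyword,
                     std::vector< std::string > * const msg_vecp )
  {
  if( !first_sighting( keyword ) ) return;
  const std::string msg =
    "Ignoring unknown extended header keyword '" + keyword + "'";
  if( !msg_vecp ) std::fprintf( stderr, "%s\n", msg.c_str() );
  else msg_vecp->push_back( msg );
  }
```

Using a scoped lock avoids the explicit unlock before the early return that the pthread version needs. The message is formatted outside the critical section, so the lock is held only while the shared list is searched and updated.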
@ -281,7 +291,8 @@ const char * Extended::full_size_error() const
bool Extended::parse( const char * const buf, const int edsize, bool Extended::parse( const char * const buf, const int edsize,
const bool permissive ) const bool permissive,
std::vector< std::string > * const msg_vecp )
{ {
reset(); full_size_ = -4; // invalidate cached sizes reset(); full_size_ = -4; // invalidate cached sizes
for( int pos = 0; pos < edsize; ) // parse records for( int pos = 0; pos < edsize; ) // parse records
@ -348,18 +359,22 @@ bool Extended::parse( const char * const buf, const int edsize,
if( crc_present_ && !permissive ) return false; if( crc_present_ && !permissive ) return false;
if( rsize != (int)crc_record.size() ) return false; if( rsize != (int)crc_record.size() ) return false;
crc_present_ = true; crc_present_ = true;
const uint32_t stored_crc = parse_record_crc( tail + 10 ); const unsigned stored_crc = parse_record_crc( tail + 10 );
const uint32_t computed_crc = const unsigned computed_crc =
crc32c.windowed_crc( (const uint8_t *)buf, pos + rsize - 9, edsize ); crc32c.windowed_crc( (const uint8_t *)buf, pos + rsize - 9, edsize );
if( stored_crc != computed_crc ) if( stored_crc != computed_crc )
{ {
if( verbosity >= 2 ) if( verbosity < 1 ) return false;
std::fprintf( stderr, "CRC32-C = %08X\n", (unsigned)computed_crc ); const char * str = "CRC mismatch in extended records; stored %08X, computed %08X";
if( !msg_vecp ) print_error( 0, str, stored_crc, computed_crc );
else
{ msg_vecp->push_back( std::string() );
format_error( msg_vecp->back(), 0, str, stored_crc, computed_crc ); }
return false; return false;
} }
} }
else if( ( rest < 8 || std::memcmp( tail, "comment=", 8 ) != 0 ) && else if( ( rest < 8 || std::memcmp( tail, "comment=", 8 ) != 0 ) &&
verbosity >= 1 ) unknown_keyword( tail, rest ); verbosity >= 1 ) unknown_keyword( tail, rest, msg_vecp );
pos += rsize; pos += rsize;
} }
return true; return true;
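
The hunk above compares a CRC stored as 8 hexadecimal digits with a CRC32-C computed over the extended records. The sketch below illustrates the two building blocks and the manual's remark that a plain checksum cannot detect the swapping of two bytes. All function names are hypothetical. The bitwise CRC is a slow, table-free variant chosen for brevity, and the sketch does not reproduce `windowed_crc`, which skips the 8 bytes holding the CRC value itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Bitwise CRC32-C (Castagnoli), reflected polynomial 0x82F63B78.
std::uint32_t crc32c_bitwise( const std::string & data )
  {
  std::uint32_t crc = 0xFFFFFFFFU;
  for( std::size_t i = 0; i < data.size(); ++i )
    {
    crc ^= (std::uint8_t)data[i];
    for( int k = 0; k < 8; ++k )
      crc = ( crc & 1 ) ? ( crc >> 1 ) ^ 0x82F63B78U : crc >> 1;
    }
  return crc ^ 0xFFFFFFFFU;
  }

// Plain additive checksum, shown only for comparison with the CRC.
std::uint32_t byte_sum( const std::string & data )
  {
  std::uint32_t sum = 0;
  for( std::size_t i = 0; i < data.size(); ++i )
    sum += (std::uint8_t)data[i];
  return sum;
  }

// Decode 8 hexadecimal digits, most significant digit first.
// Returns false (leaving 'value' untouched) if any character is not a
// hexadecimal digit, including a premature terminating null.
bool parse_hex8( const char * const ptr, std::uint32_t & value )
  {
  assert( ptr != 0 );
  std::uint32_t v = 0;
  for( int i = 0; i < 8; ++i )
    {
    const char c = ptr[i];
    int digit;
    if( c >= '0' && c <= '9' ) digit = c - '0';
    else if( c >= 'A' && c <= 'F' ) digit = c - 'A' + 10;
    else if( c >= 'a' && c <= 'f' ) digit = c - 'a' + 10;
    else return false;
    v = ( v << 4 ) + digit;
    }
  value = v;
  return true;
  }
```

For two records that differ only in the order of two adjacent bytes, such as `"path=ab\n"` and `"path=ba\n"`, `byte_sum` returns the same value for both, while `crc32c_bitwise` returns different values because a 32-bit CRC detects every error burst of 32 bits or fewer.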

@@ -1,5 +1,5 @@
  /* Tarlz - Archiver with multimember lzip compression
- Copyright (C) 2013-2024 Antonio Diaz Diaz.
+ Copyright (C) 2013-2025 Antonio Diaz Diaz.
  This program is free software: you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by

@@ -1,5 +1,5 @@
  /* Tarlz - Archiver with multimember lzip compression
- Copyright (C) 2013-2024 Antonio Diaz Diaz.
+ Copyright (C) 2013-2025 Antonio Diaz Diaz.
  This program is free software: you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by

42
main.cc
@@ -1,5 +1,5 @@
  /* Tarlz - Archiver with multimember lzip compression
- Copyright (C) 2013-2024 Antonio Diaz Diaz.
+ Copyright (C) 2013-2025 Antonio Diaz Diaz.
  This program is free software: you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
@@ -57,7 +57,7 @@ const char * const program_name = "tarlz";
  namespace {
- const char * const program_year = "2024";
+ const char * const program_year = "2025";
  const char * invocation_name = program_name; // default value
@@ -74,7 +74,7 @@ void show_help( const long num_online )
  "compressed archives.\n"
  "\nKeeping the alignment between tar members and lzip members has two\n"
  "advantages. It adds an indexed lzip layer on top of the tar archive, making\n"
- "it possible to decode the archive safely in parallel. It also minimizes the\n"
+ "it possible to decode the archive safely in parallel. It also reduces the\n"
  "amount of data lost in case of corruption.\n"
  "\nThe tarlz file format is a safe POSIX-style backup format. In case of\n"
  "corruption, tarlz can extract all the undamaged members from the tar.lz\n"
@@ -457,17 +457,43 @@ bool format_error( Resizable_buffer & rbuf, const int errcode,
  for( int i = 0; i < 2; ++i ) // resize rbuf if not large enough
  {
  int len = snprintf( rbuf(), rbuf.size(), "%s: ", program_name );
- if( len >= (int)rbuf.size() && !rbuf.resize( len + 1 ) ) break;
+ if( !rbuf.resize( len + 1 ) ) break;
  va_start( args, format );
  len += vsnprintf( rbuf() + len, rbuf.size() - len, format, args );
  va_end( args );
- if( len >= (int)rbuf.size() && !rbuf.resize( len + 1 ) ) break;
- if( errcode <= 0 ) rbuf()[len++] = '\n';
+ if( !rbuf.resize( len + 2 ) ) break;
+ if( errcode <= 0 ) { rbuf()[len++] = '\n'; rbuf()[len] = 0; }
  else len += snprintf( rbuf() + len, rbuf.size() - len, ": %s\n",
  std::strerror( errcode ) );
- if( len < (int)rbuf.size() || !rbuf.resize( len + 1 ) ) break;
+ if( len < (int)rbuf.size() ) return true;
+ if( i > 0 || !rbuf.resize( len + 1 ) ) break;
  }
- return true;
+ return false;
+ }
+ bool format_error( std::string & msg, const int errcode,
+ const char * const format, ... )
+ {
+ if( verbosity < 0 ) return false;
+ Resizable_buffer rbuf;
+ if( !rbuf.size() ) return false;
+ va_list args;
+ for( int i = 0; i < 2; ++i ) // resize rbuf if not large enough
+ {
+ int len = snprintf( rbuf(), rbuf.size(), "%s: ", program_name );
+ if( !rbuf.resize( len + 1 ) ) break;
+ va_start( args, format );
+ len += vsnprintf( rbuf() + len, rbuf.size() - len, format, args );
+ va_end( args );
+ if( !rbuf.resize( len + 2 ) ) break;
+ if( errcode <= 0 ) { rbuf()[len++] = '\n'; rbuf()[len] = 0; }
+ else len += snprintf( rbuf() + len, rbuf.size() - len, ": %s\n",
+ std::strerror( errcode ) );
+ if( len < (int)rbuf.size() ) { msg.assign( rbuf(), len ); return true; }
+ if( i > 0 || !rbuf.resize( len + 1 ) ) break;
+ }
+ return false;
  }
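
The new format_error overload above builds a printf-style message in a growable buffer and copies it into a std::string, retrying once if the buffer turns out to be too small. The same result can be reached by measuring first and allocating afterwards, as the sketch below shows. Note that this is a different technique from the resize-and-retry loop in the diff, and that the name 'format_message' and the fixed "prog: " prefix are assumptions made for illustration.

```cpp
#include <cstdarg>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for format_error( std::string &, ... ).
// Produces "prog: <formatted text>[: <strerror text>]\n" in msg.
bool format_message( std::string & msg, const int errcode,
                     const char * const format, ... )
  {
  va_list args;
  va_start( args, format );
  // first pass: a null buffer of size 0 only measures the output length
  const int n = std::vsnprintf( nullptr, 0, format, args );
  va_end( args );
  if( n < 0 ) return false;			// encoding error

  std::vector< char > buf( n + 1 );		// room for the terminating 0
  va_start( args, format );
  std::vsnprintf( buf.data(), buf.size(), format, args );	// second pass
  va_end( args );

  msg = "prog: ";
  msg.append( buf.data(), n );
  if( errcode > 0 ) { msg += ": "; msg += std::strerror( errcode ); }
  msg += '\n';
  return true;
  }
```

Measuring first costs a second formatting pass on every call, whereas the loop in the diff only formats twice when the initial buffer is too small.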

20
tarlz.h
@@ -1,5 +1,5 @@
  /* Tarlz - Archiver with multimember lzip compression
- Copyright (C) 2013-2024 Antonio Diaz Diaz.
+ Copyright (C) 2013-2025 Antonio Diaz Diaz.
  This program is free software: you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
@@ -120,7 +120,7 @@ public:
  }
  return true;
  }
- char * operator()() { return p; }
+ char * operator()() { return p; } // no need for operator[]
  const char * operator()() const { return p; }
  uint8_t * u8() { return (uint8_t *)p; }
  const uint8_t * u8() const { return (const uint8_t *)p; }
@@ -184,7 +184,8 @@ class Extended // stores metadata from/for extended records
  mutable bool crc_present_;
  void calculate_sizes() const;
- void unknown_keyword( const char * const buf, const int size ) const;
+ void unknown_keyword( const char * const buf, const int size,
+ std::vector< std::string > * const msg_vecp = 0 ) const;
  public:
  static const std::string crc_record;
@@ -234,7 +235,8 @@ public:
  bool crc_present() const { return crc_present_; }
  bool parse( const char * const buf, const int edsize,
- const bool permissive );
+ const bool permissive,
+ std::vector< std::string > * const msg_vecp = 0 );
  void fill_from_ustar( const Tar_header header );
  };
@@ -279,7 +281,7 @@ public:
  return crc ^ 0xFFFFFFFFU;
  }
- // Calculates the crc of size bytes except a window of 8 bytes at pos
+ // compute the crc of size bytes except a window of 8 bytes at pos
  uint32_t windowed_crc( const uint8_t * const buffer, const int pos,
  const int size ) const
  {
@@ -485,8 +487,9 @@ const char * const posix_lz_msg = "This does not look like a POSIX tar.lz archiv
  const char * const eclosa_msg = "Error closing archive";
  const char * const eclosf_msg = "Error closing file";
  const char * const nfound_msg = "Not found in archive.";
- const char * const seek_msg = "Seek error";
+ const char * const rd_err_msg = "Read error";
  const char * const wr_err_msg = "Write error";
+ const char * const seek_msg = "Seek error";
  const char * const chdir_msg = "Error changing working directory";
  const char * const intdir_msg = "Failed to create intermediate directory";
@@ -503,7 +506,8 @@ bool show_member_name( const Extended & extended, const Tar_header header,
  const int vlevel, Resizable_buffer & rbuf );
  bool check_skip_filename( const Cl_options & cl_opts,
  std::vector< char > & name_pending,
- const char * const filename, const int chdir_fd = -1 );
+ const char * const filename, const int cwd_fd = -1,
+ std::string * const msgp = 0 );
  bool make_dirs( const std::string & name );
  // defined in common_mutex.cc
@@ -597,6 +601,8 @@ void show_error( const char * const msg, const int errcode = 0,
  const bool help = false );
  bool format_error( Resizable_buffer & rbuf, const int errcode,
  const char * const format, ... );
+ bool format_error( std::string & msg, const int errcode,
+ const char * const format, ... );
  void print_error( const int errcode, const char * const format, ... );
  void format_file_error( std::string & estr, const char * const filename,
  const char * const msg, const int errcode = 0 );
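
The tarlz.h hunk touching windowed_crc describes it only by its comment: the CRC of 'size' bytes except a window of 8 bytes at 'pos'. The body is not part of this diff, so the following is a rough sketch of one possible reading, in which the window bytes are simply skipped. Whether the real function skips them or feeds placeholder bytes instead is an assumption here, and the names 'crc32c_update', 'crc32c', and 'windowed_crc_sketch' are hypothetical. The CRC itself is standard bitwise CRC32-C (reflected polynomial 0x82F63B78).

```cpp
#include <stdint.h>

// Bitwise CRC32-C (Castagnoli) update; slow but table-free.
uint32_t crc32c_update( uint32_t crc, const uint8_t * const buffer,
                        const int size )
  {
  for( int i = 0; i < size; ++i )
    {
    crc ^= buffer[i];
    for( int k = 0; k < 8; ++k )
      crc = ( crc & 1 ) ? ( crc >> 1 ) ^ 0x82F63B78U : crc >> 1;
    }
  return crc;
  }

// CRC32-C of a whole buffer.
uint32_t crc32c( const uint8_t * const buffer, const int size )
  { return crc32c_update( 0xFFFFFFFFU, buffer, size ) ^ 0xFFFFFFFFU; }

// ASSUMPTION: the 8 bytes at [pos, pos + 8) are skipped, so the result
// does not depend on their contents.
uint32_t windowed_crc_sketch( const uint8_t * const buffer, const int pos,
                              const int size )
  {
  uint32_t crc = crc32c_update( 0xFFFFFFFFU, buffer, pos );
  if( pos + 8 < size )
    crc = crc32c_update( crc, buffer + pos + 8, size - ( pos + 8 ) );
  return crc ^ 0xFFFFFFFFU;
  }
```

Excluding the window lets a record carry its own CRC: the stored digits can be rewritten without changing the value they are checked against.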

File diff suppressed because it is too large.

Binary file not shown.

BIN
testsuite/t155_fv7.tar.lz Normal file

Binary file not shown.

BIN
testsuite/test3_crc.tar.lz Normal file

Binary file not shown.

BIN
testsuite/test3_gh7.tar.lz Normal file

Binary file not shown.

BIN
testsuite/test3_gh8.tar.lz Normal file

Binary file not shown.

BIN
testsuite/test3_uk.tar.lz Normal file

Binary file not shown.
