Adding upstream version 0.16.
Signed-off-by: Daniel Baumann <daniel@debian.org>
parent 5d67ab9e97
commit bb26c2917c
20 changed files with 854 additions and 662 deletions
@@ -1,3 +1,12 @@
+2019-10-08 Antonio Diaz Diaz <antonio@gnu.org>
+
+ * Version 0.16 released.
+ * extract.cc (extract_member): Fixed call order of chown, chmod.
+ * delete_lz.cc (delete_members_lz): Return 2 if collective member.
+ * main.cc: Set a valid invocation_name even if argc == 0.
+ * #include <sys/sysmacros.h> unconditionally.
+ * tarlz.texi: Added new chapter 'Portable character set'.
+
 2019-04-11 Antonio Diaz Diaz <antonio@gnu.org>
 
 * Version 0.15 released.
2
INSTALL
@@ -1,7 +1,7 @@
 Requirements
 ------------
 You will need a C++ compiler and the lzlib compression library installed.
-I use gcc 5.3.0 and 4.1.2, but the code should compile with any standards
+I use gcc 6.1.0 and 4.1.2, but the code should compile with any standards
 compliant compiler.
 Lzlib must be version 1.0 or newer, but --keep-damaged requires lzlib 1.11
 or newer to recover as much data as possible from each damaged member.
@@ -8,8 +8,9 @@ LIBS = -llz -lpthread
 SHELL = /bin/sh
 CAN_RUN_INSTALLINFO = $(SHELL) -c "install-info --version" > /dev/null 2>&1
 
-objs = arg_parser.o lzip_index.o create.o create_lz.o delete.o delete_lz.o \
-exclude.o extended.o extract.o list_lz.o main.o
+objs = arg_parser.o lzip_index.o common.o common_decode.o create.o \
+create_lz.o delete.o delete_lz.o exclude.o extended.o extract.o \
+list_lz.o main.o
 
 
 .PHONY : all install install-bin install-info install-man \
@@ -31,6 +32,8 @@ main.o : main.cc
 
 $(objs) : Makefile
 arg_parser.o : arg_parser.h
+common.o : tarlz.h
+common_decode.o : arg_parser.h tarlz.h
 create.o : arg_parser.h tarlz.h
 create_lz.o : arg_parser.h tarlz.h
 delete.o : arg_parser.h lzip_index.h tarlz.h
17
NEWS
@@ -1,10 +1,11 @@
-Changes in version 0.15:
+Changes in version 0.16:
 
-The new option '--delete', which deletes files and directories from an
-archive in place, has been added. It currently can delete only from
-uncompressed archives and from archives with individually compressed files
-('--no-solid' archives).
-
-Multi-threaded listing of compressed archives with format violations (for
-example, an extended header without the corresponding ustar header) has been
-fixed.
+'chown' and 'chmod' are now called in the right order on extraction to
+preserve the S_ISUID and S_ISGID bits of executable files.
+
+The return value of '--delete' when failing to delete a tar member not
+individually compressed has been fixed. It returned 0, but should be 2.
+
+The header <sys/sysmacros.h> is now #included unconditionally.
+
+The new chapter 'Portable character set' has been added to the manual.
|
31
README
@@ -2,13 +2,19 @@ Description
 
 Tarlz is a massively parallel (multi-threaded) combined implementation of
 the tar archiver and the lzip compressor. Tarlz creates, lists and extracts
-archives in a simplified posix pax format compressed with lzip, keeping the
-alignment between tar members and lzip members. This method adds an indexed
-lzip layer on top of the tar archive, making it possible to decode the
-archive safely in parallel. The resulting multimember tar.lz archive is
-fully backward compatible with standard tar tools like GNU tar, which treat
-it like any other tar.lz archive. Tarlz can append files to the end of such
-compressed archives.
+archives in a simplified and safer variant of the POSIX pax format
+compressed with lzip, keeping the alignment between tar members and lzip
+members. The resulting multimember tar.lz archive is fully backward
+compatible with standard tar tools like GNU tar, which treat it like any
+other tar.lz archive. Tarlz can append files to the end of such compressed
+archives.
+
+Keeping the alignment between tar members and lzip members has two
+advantages. It adds an indexed lzip layer on top of the tar archive, making
+it possible to decode the archive safely in parallel. It also minimizes the
+amount of data lost in case of corruption. Compressing a tar archive with
+plzip may even double the amount of files lost for each lzip member damaged
+because it does not keep the members aligned.
 
 Tarlz can create tar archives with five levels of compression granularity;
 per file (--no-solid), per block (--bsolid, default), per directory
@@ -25,7 +31,7 @@ archive, but it has the following advantages:
 member), and unwanted members can be deleted from the archive. Just
 like an uncompressed tar archive.
 
-* It is a safe posix-style backup format. In case of corruption,
+* It is a safe POSIX-style backup format. In case of corruption,
 tarlz can extract all the undamaged members from the tar.lz
 archive, skipping over the damaged members, just like the standard
 (uncompressed) tar. Moreover, the option '--keep-damaged' can be
@@ -36,7 +42,7 @@ archive, but it has the following advantages:
 corresponding solidly compressed tar.gz archive, except when
 individually compressing files smaller than about 32 KiB.
 
-Note that the posix pax format has a serious flaw. The metadata stored in
+Note that the POSIX pax format has a serious flaw. The metadata stored in
 pax extended records are not protected by any kind of check sequence.
 Corruption in a long file name may cause the extraction of the file in the
 wrong place without warning. Corruption in a large file size may cause the
@@ -50,10 +56,17 @@ potentially much worse that undetected corruption in the data. Even more so
 in the case of pax because the amount of metadata it stores is potentially
 large, making undetected corruption more probable.
 
+Headers and metadata must be protected separately from data because the
+integrity checking of lzip may not be able to detect the corruption before
+the metadata has been used, for example, to create a new file in the wrong
+place.
+
 Because of the above, tarlz protects the extended records with a CRC in a
 way compatible with standard tar tools.
 
 Tarlz does not understand other tar formats like gnu, oldgnu, star or v7.
+'tarlz -tf archive.tar.lz > /dev/null' can be used to verify that the format
+of the archive is compatible with tarlz.
 
 The diagram below shows the correspondence between each tar member (formed
 by one or two headers plus optional data) in the tar archive and each lzip
149
common.cc
Normal file
@@ -0,0 +1,149 @@
+/*  Tarlz - Archiver with multimember lzip compression
+    Copyright (C) 2013-2019 Antonio Diaz Diaz.
+
+    This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation, either version 2 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#define _FILE_OFFSET_BITS 64
+
+#include <cctype>
+#include <cerrno>
+#include <climits>
+#include <cstdlib>
+#include <cstring>
+#include <string>
+#include <vector>
+#include <pthread.h>
+#include <stdint.h>
+#include <unistd.h>
+
+#include "tarlz.h"
+
+
+void xinit_mutex( pthread_mutex_t * const mutex )
+  {
+  const int errcode = pthread_mutex_init( mutex, 0 );
+  if( errcode )
+    { show_error( "pthread_mutex_init", errcode ); cleanup_and_fail(); }
+  }
+
+void xinit_cond( pthread_cond_t * const cond )
+  {
+  const int errcode = pthread_cond_init( cond, 0 );
+  if( errcode )
+    { show_error( "pthread_cond_init", errcode ); cleanup_and_fail(); }
+  }
+
+
+void xdestroy_mutex( pthread_mutex_t * const mutex )
+  {
+  const int errcode = pthread_mutex_destroy( mutex );
+  if( errcode )
+    { show_error( "pthread_mutex_destroy", errcode ); cleanup_and_fail(); }
+  }
+
+void xdestroy_cond( pthread_cond_t * const cond )
+  {
+  const int errcode = pthread_cond_destroy( cond );
+  if( errcode )
+    { show_error( "pthread_cond_destroy", errcode ); cleanup_and_fail(); }
+  }
+
+
+void xlock( pthread_mutex_t * const mutex )
+  {
+  const int errcode = pthread_mutex_lock( mutex );
+  if( errcode )
+    { show_error( "pthread_mutex_lock", errcode ); cleanup_and_fail(); }
+  }
+
+
+void xunlock( pthread_mutex_t * const mutex )
+  {
+  const int errcode = pthread_mutex_unlock( mutex );
+  if( errcode )
+    { show_error( "pthread_mutex_unlock", errcode ); cleanup_and_fail(); }
+  }
+
+
+void xwait( pthread_cond_t * const cond, pthread_mutex_t * const mutex )
+  {
+  const int errcode = pthread_cond_wait( cond, mutex );
+  if( errcode )
+    { show_error( "pthread_cond_wait", errcode ); cleanup_and_fail(); }
+  }
+
+
+void xsignal( pthread_cond_t * const cond )
+  {
+  const int errcode = pthread_cond_signal( cond );
+  if( errcode )
+    { show_error( "pthread_cond_signal", errcode ); cleanup_and_fail(); }
+  }
+
+
+void xbroadcast( pthread_cond_t * const cond )
+  {
+  const int errcode = pthread_cond_broadcast( cond );
+  if( errcode )
+    { show_error( "pthread_cond_broadcast", errcode ); cleanup_and_fail(); }
+  }
+
+
+unsigned long long parse_octal( const uint8_t * const ptr, const int size )
+  {
+  unsigned long long result = 0;
+  int i = 0;
+  while( i < size && std::isspace( ptr[i] ) ) ++i;
+  for( ; i < size && ptr[i] >= '0' && ptr[i] <= '7'; ++i )
+    { result <<= 3; result += ptr[i] - '0'; }
+  return result;
+  }
+
+
+/* Returns the number of bytes really read.
+   If (returned value < size) and (errno == 0), means EOF was reached.
+*/
+int readblock( const int fd, uint8_t * const buf, const int size )
+  {
+  int sz = 0;
+  errno = 0;
+  while( sz < size )
+    {
+    const int n = read( fd, buf + sz, size - sz );
+    if( n > 0 ) sz += n;
+    else if( n == 0 ) break; // EOF
+    else if( errno != EINTR ) break;
+    errno = 0;
+    }
+  return sz;
+  }
+
+
+/* Returns the number of bytes really written.
+   If (returned value < size), it is always an error.
+*/
+int writeblock( const int fd, const uint8_t * const buf, const int size )
+  {
+  int sz = 0;
+  errno = 0;
+  while( sz < size )
+    {
+    const int n = write( fd, buf + sz, size - sz );
+    if( n > 0 ) sz += n;
+    else if( n < 0 && errno != EINTR ) break;
+    errno = 0;
+    }
+  return sz;
+  }
|
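Review note on parse_octal from common.cc above: tar headers store numeric fields as ASCII octal digits, optionally preceded by blanks and terminated by a space or NUL. The sketch below repeats the function as it appears in the diff and adds a sample field to show what it returns. The array sample_mode_field and its contents are made up for illustration and are not taken from tarlz.

```cpp
#include <cctype>
#include <stdint.h>

// Same logic as parse_octal in common.cc: skip leading whitespace, then
// accumulate octal digits until a non-octal byte or the end of the field.
unsigned long long parse_octal( const uint8_t * const ptr, const int size )
  {
  unsigned long long result = 0;
  int i = 0;
  while( i < size && std::isspace( ptr[i] ) ) ++i;
  for( ; i < size && ptr[i] >= '0' && ptr[i] <= '7'; ++i )
    { result <<= 3; result += ptr[i] - '0'; }
  return result;
  }

// Hypothetical 8-byte mode field holding "0000644" followed by a NUL.
const uint8_t sample_mode_field[8] = { '0', '0', '0', '0', '6', '4', '4', 0 };
```

With this input, parse_octal( sample_mode_field, 8 ) yields 0644 (420 decimal). Parsing stops at the first byte that is not an octal digit, so a field with no digits yields 0 rather than an error.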
200
common_decode.cc
Normal file
@@ -0,0 +1,200 @@
+/*  Tarlz - Archiver with multimember lzip compression
+    Copyright (C) 2013-2019 Antonio Diaz Diaz.
+
+    This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation, either version 2 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#define _FILE_OFFSET_BITS 64
+
+#include <climits>
+#include <cstdio>
+#include <cstdlib>
+#include <cstring>
+#include <ctime>
+#include <string>
+#include <vector>
+#include <pthread.h>
+#include <stdint.h>
+#include <sys/stat.h>
+
+#include "arg_parser.h"
+#include "tarlz.h"
+
+
+namespace {
+
+enum { mode_string_size = 10,
+       group_string_size = 1 + uname_l + 1 + gname_l + 1 }; // 67
+
+void format_mode_string( const Tar_header header, char buf[mode_string_size] )
+  {
+  const Typeflag typeflag = (Typeflag)header[typeflag_o];
+
+  std::memcpy( buf, "----------", mode_string_size );
+  switch( typeflag )
+    {
+    case tf_regular: break;
+    case tf_link: buf[0] = 'h'; break;
+    case tf_symlink: buf[0] = 'l'; break;
+    case tf_chardev: buf[0] = 'c'; break;
+    case tf_blockdev: buf[0] = 'b'; break;
+    case tf_directory: buf[0] = 'd'; break;
+    case tf_fifo: buf[0] = 'p'; break;
+    case tf_hiperf: buf[0] = 'C'; break;
+    default: buf[0] = '?';
+    }
+  const mode_t mode = parse_octal( header + mode_o, mode_l ); // 12 bits
+  const bool setuid = mode & S_ISUID;
+  const bool setgid = mode & S_ISGID;
+  const bool sticky = mode & S_ISVTX;
+  if( mode & S_IRUSR ) buf[1] = 'r';
+  if( mode & S_IWUSR ) buf[2] = 'w';
+  if( mode & S_IXUSR ) buf[3] = setuid ? 's' : 'x';
+  else if( setuid ) buf[3] = 'S';
+  if( mode & S_IRGRP ) buf[4] = 'r';
+  if( mode & S_IWGRP ) buf[5] = 'w';
+  if( mode & S_IXGRP ) buf[6] = setgid ? 's' : 'x';
+  else if( setgid ) buf[6] = 'S';
+  if( mode & S_IROTH ) buf[7] = 'r';
+  if( mode & S_IWOTH ) buf[8] = 'w';
+  if( mode & S_IXOTH ) buf[9] = sticky ? 't' : 'x';
+  else if( sticky ) buf[9] = 'T';
+  }
+
+
+int format_user_group_string( const Tar_header header,
+                              char buf[group_string_size] )
+  {
+  int len;
+  if( header[uname_o] && header[gname_o] )
+    len = snprintf( buf, group_string_size,
+                    " %.32s/%.32s", header + uname_o, header + gname_o );
+  else
+    {
+    const unsigned uid = parse_octal( header + uid_o, uid_l );
+    const unsigned gid = parse_octal( header + gid_o, gid_l );
+    len = snprintf( buf, group_string_size, " %u/%u", uid, gid );
+    }
+  return len;
+  }
+
+
+// return true if dir is a parent directory of name
+bool compare_prefix_dir( const char * const dir, const char * const name )
+  {
+  int len = 0;
+  while( dir[len] && dir[len] == name[len] ) ++len;
+  return ( !dir[len] && len > 0 && ( dir[len-1] == '/' || name[len] == '/' ) );
+  }
+
+
+// compare two file names ignoring trailing slashes
+bool compare_tslash( const char * const name1, const char * const name2 )
+  {
+  const char * p = name1;
+  const char * q = name2;
+  while( *p && *p == *q ) { ++p; ++q; }
+  while( *p == '/' ) ++p;
+  while( *q == '/' ) ++q;
+  return ( !*p && !*q );
+  }
+
+} // end namespace
+
+
+bool block_is_zero( const uint8_t * const buf, const int size )
+  {
+  for( int i = 0; i < size; ++i ) if( buf[i] != 0 ) return false;
+  return true;
+  }
+
+
+bool format_member_name( const Extended & extended, const Tar_header header,
+                         Resizable_buffer & rbuf, const bool long_format )
+  {
+  if( long_format )
+    {
+    format_mode_string( header, rbuf() );
+    const int group_string_len =
+      format_user_group_string( header, rbuf() + mode_string_size );
+    int offset = mode_string_size + group_string_len;
+    const time_t mtime = parse_octal( header + mtime_o, mtime_l ); // 33 bits
+    struct tm tms;
+    const struct tm * tm = localtime_r( &mtime, &tms );
+    if( !tm )
+      { time_t z = 0; tm = localtime_r( &z, &tms ); if( !tm ) tm = &tms; }
+    const Typeflag typeflag = (Typeflag)header[typeflag_o];
+    const bool islink = ( typeflag == tf_link || typeflag == tf_symlink );
+    const char * const link_string = !islink ? "" :
+                       ( ( typeflag == tf_link ) ? " link to " : " -> " );
+    if( typeflag == tf_chardev || typeflag == tf_blockdev )
+      offset += snprintf( rbuf() + offset, rbuf.size() - offset, " %5u,%u",
+                  (unsigned)parse_octal( header + devmajor_o, devmajor_l ),
+                  (unsigned)parse_octal( header + devminor_o, devminor_l ) );
+    else
+      offset += snprintf( rbuf() + offset, rbuf.size() - offset, " %9llu",
+                          extended.file_size() );
+    for( int i = 0; i < 2; ++i )
+      {
+      const int len = snprintf( rbuf() + offset, rbuf.size() - offset,
+                " %4d-%02u-%02u %02u:%02u %s%s%s\n",
+                1900 + tm->tm_year, 1 + tm->tm_mon, tm->tm_mday,
+                tm->tm_hour, tm->tm_min, extended.path().c_str(),
+                link_string, islink ? extended.linkpath().c_str() : "" );
+      if( (int)rbuf.size() > len + offset ) break;
+      if( !rbuf.resize( len + offset + 1 ) ) return false;
+      }
+    }
+  else
+    {
+    if( rbuf.size() < extended.path().size() + 2 &&
+        !rbuf.resize( extended.path().size() + 2 ) ) return false;
+    snprintf( rbuf(), rbuf.size(), "%s\n", extended.path().c_str() );
+    }
+  return true;
+  }
+
+
+bool show_member_name( const Extended & extended, const Tar_header header,
+                       const int vlevel, Resizable_buffer & rbuf )
+  {
+  if( verbosity >= vlevel )
+    {
+    if( !format_member_name( extended, header, rbuf, verbosity > vlevel ) )
+      { show_error( mem_msg ); return false; }
+    std::fputs( rbuf(), stdout );
+    std::fflush( stdout );
+    }
+  return true;
+  }
+
+
+bool check_skip_filename( const Arg_parser & parser,
+                          std::vector< char > & name_pending,
+                          const char * const filename, const int filenames )
+  {
+  if( Exclude::excluded( filename ) ) return true; // skip excluded files
+  bool skip = filenames > 0;
+  if( skip )
+    for( int i = 0; i < parser.arguments(); ++i )
+      if( !parser.code( i ) && parser.argument( i ).size() )
+        {
+        const char * const name =
+          remove_leading_dotslash( parser.argument( i ).c_str() );
+        if( compare_prefix_dir( name, filename ) ||
+            compare_tslash( name, filename ) )
+          { skip = false; name_pending[i] = false; break; }
+        }
+  return skip;
+  }
|
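Review note on the two name-matching helpers that check_skip_filename relies on in common_decode.cc above. The sketch below repeats compare_prefix_dir and compare_tslash as they appear in the diff, outside their anonymous namespace, so that their behaviour on a few names can be checked in isolation. The sample names in the usage note are made up for illustration.

```cpp
// Same logic as in common_decode.cc: true if dir is a parent directory
// of name. The match must end exactly at a '/' boundary.
bool compare_prefix_dir( const char * const dir, const char * const name )
  {
  int len = 0;
  while( dir[len] && dir[len] == name[len] ) ++len;
  return ( !dir[len] && len > 0 && ( dir[len-1] == '/' || name[len] == '/' ) );
  }

// Same logic as in common_decode.cc: compare two file names ignoring
// trailing slashes.
bool compare_tslash( const char * const name1, const char * const name2 )
  {
  const char * p = name1;
  const char * q = name2;
  while( *p && *p == *q ) { ++p; ++q; }	// skip the common prefix
  while( *p == '/' ) ++p;		// skip trailing slashes of name1
  while( *q == '/' ) ++q;		// skip trailing slashes of name2
  return ( !*p && !*q );
  }
```

For example, compare_prefix_dir( "dir", "dir/file" ) is true while compare_prefix_dir( "dir", "dirx/file" ) is false, because the prefix does not end at a slash. A name is not its own parent, so compare_prefix_dir( "dir", "dir" ) is false; that case is covered by compare_tslash, which treats "dir", "dir/" and "dir//" as equal.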
2
configure
vendored
@@ -6,7 +6,7 @@
 # to copy, distribute and modify it.
 
 pkgname=tarlz
-pkgversion=0.15
+pkgversion=0.16
 progname=tarlz
 srctrigger=doc/${pkgname}.texi
 
@@ -30,9 +30,7 @@
 #include <unistd.h>
 #include <sys/stat.h>
 #include <sys/types.h>
-#if defined(__GNU_LIBRARY__)
 #include <sys/sysmacros.h> // for major, minor
-#endif
 #include <ftw.h>
 #include <grp.h>
 #include <pwd.h>
@@ -75,10 +75,10 @@ int tail_copy( const char * const archive_namep, const Arg_parser & parser,
 if( ostream_pos < 0 ) { show_error( "Seek error", errno ); retval = 1; }
 else if( ostream_pos > 0 && ostream_pos < lzip_index.file_size() )
 {
-int result;
-do result = ftruncate( outfd, ostream_pos );
-while( result != 0 && errno == EINTR );
-if( result != 0 )
+int ret;
+do ret = ftruncate( outfd, ostream_pos );
+while( ret != 0 && errno == EINTR );
+if( ret != 0 || lseek( outfd, 0, SEEK_END ) != ostream_pos )
 {
 show_file_error( archive_namep, "Can't truncate archive", errno );
 if( retval < 1 ) retval = 1;
|
@@ -51,7 +51,7 @@ int delete_members_lz( const char * const archive_namep,
 
 long long istream_pos = 0; // source of next data move
 const long long cdata_size = lzip_index.cdata_size();
-int retval = 0;
+int retval = 0, retval2 = 0;
 for( long i = 0; i < lzip_index.members(); ++i )
 {
 const long long mdata_pos = lzip_index.dblock( i ).pos();
@@ -142,16 +142,16 @@ int delete_members_lz( const char * const archive_namep,
 if( member_begin != mdata_pos || data_pos != mdata_end )
 { show_file_error( extended.path().c_str(),
 "Can't delete: not individually compressed." );
-retval = 2; extended.reset(); continue; }
+retval2 = 2; extended.reset(); continue; }
 if( !show_member_name( extended, header, 1, rbuf ) )
 { retval = 1; goto done; }
 const long long size = member_pos - istream_pos;
 if( size > 0 ) // move pending data each time a member is deleted
 {
 if( istream_pos == 0 )
-{ if( !safe_seek( outfd, size ) ) { retval = 1; break; } }
+{ if( !safe_seek( outfd, size ) ) { retval = 1; goto done; } }
 else if( !safe_seek( infd, istream_pos ) ||
-!copy_file( infd, outfd, size ) ) { retval = 1; break; }
+!copy_file( infd, outfd, size ) ) { retval = 1; goto done; }
 }
 istream_pos = member_end;
 }
@@ -159,6 +159,7 @@ int delete_members_lz( const char * const archive_namep,
 }
 }
 done:
+if( retval < retval2 ) retval = retval2;
 if( LZ_decompress_close( decoder ) < 0 && !retval )
 { show_error( "LZ_decompress_close failed." ); retval = 1; }
 // tail copy keeps trailing data
22
doc/tarlz.1
@@ -1,5 +1,5 @@
 .\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.46.1.
-.TH TARLZ "1" "April 2019" "tarlz 0.15" "User Commands"
+.TH TARLZ "1" "October 2019" "tarlz 0.16" "User Commands"
 .SH NAME
 tarlz \- creates tar archives with multimember lzip compression
 .SH SYNOPSIS
@@ -8,15 +8,19 @@ tarlz \- creates tar archives with multimember lzip compression
 .SH DESCRIPTION
 Tarlz is a massively parallel (multi\-threaded) combined implementation of
 the tar archiver and the lzip compressor. Tarlz creates, lists and extracts
-archives in a simplified posix pax format compressed with lzip, keeping the
-alignment between tar members and lzip members. This method adds an indexed
-lzip layer on top of the tar archive, making it possible to decode the
-archive safely in parallel. The resulting multimember tar.lz archive is
-fully backward compatible with standard tar tools like GNU tar, which treat
-it like any other tar.lz archive. Tarlz can append files to the end of such
-compressed archives.
+archives in a simplified and safer variant of the POSIX pax format
+compressed with lzip, keeping the alignment between tar members and lzip
+members. The resulting multimember tar.lz archive is fully backward
+compatible with standard tar tools like GNU tar, which treat it like any
+other tar.lz archive. Tarlz can append files to the end of such compressed
+archives.
 .PP
-The tarlz file format is a safe posix\-style backup format. In case of
+Keeping the alignment between tar members and lzip members has two
+advantages. It adds an indexed lzip layer on top of the tar archive, making
+it possible to decode the archive safely in parallel. It also minimizes the
+amount of data lost in case of corruption.
+.PP
+The tarlz file format is a safe POSIX\-style backup format. In case of
 corruption, tarlz can extract all the undamaged members from the tar.lz
 archive, skipping over the damaged members, just like the standard
 (uncompressed) tar. Moreover, the option '\-\-keep\-damaged' can be used to
216
doc/tarlz.info
@@ -11,12 +11,13 @@ File: tarlz.info, Node: Top, Next: Introduction, Up: (dir)
 Tarlz Manual
 ************
 
-This manual is for Tarlz (version 0.15, 11 April 2019).
+This manual is for Tarlz (version 0.16, 8 October 2019).
 
 * Menu:
 
 * Introduction:: Purpose and features of tarlz
 * Invoking tarlz:: Command line interface
+* Portable character set:: POSIX portable filename character set
 * File format:: Detailed format of the compressed archive
 * Amendments to pax format:: The reasons for the differences with pax
 * Multi-threaded tar:: Limitations of parallel tar decoding
@@ -39,13 +40,19 @@ File: tarlz.info, Node: Introduction, Next: Invoking tarlz, Prev: Top, Up: T
 
 Tarlz is a massively parallel (multi-threaded) combined implementation
 of the tar archiver and the lzip compressor. Tarlz creates, lists and
-extracts archives in a simplified posix pax format compressed with
-lzip, keeping the alignment between tar members and lzip members. This
-method adds an indexed lzip layer on top of the tar archive, making it
-possible to decode the archive safely in parallel. The resulting
-multimember tar.lz archive is fully backward compatible with standard
-tar tools like GNU tar, which treat it like any other tar.lz archive.
-Tarlz can append files to the end of such compressed archives.
+extracts archives in a simplified and safer variant of the POSIX pax
+format compressed with lzip, keeping the alignment between tar members
+and lzip members. The resulting multimember tar.lz archive is fully
+backward compatible with standard tar tools like GNU tar, which treat
+it like any other tar.lz archive. Tarlz can append files to the end of
+such compressed archives.
+
+Keeping the alignment between tar members and lzip members has two
+advantages. It adds an indexed lzip layer on top of the tar archive,
+making it possible to decode the archive safely in parallel. It also
+minimizes the amount of data lost in case of corruption. Compressing a
+tar archive with plzip may even double the amount of files lost for
+each lzip member damaged because it does not keep the members aligned.
 
 Tarlz can create tar archives with five levels of compression
 granularity; per file (--no-solid), per block (--bsolid, default), per
|
@ -62,7 +69,7 @@ archive, but it has the following advantages:
|
||||||
member), and unwanted members can be deleted from the archive. Just
|
member), and unwanted members can be deleted from the archive. Just
|
||||||
like an uncompressed tar archive.
|
like an uncompressed tar archive.
|
||||||
|
|
||||||
* It is a safe posix-style backup format. In case of corruption,
|
* It is a safe POSIX-style backup format. In case of corruption,
|
||||||
tarlz can extract all the undamaged members from the tar.lz
|
tarlz can extract all the undamaged members from the tar.lz
|
||||||
archive, skipping over the damaged members, just like the standard
|
archive, skipping over the damaged members, just like the standard
|
||||||
(uncompressed) tar. Moreover, the option '--keep-damaged' can be
|
(uncompressed) tar. Moreover, the option '--keep-damaged' can be
|
||||||
|
@ -77,10 +84,11 @@ archive, but it has the following advantages:
|
||||||
with standard tar tools. *Note crc32::.
|
with standard tar tools. *Note crc32::.
|
||||||
|
|
||||||
Tarlz does not understand other tar formats like 'gnu', 'oldgnu',
|
Tarlz does not understand other tar formats like 'gnu', 'oldgnu',
|
||||||
'star' or 'v7'.
|
'star' or 'v7'. 'tarlz -tf archive.tar.lz > /dev/null' can be used to
|
||||||
|
verify that the format of the archive is compatible with tarlz.
|
||||||
|
|
||||||
|
|
||||||
File: tarlz.info, Node: Invoking tarlz, Next: File format, Prev: Introduction, Up: Top
|
File: tarlz.info, Node: Invoking tarlz, Next: Portable character set, Prev: Introduction, Up: Top
|
||||||
|
|
||||||
2 Invoking tarlz
|
2 Invoking tarlz
|
||||||
****************
|
****************
|
||||||
|
@ -149,11 +157,11 @@ equivalent to '-1 --solid'
|
||||||
Change to directory DIR. When creating or appending, the position
|
Change to directory DIR. When creating or appending, the position
|
||||||
of each '-C' option in the command line is significant; it will
|
of each '-C' option in the command line is significant; it will
|
||||||
change the current working directory for the following FILES until
|
change the current working directory for the following FILES until
|
||||||
a new '-C' option appears in the command line. When extracting, all
|
a new '-C' option appears in the command line. When extracting or
|
||||||
the '-C' options are executed in sequence before starting the
|
comparing, all the '-C' options are executed in sequence before
|
||||||
extraction. Listing ignores any '-C' options specified. DIR is
|
reading the archive. Listing ignores any '-C' options specified.
|
||||||
relative to the then current working directory, perhaps changed by
|
DIR is relative to the then current working directory, perhaps
|
||||||
a previous '-C' option.
|
changed by a previous '-C' option.
|
||||||
|
|
||||||
Note that a process can only have one current working directory
|
Note that a process can only have one current working directory
|
||||||
(CWD). Therefore multi-threading can't be used to create an
|
(CWD). Therefore multi-threading can't be used to create an
|
||||||
|
@ -162,17 +170,18 @@ equivalent to '-1 --solid'
|
||||||
|
|
||||||
'-d'
|
'-d'
|
||||||
'--diff'
|
'--diff'
|
||||||
Find differences between archive and file system. For each tar
|
Compare and report differences between archive and file system.
|
||||||
member in the archive, verify that the corresponding file exists
|
For each tar member in the archive, verify that the corresponding
|
||||||
and is of the same type (regular file, directory, etc). Report on
|
file in the file system exists and is of the same type (regular
|
||||||
standard output the differences found in type, mode (permissions),
|
file, directory, etc). Report on standard output the differences
|
||||||
owner and group IDs, modification time, file size, file contents
|
found in type, mode (permissions), owner and group IDs,
|
||||||
(of regular files), target (of symlinks) and device number (of
|
modification time, file size, file contents (of regular files),
|
||||||
block/character special files).
|
target (of symlinks) and device number (of block/character special
|
||||||
|
files).
|
||||||
|
|
||||||
As tarlz removes leading slashes from member names, the '-C'
|
As tarlz removes leading slashes from member names, the '-C'
|
||||||
option may be used in combination with '--diff' when absolute
|
option may be used in combination with '--diff' when absolute file
|
||||||
filenames were used on archive creation: 'tarlz -C / -d'.
|
names were used on archive creation: 'tarlz -C / -d'.
|
||||||
Alternatively, tarlz may be run from the root directory to perform
|
Alternatively, tarlz may be run from the root directory to perform
|
||||||
the comparison.
|
the comparison.
|
||||||
|
|
||||||
|
@ -184,15 +193,22 @@ equivalent to '-1 --solid'
|
||||||
Delete the specified files and directories from an archive in
|
Delete the specified files and directories from an archive in
|
||||||
place. It currently can delete only from uncompressed archives and
|
place. It currently can delete only from uncompressed archives and
|
||||||
from archives with individually compressed files ('--no-solid'
|
from archives with individually compressed files ('--no-solid'
|
||||||
archives). To delete a directory without deleting the files under
|
archives). Note that files of about '--data-size' or larger are
|
||||||
it, use 'tarlz --delete -f foo --exclude='dir/*' dir'. Deleting in
|
compressed individually even if '--bsolid' is used, and can
|
||||||
place may be dangerous. A corrupt archive, a power cut, or an I/O
|
therefore be deleted. Tarlz takes care to not delete a tar member
|
||||||
error may cause data loss.
|
unless it is possible to do so. For example it won't try to delete
|
||||||
|
a tar member that is not individually compressed. To delete a
|
||||||
|
directory without deleting the files under it, use
|
||||||
|
'tarlz --delete -f foo --exclude='dir/*' dir'. Deleting in place
|
||||||
|
may be dangerous. A corrupt archive, a power cut, or an I/O error
|
||||||
|
may cause data loss.
|
||||||
|
|
||||||
'--exclude=PATTERN'
|
'--exclude=PATTERN'
|
||||||
Exclude files matching a shell pattern like '*.o'. A file is
|
Exclude files matching a shell pattern like '*.o'. A file is
|
||||||
considered to match if any component of the file name matches. For
|
considered to match if any component of the file name matches. For
|
||||||
example, '*.o' matches 'foo.o', 'foo.o/bar' and 'foo/bar.o'.
|
example, '*.o' matches 'foo.o', 'foo.o/bar' and 'foo/bar.o'. If
|
||||||
|
PATTERN contains a '/', it matches a corresponding '/' in the file
|
||||||
|
name. For example, 'foo/*.o' matches 'foo/bar.o'.
|
||||||
|
|
||||||
'-f ARCHIVE'
|
'-f ARCHIVE'
|
||||||
'--file=ARCHIVE'
|
'--file=ARCHIVE'
|
||||||
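Note on the '--exclude' text in the hunk above: the matching rule it describes (a pattern matches if any component of the file name matches, and a '/' in PATTERN matches a corresponding '/' in the file name) can be illustrated with POSIX fnmatch. This is only a sketch under assumptions, not the code in tarlz: the function name 'excluded_sketch' is hypothetical, and trying the pattern against every run of consecutive components with FNM_PATHNAME is one possible reading of the documented behavior.

```cpp
#include <cassert>
#include <fnmatch.h>
#include <string>

// Hypothetical helper: returns true if 'pattern' matches any run of
// consecutive components of 'name'. Not taken from the tarlz sources.
bool excluded_sketch( const std::string & pattern, const std::string & name )
  {
  std::string::size_type start = 0;
  while( start < name.size() )
    {
    std::string::size_type end = start;
    while( true )
      {
      // advance 'end' to the next component boundary ('/' or end of name)
      end = name.find( '/', end );
      if( end == std::string::npos ) end = name.size();
      const std::string candidate = name.substr( start, end - start );
      // FNM_PATHNAME: a '/' in the name is matched only by a '/' in the pattern
      if( fnmatch( pattern.c_str(), candidate.c_str(), FNM_PATHNAME ) == 0 )
        return true;
      if( end >= name.size() ) break;
      ++end;					// step past the '/'
      }
    start = name.find( '/', start );		// move to the next component
    if( start == std::string::npos ) break;
    ++start;
    }
  return false;
  }
```

With this sketch, the pattern '*.o' is found in 'foo.o', 'foo.o/bar' and 'foo/bar.o', and 'foo/*.o' is found in 'foo/bar.o', which are the examples given in the manual text.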
|
@ -234,13 +250,15 @@ equivalent to '-1 --solid'
|
||||||
Compressed members can't be appended to an uncompressed archive,
|
Compressed members can't be appended to an uncompressed archive,
|
||||||
nor vice versa. If the archive is compressed, it must be a
|
nor vice versa. If the archive is compressed, it must be a
|
||||||
multimember lzip file with the two end-of-file blocks plus any
|
multimember lzip file with the two end-of-file blocks plus any
|
||||||
zero padding contained in the last lzip member of the archive.
|
zero padding contained in the last lzip member of the archive. It
|
||||||
Appending works as follows; first the end-of-file blocks are
|
is possible to append files to an archive with a different
|
||||||
removed, then the new members are appended, and finally two new
|
compression granularity. Appending works as follows; first the
|
||||||
end-of-file blocks are appended to the archive. If the archive is
|
end-of-file blocks are removed, then the new members are appended,
|
||||||
uncompressed, tarlz parses and skips tar headers until it finds
|
and finally two new end-of-file blocks are appended to the
|
||||||
the end-of-file blocks. Exit with status 0 without modifying the
|
archive. If the archive is uncompressed, tarlz parses and skips
|
||||||
archive if no FILES have been specified.
|
tar headers until it finds the end-of-file blocks. Exit with
|
||||||
|
status 0 without modifying the archive if no FILES have been
|
||||||
|
specified.
|
||||||
|
|
||||||
'-t'
|
'-t'
|
||||||
'--list'
|
'--list'
|
||||||
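Note on the append text in the hunk above: for an uncompressed archive the manual says that tarlz parses and skips tar headers until it finds the end-of-file blocks. A minimal sketch of that scan follows. It is not the tarlz implementation; the names 'find_eof_offset', 'parse_octal' and 'is_zero_block' are hypothetical, the archive is assumed to be fully in memory, and sizes overridden by pax extended records are ignored. Only the standard ustar layout is relied on (512-byte blocks, octal 'size' field of 12 bytes at offset 124).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const std::size_t block_size = 512;	// size of tar header and data blocks
const std::size_t size_o = 124;		// offset of the ustar 'size' field
const std::size_t size_l = 12;		// length of the ustar 'size' field

// Parse an octal field, stopping at the first non-octal character.
unsigned long long parse_octal( const unsigned char * const p,
                                const std::size_t len )
  {
  unsigned long long value = 0;
  std::size_t i = 0;
  while( i < len && p[i] == ' ' ) ++i;		// skip leading spaces
  for( ; i < len && p[i] >= '0' && p[i] <= '7'; ++i )
    value = ( value << 3 ) + ( p[i] - '0' );
  return value;
  }

// True if the 512-byte block at 'p' contains only zero bytes.
bool is_zero_block( const unsigned char * const p )
  {
  for( std::size_t i = 0; i < block_size; ++i )
    if( p[i] != 0 ) return false;
  return true;
  }

// Return the offset of the first end-of-file block, or -1 if none is found.
// New members would be written starting at this offset.
long long find_eof_offset( const std::vector< unsigned char > & archive )
  {
  std::size_t pos = 0;
  while( pos + block_size <= archive.size() )
    {
    if( is_zero_block( &archive[pos] ) ) return pos;
    const unsigned long long size =
      parse_octal( &archive[pos + size_o], size_l );
    // skip the header plus the data rounded up to whole blocks
    const unsigned long long data_blocks =
      ( size + block_size - 1 ) / block_size;
    pos += block_size * ( 1 + data_blocks );
    }
  return -1;
  }
```

An appender built on this sketch would truncate the archive at the returned offset, write the new members, and then write two new zeroed blocks, following the order of steps given in the manual.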
|
@ -351,7 +369,7 @@ equivalent to '-1 --solid'
|
||||||
that a corrupt 'GNU.crc32' keyword, for example 'GNU.crc33', is
|
that a corrupt 'GNU.crc32' keyword, for example 'GNU.crc33', is
|
||||||
reported as a missing CRC instead of as a corrupt record. This
|
reported as a missing CRC instead of as a corrupt record. This
|
||||||
misleading 'Missing CRC' message is the consequence of a flaw in
|
misleading 'Missing CRC' message is the consequence of a flaw in
|
||||||
the posix pax format; i.e., the lack of a mandatory check sequence
|
the POSIX pax format; i.e., the lack of a mandatory check sequence
|
||||||
in the extended records. *Note crc32::.
|
in the extended records. *Note crc32::.
|
||||||
|
|
||||||
'--out-slots=N'
|
'--out-slots=N'
|
||||||
|
@ -369,9 +387,24 @@ invalid input file, 3 for an internal consistency error (eg, bug) which
|
||||||
caused tarlz to panic.
|
caused tarlz to panic.
|
||||||
|
|
||||||
|
|
||||||
File: tarlz.info, Node: File format, Next: Amendments to pax format, Prev: Invoking tarlz, Up: Top
|
File: tarlz.info, Node: Portable character set, Next: File format, Prev: Invoking tarlz, Up: Top
|
||||||
|
|
||||||
3 File format
|
3 POSIX portable filename character set
|
||||||
|
***************************************
|
||||||
|
|
||||||
|
The set of characters from which portable file names are constructed.
|
||||||
|
|
||||||
|
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
|
||||||
|
a b c d e f g h i j k l m n o p q r s t u v w x y z
|
||||||
|
0 1 2 3 4 5 6 7 8 9 . _ -
|
||||||
|
|
||||||
|
The last three characters are the period, underscore, and
|
||||||
|
hyphen-minus characters, respectively.
|
||||||
|
|
||||||
|
|
||||||
|
File: tarlz.info, Node: File format, Next: Amendments to pax format, Prev: Portable character set, Up: Top
|
||||||
|
|
||||||
|
4 File format
|
||||||
*************
|
*************
|
||||||
|
|
||||||
In the diagram below, a box like this:
|
In the diagram below, a box like this:
|
||||||
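Note on the new 'Portable character set' chapter in the hunk above: a file name component can be tested against the listed set with a few comparisons. The sketch below is illustrative only; the names 'is_portable_char' and 'is_portable_name' are hypothetical and do not come from tarlz, and the range comparisons assume an ASCII-compatible execution character set.

```cpp
#include <cassert>
#include <string>

// True if 'c' is one of A-Z, a-z, 0-9, period, underscore or hyphen-minus.
// Assumes an ASCII-compatible character set (letters are contiguous).
bool is_portable_char( const char c )
  {
  return ( c >= 'A' && c <= 'Z' ) || ( c >= 'a' && c <= 'z' ) ||
         ( c >= '0' && c <= '9' ) || c == '.' || c == '_' || c == '-';
  }

// True if 'name' is nonempty and uses only portable characters.
// A '/' is not in the set, so this checks one component, not a full path.
bool is_portable_name( const std::string & name )
  {
  if( name.empty() ) return false;
  for( std::string::size_type i = 0; i < name.size(); ++i )
    if( !is_portable_char( name[i] ) ) return false;
  return true;
  }
```

Because '/' is outside the set, a full path would have to be split into components before each one is checked.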
|
@ -393,7 +426,7 @@ sets). The members simply appear one after another in the file, with no
|
||||||
additional information before, between, or after them.
|
additional information before, between, or after them.
|
||||||
|
|
||||||
Each lzip member contains one or more tar members in a simplified
|
Each lzip member contains one or more tar members in a simplified
|
||||||
posix pax interchange format. The only pax typeflag value supported by
|
POSIX pax interchange format. The only pax typeflag value supported by
|
||||||
tarlz (in addition to the typeflag values defined by the ustar format)
|
tarlz (in addition to the typeflag values defined by the ustar format)
|
||||||
is 'x'. The pax format is an extension on top of the ustar format that
|
is 'x'. The pax format is an extension on top of the ustar format that
|
||||||
removes the size limitations of the ustar format.
|
removes the size limitations of the ustar format.
|
||||||
|
@ -438,7 +471,7 @@ tar.lz
|
||||||
+===============+=================================================+========+
|
+===============+=================================================+========+
|
||||||
|
|
||||||
|
|
||||||
3.1 Pax header block
|
4.1 Pax header block
|
||||||
====================
|
====================
|
||||||
|
|
||||||
The pax header block is identical to the ustar header block described
|
The pax header block is identical to the ustar header block described
|
||||||
|
@ -492,7 +525,7 @@ conversion to UTF-8 nor any other transformation.
|
||||||
swapping of two bytes.
|
swapping of two bytes.
|
||||||
|
|
||||||
|
|
||||||
3.2 Ustar header block
|
4.2 Ustar header block
|
||||||
======================
|
======================
|
||||||
|
|
||||||
The ustar header block has a length of 512 bytes and is structured as
|
The ustar header block has a length of 512 bytes and is structured as
|
||||||
|
@ -519,11 +552,10 @@ prefix 345 155
|
||||||
All characters in the header block are coded using the ISO/IEC
|
All characters in the header block are coded using the ISO/IEC
|
||||||
646:1991 (ASCII) standard, except in fields storing names for files,
|
646:1991 (ASCII) standard, except in fields storing names for files,
|
||||||
users, and groups. For maximum portability between implementations,
|
users, and groups. For maximum portability between implementations,
|
||||||
names should only contain characters from the portable filename
|
names should only contain characters from the portable character set.
|
||||||
character set. But if an implementation supports the use of characters
|
But if an implementation supports the use of characters outside of '/'
|
||||||
outside of '/' and the portable filename character set in names for
|
and the portable character set in names for files, users, and groups,
|
||||||
files, users, and groups, tarlz will use the byte values in these names
|
tarlz will use the byte values in these names unmodified.
|
||||||
unmodified.
|
|
||||||
|
|
||||||
The fields name, linkname, and prefix are null-terminated character
|
The fields name, linkname, and prefix are null-terminated character
|
||||||
strings except when all characters in the array contain non-null
|
strings except when all characters in the array contain non-null
|
||||||
|
@ -618,20 +650,22 @@ character.
|
||||||
|
|
||||||
File: tarlz.info, Node: Amendments to pax format, Next: Multi-threaded tar, Prev: File format, Up: Top
|
File: tarlz.info, Node: Amendments to pax format, Next: Multi-threaded tar, Prev: File format, Up: Top
|
||||||
|
|
||||||
4 The reasons for the differences with pax
|
5 The reasons for the differences with pax
|
||||||
******************************************
|
******************************************
|
||||||
|
|
||||||
Tarlz is meant to reliably detect invalid or corrupt metadata during
|
Tarlz creates safe archives that allow the reliable detection of
|
||||||
decoding, and to create safe archives where corrupt metadata can be
|
invalid or corrupt metadata during decoding even when the integrity
|
||||||
reliably detected. In order to achieve these goals, tarlz makes some
|
checking of lzip can't be used because the lzip members are only
|
||||||
changes to the variant of the pax format that it uses. This chapter
|
decompressed partially, as it happens in parallel '--list' and
|
||||||
describes these changes and the concrete reasons to implement them.
|
'--extract'. In order to achieve this goal, tarlz makes some changes to
|
||||||
|
the variant of the pax format that it uses. This chapter describes
|
||||||
|
these changes and the concrete reasons to implement them.
|
||||||
|
|
||||||
|
|
||||||
4.1 Add a CRC of the extended records
|
5.1 Add a CRC of the extended records
|
||||||
=====================================
|
=====================================
|
||||||
|
|
||||||
The posix pax format has a serious flaw. The metadata stored in pax
|
The POSIX pax format has a serious flaw. The metadata stored in pax
|
||||||
extended records are not protected by any kind of check sequence.
|
extended records are not protected by any kind of check sequence.
|
||||||
Corruption in a long file name may cause the extraction of the file in
|
Corruption in a long file name may cause the extraction of the file in
|
||||||
the wrong place without warning. Corruption in a large file size may
|
the wrong place without warning. Corruption in a large file size may
|
||||||
|
@ -645,11 +679,16 @@ in them, potentially much worse that undetected corruption in the data.
|
||||||
Even more so in the case of pax because the amount of metadata it
|
Even more so in the case of pax because the amount of metadata it
|
||||||
stores is potentially large, making undetected corruption more probable.
|
stores is potentially large, making undetected corruption more probable.
|
||||||
|
|
||||||
|
Headers and metadata must be protected separately from data because
|
||||||
|
the integrity checking of lzip may not be able to detect the corruption
|
||||||
|
before the metadata has been used, for example, to create a new file in
|
||||||
|
the wrong place.
|
||||||
|
|
||||||
Because of the above, tarlz protects the extended records with a CRC
|
Because of the above, tarlz protects the extended records with a CRC
|
||||||
in a way compatible with standard tar tools. *Note key_crc32::.
|
in a way compatible with standard tar tools. *Note key_crc32::.
|
||||||
|
|
||||||
|
|
||||||
4.2 Remove flawed backward compatibility
|
5.2 Remove flawed backward compatibility
|
||||||
========================================
|
========================================
|
||||||
|
|
||||||
In order to allow the extraction of pax archives by a tar utility
|
In order to allow the extraction of pax archives by a tar utility
|
||||||
|
@ -660,9 +699,9 @@ approach is broken because if the extended header is needed because of
|
||||||
a long file name, the name and prefix fields will be unable to contain
|
a long file name, the name and prefix fields will be unable to contain
|
||||||
the full pathname of the file. Therefore the files corresponding to
|
the full pathname of the file. Therefore the files corresponding to
|
||||||
both the extended header and the overridden ustar header will be
|
both the extended header and the overridden ustar header will be
|
||||||
extracted using truncated filenames, perhaps overwriting existing files
|
extracted using truncated file names, perhaps overwriting existing
|
||||||
or directories. It may be a security risk to extract a file with a
|
files or directories. It may be a security risk to extract a file with
|
||||||
truncated filename.
|
a truncated file name.
|
||||||
|
|
||||||
To avoid this problem, tarlz writes extended headers with all fields
|
To avoid this problem, tarlz writes extended headers with all fields
|
||||||
zeroed except size, chksum, typeflag, magic and version. This prevents
|
zeroed except size, chksum, typeflag, magic and version. This prevents
|
||||||
|
@ -672,28 +711,29 @@ overridden by extended records.
|
||||||
|
|
||||||
If an extended header is required for any reason (for example a file
|
If an extended header is required for any reason (for example a file
|
||||||
size larger than 8 GiB or a link name longer than 100 bytes), tarlz
|
size larger than 8 GiB or a link name longer than 100 bytes), tarlz
|
||||||
moves the filename also to the extended header to prevent an ustar tool
|
moves the file name also to the extended header to prevent an ustar
|
||||||
from trying to extract the file or link. This also makes easier during
|
tool from trying to extract the file or link. This also makes easier
|
||||||
parallel decoding the detection of a tar member split between two lzip
|
during parallel decoding the detection of a tar member split between
|
||||||
members at the boundary between the extended header and the ustar
|
two lzip members at the boundary between the extended header and the
|
||||||
header.
|
ustar header.
|
||||||
|
|
||||||
|
|
||||||
4.3 As simple as possible (but not simpler)
|
5.3 As simple as possible (but not simpler)
|
||||||
===========================================
|
===========================================
|
||||||
|
|
||||||
The tarlz format is mainly ustar. Extended pax headers are used only
|
The tarlz format is mainly ustar. Extended pax headers are used only
|
||||||
when needed because the length of a file name or link name, or the size
|
when needed because the length of a file name or link name, or the size
|
||||||
of a file exceed the limits of the ustar format. Adding extended
|
of a file exceed the limits of the ustar format. Adding extended
|
||||||
headers to each member just to record subsecond timestamps seems
|
headers to each member just to record subsecond timestamps seems
|
||||||
wasteful for a backup format.
|
wasteful for a backup format. Moreover, minimizing the overhead may
|
||||||
|
help recovering the archive with lziprecover in case of corruption.
|
||||||
|
|
||||||
Global pax headers are tolerated, but not supported; they are parsed
|
Global pax headers are tolerated, but not supported; they are parsed
|
||||||
and ignored. Some operations may not behave as expected if the archive
|
and ignored. Some operations may not behave as expected if the archive
|
||||||
contains global headers.
|
contains global headers.
|
||||||
|
|
||||||
|
|
||||||
4.4 Avoid misconversions to/from UTF-8
|
5.4 Avoid misconversions to/from UTF-8
|
||||||
======================================
|
======================================
|
||||||
|
|
||||||
There is no portable way to tell what charset a text string is coded
|
There is no portable way to tell what charset a text string is coded
|
||||||
|
@ -705,7 +745,7 @@ this behavior will be adjusted with a command line option in the future.
|
||||||
|
|
||||||
File: tarlz.info, Node: Multi-threaded tar, Next: Minimum archive sizes, Prev: Amendments to pax format, Up: Top
|
File: tarlz.info, Node: Multi-threaded tar, Next: Minimum archive sizes, Prev: Amendments to pax format, Up: Top
|
||||||
|
|
||||||
5 Limitations of parallel tar decoding
|
6 Limitations of parallel tar decoding
|
||||||
**************************************
|
**************************************
|
||||||
|
|
||||||
Safely decoding an arbitrary tar archive in parallel is impossible. For
|
Safely decoding an arbitrary tar archive in parallel is impossible. For
|
||||||
|
@ -753,7 +793,7 @@ example listing the Silesia corpus on a dual core machine:
|
||||||
|
|
||||||
File: tarlz.info, Node: Minimum archive sizes, Next: Examples, Prev: Multi-threaded tar, Up: Top
|
File: tarlz.info, Node: Minimum archive sizes, Next: Examples, Prev: Multi-threaded tar, Up: Top
|
||||||
|
|
||||||
6 Minimum archive sizes required for multi-threaded block compression
|
7 Minimum archive sizes required for multi-threaded block compression
|
||||||
*********************************************************************
|
*********************************************************************
|
||||||
|
|
||||||
When creating or appending to a compressed archive using multi-threaded
|
When creating or appending to a compressed archive using multi-threaded
|
||||||
|
@ -791,7 +831,7 @@ Level
|
||||||
|
|
||||||
File: tarlz.info, Node: Examples, Next: Problems, Prev: Minimum archive sizes, Up: Top
|
File: tarlz.info, Node: Examples, Next: Problems, Prev: Minimum archive sizes, Up: Top
|
||||||
|
|
||||||
7 A small tutorial with examples
|
8 A small tutorial with examples
|
||||||
********************************
|
********************************
|
||||||
|
|
||||||
Example 1: Create a multimember compressed archive 'archive.tar.lz'
|
Example 1: Create a multimember compressed archive 'archive.tar.lz'
|
||||||
|
@ -850,7 +890,7 @@ Example 8: Copy the contents of directory 'sourcedir' to the directory
|
||||||
|
|
||||||
File: tarlz.info, Node: Problems, Next: Concept index, Prev: Examples, Up: Top
|
File: tarlz.info, Node: Problems, Next: Concept index, Prev: Examples, Up: Top
|
||||||
|
|
||||||
8 Reporting bugs
|
9 Reporting bugs
|
||||||
****************
|
****************
|
||||||
|
|
||||||
There are probably bugs in tarlz. There are certainly errors and
|
There are probably bugs in tarlz. There are certainly errors and
|
||||||
|
@ -881,6 +921,9 @@ Concept index
|
||||||
* invoking: Invoking tarlz. (line 6)
|
* invoking: Invoking tarlz. (line 6)
|
||||||
* minimum archive sizes: Minimum archive sizes. (line 6)
|
* minimum archive sizes: Minimum archive sizes. (line 6)
|
||||||
* options: Invoking tarlz. (line 6)
|
* options: Invoking tarlz. (line 6)
|
||||||
|
* parallel tar decoding: Multi-threaded tar. (line 6)
|
||||||
|
* portable character set: Portable character set.
|
||||||
|
(line 6)
|
||||||
* usage: Invoking tarlz. (line 6)
|
* usage: Invoking tarlz. (line 6)
|
||||||
* version: Invoking tarlz. (line 6)
|
* version: Invoking tarlz. (line 6)
|
||||||
|
|
||||||
|
@ -888,20 +931,21 @@ Concept index
|
||||||
|
|
||||||
Tag Table:
|
Tag Table:
|
||||||
Node: Top223
|
Node: Top223
|
||||||
Node: Introduction1086
|
Node: Introduction1155
|
||||||
Node: Invoking tarlz3337
|
Node: Invoking tarlz3841
|
||||||
Ref: --data-size5489
|
Ref: --data-size6006
|
||||||
Ref: --bsolid12172
|
Ref: --bsolid13287
|
||||||
Node: File format15802
|
Node: Portable character set16917
|
||||||
Ref: key_crc3220622
|
Node: File format17420
|
||||||
Node: Amendments to pax format26039
|
Ref: key_crc3222248
|
||||||
Ref: crc3226580
|
Node: Amendments to pax format27647
|
||||||
Ref: flawed-compat27605
|
Ref: crc3228304
|
||||||
Node: Multi-threaded tar30128
|
Ref: flawed-compat29564
|
||||||
Node: Minimum archive sizes32667
|
Node: Multi-threaded tar32198
|
||||||
Node: Examples34800
|
Node: Minimum archive sizes34737
|
||||||
Node: Problems36517
|
Node: Examples36870
|
||||||
Node: Concept index37043
|
Node: Problems38587
|
||||||
|
Node: Concept index39113
|
||||||
|
|
||||||
End Tag Table
|
End Tag Table
|
||||||
|
|
||||||
|
|
149
doc/tarlz.texi
|
@ -6,8 +6,8 @@
|
||||||
@finalout
|
@finalout
|
||||||
@c %**end of header
|
@c %**end of header
|
||||||
|
|
||||||
@set UPDATED 11 April 2019
|
@set UPDATED 8 October 2019
|
||||||
@set VERSION 0.15
|
@set VERSION 0.16
|
||||||
|
|
||||||
@dircategory Data Compression
|
@dircategory Data Compression
|
||||||
@direntry
|
@direntry
|
||||||
|
@ -37,6 +37,7 @@ This manual is for Tarlz (version @value{VERSION}, @value{UPDATED}).
|
||||||
@menu
|
@menu
|
||||||
* Introduction:: Purpose and features of tarlz
|
* Introduction:: Purpose and features of tarlz
|
||||||
* Invoking tarlz:: Command line interface
|
* Invoking tarlz:: Command line interface
|
||||||
|
* Portable character set:: POSIX portable filename character set
|
||||||
* File format:: Detailed format of the compressed archive
|
* File format:: Detailed format of the compressed archive
|
||||||
* Amendments to pax format:: The reasons for the differences with pax
|
* Amendments to pax format:: The reasons for the differences with pax
|
||||||
* Multi-threaded tar:: Limitations of parallel tar decoding
|
* Multi-threaded tar:: Limitations of parallel tar decoding
|
||||||
|
@ -60,13 +61,19 @@ to copy, distribute and modify it.
|
||||||
@uref{http://www.nongnu.org/lzip/tarlz.html,,Tarlz} is a massively parallel
|
@uref{http://www.nongnu.org/lzip/tarlz.html,,Tarlz} is a massively parallel
|
||||||
(multi-threaded) combined implementation of the tar archiver and the
|
(multi-threaded) combined implementation of the tar archiver and the
|
||||||
@uref{http://www.nongnu.org/lzip/lzip.html,,lzip} compressor. Tarlz creates,
|
@uref{http://www.nongnu.org/lzip/lzip.html,,lzip} compressor. Tarlz creates,
|
||||||
lists and extracts archives in a simplified posix pax format compressed with
|
lists and extracts archives in a simplified and safer variant of the POSIX
|
||||||
lzip, keeping the alignment between tar members and lzip members. This
|
pax format compressed with lzip, keeping the alignment between tar members
|
||||||
method adds an indexed lzip layer on top of the tar archive, making it
|
and lzip members. The resulting multimember tar.lz archive is fully backward
|
||||||
possible to decode the archive safely in parallel. The resulting multimember
|
compatible with standard tar tools like GNU tar, which treat it like any
|
||||||
tar.lz archive is fully backward compatible with standard tar tools like GNU
|
other tar.lz archive. Tarlz can append files to the end of such compressed
|
||||||
tar, which treat it like any other tar.lz archive. Tarlz can append files to
|
archives.
|
||||||
the end of such compressed archives.
|
|
||||||
|
Keeping the alignment between tar members and lzip members has two
|
||||||
|
advantages. It adds an indexed lzip layer on top of the tar archive, making
|
||||||
|
it possible to decode the archive safely in parallel. It also minimizes the
|
||||||
|
amount of data lost in case of corruption. Compressing a tar archive with
|
||||||
|
plzip may even double the amount of files lost for each lzip member damaged
|
||||||
|
because it does not keep the members aligned.
|
||||||
|
|
||||||
Tarlz can create tar archives with five levels of compression granularity;
|
Tarlz can create tar archives with five levels of compression granularity;
|
||||||
per file (---no-solid), per block (---bsolid, default), per directory
|
per file (---no-solid), per block (---bsolid, default), per directory
|
||||||
|
@ -88,7 +95,7 @@ member), and unwanted members can be deleted from the archive. Just
|
||||||
like an uncompressed tar archive.
|
like an uncompressed tar archive.
|
||||||
|
|
||||||
@item
|
@item
|
||||||
It is a safe posix-style backup format. In case of corruption,
|
It is a safe POSIX-style backup format. In case of corruption,
|
||||||
tarlz can extract all the undamaged members from the tar.lz
|
tarlz can extract all the undamaged members from the tar.lz
|
||||||
archive, skipping over the damaged members, just like the standard
|
archive, skipping over the damaged members, just like the standard
|
||||||
(uncompressed) tar. Moreover, the option @samp{--keep-damaged} can be
|
(uncompressed) tar. Moreover, the option @samp{--keep-damaged} can be
|
||||||
|
@ -105,7 +112,9 @@ Tarlz protects the extended records with a CRC in a way compatible with
|
||||||
standard tar tools. @xref{crc32}.
|
standard tar tools. @xref{crc32}.
|
||||||
|
|
||||||
Tarlz does not understand other tar formats like @samp{gnu}, @samp{oldgnu},
|
Tarlz does not understand other tar formats like @samp{gnu}, @samp{oldgnu},
|
||||||
@samp{star} or @samp{v7}.
|
@samp{star} or @samp{v7}. @w{@samp{tarlz -tf archive.tar.lz > /dev/null}}
|
||||||
|
can be used to verify that the format of the archive is compatible with
|
||||||
|
tarlz.
|
||||||
|
|
||||||
|
|
||||||
@node Invoking tarlz
|
@node Invoking tarlz
|
||||||
|
@ -179,13 +188,13 @@ Create a new archive from @var{files}.
|
||||||
|
|
||||||
@item -C @var{dir}
|
@item -C @var{dir}
|
||||||
@itemx --directory=@var{dir}
|
@itemx --directory=@var{dir}
|
||||||
Change to directory @var{dir}. When creating or appending, the position
|
Change to directory @var{dir}. When creating or appending, the position of
|
||||||
of each @samp{-C} option in the command line is significant; it will
|
each @samp{-C} option in the command line is significant; it will change the
|
||||||
change the current working directory for the following @var{files} until
|
current working directory for the following @var{files} until a new
|
||||||
a new @samp{-C} option appears in the command line. When extracting, all
|
@samp{-C} option appears in the command line. When extracting or comparing,
|
||||||
the @samp{-C} options are executed in sequence before starting the
|
all the @samp{-C} options are executed in sequence before reading the
|
||||||
extraction. Listing ignores any @samp{-C} options specified. @var{dir}
|
archive. Listing ignores any @samp{-C} options specified. @var{dir} is
|
||||||
is relative to the then current working directory, perhaps changed by a
|
relative to the then current working directory, perhaps changed by a
|
||||||
previous @samp{-C} option.
|
previous @samp{-C} option.
|
||||||
|
|
||||||
Note that a process can only have one current working directory (CWD).
|
Note that a process can only have one current working directory (CWD).
|
||||||
|
@ -194,12 +203,12 @@ option appears after a relative filename in the command line.
|
||||||
|
|
||||||
@item -d
|
@item -d
|
||||||
@itemx --diff
|
@itemx --diff
|
||||||
Find differences between archive and file system. For each tar member in the
|
Compare and report differences between archive and file system. For each tar
|
||||||
archive, verify that the corresponding file exists and is of the same type
|
member in the archive, verify that the corresponding file in the file system
|
||||||
(regular file, directory, etc). Report on standard output the differences
|
exists and is of the same type (regular file, directory, etc). Report on
|
||||||
found in type, mode (permissions), owner and group IDs, modification time,
|
standard output the differences found in type, mode (permissions), owner and
|
||||||
file size, file contents (of regular files), target (of symlinks) and device
|
group IDs, modification time, file size, file contents (of regular files),
|
||||||
number (of block/character special files).
|
target (of symlinks) and device number (of block/character special files).
|
||||||
|
|
||||||
As tarlz removes leading slashes from member names, the @samp{-C} option may
|
As tarlz removes leading slashes from member names, the @samp{-C} option may
|
||||||
be used in combination with @samp{--diff} when absolute file names were used
|
be used in combination with @samp{--diff} when absolute file names were used
|
||||||
|
@ -213,16 +222,22 @@ useful when comparing an @samp{--anonymous} archive.
|
||||||
@item --delete
|
@item --delete
|
||||||
Delete the specified files and directories from an archive in place. It
|
Delete the specified files and directories from an archive in place. It
|
||||||
currently can delete only from uncompressed archives and from archives with
|
currently can delete only from uncompressed archives and from archives with
|
||||||
individually compressed files (@samp{--no-solid} archives). To delete a
|
individually compressed files (@samp{--no-solid} archives). Note that files
|
||||||
|
of about @samp{--data-size} or larger are compressed individually even if
|
||||||
|
@samp{--bsolid} is used, and can therefore be deleted. Tarlz takes care to
|
||||||
|
not delete a tar member unless it is possible to do so. For example it won't
|
||||||
|
try to delete a tar member that is not individually compressed. To delete a
|
||||||
directory without deleting the files under it, use
|
directory without deleting the files under it, use
|
||||||
@w{@code{tarlz --delete -f foo --exclude='dir/*' dir}}. Deleting in place
|
@w{@samp{tarlz --delete -f foo --exclude='dir/*' dir}}. Deleting in place
|
||||||
may be dangerous. A corrupt archive, a power cut, or an I/O error may cause
|
may be dangerous. A corrupt archive, a power cut, or an I/O error may cause
|
||||||
data loss.
|
data loss.
|
||||||
|
|
||||||
@item --exclude=@var{pattern}
|
@item --exclude=@var{pattern}
|
||||||
Exclude files matching a shell pattern like @samp{*.o}. A file is considered
|
Exclude files matching a shell pattern like @samp{*.o}. A file is considered
|
||||||
to match if any component of the file name matches. For example, @samp{*.o}
|
to match if any component of the file name matches. For example, @samp{*.o}
|
||||||
matches @samp{foo.o}, @samp{foo.o/bar} and @samp{foo/bar.o}.
|
matches @samp{foo.o}, @samp{foo.o/bar} and @samp{foo/bar.o}. If
|
||||||
|
@var{pattern} contains a @samp{/}, it matches a corresponding @samp{/} in
|
||||||
|
the file name. For example, @samp{foo/*.o} matches @samp{foo/bar.o}.
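
The matching rule described above can be sketched as follows. This is a minimal illustration, not the code in exclude.cc: the helper name @samp{matches_any_component} is hypothetical, and the sketch assumes a POSIX @samp{fnmatch} called with no flags, so that @samp{*} may also match a @samp{/}. It tries the pattern against every run of consecutive components of the file name.

```cpp
#include <fnmatch.h>
#include <string>

// Hypothetical helper (not part of tarlz): returns true if 'pattern'
// matches any run of consecutive components of 'name'.
bool matches_any_component( const std::string & pattern,
                            const std::string & name )
  {
  typedef std::string::size_type size_type;
  // 'start' walks over the beginning of each component of name
  for( size_type start = 0; start < name.size(); )
    {
    // 'end' walks over each '/' after start, and finally the end of name
    for( size_type end = name.find( '/', start ); ;
         end = name.find( '/', end + 1 ) )
      {
      const std::string part = ( end == std::string::npos ) ?
        name.substr( start ) : name.substr( start, end - start );
      if( fnmatch( pattern.c_str(), part.c_str(), 0 ) == 0 ) return true;
      if( end == std::string::npos ) break;
      }
    const size_type slash = name.find( '/', start );
    if( slash == std::string::npos ) break;	// no more components
    start = slash + 1;
    }
  return false;
  }
```

With this sketch, @samp{*.o} matches @samp{foo.o}, @samp{foo.o/bar} and @samp{foo/bar.o}, and @samp{foo/*.o} matches @samp{foo/bar.o}, as in the examples given in the text.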
|
||||||
|
|
||||||
@item -f @var{archive}
|
@item -f @var{archive}
|
||||||
@itemx --file=@var{archive}
|
@itemx --file=@var{archive}
|
||||||
|
@ -261,12 +276,13 @@ Append files to the end of an archive. The archive must be a regular
|
||||||
be appended to an uncompressed archive, nor vice versa. If the archive is
|
be appended to an uncompressed archive, nor vice versa. If the archive is
|
||||||
compressed, it must be a multimember lzip file with the two end-of-file
|
compressed, it must be a multimember lzip file with the two end-of-file
|
||||||
blocks plus any zero padding contained in the last lzip member of the
|
blocks plus any zero padding contained in the last lzip member of the
|
||||||
archive. Appending works as follows; first the end-of-file blocks are
|
archive. It is possible to append files to an archive with a different
|
||||||
removed, then the new members are appended, and finally two new end-of-file
|
compression granularity. Appending works as follows; first the end-of-file
|
||||||
blocks are appended to the archive. If the archive is uncompressed, tarlz
|
blocks are removed, then the new members are appended, and finally two new
|
||||||
parses and skips tar headers until it finds the end-of-file blocks. Exit
|
end-of-file blocks are appended to the archive. If the archive is
|
||||||
with status 0 without modifying the archive if no @var{files} have been
|
uncompressed, tarlz parses and skips tar headers until it finds the
|
||||||
specified.
|
end-of-file blocks. Exit with status 0 without modifying the archive if no
|
||||||
|
@var{files} have been specified.
|
||||||
|
|
||||||
@item -t
|
@item -t
|
||||||
@itemx --list
|
@itemx --list
|
||||||
|
@ -282,7 +298,7 @@ Verbosely list files processed.
|
||||||
Extract files from an archive. If @var{files} are given, extract only the
|
Extract files from an archive. If @var{files} are given, extract only the
|
||||||
@var{files} given. Else extract all the files in the archive. To extract a
|
@var{files} given. Else extract all the files in the archive. To extract a
|
||||||
directory without extracting the files under it, use
|
directory without extracting the files under it, use
|
||||||
@w{@code{tarlz -xf foo --exclude='dir/*' dir}}.
|
@w{@samp{tarlz -xf foo --exclude='dir/*' dir}}.
|
||||||
|
|
||||||
@item -0 .. -9
|
@item -0 .. -9
|
||||||
Set the compression level for @samp{--create} and @samp{--append}. The
|
Set the compression level for @samp{--create} and @samp{--append}. The
|
||||||
|
@ -326,7 +342,7 @@ compressed data block must contain an integer number of tar members. Block
|
||||||
compression is the default because it improves compression ratio for
|
compression is the default because it improves compression ratio for
|
||||||
archives with many files smaller than the block size. This option allows
|
archives with many files smaller than the block size. This option allows
|
||||||
tarlz to revert to default behavior if, for example, it is invoked through an
|
tarlz to revert to default behavior if, for example, it is invoked through an
|
||||||
alias like @code{tar='tarlz --solid'}. @xref{--data-size}, to set the target
|
alias like @samp{tar='tarlz --solid'}. @xref{--data-size}, to set the target
|
||||||
block size.
|
block size.
|
||||||
|
|
||||||
@item --dsolid
|
@item --dsolid
|
||||||
|
@ -374,7 +390,7 @@ When this option is used, tarlz detects any corruption in the extended
|
||||||
records (only limited by CRC collisions). But note that a corrupt
|
records (only limited by CRC collisions). But note that a corrupt
|
||||||
@samp{GNU.crc32} keyword, for example @samp{GNU.crc33}, is reported as a
|
@samp{GNU.crc32} keyword, for example @samp{GNU.crc33}, is reported as a
|
||||||
missing CRC instead of as a corrupt record. This misleading
|
missing CRC instead of as a corrupt record. This misleading
|
||||||
@samp{Missing CRC} message is the consequence of a flaw in the posix pax
|
@samp{Missing CRC} message is the consequence of a flaw in the POSIX pax
|
||||||
format; i.e., the lack of a mandatory check sequence in the extended
|
format; i.e., the lack of a mandatory check sequence in the extended
|
||||||
records. @xref{crc32}.
|
records. @xref{crc32}.
|
||||||
|
|
||||||
|
@ -400,6 +416,22 @@ invalid input file, 3 for an internal consistency error (eg, bug) which
|
||||||
caused tarlz to panic.
|
caused tarlz to panic.
|
||||||
|
|
||||||
|
|
||||||
|
@node Portable character set
|
||||||
|
@chapter POSIX portable filename character set
|
||||||
|
@cindex portable character set
|
||||||
|
|
||||||
|
The set of characters from which portable file names are constructed.
|
||||||
|
|
||||||
|
@example
|
||||||
|
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
|
||||||
|
a b c d e f g h i j k l m n o p q r s t u v w x y z
|
||||||
|
0 1 2 3 4 5 6 7 8 9 . _ -
|
||||||
|
@end example
|
||||||
|
|
||||||
|
The last three characters are the period, underscore, and hyphen-minus
|
||||||
|
characters, respectively.
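
As an illustration of the character set listed above, a file name component could be checked as sketched below. The function names are hypothetical and not part of tarlz, and the sketch assumes an ASCII-compatible execution character set. Note that @samp{/} is not in the set, so the check applies to a single component, not to a whole path.

```cpp
#include <string>

// Hypothetical helper: true if ch belongs to the POSIX portable filename
// character set (A-Z, a-z, 0-9, period, underscore, hyphen-minus).
bool is_portable_filename_char( const unsigned char ch )
  {
  return ( ch >= 'A' && ch <= 'Z' ) || ( ch >= 'a' && ch <= 'z' ) ||
         ( ch >= '0' && ch <= '9' ) || ch == '.' || ch == '_' || ch == '-';
  }

// Hypothetical helper: true if 'name' is non-empty and consists only of
// characters from the portable filename character set.
bool is_portable_filename( const std::string & name )
  {
  if( name.empty() ) return false;
  for( std::string::size_type i = 0; i < name.size(); ++i )
    if( !is_portable_filename_char( name[i] ) ) return false;
  return true;
  }
```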
|
||||||
|
|
||||||
|
|
||||||
@node File format
|
@node File format
|
||||||
@chapter File format
|
@chapter File format
|
||||||
@cindex file format
|
@cindex file format
|
||||||
|
@ -426,7 +458,7 @@ A tar.lz file consists of a series of lzip members (compressed data sets).
|
||||||
The members simply appear one after another in the file, with no
|
The members simply appear one after another in the file, with no
|
||||||
additional information before, between, or after them.
|
additional information before, between, or after them.
|
||||||
|
|
||||||
Each lzip member contains one or more tar members in a simplified posix
|
Each lzip member contains one or more tar members in a simplified POSIX
|
||||||
pax interchange format. The only pax typeflag value supported by tarlz
|
pax interchange format. The only pax typeflag value supported by tarlz
|
||||||
(in addition to the typeflag values defined by the ustar format) is
|
(in addition to the typeflag values defined by the ustar format) is
|
||||||
@samp{x}. The pax format is an extension on top of the ustar format that
|
@samp{x}. The pax format is an extension on top of the ustar format that
|
||||||
|
@ -506,7 +538,7 @@ extraction. @xref{flawed-compat}.
|
||||||
|
|
||||||
The pax extended header data consists of one or more records, each of
|
The pax extended header data consists of one or more records, each of
|
||||||
them constructed as follows:@*
|
them constructed as follows:@*
|
||||||
@code{"%d %s=%s\n", <length>, <keyword>, <value>}
|
@samp{"%d %s=%s\n", <length>, <keyword>, <value>}
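
The record format above can be sketched in code. The subtle point is that, per the POSIX pax specification, <length> is the decimal length of the whole record, counting its own digits, the blank and the trailing newline, so it has to be found by iteration. The helper names below are hypothetical and are not taken from the tarlz sources.

```cpp
#include <sstream>
#include <string>

// Number of decimal digits needed to print n.
static unsigned decimal_digits( unsigned long long n )
  {
  unsigned digits = 1;
  while( n >= 10 ) { n /= 10; ++digits; }
  return digits;
  }

// Hypothetical helper (not part of tarlz): build one extended record
// "<length> <keyword>=<value>\n" where <length> includes its own digits.
std::string make_pax_record( const std::string & keyword,
                             const std::string & value )
  {
  // bytes following the <length> digits: blank, keyword, '=', value, newline
  const unsigned long long rest =
    1 + keyword.size() + 1 + value.size() + 1;
  unsigned long long len = rest + 1;		// <length> has at least one digit
  // adding digits may itself increase the digit count; iterate until stable
  while( rest + decimal_digits( len ) != len )
    len = rest + decimal_digits( len );
  std::ostringstream os;
  os << len << ' ' << keyword << '=' << value << '\n';
  return os.str();
  }
```

For example, the keyword @samp{path} with the value @samp{foo} yields the 12-byte record @samp{12 path=foo} followed by a newline.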
|
||||||
|
|
||||||
The <length>, <blank>, <keyword>, <equals-sign>, and <newline> in the
|
The <length>, <blank>, <keyword>, <equals-sign>, and <newline> in the
|
||||||
record must be limited to the portable character set. The <length> field
|
record must be limited to the portable character set. The <length> field
|
||||||
|
@ -577,11 +609,11 @@ shown in the following table. All lengths and offsets are in decimal.
|
||||||
|
|
||||||
All characters in the header block are coded using the ISO/IEC 646:1991
|
All characters in the header block are coded using the ISO/IEC 646:1991
|
||||||
(ASCII) standard, except in fields storing names for files, users, and
|
(ASCII) standard, except in fields storing names for files, users, and
|
||||||
groups. For maximum portability between implementations, names should
|
groups. For maximum portability between implementations, names should only
|
||||||
only contain characters from the portable filename character set. But if
|
contain characters from the portable character set. But if an implementation
|
||||||
an implementation supports the use of characters outside of @samp{/} and
|
supports the use of characters outside of @samp{/} and the portable
|
||||||
the portable filename character set in names for files, users, and
|
character set in names for files, users, and groups, tarlz will use the byte
|
||||||
groups, tarlz will use the byte values in these names unmodified.
|
values in these names unmodified.
|
||||||
|
|
||||||
The fields name, linkname, and prefix are null-terminated character
|
The fields name, linkname, and prefix are null-terminated character
|
||||||
strings except when all characters in the array contain non-null
|
strings except when all characters in the array contain non-null
|
||||||
|
@ -679,17 +711,19 @@ ustar by not requiring a terminating null character.
|
||||||
@chapter The reasons for the differences with pax
|
@chapter The reasons for the differences with pax
|
||||||
@cindex Amendments to pax format
|
@cindex Amendments to pax format
|
||||||
|
|
||||||
Tarlz is meant to reliably detect invalid or corrupt metadata during
|
Tarlz creates safe archives that allow the reliable detection of invalid or
|
||||||
decoding, and to create safe archives where corrupt metadata can be reliably
|
corrupt metadata during decoding even when the integrity checking of lzip
|
||||||
detected. In order to achieve these goals, tarlz makes some changes to the
|
can't be used because the lzip members are only decompressed partially, as
|
||||||
variant of the pax format that it uses. This chapter describes these changes
|
it happens in parallel @samp{--list} and @samp{--extract}. In order to
|
||||||
and the concrete reasons to implement them.
|
achieve this goal, tarlz makes some changes to the variant of the pax format
|
||||||
|
that it uses. This chapter describes these changes and the concrete reasons
|
||||||
|
to implement them.
|
||||||
|
|
||||||
@sp 1
|
@sp 1
|
||||||
@anchor{crc32}
|
@anchor{crc32}
|
||||||
@section Add a CRC of the extended records
|
@section Add a CRC of the extended records
|
||||||
|
|
||||||
The posix pax format has a serious flaw. The metadata stored in pax extended
|
The POSIX pax format has a serious flaw. The metadata stored in pax extended
|
||||||
records are not protected by any kind of check sequence. Corruption in a
|
records are not protected by any kind of check sequence. Corruption in a
|
||||||
long file name may cause the extraction of the file in the wrong place
|
long file name may cause the extraction of the file in the wrong place
|
||||||
without warning. Corruption in a large file size may cause the truncation of
|
without warning. Corruption in a large file size may cause the truncation of
|
||||||
|
@ -703,8 +737,13 @@ potentially much worse that undetected corruption in the data. Even more so
|
||||||
in the case of pax because the amount of metadata it stores is potentially
|
in the case of pax because the amount of metadata it stores is potentially
|
||||||
large, making undetected corruption more probable.
|
large, making undetected corruption more probable.
|
||||||
|
|
||||||
Because of the above, tarlz protects the extended records with a CRC in
|
Headers and metadata must be protected separately from data because the
|
||||||
a way compatible with standard tar tools. @xref{key_crc32}.
|
integrity checking of lzip may not be able to detect the corruption before
|
||||||
|
the metadata has been used, for example, to create a new file in the wrong
|
||||||
|
place.
|
||||||
|
|
||||||
|
Because of the above, tarlz protects the extended records with a CRC in a
|
||||||
|
way compatible with standard tar tools. @xref{key_crc32}.
|
||||||
|
|
||||||
@sp 1
|
@sp 1
|
||||||
@anchor{flawed-compat}
|
@anchor{flawed-compat}
|
||||||
|
@ -729,8 +768,8 @@ extended records.
|
||||||
|
|
||||||
If an extended header is required for any reason (for example a file size
|
If an extended header is required for any reason (for example a file size
|
||||||
larger than @w{8 GiB} or a link name longer than 100 bytes), tarlz moves the
|
larger than @w{8 GiB} or a link name longer than 100 bytes), tarlz moves the
|
||||||
filename also to the extended header to prevent an ustar tool from trying to
|
file name also to the extended header to prevent an ustar tool from trying
|
||||||
extract the file or link. This also makes easier during parallel decoding
|
to extract the file or link. This also makes easier during parallel decoding
|
||||||
the detection of a tar member split between two lzip members at the boundary
|
the detection of a tar member split between two lzip members at the boundary
|
||||||
between the extended header and the ustar header.
|
between the extended header and the ustar header.
|
||||||
|
|
||||||
|
@ -741,7 +780,8 @@ The tarlz format is mainly ustar. Extended pax headers are used only when
|
||||||
needed because the length of a file name or link name, or the size of a file
|
needed because the length of a file name or link name, or the size of a file
|
||||||
exceed the limits of the ustar format. Adding extended headers to each
|
exceed the limits of the ustar format. Adding extended headers to each
|
||||||
member just to record subsecond timestamps seems wasteful for a backup
|
member just to record subsecond timestamps seems wasteful for a backup
|
||||||
format.
|
format. Moreover, minimizing the overhead may help recovering the archive
|
||||||
|
with lziprecover in case of corruption.
|
||||||
|
|
||||||
Global pax headers are tolerated, but not supported; they are parsed and
|
Global pax headers are tolerated, but not supported; they are parsed and
|
||||||
ignored. Some operations may not behave as expected if the archive contains
|
ignored. Some operations may not behave as expected if the archive contains
|
||||||
|
@ -759,6 +799,7 @@ be adjusted with a command line option in the future.
|
||||||
|
|
||||||
@node Multi-threaded tar
|
@node Multi-threaded tar
|
||||||
@chapter Limitations of parallel tar decoding
|
@chapter Limitations of parallel tar decoding
|
||||||
|
@cindex parallel tar decoding
|
||||||
|
|
||||||
Safely decoding an arbitrary tar archive in parallel is impossible. For
|
Safely decoding an arbitrary tar archive in parallel is impossible. For
|
||||||
example, if a tar archive containing another tar archive is decoded starting
|
example, if a tar archive containing another tar archive is decoded starting
|
||||||
|
|
|
@ -47,6 +47,7 @@ bool Exclude::excluded( const char * const filename )
|
||||||
while( *p )
|
while( *p )
|
||||||
{
|
{
|
||||||
for( unsigned i = 0; i < patterns.size(); ++i )
|
for( unsigned i = 0; i < patterns.size(); ++i )
|
||||||
|
// ignore a trailing sequence starting with '/' in filename
|
||||||
#ifdef FNM_LEADING_DIR
|
#ifdef FNM_LEADING_DIR
|
||||||
if( fnmatch( patterns[i].c_str(), p, FNM_LEADING_DIR ) == 0 ) return true;
|
if( fnmatch( patterns[i].c_str(), p, FNM_LEADING_DIR ) == 0 ) return true;
|
||||||
#else
|
#else
|
||||||
|
|
203
extract.cc
|
@ -32,9 +32,7 @@
|
||||||
#include <utime.h>
|
#include <utime.h>
|
||||||
#include <sys/stat.h>
|
#include <sys/stat.h>
|
||||||
#include <sys/types.h>
|
#include <sys/types.h>
|
||||||
#if defined(__GNU_LIBRARY__)
|
|
||||||
#include <sys/sysmacros.h> // for major, minor, makedev
|
#include <sys/sysmacros.h> // for major, minor, makedev
|
||||||
#endif
|
|
||||||
#include <lzlib.h>
|
#include <lzlib.h>
|
||||||
|
|
||||||
#include "arg_parser.h"
|
#include "arg_parser.h"
|
||||||
|
@ -181,131 +179,6 @@ int archive_read( const char * const archive_namep, const int infd,
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
enum { mode_string_size = 10,
|
|
||||||
group_string_size = 1 + uname_l + 1 + gname_l + 1 }; // 67
|
|
||||||
|
|
||||||
void format_mode_string( const Tar_header header, char buf[mode_string_size] )
|
|
||||||
{
|
|
||||||
const Typeflag typeflag = (Typeflag)header[typeflag_o];
|
|
||||||
|
|
||||||
std::memcpy( buf, "----------", mode_string_size );
|
|
||||||
switch( typeflag )
|
|
||||||
{
|
|
||||||
case tf_regular: break;
|
|
||||||
case tf_link: buf[0] = 'h'; break;
|
|
||||||
case tf_symlink: buf[0] = 'l'; break;
|
|
||||||
case tf_chardev: buf[0] = 'c'; break;
|
|
||||||
case tf_blockdev: buf[0] = 'b'; break;
|
|
||||||
case tf_directory: buf[0] = 'd'; break;
|
|
||||||
case tf_fifo: buf[0] = 'p'; break;
|
|
||||||
case tf_hiperf: buf[0] = 'C'; break;
|
|
||||||
default: buf[0] = '?';
|
|
||||||
}
|
|
||||||
const mode_t mode = parse_octal( header + mode_o, mode_l ); // 12 bits
|
|
||||||
const bool setuid = mode & S_ISUID;
|
|
||||||
const bool setgid = mode & S_ISGID;
|
|
||||||
const bool sticky = mode & S_ISVTX;
|
|
||||||
if( mode & S_IRUSR ) buf[1] = 'r';
|
|
||||||
if( mode & S_IWUSR ) buf[2] = 'w';
|
|
||||||
if( mode & S_IXUSR ) buf[3] = setuid ? 's' : 'x';
|
|
||||||
else if( setuid ) buf[3] = 'S';
|
|
||||||
if( mode & S_IRGRP ) buf[4] = 'r';
|
|
||||||
if( mode & S_IWGRP ) buf[5] = 'w';
|
|
||||||
if( mode & S_IXGRP ) buf[6] = setgid ? 's' : 'x';
|
|
||||||
else if( setgid ) buf[6] = 'S';
|
|
||||||
if( mode & S_IROTH ) buf[7] = 'r';
|
|
||||||
if( mode & S_IWOTH ) buf[8] = 'w';
|
|
||||||
if( mode & S_IXOTH ) buf[9] = sticky ? 't' : 'x';
|
|
||||||
else if( sticky ) buf[9] = 'T';
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
int format_user_group_string( const Tar_header header,
|
|
||||||
char buf[group_string_size] )
|
|
||||||
{
|
|
||||||
int len;
|
|
||||||
if( header[uname_o] && header[gname_o] )
|
|
||||||
len = snprintf( buf, group_string_size,
|
|
||||||
" %.32s/%.32s", header + uname_o, header + gname_o );
|
|
||||||
else
|
|
||||||
{
|
|
||||||
const unsigned uid = parse_octal( header + uid_o, uid_l );
|
|
||||||
const unsigned gid = parse_octal( header + gid_o, gid_l );
|
|
||||||
len = snprintf( buf, group_string_size, " %u/%u", uid, gid );
|
|
||||||
}
|
|
||||||
return len;
|
|
||||||
}
|
|
||||||
|
|
||||||
} // end namespace
|
|
||||||
|
|
||||||
bool block_is_zero( const uint8_t * const buf, const int size )
|
|
||||||
{
|
|
||||||
for( int i = 0; i < size; ++i ) if( buf[i] != 0 ) return false;
|
|
||||||
return true;
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
bool format_member_name( const Extended & extended, const Tar_header header,
|
|
||||||
Resizable_buffer & rbuf, const bool long_format )
|
|
||||||
{
|
|
||||||
if( long_format )
|
|
||||||
{
|
|
||||||
format_mode_string( header, rbuf() );
|
|
||||||
const int group_string_len =
|
|
||||||
format_user_group_string( header, rbuf() + mode_string_size );
|
|
||||||
int offset = mode_string_size + group_string_len;
|
|
||||||
const time_t mtime = parse_octal( header + mtime_o, mtime_l ); // 33 bits
|
|
||||||
struct tm tms;
|
|
||||||
const struct tm * tm = localtime_r( &mtime, &tms );
|
|
||||||
if( !tm )
|
|
||||||
{ time_t z = 0; tm = localtime_r( &z, &tms ); if( !tm ) tm = &tms; }
|
|
||||||
const Typeflag typeflag = (Typeflag)header[typeflag_o];
|
|
||||||
const bool islink = ( typeflag == tf_link || typeflag == tf_symlink );
|
|
||||||
const char * const link_string = !islink ? "" :
|
|
||||||
( ( typeflag == tf_link ) ? " link to " : " -> " );
|
|
||||||
if( typeflag == tf_chardev || typeflag == tf_blockdev )
|
|
||||||
offset += snprintf( rbuf() + offset, rbuf.size() - offset, " %5u,%u",
|
|
||||||
(unsigned)parse_octal( header + devmajor_o, devmajor_l ),
|
|
||||||
(unsigned)parse_octal( header + devminor_o, devminor_l ) );
|
|
||||||
else
|
|
||||||
offset += snprintf( rbuf() + offset, rbuf.size() - offset, " %9llu",
|
|
||||||
extended.file_size() );
|
|
||||||
for( int i = 0; i < 2; ++i )
|
|
||||||
{
|
|
||||||
const int len = snprintf( rbuf() + offset, rbuf.size() - offset,
|
|
||||||
" %4d-%02u-%02u %02u:%02u %s%s%s\n",
|
|
||||||
1900 + tm->tm_year, 1 + tm->tm_mon, tm->tm_mday,
|
|
||||||
tm->tm_hour, tm->tm_min, extended.path().c_str(),
|
|
||||||
link_string, islink ? extended.linkpath().c_str() : "" );
|
|
||||||
if( (int)rbuf.size() > len + offset ) break;
|
|
||||||
if( !rbuf.resize( len + offset + 1 ) ) return false;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
else
|
|
||||||
{
|
|
||||||
if( rbuf.size() < extended.path().size() + 2 &&
|
|
||||||
!rbuf.resize( extended.path().size() + 2 ) ) return false;
|
|
||||||
snprintf( rbuf(), rbuf.size(), "%s\n", extended.path().c_str() );
|
|
||||||
}
|
|
||||||
return true;
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
bool show_member_name( const Extended & extended, const Tar_header header,
|
|
||||||
const int vlevel, Resizable_buffer & rbuf )
|
|
||||||
{
|
|
||||||
if( verbosity >= vlevel )
|
|
||||||
{
|
|
||||||
if( !format_member_name( extended, header, rbuf, verbosity > vlevel ) )
|
|
||||||
{ show_error( mem_msg ); return false; }
|
|
||||||
std::fputs( rbuf(), stdout );
|
|
||||||
std::fflush( stdout );
|
|
||||||
}
|
|
||||||
return true;
|
|
||||||
}
|
|
||||||
|
|
||||||
namespace {
|
|
||||||
|
|
||||||
int skip_member( const char * const archive_namep, const int infd,
|
int skip_member( const char * const archive_namep, const int infd,
|
||||||
const Extended & extended )
|
const Extended & extended )
|
||||||
{
|
{
|
||||||
|
@ -498,7 +371,6 @@ int extract_member( const char * const archive_namep, const int infd,
|
||||||
case tf_hiperf:
|
case tf_hiperf:
|
||||||
outfd = open_outstream( filename );
|
outfd = open_outstream( filename );
|
||||||
if( outfd < 0 ) return 2;
|
if( outfd < 0 ) return 2;
|
||||||
chmod( filename, mode ); // ignore errors
|
|
||||||
break;
|
break;
|
||||||
case tf_link:
|
case tf_link:
|
||||||
case tf_symlink:
|
case tf_symlink:
|
||||||
|
@ -559,6 +431,9 @@ int extract_member( const char * const archive_namep, const int infd,
|
||||||
return 2;
|
return 2;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
if( typeflag == tf_regular || typeflag == tf_hiperf )
|
||||||
|
fchmod( outfd, mode ); // ignore errors
|
||||||
|
|
||||||
const int bufsize = 32 * header_size;
|
const int bufsize = 32 * header_size;
|
||||||
uint8_t buf[bufsize];
|
uint8_t buf[bufsize];
|
||||||
long long rest = extended.file_size();
|
long long rest = extended.file_size();
|
||||||
|
@ -597,30 +472,6 @@ int extract_member( const char * const archive_namep, const int infd,
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
|
|
||||||
} // end namespace
|
|
||||||
|
|
||||||
|
|
||||||
// return true if dir is a parent directory of name
|
|
||||||
bool compare_prefix_dir( const char * const dir, const char * const name )
|
|
||||||
{
|
|
||||||
int len = 0;
|
|
||||||
while( dir[len] && dir[len] == name[len] ) ++len;
|
|
||||||
return ( !dir[len] && len > 0 && ( dir[len-1] == '/' || name[len] == '/' ) );
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
// compare two file names ignoring trailing slashes
|
|
||||||
bool compare_tslash( const char * const name1, const char * const name2 )
|
|
||||||
{
|
|
||||||
const char * p = name1;
|
|
||||||
const char * q = name2;
|
|
||||||
while( *p && *p == *q ) { ++p; ++q; }
|
|
||||||
while( *p == '/' ) ++p;
|
|
||||||
while( *q == '/' ) ++q;
|
|
||||||
return ( !*p && !*q );
|
|
||||||
}
|
|
||||||
|
|
||||||
namespace {
|
|
||||||
|
|
||||||
bool parse_records( const char * const archive_namep, const int infd,
|
bool parse_records( const char * const archive_namep, const int infd,
|
||||||
Extended & extended, const Tar_header header,
|
Extended & extended, const Tar_header header,
|
||||||
|
@ -638,54 +489,6 @@ bool parse_records( const char * const archive_namep, const int infd,
|
||||||
} // end namespace
|
} // end namespace
|
||||||
|
|
||||||
|
|
||||||
/* Returns the number of bytes really read.
|
|
||||||
If (returned value < size) and (errno == 0), means EOF was reached.
|
|
||||||
*/
|
|
||||||
int readblock( const int fd, uint8_t * const buf, const int size )
|
|
||||||
{
|
|
||||||
int sz = 0;
|
|
||||||
errno = 0;
|
|
||||||
while( sz < size )
|
|
||||||
{
|
|
||||||
const int n = read( fd, buf + sz, size - sz );
|
|
||||||
if( n > 0 ) sz += n;
|
|
||||||
else if( n == 0 ) break; // EOF
|
|
||||||
else if( errno != EINTR ) break;
|
|
||||||
errno = 0;
|
|
||||||
}
|
|
||||||
return sz;
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
/* Returns the number of bytes really written.
|
|
||||||
If (returned value < size), it is always an error.
|
|
||||||
*/
|
|
||||||
int writeblock( const int fd, const uint8_t * const buf, const int size )
|
|
||||||
{
|
|
||||||
int sz = 0;
|
|
||||||
errno = 0;
|
|
||||||
while( sz < size )
|
|
||||||
{
|
|
||||||
const int n = write( fd, buf + sz, size - sz );
|
|
||||||
if( n > 0 ) sz += n;
|
|
||||||
else if( n < 0 && errno != EINTR ) break;
|
|
||||||
errno = 0;
|
|
||||||
}
|
|
||||||
return sz;
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
unsigned long long parse_octal( const uint8_t * const ptr, const int size )
|
|
||||||
{
|
|
||||||
unsigned long long result = 0;
|
|
||||||
int i = 0;
|
|
||||||
while( i < size && std::isspace( ptr[i] ) ) ++i;
|
|
||||||
for( ; i < size && ptr[i] >= '0' && ptr[i] <= '7'; ++i )
|
|
||||||
{ result <<= 3; result += ptr[i] - '0'; }
|
|
||||||
return result;
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
int decode( const std::string & archive_name, const Arg_parser & parser,
|
int decode( const std::string & archive_name, const Arg_parser & parser,
|
||||||
const int filenames, const int num_workers, const int debug_level,
|
const int filenames, const int num_workers, const int debug_level,
|
||||||
const Program_mode program_mode, const bool ignore_ids,
|
const Program_mode program_mode, const bool ignore_ids,
|
||||||
|
|
267
list_lz.cc
|
@ -36,6 +36,8 @@
|
||||||
#include "tarlz.h"
|
#include "tarlz.h"
|
||||||
|
|
||||||
|
|
||||||
|
namespace {
|
||||||
|
|
||||||
// Returns the number of bytes really read.
|
// Returns the number of bytes really read.
|
||||||
// If (returned value < size) and (errno == 0), means EOF was reached.
|
// If (returned value < size) and (errno == 0), means EOF was reached.
|
||||||
//
|
//
|
||||||
|
@ -55,7 +57,7 @@ int preadblock( const int fd, uint8_t * const buf, const int size,
|
||||||
return sz;
|
return sz;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
// Returns the number of bytes really written.
|
// Returns the number of bytes really written.
|
||||||
// If (returned value < size), it is always an error.
|
// If (returned value < size), it is always an error.
|
||||||
//
|
//
|
||||||
|
@ -73,183 +75,7 @@ int pwriteblock( const int fd, const uint8_t * const buf, const int size,
|
||||||
}
|
}
|
||||||
return sz;
|
return sz;
|
||||||
}
|
}
|
||||||
|
*/
|
||||||
|
|
||||||
void xinit_mutex( pthread_mutex_t * const mutex )
|
|
||||||
{
|
|
||||||
const int errcode = pthread_mutex_init( mutex, 0 );
|
|
||||||
if( errcode )
|
|
||||||
{ show_error( "pthread_mutex_init", errcode ); cleanup_and_fail(); }
|
|
||||||
}
|
|
||||||
|
|
||||||
void xinit_cond( pthread_cond_t * const cond )
|
|
||||||
{
|
|
||||||
const int errcode = pthread_cond_init( cond, 0 );
|
|
||||||
if( errcode )
|
|
||||||
{ show_error( "pthread_cond_init", errcode ); cleanup_and_fail(); }
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
void xdestroy_mutex( pthread_mutex_t * const mutex )
|
|
||||||
{
|
|
||||||
const int errcode = pthread_mutex_destroy( mutex );
|
|
||||||
if( errcode )
|
|
||||||
{ show_error( "pthread_mutex_destroy", errcode ); cleanup_and_fail(); }
|
|
||||||
}
|
|
||||||
|
|
||||||
void xdestroy_cond( pthread_cond_t * const cond )
|
|
||||||
{
|
|
||||||
const int errcode = pthread_cond_destroy( cond );
|
|
||||||
if( errcode )
|
|
||||||
{ show_error( "pthread_cond_destroy", errcode ); cleanup_and_fail(); }
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
void xlock( pthread_mutex_t * const mutex )
|
|
||||||
{
|
|
||||||
const int errcode = pthread_mutex_lock( mutex );
|
|
||||||
if( errcode )
|
|
||||||
{ show_error( "pthread_mutex_lock", errcode ); cleanup_and_fail(); }
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
void xunlock( pthread_mutex_t * const mutex )
|
|
||||||
{
|
|
||||||
const int errcode = pthread_mutex_unlock( mutex );
|
|
||||||
if( errcode )
|
|
||||||
{ show_error( "pthread_mutex_unlock", errcode ); cleanup_and_fail(); }
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
void xwait( pthread_cond_t * const cond, pthread_mutex_t * const mutex )
|
|
||||||
{
|
|
||||||
const int errcode = pthread_cond_wait( cond, mutex );
|
|
||||||
if( errcode )
|
|
||||||
{ show_error( "pthread_cond_wait", errcode ); cleanup_and_fail(); }
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
void xsignal( pthread_cond_t * const cond )
|
|
||||||
{
|
|
||||||
const int errcode = pthread_cond_signal( cond );
|
|
||||||
if( errcode )
|
|
||||||
{ show_error( "pthread_cond_signal", errcode ); cleanup_and_fail(); }
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
void xbroadcast( pthread_cond_t * const cond )
|
|
||||||
{
|
|
||||||
const int errcode = pthread_cond_broadcast( cond );
|
|
||||||
if( errcode )
|
|
||||||
{ show_error( "pthread_cond_broadcast", errcode ); cleanup_and_fail(); }
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
bool check_skip_filename( const Arg_parser & parser,
|
|
||||||
std::vector< char > & name_pending,
|
|
||||||
const char * const filename, const int filenames )
|
|
||||||
{
|
|
||||||
if( Exclude::excluded( filename ) ) return true; // skip excluded files
|
|
||||||
bool skip = filenames > 0;
|
|
||||||
if( skip )
|
|
||||||
for( int i = 0; i < parser.arguments(); ++i )
|
|
||||||
if( !parser.code( i ) && parser.argument( i ).size() )
|
|
||||||
{
|
|
||||||
const char * const name =
|
|
||||||
remove_leading_dotslash( parser.argument( i ).c_str() );
|
|
||||||
if( compare_prefix_dir( name, filename ) ||
|
|
||||||
compare_tslash( name, filename ) )
|
|
||||||
{ skip = false; name_pending[i] = false; break; }
|
|
||||||
}
|
|
||||||
return skip;
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
/* Return value: 0 = OK, 1 = damaged member, 2 = fatal error. */
|
|
||||||
int archive_read_lz( LZ_Decoder * const decoder, const int infd,
|
|
||||||
long long & file_pos, const long long member_end,
|
|
||||||
const long long cdata_size, uint8_t * const buf,
|
|
||||||
const int size, const char ** msg )
|
|
||||||
{
|
|
||||||
int sz = 0;
|
|
||||||
|
|
||||||
while( sz < size )
|
|
||||||
{
|
|
||||||
const int rd = LZ_decompress_read( decoder, buf + sz, size - sz );
|
|
||||||
if( rd < 0 )
|
|
||||||
{ *msg = LZ_strerror( LZ_decompress_errno( decoder ) ); return 1; }
|
|
||||||
if( rd == 0 && LZ_decompress_finished( decoder ) == 1 )
|
|
||||||
{ *msg = end_msg; return 2; }
|
|
||||||
sz += rd;
|
|
||||||
if( sz < size && LZ_decompress_write_size( decoder ) > 0 )
|
|
||||||
{
|
|
||||||
const long long ibuf_size = 16384; // try 65536
|
|
||||||
uint8_t ibuf[ibuf_size];
|
|
||||||
const long long rest = ( file_pos < member_end ) ?
|
|
||||||
member_end - file_pos : cdata_size - file_pos;
|
|
||||||
const int rsize = std::min( LZ_decompress_write_size( decoder ),
|
|
||||||
(int)std::min( ibuf_size, rest ) );
|
|
||||||
if( rsize <= 0 ) LZ_decompress_finish( decoder );
|
|
||||||
else
|
|
||||||
{
|
|
||||||
const int rd = preadblock( infd, ibuf, rsize, file_pos );
|
|
||||||
if( LZ_decompress_write( decoder, ibuf, rd ) != rd )
|
|
||||||
internal_error( "library error (LZ_decompress_write)." );
|
|
||||||
file_pos += rd;
|
|
||||||
if( rd < rsize )
|
|
||||||
{
|
|
||||||
LZ_decompress_finish( decoder );
|
|
||||||
if( errno ) { *msg = "Error reading archive"; return 2; }
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
return 0;
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
int parse_records_lz( LZ_Decoder * const decoder, const int infd,
|
|
||||||
long long & file_pos, const long long member_end,
|
|
||||||
const long long cdata_size, long long & data_pos,
|
|
||||||
Extended & extended, const Tar_header header,
|
|
||||||
Resizable_buffer & rbuf, const char ** msg,
|
|
||||||
const bool permissive )
|
|
||||||
{
|
|
||||||
const long long edsize = parse_octal( header + size_o, size_l );
|
|
||||||
const long long bufsize = round_up( edsize );
|
|
||||||
if( edsize <= 0 || edsize >= 1LL << 33 || bufsize >= INT_MAX )
|
|
||||||
return 1; // overflow or no extended data
|
|
||||||
if( !rbuf.resize( bufsize ) ) return 1; // extended records buffer
|
|
||||||
int retval = archive_read_lz( decoder, infd, file_pos, member_end,
|
|
||||||
cdata_size, (uint8_t *)rbuf(), bufsize, msg );
|
|
||||||
if( retval == 0 )
|
|
||||||
{ if( extended.parse( rbuf(), edsize, permissive ) ) data_pos += bufsize;
|
|
||||||
else retval = 1; }
|
|
||||||
return retval;
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
int skip_member_lz( LZ_Decoder * const decoder, const int infd,
|
|
||||||
long long & file_pos, const long long member_end,
|
|
||||||
const long long cdata_size, long long & data_pos,
|
|
||||||
long long rest, const char ** msg )
|
|
||||||
{
|
|
||||||
const int bufsize = 32 * header_size;
|
|
||||||
uint8_t buf[bufsize];
|
|
||||||
while( rest > 0 ) // skip tar member
|
|
||||||
{
|
|
||||||
const int rsize = ( rest >= bufsize ) ? bufsize : rest;
|
|
||||||
const int ret = archive_read_lz( decoder, infd, file_pos, member_end,
|
|
||||||
cdata_size, buf, rsize, msg );
|
|
||||||
if( ret != 0 ) return ret;
|
|
||||||
data_pos += rsize;
|
|
||||||
rest -= rsize;
|
|
||||||
}
|
|
||||||
return 0;
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
namespace {
|
|
||||||
|
|
||||||
struct Packet // member name and metadata or error message
|
struct Packet // member name and metadata or error message
|
||||||
{
|
{
|
||||||
|
@ -606,6 +432,91 @@ void muxer( const char * const archive_namep, Packet_courier & courier )
|
||||||
} // end namespace
|
} // end namespace
|
||||||
|
|
||||||
|
|
||||||
|
/* Read 'size' decompressed bytes from the archive.
|
||||||
|
Return value: 0 = OK, 1 = damaged member, 2 = fatal error. */
|
||||||
|
int archive_read_lz( LZ_Decoder * const decoder, const int infd,
|
||||||
|
long long & file_pos, const long long member_end,
|
||||||
|
const long long cdata_size, uint8_t * const buf,
|
||||||
|
const int size, const char ** msg )
|
||||||
|
{
|
||||||
|
int sz = 0;
|
||||||
|
|
||||||
|
while( sz < size )
|
||||||
|
{
|
||||||
|
const int rd = LZ_decompress_read( decoder, buf + sz, size - sz );
|
||||||
|
if( rd < 0 )
|
||||||
|
{ *msg = LZ_strerror( LZ_decompress_errno( decoder ) ); return 1; }
|
||||||
|
if( rd == 0 && LZ_decompress_finished( decoder ) == 1 )
|
||||||
|
{ *msg = end_msg; return 2; }
|
||||||
|
sz += rd;
|
||||||
|
if( sz < size && LZ_decompress_write_size( decoder ) > 0 )
|
||||||
|
{
|
||||||
|
const long long ibuf_size = 16384; // try 65536
|
||||||
|
uint8_t ibuf[ibuf_size];
|
||||||
|
const long long rest = ( file_pos < member_end ) ?
|
||||||
|
member_end - file_pos : cdata_size - file_pos;
|
||||||
|
const int rsize = std::min( LZ_decompress_write_size( decoder ),
|
||||||
|
(int)std::min( ibuf_size, rest ) );
|
||||||
|
if( rsize <= 0 ) LZ_decompress_finish( decoder );
|
||||||
|
else
|
||||||
|
{
|
||||||
|
const int rd = preadblock( infd, ibuf, rsize, file_pos );
|
||||||
|
if( LZ_decompress_write( decoder, ibuf, rd ) != rd )
|
||||||
|
internal_error( "library error (LZ_decompress_write)." );
|
||||||
|
file_pos += rd;
|
||||||
|
if( rd < rsize )
|
||||||
|
{
|
||||||
|
LZ_decompress_finish( decoder );
|
||||||
|
if( errno ) { *msg = "Error reading archive"; return 2; }
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
int parse_records_lz( LZ_Decoder * const decoder, const int infd,
|
||||||
|
long long & file_pos, const long long member_end,
|
||||||
|
const long long cdata_size, long long & data_pos,
|
||||||
|
Extended & extended, const Tar_header header,
|
||||||
|
Resizable_buffer & rbuf, const char ** msg,
|
||||||
|
const bool permissive )
|
||||||
|
{
|
||||||
|
const long long edsize = parse_octal( header + size_o, size_l );
|
||||||
|
const long long bufsize = round_up( edsize );
|
||||||
|
if( edsize <= 0 || edsize >= 1LL << 33 || bufsize >= INT_MAX )
|
||||||
|
return 1; // overflow or no extended data
|
||||||
|
if( !rbuf.resize( bufsize ) ) return 1; // extended records buffer
|
||||||
|
int retval = archive_read_lz( decoder, infd, file_pos, member_end,
|
||||||
|
cdata_size, (uint8_t *)rbuf(), bufsize, msg );
|
||||||
|
if( retval == 0 )
|
||||||
|
{ if( extended.parse( rbuf(), edsize, permissive ) ) data_pos += bufsize;
|
||||||
|
else retval = 1; }
|
||||||
|
return retval;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
int skip_member_lz( LZ_Decoder * const decoder, const int infd,
|
||||||
|
long long & file_pos, const long long member_end,
|
||||||
|
const long long cdata_size, long long & data_pos,
|
||||||
|
long long rest, const char ** msg )
|
||||||
|
{
|
||||||
|
const int bufsize = 32 * header_size;
|
||||||
|
uint8_t buf[bufsize];
|
||||||
|
while( rest > 0 ) // skip tar member
|
||||||
|
{
|
||||||
|
const int rsize = ( rest >= bufsize ) ? bufsize : rest;
|
||||||
|
const int ret = archive_read_lz( decoder, infd, file_pos, member_end,
|
||||||
|
cdata_size, buf, rsize, msg );
|
||||||
|
if( ret != 0 ) return ret;
|
||||||
|
data_pos += rsize;
|
||||||
|
rest -= rsize;
|
||||||
|
}
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
// init the courier, then start the workers and call the muxer.
|
// init the courier, then start the workers and call the muxer.
|
||||||
int list_lz( const char * const archive_namep, const Arg_parser & parser,
|
int list_lz( const char * const archive_namep, const Arg_parser & parser,
|
||||||
std::vector< char > & name_pending, const Lzip_index & lzip_index,
|
std::vector< char > & name_pending, const Lzip_index & lzip_index,
|
||||||
|
|
23
main.cc
|
@ -60,7 +60,7 @@ namespace {
|
||||||
|
|
||||||
const char * const program_name = "tarlz";
|
const char * const program_name = "tarlz";
|
||||||
const char * const program_year = "2019";
|
const char * const program_year = "2019";
|
||||||
const char * invocation_name = 0;
|
const char * invocation_name = program_name; // default value
|
||||||
bool dereference = false;
|
bool dereference = false;
|
||||||
|
|
||||||
|
|
||||||
|
@ -68,14 +68,17 @@ void show_help( const long num_online )
|
||||||
{
|
{
|
||||||
std::printf( "Tarlz is a massively parallel (multi-threaded) combined implementation of\n"
|
std::printf( "Tarlz is a massively parallel (multi-threaded) combined implementation of\n"
|
||||||
"the tar archiver and the lzip compressor. Tarlz creates, lists and extracts\n"
|
"the tar archiver and the lzip compressor. Tarlz creates, lists and extracts\n"
|
||||||
"archives in a simplified posix pax format compressed with lzip, keeping the\n"
|
"archives in a simplified and safer variant of the POSIX pax format\n"
|
||||||
"alignment between tar members and lzip members. This method adds an indexed\n"
|
"compressed with lzip, keeping the alignment between tar members and lzip\n"
|
||||||
"lzip layer on top of the tar archive, making it possible to decode the\n"
|
"members. The resulting multimember tar.lz archive is fully backward\n"
|
||||||
"archive safely in parallel. The resulting multimember tar.lz archive is\n"
|
"compatible with standard tar tools like GNU tar, which treat it like any\n"
|
||||||
"fully backward compatible with standard tar tools like GNU tar, which treat\n"
|
"other tar.lz archive. Tarlz can append files to the end of such compressed\n"
|
||||||
"it like any other tar.lz archive. Tarlz can append files to the end of such\n"
|
"archives.\n"
|
||||||
"compressed archives.\n"
|
"\nKeeping the alignment between tar members and lzip members has two\n"
|
||||||
"\nThe tarlz file format is a safe posix-style backup format. In case of\n"
|
"advantages. It adds an indexed lzip layer on top of the tar archive, making\n"
|
||||||
|
"it possible to decode the archive safely in parallel. It also minimizes the\n"
|
||||||
|
"amount of data lost in case of corruption.\n"
|
||||||
|
"\nThe tarlz file format is a safe POSIX-style backup format. In case of\n"
|
||||||
"corruption, tarlz can extract all the undamaged members from the tar.lz\n"
|
"corruption, tarlz can extract all the undamaged members from the tar.lz\n"
|
||||||
"archive, skipping over the damaged members, just like the standard\n"
|
"archive, skipping over the damaged members, just like the standard\n"
|
||||||
"(uncompressed) tar. Moreover, the option '--keep-damaged' can be used to\n"
|
"(uncompressed) tar. Moreover, the option '--keep-damaged' can be used to\n"
|
||||||
|
@ -305,7 +308,7 @@ int main( const int argc, const char * const argv[] )
|
||||||
bool keep_damaged = false;
|
bool keep_damaged = false;
|
||||||
bool missing_crc = false;
|
bool missing_crc = false;
|
||||||
bool permissive = false;
|
bool permissive = false;
|
||||||
invocation_name = argv[0];
|
if( argc > 0 ) invocation_name = argv[0];
|
||||||
|
|
||||||
if( LZ_version()[0] < '1' )
|
if( LZ_version()[0] < '1' )
|
||||||
{ show_error( "Bad library version. At least lzlib 1.0 is required." );
|
{ show_error( "Bad library version. At least lzlib 1.0 is required." );
|
||||||
|
|
48
tarlz.h
|
@ -317,6 +317,31 @@ const char * const fv_msg3 = "Format violation: consecutive extended headers fou
|
||||||
const char * const posix_msg = "This does not look like a POSIX tar archive.";
|
const char * const posix_msg = "This does not look like a POSIX tar archive.";
|
||||||
const char * const posix_lz_msg = "This does not look like a POSIX tar.lz archive.";
|
const char * const posix_lz_msg = "This does not look like a POSIX tar.lz archive.";
|
||||||
|
|
||||||
|
// defined in common.cc
|
||||||
|
void xinit_mutex( pthread_mutex_t * const mutex );
|
||||||
|
void xinit_cond( pthread_cond_t * const cond );
|
||||||
|
void xdestroy_mutex( pthread_mutex_t * const mutex );
|
||||||
|
void xdestroy_cond( pthread_cond_t * const cond );
|
||||||
|
void xlock( pthread_mutex_t * const mutex );
|
||||||
|
void xunlock( pthread_mutex_t * const mutex );
|
||||||
|
void xwait( pthread_cond_t * const cond, pthread_mutex_t * const mutex );
|
||||||
|
void xsignal( pthread_cond_t * const cond );
|
||||||
|
void xbroadcast( pthread_cond_t * const cond );
|
||||||
|
unsigned long long parse_octal( const uint8_t * const ptr, const int size );
|
||||||
|
int readblock( const int fd, uint8_t * const buf, const int size );
|
||||||
|
int writeblock( const int fd, const uint8_t * const buf, const int size );
|
||||||
|
|
||||||
|
// defined in common_decode.cc
|
||||||
|
class Arg_parser;
|
||||||
|
bool block_is_zero( const uint8_t * const buf, const int size );
|
||||||
|
bool format_member_name( const Extended & extended, const Tar_header header,
|
||||||
|
Resizable_buffer & rbuf, const bool long_format );
|
||||||
|
bool show_member_name( const Extended & extended, const Tar_header header,
|
||||||
|
const int vlevel, Resizable_buffer & rbuf );
|
||||||
|
bool check_skip_filename( const Arg_parser & parser,
|
||||||
|
std::vector< char > & name_pending,
|
||||||
|
const char * const filename, const int filenames );
|
||||||
|
|
||||||
// defined in create.cc
|
// defined in create.cc
|
||||||
enum Solidity { no_solid, bsolid, dsolid, asolid, solid };
|
enum Solidity { no_solid, bsolid, dsolid, asolid, solid };
|
||||||
extern int cl_owner;
|
extern int cl_owner;
|
||||||
|
@ -339,7 +364,6 @@ int final_exit_status( int retval, const bool show_msg = true );
|
||||||
unsigned ustar_chksum( const uint8_t * const header );
|
unsigned ustar_chksum( const uint8_t * const header );
|
||||||
bool verify_ustar_chksum( const uint8_t * const header );
|
bool verify_ustar_chksum( const uint8_t * const header );
|
||||||
bool has_lz_ext( const std::string & name );
|
bool has_lz_ext( const std::string & name );
|
||||||
class Arg_parser;
|
|
||||||
int concatenate( const std::string & archive_name, const Arg_parser & parser,
|
int concatenate( const std::string & archive_name, const Arg_parser & parser,
|
||||||
const int filenames );
|
const int filenames );
|
||||||
int encode( const std::string & archive_name, const Arg_parser & parser,
|
int encode( const std::string & archive_name, const Arg_parser & parser,
|
||||||
|
@ -381,16 +405,6 @@ bool excluded( const char * const filename );
|
||||||
// defined in extract.cc
|
// defined in extract.cc
|
||||||
enum Program_mode { m_none, m_append, m_concatenate, m_create, m_delete,
|
enum Program_mode { m_none, m_append, m_concatenate, m_create, m_delete,
|
||||||
m_diff, m_extract, m_list };
|
m_diff, m_extract, m_list };
|
||||||
bool block_is_zero( const uint8_t * const buf, const int size );
|
|
||||||
bool format_member_name( const Extended & extended, const Tar_header header,
|
|
||||||
Resizable_buffer & rbuf, const bool long_format );
|
|
||||||
bool show_member_name( const Extended & extended, const Tar_header header,
|
|
||||||
const int vlevel, Resizable_buffer & rbuf );
|
|
||||||
bool compare_prefix_dir( const char * const dir, const char * const name );
|
|
||||||
bool compare_tslash( const char * const name1, const char * const name2 );
|
|
||||||
int readblock( const int fd, uint8_t * const buf, const int size );
|
|
||||||
int writeblock( const int fd, const uint8_t * const buf, const int size );
|
|
||||||
unsigned long long parse_octal( const uint8_t * const ptr, const int size );
|
|
||||||
int decode( const std::string & archive_name, const Arg_parser & parser,
|
int decode( const std::string & archive_name, const Arg_parser & parser,
|
||||||
const int filenames, const int num_workers, const int debug_level,
|
const int filenames, const int num_workers, const int debug_level,
|
||||||
const Program_mode program_mode, const bool ignore_ids,
|
const Program_mode program_mode, const bool ignore_ids,
|
||||||
|
@ -398,18 +412,6 @@ int decode( const std::string & archive_name, const Arg_parser & parser,
|
||||||
const bool permissive );
|
const bool permissive );
|
||||||
|
|
||||||
// defined in list_lz.cc
|
// defined in list_lz.cc
|
||||||
void xinit_mutex( pthread_mutex_t * const mutex );
|
|
||||||
void xinit_cond( pthread_cond_t * const cond );
|
|
||||||
void xdestroy_mutex( pthread_mutex_t * const mutex );
|
|
||||||
void xdestroy_cond( pthread_cond_t * const cond );
|
|
||||||
void xlock( pthread_mutex_t * const mutex );
|
|
||||||
void xunlock( pthread_mutex_t * const mutex );
|
|
||||||
void xwait( pthread_cond_t * const cond, pthread_mutex_t * const mutex );
|
|
||||||
void xsignal( pthread_cond_t * const cond );
|
|
||||||
void xbroadcast( pthread_cond_t * const cond );
|
|
||||||
bool check_skip_filename( const Arg_parser & parser,
|
|
||||||
std::vector< char > & name_pending,
|
|
||||||
const char * const filename, const int filenames );
|
|
||||||
struct LZ_Decoder;
|
struct LZ_Decoder;
|
||||||
int archive_read_lz( LZ_Decoder * const decoder, const int infd,
|
int archive_read_lz( LZ_Decoder * const decoder, const int infd,
|
||||||
long long & file_pos, const long long member_end,
|
long long & file_pos, const long long member_end,
|
||||||
|
|
|
@ -182,7 +182,7 @@ rm -f test.txt || framework_failure
|
||||||
cmp "${in}" test.txt || test_failed $LINENO
|
cmp "${in}" test.txt || test_failed $LINENO
|
||||||
rm -f test.txt || framework_failure
|
rm -f test.txt || framework_failure
|
||||||
|
|
||||||
# test3 reference files for diff
|
# test3 reference files for -t and -tv (list3, vlist3)
|
||||||
"${TARLZ}" -tf "${test3}" > list3 || test_failed $LINENO
|
"${TARLZ}" -tf "${test3}" > list3 || test_failed $LINENO
|
||||||
"${TARLZ}" -tvf "${test3}" > vlist3 || test_failed $LINENO
|
"${TARLZ}" -tvf "${test3}" > vlist3 || test_failed $LINENO
|
||||||
"${TARLZ}" -tf "${test3_lz}" > out || test_failed $LINENO
|
"${TARLZ}" -tf "${test3_lz}" > out || test_failed $LINENO
|
||||||
|
@ -195,6 +195,8 @@ rm -f out || framework_failure
|
||||||
cat "${testdir}"/rfoo > cfoo || framework_failure
|
cat "${testdir}"/rfoo > cfoo || framework_failure
|
||||||
cat "${testdir}"/rbar > cbar || framework_failure
|
cat "${testdir}"/rbar > cbar || framework_failure
|
||||||
cat "${testdir}"/rbaz > cbaz || framework_failure
|
cat "${testdir}"/rbaz > cbaz || framework_failure
|
||||||
|
|
||||||
|
# test --list and --extract test3
|
||||||
rm -f foo bar baz || framework_failure
|
rm -f foo bar baz || framework_failure
|
||||||
"${TARLZ}" -xf "${test3_lz}" --missing-crc || test_failed $LINENO
|
"${TARLZ}" -xf "${test3_lz}" --missing-crc || test_failed $LINENO
|
||||||
cmp cfoo foo || test_failed $LINENO
|
cmp cfoo foo || test_failed $LINENO
|
||||||
|
@ -253,7 +255,7 @@ for i in "${test3dir}" "${test3dir_lz}" ; do
|
||||||
rm -rf dir || framework_failure
|
rm -rf dir || framework_failure
|
||||||
done
|
done
|
||||||
|
|
||||||
# --exclude
|
# test --extract --exclude
|
||||||
"${TARLZ}" -xf "${test3}" --exclude='f*o' --exclude=baz || test_failed $LINENO
|
"${TARLZ}" -xf "${test3}" --exclude='f*o' --exclude=baz || test_failed $LINENO
|
||||||
[ ! -e foo ] || test_failed $LINENO
|
[ ! -e foo ] || test_failed $LINENO
|
||||||
cmp cbar bar || test_failed $LINENO
|
cmp cbar bar || test_failed $LINENO
|
||||||
|
@ -288,7 +290,7 @@ rm -rf dir || framework_failure
|
||||||
[ ! -e dir ] || test_failed $LINENO
|
[ ! -e dir ] || test_failed $LINENO
|
||||||
rm -rf dir || framework_failure
|
rm -rf dir || framework_failure
|
||||||
|
|
||||||
# eof
|
# test --list and --extract eof
|
||||||
"${TARLZ}" -tvf "${testdir}"/test3_eof1.tar > out 2> /dev/null
|
"${TARLZ}" -tvf "${testdir}"/test3_eof1.tar > out 2> /dev/null
|
||||||
[ $? = 2 ] || test_failed $LINENO
|
[ $? = 2 ] || test_failed $LINENO
|
||||||
diff -u vlist3 out || test_failed $LINENO
|
diff -u vlist3 out || test_failed $LINENO
|
||||||
|
@ -456,8 +458,8 @@ cmp out.tar.lz aout.tar.lz || test_failed $LINENO
|
||||||
cmp "${in_tar_lz}" aout.tar.lz || test_failed $LINENO
|
cmp "${in_tar_lz}" aout.tar.lz || test_failed $LINENO
|
||||||
"${TARLZ}" -A "${in_tar_lz}" "${test3_lz}" > aout.tar.lz || test_failed $LINENO
|
"${TARLZ}" -A "${in_tar_lz}" "${test3_lz}" > aout.tar.lz || test_failed $LINENO
|
||||||
cmp out.tar.lz aout.tar.lz || test_failed $LINENO
|
cmp out.tar.lz aout.tar.lz || test_failed $LINENO
|
||||||
cat "${eof_lz}" > aout.tar.lz || framework_failure # concatenate to empty archive
|
cat "${eof_lz}" > aout.tar.lz || framework_failure
|
||||||
"${TARLZ}" -Aqf aout.tar.lz "${in_tar}"
|
"${TARLZ}" -Aqf aout.tar.lz "${in_tar}" # concatenate to empty archive
|
||||||
[ $? = 2 ] || test_failed $LINENO
|
[ $? = 2 ] || test_failed $LINENO
|
||||||
"${TARLZ}" -Af aout.tar.lz "${in_tar_lz}" "${test3_lz}" || test_failed $LINENO
|
"${TARLZ}" -Af aout.tar.lz "${in_tar_lz}" "${test3_lz}" || test_failed $LINENO
|
||||||
cmp out.tar.lz aout.tar.lz || test_failed $LINENO
|
cmp out.tar.lz aout.tar.lz || test_failed $LINENO
|
||||||
|
@ -609,7 +611,7 @@ cmp cbaz dir1/baz || test_failed $LINENO
|
||||||
rm -rf dir1 || framework_failure
|
rm -rf dir1 || framework_failure
|
||||||
rm -f out.tar.lz aout.tar.lz || framework_failure
|
rm -f out.tar.lz aout.tar.lz || framework_failure
|
||||||
|
|
||||||
# --exclude
|
# test --create --exclude
|
||||||
cat cfoo > foo || framework_failure
|
cat cfoo > foo || framework_failure
|
||||||
cat cbar > bar || framework_failure
|
cat cbar > bar || framework_failure
|
||||||
cat cbaz > baz || framework_failure
|
cat cbaz > baz || framework_failure
|
||||||
|
@ -631,6 +633,31 @@ cmp cfoo foo || test_failed $LINENO
|
||||||
[ ! -e baz ] || test_failed $LINENO
|
[ ! -e baz ] || test_failed $LINENO
|
||||||
rm -f out.tar foo bar baz || framework_failure
|
rm -f out.tar foo bar baz || framework_failure
|
||||||
|
|
||||||
|
# test --diff
|
||||||
|
"${TARLZ}" -xf "${test3_lz}" || test_failed $LINENO
|
||||||
|
"${TARLZ}" --uncompressed -cf out.tar foo || test_failed $LINENO
|
||||||
|
"${TARLZ}" --uncompressed -cf aout.tar foo --anonymous || test_failed $LINENO
|
||||||
|
if cmp out.tar aout.tar > /dev/null ; then
|
||||||
|
printf "\nwarning: '--diff' test can't be run as root."
|
||||||
|
else
|
||||||
|
"${TARLZ}" -df "${test3_lz}" > /dev/null
|
||||||
|
[ $? = 1 ] || test_failed $LINENO
|
||||||
|
"${TARLZ}" -df "${test3_lz}" --ignore-ids || test_failed $LINENO
|
||||||
|
"${TARLZ}" -df "${test3_lz}" --exclude '*' || test_failed $LINENO
|
||||||
|
"${TARLZ}" -df "${in_tar_lz}" --exclude '*' || test_failed $LINENO
|
||||||
|
rm -f bar || framework_failure
|
||||||
|
"${TARLZ}" -df "${test3_lz}" foo baz --ignore-ids || test_failed $LINENO
|
||||||
|
"${TARLZ}" -df "${test3_lz}" --exclude bar --ignore-ids ||
|
||||||
|
test_failed $LINENO
|
||||||
|
rm -f foo baz || framework_failure
|
||||||
|
"${TARLZ}" -q -xf "${test3dir_lz}" || test_failed $LINENO
|
||||||
|
"${TARLZ}" -q -df "${test3dir_lz}" --ignore-ids || test_failed $LINENO
|
||||||
|
"${TARLZ}" -q -df "${test3dir_lz}" dir --ignore-ids || test_failed $LINENO
|
||||||
|
"${TARLZ}" -df "${test3_lz}" --ignore-ids -C dir || test_failed $LINENO
|
||||||
|
rm -rf dir || framework_failure
|
||||||
|
fi
|
||||||
|
rm -f out.tar aout.tar foo bar baz || framework_failure
|
||||||
|
|
||||||
# test --delete
|
# test --delete
|
||||||
for e in "" .lz ; do
|
for e in "" .lz ; do
|
||||||
"${TARLZ}" -A "${in_tar}"$e "${test3}"$e > out.tar$e || test_failed $LINENO $e
|
"${TARLZ}" -A "${in_tar}"$e "${test3}"$e > out.tar$e || test_failed $LINENO $e
|
||||||
|
@ -694,6 +721,10 @@ cat "${in}" > test.txt || framework_failure
|
||||||
"${TARLZ}" -0 -cf out.tar.lz foo bar baz --asolid || test_failed $LINENO
|
"${TARLZ}" -0 -cf out.tar.lz foo bar baz --asolid || test_failed $LINENO
|
||||||
"${TARLZ}" -0 -rf out.tar.lz test.txt || test_failed $LINENO
|
"${TARLZ}" -0 -rf out.tar.lz test.txt || test_failed $LINENO
|
||||||
rm -f foo bar baz test.txt || framework_failure
|
rm -f foo bar baz test.txt || framework_failure
|
||||||
|
for i in foo bar baz ; do
|
||||||
|
"${TARLZ}" -qf out.tar.lz --delete $i
|
||||||
|
[ $? = 2 ] || test_failed $LINENO
|
||||||
|
done
|
||||||
"${TARLZ}" -f out.tar.lz --delete test.txt || test_failed $LINENO
|
"${TARLZ}" -f out.tar.lz --delete test.txt || test_failed $LINENO
|
||||||
"${TARLZ}" -xf out.tar.lz || test_failed $LINENO
|
"${TARLZ}" -xf out.tar.lz || test_failed $LINENO
|
||||||
cmp cfoo foo || test_failed $LINENO
|
cmp cfoo foo || test_failed $LINENO
|
||||||
|
@ -732,7 +763,7 @@ else
|
||||||
printf "\nwarning: skipping link test: 'ln' does not work on your system."
|
printf "\nwarning: skipping link test: 'ln' does not work on your system."
|
||||||
fi
|
fi
|
||||||
rm -f dummy_slink dummy_link dummy_file || framework_failure
|
rm -f dummy_slink dummy_link dummy_file || framework_failure
|
||||||
|
#
|
||||||
if [ "${ln_works}" = yes ] ; then
|
if [ "${ln_works}" = yes ] ; then
|
||||||
mkdir dir || framework_failure
|
mkdir dir || framework_failure
|
||||||
cat cfoo > dir/foo || framework_failure
|
cat cfoo > dir/foo || framework_failure
|
||||||
|
@ -762,7 +793,7 @@ if [ "${ln_works}" = yes ] ; then
|
||||||
done
|
done
|
||||||
fi
|
fi
|
||||||
|
|
||||||
# test --append
|
# test --append compressed
|
||||||
cat cfoo > foo || framework_failure
|
cat cfoo > foo || framework_failure
|
||||||
cat cbar > bar || framework_failure
|
cat cbar > bar || framework_failure
|
||||||
cat cbaz > baz || framework_failure
|
cat cbaz > baz || framework_failure
|
||||||
|
@ -801,7 +832,7 @@ cmp out.tar.lz aout.tar.lz || test_failed $LINENO
|
||||||
cmp out.tar.lz aout.tar.lz || test_failed $LINENO
|
cmp out.tar.lz aout.tar.lz || test_failed $LINENO
|
||||||
rm -f out.tar.lz aout.tar.lz || framework_failure
|
rm -f out.tar.lz aout.tar.lz || framework_failure
|
||||||
|
|
||||||
# --uncompressed
|
# test --append --uncompressed
|
||||||
"${TARLZ}" --un -cf out.tar foo bar baz || test_failed $LINENO
|
"${TARLZ}" --un -cf out.tar foo bar baz || test_failed $LINENO
|
||||||
"${TARLZ}" --un -cf aout.tar foo || test_failed $LINENO
|
"${TARLZ}" --un -cf aout.tar foo || test_failed $LINENO
|
||||||
"${TARLZ}" --un -rf aout.tar foo bar baz --exclude foo || test_failed $LINENO
|
"${TARLZ}" --un -rf aout.tar foo bar baz --exclude foo || test_failed $LINENO
|
||||||
|
@ -837,7 +868,7 @@ cmp out.tar aout.tar || test_failed $LINENO
|
||||||
cmp out.tar aout.tar || test_failed $LINENO
|
cmp out.tar aout.tar || test_failed $LINENO
|
||||||
rm -f out.tar aout.tar || framework_failure
|
rm -f out.tar aout.tar || framework_failure
|
||||||
|
|
||||||
# append to solid archive
|
# test --append to solid archive
|
||||||
"${TARLZ}" --solid -q -0 -cf out.tar.lz "${in}" foo bar || test_failed $LINENO
|
"${TARLZ}" --solid -q -0 -cf out.tar.lz "${in}" foo bar || test_failed $LINENO
|
||||||
"${TARLZ}" -q -tf out.tar.lz || test_failed $LINENO # compressed seekable
|
"${TARLZ}" -q -tf out.tar.lz || test_failed $LINENO # compressed seekable
|
||||||
cat out.tar.lz > aout.tar.lz || framework_failure
|
cat out.tar.lz > aout.tar.lz || framework_failure
|
||||||
|
@ -863,31 +894,7 @@ for i in --asolid --bsolid --dsolid -0 ; do
|
||||||
done
|
done
|
||||||
rm -f foo bar baz || framework_failure
|
rm -f foo bar baz || framework_failure
|
||||||
|
|
||||||
# test --diff
|
# test -c -d -x on directories and links
|
||||||
"${TARLZ}" -xf "${test3_lz}" || test_failed $LINENO
|
|
||||||
"${TARLZ}" --uncompressed -cf out.tar foo || test_failed $LINENO
|
|
||||||
"${TARLZ}" --uncompressed -cf aout.tar foo --anonymous || test_failed $LINENO
|
|
||||||
if cmp out.tar aout.tar > /dev/null ; then
|
|
||||||
printf "\nwarning: --diff test can't be run as root."
|
|
||||||
else
|
|
||||||
"${TARLZ}" -df "${test3_lz}" > /dev/null
|
|
||||||
[ $? = 1 ] || test_failed $LINENO
|
|
||||||
"${TARLZ}" -df "${test3_lz}" --ignore-ids || test_failed $LINENO
|
|
||||||
"${TARLZ}" -df "${test3_lz}" --exclude '*' || test_failed $LINENO
|
|
||||||
"${TARLZ}" -df "${in_tar_lz}" --exclude '*' || test_failed $LINENO
|
|
||||||
rm -f bar || framework_failure
|
|
||||||
"${TARLZ}" -df "${test3_lz}" foo baz --ignore-ids || test_failed $LINENO
|
|
||||||
"${TARLZ}" -df "${test3_lz}" --exclude bar --ignore-ids ||
|
|
||||||
test_failed $LINENO
|
|
||||||
rm -f foo baz || framework_failure
|
|
||||||
"${TARLZ}" -q -xf "${test3dir_lz}" || test_failed $LINENO
|
|
||||||
"${TARLZ}" -q -df "${test3dir_lz}" --ignore-ids || test_failed $LINENO
|
|
||||||
"${TARLZ}" -q -df "${test3dir_lz}" dir --ignore-ids || test_failed $LINENO
|
|
||||||
rm -rf dir || framework_failure
|
|
||||||
fi
|
|
||||||
rm -f out.tar aout.tar foo bar baz || framework_failure
|
|
||||||
|
|
||||||
# test directories and links
|
|
||||||
mkdir dir1 || framework_failure
|
mkdir dir1 || framework_failure
|
||||||
"${TARLZ}" -0 -cf out.tar.lz dir1 || test_failed $LINENO
|
"${TARLZ}" -0 -cf out.tar.lz dir1 || test_failed $LINENO
|
||||||
rmdir dir1 || framework_failure
|
rmdir dir1 || framework_failure
|
||||||
|
@ -984,6 +991,7 @@ rm -f foo || framework_failure
|
||||||
|
|
||||||
printf "\ntesting bad input..."
|
printf "\ntesting bad input..."
|
||||||
|
|
||||||
|
# test --extract ".."
|
||||||
mkdir dir1 || framework_failure
|
mkdir dir1 || framework_failure
|
||||||
cd dir1 || framework_failure
|
cd dir1 || framework_failure
|
||||||
"${TARLZ}" -q -xf "${testdir}"/dotdot1.tar.lz || test_failed $LINENO
|
"${TARLZ}" -q -xf "${testdir}"/dotdot1.tar.lz || test_failed $LINENO
|
||||||
|
@ -999,6 +1007,7 @@ cd dir1 || framework_failure
|
||||||
cd .. || framework_failure
|
cd .. || framework_failure
|
||||||
rm -rf dir1 || framework_failure
|
rm -rf dir1 || framework_failure
|
||||||
|
|
||||||
|
# test --list and --extract truncated tar
|
||||||
dd if="${in_tar}" of=truncated.tar bs=1000 count=1 2> /dev/null
|
dd if="${in_tar}" of=truncated.tar bs=1000 count=1 2> /dev/null
|
||||||
"${TARLZ}" -q -tf truncated.tar > /dev/null
|
"${TARLZ}" -q -tf truncated.tar > /dev/null
|
||||||
[ $? = 2 ] || test_failed $LINENO
|
[ $? = 2 ] || test_failed $LINENO
|
||||||
|
@ -1024,7 +1033,7 @@ for i in 1 2 3 4 ; do
|
||||||
rm -f out.tar.lz foo bar baz || framework_failure
|
rm -f out.tar.lz foo bar baz || framework_failure
|
||||||
done
|
done
|
||||||
|
|
||||||
# test format violations
|
# test --list and --extract format violations
|
||||||
if [ "${ln_works}" = yes ] ; then
|
if [ "${ln_works}" = yes ] ; then
|
||||||
mkdir dir1 || framework_failure
|
mkdir dir1 || framework_failure
|
||||||
"${TARLZ}" -C dir1 -xf "${t155}" || test_failed $LINENO
|
"${TARLZ}" -C dir1 -xf "${t155}" || test_failed $LINENO
|
||||||
|
@ -1059,7 +1068,7 @@ if [ "${ln_works}" = yes ] ; then
|
||||||
rm -rf dir1 || framework_failure
|
rm -rf dir1 || framework_failure
|
||||||
fi
|
fi
|
||||||
|
|
||||||
# test compressed and --keep-damaged
|
# test --extract and --keep-damaged compressed
|
||||||
rm -f test.txt || framework_failure
|
rm -f test.txt || framework_failure
|
||||||
for i in "${inbad1}" "${inbad2}" ; do
|
for i in "${inbad1}" "${inbad2}" ; do
|
||||||
"${TARLZ}" -q -xf "${i}.tar.lz"
|
"${TARLZ}" -q -xf "${i}.tar.lz"
|
||||||
|
@ -1128,7 +1137,7 @@ cmp cfoo foo || test_failed $LINENO
|
||||||
cmp cbar bar || test_failed $LINENO
|
cmp cbar bar || test_failed $LINENO
|
||||||
cmp cbaz baz || test_failed $LINENO
|
cmp cbaz baz || test_failed $LINENO
|
||||||
|
|
||||||
# test uncompressed and --keep-damaged
|
# test --extract and --keep-damaged uncompressed
|
||||||
rm -f test.txt || framework_failure
|
rm -f test.txt || framework_failure
|
||||||
"${TARLZ}" -q -xf "${inbad1}.tar"
|
"${TARLZ}" -q -xf "${inbad1}.tar"
|
||||||
[ $? = 2 ] || test_failed $LINENO
|
[ $? = 2 ] || test_failed $LINENO
|
||||||
|
|