diff --git a/AUTHORS b/AUTHORS index f34d016..dfd16e1 100644 --- a/AUTHORS +++ b/AUTHORS @@ -1,7 +1,7 @@ Lzlib was written by Antonio Diaz Diaz. The ideas embodied in lzlib are due to (at least) the following people: -Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrey Markov (for -the definition of Markov chains), G.N.N. Martin (for the definition of -range encoding), Igor Pavlov (for putting all the above together in -LZMA), and Julian Seward (for bzip2's CLI). +Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrei Markov (for the +definition of Markov chains), G.N.N. Martin (for the definition of range +encoding), Igor Pavlov (for putting all the above together in LZMA), and +Julian Seward (for bzip2's CLI). diff --git a/COPYING b/COPYING index 4ad17ae..a6511c8 100644 --- a/COPYING +++ b/COPYING @@ -1,338 +1,17 @@ - GNU GENERAL PUBLIC LICENSE - Version 2, June 1991 + Lzlib - Compression library for the lzip format + Copyright (C) Antonio Diaz Diaz. - Copyright (C) 1989, 1991 Free Software Foundation, Inc., - 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA - Everyone is permitted to copy and distribute verbatim copies - of this license document, but changing it is not allowed. + This library is free software. Redistribution and use in source and + binary forms, with or without modification, are permitted provided + that the following conditions are met: - Preamble + 1. Redistributions of source code must retain the above copyright + notice, this list of conditions, and the following disclaimer. - The licenses for most software are designed to take away your -freedom to share and change it. By contrast, the GNU General Public -License is intended to guarantee your freedom to share and change free -software--to make sure the software is free for all its users. This -General Public License applies to most of the Free Software -Foundation's software and to any other program whose authors commit to -using it. 
(Some other Free Software Foundation software is covered by -the GNU Lesser General Public License instead.) You can apply it to -your programs, too. + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions, and the following disclaimer in the + documentation and/or other materials provided with the distribution. - When we speak of free software, we are referring to freedom, not -price. Our General Public Licenses are designed to make sure that you -have the freedom to distribute copies of free software (and charge for -this service if you wish), that you receive source code or can get it -if you want it, that you can change the software or use pieces of it -in new free programs; and that you know you can do these things. - - To protect your rights, we need to make restrictions that forbid -anyone to deny you these rights or to ask you to surrender the rights. -These restrictions translate to certain responsibilities for you if you -distribute copies of the software, or if you modify it. - - For example, if you distribute copies of such a program, whether -gratis or for a fee, you must give the recipients all the rights that -you have. You must make sure that they, too, receive or can get the -source code. And you must show them these terms so they know their -rights. - - We protect your rights with two steps: (1) copyright the software, and -(2) offer you this license which gives you legal permission to copy, -distribute and/or modify the software. - - Also, for each author's protection and ours, we want to make certain -that everyone understands that there is no warranty for this free -software. If the software is modified by someone else and passed on, we -want its recipients to know that what they have is not the original, so -that any problems introduced by others will not reflect on the original -authors' reputations. - - Finally, any free program is threatened constantly by software -patents. 
We wish to avoid the danger that redistributors of a free -program will individually obtain patent licenses, in effect making the -program proprietary. To prevent this, we have made it clear that any -patent must be licensed for everyone's free use or not licensed at all. - - The precise terms and conditions for copying, distribution and -modification follow. - - GNU GENERAL PUBLIC LICENSE - TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION - - 0. This License applies to any program or other work which contains -a notice placed by the copyright holder saying it may be distributed -under the terms of this General Public License. The "Program", below, -refers to any such program or work, and a "work based on the Program" -means either the Program or any derivative work under copyright law: -that is to say, a work containing the Program or a portion of it, -either verbatim or with modifications and/or translated into another -language. (Hereinafter, translation is included without limitation in -the term "modification".) Each licensee is addressed as "you". - -Activities other than copying, distribution and modification are not -covered by this License; they are outside its scope. The act of -running the Program is not restricted, and the output from the Program -is covered only if its contents constitute a work based on the -Program (independent of having been made by running the Program). -Whether that is true depends on what the Program does. - - 1. You may copy and distribute verbatim copies of the Program's -source code as you receive it, in any medium, provided that you -conspicuously and appropriately publish on each copy an appropriate -copyright notice and disclaimer of warranty; keep intact all the -notices that refer to this License and to the absence of any warranty; -and give any other recipients of the Program a copy of this License -along with the Program. 
- -You may charge a fee for the physical act of transferring a copy, and -you may at your option offer warranty protection in exchange for a fee. - - 2. You may modify your copy or copies of the Program or any portion -of it, thus forming a work based on the Program, and copy and -distribute such modifications or work under the terms of Section 1 -above, provided that you also meet all of these conditions: - - a) You must cause the modified files to carry prominent notices - stating that you changed the files and the date of any change. - - b) You must cause any work that you distribute or publish, that in - whole or in part contains or is derived from the Program or any - part thereof, to be licensed as a whole at no charge to all third - parties under the terms of this License. - - c) If the modified program normally reads commands interactively - when run, you must cause it, when started running for such - interactive use in the most ordinary way, to print or display an - announcement including an appropriate copyright notice and a - notice that there is no warranty (or else, saying that you provide - a warranty) and that users may redistribute the program under - these conditions, and telling the user how to view a copy of this - License. (Exception: if the Program itself is interactive but - does not normally print such an announcement, your work based on - the Program is not required to print an announcement.) - -These requirements apply to the modified work as a whole. If -identifiable sections of that work are not derived from the Program, -and can be reasonably considered independent and separate works in -themselves, then this License, and its terms, do not apply to those -sections when you distribute them as separate works. 
But when you -distribute the same sections as part of a whole which is a work based -on the Program, the distribution of the whole must be on the terms of -this License, whose permissions for other licensees extend to the -entire whole, and thus to each and every part regardless of who wrote it. - -Thus, it is not the intent of this section to claim rights or contest -your rights to work written entirely by you; rather, the intent is to -exercise the right to control the distribution of derivative or -collective works based on the Program. - -In addition, mere aggregation of another work not based on the Program -with the Program (or with a work based on the Program) on a volume of -a storage or distribution medium does not bring the other work under -the scope of this License. - - 3. You may copy and distribute the Program (or a work based on it, -under Section 2) in object code or executable form under the terms of -Sections 1 and 2 above provided that you also do one of the following: - - a) Accompany it with the complete corresponding machine-readable - source code, which must be distributed under the terms of Sections - 1 and 2 above on a medium customarily used for software interchange; or, - - b) Accompany it with a written offer, valid for at least three - years, to give any third party, for a charge no more than your - cost of physically performing source distribution, a complete - machine-readable copy of the corresponding source code, to be - distributed under the terms of Sections 1 and 2 above on a medium - customarily used for software interchange; or, - - c) Accompany it with the information you received as to the offer - to distribute corresponding source code. (This alternative is - allowed only for noncommercial distribution and only if you - received the program in object code or executable form with such - an offer, in accord with Subsection b above.) 
- -The source code for a work means the preferred form of the work for -making modifications to it. For an executable work, complete source -code means all the source code for all modules it contains, plus any -associated interface definition files, plus the scripts used to -control compilation and installation of the executable. However, as a -special exception, the source code distributed need not include -anything that is normally distributed (in either source or binary -form) with the major components (compiler, kernel, and so on) of the -operating system on which the executable runs, unless that component -itself accompanies the executable. - -If distribution of executable or object code is made by offering -access to copy from a designated place, then offering equivalent -access to copy the source code from the same place counts as -distribution of the source code, even though third parties are not -compelled to copy the source along with the object code. - - 4. You may not copy, modify, sublicense, or distribute the Program -except as expressly provided under this License. Any attempt -otherwise to copy, modify, sublicense or distribute the Program is -void, and will automatically terminate your rights under this License. -However, parties who have received copies, or rights, from you under -this License will not have their licenses terminated so long as such -parties remain in full compliance. - - 5. You are not required to accept this License, since you have not -signed it. However, nothing else grants you permission to modify or -distribute the Program or its derivative works. These actions are -prohibited by law if you do not accept this License. Therefore, by -modifying or distributing the Program (or any work based on the -Program), you indicate your acceptance of this License to do so, and -all its terms and conditions for copying, distributing or modifying -the Program or works based on it. - - 6. 
Each time you redistribute the Program (or any work based on the -Program), the recipient automatically receives a license from the -original licensor to copy, distribute or modify the Program subject to -these terms and conditions. You may not impose any further -restrictions on the recipients' exercise of the rights granted herein. -You are not responsible for enforcing compliance by third parties to -this License. - - 7. If, as a consequence of a court judgment or allegation of patent -infringement or for any other reason (not limited to patent issues), -conditions are imposed on you (whether by court order, agreement or -otherwise) that contradict the conditions of this License, they do not -excuse you from the conditions of this License. If you cannot -distribute so as to satisfy simultaneously your obligations under this -License and any other pertinent obligations, then as a consequence you -may not distribute the Program at all. For example, if a patent -license would not permit royalty-free redistribution of the Program by -all those who receive copies directly or indirectly through you, then -the only way you could satisfy both it and this License would be to -refrain entirely from distribution of the Program. - -If any portion of this section is held invalid or unenforceable under -any particular circumstance, the balance of the section is intended to -apply and the section as a whole is intended to apply in other -circumstances. - -It is not the purpose of this section to induce you to infringe any -patents or other property right claims or to contest validity of any -such claims; this section has the sole purpose of protecting the -integrity of the free software distribution system, which is -implemented by public license practices. 
Many people have made -generous contributions to the wide range of software distributed -through that system in reliance on consistent application of that -system; it is up to the author/donor to decide if he or she is willing -to distribute software through any other system and a licensee cannot -impose that choice. - -This section is intended to make thoroughly clear what is believed to -be a consequence of the rest of this License. - - 8. If the distribution and/or use of the Program is restricted in -certain countries either by patents or by copyrighted interfaces, the -original copyright holder who places the Program under this License -may add an explicit geographical distribution limitation excluding -those countries, so that distribution is permitted only in or among -countries not thus excluded. In such case, this License incorporates -the limitation as if written in the body of this License. - - 9. The Free Software Foundation may publish revised and/or new versions -of the General Public License from time to time. Such new versions will -be similar in spirit to the present version, but may differ in detail to -address new problems or concerns. - -Each version is given a distinguishing version number. If the Program -specifies a version number of this License which applies to it and "any -later version", you have the option of following the terms and conditions -either of that version or of any later version published by the Free -Software Foundation. If the Program does not specify a version number of -this License, you may choose any version ever published by the Free Software -Foundation. - - 10. If you wish to incorporate parts of the Program into other free -programs whose distribution conditions are different, write to the author -to ask for permission. For software which is copyrighted by the Free -Software Foundation, write to the Free Software Foundation; we sometimes -make exceptions for this. 
Our decision will be guided by the two goals -of preserving the free status of all derivatives of our free software and -of promoting the sharing and reuse of software generally. - - NO WARRANTY - - 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY -FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN -OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES -PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED -OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF -MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS -TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE -PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, -REPAIR OR CORRECTION. - - 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING -WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR -REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, -INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING -OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED -TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY -YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER -PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE -POSSIBILITY OF SUCH DAMAGES. - - END OF TERMS AND CONDITIONS - - How to Apply These Terms to Your New Programs - - If you develop a new program, and you want it to be of the greatest -possible use to the public, the best way to achieve this is to make it -free software which everyone can redistribute and change under these terms. - - To do so, attach the following notices to the program. 
It is safest -to attach them to the start of each source file to most effectively -convey the exclusion of warranty; and each file should have at least -the "copyright" line and a pointer to where the full notice is found. - - - Copyright (C) - - This program is free software: you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation, either version 2 of the License, or - (at your option) any later version. - - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program. If not, see . - -Also add information on how to contact you by electronic and paper mail. - -If the program is interactive, make it output a short notice like this -when it starts in an interactive mode: - - Gnomovision version 69, Copyright (C) - Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. - This is free software, and you are welcome to redistribute it - under certain conditions; type `show c' for details. - -The hypothetical commands `show w' and `show c' should show the appropriate -parts of the General Public License. Of course, the commands you use may -be called something other than `show w' and `show c'; they could even be -mouse-clicks or menu items--whatever suits your program. - -You should also get your employer (if you work as a programmer) or your -school, if any, to sign a "copyright disclaimer" for the program, if -necessary. Here is a sample; alter the names: - - Yoyodyne, Inc., hereby disclaims all copyright interest in the program - `Gnomovision' (which makes passes at compilers) written by James Hacker. 
- - , 1 April 1989 - Ty Coon, President of Vice - -This General Public License does not permit incorporating your program into -proprietary programs. If your program is a subroutine library, you may -consider it more useful to permit linking proprietary applications with the -library. If this is what you want to do, use the GNU Lesser General -Public License instead of this License. + This library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. diff --git a/COPYING.GPL b/COPYING.GPL new file mode 100644 index 0000000..42fe735 --- /dev/null +++ b/COPYING.GPL @@ -0,0 +1,337 @@ + GNU GENERAL PUBLIC LICENSE + Version 2, June 1991 + + Copyright (C) 1989, 1991 Free Software Foundation, Inc. + Everyone is permitted to copy and distribute verbatim copies + of this license document, but changing it is not allowed. + + Preamble + + The licenses for most software are designed to take away your +freedom to share and change it. By contrast, the GNU General Public +License is intended to guarantee your freedom to share and change free +software--to make sure the software is free for all its users. This +General Public License applies to most of the Free Software +Foundation's software and to any other program whose authors commit to +using it. (Some other Free Software Foundation software is covered by +the GNU Lesser General Public License instead.) You can apply it to +your programs, too. + + When we speak of free software, we are referring to freedom, not +price. Our General Public Licenses are designed to make sure that you +have the freedom to distribute copies of free software (and charge for +this service if you wish), that you receive source code or can get it +if you want it, that you can change the software or use pieces of it +in new free programs; and that you know you can do these things. 
+ + To protect your rights, we need to make restrictions that forbid +anyone to deny you these rights or to ask you to surrender the rights. +These restrictions translate to certain responsibilities for you if you +distribute copies of the software, or if you modify it. + + For example, if you distribute copies of such a program, whether +gratis or for a fee, you must give the recipients all the rights that +you have. You must make sure that they, too, receive or can get the +source code. And you must show them these terms so they know their +rights. + + We protect your rights with two steps: (1) copyright the software, and +(2) offer you this license which gives you legal permission to copy, +distribute and/or modify the software. + + Also, for each author's protection and ours, we want to make certain +that everyone understands that there is no warranty for this free +software. If the software is modified by someone else and passed on, we +want its recipients to know that what they have is not the original, so +that any problems introduced by others will not reflect on the original +authors' reputations. + + Finally, any free program is threatened constantly by software +patents. We wish to avoid the danger that redistributors of a free +program will individually obtain patent licenses, in effect making the +program proprietary. To prevent this, we have made it clear that any +patent must be licensed for everyone's free use or not licensed at all. + + The precise terms and conditions for copying, distribution and +modification follow. + + GNU GENERAL PUBLIC LICENSE + TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION + + 0. This License applies to any program or other work which contains +a notice placed by the copyright holder saying it may be distributed +under the terms of this General Public License. 
The "Program", below, +refers to any such program or work, and a "work based on the Program" +means either the Program or any derivative work under copyright law: +that is to say, a work containing the Program or a portion of it, +either verbatim or with modifications and/or translated into another +language. (Hereinafter, translation is included without limitation in +the term "modification".) Each licensee is addressed as "you". + +Activities other than copying, distribution and modification are not +covered by this License; they are outside its scope. The act of +running the Program is not restricted, and the output from the Program +is covered only if its contents constitute a work based on the +Program (independent of having been made by running the Program). +Whether that is true depends on what the Program does. + + 1. You may copy and distribute verbatim copies of the Program's +source code as you receive it, in any medium, provided that you +conspicuously and appropriately publish on each copy an appropriate +copyright notice and disclaimer of warranty; keep intact all the +notices that refer to this License and to the absence of any warranty; +and give any other recipients of the Program a copy of this License +along with the Program. + +You may charge a fee for the physical act of transferring a copy, and +you may at your option offer warranty protection in exchange for a fee. + + 2. You may modify your copy or copies of the Program or any portion +of it, thus forming a work based on the Program, and copy and +distribute such modifications or work under the terms of Section 1 +above, provided that you also meet all of these conditions: + + a) You must cause the modified files to carry prominent notices + stating that you changed the files and the date of any change. 
+ + b) You must cause any work that you distribute or publish, that in + whole or in part contains or is derived from the Program or any + part thereof, to be licensed as a whole at no charge to all third + parties under the terms of this License. + + c) If the modified program normally reads commands interactively + when run, you must cause it, when started running for such + interactive use in the most ordinary way, to print or display an + announcement including an appropriate copyright notice and a + notice that there is no warranty (or else, saying that you provide + a warranty) and that users may redistribute the program under + these conditions, and telling the user how to view a copy of this + License. (Exception: if the Program itself is interactive but + does not normally print such an announcement, your work based on + the Program is not required to print an announcement.) + +These requirements apply to the modified work as a whole. If +identifiable sections of that work are not derived from the Program, +and can be reasonably considered independent and separate works in +themselves, then this License, and its terms, do not apply to those +sections when you distribute them as separate works. But when you +distribute the same sections as part of a whole which is a work based +on the Program, the distribution of the whole must be on the terms of +this License, whose permissions for other licensees extend to the +entire whole, and thus to each and every part regardless of who wrote it. + +Thus, it is not the intent of this section to claim rights or contest +your rights to work written entirely by you; rather, the intent is to +exercise the right to control the distribution of derivative or +collective works based on the Program. 
+ +In addition, mere aggregation of another work not based on the Program +with the Program (or with a work based on the Program) on a volume of +a storage or distribution medium does not bring the other work under +the scope of this License. + + 3. You may copy and distribute the Program (or a work based on it, +under Section 2) in object code or executable form under the terms of +Sections 1 and 2 above provided that you also do one of the following: + + a) Accompany it with the complete corresponding machine-readable + source code, which must be distributed under the terms of Sections + 1 and 2 above on a medium customarily used for software interchange; or, + + b) Accompany it with a written offer, valid for at least three + years, to give any third party, for a charge no more than your + cost of physically performing source distribution, a complete + machine-readable copy of the corresponding source code, to be + distributed under the terms of Sections 1 and 2 above on a medium + customarily used for software interchange; or, + + c) Accompany it with the information you received as to the offer + to distribute corresponding source code. (This alternative is + allowed only for noncommercial distribution and only if you + received the program in object code or executable form with such + an offer, in accord with Subsection b above.) + +The source code for a work means the preferred form of the work for +making modifications to it. For an executable work, complete source +code means all the source code for all modules it contains, plus any +associated interface definition files, plus the scripts used to +control compilation and installation of the executable. 
However, as a +special exception, the source code distributed need not include +anything that is normally distributed (in either source or binary +form) with the major components (compiler, kernel, and so on) of the +operating system on which the executable runs, unless that component +itself accompanies the executable. + +If distribution of executable or object code is made by offering +access to copy from a designated place, then offering equivalent +access to copy the source code from the same place counts as +distribution of the source code, even though third parties are not +compelled to copy the source along with the object code. + + 4. You may not copy, modify, sublicense, or distribute the Program +except as expressly provided under this License. Any attempt +otherwise to copy, modify, sublicense or distribute the Program is +void, and will automatically terminate your rights under this License. +However, parties who have received copies, or rights, from you under +this License will not have their licenses terminated so long as such +parties remain in full compliance. + + 5. You are not required to accept this License, since you have not +signed it. However, nothing else grants you permission to modify or +distribute the Program or its derivative works. These actions are +prohibited by law if you do not accept this License. Therefore, by +modifying or distributing the Program (or any work based on the +Program), you indicate your acceptance of this License to do so, and +all its terms and conditions for copying, distributing or modifying +the Program or works based on it. + + 6. Each time you redistribute the Program (or any work based on the +Program), the recipient automatically receives a license from the +original licensor to copy, distribute or modify the Program subject to +these terms and conditions. You may not impose any further +restrictions on the recipients' exercise of the rights granted herein. 
+You are not responsible for enforcing compliance by third parties to +this License. + + 7. If, as a consequence of a court judgment or allegation of patent +infringement or for any other reason (not limited to patent issues), +conditions are imposed on you (whether by court order, agreement or +otherwise) that contradict the conditions of this License, they do not +excuse you from the conditions of this License. If you cannot +distribute so as to satisfy simultaneously your obligations under this +License and any other pertinent obligations, then as a consequence you +may not distribute the Program at all. For example, if a patent +license would not permit royalty-free redistribution of the Program by +all those who receive copies directly or indirectly through you, then +the only way you could satisfy both it and this License would be to +refrain entirely from distribution of the Program. + +If any portion of this section is held invalid or unenforceable under +any particular circumstance, the balance of the section is intended to +apply and the section as a whole is intended to apply in other +circumstances. + +It is not the purpose of this section to induce you to infringe any +patents or other property right claims or to contest validity of any +such claims; this section has the sole purpose of protecting the +integrity of the free software distribution system, which is +implemented by public license practices. Many people have made +generous contributions to the wide range of software distributed +through that system in reliance on consistent application of that +system; it is up to the author/donor to decide if he or she is willing +to distribute software through any other system and a licensee cannot +impose that choice. + +This section is intended to make thoroughly clear what is believed to +be a consequence of the rest of this License. + + 8. 
If the distribution and/or use of the Program is restricted in +certain countries either by patents or by copyrighted interfaces, the +original copyright holder who places the Program under this License +may add an explicit geographical distribution limitation excluding +those countries, so that distribution is permitted only in or among +countries not thus excluded. In such case, this License incorporates +the limitation as if written in the body of this License. + + 9. The Free Software Foundation may publish revised and/or new versions +of the General Public License from time to time. Such new versions will +be similar in spirit to the present version, but may differ in detail to +address new problems or concerns. + +Each version is given a distinguishing version number. If the Program +specifies a version number of this License which applies to it and "any +later version", you have the option of following the terms and conditions +either of that version or of any later version published by the Free +Software Foundation. If the Program does not specify a version number of +this License, you may choose any version ever published by the Free Software +Foundation. + + 10. If you wish to incorporate parts of the Program into other free +programs whose distribution conditions are different, write to the author +to ask for permission. For software which is copyrighted by the Free +Software Foundation, write to the Free Software Foundation; we sometimes +make exceptions for this. Our decision will be guided by the two goals +of preserving the free status of all derivatives of our free software and +of promoting the sharing and reuse of software generally. + + NO WARRANTY + + 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY +FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. 
EXCEPT WHEN +OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES +PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED +OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF +MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS +TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE +PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, +REPAIR OR CORRECTION. + + 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING +WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR +REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, +INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING +OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED +TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY +YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER +PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE +POSSIBILITY OF SUCH DAMAGES. + + END OF TERMS AND CONDITIONS + + How to Apply These Terms to Your New Programs + + If you develop a new program, and you want it to be of the greatest +possible use to the public, the best way to achieve this is to make it +free software which everyone can redistribute and change under these terms. + + To do so, attach the following notices to the program. It is safest +to attach them to the start of each source file to most effectively +convey the exclusion of warranty; and each file should have at least +the "copyright" line and a pointer to where the full notice is found. + + + Copyright (C) + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 2 of the License, or + (at your option) any later version. 
+ + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see . + +Also add information on how to contact you by electronic and paper mail. + +If the program is interactive, make it output a short notice like this +when it starts in an interactive mode: + + Gnomovision version 69, Copyright (C) + Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. + This is free software, and you are welcome to redistribute it + under certain conditions; type `show c' for details. + +The hypothetical commands `show w' and `show c' should show the appropriate +parts of the General Public License. Of course, the commands you use may +be called something other than `show w' and `show c'; they could even be +mouse-clicks or menu items--whatever suits your program. + +You should also get your employer (if you work as a programmer) or your +school, if any, to sign a "copyright disclaimer" for the program, if +necessary. Here is a sample; alter the names: + + Yoyodyne, Inc., hereby disclaims all copyright interest in the program + `Gnomovision' (which makes passes at compilers) written by James Hacker. + + , 1 April 1989 + Ty Coon, President of Vice + +This General Public License does not permit incorporating your program into +proprietary programs. If your program is a subroutine library, you may +consider it more useful to permit linking proprietary applications with the +library. If this is what you want to do, use the GNU Lesser General +Public License instead of this License. diff --git a/ChangeLog b/ChangeLog index 3c65439..d3e52e4 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,38 +1,155 @@ +2025-01-09 Antonio Diaz Diaz + + * Version 1.15 released. 
+ * decoder.h (Rd_try_reload): Reject a nonzero first LZMA byte. + * minilzip.c (do_decompress): Reject empty member in multimember. + (Pp_free): New function. + * lzlib.h: Declare LZ_Errno, LZ_Encoder, and LZ_Decoder as typedef. + * Makefile.in: New target 'lib' which builds just the library. + New target 'bin' which builds the library and minilzip. + 'lib' is now the default; minilzip is no longer built by default. + 'install-bin' installs minilzip and its man page again. + * configure, Makefile.in: Use '--soname' conditionally. + (Reported by Michael Sullivan). + * INSTALL: Document use of 'make bin'. + * check.sh: Use 'cp' instead of 'cat'. + +2024-01-20 Antonio Diaz Diaz + + * Version 1.14 released. + * minilzip.c: Reformat file diagnostics as 'PROGRAM: FILE: MESSAGE'. + (show_option_error): New function showing argument and option name. + (main): Make -o preserve date/mode/owner if 1 input file. + * lzip.h: Rename verify_* to check_*. + * lzlib.texi: Document the need to declare uint8_t before lzlib.h. + (Reported by Michal Górny). + * configure, Makefile.in: New variable 'MAKEINFO'. + * INSTALL: Document use of CFLAGS+='--std=c99 -D_XOPEN_SOURCE=500'. + +2022-01-23 Antonio Diaz Diaz + + * Version 1.13 released. + * configure: Set variables AR and ARFLAGS. (Reported by Hoël Bézier). + * main.c: Rename to minilzip.c. + * minilzip.c (getnum): Show option name and valid range if error. + (check_lib): Check that LZ_API_VERSION and LZ_version_string match. + * Improve several descriptions in manual, '--help', and man page. + * lzlib.texi: Change GNU Texinfo category to 'Compression'. + (Reported by Alfred M. Szmidt). + +2021-01-02 Antonio Diaz Diaz + + * Version 1.12 released. + * lzlib.h: Define LZ_API_VERSION as 1000 * major + minor. 1.12 = 1012. + This change does not affect the soversion. + * lzlib.h, lzlib.c: New function LZ_api_version. + * LZd_try_verify_trailer: Return 2 if EOF at trailer or EOS marker. + * Decompression speed has been slightly increased. 
+ * decoder.h: Increase 'rd_min_available_bytes' from 8 to 10. + * encoder_base.c (LZeb_try_sync_flush): + Compensate for the increase in 'rd_min_available_bytes'. + * main.c (do_decompress): Fix false report about library stall. + New option '--check-lib'. + (main): Report an error if a file name is empty. + Make '-o' behave like '-c', but writing to file instead of stdout. + Make '-c' and '-o' check whether the output is a terminal only once. + Do not open output if input is a terminal. + Replace 'decompressed', 'compressed' with 'out', 'in' in output. + Set a valid invocation_name even if argc == 0. + * lzlib.texi: Document the new way of checking the library version. + Document that 'LZ_(de)compress_close' and 'LZ_(de)compress_errno' + can be called with a null argument. + Document that sync flush marker is not allowed in lzip files. + Document the consequences of not calling 'LZ_decompress_finish'. + Document that 'LZ_decompress_read' returns at least once per member. + Document that 'LZ_(de)compress_read' can be called with a null + buffer pointer argument. + Real code examples for common uses have been added to the tutorial. + * bbexample.c: Don't use 'LZ_(de)compress_write_size'. + * lzcheck.c: New options '-s' (sync) and '-m' (member by member). + Test member by member without 'LZ_decompress_finish'. + * ffexample.c: New file containing example functions for file-to-file + compression/decompression. + * Document extraction from tar.lz in '--help' output and man page. + * Makefile.in: 'install-bin' no longer installs the man page. + New targets 'install-bin-compress' and 'install-bin-strip-compress'. + * testsuite: Add 9 new test files. + +2019-01-02 Antonio Diaz Diaz + + * Version 1.11 released. + * Rename File_* to Lzip_*. + * LZ_decompress_read: Don't return error until all data is read. + * decoder.c (LZd_decode_member): Decode truncated data until EOF. + * cbuffer.c (Cb_read_data): Allow a null buffer pointer. 
+ * main.c: Don't allow mixing different operations (-d and -t). + * main.c: Check return value of close( infd ). + * main.c: Compile on DOS with DJGPP. + * lzlib.texi: Improve descriptions of '-0..-9', '-m', and '-s'. + Document that 'LZ_(de)compress_finish' can be called repeatedly. + * configure: Accept appending to CFLAGS; 'CFLAGS+=OPTIONS'. + * Makefile.in: Rename targets 'install-bin*' to 'install-lib*'. + * Makefile.in: Targets 'install-bin*' now install minilzip. + * INSTALL: Document use of CFLAGS+='-D __USE_MINGW_ANSI_STDIO'. + +2018-02-07 Antonio Diaz Diaz + + * Version 1.10 released. + * LZ_compress_finish now adjusts dictionary size for each member. + (Older versions can adjust dictionary size only once). + * lzlib.c (LZ_decompress_read): Detect corrupt header with HD=3. + * main.c: New option '--loose-trailing'. + (main): Make option '-S, --volume-size' keep input files. + Replace 'bits/byte' with inverse compression ratio in output. + (main): Show final diagnostic when testing multiple files. + (set_c_outname): Do not add a second '.lz' to the arg of '-o'. + (do_decompress): Show dictionary size at verbosity level 4 (-vvvv). + * lzlib.texi: New chapter 'Invoking minilzip'. + +2017-04-11 Antonio Diaz Diaz + + * Version 1.9 released. + * Compression time of option '-0' has been reduced by 3%. + * Compression time of options -1 to -9 has been reduced by 1%. + * Decompression time has been reduced by 3%. + * main.c: Continue testing if any input file is a terminal. + * Change the license of the library to "2-clause BSD". + 2016-05-17 Antonio Diaz Diaz * Version 1.8 released. - * decoder.c (LZd_verify_trailer): Removed test of final code. - * main.c: Added new option '-a, --trailing-error'. - * main.c (main): Delete '--output' file if infd is a terminal. - * main.c (main): Don't use stdin more than once. + * lzlib.h: Define LZ_API_VERSION to 1. + * lzlib.c (LZ_decompress_sync_to_member): Add skipped size to in_size. 
+ * decoder.c (LZd_verify_trailer): Remove test of final code. + * main.c: New option '-a, --trailing-error'. + (main): Delete '--output' file if infd is a terminal. + (main): Don't use stdin more than once. * configure: Avoid warning on some shells when testing for gcc. * Makefile.in: Detect the existence of install-info. - * testsuite/check.sh: A POSIX shell is required to run the tests. - * testsuite/check.sh: Don't check error messages. + * check.sh: Require a POSIX shell. Don't check error messages. 2015-07-08 Antonio Diaz Diaz * Version 1.7 released. - * Ported fast encoder and option '-0' from lzip. + * Port fast encoder and option '-0' from lzip. * If open-->write-->finish, produce same dictionary size as lzip. - * Makefile.in: Added new targets 'install*-compress'. + * Makefile.in: New targets 'install*-compress'. 2014-08-27 Antonio Diaz Diaz * Version 1.6 released. - * Compression ratio of option -9 has been slightly increased. - * configure: Added new option '--disable-static'. - * configure: Added new option '--disable-ldconfig'. + * Compression ratio of option '-9' has been slightly increased. + * configure: New options '--disable-static' and '--disable-ldconfig'. * Makefile.in: Ignore errors from ldconfig. * Makefile.in: Use 'CFLAGS' in every invocation of 'CC'. * main.c (close_and_set_permissions): Behave like 'cp -p'. - * lzlib.texinfo: Renamed to lzlib.texi. - * License changed to "GPL version 2 or later with link exception". + * lzlib.texinfo: Rename to lzlib.texi. + * Change license to "GPL version 2 or later with link exception". 2013-09-15 Antonio Diaz Diaz * Version 1.5 released. - * Removed decompression support for version 0 files. + * Remove decompression support for version 0 files. * The LZ_compress_sync_flush mechanism has been fixed (again). * Minor fixes. @@ -43,20 +160,19 @@ * Compression ratio has been slightly increased. * Compression time has been reduced by 8%. * Decompression time has been reduced by 7%. 
- * lzlib.h: Changed 'long long' values to 'unsigned long long'. + * lzlib.h: Change 'long long' values to 'unsigned long long'. * encoder.c (Mf_init): Reduce minimum buffer size to 64KiB. * lzlib.c (LZ_decompress_read): Tell LZ_header_error from LZ_unexpected_eof the same way as lzip does. - * Makefile.in: Added new target 'install-as-lzip'. - * Makefile.in: Added new target 'install-bin'. - * main.c: Use 'setmode' instead of '_setmode' on Windows and OS/2. + * Makefile.in: New targets 'install-as-lzip' and 'install-bin'. * main.c: Define 'strtoull' to 'strtoul' on Windows. + (main): Use 'setmode' instead of '_setmode' on Windows and OS/2. 2012-02-29 Antonio Diaz Diaz * Version 1.3 released. * Translated to C from the C++ source of lzlib 1.2. - * configure: 'datadir' renamed to 'datarootdir'. + * configure: Rename 'datadir' to 'datarootdir'. 2011-10-25 Antonio Diaz Diaz @@ -65,12 +181,11 @@ independently of the value of 'pos_state'. This gives better compression for large values of '--match-length' without being slower. - * encoder.h encoder.cc: Optimize pair price calculations. This - reduces compression time for large values of '--match-length' - by up to 6%. - * main.cc: Added new option '-F, --recompress'. - * Makefile.in: 'make install' no more tries to run - '/sbin/ldconfig' on systems lacking it. + * encoder.h, encoder.cc: Optimize pair price calculations, reducing + compression time for large values of '--match-length' by up to 6%. + * main.cc: New option '-F, --recompress'. + * Makefile.in: 'make install' no longer tries to run '/sbin/ldconfig' + on systems lacking it. 2011-01-03 Antonio Diaz Diaz @@ -78,24 +193,20 @@ * Compression time has been reduced by 2%. * All declarations not belonging to the API have been encapsulated in the namespace 'Lzlib'. - * testsuite: 'test1' renamed to 'test.txt'. Added new tests. - * Match length limits set by options -1 to -9 of minilzip have - been changed to match those of lzip 1.11. 
- * main.cc: Set stdin/stdout in binary mode on OS2. + * testsuite: Rename 'test1' to 'test.txt'. New tests. + * main.cc (main): Set match length limits to same values as lzip 1.11. + (main): Set stdin/stdout in binary mode on OS2. * bbexample.cc: New file containing example functions for buffer-to-buffer compression/decompression. 2010-05-08 Antonio Diaz Diaz * Version 1.0 released. - * Added new function LZ_decompress_member_finished. - * Added new function LZ_decompress_member_version. - * Added new function LZ_decompress_dictionary_size. - * Added new function LZ_decompress_data_crc. - * Variables declared 'extern' have been encapsulated in a - namespace. - * main.cc: Fixed warning about fchown's return value being ignored. - * decoder.h: Input_buffer integrated in Range_decoder. + * New functions LZ_decompress_member_version, LZ_decompress_data_crc, + LZ_decompress_member_finished, and LZ_decompress_dictionary_size. + * Variables declared 'extern' have been encapsulated in a namespace. + * main.cc: Fix warning about fchown's return value being ignored. + * decoder.h: Integrate Input_buffer in Range_decoder. 2010-02-10 Antonio Diaz Diaz @@ -106,28 +217,26 @@ 2010-01-17 Antonio Diaz Diaz * Version 0.8 released. - * Added new function LZ_decompress_reset. - * Added new function LZ_decompress_sync_to_member. - * Added new function LZ_decompress_write_size. - * Added new function LZ_strerror. - * lzlib.h: API change. Replaced 'enum' with functions for values - of dictionary size limits to make interface names consistent. - * lzlib.h: API change. 'LZ_errno' replaced with 'LZ_Errno'. - * lzlib.h: API change. Replaced 'void *' with 'struct LZ_Encoder *' + * New functions LZ_decompress_reset, LZ_decompress_sync_to_member, + LZ_decompress_write_size, and LZ_strerror. + * lzlib.h: API change. Replace 'enum' with functions for values of + dictionary size limits to make interface names consistent. + * lzlib.h: API change. Rename 'LZ_errno' to 'LZ_Errno'. 
+ * lzlib.h: API change. Replace 'void *' with 'struct LZ_Encoder *' and 'struct LZ_Decoder *' to make interface type safe. - * decoder.cc: Truncated member trailer is now correctly detected. + * decoder.cc: A truncated member trailer is now correctly detected. * encoder.cc: Matchfinder::reset now also clears at_stream_end_, allowing LZ_compress_restart_member to restart a finished stream. * lzlib.cc: Accept only query or close operations after a fatal error has occurred. - * Shared version of lzlib is no more built by default. - * testsuite/check.sh: Use 'test1' instead of 'COPYING' for testing. + * The shared version of lzlib is no longer built by default. + * check.sh: Use 'test1' instead of 'COPYING' for testing. 2009-10-20 Antonio Diaz Diaz * Version 0.7 released. * Compression time has been reduced by 4%. - * testsuite/check.sh: Removed -9 to run in less than 256MiB of RAM. + * check.sh: Remove -9 to run in less than 256MiB of RAM. * lzcheck.cc: Read files of any size up to 2^63 bytes. 2009-09-02 Antonio Diaz Diaz @@ -139,15 +248,14 @@ * Version 0.5 released. * Decompression speed has been improved. - * main.cc (signal_handler): Declared as 'extern "C"'. + * main.cc (signal_handler): Declare as 'extern "C"'. 2009-06-03 Antonio Diaz Diaz * Version 0.4 released. - * Added new function LZ_compress_sync_flush. - * Added new function LZ_compress_write_size. + * New functions LZ_compress_sync_flush and LZ_compress_write_size. * Decompression speed has been improved. - * Added chapter 'Buffering' to the manual. + * lzlib.texinfo: New chapter 'Buffering'. 2009-05-03 Antonio Diaz Diaz @@ -157,16 +265,15 @@ 2009-04-26 Antonio Diaz Diaz * Version 0.2 released. - * Fixed a segfault when decompressing trailing garbage. - * Fixed a false positive in LZ_(de)compress_finished. + * Fix a segfault when decompressing trailing garbage. + * Fix a false positive in LZ_(de)compress_finished. 2009-04-21 Antonio Diaz Diaz * Version 0.1 released. 
-Copyright (C) 2009-2016 Antonio Diaz Diaz. +Copyright (C) 2009-2025 Antonio Diaz Diaz. -This file is a collection of facts, and thus it is not copyrightable, -but just in case, you have unlimited permission to copy, distribute and -modify it. +This file is a collection of facts, and thus it is not copyrightable, but just +in case, you have unlimited permission to copy, distribute, and modify it. diff --git a/INSTALL b/INSTALL index 31237fc..ba3337e 100644 --- a/INSTALL +++ b/INSTALL @@ -1,9 +1,14 @@ Requirements ------------ -You will need a C compiler. -I use gcc 5.3.0 and 4.1.2, but the code should compile with any -standards compliant compiler. -Gcc is available at http://gcc.gnu.org. +You will need a C99 compiler. (gcc 3.3.6 or newer is recommended). +I use gcc 6.1.0 and 3.3.6, but the code should compile with any standards +compliant compiler. +Gcc is available at http://gcc.gnu.org +Lzip is available at http://www.nongnu.org/lzip/lzip.html + +The operating system must allow signal handlers read access to objects with +static storage duration so that the cleanup handler for Control-C can delete +the partial output file. (This requirement is for minilzip only). Procedure @@ -14,8 +19,8 @@ Procedure or lzip -cd lzlib[version].tar.lz | tar -xf - -This creates the directory ./lzlib[version] containing the source from -the main archive. +This creates the directory ./lzlib[version] containing the source code +extracted from the archive. 2. Change to lzlib directory and run configure. (Try 'configure --help' for usage instructions). @@ -23,46 +28,65 @@ the main archive. cd lzlib[version] ./configure -3. Run make. + If you choose a C standard, enable the POSIX features explicitly: + + ./configure CFLAGS+='--std=c99 -D_XOPEN_SOURCE=500' + + If you are compiling on MinGW, use: + + ./configure CFLAGS+='-D __USE_MINGW_ANSI_STDIO' + +3. Run make make +to build the library, or + + make bin + +to build also minilzip. + 4. 
Optionally, type 'make check' to run the tests that come with lzlib. 5. Type 'make install' to install the library and any data files and - documentation. (You may need to run ldconfig also). + documentation. You need root privileges to install into a prefix owned + by root. (You may need to run ldconfig also). Or type 'make install-compress', which additionally compresses the - info manual after installation. (Installing compressed docs may - become the default in the future). + info manual after installation. + (Installing compressed docs may become the default in the future). - You can install only the library, the info manual or the man page by - typing 'make install-bin', 'make install-info' or 'make install-man' - respectively. + You can install only the library or the info manual by typing + 'make install-lib' or 'make install-info' respectively. - Instead of 'make install', you can type 'make install-as-lzip' to - install the library and any data files and documentation, and link - minilzip to the name 'lzip'. + 'make install-bin' installs the program minilzip and its man page. It + installs a shared minilzip if the shared library has been configured. + Else it installs a static minilzip. + 'make install-bin-compress' additionally compresses the man page after + installation. + + 'make install-as-lzip' runs 'make install-bin' and then links minilzip to + the name 'lzip'. Another way ----------- You can also compile lzlib into a separate directory. -To do this, you must use a version of 'make' that supports the 'VPATH' -variable, such as GNU 'make'. 'cd' to the directory where you want the +To do this, you must use a version of 'make' that supports the variable +'VPATH', such as GNU 'make'. 'cd' to the directory where you want the object files and executables to go and run the 'configure' script. -'configure' automatically checks for the source code in '.', in '..' 
and +'configure' automatically checks for the source code in '.', in '..', and in the directory that 'configure' is in. -'configure' recognizes the option '--srcdir=DIR' to control where to -look for the sources. Usually 'configure' can determine that directory +'configure' recognizes the option '--srcdir=DIR' to control where to look +for the source code. Usually 'configure' can determine that directory automatically. After running 'configure', you can run 'make' and 'make install' as explained above. -Copyright (C) 2009-2016 Antonio Diaz Diaz. +Copyright (C) 2009-2025 Antonio Diaz Diaz. This file is free documentation: you have unlimited permission to copy, -distribute and modify it. +distribute, and modify it. diff --git a/Makefile.in b/Makefile.in index 02a1870..4f99874 100644 --- a/Makefile.in +++ b/Makefile.in @@ -1,44 +1,55 @@ DISTNAME = $(pkgname)-$(pkgversion) -AR = ar INSTALL = install INSTALL_PROGRAM = $(INSTALL) -m 755 -INSTALL_DATA = $(INSTALL) -m 644 INSTALL_DIR = $(INSTALL) -d -m 755 +INSTALL_DATA = $(INSTALL) -m 644 +INSTALL_SO = $(INSTALL) -m 644 LDCONFIG = /sbin/ldconfig SHELL = /bin/sh CAN_RUN_INSTALLINFO = $(SHELL) -c "install-info --version" > /dev/null 2>&1 -objs = carg_parser.o main.o +objs = carg_parser.o minilzip.o .PHONY : all install install-bin install-info install-man \ install-strip install-compress install-strip-compress \ install-bin-strip install-info-compress install-man-compress \ - install-as-lzip uninstall uninstall-bin uninstall-info uninstall-man \ + install-bin-compress install-bin-strip-compress \ + install-lib install-lib-strip \ + install-as-lzip \ + uninstall uninstall-bin uninstall-lib uninstall-info uninstall-man \ doc info man check dist clean distclean -all : $(progname_static) $(progname_shared) +all : lib + +lib : $(libname_static) $(libname_shared) lib$(libname).a : lzlib.o - $(AR) -rcs $@ $< + $(AR) $(ARFLAGS) $@ $< -lib$(libname).so.$(pkgversion) : lzlib_sh.o - $(CC) $(LDFLAGS) $(CFLAGS) -fpic -fPIC -shared 
-Wl,--soname=lib$(libname).so.$(soversion) -o $@ $< +lib$(libname).so.$(soversion) : lzlib_sh.o + $(CC) $(CFLAGS) $(LDFLAGS) -fpic -fPIC -shared -Wl,--soname=$@ -o $@ $< || \ + $(CC) $(CFLAGS) $(LDFLAGS) -fpic -fPIC -shared -o $@ $< + +bin : $(progname_static) $(progname_shared) $(progname) : $(objs) lib$(libname).a - $(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(objs) lib$(libname).a + $(CC) $(CFLAGS) $(LDFLAGS) -o $@ $(objs) lib$(libname).a -$(progname)_shared : $(objs) lib$(libname).so.$(pkgversion) - $(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(objs) lib$(libname).so.$(pkgversion) +$(progname)_shared : $(objs) lib$(libname).so.$(soversion) + $(CC) $(CFLAGS) $(LDFLAGS) -o $@ $(objs) lib$(libname).so.$(soversion) bbexample : bbexample.o lib$(libname).a - $(CC) $(LDFLAGS) $(CFLAGS) -o $@ bbexample.o lib$(libname).a + $(CC) $(CFLAGS) $(LDFLAGS) -o $@ bbexample.o lib$(libname).a + +ffexample : ffexample.o lib$(libname).a + $(CC) $(CFLAGS) $(LDFLAGS) -o $@ ffexample.o lib$(libname).a lzcheck : lzcheck.o lib$(libname).a - $(CC) $(LDFLAGS) $(CFLAGS) -o $@ lzcheck.o lib$(libname).a + $(CC) $(CFLAGS) $(LDFLAGS) -o $@ lzcheck.o lib$(libname).a -main.o : main.c +minilzip.o : minilzip.c $(CC) $(CPPFLAGS) $(CFLAGS) -DPROGVERSION=\"$(pkgversion)\" -c -o $@ $< lzlib_sh.o : lzlib.c @@ -47,6 +58,11 @@ lzlib_sh.o : lzlib.c %.o : %.c $(CC) $(CPPFLAGS) $(CFLAGS) -c -o $@ $< +# prevent 'make' from trying to remake source files +$(VPATH)/configure $(VPATH)/Makefile.in $(VPATH)/doc/$(pkgname).texi : ; +MAKEFLAGS += -r +.SUFFIXES : + lzdeps = lzlib.h lzip.h cbuffer.c decoder.h decoder.c encoder_base.h \ encoder_base.c encoder.h encoder.c fast_encoder.h fast_encoder.c @@ -54,64 +70,73 @@ $(objs) : Makefile carg_parser.o : carg_parser.h lzlib.o : Makefile $(lzdeps) lzlib_sh.o : Makefile $(lzdeps) -main.o : carg_parser.h lzlib.h +minilzip.o : carg_parser.h lzlib.h bbexample.o : Makefile lzlib.h +ffexample.o : Makefile lzlib.h lzcheck.o : Makefile lzlib.h - doc : info man info : $(VPATH)/doc/$(pkgname).info 
$(VPATH)/doc/$(pkgname).info : $(VPATH)/doc/$(pkgname).texi - cd $(VPATH)/doc && makeinfo $(pkgname).texi + cd $(VPATH)/doc && $(MAKEINFO) $(pkgname).texi man : $(VPATH)/doc/$(progname).1 $(VPATH)/doc/$(progname).1 : $(progname) - help2man -n 'reduces the size of files' -o $@ --no-info ./$(progname) + help2man -n 'reduces the size of files' -o $@ --info-page=$(pkgname) ./$(progname) Makefile : $(VPATH)/configure $(VPATH)/Makefile.in ./config.status -check : $(progname) bbexample lzcheck +check : $(progname) bbexample ffexample lzcheck @$(VPATH)/testsuite/check.sh $(VPATH)/testsuite $(pkgversion) -install : install-bin install-info -install-strip : install-bin-strip install-info -install-compress : install-bin install-info-compress -install-strip-compress : install-bin-strip install-info-compress +install : install-lib install-info +install-strip : install-lib-strip install-info +install-compress : install-lib install-info-compress +install-strip-compress : install-lib-strip install-info-compress +install-bin-compress : install-bin install-man-compress +install-bin-strip-compress : install-bin-strip install-man-compress -install-bin : all +install-bin : bin install-man + if [ ! -d "$(DESTDIR)$(bindir)" ] ; then $(INSTALL_DIR) "$(DESTDIR)$(bindir)" ; fi + $(INSTALL_PROGRAM) ./$(progname_lzip) "$(DESTDIR)$(bindir)/$(progname)" + +install-bin-strip : bin + $(MAKE) INSTALL_PROGRAM='$(INSTALL_PROGRAM) -s' install-bin + +install-lib : lib if [ ! -d "$(DESTDIR)$(includedir)" ] ; then $(INSTALL_DIR) "$(DESTDIR)$(includedir)" ; fi if [ ! 
-d "$(DESTDIR)$(libdir)" ] ; then $(INSTALL_DIR) "$(DESTDIR)$(libdir)" ; fi $(INSTALL_DATA) $(VPATH)/$(libname)lib.h "$(DESTDIR)$(includedir)/$(libname)lib.h" - if [ -n "$(progname_static)" ] ; then \ + if [ -n "$(libname_static)" ] ; then \ $(INSTALL_DATA) ./lib$(libname).a "$(DESTDIR)$(libdir)/lib$(libname).a" ; \ fi - if [ -n "$(progname_shared)" ] ; then \ - $(INSTALL_PROGRAM) ./lib$(libname).so.$(pkgversion) "$(DESTDIR)$(libdir)/lib$(libname).so.$(pkgversion)" ; \ + if [ -n "$(libname_shared)" ] ; then \ if [ -e "$(DESTDIR)$(libdir)/lib$(libname).so.$(soversion)" ] ; then \ run_ldconfig=no ; \ else run_ldconfig=yes ; \ fi ; \ rm -f "$(DESTDIR)$(libdir)/lib$(libname).so" ; \ rm -f "$(DESTDIR)$(libdir)/lib$(libname).so.$(soversion)" ; \ + $(INSTALL_SO) ./lib$(libname).so.$(soversion) "$(DESTDIR)$(libdir)/lib$(libname).so.$(pkgversion)" ; \ cd "$(DESTDIR)$(libdir)" && ln -s lib$(libname).so.$(pkgversion) lib$(libname).so ; \ cd "$(DESTDIR)$(libdir)" && ln -s lib$(libname).so.$(pkgversion) lib$(libname).so.$(soversion) ; \ if [ "${disable_ldconfig}" != yes ] && [ $${run_ldconfig} = yes ] && \ [ -x "$(LDCONFIG)" ] ; then "$(LDCONFIG)" -n "$(DESTDIR)$(libdir)" || true ; fi ; \ fi -install-bin-strip : all - $(MAKE) INSTALL_PROGRAM='$(INSTALL_PROGRAM) -s' install-bin +install-lib-strip : lib + $(MAKE) INSTALL_SO='$(INSTALL_SO) -s' install-lib install-info : if [ ! 
-d "$(DESTDIR)$(infodir)" ] ; then $(INSTALL_DIR) "$(DESTDIR)$(infodir)" ; fi -rm -f "$(DESTDIR)$(infodir)/$(pkgname).info"* $(INSTALL_DATA) $(VPATH)/doc/$(pkgname).info "$(DESTDIR)$(infodir)/$(pkgname).info" -if $(CAN_RUN_INSTALLINFO) ; then \ - install-info --info-dir="$(DESTDIR)$(infodir)" "$(DESTDIR)$(infodir)/$(pkgname).info" ; \ + install-info --info-dir="$(DESTDIR)$(infodir)" "$(DESTDIR)$(infodir)/$(pkgname).info" ; \ fi install-info-compress : install-info @@ -125,25 +150,24 @@ install-man : install-man-compress : install-man lzip -v -9 "$(DESTDIR)$(mandir)/man1/$(progname).1" -install-as-lzip : install install-man - if [ ! -d "$(DESTDIR)$(bindir)" ] ; then $(INSTALL_DIR) "$(DESTDIR)$(bindir)" ; fi - $(INSTALL_PROGRAM) ./$(progname_lzip) "$(DESTDIR)$(bindir)/$(progname)" +install-as-lzip : install-bin -rm -f "$(DESTDIR)$(bindir)/lzip" cd "$(DESTDIR)$(bindir)" && ln -s $(progname) lzip -uninstall : uninstall-man uninstall-info uninstall-bin +uninstall : uninstall-info uninstall-lib uninstall-bin : -rm -f "$(DESTDIR)$(bindir)/$(progname)" + +uninstall-lib : -rm -f "$(DESTDIR)$(includedir)/$(libname)lib.h" -rm -f "$(DESTDIR)$(libdir)/lib$(libname).a" -rm -f "$(DESTDIR)$(libdir)/lib$(libname).so" -rm -f "$(DESTDIR)$(libdir)/lib$(libname).so.$(soversion)" - -rm -f "$(DESTDIR)$(libdir)/lib$(libname).so.$(pkgversion)" uninstall-info : -if $(CAN_RUN_INSTALLINFO) ; then \ - install-info --info-dir="$(DESTDIR)$(infodir)" --remove "$(DESTDIR)$(infodir)/$(pkgname).info" ; \ + install-info --info-dir="$(DESTDIR)$(infodir)" --remove "$(DESTDIR)$(infodir)/$(pkgname).info" ; \ fi -rm -f "$(DESTDIR)$(infodir)/$(pkgname).info"* @@ -155,6 +179,7 @@ dist : doc tar -Hustar --owner=root --group=root -cvf $(DISTNAME).tar \ $(DISTNAME)/AUTHORS \ $(DISTNAME)/COPYING \ + $(DISTNAME)/COPYING.GPL \ $(DISTNAME)/ChangeLog \ $(DISTNAME)/INSTALL \ $(DISTNAME)/Makefile.in \ @@ -164,20 +189,22 @@ dist : doc $(DISTNAME)/doc/$(progname).1 \ $(DISTNAME)/doc/$(pkgname).info \ 
$(DISTNAME)/doc/$(pkgname).texi \ + $(DISTNAME)/*.h \ + $(DISTNAME)/*.c \ $(DISTNAME)/testsuite/check.sh \ $(DISTNAME)/testsuite/test.txt \ - $(DISTNAME)/testsuite/test2.txt \ - $(DISTNAME)/testsuite/test.txt.lz \ + $(DISTNAME)/testsuite/fox_lf \ + $(DISTNAME)/testsuite/fox.lz \ + $(DISTNAME)/testsuite/fox_*.lz \ $(DISTNAME)/testsuite/test_sync.lz \ - $(DISTNAME)/*.h \ - $(DISTNAME)/*.c + $(DISTNAME)/testsuite/test.txt.lz rm -f $(DISTNAME) lzip -v -9 $(DISTNAME).tar clean : - -rm -f $(progname) $(objs) - -rm -f $(progname)_shared lzlib_sh.o *.so.$(pkgversion) - -rm -f bbexample bbexample.o lzcheck lzcheck.o lzlib.o *.a + -rm -f $(progname) $(objs) lzlib.o lib$(libname).a + -rm -f $(progname)_shared lzlib_sh.o lib$(libname).so* + -rm -f bbexample bbexample.o ffexample ffexample.o lzcheck lzcheck.o distclean : clean -rm -f Makefile config.status *.tar *.tar.lz diff --git a/NEWS b/NEWS index e60b1bd..1528dce 100644 --- a/NEWS +++ b/NEWS @@ -1,16 +1,21 @@ -Changes in version 1.8: +Changes in version 1.15: -The test of the value remaining in the range decoder has been removed. -(After extensive testing it has been found useless to detect corruption -in the decompressed data. Eliminating it reduces the number of false -positives for corruption and makes error detection more accurate). +Lzlib now reports a nonzero first LZMA byte as a LZ_data_error. -The option "-a, --trailing-error", which makes minilzip exit with error -status 2 if any remaining input is detected after decompressing the last -member, has been added. +minilzip now exits with error status 2 if any empty member is found in a +multimember file. -When decompressing with minilzip, the file specified with the '--output' -option is now deleted if the input is a terminal. +LZ_Errno, LZ_Encoder, and LZ_Decoder are now declared in lzlib.h as typedef. -A harmless check failure on Windows, caused by the failed comparison of -a message in text mode, has been fixed. 
+The targets 'lib' and 'bin' have been added to Makefile.in. 'lib' is the new +default and builds just the library. 'bin' builds both the library and +minilzip. + +minilzip is no longer built by default. + +'install-bin' installs minilzip and its man page again. + +To improve portability, the linker option '--soname' is now used conditionally. +(Reported by Michael Sullivan). + +The use of the target 'bin' has been documented in INSTALL. diff --git a/README b/README index 97f11e9..b52806d 100644 --- a/README +++ b/README @@ -1,97 +1,89 @@ +See the file INSTALL for compilation and installation instructions. + Description -Lzlib is a data compression library providing in-memory LZMA compression -and decompression functions, including integrity checking of the -decompressed data. The compressed data format used by the library is the -lzip format. Lzlib is written in C. +Lzlib is a data compression library providing in-memory LZMA compression and +decompression functions, including integrity checking of the decompressed +data. The compressed data format used by the library is the lzip format. +Lzlib is written in C and is distributed under a 2-clause BSD license. -The lzip file format is designed for data sharing and long-term -archiving, taking into account both data integrity and decoder -availability: +The functions and variables forming the interface of the compression library +are declared in the file 'lzlib.h'. Usage examples of the library are given +in the files 'bbexample.c', 'ffexample.c', and 'minilzip.c' from the source +distribution. - * The lzip format provides very safe integrity checking and some data - recovery means. The lziprecover program can repair bit-flip errors - (one of the most common forms of data corruption) in lzip files, - and provides data recovery capabilities, including error-checked - merging of damaged copies of a file. 
+As 'lzlib.h' can be used in C and C++ programs, it must not impose a choice +of system headers on the program by including one of them. Therefore it is +the responsibility of the program using lzlib to include before 'lzlib.h' +some header that declares the type 'uint8_t'. There are at least four such +headers in C and C++: 'stdint.h', 'cstdint', 'inttypes.h', and 'cinttypes'. - * The lzip format is as simple as possible (but not simpler). The - lzip manual provides the code of a simple decompressor along with a - detailed explanation of how it works, so that with the only help of - the lzip manual it would be possible for a digital archaeologist to - extract the data from a lzip file long after quantum computers - eventually render LZMA obsolete. - - * Additionally the lzip reference implementation is copylefted, which - guarantees that it will remain free forever. - -A nice feature of the lzip format is that a corrupt byte is easier to -repair the nearer it is from the beginning of the file. Therefore, with -the help of lziprecover, losing an entire archive just because of a -corrupt byte near the beginning is a thing of the past. - -The functions and variables forming the interface of the compression -library are declared in the file 'lzlib.h'. Usage examples of the -library are given in the files 'main.c' and 'bbexample.c' from the -source distribution. +All the library functions are thread safe. The library does not install any +signal handler. The decoder checks the consistency of the compressed data, +so the library should never crash even in case of corrupted input. Compression/decompression is done by repeatedly calling a couple of -read/write functions until all the data have been processed by the -library. This interface is safer and less error prone than the -traditional zlib interface. +read/write functions until all the data have been processed by the library. +This interface is safer and less error prone than the traditional zlib +interface. 
Compression/decompression is done when the read function is called. This -means the value returned by the position functions will not be updated -until a read call, even if a lot of data is written. If you want the -data to be compressed in advance, just call the read function with a -size equal to 0. +means the value returned by the position functions is not updated until a +read call, even if a lot of data are written. If you want the data to be +compressed in advance, just call the read function with a size equal to 0. -If all the data to be compressed are written in advance, lzlib will -automatically adjust the header of the compressed data to use the -smallest possible dictionary size. This feature reduces the amount of -memory needed for decompression and allows minilzip to produce identical -compressed output as lzip. +If all the data to be compressed are written in advance, lzlib automatically +adjusts the header of the compressed data to use the largest dictionary size +that exceeds neither the data size nor the limit given to +'LZ_compress_open'. This feature reduces the amount of memory needed for +decompression and allows minilzip to produce identical compressed output as +lzip. -Lzlib will correctly decompress a data stream which is the concatenation -of two or more compressed data streams. The result is the concatenation -of the corresponding decompressed data streams. Integrity testing of -concatenated compressed data streams is also supported. +Lzlib correctly decompresses a data stream which is the concatenation of +two or more compressed data streams. The result is the concatenation of the +corresponding decompressed data streams. Integrity testing of concatenated +compressed data streams is also supported. -All the library functions are thread safe. The library does not install -any signal handler. The decoder checks the consistency of the compressed -data, so the library should never crash even in case of corrupted input.
+Lzlib is able to compress and decompress streams of unlimited size by +automatically creating multimember output. The members so created are large, +about 2 PiB each. In spite of its name (Lempel-Ziv-Markov chain-Algorithm), LZMA is not a concrete algorithm; it is more like "any algorithm using the LZMA coding -scheme". For example, the option '-0' of lzip uses the scheme in almost -the simplest way possible; issuing the longest match it can find, or a -literal byte if it can't find a match. Inversely, a much more elaborated -way of finding coding sequences of minimum size than the one currently -used by lzip could be developed, and the resulting sequence could also -be coded using the LZMA coding scheme. +scheme". For example, the option '-0' of lzip uses the scheme in almost the +simplest way possible; issuing the longest match it can find, or a literal +byte if it can't find a match. Inversely, a more elaborate way of finding +coding sequences of minimum size than the one currently used by lzip could +be developed, and the resulting sequence could also be coded using the LZMA +coding scheme. -Lzlib currently implements two variants of the LZMA algorithm; fast -(used by option '-0' of minilzip) and normal (used by all other -compression levels). +Lzlib currently implements two variants of the LZMA algorithm: fast (used by +option '-0' of minilzip) and normal (used by all other compression levels). The high compression of LZMA comes from combining two basic, well-proven -compression ideas: sliding dictionaries (LZ77/78) and markov models (the -thing used by every compression algorithm that uses a range encoder or -similar order-0 entropy coder as its last stage) with segregation of -contexts according to what the bits are used for. 
+compression ideas: sliding dictionaries (LZ77) and Markov models (the thing +used by every compression algorithm that uses a range encoder or similar +order-0 entropy coder as its last stage) with segregation of contexts +according to what the bits are used for. The ideas embodied in lzlib are due to (at least) the following people: -Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrey Markov (for -the definition of Markov chains), G.N.N. Martin (for the definition of -range encoding), Igor Pavlov (for putting all the above together in -LZMA), and Julian Seward (for bzip2's CLI). +Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrei Markov (for the +definition of Markov chains), G.N.N. Martin (for the definition of range +encoding), Igor Pavlov (for putting all the above together in LZMA), and +Julian Seward (for bzip2's CLI). + +LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may never have +been compressed. Decompressed is used to refer to data which have undergone +the process of decompression. + +minilzip uses Arg_parser for command-line argument parsing: +http://www.nongnu.org/arg-parser/arg_parser.html -Copyright (C) 2009-2016 Antonio Diaz Diaz. +Copyright (C) 2009-2025 Antonio Diaz Diaz. This file is free documentation: you have unlimited permission to copy, -distribute and modify it. +distribute, and modify it. -The file Makefile.in is a data file used by configure to produce the -Makefile. It has the same copyright owner and permissions that configure -itself. +The file Makefile.in is a data file used by configure to produce the Makefile. +It has the same copyright owner and permissions that configure itself. diff --git a/bbexample.c b/bbexample.c index 6b1352b..4370c7e 100644 --- a/bbexample.c +++ b/bbexample.c @@ -1,21 +1,20 @@ -/* Buffer to buffer example - Test program for the lzlib library - Copyright (C) 2010-2016 Antonio Diaz Diaz. 
+/* Buffer to buffer example - Test program for the library lzlib + Copyright (C) 2010-2025 Antonio Diaz Diaz. - This program is free software: you have unlimited permission - to copy, distribute and modify it. + This program is free software: you have unlimited permission + to copy, distribute, and modify it. - Usage is: - bbexample filename + Usage: bbexample filename - This program is an example of how buffer-to-buffer - compression/decompression can be implemented using lzlib. + This program is an example of how buffer-to-buffer + compression/decompression can be implemented using lzlib. */ +#define _FILE_OFFSET_BITS 64 + #include #include -#ifndef __cplusplus #include -#endif #include #include #include @@ -24,70 +23,72 @@ #include "lzlib.h" +#ifndef min + #define min(x,y) ((x) <= (y) ? (x) : (y)) +#endif -/* Returns the address of a malloc'd buffer containing the file data and - its size in '*size'. - In case of error, returns 0 and does not modify '*size'. + +/* Return the address of a malloc'd buffer containing the file data and + the file size in '*file_sizep'. + In case of error, return 0 and do not modify '*file_sizep'. 
*/ -uint8_t * read_file( const char * const name, long * const size ) +uint8_t * read_file( const char * const name, long * const file_sizep ) { long buffer_size = 1 << 20, file_size; uint8_t * buffer, * tmp; FILE * const f = fopen( name, "rb" ); if( !f ) - { - fprintf( stderr, "bbexample: Can't open input file '%s': %s\n", - name, strerror( errno ) ); - return 0; - } + { fprintf( stderr, "bbexample: %s: Can't open input file: %s\n", + name, strerror( errno ) ); return 0; } buffer = (uint8_t *)malloc( buffer_size ); if( !buffer ) - { fputs( "bbexample: Not enough memory.\n", stderr ); return 0; } + { fputs( "bbexample: read_file: Not enough memory.\n", stderr ); + fclose( f ); return 0; } file_size = fread( buffer, 1, buffer_size, f ); while( file_size >= buffer_size ) { if( buffer_size >= LONG_MAX ) { - fprintf( stderr, "bbexample: Input file '%s' is too large.\n", name ); - free( buffer ); return 0; + fprintf( stderr, "bbexample: %s: Input file is too large.\n", name ); + free( buffer ); fclose( f ); return 0; } - buffer_size = ( buffer_size <= LONG_MAX / 2 ) ? 2 * buffer_size : LONG_MAX; + buffer_size = (buffer_size <= LONG_MAX / 2) ? 2 * buffer_size : LONG_MAX; tmp = (uint8_t *)realloc( buffer, buffer_size ); if( !tmp ) - { fputs( "bbexample: Not enough memory.\n", stderr ); - free( buffer ); return 0; } + { fputs( "bbexample: read_file: Not enough memory.\n", stderr ); + free( buffer ); fclose( f ); return 0; } buffer = tmp; file_size += fread( buffer + file_size, 1, buffer_size - file_size, f ); } if( ferror( f ) || !feof( f ) ) { - fprintf( stderr, "bbexample: Error reading file '%s': %s\n", + fprintf( stderr, "bbexample: %s: Error reading file: %s\n", name, strerror( errno ) ); - free( buffer ); return 0; + free( buffer ); fclose( f ); return 0; } fclose( f ); - *size = file_size; + *file_sizep = file_size; return buffer; } -/* Compresses 'size' bytes from 'data'. 
Returns the address of a - malloc'd buffer containing the compressed data and its size in - '*out_sizep'. - In case of error, returns 0 and does not modify '*out_sizep'. +/* Compress 'insize' bytes from 'inbuf'. + Return the address of a malloc'd buffer containing the compressed data, + and the size of the data in '*outlenp'. + In case of error, return 0 and do not modify '*outlenp'. */ -uint8_t * bbcompress( const uint8_t * const data, const long size, - const int level, long * const out_sizep ) +uint8_t * bbcompressl( const uint8_t * const inbuf, const long insize, + const int level, long * const outlenp ) { - struct Lzma_options + typedef struct Lzma_options { int dictionary_size; /* 4 KiB .. 512 MiB */ int match_len_limit; /* 5 .. 273 */ - }; - /* Mapping from gzip/bzip2 style 1..9 compression modes - to the corresponding LZMA compression modes. */ - const struct Lzma_options option_mapping[] = + } Lzma_options; + /* Mapping from gzip/bzip2 style 0..9 compression levels to the + corresponding LZMA compression parameters. 
*/ + const Lzma_options option_mapping[] = { { 65535, 16 }, /* -0 (65535,16 chooses fast encoder) */ { 1 << 20, 5 }, /* -1 */ @@ -99,133 +100,247 @@ uint8_t * bbcompress( const uint8_t * const data, const long size, { 1 << 24, 68 }, /* -7 */ { 3 << 23, 132 }, /* -8 */ { 1 << 25, 273 } }; /* -9 */ - struct Lzma_options encoder_options; - const unsigned long long member_size = 0x7FFFFFFFFFFFFFFFULL; /* INT64_MAX */ - struct LZ_Encoder * encoder; - uint8_t * new_data; - const long delta_size = ( size / 4 ) + 64; /* size may be zero */ - long new_data_size = delta_size; /* initial size */ - long new_pos = 0; - long written = 0; + Lzma_options encoder_options; + LZ_Encoder * encoder; + uint8_t * outbuf; + const long delta_size = insize / 4 + 64; /* insize may be zero */ + long outsize = delta_size; /* initial outsize */ + long inpos = 0; + long outpos = 0; bool error = false; if( level < 0 || level > 9 ) return 0; encoder_options = option_mapping[level]; - if( encoder_options.dictionary_size > size && level != 0 ) - encoder_options.dictionary_size = size; /* saves memory */ + if( encoder_options.dictionary_size > insize && level != 0 ) + encoder_options.dictionary_size = insize; /* saves memory */ if( encoder_options.dictionary_size < LZ_min_dictionary_size() ) encoder_options.dictionary_size = LZ_min_dictionary_size(); encoder = LZ_compress_open( encoder_options.dictionary_size, - encoder_options.match_len_limit, member_size ); - if( !encoder || LZ_compress_errno( encoder ) != LZ_ok ) - { LZ_compress_close( encoder ); return 0; } - - new_data = (uint8_t *)malloc( new_data_size ); - if( !new_data ) - { LZ_compress_close( encoder ); return 0; } + encoder_options.match_len_limit, INT64_MAX ); + outbuf = (uint8_t *)malloc( outsize ); + if( !encoder || LZ_compress_errno( encoder ) != LZ_ok || !outbuf ) + { free( outbuf ); LZ_compress_close( encoder ); return 0; } while( true ) { - int rd; - if( LZ_compress_write_size( encoder ) > 0 ) - { - if( written < size ) - { - const 
int wr = LZ_compress_write( encoder, data + written, - size - written ); - if( wr < 0 ) { error = true; break; } - written += wr; - } - if( written >= size ) LZ_compress_finish( encoder ); - } - rd = LZ_compress_read( encoder, new_data + new_pos, - new_data_size - new_pos ); - if( rd < 0 ) { error = true; break; } - new_pos += rd; + int ret = LZ_compress_write( encoder, inbuf + inpos, + min( INT_MAX, insize - inpos ) ); + if( ret < 0 ) { error = true; break; } + inpos += ret; + if( inpos >= insize ) LZ_compress_finish( encoder ); + ret = LZ_compress_read( encoder, outbuf + outpos, + min( INT_MAX, outsize - outpos ) ); + if( ret < 0 ) { error = true; break; } + outpos += ret; if( LZ_compress_finished( encoder ) == 1 ) break; - if( new_pos >= new_data_size ) + if( outpos >= outsize ) { uint8_t * tmp; - if( new_data_size > LONG_MAX - delta_size ) { error = true; break; } - new_data_size += delta_size; - tmp = (uint8_t *)realloc( new_data, new_data_size ); + if( outsize > LONG_MAX - delta_size ) { error = true; break; } + outsize += delta_size; + tmp = (uint8_t *)realloc( outbuf, outsize ); if( !tmp ) { error = true; break; } - new_data = tmp; + outbuf = tmp; } } if( LZ_compress_close( encoder ) < 0 ) error = true; - if( error ) { free( new_data ); return 0; } - *out_sizep = new_pos; - return new_data; + if( error ) { free( outbuf ); return 0; } + *outlenp = outpos; + return outbuf; } -/* Decompresses 'size' bytes from 'data'. Returns the address of a - malloc'd buffer containing the decompressed data and its size in - '*out_sizep'. - In case of error, returns 0 and does not modify '*out_sizep'. +/* Decompress 'insize' bytes from 'inbuf'. + Return the address of a malloc'd buffer containing the decompressed + data, and the size of the data in '*outlenp'. + In case of error, return 0 and do not modify '*outlenp'. 
*/ -uint8_t * bbdecompress( const uint8_t * const data, const long size, - long * const out_sizep ) +uint8_t * bbdecompressl( const uint8_t * const inbuf, const long insize, + long * const outlenp ) { - struct LZ_Decoder * const decoder = LZ_decompress_open(); - uint8_t * new_data; - const long delta_size = size; /* size must be > zero */ - long new_data_size = delta_size; /* initial size */ - long new_pos = 0; - long written = 0; + LZ_Decoder * const decoder = LZ_decompress_open(); + const long delta_size = insize; /* insize must be > zero */ + long outsize = delta_size; /* initial outsize */ + uint8_t * outbuf = (uint8_t *)malloc( outsize ); + long inpos = 0; + long outpos = 0; bool error = false; - if( !decoder || LZ_decompress_errno( decoder ) != LZ_ok ) - { LZ_decompress_close( decoder ); return 0; } - - new_data = (uint8_t *)malloc( new_data_size ); - if( !new_data ) - { LZ_decompress_close( decoder ); return 0; } + if( !decoder || LZ_decompress_errno( decoder ) != LZ_ok || !outbuf ) + { free( outbuf ); LZ_decompress_close( decoder ); return 0; } while( true ) { - int rd; - if( LZ_decompress_write_size( decoder ) > 0 ) - { - if( written < size ) - { - const int wr = LZ_decompress_write( decoder, data + written, - size - written ); - if( wr < 0 ) { error = true; break; } - written += wr; - } - if( written >= size ) LZ_decompress_finish( decoder ); - } - rd = LZ_decompress_read( decoder, new_data + new_pos, - new_data_size - new_pos ); - if( rd < 0 ) { error = true; break; } - new_pos += rd; + int ret = LZ_decompress_write( decoder, inbuf + inpos, + min( INT_MAX, insize - inpos ) ); + if( ret < 0 ) { error = true; break; } + inpos += ret; + if( inpos >= insize ) LZ_decompress_finish( decoder ); + ret = LZ_decompress_read( decoder, outbuf + outpos, + min( INT_MAX, outsize - outpos ) ); + if( ret < 0 ) { error = true; break; } + outpos += ret; if( LZ_decompress_finished( decoder ) == 1 ) break; - if( new_pos >= new_data_size ) + if( outpos >= outsize ) { uint8_t 
* tmp; - if( new_data_size > LONG_MAX - delta_size ) { error = true; break; } - new_data_size += delta_size; - tmp = (uint8_t *)realloc( new_data, new_data_size ); + if( outsize > LONG_MAX - delta_size ) { error = true; break; } + outsize += delta_size; + tmp = (uint8_t *)realloc( outbuf, outsize ); if( !tmp ) { error = true; break; } - new_data = tmp; + outbuf = tmp; } } if( LZ_decompress_close( decoder ) < 0 ) error = true; - if( error ) { free( new_data ); return 0; } - *out_sizep = new_pos; - return new_data; + if( error ) { free( outbuf ); return 0; } + *outlenp = outpos; + return outbuf; + } + + +/* Test the whole file at all levels. */ +int full_test( const uint8_t * const inbuf, const long insize ) + { + int level; + for( level = 0; level <= 9; ++level ) + { + long midsize = 0, outsize = 0; + uint8_t * outbuf; + uint8_t * midbuf = bbcompressl( inbuf, insize, level, &midsize ); + if( !midbuf ) + { fputs( "bbexample: full_test: Not enough memory or compress error.\n", + stderr ); return 1; } + + outbuf = bbdecompressl( midbuf, midsize, &outsize ); + free( midbuf ); + if( !outbuf ) + { fputs( "bbexample: full_test: Not enough memory or decompress error.\n", + stderr ); return 1; } + + if( insize != outsize || + ( insize > 0 && memcmp( inbuf, outbuf, insize ) != 0 ) ) + { fputs( "bbexample: full_test: Decompressed data differs from original.\n", + stderr ); free( outbuf ); return 1; } + + free( outbuf ); + } + return 0; + } + + +/* Compress 'insize' bytes from 'inbuf' to 'outbuf'. + Return the size of the compressed data in '*outlenp'. + In case of error, or if 'outsize' is too small, return false and do not + modify '*outlenp'. 
+*/ +bool bbcompress( const uint8_t * const inbuf, const int insize, + const int dictionary_size, const int match_len_limit, + uint8_t * const outbuf, const int outsize, + int * const outlenp ) + { + int inpos = 0, outpos = 0; + bool error = false; + LZ_Encoder * const encoder = + LZ_compress_open( dictionary_size, match_len_limit, INT64_MAX ); + if( !encoder || LZ_compress_errno( encoder ) != LZ_ok ) + { LZ_compress_close( encoder ); return false; } + + while( true ) + { + int ret = LZ_compress_write( encoder, inbuf + inpos, insize - inpos ); + if( ret < 0 ) { error = true; break; } + inpos += ret; + if( inpos >= insize ) LZ_compress_finish( encoder ); + ret = LZ_compress_read( encoder, outbuf + outpos, outsize - outpos ); + if( ret < 0 ) { error = true; break; } + outpos += ret; + if( LZ_compress_finished( encoder ) == 1 ) break; + if( outpos >= outsize ) { error = true; break; } + } + + if( LZ_compress_close( encoder ) < 0 ) error = true; + if( error ) return false; + *outlenp = outpos; + return true; + } + + +/* Decompress 'insize' bytes from 'inbuf' to 'outbuf'. + Return the size of the decompressed data in '*outlenp'. + In case of error, or if 'outsize' is too small, return false and do not + modify '*outlenp'. 
+*/ +bool bbdecompress( const uint8_t * const inbuf, const int insize, + uint8_t * const outbuf, const int outsize, + int * const outlenp ) + { + int inpos = 0, outpos = 0; + bool error = false; + LZ_Decoder * const decoder = LZ_decompress_open(); + if( !decoder || LZ_decompress_errno( decoder ) != LZ_ok ) + { LZ_decompress_close( decoder ); return false; } + + while( true ) + { + int ret = LZ_decompress_write( decoder, inbuf + inpos, insize - inpos ); + if( ret < 0 ) { error = true; break; } + inpos += ret; + if( inpos >= insize ) LZ_decompress_finish( decoder ); + ret = LZ_decompress_read( decoder, outbuf + outpos, outsize - outpos ); + if( ret < 0 ) { error = true; break; } + outpos += ret; + if( LZ_decompress_finished( decoder ) == 1 ) break; + if( outpos >= outsize ) { error = true; break; } + } + + if( LZ_decompress_close( decoder ) < 0 ) error = true; + if( error ) return false; + *outlenp = outpos; + return true; + } + + +/* Test at most INT_MAX bytes from the file with buffers of fixed size. 
*/ +int fixed_test( const uint8_t * const inbuf, const int insize ) + { + int dictionary_size = 65535; /* fast encoder */ + int midsize = min( INT_MAX, ( insize / 8 ) * 9LL + 44 ), outsize = insize; + uint8_t * midbuf = (uint8_t *)malloc( midsize ); + uint8_t * outbuf = (uint8_t *)malloc( outsize ); + if( !midbuf || !outbuf ) + { fputs( "bbexample: fixed_test: Not enough memory.\n", stderr ); + free( outbuf ); free( midbuf ); return 1; } + + for( ; dictionary_size <= 8 << 20; dictionary_size += 8323073 ) + { + int midlen, outlen; + if( !bbcompress( inbuf, insize, dictionary_size, 16, midbuf, midsize, &midlen ) ) + { fputs( "bbexample: fixed_test: Not enough memory or compress error.\n", + stderr ); free( outbuf ); free( midbuf ); return 1; } + + if( !bbdecompress( midbuf, midlen, outbuf, outsize, &outlen ) ) + { fputs( "bbexample: fixed_test: Not enough memory or decompress error.\n", + stderr ); free( outbuf ); free( midbuf ); return 1; } + + if( insize != outlen || + ( insize > 0 && memcmp( inbuf, outbuf, insize ) != 0 ) ) + { fputs( "bbexample: fixed_test: Decompressed data differs from original.\n", + stderr ); free( outbuf ); free( midbuf ); return 1; } + + } + free( outbuf ); + free( midbuf ); + return 0; } int main( const int argc, const char * const argv[] ) { - uint8_t * in_buffer; - long in_size = 0; - int level; + int retval = 0, i; + int open_failures = 0; + const bool verbose = argc > 2; if( argc < 2 ) { @@ -233,38 +348,20 @@ int main( const int argc, const char * const argv[] ) return 1; } - in_buffer = read_file( argv[1], &in_size ); - if( !in_buffer ) return 1; - - for( level = 0; level <= 9; ++level ) + for( i = 1; i < argc && retval == 0; ++i ) { - uint8_t * mid_buffer, * out_buffer; - long mid_size = 0, out_size = 0; + long insize; + uint8_t * const inbuf = read_file( argv[i], &insize ); + if( !inbuf ) { ++open_failures; continue; } + if( verbose ) fprintf( stderr, " Testing file '%s'\n", argv[i] ); - mid_buffer = bbcompress( in_buffer, in_size, 
level, &mid_size ); - if( !mid_buffer ) - { - fputs( "bbexample: Not enough memory or compress error.\n", stderr ); - return 1; - } - - out_buffer = bbdecompress( mid_buffer, mid_size, &out_size ); - if( !out_buffer ) - { - fputs( "bbexample: Not enough memory or decompress error.\n", stderr ); - return 1; - } - - if( in_size != out_size || - ( in_size > 0 && memcmp( in_buffer, out_buffer, in_size ) != 0 ) ) - { - fputs( "bbexample: Decompressed data differs from original.\n", stderr ); - return 1; - } - - free( out_buffer ); - free( mid_buffer ); + retval = full_test( inbuf, insize ); + if( retval == 0 ) retval = fixed_test( inbuf, min( INT_MAX, insize ) ); + free( inbuf ); } - free( in_buffer ); - return 0; + if( open_failures > 0 && verbose ) + fprintf( stderr, "bbexample: warning: %d %s failed to open.\n", + open_failures, ( open_failures == 1 ) ? "file" : "files" ); + if( retval == 0 && open_failures ) retval = 1; + return retval; } diff --git a/carg_parser.c b/carg_parser.c index 3d4e89f..20b8a16 100644 --- a/carg_parser.c +++ b/carg_parser.c @@ -1,20 +1,20 @@ -/* Arg_parser - POSIX/GNU command line argument parser. (C version) - Copyright (C) 2006-2016 Antonio Diaz Diaz. +/* Arg_parser - POSIX/GNU command-line argument parser. (C version) + Copyright (C) 2006-2025 Antonio Diaz Diaz. - This library is free software. Redistribution and use in source and - binary forms, with or without modification, are permitted provided - that the following conditions are met: + This library is free software. Redistribution and use in source and + binary forms, with or without modification, are permitted provided + that the following conditions are met: - 1. Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. + 1. Redistributions of source code must retain the above copyright + notice, this list of conditions, and the following disclaimer. - 2. 
Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions, and the following disclaimer in the + documentation and/or other materials provided with the distribution. - This library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. + This library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. */ #include @@ -32,28 +32,46 @@ static void * ap_resize_buffer( void * buf, const int min_size ) } -static char push_back_record( struct Arg_parser * const ap, - const int code, const char * const argument ) +static char push_back_record( Arg_parser * const ap, const int code, + const char * const long_name, + const char * const argument ) { - const int len = strlen( argument ); - struct ap_Record * p; + ap_Record * p; void * tmp = ap_resize_buffer( ap->data, - ( ap->data_size + 1 ) * sizeof (struct ap_Record) ); + ( ap->data_size + 1 ) * sizeof (ap_Record) ); if( !tmp ) return 0; - ap->data = (struct ap_Record *)tmp; + ap->data = (ap_Record *)tmp; p = &(ap->data[ap->data_size]); p->code = code; - p->argument = 0; - tmp = ap_resize_buffer( p->argument, len + 1 ); - if( !tmp ) return 0; - p->argument = (char *)tmp; - strncpy( p->argument, argument, len + 1 ); + if( long_name ) + { + const int len = strlen( long_name ); + p->parsed_name = (char *)malloc( len + 2 + 1 ); + if( !p->parsed_name ) return 0; + p->parsed_name[0] = p->parsed_name[1] = '-'; + strncpy( p->parsed_name + 2, long_name, len + 1 ); + } + else if( code > 0 && code < 256 ) + { + p->parsed_name = (char *)malloc( 2 + 1 ); + 
if( !p->parsed_name ) return 0; + p->parsed_name[0] = '-'; p->parsed_name[1] = code; p->parsed_name[2] = 0; + } + else p->parsed_name = 0; + if( argument ) + { + const int len = strlen( argument ); + p->argument = (char *)malloc( len + 1 ); + if( !p->argument ) { free( p->parsed_name ); return 0; } + strncpy( p->argument, argument, len + 1 ); + } + else p->argument = 0; ++ap->data_size; return 1; } -static char add_error( struct Arg_parser * const ap, const char * const msg ) +static char add_error( Arg_parser * const ap, const char * const msg ) { const int len = strlen( msg ); void * tmp = ap_resize_buffer( ap->error, ap->error_size + len + 1 ); @@ -65,19 +83,20 @@ static char add_error( struct Arg_parser * const ap, const char * const msg ) } -static void free_data( struct Arg_parser * const ap ) +static void free_data( Arg_parser * const ap ) { int i; - for( i = 0; i < ap->data_size; ++i ) free( ap->data[i].argument ); + for( i = 0; i < ap->data_size; ++i ) + { free( ap->data[i].argument ); free( ap->data[i].parsed_name ); } if( ap->data ) { free( ap->data ); ap->data = 0; } ap->data_size = 0; } -static char parse_long_option( struct Arg_parser * const ap, +/* Return 0 only if out of memory. */ +static char parse_long_option( Arg_parser * const ap, const char * const opt, const char * const arg, - const struct ap_Option options[], - int * const argindp ) + const ap_Option options[], int * const argindp ) { unsigned len; int index = -1, i; @@ -87,14 +106,15 @@ static char parse_long_option( struct Arg_parser * const ap, /* Test all long options for either exact match or abbreviated matches. 
*/ for( i = 0; options[i].code != 0; ++i ) - if( options[i].name && strncmp( options[i].name, &opt[2], len ) == 0 ) + if( options[i].long_name && + strncmp( options[i].long_name, &opt[2], len ) == 0 ) { - if( strlen( options[i].name ) == len ) /* Exact match found */ + if( strlen( options[i].long_name ) == len ) /* Exact match found */ { index = i; exact = 1; break; } else if( index < 0 ) index = i; /* First nonexact match found */ else if( options[index].code != options[i].code || options[index].has_arg != options[i].has_arg ) - ambig = 1; /* Second or later nonexact match found */ + ambig = 1; /* Second or later nonexact match found */ } if( ambig && !exact ) @@ -117,52 +137,55 @@ static char parse_long_option( struct Arg_parser * const ap, { if( options[index].has_arg == ap_no ) { - add_error( ap, "option '--" ); add_error( ap, options[index].name ); + add_error( ap, "option '--" ); add_error( ap, options[index].long_name ); add_error( ap, "' doesn't allow an argument" ); return 1; } if( options[index].has_arg == ap_yes && !opt[len+3] ) { - add_error( ap, "option '--" ); add_error( ap, options[index].name ); + add_error( ap, "option '--" ); add_error( ap, options[index].long_name ); add_error( ap, "' requires an argument" ); return 1; } - return push_back_record( ap, options[index].code, &opt[len+3] ); + return push_back_record( ap, options[index].code, options[index].long_name, + &opt[len+3] ); /* argument may be empty */ } - if( options[index].has_arg == ap_yes ) + if( options[index].has_arg == ap_yes || options[index].has_arg == ap_yme ) { - if( !arg || !arg[0] ) + if( !arg || ( options[index].has_arg == ap_yes && !arg[0] ) ) { - add_error( ap, "option '--" ); add_error( ap, options[index].name ); + add_error( ap, "option '--" ); add_error( ap, options[index].long_name ); add_error( ap, "' requires an argument" ); return 1; } ++*argindp; - return push_back_record( ap, options[index].code, arg ); + return push_back_record( ap, options[index].code, 
options[index].long_name, + arg ); /* argument may be empty */ } - return push_back_record( ap, options[index].code, "" ); + return push_back_record( ap, options[index].code, + options[index].long_name, 0 ); } -static char parse_short_option( struct Arg_parser * const ap, +/* Return 0 only if out of memory. */ +static char parse_short_option( Arg_parser * const ap, const char * const opt, const char * const arg, - const struct ap_Option options[], - int * const argindp ) + const ap_Option options[], int * const argindp ) { int cind = 1; /* character index in opt */ while( cind > 0 ) { int index = -1, i; - const unsigned char code = opt[cind]; + const unsigned char c = opt[cind]; char code_str[2]; - code_str[0] = code; code_str[1] = 0; + code_str[0] = c; code_str[1] = 0; - if( code != 0 ) + if( c != 0 ) for( i = 0; options[i].code; ++i ) - if( code == options[i].code ) + if( c == options[i].code ) { index = i; break; } if( index < 0 ) @@ -176,34 +199,34 @@ static char parse_short_option( struct Arg_parser * const ap, if( options[index].has_arg != ap_no && cind > 0 && opt[cind] ) { - if( !push_back_record( ap, code, &opt[cind] ) ) return 0; + if( !push_back_record( ap, c, 0, &opt[cind] ) ) return 0; ++*argindp; cind = 0; } - else if( options[index].has_arg == ap_yes ) + else if( options[index].has_arg == ap_yes || options[index].has_arg == ap_yme ) { - if( !arg || !arg[0] ) + if( !arg || ( options[index].has_arg == ap_yes && !arg[0] ) ) { add_error( ap, "option requires an argument -- '" ); add_error( ap, code_str ); add_error( ap, "'" ); return 1; } - ++*argindp; cind = 0; - if( !push_back_record( ap, code, arg ) ) return 0; + ++*argindp; cind = 0; /* argument may be empty */ + if( !push_back_record( ap, c, 0, arg ) ) return 0; } - else if( !push_back_record( ap, code, "" ) ) return 0; + else if( !push_back_record( ap, c, 0, 0 ) ) return 0; } return 1; } -char ap_init( struct Arg_parser * const ap, +char ap_init( Arg_parser * const ap, const int argc, const char * 
const argv[], - const struct ap_Option options[], const char in_order ) + const ap_Option options[], const char in_order ) { const char ** non_options = 0; /* skipped non-options */ int non_options_size = 0; /* number of skipped non-options */ int argind = 1; /* index in argv */ - int i; + char done = 0; /* false until success */ ap->data = 0; ap->error = 0; @@ -223,38 +246,41 @@ char ap_init( struct Arg_parser * const ap, if( ch2 == '-' ) { if( !argv[argind][2] ) { ++argind; break; } /* we found "--" */ - else if( !parse_long_option( ap, opt, arg, options, &argind ) ) return 0; + else if( !parse_long_option( ap, opt, arg, options, &argind ) ) goto out; } - else if( !parse_short_option( ap, opt, arg, options, &argind ) ) return 0; + else if( !parse_short_option( ap, opt, arg, options, &argind ) ) goto out; if( ap->error ) break; } else { - if( !in_order ) + if( in_order ) + { if( !push_back_record( ap, 0, 0, argv[argind++] ) ) goto out; } + else { void * tmp = ap_resize_buffer( non_options, ( non_options_size + 1 ) * sizeof *non_options ); - if( !tmp ) return 0; + if( !tmp ) goto out; non_options = (const char **)tmp; non_options[non_options_size++] = argv[argind++]; } - else if( !push_back_record( ap, 0, argv[argind++] ) ) return 0; } } if( ap->error ) free_data( ap ); else { + int i; for( i = 0; i < non_options_size; ++i ) - if( !push_back_record( ap, 0, non_options[i] ) ) return 0; + if( !push_back_record( ap, 0, 0, non_options[i] ) ) goto out; while( argind < argc ) - if( !push_back_record( ap, 0, argv[argind++] ) ) return 0; + if( !push_back_record( ap, 0, 0, argv[argind++] ) ) goto out; } - if( non_options ) free( non_options ); - return 1; + done = 1; +out: if( non_options ) free( non_options ); + return done; } -void ap_free( struct Arg_parser * const ap ) +void ap_free( Arg_parser * const ap ) { free_data( ap ); if( ap->error ) { free( ap->error ); ap->error = 0; } @@ -262,23 +288,26 @@ void ap_free( struct Arg_parser * const ap ) } -const char * ap_error( 
const struct Arg_parser * const ap ) - { return ap->error; } +const char * ap_error( const Arg_parser * const ap ) { return ap->error; } +int ap_arguments( const Arg_parser * const ap ) { return ap->data_size; } -int ap_arguments( const struct Arg_parser * const ap ) - { return ap->data_size; } - - -int ap_code( const struct Arg_parser * const ap, const int i ) +int ap_code( const Arg_parser * const ap, const int i ) { - if( i >= 0 && i < ap_arguments( ap ) ) return ap->data[i].code; - else return 0; + if( i < 0 || i >= ap_arguments( ap ) ) return 0; + return ap->data[i].code; } -const char * ap_argument( const struct Arg_parser * const ap, const int i ) +const char * ap_parsed_name( const Arg_parser * const ap, const int i ) { - if( i >= 0 && i < ap_arguments( ap ) ) return ap->data[i].argument; - else return ""; + if( i < 0 || i >= ap_arguments( ap ) || !ap->data[i].parsed_name ) return ""; + return ap->data[i].parsed_name; + } + + +const char * ap_argument( const Arg_parser * const ap, const int i ) + { + if( i < 0 || i >= ap_arguments( ap ) || !ap->data[i].argument ) return ""; + return ap->data[i].argument; } diff --git a/carg_parser.h b/carg_parser.h index e918942..28eabee 100644 --- a/carg_parser.h +++ b/carg_parser.h @@ -1,92 +1,101 @@ -/* Arg_parser - POSIX/GNU command line argument parser. (C version) - Copyright (C) 2006-2016 Antonio Diaz Diaz. +/* Arg_parser - POSIX/GNU command-line argument parser. (C version) + Copyright (C) 2006-2025 Antonio Diaz Diaz. - This library is free software. Redistribution and use in source and - binary forms, with or without modification, are permitted provided - that the following conditions are met: + This library is free software. Redistribution and use in source and + binary forms, with or without modification, are permitted provided + that the following conditions are met: - 1. Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. + 1. 
Redistributions of source code must retain the above copyright + notice, this list of conditions, and the following disclaimer. - 2. Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions, and the following disclaimer in the + documentation and/or other materials provided with the distribution. - This library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. + This library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. */ -/* Arg_parser reads the arguments in 'argv' and creates a number of - option codes, option arguments and non-option arguments. +/* Arg_parser reads the arguments in 'argv' and creates a number of + option codes, option arguments, and non-option arguments. - In case of error, 'ap_error' returns a non-null pointer to an error - message. + In case of error, 'ap_error' returns a non-null pointer to an error + message. - 'options' is an array of 'struct ap_Option' terminated by an element - containing a code which is zero. A null name means a short-only - option. A code value outside the unsigned char range means a - long-only option. + 'options' is an array of 'struct ap_Option' terminated by an element + containing a code which is zero. A null long_name means a short-only + option. A code value outside the unsigned char range means a long-only + option. 
- Arg_parser normally makes it appear as if all the option arguments - were specified before all the non-option arguments for the purposes - of parsing, even if the user of your program intermixed option and - non-option arguments. If you want the arguments in the exact order - the user typed them, call 'ap_init' with 'in_order' = true. + Arg_parser normally makes it appear as if all the option arguments + were specified before all the non-option arguments for the purposes + of parsing, even if the user of your program intermixed option and + non-option arguments. If you want the arguments in the exact order + the user typed them, call 'ap_init' with 'in_order' = true. - The argument '--' terminates all options; any following arguments are - treated as non-option arguments, even if they begin with a hyphen. + The argument '--' terminates all options; any following arguments are + treated as non-option arguments, even if they begin with a hyphen. - The syntax for optional option arguments is '-<short_option><argument>' - (without whitespace), or '--<long_option>=<argument>'. + The syntax of options with an optional argument is + '-<short_option><argument>' (without whitespace), or + '--<long_option>=<argument>'. + + The syntax of options with an empty argument is '-<short_option> ""', + '--<long_option> ""', or '--<long_option>=""'.
*/ #ifdef __cplusplus extern "C" { #endif -enum ap_Has_arg { ap_no, ap_yes, ap_maybe }; +/* ap_yme = yes but maybe empty */ +typedef enum ap_Has_arg { ap_no, ap_yes, ap_maybe, ap_yme } ap_Has_arg; -struct ap_Option +typedef struct ap_Option { int code; /* Short option letter or code ( code != 0 ) */ - const char * name; /* Long option name (maybe null) */ - enum ap_Has_arg has_arg; - }; + const char * long_name; /* Long option name (maybe null) */ + ap_Has_arg has_arg; + } ap_Option; -struct ap_Record +typedef struct ap_Record { int code; + char * parsed_name; char * argument; - }; + } ap_Record; -struct Arg_parser +typedef struct Arg_parser { - struct ap_Record * data; + ap_Record * data; char * error; int data_size; int error_size; - }; + } Arg_parser; -char ap_init( struct Arg_parser * const ap, +char ap_init( Arg_parser * const ap, const int argc, const char * const argv[], - const struct ap_Option options[], const char in_order ); + const ap_Option options[], const char in_order ); -void ap_free( struct Arg_parser * const ap ); +void ap_free( Arg_parser * const ap ); -const char * ap_error( const struct Arg_parser * const ap ); +const char * ap_error( const Arg_parser * const ap ); - /* The number of arguments parsed (may be different from argc) */ -int ap_arguments( const struct Arg_parser * const ap ); +/* The number of arguments parsed. May be different from argc. */ +int ap_arguments( const Arg_parser * const ap ); - /* If ap_code( i ) is 0, ap_argument( i ) is a non-option. - Else ap_argument( i ) is the option's argument (or empty). */ -int ap_code( const struct Arg_parser * const ap, const int i ); +/* If ap_code( i ) is 0, ap_argument( i ) is a non-option. + Else ap_argument( i ) is the option's argument (or empty). */ +int ap_code( const Arg_parser * const ap, const int i ); -const char * ap_argument( const struct Arg_parser * const ap, const int i ); +/* Full name of the option parsed (short or long). 
*/ +const char * ap_parsed_name( const Arg_parser * const ap, const int i ); + +const char * ap_argument( const Arg_parser * const ap, const int i ); #ifdef __cplusplus } diff --git a/cbuffer.c b/cbuffer.c index 89693ab..23d95e1 100644 --- a/cbuffer.c +++ b/cbuffer.c @@ -1,42 +1,31 @@ -/* Lzlib - Compression library for the lzip format - Copyright (C) 2009-2016 Antonio Diaz Diaz. +/* Lzlib - Compression library for the lzip format + Copyright (C) 2009-2025 Antonio Diaz Diaz. - This library is free software: you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation, either version 2 of the License, or - (at your option) any later version. + This library is free software. Redistribution and use in source and + binary forms, with or without modification, are permitted provided + that the following conditions are met: - This library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. + 1. Redistributions of source code must retain the above copyright + notice, this list of conditions, and the following disclaimer. - You should have received a copy of the GNU General Public License - along with this library. If not, see <http://www.gnu.org/licenses/>. + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions, and the following disclaimer in the + documentation and/or other materials provided with the distribution. - As a special exception, you may use this file as part of a free - software library without restriction.
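The new 'ap_yme' value ("yes but maybe empty") changes when a separate argument counts as missing. The following is a minimal sketch of that rule, lifted from the condition used in parse_long_option and parse_short_option in this patch. The helper name 'separate_arg_missing' is hypothetical and is not part of carg_parser; it only applies to options whose has_arg is ap_yes or ap_yme.

```c
#include <stddef.h>

/* Mirrors the enum in carg_parser.h after this patch.
   ap_yme = yes but maybe empty */
typedef enum ap_Has_arg { ap_no, ap_yes, ap_maybe, ap_yme } ap_Has_arg;

/* Hypothetical helper (not in carg_parser): return 1 if the separate
   argument 'arg' of an option that takes one must be rejected.
   A null 'arg' is always an error. An empty string is an error only
   for ap_yes; ap_yme accepts it. */
static int separate_arg_missing( const ap_Has_arg has_arg,
                                 const char * const arg )
  {
  return !arg || ( has_arg == ap_yes && !arg[0] );
  }
```

With this rule, an invocation such as '--name ""' is accepted for an ap_yme option and rejected with "requires an argument" for an ap_yes option.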
Specifically, if other files - instantiate templates or use macros or inline functions from this - file, or you compile this file and link it with other files to - produce an executable, this file does not by itself cause the - resulting executable to be covered by the GNU General Public - License. This exception does not however invalidate any other - reasons why the executable file might be covered by the GNU General - Public License. + This library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. */ -struct Circular_buffer +typedef struct Circular_buffer { uint8_t * buffer; unsigned buffer_size; /* capacity == buffer_size - 1 */ unsigned get; /* buffer is empty when get == put */ unsigned put; - }; + } Circular_buffer; -static inline void Cb_reset( struct Circular_buffer * const cb ) - { cb->get = 0; cb->put = 0; } - -static inline bool Cb_init( struct Circular_buffer * const cb, +static inline bool Cb_init( Circular_buffer * const cb, const unsigned buf_size ) { cb->buffer_size = buf_size + 1; @@ -44,35 +33,39 @@ static inline bool Cb_init( struct Circular_buffer * const cb, cb->put = 0; cb->buffer = ( cb->buffer_size > 1 ) ? (uint8_t *)malloc( cb->buffer_size ) : 0; - return ( cb->buffer != 0 ); + return cb->buffer != 0; } -static inline void Cb_free( struct Circular_buffer * const cb ) +static inline void Cb_free( Circular_buffer * const cb ) { free( cb->buffer ); cb->buffer = 0; } -static inline unsigned Cb_used_bytes( const struct Circular_buffer * const cb ) +static inline void Cb_reset( Circular_buffer * const cb ) + { cb->get = 0; cb->put = 0; } + +static inline unsigned Cb_empty( const Circular_buffer * const cb ) + { return cb->get == cb->put; } + +static inline unsigned Cb_used_bytes( const Circular_buffer * const cb ) { return ( (cb->get <= cb->put) ? 
0 : cb->buffer_size ) + cb->put - cb->get; } -static inline unsigned Cb_free_bytes( const struct Circular_buffer * const cb ) +static inline unsigned Cb_free_bytes( const Circular_buffer * const cb ) { return ( (cb->get <= cb->put) ? cb->buffer_size : 0 ) - cb->put + cb->get - 1; } -static inline uint8_t Cb_get_byte( struct Circular_buffer * const cb ) +static inline uint8_t Cb_get_byte( Circular_buffer * const cb ) { const uint8_t b = cb->buffer[cb->get]; if( ++cb->get >= cb->buffer_size ) cb->get = 0; return b; } -static inline void Cb_put_byte( struct Circular_buffer * const cb, - const uint8_t b ) +static inline void Cb_put_byte( Circular_buffer * const cb, const uint8_t b ) { cb->buffer[cb->put] = b; if( ++cb->put >= cb->buffer_size ) cb->put = 0; } -static bool Cb_unread_data( struct Circular_buffer * const cb, - const unsigned size ) +static bool Cb_unread_data( Circular_buffer * const cb, const unsigned size ) { if( size > Cb_free_bytes( cb ) ) return false; if( cb->get >= size ) cb->get -= size; @@ -81,10 +74,11 @@ static bool Cb_unread_data( struct Circular_buffer * const cb, } -/* Copies up to 'out_size' bytes to 'out_buffer' and updates 'get'. - Returns the number of bytes copied. +/* Copy up to 'out_size' bytes to 'out_buffer' and update 'get'. + If 'out_buffer' is null, the bytes are discarded. + Return the number of bytes copied or discarded. 
*/ -static unsigned Cb_read_data( struct Circular_buffer * const cb, +static unsigned Cb_read_data( Circular_buffer * const cb, uint8_t * const out_buffer, const unsigned out_size ) { @@ -95,7 +89,7 @@ static unsigned Cb_read_data( struct Circular_buffer * const cb, size = min( cb->buffer_size - cb->get, out_size ); if( size > 0 ) { - memcpy( out_buffer, cb->buffer + cb->get, size ); + if( out_buffer ) memcpy( out_buffer, cb->buffer + cb->get, size ); cb->get += size; if( cb->get >= cb->buffer_size ) cb->get = 0; } @@ -105,7 +99,7 @@ static unsigned Cb_read_data( struct Circular_buffer * const cb, const unsigned size2 = min( cb->put - cb->get, out_size - size ); if( size2 > 0 ) { - memcpy( out_buffer + size, cb->buffer + cb->get, size2 ); + if( out_buffer ) memcpy( out_buffer + size, cb->buffer + cb->get, size2 ); cb->get += size2; size += size2; } @@ -114,10 +108,10 @@ static unsigned Cb_read_data( struct Circular_buffer * const cb, } -/* Copies up to 'in_size' bytes from 'in_buffer' and updates 'put'. - Returns the number of bytes copied. +/* Copy up to 'in_size' bytes from 'in_buffer' and update 'put'. + Return the number of bytes copied. */ -static unsigned Cb_write_data( struct Circular_buffer * const cb, +static unsigned Cb_write_data( Circular_buffer * const cb, const uint8_t * const in_buffer, const unsigned in_size ) { diff --git a/configure b/configure index 2182cfa..90ab72d 100755 --- a/configure +++ b/configure @@ -1,19 +1,21 @@ #! /bin/sh # configure script for Lzlib - Compression library for the lzip format -# Copyright (C) 2009-2016 Antonio Diaz Diaz. +# Copyright (C) 2009-2025 Antonio Diaz Diaz. # # This configure script is free software: you have unlimited permission -# to copy, distribute and modify it. +# to copy, distribute, and modify it. 
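The Cb_read_data change in cbuffer.c above lets a caller pass a null 'out_buffer' to discard bytes instead of copying them. The following is a self-contained sketch of that behavior, assuming the parts of the function that fall outside the hunks look as reconstructed here. 'min_u' is a hypothetical stand-in for the library's 'min', and 'static inline' is reduced to 'static'.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct Circular_buffer
  {
  uint8_t * buffer;
  unsigned buffer_size;		/* capacity == buffer_size - 1 */
  unsigned get;			/* buffer is empty when get == put */
  unsigned put;
  } Circular_buffer;

/* Hypothetical stand-in for the 'min' helper used by the library. */
static unsigned min_u( const unsigned a, const unsigned b )
  { return ( a < b ) ? a : b; }

static bool Cb_init( Circular_buffer * const cb, const unsigned buf_size )
  {
  cb->buffer_size = buf_size + 1;
  cb->get = 0;
  cb->put = 0;
  cb->buffer = ( cb->buffer_size > 1 ) ?
               (uint8_t *)malloc( cb->buffer_size ) : 0;
  return cb->buffer != 0;
  }

static unsigned Cb_used_bytes( const Circular_buffer * const cb )
  { return ( (cb->get <= cb->put) ? 0 : cb->buffer_size ) + cb->put - cb->get; }

static unsigned Cb_free_bytes( const Circular_buffer * const cb )
  { return ( (cb->get <= cb->put) ? cb->buffer_size : 0 ) - cb->put + cb->get - 1; }

static void Cb_put_byte( Circular_buffer * const cb, const uint8_t b )
  {
  cb->buffer[cb->put] = b;
  if( ++cb->put >= cb->buffer_size ) cb->put = 0;
  }

/* Copy up to 'out_size' bytes to 'out_buffer' and update 'get'.
   If 'out_buffer' is null, the bytes are discarded.
   Return the number of bytes copied or discarded. */
static unsigned Cb_read_data( Circular_buffer * const cb,
                              uint8_t * const out_buffer,
                              const unsigned out_size )
  {
  unsigned size = 0;
  if( out_size == 0 ) return 0;
  if( cb->get > cb->put )		/* data wraps: read the tail first */
    {
    size = min_u( cb->buffer_size - cb->get, out_size );
    if( size > 0 )
      {
      if( out_buffer ) memcpy( out_buffer, cb->buffer + cb->get, size );
      cb->get += size;
      if( cb->get >= cb->buffer_size ) cb->get = 0;
      }
    }
  if( cb->get < cb->put )		/* contiguous data up to 'put' */
    {
    const unsigned size2 = min_u( cb->put - cb->get, out_size - size );
    if( size2 > 0 )
      {
      if( out_buffer ) memcpy( out_buffer + size, cb->buffer + cb->get, size2 );
      cb->get += size2;
      size += size2;
      }
    }
  return size;
  }
```

Guarding only the memcpy calls keeps the index arithmetic identical for both uses, so discarding advances 'get' exactly as a real read would.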
pkgname=lzlib -pkgversion=1.8 +pkgversion=1.15 soversion=1 +libname=lz +libname_static=lib${libname}.a +libname_shared= progname=minilzip progname_static=${progname} progname_shared= progname_lzip=${progname} disable_ldconfig= -libname=lz srctrigger=doc/${pkgname}.texi # clear some things potentially inherited from environment. @@ -29,16 +31,15 @@ infodir='$(datarootdir)/info' libdir='$(exec_prefix)/lib' mandir='$(datarootdir)/man' CC=gcc +AR=ar CPPFLAGS= CFLAGS='-Wall -W -O2' LDFLAGS= +ARFLAGS=-rcs +MAKEINFO=makeinfo # checking whether we are using GNU C. -if /bin/sh -c "${CC} --version" > /dev/null 2>&1 ; then true -else - CC=cc - CFLAGS='-W -O2' -fi +/bin/sh -c "${CC} --version" > /dev/null 2>&1 || { CC=cc ; CFLAGS=-O2 ; } # Loop over all args args= @@ -50,22 +51,26 @@ while [ $# != 0 ] ; do shift # Add the argument quoted to args - args="${args} \"${option}\"" + if [ -z "${args}" ] ; then args="\"${option}\"" + else args="${args} \"${option}\"" ; fi # Split out the argument for options that take them case ${option} in - *=*) optarg=`echo ${option} | sed -e 's,^[^=]*=,,;s,/$,,'` ;; + *=*) optarg=`echo "${option}" | sed -e 's,^[^=]*=,,;s,/$,,'` ;; esac # Process the options case ${option} in --help | -h) - echo "Usage: configure [options]" + echo "Usage: $0 [OPTION]... [VAR=VALUE]..." echo - echo "Options: [defaults in brackets]" + echo "To assign makefile variables (e.g., CC, CFLAGS...), specify them as" + echo "arguments to configure in the form VAR=VALUE." + echo + echo "Options and variables: [defaults in brackets]" echo " -h, --help display this help and exit" echo " -V, --version output version information and exit" - echo " --srcdir=DIR find the sources in DIR [. or ..]" + echo " --srcdir=DIR find the source code in DIR [. 
or ..]" echo " --prefix=DIR install into DIR [${prefix}]" echo " --exec-prefix=DIR base directory for arch-dependent files [${exec_prefix}]" echo " --bindir=DIR user executables directory [${bindir}]" @@ -79,9 +84,13 @@ while [ $# != 0 ] ; do echo " --enable-shared build also a shared library [disable]" echo " --disable-ldconfig don't run ldconfig after install" echo " CC=COMPILER C compiler to use [${CC}]" - echo " CPPFLAGS=OPTIONS command line options for the preprocessor [${CPPFLAGS}]" - echo " CFLAGS=OPTIONS command line options for the C compiler [${CFLAGS}]" - echo " LDFLAGS=OPTIONS command line options for the linker [${LDFLAGS}]" + echo " AR=ARCHIVER library archiver to use [${AR}]" + echo " CPPFLAGS=OPTIONS command-line options for the preprocessor [${CPPFLAGS}]" + echo " CFLAGS=OPTIONS command-line options for the C compiler [${CFLAGS}]" + echo " CFLAGS+=OPTIONS append options to the current value of CFLAGS" + echo " LDFLAGS=OPTIONS command-line options for the linker [${LDFLAGS}]" + echo " ARFLAGS=OPTIONS command-line options for the library archiver [${ARFLAGS}]" + echo " MAKEINFO=NAME makeinfo program to use [${MAKEINFO}]" echo exit 0 ;; --version | -V) @@ -108,16 +117,25 @@ while [ $# != 0 ] ; do --mandir=*) mandir=${optarg} ;; --no-create) no_create=yes ;; --disable-static) + libname_static= progname_static= + libname_shared=lib${libname}.so.${soversion} + progname_shared=${progname}_shared + progname_lzip=${progname}_shared ;; + --enable-shared) + libname_shared=lib${libname}.so.${soversion} progname_shared=${progname}_shared progname_lzip=${progname}_shared ;; - --enable-shared) progname_shared=${progname}_shared ;; --disable-ldconfig) disable_ldconfig=yes ;; - CC=*) CC=${optarg} ;; - CPPFLAGS=*) CPPFLAGS=${optarg} ;; - CFLAGS=*) CFLAGS=${optarg} ;; - LDFLAGS=*) LDFLAGS=${optarg} ;; + CC=*) CC=${optarg} ;; + AR=*) AR=${optarg} ;; + CPPFLAGS=*) CPPFLAGS=${optarg} ;; + CFLAGS=*) CFLAGS=${optarg} ;; + CFLAGS+=*) CFLAGS="${CFLAGS} ${optarg}" ;; + 
LDFLAGS=*) LDFLAGS=${optarg} ;; + ARFLAGS=*) ARFLAGS=${optarg} ;; + MAKEINFO=*) MAKEINFO=${optarg} ;; --*) echo "configure: WARNING: unrecognized option: '${option}'" 1>&2 ;; @@ -128,7 +146,7 @@ while [ $# != 0 ] ; do exit 1 ;; esac - # Check if the option took a separate argument + # Check whether the option took a separate argument if [ "${arg2}" = yes ] ; then if [ $# != 0 ] ; then args="${args} \"$1\"" ; shift else echo "configure: Missing argument to '${option}'" 1>&2 @@ -137,19 +155,19 @@ while [ $# != 0 ] ; do fi done -# Find the source files, if location was not specified. +# Find the source code, if location was not specified. srcdirtext= if [ -z "${srcdir}" ] ; then srcdirtext="or . or .." ; srcdir=. if [ ! -r "${srcdir}/${srctrigger}" ] ; then srcdir=.. ; fi if [ ! -r "${srcdir}/${srctrigger}" ] ; then ## the sed command below emulates the dirname command - srcdir=`echo $0 | sed -e 's,[^/]*$,,;s,/$,,;s,^$,.,'` + srcdir=`echo "$0" | sed -e 's,[^/]*$,,;s,/$,,;s,^$,.,'` fi fi if [ ! -r "${srcdir}/${srctrigger}" ] ; then - echo "configure: Can't find sources in ${srcdir} ${srcdirtext}" 1>&2 + echo "configure: Can't find source code in ${srcdir} ${srcdirtext}" 1>&2 echo "configure: (At least ${srctrigger} is missing)." 1>&2 exit 1 fi @@ -167,9 +185,9 @@ if [ -z "${no_create}" ] ; then # Run this file to recreate the current configuration. # # This script is free software: you have unlimited permission -# to copy, distribute and modify it. +# to copy, distribute, and modify it. 
-exec /bin/sh $0 ${args} --no-create +exec /bin/sh "$0" ${args} --no-create EOF chmod +x config.status fi @@ -185,27 +203,32 @@ echo "infodir = ${infodir}" echo "libdir = ${libdir}" echo "mandir = ${mandir}" echo "CC = ${CC}" +echo "AR = ${AR}" echo "CPPFLAGS = ${CPPFLAGS}" echo "CFLAGS = ${CFLAGS}" echo "LDFLAGS = ${LDFLAGS}" +echo "ARFLAGS = ${ARFLAGS}" +echo "MAKEINFO = ${MAKEINFO}" rm -f Makefile cat > Makefile << EOF # Makefile for Lzlib - Compression library for the lzip format -# Copyright (C) 2009-2016 Antonio Diaz Diaz. +# Copyright (C) 2009-2025 Antonio Diaz Diaz. # This file was generated automatically by configure. Don't edit. # # This Makefile is free software: you have unlimited permission -# to copy, distribute and modify it. +# to copy, distribute, and modify it. pkgname = ${pkgname} pkgversion = ${pkgversion} soversion = ${soversion} +libname = ${libname} +libname_static = ${libname_static} +libname_shared = ${libname_shared} progname = ${progname} progname_static = ${progname_static} progname_shared = ${progname_shared} progname_lzip = ${progname_lzip} disable_ldconfig = ${disable_ldconfig} -libname = ${libname} VPATH = ${srcdir} prefix = ${prefix} exec_prefix = ${exec_prefix} @@ -216,9 +239,12 @@ infodir = ${infodir} libdir = ${libdir} mandir = ${mandir} CC = ${CC} +AR = ${AR} CPPFLAGS = ${CPPFLAGS} CFLAGS = ${CFLAGS} LDFLAGS = ${LDFLAGS} +ARFLAGS = ${ARFLAGS} +MAKEINFO = ${MAKEINFO} EOF cat "${srcdir}/Makefile.in" >> Makefile diff --git a/decoder.c b/decoder.c index 8ef3942..83a128c 100644 --- a/decoder.c +++ b/decoder.c @@ -1,167 +1,148 @@ -/* Lzlib - Compression library for the lzip format - Copyright (C) 2009-2016 Antonio Diaz Diaz. +/* Lzlib - Compression library for the lzip format + Copyright (C) 2009-2025 Antonio Diaz Diaz. 
- This library is free software: you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation, either version 2 of the License, or - (at your option) any later version. + This library is free software. Redistribution and use in source and + binary forms, with or without modification, are permitted provided + that the following conditions are met: - This library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. + 1. Redistributions of source code must retain the above copyright + notice, this list of conditions, and the following disclaimer. - You should have received a copy of the GNU General Public License - along with this library. If not, see <http://www.gnu.org/licenses/>. + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions, and the following disclaimer in the + documentation and/or other materials provided with the distribution. - As a special exception, you may use this file as part of a free - software library without restriction. Specifically, if other files - instantiate templates or use macros or inline functions from this - file, or you compile this file and link it with other files to - produce an executable, this file does not by itself cause the - resulting executable to be covered by the GNU General Public - License. This exception does not however invalidate any other - reasons why the executable file might be covered by the GNU General - Public License. + This library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
*/ -static bool LZd_verify_trailer( struct LZ_decoder * const d ) +static int LZd_try_check_trailer( LZ_decoder * const d ) { - File_trailer trailer; - int size = Rd_read_data( d->rdec, trailer, Ft_size ); + Lzip_trailer trailer; + if( Rd_available_bytes( d->rdec ) < Lt_size ) + { if( !d->rdec->at_stream_end ) return 0; else return 2; } + d->check_trailer_pending = false; + d->member_finished = true; - if( size < Ft_size ) - return false; - - return ( Ft_get_data_crc( trailer ) == LZd_crc( d ) && - Ft_get_data_size( trailer ) == LZd_data_position( d ) && - Ft_get_member_size( trailer ) == d->rdec->member_position ); + if( Rd_read_data( d->rdec, trailer, Lt_size ) == Lt_size && + Lt_get_data_crc( trailer ) == LZd_crc( d ) && + Lt_get_data_size( trailer ) == LZd_data_position( d ) && + Lt_get_member_size( trailer ) == d->rdec->member_position ) return 0; + return 3; } /* Return value: 0 = OK, 1 = decoder error, 2 = unexpected EOF, 3 = trailer error, 4 = unknown marker found, - 5 = library error. */ -static int LZd_decode_member( struct LZ_decoder * const d ) + 5 = nonzero first LZMA byte found, 6 = library error. 
*/ +static int LZd_decode_member( LZ_decoder * const d ) { - struct Range_decoder * const rdec = d->rdec; + Range_decoder * const rdec = d->rdec; State * const state = &d->state; -/* unsigned long long old_mpos = d->rdec->member_position; */ + unsigned old_mpos = rdec->member_position; if( d->member_finished ) return 0; - if( !Rd_try_reload( rdec, false ) ) - { if( !rdec->at_stream_end ) return 0; else return 2; } - if( d->verify_trailer_pending ) - { - if( Rd_available_bytes( rdec ) < Ft_size && !rdec->at_stream_end ) - return 0; - d->verify_trailer_pending = false; - d->member_finished = true; - if( LZd_verify_trailer( d ) ) return 0; else return 3; - } + const int tmp = Rd_try_reload( rdec ); + if( tmp > 1 ) return 5; + if( !tmp ) { if( !rdec->at_stream_end ) return 0; else return 2; } + if( d->check_trailer_pending ) return LZd_try_check_trailer( d ); while( !Rd_finished( rdec ) ) { - const int pos_state = LZd_data_position( d ) & pos_state_mask; -/* const unsigned long long mpos = d->rdec->member_position; - if( mpos - old_mpos > rd_min_available_bytes ) return 5; - old_mpos = mpos; */ - if( !Rd_enough_available_bytes( rdec ) ) /* check unexpected eof */ - { if( !rdec->at_stream_end ) return 0; else break; } + const unsigned mpos = rdec->member_position; + if( mpos - old_mpos > rd_min_available_bytes ) return 6; + old_mpos = mpos; + if( !Rd_enough_available_bytes( rdec ) ) /* check unexpected EOF */ + { if( !rdec->at_stream_end ) return 0; + if( Cb_empty( &rdec->cb ) ) break; } /* decode until EOF */ if( !LZd_enough_free_bytes( d ) ) return 0; - if( Rd_decode_bit( rdec, &d->bm_match[*state][pos_state] ) == 0 ) /* 1st bit */ + const int pos_state = LZd_data_position( d ) & pos_state_mask; + if( Rd_decode_bit( rdec, &d->bm_match[*state][pos_state] ) == 0 ) /* 1st bit */ { - const uint8_t prev_byte = LZd_peek_prev( d ); - if( St_is_char( *state ) ) + /* literal byte */ + Bit_model * const bm = d->bm_literal[get_lit_state(LZd_peek_prev( d ))]; + if( ( *state = 
St_set_char( *state ) ) < 4 ) + LZd_put_byte( d, Rd_decode_tree8( rdec, bm ) ); + else + LZd_put_byte( d, Rd_decode_matched( rdec, bm, LZd_peek( d, d->rep0 ) ) ); + continue; + } + /* match or repeated match */ + int len; + if( Rd_decode_bit( rdec, &d->bm_rep[*state] ) != 0 ) /* 2nd bit */ + { + if( Rd_decode_bit( rdec, &d->bm_rep0[*state] ) == 0 ) /* 3rd bit */ { - *state -= ( *state < 4 ) ? *state : 3; - LZd_put_byte( d, Rd_decode_tree( rdec, - d->bm_literal[get_lit_state(prev_byte)], 8 ) ); + if( Rd_decode_bit( rdec, &d->bm_len[*state][pos_state] ) == 0 ) /* 4th bit */ + { *state = St_set_shortrep( *state ); + LZd_put_byte( d, LZd_peek( d, d->rep0 ) ); continue; } } else { - *state -= ( *state < 10 ) ? 3 : 6; - LZd_put_byte( d, Rd_decode_matched( rdec, - d->bm_literal[get_lit_state(prev_byte)], - LZd_peek( d, d->rep0 ) ) ); + unsigned distance; + if( Rd_decode_bit( rdec, &d->bm_rep1[*state] ) == 0 ) /* 4th bit */ + distance = d->rep1; + else + { + if( Rd_decode_bit( rdec, &d->bm_rep2[*state] ) == 0 ) /* 5th bit */ + distance = d->rep2; + else + { distance = d->rep3; d->rep3 = d->rep2; } + d->rep2 = d->rep1; + } + d->rep1 = d->rep0; + d->rep0 = distance; } + *state = St_set_rep( *state ); + len = Rd_decode_len( rdec, &d->rep_len_model, pos_state ); } - else /* match or repeated match */ + else /* match */ { - int len; - if( Rd_decode_bit( rdec, &d->bm_rep[*state] ) != 0 ) /* 2nd bit */ + len = Rd_decode_len( rdec, &d->match_len_model, pos_state ); + unsigned distance = Rd_decode_tree6( rdec, d->bm_dis_slot[get_len_state(len)] ); + if( distance >= start_dis_model ) { - if( Rd_decode_bit( rdec, &d->bm_rep0[*state] ) != 0 ) /* 3rd bit */ - { - unsigned distance; - if( Rd_decode_bit( rdec, &d->bm_rep1[*state] ) == 0 ) /* 4th bit */ - distance = d->rep1; - else - { - if( Rd_decode_bit( rdec, &d->bm_rep2[*state] ) == 0 ) /* 5th bit */ - distance = d->rep2; - else - { distance = d->rep3; d->rep3 = d->rep2; } - d->rep2 = d->rep1; - } - d->rep1 = d->rep0; - d->rep0 = 
distance; - } + const unsigned dis_slot = distance; + const int direct_bits = ( dis_slot >> 1 ) - 1; + distance = ( 2 | ( dis_slot & 1 ) ) << direct_bits; + if( dis_slot < end_dis_model ) + distance += Rd_decode_tree_reversed( rdec, + d->bm_dis + ( distance - dis_slot ), direct_bits ); else { - if( Rd_decode_bit( rdec, &d->bm_len[*state][pos_state] ) == 0 ) /* 4th bit */ - { *state = St_set_short_rep( *state ); - LZd_put_byte( d, LZd_peek( d, d->rep0 ) ); continue; } - } - *state = St_set_rep( *state ); - len = min_match_len + Rd_decode_len( rdec, &d->rep_len_model, pos_state ); - } - else /* match */ - { - const unsigned rep0_saved = d->rep0; - int dis_slot; - len = min_match_len + Rd_decode_len( rdec, &d->match_len_model, pos_state ); - dis_slot = Rd_decode_tree6( rdec, d->bm_dis_slot[get_len_state(len)] ); - if( dis_slot < start_dis_model ) d->rep0 = dis_slot; - else - { - const int direct_bits = ( dis_slot >> 1 ) - 1; - d->rep0 = ( 2 | ( dis_slot & 1 ) ) << direct_bits; - if( dis_slot < end_dis_model ) - d->rep0 += Rd_decode_tree_reversed( rdec, - d->bm_dis + d->rep0 - dis_slot - 1, direct_bits ); - else + distance += + Rd_decode( rdec, direct_bits - dis_align_bits ) << dis_align_bits; + distance += Rd_decode_tree_reversed4( rdec, d->bm_align ); + if( distance == 0xFFFFFFFFU ) /* marker found */ { - d->rep0 += Rd_decode( rdec, direct_bits - dis_align_bits ) << dis_align_bits; - d->rep0 += Rd_decode_tree_reversed4( rdec, d->bm_align ); - if( d->rep0 == 0xFFFFFFFFU ) /* marker found */ + Rd_normalize( rdec ); + const unsigned mpos = rdec->member_position; + if( mpos - old_mpos > rd_min_available_bytes ) return 6; + old_mpos = mpos; + if( len == min_match_len ) /* End Of Stream marker */ { - d->rep0 = rep0_saved; - Rd_normalize( rdec ); - if( len == min_match_len ) /* End Of Stream marker */ - { - if( Rd_available_bytes( rdec ) < Ft_size && !rdec->at_stream_end ) - { d->verify_trailer_pending = true; return 0; } - d->member_finished = true; - if( 
LZd_verify_trailer( d ) ) return 0; else return 3; - } - if( len == min_match_len + 1 ) /* Sync Flush marker */ - { - if( Rd_try_reload( rdec, true ) ) { /*old_mpos += 5;*/ continue; } - else { if( !rdec->at_stream_end ) return 0; else break; } - } - return 4; + d->check_trailer_pending = true; + return LZd_try_check_trailer( d ); } + if( len == min_match_len + 1 ) /* Sync Flush marker */ + { + rdec->reload_pending = true; + const int tmp = Rd_try_reload( rdec ); + if( tmp > 1 ) return 5; + if( tmp ) continue; + if( !rdec->at_stream_end ) return 0; else break; + } + return 4; } } - d->rep3 = d->rep2; d->rep2 = d->rep1; d->rep1 = rep0_saved; - *state = St_set_match( *state ); - if( d->rep0 >= d->dictionary_size || - ( d->rep0 >= d->cb.put && !d->pos_wrapped ) ) - return 1; } - LZd_copy_block( d, d->rep0, len ); + d->rep3 = d->rep2; d->rep2 = d->rep1; d->rep1 = d->rep0; d->rep0 = distance; + *state = St_set_match( *state ); + if( d->rep0 >= d->dictionary_size || + ( d->rep0 >= d->cb.put && !d->pos_wrapped ) ) return 1; } + LZd_copy_block( d, d->rep0, len ); } return 2; } diff --git a/decoder.h b/decoder.h index a14156e..f880849 100644 --- a/decoder.h +++ b/decoder.h @@ -1,43 +1,35 @@ -/* Lzlib - Compression library for the lzip format - Copyright (C) 2009-2016 Antonio Diaz Diaz. +/* Lzlib - Compression library for the lzip format + Copyright (C) 2009-2025 Antonio Diaz Diaz. - This library is free software: you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation, either version 2 of the License, or - (at your option) any later version. + This library is free software. 
Redistribution and use in source and + binary forms, with or without modification, are permitted provided + that the following conditions are met: - This library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. + 1. Redistributions of source code must retain the above copyright + notice, this list of conditions, and the following disclaimer. - You should have received a copy of the GNU General Public License - along with this library. If not, see . + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions, and the following disclaimer in the + documentation and/or other materials provided with the distribution. - As a special exception, you may use this file as part of a free - software library without restriction. Specifically, if other files - instantiate templates or use macros or inline functions from this - file, or you compile this file and link it with other files to - produce an executable, this file does not by itself cause the - resulting executable to be covered by the GNU General Public - License. This exception does not however invalidate any other - reasons why the executable file might be covered by the GNU General - Public License. + This library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
*/ -enum { rd_min_available_bytes = 8 }; +enum { rd_min_available_bytes = 10 }; -struct Range_decoder +typedef struct Range_decoder { - struct Circular_buffer cb; /* input buffer */ + Circular_buffer cb; /* input buffer */ unsigned long long member_position; uint32_t code; uint32_t range; bool at_stream_end; bool reload_pending; - }; + } Range_decoder; -static inline bool Rd_init( struct Range_decoder * const rdec ) +static inline bool Rd_init( Range_decoder * const rdec ) { if( !Cb_init( &rdec->cb, 65536 + rd_min_available_bytes ) ) return false; rdec->member_position = 0; @@ -48,25 +40,25 @@ static inline bool Rd_init( struct Range_decoder * const rdec ) return true; } -static inline void Rd_free( struct Range_decoder * const rdec ) +static inline void Rd_free( Range_decoder * const rdec ) { Cb_free( &rdec->cb ); } -static inline bool Rd_finished( const struct Range_decoder * const rdec ) - { return rdec->at_stream_end && !Cb_used_bytes( &rdec->cb ); } +static inline bool Rd_finished( const Range_decoder * const rdec ) + { return rdec->at_stream_end && Cb_empty( &rdec->cb ); } -static inline void Rd_finish( struct Range_decoder * const rdec ) +static inline void Rd_finish( Range_decoder * const rdec ) { rdec->at_stream_end = true; } -static inline bool Rd_enough_available_bytes( const struct Range_decoder * const rdec ) - { return ( Cb_used_bytes( &rdec->cb ) >= rd_min_available_bytes ); } +static inline bool Rd_enough_available_bytes( const Range_decoder * const rdec ) + { return Cb_used_bytes( &rdec->cb ) >= rd_min_available_bytes; } -static inline unsigned Rd_available_bytes( const struct Range_decoder * const rdec ) +static inline unsigned Rd_available_bytes( const Range_decoder * const rdec ) { return Cb_used_bytes( &rdec->cb ); } -static inline unsigned Rd_free_bytes( const struct Range_decoder * const rdec ) - { if( rdec->at_stream_end ) return 0; return Cb_free_bytes( &rdec->cb ); } +static inline unsigned Rd_free_bytes( const Range_decoder * const rdec ) 
+ { return rdec->at_stream_end ? 0 : Cb_free_bytes( &rdec->cb ); } -static inline unsigned long long Rd_purge( struct Range_decoder * const rdec ) +static inline unsigned long long Rd_purge( Range_decoder * const rdec ) { const unsigned long long size = rdec->member_position + Cb_used_bytes( &rdec->cb ); @@ -75,32 +67,32 @@ static inline unsigned long long Rd_purge( struct Range_decoder * const rdec ) return size; } -static inline void Rd_reset( struct Range_decoder * const rdec ) +static inline void Rd_reset( Range_decoder * const rdec ) { Cb_reset( &rdec->cb ); rdec->member_position = 0; rdec->at_stream_end = false; } -/* Seeks a member header and updates 'get'. '*skippedp' is set to the - number of bytes skipped. Returns true if it finds a valid header. +/* Seek for a member header and update 'get'. Set '*skippedp' to the number + of bytes skipped. Return true if a valid header is found. */ -static bool Rd_find_header( struct Range_decoder * const rdec, - int * const skippedp ) +static bool Rd_find_header( Range_decoder * const rdec, + unsigned * const skippedp ) { *skippedp = 0; while( rdec->cb.get != rdec->cb.put ) { - if( rdec->cb.buffer[rdec->cb.get] == magic_string[0] ) + if( rdec->cb.buffer[rdec->cb.get] == lzip_magic[0] ) { unsigned get = rdec->cb.get; int i; - File_header header; - for( i = 0; i < Fh_size; ++i ) + Lzip_header header; + for( i = 0; i < Lh_size; ++i ) { if( get == rdec->cb.put ) return false; /* not enough data */ header[i] = rdec->cb.buffer[get]; if( ++get >= rdec->cb.buffer_size ) get = 0; } - if( Fh_verify( header ) ) return true; + if( Lh_check( header ) ) return true; } if( ++rdec->cb.get >= rdec->cb.buffer_size ) rdec->cb.get = 0; ++*skippedp; @@ -109,20 +101,22 @@ static bool Rd_find_header( struct Range_decoder * const rdec, } -static inline int Rd_write_data( struct Range_decoder * const rdec, +static inline int Rd_write_data( Range_decoder * const rdec, const uint8_t * const inbuf, const int size ) { if( rdec->at_stream_end || 
size <= 0 ) return 0; return Cb_write_data( &rdec->cb, inbuf, size ); } -static inline uint8_t Rd_get_byte( struct Range_decoder * const rdec ) +static inline uint8_t Rd_get_byte( Range_decoder * const rdec ) { + /* 0xFF avoids decoder error if member is truncated at EOS marker */ + if( Rd_finished( rdec ) ) return 0xFF; ++rdec->member_position; return Cb_get_byte( &rdec->cb ); } -static inline int Rd_read_data( struct Range_decoder * const rdec, +static inline int Rd_read_data( Range_decoder * const rdec, uint8_t * const outbuf, const int size ) { const int sz = Cb_read_data( &rdec->cb, outbuf, size ); @@ -130,7 +124,7 @@ static inline int Rd_read_data( struct Range_decoder * const rdec, return sz; } -static inline bool Rd_unread_data( struct Range_decoder * const rdec, +static inline bool Rd_unread_data( Range_decoder * const rdec, const unsigned size ) { if( size > rdec->member_position || !Cb_unread_data( &rdec->cb, size ) ) @@ -139,172 +133,211 @@ static inline bool Rd_unread_data( struct Range_decoder * const rdec, return true; } -static bool Rd_try_reload( struct Range_decoder * const rdec, const bool force ) +static int Rd_try_reload( Range_decoder * const rdec ) { - if( force ) rdec->reload_pending = true; if( rdec->reload_pending && Rd_available_bytes( rdec ) >= 5 ) { - int i; rdec->reload_pending = false; rdec->code = 0; - for( i = 0; i < 5; ++i ) - rdec->code = (rdec->code << 8) | Rd_get_byte( rdec ); rdec->range = 0xFFFFFFFFU; - rdec->code &= rdec->range; /* make sure that first byte is discarded */ + /* check first byte of the LZMA stream without reading it */ + if( rdec->cb.buffer[rdec->cb.get] != 0 ) return 2; + Rd_get_byte( rdec ); /* discard first byte of the LZMA stream */ + int i; for( i = 0; i < 4; ++i ) + rdec->code = (rdec->code << 8) | Rd_get_byte( rdec ); } return !rdec->reload_pending; } -static inline void Rd_normalize( struct Range_decoder * const rdec ) +static inline void Rd_normalize( Range_decoder * const rdec ) { if( rdec->range <= 
0x00FFFFFFU ) - { - rdec->range <<= 8; - rdec->code = (rdec->code << 8) | Rd_get_byte( rdec ); - } + { rdec->range <<= 8; rdec->code = (rdec->code << 8) | Rd_get_byte( rdec ); } } -static inline int Rd_decode( struct Range_decoder * const rdec, - const int num_bits ) +static inline unsigned Rd_decode( Range_decoder * const rdec, + const int num_bits ) { - int symbol = 0; + unsigned symbol = 0; int i; for( i = num_bits; i > 0; --i ) { - uint32_t mask; Rd_normalize( rdec ); rdec->range >>= 1; /* symbol <<= 1; */ /* if( rdec->code >= rdec->range ) { rdec->code -= rdec->range; symbol |= 1; } */ - mask = 0U - (rdec->code < rdec->range); - rdec->code -= rdec->range; - rdec->code += rdec->range & mask; - symbol = (symbol << 1) + (mask + 1); + const bool bit = rdec->code >= rdec->range; + symbol <<= 1; symbol += bit; + rdec->code -= rdec->range & ( 0U - bit ); } return symbol; } -static inline int Rd_decode_bit( struct Range_decoder * const rdec, - Bit_model * const probability ) +static inline unsigned Rd_decode_bit( Range_decoder * const rdec, + Bit_model * const probability ) { - uint32_t bound; Rd_normalize( rdec ); - bound = ( rdec->range >> bit_model_total_bits ) * *probability; + const uint32_t bound = ( rdec->range >> bit_model_total_bits ) * *probability; if( rdec->code < bound ) { rdec->range = bound; - *probability += (bit_model_total - *probability) >> bit_model_move_bits; + *probability += ( bit_model_total - *probability ) >> bit_model_move_bits; return 0; } else { - rdec->range -= bound; rdec->code -= bound; + rdec->range -= bound; *probability -= *probability >> bit_model_move_bits; return 1; } } -static inline int Rd_decode_tree( struct Range_decoder * const rdec, - Bit_model bm[], const int num_bits ) +static inline void Rd_decode_symbol_bit( Range_decoder * const rdec, + Bit_model * const probability, unsigned * symbol ) { - int symbol = 1; - int i; - for( i = num_bits; i > 0; --i ) - symbol = ( symbol << 1 ) | Rd_decode_bit( rdec, &bm[symbol] ); - 
return symbol - (1 << num_bits); + Rd_normalize( rdec ); + *symbol <<= 1; + const uint32_t bound = ( rdec->range >> bit_model_total_bits ) * *probability; + if( rdec->code < bound ) + { + rdec->range = bound; + *probability += ( bit_model_total - *probability ) >> bit_model_move_bits; + } + else + { + rdec->code -= bound; + rdec->range -= bound; + *probability -= *probability >> bit_model_move_bits; + *symbol |= 1; + } } -static inline int Rd_decode_tree6( struct Range_decoder * const rdec, - Bit_model bm[] ) +static inline void Rd_decode_symbol_bit_reversed( Range_decoder * const rdec, + Bit_model * const probability, unsigned * model, + unsigned * symbol, const int i ) { - int symbol = 1; - symbol = ( symbol << 1 ) | Rd_decode_bit( rdec, &bm[symbol] ); - symbol = ( symbol << 1 ) | Rd_decode_bit( rdec, &bm[symbol] ); - symbol = ( symbol << 1 ) | Rd_decode_bit( rdec, &bm[symbol] ); - symbol = ( symbol << 1 ) | Rd_decode_bit( rdec, &bm[symbol] ); - symbol = ( symbol << 1 ) | Rd_decode_bit( rdec, &bm[symbol] ); - symbol = ( symbol << 1 ) | Rd_decode_bit( rdec, &bm[symbol] ); + Rd_normalize( rdec ); + *model <<= 1; + const uint32_t bound = ( rdec->range >> bit_model_total_bits ) * *probability; + if( rdec->code < bound ) + { + rdec->range = bound; + *probability += ( bit_model_total - *probability ) >> bit_model_move_bits; + } + else + { + rdec->code -= bound; + rdec->range -= bound; + *probability -= *probability >> bit_model_move_bits; + *model |= 1; + *symbol |= 1 << i; + } + } + +static inline unsigned Rd_decode_tree6( Range_decoder * const rdec, + Bit_model bm[] ) + { + unsigned symbol = 1; + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); return symbol & 0x3F; } -static inline int 
Rd_decode_tree_reversed( struct Range_decoder * const rdec, - Bit_model bm[], const int num_bits ) +static inline unsigned Rd_decode_tree8( Range_decoder * const rdec, + Bit_model bm[] ) { - int model = 1; - int symbol = 0; - int i; - for( i = 0; i < num_bits; ++i ) - { - const bool bit = Rd_decode_bit( rdec, &bm[model] ); - model <<= 1; - if( bit ) { ++model; symbol |= (1 << i); } - } - return symbol; - } - -static inline int Rd_decode_tree_reversed4( struct Range_decoder * const rdec, - Bit_model bm[] ) - { - int model = 1; - int symbol = Rd_decode_bit( rdec, &bm[model] ); - int bit; - model = (model << 1) + symbol; - bit = Rd_decode_bit( rdec, &bm[model] ); - model = (model << 1) + bit; symbol |= (bit << 1); - bit = Rd_decode_bit( rdec, &bm[model] ); - model = (model << 1) + bit; symbol |= (bit << 2); - if( Rd_decode_bit( rdec, &bm[model] ) ) symbol |= 8; - return symbol; - } - -static inline int Rd_decode_matched( struct Range_decoder * const rdec, - Bit_model bm[], int match_byte ) - { - Bit_model * const bm1 = bm + 0x100; - int symbol = 1; - while( symbol < 0x100 ) - { - int match_bit, bit; - match_byte <<= 1; - match_bit = match_byte & 0x100; - bit = Rd_decode_bit( rdec, &bm1[match_bit+symbol] ); - symbol = ( symbol << 1 ) | bit; - if( match_bit != bit << 8 ) - { - while( symbol < 0x100 ) - symbol = ( symbol << 1 ) | Rd_decode_bit( rdec, &bm[symbol] ); - break; - } - } + unsigned symbol = 1; + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); return symbol & 0xFF; } -static inline int Rd_decode_len( struct Range_decoder * const rdec, - struct Len_model * const lm, - const int pos_state 
) +static inline unsigned +Rd_decode_tree_reversed( Range_decoder * const rdec, + Bit_model bm[], const int num_bits ) { + unsigned model = 1; + unsigned symbol = 0; + int i; + for( i = 0; i < num_bits; ++i ) + Rd_decode_symbol_bit_reversed( rdec, &bm[model], &model, &symbol, i ); + return symbol; + } + +static inline unsigned +Rd_decode_tree_reversed4( Range_decoder * const rdec, Bit_model bm[] ) + { + unsigned model = 1; + unsigned symbol = 0; + Rd_decode_symbol_bit_reversed( rdec, &bm[model], &model, &symbol, 0 ); + Rd_decode_symbol_bit_reversed( rdec, &bm[model], &model, &symbol, 1 ); + Rd_decode_symbol_bit_reversed( rdec, &bm[model], &model, &symbol, 2 ); + Rd_decode_symbol_bit_reversed( rdec, &bm[model], &model, &symbol, 3 ); + return symbol; + } + +static inline unsigned Rd_decode_matched( Range_decoder * const rdec, + Bit_model bm[], unsigned match_byte ) + { + unsigned symbol = 1; + unsigned mask = 0x100; + while( true ) + { + const unsigned match_bit = ( match_byte <<= 1 ) & mask; + const unsigned bit = Rd_decode_bit( rdec, &bm[symbol+match_bit+mask] ); + symbol <<= 1; symbol += bit; + if( symbol > 0xFF ) return symbol & 0xFF; + mask &= ~(match_bit ^ (bit << 8)); /* if( match_bit != bit ) mask = 0; */ + } + } + +static inline unsigned Rd_decode_len( Range_decoder * const rdec, + Len_model * const lm, + const int pos_state ) + { + Bit_model * bm; + unsigned mask, offset, symbol = 1; + if( Rd_decode_bit( rdec, &lm->choice1 ) == 0 ) - return Rd_decode_tree( rdec, lm->bm_low[pos_state], len_low_bits ); + { bm = lm->bm_low[pos_state]; mask = 7; offset = 0; goto len3; } if( Rd_decode_bit( rdec, &lm->choice2 ) == 0 ) - return len_low_symbols + - Rd_decode_tree( rdec, lm->bm_mid[pos_state], len_mid_bits ); - return len_low_symbols + len_mid_symbols + - Rd_decode_tree( rdec, lm->bm_high, len_high_bits ); + { bm = lm->bm_mid[pos_state]; mask = 7; offset = len_low_symbols; goto len3; } + bm = lm->bm_high; mask = 0xFF; offset = len_low_symbols + len_mid_symbols; + 
Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); +len3: + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + return ( symbol & mask ) + min_match_len + offset; } enum { lzd_min_free_bytes = max_match_len }; -struct LZ_decoder +typedef struct LZ_decoder { - struct Circular_buffer cb; + Circular_buffer cb; unsigned long long partial_data_pos; - struct Range_decoder * rdec; + Range_decoder * rdec; unsigned dictionary_size; uint32_t crc; + bool check_trailer_pending; bool member_finished; - bool verify_trailer_pending; bool pos_wrapped; unsigned rep0; /* rep[0-3] latest four distances */ unsigned rep1; /* used for efficient coding of */ @@ -320,31 +353,28 @@ struct LZ_decoder Bit_model bm_rep2[states]; Bit_model bm_len[states][pos_states]; Bit_model bm_dis_slot[len_states][1<cb ) >= lzd_min_free_bytes; } -static inline uint8_t LZd_peek_prev( const struct LZ_decoder * const d ) - { - const unsigned i = ( ( d->cb.put > 0 ) ? d->cb.put : d->cb.buffer_size ) - 1; - return d->cb.buffer[i]; - } +static inline uint8_t LZd_peek_prev( const LZ_decoder * const d ) + { return d->cb.buffer[((d->cb.put > 0) ? d->cb.put : d->cb.buffer_size)-1]; } -static inline uint8_t LZd_peek( const struct LZ_decoder * const d, +static inline uint8_t LZd_peek( const LZ_decoder * const d, const unsigned distance ) { - unsigned i = d->cb.put - distance - 1; - if( d->cb.put <= distance ) i += d->cb.buffer_size; + const unsigned i = ( (d->cb.put > distance) ? 
0 : d->cb.buffer_size ) + + d->cb.put - distance - 1; return d->cb.buffer[i]; } -static inline void LZd_put_byte( struct LZ_decoder * const d, const uint8_t b ) +static inline void LZd_put_byte( LZ_decoder * const d, const uint8_t b ) { CRC32_update_byte( &d->crc, b ); d->cb.buffer[d->cb.put] = b; @@ -352,21 +382,31 @@ static inline void LZd_put_byte( struct LZ_decoder * const d, const uint8_t b ) { d->partial_data_pos += d->cb.put; d->cb.put = 0; d->pos_wrapped = true; } } -static inline void LZd_copy_block( struct LZ_decoder * const d, +static inline void LZd_copy_block( LZ_decoder * const d, const unsigned distance, unsigned len ) { - unsigned i = d->cb.put - distance - 1; - bool fast; - if( d->cb.put <= distance ) - { i += d->cb.buffer_size; - fast = ( len <= d->cb.buffer_size - i && len <= i - d->cb.put ); } - else - fast = ( len < d->cb.buffer_size - d->cb.put && len <= d->cb.put - i ); - if( fast ) /* no wrap, no overlap */ + unsigned lpos = d->cb.put, i = lpos - distance - 1; + bool fast, fast2; + if( lpos > distance ) { - CRC32_update_buf( &d->crc, d->cb.buffer + i, len ); - memcpy( d->cb.buffer + d->cb.put, d->cb.buffer + i, len ); - d->cb.put += len; + fast = len < d->cb.buffer_size - lpos; + fast2 = fast && len <= lpos - i; + } + else + { + i += d->cb.buffer_size; + fast = len < d->cb.buffer_size - i; /* (i == pos) may happen */ + fast2 = fast && len <= i - lpos; + } + if( fast ) /* no wrap */ + { + const unsigned tlen = len; + if( fast2 ) /* no wrap, no overlap */ + memcpy( d->cb.buffer + lpos, d->cb.buffer + i, len ); + else + for( ; len > 0; --len ) d->cb.buffer[lpos++] = d->cb.buffer[i++]; + CRC32_update_buf( &d->crc, d->cb.buffer + d->cb.put, tlen ); + d->cb.put += tlen; } else for( ; len > 0; --len ) { @@ -375,8 +415,7 @@ static inline void LZd_copy_block( struct LZ_decoder * const d, } } -static inline bool LZd_init( struct LZ_decoder * const d, - struct Range_decoder * const rde, +static inline bool LZd_init( LZ_decoder * const d, Range_decoder 
* const rde, const unsigned dict_size ) { if( !Cb_init( &d->cb, max( 65536, dict_size ) + lzd_min_free_bytes ) ) @@ -385,9 +424,11 @@ static inline bool LZd_init( struct LZ_decoder * const d, d->rdec = rde; d->dictionary_size = dict_size; d->crc = 0xFFFFFFFFU; + d->check_trailer_pending = false; d->member_finished = false; - d->verify_trailer_pending = false; d->pos_wrapped = false; + /* prev_byte of first byte; also for LZd_peek( 0 ) on corrupt file */ + d->cb.buffer[d->cb.buffer_size-1] = 0; d->rep0 = 0; d->rep1 = 0; d->rep2 = 0; @@ -402,23 +443,21 @@ static inline bool LZd_init( struct LZ_decoder * const d, Bm_array_init( d->bm_rep2, states ); Bm_array_init( d->bm_len[0], states * pos_states ); Bm_array_init( d->bm_dis_slot[0], len_states * (1 << dis_slot_bits) ); - Bm_array_init( d->bm_dis, modeled_distances - end_dis_model ); + Bm_array_init( d->bm_dis, modeled_distances - end_dis_model + 1 ); Bm_array_init( d->bm_align, dis_align_size ); Lm_init( &d->match_len_model ); Lm_init( &d->rep_len_model ); - d->cb.buffer[d->cb.buffer_size-1] = 0; /* prev_byte of first byte */ return true; } -static inline void LZd_free( struct LZ_decoder * const d ) - { Cb_free( &d->cb ); } +static inline void LZd_free( LZ_decoder * const d ) { Cb_free( &d->cb ); } -static inline bool LZd_member_finished( const struct LZ_decoder * const d ) - { return ( d->member_finished && !Cb_used_bytes( &d->cb ) ); } +static inline bool LZd_member_finished( const LZ_decoder * const d ) + { return d->member_finished && Cb_empty( &d->cb ); } -static inline unsigned LZd_crc( const struct LZ_decoder * const d ) +static inline unsigned LZd_crc( const LZ_decoder * const d ) { return d->crc ^ 0xFFFFFFFFU; } static inline unsigned long long -LZd_data_position( const struct LZ_decoder * const d ) +LZd_data_position( const LZ_decoder * const d ) { return d->partial_data_pos + d->cb.put; } diff --git a/doc/lzlib.info b/doc/lzlib.info index a9f47b3..4e8d079 100644 --- a/doc/lzlib.info +++ b/doc/lzlib.info @@ 
-1,6 +1,6 @@ This is lzlib.info, produced by makeinfo version 4.13+ from lzlib.texi. -INFO-DIR-SECTION Data Compression +INFO-DIR-SECTION Compression START-INFO-DIR-ENTRY * Lzlib: (lzlib). Compression library for the lzip format END-INFO-DIR-ENTRY @@ -11,28 +11,29 @@ File: lzlib.info, Node: Top, Next: Introduction, Up: (dir) Lzlib Manual ************ -This manual is for Lzlib (version 1.8, 17 May 2016). +This manual is for Lzlib (version 1.15, 9 January 2025). * Menu: -* Introduction:: Purpose and features of lzlib -* Library version:: Checking library version -* Buffering:: Sizes of lzlib's buffers -* Parameter limits:: Min / max values for some parameters -* Compression functions:: Descriptions of the compression functions -* Decompression functions:: Descriptions of the decompression functions -* Error codes:: Meaning of codes returned by functions -* Error messages:: Error messages corresponding to error codes -* Data format:: Detailed format of the compressed data -* Examples:: A small tutorial with examples -* Problems:: Reporting bugs -* Concept index:: Index of concepts +* Introduction:: Purpose and features of lzlib +* Library version:: Checking library version +* Buffering:: Sizes of lzlib's buffers +* Parameter limits:: Min / max values for some parameters +* Compression functions:: Descriptions of the compression functions +* Decompression functions:: Descriptions of the decompression functions +* Error codes:: Meaning of codes returned by functions +* Error messages:: Error messages corresponding to error codes +* Invoking minilzip:: Command-line interface of the test program +* File format:: Detailed format of the compressed file +* Examples:: A small tutorial with examples +* Problems:: Reporting bugs +* Concept index:: Index of concepts - Copyright (C) 2009-2016 Antonio Diaz Diaz. + Copyright (C) 2009-2025 Antonio Diaz Diaz. - This manual is free documentation: you have unlimited permission to -copy, distribute and modify it. 
+ This manual is free documentation: you have unlimited permission to copy, +distribute, and modify it.  File: lzlib.info, Node: Introduction, Next: Library version, Prev: Top, Up: Top @@ -40,93 +41,81 @@ File: lzlib.info, Node: Introduction, Next: Library version, Prev: Top, Up: 1 Introduction ************** -Lzlib is a data compression library providing in-memory LZMA compression -and decompression functions, including integrity checking of the -decompressed data. The compressed data format used by the library is the -lzip format. Lzlib is written in C. - - The lzip file format is designed for data sharing and long-term -archiving, taking into account both data integrity and decoder -availability: - - * The lzip format provides very safe integrity checking and some data - recovery means. The lziprecover program can repair bit-flip errors - (one of the most common forms of data corruption) in lzip files, - and provides data recovery capabilities, including error-checked - merging of damaged copies of a file. *Note Data safety: - (lziprecover)Data safety. - - * The lzip format is as simple as possible (but not simpler). The - lzip manual provides the code of a simple decompressor along with - a detailed explanation of how it works, so that with the only help - of the lzip manual it would be possible for a digital - archaeologist to extract the data from a lzip file long after - quantum computers eventually render LZMA obsolete. - - * Additionally the lzip reference implementation is copylefted, which - guarantees that it will remain free forever. - - A nice feature of the lzip format is that a corrupt byte is easier to -repair the nearer it is from the beginning of the file. Therefore, with -the help of lziprecover, losing an entire archive just because of a -corrupt byte near the beginning is a thing of the past. 
+Lzlib is a data compression library providing in-memory LZMA compression and +decompression functions, including integrity checking of the decompressed +data. The compressed data format used by the library is the lzip format. +Lzlib is written in C and is distributed under a 2-clause BSD license. The functions and variables forming the interface of the compression -library are declared in the file 'lzlib.h'. Usage examples of the -library are given in the files 'main.c' and 'bbexample.c' from the -source distribution. +library are declared in the file 'lzlib.h'. Usage examples of the library +are given in the files 'bbexample.c', 'ffexample.c', and 'minilzip.c' from +the source distribution. + + As 'lzlib.h' can be used in C and C++ programs, it must not impose a +choice of system headers on the program by including one of them. Therefore +it is the responsibility of the program using lzlib to include before +'lzlib.h' some header that declares the type 'uint8_t'. There are at least +four such headers in C and C++: 'stdint.h', 'cstdint', 'inttypes.h', and +'cinttypes'. + + All the library functions are thread safe. The library does not install +any signal handler. The decoder checks the consistency of the compressed +data, so the library should never crash even in case of corrupted input. Compression/decompression is done by repeatedly calling a couple of -read/write functions until all the data have been processed by the -library. This interface is safer and less error prone than the -traditional zlib interface. +read/write functions until all the data have been processed by the library. +This interface is safer and less error prone than the traditional zlib +interface. - Compression/decompression is done when the read function is called. -This means the value returned by the position functions will not be -updated until a read call, even if a lot of data is written. 
If you -want the data to be compressed in advance, just call the read function -with a SIZE equal to 0. + Compression/decompression is done when the read function is called. This +means the value returned by the position functions is not updated until a +read call, even if a lot of data are written. If you want the data to be +compressed in advance, just call the read function with a SIZE equal to 0. - If all the data to be compressed are written in advance, lzlib will -automatically adjust the header of the compressed data to use the -smallest possible dictionary size. This feature reduces the amount of -memory needed for decompression and allows minilzip to produce identical + If all the data to be compressed are written in advance, lzlib +automatically adjusts the header of the compressed data to use the largest +dictionary size that exceeds neither the data size nor the limit +given to 'LZ_compress_open'. This feature reduces the amount of memory +needed for decompression and allows minilzip to produce identical compressed output as lzip. - Lzlib will correctly decompress a data stream which is the -concatenation of two or more compressed data streams. The result is the -concatenation of the corresponding decompressed data streams. Integrity -testing of concatenated compressed data streams is also supported. + Lzlib correctly decompresses a data stream which is the concatenation of +two or more compressed data streams. The result is the concatenation of the +corresponding decompressed data streams. Integrity testing of concatenated +compressed data streams is also supported. - All the library functions are thread safe. The library does not -install any signal handler. The decoder checks the consistency of the -compressed data, so the library should never crash even in case of -corrupted input. + Lzlib is able to compress and decompress streams of unlimited size by +automatically creating multimember output. 
The members so created are large, +about 2 PiB each. - In spite of its name (Lempel-Ziv-Markov chain-Algorithm), LZMA is -not a concrete algorithm; it is more like "any algorithm using the LZMA -coding scheme". For example, the option '-0' of lzip uses the scheme in -almost the simplest way possible; issuing the longest match it can -find, or a literal byte if it can't find a match. Inversely, a much -more elaborated way of finding coding sequences of minimum size than -the one currently used by lzip could be developed, and the resulting -sequence could also be coded using the LZMA coding scheme. + In spite of its name (Lempel-Ziv-Markov chain-Algorithm), LZMA is not a +concrete algorithm; it is more like "any algorithm using the LZMA coding +scheme". For example, the option '-0' of lzip uses the scheme in almost the +simplest way possible; issuing the longest match it can find, or a literal +byte if it can't find a match. Inversely, a more elaborate way of finding +coding sequences of minimum size than the one currently used by lzip could +be developed, and the resulting sequence could also be coded using the LZMA +coding scheme. - Lzlib currently implements two variants of the LZMA algorithm; fast -(used by option '-0' of minilzip) and normal (used by all other -compression levels). + Lzlib currently implements two variants of the LZMA algorithm: fast +(used by option '-0' of minilzip) and normal (used by all other compression +levels). - The high compression of LZMA comes from combining two basic, -well-proven compression ideas: sliding dictionaries (LZ77/78) and -markov models (the thing used by every compression algorithm that uses -a range encoder or similar order-0 entropy coder as its last stage) -with segregation of contexts according to what the bits are used for. 
+ The high compression of LZMA comes from combining two basic, well-proven +compression ideas: sliding dictionaries (LZ77) and Markov models (the thing +used by every compression algorithm that uses a range encoder or similar +order-0 entropy coder as its last stage) with segregation of contexts +according to what the bits are used for. - The ideas embodied in lzlib are due to (at least) the following -people: Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrey -Markov (for the definition of Markov chains), G.N.N. Martin (for the -definition of range encoding), Igor Pavlov (for putting all the above -together in LZMA), and Julian Seward (for bzip2's CLI). + The ideas embodied in lzlib are due to (at least) the following people: +Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrei Markov (for the +definition of Markov chains), G.N.N. Martin (for the definition of range +encoding), Igor Pavlov (for putting all the above together in LZMA), and +Julian Seward (for bzip2's CLI). + + LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may never +have been compressed. Decompressed is used to refer to data which have +undergone the process of decompression.  File: lzlib.info, Node: Library version, Next: Buffering, Prev: Introduction, Up: Top @@ -134,19 +123,55 @@ File: lzlib.info, Node: Library version, Next: Buffering, Prev: Introduction, 2 Library version ***************** - -- Function: const char * LZ_version ( void ) - Returns the library version as a string. +One goal of lzlib is to keep perfect backward compatibility with older +versions of itself down to 1.0. Any application working with an older lzlib +should work with a newer lzlib. Installing a newer lzlib should not break +anything. This chapter describes the constants and functions that the +application can use to discover the version of the library being used. All +of them are declared in 'lzlib.h'. 
+ + -- Constant: LZ_API_VERSION + This constant is defined in 'lzlib.h' and works as a version test + macro. The application should check at compile time that + LZ_API_VERSION is greater than or equal to the version required by the + application: + + #if !defined LZ_API_VERSION || LZ_API_VERSION < 1012 + #error "lzlib 1.12 or newer needed." + #endif + + Before version 1.8, lzlib didn't define LZ_API_VERSION. + LZ_API_VERSION was first defined in lzlib 1.8 to 1. + Since lzlib 1.12, LZ_API_VERSION is defined as (major * 1000 + minor). + + NOTE: Version test macros are the library's way of announcing +functionality to the application. They should not be confused with feature +test macros, which allow the application to announce to the library its +desire to have certain symbols and prototypes exposed. + + -- Function: int LZ_api_version ( void ) + If LZ_API_VERSION >= 1012, this function is declared in 'lzlib.h' (else + it doesn't exist). It returns the LZ_API_VERSION of the library object + code being used. The application should check at run time that the + value returned by 'LZ_api_version' is greater than or equal to the + version required by the application. An application may be dynamically + linked at run time with a different version of lzlib than the one it + was compiled for, and this should not break the application as long as + the library used provides the functionality required by the + application. + + #if defined LZ_API_VERSION && LZ_API_VERSION >= 1012 + if( LZ_api_version() < 1012 ) + show_error( "lzlib 1.12 or newer needed." ); + #endif -- Constant: const char * LZ_version_string - This constant is defined in the header file 'lzlib.h'. + This string constant is defined in the header file 'lzlib.h' and + represents the version of the library being used at compile time. - The application should compare LZ_version and LZ_version_string for -consistency. 
If the first character differs, the library code actually -used may be incompatible with the 'lzlib.h' header file used by the -application. - - if( LZ_version()[0] != LZ_version_string[0] ) - error( "bad library version" ); + -- Function: const char * LZ_version ( void ) + This function returns a string representing the version of the library + being used at run time.  File: lzlib.info, Node: Buffering, Next: Parameter limits, Prev: Library version, Up: Top @@ -154,30 +179,31 @@ File: lzlib.info, Node: Buffering, Next: Parameter limits, Prev: Library vers 3 Buffering *********** -Lzlib internal functions need access to a memory chunk at least as large -as the dictionary size (sliding window). For efficiency reasons, the -input buffer for compression is twice or sixteen times as large as the -dictionary size. +Lzlib internal functions need access to a memory chunk at least as large as +the dictionary size (sliding window). For efficiency reasons, the input +buffer for compression is twice or sixteen times as large as the dictionary +size. Finally, for safety reasons, lzlib uses two more internal buffers. - These are the four buffers used by lzlib, and their guaranteed -minimum sizes: + These are the four buffers used by lzlib, and their guaranteed minimum +sizes: - * Input compression buffer. Written to by the 'LZ_compress_write' - function. For the normal variant of LZMA, its size is two times - the dictionary size set with the 'LZ_compress_open' function or 64 - KiB, whichever is larger. For the fast variant, its size is 1 MiB. + * Input compression buffer. Written to by the function + 'LZ_compress_write'. For the normal variant of LZMA, its size is two + times the dictionary size set with the function 'LZ_compress_open' or + 64 KiB, whichever is larger. For the fast variant, its size is 1 MiB. - * Output compression buffer. Read from by the 'LZ_compress_read' - function. Its size is 64 KiB. + * Output compression buffer. 
Read from by the function + 'LZ_compress_read'. Its size is 64 KiB. - * Input decompression buffer. Written to by the - 'LZ_decompress_write' function. Its size is 64 KiB. + * Input decompression buffer. Written to by the function + 'LZ_decompress_write'. Its size is 64 KiB. - * Output decompression buffer. Read from by the 'LZ_decompress_read' - function. Its size is the dictionary size set in the header of the - member currently being decompressed or 64 KiB, whichever is larger. + * Output decompression buffer. Read from by the function + 'LZ_decompress_read'. Its size is the dictionary size set in the header + of the member currently being decompressed or 64 KiB, whichever is + larger.  File: lzlib.info, Node: Parameter limits, Next: Compression functions, Prev: Buffering, Up: Top @@ -196,8 +222,7 @@ Current values are shown in square brackets. Returns the smallest valid dictionary size [4 KiB]. -- Function: int LZ_max_dictionary_bits ( void ) - Returns the base 2 logarithm of the largest valid dictionary size - [29]. + Returns the base 2 logarithm of the largest valid dictionary size [29]. -- Function: int LZ_max_dictionary_size ( void ) Returns the largest valid dictionary size [512 MiB]. @@ -216,135 +241,151 @@ File: lzlib.info, Node: Compression functions, Next: Decompression functions, These are the functions used to compress data. In case of error, all of them return -1 or 0, for signed and unsigned return values respectively, -except 'LZ_compress_open' whose return value must be verified by -calling 'LZ_compress_errno' before using it. +except 'LZ_compress_open' whose return value must be checked by calling +'LZ_compress_errno' before using it. 
- -- Function: struct LZ_Encoder * LZ_compress_open ( const int - DICTIONARY_SIZE, const int MATCH_LEN_LIMIT, const unsigned - long long MEMBER_SIZE ) + -- Function: LZ_Encoder * LZ_compress_open ( const int DICTIONARY_SIZE, + const int MATCH_LEN_LIMIT, const unsigned long long MEMBER_SIZE ) Initializes the internal stream state for compression and returns a - pointer that can only be used as the ENCODER argument for the - other LZ_compress functions, or a null pointer if the encoder - could not be allocated. + pointer that can only be used as the ENCODER argument for the other + LZ_compress functions, or a null pointer if the encoder could not be + allocated. - The returned pointer must be verified by calling - 'LZ_compress_errno' before using it. If 'LZ_compress_errno' does - not return 'LZ_ok', the returned pointer must not be used and - should be freed with 'LZ_compress_close' to avoid memory leaks. + The returned pointer must be checked by calling 'LZ_compress_errno' + before using it. If 'LZ_compress_errno' does not return 'LZ_ok', the + returned pointer must not be used and should be freed with + 'LZ_compress_close' to avoid memory leaks. - DICTIONARY_SIZE sets the dictionary size to be used, in bytes. - Valid values range from 4 KiB to 512 MiB. Note that dictionary - sizes are quantized. If the specified size does not match one of - the valid sizes, it will be rounded upwards by adding up to - (DICTIONARY_SIZE / 8) to it. + DICTIONARY_SIZE sets the dictionary size to be used, in bytes. Valid + values range from 4 KiB to 512 MiB. Note that dictionary sizes are + quantized. If the size specified does not match one of the valid + sizes, it is rounded upwards by adding up to (DICTIONARY_SIZE / 8) to + it. MATCH_LEN_LIMIT sets the match length limit in bytes. Valid values range from 5 to 273. Larger values usually give better compression ratios but longer compression times. 
If DICTIONARY_SIZE is 65535 and MATCH_LEN_LIMIT is 16, the fast - variant of LZMA is chosen, which produces identical compressed - output as 'lzip -0'. (The dictionary size used will be rounded - upwards to 64 KiB). + variant of LZMA is chosen, which produces identical compressed output + as 'lzip -0'. (The dictionary size used is rounded upwards to 64 KiB). - MEMBER_SIZE sets the member size limit in bytes. Minimum member - size limit is 100 kB. Small member size may degrade compression + MEMBER_SIZE sets the member size limit in bytes. Valid values range + from 4 KiB to 2 PiB. A small member size may degrade compression ratio, so use it only when needed. To produce a single-member data - stream, give MEMBER_SIZE a value larger than the amount of data to - be produced, for example INT64_MAX. + stream, give MEMBER_SIZE a value larger than the amount of data to be + produced. Values larger than 2 PiB are reduced to 2 PiB to prevent the + uncompressed size of the member from overflowing. - -- Function: int LZ_compress_close ( struct LZ_Encoder * const ENCODER - ) - Frees all dynamically allocated data structures for this stream. - This function discards any unprocessed input and does not flush - any pending output. After a call to 'LZ_compress_close', ENCODER - can no more be used as an argument to any LZ_compress function. + -- Function: int LZ_compress_close ( LZ_Encoder * const ENCODER ) + Frees all dynamically allocated data structures for this stream. This + function discards any unprocessed input and does not flush any pending + output. After a call to 'LZ_compress_close', ENCODER can no longer be + used as an argument to any LZ_compress function. It is safe to call + 'LZ_compress_close' with a null argument. 
- -- Function: int LZ_compress_finish ( struct LZ_Encoder * const - ENCODER ) + -- Function: int LZ_compress_finish ( LZ_Encoder * const ENCODER ) Use this function to tell 'lzlib' that all the data for this member - have already been written (with the 'LZ_compress_write' function). - After all the produced compressed data have been read with - 'LZ_compress_read' and 'LZ_compress_member_finished' returns 1, a - new member can be started with 'LZ_compress_restart_member'. + have already been written (with the function 'LZ_compress_write'). It + is safe to call 'LZ_compress_finish' as many times as needed. After + all the compressed data have been read with 'LZ_compress_read' and + 'LZ_compress_member_finished' returns 1, a new member can be started + with 'LZ_compress_restart_member'. - -- Function: int LZ_compress_restart_member ( struct LZ_Encoder * - const ENCODER, const unsigned long long MEMBER_SIZE ) - Use this function to start a new member in a multimember data - stream. Call this function only after - 'LZ_compress_member_finished' indicates that the current member - has been fully read (with the 'LZ_compress_read' function). + -- Function: int LZ_compress_restart_member ( LZ_Encoder * const ENCODER, + const unsigned long long MEMBER_SIZE ) + Use this function to start a new member in a multimember data stream. + Call this function only after 'LZ_compress_member_finished' indicates + that the current member has been fully read (with the function + 'LZ_compress_read'). *Note member_size::, for a description of + MEMBER_SIZE. - -- Function: int LZ_compress_sync_flush ( struct LZ_Encoder * const - ENCODER ) - Use this function to make available to 'LZ_compress_read' all the - data already written with the 'LZ_compress_write' function. First - call 'LZ_compress_sync_flush'. Then call 'LZ_compress_read' until - it returns 0. 
+ -- Function: int LZ_compress_sync_flush ( LZ_Encoder * const ENCODER ) + Use this function to make available to 'LZ_compress_read' all the data + already written with the function 'LZ_compress_write'. First call + 'LZ_compress_sync_flush'. Then call 'LZ_compress_read' until it + returns 0. + + This function writes at least one LZMA marker '3' ('Sync Flush' marker) + to the compressed output. Note that the sync flush marker is not + allowed in lzip files; it is a device for interactive communication + between applications using lzlib, but is useless and wasteful in a + file, and is excluded from the media type 'application/lzip'. The LZMA + marker '2' ('End Of Stream' marker) is the only marker allowed in lzip + files. *Note File format::. Repeated use of 'LZ_compress_sync_flush' may degrade compression - ratio, so use it only when needed. + ratio, so use it only when needed. If the interval between calls to + 'LZ_compress_sync_flush' is large (comparable to dictionary size), + creating a multimember data stream with 'LZ_compress_restart_member' + may be an alternative. - -- Function: int LZ_compress_read ( struct LZ_Encoder * const ENCODER, - uint8_t * const BUFFER, const int SIZE ) - The 'LZ_compress_read' function reads up to SIZE bytes from the - stream pointed to by ENCODER, storing the results in BUFFER. + Combining multimember stream creation with flushing may be tricky. If + there are more bytes available than those needed to complete + MEMBER_SIZE, 'LZ_compress_restart_member' needs to be called when + 'LZ_compress_member_finished' returns 1, followed by a new call to + 'LZ_compress_sync_flush'. - The return value is the number of bytes actually read. This might - be less than SIZE; for example, if there aren't that many bytes - left in the stream or if more bytes have to be yet written with the - 'LZ_compress_write' function. Note that reading less than SIZE - bytes is not an error. 
+ -- Function: int LZ_compress_read ( LZ_Encoder * const ENCODER, uint8_t * + const BUFFER, const int SIZE ) + Reads up to SIZE bytes from the stream pointed to by ENCODER, storing + the results in BUFFER. If LZ_API_VERSION >= 1012, BUFFER may be a null + pointer, in which case the bytes read are discarded. - -- Function: int LZ_compress_write ( struct LZ_Encoder * const - ENCODER, uint8_t * const BUFFER, const int SIZE ) - The 'LZ_compress_write' function writes up to SIZE bytes from - BUFFER to the stream pointed to by ENCODER. + Returns the number of bytes actually read. This might be less than + SIZE; for example, if there aren't that many bytes left in the stream + or if more bytes have to be yet written with the function + 'LZ_compress_write'. Note that reading less than SIZE bytes is not an + error. - The return value is the number of bytes actually written. This - might be less than SIZE. Note that writing less than SIZE bytes is - not an error. + -- Function: int LZ_compress_write ( LZ_Encoder * const ENCODER, uint8_t * + const BUFFER, const int SIZE ) + Writes up to SIZE bytes from BUFFER to the stream pointed to by + ENCODER. Returns the number of bytes actually written. This might be + less than SIZE. Note that writing less than SIZE bytes is not an error. - -- Function: int LZ_compress_write_size ( struct LZ_Encoder * const - ENCODER ) - The 'LZ_compress_write_size' function returns the maximum number of - bytes that can be immediately written through the - 'LZ_compress_write' function. + -- Function: int LZ_compress_write_size ( LZ_Encoder * const ENCODER ) + Returns the maximum number of bytes that can be immediately written + through 'LZ_compress_write'. For efficiency reasons, once the input + buffer is full and 'LZ_compress_write_size' returns 0, almost all the + buffer must be compressed before a size greater than 0 is returned + again. 
(This is done to minimize the amount of data that must be + copied to the beginning of the buffer before new data can be accepted). It is guaranteed that an immediate call to 'LZ_compress_write' will accept a SIZE up to the returned number of bytes. - -- Function: enum LZ_Errno LZ_compress_errno ( struct LZ_Encoder * + -- Function: LZ_Errno LZ_compress_errno ( LZ_Encoder * const ENCODER ) + Returns the current error code for ENCODER. *Note Error codes::. It is + safe to call 'LZ_compress_errno' with a null argument, in which case + it returns 'LZ_bad_argument'. + + -- Function: int LZ_compress_finished ( LZ_Encoder * const ENCODER ) + Returns 1 if all the data have been read and 'LZ_compress_close' can + be safely called. Otherwise it returns 0. 'LZ_compress_finished' + implies 'LZ_compress_member_finished'. + + -- Function: int LZ_compress_member_finished ( LZ_Encoder * const ENCODER ) + Returns 1 if the current member, in a multimember data stream, has been + fully read and 'LZ_compress_restart_member' can be safely called. + Otherwise it returns 0. + + -- Function: unsigned long long LZ_compress_data_position ( LZ_Encoder * const ENCODER ) - Returns the current error code for ENCODER (*note Error codes::). - - -- Function: int LZ_compress_finished ( struct LZ_Encoder * const - ENCODER ) - Returns 1 if all the data have been read and 'LZ_compress_close' - can be safely called. Otherwise it returns 0. - - -- Function: int LZ_compress_member_finished ( struct LZ_Encoder * - const ENCODER ) - Returns 1 if the current member, in a multimember data stream, has - been fully read and 'LZ_compress_restart_member' can be safely - called. Otherwise it returns 0. - - -- Function: unsigned long long LZ_compress_data_position ( struct - LZ_Encoder * const ENCODER ) Returns the number of input bytes already compressed in the current member. 
- -- Function: unsigned long long LZ_compress_member_position ( struct - LZ_Encoder * const ENCODER ) - Returns the number of compressed bytes already produced, but - perhaps not yet read, in the current member. + -- Function: unsigned long long LZ_compress_member_position ( LZ_Encoder * + const ENCODER ) + Returns the number of compressed bytes already produced, but perhaps + not yet read, in the current member. - -- Function: unsigned long long LZ_compress_total_in_size ( struct - LZ_Encoder * const ENCODER ) + -- Function: unsigned long long LZ_compress_total_in_size ( LZ_Encoder * + const ENCODER ) Returns the total number of input bytes already compressed. - -- Function: unsigned long long LZ_compress_total_out_size ( struct - LZ_Encoder * const ENCODER ) + -- Function: unsigned long long LZ_compress_total_out_size ( LZ_Encoder * + const ENCODER ) Returns the total number of compressed bytes already produced, but perhaps not yet read. @@ -354,132 +395,146 @@ File: lzlib.info, Node: Decompression functions, Next: Error codes, Prev: Com 6 Decompression functions ************************* -These are the functions used to decompress data. In case of error, all -of them return -1 or 0, for signed and unsigned return values -respectively, except 'LZ_decompress_open' whose return value must be -verified by calling 'LZ_decompress_errno' before using it. +These are the functions used to decompress data. In case of error, all of +them return -1 or 0, for signed and unsigned return values respectively, +except 'LZ_decompress_open' whose return value must be checked by calling +'LZ_decompress_errno' before using it. - -- Function: struct LZ_Decoder * LZ_decompress_open ( void ) - Initializes the internal stream state for decompression and - returns a pointer that can only be used as the DECODER argument - for the other LZ_decompress functions, or a null pointer if the - decoder could not be allocated. 
+ -- Function: LZ_Decoder * LZ_decompress_open ( void ) + Initializes the internal stream state for decompression and returns a + pointer that can only be used as the DECODER argument for the other + LZ_decompress functions, or a null pointer if the decoder could not be + allocated. - The returned pointer must be verified by calling - 'LZ_decompress_errno' before using it. If 'LZ_decompress_errno' - does not return 'LZ_ok', the returned pointer must not be used and - should be freed with 'LZ_decompress_close' to avoid memory leaks. + The returned pointer must be checked by calling 'LZ_decompress_errno' + before using it. If 'LZ_decompress_errno' does not return 'LZ_ok', the + returned pointer must not be used and should be freed with + 'LZ_decompress_close' to avoid memory leaks. - -- Function: int LZ_decompress_close ( struct LZ_Decoder * const - DECODER ) - Frees all dynamically allocated data structures for this stream. - This function discards any unprocessed input and does not flush - any pending output. After a call to 'LZ_decompress_close', DECODER - can no more be used as an argument to any LZ_decompress function. + -- Function: int LZ_decompress_close ( LZ_Decoder * const DECODER ) + Frees all dynamically allocated data structures for this stream. This + function discards any unprocessed input and does not flush any pending + output. After a call to 'LZ_decompress_close', DECODER can no longer + be used as an argument to any LZ_decompress function. It is safe to + call 'LZ_decompress_close' with a null argument. - -- Function: int LZ_decompress_finish ( struct LZ_Decoder * const - DECODER ) + -- Function: int LZ_decompress_finish ( LZ_Decoder * const DECODER ) Use this function to tell 'lzlib' that all the data for this stream - have already been written (with the 'LZ_decompress_write' - function). + have already been written (with the function 'LZ_decompress_write'). + It is safe to call 'LZ_decompress_finish' as many times as needed. 
It + is not required to call 'LZ_decompress_finish' if the input stream + only contains whole members, but not calling it prevents lzlib from + detecting a truncated member. - -- Function: int LZ_decompress_reset ( struct LZ_Decoder * const - DECODER ) - Resets the internal state of DECODER as it was just after opening - it with the 'LZ_decompress_open' function. Data stored in the - internal buffers is discarded. Position counters are set to 0. + -- Function: int LZ_decompress_reset ( LZ_Decoder * const DECODER ) + Resets the internal state of DECODER as it was just after opening it + with the function 'LZ_decompress_open'. Data stored in the internal + buffers are discarded. Position counters are set to 0. - -- Function: int LZ_decompress_sync_to_member ( struct LZ_Decoder * - const DECODER ) - Resets the error state of DECODER and enters a search state that - lasts until a new member header (or the end of the stream) is - found. After a successful call to 'LZ_decompress_sync_to_member', - data written with 'LZ_decompress_write' will be consumed and - 'LZ_decompress_read' will return 0 until a header is found. + -- Function: int LZ_decompress_sync_to_member ( LZ_Decoder * const DECODER + ) + Resets the error state of DECODER and enters a search state that lasts + until a new member header (or the end of the stream) is found. After a + successful call to 'LZ_decompress_sync_to_member', data written with + 'LZ_decompress_write' is consumed and 'LZ_decompress_read' returns 0 + until a header is found. This function is useful to discard any data preceding the first - member, or to discard the rest of the current member, for example - in case of a data error. If the decoder is already at the - beginning of a member, this function does nothing. + member, or to discard the rest of the current member, for example in + case of a data error. If the decoder is already at the beginning of a + member, this function does nothing. 
- -- Function: int LZ_decompress_read ( struct LZ_Decoder * const - DECODER, uint8_t * const BUFFER, const int SIZE ) - The 'LZ_decompress_read' function reads up to SIZE bytes from the - stream pointed to by DECODER, storing the results in BUFFER. + -- Function: int LZ_decompress_read ( LZ_Decoder * const DECODER, uint8_t + * const BUFFER, const int SIZE ) + Reads up to SIZE bytes from the stream pointed to by DECODER, storing + the results in BUFFER. If LZ_API_VERSION >= 1012, BUFFER may be a null + pointer, in which case the bytes read are discarded. - The return value is the number of bytes actually read. This might - be less than SIZE; for example, if there aren't that many bytes - left in the stream or if more bytes have to be yet written with the - 'LZ_decompress_write' function. Note that reading less than SIZE - bytes is not an error. + Returns the number of bytes actually read. This might be less than + SIZE; for example, if there aren't that many bytes left in the stream + or if more bytes have to be yet written with the function + 'LZ_decompress_write'. Note that reading less than SIZE bytes is not + an error. - -- Function: int LZ_decompress_write ( struct LZ_Decoder * const - DECODER, uint8_t * const BUFFER, const int SIZE ) - The 'LZ_decompress_write' function writes up to SIZE bytes from - BUFFER to the stream pointed to by DECODER. + 'LZ_decompress_read' returns at least once per member so that + 'LZ_decompress_member_finished' can be called (and trailer data + retrieved) for each member, even for empty members. Therefore, + 'LZ_decompress_read' returning 0 does not mean that the end of the + stream has been reached. The increase in the value returned by + 'LZ_decompress_total_in_size' can be used to tell the end of the stream + from an empty member. - The return value is the number of bytes actually written. This - might be less than SIZE. Note that writing less than SIZE bytes is - not an error. 
+ In case of decompression error caused by corrupt or truncated data, + 'LZ_decompress_read' does not signal the error immediately to the + application, but waits until all the bytes decoded have been read. This + allows tools like tarlz to recover as much data as possible from each + damaged member. *Note tarlz manual: (tarlz)Top. - -- Function: int LZ_decompress_write_size ( struct LZ_Decoder * const + -- Function: int LZ_decompress_write ( LZ_Decoder * const DECODER, uint8_t + * const BUFFER, const int SIZE ) + Writes up to SIZE bytes from BUFFER to the stream pointed to by + DECODER. Returns the number of bytes actually written. This might be + less than SIZE. Note that writing less than SIZE bytes is not an error. + + -- Function: int LZ_decompress_write_size ( LZ_Decoder * const DECODER ) + Returns the maximum number of bytes that can be immediately written + through 'LZ_decompress_write'. This number varies smoothly; each + compressed byte consumed may be overwritten immediately, increasing by + 1 the value returned. + + It is guaranteed that an immediate call to 'LZ_decompress_write' will + accept a SIZE up to the returned number of bytes. + + -- Function: LZ_Errno LZ_decompress_errno ( LZ_Decoder * const DECODER ) + Returns the current error code for DECODER. *Note Error codes::. It is + safe to call 'LZ_decompress_errno' with a null argument, in which case + it returns 'LZ_bad_argument'. + + -- Function: int LZ_decompress_finished ( LZ_Decoder * const DECODER ) + Returns 1 if all the data have been read and 'LZ_decompress_close' can + be safely called. Otherwise it returns 0. 'LZ_decompress_finished' + does not imply 'LZ_decompress_member_finished'. + + -- Function: int LZ_decompress_member_finished ( LZ_Decoder * const DECODER ) - The 'LZ_decompress_write_size' function returns the maximum number - of bytes that can be immediately written through the - 'LZ_decompress_write' function. 
+ Returns 1 if the previous call to 'LZ_decompress_read' finished reading + the current member, indicating that final values for the member are + available through 'LZ_decompress_data_crc', + 'LZ_decompress_data_position', and 'LZ_decompress_member_position'. + Otherwise it returns 0. - It is guaranteed that an immediate call to 'LZ_decompress_write' - will accept a SIZE up to the returned number of bytes. + -- Function: int LZ_decompress_member_version ( LZ_Decoder * const DECODER + ) + Returns the version of the current member, read from the member header. - -- Function: enum LZ_Errno LZ_decompress_errno ( struct LZ_Decoder * - const DECODER ) - Returns the current error code for DECODER (*note Error codes::). - - -- Function: int LZ_decompress_finished ( struct LZ_Decoder * const + -- Function: int LZ_decompress_dictionary_size ( LZ_Decoder * const DECODER ) - Returns 1 if all the data have been read and 'LZ_decompress_close' - can be safely called. Otherwise it returns 0. + Returns the dictionary size of the current member, read from the + member header. - -- Function: int LZ_decompress_member_finished ( struct LZ_Decoder * + -- Function: unsigned LZ_decompress_data_crc ( LZ_Decoder * const DECODER ) + Returns the 32 bit Cyclic Redundancy Check of the data decompressed + from the current member. The value returned is valid only when + 'LZ_decompress_member_finished' returns 1. + + -- Function: unsigned long long LZ_decompress_data_position ( LZ_Decoder * const DECODER ) - Returns 1 if the previous call to 'LZ_decompress_read' finished - reading the current member, indicating that final values for - member are available through 'LZ_decompress_data_crc', - 'LZ_decompress_data_position', and - 'LZ_decompress_member_position'. Otherwise it returns 0. + Returns the number of decompressed bytes already produced, but perhaps + not yet read, in the current member. 
- -- Function: int LZ_decompress_member_version ( struct LZ_Decoder * + -- Function: unsigned long long LZ_decompress_member_position ( LZ_Decoder + * const DECODER ) + Returns the number of input bytes already decompressed in the current + member. + + -- Function: unsigned long long LZ_decompress_total_in_size ( LZ_Decoder * const DECODER ) - Returns the version of current member from member header. - - -- Function: int LZ_decompress_dictionary_size ( struct LZ_Decoder * - const DECODER ) - Returns the dictionary size of current member from member header. - - -- Function: unsigned LZ_decompress_data_crc ( struct LZ_Decoder * - const DECODER ) - Returns the 32 bit Cyclic Redundancy Check of the data - decompressed from the current member. The returned value is valid - only when 'LZ_decompress_member_finished' returns 1. - - -- Function: unsigned long long LZ_decompress_data_position ( struct - LZ_Decoder * const DECODER ) - Returns the number of decompressed bytes already produced, but - perhaps not yet read, in the current member. - - -- Function: unsigned long long LZ_decompress_member_position ( struct - LZ_Decoder * const DECODER ) - Returns the number of input bytes already decompressed in the - current member. - - -- Function: unsigned long long LZ_decompress_total_in_size ( struct - LZ_Decoder * const DECODER ) Returns the total number of input bytes already decompressed. - -- Function: unsigned long long LZ_decompress_total_out_size ( struct - LZ_Decoder * const DECODER ) - Returns the total number of decompressed bytes already produced, - but perhaps not yet read. + -- Function: unsigned long long LZ_decompress_total_out_size ( LZ_Decoder + * const DECODER ) + Returns the total number of decompressed bytes already produced, but + perhaps not yet read.  
File: lzlib.info, Node: Error codes, Next: Error messages, Prev: Decompression functions, Up: Top @@ -489,96 +544,345 @@ File: lzlib.info, Node: Error codes, Next: Error messages, Prev: Decompressio Most library functions return -1 to indicate that they have failed. But this return value only tells you that an error has occurred. To find out -what kind of error it was, you need to verify the error code by calling +what kind of error it was, you need to check the error code by calling 'LZ_(de)compress_errno'. Library functions don't change the value returned by 'LZ_(de)compress_errno' when they succeed; thus, the value returned by -'LZ_(de)compress_errno' after a successful call is not necessarily -LZ_ok, and you should not use 'LZ_(de)compress_errno' to determine -whether a call failed. If the call failed, then you can examine -'LZ_(de)compress_errno'. +'LZ_(de)compress_errno' after a successful call is not necessarily LZ_ok, +and you should not use 'LZ_(de)compress_errno' to determine whether a call +failed. If the call failed, then you can examine 'LZ_(de)compress_errno'. - The error codes are defined in the header file 'lzlib.h'. + The error codes are defined in the header file 'lzlib.h'. 'LZ_Errno' is +an enum type: - -- Constant: enum LZ_Errno LZ_ok - The value of this constant is 0 and is used to indicate that there - is no error. + -- Constant: LZ_Errno LZ_ok + The value of this constant is 0 and is used to indicate that there is + no error. - -- Constant: enum LZ_Errno LZ_bad_argument + -- Constant: LZ_Errno LZ_bad_argument At least one of the arguments passed to the library function was invalid. - -- Constant: enum LZ_Errno LZ_mem_error + -- Constant: LZ_Errno LZ_mem_error No memory available. The system cannot allocate more virtual memory because its capacity is full. - -- Constant: enum LZ_Errno LZ_sequence_error + -- Constant: LZ_Errno LZ_sequence_error A library function was called in the wrong order. 
For example 'LZ_compress_restart_member' was called before - 'LZ_compress_member_finished' indicates that the current member is + 'LZ_compress_member_finished' indicated that the current member is finished. - -- Constant: enum LZ_Errno LZ_header_error - An invalid member header (one with the wrong magic bytes) was - read. If this happens at the end of the data stream it may - indicate trailing data. + -- Constant: LZ_Errno LZ_header_error + An invalid member header (one with the wrong magic bytes) was read. If + this happens at the end of the data stream it may indicate trailing + data. - -- Constant: enum LZ_Errno LZ_unexpected_eof + -- Constant: LZ_Errno LZ_unexpected_eof The end of the data stream was reached in the middle of a member. - -- Constant: enum LZ_Errno LZ_data_error - The data stream is corrupt. + -- Constant: LZ_Errno LZ_data_error + The data stream is corrupt. If 'LZ_decompress_member_position' is 6 or + less, it indicates either a format version not supported, an invalid + dictionary size, a nonzero first LZMA byte, a corrupt header in a + multimember data stream, or trailing data too similar to a valid lzip + header. Lziprecover can be used to repair some of these errors and to + remove conflicting trailing data from a file. - -- Constant: enum LZ_Errno LZ_library_error - A bug was detected in the library. Please, report it (*note - Problems::). + -- Constant: LZ_Errno LZ_library_error + A bug was detected in the library. Please, report it. *Note Problems::.  -File: lzlib.info, Node: Error messages, Next: Data format, Prev: Error codes, Up: Top +File: lzlib.info, Node: Error messages, Next: Invoking minilzip, Prev: Error codes, Up: Top 8 Error messages **************** - -- Function: const char * LZ_strerror ( const enum LZ_Errno LZ_ERRNO ) - Returns the standard error message for a given error code. The - messages are fairly short; there are no multi-line messages or - embedded newlines. 
This function makes it easy for your program - to report informative error messages about the failure of a - library call. + -- Function: const char * LZ_strerror ( const LZ_Errno LZ_ERRNO ) + Returns the error message corresponding to the error code LZ_ERRNO. + The messages are fairly short; there are no multi-line messages or + embedded newlines. This function makes it easy for your program to + report informative error messages about the failure of a library call. The value of LZ_ERRNO normally comes from a call to 'LZ_(de)compress_errno'.  -File: lzlib.info, Node: Data format, Next: Examples, Prev: Error messages, Up: Top +File: lzlib.info, Node: Invoking minilzip, Next: File format, Prev: Error messages, Up: Top -9 Data format -************* +9 Invoking minilzip +******************* + +Minilzip is a test program for the compression library lzlib. Minilzip is +not intended to be installed because lzip has more features, but minilzip is +well tested and you can use it as your main compressor if so you wish. +*Note lzip: (lzip)Top. + + Lzip is a lossless data compressor with a user interface similar to the +one of gzip or bzip2. Lzip uses a simplified form of LZMA (Lempel-Ziv-Markov +chain-Algorithm) designed to achieve complete interoperability between +implementations. The maximum dictionary size is 512 MiB so that any lzip +file can be decompressed on 32-bit machines. Lzip provides accurate and +robust 3-factor integrity checking. 'lzip -0' compresses about as fast as +gzip, while 'lzip -9' compresses most files more than bzip2. Decompression +speed is intermediate between gzip and bzip2. Lzip provides better data +recovery capabilities than gzip and bzip2. Lzip has been designed, written, +and tested with great care to replace gzip and bzip2 as general-purpose +compressed format for Unix-like systems. 
+ +The format for running minilzip is: + + minilzip [OPTIONS] [FILES] + +If no file names are specified, minilzip compresses (or decompresses) from +standard input to standard output. A hyphen '-' used as a FILE argument +means standard input. It can be mixed with other FILES and is read just +once, the first time it appears in the command line. Remember to prepend +'./' to any file name beginning with a hyphen, or use '--'. + +minilzip supports the following options: *Note Argument syntax: +(plzip)Argument syntax. + +'-h' +'--help' + Print an informative help message describing the options and exit. + +'-V' +'--version' + Print the version number of minilzip on the standard output and exit. + This version number should be included in all bug reports. + +'-a' +'--trailing-error' + Exit with error status 2 if any remaining input is detected after + decompressing the last member. Such remaining input is usually trailing + garbage that can be safely ignored. + +'-b BYTES' +'--member-size=BYTES' + When compressing, set the member size limit to BYTES. If BYTES is + smaller than the compressed size, a multimember file is produced. It is + advisable to keep members smaller than RAM size so that they can be + repaired with lziprecover in case of corruption. A small member size + may degrade compression ratio, so use it only when needed. Valid + values range from 100 kB to 2 PiB. Defaults to 2 PiB. + +'-c' +'--stdout' + Compress or decompress to standard output; keep input files unchanged. + If compressing several files, each file is compressed independently. + (The output consists of a sequence of independently compressed + members). This option (or '-o') is needed when reading from a named + pipe (fifo) or from a device. Use it also to recover as much of the + decompressed data as possible when decompressing a corrupt file. '-c' + overrides '-o' and '-S'. '-c' has no effect when testing. + +'-d' +'--decompress' + Decompress the files specified. 
The integrity of the files specified is + checked. If a file does not exist, can't be opened, or the destination + file already exists and '--force' has not been specified, minilzip + continues decompressing the rest of the files and exits with error + status 1. If a file fails to decompress, or is a terminal, minilzip + exits immediately with error status 2 without decompressing the rest + of the files. A terminal is considered an uncompressed file, and + therefore invalid. A multimember file with one or more empty members + is accepted if redirected to standard input. + +'-f' +'--force' + Force overwrite of output files. + +'-F' +'--recompress' + When compressing, force re-compression of files whose name already has + the '.lz' or '.tlz' suffix. + +'-k' +'--keep' + Keep (don't delete) input files during compression or decompression. + +'-m BYTES' +'--match-length=BYTES' + When compressing, set the match length limit in bytes. After a match + this long is found, the search is finished. Valid values range from 5 + to 273. Larger values usually give better compression ratios but + longer compression times. + +'-o FILE' +'--output=FILE' + If '-c' has not been also specified, write the (de)compressed output + to FILE; keep input files unchanged. If compressing several files, + each file is compressed independently. (The output consists of a + sequence of independently compressed members). This option (or '-c') + is needed when reading from a named pipe (fifo) or from a device. + '-o -' is equivalent to '-c'. '-o' has no effect when testing. + + When compressing and splitting the output in volumes, FILE is used as + a prefix, and several files named 'FILE00001.lz', 'FILE00002.lz', etc, + are created. In this case, only one input file is allowed. + +'-q' +'--quiet' + Quiet operation. Suppress all messages. + +'-s BYTES' +'--dictionary-size=BYTES' + When compressing, set the dictionary size limit in bytes. 
Minilzip + uses for each file the largest dictionary size that does not exceed + neither the file size nor this limit. Valid values range from 4 KiB to + 512 MiB. Values 12 to 29 are interpreted as powers of two, meaning + 2^12 to 2^29 bytes. Dictionary sizes are quantized so that they can be + coded in just one byte (*note coded-dict-size::). If the size + specified does not match one of the valid sizes, it is rounded upwards + by adding up to (BYTES / 8) to it. + + For maximum compression you should use a dictionary size limit as large + as possible, but keep in mind that the decompression memory requirement + is affected at compression time by the choice of dictionary size limit. + The dictionary size used for decompression is the same dictionary size + used for compression. + +'-S BYTES' +'--volume-size=BYTES' + When compressing, and '-c' has not been also specified, split the + compressed output into several volume files with names + 'original_name00001.lz', 'original_name00002.lz', etc, and set the + volume size limit to BYTES. Input files are kept unchanged. Each + volume is a complete, maybe multimember, lzip file. A small volume + size may degrade compression ratio, so use it only when needed. Valid + values range from 100 kB to 4 EiB. + +'-t' +'--test' + Check integrity of the files specified, but don't decompress them. This + really performs a trial decompression and throws away the result. Use + it together with '-v' to see information about the files. If a file + fails the test, does not exist, can't be opened, or is a terminal, + minilzip continues testing the rest of the files. A final diagnostic + is shown at verbosity level 1 or higher if any file fails the test + when testing multiple files. A multimember file with one or more empty + members is accepted if redirected to standard input. + +'-v' +'--verbose' + Verbose mode. + When compressing, show the compression ratio and size for each file + processed. 
+ When decompressing or testing, further -v's (up to 4) increase the + verbosity level, showing status, compression ratio, dictionary size, + and trailer contents (CRC, data size, member size). + +'-0 .. -9' + Compression level. Set the compression parameters (dictionary size and + match length limit) as shown in the table below. The default + compression level is '-6', equivalent to '-s8MiB -m36'. Note that '-9' + can be much slower than '-0'. These options have no effect when + decompressing or testing. + + The bidimensional parameter space of LZMA can't be mapped to a linear + scale optimal for all files. If your files are large, very repetitive, + etc, you may need to use the options '--dictionary-size' and + '--match-length' directly to achieve optimal performance. + + If several compression levels or '-s' or '-m' options are given, the + last setting is used. For example '-9 -s64MiB' is equivalent to + '-s64MiB -m273' + + Level Dictionary size (-s) Match length limit (-m) + ------------------------------------------------------ + -0 64 KiB 16 bytes + -1 1 MiB 5 bytes + -2 1.5 MiB 6 bytes + -3 2 MiB 8 bytes + -4 3 MiB 12 bytes + -5 4 MiB 20 bytes + -6 8 MiB 36 bytes + -7 16 MiB 68 bytes + -8 24 MiB 132 bytes + -9 32 MiB 273 bytes + +'--fast' +'--best' + Aliases for GNU gzip compatibility. + +'--loose-trailing' + When decompressing or testing, allow trailing data whose first bytes + are so similar to the magic bytes of a lzip header that they can be + confused with a corrupt header. Use this option if a file triggers a + 'corrupt header' error and the cause is not indeed a corrupt header. + +'--check-lib' + Compare the version of lzlib used to compile minilzip with the version + actually being used at run time and exit. Report any differences + found. Exit with error status 1 if differences are found. 
A mismatch + may indicate that lzlib is not correctly installed or that a different + version of lzlib has been installed after compiling the shared version + of minilzip. Exit with error status 2 if LZ_API_VERSION and + LZ_version_string don't match. 'minilzip -v --check-lib' shows the + version of lzlib being used and the value of LZ_API_VERSION (if + defined). *Note Library version::. + + + Numbers given as arguments to options may be expressed in decimal, +hexadecimal, or octal (using the same syntax as integer constants in C++), +and may be followed by a multiplier and an optional 'B' for "byte". + + Table of SI and binary prefixes (unit multipliers): + +Prefix Value | Prefix Value +---------------------------------------------------------------------- +k kilobyte (10^3 = 1000) | Ki kibibyte (2^10 = 1024) +M megabyte (10^6) | Mi mebibyte (2^20) +G gigabyte (10^9) | Gi gibibyte (2^30) +T terabyte (10^12) | Ti tebibyte (2^40) +P petabyte (10^15) | Pi pebibyte (2^50) +E exabyte (10^18) | Ei exbibyte (2^60) +Z zettabyte (10^21) | Zi zebibyte (2^70) +Y yottabyte (10^24) | Yi yobibyte (2^80) +R ronnabyte (10^27) | Ri robibyte (2^90) +Q quettabyte (10^30) | Qi quebibyte (2^100) + + + Exit status: 0 for a normal exit, 1 for environmental problems (file not +found, invalid command-line options, I/O errors, etc), 2 to indicate a +corrupt or invalid input file, 3 for an internal consistency error (e.g., +bug) which caused minilzip to panic. + + +File: lzlib.info, Node: File format, Next: Examples, Prev: Invoking minilzip, Up: Top + +10 File format +************** Perfection is reached, not when there is no longer anything to add, but when there is no longer anything to take away. -- Antoine de Saint-Exupery - In the diagram below, a box like this: + +---+ | | <-- the vertical bars might be missing +---+ represents one byte; a box like this: + +==============+ | | +==============+ represents a variable number of bytes. 
- - A lzip data stream consists of a series of "members" (compressed data -sets). The members simply appear one after another in the data stream, -with no additional information before, between, or after them. +A lzip file consists of one or more independent "members" (compressed data +sets). The members simply appear one after another in the file, with no +additional information before, between, or after them. Each member can +encode in compressed form up to 16 EiB - 1 byte of uncompressed data. The +size of a multimember file is unlimited. Empty members (data size = 0) are +not allowed in multimember files. Each member has the following structure: + +--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ID string | VN | DS | LZMA stream | CRC32 | Data size | Member size | +--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ @@ -586,178 +890,361 @@ with no additional information before, between, or after them. All multibyte values are stored in little endian order. 'ID string (the "magic" bytes)' - A four byte string, identifying the lzip format, with the value - "LZIP" (0x4C, 0x5A, 0x49, 0x50). + A four byte string, identifying the lzip format, with the value "LZIP" + (0x4C, 0x5A, 0x49, 0x50). 'VN (version number, 1 byte)' - Just in case something needs to be modified in the future. 1 for - now. + Just in case something needs to be modified in the future. 1 for now. 'DS (coded dictionary size, 1 byte)' The dictionary size is calculated by taking a power of 2 (the base - size) and substracting from it a fraction between 0/16 and 7/16 of - the base size. + size) and subtracting from it a fraction between 0/16 and 7/16 of the + base size. Bits 4-0 contain the base 2 logarithm of the base size (12 to 29). - Bits 7-5 contain the numerator of the fraction (0 to 7) to - substract from the base size to obtain the dictionary size. 
+ Bits 7-5 contain the numerator of the fraction (0 to 7) to subtract + from the base size to obtain the dictionary size. Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB Valid values for dictionary size range from 4 KiB to 512 MiB. 'LZMA stream' - The LZMA stream, finished by an end of stream marker. Uses default - values for encoder properties. *Note Stream format: (lzip)Stream + The LZMA stream, terminated by an 'End Of Stream' marker. Uses default + values for encoder properties. *Note Stream format: (lzip)Stream format, for a complete description. - Lzip only uses the LZMA marker '2' ("End Of Stream" marker). Lzlib - also uses the LZMA marker '3' ("Sync Flush" marker). + Lzip only uses the LZMA marker '2' ('End Of Stream' marker). Lzlib + also uses the LZMA marker '3' ('Sync Flush' marker). *Note + sync_flush::. 'CRC32 (4 bytes)' - CRC of the uncompressed original data. + Cyclic Redundancy Check (CRC) of the original uncompressed data. 'Data size (8 bytes)' - Size of the uncompressed original data. + Size of the original uncompressed data. 'Member size (8 bytes)' - Total size of the member, including header and trailer. This field - acts as a distributed index, allows the verification of stream - integrity, and facilitates safe recovery of undamaged members from - multimember files. + Total size of the member, including header and trailer. This field acts + as a distributed index, improves the checking of stream integrity, and + facilitates the safe recovery of undamaged members from multimember + files. Lzip limits the member size to 2 PiB to prevent the data size + field from overflowing.  
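The field layout described above can be sketched in plain C without calling lzlib. This is a minimal illustration, not code from lzlib or lzip: the names 'decode_dict_size', 'check_header', 'parse_trailer', 'get_le', and 'Trailer' are hypothetical. Two choices are assumptions of this sketch rather than statements from the text: a DS byte whose decoded size falls outside 4 KiB to 512 MiB is rejected, and only version 1 is accepted.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Sizes follow the member diagram: 4 + 1 + 1 header bytes,
   4 + 8 + 8 trailer bytes. */
enum { header_size = 6, trailer_size = 20 };

/* Hypothetical helper: decode the coded dictionary size (DS) byte.
   Returns 0 if the result is outside 4 KiB .. 512 MiB (an assumption
   of this sketch; a real decoder may treat such bytes differently). */
static unsigned decode_dict_size( const uint8_t ds )
  {
  const unsigned log2_base = ds & 0x1F;         /* bits 4-0 */
  const unsigned numerator = ds >> 5;           /* bits 7-5 */
  if( log2_base < 12 || log2_base > 29 ) return 0;
  const unsigned base = 1U << log2_base;
  const unsigned size = base - numerator * ( base / 16 );
  return ( size >= 4096 ) ? size : 0;
  }

/* Read 'size' bytes (at most 8) stored in little endian order. */
static uint64_t get_le( const uint8_t * const p, const int size )
  {
  uint64_t value = 0;
  for( int i = size - 1; i >= 0; --i ) value = ( value << 8 ) | p[i];
  return value;
  }

/* Hypothetical helper: check the ID string and version number, then
   decode DS. Returns false if any of the three is not acceptable. */
static bool check_header( const uint8_t h[header_size],
                          unsigned * const dict_size )
  {
  if( memcmp( h, "LZIP", 4 ) != 0 || h[4] != 1 ) return false;
  *dict_size = decode_dict_size( h[5] );
  return *dict_size != 0;
  }

typedef struct
  {
  uint32_t crc;                 /* CRC32 of the uncompressed data */
  uint64_t data_size;           /* size of the uncompressed data */
  uint64_t member_size;         /* header + LZMA stream + trailer */
  } Trailer;

/* Hypothetical helper: split the 20-byte trailer into its fields. */
static Trailer parse_trailer( const uint8_t t[trailer_size] )
  {
  Trailer tr;
  tr.crc = (uint32_t)get_le( t, 4 );
  tr.data_size = get_le( t + 4, 8 );
  tr.member_size = get_le( t + 12, 8 );
  return tr;
  }
```

With the example byte from the text, 'decode_dict_size( 0xD3 )' yields 2^19 - 6 * 2^15 = 327680 bytes (320 KiB). Because the trailer records the member size, a reader positioned at the end of a member can step back by that many bytes to find the start of the member, which is what makes the field usable as a distributed index.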
-File: lzlib.info, Node: Examples, Next: Problems, Prev: Data format, Up: Top +File: lzlib.info, Node: Examples, Next: Problems, Prev: File format, Up: Top -10 A small tutorial with examples +11 A small tutorial with examples ********************************* -This chapter shows the order in which the library functions should be -called depending on what kind of data stream you want to compress or -decompress. See the file 'bbexample.c' in the source distribution for -an example of how buffer-to-buffer compression/decompression can be -implemented using lzlib. +This chapter provides real code examples for the most common uses of the +library. See these examples in context in the files 'bbexample.c' and +'ffexample.c' from the source distribution of lzlib. - Note that lzlib's interface is symmetrical. That is, the code for -normal compression and decompression is identical except because one -calls LZ_compress* functions while the other calls LZ_decompress* -functions. + Note that the interface of lzlib is symmetrical. That is, the code for +normal compression and decompression is identical except because one calls +LZ_compress* functions while the other calls LZ_decompress* functions. + +* Menu: + +* Buffer compression:: Buffer-to-buffer single-member compression +* Buffer decompression:: Buffer-to-buffer decompression +* File compression:: File-to-file single-member compression +* File decompression:: File-to-file decompression +* File compression mm:: File-to-file multimember compression +* Skipping data errors:: Decompression with automatic resynchronization + + +File: lzlib.info, Node: Buffer compression, Next: Buffer decompression, Up: Examples + +11.1 Buffer compression +======================= + +Buffer-to-buffer single-member compression (MEMBER_SIZE > total output). + +/* Compress 'insize' bytes from 'inbuf' to 'outbuf'. + Return the size of the compressed data in '*outlenp'. 
+ In case of error, or if 'outsize' is too small, return false and do not + modify '*outlenp'. +*/ +bool bbcompress( const uint8_t * const inbuf, const int insize, + const int dictionary_size, const int match_len_limit, + uint8_t * const outbuf, const int outsize, + int * const outlenp ) + { + int inpos = 0, outpos = 0; + bool error = false; + LZ_Encoder * const encoder = + LZ_compress_open( dictionary_size, match_len_limit, INT64_MAX ); + if( !encoder || LZ_compress_errno( encoder ) != LZ_ok ) + { LZ_compress_close( encoder ); return false; } + + while( true ) + { + int ret = LZ_compress_write( encoder, inbuf + inpos, insize - inpos ); + if( ret < 0 ) { error = true; break; } + inpos += ret; + if( inpos >= insize ) LZ_compress_finish( encoder ); + ret = LZ_compress_read( encoder, outbuf + outpos, outsize - outpos ); + if( ret < 0 ) { error = true; break; } + outpos += ret; + if( LZ_compress_finished( encoder ) == 1 ) break; + if( outpos >= outsize ) { error = true; break; } + } + + if( LZ_compress_close( encoder ) < 0 ) error = true; + if( error ) return false; + *outlenp = outpos; + return true; + } + + +File: lzlib.info, Node: Buffer decompression, Next: File compression, Prev: Buffer compression, Up: Examples + +11.2 Buffer decompression +========================= + +Buffer-to-buffer decompression. + +/* Decompress 'insize' bytes from 'inbuf' to 'outbuf'. + Return the size of the decompressed data in '*outlenp'. + In case of error, or if 'outsize' is too small, return false and do not + modify '*outlenp'. 
+*/ +bool bbdecompress( const uint8_t * const inbuf, const int insize, + uint8_t * const outbuf, const int outsize, + int * const outlenp ) + { + int inpos = 0, outpos = 0; + bool error = false; + LZ_Decoder * const decoder = LZ_decompress_open(); + if( !decoder || LZ_decompress_errno( decoder ) != LZ_ok ) + { LZ_decompress_close( decoder ); return false; } + + while( true ) + { + int ret = LZ_decompress_write( decoder, inbuf + inpos, insize - inpos ); + if( ret < 0 ) { error = true; break; } + inpos += ret; + if( inpos >= insize ) LZ_decompress_finish( decoder ); + ret = LZ_decompress_read( decoder, outbuf + outpos, outsize - outpos ); + if( ret < 0 ) { error = true; break; } + outpos += ret; + if( LZ_decompress_finished( decoder ) == 1 ) break; + if( outpos >= outsize ) { error = true; break; } + } + + if( LZ_decompress_close( decoder ) < 0 ) error = true; + if( error ) return false; + *outlenp = outpos; + return true; + } + + +File: lzlib.info, Node: File compression, Next: File decompression, Prev: Buffer decompression, Up: Examples + +11.3 File compression +===================== + +File-to-file compression using LZ_compress_write_size. 
+ +int ffcompress( LZ_Encoder * const encoder, + FILE * const infile, FILE * const outfile ) + { + enum { buffer_size = 16384 }; + uint8_t buffer[buffer_size]; + while( true ) + { + int len, ret; + int size = min( buffer_size, LZ_compress_write_size( encoder ) ); + if( size > 0 ) + { + len = fread( buffer, 1, size, infile ); + ret = LZ_compress_write( encoder, buffer, len ); + if( ret < 0 || ferror( infile ) ) break; + if( feof( infile ) ) LZ_compress_finish( encoder ); + } + ret = LZ_compress_read( encoder, buffer, buffer_size ); + if( ret < 0 ) break; + len = fwrite( buffer, 1, ret, outfile ); + if( len < ret ) break; + if( LZ_compress_finished( encoder ) == 1 ) return 0; + } + return 1; + } + + +File: lzlib.info, Node: File decompression, Next: File compression mm, Prev: File compression, Up: Examples + +11.4 File decompression +======================= + +File-to-file decompression using LZ_decompress_write_size. + +int ffdecompress( LZ_Decoder * const decoder, + FILE * const infile, FILE * const outfile ) + { + enum { buffer_size = 16384 }; + uint8_t buffer[buffer_size]; + while( true ) + { + int len, ret; + int size = min( buffer_size, LZ_decompress_write_size( decoder ) ); + if( size > 0 ) + { + len = fread( buffer, 1, size, infile ); + ret = LZ_decompress_write( decoder, buffer, len ); + if( ret < 0 || ferror( infile ) ) break; + if( feof( infile ) ) LZ_decompress_finish( decoder ); + } + ret = LZ_decompress_read( decoder, buffer, buffer_size ); + if( ret < 0 ) break; + len = fwrite( buffer, 1, ret, outfile ); + if( len < ret ) break; + if( LZ_decompress_finished( decoder ) == 1 ) return 0; + } + return 1; + } + + +File: lzlib.info, Node: File compression mm, Next: Skipping data errors, Prev: File decompression, Up: Examples + +11.5 File-to-file multimember compression +========================================= + +Example 1: Multimember compression with members of fixed size +(MEMBER_SIZE < total output). 
+ +int ffmmcompress( FILE * const infile, FILE * const outfile ) + { + enum { buffer_size = 16384, member_size = 4096 }; + uint8_t buffer[buffer_size]; + bool done = false; + LZ_Encoder * const encoder = LZ_compress_open( 65535, 16, member_size ); + if( !encoder || LZ_compress_errno( encoder ) != LZ_ok ) + { fputs( "ffexample: Not enough memory.\n", stderr ); + LZ_compress_close( encoder ); return 1; } + while( true ) + { + int len, ret; + int size = min( buffer_size, LZ_compress_write_size( encoder ) ); + if( size > 0 ) + { + len = fread( buffer, 1, size, infile ); + ret = LZ_compress_write( encoder, buffer, len ); + if( ret < 0 || ferror( infile ) ) break; + if( feof( infile ) ) LZ_compress_finish( encoder ); + } + ret = LZ_compress_read( encoder, buffer, buffer_size ); + if( ret < 0 ) break; + len = fwrite( buffer, 1, ret, outfile ); + if( len < ret ) break; + if( LZ_compress_member_finished( encoder ) == 1 ) + { + if( LZ_compress_finished( encoder ) == 1 ) { done = true; break; } + if( LZ_compress_restart_member( encoder, member_size ) < 0 ) break; + } + } + if( LZ_compress_close( encoder ) < 0 ) done = false; + return done; + } -Example 1: Normal compression (MEMBER_SIZE > total output). +Example 2: Multimember compression (user-restarted members). (Call +LZ_compress_open with MEMBER_SIZE > largest member). - 1) LZ_compress_open - 2) LZ_compress_write - 3) LZ_compress_read - 4) go back to step 2 until all input data have been written - 5) LZ_compress_finish - 6) LZ_compress_read - 7) go back to step 6 until LZ_compress_finished returns 1 - 8) LZ_compress_close +/* Compress 'infile' to 'outfile' as a multimember stream with one member + for each line of text terminated by a newline character or by EOF. + Return 0 if success, 1 if error. 
+*/ +int fflfcompress( LZ_Encoder * const encoder, + FILE * const infile, FILE * const outfile ) + { + enum { buffer_size = 16384 }; + uint8_t buffer[buffer_size]; + while( true ) + { + int len, ret; + int size = min( buffer_size, LZ_compress_write_size( encoder ) ); + if( size > 0 ) + { + for( len = 0; len < size; ) + { + int ch = getc( infile ); + if( ch == EOF || ( buffer[len++] = ch ) == '\n' ) break; + } + /* avoid writing an empty member to outfile */ + if( len == 0 && LZ_compress_data_position( encoder ) == 0 ) return 0; + ret = LZ_compress_write( encoder, buffer, len ); + if( ret < 0 || ferror( infile ) ) break; + if( feof( infile ) || buffer[len-1] == '\n' ) + LZ_compress_finish( encoder ); + } + ret = LZ_compress_read( encoder, buffer, buffer_size ); + if( ret < 0 ) break; + len = fwrite( buffer, 1, ret, outfile ); + if( len < ret ) break; + if( LZ_compress_member_finished( encoder ) == 1 ) + { + if( feof( infile ) && LZ_compress_finished( encoder ) == 1 ) return 0; + if( LZ_compress_restart_member( encoder, INT64_MAX ) < 0 ) break; + } + } + return 1; + } + +File: lzlib.info, Node: Skipping data errors, Prev: File compression mm, Up: Examples -Example 2: Normal compression using LZ_compress_write_size. +11.6 Skipping data errors +========================= - 1) LZ_compress_open - 2) go to step 5 if LZ_compress_write_size returns 0 - 3) LZ_compress_write - 4) if no more data to write, call LZ_compress_finish - 5) LZ_compress_read - 6) go back to step 2 until LZ_compress_finished returns 1 - 7) LZ_compress_close - - -Example 3: Decompression. - - 1) LZ_decompress_open - 2) LZ_decompress_write - 3) LZ_decompress_read - 4) go back to step 2 until all input data have been written - 5) LZ_decompress_finish - 6) LZ_decompress_read - 7) go back to step 6 until LZ_decompress_finished returns 1 - 8) LZ_decompress_close - - -Example 4: Decompression using LZ_decompress_write_size. 
- - 1) LZ_decompress_open - 2) go to step 5 if LZ_decompress_write_size returns 0 - 3) LZ_decompress_write - 4) if no more data to write, call LZ_decompress_finish - 5) LZ_decompress_read - 5a) optionally, if LZ_decompress_member_finished returns 1, read - final values for member with LZ_decompress_data_crc, etc. - 6) go back to step 2 until LZ_decompress_finished returns 1 - 7) LZ_decompress_close - - -Example 5: Multimember compression (MEMBER_SIZE < total output). - - 1) LZ_compress_open - 2) go to step 5 if LZ_compress_write_size returns 0 - 3) LZ_compress_write - 4) if no more data to write, call LZ_compress_finish - 5) LZ_compress_read - 6) go back to step 2 until LZ_compress_member_finished returns 1 - 7) go to step 10 if LZ_compress_finished() returns 1 - 8) LZ_compress_restart_member - 9) go back to step 2 - 10) LZ_compress_close - - -Example 6: Multimember compression (user-restarted members). - - 1) LZ_compress_open - 2) LZ_compress_write - 3) LZ_compress_read - 4) go back to step 2 until member termination is desired - 5) LZ_compress_finish - 6) LZ_compress_read - 7) go back to step 6 until LZ_compress_member_finished returns 1 - 8) verify that LZ_compress_finished returns 1 - 9) go to step 12 if all input data have been written - 10) LZ_compress_restart_member - 11) go back to step 2 - 12) LZ_compress_close - - -Example 7: Decompression with automatic removal of leading data. - - 1) LZ_decompress_open - 2) LZ_decompress_sync_to_member - 3) go to step 6 if LZ_decompress_write_size returns 0 - 4) LZ_decompress_write - 5) if no more data to write, call LZ_decompress_finish - 6) LZ_decompress_read - 7) go back to step 3 until LZ_decompress_finished returns 1 - 8) LZ_decompress_close - - -Example 8: Streamed decompression with automatic resynchronization to -next member in case of data error. 
- - 1) LZ_decompress_open - 2) go to step 5 if LZ_decompress_write_size returns 0 - 3) LZ_decompress_write - 4) if no more data to write, call LZ_decompress_finish - 5) if LZ_decompress_read produces LZ_header_error or LZ_data_error, - call LZ_decompress_sync_to_member - 6) go back to step 2 until LZ_decompress_finished returns 1 - 7) LZ_decompress_close +/* Decompress 'infile' to 'outfile' with automatic resynchronization to + next member in case of data error, including the automatic removal of + leading garbage. +*/ +int ffrsdecompress( LZ_Decoder * const decoder, + FILE * const infile, FILE * const outfile ) + { + enum { buffer_size = 16384 }; + uint8_t buffer[buffer_size]; + while( true ) + { + int len, ret; + int size = min( buffer_size, LZ_decompress_write_size( decoder ) ); + if( size > 0 ) + { + len = fread( buffer, 1, size, infile ); + ret = LZ_decompress_write( decoder, buffer, len ); + if( ret < 0 || ferror( infile ) ) break; + if( feof( infile ) ) LZ_decompress_finish( decoder ); + } + ret = LZ_decompress_read( decoder, buffer, buffer_size ); + if( ret < 0 ) + { + if( LZ_decompress_errno( decoder ) == LZ_header_error || + LZ_decompress_errno( decoder ) == LZ_data_error ) + { LZ_decompress_sync_to_member( decoder ); continue; } + break; + } + len = fwrite( buffer, 1, ret, outfile ); + if( len < ret ) break; + if( LZ_decompress_finished( decoder ) == 1 ) return 0; + } + return 1; + }  File: lzlib.info, Node: Problems, Next: Concept index, Prev: Examples, Up: Top -11 Reporting bugs +12 Reporting bugs ***************** -There are probably bugs in lzlib. There are certainly errors and -omissions in this manual. If you report them, they will get fixed. If -you don't, no one will ever know about them and they will remain unfixed -for all eternity, if not longer. +There are probably bugs in lzlib. There are certainly errors and omissions +in this manual. If you report them, they will get fixed. 
If you don't, no +one will ever know about them and they will remain unfixed for all +eternity, if not longer. If you find a bug in lzlib, please send electronic mail to -. Include the version number, which you can find -by running 'minilzip --version' or in 'LZ_version_string' from -'lzlib.h'. +. Include the version number, which you can find by +running 'minilzip --version' and 'minilzip -v --check-lib'.  File: lzlib.info, Node: Concept index, Prev: Problems, Up: Top @@ -768,36 +1255,53 @@ Concept index [index] * Menu: -* buffering: Buffering. (line 6) -* bugs: Problems. (line 6) -* compression functions: Compression functions. (line 6) -* data format: Data format. (line 6) -* decompression functions: Decompression functions. - (line 6) -* error codes: Error codes. (line 6) -* error messages: Error messages. (line 6) -* examples: Examples. (line 6) -* getting help: Problems. (line 6) -* introduction: Introduction. (line 6) -* library version: Library version. (line 6) -* parameter limits: Parameter limits. (line 6) +* buffer compression: Buffer compression. (line 6) +* buffer decompression: Buffer decompression. (line 6) +* buffering: Buffering. (line 6) +* bugs: Problems. (line 6) +* compression functions: Compression functions. (line 6) +* decompression functions: Decompression functions. (line 6) +* error codes: Error codes. (line 6) +* error messages: Error messages. (line 6) +* examples: Examples. (line 6) +* file compression: File compression. (line 6) +* file decompression: File decompression. (line 6) +* file format: File format. (line 6) +* getting help: Problems. (line 6) +* introduction: Introduction. (line 6) +* invoking: Invoking minilzip. (line 6) +* library version: Library version. (line 6) +* multimember compression: File compression mm. (line 6) +* options: Invoking minilzip. (line 6) +* parameter limits: Parameter limits. (line 6) +* skipping data errors: Skipping data errors. 
(line 6)  Tag Table: -Node: Top220 -Node: Introduction1301 -Node: Library version5918 -Node: Buffering6563 -Node: Parameter limits7783 -Node: Compression functions8742 -Node: Decompression functions15282 -Node: Error codes21450 -Node: Error messages23425 -Node: Data format24004 -Node: Examples26569 -Node: Problems30650 -Node: Concept index31222 +Node: Top215 +Node: Introduction1337 +Node: Library version5500 +Node: Buffering8051 +Node: Parameter limits9276 +Node: Compression functions10230 +Ref: member_size12006 +Ref: sync_flush13747 +Node: Decompression functions18313 +Node: Error codes25700 +Node: Error messages28045 +Node: Invoking minilzip28628 +Node: File format39710 +Ref: coded-dict-size41208 +Node: Examples42615 +Node: Buffer compression43576 +Node: Buffer decompression45089 +Node: File compression46496 +Node: File decompression47472 +Node: File compression mm48469 +Node: Skipping data errors51480 +Node: Problems52778 +Node: Concept index53339  End Tag Table diff --git a/doc/lzlib.texi b/doc/lzlib.texi index bc3b9fe..3e15079 100644 --- a/doc/lzlib.texi +++ b/doc/lzlib.texi @@ -6,10 +6,10 @@ @finalout @c %**end of header -@set UPDATED 17 May 2016 -@set VERSION 1.8 +@set UPDATED 9 January 2025 +@set VERSION 1.15 -@dircategory Data Compression +@dircategory Compression @direntry * Lzlib: (lzlib). Compression library for the lzip format @end direntry @@ -29,154 +29,178 @@ @contents @end ifnothtml +@ifnottex @node Top @top This manual is for Lzlib (version @value{VERSION}, @value{UPDATED}). 
@menu -* Introduction:: Purpose and features of lzlib -* Library version:: Checking library version -* Buffering:: Sizes of lzlib's buffers -* Parameter limits:: Min / max values for some parameters -* Compression functions:: Descriptions of the compression functions -* Decompression functions:: Descriptions of the decompression functions -* Error codes:: Meaning of codes returned by functions -* Error messages:: Error messages corresponding to error codes -* Data format:: Detailed format of the compressed data -* Examples:: A small tutorial with examples -* Problems:: Reporting bugs -* Concept index:: Index of concepts +* Introduction:: Purpose and features of lzlib +* Library version:: Checking library version +* Buffering:: Sizes of lzlib's buffers +* Parameter limits:: Min / max values for some parameters +* Compression functions:: Descriptions of the compression functions +* Decompression functions:: Descriptions of the decompression functions +* Error codes:: Meaning of codes returned by functions +* Error messages:: Error messages corresponding to error codes +* Invoking minilzip:: Command-line interface of the test program +* File format:: Detailed format of the compressed file +* Examples:: A small tutorial with examples +* Problems:: Reporting bugs +* Concept index:: Index of concepts @end menu @sp 1 -Copyright @copyright{} 2009-2016 Antonio Diaz Diaz. +Copyright @copyright{} 2009-2025 Antonio Diaz Diaz. -This manual is free documentation: you have unlimited permission -to copy, distribute and modify it. +This manual is free documentation: you have unlimited permission to copy, +distribute, and modify it. +@end ifnottex @node Introduction @chapter Introduction @cindex introduction -Lzlib is a data compression library providing in-memory LZMA compression -and decompression functions, including integrity checking of the -decompressed data. The compressed data format used by the library is the -lzip format. Lzlib is written in C. 
+@uref{http://www.nongnu.org/lzip/lzlib.html,,Lzlib} +is a data compression library providing in-memory LZMA compression and +decompression functions, including integrity checking of the decompressed +data. The compressed data format used by the library is the +@uref{http://www.nongnu.org/lzip/lzip.html,,lzip} format. +Lzlib is written in C and is distributed under a 2-clause BSD license. -The lzip file format is designed for data sharing and long-term -archiving, taking into account both data integrity and decoder -availability: +The functions and variables forming the interface of the compression library +are declared in the file @file{lzlib.h}. Usage examples of the library are +given in the files @file{bbexample.c}, @file{ffexample.c}, and +@file{minilzip.c} from the source distribution. -@itemize @bullet -@item -The lzip format provides very safe integrity checking and some data -recovery means. The -@uref{http://www.nongnu.org/lzip/manual/lziprecover_manual.html#Data-safety,,lziprecover} -program can repair bit-flip errors (one of the most common forms of data -corruption) in lzip files, and provides data recovery capabilities, -including error-checked merging of damaged copies of a file. -@ifnothtml -@xref{Data safety,,,lziprecover}. -@end ifnothtml +As @file{lzlib.h} can be used in C and C++ programs, it must not impose a +choice of system headers on the program by including one of them. Therefore +it is the responsibility of the program using lzlib to include before +@file{lzlib.h} some header that declares the type @samp{uint8_t}. There are +at least four such headers in C and C++: @file{stdint.h}, @file{cstdint}, +@file{inttypes.h}, and @file{cinttypes}. -@item -The lzip format is as simple as possible (but not simpler). 
The lzip -manual provides the code of a simple decompressor along with a detailed -explanation of how it works, so that with the only help of the lzip -manual it would be possible for a digital archaeologist to extract the -data from a lzip file long after quantum computers eventually render -LZMA obsolete. - -@item -Additionally the lzip reference implementation is copylefted, which -guarantees that it will remain free forever. -@end itemize - -A nice feature of the lzip format is that a corrupt byte is easier to -repair the nearer it is from the beginning of the file. Therefore, with -the help of lziprecover, losing an entire archive just because of a -corrupt byte near the beginning is a thing of the past. - -The functions and variables forming the interface of the compression -library are declared in the file @samp{lzlib.h}. Usage examples of the -library are given in the files @samp{main.c} and @samp{bbexample.c} from -the source distribution. +All the library functions are thread safe. The library does not install any +signal handler. The decoder checks the consistency of the compressed data, +so the library should never crash even in case of corrupted input. Compression/decompression is done by repeatedly calling a couple of -read/write functions until all the data have been processed by the -library. This interface is safer and less error prone than the -traditional zlib interface. +read/write functions until all the data have been processed by the library. +This interface is safer and less error prone than the traditional zlib +interface. Compression/decompression is done when the read function is called. This -means the value returned by the position functions will not be updated -until a read call, even if a lot of data is written. If you want the -data to be compressed in advance, just call the read function with a -@var{size} equal to 0. 
+means the value returned by the position functions is not updated until a +read call, even if a lot of data are written. If you want the data to be +compressed in advance, just call the read function with a @var{size} equal +to 0. -If all the data to be compressed are written in advance, lzlib will -automatically adjust the header of the compressed data to use the -smallest possible dictionary size. This feature reduces the amount of -memory needed for decompression and allows minilzip to produce identical -compressed output as lzip. +If all the data to be compressed are written in advance, lzlib automatically +adjusts the header of the compressed data to use the largest dictionary size +that does not exceed neither the data size nor the limit given to +@samp{LZ_compress_open}. This feature reduces the amount of memory needed for +decompression and allows minilzip to produce identical compressed output as +lzip. -Lzlib will correctly decompress a data stream which is the concatenation -of two or more compressed data streams. The result is the concatenation -of the corresponding decompressed data streams. Integrity testing of -concatenated compressed data streams is also supported. +Lzlib correctly decompresses a data stream which is the concatenation of +two or more compressed data streams. The result is the concatenation of the +corresponding decompressed data streams. Integrity testing of concatenated +compressed data streams is also supported. -All the library functions are thread safe. The library does not install -any signal handler. The decoder checks the consistency of the compressed -data, so the library should never crash even in case of corrupted input. +Lzlib is able to compress and decompress streams of unlimited size by +automatically creating multimember output. The members so created are large, +about @w{2 PiB} each. 
In spite of its name (Lempel-Ziv-Markov chain-Algorithm), LZMA is not a concrete algorithm; it is more like "any algorithm using the LZMA coding -scheme". For example, the option @samp{-0} of lzip uses the scheme in almost -the simplest way possible; issuing the longest match it can find, or a -literal byte if it can't find a match. Inversely, a much more elaborated -way of finding coding sequences of minimum size than the one currently -used by lzip could be developed, and the resulting sequence could also -be coded using the LZMA coding scheme. +scheme". For example, the option @option{-0} of lzip uses the scheme in +almost the simplest way possible; issuing the longest match it can find, or +a literal byte if it can't find a match. Inversely, a more elaborate way of +finding coding sequences of minimum size than the one currently used by lzip +could be developed, and the resulting sequence could also be coded using the +LZMA coding scheme. -Lzlib currently implements two variants of the LZMA algorithm; fast -(used by option @samp{-0} of minilzip) and normal (used by all other -compression levels). +Lzlib currently implements two variants of the LZMA algorithm: fast (used by +option @option{-0} of minilzip) and normal (used by all other compression levels). The high compression of LZMA comes from combining two basic, well-proven -compression ideas: sliding dictionaries (LZ77/78) and markov models (the -thing used by every compression algorithm that uses a range encoder or -similar order-0 entropy coder as its last stage) with segregation of -contexts according to what the bits are used for. +compression ideas: sliding dictionaries (LZ77) and Markov models (the thing +used by every compression algorithm that uses a range encoder or similar +order-0 entropy coder as its last stage) with segregation of contexts +according to what the bits are used for. 
The ideas embodied in lzlib are due to (at least) the following people: -Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrey Markov (for -the definition of Markov chains), G.N.N. Martin (for the definition of -range encoding), Igor Pavlov (for putting all the above together in -LZMA), and Julian Seward (for bzip2's CLI). +Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrei Markov (for the +definition of Markov chains), G.N.N. Martin (for the definition of range +encoding), Igor Pavlov (for putting all the above together in LZMA), and +Julian Seward (for bzip2's CLI). + +LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may never have +been compressed. Decompressed is used to refer to data which have undergone +the process of decompression. @node Library version @chapter Library version @cindex library version -@deftypefun {const char *} LZ_version ( void ) -Returns the library version as a string. +One goal of lzlib is to keep perfect backward compatibility with older +versions of itself down to 1.0. Any application working with an older lzlib +should work with a newer lzlib. Installing a newer lzlib should not break +anything. This chapter describes the constants and functions that the +application can use to discover the version of the library being used. All +of them are declared in @file{lzlib.h}. + +@defvr Constant LZ_API_VERSION +This constant is defined in @file{lzlib.h} and works as a version test +macro. The application should check at compile time that LZ_API_VERSION is +greater than or equal to the version required by the application: + +@example +#if !defined LZ_API_VERSION || LZ_API_VERSION < 1012 +#error "lzlib 1.12 or newer needed." +#endif +@end example + +Before version 1.8, lzlib didn't define LZ_API_VERSION.@* +LZ_API_VERSION was first defined in lzlib 1.8 to 1.@* +Since lzlib 1.12, LZ_API_VERSION is defined as (major * 1000 + minor). 
+@end defvr + +NOTE: Version test macros are the library's way of announcing functionality +to the application. They should not be confused with feature test macros, +which allow the application to announce to the library its desire to have +certain symbols and prototypes exposed. + +@deftypefun int LZ_api_version ( void ) +If LZ_API_VERSION >= 1012, this function is declared in @file{lzlib.h} (else +it doesn't exist). It returns the LZ_API_VERSION of the library object code +being used. The application should check at run time that the value +returned by @code{LZ_api_version} is greater than or equal to the version +required by the application. An application may be dynamically linked at run +time with a different version of lzlib than the one it was compiled for, and +this should not break the application as long as the library used provides +the functionality required by the application. + +@example +#if defined LZ_API_VERSION && LZ_API_VERSION >= 1012 + if( LZ_api_version() < 1012 ) + show_error( "lzlib 1.12 or newer needed." ); +#endif +@end example @end deftypefun @deftypevr Constant {const char *} LZ_version_string -This constant is defined in the header file @samp{lzlib.h}. +This string constant is defined in the header file @file{lzlib.h} and +represents the version of the library being used at compile time. @end deftypevr -The application should compare LZ_version and LZ_version_string for -consistency. If the first character differs, the library code actually -used may be incompatible with the @samp{lzlib.h} header file used by the -application. - -@example -if( LZ_version()[0] != LZ_version_string[0] ) - error( "bad library version" ); -@end example +@deftypefun {const char *} LZ_version ( void ) +This function returns a string representing the version of the library being +used at run time. +@end deftypefun @node Buffering @@ -190,26 +214,23 @@ dictionary size. Finally, for safety reasons, lzlib uses two more internal buffers. 
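The guaranteed minimum buffer sizes that this chapter goes on to list can be sketched as plain arithmetic. The helper names below are hypothetical and are not part of lzlib.h; the figures are the ones stated in the manual text (twice the dictionary size or 64 KiB, whichever is larger, for the normal variant; 1 MiB for the fast variant; the member's dictionary size or 64 KiB, whichever is larger, for decompressed output).

```c
#include <assert.h>

enum { KiB = 1024, MiB = 1024 * KiB };

/* Hypothetical helpers for illustration only; lzlib.h does not declare them. */

/* Guaranteed minimum size of the input compression buffer.
   'fast' is nonzero when the fast variant of LZMA is in use. */
long long min_compress_in_size( const int dictionary_size, const int fast )
  {
  assert( dictionary_size > 0 );
  if( fast ) return MiB;			/* fast variant: fixed 1 MiB */
  const long long twice = 2LL * dictionary_size;
  return ( twice > 64 * KiB ) ? twice : 64 * KiB;
  }

/* Guaranteed minimum size of the output decompression buffer for a member
   whose header declares 'dictionary_size'. */
long long min_decompress_out_size( const int dictionary_size )
  {
  assert( dictionary_size > 0 );
  return ( dictionary_size > 64 * KiB ) ? dictionary_size : 64 * KiB;
  }
```

An application could use figures like these to size its own staging buffers, but the values returned at run time by the write_size functions remain the authority on how much can be written at any moment.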
-These are the four buffers used by lzlib, and their guaranteed minimum -sizes: +These are the four buffers used by lzlib, and their guaranteed minimum sizes: @itemize @bullet -@item Input compression buffer. Written to by the -@samp{LZ_compress_write} function. For the normal variant of LZMA, its -size is two times the dictionary size set with the -@samp{LZ_compress_open} function or 64 KiB, whichever is larger. For the -fast variant, its size is 1 MiB. +@item Input compression buffer. Written to by the function +@samp{LZ_compress_write}. For the normal variant of LZMA, its size is two +times the dictionary size set with the function @samp{LZ_compress_open} or +@w{64 KiB}, whichever is larger. For the fast variant, its size is @w{1 MiB}. -@item Output compression buffer. Read from by the -@samp{LZ_compress_read} function. Its size is 64 KiB. +@item Output compression buffer. Read from by the function +@samp{LZ_compress_read}. Its size is @w{64 KiB}. -@item Input decompression buffer. Written to by the -@samp{LZ_decompress_write} function. Its size is 64 KiB. +@item Input decompression buffer. Written to by the function +@samp{LZ_decompress_write}. Its size is @w{64 KiB}. -@item Output decompression buffer. Read from by the -@samp{LZ_decompress_read} function. Its size is the dictionary size set -in the header of the member currently being decompressed or 64 KiB, -whichever is larger. +@item Output decompression buffer. Read from by the function +@samp{LZ_decompress_read}. Its size is the dictionary size set in the header +of the member currently being decompressed or @w{64 KiB}, whichever is larger. @end itemize @@ -251,149 +272,176 @@ Returns the largest valid match length limit [273]. These are the functions used to compress data. 
In case of error, all of them return -1 or 0, for signed and unsigned return values respectively, -except @samp{LZ_compress_open} whose return value must be verified by +except @samp{LZ_compress_open} whose return value must be checked by calling @samp{LZ_compress_errno} before using it. -@deftypefun {struct LZ_Encoder *} LZ_compress_open ( const int @var{dictionary_size}, const int @var{match_len_limit}, const unsigned long long @var{member_size} ) +@deftypefun {LZ_Encoder *} LZ_compress_open ( const int @var{dictionary_size}, const int @var{match_len_limit}, const unsigned long long @var{member_size} ) Initializes the internal stream state for compression and returns a pointer that can only be used as the @var{encoder} argument for the other LZ_compress functions, or a null pointer if the encoder could not be allocated. -The returned pointer must be verified by calling -@samp{LZ_compress_errno} before using it. If @samp{LZ_compress_errno} -does not return @samp{LZ_ok}, the returned pointer must not be used and -should be freed with @samp{LZ_compress_close} to avoid memory leaks. +The returned pointer must be checked by calling @samp{LZ_compress_errno} +before using it. If @samp{LZ_compress_errno} does not return @samp{LZ_ok}, +the returned pointer must not be used and should be freed with +@samp{LZ_compress_close} to avoid memory leaks. @var{dictionary_size} sets the dictionary size to be used, in bytes. -Valid values range from 4 KiB to 512 MiB. Note that dictionary sizes are -quantized. If the specified size does not match one of the valid sizes, -it will be rounded upwards by adding up to (@var{dictionary_size} / 8) -to it. +Valid values range from @w{4 KiB} to @w{512 MiB}. Note that dictionary +sizes are quantized. If the size specified does not match one of the +valid sizes, it is rounded upwards by adding up to +@w{(@var{dictionary_size} / 8)} to it. @var{match_len_limit} sets the match length limit in bytes. Valid values -range from 5 to 273. 
Larger values usually give better compression -ratios but longer compression times. +range from 5 to 273. Larger values usually give better compression ratios +but longer compression times. -If @var{dictionary_size} is 65535 and @var{match_len_limit} is 16, the -fast variant of LZMA is chosen, which produces identical compressed -output as @code{lzip -0}. (The dictionary size used will be rounded -upwards to 64 KiB). +If @var{dictionary_size} is 65535 and @var{match_len_limit} is 16, the fast +variant of LZMA is chosen, which produces identical compressed output as +@w{@samp{lzip -0}}. (The dictionary size used is rounded upwards to +@w{64 KiB}). -@var{member_size} sets the member size limit in bytes. Minimum member -size limit is 100 kB. Small member size may degrade compression ratio, so -use it only when needed. To produce a single-member data stream, give -@var{member_size} a value larger than the amount of data to be produced, -for example INT64_MAX. +@anchor{member_size} +@var{member_size} sets the member size limit in bytes. Valid values range +from @w{4 KiB} to @w{2 PiB}. A small member size may degrade compression +ratio, so use it only when needed. To produce a single-member data stream, +give @var{member_size} a value larger than the amount of data to be +produced. Values larger than @w{2 PiB} are reduced to @w{2 PiB} to prevent +the uncompressed size of the member from overflowing. @end deftypefun -@deftypefun int LZ_compress_close ( struct LZ_Encoder * const @var{encoder} ) +@deftypefun int LZ_compress_close ( LZ_Encoder * const @var{encoder} ) Frees all dynamically allocated data structures for this stream. This function discards any unprocessed input and does not flush any pending output. After a call to @samp{LZ_compress_close}, @var{encoder} can no -more be used as an argument to any LZ_compress function. +longer be used as an argument to any LZ_compress function. +It is safe to call @samp{LZ_compress_close} with a null argument. 
@end deftypefun -@deftypefun int LZ_compress_finish ( struct LZ_Encoder * const @var{encoder} ) +@deftypefun int LZ_compress_finish ( LZ_Encoder * const @var{encoder} ) Use this function to tell @samp{lzlib} that all the data for this member -have already been written (with the @samp{LZ_compress_write} function). -After all the produced compressed data have been read with -@samp{LZ_compress_read} and @samp{LZ_compress_member_finished} returns -1, a new member can be started with @samp{LZ_compress_restart_member}. +have already been written (with the function @samp{LZ_compress_write}). +It is safe to call @samp{LZ_compress_finish} as many times as needed. +After all the compressed data have been read with @samp{LZ_compress_read} +and @samp{LZ_compress_member_finished} returns 1, a new member can be +started with @samp{LZ_compress_restart_member}. @end deftypefun -@deftypefun int LZ_compress_restart_member ( struct LZ_Encoder * const @var{encoder}, const unsigned long long @var{member_size} ) -Use this function to start a new member in a multimember data stream. -Call this function only after @samp{LZ_compress_member_finished} -indicates that the current member has been fully read (with the -@samp{LZ_compress_read} function). +@deftypefun int LZ_compress_restart_member ( LZ_Encoder * const @var{encoder}, const unsigned long long @var{member_size} ) +Use this function to start a new member in a multimember data stream. Call +this function only after @samp{LZ_compress_member_finished} indicates that +the current member has been fully read (with the function +@samp{LZ_compress_read}). @xref{member_size}, for a description of +@var{member_size}. @end deftypefun -@deftypefun int LZ_compress_sync_flush ( struct LZ_Encoder * const @var{encoder} ) -Use this function to make available to @samp{LZ_compress_read} all the -data already written with the @samp{LZ_compress_write} function. First -call @samp{LZ_compress_sync_flush}. 
Then call @samp{LZ_compress_read} -until it returns 0. +@anchor{sync_flush} +@deftypefun int LZ_compress_sync_flush ( LZ_Encoder * const @var{encoder} ) +Use this function to make available to @samp{LZ_compress_read} all the data +already written with the function @samp{LZ_compress_write}. First call +@samp{LZ_compress_sync_flush}. Then call @samp{LZ_compress_read} until it +returns 0. + +This function writes at least one LZMA marker @samp{3} ('Sync Flush' marker) +to the compressed output. Note that the sync flush marker is not allowed in +lzip files; it is a device for interactive communication between +applications using lzlib, but is useless and wasteful in a file, and is +excluded from the media type @samp{application/lzip}. The LZMA marker +@samp{2} ('End Of Stream' marker) is the only marker allowed in lzip files. +@xref{File format}. Repeated use of @samp{LZ_compress_sync_flush} may degrade compression -ratio, so use it only when needed. +ratio, so use it only when needed. If the interval between calls to +@samp{LZ_compress_sync_flush} is large (comparable to dictionary size), +creating a multimember data stream with @samp{LZ_compress_restart_member} +may be an alternative. + +Combining multimember stream creation with flushing may be tricky. If there +are more bytes available than those needed to complete @var{member_size}, +@samp{LZ_compress_restart_member} needs to be called when +@samp{LZ_compress_member_finished} returns 1, followed by a new call to +@samp{LZ_compress_sync_flush}. @end deftypefun -@deftypefun int LZ_compress_read ( struct LZ_Encoder * const @var{encoder}, uint8_t * const @var{buffer}, const int @var{size} ) -The @samp{LZ_compress_read} function reads up to @var{size} bytes from -the stream pointed to by @var{encoder}, storing the results in -@var{buffer}. 
+@deftypefun int LZ_compress_read ( LZ_Encoder * const @var{encoder}, uint8_t * const @var{buffer}, const int @var{size} ) +Reads up to @var{size} bytes from the stream pointed to by @var{encoder}, +storing the results in @var{buffer}. If @w{LZ_API_VERSION >= 1012}, +@var{buffer} may be a null pointer, in which case the bytes read are +discarded. -The return value is the number of bytes actually read. This might be -less than @var{size}; for example, if there aren't that many bytes left -in the stream or if more bytes have to be yet written with the -@samp{LZ_compress_write} function. Note that reading less than -@var{size} bytes is not an error. -@end deftypefun - - -@deftypefun int LZ_compress_write ( struct LZ_Encoder * const @var{encoder}, uint8_t * const @var{buffer}, const int @var{size} ) -The @samp{LZ_compress_write} function writes up to @var{size} bytes from -@var{buffer} to the stream pointed to by @var{encoder}. - -The return value is the number of bytes actually written. This might be -less than @var{size}. Note that writing less than @var{size} bytes is +Returns the number of bytes actually read. This might be less than +@var{size}; for example, if there aren't that many bytes left in the stream +or if more bytes have to be yet written with the function +@samp{LZ_compress_write}. Note that reading less than @var{size} bytes is not an error. @end deftypefun -@deftypefun int LZ_compress_write_size ( struct LZ_Encoder * const @var{encoder} ) -The @samp{LZ_compress_write_size} function returns the maximum number of -bytes that can be immediately written through the @samp{LZ_compress_write} -function. +@deftypefun int LZ_compress_write ( LZ_Encoder * const @var{encoder}, uint8_t * const @var{buffer}, const int @var{size} ) +Writes up to @var{size} bytes from @var{buffer} to the stream pointed to by +@var{encoder}. Returns the number of bytes actually written. This might be +less than @var{size}. 
Note that writing less than @var{size} bytes is not an +error. +@end deftypefun + + +@deftypefun int LZ_compress_write_size ( LZ_Encoder * const @var{encoder} ) +Returns the maximum number of bytes that can be immediately written through +@samp{LZ_compress_write}. For efficiency reasons, once the input buffer is +full and @samp{LZ_compress_write_size} returns 0, almost all the buffer must +be compressed before a size greater than 0 is returned again. (This is done +to minimize the amount of data that must be copied to the beginning of the +buffer before new data can be accepted). It is guaranteed that an immediate call to @samp{LZ_compress_write} will accept a @var{size} up to the returned number of bytes. @end deftypefun -@deftypefun {enum LZ_Errno} LZ_compress_errno ( struct LZ_Encoder * const @var{encoder} ) -Returns the current error code for @var{encoder} (@pxref{Error codes}). +@deftypefun {LZ_Errno} LZ_compress_errno ( LZ_Encoder * const @var{encoder} ) +Returns the current error code for @var{encoder}. @xref{Error codes}. +It is safe to call @samp{LZ_compress_errno} with a null argument, in which +case it returns @samp{LZ_bad_argument}. @end deftypefun -@deftypefun int LZ_compress_finished ( struct LZ_Encoder * const @var{encoder} ) +@deftypefun int LZ_compress_finished ( LZ_Encoder * const @var{encoder} ) Returns 1 if all the data have been read and @samp{LZ_compress_close} -can be safely called. Otherwise it returns 0. +can be safely called. Otherwise it returns 0. @samp{LZ_compress_finished} +implies @samp{LZ_compress_member_finished}. @end deftypefun -@deftypefun int LZ_compress_member_finished ( struct LZ_Encoder * const @var{encoder} ) +@deftypefun int LZ_compress_member_finished ( LZ_Encoder * const @var{encoder} ) Returns 1 if the current member, in a multimember data stream, has been fully read and @samp{LZ_compress_restart_member} can be safely called. Otherwise it returns 0. 
@end deftypefun -@deftypefun {unsigned long long} LZ_compress_data_position ( struct LZ_Encoder * const @var{encoder} ) -Returns the number of input bytes already compressed in the current -member. +@deftypefun {unsigned long long} LZ_compress_data_position ( LZ_Encoder * const @var{encoder} ) +Returns the number of input bytes already compressed in the current member. @end deftypefun -@deftypefun {unsigned long long} LZ_compress_member_position ( struct LZ_Encoder * const @var{encoder} ) +@deftypefun {unsigned long long} LZ_compress_member_position ( LZ_Encoder * const @var{encoder} ) Returns the number of compressed bytes already produced, but perhaps not yet read, in the current member. @end deftypefun -@deftypefun {unsigned long long} LZ_compress_total_in_size ( struct LZ_Encoder * const @var{encoder} ) +@deftypefun {unsigned long long} LZ_compress_total_in_size ( LZ_Encoder * const @var{encoder} ) Returns the total number of input bytes already compressed. @end deftypefun -@deftypefun {unsigned long long} LZ_compress_total_out_size ( struct LZ_Encoder * const @var{encoder} ) +@deftypefun {unsigned long long} LZ_compress_total_out_size ( LZ_Encoder * const @var{encoder} ) Returns the total number of compressed bytes already produced, but perhaps not yet read. @end deftypefun @@ -403,149 +451,172 @@ perhaps not yet read. @chapter Decompression functions @cindex decompression functions -These are the functions used to decompress data. In case of error, all -of them return -1 or 0, for signed and unsigned return values -respectively, except @samp{LZ_decompress_open} whose return value must -be verified by calling @samp{LZ_decompress_errno} before using it. +These are the functions used to decompress data. In case of error, all of +them return -1 or 0, for signed and unsigned return values respectively, +except @samp{LZ_decompress_open} whose return value must be checked by +calling @samp{LZ_decompress_errno} before using it. 
-@deftypefun {struct LZ_Decoder *} LZ_decompress_open ( void ) +@deftypefun {LZ_Decoder *} LZ_decompress_open ( void ) Initializes the internal stream state for decompression and returns a -pointer that can only be used as the @var{decoder} argument for the -other LZ_decompress functions, or a null pointer if the decoder could -not be allocated. +pointer that can only be used as the @var{decoder} argument for the other +LZ_decompress functions, or a null pointer if the decoder could not be +allocated. -The returned pointer must be verified by calling -@samp{LZ_decompress_errno} before using it. If -@samp{LZ_decompress_errno} does not return @samp{LZ_ok}, the returned -pointer must not be used and should be freed with +The returned pointer must be checked by calling @samp{LZ_decompress_errno} +before using it. If @samp{LZ_decompress_errno} does not return @samp{LZ_ok}, +the returned pointer must not be used and should be freed with @samp{LZ_decompress_close} to avoid memory leaks. @end deftypefun -@deftypefun int LZ_decompress_close ( struct LZ_Decoder * const @var{decoder} ) +@deftypefun int LZ_decompress_close ( LZ_Decoder * const @var{decoder} ) Frees all dynamically allocated data structures for this stream. This function discards any unprocessed input and does not flush any pending output. After a call to @samp{LZ_decompress_close}, @var{decoder} can no -more be used as an argument to any LZ_decompress function. +longer be used as an argument to any LZ_decompress function. +It is safe to call @samp{LZ_decompress_close} with a null argument. @end deftypefun -@deftypefun int LZ_decompress_finish ( struct LZ_Decoder * const @var{decoder} ) +@deftypefun int LZ_decompress_finish ( LZ_Decoder * const @var{decoder} ) Use this function to tell @samp{lzlib} that all the data for this stream -have already been written (with the @samp{LZ_decompress_write} function). +have already been written (with the function @samp{LZ_decompress_write}). 
+It is safe to call @samp{LZ_decompress_finish} as many times as needed. +It is not required to call @samp{LZ_decompress_finish} if the input stream +only contains whole members, but not calling it prevents lzlib from +detecting a truncated member. @end deftypefun -@deftypefun int LZ_decompress_reset ( struct LZ_Decoder * const @var{decoder} ) +@deftypefun int LZ_decompress_reset ( LZ_Decoder * const @var{decoder} ) Resets the internal state of @var{decoder} as it was just after opening -it with the @samp{LZ_decompress_open} function. Data stored in the -internal buffers is discarded. Position counters are set to 0. +it with the function @samp{LZ_decompress_open}. Data stored in the +internal buffers are discarded. Position counters are set to 0. @end deftypefun -@deftypefun int LZ_decompress_sync_to_member ( struct LZ_Decoder * const @var{decoder} ) -Resets the error state of @var{decoder} and enters a search state that -lasts until a new member header (or the end of the stream) is found. -After a successful call to @samp{LZ_decompress_sync_to_member}, data -written with @samp{LZ_decompress_write} will be consumed and -@samp{LZ_decompress_read} will return 0 until a header is found. +@deftypefun int LZ_decompress_sync_to_member ( LZ_Decoder * const @var{decoder} ) +Resets the error state of @var{decoder} and enters a search state that lasts +until a new member header (or the end of the stream) is found. After a +successful call to @samp{LZ_decompress_sync_to_member}, data written with +@samp{LZ_decompress_write} is consumed and @samp{LZ_decompress_read} returns +0 until a header is found. -This function is useful to discard any data preceding the first member, -or to discard the rest of the current member, for example in case of a -data error. If the decoder is already at the beginning of a member, this -function does nothing. 
+This function is useful to discard any data preceding the first member, or +to discard the rest of the current member, for example in case of a data +error. If the decoder is already at the beginning of a member, this function +does nothing. @end deftypefun -@deftypefun int LZ_decompress_read ( struct LZ_Decoder * const @var{decoder}, uint8_t * const @var{buffer}, const int @var{size} ) -The @samp{LZ_decompress_read} function reads up to @var{size} bytes from -the stream pointed to by @var{decoder}, storing the results in -@var{buffer}. +@deftypefun int LZ_decompress_read ( LZ_Decoder * const @var{decoder}, uint8_t * const @var{buffer}, const int @var{size} ) +Reads up to @var{size} bytes from the stream pointed to by @var{decoder}, +storing the results in @var{buffer}. If @w{LZ_API_VERSION >= 1012}, +@var{buffer} may be a null pointer, in which case the bytes read are +discarded. -The return value is the number of bytes actually read. This might be -less than @var{size}; for example, if there aren't that many bytes left -in the stream or if more bytes have to be yet written with the -@samp{LZ_decompress_write} function. Note that reading less than -@var{size} bytes is not an error. -@end deftypefun - - -@deftypefun int LZ_decompress_write ( struct LZ_Decoder * const @var{decoder}, uint8_t * const @var{buffer}, const int @var{size} ) -The @samp{LZ_decompress_write} function writes up to @var{size} bytes from -@var{buffer} to the stream pointed to by @var{decoder}. - -The return value is the number of bytes actually written. This might be -less than @var{size}. Note that writing less than @var{size} bytes is +Returns the number of bytes actually read. This might be less than +@var{size}; for example, if there aren't that many bytes left in the stream +or if more bytes have to be yet written with the function +@samp{LZ_decompress_write}. Note that reading less than @var{size} bytes is not an error. 
+ +@samp{LZ_decompress_read} returns at least once per member so that +@samp{LZ_decompress_member_finished} can be called (and trailer data +retrieved) for each member, even for empty members. Therefore, +@samp{LZ_decompress_read} returning 0 does not mean that the end of the +stream has been reached. The increase in the value returned by +@samp{LZ_decompress_total_in_size} can be used to tell the end of the stream +from an empty member. + +In case of decompression error caused by corrupt or truncated data, +@samp{LZ_decompress_read} does not signal the error immediately to the +application, but waits until all the bytes decoded have been read. This +allows tools like +@uref{http://www.nongnu.org/lzip/manual/tarlz_manual.html,,tarlz} to +recover as much data as possible from each damaged member. +@ifnothtml +@xref{Top,tarlz manual,,tarlz}. +@end ifnothtml @end deftypefun -@deftypefun int LZ_decompress_write_size ( struct LZ_Decoder * const @var{decoder} ) -The @samp{LZ_decompress_write_size} function returns the maximum number -of bytes that can be immediately written through the -@samp{LZ_decompress_write} function. - -It is guaranteed that an immediate call to @samp{LZ_decompress_write} -will accept a @var{size} up to the returned number of bytes. +@deftypefun int LZ_decompress_write ( LZ_Decoder * const @var{decoder}, uint8_t * const @var{buffer}, const int @var{size} ) +Writes up to @var{size} bytes from @var{buffer} to the stream pointed to by +@var{decoder}. Returns the number of bytes actually written. This might be +less than @var{size}. Note that writing less than @var{size} bytes is not an +error. @end deftypefun -@deftypefun {enum LZ_Errno} LZ_decompress_errno ( struct LZ_Decoder * const @var{decoder} ) -Returns the current error code for @var{decoder} (@pxref{Error codes}). 
+@deftypefun int LZ_decompress_write_size ( LZ_Decoder * const @var{decoder} ) +Returns the maximum number of bytes that can be immediately written through +@samp{LZ_decompress_write}. This number varies smoothly; each compressed +byte consumed may be overwritten immediately, increasing by 1 the value +returned. + +It is guaranteed that an immediate call to @samp{LZ_decompress_write} will +accept a @var{size} up to the returned number of bytes. @end deftypefun -@deftypefun int LZ_decompress_finished ( struct LZ_Decoder * const @var{decoder} ) +@deftypefun {LZ_Errno} LZ_decompress_errno ( LZ_Decoder * const @var{decoder} ) +Returns the current error code for @var{decoder}. @xref{Error codes}. +It is safe to call @samp{LZ_decompress_errno} with a null argument, in which +case it returns @samp{LZ_bad_argument}. +@end deftypefun + + +@deftypefun int LZ_decompress_finished ( LZ_Decoder * const @var{decoder} ) Returns 1 if all the data have been read and @samp{LZ_decompress_close} -can be safely called. Otherwise it returns 0. +can be safely called. Otherwise it returns 0. @samp{LZ_decompress_finished} +does not imply @samp{LZ_decompress_member_finished}. @end deftypefun -@deftypefun int LZ_decompress_member_finished ( struct LZ_Decoder * const @var{decoder} ) -Returns 1 if the previous call to @samp{LZ_decompress_read} finished -reading the current member, indicating that final values for member are -available through @samp{LZ_decompress_data_crc}, -@samp{LZ_decompress_data_position}, and -@samp{LZ_decompress_member_position}. Otherwise it returns 0. +@deftypefun int LZ_decompress_member_finished ( LZ_Decoder * const @var{decoder} ) +Returns 1 if the previous call to @samp{LZ_decompress_read} finished reading +the current member, indicating that final values for the member are available +through @samp{LZ_decompress_data_crc}, @samp{LZ_decompress_data_position}, +and @samp{LZ_decompress_member_position}. Otherwise it returns 0. 
@end deftypefun -@deftypefun int LZ_decompress_member_version ( struct LZ_Decoder * const @var{decoder} ) -Returns the version of current member from member header. +@deftypefun int LZ_decompress_member_version ( LZ_Decoder * const @var{decoder} ) +Returns the version of the current member, read from the member header. @end deftypefun -@deftypefun int LZ_decompress_dictionary_size ( struct LZ_Decoder * const @var{decoder} ) -Returns the dictionary size of current member from member header. +@deftypefun int LZ_decompress_dictionary_size ( LZ_Decoder * const @var{decoder} ) +Returns the dictionary size of the current member, read from the member header. @end deftypefun -@deftypefun {unsigned} LZ_decompress_data_crc ( struct LZ_Decoder * const @var{decoder} ) +@deftypefun {unsigned} LZ_decompress_data_crc ( LZ_Decoder * const @var{decoder} ) Returns the 32 bit Cyclic Redundancy Check of the data decompressed from -the current member. The returned value is valid only when +the current member. The value returned is valid only when @samp{LZ_decompress_member_finished} returns 1. @end deftypefun -@deftypefun {unsigned long long} LZ_decompress_data_position ( struct LZ_Decoder * const @var{decoder} ) +@deftypefun {unsigned long long} LZ_decompress_data_position ( LZ_Decoder * const @var{decoder} ) Returns the number of decompressed bytes already produced, but perhaps not yet read, in the current member. @end deftypefun -@deftypefun {unsigned long long} LZ_decompress_member_position ( struct LZ_Decoder * const @var{decoder} ) -Returns the number of input bytes already decompressed in the current -member. +@deftypefun {unsigned long long} LZ_decompress_member_position ( LZ_Decoder * const @var{decoder} ) +Returns the number of input bytes already decompressed in the current member. 
@end deftypefun -@deftypefun {unsigned long long} LZ_decompress_total_in_size ( struct LZ_Decoder * const @var{decoder} ) +@deftypefun {unsigned long long} LZ_decompress_total_in_size ( LZ_Decoder * const @var{decoder} ) Returns the total number of input bytes already decompressed. @end deftypefun -@deftypefun {unsigned long long} LZ_decompress_total_out_size ( struct LZ_Decoder * const @var{decoder} ) +@deftypefun {unsigned long long} LZ_decompress_total_out_size ( LZ_Decoder * const @var{decoder} ) Returns the total number of decompressed bytes already produced, but perhaps not yet read. @end deftypefun @@ -557,7 +628,7 @@ perhaps not yet read. Most library functions return -1 to indicate that they have failed. But this return value only tells you that an error has occurred. To find out -what kind of error it was, you need to verify the error code by calling +what kind of error it was, you need to check the error code by calling @samp{LZ_(de)compress_errno}. Library functions don't change the value returned by @@ -567,46 +638,49 @@ necessarily LZ_ok, and you should not use @samp{LZ_(de)compress_errno} to determine whether a call failed. If the call failed, then you can examine @samp{LZ_(de)compress_errno}. -The error codes are defined in the header file @samp{lzlib.h}. +The error codes are defined in the header file @file{lzlib.h}. +@samp{LZ_Errno} is an enum type: -@deftypevr Constant {enum LZ_Errno} LZ_ok -The value of this constant is 0 and is used to indicate that there is no -error. +@deftypevr Constant {LZ_Errno} LZ_ok +The value of this constant is 0 and is used to indicate that there is no error. @end deftypevr -@deftypevr Constant {enum LZ_Errno} LZ_bad_argument -At least one of the arguments passed to the library function was -invalid. +@deftypevr Constant {LZ_Errno} LZ_bad_argument +At least one of the arguments passed to the library function was invalid. @end deftypevr -@deftypevr Constant {enum LZ_Errno} LZ_mem_error -No memory available. 
The system cannot allocate more virtual memory -because its capacity is full. +@deftypevr Constant {LZ_Errno} LZ_mem_error +No memory available. The system cannot allocate more virtual memory because +its capacity is full. @end deftypevr -@deftypevr Constant {enum LZ_Errno} LZ_sequence_error +@deftypevr Constant {LZ_Errno} LZ_sequence_error A library function was called in the wrong order. For example @samp{LZ_compress_restart_member} was called before -@samp{LZ_compress_member_finished} indicates that the current member is +@samp{LZ_compress_member_finished} indicated that the current member is finished. @end deftypevr -@deftypevr Constant {enum LZ_Errno} LZ_header_error -An invalid member header (one with the wrong magic bytes) was read. If -this happens at the end of the data stream it may indicate trailing -data. +@deftypevr Constant {LZ_Errno} LZ_header_error +An invalid member header (one with the wrong magic bytes) was read. If this +happens at the end of the data stream it may indicate trailing data. @end deftypevr -@deftypevr Constant {enum LZ_Errno} LZ_unexpected_eof +@deftypevr Constant {LZ_Errno} LZ_unexpected_eof The end of the data stream was reached in the middle of a member. @end deftypevr -@deftypevr Constant {enum LZ_Errno} LZ_data_error -The data stream is corrupt. +@deftypevr Constant {LZ_Errno} LZ_data_error +The data stream is corrupt. If @samp{LZ_decompress_member_position} is 6 or +less, it indicates either a format version not supported, an invalid +dictionary size, a nonzero first LZMA byte, a corrupt header in a multimember +data stream, or trailing data too similar to a valid lzip header. +Lziprecover can be used to repair some of these errors and to remove +conflicting trailing data from a file. @end deftypevr -@deftypevr Constant {enum LZ_Errno} LZ_library_error -A bug was detected in the library. Please, report it (@pxref{Problems}). +@deftypevr Constant {LZ_Errno} LZ_library_error +A bug was detected in the library. 
Please, report it. @xref{Problems}. @end deftypevr @@ -614,27 +688,285 @@ A bug was detected in the library. Please, report it (@pxref{Problems}). @chapter Error messages @cindex error messages -@deftypefun {const char *} LZ_strerror ( const enum LZ_Errno @var{lz_errno} ) -Returns the standard error message for a given error code. The messages -are fairly short; there are no multi-line messages or embedded newlines. -This function makes it easy for your program to report informative error -messages about the failure of a library call. +@deftypefun {const char *} LZ_strerror ( const LZ_Errno @var{lz_errno} ) +Returns the error message corresponding to the error code @var{lz_errno}. +The messages are fairly short; there are no multi-line messages or embedded +newlines. This function makes it easy for your program to report informative +error messages about the failure of a library call. The value of @var{lz_errno} normally comes from a call to @samp{LZ_(de)compress_errno}. @end deftypefun -@node Data format -@chapter Data format -@cindex data format +@node Invoking minilzip +@chapter Invoking minilzip +@cindex invoking +@cindex options + +Minilzip is a test program for the compression library lzlib. Minilzip is +not intended to be installed because lzip has more features, but minilzip is +well tested and you can use it as your main compressor if you so wish. +@ifnothtml +@xref{Top,lzip,,lzip}. +@end ifnothtml + +@uref{http://www.nongnu.org/lzip/lzip.html,,Lzip} +is a lossless data compressor with a user interface similar to the one +of gzip or bzip2. Lzip uses a simplified form of LZMA (Lempel-Ziv-Markov +chain-Algorithm) designed to achieve complete interoperability between +implementations. The maximum dictionary size is 512 MiB so that any lzip +file can be decompressed on 32-bit machines. Lzip provides accurate and +robust 3-factor integrity checking.
@w{@samp{lzip -0}} compresses about as fast as +gzip, while @w{@samp{lzip -9}} compresses most files more than bzip2. Decompression +speed is intermediate between gzip and bzip2. Lzip provides better data +recovery capabilities than gzip and bzip2. Lzip has been designed, written, +and tested with great care to replace gzip and bzip2 as general-purpose +compressed format for Unix-like systems. + +@noindent +The format for running minilzip is: + +@example +minilzip [@var{options}] [@var{files}] +@end example + +@noindent +If no file names are specified, minilzip compresses (or decompresses) from +standard input to standard output. A hyphen @samp{-} used as a @var{file} +argument means standard input. It can be mixed with other @var{files} and is +read just once, the first time it appears in the command line. Remember to +prepend @file{./} to any file name beginning with a hyphen, or use @samp{--}. + +@noindent +minilzip supports the following +@uref{http://www.nongnu.org/lzip/manual/plzip_manual.html#Argument-syntax,,options}: +@ifnothtml +@xref{Argument syntax,,,plzip}. +@end ifnothtml + +@table @code +@item -h +@itemx --help +Print an informative help message describing the options and exit. + +@item -V +@itemx --version +Print the version number of minilzip on the standard output and exit. +This version number should be included in all bug reports. + +@item -a +@itemx --trailing-error +Exit with error status 2 if any remaining input is detected after +decompressing the last member. Such remaining input is usually trailing +garbage that can be safely ignored. + +@item -b @var{bytes} +@itemx --member-size=@var{bytes} +When compressing, set the member size limit to @var{bytes}. If @var{bytes} +is smaller than the compressed size, a multimember file is produced. It is +advisable to keep members smaller than RAM size so that they can be repaired +with lziprecover in case of corruption. A small member size may degrade +compression ratio, so use it only when needed. 
Valid values range from +@w{100 kB} to @w{2 PiB}. Defaults to @w{2 PiB}. + +@item -c +@itemx --stdout +Compress or decompress to standard output; keep input files unchanged. If +compressing several files, each file is compressed independently. (The +output consists of a sequence of independently compressed members). This +option (or @option{-o}) is needed when reading from a named pipe (fifo) or +from a device. Use it also to recover as much of the decompressed data as +possible when decompressing a corrupt file. @option{-c} overrides @option{-o} +and @option{-S}. @option{-c} has no effect when testing. + +@item -d +@itemx --decompress +Decompress the files specified. The integrity of the files specified is +checked. If a file does not exist, can't be opened, or the destination file +already exists and @option{--force} has not been specified, minilzip continues +decompressing the rest of the files and exits with error status 1. If a file +fails to decompress, or is a terminal, minilzip exits immediately with error +status 2 without decompressing the rest of the files. A terminal is +considered an uncompressed file, and therefore invalid. A multimember file +with one or more empty members is accepted if redirected to standard input. + +@item -f +@itemx --force +Force overwrite of output files. + +@item -F +@itemx --recompress +When compressing, force re-compression of files whose name already has +the @file{.lz} or @file{.tlz} suffix. + +@item -k +@itemx --keep +Keep (don't delete) input files during compression or decompression. + +@item -m @var{bytes} +@itemx --match-length=@var{bytes} +When compressing, set the match length limit in bytes. After a match this +long is found, the search is finished. Valid values range from 5 to 273. +Larger values usually give better compression ratios but longer compression +times. 
+ +@item -o @var{file} +@itemx --output=@var{file} +If @option{-c} has not also been specified, write the (de)compressed output +to @var{file}; keep input files unchanged. If compressing several files, +each file is compressed independently. (The output consists of a sequence of +independently compressed members). This option (or @option{-c}) is needed +when reading from a named pipe (fifo) or from a device. @w{@option{-o -}} is +equivalent to @option{-c}. @option{-o} has no effect when testing. + +When compressing and splitting the output into volumes, @var{file} is used as +a prefix, and several files named @file{@var{file}00001.lz}, +@file{@var{file}00002.lz}, etc, are created. In this case, only one input +file is allowed. + +@item -q +@itemx --quiet +Quiet operation. Suppress all messages. + +@item -s @var{bytes} +@itemx --dictionary-size=@var{bytes} +When compressing, set the dictionary size limit in bytes. Minilzip uses for +each file the largest dictionary size that exceeds neither the file +size nor this limit. Valid values range from @w{4 KiB} to @w{512 MiB}. +Values 12 to 29 are interpreted as powers of two, meaning 2^12 to 2^29 +bytes. Dictionary sizes are quantized so that they can be coded in just one +byte (@pxref{coded-dict-size}). If the size specified does not match one of +the valid sizes, it is rounded upwards by adding up to @w{(@var{bytes} / 8)} +to it. + +For maximum compression you should use a dictionary size limit as large +as possible, but keep in mind that the decompression memory requirement +is affected at compression time by the choice of dictionary size limit. +The dictionary size used for decompression is the same dictionary size used +for compression.
+ +@item -S @var{bytes} +@itemx --volume-size=@var{bytes} +When compressing, and @option{-c} has not been also specified, split the +compressed output into several volume files with names +@file{original_name00001.lz}, @file{original_name00002.lz}, etc, and set the +volume size limit to @var{bytes}. Input files are kept unchanged. Each +volume is a complete, maybe multimember, lzip file. A small volume size may +degrade compression ratio, so use it only when needed. Valid values range +from @w{100 kB} to @w{4 EiB}. + +@item -t +@itemx --test +Check integrity of the files specified, but don't decompress them. This +really performs a trial decompression and throws away the result. Use it +together with @option{-v} to see information about the files. If a file +fails the test, does not exist, can't be opened, or is a terminal, minilzip +continues testing the rest of the files. A final diagnostic is shown at +verbosity level 1 or higher if any file fails the test when testing multiple +files. A multimember file with one or more empty members is accepted if +redirected to standard input. + +@item -v +@itemx --verbose +Verbose mode.@* +When compressing, show the compression ratio and size for each file processed.@* +When decompressing or testing, further -v's (up to 4) increase the verbosity +level, showing status, compression ratio, dictionary size, and trailer +contents (CRC, data size, member size). + +@item -0 .. -9 +Compression level. Set the compression parameters (dictionary size and +match length limit) as shown in the table below. The default compression +level is @option{-6}, equivalent to @w{@option{-s8MiB -m36}}. Note that +@option{-9} can be much slower than @option{-0}. These options have no +effect when decompressing or testing. + +The bidimensional parameter space of LZMA can't be mapped to a linear scale +optimal for all files. 
If your files are large, very repetitive, etc, you +may need to use the options @option{--dictionary-size} and +@option{--match-length} directly to achieve optimal performance. + +If several compression levels or @option{-s} or @option{-m} options are +given, the last setting is used. For example, @w{@option{-9 -s64MiB}} is +equivalent to @w{@option{-s64MiB -m273}}. + +@multitable {Level} {Dictionary size (-s)} {Match length limit (-m)} +@headitem Level @tab Dictionary size (-s) @tab Match length limit (-m) +@item -0 @tab 64 KiB @tab 16 bytes +@item -1 @tab 1 MiB @tab 5 bytes +@item -2 @tab 1.5 MiB @tab 6 bytes +@item -3 @tab 2 MiB @tab 8 bytes +@item -4 @tab 3 MiB @tab 12 bytes +@item -5 @tab 4 MiB @tab 20 bytes +@item -6 @tab 8 MiB @tab 36 bytes +@item -7 @tab 16 MiB @tab 68 bytes +@item -8 @tab 24 MiB @tab 132 bytes +@item -9 @tab 32 MiB @tab 273 bytes +@end multitable + +@item --fast +@itemx --best +Aliases for GNU gzip compatibility. + +@item --loose-trailing +When decompressing or testing, allow trailing data whose first bytes are +so similar to the magic bytes of a lzip header that they can be confused +with a corrupt header. Use this option if a file triggers a 'corrupt +header' error and the cause is not actually a corrupt header. + +@item --check-lib +Compare the @uref{#Library-version,,version of lzlib} used to compile +minilzip with the version actually being used at run time and exit. Report +any differences found. Exit with error status 1 if differences are found. A +mismatch may indicate that lzlib is not correctly installed or that a +different version of lzlib has been installed after compiling the shared +version of minilzip. Exit with error status 2 if LZ_API_VERSION and +LZ_version_string don't match. @w{@samp{minilzip -v --check-lib}} shows the +version of lzlib being used and the value of LZ_API_VERSION (if defined). +@ifnothtml +@xref{Library version}.
+@end ifnothtml + +@end table + +Numbers given as arguments to options may be expressed in decimal, +hexadecimal, or octal (using the same syntax as integer constants in C++), +and may be followed by a multiplier and an optional @samp{B} for "byte". + +Table of SI and binary prefixes (unit multipliers): + +@multitable {Prefix} {kilobyte (10^3 = 1000)} {|} {Prefix} {kibibyte (2^10 = 1024)} +@headitem Prefix @tab Value @tab | @tab Prefix @tab Value +@item k @tab kilobyte (10^3 = 1000) @tab | @tab Ki @tab kibibyte (2^10 = 1024) +@item M @tab megabyte (10^6) @tab | @tab Mi @tab mebibyte (2^20) +@item G @tab gigabyte (10^9) @tab | @tab Gi @tab gibibyte (2^30) +@item T @tab terabyte (10^12) @tab | @tab Ti @tab tebibyte (2^40) +@item P @tab petabyte (10^15) @tab | @tab Pi @tab pebibyte (2^50) +@item E @tab exabyte (10^18) @tab | @tab Ei @tab exbibyte (2^60) +@item Z @tab zettabyte (10^21) @tab | @tab Zi @tab zebibyte (2^70) +@item Y @tab yottabyte (10^24) @tab | @tab Yi @tab yobibyte (2^80) +@item R @tab ronnabyte (10^27) @tab | @tab Ri @tab robibyte (2^90) +@item Q @tab quettabyte (10^30) @tab | @tab Qi @tab quebibyte (2^100) +@end multitable + +@sp 1 +Exit status: 0 for a normal exit, 1 for environmental problems +(file not found, invalid command-line options, I/O errors, etc), 2 to +indicate a corrupt or invalid input file, 3 for an internal consistency +error (e.g., bug) which caused minilzip to panic. + + +@node File format +@chapter File format +@cindex file format Perfection is reached, not when there is no longer anything to add, but when there is no longer anything to take away.@* --- Antoine de Saint-Exupery -@sp 1 In the diagram below, a box like this: + @verbatim +---+ | | <-- the vertical bars might be missing @@ -642,6 +974,7 @@ In the diagram below, a box like this: @end verbatim represents one byte; a box like this: + @verbatim +==============+ | | @@ -650,12 +983,16 @@ represents one byte; a box like this: represents a variable number of bytes. 
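The coded dictionary size (DS) byte of the member header, covered in this chapter, can be decoded with a few bit operations. The following is a minimal sketch, not code from lzlib: the name `decode_dict_size` is hypothetical, and returning 0 for any byte whose decoded size falls outside the documented 4 KiB to 512 MiB range is an assumption of this sketch (a real decoder may treat such bytes differently).

```c
#include <stdint.h>

/* Hypothetical helper, not part of the lzlib API.
   Decodes the coded dictionary size byte (DS) of a lzip member header.
   Returns the dictionary size in bytes, or 0 if the decoded value is
   outside the documented range of 4 KiB (2^12) to 512 MiB (2^29). */
unsigned long decode_dict_size( const uint8_t ds )
  {
  /* Bits 4-0: base 2 logarithm of the base size (12 to 29). */
  const unsigned base_bits = ds & 0x1F;
  /* Bits 7-5: numerator (0 to 7) of the fraction to subtract. */
  const unsigned numerator = ( ds >> 5 ) & 0x07;

  if( base_bits < 12u || base_bits > 29u ) return 0;

  const unsigned long base_size = 1UL << base_bits;
  /* Subtract numerator/16 of the base size. */
  const unsigned long size = base_size - numerator * ( base_size / 16 );

  /* Assumption of this sketch: reject sizes below the 4 KiB minimum. */
  if( size < ( 1UL << 12 ) ) return 0;
  return size;
  }
```

With the value used in the manual's own example, `decode_dict_size( 0xD3 )` gives 2^19 - 6 * 2^15 = 327680 bytes (320 KiB).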
-@sp 1 -A lzip data stream consists of a series of "members" (compressed data -sets). The members simply appear one after another in the data stream, -with no additional information before, between, or after them. +@noindent +A lzip file consists of one or more independent "members" (compressed data +sets). The members simply appear one after another in the file, with no +additional information before, between, or after them. Each member can +encode in compressed form up to @w{16 EiB - 1 byte} of uncompressed data. +The size of a multimember file is unlimited. Empty members (data size = 0) +are not allowed in multimember files. Each member has the following structure: + @verbatim +--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ID string | VN | DS | LZMA stream | CRC32 | Data size | Member size | @@ -672,19 +1009,19 @@ A four byte string, identifying the lzip format, with the value "LZIP" @item VN (version number, 1 byte) Just in case something needs to be modified in the future. 1 for now. +@anchor{coded-dict-size} @item DS (coded dictionary size, 1 byte) The dictionary size is calculated by taking a power of 2 (the base size) -and substracting from it a fraction between 0/16 and 7/16 of the base -size.@* +and subtracting from it a fraction between 0/16 and 7/16 of the base size.@* Bits 4-0 contain the base 2 logarithm of the base size (12 to 29).@* -Bits 7-5 contain the numerator of the fraction (0 to 7) to substract +Bits 7-5 contain the numerator of the fraction (0 to 7) to subtract from the base size to obtain the dictionary size.@* Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB@* Valid values for dictionary size range from 4 KiB to 512 MiB. @item LZMA stream -The LZMA stream, finished by an end of stream marker. Uses default -values for encoder properties. +The LZMA stream, terminated by an 'End Of Stream' marker. Uses default values +for encoder properties. 
@ifnothtml @xref{Stream format,,,lzip}, @end ifnothtml @@ -693,19 +1030,21 @@ See @uref{http://www.nongnu.org/lzip/manual/lzip_manual.html#Stream-format,,Stream format} @end ifhtml for a complete description.@* -Lzip only uses the LZMA marker @samp{2} ("End Of Stream" marker). Lzlib -also uses the LZMA marker @samp{3} ("Sync Flush" marker). +Lzip only uses the LZMA marker @samp{2} ('End Of Stream' marker). Lzlib +also uses the LZMA marker @samp{3} ('Sync Flush' marker). @xref{sync_flush}. @item CRC32 (4 bytes) -CRC of the uncompressed original data. +Cyclic Redundancy Check (CRC) of the original uncompressed data. @item Data size (8 bytes) -Size of the uncompressed original data. +Size of the original uncompressed data. @item Member size (8 bytes) Total size of the member, including header and trailer. This field acts -as a distributed index, allows the verification of stream integrity, and -facilitates safe recovery of undamaged members from multimember files. +as a distributed index, improves the checking of stream integrity, and +facilitates the safe recovery of undamaged members from multimember files. +Lzip limits the member size to @w{2 PiB} to prevent the data size field from +overflowing. @end table @@ -714,142 +1053,312 @@ facilitates safe recovery of undamaged members from multimember files. @chapter A small tutorial with examples @cindex examples -This chapter shows the order in which the library functions should be -called depending on what kind of data stream you want to compress or -decompress. See the file @samp{bbexample.c} in the source distribution -for an example of how buffer-to-buffer compression/decompression can be -implemented using lzlib. +This chapter provides real code examples for the most common uses of the +library. See these examples in context in the files @file{bbexample.c} and +@file{ffexample.c} from the source distribution of lzlib. -Note that lzlib's interface is symmetrical. 
That is, the code for normal -compression and decompression is identical except because one calls +Note that the interface of lzlib is symmetrical. That is, the code for +normal compression and decompression is identical except because one calls LZ_compress* functions while the other calls LZ_decompress* functions. -@sp 1 -@noindent -Example 1: Normal compression (@var{member_size} > total output). +@menu +* Buffer compression:: Buffer-to-buffer single-member compression +* Buffer decompression:: Buffer-to-buffer decompression +* File compression:: File-to-file single-member compression +* File decompression:: File-to-file decompression +* File compression mm:: File-to-file multimember compression +* Skipping data errors:: Decompression with automatic resynchronization +@end menu -@example -1) LZ_compress_open -2) LZ_compress_write -3) LZ_compress_read -4) go back to step 2 until all input data have been written -5) LZ_compress_finish -6) LZ_compress_read -7) go back to step 6 until LZ_compress_finished returns 1 -8) LZ_compress_close -@end example + +@node Buffer compression +@section Buffer compression +@cindex buffer compression + +Buffer-to-buffer single-member compression +@w{(@var{member_size} > total output)}. + +@verbatim +/* Compress 'insize' bytes from 'inbuf' to 'outbuf'. + Return the size of the compressed data in '*outlenp'. + In case of error, or if 'outsize' is too small, return false and do not + modify '*outlenp'. 
+*/ +bool bbcompress( const uint8_t * const inbuf, const int insize, + const int dictionary_size, const int match_len_limit, + uint8_t * const outbuf, const int outsize, + int * const outlenp ) + { + int inpos = 0, outpos = 0; + bool error = false; + LZ_Encoder * const encoder = + LZ_compress_open( dictionary_size, match_len_limit, INT64_MAX ); + if( !encoder || LZ_compress_errno( encoder ) != LZ_ok ) + { LZ_compress_close( encoder ); return false; } + + while( true ) + { + int ret = LZ_compress_write( encoder, inbuf + inpos, insize - inpos ); + if( ret < 0 ) { error = true; break; } + inpos += ret; + if( inpos >= insize ) LZ_compress_finish( encoder ); + ret = LZ_compress_read( encoder, outbuf + outpos, outsize - outpos ); + if( ret < 0 ) { error = true; break; } + outpos += ret; + if( LZ_compress_finished( encoder ) == 1 ) break; + if( outpos >= outsize ) { error = true; break; } + } + + if( LZ_compress_close( encoder ) < 0 ) error = true; + if( error ) return false; + *outlenp = outpos; + return true; + } +@end verbatim + + +@node Buffer decompression +@section Buffer decompression +@cindex buffer decompression + +Buffer-to-buffer decompression. + +@verbatim +/* Decompress 'insize' bytes from 'inbuf' to 'outbuf'. + Return the size of the decompressed data in '*outlenp'. + In case of error, or if 'outsize' is too small, return false and do not + modify '*outlenp'. 
+*/ +bool bbdecompress( const uint8_t * const inbuf, const int insize, + uint8_t * const outbuf, const int outsize, + int * const outlenp ) + { + int inpos = 0, outpos = 0; + bool error = false; + LZ_Decoder * const decoder = LZ_decompress_open(); + if( !decoder || LZ_decompress_errno( decoder ) != LZ_ok ) + { LZ_decompress_close( decoder ); return false; } + + while( true ) + { + int ret = LZ_decompress_write( decoder, inbuf + inpos, insize - inpos ); + if( ret < 0 ) { error = true; break; } + inpos += ret; + if( inpos >= insize ) LZ_decompress_finish( decoder ); + ret = LZ_decompress_read( decoder, outbuf + outpos, outsize - outpos ); + if( ret < 0 ) { error = true; break; } + outpos += ret; + if( LZ_decompress_finished( decoder ) == 1 ) break; + if( outpos >= outsize ) { error = true; break; } + } + + if( LZ_decompress_close( decoder ) < 0 ) error = true; + if( error ) return false; + *outlenp = outpos; + return true; + } +@end verbatim + + +@node File compression +@section File compression +@cindex file compression + +File-to-file compression using LZ_compress_write_size. + +@verbatim +int ffcompress( LZ_Encoder * const encoder, + FILE * const infile, FILE * const outfile ) + { + enum { buffer_size = 16384 }; + uint8_t buffer[buffer_size]; + while( true ) + { + int len, ret; + int size = min( buffer_size, LZ_compress_write_size( encoder ) ); + if( size > 0 ) + { + len = fread( buffer, 1, size, infile ); + ret = LZ_compress_write( encoder, buffer, len ); + if( ret < 0 || ferror( infile ) ) break; + if( feof( infile ) ) LZ_compress_finish( encoder ); + } + ret = LZ_compress_read( encoder, buffer, buffer_size ); + if( ret < 0 ) break; + len = fwrite( buffer, 1, ret, outfile ); + if( len < ret ) break; + if( LZ_compress_finished( encoder ) == 1 ) return 0; + } + return 1; + } +@end verbatim + + +@node File decompression +@section File decompression +@cindex file decompression + +File-to-file decompression using LZ_decompress_write_size. 
+ +@verbatim +int ffdecompress( LZ_Decoder * const decoder, + FILE * const infile, FILE * const outfile ) + { + enum { buffer_size = 16384 }; + uint8_t buffer[buffer_size]; + while( true ) + { + int len, ret; + int size = min( buffer_size, LZ_decompress_write_size( decoder ) ); + if( size > 0 ) + { + len = fread( buffer, 1, size, infile ); + ret = LZ_decompress_write( decoder, buffer, len ); + if( ret < 0 || ferror( infile ) ) break; + if( feof( infile ) ) LZ_decompress_finish( decoder ); + } + ret = LZ_decompress_read( decoder, buffer, buffer_size ); + if( ret < 0 ) break; + len = fwrite( buffer, 1, ret, outfile ); + if( len < ret ) break; + if( LZ_decompress_finished( decoder ) == 1 ) return 0; + } + return 1; + } +@end verbatim + + +@node File compression mm +@section File-to-file multimember compression +@cindex multimember compression + +Example 1: Multimember compression with members of fixed size +@w{(@var{member_size} < total output)}. + +@verbatim +int ffmmcompress( FILE * const infile, FILE * const outfile ) + { + enum { buffer_size = 16384, member_size = 4096 }; + uint8_t buffer[buffer_size]; + bool done = false; + LZ_Encoder * const encoder = LZ_compress_open( 65535, 16, member_size ); + if( !encoder || LZ_compress_errno( encoder ) != LZ_ok ) + { fputs( "ffexample: Not enough memory.\n", stderr ); + LZ_compress_close( encoder ); return 1; } + while( true ) + { + int len, ret; + int size = min( buffer_size, LZ_compress_write_size( encoder ) ); + if( size > 0 ) + { + len = fread( buffer, 1, size, infile ); + ret = LZ_compress_write( encoder, buffer, len ); + if( ret < 0 || ferror( infile ) ) break; + if( feof( infile ) ) LZ_compress_finish( encoder ); + } + ret = LZ_compress_read( encoder, buffer, buffer_size ); + if( ret < 0 ) break; + len = fwrite( buffer, 1, ret, outfile ); + if( len < ret ) break; + if( LZ_compress_member_finished( encoder ) == 1 ) + { + if( LZ_compress_finished( encoder ) == 1 ) { done = true; break; } + if( 
LZ_compress_restart_member( encoder, member_size ) < 0 ) break; + } + } + if( LZ_compress_close( encoder ) < 0 ) done = false; + return done; + } +@end verbatim @sp 1 @noindent -Example 2: Normal compression using LZ_compress_write_size. +Example 2: Multimember compression (user-restarted members). +(Call LZ_compress_open with @var{member_size} > largest member). -@example -1) LZ_compress_open -2) go to step 5 if LZ_compress_write_size returns 0 -3) LZ_compress_write -4) if no more data to write, call LZ_compress_finish -5) LZ_compress_read -6) go back to step 2 until LZ_compress_finished returns 1 -7) LZ_compress_close -@end example +@verbatim +/* Compress 'infile' to 'outfile' as a multimember stream with one member + for each line of text terminated by a newline character or by EOF. + Return 0 if success, 1 if error. +*/ +int fflfcompress( LZ_Encoder * const encoder, + FILE * const infile, FILE * const outfile ) + { + enum { buffer_size = 16384 }; + uint8_t buffer[buffer_size]; + while( true ) + { + int len, ret; + int size = min( buffer_size, LZ_compress_write_size( encoder ) ); + if( size > 0 ) + { + for( len = 0; len < size; ) + { + int ch = getc( infile ); + if( ch == EOF || ( buffer[len++] = ch ) == '\n' ) break; + } + /* avoid writing an empty member to outfile */ + if( len == 0 && LZ_compress_data_position( encoder ) == 0 ) return 0; + ret = LZ_compress_write( encoder, buffer, len ); + if( ret < 0 || ferror( infile ) ) break; + if( feof( infile ) || buffer[len-1] == '\n' ) + LZ_compress_finish( encoder ); + } + ret = LZ_compress_read( encoder, buffer, buffer_size ); + if( ret < 0 ) break; + len = fwrite( buffer, 1, ret, outfile ); + if( len < ret ) break; + if( LZ_compress_member_finished( encoder ) == 1 ) + { + if( feof( infile ) && LZ_compress_finished( encoder ) == 1 ) return 0; + if( LZ_compress_restart_member( encoder, INT64_MAX ) < 0 ) break; + } + } + return 1; + } +@end verbatim -@sp 1 -@noindent -Example 3: Decompression. 
-@example -1) LZ_decompress_open -2) LZ_decompress_write -3) LZ_decompress_read -4) go back to step 2 until all input data have been written -5) LZ_decompress_finish -6) LZ_decompress_read -7) go back to step 6 until LZ_decompress_finished returns 1 -8) LZ_decompress_close -@end example +@node Skipping data errors +@section Skipping data errors +@cindex skipping data errors -@sp 1 -@noindent -Example 4: Decompression using LZ_decompress_write_size. - -@example -1) LZ_decompress_open -2) go to step 5 if LZ_decompress_write_size returns 0 -3) LZ_decompress_write -4) if no more data to write, call LZ_decompress_finish -5) LZ_decompress_read -5a) optionally, if LZ_decompress_member_finished returns 1, read - final values for member with LZ_decompress_data_crc, etc. -6) go back to step 2 until LZ_decompress_finished returns 1 -7) LZ_decompress_close -@end example - -@sp 1 -@noindent -Example 5: Multimember compression (@var{member_size} < total output). - -@example - 1) LZ_compress_open - 2) go to step 5 if LZ_compress_write_size returns 0 - 3) LZ_compress_write - 4) if no more data to write, call LZ_compress_finish - 5) LZ_compress_read - 6) go back to step 2 until LZ_compress_member_finished returns 1 - 7) go to step 10 if LZ_compress_finished() returns 1 - 8) LZ_compress_restart_member - 9) go back to step 2 -10) LZ_compress_close -@end example - -@sp 1 -@noindent -Example 6: Multimember compression (user-restarted members). - -@example - 1) LZ_compress_open - 2) LZ_compress_write - 3) LZ_compress_read - 4) go back to step 2 until member termination is desired - 5) LZ_compress_finish - 6) LZ_compress_read - 7) go back to step 6 until LZ_compress_member_finished returns 1 - 8) verify that LZ_compress_finished returns 1 - 9) go to step 12 if all input data have been written -10) LZ_compress_restart_member -11) go back to step 2 -12) LZ_compress_close -@end example - -@sp 1 -@noindent -Example 7: Decompression with automatic removal of leading data. 
- -@example -1) LZ_decompress_open -2) LZ_decompress_sync_to_member -3) go to step 6 if LZ_decompress_write_size returns 0 -4) LZ_decompress_write -5) if no more data to write, call LZ_decompress_finish -6) LZ_decompress_read -7) go back to step 3 until LZ_decompress_finished returns 1 -8) LZ_decompress_close -@end example - -@sp 1 -@noindent -Example 8: Streamed decompression with automatic resynchronization to -next member in case of data error. - -@example -1) LZ_decompress_open -2) go to step 5 if LZ_decompress_write_size returns 0 -3) LZ_decompress_write -4) if no more data to write, call LZ_decompress_finish -5) if LZ_decompress_read produces LZ_header_error or LZ_data_error, - call LZ_decompress_sync_to_member -6) go back to step 2 until LZ_decompress_finished returns 1 -7) LZ_decompress_close -@end example +@verbatim +/* Decompress 'infile' to 'outfile' with automatic resynchronization to + next member in case of data error, including the automatic removal of + leading garbage. +*/ +int ffrsdecompress( LZ_Decoder * const decoder, + FILE * const infile, FILE * const outfile ) + { + enum { buffer_size = 16384 }; + uint8_t buffer[buffer_size]; + while( true ) + { + int len, ret; + int size = min( buffer_size, LZ_decompress_write_size( decoder ) ); + if( size > 0 ) + { + len = fread( buffer, 1, size, infile ); + ret = LZ_decompress_write( decoder, buffer, len ); + if( ret < 0 || ferror( infile ) ) break; + if( feof( infile ) ) LZ_decompress_finish( decoder ); + } + ret = LZ_decompress_read( decoder, buffer, buffer_size ); + if( ret < 0 ) + { + if( LZ_decompress_errno( decoder ) == LZ_header_error || + LZ_decompress_errno( decoder ) == LZ_data_error ) + { LZ_decompress_sync_to_member( decoder ); continue; } + break; + } + len = fwrite( buffer, 1, ret, outfile ); + if( len < ret ) break; + if( LZ_decompress_finished( decoder ) == 1 ) return 0; + } + return 1; + } +@end verbatim @node Problems @@ -864,8 +1373,8 @@ for all eternity, if not longer. 
If you find a bug in lzlib, please send electronic mail to @email{lzip-bug@@nongnu.org}. Include the version number, which you can -find by running @w{@code{minilzip --version}} or in -@samp{LZ_version_string} from @samp{lzlib.h}. +find by running @w{@samp{minilzip --version}} and +@w{@samp{minilzip -v --check-lib}}. @node Concept index diff --git a/doc/minilzip.1 b/doc/minilzip.1 index b5ebf78..e375123 100644 --- a/doc/minilzip.1 +++ b/doc/minilzip.1 @@ -1,12 +1,26 @@ -.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.46.1. -.TH MINILZIP "1" "May 2016" "minilzip 1.8" "User Commands" +.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.49.2. +.TH MINILZIP "1" "January 2025" "minilzip 1.15" "User Commands" .SH NAME minilzip \- reduces the size of files .SH SYNOPSIS .B minilzip [\fI\,options\/\fR] [\fI\,files\/\fR] .SH DESCRIPTION -Minilzip \- Test program for the lzlib library. +Minilzip is a test program for the compression library lzlib. Minilzip is +not intended to be installed because lzip has more features, but minilzip is +well tested and you can use it as your main compressor if so you wish. +.PP +Lzip is a lossless data compressor with a user interface similar to the one +of gzip or bzip2. Lzip uses a simplified form of LZMA (Lempel\-Ziv\-Markov +chain\-Algorithm) designed to achieve complete interoperability between +implementations. The maximum dictionary size is 512 MiB so that any lzip +file can be decompressed on 32\-bit machines. Lzip provides accurate and +robust 3\-factor integrity checking. 'lzip \fB\-0\fR' compresses about as fast as +gzip, while 'lzip \fB\-9\fR' compresses most files more than bzip2. Decompression +speed is intermediate between gzip and bzip2. Lzip provides better data +recovery capabilities than gzip and bzip2. Lzip has been designed, written, +and tested with great care to replace gzip and bzip2 as general\-purpose +compressed format for Unix\-like systems. 
.SH OPTIONS .TP \fB\-h\fR, \fB\-\-help\fR @@ -19,13 +33,13 @@ output version information and exit exit with error status if trailing data .TP \fB\-b\fR, \fB\-\-member\-size=\fR<bytes> -set member size limit in bytes +set member size limit of multimember files .TP \fB\-c\fR, \fB\-\-stdout\fR write to standard output, keep input files .TP \fB\-d\fR, \fB\-\-decompress\fR -decompress +decompress, test compressed file integrity .TP \fB\-f\fR, \fB\-\-force\fR overwrite existing output files @@ -40,7 +54,7 @@ keep (don't delete) input files set match length limit in bytes [36] .TP \fB\-o\fR, \fB\-\-output=\fR<file> -if reading standard input, write to <file> +write to <file>, keep input files .TP \fB\-q\fR, \fB\-\-quiet\fR suppress all messages @@ -65,31 +79,59 @@ alias for \fB\-0\fR .TP \fB\-\-best\fR alias for \fB\-9\fR +.TP +\fB\-\-loose\-trailing\fR +allow trailing data seeming corrupt header +.TP +\fB\-\-check\-lib\fR +compare version of lzlib.h with liblz.{a,so} .PP If no file names are given, or if a file is '\-', minilzip compresses or decompresses from standard input to standard output. Numbers may be followed by a multiplier: k = kB = 10^3 = 1000, Ki = KiB = 2^10 = 1024, M = 10^6, Mi = 2^20, G = 10^9, Gi = 2^30, etc... -Dictionary sizes 12 to 29 are interpreted as powers of two, meaning 2^12 -to 2^29 bytes. +Dictionary sizes 12 to 29 are interpreted as powers of two, meaning 2^12 to +2^29 bytes. .PP -The bidimensional parameter space of LZMA can't be mapped to a linear -scale optimal for all files. If your files are large, very repetitive, -etc, you may need to use the \fB\-\-dictionary\-size\fR and \fB\-\-match\-length\fR -options directly to achieve optimal performance. +The bidimensional parameter space of LZMA can't be mapped to a linear scale +optimal for all files. If your files are large, very repetitive, etc, you +may need to use the options \fB\-\-dictionary\-size\fR and \fB\-\-match\-length\fR directly +to achieve optimal performance.
.PP -Exit status: 0 for a normal exit, 1 for environmental problems (file -not found, invalid flags, I/O errors, etc), 2 to indicate a corrupt or -invalid input file, 3 for an internal consistency error (eg, bug) which -caused minilzip to panic. +To extract all the files from archive 'foo.tar.lz', use the commands +\&'tar \fB\-xf\fR foo.tar.lz' or 'minilzip \fB\-cd\fR foo.tar.lz | tar \fB\-xf\fR \-'. +.PP +Exit status: 0 for a normal exit, 1 for environmental problems +(file not found, invalid command\-line options, I/O errors, etc), 2 to +indicate a corrupt or invalid input file, 3 for an internal consistency +error (e.g., bug) which caused minilzip to panic. +.PP +The ideas embodied in lzlib are due to (at least) the following people: +Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrei Markov (for the +definition of Markov chains), G.N.N. Martin (for the definition of range +encoding), Igor Pavlov (for putting all the above together in LZMA), and +Julian Seward (for bzip2's CLI). .SH "REPORTING BUGS" Report bugs to lzip\-bug@nongnu.org .br Lzlib home page: http://www.nongnu.org/lzip/lzlib.html .SH COPYRIGHT -Copyright \(co 2016 Antonio Diaz Diaz. -Using lzlib 1.8 +Copyright \(co 2025 Antonio Diaz Diaz. License GPLv2+: GNU GPL version 2 or later .br This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. +Using lzlib 1.15 +Using LZ_API_VERSION = 1015 +.SH "SEE ALSO" +The full documentation for +.B minilzip +is maintained as a Texinfo manual. If the +.B info +and +.B minilzip +programs are properly installed at your site, the command +.IP +.B info lzlib +.PP +should give you access to the complete manual. diff --git a/encoder.c b/encoder.c index f5b5b46..442670a 100644 --- a/encoder.c +++ b/encoder.c @@ -1,46 +1,27 @@ -/* Lzlib - Compression library for the lzip format - Copyright (C) 2009-2016 Antonio Diaz Diaz. 
+/* Lzlib - Compression library for the lzip format + Copyright (C) 2009-2025 Antonio Diaz Diaz. - This library is free software: you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation, either version 2 of the License, or - (at your option) any later version. + This library is free software. Redistribution and use in source and + binary forms, with or without modification, are permitted provided + that the following conditions are met: - This library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. + 1. Redistributions of source code must retain the above copyright + notice, this list of conditions, and the following disclaimer. - You should have received a copy of the GNU General Public License - along with this library. If not, see <http://www.gnu.org/licenses/>. + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions, and the following disclaimer in the + documentation and/or other materials provided with the distribution. - As a special exception, you may use this file as part of a free - software library without restriction. Specifically, if other files - instantiate templates or use macros or inline functions from this - file, or you compile this file and link it with other files to - produce an executable, this file does not by itself cause the - resulting executable to be covered by the GNU General Public - License. This exception does not however invalidate any other - reasons why the executable file might be covered by the GNU General - Public License. + This library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
*/ -static int LZe_get_match_pairs( struct LZ_encoder * const e, struct Pair * pairs ) +static int LZe_get_match_pairs( LZ_encoder * const e, Pair * pairs ) { int32_t * ptr0 = e->eb.mb.pos_array + ( e->eb.mb.cyclic_pos << 1 ); int32_t * ptr1 = ptr0 + 1; - int32_t * newptr; - int len = 0, len0 = 0, len1 = 0; - int maxlen = 0; - int num_pairs = 0; - const int pos1 = e->eb.mb.pos + 1; - const int min_pos = ( e->eb.mb.pos > e->eb.mb.dictionary_size ) ? - e->eb.mb.pos - e->eb.mb.dictionary_size : 0; - const uint8_t * const data = Mb_ptr_to_current_pos( &e->eb.mb ); - int count, key2, key3, key4, newpos; - unsigned tmp; int len_limit = e->match_len_limit; - if( len_limit > Mb_available_bytes( &e->eb.mb ) ) { e->been_flushed = true; @@ -48,54 +29,61 @@ static int LZe_get_match_pairs( struct LZ_encoder * const e, struct Pair * pairs if( len_limit < 4 ) { *ptr0 = *ptr1 = 0; return 0; } } - tmp = crc32[data[0]] ^ data[1]; - key2 = tmp & ( num_prev_positions2 - 1 ); + int maxlen = 3; /* only used if pairs != 0 */ + int num_pairs = 0; + const int min_pos = (e->eb.mb.pos > e->eb.mb.dictionary_size) ? 
+ e->eb.mb.pos - e->eb.mb.dictionary_size : 0; + const uint8_t * const data = Mb_ptr_to_current_pos( &e->eb.mb ); + + unsigned tmp = crc32[data[0]] ^ data[1]; + const int key2 = tmp & ( num_prev_positions2 - 1 ); tmp ^= (unsigned)data[2] << 8; - key3 = num_prev_positions2 + ( tmp & ( num_prev_positions3 - 1 ) ); - key4 = num_prev_positions2 + num_prev_positions3 + - ( ( tmp ^ ( crc32[data[3]] << 5 ) ) & e->eb.mb.key4_mask ); + const int key3 = num_prev_positions2 + ( tmp & ( num_prev_positions3 - 1 ) ); + const int key4 = num_prev_positions2 + num_prev_positions3 + + ( ( tmp ^ ( crc32[data[3]] << 5 ) ) & e->eb.mb.key4_mask ); if( pairs ) { - int np2 = e->eb.mb.prev_positions[key2]; - int np3 = e->eb.mb.prev_positions[key3]; + const int np2 = e->eb.mb.prev_positions[key2]; + const int np3 = e->eb.mb.prev_positions[key3]; if( np2 > min_pos && e->eb.mb.buffer[np2-1] == data[0] ) { pairs[0].dis = e->eb.mb.pos - np2; - pairs[0].len = maxlen = 2; + pairs[0].len = maxlen = 2 + ( np2 == np3 ); num_pairs = 1; } if( np2 != np3 && np3 > min_pos && e->eb.mb.buffer[np3-1] == data[0] ) { maxlen = 3; - np2 = np3; - pairs[num_pairs].dis = e->eb.mb.pos - np2; - ++num_pairs; + pairs[num_pairs++].dis = e->eb.mb.pos - np3; } if( num_pairs > 0 ) { - const int delta = pos1 - np2; + const int delta = pairs[num_pairs-1].dis + 1; while( maxlen < len_limit && data[maxlen-delta] == data[maxlen] ) ++maxlen; pairs[num_pairs-1].len = maxlen; + if( maxlen < 3 ) maxlen = 3; if( maxlen >= len_limit ) pairs = 0; /* done. 
now just skip */ } - if( maxlen < 3 ) maxlen = 3; } + const int pos1 = e->eb.mb.pos + 1; e->eb.mb.prev_positions[key2] = pos1; e->eb.mb.prev_positions[key3] = pos1; - newpos = e->eb.mb.prev_positions[key4]; + int newpos1 = e->eb.mb.prev_positions[key4]; e->eb.mb.prev_positions[key4] = pos1; + int len = 0, len0 = 0, len1 = 0; + + int count; for( count = e->cycles; ; ) { - int delta; - if( newpos <= min_pos || --count < 0 ) { *ptr0 = *ptr1 = 0; break; } + if( newpos1 <= min_pos || --count < 0 ) { *ptr0 = *ptr1 = 0; break; } if( e->been_flushed ) len = 0; - delta = pos1 - newpos; - newptr = e->eb.mb.pos_array + + const int delta = pos1 - newpos1; + int32_t * const newptr = e->eb.mb.pos_array + ( ( e->eb.mb.cyclic_pos - delta + ( (e->eb.mb.cyclic_pos >= delta) ? 0 : e->eb.mb.dictionary_size + 1 ) ) << 1 ); if( data[len-delta] == data[len] ) @@ -116,16 +104,16 @@ static int LZe_get_match_pairs( struct LZ_encoder * const e, struct Pair * pairs } if( data[len-delta] < data[len] ) { - *ptr0 = newpos; + *ptr0 = newpos1; ptr0 = newptr + 1; - newpos = *ptr0; + newpos1 = *ptr0; len0 = len; if( len1 < len ) len = len1; } else { - *ptr1 = newpos; + *ptr1 = newpos1; ptr1 = newptr; - newpos = *ptr1; + newpos1 = *ptr1; len1 = len; if( len0 < len ) len = len0; } } @@ -133,7 +121,7 @@ static int LZe_get_match_pairs( struct LZ_encoder * const e, struct Pair * pairs } -static void LZe_update_distance_prices( struct LZ_encoder * const e ) +static void LZe_update_distance_prices( LZ_encoder * const e ) { int dis, len_state; for( dis = start_dis_model; dis < modeled_distances; ++dis ) @@ -141,7 +129,7 @@ static void LZe_update_distance_prices( struct LZ_encoder * const e ) const int dis_slot = dis_slots[dis]; const int direct_bits = ( dis_slot >> 1 ) - 1; const int base = ( 2 | ( dis_slot & 1 ) ) << direct_bits; - const int price = price_symbol_reversed( e->eb.bm_dis + base - dis_slot - 1, + const int price = price_symbol_reversed( e->eb.bm_dis + ( base - dis_slot ), dis - base, 
direct_bits ); for( len_state = 0; len_state < len_states; ++len_state ) e->dis_prices[len_state][dis] = price; @@ -150,15 +138,15 @@ static void LZe_update_distance_prices( struct LZ_encoder * const e ) for( len_state = 0; len_state < len_states; ++len_state ) { int * const dsp = e->dis_slot_prices[len_state]; - int * const dp = e->dis_prices[len_state]; const Bit_model * const bmds = e->eb.bm_dis_slot[len_state]; int slot = 0; for( ; slot < end_dis_model; ++slot ) - dsp[slot] = price_symbol( bmds, slot, dis_slot_bits ); + dsp[slot] = price_symbol6( bmds, slot ); for( ; slot < e->num_dis_slots; ++slot ) - dsp[slot] = price_symbol( bmds, slot, dis_slot_bits ) + + dsp[slot] = price_symbol6( bmds, slot ) + (((( slot >> 1 ) - 1 ) - dis_align_bits ) << price_shift_bits ); + int * const dp = e->dis_prices[len_state]; for( dis = 0; dis < start_dis_model; ++dis ) dp[dis] = dsp[dis]; for( ; dis < modeled_distances; ++dis ) @@ -167,18 +155,17 @@ static void LZe_update_distance_prices( struct LZ_encoder * const e ) } -/* Returns the number of bytes advanced (ahead). +/* Return the number of bytes advanced (ahead). trials[0]..trials[ahead-1] contain the steps to encode. - ( trials[0].dis == -1 ) means literal. + ( trials[0].dis4 == -1 ) means literal. A match/rep longer or equal than match_len_limit finishes the sequence. */ -static int LZe_sequence_optimizer( struct LZ_encoder * const e, +static int LZe_sequence_optimizer( LZ_encoder * const e, const int reps[num_rep_distances], const State state ) { - int main_len, num_pairs, i, rep, cur = 0, num_trials, len; - int replens[num_rep_distances]; - int rep_index = 0; + int num_pairs, num_trials; + int i, rep, len; if( e->pending_num_pairs > 0 ) /* from previous call */ { @@ -187,17 +174,19 @@ static int LZe_sequence_optimizer( struct LZ_encoder * const e, } else num_pairs = LZe_read_match_distances( e ); - main_len = ( num_pairs > 0 ) ? e->pairs[num_pairs-1].len : 0; + const int main_len = (num_pairs > 0) ? 
e->pairs[num_pairs-1].len : 0; + int replens[num_rep_distances]; + int rep_index = 0; for( i = 0; i < num_rep_distances; ++i ) { - replens[i] = Mb_true_match_len( &e->eb.mb, 0, reps[i] + 1, max_match_len ); + replens[i] = Mb_true_match_len( &e->eb.mb, 0, reps[i] + 1 ); if( replens[i] > replens[rep_index] ) rep_index = i; } if( replens[rep_index] >= e->match_len_limit ) { e->trials[0].price = replens[rep_index]; - e->trials[0].dis = rep_index; + e->trials[0].dis4 = rep_index; if( !LZe_move_and_update( e, replens[rep_index] ) ) return 0; return replens[rep_index]; } @@ -205,15 +194,12 @@ static int LZe_sequence_optimizer( struct LZ_encoder * const e, if( main_len >= e->match_len_limit ) { e->trials[0].price = main_len; - e->trials[0].dis = e->pairs[num_pairs-1].dis + num_rep_distances; + e->trials[0].dis4 = e->pairs[num_pairs-1].dis + num_rep_distances; if( !LZe_move_and_update( e, main_len ) ) return 0; return main_len; } - { const int pos_state = Mb_data_position( &e->eb.mb ) & pos_state_mask; - const int match_price = price1( e->eb.bm_match[state][pos_state] ); - const int rep_match_price = match_price + price1( e->eb.bm_rep[state] ); const uint8_t prev_byte = Mb_peek( &e->eb.mb, 1 ); const uint8_t cur_byte = Mb_peek( &e->eb.mb, 0 ); const uint8_t match_byte = Mb_peek( &e->eb.mb, reps[0] + 1 ); @@ -223,7 +209,10 @@ static int LZe_sequence_optimizer( struct LZ_encoder * const e, e->trials[1].price += LZeb_price_literal( &e->eb, prev_byte, cur_byte ); else e->trials[1].price += LZeb_price_matched( &e->eb, prev_byte, cur_byte, match_byte ); - e->trials[1].dis = -1; /* literal */ + e->trials[1].dis4 = -1; /* literal */ + + const int match_price = price1( e->eb.bm_match[state][pos_state] ); + const int rep_match_price = match_price + price1( e->eb.bm_rep[state] ); if( match_byte == cur_byte ) Tr_update( &e->trials[1], rep_match_price + @@ -234,7 +223,7 @@ static int LZe_sequence_optimizer( struct LZ_encoder * const e, if( num_trials < min_match_len ) { 
e->trials[0].price = 1; - e->trials[0].dis = e->trials[1].dis; + e->trials[0].dis4 = e->trials[1].dis4; if( !Mb_move_pos( &e->eb.mb ) ) return 0; return 1; } @@ -248,9 +237,8 @@ static int LZe_sequence_optimizer( struct LZ_encoder * const e, for( rep = 0; rep < num_rep_distances; ++rep ) { - int price; if( replens[rep] < min_match_len ) continue; - price = rep_match_price + LZeb_price_rep( &e->eb, rep, state, pos_state ); + const int price = rep_match_price + LZeb_price_rep( &e->eb, rep, state, pos_state ); for( len = min_match_len; len <= replens[rep]; ++len ) Tr_update( &e->trials[len], price + Lp_price( &e->rep_len_prices, len, pos_state ), rep, 0 ); @@ -259,7 +247,7 @@ static int LZe_sequence_optimizer( struct LZ_encoder * const e, if( main_len > replens[0] ) { const int normal_match_price = match_price + price0( e->eb.bm_rep[state] ); - i = 0, len = max( replens[0] + 1, min_match_len ); + int i = 0, len = max( replens[0] + 1, min_match_len ); while( len > e->pairs[i].len ) ++i; while( true ) { @@ -270,17 +258,10 @@ static int LZe_sequence_optimizer( struct LZ_encoder * const e, if( ++len > e->pairs[i].len && ++i >= num_pairs ) break; } } - } + int cur = 0; while( true ) /* price optimization loop */ { - struct Trial *cur_trial, *next_trial; - int newlen, pos_state, triable_bytes, len_limit; - int start_len = min_match_len; - int next_price, match_price, rep_match_price; - State cur_state; - uint8_t prev_byte, cur_byte, match_byte; - if( !Mb_move_pos( &e->eb.mb ) ) return 0; if( ++cur >= num_trials ) /* no more initialized trials */ { @@ -288,8 +269,8 @@ static int LZe_sequence_optimizer( struct LZ_encoder * const e, return cur; } - num_pairs = LZe_read_match_distances( e ); - newlen = ( num_pairs > 0 ) ? e->pairs[num_pairs-1].len : 0; + const int num_pairs = LZe_read_match_distances( e ); + const int newlen = (num_pairs > 0) ? 
e->pairs[num_pairs-1].len : 0; if( newlen >= e->match_len_limit ) { e->pending_num_pairs = num_pairs; @@ -298,9 +279,10 @@ static int LZe_sequence_optimizer( struct LZ_encoder * const e, } /* give final values to current trial */ - cur_trial = &e->trials[cur]; + Trial * cur_trial = &e->trials[cur]; + State cur_state; { - int dis = cur_trial->dis; + const int dis4 = cur_trial->dis4; int prev_index = cur_trial->prev_index; const int prev_index2 = cur_trial->prev_index2; @@ -309,55 +291,47 @@ static int LZe_sequence_optimizer( struct LZ_encoder * const e, cur_state = e->trials[prev_index].state; if( prev_index + 1 == cur ) /* len == 1 */ { - if( dis == 0 ) cur_state = St_set_short_rep( cur_state ); + if( dis4 == 0 ) cur_state = St_set_shortrep( cur_state ); else cur_state = St_set_char( cur_state ); /* literal */ } - else if( dis < num_rep_distances ) cur_state = St_set_rep( cur_state ); + else if( dis4 < num_rep_distances ) cur_state = St_set_rep( cur_state ); else cur_state = St_set_match( cur_state ); } - else if( prev_index2 == dual_step_trial ) /* dis == 0 */ + else { - --prev_index; - cur_state = e->trials[prev_index].state; - cur_state = St_set_char( cur_state ); - cur_state = St_set_rep( cur_state ); - } - else /* if( prev_index2 >= 0 ) */ - { - prev_index = prev_index2; - cur_state = e->trials[prev_index].state; - if( dis < num_rep_distances ) cur_state = St_set_rep( cur_state ); - else cur_state = St_set_match( cur_state ); - cur_state = St_set_char( cur_state ); - cur_state = St_set_rep( cur_state ); + if( prev_index2 == dual_step_trial ) /* dis4 == 0 (rep0) */ + --prev_index; + else /* prev_index2 >= 0 */ + prev_index = prev_index2; + cur_state = St_set_char_rep(); } cur_trial->state = cur_state; for( i = 0; i < num_rep_distances; ++i ) cur_trial->reps[i] = e->trials[prev_index].reps[i]; - mtf_reps( dis, cur_trial->reps ); + mtf_reps( dis4, cur_trial->reps ); /* literal is ignored */ } - pos_state = Mb_data_position( &e->eb.mb ) & pos_state_mask; - 
prev_byte = Mb_peek( &e->eb.mb, 1 ); - cur_byte = Mb_peek( &e->eb.mb, 0 ); - match_byte = Mb_peek( &e->eb.mb, cur_trial->reps[0] + 1 ); + const int pos_state = Mb_data_position( &e->eb.mb ) & pos_state_mask; + const uint8_t prev_byte = Mb_peek( &e->eb.mb, 1 ); + const uint8_t cur_byte = Mb_peek( &e->eb.mb, 0 ); + const uint8_t match_byte = Mb_peek( &e->eb.mb, cur_trial->reps[0] + 1 ); - next_price = cur_trial->price + - price0( e->eb.bm_match[cur_state][pos_state] ); + int next_price = cur_trial->price + + price0( e->eb.bm_match[cur_state][pos_state] ); if( St_is_char( cur_state ) ) next_price += LZeb_price_literal( &e->eb, prev_byte, cur_byte ); else next_price += LZeb_price_matched( &e->eb, prev_byte, cur_byte, match_byte ); /* try last updates to next trial */ - next_trial = &e->trials[cur+1]; + Trial * next_trial = &e->trials[cur+1]; Tr_update( next_trial, next_price, -1, cur ); /* literal */ - match_price = cur_trial->price + price1( e->eb.bm_match[cur_state][pos_state] ); - rep_match_price = match_price + price1( e->eb.bm_rep[cur_state] ); + const int match_price = cur_trial->price + price1( e->eb.bm_match[cur_state][pos_state] ); + const int rep_match_price = match_price + price1( e->eb.bm_rep[cur_state] ); - if( match_byte == cur_byte && next_trial->dis != 0 && + if( match_byte == cur_byte && next_trial->dis4 != 0 && next_trial->prev_index2 == single_step_trial ) { const int price = rep_match_price + @@ -365,16 +339,16 @@ static int LZe_sequence_optimizer( struct LZ_encoder * const e, if( price <= next_trial->price ) { next_trial->price = price; - next_trial->dis = 0; + next_trial->dis4 = 0; /* rep0 */ next_trial->prev_index = cur; } } - triable_bytes = + const int triable_bytes = min( Mb_available_bytes( &e->eb.mb ), max_num_trials - 1 - cur ); if( triable_bytes < min_match_len ) continue; - len_limit = min( e->match_len_limit, triable_bytes ); + const int len_limit = min( e->match_len_limit, triable_bytes ); /* try literal + rep0 */ if( match_byte != 
cur_byte && next_trial->prev_index != cur ) @@ -382,27 +356,28 @@ static int LZe_sequence_optimizer( struct LZ_encoder * const e, const uint8_t * const data = Mb_ptr_to_current_pos( &e->eb.mb ); const int dis = cur_trial->reps[0] + 1; const int limit = min( e->match_len_limit + 1, triable_bytes ); - len = 1; + int len = 1; while( len < limit && data[len-dis] == data[len] ) ++len; if( --len >= min_match_len ) { const int pos_state2 = ( pos_state + 1 ) & pos_state_mask; const State state2 = St_set_char( cur_state ); const int price = next_price + - price1( e->eb.bm_match[state2][pos_state2] ) + - price1( e->eb.bm_rep[state2] ) + - LZe_price_rep0_len( e, len, state2, pos_state2 ); + price1( e->eb.bm_match[state2][pos_state2] ) + + price1( e->eb.bm_rep[state2] ) + + LZe_price_rep0_len( e, len, state2, pos_state2 ); while( num_trials < cur + 1 + len ) e->trials[++num_trials].price = infinite_price; Tr_update2( &e->trials[cur+1+len], price, cur + 1 ); } } + int start_len = min_match_len; + /* try rep distances */ for( rep = 0; rep < num_rep_distances; ++rep ) { const uint8_t * const data = Mb_ptr_to_current_pos( &e->eb.mb ); - int price; const int dis = cur_trial->reps[rep] + 1; if( data[0-dis] != data[0] || data[1-dis] != data[1] ) continue; @@ -410,7 +385,7 @@ static int LZe_sequence_optimizer( struct LZ_encoder * const e, if( data[len-dis] != data[len] ) break; while( num_trials < cur + len ) e->trials[++num_trials].price = infinite_price; - price = rep_match_price + LZeb_price_rep( &e->eb, rep, cur_state, pos_state ); + int price = rep_match_price + LZeb_price_rep( &e->eb, rep, cur_state, pos_state ); for( i = min_match_len; i <= len; ++i ) Tr_update( &e->trials[cur+i], price + Lp_price( &e->rep_len_prices, i, pos_state ), rep, cur ); @@ -418,17 +393,14 @@ static int LZe_sequence_optimizer( struct LZ_encoder * const e, if( rep == 0 ) start_len = len + 1; /* discard shorter matches */ /* try rep + literal + rep0 */ - { int len2 = len + 1; const int limit = min( 
e->match_len_limit + len2, triable_bytes ); - int pos_state2; - State state2; while( len2 < limit && data[len2-dis] == data[len2] ) ++len2; len2 -= len + 1; if( len2 < min_match_len ) continue; - pos_state2 = ( pos_state + len ) & pos_state_mask; - state2 = St_set_rep( cur_state ); + int pos_state2 = ( pos_state + len ) & pos_state_mask; + State state2 = St_set_rep( cur_state ); price += Lp_price( &e->rep_len_prices, len, pos_state ) + price0( e->eb.bm_match[state2][pos_state2] ) + LZeb_price_matched( &e->eb, data[len-1], data[len], data[len-dis] ); @@ -441,25 +413,22 @@ static int LZe_sequence_optimizer( struct LZ_encoder * const e, e->trials[++num_trials].price = infinite_price; Tr_update3( &e->trials[cur+len+1+len2], price, rep, cur + len + 1, cur ); } - } /* try matches */ if( newlen >= start_len && newlen <= len_limit ) { - int dis; const int normal_match_price = match_price + price0( e->eb.bm_rep[cur_state] ); while( num_trials < cur + newlen ) e->trials[++num_trials].price = infinite_price; - i = 0; + int i = 0; while( e->pairs[i].len < start_len ) ++i; - dis = e->pairs[i].dis; + int dis = e->pairs[i].dis; for( len = start_len; ; ++len ) { int price = normal_match_price + LZe_price_pair( e, dis, len, pos_state ); - Tr_update( &e->trials[cur+len], price, dis + num_rep_distances, cur ); /* try match + literal + rep0 */ @@ -497,30 +466,26 @@ static int LZe_sequence_optimizer( struct LZ_encoder * const e, } -static bool LZe_encode_member( struct LZ_encoder * const e ) +static bool LZe_encode_member( LZ_encoder * const e ) { - const bool best = ( e->match_len_limit > 12 ); + const bool best = e->match_len_limit > 12; const int dis_price_count = best ? 1 : 512; const int align_price_count = best ? 1 : dis_align_size; - const int price_count = ( e->match_len_limit > 36 ) ? 1013 : 4093; - int ahead, i; + const int price_count = (e->match_len_limit > 36) ? 
1013 : 4093; + int i; State * const state = &e->eb.state; if( e->eb.member_finished ) return true; if( Re_member_position( &e->eb.renc ) >= e->eb.member_size_limit ) - { - if( LZeb_full_flush( &e->eb ) ) e->eb.member_finished = true; - return true; - } + { LZeb_try_full_flush( &e->eb ); return true; } if( Mb_data_position( &e->eb.mb ) == 0 && !Mb_data_finished( &e->eb.mb ) ) /* encode first byte */ { - const uint8_t prev_byte = 0; - uint8_t cur_byte; if( !Mb_enough_available_bytes( &e->eb.mb ) || !Re_enough_free_bytes( &e->eb.renc ) ) return true; - cur_byte = Mb_peek( &e->eb.mb, 0 ); + const uint8_t prev_byte = 0; + const uint8_t cur_byte = Mb_peek( &e->eb.mb, 0 ); Re_encode_bit( &e->eb.renc, &e->eb.bm_match[*state][0], 0 ); LZeb_encode_literal( &e->eb, prev_byte, cur_byte ); CRC32_update_byte( &e->eb.crc, cur_byte ); @@ -547,8 +512,7 @@ static bool LZe_encode_member( struct LZ_encoder * const e ) Lp_update_prices( &e->rep_len_prices ); } - ahead = LZe_sequence_optimizer( e, e->eb.reps, *state ); - if( ahead <= 0 ) return false; /* can't happen */ + int ahead = LZe_sequence_optimizer( e, e->eb.reps, *state ); e->price_counter -= ahead; for( i = 0; ahead > 0; ) @@ -556,33 +520,32 @@ static bool LZe_encode_member( struct LZ_encoder * const e ) const int pos_state = ( Mb_data_position( &e->eb.mb ) - ahead ) & pos_state_mask; const int len = e->trials[i].price; - const int dis = e->trials[i].dis; + int dis = e->trials[i].dis4; - bool bit = ( dis < 0 ); + bool bit = dis < 0; Re_encode_bit( &e->eb.renc, &e->eb.bm_match[*state][pos_state], !bit ); if( bit ) /* literal byte */ { const uint8_t prev_byte = Mb_peek( &e->eb.mb, ahead + 1 ); const uint8_t cur_byte = Mb_peek( &e->eb.mb, ahead ); CRC32_update_byte( &e->eb.crc, cur_byte ); - if( St_is_char( *state ) ) + if( ( *state = St_set_char( *state ) ) < 4 ) LZeb_encode_literal( &e->eb, prev_byte, cur_byte ); else { const uint8_t match_byte = Mb_peek( &e->eb.mb, ahead + e->eb.reps[0] + 1 ); LZeb_encode_matched( &e->eb, 
prev_byte, cur_byte, match_byte ); } - *state = St_set_char( *state ); } else /* match or repeated match */ { CRC32_update_buf( &e->eb.crc, Mb_ptr_to_current_pos( &e->eb.mb ) - ahead, len ); mtf_reps( dis, e->eb.reps ); - bit = ( dis < num_rep_distances ); + bit = dis < num_rep_distances; Re_encode_bit( &e->eb.renc, &e->eb.bm_rep[*state], bit ); if( bit ) /* repeated match */ { - bit = ( dis == 0 ); + bit = dis == 0; Re_encode_bit( &e->eb.renc, &e->eb.bm_rep0[*state], !bit ); if( bit ) Re_encode_bit( &e->eb.renc, &e->eb.bm_len[*state][pos_state], len > 1 ); @@ -592,7 +555,7 @@ static bool LZe_encode_member( struct LZ_encoder * const e ) if( dis > 1 ) Re_encode_bit( &e->eb.renc, &e->eb.bm_rep2[*state], dis > 2 ); } - if( len == 1 ) *state = St_set_short_rep( *state ); + if( len == 1 ) *state = St_set_shortrep( *state ); else { Re_encode_len( &e->eb.renc, &e->eb.rep_len_model, len, pos_state ); @@ -602,9 +565,9 @@ static bool LZe_encode_member( struct LZ_encoder * const e ) } else /* match */ { - LZeb_encode_pair( &e->eb, dis - num_rep_distances, len, pos_state ); - if( get_slot( dis - num_rep_distances ) >= end_dis_model ) - --e->align_price_counter; + dis -= num_rep_distances; + LZeb_encode_pair( &e->eb, dis, len, pos_state ); + if( dis >= modeled_distances ) --e->align_price_counter; --e->dis_price_counter; Lp_decrement_counter( &e->match_len_prices, pos_state ); *state = St_set_match( *state ); @@ -614,11 +577,11 @@ static bool LZe_encode_member( struct LZ_encoder * const e ) if( Re_member_position( &e->eb.renc ) >= e->eb.member_size_limit ) { if( !Mb_dec_pos( &e->eb.mb, ahead ) ) return false; - if( LZeb_full_flush( &e->eb ) ) e->eb.member_finished = true; + LZeb_try_full_flush( &e->eb ); return true; } } } - if( LZeb_full_flush( &e->eb ) ) e->eb.member_finished = true; + LZeb_try_full_flush( &e->eb ); return true; } diff --git a/encoder.h b/encoder.h index b70a8ec..cb9689e 100644 --- a/encoder.h +++ b/encoder.h @@ -1,56 +1,47 @@ -/* Lzlib - Compression library 
for the lzip format - Copyright (C) 2009-2016 Antonio Diaz Diaz. +/* Lzlib - Compression library for the lzip format + Copyright (C) 2009-2025 Antonio Diaz Diaz. - This library is free software: you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation, either version 2 of the License, or - (at your option) any later version. + This library is free software. Redistribution and use in source and + binary forms, with or without modification, are permitted provided + that the following conditions are met: - This library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. + 1. Redistributions of source code must retain the above copyright + notice, this list of conditions, and the following disclaimer. - You should have received a copy of the GNU General Public License - along with this library. If not, see . + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions, and the following disclaimer in the + documentation and/or other materials provided with the distribution. - As a special exception, you may use this file as part of a free - software library without restriction. Specifically, if other files - instantiate templates or use macros or inline functions from this - file, or you compile this file and link it with other files to - produce an executable, this file does not by itself cause the - resulting executable to be covered by the GNU General Public - License. This exception does not however invalidate any other - reasons why the executable file might be covered by the GNU General - Public License. 
+ This library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. */ -struct Len_prices +typedef struct Len_prices { - const struct Len_model * lm; + const Len_model * lm; int len_symbols; int count; int prices[pos_states][max_len_symbols]; - int counters[pos_states]; - }; + int counters[pos_states]; /* may decrement below 0 */ + } Len_prices; -static inline void Lp_update_low_mid_prices( struct Len_prices * const lp, +static inline void Lp_update_low_mid_prices( Len_prices * const lp, const int pos_state ) { int * const pps = lp->prices[pos_state]; int tmp = price0( lp->lm->choice1 ); int len = 0; - lp->counters[pos_state] = lp->count; for( ; len < len_low_symbols && len < lp->len_symbols; ++len ) - pps[len] = tmp + price_symbol( lp->lm->bm_low[pos_state], len, len_low_bits ); + pps[len] = tmp + price_symbol3( lp->lm->bm_low[pos_state], len ); if( len >= lp->len_symbols ) return; tmp = price1( lp->lm->choice1 ) + price0( lp->lm->choice2 ); for( ; len < len_low_symbols + len_mid_symbols && len < lp->len_symbols; ++len ) pps[len] = tmp + - price_symbol( lp->lm->bm_mid[pos_state], len - len_low_symbols, len_mid_bits ); + price_symbol3( lp->lm->bm_mid[pos_state], len - len_low_symbols ); } -static inline void Lp_update_high_prices( struct Len_prices * const lp ) +static inline void Lp_update_high_prices( Len_prices * const lp ) { const int tmp = price1( lp->lm->choice1 ) + price1( lp->lm->choice2 ); int len; @@ -58,161 +49,152 @@ static inline void Lp_update_high_prices( struct Len_prices * const lp ) /* using 4 slots per value makes "Lp_price" faster */ lp->prices[3][len] = lp->prices[2][len] = lp->prices[1][len] = lp->prices[0][len] = tmp + - price_symbol( lp->lm->bm_high, len - len_low_symbols - len_mid_symbols, len_high_bits ); + price_symbol8( lp->lm->bm_high, len - len_low_symbols - len_mid_symbols ); } -static inline void Lp_reset( struct 
Len_prices * const lp ) +static inline void Lp_reset( Len_prices * const lp ) { int i; for( i = 0; i < pos_states; ++i ) lp->counters[i] = 0; } -static inline void Lp_init( struct Len_prices * const lp, - const struct Len_model * const lm, +static inline void Lp_init( Len_prices * const lp, const Len_model * const lm, const int match_len_limit ) { lp->lm = lm; lp->len_symbols = match_len_limit + 1 - min_match_len; - lp->count = ( match_len_limit > 12 ) ? 1 : lp->len_symbols; + lp->count = (match_len_limit > 12) ? 1 : lp->len_symbols; Lp_reset( lp ); } -static inline void Lp_decrement_counter( struct Len_prices * const lp, +static inline void Lp_decrement_counter( Len_prices * const lp, const int pos_state ) { --lp->counters[pos_state]; } -static inline void Lp_update_prices( struct Len_prices * const lp ) +static inline void Lp_update_prices( Len_prices * const lp ) { int pos_state; bool high_pending = false; for( pos_state = 0; pos_state < pos_states; ++pos_state ) if( lp->counters[pos_state] <= 0 ) - { Lp_update_low_mid_prices( lp, pos_state ); high_pending = true; } + { lp->counters[pos_state] = lp->count; + Lp_update_low_mid_prices( lp, pos_state ); high_pending = true; } if( high_pending && lp->len_symbols > len_low_symbols + len_mid_symbols ) Lp_update_high_prices( lp ); } -static inline int Lp_price( const struct Len_prices * const lp, - const int symbol, const int pos_state ) - { return lp->prices[pos_state][symbol - min_match_len]; } +static inline int Lp_price( const Len_prices * const lp, + const int len, const int pos_state ) + { return lp->prices[pos_state][len - min_match_len]; } -struct Pair /* distance-length pair */ +typedef struct Pair /* distance-length pair */ { int dis; int len; - }; + } Pair; enum { infinite_price = 0x0FFFFFFF, max_num_trials = 1 << 13, single_step_trial = -2, dual_step_trial = -1 }; -struct Trial +typedef struct Trial { State state; int price; /* dual use var; cumulative price, match length */ - int dis; /* rep index or match 
distance. (-1 for literal) */ + int dis4; /* -1 for literal, or rep, or match distance + 4 */ int prev_index; /* index of prev trial in trials[] */ int prev_index2; /* -2 trial is single step */ /* -1 literal + rep0 */ /* >= 0 ( rep or match ) + literal + rep0 */ int reps[num_rep_distances]; - }; + } Trial; -static inline void Tr_update( struct Trial * const trial, const int pr, - const int distance, const int p_i ) +static inline void Tr_update( Trial * const trial, const int pr, + const int distance4, const int p_i ) { if( pr < trial->price ) - { - trial->price = pr; trial->dis = distance; trial->prev_index = p_i; - trial->prev_index2 = single_step_trial; - } + { trial->price = pr; trial->dis4 = distance4; trial->prev_index = p_i; + trial->prev_index2 = single_step_trial; } } -static inline void Tr_update2( struct Trial * const trial, const int pr, +static inline void Tr_update2( Trial * const trial, const int pr, const int p_i ) { if( pr < trial->price ) - { - trial->price = pr; trial->dis = 0; trial->prev_index = p_i; - trial->prev_index2 = dual_step_trial; - } + { trial->price = pr; trial->dis4 = 0; trial->prev_index = p_i; + trial->prev_index2 = dual_step_trial; } } -static inline void Tr_update3( struct Trial * const trial, const int pr, - const int distance, const int p_i, +static inline void Tr_update3( Trial * const trial, const int pr, + const int distance4, const int p_i, const int p_i2 ) { if( pr < trial->price ) - { - trial->price = pr; trial->dis = distance; trial->prev_index = p_i; - trial->prev_index2 = p_i2; - } + { trial->price = pr; trial->dis4 = distance4; trial->prev_index = p_i; + trial->prev_index2 = p_i2; } } -struct LZ_encoder +typedef struct LZ_encoder { - struct LZ_encoder_base eb; + LZ_encoder_base eb; int cycles; int match_len_limit; - struct Len_prices match_len_prices; - struct Len_prices rep_len_prices; + Len_prices match_len_prices; + Len_prices rep_len_prices; int pending_num_pairs; - struct Pair pairs[max_match_len+1]; - struct 
Trial trials[max_num_trials]; + Pair pairs[max_match_len+1]; + Trial trials[max_num_trials]; int dis_slot_prices[len_states][2*max_dictionary_bits]; int dis_prices[len_states][modeled_distances]; int align_prices[dis_align_size]; int num_dis_slots; - int price_counter; + int price_counter; /* counters may decrement below 0 */ int dis_price_counter; int align_price_counter; bool been_flushed; - }; + } LZ_encoder; -static inline bool Mb_dec_pos( struct Matchfinder_base * const mb, - const int ahead ) +static inline bool Mb_dec_pos( Matchfinder_base * const mb, const int ahead ) { if( ahead < 0 || mb->pos < ahead ) return false; mb->pos -= ahead; + if( mb->cyclic_pos < ahead ) mb->cyclic_pos += mb->dictionary_size + 1; mb->cyclic_pos -= ahead; - if( mb->cyclic_pos < 0 ) mb->cyclic_pos += mb->dictionary_size + 1; return true; } -static int LZe_get_match_pairs( struct LZ_encoder * const e, struct Pair * pairs ); +static int LZe_get_match_pairs( LZ_encoder * const e, Pair * pairs ); - /* move-to-front dis in/into reps if( dis > 0 ) */ -static inline void mtf_reps( const int dis, int reps[num_rep_distances] ) + /* move-to-front dis in/into reps; do nothing if( dis4 <= 0 ) */ +static inline void mtf_reps( const int dis4, int reps[num_rep_distances] ) { - int i; - if( dis >= num_rep_distances ) + if( dis4 >= num_rep_distances ) /* match */ { - for( i = num_rep_distances - 1; i > 0; --i ) reps[i] = reps[i-1]; - reps[0] = dis - num_rep_distances; + reps[3] = reps[2]; reps[2] = reps[1]; reps[1] = reps[0]; + reps[0] = dis4 - num_rep_distances; } - else if( dis > 0 ) + else if( dis4 > 0 ) /* repeated match */ { - const int distance = reps[dis]; - for( i = dis; i > 0; --i ) reps[i] = reps[i-1]; + const int distance = reps[dis4]; + int i; for( i = dis4; i > 0; --i ) reps[i] = reps[i-1]; reps[0] = distance; } } -static inline int LZeb_price_shortrep( const struct LZ_encoder_base * const eb, +static inline int LZeb_price_shortrep( const LZ_encoder_base * const eb, const State state, 
const int pos_state ) { return price0( eb->bm_rep0[state] ) + price0( eb->bm_len[state][pos_state] ); } -static inline int LZeb_price_rep( const struct LZ_encoder_base * const eb, - const int rep, - const State state, const int pos_state ) +static inline int LZeb_price_rep( const LZ_encoder_base * const eb, + const int rep, const State state, + const int pos_state ) { - int price; if( rep == 0 ) return price0( eb->bm_rep0[state] ) + price1( eb->bm_len[state][pos_state] ); - price = price1( eb->bm_rep0[state] ); + int price = price1( eb->bm_rep0[state] ); if( rep == 1 ) price += price0( eb->bm_rep1[state] ); else @@ -223,15 +205,15 @@ static inline int LZeb_price_rep( const struct LZ_encoder_base * const eb, return price; } -static inline int LZe_price_rep0_len( const struct LZ_encoder * const e, - const int len, - const State state, const int pos_state ) +static inline int LZe_price_rep0_len( const LZ_encoder * const e, + const int len, const State state, + const int pos_state ) { return LZeb_price_rep( &e->eb, 0, state, pos_state ) + Lp_price( &e->rep_len_prices, len, pos_state ); } -static inline int LZe_price_pair( const struct LZ_encoder * const e, +static inline int LZe_price_pair( const LZ_encoder * const e, const int dis, const int len, const int pos_state ) { @@ -244,23 +226,20 @@ static inline int LZe_price_pair( const struct LZ_encoder * const e, e->align_prices[dis & (dis_align_size - 1)]; } -static inline int LZe_read_match_distances( struct LZ_encoder * const e ) +static inline int LZe_read_match_distances( LZ_encoder * const e ) { const int num_pairs = LZe_get_match_pairs( e, e->pairs ); if( num_pairs > 0 ) { - int len = e->pairs[num_pairs-1].len; + const int len = e->pairs[num_pairs-1].len; if( len == e->match_len_limit && len < max_match_len ) - { - len += Mb_true_match_len( &e->eb.mb, len, e->pairs[num_pairs-1].dis + 1, - max_match_len - len ); - e->pairs[num_pairs-1].len = len; - } + e->pairs[num_pairs-1].len = + Mb_true_match_len( &e->eb.mb, len, 
e->pairs[num_pairs-1].dis + 1 ); } return num_pairs; } -static inline bool LZe_move_and_update( struct LZ_encoder * const e, int n ) +static inline bool LZe_move_and_update( LZ_encoder * const e, int n ) { while( true ) { @@ -271,29 +250,29 @@ static inline bool LZe_move_and_update( struct LZ_encoder * const e, int n ) return true; } -static inline void LZe_backward( struct LZ_encoder * const e, int cur ) +static inline void LZe_backward( LZ_encoder * const e, int cur ) { - int * const dis = &e->trials[cur].dis; + int dis4 = e->trials[cur].dis4; while( cur > 0 ) { const int prev_index = e->trials[cur].prev_index; - struct Trial * const prev_trial = &e->trials[prev_index]; + Trial * const prev_trial = &e->trials[prev_index]; if( e->trials[cur].prev_index2 != single_step_trial ) { - prev_trial->dis = -1; + prev_trial->dis4 = -1; /* literal */ prev_trial->prev_index = prev_index - 1; prev_trial->prev_index2 = single_step_trial; if( e->trials[cur].prev_index2 >= 0 ) { - struct Trial * const prev_trial2 = &e->trials[prev_index-1]; - prev_trial2->dis = *dis; *dis = 0; + Trial * const prev_trial2 = &e->trials[prev_index-1]; + prev_trial2->dis4 = dis4; dis4 = 0; /* rep0 */ prev_trial2->prev_index = e->trials[cur].prev_index2; prev_trial2->prev_index2 = single_step_trial; } } prev_trial->price = cur - prev_index; /* len */ - cur = *dis; *dis = prev_trial->dis; prev_trial->dis = cur; + cur = dis4; dis4 = prev_trial->dis4; prev_trial->dis4 = cur; cur = prev_index; } } @@ -301,11 +280,11 @@ static inline void LZe_backward( struct LZ_encoder * const e, int cur ) enum { num_prev_positions3 = 1 << 16, num_prev_positions2 = 1 << 10 }; -static inline bool LZe_init( struct LZ_encoder * const e, +static inline bool LZe_init( LZ_encoder * const e, const int dict_size, const int len_limit, const unsigned long long member_size ) { - enum { before = max_num_trials + 1, + enum { before_size = max_num_trials, /* bytes to keep in buffer after pos */ after_size = max_num_trials + ( 2 * 
max_match_len ) + 1, dict_factor = 2, @@ -313,10 +292,10 @@ static inline bool LZe_init( struct LZ_encoder * const e, pos_array_factor = 2, min_free_bytes = 2 * max_num_trials }; - if( !LZeb_init( &e->eb, before, dict_size, after_size, dict_factor, + if( !LZeb_init( &e->eb, before_size, dict_size, after_size, dict_factor, num_prev_positions23, pos_array_factor, min_free_bytes, member_size ) ) return false; - e->cycles = ( len_limit < max_match_len ) ? 16 + ( len_limit / 2 ) : 256; + e->cycles = (len_limit < max_match_len) ? 16 + ( len_limit / 2 ) : 256; e->match_len_limit = len_limit; Lp_init( &e->match_len_prices, &e->eb.match_len_model, e->match_len_limit ); Lp_init( &e->rep_len_prices, &e->eb.rep_len_model, e->match_len_limit ); @@ -331,7 +310,7 @@ static inline bool LZe_init( struct LZ_encoder * const e, return true; } -static inline void LZe_reset( struct LZ_encoder * const e, +static inline void LZe_reset( LZ_encoder * const e, const unsigned long long member_size ) { LZeb_reset( &e->eb, member_size ); diff --git a/encoder_base.c b/encoder_base.c index ee7e0bb..b823dfa 100644 --- a/encoder_base.c +++ b/encoder_base.c @@ -1,42 +1,35 @@ -/* Lzlib - Compression library for the lzip format - Copyright (C) 2009-2016 Antonio Diaz Diaz. +/* Lzlib - Compression library for the lzip format + Copyright (C) 2009-2025 Antonio Diaz Diaz. - This library is free software: you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation, either version 2 of the License, or - (at your option) any later version. + This library is free software. Redistribution and use in source and + binary forms, with or without modification, are permitted provided + that the following conditions are met: - This library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the - GNU General Public License for more details. + 1. Redistributions of source code must retain the above copyright + notice, this list of conditions, and the following disclaimer. - You should have received a copy of the GNU General Public License - along with this library. If not, see . + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions, and the following disclaimer in the + documentation and/or other materials provided with the distribution. - As a special exception, you may use this file as part of a free - software library without restriction. Specifically, if other files - instantiate templates or use macros or inline functions from this - file, or you compile this file and link it with other files to - produce an executable, this file does not by itself cause the - resulting executable to be covered by the GNU General Public - License. This exception does not however invalidate any other - reasons why the executable file might be covered by the GNU General - Public License. + This library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
*/ -static bool Mb_normalize_pos( struct Matchfinder_base * const mb ) +static bool Mb_normalize_pos( Matchfinder_base * const mb ) { if( mb->pos > mb->stream_pos ) { mb->pos = mb->stream_pos; return false; } if( !mb->at_stream_end ) { int i; - const int offset = mb->pos - mb->dictionary_size - mb->before_size; + /* offset is int32_t for the min below */ + const int32_t offset = mb->pos - mb->before_size - mb->dictionary_size; const int size = mb->stream_pos - offset; memmove( mb->buffer, mb->buffer + offset, size ); mb->partial_data_pos += offset; - mb->pos -= offset; + mb->pos -= offset; /* pos = before_size + dictionary_size */ mb->stream_pos -= offset; for( i = 0; i < mb->num_prev_positions; ++i ) mb->prev_positions[i] -= min( mb->prev_positions[i], offset ); @@ -47,43 +40,42 @@ static bool Mb_normalize_pos( struct Matchfinder_base * const mb ) } -static bool Mb_init( struct Matchfinder_base * const mb, - const int before, const int dict_size, - const int after_size, const int dict_factor, - const int num_prev_positions23, +static bool Mb_init( Matchfinder_base * const mb, const int before_size, + const int dict_size, const int after_size, + const int dict_factor, const int num_prev_positions23, const int pos_array_factor ) { const int buffer_size_limit = - ( dict_factor * dict_size ) + before + after_size; - unsigned size; + ( dict_factor * dict_size ) + before_size + after_size; int i; mb->partial_data_pos = 0; - mb->before_size = before; + mb->before_size = before_size; mb->after_size = after_size; mb->pos = 0; mb->cyclic_pos = 0; mb->stream_pos = 0; + mb->num_prev_positions23 = num_prev_positions23; mb->at_stream_end = false; - mb->flushing = false; + mb->sync_flush_pending = false; mb->buffer_size = max( 65536, buffer_size_limit ); mb->buffer = (uint8_t *)malloc( mb->buffer_size ); if( !mb->buffer ) return false; + mb->saved_dictionary_size = dict_size; mb->dictionary_size = dict_size; mb->pos_limit = mb->buffer_size - after_size; - size = 1 << max( 16, 
real_bits( mb->dictionary_size - 1 ) - 2 ); - if( mb->dictionary_size > 1 << 26 ) /* 64 MiB */ - size >>= 1; - mb->key4_mask = size - 1; - mb->num_prev_positions23 = num_prev_positions23; + unsigned size = 1 << max( 16, real_bits( mb->dictionary_size - 1 ) - 2 ); + if( mb->dictionary_size > 1 << 26 ) size >>= 1; /* 64 MiB */ + mb->key4_mask = size - 1; /* increases with dictionary size */ size += num_prev_positions23; - mb->num_prev_positions = size; + mb->pos_array_size = pos_array_factor * ( mb->dictionary_size + 1 ); size += mb->pos_array_size; - if( size * sizeof (int32_t) <= size ) mb->prev_positions = 0; - else mb->prev_positions = (int32_t *)malloc( size * sizeof (int32_t) ); + if( size * sizeof mb->prev_positions[0] <= size ) mb->prev_positions = 0; + else mb->prev_positions = + (int32_t *)malloc( size * sizeof mb->prev_positions[0] ); if( !mb->prev_positions ) { free( mb->buffer ); return false; } mb->pos_array = mb->prev_positions + mb->num_prev_positions; for( i = 0; i < mb->num_prev_positions; ++i ) mb->prev_positions[i] = 0; @@ -91,26 +83,29 @@ static bool Mb_init( struct Matchfinder_base * const mb, } -static void Mb_adjust_dictionary_size( struct Matchfinder_base * const mb ) +static void Mb_adjust_array( Matchfinder_base * const mb ) + { + int size = 1 << max( 16, real_bits( mb->dictionary_size - 1 ) - 2 ); + if( mb->dictionary_size > 1 << 26 ) size >>= 1; /* 64 MiB */ + mb->key4_mask = size - 1; + size += mb->num_prev_positions23; + mb->num_prev_positions = size; + mb->pos_array = mb->prev_positions + mb->num_prev_positions; + } + + +static void Mb_adjust_dictionary_size( Matchfinder_base * const mb ) { if( mb->stream_pos < mb->dictionary_size ) { - int size; - mb->buffer_size = - mb->dictionary_size = - mb->pos_limit = max( min_dictionary_size, mb->stream_pos ); - size = 1 << max( 16, real_bits( mb->dictionary_size - 1 ) - 2 ); - if( mb->dictionary_size > 1 << 26 ) - size >>= 1; - mb->key4_mask = size - 1; - size += mb->num_prev_positions23; - 
mb->num_prev_positions = size; - mb->pos_array = mb->prev_positions + mb->num_prev_positions; + mb->dictionary_size = max( min_dictionary_size, mb->stream_pos ); + Mb_adjust_array( mb ); + mb->pos_limit = mb->buffer_size; } } -static void Mb_reset( struct Matchfinder_base * const mb ) +static void Mb_reset( Matchfinder_base * const mb ) { int i; if( mb->stream_pos > mb->pos ) @@ -120,60 +115,62 @@ static void Mb_reset( struct Matchfinder_base * const mb ) mb->pos = 0; mb->cyclic_pos = 0; mb->at_stream_end = false; - mb->flushing = false; + mb->sync_flush_pending = false; + mb->dictionary_size = mb->saved_dictionary_size; + Mb_adjust_array( mb ); + mb->pos_limit = mb->buffer_size - mb->after_size; for( i = 0; i < mb->num_prev_positions; ++i ) mb->prev_positions[i] = 0; } - /* End Of Stream mark => (dis == 0xFFFFFFFFU, len == min_match_len) */ -static bool LZeb_full_flush( struct LZ_encoder_base * const eb ) +/* End Of Stream marker => (dis == 0xFFFFFFFFU, len == min_match_len) */ +static void LZeb_try_full_flush( LZ_encoder_base * const eb ) { - int i; + if( eb->member_finished || Cb_free_bytes( &eb->renc.cb ) < + max_marker_size + eb->renc.ff_count + Lt_size ) return; + eb->member_finished = true; const int pos_state = Mb_data_position( &eb->mb ) & pos_state_mask; const State state = eb->state; - File_trailer trailer; - if( eb->member_finished || - Cb_free_bytes( &eb->renc.cb ) < max_marker_size + eb->renc.ff_count + Ft_size ) - return false; Re_encode_bit( &eb->renc, &eb->bm_match[state][pos_state], 1 ); Re_encode_bit( &eb->renc, &eb->bm_rep[state], 0 ); LZeb_encode_pair( eb, 0xFFFFFFFFU, min_match_len, pos_state ); Re_flush( &eb->renc ); - Ft_set_data_crc( trailer, LZeb_crc( eb ) ); - Ft_set_data_size( trailer, Mb_data_position( &eb->mb ) ); - Ft_set_member_size( trailer, Re_member_position( &eb->renc ) + Ft_size ); - for( i = 0; i < Ft_size; ++i ) - Cb_put_byte( &eb->renc.cb, trailer[i] ); - return true; + Lzip_trailer trailer; + Lt_set_data_crc( trailer, 
LZeb_crc( eb ) ); + Lt_set_data_size( trailer, Mb_data_position( &eb->mb ) ); + Lt_set_member_size( trailer, Re_member_position( &eb->renc ) + Lt_size ); + int i; for( i = 0; i < Lt_size; ++i ) Cb_put_byte( &eb->renc.cb, trailer[i] ); } - /* Sync Flush mark => (dis == 0xFFFFFFFFU, len == min_match_len + 1) */ -static bool LZeb_sync_flush( struct LZ_encoder_base * const eb ) +/* Sync Flush marker => (dis == 0xFFFFFFFFU, len == min_match_len + 1) */ +static void LZeb_try_sync_flush( LZ_encoder_base * const eb ) { - int i; + const unsigned min_size = eb->renc.ff_count + max_marker_size; + if( eb->member_finished || + Cb_free_bytes( &eb->renc.cb ) < min_size + max_marker_size ) return; + eb->mb.sync_flush_pending = false; + const unsigned long long old_mpos = Re_member_position( &eb->renc ); const int pos_state = Mb_data_position( &eb->mb ) & pos_state_mask; const State state = eb->state; - if( eb->member_finished || - Cb_free_bytes( &eb->renc.cb ) < (2 * max_marker_size) + eb->renc.ff_count ) - return false; - for( i = 0; i < 2; ++i ) /* 2 consecutive markers guarantee decoding */ - { + do { /* size of markers must be >= rd_min_available_bytes + 5 */ Re_encode_bit( &eb->renc, &eb->bm_match[state][pos_state], 1 ); Re_encode_bit( &eb->renc, &eb->bm_rep[state], 0 ); LZeb_encode_pair( eb, 0xFFFFFFFFU, min_match_len + 1, pos_state ); Re_flush( &eb->renc ); } - return true; + while( Re_member_position( &eb->renc ) - old_mpos < min_size ); } -static void LZeb_reset( struct LZ_encoder_base * const eb, +static void LZeb_reset( LZ_encoder_base * const eb, const unsigned long long member_size ) { - int i; + const unsigned long long min_member_size = min_dictionary_size; + const unsigned long long max_member_size = 0x0008000000000000ULL; /* 2 PiB */ Mb_reset( &eb->mb ); - eb->member_size_limit = member_size - Ft_size - max_marker_size; + eb->member_size_limit = min( max( min_member_size, member_size ), + max_member_size ) - Lt_size - max_marker_size; eb->crc = 0xFFFFFFFFU; 
Bm_array_init( eb->bm_literal[0], (1 << literal_context_bits) * 0x300 ); Bm_array_init( eb->bm_match[0], states * pos_states ); @@ -183,12 +180,12 @@ static void LZeb_reset( struct LZ_encoder_base * const eb, Bm_array_init( eb->bm_rep2, states ); Bm_array_init( eb->bm_len[0], states * pos_states ); Bm_array_init( eb->bm_dis_slot[0], len_states * (1 << dis_slot_bits) ); - Bm_array_init( eb->bm_dis, modeled_distances - end_dis_model ); + Bm_array_init( eb->bm_dis, modeled_distances - end_dis_model + 1 ); Bm_array_init( eb->bm_align, dis_align_size ); Lm_init( &eb->match_len_model ); Lm_init( &eb->rep_len_model ); - Re_reset( &eb->renc ); - for( i = 0; i < num_rep_distances; ++i ) eb->reps[i] = 0; + Re_reset( &eb->renc, eb->mb.dictionary_size ); + int i; for( i = 0; i < num_rep_distances; ++i ) eb->reps[i] = 0; eb->state = 0; eb->member_finished = false; } diff --git a/encoder_base.h b/encoder_base.h index 3209922..b4a6f02 100644 --- a/encoder_base.h +++ b/encoder_base.h @@ -1,28 +1,20 @@ -/* Lzlib - Compression library for the lzip format - Copyright (C) 2009-2016 Antonio Diaz Diaz. +/* Lzlib - Compression library for the lzip format + Copyright (C) 2009-2025 Antonio Diaz Diaz. - This library is free software: you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation, either version 2 of the License, or - (at your option) any later version. + This library is free software. Redistribution and use in source and + binary forms, with or without modification, are permitted provided + that the following conditions are met: - This library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. + 1. Redistributions of source code must retain the above copyright + notice, this list of conditions, and the following disclaimer. 
- You should have received a copy of the GNU General Public License - along with this library. If not, see . + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions, and the following disclaimer in the + documentation and/or other materials provided with the distribution. - As a special exception, you may use this file as part of a free - software library without restriction. Specifically, if other files - instantiate templates or use macros or inline functions from this - file, or you compile this file and link it with other files to - produce an executable, this file does not by itself cause the - resulting executable to be covered by the GNU General Public - License. This exception does not however invalidate any other - reasons why the executable file might be covered by the GNU General - Public License. + This library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. */ enum { price_shift_bits = 6, @@ -149,22 +141,45 @@ static inline int price0( const Bit_model probability ) static inline int price1( const Bit_model probability ) { return get_price( bit_model_total - probability ); } -static inline int price_bit( const Bit_model bm, const int bit ) - { if( bit ) return price1( bm ); else return price0( bm ); } +static inline int price_bit( const Bit_model bm, const bool bit ) + { return bit ? 
price1( bm ) : price0( bm ); } -static inline int price_symbol( const Bit_model bm[], int symbol, - const int num_bits ) +static inline int price_symbol3( const Bit_model bm[], int symbol ) { - int price = 0; - symbol |= ( 1 << num_bits ); - while( symbol > 1 ) - { - const int bit = symbol & 1; - symbol >>= 1; - price += price_bit( bm[symbol], bit ); - } - return price; + bool bit = symbol & 1; + symbol |= 8; symbol >>= 1; + int price = price_bit( bm[symbol], bit ); + bit = symbol & 1; symbol >>= 1; price += price_bit( bm[symbol], bit ); + return price + price_bit( bm[1], symbol & 1 ); + } + + +static inline int price_symbol6( const Bit_model bm[], unsigned symbol ) + { + bool bit = symbol & 1; + symbol |= 64; symbol >>= 1; + int price = price_bit( bm[symbol], bit ); + bit = symbol & 1; symbol >>= 1; price += price_bit( bm[symbol], bit ); + bit = symbol & 1; symbol >>= 1; price += price_bit( bm[symbol], bit ); + bit = symbol & 1; symbol >>= 1; price += price_bit( bm[symbol], bit ); + bit = symbol & 1; symbol >>= 1; price += price_bit( bm[symbol], bit ); + return price + price_bit( bm[1], symbol & 1 ); + } + + +static inline int price_symbol8( const Bit_model bm[], int symbol ) + { + bool bit = symbol & 1; + symbol |= 0x100; symbol >>= 1; + int price = price_bit( bm[symbol], bit ); + bit = symbol & 1; symbol >>= 1; price += price_bit( bm[symbol], bit ); + bit = symbol & 1; symbol >>= 1; price += price_bit( bm[symbol], bit ); + bit = symbol & 1; symbol >>= 1; price += price_bit( bm[symbol], bit ); + bit = symbol & 1; symbol >>= 1; price += price_bit( bm[symbol], bit ); + bit = symbol & 1; symbol >>= 1; price += price_bit( bm[symbol], bit ); + bit = symbol & 1; symbol >>= 1; price += price_bit( bm[symbol], bit ); + return price + price_bit( bm[1], symbol & 1 ); } @@ -176,37 +191,33 @@ static inline int price_symbol_reversed( const Bit_model bm[], int symbol, int i; for( i = num_bits; i > 0; --i ) { - const int bit = symbol & 1; - price += price_bit( bm[model], bit ); 
- model = ( model << 1 ) | bit; + const bool bit = symbol & 1; symbol >>= 1; + price += price_bit( bm[model], bit ); + model <<= 1; model |= bit; } return price; } -static inline int price_matched( const Bit_model bm[], int symbol, - int match_byte ) +static inline int price_matched( const Bit_model bm[], unsigned symbol, + unsigned match_byte ) { int price = 0; - int mask = 0x100; + unsigned mask = 0x100; symbol |= mask; - - do { - int match_bit, bit; - match_byte <<= 1; - match_bit = match_byte & mask; - symbol <<= 1; - bit = symbol & 0x100; - price += price_bit( bm[match_bit+(symbol>>9)+mask], bit ); - mask &= ~(match_byte ^ symbol); /* if( match_bit != bit ) mask = 0; */ + while( true ) + { + const unsigned match_bit = ( match_byte <<= 1 ) & mask; + const bool bit = ( symbol <<= 1 ) & 0x100; + price += price_bit( bm[(symbol>>9)+match_bit+mask], bit ); + if( symbol >= 0x10000 ) return price; + mask &= ~(match_bit ^ symbol); /* if( match_bit != bit ) mask = 0; */ } - while( symbol < 0x10000 ); - return price; } -struct Matchfinder_base +typedef struct Matchfinder_base { unsigned long long partial_data_pos; uint8_t * buffer; /* input buffer */ @@ -224,56 +235,55 @@ struct Matchfinder_base int num_prev_positions23; int num_prev_positions; /* size of prev_positions */ int pos_array_size; + int saved_dictionary_size; /* dictionary_size restored by Mb_reset */ bool at_stream_end; /* stream_pos shows real end of file */ - bool flushing; - }; + bool sync_flush_pending; + } Matchfinder_base; -static bool Mb_normalize_pos( struct Matchfinder_base * const mb ); +static bool Mb_normalize_pos( Matchfinder_base * const mb ); -static bool Mb_init( struct Matchfinder_base * const mb, - const int before, const int dict_size, - const int after_size, const int dict_factor, - const int num_prev_positions23, +static bool Mb_init( Matchfinder_base * const mb, const int before_size, + const int dict_size, const int after_size, + const int dict_factor, const int num_prev_positions23, 
const int pos_array_factor ); -static inline void Mb_free( struct Matchfinder_base * const mb ) +static inline void Mb_free( Matchfinder_base * const mb ) { free( mb->prev_positions ); free( mb->buffer ); } -static inline uint8_t Mb_peek( const struct Matchfinder_base * const mb, +static inline uint8_t Mb_peek( const Matchfinder_base * const mb, const int distance ) { return mb->buffer[mb->pos-distance]; } -static inline int Mb_available_bytes( const struct Matchfinder_base * const mb ) +static inline int Mb_available_bytes( const Matchfinder_base * const mb ) { return mb->stream_pos - mb->pos; } static inline unsigned long long -Mb_data_position( const struct Matchfinder_base * const mb ) +Mb_data_position( const Matchfinder_base * const mb ) { return mb->partial_data_pos + mb->pos; } -static inline void Mb_finish( struct Matchfinder_base * const mb ) - { mb->at_stream_end = true; mb->flushing = false; } +static inline void Mb_finish( Matchfinder_base * const mb ) + { mb->at_stream_end = true; mb->sync_flush_pending = false; } -static inline bool Mb_data_finished( const struct Matchfinder_base * const mb ) - { return mb->at_stream_end && !mb->flushing && mb->pos >= mb->stream_pos; } +static inline bool Mb_data_finished( const Matchfinder_base * const mb ) + { return mb->at_stream_end && mb->pos >= mb->stream_pos; } -static inline bool Mb_flushing_or_end( const struct Matchfinder_base * const mb ) - { return mb->at_stream_end || mb->flushing; } +static inline bool Mb_flushing_or_end( const Matchfinder_base * const mb ) + { return mb->at_stream_end || mb->sync_flush_pending; } -static inline int Mb_free_bytes( const struct Matchfinder_base * const mb ) +static inline int Mb_free_bytes( const Matchfinder_base * const mb ) { if( Mb_flushing_or_end( mb ) ) return 0; return mb->buffer_size - mb->stream_pos; } -static inline bool Mb_enough_available_bytes( const struct Matchfinder_base * const mb ) - { - return ( mb->pos + mb->after_size <= mb->stream_pos || - ( 
Mb_flushing_or_end( mb ) && mb->pos < mb->stream_pos ) ); - } +static inline bool +Mb_enough_available_bytes( const Matchfinder_base * const mb ) + { return mb->pos + mb->after_size <= mb->stream_pos || + ( Mb_flushing_or_end( mb ) && mb->pos < mb->stream_pos ); } static inline const uint8_t * -Mb_ptr_to_current_pos( const struct Matchfinder_base * const mb ) +Mb_ptr_to_current_pos( const Matchfinder_base * const mb ) { return mb->buffer + mb->pos; } -static int Mb_write_data( struct Matchfinder_base * const mb, +static int Mb_write_data( Matchfinder_base * const mb, const uint8_t * const inbuf, const int size ) { const int sz = min( mb->buffer_size - mb->stream_pos, size ); @@ -283,19 +293,17 @@ static int Mb_write_data( struct Matchfinder_base * const mb, return sz; } -static inline int Mb_true_match_len( const struct Matchfinder_base * const mb, - const int index, const int distance, - int len_limit ) +static inline int Mb_true_match_len( const Matchfinder_base * const mb, + const int index, const int distance ) { - const uint8_t * const data = mb->buffer + mb->pos + index; - int i = 0; - if( index + len_limit > Mb_available_bytes( mb ) ) - len_limit = Mb_available_bytes( mb ) - index; + const uint8_t * const data = mb->buffer + mb->pos; + int i = index; + const int len_limit = min( Mb_available_bytes( mb ), max_match_len ); while( i < len_limit && data[i-distance] == data[i] ) ++i; return i; } -static inline bool Mb_move_pos( struct Matchfinder_base * const mb ) +static inline bool Mb_move_pos( Matchfinder_base * const mb ) { if( ++mb->cyclic_pos > mb->dictionary_size ) mb->cyclic_pos = 0; if( ++mb->pos >= mb->pos_limit ) return Mb_normalize_pos( mb ); @@ -303,23 +311,23 @@ static inline bool Mb_move_pos( struct Matchfinder_base * const mb ) } -struct Range_encoder +typedef struct Range_encoder { - struct Circular_buffer cb; + Circular_buffer cb; unsigned min_free_bytes; uint64_t low; unsigned long long partial_member_pos; uint32_t range; unsigned ff_count; 
uint8_t cache; - File_header header; - }; + Lzip_header header; + } Range_encoder; -static inline void Re_shift_low( struct Range_encoder * const renc ) +static inline void Re_shift_low( Range_encoder * const renc ) { - const bool carry = ( renc->low > 0xFFFFFFFFU ); - if( carry || renc->low < 0xFF000000U ) + if( renc->low >> 24 != 0xFF ) { + const bool carry = renc->low > 0xFFFFFFFFU; Cb_put_byte( &renc->cb, renc->cache + carry ); for( ; renc->ff_count > 0; --renc->ff_count ) Cb_put_byte( &renc->cb, 0xFF + carry ); @@ -329,42 +337,41 @@ static inline void Re_shift_low( struct Range_encoder * const renc ) renc->low = ( renc->low & 0x00FFFFFFU ) << 8; } -static inline void Re_reset( struct Range_encoder * const renc ) +static inline void Re_reset( Range_encoder * const renc, + const unsigned dictionary_size ) { - int i; Cb_reset( &renc->cb ); renc->low = 0; renc->partial_member_pos = 0; renc->range = 0xFFFFFFFFU; renc->ff_count = 0; renc->cache = 0; - for( i = 0; i < Fh_size; ++i ) - Cb_put_byte( &renc->cb, renc->header[i] ); + Lh_set_dictionary_size( renc->header, dictionary_size ); + int i; for( i = 0; i < Lh_size; ++i ) Cb_put_byte( &renc->cb, renc->header[i] ); } -static inline bool Re_init( struct Range_encoder * const renc, +static inline bool Re_init( Range_encoder * const renc, const unsigned dictionary_size, const unsigned min_free_bytes ) { if( !Cb_init( &renc->cb, 65536 + min_free_bytes ) ) return false; renc->min_free_bytes = min_free_bytes; - Fh_set_magic( renc->header ); - Fh_set_dictionary_size( renc->header, dictionary_size ); - Re_reset( renc ); + Lh_set_magic( renc->header ); + Re_reset( renc, dictionary_size ); return true; } -static inline void Re_free( struct Range_encoder * const renc ) +static inline void Re_free( Range_encoder * const renc ) { Cb_free( &renc->cb ); } static inline unsigned long long -Re_member_position( const struct Range_encoder * const renc ) +Re_member_position( const Range_encoder * const renc ) { return 
renc->partial_member_pos + Cb_used_bytes( &renc->cb ) + renc->ff_count; } -static inline bool Re_enough_free_bytes( const struct Range_encoder * const renc ) +static inline bool Re_enough_free_bytes( const Range_encoder * const renc ) { return Cb_free_bytes( &renc->cb ) >= renc->min_free_bytes + renc->ff_count; } -static inline int Re_read_data( struct Range_encoder * const renc, +static inline int Re_read_data( Range_encoder * const renc, uint8_t * const out_buffer, const int out_size ) { const int size = Cb_read_data( &renc->cb, out_buffer, out_size ); @@ -372,7 +379,7 @@ static inline int Re_read_data( struct Range_encoder * const renc, return size; } -static inline void Re_flush( struct Range_encoder * const renc ) +static inline void Re_flush( Range_encoder * const renc ) { int i; for( i = 0; i < 5; ++i ) Re_shift_low( renc ); renc->low = 0; @@ -381,21 +388,20 @@ static inline void Re_flush( struct Range_encoder * const renc ) renc->cache = 0; } -static inline void Re_encode( struct Range_encoder * const renc, +static inline void Re_encode( Range_encoder * const renc, const int symbol, const int num_bits ) { - int i; - for( i = num_bits - 1; i >= 0; --i ) + unsigned mask; + for( mask = 1 << ( num_bits - 1 ); mask > 0; mask >>= 1 ) { renc->range >>= 1; - if( (symbol >> i) & 1 ) renc->low += renc->range; - if( renc->range <= 0x00FFFFFFU ) - { renc->range <<= 8; Re_shift_low( renc ); } + if( symbol & mask ) renc->low += renc->range; + if( renc->range <= 0x00FFFFFFU ) { renc->range <<= 8; Re_shift_low( renc ); } } } -static inline void Re_encode_bit( struct Range_encoder * const renc, - Bit_model * const probability, const int bit ) +static inline void Re_encode_bit( Range_encoder * const renc, + Bit_model * const probability, const bool bit ) { const uint32_t bound = ( renc->range >> bit_model_total_bits ) * *probability; if( !bit ) @@ -409,76 +415,96 @@ static inline void Re_encode_bit( struct Range_encoder * const renc, renc->range -= bound; *probability -= 
*probability >> bit_model_move_bits; } - if( renc->range <= 0x00FFFFFFU ) - { renc->range <<= 8; Re_shift_low( renc ); } + if( renc->range <= 0x00FFFFFFU ) { renc->range <<= 8; Re_shift_low( renc ); } } -static inline void Re_encode_tree( struct Range_encoder * const renc, - Bit_model bm[], const int symbol, const int num_bits ) +static inline void Re_encode_tree3( Range_encoder * const renc, + Bit_model bm[], const int symbol ) + { + bool bit = ( symbol >> 2 ) & 1; + Re_encode_bit( renc, &bm[1], bit ); + int model = 2 | bit; + bit = ( symbol >> 1 ) & 1; + Re_encode_bit( renc, &bm[model], bit ); model <<= 1; model |= bit; + Re_encode_bit( renc, &bm[model], symbol & 1 ); + } + +static inline void Re_encode_tree6( Range_encoder * const renc, + Bit_model bm[], const unsigned symbol ) + { + bool bit = ( symbol >> 5 ) & 1; + Re_encode_bit( renc, &bm[1], bit ); + int model = 2 | bit; + bit = ( symbol >> 4 ) & 1; + Re_encode_bit( renc, &bm[model], bit ); model <<= 1; model |= bit; + bit = ( symbol >> 3 ) & 1; + Re_encode_bit( renc, &bm[model], bit ); model <<= 1; model |= bit; + bit = ( symbol >> 2 ) & 1; + Re_encode_bit( renc, &bm[model], bit ); model <<= 1; model |= bit; + bit = ( symbol >> 1 ) & 1; + Re_encode_bit( renc, &bm[model], bit ); model <<= 1; model |= bit; + Re_encode_bit( renc, &bm[model], symbol & 1 ); + } + +static inline void Re_encode_tree8( Range_encoder * const renc, + Bit_model bm[], const int symbol ) { - int mask = ( 1 << ( num_bits - 1 ) ); int model = 1; int i; - for( i = num_bits; i > 0; --i, mask >>= 1 ) + for( i = 7; i >= 0; --i ) { - const int bit = ( symbol & mask ); + const bool bit = ( symbol >> i ) & 1; Re_encode_bit( renc, &bm[model], bit ); - model <<= 1; - if( bit ) model |= 1; + model <<= 1; model |= bit; } } -static inline void Re_encode_tree_reversed( struct Range_encoder * const renc, - Bit_model bm[], int symbol, const int num_bits ) +static inline void Re_encode_tree_reversed( Range_encoder * const renc, + Bit_model bm[], int 
symbol, const int num_bits ) { int model = 1; int i; for( i = num_bits; i > 0; --i ) { - const int bit = symbol & 1; - Re_encode_bit( renc, &bm[model], bit ); - model = ( model << 1 ) | bit; + const bool bit = symbol & 1; symbol >>= 1; + Re_encode_bit( renc, &bm[model], bit ); + model <<= 1; model |= bit; } } -static inline void Re_encode_matched( struct Range_encoder * const renc, - Bit_model bm[], int symbol, - int match_byte ) +static inline void Re_encode_matched( Range_encoder * const renc, + Bit_model bm[], unsigned symbol, + unsigned match_byte ) { - int mask = 0x100; + unsigned mask = 0x100; symbol |= mask; - - do { - int match_bit, bit; - match_byte <<= 1; - match_bit = match_byte & mask; - symbol <<= 1; - bit = symbol & 0x100; - Re_encode_bit( renc, &bm[match_bit+(symbol>>9)+mask], bit ); - mask &= ~(match_byte ^ symbol); /* if( match_bit != bit ) mask = 0; */ + while( true ) + { + const unsigned match_bit = ( match_byte <<= 1 ) & mask; + const bool bit = ( symbol <<= 1 ) & 0x100; + Re_encode_bit( renc, &bm[(symbol>>9)+match_bit+mask], bit ); + if( symbol >= 0x10000 ) break; + mask &= ~(match_bit ^ symbol); /* if( match_bit != bit ) mask = 0; */ } - while( symbol < 0x10000 ); } -static inline void Re_encode_len( struct Range_encoder * const renc, - struct Len_model * const lm, +static inline void Re_encode_len( Range_encoder * const renc, + Len_model * const lm, int symbol, const int pos_state ) { - bool bit = ( ( symbol -= min_match_len ) >= len_low_symbols ); + bool bit = ( symbol -= min_match_len ) >= len_low_symbols; Re_encode_bit( renc, &lm->choice1, bit ); if( !bit ) - Re_encode_tree( renc, lm->bm_low[pos_state], symbol, len_low_bits ); + Re_encode_tree3( renc, lm->bm_low[pos_state], symbol ); else { - bit = ( symbol >= len_low_symbols + len_mid_symbols ); + bit = ( symbol -= len_low_symbols ) >= len_mid_symbols; Re_encode_bit( renc, &lm->choice2, bit ); if( !bit ) - Re_encode_tree( renc, lm->bm_mid[pos_state], - symbol - len_low_symbols, 
len_mid_bits ); + Re_encode_tree3( renc, lm->bm_mid[pos_state], symbol ); else - Re_encode_tree( renc, lm->bm_high, - symbol - len_low_symbols - len_mid_symbols, len_high_bits ); + Re_encode_tree8( renc, lm->bm_high, symbol - len_mid_symbols ); } } @@ -486,9 +512,9 @@ static inline void Re_encode_len( struct Range_encoder * const renc, enum { max_marker_size = 16, num_rep_distances = 4 }; /* must be 4 */ -struct LZ_encoder_base +typedef struct LZ_encoder_base { - struct Matchfinder_base mb; + Matchfinder_base mb; unsigned long long member_size_limit; uint32_t crc; @@ -500,28 +526,28 @@ struct LZ_encoder_base Bit_model bm_rep2[states]; Bit_model bm_len[states][pos_states]; Bit_model bm_dis_slot[len_states][1<mb, before, dict_size, after_size, dict_factor, + if( !Mb_init( &eb->mb, before_size, dict_size, after_size, dict_factor, num_prev_positions23, pos_array_factor ) ) return false; if( !Re_init( &eb->renc, eb->mb.dictionary_size, min_free_bytes ) ) return false; @@ -529,44 +555,40 @@ static inline bool LZeb_init( struct LZ_encoder_base * const eb, return true; } -static inline bool LZeb_member_finished( const struct LZ_encoder_base * const eb ) - { return ( eb->member_finished && !Cb_used_bytes( &eb->renc.cb ) ); } +static inline bool LZeb_member_finished( const LZ_encoder_base * const eb ) + { return eb->member_finished && Cb_empty( &eb->renc.cb ); } -static inline void LZeb_free( struct LZ_encoder_base * const eb ) +static inline void LZeb_free( LZ_encoder_base * const eb ) { Re_free( &eb->renc ); Mb_free( &eb->mb ); } -static inline unsigned LZeb_crc( const struct LZ_encoder_base * const eb ) +static inline unsigned LZeb_crc( const LZ_encoder_base * const eb ) { return eb->crc ^ 0xFFFFFFFFU; } -static inline int LZeb_price_literal( const struct LZ_encoder_base * const eb, - uint8_t prev_byte, uint8_t symbol ) - { return price_symbol( eb->bm_literal[get_lit_state(prev_byte)], symbol, 8 ); } +static inline int LZeb_price_literal( const LZ_encoder_base * const eb, 
+ const uint8_t prev_byte, const uint8_t symbol ) + { return price_symbol8( eb->bm_literal[get_lit_state(prev_byte)], symbol ); } -static inline int LZeb_price_matched( const struct LZ_encoder_base * const eb, - uint8_t prev_byte, uint8_t symbol, - uint8_t match_byte ) +static inline int LZeb_price_matched( const LZ_encoder_base * const eb, + const uint8_t prev_byte, const uint8_t symbol, const uint8_t match_byte ) { return price_matched( eb->bm_literal[get_lit_state(prev_byte)], symbol, match_byte ); } -static inline void LZeb_encode_literal( struct LZ_encoder_base * const eb, - uint8_t prev_byte, uint8_t symbol ) - { Re_encode_tree( &eb->renc, - eb->bm_literal[get_lit_state(prev_byte)], symbol, 8 ); } +static inline void LZeb_encode_literal( LZ_encoder_base * const eb, + const uint8_t prev_byte, const uint8_t symbol ) + { Re_encode_tree8( &eb->renc, eb->bm_literal[get_lit_state(prev_byte)], symbol ); } -static inline void LZeb_encode_matched( struct LZ_encoder_base * const eb, - uint8_t prev_byte, uint8_t symbol, - uint8_t match_byte ) +static inline void LZeb_encode_matched( LZ_encoder_base * const eb, + const uint8_t prev_byte, const uint8_t symbol, const uint8_t match_byte ) { Re_encode_matched( &eb->renc, eb->bm_literal[get_lit_state(prev_byte)], symbol, match_byte ); } -static inline void LZeb_encode_pair( struct LZ_encoder_base * const eb, +static inline void LZeb_encode_pair( LZ_encoder_base * const eb, const unsigned dis, const int len, const int pos_state ) { - const int dis_slot = get_slot( dis ); Re_encode_len( &eb->renc, &eb->match_len_model, len, pos_state ); - Re_encode_tree( &eb->renc, eb->bm_dis_slot[get_len_state(len)], dis_slot, - dis_slot_bits ); + const unsigned dis_slot = get_slot( dis ); + Re_encode_tree6( &eb->renc, eb->bm_dis_slot[get_len_state(len)], dis_slot ); if( dis_slot >= start_dis_model ) { @@ -575,7 +597,7 @@ static inline void LZeb_encode_pair( struct LZ_encoder_base * const eb, const unsigned direct_dis = dis - base; if( 
dis_slot < end_dis_model ) - Re_encode_tree_reversed( &eb->renc, eb->bm_dis + base - dis_slot - 1, + Re_encode_tree_reversed( &eb->renc, eb->bm_dis + ( base - dis_slot ), direct_dis, direct_bits ); else { diff --git a/fast_encoder.c b/fast_encoder.c index 03697cc..bd675bb 100644 --- a/fast_encoder.c +++ b/fast_encoder.c @@ -1,99 +1,79 @@ -/* Lzlib - Compression library for the lzip format - Copyright (C) 2009-2016 Antonio Diaz Diaz. +/* Lzlib - Compression library for the lzip format + Copyright (C) 2009-2025 Antonio Diaz Diaz. - This library is free software: you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation, either version 2 of the License, or - (at your option) any later version. + This library is free software. Redistribution and use in source and + binary forms, with or without modification, are permitted provided + that the following conditions are met: - This library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. + 1. Redistributions of source code must retain the above copyright + notice, this list of conditions, and the following disclaimer. - You should have received a copy of the GNU General Public License - along with this library. If not, see . + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions, and the following disclaimer in the + documentation and/or other materials provided with the distribution. - As a special exception, you may use this file as part of a free - software library without restriction. 
Specifically, if other files - instantiate templates or use macros or inline functions from this - file, or you compile this file and link it with other files to - produce an executable, this file does not by itself cause the - resulting executable to be covered by the GNU General Public - License. This exception does not however invalidate any other - reasons why the executable file might be covered by the GNU General - Public License. + This library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. */ -int FLZe_longest_match_len( struct FLZ_encoder * const fe, int * const distance ) +static int FLZe_longest_match_len( FLZ_encoder * const fe, int * const distance ) { enum { len_limit = 16 }; - const uint8_t * const data = Mb_ptr_to_current_pos( &fe->eb.mb ); int32_t * ptr0 = fe->eb.mb.pos_array + fe->eb.mb.cyclic_pos; - int32_t * newptr; - const int pos1 = fe->eb.mb.pos + 1; - int maxlen = 0; - int count, delta, newpos; - if( len_limit > Mb_available_bytes( &fe->eb.mb ) ) { *ptr0 = 0; return 0; } + const int available = min( Mb_available_bytes( &fe->eb.mb ), max_match_len ); + if( available < len_limit ) { *ptr0 = 0; return 0; } + const uint8_t * const data = Mb_ptr_to_current_pos( &fe->eb.mb ); fe->key4 = ( ( fe->key4 << 4 ) ^ data[3] ) & fe->eb.mb.key4_mask; - newpos = fe->eb.mb.prev_positions[fe->key4]; + const int pos1 = fe->eb.mb.pos + 1; + int newpos1 = fe->eb.mb.prev_positions[fe->key4]; fe->eb.mb.prev_positions[fe->key4] = pos1; + int maxlen = 0, count; for( count = 4; ; ) { - if( --count < 0 || newpos <= 0 ) { *ptr0 = 0; break; } - delta = pos1 - newpos; - if( delta > fe->eb.mb.dictionary_size ) { *ptr0 = 0; break; } - newptr = fe->eb.mb.pos_array + + int delta; + if( newpos1 <= 0 || --count < 0 || + ( delta = pos1 - newpos1 ) > fe->eb.mb.dictionary_size ) + { *ptr0 = 0; break; } + int32_t * const newptr = fe->eb.mb.pos_array + ( 
fe->eb.mb.cyclic_pos - delta + ( ( fe->eb.mb.cyclic_pos >= delta ) ? 0 : fe->eb.mb.dictionary_size + 1 ) ); if( data[maxlen-delta] == data[maxlen] ) { int len = 0; - while( len < len_limit && data[len-delta] == data[len] ) ++len; - if( maxlen < len ) { maxlen = len; *distance = delta - 1; } + while( len < available && data[len-delta] == data[len] ) ++len; + if( maxlen < len ) + { maxlen = len; *distance = delta - 1; + if( maxlen >= len_limit ) { *ptr0 = *newptr; break; } } } - if( maxlen < len_limit ) - { - *ptr0 = newpos; - ptr0 = newptr; - newpos = *ptr0; - } - else - { - *ptr0 = *newptr; - maxlen += Mb_true_match_len( &fe->eb.mb, maxlen, *distance + 1, - max_match_len - maxlen ); - break; - } + *ptr0 = newpos1; + ptr0 = newptr; + newpos1 = *ptr0; } return maxlen; } -bool FLZe_encode_member( struct FLZ_encoder * const fe ) +static bool FLZe_encode_member( FLZ_encoder * const fe ) { int rep = 0, i; State * const state = &fe->eb.state; if( fe->eb.member_finished ) return true; if( Re_member_position( &fe->eb.renc ) >= fe->eb.member_size_limit ) - { - if( LZeb_full_flush( &fe->eb ) ) fe->eb.member_finished = true; - return true; - } + { LZeb_try_full_flush( &fe->eb ); return true; } if( Mb_data_position( &fe->eb.mb ) == 0 && !Mb_data_finished( &fe->eb.mb ) ) /* encode first byte */ { - const uint8_t prev_byte = 0; - uint8_t cur_byte; if( !Mb_enough_available_bytes( &fe->eb.mb ) || !Re_enough_free_bytes( &fe->eb.renc ) ) return true; - cur_byte = Mb_peek( &fe->eb.mb, 0 ); + const uint8_t prev_byte = 0; + const uint8_t cur_byte = Mb_peek( &fe->eb.mb, 0 ); Re_encode_bit( &fe->eb.renc, &fe->eb.bm_match[*state][0], 0 ); LZeb_encode_literal( &fe->eb, prev_byte, cur_byte ); CRC32_update_byte( &fe->eb.crc, cur_byte ); @@ -104,17 +84,16 @@ bool FLZe_encode_member( struct FLZ_encoder * const fe ) while( !Mb_data_finished( &fe->eb.mb ) && Re_member_position( &fe->eb.renc ) < fe->eb.member_size_limit ) { - int match_distance; - int main_len, pos_state, len = 0; if( 
!Mb_enough_available_bytes( &fe->eb.mb ) || !Re_enough_free_bytes( &fe->eb.renc ) ) return true; - main_len = FLZe_longest_match_len( fe, &match_distance ); - pos_state = Mb_data_position( &fe->eb.mb ) & pos_state_mask; + int match_distance = 0; /* avoid warning from gcc 6.1.0 */ + const int main_len = FLZe_longest_match_len( fe, &match_distance ); + const int pos_state = Mb_data_position( &fe->eb.mb ) & pos_state_mask; + int len = 0; for( i = 0; i < num_rep_distances; ++i ) { - const int tlen = Mb_true_match_len( &fe->eb.mb, 0, - fe->eb.reps[i] + 1, max_match_len ); + const int tlen = Mb_true_match_len( &fe->eb.mb, 0, fe->eb.reps[i] + 1 ); if( tlen > len ) { len = tlen; rep = i; } } if( len > min_match_len && len + 3 > main_len ) @@ -127,11 +106,10 @@ bool FLZe_encode_member( struct FLZ_encoder * const fe ) Re_encode_bit( &fe->eb.renc, &fe->eb.bm_len[*state][pos_state], 1 ); else { - int distance; Re_encode_bit( &fe->eb.renc, &fe->eb.bm_rep1[*state], rep > 1 ); if( rep > 1 ) Re_encode_bit( &fe->eb.renc, &fe->eb.bm_rep2[*state], rep > 2 ); - distance = fe->eb.reps[rep]; + const int distance = fe->eb.reps[rep]; for( i = rep; i > 0; --i ) fe->eb.reps[i] = fe->eb.reps[i-1]; fe->eb.reps[0] = distance; } @@ -156,7 +134,6 @@ bool FLZe_encode_member( struct FLZ_encoder * const fe ) continue; } - { const uint8_t prev_byte = Mb_peek( &fe->eb.mb, 1 ); const uint8_t cur_byte = Mb_peek( &fe->eb.mb, 0 ); const uint8_t match_byte = Mb_peek( &fe->eb.mb, fe->eb.reps[0] + 1 ); @@ -165,36 +142,34 @@ bool FLZe_encode_member( struct FLZ_encoder * const fe ) if( match_byte == cur_byte ) { - const int short_rep_price = price1( fe->eb.bm_match[*state][pos_state] ) + - price1( fe->eb.bm_rep[*state] ) + - price0( fe->eb.bm_rep0[*state] ) + - price0( fe->eb.bm_len[*state][pos_state] ); + const int shortrep_price = price1( fe->eb.bm_match[*state][pos_state] ) + + price1( fe->eb.bm_rep[*state] ) + + price0( fe->eb.bm_rep0[*state] ) + + price0( fe->eb.bm_len[*state][pos_state] ); int price = 
price0( fe->eb.bm_match[*state][pos_state] ); if( St_is_char( *state ) ) price += LZeb_price_literal( &fe->eb, prev_byte, cur_byte ); else price += LZeb_price_matched( &fe->eb, prev_byte, cur_byte, match_byte ); - if( short_rep_price < price ) + if( shortrep_price < price ) { Re_encode_bit( &fe->eb.renc, &fe->eb.bm_match[*state][pos_state], 1 ); Re_encode_bit( &fe->eb.renc, &fe->eb.bm_rep[*state], 1 ); Re_encode_bit( &fe->eb.renc, &fe->eb.bm_rep0[*state], 0 ); Re_encode_bit( &fe->eb.renc, &fe->eb.bm_len[*state][pos_state], 0 ); - *state = St_set_short_rep( *state ); + *state = St_set_shortrep( *state ); continue; } } /* literal byte */ Re_encode_bit( &fe->eb.renc, &fe->eb.bm_match[*state][pos_state], 0 ); - if( St_is_char( *state ) ) + if( ( *state = St_set_char( *state ) ) < 4 ) LZeb_encode_literal( &fe->eb, prev_byte, cur_byte ); else LZeb_encode_matched( &fe->eb, prev_byte, cur_byte, match_byte ); - *state = St_set_char( *state ); - } } - if( LZeb_full_flush( &fe->eb ) ) fe->eb.member_finished = true; + LZeb_try_full_flush( &fe->eb ); return true; } diff --git a/fast_encoder.h b/fast_encoder.h index 9d9e1c7..bce1b26 100644 --- a/fast_encoder.h +++ b/fast_encoder.h @@ -1,37 +1,29 @@ -/* Lzlib - Compression library for the lzip format - Copyright (C) 2009-2016 Antonio Diaz Diaz. +/* Lzlib - Compression library for the lzip format + Copyright (C) 2009-2025 Antonio Diaz Diaz. - This library is free software: you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation, either version 2 of the License, or - (at your option) any later version. + This library is free software. 
Redistribution and use in source and + binary forms, with or without modification, are permitted provided + that the following conditions are met: - This library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. + 1. Redistributions of source code must retain the above copyright + notice, this list of conditions, and the following disclaimer. - You should have received a copy of the GNU General Public License - along with this library. If not, see . + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions, and the following disclaimer in the + documentation and/or other materials provided with the distribution. - As a special exception, you may use this file as part of a free - software library without restriction. Specifically, if other files - instantiate templates or use macros or inline functions from this - file, or you compile this file and link it with other files to - produce an executable, this file does not by itself cause the - resulting executable to be covered by the GNU General Public - License. This exception does not however invalidate any other - reasons why the executable file might be covered by the GNU General - Public License. + This library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
*/ -struct FLZ_encoder +typedef struct FLZ_encoder { - struct LZ_encoder_base eb; - int key4; /* key made from latest 4 bytes */ - }; + LZ_encoder_base eb; + unsigned key4; /* key made from latest 4 bytes */ + } FLZ_encoder; -static inline void FLZe_reset_key4( struct FLZ_encoder * const fe ) +static inline void FLZe_reset_key4( FLZ_encoder * const fe ) { int i; fe->key4 = 0; @@ -39,44 +31,40 @@ static inline void FLZe_reset_key4( struct FLZ_encoder * const fe ) fe->key4 = ( fe->key4 << 4 ) ^ fe->eb.mb.buffer[i]; } -int FLZe_longest_match_len( struct FLZ_encoder * const fe, int * const distance ); - -static inline bool FLZe_update_and_move( struct FLZ_encoder * const fe, int n ) +static inline bool FLZe_update_and_move( FLZ_encoder * const fe, int n ) { + Matchfinder_base * const mb = &fe->eb.mb; while( --n >= 0 ) { - if( Mb_available_bytes( &fe->eb.mb ) >= 4 ) + if( Mb_available_bytes( mb ) >= 4 ) { - int newpos; - fe->key4 = ( ( fe->key4 << 4 ) ^ fe->eb.mb.buffer[fe->eb.mb.pos+3] ) & - fe->eb.mb.key4_mask; - newpos = fe->eb.mb.prev_positions[fe->key4]; - fe->eb.mb.prev_positions[fe->key4] = fe->eb.mb.pos + 1; - fe->eb.mb.pos_array[fe->eb.mb.cyclic_pos] = newpos; + fe->key4 = ( ( fe->key4 << 4 ) ^ mb->buffer[mb->pos+3] ) & mb->key4_mask; + mb->pos_array[mb->cyclic_pos] = mb->prev_positions[fe->key4]; + mb->prev_positions[fe->key4] = mb->pos + 1; } - else fe->eb.mb.pos_array[fe->eb.mb.cyclic_pos] = 0; - if( !Mb_move_pos( &fe->eb.mb ) ) return false; + else mb->pos_array[mb->cyclic_pos] = 0; + if( !Mb_move_pos( mb ) ) return false; } return true; } -static inline bool FLZe_init( struct FLZ_encoder * const fe, +static inline bool FLZe_init( FLZ_encoder * const fe, const unsigned long long member_size ) { - enum { before = 0, + enum { before_size = 0, dict_size = 65536, /* bytes to keep in buffer after pos */ after_size = max_match_len, dict_factor = 16, + min_free_bytes = max_marker_size, num_prev_positions23 = 0, - pos_array_factor = 1, - min_free_bytes = 
max_marker_size }; + pos_array_factor = 1 }; - return LZeb_init( &fe->eb, before, dict_size, after_size, dict_factor, + return LZeb_init( &fe->eb, before_size, dict_size, after_size, dict_factor, num_prev_positions23, pos_array_factor, min_free_bytes, member_size ); } -static inline void FLZe_reset( struct FLZ_encoder * const fe, +static inline void FLZe_reset( FLZ_encoder * const fe, const unsigned long long member_size ) { LZeb_reset( &fe->eb, member_size ); } diff --git a/ffexample.c b/ffexample.c new file mode 100644 index 0000000..9f313ae --- /dev/null +++ b/ffexample.c @@ -0,0 +1,298 @@ +/* File to file example - Test program for the library lzlib + Copyright (C) 2010-2025 Antonio Diaz Diaz. + + This program is free software: you have unlimited permission + to copy, distribute, and modify it. + + Try 'ffexample -h' for usage information. + + This program is an example of how file-to-file + compression/decompression can be implemented using lzlib. +*/ + +#define _FILE_OFFSET_BITS 64 + +#include +#include +#include +#include +#include +#include +#include +#include +#if defined __MSVCRT__ || defined __OS2__ || defined __DJGPP__ +#include +#include +#endif + +#include "lzlib.h" + +#ifndef min + #define min(x,y) ((x) <= (y) ? (x) : (y)) +#endif + + +static void show_help( void ) + { + printf( "ffexample is an example program showing how file-to-file (de)compression can\n" + "be implemented using lzlib. 
The content of infile is compressed,\n" + "decompressed, or both, and then written to outfile.\n" + "\nUsage: ffexample operation [infile [outfile]]\n" ); + printf( "\nOperation:\n" + " -h display this help and exit\n" + " -c compress infile to outfile\n" + " -d decompress infile to outfile\n" + " -b both (compress then decompress) infile to outfile\n" + " -m compress (multimember) infile to outfile\n" + " -l compress (1 member per line) infile to outfile\n" + " -r decompress with resync if data error or leading garbage\n" + "\nIf infile or outfile are omitted, or are specified as '-', standard input or\n" + "standard output are used in their place respectively.\n" + "\nReport bugs to lzip-bug@nongnu.org\n" + "Lzlib home page: http://www.nongnu.org/lzip/lzlib.html\n" ); + } + + +int ffcompress( LZ_Encoder * const encoder, + FILE * const infile, FILE * const outfile ) + { + enum { buffer_size = 16384 }; + uint8_t buffer[buffer_size]; + while( true ) + { + int len, ret; + int size = min( buffer_size, LZ_compress_write_size( encoder ) ); + if( size > 0 ) + { + len = fread( buffer, 1, size, infile ); + ret = LZ_compress_write( encoder, buffer, len ); + if( ret < 0 || ferror( infile ) ) break; + if( feof( infile ) ) LZ_compress_finish( encoder ); + } + ret = LZ_compress_read( encoder, buffer, buffer_size ); + if( ret < 0 ) break; + len = fwrite( buffer, 1, ret, outfile ); + if( len < ret ) break; + if( LZ_compress_finished( encoder ) == 1 ) return 0; + } + return 1; + } + + +int ffdecompress( LZ_Decoder * const decoder, + FILE * const infile, FILE * const outfile ) + { + enum { buffer_size = 16384 }; + uint8_t buffer[buffer_size]; + while( true ) + { + int len, ret; + int size = min( buffer_size, LZ_decompress_write_size( decoder ) ); + if( size > 0 ) + { + len = fread( buffer, 1, size, infile ); + ret = LZ_decompress_write( decoder, buffer, len ); + if( ret < 0 || ferror( infile ) ) break; + if( feof( infile ) ) LZ_decompress_finish( decoder ); + } + ret = 
LZ_decompress_read( decoder, buffer, buffer_size ); + if( ret < 0 ) break; + len = fwrite( buffer, 1, ret, outfile ); + if( len < ret ) break; + if( LZ_decompress_finished( decoder ) == 1 ) return 0; + } + return 1; + } + + +int ffboth( LZ_Encoder * const encoder, LZ_Decoder * const decoder, + FILE * const infile, FILE * const outfile ) + { + enum { buffer_size = 16384 }; + uint8_t buffer[buffer_size]; + while( true ) + { + int len, ret; + int size = min( buffer_size, LZ_compress_write_size( encoder ) ); + if( size > 0 ) + { + len = fread( buffer, 1, size, infile ); + ret = LZ_compress_write( encoder, buffer, len ); + if( ret < 0 || ferror( infile ) ) break; + if( feof( infile ) ) LZ_compress_finish( encoder ); + } + size = min( buffer_size, LZ_decompress_write_size( decoder ) ); + if( size > 0 ) + { + ret = LZ_compress_read( encoder, buffer, size ); + if( ret < 0 ) break; + ret = LZ_decompress_write( decoder, buffer, ret ); + if( ret < 0 ) break; + if( LZ_compress_finished( encoder ) == 1 ) + LZ_decompress_finish( decoder ); + } + ret = LZ_decompress_read( decoder, buffer, buffer_size ); + if( ret < 0 ) break; + len = fwrite( buffer, 1, ret, outfile ); + if( len < ret ) break; + if( LZ_decompress_finished( decoder ) == 1 ) return 0; + } + return 1; + } + + +int ffmmcompress( FILE * const infile, FILE * const outfile ) + { + enum { buffer_size = 16384, member_size = 4096 }; + uint8_t buffer[buffer_size]; + bool done = false; + LZ_Encoder * const encoder = LZ_compress_open( 65535, 16, member_size ); + if( !encoder || LZ_compress_errno( encoder ) != LZ_ok ) + { fputs( "ffexample: Not enough memory.\n", stderr ); + LZ_compress_close( encoder ); return 1; } + while( true ) + { + int len, ret; + int size = min( buffer_size, LZ_compress_write_size( encoder ) ); + if( size > 0 ) + { + len = fread( buffer, 1, size, infile ); + ret = LZ_compress_write( encoder, buffer, len ); + if( ret < 0 || ferror( infile ) ) break; + if( feof( infile ) ) LZ_compress_finish( encoder ); + 
} + ret = LZ_compress_read( encoder, buffer, buffer_size ); + if( ret < 0 ) break; + len = fwrite( buffer, 1, ret, outfile ); + if( len < ret ) break; + if( LZ_compress_member_finished( encoder ) == 1 ) + { + if( LZ_compress_finished( encoder ) == 1 ) { done = true; break; } + if( LZ_compress_restart_member( encoder, member_size ) < 0 ) break; + } + } + if( LZ_compress_close( encoder ) < 0 ) done = false; + return done; + } + + +/* Compress 'infile' to 'outfile' as a multimember stream with one member + for each line of text terminated by a newline character or by EOF. + Return 0 if success, 1 if error. +*/ +int fflfcompress( LZ_Encoder * const encoder, + FILE * const infile, FILE * const outfile ) + { + enum { buffer_size = 16384 }; + uint8_t buffer[buffer_size]; + while( true ) + { + int len, ret; + int size = min( buffer_size, LZ_compress_write_size( encoder ) ); + if( size > 0 ) + { + for( len = 0; len < size; ) + { + int ch = getc( infile ); + if( ch == EOF || ( buffer[len++] = ch ) == '\n' ) break; + } + /* avoid writing an empty member to outfile */ + if( len == 0 && LZ_compress_data_position( encoder ) == 0 ) return 0; + ret = LZ_compress_write( encoder, buffer, len ); + if( ret < 0 || ferror( infile ) ) break; + if( feof( infile ) || buffer[len-1] == '\n' ) + LZ_compress_finish( encoder ); + } + ret = LZ_compress_read( encoder, buffer, buffer_size ); + if( ret < 0 ) break; + len = fwrite( buffer, 1, ret, outfile ); + if( len < ret ) break; + if( LZ_compress_member_finished( encoder ) == 1 ) + { + if( feof( infile ) && LZ_compress_finished( encoder ) == 1 ) return 0; + if( LZ_compress_restart_member( encoder, INT64_MAX ) < 0 ) break; + } + } + return 1; + } + + +/* Decompress 'infile' to 'outfile' with automatic resynchronization to + next member in case of data error, including the automatic removal of + leading garbage. 
+*/ +int ffrsdecompress( LZ_Decoder * const decoder, + FILE * const infile, FILE * const outfile ) + { + enum { buffer_size = 16384 }; + uint8_t buffer[buffer_size]; + while( true ) + { + int len, ret; + int size = min( buffer_size, LZ_decompress_write_size( decoder ) ); + if( size > 0 ) + { + len = fread( buffer, 1, size, infile ); + ret = LZ_decompress_write( decoder, buffer, len ); + if( ret < 0 || ferror( infile ) ) break; + if( feof( infile ) ) LZ_decompress_finish( decoder ); + } + ret = LZ_decompress_read( decoder, buffer, buffer_size ); + if( ret < 0 ) + { + if( LZ_decompress_errno( decoder ) == LZ_header_error || + LZ_decompress_errno( decoder ) == LZ_data_error ) + { LZ_decompress_sync_to_member( decoder ); continue; } + break; + } + len = fwrite( buffer, 1, ret, outfile ); + if( len < ret ) break; + if( LZ_decompress_finished( decoder ) == 1 ) return 0; + } + return 1; + } + + +int main( const int argc, const char * const argv[] ) + { +#if defined __MSVCRT__ || defined __OS2__ || defined __DJGPP__ + setmode( STDIN_FILENO, O_BINARY ); + setmode( STDOUT_FILENO, O_BINARY ); +#endif + + LZ_Encoder * const encoder = LZ_compress_open( 65535, 16, INT64_MAX ); + LZ_Decoder * const decoder = LZ_decompress_open(); + FILE * const infile = (argc >= 3 && strcmp( argv[2], "-" ) != 0) ? + fopen( argv[2], "rb" ) : stdin; + FILE * const outfile = (argc >= 4 && strcmp( argv[3], "-" ) != 0) ? 
+ fopen( argv[3], "wb" ) : stdout; + int retval; + + if( argc < 2 || argc > 4 || strlen( argv[1] ) != 2 || argv[1][0] != '-' ) + { show_help(); return 1; } + if( !encoder || LZ_compress_errno( encoder ) != LZ_ok || + !decoder || LZ_decompress_errno( decoder ) != LZ_ok ) + { fputs( "ffexample: Not enough memory.\n", stderr ); + LZ_compress_close( encoder ); LZ_decompress_close( decoder ); return 1; } + if( !infile ) + { fprintf( stderr, "ffexample: %s: Can't open input file: %s\n", + argv[2], strerror( errno ) ); return 1; } + if( !outfile ) + { fprintf( stderr, "ffexample: %s: Can't open output file: %s\n", + argv[3], strerror( errno ) ); return 1; } + + switch( argv[1][1] ) + { + case 'c': retval = ffcompress( encoder, infile, outfile ); break; + case 'd': retval = ffdecompress( decoder, infile, outfile ); break; + case 'b': retval = ffboth( encoder, decoder, infile, outfile ); break; + case 'm': retval = ffmmcompress( infile, outfile ); break; + case 'l': retval = fflfcompress( encoder, infile, outfile ); break; + case 'r': retval = ffrsdecompress( decoder, infile, outfile ); break; + default: show_help(); return argv[1][1] != 'h'; + } + + if( LZ_decompress_close( decoder ) < 0 || LZ_compress_close( encoder ) < 0 || + fclose( outfile ) != 0 || fclose( infile ) != 0 ) retval = 1; + return retval; + } diff --git a/lzcheck.c b/lzcheck.c index b9ba11b..86ce87d 100644 --- a/lzcheck.c +++ b/lzcheck.c @@ -1,239 +1,398 @@ -/* Lzcheck - Test program for the lzlib library - Copyright (C) 2009-2016 Antonio Diaz Diaz. +/* Lzcheck - Test program for the library lzlib + Copyright (C) 2009-2025 Antonio Diaz Diaz. - This program is free software: you have unlimited permission - to copy, distribute and modify it. + This program is free software: you have unlimited permission + to copy, distribute, and modify it. - Usage is: - lzcheck filename.txt + Usage: lzcheck [-m|-s] filename.txt... 
- This program reads the specified text file and then compresses it, - line by line, to test the flushing mechanism and the member - restart/reset/sync functions. + This program reads each text file specified and then compresses it, + line by line, to test the flushing mechanism and the member + restart/reset/sync functions. */ #define _FILE_OFFSET_BITS 64 +#include #include #include #include #include #include #include +#include #include "lzlib.h" -#ifndef min - #define min(x,y) ((x) <= (y) ? (x) : (y)) -#endif - -enum { buffer_size = 32768 }; +const unsigned long long member_size = INT64_MAX; +enum { buffer_size = 32749 }; /* largest prime < 32768 */ uint8_t in_buffer[buffer_size]; uint8_t mid_buffer[buffer_size]; uint8_t out_buffer[buffer_size]; -int lzcheck( FILE * const file, const int dictionary_size ) +static void show_line( const uint8_t * const buffer, const int size ) + { + int i; + for( i = 0; i < size; ++i ) + fputc( isprint( buffer[i] ) ? buffer[i] : '.', stderr ); + fputc( '\n', stderr ); + } + + +static LZ_Encoder * xopen_encoder( const int dictionary_size ) { const int match_len_limit = 16; - const unsigned long long member_size = 0x7FFFFFFFFFFFFFFFULL; /* INT64_MAX */ - struct LZ_Encoder * encoder; - struct LZ_Decoder * decoder; - int retval = 0; - - encoder = LZ_compress_open( dictionary_size, match_len_limit, member_size ); + LZ_Encoder * const encoder = + LZ_compress_open( dictionary_size, match_len_limit, member_size ); if( !encoder || LZ_compress_errno( encoder ) != LZ_ok ) { - const bool mem_error = ( LZ_compress_errno( encoder ) == LZ_mem_error ); + const bool bad_arg = + encoder && ( LZ_compress_errno( encoder ) == LZ_bad_argument ); LZ_compress_close( encoder ); - if( mem_error ) + if( bad_arg ) { - fputs( "lzcheck: Not enough memory.\n", stderr ); - return 1; + fputs( "lzcheck: internal error: Invalid argument to encoder.\n", stderr ); + exit( 3 ); } - fputs( "lzcheck: internal error: Invalid argument to encoder.\n", stderr ); - return 3; 
+ fputs( "lzcheck: Not enough memory.\n", stderr ); + exit( 1 ); } + return encoder; + } - decoder = LZ_decompress_open(); + +static LZ_Decoder * xopen_decoder( void ) + { + LZ_Decoder * const decoder = LZ_decompress_open(); if( !decoder || LZ_decompress_errno( decoder ) != LZ_ok ) { LZ_decompress_close( decoder ); fputs( "lzcheck: Not enough memory.\n", stderr ); - return 1; + exit( 1 ); + } + return decoder; + } + + +static void xclose_encoder( LZ_Encoder * const encoder, const bool finish ) + { + if( finish ) + { + unsigned long long size = 0; + LZ_compress_finish( encoder ); + while( true ) + { + const int rd = LZ_compress_read( encoder, mid_buffer, buffer_size ); + if( rd < 0 ) + { + fprintf( stderr, "lzcheck: xclose: LZ_compress_read error: %s\n", + LZ_strerror( LZ_compress_errno( encoder ) ) ); + exit( 3 ); + } + size += rd; + if( LZ_compress_finished( encoder ) == 1 ) break; + } + if( size > 0 ) + { + fprintf( stderr, "lzcheck: %lld bytes remain in encoder.\n", size ); + exit( 3 ); + } + } + if( LZ_compress_close( encoder ) < 0 ) exit( 1 ); + } + + +static void xclose_decoder( LZ_Decoder * const decoder, const bool finish ) + { + if( finish ) + { + unsigned long long size = 0; + LZ_decompress_finish( decoder ); + while( true ) + { + const int rd = LZ_decompress_read( decoder, out_buffer, buffer_size ); + if( rd < 0 ) + { + fprintf( stderr, "lzcheck: xclose: LZ_decompress_read error: %s\n", + LZ_strerror( LZ_decompress_errno( decoder ) ) ); + exit( 3 ); + } + size += rd; + if( LZ_decompress_finished( decoder ) == 1 ) break; + } + if( size > 0 ) + { + fprintf( stderr, "lzcheck: %lld bytes remain in decoder.\n", size ); + exit( 3 ); + } + } + if( LZ_decompress_close( decoder ) < 0 ) exit( 1 ); + } + + +/* Return the next (usually newline-terminated) chunk of data from file. + The size returned in *sizep is always <= buffer_size. + If sizep is a null pointer, rewind the file, reset state, and return. + If file is at EOF, return an empty line. 
+*/ +static const uint8_t * next_line( FILE * const file, int * const sizep ) + { + static int l = 0; + static int read_size = 0; + int r; + + if( !sizep ) { rewind( file ); l = read_size = 0; return in_buffer; } + if( l >= read_size ) + { + l = 0; read_size = fread( in_buffer, 1, buffer_size, file ); + if( l >= read_size ) { *sizep = 0; return in_buffer; } /* end of file */ } - while( retval <= 1 ) - { - int i, l, r; - const int read_size = fread( in_buffer, 1, buffer_size, file ); - if( read_size <= 0 ) break; /* end of file */ + for( r = l + 1; r < read_size && in_buffer[r-1] != '\n'; ++r ); + *sizep = r - l; l = r; + return in_buffer + l - *sizep; + } - for( l = 0, r = 1; r <= read_size; l = r, ++r ) + +static int check_sync_flush( FILE * const file, const int dictionary_size ) + { + LZ_Encoder * const encoder = xopen_encoder( dictionary_size ); + LZ_Decoder * const decoder = xopen_decoder(); + int retval = 0; + + while( retval <= 1 ) /* test LZ_compress_sync_flush */ + { + int in_size, mid_size, out_size; + int line_size; + const uint8_t * const line_buf = next_line( file, &line_size ); + if( line_size <= 0 ) break; /* end of file */ + + in_size = LZ_compress_write( encoder, line_buf, line_size ); + if( in_size < 0 ) { - int in_size, mid_size, out_size; - while( r < read_size && in_buffer[r-1] != '\n' ) ++r; - in_size = LZ_compress_write( encoder, in_buffer + l, r - l ); - if( in_size < r - l ) r = l + in_size; - LZ_compress_sync_flush( encoder ); + fprintf( stderr, "lzcheck: LZ_compress_write error: %s\n", + LZ_strerror( LZ_compress_errno( encoder ) ) ); + retval = 3; break; + } + if( in_size < line_size ) + { + fprintf( stderr, "lzcheck: sync: LZ_compress_write only accepted %d " + "of %d bytes\n", in_size, line_size ); mid_size = LZ_compress_read( encoder, mid_buffer, buffer_size ); - if( mid_size < 0 ) + const int wr = + LZ_compress_write( encoder, line_buf + in_size, line_size - in_size ); + if( wr < 0 ) { - fprintf( stderr, "lzcheck: LZ_compress_read 
error: %s\n", + fprintf( stderr, "lzcheck: LZ_compress_write error: %s\n", LZ_strerror( LZ_compress_errno( encoder ) ) ); retval = 3; break; } - LZ_decompress_write( decoder, mid_buffer, mid_size ); - out_size = LZ_decompress_read( decoder, out_buffer, buffer_size ); - if( out_size < 0 ) + if( wr + in_size != line_size ) { - fprintf( stderr, "lzcheck: LZ_decompress_read error: %s\n", - LZ_strerror( LZ_decompress_errno( decoder ) ) ); + fprintf( stderr, "lzcheck: sync: LZ_compress_write only accepted %d " + "of %d remaining bytes\n", wr, line_size - in_size ); retval = 3; break; } - - if( out_size != in_size || memcmp( in_buffer + l, out_buffer, out_size ) ) + in_size += wr; + LZ_compress_sync_flush( encoder ); + const int rd = LZ_compress_read( encoder, mid_buffer + mid_size, + buffer_size - mid_size ); + if( rd > 0 ) mid_size += rd; + else if( rd < 0 ) mid_size = -1; + } + else + { + LZ_compress_sync_flush( encoder ); + if( line_buf[0] & 1 ) /* read all data at once or byte by byte */ + mid_size = LZ_compress_read( encoder, mid_buffer, buffer_size ); + else for( mid_size = 0; mid_size < buffer_size; ) { - fprintf( stderr, "lzcheck: Sync error at pos %d in_size = %d, out_size = %d\n", - l, in_size, out_size ); - for( i = 0; i < in_size; ++i ) - fputc( in_buffer[l+i], stderr ); - if( in_buffer[l+in_size-1] != '\n' ) - fputc( '\n', stderr ); - for( i = 0; i < out_size; ++i ) - fputc( out_buffer[i], stderr ); - fputc( '\n', stderr ); - retval = 1; + const int rd = LZ_compress_read( encoder, mid_buffer + mid_size, 1 ); + if( rd > 0 ) mid_size += rd; + else { if( rd < 0 ) { mid_size = -1; } break; } } } + if( mid_size < 0 ) + { + fprintf( stderr, "lzcheck: LZ_compress_read error: %s\n", + LZ_strerror( LZ_compress_errno( encoder ) ) ); + retval = 3; break; + } + LZ_decompress_write( decoder, mid_buffer, mid_size ); + out_size = LZ_decompress_read( decoder, out_buffer, buffer_size ); + if( out_size < 0 ) + { + fprintf( stderr, "lzcheck: LZ_decompress_read error: %s\n", + 
LZ_strerror( LZ_decompress_errno( decoder ) ) ); + retval = 3; break; + } + + if( out_size != in_size || memcmp( line_buf, out_buffer, out_size ) ) + { + fprintf( stderr, "lzcheck: LZ_compress_sync_flush error: " + "in_size = %d, out_size = %d\n", in_size, out_size ); + show_line( line_buf, in_size ); + show_line( out_buffer, out_size ); + retval = 1; + } } if( retval <= 1 ) { - rewind( file ); + int rd = 0; if( LZ_compress_finish( encoder ) < 0 || - LZ_decompress_write( decoder, mid_buffer, LZ_compress_read( encoder, mid_buffer, buffer_size ) ) < 0 || - LZ_decompress_read( decoder, out_buffer, buffer_size ) != 0 || - LZ_compress_restart_member( encoder, member_size ) < 0 ) + ( rd = LZ_compress_read( encoder, mid_buffer, buffer_size ) ) < 0 ) { - fprintf( stderr, "lzcheck: Can't finish member: %s\n", - LZ_strerror( LZ_decompress_errno( decoder ) ) ); + fprintf( stderr, "lzcheck: Can't drain encoder: %s\n", + LZ_strerror( LZ_compress_errno( encoder ) ) ); retval = 3; } + LZ_decompress_write( decoder, mid_buffer, rd ); } - while( retval <= 1 ) - { - int i, l, r, size; - const int read_size = fread( in_buffer, 1, buffer_size / 2, file ); - if( read_size <= 0 ) break; /* end of file */ + xclose_decoder( decoder, retval == 0 ); + xclose_encoder( encoder, retval == 0 ); + return retval; + } - for( l = 0, r = 1; r <= read_size; l = r, ++r ) + +/* Test member by member decompression without calling LZ_decompress_finish, + inserting leading garbage before some members, and resetting the + decompressor sometimes. Test that the increase in total_in_size when + syncing to member is equal to the size of the leading garbage skipped. 
+*/ +static int check_members( FILE * const file, const int dictionary_size ) + { + LZ_Encoder * const encoder = xopen_encoder( dictionary_size ); + LZ_Decoder * const decoder = xopen_decoder(); + int retval = 0; + + while( retval <= 1 ) /* test LZ_compress_restart_member */ + { + unsigned long long garbage_begin = 0; /* avoid warning from gcc 3.3.6 */ + int leading_garbage, in_size, mid_size, out_size; + int line_size; + const uint8_t * const line_buf = next_line( file, &line_size ); + if( line_size <= 0 && /* end of file, write at least 1 member */ + LZ_decompress_total_in_size( decoder ) != 0 ) break; + + if( LZ_compress_finished( encoder ) == 1 ) { - int leading_garbage, in_size, mid_size, out_size; - while( r < read_size && in_buffer[r-1] != '\n' ) ++r; - leading_garbage = (l == 0) ? min( r, read_size / 2 ) : 0; - in_size = LZ_compress_write( encoder, in_buffer + l, r - l ); - if( in_size < r - l ) r = l + in_size; - LZ_compress_sync_flush( encoder ); - if( leading_garbage ) - memset( mid_buffer, in_buffer[0], leading_garbage ); - mid_size = LZ_compress_read( encoder, mid_buffer + leading_garbage, - buffer_size - leading_garbage ); - if( mid_size < 0 ) + if( LZ_compress_restart_member( encoder, member_size ) < 0 ) { - fprintf( stderr, "lzcheck: LZ_compress_read error: %s\n", + fprintf( stderr, "lzcheck: Can't restart member: %s\n", LZ_strerror( LZ_compress_errno( encoder ) ) ); retval = 3; break; } - LZ_decompress_write( decoder, mid_buffer, mid_size + leading_garbage ); - out_size = LZ_decompress_read( decoder, out_buffer, buffer_size ); - if( out_size < 0 ) + if( line_size >= 2 && line_buf[1] == 'h' ) + LZ_decompress_reset( decoder ); + } + in_size = LZ_compress_write( encoder, line_buf, line_size ); + if( in_size < line_size ) + fprintf( stderr, "lzcheck: member: LZ_compress_write only accepted %d of %d bytes\n", + in_size, line_size ); + LZ_compress_finish( encoder ); + if( line_size * 3 < buffer_size && line_buf[0] == 't' ) + { leading_garbage = 
line_size; + memset( mid_buffer, in_buffer[0], leading_garbage ); + garbage_begin = LZ_decompress_total_in_size( decoder ); } + else leading_garbage = 0; + mid_size = LZ_compress_read( encoder, mid_buffer + leading_garbage, + buffer_size - leading_garbage ); + if( mid_size < 0 ) + { + fprintf( stderr, "lzcheck: member: LZ_compress_read error: %s\n", + LZ_strerror( LZ_compress_errno( encoder ) ) ); + retval = 3; break; + } + LZ_decompress_write( decoder, mid_buffer, leading_garbage + mid_size ); + out_size = LZ_decompress_read( decoder, out_buffer, buffer_size ); + if( out_size < 0 ) + { + if( leading_garbage && + ( LZ_decompress_errno( decoder ) == LZ_header_error || + LZ_decompress_errno( decoder ) == LZ_data_error ) ) { - if( LZ_decompress_errno( decoder ) == LZ_header_error || - LZ_decompress_errno( decoder ) == LZ_data_error ) + LZ_decompress_sync_to_member( decoder ); /* skip leading garbage */ + const unsigned long long garbage_end = + LZ_decompress_total_in_size( decoder ); + if( garbage_end - garbage_begin != (unsigned)leading_garbage ) { - LZ_decompress_sync_to_member( decoder ); /* remove leading garbage */ - out_size = LZ_decompress_read( decoder, out_buffer, buffer_size ); - } - if( out_size < 0 ) - { - fprintf( stderr, "lzcheck: LZ_decompress_read error: %s\n", - LZ_strerror( LZ_decompress_errno( decoder ) ) ); + fprintf( stderr, "lzcheck: member: LZ_decompress_sync_to_member error:\n" + " garbage_begin = %llu garbage_end = %llu " + "difference = %llu expected = %d\n", garbage_begin, + garbage_end, garbage_end - garbage_begin, leading_garbage ); retval = 3; break; } + out_size = LZ_decompress_read( decoder, out_buffer, buffer_size ); } - - if( out_size != in_size || memcmp( in_buffer + l, out_buffer, out_size ) ) + if( out_size < 0 ) { - fprintf( stderr, "lzcheck: Sync error at pos %d in_size = %d, out_size = %d, leading garbage = %d\n", - l, in_size, out_size, leading_garbage ); - for( i = 0; i < in_size; ++i ) - fputc( in_buffer[l+i], stderr ); - if( 
in_buffer[l+in_size-1] != '\n' ) - fputc( '\n', stderr ); - for( i = 0; i < out_size; ++i ) - fputc( out_buffer[i], stderr ); - fputc( '\n', stderr ); - retval = 1; + fprintf( stderr, "lzcheck: member: LZ_decompress_read error: %s\n", + LZ_strerror( LZ_decompress_errno( decoder ) ) ); + retval = 3; break; } } - if( retval >= 3 ) break; - if( LZ_compress_finish( encoder ) < 0 || - LZ_decompress_write( decoder, mid_buffer, LZ_compress_read( encoder, mid_buffer, buffer_size ) ) < 0 || - LZ_decompress_read( decoder, out_buffer, buffer_size ) != 0 || - LZ_decompress_reset( decoder ) < 0 || - LZ_compress_restart_member( encoder, member_size ) < 0 ) + if( out_size != in_size || memcmp( line_buf, out_buffer, out_size ) ) { - fprintf( stderr, "lzcheck: Can't restart member: %s\n", - LZ_strerror( LZ_decompress_errno( decoder ) ) ); - retval = 3; break; - } - - size = min( 100, read_size ); - if( LZ_compress_write( encoder, in_buffer, size ) != size || - LZ_compress_finish( encoder ) < 0 || - LZ_decompress_write( decoder, mid_buffer, LZ_compress_read( encoder, mid_buffer, buffer_size ) ) < 0 || - LZ_decompress_read( decoder, out_buffer, 0 ) != 0 || - LZ_decompress_sync_to_member( decoder ) < 0 || - LZ_compress_restart_member( encoder, member_size ) < 0 ) - { - fprintf( stderr, "lzcheck: Can't seek to next member: %s\n", - LZ_strerror( LZ_decompress_errno( decoder ) ) ); - retval = 3; break; + fprintf( stderr, "lzcheck: LZ_compress_restart_member error: " + "in_size = %d, out_size = %d\n", in_size, out_size ); + show_line( line_buf, in_size ); + show_line( out_buffer, out_size ); + retval = 1; } } - LZ_decompress_close( decoder ); - LZ_compress_close( encoder ); + xclose_decoder( decoder, retval == 0 ); + xclose_encoder( encoder, retval == 0 ); return retval; } int main( const int argc, const char * const argv[] ) { - FILE * file; - int retval; + int retval = 0, i; + int open_failures = 0; + const char opt = ( argc > 2 && + ( strcmp( argv[1], "-m" ) == 0 || strcmp( argv[1], 
"-s" ) == 0 ) ) ? + argv[1][1] : 0; + const int first = opt ? 2 : 1; + const bool verbose = opt != 0 || argc > first + 1; if( argc < 2 ) { - fputs( "Usage: lzcheck filename.txt\n", stderr ); + fputs( "Usage: lzcheck [-m|-s] filename.txt...\n", stderr ); return 1; } - file = fopen( argv[1], "rb" ); - if( !file ) + for( i = first; i < argc && retval == 0; ++i ) { - fprintf( stderr, "lzcheck: Can't open file '%s' for reading.\n", argv[1] ); - return 1; - } -/* fprintf( stderr, "lzcheck: Testing file '%s'\n", argv[1] ); */ + struct stat st; + if( stat( argv[i], &st ) != 0 || !S_ISREG( st.st_mode ) ) continue; + FILE * file = fopen( argv[i], "rb" ); + if( !file ) + { + fprintf( stderr, "lzcheck: %s: Can't open file for reading.\n", argv[i] ); + ++open_failures; continue; + } + if( verbose ) fprintf( stderr, " Testing file '%s'\n", argv[i] ); - retval = lzcheck( file, 65535 ); /* 65535,16 chooses fast encoder */ - if( retval == 0 ) - { rewind( file ); retval = lzcheck( file, 1 << 20 ); } - fclose( file ); + /* 65535,16 chooses fast encoder */ + if( opt != 'm' ) retval = check_sync_flush( file, 65535 ); + if( retval == 0 && opt != 'm' ) + { next_line( file, 0 ); retval = check_sync_flush( file, 1 << 20 ); } + if( retval == 0 && opt != 's' ) + { next_line( file, 0 ); retval = check_members( file, 65535 ); } + if( retval == 0 && opt != 's' ) + { next_line( file, 0 ); retval = check_members( file, 1 << 20 ); } + fclose( file ); + } + if( open_failures > 0 && verbose ) + fprintf( stderr, "lzcheck: warning: %d %s failed to open.\n", + open_failures, ( open_failures == 1 ) ? "file" : "files" ); + if( retval == 0 && open_failures ) retval = 1; return retval; } diff --git a/lzip.h b/lzip.h index 73fc7f1..0f2c1ed 100644 --- a/lzip.h +++ b/lzip.h @@ -1,28 +1,20 @@ -/* Lzlib - Compression library for the lzip format - Copyright (C) 2009-2016 Antonio Diaz Diaz. +/* Lzlib - Compression library for the lzip format + Copyright (C) 2009-2025 Antonio Diaz Diaz. 
- This library is free software: you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation, either version 2 of the License, or - (at your option) any later version. + This library is free software. Redistribution and use in source and + binary forms, with or without modification, are permitted provided + that the following conditions are met: - This library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. + 1. Redistributions of source code must retain the above copyright + notice, this list of conditions, and the following disclaimer. - You should have received a copy of the GNU General Public License - along with this library. If not, see . + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions, and the following disclaimer in the + documentation and/or other materials provided with the distribution. - As a special exception, you may use this file as part of a free - software library without restriction. Specifically, if other files - instantiate templates or use macros or inline functions from this - file, or you compile this file and link it with other files to - produce an executable, this file does not by itself cause the - resulting executable to be covered by the GNU General Public - License. This exception does not however invalidate any other - reasons why the executable file might be covered by the GNU General - Public License. + This library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
*/ #ifndef max @@ -43,15 +35,13 @@ static inline State St_set_char( const State st ) static const State next[states] = { 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 4, 5 }; return next[st]; } - +static inline State St_set_char_rep() { return 8; } static inline State St_set_match( const State st ) - { return ( ( st < 7 ) ? 7 : 10 ); } - + { return ( st < 7 ) ? 7 : 10; } static inline State St_set_rep( const State st ) - { return ( ( st < 7 ) ? 8 : 11 ); } - -static inline State St_set_short_rep( const State st ) - { return ( ( st < 7 ) ? 9 : 11 ); } + { return ( st < 7 ) ? 8 : 11; } +static inline State St_set_shortrep( const State st ) + { return ( st < 7 ) ? 9 : 11; } enum { @@ -89,7 +79,7 @@ static inline int get_len_state( const int len ) { return min( len - min_match_len, len_states - 1 ); } static inline int get_lit_state( const uint8_t prev_byte ) - { return ( prev_byte >> ( 8 - literal_context_bits ) ); } + { return prev_byte >> ( 8 - literal_context_bits ); } enum { bit_model_move_bits = 5, @@ -104,16 +94,16 @@ static inline void Bm_init( Bit_model * const probability ) static inline void Bm_array_init( Bit_model bm[], const int size ) { int i; for( i = 0; i < size; ++i ) Bm_init( &bm[i] ); } -struct Len_model +typedef struct Len_model { Bit_model choice1; Bit_model choice2; Bit_model bm_low[pos_states][len_low_symbols]; Bit_model bm_mid[pos_states][len_mid_symbols]; Bit_model bm_high[len_high_symbols]; - }; + } Len_model; -static inline void Lm_init( struct Len_model * const lm ) +static inline void Lm_init( Len_model * const lm ) { Bm_init( &lm->choice1 ); Bm_init( &lm->choice2 ); @@ -174,19 +164,22 @@ static const uint32_t crc32[256] = static inline void CRC32_update_byte( uint32_t * const crc, const uint8_t byte ) { *crc = crc32[(*crc^byte)&0xFF] ^ ( *crc >> 8 ); } +/* about as fast as it is possible without messing with endianness */ static inline void CRC32_update_buf( uint32_t * const crc, const uint8_t * const buffer, const int size ) { int i; + uint32_t c = 
*crc; for( i = 0; i < size; ++i ) - *crc = crc32[(*crc^buffer[i])&0xFF] ^ ( *crc >> 8 ); + c = crc32[(c^buffer[i])&0xFF] ^ ( c >> 8 ); + *crc = c; } static inline bool isvalid_ds( const unsigned dictionary_size ) - { return ( dictionary_size >= min_dictionary_size && - dictionary_size <= max_dictionary_size ); } + { return dictionary_size >= min_dictionary_size && + dictionary_size <= max_dictionary_size; } static inline int real_bits( unsigned value ) @@ -197,42 +190,51 @@ static inline int real_bits( unsigned value ) } -static const uint8_t magic_string[4] = { 0x4C, 0x5A, 0x49, 0x50 }; /* "LZIP" */ +static const uint8_t lzip_magic[4] = { 0x4C, 0x5A, 0x49, 0x50 }; /* "LZIP" */ -typedef uint8_t File_header[6]; /* 0-3 magic bytes */ +enum { Lh_size = 6 }; +typedef uint8_t Lzip_header[Lh_size]; /* 0-3 magic bytes */ /* 4 version */ - /* 5 coded_dict_size */ -enum { Fh_size = 6 }; + /* 5 coded dictionary size */ -static inline void Fh_set_magic( File_header data ) - { memcpy( data, magic_string, 4 ); data[4] = 1; } +static inline void Lh_set_magic( Lzip_header data ) + { memcpy( data, lzip_magic, 4 ); data[4] = 1; } -static inline bool Fh_verify_magic( const File_header data ) - { return ( memcmp( data, magic_string, 4 ) == 0 ); } +static inline bool Lh_check_magic( const Lzip_header data ) + { return memcmp( data, lzip_magic, 4 ) == 0; } -/* detect truncated header */ -static inline bool Fh_verify_prefix( const File_header data, const int size ) +/* detect (truncated) header */ +static inline bool Lh_check_prefix( const Lzip_header data, const int sz ) { - int i; for( i = 0; i < size && i < 4; ++i ) - if( data[i] != magic_string[i] ) return false; - return ( size > 0 ); + int i; for( i = 0; i < sz && i < 4; ++i ) + if( data[i] != lzip_magic[i] ) return false; + return sz > 0; } -static inline uint8_t Fh_version( const File_header data ) +/* detect corrupt header */ +static inline bool Lh_check_corrupt( const Lzip_header data ) + { + int matches = 0; + int i; for( i = 
0; i < 4; ++i ) + if( data[i] == lzip_magic[i] ) ++matches; + return matches > 1 && matches < 4; + } + +static inline uint8_t Lh_version( const Lzip_header data ) { return data[4]; } -static inline bool Fh_verify_version( const File_header data ) - { return ( data[4] == 1 ); } +static inline bool Lh_check_version( const Lzip_header data ) + { return data[4] == 1; } -static inline unsigned Fh_get_dictionary_size( const File_header data ) +static inline unsigned Lh_get_dictionary_size( const Lzip_header data ) { - unsigned sz = ( 1 << ( data[5] & 0x1F ) ); + unsigned sz = 1 << ( data[5] & 0x1F ); if( sz > min_dictionary_size ) sz -= ( sz / 16 ) * ( ( data[5] >> 5 ) & 7 ); return sz; } -static inline bool Fh_set_dictionary_size( File_header data, const unsigned sz ) +static inline bool Lh_set_dictionary_size( Lzip_header data, const unsigned sz ) { if( !isvalid_ds( sz ) ) return false; data[5] = real_bits( sz - 1 ); @@ -240,55 +242,53 @@ static inline bool Fh_set_dictionary_size( File_header data, const unsigned sz ) { const unsigned base_size = 1 << data[5]; const unsigned fraction = base_size / 16; - int i; + unsigned i; for( i = 7; i >= 1; --i ) if( base_size - ( i * fraction ) >= sz ) - { data[5] |= ( i << 5 ); break; } + { data[5] |= i << 5; break; } } return true; } -static inline bool Fh_verify( const File_header data ) +static inline bool Lh_check( const Lzip_header data ) { - if( Fh_verify_magic( data ) && Fh_verify_version( data ) ) - return isvalid_ds( Fh_get_dictionary_size( data ) ); - return false; + return Lh_check_magic( data ) && Lh_check_version( data ) && + isvalid_ds( Lh_get_dictionary_size( data ) ); } -typedef uint8_t File_trailer[20]; +enum { Lt_size = 20 }; +typedef uint8_t Lzip_trailer[Lt_size]; /* 0-3 CRC32 of the uncompressed data */ /* 4-11 size of the uncompressed data */ /* 12-19 member size including header and trailer */ -enum { Ft_size = 20 }; - -static inline unsigned Ft_get_data_crc( const File_trailer data ) +static inline unsigned 
Lt_get_data_crc( const Lzip_trailer data ) { unsigned tmp = 0; int i; for( i = 3; i >= 0; --i ) { tmp <<= 8; tmp += data[i]; } return tmp; } -static inline void Ft_set_data_crc( File_trailer data, unsigned crc ) +static inline void Lt_set_data_crc( Lzip_trailer data, unsigned crc ) { int i; for( i = 0; i <= 3; ++i ) { data[i] = (uint8_t)crc; crc >>= 8; } } -static inline unsigned long long Ft_get_data_size( const File_trailer data ) +static inline unsigned long long Lt_get_data_size( const Lzip_trailer data ) { unsigned long long tmp = 0; int i; for( i = 11; i >= 4; --i ) { tmp <<= 8; tmp += data[i]; } return tmp; } -static inline void Ft_set_data_size( File_trailer data, unsigned long long sz ) +static inline void Lt_set_data_size( Lzip_trailer data, unsigned long long sz ) { int i; for( i = 4; i <= 11; ++i ) { data[i] = (uint8_t)sz; sz >>= 8; } } -static inline unsigned long long Ft_get_member_size( const File_trailer data ) +static inline unsigned long long Lt_get_member_size( const Lzip_trailer data ) { unsigned long long tmp = 0; int i; for( i = 19; i >= 12; --i ) { tmp <<= 8; tmp += data[i]; } return tmp; } -static inline void Ft_set_member_size( File_trailer data, unsigned long long sz ) +static inline void Lt_set_member_size( Lzip_trailer data, unsigned long long sz ) { int i; for( i = 12; i <= 19; ++i ) { data[i] = (uint8_t)sz; sz >>= 8; } } diff --git a/lzlib.c b/lzlib.c index 953f8e3..3dd2566 100644 --- a/lzlib.c +++ b/lzlib.c @@ -1,28 +1,20 @@ -/* Lzlib - Compression library for the lzip format - Copyright (C) 2009-2016 Antonio Diaz Diaz. +/* Lzlib - Compression library for the lzip format + Copyright (C) 2009-2025 Antonio Diaz Diaz. - This library is free software: you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation, either version 2 of the License, or - (at your option) any later version. + This library is free software. 
Redistribution and use in source and + binary forms, with or without modification, are permitted provided + that the following conditions are met: - This library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. + 1. Redistributions of source code must retain the above copyright + notice, this list of conditions, and the following disclaimer. - You should have received a copy of the GNU General Public License - along with this library. If not, see . + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions, and the following disclaimer in the + documentation and/or other materials provided with the distribution. - As a special exception, you may use this file as part of a free - software library without restriction. Specifically, if other files - instantiate templates or use macros or inline functions from this - file, or you compile this file and link it with other files to - produce an executable, this file does not by itself cause the - resulting executable to be covered by the GNU General Public - License. This exception does not however invalidate any other - reasons why the executable file might be covered by the GNU General - Public License. + This library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
*/ #include @@ -47,14 +39,14 @@ struct LZ_Encoder { unsigned long long partial_in_size; unsigned long long partial_out_size; - struct LZ_encoder_base * lz_encoder_base; /* these 3 pointers make a */ - struct LZ_encoder * lz_encoder; /* polymorphic encoder */ - struct FLZ_encoder * flz_encoder; - enum LZ_Errno lz_errno; + LZ_encoder_base * lz_encoder_base; /* these 3 pointers make a */ + LZ_encoder * lz_encoder; /* polymorphic encoder */ + FLZ_encoder * flz_encoder; + LZ_Errno lz_errno; bool fatal; }; -static void LZ_Encoder_init( struct LZ_Encoder * const e ) +static void LZ_Encoder_init( LZ_Encoder * const e ) { e->partial_in_size = 0; e->partial_out_size = 0; @@ -70,16 +62,16 @@ struct LZ_Decoder { unsigned long long partial_in_size; unsigned long long partial_out_size; - struct Range_decoder * rdec; - struct LZ_decoder * lz_decoder; - enum LZ_Errno lz_errno; - File_header member_header; /* header of current member */ + Range_decoder * rdec; + LZ_decoder * lz_decoder; + LZ_Errno lz_errno; + Lzip_header member_header; /* header of current member */ bool fatal; bool first_header; /* true until first header is read */ bool seeking; }; -static void LZ_Decoder_init( struct LZ_Decoder * const d ) +static void LZ_Decoder_init( LZ_Decoder * const d ) { int i; d->partial_in_size = 0; @@ -87,14 +79,14 @@ static void LZ_Decoder_init( struct LZ_Decoder * const d ) d->rdec = 0; d->lz_decoder = 0; d->lz_errno = LZ_ok; - for( i = 0; i < Fh_size; ++i ) d->member_header[i] = 0; + for( i = 0; i < Lh_size; ++i ) d->member_header[i] = 0; d->fatal = false; d->first_header = true; d->seeking = false; } -static bool verify_encoder( struct LZ_Encoder * const e ) +static bool check_encoder( LZ_Encoder * const e ) { if( !e ) return false; if( !e->lz_encoder_base || ( !e->lz_encoder && !e->flz_encoder ) || @@ -104,7 +96,7 @@ static bool verify_encoder( struct LZ_Encoder * const e ) } -static bool verify_decoder( struct LZ_Decoder * const d ) +static bool check_decoder( LZ_Decoder * const d 
) { if( !d ) return false; if( !d->rdec ) @@ -113,12 +105,13 @@ static bool verify_decoder( struct LZ_Decoder * const d ) } -/*------------------------- Misc Functions -------------------------*/ +/* ------------------------- Misc Functions ------------------------- */ + +int LZ_api_version( void ) { return LZ_API_VERSION; } const char * LZ_version( void ) { return LZ_version_string; } - -const char * LZ_strerror( const enum LZ_Errno lz_errno ) +const char * LZ_strerror( const LZ_Errno lz_errno ) { switch( lz_errno ) { @@ -127,7 +120,7 @@ const char * LZ_strerror( const enum LZ_Errno lz_errno ) case LZ_mem_error : return "Not enough memory"; case LZ_sequence_error: return "Sequence error"; case LZ_header_error : return "Header error"; - case LZ_unexpected_eof: return "Unexpected eof"; + case LZ_unexpected_eof: return "Unexpected EOF"; case LZ_data_error : return "Data error"; case LZ_library_error : return "Library error"; } @@ -143,18 +136,17 @@ int LZ_min_match_len_limit( void ) { return min_match_len_limit; } int LZ_max_match_len_limit( void ) { return max_match_len; } -/*---------------------- Compression Functions ----------------------*/ +/* --------------------- Compression Functions --------------------- */ -struct LZ_Encoder * LZ_compress_open( const int dictionary_size, - const int match_len_limit, - const unsigned long long member_size ) +LZ_Encoder * LZ_compress_open( const int dictionary_size, + const int match_len_limit, + const unsigned long long member_size ) { - File_header header; - struct LZ_Encoder * const e = - (struct LZ_Encoder *)malloc( sizeof (struct LZ_Encoder) ); + Lzip_header header; + LZ_Encoder * const e = (LZ_Encoder *)malloc( sizeof (LZ_Encoder) ); if( !e ) return 0; LZ_Encoder_init( e ); - if( !Fh_set_dictionary_size( header, dictionary_size ) || + if( !Lh_set_dictionary_size( header, dictionary_size ) || match_len_limit < min_match_len_limit || match_len_limit > max_match_len || member_size < min_dictionary_size ) @@ -163,15 
+155,15 @@ struct LZ_Encoder * LZ_compress_open( const int dictionary_size, { if( dictionary_size == 65535 && match_len_limit == 16 ) { - e->flz_encoder = (struct FLZ_encoder *)malloc( sizeof (struct FLZ_encoder) ); + e->flz_encoder = (FLZ_encoder *)malloc( sizeof (FLZ_encoder) ); if( e->flz_encoder && FLZe_init( e->flz_encoder, member_size ) ) { e->lz_encoder_base = &e->flz_encoder->eb; return e; } free( e->flz_encoder ); e->flz_encoder = 0; } else { - e->lz_encoder = (struct LZ_encoder *)malloc( sizeof (struct LZ_encoder) ); - if( e->lz_encoder && LZe_init( e->lz_encoder, Fh_get_dictionary_size( header ), + e->lz_encoder = (LZ_encoder *)malloc( sizeof (LZ_encoder) ); + if( e->lz_encoder && LZe_init( e->lz_encoder, Lh_get_dictionary_size( header ), match_len_limit, member_size ) ) { e->lz_encoder_base = &e->lz_encoder->eb; return e; } free( e->lz_encoder ); e->lz_encoder = 0; @@ -183,7 +175,7 @@ struct LZ_Encoder * LZ_compress_open( const int dictionary_size, } -int LZ_compress_close( struct LZ_Encoder * const e ) +int LZ_compress_close( LZ_Encoder * const e ) { if( !e ) return -1; if( e->lz_encoder_base ) @@ -194,17 +186,17 @@ int LZ_compress_close( struct LZ_Encoder * const e ) } -int LZ_compress_finish( struct LZ_Encoder * const e ) +int LZ_compress_finish( LZ_Encoder * const e ) { - if( !verify_encoder( e ) || e->fatal ) return -1; + if( !check_encoder( e ) || e->fatal ) return -1; Mb_finish( &e->lz_encoder_base->mb ); /* if (open --> write --> finish) use same dictionary size as lzip. */ /* this does not save any memory. 
*/ if( Mb_data_position( &e->lz_encoder_base->mb ) == 0 && - LZ_compress_total_out_size( e ) == Fh_size ) + Re_member_position( &e->lz_encoder_base->renc ) == Lh_size ) { Mb_adjust_dictionary_size( &e->lz_encoder_base->mb ); - Fh_set_dictionary_size( e->lz_encoder_base->renc.header, + Lh_set_dictionary_size( e->lz_encoder_base->renc.header, e->lz_encoder_base->mb.dictionary_size ); e->lz_encoder_base->renc.cb.buffer[5] = e->lz_encoder_base->renc.header[5]; } @@ -212,10 +204,10 @@ int LZ_compress_finish( struct LZ_Encoder * const e ) } -int LZ_compress_restart_member( struct LZ_Encoder * const e, +int LZ_compress_restart_member( LZ_Encoder * const e, const unsigned long long member_size ) { - if( !verify_encoder( e ) || e->fatal ) return -1; + if( !check_encoder( e ) || e->fatal ) return -1; if( !LZeb_member_finished( e->lz_encoder_base ) ) { e->lz_errno = LZ_sequence_error; return -1; } if( member_size < min_dictionary_size ) @@ -231,114 +223,111 @@ int LZ_compress_restart_member( struct LZ_Encoder * const e, } -int LZ_compress_sync_flush( struct LZ_Encoder * const e ) +int LZ_compress_sync_flush( LZ_Encoder * const e ) { - if( !verify_encoder( e ) || e->fatal ) return -1; - if( !Mb_flushing_or_end( &e->lz_encoder_base->mb ) ) - e->lz_encoder_base->mb.flushing = true; + if( !check_encoder( e ) || e->fatal ) return -1; + if( !e->lz_encoder_base->mb.at_stream_end ) + e->lz_encoder_base->mb.sync_flush_pending = true; return 0; } -int LZ_compress_read( struct LZ_Encoder * const e, +int LZ_compress_read( LZ_Encoder * const e, uint8_t * const buffer, const int size ) { - int out_size = 0; - if( !verify_encoder( e ) || e->fatal ) return -1; + if( !check_encoder( e ) || e->fatal ) return -1; if( size < 0 ) return 0; - do { + + { LZ_encoder_base * const eb = e->lz_encoder_base; + int out_size = Re_read_data( &eb->renc, buffer, size ); + /* minimize number of calls to encode_member */ + if( out_size < size || size == 0 ) + { if( ( e->flz_encoder && !FLZe_encode_member( 
e->flz_encoder ) ) || ( e->lz_encoder && !LZe_encode_member( e->lz_encoder ) ) ) { e->lz_errno = LZ_library_error; e->fatal = true; return -1; } - if( e->lz_encoder_base->mb.flushing && - Mb_available_bytes( &e->lz_encoder_base->mb ) <= 0 && - LZeb_sync_flush( e->lz_encoder_base ) ) - e->lz_encoder_base->mb.flushing = false; - out_size += Re_read_data( &e->lz_encoder_base->renc, - buffer + out_size, size - out_size ); + if( eb->mb.sync_flush_pending && Mb_available_bytes( &eb->mb ) <= 0 ) + LZeb_try_sync_flush( eb ); + out_size += Re_read_data( &eb->renc, buffer + out_size, size - out_size ); } - while( e->lz_encoder_base->mb.flushing && out_size < size && - Mb_enough_available_bytes( &e->lz_encoder_base->mb ) && - Re_enough_free_bytes( &e->lz_encoder_base->renc ) ); - return out_size; + return out_size; } } -int LZ_compress_write( struct LZ_Encoder * const e, +int LZ_compress_write( LZ_Encoder * const e, const uint8_t * const buffer, const int size ) { - if( !verify_encoder( e ) || e->fatal ) return -1; + if( !check_encoder( e ) || e->fatal ) return -1; return Mb_write_data( &e->lz_encoder_base->mb, buffer, size ); } -int LZ_compress_write_size( struct LZ_Encoder * const e ) +int LZ_compress_write_size( LZ_Encoder * const e ) { - if( !verify_encoder( e ) || e->fatal ) return -1; + if( !check_encoder( e ) || e->fatal ) return -1; return Mb_free_bytes( &e->lz_encoder_base->mb ); } -enum LZ_Errno LZ_compress_errno( struct LZ_Encoder * const e ) +LZ_Errno LZ_compress_errno( LZ_Encoder * const e ) { if( !e ) return LZ_bad_argument; return e->lz_errno; } -int LZ_compress_finished( struct LZ_Encoder * const e ) +int LZ_compress_finished( LZ_Encoder * const e ) { - if( !verify_encoder( e ) ) return -1; - return ( Mb_data_finished( &e->lz_encoder_base->mb ) && - LZeb_member_finished( e->lz_encoder_base ) ); + if( !check_encoder( e ) ) return -1; + return Mb_data_finished( &e->lz_encoder_base->mb ) && + LZeb_member_finished( e->lz_encoder_base ); } -int 
LZ_compress_member_finished( struct LZ_Encoder * const e ) +int LZ_compress_member_finished( LZ_Encoder * const e ) { - if( !verify_encoder( e ) ) return -1; + if( !check_encoder( e ) ) return -1; return LZeb_member_finished( e->lz_encoder_base ); } -unsigned long long LZ_compress_data_position( struct LZ_Encoder * const e ) +unsigned long long LZ_compress_data_position( LZ_Encoder * const e ) { - if( !verify_encoder( e ) ) return 0; + if( !check_encoder( e ) ) return 0; return Mb_data_position( &e->lz_encoder_base->mb ); } -unsigned long long LZ_compress_member_position( struct LZ_Encoder * const e ) +unsigned long long LZ_compress_member_position( LZ_Encoder * const e ) { - if( !verify_encoder( e ) ) return 0; + if( !check_encoder( e ) ) return 0; return Re_member_position( &e->lz_encoder_base->renc ); } -unsigned long long LZ_compress_total_in_size( struct LZ_Encoder * const e ) +unsigned long long LZ_compress_total_in_size( LZ_Encoder * const e ) { - if( !verify_encoder( e ) ) return 0; + if( !check_encoder( e ) ) return 0; return e->partial_in_size + Mb_data_position( &e->lz_encoder_base->mb ); } -unsigned long long LZ_compress_total_out_size( struct LZ_Encoder * const e ) +unsigned long long LZ_compress_total_out_size( LZ_Encoder * const e ) { - if( !verify_encoder( e ) ) return 0; + if( !check_encoder( e ) ) return 0; return e->partial_out_size + Re_member_position( &e->lz_encoder_base->renc ); } -/*--------------------- Decompression Functions ---------------------*/ +/* -------------------- Decompression Functions -------------------- */ -struct LZ_Decoder * LZ_decompress_open( void ) +LZ_Decoder * LZ_decompress_open( void ) { - struct LZ_Decoder * const d = - (struct LZ_Decoder *)malloc( sizeof (struct LZ_Decoder) ); + LZ_Decoder * const d = (LZ_Decoder *)malloc( sizeof (LZ_Decoder) ); if( !d ) return 0; LZ_Decoder_init( d ); - d->rdec = (struct Range_decoder *)malloc( sizeof (struct Range_decoder) ); + d->rdec = (Range_decoder *)malloc( sizeof 
(Range_decoder) ); if( !d->rdec || !Rd_init( d->rdec ) ) { if( d->rdec ) { Rd_free( d->rdec ); free( d->rdec ); d->rdec = 0; } @@ -348,7 +337,7 @@ struct LZ_Decoder * LZ_decompress_open( void ) } -int LZ_decompress_close( struct LZ_Decoder * const d ) +int LZ_decompress_close( LZ_Decoder * const d ) { if( !d ) return -1; if( d->lz_decoder ) @@ -359,9 +348,9 @@ int LZ_decompress_close( struct LZ_Decoder * const d ) } -int LZ_decompress_finish( struct LZ_Decoder * const d ) +int LZ_decompress_finish( LZ_Decoder * const d ) { - if( !verify_decoder( d ) || d->fatal ) return -1; + if( !check_decoder( d ) || d->fatal ) return -1; if( d->seeking ) { d->seeking = false; d->partial_in_size += Rd_purge( d->rdec ); } else Rd_finish( d->rdec ); @@ -369,9 +358,9 @@ int LZ_decompress_finish( struct LZ_Decoder * const d ) } -int LZ_decompress_reset( struct LZ_Decoder * const d ) +int LZ_decompress_reset( LZ_Decoder * const d ) { - if( !verify_decoder( d ) ) return -1; + if( !check_decoder( d ) ) return -1; if( d->lz_decoder ) { LZd_free( d->lz_decoder ); free( d->lz_decoder ); d->lz_decoder = 0; } d->partial_in_size = 0; @@ -385,10 +374,10 @@ int LZ_decompress_reset( struct LZ_Decoder * const d ) } -int LZ_decompress_sync_to_member( struct LZ_Decoder * const d ) +int LZ_decompress_sync_to_member( LZ_Decoder * const d ) { - int skipped = 0; - if( !verify_decoder( d ) ) return -1; + unsigned skipped = 0; + if( !check_decoder( d ) ) return -1; if( d->lz_decoder ) { LZd_free( d->lz_decoder ); free( d->lz_decoder ); d->lz_decoder = 0; } if( Rd_find_header( d->rdec, &skipped ) ) d->seeking = false; @@ -404,12 +393,16 @@ int LZ_decompress_sync_to_member( struct LZ_Decoder * const d ) } -int LZ_decompress_read( struct LZ_Decoder * const d, +int LZ_decompress_read( LZ_Decoder * const d, uint8_t * const buffer, const int size ) { int result; - if( !verify_decoder( d ) || d->fatal ) return -1; - if( d->seeking || size < 0 ) return 0; + if( !check_decoder( d ) ) return -1; + if( size < 0 ) 
return 0; + if( d->fatal ) /* don't return error until pending bytes are read */ + { if( d->lz_decoder && !Cb_empty( &d->lz_decoder->cb ) ) goto get_data; + return -1; } + if( d->seeking ) return 0; if( d->lz_decoder && LZd_member_finished( d->lz_decoder ) ) { @@ -421,25 +414,42 @@ int LZ_decompress_read( struct LZ_Decoder * const d, int rd; d->partial_in_size += d->rdec->member_position; d->rdec->member_position = 0; - if( Rd_available_bytes( d->rdec ) < Fh_size + 5 && + if( Rd_available_bytes( d->rdec ) < Lh_size + 5 && !d->rdec->at_stream_end ) return 0; if( Rd_finished( d->rdec ) && !d->first_header ) return 0; - rd = Rd_read_data( d->rdec, d->member_header, Fh_size ); - if( Rd_finished( d->rdec ) ) + rd = Rd_read_data( d->rdec, d->member_header, Lh_size ); + if( rd < Lh_size || Rd_finished( d->rdec ) ) /* End Of File */ { - if( rd <= 0 || Fh_verify_prefix( d->member_header, rd ) ) + if( rd <= 0 || Lh_check_prefix( d->member_header, rd ) ) d->lz_errno = LZ_unexpected_eof; else d->lz_errno = LZ_header_error; d->fatal = true; return -1; } - if( !Fh_verify( d->member_header ) ) + if( !Lh_check_magic( d->member_header ) ) { /* unreading the header prevents sync_to_member from skipping a member if leading garbage is shorter than a full header; "lgLZIP\x01\x0C" */ if( Rd_unread_data( d->rdec, rd ) ) - d->lz_errno = LZ_header_error; + { + if( d->first_header || !Lh_check_corrupt( d->member_header ) ) + d->lz_errno = LZ_header_error; + else + d->lz_errno = LZ_data_error; /* corrupt header */ + } + else + d->lz_errno = LZ_library_error; + d->fatal = true; + return -1; + } + if( !Lh_check_version( d->member_header ) || + !isvalid_ds( Lh_get_dictionary_size( d->member_header ) ) ) + { + /* Skip a possible "LZIP" leading garbage; "LZIPLZIP\x01\x0C". + Leave member_pos pointing to the first error. 
*/ + if( Rd_unread_data( d->rdec, 1 + !Lh_check_version( d->member_header ) ) ) + d->lz_errno = LZ_data_error; /* bad version or bad dict size */ else d->lz_errno = LZ_library_error; d->fatal = true; @@ -455,9 +465,9 @@ int LZ_decompress_read( struct LZ_Decoder * const d, d->fatal = true; return -1; } - d->lz_decoder = (struct LZ_decoder *)malloc( sizeof (struct LZ_decoder) ); + d->lz_decoder = (LZ_decoder *)malloc( sizeof (LZ_decoder) ); if( !d->lz_decoder || !LZd_init( d->lz_decoder, d->rdec, - Fh_get_dictionary_size( d->member_header ) ) ) + Lh_get_dictionary_size( d->member_header ) ) ) { /* not enough free memory */ if( d->lz_decoder ) { LZd_free( d->lz_decoder ); free( d->lz_decoder ); d->lz_decoder = 0; } @@ -470,30 +480,32 @@ int LZ_decompress_read( struct LZ_Decoder * const d, result = LZd_decode_member( d->lz_decoder ); if( result != 0 ) { - if( result == 2 ) - { d->lz_errno = LZ_unexpected_eof; - d->rdec->member_position += Cb_used_bytes( &d->rdec->cb ); - Cb_reset( &d->rdec->cb ); } - else if( result == 5 ) d->lz_errno = LZ_library_error; + if( result == 2 ) /* set input position at EOF */ + { d->rdec->member_position += Cb_used_bytes( &d->rdec->cb ); + Cb_reset( &d->rdec->cb ); + d->lz_errno = LZ_unexpected_eof; } + else if( result == 6 ) d->lz_errno = LZ_library_error; else d->lz_errno = LZ_data_error; d->fatal = true; - return -1; + if( Cb_empty( &d->lz_decoder->cb ) ) return -1; } +get_data: return Cb_read_data( &d->lz_decoder->cb, buffer, size ); } -int LZ_decompress_write( struct LZ_Decoder * const d, +int LZ_decompress_write( LZ_Decoder * const d, const uint8_t * const buffer, const int size ) { int result; - if( !verify_decoder( d ) || d->fatal ) return -1; + if( !check_decoder( d ) || d->fatal ) return -1; if( size < 0 ) return 0; result = Rd_write_data( d->rdec, buffer, size ); while( d->seeking ) { - int size2, skipped = 0; + int size2; + unsigned skipped = 0; if( Rd_find_header( d->rdec, &skipped ) ) d->seeking = false; d->partial_in_size += 
skipped; if( result >= size ) break; @@ -505,82 +517,82 @@ int LZ_decompress_write( struct LZ_Decoder * const d, } -int LZ_decompress_write_size( struct LZ_Decoder * const d ) +int LZ_decompress_write_size( LZ_Decoder * const d ) { - if( !verify_decoder( d ) || d->fatal ) return -1; + if( !check_decoder( d ) || d->fatal ) return -1; return Rd_free_bytes( d->rdec ); } -enum LZ_Errno LZ_decompress_errno( struct LZ_Decoder * const d ) +LZ_Errno LZ_decompress_errno( LZ_Decoder * const d ) { if( !d ) return LZ_bad_argument; return d->lz_errno; } -int LZ_decompress_finished( struct LZ_Decoder * const d ) +int LZ_decompress_finished( LZ_Decoder * const d ) { - if( !verify_decoder( d ) ) return -1; - return ( Rd_finished( d->rdec ) && - ( !d->lz_decoder || LZd_member_finished( d->lz_decoder ) ) ); + if( !check_decoder( d ) || d->fatal ) return -1; + return Rd_finished( d->rdec ) && + ( !d->lz_decoder || LZd_member_finished( d->lz_decoder ) ); } -int LZ_decompress_member_finished( struct LZ_Decoder * const d ) +int LZ_decompress_member_finished( LZ_Decoder * const d ) { - if( !verify_decoder( d ) ) return -1; - return ( d->lz_decoder && LZd_member_finished( d->lz_decoder ) ); + if( !check_decoder( d ) || d->fatal ) return -1; + return d->lz_decoder && LZd_member_finished( d->lz_decoder ); } -int LZ_decompress_member_version( struct LZ_Decoder * const d ) +int LZ_decompress_member_version( LZ_Decoder * const d ) { - if( !verify_decoder( d ) ) return -1; - return Fh_version( d->member_header ); + if( !check_decoder( d ) ) return -1; + return Lh_version( d->member_header ); } -int LZ_decompress_dictionary_size( struct LZ_Decoder * const d ) +int LZ_decompress_dictionary_size( LZ_Decoder * const d ) { - if( !verify_decoder( d ) ) return -1; - return Fh_get_dictionary_size( d->member_header ); + if( !check_decoder( d ) ) return -1; + return Lh_get_dictionary_size( d->member_header ); } -unsigned LZ_decompress_data_crc( struct LZ_Decoder * const d ) +unsigned 
LZ_decompress_data_crc( LZ_Decoder * const d ) { - if( verify_decoder( d ) && d->lz_decoder ) + if( check_decoder( d ) && d->lz_decoder ) return LZd_crc( d->lz_decoder ); return 0; } -unsigned long long LZ_decompress_data_position( struct LZ_Decoder * const d ) +unsigned long long LZ_decompress_data_position( LZ_Decoder * const d ) { - if( verify_decoder( d ) && d->lz_decoder ) + if( check_decoder( d ) && d->lz_decoder ) return LZd_data_position( d->lz_decoder ); return 0; } -unsigned long long LZ_decompress_member_position( struct LZ_Decoder * const d ) +unsigned long long LZ_decompress_member_position( LZ_Decoder * const d ) { - if( !verify_decoder( d ) ) return 0; + if( !check_decoder( d ) ) return 0; return d->rdec->member_position; } -unsigned long long LZ_decompress_total_in_size( struct LZ_Decoder * const d ) +unsigned long long LZ_decompress_total_in_size( LZ_Decoder * const d ) { - if( !verify_decoder( d ) ) return 0; + if( !check_decoder( d ) ) return 0; return d->partial_in_size + d->rdec->member_position; } -unsigned long long LZ_decompress_total_out_size( struct LZ_Decoder * const d ) +unsigned long long LZ_decompress_total_out_size( LZ_Decoder * const d ) { - if( !verify_decoder( d ) ) return 0; + if( !check_decoder( d ) ) return 0; if( d->lz_decoder ) return d->partial_out_size + LZd_data_position( d->lz_decoder ); return d->partial_out_size; diff --git a/lzlib.h b/lzlib.h index a734fbf..926124a 100644 --- a/lzlib.h +++ b/lzlib.h @@ -1,45 +1,42 @@ -/* Lzlib - Compression library for the lzip format - Copyright (C) 2009-2016 Antonio Diaz Diaz. +/* Lzlib - Compression library for the lzip format + Copyright (C) 2009-2025 Antonio Diaz Diaz. - This library is free software: you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation, either version 2 of the License, or - (at your option) any later version. + This library is free software. 
Redistribution and use in source and + binary forms, with or without modification, are permitted provided + that the following conditions are met: - This library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. + 1. Redistributions of source code must retain the above copyright + notice, this list of conditions, and the following disclaimer. - You should have received a copy of the GNU General Public License - along with this library. If not, see . + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions, and the following disclaimer in the + documentation and/or other materials provided with the distribution. - As a special exception, you may use this file as part of a free - software library without restriction. Specifically, if other files - instantiate templates or use macros or inline functions from this - file, or you compile this file and link it with other files to - produce an executable, this file does not by itself cause the - resulting executable to be covered by the GNU General Public - License. This exception does not however invalidate any other - reasons why the executable file might be covered by the GNU General - Public License. + This library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. */ #ifdef __cplusplus extern "C" { #endif -#define LZ_API_VERSION 1 +/* LZ_API_VERSION was first defined in lzlib 1.8 to 1. + Since lzlib 1.12, LZ_API_VERSION is defined as (major * 1000 + minor). 
*/ -static const char * const LZ_version_string = "1.8"; +#define LZ_API_VERSION 1015 -enum LZ_Errno { LZ_ok = 0, LZ_bad_argument, LZ_mem_error, - LZ_sequence_error, LZ_header_error, LZ_unexpected_eof, - LZ_data_error, LZ_library_error }; +static const char * const LZ_version_string = "1.15"; + +typedef enum LZ_Errno + { LZ_ok = 0, LZ_bad_argument, LZ_mem_error, + LZ_sequence_error, LZ_header_error, LZ_unexpected_eof, + LZ_data_error, LZ_library_error } LZ_Errno; +int LZ_api_version( void ); /* new in 1.12 */ const char * LZ_version( void ); -const char * LZ_strerror( const enum LZ_Errno lz_errno ); +const char * LZ_strerror( const LZ_Errno lz_errno ); int LZ_min_dictionary_bits( void ); int LZ_min_dictionary_size( void ); @@ -49,65 +46,65 @@ int LZ_min_match_len_limit( void ); int LZ_max_match_len_limit( void ); -/*---------------------- Compression Functions ----------------------*/ +/* --------------------- Compression Functions --------------------- */ -struct LZ_Encoder; +typedef struct LZ_Encoder LZ_Encoder; -struct LZ_Encoder * LZ_compress_open( const int dictionary_size, - const int match_len_limit, - const unsigned long long member_size ); -int LZ_compress_close( struct LZ_Encoder * const encoder ); +LZ_Encoder * LZ_compress_open( const int dictionary_size, + const int match_len_limit, + const unsigned long long member_size ); +int LZ_compress_close( LZ_Encoder * const encoder ); -int LZ_compress_finish( struct LZ_Encoder * const encoder ); -int LZ_compress_restart_member( struct LZ_Encoder * const encoder, +int LZ_compress_finish( LZ_Encoder * const encoder ); +int LZ_compress_restart_member( LZ_Encoder * const encoder, const unsigned long long member_size ); -int LZ_compress_sync_flush( struct LZ_Encoder * const encoder ); +int LZ_compress_sync_flush( LZ_Encoder * const encoder ); -int LZ_compress_read( struct LZ_Encoder * const encoder, +int LZ_compress_read( LZ_Encoder * const encoder, uint8_t * const buffer, const int size ); -int LZ_compress_write( 
struct LZ_Encoder * const encoder, +int LZ_compress_write( LZ_Encoder * const encoder, const uint8_t * const buffer, const int size ); -int LZ_compress_write_size( struct LZ_Encoder * const encoder ); +int LZ_compress_write_size( LZ_Encoder * const encoder ); -enum LZ_Errno LZ_compress_errno( struct LZ_Encoder * const encoder ); -int LZ_compress_finished( struct LZ_Encoder * const encoder ); -int LZ_compress_member_finished( struct LZ_Encoder * const encoder ); +LZ_Errno LZ_compress_errno( LZ_Encoder * const encoder ); +int LZ_compress_finished( LZ_Encoder * const encoder ); +int LZ_compress_member_finished( LZ_Encoder * const encoder ); -unsigned long long LZ_compress_data_position( struct LZ_Encoder * const encoder ); -unsigned long long LZ_compress_member_position( struct LZ_Encoder * const encoder ); -unsigned long long LZ_compress_total_in_size( struct LZ_Encoder * const encoder ); -unsigned long long LZ_compress_total_out_size( struct LZ_Encoder * const encoder ); +unsigned long long LZ_compress_data_position( LZ_Encoder * const encoder ); +unsigned long long LZ_compress_member_position( LZ_Encoder * const encoder ); +unsigned long long LZ_compress_total_in_size( LZ_Encoder * const encoder ); +unsigned long long LZ_compress_total_out_size( LZ_Encoder * const encoder ); -/*--------------------- Decompression Functions ---------------------*/ +/* -------------------- Decompression Functions -------------------- */ -struct LZ_Decoder; +typedef struct LZ_Decoder LZ_Decoder; -struct LZ_Decoder * LZ_decompress_open( void ); -int LZ_decompress_close( struct LZ_Decoder * const decoder ); +LZ_Decoder * LZ_decompress_open( void ); +int LZ_decompress_close( LZ_Decoder * const decoder ); -int LZ_decompress_finish( struct LZ_Decoder * const decoder ); -int LZ_decompress_reset( struct LZ_Decoder * const decoder ); -int LZ_decompress_sync_to_member( struct LZ_Decoder * const decoder ); +int LZ_decompress_finish( LZ_Decoder * const decoder ); +int LZ_decompress_reset( 
LZ_Decoder * const decoder ); +int LZ_decompress_sync_to_member( LZ_Decoder * const decoder ); -int LZ_decompress_read( struct LZ_Decoder * const decoder, +int LZ_decompress_read( LZ_Decoder * const decoder, uint8_t * const buffer, const int size ); -int LZ_decompress_write( struct LZ_Decoder * const decoder, +int LZ_decompress_write( LZ_Decoder * const decoder, const uint8_t * const buffer, const int size ); -int LZ_decompress_write_size( struct LZ_Decoder * const decoder ); +int LZ_decompress_write_size( LZ_Decoder * const decoder ); -enum LZ_Errno LZ_decompress_errno( struct LZ_Decoder * const decoder ); -int LZ_decompress_finished( struct LZ_Decoder * const decoder ); -int LZ_decompress_member_finished( struct LZ_Decoder * const decoder ); +LZ_Errno LZ_decompress_errno( LZ_Decoder * const decoder ); +int LZ_decompress_finished( LZ_Decoder * const decoder ); +int LZ_decompress_member_finished( LZ_Decoder * const decoder ); -int LZ_decompress_member_version( struct LZ_Decoder * const decoder ); -int LZ_decompress_dictionary_size( struct LZ_Decoder * const decoder ); -unsigned LZ_decompress_data_crc( struct LZ_Decoder * const decoder ); +int LZ_decompress_member_version( LZ_Decoder * const decoder ); +int LZ_decompress_dictionary_size( LZ_Decoder * const decoder ); +unsigned LZ_decompress_data_crc( LZ_Decoder * const decoder ); -unsigned long long LZ_decompress_data_position( struct LZ_Decoder * const decoder ); -unsigned long long LZ_decompress_member_position( struct LZ_Decoder * const decoder ); -unsigned long long LZ_decompress_total_in_size( struct LZ_Decoder * const decoder ); -unsigned long long LZ_decompress_total_out_size( struct LZ_Decoder * const decoder ); +unsigned long long LZ_decompress_data_position( LZ_Decoder * const decoder ); +unsigned long long LZ_decompress_member_position( LZ_Decoder * const decoder ); +unsigned long long LZ_decompress_total_in_size( LZ_Decoder * const decoder ); +unsigned long long LZ_decompress_total_out_size( LZ_Decoder * 
const decoder ); #ifdef __cplusplus } diff --git a/main.c b/main.c deleted file mode 100644 index c2754bf..0000000 --- a/main.c +++ /dev/null @@ -1,1072 +0,0 @@ -/* Minilzip - Test program for the lzlib library - Copyright (C) 2009-2016 Antonio Diaz Diaz. - - This program is free software: you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation, either version 2 of the License, or - (at your option) any later version. - - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program. If not, see . -*/ -/* - Exit status: 0 for a normal exit, 1 for environmental problems - (file not found, invalid flags, I/O errors, etc), 2 to indicate a - corrupt or invalid input file, 3 for an internal consistency error - (eg, bug) which caused minilzip to panic. -*/ - -#define _FILE_OFFSET_BITS 64 - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#if defined(__MSVCRT__) -#include -#define fchmod(x,y) 0 -#define fchown(x,y,z) 0 -#define strtoull strtoul -#define SIGHUP SIGTERM -#define S_ISSOCK(x) 0 -#define S_IRGRP 0 -#define S_IWGRP 0 -#define S_IROTH 0 -#define S_IWOTH 0 -#endif -#if defined(__OS2__) -#include -#endif - -#include "carg_parser.h" -#include "lzlib.h" - -#ifndef O_BINARY -#define O_BINARY 0 -#endif - -#if CHAR_BIT != 8 -#error "Environments where CHAR_BIT != 8 are not supported." -#endif - -#ifndef max - #define max(x,y) ((x) >= (y) ? (x) : (y)) -#endif -#ifndef min - #define min(x,y) ((x) <= (y) ? 
(x) : (y)) -#endif - -void cleanup_and_fail( const int retval ); -void show_error( const char * const msg, const int errcode, const bool help ); -void internal_error( const char * const msg ); - -int verbosity = 0; - -const char * const Program_name = "Minilzip"; -const char * const program_name = "minilzip"; -const char * const program_year = "2016"; -const char * invocation_name = 0; - -struct { const char * from; const char * to; } const known_extensions[] = { - { ".lz", "" }, - { ".tlz", ".tar" }, - { 0, 0 } }; - -struct Lzma_options - { - int dictionary_size; /* 4 KiB .. 512 MiB */ - int match_len_limit; /* 5 .. 273 */ - }; - -enum Mode { m_compress, m_decompress, m_test }; - -char * output_filename = 0; -int outfd = -1; -bool delete_output_on_interrupt = false; - - -struct Pretty_print - { - const char * name; - const char * stdin_name; - unsigned longest_name; - bool first_post; - }; - -static void Pp_init( struct Pretty_print * const pp, - const char * const filenames[], - const int num_filenames, const int verbosity ) - { - unsigned stdin_name_len; - int i; - pp->name = 0; - pp->stdin_name = "(stdin)"; - pp->longest_name = 0; - pp->first_post = false; - stdin_name_len = strlen( pp->stdin_name ); - - if( verbosity <= 0 ) return; - for( i = 0; i < num_filenames; ++i ) - { - const char * const s = filenames[i]; - const unsigned len = (strcmp( s, "-" ) == 0) ? 
stdin_name_len : strlen( s ); - if( len > pp->longest_name ) pp->longest_name = len; - } - if( pp->longest_name == 0 ) pp->longest_name = stdin_name_len; - } - -static inline void Pp_set_name( struct Pretty_print * const pp, - const char * const filename ) - { - if( filename && filename[0] && strcmp( filename, "-" ) != 0 ) - pp->name = filename; - else pp->name = pp->stdin_name; - pp->first_post = true; - } - -static inline void Pp_reset( struct Pretty_print * const pp ) - { if( pp->name && pp->name[0] ) pp->first_post = true; } - -static void Pp_show_msg( struct Pretty_print * const pp, const char * const msg ) - { - if( verbosity >= 0 ) - { - if( pp->first_post ) - { - unsigned i; - pp->first_post = false; - fprintf( stderr, " %s: ", pp->name ); - for( i = strlen( pp->name ); i < pp->longest_name; ++i ) - fputc( ' ', stderr ); - if( !msg ) fflush( stderr ); - } - if( msg ) fprintf( stderr, "%s\n", msg ); - } - } - - -static void show_help( void ) - { - printf( "%s - Test program for the lzlib library.\n", Program_name ); - printf( "\nUsage: %s [options] [files]\n", invocation_name ); - printf( "\nOptions:\n" - " -h, --help display this help and exit\n" - " -V, --version output version information and exit\n" - " -a, --trailing-error exit with error status if trailing data\n" - " -b, --member-size=<bytes> set member size limit in bytes\n" - " -c, --stdout write to standard output, keep input files\n" - " -d, --decompress decompress\n" - " -f, --force overwrite existing output files\n" - " -F, --recompress force re-compression of compressed files\n" - " -k, --keep keep (don't delete) input files\n" - " -m, --match-length=<bytes> set match length limit in bytes [36]\n" - " -o, --output=<file> if reading standard input, write to <file>\n" - " -q, --quiet suppress all messages\n" - " -s, --dictionary-size=<bytes> set dictionary size limit in bytes [8 MiB]\n" - " -S, --volume-size=<bytes> set volume size limit in bytes\n" - " -t, --test test compressed file integrity\n" - " -v, --verbose be verbose (a 2nd -v
gives more)\n" - " -0 .. -9 set compression level [default 6]\n" - " --fast alias for -0\n" - " --best alias for -9\n" - "If no file names are given, or if a file is '-', minilzip compresses or\n" - "decompresses from standard input to standard output.\n" - "Numbers may be followed by a multiplier: k = kB = 10^3 = 1000,\n" - "Ki = KiB = 2^10 = 1024, M = 10^6, Mi = 2^20, G = 10^9, Gi = 2^30, etc...\n" - "Dictionary sizes 12 to 29 are interpreted as powers of two, meaning 2^12\n" - "to 2^29 bytes.\n" - "\nThe bidimensional parameter space of LZMA can't be mapped to a linear\n" - "scale optimal for all files. If your files are large, very repetitive,\n" - "etc, you may need to use the --dictionary-size and --match-length\n" - "options directly to achieve optimal performance.\n" - "\nExit status: 0 for a normal exit, 1 for environmental problems (file\n" - "not found, invalid flags, I/O errors, etc), 2 to indicate a corrupt or\n" - "invalid input file, 3 for an internal consistency error (eg, bug) which\n" - "caused minilzip to panic.\n" - "\nReport bugs to lzip-bug@nongnu.org\n" - "Lzlib home page: http://www.nongnu.org/lzip/lzlib.html\n" ); - } - - -static void show_version( void ) - { - printf( "%s %s\n", program_name, PROGVERSION ); - printf( "Copyright (C) %s Antonio Diaz Diaz.\n", program_year ); - printf( "Using lzlib %s\n", LZ_version() ); - printf( "License GPLv2+: GNU GPL version 2 or later \n" - "This is free software: you are free to change and redistribute it.\n" - "There is NO WARRANTY, to the extent permitted by law.\n" ); - } - - -static void show_header( const unsigned dictionary_size ) - { - if( verbosity >= 3 ) - { - const char * const prefix[8] = - { "Ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi", "Yi" }; - enum { factor = 1024 }; - const char * p = ""; - const char * np = " "; - unsigned num = dictionary_size, i; - bool exact = ( num % factor == 0 ); - - for( i = 0; i < 8 && ( num > 9999 || ( exact && num >= factor ) ); ++i ) - { num /= factor; if( num % 
factor != 0 ) exact = false; - p = prefix[i]; np = ""; } - fprintf( stderr, "dictionary size %s%4u %sB. ", np, num, p ); - } - } - - -static unsigned long long getnum( const char * const ptr, - const unsigned long long llimit, - const unsigned long long ulimit ) - { - unsigned long long result; - char * tail; - errno = 0; - result = strtoull( ptr, &tail, 0 ); - if( tail == ptr ) - { - show_error( "Bad or missing numerical argument.", 0, true ); - exit( 1 ); - } - - if( !errno && tail[0] ) - { - const int factor = ( tail[1] == 'i' ) ? 1024 : 1000; - int exponent = 0; /* 0 = bad multiplier */ - int i; - switch( tail[0] ) - { - case 'Y': exponent = 8; break; - case 'Z': exponent = 7; break; - case 'E': exponent = 6; break; - case 'P': exponent = 5; break; - case 'T': exponent = 4; break; - case 'G': exponent = 3; break; - case 'M': exponent = 2; break; - case 'K': if( factor == 1024 ) exponent = 1; break; - case 'k': if( factor == 1000 ) exponent = 1; break; - } - if( exponent <= 0 ) - { - show_error( "Bad multiplier in numerical argument.", 0, true ); - exit( 1 ); - } - for( i = 0; i < exponent; ++i ) - { - if( ulimit / factor >= result ) result *= factor; - else { errno = ERANGE; break; } - } - } - if( !errno && ( result < llimit || result > ulimit ) ) errno = ERANGE; - if( errno ) - { - show_error( "Numerical argument out of limits.", 0, false ); - exit( 1 ); - } - return result; - } - - -static int get_dict_size( const char * const arg ) - { - char * tail; - int dictionary_size; - const int bits = strtol( arg, &tail, 0 ); - if( bits >= LZ_min_dictionary_bits() && - bits <= LZ_max_dictionary_bits() && *tail == 0 ) - return ( 1 << bits ); - dictionary_size = getnum( arg, LZ_min_dictionary_size(), - LZ_max_dictionary_size() ); - if( dictionary_size == 65535 ) ++dictionary_size; /* no fast encoder */ - return dictionary_size; - } - - -static int extension_index( const char * const name ) - { - int i; - for( i = 0; known_extensions[i].from; ++i ) - { - const char * 
const ext = known_extensions[i].from; - const unsigned name_len = strlen( name ); - const unsigned ext_len = strlen( ext ); - if( name_len > ext_len && - strncmp( name + name_len - ext_len, ext, ext_len ) == 0 ) - return i; - } - return -1; - } - - -static int open_instream( const char * const name, struct stat * const in_statsp, - const enum Mode program_mode, const int eindex, - const bool recompress, const bool to_stdout ) - { - int infd = -1; - if( program_mode == m_compress && !recompress && eindex >= 0 ) - { - if( verbosity >= 0 ) - fprintf( stderr, "%s: Input file '%s' already has '%s' suffix.\n", - program_name, name, known_extensions[eindex].from ); - } - else - { - infd = open( name, O_RDONLY | O_BINARY ); - if( infd < 0 ) - { - if( verbosity >= 0 ) - fprintf( stderr, "%s: Can't open input file '%s': %s\n", - program_name, name, strerror( errno ) ); - } - else - { - const int i = fstat( infd, in_statsp ); - const mode_t mode = in_statsp->st_mode; - const bool can_read = ( i == 0 && - ( S_ISBLK( mode ) || S_ISCHR( mode ) || - S_ISFIFO( mode ) || S_ISSOCK( mode ) ) ); - const bool no_ofile = ( to_stdout || program_mode == m_test ); - if( i != 0 || ( !S_ISREG( mode ) && ( !can_read || !no_ofile ) ) ) - { - if( verbosity >= 0 ) - fprintf( stderr, "%s: Input file '%s' is not a regular file%s.\n", - program_name, name, - ( can_read && !no_ofile ) ? 
- ",\n and '--stdout' was not specified" : "" ); - close( infd ); - infd = -1; - } - } - } - return infd; - } - - -/* assure at least a minimum size for buffer 'buf' */ -static void * resize_buffer( void * buf, const int min_size ) - { - if( buf ) buf = realloc( buf, min_size ); - else buf = malloc( min_size ); - if( !buf ) - { - show_error( "Not enough memory.", 0, false ); - cleanup_and_fail( 1 ); - } - return buf; - } - - -static void set_c_outname( const char * const name, const bool multifile ) - { - output_filename = resize_buffer( output_filename, strlen( name ) + 5 + - strlen( known_extensions[0].from ) + 1 ); - strcpy( output_filename, name ); - if( multifile ) strcat( output_filename, "00001" ); - strcat( output_filename, known_extensions[0].from ); - } - - -static void set_d_outname( const char * const name, const int i ) - { - const unsigned name_len = strlen( name ); - if( i >= 0 ) - { - const char * const from = known_extensions[i].from; - const unsigned from_len = strlen( from ); - if( name_len > from_len ) - { - output_filename = resize_buffer( output_filename, name_len + - strlen( known_extensions[0].to ) + 1 ); - strcpy( output_filename, name ); - strcpy( output_filename + name_len - from_len, known_extensions[i].to ); - return; - } - } - output_filename = resize_buffer( output_filename, name_len + 4 + 1 ); - strcpy( output_filename, name ); - strcat( output_filename, ".out" ); - if( verbosity >= 1 ) - fprintf( stderr, "%s: Can't guess original name for '%s' -- using '%s'\n", - program_name, name, output_filename ); - } - - -static bool open_outstream( const bool force, const bool from_stdin ) - { - const mode_t usr_rw = S_IRUSR | S_IWUSR; - const mode_t all_rw = usr_rw | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH; - const mode_t outfd_mode = from_stdin ? 
all_rw : usr_rw; - int flags = O_CREAT | O_WRONLY | O_BINARY; - if( force ) flags |= O_TRUNC; else flags |= O_EXCL; - - outfd = open( output_filename, flags, outfd_mode ); - if( outfd >= 0 ) delete_output_on_interrupt = true; - else if( verbosity >= 0 ) - { - if( errno == EEXIST ) - fprintf( stderr, "%s: Output file '%s' already exists, skipping.\n", - program_name, output_filename ); - else - fprintf( stderr, "%s: Can't create output file '%s': %s\n", - program_name, output_filename, strerror( errno ) ); - } - return ( outfd >= 0 ); - } - - -static bool check_tty( const int infd, const enum Mode program_mode ) - { - if( program_mode == m_compress && isatty( outfd ) ) - { - show_error( "I won't write compressed data to a terminal.", 0, true ); - return false; - } - if( ( program_mode == m_decompress || program_mode == m_test ) && - isatty( infd ) ) - { - show_error( "I won't read compressed data from a terminal.", 0, true ); - return false; - } - return true; - } - - -void cleanup_and_fail( const int retval ) - { - if( delete_output_on_interrupt ) - { - delete_output_on_interrupt = false; - if( verbosity >= 0 ) - fprintf( stderr, "%s: Deleting output file '%s', if it exists.\n", - program_name, output_filename ); - if( outfd >= 0 ) { close( outfd ); outfd = -1; } - if( remove( output_filename ) != 0 && errno != ENOENT ) - show_error( "WARNING: deletion of output file (apparently) failed.", 0, false ); - } - exit( retval ); - } - - - /* Set permissions, owner and times. */ -static void close_and_set_permissions( const struct stat * const in_statsp ) - { - bool warning = false; - if( in_statsp ) - { - const mode_t mode = in_statsp->st_mode; - /* fchown will in many cases return with EPERM, which can be safely ignored. 
*/ - if( fchown( outfd, in_statsp->st_uid, in_statsp->st_gid ) == 0 ) - { if( fchmod( outfd, mode ) != 0 ) warning = true; } - else - if( errno != EPERM || - fchmod( outfd, mode & ~( S_ISUID | S_ISGID | S_ISVTX ) ) != 0 ) - warning = true; - } - if( close( outfd ) != 0 ) - { - show_error( "Error closing output file", errno, false ); - cleanup_and_fail( 1 ); - } - outfd = -1; - delete_output_on_interrupt = false; - if( in_statsp ) - { - struct utimbuf t; - t.actime = in_statsp->st_atime; - t.modtime = in_statsp->st_mtime; - if( utime( output_filename, &t ) != 0 ) warning = true; - } - if( warning && verbosity >= 1 ) - show_error( "Can't change output file attributes.", 0, false ); - } - - -/* Returns the number of bytes really read. - If (returned value < size) and (errno == 0), means EOF was reached. -*/ -static int readblock( const int fd, uint8_t * const buf, const int size ) - { - int sz = 0; - errno = 0; - while( sz < size ) - { - const int n = read( fd, buf + sz, size - sz ); - if( n > 0 ) sz += n; - else if( n == 0 ) break; /* EOF */ - else if( errno != EINTR ) break; - errno = 0; - } - return sz; - } - - -/* Returns the number of bytes really written. - If (returned value < size), it is always an error. 
-*/ -static int writeblock( const int fd, const uint8_t * const buf, const int size ) - { - int sz = 0; - errno = 0; - while( sz < size ) - { - const int n = write( fd, buf + sz, size - sz ); - if( n > 0 ) sz += n; - else if( n < 0 && errno != EINTR ) break; - errno = 0; - } - return sz; - } - - -static bool next_filename( void ) - { - const unsigned name_len = strlen( output_filename ); - const unsigned ext_len = strlen( known_extensions[0].from ); - int i, j; - if( name_len >= ext_len + 5 ) /* "*00001.lz" */ - for( i = name_len - ext_len - 1, j = 0; j < 5; --i, ++j ) - { - if( output_filename[i] < '9' ) { ++output_filename[i]; return true; } - else output_filename[i] = '0'; - } - return false; - } - - -static int do_compress( struct LZ_Encoder * const encoder, - const unsigned long long member_size, - const unsigned long long volume_size, - const int infd, struct Pretty_print * const pp, - const struct stat * const in_statsp ) - { - unsigned long long partial_volume_size = 0; - enum { buffer_size = 65536 }; - uint8_t buffer[buffer_size]; - if( verbosity >= 1 ) Pp_show_msg( pp, 0 ); - - while( true ) - { - int in_size = 0, out_size; - while( LZ_compress_write_size( encoder ) > 0 ) - { - const int size = min( LZ_compress_write_size( encoder ), buffer_size ); - const int rd = readblock( infd, buffer, size ); - if( rd != size && errno ) - { - Pp_show_msg( pp, 0 ); show_error( "Read error", errno, false ); - return 1; - } - if( rd > 0 && rd != LZ_compress_write( encoder, buffer, rd ) ) - internal_error( "library error (LZ_compress_write)." 
); - if( rd < size ) LZ_compress_finish( encoder ); -/* else LZ_compress_sync_flush( encoder ); */ - in_size += rd; - } - out_size = LZ_compress_read( encoder, buffer, buffer_size ); - if( out_size < 0 ) - { - Pp_show_msg( pp, 0 ); - if( verbosity >= 0 ) - fprintf( stderr, "%s: LZ_compress_read error: %s\n", - program_name, LZ_strerror( LZ_compress_errno( encoder ) ) ); - return 1; - } - else if( out_size > 0 ) - { - const int wr = writeblock( outfd, buffer, out_size ); - if( wr != out_size ) - { - Pp_show_msg( pp, 0 ); show_error( "Write error", errno, false ); - return 1; - } - } - else if( in_size == 0 ) internal_error( "library error (LZ_compress_read)." ); - if( LZ_compress_member_finished( encoder ) ) - { - unsigned long long size; - if( LZ_compress_finished( encoder ) == 1 ) break; - if( volume_size > 0 ) - { - partial_volume_size += LZ_compress_member_position( encoder ); - if( partial_volume_size >= volume_size - LZ_min_dictionary_size() ) - { - partial_volume_size = 0; - if( delete_output_on_interrupt ) - { - close_and_set_permissions( in_statsp ); - if( !next_filename() ) - { Pp_show_msg( pp, "Too many volume files." 
); return 1; } - if( !open_outstream( true, !in_statsp ) ) return 1; - } - } - size = min( member_size, volume_size - partial_volume_size ); - } - else - size = member_size; - if( LZ_compress_restart_member( encoder, size ) < 0 ) - { - Pp_show_msg( pp, 0 ); - if( verbosity >= 0 ) - fprintf( stderr, "%s: LZ_compress_restart_member error: %s\n", - program_name, LZ_strerror( LZ_compress_errno( encoder ) ) ); - return 1; - } - } - } - - if( verbosity >= 1 ) - { - const unsigned long long in_size = LZ_compress_total_in_size( encoder ); - const unsigned long long out_size = LZ_compress_total_out_size( encoder ); - if( in_size == 0 || out_size == 0 ) - fputs( " no data compressed.\n", stderr ); - else - fprintf( stderr, "%6.3f:1, %6.3f bits/byte, " - "%5.2f%% saved, %llu in, %llu out.\n", - (double)in_size / out_size, - ( 8.0 * out_size ) / in_size, - 100.0 * ( 1.0 - ( (double)out_size / in_size ) ), - in_size, out_size ); - } - return 0; - } - - -static int compress( const unsigned long long member_size, - const unsigned long long volume_size, const int infd, - const struct Lzma_options * const encoder_options, - struct Pretty_print * const pp, - const struct stat * const in_statsp ) - { - struct LZ_Encoder * const encoder = - LZ_compress_open( encoder_options->dictionary_size, - encoder_options->match_len_limit, ( volume_size > 0 ) ? - min( member_size, volume_size ) : member_size ); - int retval; - - if( !encoder || LZ_compress_errno( encoder ) != LZ_ok ) - { - if( !encoder || LZ_compress_errno( encoder ) == LZ_mem_error ) - Pp_show_msg( pp, "Not enough memory. Try a smaller dictionary size." ); - else - internal_error( "invalid argument to encoder." 
); - retval = 1; - } - else retval = do_compress( encoder, member_size, volume_size, - infd, pp, in_statsp ); - LZ_compress_close( encoder ); - return retval; - } - - -static int do_decompress( struct LZ_Decoder * const decoder, const int infd, - struct Pretty_print * const pp, - const bool ignore_trailing, const bool testing ) - { - enum { buffer_size = 65536 }; - uint8_t buffer[buffer_size]; - bool first_member; - - for( first_member = true; ; ) - { - const int max_in_size = min( LZ_decompress_write_size( decoder ), buffer_size ); - int in_size = 0, out_size = 0; - if( max_in_size > 0 ) - { - in_size = readblock( infd, buffer, max_in_size ); - if( in_size != max_in_size && errno ) - { - Pp_show_msg( pp, 0 ); show_error( "Read error", errno, false ); - return 1; - } - if( in_size > 0 && in_size != LZ_decompress_write( decoder, buffer, in_size ) ) - internal_error( "library error (LZ_decompress_write)." ); - if( in_size < max_in_size ) LZ_decompress_finish( decoder ); - } - while( true ) - { - const int rd = LZ_decompress_read( decoder, buffer, buffer_size ); - if( rd > 0 ) - { - out_size += rd; - if( outfd >= 0 ) - { - const int wr = writeblock( outfd, buffer, rd ); - if( wr != rd ) - { - Pp_show_msg( pp, 0 ); show_error( "Write error", errno, false ); - return 1; - } - } - } - else if( rd < 0 ) { out_size = rd; break; } - if( LZ_decompress_member_finished( decoder ) == 1 ) - { - if( verbosity >= 1 ) - { - const unsigned long long data_size = LZ_decompress_data_position( decoder ); - const unsigned long long member_size = LZ_decompress_member_position( decoder ); - Pp_show_msg( pp, 0 ); - show_header( LZ_decompress_dictionary_size( decoder ) ); - if( verbosity >= 2 && data_size > 0 && member_size > 0 ) - fprintf( stderr, "%6.3f:1, %6.3f bits/byte, %5.2f%% saved. 
", - (double)data_size / member_size, - ( 8.0 * member_size ) / data_size, - 100.0 * ( 1.0 - ( (double)member_size / data_size ) ) ); - if( verbosity >= 4 ) - fprintf( stderr, "data CRC %08X, data size %9llu, member size %8llu. ", - LZ_decompress_data_crc( decoder ), data_size, member_size ); - fputs( testing ? "ok\n" : "done\n", stderr ); - } - first_member = false; Pp_reset( pp ); - } - if( rd <= 0 ) break; - } - if( out_size < 0 || ( first_member && out_size == 0 ) ) - { - const enum LZ_Errno lz_errno = LZ_decompress_errno( decoder ); - if( lz_errno == LZ_unexpected_eof && - LZ_decompress_member_position( decoder ) <= 6 ) - { Pp_show_msg( pp, "File ends unexpectedly at member header." ); - return 2; } - if( lz_errno == LZ_header_error ) - { - if( first_member ) - { Pp_show_msg( pp, "Bad magic number (file not in lzip format)." ); - return 2; } - else if( !ignore_trailing ) - { show_error( "Trailing data not allowed.", 0, false ); return 2; } - break; - } - if( lz_errno == LZ_mem_error ) - { Pp_show_msg( pp, "Not enough memory." ); return 1; } - if( verbosity >= 0 ) - { - Pp_show_msg( pp, 0 ); - if( lz_errno == LZ_unexpected_eof ) - fprintf( stderr, "File ends unexpectedly at pos %llu\n", - LZ_decompress_total_in_size( decoder ) ); - else - fprintf( stderr, "Decoder error at pos %llu: %s\n", - LZ_decompress_total_in_size( decoder ), - LZ_strerror( LZ_decompress_errno( decoder ) ) ); - } - return 2; - } - if( LZ_decompress_finished( decoder ) == 1 ) break; - if( in_size == 0 && out_size == 0 ) - internal_error( "library error (LZ_decompress_read)." ); - } - return 0; - } - - -static int decompress( const int infd, struct Pretty_print * const pp, - const bool ignore_trailing, const bool testing ) - { - struct LZ_Decoder * const decoder = LZ_decompress_open(); - int retval; - - if( !decoder || LZ_decompress_errno( decoder ) != LZ_ok ) - { Pp_show_msg( pp, "Not enough memory." 
); retval = 1; } - else retval = do_decompress( decoder, infd, pp, ignore_trailing, testing ); - - LZ_decompress_close( decoder ); - return retval; - } - - -void signal_handler( int sig ) - { - if( sig ) {} /* keep compiler happy */ - show_error( "Control-C or similar caught, quitting.", 0, false ); - cleanup_and_fail( 1 ); - } - - -static void set_signals( void ) - { - signal( SIGHUP, signal_handler ); - signal( SIGINT, signal_handler ); - signal( SIGTERM, signal_handler ); - } - - -void show_error( const char * const msg, const int errcode, const bool help ) - { - if( verbosity < 0 ) return; - if( msg && msg[0] ) - { - fprintf( stderr, "%s: %s", program_name, msg ); - if( errcode > 0 ) fprintf( stderr, ": %s", strerror( errcode ) ); - fputc( '\n', stderr ); - } - if( help ) - fprintf( stderr, "Try '%s --help' for more information.\n", - invocation_name ); - } - - -void internal_error( const char * const msg ) - { - if( verbosity >= 0 ) - fprintf( stderr, "%s: internal error: %s\n", program_name, msg ); - exit( 3 ); - } - - -int main( const int argc, const char * const argv[] ) - { - /* Mapping from gzip/bzip2 style 1..9 compression modes - to the corresponding LZMA compression modes. 
*/ - const struct Lzma_options option_mapping[] = - { - { 65535, 16 }, /* -0 (65535,16 chooses fast encoder) */ - { 1 << 20, 5 }, /* -1 */ - { 3 << 19, 6 }, /* -2 */ - { 1 << 21, 8 }, /* -3 */ - { 3 << 20, 12 }, /* -4 */ - { 1 << 22, 20 }, /* -5 */ - { 1 << 23, 36 }, /* -6 */ - { 1 << 24, 68 }, /* -7 */ - { 3 << 23, 132 }, /* -8 */ - { 1 << 25, 273 } }; /* -9 */ - struct Lzma_options encoder_options = option_mapping[6]; /* default = "-6" */ - const unsigned long long max_member_size = 0x0008000000000000ULL; - const unsigned long long max_volume_size = 0x4000000000000000ULL; - unsigned long long member_size = max_member_size; - unsigned long long volume_size = 0; - const char * input_filename = ""; - const char * default_output_filename = ""; - const char ** filenames = 0; - int num_filenames = 0; - int infd = -1; - enum Mode program_mode = m_compress; - int argind = 0; - int retval = 0; - int i; - bool filenames_given = false; - bool force = false; - bool ignore_trailing = true; - bool keep_input_files = false; - bool stdin_used = false; - bool recompress = false; - bool to_stdout = false; - struct Pretty_print pp; - - const struct ap_Option options[] = - { - { '0', "fast", ap_no }, - { '1', 0, ap_no }, - { '2', 0, ap_no }, - { '3', 0, ap_no }, - { '4', 0, ap_no }, - { '5', 0, ap_no }, - { '6', 0, ap_no }, - { '7', 0, ap_no }, - { '8', 0, ap_no }, - { '9', "best", ap_no }, - { 'a', "trailing-error", ap_no }, - { 'b', "member-size", ap_yes }, - { 'c', "stdout", ap_no }, - { 'd', "decompress", ap_no }, - { 'f', "force", ap_no }, - { 'F', "recompress", ap_no }, - { 'h', "help", ap_no }, - { 'k', "keep", ap_no }, - { 'm', "match-length", ap_yes }, - { 'n', "threads", ap_yes }, - { 'o', "output", ap_yes }, - { 'q', "quiet", ap_no }, - { 's', "dictionary-size", ap_yes }, - { 'S', "volume-size", ap_yes }, - { 't', "test", ap_no }, - { 'v', "verbose", ap_no }, - { 'V', "version", ap_no }, - { 0 , 0, ap_no } }; - - struct Arg_parser parser; - - invocation_name = argv[0]; - 
- if( LZ_version()[0] != LZ_version_string[0] ) - internal_error( "bad library version." ); - if( strcmp( PROGVERSION, LZ_version_string ) != 0 ) - internal_error( "bad library version_string." ); - - if( !ap_init( &parser, argc, argv, options, 0 ) ) - { show_error( "Not enough memory.", 0, false ); return 1; } - if( ap_error( &parser ) ) /* bad option */ - { show_error( ap_error( &parser ), 0, true ); return 1; } - - for( ; argind < ap_arguments( &parser ); ++argind ) - { - const int code = ap_code( &parser, argind ); - const char * const arg = ap_argument( &parser, argind ); - if( !code ) break; /* no more options */ - switch( code ) - { - case '0': case '1': case '2': case '3': case '4': - case '5': case '6': case '7': case '8': case '9': - encoder_options = option_mapping[code-'0']; break; - case 'a': ignore_trailing = false; break; - case 'b': member_size = getnum( arg, 100000, max_member_size ); break; - case 'c': to_stdout = true; break; - case 'd': program_mode = m_decompress; break; - case 'f': force = true; break; - case 'F': recompress = true; break; - case 'h': show_help(); return 0; - case 'k': keep_input_files = true; break; - case 'm': encoder_options.match_len_limit = - getnum( arg, LZ_min_match_len_limit(), - LZ_max_match_len_limit() ); break; - case 'n': break; - case 'o': default_output_filename = arg; break; - case 'q': verbosity = -1; break; - case 's': encoder_options.dictionary_size = get_dict_size( arg ); - break; - case 'S': volume_size = getnum( arg, 100000, max_volume_size ); break; - case 't': program_mode = m_test; break; - case 'v': if( verbosity < 4 ) ++verbosity; break; - case 'V': show_version(); return 0; - default : internal_error( "uncaught option." 
); - } - } /* end process options */ - -#if defined(__MSVCRT__) || defined(__OS2__) - setmode( STDIN_FILENO, O_BINARY ); - setmode( STDOUT_FILENO, O_BINARY ); -#endif - - if( program_mode == m_test ) - outfd = -1; - - num_filenames = max( 1, ap_arguments( &parser ) - argind ); - filenames = resize_buffer( filenames, num_filenames * sizeof filenames[0] ); - filenames[0] = "-"; - - for( i = 0; argind + i < ap_arguments( &parser ); ++i ) - { - filenames[i] = ap_argument( &parser, argind + i ); - if( strcmp( filenames[i], "-" ) != 0 ) filenames_given = true; - } - - if( !to_stdout && program_mode != m_test && - ( filenames_given || default_output_filename[0] ) ) - set_signals(); - - Pp_init( &pp, filenames, num_filenames, verbosity ); - - output_filename = resize_buffer( output_filename, 1 ); - for( i = 0; i < num_filenames; ++i ) - { - int tmp; - struct stat in_stats; - const struct stat * in_statsp; - output_filename[0] = 0; - - if( !filenames[i][0] || strcmp( filenames[i], "-" ) == 0 ) - { - if( stdin_used ) continue; else stdin_used = true; - input_filename = ""; - infd = STDIN_FILENO; - if( program_mode != m_test ) - { - if( to_stdout || !default_output_filename[0] ) - outfd = STDOUT_FILENO; - else - { - if( program_mode == m_compress ) - set_c_outname( default_output_filename, volume_size > 0 ); - else - { - output_filename = resize_buffer( output_filename, - strlen( default_output_filename ) + 1 ); - strcpy( output_filename, default_output_filename ); - } - if( !open_outstream( force, true ) ) - { - if( retval < 1 ) retval = 1; - close( infd ); infd = -1; - continue; - } - } - } - } - else - { - const int eindex = extension_index( filenames[i] ); - input_filename = filenames[i]; - infd = open_instream( input_filename, &in_stats, program_mode, - eindex, recompress, to_stdout ); - if( infd < 0 ) { if( retval < 1 ) retval = 1; continue; } - if( program_mode != m_test ) - { - if( to_stdout ) outfd = STDOUT_FILENO; - else - { - if( program_mode == m_compress ) - 
set_c_outname( input_filename, volume_size > 0 ); - else set_d_outname( input_filename, eindex ); - if( !open_outstream( force, false ) ) - { - if( retval < 1 ) retval = 1; - close( infd ); infd = -1; - continue; - } - } - } - } - - if( !check_tty( infd, program_mode ) ) - { - if( retval < 1 ) retval = 1; - cleanup_and_fail( retval ); - } - - in_statsp = input_filename[0] ? &in_stats : 0; - Pp_set_name( &pp, input_filename ); - if( program_mode == m_compress ) - tmp = compress( member_size, volume_size, infd, &encoder_options, &pp, - in_statsp ); - else - tmp = decompress( infd, &pp, ignore_trailing, program_mode == m_test ); - if( tmp > retval ) retval = tmp; - if( tmp && program_mode != m_test ) cleanup_and_fail( retval ); - - if( delete_output_on_interrupt ) - close_and_set_permissions( in_statsp ); - if( input_filename[0] ) - { - close( infd ); infd = -1; - if( !keep_input_files && !to_stdout && program_mode != m_test ) - remove( input_filename ); - } - } - if( outfd >= 0 && close( outfd ) != 0 ) - { - show_error( "Can't close stdout", errno, false ); - if( retval < 1 ) retval = 1; - } - free( output_filename ); - free( filenames ); - ap_free( &parser ); - return retval; - } diff --git a/minilzip.c b/minilzip.c new file mode 100644 index 0000000..733506c --- /dev/null +++ b/minilzip.c @@ -0,0 +1,1306 @@ +/* Minilzip - Test program for the library lzlib + Copyright (C) 2009-2025 Antonio Diaz Diaz. + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 2 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. 
+ + You should have received a copy of the GNU General Public License + along with this program. If not, see . +*/ +/* + Exit status: 0 for a normal exit, 1 for environmental problems + (file not found, invalid command-line options, I/O errors, etc), 2 to + indicate a corrupt or invalid input file, 3 for an internal consistency + error (e.g., bug) which caused minilzip to panic. +*/ + +#define _FILE_OFFSET_BITS 64 + +#include +#include +#include +#include /* CHAR_BIT, SSIZE_MAX */ +#include +#include +#include /* SIZE_MAX */ +#include +#include +#include +#include +#include +#include +#if defined __MSVCRT__ || defined __OS2__ || defined __DJGPP__ +#include +#if defined __MSVCRT__ +#define fchmod(x,y) 0 +#define fchown(x,y,z) 0 +#define strtoull strtoul +#define SIGHUP SIGTERM +#define S_ISSOCK(x) 0 +#ifndef S_IRGRP +#define S_IRGRP 0 +#define S_IWGRP 0 +#define S_IROTH 0 +#define S_IWOTH 0 +#endif +#endif +#if defined __DJGPP__ +#define S_ISSOCK(x) 0 +#define S_ISVTX 0 +#endif +#endif + +#include "carg_parser.h" +#include "lzlib.h" + +#ifndef O_BINARY +#define O_BINARY 0 +#endif + +#if CHAR_BIT != 8 +#error "Environments where CHAR_BIT != 8 are not supported." +#endif + +#if ( defined SIZE_MAX && SIZE_MAX < UINT_MAX ) || \ + ( defined SSIZE_MAX && SSIZE_MAX < INT_MAX ) +#error "Environments where 'size_t' is narrower than 'int' are not supported." +#endif + +#ifndef max + #define max(x,y) ((x) >= (y) ? (x) : (y)) +#endif +#ifndef min + #define min(x,y) ((x) <= (y) ? 
(x) : (y)) +#endif + +static void cleanup_and_fail( const int retval ); +static void show_error( const char * const msg, const int errcode, + const bool help ); +static void show_file_error( const char * const filename, + const char * const msg, const int errcode ); +static void internal_error( const char * const msg ); +static const char * const mem_msg = "Not enough memory."; + +int verbosity = 0; + +static const char * const program_name = "minilzip"; +static const char * const program_year = "2025"; +static const char * invocation_name = "minilzip"; /* default value */ + +static const struct { const char * from; const char * to; } known_extensions[] = { + { ".lz", "" }, + { ".tlz", ".tar" }, + { 0, 0 } }; + +typedef struct Lzma_options + { + int dictionary_size; /* 4 KiB .. 512 MiB */ + int match_len_limit; /* 5 .. 273 */ + } Lzma_options; + +typedef enum Mode { m_compress, m_decompress, m_test } Mode; + +/* Variables used in signal handler context. + They are not declared volatile because the handler never returns. */ +static char * output_filename = 0; +static int outfd = -1; +static bool delete_output_on_interrupt = false; + + +static void show_help( void ) + { + printf( "Minilzip is a test program for the compression library lzlib. Minilzip is\n" + "not intended to be installed because lzip has more features, but minilzip is\n" + "well tested and you can use it as your main compressor if so you wish.\n" + "\nLzip is a lossless data compressor with a user interface similar to the one\n" + "of gzip or bzip2. Lzip uses a simplified form of LZMA (Lempel-Ziv-Markov\n" + "chain-Algorithm) designed to achieve complete interoperability between\n" + "implementations. The maximum dictionary size is 512 MiB so that any lzip\n" + "file can be decompressed on 32-bit machines. Lzip provides accurate and\n" + "robust 3-factor integrity checking. 'lzip -0' compresses about as fast as\n" + "gzip, while 'lzip -9' compresses most files more than bzip2. 
Decompression\n" + "speed is intermediate between gzip and bzip2. Lzip provides better data\n" + "recovery capabilities than gzip and bzip2. Lzip has been designed, written,\n" + "and tested with great care to replace gzip and bzip2 as general-purpose\n" + "compressed format for Unix-like systems.\n" + "\nUsage: %s [options] [files]\n", invocation_name ); + printf( "\nOptions:\n" + " -h, --help display this help and exit\n" + " -V, --version output version information and exit\n" + " -a, --trailing-error exit with error status if trailing data\n" + " -b, --member-size= set member size limit of multimember files\n" + " -c, --stdout write to standard output, keep input files\n" + " -d, --decompress decompress, test compressed file integrity\n" + " -f, --force overwrite existing output files\n" + " -F, --recompress force re-compression of compressed files\n" + " -k, --keep keep (don't delete) input files\n" + " -m, --match-length= set match length limit in bytes [36]\n" + " -o, --output= write to , keep input files\n" + " -q, --quiet suppress all messages\n" + " -s, --dictionary-size= set dictionary size limit in bytes [8 MiB]\n" + " -S, --volume-size= set volume size limit in bytes\n" + " -t, --test test compressed file integrity\n" + " -v, --verbose be verbose (a 2nd -v gives more)\n" + " -0 .. 
-9 set compression level [default 6]\n" + " --fast alias for -0\n" + " --best alias for -9\n" + " --loose-trailing allow trailing data seeming corrupt header\n" + " --check-lib compare version of lzlib.h with liblz.{a,so}\n" + "\nIf no file names are given, or if a file is '-', minilzip compresses or\n" + "decompresses from standard input to standard output.\n" + "Numbers may be followed by a multiplier: k = kB = 10^3 = 1000,\n" + "Ki = KiB = 2^10 = 1024, M = 10^6, Mi = 2^20, G = 10^9, Gi = 2^30, etc...\n" + "Dictionary sizes 12 to 29 are interpreted as powers of two, meaning 2^12 to\n" + "2^29 bytes.\n" + "\nThe bidimensional parameter space of LZMA can't be mapped to a linear scale\n" + "optimal for all files. If your files are large, very repetitive, etc, you\n" + "may need to use the options --dictionary-size and --match-length directly\n" + "to achieve optimal performance.\n" + "\nTo extract all the files from archive 'foo.tar.lz', use the commands\n" + "'tar -xf foo.tar.lz' or 'minilzip -cd foo.tar.lz | tar -xf -'.\n" + "\nExit status: 0 for a normal exit, 1 for environmental problems\n" + "(file not found, invalid command-line options, I/O errors, etc), 2 to\n" + "indicate a corrupt or invalid input file, 3 for an internal consistency\n" + "error (e.g., bug) which caused minilzip to panic.\n" + "\nThe ideas embodied in lzlib are due to (at least) the following people:\n" + "Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrei Markov (for the\n" + "definition of Markov chains), G.N.N. 
Martin (for the definition of range\n" + "encoding), Igor Pavlov (for putting all the above together in LZMA), and\n" + "Julian Seward (for bzip2's CLI).\n" + "\nReport bugs to lzip-bug@nongnu.org\n" + "Lzlib home page: http://www.nongnu.org/lzip/lzlib.html\n" ); + } + + +static void show_lzlib_version( void ) + { + printf( "Using lzlib %s\n", LZ_version() ); +#if !defined LZ_API_VERSION + fputs( "LZ_API_VERSION is not defined.\n", stdout ); +#elif LZ_API_VERSION >= 1012 + printf( "Using LZ_API_VERSION = %u\n", LZ_api_version() ); +#else + printf( "Compiled with LZ_API_VERSION = %u. " + "Using an unknown LZ_API_VERSION\n", LZ_API_VERSION ); +#endif + } + + +static void show_version( void ) + { + printf( "%s %s\n", program_name, PROGVERSION ); + printf( "Copyright (C) %s Antonio Diaz Diaz.\n", program_year ); + printf( "License GPLv2+: GNU GPL version 2 or later \n" + "This is free software: you are free to change and redistribute it.\n" + "There is NO WARRANTY, to the extent permitted by law.\n" ); + show_lzlib_version(); + } + + +static inline void set_retval( int * retval, const int new_val ) + { if( *retval < new_val ) *retval = new_val; } + + +static int check_lzlib_ver() /* . or .[a-z.-]* */ + { +#if defined LZ_API_VERSION && LZ_API_VERSION >= 1012 + const unsigned char * p = (unsigned char *)LZ_version_string; + unsigned major = 0, minor = 0; + while( major < 100000 && isdigit( *p ) ) + { major *= 10; major += *p - '0'; ++p; } + if( *p == '.' ) ++p; + else +out: { show_error( "Invalid LZ_version_string in lzlib.h", 0, false ); return 2; } + while( minor < 100 && isdigit( *p ) ) + { minor *= 10; minor += *p - '0'; ++p; } + if( *p && *p != '-' && *p != '.' 
&& !islower( *p ) ) goto out; + const unsigned version = major * 1000 + minor; + if( LZ_API_VERSION != version ) + { + if( verbosity >= 0 ) + fprintf( stderr, "%s: Version mismatch in lzlib.h: " + "LZ_API_VERSION = %u, should be %u.\n", + program_name, LZ_API_VERSION, version ); + return 2; + } +#endif + return 0; + } + + +static int check_lib() + { + int retval = check_lzlib_ver(); + if( strcmp( LZ_version_string, LZ_version() ) != 0 ) + { set_retval( &retval, 1 ); + if( verbosity >= 0 ) + printf( "warning: LZ_version_string != LZ_version() (%s vs %s)\n", + LZ_version_string, LZ_version() ); } +#if defined LZ_API_VERSION && LZ_API_VERSION >= 1012 + if( LZ_API_VERSION != LZ_api_version() ) + { set_retval( &retval, 1 ); + if( verbosity >= 0 ) + printf( "warning: LZ_API_VERSION != LZ_api_version() (%u vs %u)\n", + LZ_API_VERSION, LZ_api_version() ); } +#endif + if( verbosity >= 1 ) show_lzlib_version(); + return retval; + } + + +/* assure at least a minimum size for buffer 'buf' */ +static void * resize_buffer( void * buf, const unsigned min_size ) + { + if( buf ) buf = realloc( buf, min_size ); + else buf = malloc( min_size ); + if( !buf ) { show_error( mem_msg, 0, false ); cleanup_and_fail( 1 ); } + return buf; + } + + +typedef struct Pretty_print + { + const char * name; + char * padded_name; + const char * stdin_name; + unsigned longest_name; + bool first_post; + } Pretty_print; + +static void Pp_init( Pretty_print * const pp, + const char * const filenames[], const int num_filenames ) + { + pp->name = 0; + pp->padded_name = 0; + pp->stdin_name = "(stdin)"; + pp->longest_name = 0; + pp->first_post = false; + + if( verbosity <= 0 ) return; + const unsigned stdin_name_len = strlen( pp->stdin_name ); + int i; + for( i = 0; i < num_filenames; ++i ) + { + const char * const s = filenames[i]; + const unsigned len = (strcmp( s, "-" ) == 0) ? 
stdin_name_len : strlen( s ); + if( pp->longest_name < len ) pp->longest_name = len; + } + if( pp->longest_name == 0 ) pp->longest_name = stdin_name_len; + } + +void Pp_free( Pretty_print * const pp ) + { if( pp->padded_name ) { free( pp->padded_name ); pp->padded_name = 0; } } + +static void Pp_set_name( Pretty_print * const pp, const char * const filename ) + { + unsigned name_len, padded_name_len, i = 0; + + if( filename && filename[0] && strcmp( filename, "-" ) != 0 ) + pp->name = filename; + else pp->name = pp->stdin_name; + name_len = strlen( pp->name ); + padded_name_len = max( name_len, pp->longest_name ) + 4; + pp->padded_name = resize_buffer( pp->padded_name, padded_name_len + 1 ); + while( i < 2 ) pp->padded_name[i++] = ' '; + while( i < name_len + 2 ) { pp->padded_name[i] = pp->name[i-2]; ++i; } + pp->padded_name[i++] = ':'; + while( i < padded_name_len ) pp->padded_name[i++] = ' '; + pp->padded_name[i] = 0; + pp->first_post = true; + } + +static void Pp_reset( Pretty_print * const pp ) + { if( pp->name && pp->name[0] ) pp->first_post = true; } + +static void Pp_show_msg( Pretty_print * const pp, const char * const msg ) + { + if( verbosity < 0 ) return; + if( pp->first_post ) + { + pp->first_post = false; + fputs( pp->padded_name, stderr ); + if( !msg ) fflush( stderr ); + } + if( msg ) fprintf( stderr, "%s\n", msg ); + } + + +static void show_header( const unsigned dictionary_size ) + { + enum { factor = 1024, n = 3 }; + const char * const prefix[n] = { "Ki", "Mi", "Gi" }; + const char * p = ""; + const char * np = " "; + unsigned num = dictionary_size; + bool exact = num % factor == 0; + + int i; for( i = 0; i < n && ( num > 9999 || ( exact && num >= factor ) ); ++i ) + { num /= factor; if( num % factor != 0 ) exact = false; + p = prefix[i]; np = ""; } + fprintf( stderr, "dict %s%4u %sB, ", np, num, p ); + } + + +/* separate numbers of 5 or more digits in groups of 3 digits using '_' */ +static const char * format_num3( unsigned long long num ) + { + 
enum { buffers = 8, bufsize = 4 * sizeof num, n = 10 }; + const char * const si_prefix = "kMGTPEZYRQ"; + const char * const binary_prefix = "KMGTPEZYRQ"; + static char buffer[buffers][bufsize]; /* circle of static buffers for printf */ + static int current = 0; + int i; + char * const buf = buffer[current++]; current %= buffers; + char * p = buf + bufsize - 1; /* fill the buffer backwards */ + *p = 0; /* terminator */ + if( num > 9999 ) + { + char prefix = 0; /* try binary first, then si */ + for( i = 0; i < n && num != 0 && num % 1024 == 0; ++i ) + { num /= 1024; prefix = binary_prefix[i]; } + if( prefix ) *(--p) = 'i'; + else + for( i = 0; i < n && num != 0 && num % 1000 == 0; ++i ) + { num /= 1000; prefix = si_prefix[i]; } + if( prefix ) *(--p) = prefix; + } + const bool split = num >= 10000; + + for( i = 0; ; ) + { + *(--p) = num % 10 + '0'; num /= 10; if( num == 0 ) break; + if( split && ++i >= 3 ) { i = 0; *(--p) = '_'; } + } + return p; + } + + +void show_option_error( const char * const arg, const char * const msg, + const char * const option_name ) + { + if( verbosity >= 0 ) + fprintf( stderr, "%s: '%s': %s option '%s'.\n", + program_name, arg, msg, option_name ); + } + + +/* Recognized formats: k, Ki, [MGTPEZYRQ][i] */ +static unsigned long long getnum( const char * const arg, + const char * const option_name, + const unsigned long long llimit, + const unsigned long long ulimit ) + { + char * tail; + errno = 0; + unsigned long long result = strtoull( arg, &tail, 0 ); + if( tail == arg ) + { show_option_error( arg, "Bad or missing numerical argument in", + option_name ); exit( 1 ); } + + if( !errno && tail[0] ) + { + const unsigned factor = (tail[1] == 'i') ? 
1024 : 1000; + int exponent = 0; /* 0 = bad multiplier */ + int i; + switch( tail[0] ) + { + case 'Q': exponent = 10; break; + case 'R': exponent = 9; break; + case 'Y': exponent = 8; break; + case 'Z': exponent = 7; break; + case 'E': exponent = 6; break; + case 'P': exponent = 5; break; + case 'T': exponent = 4; break; + case 'G': exponent = 3; break; + case 'M': exponent = 2; break; + case 'K': if( factor == 1024 ) exponent = 1; break; + case 'k': if( factor == 1000 ) exponent = 1; break; + } + if( exponent <= 0 ) + { show_option_error( arg, "Bad multiplier in numerical argument of", + option_name ); exit( 1 ); } + for( i = 0; i < exponent; ++i ) + { + if( ulimit / factor >= result ) result *= factor; + else { errno = ERANGE; break; } + } + } + if( !errno && ( result < llimit || result > ulimit ) ) errno = ERANGE; + if( errno ) + { + if( verbosity >= 0 ) + fprintf( stderr, "%s: '%s': Value out of limits [%s,%s] in " + "option '%s'.\n", program_name, arg, format_num3( llimit ), + format_num3( ulimit ), option_name ); + exit( 1 ); + } + return result; + } + + +static int get_dict_size( const char * const arg, const char * const option_name ) + { + char * tail; + const long bits = strtol( arg, &tail, 0 ); + if( bits >= LZ_min_dictionary_bits() && + bits <= LZ_max_dictionary_bits() && *tail == 0 ) + return 1 << bits; + int dictionary_size = getnum( arg, option_name, LZ_min_dictionary_size(), + LZ_max_dictionary_size() ); + if( dictionary_size == 65535 ) ++dictionary_size; /* no fast encoder */ + return dictionary_size; + } + + +static void set_mode( Mode * const program_modep, const Mode new_mode ) + { + if( *program_modep != m_compress && *program_modep != new_mode ) + { + show_error( "Only one operation can be specified.", 0, true ); + exit( 1 ); + } + *program_modep = new_mode; + } + + +static int extension_index( const char * const name ) + { + int eindex; + for( eindex = 0; known_extensions[eindex].from; ++eindex ) + { + const char * const ext = 
known_extensions[eindex].from; + const unsigned name_len = strlen( name ); + const unsigned ext_len = strlen( ext ); + if( name_len > ext_len && + strncmp( name + name_len - ext_len, ext, ext_len ) == 0 ) + return eindex; + } + return -1; + } + + +static void set_c_outname( const char * const name, const bool force_ext, + const bool multifile ) + { + output_filename = resize_buffer( output_filename, strlen( name ) + 5 + + strlen( known_extensions[0].from ) + 1 ); + strcpy( output_filename, name ); + if( multifile ) strcat( output_filename, "00001" ); + if( force_ext || multifile ) + strcat( output_filename, known_extensions[0].from ); + } + + +static void set_d_outname( const char * const name, const int eindex ) + { + const unsigned name_len = strlen( name ); + if( eindex >= 0 ) + { + const char * const from = known_extensions[eindex].from; + const unsigned from_len = strlen( from ); + if( name_len > from_len ) + { + output_filename = resize_buffer( output_filename, name_len + + strlen( known_extensions[eindex].to ) + 1 ); + strcpy( output_filename, name ); + strcpy( output_filename + name_len - from_len, known_extensions[eindex].to ); + return; + } + } + output_filename = resize_buffer( output_filename, name_len + 4 + 1 ); + strcpy( output_filename, name ); + strcat( output_filename, ".out" ); + if( verbosity >= 1 ) + fprintf( stderr, "%s: %s: Can't guess original name -- using '%s'\n", + program_name, name, output_filename ); + } + + +static int open_instream( const char * const name, struct stat * const in_statsp, + const Mode program_mode, const int eindex, + const bool one_to_one, const bool recompress ) + { + if( program_mode == m_compress && !recompress && eindex >= 0 ) + { + if( verbosity >= 0 ) + fprintf( stderr, "%s: %s: Input file already has '%s' suffix, ignored.\n", + program_name, name, known_extensions[eindex].from ); + return -1; + } + int infd = open( name, O_RDONLY | O_BINARY ); + if( infd < 0 ) + show_file_error( name, "Can't open input file", 
errno ); + else + { + const int i = fstat( infd, in_statsp ); + const mode_t mode = in_statsp->st_mode; + const bool can_read = i == 0 && + ( S_ISBLK( mode ) || S_ISCHR( mode ) || + S_ISFIFO( mode ) || S_ISSOCK( mode ) ); + if( i != 0 || ( !S_ISREG( mode ) && ( !can_read || one_to_one ) ) ) + { + if( verbosity >= 0 ) + fprintf( stderr, "%s: %s: Input file is not a regular file%s.\n", + program_name, name, ( can_read && one_to_one ) ? + ",\n and neither '-c' nor '-o' were specified" : "" ); + close( infd ); + infd = -1; + } + } + return infd; + } + + +static bool open_outstream( const bool force, const bool protect ) + { + const mode_t usr_rw = S_IRUSR | S_IWUSR; + const mode_t all_rw = usr_rw | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH; + const mode_t outfd_mode = protect ? usr_rw : all_rw; + int flags = O_CREAT | O_WRONLY | O_BINARY; + if( force ) flags |= O_TRUNC; else flags |= O_EXCL; + + outfd = open( output_filename, flags, outfd_mode ); + if( outfd >= 0 ) delete_output_on_interrupt = true; + else if( errno == EEXIST ) + show_file_error( output_filename, + "Output file already exists, skipping.", 0 ); + else + show_file_error( output_filename, "Can't create output file", errno ); + return outfd >= 0; + } + + +static void set_signals( void (*action)(int) ) + { + signal( SIGHUP, action ); + signal( SIGINT, action ); + signal( SIGTERM, action ); + } + + +static void cleanup_and_fail( const int retval ) + { + set_signals( SIG_IGN ); /* ignore signals */ + if( delete_output_on_interrupt ) + { + delete_output_on_interrupt = false; + show_file_error( output_filename, "Deleting output file, if it exists.", 0 ); + if( outfd >= 0 ) { close( outfd ); outfd = -1; } + if( remove( output_filename ) != 0 && errno != ENOENT ) + show_error( "warning: deletion of output file failed", errno, false ); + } + exit( retval ); + } + + +static void signal_handler( int sig ) + { + if( sig ) {} /* keep compiler happy */ + show_error( "Control-C or similar caught, quitting.", 0, false ); + 
cleanup_and_fail( 1 ); + } + + +static bool check_tty_in( const char * const input_filename, const int infd, + const Mode program_mode, int * const retval ) + { + if( ( program_mode == m_decompress || program_mode == m_test ) && + isatty( infd ) ) /* for example /dev/tty */ + { show_file_error( input_filename, + "I won't read compressed data from a terminal.", 0 ); + close( infd ); set_retval( retval, 2 ); + if( program_mode != m_test ) cleanup_and_fail( *retval ); + return false; } + return true; + } + +static bool check_tty_out( const Mode program_mode ) + { + if( program_mode == m_compress && isatty( outfd ) ) + { show_file_error( output_filename[0] ? + output_filename : "(stdout)", + "I won't write compressed data to a terminal.", 0 ); + return false; } + return true; + } + + +/* Set permissions, owner, and times. */ +static void close_and_set_permissions( const struct stat * const in_statsp ) + { + bool warning = false; + if( in_statsp ) + { + const mode_t mode = in_statsp->st_mode; + /* fchown in many cases returns with EPERM, which can be safely ignored. */ + if( fchown( outfd, in_statsp->st_uid, in_statsp->st_gid ) == 0 ) + { if( fchmod( outfd, mode ) != 0 ) warning = true; } + else + if( errno != EPERM || + fchmod( outfd, mode & ~( S_ISUID | S_ISGID | S_ISVTX ) ) != 0 ) + warning = true; + } + if( close( outfd ) != 0 ) + { show_file_error( output_filename, "Error closing output file", errno ); + cleanup_and_fail( 1 ); } + outfd = -1; + delete_output_on_interrupt = false; + if( in_statsp ) + { + struct utimbuf t; + t.actime = in_statsp->st_atime; + t.modtime = in_statsp->st_mtime; + if( utime( output_filename, &t ) != 0 ) warning = true; + } + if( warning && verbosity >= 1 ) + show_file_error( output_filename, + "warning: can't change output file attributes", errno ); + } + + +/* Return the number of bytes really read. + If (value returned < size) and (errno == 0), means EOF was reached. 
+*/ +static int readblock( const int fd, uint8_t * const buf, const int size ) + { + int sz = 0; + errno = 0; + while( sz < size ) + { + const int n = read( fd, buf + sz, size - sz ); + if( n > 0 ) sz += n; + else if( n == 0 ) break; /* EOF */ + else if( errno != EINTR ) break; + errno = 0; + } + return sz; + } + + +/* Return the number of bytes really written. + If (value returned < size), it is always an error. +*/ +static int writeblock( const int fd, const uint8_t * const buf, const int size ) + { + int sz = 0; + errno = 0; + while( sz < size ) + { + const int n = write( fd, buf + sz, size - sz ); + if( n > 0 ) sz += n; + else if( n < 0 && errno != EINTR ) break; + errno = 0; + } + return sz; + } + + +static bool next_filename( void ) + { + const unsigned name_len = strlen( output_filename ); + const unsigned ext_len = strlen( known_extensions[0].from ); + int i, j; + if( name_len >= ext_len + 5 ) /* "*00001.lz" */ + for( i = name_len - ext_len - 1, j = 0; j < 5; --i, ++j ) + { + if( output_filename[i] < '9' ) { ++output_filename[i]; return true; } + else output_filename[i] = '0'; + } + return false; + } + + +static int do_compress( LZ_Encoder * const encoder, + const unsigned long long member_size, + const unsigned long long volume_size, const int infd, + Pretty_print * const pp, + const struct stat * const in_statsp ) + { + unsigned long long partial_volume_size = 0; + enum { buffer_size = 65536 }; + uint8_t buffer[buffer_size]; /* read/write buffer */ + if( verbosity >= 1 ) Pp_show_msg( pp, 0 ); + + while( true ) + { + int in_size = 0; + while( LZ_compress_write_size( encoder ) > 0 ) + { + const int size = min( LZ_compress_write_size( encoder ), buffer_size ); + const int rd = readblock( infd, buffer, size ); + if( rd != size && errno ) + { + Pp_show_msg( pp, 0 ); show_error( "Read error", errno, false ); + return 1; + } + if( rd > 0 && rd != LZ_compress_write( encoder, buffer, rd ) ) + internal_error( "library error (LZ_compress_write)." 
); + if( rd < size ) LZ_compress_finish( encoder ); +/* else LZ_compress_sync_flush( encoder ); */ + in_size += rd; + } + const int out_size = LZ_compress_read( encoder, buffer, buffer_size ); + if( out_size < 0 ) + { + Pp_show_msg( pp, 0 ); + if( verbosity >= 0 ) + fprintf( stderr, "%s: LZ_compress_read error: %s\n", + program_name, LZ_strerror( LZ_compress_errno( encoder ) ) ); + return 1; + } + else if( out_size > 0 ) + { + const int wr = writeblock( outfd, buffer, out_size ); + if( wr != out_size ) + { + Pp_show_msg( pp, 0 ); show_error( "Write error", errno, false ); + return 1; + } + } + else if( in_size == 0 ) + internal_error( "library error (LZ_compress_read)." ); + if( LZ_compress_member_finished( encoder ) ) + { + unsigned long long size; + if( LZ_compress_finished( encoder ) == 1 ) break; + if( volume_size > 0 ) + { + partial_volume_size += LZ_compress_member_position( encoder ); + if( partial_volume_size >= volume_size - LZ_min_dictionary_size() ) + { + partial_volume_size = 0; + if( delete_output_on_interrupt ) + { + close_and_set_permissions( in_statsp ); + if( !next_filename() ) + { Pp_show_msg( pp, "Too many volume files." 
); return 1; } + if( !open_outstream( true, in_statsp ) ) return 1; + } + } + size = min( member_size, volume_size - partial_volume_size ); + } + else + size = member_size; + if( LZ_compress_restart_member( encoder, size ) < 0 ) + { + Pp_show_msg( pp, 0 ); + if( verbosity >= 0 ) + fprintf( stderr, "%s: LZ_compress_restart_member error: %s\n", + program_name, LZ_strerror( LZ_compress_errno( encoder ) ) ); + return 1; + } + } + } + + if( verbosity >= 1 ) + { + const unsigned long long in_size = LZ_compress_total_in_size( encoder ); + const unsigned long long out_size = LZ_compress_total_out_size( encoder ); + if( in_size == 0 || out_size == 0 ) + fputs( " no data compressed.\n", stderr ); + else + fprintf( stderr, "%6.3f:1, %5.2f%% ratio, %5.2f%% saved, " + "%llu in, %llu out.\n", + (double)in_size / out_size, + ( 100.0 * out_size ) / in_size, + 100.0 - ( ( 100.0 * out_size ) / in_size ), + in_size, out_size ); + } + return 0; + } + + +static int compress( const unsigned long long member_size, + const unsigned long long volume_size, const int infd, + const Lzma_options * const encoder_options, + Pretty_print * const pp, + const struct stat * const in_statsp ) + { + LZ_Encoder * const encoder = + LZ_compress_open( encoder_options->dictionary_size, + encoder_options->match_len_limit, ( volume_size > 0 ) ? + min( member_size, volume_size ) : member_size ); + int retval; + + if( !encoder || LZ_compress_errno( encoder ) != LZ_ok ) + { + if( !encoder || LZ_compress_errno( encoder ) == LZ_mem_error ) + Pp_show_msg( pp, "Not enough memory. Try a smaller dictionary size." ); + else + internal_error( "invalid argument to encoder." 
); + retval = 1; + } + else retval = do_compress( encoder, member_size, volume_size, + infd, pp, in_statsp ); + LZ_compress_close( encoder ); + return retval; + } + + +static int do_decompress( LZ_Decoder * const decoder, const int infd, + Pretty_print * const pp, const bool from_stdin, + const bool ignore_trailing, const bool loose_trailing, + const bool testing ) + { + enum { buffer_size = 65536 }; + uint8_t buffer[buffer_size]; /* read/write buffer */ + unsigned long long total_in = 0; /* to detect library stall */ + bool first_member; + bool empty = false, multi = false; + + for( first_member = true; ; ) + { + const int max_in_size = + min( LZ_decompress_write_size( decoder ), buffer_size ); + int in_size = 0, out_size = 0; + if( max_in_size > 0 ) + { + in_size = readblock( infd, buffer, max_in_size ); + if( in_size != max_in_size && errno ) + { + Pp_show_msg( pp, 0 ); show_error( "Read error", errno, false ); + return 1; + } + if( in_size > 0 && in_size != LZ_decompress_write( decoder, buffer, in_size ) ) + internal_error( "library error (LZ_decompress_write)." ); + if( in_size < max_in_size ) LZ_decompress_finish( decoder ); + } + while( true ) + { + const int rd = + LZ_decompress_read( decoder, (outfd >= 0) ? 
buffer : 0, buffer_size ); + if( rd > 0 ) + { + out_size += rd; + if( outfd >= 0 ) + { + const int wr = writeblock( outfd, buffer, rd ); + if( wr != rd ) + { + Pp_show_msg( pp, 0 ); show_error( "Write error", errno, false ); + return 1; + } + } + } + else if( rd < 0 ) { out_size = rd; break; } + if( LZ_decompress_member_finished( decoder ) == 1 ) + { + const unsigned long long data_size = LZ_decompress_data_position( decoder ); + if( !from_stdin ) + { multi = !first_member; if( data_size == 0 ) empty = true; } + if( verbosity >= 1 ) + { + const unsigned long long member_size = + LZ_decompress_member_position( decoder ); + if( verbosity >= 2 || ( verbosity == 1 && first_member ) ) + Pp_show_msg( pp, 0 ); + if( verbosity >= 2 ) + { + if( verbosity >= 4 ) + show_header( LZ_decompress_dictionary_size( decoder ) ); + if( data_size == 0 || member_size == 0 ) + fputs( "no data compressed. ", stderr ); + else + fprintf( stderr, "%6.3f:1, %5.2f%% ratio, %5.2f%% saved. ", + (double)data_size / member_size, + ( 100.0 * member_size ) / data_size, + 100.0 - ( ( 100.0 * member_size ) / data_size ) ); + if( verbosity >= 4 ) + fprintf( stderr, "CRC %08X, ", LZ_decompress_data_crc( decoder ) ); + if( verbosity >= 3 ) + fprintf( stderr, "%9llu out, %8llu in. ", data_size, member_size ); + fputs( testing ? "ok\n" : "done\n", stderr ); Pp_reset( pp ); + } + } + first_member = false; /* member decompressed successfully */ + } + if( rd <= 0 ) break; + } + if( out_size < 0 || ( first_member && out_size == 0 ) ) + { + const unsigned long long member_pos = LZ_decompress_member_position( decoder ); + const LZ_Errno lz_errno = LZ_decompress_errno( decoder ); + if( lz_errno == LZ_library_error ) + internal_error( "library error (LZ_decompress_read)." ); + if( member_pos <= 6 ) + { + if( lz_errno == LZ_unexpected_eof ) + { + if( first_member ) + show_file_error( pp->name, "File ends unexpectedly at member header.", 0 ); + else + Pp_show_msg( pp, "Truncated header in multimember file." 
); + return 2; + } + else if( lz_errno == LZ_data_error ) + { + if( member_pos == 4 ) + { if( verbosity >= 0 ) + { Pp_show_msg( pp, 0 ); + fprintf( stderr, "Version %d member format not supported.\n", + LZ_decompress_member_version( decoder ) ); } } + else if( member_pos == 5 ) + Pp_show_msg( pp, "Invalid dictionary size in member header." ); + else if( member_pos == 6 ) + Pp_show_msg( pp, "Nonzero first LZMA byte." ); + else if( first_member ) /* for lzlib older than 1.10 */ + Pp_show_msg( pp, "Bad version or dictionary size in member header." ); + else if( !loose_trailing ) + Pp_show_msg( pp, "Corrupt header in multimember file." ); + else if( !ignore_trailing ) + Pp_show_msg( pp, "Trailing data not allowed." ); + else break; /* trailing data */ + return 2; + } + } + if( lz_errno == LZ_header_error ) + { + if( first_member ) + show_file_error( pp->name, + "Bad magic number (file not in lzip format).", 0 ); + else if( !ignore_trailing ) + Pp_show_msg( pp, "Trailing data not allowed." ); + else break; /* trailing data */ + return 2; + } + if( lz_errno == LZ_mem_error ) { Pp_show_msg( pp, mem_msg ); return 1; } + if( verbosity >= 0 ) + { + Pp_show_msg( pp, 0 ); + fprintf( stderr, "%s at pos %llu\n", ( lz_errno == LZ_unexpected_eof ) ? + "File ends unexpectedly" : "Decoder error", + LZ_decompress_total_in_size( decoder ) ); + } + return 2; + } + if( LZ_decompress_finished( decoder ) == 1 ) break; + if( in_size == 0 && out_size == 0 ) + { + const unsigned long long size = LZ_decompress_total_in_size( decoder ); + if( total_in == size ) internal_error( "library error (stalled)." ); + total_in = size; + } + } + if( verbosity == 1 ) fputs( testing ? 
"ok\n" : "done\n", stderr ); + if( empty && multi ) + { show_file_error( pp->name, "Empty member not allowed.", 0 ); return 2; } + return 0; + } + + +static int decompress( const int infd, Pretty_print * const pp, + const bool from_stdin, const bool ignore_trailing, + const bool loose_trailing, const bool testing ) + { + LZ_Decoder * const decoder = LZ_decompress_open(); + int retval; + + if( !decoder || LZ_decompress_errno( decoder ) != LZ_ok ) + { Pp_show_msg( pp, mem_msg ); retval = 1; } + else retval = do_decompress( decoder, infd, pp, from_stdin, ignore_trailing, + loose_trailing, testing ); + LZ_decompress_close( decoder ); + return retval; + } + + +static void show_error( const char * const msg, const int errcode, + const bool help ) + { + if( verbosity < 0 ) return; + if( msg && msg[0] ) + fprintf( stderr, "%s: %s%s%s\n", program_name, msg, + ( errcode > 0 ) ? ": " : "", + ( errcode > 0 ) ? strerror( errcode ) : "" ); + if( help ) + fprintf( stderr, "Try '%s --help' for more information.\n", + invocation_name ); + } + + +static void show_file_error( const char * const filename, + const char * const msg, const int errcode ) + { + if( verbosity >= 0 ) + fprintf( stderr, "%s: %s: %s%s%s\n", program_name, filename, msg, + ( errcode > 0 ) ? ": " : "", + ( errcode > 0 ) ? strerror( errcode ) : "" ); + } + + +static void internal_error( const char * const msg ) + { + if( verbosity >= 0 ) + fprintf( stderr, "%s: internal error: %s\n", program_name, msg ); + exit( 3 ); + } + + +int main( const int argc, const char * const argv[] ) + { + /* Mapping from gzip/bzip2 style 0..9 compression levels to the + corresponding LZMA compression parameters. 
*/ + const Lzma_options option_mapping[] = + { + { 65535, 16 }, /* -0 (65535,16 chooses fast encoder) */ + { 1 << 20, 5 }, /* -1 */ + { 3 << 19, 6 }, /* -2 */ + { 1 << 21, 8 }, /* -3 */ + { 3 << 20, 12 }, /* -4 */ + { 1 << 22, 20 }, /* -5 */ + { 1 << 23, 36 }, /* -6 */ + { 1 << 24, 68 }, /* -7 */ + { 3 << 23, 132 }, /* -8 */ + { 1 << 25, 273 } }; /* -9 */ + Lzma_options encoder_options = option_mapping[6]; /* default = "-6" */ + const unsigned long long max_member_size = 0x0008000000000000ULL; /* 2 PiB */ + const unsigned long long max_volume_size = 0x4000000000000000ULL; /* 4 EiB */ + unsigned long long member_size = max_member_size; + unsigned long long volume_size = 0; + const char * default_output_filename = ""; + Mode program_mode = m_compress; + bool force = false; + bool ignore_trailing = true; + bool keep_input_files = false; + bool loose_trailing = false; + bool recompress = false; + bool to_stdout = false; + if( argc > 0 ) invocation_name = argv[0]; + + enum { opt_chk = 256, opt_lt }; + const ap_Option options[] = + { + { '0', "fast", ap_no }, + { '1', 0, ap_no }, + { '2', 0, ap_no }, + { '3', 0, ap_no }, + { '4', 0, ap_no }, + { '5', 0, ap_no }, + { '6', 0, ap_no }, + { '7', 0, ap_no }, + { '8', 0, ap_no }, + { '9', "best", ap_no }, + { 'a', "trailing-error", ap_no }, + { 'b', "member-size", ap_yes }, + { 'c', "stdout", ap_no }, + { 'd', "decompress", ap_no }, + { 'f', "force", ap_no }, + { 'F', "recompress", ap_no }, + { 'h', "help", ap_no }, + { 'k', "keep", ap_no }, + { 'm', "match-length", ap_yes }, + { 'n', "threads", ap_yes }, + { 'o', "output", ap_yes }, + { 'q', "quiet", ap_no }, + { 's', "dictionary-size", ap_yes }, + { 'S', "volume-size", ap_yes }, + { 't', "test", ap_no }, + { 'v', "verbose", ap_no }, + { 'V', "version", ap_no }, + { opt_chk, "check-lib", ap_no }, + { opt_lt, "loose-trailing", ap_no }, + { 0, 0, ap_no } }; + + /* static because valgrind complains and memory management in C sucks */ + static Arg_parser parser; + if( !ap_init( 
&parser, argc, argv, options, 0 ) ) + { show_error( mem_msg, 0, false ); return 1; } + if( ap_error( &parser ) ) /* bad option */ + { show_error( ap_error( &parser ), 0, true ); return 1; } + + int argind = 0; + for( ; argind < ap_arguments( &parser ); ++argind ) + { + const int code = ap_code( &parser, argind ); + if( !code ) break; /* no more options */ + const char * const pn = ap_parsed_name( &parser, argind ); + const char * const arg = ap_argument( &parser, argind ); + switch( code ) + { + case '0': case '1': case '2': case '3': case '4': case '5': + case '6': case '7': case '8': case '9': + encoder_options = option_mapping[code-'0']; break; + case 'a': ignore_trailing = false; break; + case 'b': member_size = getnum( arg, pn, 100000, max_member_size ); break; + case 'c': to_stdout = true; break; + case 'd': set_mode( &program_mode, m_decompress ); break; + case 'f': force = true; break; + case 'F': recompress = true; break; + case 'h': show_help(); return 0; + case 'k': keep_input_files = true; break; + case 'm': encoder_options.match_len_limit = + getnum( arg, pn, LZ_min_match_len_limit(), + LZ_max_match_len_limit() ); break; + case 'n': break; /* ignored */ + case 'o': if( strcmp( arg, "-" ) == 0 ) to_stdout = true; + else { default_output_filename = arg; } break; + case 'q': verbosity = -1; break; + case 's': encoder_options.dictionary_size = get_dict_size( arg, pn ); + break; + case 'S': volume_size = getnum( arg, pn, 100000, max_volume_size ); break; + case 't': set_mode( &program_mode, m_test ); break; + case 'v': if( verbosity < 4 ) ++verbosity; break; + case 'V': show_version(); return 0; + case opt_chk: return check_lib(); + case opt_lt: loose_trailing = true; break; + default: internal_error( "uncaught option." ); + } + } /* end process options */ + + if( strcmp( PROGVERSION, LZ_version_string ) != 0 ) + internal_error( "wrong PROGVERSION." ); +#if !defined LZ_API_VERSION || LZ_API_VERSION < 1012 +#error "lzlib 1.12 or newer needed." 
+#else + if( LZ_api_version() < 1012 ) /* minilzip passes null to LZ_decompress_read */ + { show_error( "lzlib 1.12 or newer needed. Try --check-lib.", 0, false ); + return 1; } + if( LZ_api_version() != LZ_API_VERSION ) show_error( + "warning: wrong library API version. Try --check-lib.", 0, false ); + else +#endif + if( strcmp( LZ_version_string, LZ_version() ) != 0 ) show_error( + "warning: wrong library version_string. Try --check-lib.", 0, false ); + +#if defined __MSVCRT__ || defined __OS2__ || defined __DJGPP__ + setmode( STDIN_FILENO, O_BINARY ); + setmode( STDOUT_FILENO, O_BINARY ); +#endif + + static const char ** filenames = 0; + int num_filenames = max( 1, ap_arguments( &parser ) - argind ); + filenames = resize_buffer( filenames, num_filenames * sizeof filenames[0] ); + filenames[0] = "-"; + + int i; + bool filenames_given = false; + for( i = 0; argind + i < ap_arguments( &parser ); ++i ) + { + filenames[i] = ap_argument( &parser, argind + i ); + if( strcmp( filenames[i], "-" ) != 0 ) filenames_given = true; + } + + if( program_mode == m_compress ) + { + if( volume_size > 0 && !to_stdout && default_output_filename[0] && + num_filenames > 1 ) + { show_error( "Only can compress one file when using '-o' and '-S'.", + 0, true ); return 1; } + } + else volume_size = 0; + if( program_mode == m_test ) to_stdout = false; /* apply overrides */ + if( program_mode == m_test || to_stdout ) default_output_filename = ""; + + output_filename = resize_buffer( output_filename, 1 ); + output_filename[0] = 0; + if( to_stdout && program_mode != m_test ) /* check tty only once */ + { outfd = STDOUT_FILENO; if( !check_tty_out( program_mode ) ) return 1; } + else outfd = -1; + + const bool to_file = !to_stdout && program_mode != m_test && + default_output_filename[0]; + if( !to_stdout && program_mode != m_test && ( filenames_given || to_file ) ) + set_signals( signal_handler ); + + static Pretty_print pp; + Pp_init( &pp, filenames, num_filenames ); + + int failed_tests = 0; 
+ int retval = 0; + const bool one_to_one = !to_stdout && program_mode != m_test && !to_file; + bool stdin_used = false; + struct stat in_stats; + for( i = 0; i < num_filenames; ++i ) + { + const char * input_filename = ""; + int infd; + const bool from_stdin = strcmp( filenames[i], "-" ) == 0; + + Pp_set_name( &pp, filenames[i] ); + if( from_stdin ) + { + if( stdin_used ) continue; else stdin_used = true; + infd = STDIN_FILENO; + if( !check_tty_in( pp.name, infd, program_mode, &retval ) ) continue; + if( one_to_one ) { outfd = STDOUT_FILENO; output_filename[0] = 0; } + } + else + { + const int eindex = extension_index( input_filename = filenames[i] ); + infd = open_instream( input_filename, &in_stats, program_mode, + eindex, one_to_one, recompress ); + if( infd < 0 ) { set_retval( &retval, 1 ); continue; } + if( !check_tty_in( pp.name, infd, program_mode, &retval ) ) continue; + if( one_to_one ) /* open outfd after checking infd */ + { + if( program_mode == m_compress ) + set_c_outname( input_filename, true, volume_size > 0 ); + else set_d_outname( input_filename, eindex ); + if( !open_outstream( force, true ) ) + { close( infd ); set_retval( &retval, 1 ); continue; } + } + } + + if( one_to_one && !check_tty_out( program_mode ) ) + { set_retval( &retval, 1 ); return retval; } /* don't delete a tty */ + + if( to_file && outfd < 0 ) /* open outfd after checking infd */ + { + if( program_mode == m_compress ) set_c_outname( default_output_filename, + false, volume_size > 0 ); + else + { output_filename = resize_buffer( output_filename, + strlen( default_output_filename ) + 1 ); + strcpy( output_filename, default_output_filename ); } + if( !open_outstream( force, false ) || !check_tty_out( program_mode ) ) + return 1; /* check tty only once and don't try to delete a tty */ + } + + const struct stat * const in_statsp = + ( input_filename[0] && one_to_one ) ? 
&in_stats : 0; + int tmp; + if( program_mode == m_compress ) + tmp = compress( member_size, volume_size, infd, &encoder_options, &pp, + in_statsp ); + else + tmp = decompress( infd, &pp, from_stdin, ignore_trailing, loose_trailing, + program_mode == m_test ); + if( close( infd ) != 0 ) + { show_file_error( pp.name, "Error closing input file", errno ); + set_retval( &tmp, 1 ); } + set_retval( &retval, tmp ); + if( tmp ) + { if( program_mode != m_test ) cleanup_and_fail( retval ); + else ++failed_tests; } + + if( delete_output_on_interrupt && one_to_one ) + close_and_set_permissions( in_statsp ); + if( input_filename[0] && !keep_input_files && one_to_one && + ( program_mode != m_compress || volume_size == 0 ) ) + remove( input_filename ); + } + if( delete_output_on_interrupt ) /* -o */ + close_and_set_permissions( ( retval == 0 && !stdin_used && + filenames_given && num_filenames == 1 ) ? &in_stats : 0 ); + else if( outfd >= 0 && close( outfd ) != 0 ) /* -c */ + { + show_error( "Error closing stdout", errno, false ); + set_retval( &retval, 1 ); + } + if( failed_tests > 0 && verbosity >= 1 && num_filenames > 1 ) + fprintf( stderr, "%s: warning: %d %s failed the test.\n", + program_name, failed_tests, + ( failed_tests == 1 ) ? "file" : "files" ); + free( output_filename ); + Pp_free( &pp ); + free( filenames ); + ap_free( &parser ); + return retval; + } diff --git a/testsuite/check.sh b/testsuite/check.sh index a78f156..d4c5eff 100755 --- a/testsuite/check.sh +++ b/testsuite/check.sh @@ -1,9 +1,9 @@ #! /bin/sh -# check script for Lzlib - A compression library for lzip files -# Copyright (C) 2009-2016 Antonio Diaz Diaz. +# check script for Lzlib - Compression library for the lzip format +# Copyright (C) 2009-2025 Antonio Diaz Diaz. # # This script is free software: you have unlimited permission -# to copy, distribute and modify it. +# to copy, distribute, and modify it. 
LC_ALL=C export LC_ALL @@ -11,6 +11,7 @@ objdir=`pwd` testdir=`cd "$1" ; pwd` LZIP="${objdir}"/minilzip BBEXAMPLE="${objdir}"/bbexample +FFEXAMPLE="${objdir}"/ffexample LZCHECK="${objdir}"/lzcheck framework_failure() { echo "failure in testing framework" ; exit 1 ; } @@ -19,178 +20,471 @@ if [ ! -f "${LZIP}" ] || [ ! -x "${LZIP}" ] ; then exit 1 fi -if [ -e "${LZIP}" ] 2> /dev/null ; then true -else +[ -e "${LZIP}" ] 2> /dev/null || + { echo "$0: a POSIX shell is required to run the tests" echo "Try bash -c \"$0 $1 $2\"" exit 1 -fi + } if [ -d tmp ] ; then rm -rf tmp ; fi mkdir tmp cd "${objdir}"/tmp || framework_failure -cat "${testdir}"/test.txt > in || framework_failure +cp "${testdir}"/test.txt in || framework_failure in_lz="${testdir}"/test.txt.lz -test2="${testdir}"/test2.txt +fox_lf="${testdir}"/fox_lf +fox_lz="${testdir}"/fox.lz +fnz_lz="${testdir}"/fox_nz.lz fail=0 +test_failed() { fail=1 ; printf " $1" ; [ -z "$2" ] || printf "($2)" ; } + +"${LZIP}" --check-lib # just print warning +[ $? != 2 ] || { test_failed $LINENO ; exit 2 ; } # unless bad lzlib.h printf "testing lzlib-%s..." "$2" "${LZIP}" -fkqm4 in -if [ $? = 1 ] && [ ! -e in.lz ] ; then printf . ; else printf - ; fail=1 ; fi +[ $? = 1 ] || test_failed $LINENO +[ ! -e in.lz ] || test_failed $LINENO "${LZIP}" -fkqm274 in -if [ $? = 1 ] && [ ! -e in.lz ] ; then printf . ; else printf - ; fail=1 ; fi -"${LZIP}" -fkqs-1 in -if [ $? = 1 ] && [ ! -e in.lz ] ; then printf . ; else printf - ; fail=1 ; fi -"${LZIP}" -fkqs0 in -if [ $? = 1 ] && [ ! -e in.lz ] ; then printf . ; else printf - ; fail=1 ; fi -"${LZIP}" -fkqs4095 in -if [ $? = 1 ] && [ ! -e in.lz ] ; then printf . ; else printf - ; fail=1 ; fi -"${LZIP}" -fkqs513MiB in -if [ $? = 1 ] && [ ! -e in.lz ] ; then printf . ; else printf - ; fail=1 ; fi +[ $? = 1 ] || test_failed $LINENO +[ ! -e in.lz ] || test_failed $LINENO +for i in bad_size -1 0 4095 513MiB 1G 1T 1P 1E 1Z 1Y 10KB ; do + "${LZIP}" -fkqs $i in + [ $? 
= 1 ] || test_failed $LINENO $i + [ ! -e in.lz ] || test_failed $LINENO $i +done "${LZIP}" -tq in -if [ $? = 2 ] ; then printf . ; else printf - ; fail=1 ; fi +[ $? = 2 ] || test_failed $LINENO "${LZIP}" -tq < in -if [ $? = 2 ] ; then printf . ; else printf - ; fail=1 ; fi +[ $? = 2 ] || test_failed $LINENO "${LZIP}" -cdq in -if [ $? = 2 ] ; then printf . ; else printf - ; fail=1 ; fi +[ $? = 2 ] || test_failed $LINENO "${LZIP}" -cdq < in -if [ $? = 2 ] ; then printf . ; else printf - ; fail=1 ; fi -dd if="${in_lz}" bs=1 count=6 2> /dev/null | "${LZIP}" -tq -if [ $? = 2 ] ; then printf . ; else printf - ; fail=1 ; fi -dd if="${in_lz}" bs=1 count=20 2> /dev/null | "${LZIP}" -tq -if [ $? = 2 ] ; then printf . ; else printf - ; fail=1 ; fi +[ $? = 2 ] || test_failed $LINENO +"${LZIP}" -dq -o in < "${in_lz}" +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" -dq -o in "${in_lz}" +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" -dq -o out nx_file.lz +[ $? = 1 ] || test_failed $LINENO +[ ! -e out ] || test_failed $LINENO +"${LZIP}" -q -o out.lz nx_file +[ $? = 1 ] || test_failed $LINENO +[ ! -e out.lz ] || test_failed $LINENO +"${LZIP}" -qf -S100k -o out in in # only one file with -o and -S +[ $? = 1 ] || test_failed $LINENO +{ [ ! -e out ] && [ ! -e out.lz ] ; } || test_failed $LINENO +# these are for code coverage +"${LZIP}" -cdt "${in_lz}" 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" -t -- nx_file.lz 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" -t "" < /dev/null 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" --help > /dev/null || test_failed $LINENO +"${LZIP}" -n1 -V > /dev/null || test_failed $LINENO +"${LZIP}" -m 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" -z 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" --bad_option 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" --t 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" --test=2 2> /dev/null +[ $? 
= 1 ] || test_failed $LINENO +"${LZIP}" --output= 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" --output 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +printf "LZIP\001-.............................." | "${LZIP}" -t 2> /dev/null +printf "LZIP\002-.............................." | "${LZIP}" -t 2> /dev/null +printf "LZIP\001+.............................." | "${LZIP}" -t 2> /dev/null printf "\ntesting decompression..." -"${LZIP}" -t "${in_lz}" -if [ $? = 0 ] ; then printf . ; else printf - ; fail=1 ; fi -"${LZIP}" -cd "${in_lz}" > copy || fail=1 -cmp in copy || fail=1 -printf . +for i in "${in_lz}" "${testdir}"/test_sync.lz ; do + "${LZIP}" -t "$i" || test_failed $LINENO "$i" + "${LZIP}" -d "$i" -o out || test_failed $LINENO "$i" + cmp in out || test_failed $LINENO "$i" + "${LZIP}" -cd "$i" > out || test_failed $LINENO "$i" + cmp in out || test_failed $LINENO "$i" + "${LZIP}" -d "$i" -o - > out || test_failed $LINENO "$i" + cmp in out || test_failed $LINENO "$i" + "${LZIP}" -d < "$i" > out || test_failed $LINENO "$i" + cmp in out || test_failed $LINENO "$i" + rm -f out || framework_failure +done -"${LZIP}" -t "${testdir}"/test_sync.lz -if [ $? = 0 ] ; then printf . ; else printf - ; fail=1 ; fi -"${LZIP}" -cd "${testdir}"/test_sync.lz > copy || fail=1 -cmp in copy || fail=1 -printf . +cp "${in_lz}" out.lz || framework_failure +"${LZIP}" -dk out.lz || test_failed $LINENO +cmp in out || test_failed $LINENO +rm -f out || framework_failure +"${LZIP}" -cd "${fox_lz}" > fox || test_failed $LINENO +cp fox copy || framework_failure +cp "${in_lz}" copy.lz || framework_failure +"${LZIP}" -d copy.lz out.lz 2> /dev/null # skip copy, decompress out +[ $? = 1 ] || test_failed $LINENO +[ ! -e out.lz ] || test_failed $LINENO +cmp fox copy || test_failed $LINENO +cmp in out || test_failed $LINENO +"${LZIP}" -df copy.lz || test_failed $LINENO +[ ! 
-e copy.lz ] || test_failed $LINENO +cmp in copy || test_failed $LINENO +rm -f copy out || framework_failure -rm -f copy -cat "${in_lz}" > copy.lz || framework_failure -"${LZIP}" -dk copy.lz || fail=1 -cmp in copy || fail=1 -printf "to be overwritten" > copy || framework_failure -"${LZIP}" -dq copy.lz -if [ $? = 1 ] ; then printf . ; else printf - ; fail=1 ; fi -"${LZIP}" -df copy.lz -if [ $? = 0 ] && [ ! -e copy.lz ] && cmp in copy ; then - printf . ; else printf - ; fail=1 ; fi +cp "${in_lz}" out.lz || framework_failure +"${LZIP}" -d -S100k out.lz || test_failed $LINENO # ignore -S +[ ! -e out.lz ] || test_failed $LINENO +cmp in out || test_failed $LINENO -printf "to be overwritten" > copy || framework_failure -"${LZIP}" -df -o copy < "${in_lz}" || fail=1 -cmp in copy || fail=1 -printf . +printf "to be overwritten" > out || framework_failure +"${LZIP}" -df -o out < "${in_lz}" || test_failed $LINENO +cmp in out || test_failed $LINENO +"${LZIP}" -d -o ./- "${in_lz}" || test_failed $LINENO +cmp in ./- || test_failed $LINENO +rm -f ./- || framework_failure +"${LZIP}" -d -o ./- < "${in_lz}" || test_failed $LINENO +cmp in ./- || test_failed $LINENO +rm -f ./- || framework_failure -rm -f copy -"${LZIP}" -s16 < in > anyothername || fail=1 -"${LZIP}" -d -o copy - anyothername - < "${in_lz}" -if [ $? = 0 ] && cmp in copy && cmp in anyothername.out ; then - printf . ; else printf - ; fail=1 ; fi -rm -f copy anyothername.out +cp "${in_lz}" anyothername || framework_failure +"${LZIP}" -dv - anyothername - < "${in_lz}" > out 2> /dev/null || + test_failed $LINENO +cmp in out || test_failed $LINENO +cmp in anyothername.out || test_failed $LINENO +rm -f anyothername.out || framework_failure "${LZIP}" -tq in "${in_lz}" -if [ $? = 2 ] ; then printf . ; else printf - ; fail=1 ; fi -"${LZIP}" -tq foo.lz "${in_lz}" -if [ $? = 1 ] ; then printf . ; else printf - ; fail=1 ; fi -"${LZIP}" -cdq in "${in_lz}" > copy -if [ $? = 2 ] && cat copy in | cmp in - ; then printf . 
; else printf - ; fail=1 ; fi -"${LZIP}" -cdq foo.lz "${in_lz}" > copy -if [ $? = 1 ] && cmp in copy ; then printf . ; else printf - ; fail=1 ; fi -rm -f copy -cat "${in_lz}" > copy.lz || framework_failure -"${LZIP}" -dq in copy.lz -if [ $? = 2 ] && [ -e copy.lz ] && [ ! -e copy ] && [ ! -e in.out ] ; then - printf . ; else printf - ; fail=1 ; fi -"${LZIP}" -dq foo.lz copy.lz -if [ $? = 1 ] && [ ! -e copy.lz ] && [ ! -e foo ] && cmp in copy ; then - printf . ; else printf - ; fail=1 ; fi +[ $? = 2 ] || test_failed $LINENO +"${LZIP}" -tq nx_file.lz "${in_lz}" +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" -cdq in "${in_lz}" > out +[ $? = 2 ] || test_failed $LINENO +cat out in | cmp in - || test_failed $LINENO # out must be empty +"${LZIP}" -cdq nx_file.lz "${in_lz}" > out # skip nx_file, decompress in +[ $? = 1 ] || test_failed $LINENO +cmp in out || test_failed $LINENO +rm -f out || framework_failure +cp "${in_lz}" out.lz || framework_failure +for i in 1 2 3 4 5 6 7 ; do + printf "g" >> out.lz || framework_failure + "${LZIP}" -atvvvv out.lz "${in_lz}" 2> /dev/null + [ $? = 2 ] || test_failed $LINENO $i +done +"${LZIP}" -dq in out.lz +[ $? = 2 ] || test_failed $LINENO +[ -e out.lz ] || test_failed $LINENO +[ ! -e out ] || test_failed $LINENO +[ ! -e in.out ] || test_failed $LINENO +"${LZIP}" -dq nx_file.lz out.lz +[ $? = 1 ] || test_failed $LINENO +[ ! -e out.lz ] || test_failed $LINENO +[ ! -e nx_file ] || test_failed $LINENO +cmp in out || test_failed $LINENO +rm -f out || framework_failure cat in in > in2 || framework_failure -"${LZIP}" -s16 -o copy2 < in2 || fail=1 -"${LZIP}" -t copy2.lz || fail=1 -"${LZIP}" -cd copy2.lz > copy2 || fail=1 -cmp in2 copy2 || fail=1 -printf . +"${LZIP}" -t "${in_lz}" "${in_lz}" || test_failed $LINENO +"${LZIP}" -cd "${in_lz}" "${in_lz}" -o out > out2 || test_failed $LINENO +[ ! 
-e out ] || test_failed $LINENO # override -o +cmp in2 out2 || test_failed $LINENO +rm -f out2 || framework_failure +"${LZIP}" -d "${in_lz}" "${in_lz}" -o out2 || test_failed $LINENO +cmp in2 out2 || test_failed $LINENO +rm -f out2 || framework_failure -printf "garbage" >> copy2.lz || framework_failure -rm -f copy2 -"${LZIP}" -atq copy2.lz -if [ $? = 2 ] ; then printf . ; else printf - ; fail=1 ; fi -"${LZIP}" -atq < copy2.lz -if [ $? = 2 ] ; then printf . ; else printf - ; fail=1 ; fi -"${LZIP}" -adkq copy2.lz -if [ $? = 2 ] && [ ! -e copy2 ] ; then printf . ; else printf - ; fail=1 ; fi -"${LZIP}" -adkq -o copy2 < copy2.lz -if [ $? = 2 ] && [ ! -e copy2 ] ; then printf . ; else printf - ; fail=1 ; fi -printf "to be overwritten" > copy2 || framework_failure -"${LZIP}" -df copy2.lz || fail=1 -cmp in2 copy2 || fail=1 -printf . +cat "${in_lz}" "${in_lz}" > out2.lz || framework_failure +lines=`"${LZIP}" -tvv out2.lz 2>&1 | wc -l` || test_failed $LINENO +[ "${lines}" -eq 2 ] || test_failed $LINENO "${lines}" + +printf "\ngarbage" >> out2.lz || framework_failure +"${LZIP}" -tvvvv out2.lz 2> /dev/null || test_failed $LINENO +"${LZIP}" -atq out2.lz +[ $? = 2 ] || test_failed $LINENO +"${LZIP}" -atq < out2.lz +[ $? = 2 ] || test_failed $LINENO +"${LZIP}" -adkq out2.lz +[ $? = 2 ] || test_failed $LINENO +[ ! -e out2 ] || test_failed $LINENO +"${LZIP}" -adkq -o out2 < out2.lz +[ $? = 2 ] || test_failed $LINENO +[ ! 
-e out2 ] || test_failed $LINENO +printf "to be overwritten" > out2 || framework_failure +"${LZIP}" -df out2.lz || test_failed $LINENO +cmp in2 out2 || test_failed $LINENO +rm -f out2 || framework_failure + +touch empty em || framework_failure +"${LZIP}" -0 em || test_failed $LINENO +"${LZIP}" -dk em.lz || test_failed $LINENO +cmp empty em || test_failed $LINENO +cat em.lz em.lz | "${LZIP}" -t || test_failed $LINENO +cat em.lz em.lz | "${LZIP}" -d > em || test_failed $LINENO +cmp empty em || test_failed $LINENO +cat em.lz "${in_lz}" | "${LZIP}" -t || test_failed $LINENO +cat em.lz "${in_lz}" | "${LZIP}" -d > out || test_failed $LINENO +cmp in out || test_failed $LINENO +cat "${in_lz}" em.lz | "${LZIP}" -t || test_failed $LINENO +cat "${in_lz}" em.lz | "${LZIP}" -d > out || test_failed $LINENO +cmp in out || test_failed $LINENO printf "\ntesting compression..." -"${LZIP}" -cfq "${in_lz}" > out # /dev/null is a tty on OS/2 -if [ $? = 1 ] ; then printf . ; else printf - ; fail=1 ; fi -"${LZIP}" -cF -s16 "${in_lz}" > out || fail=1 -"${LZIP}" -cd out | "${LZIP}" -d > copy || fail=1 -cmp in copy || fail=1 -printf . +"${LZIP}" -c -0 in in in -S100k -o out3.lz > copy2.lz || test_failed $LINENO +[ ! -e out3.lz ] || test_failed $LINENO # override -o and -S +"${LZIP}" -0f in in --output=copy2.lz || test_failed $LINENO +"${LZIP}" -d copy2.lz -o out2 || test_failed $LINENO +[ -e copy2.lz ] || test_failed $LINENO +cmp in2 out2 || test_failed $LINENO +rm -f copy2.lz || framework_failure + +"${LZIP}" -cf "${in_lz}" > lzlz 2> /dev/null # /dev/null is a tty on OS/2 +[ $? 
= 1 ] || test_failed $LINENO +"${LZIP}" -Fvvm36 -o - -s16 "${in_lz}" > lzlz 2> /dev/null || test_failed $LINENO +"${LZIP}" -cd lzlz | "${LZIP}" -d > out || test_failed $LINENO +cmp in out || test_failed $LINENO +rm -f lzlz out || framework_failure + +"${LZIP}" -0 -o ./- in || test_failed $LINENO +"${LZIP}" -cd ./- | cmp in - || test_failed $LINENO +rm -f ./- || framework_failure +"${LZIP}" -0 -o ./- < in || test_failed $LINENO # don't add .lz +[ ! -e ./-.lz ] || test_failed $LINENO +"${LZIP}" -cd ./- | cmp in - || test_failed $LINENO +rm -f ./- || framework_failure for i in s4Ki 0 1 2 3 4 5 6 7 8 9 ; do - "${LZIP}" -k -$i -s16 in || fail=1 - mv -f in.lz copy.lz || fail=1 - printf "garbage" >> copy.lz || fail=1 - "${LZIP}" -df copy.lz || fail=1 - cmp in copy || fail=1 + "${LZIP}" -k -$i -s16 in || test_failed $LINENO $i + mv in.lz out.lz || test_failed $LINENO $i + printf "garbage" >> out.lz || framework_failure + "${LZIP}" -df out.lz || test_failed $LINENO $i + cmp in out || test_failed $LINENO $i + + "${LZIP}" -$i -s16 in -c > out || test_failed $LINENO $i + "${LZIP}" -$i -s16 in -o o_out || test_failed $LINENO $i # don't add .lz + [ ! -e o_out.lz ] || test_failed $LINENO + cmp out o_out || test_failed $LINENO $i + rm -f o_out || framework_failure + printf "g" >> out || framework_failure + "${LZIP}" -cd out > copy || test_failed $LINENO $i + cmp in copy || test_failed $LINENO $i + + "${LZIP}" -$i -s16 < in > out || test_failed $LINENO $i + "${LZIP}" -d < out > copy || test_failed $LINENO $i + cmp in copy || test_failed $LINENO $i + + rm -f out.lz || framework_failure + printf "to be overwritten" > out || framework_failure + "${LZIP}" -f -$i -s16 -o out < in || test_failed $LINENO $i # don't add .lz + [ ! -e out.lz ] || test_failed $LINENO + "${LZIP}" -df -o copy < out || test_failed $LINENO $i + cmp in copy || test_failed $LINENO $i done -printf . 
+rm -f copy out || framework_failure
-for i in s4Ki 0 1 2 3 4 5 6 7 8 9 ; do
-  "${LZIP}" -c -$i -s16 in > out || fail=1
-  printf "g" >> out || fail=1
-  "${LZIP}" -cd out > copy || fail=1
-  cmp in copy || fail=1
+cat in in in in in in in in > in8 || framework_failure
+"${LZIP}" -1s12 -S100k in8 || test_failed $LINENO
+"${LZIP}" -t in800001.lz in800002.lz || test_failed $LINENO
+"${LZIP}" -cd in800001.lz in800002.lz | cmp in8 - || test_failed $LINENO
+[ ! -e in800003.lz ] || test_failed $LINENO
+rm -f in800001.lz in800002.lz || framework_failure
+"${LZIP}" -1s12 -S100k -o out.lz in8 || test_failed $LINENO
+# ignore -S
+"${LZIP}" -d out.lz00001.lz out.lz00002.lz -S100k -o out || test_failed $LINENO
+cmp in8 out || test_failed $LINENO
+"${LZIP}" -t out.lz00001.lz out.lz00002.lz || test_failed $LINENO
+[ ! -e out.lz00003.lz ] || test_failed $LINENO
+rm -f out out.lz00001.lz out.lz00002.lz || framework_failure
+"${LZIP}" -1ks4Ki -b100000 in8 || test_failed $LINENO
+"${LZIP}" -t in8.lz || test_failed $LINENO
+"${LZIP}" -cd in8.lz -o out | cmp in8 - || test_failed $LINENO # override -o
+[ ! -e out ] || test_failed $LINENO
+"${LZIP}" -0 -S100k -o out < in8.lz || test_failed $LINENO
+"${LZIP}" -t out00001.lz out00002.lz || test_failed $LINENO
+"${LZIP}" -cd out00001.lz out00002.lz | cmp in8.lz - || test_failed $LINENO
+[ ! -e out00003.lz ] || test_failed $LINENO
+rm -f out00001.lz out00002.lz || framework_failure
+"${LZIP}" -1 -S100k -o out < in8.lz || test_failed $LINENO
+"${LZIP}" -t out00001.lz out00002.lz || test_failed $LINENO
+"${LZIP}" -cd out00001.lz out00002.lz | cmp in8.lz - || test_failed $LINENO
+[ ! -e out00003.lz ] || test_failed $LINENO
+rm -f out00001.lz out00002.lz || framework_failure
+"${LZIP}" -0 -F -S100k in8.lz || test_failed $LINENO
+"${LZIP}" -t in8.lz00001.lz in8.lz00002.lz || test_failed $LINENO
+"${LZIP}" -cd in8.lz00001.lz in8.lz00002.lz | cmp in8.lz - || test_failed $LINENO
+[ !
-e in8.lz00003.lz ] || test_failed $LINENO
+rm -f in8.lz00001.lz in8.lz00002.lz || framework_failure
+"${LZIP}" -0kF -b100k in8.lz || test_failed $LINENO
+"${LZIP}" -t in8.lz.lz || test_failed $LINENO
+"${LZIP}" -cd in8.lz.lz | cmp in8.lz - || test_failed $LINENO
+rm -f in8.lz in8.lz.lz || framework_failure
+
+"${BBEXAMPLE}" in || test_failed $LINENO
+"${BBEXAMPLE}" "${in_lz}" || test_failed $LINENO
+"${BBEXAMPLE}" "${fox_lf}" || test_failed $LINENO
+
+"${FFEXAMPLE}" -h > /dev/null || test_failed $LINENO
+"${FFEXAMPLE}" > /dev/null
+[ $? = 1 ] || test_failed $LINENO
+rm -f out || framework_failure
+"${FFEXAMPLE}" -b in out || test_failed $LINENO
+cmp in out || test_failed $LINENO
+"${FFEXAMPLE}" -b in | cmp in - || test_failed $LINENO
+"${FFEXAMPLE}" -b in8 | cmp in8 - || test_failed $LINENO
+"${FFEXAMPLE}" -b "${fox_lf}" | cmp "${fox_lf}" - || test_failed $LINENO
+"${FFEXAMPLE}" -d "${in_lz}" - | cmp in - || test_failed $LINENO
+"${FFEXAMPLE}" -c in | "${FFEXAMPLE}" -d | cmp in - || test_failed $LINENO
+"${FFEXAMPLE}" -m in | "${FFEXAMPLE}" -d | cmp in - || test_failed $LINENO
+"${FFEXAMPLE}" -l in | "${FFEXAMPLE}" -d | cmp in - || test_failed $LINENO
+cat "${fox_lf}" "${in_lz}" | "${FFEXAMPLE}" -r | cmp in - || test_failed $LINENO
+cat in8 "${in_lz}" | "${FFEXAMPLE}" -r | cmp in - || test_failed $LINENO
+cat "${in_lz}" "${fox_lf}" "${in_lz}" | "${FFEXAMPLE}" -r - | cmp in2 - ||
+  test_failed $LINENO
+cat "${in_lz}" in8 "${in_lz}" | "${FFEXAMPLE}" -r - - | cmp in2 - ||
+  test_failed $LINENO
+
+"${LZCHECK}" in || test_failed $LINENO
+"${LZCHECK}" "${in_lz}" || test_failed $LINENO
+"${LZCHECK}" "${fox_lf}" || test_failed $LINENO
+rm -f in8 || framework_failure
+
+printf "\ntesting bad input..."
+
+cat em.lz em.lz > ee.lz || framework_failure
+"${LZIP}" -t < ee.lz || test_failed $LINENO
+"${LZIP}" -d < ee.lz > em || test_failed $LINENO
+cmp empty em || test_failed $LINENO
+"${LZIP}" -tq ee.lz
+[ $? = 2 ] || test_failed $LINENO
+"${LZIP}" -dq ee.lz
+[ $?
= 2 ] || test_failed $LINENO
+[ ! -e ee ] || test_failed $LINENO
+"${LZIP}" -cdq ee.lz > em
+[ $? = 2 ] || test_failed $LINENO
+cmp empty em || test_failed $LINENO
+rm -f empty em || framework_failure
+cat "${in_lz}" em.lz "${in_lz}" > inein.lz || framework_failure
+"${LZIP}" -t < inein.lz || test_failed $LINENO
+"${LZIP}" -d < inein.lz > out2 || test_failed $LINENO
+cmp in2 out2 || test_failed $LINENO
+"${LZIP}" -tq inein.lz
+[ $? = 2 ] || test_failed $LINENO
+"${LZIP}" -dq inein.lz
+[ $? = 2 ] || test_failed $LINENO
+[ ! -e inein ] || test_failed $LINENO
+"${LZIP}" -cdq inein.lz > out2
+[ $? = 2 ] || test_failed $LINENO
+cmp in2 out2 || test_failed $LINENO
+rm -f out2 inein.lz em.lz || framework_failure
+
+headers='LZIp LZiP LZip LzIP LzIp LziP lZIP lZIp lZiP lzIP'
+body='\001\014\000\000\101\376\367\377\377\340\000\200\000\215\357\002\322\001\000\000\000\000\000\000\000\045\000\000\000\000\000\000\000'
+cp "${in_lz}" int.lz || framework_failure
+printf "LZIP${body}" >> int.lz || framework_failure
+if "${LZIP}" -t int.lz ; then
+  for header in ${headers} ; do
+    printf "${header}${body}" > int.lz || framework_failure
+    "${LZIP}" -tq int.lz # first member
+    [ $? = 2 ] || test_failed $LINENO ${header}
+    "${LZIP}" -tq < int.lz
+    [ $? = 2 ] || test_failed $LINENO ${header}
+    "${LZIP}" -cdq int.lz > /dev/null
+    [ $? = 2 ] || test_failed $LINENO ${header}
+    "${LZIP}" -tq --loose-trailing int.lz
+    [ $? = 2 ] || test_failed $LINENO ${header}
+    "${LZIP}" -tq --loose-trailing < int.lz
+    [ $? = 2 ] || test_failed $LINENO ${header}
+    "${LZIP}" -cdq --loose-trailing int.lz > /dev/null
+    [ $? = 2 ] || test_failed $LINENO ${header}
+    cp "${in_lz}" int.lz || framework_failure
+    printf "${header}${body}" >> int.lz || framework_failure
+    "${LZIP}" -tq int.lz # trailing data
+    [ $? = 2 ] || test_failed $LINENO ${header}
+    "${LZIP}" -tq < int.lz
+    [ $? = 2 ] || test_failed $LINENO ${header}
+    "${LZIP}" -cdq int.lz > /dev/null
+    [ $?
= 2 ] || test_failed $LINENO ${header}
+    "${LZIP}" -t --loose-trailing int.lz ||
+      test_failed $LINENO ${header}
+    "${LZIP}" -t --loose-trailing < int.lz ||
+      test_failed $LINENO ${header}
+    "${LZIP}" -cd --loose-trailing int.lz > /dev/null ||
+      test_failed $LINENO ${header}
+    "${LZIP}" -tq --loose-trailing --trailing-error int.lz
+    [ $? = 2 ] || test_failed $LINENO ${header}
+    "${LZIP}" -tq --loose-trailing --trailing-error < int.lz
+    [ $? = 2 ] || test_failed $LINENO ${header}
+    "${LZIP}" -cdq --loose-trailing --trailing-error int.lz > /dev/null
+    [ $? = 2 ] || test_failed $LINENO ${header}
+  done
+else
+  printf "warning: skipping header test: 'printf' does not work on your system."
+fi
+rm -f int.lz || framework_failure
+
+"${LZIP}" -tq "${fnz_lz}"
+[ $? = 2 ] || test_failed $LINENO
+
+for i in fox_v2.lz fox_s11.lz fox_de20.lz \
+         fox_bcrc.lz fox_crc0.lz fox_das46.lz fox_mes81.lz ; do
+  "${LZIP}" -tq "${testdir}"/$i
+  [ $? = 2 ] || test_failed $LINENO $i
 done
-printf .
-for i in s4Ki 0 1 2 3 4 5 6 7 8 9 ; do
-  "${LZIP}" -$i -s16 < in > out || fail=1
-  "${LZIP}" -d < out > copy || fail=1
-  cmp in copy || fail=1
+for i in fox_bcrc.lz fox_crc0.lz fox_das46.lz fox_mes81.lz ; do
+  "${LZIP}" -cdq "${testdir}"/$i > out
+  [ $? = 2 ] || test_failed $LINENO $i
+  cmp fox out || test_failed $LINENO $i
 done
-printf .
+rm -f fox || framework_failure
-for i in s4Ki 0 1 2 3 4 5 6 7 8 9 ; do
-  "${LZIP}" -f -$i -s16 -o out < in || fail=1
-  "${LZIP}" -df -o copy < out.lz || fail=1
-  cmp in copy || fail=1
-done
-printf .
+cat "${in_lz}" "${in_lz}" > in2.lz || framework_failure
+cat "${in_lz}" "${in_lz}" "${in_lz}" > in3.lz || framework_failure
+if dd if=in3.lz of=trunc.lz bs=14682 count=1 2> /dev/null &&
+   [ -e trunc.lz ] && cmp in2.lz trunc.lz ; then
+  for i in 6 20 14664 14683 14684 14685 14686 14687 14688 ; do
+    dd if=in3.lz of=trunc.lz bs=$i count=1 2> /dev/null
+    "${LZIP}" -tq trunc.lz
+    [ $? = 2 ] || test_failed $LINENO $i
+    "${LZIP}" -tq < trunc.lz
+    [ $?
= 2 ] || test_failed $LINENO $i
+    "${LZIP}" -cdq trunc.lz > /dev/null
+    [ $? = 2 ] || test_failed $LINENO $i
+    "${LZIP}" -dq < trunc.lz > /dev/null
+    [ $? = 2 ] || test_failed $LINENO $i
+  done
+else
+  printf "warning: skipping truncation test: 'dd' does not work on your system."
+fi
+rm -f in2.lz in3.lz trunc.lz || framework_failure
-"${BBEXAMPLE}" in || fail=1
-printf .
-"${BBEXAMPLE}" out || fail=1
-printf .
-"${BBEXAMPLE}" ${test2} || fail=1
-printf .
-
-"${LZCHECK}" in || fail=1
-printf .
-"${LZCHECK}" out || fail=1
-printf .
-"${LZCHECK}" ${test2} || fail=1
-printf .
+cp "${in_lz}" ingin.lz || framework_failure
+printf "g" >> ingin.lz || framework_failure
+cat "${in_lz}" >> ingin.lz || framework_failure
+"${LZIP}" -atq ingin.lz
+[ $? = 2 ] || test_failed $LINENO
+"${LZIP}" -atq < ingin.lz
+[ $? = 2 ] || test_failed $LINENO
+"${LZIP}" -acdq ingin.lz > out
+[ $? = 2 ] || test_failed $LINENO
+cmp in out || test_failed $LINENO
+"${LZIP}" -adq < ingin.lz > out
+[ $? = 2 ] || test_failed $LINENO
+cmp in out || test_failed $LINENO
+"${LZIP}" -t ingin.lz || test_failed $LINENO
+"${LZIP}" -t < ingin.lz || test_failed $LINENO
+"${LZIP}" -dk ingin.lz || test_failed $LINENO
+cmp in ingin || test_failed $LINENO
+"${LZIP}" -cd ingin.lz > out || test_failed $LINENO
+cmp in out || test_failed $LINENO
+"${LZIP}" -d < ingin.lz > out || test_failed $LINENO
+cmp in out || test_failed $LINENO
+"${FFEXAMPLE}" -d ingin.lz | cmp in - || test_failed $LINENO
+"${FFEXAMPLE}" -r ingin.lz | cmp in2 - || test_failed $LINENO
+rm -f in2 out ingin ingin.lz || framework_failure
 echo
 if [ ${fail} = 0 ] ; then
diff --git a/testsuite/fox.lz b/testsuite/fox.lz
new file mode 100644
index 0000000..509da82
Binary files /dev/null and b/testsuite/fox.lz differ
diff --git a/testsuite/fox_bcrc.lz b/testsuite/fox_bcrc.lz
new file mode 100644
index 0000000..8f6a7c4
Binary files /dev/null and b/testsuite/fox_bcrc.lz differ
diff --git a/testsuite/fox_crc0.lz b/testsuite/fox_crc0.lz
new file mode 100644
index 0000000..1abe926
Binary files /dev/null and b/testsuite/fox_crc0.lz differ
diff --git a/testsuite/fox_das46.lz b/testsuite/fox_das46.lz
new file mode 100644
index 0000000..43ed9f9
Binary files /dev/null and b/testsuite/fox_das46.lz differ
diff --git a/testsuite/fox_de20.lz b/testsuite/fox_de20.lz
new file mode 100644
index 0000000..10949d8
Binary files /dev/null and b/testsuite/fox_de20.lz differ
diff --git a/testsuite/test2.txt b/testsuite/fox_lf
similarity index 100%
rename from testsuite/test2.txt
rename to testsuite/fox_lf
diff --git a/testsuite/fox_mes81.lz b/testsuite/fox_mes81.lz
new file mode 100644
index 0000000..d50ef2e
Binary files /dev/null and b/testsuite/fox_mes81.lz differ
diff --git a/testsuite/fox_nz.lz b/testsuite/fox_nz.lz
new file mode 100644
index 0000000..44a4b58
Binary files /dev/null and b/testsuite/fox_nz.lz differ
diff --git a/testsuite/fox_s11.lz b/testsuite/fox_s11.lz
new file mode 100644
index 0000000..dca909c
Binary files /dev/null and b/testsuite/fox_s11.lz differ
diff --git a/testsuite/fox_v2.lz b/testsuite/fox_v2.lz
new file mode 100644
index 0000000..8620981
Binary files /dev/null and b/testsuite/fox_v2.lz differ
diff --git a/testsuite/test.txt b/testsuite/test.txt
index 9196a3a..423f0c0 100644
--- a/testsuite/test.txt
+++ b/testsuite/test.txt
@@ -1,8 +1,7 @@
 GNU GENERAL PUBLIC LICENSE
 Version 2, June 1991
- Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
- 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ Copyright (C) 1989, 1991 Free Software Foundation, Inc.
 Everyone is permitted to copy and distribute verbatim copies
 of this license document, but changing it is not allowed.
@@ -339,8 +338,7 @@ Public License instead of this License.
 GNU GENERAL PUBLIC LICENSE
 Version 2, June 1991
- Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
- 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ Copyright (C) 1989, 1991 Free Software Foundation, Inc.
 Everyone is permitted to copy and distribute verbatim copies
 of this license document, but changing it is not allowed.
diff --git a/testsuite/test.txt.lz b/testsuite/test.txt.lz
index 41d2e39..5dc169f 100644
Binary files a/testsuite/test.txt.lz and b/testsuite/test.txt.lz differ
diff --git a/testsuite/test_sync.lz b/testsuite/test_sync.lz
index db680c3..2a6218b 100644
Binary files a/testsuite/test_sync.lz and b/testsuite/test_sync.lz differ