Welcome to python-isal’s documentation!
Introduction
Faster zlib- and gzip-compatible compression and decompression by providing Python bindings for the ISA-L library.
This package provides Python bindings for the ISA-L library. The Intel(R) Intelligent Storage Acceleration Library (ISA-L) implements several key algorithms in assembly language. This includes a variety of functions to provide zlib/gzip-compatible compression.
python-isal provides the bindings by offering three modules:
- isal_zlib: A drop-in replacement for the zlib module that uses ISA-L to accelerate its performance.
- igzip: A drop-in replacement for the gzip module that uses isal_zlib instead of zlib to perform its compression and checksum tasks, which improves performance.
- igzip_lib: Provides compression functions which have full access to the API of ISA-L’s compression functions.
isal_zlib and igzip are almost fully compatible with zlib and gzip from the Python standard library. There are some minor differences; see Differences with zlib and gzip modules below.
Quickstart
The python-isal modules can be imported as follows:
from isal import isal_zlib
from isal import igzip
from isal import igzip_lib
isal_zlib and igzip are meant to be used as drop-in replacements, so their API and functions are the same as those of the stdlib modules, except where ISA-L does not support the same calls as zlib (see Differences with zlib and gzip modules below).
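A minimal sketch of the drop-in pattern, assuming python-isal may or may not be installed; the alias fast_zlib and the sample data are illustrative only. Explicit compression levels are avoided because their meaning differs between the two modules:

```python
import zlib

# Prefer the ISA-L accelerated module, falling back to the stdlib if
# python-isal is not installed. `fast_zlib` is just an illustrative alias.
try:
    from isal import isal_zlib as fast_zlib
except ImportError:
    import zlib as fast_zlib

data = b"drop-in replacement " * 1000

# Default levels only: level numbers mean different things in each module.
compressed = fast_zlib.compress(data)

# Either way the output is a standard zlib stream, readable by the stdlib...
assert zlib.decompress(compressed) == data
# ...and streams produced by the stdlib can be read back.
assert fast_zlib.decompress(zlib.compress(data)) == data
```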
The full API documentation is given in the API Documentation sections below.
python -m isal.igzip implements a simple gzip-like command line application (just like python -m gzip). Full usage documentation is given in the python -m isal.igzip usage section below.
Installation
Installation with pip
pip install isal
Installation is supported on Linux, MacOS and Windows. On most platforms wheels are provided. The installation will include a statically linked version of ISA-L. If a wheel is not provided for your system, the installation will build ISA-L first in a temporary directory. Please check the ISA-L homepage for the build requirements.
The latest development version of python-isal can be installed with:
pip install git+https://github.com/rhpvorderman/python-isal.git
This requires having the build requirements installed. If you wish to link dynamically against a version of libisal installed on your system use:
PYTHON_ISAL_LINK_DYNAMIC=true pip install isal --no-binary isal
ISA-L is available in numerous Linux distros as well as on conda via the conda-forge channel. Check out the ports documentation on the ISA-L project wiki to find out how to install it. It is important that the development headers are also installed.
On Debian and Ubuntu the ISA-L libraries (including the development headers) can be installed with:
sudo apt install libisal-dev
Installation via conda
Python-isal can be installed via conda, for example using the miniconda installer with a properly set up conda-forge channel. When used with bioinformatics tools, setting up bioconda provides a clear set of installation instructions for conda.
python-isal is available on conda-forge and can be installed with:
conda install python-isal
This will automatically install the ISA-L library dependency as well, since it is available on conda-forge.
Differences with zlib and gzip modules
- Compression level 0 in zlib and gzip means no compression, while in isal_zlib and igzip this is the lowest compression level. This is a design choice that was inherited from the ISA-L library.
- Compression levels range from 0 to 3, not 1 to 9. isal_zlib.Z_DEFAULT_COMPRESSION has been aliased to isal_zlib.ISAL_DEFAULT_COMPRESSION (2).
- isal_zlib only supports NO_FLUSH, SYNC_FLUSH, FULL_FLUSH and FINISH_FLUSH. Other flush modes are not supported and will raise errors.
- zlib.Z_DEFAULT_STRATEGY, zlib.Z_RLE etc. are exposed as isal_zlib.Z_DEFAULT_STRATEGY, isal_zlib.Z_RLE etc. for compatibility reasons. However, isal_zlib only supports a default strategy and will give warnings when other strategies are used.
- zlib supports different memory levels from 1 to 9 (with 8 as the default). isal_zlib supports memory levels smallest, small, medium, large and largest. These have been mapped to levels 1, 2-3, 4-6, 7-8 and 9, so isal_zlib can be used with zlib-compatible memory levels.
- zlib methods have a data argument which is positional only. In isal_zlib this is not enforced, and it can also be passed as a keyword argument. This is due to implementing isal_zlib in Cython and maintaining backwards compatibility with Python 3.6.
- igzip.open returns an IGzipFile instead of a GzipFile. Since the compression levels are not compatible, a difference in naming was chosen to reflect this. igzip.GzipFile does exist as an alias of igzip.IGzipFile for compatibility reasons.
API Documentation: isal_zlib
Implementation of the zlib module using the ISA-L libraries.
class isal.isal_zlib.Compress
Compress object for handling streaming compression.
compress(data)
Compress data, returning a bytes object with at least part of the data in data. This data should be concatenated to the output produced by any preceding calls to the compress() method. Some input may be kept in internal buffers for later processing.
flush(mode=Z_FINISH)
All pending input is processed, and a bytes object containing the remaining compressed output is returned.
- Parameters
mode – Defaults to Z_FINISH, which finishes the compressed stream and prevents compressing any more data. The other supported modes are Z_NO_FLUSH, Z_SYNC_FLUSH and Z_FULL_FLUSH.
class isal.isal_zlib.Decompress
Decompress object for handling streaming decompression.
decompress(data, max_length)
Decompress data, returning a bytes object containing the uncompressed data corresponding to at least part of the data in data.
- Parameters
max_length – If non-zero, the return value will be no longer than max_length. Unprocessed data will be in the unconsumed_tail attribute.
flush(length)
All pending input is processed, and a bytes object containing the remaining uncompressed output is returned.
- Parameters
length – The initial size of the output buffer.
isal.isal_zlib.adler32(data, value)
Computes an Adler-32 checksum of data. Returns the checksum as an unsigned 32-bit integer.
- Parameters
data – Binary data (bytes, bytearray, memoryview).
value – The starting value of the checksum.
isal.isal_zlib.compress(data, level, wbits)
Compresses the bytes in data. Returns a bytes object with the compressed data.
- Parameters
level – The compression level from 0 to 3. 0 is the lowest compression (NOT no compression as in stdlib zlib!) and the fastest. 3 is the best compression and the slowest. The default is a compromise at level 2.
wbits – Sets the amount of history bits (the window size) and which headers and trailers are used. Values from 9 to 15 will use a zlib header and trailer. From +25 to +31 (16 + 9 to 15) a gzip header and trailer will be used. -9 to -15 will generate a raw compressed string with no headers and trailers.
isal.isal_zlib.compressobj(level, method, wbits, memLevel, strategy, zdict)
Returns a Compress object for compressing data streams.
- Parameters
level – The compression level from 0 to 3. 0 is the lowest compression (NOT no compression as in stdlib zlib!) and the fastest. 3 is the best compression and the slowest. The default is a compromise at level 2.
method – The compression algorithm. Currently only DEFLATED is supported.
wbits – Sets the amount of history bits (the window size) and which headers and trailers are used. Values from 9 to 15 will use a zlib header and trailer. From +25 to +31 (16 + 9 to 15) a gzip header and trailer will be used. -9 to -15 will generate a raw compressed string with no headers and trailers.
memLevel – The amount of memory used for the internal compression state. Higher values use more memory for better speed and smaller output. Values between 1 and 9 are supported.
strategy – Accepted for compatibility with zlib. Only the default strategy is supported; other strategies give a warning.
zdict – A predefined compression dictionary: a sequence of bytes that is expected to occur frequently in the data to be compressed. The most common subsequences should come at the end.
isal.isal_zlib.crc32(data, value)
Computes a CRC-32 checksum of data. Returns the checksum as an unsigned 32-bit integer.
- Parameters
data – Binary data (bytes, bytearray, memoryview).
value – The starting value of the checksum.
isal.isal_zlib.decompress(data, wbits, bufsize)
Decompresses the bytes in data. Returns a bytes object with the decompressed data.
- Parameters
wbits – Sets the amount of history bits (the window size) and which headers and trailers are expected. Values from 8 to 15 will expect a zlib header and trailer. -8 to -15 will expect a raw compressed string with no headers and trailers. From +24 to +31 (16 + 8 to 15) a gzip header and trailer will be expected. From +40 to +47 (32 + 8 to 15) a gzip or zlib header is detected automatically.
bufsize – The initial size of the output buffer.
isal.isal_zlib.decompressobj(wbits, zdict)
Returns a Decompress object for decompressing data streams.
- Parameters
wbits – Sets the amount of history bits (the window size) and which headers and trailers are expected. Values from 8 to 15 will expect a zlib header and trailer. -8 to -15 will expect a raw compressed string with no headers and trailers. From +24 to +31 (16 + 8 to 15) a gzip header and trailer will be expected. From +40 to +47 (32 + 8 to 15) a gzip or zlib header is detected automatically.
zdict – A predefined compression dictionary. Must be the same zdict as was used to compress the data.
API Documentation: igzip
Similar to the stdlib gzip module, but using the Intel(R) Intelligent Storage Acceleration Library (ISA-L) to speed up its methods.
class isal.igzip.IGzipFile(filename=None, mode=None, compresslevel=2, fileobj=None, mtime=None)
The IGzipFile class simulates most of the methods of a file object, with the exception of the truncate() method.
This class only supports opening files in binary mode. If you need to open a compressed file in text mode, use the igzip.open() function.
__init__(filename=None, mode=None, compresslevel=2, fileobj=None, mtime=None)
Constructor for the IGzipFile class.
At least one of fileobj and filename must be given a non-trivial value.
The new class instance is based on fileobj, which can be a regular file, an io.BytesIO object, or any other object which simulates a file. It defaults to None, in which case filename is opened to provide a file object.
When fileobj is not None, the filename argument is only used to be included in the gzip file header, which may include the original filename of the uncompressed file. It defaults to the filename of fileobj, if discernible; otherwise, it defaults to the empty string, and in this case the original filename is not included in the header.
The mode argument can be any of ‘r’, ‘rb’, ‘a’, ‘ab’, ‘w’, ‘wb’, ‘x’, or ‘xb’ depending on whether the file will be read or written. The default is the mode of fileobj if discernible; otherwise, the default is ‘rb’. A mode of ‘r’ is equivalent to one of ‘rb’, and similarly for ‘w’ and ‘wb’, ‘a’ and ‘ab’, and ‘x’ and ‘xb’.
The compresslevel argument is an integer from 0 to 3 controlling the level of compression; 0 is fastest and produces the least compression, and 3 is slowest and produces the most compression. Unlike in gzip.GzipFile, 0 is NOT no compression. The default is 2.
The mtime argument is an optional numeric timestamp to be written to the last modification time field in the stream when compressing. If omitted or None, the current time is used.
write(data)
Write the given buffer to the IO stream.
Returns the number of bytes written, which is always the length of data in bytes.
Raises BlockingIOError if the buffer is full and the underlying raw stream cannot accept more data at the moment.
isal.igzip.compress(data, compresslevel=3, *, mtime=None)
Compress data in one shot and return the compressed bytes. The optional compresslevel argument is the compression level, in the range 0-3.
isal.igzip.decompress(data)
Decompress gzip-compressed data in one shot and return the decompressed bytes.
isal.igzip.open(filename, mode='rb', compresslevel=2, encoding=None, errors=None, newline=None)
Open a gzip-compressed file in binary or text mode. This uses the ISA-L library for optimized speed.
The filename argument can be an actual filename (a str or bytes object), or an existing file object to read from or write to.
The mode argument can be “r”, “rb”, “w”, “wb”, “x”, “xb”, “a” or “ab” for binary mode, or “rt”, “wt”, “xt” or “at” for text mode. The default mode is “rb”, and the default compresslevel is 2.
For binary mode, this function is equivalent to the GzipFile constructor: GzipFile(filename, mode, compresslevel). In this case, the encoding, errors and newline arguments must not be provided.
For text mode, a GzipFile object is created, and wrapped in an io.TextIOWrapper instance with the specified encoding, error handling behavior, and line ending(s).
API Documentation: igzip_lib
Pythonic interface to ISA-L’s igzip_lib.
This module comes with the following constants:
| Constant | Description |
|---|---|
| ISAL_BEST_SPEED | The lowest compression level (0). |
| ISAL_BEST_COMPRESSION | The highest compression level (3). |
| ISAL_DEFAULT_COMPRESSION | The compromise compression level (2). |
| DEF_BUF_SIZE | Default size for the starting buffer (16K). |
| MAX_HIST_BITS | Maximum window size bits (15). |
| COMP_DEFLATE | Flag to compress to a raw deflate block. |
| COMP_GZIP | Flag to compress a gzip block, consisting of a gzip header, a raw deflate block and a gzip trailer. |
| COMP_GZIP_NO_HDR | Flag to compress a gzip block without a header. |
| COMP_ZLIB | Flag to compress a zlib block, consisting of a zlib header, a raw deflate block and a zlib trailer. |
| COMP_ZLIB_NO_HDR | Flag to compress a zlib block without a header. |
| DECOMP_DEFLATE | Flag to decompress a raw deflate block. |
| DECOMP_GZIP | Flag to decompress a gzip block including the header and verify the checksums in the trailer. |
| DECOMP_GZIP_NO_HDR | Flag to decompress a gzip block without a header and verify the checksums in the trailer. |
| DECOMP_GZIP_NO_HDR_VER | Flag to decompress a gzip block without a header and without verifying the checksums in the trailer. |
| DECOMP_ZLIB | Flag to decompress a zlib block including the header and verify the checksums in the trailer. |
| DECOMP_ZLIB_NO_HDR | Flag to decompress a zlib block without a header and verify the checksums in the trailer. |
| DECOMP_ZLIB_NO_HDR_VER | Flag to decompress a zlib block without a header and without verifying the checksums in the trailer. |
| MEM_LEVEL_DEFAULT | The default memory level for the internal level buffer (equivalent to MEM_LEVEL_LARGE). |
| MEM_LEVEL_MIN | The minimum memory level. |
| MEM_LEVEL_SMALL | |
| MEM_LEVEL_MEDIUM | |
| MEM_LEVEL_LARGE | |
| MEM_LEVEL_EXTRA_LARGE | The largest memory level. |
exception isal.igzip_lib.IsalError
Exception raised on compression and decompression errors.
isal.igzip_lib.compress(data, level, flag, mem_level, hist_bits)
Compresses the bytes in data. Returns a bytes object with the compressed data.
- Parameters
level – The compression level from 0 to 3. 0 is the lowest compression (NOT no compression as in stdlib zlib!) and the fastest. 3 is the best compression and the slowest. The default is a compromise at level 2.
flag – Controls the header and trailer. Can be any of: COMP_DEFLATE (default), COMP_GZIP, COMP_GZIP_NO_HDR, COMP_ZLIB, COMP_ZLIB_NO_HDR.
mem_level – Sets the memory level for the memory buffer. Larger buffers improve performance. Can be any of: MEM_LEVEL_DEFAULT (default, equivalent to MEM_LEVEL_LARGE), MEM_LEVEL_MIN, MEM_LEVEL_SMALL, MEM_LEVEL_MEDIUM, MEM_LEVEL_LARGE, MEM_LEVEL_EXTRA_LARGE.
hist_bits – Sets the size of the history window. The size equals 2^hist_bits. Similar to the zlib wbits value, except that hist_bits is not used to select the header and trailer (use flag for that). This is best left at the default (15, the maximum).
isal.igzip_lib.decompress(data, flag, hist_bits, bufsize)
Decompresses the bytes in data. Returns a bytes object with the decompressed data.
- Parameters
flag – Whether the compressed block contains headers and/or trailers and of which type. Can be any of: DECOMP_DEFLATE (default), DECOMP_GZIP, DECOMP_GZIP_NO_HDR, DECOMP_GZIP_NO_HDR_VER, DECOMP_ZLIB, DECOMP_ZLIB_NO_HDR, DECOMP_ZLIB_NO_HDR_VER.
hist_bits – Sets the size of the history window. The size equals 2^hist_bits. Similar to the zlib wbits value, except that hist_bits is not used to select the header and trailer (use flag for that). This is best left at the default (15, the maximum).
bufsize – The initial size of the output buffer. The output buffer is dynamically resized as needed. The default size is 16K. If a larger output is expected, using a larger buffer will improve performance by avoiding the costs associated with dynamic resizing.
python -m isal.igzip usage
A simple command line interface for the igzip module. Acts like igzip.
usage: python -m isal.igzip [-h] [-0 | -1 | -2 | -3 | -d] [-c] [file]
Positional Arguments
- file
Named Arguments
- -0, --fast
use compression level 0 (fastest)
- -1
use compression level 1
- -2
use compression level 2 (default)
- -3, --best
use compression level 3 (best)
- -d, --decompress
Decompress the file instead of compressing.
Default: off (files are compressed)
- -c, --stdout
write on standard output
Default: False
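The CLI can also be driven from Python. A sketch, assuming python-isal is installed for the running interpreter; the file name is illustrative and the compressed output is captured from stdout via -c:

```python
import gzip
import os
import subprocess
import sys
import tempfile

data = b"line of text\n" * 1000

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.txt")
    with open(path, "wb") as f:
        f.write(data)

    # -3 selects the best compression level; -c writes to stdout
    # instead of creating example.txt.gz next to the input.
    result = subprocess.run(
        [sys.executable, "-m", "isal.igzip", "-3", "-c", path],
        check=True,
        stdout=subprocess.PIPE,
    )
    compressed = result.stdout

# The CLI output is ordinary gzip data.
assert gzip.decompress(compressed) == data
```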
Contributing
Please make a PR or open an issue if you feel anything can be improved. Bug reports are also very welcome. Please report them on the GitHub issue tracker.
Acknowledgements
This project builds upon the software and experience of many. Many thanks to:
The ISA-L contributors for making ISA-L.
The Cython contributors for making it easy to create an extension and helping a novice get started with pointer addresses.
The CPython contributors. Python-isal mimics zlibmodule.c and gzip.py from the standard library to make it easier for Python users to adopt it.
@marcelm for taking a chance on this project and making it a dependency for his xopen and, by extension, cutadapt projects. This gave python-isal its first users who used python-isal in production.
The GitHub Actions team for creating the Actions CI service that enables building and testing on all three major operating systems.
@animalize for explaining how to test and build python-isal for ARM 64-bit platforms.
And last but not least: everyone who submitted a bug report or a feature request. These make the project better!
Python-isal would not have been possible without you!
Changelog
version 0.10.0
Added an igzip_lib module which allows more direct access to ISA-L’s igzip_lib API. This allows features such as headerless compression and decompression, as well as setting the memory levels manually.
Added more extensive documentation.
version 0.9.0
Fix a bug where an AttributeError was triggered when zlib.Z_RLE or zlib.Z_FIXED were not present.
Add support for Linux aarch64 builds.
Add support for PyPy by adding PyPy tests to the CI and setting up wheel building support.
version 0.8.1
Fix a bug where multi-member gzip files were read incorrectly due to an offset error. This was caused by ISA-L’s decompressobj having a small bit buffer which was not properly taken into account in some circumstances.
version 0.8.0
Speed up igzip.compress and igzip.decompress by improving the implementation.
Make sure compiler arguments are passed to the ISA-L compilation step. Previously ISA-L was compiled without optimisation steps, causing the statically linked library to be significantly slower.
An unused constant from the isal_zlib library was removed: ISAL_DEFAULT_HIST_BITS.
Refactor isal_zlib.pyx to work almost the same as zlibmodule.c. This has made the code look cleaner and has reduced some overhead.
version 0.7.0
Remove workarounds in the igzip module for the unconsumed_tail and unused_data bugs. igzip._IGzipReader now functions the same as gzip._GzipReader, with only a few calls replaced with isal_zlib calls for speed.
Correctly implement unused_data and unconsumed_tail on isal_zlib.Decompress objects. It works the same as in CPython’s zlib now.
Correctly implement flush on isal_zlib.Compress and isal_zlib.Decompress objects. It works the same as in CPython’s zlib now.
version 0.6.1
Fix a crash that occurs when opening a file that did not end in .gz while outputting to stdout using python -m isal.igzip.
version 0.6.0
python -m gzip’s behaviour has been changed since fixing bug bpo-43316. This bug was not present in python -m isal.igzip, but it handled the error differently than the solution in CPython. This is now corrected and python -m isal.igzip handles the error the same as the fixed python -m gzip.
Installation on Windows is now supported. Wheels are provided for Windows as well.
version 0.5.0
Fix a bug where negative integers were not allowed for the adler32 and crc32 functions in isal_zlib.
Provided stubs (type-hint files) for isal_zlib and _isal modules. Package is now tested with mypy to ensure correct type information.
The command-line interface now reads in blocks of 32K instead of 8K. This improves performance by about 6% when compressing and 11% when decompressing. A hidden -b flag was added to adjust the buffer size for benchmarks.
A -c or --stdout flag was added to the CLI interface of isal.igzip. This allows it to behave more like the gzip or pigz command line interfaces.
version 0.4.0
Move wheel building to cibuildwheel on GitHub Actions CI. Wheels are now provided for Mac OS as well.
Make a tiny change in setup.py so python-isal can be built on Mac OS X.
version 0.3.0
Set included ISA-L library at version 2.30.0.
The python-isal source distribution now includes the ISA-L source code, against which python-isal is linked statically upon installation by default. Dynamic linking against system libraries is now optional. Wheels with the statically linked ISA-L are now provided on PyPI.
version 0.2.0
Fixed a bug where writing of the gzip header would crash if an older version of Python 3.7 was used, such as on Debian or Ubuntu. This is due to differences between point releases because of a backported feature. The code now checks if the backported feature is present.
Added Python 3.9 to the testing.
Fixed setup.py to list setuptools as a requirement.
Changed homepage to reflect move to pycompression organization.
version 0.1.0
Publish API documentation on readthedocs.
Add API documentation.
Ensure the igzip module is fully compatible with the gzip stdlib module.
Add compliance tests from CPython to ensure isal_zlib and igzip are validated to the same standards as the zlib and gzip modules.
Added a working gzip app using python -m isal.igzip.
Add test suite that tests all possible settings for functions on the isal_zlib module.
Create igzip module which implements all gzip functions and methods.
Create isal_zlib module which implements all zlib functions and methods.