Welcome to python-isal’s documentation!

Introduction

Faster zlib and gzip compatible compression and decompression by providing Python bindings for the ISA-L library.

The Intel(R) Intelligent Storage Acceleration Library (ISA-L) implements several key algorithms in assembly language, including a variety of functions that provide zlib/gzip-compatible compression.

python-isal provides the bindings by offering three modules:

  • isal_zlib: A drop-in replacement for the zlib module that uses ISA-L to accelerate its performance.

  • igzip: A drop-in replacement for the gzip module that uses isal_zlib instead of zlib to perform its compression and checksum tasks, which improves performance.

  • igzip_lib: Provides compression functions which have full access to the API of ISA-L’s compression functions.

isal_zlib and igzip are almost fully compatible with zlib and gzip from the Python standard library. There are some minor differences; see the section “Differences with zlib and gzip modules”.

Quickstart

The python-isal modules can be imported as follows:

from isal import isal_zlib
from isal import igzip
from isal import igzip_lib

isal_zlib and igzip are meant to be used as drop-in replacements, so their API and functions are the same as those of the stdlib modules, except where ISA-L does not support the same calls as zlib (see the differences below).
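As a minimal sketch of the drop-in usage described above, the snippet below imports isal_zlib under the name zlib and round-trips a payload. The fallback to the standard library and the sample payload are assumptions added for illustration so that the snippet also runs where python-isal is not installed.

```python
# Prefer the ISA-L accelerated module; fall back to the stdlib so this
# sketch still runs without python-isal (an assumption for illustration,
# not something the package does by itself).
try:
    from isal import isal_zlib as zlib
except ImportError:
    import zlib

payload = b"python-isal quickstart " * 100

# No level is passed, so the differing level scales do not matter here.
compressed = zlib.compress(payload)
restored = zlib.decompress(compressed)
```

Because only the import line changes, existing code that calls zlib.compress and zlib.decompress without an explicit level should need no further modification.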

Full API documentation can be found on our readthedocs page.

python -m isal.igzip implements a simple gzip-like command line application (just like python -m gzip). Full usage documentation can be found on our readthedocs page.

Installation

Installation with pip

pip install isal

Installation is supported on Linux, macOS and Windows. On most platforms wheels are provided. The installation will include a statically linked version of ISA-L. If a wheel is not provided for your system, the installation will first build ISA-L in a temporary directory. Please check the ISA-L homepage for the build requirements.

The latest development version of python-isal can be installed with:

pip install git+https://github.com/rhpvorderman/python-isal.git

This requires having the build requirements installed. If you wish to link dynamically against a version of libisal installed on your system use:

PYTHON_ISAL_LINK_DYNAMIC=true pip install isal --no-binary isal

ISA-L is available in numerous Linux distributions as well as on conda via the conda-forge channel. Check out the ports documentation on the ISA-L project wiki to find out how to install it. It is important that the development headers are also installed.

On Debian and Ubuntu the ISA-L libraries (including the development headers) can be installed with:

sudo apt install libisal-dev

Installation via conda

python-isal can be installed via conda, for example using the miniconda installer with a properly set up conda-forge channel. When python-isal is used with bioinformatics tools, the bioconda setup guide provides a clear set of installation instructions for conda.

python-isal is available on conda-forge and can be installed with:

conda install python-isal

This will automatically install the ISA-L library dependency as well, since it is available on conda-forge.

Differences with zlib and gzip modules

  • Compression level 0 in zlib and gzip means no compression, while in isal_zlib and igzip this is the lowest compression level. This is a design choice that was inherited from the ISA-L library.

  • Compression levels range from 0 to 3, not 1 to 9. isal_zlib.Z_DEFAULT_COMPRESSION has been aliased to isal_zlib.ISAL_DEFAULT_COMPRESSION (2).

  • isal_zlib only supports NO_FLUSH, SYNC_FLUSH, FULL_FLUSH and FINISH_FLUSH. Other flush modes are not supported and will raise errors.

  • zlib.Z_DEFAULT_STRATEGY, zlib.Z_RLE etc. are exposed as isal_zlib.Z_DEFAULT_STRATEGY, isal_zlib.Z_RLE etc. for compatibility reasons. However, isal_zlib only supports a default strategy and will give warnings when other strategies are used.

  • zlib supports different memory levels from 1 to 9 (with 8 as the default). isal_zlib supports the memory levels smallest, small, medium, large and largest. These have been mapped to levels 1, 2-3, 4-6, 7-8 and 9, so isal_zlib can be used with zlib-compatible memory levels.

  • zlib methods have a data argument which is positional-only. In isal_zlib this is not enforced and the argument can also be passed as a keyword argument. This is due to implementing isal_zlib in Cython and maintaining backwards compatibility with Python 3.6.

  • igzip.open returns a class IGzipFile instead of GzipFile. Since the compression levels are not compatible, a difference in naming was chosen to reflect this. igzip.GzipFile does exist as an alias of igzip.IGzipFile for compatibility reasons.
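The memory level mapping in the list above can be written out as a small lookup. The sketch below is plain Python; the function name isal_mem_level_name is hypothetical and not part of python-isal. It only restates the ranges given above.

```python
def isal_mem_level_name(mem_level: int) -> str:
    """Return the ISA-L memory level name for a zlib-style memLevel (1-9).

    Hypothetical helper: it only encodes the mapping described in the text.
    """
    if not 1 <= mem_level <= 9:
        raise ValueError("memLevel must be between 1 and 9")
    if mem_level == 1:
        return "smallest"
    if mem_level <= 3:
        return "small"
    if mem_level <= 6:
        return "medium"
    if mem_level <= 8:
        return "large"
    return "largest"


# zlib's default memLevel of 8 falls in the "large" range.
default_name = isal_mem_level_name(8)
```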

API Documentation: isal_zlib

Implementation of the zlib module using the ISA-L libraries.

class isal.isal_zlib.Compress

Compress object for handling streaming compression.

compress()

Compress data, returning a bytes object containing compressed data for at least part of the data in data. This data should be concatenated to the output produced by any preceding calls to the compress() method. Some input may be kept in internal buffers for later processing.

flush()

All pending input is processed, and a bytes object containing the remaining compressed output is returned.

Parameters

mode – Defaults to Z_FINISH, which finishes the compressed stream and prevents compressing any more data. The other supported modes are Z_NO_FLUSH, Z_SYNC_FLUSH and Z_FULL_FLUSH.
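The following sketch shows one way the compress() and flush() methods might be combined for streaming compression. The chunk contents and the fallback to the standard library zlib module are assumptions for illustration only.

```python
try:
    from isal import isal_zlib as zlib
except ImportError:  # stdlib fallback so the sketch runs anywhere
    import zlib

chunks = [b"first chunk, ", b"second chunk, ", b"third chunk"]

compressor = zlib.compressobj()
pieces = []
for chunk in chunks:
    # compress() may return b"" while input is held in internal buffers.
    pieces.append(compressor.compress(chunk))
# flush() with its default mode finishes the compressed stream.
pieces.append(compressor.flush())

stream = b"".join(pieces)
```

After the final flush() the Compress object cannot accept more data, so a new object is needed for the next stream.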

class isal.isal_zlib.Decompress

Decompress object for handling streaming decompression.

decompress()

Decompress data, returning a bytes object containing the uncompressed data corresponding to at least part of the data in string.

Parameters

max_length – if non-zero then the return value will be no longer than max_length. Unprocessed data will be in the unconsumed_tail attribute.

flush()

All pending input is processed, and a bytes object containing the remaining uncompressed output is returned.

Parameters

length – The initial size of the output buffer.
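A sketch of bounded-memory decompression with max_length and unconsumed_tail, as described above, might look as follows. The 1024-byte limit, the sample data and the stdlib fallback are assumptions chosen for illustration.

```python
try:
    from isal import isal_zlib as zlib
except ImportError:  # stdlib fallback so the sketch runs anywhere
    import zlib

original = b"0123456789" * 1000
compressed = zlib.compress(original)

decompressor = zlib.decompressobj()
output = []
pending = compressed
while pending:
    # Return at most 1024 bytes per call; input that was not processed
    # yet is made available in the unconsumed_tail attribute.
    output.append(decompressor.decompress(pending, 1024))
    pending = decompressor.unconsumed_tail
# flush() returns whatever uncompressed output is still pending.
output.append(decompressor.flush())

restored = b"".join(output)
```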

isal.isal_zlib.adler32()

Computes an Adler-32 checksum of data. Returns the checksum as an unsigned 32-bit integer.

Parameters
  • data – Binary data (bytes, bytearray, memoryview).

  • value – The starting value of the checksum.

isal.isal_zlib.compress()

Compresses the bytes in data. Returns a bytes object with the compressed data.

Parameters
  • level – the compression level from 0 to 3. 0 is the lowest compression (NOT no compression as in stdlib zlib!) and the fastest. 3 is the best compression and the slowest. Default is a compromise at level 2.

  • wbits – Set the amount of history bits or window size and which headers and trailers are used. Values from 9 to 15 will use a zlib header and trailer. From +25 to +31 (16 + 9 to 15) a gzip header and trailer will be used. -9 to -15 will generate a raw compressed string with no headers and trailers.

isal.isal_zlib.compressobj()

Returns a Compress object for compressing data streams.

Parameters
  • level – the compression level from 0 to 3. 0 is the lowest compression (NOT no compression as in stdlib zlib!) and the fastest. 3 is the best compression and the slowest. Default is a compromise at level 2.

  • method – The compression algorithm. Currently only DEFLATED is supported.

  • wbits – Set the amount of history bits or window size and which headers and trailers are used. Values from 9 to 15 will use a zlib header and trailer. From +25 to +31 (16 + 9 to 15) a gzip header and trailer will be used. -9 to -15 will generate a raw compressed string with no headers and trailers.

  • memLevel – The amount of memory used for the internal compression state. Higher values use more memory for better speed and smaller output. Values between 1 and 9 are supported.

  • zdict – A predefined compression dictionary: a sequence of bytes that are expected to occur frequently in the data to be compressed. The most common subsequences should come at the end.
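To illustrate how wbits selects the container format, the sketch below compresses the same payload three times with a Compress object. The helper name compress_with_wbits is hypothetical, and the stdlib fallback is an assumption so that the code runs without python-isal.

```python
try:
    from isal import isal_zlib as zlib
except ImportError:  # stdlib fallback so the sketch runs anywhere
    import zlib

data = b"header selection example " * 50


def compress_with_wbits(payload: bytes, wbits: int) -> bytes:
    """Hypothetical helper: compress payload in one go using wbits."""
    compressor = zlib.compressobj(wbits=wbits)
    return compressor.compress(payload) + compressor.flush()


zlib_stream = compress_with_wbits(data, 15)   # zlib header and trailer
gzip_stream = compress_with_wbits(data, 31)   # 16 + 15: gzip header and trailer
raw_stream = compress_with_wbits(data, -15)   # raw deflate, no header or trailer
```

Each stream has to be decompressed with a matching wbits value, for example zlib.decompress(raw_stream, -15) for the raw variant.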

isal.isal_zlib.crc32()

Computes a CRC-32 checksum of data. Returns the checksum as an unsigned 32-bit integer.

Parameters
  • data – Binary data (bytes, bytearray, memoryview).

  • value – The starting value of the checksum.
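The value parameter makes it possible to compute a checksum over data that arrives in pieces. A minimal sketch follows, using the standard CRC-32 check string; the stdlib fallback is an assumption for illustration.

```python
try:
    from isal import isal_zlib as zlib
except ImportError:  # stdlib fallback so the sketch runs anywhere
    import zlib

# One-shot checksum of the standard CRC-32 check string.
one_shot = zlib.crc32(b"123456789")

# The same checksum computed incrementally: the previous result is
# passed back in as the starting value for the next piece.
running = zlib.crc32(b"12345")
running = zlib.crc32(b"6789", running)
```

The same pattern applies to adler32(), which takes the same two parameters.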

isal.isal_zlib.decompress()

Decompresses the bytes in data. Returns a bytes object with the decompressed data.

Parameters
  • wbits – Set the amount of history bits or window size and which headers and trailers are expected. Values from 8 to 15 will expect a zlib header and trailer. -8 to -15 will expect a raw compressed string with no headers and trailers. From +24 to +31 == 16 + (8 to 15) a gzip header and trailer will be expected. From +40 to +47 == 32 + (8 to 15) automatically detects a gzip or zlib header.

  • bufsize – The initial size of the output buffer.
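As a sketch of the automatic header detection described for wbits values from +40 to +47, the snippet below decompresses a zlib stream and a gzip stream with the same call. Producing the gzip input with the standard library gzip module and falling back to stdlib zlib are assumptions for illustration.

```python
import gzip

try:
    from isal import isal_zlib as zlib
except ImportError:  # stdlib fallback so the sketch runs anywhere
    import zlib

data = b"auto-detect example " * 50

zlib_stream = zlib.compress(data)   # zlib container (the default)
gzip_stream = gzip.compress(data)   # gzip container made by the stdlib

# 47 == 32 + 15: the container type is detected from the header.
from_zlib = zlib.decompress(zlib_stream, 47)
from_gzip = zlib.decompress(gzip_stream, 47)
```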

isal.isal_zlib.decompressobj()

Returns a Decompress object for decompressing data streams.

Parameters

  • wbits – Set the amount of history bits or window size and which headers and trailers are expected. Values from 8 to 15 will expect a zlib header and trailer. -8 to -15 will expect a raw compressed string with no headers and trailers. From +24 to +31 == 16 + (8 to 15) a gzip header and trailer will be expected. From +40 to +47 == 32 + (8 to 15) automatically detects a gzip or zlib header.

  • zdict – A predefined compression dictionary. Must be the same zdict as was used to compress the data.
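A sketch of compressing and decompressing with a shared dictionary is given below. The dictionary contents and the message are made up for illustration, and the stdlib fallback is an assumption so that the code runs without python-isal.

```python
try:
    from isal import isal_zlib as zlib
except ImportError:  # stdlib fallback so the sketch runs anywhere
    import zlib

# Hypothetical dictionary: byte strings expected to recur in the payload.
shared_dict = b'"status": "ok", "user": '
message = b'{"user": "alice", "status": "ok"}'

compressor = zlib.compressobj(zdict=shared_dict)
compressed = compressor.compress(message) + compressor.flush()

# The decompressor must be given the very same dictionary.
decompressor = zlib.decompressobj(zdict=shared_dict)
restored = decompressor.decompress(compressed) + decompressor.flush()
```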

API Documentation: igzip

Similar to the stdlib gzip module, but using the Intel Storage Acceleration Library to speed up its methods.

class isal.igzip.IGzipFile(filename=None, mode=None, compresslevel=2, fileobj=None, mtime=None)

The IGzipFile class simulates most of the methods of a file object with the exception of the truncate() method.

This class only supports opening files in binary mode. If you need to open a compressed file in text mode, use the igzip.open() function.

__init__(filename=None, mode=None, compresslevel=2, fileobj=None, mtime=None)

Constructor for the IGzipFile class.

At least one of fileobj and filename must be given a non-trivial value.

The new class instance is based on fileobj, which can be a regular file, an io.BytesIO object, or any other object which simulates a file. It defaults to None, in which case filename is opened to provide a file object.

When fileobj is not None, the filename argument is only used to be included in the gzip file header, which may include the original filename of the uncompressed file. It defaults to the filename of fileobj, if discernible; otherwise, it defaults to the empty string, and in this case the original filename is not included in the header.

The mode argument can be any of ‘r’, ‘rb’, ‘a’, ‘ab’, ‘w’, ‘wb’, ‘x’, or ‘xb’ depending on whether the file will be read or written. The default is the mode of fileobj if discernible; otherwise, the default is ‘rb’. A mode of ‘r’ is equivalent to one of ‘rb’, and similarly for ‘w’ and ‘wb’, ‘a’ and ‘ab’, and ‘x’ and ‘xb’.

The compresslevel argument is an integer from 0 to 3 controlling the level of compression; 0 is fastest and produces the least compression, and 3 is slowest and produces the most compression. Unlike gzip.GzipFile 0 is NOT no compression. The default is 2.

The mtime argument is an optional numeric timestamp to be written to the last modification time field in the stream when compressing. If omitted or None, the current time is used.

write(data)

Write the given buffer to the IO stream.

Returns the number of bytes written, which is always the length of data in bytes.

Raises BlockingIOError if the buffer is full and the underlying raw stream cannot accept more data at the moment.
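The constructor arguments described above can be combined to write a gzip stream into memory. In this sketch the alias GzipWriter, the file name and the payload are hypothetical, and gzip.GzipFile is used as a stand-in when python-isal is not installed.

```python
import gzip
import io

try:
    from isal.igzip import IGzipFile as GzipWriter
except ImportError:  # stdlib stand-in, assumed only for this sketch
    from gzip import GzipFile as GzipWriter

line = b"hello from an in-memory gzip stream\n"
buffer = io.BytesIO()

# compresslevel=2 is valid for both classes; mtime=0 keeps the header
# free of the current time. filename is only stored in the gzip header.
with GzipWriter(filename="example.txt", mode="wb", compresslevel=2,
                fileobj=buffer, mtime=0) as gz:
    written = gz.write(line)

gzip_bytes = buffer.getvalue()
```

Since the result is an ordinary gzip stream, it can be read back with igzip.decompress() or the standard library gzip.decompress().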

isal.igzip.compress(data, compresslevel=3, *, mtime=None)

Compress data in one shot and return the compressed string. Optional argument is the compression level, in range of 0-3.

isal.igzip.decompress(data)

Decompress a gzip compressed string in one shot. Return the decompressed string.
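A one-shot round trip with these two functions might look like the sketch below. The payload is made up, and binding the name igzip to the standard library gzip module when python-isal is missing is an assumption for illustration.

```python
import gzip

try:
    from isal import igzip
except ImportError:  # stdlib stand-in, assumed only for this sketch
    igzip = gzip

data = b"one-shot gzip example " * 40

compressed = igzip.compress(data, compresslevel=3)
restored = igzip.decompress(compressed)

# The output is a regular gzip stream, so the stdlib can read it too.
restored_by_stdlib = gzip.decompress(compressed)
```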

isal.igzip.open(filename, mode='rb', compresslevel=2, encoding=None, errors=None, newline=None)

Open a gzip-compressed file in binary or text mode. This uses the ISA-L library for optimized speed.

The filename argument can be an actual filename (a str or bytes object), or an existing file object to read from or write to.

The mode argument can be “r”, “rb”, “w”, “wb”, “x”, “xb”, “a” or “ab” for binary mode, or “rt”, “wt”, “xt” or “at” for text mode. The default mode is “rb”, and the default compresslevel is 2.

For binary mode, this function is equivalent to the IGzipFile constructor: IGzipFile(filename, mode, compresslevel). In this case, the encoding, errors and newline arguments must not be provided.

For text mode, an IGzipFile object is created and wrapped in an io.TextIOWrapper instance with the specified encoding, error handling behavior, and line ending(s).
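Text mode can be sketched as follows. The temporary directory, file name and contents are assumptions for illustration, and the standard library gzip module is used as a stand-in when python-isal is not installed.

```python
import os
import tempfile

try:
    from isal import igzip
except ImportError:  # stdlib stand-in, assumed only for this sketch
    import gzip as igzip

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "notes.txt.gz")

    # "wt" opens the file for writing text; str is encoded as UTF-8.
    with igzip.open(path, "wt", compresslevel=2, encoding="utf-8") as handle:
        handle.write("first line\nsecond line\n")

    # "rt" decodes the decompressed bytes back into str.
    with igzip.open(path, "rt", encoding="utf-8") as handle:
        lines = handle.read().splitlines()
```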

API Documentation: igzip_lib

Pythonic interface to ISA-L’s igzip_lib.

This module comes with the following constants:

ISAL_BEST_SPEED

The lowest compression level (0)

ISAL_BEST_COMPRESSION

The highest compression level (3)

ISAL_DEFAULT_COMPRESSION

The compromise compression level (2)

DEF_BUF_SIZE

Default size for the starting buffer (16K)

MAX_HIST_BITS

Maximum window size bits (15).

COMP_DEFLATE

Flag to compress to a raw deflate block.

COMP_GZIP

Flag to compress a gzip block, consisting of a gzip header, raw deflate block and a gzip trailer.

COMP_GZIP_NO_HDR

Flag to compress a gzip block without a header.

COMP_ZLIB

Flag to compress a zlib block, consisting of a zlib header, a raw deflate block and a zlib trailer.

COMP_ZLIB_NO_HDR

Flag to compress a zlib block without a header.

DECOMP_DEFLATE

Flag to decompress a raw deflate block.

DECOMP_GZIP

Flag to decompress a gzip block including header and verify the checksums in the trailer.

DECOMP_GZIP_NO_HDR

Flag to decompress a gzip block without a header and verify the checksums in the trailer.

DECOMP_GZIP_NO_HDR_VER

Flag to decompress a gzip block without a header and without verifying the checksums in the trailer.

DECOMP_ZLIB

Flag to decompress a zlib block including header and verify the checksums in the trailer.

DECOMP_ZLIB_NO_HDR

Flag to decompress a zlib block without a header and verify the checksums in the trailer.

DECOMP_ZLIB_NO_HDR_VER

Flag to decompress a zlib block without a header and without verifying the checksums in the trailer.

MEM_LEVEL_DEFAULT

The default memory level for the internal level buffer. (Equivalent to MEM_LEVEL_LARGE.)

MEM_LEVEL_MIN

The minimum memory level.

MEM_LEVEL_SMALL

MEM_LEVEL_MEDIUM

MEM_LEVEL_LARGE

MEM_LEVEL_EXTRA_LARGE

The largest memory level.

exception isal.igzip_lib.IsalError

Exception raised on compression and decompression errors.

isal.igzip_lib.compress()

Compresses the bytes in data. Returns a bytes object with the compressed data.

Parameters
  • level – the compression level from 0 to 3. 0 is the lowest compression (NOT no compression as in stdlib zlib!) and the fastest. 3 is the best compression and the slowest. Default is a compromise at level 2.

  • flag – Controls the header and trailer. Can be any of: COMP_DEFLATE (default), COMP_GZIP, COMP_GZIP_NO_HDR, COMP_ZLIB, COMP_ZLIB_NO_HDR.

  • mem_level – Set the memory level for the memory buffer. Larger buffers improve performance. Can be any of: MEM_LEVEL_DEFAULT (default, equivalent to MEM_LEVEL_LARGE), MEM_LEVEL_MIN, MEM_LEVEL_SMALL, MEM_LEVEL_MEDIUM, MEM_LEVEL_LARGE, MEM_LEVEL_EXTRA_LARGE.

  • hist_bits – Sets the size of the view window. The size equals 2^hist_bits. Similar to zlib wbits value, except that hist_bits is not used to set the compression flag. This is best left at the default (15, maximum).

isal.igzip_lib.decompress()

Decompresses the bytes in data. Returns a bytes object with the decompressed data.

Parameters
  • flag – Whether the compressed block contains headers and/or trailers and of which type. Can be any of: DECOMP_DEFLATE (default), DECOMP_GZIP, DECOMP_GZIP_NO_HDR, DECOMP_GZIP_NO_HDR_VER, DECOMP_ZLIB, DECOMP_ZLIB_NO_HDR, DECOMP_ZLIB_NO_HDR_VER.

  • hist_bits – Sets the size of the view window. The size equals 2^hist_bits. Similar to zlib wbits value, except that hist_bits is not used to set the compression flag. This is best left at the default (15, maximum).

  • bufsize – The initial size of the output buffer. The output buffer is dynamically resized according to the need. The default size is 16K. If a larger output is expected, using a larger buffer will improve performance by negating the costs associated with the dynamic resizing.
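The flag parameters can be sketched with a gzip round trip. The keyword names follow the parameter lists above, but passing them by keyword is an assumption. The helper names gzip_block and gunzip_block are hypothetical, and the standard library gzip module is used as a stand-in when python-isal is not installed.

```python
import gzip

try:
    from isal import igzip_lib
except ImportError:
    igzip_lib = None  # python-isal is not installed


def gzip_block(data: bytes) -> bytes:
    """Hypothetical helper: compress data into a gzip block."""
    if igzip_lib is not None:
        return igzip_lib.compress(
            data,
            level=igzip_lib.ISAL_BEST_SPEED,
            flag=igzip_lib.COMP_GZIP,
        )
    # Stand-in used only when python-isal is missing.
    return gzip.compress(data, compresslevel=1)


def gunzip_block(block: bytes) -> bytes:
    """Hypothetical helper: decompress a gzip block made by gzip_block."""
    if igzip_lib is not None:
        return igzip_lib.decompress(block, flag=igzip_lib.DECOMP_GZIP)
    return gzip.decompress(block)


payload = b"igzip_lib flag example " * 64
block = gzip_block(payload)
restored = gunzip_block(block)
```

The compression flag and the decompression flag have to describe the same container, here COMP_GZIP paired with DECOMP_GZIP.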

python -m isal.igzip usage

A simple command line interface for the igzip module. Acts like igzip.

usage: python -m isal.igzip [-h] [-0 | -1 | -2 | -3 | -d] [-c] [file]

Positional Arguments

file

Named Arguments

-0, --fast

use compression level 0 (fastest)

-1

use compression level 1

-2

use compression level 2 (default)

-3, --best

use compression level 3 (best)

-d, --decompress

Decompress the file instead of compressing.

Default: compress (decompression only happens when -d is given)

-c, --stdout

write on standard output

Default: False
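The option grammar in the usage line above, where at most one of -0, -1, -2, -3 and -d may be given, can be reproduced with argparse. This is an illustrative sketch and not the project's own parser; the function name build_parser and the destination names are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser mirroring the usage line above."""
    parser = argparse.ArgumentParser(prog="python -m isal.igzip")
    # At most one of -0, -1, -2, -3 and -d may be given.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-0", "--fast", action="store_const", const=0,
                       dest="compresslevel", default=2)
    group.add_argument("-1", action="store_const", const=1,
                       dest="compresslevel", default=2)
    group.add_argument("-2", action="store_const", const=2,
                       dest="compresslevel", default=2)
    group.add_argument("-3", "--best", action="store_const", const=3,
                       dest="compresslevel", default=2)
    group.add_argument("-d", "--decompress", action="store_true")
    parser.add_argument("-c", "--stdout", action="store_true")
    parser.add_argument("file", nargs="?")
    return parser


# Best compression, written to standard output.
example = build_parser().parse_args(["-3", "-c", "input.txt"])
```

Combining a level flag with -d, for example -1 -d, is rejected by the mutually exclusive group with a usage error.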

Contributing

Please open a PR or an issue if you feel anything can be improved. Bug reports are also very welcome. Please report them on the GitHub issue tracker.

Acknowledgements

This project builds upon the software and experience of many. Many thanks to:

  • The ISA-L contributors for making ISA-L.

  • The Cython contributors for making it easy to create an extension and helping a novice get started with pointer addresses.

  • The CPython contributors. Python-isal mimics zlibmodule.c and gzip.py from the standard library to make it easier for Python users to adopt it.

  • @marcelm for taking a chance on this project and making it a dependency for his xopen and by extension cutadapt projects. This gave python-isal its first users who used python-isal in production.

  • The GitHub Actions team for creating the Actions CI service that enables building and testing on all three major operating systems.

  • @animalize for explaining how to test and build python-isal for ARM 64-bit platforms.

  • And last but not least: everyone who submitted a bug report or a feature request. These make the project better!

Python-isal would not have been possible without you!

Changelog

version 0.10.0

  • Added an igzip_lib module which allows more direct access to ISA-L’s igzip_lib API. This allows features such as headerless compression and decompression, as well as setting the memory levels manually.

  • Added more extensive documentation.

version 0.9.0

  • Fix a bug where an AttributeError was triggered when zlib.Z_RLE or zlib.Z_FIXED were not present.

  • Add support for Linux aarch64 builds.

  • Add support for PyPy by adding PyPy tests to the CI and setting up wheel building support.

version 0.8.1

  • Fix a bug where multi-member gzip files were read incorrectly due to an offset error. This was caused by ISA-L’s decompressobj having a small bitbuffer which was not taken properly into account in some circumstances.

version 0.8.0

  • Speed up igzip.compress and igzip.decompress by improving the implementation.

  • Make sure compiler arguments are passed to ISA-L compilation step. Previously ISA-L was compiled without optimisation steps, causing the statically linked library to be significantly slower.

  • An unused constant from the isal_zlib library was removed: ISAL_DEFAULT_HIST_BITS.

  • Refactor isal_zlib.pyx to work almost the same as zlibmodule.c. This has made the code look cleaner and has reduced some overhead.

version 0.7.0

  • Remove workarounds in the igzip module for the unconsumed_tail and unused_data bugs. igzip._IGzipReader now functions the same as gzip._GzipReader with only a few calls replaced with isal_zlib calls for speed.

  • Correctly implement unused_data and unconsumed_tail on isal_zlib.Decompress objects. It works the same as in CPython’s zlib now.

  • Correctly implement flush implementation on isal_zlib.Compress and isal_zlib.Decompress objects. It works the same as in CPython’s zlib now.

version 0.6.1

  • Fix a crash that occurs when opening a file that did not end in .gz while outputting to stdout using python -m isal.igzip.

version 0.6.0

  • python -m gzip’s behaviour has been changed since fixing bug: bpo-43316. This bug was not present in python -m isal.igzip but it handled the error differently than the solution in CPython. This is now corrected and python -m isal.igzip handles the error the same as the fixed python -m gzip.

  • Installation on Windows is now supported. Wheels are provided for Windows as well.

version 0.5.0

  • Fix a bug where negative integers were not allowed for the adler32 and crc32 functions in isal_zlib.

  • Provided stubs (type-hint files) for isal_zlib and _isal modules. Package is now tested with mypy to ensure correct type information.

  • The command-line interface now reads in blocks of 32K instead of 8K. This improves performance by about 6% when compressing and 11% when decompressing. A hidden -b flag was added to adjust the buffer size for benchmarks.

  • A -c or --stdout flag was added to the CLI interface of isal.igzip. This allows it to behave more like the gzip or pigz command line interfaces.

version 0.4.0

  • Move wheel building to cibuildwheel on GitHub Actions CI. Wheels are now provided for Mac OS as well.

  • Make a tiny change in setup.py so python-isal can be built on Mac OS X.

version 0.3.0

  • Set included ISA-L library at version 2.30.0.

  • Python-isal now comes with a source distribution of ISA-L in its source distribution against which python-isal is linked statically upon installation by default. Dynamic linking against system libraries is now optional. Wheels with the statically linked ISA-L are now provided on PyPI.

version 0.2.0

  • Fixed a bug where writing of the gzip header would crash if an older version of Python 3.7 was used such as on Debian or Ubuntu. This is due to differences between point releases because of a backported feature. The code now checks if the backported feature is present.

  • Added Python 3.9 to the testing.

  • Fixed setup.py to list setuptools as a requirement.

  • Changed homepage to reflect move to pycompression organization.

version 0.1.0

  • Publish API documentation on readthedocs.

  • Add API documentation.

  • Ensure the igzip module is fully compatible with the gzip stdlib module.

  • Add compliance tests from CPython to ensure isal_zlib and igzip are validated to the same standards as the zlib and gzip modules.

  • Added a working gzip app using python -m isal.igzip.

  • Add test suite that tests all possible settings for functions on the isal_zlib module.

  • Create igzip module which implements all gzip functions and methods.

  • Create isal_zlib module which implements all zlib functions and methods.