Reading a ZIP file in Java

Java can read raw data compressed with the DEFLATE algorithm using an Inflater or InflaterInputStream. For many applications, another useful facility is Java's API for reading from (and writing to) unencrypted ZIP files. (Such support is inevitable, since a jar (Java archive) file is essentially a ZIP file.) The ZIP file format packages a number of files together into a single archive; individual subfiles within the archive are typically compressed using the DEFLATE algorithm. Thus before reading data from the archive, we need to specify which subfile we want to read.

Note that to read an encrypted ZIP file in Java, you'll need Arcmexer or some other third party library.

But sticking with Java's built-in support for now, let's consider the case where we want to read from a single file within the ZIP archive [1]. The basic pattern for doing so is as follows (for clarity, we'll ignore exception handling code):

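import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

ZipFile zf = new ZipFile(file);
try {
  // getInputStream() takes a ZipEntry, so look the entry up by name first
  ZipEntry entry = zf.getEntry("file.txt");  // null if there is no such entry
  InputStream in = zf.getInputStream(entry);
  // ... read from 'in' as normal
} finally {
  zf.close();
}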

We can successively call getInputStream() on any number of subfiles in the archive and read the corresponding data. If you really desire, you can also hold open and call read methods concurrently on different InputStreams from the same ZipFile, but the actual reads are synchronized on the ZipFile object, so there'll only be one actual read per zip file in progress at any one time. Given the nature of what ZipFile does, that kind of makes sense.
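For reference, here is a minimal sketch of the same pattern with the exception handling filled in. The class name ZipEntryReader, the method readEntry() and the 8K buffer size are illustrative choices, not part of the Java API; the sketch also assumes Java 7 or later for the try-with-resources syntax.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper class, for illustration only
public class ZipEntryReader {

    /** Reads the whole of one named entry from a ZIP archive into a byte array. */
    public static byte[] readEntry(File archive, String entryName) throws IOException {
        // try-with-resources closes the ZipFile however we leave the block
        try (ZipFile zf = new ZipFile(archive)) {
            ZipEntry entry = zf.getEntry(entryName);
            if (entry == null) {
                throw new FileNotFoundException(entryName + " not found in " + archive);
            }
            try (InputStream in = zf.getInputStream(entry)) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192]; // arbitrary buffer size
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            }
        }
    }
}
```

A call such as ZipEntryReader.readEntry(new File("archive.zip"), "file.txt") would then return the decompressed bytes of that subfile, or throw a FileNotFoundException if the archive has no entry of that name.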

Buffering and stream closing

In general, the flavour of InputStream returned by ZipFile.getInputStream() can be treated as any old InputStream. A couple of subtleties are:

  • there is no need to call close() on the individual InputStreams for subfiles, though you should close the ZipFile as above;
  • the single-byte read() method has particularly poor performance; if you need single-byte reads on an InputStream from a zip file, wrap it in a BufferedInputStream.

Of course, you should generally avoid unbuffered single-byte reads and writes; I make the point simply because you might have expected the single-byte read to be reading straight from a buffer, given the decompression process [2].
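As a rough illustration of the buffering advice, the following sketch wraps the entry's stream in a BufferedInputStream before doing single-byte reads. The class BufferedZipRead and the method countNewlines() are hypothetical names chosen for this example; counting newline bytes is just a stand-in for whatever per-byte processing you need.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper class, for illustration only
public class BufferedZipRead {

    /** Counts '\n' bytes in one entry using single-byte reads on a buffered wrapper. */
    public static int countNewlines(File archive, String entryName) throws IOException {
        try (ZipFile zf = new ZipFile(archive)) {
            ZipEntry entry = zf.getEntry(entryName);
            if (entry == null) {
                throw new FileNotFoundException(entryName + " not found in " + archive);
            }
            // The wrapper means each read() is served from an in-memory buffer
            // rather than going to the zip stream's own single-byte read()
            try (InputStream in = new BufferedInputStream(zf.getInputStream(entry))) {
                int count = 0;
                int b;
                while ((b = in.read()) != -1) {
                    if (b == '\n') {
                        count++;
                    }
                }
                return count;
            }
        }
    }
}
```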

Enumerating entries and metadata

The example above assumes that you want to read from a known file in the zip archive. But what happens if you want to read from 'all' of the files, or files matching a certain filter etc? On the next page, we look at how to enumerate zip entries and read their metadata via the ZipFile class.
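As a brief preview, a filter over entry names might be sketched as follows, using ZipFile.entries() to walk the archive. The class ZipEntryLister and the method namesEndingWith() are hypothetical names for this illustration, and filtering on a filename suffix is just one possible criterion.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper class, for illustration only
public class ZipEntryLister {

    /** Returns the names of all non-directory entries whose name ends with the given suffix. */
    public static List<String> namesEndingWith(File archive, String suffix) throws IOException {
        List<String> names = new ArrayList<String>();
        try (ZipFile zf = new ZipFile(archive)) {
            Enumeration<? extends ZipEntry> entries = zf.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (!entry.isDirectory() && entry.getName().endsWith(suffix)) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }
}
```

Each name returned can then be passed to getEntry() and getInputStream() as in the basic pattern above.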

Problems with ZIP files

There are at least a couple of problems with zip files that you should be aware of. We'll look at how these relate to Java specifically.


1. A slight anomaly is that, with ZipFile, the zip data must physically be in a file; we can't open a ZipFile on an arbitrary input stream. (The separate ZipInputStream class can read an archive sequentially from a stream.)
2. In the current implementation, what actually happens is much worse: first, a single-element byte array is constructed on each call to read(), and then this is passed to an individual native method call each time. Performance-wise, that's not good.


Written by Neil Coffey. Copyright © Javamex UK 2012. All rights reserved.