
Introduction to networking in Java

In this section, we will look at how to perform common networking operations in Java. Many aspects of networked I/O in Java are very similar to Java I/O in general.

Example: how to download data from a URL

One of the most common networked operations that people want to perform in Java is to download data from a particular URL. This is generally a straightforward task. In its simplest form, the general procedure is as follows:

  • construct a URL object representing the URL that data is to be retrieved from;
  • call openConnection() on this URL to retrieve a URLConnection object;
  • on this connection object, call getInputStream() to get an InputStream object;
  • use the InputStream as normal to read data, bearing in mind issues that you would with other input streams, such as the need for buffering, or character encoding issues if we're translating the bytes into characters.

Constructing a URL object

We can construct a URL object simply by passing it the string representation of the URL, as would appear in a browser address bar:

try {
  URL url = new URL("http://www.mydomain.com/myfile.gif");
  // do something with the URL...
} catch (IOException ioex) {
  ...
}

Notice that we catch IOException. Constructing a URL can throw MalformedURLException, which is a subclass of IOException. Since we're likely to go on and connect to the URL (an operation that could also throw IOExceptions), it's often simpler to just catch any type of IOException around the whole operation.
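To illustrate the point about exception types, here is a minimal sketch in which a single catch block for IOException also handles a MalformedURLException. The class name UrlParseSketch and the method describe() are hypothetical names chosen for this example. It uses the same URL(String) constructor as above; note that recent JDKs (20 and later) mark that constructor as deprecated, although it still works.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;

class UrlParseSketch {

  // Tries to parse the string as a URL and reports what happened.
  // Constructing a URL does not contact the server; it only parses the string.
  static String describe(String spec) {
    try {
      URL url = new URL(spec);
      return "ok: " + url.getProtocol() + "://" + url.getHost();
    } catch (IOException e) {
      // MalformedURLException is a subclass of IOException, so it lands here too
      return (e instanceof MalformedURLException) ? "malformed" : "other I/O error";
    }
  }

  public static void main(String[] args) {
    System.out.println(describe("http://www.mydomain.com/myfile.gif"));
    System.out.println(describe("no-protocol-here"));
  }
}
```

The second call fails because the string has no protocol part, which is one of the cases that the URL constructor rejects.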

Reading binary data from a URL

If the URL points to binary data, such as an image, then we essentially want to follow the above pattern, but pull out "raw bytes" from the input stream. If we want to get the bytes into a byte array, then we can use the help of ByteArrayOutputStream. This class lets us feed it successive bytes, then at the end call toByteArray(). So the code could look as follows:

  public static byte[] getBinaryURLContent(URL url) throws IOException {
    URLConnection conn = url.openConnection();
    InputStream in = new BufferedInputStream(conn.getInputStream());
    try {
      ByteArrayOutputStream bout = new ByteArrayOutputStream(10000);
      int b;
      while ((b = in.read()) != -1) {
        bout.write(b);
      }
      return bout.toByteArray();
    } finally {
      in.close();
    }
  }

Notice that:

  • once we've called openConnection() and then getInputStream(), we effectively proceed as though reading from any boring old input stream— at this point, there's nothing very special about the fact that the stream is coming via a URL connection;
  • this means that, like any InputStream, we should buffer the input (via BufferedInputStream in this case);
  • with the buffering in place, we just read one byte at a time from the (buffered) stream; there may be slightly more efficient ways to read the data (e.g. by reading an array of bytes each time, we avoid the potential overhead of a method call per byte), but this simple method is good enough for most purposes;
  • as always, we need to close the stream in a finally clause;
  • in this simple example, we make a rough guess as to the amount of data we're expecting (if the server provides it, we can actually query the URLConnection for the content length with getContentLength(), which returns -1 if the length is not known).
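The last two points can be combined in a variant of the method above. The following is a sketch under stated assumptions rather than a definitive implementation: the class name ChunkedUrlReadSketch, the method readInChunks() and the 8192-byte chunk size are all choices made for this example. It sizes the output buffer from getContentLength() when the length is known, and reads an array of bytes at a time instead of a single byte. To run without network access, main() reads a temporary file through a file: URL; an http: URL goes through the same calls.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;

class ChunkedUrlReadSketch {

  static byte[] readInChunks(URL url) throws IOException {
    URLConnection conn = url.openConnection();
    // getContentLength() returns -1 when the length is not known,
    // in which case we fall back to a rough guess
    int expected = conn.getContentLength();
    int initialSize = (expected > 0) ? expected : 10000;
    InputStream in = conn.getInputStream();
    try {
      ByteArrayOutputStream bout = new ByteArrayOutputStream(initialSize);
      byte[] chunk = new byte[8192];
      int n;
      // read(byte[]) returns the number of bytes actually read, or -1 at end of stream
      while ((n = in.read(chunk)) != -1) {
        bout.write(chunk, 0, n);
      }
      return bout.toByteArray();
    } finally {
      in.close();
    }
  }

  public static void main(String[] args) throws IOException {
    // Create some test data in a temporary file
    File tmp = File.createTempFile("chunked", ".bin");
    tmp.deleteOnExit();
    byte[] original = new byte[20000];
    for (int i = 0; i < original.length; i++) {
      original[i] = (byte) i;
    }
    Files.write(tmp.toPath(), original);

    byte[] copy = readInChunks(tmp.toURI().toURL());
    System.out.println(copy.length + " bytes read");
  }
}
```

Because we read into our own array here, the BufferedInputStream wrapper is left out: the array already plays the role of the buffer.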

Closing the URLConnection?

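URLConnection itself has no close() method: the resources associated with a request are released by closing the stream obtained from the connection. So it's sufficient in this case to just close the InputStream. (With an HttpURLConnection, the underlying socket may be kept open for reuse by later requests to the same server; calling disconnect() indicates that further requests are unlikely in the near future.)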
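Since closing the stream is what matters, on Java 7 and later the try/finally pattern can also be written with try-with-resources, which closes the stream automatically. The following is a minimal sketch: the class name CloseStreamSketch and the method countBytes() are hypothetical, and main() again uses a temporary file behind a file: URL so that the example runs without a network connection.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class CloseStreamSketch {

  // Counts the bytes available at the URL.
  static long countBytes(URL url) throws IOException {
    // The stream is closed when this block exits, whether normally or via an exception
    try (InputStream in = new BufferedInputStream(url.openConnection().getInputStream())) {
      long count = 0;
      while (in.read() != -1) {
        count++;
      }
      return count;
    }
  }

  public static void main(String[] args) throws IOException {
    File tmp = File.createTempFile("close-sketch", ".txt");
    tmp.deleteOnExit();
    Files.write(tmp.toPath(), "hello".getBytes(StandardCharsets.US_ASCII));
    System.out.println(countBytes(tmp.toURI().toURL()) + " bytes");
  }
}
```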

Reading the contents of a URL as a string (or CharSequence)

Downloading the content of a URL into a string is a common requirement, and is not much different from the binary data case just examined. Essentially, we need to read characters from the URL stream and append them to a string (or in fact, a string buffer of some kind). As of Java 5, we can use a StringBuilder, which is a non-synchronized equivalent of StringBuffer.

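Apart from the destination of the characters, a key issue is character encoding: that is, the scheme by which bytes are "mapped" to characters. If we're really lucky, the server will tell us which encoding it uses in the charset parameter of the Content-Type header (for example, text/html; charset=UTF-8), which we can read with getContentType(). Note that getContentEncoding() is not the method we want here: despite its name, it returns the Content-Encoding header, which describes things such as gzip compression rather than the character set. We must also be prepared for the possibility that getContentType() returns null, or that the header carries no charset parameter, in which case we need to make an assumption of some kind. For simplicity, we'll just assume a default encoding of ISO-8859-1 (another common encoding scheme being UTF-8):

public static CharSequence getURLContent(URL url) throws IOException {
  URLConnection conn = url.openConnection();
  // Default used when the server does not declare a charset
  String encoding = "ISO-8859-1";
  // The charset, if declared, is a parameter of the Content-Type
  // header, e.g. "text/html; charset=UTF-8"
  String contentType = conn.getContentType();
  if (contentType != null) {
    for (String param : contentType.split(";")) {
      param = param.trim();
      if (param.regionMatches(true, 0, "charset=", 0, 8)) {
        // Strip the "charset=" prefix and any surrounding quotes
        encoding = param.substring(8).replace("\"", "").trim();
      }
    }
  }
  BufferedReader br = new BufferedReader(new
      InputStreamReader(conn.getInputStream(), encoding));
  StringBuilder sb = new StringBuilder(16384);
  try {
    String line;
    while ((line = br.readLine()) != null) {
      sb.append(line);
      sb.append('\n');
    }
  } finally {
    br.close();
  }
  return sb;
}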

Note some other points:

  • we posed the original problem as needing to "download to a string", but in fact, there's really no need to return a String: usually, the thing that the caller of this method will need is just "some character sequence or other", so we may as well declare the method as returning a CharSequence implementation (and in fact, just return the StringBuilder that we were writing to); if the caller really wants a String, they can simply call toString() on the CharSequence returned;
  • I also chose to use the readLine() method to read line by line, adding a specific line break character (in this case \n) after each line: this has the effect of normalising line breaks (depending on the server system and/or file in question, line breaks could be marked in different ways, but BufferedReader deals with the different possibilities).

In practice, I've also found readLine() to give slightly better performance: possibly because the JVM can compile this whole method and avoid a method call per character.
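The line break normalisation described above can be seen in isolation with a small sketch. The class name LineBreakSketch and the method readNormalised() are hypothetical names for this example, and a StringReader stands in for the stream coming from the URL.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class LineBreakSketch {

  // Reads all lines from the source, ending each one with a single '\n'.
  static CharSequence readNormalised(Reader source) throws IOException {
    BufferedReader br = new BufferedReader(source);
    StringBuilder sb = new StringBuilder();
    try {
      String line;
      // readLine() treats "\n", "\r" and "\r\n" as line terminators
      // and returns each line without its terminator
      while ((line = br.readLine()) != null) {
        sb.append(line).append('\n');
      }
    } finally {
      br.close();
    }
    return sb;
  }

  public static void main(String[] args) throws IOException {
    String mixed = "one\r\ntwo\rthree\nfour";
    CharSequence result = readNormalised(new StringReader(mixed));
    // Show the line breaks visibly when printing
    System.out.println(result.toString().replace("\n", "\\n"));
  }
}
```

One side effect worth knowing about: a final \n is appended even when the source text did not end with a line break, as with "four" in this input.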


Written by Neil Coffey. Copyright © Javamex UK 2012. All rights reserved.