Memory usage of Java Strings and string-related objects

Strings are everywhere. It's difficult to write an application that doesn't use them in some way, and many will make extensive use of them. Indeed, in many typical applications such as servers caching entities from a database or data to display in a web page, the majority of objects stored in the medium-to-long term on the heap may well be Strings. So if memory usage is crucial to your application, so is understanding the memory usage of Strings and related objects.

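If you're used to a language such as C and are not used to dealing with Unicode, you may expect a string to take up essentially one byte per character plus a single-byte terminator. But in a language such as Java, those days are gone: each char occupies two bytes, and a String is made up of two objects (the String itself and its char array), each with its own header and padding.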

How to calculate String memory usage

For reasons we'll explore below, the minimum memory usage of a Java String (in the HotSpot Java 6 VM) is generally as follows:

Minimum String memory usage (bytes) = 8 * (int) ((((no chars) * 2) + 43) / 8)

Or, put another way: multiply the number of characters by 2, add 36, and round the result up to the next multiple of 8.
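As a rough sketch, the estimate can also be computed by sizing the two objects involved (the String itself and its char array) and rounding each up to a multiple of 8 bytes. The class and method names here are hypothetical, and the constants 24 and 12 are the figures this article quotes further down for the Java 6 HotSpot VM; other VMs and settings may differ.

```java
// Sketch only: estimates the minimum footprint of a String using the object
// sizes quoted in this article for the Java 6 HotSpot VM (an assumption
// for any other VM). Names are hypothetical.
class StringMemoryEstimate {

    // Rounds a byte count up to the next multiple of 8.
    static int roundUpTo8(int bytes) {
        return (bytes + 7) / 8 * 8;
    }

    // String object (24 bytes) plus char array (12-byte header, 2 bytes per char).
    static int minStringBytes(int numChars) {
        int stringObject = roundUpTo8(24);
        int charArray = roundUpTo8(12 + 2 * numChars);
        return stringObject + charArray;
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 1, 17, 100}) {
            System.out.println(n + " chars -> " + minStringBytes(n) + " bytes");
        }
    }
}
```

For an empty string this gives 40 bytes, and for a 17-character string it gives 72 bytes.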

Complications

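In general, the formula above will give the memory usage for a "newly created" string. However, there are some subtle cases where the real figure differs. In particular, a string created as a substring of another string shares its parent's char array, as discussed below, so its footprint depends on the size of that array rather than on its own length.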

Understanding String memory usage

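To understand the above calculation, we need to start by looking at the fields on a String object. A String contains the following: a reference to a char array holding the characters, an int offset into that array, an int length, and a third int that caches the string's hash code.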

This means that even if the string contains no characters, it will require 4 bytes for the char array reference, plus 3*4=12 bytes for the three int fields, plus 8 bytes of object header. This gives 24 bytes (which is a multiple of 8, so no "padding" bytes are needed so far). Then, the (empty) char array will require a further 12 bytes (arrays have an extra 4 bytes to store their length), plus in this case 4 bytes of padding to bring the memory used by the char array object up to a multiple of 8 (16 bytes). So in total, an empty string uses 24+16 = 40 bytes.

If the string contains, say, 17 characters, then the String object itself still requires 24 bytes. But now the char array requires 12 bytes of header plus 17*2=34 bytes for the seventeen chars. Since 12+34=46 isn't a multiple of 8, we also need to round up to the next multiple of 8 (48). So overall, our 17-character String will use up 48+24 = 72 bytes. As you can see, that's quite a long way off the 18 bytes that you might have expected if you were used to C programming in the "good old days" [1].
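The worked arithmetic above can be itemised in a few lines of Java. This is only a sketch: the class and method names are hypothetical, and the sizes are simply the figures quoted in this section for the Java 6 HotSpot VM, not values measured at run time.

```java
// Sketch only: itemises the footprint of a String using the sizes quoted in
// this article for the Java 6 HotSpot VM (assumptions for any other VM).
class StringFootprintBreakdown {
    static final int STRING_OBJECT = 8 + 4 + 3 * 4; // header + array ref + three ints = 24
    static final int ARRAY_HEADER = 12;             // 8-byte header + 4-byte length
    static final int BYTES_PER_CHAR = 2;            // a Java char is two bytes

    // Bytes needed by the char array before any padding is added.
    static int arrayBytesUnpadded(int numChars) {
        return ARRAY_HEADER + numChars * BYTES_PER_CHAR;
    }

    // Padding bytes needed to reach the next multiple of 8.
    static int padding(int bytes) {
        return (8 - bytes % 8) % 8;
    }

    static int totalBytes(int numChars) {
        int array = arrayBytesUnpadded(numChars);
        return STRING_OBJECT + padding(STRING_OBJECT) + array + padding(array);
    }

    public static void main(String[] args) {
        int n = 17;
        int array = arrayBytesUnpadded(n);
        System.out.println("String object: " + STRING_OBJECT);
        System.out.println("char array:    " + array + " + " + padding(array) + " padding");
        System.out.println("Java total:    " + totalBytes(n));
        System.out.println("C-style total: " + (n + 1));
    }
}
```

Keeping the padding as a separate step makes it easy to see where the "lost" bytes go: 4 bytes for the empty array and 2 bytes for the 17-character one.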

Memory usage of substrings

At first glance, you may be wondering why a String object holds an offset into the array and a length: why isn't the string's content just the whole of the char array? The answer is that when you create a substring of an existing String, the newly created substring is a new String object that points back to the same char array as the parent (but with a different offset and length). Depending on your usage, this is either a good or a bad thing: it makes creating a substring cheap and saves memory if you are keeping the parent string anyway, but it wastes memory if you keep only a short substring, because the parent's entire char array stays on the heap.

For example, in the following code:

String str = "Some longish string...";
str = str.substring(5, 9);

you might have expected the underlying char array of str to end up containing four characters. In fact, it will continue to contain the full sequence Some longish string..., but with the internal offset and length set accordingly. If this memory wastage is a problem (because we are hanging on to lots of strings created in the above manner), then we can create a new string:

String str = "Some longish string...";
str = new String(str.substring(5, 9));

Creating a "brand new" string like this will force the String to take up the "minimum" amount of memory as outlined above by making the underlying char array "just big enough" for the characters of the substring.
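The copying idiom can be wrapped in a small helper, sketched below. The class and method names are hypothetical (this is not a JDK method), and the memory benefit assumes the Java 6 behaviour described in this article, where substring() shares the parent's char array.

```java
// Sketch only: a hypothetical helper, not part of the JDK.
class SubstringCopySketch {

    // Returns the requested substring wrapped in a freshly created String.
    // On a VM where substring() shares the parent's char array (the Java 6
    // behaviour described in this article), the copy is intended to hold
    // only the characters it needs.
    static String compactSubstring(String source, int beginIndex, int endIndex) {
        return new String(source.substring(beginIndex, endIndex));
    }

    public static void main(String[] args) {
        String str = "Some longish string...";
        String word = compactSubstring(str, 5, 9);
        System.out.println(word);          // prints "long"
        System.out.println(word.length()); // prints 4
    }
}
```

Whether the extra copy is worthwhile depends on usage: it only pays off when the short string outlives the long one it was cut from.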

Next: memory usage of other string-related objects

In the next part of this discussion, we look at the memory usage of string buffer objects (StringBuffer, StringBuilder) and consider the general case of a CharSequence whose memory usage we can reduce.


1. We're actually being slightly unfair here. In C, a 17-character string plus terminator may well require just 18 bytes on the stack. But if you were to allocate 18 bytes to store the string via malloc(), then that allocation from the malloc heap would generally require some extra bytes of "housekeeping", just as for a Java object. (It would typically be on the order of 8 or so bytes, however, which is still a big difference!)