Memory usage of Java Strings and string-related objects
Strings are everywhere. It's difficult to write an application that doesn't use them in
some way, and many will make extensive use of them. Indeed, in many typical applications
such as servers caching entities from a database or data to display in a web page,
the majority of objects stored in the medium-to-long term on the heap may well be Strings.
So if memory usage is crucial to your application, so is understanding the memory
usage of Strings and related objects.
If you're used to using a language such as C and are not used to dealing with Unicode, you
may expect a string to essentially take up one byte per character plus a single byte terminator.
But in a language such as Java, gone are those days:
- as mentioned in our discussion of general memory usage
of Java objects, every object has at least 8 bytes of housekeeping data,
and arrays 12 bytes, and will be padded to a multiple of 8 bytes (in 32-bit versions of Hotspot);
- a Java String actually consists of more than one object;
- a Java char takes up two bytes, even if you're using
chars to store boring old ASCII values that would fit into a single byte;
- a Java String contains some extra variables that you might not have considered.
How to calculate String memory usage
For reasons we'll explore below, the minimum memory usage of a
Java String (in the Hotspot Java 6 VM) is generally as follows:
Minimum String memory usage (bytes) = 8 * (int) ((((no chars) * 2) + 43) / 8)
Or, put another way:
- multiply the number of characters of the String by two;
- add 36 (24 bytes for the String object plus 12 bytes of char array header);
- if the result is not a multiple of 8, round up to the next multiple of 8;
- the result is generally the minimum number of bytes taken up on the heap by the String.
In general, the formula above will give the memory usage for a "newly created" string.
However, there are some subtle cases where:
- a String could end up using more than this minimum, if it
was created as a substring from another string;
- substrings can share the underlying character array,
so overall a parent string plus several substrings will use less than the above calculation would predict.
Understanding String memory usage
To understand the above calculation, we need to start by looking at the fields on a
String object. A String contains the following:
- a char array— thus a separate object— containing the actual characters;
- an integer offset into the array at which the string starts;
- the length of the string;
- another int for the cached calculation of the hash code.
This means that even if the string contains no characters, it will require 4 bytes for the char array reference, plus 3*4=12 bytes for the three int fields, plus 8 bytes of object header. This gives 24 bytes (which is a multiple of 8, so no "padding" bytes are needed so far). Then the (empty) char array will require a further 12 bytes (arrays have an extra 4 bytes to store their length), plus in this case 4 bytes of padding to bring the memory used by the char array object up to a multiple of 8 (16 bytes). So in total, an empty string uses 24+16 = 40 bytes.
If the string contains, say, 17 characters, then the String object itself still requires 24 bytes. But now the char array requires 12 bytes of header plus 17*2=34 bytes for the seventeen chars. Since 12+34=46 isn't a multiple of 8, we also need to round up to the next multiple of 8 (48). So overall, our 17-character String will use up 48+24 = 72 bytes. As you can see, that's quite a long way off the 18 bytes that you might have expected if you were used to C programming in the "good old days" (see note 1).
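The breakdown above can be sketched as a small calculation. This is only an illustration of the arithmetic for the 32-bit Hotspot Java 6 layout described here; the class name StringMemoryEstimate and its method names are made up for this example and are not part of any library.

```java
// Sketch only: models the 32-bit Hotspot Java 6 String layout described in the text.
final class StringMemoryEstimate {
    static final int OBJECT_HEADER = 8;          // housekeeping bytes for any object
    static final int ARRAY_HEADER = 12;          // object header plus 4-byte array length
    static final int STRING_FIELDS = 4 + 3 * 4;  // char[] reference + offset, length, hash

    /** Rounds n up to the next multiple of 8. */
    static int roundUpTo8(int n) {
        return (n + 7) / 8 * 8;
    }

    /** Estimated heap bytes for a newly created String with the given number of chars. */
    static int minBytes(int numChars) {
        int stringObject = roundUpTo8(OBJECT_HEADER + STRING_FIELDS); // 24 bytes
        int charArray = roundUpTo8(ARRAY_HEADER + 2 * numChars);      // 2 bytes per char
        return stringObject + charArray;
    }

    public static void main(String[] args) {
        System.out.println(minBytes(0));   // 40, the empty string
        System.out.println(minBytes(17));  // 72, the 17-character example
    }
}
```

Other JVM versions and 64-bit VMs lay objects out differently, so the constants would need adjusting there.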
Memory usage of substrings
At first glance, you may be wondering why a String object holds an offset into the array and
a length: why isn't the string's content just the whole of the char array? The answer is
that when you create a substring of an existing String, the newly
created substring is a new String object that points back to the same char
array as the parent (but with a different offset and length). Depending on your usage, this is either
a good or a bad thing:
- if you hold on to the parent string after creating the substring, then
you will save memory overall;
- if you throw away the parent string after creating the substring, then
you will waste memory (if the substring is shorter than the parent).
For example, in the following code:
String str = "Some longish string...";
str = str.substring(5, 9);   // "long"
you might have expected the underlying char array of str to end up containing
four characters. In fact, it will continue to contain the full sequence Some longish string...,
but with the internal offset and length set accordingly. If this memory wastage is a problem (because we
are hanging on to lots of strings created in the above manner), then we can
create a new string:
String str = "Some longish string...";
str = new String(str.substring(5, 9));   // copies just the four chars
Creating a "brand new" string like this will force the String to take up the "minimum" amount
of memory as outlined above by making the underlying char array "just big enough" for the characters
of the substring.
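To put rough numbers on the saving, here is a minimal sketch that applies the same arithmetic as before to the example string, taking the four characters "long" at indices 5 to 9. It assumes the 32-bit Hotspot Java 6 layout described in this article; the class SubstringCopyDemo and its methods are hypothetical names used only for illustration, and the figures are estimates rather than measurements.

```java
// Sketch only: estimates based on the Java 6 layout discussed in the text.
final class SubstringCopyDemo {
    /** Rounds n up to the next multiple of 8. */
    static int roundUpTo8(int n) {
        return (n + 7) / 8 * 8;
    }

    /** Bytes kept alive by a lone substring that still shares the parent's char array. */
    static int sharedBytes(int parentChars) {
        return 24 + roundUpTo8(12 + 2 * parentChars);
    }

    /** Bytes used by a copy whose char array is just big enough for the substring. */
    static int compactBytes(int subChars) {
        return 24 + roundUpTo8(12 + 2 * subChars);
    }

    public static void main(String[] args) {
        String parent = "Some longish string...";
        String sub = new String(parent.substring(5, 9));   // defensive copy
        System.out.println(sub);                           // long
        System.out.println(sharedBytes(parent.length()));  // 80
        System.out.println(compactBytes(sub.length()));    // 48
    }
}
```

The difference is small for one short string, but it grows with the length of the discarded parent and with the number of substrings retained.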
Next: memory usage of other string-related objects
In the next part of this discussion, we look at the memory usage
of string buffer objects (StringBuffer, StringBuilder) and consider the general
case of a CharSequence whose memory usage we can reduce.
1. We're actually being slightly unfair here. In C, a 17-character string plus terminator may well
require just 18 bytes on the stack. But if you were to allocate 18 bytes to
store the string via malloc(), then that allocation from the malloc heap would generally
require some extra bytes of "housekeeping", just as for a Java object. (It would typically require on the
order of 8 or so bytes, however, which is still a big difference!)