Developers are well aware that a lot goes on behind the scenes when a Java program runs, and the famous Garbage Collector (GC) is one of the best-known examples. Although its existence is well known, the way it works is not so clear to most (human) developers.
Java Garbage Collection is the process responsible for automatic memory management. This process is needed because Java programs allocate objects on the heap as they run. Eventually, some of these objects are no longer needed, so they must be deleted in order to free up memory. Knowing how this process works can help you write better code, since application performance is closely related to memory allocation and deallocation.
In languages such as C and C++, developers manage memory manually (explicit management), using the free function and the delete operator, respectively. This approach has some well-known problems. The first is dangling references: a block of memory is deallocated while the object that lived in it is still being referenced. The second is space leaks: an object is no longer referenced, but the memory that was allocated for it is never released. These are the most common problems.
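To contrast this with Java, here is a minimal sketch of how an object becomes eligible for collection simply by losing its last strong reference. The class name ReachabilityDemo and the helper isAlive are made up for illustration, and System.gc() is only a hint, so the second line of output can differ between runs and JVMs.

```java
import java.lang.ref.WeakReference;

public class ReachabilityDemo {
    /** Returns true while the referent of the weak reference has not been collected. */
    static boolean isAlive(WeakReference<?> ref) {
        return ref.get() != null;
    }

    public static void main(String[] args) {
        Object data = new byte[1024 * 1024];                 // strongly referenced
        WeakReference<Object> probe = new WeakReference<>(data);

        System.out.println("While referenced: alive = " + isAlive(probe));

        data = null;   // no free() or delete: dropping the last strong reference is enough
        System.gc();   // a request only; the JVM is free to ignore it

        System.out.println("After dropping the reference: alive = " + isAlive(probe));
    }
}
```

A weak reference is used here only as a probe: it lets the program observe the object without keeping it alive.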
It is important to point out that each Java Virtual Machine (JVM) can have its own GC implementation as long as it respects the JVM specifications. The most commonly used JVM is HotSpot, by Oracle, and this is what we'll use today to explain GC.
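Since the collector depends on the JVM and on how it was started, it can be useful to ask the running JVM which collectors it is using. The sketch below relies on the standard java.lang.management API; the class name ActiveCollectors is hypothetical, and the names printed vary by JVM vendor, version, and startup options.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class ActiveCollectors {
    /** Names of the garbage collectors reported by the running JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("JVM: " + System.getProperty("java.vm.name"));
        System.out.println("Collectors: " + collectorNames());
    }
}
```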
All HotSpot GCs follow the same basic rules:
One important point to discuss is memory fragmentation. This occurs when objects are deleted and the freed memory ends up as small pieces scattered across many areas of the heap. The result can be that there is not enough contiguous space to allocate a new object, even though the total amount of free memory would be sufficient. One way to avoid this is compaction, where memory is compacted after objects are deleted. This moves the remaining objects into a contiguous block at the beginning of the heap, so that new objects can be allocated sequentially after this initial block.
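The effect of compaction can be sketched with a toy heap. This is an illustration only, not how HotSpot is implemented: the heap is modeled as an int array, 0 marks a free cell, and any other value is the id of a live object. The names CompactionSketch, compact, and largestFreeBlock are invented for the example.

```java
import java.util.Arrays;

public class CompactionSketch {
    static final int FREE = 0;

    /** Slides live cells to the front, preserving their order; returns the index of the first free cell. */
    static int compact(int[] heap) {
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != FREE) {
                heap[next++] = heap[i];
            }
        }
        Arrays.fill(heap, next, heap.length, FREE);
        return next;
    }

    /** Length of the longest run of contiguous free cells. */
    static int largestFreeBlock(int[] heap) {
        int best = 0;
        int run = 0;
        for (int cell : heap) {
            run = (cell == FREE) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        int[] heap = {1, 0, 2, 0, 0, 3, 0, 4};   // four free cells, but scattered
        System.out.println("Largest free block before: " + largestFreeBlock(heap));
        int firstFree = compact(heap);
        System.out.println("Heap after compaction: " + Arrays.toString(heap));
        System.out.println("Largest free block after: " + largestFreeBlock(heap)
                + ", allocation resumes at index " + firstFree);
    }
}
```

In this toy heap, an object needing three contiguous cells cannot be placed before compaction, although four cells are free in total; after compaction it fits.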
Marking every object on each collection becomes increasingly inefficient as the number of allocated objects grows. Since most of these objects are short-lived, the JVM uses the concept of generations: the heap is broken up into smaller parts, or generations, so that short-lived objects can be collected separately and more often than long-lived ones.
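The JVM Generations concept is an "age strategy" where objects are classified by age. Classifications include: young generation (divided into eden and survivor spaces, as shown below), old generation, and permanent generation. Note that the permanent generation was removed in JDK 8, where class metadata moved to Metaspace in native memory.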
Figure 1: Garbage Collection process.
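The generations described above are visible from inside a running program as memory pools. The following sketch uses the standard MemoryPoolMXBean API to list the heap pools; the class name HeapPools is hypothetical. Pool names are chosen by each collector, so a generational collector typically reports separate eden, survivor, and old pools, while other collectors may report a single heap pool.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

public class HeapPools {
    /** Names of the memory pools that belong to the Java heap. */
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : heapPoolNames()) {
            System.out.println("Heap pool: " + name);
        }
    }
}
```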
Serial
A serial collector takes care of garbage collection using a single thread. This keeps it relatively efficient, because there is no communication overhead between threads. It can’t, however, take advantage of multiprocessor hardware, although it can still be useful on such hardware for applications with small data sets.
Parallel
Parallel collectors use multiple threads in order to take care of garbage collection. This is the main difference between parallel and serial collectors. Parallel collectors are ideal for medium-to-large datasets on multiprocessor hardware.
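One important note is that parallel compaction is the feature that enables the parallel collector to perform major collections in parallel. Without it, major collections are performed by a single thread. In HotSpot, parallel compaction is enabled by default when the parallel collector is selected.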
CMS (Concurrent Mark Sweep)
CMS is for applications that can afford to share processor resources with the GC and that prefer shorter GC pauses (the time during which the application is stopped while the GC recovers space that is no longer in use). It collects the young generation with the same algorithm as the parallel collector. Most of the remaining collection work is done by GC threads that run concurrently with the application threads.
As a quick heads up, the CMS collector was deprecated in JDK 9 and removed in JDK 14.
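Whichever collector is in use, its activity can be observed roughly from inside the application. The sketch below sums the collection counts and times reported by the standard GarbageCollectorMXBean API around a loop that creates short-lived objects. The class name GcActivity and the loop size are arbitrary choices, and the reported time is an approximate accumulated figure that, for concurrent collectors, is not necessarily equal to the time the application was paused.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcActivity {
    /** Total number of collections reported so far by all collectors. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();   // -1 when the collector does not report it
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Approximate accumulated collection time, in milliseconds, over all collectors. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();   // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long countBefore = totalCollections();
        long millisBefore = totalCollectionMillis();

        long sink = 0;
        for (int i = 0; i < 200_000; i++) {
            byte[] garbage = new byte[1024];        // becomes unreachable after each iteration
            sink += garbage.length;
        }

        System.out.println("Bytes allocated: " + sink);
        System.out.println("Collections during the loop: " + (totalCollections() - countBefore));
        System.out.println("Approximate GC time (ms): " + (totalCollectionMillis() - millisBefore));
    }
}
```

Running the same program with different collectors is a simple way to get a first impression of how they behave, although real measurements call for GC logs and a representative workload.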
G1 (Garbage First)
G1 is intended for multiprocessors with large amounts of memory. It is parallel and concurrent and can achieve high throughput.
G1 has gained popularity and was designed as the long-term replacement for CMS; it has been the default collector in HotSpot since JDK 9. One of the differences that makes G1 a better solution than CMS is that it's a compacting GC: "G1 compacts sufficiently to completely avoid the use of fine-grained free lists for allocation, and instead relies on regions" (Oracle). This feature eliminates some possible problems with fragmentation.
With G1, the heap is partitioned into equally sized regions, and the same roles (such as eden, survivor, and old generations) are assigned to these regions. The generations themselves, however, have no fixed size or fixed location, which translates into flexibility, as shown below:
Figure 2. G1 Garbage Collection allocation. Image courtesy of Oracle.
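The region idea can be illustrated with a toy model. This is only a sketch of the concept, not G1's actual data structures or policies: the heap is an array of equally sized regions, each region carries a role, and a freed region can later be reused under a different role. The names RegionHeapSketch, Role, claim, release, and count are invented for the example.

```java
import java.util.Arrays;

public class RegionHeapSketch {
    enum Role { FREE, EDEN, SURVIVOR, OLD }

    final Role[] regions;

    RegionHeapSketch(int regionCount) {
        regions = new Role[regionCount];
        Arrays.fill(regions, Role.FREE);
    }

    /** Gives the first free region the requested role; returns its index, or -1 if no region is free. */
    int claim(Role role) {
        for (int i = 0; i < regions.length; i++) {
            if (regions[i] == Role.FREE) {
                regions[i] = role;
                return i;
            }
        }
        return -1;
    }

    /** Returns a region to the free set, for example after its live objects were evacuated. */
    void release(int index) {
        regions[index] = Role.FREE;
    }

    /** Number of regions that currently have the given role. */
    int count(Role role) {
        int n = 0;
        for (Role r : regions) {
            if (r == role) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        RegionHeapSketch heap = new RegionHeapSketch(6);
        int eden = heap.claim(Role.EDEN);
        heap.claim(Role.OLD);
        heap.claim(Role.SURVIVOR);
        System.out.println("Before: " + Arrays.toString(heap.regions));

        heap.release(eden);        // the former eden region becomes free...
        heap.claim(Role.OLD);      // ...and is reused as an old region
        System.out.println("After:  " + Arrays.toString(heap.regions));
    }
}
```

Because roles are attached to regions rather than to fixed address ranges, the number of regions per generation can grow or shrink as the workload changes.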
The Z Garbage Collector
This is a scalable, low-latency GC that does nearly all of its work concurrently. It stops the application threads only for very short pauses and performs the hard and expensive work while the application keeps running. This is possible because of some new techniques like colored pointers (metadata bits stored in 64-bit references), load barriers, relocation, and remapping. It was introduced as an experimental feature in JDK 11 and is intended for applications with large heaps that require low latency.
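For reference, each of the collectors above is selected with a HotSpot command-line option. The small sketch below prints them as launch commands; the class name CollectorFlags and the file app.jar are placeholders. Availability depends on the JDK version: the CMS option only works on JDKs that still ship CMS, and on JDK 11 to 14 the ZGC option additionally requires -XX:+UnlockExperimentalVMOptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CollectorFlags {
    /** HotSpot options that select each collector discussed in this article. */
    static Map<String, String> flags() {
        Map<String, String> flags = new LinkedHashMap<>();
        flags.put("Serial", "-XX:+UseSerialGC");
        flags.put("Parallel", "-XX:+UseParallelGC");
        flags.put("CMS", "-XX:+UseConcMarkSweepGC");
        flags.put("G1", "-XX:+UseG1GC");
        flags.put("ZGC", "-XX:+UseZGC");
        return flags;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> entry : flags().entrySet()) {
            // app.jar is a placeholder for your own application
            System.out.println(entry.getKey() + ": java " + entry.getValue() + " -jar app.jar");
        }
    }
}
```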
Today we covered the basics of Java Garbage Collection and took a look at the various collectors Java makes available. Understanding how each collector works will help you choose the best solution for your applications. Next week, we'll take a look at how to select the best garbage collector based on the scope and parameters of your project, as well as best practices for using these tools. Stay tuned!
For references and further reading about Java Garbage Collection, please see the following articles:
MANDIC. Java Garbage Collection: melhores práticas, tutoriais e muito mais. Accessed: Dec 4, 2019.
THE URBAN PENGUIN. JAVA Object Lifecycle, de-referencing and garbage collection. Accessed: Dec 4, 2019.
ORACLEa. Java Garbage Collection Basics. Accessed: Dec 4, 2019.
GEEKS FOR GEEKS. Garbage Collection in Java. Accessed: Dec 4, 2019.
ORACLEb. HotSpot Virtual Machine Garbage Collection Tuning Guide. Accessed: Dec 4, 2019.
ORACLEc. Getting Started with the G1 Garbage Collector. Accessed: Dec 18, 2019.
DEVMEDIA. Introdução ao Java Garbage Collection. Accessed: Dec 23, 2019.