What is garbage collection in Java and how does it work?
This article explains the purpose of garbage collection in Java and its evolution, and offers examples of how it works.
The author of this article is EPAM Senior Software Engineer Vaibhavi Deshpande.
The purpose of garbage collection in Java
The purpose of garbage collection in Java is to automatically manage memory by reclaiming memory that is occupied by unused objects. This helps prevent memory leaks and reduces the burden of manual memory management. It also provides a mechanism to identify and deallocate objects that are no longer needed, freeing up memory for new object allocations.
The evolution of garbage collection
Before the introduction of garbage collection in Java, programming languages such as C and C++ required manual memory management. Developers had to explicitly allocate and deallocate memory for objects using functions like malloc() and free(). This manual memory management often led to several issues such as memory leaks, dangling pointers, and segmentation faults. It was a complex and error-prone process, requiring careful tracking of memory allocations and deallocations.
One of Java’s major advancements was the introduction of automatic garbage collection. The Java Virtual Machine (JVM) includes a garbage collector that automatically manages memory on behalf of the developer. The garbage collector identifies and frees memory occupied by objects that are no longer reachable or in use by the program. This automatic memory management simplifies the development process, reduces the likelihood of memory-related bugs, and improves overall program stability.
The evolution of garbage collection in Java has seen advancements in garbage collection algorithms, optimization techniques, and customization options to suit different application requirements. Over the years, the JVM has introduced various garbage collectors with different characteristics and trade-offs, allowing developers to choose the most suitable collector based on their application's needs.
Initially, the JVM used a simple mark-sweep garbage collector, which marked reachable objects and swept away unreferenced objects. This approach suffered from memory fragmentation issues. Subsequently, the JVM introduced the mark-compact algorithm, which included a compaction phase to minimize memory fragmentation.
As Java applications grew more complex, the need for efficient garbage collection mechanisms arose. The JVM introduced generational garbage collection, which divided the heap into young and old generations, based on the object's age. This generational approach leverages the observation that most objects become garbage shortly after they are created. It optimized garbage collection by performing more frequent and faster collections in the young generation, while collecting less frequently in the old generation.
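The generational layout can be observed from inside a running program. The following is a minimal sketch, assuming a HotSpot-style JVM; the class name HeapPools is hypothetical, and the pool names it prints depend on the collector in use (with G1, for instance, they are typically along the lines of "G1 Eden Space", "G1 Survivor Space", and "G1 Old Gen").

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that lists the heap memory pools of the running JVM.
final class HeapPools {

    // Collects the names of all pools that belong to the Java heap.
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Young-generation pools (eden, survivor) and the old generation
        // usually appear as separate entries.
        heapPoolNames().forEach(System.out::println);
    }
}
```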
To cater to different application scenarios, the JVM introduced different garbage collectors. The Serial Collector, Parallel Collector, and Garbage-First (G1) Collector are available in modern JVMs, alongside newer low-latency collectors such as ZGC and Shenandoah. The Concurrent Mark Sweep (CMS) Collector was deprecated in JDK 9 and removed in JDK 14. These collectors offer different trade-offs between throughput, latency, and memory usage, allowing developers to choose the most suitable collector for their application.
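To check which collector a given JVM actually selected, the standard management API can be queried. This is a small sketch; the class name GcInfo is an assumption made for illustration, and the reported names vary by collector and JDK version.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that reports the garbage collectors active in this JVM.
final class GcInfo {

    // Returns the names of the registered garbage collector beans.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // The output differs depending on which collector the JVM chose
        // or which one was requested on the command line.
        collectorNames().forEach(System.out::println);
    }
}
```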
The introduction of garbage collection in Java revolutionized memory management by eliminating the need for manual memory management, reducing memory-related bugs, and improving application stability. The evolution of garbage collection in Java led to the development of more sophisticated algorithms, collectors, and customization options to cater to diverse application requirements.
10 interesting facts about garbage collection in Java
1. HotSpot JVM
The HotSpot JVM, developed by Sun Microsystems (now owned by Oracle), is one of the most widely used JVMs for executing Java applications. It incorporates various garbage collection algorithms and collectors to optimize memory management.
2. Stop-the-world events
During garbage collection, the JVM typically halts the execution of application threads to perform garbage collection operations. These pauses, known as stop-the-world events, temporarily suspend the application's progress. However, modern garbage collectors aim to minimize the duration and frequency of these pauses to avoid significant disruptions.
3. Concurrent garbage collection
The Concurrent Mark Sweep (CMS) Collector (removed in JDK 14) and the Garbage-First (G1) Collector are examples of garbage collectors that perform part of their work, such as marking live objects, concurrently with the execution of application threads. They aim to shorten stop-the-world pauses by running these phases alongside application code, although some phases still require brief pauses.
4. Memory fragmentation
Memory fragmentation can occur in the heap when free memory is divided into small, non-contiguous blocks. This can lead to inefficient memory utilization and increased allocation overhead. Garbage collectors that compact the heap, for example those based on the mark-compact algorithm, reduce fragmentation by moving live objects together so that the free memory forms contiguous blocks.
5. Generational garbage collection
Generational garbage collection is based on the observation that most objects become garbage shortly after they are created. By dividing the heap into young and old generations, the JVM can optimize garbage collection by collecting young objects more frequently and quickly, while collecting older objects less frequently.
6. Tuning and customization
Garbage collectors in the JVM often provide various tuning options and parameters to customize their behavior. Developers can adjust these parameters to optimize garbage collection performance based on the specific characteristics and requirements of their application.
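As an illustration, a few commonly used HotSpot options are shown in the comment below, followed by a sketch that reads back the settings the JVM was started with. The class name GcSettings and the specific values (256 MB, 1 GB, 200 ms) are assumptions chosen for the example, not recommendations.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

// Example launch command (values are illustrative only):
//   java -Xms256m -Xmx1g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xlog:gc GcSettings
// -Xms / -Xmx set the initial and maximum heap size, -XX:+UseG1GC selects G1,
// -XX:MaxGCPauseMillis states a pause-time goal, -Xlog:gc (JDK 9+) logs collections.
final class GcSettings {

    // Returns the JVM arguments that look memory- or GC-related.
    static List<String> gcRelatedArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments().stream()
                .filter(arg -> arg.startsWith("-Xm")
                        || arg.startsWith("-Xlog:gc")
                        || arg.startsWith("-XX:"))
                .collect(Collectors.toList());
    }

    // Maximum heap size the JVM will attempt to use, in bytes.
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        System.out.println("GC-related arguments: " + gcRelatedArguments());
        System.out.println("Max heap (bytes): " + maxHeapBytes());
    }
}
```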
7. Garbage collection algorithms
In addition to the collectors mentioned earlier, the JVM also incorporates different garbage collection algorithms, such as mark-sweep, mark-compact, and copying algorithms. These algorithms define how objects are identified, marked, and reclaimed during garbage collection.
8. Impact on application performance
Garbage collection can impact application performance, especially if the collector is not properly tuned or if the application generates a large number of short-lived objects. Understanding the behavior of the garbage collector and optimizing the application's memory usage can help mitigate these performance issues.
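One way to get a rough feel for this effect is to count collections before and after an allocation-heavy piece of work. The sketch below is only an approximation: the class GcActivity and its churn workload are hypothetical, the JIT compiler may optimize some allocations away, and the numbers differ between JVMs and runs.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper that measures GC activity around a workload.
final class GcActivity {

    // Sums the collection counts reported by all collectors.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the count is undefined
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Allocates many short-lived arrays; the checksum keeps the loop from being trivial.
    static long churn(int iterations) {
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] temp = new byte[1024];
            temp[0] = (byte) i;
            checksum += temp[0];
        }
        return checksum;
    }

    public static void main(String[] args) {
        long before = totalCollections();
        churn(2_000_000);
        long after = totalCollections();
        System.out.println("Collections during workload: " + (after - before));
    }
}
```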
9. Memory leaks
Although garbage collection helps prevent many memory leaks, it does not eliminate the possibility entirely. Memory leaks can still occur if objects are unintentionally kept reachable by strong references or if resources are not properly released. Careful programming practices and memory management are still necessary to avoid such leaks.
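A typical leak of this kind is a long-lived collection that keeps growing. The following sketch uses a hypothetical LeakyRegistry class: every buffer added to the static list stays reachable, so the garbage collector can never reclaim it until the reference is removed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class showing how a static collection keeps objects reachable.
final class LeakyRegistry {

    // A static field is reachable for as long as the class is loaded,
    // and so is everything stored in it.
    private static final List<byte[]> CACHE = new ArrayList<>();

    // Simulates handling a request and "forgetting" to release its buffer.
    static void handleRequest() {
        CACHE.add(new byte[1024]);
    }

    // Number of buffers that are still strongly referenced.
    static int retained() {
        return CACHE.size();
    }

    // Dropping the references makes the buffers eligible for collection.
    static void clear() {
        CACHE.clear();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1_000; i++) {
            handleRequest();
        }
        System.out.println("Buffers still retained: " + retained());
        clear();
        System.out.println("Buffers retained after clear: " + retained());
    }
}
```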
10. Ongoing research and advancements
Garbage collection in Java continues to be an active area of research and development. New algorithms, collectors, and techniques are constantly being explored to further improve garbage collection performance, reduce latency, and minimize the impact on application execution.
Garbage collection in Java is a complex and fascinating aspect of memory management, and understanding its principles and optimizations can be a significant benefit to you in writing efficient and reliable Java applications.
How garbage collection works in Java
Now, let’s take a closer look at how garbage collection works, with some examples. The important topics here are object reachability, mark and sweep, and finalization.
- Object reachability
The garbage collector determines which objects are still in use and which are eligible for garbage collection. It starts from a set of roots, such as static fields, local variables on the call stacks of running threads, and the threads themselves. Any object directly or indirectly referenced from these roots is considered reachable and is not eligible for garbage collection. Objects that are not reachable from the roots are considered unreachable and can be garbage collected. Conceptually, the heap forms a directed graph in which objects are nodes and references are edges, and reachability is determined by traversing this graph from the roots.
Example: Imagine a social media application where users can create posts and comment on them. Each post and comment is represented by an object. Suppose the application keeps the active user's profile in a static field or a local variable, the profile references the user's posts, and each post references its comments. Every object on such a chain is reachable and not eligible for garbage collection. A post that has been removed from every such chain is unreachable and can be collected.
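The scenario can be sketched in code as follows. The classes Post and Profile and the static field activeProfile are hypothetical, and a WeakReference is used only as a way to observe the object without keeping it alive. The first check is reliable because the post is still referenced through the static field; the second depends on whether the JVM honors the System.gc() request.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the social media example.
final class ReachabilityDemo {

    static final class Post {
        final String text;
        Post(String text) { this.text = text; }
    }

    static final class Profile {
        final List<Post> posts = new ArrayList<>();
    }

    // A static field acts as a GC root for the profile and everything it references.
    static Profile activeProfile = new Profile();

    // Adds a post to the profile and returns a weak reference for observation.
    static WeakReference<Post> publish(String text) {
        Post post = new Post(text);
        activeProfile.posts.add(post);
        return new WeakReference<>(post);
    }

    public static void main(String[] args) {
        WeakReference<Post> tracked = publish("hello");

        System.gc();
        // Still reachable: static field -> profile -> list -> post.
        System.out.println("reachable via profile: " + (tracked.get() != null));

        activeProfile.posts.clear();
        System.gc(); // only a request; collection is not guaranteed
        System.out.println("collected after removal: " + (tracked.get() == null));
    }
}
```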
- Mark and sweep
Conceptually, the garbage collector uses a mark and sweep algorithm to identify and reclaim unreachable objects. It traverses the object graph starting from the roots, marking each visited object as reachable. Once the marking phase is complete, the garbage collector performs a sweep phase to identify objects that were not marked and are therefore unreachable. The memory occupied by these unreachable objects is then freed.
Example:
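A minimal sketch of such a program is shown here. MyClass and its name field are hypothetical, and a WeakReference is used as an assumed way to observe whether the third object was collected. Since System.gc() is only a request, that last result can vary between runs and JVMs.

```java
import java.lang.ref.WeakReference;

final class MarkSweepExample {

    // Hypothetical class standing in for any application object.
    static final class MyClass {
        final String name;
        MyClass(String name) { this.name = name; }
    }

    // Returns the names of the objects that remain strongly reachable.
    static String run() {
        MyClass obj1 = new MyClass("first");
        MyClass obj2 = new MyClass("second");
        MyClass obj3 = new MyClass("third");

        // Additional references to the first two objects.
        MyClass obj4 = obj1;
        MyClass obj5 = obj2;

        // Lets us observe the third object without keeping it alive.
        WeakReference<MyClass> third = new WeakReference<>(obj3);

        obj1 = null;
        obj2 = null;
        obj3 = null;

        System.gc(); // a request, not a command

        System.out.println("third object collected: " + (third.get() == null));
        return obj4.name + "," + obj5.name;
    }

    public static void main(String[] args) {
        System.out.println("still reachable: " + run());
    }
}
```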
The example creates three instances of the MyClass class, referenced by obj1, obj2, and obj3, and two additional references, obj4 and obj5, that point to the same objects as obj1 and obj2 respectively. It then sets obj1, obj2, and obj3 to null and calls System.gc() to request garbage collection. This call is only a hint; the JVM may run a collection later or not at all.
When the garbage collector runs, it marks all objects reachable from the roots, which here include the local variables obj4 and obj5. The two objects originally assigned to obj1 and obj2 are therefore marked and survive, because obj4 and obj5 still refer to them. Only the object originally assigned to obj3 has no remaining references, so it is left unmarked and identified as unreachable during the sweep phase. Its memory can then be reclaimed, and if MyClass overrides finalize(), that method may be called on it before the memory is freed.
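The two phases can also be sketched as a toy simulation over an explicit object graph. ToyHeap, Node, and the field names are hypothetical; a real JVM collector works on raw memory and is far more elaborate, so this only illustrates the idea of marking from the roots and sweeping what was not marked.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Toy model of a heap with a mark and sweep collector.
final class ToyHeap {

    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>(); // outgoing references
        boolean marked;
        Node(String name) { this.name = name; }
    }

    final List<Node> objects = new ArrayList<>(); // everything "allocated"
    final List<Node> roots = new ArrayList<>();   // starting points for marking

    Node allocate(String name) {
        Node node = new Node(name);
        objects.add(node);
        return node;
    }

    // Mark phase: visit every node reachable from the roots.
    private void mark() {
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (node.marked) {
                continue; // already visited, which also handles cycles
            }
            node.marked = true;
            pending.addAll(node.refs);
        }
    }

    // Sweep phase: remove unmarked nodes and reset marks on survivors.
    private List<String> sweep() {
        List<String> freed = new ArrayList<>();
        for (Iterator<Node> it = objects.iterator(); it.hasNext(); ) {
            Node node = it.next();
            if (node.marked) {
                node.marked = false;
            } else {
                freed.add(node.name);
                it.remove();
            }
        }
        return freed;
    }

    // Runs one full collection and returns the names of the freed nodes.
    List<String> collect() {
        mark();
        return sweep();
    }
}
```

In this model, two nodes that reference only each other are still freed when no root leads to them, which is why tracing collectors are not defeated by reference cycles.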
- Finalization
Before an object is garbage collected, the JVM may call the finalize() method on the object, if its class overrides it. This method allows the object to perform cleanup operations before it is freed from memory. However, relying on finalize() for critical resource cleanup is not recommended: it is not guaranteed to be called promptly or at all, and the method has been deprecated since Java 9. Explicit cleanup with try-with-resources, or java.lang.ref.Cleaner where a safety net is needed, is the preferred approach.
Real-world example: Suppose you have a database connection object that needs to be properly closed to release any acquired resources. A DatabaseConnection class could override finalize() to call its close() method, so that the connection is closed if the object is finalized before being garbage collected.
Because finalization may be delayed or skipped entirely, such an override should serve at most as a last-resort safety net. The code that uses the connection should still call close() explicitly.
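For comparison, deterministic cleanup can be sketched with AutoCloseable and try-with-resources. The DatabaseConnection class below is a hypothetical stand-in that only tracks an open flag rather than talking to a real database. The point is that close() runs as soon as the try block exits, independently of when the garbage collector runs.

```java
// Hypothetical connection class; a real one would wrap a driver-level connection.
final class DatabaseConnection implements AutoCloseable {

    private boolean open = true;

    String query(String sql) {
        if (!open) {
            throw new IllegalStateException("connection is closed");
        }
        return "result of " + sql;
    }

    boolean isOpen() {
        return open;
    }

    // Called automatically at the end of a try-with-resources block.
    @Override
    public void close() {
        open = false;
    }
}

final class ConnectionDemo {

    // Returns true if the connection was closed once the try block ended.
    static boolean closedAfterUse() {
        DatabaseConnection observed;
        try (DatabaseConnection connection = new DatabaseConnection()) {
            observed = connection;
            connection.query("SELECT 1");
        }
        return !observed.isOpen();
    }

    public static void main(String[] args) {
        System.out.println("closed after use: " + closedAfterUse());
    }
}
```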
Conclusion
In summary, garbage collection in Java involves identifying and reclaiming memory occupied by unreachable objects. Conceptually, it relies on a mark and sweep approach to find those objects. The finalize() method offers a hook for cleanup before collection, but it is not guaranteed to run promptly or at all, so resources should be released explicitly. By understanding how garbage collection works, developers can write efficient and memory-conscious code in Java.