Apache Ozone
Object Storage

Ozone is a redundant and distributed object storage system optimized for Big Data workloads. The primary design focus of Ozone is scalability, aiming to scale to billions of objects.
Ozone separates namespace management from block space management, which allows it to scale much more effectively. The namespace is managed by a daemon called the Ozone Manager (OM), while the block space is managed by the Storage Container Manager (SCM).
Ozone consists of volumes, buckets, and keys. A volume is similar to a home directory in the Ozone ecosystem. Only an administrator can create it.
Volumes are used to store buckets. Once a volume is created, users can create as many buckets as needed. Ozone stores data as keys that reside within these buckets.
The Ozone namespace comprises multiple storage volumes, which are also used as the basis for storage accounting.
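The volume, bucket, and key hierarchy can be pictured with a small in-memory model. This is only an illustrative Python sketch, not Ozone's API: the `Namespace` class, its method names, and the `is_admin` flag are hypothetical, and the per-volume byte counter merely stands in for the storage accounting described above.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Bucket:
    # Key name -> object data.
    keys: Dict[str, bytes] = field(default_factory=dict)


@dataclass
class Volume:
    owner: str
    buckets: Dict[str, Bucket] = field(default_factory=dict)

    def used_bytes(self) -> int:
        # Accounting is done per volume: add up the size of every key.
        return sum(len(data) for b in self.buckets.values() for data in b.keys.values())


class Namespace:
    """Hypothetical toy model of the volume -> bucket -> key hierarchy."""

    def __init__(self) -> None:
        self.volumes: Dict[str, Volume] = {}

    def create_volume(self, name: str, owner: str, is_admin: bool) -> None:
        # Only an administrator may create a volume.
        if not is_admin:
            raise PermissionError("only an administrator can create volumes")
        self.volumes[name] = Volume(owner=owner)

    def create_bucket(self, volume: str, bucket: str) -> None:
        # Any number of buckets can be created inside an existing volume.
        self.volumes[volume].buckets[bucket] = Bucket()

    def put_key(self, volume: str, bucket: str, key: str, data: bytes) -> None:
        self.volumes[volume].buckets[bucket].keys[key] = data

    def get_key(self, volume: str, bucket: str, key: str) -> bytes:
        return self.volumes[volume].buckets[bucket].keys[key]


ns = Namespace()
ns.create_volume("vol1", owner="alice", is_admin=True)
ns.create_bucket("vol1", "logs")
ns.put_key("vol1", "logs", "2024/app.log", b"hello")
print(ns.volumes["vol1"].used_bytes())  # prints 5
```

Keeping the byte count on the volume mirrors the idea that volumes are the unit of storage accounting, while buckets and keys only organize the data inside them.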
The block diagram below illustrates the key components of Ozone.

The Ozone Manager is responsible for namespace management, the Storage Container Manager handles the physical and data layers, and Recon serves as the management interface for Ozone.
Different Perspectives

Any distributed system can be viewed from different perspectives. One way to look at Ozone is by considering the Ozone Manager as a namespace service built on top of HDDS, a distributed block storage system.
Another way to visualize Ozone is through its functional layers. It has a metadata management layer, which consists of the Ozone Manager and the Storage Container Manager.
There is also a data storage layer, which comprises the data nodes managed by the SCM.
The replication layer, provided by Apache Ratis, is used to replicate metadata (OM and SCM) and to ensure consistency when modifying data on the data nodes.
A management server called Recon communicates with all other Ozone components, providing a unified management API and UX for Ozone.
Ozone features a protocol bus that allows it to be extended with additional protocols. Currently, it supports the S3 protocol, which is implemented on top of the protocol bus. The protocol bus provides a generic framework for implementing new filesystem or object storage protocols that interact with the O3 Native protocol.
Apache Ozone Architecture
Apache Ozone's architecture is designed to manage billions of objects with scalability and resilience. It separates namespace management from block management, using the following key components:
- Ozone Manager (OM):
  - Manages the namespace (volumes, buckets, and keys).
  - Uses Apache Ratis, an implementation of the Raft consensus protocol, to keep namespace metadata consistent across OM instances.
- Storage Container Manager (SCM):
  - Manages containers, which are the unit of replication.
  - Coordinates data placement and replication across DataNodes.
- DataNodes:
  - Physically store data in containers.
  - Ensure high availability and efficient replication.
- Recon:
  - Monitoring and analysis tool for the Ozone cluster.
  - Collects detailed metrics for administration and diagnostics.
- Apache Ratis:
  - Implements Raft distributed consensus to ensure consistent replication.
Ozone Operations Workflow
- Data Write:
  - The client requests block allocation from OM to write data to DataNodes.
  - Data is written directly to DataNodes and replicated as needed.
- Data Read:
  - The client requests OM to locate data.
  - Data is retrieved directly from DataNodes.
- Data Replication:
  - SCM manages replication across DataNodes to ensure fault tolerance.
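The write and read paths above can be sketched as a toy simulation. Everything here is an assumption made for illustration: the class and function names are hypothetical, placement is plain round-robin, the replication factor of three is chosen arbitrarily, and the client copies data to each replica itself, whereas in Ozone replication is handled by the storage layer. The point is only the separation of roles: metadata requests go to OM, block placement comes from SCM, and data bytes move directly between the client and DataNodes.

```python
import itertools
from typing import Dict, List, Tuple


class DataNode:
    """Toy DataNode: stores block data by block ID."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: Dict[int, bytes] = {}


class StorageContainerManager:
    """Toy SCM: hands out block IDs and chooses the DataNodes for each block."""

    def __init__(self, datanodes: List[DataNode], replication: int = 3) -> None:
        if replication > len(datanodes):
            raise ValueError("not enough DataNodes for the replication factor")
        self.datanodes = datanodes
        self.replication = replication
        self._ids = itertools.count(1)

    def allocate_block(self) -> Tuple[int, List[DataNode]]:
        block_id = next(self._ids)
        # Round-robin placement: an arbitrary simplification for this sketch.
        start = block_id % len(self.datanodes)
        targets = [
            self.datanodes[(start + i) % len(self.datanodes)]
            for i in range(self.replication)
        ]
        return block_id, targets


class OzoneManager:
    """Toy OM: maps key names to block locations; never touches the data."""

    def __init__(self, scm: StorageContainerManager) -> None:
        self.scm = scm
        self.keys: Dict[str, Tuple[int, List[DataNode]]] = {}

    def allocate(self, key: str) -> Tuple[int, List[DataNode]]:
        location = self.scm.allocate_block()
        self.keys[key] = location
        return location

    def lookup(self, key: str) -> Tuple[int, List[DataNode]]:
        return self.keys[key]


def write_key(om: OzoneManager, key: str, data: bytes) -> None:
    block_id, targets = om.allocate(key)  # metadata request to OM
    for dn in targets:                    # data goes straight to DataNodes
        dn.blocks[block_id] = data


def read_key(om: OzoneManager, key: str) -> bytes:
    block_id, targets = om.lookup(key)    # OM only returns the location
    return targets[0].blocks[block_id]    # data is read from a DataNode


datanodes = [DataNode(f"dn{i}") for i in range(4)]
om = OzoneManager(StorageContainerManager(datanodes))
write_key(om, "vol1/bucket1/key1", b"payload")
print(read_key(om, "vol1/bucket1/key1"))  # prints b'payload'
```

Because OM stores only the mapping from key to block location, the metadata service stays small even when the data volume grows, which is the motivation for separating the two.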
Performance Expectations
To achieve high performance, Ozone uses a different replication mechanism depending on the state of a container:
- Raft replication (via Apache Ratis) for open containers.
- Asynchronous replication for closed containers (cold data).
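For open containers, the relevant property of Raft is that a write is committed once a majority of the replica group has acknowledged it. The arithmetic is sketched below; the helper names are hypothetical, and the group sizes used in the example are assumptions rather than values stated above.

```python
def quorum_size(replicas: int) -> int:
    """Smallest number of acknowledgements that forms a majority."""
    return replicas // 2 + 1


def is_committed(acks: int, replicas: int) -> bool:
    """A write is committed once a majority of replicas acknowledged it."""
    return acks >= quorum_size(replicas)


def tolerated_failures(replicas: int) -> int:
    """Replicas that can be lost while a majority is still reachable."""
    return replicas - quorum_size(replicas)


print(quorum_size(3), tolerated_failures(3))  # prints 2 1
print(is_committed(acks=1, replicas=3))       # prints False
```

With three replicas, a write needs two acknowledgements, so one slow or failed node does not block the write path.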
Apache Ozone Features
Apache Ozone provides a robust set of features to meet massive data storage demands. Highlights include:
- S3 Compatibility:
  - APIs that enable integration with cloud-based systems.
  - Ideal for data migrations and hybrid architectures.
- Hadoop Integration:
  - A direct alternative to HDFS, requiring no adaptations for frameworks like Apache Spark, Hive, and YARN.
- High Availability:
  - Built on the Hadoop Distributed Data Store (HDDS), which ensures consistent replication and fault tolerance.
- Multimodal Support:
  - Allows storage as a file system or as objects, depending on the use case.
- Horizontal Scalability:
  - Supports billions of objects, making it ideal for Big Data and Cloud-Native applications.
- Topology Awareness:
  - Optimizes read/write pipelines based on node placement in the cluster.
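Topology awareness can be illustrated by ordering replicas by their network distance from the client. The sketch below is not Ozone's implementation: the `/rack/node` location strings, the hop-count distance, and the function names are assumptions chosen for illustration.

```python
from typing import List


def distance(a: str, b: str) -> int:
    """Hops from each location up to their closest common ancestor."""
    path_a = a.strip("/").split("/")
    path_b = b.strip("/").split("/")
    common = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)


def order_replicas(client: str, replicas: List[str]) -> List[str]:
    """Closest replicas first, so reads prefer the same node, then the same rack."""
    return sorted(replicas, key=lambda replica: distance(client, replica))


replicas = ["/rack2/dn5", "/rack1/dn2", "/rack1/dn1"]
print(order_replicas("/rack1/dn1", replicas))
# prints ['/rack1/dn1', '/rack1/dn2', '/rack2/dn5']
```

Under this distance measure a replica on the client's own node scores 0, one on the same rack scores 2, and one on another rack scores 4, so cross-rack traffic is used only when no closer copy exists.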
Apache Ozone Project Details
Apache Ozone is primarily built in Java, the main implementation language of the Hadoop ecosystem, including HDFS, YARN, and other related components. This choice enables seamless integration with Hadoop and associated frameworks such as Spark, Hive, and MapReduce.
