Kerberos
Authentication and Identity Propagation

Authentication can be categorized into two types:
- Service Authentication: Verifies the identities of the different service components, such as HDFS, YARN, and MapReduce, to one another.
- User Authentication: A process that enables a device to verify the identity of a user/client connecting to a network resource. Without user authentication, the service merely trusts the identity information provided by the client.
In most scenarios, a password is used as proof of identity. However, it is necessary to prevent the interception or "eavesdropping" of this password and to provide a means of user authentication such that whenever a user requests a service, they must prove their identity.

Features of Kerberos
Kerberos originated at the Massachusetts Institute of Technology (MIT) as part of "Project Athena," which started in 1983. It is an open-source computer network authentication protocol that provides Single Sign-On (SSO) based on a trusted third-party service: users and services both rely on the Kerberos server. It has been adopted by the Hadoop team as the authentication component for access to the Hadoop Cluster. Its main features are:
- Reliability:
  - Controlled access services are only available if Kerberos is also available. It uses a distributed architecture of servers, with systems enabled to back up others.
  - It is a mutual authentication system ensuring not just that the user is who they claim to be, but also that the services the user is accessing are those expected. Both users and the server will always be assured that the counterpart they are interacting with is authentic.
- Scalability: Passwords or secret keys are known only by the Key Distribution Center (KDC) and the Kerberos principals (unique identities such as users or services), making the system scalable to authenticate a large number of entities. Each entity only needs to know its own secret key and register it in advance with the KDC.
- Security: The user's password is never transmitted over the network. Kerberos uses temporary tickets (such as the TGT and Service Ticket), which are issued by the authentication server (KDC) and have a limited lifetime.
- Transparency: The user is unaware of the authentication process itself, except for the request for a password.
- Simplicity: Uses Single Sign-On (SSO): after one login, the same ticket-granting ticket is used to obtain access to all services until it expires. User management is also simple: creating, deleting, and updating users in Kerberos is straightforward.
- Speed: Uses symmetric-key operations, which are generally faster than the public/private-key operations on which SSL authentication is based.
- Adaptability: Integrates easily with enterprise identity providers such as Active Directory, FreeIPA, or LDAP-based systems.
Architecture of Kerberos
Kerberos works in a client-server model. Its basic operation consists of:
- A ticket, which is a type of certificate that securely conveys the identity of the user to whom access was originally granted.
- An authenticator, which is a credential generated by the client with information that is compared with the ticket, thereby ensuring that the client presenting the ticket is the same one to whom the ticket was granted.
- A key distribution center, which provides valid temporary tickets that the user presents, together with an authenticator, to access an application. The application examines the ticket and the authenticator and grants access if both are valid.
To authenticate and verify the identity of clients, Kerberos uses symmetric key encryption (the same key is used to encrypt and decrypt data) and a central component called the Key Distribution Center (KDC), which maintains a database of all secret keys. The KDC comprises three main components:
- Authentication Server (AS) – performs the initial authentication of the user and issues the Ticket Granting Ticket (TGT).
- Ticket Granting Server (TGS) – issues service-specific tickets based on the TGT, eliminating the need to reuse the user's password.
- Kerberos Database – stores the secret keys and identities of all users and services authorized in the system.
The guiding idea of the solution is the existence of a server capable of delivering tickets to the user to access services. These tickets remain valid for a certain period. The service does not need to query the KDC to validate the ticket, as it can validate it locally using its own shared key with the KDC:
- The client requests a ticket from the Kerberos server.
- The client submits the ticket to the desired service and is authenticated.
Note: Because tickets are protected with secret keys and checked against an authenticator, a client cannot falsify an identity or forge or reuse a ticket.
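The point that a service can validate a ticket locally, without querying the KDC, can be illustrated with a deliberately simplified sketch. This is a toy model and not the Kerberos wire protocol: an HMAC tag stands in for the real ticket encryption (so the ticket contents are not hidden here), the initial user authentication and the authenticator are omitted, and the names `ToyKDC`, `seal`, `unseal`, and `service_accepts` are hypothetical.

```python
import hashlib
import hmac
import json
import secrets
import time


def seal(key: bytes, payload: dict) -> bytes:
    """Attach an HMAC tag so only holders of `key` can produce a valid blob."""
    body = json.dumps(payload, sort_keys=True).encode()
    tag = hmac.new(key, body, hashlib.sha256).hexdigest().encode()
    return tag + b"." + body


def unseal(key: bytes, blob: bytes) -> dict:
    """Verify the tag with `key` and return the payload, or raise ValueError."""
    tag, body = blob.split(b".", 1)
    expected = hmac.new(key, body, hashlib.sha256).hexdigest().encode()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("ticket was not sealed with this service's key")
    return json.loads(body)


class ToyKDC:
    """Holds one secret key per registered service (the 'Kerberos Database')."""

    def __init__(self):
        self.service_keys = {}

    def register_service(self, name: str) -> bytes:
        key = secrets.token_bytes(32)
        self.service_keys[name] = key
        return key  # the service keeps its own copy of this key

    def issue_ticket(self, client, service, lifetime_s=600, now=None) -> bytes:
        now = time.time() if now is None else now
        claims = {"client": client, "service": service, "expires": now + lifetime_s}
        return seal(self.service_keys[service], claims)


def service_accepts(service_key: bytes, ticket: bytes, now=None) -> str:
    """Validate a ticket using only the service's own key; no KDC call."""
    now = time.time() if now is None else now
    claims = unseal(service_key, ticket)
    if now > claims["expires"]:
        raise ValueError("ticket expired")
    return claims["client"]


kdc = ToyKDC()
hdfs_key = kdc.register_service("hdfs")
ticket = kdc.issue_ticket("alice", "hdfs")    # step 1: client asks the KDC
print(service_accepts(hdfs_key, ticket))      # step 2: service checks locally
```

A ticket whose contents were altered, or one presented after its lifetime, is rejected by `service_accepts` with a `ValueError`.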
Service Authentication
Services authenticate with Kerberos at startup, using their Kerberos Principal (a unique identity that can receive tickets for authentication) and a keytab (a file that contains the authentication credentials of cluster resources).
The Kerberos Principal authenticates the service through the key in the keytab.
After authentication, the KDC issues the ticket, which will be inserted into the private credentials set. The service can then serve the client.
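As a rough illustration of a principal-plus-keytab login, the sketch below splits a principal of the usual `primary/instance@REALM` form and builds the MIT `kinit -k -t` command line, which is the command-line analogue of what a service does at startup. The principal, the keytab path, and the helper names are hypothetical, and the parser ignores escaped characters.

```python
from typing import List, NamedTuple, Optional


class Principal(NamedTuple):
    primary: str             # service or user name, e.g. "hdfs"
    instance: Optional[str]  # usually the host name for a service principal
    realm: str               # Kerberos realm, e.g. "EXAMPLE.COM"


def parse_principal(name: str) -> Principal:
    """Split 'primary/instance@REALM' into its parts (no escape handling)."""
    local, sep, realm = name.rpartition("@")
    if not sep or not local or not realm:
        raise ValueError(f"not a full principal name: {name!r}")
    primary, _, instance = local.partition("/")
    if not primary:
        raise ValueError(f"principal has no primary component: {name!r}")
    return Principal(primary, instance or None, realm)


def keytab_login_command(principal: str, keytab_path: str) -> List[str]:
    """Build the kinit command that logs in from a keytab, without a password."""
    parse_principal(principal)  # fail early on malformed names
    return ["kinit", "-k", "-t", keytab_path, principal]


# Hypothetical service principal and keytab location, for illustration only.
cmd = keytab_login_command(
    "hdfs/nn1.example.com@EXAMPLE.COM",
    "/etc/security/keytabs/hdfs.keytab",
)
print(" ".join(cmd))
```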
The process of user authentication occurs, briefly, as follows:
- The user authenticates using their Kerberos Principal.
  Note: The user must first log in to a client machine that is enabled to communicate with the Hadoop Cluster.
- The user executes the kinit command with the Kerberos Principal and the password. kinit authenticates the user with the KDC, obtains the resulting ticket, and places it in the ticket cache on the filesystem.
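The user-side steps can be sketched as a small check that runs before submitting work to the cluster. It assumes the MIT Kerberos client tools are on the PATH and relies on `klist -s`, which exits with status 0 when the ticket cache holds valid credentials. The principal `alice@EXAMPLE.COM` and the helper names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def kinit_command(principal: str) -> List[str]:
    """Command line for an interactive, password-based login."""
    return ["kinit", principal]


def has_valid_ticket() -> bool:
    """True if `klist -s` reports a usable ticket cache."""
    if shutil.which("klist") is None:
        return False  # Kerberos client tools are not installed
    return subprocess.run(["klist", "-s"]).returncode == 0


if not has_valid_ticket():
    # kinit prompts for the password itself, so the sketch only prints the command.
    print("No valid ticket; run:", " ".join(kinit_command("alice@EXAMPLE.COM")))
```

Leaving the password prompt to `kinit` keeps the password out of the script and out of the process arguments.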

Best Practices for Kerberos
- Encryption configurations in Kerberos are usually set to a variety of types, including "weak" choices such as DES by default. It is recommended to remove weak types to ensure the best possible security.
- When using AES-256, Java releases that do not enable unlimited-strength cryptography by default need the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files installed on all cluster nodes to allow encryption types of "unlimited strength". It is important to note that some countries prohibit the use of these types of encryption.
- In environments where Active Directory (AD) users need to access Hadoop Services, it is recommended to establish unidirectional trust between Hadoop Kerberos and the AD Domain.
- As Kerberos is a time-sensitive protocol, the clocks of all hosts in the domain must be synchronized, for example using the Network Time Protocol (NTP). If the local system time of a client differs from that of the KDC by more than the allowed clock skew (5 minutes by default), the client will not be able to authenticate.
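The time-sensitivity rule in the last item amounts to comparing two clocks against a tolerance window. A minimal sketch, using the 5-minute default mentioned above; the function name `within_clock_skew` and the sample timestamps are hypothetical, and a real deployment would take the tolerance from its Kerberos configuration.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_MAX_SKEW = timedelta(minutes=5)  # default tolerance cited in the text


def within_clock_skew(client_time: datetime,
                      kdc_time: datetime,
                      max_skew: timedelta = DEFAULT_MAX_SKEW) -> bool:
    """True if the two clocks differ by no more than the allowed skew."""
    return abs(client_time - kdc_time) <= max_skew


kdc_now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(within_clock_skew(kdc_now + timedelta(minutes=4), kdc_now))  # inside the window
print(within_clock_skew(kdc_now - timedelta(minutes=6), kdc_now))  # outside the window
```

The check is symmetric: a client clock that runs ahead of the KDC is rejected in the same way as one that runs behind.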
Details of Project Kerberos
Kerberos was developed in C.
Sources: