Apache Kerberos
Authentication and Identity Propagation

Authentication can be categorized into two types:
- Service Authentication: Verifies the identities of the different service components, such as HDFS, YARN, and MapReduce, to one another.
- User Authentication: A process that lets a system verify the identity of a user or client connecting to a network resource. Without user authentication, the service merely trusts the identity information provided by the client.
In most scenarios, a password serves as proof of identity. That password must be protected from interception ("eavesdropping"), and users must prove their identity every time they request a service.

Features of Apache Kerberos
Kerberos originated at the Massachusetts Institute of Technology (MIT) as part of "Project Athena," which started in 1983. It is an open-source network authentication protocol that provides Single Sign-On (SSO) based on a trusted third-party service: users and services both rely on the Kerberos server. The Hadoop team adopted it as the authentication component for access to the Hadoop cluster. Its main features are:
- Reliability:
  - Services under access control are only available while Kerberos is also available. Kerberos therefore uses a distributed server architecture in which one system can back up another.
  - It is a mutual authentication system: it ensures not only that the user is who they claim to be, but also that the services the user is accessing are the expected ones. Both the user and the server are assured that the counterpart they are interacting with is authentic.
- Scalability: Passwords or secret keys are known only to the KDC (Key Distribution Center) and to the Kerberos principal that owns them. Each entity only needs to know its own secret key and register it with the KDC, so the system can authenticate a large number of entities.
- Security: The user's password is never transmitted over the network. Authentication uses tickets, negotiated with the server, that have a limited lifetime.
- Transparency: The user is unaware of the authentication process itself, except for the request for a password.
- Simplicity: Uses Single Sign-On (SSO), where a single ticket can be used by all services until it expires. User management is also simple: creating, deleting, and updating users in Kerberos is straightforward.
- Speed: It uses symmetric-key operations, which are generally faster than the public/private-key operations on which SSL authentication is based.
- Adaptability: It integrates easily with an enterprise identity server.
Architecture of Apache Kerberos
Kerberos works in a client-server model. Its basic operation consists of:
- A ticket, which is a type of certificate that securely conveys the identity of the user to whom access was originally granted.
- An authenticator, which is a credential generated by the client. Its contents are compared with the ticket to ensure that the client presenting the ticket is the one to whom the ticket was granted.
- A key distribution center, which gives the user valid temporary tickets for accessing an application. The application examines both the ticket and the authenticator and grants access if they are valid.
To authenticate consumers, Kerberos uses symmetric key encryption (the same key encrypts and decrypts the data) and the KDC (Key Distribution Center), which holds all secret keys and comprises three components:
- A TGS (Ticket Granting Server) that connects the consumer with the service server (SS). Because it issues service tickets without asking for the password again, it limits the exposure of the user's password.
- A Kerberos database that stores the secret keys of clients and servers and the identities of all verified users.
- An Authentication Server (AS) that performs the initial authentication.
The guiding idea of the solution is a server that issues tickets with which the user can access services. These tickets remain valid for a certain period. The service does not have to ask the Kerberos server to validate a ticket: it trusts the ticket because it was generated by a trusted server. Essentially, the solution involves two steps:
- The client requests a ticket from the Kerberos server.
- The client submits the ticket to the desired service and is authenticated.
Note: Kerberos is designed so that an identity cannot be falsified and a ticket cannot be forged or reused.
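The two-step exchange described above can be sketched in a few lines of Python. This is a toy model under loud assumptions, not the real protocol: it merges the AS and TGS into a single KDC call, the `seal`/`unseal` helpers are an illustrative stand-in for Kerberos encryption and must not be used for real security, and every name in it (`kdc_issue_ticket`, `service_accepts`, the `alice` and `hdfs` principals) is hypothetical.

```python
import hashlib
import hmac
import json
import os
import time


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive a toy keystream by hashing key + nonce + counter (NOT production crypto)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def seal(key: bytes, payload: dict) -> bytes:
    """Encrypt and authenticate a payload with a shared symmetric key."""
    nonce = os.urandom(16)
    body = json.dumps(payload).encode()
    cipher = bytes(a ^ b for a, b in zip(body, _keystream(key, nonce, len(body))))
    tag = hmac.new(key, nonce + cipher, hashlib.sha256).digest()
    return nonce + tag + cipher


def unseal(key: bytes, blob: bytes) -> dict:
    """Reverse seal(); raises ValueError if the key is wrong or the data was altered."""
    nonce, tag, cipher = blob[:16], blob[16:48], blob[48:]
    expected = hmac.new(key, nonce + cipher, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or tampered data")
    body = bytes(a ^ b for a, b in zip(cipher, _keystream(key, nonce, len(cipher))))
    return json.loads(body)


def kdc_issue_ticket(db: dict, client: str, service: str, now: float,
                     lifetime: float = 3600.0):
    """Step 1: the KDC creates a session key and wraps it once for each party."""
    session_key = os.urandom(32).hex()
    # Only the service can open the ticket: it is sealed with the service's key.
    ticket = seal(db[service], {"client": client, "session_key": session_key,
                                "expires": now + lifetime})
    # Only the client can open the reply: it is sealed with the client's key.
    reply = seal(db[client], {"service": service, "session_key": session_key})
    return ticket, reply


def client_make_authenticator(client: str, client_key: bytes, reply: bytes,
                              now: float) -> bytes:
    """The client recovers the session key and seals a fresh timestamp with it."""
    session_key = bytes.fromhex(unseal(client_key, reply)["session_key"])
    return seal(session_key, {"client": client, "timestamp": now})


def service_accepts(service_key: bytes, ticket: bytes, authenticator: bytes,
                    now: float, max_skew: float = 300.0) -> bool:
    """Step 2: the service checks ticket and authenticator without calling the KDC."""
    try:
        t = unseal(service_key, ticket)
        a = unseal(bytes.fromhex(t["session_key"]), authenticator)
    except ValueError:
        return False
    return (a["client"] == t["client"]
            and now <= t["expires"]
            and abs(now - a["timestamp"]) <= max_skew)


if __name__ == "__main__":
    # Hypothetical principals; the KDC database maps each one to its secret key.
    db = {"alice": hashlib.sha256(b"alice-password").digest(), "hdfs": os.urandom(32)}
    now = time.time()
    ticket, reply = kdc_issue_ticket(db, "alice", "hdfs", now)
    auth = client_make_authenticator("alice", db["alice"], reply, now)
    print("access granted:", service_accepts(db["hdfs"], ticket, auth, now))
```

Note that the password-derived key never leaves the client in this sketch, and the service decides purely from what the client presents. The timestamp inside the authenticator is what makes an old, captured authenticator stop working once it falls outside the allowed skew.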
Service Authentication
Services authenticate with Kerberos when they start, using their Kerberos principal (a unique identity that can receive tickets for authentication) and a keytab (a file that contains the authentication credentials of cluster resources).
The service authenticates as its Kerberos principal using the key stored in the keytab.
After authentication, the KDC issues the ticket, which is placed in the service's set of private credentials. The service can then serve clients.
The process of user authentication occurs, briefly, as follows:
- The user authenticates using their Kerberos principal.
  Note: Initially, the user must log in to a client machine that is able to communicate with the Hadoop cluster.
- The user runs the kinit command with the Kerberos principal and the password. kinit authenticates the user at the KDC, obtains the resulting ticket, and places it in the ticket cache on the filesystem.
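As a rough illustration of both cases, the sketch below wraps kinit from Python: with only a principal, kinit prompts for the password on the terminal; with a keytab, the `-k -t` options of MIT kinit read the key from the file instead, which is how a service would typically log in. The helper names, the principals, and the keytab path are hypothetical examples.

```python
import subprocess
from typing import List, Optional


def build_kinit_command(principal: str, keytab: Optional[str] = None) -> List[str]:
    """Build the kinit argument list for a user (password) or a service (keytab)."""
    if keytab is not None:
        # -k: take the key from a keytab; -t: path of that keytab file.
        return ["kinit", "-k", "-t", keytab, principal]
    # Without a keytab, kinit asks for the password interactively.
    return ["kinit", principal]


def obtain_ticket(principal: str, keytab: Optional[str] = None) -> bool:
    """Run kinit; return True if it exited successfully, False otherwise."""
    try:
        result = subprocess.run(build_kinit_command(principal, keytab))
    except FileNotFoundError:
        # kinit is not installed on this machine.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Hypothetical user principal; the realm EXAMPLE.COM is a placeholder.
    if obtain_ticket("alice@EXAMPLE.COM"):
        print("ticket stored in the ticket cache")
```

A service would call the same helper with its own principal and keytab, for example `obtain_ticket("hdfs/node1.example.com@EXAMPLE.COM", "/etc/security/keytabs/hdfs.keytab")`, where both values are placeholders.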

Best Practices for Apache Kerberos
- Encryption configurations in Kerberos are usually set by default to a variety of types, including "weak" choices such as DES. It is recommended to remove the weak types to ensure the best possible security.
- When using AES-256, the Java Cryptography Extension (JCE) unlimited strength policy files need to be installed on all cluster nodes, at least on Java versions that do not already enable "unlimited strength" encryption by default. Note that some countries prohibit the use of these types of encryption.
- In environments where Active Directory (AD) users need to access Hadoop services, it is recommended to establish a one-way trust between the Hadoop Kerberos realm and the AD domain.
- Kerberos is a time-sensitive protocol, so the clocks of all hosts in the domain must be synchronized, for example with the Network Time Protocol (NTP). If the local system time of a client differs from that of the KDC by more than 5 minutes (the default tolerance), the client will not be able to authenticate.
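The clock-skew rule in the last item amounts to a simple comparison. The sketch below shows the arithmetic only; the function name is hypothetical, and treating a difference of exactly 5 minutes as acceptable is an assumption of this example.

```python
from datetime import datetime, timedelta, timezone

# Default tolerance mentioned above: 5 minutes.
DEFAULT_MAX_SKEW = timedelta(minutes=5)


def within_clock_skew(client_time: datetime, kdc_time: datetime,
                      max_skew: timedelta = DEFAULT_MAX_SKEW) -> bool:
    """Return True if the two clocks differ by no more than max_skew."""
    return abs(client_time - kdc_time) <= max_skew


if __name__ == "__main__":
    kdc = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    # A client running 6 minutes behind the KDC falls outside the tolerance.
    print(within_clock_skew(kdc - timedelta(minutes=6), kdc))
```

The check is symmetric: a client clock that runs ahead of the KDC is rejected just like one that runs behind, which is why all hosts need to be synchronized rather than only the clients.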
Details of Project Apache Kerberos
Kerberos was developed in C.