Cluster Creation and Component Installation
After installing the Apache Ambari service, the next step is to create the Big Data Cluster, including the installation of the desired services/components.
Let's Get Started! Use a browser to access the Ambari web interface available at the IP/hostname of the Ambari Server machine, port 8080. For example:
http://192.168.56.100:8080

By default, the username and password are both admin.
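Before opening the browser, it can help to confirm from the command line that the Ambari web interface is answering. The following is a minimal sketch, assuming curl is installed; the helper names ambari_url and http_status are illustrative and not part of Ambari.

```shell
# Build the Ambari web interface URL.
# Arguments: host or IP, optional port (default 8080).
ambari_url() {
    echo "http://$1:${2:-8080}"
}

# Print the HTTP status code returned by the given URL.
# curl prints 000 when no connection could be made.
http_status() {
    curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$1"
}

# Usage (address taken from the example above):
# http_status "$(ambari_url 192.168.56.100)"
```

A status of 200 suggests the server is up; 000 usually points to a network, firewall, or service problem.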
- On the first access, a welcome page will be displayed. To start the Cluster deployment process, click the LAUNCH INSTALL WIZARD button:
Figure 2 - Ambari Welcome Page
- Enter a name for the Cluster and click NEXT:
Figure 3 - Cluster Name
Version Selection
- Select the desired TDP version:
Figure 4 - TDP Version
- Select the type of package repository (Public or Local) and enter the URLs for Components (TDP-2.3.0) and Utils (TDP-UTILS-2.3.0):
Figure 5 - Package Repositories
- Then, click NEXT.
Important: If you choose to use the Tecnisys Public Package Repository, the access credentials (username and password) must be entered directly in the URL, as shown in the figure above.
Installation Options
- In Target Hosts, enter the Fully Qualified Domain Name (FQDN) of the hosts (machines) that will make up the Cluster.
The Ambari Server must have access to the machines provided. Make sure that the FQDN of each machine is correctly resolved, either through a DNS Server (recommended) or locally (via the /etc/hosts file).
Figure 6 - Cluster Host Information
Tip: In Target Hosts, you can enter machines using Pattern Expressions. The example shown in the figure above would look like this:
big-tdp[1-7].dev-geep.local
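To illustrate what a Pattern Expression such as big-tdp[1-7].dev-geep.local stands for, and to check name resolution before registering hosts, here is a minimal sketch. The function names are illustrative, the expansion does not handle zero-padded ranges, and getent is assumed to be available (it is on most Linux distributions).

```shell
# Expand a numeric range such as big-tdp[1-7].dev-geep.local into one FQDN per line.
# Arguments: prefix, first number, last number, domain suffix.
expand_hosts() {
    i="$2"
    while [ "$i" -le "$3" ]; do
        echo "$1$i.$4"
        i=$((i + 1))
    done
}

# Read host names from standard input and print those that do not resolve.
# getent consults the system name sources (such as /etc/hosts and DNS).
unresolved_hosts() {
    while read -r host; do
        getent hosts "$host" >/dev/null 2>&1 || echo "$host"
    done
}

# Usage, run on the Ambari Server machine:
# expand_hosts big-tdp 1 7 dev-geep.local | unresolved_hosts
```

If the pipeline prints nothing, every expanded name resolved from the machine where it was run.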
- In Hosts Registration Information, select how the Cluster machines will be registered.
  - If you choose to provide the private SSH key of the Ambari Server machine for automatic registration of the Cluster machines, paste its content into the text field below or upload the file. Then, confirm the username and SSH port to be used. Also, make sure that the Trust Relationship (SSH key exchange) has been correctly established, allowing access to all machines via SSH without the need for the password of the Ambari Server's daemon user (by default, root).
Figure 7 - Host Registration
  - If you prefer manual registration of machines, install the Ambari Agent on each machine before proceeding.
Tip: An RSA-type private SSH key can be obtained by running the following command:
cat ~/.ssh/id_rsa
Note: To manually install the Ambari Agent:
yum install ambari-agent
- Then, click REGISTER AND CONFIRM.
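For manual registration, each Ambari Agent also needs to know where the Ambari Server is. The sketch below assumes the standard agent configuration file, /etc/ambari-agent/conf/ambari-agent.ini, with a hostname= entry in its [server] section. The function name and the server name ambari.example.com are hypothetical.

```shell
# Point an Ambari Agent configuration file at the Ambari Server.
# Arguments: path to ambari-agent.ini, FQDN of the Ambari Server.
# Only the hostname= line inside the [server] section is rewritten.
set_agent_server() {
    tmp="$1.tmp"
    sed "/^\[server]/,/^\[/ s/^hostname=.*/hostname=$2/" "$1" > "$tmp" && mv "$tmp" "$1"
}

# Usage on each Cluster machine, as root (hypothetical server name):
# set_agent_server /etc/ambari-agent/conf/ambari-agent.ini ambari.example.com
# ambari-agent start
```

The file is rewritten through a temporary copy rather than with sed -i, whose syntax differs between sed implementations.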
Trust Relationship Configuration
- On the Ambari Server machine, generate an SSH key pair:
ssh-keygen
- Copy the public SSH key to ALL machines in the Cluster. For example:
ssh-copy-id tdp-mn01.tecnisys.com.br
- Test SSH access to ALL machines in the Cluster without the user password. For example:
ssh root@tdp-mn01.tecnisys.com.br
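With more than a few machines, the copy and test steps above can be repeated in a loop. This is a minimal sketch: the second host name in HOSTS is hypothetical and the function names are illustrative. The BatchMode=yes option makes ssh fail instead of prompting, so a missing key shows up as a failure rather than a password prompt.

```shell
# Hypothetical host list; replace with the FQDNs of your Cluster machines.
HOSTS="tdp-mn01.tecnisys.com.br tdp-mn02.tecnisys.com.br"
SSH_USER="root"

# Copy the public key to every host (asks for each host's password once).
distribute_key() {
    for host in "$@"; do
        ssh-copy-id "${SSH_USER}@${host}" || echo "FAILED to copy key to ${host}" >&2
    done
}

# Verify passwordless access to every host; returns non-zero if any host fails.
verify_access() {
    failed=0
    for host in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "${SSH_USER}@${host}" true; then
            echo "OK   ${host}"
        else
            echo "FAIL ${host}"
            failed=1
        fi
    done
    return "$failed"
}

# Usage, run on the Ambari Server machine:
# distribute_key $HOSTS
# verify_access $HOSTS
```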
Host Confirmation
After installing the Ambari Agent on all machines specified in the previous step, Ambari performs a series of checks to ensure that the prerequisites have been met (JDK, Firewall, THP, among others).

Correct any errors and re-run the checks before proceeding.
Click NEXT to continue.
Package Issues alerts related to already installed PostgreSQL packages can be ignored.
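Some of these prerequisites can be inspected by hand on a host before re-running the checks. The sketch below is not the set of checks Ambari itself runs; it only reads the usual Linux locations for Transparent Huge Pages (THP), the firewalld service status, and the JDK on the PATH. The function names are illustrative.

```shell
# Extract the active THP mode from the kernel's status string,
# where the active value is shown in brackets, e.g. "always madvise [never]".
thp_active_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\(.*\)\].*/\1/p'
}

# Print a short prerequisite report for the local machine.
check_prereqs() {
    thp_file=/sys/kernel/mm/transparent_hugepage/enabled
    if [ -r "$thp_file" ]; then
        echo "THP mode: $(thp_active_mode "$(cat "$thp_file")")"
    fi
    if command -v systemctl >/dev/null 2>&1; then
        echo "firewalld: $(systemctl is-active firewalld 2>/dev/null)"
    fi
    if command -v java >/dev/null 2>&1; then
        java -version 2>&1 | head -n 1
    else
        echo "JDK: java not found in PATH"
    fi
}

# Usage on each Cluster machine:
# check_prereqs
```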
Service Selection
- Select the service responsible for the Cluster storage layer.
Figure 9 - Storage Layer Service Selection
- Select the remaining Cluster services.
Figure 10 - Selection of Additional Cluster Services
Tip: We recommend initially selecting the basic services, such as YARN + MapReduce2, Tez, Zookeeper, Infra Solr, and Ambari Metrics. Additional services, if needed, can be added after the Cluster is created. This approach makes it easier to handle potential component installation issues.
Note: The Cluster requires certain services to operate fully, such as Apache Ranger for security and Apache Atlas for data governance. Therefore, Ambari will display alerts if any functionality is limited by the absence of a specific service. Ignore the alert (click the PROCEED ANYWAY button) if the service will be installed later or if you are aware of this limitation.
- Then, click NEXT.
Assignment of Master Components
- Specify the machine for each Master component (usually management and coordination components) of the selected services. On the right side of the page, you can see the organization of components by machine.
Figure 11 - Master Component Assignment
Note: The setup should consider each component's requirements and the available resources on each machine. Some recommendations include:
  - Avoid installing services other than Edge or Gateway services on the Ambari Server machine. If possible, dedicate a machine to the Ambari Server.
  - Components responsible for high availability should be installed on separate machines, e.g., NameNode and Secondary NameNode (SNameNode).
  - Install Zookeeper on an odd number of machines greater than one, that is, on at least 3 machines initially.
- Then, click NEXT.
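The odd-number recommendation for Zookeeper follows from majority voting: the ensemble stays available only while more than half of its servers are running. The arithmetic can be sketched as follows (function names are illustrative):

```shell
# Servers needed for a majority (quorum) in an ensemble of $1 servers.
zk_quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Server failures the ensemble can survive while keeping a quorum.
zk_tolerated_failures() {
    echo $(( ($1 - 1) / 2 ))
}

for n in 3 4 5; do
    echo "servers=$n quorum=$(zk_quorum_size "$n") tolerated_failures=$(zk_tolerated_failures "$n")"
done
```

Three servers and four servers both tolerate a single failure, so the fourth machine adds cost without adding resilience; the next improvement comes at five.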
Assignment of Slave and Client Components
- Specify the machines where Slave components (usually storage and processing components) and Clients will be installed.
Figure 12 - Assignment of Slave and Client Components
Note: Whenever possible, avoid installing Slave components on Master component machines.
- Then, click NEXT.
Service Customization
In this step, define access credentials, database connection data, directories, users, and other specific information necessary for each service installation.
Resolve any pending items in each section of this step and click NEXT to proceed.
Credentials
For example, Grafana, a component of Ambari Metrics, requires an administration username and password:

Databases
As an example, Hive requires a database to store metadata. Here, we provide the connection data for an existing PostgreSQL instance:

Click the TEST CONNECTION button to test the connection to the specified database.
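If the connection test fails, it can be useful to check the same parameters outside Ambari. The sketch below builds a JDBC URL in the form used by the PostgreSQL JDBC driver and runs a trivial query with psql. The host db01.example.com, the database name hive, and the user hive are hypothetical values, and psql is assumed to be installed on the machine running the check.

```shell
# Build a PostgreSQL JDBC URL.
# Arguments: database host, database name, optional port (default 5432).
hive_jdbc_url() {
    echo "jdbc:postgresql://$1:${3:-5432}/$2"
}

# Run a trivial query to confirm host, port, database, and user.
# psql asks for the password unless PGPASSWORD or ~/.pgpass provides it.
# Arguments: host, database, user, optional port (default 5432).
check_pg_connection() {
    psql -h "$1" -p "${4:-5432}" -U "$3" -d "$2" -c 'SELECT 1;'
}

# Usage with hypothetical values:
# hive_jdbc_url db01.example.com hive
# check_pg_connection db01.example.com hive hive
```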
Directories
In this section, you can customize the service directories, such as the DataNode data directories, NameNode namespace directories, log directories, etc.

If possible, use exclusive storage devices (disks, SSDs, etc.), volumes, and directories for DataNode, NameNode, JournalNode, NodeManager, Timeline Service, and Zookeeper files.
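When a machine has several dedicated devices, directory properties such as the HDFS dfs.datanode.data.dir accept a comma-separated list with one directory per device. A small sketch for building such a value is shown below; the mount points /grid/0 and /grid/1 and the function name are hypothetical.

```shell
# Join mount points into a comma-separated directory list,
# appending the same subdirectory to each mount point.
# Arguments: subdirectory, then one or more mount points.
join_data_dirs() {
    subdir="$1"
    shift
    result=""
    for mount in "$@"; do
        result="${result:+$result,}${mount}/${subdir}"
    done
    echo "$result"
}

# Usage with hypothetical mount points:
# join_data_dirs hadoop/hdfs/data /grid/0 /grid/1
```

The resulting string can be pasted into the corresponding field of the Directories section.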
Service Users
In this section, you can customize the operating system users that will be created for each service.

All Configurations
This final section provides access to all configurations of the services to be installed. Review and adjust as needed.

If you missed any configuration, don't worry. After installation, all these settings can be modified through Ambari.
Configuration Review
In this final step before creating the Cluster, a review of the settings is presented. Carefully check all information, and if you need to change any settings, use the left-side navigation area to go back to the desired step.

Use the PRINT button to generate an installation report and the GENERATE BLUEPRINT button to export all defined settings as a Blueprint (JSON), which can be used in the future to recreate the Cluster via the Ambari REST API.
To start the Cluster deployment, click the DEPLOY button.
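For reference, recreating a Cluster from a Blueprint through the Ambari REST API generally takes two POST requests: one to register the Blueprint and one to create the Cluster from a cluster creation template. The sketch below assumes the default admin credentials, the example server address used earlier, and the file names blueprint.json and clustertemplate.json. The file names, Blueprint name, and Cluster name are assumptions to adapt.

```shell
AMBARI_URL="http://192.168.56.100:8080"   # example address used earlier in this guide
AMBARI_AUTH="admin:admin"                 # default credentials; replace if changed

# POST a JSON file to an Ambari REST endpoint.
# Ambari expects the X-Requested-By header on modifying requests.
# Arguments: API path below /api/v1/, path to the JSON file.
ambari_post() {
    curl -s -u "$AMBARI_AUTH" -H "X-Requested-By: ambari" \
        -X POST -d "@$2" "${AMBARI_URL}/api/v1/$1"
}

# Arguments: Blueprint name, Blueprint JSON file.
register_blueprint() { ambari_post "blueprints/$1" "$2"; }

# Arguments: Cluster name, cluster creation template JSON file.
create_cluster() { ambari_post "clusters/$1" "$2"; }

# Usage with hypothetical names:
# register_blueprint my-blueprint blueprint.json
# create_cluster my-cluster clustertemplate.json
```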
Service Installation, Startup, and Testing
In this step, services will be installed, started, and tested, respecting each service's dependencies and integrations.

Click the Message column link to view the scheduled tasks for each machine.
In case of failures, Ambari may pause the deployment, allowing you to resume after fixing the issue by clicking the RETRY button.
However, depending on the progress made, Ambari may complete the deployment and make the Cluster available as-is, even if not all components of a specific service have been successfully installed, started, or tested. In this case, after the Cluster is created, you can adjust configurations or reinstall only the problematic service through Ambari.
Once deployment is complete, click NEXT.
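If the deployment finished with warnings, the state of each service can also be listed from the command line. This is a minimal sketch, assuming the default credentials, the example server address used earlier, and a response that contains service_name and state fields for each service. The function name is illustrative.

```shell
AMBARI_URL="http://192.168.56.100:8080"   # example address used earlier in this guide
AMBARI_AUTH="admin:admin"                 # default credentials; replace if changed

# Print the service_name and state fields of every service in a Cluster.
# Argument: Cluster name.
service_states() {
    curl -s -u "$AMBARI_AUTH" \
        "${AMBARI_URL}/api/v1/clusters/$1/services?fields=ServiceInfo/state" |
        grep -Eo '"(service_name|state)" *: *"[^"]*"'
}

# Usage with a hypothetical Cluster name:
# service_states my-cluster
```

Services that report a state other than STARTED are candidates for the adjustment or reinstallation described above.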

Summary
In the final step of the process, a summary of the deployment is displayed.
Click the COMPLETE button to finalize the operation and access the administration area of the created Cluster.
