Environment Preparation
The following procedures should be performed on all machines in the Big Data Cluster.
Firewall Deactivation
- Disable the Firewall service:
Terminal input:
systemctl stop firewalld;
systemctl disable firewalld;
Warning: If the Firewall cannot be disabled, create rules (iptables) to allow communication between the machines on the ports used by the TDP Platform services.
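When the firewall has to stay enabled, the rules mentioned in the warning could be generated along the following lines. This is only a sketch: the function name `print_allow_rules`, the peer addresses, and the port numbers are hypothetical placeholders, not values taken from the TDP Platform documentation. The function prints the iptables commands instead of applying them, so they can be reviewed first.

```shell
# Print (do not apply) iptables rules that accept TCP traffic from each
# cluster peer on each listed port. Review the output, then run it as root.
print_allow_rules() {
    peers=$1   # space-separated peer addresses (placeholder values below)
    ports=$2   # space-separated TCP ports (placeholder values below)
    for peer in $peers; do
        for port in $ports; do
            echo "iptables -I INPUT -s $peer -p tcp --dport $port -j ACCEPT"
        done
    done
}

# Example with placeholder hosts and ports; replace them with the machines
# and service ports actually used in your cluster.
print_allow_rules "192.0.2.11 192.0.2.12" "2181 8020"
```

Rules added this way last only until the next reboot unless they are saved with whatever persistence mechanism your distribution provides.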
SELinux Deactivation
- Temporarily disable SELinux:
Terminal input:
setenforce 0;
- Permanently disable SELinux:
Terminal input:
sed -i --follow-symlinks "s+SELINUX=enforcing+SELINUX=disabled+g" /etc/selinux/config;
- Restart the machine to apply the permanent SELinux deactivation.
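The sed command above only rewrites the file when the current mode is `enforcing`. A possible way to check the configured mode, and to disable SELinux regardless of the starting value, is sketched below. The helper names `selinux_config_mode` and `selinux_config_disable` are made up for this example, and GNU sed is assumed.

```shell
# Print the SELINUX= mode stored in a config file
# (defaults to /etc/selinux/config).
selinux_config_mode() {
    conf=${1:-/etc/selinux/config}
    sed -n 's/^SELINUX=//p' "$conf"
}

# Set SELINUX=disabled whatever the current mode is (enforcing or permissive).
# SELINUXTYPE= lines are left untouched. Needs write access to the file,
# so run it as root against the real config.
selinux_config_disable() {
    conf=${1:-/etc/selinux/config}
    sed -i --follow-symlinks 's/^SELINUX=.*/SELINUX=disabled/' "$conf"
}
```

After the reboot, `getenforce` should report `Disabled` if the change took effect.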
Kernel Parameter Configuration
- Minimize the use of swap space:
Terminal input:
echo 'vm.swappiness=1' >> /etc/sysctl.conf;
sysctl -p /etc/sysctl.conf;
Note: The Linux kernel provides a tunable setting, called swappiness, that controls how readily the swap space on disk is used. A swappiness value of zero means that the disk is avoided unless absolutely necessary (when the server runs out of memory), while a value of 100 means that programs are moved to swap space almost immediately. Lowering the swappiness value reduces the likelihood that the kernel sends an application's memory to swap space. Swap space is much slower than RAM because it resides on disk. When a process's memory is swapped to disk, the process can experience pauses, which can cause problems in services such as Apache ZooKeeper.
Figure 1 - Swap minimize
- Change the default behavior of RAM memory allocation:
Terminal input:
echo 'vm.overcommit_memory=1' >> /etc/sysctl.conf;
sysctl -p /etc/sysctl.conf;
Note: In Big Data environments, where it is common to have machines with large amounts of RAM, we recommend setting overcommit_memory to 1. Unlike the value 0, which uses a heuristic approach to memory requests (malloc), the value 1 assumes there is always sufficient physical memory, significantly improving the performance of memory-intensive tasks.
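Because the commands above append with `>>`, running them again leaves duplicate lines in /etc/sysctl.conf. One way to make the change repeatable is sketched here. The function name `set_sysctl_key` is hypothetical and GNU sed is assumed.

```shell
# Set KEY=VALUE in a sysctl file: rewrite the line if the key is already
# present, append it otherwise, so re-running never creates duplicates.
# The key is used as a regular expression, so its dots match any character;
# that is loose but harmless for names such as vm.swappiness.
set_sysctl_key() {
    key=$1
    value=$2
    conf=${3:-/etc/sysctl.conf}
    if grep -q "^[[:space:]]*$key[[:space:]]*=" "$conf" 2>/dev/null; then
        sed -i "s|^[[:space:]]*$key[[:space:]]*=.*|$key=$value|" "$conf"
    else
        echo "$key=$value" >> "$conf"
    fi
}
```

As root, this could be used as `set_sysctl_key vm.swappiness 1` and `set_sysctl_key vm.overcommit_memory 1`, followed by `sysctl -p /etc/sysctl.conf` to load the values.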
Figure 2 - Change RAM default
- Disable Transparent Huge Pages (THP):
Terminal input:
echo never > /sys/kernel/mm/transparent_hugepage/enabled;
echo never > /sys/kernel/mm/transparent_hugepage/defrag;
echo "echo never > /sys/kernel/mm/transparent_hugepage/enabled" >> /etc/rc.local;
echo "echo never > /sys/kernel/mm/transparent_hugepage/defrag" >> /etc/rc.local;
Note: Many Linux distributions offer THP as a low-complexity option to increase the size of memory blocks/pages (from 4 KB to 2 MB or 1 GB) and to enable the management of many gigabytes, and even terabytes, of RAM. However, workloads in Big Data environments often perform poorly with THP enabled, because they tend to have sparse rather than contiguous memory access patterns, thus overloading the CPU.
Figure 3 - THP disable
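To confirm that the THP change is active, the sysfs files can be read back. The kernel lists all possible values and marks the active one with square brackets, for example `always madvise [never]`. The helper below, with the made-up name `thp_active_value`, is a small sketch that extracts the bracketed value.

```shell
# Print the active THP setting from a sysfs-style file. The active value is
# the one enclosed in square brackets, e.g. "always madvise [never]".
thp_active_value() {
    file=${1:-/sys/kernel/mm/transparent_hugepage/enabled}
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$file"
}
```

On a configured machine, both `thp_active_value` and `thp_active_value /sys/kernel/mm/transparent_hugepage/defrag` should print `never`. If the value reverts after a reboot, it may be worth checking that /etc/rc.local is executable, since some systemd-based distributions skip it otherwise.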
Time Synchronization
- Install a time synchronization service (NTP) to ensure that the date and time of the machines are always synchronized, preventing inconsistencies in data and services. In this example, we will use Chrony:
Terminal input:
yum install chrony -y
systemctl enable chronyd
systemctl start chronyd
Figure 4 - Install time synchronization
- Configure Chrony to synchronize with appropriate NTP servers:
Terminal input:
vi /etc/chrony.conf
# Add or adjust the NTP servers as needed
server 0.centos.pool.ntp.org iburst
server 1.centos.pool.ntp.org iburst
server 2.centos.pool.ntp.org iburst
server 3.centos.pool.ntp.org iburst
Note: To save and exit the vi editor, press ESC, type :wq, and press Enter.
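On clusters with many machines, editing the file by hand in vi on every node can be tedious. A non-interactive alternative might look like the sketch below. The function name `add_chrony_servers` is an assumption of this example, and it only appends missing `server` lines without removing existing ones.

```shell
# Append "server HOST iburst" to a chrony config for each HOST that is not
# already listed as a server, so the function can be re-run safely.
add_chrony_servers() {
    conf=$1
    shift
    for host in "$@"; do
        if ! grep -qE "^server[[:space:]]+$host([[:space:]]|\$)" "$conf" 2>/dev/null; then
            echo "server $host iburst" >> "$conf"
        fi
    done
}
```

For instance, `add_chrony_servers /etc/chrony.conf 0.centos.pool.ntp.org 1.centos.pool.ntp.org` run as root would add the two pool servers only if they are not present yet.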
- Restart the service to apply the new configuration:
Terminal input:
systemctl restart chronyd
- Check the synchronization status:
Terminal input:
chronyc tracking
chronyc sources
Figure 5 - Configure Chrony
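The status check can also be scripted, for example to verify every machine in the cluster in one pass. The sketch below reads `chronyc tracking` output from standard input and inspects its "Leap status" field, which chrony reports as "Not synchronised" while the clock has no valid source. The function name `chrony_is_synced` is hypothetical.

```shell
# Read `chronyc tracking` output on stdin. Succeed when a "Leap status"
# line is present and its value is anything other than "Not synchronised".
chrony_is_synced() {
    status=$(sed -n 's/^Leap status[[:space:]]*:[[:space:]]*//p')
    [ -n "$status" ] && [ "$status" != "Not synchronised" ]
}
```

A possible use is `chronyc tracking | chrony_is_synced && echo "clock synchronized"`.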