Environment Preparation

The following procedures should be performed on all machines in the Big Data Cluster.

Firewall Deactivation

  1. Disable the Firewall service:

    Terminal input
    systemctl stop firewalld;
    systemctl disable firewalld;
    warning

    If the firewall cannot be disabled, create iptables rules that allow the machines to communicate with each other on the ports used by the TDP Platform services.
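
For clusters where the firewall must stay enabled, the rules mentioned in the warning could be generated along the following lines. This is a minimal sketch, not a definitive rule set: the peer addresses and port numbers are placeholders chosen for illustration (they are not the actual TDP service ports), and the function `print_allow_rules` is a hypothetical helper that only prints the commands so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print iptables rules that accept TCP traffic from each
# cluster peer on each service port. Nothing is applied to the system.
# The addresses and ports used below are placeholders, not real TDP values.

print_allow_rules() {
    local peers="$1"   # space-separated peer IP addresses
    local ports="$2"   # space-separated TCP ports
    local peer port
    for peer in $peers; do
        for port in $ports; do
            echo "iptables -I INPUT -p tcp -s ${peer} --dport ${port} -j ACCEPT"
        done
    done
}

# Example with hypothetical values: two peers, two ports.
print_allow_rules "10.0.0.11 10.0.0.12" "2181 8020"
```

Once the printed commands have been reviewed, they can be run as root on each machine. Rules added this way do not survive a reboot unless they are saved with the mechanism your distribution provides.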

SELinux Deactivation

  1. Temporarily disable SELinux:

    Terminal input
    setenforce 0;
  2. Permanently disable SELinux:

    Terminal input
    sed -i --follow-symlinks "s+SELINUX=enforcing+SELINUX=disabled+g" /etc/selinux/config;
  3. Restart the machine to apply the permanent SELinux deactivation.
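
The `sed` command in step 2 only rewrites a line that reads `SELINUX=enforcing`, so a machine already set to `permissive` would be left unchanged. A possible variant that handles either value is sketched below. The function name `disable_selinux_in` is hypothetical, and the example runs against a scratch file rather than the live `/etc/selinux/config`.

```shell
#!/usr/bin/env bash
# Sketch: set the SELINUX= line in a config file to "disabled", whatever
# its current value (enforcing or permissive). The file path is a
# parameter so the function can be tried on a copy first.

disable_selinux_in() {
    local config="$1"
    # Rewrite only the uncommented SELINUX= line; SELINUXTYPE= is untouched
    # because the pattern requires "=" right after "SELINUX".
    sed -i --follow-symlinks 's/^SELINUX=.*/SELINUX=disabled/' "$config"
}

# Try it on a scratch file rather than the live configuration.
scratch="$(mktemp)"
printf '# comment\nSELINUX=permissive\nSELINUXTYPE=targeted\n' > "$scratch"
disable_selinux_in "$scratch"
grep '^SELINUX' "$scratch"
```

On a real machine the call would be `disable_selinux_in /etc/selinux/config`, run as root. After the reboot, `getenforce` should print `Disabled`.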

Kernel Parameter Configuration

  1. Minimize the use of swap space:

    Terminal input
    echo 'vm.swappiness=1' >> /etc/sysctl.conf;
    sysctl -p /etc/sysctl.conf;
    note

    The Linux kernel exposes a tunable called swappiness that controls how readily memory is moved to swap space on disk. A value of 0 tells the kernel to avoid swapping unless absolutely necessary (when the server is running out of memory), while a value of 100 makes it swap aggressively. Lowering swappiness reduces the likelihood that the kernel moves an application's memory to swap space. Swap space is much slower than RAM because it resides on disk, so processes whose memory is swapped out can experience pauses, which can cause problems in services such as Apache ZooKeeper.

    Figure 1 - Swap minimize
  2. Change the default memory allocation behavior:

    Terminal input
    echo 'vm.overcommit_memory=1' >> /etc/sysctl.conf;
    sysctl -p /etc/sysctl.conf;
    note

    In Big Data environments, where machines commonly have large amounts of RAM, we recommend setting overcommit_memory to 1. With the default value 0, the kernel applies a heuristic to each memory request (malloc) and may refuse requests it judges excessive. With the value 1, the kernel always grants the request on the assumption that enough memory is available, which can improve the performance of memory-intensive tasks.

    Figure 2 - Change RAM default
  3. Disable Transparent Huge Pages (THP):

    Terminal input
    echo never > /sys/kernel/mm/transparent_hugepage/enabled;
    echo never > /sys/kernel/mm/transparent_hugepage/defrag;
    echo "echo never > /sys/kernel/mm/transparent_hugepage/enabled" >> /etc/rc.local;
    echo "echo never > /sys/kernel/mm/transparent_hugepage/defrag" >> /etc/rc.local;
    # rc.local only runs at boot if it is executable
    chmod +x /etc/rc.local;
    note

    Many Linux distributions enable THP as a low-effort way to use larger memory pages (typically 2 MB instead of 4 KB), which makes it more efficient to manage many gigabytes, or even terabytes, of RAM. However, Big Data workloads often perform poorly with THP enabled because their memory access patterns tend to be sparse rather than contiguous, which adds CPU overhead.

    Figure 3 - THP disable
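
Because the steps above append to `/etc/sysctl.conf` with `>>`, running them twice leaves duplicate lines in the file. The sketch below shows one way to make the change repeatable and to check the active THP mode afterwards. Both function names (`set_sysctl_key`, `thp_active_mode`) are hypothetical helpers introduced for illustration, and the example works on scratch files instead of the real system paths.

```shell
#!/usr/bin/env bash
# Sketch: helpers around the kernel settings above. File paths are
# parameters so both functions can be exercised on scratch files.

# Persist key=value in a sysctl-style file without duplicating the key:
# replace the existing line if there is one, otherwise append.
set_sysctl_key() {
    local file="$1" key="$2" value="$3"
    # Escape the dots in the key so they match literally in the regex.
    local key_re="${key//./\\.}"
    if grep -q "^${key_re}[[:space:]]*=" "$file" 2>/dev/null; then
        sed -i "s|^${key_re}[[:space:]]*=.*|${key}=${value}|" "$file"
    else
        echo "${key}=${value}" >> "$file"
    fi
}

# Print the active THP mode. The kernel marks it with brackets,
# for example "always madvise [never]" means the active mode is "never".
thp_active_mode() {
    local file="$1"
    sed -n 's/.*\[\(.*\)\].*/\1/p' "$file"
}

# Example on scratch files.
conf="$(mktemp)"
echo 'vm.swappiness=60' > "$conf"
set_sysctl_key "$conf" vm.swappiness 1
set_sysctl_key "$conf" vm.overcommit_memory 1
cat "$conf"

thp="$(mktemp)"
echo 'always madvise [never]' > "$thp"
thp_active_mode "$thp"
```

On a cluster machine, the equivalent calls would be `set_sysctl_key /etc/sysctl.conf vm.swappiness 1` followed by `sysctl -p /etc/sysctl.conf`, and `thp_active_mode /sys/kernel/mm/transparent_hugepage/enabled` to confirm that THP reports `never`.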

Time Synchronization

  1. Install a time synchronization service (NTP) to ensure that the date and time of the machines are always synchronized, preventing inconsistencies in data and services. In this example, we will use Chrony:

    Terminal input
    yum install chrony -y
    systemctl enable chronyd
    systemctl start chronyd
    Figure 4 - Install time synchronization
  2. Configure Chrony to synchronize with appropriate NTP servers:

    Terminal input
    vi /etc/chrony.conf

    # Add or adjust the NTP servers as needed

    server 0.centos.pool.ntp.org iburst
    server 1.centos.pool.ntp.org iburst
    server 2.centos.pool.ntp.org iburst
    server 3.centos.pool.ntp.org iburst
    note

    To save and exit the vi editor, press ESC, type :wq, and press Enter.

  3. Restart the service to apply the new configurations:

    Terminal input
    systemctl restart chronyd
  4. Check the synchronization status:

    Terminal input
    chronyc tracking
    chronyc sources
    Figure 5 - Configure Chrony
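
When the check in step 4 has to be repeated on many machines, reading the output by eye becomes tedious. The sketch below shows one way to turn it into a pass/fail test by looking at the `Leap status` line that `chronyc tracking` prints. The function name `is_clock_synchronized` is hypothetical, and treating only `Normal` as synchronized is a simplification: chronyd can also report a pending leap second on that line while being in sync.

```shell
#!/usr/bin/env bash
# Sketch: decide whether chronyd reports a synchronized clock by reading
# `chronyc tracking` output on stdin and checking the "Leap status" line.

is_clock_synchronized() {
    # "Leap status     : Normal" is reported once chronyd is synchronized;
    # an unsynchronized daemon reports "Not synchronised" instead.
    grep -Eq '^Leap status[[:space:]]*:[[:space:]]*Normal[[:space:]]*$'
}

# Usage on a cluster machine:
#   chronyc tracking | is_clock_synchronized && echo "in sync"
```

The function returns exit status 0 when the clock is reported as synchronized and a non-zero status otherwise, so it can be combined with `&&` and `||` in a loop over the cluster hosts.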