
Environment Preparation

The following procedures must be performed on all machines of the Big Data Cluster.
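Because every step below has to be repeated on each machine, it can help to drive the commands from a single administration host over SSH. The following is a minimal sketch, not part of the TDP tooling: it assumes passwordless root SSH access to every node, the host names in `CLUSTER_HOSTS` are hypothetical placeholders, and setting `DRY_RUN=1` only prints what would be executed.

```shell
#!/usr/bin/env bash
# Minimal sketch: run one command on every cluster machine over SSH.
# Assumptions (not from the TDP documentation): passwordless root SSH access
# and the hypothetical host names below; override CLUSTER_HOSTS with your own.
CLUSTER_HOSTS="${CLUSTER_HOSTS:-node1.example.com node2.example.com node3.example.com}"

run_on_all() {
  local host
  for host in $CLUSTER_HOSTS; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Only show what would be executed on this host.
      echo "ssh root@${host} $*"
    else
      echo "== ${host} =="
      # ssh joins the arguments and runs them through the remote shell.
      ssh "root@${host}" "$@"
    fi
  done
}
```

For example, `run_on_all 'systemctl stop firewalld; systemctl disable firewalld'` would apply the first step on every node listed in `CLUSTER_HOSTS`.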

Firewall Deactivation

  1. Disable the firewall service (firewalld):

    Terminal input
    systemctl stop firewalld;
    systemctl disable firewalld;
    warning

    If the firewall cannot be deactivated, create firewall rules (for example, with iptables) that allow communication between the cluster machines on the ports used by the TDP Platform services.
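The warning above mentions iptables. On RHEL/CentOS hosts where firewalld has to stay active, rules are usually managed through `firewall-cmd` instead, because firewalld rebuilds its rule set on reload and can discard rules added directly with iptables. The sketch below is one possible approach, not the TDP-prescribed one: it places the cluster's network range in firewalld's `trusted` zone so that traffic between cluster machines is accepted on all ports. The CIDR range is a placeholder, and `DRY_RUN=1` only prints the commands.

```shell
#!/usr/bin/env bash
# Minimal sketch (assumption: firewalld remains active on the host).
# Adds each CIDR range given as an argument to the "trusted" zone, which
# accepts all traffic from those sources. DRY_RUN=1 prints instead of running.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

trust_cluster_networks() {
  local cidr
  for cidr in "$@"; do
    # --permanent writes the rule to the saved configuration only.
    run firewall-cmd --permanent --zone=trusted --add-source="${cidr}"
  done
  # Reload so that the saved configuration becomes the running one.
  run firewall-cmd --reload
}
```

For example, `trust_cluster_networks 10.0.0.0/24` (a placeholder subnet) would trust every machine in that range. Opening only the specific service ports is stricter, but requires the port list of each TDP service.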

SELinux Deactivation

  1. Temporarily disable SELinux:

    Terminal input
    setenforce 0;
  2. Permanently disable SELinux:

    Terminal input
    sed -i --follow-symlinks "s/^SELINUX=.*/SELINUX=disabled/" /etc/selinux/config;
  3. Restart the machine to apply the permanent SELinux deactivation.
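To confirm both SELinux steps, compare the mode written in the configuration file with the mode currently in effect. Note that `setenforce 0` switches SELinux to permissive mode, so `getenforce` reports `Permissive` until the restart and `Disabled` afterwards. The helper names below are hypothetical; the config path argument defaults to `/etc/selinux/config`.

```shell
#!/usr/bin/env bash
# Minimal sketch: report the SELinux mode configured on disk and, when the
# getenforce tool is installed, the mode currently in effect.
selinux_config_mode() {
  local conf="${1:-/etc/selinux/config}"
  # Print the value of the last "SELINUX=" line; "SELINUXTYPE=" lines and
  # commented-out lines do not match the pattern.
  sed -n 's/^SELINUX=//p' "$conf" | tail -n 1
}

check_selinux() {
  local mode
  mode="$(selinux_config_mode "$1")"
  echo "configured: ${mode}"
  if command -v getenforce >/dev/null 2>&1; then
    # getenforce prints Enforcing, Permissive or Disabled.
    echo "running: $(getenforce)"
  fi
}
```

After the restart, `check_selinux` should report `configured: disabled` and, where `getenforce` is available, `running: Disabled`.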

Kernel Parameter Configuration

  1. Minimize the use of the swap area:

    Terminal input
    echo 'vm.swappiness=1' >> /etc/sysctl.conf;
    sysctl -p /etc/sysctl.conf;
    note

    The Linux kernel provides a tunable setting, called swappiness, that controls how aggressively memory is moved to the swap area on disk. A swappiness of 0 means the disk is avoided unless absolutely necessary (when the server runs out of memory), while a swappiness of 100 means programs' memory is moved to swap almost immediately. Reducing the swappiness value reduces the likelihood that the Linux kernel moves an application's memory to the swap area. Swap is much slower than RAM because it resides on disk, so processes whose memory has been moved to swap may pause while it is read back, which can cause problems in services such as Apache ZooKeeper. The value 1 keeps swapping to a minimum without disabling it entirely.

    Figure 1 - Minimize swap
  2. Change the default RAM allocation behavior:

    Terminal input
    echo 'vm.overcommit_memory=1' >> /etc/sysctl.conf;
    sysctl -p /etc/sysctl.conf;
    note

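    In Big Data environments, where machines commonly have large amounts of RAM, we recommend the value 1 for the overcommit_memory setting. With the default value 0, the kernel uses a heuristic to decide whether each memory allocation request (malloc) can be granted and refuses obvious overcommits. With the value 1, the kernel grants every allocation request without checking whether enough physical memory is available, which avoids allocation failures and can improve the performance of memory-intensive tasks. The trade-off is that allocations are no longer refused up front, so if memory is actually exhausted, the kernel's OOM killer terminates processes.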

    Figure 2 - Change default RAM
  3. Disable Transparent Huge Pages (THP):

    Terminal input
    echo never > /sys/kernel/mm/transparent_hugepage/enabled;
    echo never > /sys/kernel/mm/transparent_hugepage/defrag;
    echo "echo never > /sys/kernel/mm/transparent_hugepage/enabled" >> /etc/rc.local;
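    echo "echo never > /sys/kernel/mm/transparent_hugepage/defrag" >> /etc/rc.local;
    # On RHEL/CentOS 7 and later, rc.local only runs at boot if it is executable
    chmod +x /etc/rc.d/rc.local;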
    note

    Many Linux distributions enable THP, a low-complexity mechanism that transparently backs memory with larger pages (2 MB instead of the standard 4 KB on x86-64) to make it easier to manage many gigabytes, or even terabytes, of RAM. However, workloads in Big Data environments tend to perform poorly with THP enabled because their memory access patterns are sparse rather than contiguous, and the kernel's work to compact and defragment memory into huge pages adds CPU overhead and can cause latency spikes.

    Figure 3 - Disable THP
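After applying the three kernel settings (and again after a reboot, to confirm they persist), the values in effect can be read back from `/proc` and `/sys`. The sketch below compares them with the values recommended above. The function names are hypothetical, and the optional root-directory argument exists only so that the check can be pointed at a copy of those files; on a real node, call it without arguments.

```shell
#!/usr/bin/env bash
# Minimal sketch: compare the running kernel settings with the recommended
# values. Optional $1 is a root prefix (default: the real / filesystem).

# Print the active THP mode, i.e. the word in [brackets]:
# "always madvise [never]" -> "never".
thp_mode() {
  sed -n 's/.*\[\([a-z+]*\)].*/\1/p' "$1"
}

check_kernel_settings() {
  local root="${1:-}"
  local swappiness overcommit thp_enabled thp_defrag status=0
  swappiness="$(cat "${root}/proc/sys/vm/swappiness")"
  overcommit="$(cat "${root}/proc/sys/vm/overcommit_memory")"
  thp_enabled="$(thp_mode "${root}/sys/kernel/mm/transparent_hugepage/enabled")"
  thp_defrag="$(thp_mode "${root}/sys/kernel/mm/transparent_hugepage/defrag")"

  [ "$swappiness" = "1" ] || { echo "WARN: vm.swappiness=${swappiness} (expected 1)"; status=1; }
  [ "$overcommit" = "1" ] || { echo "WARN: vm.overcommit_memory=${overcommit} (expected 1)"; status=1; }
  [ "$thp_enabled" = "never" ] || { echo "WARN: THP enabled=${thp_enabled} (expected never)"; status=1; }
  [ "$thp_defrag" = "never" ] || { echo "WARN: THP defrag=${thp_defrag} (expected never)"; status=1; }

  # Report success only when every setting matched.
  [ "$status" -eq 0 ] && echo "OK: kernel settings match the recommended values"
  return "$status"
}
```

The function returns a non-zero status when any setting differs, so it can also gate later installation steps in a script.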

Time Synchronization

  1. Install a time synchronization service (NTP) to ensure that the date and time of the machines are always synchronized, preventing inconsistencies in data and services. In this example, we will use Chrony:

    Terminal input
    yum install chrony -y
    systemctl enable chronyd
    systemctl start chronyd
    Figure 4 - Install time synchronization
  2. Configure Chrony to synchronize with appropriate NTP servers:

    Terminal input
    vi /etc/chrony.conf

    # Add or adjust NTP servers as needed

    server 0.centos.pool.ntp.org iburst
    server 1.centos.pool.ntp.org iburst
    server 2.centos.pool.ntp.org iburst
    server 3.centos.pool.ntp.org iburst
    note

    To save and exit the vi editor, press ESC, type :wq and press Enter.

  3. Restart the service to apply the new settings:

    Terminal input
    systemctl restart chronyd
  4. Check the synchronization status:

    Terminal input
    chronyc tracking
    chronyc sources
    Figure 5 - Configure Chrony
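When checking many machines, it can be convenient to turn the output of `chronyc tracking` into a pass/fail result. The sketch below assumes the `Leap status` line of that output: chronyd reports `Not synchronised` when the clock is not synchronized, and a value such as `Normal` otherwise. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: succeed when the `chronyc tracking` output read from
# standard input shows a synchronized clock, fail otherwise.
chrony_synced() {
  local leap
  # Extract the value after "Leap status :" (spacing varies).
  leap="$(sed -n 's/^Leap status *: *//p')"
  # A missing line or "Not synchronised" both count as unsynchronized.
  [ -n "$leap" ] && [ "$leap" != "Not synchronised" ]
}
```

For example, `chronyc tracking | chrony_synced && echo "clock synchronized"` prints the message only on a synchronized node.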