Terminology
-
Action: An "action" consists of one or more tasks on a machine or a group of machines. Each action is tracked by an ID, and the nodes report status at least at action granularity. An action can be considered a running step. In this documentation, a stage and an action have a one-to-one correspondence unless otherwise specified. An action ID corresponds one-to-one with a (request-id, stage-id) pair.
-
Component: A service consists of one or more components. For example, HDFS has 3 components: NameNode, Secondary NameNode, and DataNode. Components can be optional. A component can span multiple nodes (for example, instances of the DataNode component on multiple nodes).
-
Manifest: The manifest refers to the definition of a task that is sent to a node for execution. The manifest must completely define the task and must be serializable. The manifest can also be persisted on disk for recovery or logging.
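The serializability requirement can be illustrated with a minimal Python sketch. The field names (action_id, host, role, command) and the choice of JSON are assumptions made only for illustration; this document does not define the manifest's actual layout or wire format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Manifest:
    # Hypothetical fields; the real manifest layout is not specified here.
    action_id: str
    host: str
    role: str
    command: str

    def to_json(self) -> str:
        # Serialize to a string that can be sent to a node or written to disk.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "Manifest":
        # Rebuild the manifest from its serialized form (for example, on recovery).
        return cls(**json.loads(text))


manifest = Manifest(action_id="1-2", host="N1", role="DataNode", command="INSTALL")
restored = Manifest.from_json(manifest.to_json())
```

Because the round trip reproduces an equal object, the serialized form carries everything needed to define the task, which is the property the definition asks for.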
-
Role: A role maps to either a component (for example, NameNode) or an action (for example, HDFS rebalancing or an HBase smoke test).
-
Node (or Host): Node refers to a machine (physical or virtual) in the Cluster. Node and host are used interchangeably in this documentation.
-
Operation: An operation refers to a set of changes or actions performed on a cluster to satisfy a user request or to achieve a desired state change in the cluster. For example, starting a service and running a smoke test are operations. If a user request to add a new service to the cluster also includes running a smoke test, the entire set of actions needed to fulfill that request constitutes a single operation. An operation can consist of multiple ordered "actions".
-
Service: A service is one of the services offered by the Service Platform, such as HDFS, YARN, Spark, or Kafka. A service can have multiple components (for example, HDFS has NameNode, DataNode, and others) or consist of only a client library (for example, Sqoop has no daemon, just a client library).
-
Stage: A stage refers to a set of tasks that are required to complete an operation and that are independent of each other. All tasks in the same stage can be executed on different nodes in parallel.
-
Stage Plan: An operation usually consists of multiple tasks on multiple machines, and these tasks often have dependencies that require them to be executed in a specific order. Some tasks must be completed before others can be scheduled. Therefore, the tasks required for an operation can be divided into several stages, where each stage must be completed before the next stage begins, but all tasks in the same stage can be scheduled in parallel on different nodes.
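One way to derive such a stage plan is to group tasks into dependency levels, as in the Python sketch below. The function name build_stage_plan, the dictionary input format, and the example dependencies are hypothetical; this is not presented as how the system actually computes its plans.

```python
from typing import Dict, List, Set


def build_stage_plan(dependencies: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into ordered stages.

    `dependencies` maps each task to the set of tasks that must finish first.
    """
    remaining = {task: set(prereqs) for task, prereqs in dependencies.items()}
    # Prerequisites that were never listed as keys are tasks with no prerequisites.
    for prereqs in dependencies.values():
        for prereq in prereqs:
            remaining.setdefault(prereq, set())

    stages: List[List[str]] = []
    while remaining:
        # Every task whose prerequisites are all done can run in this stage.
        ready = sorted(task for task, prereqs in remaining.items() if not prereqs)
        if not ready:
            raise ValueError("cyclic dependency among tasks")
        stages.append(ready)
        for task in ready:
            del remaining[task]
        for prereqs in remaining.values():
            prereqs.difference_update(ready)
    return stages


# Hypothetical dependencies, for illustration only.
plan = build_stage_plan({
    "install NameNode on N1": set(),
    "install DataNode on N2": set(),
    "start NameNode on N1": {"install NameNode on N1"},
    "start DataNode on N2": {"install DataNode on N2", "start NameNode on N1"},
})
```

In this example the two install tasks share the first stage because neither depends on the other, while the two start tasks fall into later, separate stages.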
-
Task: A task is the unit of work sent to a node for execution. It is the work that the node has to perform as part of an action. For example, an "action" can consist of installing a DataNode on node N1 and installing a DataNode and a Secondary NameNode on node N2. In this case, the "task" for N1 is to install a DataNode, and the "tasks" for N2 are to install a DataNode and a Secondary NameNode.
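The N1/N2 example can be written as a small Python sketch that splits one action into per-node tasks. The representation of an action as a list of (node, component) pairs and the helper name tasks_by_node are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# The action from the example: (node, component to install) pairs.
action: List[Tuple[str, str]] = [
    ("N1", "DataNode"),
    ("N2", "DataNode"),
    ("N2", "Secondary NameNode"),
]


def tasks_by_node(pairs: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Return the tasks each node must perform as part of the action."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for node, component in pairs:
        grouped[node].append(f"install {component}")
    return dict(grouped)


tasks = tasks_by_node(action)
```

Here N1 receives one task and N2 receives two, matching the description above.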