Asset Health Insights Documentation

A transformation pipeline is a Directed Acyclic Graph (DAG) of transformation functions. Users select the transformation functions they need through a drag-and-drop interface, configure each one in its settings block, and connect the blocks to define how data flows through the pipeline. This layout makes the order of the transformations, and how they interact, easy to see and change.
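The DAG structure described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the product's implementation: the names `run_pipeline`, `merge_rows`, `blocks`, and `parents` are hypothetical, each block is assumed to be a function over a list of row dictionaries, and a block with several parents is assumed to receive their outputs merged row by row.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def merge_rows(frames):
    """Combine the outputs of several parent blocks row by row.

    Assumption: all parents produce the same number of rows; on a key
    clash, the later parent in the list wins.
    """
    merged = []
    for rows in zip(*frames):
        combined = {}
        for row in rows:
            combined.update(row)
        merged.append(combined)
    return merged


def run_pipeline(blocks, parents, source_rows):
    """Run every block after all of its parent blocks have run.

    blocks:  block name -> function(list of row dicts) -> list of row dicts
    parents: block name -> list of parent block names (the DAG edges)
    """
    graph = {name: parents.get(name, []) for name in blocks}
    outputs = {}
    # static_order() yields parents before children and raises
    # graphlib.CycleError if the connections form a cycle.
    for name in TopologicalSorter(graph).static_order():
        upstream = graph[name]
        if upstream:
            data = merge_rows([outputs[p] for p in upstream])
        else:
            data = source_rows  # blocks without parents read the source data
        outputs[name] = blocks[name](data)
    return outputs


# Hypothetical four-block pipeline: one input, two transformations, one output.
blocks = {
    "input": lambda rows: rows,
    "double_att0": lambda rows: [{"att0": r["att0"] * 2} for r in rows],
    "negate_att1": lambda rows: [{"att1": -r["att1"]} for r in rows],
    "output": lambda rows: rows,
}
parents = {
    "input": [],
    "double_att0": ["input"],
    "negate_att1": ["input"],
    "output": ["double_att0", "negate_att1"],
}
source = [{"att0": 1, "att1": 2}, {"att0": 3, "att1": 4}]

result = run_pipeline(blocks, parents, source)
print(result["output"])  # [{'att0': 2, 'att1': -2}, {'att0': 6, 'att1': -4}]
```

Sorting the graph topologically guarantees that a block never runs before the blocks it is connected to, which is the property the acyclic constraint exists to provide.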

To edit the configuration of a block, click the icon.

The table below shows the configurable items for each block.

Block	Configurable Item	Description
Input Data frame	Get from dataset	Accesses an existing dataset, represented in tabular format, and generates features from it.
Input Data frame	Features	Users can add rows to this table. Each row contains the following elements: Feature name: the name of the feature. Data type: the type of the data, which can be text, bool, float, or int. Feature type: whether the column is a feature column or a target column.
Transformation block	Features	This table displays the features received from the connected preceding blocks (referred to as “Father blocks”) and indicates which block each feature comes from. Users select the rows to process by ticking the checkbox at the beginning of each row. For example, if the block receives the features att0, att1, att2, and att3 and the user ticks only att0, att1, and att3, the block processes only those three.
Transformation block	Config	Some transformation functions have configurable variables, defined as environment variables, that control the function's behavior at runtime. Users can adjust these values to match the logic the block function requires.
Output Data frame	Features	Shows the columns expected from the whole pipeline under the current configuration, so users can preview what the pipeline will output.
Output Data frame	Config	A JSON file that contains information about the pipeline at runtime. The data pipeline uses this file to pass information to the training job.
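The items in the table can be illustrated with a short Python sketch. Everything here is an assumption made for illustration, not the product's actual API or file format: the `FeatureRow` class, the helper functions, the `SCALE_FACTOR` environment variable, the `pipeline_config.json` file name, and the JSON layout are all hypothetical.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class FeatureRow:
    """One row of a Features table (hypothetical representation)."""
    name: str              # Feature name
    dtype: str             # Data type: text, bool, float, or int
    role: str              # Feature type: "feature" or "target"
    selected: bool = True  # checkbox state in a Transformation block


def select_features(rows, data):
    """Keep only the columns whose checkbox is ticked."""
    keep = [r.name for r in rows if r.selected]
    return [{k: record[k] for k in keep} for record in data]


def read_block_config(name, default, environ=os.environ):
    """Read a block setting from an environment variable, with a fallback."""
    return environ.get(name, default)


def write_output_config(path, rows, extra):
    """Write runtime information to a JSON file that a training job can read."""
    payload = {"features": [asdict(r) for r in rows if r.selected], **extra}
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(payload, fh, indent=2)
    return payload


# The block receives att0..att3, but only att0, att1, and att3 are ticked.
features = [
    FeatureRow("att0", "float", "feature"),
    FeatureRow("att1", "float", "feature"),
    FeatureRow("att2", "float", "feature", selected=False),
    FeatureRow("att3", "int", "target"),
]
data = [{"att0": 0.5, "att1": 1.5, "att2": 2.5, "att3": 1}]

selected = select_features(features, data)
print(selected)  # [{'att0': 0.5, 'att1': 1.5, 'att3': 1}]

# A stand-in environment is passed so the example does not depend on the shell.
scale = float(read_block_config("SCALE_FACTOR", "1.0",
                                environ={"SCALE_FACTOR": "2.5"}))

with tempfile.TemporaryDirectory() as tmp:
    config_path = os.path.join(tmp, "pipeline_config.json")
    write_output_config(config_path, features, {"scale_factor": scale})
    with open(config_path, encoding="utf-8") as fh:
        loaded = json.load(fh)

print([f["name"] for f in loaded["features"]])  # ['att0', 'att1', 'att3']
```

Keeping the selection state on each feature row means the same list can drive both the column filter and the JSON handed to the training job, so the two cannot drift apart.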