Transform functions

add_column.date

Module for all date-related operations on a dataframe.

add_column.date.date

Extract the date from a timestamp into a new column.

Parameters
  • from_column (str) – source column
  • to_column (date) – destination column, holding from_column cast to a DateType
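
As a rough sketch of how this function might appear in a configuration, the fragment below mirrors the layout of the other examples in this document. The section name, the column names, and in particular the dotted function key add_column.date.date are assumptions for illustration; the source does not show how date functions are referenced.

```yaml
# Hypothetical section; the dotted function key is an assumption.
ExtractDate:
    Type: transform::generic
    Input: InputBlock
    Properties:
    Functions:
        - add_column.date.date:
            from_column: created_at    # hypothetical timestamp column
            to_column: created_date    # hypothetical destination column
```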

add_column.date.dayofmonth

Extract the day of the month from a timestamp into a new column.

Parameters
  • from_column (date) – source column
  • to_column (int) – destination column, representing the day of month of the from_column

add_column.date.month

Extract the month from a timestamp into a new column.

Parameters
  • from_column (date) – source column
  • to_column (int) – destination column, representing the month of the from_column

add_column.date.unixtime_to_utcz

Convert a unix timestamp into a formatted UTC timestamp string in a new column.

Parameters
  • from_column (int) – source column
  • to_column (str) – destination column, string formatted as yyyy-MM-dd'T'HH:mm:sssZ
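
A minimal configuration sketch, assuming the function is referenced by its dotted name in the same way the transform functions are referenced by theirs. The section name and both column names are hypothetical.

```yaml
# Hypothetical section; the dotted function key is an assumption.
ConvertEventTime:
    Type: transform::generic
    Input: InputBlock
    Properties:
    Functions:
        - add_column.date.unixtime_to_utcz:
            from_column: event_time        # hypothetical integer unix timestamp column
            to_column: event_time_utc      # hypothetical formatted string column
```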

add_column.date.year

Extract the year from a timestamp into a new column.

Add a year column from a date column

Parameters
  • from_column (date) – source column
  • to_column (int) – destination column, representing the year of the from_column

transform

Performs Spark transformation operations.

transform.cast_column

Return DF with the column cast to new type and with the columns in the same order.

Parameters
  • col (str) – name of the column
  • new_type (T) – type of the column
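
The fragment below sketches a possible cast_column call in the same layout as the other examples in this document. The section and column names are hypothetical, and writing the type as the string double is an assumption; the source only states that new_type is the type of the column.

```yaml
# Hypothetical section and column names.
CastPrice:
    Type: transform::generic
    Input: InputBlock
    Properties:
    Functions:
        - cast_column:
            col: price
            new_type: double    # assumed spelling of the target type
```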

transform.concat

Concatenate columns with delimiter and return concatenated column

Parameters
  • from_columns (list) – list of column names
  • to_column (str) – destination column
  • delimiter (str) – the delimiter between each column, default _
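
A short sketch of a concat call, following the layout of the other examples in this document. The section and column names are hypothetical; delimiter is set explicitly here, and omitting it should fall back to the documented default of _.

```yaml
# Hypothetical section and column names.
BuildFullName:
    Type: transform::generic
    Input: InputBlock
    Properties:
    Functions:
        - concat:
            from_columns: [first_name, last_name]
            to_column: full_name
            delimiter: ' '    # overrides the default "_"
```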

transform.drop_duplicates

Drop duplicates in the dataframe

Parameters
  • columns (list, optional) – list of column names to make unique, default takes all columns
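
As an illustration, the following sketch restricts deduplication to two columns. The section and column names are hypothetical, and the layout mirrors the other examples in this document.

```yaml
# Hypothetical section and column names.
DeduplicateEvents:
    Type: transform::generic
    Input: InputBlock
    Properties:
    Functions:
        - drop_duplicates:
            columns: [device_id, event_time]    # rows are unique on this pair
```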

transform.explode

Explode a list in a cell to many rows in the dataframe

Parameters
  • col (str) – name of the column to explode
  • new_col (str) – name of the column to explode into; may be the same as the exploded column
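
A minimal sketch of an explode call, assuming a hypothetical column tags that holds a list in each cell. Each list element would end up in its own row under the hypothetical column tag.

```yaml
# Hypothetical section and column names.
ExplodeTags:
    Type: transform::generic
    Input: InputBlock
    Properties:
    Functions:
        - explode:
            col: tags       # column holding a list per cell
            new_col: tag    # one list element per row
```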

transform.filter_dataframe

Apply a filter to the DF, removing the rows that satisfy the specified condition.

transform.get_item

Return DF with a column that contains one item from an array

Parameters
  • col (str) – name of the column
  • new_col (str) – name of the new column
  • index (any) – the index key

Examples:

SectionName:
    Type: transform::generic
    Input: InputBlock
    Properties:
    Functions:
        - get_item:
            col: name
            new_col: firstname
            index: 2

transform.get_json_object

Return DF with a column that is a value extracted from a json object column.

Parameters
  • col (str) – name of the json column
  • new_col (str) – name of the new column
  • path (any) – the path key

Examples:

SectionName:
    Type: transform::generic
    Input: InputBlock
    Properties:
    Functions:
        - get_json_object:
            col: context
            new_col: context_type
            path: type

transform.join

Return a joined DF.

Parameters
  • TODO (undefined) – unclear how this works

transform.rename_column

Return DF with the column renamed and with the columns in the same order.

Parameters
  • col (str) – name of the column
  • new_name (str) – new name of the column
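
The sketch below shows what a rename_column call could look like in the layout used by the other examples in this document. The section name and the new column name are hypothetical.

```yaml
# Hypothetical section name and target name.
RenameDevice:
    Type: transform::generic
    Input: InputBlock
    Properties:
    Functions:
        - rename_column:
            col: DeviceName
            new_name: device_name
```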

transform.select

Select columns mentioned in cols argument and apply renaming/casting transformations if any.

Parameters
  • cols (list) – list of columns

Columns

Parameters
  • col (str) – name of the column
  • add_new_column (bool) – add new column, default false
  • alias (str) – set alias for column
  • cast (str) – cast column to type
  • default_value (str) – set the default value of the column

If add_new_column is true, missing columns are added with None values.
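
To make the per-column options concrete, here is a hedged sketch of a select call. It assumes that cols is a list of mappings, one per column, using the keys from the Columns parameter list; the source does not show the exact shape. The section name, column names, and cast type are hypothetical.

```yaml
# Hypothetical section; the list-of-mappings shape of cols is an assumption.
SelectColumns:
    Type: transform::generic
    Input: InputBlock
    Properties:
    Functions:
        - select:
            cols:
                - col: device_id              # selected unchanged
                - col: DeviceName
                  alias: device_name          # renamed on selection
                - col: price
                  cast: double                # assumed spelling of the type
                - col: region
                  add_new_column: true        # added if missing from the input
                  default_value: unknown
```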

transform.split

Return DF with a column that is the result of splitting a column on a given character

Parameters
  • col (str) – name of the column
  • new_col (str) – name of the new column
  • split_on (str) – split the string on this character

Examples:

SectionName:
    Type: transform::generic
    Input: InputBlock
    Properties:
    Functions:
        - split:
            col: name
            new_col: firstname
            split_on: ' '

transform.substring

Return DF with a column that is a substring of the given column and with the columns in the same order.

Parameters
  • col (str) – name of the column
  • new_col (str) – name of the new column
  • pos (int) – substring starts at pos
  • length (int) – length of substring
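
A small sketch of a substring call in the layout of the other examples in this document. The section and column names are hypothetical. The value of pos assumes positions are 1-based, as in Spark's own substring function; the source does not say which convention applies here.

```yaml
# Hypothetical section and column names.
TakeInitial:
    Type: transform::generic
    Input: InputBlock
    Properties:
    Functions:
        - substring:
            col: name
            new_col: initial
            pos: 1       # assumed 1-based start position
            length: 1
```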

transform.union

Return union of DFs.

Parameters
  • TODO (undefined) – unclear how this works

transform.where

Apply where to the DF and return the rows satisfying the specified condition.

Note: Column names with special characters like '.' and '-' must be escaped with backticks (` `). Example: payload.attributes.`plant-id` (escaping the hyphen).

Parameters
  • predicate (PredicateType) – the predicate

PredicateType

PredicateType consists of a list of 3 string values; in the example below these are a column name, a comparison operator, and a value to compare against.

Examples:

SectionName:
    Type: transform::generic
    Input: InputBlock
    Properties:
    Functions:
        - where:
            predicate: [DeviceName, '!=', 'null']
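
The escaping rule from the note can be combined with a predicate as sketched below. The section name is hypothetical, and the column path is the one used in the note. The column name is quoted so that YAML reads the backticks as part of a plain string.

```yaml
# Hypothetical section; column path taken from the note above.
FilterByPlant:
    Type: transform::generic
    Input: InputBlock
    Properties:
    Functions:
        - where:
            predicate: ['payload.attributes.`plant-id`', '!=', 'null']
```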