Transform functions
add_column.date
Module for all the date-related operations on a dataframe.
add_column.date.date
Extract the date from a timestamp into a new column.
- Parameters
- from_column (str) – source column
- to_column (date) – destination column, holding from_column cast to a DateType
add_column.date.dayofmonth
Extract the day of the month from a timestamp into a new column.
- Parameters
- from_column (date) – source column
- to_column (int) – destination column, representing the day of month of the from_column
add_column.date.month
Extract the month from a timestamp into a new column.
- Parameters
- from_column (date) – source column
- to_column (int) – destination column, representing the month of the from_column
add_column.date.unixtime_to_utcz
Convert a unix timestamp into a formatted UTC timestamp string in a new column.
- Parameters
- from_column (int) – source column
- to_column (str) – destination column, string formatted as yyyy-MM-dd'T'HH:mm:sssZ
add_column.date.year
Extract the year from a timestamp into a new column.
- Parameters
- from_column (date) – source column
- to_column (int) – destination column, representing the year of the from_column
transform
Performs Spark transformation operations.
transform.cast_column
Return the DF with the column cast to a new type, keeping the columns in the same order.
- Parameters
- col (str) – name of the column
- new_type (T) – type of the column
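A minimal sketch of a cast_column entry, following the layout of the other examples in this reference. The column name price is hypothetical, and writing the type as the Spark type name integer is an assumption, since this reference does not state how new_type is spelled in configuration.

```yaml
SectionName:
  Type: transform::generic
  Input: InputBlock
  Properties:
    Functions:
      - cast_column:
          col: price          # hypothetical column name
          new_type: integer   # assumed spelling of the target type
```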
transform.concat
Concatenate columns with a delimiter and return the concatenated column.
- Parameters
- from_columns (list) – list of column names
- to_column (str) – destination column
- delimiter (str) – the delimiter between each column, default _
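A possible concat entry, sketched from the parameters listed above and the layout of the other examples in this reference. The column names first_name, last_name, and full_name are hypothetical.

```yaml
SectionName:
  Type: transform::generic
  Input: InputBlock
  Properties:
    Functions:
      - concat:
          from_columns: [first_name, last_name]  # hypothetical source columns
          to_column: full_name                   # hypothetical destination column
          delimiter: '-'                         # overrides the default _
```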
transform.drop_duplicates
Drop duplicate rows in the dataframe.
- Parameters
- columns (list, optional) – list of column names to deduplicate on; defaults to all columns
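A sketch of a drop_duplicates entry that restricts the uniqueness check to two columns. The column names device_id and event_time are hypothetical; the surrounding layout mirrors the other examples in this reference.

```yaml
SectionName:
  Type: transform::generic
  Input: InputBlock
  Properties:
    Functions:
      - drop_duplicates:
          columns: [device_id, event_time]  # hypothetical column names
```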
transform.explode
Explode a list in a cell into many rows in the dataframe.
- Parameters
- col (str) – name of the column to explode
- new_col (str) – name of the new column to explode to; may be the exploded column itself
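A sketch of an explode entry, assuming a hypothetical list column tags whose elements should each end up in their own row under a hypothetical new column tag.

```yaml
SectionName:
  Type: transform::generic
  Input: InputBlock
  Properties:
    Functions:
      - explode:
          col: tags      # hypothetical column holding a list per row
          new_col: tag   # hypothetical column receiving one element per row
```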
transform.filter_dataframe
Apply a filter to the DF, filtering out (removing) rows satisfying the specified condition.
transform.get_item
Return DF with a column that contains one item from an array.
- Parameters
- col (str) – name of the column
- new_col (str) – name of the new column
- index (any) – the index key
Examples:
    SectionName:
      Type: transform::generic
      Input: InputBlock
      Properties:
        Functions:
          - get_item:
              col: name
              new_col: firstname
              index: 2
transform.get_json_object
Return DF with a column that is a value extracted from a JSON object column.
- Parameters
- col (str) – name of the JSON column
- new_col (str) – name of the new column
- path (any) – the path key
Examples:
    SectionName:
      Type: transform::generic
      Input: InputBlock
      Properties:
        Functions:
          - get_json_object:
              col: context
              new_col: context_type
              path: type
transform.join
Return a joined DF.
- Parameters
- TODO (undefined) – unclear how this works
transform.rename_column
Return DF with the column renamed and with the columns in the same order.
- Parameters
- col (str) – name of the column
- new_name (str) – new name of the column
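A sketch of a rename_column entry built from the two parameters above. The new name device_name is hypothetical; DeviceName is borrowed from the where example in this reference.

```yaml
SectionName:
  Type: transform::generic
  Input: InputBlock
  Properties:
    Functions:
      - rename_column:
          col: DeviceName
          new_name: device_name  # hypothetical new name
```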
transform.select
Select columns mentioned in cols argument and apply renaming/casting transformations if any.
- Parameters
- cols (list) – list of columns
Columns
- Parameters
- col (str) – name of the column
- add_new_column (bool) – add new column, default false
- alias (str) – set alias for column
- cast (str) – cast column to type
- default_value (str) – set the default value of the column
If add_new_column is true, missing columns are added with None values.
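A sketch of how a select entry might combine the column options described above. The nesting of the per-column options under cols, the column names, and the type name string are all assumptions; this reference does not show the exact layout.

```yaml
SectionName:
  Type: transform::generic
  Input: InputBlock
  Properties:
    Functions:
      - select:
          cols:
            - col: device_id          # hypothetical existing column
              alias: id               # rename it in the output
              cast: string            # assumed spelling of the target type
            - col: region             # hypothetical column that may be missing
              add_new_column: true    # add it if it is absent
              default_value: unknown  # value to use for the column
```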
transform.split
Return DF with a column that is the result of splitting a column on a given character.
- Parameters
- col (str) – name of the column
- new_col (str) – name of the new column
- split_on (str) – split the string on this char
Examples:
    SectionName:
      Type: transform::generic
      Input: InputBlock
      Properties:
        Functions:
          - split:
              col: name
              new_col: firstname
              split_on: ' '
transform.substring
Return DF with a column that is a substring of the given column, with the columns in the same order.
- Parameters
- col (str) – name of the column
- new_col (str) – name of the new column
- pos (int) – substring starts at pos
- length (int) – length of substring
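A sketch of a substring entry that takes four characters from a hypothetical event_date column. Whether pos is counted from 0 or from 1 is not stated in this reference, so the value below is illustrative only.

```yaml
SectionName:
  Type: transform::generic
  Input: InputBlock
  Properties:
    Functions:
      - substring:
          col: event_date      # hypothetical source column
          new_col: event_year  # hypothetical destination column
          pos: 1               # start position; indexing base is not documented here
          length: 4
```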
transform.union
Return union of DFs.
- Parameters
- TODO (undefined) – unclear how this works
transform.where
Apply where to the DF and return rows satisfying the specified condition.
Note: Column names with special characters like '.' and '-' must be escaped with backticks (` `).
Example: payload.attributes.`plant-id` (escaping the hyphen)
- Parameters
- predicate (PredicateType) – the predicate
PredicateType
PredicateType consists of a list with 3 string values.
Examples:
    SectionName:
      Type: transform::generic
      Input: InputBlock
      Properties:
        Functions:
          - where:
              predicate: [DeviceName, '!=', 'null']
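A sketch of a predicate that uses the backtick escaping described in the note above, reusing the payload.attributes.`plant-id` column from that note. Quoting the whole column reference as a YAML string is an assumption made to keep the backticks intact.

```yaml
SectionName:
  Type: transform::generic
  Input: InputBlock
  Properties:
    Functions:
      - where:
          # the hyphenated field name is wrapped in backticks
          predicate: ['payload.attributes.`plant-id`', '!=', 'null']
```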