Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

Unreleased

3.3.0 - 2023-04-05

3.2.4 - 2022-10-18

Fixed

s3_date_prefix_scan no longer raises an error when a path is not found

3.2.3 - 2022-10-10

Do not delete files; instead, overwrite the Delta table with mode overwrite in the delta_write block

3.2.2 - 2022-10-06

Bugfix: Add new exception for delta table read errors

3.2.1 - 2022-10-06

Add new exception for delta table read errors

3.2.0 - 2022-10-06

Upgrade Delta core and Delta storage to 2.1
Use bool param mergeSchema for all delta writes

3.1.0 - 2022-10-04

Changed

Test against only the latest two pyspark versions

3.0.0 - 2022-08-22

Breaking change: HiveTable.DatabaseName and HiveTable.TableName are mandatory
Add: Support for Databricks Unity Catalog
Change: The file registry date scan now uses SQL LIST for Unity Catalog (UC)

2.8.1 - 2021-12-17

Add possibility to define JSON schema with JSON or PySpark code when reading XML files

2.8.0 - 2021-12-13

Add possibility to define JSON schema with JSON or PySpark code when reading JSON files

2.7.3 - 2021-11-18

Add support to read the Delta change data feed when running on Databricks platform

2.7.2 - 2021-11-11

Add mergeSchema option to delta write block

2.7.1 - 2021-09-30

Changed

Set default logging formatter to detailed

Fixed

Fixed code smells in upsert.py

2.7.0 - 2021-08-30

Added

Support for pyspark 3.1

2.6.0 - 2021-08-17

Added

Support for retry in postgresql and mysql upsert

2.5.1 - 2021-06-28

Added

Log lift parameters and their values before starting a lift

2.5.0 - 2021-06-17

Added

Add secret word filter for logging

Fixed

Use f-string for logging statements
Update how to run test instructions
Make the mysql library a regular dependency instead of an extra

2.4.0 - 2021-05-18

Added

custom::sql block for executing SQL statements

Fixed

Some code smells reported by SonarCloud

2.3.0 - 2021-03-29

Added

get_json_object transform function

2.2.0 - 2021-03-26

Added

MySQL upsert support

Changed

Test against postgres versions 10, 11, 12 and 13

2.1.0 - 2021-03-23

Added

split transform function
get_item transform function

2.0.0 - 2021-02-24

Removed

Support for pyspark 2.4.5

1.11.0 - 2021-02-24

Added

Add substring transform function

1.10.1 - 2021-01-25

Changed

The fileregistry::delta_diff fileregistry will read all data if the default start date is before the first version of the delta table

1.10.0 - 2021-01-22

Added

The fileregistry::delta_diff fileregistry for delta files

1.9.2 - 2020-12-16

Added

Parameter resolution now also happens in substrings, e.g. "${myVar}/extra"

1.9.1 - 2020-12-08

Added

Add support for nested columns in drop_duplicates transform function

1.9.0 - 2020-12-02

Added

Multiple outputs in custom::python_codeblock

1.8.0 - 2020-11-11

Added

Add drop_duplicates transform function

1.7.1 - 2020-11-10

Allow a retention interval shorter than 7 days for delta tables

1.7.0 - 2020-10-30

Added

Write json files through write::batch_json block

Changed

Update dependency versions

1.6.3 - 2020-10-29

Fixed

Bugfix: Empty arrays were created with an array type that Spark 3 does not support, so a supported empty array type is now used instead

1.6.2 - 2020-10-29

Fixed

Bugfix for loading empty directories with batch_delta using spark 3.0

1.6.1 - 2020-10-27

Fixed

Bugfix for the Databricks optimize of the file registry after updating

1.6.0 - 2020-10-27

Changed

Python version requirements now include Python 3.9
Run Databricks optimize and vacuum on the file registry after updating

1.5.0 - 2020-10-23

Added

Options parameter in load::batch_json for passing additional settings when loading JSON files (e.g. multiLine: true)

1.4.3 - 2020-10-21

Changed

Use psycopg2.extras.execute_values to simplify and reduce code

Removed

Utils functions chunked and flatten_rows_dict in getl/common/upsert.py

Fixed

When checking if a file registry exists in an empty directory or in a S3 prefix that doesn't exist, a different exception is raised

1.4.2 - 2020-09-30

Critical bugfix for Hive table creation.

1.4.1 - 2020-09-30

Support for PartitionBy columns for HiveTable

1.4.0 - 2020-09-29

Added

Support for PartitionBy columns in write::batch_delta

1.3.0 - 2020-09-28

Added

Support for loading csv files with batch_csv

1.2.0 - 2020-09-09

Added

Explode functionality is added to the generic transform block

1.1.0 - 2020-09-03

Added

Postgres upsert support

Changed

Schema for batch_json and batch_xml is now optional

1.0.1 - 2020-08-24

Added

Support for s3a:// paths
python -m bin bumpversion updates CHANGELOG.md with the entry for a release
Documentation on how to release a new version

Fixed

Links to versions in CHANGELOG.md
Fix the fileregistry type in docs/migrations/s3_date_prefix_scan.md to fileregistry::s3_date_prefix_scan

1.0.0 - 2020-08-19

Added

s3_date_prefix_scan fileregistry, based on prefix_based_date (see the migration guide)
pyspark 3.0 support including backwards compatibility support for pyspark 2.4
Python 3.8 support for pyspark 3.0

Removed

prefix_based_date fileregistry.