Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

Unreleased

3.3.0 - 2023-04-05

3.2.4 - 2022-10-18

Fixed

  • s3_date_prefix_scan won't error when a path is not found

3.2.3 - 2022-10-10

  • Do not delete files, just overwrite the delta table with mode overwrite in delta_write block

3.2.2 - 2022-10-06

  • Bugfix: Add new exception for delta table read errors

3.2.1 - 2022-10-06

  • Add new exception for delta table read errors

3.2.0 - 2022-10-06

  • Delta core and delta storage 2.1
  • Use bool param mergeSchema for all delta writes

3.1.0 - 2022-10-04

Changed

  • Test with latest 2 pyspark versions only

3.0.0 - 2022-08-22

  • Breaking change: HiveTable.DatabaseName and HiveTable.TableName are mandatory
  • Add: Support for Databricks Unity Catalog
  • Change: file registry date scan now uses SQL LIST for UC

2.8.1 - 2021-12-17

  • Add possibility to define JSON schema with JSON or PySpark code when reading XML files

2.8.0 - 2021-12-13

  • Add possibility to define JSON schema with JSON or PySpark code when reading JSON files

2.7.3 - 2021-11-18

  • Add support to read the Delta change data feed when running on Databricks platform

2.7.2 - 2021-11-11

  • Add mergeSchema option to delta write block

2.7.1 - 2021-09-30

Changed

  • Set default logging formatter to detailed

Fixed

  • Fixed code smells in upsert.py

2.7.0 - 2021-08-30

Added

  • Support for pyspark 3.1

2.6.0 - 2021-08-17

Added

  • Support for retry in postgresql and mysql upsert

2.5.1 - 2021-06-28

Added

  • Add logging of lift parameters and their values before starting lift

2.5.0 - 2021-06-17

Added

  • Add secret word filter for logging

Fixed

  • Use f-string for logging statements
  • Update how to run test instructions
  • Make mysql library non extra

2.4.0 - 2021-05-18

Added

  • custom::sql block for executing SQL statements

Fixed

  • Some code smells according to SonarCloud

2.3.0 - 2021-03-29

Added

  • get_json_object transform function

2.2.0 - 2021-03-26

Added

  • MySQL upsert support

Changed

  • Test against postgres versions 10, 11, 12 and 13

2.1.0 - 2021-03-23

Added

  • split transform function
  • get_item transform function

2.0.0 - 2021-02-24

Removed

  • Support for pyspark 2.4.5

1.11.0 - 2021-02-24

Added

  • Add substring transform function

1.10.1 - 2021-01-25

Changed

  • The fileregistry::delta_diff fileregistry will read all data if the default start date is before the first version of the delta table

1.10.0 - 2021-01-22

Added

  • The fileregistry::delta_diff fileregistry for delta files

1.9.2 - 2020-12-16

Added

  • Parameter resolution now also happens inside substrings, for example "${myVar}/extra"
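
As a rough illustration of what resolving a parameter inside a larger string means, the sketch below substitutes ${name} placeholders with a regular expression. The helper name resolve_params and the choice to leave unknown names untouched are assumptions made for this example, not the library's actual implementation.

```python
import re

# Hypothetical helper for illustration only; not the library's real code.
_PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def resolve_params(value: str, params: dict) -> str:
    """Replace each ${name} found anywhere in value with params[name].

    Assumption: placeholders with no matching parameter are left as-is.
    """
    def _substitute(match: re.Match) -> str:
        name = match.group(1)
        return str(params.get(name, match.group(0)))

    return _PLACEHOLDER.sub(_substitute, value)


if __name__ == "__main__":
    # The placeholder is only part of the string, as in the entry above.
    print(resolve_params("${myVar}/extra", {"myVar": "s3://bucket/data"}))
```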

1.9.1 - 2020-12-08

Added

  • Add support for nested columns in drop_duplicates transform function

1.9.0 - 2020-12-02

Added

  • Multiple outputs in custom::python_codeblock

1.8.0 - 2020-11-11

Added

  • Add drop_duplicates transform function

1.7.1 - 2020-11-10

  • Allow a retention interval shorter than 7 days for delta tables

1.7.0 - 2020-10-30

Added

  • Write json files through write::batch_json block

Changed

  • Update dependency versions

1.6.3 - 2020-10-29

Fixed

  • Bugfix: When creating empty arrays they looked like array. That is not supported by spark 3 so instead we create empty array

1.6.2 - 2020-10-29

Fixed

  • Bugfix for loading empty directories with batch_delta using spark 3.0

1.6.1 - 2020-10-27

Fixed

  • Bugfix for the Databricks optimize of file-registry after updating

1.6.0 - 2020-10-27

Changed

  • Changed python version requirements to include python 3.9
  • Add Databricks optimize and vacuum of file-registry after updating

1.5.0 - 2020-10-23

Added

  • Options parameter in load::batch_json to be able to submit more settings when loading json files (like multiLine: true)

1.4.3 - 2020-10-21

Changed

  • Use of psycopg2.extras.execute_values to remove and simplify code

Removed

  • Utils functions chunked and flatten_rows_dict in getl/common/upsert.py

Fixed

  • When checking if a file registry exists in an empty directory or in a S3 prefix that doesn't exist, a different exception is raised

1.4.2 - 2020-09-30

  • Critical bugfix for Hive table creation.

1.4.1 - 2020-09-30

  • Support for PartitionBy columns for HiveTable

1.4.0 - 2020-09-29

Added

  • Support for PartitionBy columns in write::batch_delta

1.3.0 - 2020-09-28

Added

  • Support for loading csv files with batch_csv

1.2.0 - 2020-09-09

Added

  • Explode functionality is added to the generic transform block

1.1.0 - 2020-09-03

Added

  • Postgres upsert support

Changed

  • Schema for batch_json and batch_xml is now optional

1.0.1 - 2020-08-24

Added

  • Support for s3a:// paths
  • python -m bin bumpversion now updates CHANGELOG.md when preparing a release
  • Documentation on how to release a new version

Fixed

  • Links to versions in CHANGELOG.md
  • Fix the fileregistry type in docs/migrations/s3_date_prefix_scan.md to fileregistry::s3_date_prefix_scan

1.0.0 - 2020-08-19

Added

  • s3_date_prefix_scan fileregistry, based upon prefix_based_date, see migration.
  • pyspark 3.0 support including backwards compatibility support for pyspark 2.4
  • Python 3.8 support for pyspark 3.0

Removed

  • prefix_based_date fileregistry.