Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Unreleased
3.3.0 - 2023-04-05
3.2.4 - 2022-10-18
Fixed
- s3_date_prefix_scan won't error when a path is not found
3.2.3 - 2022-10-10
- Do not delete files; instead, overwrite the Delta table using mode overwrite in the delta_write block
3.2.2 - 2022-10-06
- Bugfix: Add new exception for Delta table read errors
3.2.1 - 2022-10-06
- Add new exception for Delta table read errors
3.2.0 - 2022-10-06
- Update Delta core and Delta storage to 2.1
- Use the boolean parameter mergeSchema for all Delta writes
3.1.0 - 2022-10-04
Changed
- Test with latest 2 pyspark versions only
3.0.0 - 2022-08-22
- Breaking change: HiveTable.DatabaseName and HiveTable.TableName are mandatory
- Add: Support for Databricks Unity Catalog
- Change: The file registry date scan now uses SQL LIST for Unity Catalog
2.8.1 - 2021-12-17
- Add the ability to define the schema with JSON or PySpark code when reading XML files
2.8.0 - 2021-12-13
- Add the ability to define the schema with JSON or PySpark code when reading JSON files
2.7.3 - 2021-11-18
- Add support for reading the Delta change data feed when running on the Databricks platform
2.7.2 - 2021-11-11
- Add mergeSchema option to the delta write block
2.7.1 - 2021-09-30
Changed
- Set default logging formatter to detailed
Fixed
- Fixed code smells in upsert.py
2.7.0 - 2021-08-30
Added
- Support for pyspark 3.1
2.6.0 - 2021-08-17
Added
- Support for retries in PostgreSQL and MySQL upserts
2.5.1 - 2021-06-28
Added
- Add logging of lift parameters and their values before starting a lift
2.5.0 - 2021-06-17
Added
- Add secret word filter for logging
Fixed
- Use f-strings for logging statements
- Update instructions for running tests
- Make the MySQL library a regular (non-extra) dependency
2.4.0 - 2021-05-18
Added
- custom::sql block for executing SQL statements
Fixed
- Some code smells reported by SonarCloud
2.3.0 - 2021-03-29
Added
- get_json_object transform function
2.2.0 - 2021-03-26
Added
- MySQL upsert support
Changed
- Test against postgres versions 10, 11, 12 and 13
2.1.0 - 2021-03-23
Added
- split transform function
- get_item transform function
2.0.0 - 2021-02-24
Removed
- Support for pyspark 2.4.5
1.11.0 - 2021-02-24
Added
- Add substring transform function
1.10.1 - 2021-01-25
Changed
- The fileregistry::delta_diff fileregistry will read all data if the default start date is before the first version of the Delta table
1.10.0 - 2021-01-22
Added
- The fileregistry::delta_diff fileregistry for Delta files
1.9.2 - 2020-12-16
Added
- Parameter resolution now also happens within substrings, for example "${myVar}/extra"
1.9.1 - 2020-12-08
Added
- Add support for nested columns in drop_duplicates transform function
1.9.0 - 2020-12-02
Added
- Multiple outputs in custom::python_codeblock
1.8.0 - 2020-11-11
Added
- Add drop_duplicates transform function
1.7.1 - 2020-11-10
- Allow a retention interval shorter than 7 days for Delta tables
1.7.0 - 2020-10-30
Added
- Write JSON files through the write::batch_json block
Changed
- Update dependency versions
1.6.3 - 2020-10-29
Fixed
- Bugfix: Empty arrays were created in a form that Spark 3 does not support; they are now created in a form that Spark 3 supports
1.6.2 - 2020-10-29
Fixed
- Bugfix for loading empty directories with batch_delta using Spark 3.0
1.6.1 - 2020-10-27
Fixed
- Bugfix for the Databricks optimize of the file registry after updating
1.6.0 - 2020-10-27
Changed
- Python version requirements now include Python 3.9
- Add Databricks optimize and vacuum of the file registry after updating
1.5.0 - 2020-10-23
Added
- Options parameter in load::batch_json for passing additional settings when loading JSON files (for example multiLine: true)
1.4.3 - 2020-10-21
Changed
- Use psycopg2.extras.execute_values to simplify the code and remove custom helpers
Removed
- Utils functions chunked and flatten_rows_dict in getl/common/upsert.py
Fixed
- When checking whether a file registry exists in an empty directory or in an S3 prefix that doesn't exist, a different exception is raised
1.4.2 - 2020-09-30
- Critical bugfix for Hive table creation.
1.4.1 - 2020-09-30
- Support for PartitionBy columns for HiveTable
1.4.0 - 2020-09-29
Added
- Support for PartitionBy columns in write::batch_delta
1.3.0 - 2020-09-28
Added
- Support for loading CSV files with batch_csv
1.2.0 - 2020-09-09
Added
- Explode functionality is added to the generic transform block
1.1.0 - 2020-09-03
Added
- Postgres upsert support
Changed
- Schema for batch_json and batch_xml is now optional
1.0.1 - 2020-08-24
Added
- Support for s3a:// paths
- python -m bin bumpversion updates CHANGELOG.md with the release entry
- Documentation on how to release a new version
Fixed
- Links to versions in CHANGELOG.md
- Fix the fileregistry type in docs/migrations/s3_date_prefix_scan.md to fileregistry::s3_date_prefix_scan
1.0.0 - 2020-08-19
Added
- s3_date_prefix_scan fileregistry, based on prefix_based_date; see the migration guide.
- PySpark 3.0 support, including backwards compatibility with PySpark 2.4
- Python 3.8 support for PySpark 3.0
Removed
- prefix_based_date fileregistry.