update FW branch with latest main, to prep for testing #133

Merged 82 commits on Oct 17, 2024

tomreitz
Collaborator

No description provided.

jayckaiser and others added 30 commits April 19, 2024 17:25
* Update CHANGELOG and VERSION.

* Hotfix/optional parquet sources (#86)

* Update optional file check in FileSource to build an empty dataframe if an empty folder is passed.

* Remove explicit file check in compile.

* Re-add filesize check in FileSource.execute().

* Move FtpSource connect from compile to execute.

* Fix attribute naming bug.

* Fix bug.

* Allow filepaths to be passed in optional FileSources, and check the existence of the path before loading the dataframe.

* Update CHANGELOG.

* fix add_columns typo in readme

* update changelog

* Feature/union all columns (#94)

* Add 'fill_missing' optional field to UnionOperation that uses default Pandas concat logic without erroring out. Still raise a debug message when applicable.

* Rename new field to 'fill_missing_columns' for clarity.

* Update dataframe.py

Rename fill_missing_columns to fill_missing.

* Update dataframe.py

* Update CHANGELOG.md

* Update CHANGELOG.md

* Rename UnionOperation's fill_missing field to fill_missing_columns; update README.
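The `fill_missing_columns` behavior described in #94 can be sketched with pandas. This is a minimal illustration, not earthmover's `UnionOperation`. The helper `union_frames` is hypothetical, and it uses eager pandas DataFrames even though other commits in this log indicate that earthmover works with Dask. When the flag is off, mismatched columns raise an error. When it is on, `pd.concat` fills absent columns with NaN.

```python
import pandas as pd


def union_frames(frames, fill_missing_columns=False):
    """Hypothetical helper: union dataframes row-wise.

    With fill_missing_columns=False, mismatched column sets raise an error.
    With True, pandas' default concat logic fills absent columns with NaN.
    """
    column_sets = [set(frame.columns) for frame in frames]
    if not fill_missing_columns and any(cols != column_sets[0] for cols in column_sets[1:]):
        raise ValueError("Dataframes to union do not share the same columns")
    # Default concat performs an outer join on columns, inserting NaN where a frame lacks a column.
    return pd.concat(frames, ignore_index=True)
```

For example, `union_frames([pd.DataFrame({"id": [1], "name": ["x"]}), pd.DataFrame({"id": [2]})], fill_missing_columns=True)` returns two rows, and the second row's `name` is NaN.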

* Git clone timeout when running `earthmover deps` (#93)

* try using subprocess with timeout

* Update error message

* tweak timeouts

* switch to makedirs

* don't error if dir already exists

* remove package path on failure

* adjust deletes

* typo

* switch to rmtree

* remove gitpython dependency

* remove unused import

* remove unused var

* add optional git timeout config

* reverse accidentally removed kwargs

* add notes on git_auth_timeout config to readme

* code cleanup

* Update README.

---------

Co-authored-by: jayckaiser <[email protected]>
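The timeout handling from #93 runs `git clone` through `subprocess` with a timeout and uses `rmtree` to remove the package path on failure. It might look roughly like the following. The helper name `clone_with_timeout` and its parameters are hypothetical, and the real `earthmover deps` code and its `git_auth_timeout` setting may behave differently.

```python
import os
import shutil
import subprocess


def clone_with_timeout(cmd, dest, timeout_seconds=60):
    """Hypothetical helper: run a clone command, cleaning up `dest` on failure.

    `cmd` is the full argv list, e.g. ["git", "clone", url, dest].
    """
    # Create the parent directory; don't error if it already exists.
    os.makedirs(os.path.dirname(os.path.abspath(dest)), exist_ok=True)
    try:
        # subprocess.run kills the child process if the timeout expires.
        subprocess.run(cmd, check=True, timeout=timeout_seconds, capture_output=True)
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError) as err:
        # Remove any partial checkout so a retry starts from a clean state.
        shutil.rmtree(dest, ignore_errors=True)
        raise RuntimeError(f"Clone into {dest} failed: {err}") from err
```

Dropping GitPython for `subprocess` gives direct access to the `timeout` argument. Without that timeout, a clone that hangs (for example, on a credentials prompt) would stall the command indefinitely.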

* Update changelog.

* Fix escape chars in output when `linearize: False` (#98)

* fixes a bug where escape characters were present in the output file when linearize is False

* remove unneeded Dask import

* update return value and comment based on notes from Jay

---------

Co-authored-by: Tom Reitz <[email protected]>

* fixing a bug introduced in the last version where nested JSON would be loaded as a stringified Python dictionary, which is difficult to use in downstream Jinja (#97)

Co-authored-by: Tom Reitz <[email protected]>

* Only write `earthmover_compiled.yaml` on compile, not run (#91)

* only write to disk on compile, not run

* update readme with change to earthmover_compiled.yaml

* Add `earthmover clean` command and some CLI error handling (#87)

* add 'clean' command and clean up CLI messaging

* comment justifying dictionary

* update changelog

* remove skip_mkdir, make compiled_yaml_file a class attribute

* replace dict with list of constants

---------

Co-authored-by: Jay Kaiser <[email protected]>

* Update CHANGELOG with new features.

* Fix `__row_data__` in `add_columns` and `modify_columns` operations (#99)

* fix __row_data__ in Jinja expressions of add_columns and modify_columns operations

* update how __row_data__ is added to prevent an error about modifying row

---------

Co-authored-by: Tom Reitz <[email protected]>

* Feature: Refactor Destination Execute (#95)

* Update config parsing to use ErrorHandler.assert_get_key() for all fields; move and unify Jinja template processing to execute.

* Update destination.py

* Update CHANGELOG.

* makes destination template optional (#88)

* makes destination template optional; when not specified, each row is turned into a JSON object where column names become object properties

* implement changes based on feedback from Jay

* bugfix

* Minor cleanup.

---------

Co-authored-by: Tom Reitz <[email protected]>
Co-authored-by: jayckaiser <[email protected]>
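For the template-less destinations from #88, each row becomes a JSON object whose properties are the column names. The sketch below assumes rows arrive as plain Python dicts and that output is written one object per line. Both assumptions and the helper names are hypothetical. The sketch also shows the fix from #97: `json.dumps` keeps nested values as nested JSON instead of a stringified Python dictionary.

```python
import json


def row_to_json(row: dict) -> str:
    """Hypothetical helper: serialize one row as a JSON object keyed by column name.

    Nested dicts/lists are emitted as nested JSON, not as a Python repr
    such as "{'city': 'Boston'}".
    """
    return json.dumps(row)


def write_rows(rows, path):
    """Hypothetical helper: write one JSON object per line to `path`."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(row_to_json(row) + "\n")
```

For example, `row_to_json({"id": 1, "address": {"city": "Boston"}})` returns `'{"id": 1, "address": {"city": "Boston"}}'`.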

* Update CHANGELOG.

* adding the `debug` operation (#100)

* adding debug operation

* Update dataframe.py

Refactor code to improve readability and reference to existing Node attributes.

---------

Co-authored-by: Tom Reitz <[email protected]>
Co-authored-by: Jay Kaiser <[email protected]>

* Use Node.full_name in Node.check_expectations(), instead of redefining the string manually.

* Update CHANGELOG.

* Feature/flatten operation whitespace cleanup (#101)

* adding a flatten_operation

* README tweak

* implement changes based on feedback from Jay

* Clean up comments and whitespace in new FlattenOperation.

* Add print statements to debug tuple problem.

* Minor cleanup.

* Minor cleanup.

* Add single quotes to strip and trim variables in FlattenOperation.

* Fix single quote representation in trim_whitespace.

---------

Co-authored-by: Tom Reitz <[email protected]>

* Update CHANGELOG.

---------

Co-authored-by: johncmerfeld <[email protected]>
Co-authored-by: Samantha LeBlanc <[email protected]>
Co-authored-by: Tom Reitz <[email protected]>
Co-authored-by: Tom Reitz <[email protected]>
Hotfix: Resolve incompatible dependencies
updating fix with latest main
fix nested json not working when rendering destination templates
* Simplify code in FileDestination.render_row() to improve readability.

* Change FileDestination write logic to compute and write each partition, instead of mapping writes over rows.

* Update CHANGELOG and VERSION in preparation for patch.
…type()` and genericize logic using class attributes.
* init

* wip

* wip

* populate starter project

* add error message for invalid names

* reset requirements

* remove memory limit

* need to test on windows

* be more explicit about mkdir

* be more explicit about mkdir

* be more explicit about mkdir

* fix base_dir issues

* cleanup

* remove formatting changes from main

* remove formatting changes from main

* update init readme

* fix typo

* add comment
tomreitz and others added 29 commits September 4, 2024 16:20
…_header_footer

add support for Jinja in a destination node header and footer
update version and changelog for 0.3.7 release
…ation_headers

Hotfix: Refactor Jinja Destination Headers and Footers
update version and changelog for bugfix release
Feature: Add support for Python 3.12, latest versions of Dask
adds a `--set` flag to the cli to enable overriding values in _compiled_ `earthmover.yml`
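The `--set` flag overrides values in the compiled configuration. The following is a rough sketch only. The dotted-key syntax and the `set_override` helper are assumptions for illustration and are not earthmover's actual CLI format. The sketch shows one way a single override could be applied to a nested config dict after compilation.

```python
def set_override(config: dict, dotted_key: str, value):
    """Hypothetical helper: set config[a][b][c] = value for dotted_key "a.b.c".

    Missing intermediate dicts are created. The dotted-key syntax is an
    assumption, not necessarily what earthmover's --set accepts.
    """
    keys = dotted_key.split(".")
    node = config
    for key in keys[:-1]:
        # Walk down the nested config, creating levels that don't exist yet.
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return config
```

Applying overrides to the compiled config, rather than to the raw YAML text, means that an override replaces whatever value templating produced.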
tomreitz merged commit 5ac827e into feature/fwf_colspecs on Oct 17, 2024
5 participants