All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Improve names extraction with pandas.MultiIndex types:
  - Use only the last level values if DataFrame.columns is a MultiIndex.
  - Ignore None values in MultiIndex.names.
0.12.2 - 2024-10-16
- Fix typehint in Translator.translated_names().
- Fix file note in exceptions raised by MultiFetcher children.
0.12.1 - 2024-09-28
- Filter children by sources in MultiFetcher.fetch_all().
0.12.0 - 2024-09-28
- Property TranslationMap.len_per_source.
- Improve PolarsIO performance (~4x faster for large datasets).
- Typing updates (notably numpy 2.0).
- Added concurrent_operation_action=raise|ignore to AbstractFetcher. Default is ignore for MemoryFetcher.
- Implementations that override SqlFetcher.select_where() no longer have to call the supermethod to ensure that IDs are filtered.
- The Translator.go_offline(translatable=None)-method now respects given names arguments.
- The AbstractFetcher no longer provides a caching implementation. It provides overridable methods instead.
0.11.1 - 2024-06-17
- Fix crash when using deepcloned MultiFetcher instances (again).
0.11.0 - 2024-06-16
- Implemented SqlFetcher.__deepclone__().
- Parameter inplace; use copy instead.
- Parameter maximal_untranslated_fraction; use max_fails instead.
- Fix crash when using deepcloned MultiFetcher instances.
- Fix crash when object-type ID collections contain NaN/None values.
0.10.2 - 2024-05-28
- Fix crash on import with rics>=4.1.0.
0.10.1 - 2024-04-17
- Rewrite the Override-only mapping subsection in the mapping primer. Improve clarity and fix some confusing sentences.
- Improved exception handling in the MultiFetcher; add notes to identify the raising child.
0.10.0 - 2024-04-08
- Added integration for polars.DataFrame.
- Added integration for dask.DataFrame and Series.
- Consume [transform]-sections in auxiliary configuration files (#231).
- Added typehints to dio.DataStructureIO and other id_translation.dio classes and functions.
- Added functions and methods to make creating new DataStructureIO implementations easier.
- Verify top-level sections in auxiliary configuration files.
0.9.0 - 2024-03-28
- Added new utility utils.translation_helper.TranslationHelper.
- Added several new TypedDict types to translator_typing.
- Added Translator.translate overloads. Catch-all overload for reverse=True.
. - Added many new in-line examples to class, function and module docstrings. Updated and corrected or clarified several docstrings which were poorly worded or outdated.
- Methods Translator.fetch() and go_offline() now expose arguments (such as maximal_untranslated_fraction) that were previously limited to translate().
- Improve when a maximal_untranslated_fraction is in use.
- The Format class is now callable for convenience (keyword only).
- Return copy in TranslationMap.name_to_source - consistent with similar properties.
- Handle dict names properly in Translator.fetch() and go_offline().
- Untranslated IDs should now never be None - ensure a valid Format is always available.
- Raise ValueError when using positional placeholders in Format. This used to be a silent error.
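The positional-placeholder check mentioned above can be illustrated with Python's standard string.Formatter. This is a minimal sketch for plain Python format strings only, not the library's Format implementation; the function name require_named_placeholders is hypothetical.

```python
import re
from string import Formatter


def require_named_placeholders(fmt: str) -> list[str]:
    """Return placeholder names in fmt; raise ValueError on positional ones.

    Hypothetical helper for illustration; not part of id-translation.
    """
    names = []
    for _literal, field_name, _spec, _conversion in Formatter().parse(fmt):
        if field_name is None:
            continue  # Literal text without a placeholder.
        # Strip attribute or index access, e.g. 'id.int' -> 'id'.
        head = re.split(r"[.\[]", field_name, maxsplit=1)[0]
        if head == "" or head.isdigit():
            raise ValueError(f"Positional placeholder {{{field_name}}} in {fmt!r}.")
        names.append(head)
    return names


print(require_named_placeholders("{id}:{name}"))  # ['id', 'name']
```

Calling the helper with "{}:{name}" or "{0}:{name}" raises ValueError instead of silently accepting the format.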
0.8.0 - 2024-03-23
- Python 3.12 is now fully tested and supported in CI/CD.
- New module translator_typing. Especially useful for users who wish to extend the base Translator implementation.
- Added support for simplified Translator.fetcher arguments of the form {source: {id: name}}.
- Python minimum version is now 3.11 (was 3.8).
- Minimum pandas version is now 2.0.3 (was 1.1.0).
- Minimum sqlalchemy version is now 2.0.5 (was 1.4.16).
- Updated base exceptions for several id_translation.*.exceptions-members:
  - DataStructureIOError: RuntimeError -> TypeError
  - ConfigurationError: ValueError -> TypeError
  - ConnectionStatusError: ValueError -> ConnectionError
  - TranslationError: ValueError -> Exception
  - MappingError: ValueError -> Exception
- Make unmapped_values_action != 'ignore' actions more specific: Raise new UnmappedValuesError(MappingError) or warn UnmappedValuesWarning(MappingWarning) (used to raise parent types directly). Add hints to warning message.
0.7.1 - 2024-03-09
- Heuristic functions that accept a plural_to_singular-argument now also accept a custom transformer.
- Expose read-only attributes Translator.fmt and default_fmt.
- Fixed an issue which sometimes caused a crash when verifying translations.
- Fixed an issue which sometimes caused a crash when one or more names were empty (zero IDs).
- Fixed plural-to-singular (NounTransformer) transforms of nouns such as 'languages', 'states', and many others.
- Fixed a performance issue for large pandas.Series and Index objects.
0.7.0 - 2024-02-09
- New short-circuiting function mapping.heuristic_functions.smurf_columns().
- New function dio.register_io(), allowing users to create their own custom IO implementations.
- New property Translator.transformers, allowing users to register new transformers after initialization.
- Lower cache hit default log level from INFO to DEBUG.
- Rename MultiFetcher.fetchers -> MultiFetcher.children.
- Some cosmetic logging and documentation issues.
- The base cache path for fetcher data is now configurable using CacheAccess.BASE_CACHE_PATH.
- The MultiFetcher will no longer discard required fetchers for any reason.
- Fall back to fetcher reuse in Translator.clone() when deepcopy(Translator.fetcher) fails.
- The Translator.get_transformer method (redundant: use Translator.transformers.get() instead).
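The deepcopy fallback described for Translator.clone() follows a common pattern, sketched below with the standard library only. The names LockedFetcher and copy_or_reuse are hypothetical and do not reflect the library's actual implementation.

```python
import copy
import threading


class LockedFetcher:
    """Hypothetical fetcher; its lock makes copy.deepcopy() raise TypeError."""

    def __init__(self) -> None:
        self.lock = threading.Lock()


def copy_or_reuse(fetcher: object) -> object:
    """Return a deep copy of fetcher, or fetcher itself if copying fails."""
    try:
        return copy.deepcopy(fetcher)
    except Exception:
        return fetcher  # Fall back to sharing the original instance.


shared = LockedFetcher()
reused = copy_or_reuse(shared)  # Copy fails, so the same object is returned.
copied = copy_or_reuse({"animals": {1: "cat"}})  # Copy succeeds.
print(reused is shared)  # True
```

The trade-off is that a reused fetcher is shared between the original and the clone, so state changes in one are visible in the other.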
0.6.0 - 2023-11-29
- The Translator.translate()-method now has overloads for improved typing.
- User-defined ID and translation transformation framework: id_translation.transform.
- Bitmask translation support: id_translation.transform.BitmaskTransformer.
- Serialization methods for TranslationMap: to_dicts(), to_pandas(), from_pandas(). Translation maps are returned by Translator.fetch() and the cache attribute.
- Make Translator.translated_names() optionally return a mapping dict instead of just names.
- Caching data for the AbstractFetcher has been updated.
  - Reduce the amount of excess data stored (now: records only).
  - Store records per source instead of all sources in the same .pkl-file.
- Improve handling for UUID-like IDs. Fetcher implementations now respect Translator settings with regard to UUID mitigations.
- Update SqlFetcher:
  - No longer uses table sizes. This could be expensive for large tables.
  - Simplify selection filtering; now only uses SqlFetcher.select_where() instead of two separate methods.
  - Add special handling of UUIDs when SQLAlchemy<2.
- Renamed Translator.store() -> Translator.go_offline().
- Change Translator.default_fmt to Format("<Failed: id={id!r}>") (was None).
- Fixed issues in Format:
  - Fixed rendering of {id} when used in fallback format.
  - Fixed rendering of escaped curly brackets {{literal-text}}.
  - Convert optional blocks without placeholders to literal text.
- The PandasFetcher now properly handles remote filesystems.
- Attribute translation is no longer supported. The Translator.allow_name_inheritance attribute has been removed, as well as the Translator.translate(attribute)-argument.
- The Translator.from_config(clazz)-argument (always use cls instead).
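To illustrate what bitmask translation means in general, the sketch below splits an integer into its set bits and translates each one. The functions split_bitmask and translate_bitmask and the output format are hypothetical; they are not the BitmaskTransformer API.

```python
def split_bitmask(value: int) -> list[int]:
    """Return the individual set bits of value, lowest first."""
    if value < 0:
        raise ValueError("Bitmask values must be non-negative.")
    flags = []
    bit = 1
    while bit <= value:
        if value & bit:
            flags.append(bit)
        bit <<= 1
    return flags


def translate_bitmask(value: int, names: dict[int, str], sep: str = " & ") -> str:
    """Translate each set bit separately and join the results."""
    return sep.join(
        f"{flag}:{names.get(flag, '<unknown>')}" for flag in split_bitmask(value)
    )


permissions = {1: "read", 2: "write", 4: "execute"}
print(translate_bitmask(5, permissions))  # 1:read & 4:execute
```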
0.5.1 - 2023-07-01
- Fix crash in SqlFetcher.__str__ with bad engine configs.
- Lower excessive log level used when discarding optional fetchers (configuration option added).
0.5.0 - 2023-06-29
- Add Translator.translated_names(). Returns the most recent names that were translated by the calling instance.
- Ability to mark a fetcher as optional. In multi-fetcher mode, optional fetchers are discarded if they raise an error the first time a source/placeholder enumeration is requested.
- A name-to-source dict may now be passed in place of the 'names'-argument.
- Translation of set-type data is now supported.
- Add environment variable ID_TRANSLATION_DISABLED to globally disable translation. Emits TranslationDisabledWarning once.
- New exception type MissingNamesError. Raised instead of AttributeError when names cannot be derived from the data type (and are not explicitly given).
- Add handling of attributes of retrieved translation elements (e.g. UUID.int).
- The AbstractFetcher.selective_fetch_all-flag now restricts the columns retrieved by SqlFetcher.
- Extend heuristic_functions.like_database_table to handle more pluralization types.
- Explicit names may no longer be combined with ignored_names.
- Improve support for translation of heterogeneous dict value types.
- Translation of pandas.MultiIndex is now properly supported (as indicated by not throwing UntranslatableTypeError).
- Preserve format_spec and conversion in Format.positional_part. This means that format strings such as '{uuid!s:.8}:{name!r}' will now work as expected.
- Ensure deterministic match selection when scores are equal due to overrides.
- Ensure placeholders aren't fetched twice in the same query.
- Prevent crashing when using a non-translatable parent type with the 'attribute'-argument.
- The now unused module fetching.support, and the function SqlFetcher.TableSummary.select_columns().
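The conversion (!s, !r) and format_spec (:.8) parts of the example format string above behave as in standard Python format strings. The sketch below uses the built-in str.format rather than the library's Format class; the UUID and name values are made up for illustration.

```python
import uuid

# Made-up placeholder values for illustration.
example_id = uuid.UUID("12345678-1234-5678-1234-567812345678")

# '!s' converts the UUID with str(), ':.8' keeps the first 8 characters,
# and '!r' renders the name with repr().
rendered = "{uuid!s:.8}:{name!r}".format(uuid=example_id, name="Sofia")
print(rendered)  # 12345678:'Sofia'
```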
0.4.0 - 2023-06-16
- The uuid.UUID-type has been added to IdTypes.
- Add the Translator.enable_uuid_heuristics flag (default=False).
- The Translator.translate()-method now accepts an optional fmt-argument (had to use Translator.copy(fmt=fmt).translate(...) before).
- Improved support and added documentation for override-only mapping.
- Clean up and rename a large number of heuristic and filter functions.
- Changed the default score function of the Mapper from equality to disabled.
- The TranslatorFactory now makes an effort to include the source file of config issues.
- Duplicate explicit names are now supported for most types (closes #4).
- Duplicate column names for the pandas.DataFrame translatable type are now supported.
- The AbstractFetcher class now uses a warning to inform the user about consequences when unmapped_values_action='raise' is used.
- Instead of silently failing, the SqlFetcher now raises when ID column mapping fails for a whitelisted table.
- Fixed a performance issue when translating large pandas.Series instances (including pandas.DataFrame columns).
- The FormatApplier class is no longer abstract. Removed DefaultFormatApplier.
- The Mapper.context_sensitive_overrides property. Plain overrides are now treated as shared/default overrides when a context is given. The type check in AbstractFetcher has been removed (config-based fetching will work as before).
0.3.1 - 2023-03-19
- Convert rics.mapping into an internal package. ID translation now uses id_translation.mapping.
- Reduce the number of records emitted in non-verbose mode.
- Structure mapping log messages by use case; *.mapping.name-to-source and *.mapping.name-to-source.placeholders.
- Fetchers inheriting from AbstractFetcher now include the primary cache key in the logger name (config filename).
0.3.0 - 2023-03-10
Release 0.3.0, require rics>=3.0.0. Add the id-translation-project cookiecutter template.
- New optional schema argument for SqlFetcher.
- Finished Translator.load_persistent_instance() implementation (no longer experimental).
- The SqlFetcher.finalize_statement() method, used to customize fetching behavior programmatically.
- New INFO-level begin/end log messages for Translator.translate().
- Raise ConcurrentOperationError in AbstractFetcher.fetch() to prevent race conditions.
- Limit AbstractFetcher.fetch_all() to sources that contain the required placeholders (after mapping) by default.
- A large number of new debug messages with extra-dict values set. These all have keys event_key and event_stage as well as an execution_time argument when event_stage='EXIT'. Additional extras depend on context.
- Caching logic to AbstractFetcher. Only active when explicitly enabled and AbstractFetcher.online is False.
- Environment variable interpolation is now possible anywhere in TOML config files. Key points:
  - Cache logic does NOT consider actual values (only names).
  - By default, simple interpolation is enabled.
  - TOML config metaconfig can be placed in metaconf.toml, next to main config.
  - Interpolation can be configured under [env] in metaconf.
- Improve error reporting for unmapped required placeholders; warn about potential override issues.
- Default MultiFetcher.duplicate_source_discovered_action increased from 'ignore' to 'warn'.
- Allow specifying MultiFetcher init arguments from the main TOML configuration file.
- Set default value of MultiFetcher.max_workers to 1.
- Set default value of SqlFetcher.include_views to False.
- Minimum install requirement is now correctly set to SQLAlchemy>=1.4.
- Now correctly always fetches all placeholders when performing a FETCH_ALL-operation.
- Copy allow_name_inheritance in Translator.copy().
- Redundant alias types.ExtendedOverrideFunction and related code.
- The PandasFetcher.read_function_args init argument, since read_function_kwargs is much less error-prone.
- Custom handling of environment variables in SqlFetcher.
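Environment variable interpolation in config text, as introduced in this release, can be sketched as a substitution pass that runs before the file is parsed. The ${VAR} syntax, the interpolate function and the connection_string key below are assumptions for illustration; the syntax actually supported by the library may differ.

```python
import re

# Assumed syntax: ${NAME}, where NAME consists of word characters.
_VARIABLE = re.compile(r"\$\{(\w+)\}")


def interpolate(text: str, env: dict[str, str]) -> str:
    """Replace every ${NAME} in text with env[NAME]; raise KeyError if unset."""

    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in env:
            raise KeyError(f"Environment variable {name!r} is not set.")
        return env[name]

    return _VARIABLE.sub(lookup, text)


# Hypothetical config line; pass dict(os.environ) as env in real use.
raw_config = 'connection_string = "postgresql://${DB_USER}@localhost/sales"'
print(interpolate(raw_config, {"DB_USER": "reader"}))
```

The interpolated text can then be handed to a TOML parser as usual.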
0.2.1 - 2023-02-04
- Now compatible with SQLAlchemy>=2. Typing has not been updated for SQLAlchemy v2, since this would break backwards compatibility with SQLAlchemy<2.
- Improve some SQL Fetcher log messages.
0.2.0 - 2022-11-30
- Fixed a few documentation issues.
- Bump requirement from rics==1.0.0 to rics>=2.
- Switch id_translation.ttypes back to just types.
0.1.0 - 2022-11-26
- Branch from [email protected].
- Move out of rics namespace.
- Switch to relative imports.
- Fix some intersphinx issues.