- Remove defective encoding check.
- Fix UTF-8 encoding error for Japanese data sets.
- Set the username to lowercase internally.
- Added new argument ``--pred_threshold`` that allows adding the prediction threshold (which is 0.5 by default, but can be changed during deployment) as a column.
- Added new argument ``--pred_decision`` that allows adding the prediction decision (the value predicted by the model, or the class label for classification) as a column.
- Added support for ``--max_prediction_explanations`` on DataRobot 4.3.x.
- Fixed bug where retrieving the user's API token would fail when insecure SSL is required.
- Updated trafaret dependency to support the same version as https://pypi.org/project/datarobot/.
- Added new argument ``--max_prediction_explanations`` that allows batch scoring with prediction explanations, adding ``explanation_N_feature`` and ``explanation_N_strength`` to each row in the output document (where N ranges from 1 to max_prediction_explanations).
- Added check to detect and warn about quoted delimiters during ``--fast`` mode with ``--keep_cols``.
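The hazard this check guards against can be sketched as follows (a minimal illustration, not the tool's actual implementation): fast mode splits lines on the raw delimiter, so a quoted field that contains the delimiter yields the wrong column count.

```python
import csv
import io

# A quoted field containing the delimiter: fine for a real CSV parser,
# but a naive split (as in fast mode) miscounts the columns.
line = '1,"Smith, John",42\n'

fast_split = line.rstrip("\n").split(",")      # naive, fast-mode-style split
proper = next(csv.reader(io.StringIO(line)))   # quote-aware parse

print(len(fast_split))  # 4 -- the quoted comma was treated as a separator
print(len(proper))      # 3
```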
- Update requests dependency due to https://nvd.nist.gov/vuln/detail/CVE-2018-18074
- Added ``batch_scoring_deployment_aware`` to Windows/Linux/macOS assets.
- Fixed setting proxy with env vars (``HTTP_PROXY``, ``HTTPS_PROXY``, ``NO_PROXY``).
- Enforce ``shelve`` to use the ``dbm.dumb``/``dumbdbm`` modules for Python 3.x/2.7 respectively, to prevent hiccups with a large number of generated checkpoints. Was caught on macOS (``ndbm`` backend).
- New command ``batch_scoring_deployment_aware`` for scoring with new deployment-aware routes.
- Changes the sequence of config file lookup. A config file in the working directory will now take precedence over a config file in the user's home directory.
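The new lookup order can be sketched as follows (the file name here is an assumption for illustration, not necessarily the tool's exact one): the working directory is checked before the home directory, and the first match wins.

```python
import os

CONFIG_NAME = "batch_scoring.ini"  # assumed name, for illustration only

def find_config():
    # Working directory first, then the user's home directory.
    for directory in (os.getcwd(), os.path.expanduser("~")):
        candidate = os.path.join(directory, CONFIG_NAME)
        if os.path.isfile(candidate):
            return candidate
    return None
```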
- Include ``README.rst`` in the distribution as the long description.
- Fix wheel installation for Python 2.
- Fix package installation in Python 3 environments.
- Brings back support for legacy predictions (api/v1) and a new parameter for specifying the API version (``--api_version``). Check ``batch_scoring --help`` for a list of valid options and the default value.
- Adds ``--no_verify_ssl`` argument for disabling SSL verification and ``--ca_bundle`` for specifying certificate(s) of trusted Certificate Authorities.
- Default for timeout is now None, meaning that the code does not enforce a timeout for operations to the server. This allows completion of runs with higher numbers of threads, particularly on macOS. The value remains modifiable, and 30 seconds is a reasonable value in most cases.
- Fixed an issue where exit codes were not set correctly by executables installed via the standalone installer.
- Fixed an issue which caused script crashes if one or more boolean options were specified in the config file.
- Updates the distribution metadata to include modules critical to the functioning of this library.
- Batch scoring now works with Python 3.6 on Windows (offline installs require 3.5 though)
- Logs now include version, retry attempts and whether output file was removed.
- New argument ``no-resume`` that allows you to start a new batch-scoring run from scratch without being asked about previous runs.
- The version of the dependency trafaret has been pinned to 0.10.0 to deal with a breaking change in the interface of that package.
- A new "Version Compatibility" section has been added to the README to help surface to users any incompatibilities between versions of batch_scoring and versions of DataRobot.
- New parameter field_size_limit allows users to specify a larger maximum field size than the Python csv module normally allows. Users can use a larger number for this value if they encounter issues with very large text fields, for example. Please note that using larger values for this parameter may cause issues with memory consumption.
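The limit in question is the Python ``csv`` module's per-field cap, which can be raised via ``csv.field_size_limit`` (a minimal sketch of the mechanism, not the script's actual code):

```python
import csv
import io

old_limit = csv.field_size_limit()  # default is 131072 bytes (128 KB)

# A field one byte over the limit makes csv.reader raise csv.Error.
big = "x" * (old_limit + 1)
try:
    list(csv.reader(io.StringIO(f"id,text\n1,{big}\n")))
except csv.Error:
    print("field larger than field limit")

# Raising the limit lets the oversized field parse, at the cost of
# allowing each row to consume more memory.
csv.field_size_limit(old_limit * 2)
rows = list(csv.reader(io.StringIO(f"id,text\n1,{big}\n")))
print(len(rows[1][1]) == old_limit + 1)  # True
```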
- Previously, files whose first few lines did not fit within 512KB would cause an error during auto-sampling (which finds a reasonable number of rows to send with each batch). This issue has been fixed by adding a fallback to a default of 10 lines per batch in these cases. This value can still be overridden by using the n_samples parameter.
- Fixed an issue where a client error message wasn't logged properly.
- Set default timeout on server response to infinity.
- New semantic routes versioning support
- New prediction response schema support
- Dropped support of DataRobot Prediction API < 3.0 version.
- Independent prediction service support for scoring
- switched to supervisor + workers architecture, improving handling of errors and subprocess lifecycle control.
- Source code split into smaller, mostly isolated modules.
- added 3rd parallel process which handles post-processing and writing of responses. This should greatly improve performance.
- add ability to compress data in transit
- --output_delimiter flag to set the delimiter for the output CSV; "tab" can be used for tab-delimited output
- --skip_row_id flag to skip row_id column in output
- fixed hang of batch-scoring script on CSV parse errors
- added summary of run at the end of script output with a full list of errors, warnings, and total stats
- fixed error when trying to report multiline CSV error in fast mode
- Run all tests against Windows
- --pred_name parameter is documented. Potentially backward-incompatible change: previously, the 1.0 class was used as the positive result for binary predictions; now the last class in lexical order is used
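The new rule can be illustrated as follows (a sketch of the behavior described above, not the script's code): the positive class for binary predictions is whichever label sorts last.

```python
# Under the old behavior the 1.0 class was always treated as positive;
# now the positive class is the last label in lexical (sorted) order.
print(sorted(["0.0", "1.0"])[-1])     # 1.0  (unchanged for 0/1 labels)
print(sorted(["True", "False"])[-1])  # True
print(sorted(["yes", "no"])[-1])      # yes
```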
- Fixed memory leak and performance problem caused by unrestricted batch-generator
- internal check and error avoidance logic for requests that are too large
- docker and docker-compose files for dockerized run of tests and script
- auto sampler target batch size increased to 2.5M
- improve url parsing. You no longer need to include "/api" in the host argument.
- return more descriptive error messages when there is a problem
- include the version of the batch-scoring script in the user-agent header
- add option to define document encoding
- add option to skip csv dialect detection.
- make adjustment to sample size used by dialect and encoding detection
- use auto_sample as default unless "--n_samples" is defined
- allow "tab" command line arg keyword. e.g. "--delimiter=tab"
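A minimal sketch of the keyword handling (the helper name is hypothetical): a literal tab character is awkward to pass on a command line, so the string "tab" is mapped to one.

```python
def resolve_delimiter(value):
    # Hypothetical helper: map the "tab" keyword to an actual tab character,
    # and leave any other delimiter string as-is.
    return "\t" if value.lower() == "tab" else value

print(resolve_delimiter("tab") == "\t")  # True
print(resolve_delimiter(";"))            # ;
```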
- minor performance improvement for nix users
- This release is compatible with Windows
- logs are now sent to two files within the directory where the script is run
- added --auto_sample option to find the n_samples automatically.
- change how csv dialects are passed around in attempt to fix a bug on Windows.
- use the chardet module to attempt to detect character encoding
- use standard lib csv module to attempt to discover CSV dialect
- use stream decoder and encoder in python 2 to transparently convert to utf-8
- provide a mode for sending all user messages to stdout
- separate process for disk IO and request payload serialization
- avoid codecs.getreader due to IO bottleneck
- don't parse CSV (fail fatally on multiline CSV)
- multiline mode (to be renamed)
- keep_cols resolution
- Get rid of gevent/asyncio, use thread-based networking
- Show path to logs on every unexpected error
- Convert cmdline argument parser from docopt to argparse
- Add configuration file support
- Refactor logging/ui
- Drop support of making predictions using 'v2' Modeling API
- Fix bug under Python 2 where gevent was fatally failing on timeouts.
- Added timeout argument.
- Both asyncio and gevent now retry within the request exception handler.
- Authorization now checks schema too and thus we fail much earlier if input not correct.
- Fix bug under Python 2 where gevent was silently dropping batches.
- Better checks if run completed successfully.
- Fail fast on missing column or dtype mismatch.
- Add naming of prediction column for regression.
- Fix ignore datarobot_key.
- Update requirements for Python 3 to minimum versions.
- Updated client-side error reporting to show the status message when the server returns a formatted JSON object, instead of just the error code
- Use utf8 encoding for CSV strings sent to prediction API server
- Use CSV instead of JSON for better throughput and reduced memory footprint on the server-side.
- Gevent dependency update to fix ssl bug on 2.7.9.
- Setuptools support.
- Use python logging and maintain a debug log to help support engineers trace errors.
- More robust delimiter handling (whitelist).
- Don't segfault on a non-splittable delimiter.
- Set number of retries default to 3 instead of infinite.
- Fix: type -> task
- Initial release