Releases: huggingface/datasets

2.3.1

15 Jun 11:08

Bug fixes

  • Fix patching module that doesn't exist by @lhoestq in #4495
    • fix bug when importing the lib when scipy is not installed
  • Re-add download_manager module in utils by @lhoestq in #4497
    • fix moved imports of DownloadConfig, DownloadMode, DownloadManager
  • Support streaming UDHR dataset by @albertvillanova in #4487
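
The first fix above concerns patching a module that may not be installed (scipy, per the note). A minimal sketch of the general guard follows. The helper name `patch_if_available` is hypothetical and this is not the library's actual code; it only illustrates checking that a module can be found before trying to patch it.

```python
import importlib
import importlib.util


def patch_if_available(module_name, attr, replacement):
    """Hypothetical helper: set `module_name.attr` only if the module is installed.

    Returns True if the patch was applied, False if the module is missing.
    Intended for top-level module names in this sketch.
    """
    # find_spec returns None for a missing top-level module instead of raising
    if importlib.util.find_spec(module_name) is None:
        return False
    module = importlib.import_module(module_name)
    setattr(module, attr, replacement)
    return True
```

With a guard like this, importing the package does not fail just because an optional dependency is absent; the patch is simply skipped.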

Full Changelog: 2.3.0...2.3.1

2.3.0

14 Jun 18:12

Datasets Changes

Dataset Features

Dataset Cards

  • Minor fixes/improvements in scene_parse_150 card by @mariosasko in #4447
  • Tidy up license metadata for google_wellformed_query, newspop, sick by @leondz in #4378
  • Fix example in opus_ubuntu, Add license info by @leondz in #4360
  • Update README.md of fquad by @lhoestq in #4450

Documentation

Other improvements and bug fixes

New Contributors

Full Changelog: https://git...

2.2.2

20 May 17:54

Datasets fixes

Bug fixes

  • Support lists of multi-dimensional numpy arrays by @albertvillanova in #4194
  • Check if dataset features match before push in DatasetDict.push_to_hub by @mariosasko in #4372
  • Pin dill by @albertvillanova in #4380
    • dill 0.3.5 has some issues in transformers - pinning the version to <0.3.5 for now
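
The feature check before `DatasetDict.push_to_hub` can be pictured as comparing the schema of every split against the first one. The sketch below is an illustration only: `check_features_match` is a hypothetical name, and plain dicts of column name to type name stand in for the real feature objects.

```python
def check_features_match(splits):
    """Raise ValueError if the splits do not all share the same features.

    `splits` maps a split name to a plain dict of column name -> type name
    (an assumption made for this sketch).
    """
    items = list(splits.items())
    if not items:
        return
    ref_name, ref_features = items[0]
    for name, features in items[1:]:
        if features != ref_features:
            raise ValueError(
                f"Features of split '{name}' do not match split '{ref_name}': "
                f"{features} != {ref_features}"
            )
```

Failing early like this avoids uploading some splits before a mismatch is noticed.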

Dataset Cards

  • Adding eval metadata for ade v2 by @sashavor in #4319
  • Adding eval metadata for AG News by @sashavor in #4329
  • Adding eval metadata to Allociné dataset by @sashavor in #4330
  • Adding eval metadata to Amazon Polarity by @sashavor in #4331
  • Adding eval metadata for arabic speech corpus by @sashavor in #4332
  • Adding eval metadata for Banking 77 by @sashavor in #4333
  • Eval metadata Batch 4: Tweet Eval, Tweets Hate Speech Detection, VCTK, Weibo NER, Wisesight Sentiment, XSum, Yahoo Answers Topics, Yelp Polarity, Yelp Review Full by @sashavor in #4338
  • Eval metadata batch 3: Reddit, Rotten Tomatoes, SemEval 2010, Sentiment 140, SMS Spam, Snips, SQuAD, SQuAD v2, Timit ASR by @sashavor in #4337
  • Eval metadata batch 1: BillSum, CoNLL2003, CoNLLPP, CUAD, Emotion, GigaWord, GLUE, Hate Speech 18, Hate Speech by @sashavor in #4335
  • Eval metadata batch 2: Health Fact, Jigsaw Toxicity, LIAR, LJ Speech, MSRA NER, Multi News, NCBI Disease, Poem Sentiment by @sashavor in #4336

Docs

  • Add API code examples for Builder classes by @stevhliu in #4313
  • Add redirect to dataset script in the repo structure page by @lhoestq in #4369

Other improvements and bug fixes

New Contributors

Full Changelog: 2.2.1...2.2.2

2.2.1

11 May 17:04

Datasets bug fixes

  • Fix cnn_dailymail (dm stories were ignored) by @lhoestq in #4317
    • datasets 2.2.0 introduced a bug in cnn_dailymail and some examples were missing in the dataset

General improvements and bug fixes

New Contributors

Full Changelog: 2.2.0...2.2.1

2.2.0

10 May 19:44

Dataset Changes

Dataset Features

Dataset Cards

Metrics Changes

Metric Cards

Documentation

  • Document save_to_disk and push_to_hub on images and audio files by @lhoestq in #4193
  • Add to docs how to load from local script by @albertvillanova in #4200
  • Add code examples to API docs by @stevhliu in #4168
  • Add code examples for DatasetDict by @stevhliu in #4245
  • Add API code examples for IterableDataset by @stevhliu in #4274
  • Add packaged builder configs to the documentation by @lhoestq in #4307
  • [Imagefolder] Docs + Don't infer labels from file names when there are metadata + Error messages when metadata and images aren't linked correctly by @lhoestq in #4311

General improvements and bug fixes

2.1.0

14 Apr 09:33

Datasets Changes

Dataset Cards

Datasets Tags and Search on the Hugging Face Hub

Metrics Changes

Metric Cards

Documentation

General improvements and bug fixes

2.0.0

15 Mar 17:26

🤗 Datasets 2.0.0

We're happy to announce that our new documentation is available at hf.co/docs/datasets!

Dataset Features

  • Load a folder of images using the imagefolder dataset loader:
  • Push your image and audio datasets on the Hugging Face Hub with push_to_hub:
    • Add support for Audio and Image feature in push_to_hub by @mariosasko in #3685
  • New processing methods for streaming datasets:
    • Add IterableDataset.filter by @lhoestq in #3826
    • Manipulate columns on IterableDataset (rename columns, cast, etc.) by @lhoestq in #3862
    • Add the new methods to IterableDatasetDict by @lhoestq in #3923
  • And more:

Breaking changes

  • API changes for map and shuffle for datasets loaded in streaming mode:
    • Align map when streaming: update instead of overwrite + add missing parameters by @lhoestq in #3801
    • Align IterableDataset.shuffle with Dataset.shuffle by @lhoestq in #3842
  • Rename GenerateMode to DownloadMode by @albertvillanova in #3759
  • Remove deprecated methods/params (preparation for v2.0) by @mariosasko in #3803
  • Remove deprecated remove_columns param in filter by @mariosasko in #3827
  • Module namespace cleanup for v2.0 by @mariosasko in #3875
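
The "update instead of overwrite" change to `map` in streaming mode can be illustrated with plain Python dicts. This is a sketch based on the PR title, not the library's code; `map_overwrite`, `map_update`, and `add_length` are hypothetical names. Under "overwrite", the function's output replaces the example; under "update", the output columns are merged into the existing example.

```python
def map_overwrite(examples, function):
    """Overwrite semantics: the function's output replaces the example."""
    for example in examples:
        yield function(example)


def map_update(examples, function):
    """Update semantics: the function's output is merged into the example."""
    for example in examples:
        # build a new dict so the input example is left untouched
        yield {**example, **function(example)}


def add_length(example):
    # returns only the new column, not the whole example
    return {"length": len(example["text"])}


rows = [{"text": "hello"}, {"text": "hi"}]
overwritten = list(map_overwrite(rows, add_length))  # "text" column is lost
updated = list(map_update(rows, add_length))         # "text" column is kept
```

Code that relied on the old behavior to drop columns implicitly would need to remove them explicitly after this change.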

Dataset Changes

Dataset cards

Metric Changes

Metric cards

New documentation

General improvements and bug fixes

1.18.4

07 Mar 11:52
faf3d79

Bug fixes

Full Changelog: 1.18.3...1.18.4

1.18.3

02 Feb 14:21

Bug fixes

  • Fix MP3 resampling when a dataset's audio files have different sampling rates by @lhoestq in #3665
  • Extend dataset builder for streaming in get_dataset_split_names by @mariosasko in #3657

Dataset changes

Other improvements

New Contributors

Full Changelog: 1.18.2...1.18.3

1.18.2

28 Jan 16:55

Bug fixes

  • Fix streaming datasets that are not reset correctly by @lhoestq in #3646
  • Fix numpy rngs when shuffling with seed=None by @mariosasko in #3641
  • Fix dataset slicing with negative bounds when indices mapping is not None by @mariosasko in #3642
  • Fix add_column on datasets with indices mapping by @mariosasko in #3647
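
Two of the fixes above involve datasets that carry an indices mapping, that is, a list of row positions layered over the underlying table (as produced by operations such as shuffling or selecting rows). The toy class below is a hypothetical stand-in, not the library's implementation. It shows the point that matters for negative slice bounds: they have to be resolved against the length of the mapping, not the length of the underlying table.

```python
class MappedRows:
    """Toy stand-in for a table viewed through an optional indices mapping."""

    def __init__(self, rows, indices=None):
        self.rows = rows
        self.indices = indices  # None means "identity mapping"

    def __len__(self):
        return len(self.rows) if self.indices is None else len(self.indices)

    def __getitem__(self, key):
        if not isinstance(key, slice):
            raise TypeError("this sketch only supports slices")
        # slice.indices normalizes negative bounds against the visible length
        positions = range(*key.indices(len(self)))
        if self.indices is None:
            return [self.rows[i] for i in positions]
        return [self.rows[self.indices[i]] for i in positions]


# five underlying rows, of which three are visible, in mapped order
view = MappedRows(["a", "b", "c", "d", "e"], indices=[4, 2, 0])
last_two = view[-2:]
```

Here `view[-2:]` refers to the last two visible rows in mapped order, regardless of how many rows the underlying table holds.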

Other improvements

New Contributors

Full Changelog: 1.18.1...1.18.2