release/443-release-candidate #13822

Merged: 8 commits merged on May 25, 2023

@maziyarpanahi (Member) commented on May 25, 2023:


New Features & Enhancements

  • New multilabel parameter to switch from multi-class to multi-label classification in all sequence and zero-shot classifiers in Spark NLP: AlbertForSequenceClassification, BertForSequenceClassification, DeBertaForSequenceClassification, DistilBertForSequenceClassification, LongformerForSequenceClassification, RoBertaForSequenceClassification, XlmRoBertaForSequenceClassification, XlnetForSequenceClassification, BertForZeroShotClassification, DistilBertForZeroShotClassification, and RobertaForZeroShotClassification
  • Refactor protected Params and Features to avoid unwanted exceptions at runtime (SPARKNLP-835: ProtectedParam and ProtectedFeature #13797)
  • Add proper documentation and instructions for the ZeroShot classifiers BertForZeroShotClassification, DistilBertForZeroShotClassification, and RobertaForZeroShotClassification (SPARKNLP-809: Add warning to ForZeroShot annotators #13798)
  • Extend ResourceDownloader to support downloading models and pipelines directly by name or by S3 path (Add unzip param to downloadModelDirectly in ResourceDownloader #13796):
```python
import sparknlp
from sparknlp.pretrained import ResourceDownloader

# the downloader calls into the JVM, so a Spark NLP session must be running first
spark = sparknlp.start()

# partial S3 path (key inside the public bucket)
ResourceDownloader.downloadModelDirectly("public/models/albert_base_sequence_classifier_ag_news_en_3.4.0_3.0_1639648298937.zip", remote_loc="public/models")

# full S3 path; unzip=False skips extracting the downloaded archive
ResourceDownloader.downloadModelDirectly("s3://auxdata.johnsnowlabs.com/public/models/albert_base_sequence_classifier_ag_news_en_3.4.0_3.0_1639648298937.zip", remote_loc="public/models", unzip=False)
```
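To illustrate what the new multilabel parameter changes, here is a minimal sketch of the two decision rules. It assumes multi-class mode keeps only the single highest softmax probability, while multi-label mode keeps every label whose sigmoid score passes a threshold. The 0.5 threshold, the function names, and the example logits are hypothetical; this is not Spark NLP's internal implementation.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    # subtract the max for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predict(logits: List[float], labels: List[str], multilabel: bool,
            threshold: float = 0.5) -> List[str]:
    """Hypothetical decision rule, not Spark NLP's internal code."""
    if not multilabel:
        # multi-class: probabilities compete, keep only the single best label
        probs = softmax(logits)
        best = max(range(len(labels)), key=lambda i: probs[i])
        return [labels[best]]
    # multi-label: each label is scored independently and kept if above the threshold
    return [label for label, x in zip(labels, logits) if sigmoid(x) > threshold]


labels = ["World", "Sports", "Business", "Sci/Tech"]
logits = [2.1, -1.3, 0.8, -0.2]  # made-up scores for one document
print(predict(logits, labels, multilabel=False))  # ['World']
print(predict(logits, labels, multilabel=True))   # ['World', 'Business']
```

In multi-label mode a document can receive zero, one, or several labels, which is why the per-label sigmoid replaces the softmax that forces labels to compete.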
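The idea behind a protected (set-once) parameter can be sketched as follows. It assumes the goal is to keep the first value and emit a warning on later attempts instead of throwing an exception at runtime. ProtectedValue and the "dimension" example are hypothetical stand-ins, not Spark NLP's ProtectedParam or ProtectedFeature.

```python
import warnings
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class ProtectedValue(Generic[T]):
    """Hypothetical set-once holder; not Spark NLP's ProtectedParam class."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._value: Optional[T] = None
        self._is_set = False

    def set(self, value: T) -> None:
        if self._is_set:
            # warn and keep the first value instead of raising an exception
            warnings.warn(
                f"{self.name} is protected and can only be set once; "
                f"keeping {self._value!r}",
                stacklevel=2,
            )
            return
        self._value = value
        self._is_set = True

    def get(self) -> Optional[T]:
        return self._value


dim = ProtectedValue[int]("dimension")  # hypothetical parameter name
dim.set(768)   # first assignment, e.g. fixed when a model is imported
dim.set(512)   # emits a UserWarning; the stored value is unchanged
print(dim.get())  # 768
```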
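The ZeroShot classifiers follow the usual NLI-based recipe: each candidate label is rewritten as a hypothesis and scored against the input text, which serves as the premise. The sketch below assumes the template "This example is {}." and replaces the NLI model with a toy keyword-overlap scorer; both are hypothetical and only show the data flow, not Spark NLP's implementation.

```python
from typing import Callable, Dict, List


def zero_shot_scores(
    text: str,
    candidate_labels: List[str],
    entailment_score: Callable[[str, str], float],
    template: str = "This example is {}.",
) -> Dict[str, float]:
    """Score each candidate label by how strongly the text entails its hypothesis."""
    scores = {}
    for label in candidate_labels:
        # turn the label into an NLI hypothesis, e.g. "This example is sports."
        hypothesis = template.format(label)
        scores[label] = entailment_score(text, hypothesis)
    return scores


def toy_entailment(premise: str, hypothesis: str) -> float:
    # hypothetical stand-in for an NLI model: counts hypothesis words found in the premise
    premise_words = set(premise.lower().rstrip(".").split())
    hyp_words = hypothesis.lower().rstrip(".").split()
    return float(sum(word in premise_words for word in hyp_words))


text = "Breaking sports news: the home team won the final."
scores = zero_shot_scores(text, ["sports", "politics"], toy_entailment)
print(scores)                          # {'sports': 1.0, 'politics': 0.0}
print(max(scores, key=scores.get))     # sports
```

Because the labels are only supplied at inference time, the same model can classify into label sets it was never fine-tuned on.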
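The two downloadModelDirectly calls above differ only in how the archive is addressed: a bucket-relative key versus a full s3:// URI. The hypothetical helper below shows how one maps onto the other, assuming the bucket auxdata.johnsnowlabs.com taken from the full path; it is not how ResourceDownloader resolves paths internally.

```python
# bucket name taken from the full S3 path in the example above
PUBLIC_BUCKET = "auxdata.johnsnowlabs.com"


def to_full_s3_path(path: str, bucket: str = PUBLIC_BUCKET) -> str:
    """Hypothetical helper: expand a bucket-relative key into a full s3:// URI."""
    if path.startswith("s3://"):
        return path  # already a full S3 path, leave untouched
    return f"s3://{bucket}/{path.lstrip('/')}"


partial = "public/models/albert_base_sequence_classifier_ag_news_en_3.4.0_3.0_1639648298937.zip"
print(to_full_s3_path(partial))
```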

Bug Fixes

Known issues:
Current pretrained pipelines don't work on PySpark 3.2, 3.3, and 3.4. A fix is expected in the coming week.
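To check whether an environment falls under this known issue, comparing the PySpark major.minor version is enough. This sketch assumes the affected releases are exactly the 3.2, 3.3, and 3.4 lines listed above; the function name is hypothetical.

```python
def affected_by_pipeline_issue(pyspark_version: str) -> bool:
    """True if the major.minor PySpark version is one listed in the known issue."""
    major_minor = tuple(int(part) for part in pyspark_version.split(".")[:2])
    return major_minor in {(3, 2), (3, 3), (3, 4)}


# with PySpark installed, pass pyspark.__version__ instead of a literal
print(affected_by_pipeline_issue("3.3.2"))  # True
print(affected_by_pipeline_issue("3.1.3"))  # False
```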

danilojsl and others added 4 commits May 25, 2023 09:54
* SPARKNLP-825 Adding multilabel param to all sequence and zero-shot classifiers

* SPARKNLP-825 Adding documentation about multilabel parameter

* SPARKNLP-835: Finalize protected Features

* SPARKNLP-835: Remove redundant checks for protected Features

* SPARKNLP-835: Introduce ProtectedParam

* SPARKNLP-835: Resolve encoding/decoding issue for HasProtectedParams

* SPARKNLP-835: Make caseSensitive settable

* SPARKNLP-835: Make maxSentenceLength, batchSize settable

* SPARKNLP-835: Enable protected Params for Annotators

* SPARKNLP-809: Add warning ZeroShot annotators

* Fix DocumentAssembler documentation
@maziyarpanahi merged commit 6ae3a2b into master on May 25, 2023
5 participants