
SPARKNLP-656 & SPARKNLP-657: Updated Documentation #13108


DevinTDHa
Member

Description

This PR updates the documentation to make it clearer how to use setTestDataset in the *DLApproach annotators.

In addition, the installation instructions for M1 machines were updated.

Resolves #13070 and #13079.

How Has This Been Tested?

No code changes.

@maziyarpanahi
Member

@DevinTDHa

Once you split the data (or use CoNLL() to get a separate DataFrame for test/dev), you need to transform it with the very same pipeline.
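To make the "same pipeline" point concrete, here is a toy, pure-Python sketch with no Spark involved. `ToyVocabStage` is a hypothetical stand-in for any pipeline stage that learns state in `fit`, and the sentences are made up. Fitting once on the training split and reusing that fitted stage keeps token indices consistent; re-fitting on the test split silently produces a different mapping. (In the DocumentAssembler + UniversalSentenceEncoder example in this comment, neither stage learns anything from the data, so fitting that pipeline on the test DataFrame is harmless.)

```python
# Toy illustration only: ToyVocabStage is a hypothetical stand-in for a
# pipeline stage that learns state in fit(); it is not a Spark NLP class.
class ToyVocabStage:
    def fit(self, sentences):
        # Learn a token -> index mapping in first-seen order.
        self.vocab = {}
        for sentence in sentences:
            for token in sentence.split():
                self.vocab.setdefault(token, len(self.vocab))
        return self

    def transform(self, sentences):
        # Tokens never seen during fit() map to -1.
        return [[self.vocab.get(t, -1) for t in s.split()] for s in sentences]


train = ["stocks rise today", "team wins final"]
test = ["team rise today"]

# Correct: fit once on the training split, reuse the fitted stage for the test split.
shared = ToyVocabStage().fit(train)
train_features = shared.transform(train)
test_features = shared.transform(test)

# Wrong: re-fitting on the test split builds a different vocabulary.
refit_test_features = ToyVocabStage().fit(test).transform(test)

print(test_features)        # [[3, 1, 2]]
print(refit_test_features)  # [[0, 1, 2]]
```

Here "team" gets index 3 under the shared vocabulary but index 0 after re-fitting, so a model trained on the first encoding would misread the second.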

So perhaps we can have one example/doc for normal training, and another one for when the user needs the testDataset param. That case usually doesn't use a Pipeline (or everything up to and including the embeddings is in the Pipeline and the trainable annotator is outside of it).

For example, first build a pipeline that ends at the embeddings:

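from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import UniversalSentenceEncoder, ClassifierDLApproach

document = DocumentAssembler()\
    .setInputCol("description")\
    .setOutputCol("document")

use = UniversalSentenceEncoder.pretrained()\
    .setInputCols(["document"])\
    .setOutputCol("sentence_embeddings")

pipeline = Pipeline(stages=[document, use])

# news_test_dataset is the held-out test/dev DataFrame
test_dataset = pipeline.fit(news_test_dataset).transform(news_test_dataset)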

We transform and save the test/dev set:

test_dataset.write.parquet("./test_news.parquet")

Then we train:

classifierdl = ClassifierDLApproach()\
    .setInputCols(["sentence_embeddings"])\
    .setOutputCol("class")\
    .setLabelColumn("category")\
    .setMaxEpochs(5)\
    .setEnableOutputLogs(True)\
    .setEvaluationLogExtended(True)\
    .setValidationSplit(0.2)\
    .setTestDataset("./test_news.parquet")

pipeline = Pipeline(
    stages=[
        document,
        use,
        classifierdl
    ])

pipelineModel = pipeline.fit(trainDataset)
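For intuition about how setValidationSplit and setTestDataset differ, here is a toy, pure-Python training loop; it is not Spark NLP's implementation. Roughly, the validation split is a proportion carved out of the training data, while the test dataset is a separate, already-transformed file read from a path and used to compute statistics during training (this toy scores it after every epoch). Assumptions: the test file is JSON Lines instead of Parquet, each row already carries a precomputed `sentence_embeddings` vector and a `category` label, and the nearest-centroid "model", the `train_with_test_dataset` helper and all data are hypothetical.

```python
import json
import os
import random
import tempfile


def centroid(vectors):
    # Element-wise mean of equal-length vectors.
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def predict(centroids, vector):
    # Label of the nearest centroid (squared Euclidean distance).
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], vector))
    return min(centroids, key=dist)


def accuracy(centroids, rows):
    hits = sum(predict(centroids, r["sentence_embeddings"]) == r["category"] for r in rows)
    return hits / len(rows)


def train_with_test_dataset(train_rows, test_path, max_epochs=5, validation_split=0.2, seed=0):
    """Hypothetical helper, not a Spark NLP API."""
    # The validation rows are carved out of the training data...
    rows = list(train_rows)
    random.Random(seed).shuffle(rows)
    n_val = max(1, int(len(rows) * validation_split))
    val_rows, fit_rows = rows[:n_val], rows[n_val:]
    # ...while the test rows come from a separate, already-transformed file.
    with open(test_path) as f:
        test_rows = [json.loads(line) for line in f]
    # Toy "model": one centroid per label, computed from the non-validation rows.
    labels = {r["category"] for r in fit_rows}
    centroids = {
        label: centroid([r["sentence_embeddings"] for r in fit_rows if r["category"] == label])
        for label in labels
    }
    logs = []
    for epoch in range(1, max_epochs + 1):
        # A real trainer would update its weights here; this toy model has nothing
        # to update, so only the per-epoch evaluation is illustrated.
        logs.append((epoch, accuracy(centroids, val_rows), accuracy(centroids, test_rows)))
    return logs, len(fit_rows), len(val_rows)


# Made-up, linearly separable "embeddings" for two categories.
train_rows = (
    [{"sentence_embeddings": [1.0, 0.1 * i], "category": "sports"} for i in range(5)]
    + [{"sentence_embeddings": [0.1 * i, 1.0], "category": "business"} for i in range(5)]
)

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for test_dataset.write.parquet("./test_news.parquet").
    test_path = os.path.join(tmp, "test_news.jsonl")
    with open(test_path, "w") as f:
        for row in [
            {"sentence_embeddings": [0.9, 0.2], "category": "sports"},
            {"sentence_embeddings": [0.2, 0.9], "category": "business"},
        ]:
            f.write(json.dumps(row) + "\n")
    logs, n_fit, n_val = train_with_test_dataset(train_rows, test_path, max_epochs=3)

print(f"{n_fit} training rows, {n_val} validation rows")
for epoch, val_acc, test_acc in logs:
    print(f"epoch {epoch}: validation accuracy {val_acc:.2f}, test accuracy {test_acc:.2f}")
```

The design choice the sketch mirrors: because the test file is stored already transformed, the trainer never has to re-run the embedding stages on it, which is why it must be produced by the very same stages as the training data.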

… classifiers

- updated docs for NerDLApproach, ClassifierDLApproach, MultiClassifierDLApproach, SentimentDLApproach
@DevinTDHa
Member Author

@maziyarpanahi Updated with better examples in the latest push.

@maziyarpanahi maziyarpanahi changed the base branch from master to feature/424-release-candidate November 28, 2022 09:59
@maziyarpanahi maziyarpanahi linked an issue Nov 28, 2022 that may be closed by this pull request
@maziyarpanahi maziyarpanahi merged commit 7bbb637 into JohnSnowLabs:feature/424-release-candidate Nov 28, 2022