Python Module Index
m
moralization
DataManager
SpacyModelManager
TransformersModelManager
DataManager
DataManager.check_data_integrity()
DataManager.df_to_dataset()
DataManager.export_data_DocBin()
DataManager.import_data_DocBin()
DataManager.interactive_correlation_analysis()
DataManager.interactive_data_analysis()
DataManager.interactive_data_visualization()
DataManager.occurrence_analysis()
DataManager.print_dataset_info()
DataManager.publish()
DataManager.pull_dataset()
DataManager.return_analyzer_result()
DataManager.return_categories()
DataManager.set_dataset_info()
DataManager.visualize_data()
SpacyModelManager
TransformersModelManager
TransformersModelManager.add_labels_to_inputs()
TransformersModelManager.compute_metrics()
TransformersModelManager.create_batch()
TransformersModelManager.evaluate()
TransformersModelManager.map_dataset()
TransformersModelManager.postprocess()
TransformersModelManager.publish()
TransformersModelManager.save()
TransformersModelManager.test()
TransformersModelManager.tokenize()
TransformersModelManager.tokenize_and_align()
TransformersModelManager.train()
Initialize the DataManager that handles the data transformations.
should be stored.
Defaults to "de_core_news_sm" (small German).
this if pulling a dataset from Hugging Face. Defaults to False.
return all labels of a given task, or a list of selected labels, such as ["Cheating", "Fairness"]. If you provide a list, this is independent of the task. Defaults to None, in which case all labels for all categories are selected.
Default is:
merge_dict = {
    "task1": ["KAT1-Moralisierendes Segment"],
    "task2": ["KAT2-Moralwerte", "KAT2-Subjektive Ausdrücke"],
    "task3": ["KAT3-Rolle", "KAT3-Gruppe", "KAT3-own/other"],
    "task4": ["KAT4-Kommunikative Funktion"],
    "task5": ["KAT5-Forderung explizit"],
}
Defaults to None.
"task1": ["KAT1-Moralisierendes Segment"]
"task2": ["KAT2-Moralwerte", "KAT2-Subjektive Ausdrücke"]
"task3": ["KAT3-Rolle", "KAT3-Gruppe", "KAT3-own/other"]
"task4": ["KAT4-Kommunikative Funktion"]
"task5": ["KAT5-Forderung explizit"]
Defaults to "task1".
A DataManager object.
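The task-to-category mapping described above can be pictured with a plain Python sketch. The helper `select_labels` is illustrative only (not part of the package's API); it mirrors the documented precedence: an explicit label list overrides the task, and with neither set, all labels are returned.

```python
# Task-to-category mapping as documented above (standalone copy for illustration).
merge_dict = {
    "task1": ["KAT1-Moralisierendes Segment"],
    "task2": ["KAT2-Moralwerte", "KAT2-Subjektive Ausdrücke"],
    "task3": ["KAT3-Rolle", "KAT3-Gruppe", "KAT3-own/other"],
    "task4": ["KAT4-Kommunikative Funktion"],
    "task5": ["KAT5-Forderung explizit"],
}

def select_labels(task=None, selected_labels=None):
    # A list of selected labels is independent of the task and takes precedence;
    # otherwise the task's labels are used; with neither, all labels are selected.
    if selected_labels is not None:
        return selected_labels
    if task is not None:
        return merge_dict[task]
    return [label for labels in merge_dict.values() for label in labels]
```

For example, `select_labels("task1")` yields the single KAT1 label, while `select_labels(selected_labels=["Cheating", "Fairness"])` ignores any task.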
This function checks the data and compares it to the spaCy thresholds for label count, span distinctiveness and boundary distinctiveness.
If a value is found to be insufficient, a warning is raised.
By default, this function is called when training data is exported.
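The warn-rather-than-fail behaviour can be sketched as follows; the statistic names and numbers here are placeholders, not the actual spaCy thresholds.

```python
import warnings

def check_thresholds(stats, thresholds):
    # Compare each measured statistic against its minimum threshold and
    # emit a warning (instead of failing) when a value is insufficient.
    for name, minimum in thresholds.items():
        value = stats.get(name, 0.0)
        if value < minimum:
            warnings.warn(f"{name} = {value} is below the threshold {minimum}")

# Placeholder numbers for illustration only:
check_thresholds({"label_count": 50, "span_distinctiveness": 0.2},
                 {"label_count": 100, "span_distinctiveness": 0.1})
```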
Export the currently loaded docs as a spaCy binary. This is used in spaCy training.
output_dir (str/Path, optional): The directory in which to place the output files. Defaults to None.
overwrite (bool, optional): If True, spaCy files are written even if files are already present. Defaults to False.
check fails, then no output is written. Skip the test by setting to False. In this case, the output is always generated even if the data does not pass the quality check.
list[Path]: A list of the train and test file paths.
from relative path of given directory or from relative path of current working directory.
input_dir (Path, optional): Lookup directory. Defaults to None.
train_file (Path, optional): Absolute or relative path. Defaults to None.
test_file (Path, optional): Absolute or relative path. Defaults to None.
list[Path]: A list of the train and test file paths.
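The lookup order described above (absolute paths used as-is, relative paths resolved against input_dir if given, otherwise against the current working directory) can be sketched as:

```python
from pathlib import Path

def resolve_file(file, input_dir=None):
    # Hypothetical helper illustrating the documented lookup order,
    # not the package's actual implementation.
    file = Path(file)
    if file.is_absolute():
        return file
    base = Path(input_dir) if input_dir is not None else Path.cwd()
    return base / file
```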
optionally one can filter by filename(s).
_type (str, optional): Either "table", "corr" or "heatmap". Defaults to "table".
filter (str/list(str), optional): Filename filters. Defaults to None.
pd.DataFrame: Occurrence dataframe per paragraph.
Print information set in the dataset.
data_set (Dataset): The Dataset object of which the information is to be printed.
Publish the dataset to Hugging Face.
This requires a User Access Token from https://huggingface.co/
The token can either be passed via the hugging_face_token argument, or it can be set via the HUGGING_FACE_TOKEN environment variable. If the token is not set, a prompt will pop up where it can be provided.
repo_id (str): The name of the repository that you are pushing to. This can either be a new repository or an existing one.
data_set (Dataset): The Dataset to be published to Hugging Face. Please note that this is a Dataset object and not a DatasetDict object, meaning that if you have already split your dataset into test and train, you can either push test and train separately or need to concatenate them using "+". If not set, the raw dataset that is connected to the DataManager instance will be used.
hugging_face_token (str, optional): Hugging Face User Access Token.
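The token lookup order described above (explicit argument first, then the HUGGING_FACE_TOKEN environment variable, then an interactive prompt) can be sketched as follows; the helper name is illustrative, and the real library falls back to a prompt rather than raising.

```python
import os

def resolve_token(hugging_face_token=None):
    # Explicit argument takes precedence over the environment variable.
    token = hugging_face_token or os.environ.get("HUGGING_FACE_TOKEN")
    if token is None:
        # The actual library would prompt for the token here.
        raise RuntimeError("Set HUGGING_FACE_TOKEN or pass hugging_face_token")
    return token
```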
Method to pull an existing dataset from Hugging Face.
dataset_name (str): Name of the dataset to pull.
revision (str, optional): The revision number of the dataset that should be pulled. If not set, the default version from the "main" branch will be pulled.
Depending on the dataset, this can be "train", "test" or "validation", to be split into new test and train sets after the pull. Can also be set to None, pulling the full dataset with existing splits. Defaults to None.
If no analyzer has been created yet, a new one will be generated and stored.
span_distinctiveness, boundary_distinctiveness or "all". Defaults to "frequency".
Returns a dict of all categories in the dataset.
dict: A dict of all categories in the dataset.
Update the information set in the dataset.
data_set (Dataset): The Dataset object of which the information is to be updated. Defaults to the raw dataset associated with the DataManager instance.
description (str, optional): The new description to be set. The description will contain the task for which the labels were created, and the names of the original data files. Defaults to None.
version (str, optional): The new version to be set. Defaults to None.
license (str, optional): The new license to be set. Defaults to None.
citation (str, optional): The new citation to be set. Defaults to None.
homepage (str, optional): The new homepage to be set. Defaults to None.
Dataset: The updated Dataset object.
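The update-only-what-was-passed behaviour can be pictured on a plain dict; this is a sketch of the described semantics (the real method operates on a Dataset's info object, not a dict).

```python
def update_dataset_info(info, description=None, version=None,
                        license=None, citation=None, homepage=None):
    # Only fields explicitly provided are changed; None means "keep as is".
    updates = {"description": description, "version": version,
               "license": license, "citation": citation, "homepage": homepage}
    for key, value in updates.items():
        if value is not None:
            info[key] = value
    return info
```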
Create, import, modify, train and publish spaCy models.
Models can be trained on data from a DataManager, and published to Hugging Face.
Imports an existing model from the model_path folder if found.
If the model_path folder does not exist, or if base_config_file is supplied, or if overwrite_existing_files is True, creates a new model in the model_path folder.
Resulting folder structure inside model_path:
/config.cfg: model config file
/meta.json: user-editable metadata (also exported to trained models)
If the model has been trained, the folder will also contain:
/data/train.spacy: training dataset generated by DataManager
/data/dev.spacy: testing dataset generated by DataManager
/model-best/config.cfg: best trained model
/model-last/config.cfg: last trained model
model_path (str or Path): Folder where the model is (or will be) stored.
language (str, optional): Language that will be fine-tuned. Defaults to spaCy's small German model.
task1: Training on all labels of category 1.
base_config_file (str or Path, optional): If supplied, this base config will be used to create a new model.
overwrite_existing_files (bool): If True, any existing files in model_path are removed.
Evaluate the model against the test dataset in data_manager.
Publish the model to Hugging Face.
This requires a User Access Token from https://huggingface.co/
The token can either be passed via the hugging_face_token argument, or it can be set via the HUGGING_FACE_TOKEN environment variable. If no token is provided, a command prompt will open to request the token.
hugging_face_token (str, optional): Hugging Face User Access Token.
str: The URL of the published model.
Save any changes made to the model metadata.
Test the model output with a test string.
Train the model on the data contained in data_manager.
data_manager (DataManager): The DataManager that contains the training data.
check_data_integrity (bool): Whether to test the data integrity.
use_gpu (int): The index of the GPU to use. Defaults to -1, meaning no GPU.
overrides (dict): An optional dictionary of parameters to override in the model config.
Create, import, modify, train and publish transformers models.
Models can be trained on data from a DataManager, and published to Hugging Face.
Import an existing model named model_name from Hugging Face.
model_path (str or Path): Folder where the model is (or will be) stored.
model_name (str): Name of the pretrained model.
Expand the label list to match the tokens after tokenization by the selected tokenizer, and add it to the inputs.
labels (list, required): The nested list of labels that needs to be aligned.
Convenience function to compute and return the metrics.
eval_preds (tuple, required): The predicted and actual labels as a tuple.
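A minimal stand-in for such a metric function, assuming eval_preds is a (predictions, labels) tuple of equally shaped nested sequences and that positions labelled -100 are ignored, as is conventional in token classification. This is a sketch, not the package's actual metric (which may report precision, recall and F1 instead).

```python
def compute_accuracy(eval_preds):
    # Token-level accuracy over all positions whose label is not -100
    # (the conventional "ignore" index in token classification).
    predictions, labels = eval_preds
    correct = total = 0
    for pred_row, label_row in zip(predictions, labels):
        for pred, label in zip(pred_row, label_row):
            if label == -100:
                continue
            total += 1
            correct += int(pred == label)
    return {"accuracy": correct / total if total else 0.0}
```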
Creates the batches from the tokenized datasets using the Data Collator.
Evaluate the model against the test dataset in data_manager.
Args:
data_manager (DataManager): The DataManager that contains the training data.
Apply the tokenization to the complete dataset using a mapping function.
train_test_set (DatasetDict, required): The dataset to which the tokenization is applied.
token_column_name (str, optional): The name of the column containing the sentences/tokens. Defaults to "word".
label_column_name (str, optional): The name of the column containing the labels. Defaults to "label".
tokenized_datasets (DatasetDict): The tokenized and label-aligned dataset.
Publish the model to Hugging Face.
This requires a User Access Token from https://huggingface.co/
The token can either be passed via the hugging_face_token argument, or it can be set via the HUGGING_FACE_TOKEN environment variable. If no token is provided, a command prompt will open to request the token.
repository.
located (or shall be created).
hugging_face_token (str, optional): Hugging Face User Access Token.
create_new_repo (bool, optional): Create a new repository with a new model card on Hugging Face.
str: The URL of the published model.
Save the model to the set model path. If a model already exists in that path, it will be overwritten.
Test the model output with a test string.
Tokenize the pre-tokenized inputs with the selected tokenizer.
wordlist (list, required): The list of words that will be tokenized. The list is checked for nesting; the tokenizer expects a nested list of lists.
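The nesting check can be sketched as follows: a flat list of strings is wrapped in an outer list so the tokenizer always receives a list of lists. The helper name is illustrative, not the package's actual function.

```python
def ensure_nested(wordlist):
    # The tokenizer expects [["w1", "w2"], ...]; wrap a flat list of
    # strings in an outer list, and pass nested input through unchanged.
    if wordlist and isinstance(wordlist[0], str):
        return [wordlist]
    return list(wordlist)
```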
Tokenize the word list and align the label list to the new tokens.
examples (batch, required): The batch of pre-tokenized words and labels that needs to be aligned.
inputs (BatchEncoding): The encoded tokens, labels, etc., after tokenization.
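Aligning word-level labels to subword tokens follows the standard Hugging Face token-classification pattern. The sketch below is self-contained: it takes a precomputed token-to-word-id mapping (the word_ids list that fast tokenizers produce), so it runs without a tokenizer; whether the package masks or repeats labels on trailing subwords is an assumption here.

```python
def align_labels_with_tokens(labels, word_ids):
    # Special tokens (word_id None) get -100 so the loss ignores them;
    # the first subword of each word keeps that word's label; subsequent
    # subwords of the same word are also masked with -100.
    new_labels = []
    current_word = None
    for word_id in word_ids:
        if word_id is None:
            new_labels.append(-100)
        elif word_id != current_word:
            current_word = word_id
            new_labels.append(labels[word_id])
        else:
            new_labels.append(-100)
    return new_labels
```

For a two-word sentence with labels [1, 0], where the first word splits into two subwords and the sequence is padded with special tokens on both ends, the aligned labels are [-100, 1, -100, 0, -100].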
Train a model using the pre-loaded components.