From 276b89327ecb9b2ba3eb10d44437e8840936d9dc Mon Sep 17 00:00:00 2001
From: sahil11129 <112084296+sahil11129@users.noreply.github.com>
Date: Fri, 24 Mar 2023 00:20:51 +0530
Subject: [PATCH] updated the numbering

---
 ...l for PII extraction using fine-tuned models.md | 14 +++++++-------
 ... for PII extraction using pre-trained models.md |  8 ++++----
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/ML/PII-Extraction/Tutorial for PII extraction using fine-tuned models.md b/ML/PII-Extraction/Tutorial for PII extraction using fine-tuned models.md
index e3aeeed..89273bd 100644
--- a/ML/PII-Extraction/Tutorial for PII extraction using fine-tuned models.md
+++ b/ML/PII-Extraction/Tutorial for PII extraction using fine-tuned models.md
@@ -43,14 +43,14 @@ The tutorial demonstrates the extraction of PII using generated training data fo
 
 ## Step 1. Generate the data for custom PII
 
-### Step 1.1 Set Project token
+### 1. Set Project token
 
 Before you can begin working on notebook in Watson Studio in Cloud Pak for Data as a Service, you need to ensure that the project token is set so that you can access the project assets via the notebook.
 When this notebook is added to the project, a project access token should be inserted at the top of the notebook in a code cell. If you do not see the cell above, add the token to the notebook by clicking **More > Insert project token** from the notebook action bar. By running the inserted hidden code cell, a project object is created that you can use to access project resources.
 
 ![ws-project.mov](https://media.giphy.com/media/jSVxX2spqwWF9unYrs/giphy.gif)
 
-### 1.2 Generate the sample data set for train the custom PIIs using faker library. Below table shows the custom PIIs which demonstrate in this tutorial:
+### 2. Generate the sample data set for train the custom PIIs using faker library. Below table shows the custom PIIs which demonstrate in this tutorial:
 
 |Custom PIIs|
 |-----------|
@@ -216,7 +216,7 @@ The Watson NLP platform provides a fine-tune feature that allows for custom trai
 
 * BILSTM: the BiLSTM network would take the preprocessed text as input and learn to identify patterns and relationships between words that are indicative of PII data. The BiLSTM network would then output a probability score for each word in the text, indicating the likelihood that the word is part of a PII entity. The BiLSTM network may also be trained to recognize specific entities such as names, addresses, phone numbers, email addresses, etc.
 
-## Step 2.1 PII extraction function
+## 1. PII extraction function
 
 Both the model are trained from labeled data, which require the syntax block to be executed first to generate the expected input for the entity-mention block. BiLSTM model requires Glove embedding for fine tuning. It allows for words to be represented as dense vectors in a high-dimensional space, where the distance between vectors reflects the semantic similarity between the corresponding words. We can use GloVe embedding to generate vector representations of the words in our data, which can then be utilized for further analysis or modeling." is a popular method for generating vector representations of words in natural language processing. It allows for words to be represented as dense vectors in a high-dimensional space, where the distance between vectors reflects the semantic similarity between the corresponding words. We can use GloVe embedding to generate vector representations of the words in our data, which can then be utilized for further analysis or modeling."
 
@@ -233,7 +233,7 @@ mentions_train_template = watson_nlp.load(watson_nlp.download('file_path_entity-
 default_feature_extractor = watson_nlp.load(watson_nlp.download('feature-extractor_rbr_entity-mentions_sire_en_stock'))
 ```
 
-## Step 2.2 Fine-Tuning the models
+## 2. Fine-Tuning the models
 
 Fine-tuning a BiLSTM model for PII extraction involves training the model on a labeled training dataset includes examples of PII entities.
 
@@ -261,7 +261,7 @@ project.save_data('bilstm_pii_workflow_custom', data=mentions_workflow.as_file_l
 ```
 now save the model with Syntax model as workflow model so we can directly test on the input text.
 
-## 2.3 Test the Fine-Tuned Model
+## 3. Test the Fine-Tuned Model
 
 Now let's run the trained models with testing data, Here testing data is a sentence from test data which we generate before.
 We can fetch single sentences : `text = pd.read_json('faker_PII_text_test.json')['text'][1]`
@@ -275,7 +275,7 @@ As per the above result, fine-tuned BiLSTM model can identify all trained custom
 
 * SIRE: Statistical Information and Relation Extraction (SIRE) is a technique used in natural language processing (NLP) to extract specific information and relationships from text. It involves using machine learning algorithms to identify and extract structured data such as entities, attributes, and relations from unstructured text. SIRE is used in a variety of applications, including information extraction, knowledge graph construction, and question answering. SIRE typically uses supervised learning approach, where a model is trained using annotated examples of text and the corresponding structured data. The model can then be used to extract the same information from new, unseen text.
 
-## 3.1 Fine-Tuning the models
+## 1. Fine-Tuning the models
 
 Fine-tuning a Sire model for PII extraction involves training the model on a labeled training dataset includes examples of PII entities.
 
@@ -300,7 +300,7 @@ project.save_data('sire_pii_workflow_custom', data=sire_workflow.as_file_like_ob
 
 now save the model with Syntax model as workflow model so we can directly test on the input text.
 
-## 3.1 Test the Fine-Tuned Model
+## 2. Test the Fine-Tuned Model
 
 Now let's run the trained models with testing data, Here testing data is a sentence from test data which we generate before.
 
diff --git a/ML/PII-Extraction/Tutorial for PII extraction using pre-trained models.md b/ML/PII-Extraction/Tutorial for PII extraction using pre-trained models.md
index 285d7af..7b748af 100644
--- a/ML/PII-Extraction/Tutorial for PII extraction using pre-trained models.md
+++ b/ML/PII-Extraction/Tutorial for PII extraction using pre-trained models.md
@@ -42,14 +42,14 @@ The tutorial demonstrates the extraction of PII using pre-trained Watson NLP mod
 
 ## Step 1. Generate the testing data
 
-### Step 1.1 Set Project token
+### 1. Set Project token
 
 Before you can begin working on notebook in Watson Studio in Cloud Pak for Data as a Service, you need to ensure that the project token is set so that you can access the project assets via the notebook.
 When this notebook is added to the project, a project access token should be inserted at the top of the notebook in a code cell. If you do not see the cell above, add the token to the notebook by clicking **More > Insert project token** from the notebook action bar. By running the inserted hidden code cell, a project object is created that you can use to access project resources.
 
 ![ws-project.mov](https://media.giphy.com/media/jSVxX2spqwWF9unYrs/giphy.gif)
 
-### Step 1.2 Generate the sample data set for Name, credit card number and social security number using faker library.
+### 2. Generate the sample data set for Name, credit card number and social security number using faker library.
 
 
 ```
@@ -99,7 +99,7 @@ The process of identifying and PII entities from text can be accomplished using
 
 2. A model that is trained on labeled data for the more complex entity types such as persons, organizations, and locations. This model uses machine learning techniques to learn patterns and relationships between words and their corresponding entity types in order to accurately identify and extract entities from text.
 
-## Step 2.1 PII extraction function
+## 1. PII extraction function
 
 Rule-based models (RBR) can be directly applied to input text without any dependency on pre-processing blocks. On the other hand, models that are trained from labeled data, such as BilSTM and SIRE, require the syntax block to be executed first to generate the expected input for the entity-mention block.
 
@@ -117,7 +117,7 @@ rbr_model = watson_nlp.load(watson_nlp.download('entity-mentions_rbr_multi_pii')
 sire = watson_nlp.load(watson_nlp.download('entity-mentions_sire_en_stock-wf'))
 ```
 
-## Step 2.2 Run the Pre-Trained models for PII Extraction
+## 2. Run the Pre-Trained models for PII Extraction
 
 * BiLSTM Pretrained
 