README

Overview

This project provides a Python-based solution for named entity recognition (NER), specifically identifying personal names (ФИО: surname, first name, and patronymic) in Russian texts. It uses OpenAI's GPT-4 model via the LangChain framework to extract names from input texts, preserving their original case and reporting their positions in the text. The solution consists of three main scripts:

model.py: Defines the language model pipeline for entity extraction.
main.py: Implements a cloud-hosted service for processing text and extracting entities.
test.py: Provides testing and evaluation metrics for the entity extraction performance.

File Descriptions

1. `model.py`

Purpose: Defines the pipeline for extracting personal names from text.
Key Components:
- Uses ChatOpenAI from langchain_openai to initialize the GPT-4 model.
- Constructs a prompt to identify all names (ФИО) within a given text.
- Defines an asynchronous function, generate_answer(), which processes the text to extract names and their positions.
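
The position-recovery step can be illustrated without calling the model. The sketch below is not the code in model.py: `find_name_spans` is a hypothetical helper, and it assumes the model has already returned each name as an exact, case-preserved substring of the input and that end indices are exclusive.

```python
def find_name_spans(text: str, names: list[str]) -> list[dict]:
    """Locate each extracted name in the text (hypothetical helper).

    Assumes every name is an exact, case-preserved substring of `text`.
    Repeated names are matched left to right, one occurrence per entry.
    """
    spans = []
    search_from: dict[str, int] = {}  # next search offset for each name
    for name in names:
        start = text.find(name, search_from.get(name, 0))
        if start == -1:
            continue  # skip anything the model returned that is not in the text verbatim
        end = start + len(name)  # exclusive end index (assumption)
        spans.append({"value": name, "start_index": start, "end_index": end})
        search_from[name] = end
    # Report spans in the order they appear in the text
    spans.sort(key=lambda span: span["start_index"])
    return spans
```

Searching the original text for the returned string, instead of asking the model for offsets, keeps the indices consistent with the input even if the model miscounts characters.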

2. `main.py`

Purpose: Implements the NER service as a cloud-hosted API.
Key Components:
- SimpleActionExample class defines the service logic for entity extraction using the model pipeline from model.py.
- Takes input texts, processes them to extract entities, and formats the results according to predefined schemas.
- Uses mlp_sdk to handle API hosting and deployment.
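
The formatting step might look roughly like the sketch below. It builds plain dictionaries in the shape of the response shown in this README's example; the actual service presumably relies on schema classes used with mlp_sdk, which are not reproduced here. The function name `build_response` and its `(value, start, end)` tuple input are illustrative assumptions.

```python
import json


def build_response(per_text_entities: list[list[tuple[str, int, int]]]) -> dict:
    """Wrap extracted (value, start, end) tuples in the response layout.

    Takes one inner list per input text; field names follow the README example.
    """
    entities_list = []
    for entities in per_text_entities:
        entities_list.append({
            "entities": [
                {
                    "value": value,
                    "entity_type": "PERSON",
                    "span": {"start_index": start, "end_index": end},
                    "entity": value,
                    "source_type": "SLOVNET",  # value copied from the README example
                }
                for value, start, end in entities
            ]
        })
    return {"entities_list": entities_list}


if __name__ == "__main__":
    response = build_response([[("Гагарин", 0, 7)]])
    # ensure_ascii=False keeps the Cyrillic text readable in the output
    print(json.dumps(response, ensure_ascii=False, indent=2))
```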

3. `test.py`

Purpose: Tests and evaluates the entity extraction performance.
Key Components:
- Downloads a dataset of annotated texts to test the NER functionality.
- Computes precision, recall, and F1-score metrics for the model output.
- Supports running tests on a configurable number of files using command-line arguments.

Installation

Clone the repository:

git clone <repository-url>
cd <repository-folder>

Install dependencies:
```
pip install -r requirements.txt
```
Set up environment variables:
- Create a .env file in the root directory and add your OpenAI API key:
```
OPENAI_API_KEY=your_openai_api_key
```

Usage

Running the NER Service

To start the entity extraction service:

python main.py

This starts the NER service and hosts it in the cloud through mlp_sdk.

Running Tests

To evaluate the performance of the NER model:

python test.py --count <number_of_files>

Replace <number_of_files> with the number of test files you wish to evaluate.
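
For orientation, an option like `--count` is commonly wired up with argparse. The sketch below is an assumption about how such parsing could look, not the actual code in test.py; the `parse_args` wrapper and the default of 10 are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line for the evaluation script (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Evaluate the NER model on annotated test files."
    )
    parser.add_argument(
        "--count",
        type=int,
        default=10,  # hypothetical default, not taken from test.py
        help="number of test files to evaluate",
    )
    args = parser.parse_args(argv)
    if args.count < 1:
        # parser.error prints a usage message and exits with status 2
        parser.error("--count must be a positive integer")
    return args
```

Calling `parse_args(["--count", "5"])` returns a namespace whose `count` attribute is the integer 5.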

Example

To extract entities from the text:

Input Text: Гагарин полетел на орбиту на ракете Сергея Королёва.
Output:

{
  "entities_list": [
    {
      "entities": [
        {
          "value": "Гагарин",
          "entity_type": "PERSON",
          "span": {
            "start_index": 0,
            "end_index": 7
          },
          "entity": "Гагарин",
          "source_type": "SLOVNET"
        },
        {
          "value": "Сергея Королёва",
          "entity_type": "PERSON",
          "span": {
            "start_index": 36,
            "end_index": 51
          },
          "entity": "Сергея Королёва",
          "source_type": "SLOVNET"
        }
      ]
    }
  ]
}

Evaluation metrics

Precision: The share of extracted names that are correct (true positives divided by all extracted names).
Recall: The share of annotated names that the model finds (true positives divided by all annotated names).
F1 Score: The harmonic mean of precision and recall, giving a single balanced score.
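
As a worked illustration of these three metrics, the sketch below scores predicted spans against annotated ones using exact matching on `(start, end)` pairs. Exact-span matching is an assumption made for this example; test.py may compare names differently. `score_spans` is a hypothetical name.

```python
def score_spans(
    predicted: set[tuple[int, int]],
    gold: set[tuple[int, int]],
) -> dict[str, float]:
    """Compute precision, recall, and F1 from exact span matches."""
    true_positives = len(predicted & gold)
    # Guard against division by zero when either set is empty
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}
```

For instance, with two predicted spans of which one is correct and three annotated spans, precision is 1/2, recall is 1/3, and F1 is 0.4.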

EvilSumrak2049/JustAI
