NAMA The NAme MAtching tool

Fast, flexible name matching for large datasets

Installation

Recommended install via pip

Create a virtual environment (optional).
Install nama with `pip install git+https://github.com/bradhackinen/nama.git@master`

Install from source with conda

Install Anaconda
Clone nama

git clone https://github.com/bradhackinen/nama.git

Enter the conda directory, which contains the conda environment file, with

cd conda

Create new conda environment with

conda create --name <env-name>

Activate the new environment with

conda activate <env-name>

Download & Install pytorch-mutex

conda install pytorch-mutex-1.0-cuda.tar.bz2

Download & Install pytorch

conda install pytorch-1.10.2-py3.9_cuda11.3_cudnn8.2.0_0.tar.bz2

Install the rest of the dependencies with

conda install --file conda_env.txt

Exit the conda directory with

cd ..

Install the package with

pip install .

Install from source with pip

Clone nama with `git clone https://github.com/bradhackinen/nama.git`
Create and activate a virtual environment with `python -m venv nama_env && source nama_env/bin/activate`
Install the dependencies with `pip install -r requirements.txt`
Install the package with `pip install ./nama`

To install from the project root directory, use `pip install .`
To install from another directory, use `pip install /path-to-project-root`
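
After any of the install routes above, it can be useful to confirm that the package is visible to the Python interpreter you intend to use. The following is a minimal sketch using only the standard library; the helper name `is_installed` is made up for illustration and is not part of nama.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found on the import path."""
    # find_spec returns None when no top-level module of that name is found
    return importlib.util.find_spec(package_name) is not None


# Reports whether the active environment can see the nama package
print("nama installed:", is_installed("nama"))
```

If this prints `False` inside the environment you just set up, the most likely cause is that a different environment is active than the one the package was installed into.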

Demo

Usage

Using the `Matcher()`

Importing data

To import data into the matcher we can either pass nama a pandas DataFrame with

import nama

training_data = nama.from_df(
    df,
    group_column='group_column',
    string_column='string_column')
print(training_data)

or we can pass nama a .csv file directly

import nama

testing_data = nama.read_csv(
    'path-to-data',
    match_format=match_format,
    group_column=group_column,
    string_column=string_column)
print(testing_data)

See from_df & read_csv for parameters and function details
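
The `df` passed to `nama.from_df` above is not defined in the example. As a rough sketch of what such a DataFrame might look like, assuming one row per name string and a label column identifying which strings refer to the same entity, it could be built with pandas as follows. The names and labels are invented, and the interpretation of the two columns is an assumption rather than something this README states.

```python
import pandas as pd

# Hypothetical example data: each row is one name string, and rows that
# refer to the same entity share a group label (an assumption about the
# meaning of group_column, not confirmed by the README).
df = pd.DataFrame({
    "string_column": [
        "ABC Inc.",
        "abc inc",
        "A.B.C. Incorporated",
        "XYZ Corp",
        "X.Y.Z. Corporation",
    ],
    "group_column": ["abc", "abc", "abc", "xyz", "xyz"],
})

# Quick sanity check: count how many name variants fall in each group
group_sizes = df.groupby("group_column")["string_column"].count()
print(group_sizes)
```

A DataFrame shaped like this uses the same column names as the `from_df` call shown earlier, so it could be passed in place of `df` there.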

Using the `EmbeddingSimilarityModel()`

Initialization

We can initialize a model like so

from nama.embedding_similarity import EmbeddingSimilarityModel

sim = EmbeddingSimilarityModel()

If using a GPU, we need to send the model to the GPU device with

sim.to(gpu_device)
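
The `gpu_device` passed to `sim.to(...)` is not defined in the snippet above. One common way to obtain such a device in PyTorch is sketched below, assuming the model follows the usual PyTorch `.to(device)` convention (the README does not say which device types it accepts). The fallback to CPU is an addition for machines without CUDA.

```python
import torch

# Pick the first CUDA device if one is available, otherwise fall back to CPU
gpu_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", gpu_device)
```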

Training

To train a model, we simply need to specify the training parameters and the training data

train_kwargs = {
    'max_epochs': 1,
    'warmup_frac': 0.2,
    'transformer_lr': 1e-5,
    'score_lr': 30,
    'use_counts': False,
    'batch_size': 8,
    'early_stopping': False,
}

history_df, val_df = sim.train(training_data, verbose=True, **train_kwargs)

We can also save the trained model for later

sim.save("path-to-save-model")

Testing

We can use the model we trained above directly

embeddings = sim.embed(testing_data)

Or load a previously trained model

from nama.embedding_similarity import load_similarity_model

new_sim = load_similarity_model("path-to-saved-model")
embeddings = new_sim.embed(testing_data)

MORE TO COME
