Calculate similarities between JSON objects

This calculates the similarities between JSON objects and writes a file containing the top x most similar entries for each object.

It can be used, for example, to generate recommendations for similar blog posts.

Setup

python -m spacy download en_core_web_lg
pip install tensorflow_hub

Requires Python 3.12.

Install requirements from requirements.txt:

pip install -r requirements.txt

Install the en_core_web_lg model (license: MIT):

python -m spacy download en_core_web_lg

This uses universal-sentence-encoder (license: Apache 2.0).

Example usage

The following command calculates the similarities between all objects in the two given JSON files and writes the top 5 most similar objects for each object to a single JSON file.

python main.py --content_identifier "content" --highlights_identifier "id" 1.json 2.json

Explanation of field usage

With the command above, the similarity is calculated on the content field of every object.

The result is a JSON file with the top x similarities for each object, ordered and identified by the id field.

1.json:

[
    {
        "id": 1,
        "content": "This is just an example in one file."
    }
]

2.json:

[
    {
        "id": 2,
        "content": "This is an example."
    },
    {
        "id": 3,
        "content": "Also a second example."
    }
]

Result top_5.json:

{
    "1": [
        {
            "id": 2,
            "content": "example."
        },
        {
            "id": 3,
            "content": "second example."
        }
    ],
    "2": [
        {
            "id": 3,
            "content": "second example."
        },
        {
            "id": 1,
            "content": "example one file."
        }
    ],
    "3": [
        {
            "id": 2,
            "content": "example."
        },
        {
            "id": 1,
            "content": "example one file."
        }
    ]
}

Please note that the content field is sanitized, so its value in the result can differ from the input.
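To illustrate what such sanitization can look like, here is a minimal sketch that removes stop words from a text. The `sanitize` function and the `STOP_WORDS` set are hypothetical and chosen only so that the sample inputs above produce the shortened strings shown in the result; the project itself may use different rules, for example the stop words of the spaCy model.

```python
# Hypothetical stop-word list for illustration only; the real project
# may sanitize text differently (e.g. with spaCy's own stop words).
STOP_WORDS = {"a", "an", "also", "in", "is", "just", "this"}


def sanitize(text: str) -> str:
    """Drop stop words and keep the remaining tokens in their original order."""
    kept = [
        token
        for token in text.split()
        # Compare without trailing punctuation and case so "This" matches "this".
        if token.strip(".,!?").lower() not in STOP_WORDS
    ]
    return " ".join(kept)


if __name__ == "__main__":
    for sample in (
        "This is just an example in one file.",
        "This is an example.",
        "Also a second example.",
    ):
        print(repr(sample), "->", repr(sanitize(sample)))
```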

You can run the example from this section with scripts/example.sh.
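The general idea of ranking objects by similarity can be sketched as follows. This is not the project's implementation: it swaps the universal-sentence-encoder for a plain bag-of-words vector with cosine similarity so that it runs without any model download. The names `embed`, `cosine`, and `top_similar` are assumptions made for this sketch, and because the embedding differs, the ranking it produces can differ from the one in top_5.json.

```python
import json
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts (stand-in for a real sentence encoder)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_similar(objects, content_key="content", id_key="id", top_k=5):
    """Map each object's id to its top_k most similar other objects."""
    vectors = {obj[id_key]: embed(obj[content_key]) for obj in objects}
    result = {}
    for obj in objects:
        others = [o for o in objects if o[id_key] != obj[id_key]]
        # Most similar first.
        others.sort(
            key=lambda o: cosine(vectors[obj[id_key]], vectors[o[id_key]]),
            reverse=True,
        )
        result[str(obj[id_key])] = others[:top_k]
    return result


if __name__ == "__main__":
    objects = [
        {"id": 1, "content": "This is just an example in one file."},
        {"id": 2, "content": "This is an example."},
        {"id": 3, "content": "Also a second example."},
    ]
    print(json.dumps(top_similar(objects), indent=4))
```

The result is keyed by the id of each object, in the same shape as top_5.json, and each object is excluded from its own list.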
