Support list variables in match_dicts #72

Open
sam-writer opened this issue Jul 16, 2020 · 0 comments

Contributor

sam-writer commented Jul 16, 2020

In our newest project, we are using a wrapped version of replaCy to support list variables in match_dicts, like so:

import json
import os
from typing import List

from replacy import ReplaceMatcher
from replacy.db import load_json as load_replacy_files_from_directory


here = os.path.abspath(os.path.dirname(__file__))


class ModifiedReplaceMatcher:
    def __init__(self):
        rd_path = os.path.join(here, "resources/match_dicts")
        proto_match_dict = load_replacy_files_from_directory(rd_path)
        vocab_refs = self._load_vocab_refs("resources/variables/vocab_refs.json")
        self.rmatch_dict = self._refine_match_dict(proto_match_dict, vocab_refs)

    def _load_vocab_refs(self, vocab_refs_path: str):
        file_path = os.path.join(here, vocab_refs_path)
        with open(file_path, "r", encoding="utf-8") as f:
            return json.load(f)

    def _remove_square_brackets_from_list_of_strings(self, l: List[str]) -> str:
        """
        look at me, I'm metaprogramming
        turns ["a", "b", "c"]
        into '"a", "b", "c"'
        """
        list_str = '"'
        list_str += '", "'.join(l)
        list_str += '"'
        return list_str

    def _refine_match_dict(self, match_dict: dict, vocab_refs: dict) -> dict:
        """
        Replace $REF:something by vocab list
        from vocab_refs.json file
        This should probably be replaCy functionality
        And we could add functionality if we did fancier parsing
        """
        r_matcher_str = json.dumps(match_dict)
        for ref_id, ref_list in vocab_refs.items():
            # this is a sin
            ref_list_str = self._remove_square_brackets_from_list_of_strings(ref_list)
            target = f'"$REF:{ref_id}"'
            r_matcher_str = r_matcher_str.replace(target, ref_list_str)
            # end sin
        return json.loads(r_matcher_str)

    def get_matcher(self, nlp, kenlm_path):
        return ReplaceMatcher(nlp, match_dict=self.rmatch_dict, lm_path=kenlm_path)

Here, resources/variables/vocab_refs.json would have an entry like:

{
  "variable-name": [
    "hello",
    "hi",
    "yo"
  ]
}

This allows for a match dict syntax like:

[
  {"LOWER":{"IN": ["$REF:variable-name"]}}
]

which is convenient for frequently used lists of words. Lists are easier to support than dicts, though.
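For discussion, here is a minimal sketch of what the "fancier parsing" could look like: rather than substituting into the serialized JSON string, it walks the parsed match dict and splices each referenced list into the list that contains the `$REF:` string. This is not existing replaCy functionality. The name `expand_refs` and the constant `REF_PREFIX` are hypothetical, and the extra `"hey"` item is only there to show a reference mixed with literal words. One motivation is that raw string replacement yields invalid JSON as soon as a vocab word contains a double quote or a backslash, while working on the parsed structure sidesteps escaping entirely.

```python
from typing import Any, Dict, List

REF_PREFIX = "$REF:"  # same marker the wrapper above uses


def expand_refs(node: Any, vocab_refs: Dict[str, List[str]]) -> Any:
    """Return a copy of `node` in which every "$REF:name" list item is
    replaced by the items of vocab_refs["name"]. (Hypothetical helper.)"""
    if isinstance(node, dict):
        return {key: expand_refs(value, vocab_refs) for key, value in node.items()}
    if isinstance(node, list):
        expanded = []
        for item in node:
            if isinstance(item, str) and item.startswith(REF_PREFIX):
                # Splice the referenced words into the enclosing list.
                # An unknown name raises KeyError instead of passing silently.
                expanded.extend(vocab_refs[item[len(REF_PREFIX):]])
            else:
                expanded.append(expand_refs(item, vocab_refs))
        return expanded
    # Strings, numbers, booleans and None are returned unchanged.
    return node


vocab_refs = {"variable-name": ["hello", "hi", "yo"]}
pattern = [{"LOWER": {"IN": ["$REF:variable-name", "hey"]}}]
expanded_pattern = expand_refs(pattern, vocab_refs)
print(expanded_pattern)
# [{'LOWER': {'IN': ['hello', 'hi', 'yo', 'hey']}}]
```

Unlike the string-replacement wrapper, this version only expands references that appear as list items; a `$REF:` string used directly as a dict value is left untouched. It also makes a typo in a variable name fail loudly with a `KeyError`.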

@sam-writer sam-writer changed the title Support list and dict variables in match_dicts Support list variables in match_dicts Sep 22, 2020