ensure all migration logic was moved to mojp-dbs-pipelines #204

Open
OriHoch opened this issue Jul 10, 2017 · 2 comments

Comments

@OriHoch
Contributor

OriHoch commented Jul 10, 2017

Need to review the migration code and logic and ensure it has all been moved to the pipelines.

Make sure we didn't miss any logic.
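One way to check that no logic was missed is to run the same sample documents through the old transformation code and through the new pipeline step, then compare the resulting Elasticsearch bodies field by field. A minimal sketch, assuming both outputs can be reduced to plain dicts; `diff_docs` is a hypothetical helper, not part of bhs_api or mojp-dbs-pipelines:

```python
def diff_docs(old, new, path=""):
    """Recursively list the differences between two dicts as readable strings."""
    diffs = []
    # sort keys (as strings) so the report order is stable between runs
    for key in sorted(set(old) | set(new), key=str):
        where = "{}.{}".format(path, key) if path else str(key)
        if key not in new:
            diffs.append("missing in new: " + where)
        elif key not in old:
            diffs.append("only in new: " + where)
        elif isinstance(old[key], dict) and isinstance(new[key], dict):
            # descend into nested objects such as Header
            diffs.extend(diff_docs(old[key], new[key], where))
        elif old[key] != new[key]:
            diffs.append("changed: {} ({!r} -> {!r})".format(where, old[key], new[key]))
    return diffs


old_body = {"gender": "U", "Header": {"En": "x", "He": "_"}, "person_id": 7}
new_body = {"gender": "U", "Header": {"En": "x", "He": ""}}
print(diff_docs(old_body, new_body))
# ["changed: Header.He ('_' -> '')", 'missing in new: person_id']
```

An empty list for every sample document would suggest the pipeline reproduces the old behavior for those inputs; it says nothing about inputs that were not sampled.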

@OriHoch
Contributor Author

OriHoch commented Jul 10, 2017

The following code was deleted. We need to review the logic it contained and ensure that all of it (where still needed) was moved to the pipelines.

bhs_api/item.py

```python
# doc_show_filter, get_doc_id and uuids_to_str are helpers defined elsewhere
# in bhs_api (not shown in this issue)
from copy import deepcopy

from flask import current_app


def update_es(collection_name, doc, is_new, es_index_name=None, es=None, app=None):
    app = current_app if not app else app
    es_index_name = app.es_data_db_index_name if not es_index_name else es_index_name
    es = app.es if not es else es
    # index only the docs that are publicly available
    if doc_show_filter(collection_name, doc):
        body = deepcopy(doc)
        # adjust attributes for elasticsearch
        if collection_name == "persons":
            body["person_id"] = body.get("id", body.get("ID"))
            body["first_name_lc"] = body["name_lc"][0]
            body["last_name_lc"] = body["name_lc"][1]
            # maps all known SEX values to normalized gender value
            # NOTE: .strip() runs before the lookup, so a SEX value of None raises
            # AttributeError (the None key is never reached) and any unlisted value raises KeyError
            body["gender"] = {"F": "F", "M": "M",
                              None: "U", "": "U", "U": "U", "?": "U", "P": "U"}[body.get("SEX", "").strip()]
        # _id field is internal to mongo
        if '_id' in body:
            del body['_id']
        # id field has special meaning in elasticsearch
        if 'id' in body:
            del body['id']
        if "thumbnail" in body and "data" in body["thumbnail"]:
            # no need to have thumbnail data in elasticsearch
            # TODO: ensure we only store and use thumbnail from filesystem
            del body["thumbnail"]["data"]
        # persons collection gets a fake header to support searching
        if collection_name == "persons":
            name = " ".join(body["name"]) if isinstance(body["name"], list) else body["name"]
            body["Header"] = {"En": name, "He": name}
        # elasticsearch uses the header for completion field
        # this field does not support empty values, so we put a string with space here
        # this is most likely wrong, but works for now
        # TODO: figure out how to handle it properly, maybe items without header are invalid?
        if "Header" in body:
            for lang in ("He", "En"):
                if body["Header"].get(lang) is None:
                    body["Header"][lang] = '_'
        if collection_name == "persons":
            doc_id = "{}_{}_{}".format(body["tree_num"], body["tree_version"], body["person_id"])
        else:
            doc_id = get_doc_id(collection_name, body)
        if is_new:
            uuids_to_str(body)
            es.index(index=es_index_name, doc_type=collection_name, id=doc_id, body=body)
            return True, "indexed successfully (inserted)"
        else:
            # NOTE: the Elasticsearch update API expects the partial document wrapped
            # as {"doc": body}; also, uuids_to_str() is only applied on the insert path
            es.update(index=es_index_name, doc_type=collection_name, id=doc_id, body=body)
            return True, "indexed successfully (updated)"
    else:
        return True, "item should not be shown - so not indexed"
```
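To compare this against the pipelines, the per-document adjustments above can be pulled out into a pure function with no Flask or Elasticsearch dependency. This is a sketch under stated assumptions, not code from either repository: `to_es_doc`, `normalize_gender` and `GENDER_MAP` are hypothetical names, the `doc_show_filter` visibility check is left out, and `get_doc_id` is replaced by a `fallback_id` argument because its logic is not shown here. Unlike the original, it maps a missing, None or unknown SEX value to "U" instead of raising.

```python
from copy import deepcopy

# Hypothetical table mirroring the SEX -> gender mapping in update_es
GENDER_MAP = {"F": "F", "M": "M", "": "U", "U": "U", "?": "U", "P": "U"}


def normalize_gender(sex):
    """Map a raw SEX value to F/M/U; None and unknown values become "U" (assumption)."""
    return GENDER_MAP.get((sex or "").strip(), "U")


def to_es_doc(collection_name, doc, fallback_id=None):
    """Return (doc_id, body) with the field adjustments update_es made before indexing.

    fallback_id is a hypothetical stand-in for get_doc_id() on non-persons collections.
    """
    body = deepcopy(doc)  # never mutate the caller's document
    if collection_name == "persons":
        body["person_id"] = body.get("id", body.get("ID"))
        body["first_name_lc"] = body["name_lc"][0]
        body["last_name_lc"] = body["name_lc"][1]
        body["gender"] = normalize_gender(body.get("SEX"))
        # persons get a fake header so they can be searched like other items
        name = " ".join(body["name"]) if isinstance(body["name"], list) else body["name"]
        body["Header"] = {"En": name, "He": name}
    # _id is internal to mongo; id has a special meaning in elasticsearch
    body.pop("_id", None)
    body.pop("id", None)
    # thumbnail binary data is not needed in elasticsearch
    if isinstance(body.get("thumbnail"), dict):
        body["thumbnail"].pop("data", None)
    # the completion field rejects empty values, so "_" is used as a placeholder
    if "Header" in body:
        for lang in ("He", "En"):
            if body["Header"].get(lang) is None:
                body["Header"][lang] = "_"
    if collection_name == "persons":
        doc_id = "{}_{}_{}".format(body["tree_num"], body["tree_version"], body["person_id"])
    else:
        doc_id = fallback_id
    return doc_id, body


doc_id, body = to_es_doc("persons", {
    "_id": "abc", "id": 7, "tree_num": 1, "tree_version": 2,
    "name": ["Sara", "Levi"], "name_lc": ["sara", "levi"], "SEX": None,
})
# doc_id == "1_2_7", body["gender"] == "U",
# body["Header"] == {"En": "Sara Levi", "He": "Sara Levi"}
```

Keeping the transformation separate from the `es.index` / `es.update` calls makes it possible to feed the same sample documents to both this function and the corresponding pipeline step and compare the outputs.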

@OriHoch
Contributor Author

OriHoch commented Jul 10, 2017

The following should be deleted:

  • the migration/ folder
  • many items in the scripts/ folder, such as ensure required metadata, dump_mongo, etc.
