Allow customizable demographic fields to be set from the backend. #11999

Closed
rtibbles opened this issue Mar 19, 2024 · 6 comments

@rtibbles
Member

Overview

Allow customisable demographic data collection for contextualized demographic reporting

Description and outcomes

During the initial design of the demographic data collection currently in Kolibri, we were aware of additional demographic reporting requirements, but were unable to come up with a set of fields more generalizable than the limited ones we currently collect.

Because of the resource and data transmission constraints in the contexts where Kolibri is used, collecting additional demographic data outside of the platform places a significant additional burden on implementations, and is a barrier to effective and targeted measurement and evaluation work.

To address this, this feature will pilot demographic data collection beyond the current fixed fields by allowing someone with command line access to Kolibri to add additional demographic fields to be collected. This can be scripted for automated setup when piloting this feature. Only enumerated fields can be added: in the user interface each field will be a dropdown menu, and in the backend the schema specification will require an enum of allowed string values.

Technical specifications

A JSONSchema of this rough form will be added to the FacilityDataset extra_fields_schema as an additional property. It will also be reused in the DeviceSettings extra_settings_schema.

translations_schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "language": {
                "type": "string",
                "enum": list(KOLIBRI_SUPPORTED_LANGUAGES)
            },
            "message": {
                "type": "string"
            },
        },
    },
    "optional": True,
}


demographic_field_schema = {
    "type": "object",
    "properties": {
        "description": {
            "type": "string"
        },
        "enumValues": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "value": {
                        "type": "string"
                    },
                    "defaultLabel": {
                        "type": "string"
                    },
                    "translations": translations_schema,
                },
            },
        },
        "translations": translations_schema,
    },
}
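
For illustration, a single field specification conforming to the rough schema above might look like the following. This is only a sketch: the education question, the `es-es` language code, and all label strings are invented for the example and are not part of the proposal.

```python
# Hypothetical example data; none of these values come from the proposal.
example_field = {
    "description": "Highest level of education completed",
    "enumValues": [
        {
            "value": "primary",
            "defaultLabel": "Primary school",
            "translations": [
                {"language": "es-es", "message": "Escuela primaria"},
            ],
        },
        {
            "value": "secondary",
            "defaultLabel": "Secondary school",
            # "translations" is marked optional in the schema, so it is omitted here.
        },
    ],
    "translations": [
        {"language": "es-es", "message": "Nivel educativo completado"},
    ],
}

# The values a user may choose from are the "value" entries of enumValues.
allowed_values = [option["value"] for option in example_field["enumValues"]]
print(allowed_values)
```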

An extra_fields field will be added to the FacilityUser model, which will be dynamically validated, depending on the value of the schema saved in the extra_fields of its associated FacilityDataset model.

For ease of programmatic setup, a management command will be created to set these field specifications in the device settings of an unprovisioned device. Any Facility created on a device that has been set up this way will copy the field specifications from the DeviceSettings.

All user creation and editing workflows will also be updated to allow editing and saving of these new fields. As each field will be a dropdown, this will be implemented using a KSelect, with one displayed for each field. The order of the fields will be determined by the order of the fields saved in the array in JSON.
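
As a rough sketch of how the dropdown options could be derived from a field specification: each option's label is looked up in its translations for the current language, falling back to defaultLabel when no translation matches, and the array order is preserved. The real lookup would live in the frontend next to the KSelect; the Python below only illustrates the logic, and the function names `resolve_label` and `dropdown_options` as well as the sample data are hypothetical.

```python
def resolve_label(default_label, translations, language):
    """Return the message for `language`, falling back to `default_label`."""
    for entry in translations or []:
        if entry.get("language") == language:
            return entry["message"]
    return default_label


def dropdown_options(field, language):
    """Build (value, label) pairs for one field, preserving array order."""
    return [
        (
            option["value"],
            resolve_label(
                option["defaultLabel"], option.get("translations"), language
            ),
        )
        for option in field["enumValues"]
    ]


# Hypothetical sample field, invented for this example.
field = {
    "description": "Highest level of education completed",
    "enumValues": [
        {
            "value": "primary",
            "defaultLabel": "Primary school",
            "translations": [{"language": "es-es", "message": "Escuela primaria"}],
        },
        {"value": "secondary", "defaultLabel": "Secondary school"},
    ],
}

options_es = dropdown_options(field, "es-es")
options_fr = dropdown_options(field, "fr-fr")  # no translations: default labels
```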

This will affect the setup wizard, the user profile page, and facility user management.

@jredrejo
Member

While implementing this pull request, it seemed that the json-schema-validator library (https://pypi.org/project/json-schema-validator/) was the only compatible option for Python 2.7. However, it does not support several newer features of the current JSON Schema standard.

Looking at the proposed schema, I can't find support for translations or enumValues in the mentioned library. If this has to be applied in Kolibri 0.16, which supports Python 2.7, I suggest taking a look at https://github.com/zyga/json-schema-validator/blob/master/json_schema_validator/tests/test_schema.py to check what can be used with this library.

Regarding the proposed approach, using customizable fields in the demographic data seems like the least disruptive option. It maintains backward compatibility with the current model and minimizes the changes needed for synchronization. However, I remember there were some problems in the past with Morango syncing JSON data. Perhaps @bjester can shed light on whether this functionality is now working as expected.

@bjester
Member

bjester commented Mar 20, 2024

@jredrejo No updates have been made to how JSON fields are handled when synced. If both devices in a sync have modified the data, the changes are not merged, so only one side's writes are preserved.
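
To make the consequence concrete: without merging, a JSON field behaves as a single opaque value during sync. The sketch below uses plain dictionaries rather than Morango's API, the field names and values are invented, and it arbitrarily treats device B as the side whose write survives; which side actually wins in a real sync is not something this example tries to model.

```python
import copy

# Value of a JSON field as last synced to both devices (hypothetical data).
shared = {"education": "primary"}

device_a = copy.deepcopy(shared)
device_a["education"] = "secondary"  # device A edits an existing key

device_b = copy.deepcopy(shared)
device_b["occupation"] = "farmer"  # device B adds a different key


def resolve_without_merge(value_a, value_b):
    """Keep one side's value wholesale; no key-by-key merge is attempted.

    Picking B here is an arbitrary choice for illustration only.
    """
    return copy.deepcopy(value_b)


result = resolve_without_merge(device_a, device_b)
# Device A's change to "education" is lost even though the two
# devices touched different keys.
print(result)
```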

@rtibbles
Member Author

> translations or enumValues in the mentioned library

These are just the names of properties that I am defining for the schema. We would then generate the JSONSchema from this for validation of the entries in the extra_fields of the FacilityUser.
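
A minimal sketch of that generation step, under stated assumptions: the rough schema in the issue does not say how each field is keyed in a user's extra_fields, so the `id` property used below is hypothetical, as are the function name `build_extra_fields_schema` and the sample data. The `"optional": True` marker simply mirrors the convention used in translations_schema.

```python
def build_extra_fields_schema(field_specs):
    """Derive a schema for FacilityUser extra_fields from field specifications.

    Each specification becomes a string property restricted to the
    "value" entries of its enumValues.
    """
    properties = {}
    for spec in field_specs:
        # "id" is an assumed key; the proposal does not define one.
        properties[spec["id"]] = {
            "type": "string",
            "enum": [option["value"] for option in spec["enumValues"]],
            "optional": True,
        }
    return {"type": "object", "properties": properties}


# Hypothetical specification, invented for this example.
field_specs = [
    {
        "id": "education",
        "description": "Highest level of education completed",
        "enumValues": [
            {"value": "primary", "defaultLabel": "Primary school"},
            {"value": "secondary", "defaultLabel": "Secondary school"},
        ],
    },
]

user_schema = build_extra_fields_schema(field_specs)
```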

@jamalex
Member

jamalex commented Mar 20, 2024

A couple of thoughts:

  • On the DeviceSettings, might be clearer to name it something like default_demographic_field_schema to make it clear that it's just referenced during facility creation, and wouldn't override what's marked on an existing facility dataset.
  • I like having the schema being synced, and I agree with validating any data entered from the frontend against the schema before saving it to the FacilityUser. I would caution against adding strict model validation on it, though, as the consequences of a FacilityUser (and all their data) being entirely blocked from syncing would outweigh the benefits to me. And it would mean it's possible for a formerly valid model to suddenly become invalid because of a change in another model (which may have happened on another device, meaning we'd need to worry about upgrade routines etc, but where would the logic of how to map between schema versions live?). Having it ensure that what gets saved from the frontend matches the schema, and then blanking out any invalid field values when loading to the frontend from the model, seems like it could be a safer alternative. In the edge cases of data on a user that doesn't match the current schema, we could apply centralized normalization to that on a case by case basis on KDP.
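
The "blank out invalid values on load" idea in the second point could be sketched roughly as follows. This is an illustration only, not an agreed design: the `id` key on each field specification, the function name `blank_invalid_values`, and the sample data are all hypothetical, and the stored data is deliberately left untouched so it can still sync.

```python
def blank_invalid_values(stored_values, field_specs):
    """Return a copy of a user's values that is safe to show in the frontend.

    Values not allowed by the current schema are replaced with None, and
    keys that the current schema does not know about are left out.
    """
    cleaned = {}
    for spec in field_specs:
        allowed = {option["value"] for option in spec["enumValues"]}
        value = stored_values.get(spec["id"])
        cleaned[spec["id"]] = value if value in allowed else None
    return cleaned


# Hypothetical current schema and stored user data.
field_specs = [
    {"id": "education", "enumValues": [{"value": "primary"}, {"value": "secondary"}]},
    {"id": "occupation", "enumValues": [{"value": "farmer"}, {"value": "teacher"}]},
]
stored = {"education": "tertiary", "occupation": "farmer", "legacy": "x"}

cleaned = blank_invalid_values(stored, field_specs)
# "tertiary" is no longer an allowed value, so it is blanked for display;
# `stored` itself is not modified.
print(cleaned)
```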

@rtibbles
Member Author

Thanks - it does seem like syncing the schema is desirable, but doing softer validation for a dynamic schema will be necessary.

The issue of it being updated on two different devices and then having conflicting schemas within the FacilityUser data does seem like a nightmare. I'll try to add some test cases around that to make sure we're not making a mess for ourselves.

@rtibbles self-assigned this Mar 21, 2024
@rtibbles added the "DEV: backend" (Python, databases, networking, filesystem...) label Mar 21, 2024
@rtibbles
Member Author

rtibbles commented Apr 2, 2024

Implemented in #12032

@rtibbles closed this as completed Apr 2, 2024