Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle changes to schema selector fields #4

Closed
ian-ross opened this issue Jul 8, 2016 · 9 comments
Closed

Handle changes to schema selector fields #4

ian-ross opened this issue Jul 8, 2016 · 9 comments

Comments

@ian-ross
Copy link
Contributor

ian-ross commented Jul 8, 2016

Currently, there's no special treatment for cases where the schema selectors for a model instance change and so the effective schema changes. There should be a facility to manage this sort of thing, including some mechanism for migrating values between the "before" and "after" schemas.

@ian-ross
Copy link
Contributor Author

I think that I've fixed this, to some degree, but I've not done anything about migrating values between "before" and "after" schemas. @bjohare How were you imagining that this might work? I can see a couple of cases where things should just work (new schema has a non-required field the old schema doesn't; new schema lacks a field that was in the old schema), one where some action is required but things will work in all possible cases (new schema has a required field that the old schema doesn't, and there's a default value provided for the field), and a couple where there might be trouble (new schema has a required field that the old schema doesn't, with no default value provided; new schema has a field that occurs in the old schema, but with a different type that's not compatible with the current value in the field, or a choices list where the current value of the field doesn't occur).

For the "trouble" cases, the temptation is to throw an exception somehow, but that's tricky because the schema change is only computed when you try to access the attributes or schema list for the underlying JSONAttributes value. That means that you might update a field in a model that's used as a schema selector, and only find out later on that you did a bad thing when you try to access the list of attributes for the model. You would end up with model instances in states inconsistent with their attribute schemas.

The alternative was somehow to monkey-patch assignments to model instance fields that are used as schema selectors, but that would quickly get horrible. Suppose that, for some platform domain entity, you use the project and organisation as schema selectors. If you update the entity's project, you need to recalculate the schemas. But also, if you update the project's organisation, you need to update the schemas. That means that you would need to maintain some sort of notification list for all of these things, which would quickly get complicated. It would also lead to "non-local" exceptions: if you modify a project's organisation, you could get exceptions saying that you're not allowed to make that change because it invalidates an attribute in a field in an instance of some model instance that has a foreign key to the project. That seems just wrong-oh.

One option there is to say that schema updates when selectors change is only supported for selectors that aren't foreign keys, but that's naff. You might want to represent some sort of entity type as a foreign key into a type table, and you ought to be able to change those and have the schemas be recalculated without any drama.

Any ideas how to untangle this knot?

@oliverroick
Copy link
Member

We had similar problems at UCL, here’s how we dealt with it:

  • New schema has a required field that the old schema doesn't, with no default value provided: When a data collector edits a record, and there is no value provided for the new required field, the application throws, and error. i.e. the field is marked invalid. Wouldn’t it be possible to add a message like “The data schema has changed, please provide a value for this new field”?
  • New schema has a field that occurs in the old schema, but with a different type that's not compatible with the current value in the field: Changing the type of a field was something we did not allow. The only time when this was an issue, was when a project manager adds a field with a wrong type and wants to change that. Of course, this was much easier to control, because people were creating fields via the UI and not by uploading a form.
  • Choices list where the current value of the field doesn't occur We allowed this, but at people's risk. Project managers should be aware that removing a value might affect existing data. I think, we had a warning saying “There’s data with this value, you really shouldn’t to that.” Again, much easier to control, because UI-only.

Maybe we’re looking at this from the wrong point of view. Can we find a way to compute the differences between schemas when the schema is uploaded and provide a bunch of warnings whenever required?

@wonderchook
Copy link

I think there is something to be said about providing a bunch of warnings whenever required as @oliverroick suggested. So in his examples:

  • New schema has a required field that the old schema doesn't, with no default value provided: We warn the user about this case and tell them to either make the field not required or provide a default value
  • New schema has a field that occurs in the old schema, but with a different type that's not compatible with the current value in the field: This one we should tell them they can't change the value type to the incompatible one. Would it be possible to allow this if no data has yet been collected though? That would cover the scenario where the wrong field has been used.
  • Choices list where the current value of the field doesn't occur: I think allow and when we have a UI there will be a warning.

Thoughts?

@ian-ross
Copy link
Contributor Author

ian-ross commented Aug 2, 2016

I agree with this, and I've started implementing something to do it.

The idea is to signal these problems when an attempt is made to save a model instance to the database by using Django's pre_save signal to check the schema setup and throw an exception if anything has changed in a "bad" way (thus rolling back the model instance save action). There's no easy or efficient way of detecting when the fields of a model instance that are used as schema selectors are changed (there's no Django signal for "field changed" or something like that, so you would have to monkey patch the __setitem__ method in the relevant model classes to make it work, which would be error-prone and could interfere with other things that Django does). This isn't as bad as it might sound, because it's only when you save a model instance that things like database integrity constraints (uniqueness, etc.) get applied, and I think it's fair to argue that these schema conditions are comparable to things like uniqueness, so the model's save method is the right place to be checking them. (It's also quite possible to do the "allow changes if there's no data" thing you ask about.)

The only real problem with this is that it's difficult to do anything other than "allow silently" (don't throw an exception, so the save goes ahead as normal) or "disallow with error information" (throw an exception, save is aborted, exception information is passed back to the user showing which fields had problems). The last example you give, where the change is allowed but there's warning information, is a bit trickier to deal with. The only simple way to do it is to require the user to check explicitly whether there are schema migration warnings after attempting to save a model instance. If that's an acceptable approach, it should be easy to implement.

To summarise, here's what I'm doing:

  • Detection of save actions and determination whether schema selectors have changed. [DONE]
  • Calculation of attribute lists based on old and new schema selectors. [DONE]
  • Determination of attribute list differences and identification of any "unsafe" changes. [TODO]
  • Schema migration warnings check. [TODO]
  • More testing. [TODO]

@wonderchook
Copy link

@ian-ross so in that case the errors would come back and the user would indicate they want to go forward anyway? I think that is an acceptable method.

@ian-ross
Copy link
Contributor Author

ian-ross commented Aug 2, 2016

I think we might be using the word "user" in two different senses here. When I say "user" here, I mean a user of the django-jsonattrs API, not a user of the platform (so the platform code is the "user" of the API).

After making a change to a Django model instance that leads to a schema update, there needs to be a check to see if there are any warnings, and if so, somehow to present those warnings to the "end user" in the platform UI. How that happens is up to the platform code, but the only mechanism I can think of that maintains a reasonable separation of concerns between the django-jsonattrs code and the platform code is to require this explicit check for warnings. If you do something that leads to a schema update and it fails, you'll get an exception. If you don't get an exception, it either means that everything was completely OK, or it was OK enough to proceed, but there might be warnings. But you need to check for those warnings explicitly.

An additional thing that I could do is to add some code to allow a user of django-jsonattrs to check in advance whether an model instance change leading to a schema update is OK or not. You would call a function indicating what changes you're making to the model instance, and this function would return a result saying either: "no schema update", "schema update OK", "bad schema update: here are the errors", "OK-ish schema update, but with warnings, and here they are". That might make it easier to manage these things in the platform, since you'd be able to check in advance whether there was going to be a problem of some sort.

@oliverroick @bjohare What do you think?

@wonderchook
Copy link

@ian-ross I was using the term user really broadly, but thinking about the API. I was picturing how it might be used on the front end, but what would have to happen on the backend. To me being able to check in advance would make sense from an API perspective. Then in the implementation in the platform that could be used to allow the user to say they didn't care about the warnings and to proceed.

@ian-ross
Copy link
Contributor Author

ian-ross commented Aug 2, 2016

OK, that makes sense. I'll implement the in-advance check as well. Just writing a big pile of tests to make sure all the cases work right right now...

@ian-ross
Copy link
Contributor Author

ian-ross commented Aug 4, 2016

I believe I now have a good solution for this. Changes to the schema selector fields of a model instance that has a JSONAttributeField field are detected when you try to save the model instance. If there are conflicts between the new and old schemas as a result of the schema selector changes, you get a SchemaUpdateException carrying a list of SchemaUpdateConflict objects describing what's wrong. The conflicts caught by this cover adding required fields without defaults and changing the types of fields in an inconsistent way. Additionally, you can call the schema_update_conflicts function, passing a model instance as a parameter, after updating the schema selector fields but before saving the model instance. This function returns a list of conflicts that would be flagged if you attempted to save the model instance. This allows for checking of conflicts before trying to save. In addition to the "required no default" and "incompatible types" conflicts, the schema_update_conflicts function also flags updates to or addition of choice lists that don't include the relevant current field value.

I'm not sure what to do about incorporating this into the platform. These changes are included in version 0.1.8 of django-jsonattrs and all the platform unit tests pass using this version, so we can just update the django-jsonattrs version. @bjohare Did you have any particular ideas about how to deal with this stuff?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

3 participants