Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PUT calls don't fully "replace the state of the target resource" #4231

Closed
claytondaley opened this issue Jun 30, 2016 · 68 comments
Closed

PUT calls don't fully "replace the state of the target resource" #4231

claytondaley opened this issue Jun 30, 2016 · 68 comments

Comments

@claytondaley
Copy link

claytondaley commented Jun 30, 2016

EDIT: For the current status of the issue, skip to #4231 (comment)

===

I'm having an issue implementing an Optimistic Concurrency library on an application that uses DRF to interact with the database. I'm trying to:

  • Confirm that the behavior I'm seeing is attributable to DRF
  • Confirm that this is the intended behavior
  • Determine if there's any practical way to overcome this behavior

I recently added optimistic concurrency to my Django application. To save you the Wiki lookup:

  • Every model has a version field
  • When an editor edits an object, they get the version of the object they're editing
  • When the editor saves the object, the included version number is compared to the database
  • If the versions match, the editor updated the latest document and the save goes through
  • If the versions don't match, we assume a "conflicting" edit was submitted between the time the editor loaded and saved so we reject the edit
  • If the version is missing, we cannot do testing and should reject the edit

I had a legacy UI talking through DRF. The legacy UI did not handle version numbers. I expected this to cause concurrency errors, but it did not. If I understand the discussion in #3648 correctly:

  • DRF merges the PUT with the existing record. This causes a missing version ID to be filled with the current database ID
  • Since this always provides a match, omitting this variable will always break an optimistic concurrency system that communicates across DRF
  • There are no easy options (like making the field "required") to ensure the data is submitted every time. (edit: you can workaround the issue by making it required as demonstrated in this comment)

Steps to reproduce

  1. Setup an Optimistic Concurrency field on a model
  2. Create a new instance and update several times (to ensure you no longer have a default version number)
  3. Submit an update (PUT) through DRF excluding the version ID

Expected behavior

The missing version ID should not match the database and cause a concurrency issue.

Actual behavior

The missing version ID is filled by DRF with the current ID so the concurrency check passes.

@tomchristie
Copy link
Member

Okay, can't promise I'll be able to review this pretty in-depth ticket immediately, as the upcoming 3.4 release takes priority. But thanks for such a detailed, well thought through issue. This'll most likely like be looked at on the scale of weeks, not days or months. If you make any progress, have any further thoughts yourself, please do update the ticket and keep us informed.

@claytondaley
Copy link
Author

claytondaley commented Jun 30, 2016

OK. I'm pretty sure my issue is the combination of two factors:

  1. DRF doesn't require the field in the PUT (even though it is required in the model) because it has a default (version=0)
  2. DRF merges the PUT fields with the current object (without injecting the default)

As a result, DRF uses the current (database) value and breaks the concurrency control. The second half of the issue is related to the discussion in #3648 (also cited above) and there's a (pre 3.x) discussion in #1445 that still appears to be relevant.

I'm hoping a concrete (and increasingly common) case where the default behavior is perverse will be enough to reopen the discussion about the "ideal" behavior of a ModelSerializer. Obviously, I'm only an inch deep on DRF, but my intuition is that the following behavior is appropriate for a required field and a PUT:

  • When using a non-partial serializer, we should either receive the value, use the default, or (if no default is available) raise a validation error. Model-wide validation should apply to only the inputs/defaults.
  • When using a partial serializer, we should either receive the value or fallback on the current values. Model-wide validation should apply to that combined data.
  • I believe the current "non-partial" serializer is really quasi-partial:
    • It's non-partial for fields that are required and have no default
    • It's partial for fields that are required and have a default (since the default isn't used)
    • It's partial for fields that are not required

We can't change bullet (1) above or the defaults become useless (we require the input even though we know the default). That means we have to fix the issue by changing #2 above. I agree with your argument in #2683 that:

Model defaults are model defaults. The serializer should omit the value and defer responsibility to the Model.object.create() to handle this.

To be consistent with that separation of concerns, update should create a new instance (delegating all defaults to the model) and apply the submitted values to that new instance. This results in the behavior requested in #3648.

Trying to describe the migration path helps highlight how odd the current behavior is. The end goal is to

  1. Fix the ModelSerializer,
  2. Add a flag for this quasi-partial state, and
  3. Make that flag the default (for backwards compatibility)

What is the name of that flag? The current Model Serializer is actually a partial serializer that (somewhat arbitrarily) requires fields that meet the condition required==True and default==None. We can't explicitly use the partial flag without breaking backwards compatibility so we need a new (hopefully temporary) flag. I'm left with quasi_partial, but my inability to express the arbitrary requirement required==True and default==None is why it's so clear to me that this behavior should be deprecated urgently.

@anoopmalav
Copy link
Contributor

You can add extra_kwargs in serializer's Meta, making version a required field.

class ConcurrentModelSerializer(serializers.ModelSerializer):
    class Meta:
        model = ConcurrentModel
        extra_kwargs = {'version': {'required': True}}

@claytondaley
Copy link
Author

Thanks @anoopmalev. That'll keep me on the production branch.

After "sleeping on it" I realize there's an extra wrinkle. Everything I said should apply to the serializer's fields. If a field isn't included in the serializer, it should not be modified. In this way, all serilaizers are (and should be) partial for the non-included fields. This is a little more complicated than my "make a new instance" above.

@tomchristie
Copy link
Member

I believe this issue needs to be reduced to a more constrained proposal in order to move forward.
Seems to broad to be actionable in it's current state.
For now I'm closing this - if anyone can reduce it to a concise, actionable statement of desired behavior then we can reconsider. Until then I think it's simply to broad.

@claytondaley
Copy link
Author

claytondaley commented Jul 4, 2016

Here's a concise proposal... for a non-partial serializer:

  1. For any field not listed in the serializer (implicitly or explicitly) or marked as read-only, preserve the existing value
  2. For all other fields, use the first option available:
    1. Populate with the submitted value
    2. Populate with a default, including a value implied by blank and/or null
    3. Raise an exception

For clarity, validation is run on the final product of this process.

@tomchristie
Copy link
Member

Ie you want to set required=True on any serializer field that doesn't have a model default, for updates?

Have I got that correct?

@claytondaley
Copy link
Author

Yes (and more). That's how I understand the partial (all fields optional) vs. non-partial (all fields required) distinction. The only time a non-partial serializer doesn't require a field is the presence of a default (narrowly or broadly defined) since the serializer can use that default if no value is provided.

The italicized section is what DRF is not currently doing and the more important change in my proposal. The current implementation just skips the field.

I had a second proposal mixed in, but it's really a separate question of how generous you want to be with the idea of a "default". The current behavior is "strict" in that only a default is treated as such. If you really wanted to reduce the amount of required data, you could make blank=True fields optional as well... assuming that an absent value is a blank value.

@pySilver
Copy link

@claytondaley I'm using OOL with DRF since 2x this way:

class VersionModelSerializer(serializers.ModelSerializer, BaseSerializer):
    _initial_version = 0

    _version = VersionField()

    def __init__(self, *args, **kwargs):
        super(VersionModelSerializer, self).__init__(*args, **kwargs)

        # version field should not be required if there is no object
        if self.instance is None and '_version' in self.fields and\
                getattr(self, 'parent', None) is None:
            self.fields['_version'].read_only = True
            self.fields['_version'].required = False

        # version field is required while updating instance
        if self.instance is not None and '_version' in self.fields:
            self.fields['_version'].required = True

        if self.instance is not None and hasattr(self.instance, '_version'):
            self._initial_version = self.instance._version

    def validate__version(self, value):
        if self.instance is not None:
            if not value and not isinstance(value, int):
                raise serializers.ValidationError(_(u"This field is required"))

        return value
   # more code & helpers

it works just great with all kind of business logic and never caused any problem.

@claytondaley
Copy link
Author

Was this left closed on accident? I responded to the specific question and didn't hear a reason what was wrong with the proposal.

@pySilver
Copy link

@claytondaley why OOL should be a part of DRF? Check my code – it works just find in a large app (1400 tests). VersionField is just an IntegerField.

@claytondaley
Copy link
Author

You've hard-coded the OOL into the serializer. This is the wrong place to do it because you have a race condition. Parallel updates (with the same prior version) would all pass at the serializer... but only one would win at the save action.

I'm using django-concurrency which puts the OOL logic into the save action (where it belongs). Basically UPDATE... WHERE version = submitted_version. This is atomic so there's no race condition. However, it exposes a flaw in the serialization logic::

  • If default is set on a field in the model, DRF sets required=False. The (valid) idea is that DRF can use that default if no value is submitted.
  • If that field is missing, however, DRF doesn't use the default. Instead it merges the submitted data with the current version of the object.

When we don't require the field, we do so because we have a default to use. DRF doesn't fulfill that contract because it doesn't use the default... it uses the existing value.

The underlying issue was discussed before, but they didn't have a nice, concrete case. OOL is that ideal case. The existing value of a version field always passes OOL so you can bypass the entire OOL system by leaving out version. That's (obviously) not the desired behavior of an OOL system.

@pySilver
Copy link

pySilver commented Jul 30, 2016

@claytondaley

You've hard-coded the OOL into the serializer.

Did I? Have you found any OOL logic in my serializer beside field requirement?

This is the wrong place to do it because you have a race condition.

Sry, I just cant see where is the race condition here.

I'm using django-concurrency which puts the OOL logic into the save action (where it belongs).

I'm also using django-concurrency :) But thats model level, not serializer. On the serializer level you just need to:

  • make sure _version field is always required (when it should be)
  • make sure your serializer knows how to handle OOL errors (this part I've ommited)
  • make sure your apiview knows how to handle OOL errors and raises HTTP 409 with possible diff context

@pySilver
Copy link

pySilver commented Jul 30, 2016

actually, im not using django-concurrency due to an issue that autor marked as "wont fix": it bypasses OOL when obj.save(update_fields=['one', 'two', 'tree']) is used which I found bad practice, so I forked package.

@pySilver
Copy link

here is the missing save method of the serializer I've mentioned earlier. that should solve all of your issues:

    def save(self, **kwargs):
        try:
            self.instance = super(VersionModelSerializer, self).save(**kwargs)
            return self.instance
        except VersionException:
            # Use select_for_update so we have some level of guarantee
            # that object won't be modified at least here at the same time
            # (but it may be modified somewhere else, where select_for_update
            # is not used!)
            with transaction.atomic():
                db_instance = self.instance.__class__.objects.\
                    select_for_update().get(pk=self.instance.pk)
                diff = self._get_serializer_diff(db_instance)

                # re-raise exception, so api client will receive friendly
                # printed diff with writable fields of current serializer
                if diff:
                    raise VersionException(diff)

                # otherwise re-try saving using db_instance
                self.instance = db_instance
                if self.is_valid():
                    return super(VersionModelSerializer, self).save(**kwargs)
                else:
                    # there are errors that could not be displayed to a user
                    # so api client should refresh & retry by itself
                    raise VersionException

        # instance.save() was interrupted by application error
        except ApplicationException as logic_exc:
            if self._initial_version != self.instance._version:
                raise VersionException

            raise logic_exc

@claytondaley
Copy link
Author

Sorry. I didn't read your code to figure out what you were doing. I saw a serializer. You can obviously work around the issue by hacking the serializer but you shouldn't have to.... because the flaw in the DRF logic stands on its own. I'm just using OOL to make the point.

@claytondaley
Copy link
Author

claytondaley commented Jul 30, 2016

And you should try that code against the latest version of django-concurrency (using IGNORE_DEFAULT=False). django-concurrency was also ignoring default values, but I submitted a patch. There was an odd corner case that I had to hunt down to make it work for normal cases.

@pySilver
Copy link

I think it's called extending default functionality, not really hacking. I think the best place for such feature support is at django-concurrency package.

I've reread whole issue discussion and found your proposal too broad and it would fail in many places (due to magically using default values from different sources under different conditions). DRF 3.x just got much easier and predictable than 2.x, lets keep it that way :)

@claytondaley
Copy link
Author

You can't fix this in the model layer because it's broken at the serializer (before it gets to the model). Set OOL aside... why don't we require a field if default is set?

@claytondaley
Copy link
Author

A non partial serializer "requires" all fields (fundamentally) and yet we let this one by. Is it a bug? Or do we have a logical reason?

@pySilver
Copy link

as you can see in my code example – _version field is always correctly required in all possible cases.

btw, it turned out that I've borrowed model lvl code from https://github.com/gavinwahl/django-optimistic-lock and not from django-concurrency which is way to complex for almost no reason.

@claytondaley
Copy link
Author

... so the bug is "non-partial serializers incorrectly set some fields to not-required". That's the alternative. Because that's the (implicit) commitment that a non-partial serializer makes.

@claytondaley
Copy link
Author

I can quote it:

By default, serializers must be passed values for all required fields or they will raise validation errors.

This says nothing about required (except when a default is provided).

@claytondaley
Copy link
Author

(and I get that I'm talking about two different levels, but the ModelSerializer shouldn't un-require fields if it's not going to take responsibility for that decision)

@pySilver
Copy link

I think I've lost your point..

@pySilver
Copy link

(and I get that I'm talking about two different levels, but the ModelSerializer shouldn't un-require fields if it's not going to take responsibility for that decision)

Whats wrong with that?

@claytondaley
Copy link
Author

claytondaley commented Jul 30, 2016

OK let me try a different angle.

  • Assume I have a non-partial Model Serializer (edit: all defaults) that covers all the fields in my model.

Should a CREATE or UPDATE with the same data ever produce a different object (minus the ID)

@carltongibson
Copy link
Collaborator

I'm going to de-milestone this for now. We can reassess after v3.7

@claytondaley
Copy link
Author

Up to you guys, but I want to make sure you're clear that this is not a Ticket to add concurrency support. The real issue is that a single serializer cannot correctly validate both a PUT and POST in the current architecture. Concurrency just provided the "failing test".

@claytondaley claytondaley changed the title Optimistic Concurrency and PUT Logic PUT calls don't fully "replace the state of the target resource" Sep 28, 2017
@claytondaley
Copy link
Author

claytondaley commented Sep 28, 2017

TL;DR You can see why this issue is blocked by starting at Tom's proposed fix.

In summary, the proposed solution is to make all fields required for a PUT request. There are (at least) two problems with this approach:

  1. Serializers think in actions not HTTP methods so there isn't a one-to-one mapping. The obvious example is create because it's shared by PUT and POST. Note that create-by-PUT is disabled by default so the proposed fix is probably better than nothing.
  2. We don't need to require all fields in a PUT (a sentiment shared by PUT does not completely replace object #3648, ModelSerializer fields don't obey default from the model #4703). If a nillable field is absent, we know it can be None. If a field with a default is absent, we know we can use the default. PUTs actually have the same (Model-derived) field requirements as POST.

The real issue is how we handle missing data and the basic proposal in #3648, #4703, and here remain the right solution. We can support all of the HTTP modes (including create-by-PUT) if we introduce a concept like if_missing_use_default. My original proposal presented it as a replacement for partial, but it's easier (and may be necessary) to think of it as an orthogonal concept.

@tomchristie
Copy link
Member

tomchristie commented Sep 29, 2017

if we introduce a concept like if_missing_use_default.

There's nothing preventing anyone from implementing either this, or a strict "require all fields" as a base serializer class, and wrapping that up as a third party library.

My opinion is that a strict "require all fields" mode might also be able to make it into core, it's very clear obvious behavior, and I can see why that'd be useful.

I'm not convinced that a "allow fields to be optional, but replace everything, using model defaults if they exist" - That seems like it'd present some very counter-intuitive behavior (eg. "created_at" fields, that automatically end up updating themselves). If we want a stricter behavior, we should just have a stricter behavior.

Either way around, the right way to approach this is to validate it as a third party package, then update our docs so we can link to that.

Alternatively, if you're convinced that we're missing a behavior from core that our users really do need, then you're welcome to make a pull request, updating the behavior and the documentation, so we can assess the merits in a very concrete way.

Happy to take pull requests as a starting point for this, and even happier to include a third party package demonstrating this behavior.

@tomchristie
Copy link
Member

coming around to the value in having an option of PUT-is-strict behavior.

This still stands. I think we could consider that aspect in core, if someone cares enough about it to make a pull request along those lines. It'd need to be an optional behavior.

@claytondaley
Copy link
Author

That seems like it'd present some very counter-intuitive behavior (eg. "created_at" fields, that automatically end up updating themselves).

A created_at field should be read_only (or excluded from the serializer). In both of these cases, it would be unchanged (the normal serializer behavior). In the counter-intuitive case that the field is not read-only in the serializer, you would get the counter-intuitive behavior of automatically changing it.

Happy to take pull requests as a starting point for this, and even happier to include a third party package demonstrating this behavior.

Absolutely. The "use defaults" variation is an ideal case for a 3rd party package because the change is a trivial wrapper around (one method of) the existing behavior and (if you buy into the defaults argument) works for all non-partial serializers.

tomchristie closed this 4 hours ago

Perhaps you'd consider adding a label like "PR Welcome" or "3rd Party Plugin" and leaving valid/acknowledged issues like this open. I often search open issues to see if a problem has already been reported and its progress towards resolution. I perceive closed issues as "invalid" or "fixed". Mixing a few "valid but closed" issues into the thousands of invalid/fixed issues doesn't invite efficient searching (even if you knew they might be there).

@tomchristie
Copy link
Member

Perhaps you'd consider adding a label like "PR Welcome" or "3rd Party Plugin"

That'd be reasonable enough, but we'd like our issue tracker to reflect active or actionable work on the project itself.

It's really important for us to try to keep our issues tightly scoped. Changing priorities might mean that we sometime choose to reopen issues that we've previously closed. Right now I think this has fallen out of the "the core team want to address this in the immediate future".

If it comes up repeatedly, and there continues to be no third party solution, then perhaps we would reassess it.

@tomchristie
Copy link
Member

leaving valid/acknowledged issues like this open.

A bit more context on the issue management style - https://www.dabapps.com/blog/sustainable-open-source-management/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants