PUT calls don't fully "replace the state of the target resource" #4231

claytondaley · 2016-06-30T03:06:23Z

EDIT: For the current status of the issue, skip to #4231 (comment)

===

I'm having an issue implementing an Optimistic Concurrency library on an application that uses DRF to interact with the database. I'm trying to:

Confirm that the behavior I'm seeing is attributable to DRF
Confirm that this is the intended behavior
Determine if there's any practical way to overcome this behavior

I recently added optimistic concurrency to my Django application. To save you the Wiki lookup:

Every model has a version field
When an editor edits an object, they get the version of the object they're editing
When the editor saves the object, the included version number is compared to the database
If the versions match, the editor updated the latest document and the save goes through
If the versions don't match, we assume a "conflicting" edit was submitted between the time the editor loaded and saved so we reject the edit
If the version is missing, we cannot do testing and should reject the edit

I had a legacy UI talking through DRF. The legacy UI did not handle version numbers. I expected this to cause concurrency errors, but it did not. If I understand the discussion in #3648 correctly:

DRF merges the PUT with the existing record. This causes a missing version ID to be filled with the current database ID
Since this always provides a match, omitting this variable will always break an optimistic concurrency system that communicates across DRF
~~There are no easy options (like making the field "required") to ensure the data is submitted every time.~~ (edit: you can workaround the issue by making it required as demonstrated in this comment)

Steps to reproduce

Setup an Optimistic Concurrency field on a model
Create a new instance and update several times (to ensure you no longer have a default version number)
Submit an update (PUT) through DRF excluding the version ID

Expected behavior

The missing version ID should not match the database and cause a concurrency issue.

Actual behavior

The missing version ID is filled by DRF with the current ID so the concurrency check passes.

tomchristie · 2016-06-30T08:47:19Z

Okay, can't promise I'll be able to review this pretty in-depth ticket immediately, as the upcoming 3.4 release takes priority. But thanks for such a detailed, well thought through issue. This'll most likely like be looked at on the scale of weeks, not days or months. If you make any progress, have any further thoughts yourself, please do update the ticket and keep us informed.

claytondaley · 2016-06-30T16:57:31Z

OK. I'm pretty sure my issue is the combination of two factors:

DRF doesn't require the field in the PUT (even though it is required in the model) because it has a default (version=0)
DRF merges the PUT fields with the current object (without injecting the default)

As a result, DRF uses the current (database) value and breaks the concurrency control. The second half of the issue is related to the discussion in #3648 (also cited above) and there's a (pre 3.x) discussion in #1445 that still appears to be relevant.

I'm hoping a concrete (and increasingly common) case where the default behavior is perverse will be enough to reopen the discussion about the "ideal" behavior of a ModelSerializer. Obviously, I'm only an inch deep on DRF, but my intuition is that the following behavior is appropriate for a required field and a PUT:

When using a non-partial serializer, we should either receive the value, use the default, or (if no default is available) raise a validation error. Model-wide validation should apply to only the inputs/defaults.
When using a partial serializer, we should either receive the value or fallback on the current values. Model-wide validation should apply to that combined data.
I believe the current "non-partial" serializer is really quasi-partial:
- It's non-partial for fields that are required and have no default
- It's partial for fields that are required and have a default (since the default isn't used)
- It's partial for fields that are not required

We can't change bullet (1) above or the defaults become useless (we require the input even though we know the default). That means we have to fix the issue by changing #2 above. I agree with your argument in #2683 that:

Model defaults are model defaults. The serializer should omit the value and defer responsibility to the Model.object.create() to handle this.

To be consistent with that separation of concerns, update should create a new instance (delegating all defaults to the model) and apply the submitted values to that new instance. This results in the behavior requested in #3648.

Trying to describe the migration path helps highlight how odd the current behavior is. The end goal is to

Fix the ModelSerializer,
Add a flag for this quasi-partial state, and
Make that flag the default (for backwards compatibility)

What is the name of that flag? The current Model Serializer is actually a partial serializer that (somewhat arbitrarily) requires fields that meet the condition required==True and default==None. We can't explicitly use the partial flag without breaking backwards compatibility so we need a new (hopefully temporary) flag. I'm left with quasi_partial, but my inability to express the arbitrary requirement required==True and default==None is why it's so clear to me that this behavior should be deprecated urgently.

anoopmalav · 2016-07-01T09:30:00Z

You can add extra_kwargs in serializer's Meta, making version a required field.

class ConcurrentModelSerializer(serializers.ModelSerializer):
    class Meta:
        model = ConcurrentModel
        extra_kwargs = {'version': {'required': True}}

claytondaley · 2016-07-01T14:55:03Z

Thanks @anoopmalev. That'll keep me on the production branch.

After "sleeping on it" I realize there's an extra wrinkle. Everything I said should apply to the serializer's fields. If a field isn't included in the serializer, it should not be modified. In this way, all serilaizers are (and should be) partial for the non-included fields. This is a little more complicated than my "make a new instance" above.

tomchristie · 2016-07-04T14:53:47Z

I believe this issue needs to be reduced to a more constrained proposal in order to move forward.
Seems to broad to be actionable in it's current state.
For now I'm closing this - if anyone can reduce it to a concise, actionable statement of desired behavior then we can reconsider. Until then I think it's simply to broad.

claytondaley · 2016-07-04T19:24:39Z

Here's a concise proposal... for a non-partial serializer:

For any field not listed in the serializer (implicitly or explicitly) or marked as read-only, preserve the existing value
For all other fields, use the first option available:
1. Populate with the submitted value
2. Populate with a default, including a value implied by blank and/or null
3. Raise an exception

For clarity, validation is run on the final product of this process.

tomchristie · 2016-07-04T20:03:20Z

Ie you want to set required=True on any serializer field that doesn't have a model default, for updates?

Have I got that correct?

claytondaley · 2016-07-04T21:11:41Z

Yes (and more). That's how I understand the partial (all fields optional) vs. non-partial (all fields required) distinction. The only time a non-partial serializer doesn't require a field is the presence of a default (narrowly or broadly defined) since the serializer can use that default if no value is provided.

The italicized section is what DRF is not currently doing and the more important change in my proposal. The current implementation just skips the field.

I had a second proposal mixed in, but it's really a separate question of how generous you want to be with the idea of a "default". The current behavior is "strict" in that only a default is treated as such. If you really wanted to reduce the amount of required data, you could make blank=True fields optional as well... assuming that an absent value is a blank value.

pySilver · 2016-07-25T17:13:59Z

@claytondaley I'm using OOL with DRF since 2x this way:

class VersionModelSerializer(serializers.ModelSerializer, BaseSerializer):
    _initial_version = 0

    _version = VersionField()

    def __init__(self, *args, **kwargs):
        super(VersionModelSerializer, self).__init__(*args, **kwargs)

        # version field should not be required if there is no object
        if self.instance is None and '_version' in self.fields and\
                getattr(self, 'parent', None) is None:
            self.fields['_version'].read_only = True
            self.fields['_version'].required = False

        # version field is required while updating instance
        if self.instance is not None and '_version' in self.fields:
            self.fields['_version'].required = True

        if self.instance is not None and hasattr(self.instance, '_version'):
            self._initial_version = self.instance._version

    def validate__version(self, value):
        if self.instance is not None:
            if not value and not isinstance(value, int):
                raise serializers.ValidationError(_(u"This field is required"))

        return value
   # more code & helpers

it works just great with all kind of business logic and never caused any problem.

claytondaley · 2016-07-30T01:35:07Z

Was this left closed on accident? I responded to the specific question and didn't hear a reason what was wrong with the proposal.

pySilver · 2016-07-30T02:45:49Z

@claytondaley why OOL should be a part of DRF? Check my code – it works just find in a large app (1400 tests). VersionField is just an IntegerField.

claytondaley · 2016-07-30T03:07:37Z

You've hard-coded the OOL into the serializer. This is the wrong place to do it because you have a race condition. Parallel updates (with the same prior version) would all pass at the serializer... but only one would win at the save action.

I'm using django-concurrency which puts the OOL logic into the save action (where it belongs). Basically UPDATE... WHERE version = submitted_version. This is atomic so there's no race condition. However, it exposes a flaw in the serialization logic::

If default is set on a field in the model, DRF sets required=False. The (valid) idea is that DRF can use that default if no value is submitted.
If that field is missing, however, DRF doesn't use the default. Instead it merges the submitted data with the current version of the object.

When we don't require the field, we do so because we have a default to use. DRF doesn't fulfill that contract because it doesn't use the default... it uses the existing value.

The underlying issue was discussed before, but they didn't have a nice, concrete case. OOL is that ideal case. The existing value of a version field always passes OOL so you can bypass the entire OOL system by leaving out version. That's (obviously) not the desired behavior of an OOL system.

pySilver · 2016-07-30T03:15:46Z

@claytondaley

You've hard-coded the OOL into the serializer.

Did I? Have you found any OOL logic in my serializer beside field requirement?

This is the wrong place to do it because you have a race condition.

Sry, I just cant see where is the race condition here.

I'm using django-concurrency which puts the OOL logic into the save action (where it belongs).

I'm also using django-concurrency :) But thats model level, not serializer. On the serializer level you just need to:

make sure _version field is always required (when it should be)
make sure your serializer knows how to handle OOL errors (this part I've ommited)
make sure your apiview knows how to handle OOL errors and raises HTTP 409 with possible diff context

pySilver · 2016-07-30T03:18:26Z

actually, im not using django-concurrency due to an issue that autor marked as "wont fix": it bypasses OOL when obj.save(update_fields=['one', 'two', 'tree']) is used which I found bad practice, so I forked package.

pySilver · 2016-07-30T03:21:53Z

here is the missing save method of the serializer I've mentioned earlier. that should solve all of your issues:

    def save(self, **kwargs):
        try:
            self.instance = super(VersionModelSerializer, self).save(**kwargs)
            return self.instance
        except VersionException:
            # Use select_for_update so we have some level of guarantee
            # that object won't be modified at least here at the same time
            # (but it may be modified somewhere else, where select_for_update
            # is not used!)
            with transaction.atomic():
                db_instance = self.instance.__class__.objects.\
                    select_for_update().get(pk=self.instance.pk)
                diff = self._get_serializer_diff(db_instance)

                # re-raise exception, so api client will receive friendly
                # printed diff with writable fields of current serializer
                if diff:
                    raise VersionException(diff)

                # otherwise re-try saving using db_instance
                self.instance = db_instance
                if self.is_valid():
                    return super(VersionModelSerializer, self).save(**kwargs)
                else:
                    # there are errors that could not be displayed to a user
                    # so api client should refresh & retry by itself
                    raise VersionException

        # instance.save() was interrupted by application error
        except ApplicationException as logic_exc:
            if self._initial_version != self.instance._version:
                raise VersionException

            raise logic_exc

claytondaley · 2016-07-30T03:22:11Z

Sorry. I didn't read your code to figure out what you were doing. I saw a serializer. You can obviously work around the issue by hacking the serializer but you shouldn't have to.... because the flaw in the DRF logic stands on its own. I'm just using OOL to make the point.

claytondaley · 2016-07-30T03:26:28Z

And you should try that code against the latest version of django-concurrency (using IGNORE_DEFAULT=False). django-concurrency was also ignoring default values, but I submitted a patch. There was an odd corner case that I had to hunt down to make it work for normal cases.

pySilver · 2016-07-30T03:30:31Z

I think it's called extending default functionality, not really hacking. I think the best place for such feature support is at django-concurrency package.

I've reread whole issue discussion and found your proposal too broad and it would fail in many places (due to magically using default values from different sources under different conditions). DRF 3.x just got much easier and predictable than 2.x, lets keep it that way :)

claytondaley · 2016-07-30T03:31:51Z

You can't fix this in the model layer because it's broken at the serializer (before it gets to the model). Set OOL aside... why don't we require a field if default is set?

claytondaley · 2016-07-30T03:33:17Z

A non partial serializer "requires" all fields (fundamentally) and yet we let this one by. Is it a bug? Or do we have a logical reason?

pySilver · 2016-07-30T03:35:12Z

as you can see in my code example – _version field is always correctly required in all possible cases.

btw, it turned out that I've borrowed model lvl code from https://github.com/gavinwahl/django-optimistic-lock and not from django-concurrency which is way to complex for almost no reason.

claytondaley · 2016-07-30T03:37:51Z

... so the bug is "non-partial serializers incorrectly set some fields to not-required". That's the alternative. Because that's the (implicit) commitment that a non-partial serializer makes.

claytondaley · 2016-07-30T03:41:35Z

I can quote it:

By default, serializers must be passed values for all required fields or they will raise validation errors.

This says nothing about required (except when a default is provided).

claytondaley · 2016-07-30T03:44:06Z

(and I get that I'm talking about two different levels, but the ModelSerializer shouldn't un-require fields if it's not going to take responsibility for that decision)

pySilver · 2016-07-30T03:44:07Z

I think I've lost your point..

pySilver · 2016-07-30T03:44:40Z

(and I get that I'm talking about two different levels, but the ModelSerializer shouldn't un-require fields if it's not going to take responsibility for that decision)

Whats wrong with that?

claytondaley · 2016-07-30T03:48:31Z

OK let me try a different angle.

Assume I have a non-partial Model Serializer (edit: all defaults) that covers all the fields in my model.

Should a CREATE or UPDATE with the same data ever produce a different object (minus the ID)

carltongibson · 2017-09-28T09:28:06Z

I'm going to de-milestone this for now. We can reassess after v3.7

claytondaley · 2017-09-28T15:28:48Z

Up to you guys, but I want to make sure you're clear that this is not a Ticket to add concurrency support. The real issue is that a single serializer cannot correctly validate both a PUT and POST in the current architecture. Concurrency just provided the "failing test".

claytondaley · 2017-09-28T19:13:18Z

TL;DR You can see why this issue is blocked by starting at Tom's proposed fix.

In summary, the proposed solution is to make all fields required for a PUT request. There are (at least) two problems with this approach:

Serializers think in actions not HTTP methods so there isn't a one-to-one mapping. The obvious example is create because it's shared by PUT and POST. Note that create-by-PUT is disabled by default so the proposed fix is probably better than nothing.
We don't need to require all fields in a PUT (a sentiment shared by PUT does not completely replace object #3648, ModelSerializer fields don't obey default from the model #4703). If a nillable field is absent, we know it can be None. If a field with a default is absent, we know we can use the default. PUTs actually have the same (Model-derived) field requirements as POST.

The real issue is how we handle missing data and the basic proposal in #3648, #4703, and here remain the right solution. We can support all of the HTTP modes (including create-by-PUT) if we introduce a concept like if_missing_use_default. My original proposal presented it as a replacement for partial, but it's easier (and may be necessary) to think of it as an orthogonal concept.

tomchristie · 2017-09-29T09:39:43Z

if we introduce a concept like if_missing_use_default.

There's nothing preventing anyone from implementing either this, or a strict "require all fields" as a base serializer class, and wrapping that up as a third party library.

My opinion is that a strict "require all fields" mode might also be able to make it into core, it's very clear obvious behavior, and I can see why that'd be useful.

I'm not convinced that a "allow fields to be optional, but replace everything, using model defaults if they exist" - That seems like it'd present some very counter-intuitive behavior (eg. "created_at" fields, that automatically end up updating themselves). If we want a stricter behavior, we should just have a stricter behavior.

Either way around, the right way to approach this is to validate it as a third party package, then update our docs so we can link to that.

Alternatively, if you're convinced that we're missing a behavior from core that our users really do need, then you're welcome to make a pull request, updating the behavior and the documentation, so we can assess the merits in a very concrete way.

Happy to take pull requests as a starting point for this, and even happier to include a third party package demonstrating this behavior.

tomchristie · 2017-09-29T09:40:51Z

coming around to the value in having an option of PUT-is-strict behavior.

This still stands. I think we could consider that aspect in core, if someone cares enough about it to make a pull request along those lines. It'd need to be an optional behavior.

claytondaley · 2017-09-29T14:27:07Z

That seems like it'd present some very counter-intuitive behavior (eg. "created_at" fields, that automatically end up updating themselves).

A created_at field should be read_only (or excluded from the serializer). In both of these cases, it would be unchanged (the normal serializer behavior). In the counter-intuitive case that the field is not read-only in the serializer, you would get the counter-intuitive behavior of automatically changing it.

Happy to take pull requests as a starting point for this, and even happier to include a third party package demonstrating this behavior.

Absolutely. The "use defaults" variation is an ideal case for a 3rd party package because the change is a trivial wrapper around (one method of) the existing behavior and (if you buy into the defaults argument) works for all non-partial serializers.

tomchristie closed this 4 hours ago

Perhaps you'd consider adding a label like "PR Welcome" or "3rd Party Plugin" and leaving valid/acknowledged issues like this open. I often search open issues to see if a problem has already been reported and its progress towards resolution. I perceive closed issues as "invalid" or "fixed". Mixing a few "valid but closed" issues into the thousands of invalid/fixed issues doesn't invite efficient searching (even if you knew they might be there).

tomchristie · 2017-09-29T16:03:33Z

Perhaps you'd consider adding a label like "PR Welcome" or "3rd Party Plugin"

That'd be reasonable enough, but we'd like our issue tracker to reflect active or actionable work on the project itself.

It's really important for us to try to keep our issues tightly scoped. Changing priorities might mean that we sometime choose to reopen issues that we've previously closed. Right now I think this has fallen out of the "the core team want to address this in the immediate future".

If it comes up repeatedly, and there continues to be no third party solution, then perhaps we would reassess it.

tomchristie · 2017-09-29T16:05:32Z

leaving valid/acknowledged issues like this open.

A bit more context on the issue management style - https://www.dabapps.com/blog/sustainable-open-source-management/

tomchristie added the Needs further review label Jun 30, 2016

claytondaley mentioned this issue Jul 1, 2016

Rejecting updates without a version saxix/django-concurrency#36

Closed

tomchristie closed this as completed Jul 4, 2016

tomchristie modified the milestones: 3.6.1 Release, 3.6.2 Release, 3.6.3 Release Mar 9, 2017

tomchristie modified the milestones: 3.6.3 Release, 3.6.4 Release May 12, 2017

claytondaley mentioned this issue Aug 14, 2017

HAL List Serializer Artory/drf-hal-json#20

Merged

carltongibson modified the milestones: 3.6.4 Release, 3.6.5 Release Aug 21, 2017

claytondaley mentioned this issue Sep 1, 2017

Separate Validation of the Client Submission from Resource Dependent checks #5385

Closed

tomchristie mentioned this issue Sep 18, 2017

Use QuerySet.select_for_update() in views to fix race condition #4675

Closed

6 tasks

carltongibson removed this from the 3.6.5 Release milestone Sep 28, 2017

claytondaley changed the title ~~Optimistic Concurrency and PUT Logic~~ PUT calls don't fully "replace the state of the target resource" Sep 28, 2017

tomchristie closed this as completed Sep 29, 2017

PUT calls don't fully "replace the state of the target resource" #4231

PUT calls don't fully "replace the state of the target resource" #4231

Comments

claytondaley commented Jun 30, 2016 • edited Loading

Steps to reproduce

Expected behavior

Actual behavior

tomchristie commented Jun 30, 2016

claytondaley commented Jun 30, 2016 • edited Loading

anoopmalav commented Jul 1, 2016

claytondaley commented Jul 1, 2016

tomchristie commented Jul 4, 2016

claytondaley commented Jul 4, 2016 • edited Loading

tomchristie commented Jul 4, 2016

claytondaley commented Jul 4, 2016

pySilver commented Jul 25, 2016

claytondaley commented Jul 30, 2016

pySilver commented Jul 30, 2016

claytondaley commented Jul 30, 2016

pySilver commented Jul 30, 2016 • edited Loading

pySilver commented Jul 30, 2016 • edited Loading

pySilver commented Jul 30, 2016

claytondaley commented Jul 30, 2016

claytondaley commented Jul 30, 2016 • edited Loading

pySilver commented Jul 30, 2016

claytondaley commented Jul 30, 2016

claytondaley commented Jul 30, 2016

pySilver commented Jul 30, 2016

claytondaley commented Jul 30, 2016

claytondaley commented Jul 30, 2016

claytondaley commented Jul 30, 2016

pySilver commented Jul 30, 2016

pySilver commented Jul 30, 2016

claytondaley commented Jul 30, 2016 • edited Loading

carltongibson commented Sep 28, 2017

claytondaley commented Sep 28, 2017

claytondaley commented Sep 28, 2017 • edited Loading

tomchristie commented Sep 29, 2017 • edited Loading

tomchristie commented Sep 29, 2017

claytondaley commented Sep 29, 2017

tomchristie commented Sep 29, 2017

tomchristie commented Sep 29, 2017

claytondaley commented Jun 30, 2016 •

edited

Loading

claytondaley commented Jun 30, 2016 •

edited

Loading

claytondaley commented Jul 4, 2016 •

edited

Loading

pySilver commented Jul 30, 2016 •

edited

Loading

pySilver commented Jul 30, 2016 •

edited

Loading

claytondaley commented Jul 30, 2016 •

edited

Loading

claytondaley commented Jul 30, 2016 •

edited

Loading

claytondaley commented Sep 28, 2017 •

edited

Loading

tomchristie commented Sep 29, 2017 •

edited

Loading