Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not treat missing payload as empty dict #7195

Closed

Conversation

peterthomassen
Copy link
Contributor

This allows views to distinguish missing payload from empty payload.

Related: #3647, #4566

Description

Previously, empty non-form payload caused request.data to equal {}. Note that the choice of {} is arbitrary, and while it may appear appropriate for views that use a regular Serializer, it does cause issues for views that use a ListSerializer. For list endpoints, [] clearly would have been the less arbitrary choice (but also cause other problems, cf. #3647).

The problem becomes more obscure with a list endpoint that may accept lists of objects, or single objects. In this case, the serializer's many argument has to be set to False if and only if an object was given, and to True if and only if a list was given. If the client does not pass any payload in such a scenario, the request will be treated as if the user had passed an empty dictionary. This is simply not correct.

This PR changes the behavior such that empty non-form payload will cause request.data to be None. This allows views to distinguish between missing payload and a non-None but empty payload data structure.

In #4566, @tomchristie argued that it would be an option to

Use the None sentinel for non-form data. (but don't want to go that way because it'd play less well with Django's existing behavior, and could easily break existing codebases)

While this change may break existing code bases, the current situation makes life hard for all who need to distinguish empty payload from missing payload. In particular, changing the behavior in a DRF-based application currently requires overwriting into APIView.initialize_request(), and doing some magic there based on whether request.stream is None or not (repeating the logic that is given in Request._parse(), with the danger of missing the RawPostDataException exception or other edge cases). This situation is error-prone and smells like future maintenance work within the application.

On the other hand, with the proposed change applied, users can simply fix their code bases by replacing instances of e.g. serializers.Serializer(data=request.data) with serializers.Serializer(data=request.data or {}) to recover the old behavior. (The reverse approach which has been taken so far is much more complicated, as the information about missing payload is lost at the point when the request has been dispatched for processing in a view.)

So, despite of the small backwards incompatibility, but in light of the simplicity of a corresponding fix, I propose to take the risk (and maybe announce in the next major release that the change will be upcoming in the next-to-next, or something like that).

Note that this does not cause the problem described in #3647 to resurface: The problem there was that empty content caused a side-effect and a success response, while the user expected a "Bad Request" response. With the proposed fix, this expectation will remain fulfilled.

Lastly, if people need empty payload to be accepted, one can set allow_null = True on the serializer. Without the proposed fix, the allow_null property is meaningless on the serializer used by the view (although the attribute does exist by virtue of the serializers being Fields, and is documented in general terms). In this regard, the proposed fix removes ambiguity and increases flexibility.

This allows views to distinguish missing payload from empty payload.

Related: encode#3647, encode#4566
@tomchristie
Copy link
Member

We really want to be consistent with form parsing here, which always returns dictionaries.

I don't think it's going to be acceptable for us to change the behavior here, since it's pretty fundamental. If you strictly need to distinguish between "sent no data" vs. "sent an empty data structure" then you'll probably need to inspect the Content-Length header or something similar.

@peterthomassen
Copy link
Contributor Author

Why do you want to be consistent with form parsing? What problem gets solved by doing this?

Django is a template-oriented framework (which necessarily leads to forms), but DRF distinctly supports API data interchange. API usage is different from form usage, in a "pretty fundamental" way. The relevant code in request.py already distinguishes the two cases.

It does that precisely because the two situations are not the same thing: One is a case that is related to the nature of Django (the form case, and we are consistent there), but the other case is not so closely related and should be doing its own thing in the best possible way. Treating them the same does not solve anything, but instead creates the described problems. (I was looking to quote a line from the Zen here, but it turns out that many of them would fit, so I decided not to make an artificial choice ;-) ).

Also, looking at the Content-Length header creates other problems -- just take a look at the checks and exception handling done in _load_stream(). Are we certain that this should be duplicated in other locations, in order to help views to convict DRF of fabricating payload that the user never sent?

@tomchristie
Copy link
Member

I'm not totally against this, but I think it's probably too big of a behavior change for us to introduce at this point in the framework.

@peterthomassen
Copy link
Contributor Author

Sure, maintaining backwards compatibility is a priority.

Would it be acceptable for you if I added a setting like DEFAULT_EMPTY_PAYLOAD which would default to {}, could be set to anything (such as None)? Any other ideas appreciated, such as a boolean setting, or a view attribute (probably overengineering), ...

@peterthomassen peterthomassen changed the title Do not treat empty payload as empty dict Do not treat missing payload as empty dict Feb 19, 2020
@tomchristie
Copy link
Member

I'm not sure really. Could be reasonable yup, since it looks like it'd be a pretty low impact change.

@rpkilby
Copy link
Member

rpkilby commented Feb 19, 2020

I'm normally skeptical of these kinds of changes, but my off-the-cuff reaction is almost +1. I haven't looked at the code in-depth, but generally, serializers gracefully handle "empty" values, so I don't think there's too much concern for breakages.

That said, I'd like to understand why test_empty_post_uses_default_boolean_value fails, and whether that could be fixed (by fixing the serializer instead of changing the test). Also, aside from serializers, what other potential impact would we be worried about here? What other side effects would result from this change?

Basically, I'm trying to understand just how big of a behavior change this is or is not. If we ultimately think it's really not that big of a change, then maybe the setting could default to None? And if any users report issues, then we can tell them to revert the behavior by setting to {}.

@tomchristie
Copy link
Member

but my off-the-cuff reaction is almost +1

Strong praise. 🤣

If we ultimately think it's really not that big of a change, then maybe the setting could default to None? And if any users report issues, then we can tell them to revert the behavior by setting to {}.

That might be worth considering, yes.

@peterthomassen
Copy link
Contributor Author

I'm normally skeptical of these kinds of changes, but my off-the-cuff reaction is almost +1. I haven't looked at the code in-depth, but generally, serializers gracefully handle "empty" values, so I don't think there's too much concern for breakages.

There should be quite some POST endpoints around which trigger actions and expect no data, but most of them will not have a serializer configured (as they expect no data). On the other hand, when the view does have a serializer configured, I would think that sending nothing would be a rather uncommon use case. So, I agree that there may not be very many breakages, although for another reason (no serializer involved, instead of the serializer being very graceful).

That said, I'd like to understand why test_empty_post_uses_default_boolean_value fails, and whether that could be fixed (by fixing the serializer instead of changing the test).

The test previously made the assumption that when no data is sent, that's equivalent to sending {}, in which case missing fields get set to their default. With the new default of None, the serializer does not correspond to a (JSON) object with missing fields; instead, it corresponds to nothing. That's why you wouldn't set default fields on "nothing".

This argument not only applies to serializers, but to fields in general: Think of a serializer that has a field which in turn is another serializer with allow_null = True. In this case, the outer serializer corresponds, for example, to a model with a nullable field representing another model. If we changed what that means (by populating default fields in the inner serializer even though the field was None), then a lot would break.

So, I would vote for not changing the serializer behavior. Thus, as a consequence of redefining the semantics of "no data", tests will have to be adapted. However, there's an update, see below.

Basically, I'm trying to understand just how big of a behavior change this is or is not. If we ultimately think it's really not that big of a change, then maybe the setting could default to None? And if any users report issues, then we can tell them to revert the behavior by setting to {}.

I changed the implementation a bit by adding the DEFAULT_MISSING_DATA setting that defaults to None. As this PR is closed, the change does not show up here, but you can see the implementation, tests, and docs update here: peterthomassen@388b0ac

There are now three tests: One that overrides the setting with DEFAULT_MISSING_DATA={}, recovering the old behavior, and two others that don't do that, but each have serializer.allow_null set to True or False, respectively. In the first case, the None input value from request.data is propagated to serializer.validated_data (as explained above) and then to response.data, and in the second case you get a validation error ("No data provided").

Does this cover everyone's concerns?

@peterthomassen
Copy link
Contributor Author

Thinking about it again, one way to move forward with this could be to introduce this setting, but set the default to {} for now, and announce the switch to None as upcoming with a major release. This way, the cleaner functionality is available soon, without changing anything for those who do not use the setting until the major release (with enough time for the community to learn about it etc.).

@tomchristie
Copy link
Member

Introducing it with {} as an initial default is a v. good plan. We then have the option of switching it up in a major release yup.

@peterthomassen
Copy link
Contributor Author

Cool! - Can I help in any way with the next step(s)? (not sure what they are)

@tomchristie
Copy link
Member

If you're up for it you could try taking a pass at a PR that replaces

with empty_data = api_settings.EMPTY_REQUEST_DATA and adding a default here...

@peterthomassen
Copy link
Contributor Author

@tomchristie It's already done, see peterthomassen@388b0ac If you reopen this PR, it should automatically update (it's the same branch)

@tomchristie
Copy link
Member

Screenshot 2020-02-24 at 10 11 40

🤷‍♂️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants