Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SoftDelete is an anti-pattern #364

Open
danielquinn opened this issue Mar 19, 2019 · 18 comments · May be fixed by #480
Open

SoftDelete is an anti-pattern #364

danielquinn opened this issue Mar 19, 2019 · 18 comments · May be fixed by #480

Comments

@danielquinn
Copy link

danielquinn commented Mar 19, 2019

My apologies if this comes across as grumpy. I don't mean to be disrespectful. I know what it's like to maintain a popular project and no one likes it when someone pipes up about how they don't like something.

With that said though, there's a very real problem here with the SoftDeletableModel.

Problem

The SoftDeletableModel (and corresponding manager) has been the bane of my life for years. I started one job and spent a year removing it from the codebase, only to start a new job and be faced with it again. The defaults it implements (overriding .objects) break internal operations with Django (specifically the admin) and leads to hundreds of lost hours trying to debug code that's silently hiding your own data from you.

Example

Consider this simple set of models:

class User(SoftDeletableModel):
    name = models.CharField(max_length=128)

class Purchase(models.Model):
    user = models.ForeignKey(User)
    note = models.TextField()

Now:

  1. Create a user and a purchase with that user.
  2. Soft delete that user
  3. Find the purchase in the admin.
  4. Try to save a change to the note field.

→ Django explodes because the attached user doesn't exist... except that it does exist but we've broken the defaults of the underlying framework by overriding .objects.all() to mean not actually all.

This problem reasserts itself when debugging: You're a new developer tasked with generating statistics based on the number of users, and using User.objects.all().count() to do it. For some reason, your math doesn't add up and you watch a few hours get sucked away by this anti-pattern. I have no idea what would happen in cases where you might do something like Purchase.objects.filter(user__name__startswith="..."), but I'm willing to bet it'll cause problems when comparing to User.objects.filter(name__startswith="...").

Please don't muck with the defaults.

Proposed Alternative

Rather than replacing .objects.all(), rework it to leave .all() alone and instead have .available(). Alternatively, leave objects alone altogether and just have an .available_objects manager on the class. In either case, explicit is better than implicit if for no other reason than there's no surprises.

Understandably, this would break current implementations using the SoftDeletableModel, but frankly not doing this means that more and more of this sort of thing will creep into code everywhere as this is a very popular library and this is being described as a good solution to a common problem. In its current implementation however it creates more problems than it solves.

At the very least, please add a warning to the documentation that using this model as your parent can create long-lasting and hard-to-remove problems with data integrity in your application and disruptions with core functions like the Django admin. I think that if the developers that came before me had seen such a message, they would have thought twice before using it as the default base for all models in the system.

@schinckel
Copy link
Contributor

That's a really good point: in fact I think the django docs explicitly call out overriding the default manager as potentially dangerous behaviour.

I think if it was to be changed, then it would need a deprecation path, but perhaps this is something that could be done in stages. (Is there a deprecation schedule for this project?)

@danielquinn
Copy link
Author

The easiest path that I could see for this would be to introduce .available_objects(), then add a deprecation warning to .objects.all(). Users could silence the messages by switching to .available_objects() explicitly. In these terms the stages would be:

  1. Introduce .available_objects()
  2. .all() includes a deprecation warning, recommending .available_objects() for the going-away behaviour.
  3. .all() is removed altogether allowing for the default.

@mozz100
Copy link

mozz100 commented Apr 10, 2019

I work with @danielquinn. FWIW I wrote a blog post about the kinds of problems that soft deletion can cause. I hope other developers read it and think about the implications of adopting this implementation of soft delete, ideally before they go ahead and adopt it without realising.

https://www.rmorrison.net/mnemozzyne/2019/04/10/think-twice-before-you-soft-delete/

(I don't think soft delete per se is an antipattern, just that the over-riding of .objects is problematic)

@frenzyk
Copy link

frenzyk commented Jun 3, 2019

Not sure if it good to use depreciation and .available_objects(). Personally, I like how this implemented in Laravel:
https://laravel.com/docs/5.8/eloquent#querying-soft-deleted-models
All we need is with_trashed(), only_trashed(), restore(), .forceDelete(). So if you want to get all objects including deleted, then you should just do .objects.with_trashed().all(). Although it may be a more complicated task to make soft deleted models play nice with relations.
Think it should be part of Django core actually.

@schinckel
Copy link
Contributor

The problem with something like .with_deleted() is that it’s not very easy within the scope of a queryset to “undo” a previously applied filter. And since queryset methods chain, you’d really need to be able to do that.

@danielquinn
Copy link
Author

danielquinn commented Jun 3, 2019

More than that, opting for .objects.with_trashed() breaks the default behaviour which necessarily leads to instability in working with other components that expect those defaults. Even if it seems more intuitive for some, this reason alone should be enough to dissuade you from this pattern.

@DylanYoung
Copy link

Newer version of Django do have support for overriding the default manager without overriding the default related manager (if I'm not mistaken).

Might be useful and save some of this deprecation path work.

@jonathan-s
Copy link
Contributor

jonathan-s commented Nov 5, 2019

The problem with something like .with_deleted() is that it’s not very easy within the scope of a queryset to “undo” a previously applied filter. And since queryset methods chain, you’d really need to be able to do that.

It would be possible to change the query midway in a way similar to this. https://github.com/divio/djangocms-versioning/blob/3f8f16f572755e99638e87766e0d44bdbda17904/djangocms_versioning/managers.py#L14-L25

The code above effectively undoes a certain filtering criteria.

Though I tend to agree with danielquinn that the main manager should not hide data. It's confusing, and I was in quite some pain trying to debug similar issues.

@farooqaaa
Copy link

@jonathan-s The djangocms method only works with get(). It doesn't work with filter().

@jonathan-s
Copy link
Contributor

jonathan-s commented Nov 8, 2019 via email

@ezet
Copy link

ezet commented Jan 15, 2020

Another solution is to use Meta.default_manager_name = 'all_objects'. That would result in the same behavior as using the .available_objects approach, except that using .objects would filter out deleted items, and it should be backwards compatible.

@craiga
Copy link
Contributor

craiga commented Aug 12, 2020

I've had a go at implementing the changes suggested here in #438

I'm a bit worried that emitting a warning every time soft_deletable_instance.objects is used is overkill. I'm very open to suggestions on how to improve this approach!

@craiga
Copy link
Contributor

craiga commented Aug 21, 2020

I think #438 is ready to go. I eagerly await your reviews.

@diek
Copy link

diek commented Mar 15, 2021

This is something I need for a client, I understand that to blindly use soft-delete is arguably an anti-pattern as @mozz100 wrote. In our case the client wants the ability to delete incident reports and to me this is far more problematic.

@craiga craiga linked a pull request Apr 5, 2021 that will close this issue
5 tasks
@craiga
Copy link
Contributor

craiga commented Apr 5, 2021

As I was working on removing the model manager, I realised the current deprecation message isn't enough.

class Manufacturer(models.Model):
    name = models.TextField()

class Car(SoftDeletableModel):
    name = models.TextField()
    manufacturer = models.ForeignKey(Manufacturer, related_name="cars", on_delete=models.CASCADE)

ford = Manufacturer.objects.create(name="Ford")
escort = Car.objects.create(name="Escort", manufacturer=ford)
focus = Car.objects.create(name="Focus", manufacturer=ford)

escort.delete()  # soft deleted

ford.cars.all()  # No deprecation message. Currently only includes the Focus; should include the soft-deleted Escort.

I propose an additional warning along the lines of:

This queryset will include soft-deleted records in an upcoming release; please add .available() to your queryset to continue excluding soft-deleted records.

.available() would be equivalent to .exclude(is_removed=True).

This should only appear if the user hasn't filtered by is_removed (assuming this is possible).

@diek
Copy link

diek commented Apr 7, 2021

This makes sense @craiga

@mbegoc
Copy link

mbegoc commented Jan 18, 2023

I just went through the deprecation warning and am ending up here.
I just don't get the whole point around this (but it's obviously too late to do something about it... Still, I think it's a good thing to question what lead to this in the first place).

The whole point of a soft delete is to mimic a delete, without actually losing the data, in case of.
It means you don't want to see the data within your apps. At all. Unless you implement something specific to do so, or you dig into your db.
So yeah, it totally makes sense the data that is deleted is not accessible within the django admin.

Regarding relationships between data, you obviously have to manage the consequences of deleting data... just like we do with DB: cascade, prevent, set null. If data is soft deleted without taking care of that, it's not the django admin nor the soft delete extension that are broken, it's the data itself that is corrupted.

Given the example given by the OP:
the good way to manage this is to implement a "on delete" behavior to manage the purchases: soft delete them as well? Prevent the user deletion? I don't know, it's a business logic decision: if the purchase cannot be deleted and you need to know who is the buyer, then you don't want to actually delete the data but implement a deactivation feature for the users.

As I am concerned, the whole point of using this extension is to avoid as much as possible to expose soft deleted data, which it won't help with anymore since we will have to use the good manager everywhere to do so.

Last thing: the fix this thread ended up with aims to fix an issue named by the thread title "SoftDelete is an anti-pattern". So we are trying to fix code written in the first place to implement an anti-pattern? The fix is simple, remove the whole thing from the extension or don't use it. I don't even get why something has been "fixed" in this extension for the sake of someone who consider this is an anti-pattern and said the only thing he does about this is to remove the feature from the project he's working on.

@gautammd
Copy link

gautammd commented Feb 4, 2023

I agree with @mbegoc, I don't think soft-delete is an anti-pattern and the whole point of hiding the data in the queryset is to make sure you don't do any operations on it by accident. For example - send out an email notification, process them in some batch operation or include them in api response etc..

as I wasn't aware of this library at that time, I implemented a solution similar to what this offers and here's how I solved the issue that OP mentions in his argument "Breaks Django Admin (Objects are not visible even though they exist in the db)"

# manager.py
class AbstractBaseManager(Manager):
    def get_queryset(self, *args, **kwargs):
        return super().get_queryset(*args, **kwargs).filter(is_archived=False)

    def archived(self, *args, **kwargs):
        return super().get_queryset(*args, **kwargs).filter(is_archived=True)
# admin.py
class BaseAdmin(admin.ModelAdmin):
    def get_queryset(self, request):
        return super().get_queryset(request) | self.model.objects.archived()

This way super admin can see all the data in django admin panel even if they are archived or soft deleted

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.