"update_fields" option to "bulk_create" not supported? #1323

Open
elserj opened this issue Apr 26, 2024 · 1 comment · May be fixed by #1419

Comments


elserj commented Apr 26, 2024

Problem Statement
Using the "update_fields" option with "bulk_create_with_history" doesn't seem to be supported.

Describe the solution you'd like
Ideally, I'd like a new option that skips the history record when a new model instance is created, but adds one when an existing instance is updated.

Describe alternatives you've considered

Additional context
The "update_fields" option was added to "bulk_create" in Django 4.1 (https://docs.djangoproject.com/en/4.1/ref/models/querysets/#bulk-create), but I don't see a corresponding change in "bulk_create_with_history".

I don't actually use "bulk_create_with_history", as I only want history recorded for changes made after the initial load. What I'd like is for "bulk_create_with_history" to handle the "update_conflicts"/"update_fields" options, with an additional option to create history records only for updates, not for new models.


JBrut22 commented May 31, 2024

I saw your issue while looking for the same feature. I attempted to create a workaround, but realized there might be a reason why this feature has not been implemented...

I did successfully alter the code for bulk_create_with_history to allow updates: you add the parameters from bulk_create to its arguments, then pass them through to the bulk_create call (lines 125-127 in simple_history.utils). However, this will not create a history record for updated records, only for created ones. Why? Because Django's bulk_create returns all objects, created and updated alike, and does not specify which is which.

Based on the above, you would need to loop through all the records and check them against the database to determine which already exist, using the pk or, if the pk is an AutoField, another unique field.
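To illustrate the point above outside of Django, here is a minimal sketch using only the standard-library sqlite3 module. It assumes that an upsert of the form INSERT ... ON CONFLICT ... DO UPDATE (the kind of statement Django issues for update_conflicts on SQLite and PostgreSQL) gives no per-row indication of insert versus update, so the existing keys are looked up first. The table `item`, its columns, and the helper `classify_then_upsert` are hypothetical names for illustration, and SQLite 3.24 or newer is assumed for upsert support.

```python
import sqlite3

# Hypothetical table and columns, for illustration only.
UPSERT_SQL = (
    "INSERT INTO item (sku, price) VALUES (?, ?) "
    "ON CONFLICT(sku) DO UPDATE SET price = excluded.price"
)


def classify_then_upsert(conn, rows):
    """Upsert (sku, price) rows and return (created_skus, updated_skus)."""
    if not rows:
        return [], []
    skus = [sku for sku, _ in rows]
    placeholders = ",".join("?" * len(skus))
    # One lookup before the write: which keys are already present?
    existing = {
        sku
        for (sku,) in conn.execute(
            f"SELECT sku FROM item WHERE sku IN ({placeholders})", skus
        )
    }
    # The upsert itself treats inserted and updated rows identically.
    conn.executemany(UPSERT_SQL, rows)
    conn.commit()
    created = [s for s in skus if s not in existing]
    updated = [s for s in skus if s in existing]
    return created, updated


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item (id INTEGER PRIMARY KEY, sku TEXT UNIQUE, price INTEGER)"
)
conn.execute("INSERT INTO item (sku, price) VALUES ('a', 1)")

created, updated = classify_then_upsert(conn, [("a", 10), ("b", 20)])
final_rows = conn.execute("SELECT sku, price FROM item ORDER BY sku").fetchall()
```

The extra SELECT is the cost of knowing which rows were updated, and it is one reason a history-aware upsert is likely to be slower than a plain bulk_create.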

Completion time for 500 records, running in Docker against a Dockerized PostgreSQL database:

  • bulk_create: ~70 to 90 ms
  • modified bulk_create_with_history: ~180 to 200 ms
  • custom func: ~250 to 370 ms

As you can see, it can be quite a bit slower using the custom function due to the looping. I imagine it would take a lot longer for the thousands to tens of thousands of records people are often working with.

Anyway, here is the function. I have only tested it with pk_is_autofield = True. Also, if you have an AutoField pk and there is no single unique_field, it will not work. I could have added code to use a list of unique fields, but this is what served my purposes for now.

from typing import Any

from simple_history.utils import bulk_create_with_history, bulk_update_with_history


def bulk_create_update_with_history(
    obj_list: list,
    model,
    pk: str,
    pk_is_autofield: bool = False,
    unique_field: str = "",  # if above is True, this must have a value
    update_fields: list | None = None,  # None avoids a shared mutable default
) -> tuple[list | Any, int]:
    if pk_is_autofield and unique_field == "":
        raise ValueError("unique_field must be provided if pk_is_autofield is True")

    if not pk_is_autofield:
        unique_field = pk

    # validate before reading the attribute, so this error is the one raised
    for obj in obj_list:
        if not hasattr(obj, unique_field):
            raise ValueError(f"Object does not have unique field: {unique_field}")

    # create a list of unique identifiers
    obj_unique_list = [getattr(obj, unique_field) for obj in obj_list]
    # create fields list for values query
    fields = [unique_field]
    if pk != unique_field:
        fields.append(pk)

    # fetch the rows that already exist in one query and map unique value -> pk
    existing_pks = {
        row[unique_field]: row[pk]
        for row in model.objects.filter(
            **{f"{unique_field}__in": obj_unique_list}
        ).values(*fields)
    }

    # separate objs that need update from those to be created
    create_objs = []
    update_objs = []
    for obj in obj_list:
        unique_value = getattr(obj, unique_field)
        # if the obj exists, add to update list
        if unique_value in existing_pks:
            # add the pk field if it is an autofield for existing objects
            if pk_is_autofield:
                setattr(obj, pk, existing_pks[unique_value])
            update_objs.append(obj)
        else:
            create_objs.append(obj)

    # bulk create the objects that do not exist
    created_objs = []
    if create_objs:
        created_objs = bulk_create_with_history(
            create_objs,
            model,
        )

    # bulk update
    num_updated_objs = 0
    if update_objs:
        num_updated_objs = bulk_update_with_history(
            update_objs,
            model,
            fields=update_fields or [],
        )

    return created_objs, num_updated_objs
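
The function above matches on a single unique field. The variant mentioned earlier, matching on a list of unique fields, might look roughly like the sketch below. It is plain Python with no Django dependency: `split_by_existing`, `key_of`, and the sample field names are hypothetical, and `existing_rows` stands in for whatever a `.values(...)` query would return. How to fetch those rows efficiently for a composite key is left out.

```python
from types import SimpleNamespace


def key_of(obj, unique_fields):
    """Build a composite key from several attributes of an object."""
    return tuple(getattr(obj, field) for field in unique_fields)


def split_by_existing(objs, existing_rows, unique_fields, pk="id"):
    """Split objs into (to_create, to_update) using a composite key.

    existing_rows is an iterable of dicts holding the unique fields and the
    pk, similar in shape to the result of a .values() query.
    """
    pk_by_key = {
        tuple(row[field] for field in unique_fields): row[pk]
        for row in existing_rows
    }
    to_create, to_update = [], []
    for obj in objs:
        key = key_of(obj, unique_fields)
        if key in pk_by_key:
            # Copy the stored pk onto the object so it can be updated.
            setattr(obj, pk, pk_by_key[key])
            to_update.append(obj)
        else:
            to_create.append(obj)
    return to_create, to_update


# Stand-in data: one row already stored, two incoming objects.
existing_rows = [{"id": 7, "region": "eu", "code": "x1"}]
incoming = [
    SimpleNamespace(id=None, region="eu", code="x1", price=5),
    SimpleNamespace(id=None, region="us", code="x1", price=6),
]
to_create, to_update = split_by_existing(incoming, existing_rows, ["region", "code"])
```

The two resulting lists could then be passed to bulk_create_with_history and bulk_update_with_history respectively, as in the function above.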
