
LogBatchPurger Too many DML rows: 10001 #166

Closed
starkguerraj opened this issue Jun 5, 2021 · 18 comments · Fixed by #173
Labels
Feature: Log Retention Items related to LogBatchPurger or LogBatchPurgeScheduler Layer: Log Management Items related to the custom objects & Logger Console app optimization Type: Bug Something isn't working

Comments

@starkguerraj

starkguerraj commented Jun 5, 2021

Getting this error for a batch whenever LogBatchPurger runs.

Running version 4.4.5 of the Unlocked Package, which is almost the latest.

Will try to get some more details.

@starkguerraj
Author

Here is a snippet of the exception from the debug log.

[screenshot: exception snippet from the debug log]

@jongpie
Owner

jongpie commented Jun 6, 2021

Hey @starkguerraj - I'll have to work on optimizing that batch job, but as a quick workaround, you can try re-running the batch job with a smaller batch size (the default is 200). Try running this Apex script to see if it'll successfully delete your records:

```apex
Integer batchSize = 100;
Database.executeBatch(new LogBatchPurger(), batchSize);
```

@jongpie jongpie added Type: Bug Something isn't working optimization labels Jun 6, 2021
@starkguerraj
Author

Will definitely do that for now. I tried a batch size of 10, then moved down to 1 to finish off a few batches that were failing.

Thanks

jamessimone added a commit that referenced this issue Jun 6, 2021
@jongpie jongpie added this to the Version 4.5.0 milestone Jun 7, 2021
jamessimone added a commit that referenced this issue Jun 7, 2021
* potential fix for #166 by introducing limits-minded deletion mechanism in LogBatchPurger

* Re-added code coverage for while loop

* Prettier formatting, extra safety
@jamessimone
Collaborator

@starkguerraj I went ahead and took care of this one for you - @jongpie will be releasing Version 4.5.0 later this week with the updates from this and from #165. Let us know if there are any further issues!

@starkguerraj
Author

You guys are great, I really appreciate it.

@jongpie
Owner

jongpie commented Jun 20, 2021

@starkguerraj I was a little slow in releasing version 4.5.0, but it's now finally live!

Let us know if you run into any other issues!

@starkguerraj
Author

starkguerraj commented Jun 24, 2021

Thanks for the update. I was able to get the package installed, but bad news, unfortunately: I received an exception this morning. Here is the stack trace from the email; let me know if there's anything else I can provide.

```
Failed to process batch for class 'LogBatchPurger' for job id '7073Z0000DQd275'

caused by: System.LimitException: Too many DML rows: 10001

Class.LogBatchPurger.LogDeleter.hardDelete: line 65, column 1
Class.LogBatchPurger.LogDeleter.process: line 47, column 1
Class.LogBatchPurger.execute: line 102, column 1
```

@jongpie
Owner

jongpie commented Jun 24, 2021

@starkguerraj, darn, I thought we had this resolved! Sorry for the ongoing issues! I think there are a couple of next steps:

  • For now, you can keep specifying a batch size (as mentioned above) until we get yet another release out for this
  • We'll look into further optimizing the code again to better handle that Too many DML rows error
  • I'm also considering adding another constructor to LogBatchPurgeScheduler with an Integer batchSize parameter - this would let you specify the batch size for the scheduled job. The only downside is that you'd have to schedule the job via Apex, not the Salesforce UI, but it would give admins more control over the scheduled job.
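
Something like this - purely hypothetical at this point, since the constructor doesn't exist yet:

```apex
// Hypothetical sketch - assumes the proposed LogBatchPurgeScheduler(Integer batchSize) constructor exists
String cronExpression = '0 0 23 * * ?'; // run daily at 11pm
Integer batchSize = 50; // smaller than the 200 default to stay under DML limits
System.schedule('LogBatchPurgeScheduler with custom batch size', cronExpression, new LogBatchPurgeScheduler(batchSize));
```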

@starkguerraj
Author

@jongpie

No big deal, just wanted to keep it on the radar.

The last option seems like a great interim solution with low development effort for something that may be an edge-case issue (me).

@jongpie
Owner

jongpie commented Jun 24, 2021

Sounds good, I'll plan to add the extra constructor in the next release (as well as some other optimizations for the actual deletion process) - I've reopened this issue too.

@jongpie jongpie reopened this Jun 24, 2021
@jongpie jongpie modified the milestones: Version 4.5.0, Version 4.6.0 Jun 24, 2021
@jongpie jongpie added the Layer: Log Management Items related to the custom objects & Logger Console app label Aug 3, 2021
@jongpie
Owner

jongpie commented Aug 10, 2021

@starkguerraj the newest release, v4.6.0, has 2 related changes:

  • LogBatchPurgeScheduler has a new constructor, LogBatchPurgeScheduler(Integer batchSize). This can be used to specify the batch size used when executing LogBatchPurger:

    ```apex
    String cronExpression = '0 0 23 * * ?';
    Integer batchSize = 100;
    System.schedule('LogBatchPurgeScheduler with custom batch size', cronExpression, new LogBatchPurgeScheduler(batchSize));
    ```

  • LogBatchPurger has been optimized - my hope is that you'll no longer run into issues like 'Too many DML rows' or 'Apex CPU time limit exceeded' (mentioned in #176, "LogBatchPurger - Apex CPU time limit exceeded")

@starkguerraj
Author

I was finally able to update my sandbox org from 4.5 to the latest unlocked version, 4.6.9.

Unfortunately, there is a new error now. I attempted to execute the purger to test as follows: Database.executeBatch(new LogBatchPurger(), 1).

It doesn't appear I'm over the limits the error suggests, so perhaps it's an internal issue. Please let me know if there's something else I can provide to help you track this down.

[screenshots: error details]

@starkguerraj
Author

starkguerraj commented Oct 7, 2021

I see, looking at the commit history

  • Switched to new approach in LogBatchPurger - it now runs separate batch jobs for each of Logger's objects

So running with a batch size of 1 is most likely no longer appropriate.

[screenshot: commit history]

Would you happen to have a recommended batch size?

Thanks

@jongpie
Owner

jongpie commented Oct 8, 2021

Hi @starkguerraj - glad you were able to upgrade to v4.6.9! You're right, I don't think a batch size of 1 is ideal for your org anymore. A batch size of 200 (Salesforce's default when calling Database.executeBatch()) should be a safe choice now. But if it's easy for you to do so, it'd be interesting to see if it works with a batch size of 2,000 (the max that Salesforce allows) to see how well the optimizations are truly working. I think the new approach is definitely more performant overall, but I don't currently have access to an org with enough storage space to test the kind of large data volumes you have in your org.

And to give some more context on the batch job's design, as you noted, LogBatchPurger now runs separate batch jobs for each of Logger's objects - this new approach has 2 implications:

  1. It should be much more scalable for large data volumes. The old approach kept hitting limits because it tried to delete Log__c and LogEntry__c records in the same batch/transaction, so a large batch size of Log__c records, coupled with those Log__c records having a large number of LogEntry__c records, caused a lot of the issues you reported a few months ago. The new approach of "1 SObject Type per batch job" should eliminate those types of issues.
  2. More batch jobs need to run to delete the same amount of data, so using a larger batch size is more important now. A tradeoff with the new approach is that in some situations, it will need to run even more total batch jobs (compared to the old approach), which could increase your daily async job usage. For example, if you had 200 Log__c records, each with 2 related LogEntry__c records (400 total LogEntry__c records), then...
    • In the old approach, LogBatchPurger with a batch size of 200 would have deleted all Log__c and LogEntry__c records in 1 batch job with 1 batch.
    • In the new approach, LogBatchPurger with a batch size of 200 will run 1 LogEntry__c batch job with 2 batches (400 records / 200 batch size), and 1 Log__c batch job with 1 batch (200 records / 200 batch size).

Let me know what batch size you end up using and/or if you run into any more issues!

@starkguerraj
Author

starkguerraj commented Oct 8, 2021

Thanks for the update and explanation.

Prior to your response, I arbitrarily picked 10,000 to see what would happen, and it appeared to execute without issue.

On a side note, it looks like there may be a separate issue now: there are a bunch of Log records with no entries and no start times, well beyond my retention date, that were not deleted.

[screenshot: Log__c records with no related entries]

This may or may not have been an issue previously.

@jongpie
Owner

jongpie commented Oct 8, 2021

@starkguerraj that's awesome that it worked for such a large batch size (I think that it probably used a batch size of 2,000, even though you specified 10,000, since 2,000 is the max). This really helps confirm that the optimizations/new approach have helped - thanks for confirming that!

As far as those strange log records, it could be 1 of 2 scenarios:

  1. Occasionally, there are bugs in Logger that cause the creation of LogEntry__c records to fail, but the Log__c records are still created. This has happened a few times in the past due to bugs like "a required field on LogEntry__c wasn't being populated" or "a field value on LogEntry__c exceeded the field's max length", so these could be some older logs in your org that were impacted by one of those bugs.
  2. It could also be a bug within the LogBatchPurger job - the related LogEntry__c records may have been successfully deleted, but for whatever reason, the batch job failed to delete the parent Log__c records.

In both scenarios, the end result would look exactly like your screenshot - Log__c records with 0 related log entries - so it's hard to say which scenario you're seeing. Could you check whether the Log__c records in your screenshot have a value in the field Log__c.RetentionDate__c? If that field is null, then I think it's probably scenario 1; if it's not null, then it probably is in fact another bug in the batch deletion job.

Either way, I think I can solve these scenarios by updating LogBatchPurger to also delete Log__c records with 0 related LogEntry__c records - I'll look into making this change in a future release.
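
As a rough sketch of that cleanup (hypothetical - this assumes LogEntry__c.Log__c is the lookup field relating the two objects, and isn't the final implementation):

```apex
// Hypothetical sketch - find and delete Log__c records with 0 related LogEntry__c records.
// The SOQL anti-join (Id NOT IN subquery) excludes any Log__c that still has entries,
// and LIMIT 10000 keeps the delete under the per-transaction DML row limit.
List<Log__c> logsWithoutEntries = [
    SELECT Id
    FROM Log__c
    WHERE Id NOT IN (SELECT Log__c FROM LogEntry__c)
    LIMIT 10000
];
delete logsWithoutEntries;
```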

@starkguerraj
Author

They definitely have helped, thanks for that.

There is a retention date; some are a few months old and some are as recent as today (post logger upgrade), so I'm not sure which scenario applies.

It does sound like deleting logs with no entry records would resolve the issue, and I can't really see a good reason to keep Logs without any entries.

@jongpie
Owner

jongpie commented Oct 8, 2021

@starkguerraj sounds good, thanks for confirming those details. I've created issue #223 to update the batch job to delete any logs with 0 log entries. In the meantime, if you run into any other issues (or find out any more interesting details on those strange log records), let me know!

@jongpie jongpie added the Feature: Log Retention Items related to LogBatchPurger or LogBatchPurgeScheduler label May 28, 2022