Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize memory handling #1963

Merged
merged 9 commits into from
Oct 16, 2024

Conversation

PhilippTakacs
Copy link
Contributor

By using the memory snapshots with real examples we found some bottlenecks. Therefor I have implemented some optimizations.

We found that the flatview creation is quite expensive (around 1/4 of all runtime including emulation) when you have a lot of memory regions. By permanently creating new regions with CoW we have a lot of regions and rebuild for each cow the flatview again. I have added a update function does only change affected ranges.

Also we found that the find_ram_offset_last optimization doesn't work after a restore. To fix this the context now includes also the ramblock_freed bool.

This also includes two smaller optimizations: only clear the affected tlb on memory_cow and save the last ramblock of the ram_list

It's currently only a draft, because we need to do a few more tests to see if the optimization works as expected.

@PhilippTakacs
Copy link
Contributor Author

I have additional added the flatview to the context. This way flatview doesn't need to be rebuild on context_restore.

I have tested and profiled this. It does in fact save us relevant time.

@PhilippTakacs PhilippTakacs marked this pull request as ready for review July 3, 2024 09:00
@PhilippTakacs
Copy link
Contributor Author

@wtdcode have you overseen this PR? It would be nice to get some feedback.

@wtdcode
Copy link
Member

wtdcode commented Jul 19, 2024

@wtdcode have you overseen this PR? It would be nice to get some feedback.

Sorry for not leaving a comment. I had a quick review on it and didn't find any obvious problems. I planned to have further debugging to have a better understanding before commenting further.

@wtdcode
Copy link
Member

wtdcode commented Sep 21, 2024

I ported d010357 which generally allows unicorn to go fast qemu_st path (not going through store_helper again) to have better writing performance. This shall have side effects on your memory_cow which relies on store_helper.

I have twinkled a bit on the code to ensure that when snapshot is enabled, no TLB is marked clean. However, I'm not sure if that works for you, you may have a look.

Regarding this PR, I locally tested and think it is good to go if you don't have more to add.

@PhilippTakacs
Copy link
Contributor Author

Thanks for the update. As far as I understand this patch allows faster writes, if the tlb cache entry exists. So for snapshots it might be enough to clear the tlb cache. This way the store_helper would be called again and everything works as expected. I'll test this.

think it is good to go if you don't have more to add

I would like to rebase this to the dev branch to remove the "fixup" commits. Also because I haven't looked at this for a few weeks, I would like to test this again with the current HEAD. I'll ping you when I'm finished testing.

@PhilippTakacs
Copy link
Contributor Author

@wtdcode I have tested the code again and removed the fixup commits so for me this is ready to merge.

For the notdirty write I'm working on a optimization, but I'll open another PR for it.

@wtdcode
Copy link
Member

wtdcode commented Oct 11, 2024

@wtdcode I have tested the code again and removed the fixup commits so for me this is ready to merge.

For the notdirty write I'm working on a optimization, but I'll open another PR for it.

Will check today or tomorrow, thanks for your work!

@@ -2418,6 +2431,11 @@ uc_err uc_context_restore(uc_engine *uc, uc_context *context)
return ret;
}
uc_snapshot(uc);
uc->ram_list.freed = context->ramblock_freed;
Copy link
Member

@wtdcode wtdcode Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I should have noticed this earlier. With this change (and actually the early snapshot_level change), with UC_CTL_CONTEXT_MEMORY, the context is no longer safe to keep as a file (it contains pointers) and be restored to another uc_engine instance.

It is super non-trivial to adapt the memory context to a value type and thus I suggest adding a few documents in the header (uc_context_save/restore, UC_CTL_CONTEXT_MEMORY).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we had this question already in the snapshots feature. I'll write some documentation about this.

qemu/exec.c Show resolved Hide resolved
uc.c Outdated
if (context->fv) {
free(context->fv->ranges);
}
g_free(context->fv);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this shall move to the if (context->fv) branch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

g_free() and free() with a NULL is a noop so it's not necesary. I can move it if you prefer it this way better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please do so because it looks more straightforward.

}
}
memory_region_transaction_commit(mr);
// memory_region_transaction_commit(mr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe to remove this commit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This function is called in two cases.

  • unmap/free complete memory region (as part of destroy)
    Then the parrent unmap will trigger the commit
  • restore the memory state
    Then the restore also restores the old flatview so no rebuild is nesecary

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a quick check, and it seems to make sense.

Could you also add these as comments here? So that I know the preconditions here if we are going to bump qemu version.

@wtdcode
Copy link
Member

wtdcode commented Oct 16, 2024

Okay, I finished the review, and it's safe to go now with minor fixes and a few document updates. Thanks again for your brilliant work.

Save the last element of the ram_list. This allows to
faster find where to add new elements when they are not
bigger then page size.
this keeps the optimization for find_ram_offset() intact after snapshot
restore.
Building each flatview new when the memory has changed is quite
expensive when many MemoryRegions are used. This is an issue when using
snapshots.
this avoids rebuilding the flatview when restore a context.
Specialy stress that with UC_CTL_CONTEXT_MEMORY it is not possible to
use the context with a different unicorn object.
@PhilippTakacs
Copy link
Contributor Author

I have added the last fixes and rebased it to the dev branch.

The failed test doesn't look like it's a bug by my changes.

@wtdcode
Copy link
Member

wtdcode commented Oct 16, 2024

I have added the last fixes and rebased it to the dev branch.

The failed test doesn't look like it's a bug by my changes.

I will rerun and it seems Github Action bug.

@wtdcode wtdcode merged commit e8ca3cb into unicorn-engine:dev Oct 16, 2024
33 checks passed
@wtdcode
Copy link
Member

wtdcode commented Oct 16, 2024

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants