Static shared memory in PSRAM for model/imageTMP and tensor_arena #2215

Closed
caco3 wants to merge 29 commits

Collaborator

@caco3 caco3 commented Mar 20, 2023

Proof of Concept

‼️ Do not merge!

Use the same memory block for every tensor_arena

This is working ok

Use same memory block for both models and imageTMP.

If the allocated memory is too small, it gets freed and a larger block gets allocated. This happens within the first round; after that, no further change is needed.
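The grow-on-demand behaviour described above can be sketched roughly as follows. This is a minimal host-side illustration, not the code from this PR: the class name `SharedBlock` and its methods are hypothetical, and plain `malloc` stands in for the PSRAM allocation (on the device this would presumably be something like `heap_caps_malloc(size, MALLOC_CAP_SPIRAM)`).

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical helper (not from this PR): one shared block that is reused
// while it is large enough and replaced by a larger one otherwise.
class SharedBlock {
public:
    SharedBlock() = default;
    SharedBlock(const SharedBlock&) = delete;
    SharedBlock& operator=(const SharedBlock&) = delete;
    ~SharedBlock() { std::free(data_); }

    // Returns a buffer of at least `needed` bytes, or nullptr on failure.
    // The previous contents are NOT preserved when the block has to grow.
    void* acquire(std::size_t needed) {
        if (needed <= capacity_) {
            return data_;  // large enough: reuse without touching the heap
        }
        std::free(data_);  // free first, so old and new block never coexist
        // Host stand-in; assumption: the device would allocate from PSRAM here.
        data_ = std::malloc(needed);
        capacity_ = (data_ != nullptr) ? needed : 0;
        ++reallocations_;
        return data_;
    }

    std::size_t capacity() const { return capacity_; }
    int reallocations() const { return reallocations_; }

private:
    void* data_ = nullptr;
    std::size_t capacity_ = 0;
    int reallocations_ = 0;
};
```

With requests of, say, 315504, 53328, 921600 and again 315504 bytes, only the first and the third call allocate; all later requests reuse the 921600-byte block. Freeing before allocating keeps the peak usage low, but it also means the larger block does not necessarily land at the same address as the old one.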

This seems to work fine for the first round but at the 2nd round I always run into

I (24722) C IMG BASIS: Image (zwImage) loaded from memory: 0, 0, 0
E (24722) C IMG BASIS: Image with size 0 loaded --> reboot to be done! Check that your camera module is workin

The fb, however, contains valid data!

jomjol and others added 5 commits March 19, 2023 18:31
* Testcase for #2145 and debug-log (#2151)

* new models ana-cont-11.0.5, ana-class100-1.5.7, dig-class100-1.6.0

* Testcase for #2145
Added debug log, if allowNegativeRates is handled

* Fix timezone config parser (#2169)

* make sure to parse the whole config line

* fix crash on empty timezone parameter

---------

Co-authored-by: CaCO3

* Enhance ROI pages (#2161)

* Check if the ROIs are equidistant. Only if not, untick the checkbox

* renaming

* Check if the ROIs have same y, dy and dx. If so, tick the sync checkbox

* only allow editing space when box is checked

* fix sync check

* show inner frame on all ROIs

* cleanup

* Check if the ROIs have same dy and dx. If so, tick the sync checkbox

* checkbox position

* renaming

* renaming

* show inner frame and cross hairs on all ROIs

* update ROIs on ticking checkboxes

* show timezone hint

* fix deleting last ROI

* cleanup

---------

Co-authored-by: CaCO3

* restart timeout on progress, catch error (#2170)

* restart timeout on progress, catch error

* .

---------

Co-authored-by: CaCO3

* BugFix #2167

* Release 15.1 preparations (#2171)

* Update Changelog.md

* Update Changelog.md

* Update Changelog.md

* Update changelog

* Fix links to PR

* Formatting

* Update Changelog.md

* Update Changelog.md

* Update Changelog.md

* Update Changelog.md

* Update Changelog.md

* Update Changelog.md

* Update Changelog.md

* Update Changelog.md

* Update Changelog.md

* Update Changelog.md

* Update Changelog.md

* Update Changelog.md

---------

Co-authored-by: Slider0007

* fix typo

* Replace relative documentation links with absolute ones pointing to the external documentation (#2180)

Co-authored-by: CaCO3

* Sort model files in configuration combobox (#2189)

* new models ana-cont-11.0.5, ana-class100-1.5.7, dig-class100-1.6.0

* Testcase for #2145
Added debug log, if allowNegativeRates is handled

* Sort model files in combobox

* reboot task - increase stack size (#2201)

Avoid stack overflow

* Update interface_influxdb.cpp

* Update Changelog.md

---------

Co-authored-by: Frank Haverland
Co-authored-by: CaCO3
Co-authored-by: Slider0007
@caco3
Collaborator Author

caco3 commented Mar 21, 2023

@Slider0007 @jomjol Maybe you have an idea why we run into the "Image with size 0 loaded" issue?

@caco3
Collaborator Author

caco3 commented Mar 21, 2023

Example log where the first loaded model is larger than the 2nd one, and both are smaller than imageTMP:

I (11922) PSRAM: Allocated 921600 bytes in PSRAM for 'C IMG BASIS->CImageBasis (rawImage)'
I (13482) PSRAM: Allocated 128004 bytes in PSRAM for 'ALIGN->AlgROI'

I (13592) PSRAM: Allocated 819200 bytes in PSRAM for 'TFLITE->tensor_arena'

I (13642) TFLITE: First model to be loaded: /sdcard/config/dig-cont_0610_s3_q.tflite, Size: 315504
I (13662) PSRAM: Allocated 315504 bytes in PSRAM for 'shared model/imageTMP memory'

I (14532) PSRAM: Allocated 1920 bytes in PSRAM for 'C IMG BASIS->CImageBasis (ROI dig1)'
I (14552) PSRAM: Allocated 13746 bytes in PSRAM for 'C IMG BASIS->CImageBasis (ROI dig1 original)'
I (14582) PSRAM: Allocated 1920 bytes in PSRAM for 'C IMG BASIS->CImageBasis (ROI dig2)'
I (14602) PSRAM: Allocated 13746 bytes in PSRAM for 'C IMG BASIS->CImageBasis (ROI dig2 original)'
I (14622) PSRAM: Allocated 1920 bytes in PSRAM for 'C IMG BASIS->CImageBasis (ROI dig3)'
I (14652) PSRAM: Allocated 13746 bytes in PSRAM for 'C IMG BASIS->CImageBasis (ROI dig3 original)'

I (14772) TFLITE: 2nd model to be loaded: /sdcard/config/ana-cont_1105_s2_q.tflite, Size: 53328
I (14792) TFLITE: Currently allocated shared model/imageTMP memory: 315504
I (14822) TFLITE: Shared model/imageTMP memory is large enough for 2nd model

I (15002) PSRAM: Allocated 3072 bytes in PSRAM for 'C IMG BASIS->CImageBasis (ROI ana1)'
I (15032) PSRAM: Allocated 37632 bytes in PSRAM for 'C IMG BASIS->CImageBasis (ROI ana1 original)'
I (15052) PSRAM: Allocated 3072 bytes in PSRAM for 'C IMG BASIS->CImageBasis (ROI ana2)'
I (15072) PSRAM: Allocated 37632 bytes in PSRAM for 'C IMG BASIS->CImageBasis (ROI ana2 original)'
I (15102) PSRAM: Allocated 3072 bytes in PSRAM for 'C IMG BASIS->CImageBasis (ROI ana3)'
I (15122) PSRAM: Allocated 37632 bytes in PSRAM for 'C IMG BASIS->CImageBasis (ROI ana3 original)'
I (15152) PSRAM: Allocated 3072 bytes in PSRAM for 'C IMG BASIS->CImageBasis (ROI ana4)'
I (15172) PSRAM: Allocated 37632 bytes in PSRAM for 'C IMG BASIS->CImageBasis (ROI ana4 original)'

I (15342) TFLITE SERVER: Round #1 started

I (25862) ALIGN: Currently allocated shared model/imageTMP memory: 315504
I (25892) ALIGN: Extending shared model/imageTMP memory so we can fit the imageTMP...
I (25912) PSRAM: Freeing memory in PSRAM used for 'shared model/imageTMP memory'...
I (25942) PSRAM: Allocated 921600 bytes in PSRAM for 'shared model/imageTMP memory'

I (52842) C IMG BASIS: Not freeing shared model/imageTMP memory (ImageTMP)
I (53012) TFLITE: 2nd model to be loaded: /sdcard/config/dig-cont_0610_s3_q.tflite, Size: 315504
I (53042) TFLITE: Currently allocated shared model/imageTMP memory: 921600
I (53062) TFLITE: Shared model/imageTMP memory is large enough for 2nd model

I (62832) TFLITE: 2nd model to be loaded: /sdcard/config/ana-cont_1105_s2_q.tflite, Size: 53328
I (62862) TFLITE: Currently allocated shared model/imageTMP memory: 921600
I (62882) TFLITE: Shared model/imageTMP memory is large enough for 2nd model

I (66232) TFLITE SERVER: Round #1 completed (51 seconds)

I (135362) TFLITE SERVER: Round #2 started

E (142502) C IMG BASIS: Image with size 0 loaded --> reboot to be done! Check that your camera module is working and 

@caco3 caco3 changed the title from "use a single instance of CTfLiteClass and reuse the allocated space in PSRAM" to "Static shared memory in PSRAM for model/imageTMP and tensor_arena" on Mar 22, 2023
@Slider0007
Collaborator

Slider0007 commented Mar 22, 2023

@caco3, @jomjol
Please find attached some screenshots that try to describe the actual PSRAM usage (only the larger allocations are visualized; not really pretty, but maybe helpful for understanding it better).

@caco3: This is my assumption as to why the first round works with your approach and the second round fails:

ROUND 1:
image

ROUND 2:
image

WORST CASE (max. models sizes)
image


Slider0007 approach: Keep models loaded, share tensor/ImageTMP

I also did some research on this topic. The approach works up to a combined model size of 1000 kB (model 1 + model 2); beyond that it hangs at the same point as the approach from @caco3. This means that 1200 kB of models no longer work with this approach. I have described further findings in more detail in the private chat. The test branch is located here, if you'd like to have a look at it:

image


Output of PSRAM memory blocks after a few rounds (flow finished):
Showing data for heap: 0x3f802c70
Block 0x3f8034dc data, size: 40 bytes, Free: No
Block 0x3f803508 data, size: 20 bytes, Free: No
Block 0x3f803520 data, size: 61440 bytes, Free: No
Block 0x3f812524 data, size: 55880 bytes, Free: No
Block 0x3f81ff70 data, size: 80 bytes, Free: No
Block 0x3f81ffc4 data, size: 16 bytes, Free: No
Block 0x3f81ffd8 data, size: 16 bytes, Free: No
Block 0x3f81ffec data, size: 16 bytes, Free: No
Block 0x3f820000 data, size: 16 bytes, Free: No
Block 0x3f820014 data, size: 16 bytes, Free: No
Block 0x3f820028 data, size: 16 bytes, Free: No
Block 0x3f82003c data, size: 16 bytes, Free: No
Block 0x3f820050 data, size: 16 bytes, Free: No
Block 0x3f820064 data, size: 12 bytes, Free: No
Block 0x3f820074 data, size: 24 bytes, Free: No
Block 0x3f820090 data, size: 2080 bytes, Free: No
Block 0x3f8208b4 data, size: 136 bytes, Free: No
Block 0x3f820940 data, size: 136 bytes, Free: No
Block 0x3f8209cc data, size: 1696 bytes, Free: No
Block 0x3f821070 data, size: 1696 bytes, Free: No
Block 0x3f821714 data, size: 1696 bytes, Free: No
Block 0x3f821db8 data, size: 1696 bytes, Free: No
Block 0x3f82245c data, size: 1696 bytes, Free: No
Block 0x3f822b00 data, size: 1696 bytes, Free: No
Block 0x3f8231a4 data, size: 1696 bytes, Free: No
Block 0x3f823848 data, size: 1696 bytes, Free: No
Block 0x3f823eec data, size: 1696 bytes, Free: No
Block 0x3f824590 data, size: 1696 bytes, Free: No
Block 0x3f824c34 data, size: 1696 bytes, Free: No
Block 0x3f8252d8 data, size: 1696 bytes, Free: No
Block 0x3f82597c data, size: 1696 bytes, Free: No
Block 0x3f826020 data, size: 1696 bytes, Free: No
Block 0x3f8266c4 data, size: 1696 bytes, Free: No
Block 0x3f826d68 data, size: 1696 bytes, Free: No
Block 0x3f82740c data, size: 552 bytes, Free: No
Block 0x3f827638 data, size: 768 bytes, Free: No
Block 0x3f82793c data, size: 56 bytes, Free: No
Block 0x3f827978 data, size: 12 bytes, Free: No
Block 0x3f827988 data, size: 12 bytes, Free: No
Block 0x3f827998 data, size: 56 bytes, Free: No
Block 0x3f8279d4 data, size: 12 bytes, Free: No
Block 0x3f8279e4 data, size: 80 bytes, Free: No
Block 0x3f827a38 data, size: 12 bytes, Free: No
Block 0x3f827a48 data, size: 12 bytes, Free: No
Block 0x3f827a58 data, size: 16 bytes, Free: No
Block 0x3f827a6c data, size: 76 bytes, Free: No
Block 0x3f827abc data, size: 56 bytes, Free: No
Block 0x3f827af8 data, size: 80 bytes, Free: No
Block 0x3f827b4c data, size: 184 bytes, Free: No
Block 0x3f827c08 data, size: 184 bytes, Free: No
Block 0x3f827cc4 data, size: 80 bytes, Free: No
Block 0x3f827d18 data, size: 40 bytes, Free: No
Block 0x3f827d44 data, size: 64 bytes, Free: No
Block 0x3f827d88 data, size: 104 bytes, Free: No
Block 0x3f827df4 data, size: 16 bytes, Free: No
Block 0x3f827e08 data, size: 16 bytes, Free: No
Block 0x3f827e1c data, size: 12 bytes, Free: Yes
Block 0x3f827e2c data, size: 44 bytes, Free: No
Block 0x3f827e5c data, size: 108 bytes, Free: Yes
Block 0x3f827ecc data, size: 56 bytes, Free: No
Block 0x3f827f08 data, size: 208 bytes, Free: No
Block 0x3f827fdc data, size: 208 bytes, Free: No
Block 0x3f8280b0 data, size: 16 bytes, Free: No
Block 0x3f8280c4 data, size: 120 bytes, Free: No
Block 0x3f828140 data, size: 64 bytes, Free: No
Block 0x3f828184 data, size: 104 bytes, Free: Yes
Block 0x3f8281f0 data, size: 136 bytes, Free: No
Block 0x3f82827c data, size: 942080 bytes, Free: No
Block 0x3f90e280 data, size: 128004 bytes, Free: No
Block 0x3f92d688 data, size: 226480 bytes, Free: No
Block 0x3f964b3c data, size: 1920 bytes, Free: No
Block 0x3f9652c0 data, size: 4860 bytes, Free: No
Block 0x3f9665c0 data, size: 1920 bytes, Free: No
Block 0x3f966d44 data, size: 4860 bytes, Free: No
Block 0x3f968044 data, size: 1920 bytes, Free: No
Block 0x3f9687c8 data, size: 4860 bytes, Free: No
Block 0x3f969ac8 data, size: 183756 bytes, Free: No
Block 0x3f996898 data, size: 3072 bytes, Free: No
Block 0x3f99749c data, size: 17788 bytes, Free: No
Block 0x3f99ba1c data, size: 3072 bytes, Free: No
Block 0x3f99c620 data, size: 17788 bytes, Free: No
Block 0x3f9a0ba0 data, size: 3072 bytes, Free: No
Block 0x3f9a17a4 data, size: 17788 bytes, Free: No
Block 0x3f9a5d24 data, size: 3072 bytes, Free: No
Block 0x3f9a6928 data, size: 17788 bytes, Free: No
Block 0x3f9aaea8 data, size: 3072 bytes, Free: No
----> same start block as below
Block 0x3f9abaac data, size: 17788 bytes, Free: No
Block 0x3f9b002c data, size: 856 bytes, Free: Yes
Block 0x3f9b0388 data, size: 56 bytes, Free: No
Block 0x3f9b03c4 data, size: 12 bytes, Free: No
Block 0x3f9b03d4 data, size: 512 bytes, Free: No

---> Helper structure memory
Block 0x3f9b05d8 data, size: 632400 bytes, Free: Yes --> the only remaining larger fragmentation that is not yet solved; potentially the internal CImage helper structure, which is necessary to convert the JPG to a matrix image
<--- Helper structure memory

Block 0x3fa4ac2c data, size: 921604 bytes, Free: No
Block 0x3fb2bc34 data, size: 869316 bytes, Free: Yes


During the "Take image" state in the following round (CImage helper structure block marked):
----> same start block as above
Block 0x3f9abaac data, size: 17788 bytes, Free: No
Block 0x3f9b002c data, size: 208 bytes, Free: No
Block 0x3f9b0100 data, size: 208 bytes, Free: No
Block 0x3f9b01d4 data, size: 104 bytes, Free: Yes
Block 0x3f9b0240 data, size: 56 bytes, Free: No
Block 0x3f9b027c data, size: 264 bytes, Free: Yes
Block 0x3f9b0388 data, size: 56 bytes, Free: No
Block 0x3f9b03c4 data, size: 12 bytes, Free: No

---> Helper structure memory
Block 0x3f9b03d4 data, size: 18456 bytes, Free: No
Block 0x3f9b4bf0 data, size: 307216 bytes, Free: No
Block 0x3f9ffc04 data, size: 153616 bytes, Free: No
Block 0x3fa25418 data, size: 153616 bytes, Free: No
<--- Helper structure memory

Block 0x3fa4ac2c data, size: 921604 bytes, Free: No
Block 0x3fb2bc34 data, size: 869316 bytes, Free: Yes

If nothing comes in between, the same free blocks are allocated again. But whenever something else comes in between and takes even a few bytes, PSRAM gets even more fragmented. That's my major concern!
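To put a number on the fragmentation concern, the free blocks from the first dump above ("after a few rounds") can be summed up and compared with the largest single free block. The following is only a host-side sketch; the type and function names are hypothetical, and allocator overhead is ignored. On the device, `heap_caps_get_free_size(MALLOC_CAP_SPIRAM)` and `heap_caps_get_largest_free_block(MALLOC_CAP_SPIRAM)` should report the corresponding live values.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sizes (in bytes) of the blocks marked "Free: Yes" in the first dump above.
inline std::vector<std::size_t> freeBlocksAfterFlow() {
    return {12, 108, 104, 856, 632400, 869316};
}

// Hypothetical summary type for this illustration.
struct HeapSummary {
    std::size_t totalFree;    // sum of all free blocks
    std::size_t largestFree;  // largest contiguous free block
};

inline HeapSummary summarize(const std::vector<std::size_t>& freeBlocks) {
    HeapSummary summary{0, 0};
    for (std::size_t size : freeBlocks) {
        summary.totalFree += size;
        summary.largestFree = std::max(summary.largestFree, size);
    }
    return summary;
}

// A request can only succeed if one single free block is large enough;
// the total amount of free memory does not help.
inline bool fits(const HeapSummary& summary, std::size_t request) {
    return request <= summary.largestFree;
}
```

For this dump, the free blocks add up to 1502796 bytes, but the largest one is only 869316 bytes. A further 921600-byte buffer (the rawImage size from the log earlier in this thread) would therefore not fit, although roughly 1.5 MB are free in total.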


Currently implemented version without any preallocation (v15.0.3)

It seems that the current implementation is the best-balanced version, but it only works if no fragmentation occurs, which is in my opinion the main issue.
image

If we get rid of the fragmentation, we have a really good base for further improvements. Unfortunately, this would exclude the possibility of using the wifi stack and bss in PSRAM and would reduce the available internal RAM again, which is really bad to see. Up to now I have no clue whether we can get around this obstacle.

@caco3
Collaborator Author

caco3 commented Mar 22, 2023

Thanks @Slider0007 for the helpful visualization!

As a side note, it would not be difficult to tell the stb library to use the internal RAM! They actually provide #defines so malloc can be customized: https://github.com/jomjol/AI-on-the-edge-device/blob/rolling/code/components/jomjol_image_proc/stb_image.h#L627

Using MALLOC_CAP_INTERNAL, we should be able to force it to use the internal RAM. The question is whether we have enough space there. Alternatively, we could modify it to use a static memory block like we did for the other parts.
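A rough sketch of what such an override could look like, assuming the `STBI_MALLOC`, `STBI_REALLOC` and `STBI_FREE` hooks of stb_image.h are used (they have to be defined together, before the header is included in the file that builds the implementation). The wrapper function names are hypothetical, nothing here is tested on the device, and on a host build the wrappers simply fall back to the standard allocator so the snippet compiles without ESP-IDF.

```cpp
#include <cstddef>
#include <cstdlib>

#ifdef ESP_PLATFORM
#include "esp_heap_caps.h"
#endif

// Hypothetical wrappers: route stb allocations to internal RAM on the ESP32
// and to the standard allocator everywhere else.
inline void* stbInternalMalloc(std::size_t size) {
#ifdef ESP_PLATFORM
    return heap_caps_malloc(size, MALLOC_CAP_INTERNAL | MALLOC_CAP_8BIT);
#else
    return std::malloc(size);  // host stand-in
#endif
}

inline void* stbInternalRealloc(void* ptr, std::size_t size) {
#ifdef ESP_PLATFORM
    return heap_caps_realloc(ptr, size, MALLOC_CAP_INTERNAL | MALLOC_CAP_8BIT);
#else
    return std::realloc(ptr, size);  // host stand-in
#endif
}

inline void stbInternalFree(void* ptr) {
#ifdef ESP_PLATFORM
    heap_caps_free(ptr);
#else
    std::free(ptr);  // host stand-in
#endif
}

// stb_image.h expects all three hooks to be defined together.
#define STBI_MALLOC(sz)        stbInternalMalloc(sz)
#define STBI_REALLOC(p, newsz) stbInternalRealloc(p, newsz)
#define STBI_FREE(p)           stbInternalFree(p)
```

The same three wrappers could instead hand out pieces of a static block. Either way, the open question from above remains: whether the internal RAM is large enough for the decoder's temporary buffers.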

caco3 and others added 4 commits March 30, 2023 21:54
- stb_image.h: Version update 2.25 -> 2.28
- stb_resize.h: Version update 0.96 -> 0.97
- stb_write.h: Version update 1.14 -> 1.16

Co-authored-by: CaCO3
* Rename module tag name

* Rename server_tflite.cpp -> MainFlowControl.cpp

* Remove redundant MQTTMainTopic function

* Update

* Remove obsolete GetMQTTMainTopic
@caco3
Collaborator Author

caco3 commented Apr 1, 2023

crash

350 KB model and

-CONFIG_SPIRAM_USE_MALLOC=y
+#CONFIG_SPIRAM_USE_MALLOC=y
+CONFIG_SPIRAM_USE_CAPS_ALLOC=y

=> no fragmentation, but most likely not enough internal RAM

image

@caco3 caco3 closed this Apr 30, 2023
@caco3 caco3 deleted the shared-psram-objects branch May 2, 2023 05:59