ECS Consuming Hugely Disproportionate Data Disk Space in Overheads, and Consuming for No Apparent Reason #561

AB442 · 2022-11-07T12:52:01Z

Expected Behavior

Installation: ECS OVA v3.7. The expectation is that ECS storage uses a reasonable amount of disk space when a file is uploaded. For example, when a 1 GB file is uploaded, the user expects about 1 GB of disk space to be consumed; understandably, this could be something like 1.5 GB with various overheads. Deleting the file is expected to release all of the consumed disk space, allowing the space to be used for additional files.

Actual Behavior

ECS consumes massively more disk space on the data disk than the uploaded files should occupy (7 to 10 times more, multiplicative). Beyond this, disk space is also consumed passively after an upload. I have not been able to observe when this passive consumption stops; it appears to keep consuming more disk space as time goes on. The system was not rebooted during this time. All of the excess consumed space falls under metadata and protection overhead. I have a single-node installation, and neither metadata nor protection is enabled in my deploy.yml or bucket files. The machine has a large disk, but this issue caused another ECS deployment of mine, with a smaller disk, to crash.

Examples of what happened: In my experience the disk space consumed by user files is dwarfed by metadata overhead (over 2x the size of the user files) and protection overhead (over 4x the size of the user files). More worrying is that ECS appears to be consuming extra disk space passively, for no apparent reason. For example, when I logged off on a Friday evening approximately 700 GB was consumed; by Monday morning it was over 1 TB, and all of the additional consumption appeared to be metadata and protection overhead. I can watch the consumption rising in real time: in the last few hours I have added no files, but an additional 30 GB was used. I have a sizeable amount of disk space on this ECS node, so I have been able to monitor this consumption. On another ECS node that I set up some weeks ago, I had a 100 GB data disk. I decided to check this node and it has now crashed with 96% of disk space used by ECS. I cannot check the logs specifically, but it shows the same symptoms as the problem on my main node. The smaller test node only had a single 6.8 MB file uploaded to it. Neither node has been rebooted since it was set up, as I'm aware of the known issue that rebooting ECS can tie up disk space.
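The figures above can be put into numbers with a short calculation. This is a minimal sketch, not anything ECS provides: the function names are made up for illustration, the 100 GB user-data figure is an arbitrary example chosen to match the reported 2x and 4x ratios, and the 60-hour weekend window is an assumption about how long "Friday evening to Monday morning" was.

```python
def overhead_multiplier(user_gb, metadata_gb, protection_gb):
    """Total consumed space divided by the size of the user data."""
    if user_gb <= 0:
        raise ValueError("user_gb must be positive")
    return (user_gb + metadata_gb + protection_gb) / user_gb


def passive_growth_gb_per_hour(start_gb, end_gb, hours):
    """Average growth in consumed space over an idle period."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return (end_gb - start_gb) / hours


# Illustrative values only: 100 GB of user files with metadata at 2x
# and protection at 4x, as in the ratios reported above.
print(overhead_multiplier(100, 200, 400))         # 7.0

# 700 GB on Friday evening to roughly 1 TB (taken as 1000 GB) on Monday
# morning, assuming about 60 hours elapsed.
print(passive_growth_gb_per_hour(700, 1000, 60))  # 5.0
```

With these assumed inputs the multiplier comes out at 7, which is consistent with the "7 to 10 times" range reported under Actual Behavior.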

I would greatly appreciate assistance with this issue. As you can see, it is quite serious: it effectively renders ECS Community Edition unusable for any length of time, and certainly prevents evaluation of the utility. See screenshots below.

Steps to Reproduce Behavior

Install ECS from OVA v3.7.
Perform the necessary steps (1 & 2) with default values in deploy.yml (protection = false, etc.), adding only the necessary environment details (IPs, etc.) to the file.
Once the platform is up and running, begin uploading files and monitor the disk space used on the ECS dashboard. The disk space consumed should far outstrip the size of the uploaded files. The capacity utilization tab of the dashboard shows the breakdown of specific usage; if the issue has occurred, the usage for metadata and protection overheads will be far higher than for the user files (2 to 4 times the size for each). After stopping the upload of files, wait and observe further growth in disk space consumption over approximately the next day.
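For anyone reproducing this, it can help to record filesystem usage on the node alongside the dashboard readings. The following is a minimal sketch using Python's standard `shutil.disk_usage`; the function names are illustrative, and the mount point you pass in is an assumption you must fill in for your own node. It reports whole-filesystem usage only, so it will not show the metadata/protection breakdown that the dashboard does.

```python
import shutil
import time
from datetime import datetime, timezone

GIB = 1024 ** 3


def sample_usage(path):
    """Take one reading of filesystem usage for the given mount point."""
    usage = shutil.disk_usage(path)
    used_pct = 100.0 * usage.used / usage.total if usage.total else 0.0
    return {
        "time": datetime.now(timezone.utc).isoformat(),
        "used_gib": usage.used / GIB,
        "total_gib": usage.total / GIB,
        "used_pct": used_pct,
    }


def watch(path, interval_s, samples):
    """Print and return a series of readings, sleeping between them."""
    readings = []
    for i in range(samples):
        reading = sample_usage(path)
        readings.append(reading)
        print(
            f"{reading['time']}  "
            f"{reading['used_gib']:.2f} / {reading['total_gib']:.2f} GiB "
            f"({reading['used_pct']:.1f}%)"
        )
        if i < samples - 1:
            time.sleep(interval_s)
    return readings
```

For example, `watch("/path/to/ecs/data/disk", 3600, 24)` would take one reading per hour for a day; the path here is a placeholder, not the actual ECS data disk location.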

Relevant Output and Logs

Notifies: @nikhil-vr


nikhil-vr · 2022-12-05T10:30:24Z

Metadata usage is very high in the above example, which is unexpected for the ECS code. We reserve many copies of btree pages; I am not sure whether this is the cause, or whether it is the btree/journal garbage, which is collected after 30/15 days. Does the capacity usage remain the same, or is it dropping?

I'm currently working on 3.8 and will test this, and will also optimize some GC parameters for smaller systems.

AB442 · 2022-12-05T10:54:09Z

We left the node running for about 2 weeks, but the usage kept increasing day after day. As this is a single-node installation, I'm not sure whether this also occurs in multi-node. I think that even if garbage collection did clean it out after 15 / 30 days, it wouldn't help systems with smaller disks; for example, I have seen the garbage build up enough to crash a machine with a 500 GB disk in a day or two. Thanks for the response.
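To illustrate why a 15 / 30 day cleanup window is too long for small disks, here is a rough time-to-full estimate. It is a sketch under stated assumptions: the function name is hypothetical, growth is treated as constant, and the 10 GB/hour rate and 100 GB starting usage are illustrative values, not measurements from this thread.

```python
def hours_until_full(total_gb, used_gb, growth_gb_per_hour):
    """Hours until the disk fills, assuming a constant growth rate."""
    if growth_gb_per_hour <= 0:
        return float("inf")  # usage is flat or shrinking; never fills
    free_gb = max(total_gb - used_gb, 0.0)
    return free_gb / growth_gb_per_hour


# Illustrative only: a 500 GB disk with 100 GB used, growing at 10 GB/hour.
print(hours_until_full(500, 100, 10))  # 40.0
```

At that assumed rate the disk would fill in under two days, well before a cleanup that runs after 15 or 30 days could reclaim anything.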

nikhil-vr · 2023-01-26T07:05:17Z

Please test 3.8. I have made some optimizations to reduce metadata on small systems; please try it and let me know.

lriva94 · 2023-01-26T09:24:03Z

Hello, I installed a 3.8 edition with 4 nodes (maybe that is not considered a small system). I injected very little data, and system metadata is increasing every day. No GC actions seem to be triggered.
It seems this is not fixed even in 3.8.

lriva94 · 2023-01-26T09:32:16Z

I used OVA installation.

AB442 · 2023-01-26T09:35:55Z

I can test 3.8 from source as it appears the OVA still has the issue. Hopefully will have some results in the coming days. As mentioned, I installed 3.7 from OVA and it presented this issue.

Update: I tried to install from source but encountered a few errors in step 1 which I unfortunately don't have time to debug at the moment, but I think it's a reasonable assumption that the issue likely exists there also. I have the ECS 3.8 OVA set up so if there is anything you would like to try out then let me know.

nikhil-vr · 2023-02-01T06:42:55Z

A manual deployment will not change this. We are investigating it and will let you know the outcome.

tihopia · 2023-02-23T06:33:24Z

Is it known whether this problem is absent in earlier builds? Is it worth installing 3.6.2.0 to prevent the disk space consumption?

It seems that versions 3.7 and 3.8 suffer from this. We have installed 3.8 with both the OVA and the manual method, and found the same symptoms as described here.

nikhil-vr · 2023-02-23T08:28:30Z

We are releasing a new OVA image for 3.8 next week; I will update here once it is posted.

AB442 · 2023-02-23T09:06:17Z

To answer tihopia's question: I installed multiple versions and the problem was present on them, including 3.6 and 3.5.

jonnyoboy2110 · 2023-09-06T17:52:10Z

Is there any way to remove all the excess data that is being stored, or to prevent any more of it from being stored? I use my server for testing, so I don't need any of the data after testing, but it's becoming difficult to set up a whole new server every time we need to test because the server has used up all of the storage on my machine.
