HA Database bloat after introduction of Bermuda_Global sensors #389

Open
jeremysherriff opened this issue Nov 17, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@jeremysherriff

jeremysherriff commented Nov 17, 2024

Configuration

Not applicable (I think?)

Describe the bug

Since v0.7 and the addition of the Global Bermuda device, my HA database has grown by 15%. The growth seems to be entirely in the states table.
My database is MariaDB, so I used phpMyAdmin to check which entities have the highest state counts, grouped by entity_id:

select states_meta.entity_id,count(*) from states left join states_meta on states.metadata_id = states_meta.metadata_id
 group by entity_id
 order by count(*) desc;

entity_id                                     count(*)
sensor.bermuda_global_visible_device_count      151901
sensor.bermuda_global_total_device_count         89860
sensor.deck_sensor_humidity                      31100
sensor.atc_c280_humidity                         30346
sensor.deck_sensor_temperature                   28670
sensor.lywsd03mmc_7d73_humidity                  26242
sensor.atc_c280_temperature                      24805
sensor.lywsd03mmc_7d73_temperature               24085
sensor.zm_memory_used                            22791
...

The sensor.bermuda_global_visible_device_count has a state change count that is an order of magnitude higher than most sensors.
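
For anyone wanting to try the grouped count without touching a live recorder database, here is a minimal sketch that runs the same query against an in-memory SQLite database. The two tables are toy stand-ins holding only the columns the query touches, and the row counts are made-up sample data, not values from this issue.

```python
import sqlite3

# Toy stand-in for the recorder schema: only the columns the query touches.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE states_meta (metadata_id INTEGER PRIMARY KEY, entity_id TEXT);
    CREATE TABLE states (state_id INTEGER PRIMARY KEY, metadata_id INTEGER, state TEXT);
""")

# Hypothetical sample data: entity name -> number of recorded state rows.
sample = {
    "sensor.bermuda_global_visible_device_count": 15,
    "sensor.bermuda_global_total_device_count": 9,
    "sensor.deck_sensor_humidity": 3,
}
for meta_id, (entity_id, n_rows) in enumerate(sample.items(), start=1):
    conn.execute("INSERT INTO states_meta VALUES (?, ?)", (meta_id, entity_id))
    conn.executemany(
        "INSERT INTO states (metadata_id, state) VALUES (?, ?)",
        [(meta_id, str(i)) for i in range(n_rows)],
    )

# Same shape as the query in the issue: count state rows per entity, noisiest first.
QUERY = """
    SELECT states_meta.entity_id, COUNT(*) AS n
    FROM states
    LEFT JOIN states_meta ON states.metadata_id = states_meta.metadata_id
    GROUP BY states_meta.entity_id
    ORDER BY n DESC
"""
rows = conn.execute(QUERY).fetchall()
for entity_id, n in rows:
    print(f"{entity_id:<45} {n:>6}")
```

Against a real MariaDB or SQLite recorder database, only the connection would change; the query text itself is the part that matters.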

I suspect a lot of the bloat comes from the lack of a state_class attribute on these sensors, which causes every state change to be recorded as a discrete value rather than being aggregated.

I think the following attribute should be added to these sensors:

state_class: measurement

I have now manually added these attributes using the customize.yaml file and will update on success of this.

Alternatively, these sensors should perhaps be disabled by default?

Diagnostics

Not applicable (I think?)

@jeremysherriff
Author

Update: my edits to customize.yaml did what I was hoping for, in that the states are now treated as numbers and can be aggregated and fed into the statistics engine. Whether this causes the state changes to be stored more efficiently I am unsure (but I have some other very high-change stats that do not bloat the database the way these did, so I am quietly confident...)

@agittins
Owner

Hi Jeremy!

v0.7.1 introduced rate-limits on the global sensor updates, can you check how it works for you on v0.7.2?

You're right about the state_class though, I'll add that to fix the treatment of the sensor values - I don't think it directly affects the state recording, at least until it goes into long-term stats where the aggregations happen - but it will fix how they're displayed.
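
To illustrate the distinction drawn above, here is a rough sketch of the kind of aggregation that long-term statistics perform for a sensor with state_class: measurement, as far as I understand it: each hour is reduced to a mean, a minimum and a maximum, while the individual state rows remain in the states table until the recorder purges them. The function name `hourly_stats` and the sample values are hypothetical, and a plain arithmetic mean is used for simplicity; this is not the recorder's actual code.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean


def hourly_stats(samples):
    """Group (timestamp, value) samples by hour and return mean/min/max per hour."""
    buckets = defaultdict(list)
    for ts, value in samples:
        # Truncate the timestamp to the start of its hour to form the bucket key.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {
        hour: {"mean": fmean(vals), "min": min(vals), "max": max(vals)}
        for hour, vals in sorted(buckets.items())
    }


# Hypothetical device-count readings; four raw rows collapse into two hourly summaries.
samples = [
    (datetime(2024, 11, 17, 10, 5), 40),
    (datetime(2024, 11, 17, 10, 20), 44),
    (datetime(2024, 11, 17, 10, 50), 42),
    (datetime(2024, 11, 17, 11, 15), 38),
]
stats = hourly_stats(samples)
for hour, summary in stats.items():
    print(hour, summary)
```

The summaries are additional data alongside the raw rows, which is why the number of rows in the states table depends on how often the sensor updates rather than on its state_class.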

agittins self-assigned this on Nov 17, 2024
agittins added the enhancement (New feature or request) label on Nov 17, 2024
@jeremysherriff
Author

Ha, I am always late to the party! I'm running 0.7.2, so the rate limit (once per minute?) is already in place and my "testing" is guaranteed to be successful :)

My recorder purge is 7 days so by this coming weekend it will be a fair test to see the difference there.

@jeremysherriff
Author

About 25 hours since I purged the state data, now looking much better:

select states_meta.entity_id,count(*) from states left join states_meta on states.metadata_id = states_meta.metadata_id
 where states_meta.entity_id LIKE 'sensor.bermuda_global_%'
 group by entity_id
 order by count(*) desc;

entity_id                                     count(*)
sensor.bermuda_global_total_device_count          1566
sensor.bermuda_global_visible_device_count        1315
sensor.bermuda_global_active_proxy_count           412
sensor.bermuda_global_total_proxy_count             25

The ~152k in the OP covered 7 days, so approx 21.7k per day.

With state data now being written every minute (1,440 per day) and a change very likely being observed at every interval, it is now trending towards 22k for the 7 days (which is still high; it will be in the top 10 noisiest sensors, but a vast improvement).
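
The projection can be checked with a short calculation. This sketch scales the 25-hour counts from the table above linearly to a 7-day window, which assumes the write rate stays constant; the helper name `project` is made up for illustration. Under that assumption, the ~22k figure appears to match the four global sensors combined rather than any single one.

```python
HOURS_OBSERVED = 25  # "About 25 hours since I purged the state data"

# Row counts taken from the query result above.
counts = {
    "sensor.bermuda_global_total_device_count": 1566,
    "sensor.bermuda_global_visible_device_count": 1315,
    "sensor.bermuda_global_active_proxy_count": 412,
    "sensor.bermuda_global_total_proxy_count": 25,
}


def project(rows, hours_observed, hours_target=7 * 24):
    """Linearly scale a row count to a longer window (assumes a constant rate)."""
    return rows * hours_target / hours_observed


per_sensor = {e: round(project(n, HOURS_OBSERVED)) for e, n in counts.items()}
combined = round(project(sum(counts.values()), HOURS_OBSERVED))
before_per_day = round(151901 / 7)  # visible_device_count alone, from the OP

for entity_id, n in per_sensor.items():
    print(f"{entity_id:<45} {n:>6}")
print(f"{'all four combined':<45} {combined:>6}")
print(f"{'visible_device_count per day before':<45} {before_per_day:>6}")
```

On these numbers, visible_device_count alone would land at roughly 8.8k rows over 7 days, compared with about 152k before the rate limit.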

@agittins
Owner

Great, thanks for the follow-up!

I think it's probably worth winding it back further - these sensors are more about gathering a wide-viewed picture of health (and there are more to come!) so perhaps every 5 minutes would be sufficient - possibly with an extra mechanism to force a refresh for any realtime diagnostic needs. Given their purpose it doesn't make a lot of sense for them to be near the top of that list! :-)
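
A minimal sketch of what that could look like: a throttle that publishes at most once per interval, plus a `force` flag for on-demand refreshes. The class name `ThrottledSensor`, its parameters, and the `published` list (standing in for a state write to Home Assistant) are all hypothetical; this is not Bermuda's implementation, just one way the idea could be expressed.

```python
import time


class ThrottledSensor:
    """Publishes a value at most once per `min_interval` seconds unless forced."""

    def __init__(self, min_interval=300.0, clock=time.monotonic):
        self.min_interval = min_interval
        self._clock = clock  # injectable so the behaviour can be tested without sleeping
        self._last_publish = None
        self.published = []  # stands in for writing a state to Home Assistant

    def update(self, value, force=False):
        """Return True if the value was published, False if it was skipped."""
        now = self._clock()
        due = self._last_publish is None or now - self._last_publish >= self.min_interval
        if not (due or force):
            return False
        self._last_publish = now
        self.published.append(value)
        return True


# Usage with a fake clock (seconds), so no real waiting is needed.
fake_now = [0.0]
sensor = ThrottledSensor(min_interval=300.0, clock=lambda: fake_now[0])
sensor.update(10)              # first update is always published
fake_now[0] = 60.0
sensor.update(11)              # skipped: only 60 s since the last publish
sensor.update(12, force=True)  # published on demand, resets the interval
fake_now[0] = 400.0
sensor.update(13)              # published: 340 s since the forced publish
print(sensor.published)
```

With a 300-second interval each sensor would write at most 288 rows per day outside of forced refreshes, compared with 1,440 at one update per minute.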
