Include support for more TimescaleDB features #25
Hey! Are you still looking for help? I did some experimentation with my HA instance.
I'm still experimenting with different indexes. I also tried to add computed columns.
@Yamakaky Very interesting, thanks for reporting back! Can you please clarify what exactly you mean by that? Also, for my understanding of your tests: am I right that the compressed chunks are not indexed on the additional columns you extract?
My main issue with the compression feature of TimescaleDB (when I first had a look at it) was that schema changes were not supported at all. Do you know if this has changed for the better?
No, I meant indexes with different combinations of these three columns + time. For now I haven't done much more on that subject, since the useful indexes mostly depend on which queries are used.
Yes, the compressed chunks only have single-column indexes on the group_by columns; the other indexes only apply to the uncompressed days. This is still useful, for example for the view with computed columns shown below.
Some things are supported, but yes, schema changes are limited: https://docs.timescale.com/timescaledb/latest/how-to-guides/compression/modify-a-schema/. However, I think the current schema is appropriate; I couldn't find another attribute that would qualify to be promoted to a column. My current schema:
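A rough reconstruction of that schema, inferred from the ltestv view further down (column names and types are taken from the view; everything else here is an assumption, not the author's actual DDL):

-- Hypothetical reconstruction of the underlying table, not the author's listing.
CREATE TABLE ltest (
    time                timestamptz NOT NULL,
    entity_id           text NOT NULL,
    device_class        text,
    state               text,
    state_class         text,
    attributes          jsonb,
    location            geometry(Point, 4326),  -- requires PostGIS
    friendly_name       text,
    unit_of_measurement text
);
SELECT create_hypertable('ltest', 'time');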
Stats:
With this query:

with table_size as (
SELECT hypertable_size
FROM hypertable_size('$table_name')
), waterline as (
select max(range_end) as range_end, min(range_start) as range_start
FROM timescaledb_information.chunks natural join chunk_compression_stats('$table_name')
where is_compressed
), metrics as (
select
count(*) filter (where time < range_end) as count_compressed,
count(*) filter (where time > range_end) as count_uncompressed
from $table_name, waterline
), size as (
select
sum(after_compression_total_bytes) as size_compressed
from chunk_compression_stats('$table_name')
)
select
count_uncompressed + count_compressed as "Metrics",
size_compressed::float / extract(epoch FROM range_end - range_start) * 86400 * 30 as "Size per month (compressed)",
size_compressed::float / count_compressed as "Bytes per metric (compressed)",
(hypertable_size::float - size_compressed) / count_uncompressed as "Bytes per metric (uncompressed)"
from table_size, metrics, size, waterline;
I also have this view to parse the state column:

homeassistant=# \d+ ltestv
View "public.ltestv"
Column | Type | Collation | Nullable | Default | Storage | Description
---------------------+--------------------------+-----------+----------+---------+----------+-------------
time | timestamp with time zone | | | | plain |
entity_id | text | | | | extended |
device_class | text | | | | extended |
state | text | | | | extended |
state_class | text | | | | extended |
attributes | jsonb | | | | extended |
location | geometry(Point,4326) | | | | main |
friendly_name | text | | | | extended |
unit_of_measurement | text | | | | extended |
state_float | double precision | | | | plain |
state_bool | boolean | | | | plain |
View definition:
SELECT ltest."time",
ltest.entity_id,
ltest.device_class,
ltest.state,
ltest.state_class,
ltest.attributes,
ltest.location,
ltest.friendly_name,
ltest.unit_of_measurement,
try_cast_float(ltest.state) AS state_float,
try_cast_boolean(ltest.state) AS state_bool
FROM ltest;
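The try_cast_float and try_cast_boolean helpers used by the view are not shown in the thread; a minimal sketch of how such functions are commonly written (an assumption, not the author's exact definitions):

-- Hypothetical helpers: cast text to a typed value, returning NULL on failure.
CREATE OR REPLACE FUNCTION try_cast_float(v text)
RETURNS double precision AS $$
BEGIN
    RETURN v::double precision;
EXCEPTION WHEN others THEN
    RETURN NULL;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

CREATE OR REPLACE FUNCTION try_cast_boolean(v text)
RETURNS boolean AS $$
BEGIN
    RETURN v::boolean;
EXCEPTION WHEN others THEN
    RETURN NULL;
END;
$$ LANGUAGE plpgsql IMMUTABLE;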
I just tried to enable compression on mine, and I realized the ltss table has an auto-incrementing primary key, which makes compression kind of useless since it now has to segment on that.
Yes, see my schema just above. The TimescaleDB docs never include primary keys, just time.
@Yamakaky Did you need to modify the integration?
No, I just ended up adding an insert trigger on ltss to copy data to my hypertable. It would be better to have native support.
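Such a trigger might look roughly like this (a sketch with assumed table and column names, not the author's actual code):

-- Copy each new LTSS row into a custom hypertable named ltest (hypothetical names).
CREATE OR REPLACE FUNCTION copy_ltss_to_ltest()
RETURNS trigger AS $$
BEGIN
    INSERT INTO ltest (time, entity_id, state, attributes)
    VALUES (NEW.time, NEW.entity_id, NEW.state, NEW.attributes);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER ltss_copy
AFTER INSERT ON ltss
FOR EACH ROW EXECUTE FUNCTION copy_ltss_to_ltest();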
Hey folks, apart from working at Timescale, I'm really interested in this for my own HASS instance. After installing, I wanted to enable compression and realized that due to the compound primary key, the id column needs to be part of the segment_by clause, which works against the way compression is implemented (columnar storage). @Yamakaky is going in the right direction in terms of removing the id column, as well as probably implementing a column per data type. I just recently wrote a best practices blog post on the topic (https://www.timescale.com/blog/best-practices-for-time-series-data-modeling-narrow-medium-or-wide-table-layout-2/) where this is represented as a medium-wide layout. Happy to help getting this integration to implement the best practices (totally unselfish, I know 😋). I'm just not the best person when it comes to Python code.
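For reference, once the id column is out of the primary key, compression can be segmented by a low-cardinality column instead; a sketch of what that could look like (the segment_by choice and the policy interval are assumptions, not the integration's actual settings):

-- Compress by entity, ordered by time; compress chunks older than 14 days.
ALTER TABLE ltss SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'entity_id',
    timescaledb.compress_orderby = 'time DESC'
);
SELECT add_compression_policy('ltss', INTERVAL '14 days');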
Alright, I looked a bit into it and that's my current idea. It isn't necessarily the best option yet (not enough data collected to really see the impact), but in case somebody is interested:
I've extracted the actual state into a set of 4 separate (nullable) fields, which will compress to almost nothing if not used, and I also extracted the attributes into a separate table with an ltree key, which uses a tree-like structure to find values. The actual key is a sha256 of the attributes content to make sure similar data fields reference the same (already stored) attributes value. It's a basic deduplication. At least for my sensors it seems to shrink the number of jsonb entries quite a bit.
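One way such attribute deduplication might look, simplified (names are assumptions and the ltree part is left out, so this is not the draft's actual schema):

-- Deduplicated attribute storage keyed by a sha256 of the jsonb content.
CREATE TABLE attributes_dedup (
    attrs_hash bytea PRIMARY KEY,   -- sha256 of the canonical jsonb text
    attrs      jsonb NOT NULL
);

-- Identical attribute payloads map to the same row; state rows then reference attrs_hash.
INSERT INTO attributes_dedup (attrs_hash, attrs)
SELECT sha256(convert_to(a::text, 'UTF8')), a
FROM (VALUES ('{"friendly_name": "Power", "unit_of_measurement": "W"}'::jsonb)) AS v(a)
ON CONFLICT (attrs_hash) DO NOTHING;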
Thank you all for showing such an interest in this integration! Be aware, some personal opinions ahead! My stance on this is:
I would very much like to hear your thoughts on this, feel free to criticize 😉
Thank you for creating the initial integration! 😁 No worries about personal opinions. They exist to be discussed and I'm all too happy to do it!
Deduplication savings using the ltree approach may vary with the available entities. For me, opnsense creates a lot of duplicates (since the interface metadata almost never changes), but others do too. It may be my specific set of entities/sensors, but right now I'm at 194k to 15k, which is quite a substantial saving 🥹
@noctarius Here's an example of an entity with a seemingly binary state having more than that. You can't quite model this as a boolean, because then you'd only have 3 states: true, false and null.
That is actually a very obvious one 😅 How did I miss this? However, I think those two are general HASS elements that can happen for almost any entity, can't they? In this case, it should probably be its own column, along the lines of "was the entity unavailable or unknown? Then it won't have a real value at that time". WDYT?
Hmm. Here's another hypothetical situation: someone setting up a sensor with "On", "Off", "Open". What would this be mapped as? true, false, true? What about other languages? Since state is ANY string value, I'm not entirely sure if it is affected by the Home Assistant language setting, for example. I think the biggest problem is that this would be making assumptions that might not always hold true for a given entity and installation, and potentially losing context encoded in the specific value of the state. It seems to me like any type inference logic here would be potentially fragile, especially since template entities / helper entities are a thing that can be configured freely by end users.
To add on my use case: in my instance, I added such columns on my side. If the semantics of an entity change, then the user will have to handle it anyway, I think. I already have that for a few entities.
For me, this all comes down to maintainability of the codebase, and, as such, I don't want to go down the road with extra columns and extra logic (which quite possibly needs to be updated every time someone decides to add a new weird combo of state "types" in text format in some random custom component) without a really good reason. Therefore, I will keep the state as a single text column for now. @Yamakaky If this is mainly a convenience issue with having to handle casting of invalid values in every query, my suggestion is to write a SQL function for your specific use case.
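A sketch of the kind of per-use-case helper meant here (the name and the mapping are illustrative assumptions, not part of the integration):

-- Map the textual states of a specific entity type to booleans; anything else becomes NULL.
CREATE OR REPLACE FUNCTION state_to_bool(state text)
RETURNS boolean AS $$
    SELECT CASE lower(state)
        WHEN 'on'   THEN true
        WHEN 'off'  THEN false
        WHEN 'open' THEN true
        ELSE NULL   -- 'unavailable', 'unknown', anything unexpected
    END;
$$ LANGUAGE sql IMMUTABLE;

-- Example usage (the entity_id is made up):
-- SELECT time, state_to_bool(state) FROM ltss WHERE entity_id = 'binary_sensor.front_door';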
Sorry for not answering earlier. Crazy couple of days 😅 I see the reason why you want to stay with state as a text column only. I still find it pretty horrible from the HASS side not to provide any metadata for the value, but it's fair. The reasoning is pretty sound. I'll adapt my WIP PR in the next few days, just rebasing it onto the 2.0 release (to be able to get it running again). Switching back from my current data set should be easy enough.
I understand your concern, however I don't think it's specific to my use case. I assume most if not all ltss instances will need to cast to floats at some point for processing and visualisation. Maybe provide such functions with the extension, then? Alternatively, a view like in my original post with additional state_* columns would work. IIRC I moved to actual columns since it added a noticeable performance gain, not sure about the specifics though.
I adjusted the draft to only have a single text column again. I wonder if a column with the guessed datatype of the value (text, bool, float) may be interesting? That way it may be easier to filter out valid values and ignore pure text states 🤔 @Yamakaky I could imagine that a continuous aggregate may be interesting for your use case, collecting the data pre-aggregated (depending on how you'd like to aggregate it).
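A sketch of a continuous aggregate along these lines (bucket size, names and the aggregated expression are assumptions; it also relies on try_cast_float being an immutable function, as sketched earlier):

-- Hourly averages of the numeric state per entity.
CREATE MATERIALIZED VIEW ltss_hourly
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', time) AS bucket,
    entity_id,
    avg(try_cast_float(state)) AS avg_state
FROM ltss
GROUP BY 1, 2
WITH NO DATA;

-- Keep the aggregate up to date in the background.
SELECT add_continuous_aggregate_policy('ltss_hourly',
    start_offset      => INTERVAL '3 hours',
    end_offset        => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour');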
That would work, at the cost of duplicate data between the aggregate and the main table. The data rate is pretty low on my side, so it should be fine.
I recently started using Home Assistant with TimescaleDB and LTSS and it's working great. However, my database is growing quickly. I tried compression, but with the id column there is nothing to be gained. Then I came across this thread and it looks like the perfect solution, so I was wondering whether there are any plans to incorporate this or whether I should create my own fork? Thanks in advance.
@nwerter
@freol35241 It seems to break my setup with compression enabled. Decompressing all the historical data doesn't seem to be feasible. Any options to still move to 2.1.0?
@velaar I don't think there's any other way than decompressing and disabling compression on the existing hypertable. As an alternative, you could create a new one and slowly copy the old data over chunk by chunk.
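Decompressing everything and turning compression off might look roughly like this (a sketch assuming the hypertable is named ltss):

-- If a compression policy is set, remove it first so chunks stay decompressed.
SELECT remove_compression_policy('ltss', if_exists => true);
-- Decompress every chunk that is currently compressed.
SELECT decompress_chunk(c, true) FROM show_chunks('ltss') AS c;
-- Finally disable compression on the hypertable.
ALTER TABLE ltss SET (timescaledb.compress = false);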
@velaar compressed chunks in Timescale can't be updated or deleted, so updating to 2.1.0, which modifies the structure of the hypertable, can only be done on an uncompressed database. Afterwards you can enable compression again, which should also be much more effective without the id column. In order to help with the alternative suggested by noctarius, Timescale provides some guidance on backfilling historical data: https://docs.timescale.com/use-timescale/latest/compression/backfill-historical-data/
@velaar (and all others) You can work around it by removing the id column manually. Then the migration will not run. A primary key is not required after all. Just make sure you have a compound index on the relevant columns.
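A sketch of that manual workaround (the index columns are an assumption and not necessarily the ones meant above; it also assumes the hypertable is not yet compressed):

-- Dropping the id column also drops the primary key constraint that depends on it.
ALTER TABLE ltss DROP COLUMN id;
-- A compound index to keep per-entity time-range queries fast (assumed columns).
CREATE INDEX IF NOT EXISTS ltss_entity_time_idx ON ltss (entity_id, time DESC);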
Side discussion/question: I manually created a compound index on