[0.10.0] TSM conversion reproducibly drops data silently #5606

Closed
jonseymour opened this issue Feb 10, 2016 · 21 comments

@jonseymour
Contributor

G'day. I recently upgraded my influx instance from 0.9.6.1 to 0.10.0.

To perform the upgrade I did the following:

  • stopped the influx instance (it was a single node cluster)
  • performed a (filesystem) backup of the database
  • ran the influx_tsm utility to convert the b1 databases to tsm databases
  • upgraded the code base to v0.10.0
  • restarted influx on the new code base

The following day I ran some sanity checks against the migrated data and discovered that some data had gone missing.

To verify that the data was, indeed, present prior to the upgrade, I restored the original b1 databases to another influx instance (using a filesystem restore), restarted a copy of 0.9.6.1 against the restored copy of the database and re-ran the queries to confirm that the data I was expecting to be there was in fact there.

I then re-ran the migration process on the restored copy of the database, double-checking that the migration ran without reporting errors. I then restarted a copy of 0.10.0 against the migrated database and observed that the same sequences of data that went missing from the first migration were also missing in the repeated migration.

In other words, I have now run this migration twice from the same starting data, and in both cases data went missing. The data was present in the source prior to migration, as verified by the steps described above.

I reviewed the output of influx_tsm and there was nothing in the output that suggested that data would be lost.

Let me know if you want me to enable additional diagnostics on a further attempt to restore the database. For the moment, I am planning to revert our production system to the b1 engine because I can't afford to lose access to the data that the migration process appears to silently drop.

jon.

@jwilder
Contributor

jwilder commented Feb 10, 2016

Can you post the output of the migration tool? Also, how are you determining points are dropped?

Some data, such as fields with NaN or Inf values, will be dropped because those are not supported field values. Past versions of some endpoints inadvertently allowed these values to be inserted.

@jonseymour
Contributor Author

The output of the TSM conversion can be found here - https://gist.github.com/jonseymour/1dd3f092eea45c1ab3b5

I initially detected the missing data by using Grafana to review a date of significance to us (13 Jan 2016). All of the time series data for that date was missing from the migrated database, but present in the restored database.

For the purposes of investigation, I reduced the query to this:

select count(current) from "foo.bar.state" where time >= '2016-01-13' and time < '2016-01-20' group by time(1d)

I should get counts of around 800k for each day, but instead got 0 for 5 of the 7 days in the period above (the other 2 days returned the expected ~800k).

@jonseymour
Contributor Author

I should clarify: all of the data missing for those periods was missing from a single measurement; other measurements did have data for that date.

@jwilder
Copy link
Contributor

jwilder commented Feb 10, 2016

Can you run show shards? That should show which shard is holding data for that time range. For the missing measurement, can you provide a sample of what the raw point data looked like?

@jonseymour
Contributor Author

show shards output -> https://gist.github.com/jonseymour/ca81c516bd192a0a8119

An example of a measurement point:

time,channel,controlled,current,event,observedMax,offTicks,onTicks,peak,period,schema,thingType,uncontrolled,user

2016-01-13T00:00:02.471130001Z,"demand",0,0,"state",2368.2,52507,,0,4,"http://foobar.com/protocol/demand","aircon",0,"8e47ec63-9784-46ae-8042-224166f5d26b"

The tags for this measurement are:

 channel, event, schema, site, thing, thingType, user

The other columns in the sample are fields.

(This example was extracted from a 0.10.0 system running against a restored b1 database)

@jonseymour
Contributor Author

For reference, here are the results of a counting query for a different measurement, run against both the restored database and the converted database:

(good - restored database)

time    count
2016-01-01T00:00:00Z    337780
2016-01-02T00:00:00Z    338886
2016-01-03T00:00:00Z    301720
2016-01-04T00:00:00Z    339063
2016-01-05T00:00:00Z    326359
2016-01-06T00:00:00Z    324673
2016-01-07T00:00:00Z    324185
2016-01-08T00:00:00Z    322212
2016-01-09T00:00:00Z    319819
2016-01-10T00:00:00Z    311721
2016-01-11T00:00:00Z    347180
2016-01-12T00:00:00Z    433863

(bad - converted database)

time    count
2016-01-01T00:00:00Z    0
2016-01-02T00:00:00Z    0
2016-01-03T00:00:00Z    0
2016-01-04T00:00:00Z    7304
2016-01-05T00:00:00Z    8521
2016-01-06T00:00:00Z    8534
2016-01-07T00:00:00Z    8587
2016-01-08T00:00:00Z    8584
2016-01-09T00:00:00Z    8559
2016-01-10T00:00:00Z    8422
2016-01-11T00:00:00Z    0
2016-01-12T00:00:00Z    0

So, in some cases there are no data points in the periods of interest, and in other cases there are about 1/35th of the expected number.

@jwilder
Contributor

jwilder commented Feb 10, 2016

It looks like that data should be in shards 14, 15 and 16.

14  "sphere"    "default"   14  "2015-12-28T00:00:00Z"  "2016-01-04T00:00:00Z"  "2016-01-04T00:00:00Z"  "1"
15  "sphere"    "default"   15  "2016-01-04T00:00:00Z"  "2016-01-11T00:00:00Z"  "2016-01-11T00:00:00Z"  "1"
16  "sphere"    "default"   16  "2016-01-11T00:00:00Z"  "2016-01-18T00:00:00Z"  "2016-01-18T00:00:00Z"  "1"

You should have some directories under your data dir like /var/lib/influxdb/data/sphere/default/14 with one or more TSM files. Could you show me the directory contents of each (e.g. ls -la) as well as run influx_inspect dumptsmdev -all /var/lib/influxdb/data/sphere/default/14/<file>.tsm?

@jonseymour
Contributor Author

Here is the file listing; I'll collect the dumps now.

/var/lib/sphere-stack/influxdb-2/db$ sudo find $(pwd) -type f -ls 
8126518  552 -rw-r--r--   1 root     root       561729 Feb 10 01:03 /var/lib/sphere-stack/influxdb-2/db/_internal/monitor/807/000000001-000000001.tsm
8126514  660 -rw-r--r--   1 root     root       675719 Feb 10 01:03 /var/lib/sphere-stack/influxdb-2/db/_internal/monitor/809/000000001-000000001.tsm
8126515  260 -rw-r--r--   1 root     root       264516 Feb 10 01:03 /var/lib/sphere-stack/influxdb-2/db/_internal/monitor/806/000000001-000000001.tsm
8126516  464 -rw-r--r--   1 root     root       474692 Feb 10 01:03 /var/lib/sphere-stack/influxdb-2/db/_internal/monitor/808/000000001-000000001.tsm
8126517    8 -rw-r--r--   1 root     root         7157 Feb 10 01:03 /var/lib/sphere-stack/influxdb-2/db/_internal/monitor/814/000000001-000000001.tsm
8126519  692 -rw-r--r--   1 root     root       705174 Feb 10 01:04 /var/lib/sphere-stack/influxdb-2/db/_internal/monitor/810/000000001-000000001.tsm
8126520  308 -rw-r--r--   1 root     root       314729 Feb 10 01:04 /var/lib/sphere-stack/influxdb-2/db/_internal/monitor/813/000000001-000000001.tsm
8126512  740 -rw-r--r--   1 root     root       756044 Feb 10 01:03 /var/lib/sphere-stack/influxdb-2/db/_internal/monitor/812/000000001-000000001.tsm
8257542 3984 -rw-r--r--   1 root     root      4078514 Feb 10 02:10 /var/lib/sphere-stack/influxdb-2/db/sphere/default/13/000000001-000000001.tsm
8257543   32 -rw-r--r--   1 root     root        28679 Feb 10 02:10 /var/lib/sphere-stack/influxdb-2/db/sphere/default/28/000000001-000000001.tsm
8257544 1188 -rw-r--r--   1 root     root      1216365 Feb 10 02:10 /var/lib/sphere-stack/influxdb-2/db/sphere/default/1/000000001-000000001.tsm
8257545 62356 -rw-r--r--   1 root     root     63850285 Feb 10 02:12 /var/lib/sphere-stack/influxdb-2/db/sphere/default/8/000000001-000000001.tsm
8257546 97644 -rw-r--r--   1 root     root     99983386 Feb 10 02:14 /var/lib/sphere-stack/influxdb-2/db/sphere/default/7/000000001-000000001.tsm
8257547   12 -rw-r--r--   1 root     root        12008 Feb 10 02:14 /var/lib/sphere-stack/influxdb-2/db/sphere/default/21/000000001-000000001.tsm
8126521 14492 -rw-r--r--   1 root     root     14839197 Feb 10 01:04 /var/lib/sphere-stack/influxdb-2/db/sphere/default/3/000000001-000000001.tsm
8126522 780788 -rw-r--r--   1 root     root     799521817 Feb 10 01:14 /var/lib/sphere-stack/influxdb-2/db/sphere/default/803/000000001-000000001.tsm
8126523 41176 -rw-r--r--   1 root     root     42160169 Feb 10 01:15 /var/lib/sphere-stack/influxdb-2/db/sphere/default/5/000000001-000000001.tsm
8126524   40 -rw-r--r--   1 root     root        37994 Feb 10 01:15 /var/lib/sphere-stack/influxdb-2/db/sphere/default/9/000000001-000000001.tsm
8257541    4 -rw-r--r--   1 root     root         2197 Feb 10 02:10 /var/lib/sphere-stack/influxdb-2/db/sphere/default/22/000000001-000000001.tsm
8126525  188 -rw-r--r--   1 root     root       189522 Feb 10 01:15 /var/lib/sphere-stack/influxdb-2/db/sphere/default/2/000000001-000000001.tsm
8126526 972564 -rw-r--r--   1 root     root     995897866 Feb 10 01:29 /var/lib/sphere-stack/influxdb-2/db/sphere/default/795/000000001-000000001.tsm
8126527 14388 -rw-r--r--   1 root     root     14731832 Feb 10 01:29 /var/lib/sphere-stack/influxdb-2/db/sphere/default/16/000000001-000000001.tsm
8126528 128232 -rw-r--r--   1 root     root     131308433 Feb 10 01:31 /var/lib/sphere-stack/influxdb-2/db/sphere/default/6/000000001-000000001.tsm
8126529 14124 -rw-r--r--   1 root     root     14459729 Feb 10 01:32 /var/lib/sphere-stack/influxdb-2/db/sphere/default/15/000000001-000000001.tsm
8126530 413628 -rw-r--r--   1 root     root     423547303 Feb 10 01:47 /var/lib/sphere-stack/influxdb-2/db/sphere/default/12/000000001-000000001.tsm
8126531 1078404 -rw-r--r--   1 root     root     1104279102 Feb 10 02:03 /var/lib/sphere-stack/influxdb-2/db/sphere/default/790/000000001-000000001.tsm
8126532 207040 -rw-r--r--   1 root     root     212006115 Feb 10 02:06 /var/lib/sphere-stack/influxdb-2/db/sphere/default/811/000000001-000000001.tsm
8126533 2684 -rw-r--r--   1 root     root      2745232 Feb 10 02:06 /var/lib/sphere-stack/influxdb-2/db/sphere/default/14/000000001-000000001.tsm
8257538 78416 -rw-r--r--   1 root     root     80294016 Feb 10 02:10 /var/lib/sphere-stack/influxdb-2/db/sphere/default/11/000000001-000000001.tsm
8257539   52 -rw-r--r--   1 root     root        49777 Feb 10 02:10 /var/lib/sphere-stack/influxdb-2/db/sphere/default/10/000000001-000000001.tsm
8257540 28232 -rw-r--r--   1 root     root     28907200 Feb 10 02:10 /var/lib/sphere-stack/influxdb-2/db/sphere/default/4/000000001-000000001.tsm

@jonseymour
Contributor Author

@jwilder - could you e-mail [email protected]? I'll reply with a Dropbox link to the dumps for 14, 15 and 16, since I am not comfortable sharing the contents of the dumps in a public forum.

@jwilder
Contributor

jwilder commented Feb 10, 2016

Sure. Thanks.

@jonseymour
Contributor Author

@jwilder It looks like the issue with the dropped data is that the shard series index has lost track of series that are stored in the shard, so these aren't exported. This doesn't affect access by the b1 engine itself because the series are accessed directly by bucket name without reference to whether the series is listed in the series index. I am going to have a crack at greedily trying to resurrect orphaned series by using a heuristic to discover them, irrespective of the state of the series index in the shard meta data.

@jwilder
Contributor

jwilder commented Feb 12, 2016

Ah... so the b1 shard index somehow became inconsistent with the actual data in the shard?

@jonseymour
Contributor Author

Yeah, that's what I think is happening.

@jonseymour
Contributor Author

So, what I plan to do is this: if I find series-like buckets (e.g. names containing ",") in the shard that aren't referenced by the series index, then I will abort, unless --repair-series-index is specified, in which case I will use an augmented series index that includes the series I find from the bucket list (roughly as sketched below).
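
To make the heuristic concrete, here is a minimal sketch of the detection pass (illustrative only, not the actual influx_tsm code; it assumes the b1 layout discussed above, i.e. one top-level Bolt bucket per series, named with the comma-separated series key, plus a top-level "series" metadata bucket):

// orphans.go - sketch of the orphan-detection heuristic described above.
// Assumptions (not verified against the real b1 code): one top-level Bolt
// bucket per series, named with the comma-separated series key, and a
// top-level "series" bucket holding the shard's series metadata.
package main

import (
    "bytes"
    "fmt"
    "log"
    "os"

    "github.com/boltdb/bolt"
)

func main() {
    if len(os.Args) != 2 {
        log.Fatal("usage: orphans <path-to-b1-shard>")
    }
    db, err := bolt.Open(os.Args[1], 0600, nil)
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    err = db.View(func(tx *bolt.Tx) error {
        // Collect the series keys recorded in the shard's metadata.
        indexed := map[string]bool{}
        if meta := tx.Bucket([]byte("series")); meta != nil {
            meta.ForEach(func(k, _ []byte) error {
                indexed[string(k)] = true
                return nil
            })
        }

        // Any series-like top-level bucket (name contains a comma) that is
        // missing from the metadata is an orphan: still readable by the b1
        // engine, but invisible to an index-driven conversion.
        return tx.ForEach(func(name []byte, _ *bolt.Bucket) error {
            if bytes.Contains(name, []byte(",")) && !indexed[string(name)] {
                fmt.Printf("orphaned series: %s\n", name)
            }
            return nil
        })
    })
    if err != nil {
        log.Fatal(err)
    }
}

Under --repair-series-index, the series discovered this way would simply be added to the index that the converter iterates over, so their buckets get exported like any other series.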

How does that sound, @jwilder?

@jwilder
Contributor

jwilder commented Feb 12, 2016

Sounds reasonable.

@jonseymour
Contributor Author

Ok, I have a candidate fix here - https://github.com/jonseymour/influxdb/tree/repair-b1-shard which I have used to successfully recover the data I expected to recover from shard 16. I'll run the modified code against the full database before I issue a pull request.

FWIW: the claimed space reduction for this shard dropped from 99% to 81%.

@jonseymour
Contributor Author

I have a hypothesis about how the corruption might have been introduced, although not enough evidence to confirm it at this point.

If a write for a set of points that introduces new series to a shard fails, then the in-memory copy of the index will record that the series has been created even though the change to the on-disk version of the index is rolled back. Subsequent writes into the series will succeed, but the on-disk index will remain inconsistent with the data actually stored in the shard, and this condition will never be fixed unless the server is restarted while the shard is still hot. In other words, a failed write might cause the in-memory metadata index to become dirty with respect to the disk.
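
To illustrate the hypothesis, here is a deliberately simplified sketch (hypothetical code, not the actual b1 engine; the bucket names, the in-memory map and the function shape are all assumptions made for illustration):

// Package example contains a deliberately simplified illustration of the
// failure mode described above; it is NOT the actual InfluxDB b1 engine code.
package example

import (
    "strconv"

    "github.com/boltdb/bolt"
)

// writePoints updates the in-memory series index before the on-disk
// transaction commits. If the transaction fails, Bolt rolls back the on-disk
// "series" entry, but the map keeps claiming the series exists, so later
// successful writes never persist the metadata and the bucket is orphaned.
func writePoints(db *bolt.DB, index map[string]bool, seriesKey string, points [][]byte) error {
    newSeries := !index[seriesKey]
    index[seriesKey] = true // marked as created before the tx has committed

    return db.Update(func(tx *bolt.Tx) error {
        if newSeries {
            meta, err := tx.CreateBucketIfNotExists([]byte("series"))
            if err != nil {
                return err
            }
            if err := meta.Put([]byte(seriesKey), []byte{}); err != nil {
                return err
            }
        }
        b, err := tx.CreateBucketIfNotExists([]byte(seriesKey))
        if err != nil {
            return err
        }
        for i, p := range points {
            // placeholder key; the real engine keys points by timestamp
            if err := b.Put([]byte(strconv.Itoa(i)), p); err != nil {
                return err
            }
        }
        return nil
    })
}

Once a failed transaction rolls back, the map and the Bolt file disagree, and nothing in the subsequent successful writes re-converges them until the process restarts and rebuilds the in-memory index from disk.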

If someone else confirms that this could be a problem, it might be worth raising a separate issue to document the possibility of meta data corruption during a failed write.

jonseymour added a commit to jonseymour/influxdb that referenced this issue Feb 12, 2016
A case (influxdata#5606) was found where a lot of data unexpectedly disappeared from a database
following a TSM conversion.

The proximate cause was an inconsistency between the root Bolt DB bucket list
and the meta data in the "series" bucket of the same shard. There were apparently valid
series in Bolt DB buckets that were no longer referenced by the meta data
in the "series" bucket - so-called orphaned series; since the conversion
process only iterated across the series found in the meta data, the conversion process
caused the orphaned series to be removed from the converted shards. This resulted in the
unexpected removal of data from the TSM shards that had previously been accessible
(despite the meta data inconsistency) in the b1 shards.

The root cause of the meta data inconsistency in the case above is not understood but, in the
case above, removal of the orphaned series wasn't the appropriate resolution of the
inconsistency.

This change detects occurrences of meta data inconsistency in shards during conversion
and, by default, will cause the conversion process to fail (--repair-index=fail) if any such
inconsistency is found.

The user can force the conversion to occur by choosing to resolve the inconsistency
either by assuming the meta data is incorrect and attempting to convert the orphaned
series (--repair-index=repair) or by assuming the meta data is correct and ignoring the
orphaned series (--repair-index=ignore).

Currently detection and recovery of orphaned series is only supported for b1 shards;
bz1 shards are not currently supported.

Signed-off-by: Jon Seymour <[email protected]>
@jonseymour
Contributor Author

This is somewhat circumstantial, but I tabulated, for each shard, the number of points in the broken conversion, the number of points in the fixed conversion (i.e. the original points), and the delta, against the start of that shard's 7-day capture period and the versions that were installed during that period:

date    version broken-points   fixed-points    delta-points    % delta
2015-10-12  0.9.2   135121  155029  19908   12.84%
2015-10-19  0.9.2   23035   142355  119320  83.82%
2015-10-26  0.9.2   1742612 1769748 27136   1.53%
2015-11-02  0.9.2   3439563 6177142 2737579 44.32%
2015-11-09  0.9.2   4991264 11538605    6547341 56.74%
2015-11-16  0.9.2   15519047    30690307    15171260    49.43%
2015-11-23  0.9.2   11843823    45975838    34132015    74.24%
2015-11-30  0.9.2   18142441    64235830    46093389    71.76%
2015-12-07  0.9.2   10170267    85322002    75151735    88.08%
2015-12-14  0.9.2   60655077    125208734   64553657    51.56%
2015-12-21  0.9.2   519825  127585672   127065847   99.59%
2015-12-28  0.9.2   423310  112628404   112205094   99.62%
2016-01-04  0.9.2   1680801 101438387   99757586    98.34%
2016-01-11  0.9.2   2231685 126219437   123987752   98.23%
2016-01-18  0.9.2/0.9.6.1   140051078   140793381   742303  0.53%
2016-01-25  0.9.6.1 129735681   129735681   0   0.00%
2016-02-01  0.9.6.1 104856564   104856564   0   0.00%
2016-02-08  0.9.6.1/0.10.0  27379595    27379595    0   0.00%

The elimination of broken shards for the weeks in which 0.9.6.1 was at least partially installed suggests that whatever issue was causing the shard index to become corrupted was fixed between versions 0.9.2 and 0.9.6.1.

@jonseymour
Contributor Author

Yeah, I reckon the corruption was caused by the problem fixed by 3348dab, which arrived in the tree after v0.9.2.

@joelegasse self-assigned this Feb 12, 2016
joelegasse pushed a commit that referenced this issue Feb 12, 2016
A case (#5606) was found where a lot of data unexpectedly disappeared from a database
following a TSM conversion.

The proximate cause was an inconsistency between the root Bolt DB bucket list
and the meta data in the "series" bucket of the same shard. There were apparently valid
series in Bolt DB buckets that were no longer referenced by the meta data
in the "series" bucket - so-called orphaned series; since the conversion
process only iterated across the series found in the meta data, the conversion process
caused the orphaned series to be removed from the converted shards. This resulted in the
unexpected removal of data from the TSM shards that had previously been accessible
(despite the meta data inconsistency) in the b1 shards.

The root cause of the meta data inconsistency in the case above is not understood but, in the
case above, removal of the orphaned series wasn't the appropriate resolution of the
inconsistency.

This change detects occurrences of meta data inconsistency in shards during conversion
and, by default, will cause the conversion process to fail (--repair-index=fail) if any such
inconsistency is found.

The user can force the conversion to occur by choosing to resolve the inconsistency
either by assuming the meta data is incorrect and attempting to convert the orphaned
series (--repair-index=repair) or by assuming the meta data is correct and ignoring the
orphaned series (--repair-index=ignore).

Currently detection and recovery of orphaned series is only supported for b1 shards;
bz1 shards are not currently supported.

Signed-off-by: Jon Seymour <[email protected]>
@joelegasse
Contributor

@jonseymour I added a fix for both b1 and bz1 shards. Your comments about how the series metadata was never actually used were helpful in creating a cleaner fix... ignore the series metadata, and just scan every series. :-)

I did some local testing, but would you be able to try the updated fix on the shards where some of the series were being ignored?
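
In rough terms, the converter now walks every top-level bucket and treats anything that isn't a known metadata bucket as a series, regardless of what the "series" bucket says. A sketch of that idea (illustrative only, not the code in the actual fix; the metadata bucket names are assumptions):

// scanall.go - sketch of the "ignore the series metadata, scan every series"
// approach described above. Bucket names other than the per-series buckets
// are assumed, not taken from the real b1/bz1 code.
package main

import (
    "fmt"
    "log"
    "os"

    "github.com/boltdb/bolt"
)

// metaBuckets lists top-level buckets assumed to hold shard metadata rather
// than series data.
var metaBuckets = map[string]bool{
    "series": true,
    "fields": true,
    "meta":   true,
    "wal":    true,
}

func main() {
    if len(os.Args) != 2 {
        log.Fatal("usage: scanall <path-to-b1-shard>")
    }
    db, err := bolt.Open(os.Args[1], 0600, nil)
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    err = db.View(func(tx *bolt.Tx) error {
        return tx.ForEach(func(name []byte, b *bolt.Bucket) error {
            if metaBuckets[string(name)] {
                return nil
            }
            // Every remaining bucket is treated as a series: walk its
            // key/value pairs and hand them to the converter.
            return b.ForEach(func(k, v []byte) error {
                fmt.Printf("series %s: %d bytes at key %x\n", name, len(v), k)
                return nil
            })
        })
    })
    if err != nil {
        log.Fatal(err)
    }
}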

jonseymour added a commit to jonseymour/influxdb that referenced this issue Feb 13, 2016
A case (influxdata#5606) was found where a lot of data unexpectedly disappeared from a database
following a TSM conversion.

The proximate cause was an inconsistency between the root Bolt DB bucket list
and the meta data in the "series" bucket of the same shard. There were apparently valid
series in Bolt DB buckets that were no longer referenced by the meta data
in the "series" bucket - so-called orphaned series; since the conversion
process only iterated across the series found in the meta data, the conversion process
caused the orphaned series to be removed from the converted shards. This resulted in the
unexpected removal of data from the TSM shards that had previously been accessible
(despite the meta data inconsistency) in the b1 shards.

The root cause of the meta data inconsistency in the case above was a failure, in versions prior
to v0.9.3 (actually 3348dab) to update the "series" bucket with series that had been created in
previous shards during the life of the same influxd process instance.

This fix is required to avoid data loss during TSM conversions for shards that were created with
versions of influx that did not include 3348dab (e.g. prior to v0.9.3).

Analysis-by: Jon Seymour <[email protected]>
jonseymour pushed a commit to jonseymour/influxdb that referenced this issue Feb 13, 2016
A case (influxdata#5606) was found where a lot of data unexpectedly disappeared from a database
following a TSM conversion.

The proximate cause was an inconsistency between the root Bolt DB bucket list
and the meta data in the "series" bucket of the same shard. There were apparently valid
series in Bolt DB buckets that were no longer referenced by the meta data
in the "series" bucket - so-called orphaned series; since the conversion
process only iterated across the series found in the meta data, the conversion process
caused the orphaned series to be removed from the converted shards. This resulted in the
unexpected removal of data from the TSM shards that had previously been accessible
(despite the meta data inconsistency) in the b1 shards.

The root cause of the meta data inconsistency in the case above was a failure, in versions prior
to v0.9.3 (actually 3348dab) to update the "series" bucket with series that had been created in
previous shards during the life of the same influxd process instance.

This fix is required to avoid data loss during TSM conversions for shards that were created with
versions of influx that did not include 3348dab (e.g. prior to v0.9.3).

Analysis-by: Jon Seymour <[email protected]>
@jonseymour
Contributor Author

Fixed by #5675. Thanks @jwilder, @joelegasse!

joelegasse added a commit that referenced this issue Feb 16, 2016
A case (#5606) was found where a lot of data unexpectedly disappeared from a database
following a TSM conversion.

The proximate cause was an inconsistency between the root Bolt DB bucket list
and the meta data in the "series" bucket of the same shard. There were apparently valid
series in Bolt DB buckets that were no longer referenced by the meta data
in the "series" bucket - so-called orphaned series; since the conversion
process only iterated across the series found in the meta data, the conversion process
caused the orphaned series to be removed from the converted shards. This resulted in the
unexpected removal of data from the TSM shards that had previously been accessible
(despite the meta data inconsistency) in the b1 shards.

The root cause of the meta data inconsistency in the case above was a failure, in versions prior
to v0.9.3 (actually 3348dab) to update the "series" bucket with series that had been created in
previous shards during the life of the same influxd process instance.

This fix is required to avoid data loss during TSM conversions for shards that were created with
versions of influx that did not include 3348dab (e.g. prior to v0.9.3).

Analysis-by: Jon Seymour <[email protected]>
@jwilder added this to the 0.10.1 milestone Feb 18, 2016