SQL: Creating an index produced disk stall error #36385

Closed
roncrdb opened this issue Apr 1, 2019 · 4 comments
Labels
A-schema-changes C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-1-stability Severe stability issues that can be fixed by upgrading, but usually don’t resolve by restarting


roncrdb commented Apr 1, 2019

Describe the problem

While creating an index, one of the nodes crashed with the following error:

disk stall detected: unable to write to <no-attributes>=/var/lib/crdbinternal/ds1 within 10s

This error occurs because RocksDB is compacting heavily and repeatedly blocks traffic to the engine, so a write could not be committed to the RocksDB engine within 10s. We do have a workaround for this: setting the environment variable COCKROACH_ENGINE_MAX_SYNC_DURATION=120s.

However, setting that env var led to #26.

Additional data / screenshots
Compaction Stats:

Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      1/0   24.19 MB   0.5      1.4     0.0      1.4      47.3     45.9       0.3   1.0      1.1     39.1      1239      4103    0.302     15M   7442
  L2    155/0   622.11 MB   9.7     90.3    46.1     44.1      90.2     46.0       0.1   2.0     35.6     35.6      2596      1562    1.662    741M  1197K
  L3     76/0   260.66 MB   1.0     52.0    33.9     18.1      52.0     33.8      11.7   1.5     35.1     35.1      1515      4320    0.351    366M   273K
  L4    463/0    2.55 GB   1.0    106.4    38.4     68.0     106.3     38.3       7.0   2.8     32.5     32.5      3354      6604    0.508   1187M  1442K
  L5   3427/2   25.47 GB   1.0    141.1    41.1    100.0     141.1     41.1       3.4   3.4     28.9     28.9      4996      5580    0.895   1625M   719K
  L6   6742/4   243.96 GB   0.0    218.1    37.8    180.2     214.5     34.3       0.0   5.7     33.9     33.3      6594      2745    2.402   1313M    87M
 Sum  10864/6   272.85 GB   0.0    609.2   197.4    411.8     651.3    239.5      22.6  14.1     30.7     32.9     20295     24914    0.815   5248M    91M
 Int      0/0    0.00 KB   0.0      8.7     2.5      6.3       9.6      3.3       0.3  10.9     25.2     27.6       355       681    0.521     77M   594K
Uptime(secs): 8533.3 total, 104.6 interval
Flush(GB): cumulative 45.927, interval 0.853
AddFile(GB): cumulative 0.399, interval 0.023
AddFile(Total Files): cumulative 2678, interval 353
AddFile(L0 Files): cumulative 2211, interval 326
AddFile(Keys): cumulative 8133933, interval 467383
Cumulative compaction: 651.34 GB write, 78.16 MB/s write, 609.22 GB read, 73.11 MB/s read, 20294.7 seconds
Interval compaction: 9.56 GB write, 93.60 MB/s write, 8.75 GB read, 85.64 MB/s read, 354.9 seconds
Stalls(count): 539 level0_slowdown, 536 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 17341 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 246 total count

Environment:

  • CockroachDB v19.1.0-beta.20190318

roncrdb added the A-sql-execution label Apr 1, 2019
awoods187 added the A-schema-changes, C-bug, and S-1-stability labels and removed the A-sql-execution label Apr 1, 2019
dt (Member) commented Apr 1, 2019

Note the "AddFile(L0 Files): ... interval 326" and the "interval 246 total count" stalls there. I think we're adding lots of SSTs to Rocks and triggering its back pressure, which assumes they all arrived via writes, so it started slowing down writes (and did so 246 times in the period leading up to this error), even though slowing down writes won't help, since the files are getting there via direct ingestion.
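For scale, the interval figures in the stats above (a 104.6 s interval) work out to roughly three ingested files landing in L0 every second, about 92% of all files added via AddFile in that interval:

```latex
\frac{326\ \text{L0 files}}{104.6\ \text{s}} \approx 3.1\ \text{files/s},
\qquad
\frac{326\ \text{L0 files}}{353\ \text{AddFile files}} \approx 0.92
```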

This suggests to me that we need back pressure on direct ingestion before we come close to the RocksDB limits, à la #34258.

vivekmenezes (Contributor) commented:

#36403 along with #36424 will solve this problem.

vivekmenezes (Contributor) commented:

@dt I know you're doing additional work here, but confirming that this issue is no longer release-blocking.

vivekmenezes (Contributor) commented:

@dt I believe we can close this, right?
