
Add field data memory circuit breaker #4261

Closed · wants to merge 2 commits

@dakrone (Member) commented Nov 26, 2013

This adds the field data circuit breaker, which is used to estimate
the amount of memory required to load field data before loading it. It
then raises a CircuitBreakingException if the limit is exceeded.

It is configured with two parameters:

`indices.fielddata.cache.breaker.limit` - the maximum number of bytes
of field data that can be loaded before circuit breaking. Defaults to
`indices.fielddata.cache.size` if set, unbounded otherwise.

`indices.fielddata.cache.breaker.overhead` - a constant that all field
data estimations are multiplied by before they are added to the breaker.
Defaults to 1.03.

Both settings can be configured dynamically using the cluster update
settings API.
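
For illustration, here is a minimal sketch of changing both settings at runtime through the Java client (the Client instance named client, the use of transient settings, and the chosen values are assumptions for the example, not part of this change):

client.admin().cluster().prepareUpdateSettings()
        .setTransientSettings(ImmutableSettings.settingsBuilder()
                // hypothetical values: cap field data at 2gb and raise the overhead constant
                .put("indices.fielddata.cache.breaker.limit", "2gb")
                .put("indices.fielddata.cache.breaker.overhead", 1.05)
                .build())
        .execute().actionGet();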

@s1monw (Contributor) commented Nov 26, 2013

Cool stuff @dakrone

@@ -60,6 +60,9 @@ By default, `indices` stats are returned. With options for `indices`,
Transport statistics about sent and received bytes in
cluster communication

`breaker`::
Contributor:

I think breaker is too generic a name?

Member Author (dakrone):

How about circuit-breaker?

@dakrone (Member Author) commented Dec 6, 2013

I realized that the FieldDataEstimator class is no longer needed, as estimations have been moved into their respective field data loading classes, so I'll remove it.

double estimatedBytes = ((RamAccountingTermsEnum)termsEnum).getTotalBytes();
breaker.addWithoutBreaking(-(long)((estimatedBytes * breaker.getOverhead()) - actualUsed));
} else {
logger.warn("Trying to adjust circuit breaker, but TermsEnum has not been wrapped!");
Contributor:

Should we have an assertion here?

Member Author (dakrone):

I think an assertion makes sense, I'll do that instead of the if statement.
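
A minimal sketch of the assert-based variant (reusing the names from the excerpt above; the final committed code may differ):

assert termsEnum instanceof RamAccountingTermsEnum : "TermsEnum should have been wrapped in a RamAccountingTermsEnum";
double estimatedBytes = ((RamAccountingTermsEnum) termsEnum).getTotalBytes();
breaker.addWithoutBreaking(-(long) ((estimatedBytes * breaker.getOverhead()) - actualUsed));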

@dakrone (Member Author) commented Dec 11, 2013

Updated code and force-pushed another squashed commit (there were going to be merge conflicts regardless, and I'd rather rebase and deal with them now than after reviews).

Changes:

  • Move all the files into better/more-applicable packages
  • Breaker stats are now under the key fielddata_breaker and the class is called FieldDataBreakerStats
  • Use constants instead of strings for reused field data filter settings
  • TermsEnum is wrapped in a filter if BlockTreeStats can't be used (for people using custom postings formats)
  • Fix "unwinding" of breaker in the event a different exception occurs while loading field data
  • De-interface-ify MemoryAggregatingCircuitBreaker to become concrete MemoryCircuitBreaker
  • Logger passed through to MemoryCircuitBreaker to preserve which area is using the breaker
  • Remove "field data" from strings in MemoryCircuitBreaker to make it a bit more generic (reflecting the package move to common.breaker)

I may have forgotten other changes that went in, so more reviews welcome :)

this.maxBytes = settings.getAsBytesSize(CIRCUIT_BREAKER_MAX_BYTES_SETTING, new ByteSizeValue(fieldDataMax)).bytes();
this.overhead = settings.getAsDouble(CIRCUIT_BREAKER_OVERHEAD_SETTING, DEFAULT_OVERHEAD_CONSTANT);

this.breaker = new MemoryCircuitBreaker(new ByteSizeValue(maxBytes), overhead, 0, logger);
Contributor:

It looks like breaker is initialized twice. Here and then in doStart()

Member Author (dakrone):

I'll remove the doStart() one.
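
A minimal sketch of that cleanup (hypothetical): the breaker keeps being constructed once in the constructor, as in the excerpt above, and doStart() no longer re-creates it.

@Override
protected void doStart() {
    // nothing to do; the breaker is created once in the constructor
}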

@dakrone (Member Author) commented Dec 19, 2013

Pushed a new version of the circuit breaker that addresses @imotov's comments.

public class InternalCircuitBreakerService extends AbstractLifecycleComponent<InternalCircuitBreakerService> implements CircuitBreakerService {

public static final String CIRCUIT_BREAKER_MAX_BYTES_SETTING = "indices.fielddata.cache.breaker.limit";
public static final String CIRCUIT_BREAKER_OVERHEAD_SETTING = "indices.fielddata.cache.breaker.overhead";
Member:

the settings names do not match the package, it should be indices.fielddata.breaker.xxx

Member Author (dakrone):

okay, I'll change those.
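
The renamed constants would then look something like this (a sketch of the suggested rename, not the committed code):

public static final String CIRCUIT_BREAKER_MAX_BYTES_SETTING = "indices.fielddata.breaker.limit";
public static final String CIRCUIT_BREAKER_OVERHEAD_SETTING = "indices.fielddata.breaker.overhead";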

.setSource(MapBuilder.<String, Object>newMapBuilder().put("test", "value" + id).map()).execute().actionGet();
}

// refresh
Contributor:

there is a refresh() shortcut in ElasticsearchIntegrationTest
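
For example, assuming the test extends ElasticsearchIntegrationTest, the explicit refresh round trip can be replaced by the helper:

refresh(); // ElasticsearchIntegrationTest helper instead of an explicit prepareRefresh() call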

startObject("type").
startObject("properties").
startObject("test")
.field("type", "string")
Contributor:

can we have some more types like long / double etc. as well as use a random field_data impl? We could have something like

 .field("type", "string")
 .startObject("fielddata")
 .field("format", randomStringFieldDataFormat())

and something like this for numeric as well:

private static String randomNumericFieldDataFormat() {
        return randomFrom(Arrays.asList("array", "compressed", "doc_values"));
}
private static String randomBytesFieldDataFormat() {
        return randomFrom(Arrays.asList("paged_bytes", "fst", "doc_values"));
}

I guess we can add those to ElasticsearchIntegrationTest
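
A rough sketch of how the randomized formats could be wired into the test mapping (hypothetical; the extra num field and the exact builder shape are assumptions):

XContentBuilder mapping = jsonBuilder().startObject()
        .startObject("type").startObject("properties")
            .startObject("test")
                .field("type", "string")
                .startObject("fielddata").field("format", randomBytesFieldDataFormat()).endObject()
            .endObject()
            .startObject("num")
                .field("type", "long")
                .startObject("fielddata").field("format", randomNumericFieldDataFormat()).endObject()
            .endObject()
        .endObject().endObject()
    .endObject();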

@s1monw (Contributor) commented Jan 2, 2014

LGTM please squash and push 👍

@dakrone (Member Author) commented Jan 2, 2014

Merged in a754224, closing #4592

@dakrone dakrone closed this Jan 2, 2014
@dakrone dakrone deleted the circuit-breaker-squashed branch April 21, 2014 23:00
@dakrone dakrone added the :Core/Infra/Circuit Breakers label Oct 25, 2016