Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ingest/ledgerbackend: Parse captive core toml to implement better merging #3558

Merged
merged 16 commits into from
May 5, 2021

Conversation

tamirms
Copy link
Contributor

@tamirms tamirms commented Apr 21, 2021

PR Checklist

PR Structure

  • This PR has reasonably narrow scope (if not, break it down into smaller PRs).
  • This PR avoids mixing refactoring changes with feature changes (split into two PRs
    otherwise).
  • This PR's title starts with name of package that is most changed in the PR, ex.
    services/friendbot, or all or doc if the changes are broad or impact many
    packages.

Thoroughness

  • This PR adds tests for the most critical parts of the new functionality or fixes.
  • I've updated any docs (developer docs, .md
    files, etc... affected by this change). Take a look in the docs folder for a given service,
    like this one.

Release planning

  • I've updated the relevant CHANGELOG (here for Horizon) if
    needed with deprecations, added features, breaking changes, and DB schema changes.
  • I've decided if this PR requires a new major/minor version according to
    semver, or if it's mainly a patch change. The PR is targeted at the next
    release branch if it's not a patch change.

What

This PR fixes #3486 and lays the groundwork for implementing #3442 .

This PR changes how we interpret captive core append file (configured via --captive-core-config-append-path). Previously, we automatically generated the stellar core toml file for the captive core subprocess in ingest/ledgerbackend/stellar_core_runner.go by presetting some defaults and then concatenating the generated configuration with the user supplied append file.

This strategy did not work out so well because sometimes horizon operators included configuration in the append file which was already configured in the presets generated by ingest/ledgerbackend/stellar_core_runner.go. An example of this type of error can be found in #3486 .

Now, in this PR, instead of generating the stellar core toml file via simple string concatenation we actually parse the user supplied append file and try to do more robust merging between the user supplied configuration and the captive core preset defaults. The merging code is implemented in ingest/ledgerbackend/toml.go .

The merging strategy is unfortunately complicated by the fact that captive core configuration can come from 3 places:

  1. User supplied append file
  2. horizon command line flag (e.g. --captive-core-peer-port, --captive-core-http-port, etc)
  3. default values which we preset (e.g. DISABLE_XDR_FSYNC)

In theory the user supplied append file should only contain the quorum set but in practice we have never validated the append file to ensure that it configures a quorum set and nothing else. We could simplify the merging logic in this PR by choosing to validate the append file in this way. I decided not to take that approach because I think that it would be a breaking change for many horizon operators (including SDF) who have configuration in the append file which is not related to quorum sets. If you disagree with this design decision please let me know.

The merging algorithm generally works as follows:

If there is a configuration field which is set both in source (1) and source (2) we check that both sources provide equivalent values, otherwise we return an error.

The exception to this rule is the history archive configuration. We take the history archive configuration from source (1) and completely disregard what is provided in source (2) unless there is no history archive configuration in source (1). The reason for this is that it's difficult to compare the history archive values from source (1) and source (2).

If a configuration field is defined in source (1) but not source (2), we will use the field which is explicitly defined. The same logic applies if the field is defined in source (2) but not source (1).

Finally, if there is no configuration defined in either source (1) or source (2), we use the value from source (3).

In the next PR, my plan is to fully implement #3442 by introducing a new horizon command line flag --captive-core-config-file which deprecates --captive-core-config-append-path. The difference between --captive-core-config-file and --captive-core-config-append-path is that the validation is stricter.

--captive-core-config-file will not allow any unknown fields to be present in the toml file.

Furthermore, if --captive-core-config-file is provided then you will not be allowed to use any horizon command line flags to configure captive core. In other words, --captive-core-config-file deprecates source (2).

Along with defining --captive-core-config-file, the next PR will also add documentation for the captive core configuration file in an captive-core_example.cfg file (similar to how stellar core does it https://github.com/stellar/stellar-core/blob/master/docs/stellar-core_example.cfg ). The example file will document the full range of configuration options available to captive core instances.

Unfortunately there will be some slight differences between the stellar core example cfg and the captive core example cfg because captive core has different defaults compared to stellar core. For example, captive core sets DISABLE_XDR_FSYNC to true by default whereas stellar core sets DISABLE_XDR_FSYNC to false by default. We should think carefully about each of the cases where captive core diverges from stellar core.

Known limitations

The golang toml library doesn't handle nested tables in a way which is consistent with stellar core. As a workaround I implemented some hacky code to flatten nested tables using string manipulation.

@tamirms tamirms force-pushed the captive-core-toml branch 3 times, most recently from 708ed1c to 57080a0 Compare April 22, 2021 08:28
Copy link
Contributor

@Shaptic Shaptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is in-progress, but just a few thoughts

ingest/ledgerbackend/stellar_core_runner.go Outdated Show resolved Hide resolved
ingest/ledgerbackend/stellar_core_runner.go Outdated Show resolved Hide resolved
ingest/ledgerbackend/stellar_core_runner.go Outdated Show resolved Hide resolved
ingest/ledgerbackend/stellar_core_runner.go Outdated Show resolved Hide resolved
ingest/ledgerbackend/stellar_core_runner.go Outdated Show resolved Hide resolved
ingest/ledgerbackend/stellar_core_runner_test.go Outdated Show resolved Hide resolved
@tamirms tamirms force-pushed the captive-core-toml branch 2 times, most recently from ceea647 to 18bef97 Compare April 24, 2021 12:29
@@ -71,7 +71,6 @@ func Example_changes() {
backend, err := ledgerbackend.NewCaptive(
ledgerbackend.CaptiveCoreConfig{
BinaryPath: "/bin/stellar-core",
ConfigAppendPath: "/opt/stellar-core.cfg",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to update this example

@tamirms tamirms force-pushed the captive-core-toml branch from 18bef97 to e142246 Compare April 24, 2021 13:16
@tamirms tamirms requested a review from a team April 26, 2021 09:12
Copy link
Contributor

@bartekn bartekn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's looking good! I don't have any major comments except NewCaptiveCoreTomlFromFile.

ingest/ledgerbackend/stellar_core_runner_test.go Outdated Show resolved Hide resolved
ingest/ledgerbackend/toml.go Outdated Show resolved Hide resolved
@jacekn
Copy link
Contributor

jacekn commented Apr 26, 2021

@tamirms this is a great improvement! It feels fine for the most part but the history archive handling could be a bit confusing IMO.

The exception to this rule is the history archive configuration. We take the history archive configuration from source (1) and completely disregard what is provided in source (2) unless there is no history archive configuration in source (1). The reason for this is that it's difficult to compare the history archive values from source (1) and source (2).

It's not a very strong opinion but I think it would be best to error if it's set in both places, even if there is a chance both are the same.
Downside of throwing an error is that we might do that if values are equivalent. But on the plus side we remove ambiguity.
I feel that erroring with a clear message is less risky than possibility or using archive other than what the operator intended.

@tamirms tamirms force-pushed the captive-core-toml branch from 1ddff54 to 2ce5690 Compare April 30, 2021 12:21
@tamirms tamirms force-pushed the captive-core-toml branch from 2ce5690 to 60c50ab Compare April 30, 2021 12:24
@tamirms tamirms force-pushed the captive-core-toml branch from 3ffa3cf to 4436d00 Compare April 30, 2021 15:38
@tamirms tamirms force-pushed the captive-core-toml branch from 8b5a5b0 to 81f1e28 Compare April 30, 2021 17:32
@tamirms tamirms marked this pull request as ready for review May 2, 2021 15:23
@tamirms tamirms requested a review from a team May 2, 2021 15:24
@tamirms
Copy link
Contributor Author

tamirms commented May 2, 2021

@stellar/horizon-committers this ready for a final review now, PTAL!

@@ -14,16 +14,17 @@ import (
"github.com/stellar/go/ingest/ledgerbackend"
"github.com/stellar/go/network"
"github.com/stellar/go/support/config"
support "github.com/stellar/go/support/config"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it imported line above?

Suggested change
support "github.com/stellar/go/support/config"

OptType: types.Uint,
CustomSetValue: support.SetOptionalUint,
Required: false,
FlagDefault: uint(0),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be 11626?


return result, nil
text, err := captiveCoreToml.Marshall()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo?

Suggested change
text, err := captiveCoreToml.Marshall()
text, err := captiveCoreToml.Marshal()


return re.ReplaceAllStringFunc(text, func(match string) string {
insideBrackets := match[1 : len(match)-1]
original := tablePlaceHolders.get(insideBrackets)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super nit:

Suggested change
original := tablePlaceHolders.get(insideBrackets)
original, ok := tablePlaceHolders.get(insideBrackets)

}

// Marshall serializes the CaptiveCoreToml into a toml document.
func (c *CaptiveCoreToml) Marshall() ([]byte, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo?

Suggested change
func (c *CaptiveCoreToml) Marshall() ([]byte, error) {
func (c *CaptiveCoreToml) Marshal() ([]byte, error) {

historyEntries := map[string]History{}
// The toml library has trouble with nested tables so we need to flatten all nested
// QUORUM_SET and HISTORY tables as a workaround.
// In Marshall() we apply the inverse process to unflatten the nested tables.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// In Marshall() we apply the inverse process to unflatten the nested tables.
// In Marshal() we apply the inverse process to unflatten the nested tables.

// the catchup command on captive core.
func (c *CaptiveCoreToml) CatchupToml() (*CaptiveCoreToml, error) {
// clone the existing toml file
data, err := c.Marshall()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
data, err := c.Marshall()
data, err := c.Marshal()

var offline CaptiveCoreToml
if err = offline.unmarshal(data); err != nil {
return nil, errors.Wrap(err, "could not unmarshall captive core toml")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have one of these, just to make this more clear what we're doing here?

func (c *CaptiveCoreToml) clone() (*CaptiveCoreToml, error) {

@tamirms tamirms changed the base branch from master to release-horizon-v2.3.0 May 5, 2021 13:17
@tamirms tamirms merged commit 4c98380 into stellar:release-horizon-v2.3.0 May 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants