
LF: Make DarReader ZipEntries immulatble #10243

Merged
merged 6 commits into from
Jul 12, 2021

Conversation

remyhaemmerle-da
Collaborator

CHANGELOG_BEGIN
CHANGELOG_END

Pull Request Checklist

  • Read and understand the contribution guidelines
  • Include appropriate tests
  • Set a descriptive title and thorough description
  • Add a reference to the issue this PR will solve, if appropriate
  • Include changelog additions in one or more commit message bodies between the CHANGELOG_BEGIN and CHANGELOG_END tags
  • Normal production system change, include purpose of change in description

NOTE: For security reasons, CI is not run automatically on pull requests from
non-members. The reviewer will have to comment with /AzurePipelines run to
trigger the build.

```scala
val buffer = new Array[Byte](buffSize)
var output = Bytes.Empty
Iterator.continually(zip.read(buffer)).takeWhile(_ >= 0).foreach { size =>
  output = output ++ Bytes.fromByteArray(buffer, 0, size)
}
```
Collaborator Author

Concatenation is constant time.
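
The loop above reads a zip entry in fixed-size chunks and appends each chunk to an accumulator, relying on cheap concatenation. A rough illustration in Java, using `ByteArrayOutputStream` as a stand-in for the immutable `Bytes` type (the names `ChunkedRead` and `readAll` are made up for this sketch, not taken from the PR):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ChunkedRead {
    // Read an InputStream fully in fixed-size chunks, mirroring the Scala
    // loop above; ByteArrayOutputStream plays the role of the Bytes accumulator.
    static byte[] readAll(InputStream in, int buffSize) throws IOException {
        byte[] buffer = new byte[buffSize];
        ByteArrayOutputStream output = new ByteArrayOutputStream();
        int size;
        while ((size = in.read(buffer)) >= 0) {
            output.write(buffer, 0, size); // append this chunk
        }
        return output.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello, dar entries".getBytes();
        System.out.println(new String(readAll(new ByteArrayInputStream(data), 4)));
        // prints: hello, dar entries
    }
}
```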

@remyhaemmerle-da remyhaemmerle-da added the component/daml-engine DAML-LF Engine & Interpreter label Jul 12, 2021
@remyhaemmerle-da remyhaemmerle-da added this to the Maintenance milestone Jul 12, 2021
@remyhaemmerle-da remyhaemmerle-da marked this pull request as ready for review July 12, 2021 15:41
Contributor

@cocreature cocreature left a comment

seems sensible, thanks!

```scala
go(Success(Map.empty)).map(ZipEntries(name, _))
}
): Try[ZipEntries] =
bracket(Try(darStream))(is => Try(is.close()))
```
Contributor
I’m not sure this is a good idea. IMHO, a function should not close a stream it did not open, and we did not open the zip stream here.
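
The ownership rule the reviewer is invoking can be sketched in Java with a hypothetical `countBytes` consumer (names invented for illustration): whoever opens a stream closes it, typically via try-with-resources, while a function that is merely handed an open stream leaves it open.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamOwnership {
    // Hypothetical consumer: it did not open `in`, so it must not close it.
    static int countBytes(InputStream in) throws IOException {
        int n = 0;
        while (in.read() >= 0) n++;
        return n; // deliberately no in.close() here: the caller owns the stream
    }

    public static void main(String[] args) throws IOException {
        // The caller opened the stream, so the caller closes it.
        try (InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3})) {
            System.out.println(countBytes(in)); // prints: 3
        }
    }
}
```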

Collaborator Author


```scala
def fromByteBuffer(a: ByteBuffer): Bytes =
  new Bytes(ByteString.copyFrom(a))

def fromInputStream(a: InputStream): Bytes =
  new Bytes(ByteString.readFrom(a))

// reads at most `len` bytes, does not close `a`
def fromInputStream(a: InputStream, len: Int): Bytes = {
```
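
The contract stated in the comment (read at most `len` bytes, leave the stream open) can be sketched as follows. This is a Java illustration under the class name `BoundedRead`, not the actual `Bytes.fromInputStream` implementation:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class BoundedRead {
    // Read at most `len` bytes from `in`, never closing it. A single
    // InputStream.read call may return fewer bytes than requested, so we
    // loop until we have `len` bytes or hit end-of-stream.
    static byte[] readAtMost(InputStream in, int len) throws IOException {
        byte[] buffer = new byte[len];
        int off = 0;
        while (off < len) {
            int read = in.read(buffer, off, len - off);
            if (read < 0) break; // EOF before `len` bytes: return the shorter prefix
            off += read;
        }
        return off == len ? buffer : Arrays.copyOf(buffer, off);
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("abcdef".getBytes());
        System.out.println(new String(readAtMost(in, 4))); // prints: abcd
        System.out.println(new String(readAtMost(in, 4))); // prints: ef (stream left open and readable)
    }
}
```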
Contributor
this seems like a fair amount of complexity and potential source of bugs. do the performance gains matter?

Collaborator Author

@remyhaemmerle-da remyhaemmerle-da Jul 12, 2021

I reverted the optimization. Do you prefer it like that?

Contributor
Thanks, that seems simpler! And just to be clear, I’m happy to make an optimization here if it provides a benefit, just not without some measurements.

Collaborator Author
👍

Comment on lines 92 to 98
```scala
private[GenDarReader] def readDalfNames: Try[Dar[String]] =
  bracket(get(ManifestName).map(_.toInputStream))(is => Try(is.close()))
    .flatMap(DarManifestReader.dalfNames)
    .recoverWith { case NonFatal(e1) =>
      findLegacyDalfNames.recoverWith { case NonFatal(_) =>
        Failure(Error.InvalidDar(this, e1))
      }
    }
```
Collaborator Author
@cocreature Do we have to handle DARs without a manifest?
If not, let me drop this code; otherwise, should we have a test for that? (It seems we currently do not have any.)

Contributor

The compiler will always add a manifest, and I’d generally consider anything not produced by the compiler to be unsupported, so it should be fine to drop it.
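
Since the compiler always writes a manifest, a reader can simply fail fast on archives without one. A minimal Java sketch of that policy (the class name, error message, and `Main-Dalf` attribute here are illustrative, not the DarReader’s actual code):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ManifestCheck {
    // Scan the archive for META-INF/MANIFEST.MF and fail if it is missing,
    // matching the decision to treat manifest-less DARs as unsupported.
    static String readManifest(byte[] darBytes) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(darBytes))) {
            for (ZipEntry entry; (entry = zip.getNextEntry()) != null; ) {
                if (entry.getName().equals("META-INF/MANIFEST.MF")) {
                    return new String(zip.readAllBytes()); // reads to the end of this entry
                }
            }
        }
        throw new IOException("invalid DAR: no manifest");
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny DAR-like zip in memory containing only a manifest entry.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buf)) {
            out.putNextEntry(new ZipEntry("META-INF/MANIFEST.MF"));
            out.write("Main-Dalf: a.dalf\n".getBytes());
            out.closeEntry();
        }
        System.out.print(readManifest(buf.toByteArray())); // prints: Main-Dalf: a.dalf
    }
}
```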

Collaborator Author

Great!

Collaborator Author

🔥

@mergify mergify bot merged commit 6fe6ae0 into main Jul 12, 2021
@mergify mergify bot deleted the remy-lf-zipentries branch July 12, 2021 21:13
azure-pipelines bot pushed a commit that referenced this pull request Jul 14, 2021
This PR has been created by a script, which is not very smart
and does not have all the context. Please do double-check that
the version prefix is correct before merging.

@akshayshirahatti-da is in charge of this release.

Commit log:
```
1f35db1 ledger-on-sql: Use random log entry ID allocation so we don't depend on `SeedService`. [KVL-1002] (#10255)
b8e2198 Separate traces from warnings in engine. (#10253)
0a7f2b1 [JSON-API] Small refactoring to start using akka http routes internally (#10252)
6e8ec1d LF: Drop old depreated code (#10251)
de7a08f [In-memory fan-out] Handle `null` submitters in getTransactionLogUpdates (#10248)
584169a Participant state v1-to-v2 adaptors [KVL-1002] (#10246)
e4d7cd7 LF: Deprecate com.daml.lf.data.TryOps.Bracket (#10249)
b59f365 [JSON-API] Correctly extract the request source URL/IP (#10244)
edaf541 Adapt `ledger-api-bench-tool` to work with LR [DPP-424] (#10171)
6fe6ae0 LF: Make DarReader ZipEntries immulatble (#10243)
05e5218 Consolidating EventStorageBackend (#10224)
3fd3abf Instrument buffer after contract state events stream (#10209)
643a2de v2 participant state API draft [KVL-998] (#10210)
5abcd04 Adds Unit tests for parallel-ingestion [DPP-455] (#10238)
3cdedcf kvutils: Extract validators from TransactionCommitter [KVL-1015] (#10235)
00d622f Make @UNTIL-LF exclusive, add @UNTIL-LF-FEATURE. (#10240)
2bcbd4e es: switch to persistent nodes (#10236)
7f4bc2a update compat versions for 1.14.2 (#10233)
4eba4b0 Ledger: decouple BD value serialization library from archive library. (#10218)
729afa8 Write a few lines on how to run tests against an Oracle database (#10230)
2b67ebb tf: refactor appr var (#10232)
202b7f7 add akshay to appr team (#10229)
2c5410c Release SDK 1.14.2 (#10226)
89d87ad update compat versions for 1.14.2-snapshot.20210708.7135.0.aa297840 (#10223)
4839344 TODO Cleanup: QueryNonPruned [DPP-459] (#10225)
0bea5e3 Allow retriable lookupMaximumLedgerTime on contention (#10211)
ca2a9e9 Release 1.14.2 snapshot (#10222)
999577a tweak ES cluster (#10219)
6680d36 Warn on DA.Exception import (#10201)
f19f5b0 LF: Simplify DarReader (#10217)
8f1a8f2 Release DAML SDK 1.14.1 (#10204)
ebb76dc LF: reorganize errors in com.daml.lf.archive (#10213)
6964645 Improvements for validation tests (#10214)
5f124c3 Avoid collision in execution log postfix (#10205)
9e27ae0 Reshuffle cocreature in daml-lf codeowners (#10215)
41b8448 LF: Simplify archive reader. (#10208)
38734f0 es-feed: ignore invalid files (#10207)
6586dde Move exercise context computation to the client side (#10199)
c92c678 Mark Daml profiler as stable (#10203)
cb1f4ec ci/windows: disable spool (#10200)
2326d42 Publish execution logs on unix platforms (#10194)
4269345 Drop cleanup of node_modules (#10195)
c8faed8 Deduplicate Java codegen running instructions (#10185)
e4c8c39 update compat versions for 1.14.1-snapshot.20210706.7133.0.8ec16e94 (#10196)
0d881f5 Improvements to the documentation with regards to offsets (#10180)
1d5ba4f feed elasticsearch cluster (#10193)
e2bdca6 Use PartialFunction for more concise code. (#10191)
fe98c48 release 1.14.1-snapshot.20210706.7133.0.8ec16e94 (#10186)
4fe626a Drop logging on CompletionSource (#10190)
582aa5f Fix a typo in an exceptions example. (#10188)
8750c0c reduce noise in Scala bindings (#10187)
98b5ffe Add divulgence warning and test in script service. (#10179)
8578e56 Tests for transaction validation (#10167)
05a72a3 update compat versions for 1.15.0-snapshot.20210705.7286.0.62aabcc4 (#10184)
7b93923 [In-memory fan-out] Performance optimizations [DPP-470] (#10127)
6c49619 Document how to run the Java codegen from the Daml assistant (#10181)
61aa774 Release SDK 1.15 RC (#10182)
```
Changelog:
```
- [JSON-API] If the service is put behind a proxy filling either of these headers X-Forwarded-For & X-Real-Ip then these will now be respected for logging the request source ip/url
- [Daml Compiler] Importing DA.Exception on LF < 1.14 now produces a
warning that exceptions require Daml-LF >= 1.14.
- [Daml Profiler] The Daml profiler is now a stable feature.
[Docs] Improvements to the documentation with regards to offsets
[Docs] Document how to use the Java codegen using the Daml assistant
```

CHANGELOG_BEGIN
CHANGELOG_END
cocreature added a commit that referenced this pull request Jul 14, 2021
* release 1.15.0-snapshot.20210713.7343.0.1f35db17


* 1.16 not 1.15

changelog_begin
changelog_end

Co-authored-by: Azure Pipelines DAML Build <[email protected]>
Co-authored-by: Moritz Kiefer <[email protected]>
cocreature added a commit that referenced this pull request Jul 15, 2021
I haven’t found any conclusive information as to why ON COMMIT doesn’t
work incrementally but
https://docs.oracle.com/en/database/oracle/oracle-database/19/adjsn/json-query-rewrite-use-materialized-view-json_table.html#GUID-8B0922ED-C0D1-45BD-9588-B7719BE4ECF0
recommends that for rewriting (which isn’t what we do here but both
involve a materialized view on json_table).

Benchmarks:

Before:
Benchmark            (batchSize)  (batches)  (numContracts)  Mode  Cnt  Score   Error  Units
InsertBenchmark.run         1000          1            1000  avgt    5  0.327 ± 0.040   s/op
InsertBenchmark.run         1000          3            1000  avgt    5  0.656 ± 0.043   s/op
InsertBenchmark.run         1000          5            1000  avgt    5  1.034 ± 0.051   s/op
InsertBenchmark.run         1000          7            1000  avgt    5  1.416 ± 0.106   s/op
InsertBenchmark.run         1000          9            1000  avgt    5  1.734 ± 0.143   s/op
QueryBenchmark.run          1000         10             N/A  avgt    5  0.071 ± 0.016   s/op

After:
Benchmark            (batchSize)  (batches)  (numContracts)  Mode  Cnt  Score   Error  Units
InsertBenchmark.run         1000          1            1000  avgt    5  0.217 ± 0.034   s/op
InsertBenchmark.run         1000          3            1000  avgt    5  0.232 ± 0.027   s/op
InsertBenchmark.run         1000          5            1000  avgt    5  0.226 ± 0.051   s/op
InsertBenchmark.run         1000          7            1000  avgt    5  0.225 ± 0.048   s/op
InsertBenchmark.run         1000          9            1000  avgt    5  0.232 ± 0.021   s/op
QueryBenchmark.run          1000         10             N/A  avgt    5  0.080 ± 0.014   s/op

The difference in query times is just noise and changes across runs.

So we get the expected behavior of inserts being independent of the
total ACS size now. We could still explore if we gain something by
avoiding the materialized view to reduce constant factors but that’s
much less of an issue.

fixes #10243

changelog_begin
changelog_end
cocreature added a commit that referenced this pull request Jul 15, 2021
mergify bot pushed a commit that referenced this pull request Jul 15, 2021