Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[forge][telemetry] Enable node telemetry service in forge #4490

Merged
merged 1 commit into from
Sep 29, 2022

Conversation

ibalajiarun
Copy link
Contributor

@ibalajiarun ibalajiarun commented Sep 23, 2022

Description

This PR enables the node's telemetry in forge to assert the telemetry crate code paths during forge tests. A bug introduced in #4443 was uncaught in forge e2e test because the telemetry crate wasn't enabled.

Enabling the APTOS_FORCE_ENABLE_TELEMETRY env variable does not still assert the telemetry web service code path in forge, but it's not in the critical path of the nodes' operation.

Test Plan


This change is Reviewable

@ibalajiarun ibalajiarun requested a review from a team as a code owner September 23, 2022 15:32
@ibalajiarun ibalajiarun force-pushed the balaji/telmetry-enable-forge branch 2 times, most recently from c8c65b2 to 84f1d21 Compare September 23, 2022 18:36
@ibalajiarun ibalajiarun added the CICD:run-e2e-tests when this label is present github actions will run all land-blocking e2e tests from the PR label Sep 23, 2022
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@ibalajiarun ibalajiarun force-pushed the balaji/telmetry-enable-forge branch from 84f1d21 to 23a6f34 Compare September 23, 2022 22:34
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@ibalajiarun ibalajiarun force-pushed the balaji/telmetry-enable-forge branch from 23a6f34 to d7b8e3f Compare September 24, 2022 00:27
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@ibalajiarun ibalajiarun force-pushed the balaji/telmetry-enable-forge branch from d7b8e3f to 91367ce Compare September 24, 2022 04:50
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@ibalajiarun ibalajiarun enabled auto-merge (squash) September 29, 2022 00:21
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@rustielin rustielin requested review from sitalkedia and a team September 29, 2022 00:37
@rustielin rustielin requested a review from geekflyer September 29, 2022 00:37
@github-actions
Copy link
Contributor

✅ Forge suite land_blocking success on 91367ce4681efb49b59eef64c4cf221a00c5d177

performance benchmark with full nodes : 7786 TPS, 5095 ms latency, 8400 ms p99 latency,(!) expired 120 out of 3363840 txns
Test Ok

@sitalkedia
Copy link
Contributor

@ibalajiarun - How would this work if there is no telemetry service running for Forge?

@github-actions
Copy link
Contributor

✅ Forge suite compat success on 843b204dce971d98449b82624f4f684c7a18b991 ==> 91367ce4681efb49b59eef64c4cf221a00c5d177

Compatibility test results for 843b204dce971d98449b82624f4f684c7a18b991 ==> 91367ce4681efb49b59eef64c4cf221a00c5d177 (PR)
1. Check liveness of validators at old version: 843b204dce971d98449b82624f4f684c7a18b991
compatibility::simple-validator-upgrade::liveness-check : 7456 TPS, 4946 ms latency, 8400 ms p99 latency,no expired txns
2. Upgrading first Validator to new version: 91367ce4681efb49b59eef64c4cf221a00c5d177
compatibility::simple-validator-upgrade::single-validator-upgrade : 5428 TPS, 7104 ms latency, 8900 ms p99 latency,no expired txns
3. Upgrading rest of first batch to new version: 91367ce4681efb49b59eef64c4cf221a00c5d177
compatibility::simple-validator-upgrade::half-validator-upgrade : 5405 TPS, 6778 ms latency, 9300 ms p99 latency,no expired txns
4. upgrading second batch to new version: 91367ce4681efb49b59eef64c4cf221a00c5d177
compatibility::simple-validator-upgrade::rest-validator-upgrade : 7982 TPS, 4555 ms latency, 8600 ms p99 latency,no expired txns
5. check swarm health
Compatibility test for 843b204dce971d98449b82624f4f684c7a18b991 ==> 91367ce4681efb49b59eef64c4cf221a00c5d177 passed
Test Ok

@ibalajiarun
Copy link
Contributor Author

ibalajiarun commented Sep 29, 2022

@ibalajiarun - How would this work if there is no telemetry service running for Forge?

This will force the node to send telemetry to the global service, but the service will simply drop it because it cannot authenticate.

@ibalajiarun ibalajiarun merged commit fb46119 into main Sep 29, 2022
@ibalajiarun ibalajiarun deleted the balaji/telmetry-enable-forge branch September 29, 2022 00:50
sokyrko added a commit to Mamoru-Foundation/aptos-core that referenced this pull request Oct 3, 2022

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
* [rosetta/cli] Add headroom to max gas calculation

Max gas calculation was being calculated based on the gas used in
simulation, but it can change between calls.  This adds extra gas
to prevent that.

* [cli] Bump to verison 0.3.5

* [aptos-cli] Improve price estimation confirmation message

* Update prover-test.yaml (aptos-labs#4386)

- added the path condition to trigger the Prover test in CI

* [forge] Implement ability to run multiple tests

* fix: fix typo coin_client.ts parameter description

* [aptos-crypto] Remove expect from signing message

* [aptos-crypto] Disable broken derive test

* [crypto] Remove Result output of sign arbitrary message

* [consensus] Propagate results for Vote::new()

* [crypto-derive] Fix derive and test for Result in sign function

* fix: add extra gas paramters to TokenClient apis

* fix: clean up the TS doc - in progress

* fix: show detailed error message for txn submission failures

* refactor: consolidate some hard coded paramters

* feat: interface to estimate max gas amount

* [Docs] Cleaning up tutorial docs.

* [Docs] Minor cleanup

* [Docs] More cleanup

* [Docs] Fixing code snippet style.

* [Docs] Minor edit.

* [tf] update utility node requirement (aptos-labs#4404)

* [Docs] Demoting the CLI install by source in the docs.

* Update ruby deps. (aptos-labs#4342)

* Revert "[forge] Implement ability to run multiple tests" (aptos-labs#4430)

This reverts commit 06abbab.

* fix(ts-sdk): fix the failing example tests

* [aptos-logger] Add key to error message telling that serialization failed

* [api] Cache the gas schedule

* [api] Limit transactions out of block APIs

* [rosetta] Paginate transactions for blocks

* [api] Update spec for new block API description

* update explorer domain (aptos-labs#4424)

* [dashboards] sync grafana dashboards (aptos-labs#4427)

Co-authored-by: rustielin <rustielin@users.noreply.github.com>

* Change https to http for localhost url

* [Consensus] Add metrics for pending blocks per stage (aptos-labs#4357)

* [consensus] fix log spew from u128 in structured logs

* Update privacy policy (aptos-labs#4432)

* [storage] small fixes (aptos-labs#4419)

* [Token] Add get max property version (aptos-labs#4436)

* [cicd] smoke tests: build with all features

* [Docs] Enhancing docs stuff.

* [aptos-node] Tune local testnet parameters

I've decreased a bunch of the polling timeouts and concurrency amounts
to decrease the CPU overhead of a local testnet.  On my M1 mac the
decrease in overhead is from >800% to ~250%.

* [forge][flaky] Fix consensus_stress_test (aptos-labs#4409)

ChangingWorkingQuorumTest is extended/more complicated version of ContinuousProgressTest.
I've tested that one more, and fixed some of the bugs, which I forgot to fix in the simmpler version.

Removing simpler version, as I can just configure more complex one to skip the failures part, making them the same.

The bug relevant here - is that we cannot look at progress as substraction of rounds and subtraction of epochs separately, but as a tuple only

* [smoke tests] add a retry for initializing swarm, apply to 2 tests (aptos-labs#4437)

### Description

Applied to `test_db_restore` and `test_txn_broadcast`.
Includes re-enabling `test_txn_broadcast` which was previously disabled due to flakiness.

If this helps, can follow-up to apply the retry globally.

### Test Plan

Injected a retry and at least confirmed that we can successfully create a new swarm after having created a swarm (that was itself successful).

Will have to continue bake in CICD and observe the flakiness trend.

* [aptos-rosetta] Add set voter operation

* [aptos-rosetta] Parse set voter from transaction payloads

* [smoke-test] Add Rosetta hash uniqueness checks

* [aptos-rosetta] Fix failed set operator and set voter outputs

* Limit the upper bound of `rewards_rate` (aptos-labs#4185)

This commit adds assertions and a spec to ensure `rewards_rate <= 1000000`,
and `rewards_rate <= rewards_rate_denominator`.

* [telemetry-service] remote log level configuration (aptos-labs#4367)

This commit enables telemetry log level configuration remote via the telemetry service. The updater is defined in aptos_logger as async fn which is scheduled via the telemetry service tokio runtime.

* [consensus] surface insert vote error

* [consensus] better logging

* [consensus] add missing checks

* [node-resource-metrics][bugfix] allow disk metrics for nvme disks (aptos-labs#4443)

- Stop filtering by sd prefix and filter out only loop devices
- Enable node metrics collection by default

* [Docs] Editorial cleanup.

* Add a staking proxy contract

* remove aptos_token::token::initialize_token (aptos-labs#4385)

* [Docs] Merging dev resources doc into getting started doc.

* [backup] restore coordinator: fix snapshot selection

There's an edge case where the latest state snapshot is at an newer
version than the transaction backup.

* [aptos-cli] Remove default max gas, and rely on estimation

* [aptos-cli] Handle errors from simulation in the API

* [rest-client] Add ability to describe execution status with rest client

This is used for the CLI to explain the simulation status so that the
proper error codes can be parsed.

* [backup] fix local_folder.sample.yaml

* [consensus] fix edge case of block retrieval

There's no guarantee that nodes still have blocks beyond committed block,
the current condition for sync allows a gap between local ordered round and remote committed round.

There's an edge case that remote peer may have already pruned those blocks in between and the node would
never be able to sync the gap.

The fix guarantees the node always syncs if the remote committed block doesn't exist locally and only fetch blocks that
are ancestors of committed block.

* [aptos-rosetta] Put decoded full transaction into parse transaction

* trivial: typo fix: the the

* fix typo

* trivial: move a log line

* [storage] get_rightmost_leaf take a version (aptos-labs#4123)

* [storage] get_rightmost_leaf take a version

* fixup! [storage] get_rightmost_leaf take a version

* [aptos-move] Allow updating operator without changing commission (aptos-labs#4465)

* add current token datas (aptos-labs#4379)

* [framework] Move `features.move` into stdlib so it can be used from everywhere (aptos-labs#4467)

This requires unfortunately that the boolean functions checking for features cannot be longer friends (and henceforth need
to stay and cannot be remove), but otherwise seems to be straightforward.

* Proved the `storage_gas` module (aptos-labs#4452)

- Proved that `storage_gas::on_reconfig` would never abort unexpectedly.
- Added the assertions to limit the upper bounds of certain parameters in order to prevent the arithmetic overflow

* [aptos-rosetta] Keep checking that the API isn't up forever

Logs once every 10 seconds to tell us to look at the logs

* [aptos-cli] Allow compiling scripts and proposals with local framework

The remote git framework can be cached funny, and for the need for
speed it should not have users be pulling down a git repo that they
may already have.

* [aptos-cli] Add a command to generate writeset for fire events

* [Genesis] Always create staking contracts in mainnet genesis (aptos-labs#4466)

* [Docs][Upgradability] Update docs (aptos-labs#4454)

* Fix broken compile call in CLI

* [docs] Add information about retrieving balance changes from events

* [gas] add script to generate gas schedule update proposal (aptos-labs#4471)

* [cached-packages] Update framework builder

* [aptos-cli] Move to using account transfer over coin transfer

* [aptos-cli] Change `coin` to `octa`

* [aptos-rosetta] fix broken test

* [aptos-cli] Add test for checking parameters of generate-admin-write-set

* [framework] prepare for testnet upgrade

init_module isn't called on package upgrade even for new modules, therefore we forever get to expose this initialize method so that we can call it

* [node-resource-metrics] Fix double prometheus register

Double register caused node to crash.  Now there's a mutex so it
doesn't double register

* revert Validate Move String transaction arguments  (aptos-labs#4483)

* Revert "More tests around string validation code (aptos-labs#4131)"

This reverts commit 03c0811.

* Revert "Validate Move String transaction arguments (aptos-labs#4090)"

This reverts commit f31d9c5.

* [telemetry-service] security audit fixes and related refactorings (aptos-labs#4433)

* [forge] prometheus auth option and remove opensearch support (aptos-labs#4442)

* [mempool] possibly fix fn_to_val_test (aptos-labs#4468)

### Description

The test fails and succeeds on retry in CICD sometimes: only the first transaction is propagated. There is a race condition where the broadcast from the pfn happens after txn0 is added but before txn1 is added. In that case, only txn0 is broadcasted (because the test is instrumented to only send a single message). By adding the txns first, we can avoid this.

### Test Plan

Will have to observe if flakiness goes down after land.

* [Docs] New gas doc and a minor nit in sys integ doc.

* [Docs] Fixing the API link.

* [Docs] Section header change.

* [Docs] Fixing broken link.

* [Docs] Working in review comments.

* [dashboards] sync grafana dashboards (aptos-labs#4439)

Co-authored-by: rustielin <rustielin@users.noreply.github.com>

* [storage-audit-fix] dedup some constants (aptos-labs#4423)

* [certik] fix storage comments (aptos-labs#4477)

* [Rust SDK] Increase `gas_unit_price` and fund balance in `transfer-coin` example.

* [txn-emitter] Fix ConstTps txn-emitter to wait enough for transactions to be committed (aptos-labs#4354)

Add graceful_overload test to confirm we see high TPS even when we send more txn than the system can handle.
(separate PR for high gas-fee txns going through)

Fix ConstTps txn-emitter to wait enough for transactions to be committed

When we overload the system, individual nodes can start being stale, so in order to check that transaction was definitelly not committed, we need to wait for ledger timestamp to be passed.

So change time waits to be based on ledger timestamp / txn expiration time, instead of relative.

Add some more (sampled) logs to help with future debugging

* [aptos-move] Update move dependency

* [mempool] Add counter for created block size in txn count and byte size (aptos-labs#4460)

* [mempool] Add counter for created block size in txn count and byte size

We have txn/committed block counter, but without re-serializing things there, I cannot easily get the size.
This seems an easy place to measure it, and it also enables us to easily see if different nodes are trying to
propose different sizes as well (irrespective of whether those land on the blockchain)

* [api] Fix gas cache on price estimation

Turns out I forgot to set the time for the cache, so now it's fixed

* [GHA] add branch for docker (aptos-labs#4501)

* [Genesis][Vesting] Add a convenient update_operator feature with same commission (aptos-labs#4504)

* [smoke tests] local swarm build retries 3 times (aptos-labs#4461)

### Description

Adding retries on a couple tests in aptos-labs#4437 seems to have worked. They have not flaked on local swarm startup. So expanding to all local swarm startup.

### Test Plan

(same as aptos-labs#4437) Injected a retry and at least confirmed that we can successfully create a new swarm after having created a swarm (that was itself successful).

* [configs] Update block size (txns/bytes) and make configs consistent across deployments (aptos-labs#4505)

- reduce block size to 2.5k
- corresponding max bytes should be 500-600k, to have the similar load
- I think we wanted to increase quorum poll count to be around 250ms, correct?
- made all three deployments have the same configs

* [Aptos Framework][Voting][Audit issue] Do not allow changing votes after voting is over

* [aptos-faucet] Log x-forwarded-for header

* [move-examples] add a script that makes minting operations atomic

* [cli] add support for txn params for script functions

* [cli] add support for (already) compiled scripts

they can run much faster and don't require downloading the framework.

* [faucet] move to txn script for perf

the txn script supports both operations of creating an account and
minting... this will reduce load on the faucet by up to 50%
it also simplfies the logic a lot

* Fix type on line 133 (aptos-labs#4488)

* [Token] add burn control flag (aptos-labs#4358)

* [framework] remove initializer since it isn't called

maybe we'll get lucky, we can fix this bug, and we not worry about
backward supporting it...

* [genesis] remove legacy gas schedule and update testnet.mrb

* [e2e-move-tests] integration -> unit for faster execution

integration tests are really great but they take longer to compile
especially with our system (compile + ld).
This disincentivizes using this framework since it takes longer to write
tests.

* [Token] Fix two fields in the token trasnfer event

* [TOK] Add helpers to token (aptos-labs#4516)

* fix(aptos-stdlib): typo

* [move-e2e] remove version testing since it wasn't actually testing our changes

* [move-e2e] remove legacy testing harnesses

These were used back in the day where we used to ensure we'd have
perfect compatibility with every version of our framework. I don't know
why we were so obsessed with compatibility to that extent.

We'll definitely want to improve on our versioning story, but that's
going to happen in a different way and in a different crate
(e2e-move-tests)

* [CP] remove community platform

* remove build

* observability: structured logging for faucet requests (aptos-labs#4529)

* [dev_setup] remove codegen as we don't use it anymore

* [MOVE] Format all source files to be consistent (aptos-labs#4526)

* Remove cli config from repo

* [Docs] Updating the doc to be consistent with the white paper terminology.

* [Tests] Add a test for init_module when republishing package (aptos-labs#4487)

* Update first-dapp.md

* [Staking][Minor fix] Enforce max validator set size after adding the validator (aptos-labs#4532)

* [FW] Add upsert to TableWithLength (aptos-labs#4535)

* [Voting][Minor fix] Validate that minimum vote threshold <= early resolution threshold (aptos-labs#4534)

* [aptos-rosetta] Rename operator field to new_operator and voter

* [aptos-node] Minimize overhead of local test mode

* [rosetta] Add subaccounts for staking

* [rosetta] Update types for staking

* [rosetta] Remove AccountIdentifier try into AccountAddress

* [rosetta] Move to using AccountIdentfier as an input to operations

* [rosetta] Add staking balances on account balance

* [rosetta] Remove unnecessary log line

this log line was spamming and it doesn't mean that anything is
necessarily wrong

* [rosetta] Use stake amounts rather than principal for stake pool balances

* [rosetta] Remove stake pool parsing

We will only use staking contract for event parsing from now on

* [rosetta] Remove dead code, add staking balance changes in block parsing

* [rosetta] Add original operator to voter and operator operations

* [rosetta] Add StakeReward operation to differentiate between stake rewards

* [rosetta] Add SetVoter support for get block

* [rosetta] Add set operator and set voter support

* [rosetta] Remove unknown account identifier

* [rosetta] Fix tests to support new voter and operator operations

* [rosetta] Load owner address mapping at startup

To support staking for owner addresses, you need to have
the mapping from pool address to owner address.  This loads
on startup, so it could fail immediately on startup.

* docs: add multi-signer account case to building-your-on-wallet

* docs: update properties to optioanl

* docs: update publicKey type and signature type

* [Tests only][aptos_framework] chain_id: initialize for test (to specify any chain_id) (aptos-labs#4406)

* update: chain_id - add func to update chain_id for test

* update: chain_id - add initialize_for_test instead of change_id

* [dashboards] sync grafana dashboards (aptos-labs#4499)

Co-authored-by: rustielin <rustielin@users.noreply.github.com>

* [Tests only] Add an e2e test for staking_contract flow (aptos-labs#4537)

* [mempool] fix GC'd txn metric (aptos-labs#4502)

### Description

There was a bug where GC'd txns were not showing up in metrics, because `get_insertion_time` did not return anything. Instead, directly use `txn.insertion_time`.

### Test Plan

Ran 15K TPS test. Where previously no GC's are shown, now client expired txns show up.

* [Docs] Adding Windows platform option and enhancing the doc.

* [storage] update readme

Consider the target audience to be a somewhat advanced node operator instead of Aptos developers.

1. refer to other docs for explanation of the data structure, backup format, etc.
2. comments on a complete storage config sample
3. backup and restore commanded.

* [storage] update alerts

1. warn on < 200GB disk space, critical on < 50GB -- more strict than
before were we assumed smaller disks on nodes.
2. added runbooks

I've checked the formulas still work via devnet prometheus

* [forge] Fix continuous forge tests by updating configurations (aptos-labs#4515)

- looked through consensus stress/failure tests, and updated configuration
  to hopefully make all reliable

* [backup] fix still flaky smoke test test_db_restore

There's the edge case where the only state snapshot is at a newer
version than all txn backups.

* [Node CLI] Add a command that returns the stake pool address (aptos-labs#4503)

* [gha] only copy release images to dockerhub (aptos-labs#4509)

* [backup] fix that get_transaction_iter() on non-existent version can crash node

also audited all zip_eq usages under storage

* execution readme update

* [aptos-cli] Support vector<vector<u8>> for run function and script

* execution readme update

* [Community] Add coin::unfreeze_coin_store function to coin module (aptos-labs#4065)

* comment: remove LibraRoot reference

* add coin::unfreeze_coin_store function and unit test

* run pre-commit

* trim whitespace

Co-authored-by: Paul Fidika <paul@fidika.com>

* [sdk-builder] Ensure string encoding is named a separate function

* [txn-emitter] Fix underflow when first request fails (aptos-labs#4567)

This debugging log was added to show txns around the failure, but when it is the first request, there is no previous transaction.

* Update whats-new-in-docs.md

* [dev setup] Ensure unzip installed before protoc

* [rosetta] Fix Set Voter events parsing

* [rosetta] Make operator optional to setvoter setoperator

* [move] Update to latest commit of `aptos` release branch in Move repo

This branch contains the most recent fixes on the vm.

* [gha] Conditionally run pull request and pull request target based on files changed

As a follow up to last week breakage, this attempts to run pull request and
pull request target to make sure that changes to a workflow using pull request
target are both forwards and backwards compatible.

For context, right now if you modify a workflow with pull reqeust target it
will not actually test the workflow on PR so you can break it completely and
the PR would just happily land immediately breaking all workflows!!!!

The longer term that would be ideal is getting rid of pull reqeust target
because currently its impossible to version.

Test Plan: ensure pull request running and passes on PR

* [tf/helm] delete old indexer deployment configs (aptos-labs#4565)

* [compression] limit (de)compression, to network application limit size (aptos-labs#4556)

### Description

Adding limits to compression:
* On the receiving end, a limit is needed to prevent malicious node from sending messages that decompress into a large size. Given lz4, decompression could have been as large as 2 GiB (i32::MAX).
* On the sending end, a limit is needed to ensure delivery of the compressed message to honest nodes.

Each call will have to set a limit; for network messages `MAX_APPLICATION_MESSAGE_SIZE` should be a good value as anyway the message needs to be sent over the network.

### Test Plan

Add unit test.

* [CLI] Make GetStakePool CLI easier to use and add CreateStakingContract (aptos-labs#4564)

* [forge] Implement ability to run multiple tests again

Summary:

Implement the ability to have multiple tests run in parallel balanced across clusters.

This also introduces a lot of config management for test runs and clusters so
we can spin up / spin down cluster and enable and even run disabled tests but
not care about their outcomes.

This change was landed befoer but then reverted.

Last time there was a change in the run-forge workflow itself which removed the
default suite, this caused the pre forge to fail with "no tests to run" sigh...
This is kind of a result of two things

1) Pre forge step which just leaves a comment is this weird workflow which we
only care about producing enough info to log a comment. This should just be
pushed down into forge.py itself which is more than capable of commenting on a
PR before running

2) run-forge.yaml is not really testable on a PR so a PR which completely
breaks this can actually land. I think I have a solution to this using a pull
request / pull request target depending on if you modify the workflow or not

For context pull request target is used so that we can run build / test on
community PR from forked repos once we've approved a PR. However I think there
is a compromise here. Community members really should not be making changes to
the workflows file themselves so we can use pull request on that and pull
request target on everything else. In the strange case community make a PR for
workflow we can manually test that, but frankly we should have to manually
review that anyway since its a sensitive area.

This reverts commit 4e89500.

Test Plan:

Running locally, running on PR ensure the test suite functinoality works and pr
and continuous work

* Fixing broken link.

This will change again, but for now this is the fix.

* [Docs] Cleanup of events and proof docs.

* [Docs] Merging Proof doc into txns and states doc.

* [Docs] Updating.

* [Docs] Adding redirect.

* [Docs] Fixing broken links for the deleted doc.

* [Docs] Discarding basics events doc from this pr.

* [Docs] Working in comments by Greg.

* As smoke test swarm init has stabilized, reenable some more tests (aptos-labs#4544)

### Description

Reenabling, as swarm init has stabilized

### Test Plan

Run tests in CICD. Will have to check if they show up in flaky tests.

* [forge] better sts errors (aptos-labs#4582)

* [forge] better sts errors

* [gha] randomly select continuous forge clusters

* [Genesis][Tooling] Allow skipping operator config file if validator is not joining genesis validator set

* [gha] test haproxy in forge continuously

* Fix misspellings throughout docs

* Lowercase blockchain

* [storage] delete range_delete

* [forge] Fix step summary formatting

* [gha] Run forge continuous on PR only when workflow is modified

* [forge] Make multi runner use markdown

* [forge] set default gas price

* [genesis] make genesis.sh backwards compat

* fix: fix the interface doc about getting txns for by account

* feat(ts-sdk): expose nodeUrl

* [aptos-rosetta] Remove stake balance changes and use switch operator events

* [rosetta] Put balance in set operator operation

* [rosetta] Add operators to balance stake lookups

* [rosetta] Add initialize stake pool as a construction API

* [rosetta] Add CLI support for create stake pool

* [rosetta] fix tests for staking

* [rosetta] Fix parsing of create stake pool transactions

* [rosetta] Use state changes to keep track of operator changes

* [Api] Expose Move error details for simulation endpoints

* [telemetry] codeowner file

* [forge] fix grafana namespace (aptos-labs#4602)

* [HApropxy] new haproxy config and resources

* [Docker] Adding HAProxy to docker compose

* [tokio-console] Configuring tokio-console default port

* [aptos-cli] Bump to version 0.3.6

* [storage] larger default state prune window

* refactor: add "strict" to SDK tsconfig

* [helm][aptos-node] override NodeConfig with deep merge (aptos-labs#4546)

* [helm][aptos-node] override NodeConfig with deep merge

* [forge] fix cluster autocleanup label

* [forge] default read cluster prometheus secret

* [helm][aptos-node] merge defaults to empty map

* [forge] prometheus auth tests

* add indexer to cli (aptos-labs#4586)

* Node docs (aptos-labs#4572)

* [Docs] Reflowing node docs for October milestone.

* [Docs] Draft update of deployments table.

* [Docs] More edits.

* [Docs] More updates.

* [Docs] More updates.

* [Docs] More updates.

* [Docs] More updates.

* [Docs] Landing page fix.

* [Docs] Minor fix.

* [Docs] Removing some older redirects as the target links have also changed.

* [Docs] Fixing broken link in the whats new doc.

* [Docs] Removing AIT-3 docs that are not reused, fixed broken links.

* [Docs] Using system font for the sidebar label.

* [Docs] Owner doc edits.

* [Docs] Owner doc draft.

* [Docs] Voter doc with new flowchart.

* [Docs] Validator flow and docs.

* [Docs] Connecting to aptos network doc.

* [Docs] Renamed additional doc into shutting down node.

* [Docs] Adding box shadows.

* [Docs] Validator landing page with diagrams.

* [Docs] Removing flowcharts and registering at community platform steps.

* [Docs] Adding redirects.

* [SDK][API] Remove event key endpoint / client code (aptos-labs#4338)

* [SDK][API] Remove event key endpoint / client code

* Update API spec and TS SDK client

Co-authored-by: bowenyang007 <carlyang007@gmail.com>

* [backup] tf: add back backup intervals

* [consensus] support a recovery mode if consensusdb is gone

* [forge][telemetry] enable node telemetry in forge (aptos-labs#4490)

* [storage] rename target_snapshot_size (aptos-labs#4615)

Co-authored-by: aldenhu <msmouse@gmail.com>

* [txn-emitter] Fix computation for needed coin balance (aptos-labs#4614)

With increased gas fees, we were exceeding u64. But we really shouldn't be needing that much money - and it was due to this bug

* [executor] fix integration test helper

when run via `cargo nextest -p executor`, GAS_UNIT_PRICE is 100 instead of
0 as in unit tests or when it's run via `cargo nextest --workspace`.
Better to make it work with non-zero gas charge anyway.

* enable backup-cli unit tests in ci

* [gha] Make forge namespace optional

Namespace should be optional, this allows for overlapping test runs which could
exhaust capacity, however that risk is already present and this shouldnt be our
defense against overscheduling.

Test Plan: this pr will run forge-continuous and run-forge as pull request which will actually test this! :D

* [network] limit the max fragments in stream message

* [consensus] limit the payload size and remove unused quorum store variant

* [doc] update doc to remove test mode (aptos-labs#4626)

* [gha] Enable release on PR, enable HAProxy on release test

Change the release test to run (for only 5 mins) on PR and also to enable
haproxy on the release test for more confidence around haproxy change.

Test Plan: this workflow will run for a short time on the PR itself

* [Aptos Stdlib] Make scaling factor actually useful in pool_u64

* [API Client] Fix wait_for_transaction to wait for expiry and return better messages (aptos-labs#4456)

Previously wait_for_transaction waited only for 60 seconds, and then failed, even if expiry didn't finish. 

Error message was confusing (timeout vs expired), so clarifying all messages better (so user knows if transaction can still succeed or not)

But it seems very odd to not wait to the expiry time passed in, so fixing that. 

 wait_for_transaction_by_hash(_bcs) are now configurable with:

       server_lag_timeout_after_expiry: Option<Duration>,
       timeout_from_call: Option<Duration>,

First one is to finish at some point if rest endpoint is too stale. second one is if a caller wants absolute timeout (even before expiry)

Previously default would be server_lag_timeout_after_expiry=None and timeout_from_call=60s, and I am putting that for now in all other methods (wait_for_transaction/submit_and_wait/etc), so that their logic is unchanged for now.

I changed in fund cli to wait to completion, and in forge tests which had issues with above.(which is server_lag_timeout_after_expiry=60 or 120s, and timeout_from_call=None).
 we can see later if we want to change that to be default, but not changing any logic before the cut.

* Revert "[HApropxy] new haproxy config and resources"

This reverts commit 74b8581.

* Adapt new changes

* Update goldenfiles

* Exclude `forge` tests

* Update move repository ref

* Update move repository ref

Co-authored-by: Greg Nazario <greg@gnazar.io>
Co-authored-by: Junkil Park <jpark@aptoslabs.com>
Co-authored-by: Perry Randall <perryjrandall@gmail.com>
Co-authored-by: mincheolog <33034771+ArdoDulcis@users.noreply.github.com>
Co-authored-by: Jijun Leng <jijunleng@gmail.com>
Co-authored-by: Raj Karamchedu <raj@aptoslabs.com>
Co-authored-by: Rustie Lin <rustie117@gmail.com>
Co-authored-by: Zach Denton <z@chdenton.com>
Co-authored-by: Zihan <109111707+zihan-aptos@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: rustielin <rustielin@users.noreply.github.com>
Co-authored-by: Jacob Barrett <jsbarrett88@gmail.com>
Co-authored-by: Sital Kedia <sitalkedia@users.noreply.github.com>
Co-authored-by: Igor <igor-aptos@users.noreply.github.com>
Co-authored-by: Andrew Hariri <hariria@usc.edu>
Co-authored-by: Alden Hu <msmouse@gmail.com>
Co-authored-by: Max Kaplan <1482859+CapCap@users.noreply.github.com>
Co-authored-by: Brian Cho <brian@aptoslabs.com>
Co-authored-by: igor-aptos <110557261+igor-aptos@users.noreply.github.com>
Co-authored-by: Balaji Arun <balaji@aptoslabs.com>
Co-authored-by: Zekun Li <li.zekun@gmail.com>
Co-authored-by: Kevin <kevin@aptoslabs.com>
Co-authored-by: Jeffrey Quesnelle <jquesnelle@gmail.com>
Co-authored-by: Aaron <lightmark@gmail.com>
Co-authored-by: Bowen Yang <carlyang007@gmail.com>
Co-authored-by: Wolfgang Grieskamp <wg@aptoslabs.com>
Co-authored-by: Zhou Runtian <runtian@aptoslabs.com>
Co-authored-by: Kevin <105028215+movekevin@users.noreply.github.com>
Co-authored-by: Victor Gao <10379359+vgao1996@users.noreply.github.com>
Co-authored-by: David Wolinsky <isaac.wolinsky@gmail.com>
Co-authored-by: Zekun Li <52300039+zekun000@users.noreply.github.com>
Co-authored-by: Steven Gu <steven.gu@crypto.com>
Co-authored-by: Sherry Xiao <sherryxiao.py@gmail.com>
Co-authored-by: Shih-Yu Hwang <53620329+Shih-Yu@users.noreply.github.com>
Co-authored-by: Bo Wu <bo@aptoslabs.com>
Co-authored-by: 0xbe1 <101405096+0xbe1@users.noreply.github.com>
Co-authored-by: CapCap <max@aptoslabs.com>
Co-authored-by: Christian Theilemann <christian@aptoslabs.com>
Co-authored-by: Scott <scott.ph.chou@gmail.com>
Co-authored-by: linnefromice <linnefromice@gmail.com>
Co-authored-by: Paul Fidika <fidika@hey.com>
Co-authored-by: Paul Fidika <paul@fidika.com>
Co-authored-by: Clay Murphy <114445310+clay-aptos@users.noreply.github.com>
Co-authored-by: Clay Murphy <claymurphy@Clays-Air.localdomain>
Co-authored-by: Balaji Arun <balajia@vt.edu>
Co-authored-by: Alex Markuze <alexm@aptoslabs.com>
Co-authored-by: Daniel Porteous (dport) <danielporteous1@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CICD:run-e2e-tests when this label is present github actions will run all land-blocking e2e tests from the PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants