[Milestone] Waku Network Can Support 10K Users #12

Closed
19 of 22 tasks
jm-clius opened this issue Jan 20, 2023 · 20 comments
Comments


jm-clius commented Jan 20, 2023

Priority Tracks: Secure Scalability
Due date: 31 May 2023
Milestone: https://github.com/waku-org/pm/milestone/5

Note: this deadline assumes that the target of 1 million users by end-June 2023 can build, for the largest part, on the solutions designed for the problem space defined below.

Summary

  • Scale to 10K Status Community users, spread across ~10 to ~100 communities
  • This milestone focuses on 100% Desktop users, primarily using relay, but with experimental/beta support for filter and lightpush for clients with poor connectivity
  • Communities, private group chats and 1:1 chats should be considered. Public chats are excluded.

Tasks / Epics


Extracted questions

  • Are the numbers of users and communities realistic? Answer on 2023-01-19: yes, makes sense as an initial goal
  • What is the proportion (in message rate and bandwidth) of community messages vs community control messages vs store query-responses?
  • Does message rate increase linearly with network size? Answer on 2023-01-19: generally this should be the case (could have a multiplicative factor, but not combinatorial or exponential)
  • What bandwidth upper bound should we target for Desktop nodes? One possible answer: the ADSL2+ limit of 3.5 Mb/s? (See the back-of-envelope sketch below.)
  • Can this MVP consider participation in only one Community at a time? Answer on 2023-01-24: nodes will be part of multiple communities from the beginning.
  • What store query rate should we target for 10K users?
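
As a back-of-envelope sketch of how the bandwidth question above could be approached: the snippet below multiplies an assumed message rate, message size, gossipsub duplication factor and shard count into a per-node relay bandwidth figure and compares it against the ADSL2+ ceiling. All numeric inputs are illustrative assumptions, not measured Status Community rates.

```go
package main

import "fmt"

func main() {
	// All of the numbers below are illustrative assumptions, not measured
	// Status Community figures.
	const (
		msgsPerSecond = 10.0   // assumed chat + control message rate on one shard
		avgMsgBytes   = 1500.0 // assumed average Waku message size in bytes
		duplication   = 6.0    // assumed gossipsub duplication factor (~mesh degree D)
		shardsPerNode = 2.0    // assumed number of shards a desktop node relays
		adslLimitMbps = 3.5    // ADSL2+ limit mentioned in the question above
	)

	// Per-shard relay traffic: every message is received and forwarded to
	// roughly (D - 1) mesh peers, so bandwidth scales with the duplication factor.
	bitsPerSecond := msgsPerSecond * avgMsgBytes * 8 * duplication * shardsPerNode
	mbps := bitsPerSecond / 1e6

	fmt.Printf("estimated relay bandwidth: %.2f Mb/s (target ceiling: %.1f Mb/s)\n",
		mbps, adslLimitMbps)
}
```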

Network requirements

Note: this gathers the minimal set of requirements the Waku network must meet to support Status Communities scaling to 10K users. It does not propose a design.

1. Message Delivery and Sharding

Note: this section, especially, depends on the minimal user-experience requirements defined by the app. E.g. the app knows what (sub)set of messages is necessary "for a consistent experience", and this will feed into a pubsub topic, content topic and sharding design that does not compromise on UX. This process should also define when messages should be received "live" (relay) or opportunistically via history queries (store).

  1. Nodes should be able to receive (via relay or store) all community messages of the community they're part of.
  2. Nodes should receive live (via relay) all chat messages that are necessary for a consistent experience. A chat message is content sent by users either in a community channel, 1:1 or private group.
  3. Nodes should receive live (via relay) all control messages that are necessary for a consistent experience. Control messages are mostly used for community reasons, with some for 1:1 and private groups (e.g. online presence and X3DH bundle).
  4. Each community can use one or more shards for control and community messages, as long as requirements (1) - (3) still hold. (A minimal topic-naming sketch follows the assumptions below.)
  5. Nodes should participate in shards in such a way that resource usage (especially bandwidth) is minimized, while requirements (1) - (3) still hold.
  6. Peer and connection management should be sufficient to allow nodes to maintain a healthy set of connections within each shard they participate in.

Assumptions:

  • connectivity, NAT traversal capability, NAT hole punching, etc. is similar to that described in the Status MVP: Status Core Contributors use Status #7. No further work is required within the context of this MVP.
  • it is possible to be part of several communities simultaneously
  • we assume that community size is such that community desktop nodes can realistically be expected to relay all community traffic. That is, communities can be responsible for their own relay infrastructure.
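
For reference, the static sharding design this milestone converged on names relay shards with pubsub topics of the form /waku/2/rs/<cluster-id>/<shard-id> (see the relay sharding RFC). The sketch below shows that topic format plus one possible way a community identifier could be mapped deterministically onto a fixed shard set; the hash-based assignment and the cluster id are assumptions for illustration, not the scheme specified in the RFC.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardTopic formats a static sharding pubsub topic following the
// /waku/2/rs/<cluster-id>/<shard-id> naming convention.
func shardTopic(clusterID, shardID uint16) string {
	return fmt.Sprintf("/waku/2/rs/%d/%d", clusterID, shardID)
}

// communityShard deterministically maps a community ID onto one of
// numShards shards. The FNV hash here is purely illustrative; the real
// assignment is an application/RFC decision.
func communityShard(communityID string, numShards uint16) uint16 {
	h := fnv.New32a()
	h.Write([]byte(communityID))
	return uint16(h.Sum32()) % numShards
}

func main() {
	const clusterID = 16 // example cluster id (assumption for illustration)
	shard := communityShard("example-community", 8)
	fmt.Println("community pubsub topic:", shardTopic(clusterID, shard))
}
```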

2. Discovery

  1. Nodes should be able to discover peers within each shard they're interested in.
  2. Discovery method(s) can operate within a single or multiple shards, as long as:
  • requirement (1) still holds
  • nodes can bootstrap the chosen discovery method(s) for shards they're interested in
  • the chosen discovery method(s) does not add an unreasonable resource burden on nodes, especially if this method is shared between shards

Assumptions:

  • nodes are able to use discv5 as their main discovery method

3. Bootstrapping

  1. Nodes should be able to initiate connection to bootstrap nodes within the shards they're interested in.
  2. Bootstrap nodes can serve a single or multiple shards, as long as they can handle the added resource burden.

Assumptions:

  • Status initially provides bootstrapping infrastructure.
  • DNS discovery is sufficient to find initial bootstrap nodes.
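
For context, DNS discovery follows EIP-1459, where a bootstrap list is addressed by an enrtree:// URL combining a signing public key and a DNS name. A minimal sketch of splitting such a URL into its two parts follows; the URL in the example is made up, not a real Waku fleet entry.

```go
package main

import (
	"fmt"
	"strings"
)

// splitEnrTreeURL splits an EIP-1459 DNS discovery URL of the form
// enrtree://<base32-public-key>@<fqdn> into its key and domain parts.
func splitEnrTreeURL(url string) (pubkey, domain string, err error) {
	const prefix = "enrtree://"
	if !strings.HasPrefix(url, prefix) {
		return "", "", fmt.Errorf("not an enrtree URL: %s", url)
	}
	parts := strings.SplitN(strings.TrimPrefix(url, prefix), "@", 2)
	if len(parts) != 2 {
		return "", "", fmt.Errorf("missing @<fqdn> part: %s", url)
	}
	return parts[0], parts[1], nil
}

func main() {
	// Hypothetical example URL; real fleet URLs are published by the fleet owner.
	key, domain, err := splitEnrTreeURL("enrtree://AOFTICU2XWDULNLZGRMQS4RIZ@boot.example.org")
	if err != nil {
		panic(err)
	}
	fmt.Println("signing key:", key, "domain:", domain)
}
```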

4. Store nodes (Waku Archive)

  1. Nodes should be able to find capable store nodes and query history within the shards they're interested in.
  2. Store nodes can serve a single or multiple shards, as long as:
  • they can handle the query rate and resource burden
  • they are discoverable as stated in requirement (1)

Assumptions:

  • Status provides initial store infrastructure, including a performant Waku Archive implementation.
  • PostgreSQL implementations exist for Waku Archive that can deal with the required rate of parallel queries to support 10K users. (See the connection-pooling sketch below.)
  • DNS discovery is sufficient to discover capable store nodes (these may or may not be the same nodes as used for bootstrapping, but discovery will be simpler if they are).
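
On the PostgreSQL assumption above, the property that matters is that the archive backend can serve many store queries in parallel. A minimal connection-pooling sketch using Go's standard database/sql pool is shown below; the driver choice, DSN and table schema are placeholders, not the nwaku archive implementation.

```go
package main

import (
	"context"
	"database/sql"
	"log"
	"sync"
	"time"

	_ "github.com/lib/pq" // placeholder Postgres driver choice
)

func main() {
	// DSN is a placeholder; a real archive node gets this from configuration.
	db, err := sql.Open("postgres", "postgres://waku:waku@localhost:5432/archive?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// database/sql maintains a pool, so concurrent store queries map onto
	// parallel Postgres connections instead of serialising on one connection.
	db.SetMaxOpenConns(20)
	db.SetMaxIdleConns(10)
	db.SetConnMaxLifetime(30 * time.Minute)

	var wg sync.WaitGroup
	for i := 0; i < 50; i++ { // 50 concurrent history queries
		wg.Add(1)
		go func() {
			defer wg.Done()
			ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
			defer cancel()
			// Placeholder query and schema, for illustration only.
			var count int
			if err := db.QueryRowContext(ctx,
				"SELECT count(*) FROM messages WHERE contenttopic = $1", "/example/1/chat/proto",
			).Scan(&count); err != nil {
				log.Println("query failed:", err)
			}
		}()
	}
	wg.Wait()
}
```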

5. Security

  1. Community members should not be vulnerable to simple DoS/spam attacks as defined in (3) and (4) below.
  2. Each community should be unaffected by failures and DoS/spam attacks in other communities. This implies some isolation/sharding in the messaging domain.
  3. Store/Archive:
    • store nodes for a community should only archive messages actually originating from the community
    • store nodes for a community should not be vulnerable to being taken down by a high rate of history queries
  4. Relay:
    • community relay nodes should only relay messages actually originating from the community.

Assumptions:

  • Community members (i.e. the application) are able to validate messages against community membership.
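
To make this assumption concrete, the relay-side protection in requirement (4) can be pictured as a message validator plugged into relay: only messages that pass an application-level community membership check are forwarded. The sketch below is a generic illustration with a stubbed membership check, not the actual status-go or nwaku validation logic.

```go
package main

import (
	"fmt"
	"strings"
)

// WakuMessage is a simplified stand-in for the real Waku message type.
type WakuMessage struct {
	ContentTopic string
	Payload      []byte
}

// communityValidator returns a relay validation function for one community.
// isMember is the application-level membership check (e.g. against the
// community member list) that the assumption above refers to.
func communityValidator(communityTopicPrefix string, isMember func(payload []byte) bool) func(WakuMessage) bool {
	return func(msg WakuMessage) bool {
		// Reject messages that are not on this community's content topics.
		if !strings.HasPrefix(msg.ContentTopic, communityTopicPrefix) {
			return false
		}
		// Reject messages whose payload does not map to a community member.
		return isMember(msg.Payload)
	}
}

func main() {
	// Placeholder content topic prefix and membership check.
	validate := communityValidator("/community-x/1/", func(payload []byte) bool {
		return len(payload) > 0
	})
	fmt.Println(validate(WakuMessage{ContentTopic: "/community-x/1/chat/proto", Payload: []byte("hi")}))
	fmt.Println(validate(WakuMessage{ContentTopic: "/other/1/chat/proto", Payload: []byte("hi")}))
}
```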

Other requirements

Note: this gathers the minimal set of requirements outside the Waku network (e.g. operational, testing, etc.) to support Status Communities scaling to 10K users.

1. Kurtosis network testing

A simulation framework and initial set of tests that can approximate:

  • the protocols
  • the discovery methods
  • the traffic rates for a typical community
    in such a way as to demonstrate the viability of any scaling design proposed to achieve the Network Requirements

2. Community Protocol hardening

The Community Chat Protocol specifications should be moved to the Vac RFC repo.

  • What else is required within this MVP time frame, e.g. including Community Chat in Kurtosis testing?

3. Nwaku integration testing

Nwaku requires integration testing and automated regression testing for releases to improve trust in the stability of each release.

4. Fleet ownership

Ownership of the infrastructure provided to Status communities should be established. This may require training and a transfer of responsibilities, which currently lie de facto with the nwaku team.
Fleet ownership comprises the responsibility for:

  • establishing a sensible upgrade process (may require some nodes for staging)
  • upgrading fleets
  • monitoring existing fleets and protocol behavior
  • providing support and logging bugs when noticed

Initial work

The requirements above will lead to a design and task breakdown. Roughly, the order of work is:

Ownership for all three items below is shared between Vac, Waku and Status teams:

(1) Agree on requirements above as the complete and minimal set to achieve the 10K scaling goal.
(2) A viable, KISS network design adhering to "Network requirements"
(3) Task breakdown of each item and ownership assignment

@fryorcraken added this to Waku Jan 20, 2023
@jm-clius moved this to In Progress in Waku Jan 23, 2023
@corpetty

tagging Testing team: @AlbertoSoutullo, @0xFugue, @Daimakaimura


jm-clius commented Jan 25, 2023

Achieving network requirements: tasks and ownership

NB: requirements and tasks may change as we encounter unknowns. The task breakdown below assumes that nothing more has to be done for Discovery and Bootstrapping other than proper configuration (to be described in the Scaling Strategy BCP).

1. Verify scaling target requirements

Understand the expected:

  • rate of chat messages
  • rate of control messages
  • rate of store queries
  • avg expected bandwidth usage of each of the above

for 10K community users.

Note: this is not necessarily an analytical exercise, but ballpark figures and a sanity check of current Status Community message rates. @Menduist has done an analysis of message rates in large Discord servers to get a rough estimate of what we would expect to see for Status Communities. However, analysis of an existing Status Community shows significantly higher message rates and bandwidth usage. See conversation.

Tracked in: ??
Owners:

2. Community sharding plan

Sharding strategy for Waku relay in general and Status Communities specifically. This plan will consider short term and longer term strategies.
This item is set out in more detail in @kaiserd's Secure Scaling Roadmap.

Tracked in: vacp2p/research#154
Owners:

3. Simple Waku Relay DoS mitigation

Strategy and implementation to protect relay and store against simple DoS attack vectors. This item is set out in more detail in @kaiserd's Secure Scaling Roadmap.

Tracked in: vacp2p/research#164
Owners:

4. Scalable storage: nwaku archive PostgreSQL implementation

Already part of #8 but repeated here for completeness.
Note that this includes work to allow concurrent queries.

Tracked in: #4
Owners:

  • postgresql backend: @cammellos
  • postgresql chronos/concurrency adaptation: ??

5. Scalable storage: deterministic message ID

Tracked in: vacp2p/rfc#563
Owners:
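
For illustration: a deterministic message ID is derived from the message content itself, so every node computes the same ID for the same message and duplicates can be detected in store and relay. The exact field set and encoding are defined in vacp2p/rfc#563; the field selection in the sketch below is an assumption.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// deterministicMessageID hashes fields that identify a message so that every
// node derives the same ID for the same message. The exact field list and
// concatenation order are defined in the RFC; this selection is illustrative.
func deterministicMessageID(pubsubTopic, contentTopic string, payload []byte, timestampNs int64) [32]byte {
	h := sha256.New()
	h.Write([]byte(pubsubTopic))
	h.Write([]byte(contentTopic))
	h.Write(payload)
	ts := make([]byte, 8)
	binary.BigEndian.PutUint64(ts, uint64(timestampNs))
	h.Write(ts)
	var id [32]byte
	copy(id[:], h.Sum(nil))
	return id
}

func main() {
	id := deterministicMessageID("/waku/2/rs/16/32", "/example/1/chat/proto", []byte("hello"), 1700000000000000000)
	fmt.Printf("message id: %x\n", id)
}
```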

6. Scalable storage: testing store at scale

Basic testing to see that PostgreSQL implementation works at expected message and query rates. (Note this is in addition to simulation with Kurtosis).

Tracked in: ??
Owner: @LNSD

7. Filter and lightpush improvements

Revising the RFCs and implementations in nwaku and go-waku. Already part of #8 but repeated here for completeness.

Tracked in: #5
Owners:

8. Peer management strategy

RFC for basic peer management strategy and implementations in nwaku and go-waku.

Tracked in: waku-org/nwaku#1353
Owners:
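
As a rough illustration of the simplest form such a strategy can take, each node keeps its relay connections per shard within a target band: dialling discovered peers when below target and pruning when above. The sketch below is a generic loop, not the nwaku or go-waku peer manager.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Peer is a simplified stand-in for a discovered or connected relay peer.
type Peer struct{ ID string }

// maintainShardConnections keeps the connected set for one shard between
// minPeers and maxPeers: dial candidates when under target, prune random
// peers when over. Real implementations also score peers, apply backoff, etc.
func maintainShardConnections(connected []Peer, candidates []Peer, minPeers, maxPeers int) (dial, prune []Peer) {
	if len(connected) < minPeers {
		need := minPeers - len(connected)
		if need > len(candidates) {
			need = len(candidates)
		}
		dial = candidates[:need]
	} else if len(connected) > maxPeers {
		excess := len(connected) - maxPeers
		rand.Shuffle(len(connected), func(i, j int) { connected[i], connected[j] = connected[j], connected[i] })
		prune = connected[:excess]
	}
	return dial, prune
}

func main() {
	connected := []Peer{{"a"}, {"b"}}
	candidates := []Peer{{"c"}, {"d"}, {"e"}}
	dial, prune := maintainShardConnections(connected, candidates, 4, 8)
	fmt.Println("dial:", dial, "prune:", prune)
}
```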

9. Combine into comprehensive scaling strategy

This can be seen as the final goal for all the moving parts and separate tasks listed above. Output will likely take the form of one or more Best Current Practices RFCs that focus on the Status 10K use case. It will bring together the short term strategies for sharding, DoS mitigation, bootstrapping, discovery and store configuration. It may include suggestions on when to use lightpush and filter rather than relay.

Tracked in: vacp2p/research#165
Owners:

10. Targeted dogfooding

This is in addition to simulation with Kurtosis. Individual owners of each task will be responsible for testing and dogfooding their strategies/features. This task ensures that we have considered each item for targeted network testing, including:

  • Sharding strategy
  • Store at scale
  • Bootstrapping
  • Discovery (DNS, discv5 and Waku Peer Exchange)
  • Filter and lightpush

Tracked in: ??
Owner: @jm-clius

11. New multiaddrs discovery: libp2p rendezvous

Although it is possible to encode multiaddrs in ENRs, which are currently being exchanged by all existing discovery methods, ENRs are limited in size and can consequently not contain more than one or two multiaddrs (see the size estimate below). We need a discovery method more suitable for multiaddrs. We have chosen libp2p rendezvous as the solution here.

Tracked in: vacp2p/research#176
Owners:
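
To make the size constraint concrete: EIP-778 caps an ENR at 300 bytes, and the signature, public key and standard address fields consume a large share of that, leaving room for only a small number of full multiaddrs. The byte counts in the sketch below are rough assumptions for illustration.

```go
package main

import "fmt"

func main() {
	const enrMaxBytes = 300 // EIP-778 limit on total ENR size

	// Rough, assumed sizes of the fields an ENR already carries.
	overhead := 64 /* signature */ + 33 /* compressed secp256k1 pubkey */ +
		8 /* seq */ + 20 /* id, ip, tcp, udp keys+values, approx. */

	// Assumed size of one full multiaddr (e.g. dns4 host + tcp + wss),
	// stored in binary form inside the ENR.
	const multiaddrBytes = 60

	remaining := enrMaxBytes - overhead
	fmt.Printf("space left for multiaddrs: %d bytes (~%d multiaddrs)\n",
		remaining, remaining/multiaddrBytes)
}
```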

12. Waku static sharding implementation

This is an outflow of the Community sharding plan as specified by @kaiserd and covers the implementation portion, including configuration and enabling shard discovery via ENRs.

Owners:


jm-clius commented Feb 2, 2023

Achieving other requirements: tasks and breakdown

1. Wakurtosis: first network test

This is described in vacp2p/wakurtosis#7

It covers testing the scalability of the relay protocol, specifically measuring:

  • latency
  • reliability
  • resource usage (bandwidth, CPU, memory)
    against network size and message rate.

Owner:

2. Wakurtosis: analyze first test results

This step will either confirm our (positive) assumptions about relay scalability or highlight bottlenecks/bugs in the protocol or implementations, which must be addressed and considered in the overall network roadmap.

Owners:

4. Wakurtosis: plan next tests

This is a collaborative task flowing from the results of the first test to refine the simulation(s) and plan the next, most useful tests.

Owners:

5. Community Protocol: move to Vac RFC repo

This is an administrative step. It may require updating the RFC to match the latest implementation, moving sections around, etc.

Owner:

6. Community Protocol: review protocols

Grasping the content of each protocol and how it maps to real-world Waku network traffic. This is potentially an involved task, so the scope should be minimized for this MVP. This relates to Verify scaling target requirements under the Network Requirements.

Owner:

7. Nwaku hardening: Wakurtosis sandbox machine

Provisioning performant machine(s) that the dev team can use to sandbox-test features using ad-hoc Wakurtosis deployments.

Owners:

  • waku (requirements): @jm-clius
  • infra (devops): ??

8. Nwaku hardening: Wakurtosis integration testing

Integration test environment for nwaku. Most likely it will take the form of a pipeline that deploys a Wakurtosis network topology and runs a series of scripted integration tests for nwaku.

Owners:

  • waku (requirements): @jm-clius
  • infra (devops): ??

9. Nwaku hardening: release automation

Automated release pipeline for nwaku that builds a release, compiles release notes, and publishes release binaries and a tagged Docker image for the most common OSs/architectures.

Tracked in: waku-org/nwaku#611
Owners:

  • waku (requirements): @jm-clius
  • infra (devops): ??

10. Fleet ownership: set requirements

Create a document that summarizes all the common tasks that a fleet owner generally has to do, including deployment, monitoring and debugging. This will also allow us to communicate to other platforms planning on deploying their own Waku fleets what they need to consider. The document should include a section on what Status fleet ownership specifically entails, including a procedure to log and escalate bugs/network anomalies.

Owner: @jm-clius

11. Fleet ownership: training

Based on the requirements determined above, determine who will take ownership of the Status fleets and schedule training sessions.

Owner:


LNSD commented Feb 14, 2023

5. Scalable storage: deterministic message ID

Tracked in: vacp2p/rfc#563
Owners:

waku (protocol): @LNSD
nwaku (implementation): @LNSD
go-waku (implementation): @richard-ramos

Progress on this initiative, also known as Message Unique ID (MUID), is tracked in the following issue: waku-org/nwaku#1914


fryorcraken commented Aug 10, 2023

Thoughts on current status:

  • 1. Verify scaling target requirements

Several discussions have happened. The outputs I am aware of are:

vacp2p/research#177

@jm-clius @richard-ramos did we have more to this?

This can be closed as static sharding was delivered. The quoted issue also tracks work for the 1 mil milestone.

vacp2p/research#164 (comment)

waku-org/nwaku#1888 (comment)

This needs clean-up. Implementation of MUID to avoid duplicates in store is done, which was the main reason to do it for 10k.
Moving forward, we could use MUID for the gossipsub seen-message logic; is that something we need for 1 mil?

Then, MUID is possibly going to be used for the Distributed Store.

@jm-clius please confirm

  • 6. Scalable Storage: testing store at scale

vacp2p/research#191 (comment)

@jm-clius were we thinking DST simulation for this?

#5 (comment)

waku-org/nwaku#1353 (comment)

@jm-clius this seems done. Not sure if we tracked an output somewhere?

  • 10. Targeted dogfooding

I suggest descoping this from Waku work. By delivering this milestone we enable Status to integrate Waku tech and start dogfooding. We are tracking hardening of Waku protocols as part of waku-org/research#3 with 2.1

vacp2p/research#176 (comment)

  • 12. Waku static sharding implementation

Done. What issue tracked the work/output? @jm-clius

  • Setup staging fleet with static sharding for Status dogfooding

Last remaining task. Are we tracking somewhere @jm-clius ?
edit: is this it? status-im/status-go#3528

  • Specify fleet ownerships requirements to enable Status team to maintain own fleet

The other last remaining task. Are we tracking somewhere @jm-clius ?

@jm-clius

Thanks for revising, @fryorcraken. See my comments below.

Several discussions have happened. The outputs I am aware of are:
vacp2p/research#177
@jm-clius @richard-ramos did we have more to this?

Afaik many of the suggestions have been implemented or are in the process of being implemented, also in status-go. @richard-ramos may have a better idea of the current status. Perhaps the work that's being done in status-go should be tracked there, which would mean the Waku side can be closed?

This can be closed as static sharding was delivered. The quoted issue also tracks for 1mil.

I agree.

This needs clean-up. Implementation of MUID to avoid duplicates in store is done, which was the main reason to do it for 10k.
Moving forward, we could use MUID for the gossipsub seen-message logic; is that something we need for 1 mil? Then, MUID is possibly going to be used for the Distributed Store.

Yes, I would close vacp2p/rfc#563 as the only issue really needed for the 10K milestone. We also don't need to do anything else for the 1 mil milestone, but we can keep #9 open to track the work that would be necessary for the distributed store.

vacp2p/research#191 (comment)

@jm-clius were we thinking DST simulation for this?

Initially, yes. But I think a reasonable step for the 10K epic would be (a) dogfooding and (b) local stress-testing of postgresql.

  1. Combine into comprehensive scaling strategy Roadmap(SeM): Application Protocols vacp2p/research#165
    @jm-clius this seems done. Not sure if we tracked an output somewhere?

Yes, I've gone ahead and closed the issue. The output here was just moving the RFCs to vac repo and revising them.

  1. Waku static sharding implementation
    Done. What issue tracked the work/output? @jm-clius

Main tracking issue was: #15 which I think can just be closed. There were also tracking issues in nwaku (and probably go-waku/js-waku).

Setup staging fleet with static sharding for Status dogfooding
Last remaining task. Are we tracking somewhere @jm-clius ?
edit: is this it? status-im/status-go#3528

No, the first fleet that can be used for initial tests/dogfooding is tracked here: status-im/infra-waku#1 Since this fleet has been deployed, this issue can probably be closed. This is not quite a staging fleet for Status yet, which I'll link to the issue I create for the Status fleet requirements below.

Specify fleet ownerships requirements to enable Status team to maintain own fleet
The other last remaining task. Are we tracking somewhere @jm-clius ?

It is now: #61 Not a very detailed issue, but should do the trick. :)

@richard-ramos

I think the suggestions from vacp2p/research#177 have not been implemented, or I could not find them in the status-go code.

@fryorcraken

Weekly Update

All software has been delivered. Pending items are:

@fryorcraken changed the title from "[Epic] Waku Network Can Support 10K Users" to "[Milestone] Waku Network Can Support 10K Users" Aug 24, 2023
@fryorcraken added the Deliverable label and removed the Epic label Aug 24, 2023
@fryorcraken changed the title from "[Milestone] Waku Network Can Support 10K Users" to "[Epic] Waku Network Can Support 10K Users" Aug 24, 2023
@fryorcraken added the Epic label and removed the Deliverable label Aug 24, 2023
@fryorcraken changed the title from "[Epic] Waku Network Can Support 10K Users" to "[Milestone] Waku Network Can Support 10K Users" Aug 25, 2023
@fryorcraken added the Deliverable label and removed the Epic label Aug 25, 2023
@fryorcraken

Monthly Update

Staging fleet for Status (static sharding + Postgres) has been defined and handed over to infra: waku-org/nwaku#1914
Stress testing of PostgreSQL in progress: INSERT done, SELECT in progress.

@fryorcraken

1k nodes simulation blogpost: vacp2p/vac.dev#123


fryorcraken commented Oct 23, 2023

Weekly Update

  • achieved:
    • Vac/DST team has done further runs with up to 600 nodes in the network as part of wrapping up a blog post report.
    • Staging fleet for Status with static sharding and PostgreSQL deployed and being tested by go-waku team using local changes in Status Desktop.
  • next:
    • Dogfooding of Status Desktop with Status staging fleet. Will aim to create a small internal Waku community.
    • Continue integration of static sharding in status-go.
  • risks:
    • Dependency on Vac/DST to conclude ~1k nodes simulations.
    • PostgreSQL implementation has not yet been proven more performant than SQLite. Further improvements and testing in progress.
    • Implementation of static sharding in Status Communities and design decisions mostly driven by go-waku developer, with minimal input from Status dev (1, 2, 3). See status-go#4057 for remaining work. Mitigation by on-boarding Chat SDK lead on 6 Nov to drive effort.


fryorcraken commented Oct 31, 2023

Weekly Update

  • Integration of static sharding in go-waku is continuing (see updates below).

  • Testing of PostgreSQL identified some performance improvements that are now being implemented.

  • Internal instructions have been distributed to dogfood static sharding with the Waku team (Waku Discord private channel).

  • risks:

    • Dependency on Vac/DST to conclude ~1k nodes simulations.
    • Implementation of static sharding in Status Communities and design decisions mostly driven by go-waku developer, with minimal input from Status dev (1, 2, 3). See status-go#4057 for remaining work. Mitigation by on-boarding Chat SDK lead on 6 Nov to drive effort.
    • lack of confidence in simulation results: results so far exhibit various artifacts and anomalies seemingly related to tooling limitations. It is therefore difficult to draw firm conclusions regarding Waku scalability.


jm-clius commented Nov 3, 2023

Weekly Update

  • achieved:
    • further PostgreSQL optimisations nearing conclusion
    • implemented bridge to allow Status Community to move to static sharding with backwards compatibility towards default pubsub topic
    • solution for shared bootstrap nodes being filtered out in discv5 as more static shards are activated
    • ensured no unknown blockers from Waku's side to start dogfooding in conversation with Status Communities
  • next:
    • continue integration of static sharding in status-go.
    • deploy bridge for backwards compatibility
    • dogfooding of Status Desktop with Status staging fleet. Will aim to create a small internal Waku community
  • risks:
    • Dependency on Vac/DST to conclude ~1k nodes simulations.
    • Implementation of static sharding in Status Communities and design decisions mostly driven by go-waku developer, with minimal input from Status dev (1, 2, 3). See status-go#4057 for remaining work. Mitigation by on-boarding Chat SDK lead on 6 Nov to drive effort.
    • lack of confidence in simulation results: results so far exhibit various artifacts and anomalies seemingly related to tooling limitations. It is therefore difficult to draw firm conclusions regarding Waku scalability.
    • lack of clarity in terms of Status fleet ownership, monitoring and maintenance, which is an integral part of the solution.

@jm-clius closed this as completed Nov 3, 2023
@github-project-automation bot moved this from In Progress to Done in Waku Nov 3, 2023
@jm-clius reopened this Nov 3, 2023

jm-clius commented Nov 10, 2023

Weekly Update

  • achieved:
    • final PostgreSQL optimisations completed. Benchmarks published: https://www.notion.so/Postgres-e33d8e64fa204c4b9dcb1514baf9c582
    • added "debug nodes" with trace-level message logging to each Status fleet to allow for easier e2e message traceability
    • confirmed no unknown blockers from Waku's side to continue dogfooding in conversation with Status Communities
  • next:
    • continue integration of static sharding in status-go.
    • dogfooding of Status Desktop with Status staging fleet. Will aim to create a small internal Waku community
  • risks:
    • Dependency on Vac/DST to conclude ~1k nodes simulations.
    • Implementation of static sharding in Status Communities and design decisions mostly driven by go-waku developer, with minimal input from Status dev (1, 2, 3). See status-go#4057 for remaining work. Mitigation by on-boarding Chat SDK lead on 6 Nov to drive effort.
    • lack of confidence in simulation results: results so far exhibit various artifacts and anomalies seemingly related to tooling limitations. It is therefore difficult to draw firm conclusions regarding Waku scalability.
    • lack of clarity in terms of Status fleet ownership, monitoring and maintenance, which is an integral part of the solution.

@jm-clius

Weekly Update

  • achieved:
    • closed last PostgreSQL issue for Store scalability
    • confirmed no unknown blockers from Waku's side to continue dogfooding in conversation with Status Communities
    • started team-internal dogfooding of a test community using static sharding
    • started fleet ownership handover process: published guidelines/list of responsibilities - https://www.notion.so/Fleet-Ownership-7532aad8896d46599abac3c274189741
  • next:
  • risks:
    • Dependency on Vac/DST to conclude ~1k nodes simulations.
    • Implementation of static sharding in Status Communities and design decisions mostly driven by go-waku developer, with minimal input from Status dev (1, 2, 3). See status-go#4057 for remaining work. Mitigation by on-boarding Chat SDK lead on 6 Nov to drive effort.
    • lack of confidence in simulation results: results so far exhibit various artifacts and anomalies seemingly related to tooling limitations. It is therefore difficult to draw firm conclusions regarding Waku scalability.
    • lack of clarity in terms of Status fleet ownership, monitoring and maintenance, which is an integral part of the solution.


jm-clius commented Nov 27, 2023

Weekly Update

  • achieved:
  • next:
  • risks:
    • Fleet Ownership doc defines fleet maintainer and owner. The Status team has yet to clarify who the fleet owner for Status Communities is.
    • QA by the Status team is to be planned on the staging static sharding fleet; the Waku team has done internal dogfooding (report). Any change to the staging static sharding fleet should then be tested by QA before being deployed to prod (e.g. # of Postgres instances). Status has committed to this testing on the 28 Nov call.
    • Status team expressed the intention to deploy a static sharding prod fleet and use it for all users: this is not recommended until proper QA is done on the staging static sharding fleet, as it could impact other Status app activities.
    • Implementation of static sharding in Status Communities and design decisions mostly driven by go-waku developer, with minimal input from Status dev (1, 2, 3). See status-go#4057 for remaining work. Mitigation by on-boarding Chat SDK team since November 2023 to drive effort.
    • Dependency on Vac/DST to conclude ~1k nodes simulations; lack of confidence in simulation results: results so far exhibit various artifacts and anomalies seemingly related to tooling limitations. It is therefore difficult to draw firm conclusions regarding Waku scalability.

@fryorcraken

We will run one more week of internal dogfooding of static sharding + PostgreSQL in Status Communities.
Once done, and if no new issues are found, we will close this issue.

The go-waku and Waku chat SDK team will continue to support Status with their integration of Waku v2, but no major effort is scheduled in terms of software development and testing.


jm-clius commented Dec 8, 2023

Weekly Update

  • achieved:
  • next:
  • risks:
    • Fleet Ownership doc defines fleet maintainer and owner. The Status team has yet to clarify who the fleet owner for Status Communities is.
    • QA by the Status team is to be planned on the staging static sharding fleet; the Waku team has done internal dogfooding (report). Any change to the staging static sharding fleet should then be tested by QA before being deployed to prod (e.g. # of Postgres instances). Status has committed to this testing on the 28 Nov call.
    • Status team expressed the intention to deploy a static sharding prod fleet and use it for all users: this is not recommended until proper QA is done on the staging static sharding fleet, as it could impact other Status app activities.
    • Implementation of static sharding in Status Communities and design decisions mostly driven by go-waku developer, with minimal input from Status dev (1, 2, 3). See status-go#4057 for remaining work. Mitigation by on-boarding Chat SDK team since November 2023 to drive effort.
    • Dependency on Vac/DST to conclude ~1k nodes simulations; lack of confidence in simulation results: results so far exhibit various artifacts and anomalies seemingly related to tooling limitations. It is therefore difficult to draw firm conclusions regarding Waku scalability.

@fryorcraken

#97 is now done. Status QA is proceeding with testing.
Most changes are now focused on status-go with ad hoc bug/issue investigation from Waku team. This Milestone can now be closed 🎉

@chair28980 removed the Deliverable label Aug 1, 2024