[Milestone] Waku Network Can Support 10K Users #12

Closed
19 of 22 tasks
jm-clius opened this issue Jan 20, 2023 · 20 comments
Comments


jm-clius commented Jan 20, 2023

Priority Tracks: Secure Scalability
Due date: 31 May 2023
Milestone: https://github.com/waku-org/pm/milestone/5

Note: this deadline assumes that the target of 1 million users by end-June 2023 can build, for the largest part, on the solutions designed for the problem space defined below.

Summary

  • Scale to 10K Status Community users, spread across ~10 to ~100 communities
  • This milestone focuses on 100% Desktop users, primarily using relay, but with experimental/beta support for filter and lightpush for clients with poor connectivity
  • Communities, private group chats and 1:1 chats should be considered. Public chats are excluded.

Tasks / Epics


Extracted questions

  • Are the numbers of users and communities realistic? Answer on 2023-01-19: yes, makes sense as an initial goal
  • What is the proportion (in message rate and bandwidth) of community messages vs community control messages vs store query-responses?
  • Does message rate increase linearly with network size? Answer on 2023-01-19: generally this should be the case (could have a multiplicative factor, but not combinatorial or exponential)
  • What bandwidth upper bound should we target for Desktop nodes? One possible answer: the ADSL2+ limit of 3.5 Mb/s? (See the back-of-envelope sketch below.)
  • Can this MVP consider participation in only one Community at a time? Answer on 2023-01-24: nodes will be part of multiple communities from the beginning.
  • What store query rate should we target for 10K users?
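
As a back-of-envelope sketch of how the bandwidth question above could be approached: the snippet below multiplies an assumed message rate, message size, gossipsub duplication factor and shard count into a per-node relay bandwidth figure and compares it against the ADSL2+ ceiling. All numeric inputs are illustrative assumptions, not measured Status Community rates.

```go
package main

import "fmt"

func main() {
	// All of the numbers below are illustrative assumptions, not measured
	// Status Community figures.
	const (
		msgsPerSecond = 10.0   // assumed chat + control message rate on one shard
		avgMsgBytes   = 1500.0 // assumed average Waku message size in bytes
		duplication   = 6.0    // assumed gossipsub duplication factor (~mesh degree D)
		shardsPerNode = 2.0    // assumed number of shards a desktop node relays
		adslLimitMbps = 3.5    // ADSL2+ limit mentioned in the question above
	)

	// Per-shard relay traffic: every message is received and forwarded to
	// roughly (D - 1) mesh peers, so bandwidth scales with the duplication factor.
	bitsPerSecond := msgsPerSecond * avgMsgBytes * 8 * duplication * shardsPerNode
	mbps := bitsPerSecond / 1e6

	fmt.Printf("estimated relay bandwidth: %.2f Mb/s (target ceiling: %.1f Mb/s)\n",
		mbps, adslLimitMbps)
}
```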

Network requirements

Note: this gathers the minimal set of requirements the Waku network must meet to support Status Communities scaling to 10K users. It does not propose a design.

1. Message Delivery and Sharding

Note: this section, especially, depends on the minimal user-experience requirements defined by the app. E.g. the app knows what (sub)set of messages is necessary "for a consistent experience", and this will feed into a pubsub topic, content topic and sharding design that does not compromise on UX. This process should also define when messages should be received "live" (relay) or opportunistically via history queries (store).

  1. Nodes should be able to receive (via relay or store) all community messages of the community they're part of.
  2. Nodes should receive live (via relay) all chat messages that are necessary for a consistent experience. A chat message is content sent by users either in a community channel, 1:1 or private group.
  3. Nodes should receive live (via relay) all control messages that are necessary for a consistent experience. Control messages are mostly used for community reasons, with some for 1:1 and private groups (e.g. online presence and X3DH bundle).
  4. Each community can use one or more shards for control and community messages, as long as requirements (1) - (3) still hold. (A minimal topic-naming sketch follows the assumptions below.)
  5. Nodes should participate in shards in such a way that resource usage (especially bandwidth) is minimized, while requirements (1) - (3) still hold.
  6. Peer and connection management should be sufficient to allow nodes to maintain a healthy set of connections within each shard they participate in.

Assumptions:

  • connectivity, NAT traversal capability, NAT hole punching, etc. is similar to that described in the Status MVP: Status Core Contributors use Status #7. No further work is required within the context of this MVP.
  • it is possible to be part of several communities simultaneously
  • we assume that community size is such that community desktop nodes can realistically be expected to relay all community traffic. That is, communities can be responsible for their own relay infrastructure.
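
For reference, the static sharding design this milestone converged on names relay shards with pubsub topics of the form /waku/2/rs/<cluster-id>/<shard-id> (see the relay sharding RFC). The sketch below shows that topic format plus one possible way a community identifier could be mapped deterministically onto a fixed shard set; the hash-based assignment and the cluster id are assumptions for illustration, not the scheme specified in the RFC.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardTopic formats a static sharding pubsub topic following the
// /waku/2/rs/<cluster-id>/<shard-id> naming convention.
func shardTopic(clusterID, shardID uint16) string {
	return fmt.Sprintf("/waku/2/rs/%d/%d", clusterID, shardID)
}

// communityShard deterministically maps a community ID onto one of
// numShards shards. The FNV hash here is purely illustrative; the real
// assignment is an application/RFC decision.
func communityShard(communityID string, numShards uint16) uint16 {
	h := fnv.New32a()
	h.Write([]byte(communityID))
	return uint16(h.Sum32()) % numShards
}

func main() {
	const clusterID = 16 // example cluster id (assumption for illustration)
	shard := communityShard("example-community", 8)
	fmt.Println("community pubsub topic:", shardTopic(clusterID, shard))
}
```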

2. Discovery

  1. Nodes should be able to discover peers within each shard they're interested in.
  2. Discovery method(s) can operate within a single or multiple shards, as long as:
  • requirement (1) still holds
  • nodes can bootstrap the chosen discovery method(s) for shards they're interested in
  • the chosen discovery method(s) does not add an unreasonable resource burden on nodes, especially if this method is shared between shards

Assumptions:

  • nodes are able to use discv5 as their main discovery method

3. Bootstrapping

  1. Nodes should be able to initiate connection to bootstrap nodes within the shards they're interested in.
  2. Bootstrap nodes can serve a single or multiple shards, as long as they can handle the added resource burden.

Assumptions:

  • Status initially provides bootstrapping infrastructure.
  • DNS discovery is sufficient to find initial bootstrap nodes.
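
For context, DNS discovery follows EIP-1459, where a bootstrap list is addressed by an enrtree:// URL combining a signing public key and a DNS name. A minimal sketch of splitting such a URL into its two parts follows; the URL in the example is made up, not a real Waku fleet entry.

```go
package main

import (
	"fmt"
	"strings"
)

// splitEnrTreeURL splits an EIP-1459 DNS discovery URL of the form
// enrtree://<base32-public-key>@<fqdn> into its key and domain parts.
func splitEnrTreeURL(url string) (pubkey, domain string, err error) {
	const prefix = "enrtree://"
	if !strings.HasPrefix(url, prefix) {
		return "", "", fmt.Errorf("not an enrtree URL: %s", url)
	}
	parts := strings.SplitN(strings.TrimPrefix(url, prefix), "@", 2)
	if len(parts) != 2 {
		return "", "", fmt.Errorf("missing @<fqdn> part: %s", url)
	}
	return parts[0], parts[1], nil
}

func main() {
	// Hypothetical example URL; real fleet URLs are published by the fleet owner.
	key, domain, err := splitEnrTreeURL("enrtree://AOFTICU2XWDULNLZGRMQS4RIZ@boot.example.org")
	if err != nil {
		panic(err)
	}
	fmt.Println("signing key:", key, "domain:", domain)
}
```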

4. Store nodes (Waku Archive)

  1. Nodes should be able to find capable store nodes and query history within the shards they're interested in.
  2. Store nodes can serve a single or multiple shards, as long as:
  • they can handle the query rate and resource burden
  • they are discoverable as stated in requirement (1)

Assumptions:

  • Status provides initial store infrastructure, including a performant Waku Archive implementation.
  • PostgreSQL implementations exist for Waku Archive that can deal with the required rate of parallel queries to support 10K users. (See the connection-pooling sketch below.)
  • DNS discovery is sufficient to discover capable store nodes (these may or may not be the same nodes as used for bootstrapping, but discovery will be simpler if they are).
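
On the PostgreSQL assumption above, the property that matters is that the archive backend can serve many store queries in parallel. A minimal connection-pooling sketch using Go's standard database/sql pool is shown below; the driver choice, DSN and table schema are placeholders, not the nwaku archive implementation.

```go
package main

import (
	"context"
	"database/sql"
	"log"
	"sync"
	"time"

	_ "github.com/lib/pq" // placeholder Postgres driver choice
)

func main() {
	// DSN is a placeholder; a real archive node gets this from configuration.
	db, err := sql.Open("postgres", "postgres://waku:waku@localhost:5432/archive?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// database/sql maintains a pool, so concurrent store queries map onto
	// parallel Postgres connections instead of serialising on one connection.
	db.SetMaxOpenConns(20)
	db.SetMaxIdleConns(10)
	db.SetConnMaxLifetime(30 * time.Minute)

	var wg sync.WaitGroup
	for i := 0; i < 50; i++ { // 50 concurrent history queries
		wg.Add(1)
		go func() {
			defer wg.Done()
			ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
			defer cancel()
			// Placeholder query and schema, for illustration only.
			var count int
			if err := db.QueryRowContext(ctx,
				"SELECT count(*) FROM messages WHERE contenttopic = $1", "/example/1/chat/proto",
			).Scan(&count); err != nil {
				log.Println("query failed:", err)
			}
		}()
	}
	wg.Wait()
}
```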

5. Security

  1. Community members should not be vulnerable to simple DoS/spam attacks as defined in (3) and (4) below.
  2. Each community should be unaffected by failures and DoS/spam attacks in other communities. This implies some isolation/sharding in the messaging domain.
  3. Store/Archive:
    • store nodes for a community should only archive messages actually originating from the community
    • store nodes for a community should not be vulnerable to being taken down by a high rate of history queries
  4. Relay:
    • community relay nodes should only relay messages actually originating from the community.

Assumptions:

  • Community members (i.e. the application) are able to validate messages against community membership.
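
To make this assumption concrete, the relay-side protection in requirement (4) can be pictured as a message validator plugged into relay: only messages that pass an application-level community membership check are forwarded. The sketch below is a generic illustration with a stubbed membership check, not the actual status-go or nwaku validation logic.

```go
package main

import (
	"fmt"
	"strings"
)

// WakuMessage is a simplified stand-in for the real Waku message type.
type WakuMessage struct {
	ContentTopic string
	Payload      []byte
}

// communityValidator returns a relay validation function for one community.
// isMember is the application-level membership check (e.g. against the
// community member list) that the assumption above refers to.
func communityValidator(communityTopicPrefix string, isMember func(payload []byte) bool) func(WakuMessage) bool {
	return func(msg WakuMessage) bool {
		// Reject messages that are not on this community's content topics.
		if !strings.HasPrefix(msg.ContentTopic, communityTopicPrefix) {
			return false
		}
		// Reject messages whose payload does not map to a community member.
		return isMember(msg.Payload)
	}
}

func main() {
	// Placeholder content topic prefix and membership check.
	validate := communityValidator("/community-x/1/", func(payload []byte) bool {
		return len(payload) > 0
	})
	fmt.Println(validate(WakuMessage{ContentTopic: "/community-x/1/chat/proto", Payload: []byte("hi")}))
	fmt.Println(validate(WakuMessage{ContentTopic: "/other/1/chat/proto", Payload: []byte("hi")}))
}
```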

Other requirements

Note: this gathers the minimal set of requirements outside the Waku network (e.g. operational, testing, etc.) to support Status Communities scaling to 10K users.

1. Kurtosis network testing

A simulation framework and initial set of tests that can approximate:

  • the protocols
  • the discovery methods
  • the traffic rates for a typical community
    in such a way as to demonstrate the viability of any scaling design proposed to achieve the Network Requirements

2. Community Protocol hardening

The Community Chat Protocol specifications should be moved to the Vac RFC repo.

  • What else is required within this MVP time frame, e.g. including Community Chat in Kurtosis testing?

3. Nwaku integration testing

Nwaku requires integration testing and automated regression testing for releases to improve trust in the stability of each release.

4. Fleet ownership

Ownership of the infrastructure provided to Status communities should be established. This may require training and a transfer of responsibilities, which currently lie de facto with the nwaku team.
Fleet ownership comprises the responsibility for:

  • establishing a sensible upgrade process (may require some nodes for staging)
  • upgrading fleets
  • monitoring existing fleets and protocol behavior
  • providing support and logging bugs when noticed

Initial work

The requirements above will lead to a design and task breakdown. Roughly, the order of work is:

Ownership for all three items below is shared between Vac, Waku and Status teams:

(1) Agree on requirements above as the complete and minimal set to achieve the 10K scaling goal.
(2) A viable, KISS network design adhering to "Network requirements"
(3) Task breakdown of each item and ownership assignment

@fryorcraken added this to Waku Jan 20, 2023
@jm-clius moved this to In Progress in Waku Jan 23, 2023
@corpetty

tagging Testing team: @AlbertoSoutullo, @0xFugue, @Daimakaimura


jm-clius commented Jan 25, 2023

Achieving network requirements: tasks and ownership

NB: requirements and tasks may change as we encounter unknowns. The task breakdown below assumes that nothing more has to be done for Discovery and Bootstrapping other than proper configuration (to be described in the Scaling Strategy BCP).

1. Verify scaling target requirements

Understand the expected:

  • rate of chat messages
  • rate of control messages
  • rate of store queries
  • avg expected bandwidth usage of each of the above

for 10K community users.

Note: this is not necessarily an analytical exercise, but ballpark figures and a sanity check of current Status Community message rates. @Menduist has done an analysis of message rates in large Discord servers to get a rough estimate of what we would expect to see for Status Communities. However, analysis of an existing Status Community shows significantly higher message rates and bandwidth usage. See conversation.

Tracked in: ??
Owners:

2. Community sharding plan

Sharding strategy for Waku relay in general and Status Communities specifically. This plan will consider short term and longer term strategies.
This item is set out in more detail in @kaiserd's Secure Scaling Roadmap.

Tracked in: vacp2p/research#154
Owners:

3. Simple Waku Relay DoS mitigation

Strategy and implementation to protect relay and store against simple DoS attack vectors. This item is set out in more detail in @kaiserd's Secure Scaling Roadmap.

Tracked in: vacp2p/research#164
Owners:

4. Scalable storage: nwaku archive PostgreSQL implementation

Already part of #8 but repeated here for completeness.
Note that this includes work to allow concurrent queries.

Tracked in: #4
Owners:

  • postgresql backend: @cammellos
  • postgresql chronos/concurrency adaptation: ??

5. Scalable storage: deterministic message ID

Tracked in: vacp2p/rfc#563
Owners:
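
For illustration: a deterministic message ID is derived from the message content itself, so every node computes the same ID for the same message and duplicates can be detected in store and relay. The exact field set and encoding are defined in vacp2p/rfc#563; the field selection in the sketch below is an assumption.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// deterministicMessageID hashes fields that identify a message so that every
// node derives the same ID for the same message. The exact field list and
// concatenation order are defined in the RFC; this selection is illustrative.
func deterministicMessageID(pubsubTopic, contentTopic string, payload []byte, timestampNs int64) [32]byte {
	h := sha256.New()
	h.Write([]byte(pubsubTopic))
	h.Write([]byte(contentTopic))
	h.Write(payload)
	ts := make([]byte, 8)
	binary.BigEndian.PutUint64(ts, uint64(timestampNs))
	h.Write(ts)
	var id [32]byte
	copy(id[:], h.Sum(nil))
	return id
}

func main() {
	id := deterministicMessageID("/waku/2/rs/16/32", "/example/1/chat/proto", []byte("hello"), 1700000000000000000)
	fmt.Printf("message id: %x\n", id)
}
```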

6. Scalable storage: testing store at scale

Basic testing to see that PostgreSQL implementation works at expected message and query rates. (Note this is in addition to simulation with Kurtosis).

Tracked in: ??
Owner: @LNSD

7. Filter and lightpush improvements

Revising the RFCs and implementations in nwaku and go-waku. Already part of #8 but repeated here for completeness.

Tracked in: #5
Owners:

8. Peer management strategy

RFC for basic peer management strategy and implementations in nwaku and go-waku.

Tracked in: waku-org/nwaku#1353
Owners:
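
As a rough illustration of the simplest form such a strategy can take, each node keeps its relay connections per shard within a target band: dialling discovered peers when below target and pruning when above. The sketch below is a generic loop, not the nwaku or go-waku peer manager.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Peer is a simplified stand-in for a discovered or connected relay peer.
type Peer struct{ ID string }

// maintainShardConnections keeps the connected set for one shard between
// minPeers and maxPeers: dial candidates when under target, prune random
// peers when over. Real implementations also score peers, apply backoff, etc.
func maintainShardConnections(connected []Peer, candidates []Peer, minPeers, maxPeers int) (dial, prune []Peer) {
	if len(connected) < minPeers {
		need := minPeers - len(connected)
		if need > len(candidates) {
			need = len(candidates)
		}
		dial = candidates[:need]
	} else if len(connected) > maxPeers {
		excess := len(connected) - maxPeers
		rand.Shuffle(len(connected), func(i, j int) { connected[i], connected[j] = connected[j], connected[i] })
		prune = connected[:excess]
	}
	return dial, prune
}

func main() {
	connected := []Peer{{"a"}, {"b"}}
	candidates := []Peer{{"c"}, {"d"}, {"e"}}
	dial, prune := maintainShardConnections(connected, candidates, 4, 8)
	fmt.Println("dial:", dial, "prune:", prune)
}
```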

9. Combine into comprehensive scaling strategy

This can be seen as the final goal for all the moving parts and separate tasks listed above. Output will likely take the form of one or more Best Current Practices RFCs that focus on the Status 10K use case. It will bring together the short term strategies for sharding, DoS mitigation, bootstrapping, discovery and store configuration. It may include suggestions on when to use lightpush and filter rather than relay.

Tracked in: vacp2p/research#165
Owners:

10. Targeted dogfooding

This is in addition to simulation with Kurtosis. Individual owners of each task will be responsible for testing and dogfooding their strategies/features. This task ensures that we have considered each item for targeted network testing, including:

  • Sharding strategy
  • Store at scale
  • Bootstrapping
  • Discovery (DNS, discv5 and Waku Peer Exchange)
  • Filter and lightpush

Tracked in: ??
Owner: @jm-clius

11. New multiaddrs discovery: libp2p rendezvous

Although it is possible to encode multiaddrs in ENRs, which are currently being exchanged by all existing discovery methods, ENRs are limited in size and can consequently not contain more than one or two multiaddrs (see the size estimate below). We need a discovery method more suitable for multiaddrs. We have chosen libp2p rendezvous as the solution here.

Tracked in: vacp2p/research#176
Owners:
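
To make the size constraint concrete: EIP-778 caps an ENR at 300 bytes, and the signature, public key and standard address fields consume a large share of that, leaving room for only a small number of full multiaddrs. The byte counts in the sketch below are rough assumptions for illustration.

```go
package main

import "fmt"

func main() {
	const enrMaxBytes = 300 // EIP-778 limit on total ENR size

	// Rough, assumed sizes of the fields an ENR already carries.
	overhead := 64 /* signature */ + 33 /* compressed secp256k1 pubkey */ +
		8 /* seq */ + 20 /* id, ip, tcp, udp keys+values, approx. */

	// Assumed size of one full multiaddr (e.g. dns4 host + tcp + wss),
	// stored in binary form inside the ENR.
	const multiaddrBytes = 60

	remaining := enrMaxBytes - overhead
	fmt.Printf("space left for multiaddrs: %d bytes (~%d multiaddrs)\n",
		remaining, remaining/multiaddrBytes)
}
```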

12. Waku static sharding implementation

This is an outflow of the Community sharding plan as specified by @kaiserd and covers the implementation portion, including configuration and enabling shard discovery via ENRs.

Owners:


jm-clius commented Feb 2, 2023

Achieving other requirements: tasks and breakdown

1. Wakurtosis: first network test

This is described in vacp2p/wakurtosis#7

It covers testing the scalability of the relay protocol, specifically measuring:

  • latency
  • reliability
  • resource usage (bandwidth, CPU, memory)
    against network size and message rate.

Owner:

2. Wakurtosis: analyze first test results

This step will either confirm our (positive) assumptions about relay scalability or highlight bottlenecks/bugs in the protocol or implementations, which must be addressed and considered in the overall network roadmap.

Owners:

4. Wakurtosis: plan next tests

This is a collaborative task flowing from the results of the first test to refine the simulation(s) and plan the next, most useful tests.

Owners:

5. Community Protocol: move to Vac RFC repo

This is an administrative step. It may require updating the RFC to match the latest implementation, moving sections around, etc.

Owner:

6. Community Protocol: review protocols

Grasping the content of each protocol and how it maps to real-world Waku network traffic. This is potentially an involved task, so the scope should be minimized for this MVP. This relates to Verify scaling target requirements under the Network Requirements.

Owner:

7. Nwaku hardening: Wakurtosis sandbox machine

Provisioning performant machine(s) that the dev team can use to sandbox-test features using ad-hoc Wakurtosis deployments.

Owners:

  • waku (requirements): @jm-clius
  • infra (devops): ??

8. Nwaku hardening: Wakurtosis integration testing

Integration test environment for nwaku. Most likely it will take the form of a pipeline that deploys a Wakurtosis network topology and runs a series of scripted integration tests for nwaku.

Owners:

  • waku (requirements): @jm-clius
  • infra (devops): ??

9. Nwaku hardening: release automation

Automated release pipeline for nwaku that builds a release, compiles release notes, and publishes release binaries and a tagged Docker image for the most common OSs/architectures.

Tracked in: waku-org/nwaku#611
Owners:

  • waku (requirements): @jm-clius
  • infra (devops): ??

10. Fleet ownership: set requirements

Create a document that summarizes all the common tasks that a fleet owner generally has to do, including deployment, monitoring and debugging. This will also allow us to communicate to other platforms planning on deploying their own Waku fleets what they need to consider. The document should include a section on what Status fleet ownership specifically entails, including a procedure to log and escalate bugs/network anomalies.

Owner: @jm-clius

11. Fleet ownership: training

Based on the requirements determined above, determine who will take ownership of the Status fleets and schedule training sessions.

Owner:


LNSD commented Feb 14, 2023

5. Scalable storage: deterministic message ID

Tracked in: vacp2p/rfc#563
Owners:

waku (protocol): @LNSD
nwaku (implementation): @LNSD
go-waku (implementation): @richard-ramos

Progress on this initiative, also known as Message Unique ID (MUID), is tracked in the following issue: waku-org/nwaku#1914


fryorcraken commented Aug 10, 2023

Thoughts on current status:

  • 1. Verify scaling target requirements

Several discussions have happened. The outputs I am aware of are:

vacp2p/research#177

@jm-clius @richard-ramos did we have more to this?

This can be closed as static sharding was delivered. The quoted issue also tracks work for the 1 mil milestone.

vacp2p/research#164 (comment)

waku-org/nwaku#1888 (comment)

This needs clean-up. Implementation of MUID to avoid duplicates in store is done, which was the main reason to do it for 10k.
Moving forward, we could use MUID for the gossipsub seen-message logic; is that something we need for 1 mil?

Then, MUID is possibly going to be used for the Distributed Store.

@jm-clius please confirm

  • 6. Scalable Storage: testing store at scale

vacp2p/research#191 (comment)

@jm-clius were we thinking DST simulation for this?

#5 (comment)

waku-org/nwaku#1353 (comment)

@jm-clius this seems done. Not sure if we tracked an output somewhere?

  • 10. Targeted dogfooding

I suggest descoping this from Waku work. By delivering this milestone we enable Status to integrate Waku tech and start dogfooding. We are tracking hardening of Waku protocols as part of waku-org/research#3 with 2.1

vacp2p/research#176 (comment)

  • 12. Waku static sharding implementation

Done. What issue tracked the work/output? @jm-clius

  • Setup staging fleet with static sharding for Status dogfooding

Last remaining task. Are we tracking somewhere @jm-clius ?
edit: is this it? status-im/status-go#3528

  • Specify fleet ownerships requirements to enable Status team to maintain own fleet

The other last remaining task. Are we tracking somewhere @jm-clius ?

@jm-clius

Thanks for revising, @fryorcraken. See my comments below.

Several discussions have happened. The outputs I am aware of are:
vacp2p/research#177
@jm-clius @richard-ramos did we have more to this?

Afaik many of the suggestions have been implemented or are in the process of being implemented, also in status-go. @richard-ramos may have a better idea of the current status. Perhaps the work that's being done in status-go should be tracked there, which would mean the Waku side can be closed?

This can be closed as static sharding was delivered. The quoted issue also tracks for 1mil.

I agree.

This needs clean-up. Implementation of MUID to avoid duplicates in store is done, which was the main reason to do it for 10k.
Moving forward, we could use MUID for the gossipsub seen-message logic; is that something we need for 1 mil? Then, MUID is possibly going to be used for the Distributed Store.

Yes, I would close vacp2p/rfc#563 as the only issue really needed for the 10K milestone. We also don't need to do anything else for the 1 mil milestone, but we can keep #9 open to track the work that would be necessary for the distributed store.

vacp2p/research#191 (comment)

@jm-clius were we thinking DST simulation for this?

Initially, yes. But I think a reasonable step for the 10K epic would be (a) dogfooding and (b) local stress-testing of postgresql.

  1. Combine into comprehensive scaling strategy Roadmap(SeM): Application Protocols vacp2p/research#165
    @jm-clius this seems done. Not sure if we tracked an output somewhere?

Yes, I've gone ahead and closed the issue. The output here was just moving the RFCs to vac repo and revising them.

  1. Waku static sharding implementation
    Done. What issue tracked the work/output? @jm-clius

Main tracking issue was: #15 which I think can just be closed. There were also tracking issues in nwaku (and probably go-waku/js-waku).

Setup staging fleet with static sharding for Status dogfooding
Last remaining task. Are we tracking somewhere @jm-clius ?
edit: is this it? status-im/status-go#3528

No, the first fleet that can be used for initial tests/dogfooding is tracked here: status-im/infra-waku#1 Since this fleet has been deployed, this issue can probably be closed. This is not quite a staging fleet for Status yet, which I'll link to the issue I create for the Status fleet requirements below.

Specify fleet ownerships requirements to enable Status team to maintain own fleet
The other last remaining task. Are we tracking somewhere @jm-clius ?

It is now: #61 Not a very detailed issue, but should do the trick. :)

@richard-ramos

I think the suggestions from vacp2p/research#177 have not been implemented, or I could not find them in the status-go code.

@fryorcraken

Weekly Update

All software has been delivered. Pending items are:

@fryorcraken changed the title from "[Epic] Waku Network Can Support 10K Users" to "[Milestone] Waku Network Can Support 10K Users" Aug 24, 2023
@fryorcraken added the Deliverable label and removed the Epic label Aug 24, 2023
@fryorcraken changed the title from "[Milestone] Waku Network Can Support 10K Users" to "[Epic] Waku Network Can Support 10K Users" Aug 24, 2023
@fryorcraken added the Epic label and removed the Deliverable label Aug 24, 2023
@fryorcraken changed the title from "[Epic] Waku Network Can Support 10K Users" to "[Milestone] Waku Network Can Support 10K Users" Aug 25, 2023
@fryorcraken added the Deliverable label and removed the Epic label Aug 25, 2023
@fryorcraken

Monthly Update

Staging fleet for Status (static sharding + Postgres) has been defined and handed over to infra: waku-org/nwaku#1914
Stress testing of PostgreSQL in progress: INSERT done, SELECT in progress.

@fryorcraken

1k nodes simulation blogpost: vacp2p/vac.dev#123


fryorcraken commented Oct 23, 2023

Weekly Update

  • achieved:
    • Vac/DST team has done further runs with up to 600 nodes in the network as part of wrapping up a blog post report.
    • Staging fleet for Status with static sharding and PostgreSQL deployed and being tested by go-waku team using local changes in Status Desktop.
  • next:
    • Dogfooding of Status Desktop with Status staging fleet. Will aim to create a small internal Waku community.
    • Continue integration of static sharding in status-go.
  • risks:
    • Dependency on Vac/DST to conclude ~1k nodes simulations.
    • PostgreSQL implementation has not yet been proven more performant than SQLite. Further improvements and testing in progress.
    • Implementation of static sharding in Status Communities and design decisions mostly driven by go-waku developer, with minimal input from Status dev (1, 2, 3). See status-go#4057 for remaining work. Mitigation by on-boarding Chat SDK lead on 6 Nov to drive effort.


fryorcraken commented Oct 31, 2023

Weekly Update

  • Integration of static sharding in go-waku is continuing (see updates below).

  • Testing of PostgreSQL identified some performance improvements that are now being implemented.

  • Internal instructions have been distributed to dogfood static sharding with the Waku team (Waku Discord private channel).

  • risks:

    • Dependency on Vac/DST to conclude ~1k nodes simulations.
    • Implementation of static sharding in Status Communities and design decisions mostly driven by go-waku developer, with minimal input from Status dev (1, 2, 3). See status-go#4057 for remaining work. Mitigation by on-boarding Chat SDK lead on 6 Nov to drive effort.
    • lack of confidence in simulation results: results so far exhibit various artifacts and anomalies seemingly related to tooling limitations. It is therefore difficult to draw firm conclusions regarding Waku scalability.


jm-clius commented Nov 3, 2023

Weekly Update

  • achieved:
    • further PostgreSQL optimisations nearing conclusion
    • implemented bridge to allow Status Community to move to static sharding with backwards compatibility towards default pubsub topic
    • solution for shared bootstrap nodes being filtered out in discv5 as more static shards are activated
    • ensured no unknown blockers from Waku's side to start dogfooding in conversation with Status Communities
  • next:
    • continue integration of static sharding in status-go.
    • deploy bridge for backwards compatibility
    • dogfooding of Status Desktop with Status staging fleet. Will aim to create a small internal Waku community
  • risks:
    • Dependency on Vac/DST to conclude ~1k nodes simulations.
    • Implementation of static sharding in Status Communities and design decisions mostly driven by go-waku developer, with minimal input from Status dev (1, 2, 3). See status-go#4057 for remaining work. Mitigation by on-boarding Chat SDK lead on 6 Nov to drive effort.
    • lack of confidence in simulation results: results so far exhibit various artifacts and anomalies seemingly related to tooling limitations. It is therefore difficult to draw firm conclusions regarding Waku scalability.
    • lack of clarity in terms of Status fleet ownership, monitoring and maintenance, which is an integral part of the solution.

@jm-clius closed this as completed Nov 3, 2023
@github-project-automation bot moved this from In Progress to Done in Waku Nov 3, 2023
@jm-clius reopened this Nov 3, 2023

jm-clius commented Nov 10, 2023

Weekly Update

  • achieved:
    • final PostgreSQL optimisations completed. Benchmarks published: https://www.notion.so/Postgres-e33d8e64fa204c4b9dcb1514baf9c582
    • added "debug nodes" with trace-level message logging to each Status fleet to allow for easier e2e message traceability
    • confirmed no unknown blockers from Waku's side to continue dogfooding in conversation with Status Communities
  • next:
    • continue integration of static sharding in status-go.
    • dogfooding of Status Desktop with Status staging fleet. Will aim to create a small internal Waku community
  • risks:
    • Dependency on Vac/DST to conclude ~1k nodes simulations.
    • Implementation of static sharding in Status Communities and design decisions mostly driven by go-waku developer, with minimal input from Status dev (1, 2, 3). See status-go#4057 for remaining work. Mitigation by on-boarding Chat SDK lead on 6 Nov to drive effort.
    • lack of confidence in simulation results: results so far exhibit various artifacts and anomalies seemingly related to tooling limitations. It is therefore difficult to draw firm conclusions regarding Waku scalability.
    • lack of clarity in terms of Status fleet ownership, monitoring and maintenance, which is an integral part of the solution.

@jm-clius

Weekly Update

  • achieved:
    • closed last PostgreSQL issue for Store scalability
    • confirmed no unknown blockers from Waku's side to continue dogfooding in conversation with Status Communities
    • started team-internal dogfooding of a test community using static sharding
    • started fleet ownership handover process: published guidelines/list of responsibilities - https://www.notion.so/Fleet-Ownership-7532aad8896d46599abac3c274189741
  • next:
  • risks:
    • Dependency on Vac/DST to conclude ~1k nodes simulations.
    • Implementation of static sharding in Status Communities and design decisions mostly driven by go-waku developer, with minimal input from Status dev (1, 2, 3). See status-go#4057 for remaining work. Mitigation by on-boarding Chat SDK lead on 6 Nov to drive effort.
    • lack of confidence in simulation results: results so far exhibit various artifacts and anomalies seemingly related to tooling limitations. It is therefore difficult to draw firm conclusions regarding Waku scalability.
    • lack of clarity in terms of Status fleet ownership, monitoring and maintenance, which is an integral part of the solution.


jm-clius commented Nov 27, 2023

Weekly Update

  • achieved:
  • next:
  • risks:
    • Fleet Ownership doc defines fleet maintainer and owner. The Status team has yet to clarify who the fleet owner for Status Communities is.
    • QA by the Status team is to be planned on the staging static sharding fleet; the Waku team has done internal dogfooding (report). Any change to the staging static sharding fleet should then be tested by QA before being deployed to prod (e.g. # of Postgres instances). Status has committed to this testing on the 28 Nov call.
    • Status team expressed the intention to deploy a static sharding prod fleet and use it for all users: this is not recommended until proper QA is done on the staging static sharding fleet, as it could impact other Status app activities.
    • Implementation of static sharding in Status Communities and design decisions mostly driven by go-waku developer, with minimal input from Status dev (1, 2, 3). See status-go#4057 for remaining work. Mitigation by on-boarding Chat SDK team since November 2023 to drive effort.
    • Dependency on Vac/DST to conclude ~1k nodes simulations; lack of confidence in simulation results: results so far exhibit various artifacts and anomalies seemingly related to tooling limitations. It is therefore difficult to draw firm conclusions regarding Waku scalability.

@fryorcraken

We will run one more week of internal dogfooding of static sharding + PostgreSQL in Status Communities.
Once done, and if no new issues are found, we will close this issue.

The go-waku and Waku chat SDK team will continue to support Status with their integration of Waku v2, but no major effort is scheduled in terms of software development and testing.


jm-clius commented Dec 8, 2023

Weekly Update

  • achieved:
  • next:
  • risks:
    • Fleet Ownership doc defines fleet maintainer and owner. The Status team has yet to clarify who the fleet owner for Status Communities is.
    • QA by the Status team is to be planned on the staging static sharding fleet; the Waku team has done internal dogfooding (report). Any change to the staging static sharding fleet should then be tested by QA before being deployed to prod (e.g. # of Postgres instances). Status has committed to this testing on the 28 Nov call.
    • Status team expressed the intention to deploy a static sharding prod fleet and use it for all users: this is not recommended until proper QA is done on the staging static sharding fleet, as it could impact other Status app activities.
    • Implementation of static sharding in Status Communities and design decisions mostly driven by go-waku developer, with minimal input from Status dev (1, 2, 3). See status-go#4057 for remaining work. Mitigation by on-boarding Chat SDK team since November 2023 to drive effort.
    • Dependency on Vac/DST to conclude ~1k nodes simulations; lack of confidence in simulation results: results so far exhibit various artifacts and anomalies seemingly related to tooling limitations. It is therefore difficult to draw firm conclusions regarding Waku scalability.

@fryorcraken

#97 is now done. Status QA is proceeding with testing.
Most changes are now focused on status-go with ad hoc bug/issue investigation from Waku team. This Milestone can now be closed 🎉

@chair28980 removed the Deliverable label Aug 1, 2024