Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cherry-pick PRs for v2.10.14-RC.2 #5278

Merged
merged 33 commits into from
Apr 4, 2024

Conversation

neilalexander
Copy link
Member

@neilalexander neilalexander commented Apr 4, 2024

@neilalexander neilalexander requested a review from a team as a code owner April 4, 2024 18:59
@neilalexander neilalexander force-pushed the neil/release/v2.10.14 branch from c204c49 to f7f7a09 Compare April 4, 2024 20:47
depthlending and others added 28 commits April 4, 2024 21:50
Signed-off-by: depthlending <[email protected]>
The code was just checking if max_deliver is >= than backoff,
however infinite max_deliver is represented by -1, so it was erroring.
This fixes that behavior.

Signed-off-by: Tomasz Pietrek <[email protected]>
The main issue for router/gateway connections were that they would
be registered with the global account but never removed from there,
which would cause the connection to be retained. The memory leak
was apparent in one user setup with large amount of subscriptions
that were transfered to a slow gateway, causing accumulation of
partial buffers before the connection was dropped.

For gateway connections, we also needed to clean the outbound's
map since it can contain sublist referencing its own connection.
Same for subscriptions in c.subs.

Another found cause for retained connection is with leafnode where
we are using a subscription referencing the leafnode connection
that was globally registered in a global sublist but never removed.

Also the connection's c.out.sg (a `*Cond`) needed to be set to `nil`
when the connection is closed. Without that, it seems that the connection
would not be released (at least based on experimentation). We now
make sure that c.out.sg is not nil before sending a signal.

The bottom line is that it looks like having an object referencing
itself in some way, prevents the GC from releasing the object, even
if the "top level" object is no longer reachable. For instance,
suppose `obj.someField = obj` seem to prevent the GC from releasing
`obj`, even if `obj` itself is no longer referenced.

The original issue/PR (#5212) was simply cleaning the c.out.wnb,
but the way it was done was racy since c.out.wnb is used without
lock in flushOutbound. Once the retain/release issue is fixed,
cleaning this buffer is not really required (but good practice
especially if not running with GOMEMLIMIT), so we take care
of cleaning this up, but under the protection of the flushOutbound
flag. If set, flushAndClose() will not do the cleaning, flushOutbound
will do it.

Relates to #5212

Signed-off-by: Ivan Kozlovic <[email protected]>
Failed reads here might result in duplicate nonces.

Signed-off-by: Neil Twigg <[email protected]>
Otherwise we might not overwrite the message with random
bytes properly.

Signed-off-by: Neil Twigg <[email protected]>
In this case, an empty-but-non-nil map would not be copied, which
could result in the clone referring to the original map.

Also implement a deep copy of `AllowedConnectionTypes` for `NkeyUser`,
like `User` already has.

Signed-off-by: Neil Twigg <[email protected]>
…r stream and has a max messages limit and has discard policy new. The stream should never go over the limit but should still get all the messages in the stream being sourced (i.e. getting more messages as they are consumed from the WQ stream)

Signed-off-by: Jean-Noël Moyne <[email protected]>
Tests were executed against the parent commit of when they were
added to make sure that the tuning is not affecting the reproducibility
of the test. Note that some of the tests were sometimes not detecting
the bug, even when running with `-count=xxx`, so in that case, it
is difficult to know if the tuning would prevent detection if the
bug were to be reintroduced.

The FileStore and MsgTrace tests were also added as a matrix run
in Travis/GA.

Signed-off-by: Ivan Kozlovic <[email protected]>
Signed-off-by: Ivan Kozlovic <[email protected]>
Excluded the files that were auto-generated (with a do not edit
comment).

Signed-off-by: Ivan Kozlovic <[email protected]>
…store if upto sequence was not found, e.g. deleted already.

Signed-off-by: Derek Collison <[email protected]>
We were getting partial or no cache errors when hitting it from multiple go routines in quick succession.

Signed-off-by: Derek Collison <[email protected]>
…valid.

If we think we need to calculate a new first sequence but the one we have is still valid, this could lead to messages being skipped and not sent to a consumer.

Signed-off-by: Derek Collison <[email protected]>
Signed-off-by: Derek Collison <[email protected]>
Signed-off-by: Derek Collison <[email protected]>
…raddles multiple allocations in go 1.22.1 on darwin.

Signed-off-by: Derek Collison <[email protected]>
Signed-off-by: Derek Collison <[email protected]>
Specifically around interest and limited streams that are clustered. Since we operate consumers and streams independently, when a stream is removing msgs when acked and acks are coming into different servers at different times, etc. Messages can get stranded.

We introduce a better check of the interest state based on a collective low ack floor that will purge our copy of the stream to catch up.
We run this periodically in monitorStream and during certain conditions.

Thanks to Wally for introducing the test that stresses this part of the system.

Signed-off-by: Derek Collison <[email protected]>
For the example hardened systemd file, which is derived from config files which
Synadia uses in production, add more examples:

 * Show setting GOMEMLIMIT env var
   + but also nudge towards using the external environment file for it
 * Use an external environment config file if it exists
 * Show mount-point dependency for JetStream
 * Configure automatic restarts
 * Show using InaccessiblePaths to lock down FS access even further
 * Setup a service alias of simply 'nats'.
derekcollison and others added 4 commits April 4, 2024 21:50
It concerns me a little that by duplicating the same test for both
file and memory stores, that differences could slip in later that
mean we are no longer testing for consistency between the two stores.

This approach allows us to write tests that only use the `StreamStore`
API and have them tested against all store permutations.

Signed-off-by: Neil Twigg <[email protected]>
@neilalexander neilalexander force-pushed the neil/release/v2.10.14 branch from f7f7a09 to e3c9236 Compare April 4, 2024 20:50
Copy link
Member

@derekcollison derekcollison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@derekcollison derekcollison merged commit 8cafe5b into release/v2.10.14 Apr 4, 2024
4 checks passed
@derekcollison derekcollison deleted the neil/release/v2.10.14 branch April 4, 2024 21:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants