SOLR-16957: Test user managed cluster with a twist! #1875
Conversation
Cool. I don't see the point in testing replication between two isolated SolrCloud nodes, though; is that even supported? Are you thinking about some kind of use case where you pull the index from a cloud cluster to an outside cluster for hot-standby purposes?
# Not totally sure why this didn't load its data, but it works for our needs!
run curl 'http://localhost:7574/solr/techproducts/select?q=*:*'
assert_output --partial '"numFound":0'
I'd expect this to return 32 as well. Could we perhaps issue a deleteByQuery request to empty the index before replicating?
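A minimal sketch of that, reusing the port and core from the snippet above (the delete-by-query JSON is standard Solr update syntax; this is not code from the PR):

# Empty the index first so '"numFound":0' is a deliberate state, not a startup accident.
run curl 'http://localhost:7574/solr/techproducts/update?commit=true' -H 'Content-Type: application/json' --data-binary '{"delete":{"query":"*:*"}}'
assert_output --partial '"status":0'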
# Wish we could block on fetchindex.. Does checking details help?
sleep 5
run curl 'http://localhost:7574/solr/techproducts/select?q=*:*'
assert_output --partial '"numFound":32'
Could we do a while loop instead of sleep perhaps? And another way to assert replication is to compare index generation/version, but probably not necessary here.
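A polling sketch along those lines (the attempt count and interval are made-up numbers, not values from the test):

# Poll for the replicated doc count instead of a fixed sleep; give up after ~30s.
for i in $(seq 1 30); do
  curl -s 'http://localhost:7574/solr/techproducts/select?q=*:*' | grep -q '"numFound":32' && break
  sleep 1
done
run curl 'http://localhost:7574/solr/techproducts/select?q=*:*'
assert_output --partial '"numFound":32'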
argh, no examples of a while.... I wonder... do we need bin/solr assert to be able to assert a doc count with a query? I like the timeout capability built into the assert..
I am still wishing we had a better bin/solr assert ;-). The curls are all over the place, and the timeout would be nice...
I'm thinking that if having two standalone Solrs, each with its own embedded ZK, works, then we have just eliminated the need for traditional "standalone" Solr, while keeping an easy upgrade path for all the folks who want to continue to have user-managed index replication. We just change "bin/solr start" to do what today you enable with "bin/solr start -c", and everything continues to work. Except now, everywhere we make a SolrCloud-versus-standalone decision, we only have SolrCloud. And all those tickets about "make X work in standalone Solr" are now obsolete...
Fair enough, but that sounds like a new JIRA issue, not part of this test improvement? I'm sceptical of planning a cluster with 6 nodes, each with its own source of truth in ZooKeeper. How would you update the schema of your collection? In standalone mode, the schema.xml file is replicated to the replica, but that will not work here since each Solr reads its schema from its own local ZK. So then you need to do… I'm more in favor of improving SolrCloud with replica modes to the point where there are no benefits to running standalone anymore.
You are quite right; I am probably conflating this with my other experiment to see how it works. If you are running a cluster with six nodes, then you probably SHOULD be using SolrCloud and proper ZK. I'll split this up, and we do need to figure out a path to eliminating the SolrCloud-versus-standalone divide...
@janhoy just to clarify: if I keep the non-ZooKeeper end-to-end test for replication, do you see that as valuable and worth merging? I'll split the ZooKeeper version of the test out into its own PR... I'm interested in playing with it a bit more...
Yea, not sure how much value it gives in addition to the replication handler tests in the test suite though? Can you comment on that?
I think we're going to see a lot more change in this area (adding basic auth, the potential changes around ZK), so I'm thinking this helps build confidence that we didn't break anything.... Does that seem like enough upside?
Not very convinced still 😉
;-) Okay. That's fair. I'll close this, and if we see value in the future we can reopen.
I hate having PRs that just hang out open for years in GitHub ;-)
… its configuration, and then it starts repeating data.
@gerlowskija here is a proof of concept of a user-managed cluster, based on our conversation last week!
…y want is solr cloud!
Change SOLR_PORT to LEADER_PORT, REPEATER_PORT, FOLLOWER_PORT.... Also, we could look up indexversion on the leader and then wait for it on the repeater instead of sleeping...
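A sketch of that indexversion wait, using the renamed port variables (command=indexversion is a real replication-handler API; the loop itself is hypothetical, not code from the PR):

# Read the Leader's index version once, then poll the Repeater until it matches.
LEADER_VERSION=$(curl -s "http://localhost:${LEADER_PORT}/solr/techproducts/replication?command=indexversion" | grep -o '"indexversion":[0-9]*')
for i in $(seq 1 30); do
  REPEATER_VERSION=$(curl -s "http://localhost:${REPEATER_PORT}/solr/techproducts/replication?command=indexversion" | grep -o '"indexversion":[0-9]*')
  [ -n "$LEADER_VERSION" ] && [ "$REPEATER_VERSION" = "$LEADER_VERSION" ] && break
  sleep 1
done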
run curl "http://localhost:${SOLR3_PORT}/solr/techproducts/select?q=*:*&rows=0" | ||
assert_output --partial '"numFound":46' | ||
|
# Now lets stop our replicator
repeater!
This PR had no visible activity in the past 60 days, labeling it as stale. Any new activity will remove the stale label. To attract more reviewers, please tag someone or notify the [email protected] mailing list. Thank you for your contribution!
This PR is now closed due to 60 days of inactivity after being marked as stale. Re-opening this PR is still possible, in which case it will be marked as active again.
Still working on how/when this type of integration test becomes part of Solr!
In PR #2783 we talk about various approaches to deploying Solr, from small to large. It would be good to actually test those deployment scenarios; this tries the first one out.
I am back on the path of wanting to get this in. In SOLR-17492 (and the PR #2783) we talk about how to run Solr. However, how do we actually KNOW that it works? We see a lot of bugs that come from specific combinations of auth, cluster shape, features, etc. While there may be more robust ways of testing these combinations, bats is one way that we have here today. Maybe we have a separate directory of them that gets run less frequently, but that validates the various deployment scenarios?
https://issues.apache.org/jira/browse/SOLR-16957
Description
BATS test for user-managed index replication. This is an end-to-end test, not a unit test for Bash scripts.
Solution
Fire up three independent Solrs, set up replication via APIs, trigger it, and see what happens.
We demonstrate starting up three independent Solr nodes in the Leader/Repeater/Follower pattern.
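Roughly, that startup is three plain (non-cloud) bin/solr invocations on distinct ports; the port numbers here are illustrative, not the test's actual values:

# Three independent user-managed Solr nodes.
bin/solr start -p 8983   # Leader
bin/solr start -p 7575   # Repeater
bin/solr start -p 7574   # Follower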
Then we create three separate 'techproducts' collections, uploading the same configset three separate times to demonstrate that there is no interconnection or shared config between them.
We then index some XML data on the Leader and check that it flows through the Repeater to the Follower.
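Concretely, both the indexing and the pull reduce to curl calls; a sketch assuming LEADER_PORT/REPEATER_PORT variables and the Solr 9 leaderUrl spelling of the fetchindex parameter:

# Index one of the stock example XML files on the Leader and commit.
curl "http://localhost:${LEADER_PORT}/solr/techproducts/update?commit=true" -H 'Content-Type: application/xml' --data-binary @example/exampledocs/hd.xml
# Ask the Repeater to pull the Leader's index immediately.
curl "http://localhost:${REPEATER_PORT}/solr/techproducts/replication?command=fetchindex&leaderUrl=http://localhost:${LEADER_PORT}/solr/techproducts"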
This is repeated for some more documents.
Lastly, we shut down the Repeater and demonstrate that the Follower still has all of its documents available for querying.
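Stopping a single node by port is standard bin/solr usage, roughly (the port matches the illustrative startup above):

# Stop only the Repeater; the Follower keeps serving its local copy of the index.
bin/solr stop -p 7575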
We delete the data on the Leader and subsequently bring the Repeater back up.
After restarting, the Repeater preserves all of the configuration that was done during the setup process, immediately copies over the now-empty 'techproducts' index, and we then see the Follower pick up that empty collection as well.