Improvements to allow future deployment of multiple instances #7337

fxprunayre · 2023-09-13T12:40:50Z

Various experiments were made with GeoNetwork 3 to deploy multiple instances, and the main obstacle was the Lucene index. Moving to Elasticsearch in version 4 makes it easy to share the index and to set up a cluster of Elasticsearch nodes if needed.

Together with the https://metadata.vlaanderen.be/ team, we analyzed the other limitations of a multi-instance setup and fixed the main ones. Single-instance mode is left unchanged. Below we describe the changes made, provide a Docker configuration for testing them, and list the remaining known limitations that can be improved in the future.

This PR combines the work done in the following PRs for easier testing:

For testing, the easiest approach is to use the Docker configuration at https://github.com/geonetwork/docker-geonetwork/tree/update-gn-4.4.0/4.4.0#clustering-experimental, which starts the main node and its replicas.

The Docker configuration is provided in geonetwork/docker-geonetwork#107.

Changes

HTML cache for formatter and WRO4J

This lets multiple instances start up while each one keeps its own separate HTML cache. Otherwise the H2 database files are locked:

Caused by: org.h2.mvstore.MVStoreException: The file is locked: /.../WEB-INF/data/data/resources/htmlcache/wro4j-cache.mv.db [2.1.212/7]
	at org.h2.mvstore.DataUtils.newMVStoreException(DataUtils.java:1004) ~[h2-2.1.212.jar:2.1.212]

geonetwork.htmlcache.dir can be used to customize the location of the cache. Another workaround is to use an LRU cache instead, by replacing

cacheStrategy=disk-memory

with

cacheStrategy=lru

This keeps the cache in memory only, so the instance no longer benefits from the pre-built cache.
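The lock error above can be reproduced in miniature. The sketch below is not GeoNetwork or H2 code: it assumes the cache file is protected by an exclusive file lock (which is what the "The file is locked" message suggests) and uses plain java.nio to show that two openers of the same file conflict, while two openers of separate files do not. The class name, method names and file names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative sketch only: not GeoNetwork or H2 code.
final class CacheLockDemo {

    /**
     * Locks firstCache the way a running instance might, then checks whether
     * a second opener can still lock secondCache. Returns true if it cannot.
     */
    static boolean secondInstanceBlocked(Path firstCache, Path secondCache) {
        try (FileChannel first = open(firstCache);
             FileLock held = first.lock();
             FileChannel second = open(secondCache)) {
            try (FileLock other = second.tryLock()) {
                // null means another process already holds the lock
                return other == null;
            } catch (OverlappingFileLockException e) {
                // thrown when this same JVM already holds a lock on the file
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static FileChannel open(Path file) throws IOException {
        return FileChannel.open(file, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
    }

    /** Creates a scratch directory for the demo. */
    static Path newTempDir() {
        try {
            return Files.createTempDirectory("htmlcache-demo");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = newTempDir();
        Path shared = dir.resolve("wro4j-cache.mv.db");
        System.out.println("shared file blocked: " + secondInstanceBlocked(shared, shared));
        System.out.println("separate files blocked: "
            + secondInstanceBlocked(dir.resolve("node1.mv.db"), dir.resolve("node2.mv.db")));
    }
}
```

Giving each instance its own geonetwork.htmlcache.dir corresponds to the second case, where every node locks a file that no other node opens.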

Schema initialization

On startup, GeoNetwork publishes the XSDs of each schema plugin to the /data/resources/xml/schemas folder. If more than one instance starts at the same time, they can fail to copy the XSDs. This PR adds a new data directory variable for the published schemas: -Dgeonetwork.schemapublication.dir=.
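As a rough illustration of how such a variable can be consumed, the sketch below resolves the publication folder from the geonetwork.schemapublication.dir system property and falls back to a folder under the data directory when it is not set. The class and method names are hypothetical, and the fallback path is only an assumption based on the folder named above; the actual default is defined in GeoNetwork itself and may differ.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Illustrative sketch only: not the GeoNetwork implementation.
final class SchemaPublicationDir {

    // Property name taken from the PR description.
    static final String PROPERTY = "geonetwork.schemapublication.dir";

    /**
     * Returns the configured publication folder, or an assumed default
     * under the data directory when the property is missing or empty.
     */
    static Path resolve(Properties props, Path dataDir) {
        String configured = props.getProperty(PROPERTY);
        if (configured != null && !configured.trim().isEmpty()) {
            return Paths.get(configured.trim()).toAbsolutePath().normalize();
        }
        // Assumed fallback, for illustration only.
        return dataDir.resolve("resources").resolve("xml").resolve("schemas")
            .toAbsolutePath().normalize();
    }

    public static void main(String[] args) {
        // Run with e.g. -Dgeonetwork.schemapublication.dir=/tmp/node1/schemas
        System.out.println(resolve(System.getProperties(), Paths.get("data")));
    }
}
```

Pointing each instance at its own folder means that no two nodes copy XSDs into the same location during startup.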

Admin / Site info

The admin console information panel now shows the information needed to check a node's configuration (hostname, harvester configuration).

Harvester

Adds configuration so that scheduled harvesting tasks run in only one instance, which reloads the harvester configuration at a regular interval.
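The idea can be sketched as follows, using the HARVESTER_SCHEDULER_ENABLED and HARVESTER_REFRESH_INTERVAL_MINUTES variables named in the commit message of this PR. This is not the code of RefreshHarvesterJob: it swaps in a plain ScheduledExecutorService to keep the example self-contained, and the class name, method names and default values are assumptions.

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only: not the GeoNetwork implementation.
final class HarvesterRefreshSketch {

    /** Whether this node runs scheduled harvesting (assumed default: true). */
    static boolean schedulerEnabled(Map<String, String> env) {
        return Boolean.parseBoolean(env.getOrDefault("HARVESTER_SCHEDULER_ENABLED", "true"));
    }

    /** Refresh interval in minutes; 0 (assumed default) means no periodic refresh. */
    static long refreshIntervalMinutes(Map<String, String> env) {
        String raw = env.getOrDefault("HARVESTER_REFRESH_INTERVAL_MINUTES", "0");
        try {
            return Math.max(0L, Long.parseLong(raw.trim()));
        } catch (NumberFormatException e) {
            return 0L;
        }
    }

    /**
     * Starts the periodic reload on the harvesting node.
     * Returns null when this node should not refresh anything.
     */
    static ScheduledExecutorService startRefresh(Map<String, String> env, Runnable reloadHarvesters) {
        long minutes = refreshIntervalMinutes(env);
        if (!schedulerEnabled(env) || minutes == 0L) {
            return null;
        }
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleWithFixedDelay(reloadHarvesters, minutes, minutes, TimeUnit.MINUTES);
        return executor;
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        System.out.println("scheduler enabled: " + schedulerEnabled(env));
        System.out.println("refresh interval (min): " + refreshIntervalMinutes(env));
        ScheduledExecutorService refresher =
            startRefresh(env, () -> System.out.println("reloading harvester configuration"));
        if (refresher != null) {
            refresher.shutdownNow(); // demo only: stop right away
        }
    }
}
```

In a cluster, only the node that is in charge of harvesting would be started with the scheduler enabled and a non-zero refresh interval.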

Java 11 fixes

Fixes an error when sending mail under Jetty:

{"message":"NestedServletException","code":"runtime_exception",
"description":"Handler dispatch failed; nested exception is
 java.lang.NoClassDefFoundError: javax/activation/DataSource"}
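For background, the javax.activation package was bundled with the JDK up to Java 8 and removed from it in Java 11, so the class has to be provided by the application or the servlet container. The small check below is a hypothetical diagnostic, not part of this PR: it only reports whether a class is visible to the current class loader, which can help confirm whether a given runtime still exposes javax.activation.DataSource.

```java
// Illustrative diagnostic only: not part of GeoNetwork.
final class ActivationCheck {

    /** Returns true if the named class can be loaded by this class's loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers
            Class.forName(className, false, ActivationCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "javax.activation.DataSource";
        System.out.println(name + " available: " + isOnClasspath(name));
    }
}
```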

Remaining known limitations

- Harvester / The scheduler needs to be refreshed when the harvester configuration in the database is modified (as a stopgap solution, the harvesting node refreshes its schedule every 2 minutes).
- Settings / When application settings are saved, some modules need to be updated on every node. This is not really a blocker because these settings usually don't change often; as a workaround, restart each node or save the settings on each node. Affected modules:
  - log file,
  - DOI configuration,
  - proxy configuration (use Java environment variables instead of the database configuration).
- Thesaurus / Local thesauri modified on one node are not updated on the others.

Future improvements

- Define a way to exchange messages between nodes in order to fix the limitations above (https://trac.osgeo.org/geonetwork/wiki/LoadBalanceable).
- Parallel editing sessions on the same record: add a locking system that prevents multiple editing sessions.


Added hostname to the system information panel. As we discussed previously, more HA-related information is helpful to determine correct behaviour of multiple instances. Knowing which instance is running surely needs to be there. Co-authored-by: Joachim Nielandt <[email protected]>

When using multiple nodes, one is in charge of harvesting tasks. Use the following configuration for this node:

```
HARVESTER_SCHEDULER_ENABLED: "true"
HARVESTER_REFRESH_INTERVAL_MINUTES: 2
```

The node will then check every 2 minutes for any harvesting configuration changes and update its schedule. This is a stopgap solution until we define a better messaging system to deal with those cases. Keep the default config if running only one node.

Co-authored-by: Joachim Nielandt <[email protected]>


core/src/main/java/org/fao/geonet/kernel/SchemaManager.java

release/bin/startup.bat

release/bin/startup.sh

harvesters/src/main/java/org/fao/geonet/kernel/harvest/HarvestManagerImpl.java

harvesters/src/main/java/org/fao/geonet/kernel/harvest/RefreshHarvesterJob.java


sonarqubecloud · 2023-09-25T08:19:08Z

SonarCloud Quality Gate failed.

0 Bugs
0 Vulnerabilities
0 Security Hotspots
8 Code Smells

0.0% Coverage
0.0% Duplication

The version of Java (11.0.20.1) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.

fxprunayre and others added 16 commits June 23, 2023 14:24

WRO4J / Move cache to HTML cache folder.

81c3257

This allows to have multiple instances starting up but having their own separate HTML cache. If not H2 database are locked.

API / Site info / Add all data directory info

f078d30

* Sort by insertion order
* Showing data directory html cache directory setting in admin page

Co-authored-by: Joachim Nielandt <[email protected]>

Test / Fix related to XSD publication dir.

dc46128

Translations / Add better message in case of harvester save issue.

6d34a41

Missing translations.

c774e9c

Bad configuration of a harvester could prevent geonetwork startup

e59ff9f

Proposal to disable harvesters.

6637e48

Java header.

d1c835a

System info / Add env variables related to database migration and har…

c5f5cd3

…vester scheduler. Add some more precise doc links.

Prettier.

83e4a51

Merge branch '424-multipleinstances' into ha

94261f7

Merge branch '440-harvesterscheduleconfig' into ha

71ed6ed

Merge remote-tracking branch 'origin/main' into ha

96ac3ca

fxprunayre added this to the 4.4.0 milestone Sep 13, 2023

This was referenced Sep 13, 2023

Introducing scheduled refresh of harvesterjobs. #7333

Closed

Harvester / Add property to turn off scheduling #7217

Closed

Improvements to allow future deployment of multiple instances #6990

Closed

fxprunayre added 4 commits September 13, 2023 17:21

Test / Add new properties.

3fc13fb

Test / Remove duplicated prop.

2d10959

Jetty / Form config consistent with default docker one

0a95b58

See geonetwork/docker-geonetwork@f21fd8c

Jetty / Update version and fix sending mail

a53fc64

on Java 11. See jetty/jetty.docker#10.

fxprunayre added the changelog label Sep 21, 2023

Adding schemapublication_dir to admin info panel

72b8521

josegar74 reviewed Sep 25, 2023


fxprunayre and others added 2 commits September 25, 2023 10:06

Java doc. Check also GeoNetworkDataDirectory for description of folders.

f216176

Update harvesters/src/main/java/org/fao/geonet/kernel/harvest/Refresh…

0e1b413

…HarvesterJob.java Co-authored-by: Jose García <[email protected]>

josegar74 approved these changes Sep 27, 2023


fxprunayre merged commit 2221501 into main Sep 28, 2023

fxprunayre deleted the ha branch September 28, 2023 06:29

f-necas mentioned this pull request Nov 16, 2023

Geonetwork 4.4 georchestra/geonetwork#261

Closed

josegar74 mentioned this pull request Nov 16, 2023

Add web/src/main/webapp/WEB-INF/data/data/resources/schemapublication folder to .gitignore #7493

Merged

