[RFC] Jigsaw integration #1588
Jigsaw integration OpenSearch
This proposal addresses the following problems related to the OpenSearch codebase and its plugin framework -
Note: this proposal assumes breaking changes are allowed in OpenSearch 2.0, which is when these changes can go in all at once.
Solving the above two problems using Java 9 modules will inherently address other problems such as -
Benefits of Java 9 modules?
Tradeoffs of Java 9 modules:
No support for module versions: Java 9 modules do not support versioning. For this, the usual build tools like Gradle can be used, just as they are now, to handle versioning and to specify the dependency of a modular jar on a specific version.
Split packages: The same package cannot span multiple jars; in JPMS this results in a split package. This was not the case before.
Cyclic dependencies are not allowed between modules, and JPMS promotes dependency inversion using ServiceProvider instead. This implies more refactoring work to split modules out of a gigantic module with circular dependencies among its packages.
Additional effort to maintain dependencies and export rules
Module configurations are immutable, so updating modules at run time is not supported out of the box.
There is a nice blog on concerns regarding Jigsaw - https://developer.jboss.org/blogs/scott.stark/2017/04/14/critical-deficiencies-in-jigsawjsr-376-java-platform-module-system-ec-member-concerns
Before jumping into the integration, if you’re new to JPMS, below are some of its concepts that will be used throughout the document -
Module descriptor file: This file defines all the rules of a module, such as its dependencies and exports, and is named module-info.java. It is placed at the Java source root. This is what differentiates a modular jar from a non-modular one.
Module path: Just like the classpath for plain jars, the module path is for modular jars. It is not mandatory to load all modular jars on the module path; if a modular jar is loaded onto the classpath, it behaves like a plain jar. JPMS has special rules for readability between the module path and the classpath, which will become clear in the following sections.
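To make the module descriptor concrete, here is a minimal sketch that builds one programmatically with the JDK's java.lang.module.ModuleDescriptor API. The comment shows the module-info.java it corresponds to. The module and package names (org.opensearch.server, org.opensearch.core, org.opensearch.action) are assumptions picked for illustration, not a proposed layout.

```java
import java.lang.module.ModuleDescriptor;

public class DescriptorSketch {

    // Programmatic equivalent of this hypothetical module-info.java:
    //
    //   module org.opensearch.server {
    //       requires org.opensearch.core;
    //       exports org.opensearch.action;
    //   }
    //
    // All names are illustrative assumptions.
    static ModuleDescriptor serverDescriptor() {
        return ModuleDescriptor.newModule("org.opensearch.server")
                .requires("org.opensearch.core")   // dependency on another module
                .exports("org.opensearch.action")  // package visible to consumers
                .build();
    }

    public static void main(String[] args) {
        ModuleDescriptor d = serverDescriptor();
        System.out.println("module   " + d.name());
        d.requires().forEach(r -> System.out.println("requires " + r.name()));
        d.exports().forEach(e -> System.out.println("exports  " + e.source()));
    }
}
```

Any package of the module that is not listed in an exports clause stays inaccessible to other modules, which is the encapsulation this proposal wants to rely on.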
Unnamed module: All plain jars that do not contain a module descriptor file and are on the classpath are grouped together into the unnamed module. These non-modular jars behave just like plain jars in a non-JPMS system: they export everything and can depend on classes of any jar present on the classpath. So any plain jar can depend on other plain jars. Also, the unnamed module can read all modular jars (limited to the packages those jars export) and does not have to specify any dependency rules to depend on them. So, in a way, if we place all jars (both modular and non-modular) on the classpath, everything would run the same way as it does now.
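One way to check which kind of module a class ended up in at run time is to ask its java.lang.Module. The sketch below uses only JDK APIs; the class name ModuleKinds and the three labels are made up for this example, and "named" follows this document's usage (a jar with an explicit module descriptor).

```java
import java.lang.module.ModuleDescriptor;

public class ModuleKinds {

    // Classifies the module that a class was loaded into.
    static String kindOf(Class<?> type) {
        Module module = type.getModule();
        if (!module.isNamed()) {
            return "unnamed";              // loaded from the classpath
        }
        ModuleDescriptor descriptor = module.getDescriptor();
        // Automatic modules get a descriptor synthesized by JPMS.
        return descriptor.isAutomatic() ? "automatic" : "named";
    }

    public static void main(String[] args) {
        // JDK classes always live in named modules such as java.base.
        System.out.println("java.lang.String -> " + kindOf(String.class));
        // The result for this class depends on how it is launched
        // (classpath vs. module path).
        System.out.println("ModuleKinds      -> " + kindOf(ModuleKinds.class));
    }
}
```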
Named module: Jars containing a module descriptor file and placed on the module path are automatically treated as named modules. Named modules can only depend on other named modules and cannot directly depend on unnamed modules. This is way too strict, isn’t it? JPMS provides 2 ways to access plain jars that are not yet modularized (unnamed modules on the classpath) - by specifying a readability edge between the source module and all unnamed modules, OR by using automatic modules as described in the next section. Adding a readability edge isn’t a pleasant idea because of 2 reasons -
Automatic module: Jars without a module descriptor file but placed on the module path are treated as automatic modules. Automatic modules have access to all named modules as well as unnamed modules on the classpath. They export all their packages. Also, both modular and non-modular jars have access to everything in automatic modules. They act as a bridge between named and unnamed modules and are a great way to let modular jars on the module path use non-modular jars on the classpath. Modular jars can refer to them by the automatic name JPMS gives these jars, which is derived from the jar name, more details here. It's a great way to progressively migrate packages from non-modular jars to their modular versions.
They come with a tradeoff: since automatic modules export everything, a jar that is later modularized may stop exporting some of the packages used by its consumers, and that could be a breaking change for them. For example, if we put Lucene jars on the module path and treat them as automatic modules, they will export all their packages. An OpenSearch plugin or module that depends on these Lucene automatic modules might break when Lucene actually modularizes its jars and stops exporting some of the packages it uses.
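The name derivation for automatic modules can be observed with java.lang.module.ModuleFinder. The sketch below writes a throwaway jar that contains only a manifest (so no module descriptor) and asks JPMS how it would describe it. The file name lucene-core-8.10.1.jar is only an example input; a jar whose manifest declares Automatic-Module-Name would use that name instead of the one derived from the file name.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.lang.module.ModuleDescriptor;
import java.lang.module.ModuleFinder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class AutomaticModuleName {

    // Creates a manifest-only jar with the given file name in a temp
    // directory and returns the descriptor JPMS derives for it.
    static ModuleDescriptor describe(String jarFileName) {
        try {
            Path dir = Files.createTempDirectory("automatic-module");
            Path jar = dir.resolve(jarFileName);
            dir.toFile().deleteOnExit();
            jar.toFile().deleteOnExit();

            Manifest manifest = new Manifest();
            manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
            try (OutputStream out = Files.newOutputStream(jar);
                 JarOutputStream jarOut = new JarOutputStream(out, manifest)) {
                // nothing besides META-INF/MANIFEST.MF is written
            }

            // A jar without module-info.class is treated as an automatic module.
            return ModuleFinder.of(jar).findAll().iterator().next().descriptor();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ModuleDescriptor d = describe("lucene-core-8.10.1.jar");
        System.out.println(d.name() + " automatic=" + d.isAutomatic());
    }
}
```

For this input the derived name should be lucene.core: the trailing version is stripped and the remaining non-alphanumeric characters become dots. A named module would then depend on it with a requires clause using that name.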
Split packages: When one package is present in multiple jars, those jars cannot be loaded as separate modules on the module path. They can all reside on the classpath, though, as they do today. The OpenSearch codebase is full of split packages between the server module and other modules like opensearch-core, opensearch-x-content, opensearch-cli etc. It also has split packages with Lucene jars, as it contains the org.apache.lucene package. Not just OpenSearch, Lucene also has packages spread across multiple modules, more details here - https://lucene.apache.org/core/7_3_0/MIGRATE.html and https://issues.apache.org/jira/browse/LUCENE-9623
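A first inventory of split packages can be produced with a simple inversion of "jar -> packages" into "package -> jars". The sketch below is only an illustration of that idea: the jar contents in main are hypothetical, and a real run would have to read the package lists from the actual jars.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class SplitPackages {

    // Returns every package that appears in more than one jar,
    // mapped to the jars that contain it.
    static Map<String, Set<String>> find(Map<String, List<String>> packagesByJar) {
        Map<String, Set<String>> jarsByPackage = new TreeMap<>();
        packagesByJar.forEach((jar, packages) ->
                packages.forEach(pkg ->
                        jarsByPackage.computeIfAbsent(pkg, k -> new TreeSet<>()).add(jar)));
        // Keep only packages owned by two or more jars.
        jarsByPackage.values().removeIf(jars -> jars.size() < 2);
        return jarsByPackage;
    }

    public static void main(String[] args) {
        // Hypothetical jar contents, for illustration only.
        Map<String, List<String>> packagesByJar = Map.of(
                "opensearch-server", List.of("org.opensearch.common", "org.opensearch.action",
                        "org.apache.lucene.search"),
                "opensearch-core", List.of("org.opensearch.common"),
                "lucene-core", List.of("org.apache.lucene.search", "org.apache.lucene.index"));

        find(packagesByJar).forEach((pkg, jars) ->
                System.out.println(pkg + " is split across " + jars));
    }
}
```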
There are other important concepts, such as open modules for reflection access, patch module to avoid split packages, and accessing services using the ServiceLoader APIs, which one can explore on their own and which are important for enumerating all possible scenarios and workarounds.
Let's visualize the existing non-JPMS modules in OpenSearch and their dependency graph -
Strategy to modularize
Starting with everything on the classpath, incrementally modularize and move everything to the module path -
Note: The new module names in the diagram above are hypothetical and are used only to explain the strategy.
Rules:
The rules below are just a proposal; there might be better alternatives depending on which tradeoffs are acceptable and which we are willing to take.
From unnamed to automatic module -
Automatic modules are work-in-progress modules whose classes and packages are not yet final. So, packages can move from the unnamed module to an automatic module over time. The constraints are -
From automatic to named module -
A module should only be treated as a named module when all its dependencies are either modularized or present in one of the automatic modules. This rule should be enforced in most cases, and a readability edge from a named module to the unnamed module should be avoided. There could be exceptions, though: a readability edge from a named module to all unnamed modules is acceptable only if that dependency cannot be immediately modularized and it is an internal dependency of that named module, not something exposed and published to external consumers. This is to prevent external consumers from depending on unnamed modules. Also, export rules need to be defined before declaring a module named.
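The promotion rule can be expressed as a small check, sketched below under simplifying assumptions: every dependency is tagged with its current kind, unknown dependencies count as unnamed, and the exception for internal-only dependencies is not modelled. The class, enum and dependency names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class NamedModuleRule {

    enum Kind { UNNAMED, AUTOMATIC, NAMED }

    // Dependencies that still live in the unnamed module and therefore
    // block the promotion to a named module.
    static List<String> blockers(List<String> dependencies, Map<String, Kind> kinds) {
        return dependencies.stream()
                .filter(dep -> kinds.getOrDefault(dep, Kind.UNNAMED) == Kind.UNNAMED)
                .collect(Collectors.toList());
    }

    // A module may become named only when nothing it depends on is unnamed.
    static boolean canBecomeNamed(List<String> dependencies, Map<String, Kind> kinds) {
        return blockers(dependencies, kinds).isEmpty();
    }

    public static void main(String[] args) {
        // Hypothetical state of the migration.
        Map<String, Kind> kinds = Map.of(
                "org.opensearch.core", Kind.NAMED,
                "lucene.core", Kind.AUTOMATIC,
                "some-legacy-lib", Kind.UNNAMED);

        System.out.println(canBecomeNamed(List.of("org.opensearch.core", "lucene.core"), kinds));
        System.out.println(blockers(List.of("org.opensearch.core", "some-legacy-lib"), kinds));
    }
}
```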
Open Questions:
How to prioritize initial use-cases to migrate to modules?
Should it be the REST client, analyzers, or geo?
How to define base modules for initial use cases which would fit in the overall modularization strategy?
A good way is to analyze all packages in the server module and look at their dependency graph. There are cyclic dependencies among almost all packages. The number of dependency edges between 2 packages helps in finding whether that dependency is really needed or not. Thereafter, we can use the role and concern of each package in the system we want to modularize to see whether the dependency makes sense. If not, we need to refactor the code and remove those cycles.
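The edge-counting idea can be sketched as follows. The input is a list of class-level references reduced to (from package, to package) pairs; how those pairs are extracted from the codebase is left open here, and the sample packages are hypothetical. The sketch only reports direct mutual dependencies between two packages, not longer cycles.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PackageCycles {

    // Counts references per (from package -> to package) pair.
    // Each element of references is {fromPackage, toPackage}.
    static Map<String, Map<String, Integer>> countEdges(List<String[]> references) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (String[] ref : references) {
            if (ref[0].equals(ref[1])) {
                continue; // references inside one package are not edges
            }
            counts.computeIfAbsent(ref[0], k -> new TreeMap<>()).merge(ref[1], 1, Integer::sum);
        }
        return counts;
    }

    // Lists package pairs that depend on each other, with the edge count
    // in both directions. The weaker direction is the natural candidate
    // for refactoring.
    static List<String> mutualDependencies(Map<String, Map<String, Integer>> counts) {
        List<String> result = new ArrayList<>();
        counts.forEach((from, targets) -> targets.forEach((to, forward) -> {
            int backward = counts.getOrDefault(to, Map.of()).getOrDefault(from, 0);
            // compareTo keeps each pair from being reported twice
            if (backward > 0 && from.compareTo(to) < 0) {
                result.add(from + " -> " + to + ": " + forward
                        + ", " + to + " -> " + from + ": " + backward);
            }
        }));
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical references, for illustration only.
        List<String[]> references = List.of(
                new String[] {"org.opensearch.index", "org.opensearch.search"},
                new String[] {"org.opensearch.index", "org.opensearch.search"},
                new String[] {"org.opensearch.index", "org.opensearch.search"},
                new String[] {"org.opensearch.search", "org.opensearch.index"},
                new String[] {"org.opensearch.index", "org.opensearch.common"});

        mutualDependencies(countEdges(references)).forEach(System.out::println);
    }
}
```

In this made-up data, the single reference from org.opensearch.search back to org.opensearch.index is the kind of edge worth questioning first.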
Renaming packages during modularization?
This is required to avoid split packages. For lack of a better solution so far, the only way out is to rename packages while migrating. There are 2 problems with this approach -
Modularizing libraries like Lucene which are not yet modularized and have split packages among their jars
Lucene recommends creating an uber jar for all modules and then using it as a dependency. We can make use of patch module to create one uber Lucene jar from all the Lucene plain jars, then add it to the module path and treat it as an automatic module.
https://lucene.apache.org/core/7_3_0/MIGRATE.html & https://issues.apache.org/jira/browse/LUCENE-9623
Libraries that get modularized later will break the consumers using them. Is that acceptable?
It can potentially break, once again, the plugins that use these dependencies.
Challenges:
Some of the perceivable challenges in implementing these changes -
Related requests and open issues in elasticsearch - elastic/elasticsearch#38299 & elastic/elasticsearch#28984
Looking for feedback here.