As a user, I want to be able to use both online and local schema/schematron files. #599

Closed
tbarnes4 opened this issue Feb 24, 2023 · 13 comments · Fixed by #632

@tbarnes4

Checked for duplicates

Yes - I've already checked

πŸ§‘β€πŸ”¬ User Persona(s)

Anyone who validates a file, collection, or bundle against a new or updated schema/schematron that has not yet been ingested online, while that file, collection, or bundle also uses other schemas/schematrons that are available online.

💪 Motivation

So that when I invoke the -x or -S options, I do not have to locally specify every single schema/schematron that a file/bundle/collection references when only one or two are not available online. It is frustrating that when I validate a bundle, the tool gives errors for schemas that are easily found online.

📖 Additional Details

This may relate to issue #513.

Whenever we update a schema/schematron file, a mission provides a new LDD, or we are testing a new or updated LDD, the PDS4 product label usually also references other schema/schematron files that are completely valid. When I run the validate tool with the -x or -S option, it requires me to specify all schema/schematron files that may be referenced anywhere in my label/bundle/collection. This means I either have to track down every single reference beforehand or specify every possible schema/schematron file. These options create an all-or-nothing scenario, instead of letting me list only the missing or updated schemas/schematrons.

I would ask that when the -x or -S options are invoked, the validate tool first checks the files specified by those options, then checks the copies posted online, and only if nothing is found reports errors as it does now.
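
The lookup order requested here could be sketched roughly as follows. This is only an illustration under stated assumptions, not validate's actual code: the class `SchemaLookupSketch`, the `resolve` method, the choice to match overrides by file name, the `availableOnline` check, and the example.org URLs are all hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical sketch of the requested order: local override, then online copy, then error. */
class SchemaLookupSketch {

  /** Returns the last path segment of a schema location, e.g. "new_ldd.xsd". */
  static String fileName(String schemaLocation) {
    int slash = schemaLocation.lastIndexOf('/');
    return slash < 0 ? schemaLocation : schemaLocation.substring(slash + 1);
  }

  /**
   * Resolves one schema/schematron reference found in a label.
   *
   * @param schemaLocation URL referenced by the label
   * @param overrides files given via -x or -S, keyed by file name (an assumption of this sketch)
   * @param availableOnline stand-in for "the online copy can be fetched"
   * @return "local:" + path, "online:" + url, or "missing:" + url when neither source has it
   */
  static String resolve(String schemaLocation, Map<String, Path> overrides,
      Predicate<String> availableOnline) {
    Path local = overrides.get(fileName(schemaLocation));
    if (local != null) {
      return "local:" + local; // 1. files specified on the command line win
    }
    if (availableOnline.test(schemaLocation)) {
      return "online:" + schemaLocation; // 2. fall back to the posted copy
    }
    return "missing:" + schemaLocation; // 3. only now report an error
  }

  public static void main(String[] args) {
    Map<String, Path> overrides = new HashMap<>();
    overrides.put("new_ldd.xsd", Paths.get("new_ldd.xsd"));
    // Placeholder for the set of schema URLs that are actually posted online.
    Set<String> posted = Set.of("https://example.org/online/disp.xsd");

    System.out.println(resolve("https://example.org/online/new_ldd.xsd", overrides, posted::contains));
    System.out.println(resolve("https://example.org/online/disp.xsd", overrides, posted::contains));
    System.out.println(resolve("https://example.org/online/unknown.xsd", overrides, posted::contains));
  }
}
```

Keying the overrides by file name keeps the sketch short; a real implementation might key by namespace or by the full schema location instead.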

I will also note that when you do not invoke the -x or -S options and validate cannot find a schema/schematron referenced in the label, it reports a WARNING (schema_reference.4: Failed to read schema document) and an ERROR (cvc-complex-type.2.4.c: The matching wildcard is strict, but no declaration can be found for element). If you do invoke the -x or -S options but do not specify all files, it gives no warning about the missing schema, but gives similar errors for the online schemas instead (ex: ERROR [error.label.schema] line XYZ, AB: cvc-complex-type.2.4.c: The matching wildcard is strict, but no declaration can be found for element 'disp:Display_Settings'). Only when you specify both the schemas missing online and the online schemas do you get no errors or warnings.

Acceptance Criteria

Given
When I perform
Then I expect

βš™οΈ Engineering Details

No response

@jordanpadams
Member

@tbarnes4 we will triage this to see where it fits in priority for next build. In the interim, you can find all schemas and schematrons here: https://pds.nasa.gov/datastandards/dictionaries/index-1.19.0.0.shtml

Sorry for the inconvenience

@jordanpadams jordanpadams removed their assignment Feb 24, 2023
@al-niessner al-niessner self-assigned this Apr 17, 2023
@al-niessner
Contributor

@jordanpadams using the example in #513 to fix this as well

@al-niessner
Contributor

@jordanpadams

I am using a butchered version of the data on #513 because it is so simple, but I am having some problems. Judging by their command-line descriptions, the switches -S and -x clearly want files:

-S,--schematron <schematron files>      Specify schematron files.
-x,--schema <schema files>              Specify schema files.

However, the code is having fits because it wants a directory:

          validatingSchema = schemaFactory.newSchema(
              loadSchemaSources(VersionInfo.getSchemasFromDirectory().toArray(new String[0]))
                  .toArray(new StreamSource[0]));

If we are allowing content in the XML to define schema and schematron and have files override specific schema and schematron, then files make more sense. If we want to retain that all schema and/or schematron must be overridden then directories make more sense. From the tickets here it seems the desire is for files (selective overrides not all or nothing overrides).

So, do you want me to change:

  1. code to be files not directories (keep all or nothing)
  2. command line docs to be directories instead of files (keep all or nothing)
  3. code to be files not directories (override just what is given and use XML definitions for rest)
  4. command line docs to be directories (override just what is given and use XML definitions for rest)
  5. check arg for file or dir, then process as 1/2 or 3/4
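
For illustration, option 5 (accept either a file or a directory) might look something like the sketch below. The class `SchemaArgExpander`, its `expand` method, and the rule that a directory contributes only its direct children with a matching extension are assumptions made for this example, not behavior taken from validate.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical sketch of option 5: accept files or directories for -x / -S. */
class SchemaArgExpander {

  /**
   * Expands each argument into concrete files. A directory contributes its direct children
   * ending in the given extension (sorted); anything else is passed through as a file.
   */
  static List<Path> expand(List<Path> args, String extension) {
    List<Path> files = new ArrayList<>();
    for (Path arg : args) {
      if (Files.isDirectory(arg)) {
        try (Stream<Path> children = Files.list(arg)) {
          children.filter(p -> p.getFileName().toString().endsWith(extension))
              .sorted()
              .forEach(files::add);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      } else {
        files.add(arg);
      }
    }
    return files;
  }

  /** Creates a throwaway directory holding two .xsd files and one .sch file, for the demo. */
  static Path makeDemoDir() {
    try {
      Path dir = Files.createTempDirectory("schemas");
      Files.createFile(dir.resolve("a.xsd"));
      Files.createFile(dir.resolve("b.sch"));
      Files.createFile(dir.resolve("c.xsd"));
      return dir;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Path dir = makeDemoDir();
    // The directory expands to a.xsd and c.xsd; the explicit file is kept as given.
    System.out.println(expand(List.of(dir, Paths.get("extra.xsd")), ".xsd"));
  }
}
```

Whether the expanded files then replace everything (options 1/2) or only override what they match (options 3/4) is a separate decision that this sketch does not make.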

@al-niessner
Contributor

@jordanpadams

I should note that this is going to take a week of effort as the whole schema/schematron loading is going to need some rework. If the encoding stuff is more urgent let me know but it seems like there is enough time for both.

@jordanpadams
Member

@al-niessner nope. this is fine. thanks

@jordanpadams
Member

@al-niessner #3 is preferred solution:

  3. code to be files not directories (override just what is given and use XML definitions for rest)

@tbarnes4
Author

@al-niessner @jordanpadams

@al-niessner #3 is preferred solution:

  3. code to be files not directories (override just what is given and use XML definitions for rest)

I think option 5 would be preferred. As I understand it, we currently have to specify each file (not a directory) when we use the -x or -S options. Having the option to specify a directory (perhaps in addition to, but not excluding, individual file calls) would be nice, but not required. This may be too complicated, and usually the list of changed/new schema/schematron files should be small. If I understand option 3 correctly, that should work well for us.

Thanks for adding this functionality. It will greatly help our node with easing our migration efforts and upcoming missions.

I can also foresee that when we are versioning bundles/collections, we will not update certain product labels (or whole collections), and so multiple build versions may be called upon for different products. Having that capability will still be nice. When I was validating last week, I noticed that if I include a specific build with -x and -S, it excludes all other builds and suggests an update to build XYZ for the other products that contain a different build. I don't believe this happens if I don't specify a build with the -x and -S options.

@jordanpadams
Member

I noticed if I include a specific build with the -x and -S that it would exclude all other builds

@tdbarnes4 that is actually intentional to "overwrite" the schemas/schematrons specified in the file so you could validate your products against the latest version of the PDS4 IM. Is there a specific reason why you are specifying the schemas via command-line instead of just pulling the online version?

@tbarnes4
Author

tbarnes4 commented Apr 18, 2023 via email

@al-niessner
Contributor

al-niessner commented Apr 18, 2023

@jordanpadams

Sorry, but it just got more complicated. The error occurs because the directories come from the core.properties file, while the instruction to use the values in the properties file comes from the command line: it is implied by not using the force option, which is turned off automatically when -S or -x is given.

So, what about the properties file? Kill it with respect to the schema/schematron or kill the command line? They do seem at odds or, at the least, need explanation for anyone intending on using the -S or -x and the interaction among all of the options. I can write the explanation if you tell me the desired interaction.

@jordanpadams
Member

@al-niessner can you direct me to where in the code it is actually getting a directory and/or what it is trying to do? I am not seeing anything that makes sense:

xml.version=1.0
library.version=1.14.0
pds.version=4.0
pds.default.namespace=http://pds.nasa.gov/pds4/pds/v1
pds.default.namespace.1B00=http://pds.nasa.gov/pds4/pds/v1
pds.default.namespace.1A10=http://pds.nasa.gov/pds4/pds/v1
pds.default.namespace.1A00=http://pds.nasa.gov/pds4/pds/v1
pds.default.namespace.1900=http://pds.nasa.gov/pds4/pds/v1
pds.default.namespace.1800=http://pds.nasa.gov/pds4/pds/v1
pds.default.namespace.1700=http://pds.nasa.gov/pds4/pds/v1
pds.default.namespace.1600=http://pds.nasa.gov/pds4/pds/v1
pds.default.namespace.1500=http://pds.nasa.gov/pds4/pds/v1
pds.default.namespace.1400=http://pds.nasa.gov/pds4/pds/v1
pds.default.namespace.1301=http://pds.nasa.gov/pds4/pds/v1
pds.default.namespace.1300=http://pds.nasa.gov/pds4/pds/v1
pds.default.namespace.1201=http://pds.nasa.gov/pds4/pds/v1
pds.default.namespace.1200=http://pds.nasa.gov/pds4/pds/v1
pds.default.namespace.1101=http://pds.nasa.gov/pds4/pds/v1
pds.default.namespace.1100=http://pds.nasa.gov/pds4/pds/v1
pds.default.namespace.1000=http://pds.nasa.gov/pds4/pds/v1

core.schematron.namespace=http://purl.oclc.org/dsdl/schematron

core.copyright=\nCopyright 2010-2020, by the California Institute of Technology.\nALL RIGHTS RESERVED. United States Government Sponsorship acknowledged.\nAny commercial use must be negotiated with the Office of Technology Transfer\nat the California Institute of Technology.\n\nThis software is subject to U. S. export control laws and regulations\n(22 C.F.R. 120-130 and 15 C.F.R. 730-774). To the extent that the software\nis subject to U.S. export control laws and regulations, the recipient has\nthe responsibility to obtain export licenses or other export authority as\nmay be required before exporting such information to foreign countries or\nproviding access to foreign nationals.

Regardless, I think we can probably skip looking in that file for the schema/schematron.

@al-niessner
Contributor

al-niessner commented Apr 18, 2023

Here is the code that loads core.properties:

static {
  try {
    xmlParserVersion = org.apache.xerces.impl.Version.getVersion();
    InputStream is = VersionInfo.class.getResourceAsStream("/core.properties");
    if (is == null) {
      throw new RuntimeException("ERROR: Unable to locate core.properties");
    }
    props.load(is);
    String schemaDirString = System.getProperty(SCHEMA_DIR_PROP);
    internalMode = (schemaDirString == null) ? true : false;
    if (!internalMode) {
      schemaDir = new File(schemaDirString);
      if (!schemaDir.exists()) {
        throw new RuntimeException("Schema directory does not exist: " + schemaDirString);
      }
      if (!schemaDir.isDirectory()) {
        throw new RuntimeException("Schema directory is not a directory: " + schemaDirString);
      }
    } else {
      schemaDir = null;
    }
  } catch (IOException e) {
    throw new RuntimeException(e);
  }
}

The confusion happens here (useLabelSchema is related to the force switch which gets set to false when -S or -x is used):

if (useLabelSchema) {
  LOG.debug("createParserIfNeeded:#00BB9");
  validatingSchema = schemaFactory.newSchema();
} else {
  LOG.debug("createParserIfNeeded:#00BC0");
  // Load from user specified external directory
  validatingSchema = schemaFactory.newSchema(
      loadSchemaSources(VersionInfo.getSchemasFromDirectory().toArray(new String[0]))
          .toArray(new StreamSource[0]));
}
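
The branch above only distinguishes "use the label's schemas" from "load from a directory". With the selective override discussed in this thread, the decision might instead be three-way, as in the sketch below. The class `SchemaSourceDecision`, the `Mode` enum, and the precedence of command-line files over a configured directory are all hypothetical; this is not the change that was merged for this issue.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: deciding where schemas come from once -x/-S files are selective. */
class SchemaSourceDecision {

  enum Mode {
    LABEL_ONLY,       // nothing given: use what the label references
    FILES_THEN_LABEL, // -x/-S files override matching references, the label supplies the rest
    DIRECTORY         // a schema directory was configured explicitly
  }

  /**
   * @param userFiles files given via -x or -S (may be empty)
   * @param schemaDir configured schema directory, or null when none is set
   */
  static Mode choose(List<String> userFiles, String schemaDir) {
    if (!userFiles.isEmpty()) {
      // Assumption: -x/-S alone never sends us into the directory branch.
      return Mode.FILES_THEN_LABEL;
    }
    if (schemaDir != null) {
      return Mode.DIRECTORY;
    }
    return Mode.LABEL_ONLY;
  }

  public static void main(String[] args) {
    System.out.println(choose(Collections.emptyList(), null));
    System.out.println(choose(List.of("new_ldd.xsd"), null));
    System.out.println(choose(Collections.emptyList(), "/opt/schemas"));
  }
}
```

Separating "the user gave files" from "a directory is configured" avoids the situation described here, where passing -S or -x switches the code onto a path that expects a directory.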

@tloubrieu-jpl
Member

We decided to temporarily remove the cucumber test for this ticket because it breaks all the other tests.
A separate ticket, #633, aims to reintegrate this test.

@github-project-automation github-project-automation bot moved this from Release Backlog to 🏁 Done in B14.0 May 3, 2023