alternativeSet / setComponent / componentEntry #154

Closed
2 of 9 tasks
fordmadox opened this issue Aug 9, 2020 · 14 comments

Comments

@fordmadox
Member

fordmadox commented Aug 9, 2020

Creator of issue

  1. Mark Custer

The issue relates to

  • EAC-CPF schema issue
  • EAC-CPF Tag Library issue
  • EAD schema issue
  • EAD Tag Library issue
  • Schema issue
  • Tag Library issue
  • Suggestions for all schemas
  • Suggestions for all Tag Libraries
  • Other

Wanted change/feature

  • Is there any reason to keep alternativeSet? Is there a use case for this module in EAD, Functions, EAG, etc.?
  • In the ~225k EAC files I've seen so far, there are 0 uses of alternativeSet. Does anyone have real-world examples?
  • The tag library highlights an example of embedding an alternative EAC file. I've since defined objectXMLWrap so that it cannot include elements in the source file's namespace, however (the same way that this is set up in EAD3). To allow objectXMLWrap to contain any element with any attribute AND to use the ID datatype in RNG elsewhere, I believe that excluding the default namespace from objectXMLWrap is required, and doing that makes sense to me. If we really want to include EAC files/snippets in objectXMLWrap within EAC files, however, then we could move that ID check to the Schematron for the RNG schema, but I would prefer not to deviate from how EAD3 has this modeled currently since that definition makes more sense, to me at least.
  • Given all of that, is there any reason to keep this section of EAC? I'm adding it to the draft schema, but if this feature has never been used (aside from when creating examples in the tag library), I'd suggest it be removed entirely.
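
As a rough illustration of the restriction described above (not the schema itself), the following sketch uses Python's standard `xml.etree.ElementTree` to flag any direct child of `objectXMLWrap` that sits in the host document's own namespace. The helper name `same_namespace_children` and the sample document are hypothetical; `urn:isbn:1-931666-33-4` is assumed to be the EAC-CPF 1.x namespace.

```python
import xml.etree.ElementTree as ET

EAC_NS = "urn:isbn:1-931666-33-4"  # assumed EAC-CPF 1.x namespace


def same_namespace_children(xml_text, host_ns=EAC_NS):
    """Return local names of objectXMLWrap children that are in host_ns.

    Hypothetical helper that mimics an EAD3-style restriction; it is not
    part of any EAC or EAD tooling.
    """
    root = ET.fromstring(xml_text)
    prefix = "{" + host_ns + "}"
    flagged = []
    for wrap in root.iter(prefix + "objectXMLWrap"):
        for child in wrap:
            if child.tag.startswith(prefix):
                flagged.append(child.tag[len(prefix):])
    return flagged


# Made-up sample: one wrapper embeds an EAC-namespaced element,
# the other embeds XML that is in no namespace at all (xmlns="").
SAMPLE = """\
<eac-cpf xmlns="urn:isbn:1-931666-33-4">
  <alternativeSet>
    <setComponent>
      <objectXMLWrap>
        <eac-cpf><control/></eac-cpf>
      </objectXMLWrap>
    </setComponent>
    <setComponent>
      <objectXMLWrap>
        <container xmlns=""><filename>example.xml</filename></container>
      </objectXMLWrap>
    </setComponent>
  </alternativeSet>
</eac-cpf>
"""

flagged = same_namespace_children(SAMPLE)
print(flagged)
```

Only the embedded `eac-cpf` element is reported; the `container` element resets the default namespace and so would still be allowed under this kind of rule.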

fordmadox changed the title from "alternativeSet / setComponent" to "alternativeSet / componentEntry" on Aug 9, 2020
@fordmadox
Member Author

fordmadox commented Aug 9, 2020

I should also mention that if we can find any examples from aggregators (APE perhaps, since I don't think there are any from SNAC), I'd be curious to see how the grouped authority records could be encoded using the new relations data model.

fordmadox changed the title from "alternativeSet / componentEntry" to "alternativeSet / setComponent / componentEntry" on Aug 11, 2020
@ailie-s

ailie-s commented Sep 9, 2020

The National Library of Australia's Trove uses <alternativeSet> when aggregating data from multiple sources as EAC-CPF.

@fordmadox
Member Author

@ailie-s thanks so much for the tip!

I've now found some examples, e.g. http://www.nla.gov.au/apps/srw/search/peopleaustralia?query=pa.firstname+%3D+%22ailie%22&version=1.1&operation=searchRetrieve&recordSchema=urn%3Aisbn%3A1-931666-33-4&maximumRecords=10&startRecord=1&resultSetTTL=300&recordPacking=xml&recordXPath=&sortKeys=

I'll add those to my set and see if I can figure out how they are utilized by Trove.

Is there anyone there we could talk to about their current EAC implementation?

The current implementation presents a question for objectXMLWrap, I think. EAC currently allows any namespace here, whereas EAD allows any element in a namespace other than its own. The NLA Trove examples embed EAC-namespaced elements within EAC, though. For that use case, I think it would make more sense to allow the eac-cpf element to be included in setComponent, rather than adding it to objectXMLWrap.

Also, I just noticed that the schema says that componentEntry is optional (and it is not included in the NLA files), whereas the tag library says: "The mandatory <componentEntry>..."

@kerstarno
Contributor

About the <objectXMLWrap> question: didn't we say that we'd need to open this up in EAD once we've moved to a shared namespace for the EAS? I.e. have <objectXMLWrap> with any namespace in both contexts? And wouldn't this then include the root element(s) anyway?

Just wondering if allowing <eac-cpf> (or rather <eac> following a decision from the last EAC team meeting) within <setComponent> as such would open Pandora's box of a potentially endless nesting...

@fordmadox
Member Author

fordmadox commented Sep 14, 2020

@kerstarno the primary reason that I like the EAD3 approach is that if you allow any element in the current namespace to be added to objectXMLWrap, then you allow anyone to intentionally (or not) create any element whatsoever in the EAC namespace. Here's an example:

        <alternativeSet>
            <setComponent>
                <objectXMLWrap>
                    <mads xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/mads/mads.xsd">
                        <authority>
                            <name>
                                <namePart>Smith,John</namePart>
                                <namePart type="date">1995-</namePart>
                            </name>
                        </authority>
                        <variant type="other">
                            <name>
                                <namePart>Smith, J</namePart>
                            </name>
                        </variant>
                        <variant type="other">
                            <name>
                                <namePart>Smith, John J</namePart>
                            </name>
                        </variant>
                        <note type="history">Biographical note about John Smith.</note>
                        <affiliation>
                            <organization>Lawrence Livermore Laboratory</organization>
                            <dateValid>1987</dateValid>
                        </affiliation>
                    </mads>
                </objectXMLWrap>
            </setComponent>
        </alternativeSet>

It looks like I've just embedded a MADS file using EAC's alternativeSet approach, and I have, but since the namespace has not been changed, what I've also done is define nine new elements that are part of the EAC-CPF namespace (e.g. eac:affiliation). That's why I think it's far, far better not to allow an element in any namespace to be part of objectXMLWrap in EAS (but I still think we could do without objectXMLWrap altogether; if kept, I think we should force a namespace switch, as EAD3 has done, which still allows users to embed XML that is not in any namespace whatsoever).
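
To make the namespace inheritance concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It parses a shortened version of the snippet above, with the EAC-CPF default namespace (assumed here to be `urn:isbn:1-931666-33-4`) declared on the wrapper, and lists every element name that ends up in that namespace. The helper `names_in_namespace` is hypothetical.

```python
import xml.etree.ElementTree as ET

EAC_NS = "urn:isbn:1-931666-33-4"  # assumed EAC-CPF 1.x namespace

# Shortened version of the embedded record; no namespace switch on <mads>.
SNIPPET = """\
<objectXMLWrap xmlns="urn:isbn:1-931666-33-4">
  <mads>
    <authority><name><namePart>Smith,John</namePart></name></authority>
    <variant type="other"><name><namePart>Smith, J</namePart></name></variant>
    <note type="history">Biographical note about John Smith.</note>
    <affiliation>
      <organization>Lawrence Livermore Laboratory</organization>
      <dateValid>1987</dateValid>
    </affiliation>
  </mads>
</objectXMLWrap>
"""


def names_in_namespace(xml_text, ns):
    """Return the sorted, distinct local names of all elements in namespace ns."""
    prefix = "{" + ns + "}"
    root = ET.fromstring(xml_text)
    return sorted(
        {el.tag[len(prefix):] for el in root.iter() if el.tag.startswith(prefix)}
    )


inherited = names_in_namespace(SNIPPET, EAC_NS)
inherited.remove("objectXMLWrap")  # the wrapper itself is a real EAC element
print(len(inherited), inherited)
```

Every unprefixed descendant of `mads` is reported as belonging to the EAC namespace, which is the effect described above.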

As for the possibility of endless recursion, that is already possible in EAC 1.0. See the attached file, where the only thing different is that I also added a new element, in the EAC namespace, named "whenShouldIStop" 😄. I had to change it to a TXT file since GitHub wouldn't allow an XML file, but it is perfectly valid according to EAC 1.0.

RCR00751.recursion.example.txt
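
The attached file is not reproduced here, but the kind of nesting it demonstrates can be measured with a short sketch. The function below (a hypothetical helper, assuming the EAC-CPF 1.x namespace `urn:isbn:1-931666-33-4`) walks a document iteratively and reports how many levels of `eac-cpf` are embedded inside the root `eac-cpf`; the sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

EAC = "{urn:isbn:1-931666-33-4}"  # assumed EAC-CPF 1.x namespace, Clark notation


def max_eac_nesting(root):
    """Return the deepest level of eac-cpf embedded below root (0 = none).

    Uses an explicit stack rather than recursion, so very deep nesting
    does not hit Python's recursion limit.
    """
    deepest = 0
    stack = [(root, 0)]
    while stack:
        element, level = stack.pop()
        for child in element:
            child_level = level + 1 if child.tag == EAC + "eac-cpf" else level
            deepest = max(deepest, child_level)
            stack.append((child, child_level))
    return deepest


# Made-up sample: an EAC record embedding a record that embeds another record.
NESTED = """\
<eac-cpf xmlns="urn:isbn:1-931666-33-4">
  <alternativeSet><setComponent><objectXMLWrap>
    <eac-cpf>
      <alternativeSet><setComponent><objectXMLWrap>
        <eac-cpf><control/></eac-cpf>
      </objectXMLWrap></setComponent></alternativeSet>
    </eac-cpf>
  </objectXMLWrap></setComponent></alternativeSet>
</eac-cpf>
"""

depth = max_eac_nesting(ET.fromstring(NESTED))
print(depth)
```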

@fordmadox fordmadox reopened this Sep 14, 2020
@fordmadox
Member Author

fordmadox commented Sep 14, 2020

So, instead, if we keep objectXMLWrap around for the long haul in EAS, then I'd strongly suggest we follow the EAD3 way for how that is defined. Whenever you want to embed snippets of elements in the current namespace, then I think we should actually define what's allowed via the schema (e.g. perhaps we want to allow any component element to be included in a setComponent element). I wouldn't want to do that, but I think that would be the better approach, rather than allowing any EAD3 element to be included in objectXMLWrap, because when you do that, you allow folks to add any element name whatsoever to the EAD3 namespace, which is a real mess (e.g. ead3:mets, etc.).

@kerstarno
Contributor

kerstarno commented Sep 15, 2020

I mainly meant that, if we have a shared namespace (EAS) for both EAC and EAD, then we would possibly need to allow having that namespace as an option in <objectXMLWrap> so that one could have snippets of an EAC file within an EAD file and the other way round, i.e. snippets from the file's own (EAS) namespace. But maybe we don't want that and would rather people create a separate EAC (or EAD) file and link to that.

As for the more general aspect, and maybe I've always misunderstood <objectXMLWrap>: isn't it that by declaring the namespace http://www.loc.gov/mads/mads.xsd within the direct sub-element of <objectXMLWrap> one specifies that the elements that follow are from the MADS namespace? I wouldn't interpret this as adding these elements to the EAS namespace. Or would we need to have something like the following in order to make 100% sure that's not the case?

<objectXMLWrap>
  <mads:mads xmlns:mads="http://www.loc.gov/mads/v2"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.loc.gov/mads/mads.xsd">
    <mads:authority>
      <mads:name>
        <mads:namePart>Smith,John</mads:namePart>
        <mads:namePart type="date">1995-</mads:namePart>
      </mads:name>
    </mads:authority>
    <mads:variant type="other">
      <mads:name>
        <mads:namePart>Smith, J</mads:namePart>
      </mads:name>
    </mads:variant>
    <mads:variant type="other">
      <mads:name>
        <mads:namePart>Smith, John J</mads:namePart>
      </mads:name>
    </mads:variant>
    <mads:note type="history">Biographical note about John Smith.</mads:note>
    <mads:affiliation>
      <mads:organization>Lawrence Livermore Laboratory</mads:organization>
      <mads:dateValid>1987</mads:dateValid>
    </mads:affiliation>
   </mads:mads>
</objectXMLWrap>
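
One way to check this question mechanically is with a namespace-aware parser. The sketch below uses Python's standard `xml.etree.ElementTree` and compares three minimal, made-up cases: an unprefixed `mads` element that carries only `xsi:schemaLocation`, a `mads:` prefix bound with `xmlns:mads`, and a `mads:` prefix with no binding. The MADS namespace URI `http://www.loc.gov/mads/v2` and the EAC-CPF namespace `urn:isbn:1-931666-33-4` are assumptions added for the example.

```python
import xml.etree.ElementTree as ET

# Case 1: only xsi:schemaLocation is present; no namespace switch.
XSI_ONLY = (
    '<objectXMLWrap xmlns="urn:isbn:1-931666-33-4">'
    '<mads xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="http://www.loc.gov/mads/mads.xsd"/>'
    '</objectXMLWrap>'
)

# Case 2: the mads prefix is bound to the (assumed) MADS namespace.
PREFIX_BOUND = (
    '<objectXMLWrap xmlns="urn:isbn:1-931666-33-4">'
    '<mads:mads xmlns:mads="http://www.loc.gov/mads/v2"/>'
    '</objectXMLWrap>'
)

# Case 3: the mads prefix is used without any xmlns:mads declaration.
PREFIX_UNBOUND = (
    '<objectXMLWrap xmlns="urn:isbn:1-931666-33-4">'
    '<mads:mads/>'
    '</objectXMLWrap>'
)


def first_child_tag(xml_text):
    """Return the namespace-qualified tag of the wrapper's first child."""
    return ET.fromstring(xml_text)[0].tag


xsi_only_tag = first_child_tag(XSI_ONLY)
bound_tag = first_child_tag(PREFIX_BOUND)

try:
    first_child_tag(PREFIX_UNBOUND)
    unbound_is_error = False
except ET.ParseError:
    unbound_is_error = True

print(xsi_only_tag)
print(bound_tag)
print(unbound_is_error)
```

In this sketch, `xsi:schemaLocation` is treated as an ordinary attribute and leaves the element in the surrounding default namespace; only an `xmlns` or `xmlns:prefix` declaration moves it elsewhere, and a prefix without a declaration does not parse at all.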

@fordmadox
Member Author

fordmadox commented Dec 3, 2020

Even with a shared namespace, I think we should model EAS such that we would never wrap EAS elements in objectXMLWrap. If it is important to allow, say, an "ead:c" element in an EAC relationship, then we should have the EAC schema permit the "ead:c" element to be present in that location of the EAC document (and/or to use our pointing/linking mechanisms, as you say... in fact, we could finally have an example of @xpointer in that case 😄). I wouldn't want to change course on EAD3's decision to exclude EAD3 from objectXMLWrap, since I think that's a good decision. Otherwise, anyone can indeed create a "mads" element in the EAD3 namespace (simply by not switching namespace declaration, which is the path of least resistance), despite the fact that we don't define a "mads" element in the EAD3 tag library or schema.

@fordmadox
Member Author

fordmadox commented Dec 3, 2020

I should also add that for the EAC1 to EAC2 transformation process, this will not be an issue at all for the NLA files. Those will still have the "eac-cpf" documents embedded in the objectXMLWrap, and they will remain in the EAC1 namespace. But, it does raise a question for future usage for NLA-style aggregations of EAC documents. If we want EAC2 documents to be able to hold EAC2 documents, then a very simple option (which wouldn't open the floodgates for elements like objectXMLWrap/mads to show up in EAC2's namespace) would be to make the new "eac" element a valid child of "setComponent".

This was referenced Dec 21, 2020
@fordmadox
Member Author

fordmadox commented Jan 2, 2021

One other possible option for the alternativeSet, if we really want to go the link-route rather than the embed-route (and if we aren't concerned with migration options, depending on NLA's response):

Now that we've inherited representation from EAD3, what about using that approach instead?

It is much simpler, but it would nevertheless allow us to come up with some examples, if we slightly alter the definition of that element, to show it could be used to point to parallel transformations of the file (as currently described in the EAD3 tag library, e.g. by pointing to a PDF transformation of the XML file) in addition to files / representations that were used to derive the current file (which could be used for EAC, as done by NLA, as well as for EAD, when those files are first derived, for example, from one or more bibliographic records).

@kerstarno
Contributor

kerstarno commented Jan 4, 2021

I like the suggestion to look into using <representation> instead of <alternativeSet>. I'm not sure whether Ailie will be joining the next EAC team meeting (I had an out-of-office reply from her email address earlier) or whether she has had a response from NLA already. As this is a specifically EAC context, do we know whether SNAC used <alternativeSet> in some way during their set-up stage, even if they aren't using it anymore?

@fordmadox
Member Author

fordmadox commented Jan 8, 2021

SNAC does not serialize anything to the 'alternativeSet' section. I believe they manage that in their database, but not at all with the EAC records, aside from including a reference as a "source", e.g.:

      <source xlink:href="http://www.aaa.si.edu/collections/findingaids/hanbuna.xml" xlink:type="simple">
        <objectXMLWrap>
          <container xmlns="">
            <filename>/data/source/findingAids/aar/hanbuna.xml</filename>
            <ead_entity en_type="persname" encodinganalog="600" source="lcsh">Fuller, R. Buckminster (Richard Buckminster), 1895-</ead_entity>
          </container>
        </objectXMLWrap>
      </source>

@fordmadox
Member Author

And a lot of SNAC's initial setup files for ingest are here, https://github.com/snac-cooperative/snac_eac_cpf_utils, but again, "alternativeSet" is not used to keep track of source MARC files, etc.

@SJagodzinski
Contributor

EAC-CPF meeting, Feb 2021:

With feedback from the National Library of Australia, it seems clear that <alternativeSet> is used for the EAC-CPF encoding of an entire authority record. Agreement to keep the element as it is.
