Tabular EAD generation and transformation
In the EHRI EAD dataset management interface, XML can be transformed using two types of transformation: XSLT documents and tabular XQuery mappings. XSLT is more suitable for making small changes to documents, whereas XQuery transformations can use a more expressive programmatic syntax to perform more complex tasks somewhat more easily (though your mileage may vary). This documentation describes how tabular XQuery transformations are specified.
Each mapping consists of four fields:
- target-path: an XPath specifying where to create a node in the output document. This value must end with a `/` character.
- target-node: the local name or, when prefixed by the `@` symbol, the attribute name to create within the target-path.
- source-node: an XPath expression pointing to a node within the source document, or `.` if the output node does not contain a value referencing the source.
- value: an XPath expression giving the value of the target node, given the source node as context. For example, the expression `text()` would return the text value of the source node, whereas a quoted string such as `"Some text"` would give a literal value.
Documents should be built by adding mappings in hierarchical order, i.e. the `/ead` node should go before the `/ead/eadheader` node.
The following example creates a minimal EAD document with fixed dummy values and doesn't reference any source document (you won't actually want to do this, but go with me here):
| target-path | target-node | source-node | value |
|---|---|---|---|
| / | ead | . | |
| /ead/ | eadheader | . | |
| /ead/eadheader/ | eadid | . | "example-1" |
| /ead/eadheader/eadid/ | @countrycode | . | "GB" |
| /ead/eadheader/ | filedesc | . | |
| /ead/eadheader/filedesc/ | titlestmt | . | |
| /ead/eadheader/filedesc/titlestmt/ | titleproper | . | "Example EAD" |
| /ead/ | archdesc | . | |
| /ead/archdesc/ | @level | . | "fonds" |
| /ead/archdesc/ | did | . | |
| /ead/archdesc/did/ | unitid | . | "example-1" |
| /ead/archdesc/did/ | unitdate | . | "2021-06-01" |
| /ead/archdesc/did/ | unittitle | . | "Example EAD" |
| /ead/archdesc/did/ | physdesc | . | |
| /ead/archdesc/did/physdesc/ | @label | . | "Extent" |
| /ead/archdesc/did/physdesc/ | extent | . | "1 scrap of paper" |
| /ead/archdesc/ | scopecontent | . | |
| /ead/archdesc/scopecontent/ | p | . | "Merely an example" |
| /ead/archdesc/ | accessrestrict | . | |
| /ead/archdesc/accessrestrict/ | p | . | "None" |
That should generate output that looks like the following:
<ead xmlns="urn:isbn:1-931666-22-9">
  <eadheader>
    <eadid countrycode="GB">example-1</eadid>
    <filedesc>
      <titlestmt>
        <titleproper>Example EAD</titleproper>
      </titlestmt>
    </filedesc>
  </eadheader>
  <archdesc level="fonds">
    <did>
      <unitid>example-1</unitid>
      <unitdate>2021-06-01</unitdate>
      <unittitle>Example EAD</unittitle>
      <physdesc label="Extent">
        <extent>1 scrap of paper</extent>
      </physdesc>
    </did>
    <scopecontent>
      <p>Merely an example</p>
    </scopecontent>
    <accessrestrict>
      <p>None</p>
    </accessrestrict>
  </archdesc>
</ead>
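To illustrate why row order matters, here is a minimal Python sketch of how a table of literal-valued rows could be turned into a tree. It is not the tool's actual implementation (which evaluates XQuery); the `build` helper and the shortened `ROWS` list are assumptions made for illustration, and the sketch ignores the EAD namespace that appears in the real output.

```python
import xml.etree.ElementTree as ET

# Rows are (target-path, target-node, value). The source-node column is "."
# throughout, so it is omitted. Values are literal strings, or None when empty.
ROWS = [
    ("/", "ead", None),
    ("/ead/", "eadheader", None),
    ("/ead/eadheader/", "eadid", "example-1"),
    ("/ead/eadheader/eadid/", "@countrycode", "GB"),
    ("/ead/", "archdesc", None),
    ("/ead/archdesc/", "@level", "fonds"),
    ("/ead/archdesc/", "did", None),
    ("/ead/archdesc/did/", "unitid", "example-1"),
]


def build(rows):
    """Create elements and attributes in table order (hypothetical helper)."""
    nodes = {}  # absolute path of each created element -> Element
    root = None
    for target_path, target_node, value in rows:
        parent = nodes.get(target_path.rstrip("/"))  # None for the "/" row
        if parent is None and target_path != "/":
            # The parent row has not been seen yet: the table is out of order.
            raise ValueError(f"no parent for {target_path}{target_node}")
        if target_node.startswith("@"):
            parent.set(target_node[1:], value)  # attribute on the parent
            continue
        if parent is None:
            element = root = ET.Element(target_node)
        else:
            element = ET.SubElement(parent, target_node)
        if value is not None:
            element.text = value
        nodes[target_path + target_node] = element
    return root


print(ET.tostring(build(ROWS), encoding="unicode"))
```

In this sketch, moving the `/ead/eadheader/eadid/` row above the `/ead/eadheader/` row raises an error, because the parent element does not exist yet when the attribute row is processed.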
With the basic structure in place we can start adding references to the source document, which, for the sake of simplicity, will be some Dublin Core:
| target-path | target-node | source-node | value |
|---|---|---|---|
| / | ead | . | |
| /ead/ | eadheader | . | |
| /ead/eadheader/ | eadid | //dc:identifier[2] | text() |
| /ead/eadheader/eadid/ | @countrycode | . | "US" |
| /ead/eadheader/ | filedesc | . | |
| /ead/eadheader/filedesc/ | titlestmt | . | |
| /ead/eadheader/filedesc/titlestmt/ | titleproper | //dc:title | text() |
| /ead/ | archdesc | . | |
| /ead/archdesc/ | @level | . | "fonds" |
| /ead/archdesc/ | did | . | |
| /ead/archdesc/did/ | unitid | //dc:identifier[2] | text() |
| /ead/archdesc/did/ | unitdate | //dc:date | text() |
| /ead/archdesc/did/ | unittitle | //dc:title | text() |
| /ead/archdesc/did/ | physdesc | . | |
| /ead/archdesc/did/physdesc/ | @label | . | "Extent" |
| /ead/archdesc/did/physdesc/ | extent | //dc:format | text() |
| /ead/archdesc/ | scopecontent | . | |
| /ead/archdesc/scopecontent/ | p | //dc:description | text() |
| /ead/archdesc/ | accessrestrict | . | text() |
| /ead/archdesc/accessrestrict/ | p | //dc:rights | text() |
Now we put actual values in the third column, `source-node`, to reference the source, and an expression in the fourth column, `value` (in this case just `text()`), to say what we want to do with the node.
Although this example is simple, these columns accept any XQuery expressions so can be as complicated as you need.
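To make the source-node and value columns more concrete, the following Python sketch mimics (very roughly) selecting nodes with a source-node path and then reading `text()` from each match. It is not how the tool itself works internally; the sample record, the `values_for` helper, and the binding of the `dc:` prefix to the standard Dublin Core element namespace are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed binding for the dc: prefix (the Dublin Core element namespace).
NS = {"dc": "http://purl.org/dc/elements/1.1/"}

# A made-up Dublin Core record with two identifiers.
SOURCE = """
<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:identifier>internal-42</dc:identifier>
  <dc:identifier>example-1</dc:identifier>
  <dc:title>Example EAD</dc:title>
  <dc:date>2021-06-01</dc:date>
</record>
"""


def values_for(root, source_node):
    """Select nodes with a source-node path, then take the text of each."""
    # ElementTree paths are relative, so "//x" is written ".//x" here.
    return [node.text for node in root.findall("." + source_node, NS)]


root = ET.fromstring(SOURCE)
print(values_for(root, "//dc:identifier[2]"))  # second identifier only
print(values_for(root, "//dc:title"))
```

Note that `[2]` picks the second `dc:identifier` within its parent, which is why the first identifier in the sample record is skipped.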
XQuery transformation parameters
Unlike XSLT transformations, the JSON-format parameter map that can be given for specific dataset transformations doesn't provide values that are readable by XQuery expressions. It _can_ however be used to provide additional namespace prefixes that can be referenced in the `source-node` expressions. For example, the following JSON parameter map:
{
"xlink": "http://www.w3.org/1999/xlink"
}
would enable you to use expressions like `//xlink:href` in the `source-node` field.
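The idea of a prefix map can be tried out in Python, whose `xml.etree.ElementTree` module also takes a prefix-to-URI dictionary when evaluating paths. This is only an analogy to show what binding a prefix achieves, not the tool's own mechanism; the sample document below is made up.

```python
import json
import xml.etree.ElementTree as ET

# The same JSON parameter map as above, parsed into a prefix -> URI dict.
PARAMS = json.loads('{"xlink": "http://www.w3.org/1999/xlink"}')

# A made-up source document containing an element in the xlink namespace.
SOURCE = """
<doc xmlns:xlink="http://www.w3.org/1999/xlink">
  <link><xlink:href>https://example.org/item/1</xlink:href></link>
</doc>
"""

root = ET.fromstring(SOURCE)
# The xlink: prefix in the path is resolved through PARAMS; without the
# map the prefix would be unbound and the lookup would fail.
matches = root.findall(".//xlink:href", PARAMS)
print([m.text for m in matches])
```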
Tips and tricks
Split a string containing multiple values into a set of separate nodes:
In this case we have a node containing several values separated by a semi-colon:
<indexentry>
  <geogname>Deutschland; Großbritannien; Kanada; Frankfurt am Main; Mannheim</geogname>
</indexentry>
We want to break this into several individual nodes:
<controlaccess>
  <geogname>Deutschland</geogname>
  <geogname>Großbritannien</geogname>
  <geogname>Kanada</geogname>
  <geogname>Frankfurt am Main</geogname>
  <geogname>Mannheim</geogname>
</controlaccess>
For these two nodes we could use the following mappings:
| target-path | target-node | source-node | value |
|---|---|---|---|
| /ead/archdesc/ | controlaccess | . | |
| /ead/archdesc/controlaccess/ | geogname | fn:tokenize(//indexentry/geogname/text(), "; ") | . |
That is:
- create a node for the controlaccess (no value is needed here since it's a parent node)
- create a node for the geogname values
- for the geogname path, split the text value using fn:tokenize(/path/to/node, delimiter), where the path points to the source node and the delimiter is "; " (a semicolon followed by a space)
- use "." for the value since we're already dealing with strings and not nodes
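The effect of the delimiter can be checked quickly outside the tool. The Python snippet below is only an approximation of `fn:tokenize`: the XQuery function treats its second argument as a regular expression, whereas `str.split` matches it literally, but for a plain "; " delimiter the two should behave the same.

```python
source_text = "Deutschland; Großbritannien; Kanada; Frankfurt am Main; Mannheim"

# Split on the two-character delimiter: a semicolon followed by a space.
places = source_text.split("; ")
print(places)

# Splitting on ";" alone leaves a leading space on every item after the first.
print(repr(source_text.split(";")[1]))
```

If the source data is inconsistent about spacing after the semicolon, a pattern that allows optional whitespace may be a safer delimiter.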
Gotchas
Unfortunately there are quite a lot of ways to get difficult-to-understand errors from the mapping process due to the way the table is evaluated against the source document. Some examples and possible fixes follow:
- Error: mapping-error at /ead: mapping-error at /ead/eadheader: err:XPST0003 at /ead/eadheader/eadid: Unknown function or expression.
This error resulted from the use of Unicode quotes, specifically the Unicode "right double quotation mark" symbol (”) instead of normal ASCII double quotes ("), which in turn resulted from copying and pasting to and from a text editor that sneakily replaced them. This was hard to spot. More generally, an XPST0003 error is likely to be the result of the fourth column (the output value) being a malformed XQuery expression or a mistyped function.
FIX: make sure that quotes are ASCII quotes and that no other typos exist in the fourth column.
- Cannot get the value of an attribute with a path like /oai/description/@type.
FIX: use "." instead of "text()" as the value expression since we want the verbatim value, which is already a string and not a node.
- Target paths are missing from output.
FIX: make sure that the target path (first column) value ends with a forward slash: "/".
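Since these problems are easy to miss by eye, it can help to check a table before pasting it into the interface. The following Python sketch flags the three issues listed above; `lint_row` is a hypothetical helper, not part of the EHRI tooling, and the attribute check is a simple heuristic based on the path containing `/@`.

```python
# Left/right double and single "smart" quotes, written as escapes.
CURLY_QUOTES = "\u201c\u201d\u2018\u2019"


def lint_row(target_path, target_node, source_node, value):
    """Return a list of human-readable problems for one mapping row."""
    problems = []
    if not target_path.endswith("/"):
        problems.append("target-path must end with '/'")
    for column, text in (("source-node", source_node), ("value", value)):
        if any(ch in CURLY_QUOTES for ch in text):
            problems.append(f"{column} contains non-ASCII quotes")
    # Heuristic: a source path selecting an attribute should use "." as value.
    if "/@" in source_node and value.strip() == "text()":
        problems.append("attribute source: use '.' instead of 'text()'")
    return problems


# A row with a missing trailing slash and curly quotes around the value.
print(lint_row("/ead/eadheader", "eadid", ".", "\u201cexample-1\u201d"))
```

A row that passes all three checks returns an empty list.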