Metadata Blocks Job

This page covers how to apply upstream changes to the “system” metadata blocks and how to deploy custom metadata blocks.

Deploy and update Dataverse metadata blocks

Many upstream releases change the upstream metadata schemas. To apply these changes, simply deploy a “metadata update job”.

You can deploy your own custom schemas the same way; you only need to get your custom metadata TSV files into the job, as described in “How to get custom metadata blocks inside the job” below. Create the job with:

kubectl create -f https://gitcdn.link/repo/IQSS/dataverse-kubernetes/release/k8s/dataverse/jobs/metadata-update.yaml
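If you script the update, you may want to wait for the job to finish before deciding whether to reindex. The following is a minimal sketch, assuming kubectl already points at the right namespace; the helper names (run, create_job, wait_for_job) and the DRY_RUN switch are hypothetical. It asks `kubectl create` for `-o name` output, so the job name does not need to be known in advance.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for running a metadata job and waiting for it.
# Set DRY_RUN=1 to print the kubectl commands instead of executing them.
set -euo pipefail

# Execute the given command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create a job from a manifest (file or URL); kubectl prints "job.batch/<name>".
create_job() {
  run kubectl create -f "$1" -o name
}

# Wait for the job to complete, then print its logs.
# Accepts either "job.batch/<name>" or a plain "<name>".
wait_for_job() {
  local name="${1##*/}" timeout="${2:-10m}"
  run kubectl wait --for=condition=complete "job/$name" --timeout="$timeout"
  run kubectl logs "job/$name"
}
```

For example: `job=$(create_job https://gitcdn.link/repo/IQSS/dataverse-kubernetes/release/k8s/dataverse/jobs/metadata-update.yaml) && wait_for_job "$job"`. Note that `kubectl wait` only watches for the Complete condition, so a failed job shows up as a timeout.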

Important

Please also read Search Index Jobs carefully: depending on the changes, you might need to reindex.

The metadata update job performs these steps:

  1. Find all TSV files in /metadata (and /opt/dataverse).

  2. Load all schemas via POST.

  3. Trigger a webhook to reconfigure the Solr index.
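The first two steps of this flow can be approximated with plain curl calls against the Dataverse admin API, whose `api/admin/datasetfield/load` endpoint accepts a metadata block as a TSV upload. This is a minimal sketch, not the job's actual script: DATAVERSE_URL (defaulting to http://localhost:8080), the helper names and the DRY_RUN switch are assumptions, and the Solr webhook step is left out because its endpoint is not documented here.

```shell
#!/usr/bin/env bash
# Sketch of the metadata update flow; NOT the job's actual script.
# Assumptions: Dataverse is reachable at $DATAVERSE_URL; DRY_RUN=1 prints
# the curl commands instead of executing them.
set -euo pipefail

DATAVERSE_URL="${DATAVERSE_URL:-http://localhost:8080}"

# Execute the given command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Print all *.tsv files below the given directories, sorted.
# Directories that do not exist are silently skipped.
find_tsv_files() {
  local dir
  for dir in "$@"; do
    [ -d "$dir" ] || continue
    find "$dir" -type f -name '*.tsv'
  done | sort
}

# Upload every TSV file found to the metadata block loading endpoint.
load_metadata_blocks() {
  local tsv
  find_tsv_files "$@" | while IFS= read -r tsv; do
    run curl -fsS -X POST \
      -H "Content-type: text/tab-separated-values" \
      --upload-file "$tsv" \
      "${DATAVERSE_URL}/api/admin/datasetfield/load"
  done
}
```

For example, `DRY_RUN=1 load_metadata_blocks /metadata /opt/dataverse` lists the uploads it would perform without touching the server.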

Force re-export of citation metadata after update

In particular, when the core citation.tsv metadata schema has changed, you need to re-export all citation metadata. A simple job does the trick:

kubectl create -f https://gitcdn.link/repo/IQSS/dataverse-kubernetes/release/k8s/dataverse/jobs/metadata-reexport.yaml

If you have a large number of published dataverses and datasets, consider running this during off-hours.

See also the upstream Admin Guide section on metadata export.
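Under the hood, a full re-export corresponds to the Dataverse admin API call `api/admin/metadata/reExportAll`; we assume the job does little more than invoke it. The sketch below wraps that call so it only fires inside an off-hours window. The helper names, DATAVERSE_URL and the DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: trigger a full metadata re-export, but only during off-hours.
# Assumptions: Dataverse reachable at $DATAVERSE_URL; DRY_RUN=1 prints
# the curl command instead of executing it.
set -euo pipefail

DATAVERSE_URL="${DATAVERSE_URL:-http://localhost:8080}"

# Execute the given command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Ask Dataverse to re-export the metadata of all published datasets.
reexport_all() {
  run curl -fsS "${DATAVERSE_URL}/api/admin/metadata/reExportAll"
}

# in_window HOUR START END: succeed if HOUR lies in [START, END).
# The window may wrap past midnight (e.g. 22 to 6). "10#" avoids
# octal parsing of zero-padded hours such as "08".
in_window() {
  local hour=$((10#$1)) start=$((10#$2)) end=$((10#$3))
  if [ "$start" -le "$end" ]; then
    [ "$hour" -ge "$start" ] && [ "$hour" -lt "$end" ]
  else
    [ "$hour" -ge "$start" ] || [ "$hour" -lt "$end" ]
  fi
}

# Re-export only inside the window (default 22:00 to 06:00).
# `date +%H` uses the container's time zone.
reexport_if_off_hours() {
  local start="${1:-22}" end="${2:-6}"
  if in_window "$(date +%H)" "$start" "$end"; then
    reexport_all
  else
    echo "Outside off-hours window ${start}-${end}, skipping re-export." >&2
    return 1
  fi
}
```

For example, `reexport_if_off_hours 20 5` only triggers the export between 20:00 and 05:00. If you would rather schedule than check, a Kubernetes CronJob around the same call is the more idiomatic choice.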

How to get custom metadata blocks inside the job

By default, the metadata update job reuses the “dataverse-k8s” image. Drop your metadata TSV files into the /metadata directory of the job's container (see also important directories of dataverse-k8s).

This can be done using:

  • custom/derived images

  • volume mounts

  • ConfigMap file mounts

  • sidecar container(s), downloading/cloning/checking out/…

Hint

  1. ConfigMaps seem to be the easiest option, but if your custom metadata blocks are large or numerous, you might choose a different approach (a single ConfigMap cannot hold more than 1 MiB of data).

  2. You could also override upstream blocks this way. This is not recommended, but it is up to you.
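If you go the ConfigMap route, keep in mind that Kubernetes limits the data of a single ConfigMap to 1 MiB. The following sketch checks that limit before creating a ConfigMap from the TSV files in one directory; the function names, the ConfigMap name custom-metadata and the DRY_RUN switch are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch: create a ConfigMap holding custom metadata TSV files.
# Set DRY_RUN=1 to print the kubectl command instead of executing it.
set -euo pipefail

MAX_CONFIGMAP_BYTES=1048576  # Kubernetes limit for ConfigMap data (1 MiB)

# Execute the given command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Print the summed byte size of all *.tsv files directly inside DIR.
tsv_total_bytes() {
  local dir="$1" total=0 f
  for f in "$dir"/*.tsv; do
    [ -f "$f" ] || continue
    total=$(( total + $(wc -c < "$f") ))
  done
  echo "$total"
}

# create_metadata_configmap NAME DIR: one ConfigMap key per TSV file.
create_metadata_configmap() {
  local name="$1" dir="$2" total f
  local args=()
  total=$(tsv_total_bytes "$dir")
  if [ "$total" -gt "$MAX_CONFIGMAP_BYTES" ]; then
    echo "TSV files in $dir total $total bytes, too large for a ConfigMap." >&2
    return 1
  fi
  for f in "$dir"/*.tsv; do
    [ -f "$f" ] || continue
    args+=("--from-file=$f")
  done
  if [ "${#args[@]}" -eq 0 ]; then
    echo "No TSV files found in $dir." >&2
    return 1
  fi
  run kubectl create configmap "$name" "${args[@]}"
}
```

The resulting ConfigMap can then be mounted into the job as a `configMap` volume at /metadata, where each TSV file appears under its original file name.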

Example with curl init container

You could create a Job based on k8s/dataverse/jobs/metadata-update.yaml and extend it as shown below: an init container downloads a TSV file into a shared volume, which the main container then reads from /metadata.

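      containers:
        - name: metadata-update
          # Read the TSV files fetched by the init container.
          volumeMounts:
            - name: custom-metadata
              mountPath: /metadata
              readOnly: true
      initContainers:
        # Runs to completion before the main container starts.
        - name: get-metadata
          image: giantswarm/tiny-tools
          command:
            - "curl"
          args:
            # -f: fail on HTTP errors instead of saving an error page as TSV
            - "-fsSo"
            - "/metadata/test.tsv"
            - "https://gist.githubusercontent.com/poikilotherm/e54660ab99a24b12e5179621c9c7efb5/raw/960085c8277ad33fa1e52f3c16a38ec6df3ef281/test.tsv"
          volumeMounts:
            - name: custom-metadata
              mountPath: /metadata
      volumes:
        # Scratch volume shared by both containers for the lifetime of the pod.
        - name: custom-metadata
          emptyDir: {}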
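The init container in this example fetches a single file. To fetch several blocks, the plain curl call could be replaced with a small script run via `sh -c`. This sketch assumes the image ships a POSIX shell and curl; the function names and the DRY_RUN switch are hypothetical. Files whose names do not end in .tsv are skipped, since the job only looks for TSV files.

```shell
#!/bin/sh
# Hypothetical download helper for an init container; needs a POSIX shell and curl.
# Set DRY_RUN=1 to print the curl commands instead of executing them.
set -eu

# Execute the given command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# download_blocks DEST URL...: save each URL as DEST/<last path segment>.
download_blocks() {
  dest="$1"
  shift
  mkdir -p "$dest"
  for url in "$@"; do
    file="${url##*/}"
    case "$file" in
      *.tsv) ;;
      *) echo "Skipping $url: not a .tsv file" >&2; continue ;;
    esac
    # -f: fail on HTTP errors, -L: follow redirects
    run curl -fsSL -o "$dest/$file" "$url"
  done
}
```

In the Job, the init container would then use `sh -c` as its command, with the script above followed by a call such as `download_blocks /metadata <url1> <url2>` as its argument.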