Automating Exhibition Catalogue Creation

—A Guide

by ADA Team Editorial

v1.0

Published by: ADA Semantic Publishing Pipeline

Last updated: 2023-05-04

Created: 2023-04-28

Language: English (United Kingdom)

Created by: Editorial Team

About the prototype

Publication type: Use case - An exhibition catalogue

  1. We are creating a demonstration prototype: An exhibition catalogue about a baroque painting collection.

  2. Objectives:

    1. Write an exhibition catalogue essay using AI tools

    2. Review the 'catalogue essay and AI tools' as open peer review

    3. Create the parts of the the catalogue:

      1. Cover

      2. Colophon

      3. Essay

      4. Collection

  3. What is the collection?

    1. The catalogue uses part of a Wikidata based collection of Bavarian collections of Baroque paintings. See: 17C Bavarian painting.

    2. We focus on the Baroque period: Bavarian Collections, 1590-1750 query link

    3. We make a small collection of paintings - 9 in this case.

  4. How are we using computational publishing and what is the prototype experiment?

    1. Creating a publication from different distributed (federated) remote sources using linked open data.

    2. Showing how asyncronous work can be carried out by team working on a single publication - this is the power of the TOC part! Which in more advanced domains becomes package management.

Learning points

Workflow activities that will be covered to create the exhibition catalogue:

  1. Real-time collaborative editing,

  2. Creating a Wikidata query of a collection,

  3. Displaying a painting catalogue sample collection from Wikidata LOD query for a multi-format publication.

  4. Editing a Jupyter Notebook in MyBinder,

  5. Embedding media objects: Video - TIB AV Portal, and; Semantic Kompakkt,

  6. Using GitHub

  7. Accessing API content for colophon

  8. Editing Wikidata collection query in Juypter Notebooks

  9. Asycrononous collective working and making a publication from multiple remote Linked Open Data (LOD) sources, and

  10. Rendering a multi-format publication with CSS styling.

Software (open-source)

Over 2023/24 the computational components will be added to the ADA Semantic Publishing Pipeline as well as introducing Vivliostyle Create Book markdown renderer and swapping to Jupyter Book computational book platform away from Quarto – https://github.com/NFDI4Culture/ada

AI Software

To be confirmed

https://openai.com/blog/chatgpt

https://www.perplexity.ai/

Activity: Editing a Jupyter Notebooks and accessing video

Objective: Running and editing Juypter Notebooks in MyBinder and retrieving video and 3D models as embeds.

External LOD and media used: TIB AV Portal, and Semantic Kompakkt

Notes: Jupyter Notebooks editing in MyBinder

Links:

Steps

  1. Open Notebook in the browser using MyBinder - Video and 3D Notebook embeds - click the 'launch binder' button to run the Notebook in MyBinder.

  2. Add new videos and 3D models to the Notebook from TIB AV Portal and Semantic Kompakkt.

    1. Open a second browser tab and load TIB AV Portal

    2. Choose a video and copy across the video ID from the URL https://av.tib.eu/media/60729

    3. Paste the video ID into the video iframe field and run the cell to render

    4. Open Semantic Kompakkt demo site: https://kompakkt.wbworkshop.tibwiki.io/explore

    5. View a model, copy the iframe embed from the folder icon, top right. In the Notebook paste in the complete iframe cover replacing the existing iframe:

      <iframe

      name="Doric Column"

      src="https://kompakkt.wbworkshop.tibwiki.io/viewer/?entity=63e8c22910e4f555d1f656ca&mode=open"

      allowfullscreen

      loading="lazy"

      ></iframe>

  3. Run the Notebook

  4. 3D view size, we can make the initial view bigger, add:

    <iframe

    width="1200" height="630"

  5. Download Notebook

  6. Render some videos and 3D models in the Quarto book. Pass along video id codes and 3d models using a hedge doc and chat to the Quarto render. The rendering and final display will take less than 10 minutes (hopefully): a. The code needs to be added to the main repo; b. Rendered locally; c. Uploaded to GitHub; d. Time for GitHub Pages to finish loading.

    1. Code: https://github.com/SimonXIX/Experimental_Books_workshop/blob/main/paintings.ipynb

    2. Rendering: https://simonxix.github.io/Experimental_Books_workshop/paintings.html

Activity: GitHub on boarding

Objective: On boarding and familiarisation with using GitHub for publishing and asynchronous working.

Publication repository: https://github.com/NFDI4Culture/catalogue-003

Quarto Install

Quarto help docs: https://quarto.org/

Options

For general purposes use the manual install. The Docker install is for when you are running multiple environments on your computer or carrying out long term development.

Manual install

https://quarto.org/docs/get-started/

Note: They miss out that you need Python installed and Jupyter Lab and a working terminal to install them

https://www.python.org/downloads/

https://jupyter.org/install

Also install Panda: py -m pip install panda

As well as knowing what the Python prompt and escape looks like: https://stackoverflow.com/questions/41524734/how-to-exit-python-script-in-command-prompt

If on Windows pip or python wont run. Check the solution here to add the python path for the Terminal April 23.

Docker install

Docker Compose

Text source: Readme - Thanks to Simon Bowie


https://github.com/NFDI4Culture/cp4c

This repository also contains Docker Compose and Dockerfiles for running the various applications in Docker containers.

Run docker-compose up -d --build to start the containers and docker-compose down to stop the containers.

The jupyterlab container runs a stand-alone version of JupyterLab on http://localhost:8888. This can be used to edit any Jupyter Notebook files in the repository. The JupyterLab instance runs with the password 'jupyterlab'.

The nginx container runs Nginx webserver and displays the static site that Quarto renders. This runs at http://localhost:1337.

The quarto container starts a Ubuntu 22.04 container, installs various things like Python, downloads Quarto and installs it, and then adds Python modules like jupyter, matplotlib, and panda. It then runs in the background so Quarto can be called on to render the qmd and ipynb files into the site/book like so:

docker exec -it quarto quarto render

There's a known issue with Quarto running in the Docker container in macOS due to the amd64 emulation of Docker Desktop for arm64 MacOS. See discussion at quarto-dev/quarto-cli#3308. This shouldn't occur in any other environment running Docker.

Visual Code Studio - Quarto install

This is not covered at present as there are conflicts with other approaches.

Troubleshooting

Raise an issue in the GitHub project if you need support.

Quarto commands

quarto render

quarto preview

quarto check

Quarto Rendering

Quarto help docs: https://quarto.org/

Using quarto to render as multi-format.

Use the command line to run Quarto: Powershell, GitBash, Cygwin, OSX shell, or terminal in Visual Studio Code ('VSC), etc.

Execute the Quarto commands from the top level of your publication repository.

Steps

Troubleshooting

Problems encountered so far:

  1. Cover image: file needs to be local for epub rendering

  2. Python path on Windows: If on Windows pip or python wont run. Check the solution here to add the python path for the Terminal April 23.

  3. CMD.EXE - Quarto wont run. See https://www.windows-faq.de/2017/02/27/unc-pfade-in-der-eingabeaufforderung-benutzen/ (no fix as yet - April 2023.

  4. Moving a repository locally: From Stackoverflow.

    If you are using GitHub Desktop, then just do the following steps:

    1. Close GitHub Desktop and all other applications with open files to your current directory path.

    2. Move the whole directory as mentioned above to the new directory location. (NB: The fdirectory has to be completely moved).

    3. Open GitHub Desktop and click on the blue (!) "repository not found" icon. Then a dialog will open and you will see a "Locate..." button which will open a popup allowing you to direct its path to a new location.

Editing Quarto

Visual Studio Code is one option as an editor, but you can use any editor suite that you like.

Install Visual Studio Code (VSC)

https://code.visualstudio.com/

Editing

Load the whole repository folder into your editor.

The key file to edit in Quart is _quarto.yml. This file contains the main configurations for your publication.

If you are working on a fork the first thing you need to do is edit the repository address on line 19 - this will point the GitHub icon in your publication to your own GitHub repo.

repo-url: https://github.com/NFDI4Culture/catalogue-003

To add new sections to your publication jast add file names to the chapter list after line 12

The other settings can we read about on the Quarto support pages, for example Book Structure.

If you are using VSC you can run Notebooks as well as use the Terminal to run Quarto commands, and commit to GitHub.

Jupyter Notebooks: Setup, Editing, and Saving

General instructions for using Jupyter Notebooks.

Installation of Python and Jupyter Notebooks is covered in Quarto install section.

The default editor we use is Visual Code Studio, but you can use other Notebook editors.

Steps

First run: requirements.txt file for Python libraries. See: https://note.nkmk.me/en/python-pip-install-requirements/

Use:

# install with 'pip install -r requirements.txt'

Then you can run, edit, and save Notebooks.

Activity: Create a Wikidata query

Objective: User builds a Wikidata query. See example query: paintings, Bavarian Collections, 1590 - 1750 - query link

External LOD and media used: Wikidata LOD, and Wiki Commons, Web Gallery of Art (images) - https://www.wga.hu/

Notes: Wikidata Query (help)

Steps

  1. Go to https://query.wikidata.org/

  2. Build a query around the 17C Bavarian painting collection to replicate the catalogue selection to be used in Activity B. Example:

    1. Code Repo: Current publication link

    1. Rendering: Current publication link

    1. Example: Paintings; in collection; Bavarian Collections; 1600 - 1700 - query link

  1. Step-by-step instructions to replicate parts of this query link base on this collection 17C Bavarian painting:

    Note: You will use the lefthand 'Query helper' GUI, where you can type in names of items. Sometimes you need to enter a term twice to get the correct item to appear.

    Items names are in bold below with their corresponding identifiers.

    1. Go to https://query.wikidata.org/

    2. Enable split view with i info button top left.

    3. Filter: instance of P31, painting Q3305213 - wdt:P31 wd:Q3305213.

    4. Filter: collection P195, Bavarian State Painting Collection Q812285 - wdt:P195 wd:Q812285.

    5. Play button - bottom left - renders query below

    6. Show: creator P170; image P18; copyright status P6216; inception P571.

    7. Play button - bottom left - renders query below

    8. Image grid view :-)

    9. Limit

    10. Dates from to 1590-1750 (code only)

      BIND(YEAR(?inception) AS ?inceptionyear)

      FILTER((1590 <= ?inceptionyear) && (?inceptionyear < 1750 ))

    11. Link query: https://w.wiki/6MGX - results: https://w.wiki/6MGY

  2. Participants can change the selection criteria around the available criteria: artists, dates, etc., as in collection 17C Bavarian painting

  3. Completion: Paste your query link into the HedgeDoc link provided. https://demo.HedgeDoc.org/s/4gr9JvUS7

    - END of activity.

Activity: Making SPARQL Queries in Wikidata of Collections

Keywords: Collections, Wikidata, SPARQL

If you are not familiar with creating a Wikidata SPARQL query then see the section on how to create a query: Activity: Create a Wikidata query

The activity goal

Create a sample SPARQL query of a cultural collection of a museum and save a link to the query on your Wikiversity user page.

Note: Two collections are needed as the painting collection is easier to add to Jupyter Notebooks as you edit existing data. The second general collection is used to explore more options - assistence will be needed to help move the second query to the Jupyter Notebooks as they can be complex.

  1. First one of a painting collection, and then,

  2. A second one - of any type of media or artifact from a collection: books, sculpture, photography, etc.

Here is an example: Bavarian Collections, 1590 - 1750 SW March 2023

If you need support the GitHub Project (task tracker) for the class is here - titled: Prototype catalogue - ADA CP pipeline.

Later in the workflow the query will be added to a Jupyter Notebook for rendering in Quarto.

Create the two queries and add the query links to Wikiversity.

Query 1: A painting collection

The Painting Notebook is simpler to move to Jupyter Notebooks as the number of edits needed are fewer.

  1. Use and copy the existing painting collection and modify the setting Bavarian Collections, 1590 - 1750.

  2. The exercise is to change three parts: the painting collection, date perameters, and number of items.

    1. Collection: Line 9 - ?item wdt:P195 wd:Q812285: Add in the identifier of a different collection - see list here Here is a sample list of painting collections found on Wikidata, Wikidata:WikiProject sum of all paintings/Collection. As an example here the the Frick Museum - https://www.wikidata.org/wiki/Q682827 has a unique identifier of unique Q682827.

    2. Date: Line 37 - FILTER((1590 <= ?inceptionyear) && (?inceptionyear < 1750 )): here change the from and to dates.

    3. Number of items: Line 40 - LIMIT 9: Change the number of items and run preview.

    Now copy the query URL and save the link to your Wikiversiyt page. To copy the link you need to have previewed the query and bottom right is a short link copy button (if this doesnt work - copy the URL from your browsers address bar). Later we will add the new query to the Notebook painting.ipny.

Query 2: Make your own collection query

Note: We want to have images in our query, using the image grid view allows you to preview images.

Resources for finding collections: Data models and information

The links will help you find collections and see what types of artwork or media are listed on Wikidata.

The help and examples in the query service are useful for seeing examples and modifying them https://query.wikidata.org/.

The examples section has a RUN IT link under each examples where the query will load for you. Note to check the image grid view.

As example here is one of scultures by Max Bill. You can use this and change the artists name, for example to Donatello.

Museums

Film

WikiProject sum of all paintings

Art genres

Books

GLAM WikiProjects

Steps for making your own query

  1. Start with a blank query https://query.wikidata.org/

  2. Use the Examples button top left and then edit the example, or use DuckduckGo or Google to search for queries to edit.

  3. Save your query link on your Wikiversity user homepage.


In the next section 'Transfering a Wikidata SPARQL query to a Juypter Notebook' you will see how to move your queries to Jupyter Notebooks.

Transfering a Wikidata SPARQL query to a Juypter Notebook

Query 1: A painting collection

For this collection you simply edit the existing Notebook for the painting collection 'paintings.ipynb' rename as paintings_1xx.ipynb (with xx as your initails) - transfering across the new values that you have already created. See the section 'Making SPARQL Queries in Wikidata of Collections'.

FigureWhere to add your Wikidata SPARQL URL link and paste in the full query.

These are the values being changed: Collection, Date range, and Number of items.

  1. All you need to do is paste in two items into the main cell of the Notebook:

    1. The SPARQL URL - see the green text indicated in Figure 1

    2. Paste in the complete body of the SPARQL query - see the orange text in Figure 1. Note all of your query is pasted between the four apostrohies top and bottom:

      ''''

      your query

      '''''

Then run the Notebook and save the Notebook file when you are done and any other files edited.

Note: Change the notebook name in the _quarto.yml TOC so that your new file is included in the publication. Change paintings.ipynb to paintings_1xx.ipynb.

Note: Ensure requirements.txt has been added and run.

To render the output in Quarto, run the Quarto commands in your terminal - 'quarto preview' and/or 'quarto render'.

When your finished you can upload the results to your GitHub repository.

Query 2: Your custom collection query

Because SPARQL queries are varied and complex the process of moving the query into a Jupyter Notebook is not simple and involved knowledge of SPARQL and Python. The challenge is that the SPARQL metadata output has to be parsed by Python for presentation in the publication and to do this custom code needs to be written.

To help with this obstacle a support service is in place to consult and add the Python code to your Notebooks.

When you have completed the steps below and have a Notebook ready raise a ticket in the GitHub Project and assign to support and we will review the Notebook.

See all class members collection notebooks here. This is a test zone where support will edit copies of your notebooks to show how to make them woirk: https://github.com/NFDI4Culture/sandbox-notebook-rendering

Steps

This will prepare the Notebook to output the SPARQL results only. The additional rendering will need to be added by support.

Create a new Notebook in your Quarto publication 'collection-name.ipynb'.

Back in the custom collection query at the bottom right there is a <code> button - click on the button and it will launch a preview window. Top right click on the Python tab, this will present the Query as Python code. Now copy and paste the code into your Notebook.

Next, we want to copy the URL of your query to then also past this into your notebook. This will allow users to review your orginal query. The URL is found on the bottom right of the query page, as copy short link. If this does not work just copy the long address from your browser address bar.

Paste your query link into the Notebook as a comment, which means haveing a # at the start of the line. Like so, underneath the lines shown here:

import sys

from SPARQLWrapper import SPARQLWrapper, JSON

# query link - https://w.wiki/6YJi based on https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples#Paintings_by_Gustav_Klimt

To finish your Notebook editing insert a Markdown field above your code cell as add a desciption, e.g.,

Custom Collection by Simon Worthington (April 23): Paintings by Lucas Cranach the Elder https://www.wikidata.org/wiki/Q191748.

The collection Notebook only contains the SPARQL query and needs additional Python adding to parse the metadata output.

Save and run your Notebook. The result will be the SPARQL unformatted output.

Add your Notebook to the Quarto TOC by adding the file name of your collection to the TOC in the file _quarto.yml, under the chapter header on line 12. Save and render the quart publication to see the results.

To complete this part raise a ticket in the GitHub project assigning to support.

This concludes this part of the process.

Publishing Tasks

These are the small tasks needed for getting ready to publish. There will be an additional Flight Check round after this before completeing publishing.

  1. Add a cover image

  2. Update Imprint (colophon)

  3. Add Essay

  4. Change Style

  5. Update the readme information

  6. Tidy up the TOC

Upload your image to the top level of repository.

Then link in the file _quarto.yml at the link cover-image: as for example 

Cover-image: cover.jpg

See example  _quarto.yml cover link. File can be in repo or online.

The colophon is made of two parts. The first part that you will edit using instructions below. And the second part which automatically comes for the Thoth.pub book metadata system.

Parts of the book → Files to modify or add

Quick reference file look-up.

Note: _quarto.yml is where all book configurations are done for Quarto.

Cover 

Colophon

Essay

Collections

Style

Readme

Include new files in TOC

Repository address

Note: Render and push back changes.

Review and Flight Check

These items will be covered in the class.