Steptext

Text Visualisation Tool for Exploring Digitised Historical Documents

Read more

Vane, O., 2018, May. Text Visualisation Tool for Exploring Digitised Historical Documents. In Proceedings of the 2018 ACM Conference Companion Publication on Designing Interactive Systems (pp. 153-158). ACM.

Timeline design for visualising cultural heritage data, Chapter 4.

Code

Built in JavaScript / D3.js with Elasticsearch

Data visualisation is often thought of as a technique for finding patterns in quantitative data, but what about the qualitative? Steptext provides a structure for surveying texts by time, revealing and emphasising what is being said. The tool enables a historian to quickly survey a document collection around themes they are interested in while staying close to the original texts. Simple visuals help promote transparency.

The interface works as follows: a user searches for a keyword they are interested in. The interface then visualises instances of that keyword across all the documents, mapped by time horizontally. This way, the user can easily trace commentary through time.
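The search-and-map step can be sketched in plain JavaScript. This is an illustration under assumptions, not the tool's actual code: the in-memory `documents` array stands in for the Elasticsearch index, and the names `findSnippets` and `yearToX`, the 20-character context window, and the 720-pixel width are all hypothetical.

```javascript
// Hypothetical in-memory corpus; the real tool queries a search index instead.
const documents = [
  { year: 1905, text: "The district nurse visited forty homes this quarter." },
  { year: 1931, text: "A school nurse was appointed. The nurse reports weekly." },
  { year: 1962, text: "No change in sanitary staffing was recorded." },
];

// Find every occurrence of `keyword` and return a snippet with surrounding context.
function findSnippets(docs, keyword, context = 20) {
  const needle = keyword.toLowerCase();
  if (!needle) return []; // an empty keyword would match everywhere
  const hits = [];
  for (const doc of docs) {
    const haystack = doc.text.toLowerCase();
    let index = haystack.indexOf(needle);
    while (index !== -1) {
      const start = Math.max(0, index - context);
      const end = Math.min(doc.text.length, index + needle.length + context);
      hits.push({ year: doc.year, snippet: doc.text.slice(start, end) });
      index = haystack.indexOf(needle, index + needle.length);
    }
  }
  return hits;
}

// Linear mapping from a year to a horizontal pixel position.
function yearToX(year, [minYear, maxYear], width) {
  return ((year - minYear) / (maxYear - minYear)) * width;
}

// One positioned snippet per keyword occurrence, across all documents.
const hits = findSnippets(documents, "nurse").map((hit) => ({
  ...hit,
  x: yearToX(hit.year, [1900, 1972], 720),
}));
```

Each entry in `hits` carries the original wording plus an `x` position, so a renderer can place the snippet text directly on the timeline rather than reducing it to a count.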

Steptext is demonstrated on the Medical Officer of Health (MOH) reports: a set of 5,500 public health reports from the 19th and 20th centuries, digitised by the Wellcome Library. These reports mostly feature narrative text and have been converted to digital text files.

To find out more about Steptext, including evaluations with historians, see the short paper Text Visualisation Tool for Exploring Digitised Historical Documents, or my longer thesis chapter Steptext: Medical Officer of Health reports.

'nurse' timeline, Medical Officer of Health reports

Animation showing how text snippets are mapped horizontally by date: 'typhoid carrier' timeline, MOH reports

'putrid' timeline 1900-1972, MOH reports. By stacking the snippets from old to new, uneven patterns of occurrence are visible from the overall shape. The visualisation for ‘putrid’, for instance, displays a sharp turn in the shape from 1920 onwards. Was there a drop in putridity? Or maybe the language changed?
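One way to read the stacking described above is as a simple layout rule: sort the snippets by date, give each its own row, and place it horizontally by year. The sketch below assumes that reading; `stackSnippets`, its options, and the sample years are invented placeholders, not data from the MOH reports or code from the tool.

```javascript
// Placeholder hits in arbitrary order; the years are invented for illustration.
const sampleHits = [
  { year: 1950, snippet: "..." },
  { year: 1902, snippet: "..." },
  { year: 1911, snippet: "..." },
  { year: 1905, snippet: "..." },
  { year: 1908, snippet: "..." },
];

// Sort old to new, then give each snippet its own row (y) and a date position (x).
function stackSnippets(hits, { minYear, maxYear, width, rowHeight }) {
  const sorted = [...hits].sort((a, b) => a.year - b.year); // copy, do not mutate input
  return sorted.map((hit, row) => ({
    ...hit,
    x: ((hit.year - minYear) / (maxYear - minYear)) * width,
    y: row * rowHeight,
  }));
}

const placed = stackSnippets(sampleHits, {
  minYear: 1900,
  maxYear: 1972,
  width: 720,
  rowHeight: 18,
});
```

Under this rule, a burst of occurrences in a short period produces a near-vertical run (many rows, little horizontal movement), while a long gap produces a wide horizontal jump between consecutive rows. That is how the overall shape can expose uneven patterns of occurrence.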

'heroin' timeline, MOH reports. The visualisation shape for ‘heroin’ raises questions. There is an almost vertical column to the right, indicating a sudden surge in instances from 1960 onwards. The two pre-1940 occurrences of ‘heroin’ mention the drug in the context of regulation and a surgical technique. The strong column of results from the 1960s shows that discussion has shifted to drug addiction and abuse.

'blitz' timeline, MOH reports. The visualisation for ‘blitz’ shows a few instances of the term pre-WWII (referring to a person’s name and a type of instrument) and then a strong column of results from 1940 referring to Nazi air raids on Britain. By scanning through the snippets, a user can observe the gradual adoption of the word into the English language. At first ‘blitz’ appears within quotation marks, but over time instances without quotation marks increase and become the norm as the word is accepted. By 1957 it is even used metaphorically: 'bed bug blitz'.