Brieven als Buit (‘Letters as Loot’)

About the Corpus application

The corpus application is developed by the Dutch Language Institute (Instituut voor de Nederlandse Taal or INT). The backend of the application is the BlackLab Lucene based search engine developed for corpora with token-based annotation (https://blacklab.ivdnt.org/). The web-based frontend is a further development of the corpus-frontend application developed by INT (https://github.com/instituutnederlandsetaal/blacklab-frontend) in CLARIN and CLARIAH projects. Its design is inspired by the first version of the OpenSoNaR user interface by Tilburg and Radboud University (https://github.com/Taalmonsters/WhiteLab2.0).

About the Brieven als Buit project

Approximately 40,000 Dutch letters from the second half of the 17th to the early 19th century have been gathering dust for centuries in British archives. They were sent home by sailors and others abroad, and in the opposite direction by those who stayed behind and needed to keep in touch with their loved ones. Many letters did not reach their destinations: they were taken as loot by privateers and confiscated by the High Court of Admiralty during the wars fought between the Netherlands and England. These confiscated letters of men, women and even children represent priceless material for historical linguists. They give us access to the still largely unknown everyday Dutch of the past, the colloquial Dutch of people from the middle and lower classes.

The research programme Brieven als Buit / Letters as Loot: Towards a non-standard view on the history of Dutch explored this extraordinary source of Dutch letters from the past (see www.brievenalsbuit.nl). The programme, initiated and directed by prof. dr. Marijke van der Wal (Leiden University) and funded by the Netherlands Organisation for Scientific Research (NWO), ran successfully from 1 September 2008 to 1 September 2013.

A first online version of the corpus was launched on 5 September 2013 and is one of the programme's results. Like the second, adapted version that came online on 18 June 2015, it was achieved through close collaboration with the Institute for Dutch Lexicology (INL). The current third release, adapted by the Dutch Language Institute (INT), the successor of INL, was released on 29 January 2020.

About the Brieven als Buit corpus

The corpus on this website comprises about 1,000 letters and was compiled within the Brieven als Buit / Letters as Loot programme (see https://www.universiteitleiden.nl/en/research/research-projects/humanities/letters-as-loot.-towards-a-non-standard-view-on-the-history-of-dutch#tab-1). The historical linguistic research of the Letters as Loot programme was based on this corpus. For the research results we refer to the following monograph and PhD dissertations, all of which are available in open access:

Gijsbert Rutten & Marijke van der Wal, Letters as Loot. A sociolinguistic approach to seventeenth- and eighteenth-century Dutch, Amsterdam & Philadelphia: John Benjamins, 2014 (https://www.jbe-platform.com/content/books/9789027269577)
Judith Nobels, (Extra)Ordinary letters: A view from below on seventeenth-century Dutch. Utrecht: LOT, 2013 (https://www.lotpublications.nl/extraordinary-letters-extraordinary-letters-a-view-from-below-on-seventeenth-century-dutch)
Tanja Simons, Ongekend 18e-eeuws Nederlands: Taalvariatie in persoonlijke brieven. Utrecht: LOT, 2013 (https://www.lotpublications.nl/ongekend-18e-eeuws-nederlands-ongekend-18e-eeuws-nederlands-taalvariatie-in-persoonlijke-brieven)

In the present internet application, about 90 % of the photographs, all transcriptions and the metadata related to the letters originate from the Brieven als Buit / Letters as Loot programme.

The National Archives (Kew, UK), the current location of the confiscated documents, offered the opportunity to photograph selected letters, all kept in the High Court of Admiralty (HCA) Archives. The remaining 10 % were provided by the preservation project Metamorfoze (Royal Library (KB) and National Archives, both in The Hague), by dr. Adri van Vliet (cf. Adri P. Van Vliet, ‘Een vriendelijcke groetenisse’. Brieven van het thuisfront aan de vloot van De Ruyter (1664-1665). Franeker: Van Wijnen, 2007) and by dr. Roelof van Gelder.

Diplomatic transcriptions of all photographs were made within the Brieven als Buit / Letters as Loot programme by volunteers of the Leiden Wikiscripta Neerlandica project, initiated by prof. dr. Marijke van der Wal in 2007. After various stages of corrections the transcriptions were incorporated in the Brieven als Buit / Letters as Loot corpus.

All letters were annotated with extensive metadata from the database of the research programme Brieven als Buit/Letters as Loot. They were tokenized, tagged with Part of Speech and lemmatised by the INT. The linguistic annotations were verified manually.

Linguistic Annotation

The Part of Speech tagging uses the tagset and tagging principles for the annotation of diachronic corpora of historical Dutch, developed in the context of the CLARIAH+ project. The tags that were originally used for Brieven als Buit have been mapped to this tagset. This annotation layer has been added to the corpus and can also be used to search the online corpus. A detailed description can be found here.

The most important differences are:

the use of a type feature for different kinds of proper names instead of using different tags, such as PER, NEPER, NELOC and NEORG;
the use of a type feature for the residual categories tagged by RES, FOREIGN and UNRESOLVED;
the ADJ tag is replaced by AA, and ART is now a subtype of PD;
multiple tags that denote the same category have been merged into a single tag, for example NOU-C for NOU-C, NOU and NOU-EN, and VRB for VRB and VRN.
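
As an illustration only, the mapping described above could be sketched as a lookup table in Python. The pairs ADJ to AA, ART to PD, NOU and NOU-EN to NOU-C, and VRN to VRB come from the list above. Everything else is an assumption made for this sketch: the main tag NOU-P for proper names, the feature name "type" and all of its values, and the helper names map_tag and format_tag. The detailed tagset description remains the authoritative source.

```python
# Hypothetical sketch of the legacy-to-new tag mapping; not the INT's actual conversion code.
# Entries marked "source" follow the list above; entries marked "assumed" are placeholders.
LEGACY_TO_NEW = {
    "ADJ": ("AA", {}),                           # source: ADJ is replaced by AA
    "ART": ("PD", {"type": "art"}),              # source: ART is a subtype of PD; value "art" assumed
    "NOU": ("NOU-C", {}),                        # source: merged into NOU-C
    "NOU-EN": ("NOU-C", {}),                     # source: merged into NOU-C
    "NOU-C": ("NOU-C", {}),                      # source: unchanged
    "VRB": ("VRB", {}),                          # source: unchanged
    "VRN": ("VRB", {}),                          # source: merged into VRB
    "PER": ("NOU-P", {"type": "per"}),           # assumed main tag and feature value
    "NEPER": ("NOU-P", {"type": "per"}),         # assumed main tag and feature value
    "NELOC": ("NOU-P", {"type": "loc"}),         # assumed main tag and feature value
    "NEORG": ("NOU-P", {"type": "org"}),         # assumed main tag and feature value
    "FOREIGN": ("RES", {"type": "foreign"}),     # assumed feature value
    "UNRESOLVED": ("RES", {"type": "unresolved"}),  # assumed feature value
}


def map_tag(legacy_tag):
    """Return (tag, features) for a legacy tag; tags not in the table pass through unchanged."""
    tag, features = LEGACY_TO_NEW.get(legacy_tag, (legacy_tag, {}))
    return tag, dict(features)  # copy so callers cannot mutate the table


def format_tag(tag, features):
    """Render a tag with its features, e.g. PD(type=art); the notation is an assumption."""
    if not features:
        return tag
    inner = ",".join(f"{key}={value}" for key, value in sorted(features.items()))
    return f"{tag}({inner})"


if __name__ == "__main__":
    for legacy in ["ADJ", "ART", "NOU-EN", "VRN", "NELOC"]:
        print(legacy, "->", format_tag(*map_tag(legacy)))
```

Keeping the distinction between kinds of proper names in a feature rather than in the tag itself means that one search on the main tag finds all proper names, while the feature can still narrow the result to persons, locations or organisations.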

The 17th- and 18th-century word forms all have a modern Dutch lemma. For words no longer used in modern Dutch, a modern lemma has been constructed according to the same linguistic principles that apply to words still in use. More information about the lemmatisation principles can be found in Lemmatiseerprincipes voor GiGaNT, het centrale lexicon van het INT.

Credits

When referring to the Brieven als Buit Corpus, please use the following reference:

The Letters as Loot / Brieven als Buit-corpus. Leiden University. Compiled by Marijke van der Wal (Programme leader), Gijsbert Rutten, Judith Nobels and Tanja Simons, with the assistance of volunteers of the Leiden-based Wikiscripta Neerlandica transcription project, and lemmatised, tagged and provided with search facilities by the Institute for Dutch Lexicology (INL). 3rd release January 2021. http://hdl.handle.net/10032/tm-a2-s4

When referring to the Brieven als Buit Data, please use the following reference:

Brieven als Buit - Gouden Standaard (Version 2.0) (2013) [Data set]. Available at the Dutch Language Institute: http://hdl.handle.net/10032/Tm-a2-a7

For BlackLab:

Software available at https://github.com/instituutnederlandsetaal/BlackLab

Does, Jesse de, Jan Niestadt & Katrien Depuydt (2017), Creating research environments with BlackLab. In: Jan Odijk and Arjan van Hessen (eds.) CLARIN in the Low Countries, pp. 151-165. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi

For the corpus frontend:

Software available at: https://github.com/instituutnederlandsetaal/blacklab-frontend

Logo provenance:

Design: Martien Frijns