The corpus application is developed by the INT. The backend is BlackLab, a Lucene-based search engine developed for corpora with token-based annotation (http://inl.github.io/BlackLab/). The web-based frontend is a further development of the corpus-frontend application developed by the INT (https://github.com/INL/corpus-frontend) in CLARIN and CLARIAH projects. Its design is inspired by the first version of the OpenSoNaR user interface by Tilburg and Radboud University (https://github.com/Taalmonsters/WhiteLab2.0).
Approximately 40,000 Dutch letters from the second half of the 17th to the early 19th centuries have been gathering dust for centuries in British archives. They were sent home by sailors and others from abroad but also vice versa by those staying behind who needed to keep in touch with their loved ones. Many letters did not reach their destinations: they were taken as loot by privateers and confiscated by the High Court of Admiralty during the wars fought between The Netherlands and England. These confiscated letters of men, women and even children represent priceless material for historical linguists. They allow us to gain access to the as yet mainly unknown everyday Dutch of the past, the colloquial Dutch of people from the middle and lower classes.
The research programme Brieven als Buit / Letters as Loot: Towards a non-standard view on the history of Dutch has explored this extraordinary source of Dutch letters from the past (see www.brievenalsbuit.nl). This programme, initiated and directed by prof. dr. Marijke van der Wal (Leiden University) and funded by the Netherlands Organisation for Scientific Research (NWO), ran successfully from 1 September 2008 to 1 September 2013.
A first online version of the corpus was launched on 5 September 2013 and is one of the programme's results. Like the second, adapted version that came online on 18 June 2015, it was achieved through close collaboration with the Institute for Dutch Lexicology (INL). The current third release, adapted by the Dutch Language Institute (INT), the successor of INL, was released on 29 January 2020.
The corpus on this website comprises about 1,000 letters and was compiled within the Brieven als Buit / Letters as Loot programme (see https://www.universiteitleiden.nl/en/research/research-projects/humanities/letters-as-loot.-towards-a-non-standard-view-on-the-history-of-dutch#tab-1). The historical linguistic research of the Letters as Loot programme was based on this corpus. For the research results we refer to the following monograph and PhD dissertations which are all available in open access:
In the present internet application, about 90 % of the photographs, all transcriptions and the metadata related to the letters originate from the Brieven als Buit / Letters as Loot programme.
The National Archives (Kew, UK), the current location of the confiscated documents, offered the opportunity to photograph selected letters, all kept in the High Court of Admiralty (HCA) Archives. The remaining 10 % were provided by the preservation project Metamorfoze (Royal Library (KB) and National Archives, both in The Hague), by dr. Adri van Vliet (cf. Adri P. Van Vliet, ‘Een vriendelijcke groetenisse’. Brieven van het thuisfront aan de vloot van De Ruyter (1664-1665). Franeker: Van Wijnen, 2007) and by dr. Roelof van Gelder.
Diplomatic transcriptions of all photographs were made within the Brieven als Buit / Letters as Loot programme by volunteers of the Leiden Wikiscripta Neerlandica project, initiated by prof. dr. Marijke van der Wal in 2007. After various stages of corrections the transcriptions were incorporated in the Brieven als Buit / Letters as Loot corpus.
All letters were annotated with extensive metadata from the database of the research programme Brieven als Buit/Letters as Loot. They were tokenized, tagged with Part of Speech and lemmatised by the INT. The linguistic annotations were verified manually.
The 17th and 18th century word forms all have a modern Dutch lemma. For words no longer used in modern Dutch, a modern lemma has been constructed using the same linguistic principles applicable to still existing words.
More information about the lemmatization principles used can be found in Marijke Mooijaart, Het lemma in the GiGaNT lexicon.
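Because every historical word form carries a modern Dutch lemma, a single lemma query can retrieve all spelling variants of a word. The sketch below illustrates this by building a Corpus Query Language (CQL) token pattern and a BlackLab Server "hits" request URL. It is only an illustration under assumptions: the server address and corpus name are hypothetical placeholders, and the annotation names `lemma` and `pos` (and the example tag value) are assumed rather than taken from the corpus documentation.

```python
from urllib.parse import urlencode

# Hypothetical placeholders: the actual server address and corpus
# identifier are not given in this text.
BASE_URL = "https://example.org/blacklab-server"
CORPUS = "brieven-als-buit"


def cql_token(**constraints: str) -> str:
    """Build one CQL token pattern from annotation=value pairs.

    Annotation names such as 'lemma' and 'pos' are assumptions here.
    Values are inserted as-is, so they must not contain double quotes.
    """
    parts = [f'{name}="{value}"' for name, value in constraints.items()]
    return "[" + " & ".join(parts) + "]"


def hits_url(pattern: str, number: int = 20) -> str:
    """Build a BlackLab Server 'hits' URL for a CQL pattern (no request is sent)."""
    query = urlencode({"patt": pattern, "number": number, "outputformat": "json"})
    return f"{BASE_URL}/{CORPUS}/hits?{query}"


# Search on the modern lemma "schip" (ship); historical spellings of the
# word share this lemma and would all match.
pattern = cql_token(lemma="schip")
print(pattern)  # [lemma="schip"]
print(hits_url(pattern))
```

The same helper can combine annotations, for example `cql_token(lemma="schip", pos="NOU")`, where the tag value is again only a placeholder; the tags actually available depend on the tagset used in the corpus.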
In the context of the CLARIAH+ project, a tagset and tagging principles for the annotation of diachronic corpora of historical Dutch have been developed. This annotation layer has been added to the corpus and can also be used to search the online corpus.
A detailed description can be found here.
When referring to the Brieven als Buit Corpus, please use the following reference:
The Letters as Loot / Brieven als Buit-corpus. Leiden University. Compiled by Marijke van der Wal (Programme leader), Gijsbert Rutten, Judith Nobels and Tanja Simons, with the assistance of volunteers of the Leiden-based Wikiscripta Neerlandica transcription project, and lemmatised, tagged and provided with search facilities by the Institute for Dutch Lexicology (INL). 3rd release January 2021. http://hdl.handle.net/10032/tm-a2-s4
When referring to the Brieven als Buit Data, please use the following reference:
Brieven als Buit - Gouden Standaard (Version 2.0) (2013) [Data set]. Available at the Dutch Language Institute: http://hdl.handle.net/10032/Tm-a2-a7
For BlackLab:
Software available at: https://github.com/INL/BlackLab
Does, Jesse de, Jan Niestadt and Katrien Depuydt (2017), Creating research environments with BlackLab. In: Jan Odijk and Arjan van Hessen (eds.) CLARIN in the Low Countries, pp. 151-165. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi
For the corpus frontend:
Software available at: https://github.com/INL/corpus-frontend