Data Mining for Historical German Documents of the 18th Century: A Case Study of a Topic Model

Date: 2019-09-03; Author: Tao Wang

Abstract: Topic modeling is a recently developed research method of great value for opening new research paths in Digital Humanities. This study applies LDA, one of the topic model algorithms, to literature from the 17th and 18th centuries included in the German Literature Archive. Summarizing and analyzing the themes (topics) that LDA extracts from these texts makes it possible to evaluate the effectiveness of the topic model. The results give a more concrete picture of the German spiritual world in the 18th century: authors of the period had a strong sense of history and were extremely active in constructing systems of knowledge; the popularity of novels was closely related to the rise of the public domain; and religious Enlightenment was also a theme of the period. These results imply that the Enlightenment had multiple aspects. In historical research, the “Long-Distance reading” represented by the topic model needs to be combined with close reading to reach more convincing results. As a method of text mining, topic modeling still has room for improvement, and it requires cooperation between humanists and computing experts, which is the way forward for the continued development of Digital Humanities.

 

Keywords: Digital History; Topic Model; Germany; the Enlightenment (Movement); Long-Distance Reading