Prof. Dr. rer. nat. Robert Jäschke

Dorotheenstraße 26
10117 Berlin
Room: 6
Phone: +49 (0)30 2093-70960
Fax: +49 (0)30 2093-4335
Email: (no Word attachments!)
Web: https://www.ibi.hu-berlin.de/de/institut/personen/jaeschke
PGP-Key: 0x7762DDED (E034 140F BBDF D647 BDA7 192D F17D 65F5 7762 DDED)
ORCID: 0000-0003-3271-9653

Research

I am head of the Information Processing and Analytics group at the School for Library and Information Science at Humboldt-Universität zu Berlin.

My research interest is web science, “the emergent science of the people, organizations, applications, and of policies that shape and are shaped by the Web, the largest informational artifact constructed by humans in history” (from the call for papers of the ACM Web Science conference). My research is therefore situated in computer science, with multidisciplinary connections to information science, psychology, sociology, and economics.
Relevant projects: Unknown Data, Web Collections

My second focal point is the digital humanities, where we develop and analyse approaches to detect stylistic devices in large text corpora or to mine citations to explore literary works.
Relevant projects: What matters? Key passages in literary works, Vossian Antonomasia, World Literature

Our collaborative tagging system BibSonomy is both a valuable tool for researchers to organize their literature and a test-bed for our methods and results. In that context, I am interested in the development and integration of methods for recommending tags and scholarly articles in social bookmarking systems. Further topics of interest include citation and link analysis, entity matching and resolution, and social network analysis.

I make extensive use of big data technologies such as Hadoop, HBase, and Elasticsearch in my research, for instance to analyze crawled web pages of academic institutions in the context of our German Academic Web archive.

Publications

My full list of publications

Top Publications

Upcoming: P. Kraut, F. Arnold, R. Jäschke, and S. Martus, "Schlüsselstellen: Wie die Literaturwissenschaft zitiert," Wallstein Verlag, 2026. ISBN: 978-3-8353-6085-3
F. Fischer and R. Jäschke, "Ein Quantum Literatur. Empirische Daten zu einer Theorie des literarischen Textumfangs," in Digitale Literaturwissenschaft (F. Jannidis, ed.), pp. 777–812, J.B. Metzler, 2023. PDF
M. Schwab, R. Jäschke, and F. Fischer, "»The Rodney Dangerfield of Stylistic Devices« – End-to-End Detection and Extraction of Vossian Antonomasia Using Neural Networks," Frontiers in Artificial Intelligence, June 2022. PDF
F. Fischer and R. Jäschke, "»The Michael Jordan of greatness« – Extracting Vossian antonomasia from two decades of The New York Times, 1987–2007," Digital Scholarship in the Humanities, January 2019. PDF
S. Doerfel, R. Jäschke, and G. Stumme, "The Role of Cores in Recommender Benchmarking for Social Bookmarking Systems," ACM Transactions on Intelligent Systems and Technology, vol. 7, pp. 40:1–40:33, Feb. 2016. PDF
L. Balby Marinho, A. Hotho, R. Jäschke, A. Nanopoulos, S. Rendle, L. Schmidt-Thieme, G. Stumme, and P. Symeonidis, Recommender Systems for Social Tagging Systems. SpringerBriefs in Electrical and Computer Engineering, Springer, Feb. 2012. PDF
L. Balby Marinho, A. Nanopoulos, L. Schmidt-Thieme, R. Jäschke, A. Hotho, G. Stumme, and P. Symeonidis, "Social tagging recommender systems," in Recommender Systems Handbook (F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, eds.), pp. 615–644, Springer, 2011. PDF
D. Benz, A. Hotho, R. Jäschke, B. Krause, F. Mitzlaff, C. Schmitz, and G. Stumme, "The social bookmark and publication management system BibSonomy," The VLDB Journal, vol. 19, pp. 849–875, Dec. 2010. PDF

Theses

PhD Thesis: R. Jäschke, Formal Concept Analysis and Tag Recommendations in Collaborative Tagging Systems, vol. 332 of Dissertationen zur Künstlichen Intelligenz. Akademische Verlagsgesellschaft AKA, Jan. 2011.
Diploma Thesis: R. Jäschke, "Die Struktur der Monoide binärer Relationen auf endlichen Mengen," (English: The structure of monoids of binary relations on finite sets), Technische Universität Dresden, Apr. 2005. PDF (25.04.2005, German)

Activities

PC/Workshop Chair

28th International Conference on Conceptual Structures, September 11 – 13, 2023, Berlin, Germany (general chair)
23rd ACM/IEEE Joint Conference on Digital Libraries, June 26 – 30, 2023, Santa Fe, USA (PC chair)
27th International Conference on Conceptual Structures, September 12 – 15, 2022, Münster, Germany (PC chair)
21st Conference LWDA – Learning. Knowledge. Data. Analytics., September 30 – October 2, 2019, Berlin, Germany (general chair)
Workshop “BIAS – Bias in Information, Algorithms, and Systems”, March 25, 2018, Sheffield, UK
8th International ACM Web Science Conference 2016, May 22 – 25, 2016, Hannover, Germany (local chair)

PC Member

3rd International Workshop on Natural Scientific Language Processing (NSLP 2026), May 12, 2026, Palma de Mallorca, Spain
18th ACM Web Science Conference, May 26 – 29, 2026, Braunschweig, Germany
The Web Conference, April 13 – 17, 2026, Dubai, United Arab Emirates
DHd2026: Nicht nur Text, nicht nur Daten, February 23 – 27, 2026, Vienna, Austria
19th ACM Conference on Recommender Systems, September 22 – 26, 2025, Prague, Czech Republic
2nd Workshop on Natural Scientific Language Processing and Research Knowledge Graphs, June 1/2, 2025, Portoroz, Slovenia
7th International Symposium on Open Search Technology, October 8 – 10, 2025, Espoo, Finland

Journal Editor

Journal of Cultural Analytics: Special Issue on “Wikipedia, Wikidata, and World Literature”, 2023, together with Frank Fischer, Jacob Blakesley, and Paula Wojcik (read our preface “World Literature in an Expanding Digital Space”)

past activities

Talks

“Challenges in Automatic Identification of Indirect Quotations Between Scholarly Texts and Literary Works” held at Workshop “re-late – re-use – re-vise”, December 4, 2025, Vienna, Austria
Panel “Kann uns KI zu Programmierern machen?” at Neue Perspektiven auf KI in Bibliotheken, May 13, 2025, Online
“Der Manfred Lehmann unter den rhetorischen Stilmitteln: Entdeckung und Extraktion Vossianischer Antonomasien mit Hilfe großer Sprachmodelle” held at 37. Berliner Sommer-Uni, September 13, 2024, Berlin, Germany
“Wörter zählen, oder: Die Top 10 von Goethes Substantiven, Verben und Adjektiven. Was geben die Datenbanken her?” in Fünf Minuten Goethe Podcast, February 6, 2024, Goethe-Gesellschaft und Klassik Stiftung Weimar
“FCA & DraCor” held at Workshop “Computational Notebooks for FCA” (CoNo-Concepts 2023), July 17, 2023, Kassel, Germany (Slides as Jupyter Notebook)
“Tales from the inside: 10 years of growing and maintaining a multi-terabyte longitudinal archive of web pages and tweets” held at the DIY digital archives workshop by the Digital Humanities Network, July 7, 2023, Potsdam, Germany (Slides)
“Formal Concept Analysis 101: Order and Knowledge” held at DIHMA.LAB/MaRDI “Digital Humanities meet Mathematics”, September 20, 2022, Berlin, Germany (Slides)
“Liebe & Tod in der Deutschen Nationalbibliothek: Der DNB-Katalog als Forschungsobjekt der digitalen Literaturwissenschaft” held at Erschließen, Forschen, Analysieren – EFA22@DNB, September 8, 2022, Frankfurt, Germany (Slides)
“What matters? Key passages in literary works” held at DHLunch@GS, March 21, 2022, University of Texas at Austin, USA
“What matters? Key passages in literary works” held at DH Coffee Talks, December 16, 2021, Humboldt-Universität zu Berlin, Germany (Video)
“Datenschätze selber heben: Data Science und Bibliotheken” held at Universitätsbibliothek Zürich, June 5, 2021, Zürich, Switzerland (Slides)
“Datenschätze selber heben: Data Science und Bibliotheken” held at Universitätsbibliothek Johann Christian Senckenberg, February 25, 2020, Frankfurt, Germany (Slides)
“Uncovering Hidden Treasures: Libraries and Data Science” held at KNVI Smart Humanity Conference, November 15, 2019, Amsterdam, The Netherlands (Slides)
“Datenschätze selber heben: Data Science und Bibliotheken” held at 107. Deutscher Bibliothekartag, June 13, 2018, Berlin, Germany (Slides)
“Das World Wide Web als Ressource für die Wissenschaft” held at Berliner Bibliothekswissenschaftliches Kolloquium, November 29, 2017, Berlin, Germany (PDF, Video)
“Über die Entwicklung von Programmen zur Begriffsanalyse” held at the 91st Ernst-Schröder-Seminar, December 3, 2016, Darmstadt, Germany (PDF)
“Empfehlungssysteme für Social Bookmarking” held at the 91st Ernst-Schröder-Kolloquium, December 2, 2016, Darmstadt, Germany (PDF)
“A Cloud-Based Infrastructure for Community-Driven Data Mining on Research Outputs” held at the Digital Infrastructures for Research Conference, September 29, 2016, Kraków, Poland (PDF)
“Identifying and Analyzing Researchers on Twitter” held at Twitter Workshop, Göttingen Center for Digital Humanities, June 12, 2014, Göttingen, Germany (PDF)
Tutorial “Ontology Learning from Folksonomies” held at
- Euro-NF Summer School on Modeling and Analysis of Novel Mechanisms in Future Internet Applications, March 28th - April 4th, 2012, Wuerzburg, Germany
- 9th International Conference on Formal Concept Analysis (ICFCA 2011), May 2 - 6, 2011, Nicosia, Cyprus (PDF)
- 6. Konferenz Professionelles Wissensmanagement, February 21 - 23, 2011, Innsbruck, Austria
- EKAW 2010 - Knowledge Engineering and Knowledge Management by the Masses, October 11 - 15, 2010, Lisbon, Portugal
“Chaos und Ordnung im Web 2.0” held at Dresdner Mathematisches Seminar, July 15, 2009, Technische Universität Dresden, Germany

Projects

BibSonomy and PUMA

Our social bookmark and publication sharing system BibSonomy has been online since 2006; I was the main developer from 2005 to 2012. Since 2009 I have been leading the development and operation of the system together with Andreas Hotho. If you are interested in cooperating, just let me know.

Together with the University Library Kassel we have extended the BibSonomy platform in the DFG-funded PUMA project for academic publication management. If you are interested in using PUMA, please contact us.

AI-SKILLS

As part of the BMBF-funded AI-SKILLS project we build an application-oriented infrastructure for AI communities in teaching-learning settings at Humboldt-Universität zu Berlin (2021-2025).

Is Expert Knowledge Key? Scholarly Interpretations as Resource for the Analysis of Literary Texts in Computational Literary Studies

Together with Steffen Martus we are developing methods to identify and characterise key passages in literary works in a DFG-funded research project (2020-2026) that is part of the DFG priority programme 2207 Computational Literary Studies.

Vossian Antonomasia

We are developing novel methods to identify and extract Vossian antonomasia from large newspaper corpora. The approaches based on deep learning enable us to study this linguistic device on a large scale. Code, data, statistics, and many examples are available on our project page and GitHub repository.

World Literature

Together with Frank Fischer and Mathias Göbel we are writing about Digital Humanities in general and our research on world literature in particular on weltliteratur.net – a black market for the digital humanities.

Web Collections

We regularly crawl the German Academic Web and curate a large collection of tweets (more than 6 billion tweets spanning more than nine years).

past projects

Source Code

Some of my source code is available on GitHub and GitLab; other code is linked here.

Jupyter Notebooks

A collection of Jupyter Notebooks demonstrates the skills of our students and shows use cases for data and text mining, visualisation, and statistics.

FolkRank

FolkRank is an algorithm for search and ranking in collaborative tagging systems. It has been integrated into the community support architecture of the social semantic desktop developed by the NEPOMUK project. The source code is available from the project’s SVN repository.

Trias

Trias is an algorithm for computing triadic concepts which fulfill minimal support constraints. The source code is available on the project page.

BibSonomy

Some of the modules on which BibSonomy is based are available in a Maven repository. The complete source code of BibSonomy is available on Bitbucket, where you can also find an issue tracker and example code snippets in the tools and BibSonomy Python projects. Another good starting point is dev.bibsonomy.org.

Data Sets

We publish regular snapshots of the BibSonomy database. There you can also find the datasets of the ECML PKDD Discovery Challenges 2008 and 2009. Other datasets have been published on Zenodo.

Datasets from publications:

Twitter

We collected the random 1% sample provided by the Twitter streaming API between 2013 and 2023. The dataset was used as the foundation for TweetsKB and for a corpus of monthly Twitter n-grams generated from more than 2 billion English tweets (2013-2023).

German Academic Web

We create and curate a longitudinal collection of academic web pages from Germany. Since 2012 we have crawled the websites of all German universities and of the Fraunhofer and Max Planck societies every six months. Each crawl comprises around 6 TB and 100 million URLs.