Prof. Dr. rer. nat. Robert Jäschke

  • Dorotheenstraße 26
  • 10117 Berlin
  • Room: 6
  • Phone: +49 (0)30 2093-70960
  • Fax: +49 (0)30 2093-4335
  • Email: (no Word attachments!)
  • Web:
  • PGP-Key: 0x7762DDED (E034 140F BBDF D647 BDA7 192D F17D 65F5 7762 DDED)
  • ORCID: 0000-0003-3271-9653


We are looking for computer scientists interested in a PhD position in neuro-symbolic artificial intelligence (E13 TV-L HU).

Research #

I am head of the Information Processing and Analytics group at the School for Library and Information Science at Humboldt-Universität zu Berlin.

My research interest is web science, “the emergent science of the people, organizations, applications, and of policies that shape and are shaped by the Web, the largest informational artifact constructed by humans in history” (from the call for papers of the ACM Web Science conference). My research is thus situated in computer science, with multi-disciplinary connections to information science, psychology, sociology, and economics.
Relevant projects: Unknown Data, Web Collections

My second focal point is the digital humanities, where we develop and analyse approaches to detect stylistic devices in large text corpora or mine citations to explore literary works.
Relevant projects: What matters? Key passages in literary works, Vossian Antonomasia, World Literature

Our collaborative tagging system BibSonomy is both a valuable tool for researchers to organize their literature and a test-bed for our methods and results. In that context, I am interested in developing recommendation methods for tags and scholarly articles and integrating them into social bookmarking systems. Further topics of interest include citation and link analysis, entity matching and resolution, and social network analysis.

I extensively leverage big data technologies like Hadoop, HBase, or Elasticsearch for my research, for instance, to analyze crawled web pages of academic institutions in the context of our German Academic Web archive.

Publications #

My full list of publications

Top Publications


Activities #

PC/Workshop Chair #

PC Member #

Journal Editor #

past activities

Talks #

Other #

  • Notebooks – A collection of Jupyter Notebooks showing different features and analyses; including exemplary and excellent computational essays of our students from the data analysis module.
  • FCA4DH – A collection of slides and information for the application of Formal Concept Analysis (FCA) in, and by, Digital Humanities (DH).
  • DH@HU – The Digital Humanities Network at Humboldt-Universität zu Berlin.

Projects #

BibSonomy and PUMA #

Our social bookmark and publication sharing system BibSonomy has been online since 2006; I was its main developer from 2005 to 2012. Since 2009 I have been leading the development and operation of the system together with Andreas Hotho. If you are interested in a cooperation, just let me know.

Together with the University Library Kassel we have extended the BibSonomy platform in the DFG-funded PUMA project for academic publication management. If you are interested in using PUMA, please contact us.


AI-SKILLS #

As part of the BMBF-funded AI-SKILLS project we are building an application-oriented infrastructure for AI communities in teaching-learning settings at Humboldt-Universität zu Berlin (2021-2025).

Unknown Data #

Together with LZI (home of DBLP) and GESIS we are mining and consolidating research dataset metadata from the Web in this DFG-funded research project (2021-2024).

What matters? Key passages in literary works #

Together with Steffen Martus we are developing methods to identify and characterise key passages in literary works in a DFG-funded research project (2020-2023).

Vossian Antonomasia #

We are developing novel methods to identify and extract Vossian antonomasia from large newspaper corpora. The approaches based on deep learning enable us to study this linguistic device on a large scale. Code, data, statistics, and many examples are available on our project page and GitHub repository.

World Literature #

Together with Frank Fischer and Mathias Göbel we are writing about Digital Humanities in general and our research on world literature in particular on weltliteratur.net, a black market for the digital humanities.

Web Collections #

We regularly crawl the German Academic Web and curate a large collection of tweets (more than 6 billion tweets spanning more than nine years).

past projects

Source Code #

Some of my source code is available on GitHub, other code is linked here.

FolkRank #

FolkRank is an algorithm for search and ranking in collaborative tagging systems. It has been integrated into the community support architecture of the social semantic desktop developed by the NEPOMUK project. The source code is available from the project’s SVN repository.

Trias #

Trias is an algorithm for computing triadic concepts which fulfill minimal support constraints. The source code is available on the project page.

BibSonomy #

Some of the modules on which BibSonomy is based are available in a Maven repository. The complete source code of BibSonomy is available on Bitbucket, where you can also find an issue tracker and example code snippets in the tools and BibSonomy Python projects. A good starting point is also

Data Sets #

We publish regular snapshots of the BibSonomy database. There you can also find the datasets of the ECML PKDD Discovery Challenges 2008 and 2009.

Datasets from publications:

As part of the ALEXANDRIA project we started to create a longitudinal collection of academic web pages from Germany. Since 2012 we have crawled the websites of all German universities every six months. Each crawl comprises around 6 TB and 100 million URLs.