Prof. Dr. rer. nat. Robert Jäschke

  • Dorotheenstraße 26
  • 10117 Berlin
  • Room: 6
  • Phone: +49 (0)30 2093-70960
  • Fax: +49 (0)30 2093-4335
  • Email: (no Word attachments!)
  • Web:
  • For Spammers:
  • PGP-Key: 0x7762DDED (E034 140F BBDF D647 BDA7 192D F17D 65F5 7762 DDED)
  • ORCID: 0000-0003-3271-9653


For our BMBF project AI-SKILLS (application-oriented infrastructure for AI communities in teaching-learning settings), we are looking for a scientific project lead (E14 TV-L HU). See the job posting.

Research #

I am head of the Information Processing and Analytics group at the School for Library and Information Science at Humboldt-Universität zu Berlin.

My research interest is Web Science, “the emergent science of the people, organizations, applications, and of policies that shape and are shaped by the Web, the largest informational artifact constructed by humans in history” (from the call for papers of the ACM Web Science conference). My research is thus situated in computer science, with multidisciplinary connections to psychology, sociology, economics, and the digital humanities.

More specifically, together with colleagues from the Leibniz Research Alliance Open Science, I am investigating how the (social) web is changing the research landscape and how it can improve communication, collaboration, participation, and open discourse.

I am also leading the development of the collaborative tagging system BibSonomy, which is both a valuable tool for researchers to organize their literature and a test bed for our methods and results. In that context, I am interested in developing and integrating methods that recommend tags and scientific publications in social bookmarking systems. Further topics of interest include citation and link analysis, entity matching and resolution, and social network analysis.

I make extensive use of big data technologies such as Hadoop, HBase, Drill, and Elasticsearch in my research, e.g., to analyze crawled web pages of universities in the context of Open Science. To that end, I designed a dedicated cluster for the L3S Research Center, consisting of 40 nodes with a total of 2 petabytes of disk space and 400 CPU cores. The first stages have been installed since 2013, and I manage the operation and further extension of the cluster.

Publications #

My publications

Top Publications


Recommender Systems for Social Tagging Systems
Formal Concept Analysis and Tag Recommendations in Collaborative Tagging Systems


Activities #

PC/Workshop Chair #

PC Member #

past activities

Talks #

Projects #

BibSonomy and PUMA #

Our social bookmark and publication sharing system BibSonomy has been online since 2006; I was its main developer from 2005 to 2012. Since 2009 I have been leading the development and operation of the system together with Andreas Hotho. If you are interested in a cooperation, just let me know.

Together with the University Library Kassel, we have extended the BibSonomy platform in the DFG-funded PUMA project for academic publication management. If you are interested in using PUMA, please contact us.

AI-SKILLS #

As part of the BMBF-funded AI-SKILLS project, we are building an application-oriented infrastructure for AI communities in teaching-learning settings at Humboldt-Universität zu Berlin (2021-2025).

Unknown Data #

Together with LZI (home of DBLP) and GESIS, we are mining and consolidating research dataset metadata from the Web in this DFG-funded research project (2021-2024).

Uncovr #

In the Uncovr project we aim to improve the identification and linking of musical works in videos by leveraging methods from artificial and collective intelligence (2021-2023).

What matters? Key passages in literary works #

Together with Steffen Martus, we are developing methods to identify and characterise key passages in literary works in a DFG-funded research project (2020-2023).

Vossian Antonomasia #

We are developing novel methods to identify and extract Vossian antonomasia from large newspaper corpora. The approaches based on deep learning enable us to study this linguistic device on a large scale. Code, data, statistics, and many examples are available on our project page and GitHub repository.

World Literature #

Together with Frank Fischer and Mathias Göbel we are writing about Digital Humanities in general and our research on world literature in particular on – a black market for the digital humanities

Web Collections #

We regularly crawl the German Academic Web and curate a large collection of tweets (currently 5 billion tweets spanning more than six years).

past projects

Source Code #

Some of my source code is available on GitHub; other code is linked here.

FolkRank #

FolkRank is an algorithm for search and ranking in collaborative tagging systems. It has been integrated into the community support architecture of the social semantic desktop developed by the NEPOMUK project. The source code is available from the project’s SVN repository.
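For readers unfamiliar with the idea, the following is a minimal sketch of a FolkRank-style ranking, assuming the commonly described formulation: a weight-spreading iteration on the undirected graph of users, tags, and resources, run once with a preference vector and once without, with the difference taken as the score. It is not the code from the SVN repository. The function name `folkrank`, the parameters `d` and `iterations`, and the toy data in the usage note are assumptions made for illustration.

```python
from collections import defaultdict


def folkrank(assignments, preference, d=0.7, iterations=100):
    """Rank users, tags, and resources relative to a preference (illustrative sketch).

    assignments: iterable of (user, tag, resource) triples.
    preference:  dict mapping nodes such as ("tag", "python") to positive weights.
    """
    # Build an undirected, weighted graph over users, tags, and resources.
    # Each distinct assignment adds weight 1 to the three edges it induces.
    weights = defaultdict(lambda: defaultdict(float))
    for user, tag, resource in set(assignments):
        u, t, r = ("user", user), ("tag", tag), ("resource", resource)
        for a, b in ((u, t), (t, r), (u, r)):
            weights[a][b] += 1.0
            weights[b][a] += 1.0
    nodes = list(weights)
    degree = {n: sum(weights[n].values()) for n in nodes}

    def spread_weights(pref, damping):
        # Iterate w <- damping * A w + (1 - damping) * p with a normalized p.
        total = sum(pref.get(n, 0.0) for n in nodes)
        p = {n: pref.get(n, 0.0) / total for n in nodes}
        w = {n: 1.0 / len(nodes) for n in nodes}
        for _ in range(iterations):
            received = {n: 0.0 for n in nodes}
            for n in nodes:
                for m, weight in weights[n].items():
                    received[m] += w[n] * weight / degree[n]
            w = {n: damping * received[n] + (1 - damping) * p[n] for n in nodes}
        return w

    uniform = {n: 1.0 for n in nodes}
    baseline = spread_weights(uniform, 1.0)   # global ranking, no preference
    biased = spread_weights(preference, d)    # ranking pulled toward the preference
    # The differential score highlights what is specific to the preference.
    return {n: biased[n] - baseline[n] for n in nodes}
```

Called with a handful of assignments such as `("alice", "python", "r1")` and a preference of `{("tag", "python"): 1.0}`, the function returns one score per node; sorting the resource nodes by score gives a ranking of resources for that tag.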

Trias #

Trias is an algorithm for computing triadic concepts which fulfill minimal support constraints. The source code is available on the project page.
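To illustrate what such an output looks like, here is a brute-force sketch that enumerates triadic concepts with minimal support directly from the definition: maximal "boxes" of users, tags, and resources whose every combination occurs in the data. It is not the Trias algorithm itself and is only practical for tiny inputs. The function names and the parameters `min_users`, `min_tags`, and `min_resources` are assumptions made for illustration.

```python
from itertools import combinations


def subsets_of_min_size(items, min_size):
    """Yield all subsets of items with at least max(1, min_size) elements."""
    items = sorted(items)
    for size in range(max(1, min_size), len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def frequent_triconcepts(assignments, min_users=1, min_tags=1, min_resources=1):
    """Return all triples (A, B, C) of user, tag, and resource sets such that
    A x B x C lies inside the assignments, no component can be enlarged,
    and each component meets its minimal size."""
    Y = set(assignments)
    users = {u for u, _, _ in Y}
    tags = {t for _, t, _ in Y}
    resources = {r for _, _, r in Y}

    def box_in_Y(A, B, C):
        return all((u, t, r) in Y for u in A for t in B for r in C)

    result = []
    for B in subsets_of_min_size(tags, min_tags):
        for C in subsets_of_min_size(resources, min_resources):
            # Largest user set compatible with B and C.
            A = frozenset(u for u in users if box_in_Y([u], B, C))
            if len(A) < max(1, min_users):
                continue
            # Keep the triple only if B and C cannot be enlarged either.
            B_max = frozenset(t for t in tags if box_in_Y(A, [t], C))
            C_max = frozenset(r for r in resources if box_in_Y(A, B, [r]))
            if B_max == B and C_max == C:
                result.append((A, B, C))
    return result
```

Raising the three thresholds prunes small concepts, which is the role the minimal support constraints play in the description above.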

BibSonomy #

Some of the modules that BibSonomy is based on are available in a Maven repository. The complete source code of BibSonomy is available on Bitbucket; there you can also find an issue tracker and example code snippets in the tools and BibSonomy Python projects. A good starting point is also

Data Sets #

We publish regular snapshots of the BibSonomy database. There you can also find the datasets of the ECML PKDD Discovery Challenges 2008 and 2009.

Datasets from publications:

As part of the ALEXANDRIA project, we started to create a longitudinal collection of academic web pages from Germany. Since 2012, we have crawled the websites of all German universities every six months. Each crawl comprises around 6 TB and 100 million URLs.