Thesis Topics

The list is mostly in English; the thesis itself may of course be written in German.

When Was a Web Page Created?

There are several applications in which it is important to know when a web page was created, for example in journalism, in the examination of patents and other legal questions, or out of scientific curiosity when analysing the evolution of the web.

Existing approaches tackle the problem, for example, by extracting publication dates contained in the text, by using language models, or by analysing the link structure. Unfortunately, these methods do not work well on some (especially older) web pages, which makes it harder, for example, to date web pages in the Internet Archive.

This thesis shall investigate to what extent machine learning methods can improve the dating of web pages. In particular, the existing methods shall be compared with each other and against a comparatively simple new method on a selected subset of the Internet Archive's data.

Programming skills and experience in handling digital data are required. Experience with machine learning methods is desirable.

Visualisations of the State of the Art

Visualisations can be very helpful for getting an overview of the state of the art in a specific topic; they are sometimes provided in survey articles, lecture slides, or research papers. This project would investigate which types of visualisations exist, how they can be systematised and described, and how such visualisations could be found and collected.

Replicating Twitter Research

Twitter is a popular but still rather new subject of research, so no methodology or framework for analysis has been established yet. As a result, papers analysing quite similar research questions or data collections use different methodologies and arrive at different results. The goal of this project is to identify, classify, and compare the methodologies used by a selection of publications dealing with (roughly) the same research area and then to compare the findings between the papers. This can also involve trying to replicate findings by analysing and comparing actual Twitter data.

Semantic Publishing Data

This project shall investigate different ways to represent data about publications and researchers using semantic web standards and technologies. Specifically, it shall identify and systematise best practices and standards used by different services and develop a workflow to automatically extract, clean, enrich, and publish metadata about scholarly articles.

Twitter Statistics

Within this project, a cloud-based service for automatically creating, updating, and publishing statistics about the Twitter stream shall be developed. This includes figures such as the number of tweets per hour or the fraction of tweets containing a URL, together with their temporal evolution.
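As a rough illustration of the two figures mentioned above, the following minimal sketch computes the number of tweets per hour and the fraction of tweets containing a URL. It assumes each tweet is available as a dictionary with the hypothetical keys `created_at` (an ISO 8601 timestamp) and `text`; the function name `hourly_statistics` and the simple URL pattern are illustrative assumptions, not part of any Twitter API.

```python
import re
from collections import defaultdict
from datetime import datetime

# Simplistic URL detector (an assumption; real tweets may need entity metadata).
URL_PATTERN = re.compile(r"https?://\S+")


def hourly_statistics(tweets):
    """Return {hour: (tweet_count, fraction_with_url)} for an iterable of tweets.

    Each tweet is assumed to be a dict with the hypothetical keys
    'created_at' (ISO 8601 timestamp) and 'text'.
    """
    counts = defaultdict(int)
    with_url = defaultdict(int)
    for tweet in tweets:
        # Truncate the timestamp to the full hour to form the bucket key.
        hour = datetime.fromisoformat(tweet["created_at"]).replace(
            minute=0, second=0, microsecond=0
        )
        counts[hour] += 1
        if URL_PATTERN.search(tweet["text"]):
            with_url[hour] += 1
    return {
        hour: (counts[hour], with_url[hour] / counts[hour])
        for hour in sorted(counts)
    }


# Made-up sample data for illustration only.
sample = [
    {"created_at": "2020-01-01T10:05:00", "text": "hello world"},
    {"created_at": "2020-01-01T10:40:00", "text": "read this https://example.org/a"},
    {"created_at": "2020-01-01T11:15:00", "text": "see http://example.org/b"},
]
stats = hourly_statistics(sample)
```

A service as described would run such an aggregation continuously on the incoming stream and publish the resulting time series; the sketch only covers the aggregation step on data already in memory.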

Emerging Vocabularies in Collaborative Tagging Systems

A common assumption is that within collaborative tagging (or social bookmarking) systems a joint vocabulary emerges over time. This project shall investigate which studies exist with respect to that topic and how their findings were confirmed. Then, similar and additional experiments shall be performed on data from different tagging systems, trying to replicate (or refute) findings. One particular aspect of interest is whether a global (within the system) vocabulary emerges or whether this happens only within certain subgroups of users.

Metadata Quality and Evolution in Collaborative Tagging

Users of collaborative tagging systems can change (extend, repair) the metadata of their resources. This project shall analyse to what extent such changes happen and which types of changes users perform. This information shall then be used to automatically identify errors or omissions in order to suggest changes to users.

Metadata for Research Using Archived Web Pages

Archiving the web is becoming increasingly important for researchers as well, since the web, as an indispensable part of our society, is (and will be) a valuable resource to analyse and understand our history. This project shall analyse which metadata for archived web pages is (typically) available, which metadata could be provided, and which metadata is actually required by different researchers. To this end, a survey of or interviews with researchers who rely on or use web archives could be performed.

Recommender Systems for Scholarly Literature

There is an abundance of works dealing with the recommendation of scholarly articles. The goal of this project is to perform a bibliographic analysis on these works in order to identify developments, main actors and groups, technologies, frameworks, evaluation methods, and datasets.

Profiles of Scholars on Twitter

Scholars frequently use Twitter for scholarly communication. The goal of this project is to analyse how scholars represent themselves on Twitter, that is, how they curate their profiles, and how they describe themselves.

Comparing DBpedia and Wikidata

While DBpedia extracts data from Wikipedia, Wikidata goes the other way around: it tries to collect data and populate Wikipedia with it. DBpedia was a popular data source for research on Wikipedia. With the immense recent growth of Wikidata the question arises about the extent and quality of its data. The goal of this project is to identify a (small) set of analyses that are based on DBpedia and repeat them with data from Wikidata in order to understand the differences and identify potential pitfalls.

How to Build the Perfect Researcher Profile

There are many services and databases about researchers which contain plenty of information: publications, projects, research topics, collaborators, and so forth. This project shall identify the main data sources and analyse which types of metadata they provide, their quality and coverage, and their accessibility. How could the perfect researcher profile be built with the least effort?

Automatically Identifying Vossian Antonomasia in Texts

Vossian Antonomasia (“Vossanto” for short) is a stylistic device commonly used in news articles (e.g., “Anna Netrebko, the Julia Roberts of opera” or “the modern Steinway, the Hummer of instruments”). The goal of this project is to develop a method to automatically identify occurrences of Vossantos in texts using natural language processing (NLP) and machine learning.
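To give an impression of the task, the following minimal sketch collects candidate phrases of the form "the X of Y", where X is a capitalised name, as in the two examples above. This is only a naive, hypothetical baseline for generating candidates, not the method to be developed: the pattern `VOSSANTO_CANDIDATE` and the function `find_candidates` are illustrative assumptions, and the pattern also matches ordinary phrases that are not Vossantos.

```python
import re

# "the" + one or more capitalised tokens + "of" + a single following word.
# This shape is an assumption derived from the two examples in the text.
VOSSANTO_CANDIDATE = re.compile(
    r"\b[Tt]he\s+((?:[A-Z][\w.-]*\s+)*[A-Z][\w.-]*)\s+of\s+(\w+)"
)


def find_candidates(text):
    """Return (source, modifier) pairs for every 'the X of Y' candidate in text."""
    return [(m.group(1), m.group(2)) for m in VOSSANTO_CANDIDATE.finditer(text)]


examples = find_candidates(
    "Anna Netrebko, the Julia Roberts of opera. "
    "The modern Steinway, the Hummer of instruments."
)

# A false positive: an institution name has the same surface form.
false_positive = find_candidates("She studied at the University of Hamburg.")
```

The false positive shows why pattern matching alone is not enough: deciding whether a candidate is an actual Vossanto is where NLP and machine learning would come in.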