This page lists the datasets that are either used or generated within the ESSENCE Network.
Planning Domains
Planning domains with explicit communication, created as case studies for our extension of the symbolic planner to handle communication, which can be found in the repository.
Clarification Requests about Intentions
A dataset of clarification requests extracted from the AMI Corpus, annotated for whether they target intentionality.
Crowdsourced Judgements about Pragmatic Rejections
A dataset of crowdsourced judgements on whether utterances considered "pragmatic rejections" have acceptance or rejection force.
Measuring user experience while interacting with a translation application
Data was collected in two user studies. After obtaining the participants' consent, data was gathered in three different ways: interviews, observations and video recording.
Transcribed interviews, observations and video were thematically analyzed by assigning labels to different parts of the data. Afterwards, the labels were grouped into themes that were linked to the main objectives of the study.
Multilingual translations created by experts using a translation application
Data collected through the user interface consists of translations of WordNet concepts into lexical elements of other languages, mainly Italian and Mongolian, as these are the most active user communities. When a direct translation exists, an English concept is translated as a lemma, a gloss and an example. When there is no direct translation, only the gloss is translated and the concept is classified as a lexical gap.
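The two cases described above (direct translation versus lexical gap) can be sketched as a small record type. This is only an illustrative sketch: the class name, field names and sample values are hypothetical and do not reflect the actual schema or export format of the dataset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConceptTranslation:
    """Hypothetical record for one translated WordNet concept."""
    concept_id: str                # placeholder identifier, not a real WordNet ID
    language: str                  # target language label, e.g. "Italian"
    gloss: str                     # translated definition, present in both cases
    lemma: Optional[str] = None    # translated word, only for a direct translation
    example: Optional[str] = None  # translated example, only for a direct translation

    @property
    def is_lexical_gap(self) -> bool:
        # Assumption: a missing lemma marks a concept with no direct translation.
        return self.lemma is None


# Placeholder values for illustration only; they are not taken from the dataset.
direct = ConceptTranslation(
    concept_id="concept-1",
    language="Italian",
    gloss="<translated gloss>",
    lemma="<translated lemma>",
    example="<translated example>",
)
gap = ConceptTranslation(
    concept_id="concept-2",
    language="Mongolian",
    gloss="<translated gloss>",
)

print(direct.is_lexical_gap, gap.is_lexical_gap)
```

Deriving the lexical-gap flag from the absence of a lemma keeps the two cases in one record type; a real export might store the classification explicitly instead.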
Multilingual lexical relations
The dataset for multilingual lexical relations covers over 335 languages. It contains three types of lexical relations: derived, pertainym, and antonym.
Emergency Response Extensions for WordNet
A repository containing emergency response extensions for WordNet. These extensions were developed using terms drawn from different resilience sources.
HOWLINKS: A Large-Scale Corpus of Semantically Annotated Human Instructions in English
This dataset contains over 200,000 step-by-step instructions divided into over 2 million components (such as steps, requirements and methods) with English textual labels. Instructions are represented in RDF, interlinked with each other and disambiguated with links to DBpedia.
Corpus of Historical American English (COHA)
COHA is the largest structured collection of English text covering the period from the 1810s to the 2000s.