Data

This page lists the datasets that are either used or generated within the ESSENCE Network.

Planning Domains

Planning domains with explicit communication, created as case studies for our extension of the symbolic planner to handle communication. The extension can be found in the repository.

Link to the data

Clarification Requests about Intentions

A dataset of clarification requests extracted from the AMI Corpus, annotated for whether they target intentionality.

Link to the data

Crowdsourced Judgements about Pragmatic Rejections

A dataset of crowdsourced judgements on whether utterances considered "pragmatic rejections" have acceptance or rejection force.

Link to the data

Measuring user experience while interacting with a translation application

Data was collected in two user studies. After obtaining the participants' consent, data was gathered in three different ways: interviews, observations and video recording.
The transcribed interviews, observations and video were analyzed thematically by assigning labels to different parts of the data. The labels were then grouped into themes linked to the main objectives of the study.

Link to the data

Multilingual translations created by experts using a translation application

The data collected through the user interface consists of translations of WordNet concepts into other languages, mainly Italian and Mongolian, as these are the most active user communities. When a direct translation exists, an English concept is translated as a lemma, gloss and example. When there is no direct translation, only the gloss is translated and the concept is classified as a lexical gap.
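As a rough illustration of the two cases described above (direct translation versus lexical gap), a record could be modelled as follows. This is only a sketch: the class name, the field names and the sample values are hypothetical and do not reflect the actual format of the dataset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConceptTranslation:
    """One translated WordNet concept (hypothetical layout, not the real schema)."""
    concept_id: str                # hypothetical identifier of the English concept
    language: str                  # target language, e.g. "Italian" or "Mongolian"
    gloss: str                     # translated gloss, provided in both cases
    lemma: Optional[str] = None    # translated lemma, only for direct translations
    example: Optional[str] = None  # translated example, only for direct translations

    @property
    def is_lexical_gap(self) -> bool:
        # Assumption: a record with a gloss but no lemma marks a lexical gap.
        return self.lemma is None


# Made-up sample records, for illustration only.
direct = ConceptTranslation(
    concept_id="concept-1",
    language="Italian",
    gloss="<translated gloss>",
    lemma="<translated lemma>",
    example="<translated example>",
)
gap = ConceptTranslation(
    concept_id="concept-2",
    language="Mongolian",
    gloss="<translated gloss>",
)
```

Under this assumed layout, `direct.is_lexical_gap` is `False` and `gap.is_lexical_gap` is `True`, so lexical gaps can be filtered out without a separate flag.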

Link to the data

Multilingual lexical relations

The multilingual lexical relations dataset covers over 335 languages. It contains three types of lexical relations: derived, pertainym, and antonym.

Link to the data

Emergency Response Extensions for WordNet

A repository containing emergency response extensions for WordNet. The extensions were developed using terms drawn from different resilience sources.

Link to the data

HOWLINKS: A Large-Scale Corpus of Semantically Annotated Human Instructions in English

This dataset contains over 200,000 step-by-step instructions divided into over 2 million components (such as steps, requirements and methods) with English textual labels. The instructions are represented in RDF, interlinked with each other and disambiguated with links to DBpedia.

Link to the data

Corpus of Historical American English (COHA)

COHA is the largest structured collection of English text covering the period from the 1810s to the 2000s.

Link to the data

Evolution of Shared SEmaNtics in Computational Environments – A Marie Curie Initial Training Network