
Data Creation

When you create a dataset, the following need to be clearly described:

Data collection (e.g., sources of the data).
Preprocessing, if performed (e.g., scripts that you wrote, existing tools used).
Annotation scheme and guidelines, if annotation was conducted, with justification for the design choices.
People involved in this process (e.g., annotators, survey subjects).
Quality of the created data (e.g., inter-annotator agreement).
Statistics and analysis of the original, preprocessed, and annotated data.
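For inter-annotator agreement, Cohen's kappa is one common measure when exactly two annotators label the same items with categorical labels. It compares observed agreement p_o with the agreement p_e expected by chance from each annotator's label distribution: kappa = (p_o - p_e) / (1 - p_e). The sketch below assumes two annotators and categorical labels; the function name `cohens_kappa` is illustrative. For more than two annotators, measures such as Fleiss' kappa or Krippendorff's alpha are typically reported instead.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items (illustrative sketch)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)

    # Observed agreement: fraction of items where both annotators chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's marginal label proportions, summed over labels.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if p_e == 1.0:
        # Both annotators used one identical label for every item; kappa is undefined,
        # so this sketch treats it as perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


annotator_1 = ["pos", "pos", "neg", "neg", "pos", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg", "pos", "pos"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.333
```

Here the annotators agree on 4 of 6 items (p_o = 0.667), and chance agreement is 0.5, giving kappa = 0.333. In practice, scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same statistic.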

Here are a few papers presenting new datasets:

Competence-Level Prediction and Resume & Job Description Matching Using Context-Aware Transformer Models, Li et al., EMNLP 2020 (see Section 3).
FriendsQA: Open-Domain Question Answering on TV Show Transcripts, Yang and Choi, SIGDIAL 2019 (see Section 3).