Data Creation
When you create a dataset, the following need to be clearly described:
Data collection (e.g., sources of the data).
Preprocessing if performed (e.g., scripts that you write, existing tools used).
Annotation scheme and guidelines, with justification, if annotation is conducted.
People involved in this process (e.g., annotators, survey subjects).
Quality of the created data (e.g., inter-annotator agreement).
Statistics and analysis of the original, preprocessed, and annotated data.
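As one illustration of reporting data quality, inter-annotator agreement between two annotators is often summarized with Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below is a minimal example, not a prescribed method: the function name `cohen_kappa` and the toy labels are hypothetical, and it assumes exactly two annotators who each assign one categorical label to every item.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items (hypothetical helper)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items.")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability that both pick the same label independently,
    # estimated from each annotator's own label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    all_labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[label] * counts_b[label] for label in all_labels) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label; kappa is undefined here.
        return float("nan")

    return (observed - expected) / (1 - expected)


# Toy labels, invented purely for illustration.
annotator_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.2f}")  # prints "Cohen's kappa: 0.50"
```

Here the annotators agree on 6 of 8 items (0.75), chance agreement is 0.5, and kappa is therefore 0.5. With more than two annotators or missing labels, a different measure (for example Fleiss' kappa or Krippendorff's alpha) would be the more appropriate choice.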
Here are a few papers presenting new datasets:
, Li et al., EMNLP 2020 (see Section 3).
, Yang and Choi, SIGDIAL 2019 (see Section 3).