This project tracks progress on automated question generation by maintaining a leaderboard of current state-of-the-art (SOTA) BLEU scores on popular datasets.
AQ stands for answer questioning, also known as question generation. It is the inverse of the more widely known "question answering" task: given a context document and an answer within that document, a model must generate a plausible natural language question for that answer.
No dedicated AQ datasets exist, so we currently track progress on the most widely used substitute: Du et al. (2017) split the well-known SQuAD dataset into training, validation, and test sets. This split is available on our GitHub in the same JSON format as SQuAD.
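For illustration, here is a minimal sketch of turning a SQuAD-format JSON file into (context, answer, question) triples for AQ. The field names follow the public SQuAD schema; the function name and the filename in the usage comment are hypothetical:

```python
import json

def iter_aq_examples(squad):
    """Yield (context, answer, question) triples from a SQuAD-format dict."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                # For AQ, the context and answer span are the input
                # and the question is the generation target.
                for answer in qa["answers"]:
                    yield context, answer["text"], qa["question"]

# Hypothetical usage with one file of the Du et al. split:
# with open("train.json") as f:
#     examples = list(iter_aq_examples(json.load(f)))
```

Each answer occurrence yields its own training example, since a question is tied to a specific answer span.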
We use BLEU to evaluate model performance; while this metric is far from perfect, it is the best option currently available.
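To make the metric concrete, here is a minimal stdlib sketch of corpus-level BLEU-4 (clipped n-gram precisions, geometric mean, brevity penalty). It is illustrative only; reported leaderboard scores should come from a standard evaluation script, and details such as tokenization and smoothing can shift the numbers:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(references, hypotheses, max_n=4):
    """references: one list of reference token-lists per hypothesis.
    hypotheses: list of hypothesis token-lists (each at least max_n tokens)."""
    clipped = [0] * max_n   # clipped n-gram matches, per order
    total = [0] * max_n     # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0
    for refs, hyp in zip(references, hypotheses):
        hyp_len += len(hyp)
        # Effective reference length: closest to the hypothesis length.
        ref_len += min((len(r) for r in refs),
                       key=lambda l: (abs(l - len(hyp)), l))
        for n in range(1, max_n + 1):
            hyp_ng = ngrams(hyp, n)
            max_ref = Counter()
            for r in refs:
                for ng, c in ngrams(r, n).items():
                    max_ref[ng] = max(max_ref[ng], c)
            # Clip each hypothesis n-gram count by its max reference count.
            clipped[n - 1] += sum(min(c, max_ref[ng]) for ng, c in hyp_ng.items())
            total[n - 1] += sum(hyp_ng.values())
    if min(clipped) == 0:  # no smoothing: any empty order zeroes the score
        return 0.0
    log_p = sum(math.log(clipped[n] / total[n]) for n in range(max_n)) / max_n
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_p)
```

A hypothesis identical to its reference scores 1.0, and any missing 4-gram overlap drops the unsmoothed score to 0, which is one reason sentence-level comparisons on short questions need smoothing.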