
New sequence learning data set


A new data set for the study of sequence learning algorithms is available as of today. The data set consists of pen stroke sequences that represent handwritten digits, and was created based on the MNIST handwritten digit data set.



By Dr. Edwin D. de Jong, Netherlands.

A new benchmark data set for sequence learning has been made available. It is based on the well-known MNIST handwritten digit data set: all 70,000 images were thresholded and thinned to a 1-pixel-wide skeleton, and a TSP solver was then used to infer, from each skeleton, a hypothetical stroke sequence that could have produced the digit.
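To make the idea concrete, here is a minimal sketch of the "image to stroke sequence" step on a toy example. It is not the code used to build the data set. The function names (`threshold`, `order_points`), the cutoff value, and the toy image are all assumptions made for illustration, the thinning step is skipped because the toy image is already one pixel wide, and a greedy nearest-neighbour ordering stands in for the TSP solver.

```python
import math

def threshold(image, cutoff=128):
    """Return (row, col) coordinates of pixels at or above the cutoff.

    The cutoff of 128 is an illustrative assumption, not the value
    used to build the data set.
    """
    return [(r, c)
            for r, row in enumerate(image)
            for c, value in enumerate(row)
            if value >= cutoff]

def order_points(points):
    """Order skeleton pixels into a plausible pen path.

    This is a greedy nearest-neighbour heuristic, a crude stand-in
    for a real TSP solver: start at the top-left-most pixel and
    always move to the closest pixel not yet visited.
    """
    if not points:
        return []
    remaining = sorted(points)
    path = [remaining.pop(0)]
    while remaining:
        last = path[-1]
        nearest = min(remaining, key=lambda p: math.dist(last, p))
        remaining.remove(nearest)
        path.append(nearest)
    return path

# Toy 5x5 "image" of a 7-like shape that is already one pixel wide,
# so no thinning is needed here.
toy_image = [
    [0,   0,   0,   0, 0],
    [0, 255, 255, 255, 0],
    [0,   0,   0, 255, 0],
    [0,   0, 255,   0, 0],
    [0,   0, 255,   0, 0],
]

stroke = order_points(threshold(toy_image))
print(stroke)
# [(1, 1), (1, 2), (1, 3), (2, 3), (3, 2), (4, 2)]
```

On real MNIST digits a greedy ordering like this can make long jumps that a proper TSP solver would avoid, and digits drawn with more than one stroke need extra handling for pen lifts, which this sketch ignores.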

This will be relevant for folks who want to experiment with sequence learning and need a publicly available and comprehensible data set.

MNIST stroke sequence data set (github)

The code project used to create the data set (github)
