Artificial data sets





Synthetic datasets for classification and clustering


This GitHub repository contains MATLAB code for generating 2D and 3D synthetic (toy, artificial) datasets. The datasets have been used in various publications, mostly for rapid testing of classification and clustering methods. A sample of the datasets can be downloaded from here. Currently, there are 52 datasets: 47 two-dimensional sets and 5 three-dimensional sets.


In addition, we provide an interactive tool in MATLAB for generating a 2D dataset manually.
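
As a rough illustration of what a 2D toy dataset of this kind can look like, the sketch below draws two Gaussian clusters with class labels. It is written in Python and is not taken from the repository's MATLAB code. The function name `make_two_gaussians` and its parameters (`n_per_class`, `separation`, `std`, `seed`) are hypothetical and chosen only for this example.

```python
import random


def make_two_gaussians(n_per_class=100, separation=4.0, std=1.0, seed=0):
    """Return (points, labels) for two isotropic 2D Gaussian clusters.

    Hypothetical helper for illustration only; it does not reproduce
    any specific dataset from the repository.
    """
    rng = random.Random(seed)  # local generator so results are repeatable
    # Cluster centers placed symmetrically on the x-axis.
    centers = [(-separation / 2.0, 0.0), (separation / 2.0, 0.0)]
    points, labels = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_class):
            # Each coordinate is sampled independently around the center.
            points.append((rng.gauss(cx, std), rng.gauss(cy, std)))
            labels.append(label)
    return points, labels


if __name__ == "__main__":
    X, y = make_two_gaussians(n_per_class=5)
    for (px, py), label in zip(X, y):
        print(f"{px:8.3f} {py:8.3f} {label}")
```

The labels can be used directly for testing a classifier, or dropped to test a clustering method on the same points. Reducing `separation` relative to `std` makes the two clusters overlap more and the problem harder.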



Older data collection

(download)


Classification data


Clustering data