SDCD: Simulated Data for Concept Drift

Developed for project: #EP/D0404X/1 Classifier Ensembles for Changing Environments
Sponsored by EPSRC.
(related keywords: concept drift, population drift, hidden contexts, recurring contexts)

A fundamental assumption often made in supervised classification is that the problem is static, i.e., the description of the classes does not change with time. However, many practical classification tasks involve changing environments, so designing and testing classifiers for such environments is of increasing interest and importance. A number of benchmark data sets are available for static classification tasks. For example, the UCI machine learning repository is used extensively by researchers to compare algorithms across various domains. Only a few benchmark data sets are available for changing environments. Moreover, while generating data for static environments is relatively straightforward, this is not so for changing environments: an unlimited number of changes can be simulated, and it is difficult to decide which ones are realistic and hence useful. We proposed a general framework for generating data to simulate changing environments in:

  • Narasimhamurthy A., L.I. Kuncheva, A framework for generating data to simulate changing environments, Proc. IASTED, Artificial Intelligence and Applications, Innsbruck, Austria, 2007, 384-389.

  • Content