Data sets

·Repositories
Below are links to some repositories of datasets for regression tasks. They include some soft sensor datasets (my main area of study), which are listed separately here. The datasets are also classified by whether they are static or dynamic and whether they come from a soft sensor application.


·General regression datasets

  • Weka database - contains several regression datasets from different sources;

  • UCI database - contains several regression datasets;

  • Luís Torgo repository - contains many regression datasets, including some of the well-known regression datasets in the UCI database;

  • Delve repository - contains some regression datasets, most of which are also included in the Luís Torgo repository;

  • Journal of Statistics Education - contains several regression datasets and their descriptions;

  • CoEPrA 2006 - contains high-dimensional regression datasets from the CoEPrA competition.


·Soft sensor datasets

  • WWTP:

    • Download: here;

    • Description: Stationary; extracted from a real wastewater treatment plant (WWTP); more info can be found on page 8 here;

    • Data Info: Continuous; Stationary; Number of inputs: 8; Number of samples: 1000; Output: Fluorine at the effluent stage;

    • Objective: Predict fluorine at the effluent stage;

    • In case of publication please cite: Francisco Souza, Rui Araújo, Tiago Matias, Jérôme Mendes. A Multilayer-Perceptron Based Method for Variable Selection in Soft Sensor Design. Journal of Process Control, 23(10):1371–1378, November 2013. [doi; pdf; bib].

  • Debutanizer Column:

    • Download: Data for SRU Unit and Debutanizer Column (original link);

    • Description: Stationary; extracted from a real debutanizer plant; more info can be found on page XX of Fortuna et al. Book;

    • Data Info: Continuous; Stationary; Number of inputs: 7; Number of samples: 2393; Output: Butane concentration;

    • Objective: Predict the butane concentration in a debutanizer column;

    • In case of publication cite the original book: Fortuna et al. Book.

  • To be updated soon.
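
Each "Data Info" entry above gives a number of inputs and a single output, which is enough to set up a regression task once the file is downloaded. The sketch below shows one way that step could look. It is a minimal example under assumptions that the page does not state: the file is taken to be plain text, with one sample per row, the inputs in the first columns and the output in the last column, separated by whitespace or commas. The function names `load_regression_table` and `chronological_split` are hypothetical, and the tiny in-memory table stands in for a real file; it is not data from any of the datasets listed here.

```python
import io


def load_regression_table(stream, n_inputs):
    """Parse numeric rows into (inputs, outputs).

    Assumed layout (not confirmed by the dataset pages): one sample per
    row, n_inputs input columns followed by one output column.
    Blank lines and non-numeric lines (e.g. a header) are skipped.
    """
    inputs, outputs = [], []
    for line_no, line in enumerate(stream, start=1):
        fields = line.replace(",", " ").split()
        if not fields:
            continue  # blank line
        try:
            values = [float(f) for f in fields]
        except ValueError:
            continue  # header or comment line
        if len(values) != n_inputs + 1:
            raise ValueError(
                f"line {line_no}: expected {n_inputs + 1} columns, "
                f"got {len(values)}"
            )
        inputs.append(values[:n_inputs])
        outputs.append(values[n_inputs])
    return inputs, outputs


def chronological_split(inputs, outputs, train_fraction=0.5):
    """Split samples into train/test parts without shuffling.

    Keeping the original order is a common choice for process data,
    where neighbouring samples are correlated; it is an assumption
    here, not a protocol prescribed by the dataset authors.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = int(len(inputs) * train_fraction)
    train = (inputs[:cut], outputs[:cut])
    test = (inputs[cut:], outputs[cut:])
    return train, test


# Hypothetical stand-in for a downloaded file: 2 inputs, 1 output.
sample_file = io.StringIO(
    "u1 u2 y\n"
    "0.1 0.2 0.3\n"
    "0.4 0.5 0.6\n"
    "\n"
    "0.7 0.8 0.9\n"
    "1.0 1.1 1.2\n"
)
X, y = load_regression_table(sample_file, n_inputs=2)
(train_X, train_y), (test_X, test_y) = chronological_split(X, y, 0.5)
print(len(train_X), "training samples,", len(test_X), "test samples")
```

For a real file, the `io.StringIO` object would be replaced by `open(path)` and `n_inputs` set to the value given in the corresponding "Data Info" line (for example 8 for the WWTP data), after checking that the column layout actually matches the assumption above.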