Personal webpage of Jorge Sanchez Almeida - Analysis of large datasets

Analysis of large data sets

The upcoming surveys are so large that their analysis has to be automated. With this motto in mind, I have been exploring the use of the k-means clustering algorithm to organize (classify and process) large data sets.
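For readers unfamiliar with the algorithm, here is a minimal sketch of the standard (Lloyd) k-means iteration. It assumes each object, for example a spectrum resampled to a common wavelength grid, is a fixed-length list of numbers. The function names `kmeans` and `squared_distance` are hypothetical and for illustration only. This is not the code used in the works listed below.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(data, k, n_iter=100, seed=0):
    """Classic (Lloyd) k-means. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialize the centroids with k distinct objects picked at random.
    centroids = [list(p) for p in rng.sample(data, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: every object joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in data
        ]
        if new_labels == labels:  # no object changed class: converged
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # keep the previous centroid if a class is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```

For instance, `kmeans([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.1, 0.0, 0.9]], k=2)` puts the first two toy "spectra" in one class and the last two in the other. The numeric label given to each class depends on the random initialization. Each iteration costs one distance evaluation per object and per class, which is why faster variants matter for survey-sized data.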

Pre-processing of raw data sets (in the solar-physics context).
Classification of galaxy spectra. ASK classification of all the galaxies with spectra in SDSS DR7
Classification of SEGUE stellar spectra
A single-pass k-means ... faster than the traditional algorithm, by Ordovas and SA (2014)
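The fast single-pass variant listed above is not reproduced here. As a hedged illustration of what "single pass" can mean, the sketch below implements MacQueen's sequential (online) k-means instead. This is a different, well-known technique: each object is read once, and the nearest centroid is nudged toward it by an incremental running mean. The function name `online_kmeans` is hypothetical.

```python
def online_kmeans(stream, k):
    """MacQueen-style sequential k-means: a single pass over `stream`.

    Not the Ordovas & Sanchez Almeida algorithm; a generic illustration only.
    The first k objects seed the centroids. Every later object is assigned to
    its nearest centroid, which then moves to the running mean of its members.
    Returned labels are the assignments made when each object was read.
    """
    centroids, counts, labels = [], [], []
    for x in stream:
        x = list(x)
        if len(centroids) < k:
            # Seed phase: the object becomes a new centroid.
            centroids.append(x[:])
            counts.append(1)
            labels.append(len(centroids) - 1)
            continue
        # Nearest centroid by squared Euclidean distance.
        j = min(range(k),
                key=lambda i: sum((a - c) ** 2 for a, c in zip(x, centroids[i])))
        counts[j] += 1
        # Incremental mean update: c <- c + (x - c) / n
        centroids[j] = [c + (a - c) / counts[j] for c, a in zip(centroids[j], x)]
        labels.append(j)
    return centroids, labels
```

The design trade-off is that memory stays constant and each object is touched once. However, early objects are never reassigned, so the result depends on the order of the stream. The iterative algorithm avoids this at the cost of repeated passes.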