Hubble Tuning Fork - By "ESA/Hubble"
As part of a Design Oriented Project for my Physics degree, I implemented machine learning models to classify galaxies in the Galaxy Zoo dataset. I also implemented a Decision Tree Regressor to predict galaxy redshifts using data from the Sloan Digital Sky Survey (SDSS).
This project was one of my first encounters with traditional machine learning techniques such as Decision Trees and Random Forests. I took it on as part of an elective course for my MSc Physics requirements. I learned about the galaxy types in the Hubble tuning fork and their key characteristics, and then trained models to predict the type of each galaxy in the Galaxy Zoo dataset.
Additionally, I read the research literature to understand the mathematics behind calculating the redshift of astronomical objects. Redshift is crucial for determining galaxy distances and is measured via spectroscopy: a galaxy's motion relative to us shifts the lines in its spectrum. I used data from the Sloan Digital Sky Survey (SDSS) to develop a regression model that predicts redshift.
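The spectroscopic definition of redshift mentioned above can be illustrated with a minimal sketch. It uses the standard relation z = (observed wavelength - rest wavelength) / rest wavelength. The function name `redshift` and the example wavelengths are illustrative assumptions and are not taken from the project code.

```python
def redshift(observed_wavelength, rest_wavelength):
    """Return z = (lambda_observed - lambda_rest) / lambda_rest.

    Both wavelengths must be in the same unit (for example, nanometres).
    """
    if rest_wavelength <= 0:
        raise ValueError("rest_wavelength must be positive")
    return (observed_wavelength - rest_wavelength) / rest_wavelength


# Illustrative example: the H-alpha line has a rest wavelength of about 656.28 nm.
# The observed value below is made up so that the line appears shifted by 10%.
H_ALPHA_REST_NM = 656.28
z_example = redshift(721.908, H_ALPHA_REST_NM)
print(f"z = {z_example:.3f}")
```

A positive z means the line has moved towards longer (redder) wavelengths; in this made-up example z is about 0.1.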
The SDSS has produced detailed 3D maps of the Universe, capturing spectral data for over three million astronomical objects using a wide-angle optical telescope. I wrote Python code to predict galaxy redshifts from SDSS data using a Decision Tree Regressor.
Colour Indices: I used the colour indices derived from the five SDSS photometric filters (u, g, r, i, z), which reflect the properties of galaxies, as features for the regression model.
Decision Tree Model: The decision tree regression algorithm was employed to predict the photometric redshift using colour indices. The model's depth was optimized to minimize prediction error.
Validation: Hold-out and k-fold cross-validation were used to validate the model, achieving an accuracy of 98%.
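The three steps above (colour indices, a depth-tuned decision tree, and k-fold validation) can be sketched roughly as follows. This is only an illustration under stated assumptions: the photometry is synthetic random data rather than SDSS measurements, the median absolute error is a metric chosen for the sketch (the project's own metric is not specified here), and all variable names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for SDSS photometry (assumption): one row per galaxy,
# columns are magnitudes in the u, g, r, i, z bands.
n_galaxies = 500
true_redshift = rng.uniform(0.0, 0.5, size=n_galaxies)
base_mag = rng.normal(18.0, 0.5, size=(n_galaxies, 1))
band_slopes = np.array([2.0, 1.5, 1.0, 0.5, 0.0])  # made-up trend with redshift
magnitudes = (
    base_mag
    + true_redshift[:, None] * band_slopes
    + rng.normal(0.0, 0.05, size=(n_galaxies, 5))
)

# Colour indices are differences between adjacent bands: u-g, g-r, r-i, i-z.
colour_indices = magnitudes[:, :-1] - magnitudes[:, 1:]


def median_abs_error(max_depth):
    """5-fold cross-validated median absolute error for a given tree depth."""
    model = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
    folds = KFold(n_splits=5, shuffle=True, random_state=0)
    predicted = cross_val_predict(model, colour_indices, true_redshift, cv=folds)
    return float(np.median(np.abs(predicted - true_redshift)))


# Try a few depths and keep the one with the lowest cross-validated error.
errors_by_depth = {depth: median_abs_error(depth) for depth in (2, 4, 6, 8, 10)}
best_depth = min(errors_by_depth, key=errors_by_depth.get)
print(errors_by_depth)
print("best depth:", best_depth)
```

Scoring every depth on held-out folds rather than on the training data is what keeps a very deep, overfitted tree from looking best.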
Hubble Tuning Fork (By "ESA/Hubble")
Galaxies are primarily classified into three types: spiral, elliptical, and irregular. Spiral galaxies are complex and predominantly blue, indicating young, massive stars. Elliptical galaxies are smooth and red, consisting mainly of older stars. Irregular galaxies lack a defined structure and often result from gravitational interactions or mergers. The Hubble tuning fork (shown on the left) is commonly used to display the different types of galaxies.
Dataset: I used the Galaxy Zoo dataset for the classification task, with a 70:30 train-test split. The features in the dataset that I could use for the classification algorithms are:
Colour Indices: Derived from SDSS photometric filters (u, g, r, i, z) to determine galaxy properties. Studies of galaxy evolution tell us that spiral galaxies have younger star populations and are therefore 'bluer' (brighter at shorter wavelengths). Elliptical galaxies have older star populations and are 'redder' (brighter at longer wavelengths).
Eccentricity: Measures the shape of the galaxy by fitting an ellipse to its profile.
Adaptive Moments: Adaptive moments also describe the shape of a galaxy. They are used in image analysis to detect similar objects at different sizes and orientations.
Concentration: Measures the luminosity profile of a galaxy.
Models:
Decision Trees: A decision tree classifier achieved an accuracy of 78.59%.
Random Forest: The Random Forest algorithm improved accuracy to 86.15% by building multiple decision trees and combining their predictions for more stable results.
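The comparison between the two models can be sketched as below, using a 70:30 split as described earlier. This is a hedged illustration, not the project code: the data comes from scikit-learn's `make_classification` as a synthetic stand-in for the Galaxy Zoo features, the three classes are placeholders for the galaxy types, and the hyperparameters are arbitrary assumptions. The accuracies it prints will not match the figures reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in (assumption) for colour, shape, and concentration features.
features, labels = make_classification(
    n_samples=600,
    n_features=8,
    n_informative=5,
    n_redundant=0,
    n_classes=3,
    random_state=0,
)

# 70:30 train-test split, stratified so each class keeps its proportion.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.3, stratify=labels, random_state=0
)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

for name, accuracy in accuracies.items():
    print(f"{name}: {accuracy:.2%}")
```

A random forest averages the votes of many trees trained on bootstrapped samples, which usually reduces the variance of a single tree; that is the intuition behind the improvement reported above.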
For more detailed insights and code, please refer to the project's appendix and the provided graphs in the project report.