Abstract:
With Gaia Data Release 3 (DR3), new and improved astrometric, photometric, and spectroscopic measurements for 1.8 billion stars
have become available. Alongside this wealth of new data, however, there are challenges in finding efficient and accurate computational
methods for their analysis. In this paper, we explore the feasibility of using machine learning regression as a method of extracting
basic stellar parameters and line-of-sight extinctions from spectro-photometric data. To this end, we built a stable gradient-boosted
decision-tree regressor (xgboost), trained on spectroscopic data, capable of producing output parameters with reliable uncertainties
from Gaia DR3 data (most notably the low-resolution XP spectra), without ground-based spectroscopic observations. Using Shapley
additive explanations, we interpret how the predictions for each star are influenced by each data feature. For the training and testing of
the model, we used high-quality parameters obtained from the StarHorse code for a sample of around eight million stars observed
by major spectroscopic stellar surveys, complemented by curated samples of hot stars, very metal-poor stars, white dwarfs, and hot
sub-dwarfs. Th...