Using machine learning to predict the mass of quasar supermassive black holes

AndrewWren/Quasar-mass-machine-learning

Quasar mass machine learning

Using machine learning to predict the mass of quasar supermassive black holes

A quasar is a distant and very luminous active galactic nucleus which outshines all the starlight from its galaxy. Each quasar is powered by its galaxy's central supermassive black hole. The mass of the black hole can be found from the characteristics of the quasar spectrum using well-established "single-epoch" spectral fitting estimates - see McLure & Dunlop 2004, Vestergaard & Peterson 2006, Shen+ 2011. This involves subtracting the effects of dust in our galaxy, subtracting a model of the main "continuum" spectrum, and estimating the width of key emission lines in the remaining spectrum, typically lines of hydrogen, magnesium and carbon.
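As a rough illustration of what a single-epoch estimate looks like, the sketch below evaluates the generic virial-style relation log10(M_BH/M_Sun) = a + b·log10(λL_λ / 10^44 erg/s) + 2·log10(FWHM / km/s). The function name and argument names are hypothetical. The default coefficients (a = 0.910, b = 0.50) are assumed here to correspond to the Hβ calibration of Vestergaard & Peterson 2006 as it is commonly quoted; they should be checked against the papers cited above, and they are not necessarily the calibration used for the masses in this repository.

```python
import math


def single_epoch_log_mass(fwhm_km_s, lum_erg_s, a=0.910, b=0.50):
    """Return log10(M_BH / M_Sun) from a virial single-epoch relation.

    fwhm_km_s : emission-line full width at half maximum, in km/s
    lum_erg_s : continuum luminosity lambda * L_lambda, in erg/s
    a, b      : calibration coefficients (ASSUMED defaults, see lead-in)
    """
    return a + b * math.log10(lum_erg_s / 1e44) + 2.0 * math.log10(fwhm_km_s)


# Example: a line width of 4000 km/s and a luminosity of 1e45 erg/s
print(round(single_epoch_log_mass(4000.0, 1e45), 2))
```

The line width enters squared (the factor of 2 in log space), so errors in the fitted width dominate the uncertainty of the estimate.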

The Sloan Digital Sky Survey (SDSS) has observed an extensive set of quasar spectra - see Lyke+ 2020 for details. Using SDSS spectra, Chen+ 2019 made single-epoch black hole mass estimates for 173,559 quasars by spectral fitting.

The code in this repository uses a subset of the Chen+ 2019 dataset to predict quasar black hole masses through an artificial neural network (ANN) implemented using Keras on TensorFlow 2. The ANN is loosely based on a model for SketchRNN presented by @ageron in his solutions for Chapter 15 of Hands-on Machine Learning with Scikit-Learn, Keras and TensorFlow. It predicts the log of the black hole mass, log10(M_BH/M_Sun), for each single spectrum in a selected subset of the Chen+ quasars.

The ANN takes the Chen+ "single-epoch" mass estimates as the model's ground truth for training. (Apologies that epoch has distinct astrophysics and ML meanings - hopefully always obvious from the context. The ML "epoch" only occurs in the monitor that runs with the ANN.)

The ANN comprises three Conv1D layers (the middle one with a batch normalization) followed by two LSTM layers and then two Dense layers. Two rounds of Nadam optimization are used, with learning rates of 1e-3 and 1e-5.
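The layer stack and two-stage training described above could be sketched in Keras roughly as follows. This is not the notebook's actual model: the filter counts, kernel sizes, strides, unit counts, activations, loss, single-channel input shape and the names `build_model` and `train_two_stage` are all assumptions made for illustration. Only the layer types, their order, the position of the batch normalization and the two Nadam learning rates come from the description.

```python
from tensorflow import keras


def build_model(n_flux_bins=None, n_channels=1):
    """Three Conv1D layers (batch norm after the middle one),
    two LSTM layers, then two Dense layers. Sizes are ASSUMED."""
    return keras.Sequential([
        keras.layers.Input(shape=(n_flux_bins, n_channels)),
        keras.layers.Conv1D(32, kernel_size=5, strides=2, activation="relu"),
        keras.layers.Conv1D(64, kernel_size=5, strides=2, activation="relu"),
        keras.layers.BatchNormalization(),
        keras.layers.Conv1D(128, kernel_size=3, strides=2, activation="relu"),
        keras.layers.LSTM(64, return_sequences=True),  # pass full sequence on
        keras.layers.LSTM(64),                         # keep last state only
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(1),                         # predicted log10 mass
    ])


def train_two_stage(model, X, y, epochs_per_stage=10):
    """Fit with Nadam at 1e-3, then continue from those weights at 1e-5."""
    for learning_rate in (1e-3, 1e-5):
        # Recompiling creates a fresh optimizer but keeps the model weights.
        model.compile(
            loss="mse",  # ASSUMED loss for this regression sketch
            optimizer=keras.optimizers.Nadam(learning_rate=learning_rate),
        )
        model.fit(X, y, epochs=epochs_per_stage, verbose=0)
    return model
```

Here `X` would have shape (number of spectra, number of flux bins, 1) and `y` would hold the corresponding log10 mass estimates used as ground truth.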

On the test set, this model gives an R² of 0.72. Some quasars have been observed many times, with ~70 spectra taken; for those, using the mean mass prediction gives a test set R² of 0.75. An issue for further investigation is whether machine learning predictions for quasars with multiple spectra can be improved with a method more sophisticated than taking the mean of the associated predictions - for example an ANN approach to pooling estimates.
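The mean-pooling step and the R² score can be sketched with NumPy as below. The function names and the toy arrays are hypothetical and are not taken from the notebook; the sketch only shows how per-spectrum predictions might be averaged per quasar before scoring.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def mean_prediction_per_quasar(quasar_ids, predictions):
    """Average per-spectrum predictions over each quasar.

    Returns (unique_ids, mean_predictions), with ids in sorted order.
    """
    unique_ids, inverse = np.unique(quasar_ids, return_inverse=True)
    sums = np.bincount(inverse, weights=np.asarray(predictions, dtype=float))
    counts = np.bincount(inverse)
    return unique_ids, sums / counts


# Toy example: quasar 7 has two spectra, quasar 3 has three.
ids = np.array([7, 7, 3, 3, 3])
preds = np.array([8.0, 9.0, 7.0, 8.0, 9.0])
unique_ids, pooled = mean_prediction_per_quasar(ids, preds)
print(unique_ids, pooled)  # [3 7] [8.  8.5]
```

Scoring the pooled predictions then amounts to calling `r_squared` on one true mass and one mean prediction per quasar, aligned by `unique_ids`.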

The files and folders are as follows:

  • Quasar_mass_ML.ipynb, the main program file to run, a Jupyter (IPython) notebook;
  • this README.md file;
  • get_Chen_data.py, a script file;
  • quasar_analysis.py, another script file;
  • a models folder for HDF5 files, with two sub-folders:
    • an empty checkpoints folder where checkpoints automatically generated by the ANN in ... will be placed; and
    • a Best_models folder where the best ANN model is kept with its two different sets of optimizers and weights. There are two files for the best model:
      • Best_ANN_intermediate_stage.h5 which has the intermediate weights from the 1e-3 learning rate model run; and
      • Best_ANN_final_stage.h5 which is the model to use for prediction, having the final weights from the 1e-5 learning rate model run;
  • a data folder, containing:

Additional data is needed to run the model: the spectra.parquet file is too big to be on GitHub (2.4GB) and is stored on the Open Science Framework at https://osf.io/6hbqx/. Download it and add it to the bigger_data sub-folder.

To create the quasars.parquet and spectra.parquet files from scratch, download the SDSS data file https://data.sdss.org/sas/dr16/eboss/qso/DR16Q/DR16Q_v4.fits, which is described on the relevant SDSS datamodel webpage, to bigger_data. Then follow the instructions in the second cell of Quasar_mass_ML.ipynb. Please note that further steps of this process download around 7GB and take about half a day to run with a standard laptop and internet connection.
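Before starting a long run, it can help to confirm that the downloaded files are where the notebook expects them. The helper below is a minimal sketch using only the standard library; the function name `missing_files` is hypothetical, and because the exact location of bigger_data within the repository is not restated here, the folder path is left as an argument for you to supply.

```python
from pathlib import Path


def missing_files(folder, names=("spectra.parquet",)):
    """Return the names from `names` that are not regular files in `folder`."""
    folder = Path(folder)
    return [name for name in names if not (folder / name).is_file()]


# Example: adjust the path to wherever bigger_data sits in your checkout.
absent = missing_files("bigger_data", ["spectra.parquet", "DR16Q_v4.fits"])
if absent:
    print("Still to download:", ", ".join(absent))
```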
