Store HITRAN isotopes separately #513

erwanp · 2022-08-19T10:14:15Z

💭 Description

Some species like CO2, CO, H2O have many isotopes but most of our calculations use only the first ones (1,2,3).

The current implementation downloads all isotopes and stores them in a single local database file, even if only the first isotopes are required.
This leads to long download times; for instance, in the ReadTheDocs example https://radis.readthedocs.io/en/latest/auto_examples/plot_line_survey.html it takes 47 s to download and parse all 9 CO2 isotopes, although only the first one is needed.

Implementation

We could switch to an implementation where all isotopes are stored separately.

File names would be CO_isoX.hdf5 instead of CO.hdf5.
Few changes are required in the registered database (~/radis.json): this is easy to implement since our database management system allows for wildcards, e.g. CO_iso*.hdf5. The database name would still be a single HITRAN_CO in this example.
We already handle databases composed of many files, such as HITEMP CO2 and HITEMP H2O, so this should be easy to adapt to HITRAN.
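
A minimal sketch of the naming scheme and wildcard matching described above. The helper names `isotope_file_name` and `select_isotope_files` are hypothetical and not part of RADIS; only the `CO_isoX.hdf5` / `CO_iso*.hdf5` patterns come from this issue.

```python
from fnmatch import fnmatchcase


def isotope_file_name(molecule, iso, ext="hdf5"):
    """Return the per-isotope file name, e.g. 'CO_iso1.hdf5' (hypothetical helper)."""
    return f"{molecule}_iso{iso}.{ext}"


def select_isotope_files(file_names, molecule, ext="hdf5"):
    """Keep only the names matching the wildcard '<molecule>_iso*.<ext>'."""
    pattern = f"{molecule}_iso*.{ext}"
    return [name for name in file_names if fnmatchcase(name, pattern)]


names = ["CO_iso1.hdf5", "CO_iso2.hdf5", "CO2_iso1.hdf5", "CO.hdf5"]
print(select_isotope_files(names, "CO"))
```

Note that the underscore in the pattern keeps `CO_iso*.hdf5` from picking up CO2 files, so a single wildcard entry per molecule should be enough.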

One thing to fix:

When computing with isotopes='all', we currently simply load the full database. If isotopes are stored in different files, how do we know whether a missing isotope has just not been downloaded yet (because it was never required) or does not exist? In the Download_hitran() script, we currently fetch the HITRAN database until it fails. We shouldn't query the server for each calculation. Therefore, the list of all available HITRAN isotopes should be hardcoded in RADIS, and a test should be set up to compare the hardcoded list to the latest HITRAN list (by fetching the website).
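
The idea could be sketched as follows. Everything here is an assumption for illustration: the table `KNOWN_HITRAN_ISOTOPES` and the functions `plan_isotopes` and `diff_with_remote` do not exist in RADIS, and the isotope counts are simply the ones quoted in this issue (9 for CO2, 6 for CO), not an authoritative HITRAN list.

```python
# Hypothetical hardcoded table; counts are the ones quoted in this issue.
KNOWN_HITRAN_ISOTOPES = {
    "CO2": tuple(range(1, 10)),
    "CO": tuple(range(1, 7)),
}


def plan_isotopes(molecule, requested, downloaded):
    """Split the requested isotopes into (to_load, to_download) without any network call.

    `requested` is "all" or an iterable of isotope numbers;
    `downloaded` is the set of isotopes already stored locally.
    """
    known = KNOWN_HITRAN_ISOTOPES[molecule]
    wanted = known if requested == "all" else tuple(requested)
    unknown = [iso for iso in wanted if iso not in known]
    if unknown:
        # The isotope does not exist (as far as the hardcoded list knows).
        raise ValueError(f"Unknown isotopes for {molecule}: {unknown}")
    to_load = [iso for iso in wanted if iso in downloaded]
    to_download = [iso for iso in wanted if iso not in downloaded]
    return to_load, to_download


def diff_with_remote(molecule, remote_isotopes):
    """Return (missing_locally, extra_locally) relative to a freshly fetched list."""
    known = set(KNOWN_HITRAN_ISOTOPES[molecule])
    remote = set(remote_isotopes)
    return sorted(remote - known), sorted(known - remote)
```

A test could call `diff_with_remote` with the list fetched from the HITRAN website and fail if either returned list is non-empty, so that the hardcoded table gets updated when HITRAN adds an isotope.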

Performance / impact

In terms of database loading performance:

I expect it will be exactly the same for Vaex (which handles many files the same way as a single one). It might become a bit slower, though, if re-sorting all the isotope databases by wavenumber takes time (this should be checked by comparing loading, say, the 9 CO2 isotopes from 1 file or from 9).
Slightly slower for Pytables when all the isotopes are needed (because combining the different databases will take 2x memory, and time), but maybe faster when only a few out of many isotopes are needed.
Anyway, Vaex is the future, so it's OK if we suffer a minor performance drop with Pytables.
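
To illustrate the re-sorting step mentioned above, here is a small standard-library sketch, not the Vaex or Pytables code path. It assumes each per-isotope table is already sorted by wavenumber, in which case the tables can be merged in one pass rather than fully re-sorted. The function name `merge_by_wavenumber` and the sample values are made up for the example.

```python
import heapq


def merge_by_wavenumber(*tables):
    """Merge per-isotope line lists, each a list of (wavenumber, iso) sorted by wavenumber."""
    return list(heapq.merge(*tables, key=lambda row: row[0]))


# Made-up sample lines for two isotopes
iso1 = [(2000.0, 1), (2100.5, 1)]
iso2 = [(2050.2, 2), (2200.0, 2)]
print(merge_by_wavenumber(iso1, iso2))
```

Whether the actual loading cost is closer to a merge or to a full sort would still need to be measured, as suggested above.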

The user experience of running the CO test spectrum on a fresh RADIS installation will be ~40% faster, since the first run currently downloads all 6 CO isotopes instead of the 3 required:

from radis import fetch_hitran
df = fetch_hitran("CO")
len(df)
# 5381
len(df.query("iso==1 | iso==2 | iso==3"))
# 3306, so only ~60% of the lines are in the first 3 isotopes

It may also be easier to handle cached rovibrational energies of different isotopes separately (#176, @sagarchotalia); typically, spectroscopic constants are only implemented for the first few isotopes.

erwanp added refactor requires changes in code architecture todo in the short-term roadmap labels Aug 19, 2022

anandxkumar added this to the 0.15 milestone Oct 28, 2022

erwanp modified the milestones: 0.15, 0.16 Aug 1, 2023

minouHub modified the milestones: 0.16, 0.17 Jul 25, 2024
