Some species like CO2, CO, H2O have many isotopes but most of our calculations use only the first ones (1,2,3).
The current implementation downloads all isotopes and stores them in the same local database file, even if only the first isotopes are required.
This makes for long download times; for instance, in the ReadTheDocs example https://radis.readthedocs.io/en/latest/auto_examples/plot_line_survey.html it takes 47s to download & parse all 9 CO2 isotopes although only the first one is needed.
Implementation
We could switch to an implementation where each isotope is stored in a separate file.
File names would be CO_isoX.hdf5 instead of CO.hdf5.
Few changes are required in the registered database (~/radis.json): our database management system already allows wildcards, e.g. CO_iso*.hdf5. The database would still have a single name, HITRAN_CO in this example.
We already handle databases composed of many files, such as HITEMP CO2 and HITEMP H2O, so this should be easy to adapt to HITRAN.
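As a rough illustration of the per-isotope file layout described above, the sketch below builds file names following the CO_isoX.hdf5 pattern and lists which isotopes are already present locally using the CO_iso*.hdf5 wildcard. The helper names `isotope_filename` and `local_isotope_files` are hypothetical and are not part of RADIS; only the naming pattern comes from this issue.

```python
from pathlib import Path


def isotope_filename(molecule, iso):
    """Build a per-isotope file name, e.g. CO_iso1.hdf5 (pattern from this issue)."""
    return f"{molecule}_iso{iso}.hdf5"


def local_isotope_files(db_dir, molecule):
    """Return {isotope number: path} for files matching <molecule>_iso*.hdf5.

    Hypothetical helper: it only inspects file names, not file contents.
    """
    prefix = f"{molecule}_iso"
    found = {}
    for path in Path(db_dir).glob(f"{prefix}*.hdf5"):
        suffix = path.stem[len(prefix):]
        if suffix.isdigit():  # skip anything that is not <molecule>_iso<N>.hdf5
            found[int(suffix)] = path
    return dict(sorted(found.items()))
```

With this layout, a legacy CO.hdf5 file or a CO2_iso1.hdf5 file in the same folder does not match the CO wildcard, so the two schemes can coexist on disk.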
One thing to fix:
When computing with isotopes='all', we currently simply load the full database. If isotopes are stored in different files, how do we know whether a missing isotope has simply not been downloaded yet (because it was never required) or does not exist? In the Download_hitran() script, we currently keep fetching from the HITRAN database until a request fails. We shouldn't query the server for each calculation. Therefore the list of all available HITRAN isotopes should be hardcoded in RADIS, and a test should be set up to compare the hardcoded list to the latest HITRAN list (by fetching the website).
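A minimal sketch of the hardcoded-list idea, under stated assumptions: `KNOWN_ISOTOPES`, `resolve_isotopes`, `isotope_status` and `missing_from_hardcoded` are hypothetical names, not existing RADIS functions. The isotope counts (6 for CO, 9 for CO2) are the ones quoted in this issue, and numbering them 1..N is an assumption made for illustration.

```python
from pathlib import Path

# Hypothetical hardcoded table. Counts are those quoted in this issue;
# numbering isotopes 1..N is an assumption for illustration only.
KNOWN_ISOTOPES = {
    "CO": (1, 2, 3, 4, 5, 6),
    "CO2": (1, 2, 3, 4, 5, 6, 7, 8, 9),
}


def resolve_isotopes(molecule, isotopes):
    """Expand isotopes='all' from the hardcoded table, without querying the server."""
    if isotopes == "all":
        return list(KNOWN_ISOTOPES[molecule])
    return list(isotopes)


def isotope_status(molecule, iso, db_dir):
    """Tell apart 'does not exist' from 'exists but was never downloaded'."""
    if iso not in KNOWN_ISOTOPES[molecule]:
        return "unknown"  # not in the hardcoded list: no point asking the server
    path = Path(db_dir) / f"{molecule}_iso{iso}.hdf5"
    return "cached" if path.exists() else "to_download"


def missing_from_hardcoded(molecule, remote_isotopes):
    """For the proposed test: isotopes listed remotely but absent from the table."""
    return sorted(set(remote_isotopes) - set(KNOWN_ISOTOPES[molecule]))
```

The proposed test would call something like `missing_from_hardcoded` with the list fetched from the HITRAN website and fail if the result is non-empty, so the network access stays in the test suite instead of in every calculation.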
Performance / impact
In terms of database loading performance:
I expect it will be exactly the same for Vaex (which handles many files the same way as a single one). It might become a bit slower, though, if re-sorting all the isotope databases by wavenumber takes time (this should be checked by comparing the loading time of, say, the 9 CO2 isotopes stored in 1 file versus 9 files).
Slightly slower for Pytables if all the isotopes are needed (because combining the different databases will take 2x the memory, and take time), but maybe faster if only a few out of many isotopes are needed.
Anyway, Vaex is the future so it's ok if we suffer a minor performance drop with Pytables.
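On the re-sorting concern: if each per-isotope file is already sorted by wavenumber, combining them is a merge of sorted sequences rather than a full sort. The sketch below shows this with plain Python tuples and `heapq.merge`; it is only an illustration of the idea and makes no claim about how Vaex or Pytables combine files internally. The wavenumber values are made up.

```python
import heapq


def merge_by_wavenumber(*tables):
    """Merge line lists that are each already sorted by wavenumber.

    Each table is a list of (wavenumber, iso) tuples. heapq.merge walks the
    inputs once instead of re-sorting the concatenated data.
    """
    return list(heapq.merge(*tables, key=lambda line: line[0]))


# Made-up example data: two isotopes, each sorted by wavenumber
iso1_lines = [(2000.0, 1), (2100.5, 1), (2200.0, 1)]
iso2_lines = [(2050.2, 2), (2150.0, 2)]

merged = merge_by_wavenumber(iso1_lines, iso2_lines)
```

Whether the merge cost is noticeable in practice is exactly what the 1-file versus 9-files loading comparison would measure.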
Running the CO test spectrum on a fresh RADIS installation will be ~40% faster, since the first run currently downloads all 6 CO isotopes instead of the 3 required:
from radis import fetch_hitran
df = fetch_hitran("CO")
len(df)
>>> 5381
len(df.query("iso==1 | iso==2 | iso==3"))
>>> 3306 # so only 60% of the lines in the first 3 isotopes
It may also be easier to handle cached rovibrational energies of different isotopes separately (#176, @sagarchotalia); typically, spectroscopic constants are only implemented for the first few isotopes.