-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-42576: mpSkyEphemerisQuery #227
Conversation
@Gerenjie the reason the action is failing is because the DM repos require rebasing and not merging from main in order to keep the ticket branch up to date. If you need help with that you can Slack me. |
b58aae7
to
adf1c28
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking the time to put this together! I have quite a few comments, so I'm returning the review ahead of the other PRs. My most important objections are the timeout and the print
statements; leaving either alone will make this task almost impossible to use in the field. Please also make sure you're building the docs; I saw some formatting problems that I would expect to cause a build error.
Happy to look at this again once these issues are addressed. See the dev guide and the commits appendix for tips for working through the comments without getting overwhelmed.
import lsst.pipe.base.connectionTypes as connTypes | ||
|
||
# Enforce an error for unsafe column/array value setting in pandas. | ||
pd.options.mode.chained_assignment = 'raise' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's very dangerous to set a global option in a module like this -- it means the behavior of other code will depend on whether this module has been imported or not.
If your task really needs this setting, consider using option_context
instead. But if I understand https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#evaluation-order-matters correctly, this flag is a debugging tool and shouldn't be committed at all.
import pandas as pd | ||
import numpy as np | ||
import pyarrow as pa | ||
import requests | ||
import sys | ||
|
||
import lsst.pipe.base as pipeBase | ||
import lsst.pex.config as pexConfig | ||
from lsst.utils.timer import timeMethod | ||
from lsst.geom import SpherePoint | ||
|
||
from lsst.pipe.base import PipelineTask, PipelineTaskConfig, PipelineTaskConnections | ||
import lsst.pipe.base.connectionTypes as connTypes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please organize imports according to PEP 8 guidelines. In particular, you import pipe.base
three times in three different ways; to minimize confusion, please pick one and use it consistently throughout the file.
|
||
ssObjects = connTypes.Output( | ||
doc="MPSky-provided Solar System objects observable in this detector-visit", | ||
name="preloaded_ssObjects", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During RFC-997 there was some grumbling about exactly what the new datasets ought to be called. It might be a good idea to check with #dm-naming-things
before we actually start creating new dataset types in /repo/main
and /repo/embargo
(though when I asked about the similarly-named APDB output datasets I didn't get an answer).
observerCode = pexConfig.Field( | ||
dtype=str, | ||
doc="IAU Minor Planet Center observer code for queries " | ||
"(Rubin Obs./LSST default is X05)", | ||
default='X05' | ||
) | ||
queryRadiusDegrees = pexConfig.Field( | ||
dtype=float, | ||
doc="On sky radius for Ephemeris cone search. Defaults " | ||
"to the radius of Rubin Obs FoV in degrees", | ||
default=1.75 | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please create PRs for obs_subaru
, obs_decam
, and obs_lsst
that create config overrides for these instrument-dependent fields. For the HSC/DECam observer codes you can probably copy from the existing SkyBotEphemerisQuery.py
files, but I'm pretty sure AuxTel/LATISS has a separate code from Simonyi/LSSTCam.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out there's no good documentation about making config overrides. 😞 A quick from-memory summary:
- The Python files in
obs_*/config/
contain config settings that should be applied for all runs of that task on that instrument. They have higher priority than task defaults, but lower priority than in-pipeline overrides. - The config file must have the same name as the task label in the pipeline (which you can usually assume is
_DefaultName
, including in your case). - Overrides directly under
obs_lsst/config/
apply to all LSST instruments. Overrides in a subdirectory apply only to that instrument, and have higher priority than the general overrides.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second glance, something is very wrong here -- you don't actually use observerCode
when downloading from MPSky. Is this a future feature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use X05 for AuxTel for now. AuxTel getting an observatory code is blocked on association working, since the Minor Planet Center assigns observatory codes and they've asked for some NEO measurements. I don't see an obs_auxtel equivalent though -- should that be somewhere in obs_lsst?
Now that I look at it, I don't see a SkyBotEphemerisQuery.py override in obs_lsst at all. Should I follow the format from obs_subaru and obs_decam?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, observerCode is a future MPSky feature, but it currently just defaults to LSST (X05).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see an obs_auxtel equivalent though -- should that be somewhere in obs_lsst?
Yes, that's what the subdirectories are for. I think LATISS
is the only one that's for AuxTel instead of Simonyi.
Now that I look at it, I don't see a SkyBotEphemerisQuery.py override in obs_lsst at all. Should I follow the format from obs_subaru and obs_decam?
The default for SkyBotEphemerisQuery was I11, which was already for LSST so no override was done. I'm not sure how I11 relates to X05.
Where you do need an override (LATISS only?), then yes, following the existing format should be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, observerCode is a future MPSky feature, but it currently just defaults to LSST (X05).
Please add a note to the field docstring that it's not used yet.
This does raise the question of what to do for DECam and especially HSC. We could just run the task and accept that nothing will match (that's what Made irrelevant by the news that we can't run this task in SkyBotEphemerisQueryTask
did before I learned that geocentric is not a good default 😛), or we could trim the task out of those pipelines until MPSky supports them.ap_verify
at all; see DM-45750.
It looks like there's no Jira issue for adding generic observatory support, so we can't cite it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I11 is Gemini South, which is basically LSST as far as positions on the Earth are concerned. X05 probably didn't exist as a code when this was written (it was just designated last year, I believe), but as far as science is concerned these are all equivalent except in very extreme cases (impactors).
I've created DM-45735, and will cite it in comments! Should be resolved soon, but who knows.
|
||
self.log.info("Successfully associated %d SolarSystemObjects.", nFound) | ||
assocMask = diaSourceCatalog["ssObjectId"] != 0 | ||
assocMask = (diaSourceCatalog["ssObjectId"] != 0) & (np.isfinite(diaSourceCatalog["ssObjectId"])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to guarantee that ssObjectId
is always finite -- you're the one defining it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if ssObjectID has defined behavior for diaSources which are not associated to an ssObject.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disregard that -- I do define it and set it to zero. I agree, this change is not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disregard that -- I do define it and set it to zero. I agree, this change is not needed.
python/lsst/ap/ephemerides.yaml
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pipelines should not go in task packages, and they should never be mixed in with Python code. Please move this to ap_pipe
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that lsst/ap_pipe#189 has its own Ephemerides.yaml
, so I recommend just dropping this file entirely.
expCenter = SpherePoint(region.getBoundingCircle().getCenter()) | ||
|
||
# Make sure date is non-NaN. | ||
expMidPointEPOCH = (timespan.begin.mjd + timespan.end.mjd)/2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once #230 merges, you should be able to just do:
expMidPointEPOCH = (timespan.begin.mjd + timespan.end.mjd)/2 | |
expMidPointEPOCH = utils.getMidpointFromTimespan(timespan, allowEndpoints=False).utc.mjd |
As a bonus, getMidpointFromTimespan
would do the input validation for you.
(I originally claimed the above code wouldn't work because I thought timespan.begin.mjd
was still an astropy.time.Time
. That's not the case, so this is just simple code reuse.)
0bd06cc
to
7c1f70e
Compare
d42d5a8
to
1e5823b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks much better, most of my comments are about documentation. However, I'm a bit concerned about the exception handling and the correctness of the ID converter; can you add some round-trip tests for the latter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't do umbrella "response to review" commits. Commits get their value from being logically coherent and individually correct, so review responses should be done either by tweaking the existing commits (edit
, squash
, or fixup
commands in git rebase -i
) or, for a genuinely new change, creating new commits that are as well-organized and -documented as the original ones.
In this case, since you had one giant commit to begin with, squashing the changes onto it is probably the least of the available evils. I'll continue the review on the combined diff.
solarSystemObjectTable : `pandas.DataFrame` | ||
Preloaded Solar System objects expected to be visible in the image. | ||
legacySolarSystemTable : `pandas.DataFrame` | ||
Not used |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the reason for including a "virtual" input that is never used but must be specified. What is this for, and is there a plan for phasing it out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is to maintain the interface's backwards compatibility -- removing the argument in the place of legacySolarSystemTable
would break every place where diaPipe.run
is currently being called. I do not believe there is a plan for phasing this out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except that what you have here is not actually backward-compatible -- a hypothetical user who tries to call run
with the old signature, expecting to get solar system associations, will silently get nothing. If you do want backward compatibility, you'd need to convert the old-style input into the new-style input.
Phasing out could be handled by raising a FutureWarning
if the argument isn't None
(there's no specific procedure for deprecating parameters, so you'd have to do it manually, but see that page for recommended text) and filing a Jira issue to remove it later.
All that said, though, I doubt that run
is called anywhere except from runQuantum
or unit tests, so I think you could get away with unceremoniously deleting it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After discussing with @Gerenjie and @isullivan separately, we've agreed to move forward as follows:
legacySolarSystemTable
will be used in place ofsolarSystemObjectTable
when the latter is undefined. That should be enough to give backwards-compatibility whetherrun
is called with positional args or keywords.- After merging this PR, we'll file an RFC to deprecate and remove
legacySolarSystemTable
per the usual process. I doubt there will be any pushback.
"""Solar System Object Query to MPSky in place of a internal Rubin solar | ||
system object caching/retrieval code. | ||
|
||
Will compute the location for of known SSObjects within a visit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will compute the location for of known SSObjects within a visit | |
Will compute the location for of known SSObjects within a visit-detector combination. |
observerCode = pexConfig.Field( | ||
dtype=str, | ||
doc="IAU Minor Planet Center observer code for queries " | ||
"(Rubin Obs./LSST default is X05)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The auto-generated documentation includes the field default, you can see DiaPipelineConfig for examples.
"(Rubin Obs./LSST default is X05)", | |
"(default is Rubin Obs./LSST)", |
I always recommend that people proofread their docs by running package-docs clean && package-docs build
and looking at the HTML output. 🙂
python/lsst/ap/association/utils.py
Outdated
for character in ObjID: | ||
ssObjectID *= 256 | ||
ssObjectID += ord(character) | ||
ssObjectID *= 256 ** (7 - len(ObjID)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you seem to be trying to pack specific inputs into specific bytes, I recommend using shifting and bitwise operators instead of arithmetic. Both you and future developers will be less likely to make a mistake.
Just to be sure there is no mistake, is the output supposed to be an 8-byte flag where:
- Top byte is
flags
- Remaining 7 bytes are the characters of
ObjID
, zero-padded on the right?
If so, it would be good to have a comment that says so. (I assume that the exact format for ssObjectID
is supposed to be an implementation detail, and that clients/tests should rely on ssObjectID_to_ObjID
instead?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the hope! Yes, the idea is for everyone to use these methods to convert, and for RSP users to have to use table lookups to check the ssObjectID for the minor planet they care about.
python/lsst/ap/association/utils.py
Outdated
|
||
def ssObjectID_to_ObjID(ssObjectID): | ||
"""Convert from Minor Planet Center packed provisional object IDs and | ||
Rubin ssObjectID. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is the correct description.
python/lsst/ap/association/utils.py
Outdated
""" | ||
ObjID = ''.join([chr(ssObjectID // (256 ** i) % 256) for i in reversed(range(0, 7))]) | ||
ObjID = ObjID.replace('\x00', '') | ||
return ObjID, ssObjectID//(256 ** 7)%256 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment about using bit operators instead (extracting each byte with a mask should be much simpler than all these exponents and moduli).
python/lsst/ap/association/utils.py
Outdated
ObjID : `str` | ||
Minor Planet Center packed provisional designation for a small solar | ||
system object. Must be fewer than eight characters. | ||
|
||
flags : `int` | ||
Eight free bits to enable future decoupling between Minor Planet Center | ||
and Rubin. Zero by default, should not be changed unless we need to | ||
move away from a 1:1 mapping with the MPC. Must be within [0, 255]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update these descriptions to be appropriate for outputs, not inputs.
python/lsst/ap/association/utils.py
Outdated
------ | ||
|
||
""" | ||
ObjID = ''.join([chr(ssObjectID // (256 ** i) % 256) for i in reversed(range(0, 7))]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider validating ssObjectID the same way you did input validation in the other function.
1e5823b
to
88cfa66
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, only minor comments.
However, please squash the commits before merging:
- "response to review" commits are still discouraged
- unit tests should be added on the same commit as the code they test, otherwise the commits are not self-contained (imagine if somebody deleted or rearranged one of the commits -- you'd have either untested code, or tests of nonexistent code). That said, the ID converters would have made a good separate commit a priori, since they can be defined and tested without MPSkyEphemerisQuery.
- fixes needed to get things to build should be merged into the original change, as if it had been done right the first time; as things stand, the original test commit breaks
scons
/flake8
.
"Made legacySolarSystemTable an optional input" is a bit of a grey area; this one makes sense either as a standalone commit or squashed under the "right the first time" principle. I'd lean towards squashing (as it's basically a bug in the original legacySolarSystemTable
), but if you keep it separate, please write it in the imperative tense, e.g. "Use legacySolarSystemTable in new code".
import pyarrow as pa | ||
import requests | ||
import sys | ||
import os |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alphabetical order would make it easier to spot what's already imported.
|
||
return Struct( | ||
ssObjects=mpSkySsObjects, | ||
) | ||
|
||
def _mpSkyConeSearch(self, expCenter, epochMJD, queryRadius): | ||
def read_mpSky_response(self, response): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit odd to mix camelCase and snake_case in the same name. Please pick one and stick to it.
object_polynomial : `np.ndarray` | ||
Array of object cartesian position polynomials | ||
observer_polynomial : `np.ndarray` | ||
Array of observer cartesian position polynomials |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding explicit dimensions, at least for the multidimensional arrays (a reader can probably figure out that objID
, ra
, and dec
are N-vectors).
python/lsst/ap/association/utils.py
Outdated
ObjID = ObjID.replace('\x00', '') | ||
return ObjID, ssObjectID//(256 ** 7)%256 | ||
if ssObjectID < 0 or ssObjectID >= (2 << 63): | ||
raise ValueError(f'ssObjectID ({ssObjectID}) outside [0, 2^63].') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be 2^64-1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
python/lsst/ap/association/utils.py
Outdated
ObjID = ''.join([chr(ssObjectID // (256 ** i) % 256) for i in reversed(range(0, 7))]) | ||
ObjID = ObjID.replace('\x00', '') | ||
return ObjID, ssObjectID//(256 ** 7)%256 | ||
if ssObjectID < 0 or ssObjectID >= (2 << 63): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think 1 << 64
would be more idiomatic.
tests/test_utils.py
Outdated
for allowed_flag in allowed_flags: | ||
self.assertEqual((allowed_string, allowed_flag), | ||
ssObjectID_to_objID(objID_to_ssObjectID(allowed_string, | ||
allowed_flag))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth factoring the ssObjectID_to_objID(objID_to_ssObjectID())
call into its own variable for readability.
tests/test_utils.py
Outdated
+ ['K99AJ3Z', 'K08Aa0A', 'K07Tf8A', 'PLS2040', 'T1S3138', 'T2S1010', 'T3S4101'] | ||
allowed_flags = [i for i in range(0, 256)] | ||
disallowed_flags = [-999999999, -512, -256, -255, -1, 256, 512, 99999999] | ||
disallowed_strings = ['Ā', '🔭', 'A' * 8, ' ' * 8, 'A' * 128] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think putting an emoji in Python code might be going a little too far (though it's hilarious that GitHub can print it!)
EDIT: just checked and even nano
can handle it. Wasn't expecting that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the empty string? That seems like a natural edge case.
+ ['K99AJ3Z', 'K08Aa0A', 'K07Tf8A', 'PLS2040', 'T1S3138', 'T2S1010', 'T3S4101'] | ||
+ ['A' * i for i in range(2, 8)] + ['Z' * i for i in range(2, 8)] \ | ||
+ ['J95X00A', 'J95X01L', 'J95F13B', 'J98SA8Q', 'J98SC7V', 'J98SG2S'] \ | ||
+ ['K99AJ3Z', 'K08Aa0A', 'K07Tf8A', 'PLS2040', 'T1S3138', 'T2S1010', 'T3S4101'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this breaks flake8
, so this commit should have been part of "adding unit tests"
…n ssObjectID and MPC designation
…ectTable is provided
c8d648a
to
e8a5a9d
Compare
Just for future reference, the commit summaries overflowed GitHub's 72-character limit. See the dev guide for recommendations on writing commit messages (which actually recommends the much tighter limit of 50 characters, maybe we should update that). |
Adding mpSkyEphemerisQuery, which replaces SkyBot in querying mpSky for ssObjects in the pre-load step.