DM-42576: mpSkyEphemerisQuery #227

Gerenjie · 2024-07-16T17:01:37Z

Adding mpSkyEphemerisQuery, which replaces SkyBot in querying mpSky for ssObjects in the pre-load step.

timj · 2024-07-25T20:20:01Z

@Gerenjie the reason the action is failing is because the DM repos require rebasing and not merging from main in order to keep the ticket branch up to date. If you need help with that you can Slack me.

kfindeisen

Thanks for taking the time to put this together! I have quite a few comments, so I'm returning the review ahead of the other PRs. My most important objections are the timeout and the print statements; leaving either alone will make this task almost impossible to use in the field. Please also make sure you're building the docs; I saw some formatting problems that I would expect to cause a build error.

Happy to look at this again once these issues are addressed. See the dev guide and the commits appendix for tips for working through the comments without getting overwhelmed.

python/lsst/ap/association/mpSkyEphemerisQuery.py

kfindeisen · 2024-07-26T22:47:34Z

python/lsst/ap/association/mpSkyEphemerisQuery.py

+import lsst.pipe.base.connectionTypes as connTypes
+
+# Enforce an error for unsafe column/array value setting in pandas.
+pd.options.mode.chained_assignment = 'raise'


It's very dangerous to set a global option in a module like this -- it means the behavior of other code will depend on whether this module has been imported or not.

If your task really needs this setting, consider using option_context instead. But if I understand https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#evaluation-order-matters correctly, this flag is a debugging tool and shouldn't be committed at all.

kfindeisen · 2024-07-26T23:06:12Z

python/lsst/ap/association/mpSkyEphemerisQuery.py

+import pandas as pd
+import numpy as np
+import pyarrow as pa
+import requests
+import sys
+
+import lsst.pipe.base as pipeBase
+import lsst.pex.config as pexConfig
+from lsst.utils.timer import timeMethod
+from lsst.geom import SpherePoint
+
+from lsst.pipe.base import PipelineTask, PipelineTaskConfig, PipelineTaskConnections
+import lsst.pipe.base.connectionTypes as connTypes


Please organize imports according to PEP 8 guidelines. In particular, you import pipe.base three times in three different ways; to minimize confusion, please pick one and use it consistently throughout the file.

kfindeisen · 2024-07-26T23:17:36Z

python/lsst/ap/association/mpSkyEphemerisQuery.py

+
+ ssObjects = connTypes.Output(
+ doc="MPSky-provided Solar System objects observable in this detector-visit",
+ name="preloaded_ssObjects",


During RFC-997 there was some grumbling about exactly what the new datasets ought to be called. It might be a good idea to check with #dm-naming-things before we actually start creating new dataset types in /repo/main and /repo/embargo (though when I asked about the similarly-named APDB output datasets I didn't get an answer).

kfindeisen · 2024-07-26T23:23:58Z

python/lsst/ap/association/mpSkyEphemerisQuery.py

+ observerCode = pexConfig.Field(
+ dtype=str,
+ doc="IAU Minor Planet Center observer code for queries "
+ "(Rubin Obs./LSST default is X05)",
+ default='X05'
+ )
+ queryRadiusDegrees = pexConfig.Field(
+ dtype=float,
+ doc="On sky radius for Ephemeris cone search. Defaults "
+ "to the radius of Rubin Obs FoV in degrees",
+ default=1.75
+ )


Please create PRs for obs_subaru, obs_decam, and obs_lsst that create config overrides for these instrument-dependent fields. For the HSC/DECam observer codes you can probably copy from the existing SkyBotEphemerisQuery.py files, but I'm pretty sure AuxTel/LATISS has a separate code from Simonyi/LSSTCam.

Turns out there's no good documentation about making config overrides. 😞 A quick from-memory summary:

The Python files in obs_*/config/ contain config settings that should be applied for all runs of that task on that instrument. They have higher priority than task defaults, but lower priority than in-pipeline overrides.

The config file must have the same name as the task label in the pipeline (which you can usually assume is _DefaultName, including in your case).

Overrides directly under obs_lsst/config/ apply to all LSST instruments. Overrides in a subdirectory apply only to that instrument, and have higher priority than the general overrides.

On second glance, something is very wrong here -- you don't actually use observerCode when downloading from MPSky. Is this a future feature?

We should use X05 for AuxTel for now. AuxTel getting an observatory code is blocked on association working, since the Minor Planet Center assigns observatory codes and they've asked for some NEO measurements. I don't see an obs_auxtel equivalent though -- should that be somewhere in obs_lsst?

Now that I look at it, I don't see a SkyBotEphemerisQuery.py override in obs_lsst at all. Should I follow the format from obs_subaru and obs_decam?

Yes, observerCode is a future MPSky feature, but it currently just defaults to LSST (X05).

I don't see an obs_auxtel equivalent though -- should that be somewhere in obs_lsst?

Yes, that's what the subdirectories are for. I think LATISS is the only one that's for AuxTel instead of Simonyi.

Now that I look at it, I don't see a SkyBotEphemerisQuery.py override in obs_lsst at all. Should I follow the format from obs_subaru and obs_decam?

The default for SkyBotEphemerisQuery was I11, which was already for LSST so no override was done. I'm not sure how I11 relates to X05.

Where you do need an override (LATISS only?), then yes, following the existing format should be enough.

Yes, observerCode is a future MPSky feature, but it currently just defaults to LSST (X05).

Please add a note to the field docstring that it's not used yet.

This does raise the question of what to do for DECam and especially HSC. We could just run the task and accept that nothing will match (that's what SkyBotEphemerisQueryTask did before I learned that geocentric is not a good default 😛), or we could trim the task out of those pipelines until MPSky supports them. Made irrelevant by the news that we can't run this task in ap_verify at all; see DM-45750.

It looks like there's no Jira issue for adding generic observatory support, so we can't cite it.

I11 is Gemini South, which is basically LSST as far as positions on the Earth are concerned. X05 probably didn't exist as a code when this was written (it was just designated last year, I believe), but as far as science is concerned these are all equivalent except in very extreme cases (impactors).

I've created DM-45735, and will cite it in comments! Should be resolved soon, but who knows.

kfindeisen · 2024-07-29T17:50:03Z

python/lsst/ap/association/ssoAssociation.py


 self.log.info("Successfully associated %d SolarSystemObjects.", nFound)
- assocMask = diaSourceCatalog["ssObjectId"] != 0
+ assocMask = (diaSourceCatalog["ssObjectId"] != 0) & (np.isfinite(diaSourceCatalog["ssObjectId"]))


You should be able to guarantee that ssObjectId is always finite -- you're the one defining it!

I don't know if ssObjectID has defined behavior for diaSources which are not associated to an ssObject.

Disregard that -- I do define it and set it to zero. I agree, this change is not needed.

kfindeisen · 2024-07-29T17:54:09Z

python/lsst/ap/ephemerides.yaml

Pipelines should not go in task packages, and they should never be mixed in with Python code. Please move this to ap_pipe.

I see that lsst/ap_pipe#189 has its own Ephemerides.yaml, so I recommend just dropping this file entirely.

python/lsst/ap/ephemerides.yaml

kfindeisen · 2024-08-13T00:57:37Z

python/lsst/ap/association/mpSkyEphemerisQuery.py

+ expCenter = SpherePoint(region.getBoundingCircle().getCenter())
+
+ # Make sure date is non-NaN.
+ expMidPointEPOCH = (timespan.begin.mjd + timespan.end.mjd)/2


Once #230 merges, you should be able to just do:

Suggested change

expMidPointEPOCH = (timespan.begin.mjd + timespan.end.mjd)/2

expMidPointEPOCH = utils.getMidpointFromTimespan(timespan, allowEndpoints=False).utc.mjd

As a bonus, getMidpointFromTimespan would do the input validation for you.

(I originally claimed the above code wouldn't work because I thought timespan.begin.mjd was still an astropy.time.Time. That's not the case, so this is just simple code reuse.)

kfindeisen

Looks much better, most of my comments are about documentation. However, I'm a bit concerned about the exception handling and the correctness of the ID converter; can you add some round-trip tests for the latter?

kfindeisen · 2024-08-29T16:48:28Z

python/lsst/ap/association/diaPipe.py

Please don't do umbrella "response to review" commits. Commits get their value from being logically coherent and individually correct, so review responses should be done either by tweaking the existing commits (edit, squash, or fixup commands in git rebase -i) or, for a genuinely new change, creating new commits that are as well-organized and -documented as the original ones.

In this case, since you had one giant commit to begin with, squashing the changes onto it is probably the least of the available evils. I'll continue the review on the combined diff.

python/lsst/ap/association/diaPipe.py

kfindeisen · 2024-08-29T16:56:59Z

python/lsst/ap/association/diaPipe.py

- solarSystemObjectTable : `pandas.DataFrame`
- Preloaded Solar System objects expected to be visible in the image.
+ legacySolarSystemTable : `pandas.DataFrame`
+ Not used


I don't understand the reason for including a "virtual" input that is never used but must be specified. What is this for, and is there a plan for phasing it out?

The idea is to maintain the interface's backwards compatibility -- removing the argument in the place of legacySolarSystemTable would break every place where diaPipe.run is currently being called. I do not believe there is a plan for phasing this out.

Except that what you have here is not actually backward-compatible -- a hypothetical user who tries to call run with the old signature, expecting to get solar system associations, will silently get nothing. If you do want backward compatibility, you'd need to convert the old-style input into the new-style input.

Phasing out could be handled by raising a FutureWarning if the argument isn't None (there's no specific procedure for deprecating parameters, so you'd have to do it manually, but see that page for recommended text) and filing a Jira issue to remove it later.

~~All that said, though, I doubt that run is called anywhere except from runQuantum or unit tests, so I think you could get away with unceremoniously deleting it.~~

After discussing with @Gerenjie and @isullivan separately, we've agreed to move forward as follows:

legacySolarSystemTable will be used in place of solarSystemObjectTable when the latter is undefined. That should be enough to give backwards-compatibility whether run is called with positional args or keywords.

After merging this PR, we'll file an RFC to deprecate and remove legacySolarSystemTable per the usual process. I doubt there will be any pushback.

kfindeisen · 2024-08-29T16:59:30Z

python/lsst/ap/association/mpSkyEphemerisQuery.py

+"""Solar System Object Query to MPSky in place of a internal Rubin solar
+system object caching/retrieval code.
+
+Will compute the location for of known SSObjects within a visit


Suggested change

Will compute the location for of known SSObjects within a visit

Will compute the location for of known SSObjects within a visit-detector combination.

kfindeisen · 2024-08-29T17:33:09Z

python/lsst/ap/association/mpSkyEphemerisQuery.py

+ observerCode = pexConfig.Field(
+ dtype=str,
+ doc="IAU Minor Planet Center observer code for queries "
+ "(Rubin Obs./LSST default is X05)",


The auto-generated documentation includes the field default, you can see DiaPipelineConfig for examples.

Suggested change

"(Rubin Obs./LSST default is X05)",

"(default is Rubin Obs./LSST)",

I always recommend that people proofread their docs by running package-docs clean && package-docs build and looking at the HTML output. 🙂

kfindeisen · 2024-08-29T19:22:42Z

python/lsst/ap/association/utils.py

+ for character in ObjID:
+ ssObjectID *= 256
+ ssObjectID += ord(character)
+ ssObjectID *= 256 ** (7 - len(ObjID))


Since you seem to be trying to pack specific inputs into specific bytes, I recommend using shifting and bitwise operators instead of arithmetic. Both you and future developers will be less likely to make a mistake.

Just to be sure there is no mistake, is the output supposed to be an 8-byte flag where:

Top byte is flags

Remaining 7 bytes are the characters of ObjID, zero-padded on the right?

If so, it would be good to have a comment that says so. (I assume that the exact format for ssObjectID is supposed to be an implementation detail, and that clients/tests should rely on ssObjectID_to_ObjID instead?)

That's the hope! Yes, the idea is for everyone to use these methods to convert, and for RSP users to have to use table lookups to check the ssObjectID for the minor planet they care about.

kfindeisen · 2024-08-29T19:30:18Z

python/lsst/ap/association/utils.py

+
+def ssObjectID_to_ObjID(ssObjectID):
+ """Convert from Minor Planet Center packed provisional object IDs and
+ Rubin ssObjectID.


I don't think this is the correct description.

kfindeisen · 2024-08-29T19:34:11Z

python/lsst/ap/association/utils.py

+ """
+ ObjID = ''.join([chr(ssObjectID // (256 ** i) % 256) for i in reversed(range(0, 7))])
+ ObjID = ObjID.replace('\x00', '')
+ return ObjID, ssObjectID//(256 ** 7)%256


Same comment about using bit operators instead (extracting each byte with a mask should be much simpler than all these exponents and moduli).

kfindeisen · 2024-08-29T19:34:59Z

python/lsst/ap/association/utils.py

+ ObjID : `str`
+ Minor Planet Center packed provisional designation for a small solar
+ system object. Must be fewer than eight characters.
+
+ flags : `int`
+ Eight free bits to enable future decoupling between Minor Planet Center
+ and Rubin. Zero by default, should not be changed unless we need to
+ move away from a 1:1 mapping with the MPC. Must be within [0, 255].


Please update these descriptions to be appropriate for outputs, not inputs.

kfindeisen · 2024-08-29T19:35:16Z

python/lsst/ap/association/utils.py

+ ------
+
+ """
+ ObjID = ''.join([chr(ssObjectID // (256 ** i) % 256) for i in reversed(range(0, 7))])


Consider validating ssObjectID the same way you did input validation in the other function.

kfindeisen

Looks good, only minor comments.

However, please squash the commits before merging:

"response to review" commits are still discouraged
unit tests should be added on the same commit as the code they test, otherwise the commits are not self-contained (imagine if somebody deleted or rearranged one of the commits -- you'd have either untested code, or tests of nonexistent code). That said, the ID converters would have made a good separate commit a priori, since they can be defined and tested without MPSkyEphemerisQuery.
fixes needed to get things to build should be merged into the original change, as if it had been done right the first time; as things stand, the original test commit breaks scons/flake8.

"Made legacySolarSystemTable an optional input" is a bit of a grey area; this one makes sense either as a standalone commit or squashed under the "right the first time" principle. I'd lean towards squashing (as it's basically a bug in the original legacySolarSystemTable), but if you keep it separate, please write it in the imperative tense, e.g. "Use legacySolarSystemTable in new code".

kfindeisen · 2024-09-05T22:30:04Z

python/lsst/ap/association/mpSkyEphemerisQuery.py

 import pyarrow as pa
 import requests
-import sys
+import os


Alphabetical order would make it easier to spot what's already imported.

kfindeisen · 2024-09-05T22:36:47Z

python/lsst/ap/association/mpSkyEphemerisQuery.py


 return Struct(
 ssObjects=mpSkySsObjects,
 )

- def _mpSkyConeSearch(self, expCenter, epochMJD, queryRadius):
+ def read_mpSky_response(self, response):


It's a bit odd to mix camelCase and snake_case in the same name. Please pick one and stick to it.

python/lsst/ap/association/mpSkyEphemerisQuery.py

kfindeisen · 2024-09-05T22:43:42Z

python/lsst/ap/association/mpSkyEphemerisQuery.py

+ object_polynomial : `np.ndarray`
+ Array of object cartesian position polynomials
+ observer_polynomial : `np.ndarray`
+ Array of observer cartesian position polynomials


Consider adding explicit dimensions, at least for the multidimensional arrays (a reader can probably figure out that objID, ra, and dec are N-vectors).

kfindeisen · 2024-09-05T22:58:44Z

python/lsst/ap/association/utils.py

- ObjID = ObjID.replace('\x00', '')
- return ObjID, ssObjectID//(256 ** 7)%256
+ if ssObjectID < 0 or ssObjectID >= (2 << 63):
+ raise ValueError(f'ssObjectID ({ssObjectID}) outside [0, 2^63].')


I think this should be 2^64-1.

Good catch!

kfindeisen · 2024-09-05T22:59:19Z

python/lsst/ap/association/utils.py

- ObjID = ''.join([chr(ssObjectID // (256 ** i) % 256) for i in reversed(range(0, 7))])
- ObjID = ObjID.replace('\x00', '')
- return ObjID, ssObjectID//(256 ** 7)%256
+ if ssObjectID < 0 or ssObjectID >= (2 << 63):


I think 1 << 64 would be more idiomatic.

kfindeisen · 2024-09-05T23:02:58Z

tests/test_utils.py

+ for allowed_flag in allowed_flags:
+ self.assertEqual((allowed_string, allowed_flag),
+ ssObjectID_to_objID(objID_to_ssObjectID(allowed_string,
+ allowed_flag)))


Might be worth factoring the ssObjectID_to_objID(objID_to_ssObjectID()) call into its own variable for readability.

kfindeisen · 2024-09-05T23:04:58Z

tests/test_utils.py

+ + ['K99AJ3Z', 'K08Aa0A', 'K07Tf8A', 'PLS2040', 'T1S3138', 'T2S1010', 'T3S4101']
+ allowed_flags = [i for i in range(0, 256)]
+ disallowed_flags = [-999999999, -512, -256, -255, -1, 256, 512, 99999999]
+ disallowed_strings = ['Ā', '🔭', 'A' * 8, ' ' * 8, 'A' * 128]


I think putting an emoji in Python code might be going a little too far (though it's hilarious that GitHub can print it!)

EDIT: just checked and even nano can handle it. Wasn't expecting that.

What about the empty string? That seems like a natural edge case.

tests/test_utils.py

kfindeisen · 2024-09-05T23:10:12Z

tests/test_utils.py

-  + ['K99AJ3Z', 'K08Aa0A', 'K07Tf8A', 'PLS2040', 'T1S3138', 'T2S1010', 'T3S4101']
+ + ['A' * i for i in range(2, 8)] + ['Z' * i for i in range(2, 8)] \
+ + ['J95X00A', 'J95X01L', 'J95F13B', 'J98SA8Q', 'J98SC7V', 'J98SG2S'] \
+ + ['K99AJ3Z', 'K08Aa0A', 'K07Tf8A', 'PLS2040', 'T1S3138', 'T2S1010', 'T3S4101']


I assume this breaks flake8, so this commit should have been part of "adding unit tests"

…n ssObjectID and MPC designation

…ectTable is provided

kfindeisen · 2024-09-06T18:32:17Z

Just for future reference, the commit summaries overflowed GitHub's 72-character limit. See the dev guide for recommendations on writing commit messages (which actually recommends the much tighter limit of 50 characters, maybe we should update that).

Gerenjie changed the title ~~Tickets/dm 42576~~ DM-42576: mpSkyEphemerisQuery Jul 16, 2024

Gerenjie force-pushed the tickets/DM-42576 branch 4 times, most recently from b58aae7 to adf1c28 Compare July 25, 2024 21:56

kfindeisen requested changes Jul 29, 2024

View reviewed changes

kfindeisen mentioned this pull request Jul 29, 2024

DM-42576: Write task to compute predicted positions for an upcoming visit lsst/ap_pipe#189

Merged

kfindeisen mentioned this pull request Aug 6, 2024

DM-31833: Move LoadDiaCatalogs to standalone pipelineTask #230

Merged

kfindeisen reviewed Aug 13, 2024

View reviewed changes

Gerenjie force-pushed the tickets/DM-42576 branch from 0bd06cc to 7c1f70e Compare August 15, 2024 17:33

Gerenjie force-pushed the tickets/DM-42576 branch 2 times, most recently from d42d5a8 to 1e5823b Compare August 28, 2024 06:41

kfindeisen requested changes Aug 29, 2024

View reviewed changes

Gerenjie force-pushed the tickets/DM-42576 branch from 1e5823b to 88cfa66 Compare August 31, 2024 19:11

kfindeisen approved these changes Sep 5, 2024

View reviewed changes

Gerenjie added 2 commits September 5, 2024 22:28

Add mpSkyEphemerisQuery for ssObject precompute and conversion betwee…

94d6ec9

…n ssObjectID and MPC designation

Use legacySolarSystemTable if provided and no optional solarSystemObj…

e8a5a9d

…ectTable is provided

Gerenjie force-pushed the tickets/DM-42576 branch from c8d648a to e8a5a9d Compare September 6, 2024 05:39

Gerenjie merged commit 67901ff into main Sep 6, 2024
2 checks passed

Gerenjie deleted the tickets/DM-42576 branch September 6, 2024 18:27

	expMidPointEPOCH = (timespan.begin.mjd + timespan.end.mjd)/2
	expMidPointEPOCH = utils.getMidpointFromTimespan(timespan, allowEndpoints=False).utc.mjd

	Will compute the location for of known SSObjects within a visit
	Will compute the location for of known SSObjects within a visit-detector combination.

	"(Rubin Obs./LSST default is X05)",
	"(default is Rubin Obs./LSST)",

DM-42576: mpSkyEphemerisQuery #227

DM-42576: mpSkyEphemerisQuery #227

Conversation

Gerenjie commented Jul 16, 2024

timj commented Jul 25, 2024

kfindeisen left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kfindeisen Jul 29, 2024 • edited Loading

Choose a reason for hiding this comment

kfindeisen Jul 29, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kfindeisen Aug 13, 2024 • edited Loading

Choose a reason for hiding this comment

kfindeisen Aug 13, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kfindeisen Aug 13, 2024 • edited Loading

Choose a reason for hiding this comment

kfindeisen left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kfindeisen Aug 29, 2024 • edited Loading

Choose a reason for hiding this comment

kfindeisen Sep 3, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kfindeisen left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kfindeisen Sep 5, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kfindeisen commented Sep 6, 2024

kfindeisen left a comment •

edited

Loading

kfindeisen Jul 29, 2024 •

edited

Loading

kfindeisen Jul 29, 2024 •

edited

Loading

kfindeisen Aug 13, 2024 •

edited

Loading

kfindeisen Aug 13, 2024 •

edited

Loading

kfindeisen Aug 13, 2024 •

edited

Loading

kfindeisen Aug 29, 2024 •

edited

Loading

kfindeisen Sep 3, 2024 •

edited

Loading

kfindeisen Sep 5, 2024 •

edited

Loading