Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-42576: mpSkyEphemerisQuery #227

Merged
merged 2 commits into from
Sep 6, 2024
Merged

DM-42576: mpSkyEphemerisQuery #227

merged 2 commits into from
Sep 6, 2024

Conversation

Gerenjie
Copy link
Contributor

Adding mpSkyEphemerisQuery, which replaces SkyBot in querying mpSky for ssObjects in the pre-load step.

@Gerenjie Gerenjie changed the title Tickets/dm 42576 DM-42576: mpSkyEphemerisQuery Jul 16, 2024
@timj
Copy link
Member

timj commented Jul 25, 2024

@Gerenjie the reason the action is failing is because the DM repos require rebasing and not merging from main in order to keep the ticket branch up to date. If you need help with that you can Slack me.

@Gerenjie Gerenjie force-pushed the tickets/DM-42576 branch 4 times, most recently from b58aae7 to adf1c28 Compare July 25, 2024 21:56
Copy link
Member

@kfindeisen kfindeisen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking the time to put this together! I have quite a few comments, so I'm returning the review ahead of the other PRs. My most important objections are the timeout and the print statements; leaving either alone will make this task almost impossible to use in the field. Please also make sure you're building the docs; I saw some formatting problems that I would expect to cause a build error.

Happy to look at this again once these issues are addressed. See the dev guide and the commits appendix for tips for working through the comments without getting overwhelmed.

import lsst.pipe.base.connectionTypes as connTypes

# Enforce an error for unsafe column/array value setting in pandas.
pd.options.mode.chained_assignment = 'raise'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's very dangerous to set a global option in a module like this -- it means the behavior of other code will depend on whether this module has been imported or not.

If your task really needs this setting, consider using option_context instead. But if I understand https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#evaluation-order-matters correctly, this flag is a debugging tool and shouldn't be committed at all.

Comment on lines 33 to 45
import pandas as pd
import numpy as np
import pyarrow as pa
import requests
import sys

import lsst.pipe.base as pipeBase
import lsst.pex.config as pexConfig
from lsst.utils.timer import timeMethod
from lsst.geom import SpherePoint

from lsst.pipe.base import PipelineTask, PipelineTaskConfig, PipelineTaskConnections
import lsst.pipe.base.connectionTypes as connTypes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please organize imports according to PEP 8 guidelines. In particular, you import pipe.base three times in three different ways; to minimize confusion, please pick one and use it consistently throughout the file.


ssObjects = connTypes.Output(
doc="MPSky-provided Solar System objects observable in this detector-visit",
name="preloaded_ssObjects",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During RFC-997 there was some grumbling about exactly what the new datasets ought to be called. It might be a good idea to check with #dm-naming-things before we actually start creating new dataset types in /repo/main and /repo/embargo (though when I asked about the similarly-named APDB output datasets I didn't get an answer).

Comment on lines 73 to 78
observerCode = pexConfig.Field(
dtype=str,
doc="IAU Minor Planet Center observer code for queries "
"(Rubin Obs./LSST default is X05)",
default='X05'
)
queryRadiusDegrees = pexConfig.Field(
dtype=float,
doc="On sky radius for Ephemeris cone search. Defaults "
"to the radius of Rubin Obs FoV in degrees",
default=1.75
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please create PRs for obs_subaru, obs_decam, and obs_lsst that create config overrides for these instrument-dependent fields. For the HSC/DECam observer codes you can probably copy from the existing SkyBotEphemerisQuery.py files, but I'm pretty sure AuxTel/LATISS has a separate code from Simonyi/LSSTCam.

Copy link
Member

@kfindeisen kfindeisen Jul 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out there's no good documentation about making config overrides. 😞 A quick from-memory summary:

  • The Python files in obs_*/config/ contain config settings that should be applied for all runs of that task on that instrument. They have higher priority than task defaults, but lower priority than in-pipeline overrides.
  • The config file must have the same name as the task label in the pipeline (which you can usually assume is _DefaultName, including in your case).
  • Overrides directly under obs_lsst/config/ apply to all LSST instruments. Overrides in a subdirectory apply only to that instrument, and have higher priority than the general overrides.

Copy link
Member

@kfindeisen kfindeisen Jul 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second glance, something is very wrong here -- you don't actually use observerCode when downloading from MPSky. Is this a future feature?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use X05 for AuxTel for now. AuxTel getting an observatory code is blocked on association working, since the Minor Planet Center assigns observatory codes and they've asked for some NEO measurements. I don't see an obs_auxtel equivalent though -- should that be somewhere in obs_lsst?

Now that I look at it, I don't see a SkyBotEphemerisQuery.py override in obs_lsst at all. Should I follow the format from obs_subaru and obs_decam?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, observerCode is a future MPSky feature, but it currently just defaults to LSST (X05).

Copy link
Member

@kfindeisen kfindeisen Aug 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see an obs_auxtel equivalent though -- should that be somewhere in obs_lsst?

Yes, that's what the subdirectories are for. I think LATISS is the only one that's for AuxTel instead of Simonyi.

Now that I look at it, I don't see a SkyBotEphemerisQuery.py override in obs_lsst at all. Should I follow the format from obs_subaru and obs_decam?

The default for SkyBotEphemerisQuery was I11, which was already for LSST so no override was done. I'm not sure how I11 relates to X05.

Where you do need an override (LATISS only?), then yes, following the existing format should be enough.

Copy link
Member

@kfindeisen kfindeisen Aug 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, observerCode is a future MPSky feature, but it currently just defaults to LSST (X05).

Please add a note to the field docstring that it's not used yet.

This does raise the question of what to do for DECam and especially HSC. We could just run the task and accept that nothing will match (that's what SkyBotEphemerisQueryTask did before I learned that geocentric is not a good default 😛), or we could trim the task out of those pipelines until MPSky supports them. Made irrelevant by the news that we can't run this task in ap_verify at all; see DM-45750.

It looks like there's no Jira issue for adding generic observatory support, so we can't cite it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I11 is Gemini South, which is basically LSST as far as positions on the Earth are concerned. X05 probably didn't exist as a code when this was written (it was just designated last year, I believe), but as far as science is concerned these are all equivalent except in very extreme cases (impactors).

I've created DM-45735, and will cite it in comments! Should be resolved soon, but who knows.


self.log.info("Successfully associated %d SolarSystemObjects.", nFound)
assocMask = diaSourceCatalog["ssObjectId"] != 0
assocMask = (diaSourceCatalog["ssObjectId"] != 0) & (np.isfinite(diaSourceCatalog["ssObjectId"]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to guarantee that ssObjectId is always finite -- you're the one defining it!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if ssObjectID has defined behavior for diaSources which are not associated to an ssObject.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disregard that -- I do define it and set it to zero. I agree, this change is not needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disregard that -- I do define it and set it to zero. I agree, this change is not needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pipelines should not go in task packages, and they should never be mixed in with Python code. Please move this to ap_pipe.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that lsst/ap_pipe#189 has its own Ephemerides.yaml, so I recommend just dropping this file entirely.

python/lsst/ap/ephemerides.yaml Outdated Show resolved Hide resolved
python/lsst/ap/ephemerides.yaml Outdated Show resolved Hide resolved
python/lsst/ap/ephemerides.yaml Outdated Show resolved Hide resolved
expCenter = SpherePoint(region.getBoundingCircle().getCenter())

# Make sure date is non-NaN.
expMidPointEPOCH = (timespan.begin.mjd + timespan.end.mjd)/2
Copy link
Member

@kfindeisen kfindeisen Aug 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once #230 merges, you should be able to just do:

Suggested change
expMidPointEPOCH = (timespan.begin.mjd + timespan.end.mjd)/2
expMidPointEPOCH = utils.getMidpointFromTimespan(timespan, allowEndpoints=False).utc.mjd

As a bonus, getMidpointFromTimespan would do the input validation for you.

(I originally claimed the above code wouldn't work because I thought timespan.begin.mjd was still an astropy.time.Time. That's not the case, so this is just simple code reuse.)

Copy link
Member

@kfindeisen kfindeisen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks much better, most of my comments are about documentation. However, I'm a bit concerned about the exception handling and the correctness of the ID converter; can you add some round-trip tests for the latter?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't do umbrella "response to review" commits. Commits get their value from being logically coherent and individually correct, so review responses should be done either by tweaking the existing commits (edit, squash, or fixup commands in git rebase -i) or, for a genuinely new change, creating new commits that are as well-organized and -documented as the original ones.

In this case, since you had one giant commit to begin with, squashing the changes onto it is probably the least of the available evils. I'll continue the review on the combined diff.

python/lsst/ap/association/diaPipe.py Show resolved Hide resolved
solarSystemObjectTable : `pandas.DataFrame`
Preloaded Solar System objects expected to be visible in the image.
legacySolarSystemTable : `pandas.DataFrame`
Not used
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the reason for including a "virtual" input that is never used but must be specified. What is this for, and is there a plan for phasing it out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is to maintain the interface's backwards compatibility -- removing the argument in the place of legacySolarSystemTable would break every place where diaPipe.run is currently being called. I do not believe there is a plan for phasing this out.

Copy link
Member

@kfindeisen kfindeisen Aug 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except that what you have here is not actually backward-compatible -- a hypothetical user who tries to call run with the old signature, expecting to get solar system associations, will silently get nothing. If you do want backward compatibility, you'd need to convert the old-style input into the new-style input.

Phasing out could be handled by raising a FutureWarning if the argument isn't None (there's no specific procedure for deprecating parameters, so you'd have to do it manually, but see that page for recommended text) and filing a Jira issue to remove it later.

All that said, though, I doubt that run is called anywhere except from runQuantum or unit tests, so I think you could get away with unceremoniously deleting it.

Copy link
Member

@kfindeisen kfindeisen Sep 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussing with @Gerenjie and @isullivan separately, we've agreed to move forward as follows:

  • legacySolarSystemTable will be used in place of solarSystemObjectTable when the latter is undefined. That should be enough to give backwards-compatibility whether run is called with positional args or keywords.
  • After merging this PR, we'll file an RFC to deprecate and remove legacySolarSystemTable per the usual process. I doubt there will be any pushback.

"""Solar System Object Query to MPSky in place of a internal Rubin solar
system object caching/retrieval code.

Will compute the location for of known SSObjects within a visit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Will compute the location for of known SSObjects within a visit
Will compute the location for of known SSObjects within a visit-detector combination.

observerCode = pexConfig.Field(
dtype=str,
doc="IAU Minor Planet Center observer code for queries "
"(Rubin Obs./LSST default is X05)",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The auto-generated documentation includes the field default, you can see DiaPipelineConfig for examples.

Suggested change
"(Rubin Obs./LSST default is X05)",
"(default is Rubin Obs./LSST)",

I always recommend that people proofread their docs by running package-docs clean && package-docs build and looking at the HTML output. 🙂

for character in ObjID:
ssObjectID *= 256
ssObjectID += ord(character)
ssObjectID *= 256 ** (7 - len(ObjID))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you seem to be trying to pack specific inputs into specific bytes, I recommend using shifting and bitwise operators instead of arithmetic. Both you and future developers will be less likely to make a mistake.

Just to be sure there is no mistake, is the output supposed to be an 8-byte flag where:

  • Top byte is flags
  • Remaining 7 bytes are the characters of ObjID, zero-padded on the right?

If so, it would be good to have a comment that says so. (I assume that the exact format for ssObjectID is supposed to be an implementation detail, and that clients/tests should rely on ssObjectID_to_ObjID instead?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the hope! Yes, the idea is for everyone to use these methods to convert, and for RSP users to have to use table lookups to check the ssObjectID for the minor planet they care about.


def ssObjectID_to_ObjID(ssObjectID):
"""Convert from Minor Planet Center packed provisional object IDs and
Rubin ssObjectID.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the correct description.

"""
ObjID = ''.join([chr(ssObjectID // (256 ** i) % 256) for i in reversed(range(0, 7))])
ObjID = ObjID.replace('\x00', '')
return ObjID, ssObjectID//(256 ** 7)%256
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about using bit operators instead (extracting each byte with a mask should be much simpler than all these exponents and moduli).

Comment on lines 316 to 323
ObjID : `str`
Minor Planet Center packed provisional designation for a small solar
system object. Must be fewer than eight characters.

flags : `int`
Eight free bits to enable future decoupling between Minor Planet Center
and Rubin. Zero by default, should not be changed unless we need to
move away from a 1:1 mapping with the MPC. Must be within [0, 255].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update these descriptions to be appropriate for outputs, not inputs.

------

"""
ObjID = ''.join([chr(ssObjectID // (256 ** i) % 256) for i in reversed(range(0, 7))])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider validating ssObjectID the same way you did input validation in the other function.

Copy link
Member

@kfindeisen kfindeisen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, only minor comments.

However, please squash the commits before merging:

  • "response to review" commits are still discouraged
  • unit tests should be added on the same commit as the code they test, otherwise the commits are not self-contained (imagine if somebody deleted or rearranged one of the commits -- you'd have either untested code, or tests of nonexistent code). That said, the ID converters would have made a good separate commit a priori, since they can be defined and tested without MPSkyEphemerisQuery.
  • fixes needed to get things to build should be merged into the original change, as if it had been done right the first time; as things stand, the original test commit breaks scons/flake8.

"Made legacySolarSystemTable an optional input" is a bit of a grey area; this one makes sense either as a standalone commit or squashed under the "right the first time" principle. I'd lean towards squashing (as it's basically a bug in the original legacySolarSystemTable), but if you keep it separate, please write it in the imperative tense, e.g. "Use legacySolarSystemTable in new code".

import pyarrow as pa
import requests
import sys
import os
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alphabetical order would make it easier to spot what's already imported.


return Struct(
ssObjects=mpSkySsObjects,
)

def _mpSkyConeSearch(self, expCenter, epochMJD, queryRadius):
def read_mpSky_response(self, response):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit odd to mix camelCase and snake_case in the same name. Please pick one and stick to it.

python/lsst/ap/association/mpSkyEphemerisQuery.py Outdated Show resolved Hide resolved
object_polynomial : `np.ndarray`
Array of object cartesian position polynomials
observer_polynomial : `np.ndarray`
Array of observer cartesian position polynomials
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding explicit dimensions, at least for the multidimensional arrays (a reader can probably figure out that objID, ra, and dec are N-vectors).

ObjID = ObjID.replace('\x00', '')
return ObjID, ssObjectID//(256 ** 7)%256
if ssObjectID < 0 or ssObjectID >= (2 << 63):
raise ValueError(f'ssObjectID ({ssObjectID}) outside [0, 2^63].')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be 2^64-1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

ObjID = ''.join([chr(ssObjectID // (256 ** i) % 256) for i in reversed(range(0, 7))])
ObjID = ObjID.replace('\x00', '')
return ObjID, ssObjectID//(256 ** 7)%256
if ssObjectID < 0 or ssObjectID >= (2 << 63):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 1 << 64 would be more idiomatic.

for allowed_flag in allowed_flags:
self.assertEqual((allowed_string, allowed_flag),
ssObjectID_to_objID(objID_to_ssObjectID(allowed_string,
allowed_flag)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth factoring the ssObjectID_to_objID(objID_to_ssObjectID()) call into its own variable for readability.

+ ['K99AJ3Z', 'K08Aa0A', 'K07Tf8A', 'PLS2040', 'T1S3138', 'T2S1010', 'T3S4101']
allowed_flags = [i for i in range(0, 256)]
disallowed_flags = [-999999999, -512, -256, -255, -1, 256, 512, 99999999]
disallowed_strings = ['Ā', '🔭', 'A' * 8, ' ' * 8, 'A' * 128]
Copy link
Member

@kfindeisen kfindeisen Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think putting an emoji in Python code might be going a little too far (though it's hilarious that GitHub can print it!)

EDIT: just checked and even nano can handle it. Wasn't expecting that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the empty string? That seems like a natural edge case.

tests/test_utils.py Outdated Show resolved Hide resolved
+ ['K99AJ3Z', 'K08Aa0A', 'K07Tf8A', 'PLS2040', 'T1S3138', 'T2S1010', 'T3S4101']
+ ['A' * i for i in range(2, 8)] + ['Z' * i for i in range(2, 8)] \
+ ['J95X00A', 'J95X01L', 'J95F13B', 'J98SA8Q', 'J98SC7V', 'J98SG2S'] \
+ ['K99AJ3Z', 'K08Aa0A', 'K07Tf8A', 'PLS2040', 'T1S3138', 'T2S1010', 'T3S4101']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this breaks flake8, so this commit should have been part of "adding unit tests"

@Gerenjie Gerenjie merged commit 67901ff into main Sep 6, 2024
2 checks passed
@Gerenjie Gerenjie deleted the tickets/DM-42576 branch September 6, 2024 18:27
@kfindeisen
Copy link
Member

Just for future reference, the commit summaries overflowed GitHub's 72-character limit. See the dev guide for recommendations on writing commit messages (which actually recommends the much tighter limit of 50 characters, maybe we should update that).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants