-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-44850: Add utility to create empty tables #223
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm worried that ApdbSchema
is, like _make_empty_catalog
, an implementation detail that may break at any time. I'd like feedback from @andy-slac before approving.
tests/test_diaPipe.py
Outdated
@@ -35,6 +35,7 @@ | |||
from lsst.pipe.base.testUtils import assertValidOutput | |||
|
|||
from lsst.ap.association import DiaPipelineTask | |||
from lsst.ap.association.utils import make_empty_catalog, convertTableToSdmSchema |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest moving this to a test_utils.py
.
tests/test_diaPipe.py
Outdated
def test_make_empty_catalog(self): | ||
"""Check that an empty catalog has the correct format. | ||
""" | ||
tableName = "DiaObject" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it also work with other tables, e.g. DiaForcedSource
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it should work with any of the tables. I'll add a for
loop over a few tables.
tests/test_diaPipe.py
Outdated
tableName = "DiaObject" | ||
config = self._makeDefaultConfig(config_file=self.config_file.name) | ||
diaPipeTask = DiaPipelineTask(config=config) | ||
emptyDiaObjects = make_empty_catalog(diaPipeTask.schema, tableName=tableName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you provide a test schema directly, then there's no need to mix DiaPipelineTask
into the test...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is true, though here I wanted to also test that diaPipeTask.schema
is getting set correctly. That's why I added this test to test_diaPipe
instead of creating a test_utils
, but I am happy to move it out and just make the schema directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that's actually being tested here -- the only thing you compare the schema to is itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I will move it to test_utils
, and also add a check of a few of the values.
tests/test_diaPipe.py
Outdated
emptyDf = pd.DataFrame(columns=["diaObjectId",]) | ||
emptyDf.set_index("diaObjectId") | ||
convertedEmptyDiaObjects = convertTableToSdmSchema(diaPipeTask.schema, emptyDf, tableName=tableName) | ||
self.assertSetEqual(set(convertedEmptyDiaObjects.columns), set(emptyDiaObjects.columns)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're comparing two set
, then assertSetEqual
is redundant.
self.assertSetEqual(set(convertedEmptyDiaObjects.columns), set(emptyDiaObjects.columns)) | |
self.assertEqual(set(convertedEmptyDiaObjects.columns), set(emptyDiaObjects.columns)) |
a54f713
to
ea8881b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Krzysztof's sentiment - ApdbSchema
presently is not in public API. And I think that you do not actually have to use it, as Apdb
has a method to return table schema which wraps access to ApdbSchema
. I'd probably re-implement readSchemaFromApdb
like this:
from lsst.dax.apdb import ApdbTables, schema_model
def readSchemaFromApdb(apdb: Apdb) -> dict[str, schema_model.Table | None]:
"""...."""
return {table.table_name(): apdb.tableDef(table) for table in ApdbTables}
OTOH, if there is something else in ApdbSchema
that you want to reuse, we can definitely extend Apdb's public interface.
python/lsst/ap/association/utils.py
Outdated
@@ -141,7 +142,7 @@ def convertTableToSdmSchema(apdbSchema, sourceTable, tableName): | |||
|
|||
Parameters | |||
---------- | |||
apdbSchema : `lsst.dax.apdb.apdbSchema.ApdbSchema` | |||
apdbSchema : 'dict' of `lsst.dax.apdb.apdbSchema.ApdbSchema` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be lsst.dax.apdb.schema_model.Table
instead of lsst.dax.apdb.apdbSchema.ApdbSchema
. Same on the line 128 above.
python/lsst/ap/association/utils.py
Outdated
@@ -182,7 +183,7 @@ def dropEmptyColumns(apdbSchema, sourceTable, tableName): | |||
|
|||
Parameters | |||
---------- | |||
apdbSchema : `lsst.dax.apdb.apdbSchema.ApdbSchema` | |||
apdbSchema : 'dict' of `lsst.dax.apdb.apdbSchema.ApdbSchema` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lsst.dax.apdb.schema_model.Table
here too.
python/lsst/ap/association/utils.py
Outdated
|
||
Parameters | ||
---------- | ||
apdbSchema : 'dict' of `lsst.dax.apdb.apdbSchema.ApdbSchema` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And here.
Great! Your code snippet for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, some documentation and usage comments.
@@ -124,13 +126,10 @@ def readSchemaFromApdb(apdb): | |||
|
|||
Returns | |||
------- | |||
schemaTable : 'dict' of `lsst.dax.apdb.apdbSchema.ApdbSchema` | |||
schemaTable : dict[str, schema_model.Table | None] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make more sense to not make an entry instead of returning None
? I assume that's for tables that aren't in a particular DB instance, but it would be good to document that explicitly.
python/lsst/ap/association/utils.py
Outdated
@@ -89,7 +91,7 @@ def readSdmSchemaFile(schemaFile: str, | |||
|
|||
Returns | |||
------- | |||
schemaTable : 'dict' of `lsst.dax.apdb.apdbSchema.ApdbSchema` | |||
schemaTable : 'dict' of `lsst.dax.apdb.schema_model.Table` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
schemaTable : 'dict' of `lsst.dax.apdb.schema_model.Table` | |
schemaTable : `dict` [`str`, `lsst.dax.apdb.schema_model.Table`] |
Same for all the other places that say "'dict' of" instead of "dict
of"
tests/test_utils.py
Outdated
def test_make_empty_catalog(self): | ||
"""Check that an empty catalog has the correct format. | ||
""" | ||
apdb_config = daxApdb.ApdbSql.init_database(db_url="sqlite://") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, in-memory APDBs will be removed in a future version. But I think @andy-slac would need to deprecate them first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In-memory SQLite support is already removed since DM-43414. It might work for this particular unit test, but you still should not use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, I recommend using NamedTemporaryFile
as a context manager instead of using addCleanup
(which is more appropriate for stuff created in setUp
):
apdb_config = daxApdb.ApdbSql.init_database(db_url="sqlite://") | |
with tempfile.NamedTemporaryFile() as apdb_file: | |
apdb_config = daxApdb.ApdbSql.init_database(db_url="sqlite://" + apdb_file) | |
apdb = daxApdb.Apdb.from_config(apdb_config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was struggling to get your suggestion to work, and realized I only need the schema here, not the full APDB. I will use readSdmSchemaFile
instead of readSchemaFromApdb
to get the schema.
tests/test_utils.py
Outdated
config_file = tempfile.NamedTemporaryFile() | ||
self.addCleanup(config_file.close) | ||
apdb_config.save(config_file.name) | ||
apdb = daxApdb.Apdb.from_uri(config_file.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When writing this kind of code, I recommend reading the API docs instead of cargo-culting:
config_file = tempfile.NamedTemporaryFile() | |
self.addCleanup(config_file.close) | |
apdb_config.save(config_file.name) | |
apdb = daxApdb.Apdb.from_uri(config_file.name) | |
apdb = daxApdb.Apdb.from_config(apdb_config) |
74d70b9
to
dcd3631
Compare
dcd3631
to
899a360
Compare
No description provided.