Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-44850: Add utility to create empty tables #223

Merged
merged 1 commit into from
Jun 19, 2024
Merged

Conversation

isullivan
Copy link
Contributor

No description provided.

Copy link
Member

@kfindeisen kfindeisen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried that ApdbSchema is, like _make_empty_catalog, an implementation detail that may break at any time. I'd like feedback from @andy-slac before approving.

python/lsst/ap/association/utils.py Show resolved Hide resolved
@@ -35,6 +35,7 @@
from lsst.pipe.base.testUtils import assertValidOutput

from lsst.ap.association import DiaPipelineTask
from lsst.ap.association.utils import make_empty_catalog, convertTableToSdmSchema
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest moving this to a test_utils.py.

def test_make_empty_catalog(self):
"""Check that an empty catalog has the correct format.
"""
tableName = "DiaObject"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it also work with other tables, e.g. DiaForcedSource?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should work with any of the tables. I'll add a for loop over a few tables.

tableName = "DiaObject"
config = self._makeDefaultConfig(config_file=self.config_file.name)
diaPipeTask = DiaPipelineTask(config=config)
emptyDiaObjects = make_empty_catalog(diaPipeTask.schema, tableName=tableName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you provide a test schema directly, then there's no need to mix DiaPipelineTask into the test...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is true, though here I wanted to also test that diaPipeTask.schema is getting set correctly. That's why I added this test to test_diaPipe instead of creating a test_utils, but I am happy to move it out and just make the schema directly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that's actually being tested here -- the only thing you compare the schema to is itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I will move it to test_utils, and also add a check of a few of the values.

emptyDf = pd.DataFrame(columns=["diaObjectId",])
emptyDf.set_index("diaObjectId")
convertedEmptyDiaObjects = convertTableToSdmSchema(diaPipeTask.schema, emptyDf, tableName=tableName)
self.assertSetEqual(set(convertedEmptyDiaObjects.columns), set(emptyDiaObjects.columns))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're comparing two set, then assertSetEqual is redundant.

Suggested change
self.assertSetEqual(set(convertedEmptyDiaObjects.columns), set(emptyDiaObjects.columns))
self.assertEqual(set(convertedEmptyDiaObjects.columns), set(emptyDiaObjects.columns))

Copy link
Contributor

@andy-slac andy-slac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Krzysztof's sentiment - ApdbSchema presently is not in public API. And I think that you do not actually have to use it, as Apdb has a method to return table schema which wraps access to ApdbSchema. I'd probably re-implement readSchemaFromApdb like this:

from lsst.dax.apdb import ApdbTables, schema_model

def readSchemaFromApdb(apdb: Apdb) -> dict[str, schema_model.Table | None]:
    """...."""
    return {table.table_name(): apdb.tableDef(table) for table in ApdbTables}

OTOH, if there is something else in ApdbSchema that you want to reuse, we can definitely extend Apdb's public interface.

@@ -141,7 +142,7 @@ def convertTableToSdmSchema(apdbSchema, sourceTable, tableName):

Parameters
----------
apdbSchema : `lsst.dax.apdb.apdbSchema.ApdbSchema`
apdbSchema : 'dict' of `lsst.dax.apdb.apdbSchema.ApdbSchema`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be lsst.dax.apdb.schema_model.Table instead of lsst.dax.apdb.apdbSchema.ApdbSchema. Same on the line 128 above.

@@ -182,7 +183,7 @@ def dropEmptyColumns(apdbSchema, sourceTable, tableName):

Parameters
----------
apdbSchema : `lsst.dax.apdb.apdbSchema.ApdbSchema`
apdbSchema : 'dict' of `lsst.dax.apdb.apdbSchema.ApdbSchema`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lsst.dax.apdb.schema_model.Table here too.


Parameters
----------
apdbSchema : 'dict' of `lsst.dax.apdb.apdbSchema.ApdbSchema`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here.

@isullivan
Copy link
Contributor Author

Great! Your code snippet for readSchemaFromApdb works for my tests, so I have switched over to using it.

Copy link
Member

@kfindeisen kfindeisen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, some documentation and usage comments.

@@ -124,13 +126,10 @@ def readSchemaFromApdb(apdb):

Returns
-------
schemaTable : 'dict' of `lsst.dax.apdb.apdbSchema.ApdbSchema`
schemaTable : dict[str, schema_model.Table | None]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make more sense to not make an entry instead of returning None? I assume that's for tables that aren't in a particular DB instance, but it would be good to document that explicitly.

@@ -89,7 +91,7 @@ def readSdmSchemaFile(schemaFile: str,

Returns
-------
schemaTable : 'dict' of `lsst.dax.apdb.apdbSchema.ApdbSchema`
schemaTable : 'dict' of `lsst.dax.apdb.schema_model.Table`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
schemaTable : 'dict' of `lsst.dax.apdb.schema_model.Table`
schemaTable : `dict` [`str`, `lsst.dax.apdb.schema_model.Table`]

Same for all the other places that say "'dict' of" instead of "dict of"

def test_make_empty_catalog(self):
"""Check that an empty catalog has the correct format.
"""
apdb_config = daxApdb.ApdbSql.init_database(db_url="sqlite://")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, in-memory APDBs will be removed in a future version. But I think @andy-slac would need to deprecate them first?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In-memory SQLite support is already removed since DM-43414. It might work for this particular unit test, but you still should not use it.

Copy link
Member

@kfindeisen kfindeisen Jun 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, I recommend using NamedTemporaryFile as a context manager instead of using addCleanup (which is more appropriate for stuff created in setUp):

Suggested change
apdb_config = daxApdb.ApdbSql.init_database(db_url="sqlite://")
with tempfile.NamedTemporaryFile() as apdb_file:
apdb_config = daxApdb.ApdbSql.init_database(db_url="sqlite://" + apdb_file)
apdb = daxApdb.Apdb.from_config(apdb_config)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was struggling to get your suggestion to work, and realized I only need the schema here, not the full APDB. I will use readSdmSchemaFile instead of readSchemaFromApdb to get the schema.

Comment on lines 37 to 40
config_file = tempfile.NamedTemporaryFile()
self.addCleanup(config_file.close)
apdb_config.save(config_file.name)
apdb = daxApdb.Apdb.from_uri(config_file.name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When writing this kind of code, I recommend reading the API docs instead of cargo-culting:

Suggested change
config_file = tempfile.NamedTemporaryFile()
self.addCleanup(config_file.close)
apdb_config.save(config_file.name)
apdb = daxApdb.Apdb.from_uri(config_file.name)
apdb = daxApdb.Apdb.from_config(apdb_config)

@isullivan isullivan merged commit 639f7d8 into main Jun 19, 2024
2 checks passed
@isullivan isullivan deleted the tickets/DM-44850 branch June 19, 2024 05:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants