-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(python cdk): Update init files to expose classes that are part of the public interface #38297
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎ 1 Ignored Deployment
|
from .transformations import RecordTransformation | ||
from .declarative_stream import DeclarativeStream | ||
|
||
__all__ = ["ConnectionChecker", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
expose the interfaces of the declarative module since connector developer can create custom class inheriting them. Changing these interfaces should be considered a breaking change
|
||
__all__ = ["AirbyteTracedException", "SchemaInferrer", "is_cloud_environment"] | ||
__all__ = ["AirbyteTracedException", "SchemaInferrer", "is_cloud_environment", "as_airbyte_message", "resolve_refs"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolved_refs
is used by a few destinations so it should be part of the public interface. We can revisit separately if we don't want it to be
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a bit of digging to see if there are things being used that are being exposed. There are, but it is also super understandably hard to keep track! It's also hard to talk about a "top level interface" when we still encourage importing from all of these various init files. That's to say, I don't know how much it adds to expose things here when they're so close to the files that they're being imported from.
Most of the other python libraries we import from (dagster, pydantic, git, etc) put everything into their top level (e.g. most things we import from them we can or should be importing with a from dagster import X
instead of from dagster.some.specific.path import X
). WDYT about us working toward this? We could take 1 of 2 approaches:
- Put everything that's currently used in the top-level init. This is easier to track because as you add things to the init, you can temporarily find and replace the code in the connectors until everything is just
from airbyte_cdk import X
. People can move over to the new imports now, and then we start removing things as necessary in later majors. Less obvious visibility into what people are using that we want to stop supporting, but we can still search for usages I guess - Use that top level file as our "ideal", such that we put everything we want exposed there, and don't expose anything else. Then if we get people to move to importing things from base
airbyte_cdk
when it's in the list, we start to have visibility into which things people are importing that we aren't intentionally exposing. Downside here is its a lot of work to update the "desired" imports and distinguish them from the ones that can't be updated
airbyte-cdk/python/airbyte_cdk/sources/streams/http/__init__.py
Outdated
Show resolved
Hide resolved
@@ -7,6 +7,7 @@ | |||
from .abstract_source import AbstractSource | |||
from .config import BaseConfig | |||
from .source import Source | |||
from .types import Config, Record |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to add these to __all__
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kinda think we can go either way with these. We could go all in on the newly defined types. And try to normalize or encourage our Python sources to use stronger types like the StreamSlice
concept. Or like Ella mentioned we just don't expose them. Until a week ago these didn't even exist in Python CDK so most connectors probably don't refrerence them
Two questions:
- Do we want to expose the
StreamSlice
concept which might help add some more sanity to our Python susbstreams? - Right now our method interfaces don't account for a lot of these types so it's not intuitive to use the new classes. But if we're making the 1.0.0 bump, would it be a good idea to improve the method interfaces?
airbyte-cdk/python/airbyte_cdk/sources/streams/concurrent/__init__.py
Outdated
Show resolved
Hide resolved
@erohmensing I'll move classes to the top-level init. that's a good idea. I have a soft-preference for the first approach because it'll make it clear that modifying these classes is a breaking change while others can be considered "internal". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! If we wanted to, we could stop exporting the classes from the lower init files that aren't extras-based, which would start to force people to move to the top-level imports if they're using those and not the actual immediate file-based imports. I'm not sure its necessary, but it might also make one less import interface.
Can you please also update the 1.0 migration guide to tell people that they should now import directly from airbyte_cdk
and that other imports will be considered non-stable (i.e. we can move them around)?
What
Update CDK init files to expose classes that are currently imported by a connector.
How
git grep 'from airbyte_cdk' -- . | cat | awk -F ':' '{print $2}' | sort | uniq > imports.txt
to find all imports fromairbyte-integrations/connectors
__all__
test
package is exported as a package to make it clear that it's separate from the other classesairbyte_cdk.sources.declarative.schema.json_file_schema_loader._default_file_path
even though it's used by a connector since it's marked as private. We should instead update the connectorairbyte_cdk.sources.declarative.models
even though two are used by a connector. We shouldn't need to import these classes since they're only used to parse from the YAML representation of low-code connectors to their python implementation. The connector should instead of updated to use the declarativeinterfaces.The classes from the
destination.vector_db_based
aren't at the top level because the submodule depends on thevector-db-based
extra.The classes from
source.file_based
module aren't at the top level because the submodule depends on the file-based extraFollow up issues to force connector modules to import from the top-level module
Can this PR be safely reverted and rolled back?