Hi @sujee, I think there are two things in this issue:
1- I am not a big fan of exposing all the inner interface parameters to the end user. We do not have enough experience yet to know what works and what breaks, so, at least in the short term, I think we should add parameters as they are needed. The turnaround should not be long once we have a requirement to expose a new feature, and it gives us a chance to test the capabilities incrementally as we open up more functionality.
2- As for Domain and Path focus, I thought they were set to true by default. Do you have a test script I can use to confirm one way or the other? In any case, I would be OK with a PR that exposes those two flags.
Thanks
Search before asking
Component
Transforms/Other
Feature
dpk_connector supports limiting the crawl to a domain or a path:
https://github.com/IBM/data-prep-kit/blob/dev/data-connector-lib/doc/overview.md
These parameters should also be exposed through the simpler API:
dpk_web2parquet.transform.Web2Parquet
This is important for keeping the crawl within a single domain, so the crawler doesn't follow links to other domains.
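To make the requested behaviour concrete, here is a minimal, standalone sketch of what domain and path focus mean for a crawler's link filter. It does not call dpk_connector or Web2Parquet. The helper `is_in_scope` and its `domain_focus`/`path_focus` flags are hypothetical names chosen for illustration, and the sketch assumes an exact hostname match (so subdomains count as other domains) and a directory-style path prefix. dpk_connector's actual rules may differ.

```python
from urllib.parse import urlparse


def is_in_scope(seed_url: str, candidate_url: str,
                domain_focus: bool = True, path_focus: bool = True) -> bool:
    """Decide whether a discovered link should be followed from seed_url.

    Hypothetical helper for illustration only; NOT dpk_connector's implementation.
    """
    seed = urlparse(seed_url)
    cand = urlparse(candidate_url)

    # Only follow web links (skip mailto:, javascript:, etc.).
    if cand.scheme not in ("http", "https"):
        return False

    # Domain focus (assumed semantics): the hostname must match the seed exactly.
    # urlparse lowercases .hostname, so the comparison is case-insensitive.
    if domain_focus and cand.hostname != seed.hostname:
        return False

    # Path focus (assumed semantics): treat the seed path as a directory prefix,
    # so "/docs" matches "/docs" and "/docs/..." but not "/docsx".
    if path_focus:
        prefix = seed.path.rstrip("/")
        if prefix and not (cand.path == prefix or cand.path.startswith(prefix + "/")):
            return False

    return True


if __name__ == "__main__":
    seed = "https://example.com/docs/"
    for url in [
        "https://example.com/docs/intro.html",  # same host, under /docs
        "https://other.org/docs/intro.html",    # different host
        "https://example.com/blog/post",        # same host, outside /docs
    ]:
        print(url, is_in_scope(seed, url))
```

Keeping the two flags independent mirrors the request: domain focus alone keeps the crawler on one site, while path focus additionally confines it to one section of that site.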
Are you willing to submit a PR?