-
-
Notifications
You must be signed in to change notification settings - Fork 467
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a script for customizing pipelines #147
Conversation
This project includes a script to merge pipelines, handling issues like whether to use replace_listeners and giving warnings when replace_listeners isn't suitable. Currently this should work well if a secondary pipeline has only one listener, or if it's ok that a merged pipeline can't be trained. However, there are still many things that can be improved, and questions about how to do things like ensure compatibility between the pipelines to be merged. I'm also not sure a project is the best structure for this - it might also make sense as a spaCy command. This seemed to be the best way to share it for discussion though.
I don't like this name a lot, but it's not just merging pipelines any more.
Still thinking about this and pretty sure it should be a spaCy command rather than a project, but it works as is and should be fine for demonstrating the feature and goals. This now has the feature that it can take a pipeline and generate a config modified to use a transformer/CNN tok2vec, regardless of what the original pipeline used. This is pretty different from the pipeline merging feature, but because it needs to find details of the pipeline that aren't directly exposed - like a list of listener components - there ended up being overlap. |
I can see that there are shared utility methods, but I think it's confusing to have a script that sometimes outputs a pipeline and sometimes outputs a config. |
You're absolutely right that having them in the same script is weird. I have split them up. I am still not sure that doing this as a project makes the most sense, and I am trying to come up with better names, especially for the tok2vec rewriting component. It feels like these should be spaCy commands, but I'm not sure where they would fit in the hierarchy, and I don't want to give them each their own place. |
I added a script to generate a config for resuming training, which ended up being really simple to generate. I think I am getting a better idea of how to present these, I'll gather my thoughts about them. |
This continues work started in explosion/projects#147, which provides features for automatically manipulating pipelines and configs. The functions included are: - merge: combine components from two pipelines and handle listeners - use_transformer: use transformer as feature source - use_tok2vec: use CNN tok2vec as feature source - resume: make a version of a config for resuming training Currently these are all grouped under a new `spacy configure` command. That may not be the best place for them; in particular, `merge` may belong elsewhere, since it outputs a pipeline rather than a config. The current state of the PR is that the commands run, but there's only one small test, and docs haven't been written yet. Docs can be started but will depend somewhat on how the naming issues work out.
Closing this in favor of explosion/spaCy#12020, since in the end these do make more sense as commands. |
This project includes a script to merge pipelines, handling issues like whether to use
replace_listeners
and giving warnings whenreplace_listeners
isn't suitable.Currently this should work well if a secondary pipeline has only one listener, or if it's ok that a merged pipeline can't be trained. However, there are still many things that can be improved, and questions about how to do things like ensure compatibility between the pipelines to be merged.
I'm also not sure a project is the best structure for this - it might also make sense as a spaCy command. This seemed to be the best way to share it for discussion though.