Add a script for customizing pipelines #147

polm · 2022-11-11T09:55:51Z

This project includes a script to merge pipelines, handling issues like whether to use replace_listeners and giving warnings when replace_listeners isn't suitable.

Currently this should work well if a secondary pipeline has only one listener, or if it's ok that a merged pipeline can't be trained. However, there are still many things that can be improved, and open questions such as how to ensure compatibility between the pipelines to be merged.
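As a rough illustration of the "only one listener" condition, here is a minimal sketch, not the script's actual logic. It assumes the listeners of the secondary pipeline have already been collected into a plain list of component names; the function name `check_replace_listeners` and the warning text are hypothetical. It relies on the documented behaviour of `replace_listeners`, which swaps a listener for a standalone copy of the token-to-vector layer.

```python
import warnings
from typing import List


def check_replace_listeners(listeners: List[str]) -> bool:
    """Return True if replacing listeners looks unproblematic.

    `listeners` is a hypothetical list of component names in the secondary
    pipeline that listen to a shared tok2vec or transformer component.
    """
    if len(listeners) <= 1:
        # Zero or one listener: no embedding layer is shared between
        # components, so a standalone copy loses nothing.
        return True
    # Several listeners: each would get its own copy of the embedding layer,
    # which is the case the description says may leave the merged pipeline
    # unsuitable for training.
    warnings.warn(
        f"{len(listeners)} components share one embedding layer "
        f"({', '.join(listeners)}); replacing listeners gives each its own "
        "copy, so the merged pipeline may not be suitable for training."
    )
    return False
```

Calling `check_replace_listeners(["ner"])` returns `True` without a warning, while `check_replace_listeners(["tagger", "parser"])` warns and returns `False`.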

I'm also not sure a project is the best structure for this - it might also make sense as a spaCy command. This seemed to be the best way to share it for discussion though.

polm · 2022-11-22T10:56:10Z

Still thinking about this and pretty sure it should be a spaCy command rather than a project, but it works as is and should be fine for demonstrating the feature and goals.

This now has the feature that it can take a pipeline and generate a config modified to use a transformer/CNN tok2vec, regardless of what the original pipeline used. This is pretty different from the pipeline merging feature, but because it needs to find details of the pipeline that aren't directly exposed - like a list of listener components - there ended up being overlap.
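To make "a list of listener components" concrete, here is a minimal sketch of one way to find them, assuming the pipeline's config is available as a plain nested dict (for example, the contents of a `config.cfg` after parsing). The helper names `has_listener` and `find_listener_components` are hypothetical and not taken from the script. The two architecture names are the listener architectures registered by spaCy and spacy-transformers.

```python
from typing import Any, Dict, List

# Listener architectures registered by spaCy and spacy-transformers.
LISTENER_ARCHS = (
    "spacy.Tok2VecListener.v1",
    "spacy-transformers.TransformerListener.v1",
)


def has_listener(block: Any) -> bool:
    """Recursively check whether a config block contains a listener layer."""
    if not isinstance(block, dict):
        return False
    if block.get("@architectures") in LISTENER_ARCHS:
        return True
    return any(has_listener(value) for value in block.values())


def find_listener_components(config: Dict[str, Any]) -> List[str]:
    """Return the names of components whose config contains a listener."""
    components = config.get("components", {})
    return [name for name, block in components.items() if has_listener(block)]


# Hypothetical config: the tagger listens to the shared tok2vec, while the
# senter has its own embedding layer.
example = {
    "components": {
        "tok2vec": {
            "factory": "tok2vec",
            "model": {"@architectures": "spacy.Tok2Vec.v2"},
        },
        "tagger": {
            "factory": "tagger",
            "model": {
                "@architectures": "spacy.Tagger.v1",
                "tok2vec": {
                    "@architectures": "spacy.Tok2VecListener.v1",
                    "width": 96,
                    "upstream": "*",
                },
            },
        },
        "senter": {
            "factory": "senter",
            "model": {
                "@architectures": "spacy.Tagger.v1",
                "tok2vec": {"@architectures": "spacy.HashEmbedCNN.v2"},
            },
        },
    }
}

print(find_listener_components(example))  # ['tagger']
```

Walking the config rather than the loaded model keeps the sketch free of dependencies. The actual script may inspect the loaded pipeline instead.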

adrianeboyd · 2022-11-23T10:50:12Z

I can see that there are shared utility methods, but I think it's confusing to have a script that sometimes outputs a pipeline and sometimes outputs a config.

polm · 2022-11-30T11:15:47Z

You're absolutely right that having them in the same script is weird. I have split them up.

I am still not sure that doing this as a project makes the most sense, and I am trying to come up with better names, especially for the tok2vec rewriting component.

It feels like these should be spaCy commands, but I'm not sure where they would fit in the hierarchy, and I don't want to give them each their own place.

polm · 2022-12-02T09:54:34Z

I added a script to generate a config for resuming training, which ended up being really simple to generate.
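A minimal sketch of why this can be simple, under the assumption that "resuming" means sourcing every component from the already-trained pipeline through the `source` key that spaCy configs support. The function name `make_resume_config` and the path `training/model-best` are hypothetical, and the real script may adjust other settings as well.

```python
import copy
from typing import Any, Dict


def make_resume_config(config: Dict[str, Any], pipeline_path: str) -> Dict[str, Any]:
    """Return a copy of `config` where each component is sourced from
    the trained pipeline at `pipeline_path`."""
    resumed = copy.deepcopy(config)
    components = resumed.setdefault("components", {})
    for name in resumed.get("nlp", {}).get("pipeline", []):
        # Replace the component definition with a reference to the trained
        # pipeline, so training starts from its existing weights.
        components[name] = {"source": pipeline_path}
    return resumed


# Hypothetical base config with two components.
base = {
    "nlp": {"lang": "en", "pipeline": ["tok2vec", "ner"]},
    "components": {
        "tok2vec": {"factory": "tok2vec"},
        "ner": {"factory": "ner"},
    },
}

resumed = make_resume_config(base, "training/model-best")
print(resumed["components"]["ner"])  # {'source': 'training/model-best'}
```

The original config is left untouched, so the same base can be reused to generate variants.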

I think I am getting a better idea of how to present these. I'll gather my thoughts about them.

This continues work started in explosion/projects#147, which provides features for automatically manipulating pipelines and configs. The functions included are:

- merge: combine components from two pipelines and handle listeners
- use_transformer: use transformer as feature source
- use_tok2vec: use CNN tok2vec as feature source
- resume: make a version of a config for resuming training

Currently these are all grouped under a new `spacy configure` command. That may not be the best place for them; in particular, `merge` may belong elsewhere, since it outputs a pipeline rather than a config.

The current state of the PR is that the commands run, but there's only one small test, and docs haven't been written yet. Docs can be started but will depend somewhat on how the naming issues work out.

polm · 2022-12-23T10:10:14Z

Closing this in favor of explosion/spaCy#12020, since in the end these do make more sense as commands.

polm mentioned this pull request Nov 15, 2022

Add equality definition for vectors explosion/spaCy#11806

Merged

3 tasks

polm added 5 commits November 22, 2022 18:17

Add tok2vec commands, use typer properly

50b7e65

Update docstrings

a48d1a9

Rename to "customize"

9bcc698

I don't like this name a lot, but it's not just merging pipelines any more.

Rewrite project file

c32f8dc

Update project.yml, add README

aba4163

polm changed the title ~~Add a script for merging pipelines~~ Add a script for customizing pipelines Nov 22, 2022

Split functions into separate scripts

d1f8bd5

polm added 2 commits December 2, 2022 18:45

Fix relative imports

2cd7546

Add resume training script

d167e19

polm mentioned this pull request Dec 14, 2022

Change GoEmotions to use spacy assemble #140

Merged

polm mentioned this pull request Dec 23, 2022

Add commands for automatically modifying configs explosion/spaCy#12020

Open

3 tasks

polm closed this Dec 23, 2022

chyccs mentioned this pull request Jan 25, 2023

fixed: fix example workflows script and file listing logic chyccs/pull-request-typography#3

Merged
