-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add commands for automatically modifying configs #12020
base: v4
Are you sure you want to change the base?
Conversation
This continues work started in explosion/projects#147, which provides features for automatically manipulating pipelines and configs. The functions included are: - merge: combine components from two pipelines and handle listeners - use_transformer: use transformer as feature source - use_tok2vec: use CNN tok2vec as feature source - resume: make a version of a config for resuming training Currently these are all grouped under a new `spacy configure` command. That may not be the best place for them; in particular, `merge` may belong elsewhere, since it outputs a pipeline rather than a config. The current state of the PR is that the commands run, but there's only one small test, and docs haven't been written yet. Docs can be started but will depend somewhat on how the naming issues work out.
Maybe this will fix the CI issue?
This reverts commit be95ef5.
Adding the transformer component requires spacy-transformers, which isn't present in the normal test env.
This also change the `output_file` arg to match other commands.
This might require more adjustment - for example, maybe merge should be split out into a separate command - but it is substantially complete and ready for review. Any feedback on how to make the design clearer would be welcome. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this, I think this functionality can be super handy! I just did a superficial pass now and will continue reviewing at a later point.
I agree - having |
Thanks for the feedback, I moved |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from the command naming and some smaller stuff this looks good, but I'd definitely get feedback from a third party before moving forward.
Co-authored-by: Raphael Mitsch <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a lot of features in this PR and ideally we would have split this up into separate PRs to make reviewing easier and the commits in the history more atomic. I guess we can keep it as is for now, but we'll want to review this carefully as there's a lot going on :-)
"pooling": {"@layers": "reduce_mean.v1"}, | ||
} | ||
nlp.config["components"][listener]["model"]["tok2vec"] = listener_config | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would also need to update the [training]
block.
"upstream": "tok2vec", | ||
} | ||
nlp.config["components"][listener]["model"]["tok2vec"] = listener_config | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may also need to update the [training]
block. (I know that tok2vec
->transformer
doesn't work. I'm not 100% sure it doesn't work the other way around, but probably the tok2vec
defaults are better.)
TOK2VEC_ARCHS = [ | ||
("spacy", "Tok2Vec"), | ||
("spacy", "HashEmbedCNN"), | ||
("spacy-transformers", "TransformerModel"), | ||
] | ||
# These are the listeners. | ||
LISTENER_ARCHS = [ | ||
("spacy", "Tok2VecListener"), | ||
("spacy-transformers", "TransformerListener"), | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there's a more general way to determine these lists?
Description
This continues work started in explosion/projects#147, which provides features for automatically manipulating pipelines and configs. The functions included are:
Currently these are all grouped under a new
spacy configure
command. That may not be the best place for them; in particular,merge
may belong elsewhere, since it outputs a pipeline rather than a config.The current state of the PR is that the commands run, but there's only one small test, and docs haven't been written yet. Docs can be started but will depend somewhat on how the naming issues work out.
Types of change
enhancement
Checklist