Support for Streaming Conformer Transducer #178
This PR aims to advance the TODO list in #14.
@usimarit Hello! My question is: why is a Conv2D being used? I've double-checked the original Conformer paper, and it's supposed to be a Conv1D.
I noticed you're using a Here's what that would look like:
Good news @usimarit, most of the initial work is done. The model is now trainable, though a few things are still missing. I had to modify the base
Sorry for the late reply,
@andreselizondo-adestech We will have a big change in the repo structure, as in PR #177. Please be aware of that 😄 These changes will split the conformer file into the encoder file and the model file like this. I'm about to finish that PR, so you will have to pull main, create a new branch, and cherry-pick what you've done into the new structure 😄
Understood, I'll look into the new format :) Regarding the SeparableConv1D: I now see what you mean. It seems odd to me that DepthwiseConv1D only exists as a combination of both layers. This means the implementation is supported internally; it's just not exposed for us to use. I found this issue/PR (tensorflow/tensorflow#48557) on the TensorFlow repository. They intend to add support for the layer we need. However, the issue was opened less than 24 hours ago, so we'll have to wait and see how long it takes to be released into tf-nightly.
@andreselizondo-adestech We can build our own
@andreselizondo-adestech The refactor PR is merged 😄
Updates fork with refactoring on base repo
@usimarit I'm merging my changes into the refactored code; however, there appears to be an issue using
@andreselizondo-adestech Ah yeah, I missed that part, I'll update it.
@usimarit I've adapted my changes to the refactored repo and everything seems to be working. The next step is to create our own implementation of I've been digging into how TF does the SeparableConv1D, but they just call Could you help me out with this?
Seems like it's from the TF C/C++ library 😄
I'm currently running a test on two VMs: Regular Conformer vs. DepthwiseConv1D Conformer. @usimarit In the meantime, I'm not sure how inference should work for the Streaming Conformer.
@usimarit Good news! The two Conformer models converge to the same CER, meaning performance was not negatively impacted by the custom DepthwiseConv1D layer. In the meantime, I think we should look at how to do streaming inference on the Streaming Conformer Transducer.
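For readers following the DepthwiseConv1D discussion, here is a minimal, framework-agnostic sketch of what a depthwise 1D convolution with causal padding computes. It is written in NumPy purely for illustration and is not the layer from this PR; the function name `causal_depthwise_conv1d` and the `(time, channels)` layout are assumptions made for this example.

```python
import numpy as np


def causal_depthwise_conv1d(x, kernel):
    """Illustrative causal depthwise 1D convolution (hypothetical helper).

    x:      array of shape (time, channels)
    kernel: array of shape (kernel_size, channels), one filter per channel
    Returns an array of shape (time, channels).
    """
    kernel_size, channels = kernel.shape
    assert x.shape[1] == channels, "kernel and input must have the same channel count"

    # Causal padding: prepend kernel_size - 1 zeros along time, so that
    # output[t] depends only on x[t - kernel_size + 1 : t + 1].
    padded = np.pad(x, ((kernel_size - 1, 0), (0, 0)))

    out = np.zeros(x.shape, dtype=float)
    for k in range(kernel_size):
        # Depthwise: each channel is filtered independently (no channel mixing).
        out += padded[k:k + x.shape[0]] * kernel[k]
    return out
```

Because the padding is applied only on the left, changing a future frame cannot alter an earlier output, which is the property streaming inference relies on.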
Adds mask pre-compute when input max_length is defined.
@usimarit The next step is looking at the file
I haven't had time to dive into how the StreamingConformer works in inference mode, but I think it's quite different than the But anyway, we should complete the whole pipeline (training, inference, testing, tflite) before merging 😄
@usimarit Hey there, this is just a gentle ping.
@andreselizondo-adestech Sorry, I'm a bit busy until the end of July. After that, I can go back to supporting this feature 😄
Hello @usimarit
@andreselizondo-adestech Of course, I'll find some free time to help implement the inference of this
@andreselizondo-adestech Hi, are you still working on this?
This PR is an attempt at adding support for the Streaming Conformer Transducer network.
The changes that have been identified are:
- padding='causal'
  2.1. A parameter for history_window_size needs to be added to config and dataset preprocessing.
- padding to causal when streaming=True
- ASRMaskedDataset. It must compute the required mask using history_window_size.
- mask to MHSA layer (Create wrapper StreamingConformerEncoder for this?).
- DepthwiseConv1D with support for padding=causal
- ASRMaskedTFRecordDataset for working with TFRecords.
- time_reduction_factor into ASRMaskedDataset dynamically. (currently hardcoded)
- StreamingConformer class. Remove unnecessary methods copied from StreamingTransducer.

Deferred:
- MaskedTransducerTrainerGA for working with gradient accumulation. (GA is no longer supported)

All comments and edits are welcome.
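As a rough illustration of the mask that the dataset would compute from history_window_size, the sketch below builds a boolean attention mask in NumPy. It assumes each frame may attend to itself and to at most history_window_size previous frames, with no look-ahead; the exact semantics used in this PR may differ. The function name `history_window_mask` is hypothetical.

```python
import numpy as np


def history_window_mask(num_frames, history_window_size):
    """Boolean mask of shape (num_frames, num_frames) (hypothetical helper).

    mask[i, j] is True when query frame i is allowed to attend to key frame j.
    Assumption: frame i sees itself and up to `history_window_size` past frames.
    """
    idx = np.arange(num_frames)
    offset = idx[:, None] - idx[None, :]  # i - j: how far key j lies in the past
    return (offset >= 0) & (offset <= history_window_size)


# Example: 4 frames, each attending to itself and one previous frame.
mask = history_window_mask(4, 1)
```

In a Conformer encoder, attention typically runs on frames after convolutional subsampling, so a mask like this would presumably need to be sized to the reduced frame count, which is likely why time_reduction_factor has to reach the dataset.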