Using CR-CTC to train, decoded results are all start and end characters. #1780
-
When using CR-CTC to train from a randomly initialized model, the decoded results consistently converge to outputting only the start and end characters. Disabling the CR loss allows training to proceed normally. However, even when initializing from a model pre-trained with the CTC loss alone, training still eventually converges to outputting only the start and end characters. In my model, the time_mask and frequency_mask each mask only a single segment, with the window size drawn as a random integer. Could this be causing the issue, or might there be other factors contributing to this behavior?
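Roughly, the masking behaves like the sketch below (illustrative only, not the exact code from my model; `mask_one_segment`, `spec`, and `max_width` are placeholder names). Calling it once along the time axis and once along the frequency axis gives the single masked segments mentioned above.

```python
import torch

def mask_one_segment(spec: torch.Tensor, max_width: int, dim: int) -> torch.Tensor:
    """Zero out one contiguous block of `spec` (num_frames, num_bins).

    dim=0 masks a time segment, dim=1 masks a frequency segment.
    The window width is a random integer in [0, max_width].
    """
    size = spec.size(dim)
    width = int(torch.randint(0, max_width + 1, ()).item())
    if width == 0 or width >= size:
        return spec
    start = int(torch.randint(0, size - width + 1, ()).item())
    index = [slice(None), slice(None)]
    index[dim] = slice(start, start + width)
    spec = spec.clone()
    spec[tuple(index)] = 0.0
    return spec
```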
Replies: 1 comment 14 replies
-
@masterjade7 What dataset are you using? Could you show your training script? Which loss do you use besides the CR-CTC loss? Have you tried regular SpecAugment?
If you use `reduction="mean"` in `torch.nn.functional.ctc_loss`, you can divide the final `cr_loss` by the total number of frames over the batch; you could use `encoder_out_lens.sum().item()` for that. For the CTC loss, I would suggest using `reduction="sum"` in `torch.nn.functional.ctc_loss` and dividing it by the total number of frames over the batch. This would make the relative scale consistent with our settings. Like this (see the commented part):
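(The snippet referenced above is not reproduced here. Below is a minimal sketch of the suggested scaling, assuming `log_probs_1`/`log_probs_2` are `(T, N, C)` log-probabilities from two augmented views and that a symmetric KL term stands in for the actual CR loss; the function and variable names are illustrative, not the icefall implementation.)

```python
import torch
import torch.nn.functional as F

def compute_losses(log_probs_1, log_probs_2, targets,
                   encoder_out_lens, target_lens):
    # Total number of encoder frames in the batch, used to normalize
    # both losses to a comparable scale.
    total_frames = encoder_out_lens.sum().item()

    # CTC loss with reduction="sum", then divided by the total frame count,
    # as suggested above.
    ctc = F.ctc_loss(
        log_probs_1,
        targets,
        input_lengths=encoder_out_lens,
        target_lengths=target_lens,
        reduction="sum",
        zero_infinity=True,
    )
    ctc = ctc / total_frames

    # A simple consistency term between the two views (symmetric KL here,
    # purely illustrative; in practice one branch is usually detached).
    # It is divided by the same frame count so its scale matches the CTC loss.
    cr = 0.5 * (
        F.kl_div(log_probs_1, log_probs_2, log_target=True, reduction="sum")
        + F.kl_div(log_probs_2, log_probs_1, log_target=True, reduction="sum")
    )
    cr = cr / total_frames

    return ctc, cr
```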