RuntimeError for class values on conformer_ctc. #240
Could you change icefall/egs/librispeech/ASR/conformer_ctc/train.py, lines 413 to 416 in 1603744, to

    unsorted_token_ids = graph_compiler.texts_to_ids(
        supervisions["text"]
    )
    import pdb
    pdb.set_trace()
    att_loss = mmodel.decoder_forward(

When it enters pdb, you can print out the value of |
my output: (Pdb) print(unsorted_token_ids) |
Is this batch causing the above error? |
I guess so. It gives error |
You can continue running by
If it throws again, the last output of Also, you can use a |
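As a complement to stepping through pdb, here is a minimal sketch of a check that scans a batch of token ids and reports any entry outside the valid class range. It assumes `unsorted_token_ids` is a list of lists of integers; the helper name `find_invalid_token_ids` and the value of `num_classes` are hypothetical and not part of icefall.

```python
from typing import List, Tuple


def find_invalid_token_ids(
    token_ids: List[List[int]], num_classes: int
) -> List[Tuple[int, int, int]]:
    """Return (utterance_index, position, token_id) for ids outside [0, num_classes)."""
    bad = []
    for utt_idx, ids in enumerate(token_ids):
        for pos, tok in enumerate(ids):
            if not 0 <= tok < num_classes:
                bad.append((utt_idx, pos, tok))
    return bad


# Toy batch (made-up values): the second utterance contains a negative id.
batch = [[5, 12, 7], [3, -1, 9], [0, 499]]
problems = find_invalid_token_ids(batch, num_classes=500)
print(problems)  # [(1, 1, -1)]
```

Calling something like this right before the decoder forward pass would show which utterance in the batch carries a bad id, without stopping at every batch.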
When I did
|
icefall/egs/librispeech/ASR/conformer_ctc/label_smoothing.py Lines 78 to 83 in 1603744
At line 79, entries of -1 in Did you make any changes to the code? |
Nope. I did not change anything in label_smoothing.py. The ignored variable looks good, but the assignment to the target variable is not working, I guess.
|
What are the values of |
result:
|
When you print out the value of |
result:
|
OK, that's strange. What is the version of your PyTorch? What is the output of the following code with your PyTorch?

    #!/usr/bin/env python3
    import torch
    target = torch.tensor([1, 3, -1, -1, 2])
    ignored = target == -1
    print(ignored)
    target[ignored] = 0
    print(target)

It outputs
on my computer. |
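The snippet above builds the tensor on the CPU, while a later comment in this thread attributes the failure to CUDA tensors. A minimal sketch that runs the same check on every available device might look like this. The helper name `masked_assign_works` is hypothetical, and the CUDA branch only runs when a GPU is visible.

```python
import torch


def masked_assign_works(device: str) -> bool:
    """Run the boolean-mask assignment on `device` and report whether all -1 entries were replaced."""
    target = torch.tensor([1, 3, -1, -1, 2], device=device)
    ignored = target == -1
    target[ignored] = 0
    return bool((target >= 0).all())


# Always test the CPU; add CUDA only if it is available.
devices = ["cpu"] + (["cuda"] if torch.cuda.is_available() else [])
results = {d: masked_assign_works(d) for d in devices}
print(torch.__version__, results)
```

If the entry for "cuda" comes back False while "cpu" is True, that would point at the device-specific indexing problem rather than at the training script.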
torch==1.7.1
|
It is very odd that |
FWIW, I was getting the same issue where indexing did not seem to be changing the -1 to 0. I made the following changes in label_smoothing.py to make it work:

    # target[ignored] = 0
    target = torch.where(ignored, torch.zeros_like(target), target)

and

    # true_dist[ignored] = 0
    true_dist = torch.where(
        ignored.unsqueeze(1).repeat(1, true_dist.shape[1]),
        torch.zeros_like(true_dist),
        true_dist,
    )

and then it worked. It seems to be related to some PyTorch errors other people have encountered, where logical indexing fails with CUDA tensors. |
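For anyone who wants to try the workaround in isolation, here is a self-contained sketch of the same idea: build the smoothed target distribution with torch.where instead of in-place boolean-mask assignment. The function name `build_smoothed_targets`, its default arguments, and the smoothing formula (a common form of label smoothing) are assumptions for illustration, not the actual contents of label_smoothing.py.

```python
import torch


def build_smoothed_targets(target, num_classes, ignore_index=-1, smoothing=0.1):
    """Return (true_dist, ignored) without modifying `target` in place."""
    ignored = target == ignore_index
    # Replace ignored labels with a valid class id so one_hot never sees a negative value.
    safe_target = torch.where(ignored, torch.zeros_like(target), target)
    true_dist = torch.nn.functional.one_hot(safe_target, num_classes=num_classes).float()
    # Assumed smoothing form: scale the one-hot rows, then add a uniform share to every class.
    true_dist = true_dist * (1 - smoothing) + smoothing / num_classes
    # Zero the rows of ignored positions; the (N, 1) mask broadcasts over the class dimension.
    true_dist = torch.where(ignored.unsqueeze(1), torch.zeros_like(true_dist), true_dist)
    return true_dist, ignored


target = torch.tensor([1, 3, -1, -1, 2])
true_dist, ignored = build_smoothed_targets(target, num_classes=4)
print(ignored)
print(true_dist)
```

This sketch relies on broadcasting in torch.where rather than the explicit repeat used in the comment above; both give the same mask. Because nothing is assigned in place, the original `target` tensor keeps its -1 entries.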
Thanks! |
Fixed by #300 Feel free to re-open it if the issue still exists. |
Hi, I was migrating my environment over to a newly set up Docker container and faced this exact issue too, because I was using an older version of "label_smoothing.py" in my program.
When I updated my code, this is the new error, presumably because
Anyhow, for lack of a more elegant solution, I built a new Icefall image based on pytorch/pytorch:1.9.0-cuda11.1-cudnn8-devel, and have just successfully started training. So far no issues. I noticed that this peculiar error seemed to happen for torch=1.7.x and torch=1.8.x, which the current Icefall Dockerfile is based on. Are there any plans to update the Dockerfile? |
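If it helps others decide whether to rebuild their image, here is a small sketch that flags version strings in the 1.7.x and 1.8.x range. The range is taken only from the observation above, not from any confirmed PyTorch bug report, and the helper name `is_reported_affected` is hypothetical.

```python
import re


def is_reported_affected(version: str) -> bool:
    """True if `version` looks like a 1.7.x or 1.8.x release, e.g. "1.7.1" or "1.8.0+cu111"."""
    match = re.match(r"^(\d+)\.(\d+)", version)
    if match is None:
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) in {(1, 7), (1, 8)}


for v in ["1.7.1", "1.8.0+cu111", "1.9.0"]:
    print(v, is_reported_affected(v))
```

In a real environment you would pass `torch.__version__` to the function instead of a hard-coded string.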
If you have time, would you mind updating the Dockerfile? To be honest, I am not using icefall in a Docker container. |
Yes sure. However, there was a recent Nvidia issue where the apt-keys are being "rotated". The base images don't seem to have caught on with the change yet, so perhaps this is not a good time to update the Dockerfile anyway. I plan to have a go at updating the Dockerfile after my current training is done. Is there any recommended environment that has been proven to be stable at this point in time? |
Sorry, I don't have much experience with docker. I think your current working version is fine. |
Hi k2
I am trying to run "librispeech/ASR/conformer_ctc/". I built a Docker image from your Dockerfile and am using the LibriSpeech dataset. However, it gives an error that I could not fix.
I use this command:
conformer_ctc/train.py --max-duration 140
Thanks,
Mesut