After basecalling with dorado, nanopolish was unable to recognize methylation information #1140

Open
happier21 opened this issue Apr 7, 2024 · 7 comments

Comments

@happier21

Dear all,

In the "Quickstart - calling methylation with nanopolish" section of the nanopolish usage instructions, the guppy basecaller is used to basecall the reads. I replaced guppy with ONT's newly recommended basecaller, dorado. The rest of the steps remained the same, but far less methylation information was reported than with guppy. Is it because dorado and nanopolish are not compatible?
I examined my process log and found that the number of reads was still high after nanopolish index and minimap2, but dropped significantly at nanopolish call-methylation.

[three screenshots of the process log]
Why does this happen?

Thank you,
ShengquanWang
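
One way to pin down where the reads drop out is to count records after each stage. This is a minimal sketch, assuming FASTQ or SAM text arrives on stdin (for a BAM file, something like `samtools view -h test.sorted.bam` piped in). The helper names `count_fastq_reads` and `sam_stage_summary` are hypothetical and are not part of nanopolish.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing read counts between pipeline stages.

# Count FASTQ records on stdin, assuming the usual 4-lines-per-record layout.
count_fastq_reads() {
    awk 'END { print int(NR / 4) }'
}

# Summarise SAM records on stdin: all records, primary mapped records, unmapped records.
# Header lines (starting with "@") are skipped; FLAG is column 2.
sam_stage_summary() {
    awk -F '\t' '
        /^@/ { next }
        {
            total++
            flag = $2
            # FLAG bit 0x4: read is unmapped
            if (int(flag / 4) % 2 == 1) { unmapped++; next }
            # FLAG bits 0x100 (secondary) and 0x800 (supplementary): not primary
            if (int(flag / 256) % 2 == 1 || int(flag / 2048) % 2 == 1) { next }
            primary++
        }
        END { printf "total=%d primary_mapped=%d unmapped=%d\n", total, primary, unmapped }
    '
}
```

Running `count_fastq_reads < test.fastq` and `samtools view -h test.sorted.bam | sam_stage_summary` for both the guppy and the dorado runs would show whether the loss happens at basecalling, at alignment, or only at call-methylation.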

@hasindu2008
Contributor

Is the data new R10 data?

@happier21
Author

Yes, the data is new R10 data

@hasindu2008
Contributor

nanopolish doesn't support R10 data yet. You can try f5c, which is an optimised re-implementation of the index, call-methylation and eventalign modules in nanopolish that also supports R10 and RNA004.
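
For reference, the f5c equivalents of the nanopolish steps could be wrapped roughly as below. This is a sketch under assumptions: the raw signal is assumed to already be in a BLOW5 file (to my knowledge f5c reads FAST5 or SLOW5/BLOW5 rather than POD5, so POD5 data would need converting first), all file names are placeholders, and `run_f5c_methylation` is a hypothetical helper name. Depending on the f5c version, R10 data may need additional options, so check `f5c call-methylation --help`.

```shell
#!/usr/bin/env bash
# Sketch only: file names are placeholders and the helper name is hypothetical.
# Usage: run_f5c_methylation reads.fastq signals.blow5 reads.sorted.bam ref.fa out_prefix
run_f5c_methylation() {
    if [ "$#" -ne 5 ]; then
        echo "usage: run_f5c_methylation FASTQ BLOW5 SORTED_BAM REF_FA OUT_PREFIX" >&2
        return 2
    fi
    if ! command -v f5c >/dev/null 2>&1; then
        echo "f5c not found on PATH" >&2
        return 127
    fi
    local fastq="$1" blow5="$2" bam="$3" ref="$4" prefix="$5"

    # Link read IDs in the FASTQ to their raw signal records.
    f5c index --slow5 "$blow5" "$fastq" || return 1
    # Per-read methylation calls; the BAM is expected to be sorted and indexed.
    f5c call-methylation -b "$bam" -g "$ref" -r "$fastq" --slow5 "$blow5" \
        > "${prefix}.meth.tsv" || return 1
    # Aggregate the per-read calls into per-site methylation frequencies.
    f5c meth-freq -i "${prefix}.meth.tsv" > "${prefix}.freq.tsv"
}
```

The alignment step with minimap2 and samtools stays the same as in the nanopolish quickstart; only the index, call-methylation and frequency steps change.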

@happier21
Author

Thank you for your help, this method seems to work, but when running f5c call-methylation I found another problem. According to the f5c call-methylation log, the quality of the dorado basecaller's reads was significantly lower than that of the guppy basecaller's reads. Why did this happen?
This is the log of f5c call-methylation:
[screenshot: f5c call-methylation log]
I then ran "samtools view -b -q 20 -F 4 test.sorted.bam > test.sorted.q20.mapped.bam" on the dorado basecaller data and counted the number of reads in the resulting BAM file. The result is as follows:
[screenshot: read count for the dorado data]
I did the same with the guppy basecaller's data. The result is as follows:
[screenshot: read count for the guppy data]
Why are these numbers so different?
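
The filter above can be reproduced on SAM text to see how the count changes with the MAPQ threshold. A minimal sketch, assuming SAM records on stdin; `count_pass_q_mapped` is a hypothetical helper that mirrors `-q MIN -F 4` (keep records whose MAPQ is at least MIN and whose unmapped flag 0x4 is not set).

```shell
#!/usr/bin/env bash
# Hypothetical helper: count SAM records on stdin that would pass
# `samtools view -q MIN -F 4`. MIN defaults to 20.
count_pass_q_mapped() {
    local min_mapq="${1:-20}"
    awk -F '\t' -v min="$min_mapq" '
        /^@/ { next }                    # skip header lines
        int($2 / 4) % 2 == 1 { next }    # FLAG bit 0x4 set: unmapped, excluded by -F 4
        $5 + 0 >= min + 0 { n++ }        # MAPQ (column 5) at or above the threshold
        END { print n + 0 }
    '
}
```

For example, `samtools view -h test.sorted.bam | count_pass_q_mapped 20` should agree with `samtools view -c -q 20 -F 4 test.sorted.bam`. Both count alignment records, so secondary and supplementary alignments are included and the number can differ from the number of distinct reads.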

@hasindu2008
Contributor

hasindu2008 commented Apr 8, 2024

Could you please open an issue on the f5c repo? I will answer there.

Which mapper are you using, Minimap2? If Minimap2 aligns well for Guppy but not for Dorado, it might be something with Dorado. Are you using the correct model?

@happier21
Author

Thank you very much for your help!
This is the full log of f5c call-methylation:
[screenshot: full f5c call-methylation log]
This is the dorado basecaller command:
dorado basecaller /share/home/yzwl_hanxs/app/dorado-0.5.3-linux-x64/model/[email protected] ./pod5/ | samtools view -bhS -@ 10 > test.bam
Convert BAM to FASTQ:
samtools fastq -0 test.fastq test.bam
Use minimap2 to align:
minimap2 -a -x map-ont /share/home/yzwl_hanxs/refdata-gex-GRCh38-2020-A/fasta/genome.fa test.fastq | samtools sort -o test.sorted.bam -T test.tmp
This is the log for minimap2:
[screenshot: minimap2 log]

@hasindu2008
Contributor

Could you open an issue with this log at https://github.com/hasindu2008/f5c/issues, as this is more relevant there now?
