-
Notifications
You must be signed in to change notification settings - Fork 195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Autoindex zipcodes #4455
base: lr-giraffe
Are you sure you want to change the base?
Autoindex zipcodes #4455
Conversation
@adamnovak Can you take a look at this for me please? I'm trying to make the autoindexer make the minimizers and zipcodes for both long read and short read giraffe. I made the subcommands be responsible for setting the minimizer parameters by changing the IndexingParameters, but I'm not sure if that's the right way to go about it. Maybe it should be separate recipes for short vs long read giraffe minimizers? That would also let us change the file names to include the minimizers parameters like we've been doing, but I also don't want it to not be able to find files that don't have the parameters in the file name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we anticipate people mapping both long-read and short-read data against the same stored reference I think we do have to keep the indexes straight from each other with different names, or else have some kind of error to stop you accidentally using a preexisting short-read index when you meant to be using long-read mode or visa-versa. But since we want to pick up k and w from the index itself, setting up the error would be hard.
Right now if you map with the Illumina preset against a GBZ and then map with the HiFi preset against the same GBZ, you will get different mapping results than if you mapped the HiFi data first, because you will generate an Illumina-parameters minimizer index and then pick it up for HiFi.
But the IndexRegistry system doesn't really understand numerically-parametrized index types.
Since we really only have two minimizer indexing modes (the short read one and the long read one), maybe we should split "Minimizers"
into "Short Read Minimizers"
and "Long Read Minimizers"
. Then we can give them different extensions (.withzip.min
for short and .long.withzip.min
for long, or something), and have one of those lambda-function-based recipes so the indexing implementations share code.
src/index_registry.cpp
Outdated
//TODO: maybe we want to add this too? I left it as the default | ||
bool space_efficient_counting=false; | ||
size_t threshold = 500; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These probably should in fact be in IndexingParameters and not just locals.
src/index_registry.cpp
Outdated
cerr << "LOad gbz from " << gbz_filename << endl; | ||
unique_ptr<gbwtgraph::GBZ> gbz = vg::io::VPKG::load_one<gbwtgraph::GBZ>(gbz_filename); | ||
cerr << "Finished loading gbz" << endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cerr << "LOad gbz from " << gbz_filename << endl; | |
unique_ptr<gbwtgraph::GBZ> gbz = vg::io::VPKG::load_one<gbwtgraph::GBZ>(gbz_filename); | |
cerr << "Finished loading gbz" << endl; | |
if (IndexingParameters::verbosity != IndexingParameters::None) { | |
cerr << "[IndexRegistry]: Loading gbz from " << gbz_filename << endl; | |
} | |
unique_ptr<gbwtgraph::GBZ> gbz = vg::io::VPKG::load_one<gbwtgraph::GBZ>(gbz_filename); | |
if (IndexingParameters::verbosity != IndexingParameters::None) { | |
cerr << "[IndexRegistry]: Finished loading gbz" << endl; | |
} |
src/subcommand/giraffe_main.cpp
Outdated
@@ -1373,6 +1375,9 @@ int main_giraffe(int argc, char** argv) { | |||
found->second.apply(*parser); | |||
} | |||
} | |||
if (param_preset == std::string("hifi") || param_preset == std::string("r10")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you really need to call the string constructor here.
Make the autoindexer and giraffe subcommand make minimizers and zipcodes