Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Autoindex zipcodes #4455

Open
wants to merge 17 commits into
base: lr-giraffe
Choose a base branch
from
Open

Conversation

xchang1
Copy link
Contributor

@xchang1 xchang1 commented Nov 25, 2024

Make the autoindexer and giraffe subcommand make minimizers and zipcodes

@xchang1
Copy link
Contributor Author

xchang1 commented Nov 25, 2024

@adamnovak Can you take a look at this for me please? I'm trying to make the autoindexer make the minimizers and zipcodes for both long read and short read giraffe. I made the subcommands be responsible for setting the minimizer parameters by changing the IndexingParameters, but I'm not sure if that's the right way to go about it. Maybe it should be separate recipes for short vs long read giraffe minimizers? That would also let us change the file names to include the minimizers parameters like we've been doing, but I also don't want it to not be able to find files that don't have the parameters in the file name

Copy link
Member

@adamnovak adamnovak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we anticipate people mapping both long-read and short-read data against the same stored reference I think we do have to keep the indexes straight from each other with different names, or else have some kind of error to stop you accidentally using a preexisting short-read index when you meant to be using long-read mode or visa-versa. But since we want to pick up k and w from the index itself, setting up the error would be hard.

Right now if you map with the Illumina preset against a GBZ and then map with the HiFi preset against the same GBZ, you will get different mapping results than if you mapped the HiFi data first, because you will generate an Illumina-parameters minimizer index and then pick it up for HiFi.

But the IndexRegistry system doesn't really understand numerically-parametrized index types.

Since we really only have two minimizer indexing modes (the short read one and the long read one), maybe we should split "Minimizers" into "Short Read Minimizers" and "Long Read Minimizers". Then we can give them different extensions (.withzip.min for short and .long.withzip.min for long, or something), and have one of those lambda-function-based recipes so the indexing implementations share code.

Comment on lines 4120 to 4122
//TODO: maybe we want to add this too? I left it as the default
bool space_efficient_counting=false;
size_t threshold = 500;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These probably should in fact be in IndexingParameters and not just locals.

Comment on lines 3851 to 3853
cerr << "LOad gbz from " << gbz_filename << endl;
unique_ptr<gbwtgraph::GBZ> gbz = vg::io::VPKG::load_one<gbwtgraph::GBZ>(gbz_filename);
cerr << "Finished loading gbz" << endl;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cerr << "LOad gbz from " << gbz_filename << endl;
unique_ptr<gbwtgraph::GBZ> gbz = vg::io::VPKG::load_one<gbwtgraph::GBZ>(gbz_filename);
cerr << "Finished loading gbz" << endl;
if (IndexingParameters::verbosity != IndexingParameters::None) {
cerr << "[IndexRegistry]: Loading gbz from " << gbz_filename << endl;
}
unique_ptr<gbwtgraph::GBZ> gbz = vg::io::VPKG::load_one<gbwtgraph::GBZ>(gbz_filename);
if (IndexingParameters::verbosity != IndexingParameters::None) {
cerr << "[IndexRegistry]: Finished loading gbz" << endl;
}

@@ -1373,6 +1375,9 @@ int main_giraffe(int argc, char** argv) {
found->second.apply(*parser);
}
}
if (param_preset == std::string("hifi") || param_preset == std::string("r10")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you really need to call the string constructor here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants