[Feature Request] Rework alignment #61

Open
1 of 4 tasks
HLWeil opened this issue Jul 23, 2019 · 2 comments

Comments

@HLWeil
Member

HLWeil commented Jul 23, 2019

The Problem

The pairwise alignment in BioFSharp.Algorithms works fine, but the only implemented way to write out the resulting alignment in a correct format is the BioFSharp.IO.Clustal module. Although both generally use the same BioFSharp.Alignment.Alignment type, converting between them can be quite cumbersome.

Solution

Remodel BioFSharp.Algorithms.PairwiseAlignment and BioFSharp.IO.Clustal:

  • Add ConservationInfo module to BioFSharp.IO.Clustal or BioFSharp.Alignment

  • Let Clustal functions use BioSeqs instead of Strings

  • Let BioFSharp.Algorithms.PairwiseAlignment functions use BioSeqs as output instead of Nucleotides

  • Add create function to Alignment Type in BioFSharp.Alignment

These changes should make it easier to combine the alignment functions from the different namespaces.
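As a rough illustration of the last bullet, a `create` function could look something like the sketch below. The record is a local stand-in whose field names are taken from the conversion example in this issue, and `Alignment.create` is the proposed (not yet existing) helper, so treat every name here as an assumption rather than the BioFSharp API.

```fsharp
// Local stand-in for the alignment record; field names follow the
// conversion example in this issue, not the actual BioFSharp source.
type Alignment<'Sequence,'Metadata> =
    {
        MetaData         : 'Metadata
        AlignedSequences : 'Sequence list
    }

module Alignment =
    // Hypothetical helper proposed in this issue: builds an alignment
    // from its metadata and its aligned sequences without record syntax.
    let create (metaData: 'Metadata) (sequences: 'Sequence list) : Alignment<'Sequence,'Metadata> =
        { MetaData = metaData; AlignedSequences = sequences }

// Usage: the metadata and sequence types are inferred from the arguments.
let example = Alignment.create "Decoy" [ "ACGT"; "AC-T" ]
```

With such a function, callers in other namespaces would not need to know the record's field names to construct an alignment.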

Example of unnecessary conversions

Output type of alignment
 Alignment.Alignment<Nucleotides.Nucleotide list, Algorithm.PairwiseAlignment.Score>
Expected input of the Clustal write function
 Alignment.Alignment<BioID.TaggedSequence<string,char>,Clustal.AlignmentInfo>
Needed Conversion
 let mappedData = 
     alignment.AlignedSequences
     |> List.mapi (fun i (ns:Nucleotides.Nucleotide list) -> 
         Seq.map (BioItem.symbol) ns
         |> BioID.createTaggedSequence (sprintf "seq%i" i)
     )

 let conservationInfo = String.init firstGeneSeq.Length (fun _ -> "*")

 let newHeader = {Header = "Decoy";ConservationInfo = conservationInfo}

 let newAlignment = {MetaData = newHeader;AlignedSequences = mappedData}

This is very cumbersome.
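For comparison, the whole conversion could be hidden behind a single function. The following is only a sketch under stated assumptions: the three record types are simplified local stand-ins (with `string` fields for the Clustal info), and `toClustalLike` is a hypothetical name that does not exist in BioFSharp. As in the snippet above, the conservation line is just a placeholder of `*` characters.

```fsharp
// Simplified stand-ins mirroring the shapes used in the snippet above;
// these are assumptions, not the BioFSharp definitions.
type TaggedSequence<'T,'S> = { Tag: 'T; Sequence: seq<'S> }
type AlignmentInfo = { Header: string; ConservationInfo: string }
type Alignment<'Sequence,'Metadata> = { MetaData: 'Metadata; AlignedSequences: 'Sequence list }

// Hypothetical helper: tags every aligned sequence as "seq0", "seq1", ...
// and attaches a placeholder conservation line made of '*' characters.
let toClustalLike (toSymbol: 'a -> char) (header: string) (alignment: Alignment<'a list,'m>) =
    let tagged =
        alignment.AlignedSequences
        |> List.mapi (fun i s -> { Tag = sprintf "seq%i" i; Sequence = Seq.map toSymbol s })
    // Length of the first aligned sequence, or 0 for an empty alignment.
    let length =
        match alignment.AlignedSequences with
        | first :: _ -> List.length first
        | [] -> 0
    { MetaData = { Header = header; ConservationInfo = String.replicate length "*" }
      AlignedSequences = tagged }
```

A caller would then pass the symbol function (for example `BioItem.symbol`) and a header, and get back a value in the shape the Clustal writer expects.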

@kMutagene
Member

@HLWeil any updates?

@HLWeil
Member Author

HLWeil commented May 11, 2021

Actually, there are multiple types that look very similar:

type TaggedSequence<'T,'S> =
    {
        Tag: 'T;
        Sequence: seq<'S>
    }
type FastaItem<'a> =
    {
        Header: string;
        Sequence: 'a
    }
///General Alignment type used throughout BioFSharp
type Alignment<'Sequence,'Metadata> =
    {
        ///Additional information for this alignment
        MetaData: 'Metadata;
        ///List of aligned Sequences
        Sequences: seq<'Sequence>
    }

Replacing the Alignment type with the TaggedSequence type might actually make the code less concise, but I think in general it would be good if these types had seamless interop.
Also, the FastaItem type might be replaceable with the TaggedSequence type after some minor adjustments.
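One possible shape for that interop, sketched with local copies of the two record types quoted above: the `Interop` module and its function names are hypothetical and only meant to show how little code the mapping needs when the tag is a `string` and the FASTA sequence is a `seq`.

```fsharp
// Local copies of the two record types quoted above, so the sketch is self-contained.
type TaggedSequence<'T,'S> = { Tag: 'T; Sequence: seq<'S> }
type FastaItem<'a> = { Header: string; Sequence: 'a }

// Hypothetical conversion helpers; nothing with these names exists in BioFSharp.
module Interop =
    // A FastaItem whose sequence is a seq maps directly onto a string-tagged sequence.
    let ofFastaItem (item: FastaItem<seq<'S>>) : TaggedSequence<string,'S> =
        { Tag = item.Header; Sequence = item.Sequence }

    // The reverse direction only works when the tag is a string.
    let toFastaItem (tagged: TaggedSequence<string,'S>) : FastaItem<seq<'S>> =
        { Header = tagged.Tag; Sequence = tagged.Sequence }
```

The round trip is lossless in this restricted case, which is the situation where replacing FastaItem with TaggedSequence would be simplest.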

What do you think?
