Module vcf_rider::rider
Main module of vcf_rider. Its function get_scores is the entry point of the whole analysis.
Re-exports
use bio::io::bed;
use std::fs;
use std::io::BufWriter;
use super::fasta;
use super::mutations;
use super::indel;
use std::collections::VecDeque;
use std::io::Write;
use bit_vec::BitVec;
Structs
RiderParameters
The parameters used to set up vcf_rider: a vector of objects able to score a sequence, together with their minimum and maximum lengths. In the future it will become possible to combine the scores of all the subsequences not only by summing them but also, for example, by averaging them or by taking their minimum or maximum.
Traits
CanScoreSequence
Our vcf_rider main function will receive a Vec<T: CanScoreSequence> and call every T on the subsequences of the samples' genomes, scoring each existing subsequence only once. Implementors of this trait need to be able to compute a score on a given sequence, represented by a slice of an array of u8 [TODO], starting from a given position (the library guarantees that the position used will lie inside the sequence, i.e. sequence.len() - self.get_length() >= 0).
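The contract described above can be illustrated with a minimal sketch. Only get_length is named on this page; the get_score method, its signature, the GcContent scorer and the sum_scores helper are assumptions made for illustration and are not the library's actual API.

```rust
/// Hypothetical sketch of a scoring trait. Only `get_length` is named in
/// the documentation; `get_score` and its signature are assumptions.
pub trait CanScoreSequence {
    /// Length of the subsequence that this scorer consumes.
    fn get_length(&self) -> usize;
    /// Score the subsequence of `sequence` that starts at `pos`.
    /// The caller guarantees that `pos + self.get_length() <= sequence.len()`.
    fn get_score(&self, pos: usize, sequence: &[u8]) -> f64;
}

/// Toy scorer (illustrative only): fraction of G/C bases in a window.
pub struct GcContent {
    pub length: usize,
}

impl CanScoreSequence for GcContent {
    fn get_length(&self) -> usize {
        self.length
    }

    fn get_score(&self, pos: usize, sequence: &[u8]) -> f64 {
        let window = &sequence[pos..pos + self.length];
        let gc = window
            .iter()
            .filter(|&&b| matches!(b, b'G' | b'C' | b'g' | b'c'))
            .count();
        gc as f64 / self.length as f64
    }
}

/// Sum the scores over every start position at which the window fits
/// entirely inside `sequence` (summing is the only combination the
/// documentation says is currently supported).
pub fn sum_scores<T: CanScoreSequence>(scorer: &T, sequence: &[u8]) -> f64 {
    let len = scorer.get_length();
    if len == 0 || sequence.len() < len {
        return 0.0;
    }
    (0..=sequence.len() - len)
        .map(|pos| scorer.get_score(pos, sequence))
        .sum()
}
```

In this sketch the caller, not the scorer, is responsible for keeping every window inside the sequence, which mirrors the guarantee stated in the trait description.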
Functions
encode_genotypes
Function that encodes the genotypes of our individuals as BitVec, given the list of the overlapping mutations, the information on their indel status obtained from IndelRider, the indexes of this group's individuals/alleles, the total number of individuals that we are studying and a vector of all their ids.
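As a rough illustration of what encoding genotypes as bit vectors can look like, the sketch below builds one row of flags per chromosome copy, with one flag per overlapping mutation. It is a simplification under stated assumptions: Vec<bool> stands in for BitVec so that the example needs only the standard library, indel status and group indexes are ignored, and ToyMutation and encode_toy_genotypes are hypothetical names, not items of this module.

```rust
/// Hypothetical, simplified mutation record (not the crate's `Mutation`).
pub struct ToyMutation {
    /// Genomic coordinate of the mutation.
    pub pos: u64,
    /// One (first_allele_is_alt, second_allele_is_alt) pair per individual.
    pub genotypes: Vec<(bool, bool)>,
}

/// Encode genotypes as rows of flags: row `2 * i` is the first allele of
/// individual `i`, row `2 * i + 1` is the second one, and column `j` tells
/// whether that allele carries the alternative form of mutation `j`.
/// Assumes every mutation lists at least `n_individuals` genotypes.
/// `Vec<bool>` is used here as a stand-in for `BitVec`.
pub fn encode_toy_genotypes(
    mutations: &[ToyMutation],
    n_individuals: usize,
) -> Vec<Vec<bool>> {
    let mut encoded: Vec<Vec<bool>> = vec![Vec::new(); 2 * n_individuals];
    for m in mutations {
        for (i, &(a1, a2)) in m.genotypes.iter().enumerate().take(n_individuals) {
            encoded[2 * i].push(a1);
            encoded[2 * i + 1].push(a2);
        }
    }
    encoded
}
```

Two alleles with identical rows carry the same sequence over the window, which is what would let a caller score each distinct sequence only once.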
find_overlapping_snps
Function that advances the VcfReader (an Iterator of Mutation) until the first SNP that does not overlap the given window, putting in snps_buffer all the overlapping SNPs and their number and then the first non-overlapping SNP. It uses the SNPs in snps_buffer to manage overlapping bed entries (always sorted on their starting coordinate!). TODO: is b3_e < b2_e managed/tested?
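The buffering strategy described above can be sketched with a plain iterator of positions and a VecDeque. This is a simplified model under stated assumptions: SNPs are reduced to sorted u64 coordinates, windows are half-open and sorted by start, and Window and buffer_overlapping are hypothetical names rather than the function's real signature.

```rust
use std::collections::VecDeque;

/// Hypothetical half-open genomic window [start, end).
pub struct Window {
    pub start: u64,
    pub end: u64,
}

/// Fill `buffer` with the SNP positions that overlap `window`, plus the
/// first position at or past its end, and return how many overlap.
/// Assumes `reader` yields sorted positions and that successive windows
/// are sorted by start, so SNPs before `window.start` are never needed again.
pub fn buffer_overlapping<I: Iterator<Item = u64>>(
    window: &Window,
    reader: &mut I,
    buffer: &mut VecDeque<u64>,
) -> usize {
    // Drop leftovers from earlier windows that lie before this one.
    while buffer.front().map_or(false, |&p| p < window.start) {
        buffer.pop_front();
    }
    // Read on only if the buffer does not already hold a SNP past the window.
    let mut need_more = buffer.back().map_or(true, |&p| p < window.end);
    while need_more {
        match reader.next() {
            // Before the window: skip it.
            Some(p) if p < window.start => {}
            // Inside the window, or the first SNP past it: keep it.
            Some(p) => {
                need_more = p < window.end;
                buffer.push_back(p);
            }
            // Reader exhausted.
            None => need_more = false,
        }
    }
    buffer.iter().filter(|&&p| p < window.end).count()
}
```

Keeping the first non-overlapping SNP in the buffer means it is not lost when the next, possibly overlapping, window is processed, since an iterator cannot be rewound.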
get_scores
The single entry point of our library. Right now, for ease of use in bioinformatic pipelines, it simply prints the results on standard output. TODO: return a suitable data structure with results.
match_indexes
Function that populates a Vec<(usize, bool)> for a sequence, represented by the given index, telling us which individuals/alleles (represented by a usize) have that sequence. Returns true if at least one individual/allele has this sequence.
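One plausible reading of this description is sketched below. The meaning of the bool is not stated on this page; the sketch assumes it marks the second allele of an individual. The name match_toy_indexes, its parameters and the use of Vec<bool> in place of BitVec are all hypothetical.

```rust
/// Collect the individuals/alleles whose encoded genotype equals `pattern`.
/// Each entry of `genotypes` holds the (first allele, second allele)
/// encodings of one individual. In `carriers`, the `usize` is the individual
/// and the `bool` is assumed to mean "second allele" (an assumption; the
/// documentation does not define it).
/// Returns true if at least one individual/allele matches.
pub fn match_toy_indexes(
    pattern: &[bool],
    genotypes: &[(Vec<bool>, Vec<bool>)],
    carriers: &mut Vec<(usize, bool)>,
) -> bool {
    carriers.clear();
    for (i, (first, second)) in genotypes.iter().enumerate() {
        if first.as_slice() == pattern {
            carriers.push((i, false));
        }
        if second.as_slice() == pattern {
            carriers.push((i, true));
        }
    }
    !carriers.is_empty()
}
```

Returning a bool lets a caller skip scoring a sequence that no individual or allele actually carries.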
obtain_seq
Function that populates a vector of sequences, given a genomic window, a buffer of SNPs overlapping it alongside their indel status information for the sample groups, the reference sequence, the genotypes encoded for our individuals and the end of the bed that we are considering. The filled vector is a vector of tuples of BitVec, representing the sequence indexes, and Vec
print_overlapping
Function that prints information about the overlapping SNPs to the given writer. This is only an output function that does not compute anything.