Module vcf_rider::rider

Main module of vcf_rider. Its function get_scores is the entry point of the whole analysis.

Re-exports

use bio::io::bed;
use std::fs;
use std::io::BufWriter;
use super::fasta;
use super::mutations;
use super::indel;
use std::collections::VecDeque;
use std::io::Write;
use bit_vec::BitVec;

Structs

RiderParameters

The parameters used to set up vcf_rider: a vector of objects able to score a sequence, together with their minimum and maximum lengths. In the future it will become possible to combine the scores of all the subsequences not only by summing them but also, for example, by averaging them or by taking their minimum or maximum.
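As a rough illustration of the description above, the parameters could be pictured as a vector of scorers plus the length bounds derived from them. This is only a sketch: the names `ParamsSketch`, `HasLength`, `FixedLen`, `min_len` and `max_len` are hypothetical and are not taken from the vcf_rider source.

```rust
/// Hypothetical stand-in for the scoring trait: only the length matters here.
pub trait HasLength {
    fn get_length(&self) -> usize;
}

/// Hypothetical parameter struct: the scorers plus the shortest and longest
/// subsequence length they require.
pub struct ParamsSketch<T: HasLength> {
    pub min_len: usize,
    pub max_len: usize,
    pub scorers: Vec<T>,
}

impl<T: HasLength> ParamsSketch<T> {
    /// Derive the length bounds from the scorers; `None` if there are no scorers.
    pub fn new(scorers: Vec<T>) -> Option<Self> {
        let min_len = scorers.iter().map(|s| s.get_length()).min()?;
        let max_len = scorers.iter().map(|s| s.get_length()).max()?;
        Some(ParamsSketch { min_len, max_len, scorers })
    }
}

/// Toy scorer used only to exercise the sketch.
pub struct FixedLen(pub usize);

impl HasLength for FixedLen {
    fn get_length(&self) -> usize {
        self.0
    }
}
```

With scorers of lengths 8, 12 and 5, `ParamsSketch::new` would record a minimum of 5 and a maximum of 12.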

Traits

CanScoreSequence

The main vcf_rider function receives a Vec<T: CanScoreSequence> and calls every T on the subsequences of the samples' genomes, scoring each distinct subsequence only once. Implementors of this trait must be able to compute a score on a given sequence, represented by a slice of an array of u8 [TODO], starting from a given position. The library guarantees that the position it passes lies inside the sequence, i.e. sequence.len() - self.get_length() >= 0.
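A minimal sketch of what implementing such a trait might look like. Only `get_length` is named in the description above; the `get_score` method name and signature, the toy `GcFraction` scorer and the `score_all` helper are assumptions made for illustration, not the actual vcf_rider API.

```rust
/// Hypothetical shape of the trait. Only `get_length` appears in the text above;
/// the `get_score` name and signature are assumptions made for this sketch.
pub trait CanScoreSequence {
    /// Length of the subsequence this scorer consumes.
    fn get_length(&self) -> usize;
    /// Score the window of `get_length()` bytes starting at `pos`.
    fn get_score(&self, pos: usize, sequence: &[u8]) -> f64;
}

/// Toy scorer (not part of vcf_rider): fraction of G/C bases in the window.
pub struct GcFraction {
    pub length: usize,
}

impl CanScoreSequence for GcFraction {
    fn get_length(&self) -> usize {
        self.length
    }

    fn get_score(&self, pos: usize, sequence: &[u8]) -> f64 {
        let window = &sequence[pos..pos + self.length];
        let gc = window.iter().filter(|&&b| b == b'G' || b == b'C').count();
        gc as f64 / self.length as f64
    }
}

/// Score every start position whose window fits inside the sequence,
/// i.e. every `pos` with `pos + get_length() <= sequence.len()`.
pub fn score_all<T: CanScoreSequence>(scorer: &T, sequence: &[u8]) -> Vec<f64> {
    let len = scorer.get_length();
    if len == 0 || sequence.len() < len {
        return Vec::new();
    }
    (0..=sequence.len() - len)
        .map(|pos| scorer.get_score(pos, sequence))
        .collect()
}
```

In this sketch the caller, not the scorer, is responsible for only passing positions whose window fits, which mirrors the guarantee stated above.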

Functions

encode_genotypes

Function that encodes the genotypes of our individuals as BitVec, given the list of the overlapping mutations, the information on their indel status obtained from IndelRider, the indexes of this group's individuals/alleles, the total number of individuals that we are studying and a vector of all their ids.
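The idea of a bit-encoded genotype can be sketched as follows, under two loud assumptions: each individual/allele gets one bit per overlapping mutation (set when it carries the alternate allele), and `Vec<bool>` stands in for `BitVec` so that the example needs no external crate. `ToyMutation` and `encode_genotypes_sketch` are hypothetical names, and indel status is ignored here.

```rust
/// Hypothetical, simplified mutation record: `carriers[i]` is true when
/// individual/allele `i` carries the alternate allele.
pub struct ToyMutation {
    pub pos: u64,
    pub carriers: Vec<bool>,
}

/// For each selected individual/allele index, build one bit per mutation.
/// `Vec<bool>` stands in for `BitVec`; the real layout may differ.
pub fn encode_genotypes_sketch(mutations: &[ToyMutation], group: &[usize]) -> Vec<Vec<bool>> {
    group
        .iter()
        .map(|&allele| mutations.iter().map(|m| m.carriers[allele]).collect())
        .collect()
}
```

Two individuals/alleles with identical bit patterns necessarily carry the same sequence in the window, which is what makes this kind of encoding convenient for deduplication later on.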

find_overlapping_snps

Function that advances the VcfReader (an Iterator of Mutation) until the first SNP that does not overlap the given window. It stores in snps_buffer all the overlapping SNPs, along with their number, followed by the first non-overlapping SNP. The SNPs already in snps_buffer are used to manage overlapping bed entries (always sorted on their starting coordinate!). TODO: is the case b3_e < b2_e managed/tested?
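The buffering strategy described above can be sketched with a `VecDeque` and a plain iterator. This is a simplified illustration, not the library's code: `ToySnp` and `find_overlapping_sketch` are hypothetical, windows are assumed to be half-open `[start, end)` and sorted by start, SNPs are reduced to a single position, and the count of overlapping SNPs is returned rather than stored.

```rust
use std::collections::VecDeque;

/// Hypothetical, minimal SNP: just a position.
#[derive(Debug, Clone, PartialEq)]
pub struct ToySnp {
    pub pos: u64,
}

/// Fill `buffer` with the SNPs overlapping `[window_start, window_end)`, followed by
/// the first SNP past the window (kept as look-ahead). Returns the overlap count.
pub fn find_overlapping_sketch<I>(
    window_start: u64,
    window_end: u64,
    reader: &mut I,
    buffer: &mut VecDeque<ToySnp>,
) -> usize
where
    I: Iterator<Item = ToySnp>,
{
    // Windows are sorted by start, so SNPs before this start are never needed again.
    while buffer.front().map_or(false, |s| s.pos < window_start) {
        buffer.pop_front();
    }
    // SNPs left over from the previous window may already overlap this one.
    let mut overlapping = buffer.iter().take_while(|s| s.pos < window_end).count();
    // If the buffer already holds a SNP past the window, no reading is needed.
    if buffer.back().map_or(false, |s| s.pos >= window_end) {
        return overlapping;
    }
    // Otherwise read until the first SNP that lies past the window.
    while let Some(snp) = reader.next() {
        if snp.pos < window_start {
            continue; // before the window: skip
        }
        let inside = snp.pos < window_end;
        buffer.push_back(snp);
        if inside {
            overlapping += 1;
        } else {
            break;
        }
    }
    overlapping
}
```

Keeping the first non-overlapping SNP in the buffer means it is not lost when the next window is processed, since an iterator cannot be rewound.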

get_scores

The single entry point of our library. Right now, for ease of use in bioinformatic pipelines, it simply prints the results to standard output. TODO: return a suitable data structure with results.

match_indexes

Function that populates a Vec<(usize, bool)> for a sequence, represented by the given index, telling us which individuals/alleles (each represented by a usize) have that sequence. Returns true if at least one individual/allele has this sequence.
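A minimal sketch of this matching step, assuming that "having a sequence" means the individual/allele's encoded genotype equals the sequence's index. `match_indexes_sketch` is a hypothetical name, and `Vec<bool>` / `&[bool]` stand in for `BitVec` to keep the example dependency-free.

```rust
/// Fill `out` with one `(allele_id, has_sequence)` pair per individual/allele and
/// return true if at least one of them has the sequence.
/// Assumption: a match is plain equality between the genotype bits and the index.
pub fn match_indexes_sketch(
    sequence_index: &[bool],
    genotypes: &[Vec<bool>],
    out: &mut Vec<(usize, bool)>,
) -> bool {
    out.clear();
    let mut any = false;
    for (allele, genotype) in genotypes.iter().enumerate() {
        let has_sequence = genotype.as_slice() == sequence_index;
        out.push((allele, has_sequence));
        any |= has_sequence;
    }
    any
}
```

Reusing a caller-provided vector avoids one allocation per sequence, which is presumably why the function populates an existing Vec instead of returning a new one.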

obtain_seq

Function that populates a vector of sequences, given a genomic window, a buffer of the SNPs overlapping it alongside their indel status information for the sample groups, the reference sequence, the genotypes encoded for our individuals and the end of the bed entry that we are considering. The filled vector is a vector of tuples of BitVec, representing the sequence indexes, and Vec, representing the sequences themselves. At this stage duplicated sequences are removed: a sequence shared by different individuals/alleles is stored only once.
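The deduplication described above can be illustrated with a deliberately simplified sketch: only substitutions are handled (no indels), `Vec<bool>` replaces `BitVec`, and `WindowSnp` and `obtain_seq_sketch` are hypothetical names rather than the library's own.

```rust
/// Hypothetical substitution inside a window: byte offset and alternate base.
pub struct WindowSnp {
    pub offset: usize,
    pub alt: u8,
}

/// Build one sequence per distinct genotype by applying the carried substitutions
/// to the reference window. Each result pairs the genotype bits (the sequence
/// index) with the sequence bytes; identical genotypes are stored only once.
pub fn obtain_seq_sketch(
    reference: &[u8],
    snps: &[WindowSnp],
    genotypes: &[Vec<bool>],
) -> Vec<(Vec<bool>, Vec<u8>)> {
    let mut seqs: Vec<(Vec<bool>, Vec<u8>)> = Vec::new();
    for genotype in genotypes {
        // A genotype seen before yields the same sequence: skip it.
        if seqs.iter().any(|(index, _)| index == genotype) {
            continue;
        }
        let mut seq = reference.to_vec();
        for (snp, &carried) in snps.iter().zip(genotype.iter()) {
            if carried {
                seq[snp.offset] = snp.alt;
            }
        }
        seqs.push((genotype.clone(), seq));
    }
    seqs
}
```

Because each distinct sequence is stored once, a scorer only needs to run on it once, however many individuals/alleles share it.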

print_overlapping

Function that prints information about the overlapping SNPs to the given writer. This is only an output function and does not compute anything.