Crate vcf_rider

vcf_rider: a library to efficiently compute scores on individual genomes starting from VCF files

The idea behind vcf_rider is to exploit the fact that polymorphisms are rare. If one needs to compute a sequence-based score (e.g. CpG/CG content, number of PWM hits, number of miRNA seeds, Total Binding Affinity) that can be computed on defined and independent windows, there is no need to reconstruct every individual genome and then compute the scores separately. vcf_rider computes the scores of windows that are identical in all individuals only once. Even for polymorphic subsequences, it computes a score only once for each distinct sequence actually present and assigns it to the individuals that carry it. Scores for different windows can be combined in different ways; right now they are only summed, but extending the library to perform other operations should be easy via a new option in the RiderParameters struct.
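The "score each distinct sequence only once" idea can be illustrated with a small sketch. This is not vcf_rider's implementation: the function names `cg_count` and `score_individuals` are hypothetical, the window is assumed to be given as one uppercase string per individual, and CG dinucleotide count stands in for any window-based score.

```rust
use std::collections::HashMap;

/// Hypothetical window score: number of "CG" dinucleotides (uppercase input assumed).
fn cg_count(seq: &str) -> f64 {
    seq.as_bytes()
        .windows(2)
        .filter(|w| w[0] == b'C' && w[1] == b'G')
        .count() as f64
}

/// Scores the same genomic window for every individual, evaluating `score`
/// only once per distinct sequence. Returns the per-individual scores and
/// the number of score evaluations actually performed.
fn score_individuals(windows: &[&str], score: fn(&str) -> f64) -> (Vec<f64>, usize) {
    let mut cache: HashMap<&str, f64> = HashMap::new();
    let mut evaluations = 0;
    let mut scores = Vec::with_capacity(windows.len());
    for &seq in windows {
        // Compute the score only if this sequence has not been seen yet.
        let s = *cache.entry(seq).or_insert_with(|| {
            evaluations += 1;
            score(seq)
        });
        scores.push(s);
    }
    (scores, evaluations)
}
```

With four individuals of which three share the reference sequence, `score_individuals` calls the scoring function twice rather than four times; the saving grows with the number of individuals because most windows contain no polymorphism at all.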

Re-exports

extern crate std as std;
extern crate bio;
extern crate rust_htslib;
extern crate itertools;
extern crate bit_vec;
use std::prelude::v1::*;

Modules

fasta

The module used for reading the FASTA file representing the genome of interest. Right now the file should contain a single chromosome to be used with vcf_rider. The id of the FASTA record should be the same one used in the VCF file and in the genomic regions listed in the BED file.

indel

Module needed to correctly manage indels when computing scores. Indels are a problem because they put the genomes of different individuals 'out of phase' and force us to divide the individuals into different groups.
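To make the 'out of phase' problem concrete, here is a minimal sketch of one way to group individuals by the coordinate shift that upstream indels have accumulated at a given position. The `Indel` struct and both functions are hypothetical and are not taken from this module; they only assume that each individual is described by the list of indels it carries.

```rust
use std::collections::BTreeMap;

/// Hypothetical indel record: reference position and net length change
/// (positive for insertions, negative for deletions).
#[derive(Debug, Clone, Copy)]
struct Indel {
    pos: u64,
    len_change: i64,
}

/// Cumulative coordinate shift seen at `pos` by a genome carrying `indels`.
/// Only indels located before `pos` contribute.
fn shift_at(indels: &[Indel], pos: u64) -> i64 {
    indels
        .iter()
        .filter(|i| i.pos < pos)
        .map(|i| i.len_change)
        .sum()
}

/// Groups individuals (by index) that share the same shift at `pos`,
/// i.e. whose genomes are still in phase with each other at that point.
fn group_by_shift(individuals: &[Vec<Indel>], pos: u64) -> BTreeMap<i64, Vec<usize>> {
    let mut groups: BTreeMap<i64, Vec<usize>> = BTreeMap::new();
    for (idx, indels) in individuals.iter().enumerate() {
        groups.entry(shift_at(indels, pos)).or_default().push(idx);
    }
    groups
}
```

Individuals within one group can share window scores as usual, while different groups have to be processed separately because their windows cover different stretches of sequence.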

mirna

mutations

Module able to load mutations from a VCF file.
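As an illustration of the kind of information such a loader extracts, the sketch below parses a single VCF data line with the standard library only. It is not this module's code: the `Mutation` struct and `parse_vcf_line` are hypothetical, and the sketch assumes biallelic sites, phased genotypes (`0|1`), and GT as the first FORMAT key.

```rust
/// Hypothetical minimal mutation record, not the crate's own type.
#[derive(Debug, PartialEq)]
struct Mutation {
    chrom: String,
    /// 1-based position, as written in the VCF text.
    pos: u64,
    reference: String,
    alt: String,
    /// For each sample, whether each of the two phased haplotypes carries ALT.
    genotypes: Vec<(bool, bool)>,
}

/// Parses one tab-separated VCF data line; returns None for header lines,
/// lines without sample columns, or genotypes that are not phased.
fn parse_vcf_line(line: &str) -> Option<Mutation> {
    if line.starts_with('#') {
        return None;
    }
    // Columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1 ...
    let fields: Vec<&str> = line.trim_end().split('\t').collect();
    if fields.len() < 10 {
        return None;
    }
    let genotypes = fields[9..]
        .iter()
        .map(|sample| {
            // GT is assumed to be the first colon-separated sample field.
            let gt = sample.split(':').next()?;
            let (a, b) = gt.split_once('|')?;
            Some((a == "1", b == "1"))
        })
        .collect::<Option<Vec<_>>>()?;
    Some(Mutation {
        chrom: fields[0].to_string(),
        pos: fields[1].parse().ok()?,
        reference: fields[3].to_string(),
        alt: fields[4].to_string(),
        genotypes,
    })
}
```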

pwm

Module representing Position Weight Matrices (PWMs) that can compute their scores on a given sequence.
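For readers unfamiliar with PWM scoring, a minimal sketch follows. The `Pwm` type, its layout (one row per motif position, columns ordered A, C, G, T) and the additive scoring are assumptions made for illustration; the module's actual types and scoring scheme may differ.

```rust
/// Hypothetical PWM: one row per motif position, columns ordered A, C, G, T,
/// each entry an additive weight (for example a log-odds value).
struct Pwm {
    rows: Vec<[f64; 4]>,
}

/// Maps a nucleotide to its column index; None for ambiguous bases such as N.
fn base_index(b: u8) -> Option<usize> {
    match b {
        b'A' | b'a' => Some(0),
        b'C' | b'c' => Some(1),
        b'G' | b'g' => Some(2),
        b'T' | b't' => Some(3),
        _ => None,
    }
}

impl Pwm {
    /// Sum of the weights selected by each base of `window`.
    /// Returns None if the window length differs from the motif length
    /// or if the window contains an ambiguous base.
    fn score_window(&self, window: &[u8]) -> Option<f64> {
        if window.len() != self.rows.len() {
            return None;
        }
        let mut total = 0.0;
        for (row, &b) in self.rows.iter().zip(window) {
            total += row[base_index(b)?];
        }
        Some(total)
    }

    /// Scores the motif at every offset of `seq` (forward strand only).
    fn scan(&self, seq: &[u8]) -> Vec<Option<f64>> {
        if self.rows.is_empty() || seq.len() < self.rows.len() {
            return Vec::new();
        }
        seq.windows(self.rows.len())
            .map(|w| self.score_window(w))
            .collect()
    }
}
```

Because each offset is scored independently of the others, this kind of score fits the window-based model described in the crate overview.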

rider

Main module of vcf_rider; its function get_scores is the entry point of the whole analysis.