Voss representation
A Voss representation of a biological sequence is a binary matrix that encodes the sequence. The Voss representation of a sequence is obtained by encoding the sequence into a binary matrix where each column of the matrix represents a position in the sequence and each row represents a symbol in the alphabet (Voss, 1992). Formally, given a sequence
For example, the Voss matrix of the DNA sequence (i.e of
In this case the given alphabet is the DNA alphabet, but the same representation can be used for other alphabets.
Encoding BioSequences
This package provides a simple and fast way to encode biological sequences into Voss representations. The main struct
provided by this package is VossEncoder
which is a wrapper of BitMatrix
that encodes a biological sequence into a bit matrix and its corresponding alphabet. The following example shows how to encode a DNA sequence into a Voss matrix.
julia> using BioSequences, BioVossEncoder
julia> seq = dna"ACGT"
julia> VossEncoder(seq)
4×4 Voss Matrix of DNAAlphabet{4}():
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
For simplicity the VossEncoder
struct provides a property bitmatrix
that returns the BitMatrix
representation of the sequence.
julia> VossEncoder(seq).bitmatrix
4×4 BitMatrix:
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
Similarly another function that makes use of the VossEncoder
structure is vossmatrix
which returns the BitMatrix
representation of a sequence directly.
julia> vossmatrix(seq)
4×4 BitMatrix:
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
Creating a Voss vector of a sequence
Sometimes it proves to be useful to encode a sequence into a Voss vector representation (i.e a bit vector of the sequence from the corresponding molecule alphabet).
This package provides a function vossvector
that returns Voss vector of a sequence given a BioSequence
and the specific molecule (BioSymbol
) that could be DNA
or AA
.
julia> vossvector(seq, DNA_A)
4-element view(::BitMatrix, 1, :) with eltype Bool:
1
0
0
0
Note that the output is actually using behind the scenes a view of the BitMatrix
representation of the sequence. This is done for performance reasons.
References
Voss, R. F. Evolution of long-range fractal correlations and 1/ f noise in DNA base sequences. Phys. Rev. Lett. 68, 3805–3808 (1992).