
Voss representation

A Voss representation of a biological sequence is a binary matrix that encodes the sequence (Voss, 1992). Each column of the matrix corresponds to a position in the sequence, and each row corresponds to a symbol of the alphabet. Formally, take a sequence $s$ of length $n$ over an alphabet $a$ of size $m$, where $s[j]$ is the $j$-th position of the sequence and $a[i]$ is the $i$-th symbol of the alphabet. The Voss matrix $V$ of $s$ is the $m \times n$ binary matrix whose entry $V_{i,j}$ is $1$ if $s[j] = a[i]$ and $0$ otherwise. Writing $v_i$ for the $i$-th row of $V$ (so that $v_i[j] = V_{i,j}$):

$$
v_i[j] = \begin{cases} 1 & \text{if } s[j] = a[i] \\ 0 & \text{if } s[j] \neq a[i] \end{cases}
$$
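The definition translates directly into code. The following is a minimal plain-Julia sketch that does not use BioVossEncoder. The helper name `voss_matrix` is hypothetical. Representing the sequence as a `String` and the alphabet as a `Vector{Char}` are assumptions made for illustration.

```julia
# Minimal sketch of the Voss encoding in plain Julia (no packages needed).
# `voss_matrix` is a hypothetical helper, not part of BioVossEncoder.
function voss_matrix(s::AbstractString, alphabet::AbstractVector{Char})
    chars = collect(s)                      # s[1], ..., s[n] as a Vector{Char}
    m, n = length(alphabet), length(chars)
    V = falses(m, n)                        # m×n BitMatrix, all zeros
    for j in 1:n, i in 1:m                  # i varies fastest (column-major)
        V[i, j] = chars[j] == alphabet[i]   # V[i,j] = 1 iff s[j] == a[i]
    end
    return V
end

V = voss_matrix("ACGT", ['A', 'C', 'G', 'T'])   # 4×4 identity pattern
```

When every character of the sequence belongs to the alphabet, each column of the result contains exactly one `1`.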

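For example, over the DNA alphabet $\{A, C, G, T\}$, the Voss matrix of the sequence ACGT is the following. Rows are labeled by symbol and columns by position:

$$
V = \begin{array}{c} A \\ C \\ G \\ T \end{array}
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
$$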

Here the alphabet is the DNA alphabet, but the same representation applies to any finite alphabet, such as the amino acid alphabet for protein sequences.
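The next snippet sketches the same idea for proteins by encoding a short peptide. Both the 20-letter alphabet `AMINO_ACIDS` and the helper `voss_matrix` are hypothetical plain-Julia definitions, not part of the package. The last line also shows that the encoding is lossless: reading the row index of the single `1` in each column recovers the sequence.

```julia
# Sketch: the Voss encoding over a protein alphabet (plain Julia).
# Assumption: the 20 standard amino acids in one-letter code.
const AMINO_ACIDS = collect("ACDEFGHIKLMNPQRSTVWY")

# Hypothetical helper: row i marks the positions where s[j] == alphabet[i].
voss_matrix(s, alphabet) = BitMatrix([c == a for a in alphabet, c in collect(s)])

P = voss_matrix("MKV", AMINO_ACIDS)
size(P)                                                   # (20, 3)
sum(P; dims = 1)                                          # 1×3 matrix of ones
join(AMINO_ACIDS[findfirst(col)] for col in eachcol(P))   # "MKV"
```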

Encoding BioSequences

This package provides a simple and fast way to encode biological sequences into Voss representations. Its main type is `VossEncoder`, a wrapper around a `BitMatrix` that stores the encoded sequence together with its alphabet. The following example encodes a DNA sequence into a Voss matrix:
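The "bit matrix plus alphabet" layout can be illustrated with a hypothetical stand-in type, `ToyVossEncoder`, written in plain Julia. It is only an illustration of the idea and is not the package's actual `VossEncoder` definition. The `bitmatrix` field name mirrors the property used with `VossEncoder`. The constructor signature is an assumption.

```julia
# Hypothetical `ToyVossEncoder`: illustrates keeping a bit matrix together
# with the alphabet that labels its rows. NOT BioVossEncoder's definition.
struct ToyVossEncoder{T}
    alphabet::Vector{T}   # row i corresponds to alphabet[i]
    bitmatrix::BitMatrix  # length(alphabet) × length(sequence)
end

# Assumed constructor: encode any vector of symbols over an explicit alphabet.
function ToyVossEncoder(seq::AbstractVector{T}, alphabet::Vector{T}) where {T}
    bits = BitMatrix([x == a for a in alphabet, x in seq])
    return ToyVossEncoder{T}(alphabet, bits)
end

enc = ToyVossEncoder(collect("GATTACA"), collect("ACGT"))
size(enc.bitmatrix)              # (4, 7)
findall(enc.bitmatrix[4, :])     # [3, 4]: the positions holding 'T'
```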

```julia
julia> using BioSequences, BioVossEncoder

julia> seq = dna"ACGT"

julia> VossEncoder(seq)
4×4 Voss Matrix of DNAAlphabet{4}():
 1  0  0  0
 0  1  0  0
 0  0  1  0
 0  0  0  1
```

For convenience, `VossEncoder` exposes a `bitmatrix` property that returns the underlying `BitMatrix` representation of the sequence:

```julia
julia> VossEncoder(seq).bitmatrix
4×4 BitMatrix:
 1  0  0  0
 0  1  0  0
 0  0  1  0
 0  0  0  1
```

Similarly, the `vossmatrix` function, which builds on `VossEncoder`, returns the `BitMatrix` representation of a sequence directly:

```julia
julia> vossmatrix(seq)
4×4 BitMatrix:
 1  0  0  0
 0  1  0  0
 0  0  1  0
 0  0  0  1
```

Creating a Voss vector of a sequence

It is sometimes useful to encode a sequence as a single Voss vector. A Voss vector is the bit vector (one row of the Voss matrix) that marks every position where a given symbol of the alphabet occurs.

The `vossvector` function returns this vector given a `BioSequence` and a single symbol (a `BioSymbol`, such as a `DNA` or `AA` value):
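Conceptually, a Voss vector is the row of the Voss matrix that belongs to one symbol. The sketch below shows that selection in plain Julia. `voss_row` is a hypothetical helper, not `vossvector` itself. Like the `vossvector` output shown here, it returns a view rather than a copy.

```julia
# Sketch: a Voss vector is the row of the Voss matrix for one symbol.
# `voss_row` is a hypothetical helper, not BioVossEncoder's `vossvector`.
function voss_row(V::AbstractMatrix{Bool}, alphabet::AbstractVector, sym)
    i = findfirst(==(sym), alphabet)
    i === nothing && throw(ArgumentError("symbol $(repr(sym)) is not in the alphabet"))
    return view(V, i, :)   # a view: shares memory with V, no copy
end

alphabet = collect("ACGT")
seq = collect("ACGTA")
V = BitMatrix([x == a for a in alphabet, x in seq])   # 4×5 Voss matrix
voss_row(V, alphabet, 'A')   # 5-element view with entries 1, 0, 0, 0, 1
```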

```julia
julia> vossvector(seq, DNA_A)
4-element view(::BitMatrix, 1, :) with eltype Bool:
 1
 0
 0
 0
```

Note that the result is a view into the `BitMatrix` representation of the sequence rather than a copy of the row. This avoids allocating and filling a new vector.
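The practical consequence of returning a view can be seen in plain Julia. A view shares memory with its parent matrix, whereas ordinary indexing such as `V[1, :]` copies the row into a new `BitVector`. The matrix below is a made-up example.

```julia
# Sketch: a row view shares memory with its parent BitMatrix, whereas
# `V[1, :]` copies the row into a new BitVector. The matrix is made up.
V = BitMatrix(Bool[1 0 0 0; 0 1 0 0; 0 0 1 0; 0 0 0 1])
v = view(V, 1, :)   # no copy of the row's bits
c = V[1, :]         # independent copy

V[1, 4] = true      # mutate the parent matrix
v[4]                # true: the view sees the change
c[4]                # false: the copy does not
```

The flip side is that a view stays tied to its parent. Call `copy` on it if an independent vector is needed.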

References

Voss, R. F. Evolution of long-range fractal correlations and 1/f noise in DNA base sequences. Phys. Rev. Lett. 68, 3805–3808 (1992).