Voss representation

A Voss representation of a biological sequence is a binary matrix in which each row corresponds to a symbol of the alphabet and each column to a position in the sequence (Voss, 1992). Formally, given a sequence $S$ of length $n$ and an alphabet $\mathscr{A}$ of size $m$, the Voss matrix $V$ of $S$ is an $m \times n$ binary matrix such that $V_{i,j} = 1$ if the $j^{th}$ position of $S$ is equal to the $i^{th}$ symbol of $\mathscr{A}$, and $V_{i,j} = 0$ otherwise:

\[V_{i,j} = \begin{cases} 1 & \text{if } S_j = \mathscr{A}_i \\ 0 & \text{if } S_j \neq \mathscr{A}_i \end{cases}\]

For example, the Voss matrix of the DNA sequence $S = \text{ACGT}$ (i.e. with $\mathscr{A} = \{A, C, G, T\}$) is the following matrix:

\[\begin{bmatrix} \text{A} & 1 & 0 & 0 & 0 \\ \text{C} & 0 & 1 & 0 & 0 \\ \text{G} & 0 & 0 & 1 & 0 \\ \text{T} & 0 & 0 & 0 & 1 \\ \end{bmatrix}\]

In this case the given alphabet is the DNA alphabet, but the same representation can be used for other alphabets.
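The definition can be sketched in a few lines of plain Julia. The helper below, `voss_sketch`, is a hypothetical name used only for illustration; it works on ordinary strings and characters and is not how the package itself is implemented.

```julia
# Hypothetical helper, not part of BioVossEncoder: builds the m×n Voss matrix
# of a plain string over an ordered alphabet, following the definition above.
function voss_sketch(seq::AbstractString, alphabet::AbstractVector{Char})
    symbols = collect(seq)                          # positions S_1, ..., S_n
    V = falses(length(alphabet), length(symbols))   # m×n BitMatrix, all zeros
    for (j, s) in enumerate(symbols), (i, a) in enumerate(alphabet)
        V[i, j] = (s == a)   # 1 iff position j holds the i-th alphabet symbol
    end
    return V
end

V = voss_sketch("ACGT", ['A', 'C', 'G', 'T'])
```

When every symbol of the sequence belongs to the alphabet, each column of the result contains exactly one 1. Passing a different ordered alphabet (for instance a vector of amino acid letters) changes only the number and meaning of the rows.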

Encoding BioSequences

This package provides a simple and fast way to encode biological sequences into Voss representations. The main struct provided by this package is VossEncoder, a wrapper around a BitMatrix that stores the bit-matrix encoding of a biological sequence together with its corresponding alphabet. The following example shows how to encode a DNA sequence into a Voss matrix.

julia> using BioSequences, BioVossEncoder
julia> seq = dna"ACGT"
julia> VossEncoder(seq)
4×4 Voss Matrix of DNAAlphabet{4}():
 1  0  0  0
 0  1  0  0
 0  0  1  0
 0  0  0  1

For convenience, the VossEncoder struct provides a bitmatrix property that returns the BitMatrix representation of the sequence.

julia> VossEncoder(seq).bitmatrix
4×4 BitMatrix:
 1  0  0  0
 0  1  0  0
 0  0  1  0
 0  0  0  1

Similarly, the vossmatrix function, which makes use of the VossEncoder struct, returns the BitMatrix representation of a sequence directly.

julia> vossmatrix(seq)
4×4 BitMatrix:
 1  0  0  0
 0  1  0  0
 0  0  1  0
 0  0  0  1

Creating a Voss vector of a sequence

Sometimes it is useful to encode a sequence as a Voss vector (i.e. the bit vector that marks the positions of a single symbol from the corresponding alphabet).

This package provides the function vossvector, which returns the Voss vector of a sequence given a BioSequence and the symbol of interest (a BioSymbol, either DNA or AA, such as DNA_A).

julia> vossvector(seq, DNA_A)
4-element view(::BitMatrix, 1, :) with eltype Bool:
 1
 0
 0
 0

Note that the output is a view into the BitMatrix representation of the sequence rather than a copy. This is done for performance reasons.
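The practical consequence of returning a view can be illustrated with Base Julia alone. The sketch below uses a hand-built BitMatrix as a stand-in for the Voss matrix of ACGT and does not call the package; the variable names are illustrative assumptions.

```julia
# Stand-in for the Voss matrix of ACGT, built by hand (no package calls).
V = BitMatrix(Bool[1 0 0 0; 0 1 0 0; 0 0 1 0; 0 0 0 1])

row_a  = view(V, 1, :)      # view of row 1: shares memory with V, no copy made
copy_a = BitVector(row_a)   # independent copy, if a standalone vector is needed

V[1, 2] = true              # mutate the parent matrix

# The view reflects the change; the copy does not.
seen_by_view = row_a[2]
seen_by_copy = copy_a[2]
```

Taking a view avoids allocating a new vector for every symbol, at the cost that the result stays tied to its parent matrix. Materializing it, for example with `BitVector` or `collect`, gives an independent vector when one is needed.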

References

Voss, R. F. Evolution of long-range fractal correlations and 1/f noise in DNA base sequences. Phys. Rev. Lett. 68, 3805–3808 (1992).