Finding Complete and Overlapped ORFs
The main function in the GeneFinder package is findorfs, which serves as an interface to various gene-finding algorithms. All methods return an ORFCollection—a structure that bundles the found ORFs with their source sequence, providing a clean API for sequence extraction.
Note
The minlen keyword argument is set to a minimum length of 6 nucleotides (nt). As a result, it may identify short ORFs that aren't necessarily genes, such as dna"ATGTGA" producing the amino acid sequence aa"M*".
Usage Example
Here's an example of using findorfs with the NaiveFinder algorithm:
julia> using BioSequences, GeneFinder
julia> seq = dna"AACCAGGGCAATATCAGTACCGCGGGCAATGCAACCCTGACTGCCGGCGGTAACCTGAACAGCACTGGCAATCTGACTGTGGGCGGTGTTACCAACGGCACTGCTACTACTGGCAACATCGCACTGACCGGTAACAATGCGCTGAGCGGTCCGGTCAATCTGAATGCGTCGAATGGCACGGTGACCTTGAACACGACCGGCAATACCACGCTCGGTAACGTGACGGCACAAGGCAATGTGACGACCAATGTGTCCAACGGCAGTCTGACGGTTACCGGCAATACGACAGGTGCCAACACCAACCTCAGTGCCAGCGGCAACCTGACCGTGGGTAACCAGGGCAATATCAGTACCGCAGGCAATGCAACCCTGACGGCCGGCGACAACCTGACGAGCACTGGCAATCTGACTGTGGGCGGCGTCACCAACGGCACGGCCACCACCGGCAACATCGCGCTGACCGGTAACAATGCACTGGCTGGTCCTGTCAATCTGAACGCGCCGAACGGCACCGTGACCCTGAACACAACCGGCAATACCACGCTGGGTAATGTCACCGCACAAGGCAATGTGACGACTAATGTGTCCAACGGCAGCCTGACAGTCGCTGGCAATACCACAGGTGCCAACACCAACCTGAGTGCCAGCGGCAATCTGACCGTGGGCAACCAGGGCAATATCAGTACCGCGGGCAATGCAACCCTGACTGCCGGCGGTAACCTGAGC";
julia> collection = findorfs(seq, finder=NaiveFinder)
ORFCollection{NaiveFinder} with 12 ORFs in 726bp sequence:
ORF{NaiveFinder}(29:40, '+', 2)
ORF{NaiveFinder}(137:145, '+', 2)
ORF{NaiveFinder}(164:184, '+', 2)
⋮Working with ORFCollections
The ORFCollection provides a convenient interface for working with ORFs:
julia> collection[1] # First ORF
ORF{NaiveFinder}(29:40, '+', 2)
julia> collection[1:5] # Range of ORFs (returns a Vector)
5-element Vector{OpenReadingFrame{NaiveFinder}}:
ORF{NaiveFinder}(29:40, '+', 2)
⋮
julia> length(collection) # Number of ORFs
12
julia> source(collection) # Access source sequence
726nt DNA Sequence:
AACCAGGGCAATATCAGTACCGCGGGCAATGCAACCCTG…GCGGGCAATGCAACCCTGACTGCCGGCGGTAACCTGAGCExtracting Sequences from ORFs
Extract sequences using the sequence function with an index or ORF:
julia> sequence(collection, 1) # By index
12nt DNA Sequence:
ATGCAACCCTGA
julia> orf = collection[1]; sequence(collection, orf) # By ORF
12nt DNA Sequence:
ATGCAACCCTGA
julia> sequence.(Ref(collection), collection.orfs) # All sequences using broadcasting
12-element Vector{LongSubSeq{DNAAlphabet{4}}}:
ATGCAACCCTGA
ATGCGCTGA
⋮Translating ORF Sequences
To translate ORF sequences to amino acids, use BioSequences.translate on extracted sequences:
julia> using BioSequences
julia> translate(sequence(collection, 1))
4aa Amino Acid Sequence:
MQP*
julia> translate.(sequence.(Ref(collection), collection.orfs)) # Translate all ORFs
12-element Vector{LongAA}:
MQP*
MR*
MRRMAR*
⋮