Writing ORFIs into bioinformatic formats

This package facilitates the creation of FASTA, BED, and GFF files. It extracts Open Reading Frame (ORF) information from BioSequence instances, in particular those of type NucleicSeqOrView{A} where A, and writes that information in the desired format.

Functionality:

The package provides four distinct functions for writing files in different formats:

| Function | Description |
|---|---|
| write_orfs_fna | Writes nucleotide sequences in FASTA format. |
| write_orfs_faa | Writes amino acid sequences in FASTA format. |
| write_orfs_bed | Outputs information in BED format. |
| write_orfs_gff | Generates files in GFF format. |

All these functions support processing both BioSequence instances and external FASTA files. To process an external FASTA file rather than a BioSequence instance, simply provide the path to the file as a String. To demonstrate the use of the write_* methods with a BioSequence, consider the following example:

julia
using BioSequences, GeneFinder

# > 180195.SAMN03785337.LFLS01000089 -> finds only 1 gene in Prodigal (from Pyrodigal tests)
seq = dna"AACCAGGGCAATATCAGTACCGCGGGCAATGCAACCCTGACTGCCGGCGGTAACCTGAACAGCACTGGCAATCTGACTGTGGGCGGTGTTACCAACGGCACTGCTACTACTGGCAACATCGCACTGACCGGTAACAATGCGCTGAGCGGTCCGGTCAATCTGAATGCGTCGAATGGCACGGTGACCTTGAACACGACCGGCAATACCACGCTCGGTAACGTGACGGCACAAGGCAATGTGACGACCAATGTGTCCAACGGCAGTCTGACGGTTACCGGCAATACGACAGGTGCCAACACCAACCTCAGTGCCAGCGGCAACCTGACCGTGGGTAACCAGGGCAATATCAGTACCGCAGGCAATGCAACCCTGACGGCCGGCGACAACCTGACGAGCACTGGCAATCTGACTGTGGGCGGCGTCACCAACGGCACGGCCACCACCGGCAACATCGCGCTGACCGGTAACAATGCACTGGCTGGTCCTGTCAATCTGAACGCGCCGAACGGCACCGTGACCCTGAACACAACCGGCAATACCACGCTGGGTAATGTCACCGCACAAGGCAATGTGACGACTAATGTGTCCAACGGCAGCCTGACAGTCGCTGGCAATACCACAGGTGCCAACACCAACCTGAGTGCCAGCGGCAATCTGACCGTGGGCAACCAGGGCAATATCAGTACCGCGGGCAATGCAACCCTGACTGCCGGCGGTAACCTGAGC"

Once a BioSequence object has been instantiated, the write_orfs_fna function can be used to generate a FASTA file containing the nucleotide sequences of the ORFIs. Notably, the write_orfs_* methods accept either an IOStream or an IOBuffer as the output argument, so the output can be directed either to a file or to an in-memory buffer. The following example writes the output directly to a file.

julia
outfile = "LFLS01000089.fna"

open(outfile, "w") do io
    write_orfs_fna(seq, io, NaiveFinder())
end

bash
cat LFLS01000089.fna

>seq id=01 start=29 stop=40 strand=+ frame=2 features=[]
ATGCAACCCTGA
>seq id=02 start=137 stop=145 strand=+ frame=2 features=[]
ATGCGCTGA
>seq id=03 start=164 stop=184 strand=+ frame=2 features=[]
ATGCGTCGAATGGCACGGTGA
>seq id=04 start=173 stop=184 strand=+ frame=2 features=[]
ATGGCACGGTGA
>seq id=05 start=236 stop=241 strand=+ frame=2 features=[]
ATGTGA
>seq id=06 start=248 stop=268 strand=+ frame=2 features=[]
ATGTGTCCAACGGCAGTCTGA
>seq id=07 start=362 stop=373 strand=+ frame=2 features=[]
ATGCAACCCTGA
>seq id=08 start=470 stop=496 strand=+ frame=2 features=[]
ATGCACTGGCTGGTCCTGTCAATCTGA
>seq id=09 start=551 stop=574 strand=+ frame=2 features=[]
ATGTCACCGCACAAGGCAATGTGA
>seq id=10 start=569 stop=574 strand=+ frame=2 features=[]
ATGTGA
>seq id=11 start=581 stop=601 strand=+ frame=2 features=[]
ATGTGTCCAACGGCAGCCTGA
>seq id=12 start=695 stop=706 strand=+ frame=2 features=[]
ATGCAACCCTGA
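
The same call can target an in-memory buffer instead of a file. The following is a minimal sketch, assuming the positional arguments used above (sequence, output, finder); the short sequence is only an illustrative input and is not the contig from the example above.

```julia
using BioSequences, GeneFinder

# Short illustrative sequence (an assumption for this sketch)
seq = dna"ATGATGCATGCATGCATGCTAGTAACTAGCTAGCTAGCTAGTAA"

# Collect the FASTA records in memory instead of on disk
buf = IOBuffer()
write_orfs_fna(seq, buf, NaiveFinder())

# Drain the buffer into a String for inspection or further processing
fasta_text = String(take!(buf))
print(fasta_text)
```

Writing to a buffer can be convenient in tests or pipelines where the records are consumed immediately and no intermediate file is needed.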

The same approach can be used to write a FASTA file with the amino acid sequences of the ORFIs using the write_orfs_faa function, and likewise BED and GFF files using the write_orfs_bed and write_orfs_gff functions, respectively.
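
As a rough sketch of how the other writers might be called, the loop below assumes that write_orfs_faa, write_orfs_bed, and write_orfs_gff take the same positional arguments as write_orfs_fna (sequence, output, finder). The sequence and the output file names are hypothetical, chosen only for illustration.

```julia
using BioSequences, GeneFinder

# Short illustrative sequence (an assumption for this sketch)
seq = dna"ATGATGCATGCATGCATGCTAGTAACTAGCTAGCTAGCTAGTAA"

# Pair each writer with a hypothetical output path.
# Assumption: every writer accepts (sequence, io, finder), like write_orfs_fna.
targets = (
    (write_orfs_faa, "example.faa"),
    (write_orfs_bed, "example.bed"),
    (write_orfs_gff, "example.gff"),
)

for (writer, path) in targets
    # The do-block closes the file even if the writer throws
    open(path, "w") do io
        writer(seq, io, NaiveFinder())
    end
end
```

If a given writer takes additional keyword arguments in your installed version, they would be passed inside the do-block call.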