In this chapter we will explore the fundamentals of the command-line interface (CLI). We will distinguish between Unix, the CLI, Bash, and the terminal, along with other concepts from computer science.
As you will see, the CLI is composed of several programs that enable interaction with the machine. We will discuss some of the basic ones for navigating your machine and some advanced ones that enable complex operations and task automation.
Before landing in the CLI, let us consider the Unix concept. The first question in this section is: what is Unix? It is simply an operating system (OS). In other words, it is a set of programs that interoperate to let you communicate with the machine. A very important variant (or clone) of Unix is the well-known OS Linux, which was created from scratch by Linus Torvalds. The most important idea behind Unix-based systems is that we can use them to access information and hardware programmatically. Another main feature of Unix-like systems is that data is usually stored as text files and the interface through which users communicate with the machine is also text-based (TUI, text user interface, as opposed to GUI, graphical user interface).
Almost every computer offers a way to interact with, or access, its inner elements. This interface is called the command-line interface (Fig. 1.1).
Programs, files, and directories on every machine with a Unix-like OS are organized in hierarchical paths (routes) that start from the root, represented by the forward-slash character /. The root is the starting point of everything installed on the machine. All other files are nested beneath it, forming a tree-like structure of paths (Fig. 1.2).
You can inspect the paths of a nested directory tree using the tree command in your CLI:
tree -d -L 1
There are basically two ways to describe a location in your file system. If you always write it starting from the root, you are giving an absolute path. For instance, the absolute path to my desktop is /Users/camilogarcia/Desktop. If you write it starting from your current directory instead, you are giving a relative path.
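As a minimal sketch of the difference between a path written from the root and one written from the current directory, the lines below build a small throwaway tree and reach the same directory both ways. The directory names projects and notes are hypothetical, and the tree lives in a temporary location so nothing on your machine is touched.

```shell
# Create a throwaway tree in a temporary location (hypothetical names).
base=$(mktemp -d)
mkdir -p "$base/projects/notes"

# Absolute path: written out from the root, so it works from anywhere.
cd "$base/projects/notes"
abs_location=$(pwd -P)

# Relative path: written out from the current directory.
cd "$base"
cd projects/notes
rel_location=$(pwd -P)

# Both lines show the same directory.
echo "$abs_location"
echo "$rel_location"
```

Both routes end in the same place; the relative one is shorter but only works when you start from the right directory.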
Given that the vast majority of file systems are organized as file paths, the first question when starting with the CLI is “Where am I?”. Unix comes equipped with many commands, but the most basic ones are oriented toward answering that question and navigating this text-based interface of files. The following three commands (pwd, cd, ls) will help you conquer the CLI.
To find out where you are, print your working directory using the pwd command.
pwd
cd test-dir
Some basic arguments for moving around your file system with cd:
cd .. # move up to the parent directory
cd ~ # change to the home directory
cd / # change to the root directory
cd - # change to the previous directory
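The cd shortcuts can be tried safely in a scratch location. The following is a small sketch, assuming a temporary directory that contains a test-dir folder; the variable names start, parent, and previous are only illustrative.

```shell
# Work inside a temporary directory so nothing real is touched.
start=$(mktemp -d)
mkdir -p "$start/test-dir"

cd "$start/test-dir"    # enter the nested directory
cd ..                   # move up to the parent directory
parent=$(pwd -P)

cd "$start/test-dir"
cd /                    # jump to the root
cd - > /dev/null        # go back to the previous directory (cd - also prints it)
previous=$(pwd -P)

echo "parent:   $parent"
echo "previous: $previous"
```

Here parent ends up being the temporary directory itself, while previous is test-dir again, because cd - undoes the last jump.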
ls
You can navigate through previously executed commands by pressing the up or down arrow keys.
mkdir test-dir
A simple command to create a file from your terminal is touch. It only creates an empty file; it does not let you edit it.
touch new-file.txt
The new-file.txt file is empty and is created in your current location unless you specify another path when creating it. We suggest taking a look at Allison Horst's illustrations, especially the one on how to name files depending on the case (see Fig. 1.3).
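To illustrate creating a file somewhere other than the current location, here is a short sketch. It assumes a temporary working directory (workdir is a hypothetical name) and uses the standard test operators -e (the file exists) and -s (the file is not empty).

```shell
# Scratch area with a test-dir folder inside it.
workdir=$(mktemp -d)
mkdir "$workdir/test-dir"

# Give touch a path to create the file in another directory.
touch "$workdir/test-dir/new-file.txt"

# -e checks that the file exists, -s that its size is greater than zero.
if [ -e "$workdir/test-dir/new-file.txt" ] && [ ! -s "$workdir/test-dir/new-file.txt" ]; then
    echo "new-file.txt exists and is empty"
fi
```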
cat new-file.txt
some
lines
that
were
written
echo "This will be printed"
This will be printed
rm
When working with a long command, it is practical to jump to its beginning or end. To do so, use the key combinations Ctrl + A and Ctrl + E, respectively.
rmdir
There are still many conventions for naming the parts of a command line, yet a very standard one is presented in Fig. 1.4.
Some people, for instance, also call the option a flag. These conventions are powerful because almost every command-line interface displays this structure (complex ones add other features, and simple ones tend to lack subcommands).
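As a small illustration of these parts, the sketch below runs ls twice on a scratch directory: once with only an argument, and once with an option added. The file names visible.txt and .hidden.txt are hypothetical; ls has no subcommands, so it is an example of the simpler structure.

```shell
# Scratch directory with one regular file and one hidden file.
demo=$(mktemp -d)
touch "$demo/visible.txt" "$demo/.hidden.txt"

# command + argument: list the directory given as argument
plain=$(ls "$demo")

# command + option (flag) + argument: -a also lists hidden entries
with_option=$(ls -a "$demo")

echo "$plain"
echo "$with_option"
```

The option changes how the command behaves, while the argument tells it what to act on.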
Bacterial defense mechanisms against bacteriophage infection are abundant. One of these is the restriction-modification system (RM system), which works by marking specific sites called motifs, shared by the phage and the bacterium, with methylations. Motifs are commonly represented as a sequence logo, which is a probabilistic representation of the nucleotides at each position. The challenge consists of finding the number of times the motif from Fig. 1.5 appears in the B. tequilensis EA-CB0015 genome using a command. Assume that the probabilities are equal when multiple bases appear at one site.
Before diving into an answer, take your time to think it through and solve it on your own.
When facing the CLI, several issues or problems will arise, as with any other unintuitive challenge. With a complete text interface, three things help: handling errors, getting help, and patience.
Some operators or metacharacters have special functions in Bash. For instance, the * (wildcard) is a pattern-matching character (sometimes called a placeholder) that stands for any character, any number of times; similarly, the ? stands for any single character. Strictly speaking, these are shell (glob) patterns rather than regular expressions. The $ (dollar sign), in turn, has a special task: calling variables, including environment variables. Once a variable is defined (e.g., var=1), it can be called at any time via the $ operator: echo $var will give us 1 as the standard output.
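A minimal sketch of these three characters in action, assuming a throwaway directory with hypothetical file names (a1.txt, a22.txt, b1.csv):

```shell
# Scratch directory with a few empty files to match against.
demo=$(mktemp -d)
cd "$demo"
touch a1.txt a22.txt b1.csv

# * stands for any run of characters, so this matches every .txt file
all_txt=$(echo *.txt)

# ? stands for exactly one character, so a22.txt is not matched
one_char=$(echo a?.txt)

# $ calls a variable that was defined earlier
var=1
expanded=$(echo $var)

echo "$all_txt"     # a1.txt a22.txt
echo "$one_char"    # a1.txt
echo "$expanded"    # 1
```

Note that the shell replaces the pattern with the matching file names before echo runs, which is why echo can be used to preview what a pattern will match.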
wc
tr
grep
sed
When you first use the CLI, it is common to feel quite slow. A very useful tip to boost productivity on the command line is autocompletion: hit <tab> after typing the first characters of a command.
When working with a long command, it is also useful to jump word by word instead of character by character. In many terminals you can do so with the key combinations Alt + <- and Alt + ->, which jump backward and forward, respectively.
The second part of this challenge consists of creating a script from the one-line motif-search command that recursively searches for the motifs in all genomes from a zip file containing 10 bacterial genomes. The script should include the shebang, loops, conditionals, and environment variables.
See a possible script that solves the challenge here.
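To show how the requested ingredients (shebang, loop, conditional, and environment variable) fit together, here is a rough skeleton on toy data. It is a sketch under stated assumptions, not the solution to the challenge: the motif GATC, the names MOTIF, GENOME_DIR, and count_motif, and the two toy FASTA files are all hypothetical, and the genomes are assumed to be already unzipped into one folder.

```shell
#!/usr/bin/env bash
# Sketch only: MOTIF, GENOME_DIR and the toy FASTA files are hypothetical.

MOTIF="${MOTIF:-GATC}"       # environment variable, with a default value
GENOME_DIR=$(mktemp -d)      # stands in for the folder of unzipped genomes

# Two toy "genomes" so the sketch can run on its own.
printf '>toy1\nAAGATCTT\nGATC\n' > "$GENOME_DIR/toy1.fasta"
printf '>toy2\nCCCCCCCC\n' > "$GENOME_DIR/toy2.fasta"

count_motif() {
    # Drop header lines, join the sequence lines, count non-overlapping matches.
    grep -v '^>' "$1" | tr -d '\n' | grep -o "$MOTIF" | wc -l | tr -d ' '
}

for genome in "$GENOME_DIR"/*.fasta; do        # loop over every genome file
    count=$(count_motif "$genome")
    if [ "$count" -gt 0 ]; then                # conditional on the result
        echo "$(basename "$genome"): $count"
    else
        echo "$(basename "$genome"): motif not found"
    fi
done
```

With the toy data this reports two matches for toy1.fasta and none for toy2.fasta. The sketch only scans the strand as written and counts non-overlapping matches; a motif with several possible bases at one position would need a pattern such as a bracket expression instead of a fixed string.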
.bashrc
awk
Restriction endonucleases (RE) cleave DNA by breaking the phosphodiester bond between two nucleotides. Many RE target specific DNA motifs, which are normally palindromic. There are three main types, classified by their digestion mechanism. RE have been widely used in molecular biotechnology because of their specificity and versatility in different experiments.
One of the main uses of RE is to generate a pattern of restriction fragments from different organisms so that samples of organisms, sequences, or genes can be distinguished, as long as they differ in the number of recognition motifs. This is normally done in the lab, where an RE is mixed with a DNA sample and an electrophoresis gel is later run to see a separation pattern according to fragment size.
Professor Javier has sequenced the genome of a sampled SARS-CoV-2 and wants to see the band pattern that the genome would display if it were digested with the RE EcoRV. He has asked you to help him with this problem. The expected output is a text file with the sizes of the fragments, where the size is the number of nucleotides of each fragment.
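One possible way to approach this, sketched on a toy sequence rather than the real genome: EcoRV recognizes GATATC and cuts in the middle of the site (GAT|ATC), so a line break can be placed at every cut site and the length of each resulting line printed. The sequence below and the variable names are hypothetical.

```shell
# Toy sequence (hypothetical, not the SARS-CoV-2 genome).
sequence="AAAGATATCCCCCGATATCGG"

# EcoRV recognizes GATATC and cuts in the middle (GAT|ATC):
# put a line break at every cut site, then print the length of each fragment.
fragment_sizes=$(printf '%s\n' "$sequence" \
    | awk '{ gsub(/GATATC/, "GAT\nATC"); print }' \
    | awk '{ print length($0) }')

echo "$fragment_sizes"
```

For this toy sequence the fragments are 6, 10, and 5 nucleotides long. A real FASTA file would first need its header removed and its sequence lines joined, and redirecting the result with > would give the text file the professor asked for.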
For more explanations of the basic commands of the command line, we suggest visiting the first chapters of Computing Skills for Biologists by Allesina and Wilmes (2019).
A list of readings for this section:
Dudley and Butte (2009)
Perkel et al. (2021)
Brandies and Hogg (2021)