CLIMB 2
nano is a simple text editor that can be used for small edits on remote files. For larger tasks, it is better to edit the file locally with a more powerful text editor, save it, and transfer it to the remote server.
The keyboard shortcuts are listed at the bottom of the screen, where ^ means Ctrl. To exit, for example, we press Ctrl + X (if the file has changed, we will be asked to confirm whether we want to save it).
The general syntax is: ls [options] [files]. Both the options and the files are optional, and files can be files or directories. Here are some of the options:
Option | Description |
---|---|
-a | Also show hidden files |
-l | Long format: one file per line, with size, owner, date… |
-h | Used with -l, displays file sizes in human-readable format (e.g. 2.2M instead of 2298011) |
-d | Show directories as files, without listing their content |
The options can be combined together, and the following two commands are identical:
ls -l -h -a
ls -lha
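The options above can be tried safely in a throwaway directory. This is only a minimal sketch: the directory is created with mktemp and all file names are invented for the illustration.

```shell
# Work in a temporary directory (all names below are hypothetical)
demo=$(mktemp -d)
mkdir "$demo/subdir"
touch "$demo/visible.txt" "$demo/.hidden" "$demo/subdir/inner.txt"

# Without -a, hidden files (names starting with a dot) are skipped
ls "$demo"

# With -a, .hidden is listed too (together with . and ..)
ls -a "$demo"

# Separate and combined options produce the same listing
ls -l -h -a "$demo"
ls -lha "$demo"

# With -d, the directory itself is listed instead of its content
ls -d "$demo/subdir"
```

When finished, the temporary directory can be deleted with rm -r "$demo".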
If we want to list the files present at the root, we don't need to move there: we can simply tell ls which path to scan:
ls /
Here is another example:
ls ~/learn_bash/phage/ ~/learn_bash/files/
As we noticed, ls can receive more than one path. Usually, though, we don't type every single item to be listed: instead we use wildcards, and the shell expands them into a list of paths before running the command. We can use wildcards, ranges and lists.
Symbol | Meaning | Example |
---|---|---|
* | Any set of characters (any length) | *.fasta : all files ending with “.fasta” |
? | A single character | A???.txt : files starting with A, followed by exactly 3 characters, ending with “.txt” |
[a-z] | Range: any single lowercase letter | file1[a-c].txt : file1a.txt, file1b.txt and file1c.txt |
[0-9] | Range: any single digit | reads_R[1-2].fastq : reads_R1.fastq and reads_R2.fastq |
{a,b} | Comma-separated list of words | vir_{protein,assembly}* : files beginning with vir_protein or vir_assembly |
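Since the expansion is done by the shell and not by ls, we can preview it with echo. The sketch below assumes bash (the {a,b} list is not available in every shell) and uses empty files with invented names, created in a temporary directory.

```shell
# Create some empty files to play with (hypothetical names)
demo=$(mktemp -d)
cd "$demo"
touch a.fasta b.fasta Abcd.txt Ab.txt \
      file1a.txt file1b.txt file1d.txt \
      reads_R1.fastq reads_R2.fastq reads_R3.fastq \
      vir_protein.faa vir_assembly.fna vir_genomic.gff

# echo prints the list of paths produced by the shell
echo *.fasta                  # a.fasta b.fasta
echo A???.txt                 # Abcd.txt (Ab.txt is too short)
echo file1[a-c].txt           # file1a.txt file1b.txt (file1d.txt is out of range)
echo reads_R[1-2].fastq       # reads_R1.fastq reads_R2.fastq
echo vir_{protein,assembly}*  # vir_protein.faa vir_assembly.fna
```

Replacing echo with ls -l would list the same files in long format.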
Yesterday we had a preview of TSV files, a very common way to store tabular data on the command line, including an example of a bioinformatics file (the GFF file with the annotation of the lambda phage).
The GFF (General Feature Format) is used to store annotations. An alternative format, called GTF, is more focused on gene annotations, while GFF is more generic. They are both TSV (tab-separated values) files, that is, tables where the boundaries between cells are marked by a single tab character.
The first lines optionally specify some metadata, and they are preceded by a #.
Let's see an example:
less -S ~/learn_bash/phage/vir_genomic.gff
# If we want to remove the header lines:
grep -v '^#' ~/learn_bash/phage/vir_genomic.gff | less -S
# If we want to increase the tabulation:
grep -v '^#' ~/learn_bash/phage/vir_genomic.gff | less -S -x 15
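To see the same idea on something small, here is a sketch with a toy GFF-like file. Its content is invented for the illustration and is not taken from vir_genomic.gff.

```shell
# Build a tiny GFF-like file: two header lines, then two tab-separated features
demo=$(mktemp -d)
printf '##gff-version 3\n'                          >  "$demo/toy.gff"
printf '#!made-up header line\n'                    >> "$demo/toy.gff"
printf 'seq1\ttoy\tgene\t1\t90\t.\t+\t.\tID=gene1\n' >> "$demo/toy.gff"
printf 'seq1\ttoy\tCDS\t1\t90\t.\t+\t0\tID=cds1\n'   >> "$demo/toy.gff"

# The whole file, headers included
cat "$demo/toy.gff"

# Only the feature lines: -v inverts the match, ^# means "starts with #"
grep -v '^#' "$demo/toy.gff"

# Once the headers are gone, every line is a table row: print the third column
grep -v '^#' "$demo/toy.gff" | cut -f 3
```

The last command should print the feature types only, one per line.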
If we want to extract all the lines with CDSs (-w matches whole words only, that is, the pattern must be surrounded by non-alphanumeric characters), and then only those containing the word capsid (-i makes the match case-insensitive):
grep -w CDS ~/learn_bash/phage/vir_genomic.gff
grep -w CDS ~/learn_bash/phage/vir_genomic.gff | grep -i capsid
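The effect of -w and -i is easier to see on a few lines written for the purpose. The file below is a made-up example, not real annotation data.

```shell
# Three invented lines: two real CDS rows and one that merely contains "CDS"
demo=$(mktemp -d)
printf 'seq1\ttoy\tCDS\t1\t90\tproduct=Capsid protein\n' >  "$demo/toy.tsv"
printf 'seq1\ttoy\tCDS\t100\t190\tproduct=tail fiber\n'  >> "$demo/toy.tsv"
printf 'seq1\ttoy\tnote\t1\t190\tNOTCDS region\n'        >> "$demo/toy.tsv"

# Plain grep matches CDS anywhere, also inside NOTCDS: three lines
grep CDS "$demo/toy.tsv"

# With -w, CDS must be a whole word: the NOTCDS line is dropped
grep -w CDS "$demo/toy.tsv"

# -i ignores case, so "capsid" also matches "Capsid": one line left
grep -w CDS "$demo/toy.tsv" | grep -i capsid
```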
A useful command to extract some columns from a text file is cut:
cut -f 1,3-5 ~/learn_bash/phage/vir_genomic.gff
GFF, GTF, but also SAM and VCF are examples of tabular text files: they are all tab-separated. A smaller example will be easier to deal with:
cat ~/learn_bash/files/wine.tsv
To sort a table, there is the command sort, with these options:
- -n to sort numerically (default is alphabetic)
- -k NUMBER to specify the column to sort by (by default the first)
- -r for reverse sorting (default: ascending)
If we want to sort by the third column of the file:
sort -k 3 ~/learn_bash/files/wine.tsv
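The difference between alphabetic and numeric sorting is a common source of surprises. A small sketch follows, using an invented three-line table (it is not the content of wine.tsv).

```shell
# A toy table: name, country, and a number (all values are made up)
demo=$(mktemp -d)
printf 'wineA\tItaly\t9\n'   >  "$demo/toy.tsv"
printf 'wineB\tFrance\t13\n' >> "$demo/toy.tsv"
printf 'wineC\tSpain\t100\n' >> "$demo/toy.tsv"

# Alphabetic sort on column 3: "100" comes before "13", which comes before "9"
sort -k 3 "$demo/toy.tsv"

# Numeric sort on column 3: 9, 13, 100
sort -n -k 3 "$demo/toy.tsv"

# Numeric and reversed: 100, 13, 9
sort -n -r -k 3 "$demo/toy.tsv"
```

Without -n the numbers are compared character by character, which is why 100 ends up before 9.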
Sometimes we need to increase the space used by tabs to have a clearer view:
sort -k 3 /homes/2020/binf/data/people.tsv | less -S -x 20
Sometimes with tabular data we want to extract a set of columns. The command cut is there for us:
- -f to specify the columns (fields); supports lists (-f 1,4,6) and ranges (-f 1-8)
- -d the character delimiting the columns. The default is tab, but it can be changed, e.g. -d ","

Note that sort uses a different option to set the delimiter:
- -t the character delimiting the columns. By default sort splits on white space; type -t$'\t' for tab-delimited or -t ',' for comma-separated files
# Get Country and Alcohol content:
cut -f 2,3 ~/learn_bash/files/wine.tsv
# and sort by alcohol:
cut -f 2-3 ~/learn_bash/files/wine.tsv | sort -n -k 2
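The same pipeline works on comma-separated files, as long as both commands are told about the delimiter: -d for cut and -t for sort. The sketch below uses a toy CSV file with invented values.

```shell
# A toy comma-separated table: name, country, number (all values are made up)
demo=$(mktemp -d)
printf 'wineA,Italy,14\n'  >  "$demo/toy.csv"
printf 'wineB,Germany,9\n' >> "$demo/toy.csv"
printf 'wineC,Spain,13\n'  >> "$demo/toy.csv"

# cut: use -d to split on commas instead of tabs
cut -d ',' -f 2,3 "$demo/toy.csv"

# sort: use -t for the same purpose; numeric sort on the third column
sort -t ',' -n -k 3 "$demo/toy.csv"

# Combined: after cut, the number is the second column
cut -d ',' -f 2,3 "$demo/toy.csv" | sort -t ',' -n -k 2
```

Forgetting -d or -t does not raise an error: each line is then treated as a single column, so the output is simply not what we expect.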
· Bioinformatics at the Command Line - Andrea Telatin, 2017-2020