TAATGCCATGGGATGTT
TGGCA
GCATTGCAA
TGCAAT
CAATT
ATTTGAC
In this tutorial, we are going to perform a de novo assembly of the genome of an unknown organism. First, download the read data:
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR292/SRR292770/SRR292770_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR292/SRR292770/SRR292770_2.fastq.gz
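The reads are stored as gzip-compressed FASTQ files. You can inspect them without unpacking by using zcat, zless, or zmore, for example:
zless SRR292770_1.fastq.gz
Each FASTQ record takes four lines: a header line starting with @, the read sequence, a separator line starting with +, and the per-base quality string.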
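Because each FASTQ record occupies four lines, the number of reads in a file is its line count divided by four. The following is a minimal sketch under that assumption (no wrapped sequence lines). The function name count_reads and the file toy.fastq.gz are made up for illustration and are not part of the tutorial data.

```shell
# count_reads FILE: print the number of reads in a gzip-compressed FASTQ file.
# Assumes every record occupies exactly four lines.
count_reads() {
    lines=$(gzip -dc "$1" | wc -l)
    echo $((lines / 4))
}

# Toy input with two records (hypothetical data, not taken from SRR292770).
printf '@read1\nTAATGCC\n+\nIIIIIII\n@read2\nATGGGAT\n+\nIIIIIII\n' | gzip -c > toy.fastq.gz

count_reads toy.fastq.gz
```

On the real data, count_reads SRR292770_1.fastq.gz and count_reads SRR292770_2.fastq.gz should print the same number if the two files hold the two mates of each read pair.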
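Download the Velvet assembler. The algorithm is described here: https://doi.org/10.1101/gr.074492.107. The command below clones the repository into a directory named velvet_1.2.10, which is the name used by the commands in the rest of this tutorial.
git clone https://github.com/dzerbino/velvet velvet_1.2.10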
Now build the assembler.
cd velvet_1.2.10
make MAXKMERLENGTH=60 OPENMP=1
cd ..
At this point, we are ready to run the assembly algorithm. Velvet first hashes the reads using the velveth command. The velvetg command then constructs the de Bruijn graph. Run velveth and velvetg without arguments to see their usage and options:
./velvet_1.2.10/velveth
./velvet_1.2.10/velvetg
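The velvetg option -cov_cutoff 2.81 sets the coverage cutoff: nodes with coverage below 2.81 are removed from the graph.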
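To get a feeling for what is being hashed, it can help to list the k-mers of a short sequence by hand. The sketch below only illustrates the idea of cutting a sequence into overlapping substrings of length k. It is not how Velvet is implemented, and the function name kmers is made up for this example.

```shell
# kmers SEQ K: print every substring of length K of SEQ, one per line.
kmers() {
    awk -v s="$1" -v k="$2" 'BEGIN {
        # slide a window of width k over the sequence, one base at a time
        for (i = 1; i + k - 1 <= length(s); i++)
            print substr(s, i, k)
    }'
}

# List the 5-mers of a short example sequence together with their counts.
kmers TAATGCCATGGGATGTT 5 | sort | uniq -c
```

A sequence of length L contains L - k + 1 k-mers, so the 17-base example yields 13 of them for k = 5.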
You can find out how many contigs were produced by counting the header lines in the contigs file (replace <out_dir_35> with your output directory):
grep -c '^>' <out_dir_35>/contigs.fa
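The contig count alone says little about assembly quality, so it is common to also look at the total length and the N50 (the length of the contig at which the running sum of lengths, taken from longest to shortest, reaches half of the total). The following is a minimal sketch using standard awk and sort. The function name fasta_stats and the file toy_contigs.fa are hypothetical and serve only as an illustration.

```shell
# fasta_stats FILE: print contig count, total length and N50 of a FASTA file.
fasta_stats() {
    # Step 1: print one length per contig (sequences may span several lines).
    awk '/^>/ { if (len) print len; len = 0; next }
         { len += length($0) }
         END { if (len) print len }' "$1" |
    # Step 2: sort lengths from longest to shortest.
    sort -rn |
    # Step 3: accumulate lengths until half of the total is reached.
    awk '{ l[NR] = $1; total += $1 }
         END {
             for (i = 1; i <= NR; i++) {
                 run += l[i]
                 if (2 * run >= total) { n50 = l[i]; break }
             }
             printf "contigs=%d total=%d N50=%d\n", NR, total, n50
         }'
}

# Toy FASTA with contigs of length 10, 8, 4 and 2 (made-up data).
printf '>c1\nAAAAAAAAAA\n>c2\nCCCCCC\nCC\n>c3\nGGGG\n>c4\nTT\n' > toy_contigs.fa

fasta_stats toy_contigs.fa
```

For the toy file this prints contigs=4 total=24 N50=8. To use it on an assembly, pass the path of the contigs.fa file in your Velvet output directory.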
Change k and other settings of the Velvet assembler. Watch how they influence assembly results.
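One way to compare several values of k is to generate the commands for each of them first and review them before running anything. The sketch below only prints the commands. The output directory names out_dir_K and the function name print_velvet_cmds are made up, and the velveth flags assume paired-end short reads stored in two separate gzip-compressed FASTQ files, so check them against the usage text printed by velveth before relying on them.

```shell
# print_velvet_cmds K...: print (do not run) one velveth/velvetg pair per k.
print_velvet_cmds() {
    for k in "$@"; do
        out="out_dir_$k"   # hypothetical output directory name
        # assumed read layout: paired-end, two separate .fastq.gz files
        echo "./velvet_1.2.10/velveth $out $k -shortPaired -fastq.gz -separate SRR292770_1.fastq.gz SRR292770_2.fastq.gz"
        echo "./velvet_1.2.10/velvetg $out -cov_cutoff 2.81"
    done
}

# Odd k values below the MAXKMERLENGTH=60 limit chosen at build time.
print_velvet_cmds 25 35 45
```

Once the printed commands look right, they can be executed by piping the output to sh, for example print_velvet_cmds 25 35 45 | sh.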
To visualize the assembly, you can use the Bandage program. First, download and unpack the program, then start it:
wget https://github.com/rrwick/Bandage/releases/download/v0.8.1/Bandage_Ubuntu_dynamic_v0_8_1.zip
unzip Bandage_Ubuntu_dynamic_v0_8_1.zip
./Bandage
The program visualizes the de Bruijn graph produced by the Velvet assembler. The graph is stored in the LastGraph file in the Velvet output folder.
In the graph, find contigs that are not connected to the rest of the graph. Find likely assembly errors and repeats. Each block in the graph represents one contig.
Another visualization tool is Tablet. Download and install it by typing:
wget https://bioinf.hutton.ac.uk/tablet/installers/tablet_linux_x64_1_21_02_08.sh
chmod +x tablet_linux_x64_1_21_02_08.sh
./tablet_linux_x64_1_21_02_08.sh
We have to tell Velvet to store additional information about the assembly. To do this, call velvetg again with the additional parameter -amos_file yes. Next, open the resulting .afg file in Tablet, for example with:
tablet your_output_directory/velvet_asm.afg
You can use tablet to identify assembly errors such as the one shown in the picture below.
How can you explain a situation in which a short subsequence of a contig has twice the coverage of the rest of the contig?
More details about using the Velvet assembler and other tools can be found on the ENA website.