Efficient sequence-structure alignments of ncRNA
LaRA 2 is an improved version of LaRA, a tool for sequence-structure alignment of RNA sequences. It…
Clone the repository and use the --recurse-submodules option for downloading SeqAn and Lemon as submodules.
git clone --recurse-submodules https://github.com/seqan/lara.git
Alternatively, you can download a package of the repository via the buttons at the top of this page. If you do so, please extract the file into a new subdirectory named lara and download the dependencies separately.
LaRA depends on the SeqAn and Lemon libraries, which the repository includes as submodules.
If you have not performed a recursive clone above, simply run the following command in the lara directory to download them.
cd lara
git submodule update --init --recursive
cd ..
To process the output for multiple alignments (3 or more sequences), you need either T-Coffee or MAFFT.
Optionally, LaRA can predict the RNA structures for you if you provide the ViennaRNA package.
Note: Users reported problems with installing ViennaRNA, so we provide some hints here.
./configure --without-swig --without-kinfold --without-forester --without-rnalocmin --without-gsl
./configure --disable-lto
./configure --enable-sse
If you have further suggestions, we are happy to add them here.
Please create a new directory and build the program for your platform.
mkdir build
cd build
cmake ../lara # specify the path to the lara directory
make
cd ..
Note: In order to use the SIMD vectorization, you need to tell the compiler what kind of hardware you have. This can be done by appending -DCMAKE_CXX_FLAGS="-march=X" to the cmake command, where X denotes your CPU type. If you want to run the code only on the machine you are compiling on, you can set -march=native to use the current CPU type. You can find detailed information and valid parameters in your compiler manual, e.g. the gcc-9.3 documentation or g++-9 --help=target.
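As a minimal sketch of the note above, the following lines assemble the cmake call with the -march flag and print it as a dry run. The variable names MARCH and cmake_cmd are hypothetical helpers for this example and are not part of LaRA or CMake. The path ../lara assumes the directory layout used in the build steps above.

```shell
# MARCH and cmake_cmd are hypothetical helper names for this sketch.
# "native" targets the CPU of the machine that compiles the code.
MARCH=native

# Assemble the cmake call with the vectorization flag appended.
cmake_cmd="cmake ../lara -DCMAKE_CXX_FLAGS=\"-march=${MARCH}\""

# Dry run: print the command instead of executing it.
echo "$cmake_cmd"
```

Once the printed command looks right, run it from inside the build directory in place of the plain cmake call, then continue with make as before.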
After building the program binary, running LaRA is as simple as
build/lara -i sequences.fasta
Note that passing sequence files requires the ViennaRNA dependency, because the program must then predict the structures itself. Alternatively, you can pass at least two dot plot files, each containing the base pair probabilities for a single sequence.
build/lara -d seq1_dp.ps -d seq2_dp.ps
The pairwise structural alignments are printed to stdout in the T-Coffee Library format (see below). If you want to store the result in a file, please use the -w option or redirect the output.
build/lara -i sequences.fasta -w results.lib
build/lara -i sequences.fasta > results.lib
We recommend specifying the number of threads with the -j option, for instance to execute 4 alignments in parallel. If you specify -j 0, the program tries to detect the maximal number of threads available on your machine.
build/lara -i sequences.fasta -j 4
For a list of options, please see the help message:
build/lara --help
Each output format is sorted primarily by the first and subsequently by the second sequence index.
The result of LaRA is a T-Coffee library file and its format is documented here. It contains the structural scores for each residue pair of each computed sequence pair. This file is the input for T-Coffee, which computes the multiple alignment based on the scores:
t_coffee -lib results.lib
LaRA has an additional output format that can be read by the MAFFT framework. Each pairwise alignment produces three lines: a description line composed of the two sequence ids, followed by the two gapped sequences of the alignment, one per line:
> first id && second id
AACCG-UU
-ACCGGUU
> first id && third id
AA-CCGUU
AAGCCGUU
MAFFT invokes LaRA with the option -o pairs to receive this output format.
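If you want to inspect this format yourself, the three-line records can be flattened into one tab-separated row per pair. The following is only a sketch: it writes the example above to a temporary file and assumes that the description line separates the two ids with " && " and that the ids themselves do not contain that separator. The variable names pairs_file and pairs_table are hypothetical.

```shell
# Write the example records from above to a temporary file.
pairs_file="$(mktemp)"
cat > "$pairs_file" <<'EOF'
> first id && second id
AACCG-UU
-ACCGGUU
> first id && third id
AA-CCGUU
AAGCCGUU
EOF

# For every description line, split the two ids and read the two gapped rows.
pairs_table="$(awk '
    /^>/ {
        header = substr($0, 2)          # drop the leading ">"
        sub(/^ +/, "", header)          # drop the blank after ">"
        split(header, ids, " && ")      # assumed id separator
        getline row1                    # first gapped sequence
        getline row2                    # second gapped sequence
        printf "%s\t%s\t%s\t%s\n", ids[1], ids[2], row1, row2
    }
' "$pairs_file")"

printf '%s\n' "$pairs_table"
rm -f "$pairs_file"
```

Each printed row holds the first id, the second id, and the two aligned sequences, which makes it easy to filter for a particular pair with standard tools.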
LaRA can produce the aligned FastA format, which is recommended for a single pairwise alignment. It looks like a normal FastA file with gap symbols in the sequences:
> first id
AACCG-UU
> second id
-ACCGGUU
To get this output format, pass the option -o fasta to the LaRA call.
LaRA prints a warning if you use this format with more than two sequences. Using this format with 3 or more sequences is possible but not recommended, because additional pairwise alignments will simply be appended to the file, and it may be hard to distinguish the pairs later. In addition, this can confuse other programs, which expect a single multiple sequence alignment as produced by MAFFT or T-Coffee.
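Because appended pairs are hard to tell apart, it can help to check a file before passing it on. The sketch below uses the two-sequence example from above, written to a temporary file, and verifies that it holds exactly two records with the same number of alignment columns. It then strips the gap symbols to recover the unaligned sequences. The variable names aln_file, check and ungapped are hypothetical, and the check is an illustration, not something LaRA performs itself.

```shell
# Write the aligned FastA example from above to a temporary file.
aln_file="$(mktemp)"
cat > "$aln_file" <<'EOF'
> first id
AACCG-UU
> second id
-ACCGGUU
EOF

# Count the records and compare the lengths of the gapped rows.
check="$(awk '
    /^>/ { n++; next }                      # a new record starts
    { len[n] = len[n] + length($0) }        # sum up the row length
    END {
        status = (n == 2 && len[1] == len[2]) ? "ok" : "unexpected"
        printf "%s records=%d columns=%d\n", status, n, len[1]
    }
' "$aln_file")"
echo "$check"

# Remove gap symbols from the sequence lines only, not from the id lines.
ungapped="$(sed '/^>/!s/-//g' "$aln_file")"
echo "$ungapped"
rm -f "$aln_file"
```

A file with more than two records is reported as unexpected here, which matches the situation that the warning describes.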
LaRA 2 is being developed by Jörg Winkler and Gianvito Urgese, but it incorporates a lot of work from other members of the SeqAn project.