LaRA 2

Efficient sequence-structure alignments of ncRNA

What LaRA 2 does for you

LaRA 2 is an improved version of LaRA, a tool for sequence-structure alignment of RNA sequences. It…

Download instructions

Clone the repository with the --recurse-submodules option to download SeqAn and Lemon as submodules.

git clone --recurse-submodules https://github.com/seqan/lara.git

Alternatively, you can download a package of the repository via the buttons at the top of this page. If you do so, please extract the file into a new subdirectory named lara and download the dependencies separately.

Requirements

LaRA depends on the following libraries: SeqAn and Lemon.

If you did not clone recursively, run the following commands from the directory that contains lara to download them:

cd lara
git submodule update --init --recursive
cd ..

To compute multiple alignments (3 or more sequences) from the output, you need either T-Coffee or MAFFT.

Optionally, LaRA can predict the RNA structures for you if you provide the ViennaRNA library.

Note: Some users have reported problems installing ViennaRNA, so here are some hints.

  1. Install the GNU MPFR Library first.
  2. Exclude unnecessary components of ViennaRNA: ./configure --without-swig --without-kinfold --without-forester --without-rnalocmin --without-gsl
  3. If you have linker issues, use ./configure --disable-lto
  4. If your system supports SSE4.1 instructions, we recommend ./configure --enable-sse

If you have further suggestions, we are happy to add them here.

Build instructions

Please create a new directory and build the program for your platform.

mkdir build
cd build
cmake ../lara     # specify the path to the lara directory
make
cd ..

Note: To use SIMD vectorization, you need to tell the compiler which hardware you have. You can do this by appending -DCMAKE_CXX_FLAGS="-march=X" to the cmake command, where X denotes your CPU type. If you only want to run the program on the machine you are compiling on, you can set -march=native to use the current CPU type. You can find detailed information and valid parameters in your compiler manual, e.g. the gcc-9.3 documentation or the output of g++-9 --help=target.

First steps to use LaRA 2

After building the program binary, running LaRA is as simple as

build/lara -i sequences.fasta

Note that passing sequence files requires the ViennaRNA dependency, because the program must then predict the structures. Alternatively, you can pass at least two dot plot files, each containing the base pair probabilities of a single sequence.
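If you want to inspect a dot plot file before passing it with -d, the following minimal Python sketch reads its base pair probabilities. It assumes the ViennaRNA PostScript layout, in which each pair probability appears on a line `i j value ubox` and `value` is the square root of the probability. `read_dot_plot` is a hypothetical helper, not part of LaRA, and it says nothing about how LaRA itself parses these files.

```python
import re
from typing import Dict, Tuple

# Assumed ViennaRNA dot plot layout: "i j sqrt(p) ubox" for pair probabilities.
# Lines ending in "lbox" (the MFE structure) are deliberately ignored.
UBOX_LINE = re.compile(r"^\s*(\d+)\s+(\d+)\s+([0-9.eE+-]+)\s+ubox\s*$")


def read_dot_plot(text: str) -> Dict[Tuple[int, int], float]:
    """Return {(i, j): probability} using the positions written in the file."""
    probs: Dict[Tuple[int, int], float] = {}
    for line in text.splitlines():
        match = UBOX_LINE.match(line)
        if match is None:
            continue  # PostScript code, comments, lbox lines, ...
        i, j = int(match.group(1)), int(match.group(2))
        sqrt_p = float(match.group(3))
        probs[(i, j)] = sqrt_p * sqrt_p  # undo the square root stored in the file
    return probs


example = "1 10 0.9 ubox\n2 9 0.5 ubox\n1 10 1.0 lbox\n"
print(sorted(read_dot_plot(example)))  # [(1, 10), (2, 9)]
```

In practice you would pass the contents of a file such as seq1_dp.ps instead of the inline example.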

build/lara -d seq1_dp.ps -d seq2_dp.ps

The pairwise structural alignments are printed to stdout in the T-Coffee Library format (see below). If you want to store the result in a file, please use the -w option or redirect the output.

build/lara -i sequences.fasta -w results.lib
build/lara -i sequences.fasta  > results.lib

We recommend specifying the number of threads with the -j option, for instance to execute 4 alignments in parallel. If you specify -j 0, the program tries to detect the maximum number of threads available on your machine.

build/lara -i sequences.fasta -j 4

For a list of options, please see the help message:

build/lara --help

Output format

In every output format, the alignments are sorted primarily by the first sequence index and secondarily by the second sequence index.
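As an illustration, assuming LaRA aligns every pair of input sequences (i, j) with i < j, this ordering is plain lexicographic order on the index pairs. The Python sketch below (the function name is hypothetical, and the indices are 1-based by choice) lists that order for a given number of sequences:

```python
from itertools import combinations
from typing import List, Tuple


def expected_pair_order(num_sequences: int) -> List[Tuple[int, int]]:
    """All index pairs (i, j) with i < j, ordered by i first and j second."""
    # combinations() emits pairs in lexicographic order, i.e. sorted
    # primarily by the first index and secondarily by the second.
    return list(combinations(range(1, num_sequences + 1), 2))


print(expected_pair_order(4))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

For n sequences this yields n(n-1)/2 pairs, which is why the output grows quadratically with the number of input sequences.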

for multiple alignments with T-Coffee

By default, LaRA writes a T-Coffee library file; its format is described in the T-Coffee documentation. It contains the structural scores for each residue pair of each computed sequence pair. This file is the input for T-Coffee, which computes the multiple alignment based on these scores:

t_coffee -lib results.lib
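If you want to examine the scores before running T-Coffee, a minimal Python parser could look like the sketch below. It assumes the TC_LIB_FORMAT_01 layout: header line(s) starting with `!`, the number of sequences, one `name length sequence` line per sequence, then blocks that start with `#i j` (1-based sequence indices) followed by `residue_i residue_j score` lines, closed by `! SEQ_1_TO_N`. `read_tcoffee_lib` is hypothetical; it ignores extra columns and checks the layout only loosely.

```python
from typing import Dict, List, Optional, Tuple

Entry = Tuple[int, int, float]  # (residue in seq i, residue in seq j, score)


def read_tcoffee_lib(text: str) -> Tuple[List[str], Dict[Tuple[int, int], List[Entry]]]:
    """Loosely parse an assumed TC_LIB_FORMAT_01 library (hypothetical helper)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    pos = 0
    while pos < len(lines) and lines[pos].startswith("!"):
        pos += 1  # header line(s), e.g. "! TC_LIB_FORMAT_01"
    num_seqs = int(lines[pos])
    pos += 1
    # One "name length sequence" line per input sequence; keep only the names.
    names = [lines[pos + k].split()[0] for k in range(num_seqs)]
    pos += num_seqs

    blocks: Dict[Tuple[int, int], List[Entry]] = {}
    current: Optional[Tuple[int, int]] = None
    for line in lines[pos:]:
        if line.startswith("!"):
            continue  # trailer such as "! SEQ_1_TO_N"
        if line.startswith("#"):
            # "#i j" starts the block of sequence pair (i, j).
            i, j = (int(x) for x in line[1:].split()[:2])
            current = (i, j)
            blocks.setdefault(current, [])
        elif current is not None:
            fields = line.split()
            blocks[current].append((int(fields[0]), int(fields[1]), float(fields[2])))
    return names, blocks


sample = "! TC_LIB_FORMAT_01\n2\nfirst 5 ACGUA\nsecond 4 ACGU\n#1 2\n1 1 90\n2 2 85\n! SEQ_1_TO_N\n"
print(read_tcoffee_lib(sample))
```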

for multiple alignments with MAFFT

LaRA has an additional output format that can be read by the MAFFT framework. Each pairwise alignment produces three lines: a description line containing the two sequence ids separated by &&, followed by the two gapped sequences of the alignment:

> first id && second id
AACCG-UU
-ACCGGUU
> first id && third id
AA-CCGUU
AAGCCGUU

MAFFT invokes LaRA with the option -o pairs to obtain this output format.
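A minimal Python sketch for reading this format back, assuming exactly the three-line layout shown above (each gapped sequence on a single line, the two ids separated by the first &&). `PairAlignment` and `read_pairs` are hypothetical names, not part of LaRA or MAFFT.

```python
from typing import List, NamedTuple


class PairAlignment(NamedTuple):
    first_id: str
    second_id: str
    first_row: str   # gapped sequence of the first id
    second_row: str  # gapped sequence of the second id


def read_pairs(text: str) -> List[PairAlignment]:
    """Parse '-o pairs' output: a '> id1 && id2' line plus two gapped rows."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 3 != 0:
        raise ValueError("expected groups of three lines per pairwise alignment")
    alignments: List[PairAlignment] = []
    for k in range(0, len(lines), 3):
        header, row1, row2 = lines[k : k + 3]
        if not header.startswith(">") or "&&" not in header:
            raise ValueError(f"not a description line: {header!r}")
        first_id, second_id = (part.strip() for part in header[1:].split("&&", 1))
        if len(row1) != len(row2):
            raise ValueError(f"rows of {header!r} differ in length")
        alignments.append(PairAlignment(first_id, second_id, row1, row2))
    return alignments


example = """> first id && second id
AACCG-UU
-ACCGGUU
> first id && third id
AA-CCGUU
AAGCCGUU
"""
for aln in read_pairs(example):
    print(aln.first_id, "|", aln.second_id, "|", aln.first_row, aln.second_row)
```

Splitting only on the first && keeps ids with spaces intact, but an id that itself contains && would be split incorrectly.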

for pairwise alignments

LaRA can produce the aligned FastA format, which is recommended for a single pairwise alignment. It looks like a normal FastA file with gap symbols in the sequences:

> first id
AACCG-UU
> second id
-ACCGGUU

Pass the option -o fasta to LaRA to obtain this output format.
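To see which residues a pairwise alignment matches, you can walk the two gapped rows column by column. This Python sketch assumes that - is the only gap symbol, as in the example above; `aligned_residue_pairs` is a hypothetical helper. Columns with a gap in either row are skipped, and the reported positions are 1-based indices into the ungapped sequences.

```python
from typing import List, Tuple


def aligned_residue_pairs(row1: str, row2: str, gap: str = "-") -> List[Tuple[int, int]]:
    """Return 1-based (pos1, pos2) for every column where both rows have a residue."""
    if len(row1) != len(row2):
        raise ValueError("gapped rows must have equal length")
    pairs: List[Tuple[int, int]] = []
    pos1 = pos2 = 0  # residues consumed so far in each ungapped sequence
    for a, b in zip(row1, row2):
        if a != gap:
            pos1 += 1
        if b != gap:
            pos2 += 1
        if a != gap and b != gap:
            pairs.append((pos1, pos2))
    return pairs


print(aligned_residue_pairs("AACCG-UU", "-ACCGGUU"))
# [(2, 1), (3, 2), (4, 3), (5, 4), (6, 6), (7, 7)]
```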

LaRA prints a warning if you use this format with more than two sequences. Using it with 3 or more sequences is possible but not recommended, because the additional pairwise alignments are simply appended to the file, which can make it hard to tell the pairs apart later. This can also confuse other programs that expect a single multiple sequence alignment, as produced by MAFFT or T-Coffee.
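If you still use -o fasta with 3 or more sequences, one way to recover the pairs is to read the records and group them two at a time. This Python sketch assumes that each pairwise alignment is written as two consecutive records, which follows from the appending behavior described above; `read_fasta` and `group_into_pairs` are hypothetical helpers.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (id, gapped sequence)


def read_fasta(text: str) -> List[Record]:
    """Read FastA records; sequence lines may be wrapped over several lines."""
    records: List[Record] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append((line[1:].strip(), ""))
        elif records:  # text before the first header is ignored
            name, seq = records[-1]
            records[-1] = (name, seq + line)
    return records


def group_into_pairs(records: List[Record]) -> List[Tuple[Record, Record]]:
    """Assumes each pairwise alignment is stored as two consecutive records."""
    if len(records) % 2 != 0:
        raise ValueError("odd number of records; cannot split into pairs")
    return [(records[k], records[k + 1]) for k in range(0, len(records), 2)]


text = "> first id\nAACCG-UU\n> second id\n-ACCGGUU\n> first id\nAA-CCGUU\n> third id\nAAGCCGUU\n"
for (id1, row1), (id2, row2) in group_into_pairs(read_fasta(text)):
    print(id1, "vs", id2, ":", row1, row2)
```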

LaRA 2 is being developed by Jörg Winkler and Gianvito Urgese, but it incorporates a lot of work from other members of the SeqAn project.