Since scores are the log of odds ratios, the PAM number of this matrix corresponds to the maximum-likelihood estimate of the evolutionary distance.
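Take a fixed, gap-free alignment. Each PAM-t score is log(P_t(b | a) / π_b), so the total score equals the conditional log-likelihood of the aligned pairs under the distance-t model minus a term that does not depend on t. Maximizing the score over t is therefore the same as maximizing the likelihood. The sketch below checks this numerically under a deliberately simplified assumption. A toy four-letter Jukes-Cantor model stands in for Dayhoff's 20-amino-acid PAM model, because its maximum-likelihood distance has a closed form. The function names are hypothetical, and the grid search is only for illustration.

```python
import math

# Toy 4-letter alphabet with equal background frequencies (an assumption;
# real PAM matrices use 20 amino acids and Dayhoff's empirical model).
PI = 0.25


def transition_probs(t: float) -> tuple[float, float]:
    """Jukes-Cantor P(same letter | t) and P(one specific other letter | t),
    where t is the expected number of substitutions per site."""
    decay = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 + 0.75 * decay
    p_diff = 0.25 - 0.25 * decay
    return p_same, p_diff


def log_odds_score(t: float, n_sites: int, n_mismatches: int) -> float:
    """Total score of a gap-free alignment scored with the distance-t
    log-odds matrix: the sum over sites of log(P_t(b | a) / pi_b)."""
    p_same, p_diff = transition_probs(t)
    matches = n_sites - n_mismatches
    return matches * math.log(p_same / PI) + n_mismatches * math.log(p_diff / PI)


def best_distance(n_sites: int, n_mismatches: int) -> float:
    """Grid search for the distance whose matrix maximizes the score."""
    grid = [i / 1000 for i in range(1, 3001)]  # 0.001 .. 3.0 substitutions/site
    return max(grid, key=lambda t: log_odds_score(t, n_sites, n_mismatches))


def jc_ml_distance(n_sites: int, n_mismatches: int) -> float:
    """Closed-form maximum-likelihood Jukes-Cantor distance."""
    p = n_mismatches / n_sites
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


t_score = best_distance(100, 20)
t_ml = jc_ml_distance(100, 20)
print(f"score-maximizing t = {t_score:.3f}, ML t = {t_ml:.3f}")
print(f"in PAM units: about {100 * t_ml:.0f}")
```

Distances here are in expected substitutions per site. Multiplying by 100 gives PAM units (accepted point mutations per 100 residues).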
Empirically, we observed that, with a mixture of homologous and non-homologous sequence pairs as input, the PAM-224 matrix yields alignment scores that are, on average, closest to those obtained in the refinement step.
Since we consider entire proteins as the basic evolutionary unit, why not use global alignments?
Protein ends are often variable, so it is reasonable to ignore them by using local alignments.
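To see why local alignment tolerates variable ends, compare a global (Needleman-Wunsch) score and a local (Smith-Waterman) score for two hypothetical sequences that share a conserved core but have unrelated termini. This is a minimal sketch with an assumed match/mismatch scheme and a linear gap penalty. It does not use the substitution-matrix scoring with affine gap penalties typical of protein alignment. `global_score` and `local_score` are illustrative names.

```python
# Simple scoring scheme, assumed for illustration only.
MATCH, MISMATCH, GAP = 2, -1, -2


def score(a: str, b: str) -> int:
    return MATCH if a == b else MISMATCH


def global_score(x: str, y: str) -> int:
    """Needleman-Wunsch: every residue of both sequences must be aligned."""
    prev = [j * GAP for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        cur = [i * GAP] + [0] * len(y)
        for j in range(1, len(y) + 1):
            cur[j] = max(prev[j - 1] + score(x[i - 1], y[j - 1]),
                         prev[j] + GAP,
                         cur[j - 1] + GAP)
        prev = cur
    return prev[-1]


def local_score(x: str, y: str) -> int:
    """Smith-Waterman: cells are floored at zero and the best cell anywhere
    is reported, so unrelated ends do not lower the score."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            cur[j] = max(0,
                         prev[j - 1] + score(x[i - 1], y[j - 1]),
                         prev[j] + GAP,
                         cur[j - 1] + GAP)
            best = max(best, cur[j])
        prev = cur
    return best


core = "MKVLAAGIVGLPNVGKS"            # hypothetical conserved region
x = "MSTQ" + core + "EEAKRW"         # hypothetical variable termini
y = "P" + core + "GGHDLNNYTQ"
print(global_score(x, y), local_score(x, y))
```

The global score pays for every mismatched or gapped terminal residue. Smith-Waterman instead reports the best-scoring region, so here the local score reflects only the shared core.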
A heuristic algorithm such as BLAST could speed up the homology search, but modern implementations of Smith-Waterman that use SIMD instructions are almost as fast as BLAST.
Moreover, most of the computation time is spent on estimating evolutionary distances.
In addition, the number of complete genomes under analysis has grown to more than 657, which calls for efficient solutions in terms of both computation speed and memory consumption.
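Every pair of genomes must be compared, so the computational burden grows quadratically with the number of genomes. The sketch below counts genome pairs for 657 genomes. It then multiplies by a uniform 3,000 proteins per genome to estimate the number of cross-genome protein alignments. That figure is purely illustrative, an assumed value and not a measured one.

```python
from math import comb


def genome_pairs(n_genomes: int) -> int:
    """Number of distinct unordered genome pairs to compare."""
    return comb(n_genomes, 2)


def cross_genome_alignments(n_genomes: int, proteins_per_genome: int) -> int:
    """Protein pairs aligned between genomes, assuming (hypothetically)
    that every genome encodes the same number of proteins."""
    return genome_pairs(n_genomes) * proteins_per_genome ** 2


print(genome_pairs(657))                            # → 215496
print(f"{cross_genome_alignments(657, 3000):.2e}")  # hypothetical 3,000 proteins/genome
```

Doubling the number of genomes roughly quadruples both counts, which is why speed and memory efficiency become critical as more genomes are added.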
A web interface now enables interactive exploration of the predictions. The OMA algorithm improves upon the standard bidirectional best-hit approach in several respects: it uses evolutionary distances instead of alignment scores, considers the uncertainty in distance inference, includes many-to-many orthologous relations, and accounts for differential gene losses. Herein, we describe the orthology inference algorithm in detail and justify the choice of parameters through multiple tests.
The classification of genes according to their evolutionary relations is essential for many aspects of comparative and functional genomics. Evolutionary relations are often described as pairwise relations, but they can also be defined with respect to a third gene. Orthologs are valuable in numerous analyses, including reconstruction of species phylogenies, protein function inference, database annotation, and genomic context analysis.
All pairs of protein sequences from complete genomes are aligned using full dynamic programming. Using protein sequences rather than DNA sequences has several advantages. Very distant homologies are difficult to detect at the DNA level, and protein sequences are less affected by convergence caused by mutational biases. Also, a protein sequence is one third the length of its coding DNA sequence, a considerable advantage given that the time complexity of sequence alignment is quadratic in sequence length.
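The quadratic-cost argument can be made concrete. Full dynamic programming fills one cell per pair of residues, so shrinking both sequences threefold shrinks the matrix roughly ninefold. The lengths below are hypothetical, and stop codons are ignored.

```python
def dp_cells(len_a: int, len_b: int) -> int:
    """Cells filled by full dynamic programming (e.g. Smith-Waterman)
    when aligning sequences of these lengths, including the boundary row/column."""
    return (len_a + 1) * (len_b + 1)


protein_a, protein_b = 350, 420              # hypothetical protein lengths
dna_a, dna_b = 3 * protein_a, 3 * protein_b  # corresponding coding sequences

ratio = dp_cells(dna_a, dna_b) / dp_cells(protein_a, protein_b)
print(f"DNA/protein cell ratio: {ratio:.2f}")  # close to 9
```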