AUTHORS: Ilhan Karić, Zanin Vejzović
ABSTRACT: This paper proposes a new algorithm for evaluating the similarity between two sequences in quasilinear time. It describes the theoretical, practical, and implementation aspects of the algorithm. The proposed method is designed specifically for computing sequential similarity, in contrast to measures such as the Jaccard index, which were designed for comparing sets but have frequently been applied to sequences. The method is generalizable and applies to any sequential data over a finite alphabet (binary files, DNA sequences, natural language, etc.).
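As a brief illustration of why a set-based measure can mislead on sequences, the sketch below computes the Jaccard index over the sets of symbols in two strings. It is not the algorithm proposed in the paper; the function name `jaccard_index` and the convention for two empty inputs are assumptions made for this example only.

```python
def jaccard_index(a, b):
    """Jaccard index of the sets of symbols occurring in two sequences."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # assumed convention: two empty inputs count as identical
    return len(set_a & set_b) / len(set_a | set_b)


# Same symbols in a different order: the set-based score cannot tell them apart.
print(jaccard_index("ACGT", "TGCA"))  # 1.0
# Same symbols with different multiplicities: also indistinguishable.
print(jaccard_index("AAAB", "ABBB"))  # 1.0
```

Because both order and repetition are discarded when a sequence is reduced to a set, two sequences that differ substantially can still receive the maximum score, which is the gap a dedicated sequence measure is meant to address.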
KEYWORDS: Sequence Similarity, Comparison, Contextual Similarity, Quasilinear-Time Complexity