Similarity, Data Compression and a Dead Composer

  • Jetse Koopmans, University of Amsterdam
  • Daan van den Berg, FNWI
  • Vadim Zaytsev, FNWI


Domenico Scarlatti (1685-1759) is well known for his 555 keyboard sonatas. Although his work is greatly revered by many professional musicians, some claim that it shows no compositional development. In this paper, his sonatas are clustered by normalized compression distance (NCD), an algorithmic similarity metric that uses no musical background knowledge. NCD is rooted in Kolmogorov complexity (KC) and expresses the similarity between any two sonatas as a single number. The results show clusters of similar sonatas and suggest that Scarlatti’s work does show compositional development, including ‘milestone sonatas’ that mark changes in artistic style during his lifetime.
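
As a rough illustration of how NCD reduces the similarity of two pieces to a single number, the sketch below uses the common formulation NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the length of the compressed input. The choice of zlib as compressor, the function names, and the toy byte strings standing in for encoded sonatas are all assumptions made for this example; they are not taken from the paper.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical stand-ins for encoded sonatas (not real Scarlatti data).
theme = bytes([60, 62, 64, 65, 67, 65, 64, 62]) * 40
variant = theme[:200] + bytes([72, 71, 69, 67]) * 30
rng = random.Random(0)
unrelated = bytes(rng.randrange(256) for _ in range(320))

for name, other in [("theme", theme), ("variant", variant), ("unrelated", unrelated)]:
    print(f"NCD(theme, {name}) = {ncd(theme, other):.3f}")
```

Values near 0 indicate that one input adds little information to the other, while values near 1 indicate that the two share almost nothing a compressor can exploit. A full pairwise matrix of such distances is the kind of input a hierarchical clustering method would take.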


How to Cite
KOOPMANS, Jetse; VAN DEN BERG, Daan; ZAYTSEV, Vadim. Similarity, Data Compression and a Dead Composer. Student Undergraduate Research E-journal!, [S.l.], v. 1, nov. 2015. ISSN 2468-0443. Available at: <>. Date accessed: 20 oct. 2020.