← All papers

A Parametrized Family of Tversky Metrics Connecting the Jaccard Distance to an Analogue of the Normalized Information Distance

Bjørn Kjos-Hanssen, Saroj Niraula, Soowhan Yoon

cs.LO Jan 1, 2020 · v1
Uses Lean to formally verify metric properties of a parametrized Tversky distance family.
This paper introduces a parametrized family of metrics based on Tversky's set-theoretic similarity index, connecting the Jaccard distance at one extreme to an analogue of the normalized information distance at the other. The triangle inequality and metric properties are formally verified using the Lean theorem prover, providing machine-checked guarantees for these distance functions used in information theory and data science.

Tversky's set-theoretic similarity index generalizes the Jaccard index but does not yield a metric for all parameter choices. Establishing which parametrizations produce valid metrics, and proving the triangle inequality, requires careful formal argument.

The paper introduces a parametrized family of metrics based on Tversky's similarity index, connecting the Jaccard distance at one extreme to an analogue of the normalized information distance at the other. The triangle inequality and metric properties are formally verified using the Lean theorem prover.

Machine-checked proofs guarantee that the proposed family of distance functions satisfies metric axioms across the full parameter range. The family provides a continuous interpolation between the Jaccard distance and an analogue of the normalized information distance, applicable in information theory and data science.