Evaluation Results

Info

The content of this section is autogenerated from the latest published configuration of the TwinSpect Benchmark.

Overview

Effectiveness of each algorithm/dataset pair at the threshold that yields the optimum F1-Score:

Algorithm	Dataset	Threshold	Recall	Precision	F1-Score
AUDIO-CODE-64	ISCC-FMA-10K	4	0.87	0.89	0.88
IMAGE-CODE-64	MIRFLICKR-MFND	12	0.91	0.96	0.94
TEXT-CODE-64	STLIB-2000	11	0.98	0.97	0.98

AUDIO-CODE-64

Evaluation against dataset ISCC-FMA-10K

Effectiveness

Understanding the Effectiveness Chart

This chart evaluates the effectiveness of a similarity hash in comparing media files. Each hash is compared against all others at different distance thresholds, with results assessed against the ground truth.
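
The all-against-all comparison described above can be sketched as follows. This is a minimal illustration, not the benchmark's actual implementation: it assumes each hash is stored as an integer, and the names `hamming_distance` and `matches_within` as well as the toy hashes are hypothetical.

```python
from itertools import combinations


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two integer-encoded hashes."""
    return bin(a ^ b).count("1")


def matches_within(hashes: dict[str, int], threshold: int) -> set[frozenset[str]]:
    """Return all unordered id pairs whose hashes differ by at most `threshold` bits."""
    return {
        frozenset((id_a, id_b))
        for (id_a, hash_a), (id_b, hash_b) in combinations(hashes.items(), 2)
        if hamming_distance(hash_a, hash_b) <= threshold
    }


# Toy data (hypothetical): "a" and "b" differ in 1 bit, "c" is far from both.
hashes = {"a": 0b1111_0000, "b": 0b1111_0001, "c": 0b0000_1111}
print(matches_within(hashes, threshold=4))
```

Raising the threshold admits more pairs, which is why recall tends to rise and precision tends to fall as the query threshold grows.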

Chart Interpretation:

The X-axis shows the Hamming distance query thresholds. Each threshold is the maximum distance at which two hashes are still considered similar.
The Y-axis shows Recall, Precision, and F1-Score:
Recall: The fraction of actual matches that the hash correctly identifies. Higher recall means fewer missed matches.
Precision: The fraction of predicted matches that are actual matches. Higher precision means fewer false positives.
F1-Score: The harmonic mean of Precision and Recall, which balances both measures. A high F1-Score indicates an effective algorithm.

The curves display how these metrics vary across thresholds.
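
To make the three metrics concrete, the following sketch computes them for a set of predicted match pairs against a ground-truth set, then sweeps several thresholds to find the one with the highest F1-Score. It is an illustration under assumptions, not the benchmark's code: the function names `precision_recall_f1` and `best_threshold` and the toy data are hypothetical.

```python
def precision_recall_f1(predicted: set, actual: set) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 for predicted vs. ground-truth match pairs."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def best_threshold(predictions_by_threshold: dict[int, set], actual: set) -> int:
    """Return the threshold whose predictions achieve the highest F1-Score."""
    return max(
        predictions_by_threshold,
        key=lambda t: precision_recall_f1(predictions_by_threshold[t], actual)[2],
    )


# Toy data (hypothetical): pairs are represented as tuples of ids.
actual = {("a", "b"), ("c", "d")}
predictions_by_threshold = {
    0: {("a", "b")},                          # precise but misses one match
    4: {("a", "b"), ("c", "d")},              # finds both matches
    8: {("a", "b"), ("c", "d"), ("a", "c")},  # adds a false positive
}
for threshold, predicted in predictions_by_threshold.items():
    p, r, f1 = precision_recall_f1(predicted, actual)
    print(f"threshold={threshold} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print("optimum threshold:", best_threshold(predictions_by_threshold, actual))
```

Because the harmonic mean is pulled toward the smaller of its two inputs, the F1-Score peaks where neither precision nor recall has collapsed.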

Robustness

Transformation	Maximum	Mean	Median
compress-medium	11	2.006	1.0
echo	15	3.598	3.0
equalize	12	1.214	1.0
fade-8s-both	4	0.436	0.0
loudnorm	5	0.55	0.0
transcode-aac-32kbps	10	1.786	1.0
transcode-mp3-128kbps	3	0.33	0.0
transcode-ogg-64kbps	7	0.864	1.0
trim-1s-both	5	0.936	1.0
trim-5s-both	11	2.4	2.0
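
A summary like the one in the table could be produced as sketched below, assuming that each row aggregates the Hamming distances between the hash of an original file and the hash of its transformed copy. That interpretation, the function name `summarize_distances`, and the sample hashes are assumptions made for illustration only.

```python
from statistics import mean, median


def summarize_distances(pairs: list[tuple[int, int]]) -> dict[str, float]:
    """Summarize bit distances between (original_hash, transformed_hash) pairs."""
    distances = [bin(original ^ transformed).count("1") for original, transformed in pairs]
    return {
        "maximum": max(distances),
        "mean": mean(distances),
        "median": median(distances),
    }


# Hypothetical hashes of three files before and after one transformation.
pairs = [
    (0b1010_1010, 0b1010_1010),  # unchanged: distance 0
    (0b1111_0000, 0b1111_0001),  # one bit flipped: distance 1
    (0b0000_1111, 0b0011_1100),  # four bits differ: distance 4
]
print(summarize_distances(pairs))
```

Under this reading, a maximum that stays below the query threshold would suggest that the transformation rarely pushes a file out of matching range.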

Distribution

Performance

Minimum: 0.41 MB/s
Maximum: 32.92 MB/s
Mean: 6.51 MB/s
Median: 5.61 MB/s
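
Throughput figures of this kind can be gathered by timing the hash function on each file and dividing the file size by the elapsed time. The sketch below is an assumption about the general approach, not the benchmark's measurement code: `throughput_mb_per_s` is a hypothetical helper, SHA-256 stands in for the actual similarity hash, and 1 MB is taken to be 1,000,000 bytes.

```python
import hashlib
import time
from statistics import mean, median
from typing import Callable


def throughput_mb_per_s(hash_fn: Callable[[bytes], object], data: bytes) -> float:
    """Time one call of `hash_fn` on `data` and return megabytes processed per second."""
    start = time.perf_counter()
    hash_fn(data)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading from the timer on very small inputs.
    return len(data) / 1_000_000 / max(elapsed, 1e-9)


# Stand-in workload: SHA-256 over synthetic buffers of different sizes.
samples = [bytes(size) for size in (100_000, 500_000, 1_000_000)]
rates = [throughput_mb_per_s(lambda d: hashlib.sha256(d).digest(), s) for s in samples]
print(f"Minimum: {min(rates):.2f} MB/s")
print(f"Maximum: {max(rates):.2f} MB/s")
print(f"Mean: {mean(rates):.2f} MB/s")
print(f"Median: {median(rates):.2f} MB/s")
```

Per-file rates vary with file size and decoding cost, which is one plausible reason for a wide gap between the minimum and maximum values.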

IMAGE-CODE-64

Evaluation against dataset MIRFLICKR-MFND

Effectiveness

Distribution

Performance

Minimum: 0.27 MB/s
Maximum: 3.52 MB/s
Mean: 0.75 MB/s
Median: 0.70 MB/s

TEXT-CODE-64

Evaluation against dataset STLIB-2000

Effectiveness

Distribution

Performance

Minimum: 0.00 MB/s
Maximum: 6.76 MB/s
Mean: 0.21 MB/s
Median: 0.07 MB/s

Last update: 2023-07-19
Created: 2023-07-19