Towards a Measure of Algorithm Similarity

TL;DR

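A new framework, EMOC, represents algorithms as numeric vectors. This makes it possible to detect near-identical implementations and to measure the diversity of AI-generated code, which supports software verification and assessments of originality.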

Algorithms power everything from search engines to medical diagnostics, yet deciding whether two programs implement genuinely different algorithms remains a fundamental challenge. The researchers propose a practical way to measure algorithm similarity. The problem matters for software development, plagiarism detection, and artificial intelligence. Rather than comparing surface-level code, the method compares what programs compute and how they compute it, which gives a more consistent basis for analyzing computational procedures.

The central contribution is the EMOC framework, which embeds an algorithm as a numeric vector built from four components: Evaluation (functional output), Memory usage, Operations used, and Complexity (how cost grows with input size). The representation describes what an algorithm does, how it uses resources, and which steps it takes, rather than how its code is written. As a result, it can separate algorithms that produce the same output but differ in efficiency or internal logic, such as different sorting methods.
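The operations component can be illustrated with a minimal sketch. It assumes that operations are counted statically by walking Python's abstract syntax tree; the paper's actual feature extraction may differ. The two hypothetical sorting functions return identical outputs on every sampled input, yet their operator profiles differ. A purely output-based comparison would miss this kind of distinction.

```python
import ast
import random
from collections import Counter

# Two hypothetical implementations of the same task, kept as source strings
# so that each can be both executed and parsed.
BUBBLE_SRC = """
def bubble_sort(xs):
    xs = list(xs)
    n = len(xs)
    for i in range(n):
        for j in range(n - 1 - i):
            if xs[j] > xs[j + 1]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs
"""

INSERTION_SRC = """
def insertion_sort(xs):
    xs = list(xs)
    for i in range(1, len(xs)):
        key = xs[i]
        j = i - 1
        while j >= 0 and xs[j] > key:
            xs[j + 1] = xs[j]
            j -= 1
        xs[j + 1] = key
    return xs
"""


def load(src, name):
    """Execute a source string and return the function it defines."""
    namespace = {}
    exec(src, namespace)
    return namespace[name]


def op_profile(src):
    """Count operator types (Add, Sub, Gt, And, ...) appearing in the source.

    Assumption: a static AST count stands in for the 'operations' component.
    """
    counts = Counter()
    for node in ast.walk(ast.parse(src)):
        if isinstance(node, (ast.BinOp, ast.AugAssign, ast.BoolOp)):
            counts[type(node.op).__name__] += 1
        elif isinstance(node, ast.Compare):
            counts.update(type(op).__name__ for op in node.ops)
    return counts


bubble_sort = load(BUBBLE_SRC, "bubble_sort")
insertion_sort = load(INSERTION_SRC, "insertion_sort")

# Evaluation-style check: outputs agree on every sampled input.
rng = random.Random(0)
samples = [[rng.randint(-50, 50) for _ in range(rng.randint(0, 20))] for _ in range(100)]
print(all(bubble_sort(x) == insertion_sort(x) for x in samples))  # True

# Operations-style check: the operator profiles differ.
print(op_profile(BUBBLE_SRC))
print(op_profile(INSERTION_SRC))
```

In this sketch, bubble sort uses three additions, two subtractions, and one `>` comparison. Insertion sort has one fewer addition and adds a `>=` comparison and a logical `and`.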

To develop and evaluate the framework, the researchers compiled PACD, a dataset of 350 Python implementations covering problems such as list sorting and prime-number checking. Each implementation was manually verified for correctness. The method samples inputs to test functional equivalence, measures how memory use and runtime scale with input size, and counts the distinct operations performed, such as additions or multiplications. Combining these measurements into a single vector yields a representation that is less sensitive to trivial changes, such as renaming variables or reordering associative operations.
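The sampling and scaling steps can be sketched as follows. This minimal illustration rests on two assumptions that do not come from the paper. First, functional equivalence is scored as the fraction of randomly sampled inputs on which two functions agree. Second, runtime scaling is approximated by counting comparisons, a deterministic proxy for wall-clock time, and then fitting the slope of log(cost) against log(n). The helper names `agreement`, `comparison_cost`, and `growth_exponent` are hypothetical.

```python
import math
import random


class Counted:
    """Wraps a value and counts every < or > comparison made on it."""

    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value

    def __gt__(self, other):
        Counted.comparisons += 1
        return self.value > other.value


def bubble_sort(xs):
    xs = list(xs)
    n = len(xs)
    for i in range(n):
        for j in range(n - 1 - i):
            if xs[j] > xs[j + 1]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs


def merge_sort(xs):
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if right[j] < left[i]:  # take from the right only when strictly smaller (stable)
            out.append(right[j])
            j += 1
        else:
            out.append(left[i])
            i += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def agreement(f, g, sample_input, trials=200, seed=0):
    """Fraction of sampled inputs on which f and g return equal outputs."""
    rng = random.Random(seed)
    matches = 0
    for _ in range(trials):
        x = sample_input(rng)
        matches += f(x) == g(x)
    return matches / trials


def comparison_cost(sort_fn, n, seed=0):
    """Number of comparisons sort_fn makes on n random values (runtime proxy)."""
    rng = random.Random(seed)
    data = [Counted(rng.random()) for _ in range(n)]
    Counted.comparisons = 0
    sort_fn(data)
    return Counted.comparisons


def growth_exponent(sort_fn, sizes=(100, 200, 400, 800)):
    """Least-squares slope of log(cost) versus log(n): about 2 for quadratic growth."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(comparison_cost(sort_fn, n)) for n in sizes]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def random_ints(rng):
    return [rng.randint(-100, 100) for _ in range(rng.randint(0, 30))]


print(agreement(bubble_sort, merge_sort, random_ints))  # 1.0
print(round(growth_exponent(bubble_sort), 2))  # 2.0
print(round(growth_exponent(merge_sort), 2))
```

Both sorts pass the sampled equivalence check. The fitted exponent, however, separates the quadratic bubble sort from the n log n merge sort, which comes out noticeably lower. Counting comparisons rather than timing keeps the estimate deterministic. A fuller measurement would also track memory, for example with Python's `tracemalloc` module.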

In the reported results, EMOC reaches 79.1% accuracy when clustering is used to classify algorithm types. For clone detection, it flagged near-duplicates, such as two BubbleSort implementations in different languages that behave similarly. The framework also quantified the diversity of programs generated by large language models (LLMs). Higher sampling temperatures and larger parameter counts produced more varied algorithms. Prompts that explicitly encouraged novelty increased the diversity of operations used in generated sorting algorithms, as shown in the paper's figures.
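Clone detection and diversity measurement both come down to distances between vectors. The sketch below makes three illustrative assumptions. Each program is represented only by its AST operator counts, which is one of the four EMOC components. Near-duplicates are pairs with cosine similarity at or above a hypothetical threshold of 0.99. Diversity is the mean pairwise cosine distance. The paper's actual distance measure and thresholds may differ.

```python
import ast
import itertools
import math
from collections import Counter

BUBBLE = """
def bubble_sort(xs):
    xs = list(xs)
    n = len(xs)
    for i in range(n):
        for j in range(n - 1 - i):
            if xs[j] > xs[j + 1]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs
"""

# Same algorithm with every identifier renamed (a trivial change).
BUBBLE_RENAMED = """
def bsort(arr):
    arr = list(arr)
    size = len(arr)
    for a in range(size):
        for b in range(size - 1 - a):
            if arr[b] > arr[b + 1]:
                arr[b], arr[b + 1] = arr[b + 1], arr[b]
    return arr
"""

INSERTION = """
def insertion_sort(xs):
    xs = list(xs)
    for i in range(1, len(xs)):
        key = xs[i]
        j = i - 1
        while j >= 0 and xs[j] > key:
            xs[j + 1] = xs[j]
            j -= 1
        xs[j + 1] = key
    return xs
"""

programs = {"bubble": BUBBLE, "bubble_renamed": BUBBLE_RENAMED, "insertion": INSERTION}


def op_profile(src):
    """Count operator types in the source's AST (assumed stand-in for one component)."""
    counts = Counter()
    for node in ast.walk(ast.parse(src)):
        if isinstance(node, (ast.BinOp, ast.AugAssign, ast.BoolOp)):
            counts[type(node.op).__name__] += 1
        elif isinstance(node, ast.Compare):
            counts.update(type(op).__name__ for op in node.ops)
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in set(a) | set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def near_duplicates(programs, threshold=0.99):
    """Pairs of programs whose profiles are at least `threshold` similar."""
    profiles = {name: op_profile(src) for name, src in programs.items()}
    return [
        (a, b)
        for a, b in itertools.combinations(profiles, 2)
        if cosine(profiles[a], profiles[b]) >= threshold
    ]


def diversity(programs):
    """Mean pairwise cosine distance (1 - similarity) across all programs."""
    profiles = [op_profile(src) for src in programs.values()]
    pairs = list(itertools.combinations(profiles, 2))
    return sum(1 - cosine(a, b) for a, b in pairs) / len(pairs)


print(near_duplicates(programs))  # [('bubble', 'bubble_renamed')]
print(round(diversity(programs), 3))  # 0.076
```

The renamed copy is flagged because renaming variables leaves the operator counts unchanged, while insertion sort falls below the threshold. Operator counts on their own are coarse, since unrelated programs can share them. This is why a full EMOC vector also includes evaluation, memory, and complexity features.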

The work has practical value for software validation, where detecting copied or inefficient code is important, and for AI research, where assessing the originality of generated programs shows whether models produce genuinely new solutions. A reproducible metric lets developers and researchers check that algorithms are functionally equivalent and also whether they are efficient and distinct. This could help reduce errors and support ethical coding practices.

The framework has limitations. It relies on a finite sample of inputs and on predefined problem domains, so it can miss edge cases. The paper notes that deciding exact program equivalence is undecidable in general. It also notes that EMOC's effectiveness depends on the chosen level of abstraction; the authors use Python for its balance of expressiveness and analyzability. Future work could extend the approach to more languages and to more complex algorithmic behaviors.
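The finite-sampling limitation is easy to reproduce. In this minimal sketch, `is_prime_buggy` is a hypothetical variant that omits the `n < 2` guard. If the input sampler only draws values from 2 upward, the two functions look functionally identical, even though they disagree on 0 and 1.

```python
import random


def is_prime(n):
    """Trial-division primality test."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def is_prime_buggy(n):
    """Hypothetical variant: missing the n < 2 guard, so 0 and 1 are reported prime."""
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def sampled_agreement(f, g, low, high, trials=1000, seed=0):
    """Fraction of uniformly sampled integers in [low, high] where f and g agree."""
    rng = random.Random(seed)
    xs = [rng.randint(low, high) for _ in range(trials)]
    return sum(f(x) == g(x) for x in xs) / trials


print(sampled_agreement(is_prime, is_prime_buggy, 2, 10_000))  # 1.0
print([n for n in range(5) if is_prime(n) != is_prime_buggy(n)])  # [0, 1]
```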

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.
