Cisco Launches a "DNA Test" for AI, Compliant with the AI Act

Key Takeaways

1. Cisco has introduced the Model Provenance Kit, an open-source tool that determines whether AI models share a common origin, with an accuracy of 96.4%.

2. The tool builds fingerprints from a model's weights, tokenizer, and architecture, and runs in a few milliseconds on ordinary computers.

3. The program helps businesses comply with the AI Act by supplying verifiable evidence for the required technical documentation.

💡 Why it matters: This free tool democratizes AI model auditing, a capability that is becoming crucial for SMEs facing growing regulatory demands.

Cisco has introduced the Model Provenance Kit, an open-source tool for verifying where artificial intelligence models come from. Written in Python, the program compares two AI models to determine whether they share a common origin. In a test on 111 pairs of models, it reached an accuracy of 96.4%. The code is available on GitHub under the Apache-2.0 license.

The program's initial database covers 150 models from 45 families and more than 20 publishers, ranging from 135 million to over 70 billion parameters. The need is real: Hugging Face alone hosts more than 2 million downloadable models, many of them undeclared derivatives of other models, often repackaged under new names.

Cisco's program runs from the command line and generates a "fingerprint" for each model. The fingerprint combines three elements: the weights learned during training, the tokenizer that splits text into tokens, and the files describing the model's architecture. Two main modes are available: comparison mode, which checks two models against each other, and scan mode, which matches a single model against a database of 150 reference fingerprints.
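To make the two modes concrete, here is a minimal sketch in Python. Every name in it (Fingerprint, weight_signature, compare, scan) is hypothetical and not the kit's actual API, and the similarity measure (cosine similarity over a small vector of weight statistics) is an assumption chosen for brevity. It only illustrates the shape of the idea: comparison mode scores one pair, while scan mode scores one model against every reference and ranks the results.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Fingerprint:
    """Hypothetical fingerprint holding the three elements the article lists."""
    name: str
    weight_signature: list[float]  # summary statistics of the trained weights (assumed)
    tokenizer_vocab: set[str] = field(default_factory=set)  # tokens the tokenizer can emit
    architecture: dict = field(default_factory=dict)  # fields read from the config file


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def compare(a: Fingerprint, b: Fingerprint) -> float:
    """Comparison mode: score one pair of models (weights only, for brevity)."""
    return cosine(a.weight_signature, b.weight_signature)


def scan(query: Fingerprint, database: list[Fingerprint], top_k: int = 3) -> list[tuple[str, float]]:
    """Scan mode: rank reference fingerprints by similarity to one query model."""
    scored = [(ref.name, compare(query, ref)) for ref in database]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


# Toy data: made-up signatures, not real model fingerprints.
reference_db = [
    Fingerprint("base-a", [1.0, 0.0, 0.5]),
    Fingerprint("base-b", [0.0, 1.0, 0.0]),
]
query = Fingerprint("mystery-model", [0.9, 0.1, 0.5])
ranking = scan(query, reference_db)
print(ranking[0][0])  # base-a
```

Scan mode is simply comparison mode run in a loop, which is why precomputed reference fingerprints matter: the expensive part is building a fingerprint, not scoring a pair.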

Its creators, Ehsan Aghaei, Amy Chang, Ankit Garg, and Sanket Mendapara, describe it as a "DNA test" for AI. It runs on consumer-grade processors and can compare two architectures in a few milliseconds. The decision threshold for declaring a match is set at 0.70 on a scale from 0 to 1.

Comparison happens in two steps. First, the program analyzes the models' configuration files: the number of layers in the neural network, the internal dimensions, and the type of attention mechanism. If this first pass is inconclusive, it moves on to a detailed analysis of the weights using five indicators, including the geometry of the relationships between words and the statistical distribution of words.
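The two-step flow and the 0.70 threshold can be sketched as follows. This is an illustration under stated assumptions, not Cisco's code: the configuration key names are hypothetical, the rule for when step 1 is "conclusive" (here, only when the configurations agree on nothing) is invented, and combining the five weight indicators as an unweighted mean is a guess. The indicators are passed as callables so that the expensive weight analysis only runs when step 1 does not settle the question.

```python
from typing import Callable

# Configuration fields named in the article; the key names are hypothetical.
CONFIG_KEYS = ("num_layers", "hidden_size", "attention_type")
MATCH_THRESHOLD = 0.70  # decision threshold reported in the article


def config_similarity(cfg_a: dict, cfg_b: dict) -> float:
    """Step 1: fraction of architecture fields that agree (cheap, no weights loaded)."""
    agreeing = sum(cfg_a.get(key) == cfg_b.get(key) for key in CONFIG_KEYS)
    return agreeing / len(CONFIG_KEYS)


def provenance_score(cfg_a: dict, cfg_b: dict,
                     weight_indicators: list[Callable[[], float]]) -> tuple[float, str]:
    """Return (score, stage that decided it)."""
    if config_similarity(cfg_a, cfg_b) == 0.0:
        # Assumed early exit: configs with nothing in common settle the question
        # without touching the weights.
        return 0.0, "config"
    # Step 2: the expensive weight indicators only run when step 1 is inconclusive.
    scores = [indicator() for indicator in weight_indicators]
    return sum(scores) / len(scores), "weights"  # unweighted mean (assumption)


def is_match(score: float) -> bool:
    return score >= MATCH_THRESHOLD


# Toy inputs: invented configs and indicator values.
model_a = {"num_layers": 32, "hidden_size": 4096, "attention_type": "gqa"}
derivative = dict(model_a)
small_model = {"num_layers": 12, "hidden_size": 768, "attention_type": "mha"}
indicators = [lambda: 0.9, lambda: 0.8, lambda: 0.75, lambda: 0.85, lambda: 0.7]

score, stage = provenance_score(model_a, derivative, indicators)
print(stage, round(score, 2), is_match(score))  # weights 0.8 True
print(provenance_score(model_a, small_model, indicators))  # (0.0, 'config')
```

Note that identical configurations are not treated as proof of a shared origin: many unrelated models reuse the same architecture, so in this sketch matching configs only trigger the weight analysis.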

A sixth indicator, based on the tokenizer, is measured but does not count toward the final score: unrelated models often share a tokenizer, so including it would produce false matches. StableLM and Pythia, for example, both use the GPT-NeoX tokenizer but share no weights.
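The effect of leaving the tokenizer out of the score is easy to see with numbers. In the sketch below, the indicator names and all values are invented, and Jaccard overlap of vocabularies is an assumed stand-in for however the kit actually measures tokenizer similarity. Two unrelated models with a shared tokenizer stay below the 0.70 threshold when only the weight indicators count, but would cross it if the tokenizer were averaged in.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two tokenizer vocabularies, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def score_with_breakdown(weight_scores: dict[str, float],
                         vocab_a: set[str], vocab_b: set[str]) -> tuple[float, dict[str, float]]:
    breakdown = dict(weight_scores)
    # The tokenizer indicator is reported in the breakdown...
    breakdown["tokenizer"] = jaccard(vocab_a, vocab_b)
    # ...but the final score averages the weight indicators only.
    final = sum(weight_scores.values()) / len(weight_scores)
    return final, breakdown


# Invented numbers for two unrelated models that share one tokenizer.
shared_vocab = {"hello", "world", "<|endoftext|>"}  # toy stand-in for a shared vocabulary
weight_scores = {"ind_1": 0.60, "ind_2": 0.70, "ind_3": 0.65, "ind_4": 0.60, "ind_5": 0.70}

final, breakdown = score_with_breakdown(weight_scores, shared_vocab, set(shared_vocab))
naive = sum(breakdown.values()) / len(breakdown)  # what counting the tokenizer would give
print(round(final, 3), round(naive, 3))  # 0.65 0.708
```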

In testing, the tool failed on 4 of the 111 pairs, all involving extreme architectural transformations. These errors are attributed to mathematical limits of the method rather than to flaws in the program.
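Assuming accuracy here means the share of pairs classified correctly, the reported figures are consistent with each other: 107 correct verdicts out of 111 pairs rounds to 96.4%.

```python
total_pairs = 111
failures = 4
accuracy = (total_pairs - failures) / total_pairs  # 107 / 111
print(f"{accuracy:.1%}")  # 96.4%
```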

Cisco's tool fits the requirements of the AI Act, which obliges providers of high-risk systems to supply detailed technical documentation. This obligation is set to take effect in August 2026 but could be postponed to December 2, 2027. Companies must be able to prove where their models come from, and the Model Provenance Kit provides verifiable evidence in the form of numerical scores and rankings.

For small businesses, Article 11 of the AI Act allows simplified documentation. Proving the origin of third-party components, however, remains costly. The Model Provenance Kit, which is free and needs no specialized hardware, offers an affordable way to do it.

One concrete example is Cursor Composer 2, which builds on elements of Kimi 2.5 by Moonshot AI. This dependency, initially undisclosed, could have been detected with Cisco's kit.

The initial database of 150 models includes major publishers such as Meta, Mistral, Alibaba, and DeepSeek. The program produces verifiable evidence: a numerical score, a ranking of candidate models, and a per-indicator breakdown. These outputs can be included in an Annex IV technical documentation file.
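One way to keep those three outputs together as documentation evidence is a single JSON record per assessment. The sketch below is hypothetical throughout: the field names, the model names, and the record layout are illustrative and not a format defined by the kit or by Annex IV. It assumes the ranking arrives sorted best-first.

```python
import json
from datetime import date

MATCH_THRESHOLD = 0.70  # decision threshold reported in the article


def build_evidence_record(query: str, ranking: list[tuple[str, float]],
                          breakdown: dict[str, float]) -> dict:
    """Bundle score, candidate ranking, and per-indicator breakdown into one record.

    Assumes `ranking` is already sorted best-first.
    """
    best_name, best_score = ranking[0]
    return {
        "query_model": query,
        "assessed_on": date.today().isoformat(),
        "threshold": MATCH_THRESHOLD,
        "best_candidate": best_name,
        "score": round(best_score, 3),
        "match": best_score >= MATCH_THRESHOLD,
        "ranking": [{"model": name, "score": round(s, 3)} for name, s in ranking],
        "indicator_breakdown": breakdown,
    }


record = build_evidence_record(
    "acme/support-bot-7b",  # hypothetical model names throughout
    [("base-family-7b", 0.91), ("other-family-7b", 0.34)],
    {"ind_1": 0.93, "ind_2": 0.88, "tokenizer": 1.0},
)
print(json.dumps(record, indent=2))
```

Recording the threshold and date alongside the score keeps the verdict reproducible if the threshold or reference database changes later.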

Cost remains a concern for small entities. Article 11 of the AI Act lets SMEs and start-ups submit simplified documentation using a dedicated form, but they must still prove the origin of third-party components. Commercial AI audit solutions can be expensive, whereas the Model Provenance Kit is free under the Apache-2.0 license. It needs no specialized graphics card or server cluster, and pre-computed fingerprints are cached for reuse. Organizations that previously had no access to audit tools can now verify the origin of their models.
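Caching is what keeps repeated checks cheap on an ordinary machine: a fingerprint is computed once and then read back. The sketch below assumes an on-disk JSON cache keyed by model identifier and revision; the class, method, and model names are hypothetical and say nothing about how the kit actually stores its fingerprints.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable


class FingerprintCache:
    """Hypothetical on-disk cache: compute a fingerprint once, reuse it afterwards."""

    def __init__(self, directory: str) -> None:
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self.computed = 0  # how many fingerprints were actually computed

    def _path(self, model_id: str, revision: str) -> Path:
        # Key on model id *and* revision: new weights must not reuse an old fingerprint.
        key = hashlib.sha256(f"{model_id}@{revision}".encode()).hexdigest()
        return self.directory / f"{key}.json"

    def get_or_compute(self, model_id: str, revision: str,
                       compute: Callable[[], dict]) -> dict:
        path = self._path(model_id, revision)
        if path.exists():
            return json.loads(path.read_text())
        fingerprint = compute()
        self.computed += 1
        path.write_text(json.dumps(fingerprint))
        return fingerprint


def fake_fingerprint() -> dict:
    """Stand-in for the expensive step of reading weights and building a fingerprint."""
    return {"weight_signature": [0.1, 0.2, 0.3]}


with tempfile.TemporaryDirectory() as tmp:
    cache = FingerprintCache(tmp)
    first = cache.get_or_compute("acme/model", "main", fake_fingerprint)
    second = cache.get_or_compute("acme/model", "main", fake_fingerprint)
    print(first == second, cache.computed)  # True 1
```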

Brief IA: AI news in French