Subquadratic Revolutionizes AI Language Models


Key Takeaways

1. Subquadratic, a startup from Miami, claims to have overcome a mathematical bottleneck that has been hindering large language models.

2. Its SubQ model is said to process up to 12 times more text at once than competing models while running faster, more cheaply, and with less energy.

3. Independent tests from Appen partially confirm these claims, but skepticism remains due to the lack of public access to the model.

Why it matters: If Subquadratic's claims are validated, SubQ could transform the efficiency and cost of large-scale language models, impacting the global tech industry.

Subquadratic: A Startup Disrupting Language Model Norms

The Miami-based artificial intelligence startup Subquadratic has recently made a notable entrance into the tech scene with an ambitious announcement. It claims to have solved a mathematical problem that has constrained large language models for nearly a decade.

Although the initial details of this breakthrough remained vague, Subquadratic has begun to provide evidence supporting its claims. The company shared the results of an independent evaluation of its innovative technology, which has sparked growing interest in its assertions.

A Revolutionary Language Model

Subquadratic has developed a new type of large language model called SubQ. According to the startup, this model is not only faster and cheaper but also consumes significantly less energy than currently available models on the market. SubQ is said to be capable of processing up to 12 times more text simultaneously compared to its competitors, enabling it to efficiently perform tasks requiring intensive data analysis, such as reviewing hundreds of documents or vast codebases.

Furthermore, Subquadratic claims that SubQ can compete with the performance of leading models offered by giants like Google DeepMind, OpenAI, and Anthropic, particularly in key tasks such as coding.

Skepticism and Initial Evidence

Despite these bold claims, Subquadratic initially provided little tangible evidence to support them, merely publishing a few self-reported test scores. Additionally, the SubQ model has not yet been made widely available for public testing.

This initial lack of transparency naturally led to some skepticism. Dan McAteer, an AI engineer, expressed this doubt on X by comparing SubQ to a potential "Theranos AI," referencing the infamous biotech company that collapsed following fraud allegations.

A month after its announcement, Subquadratic released more information about its model, including the results of independent tests conducted by the third-party company Appen.

Validation Through Independent Testing

Alex Whedon, co-founder and CTO of Subquadratic, acknowledged that healthy skepticism was to be expected. He admitted that publishing third-party benchmarks alongside the initial announcement could have alleviated much of the doubt. That’s why the company is now committed to thoroughly verifying future results before making them public.

Subquadratic enlisted Appen, a company specializing in evaluating models from other firms, to test SubQ. The results appear to confirm many of Subquadratic's claims. Jeanine Sinanan-Singh, director of generative AI research at Appen, stated that these results validate SubQ's architecture.

Promising Efficiency

SubQ will not replace all existing models, but it could offer significant speed gains at a much lower cost for certain tasks. Subquadratic hopes that its breakthrough will usher in a new era of efficiency in building large language models. Justin Dangel, co-founder and CEO of the company, expressed hope that in a few years, transformers will no longer be in use.

To understand the significance of Subquadratic's claims, it is essential to examine how current language models operate. The central mechanism of an LLM is a neural network architecture called a transformer, which uses a process known as dense attention. Modern LLMs typically stack many transformer layers on top of one another. The architecture was introduced in Google's foundational 2017 paper, "Attention Is All You Need."

The Challenge of Dense Attention

Dense attention works by representing each word or token in a text as a set of numbers and then comparing every token with every other token, multiplying their numbers together to gauge how strongly they relate. For example, a 10,000-word text contains nearly 50 million word pairs, each requiring its own multiplications, which helps explain why LLMs are so energy-intensive.
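
The "nearly 50 million" figure is consistent with counting each pair of words once. A minimal sketch of that arithmetic, assuming one comparison per unordered pair (the function name is illustrative and does not come from Subquadratic or Appen):

```python
def pairwise_comparisons(num_tokens):
    """Number of unordered token pairs: n * (n - 1) / 2."""
    return num_tokens * (num_tokens - 1) // 2


print(pairwise_comparisons(10_000))  # 49995000
```

In a real model each comparison is itself a multiplication over many numbers, repeated across many layers, so the true operation count is far higher than the pair count alone.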

As the length of the text increases, the number of calculations explodes, because each new word must be compared with all the previous ones. Doubling the number of words quadruples the number of calculations, a phenomenon known as quadratic scaling.
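
The doubling-quadruples pattern can be seen in a toy version of dense attention. This is a simplified sketch in plain Python, assuming a single attention head in which every token may look at every other token; the helper toy_vectors and its made-up inputs are purely illustrative.

```python
import math


def dense_attention(queries, keys, values):
    """Toy single-head dense attention over lists of equal-length vectors.

    Returns the output vectors and the number of query-key scores computed.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    scores_computed = 0
    for q in queries:
        # Score this query against every key: this is the quadratic step.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        scores_computed += len(scores)
        # Softmax turns the scores into weights that sum to 1.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # The output is the weighted average of the value vectors.
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return outputs, scores_computed


def toy_vectors(n, dim=4):
    """Deterministic made-up vectors, for illustration only."""
    return [[math.sin(i * dim + d) for d in range(dim)] for i in range(n)]


for n in (100, 200, 400):
    x = toy_vectors(n)
    _, count = dense_attention(x, x, x)
    print(n, count)  # count equals n * n
```

Each time n doubles, the printed score count grows fourfold.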

The Innovation of Sparse Attention

Subquadratic's solution is to replace dense attention with sparse attention, thereby reducing the number of calculations needed. Instead of multiplying each number by all the others, sparse attention selects only certain numbers to multiply, based on the premise that not all relationships between words are relevant.
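
A rough way to see the saving is to compare how many token pairs each approach scores. The sketch below assumes a fixed budget of pairs kept per token; the value 64 is an arbitrary illustration, since the article does not say how many relationships SubQ keeps.

```python
def dense_pairs(num_tokens):
    """Every token is scored against every token."""
    return num_tokens * num_tokens


def sparse_pairs(num_tokens, kept_per_token):
    """Each token is scored against at most `kept_per_token` others."""
    return num_tokens * min(kept_per_token, num_tokens)


for n in (1_000, 10_000, 100_000):
    print(n, dense_pairs(n), sparse_pairs(n, kept_per_token=64))
```

With a fixed budget per token, the sparse count grows in proportion to the text length rather than to its square.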

Alex Whedon explains that sparse attention allows for a focus on important relationships between words, rather than considering all of them. This approach is not new, but Subquadratic claims to offer the first sparse attention LLM that competes with dense attention models in terms of performance.

A Unique Approach

Historically, sparse attention mechanisms used fixed patterns, such as systematically comparing the first word to the fifth. However, language is too complex for such limitations. Subquadratic has developed a unique mechanism that dynamically selects important words, a process calculated in real-time and tailored to each text.
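
The difference between a fixed pattern and a content-based one can be sketched as follows. This is not Subquadratic's mechanism, which the article does not describe; the top-k rule here is a generic stand-in, and both function names are hypothetical.

```python
def fixed_stride_pattern(num_tokens, stride):
    """Fixed pattern: token i attends to i, i - stride, i - 2 * stride, ..."""
    return [list(range(i, -1, -stride)) for i in range(num_tokens)]


def dynamic_top_k(queries, keys, k):
    """Content-based pattern: each token keeps its k best-scoring
    positions among itself and the tokens before it."""
    pattern = []
    for i, q in enumerate(queries):
        # Dot-product score against every position up to and including i.
        scored = [
            (sum(a * b for a, b in zip(q, keys[j])), j) for j in range(i + 1)
        ]
        scored.sort(reverse=True)
        pattern.append(sorted(j for _, j in scored[:k]))
    return pattern


vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9], [1.0, 0.0]]
print(fixed_stride_pattern(5, stride=2))
print(dynamic_top_k(vectors, vectors, k=2))
```

The fixed pattern depends only on position, while the top-k pattern changes with the content of the vectors. Note that this naive top-k still scores every earlier token before choosing, so it saves nothing by itself; a practical dynamic method needs a cheaper way to pick candidates.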

Testing and Results

SubQ has proven to be faster and cheaper to execute than most other models for certain tasks. Appen evaluated SubQ on several standard tests, finding it to be 56 times faster than models using FlashAttention, a widely used, optimized implementation of dense attention.

On LiveCodeBench, a performance test on competitive coding problems, SubQ achieved a score of 89.7%, ranking among the top coding models.

Subquadratic's claims regarding cost are more challenging to verify, as SubQ is not yet widely accessible. According to Justin Dangel, running Anthropic's Opus 4.6 LLM on the RULER 128 benchmark costs $2,600, while SubQ would cost only $8.

SubQ also appears capable of handling very large datasets, with a context window of up to 12 million tokens. During a demonstration, SubQ processed a task involving 400 documents in seconds, while Perplexity, an LLM-powered search engine, failed to load all the documents.

Appen also conducted the "needle in a haystack" test, where SubQ achieved 98% accuracy with context windows of six million and twelve million tokens, demonstrating nearly perfect long-context retrieval.
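
For readers unfamiliar with this kind of test, the general idea can be sketched in a few lines. The article does not describe Appen's actual protocol, so everything below is an assumption: the filler text, the "secret code" needle, and stub_model, which stands in for a real model call and simply scans the context.

```python
import random


def build_haystack(num_filler, needle, depth):
    """Place `needle` at a relative depth (0.0 = start, 1.0 = end) in filler text."""
    filler = [f"Filler sentence number {i}." for i in range(num_filler)]
    position = int(depth * num_filler)
    return " ".join(filler[:position] + [needle] + filler[position:])


def stub_model(context, question):
    """Hypothetical stand-in for a real model call; it just scans the context."""
    marker = "The secret code is "
    start = context.find(marker)
    if start == -1:
        return ""
    start += len(marker)
    return context[start:context.find(".", start)]


def needle_accuracy(model, num_trials, num_filler, seed=0):
    """Fraction of trials in which the model's answer equals the hidden code."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_trials):
        code = str(rng.randint(100_000, 999_999))
        needle = f"The secret code is {code}."
        context = build_haystack(num_filler, needle, depth=rng.random())
        answer = model(context, "What is the secret code?")
        hits += answer.strip() == code
    return hits / num_trials


print(needle_accuracy(stub_model, num_trials=20, num_filler=1_000))
```

A real evaluation would replace stub_model with calls to the model under test and scale the filler up to millions of tokens; the reported accuracy is then the share of hidden facts the model retrieves correctly.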

Persistent Skepticism

Despite these impressive results, benchmarks provide only a partial view of a model's capabilities. Testing under specific conditions does not replace evaluating a model across a wide range of real-world tasks.

Subquadratic presents SubQ as a model suited for coding and searching large datasets. The company claims that thousands of potential users have signed up for early access, including over 500 enterprise clients. However, the waiting list is long, and few people have been able to test the model so far.

A lingering issue is that Subquadratic reused the weights from a version of the open-source Chinese model Qwen to initialize SubQ, rather than training it from scratch. While this practice is common, it contradicts Subquadratic's assertion that it has entirely reinvented how LLMs operate.

Will Depue, an independent AI researcher, remains cautious about Subquadratic's claims. He acknowledges that the company may have created something real and useful, but believes that public evidence does not yet justify the assertion that it has solved the quadratic attention bottleneck.

In the meantime, Alex Whedon, co-founder of Subquadratic, insists that innovation was necessary to build a competitive model. "We are in more trouble than OpenAI is," he states.