Springboards and Flint: Reinventing LLMs for Greater Creativity


Key Takeaways

1. Current LLMs, like ChatGPT, tend to produce predictable responses, limiting creativity.

2. The startup Springboards has developed Flint, a model that generates more varied and original responses.

3. Flint uses specific training techniques to avoid the monotony of traditional LLM responses.

Why it matters: Flint could transform the way creative professionals use AI for brainstorming and innovation.

The Predictability of Current Language Models

Open your favorite chatbot, whether it’s Claude, ChatGPT, or Gemini, and ask it to give you a random number between 1 and 10. The answer will often be 7. If you insist on getting another number, you will likely receive 3 or 4, and if you continue, it will be 8 or 9. This does not happen every time, but the pattern shows how predictable large language models (LLMs) can be.
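One way to check this pattern for yourself is to ask the same question many times and count the answers. The sketch below is only an illustration: `ask` stands for whatever function sends a prompt to a chatbot and returns its reply, and `fake_chatbot` is a made-up stand-in with invented answer frequencies, not data from any real model.

```python
from collections import Counter
import itertools

def tally_answers(ask, prompt, trials=50):
    """Call ask(prompt) repeatedly and count how often each answer appears."""
    return Counter(ask(prompt).strip() for _ in range(trials))

def top_share(counts):
    """Return the most common answer and the fraction of replies it accounts for."""
    total = sum(counts.values())
    answer, n = counts.most_common(1)[0]
    return answer, n / total

# Hypothetical stand-in for a real chatbot call (invented frequencies).
_canned = itertools.cycle(["7", "7", "7", "3", "7", "4", "7", "8", "7", "3"])

def fake_chatbot(prompt):
    return next(_canned)

if __name__ == "__main__":
    counts = tally_answers(fake_chatbot, "Give me a random number between 1 and 10.")
    print(counts.most_common())
    print(top_share(counts))
```

A truly uniform picker would give each of the ten numbers about a tenth of the replies, so a top share far above that is a sign of the bias described here.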

These models, although highly effective for tasks like coding or research, exhibit a certain monotony in their responses, which can be problematic in contexts requiring creativity, such as brainstorming or travel planning.

Flint: A New Approach by Springboards

The Australian startup Springboards has decided to tackle this issue by developing a language model called Flint. Unlike traditional LLMs, Flint is designed to provide a greater diversity of responses to open-ended questions, such as "Where should I go in Europe?" Pip Bingemann, co-founder and CEO of Springboards, explains that while most models struggle with hallucinations, Flint embraces them to generate more varied answers.

During a demonstration, Bingemann showed how Flint stands out from other models. Asked for a random number, ChatGPT and Claude both answered 7. Flint also gave 7 on its first attempt, but after the session was restarted it suggested 3.7916, illustrating its ability to think outside the box.

This is not just about numbers. When Bingemann asked ChatGPT and Claude to name a type of car, he predicted it would be a Toyota or a Honda—and he was right. Flint suggested a Ford F-150. “There’s all this lost information that isn’t served up in these models,” he says. “They are just as capable of saying a Buick or a Tesla. They just don’t do it—they are biased.”

Bingemann sent one last request to each of the three models: “Give me a slogan for a New Balance running shoe campaign. Just the slogan.” Claude: “Run your way.” ChatGPT: “Run your way.” Flint: “Built to last, run to win.” It may not win any awards, but at least it’s different.

Research on the Collective Thinking of LLMs

This strange limitation of LLMs is starting to attract more attention. In November, a team of researchers published a paper titled “Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond),” which exposed a remarkable degree of repetition not only in the responses of individual LLMs but also among them. They found that different LLMs converged on very similar answers when asked open-ended questions.

It is unclear exactly why this happens, but the researchers hypothesize that it is because most of today’s LLMs are trained in similar ways on similar data to perform similar tasks. The team won the best paper award at NeurIPS, a major AI conference.

When the researchers asked 25 different LLMs (including models from the largest American companies as well as open-source models from China and elsewhere) to write a metaphor about time 50 times each, most of the 1,250 responses were a version of “Time is a river” or “Time is a weaver.”

(I posed the same question to some of my colleagues, and six people gave me six different answers. My standout: “Time is a favorite sweatshirt, shaped by a life of wear.”)

When you look for it, you see repetition everywhere, says Kieran Browne, co-founder and CTO of Springboards. “The way most chat interfaces are designed, it gives the impression that you’re having a personal conversation,” he says. “I think most people don’t really realize the extent to which they’re getting the same things as everyone else.”

Take another example: “What should I name my band?” Most models will say something involving “glass,” “neon,” “velvet,” or “static,” says Browne.

When I tried, ChatGPT spat out a list of 56 band names. At the top was “Glass Harbor.” As I scrolled through, I found “Static Empire,” “Neon Hearts,” and “Velvet Echo.” I asked Gemini; it gave me 15 suggestions, including “Static Horizon.”

Some of the suggestions sounded pretty cool. ChatGPT’s “Sofa Astronauts” caught my attention, so I googled it—and discovered that a band called Sofa Astronauts already exists.

(OpenAI states that training models to provide reliable and consistent answers can lead them to converge around familiar, high-probability responses, and that pushing harder for novelty can result in weaker or less reliable answers. It also notes that the paper “Artificial Hivemind” studied models from 2024 that have since been updated.)

Creative Catapult

Springboards has developed a tool, powered by a selection of LLMs including ChatGPT and Claude, that creative professionals in advertising or marketing can use to brainstorm ideas. The tool lets you move around text produced by different models, selecting the elements you like and combining them into something new, at least in theory. Springboards presents Flint as an alternative model that users of its tool can select when seeking more variety.

Zoe Scaman, founder of the business strategy startup Bodacious and strategy director at 77X, a direct marketing platform created by Luka Dončić of the LA Lakers, has tested it. “I find it really useful for steering me in completely different directions,” she says. “I use it if I want to catapult myself in all directions.”

In one test, Scaman pitted Flint against Claude, Gemini, and ChatGPT by giving each model a classic MBA case study: How to reinvent a finance company for today’s youth? The three traditional models all followed the same path, she says: “You know, we need to teach financial literacy in a fun and funky way—well, that’s nothing new.”

But Flint suggested something different, proposing that the very concept of wealth accumulation should be rebranded. “That was really interesting,” says Scaman.

She notes that Flint is still a prototype and does not work all the time. “It sometimes crashes when you start pushing it too far,” she says. “But I think the principle behind it is really powerful.”

Taking the Temperature

Springboards built Flint on the basis of Qwen 3, an open-source model. “We’re a small team,” says Browne. “Training a base model is beyond our means. It’s just too costly.”

Most LLMs have parameters that allow you to adjust the level of randomness in their outputs. The most common is called temperature. “Obviously, that was one of the first things we explored, because that’s what people tell you: If you want more creativity, you raise the temperature,” says Browne.

But changing these parameters can also make the models inconsistent. Increasing the temperature to the maximum on one of OpenAI’s models caused it to produce responses that switched from English to code mid-sentence, says Browne.
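To make the idea of temperature concrete, here is a minimal sketch of the standard softmax-with-temperature calculation. The candidate answers and their scores are invented for illustration and do not come from any real model.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, dividing by temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for answers to "give me a number between 1 and 10".
candidates = ["7", "3", "4", "8", "9"]
logits = [4.0, 2.5, 2.4, 1.5, 1.4]

if __name__ == "__main__":
    for t in (0.5, 1.0, 2.0):
        probs = softmax_with_temperature(logits, t)
        print(t, dict(zip(candidates, (round(p, 3) for p in probs))))
```

A low temperature concentrates probability on the top-scoring answer, while a high temperature flattens the distribution. Because the same setting is applied at every step of generation, a very high value also makes unlikely words more probable everywhere, which is consistent with the incoherent output Browne describes.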

Springboards realized that the parameters were too blunt an instrument for what it wanted to achieve. It doesn’t make sense to increase randomness across the board; you only want to increase it at specific points in the model’s output, Browne says.

For example, when you ask a chatbot “Where should I go in Europe?”, the model only needs more randomness at the point where it names a destination, not for every word in its response.

To make Flint work this way, Springboards trained its version of Qwen 3 to identify points in its output where more variety was possible and fill those spaces with slightly more random words or phrases.
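The difference between global and targeted randomness can be shown with a toy example. This is not Springboards' method: according to the article, Flint learned this behavior through training, whereas the sketch below simply hard-codes which step is an open slot and applies a higher temperature only there. The token lists, scores, and the `STEPS` structure are all hypothetical.

```python
import math
import random

def sample(logits, temperature, rng):
    """Sample an index from logits after temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

# Toy "model": (candidate tokens, scores, is this an open slot?) per step.
STEPS = [
    (["You", "We"], [5.0, 1.0], False),
    (["should visit", "should eat"], [5.0, 1.0], False),
    (["Paris", "Rome", "Ljubljana", "Porto", "Tallinn"],
     [4.0, 3.5, 1.5, 1.4, 1.3], True),
    (["this spring", "this winter"], [5.0, 1.0], False),
]

def generate(rng, base_temp=0.2, slot_temp=3.0):
    """Keep most steps near-deterministic; add randomness only at open slots."""
    words = []
    for tokens, logits, is_open_slot in STEPS:
        temperature = slot_temp if is_open_slot else base_temp
        words.append(tokens[sample(logits, temperature, rng)])
    return " ".join(words)

if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        print(generate(rng))
```

In this toy setup the sentence frame stays stable while the destination varies, which is the effect the article describes: variety where alternatives are equally valid, consistency everywhere else.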

“Flint is programmed to throw out an original idea. It’s more of an invitation to think more broadly,” says Maximilian Weigl, co-founder and strategy director at Uncommon, a marketing agency. “It’s super interesting.”

Weigl’s team uses Flint alongside ChatGPT, Claude, and Gemini. “You can’t really create something revolutionary with tools that bring you back to the average,” he says.

And yet, Weigl notes that nine times out of ten, the average is sufficient. You don’t always need to reach extremes with something like Flint, he says: “Most people are fine with what’s adequate. They want to see familiar and mainstream things.”

Weigl also warns against over-reliance on any LLM. “I have a big problem when people lean on the results of an AI, including Flint,” he says. “If I saw people on my team copy-pasting something from an AI, I’d say: ‘That’s not your work! Think, talk to other people, use your own voice.’”

For now, Flint is aimed at advertisers and marketers because they are Springboards’ clients. But Bingemann and Browne insist that a lack of variety is a problem for anyone using chatbots.

The idea is to give people the choice and let them decide whether the result is good or not, says Bingemann. “Variety is great when you’re trying to spark ideas,” he says. “Let’s go in that direction instead of letting machines do everything and ending up in a gray and boring world.”