Goodfire unveils Silico, a key tool for debugging AI

Key Takeaways

1. Goodfire, a startup from San Francisco, launches Silico, a tool for adjusting AI models during their training.

2. Silico allows for the exploration of the neurons in AI models, making it easier to correct undesirable behaviors.

3. The tool uses agents to automate interpretability, making these techniques accessible to small businesses.

Why it matters: Silico broadens access to advanced interpretability techniques, which are crucial for developing more reliable and transparent AI systems.

Goodfire, a startup based in San Francisco, has launched a tool named Silico. It is designed to let researchers and engineers inspect and adjust the parameters of artificial intelligence (AI) models during training. Goodfire says this could change how models are developed by giving developers more precise control than was previously possible. The company claims that Silico is the first ready-to-use tool of its kind, capable of helping developers debug every step of the development process, from creating a dataset to training a model.

Goodfire's mission is to make building AI models less like alchemy and more like rigorous science. Large language models (LLMs) such as ChatGPT and Gemini can accomplish impressive tasks, but their internal workings often remain a mystery. This opacity makes it hard to correct their flaws or prevent undesirable behaviors. Eric Ho, CEO of Goodfire, told MIT Technology Review in an exclusive conversation that the gap between how well models are understood and how widely they are deployed is concerning. According to him, the dominant approach in leading labs is to scale up, adding computing power and data in pursuit of artificial general intelligence (AGI), whereas Goodfire offers a more efficient alternative.

Goodfire is among a select group of companies, alongside leaders like Anthropic, OpenAI, and Google DeepMind, that are exploring mechanistic interpretability. This technique aims to understand what happens inside an AI model when it performs a task by mapping its neurons and the connections between them. Goodfire wants to use this approach not only to audit already trained models but also to design them from the ground up. “We want to eliminate trial and error and transform model training into precision engineering,” says Ho. “And that means exposing the knobs and dials so you can actually use them during the training process.”

Silico allows users to zoom in on specific parts of a trained model, such as individual neurons or groups of neurons, and run experiments to see what they do. While access to proprietary models like ChatGPT or Gemini is limited, Silico can be used to explore many open-source models. For example, Goodfire identified a neuron in the open-source Qwen 3 model that was linked to the trolley problem and that influenced the model's responses to moral dilemmas. “When this neuron is active, all sorts of strange things happen,” explains Ho.

The tool uses agents to automate much of the complex interpretability work, making these techniques accessible to small businesses and research teams. “Agents are now powerful enough to do much of the interpretability work that we used to do with humans,” explains Ho. This automation is what makes Silico viable as a platform that clients can use themselves.

Leonard Bereska, a researcher at the University of Amsterdam who has worked on mechanistic interpretability, says Silico appears to be a useful tool. However, he questions Goodfire's broader ambitions. In his view, the company is adding precision to alchemy, and calling it engineering makes the work sound more grounded than it actually is.

Silico also lets users adjust neuron parameters to change undesirable behaviors. In one example, a model was asked about disclosing misleading behaviors affecting 200 million users. Initially, the model responded negatively, but once the neurons related to transparency were reinforced, the response became positive in the majority of cases. “The model already had the ethical reasoning circuit, but it was overshadowed by the assessment of business risks,” says Ho.
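To make the idea of reinforcing a neuron concrete, here is a minimal toy sketch in plain Python. It is not Goodfire's method or Silico's interface, neither of which is described in detail here. The two hidden units, their "business risk" and "transparency" labels, and all weights are hypothetical values chosen by hand so that the effect is easy to see.

```python
# Toy illustration only: hand-picked weights, not taken from any real model.

def relu(x):
    return max(0.0, x)

# Two hidden units reading a 2-feature input.
# Index 0: hypothetical "business risk" unit.
# Index 1: hypothetical "transparency" unit.
W_HIDDEN = [[1.0, 0.0], [0.0, 1.0]]

# Output logit for "disclose": risk pushes against it, transparency pushes for it.
W_OUT = [-1.0, 0.6]

def forward(inputs, scale=None):
    """Return the 'disclose' logit. `scale` maps a hidden index to a multiplier."""
    scale = scale or {}
    hidden = []
    for i, row in enumerate(W_HIDDEN):
        activation = relu(sum(w * x for w, x in zip(row, inputs)))
        # Intervention: multiply the chosen unit's activation before it is read out.
        hidden.append(activation * scale.get(i, 1.0))
    return sum(w * h for w, h in zip(W_OUT, hidden))

prompt_features = [1.0, 1.0]  # both units are active for this input

baseline = forward(prompt_features)                    # risk outweighs transparency
steered = forward(prompt_features, scale={1: 3.0})     # transparency unit boosted 3x

print("baseline logit:", baseline, "-> disclose" if baseline > 0 else "-> withhold")
print("steered logit:", steered, "-> disclose" if steered > 0 else "-> withhold")
```

In this toy, the transparency signal is present from the start but is outweighed by the risk unit, which loosely mirrors Ho's description. Boosting one activation flips the sign of the output without changing any weights.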

The tool can also guide model training by filtering certain data, so that undesirable values are not instilled from the outset. For instance, many models can be influenced by erroneous associations picked up from biblical texts or code repositories, which can be corrected using Silico. By launching Silico, Goodfire aims to put techniques previously reserved for a few leading labs into the hands of small businesses and research teams looking to build their own model or adapt an open-source model.
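The filtering step can be pictured with a short sketch like the one below. It assumes that each training document can be given a score for an unwanted trait and that documents above a threshold are dropped before training. The function names, the phrase list, and the threshold are all hypothetical. A simple phrase count stands in for whatever interpretability-based signal a real tool would use.

```python
def filter_dataset(docs, score_fn, threshold):
    """Split docs into (kept, dropped) using a per-document score."""
    kept, dropped = [], []
    for doc in docs:
        if score_fn(doc) > threshold:
            dropped.append(doc)
        else:
            kept.append(doc)
    return kept, dropped

def phrase_score(doc, flagged=("hide the truth", "never admit")):
    """Hypothetical stand-in scorer: counts occurrences of flagged phrases."""
    text = doc.lower()
    return sum(text.count(phrase) for phrase in flagged)

docs = [
    "Report the defect to users as soon as it is confirmed.",
    "Hide the truth from customers and never admit a mistake.",
    "Version 9.11 was released after version 9.9.",
]

kept, dropped = filter_dataset(docs, phrase_score, threshold=0)
print(len(kept), "kept;", len(dropped), "dropped")
```

Returning the dropped documents alongside the kept ones makes it easy to review what the filter removed before committing to a training run.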

The tool will be available for a fee set case by case according to client needs. Goodfire declined to provide specific details on pricing. Bereska agrees that tools like Silico could help companies build more reliable models, particularly for safety-critical applications in fields such as healthcare and finance. “Leading labs already have internal interpretability teams,” he adds. “Silico empowers the next level of companies, where the value lies in not having to hire interpretability researchers.”
