Claude: 10 Strategies to Master Your Token Usage

Key Takeaways

1. Claude users are reaching their session limits faster since the end of March.

2. Anthropic has reduced usage windows to five hours during peak times.

3. The choice of model (Opus, Sonnet, or Haiku) affects how many tokens a task consumes.

💡 Why it matters: Optimizing your use of Claude helps you manage resources effectively and avoid service interruptions.

For several weeks, Claude users, particularly those on the Pro and Max plans, have noticed that they reach their usage limits faster. At the end of March, Anthropic confirmed that it had adjusted the usage windows to five hours during peak hours on weekdays. As a result, users hit their session limits more quickly than before.

This phenomenon is primarily due to the rise of agentic uses, such as Claude Code, long sessions, and complex tasks, which consume significantly more resources than simple text exchanges. The tips below explain how these limits work and how to make your credits last longer.

How do Claude's limits work?

Claude's limit system is based on two distinct mechanisms:

- Current session limit: This limit functions like a sliding counter that measures the amount of resources consumed over a five-hour period. Once the limit is reached, you must wait for the gauge to reset.
- Weekly limit: As the name suggests, this limit is renewed once a week. When you reach your weekly limit, you can no longer use Claude until it resets.

These limits are measured in tokens, not in the number of messages. One token corresponds to roughly three-quarters of an English word (about three to four characters). What significantly increases the cost is that Claude re-reads the entire conversation from the beginning with each new message. The first message in a session costs very little. The thirtieth, however, forces Claude to re-read twenty-nine complete exchanges before processing the new question. This is the main reason why limits are used up much faster than expected.
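The re-reading effect described above can be made concrete with a little arithmetic. The sketch below is a simplified model, not Anthropic's actual accounting: it assumes every exchange (question plus answer) adds the same number of tokens, and the figure of 500 tokens per exchange is an illustrative assumption.

```python
def history_tokens_at_turn(turn: int, tokens_per_exchange: int) -> int:
    """Tokens of earlier exchanges that must be re-read at a given turn (1-based)."""
    return (turn - 1) * tokens_per_exchange


def cumulative_history_tokens(turns: int, tokens_per_exchange: int) -> int:
    """Total history tokens re-read over a whole conversation.

    Simplifying assumption: each exchange has the same size, and every new
    message makes the model re-read all previous exchanges.
    """
    return sum(
        history_tokens_at_turn(turn, tokens_per_exchange)
        for turn in range(1, turns + 1)
    )


if __name__ == "__main__":
    # 500 tokens per exchange is an illustrative value, not a measured one.
    for turns in (1, 10, 30):
        total = cumulative_history_tokens(turns, tokens_per_exchange=500)
        print(f"{turns:>2} messages -> {total:,} history tokens re-read")
```

Under these assumptions the cost grows roughly with the square of the number of messages: ten messages re-read 22,500 history tokens in total, while thirty messages re-read 217,500.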

There is also a length limit, which concerns the context window, or the amount of information that Claude can process in a single conversation. This is Claude's "working memory" for a given exchange. The context window is 200,000 tokens for all models and paid plans, except for Enterprise, which has 500,000 tokens on certain models.
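To get a feel for how much text fits in that window, a rough estimate is enough. The sketch below assumes about four characters per token, which is only a heuristic and not Claude's real tokenizer; the helper names and the `reserve_for_reply` margin are hypothetical, chosen for illustration.

```python
CONTEXT_WINDOW_TOKENS = 200_000  # figure quoted in the article for paid plans
CHARS_PER_TOKEN = 4              # rough heuristic, NOT the real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length (rounded up)."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_context(texts: list[str], reserve_for_reply: int = 4_000) -> bool:
    """Check whether the texts plausibly fit, leaving room for the answer.

    `reserve_for_reply` is a hypothetical safety margin, not an official value.
    """
    used = sum(estimate_tokens(text) for text in texts)
    return used + reserve_for_reply <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    report = "x" * 400_000  # stand-in for a 400,000-character document
    print(estimate_tokens(report), fits_in_context([report]))
```

Real token counts vary with language and content (code and non-English text often use more tokens per word), so treat the result as an order of magnitude only.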

Factors influencing credit consumption in Claude:

- The length of exchanged messages
- The size of attached files
- The duration of the ongoing conversation
- The activation of tools like web search or Research mode
- The choice of model (Sonnet, Opus, Haiku)
- The creation of Artifacts (documents, tables, presentations)

10 tips to optimize your usage on Claude

Open a new conversation for each topic
Mixing different topics in the same thread is one of the most common mistakes. Claude re-reads the entire history with each message, so a thread that covers several topics increases consumption for no benefit. The better habit is to open a new thread as soon as you change the subject. For long sessions on the same theme, ask Claude for a summary of the key decisions at the end of the session, then start a new discussion by pasting this summary as the first message. This way, you carry over the essential context without paying for the complete history.
Group multiple questions into a single message
Sending three separate messages for three related questions forces Claude to re-read the complete history three times. By grouping them into a single message, the history is read only once, so you get the same result for roughly a third of the consumption. Feel free to format your questions as a bullet list to structure your request and organize your ideas.
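The "roughly a third" figure can be checked with a small simulation. This is a sketch under simplifying assumptions: only the tokens Claude has to read are counted, every question and answer has a fixed size, and all numbers are illustrative rather than measured.

```python
def separate_cost(history: int, question: int, answer: int, n: int) -> int:
    """Tokens read when n questions are sent as n separate messages."""
    total = 0
    context = history
    for _ in range(n):
        total += context + question   # everything re-read for this message
        context += question + answer  # the exchange joins the history
    return total


def grouped_cost(history: int, question: int, n: int) -> int:
    """Tokens read when the same n questions are sent in a single message."""
    return history + n * question


if __name__ == "__main__":
    # Illustrative values: a 6,000-token history, 50-token questions,
    # 300-token answers, three related questions.
    apart = separate_cost(history=6_000, question=50, answer=300, n=3)
    together = grouped_cost(history=6_000, question=50, n=3)
    print(apart, together, round(together / apart, 2))  # 19200 6150 0.32
```

With these numbers, the grouped message reads about 32% of what the three separate messages read. The longer the existing history, the closer the ratio gets to one third.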
Choose the right model for the right task
Opus is Claude's most powerful model, but also the most resource-intensive. For corrections, reformatting, or simple questions, Sonnet provides very similar results at a much lower cost. Haiku is even more economical for short queries. The practical rule: reserve Opus for complex tasks that truly justify its use.

Which Anthropic model for which task?
- Haiku: for simple and repetitive tasks (reformatting, spell checking, data extraction, short summaries, classification, answers to factual questions…)
- Sonnet: for the majority of professional tasks (writing, analysis, coding, research, document processing, brainstorming…)
- Opus: for complex tasks that require in-depth reasoning (in-depth analysis, complex coding, long multi-step tasks, strategic decisions, creating a skill…)
Disable unnecessary tools
Web search, Research mode, and connectors (Slack, Google Drive, etc.) consume additional credits with each response. Anthropic confirms in its documentation that these tools are particularly token-hungry. The better habit is to disable all these options by default and activate them only when a task requires them.
Leverage the Project feature
Uploading the same document in multiple conversations means Claude has to read it again each time. Claude's Projects solve this issue: a file uploaded once is cached and remains available for all conversations in the Project, without being counted again each time it is reused. Additionally, Projects have a RAG (retrieval-augmented generation) mode, which lets you work with a large volume of data without a matching increase in consumption.
Monitor your consumption in real-time
Claude offers a dashboard accessible in Settings > Usage. This displays all your usage limits according to the different tools used. Checking it regularly allows you to anticipate blockages and plan intensive sessions outside of peak hours if necessary.
Convert files before uploading them
Sending a PDF to Claude effectively uses your credits twice, because Claude both extracts the text and converts each page into an image for visual analysis. By extracting the useful text yourself and saving it as a plain text or Markdown file before uploading, you significantly reduce consumption compared to a raw PDF. The same logic applies to screenshots: when the information is textual, it is better to copy and paste it than to capture it as an image.
Edit your request instead of correcting it in the discussion
When Claude does not give the answer you expected, edit your original request rather than sending a follow-up message to correct it. Each message of the type "no, I meant to say..." is added to the history, and Claude re-reads it with every new exchange in the discussion. To modify a request, click the pencil button, edit your text, and press Enter. The exchange is then replaced rather than stacked.
Generate files at the end of the session, with the right model
Creating Artifacts (Word documents, presentations, tables) is a token-intensive operation. Combine two habits. First, work in conversation mode to refine the content, then trigger the file generation once, at the end of the session. Second, choose the model according to the task: for example, you might build your conversation with Opus, then switch to a less costly model for the simple generation of the Artifact.
Allow Claude to remember your conversations
Claude can access past conversations to retrieve context, thus saving you from repeating the same information in each new discussion. Two features should be activated in Settings > Features:

- Searching past conversations: lets you explicitly ask Claude to find what was discussed in previous exchanges.
- Contextual memory: lets Claude automatically retain key information from one session to the next.

Good to know: conversations from projects are not integrated into Claude's global memory; each has its own memory space.
