What Are Tokens?

Many users ask why a message sometimes costs more and sometimes costs less. We work on a transparency principle, so we would rather explain a bit about how LLMs work than try to convince you it's magic! We charge you a 30% service fee (25% profit goes to creator) on top of what AI providers charge us. Learn more in Transparent Pricing.

Tokens are the basic units that AI language models use to process and understand text. Think of them as the "building blocks" of language that the AI reads and generates.

How it works:

When you send a message, the AI breaks it down into tokens before processing it
Tokens can be whole words, parts of words, or even individual characters
On average, 1 token ≈ 4 characters or ¾ of a word in English
For example: "Hello, how are you?" = approximately 6 tokens

Why tokens matter:

Cost: Most AI services charge based on tokens used (both input and output)
Context limit: AI models have a maximum number of tokens they can handle in a single chat conversation
Response limits: The AI can only generate a certain number of tokens per response

Practical examples:

A short message (50 words) ≈ 65-70 tokens
A medium paragraph (200 words) ≈ 265-280 tokens
A long roleplay response (500 words) ≈ 665-700 tokens
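
The rule of thumb above (1 token ≈ ¾ of a word) can be turned into a quick estimator. This is only a rough sketch: `estimate_tokens_from_words` is a hypothetical helper name, and real tokenizers differ by model and language, so actual counts will vary.

```python
import math

# Rule of thumb from this page: 1 token is roughly 3/4 of an English word.
WORDS_PER_TOKEN = 0.75


def estimate_tokens_from_words(word_count: int) -> int:
    """Return a rough token estimate for an English text of `word_count` words."""
    if word_count < 0:
        raise ValueError("word_count must not be negative")
    return math.ceil(word_count / WORDS_PER_TOKEN)


for words in (50, 200, 500):
    print(f"{words} words is roughly {estimate_tokens_from_words(words)} tokens")
```

The estimates fall inside the ranges listed above, but treat them as ballpark figures, not billing values.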

3 Types of Tokens

When you chat with AI in ISEKAI ZERO, tokens work in three different ways:

1. Input Tokens

Your prompts and messages to the AI
Commands and instructions you give
Example: "I try to explain to the adventurers that I'm not a real demon."

2. Cache Tokens (Smart Memory System)

Previous conversation history that gets saved
Character details and backstory
World information and scene context
These are stored so the AI does not have to re-read everything from scratch

3. Output Tokens

The story content the AI writes back
Character dialogue and responses
Scene descriptions and narrative

How Tokens Flow in a Conversation

Step 1: You Send Your Action

Your prompt becomes Input Tokens.

"I try to convince the guards I'm just a normal traveler."

Step 2: AI Processes Your Request

The AI reads your message along with relevant Cache Tokens (previous story context) to understand the situation.

Step 3: AI Responds

The AI generates a story continuation as Output Tokens.

The guard eyes you suspiciously. "Normal travelers don't have horns," he mutters, hand moving to his sword...

Step 4: Important Details Get Cached

The AI automatically saves key information from this exchange as Cache Tokens for future use. This makes your next interaction faster and cheaper because the AI doesn't need to reload the entire conversation history—it already remembers the important parts.
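
The four steps above can be sketched as a small simulation. It assumes an idealized cache in which everything sent or generated in earlier turns is reused on the next turn; the names `TurnUsage` and `simulate_turns` and the token counts are made up for illustration and are not ISEKAI ZERO internals.

```python
from dataclasses import dataclass


@dataclass
class TurnUsage:
    cached_tokens: int       # earlier story context reused from the cache
    fresh_input_tokens: int  # your new message for this turn
    output_tokens: int       # the AI's reply for this turn


def simulate_turns(turns):
    """Walk through (input_tokens, output_tokens) pairs, one pair per turn.

    Idealized assumption: after each turn, the message and the reply are
    both added to the history and are fully cached for the next turn.
    """
    history_tokens = 0
    usage = []
    for input_tokens, output_tokens in turns:
        usage.append(TurnUsage(history_tokens, input_tokens, output_tokens))
        history_tokens += input_tokens + output_tokens
    return usage


# Illustrative token counts only.
usage = simulate_turns([(15, 30), (12, 40), (20, 35)])
for number, turn in enumerate(usage, start=1):
    print(f"Turn {number}: {turn}")
```

Each turn's cached portion grows by the previous turn's input and output, which is why long conversations lean more and more on the cache.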

Why This Matters

Input + Output = Your direct costs (what you pay per message)
Cache = Your savings (prevents expensive re-processing)
Longer conversations use more cache but save you money overall
Each response builds on cached memory, creating a seamless story

Why Do Costs Vary Between Messages?

You may notice that some messages cost more than others, even when they are similar in length.

Here's why:

The Cache System Works on "Best Effort"

The AI tries to cache (save) your conversation history to reduce costs, but it can only reuse what is still valid for the current context. Not all LLM models support caching.

Important: Cached tokens are significantly cheaper than regular input tokens, but the exact discount varies depending on the situation.

How Cache Saves You Money

Example:

Your conversation needs 10,000 input tokens to process
The AI successfully caches 8,000 tokens from previous turns
Result: Those 8,000 cached tokens cost a fraction of their original price (often around 10% or less, but this varies)

The savings:

2,000 regular input tokens = Full price
8,000 cached tokens = Much cheaper (discount varies)
Total = Significantly less than paying full price for all 10,000 tokens!
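
As a rough illustration of the example above, the sketch below expresses the input cost as a fraction of the full, uncached price. The 10% cached price ratio is an assumption taken from the "often around 10%" figure; the real discount varies, and `relative_input_cost` is a hypothetical helper name.

```python
def relative_input_cost(total_input_tokens, cached_tokens, cached_price_ratio=0.10):
    """Return the input cost as a fraction of the full (uncached) input price.

    cached_price_ratio is an assumed discount (10% of full price here);
    the actual ratio depends on the model and provider.
    """
    if total_input_tokens <= 0:
        raise ValueError("total_input_tokens must be positive")
    if not 0 <= cached_tokens <= total_input_tokens:
        raise ValueError("cached_tokens must be between 0 and total_input_tokens")
    fresh_tokens = total_input_tokens - cached_tokens
    # Cached tokens count for only a fraction of a full-price token.
    effective_tokens = fresh_tokens + cached_tokens * cached_price_ratio
    return effective_tokens / total_input_tokens


ratio = relative_input_cost(10_000, 8_000)
print(f"You pay about {ratio:.0%} of the full input price")
```

Under that assumed 10% ratio, the 10,000-token prompt costs about as much as 2,800 full-price tokens.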

Why Cache Effectiveness Varies

Cache works GREAT when:

✅ You are actively chatting (within 5 minutes of last message)
✅ Your conversation history stays unchanged
✅ Character details remain the same
✅ No edits to previous messages

Cache is LOST or REDUCED when:

❌ 5+ minutes pass without interaction (cache expires)
❌ You edit a previous message (invalidates cache from that point)
❌ You modify character details (changes the context)
❌ Previous conversation turns are altered
❌ The AI provider's service is degraded
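
To see why an edit invalidates the cache "from that point", here is a simplified model. It assumes the cache can only reuse the unchanged beginning of the conversation; `cached_prefix_length` and the sample messages are hypothetical, and real providers match on tokens with their own rules.

```python
def cached_prefix_length(previous_messages, current_messages):
    """Count how many leading messages are identical in both conversations.

    Simplified assumption: only this unchanged prefix can be served from
    the cache; everything after the first difference is read fresh.
    """
    count = 0
    for old, new in zip(previous_messages, current_messages):
        if old != new:
            break
        count += 1
    return count


previous = ["character sheet", "message 1", "reply 1", "message 2", "reply 2"]

# Normal case: nothing changed, one new message was added at the end.
unedited = previous + ["message 3"]
print(cached_prefix_length(previous, unedited))  # 5

# Edited case: "message 1" was changed, so only the character sheet matches.
edited = ["character sheet", "message 1 (edited)", "reply 1", "message 2", "reply 2", "message 3"]
print(cached_prefix_length(previous, edited))  # 1
```

The earlier the change sits in the conversation, the less of the history can be reused.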

The 5-Minute Rule

Your cache expires after 5 minutes of inactivity.

If you reply within 5 minutes → Cache is still active → Lower costs
If you wait longer than 5 minutes → Cache expires → Full input token costs

This is why costs can spike after breaks. The AI has to reload everything at full price.
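
The 5-minute rule can be written as a simple check. This is a sketch that assumes a fixed 5-minute lifetime measured from your last message; `is_cache_likely_active` is a hypothetical name, and because caching is best effort, a True result still does not guarantee a cache hit.

```python
from datetime import datetime, timedelta

# Inactivity window described on this page.
CACHE_LIFETIME = timedelta(minutes=5)


def is_cache_likely_active(last_message_at: datetime, now: datetime) -> bool:
    """Return True if less than 5 minutes have passed since the last message."""
    return now - last_message_at < CACHE_LIFETIME


last_message = datetime(2026, 1, 1, 12, 0, 0)
print(is_cache_likely_active(last_message, datetime(2026, 1, 1, 12, 3, 0)))  # True
print(is_cache_likely_active(last_message, datetime(2026, 1, 1, 12, 7, 0)))  # False
```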

Bottom Line

The cache system tries to save you money, but it needs:

Continuous interaction (replies within 5 minutes)
Unedited conversation history
Unchanged character information

Pro Tips for Lower Costs:

Reply within 5 minutes to keep cache active
Avoid editing previous messages when possible
Plan your character details before starting
If you need to step away, take one longer break between sessions instead of many short breaks, since every break over 5 minutes means one full-price reload

Cache is "Best Effort" — Not Guaranteed

It's important to understand that caching is a best effort system, meaning it tries to work but success is never guaranteed. Several factors beyond your control can cause cache to fail:

Why cache can fail unexpectedly:

Server-side factors — AI providers may clear caches during high traffic, maintenance, or system updates
Model routing — Your request might be processed by a different server instance that doesn't have your cached data
Infrastructure changes — Backend updates or load balancing can invalidate existing caches
Token limit constraints — If your conversation grows too large, older cached content may be dropped
Provider policies — Each AI provider handles caching differently, and their systems can change without notice

What this means for you:

Even if you do everything "right" (reply within 5 minutes, don't edit messages, etc.), you may occasionally see higher costs due to cache misses. This is normal and expected — it's simply how distributed AI systems work.

The bottom line: Cache saves you money on average over time, but any individual message might not benefit from caching. Think of it as a discount that usually applies, not a guarantee.
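
One way to picture "a discount that usually applies" is as an average over many messages. The sketch below is purely illustrative: the hit rate and both costs are made-up numbers, and `expected_cost` is a hypothetical helper, not something the platform reports.

```python
def expected_cost(cost_on_hit, cost_on_miss, hit_rate):
    """Average cost per message when the cache hits `hit_rate` of the time."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cost_on_hit + (1.0 - hit_rate) * cost_on_miss


# Hypothetical values: 1.0 Mana on a cache hit, 1.8 Mana on a miss,
# and a cache that hits for 80% of messages.
average = expected_cost(cost_on_hit=1.0, cost_on_miss=1.8, hit_rate=0.8)
print(f"Average cost per message: about {average:.2f} Mana")
```

Any single message costs either the hit price or the miss price; only the long-run average sits in between.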

How Are Token Costs Calculated?

Example:

For DeepSeek V3.2, the rates are:

29.4 Mana / Arcane per 1M input tokens
2.94 Mana / Arcane per 1M cached input tokens (the rate used in the calculation below)
44.1 Mana / Arcane per 1M output tokens

Total Tokens: 61,810

Prompt Tokens: 61,608
- Cached: 30,784
- Fresh Input: 30,824 (61,608 - 30,784)
Output Tokens: 202

Cost Calculations

Fresh Input Tokens Cost = (30,824 / 1,000,000) × 29.4 = 0.9062256 Mana
Cached Tokens Cost = (30,784 / 1,000,000) × 2.94 = 0.09050496 Mana
Output Tokens Cost = (202 / 1,000,000) × 44.1 = 0.0089082 Mana

Total Cost: 1.00563876 Mana

If ALL 61,608 prompt tokens were charged at FULL price:

Input Tokens Cost = (61,608 / 1,000,000) × 29.4 = 1.8112752 Mana
Output Tokens Cost = (202 / 1,000,000) × 44.1 = 0.0089082 Mana

Without Cache: 1.8201834 Mana

Total Saved: 1.8201834 - 1.00563876 = 0.81454464 Mana (44.75% cheaper)
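
The same arithmetic can be reproduced with a short script, which recomputes both totals from the token counts and rates and takes the saving as the difference between them. It is a sketch, not the platform's billing code: `message_cost` is a hypothetical helper, and the rates are the ones used in this worked example.

```python
TOKENS_PER_MILLION = 1_000_000

# Rates from the worked example, in Mana per 1M tokens.
INPUT_RATE = 29.4    # fresh input tokens
CACHED_RATE = 2.94   # cached input tokens
OUTPUT_RATE = 44.1   # output tokens


def message_cost(prompt_tokens, cached_tokens, output_tokens):
    """Return the cost in Mana for one message."""
    fresh_tokens = prompt_tokens - cached_tokens
    total = (
        fresh_tokens * INPUT_RATE
        + cached_tokens * CACHED_RATE
        + output_tokens * OUTPUT_RATE
    )
    return total / TOKENS_PER_MILLION


with_cache = message_cost(prompt_tokens=61_608, cached_tokens=30_784, output_tokens=202)
without_cache = message_cost(prompt_tokens=61_608, cached_tokens=0, output_tokens=202)
saved = without_cache - with_cache

print(f"With cache:    {with_cache:.8f} Mana")
print(f"Without cache: {without_cache:.8f} Mana")
print(f"Saved:         {saved:.8f} Mana ({saved / without_cache:.2%} cheaper)")
```

Changing `cached_tokens` shows how strongly the final price depends on how much of the prompt was served from the cache.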

Token Types Summary

Feature	Input Tokens	Cache Read Tokens	Output Tokens
Description	What you send	What AI remembers	What AI generates
Cost	Moderate	Very cheap	Most expensive
Reason	AI reads your text	AI reuses stored content	AI creates new content

Revision #17
Created 30 December 2025 15:20:15 by Cloudy
Updated 23 January 2026 22:07:41 by Louis