Gemma 4 26B and Qwen3 Coder 30B are now available for Sovereign AI Inferencing

18 June

Gemma 4 26B and Qwen3 Coder 30B are now live on ResetData's AI platform - two open-weight, Mixture-of-Experts models running on sovereign Australian infrastructure.

What’s available

  
        Gemma 4 26B A4B Instruct
        Qwen3 Coder 30B A3B Instruct
      
        Model ID
        google/gemma-4-26b-a4b-it
        qwen/qwen3-coder-30b-a3b-instruct
      
        Provider
        Google DeepMind
        Alibaba
      
        Architecture
        MoE — 26B total / ~4B active
        MoE — 30B total / ~3B active
      
        Context
        256K native
        256K native (1M with YaRN)
      
        Input pricing
        $0.09 / 1M tokens
        $0.10 / 1M tokens
      
        Output pricing
        $0.49 / 1M tokens
        $0.40 / 1M tokens
      
        License
        Gemma Terms of Use
        Apache 2.0

	Gemma 4 26B A4B Instruct	Qwen3 Coder 30B A3B Instruct
Model ID	google/gemma-4-26b-a4b-it	qwen/qwen3-coder-30b-a3b-instruct
Provider	Google DeepMind	Alibaba
Architecture	MoE — 26B total / ~4B active	MoE — 30B total / ~3B active
Context	256K native	256K native (1M with YaRN)
Input pricing	$0.09 / 1M tokens	$0.10 / 1M tokens
Output pricing	$0.49 / 1M tokens	$0.40 / 1M tokens
License	Gemma Terms of Use	Apache 2.0

Both are available via serverless endpoint (instant start, pay per token, no infrastructure).

Who is this for?

Choose Gemma 4 26B for long documents, structured reasoning, multilingual text, or regulated data. It fits RAG pipelines, document summarisation, enterprise chatbots, and agentic workflows. Government, finance, and healthcare teams running document-heavy workloads will find it a natural fit.

Choose Qwen3 Coder 30B if code is the primary output. It’s built for agentic coding at repository scale: multi-file refactoring, test generation, CI/debugging loops. Apache 2.0 licensing means no usage restrictions or legal complications for product builds.

Gemma 4 26B A4B Instruct

Google DeepMind’s Gemma 4 26B is a sparse MoE model with 26B total parameters and only ~4B activated per token, using 128 fine-grained experts with top-8 routing. It fits on a single 80GB H100 unquantised and currently sits at #6 on the Arena AI text leaderboard, competing above models 20x its size.

The architecture uses hybrid attention: local sliding window attention interleaved with full global attention, with the final layer always global. This keeps processing fast without sacrificing long-range awareness across 256K-token inputs.

Capabilities:

256K native context window (262,144 tokens)

MoE architecture (26B total / ~4B active per token)

Structured thinking mode for reasoning

Function calling and tool use

Multilingual support

Instruction-tuned for chat and tool-augmented workflows

Use cases:

Chat, reasoning, tool-use, code generation, long-context analysis, multilingual support, question-answering.

Limitations:

Text-only on this deployment. Gemma 4 is multimodal at the architecture level, but image and audio inputs are not exposed here.

256K context is available, but real concurrent capacity is governed by the KV cache pool.

Safety guardrails apply at the platform layer, not the model itself.

Get started:

Playground: Experiment in real-time. Log in to app.resetdata.ai → Playground → Gemma 4 26B

Serverless: $0.09/1M input, $0.49/1M output.

Qwen3 Coder 30B A3B Instruct

Alibaba’s Qwen3 Coder 30B is a MoE model with 30B total parameters and ~3B activated per token across 8 of 128 experts. It was pre-trained on a code-heavy corpus with synthesised agent traces and tool-use trajectories, then post-trained with reinforcement learning from code-execution feedback. The result is a model built specifically for agentic, repository-level coding work.

The native context window is 256K tokens, extendable to 1M with YaRN rope scaling. For repo-scale work, that matters: loading an entire codebase into context, across dozens of files and full dependency chains, is the difference between autocomplete and genuine understanding of what you’re building.

Tool calls use the dedicated qwen3_coder parser server-side. Compatible with Qwen Code, CLINE, and other agentic coding platforms, with OpenAI-compatible tool-use formatting for broader integrations.

Capabilities:

256K native context (extendable to 1M with YaRN)

Agentic coding at repository scale

Function calling and tool use (qwen3_coder parser format)

Multi-file refactoring, test generation, and debugging

MoE architecture for fast decode (~3B params active per token)

Use cases:

Code generation, code completion, agentic coding, repository-level tasks, test generation, debugging, refactoring, tool-use.

Limitations:

1M context requires YaRN configuration changes. 256K works out of the box.

No thinking mode. For complex chain-of-thought planning, use a reasoning model and hand off to Qwen3 Coder for execution.

Tool-call format requires the dedicated qwen3_coder parser server-side.

Output recommended at 64K tokens per generation.

Get started:

Playground: Experiment in real-time. Log in to app.resetdata.ai → Playground → Qwen3 Coder 30B A3B Instruct

Serverless: $0.10/1M input, $0.40/1M output.

Why MoE matters for cost

MoE models route each token through a small subset of specialised experts rather than activating every parameter. Gemma 4 runs ~4B of its 26B parameters per token. Qwen3 Coder runs ~3B of its 30B. Inference cost scales with active parameters, not total, so you get the knowledge capacity of a large model at the compute cost of a small one.

Sovereign by default

Both models run in Australia within ResetData’s sovereign AI infrastructure. Your data stays onshore and is never used for training.

Already have an account? Login to start building. New to ResetData? Sign up with free $50 token credits.

Caroline Martinez

Gemma 4 26B and Qwen3 Coder 30B are now available for Sovereign AI Inferencing

What’s available

Who is this for?

Gemma 4 26B A4B Instruct

Qwen3 Coder 30B A3B Instruct

Why MoE matters for cost

Sovereign by default

Navigation

Connect

Legal

Gemma 4 26B and Qwen3 Coder 30B are now available for Sovereign AI Inferencing

What’s available

Who is this for?

Gemma 4 26B A4B Instruct

Qwen3 Coder 30B A3B Instruct

Why MoE matters for cost

Sovereign by default

Z.ai GLM 5.2 is now available for Sovereign AI Inferencing

One sovereign GPU foundation. Four ways to consume it.

Navigation

Connect

Legal