Gemma 4 26B and Qwen3 Coder 30B: Now Available on Sovereign Australian AI Infrastructure
Gemma 4 26B and Qwen3 Coder 30B are now live on ResetData's AI platform — two open-weight, Mixture-of-Experts models running on sovereign Australian infrastructure.
Both ship with 256K native context, function calling, and serverless access from $0.09/1M tokens. One's a long-context generalist. The other's a purpose-built coding agent.
What’s available
| Gemma 4 26B A4B Instruct | Qwen3 Coder 30B A3B Instruct | |
|---|---|---|
| Model ID | google/gemma-4-26b-a4b-it | qwen/qwen3-coder-30b-a3b-instruct |
| Provider | Google DeepMind | Alibaba |
| Architecture | MoE — 26B total / ~4B active | MoE — 30B total / ~3B active |
| Context | 256K native | 256K native (1M with YaRN) |
| Input pricing | $0.09 / 1M tokens | $0.10 / 1M tokens |
| Output pricing | $0.49 / 1M tokens | $0.40 / 1M tokens |
| License | Gemma Terms of Use | Apache 2.0 |
Both are available via serverless endpoint (instant start, pay per token, no infrastructure).
Who is this for?
Choose Gemma 4 26B for long documents, structured reasoning, multilingual text, or regulated data. It fits RAG pipelines, document summarisation, enterprise chatbots, and agentic workflows. Government, finance, and healthcare teams running document-heavy workloads will find it a natural fit.
Choose Qwen3 Coder 30B if code is the primary output. It’s built for agentic coding at repository scale: multi-file refactoring, test generation, CI/debugging loops. Apache 2.0 licensing means no usage restrictions or legal complications for product builds.
Gemma 4 26B A4B Instruct
Google DeepMind’s Gemma 4 26B is a sparse MoE model with 26B total parameters and only ~4B activated per token, using 128 fine-grained experts with top-8 routing. It fits on a single 80GB H100 unquantised and currently sits at #6 on the Arena AI text leaderboard, competing above models 20x its size.
The architecture uses hybrid attention: local sliding window attention interleaved with full global attention, with the final layer always global. This keeps processing fast without sacrificing long-range awareness across 256K-token inputs.
Capabilities:
256K native context window (262,144 tokens)
MoE architecture (26B total / ~4B active per token)
Structured thinking mode for reasoning
Function calling and tool use
Multilingual support
Instruction-tuned for chat and tool-augmented workflows
Use cases:
Chat, reasoning, tool-use, code generation, long-context analysis, multilingual support, question-answering.
Limitations:
Text-only on this deployment. Gemma 4 is multimodal at the architecture level, but image and audio inputs are not exposed here.
256K context is available, but real concurrent capacity is governed by the KV cache pool.
Safety guardrails apply at the platform layer, not the model itself.
Get started:
Playground: Experiment in real-time. Log in to app.resetdata.ai → Playground → Gemma 4 26B
Serverless: $0.09/1M input, $0.49/1M output.
Qwen3 Coder 30B A3B Instruct
Alibaba’s Qwen3 Coder 30B is a MoE model with 30B total parameters and ~3B activated per token across 8 of 128 experts. It was pre-trained on a code-heavy corpus with synthesised agent traces and tool-use trajectories, then post-trained with reinforcement learning from code-execution feedback. The result is a model built specifically for agentic, repository-level coding work.
The native context window is 256K tokens, extendable to 1M with YaRN rope scaling. For repo-scale work, that matters: loading an entire codebase into context, across dozens of files and full dependency chains, is the difference between autocomplete and genuine understanding of what you’re building.
Tool calls use the dedicated qwen3_coder parser server-side. Compatible with Qwen Code, CLINE, and other agentic coding platforms, with OpenAI-compatible tool-use formatting for broader integrations.
Capabilities:
256K native context (extendable to 1M with YaRN)
Agentic coding at repository scale
Function calling and tool use (qwen3_coder parser format)
Multi-file refactoring, test generation, and debugging
MoE architecture for fast decode (~3B params active per token)
Use cases:
Code generation, code completion, agentic coding, repository-level tasks, test generation, debugging, refactoring, tool-use.
Limitations:
1M context requires YaRN configuration changes. 256K works out of the box.
No thinking mode. For complex chain-of-thought planning, use a reasoning model and hand off to Qwen3 Coder for execution.
Tool-call format requires the dedicated qwen3_coder parser server-side.
Output recommended at 64K tokens per generation.
Get started:
Playground: Experiment in real-time. Log in to app.resetdata.ai → Playground → Qwen3 Coder 30B A3B Instruct
Serverless: $0.10/1M input, $0.40/1M output.
Why MoE matters for cost
MoE models route each token through a small subset of specialised experts rather than activating every parameter. Gemma 4 runs ~4B of its 26B parameters per token. Qwen3 Coder runs ~3B of its 30B. Inference cost scales with active parameters, not total, so you get the knowledge capacity of a large model at the compute cost of a small one.
Sovereign by default
Both models run in Australia within ResetData’s sovereign AI infrastructure. Your data stays onshore and is never used for training.
Already have an account? Login to start building. New to ResetData? Sign up with free $50 token credits.