Explainer Jun 2, 2026 · 8 min read

WebGPU, explained — how the browser got a modern GPU API

WebGPU is the browser API that exposes the GPU directly — explicit pipelines, compute shaders, lower overhead. What it is, and why on-device AI now runs in a tab.

By Khine · 1,661 words

What WebGPU is

WebGPU is a web API that gives a page direct, structured access to the GPU on the machine it is running on. It is the successor to WebGL, and the name invites a fair assumption — that it is a faster way to draw triangles. That is part of it. The larger part is that WebGPU was designed around general-purpose GPU computation as a first-class feature, not as a graphics trick done sideways.

The standard comes from the W3C’s “GPU for the Web” working group. The same group defined a companion shading language, WGSL — the WebGPU Shading Language — which is what you write the GPU programs in. Both shipped together; you do not get one without the other.

The reason WebGPU exists is that the native graphics landscape moved on and WebGL did not follow. WebGL is built on OpenGL ES (2.0 for WebGL 1, 3.0 for WebGL 2), a design from a much earlier era of GPU hardware, and it is no longer actively evolving. WebGPU instead maps onto the modern explicit APIs that drive games and professional software today: Direct3D 12 on Windows, Metal on Apple platforms, and Vulkan elsewhere. In MDN’s framing, it offers “better compatibility with modern GPUs, support for general-purpose GPU computations, faster operations, and access to more advanced GPU features.”

The model: adapters, devices, pipelines, encoders

The part worth understanding is the object model, because it is where WebGPU’s design choices show. Four pieces carry most of the weight.

An adapter represents a physical GPU and its driver. You request one from navigator.gpu.requestAdapter(). A machine might expose more than one — a power-saving integrated GPU and a discrete one — and the adapter is how you find out what is available and what it can do.

A device is a logical handle obtained from the adapter via requestDevice(). MDN calls the adapter “a physical GPU and driver” and the device “an abstraction via which a single web app can access GPU capabilities in a compartmentalized way.” The device is the object you actually use; everything else hangs off it. The split matters because it isolates one page’s GPU work from another’s, which is the kind of thing you want when arbitrary websites are issuing instructions to your graphics hardware.
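
The adapter-then-device chain is short enough to sketch. This is a minimal version, not a recommended setup: the function name initWebGPU and the choice to pass gpu in as a parameter are mine, made so the logic can be exercised outside a browser. The powerPreference value is only a hint to the browser.

```javascript
// Sketch: walk the adapter -> device chain.
// `gpu` defaults to navigator.gpu; accepting it as a parameter is an
// illustrative choice, not something the API requires.
async function initWebGPU(gpu = globalThis.navigator?.gpu) {
  if (!gpu) return null; // WebGPU is not exposed in this environment

  const adapter = await gpu.requestAdapter({
    powerPreference: "high-performance", // a hint; the browser may ignore it
  });
  if (!adapter) return null; // API present, but no usable GPU was offered

  const device = await adapter.requestDevice();
  return { adapter, device };
}
```

Both requests are asynchronous, and requestAdapter() can resolve to null, which is why the function has two early exits rather than one.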

A pipeline describes a fixed unit of work. A render pipeline has vertex and fragment stages and produces pixels. A compute pipeline has a single compute stage and produces whatever you write to a buffer. You build pipelines up front, declaring the shader code, the data layouts, and the output formats in advance.
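
As a rough illustration of "declared up front", here is how a compute pipeline might be built. The shader, the constant DOUBLE_WGSL, and the function createDoublePipeline are invented for this example; device is assumed to be a GPUDevice you already hold. The layout: "auto" setting asks the browser to derive the binding layout from the shader, which is convenient for a sketch but not the only option.

```javascript
// A tiny compute stage written in WGSL: doubles every element in place.
const DOUBLE_WGSL = /* wgsl */ `
@group(0) @binding(0) var<storage, read_write> data: array<f32>;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) id: vec3<u32>) {
  if (id.x < arrayLength(&data)) {
    data[id.x] = data[id.x] * 2.0;
  }
}`;

// Build the pipeline once, ahead of any dispatch.
function createDoublePipeline(device) {
  const shaderModule = device.createShaderModule({ code: DOUBLE_WGSL });
  return device.createComputePipeline({
    layout: "auto", // derive the bind group layout from the shader
    compute: { module: shaderModule, entryPoint: "main" },
  });
}
```

The returned pipeline is reusable: the shader and layout are fixed at creation, and only the bound buffers change between dispatches.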

A command encoder records the work. You ask the device for an encoder, record a sequence of passes into it (set this pipeline, bind that buffer, draw, or dispatch), call finish() to get a command buffer, and submit it to the device’s queue. The GPU executes the batch.
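
The record-then-submit sequence reads like this for a single compute pass. This is a sketch under assumptions: recordAndSubmit is a name I made up, and the pipeline and bind group are taken to exist already.

```javascript
// Record one compute pass and hand it to the GPU queue.
// Arguments are assumed to be existing WebGPU objects:
// a GPUDevice, a GPUComputePipeline, and a GPUBindGroup.
function recordAndSubmit(device, pipeline, bindGroup, workgroupCount) {
  const encoder = device.createCommandEncoder();

  const pass = encoder.beginComputePass();
  pass.setPipeline(pipeline);              // which work to run
  pass.setBindGroup(0, bindGroup);         // which buffers it sees
  pass.dispatchWorkgroups(workgroupCount); // how much of it to run
  pass.end();

  const commandBuffer = encoder.finish();  // freeze the recording
  device.queue.submit([commandBuffer]);    // nothing executes before this
}
```

Nothing reaches the GPU until submit(); everything before it only records commands.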

That up-front, batch-and-submit shape is the substantive difference from WebGL, and it is deliberate. WebGL is a state machine you mutate call by call — bind a texture, set a uniform, draw, repeat — and each of those calls is a round trip the browser has to validate. WebGPU front-loads the validation into pipeline creation, so the per-frame work is mostly recording already-checked commands. The Chrome team reported a Babylon.js scene running its draw calls “more than 10x faster” under WebGPU than WebGL 2 for that reason: the API is, in their word, less “chatty.”

I’ll admit the adapter-versus-device split took me a couple of readings to hold in my head — it feels like one indirection too many until you remember the browser is handing untrusted pages a pipe to the GPU, at which point the compartment makes sense.

WGSL, briefly

Shaders — the programs that run on the GPU — are written in WGSL. It is a new language defined alongside the API rather than a reuse of GLSL (WebGL’s language) or HLSL (Direct3D’s). MDN describes it as “a low-level Rust-like language,” which is a reasonable first impression: explicit types, explicit address spaces, a syntax that reads more like systems code than the C-flavored GLSL.
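
To give a feel for the syntax, here is a small WGSL compute shader held in a JavaScript string, which is how shader source is usually passed to the API. The shader and its names (Params, src, dst, SCALE_WGSL) are made up for illustration. The explicit types are the f32 and u32 annotations, and the explicit address spaces are the uniform and storage declarations.

```javascript
// Illustrative WGSL: multiply every input element by a uniform scale factor.
const SCALE_WGSL = /* wgsl */ `
struct Params {
  scale: f32,
}

// Address space and access mode are spelled out on every binding.
@group(0) @binding(0) var<uniform> params: Params;
@group(0) @binding(1) var<storage, read> src: array<f32>;
@group(0) @binding(2) var<storage, read_write> dst: array<f32>;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) id: vec3<u32>) {
  let i: u32 = id.x;
  if (i >= arrayLength(&dst)) {
    return;
  }
  dst[i] = src[i] * params.scale;
}`;
```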

Defining a fresh language was a contested decision, and I think the criticism was fair at the time — another language is another thing to learn and another compiler to trust. The argument that won was portability. A single WGSL source has to compile down to three different native shading languages depending on the platform, and owning the language end to end made that translation tractable in a way that adopting an existing one would not have.

Compute shaders, which are the real story

WebGL can run vertex and fragment shaders. It cannot run compute shaders at all. For years, anyone who wanted to do general numerical work on the GPU from the web had to disguise it as graphics: encode your data as a texture, run it through a fragment shader, render to an off-screen target, and read the pixels back as your result. It worked, and it was miserable.

WebGPU has compute pipelines as a native feature. The Chrome launch post calls compute shaders “WebGPU’s primary new feature.” A compute shader gets direct read-write access to buffers, a notion of workgroups that run in parallel, and shared memory within a workgroup for cooperation. You dispatch a grid of work and the GPU schedules it across its cores. None of it pretends to be rendering.
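
The dispatch grid is easier to picture with numbers. The sketch below runs on the CPU, not the GPU: it computes how many workgroups cover an array and then visits every invocation index the GPU would launch, one after another rather than in parallel. The helper names and the workgroup size of 64 are assumptions for the example.

```javascript
// Workgroups needed to cover `elementCount` items when each workgroup
// runs `workgroupSize` invocations (the @workgroup_size in the shader).
function workgroupsFor(elementCount, workgroupSize) {
  return Math.ceil(elementCount / workgroupSize);
}

// CPU stand-in for a 1D dispatch: call `kernel` once per invocation with
// the flattened global index. A GPU would run these in parallel.
function simulateDispatch(workgroupCount, workgroupSize, kernel) {
  for (let group = 0; group < workgroupCount; group++) {
    for (let local = 0; local < workgroupSize; local++) {
      kernel(group * workgroupSize + local);
    }
  }
}

const data = new Float32Array(100).fill(1);
const groups = workgroupsFor(data.length, 64); // 2 workgroups, 128 invocations
let skipped = 0;
simulateDispatch(groups, 64, (i) => {
  if (i >= data.length) { skipped++; return; } // the bounds guard a shader needs
  data[i] *= 2;
});
console.log(groups, skipped, data[99]); // 2 28 2
```

The grid rarely divides evenly into the data, so the last workgroup overshoots. That is why compute shaders usually open with a bounds check.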

This is the capability that changes what the platform is for. Linear algebra — large matrix multiplies, the operation at the center of neural-network inference — maps directly onto the compute model. A texture-hack version of that was a research curiosity. A direct version is a runtime.
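
Here is roughly what "maps directly" means. The WGSL below is a naive, untuned kernel in which each invocation computes one cell of the output matrix. The JavaScript beside it mirrors the same per-cell arithmetic on the CPU so the mapping can be checked by hand. Names, layout (row-major), and the 8 by 8 workgroup size are all assumptions for this sketch; production kernels tile and cache far more aggressively.

```javascript
// WGSL: one invocation per output cell of C = A x B (row-major f32).
const MATMUL_WGSL = /* wgsl */ `
struct Dims { m: u32, k: u32, n: u32 }

@group(0) @binding(0) var<uniform> dims: Dims;
@group(0) @binding(1) var<storage, read> a: array<f32>;
@group(0) @binding(2) var<storage, read> b: array<f32>;
@group(0) @binding(3) var<storage, read_write> c: array<f32>;

@compute @workgroup_size(8, 8)
fn main(@builtin(global_invocation_id) id: vec3<u32>) {
  let row = id.y;
  let col = id.x;
  if (row >= dims.m || col >= dims.n) { return; }
  var sum = 0.0;
  for (var i = 0u; i < dims.k; i++) {
    sum += a[row * dims.k + i] * b[i * dims.n + col];
  }
  c[row * dims.n + col] = sum;
}`;

// CPU mirror of a single invocation: the dot product for one (row, col).
function matmulCell(a, b, k, n, row, col) {
  let sum = 0;
  for (let i = 0; i < k; i++) sum += a[row * k + i] * b[i * n + col];
  return sum;
}

// CPU mirror of the whole dispatch: one "invocation" per output cell.
function matmulReference(a, b, m, k, n) {
  const c = new Float32Array(m * n);
  for (let row = 0; row < m; row++) {
    for (let col = 0; col < n; col++) {
      c[row * n + col] = matmulCell(a, b, k, n, row, col);
    }
  }
  return c;
}

// [[1,2],[3,4]] x [[5,6],[7,8]]
console.log(Array.from(matmulReference([1, 2, 3, 4], [5, 6, 7, 8], 2, 2, 2)));
// [ 19, 22, 43, 50 ]
```

Every output cell is independent of the others, which is exactly the shape of work a grid of parallel invocations is built for.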

Why now: support

WebGPU shipped to stable Chrome in Chrome 113. The Chrome team’s May 2023 announcement described it as available “today in Chrome 113 on ChromeOS, macOS, and Windows.” For roughly two years after that it was a Chrome and Edge feature, with everyone else behind flags.

That has changed, though not yet uniformly, and this is the spot where I would caution against the headlines declaring the job finished. Apple shipped WebGPU in Safari 26 across its platforms. Mozilla shipped it in Firefox 141 — but, per Mozilla’s own Gfx Team, Firefox 141 covers Windows only, with macOS and Linux following and Android after that. The W3C group’s implementation-status page is the honest place to check; the short version is that WebGPU is now in all three major engines, with each still filling in platforms. Plan for it as widely available, detect it at runtime, and keep a fallback.

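A minimal feature check is a single condition. It only rules WebGPU out: navigator.gpu can exist while requestAdapter() still resolves to null on a machine with no usable GPU.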

if (!navigator.gpu) {
  // No WebGPU here — fall back to WebGL or WASM.
}
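
A slightly fuller version of "detect it at runtime, and keep a fallback" might look like the sketch below. The function pickBackend, its argument shape, and the backend labels are hypothetical. The probes are passed in so the decision logic can run anywhere; in a page you would hand it navigator.gpu and a function that tries to create a "webgl2" canvas context.

```javascript
// Decide which backend to use, preferring WebGPU.
// `gpu` is navigator.gpu (or undefined); `hasWebGL2` is a caller-supplied
// probe. Both are passed in as an illustrative choice.
async function pickBackend({ gpu, hasWebGL2 }) {
  if (gpu) {
    try {
      const adapter = await gpu.requestAdapter();
      if (adapter) return "webgpu";
    } catch {
      // Defensive: treat a failed request the same as "no adapter".
    }
  }
  return hasWebGL2() ? "webgl2" : "wasm";
}

// In a browser this might be called as:
// pickBackend({
//   gpu: navigator.gpu,
//   hasWebGL2: () => !!document.createElement("canvas").getContext("webgl2"),
// }).then((backend) => console.log(backend));
```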

What it makes possible

Two things, mainly.

The first is better in-browser graphics: 3D scenes, data visualization, and CAD-style viewers that render more geometry with less CPU spent feeding the GPU. Engines like Babylon.js and Three.js already have WebGPU backends. The win is less a higher ceiling than a lower floor — the same scene costs less, which on a phone is the difference between smooth and not. (Some of Loft’s 3D and CAD viewers lean on this lineage of browser GPU access for exactly that reason.)

The second, and the one drawing the attention, is machine learning on the device. Quantized models now run at interactive speed inside a tab, on the user’s own GPU, with nothing sent to a server. Two real projects anchor this:

  • WebLLM, from the MLC team, runs large language models in the browser on WebGPU and exposes an OpenAI-style API. The call looks like a cloud request; the computation is local.
  • Transformers.js, from Hugging Face, brings the Python Transformers API to JavaScript. It runs on ONNX Runtime Web, which delegates to WebGPU for acceleration where the browser provides it.

The performance case was visible early. The Chrome team noted that “an initial port of an image diffusion model in TensorFlow.js shows a 3x performance gain on a variety of hardware when moved from WebGL to WebGPU.” For transformer inference the gap is generally larger, because the workload is almost entirely the matrix math that compute shaders exist to do.

It is worth being plain about the limits. Model weights are large, so first load is a real download. GPU memory is finite and shared, so model size has a ceiling that depends on the visitor’s hardware. WebGPU does not make a phone into a datacenter. What it does is remove the assumption that non-trivial GPU computation has to happen on a server — and that assumption shaped a decade of how web software was built.
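
The size ceiling can be put in rough numbers. The sketch below estimates the weight download for a quantized model and checks one buffer against the limits an adapter reports. Both helpers are hypothetical; the limit names maxBufferSize and maxStorageBufferBindingSize are real fields on GPUAdapter.limits, but the estimate ignores activations, caches, and per-format overhead, so treat it as a floor.

```javascript
// Rough weight size for a quantized model: parameters x bits per weight / 8.
// Ignores activations, caches, and file-format overhead.
function estimateWeightBytes(parameterCount, bitsPerWeight) {
  return Math.ceil((parameterCount * bitsPerWeight) / 8);
}

// `limits` is shaped like GPUAdapter.limits. `bufferBytes` is the size of
// the largest single buffer you intend to bind as storage.
function largestBufferFits(limits, bufferBytes) {
  return (
    bufferBytes <= limits.maxBufferSize &&
    bufferBytes <= limits.maxStorageBufferBindingSize
  );
}

const bytes = estimateWeightBytes(1e9, 4); // 1 billion parameters at 4 bits
console.log(bytes); // 500000000
```

One caveat: a device gets the default limits unless higher ones are asked for through requiredLimits in requestDevice(), so the adapter's numbers are an upper bound rather than what you start with.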

The short version

WebGPU is the GPU, exposed to the web through an explicit, modern API with compute as a peer to rendering. The object model — adapter, device, pipeline, command encoder — front-loads validation so the hot path is cheap. WGSL is the language you write the GPU code in. Compute shaders are the feature that turns the browser from a place that draws into a place that calculates. It shipped to stable Chrome in 2023 and is now in every major engine, platform by platform. The most consequential result is that machine learning, including language models, can run on the visitor’s own hardware with no round trip — which is a quieter shift than the demos suggest, and a more durable one.

References

  1. WebGPU API — MDN Web Docs — Mozilla (accessed 2026-05-29)
  2. WebGPU: Unlocking modern GPU access in the browser — Google — Chrome for Developers (accessed 2026-05-29)
  3. Shipping WebGPU on Windows in Firefox 141 — Mozilla Gfx Team Blog (accessed 2026-05-29)
  4. From WebGL to WebGPU — Google — Chrome for Developers (accessed 2026-05-29)
  5. WebGPU — Implementation Status (gpuweb wiki) — W3C GPU for the Web Community Group (accessed 2026-05-29)