What WebGPU is
WebGPU is a web API that gives a page direct, structured access
to the GPU on the machine it is running on. It is the successor
to WebGL, and the name invites a fair assumption — that it is a
faster way to draw triangles. That is part of it. The larger
part is that WebGPU was designed around general-purpose GPU
computation as a first-class feature, not as a graphics trick
done sideways.
The standard comes from the W3C’s “GPU for the Web” working
group. The same group defined a companion shading language,
WGSL — the WebGPU Shading Language — which is what you write the
GPU programs in. Both shipped together; you do not get one
without the other.
The reason WebGPU exists is that the native graphics landscape
moved on and WebGL did not follow. WebGL is built on OpenGL ES
2.0, a design from a much earlier era of GPU hardware, and it is
no longer actively evolving. WebGPU instead maps onto the modern
explicit APIs that drive games and professional software today:
Direct3D 12 on Windows, Metal on Apple platforms, and Vulkan
elsewhere. In MDN’s framing, it offers “better compatibility
with modern GPUs, support for general-purpose GPU computations,
faster operations, and access to more advanced GPU features.”
The model: adapters, devices, encoders
The part worth understanding is the object model, because it is
where WebGPU’s design choices show. Four pieces carry most of
the weight.
An adapter represents a physical GPU and its driver. You
request one from navigator.gpu.requestAdapter(). A machine
might expose more than one — a power-saving integrated GPU and a
discrete one — and the adapter is how you find out what is
available and what it can do.
A device is a logical handle obtained from the adapter via
requestDevice(). MDN calls the adapter “a physical GPU and
driver” and the device “an abstraction via which a single web
app can access GPU capabilities in a compartmentalized way.” The
device is the object you actually use; everything else hangs off
it. The split matters because it isolates one page’s GPU work
from another’s, which is the kind of thing you want when
arbitrary websites are issuing instructions to your graphics
hardware.
A pipeline describes a fixed unit of work. A render pipeline
has vertex and fragment stages and produces pixels. A compute
pipeline has a single compute stage and produces whatever you
write to a buffer. You build pipelines up front, declaring the
shader code, the data layouts, and the output formats in
advance.
A command encoder records the work. You ask the device for
an encoder, record a sequence of passes into it (set this
pipeline, bind that buffer, draw, or dispatch), call finish()
to get a command buffer, and submit it to the device’s queue.
The GPU executes the batch.
That up-front, batch-and-submit shape is the substantive
difference from WebGL, and it is deliberate. WebGL is a state
machine you mutate call by call — bind a texture, set a uniform,
draw, repeat — and each of those calls is a round trip the
browser has to validate. WebGPU front-loads the validation into
pipeline creation, so the per-frame work is mostly recording
already-checked commands. The Chrome team reported a Babylon.js
scene running its draw calls “more than 10x faster” under WebGPU
than WebGL 2 for that reason: the API is, in their word, less
“chatty.”
I’ll admit the adapter-versus-device split took me a couple of
readings to hold in my head — it feels like one indirection too
many until you remember the browser is handing untrusted pages a
pipe to the GPU, at which point the compartment makes sense.
WGSL, briefly
Shaders — the programs that run on the GPU — are written in
WGSL. It is a new language defined alongside the API rather than
a reuse of GLSL (WebGL’s language) or HLSL (Direct3D’s). MDN
describes it as “a low-level Rust-like language,” which is a
reasonable first impression: explicit types, explicit address
spaces, a syntax that reads more like systems code than the
C-flavored GLSL.
Defining a fresh language was a contested decision, and I think
the criticism was fair at the time — another language is another
thing to learn and another compiler to trust. The argument that
won was portability. A single WGSL source has to compile down to
three different native shading languages depending on the
platform, and owning the language end to end made that
translation tractable in a way that adopting an existing one
would not have.
Compute shaders, which are the real story
WebGL can run vertex and fragment shaders. It cannot run compute
shaders at all. For years, anyone who wanted to do general
numerical work on the GPU from the web had to disguise it as
graphics: encode your data as a texture, run it through a
fragment shader, render to an off-screen target, and read the
pixels back as your result. It worked, and it was miserable.
WebGPU has compute pipelines as a native feature. The Chrome
launch post calls compute shaders “WebGPU’s primary new
feature.” A compute shader gets direct read-write access to
buffers, a notion of workgroups that run in parallel, and shared
memory within a workgroup for cooperation. You dispatch a grid
of work and the GPU schedules it across its cores. None of it
pretends to be rendering.
This is the capability that changes what the platform is for.
Linear algebra — large matrix multiplies, the operation at the
center of neural-network inference — maps directly onto the
compute model. A texture-hack version of that was a research
curiosity. A direct version is a runtime.
Why now: support
WebGPU shipped to stable Chrome in Chrome 113, which the Chrome
team dated to release “today in Chrome 113 on ChromeOS, macOS,
and Windows” in their May 2023 announcement. For roughly two
years it was a Chrome and Edge feature, with everyone else
behind flags.
That has changed, though not yet uniformly, and this is the
spot where I would caution against the headlines declaring the
job finished. Apple shipped WebGPU in Safari 26 across its
platforms. Mozilla shipped it in Firefox 141 — but, per
Mozilla’s own Gfx Team, Firefox 141 covers Windows only,
with macOS and Linux following and Android after that. The W3C
group’s implementation-status page is the honest place to check;
the short version is that WebGPU is now in all three major
engines, with each still filling in platforms. Plan for it as
widely available, detect it at runtime, and keep a fallback.
A minimal feature check is one line:
if (!navigator.gpu) {
// No WebGPU here — fall back to WebGL or WASM.
}
What it makes possible
Two things, mainly.
The first is better in-browser graphics: 3D scenes, data
visualization, and CAD-style viewers that render more geometry
with less CPU spent feeding the GPU. Engines like Babylon.js and
Three.js already have WebGPU backends. The win is less a higher
ceiling than a lower floor — the same scene costs less, which on
a phone is the difference between smooth and not. (Some of
Loft’s 3D and CAD viewers lean on this lineage of
browser GPU access for exactly that reason.)
The second, and the one drawing the attention, is machine
learning on the device. Quantized models now run at interactive
speed inside a tab, on the user’s own GPU, with nothing sent to
a server. Two real projects anchor this:
- WebLLM, from the MLC team, runs large language models in
the browser on WebGPU and exposes an OpenAI-style API. The
call looks like a cloud request; the computation is local.
- Transformers.js, from Hugging Face, brings the Python
Transformers API to JavaScript. It runs on ONNX Runtime Web,
which delegates to WebGPU for acceleration where the browser
provides it.
The performance case was visible early. The Chrome team noted
that “an initial port of an image diffusion model in
TensorFlow.js shows a 3x performance gain on a variety of
hardware when moved from WebGL to WebGPU.” For transformer
inference the gap is generally larger, because the workload is
almost entirely the matrix math that compute shaders exist to
do.
It is worth being plain about the limits. Model weights are
large, so first load is a real download. GPU memory is finite
and shared, so model size has a ceiling that depends on the
visitor’s hardware. WebGPU does not make a phone into a
datacenter. What it does is remove the assumption that
non-trivial GPU computation has to happen on a server — and that
assumption shaped a decade of how web software was built.
The short version
WebGPU is the GPU, exposed to the web through an explicit,
modern API with compute as a peer to rendering. The object
model — adapter, device, pipeline, command encoder — front-loads
validation so the hot path is cheap. WGSL is the language you
write the GPU code in. Compute shaders are the feature that
turns the browser from a place that draws into a place that
calculates. It shipped to stable Chrome in 2023 and is now in
every major engine, platform by platform. The most consequential
result is that machine learning, including language models, can
run on the visitor’s own hardware with no round trip — which is
a quieter shift than the demos suggest, and a more durable one.
References