Transformation Matrices
- You can precompute one matrix (M) and apply it to many points
- Easy to chain transforms:
M = Projection * View * Model
- This is how graphics pipelines (OpenGL, DirectX, etc.) work
Minimal pseudocode
Build the final transformation matrix (rotate by θ about the z-axis, around the pivot point (cx, cy, cz)):
M = T(cx, cy, cz) * Rz(θ) * T(-cx, -cy, -cz)
Apply it to each point:
for each point P:
    P' = M * P
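The pseudocode above can be sketched in plain Python. This is a minimal illustration, not a production math library: the helper names (mat_mul, transform, translation, rotation_z) and the sample pivot and points are assumptions made for the example.

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as nested lists (row-major)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform(m, p):
    """Apply a 4x4 matrix to a 3D point, treating it as (x, y, z, 1)."""
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))

def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

def rotation_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

# Assumed example values: rotate 90 degrees about the pivot (1, 1, 0).
cx, cy, cz = 1.0, 1.0, 0.0
theta = math.pi / 2

# Build the matrix once: move pivot to origin, rotate, move back.
M = mat_mul(translation(cx, cy, cz),
            mat_mul(rotation_z(theta), translation(-cx, -cy, -cz)))

# Reuse the same matrix for every point.
points = [(2.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
rotated = [transform(M, p) for p in points]
```

The point one unit to the right of the pivot ends up one unit above it, and the pivot itself stays where it is, which is a quick sanity check for the T * Rz * T ordering.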
1. Model matrix (M)
“Where is the object in the world?”
- Takes your object’s local coordinates (its own origin)
- Moves / rotates / scales it into the world
P_world = M * P_local
Example:
- A cube defined around (0,0,0)
- You want it at (10, 2, -5) and rotated
→ That transformation is the model matrix
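As a rough sketch of the cube example, the model matrix can be built as translation * rotation * scale. The target position (10, 2, -5) comes from the example; the 90 degree rotation about the y-axis, the scale factor of 2, and all helper names are assumptions chosen only to make the numbers easy to check.

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as nested lists (row-major)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform(m, p):
    """Apply a 4x4 matrix to a 3D point, treating it as (x, y, z, 1)."""
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))

def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

def rotation_y(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0, 0, 0, 1]]

def scaling(k):
    return [[k, 0, 0, 0], [0, k, 0, 0], [0, 0, k, 0], [0, 0, 0, 1]]

# Read right to left: scale first, then rotate, then move into the world.
model = mat_mul(translation(10.0, 2.0, -5.0),
                mat_mul(rotation_y(math.pi / 2), scaling(2.0)))

center_world = transform(model, (0.0, 0.0, 0.0))   # the cube's local origin
corner_world = transform(model, (1.0, 1.0, 1.0))   # one local corner
```

The cube's local origin lands exactly on (10, 2, -5), while the corner is scaled and rotated before being moved, so it ends up offset from that position.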
2. View matrix (V)
“Where is the camera, and what is it looking at?”
- Moves the world so the camera is at the origin
- Equivalent to “moving the camera,” but actually moves everything else
P_view = V * P_world
Key idea:
- If camera moves forward, everything else moves backward
But more precisely:
- You don’t “move the camera” directly
- You apply the inverse transform of the camera to everything else
If the camera has:
- position: C
- rotation: R
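Then the camera's own placement in the world is T(C) * R (rotate first, then move to C), and the view matrix is the inverse of that:
V = inverse( T(C) * R )
In practice (inverting a product reverses the order of its factors):
V = inverse(R) * inverse(T(C)) = transpose(R) * T(-C)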
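A minimal sketch of V = inverse(R) * inverse(T(C)) follows. It assumes R is a pure 3x3 rotation (so its inverse is its transpose) and that the camera looks along its own -z axis, as in OpenGL; the function names and the sample camera pose are made up for the illustration.

```python
import math

def view_matrix(R, C):
    """Build the 4x4 view matrix from a 3x3 rotation R and a position C."""
    # The inverse of a pure rotation is its transpose.
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    # inverse(R) * T(-C) has the translation column -R^T * C.
    t = [-sum(Rt[i][k] * C[k] for k in range(3)) for i in range(3)]
    return [Rt[0] + [t[0]], Rt[1] + [t[1]], Rt[2] + [t[2]],
            [0.0, 0.0, 0.0, 1.0]]

def transform(m, p):
    """Apply a 4x4 matrix to a 3D point, treating it as (x, y, z, 1)."""
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))

# Assumed camera pose: at (3, 0, 5), turned 90 degrees about the y-axis.
theta = math.pi / 2
c, s = math.cos(theta), math.sin(theta)
R = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
C = (3.0, 0.0, 5.0)

V = view_matrix(R, C)

# The camera's own position must land on the origin of view space.
at_camera = transform(V, C)

# A world point one unit along the camera's forward axis, R * (0, 0, -1).
ahead_world = (C[0] - s, C[1], C[2] - c)
ahead_view = transform(V, ahead_world)
```

In view space the camera sits at the origin and the point straight ahead of it lies at (0, 0, -1), regardless of where the camera was placed in the world.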
3. Projection matrix (P)
“How do we turn 3D into 2D?”
- Applies perspective (things further away look smaller)
- Converts 3D coordinates into screen-like coordinates
P_clip = P * P_view
(On the right-hand side, P is the projection matrix; P_clip and P_view are points.)
This is where 3D becomes 2D. There are two common types:
- Perspective projection (what you want for 3D)
- Orthographic projection (no depth scaling)
You almost always want perspective.
Perspective projection (simple derivation)
The simplest form comes from similar triangles:
For a point (x, y, z), where z is its distance in front of the camera:
x' = x / z
y' = y / z
This is the core idea of perspective.
Add field of view (FOV)
Instead of raw division, we scale it:
x' = x * f / z
y' = y * f / z
Where:
f = 1 / tan(FOV / 2)
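The scaled divide can be tried out directly. This is a small sketch under two assumptions: z is a positive distance in front of the camera, and the FOV is given in degrees; the function name project is hypothetical.

```python
import math

def project(point, fov_degrees):
    """Project a camera-space point with x' = x * f / z, y' = y * f / z."""
    x, y, z = point
    f = 1.0 / math.tan(math.radians(fov_degrees) / 2.0)
    return (x * f / z, y * f / z)

# The same offset from the view axis, seen at two different depths.
close_point = project((1.0, 1.0, 2.0), 90.0)
distant_point = project((1.0, 1.0, 4.0), 90.0)

# A narrower field of view gives a larger f, which acts like zooming in.
zoomed_point = project((1.0, 1.0, 4.0), 45.0)
```

Doubling the depth halves the projected coordinates, and narrowing the FOV pushes the same point further from the center of the image.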
Full projection matrix (standard form)
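In homogeneous coordinates (OpenGL convention: the camera looks down -z and visible depth maps to [-1, 1]):
P =
[ f/aspect   0   0                       0
  0          f   0                       0
  0          0   (far+near)/(near-far)   (2*far*near)/(near-far)
  0          0   -1                      0 ]
Where:
- f = 1 / tan(FOV / 2), with FOV the vertical field of view
- aspect = width / height
- near, far define depth range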
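The matrix above can be written out as a small function. This sketch follows the same layout entry for entry; the function names and the sample FOV, aspect, near and far values are assumptions. It checks one property of this form: points on the near and far planes end up at depth -1 and +1 after dividing by w.

```python
import math

def perspective(fov_y_degrees, aspect, near, far):
    """Build the 4x4 perspective matrix shown above (row-major)."""
    f = 1.0 / math.tan(math.radians(fov_y_degrees) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far),
         (2.0 * far * near) / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

P = perspective(60.0, 16.0 / 9.0, 0.1, 100.0)

# Points straight ahead of the camera, on the near and far planes.
# The camera looks down -z, so these have negative z in view space.
on_near = mat_vec(P, [0.0, 0.0, -0.1, 1.0])
on_far = mat_vec(P, [0.0, 0.0, -100.0, 1.0])

ndc_z_near = on_near[2] / on_near[3]
ndc_z_far = on_far[2] / on_far[3]
```

The last row copies -z into w, which is what makes the later divide by w perform the perspective division.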
Combined pipeline
All together:
P_final = P * V * M * P_local
Order (right → left):
1. Model → place object in world
2. View → move world relative to camera
3. Projection → apply perspective
What happens after projection
After:
clip = P * V * M * point
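You do the perspective divide, where (x, y, z, w) are the components of clip:
x_screen = x / w
y_screen = y / w
With the projection matrix above, w = -z of the view-space point, so this is the divide-by-depth step from the simple derivation. For visible points the results lie in [-1, 1] and still need to be scaled to the pixel size of the window.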
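Putting the steps together, the whole chain from a local point to the divided coordinates might look like the sketch below. The scene is an assumption chosen for easy arithmetic: the object is moved 5 units in front of the camera, the camera sits at the world origin (so the view matrix is the identity), and the projection uses a 90 degree FOV with a square aspect ratio. All helper names are hypothetical.

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as nested lists (row-major)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

def perspective(fov_y_degrees, aspect, near, far):
    f = 1.0 / math.tan(math.radians(fov_y_degrees) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far),
         (2.0 * far * near) / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]

identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

model = translation(0.0, 0.0, -5.0)          # object 5 units ahead
view = identity                              # camera at the world origin
projection = perspective(90.0, 1.0, 0.1, 100.0)

# Combine once (right to left: model, then view, then projection).
pvm = mat_mul(projection, mat_mul(view, model))

# Transform a local point given in homogeneous form (x, y, z, 1).
clip = mat_vec(pvm, [1.0, 1.0, 0.0, 1.0])

# Perspective divide.
x_screen = clip[0] / clip[3]
y_screen = clip[1] / clip[3]
```

Here w comes out as 5, the distance of the point in front of the camera, so the point that is 1 unit off-axis lands at 0.2 in both x and y.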