Transformation Matrices

Useful Explanation

  • You can precompute one matrix (M) and apply it to many points
  • Easy to chain transforms: M = Projection * View * Model
  • This is how graphics pipelines (OpenGL, DirectX, etc.) work

Minimal pseudocode

Build final transformation matrix

M = T(cx, cy, cz) * Rz(θ) * T(-cx, -cy, -cz)

Apply to each point

for each point P:  
    P' = M * P
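The pseudocode above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the helper names (`matmul`, `translate`, `rotate_z`, `apply`, `rotate_about_pivot`) are made up for this example, matrices are 4x4 nested lists, and points are treated as column vectors (x, y, z, 1).

```python
import math

def matmul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translate(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def rotate_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def apply(m, p):
    """Apply a 4x4 matrix to a 3D point treated as (x, y, z, 1)."""
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))

def rotate_about_pivot(theta, cx, cy, cz):
    # M = T(c) * Rz(theta) * T(-c): move the pivot to the origin,
    # rotate, then move the pivot back.
    return matmul(translate(cx, cy, cz),
                  matmul(rotate_z(theta), translate(-cx, -cy, -cz)))

# Build M once, then reuse it for every point.
M = rotate_about_pivot(math.pi / 2, 1.0, 1.0, 0.0)
points = [(2.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
moved = [apply(M, p) for p in points]
```

The second point is the pivot itself, so it stays where it is, while the first point swings a quarter turn around it.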

1. Model matrix (M)

“Where is the object in the world?”

  • Takes your object’s local coordinates (its own origin)
  • Moves / rotates / scales it into the world

P_world = M * P_local

Example:

  • A cube defined around (0,0,0)
  • You want it at (10, 2, -5) and rotated

→ That transformation is the model matrix
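As a rough sketch of the cube example, the model matrix can be built as a translation applied after a rotation. The choice of a quarter turn about the Y axis, the cube half-size of 1, and all helper names here are assumptions made for illustration only.

```python
import math

def matmul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translate(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def rotate_y(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [-s, 0.0, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def to_world(model, p_local):
    """P_world = M * P_local for a point treated as (x, y, z, 1)."""
    v = (p_local[0], p_local[1], p_local[2], 1.0)
    return tuple(sum(model[i][k] * v[k] for k in range(4)) for i in range(3))

# Rotate the cube about its own origin first, then move it to (10, 2, -5).
model = matmul(translate(10.0, 2.0, -5.0), rotate_y(math.pi / 2))

# Eight corners of a cube centred on the local origin.
cube_corners = [(sx, sy, sz)
                for sx in (-1.0, 1.0)
                for sy in (-1.0, 1.0)
                for sz in (-1.0, 1.0)]
world_corners = [to_world(model, p) for p in cube_corners]
center_world = to_world(model, (0.0, 0.0, 0.0))
```

Swapping the order to `rotate_y(...) * translate(...)` would instead swing the cube around the world origin, which is usually not what is wanted for placing an object.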

2. View matrix (V)

“Where is the camera, and what is it looking at?”

  • Moves the world so the camera is at the origin
  • Equivalent to “moving the camera,” but actually moves everything else

P_view = V * P_world

Key idea:

  • If the camera moves forward, everything else moves backward

But more precisely:

  • You don’t “move the camera” directly
  • You apply the inverse transform of the camera to everything else

If the camera has:

  • position: C
  • rotation: R

then its placement in the world is T(C) * R (rotate first, then translate), and the view matrix is the inverse of that:

V = inverse( T(C) * R )

In practice:

V = inverse(R) * inverse(T(C))
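The formula above can be checked numerically with a small sketch. It relies on two facts: the inverse of a pure rotation is its transpose, and the inverse of T(C) is T(-C). The camera pose (a 30 degree turn about Y at position (3, 1, 5)) and the helper names are assumptions chosen for illustration.

```python
import math

def matmul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def translate(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def rotate_y(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [-s, 0.0, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def apply_point(m, p):
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))

def view_matrix(cam_pos, cam_rot):
    """V = inverse(R) * inverse(T(C)) = transpose(R) * T(-C)."""
    cx, cy, cz = cam_pos
    return matmul(transpose(cam_rot), translate(-cx, -cy, -cz))

C = (3.0, 1.0, 5.0)
R = rotate_y(math.radians(30.0))
V = view_matrix(C, R)

# The camera's own position should land on the origin in view space.
camera_in_view = apply_point(V, C)

# V should undo the camera's world placement T(C) * R exactly.
round_trip = matmul(V, matmul(translate(*C), R))
```

Here `round_trip` comes out as the identity matrix up to floating-point error, which confirms that V is the inverse of the camera's world transform.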

3. Projection matrix (P)

“How do we turn 3D into 2D?”

  • Applies perspective (things further away look smaller)
  • Converts 3D coordinates into screen-like coordinates

P_screen = P * P_view

This is where 3D becomes 2D. There are two common types:

  • Perspective projection (distant objects look smaller)
  • Orthographic projection (no depth scaling; size does not depend on distance)

For a realistic 3D scene you almost always want perspective.

Perspective projection (simple derivation)

The simplest form comes from similar triangles. For a point (x, y, z), where z is its distance in front of the camera:

x' = x / z  
y' = y / z

This is the core idea of perspective.

Add field of view (FOV)

Instead of raw division, we scale it:

x' = x * f / z  
y' = y * f / z

Where:

f = 1 / tan(FOV / 2)
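A small sketch of the scaled division, assuming FOV is given in degrees and z is the positive distance in front of the camera. The function names `focal_scale` and `project_point` are hypothetical.

```python
import math

def focal_scale(fov_degrees):
    """f = 1 / tan(FOV / 2), with FOV given in degrees."""
    return 1.0 / math.tan(math.radians(fov_degrees) / 2.0)

def project_point(x, y, z, fov_degrees):
    """Return (x', y') = (x * f / z, y * f / z).

    z is assumed to be the distance in front of the camera, so it must be > 0.
    """
    if z <= 0.0:
        raise ValueError("point must be in front of the camera (z > 0)")
    f = focal_scale(fov_degrees)
    return (x * f / z, y * f / z)

# With a 90 degree FOV, f is 1 and this reduces to plain x / z, y / z.
centered = project_point(1.0, 1.0, 2.0, 90.0)

# A point exactly on the top edge of a 60 degree view cone lands at y' = 1.
edge_y = 2.0 * math.tan(math.radians(30.0))
on_edge = project_point(0.0, edge_y, 2.0, 60.0)
```

The second case shows what f is for: it rescales the result so that points on the boundary of the field of view map to ±1.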

Full projection matrix (standard form)

In homogeneous coordinates:

P =  
[ f/aspect   0    0                       0  
  0          f    0                       0  
  0          0    (far+near)/(near-far)   (2*far*near)/(near-far)  
  0          0    -1                      0 ]

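Where:

  • f = 1 / tan(FOV / 2), with FOV the vertical field of view
  • aspect = width / height
  • near, far are the distances to the near and far clipping planes (0 < near < far)

This is the OpenGL-style form: the camera looks down the negative z axis, so the -1 in the last row sets w = -z.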
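The matrix above can be written out directly. The sketch below assumes FOV is the vertical field of view in degrees and that view-space points in front of the camera have negative z (the convention implied by the -1 in the last row). The names `perspective` and `transform` are made up for this example.

```python
import math

def perspective(fov_y_degrees, aspect, near, far):
    """Build the 4x4 perspective matrix shown above as nested lists."""
    if not (0.0 < near < far):
        raise ValueError("expected 0 < near < far")
    f = 1.0 / math.tan(math.radians(fov_y_degrees) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far),
         (2.0 * far * near) / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]

def transform(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(4))

P = perspective(60.0, 16.0 / 9.0, 0.1, 100.0)

# Points on the near and far planes, straight ahead of the camera.
on_near = transform(P, (0.0, 0.0, -0.1, 1.0))
on_far = transform(P, (0.0, 0.0, -100.0, 1.0))

# After dividing by w, depth spans -1 (near plane) to +1 (far plane).
near_depth = on_near[2] / on_near[3]
far_depth = on_far[2] / on_far[3]
```

The third row exists only to remap depth into a fixed range; the actual perspective effect comes from the last row copying -z into w.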

Combined pipeline

All together:

P_final = P * V * M * P_local

Order (right → left):

  1. Model → place object in world
  2. View → move world relative to camera
  3. Projection → apply perspective
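To illustrate the ordering, the sketch below composes the three matrices once and checks that the combined matrix gives the same result as applying them one at a time. The scene is an assumption chosen to keep the numbers simple: an object placed at (10, 2, -5), an unrotated camera at (10, 2, 0), and a 90 degree FOV with a square aspect ratio. All helper names are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translate(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def perspective(fov_y_degrees, aspect, near, far):
    f = 1.0 / math.tan(math.radians(fov_y_degrees) / 2.0)
    return [[f / aspect, 0.0, 0.0, 0.0],
            [0.0, f, 0.0, 0.0],
            [0.0, 0.0, (far + near) / (near - far),
             (2.0 * far * near) / (near - far)],
            [0.0, 0.0, -1.0, 0.0]]

def transform(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(4))

model = translate(10.0, 2.0, -5.0)   # place the object in the world
view = translate(-10.0, -2.0, 0.0)   # inverse of a camera at (10, 2, 0)
proj = perspective(90.0, 1.0, 0.1, 100.0)

# Compose once: the rightmost matrix (model) is applied to the point first.
mvp = matmul(proj, matmul(view, model))

p_local = (1.0, 0.0, 0.0, 1.0)
clip_combined = transform(mvp, p_local)
clip_stepwise = transform(proj, transform(view, transform(model, p_local)))
```

Because matrix multiplication is associative, `mvp` can be built once per object and reused for every vertex, which is the precompute-once benefit mentioned at the start.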

What happens after projection

After:

clip = P * V * M * point

You then do the perspective divide on clip = (x, y, z, w):

x_screen = x / w  
y_screen = y / w
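The divide itself is a one-liner, sketched below. Returning z / w as well, and raising when w is zero, are choices made for this example rather than something stated above. The function name `perspective_divide` is hypothetical.

```python
def perspective_divide(clip):
    """Divide clip-space (x, y, z, w) by w.

    With the projection matrix above, visible points end up with each
    component in the range [-1, 1].
    """
    x, y, z, w = clip
    if w == 0.0:
        raise ZeroDivisionError("w is zero: the point lies in the camera plane")
    return (x / w, y / w, z / w)

# Same x and y in clip space, but the second point is twice as far away
# (w = 10 instead of 5), so it lands half as far from the centre.
near_point = perspective_divide((1.0, 0.5, 4.8, 5.0))
far_point = perspective_divide((1.0, 0.5, 9.8, 10.0))
```

This is the step where "further away looks smaller" actually happens: the projection matrix only stores the depth in w, and the division applies it.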