Literature Review: Clarification and Unification of the Obliquity Factor in Diffraction and Scattering Theories

Citation

Yijun Bao and Thomas K. Gaylord, "Clarification and Unification of the Obliquity Factor in Diffraction and Scattering Theories: Discussion," J. Opt. Soc. Am. A 34, 1738-1745 (2017)

Abstract

Two-dimensional (2D) and three-dimensional (3D) diffraction theories form the underlying basis of quantitative phase imaging. This paper reviews how 2D and 3D diffraction theories are developed based on thin and thick object requirements. However, some previously reported work has mixed 2D and 3D theories. This discussion shows that it is possible to enable consistent mixed use of 2D and 3D theories by applying appropriate obliquity factor (OF) modifications. The discussion is concluded with an overall unifying representation for the usage of the OF modifications in 2D and 3D diffraction theories as applied to both thin and thick objects.

Reason for this Review

I often notice that articles concerning quantitative phase imaging (QPI) are unclear about what is meant by 2D and 3D objects. The article helps to clarify this point.

Summary of the Paper

The paper addresses the problem of when to use 2D and 3D diffraction theories in forward models of image formation in a microscope. I will refer to this problem as the choice of dimensionality. In addition, the authors highlight the choice of whether an object may be treated as a "thick" object or a "thin" object, which is not the same as the choice of dimensionality. I call this the choice of object model.

Ultimately, a decision matrix is constructed in which the dimensionality and object model serve as inputs. The output is the form of the obliquity factor, a term found in all diffraction integrals related to the Huygens-Fresnel principle that is used to prevent backward energy flow of the wave field1. The correct forms of the obliquity factor (OF) are then used to allow the application of 2D diffraction theory to thick objects (the so-called Type-1 OF modification) and 3D diffraction theory to thin objects (the Type-2 OF modification).

The unifying theory is meant to address a problem of consistency in the authors' previous work, but I think that it is important on a more fundamental level.

Diffraction Theory

2D Diffraction Theory

2D diffraction theory follows from the well-known developments of Huygens, Fresnel, Kirchhoff, Rayleigh, and Sommerfeld. The integral equation for 2D diffraction is:

$$ u ( \vec{x}, z ) = \frac{1}{j \lambda }\int u_{inc} ( \vec{x}', 0) \, t ( \vec{x}' ) \frac{e^{ j k \sqrt{ | \vec{x} - \vec{x}' |^2 + z^2 } } }{\sqrt{ | \vec{x} - \vec{x}' |^2 + z^2 }} K ( \vec{x}, z; \vec{x}' ) \, d \vec{x}' . $$

In words, the above expression determines the field \( u \) at transverse coordinate \( \vec{x} \) and axial coordinate \( z \) due to an incident field \( u_{inc} \) on a 2D complex transmission screen \( t \) at \(z = 0 \). \( K \) is the obliquity factor, which takes the form \( \cos \theta \) in the first Rayleigh-Sommerfeld diffraction integral, \( \theta \) being the angle between the normal to the screen and the line from the point \( \vec{x}' \) on the screen to the point of observation.

This expression is equivalent to Eq. 3-41 of Goodman2.
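To make the integral concrete, here is a minimal brute-force numerical evaluation in Python. The wavelength, grid, aperture, and observation point are arbitrary illustration values (not anything from the paper), and the obliquity factor is taken as \( K = \cos \theta = z / r \):

import numpy as np

# Brute-force quadrature of the 2D diffraction integral above.
# All parameters are arbitrary, coarsely sampled illustration values.
wavelength = 0.5e-6                      # meters
k = 2 * np.pi / wavelength

N = 256                                  # samples per side of the screen
L = 200e-6                               # physical side length of the screen
xs = np.linspace(-L / 2, L / 2, N)
dx = xs[1] - xs[0]
Xs, Ys = np.meshgrid(xs, xs)

u_inc = np.ones((N, N))                            # unit-amplitude plane wave
t = (Xs**2 + Ys**2 <= (50e-6)**2).astype(float)    # circular aperture

def diffracted_field(x, y, z):
    """Field at the observation point (x, y, z) for z > 0."""
    r = np.sqrt((x - Xs)**2 + (y - Ys)**2 + z**2)
    K = z / r                                      # obliquity factor cos(theta)
    integrand = u_inc * t * np.exp(1j * k * r) / r * K
    return integrand.sum() * dx * dx / (1j * wavelength)

print(abs(diffracted_field(0.0, 0.0, 1e-3)))       # on-axis field 1 mm downstream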

Assumption of the 2D Theory

The most important assumption in 2D diffraction theory is that the object is thin. I think the authors give a somewhat unsatisfactory definition of "thin," stating:

A thin object usually means that the light exits the object approximately at the same transversal coordinate as it enters the object, or the transversal deviation of light can be neglected.

For this definition they cite Goodman2. I think they are specifically referring to this passage in Chapter 5, section 1:

With reference to Appendix B, a lens is said to be a thin lens if a ray entering at coordinates (x, y) on one face exits at approximately the same coordinates on the opposite face, i.e. if there is negligible translation of a ray within the lens. Thus a thin lens simply delays an incident wavefront by an amount proportional to the thickness of the lens at each point.

I find the definition unsatisfactory partly because Goodman is specifically talking about rays and lenses, whereas Bao and Gaylord are talking about waves and inhomogeneous media. At the end of this post I provide a more detailed critique and a possible solution.

Mathematically, this assumption allows us to write the refractive index difference due to the screen as

$$ \Delta n ( \vec{x}, z) = \phi ( \vec{x} ) \delta ( z ) / k $$

where \( \delta \) is the Dirac delta function. The delta function is ultimately the source of the difficulties in reconciling the 2D and 3D theories.

3D Diffraction Theory

By "3D diffraction theory," Bao and Gaylord are referring to what I normally think of as "scattering." In particular, under the first Born approximation:

$$ u ( \vec{r} ) = u_{inc} ( \vec{r} ) + \int u_{inc} ( \vec{r}' ) F ( \vec{ r }' ) G ( \vec{r}, \vec{r}' ) \, d \vec{r}' .$$

This expression states that the total field diffracted (i.e. scattered) by a 3D scattering potential \( F \) is the sum of the incident field and an integral over source terms \( u_{inc} ( \vec{r}') F ( \vec{r}' ) \) multiplied by the Green's function \( G \).

Assumptions of the 3D Theory

In deriving the integral expression above, Born and Wolf3 note in chapter 13 that the gradient of the object's refractive index must be small so that the electric field components can be decoupled, thereby reducing the vector theory to a scalar one. Additionally, the first Born approximation requires that the refractive index contrast of the object be small.

The authors rightly point out that these assumptions are in contradiction with what constitutes a thin object. As a result of the delta function in the expression for \( \Delta n \) of a thin object, the refractive index gradient is huge and 3D diffraction theory should not apply. More specifically, the scalar approximation to the vector Helmholtz equation should be invalid, and it is this approximation that leads to the integral equation of potential scattering.

The authors further state that another consequence of a thin object is that the values of the refractive index become large, which violates the assumption of the first Born approximation. Here I am less certain of their argument, and I think the reason again is due to the nature of the delta function. I think that their argument is this: the refractive index values are large because the delta function technically has an infinite value. But I usually think that the delta function by itself is physically meaningless unless integrated over, which is why I am less certain of the strength of this argument.

An Inconsistency Arises

When applied to a thin object, the two different theories lead to nearly identical expressions for the diffracted field. They differ in that the expression from the 2D theory has an obliquity factor whereas that from the 3D theory does not. The authors point out that this near similarity was discussed in Chapter 1, section 8 of Cowley1, a book originally from the 1970s. The authors clarify that the inconsistency arises from "dissimilar object requirements." I think another way to say this is that the assumptions of the 3D theory appear to be violated when applied to a thin object. We will see later, however, that the assumptions are not actually violated; rather, the discrepancy comes from incorrectly accounting for the phase accumulated by light incident at an angle to the object.

Conversions between 2D and 3D Theories

Application of 2D theory to 3D objects

In an illuminating (pun intended) discussion the authors proceed to demonstrate that 3D diffraction theory actually emerges from the 2D theory by applying multislice theory. In this theory, a 3D object is divided into many, infinitesimally thin slices and the final field is the sum over the 2D diffracted fields from each slice.

The authors state that, for each slice, both the thin object requirement and the small refractive index contrast requirement are satisfied, which allows the 2D and 3D theories to be linked. There is an extremely subtle point here because in the previous section the authors stated that an infinitesimally thin slice violates the assumption of small refractive index contrast values. But this is probably because the phase shift of the 2D screen is finite rather than infinitesimally small. Here, by making the slices of the 3D object extremely thin, we are also inducing an infinitesimal phase shift in each slice. An infinitesimal phase shift is consistent with the small refractive index contrasts required by the first Born approximation.

Thus the 3D theory can be in a sense rescued by proper and repeated application of the 2D theory.

Now, a proper accounting of the phase shift of each slice must include a division by the obliquity factor:

$$ d \phi ( \vec{x}', z' ) = \frac{ k \Delta n (\vec{x}', z' ) dz' } {K (\vec{x}, z; \vec{x}', z' )} .$$

The logic behind this assertion is that illumination of the slice at an angle results in a larger distance covered in traversing the slice relative to normal illumination. In the words of the authors:

The division of the OF enlarges the effective phase, because the light path length in the slice is \( dz' / K \) for off-axis light.

Combining this with the 2D integral theory results in a cancellation of the obliquity factors and a recovery of the 3D integral expression for the diffracted field.

The authors call this modification to the differential phase imparted by each slice a type I OF modification.

Application of 3D theory to 2D objects

3D theory cannot be applied to 2D objects in an attempt to recover 2D diffraction theory because of the "dissimilar object requirements." By "cannot" the authors really mean "can" because in just a few sentences they circumvent the perceived difficulty by modifying the Green's function in the 3D theory by multiplying by the obliquity factor. This leads to what the authors call a type II OF modification, which appears never to have been published before in the literature.

Discussion on the Two Types of OF Modification

Section 3C is probably the most important section of the paper. In it, the authors discuss the nature of the proposed OF modifications and why they are justified. By the term "nature," I mean whether they are physically motivated or just mathematical conveniences.

The application of 2D theory to 3D objects is discussed first. The corresponding OF modification to the phase of each slice is physically motivated by "the longer propagation distances of the off-axis rays." The statement at the beginning of the paragraph "A thick object must satisfy [ \( | n ( \vec{r} ) -1 | \ll 1 \) ], which is the requirement of 3D diffraction theory," is a bit puzzling not because of what it asserts (it's correct), but because it seems to imply that a proper application of 2D theory can recover this requirement without having to assume it. But by asserting that we can arrive at the total diffracted field by summing contributions from each infinitesimal slice, we are assuming exactly the first Born approximation. So it is not surprising that we should recover the 3D diffraction theory from multislice modeling when the slices are independent of one another because we made the first Born approximation in assuming the independence of each slice.

Perhaps the authors' intent was just to point out the physical basis of the OF modification to the phase. In this case, I completely agree with the modification and its physical origin.

Next we arrive at a paragraph about why 2D theory cannot be derived from 3D theory. Here the authors state that the phase induced by a thin 2D object of thickness \( l \) is \( \phi \approx k l \Delta n \). Since both \( l \) and \( \Delta n \) are small, their product is very small and therefore the object has no appreciable effect on the phase. They next claim that the object cannot satisfy the first Born approximation, which is a bit confusing because the discussion in section 2C seemed to suggest that the problem comes from refractive index values that are too large, not too small.

To be honest, I don't quite follow the logic here, but I again agree with the conclusion. The type II OF modification of the Green's function is just a mathematical convenience that produces the correct results when applying 3D theory to thin objects.

What about the Depth of Field?

Overall, this paper has helped me immensely in understanding the subtleties in the different ways to model diffraction and scattering from transparent objects. In particular, I have a much clearer understanding now on how to properly carry out forward modeling of image formation in high numerical aperture (NA) microscopes.

There is however one significant oversight, which is the definition of what it means for an object to be thin. I find their definition unsatisfactory for the following reason.

Consider a flat piece of glass in air and a ray of light incident upon it. If thin means that "light exits the object approximately at the same transversal coordinate as it enters the object," then a physically thick piece of glass can be just as "thin" under a small angle of incidence as a physically thin piece of glass under a large angle of incidence. It's the optical path length that matters, not the physical extent of the object.

This is not so much an objection to their definition, but rather to the word "thin" itself. "Thin" alludes to physical extent, but we must remind ourselves that it is really optical extent. But consider next a ray of light that is normally incident upon the glass. Now it doesn't matter how physically thick or thin the glass is; it always exits at the same transverse coordinate. Is every object under illumination at normal incidence a thin object?

These examples indicate that any definition of "thin" might have to include the angle of the illumination.

Carrying this example further, if we continuously reduce the NA of the system, then we will measure light that is confined to smaller and smaller angles to the axis. But then we will approach the situation described above where even physically thick objects appear thin under illumination at angles close to normal. Since decreasing the NA increases the depth of field, I suspect that a more satisfactory definition of "thin" will ultimately depend on the depth of field as well.

My hypothesis is that an object's "thinness" really depends on two things:

  1. The ratio of its physical extent to the depth of field
  2. The refractive index contrast of the object

We need both of the above quantities to be small for an object to be thin.

Conclusion

In conclusion, I really love this paper. It made me critically examine aspects of QPI that I think are taken for granted by clarifying a common point of confusion. Though I might sound critical in this post, it's only because I wish to see the authors' arguments further improved to the point where no ambiguity remains.


  1. Cowley, John Maxwell. Diffraction physics. Elsevier, 1995. 

  2. Goodman, Joseph W. Introduction to Fourier optics. Roberts and Company publishers, 2005. 

  3. Born, Max, and Emil Wolf. Principles of optics: electromagnetic theory of propagation, interference and diffraction of light. Elsevier, 2013. 

Image Formation in Brightfield Microscopy: Part 1 - Object Models

The mathematical model of image formation of a thick, transilluminated object by a microscope is complicated yet incredibly interesting. It draws from diverse areas such as crystallography, light scattering, and the theory of partial coherence. I find it much more complicated, but yet more satisfying, than the image formation models of fluorescence microscopy.

This is the first in a series of posts in which I discuss the theory of image formation of a 3D object in brightfield microscopy. The series is inspired by the classic 1985 paper Three-dimensional imaging by a microscope by N. Streibl, but includes a few digressions that I think help to understand subtleties in the theory.

The Problem

The problem is simple: we would like a model that predicts (as much as is possible) the image of a microscopic object that is captured by a brightfield microscope.

Note that this is slightly different from the problem of recovering the object, which cannot be done with complete fidelity in brightfield microscopy.

What is the Object Model?

In the theory, the microscopic object, such as a cell or microorganism, is modeled as a complex-valued, three-dimensional function \( n \) of spatial coordinates x, y, and z.

\( n \) is of course the refractive index of the object. The volume outside the object is a uniform medium of refractive index \( n_0 \).

\( n \) is complex to account for

  1. phase shifts imparted onto the light, and
  2. absorption, which is related to the imaginary part of the refractive index.

Model Classification by Scattering Strength, Object Extent, Depth of Focus

The general model of brightfield image formation is too complex to be of any real use for practical work. As a result, we need to make simplifications to make the theory manageable.

I think that the most useful way to understand these simplifications is by specifying three quantities:

  1. The strength of light scattering by the object
  2. The object's axial extent
  3. The depth of focus of the microscope

We will also need to consider properties such as the coherence of the light source, but I will leave this for a later discussion.

The Scattering Strength and the Object's Axial Extent

The degree of light scattering by an object is the most important characteristic in determining the modeling approach because there is little hope of high-fidelity imaging in strongly scattering samples. (Think of trying to see a distant object in a thick fog1.)

Roughly speaking, the scattering strength of an object depends on the degree of refractive index variations within the object and the object's extent. All else being equal, a stronger variation of the refractive index within the object leads to stronger scattering. And as the object becomes larger, a larger fraction of the energy carried by the light is scattered because the light spends more time inside the object.

One simple heuristic for the scattering strength is the mean free path of a photon, \( \ell \), inside the object. This is the average distance a photon travels before being scattered. The ratio of an object's extent \( L \) to \( \ell \) is therefore an indication of how many times a photon is likely to scatter.

For weakly scattering objects,

$$ \frac{L}{\ell} \ll 1 $$

I say that this is a heuristic because there are a few problems with this explanation. Namely,

  1. As far as I am aware, there isn't a clear relationship between \( \ell \) and the gradient of the refractive index.
  2. The mean free path makes sense only if light is modeled as discrete, point-like objects traveling ballistically through the sample, occasionally scattering off of other point-like objects. This is an obvious over-simplification that doesn't reflect the wave nature of light.

Additionally, if the ratio \( L / \ell \) is much less than one, the above picture suggests that light will barely interact with the sample at all, but it can still very much diffract.

We'll be dealing with waves and not photons from this point onward, so if you're already experiencing some cognitive dissonance this is part of the reason. Still, this model does help one think about scattering strength without resorting to deeper and much more complicated mathematics, such as the Born approximation and Banach's fixed point theorem.

The Depth of Focus

The depth of focus of the microscope is also important for determining the modeling approach.

Consider the following ratio between the object's axial extent and the microscope's depth of focus, or

$$ \frac{L}{\text{DOF}} $$

The depth of focus is the axial range within which an object appears in focus. When the ratio is significantly less than 1, the object fits entirely within the depth of focus and appears to be effectively two dimensional. When it is about 1 or greater, some sections of the object will appear in focus whereas others will be out-of-focus. The light originating from out-of-focus sections will contribute a nonzero background to the image of the in focus section.

There is no hard cutoff value for this ratio that I am aware of that determines when an object is effectively 2D. Most likely it is more like a continuum where models that assume 2D objects become continuously less accurate as the value of the ratio increases.

In terms of microscope optics, the depth of focus (also confusingly called the depth of field) depends, among other things, on the numerical aperture (NA) of the objective. A higher NA leads to a smaller depth of focus.

An interesting situation arises when the thickness of an object, such as a cell, varies considerably across its cross section. As illustrated below, a cell might be approximately 2D everywhere when imaged with a 0.25 NA objective, which has an approximately \( 10 \, \mu m \) depth of focus. On the other hand, only the contents within its periphery would be 2D with a 0.75 NA objective possessing a \( 1 \, \mu m \) depth of focus. Significant portions of the nucleus would be out-of-focus.
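For a rough sense of these numbers, the common approximation \( \text{DOF} \approx n \lambda / \text{NA}^2 \) reproduces the values quoted above; the wavelength and medium below are my assumptions, and the exact prefactor depends on the definition used.

# Rough depth-of-focus estimate using DOF ~ n * lambda / NA**2.
wavelength_um = 0.55   # assumed mid-visible wavelength
n_medium = 1.0         # imaging in air

for NA in (0.25, 0.75):
    dof = n_medium * wavelength_um / NA**2
    print(f"NA = {NA}: DOF ~ {dof:.1f} um")
# NA = 0.25: DOF ~ 8.8 um  (close to the ~10 um quoted above)
# NA = 0.75: DOF ~ 1.0 um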

Object Modeling Approaches

So what are the modeling approaches for the object? I came up with the following decision tree to help explain the process of choosing one, which is further described below.

Strongly Scattering Objects

This is the easiest case. We're probably not going to be doing microscopy for such samples, so we don't even try to model it.

There are techniques other than brightfield microscopy for studying multiply scattering samples, but they are outside the scope of this discussion.

Weakly Scattering Objects

Here we need to know whether the object is larger or smaller than the microscope depth of focus.

Objects Smaller than the Depth of Focus

If the object is smaller than the depth of focus, we can model the sample as a complex 2D transmission mask:

$$ t( x, y) \approx e^{ j k_0 \int \left[ n( x, y, z) - n_0 \right] \, dz } $$

where \( k_0 \) is the free space wavenumber and \( n_0 \) is the refractive index of medium surrounding the object.

This approach effectively projects the small refractive index differences onto a plane transverse to the z-axis. It can be derived from the Helmholtz equation by ignoring transverse gradients of the field's amplitude and approximating the refractive index term, which is quadratic, as a linear function in \( n \).
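As a sketch of how this projection might look in code (the grid, index values, and sampling step below are all hypothetical):

import numpy as np

def transmission_mask(n, n0, dz, wavelength):
    """Project a weak 3D index distribution n(x, y, z) onto a 2D complex mask."""
    k0 = 2 * np.pi / wavelength
    opd = np.sum(n - n0, axis=2) * dz          # integral of (n - n0) over z
    return np.exp(1j * k0 * opd)

# Example: a water-like background with a small region of higher index
n = np.full((64, 64, 32), 1.33)
n[24:40, 24:40, 12:20] = 1.37
t = transmission_mask(n, n0=1.33, dz=0.1e-6, wavelength=0.55e-6)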

Objects Larger than the Depth of Focus

One approach to this case is to divide the sample into thin, independent slices. Each slice is modeled as previously described. The final image is obtained by propagating the field from each slice with various degrees of defocus to the image plane, summing the propagated fields, and computing the intensity.

Of course, the slices are really only independent if the scattering is extremely weak. If this is not the case, we could use multi-slice modeling. In this approach, the sample is again divided into thin sequential slices. The difference is that the light diffracted from the first plane serves as the input to the second plane, which diffracts and serves as the input to the third plane, and so on.
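Below is a minimal sketch of the multi-slice idea using angular spectrum propagation between thin phase screens. It is not a validated implementation; the sampling, wavelength, and slice structure are left to the caller, and evanescent components are simply discarded.

import numpy as np

def angular_spectrum_propagate(u, dz, wavelength, dx):
    """Propagate a 2D field u over a distance dz with the angular spectrum method."""
    ny, nx = u.shape
    fx = np.fft.fftfreq(nx, d=dx)
    fy = np.fft.fftfreq(ny, d=dx)
    FX, FY = np.meshgrid(fx, fy)
    arg = (1.0 / wavelength)**2 - FX**2 - FY**2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    H = np.exp(1j * kz * dz) * (arg > 0)       # drop evanescent components
    return np.fft.ifft2(np.fft.fft2(u) * H)

def multislice(u_in, slices, n0, dz, wavelength, dx):
    """Pass the field through thin index slices, propagating between them."""
    k0 = 2 * np.pi / wavelength
    u = u_in
    for n_slice in slices:                     # each slice: a 2D map of n(x, y)
        u = u * np.exp(1j * k0 * (n_slice - n0) * dz)   # thin phase screen
        u = angular_spectrum_propagate(u, dz, wavelength, dx)
    return u

Dropping the inter-slice propagation step (or summing defocused fields from independent slices) recovers the first, extremely-weak-scattering approach described above.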

Moderately Scattering Samples

I think things get really interesting here, but likely this is best left for later as I don't fully know what options are available.

I will however say that microscopy approaches with optical sectioning, such as confocal and lightsheet, might be of some use.

In the next post I plan on discussing a concept that determines the resolving power of a 3D object under a microscope: the 3D aperture.


  1. I think a better analogy would be imaging the 3D structure of the body with infrared light because it is the body itself that scatters the light, unlike fog which only obscures the object that we really care about. This example is more technical and has other difficulties, though, so I think the fog analogy is sufficient. 

The Axis of Symmetry of a Parabola and the Eigenvectors of its Matrix Representation

The last few months I have been working on developing parametric representations of conic sections so that any arbitrary conic can be drawn to the computer screen. Besides being a fun intellectual exercise, I expect that the performance of the parametric approach should be quite good relative to any iterative method for drawing implicit representations of arbitrary conic curves because no iteration is required.

The parabola in particular has proven to be trickier than I had anticipated, and it has forced me to revisit some of its basic properties.

The Setup

Consider the most general form of a conic curve, which is the implicit equation:

$$ Q ( x, y ) = A x^2 + B x y + C y^2 + D x + E y + F = 0 $$

For a parabola, \( B \) is dependent on the coefficients \( A \) and \( C \) via the equation \( B^2 = 4 A C \), so that the implicit equation for a parabola is

$$ Q ( x, y ) = (a x + c y)^2 + D x + E y + F = 0 $$

where \( a^2 = A \) and \( c^2 = C \).

It can be shown that the axis of symmetry of the parabola is the line

$$ a x + c y + \frac{a D + c E}{2 \left( a^2 + c^2 \right)} = 0 $$

The matrix of the quadratic form for the parabola is

$$\begin{eqnarray} A_{33} = \left( \begin{array}{cc} A & B / 2 \\ B / 2 & C \end{array} \right) = \left( \begin{array}{cc} a^2 & ac \\ ac & c^2 \end{array} \right) \end{eqnarray}$$

\( A_{33} \) is singular and has one eigenvalue whose value is 0.

In this post I show that the eigenvector of the matrix of the quadratic form of a parabola with the zero eigenvalue is parallel to the axis of symmetry.

Determine the Eigenvalues of \( A_{33} \)

The characteristic polynomial of the matrix \( A_{33} \) is

$$\begin{eqnarray} \left( a^2 - \lambda \right) \left( c^2 - \lambda \right) - a^2 c^2 &=& 0 \\ a^2 c^2 - \left( a^2 + c^2 \right) \lambda + \lambda^2 - a^2 c^2 &=& 0 \\ - \left( a^2 + c^2 \right) \lambda + \lambda^2 &=& 0 \end{eqnarray}$$

The solutions to the above equation are \( \lambda = \{ 0, \, a^2 + c^2 \} \).

Find the Eigenvector of the Zero Eigenvalue

The system of equations for finding the eigenvectors is

$$\begin{eqnarray} \left( a^2 - \lambda \right) x + acy &=& 0 \\ acx + \left( c^2 - \lambda \right) y &=& 0 \end{eqnarray}$$

Solving the first equation for \( y \) gives

$$ y = - \left( \frac{a^2 - \lambda}{ac} \right) x $$

Substitute in \( \lambda = 0 \) to find

$$ y = - \left( \frac{a}{c} \right) x $$

The eigenvector is thus

$$\begin{pmatrix} 1 \\ -a / c \end{pmatrix}$$

Recall from above that the axis of symmetry is

$$ a x + c y + \frac{a D + c E}{2 \left( a^2 + c^2 \right)} = 0 $$

This is a line of slope \( m = -a / c \) and is therefore parallel to the eigenvector with the zero eigenvalue.
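A quick numerical check of this claim with NumPy (the coefficients \( a \) and \( c \) are arbitrary):

import numpy as np

a, c = 2.0, 3.0                                  # arbitrary parabola coefficients
A33 = np.array([[a**2, a * c],
                [a * c, c**2]])

eigvals, eigvecs = np.linalg.eigh(A33)
v0 = eigvecs[:, np.argmin(np.abs(eigvals))]      # eigenvector of the zero eigenvalue

axis_dir = np.array([1.0, -a / c])               # direction along the axis of symmetry
cross = v0[0] * axis_dir[1] - v0[1] * axis_dir[0]
print(np.isclose(cross, 0.0))                    # True: the two directions are parallel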

The Tangent at the Vertex and the Eigenvector with Non-Zero Eigenvalue

The tangent to the parabola at its vertex is perpendicular to the axis of symmetry and must therefore have a slope of \( c / a \)1.

The eigenvector with eigenvalue \( \lambda = a^2 + c^2 \) of the matrix of the quadratic form is found by substituting this value into the first equation in the system above:

$$\begin{eqnarray} \left( a^2 - \lambda \right) x + ac y &=& 0 \\ -c^2 x + ac y &=& 0 \end{eqnarray}$$

This gives

$$ y = \left( \frac{c}{a} \right) x $$

which is a line whose slope is the negative reciprocal of the slope of the axis of symmetry. The eigenvector with non-zero eigenvalue is therefore parallel to the tangent at the vertex.

For completeness, the eigenvector with non-zero eigenvalue is

$$\begin{pmatrix} 1 \\ c / a \end{pmatrix}$$


  1. Perpendicular lines have slopes whose product is equal to -1. 

Completing the Square and the Normal Form of Quadrics

I am working on rendering cross section views of optical systems for my ray tracer. The problem is one of finding the intersection curve between a plane (the cutting plane) and a quadric surface which represents an interface between two media with different refractive indexes. Quadric surfaces are important primitives for modeling optical interfaces because they represent common surface types in optics, such as spheroids and paraboloids. A pair of quadrics, or a quadric and a plane, models a common lens.

In 3D, the implicit surface equation for a quadric is

$$ A x^2 + B y^2 + C z^2 + D x y + E y z + F x z + G x + H y + I z + J = 0 $$

Any quadric can be reduced to a so-called normal form that identifies its class, i.e. ellipsoid, hyperbolic paraboloid, etc. Except for paraboloids, none of the normal form equations contain linear terms in \( x \), \( y \), or \( z \).

A quadric of revolution occurs when two or more of the parameters of the quadric's normal form are equal, such as \( x^2 / R^2 + y^2 / R^2 + z^2 / R^2 = 1 \), which is the equation for a spheroid with radius parameter \( R \)1. Quadrics of revolution are the surface types most-often encountered in optics2.

The surface sag of a quadric surface is a very important quantity for ray tracing. The sag of a quadric is usually given in terms of the conic constant, \( K \). One obtains the sag by solving the following quadric equation for \( z \):

$$ x^2 + y^2 - 2 R z + ( K + 1 ) z^2 = 0 $$

Here, \( R \) is the radius of curvature of the surface at its apex, \( x = y = z = 0 \).
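As an aside, solving this quadratic for the root nearest the apex gives the familiar sag formula \( z = (x^2 + y^2) / \left[ R \left( 1 + \sqrt{ 1 - (K+1)(x^2+y^2)/R^2 } \right) \right] \). A minimal Python version, with arbitrary example values of \( R \) and \( K \):

import numpy as np

def conic_sag(x, y, R, K):
    """Sag of the conic surface x^2 + y^2 - 2 R z + (K + 1) z^2 = 0,
    taking the root nearest the apex. The rationalized form below also
    covers the paraboloid case K = -1."""
    r2 = x**2 + y**2
    return r2 / (R * (1.0 + np.sqrt(1.0 - (K + 1.0) * r2 / R**2)))

print(conic_sag(5.0, 0.0, R=50.0, K=0.0))   # sphere, R = 50: sag ~ 0.2506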

At this point I asked myself how I could rewrite the above expression in its normal form, and for a while I was unable to do it. After a bit of searching on the internet, I eventually realized that the solution involves completing the square, a topic that was not given much attention during my high school education. After this exercise, I realize now that the purpose of completing the square is essentially to move any linear terms of a quadratic equation into squared parentheses. This allows one to then remove the linear terms entirely by applying a suitable transformation, leaving only quadratic and constant terms.

Converting the Quadric to its Normal Form

The conversion of the above equation proceeds as follows. We first factor out \( ( K + 1 ) \) from the terms involving \( z \).

$$ x^2 + y^2 + ( K + 1 ) \left[ z^2 - \frac{ 2 R z }{ K + 1 }\right] = 0 $$

Next, we "add zero" to the term inside the square brackets by adding \( [ 2 R / 2 ( K + 1 ) ]^2 - [ 2 R / 2 ( K + 1 ) ]^2 = [ R / ( K + 1 ) ]^2 - [ R / ( K + 1 ) ]^2 \):

$$ x^2 + y^2 + ( K + 1 ) \left[ z^2 - \frac{ 2 R z }{ K + 1 } + \left( \frac{ R }{ K + 1 } \right)^2 - \left( \frac{ R }{ K + 1 } \right)^2 \right] = 0 $$

We can understand this a bit more generally by considering the expression \( z^2 - a z \). Here I need to add and subtract \( ( a / 2 )^2 \). The reason is that now we can rewrite the first three terms inside the square brackets as a squared binomial:

$$ x^2 + y^2 + ( K + 1 ) \left[ \left( z - \frac{ R }{ K + 1 } \right)^2 - \left( \frac{ R } { K + 1 } \right)^2 \right] = 0 $$

These last two steps complete the square. To place the equation into its normal form, I apply the Euclidean transformation \( z' = z - \frac{ R }{ K + 1 } \) and carry through the \( K + 1 \).

$$ x^2 + y^2 + ( K + 1 ) z^{ \prime 2 } - R^2 / ( K + 1 ) = 0 $$

The above equation is almost a normal form expression for a quadric. To finish the job, I would need to substitute in a specific value for the conic constant and divide through so that the constant is either -1, 0, or 1.


  1. The coefficients of each term need not be equal in general. 

  2. A cylindrical lens does not actually contain a quadric of revolution, but rather would consist of at least one toroidal surface. These are less common than lenses with spherical profiles, however. 

Why is Camera Read Noise Gaussian Distributed?

As a microscopist I work with very weak light signals, often just tens of photons per camera pixel. The images I record are noisy as a result1. To a good approximation, the value of a pixel is a sum of two random variables describing two different physical processes:

  1. photon shot noise, which is described by a Poisson probability mass function, and
  2. camera read noise, which is described by a Gaussian probability density function.

Read noise has units of electrons, which must be discrete, positive integers. So why is it modeled as a continuous probability density function2?

The Source(s) of Read Noise

Janesick3 defines read noise as "any noise source that is not a function of signal." This means that there is not necessarily one single source of read noise. It is commonly understood that it comes from somewhere in the camera electronics, but "somewhere" need not imply that it is isolated to one location.

The signal from a camera pixel is the number of photoelectrons that were generated inside the pixel. I imagine readout of this signal as a linear path consisting of many steps. The signal might change form along this path, such as going from number of electrons to a voltage. At each step, there is a small probability that some small error is added to (or maybe also removed from?) the signal. The final result is a value that differs randomly from the original signal.

Importantly, I do not think that it matters which physical process each step actually represents; rather there just has to be many of them for this abstraction to be valid.

"But aren't there only a handful of steps?" you might ask. After all, linear models of photon transfer typically consist of a few processes such as detection, amplification, readout, and analog-to-digital conversion. I am not referring to these when I use the term "step." Rather, I am referring to processes that are much more microscopic, such as passage of a signal through a transistor or amplifier chip. At the very least Johnson noise, or random currents induced by thermal motion of the charge carriers, will be present in all of the camera's components.

Read Noise is Gaussian because of the Central Limit Theorem

The reason for my conclusion that I can ignore the details so long as there are many steps is the following:

I can model the error introduced by each step as a random variable. Let's assume that each step is independent of the others. The result of camera readout is then a sum of a large number of independent random variables. The Central Limit Theorem states that the distribution of such a sum tends towards a normal distribution, i.e. a Gaussian, as the number of random variables tends towards infinity. This happens regardless of the distributions of the underlying random variables, provided that they have finite variance.

So read noise can appear to be effectively Gaussian so long as there are many steps along the path of conversion from photoelectrons to pixel values and each step has a chance of introducing an error.
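A quick simulation illustrates the idea. Each readout "step" below adds a small error drawn from a decidedly non-Gaussian (uniform) distribution; the step count and scale are arbitrary.

import numpy as np

rng = np.random.default_rng(0)

n_steps = 200          # number of microscopic "steps" in the readout chain (arbitrary)
n_pixels = 100_000     # number of simulated pixel readouts

# Each step adds a small, uniformly distributed error; the total per pixel is their sum.
errors = rng.uniform(-0.5, 0.5, size=(n_pixels, n_steps))
read_noise = errors.sum(axis=1)

# A histogram of read_noise is visually indistinguishable from a Gaussian
# with this mean and standard deviation.
print(read_noise.mean(), read_noise.std())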

Sums of Discrete Random Variables

I encountered one conceptual difficulty here: the sum of discrete random variables is still discrete. If I have several variables that produce only integers, their sum is still an integer. I cannot get, say, 3.14159 as a result. Does the Gaussian approximation, which is for continuous random variables, still apply in this case?

This question is relevant because the signal in a camera is transformed between discrete and continuous representations at least twice: from electrons to voltage and from voltage to analog-to-digital units (ADUs).

Let's say that I have a discrete random variable that can assume values of 0 or 1, and the probability that the value is 1 is denoted \( p \). This is known as a Bernoulli trial. Now let's say that I have a large number \( n \) of Bernoulli trials. The sum of these \( n \) Bernoulli trials has a binomial distribution, which is well known to be approximated by a Gaussian when certain conditions are met, including large \( n \)4. So a sum of a large number of discrete random variables can have a probability distribution function that is approximated as a Gaussian.

This does not mean that the sum of discrete random variables can take on continuous values. Rather, the probability associated with any one output value can be estimated by a Gaussian probability density function.

But how exactly can I use a continuous distribution to approximate a discrete one? After all, if the random variable \( Y \) is a continuous, Gaussian random variable, then \(P (Y = a) = 0 \) for all values of \( a \). To get a non-zero probability from a probability density function, I need to integrate it over some interval of its domain. I can therefore integrate the Gaussian in a small interval around each possible value of the discrete random variable, and then associate this integrated area with the probability of obtaining that discrete value. This is called a continuity correction.

Example of a Continuity Correction

As a very simple example, consider a discrete random variable \( X \) that is approximated by a Gaussian continuous random variable \( Y \). The probability of getting a discrete value 5 is \( P (X = 5) \). The Gaussian approximation is \( P ( 4.5 \lt Y \lt 5.5 ) \), i.e. I integrate the Gaussian from 4.5 to 5.5 to compute the approximate probability of getting the discrete value 5.
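With a binomial distribution standing in for the discrete variable, the continuity correction is easy to check numerically; the parameters below are arbitrary.

from scipy.stats import binom, norm

n, p = 1000, 0.3                                 # arbitrary binomial parameters
mu, sigma = n * p, (n * p * (1 - p)) ** 0.5      # matching Gaussian mean and std. dev.

k = 305
exact = binom.pmf(k, n, p)
approx = norm.cdf(k + 0.5, mu, sigma) - norm.cdf(k - 0.5, mu, sigma)
print(exact, approx)                             # the two values agree closely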


  1. I wrote a blog post about this a while back: https://kmdouglass.github.io/posts/modeling-noise-for-image-simulations/ 

  2. This is often asserted without justification. See for example Janesick, Photon Transfer, page 34. 

  3. https://doi.org/10.1117/3.725073 

  4. https://en.wikipedia.org/wiki/Binomial_distribution#Normal_approximation 

3D Sequential Optical System Layouts

I am working on a new feature in my ray tracer that will allow users to lay out sequential optical systems in 3D. This is forcing me to think carefully about 3D rigid body transformations in a level of detail that I have never before considered.

In this post I walk through the mathematics for modeling a pair of flat mirrors that are oriented at different angles. Strictly speaking, the layout can be represented more easily in 2D, but I will treat the problem as if it were the more general 3D case. Emphasis will be placed on specifying rotations in an intuitive manner, which will mean rotations about the optical axis, rather than about a fixed axis in a global reference frame.

The Problem

The problem that I will consider is depicted as follows:

A system of two flat mirrors whose optical axis forms the figure Z.

The system consists of two flat mirrors whose optical axis forms a "figure Z." The normal of the first mirror is at 30 degrees to the axis, and likewise for the second. The optical axis emerges from the second mirror parallel to the axis that was incident on the first.

The questions are:

  1. How do I construct the system without requiring the user to specify the absolute coordinates of the mirror surfaces?
  2. How do I represent the local coordinate reference frames for each mirror surface?
  3. How do I handle transformations between frames?

Ray Tracing Review

As a quick review, the ray tracing algorithm that I implemented was described by Spencer and Murty1. It loosely follows this pseudo-code:

for each surface in system:
    for each ray in ray bundle:
        1. transform the ray coordinates by rotating the reference frame into the local surface frame
        2. find the ray/surface intersection point
        3. propagate the ray to the intersection point
        4. perform bounds checking against the surface
        5. redirect the ray according to the laws of refraction or reflection
        6. transform the ray coordinates by rotating the reference frame back into the global frame

Rotations are performed using 3x3 rotation matrices. Ray/surface intersections are found numerically using the Newton-Raphson method, even for spherical surfaces2. I computed the expressions for the surface sag and normal vectors for conics and flat surfaces by hand and hard-coded them as functions of the intersection point in the local surface reference frame to avoid having to compute them on-the-fly.
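As an illustration of step 2 of the loop, here is a minimal sketch of a Newton-Raphson ray/conic intersection in a surface's local frame. This is not the code from my ray tracer; it uses a numerical derivative where the real implementation uses the hard-coded analytic sag and normal expressions.

import numpy as np

def conic_sag(x, y, R, K):
    """Sag of a conic surface with apex radius of curvature R and conic constant K."""
    r2 = x**2 + y**2
    return r2 / (R * (1.0 + np.sqrt(1.0 - (K + 1.0) * r2 / R**2)))

def intersect(p0, d, R, K, t0=0.0, tol=1e-10, max_iter=50):
    """Find the ray/surface intersection point in the local frame.
    p0: ray origin, d: unit direction vector (both length-3 arrays)."""
    t = t0
    for _ in range(max_iter):
        p = p0 + t * d
        f = p[2] - conic_sag(p[0], p[1], R, K)    # signed distance along z to the surface
        eps = 1e-6
        p_eps = p0 + (t + eps) * d
        f_eps = p_eps[2] - conic_sag(p_eps[0], p_eps[1], R, K)
        t_next = t - f * eps / (f_eps - f)        # Newton-Raphson update
        if abs(t_next - t) < tol:
            return p0 + t_next * d
        t = t_next
    raise RuntimeError("Ray/surface intersection did not converge")

# A ray parallel to the axis at height 5 mm hitting a sphere with R = 50 mm
print(intersect(np.array([5.0, 0.0, -10.0]), np.array([0.0, 0.0, 1.0]), R=50.0, K=0.0))
# -> approximately [5.  0.  0.2506]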

Looking at the ray trace algorithm, I see three things that are relevant to this discussion:

  1. There are both global and local reference frames
  2. Surfaces are iterated over sequentially
  3. There are rotations, one at the beginning of each loop iteration and one at the end

Let's explore each one individually, starting with the global and local reference frames.

Reference Frames

I will use only right-handed reference frames where positive rotations are in the counterclockwise direction.

The Global Reference Frame

The global reference frame \( \mathbf{G} \) remains fixed. Sometimes it's called the world frame. I denote the coordinate axes of the global frame using \( x \), \( y \), and \( z \).

The global reference frame.

By convention, I put its origin at the first non-object surface; this would be at the first mirror in the system of two mirrors I described above3. I also establish the convention that the optical axis between the object and the first surface is parallel to the global z-axis.

The global frame is important because the orthonormal vectors defining the local and cursor frames (to be explained later) are expressed relative to it.

Local Reference Frames

Each surface \( i \) has a local reference frame \( \mathbf{L}_i \) whose origin lies at the vertex of the surface. Its coordinate axes are denoted \( x_i^{\prime} \), \( y_i^{\prime} \), and \( z_i^{\prime} \). For flat surfaces, I set the \( z_i^{\prime} \) axis perpendicular to the surface.

The local reference frames of the two mirrors.

Notice that the \( x^{\prime} \) axes flip directions when going from mirror 1 to mirror 2. This is done to preserve the right-handedness of the reference frames. More about this will be explained in the next section.

Sequential System Models

Ray tracing programs for optical design are often divided into two categories: sequential and nonsequential. In sequential ray tracers, rays are traced from one surface to another in the sequence for which they are defined. This means that a ray could pass right through a surface if it is not the next surface in the model sequence.

Nonsequential ray tracers do not take account of the order in which surfaces are defined. Rays are fired into the world and they intersect whatever the closest object is in their path. Illumination optics often use nonsequential ray tracing, as do rendering engines for cinema.

My ray tracer is a sequential ray tracer because sequential ray tracing is easier to implement and can be applied to nearly all the use cases that I encounter in the lab.

3D Layouts of Sequential Surfaces

One possibility for laying out sequential surfaces in 3D is to specify the coordinates and orientations of each surface relative to the global frame. This is how one adds surfaces in 3D in the open source Python library Optiland, for example. In practice, I found that I need to have a piece of paper by my side to work out the positions of each surface independently. This option provides maximum flexibility in surface placement.

The other possibility that I considered is to leverage the fact that the surfaces are an ordered sequence, and position them in 3D space along the optical axis. The axis is redirected at reflecting surfaces according to the law of reflection. Furthermore, any tilt or decenter could be specified relative to this axis. I ultimately chose this solution because I felt that it better matches my mental model of sequential optical systems. It also seems to follow more closely what I do in the lab when I build a system, i.e. add components along an axis that bends through 3D space.

The Cursor

I created the idea of the cursor to position sequential surfaces in 3D space. A cursor has a 3D position, \( \vec{ t } \left( s \right) \) that is parameterized over the track length \( s \). \( s \) is negative for the object surface, \( s = 0 \) at the first non-object surface, and achieves its greatest value at the final image plane.

In addition, the cursor has a reference frame attached to it that I denote \( \mathbf{C} \left( s \right) \). The axes of the cursor frame are \( r \), \( u \) and \( f \), which stand for right, up, and forward, respectively. This nearly matches the FRU coordinate system in game engines such as Unreal, except I take the forward direction to represent the optical axis because I would say that this convention is universal in optical design.

The cursor frames at three different positions along the optical axis.

Above I show the cursor frame at three different positions along the optical axis \( s_1 < 0 < s_2 < s_3 \). Refracting surfaces will not change the orientation of the cursor frame, but reflecting surfaces will.

Finally, when \( s \) is exactly equal to a reflecting surface position, I take the orientation of the cursor frame to be the one before reflection. An infinitesimal distance later, the frame reorients by reflecting about the surface normal at the vertex of the surface in its local frame.

Convention for Maintaining Right Handedness upon Reflection

There is an ambiguity that arises in the cursor frame upon reflection that is best illustrated in the example below:

Ambiguity in the cursor frame upon reflection.

In panel a, the cursor is incident upon a mirror with its frame's forward direction antiparallel to the mirror's normal vector. There are two equally valid choices when defining the cursor frame after reflection. In panel b, the cursor frame is rotated about the up direction, whereas in panel c it is rotated about the right direction. This means that there is no fundamentally correct way to position the cursor frame after reflection. We must choose a convention and stick with it.

Reflections of the cursor frame are handled in two steps:

  1. Reflect the frame
  2. Adjust the results to maintain right handedness and address the ambiguity illustrated above

The vector law of reflection is used to compute the new \( \hat{ r } \), \( \hat{ u } \), and \( \hat{ f } \) unit vectors for any general angle of incidence of the cursor frame upon a reflecting surface:

$$\begin{equation} \hat{ f }^{\prime} = \hat{ f } - 2 \left( \hat{ f } \cdot \hat{ n } \right) \hat{ n } \end{equation}$$

where \( \hat{ n } \) is the surface's unit normal vector. The same applies for the right and up unit vectors.

After reflection, I perform a check for right handedness. By convention, I maintain the direction of the up unit vector because many optical systems are laid out in 2D and their elements are rotated about this direction. This convention means that the right unit vector must be flipped:

if cross(right, up) · forward < 0:
    right = -right

The cross product between the right and up directions must point in the forward direction if the system is right handed. "Pointing in the forward direction" means that the dot product of the result with the forward unit vector must be greater than zero. The conditional in the pseudocode above checks whether this condition is violated and flips the right unit vector if necessary.
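Putting the reflection and the handedness check together, a small sketch; the mirror normal below is my assumed orientation for the first mirror in the figure-Z example.

import numpy as np

def reflect(v, n):
    """Vector law of reflection: v' = v - 2 (v . n) n, with n a unit normal."""
    return v - 2.0 * np.dot(v, n) * n

def reflect_cursor_frame(right, up, forward, normal):
    """Reflect all three cursor axes, then restore right-handedness by flipping `right`."""
    r, u, f = reflect(right, normal), reflect(up, normal), reflect(forward, normal)
    if np.dot(np.cross(r, u), f) < 0:       # the reflected frame is left handed
        r = -r
    return r, u, f

# Assumed normal of the first mirror: 30 degrees from the incoming axis
normal = np.array([0.0, np.sin(np.radians(30.0)), -np.cos(np.radians(30.0))])
r, u, f = reflect_cursor_frame(np.array([1.0, 0.0, 0.0]),
                               np.array([0.0, 1.0, 0.0]),
                               np.array([0.0, 0.0, 1.0]),
                               normal)
print(r)   # [-1.  0.  0.]
print(u)   # [0.  0.5        0.8660254]
print(f)   # [0.  0.8660254  -0.5     ]

These reproduce the cursor-frame axes used at the second mirror in the worked example that follows.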

Transformations between Reference Frames

There are two different transformations required by the ray trace algorithm:

  1. From the global frame to a surface local frame
  2. From a surface local frame to the global frame

Because the system is laid out relative to the cursor frame, I need to chain together two rotations, one from the global to the cursor frame, and one from the cursor to the local frame.

Example

Let's say that the second mirror has a diameter of 25.4 mm, and that the mirrors are separated by \( \| \vec{ t } \left( s_2 \right) \| = 100 \, mm \). I want to find the transformation from the global frame coordinates of a point on the bottom edge of the mirror to its local frame coordinates, which should come out to \( y_2^{\prime} = -12.7 \, mm \). The image below illustrates the geometry that will be used for this example.

Example geometry for transforming from the global frame to the local frame via the cursor frame.

From relatively straightforward trigonometry4 we get the global frame coordinates of both \( {}^{\mathbf{G}}\vec{ t } \left( s_2 \right) \) and the point we are trying to find, \( {}^{\mathbf{G}}\vec{ p } \). (Vectors preceded by superscripts with reference frame names indicate the coordinate system they are being referred to.)

$$\begin{eqnarray} {}^{\mathbf{G}}\vec{ t } \left( s_2 \right) = \left( \begin{array}{c} 0 \\ 50 \sqrt{ 3 } \\ -50 \end{array} \right) \end{eqnarray}$$

$$\begin{eqnarray} {}^{\mathbf{G}}\vec{ p } = \left( \begin{array}{c} 0 \\ 43.65 \sqrt{ 3 } \\ -56.35 \end{array} \right) \end{eqnarray}$$

Step 1: Translate from the Global Origin to the Cursor Frame

The first step in computing \( {}^{\mathbf{ C }} \vec{ p } \) is to translate from the origin of the global frame to the position of the cursor.

$$ {}^{\mathbf{G}}\vec{ p } - {}^{\mathbf{G}}\vec{ t } \left( s_2 \right) = (0, -6.35 \sqrt{3}, -6.35)^{ \mathrm{ T }}$$

Step 2: Rotate into the Cursor Frame

A rotation from the global frame into the cursor frame can be achieved by taking the \( \hat{ r } \), \( \hat{ u } \), and \( \hat{ f } \) unit vectors that define the cursor frame in the global coordinate system and making them the columns of a \( 3 \times 3 \) rotation matrix. At the second mirror, this matrix is:

$$\begin{eqnarray} R_{GC} \left( \theta = 30^{ \circ } \right) = \left( \begin{array}{ccc} -1 & 0 & 0 \\ 0 & 1 / 2 & \sqrt{ 3 } / 2 \\ 0 & \sqrt{ 3 } / 2 & -1 / 2 \end{array} \right) \end{eqnarray}$$

If this is not clear, consider that the columns of a rotation matrix represent the basis vectors of the coordinate system after rotation, but expressed in the original (global) frame's coordinate system. Also, from the diagram above, \( \hat{ u } \left( s_2 \right) = ( 0, 1 / 2, \sqrt{ 3 } / 2)^{ \mathrm{ T }} \) and \( \hat{ f } \left( s_2 \right) = ( 0, \sqrt{ 3 } / 2, - 1 / 2)^{ \mathrm{ T }}\), which are the second and third columns of the matrix. (Strictly speaking, a matrix built with the cursor basis vectors as its columns maps cursor-frame coordinates to global-frame coordinates; its transpose, with the basis vectors as rows, performs the global-to-cursor transformation. Because this particular matrix is symmetric, the two coincide here.)

The rotation into the cursor frame is the product between the rotation matrix and the difference \( {}^{\mathbf{G}}\vec{ p } - {}^{\mathbf{G}}\vec{ t } \left( s_2 \right) \):

$$\begin{eqnarray} {}^{\mathbf{ C }} \vec{ p } = R_{GC} \left[ {}^{\mathbf{G}}\vec{ p } - {}^{\mathbf{G}}\vec{ t } \left( s_2 \right) \right] = \left( \begin{array}{ccc} -1 & 0 & 0 \\ 0 & 1 / 2 & \sqrt{ 3 } / 2 \\ 0 & \sqrt{ 3 } / 2 & -1 / 2 \end{array} \right) \left( \begin{array}{c} 0 \\ -6.35 \sqrt{ 3 } \\ -6.35 \end{array} \right) = \left( \begin{array}{c} 0 \\ -6.35 \sqrt{ 3 } \\ -6.35 \end{array} \right) \end{eqnarray}$$

At first, I thought I had made a mistake when I did this calculation because the vector is unchanged after rotation. However, as illustrated below, you can see that the relative lengths of the projections of \( {}^{\mathbf{G}}\vec{ p } - {}^{\mathbf{G}}\vec{ t } \left( s_2 \right) \) onto the \( u \) and \( f \) axes make sense.

A simplified schematic showing the projection of the difference vector onto the -u and -f axes.

As it turns out, I inadvertently chose an eigenvector of the rotation matrix as an example; any general point will in fact change its coordinates when moving from the global to the cursor frame. For example, if we try to rotate a vector that is antiparallel to the global z-axis, i.e. \( {}^{\mathbf{G}}\vec{ p } - {}^{\mathbf{G}}\vec{ t } \left( s_2 \right) = ( 0, 0, -1 )^{ \mathrm{ T } }\), then it will become

$$\begin{eqnarray} R_{GC} \left[ {}^{\mathbf{G}}\vec{ p } - {}^{\mathbf{G}}\vec{ t } \left( s_2 \right) \right] = \left( \begin{array}{ccc} -1 & 0 & 0 \\ 0 & 1 / 2 & \sqrt{ 3 } / 2 \\ 0 & \sqrt{ 3 } / 2 & -1 / 2 \end{array} \right) \left( \begin{array}{c} 0 \\ 0 \\ -1 \end{array} \right) = \left( \begin{array}{c} 0 \\ -0.8660 \\ 0.5 \end{array} \right) \end{eqnarray}$$

in the cursor frame.
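The two products above are easy to verify numerically; this minimal check assumes nothing beyond the matrix and vectors already given.

import numpy as np

R_GC = np.array([[-1.0,              0.0,              0.0],
                 [ 0.0,              0.5,  np.sqrt(3) / 2],
                 [ 0.0,  np.sqrt(3) / 2,            -0.5]])

d = np.array([0.0, -6.35 * np.sqrt(3), -6.35])   # the difference vector from Step 1
print(R_GC @ d)                                  # unchanged (an eigenvector): [0. -10.9985 -6.35]

print(R_GC @ np.array([0.0, 0.0, -1.0]))         # [0. -0.8660254  0.5]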

Step 3: Rotate into the Surface Local Frame

For the final step, I need to compose a rotation matrix from a sequence of three rotations. To do this well, I need to be very clear about what types of rotations I am performing and their sequence.

Active vs. Passive Rotations

The difference between active and passive rotations is illustrated below for a 45 degree rotation about the right axis.

Active vs. passive rotations

Active rotations specify the rotation of a point relative to a fixed reference frame; passive rotations specify the rotation of a reference frame, keeping the point fixed. And pay attention here: the right axis points into the screen, so a positive rotation would be clockwise when viewed from the perspective drawn above.

What are the corresponding rotation matrices? Here, I found that the internet is absolutely littered with wrong answers, including on sites like Wikipedia. I even get different answers from LLMs depending on when I ask. Therefore, I am including them here as a gift to my future self.

The active rotation matrices about the x (right), y (up), and z (forward) axes are:

$$\begin{eqnarray} R_x \left( \theta \right) = \left( \begin{array}{ccc} 1 & 0 & 0 \\ 0 & \cos \theta & - \sin \theta \\ 0 & \sin \theta & \cos \theta \end{array} \right) \end{eqnarray}$$

$$\begin{eqnarray} R_y \left( \psi \right) = \left( \begin{array}{ccc} \cos \psi & 0 & \sin \psi \\ 0 & 1 & 0 \\ - \sin \psi & 0 & \cos \psi \end{array} \right) \end{eqnarray}$$

$$\begin{eqnarray} R_z \left( \phi \right) = \left( \begin{array}{ccc} \cos \phi & - \sin \phi & 0 \\ \sin \phi & \cos \phi & 0 \\ 0 & 0 & 1 \end{array} \right) \end{eqnarray}$$

The passive rotation matrices about the x (right), y (up), and z (forward) axes are:

$$\begin{eqnarray} R_x \left( \theta \right) = \left( \begin{array}{ccc} 1 & 0 & 0 \\ 0 & \cos \theta & \sin \theta \\ 0 & - \sin \theta & \cos \theta \end{array} \right) \end{eqnarray}$$

$$\begin{eqnarray} R_y \left( \psi \right) = \left( \begin{array}{ccc} \cos \psi & 0 & - \sin \psi \\ 0 & 1 & 0 \\ \sin \psi & 0 & \cos \psi \end{array} \right) \end{eqnarray}$$

$$\begin{eqnarray} R_z \left( \phi \right) = \left( \begin{array}{ccc} \cos \phi & \sin \phi & 0 \\ - \sin \phi & \cos \phi & 0 \\ 0 & 0 & 1 \end{array} \right) \end{eqnarray}$$

Notice that all that changes between these two types of rotations is the location of a negative sign on the \( \sin \) terms.

I found that a useful way to remember whether a matrix represents an active or passive rotation is as follows. Take for example the +45 degree rotation of the vector \( ( 0, 0, 1 )^{ \mathrm{ T } } \) about the right direction illustrated above. You can see that an active rotation should result in a negative \( u \) and a positive \( f \) component. This means5:

$$\begin{eqnarray} \left( \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 / \sqrt{ 2 } & - 1 / \sqrt{ 2 } \\ 0 & 1 / \sqrt{ 2 } & 1 / \sqrt{ 2 } \end{array} \right) \left( \begin{array}{c} 0 \\ 0 \\ 1 \end{array} \right) = \left( \begin{array}{c} 0 \\ - 1 / \sqrt{ 2 } \\ 1 / \sqrt{ 2 } \end{array} \right) \end{eqnarray}$$

The passive rotation should result in positive values for both the \( u^{ \prime } \) and \( f^{ \prime } \) components:

$$\begin{eqnarray} \left( \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 / \sqrt{ 2 } & 1 / \sqrt{ 2 } \\ 0 & - 1 / \sqrt{ 2 } & 1 / \sqrt{ 2 } \end{array} \right) \left( \begin{array}{c} 0 \\ 0 \\ 1 \end{array} \right) = \left( \begin{array}{c} 0 \\ 1 / \sqrt{ 2 } \\ 1 / \sqrt{ 2 } \end{array} \right) \end{eqnarray}$$

I can do a similar check for the other directions to verify the other matrices.
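The check is easy to automate; note that the passive matrix is simply the transpose of the active one. A small verification script, assuming nothing beyond the matrices above:

import numpy as np

def Rx_active(theta):
    """Active rotation about the right (x) axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0,   c,  -s],
                     [0.0,   s,   c]])

theta = np.radians(45.0)
v = np.array([0.0, 0.0, 1.0])

print(Rx_active(theta) @ v)      # active:  [0. -0.7071  0.7071]
print(Rx_active(theta).T @ v)    # passive: [0.  0.7071  0.7071]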

Extrinsic vs. Intrinsic Rotations

I found these easier to understand than active and passive rotations. Extrinsic rotations are rotations that are always about a fixed global reference frame. On the other hand, intrinsic rotations are about the intermediate frames that result from a single rotation. So if I rotate about the \( f \) axis, then the \( r \) and \( u \) axes will be rotated, resulting in an intermediate \( r^{ \prime }u^{ \prime }f^{ \prime } \) frame. The next rotation will be about one of these intermediate axes.

The confusing thing about these two types of rotations is the order in which the rotation matrices are applied to a vector. An extrinsic rotation of a vector \( \vec{ v } \) about \( r \), then \( u \), then \( f \) is written as:

$$ R_f R_u R_r \vec{ v} $$

which follows the usual rule that the matrix nearest the vector is applied first. An intrinsic rotation of a vector \( \vec{ v } \) about \( r \), then \( u^{ \prime } \), then \( f^{ \prime \prime } \), on the other hand, is written as:

$$ R_r R_u R_f \vec{ v} $$

So even though the rotation about the right direction is performed first, we multiply the vector first by the rotation matrix about the \( f \) direction in the second intermediate frame.

All of this might seem confusing and lead one to wonder why they would want to use intrinsic rotations, but actually they are much more intuitive than extrinsic rotations and make a lot of sense when laying out an optical system. For example, if I have a two-axis mirror mount and I rotate the mirror about the vertical axis, a horizontal rotation that follows will be about the axis in the newly rotated frame, not the global laboratory frame. In any case, a sequence of three extrinsic rotations and three intrinsic rotations through the same angles will produce the same result so long as the order of the rotation matrices is correct.

Euler Angles and Rotation Sequences

The most important thing I learned about Euler angles is that they are completely meaningless unless you also specify a rotation sequence. Additionally, the internet is full of resources about the distinction between proper and improper Euler angles. The gist of what I learned here is that proper Euler angles are really a distraction to scientists and engineers because they rely on rotation sequences in which one of the axes is used twice. More useful are what aerospace engineers sometimes refer to as the Tait-Bryan angles, which are the rotation angles associated with sequences like \( z-y^{ \prime }-x^{ \prime \prime } \) or \( x-y-z \).

Now, there is one point here that is worth making and that is relevant to optical system layout: rotations about \( f \), the forward direction, are best performed last in the sequence. To understand why, consider a cylindrical lens with its axis parallel to the local \( z' \) direction. If we perform an intrinsic rotation about the cursor's \( f \) direction first and then try to adjust the tip or tilt, we will be doing so about axes that have already been rotated, so that tip and tilt become coupled with respect to the global frame. When aligning such systems, no one expects that rotating a cylindrical lens about its axis will change the way that the tip and tilt adjusters on a lens mount work.

For all these reasons, I choose an intrinsic sequence \(r - u^{ \prime } - f^{{ \prime \prime} } \) of passive rotations with Euler angles \( \theta \), \( \psi \), and \( \phi \), respectively. The corresponding rotation matrix is:

$$\begin{eqnarray} R_{ \mathbf{ CL } } ( \theta, \psi, \phi ) = R_r ( \theta ) R_u ( \psi ) R_f ( \phi ) = \left( \begin{array}{ccc} \cos \phi \cos \psi & \sin \phi \cos \psi & - \sin \psi \\ - \sin \phi \cos \theta + \sin \psi \sin \theta \cos \phi & \sin \phi \sin \psi \sin \theta + \cos \phi \cos \theta & \sin \theta \cos \psi \\ \sin \phi \sin \theta + \sin \psi \cos \phi \cos \theta & \sin \phi \sin \psi \cos \theta - \sin \theta \cos \phi & \cos \psi \cos \theta \end{array} \right) \end{eqnarray}$$

And finally, the transformation of the point on the mirror from the global to the surface local frame is:

$$ {}^{ \mathbf{ L }}\vec{p} = R_{ CL }R_{ GC } \left[ {}^{\mathbf{G}}\vec{ p } - {}^{\mathbf{G}}\vec{ t } \left( s_2 \right) \right] $$
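
As a sanity check on the closed-form matrix for \( R_{ CL } \) above, it can also be built numerically by composing the three passive rotation matrices. A minimal sketch in Rust, with helper names of my own choosing:

// Passive rotations about the r, u, and f axes.
fn rot_r(theta: f64) -> [[f64; 3]; 3] {
    let (s, c) = theta.sin_cos();
    [[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]]
}

fn rot_u(psi: f64) -> [[f64; 3]; 3] {
    let (s, c) = psi.sin_cos();
    [[c, 0.0, -s], [0.0, 1.0, 0.0], [s, 0.0, c]]
}

fn rot_f(phi: f64) -> [[f64; 3]; 3] {
    let (s, c) = phi.sin_cos();
    [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]
}

// Matrix product a * b.
fn mat_mul(a: [[f64; 3]; 3], b: [[f64; 3]; 3]) -> [[f64; 3]; 3] {
    let mut out = [[0.0; 3]; 3];
    for i in 0..3 {
        for j in 0..3 {
            for k in 0..3 {
                out[i][j] += a[i][k] * b[k][j];
            }
        }
    }
    out
}

// R_CL = R_r(theta) * R_u(psi) * R_f(phi) for the intrinsic r-u'-f'' sequence.
fn rot_cl(theta: f64, psi: f64, phi: f64) -> [[f64; 3]; 3] {
    mat_mul(rot_r(theta), mat_mul(rot_u(psi), rot_f(phi)))
}

fn main() {
    // theta = 30 degrees, psi = phi = 0 reproduces the matrix used in the example below.
    let m = rot_cl(30.0_f64.to_radians(), 0.0, 0.0);
    for row in &m {
        println!("{:?}", row);
    }
}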

The Solution to the Example

Does this give the correct result in the above example? Well, the mirror is rotated +30 degrees about the right direction, so the cursor-to-local rotation matrix is:

$$\begin{eqnarray} R_{CL} \left( \theta = 30^{ \circ }, \psi = 0, \phi = 0 \right) = \left( \begin{array}{ccc} 1 & 0 & 0 \\ 0 & \sqrt{ 3 } / 2 & 1 / 2 \\ 0 & - 1 / 2 & \sqrt{ 3 } / 2 \end{array} \right) \end{eqnarray}$$

From earlier, the vector representing the point in the cursor frame is \( ( 0, -6.35 \sqrt{ 3 }, -6.35 )^{ \mathrm{ T } } \). Their product gives the final answer:

$$\begin{eqnarray} {}^{ \mathbf{ L } }\vec{ p } = \left( \begin{array}{ccc} 1 & 0 & 0 \\ 0 & \sqrt{ 3 } / 2 & 1 / 2 \\ 0 & - 1 / 2 & \sqrt{ 3 } / 2 \end{array} \right) \left( \begin{array}{c} 0 \\ -6.35 \sqrt{ 3 } \\ -6.35 \end{array} \right) = \left( \begin{array}{c} 0 \\ -12.7 \\ 0 \end{array} \right) \end{eqnarray}$$

This is exactly as expected, as we wanted to get a point on the bottom of the 25.4 mm diameter mirror in its local frame.

Step 4: Rotate back from the Surface Local to the Global Frame

I need to go back to the global frame at the end of each iteration for a ray trace. Fortunately, it's easy to undo a rotation because the inverse of a rotation matrix is just its transpose. I also need to swap the order of the matrices when taking the inverse, and add back the offset from the origin of the global system. This means:

$${}^{\mathbf{ G } }\vec{ p } = R_{GC}^{ \mathrm{ T} } R_{CL}^{ \mathrm{ T} } {}^{\mathbf{ L } }\vec{ p } + {}^{\mathbf{ G }} \vec{ t } (s_2) $$

I plugged in the numbers in Python and verified that I get the original point back.
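
Here is the same round trip sketched in Rust rather than Python, restricted to the \( R_{CL} \) part of the chain so that it stays self-contained; it reuses the 30-degree example from above, and the helper names are mine:

// Multiply a 3x3 matrix by a 3-element column vector.
fn mat_vec(m: [[f64; 3]; 3], v: [f64; 3]) -> [f64; 3] {
    let mut out = [0.0; 3];
    for i in 0..3 {
        for j in 0..3 {
            out[i] += m[i][j] * v[j];
        }
    }
    out
}

// The inverse of a rotation matrix is its transpose.
fn transpose(m: [[f64; 3]; 3]) -> [[f64; 3]; 3] {
    let mut out = [[0.0; 3]; 3];
    for i in 0..3 {
        for j in 0..3 {
            out[i][j] = m[j][i];
        }
    }
    out
}

fn main() {
    let (s, c) = 30.0_f64.to_radians().sin_cos();

    // R_CL for theta = 30 degrees, psi = phi = 0, as in the example above.
    let r_cl = [[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]];

    // The point expressed in the cursor frame.
    let p_c = [0.0, -6.35 * 3.0_f64.sqrt(), -6.35];

    // Cursor frame -> surface local frame, then back again via the transpose.
    let p_l = mat_vec(r_cl, p_c);
    let p_back = mat_vec(transpose(r_cl), p_l);

    println!("p_l    = {:?}", p_l);    // approximately [0.0, -12.7, 0.0]
    println!("p_back = {:?}", p_back); // recovers p_c up to rounding error
}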

This should be all I need to know to implement 3D sequential optical system layouts in my ray tracer.


  1. G. H. Spencer and M. V. R. K. Murty, "General Ray-Tracing Procedure," J. Opt. Soc. Am. 52, 672-678 (1962). https://doi.org/10.1364/JOSA.52.000672

  2. Ray/surface intersections with spherical surfaces can be found analytically using the quadratic equation with only minor caveats considering stability issues due to floating point arithmetic. This would likely be faster than using Newton-Raphson. However, a general system contains both spherical and non-spherical surfaces, and I was concerned that checking each surface type would result in a performance hit due to branch prediction failures by the processor. I could probably have found a way around this by deciding ahead of time which algorithm to use to determine the intersection for each surface before entering the main ray tracing loop, but during initial development I decided to just use Newton-Raphson for everything because doing so resulted in very simple code. (Thanks to Andy York for telling me about the numerical instabilities when using the quadratic equation. See Chapter 7 here: https://www.realtimerendering.com/raytracinggems/rtg/index.html.) 

  3. The object plane is the flat surface perpendicular to the optical axis in which the object lies. It is always at surface index 0 in my convention. 

  4. Since the only angles involved are \( 30^{ \circ} \) and \( 60^{ \circ } \), I used a 30-60-90 triangle of lengths 1, \( \sqrt{ 3 } \), and 2, respectively to compute the cosines and sines. 

  5. The cosine and sine of 45 degrees are both \( 1 / \sqrt{ 2 } \). 

  6. Passive rotations result in a rotation of the coordinate axes, keeping a point fixed; active rotations rotate a point about a set of axes. 

A Very Brief Summary of The Analytic Signal in Fourier Optics

The Analytic Signal Representation of a Monochromatic Wave

Monochromatic Scalar Waves

A monochromatic, scalar waveform is described by the expression:

$$ u \left( \mathbf{r}, t\right) = A ( \mathbf{r} ) \cos \left[2 \pi f_0 t + \phi \left( \mathbf{r} \right) \right] $$

  • The signal is real-valued
  • The signal has a known phase for all \( t \)

The Analytic Signal

An analytic signal is a generalization of a phasor. It is used to represent a real-valued signal as a complex exponential or a sum of complex exponentials. When Goodman1 refers to a phasor, he often means the analytic signal. This is made clear in Chapter 6 where he describes the construction of the phasor for a narrowband signal as follows:

  1. compute its Fourier transform
  2. set the positive frequency components to zero
  3. double the amplitudes of the negative frequency components
  4. inverse Fourier transform the resulting one-sided spectrum

Strictly speaking, the analytic signal is obtained by setting the negative frequencies to zero and doubling the positive ones; equivalently, it is the original signal plus \( j \) times its Hilbert transform. However, many engineering fields have adopted the convention used in the recipe above of setting the positive frequencies to zero instead. The two conventions produce results that are complex conjugates of one another, so they are physically equivalent as long as one convention is used consistently.
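
The recipe is easy to check numerically. The sketch below is in Rust to match the rest of the code on this site; it uses a naive \( O(N^2) \) DFT to avoid pulling in an FFT crate, assumes the signal has no DC or Nyquist content, and all of the names are my own. It builds a sampled cosine, zeroes the positive-frequency bins, doubles the negative ones, inverse transforms, and confirms that the real part reproduces the original samples.

use std::f64::consts::PI;

// Naive DFT on (re, im) pairs; sign = -1.0 for the forward transform,
// +1.0 for the inverse (which is also divided by n).
fn dft(x: &[(f64, f64)], sign: f64) -> Vec<(f64, f64)> {
    let n = x.len();
    let scale = if sign > 0.0 { 1.0 / n as f64 } else { 1.0 };
    (0..n)
        .map(|k| {
            let mut acc = (0.0, 0.0);
            for (m, &(re, im)) in x.iter().enumerate() {
                let ang = sign * 2.0 * PI * (k * m) as f64 / n as f64;
                let (s, c) = ang.sin_cos();
                acc.0 += re * c - im * s;
                acc.1 += re * s + im * c;
            }
            (acc.0 * scale, acc.1 * scale)
        })
        .collect()
}

fn main() {
    let n = 64;
    let (k0, phi) = (5.0, 0.8); // carrier bin and phase

    // Step 0: a real cosine sampled over one DFT window.
    let x: Vec<(f64, f64)> = (0..n)
        .map(|m| ((2.0 * PI * k0 * m as f64 / n as f64 + phi).cos(), 0.0))
        .collect();

    // Step 1: Fourier transform.
    let mut spec = dft(&x, -1.0);

    // Steps 2 and 3: zero the positive-frequency bins, double the negative ones.
    for k in 1..=n / 2 {
        spec[k] = (0.0, 0.0);
    }
    for k in (n / 2 + 1)..n {
        let (re, im) = spec[k];
        spec[k] = (2.0 * re, 2.0 * im);
    }

    // Step 4: inverse transform the one-sided spectrum.
    let analytic = dft(&spec, 1.0);

    // The real part of the analytic signal reproduces the original cosine.
    let max_err = x
        .iter()
        .zip(&analytic)
        .map(|(a, b)| (a.0 - b.0).abs())
        .fold(0.0_f64, f64::max);
    println!("max |Re(analytic) - original| = {:.3e}", max_err);
}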

The Fourier transform of \( u \left( \mathbf{r}, t\right) \) is:

$$ \mathcal{F} \left\{ u \left( \mathbf{r}, t\right) \right\} = \frac{A ( \mathbf{r} ) }{2} \left[ e^{j \phi \left( \mathbf{r} \right) } \delta \left( f - f_0 \right) + e^{-j \phi \left( \mathbf{r} \right) } \delta \left( f + f_0 \right) \right] $$

Drop the positive frequency term \( e^{j \phi \left( \mathbf{r} \right) } \delta \left( f - f_0 \right) \) and double the result. This produces:

$$ A ( \mathbf{r} ) e^{-j \phi \left( \mathbf{r} \right) } \delta \left( f + f_0 \right) $$

Let \( U ( \mathbf{r} ) := A ( \mathbf{r} ) e^{-j \phi \left( \mathbf{r} \right) } \). The inverse Fourier transform of this signal is:

$$ \mathcal{F}^{-1} \left\{ U ( \mathbf{r} ) \delta \left( f + f_0 \right) \right\} = U ( \mathbf{r} ) e^{-j 2 \pi f_0 t } $$

We can recover the original field by taking the real part of this expression, which is equivalent to applying Euler's identity and dropping the imaginary part:

$$ u \left( \mathbf{r}, t \right) = \Re \left[ U ( \mathbf{r} ) e^{-j 2 \pi f_0 t} \right] $$

Polychromatic Scalar Waves

To model a polychromatic wave, we integrate over the analytic signals of each spectral component and take the real part of the result:

$$ u \left( \mathbf{r}, t\right) = \Re \left[ \int_{-\infty}^{\infty} \tilde{U} \left( \mathbf{r}, f \right) e^{-j 2 \pi f t} \,df \right] $$

The Narrowband Assumption

We get a useful representation of the expression above if we assume that the bandwidth of the signal is much smaller than its center frequency, \( \Delta f \ll f_0 \):

$$ \int_{-\infty}^{\infty} \tilde{U} \left( \mathbf{r}, f \right) e^{-j 2 \pi f t} \,df = U \left( \mathbf{r}, t \right) e^{-j 2 \pi f_0 t} $$

To better understand the meaning of this assumption, make the substitution \( \nu = f - f_0 \) into the expression on the left hand side:

$$\begin{eqnarray} \int_{-\infty}^{\infty} \tilde{U} \left( \mathbf{r}, f \right) e^{-j 2 \pi f t} \,df &=& \int_{-\infty}^{\infty} \tilde{U} \left( \mathbf{r}, \nu + f_0 \right) e^{-j 2 \pi \left( \nu + f_0 \right) t} \,d\nu \\ &=& e^{-j 2 \pi f_0 t} \int_{-\infty}^{\infty} \tilde{U} \left( \mathbf{r}, \nu + f_0 \right) e^{-j 2 \pi \nu t} \,d\nu \end{eqnarray}$$

Under the narrowband assumption, \( \tilde{U} \) is appreciable only for values of \( \nu \) near zero that are much smaller than \( f_0 \), so the remaining integral varies slowly compared with the phasor oscillating at frequency \( f_0 \). If we define the following function:

$$ U \left( \mathbf{r}, t \right) := \int_{-\infty}^{\infty} \tilde{U} \left( \mathbf{r}, \nu + f_0 \right) e^{-j 2 \pi \nu t} \,d\nu $$

then it will vary slowly with respect to the carrier frequency \( f_0 \).

As a result, under the assumptions of narrowbandedness, we can interpret the complex function \( U \left( \mathbf{r}, t \right) \) as an "envelope" modulating the amplitude of the fast oscillating carrier wave. If the assumption is not valid, then this interpretation fails.

The Slowly Varying Envelope Assumption

It is instructive to reverse our reasoning and see why a slowly-varying envelope implies a narrowband signal. Compute the Fourier transform of the envelope \( U \left( \mathbf{r}, t \right) \), along with the Fourier transform of its time derivative.

The Fourier transform of the envelope (the analysis kernel \( e^{j 2 \pi f t} \) is the one that matches the synthesis convention used above):

$$ \int_{-\infty}^{\infty} U \left( \mathbf{r}, t \right) e^{j 2 \pi f t} \,dt = \tilde{U} \left( \mathbf{r}, f + f_0 \right) $$

The Fourier transform of the derivative of \( U \):

$$\begin{eqnarray} \int_{-\infty}^{\infty} \frac{d}{dt} \left[ U \left( \mathbf{r}, t \right) \right] e^{j 2 \pi f t} \,dt &=& -j 2 \pi f \tilde{U} \left( \mathbf{r}, f + f_0 \right) \end{eqnarray}$$

Now, apply the slowly varying envelope approximation (SVEA) by asserting that the rate of change of \( U \) with respect to time is much less than \( 2 \pi f_0 \) times the value of \( U \), or \( \left| \frac{d}{dt} U \left( \mathbf{r}, t\right) \right| \ll \left| 2 \pi f_0 U \left( \mathbf{r}, t \right) \right| \):

$$\begin{eqnarray} \left| 2 \pi f \tilde{U} \left( \mathbf{r}, f + f_0 \right) \right| &=& \left| \int_{-\infty}^{\infty} \frac{d}{dt} \left[ U \left( \mathbf{r}, t \right) \right] e^{j 2 \pi f t} \,dt \right| \\ &\ll& \left| \int_{-\infty}^{\infty} 2 \pi f_0 U \left( \mathbf{r}, t \right) e^{j 2 \pi f t} \,dt \right| \\ &=& 2 \pi f_0 \left| \tilde{U} \left( \mathbf{r}, f + f_0 \right) \right| \end{eqnarray}$$

This expression means that \( \tilde{U} \left( \mathbf{r}, f + f_0 \right) \) has appreciable magnitude only where \( \left| f \right| \ll f_0 \), i.e. the appreciable frequency components of the envelope \( U \left( \mathbf{r} , t \right) \) are much less than the frequency \( f_0 \)2. Equivalently, the spectrum \( \tilde{U} \left( \mathbf{r}, f \right) \) of the field is concentrated around \( f_0 \) within a bandwidth \( \Delta f \) that is small with respect to \( f_0 \).

Assumptions, not Approximations!

The narrowband and slowly varying envelope assumptions are usually referred to as approximations. This is misleading! The resulting expression for the field is not an approximation at all; instead, under the assumptions of narrowbandedness, we can interpret the complex function \( U \left( \mathbf{r}, t \right) \) as an "envelope" modulating the amplitude of the fast oscillating carrier wave. If the assumption is not valid, then this interpretation is not correct.

Narrowband Polychromatic Waves

In summary, narrowband polychromatic waves with a center frequency \( f_0 \) are modeled as the product of a fast rotating phasor and slowly varying envelope:

$$ u \left( \mathbf{r}, t \right) = \Re \left[ U \left( \mathbf{r}, t \right) e^{-j 2 \pi f_0 t} \right] $$

The amplitude and the phase of the envelope are the amplitude and phase of the real optical wave.

Coherence

While the expression for the analytic signal \( U \left( \mathbf{r}, t \right) \) as an integral over frequency components appears deterministic, the phase relationships between the spectral components are often unknown and vary randomly in time. As a result, the envelope of the optical wave will vary unpredictably and must be analyzed in terms of its statistical properties.

Monochromatic Light is Coherent

Since monochromatic light has only one spectral component by definition, it is completely coherent.

  • I mean monochromatic in the ideal sense, not like how we sometimes describe lasers.
  • Monochromatic waves, like plane waves, cannot exist in real life. The uncertainty principle requires that a monochromatic wave exist for an infinite duration.

  1. Goodman, Joseph W. Introduction to Fourier optics. Roberts and Company publishers (2005). ISBN 978-0974707723. 

  2. https://physics.stackexchange.com/questions/451239/slowly-varying-envelope-approximation-what-does-it-imply 

A Very Brief Summary of Fresnel and Fraunhofer Diffraction Integrals

Fourier Optics is complicated, and though I have internalized its concepts over the years, I often still need to review the specifics of its mathematical models. Unfortunately, my go-to resource for this, Goodman's Fourier Optics1, tends to disperse information across chapters and homework problems. This makes quick review difficult.

Here I condense what I think are the essentials of Fresnel and Fraunhofer diffraction into one blog post.

Starting Point: the Huygens-Fresnel Principle

Ignore Chapter 3 of Goodman; it's largely irrelevant for practical work. The Huygens-Fresnel principle itself is a good intuitive model to start with.

The Model

An opaque screen with a clear aperture \( \Sigma \) is located in the \( z = 0 \) plane with transverse coordinates \( \left( \xi , \eta \right ) \). It is illuminated by a complex-valued scalar field \( U \left( \xi, \eta \right) \). Let \( \vec{r_0} = \left( \xi, \eta, 0 \right) \) be a point in the plane of the aperture and \( \vec{r_1} = \left( x, y, z \right) \) be a point in the observation plane. The Huygens-Fresnel Principle provides the following formula for the diffracted field \( U \left( x, y \right) \) in the plane \( z \):

$$ U \left( x, y; z \right) = \frac{z}{j \lambda} \iint_{\Sigma} U \left( \xi , \eta \right) \frac{\exp \left( j k r_{01} \right)}{r_{01}^2} \, d\xi d\eta $$

with the distance \( r_{01}^2 = \left( x - \xi \right)^2 + \left( y - \eta \right)^2 + z^2 \).

  • We assumed an obliquity factor \( cos \, \theta = z / r_{01}\). The choice of obliquity factor depends on the boundary conditions discussed in Chapter 3, but again this isn't terribly important for practical work.
  • The integral is a sum over secondary spherical wavelets emitted by each point in the aperture and weighted by the incident field and the obliquity factor.
  • The factor \( 1 / j \) means that each secondary wavelet from a point \( \left( \xi, \eta \right) \) is 90 degrees out-of-phase with the incident field at that point.

Approximations used in the Huygens-Fresnel Principle

  1. The electromagnetic field can be approximated as a complex-valued scalar field.
  2. \( r_{01} \gg \lambda \), or the observation screen is many multiples of the wavelength away from the aperture.

The Fresnel Diffraction Integral

The Fresnel Approximation

Rewrite \( r_{01} \) as:

$$ r_{01} = z \sqrt{ 1 + \frac{\left( x - \xi \right)^2 + \left( y - \eta \right)^2}{z^2} } $$

Apply the binomial approximation:

$$ r_{01} \approx z + \frac{\left( x - \xi \right)^2 + \left( y - \eta \right)^2}{2z} $$
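
A commonly quoted sufficient (but not necessary) condition for dropping the next term of this expansion is that the phase it would contribute, \( k \left[ \left( x - \xi \right)^2 + \left( y - \eta \right)^2 \right]^2 / 8 z^3 \), be much smaller than one radian:

$$ z^3 \gg \frac{\pi}{4 \lambda} \left[ \left( x - \xi \right)^2 + \left( y - \eta \right)^2 \right]^2_{\text{max}} $$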

In the Huygens-Fresnel diffraction integral, replace:

  1. \(r_{01}^2 \) in the denominator with \( z^2 \)
  2. \(r_{01}\) in the argument of the exponential with \( z + \frac{\left( x - \xi \right)^2 + \left( y - \eta \right)^2}{2z} \)

The Diffraction Integral: Form 1

Perform the substitutions for \( r_{01} \) into the Huygens-Fresnel formula that were mentioned above to get the first form of the Fresnel diffraction integral:

$$ U \left( x, y; z \right) = \frac{ e^{jkz} }{j \lambda z} \iint_{-\infty}^{\infty} U \left( \xi , \eta \right) \exp \left\{ \frac{jk}{2z} \left[ \left( x - \xi \right)^2 + \left( y - \eta \right)^2 \right] \right\} \,d\xi \,d\eta $$

  • It is space invariant, i.e. it depends only on the differences in coordinates \( \left( x - \xi \right) \) and \( \left( y - \eta \right) \).
  • It represents a convolution of the input field with the kernel \( h \left( x, y \right) = \frac{e^{j k z}}{j \lambda z} \exp \left[ \frac{j k}{2 z} \left( x^2 + y^2 \right) \right] \).

The Diffraction Integral: Form 2

Expand the squared quantities inside the parentheses of Form 1 to get the second form of the integral:

$$ U \left( x, y; z \right) = \frac{ e^{jkz} }{j \lambda z} e^{\frac{j k}{2 z} \left( x^2 + y^2 \right)} \iint_{-\infty}^{\infty} \left[ U \left( \xi , \eta \right) e^{\frac{j k}{2 z} \left( \xi^2 + \eta^2 \right)} \right] e^{-j \frac{2 \pi }{\lambda z} \left( x \xi + y \eta \right) } \,d\xi \,d\eta $$

  • It is proportional to the Fourier transform of the product of the incident field and a parabolic phase curvature \( e^{\frac{j k}{2 z} \left( \xi^2 + \eta^2 \right)} \).

Phasor Conventions

Section 4.2.1 of Goodman is an interesting practical aside about how to identify whether a spherical or parabolic wavefront is converging or diverging based on the sign of its phasor. It is useful for solving the important homework problem 4.16 which concerns the diffraction pattern from an aperture that is illuminated by a converging spherical wave.

Unfortunately, Figure 4.2 does not align well with its description in the text about negative z-values, and it's not clear how the interpretations change for point sources not at \( z = 0 \). I address this below.

  • Let the point of convergence (or center of divergence) of a spherical wave sit on the z-axis at \( z = Z \).
  • The phasor describing the time-dependent part of the field in Goodman's notation is \( e^{-j 2 \pi f t} \).
  • If we move away from the center of the wave such that \( z - Z \) is positive and we encounter wavefronts emitted earlier in time, then \( t \) is decreasing and the argument to the phasor is increasing. The wave is therefore diverging if the argument is positive.
  • If we move away from the center of the wave such that \( z - Z \) is negative and we encounter wavefronts emitted earlier in time, then \( t \) is decreasing and the argument of the phasor must again increase as we move away. But moving away now makes \( z - Z \) more negative, so the argument only increases if the phasor carries a negative sign, i.e. \( e^{-j k \left( z - Z \right)} \). The wave is therefore diverging if the argument is negative.
  • Likewise for converging waves.

To summarize:

  • \( e^{ j k r} \): diverging when \( \left( z - Z \right) \) is positive; converging when \( \left( z - Z \right) \) is negative.
  • \( e^{ -j k r} \): converging when \( \left( z - Z \right) \) is positive; diverging when \( \left( z - Z \right) \) is negative.

The Fraunhofer Diffraction Integral

The Fraunhofer Approximation

Assume we are so far from the screen that the quadratic phasor inside the diffraction integral is effectively flat. This means:

$$ z \gg \frac{k \left( \xi^2 + \eta^2 \right)_{\text{max}}}{2} $$

The Diffraction Integral

Applying the approximation above allows us to drop the quadratic phasor inside the Fresnel diffraction integral because it is effectively 1:

$$ U \left( x, y; z \right) = \frac{ e^{jkz} }{j \lambda z} e^{\frac{j k}{2 z} \left( x^2 + y^2 \right)} \iint_{-\infty}^{\infty} U \left( \xi , \eta \right) e^{-j \frac{2 \pi }{\lambda z} \left( x \xi + y \eta \right) } \,d\xi \,d\eta $$

  • Apart from the phase term that depends on \( z \), this expression represents a Fourier transform of the incident field.
  • It appears to break spatial invariance because we no longer depend on differences of coordinates, e.g. \( x - \xi \). However, we can still use the Fresnel transfer function (the Fourier transform of the Fresnel convolution kernel) as the transfer function for Fraunhofer diffraction because if the Fraunhofer approximation is valid, then so is the Fresnel approximation.
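
As a quick numerical illustration of the Fraunhofer result (a one-dimensional analogue with unit-amplitude plane-wave illumination, ignoring the leading phase and \( 1 / j \lambda z \) factors; the parameter values are arbitrary choices of mine), the sketch below evaluates the Fraunhofer integral for a slit of half-width \( a \) by direct summation and compares the magnitude with the familiar \( 2 a \, \mathrm{sinc} \left( 2 a x / \lambda z \right) \) profile:

use std::f64::consts::PI;

fn sinc(u: f64) -> f64 {
    if u.abs() < 1e-12 { 1.0 } else { (PI * u).sin() / (PI * u) }
}

fn main() {
    let lambda = 0.5e-6; // wavelength, m
    let a = 50e-6;       // slit half-width, m
    let z = 1.0;         // propagation distance, m
    let n = 2000;        // samples across the slit

    for i in 0..5 {
        let x = i as f64 * 2.0e-3; // observation point, m

        // Direct Riemann sum of the 1D Fraunhofer integral over the slit.
        let (mut re, mut im) = (0.0, 0.0);
        let d_xi = 2.0 * a / n as f64;
        for m in 0..n {
            let xi = -a + (m as f64 + 0.5) * d_xi;
            let ang = -2.0 * PI * x * xi / (lambda * z);
            re += ang.cos() * d_xi;
            im += ang.sin() * d_xi;
        }
        let numeric = (re * re + im * im).sqrt();

        // Analytic result: |2a sinc(2a x / (lambda z))|.
        let analytic = (2.0 * a * sinc(2.0 * a * x / (lambda * z))).abs();

        println!("x = {:.1} mm: numeric = {:.4e}, analytic = {:.4e}",
                 x * 1e3, numeric, analytic);
    }
}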

Solution to Homework Problem 4.16

Problem 4.16 is important because it is a basis for the development of the frequency analysis of image-forming systems in later chapters of Goodman.

The purpose of 4.16 is to show that the diffraction pattern of an aperture that is illuminated by a spherical converging wave in the Fresnel regime is the Fraunhofer diffraction pattern of the aperture.

Part a: Quadratic phase approximation to the incident wave

Let \( z = 0 \) be the plane of the aperture and \( z = Z \) be the observation plane. Additionally, let \( \left( \xi, \eta \right) \) represent the coordinates in the plane of the aperture, and \( \left( x, y \right) \) the coordinates in the observation plane. The spherical wave that illuminates the aperture is converging to a point \( \vec{r}_P = Y \hat{ \jmath} + Z \hat{k} \) in the observation plane.

To find a quadratic phase approximation for the incident wave, start with its representation as a time-harmonic spherical wave of amplitude \( A \):

$$ U \left( x, y, z \right) = A \frac{e^{j k |\vec{r} - \vec{r}_P|}}{|\vec{r} - \vec{r}_P|} $$

Note that \( \vec{r} - \vec{r}_P = x \hat{\imath} + \left( y - Y \right) \hat{\jmath} + \left( z - Z \right) \hat{k} \). Its magnitude is

$$\begin{eqnarray} | \vec{r} - \vec{r}_P | &=& \sqrt{x^2 + \left( y - Y \right)^2 + \left( z - Z \right)^2} \\ &=& \left( z - Z \right) \sqrt{1 + \frac{x^2 + \left( y - Y \right)^2}{\left( z - Z \right)^2} } \\ &\approx& \left( z - Z \right) + \frac{ x^2 + \left( y - Y \right)^2 }{2 \left( z - Z \right)} \end{eqnarray}$$

At first glance, there's a problem here because allowing \( \left( z - Z \right) \) to be negative will result in a negative value for the magnitude of the vector \( \left( \vec{r} - \vec{r}_P \right) \). However, if we use the summary above and select \( e^{j k r} \) as the phasor for a converging wave when \( \left( z - Z \right) \) is negative, then we will have the correct sign of the argument to the phasor. We do however need to take the absolute value of the \( z - Z \) term in the denominator of the expression for the spherical wave.

Replacing the distance in the phasor's argument with the two lowest order terms in the binomial expansion and the lowest order term in the denominator:

$$ U \left( x, y, z \right) \approx A \frac{e^{j k \left(z - Z \right)} e^{j k \left[ x^2 + \left( y - Y \right)^2 \right] / 2 \left(z - Z \right) }}{\left|z - Z \right|} $$

In the \( z = 0 \) plane, this becomes:

$$ U \left( x, y; z = 0 \right) \approx A \left(x, y \right) \frac{e^{-j k Z} e^{-j k \left[ x^2 + \left( y - Y \right)^2 \right] / 2 Z }}{Z} $$

I moved the finite extent of the aperture into a new function for the amplitude \( A \) above. This function is zero outside the aperture and a constant \( A \) inside it.

Part b: Diffraction pattern in the plane containing \( P \)

Use the second form of the Fresnel diffraction integral to compute the field in the observation plane containing \( P \). To keep the algebra compact, evaluate it along the line \( x = 0 \) (the point \( P \) lies on this line); the extension to general \( x \) is straightforward:

$$\begin{eqnarray} U \left( x = 0, y, z = Z \right) &=& \frac{ e^{jkZ} }{j \lambda Z} e^{\frac{j k y^2}{2 Z}} \iint_{-\infty}^{\infty} \left[ U \left( \xi , \eta ; z = 0 \right) e^{\frac{j k}{2 Z} \left( \xi^2 + \eta^2 \right)} \right] e^{-j \frac{2 \pi }{\lambda Z} y \eta } \,d\xi \,d\eta \\ &\approx& \frac{ e^{jkZ} e^{-jkZ} }{j \lambda Z^2} e^{\frac{j k y^2}{2 Z}} \iint_{-\infty}^{\infty} A \left(\xi, \eta \right) e^{-\frac{j k}{2Z} \left[ \xi^2 + \left( \eta - Y \right)^2 \right]} e^{\frac{j k}{2 Z} \left( \xi^2 + \eta^2 \right)} e^{-j \frac{2 \pi }{\lambda Z} y \eta } \,d\xi \,d\eta \\ &=& \frac{1}{j \lambda Z^2} e^{\frac{j k y^2}{2 Z} } \iint_{-\infty}^{\infty} A \left(\xi, \eta \right) e^{-\frac{j k}{2Z} \left( - 2 \eta Y + Y^2 \right)} e^{-j \frac{2 \pi }{\lambda Z} y \eta } \,d\xi \,d\eta \\ &=& \frac{1}{j \lambda Z^2} e^{\frac{j k \left( y^2 - Y^2 \right)}{2 Z}} \iint_{-\infty}^{\infty} A \left(\xi, \eta \right) e^{\frac{j k \eta Y}{Z}} e^{-j \frac{2 \pi}{\lambda Z} y \eta } \,d\xi \,d\eta \\ &=& \frac{1}{j \lambda Z^2} e^{\frac{j k \left( y^2 - Y^2 \right)}{2 Z}} \iint_{-\infty}^{\infty} A \left(\xi, \eta \right) e^{-j \frac{2 \pi }{\lambda Z} \left( y - Y \right) \eta } \,d\xi \,d\eta \end{eqnarray}$$

The final expression above is, apart from the phase factor in front of the integral, the Fraunhofer diffraction pattern of the aperture evaluated at the spatial frequency \( \left( y - Y \right) / \lambda Z \), i.e. the Fraunhofer pattern of the aperture centered on the point \( P \). The reason that the Fraunhofer diffraction pattern appears as the result is that the converging spherical wavefronts exactly cancel the quadratic phase term inside the Fresnel diffraction formula, leaving a simple Fourier transform of the aperture as a result.


  1. Goodman, Joseph W. Introduction to Fourier optics. Roberts and Company publishers (2005). ISBN 978-0974707723. 

Data Type Alignment for Ray Tracing in Rust

I would like to clean up my 3D ray trace routines for my Rust-based optical design library. The proof of concept (PoC) is finished, and now I need to make the code easier to modify to better support the features that I want to add on the frontend. I suspect that I might be able to make some performance gains as well during refactoring. Towards this end, I want to take a look at my ray data type from the perspective of making it CPU cache friendly.

One of the current obstacles to adding more features to the GUI (for example color selection for different ray bundles) is how I handle individual rays. For the PoC it was fastest to add two additional fields to each ray to track where they come from and whether they are terminated:

struct Vec3 {
    e: [f64; 3],
}

struct Ray {
    pos: Vec3,
    dir: Vec3,
    terminated: bool,
    field_id: usize,
}

A ray is just two 3-element arrays of floats that specify the coordinates of a point on the ray and its direction cosines. I have additionally included a boolean flag to indicate whether the ray has terminated, i.e. gone out-of-bounds of the system or failed to converge during calculation of the intersection point with a surface.

A ray fan is a collection of rays and is specified by a 3-tuple of wavelength, axis, and field; field_id really should not belong to an individual Ray because it can be stored along with the set of all rays for the current ray fan. I probably added it because it was the easiest thing to do at the time to get the application working.
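
To make that concrete, here is one possible shape for the refactored data, sketched with names of my own choosing (RayFan and Axis are not part of the existing code); the per-fan metadata lives next to a flat Vec of rays instead of inside each Ray:

// Hypothetical refactor: per-fan metadata stored once, not per ray.
struct Vec3 {
    e: [f64; 3],
}

struct Ray {
    pos: Vec3,
    dir: Vec3,
    terminated: bool,
}

enum Axis {
    X,
    Y,
}

struct RayFan {
    wavelength: f64,
    axis: Axis,
    field_id: usize,
    rays: Vec<Ray>,
}

fn main() {
    let fan = RayFan {
        wavelength: 0.5876e-6,
        axis: Axis::Y,
        field_id: 0,
        rays: vec![Ray {
            pos: Vec3 { e: [0.0, 0.0, 0.0] },
            dir: Vec3 { e: [0.0, 0.0, 1.0] },
            terminated: false,
        }],
    };
    println!("rays in fan {}: {}", fan.field_id, fan.rays.len());
}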

A deeper look into the Ray struct

Size of a ray

Let's first look to see how much space the Ray struct occupies.

use std::mem;

#[derive(Debug)]
struct Vec3 {
    e: [f64; 3],
}

#[derive(Debug)]
struct Ray {
    pos: Vec3,
    dir: Vec3,
    terminated: bool,
    field_id: usize,
}

fn main() {
    println!("Size of ray: {:?}", mem::size_of::<Ray>());
}

The Ray struct occupies 64 bytes in memory. Does this make sense?

The sizes of the individual fields are:

  • pos: 24 bytes
  • dir: 24 bytes
  • terminated: 1 byte
  • field_id: 8 bytes (on a 64-bit target)

pos and dir are each 24 bytes because each is composed of three 64-bit floats, and 8 bits = 1 byte, so each float occupies 8 bytes. terminated is only one byte because it is a boolean. field_id is a usize, whose size depends on the compilation target; on 64-bit targets, such as x86_64, it is 64 bits = 8 bytes.

Adding the sizes in the above table gives 57 bytes, not 64 bytes as was output from the example code. Why is this?

Alignment and padding

Alignment refers to the layout of a data type in memory and how it is accessed. CPUs read memory in chunks that are equal in size to the word size. Misaligned data is inefficient to access because the CPU requires more cycles than is necessary to fetch the data.

Natural alignment refers to the most efficient alignment of a data type for CPU access. To achieve natural alignment, a compiler can introduce padding between fields of a struct so that the memory address of a field or datatype is a multiple of the field's/data type's alignment.

As an example of misalignment, consider a 4-byte integer that starts at memory address 5. The CPU has 32-bit memory words. To read the data, the CPU must:

  1. read bytes 4-7,
  2. read bytes 8-11,
  3. and combine the relevant parts of both reads to get the 4 bytes, i.e. bytes 5 - 8.

Notice that we must specify the memory word size to determine whether a data type is misaligned.

Here is an important question: why can't the CPU just start reading from memory address 5? The answer, as far as I can tell, is that it just can't. This is not how the CPU, RAM, and memory bus are wired.

Alignment in Rust

Alignment in Rust is defined as follows:

The alignment of a value specifies what addresses are valid to store the value at. A value of alignment n must only be stored at an address that is a multiple of n.

The Rust compiler only guarantees the following when it comes to padding fields in structs:

  1. The fields are properly aligned.
  2. The fields do not overlap.
  3. The alignment of the type is at least the maximum alignment of its fields.

So for my Ray data type, its alignment is 8 because the maximum alignment of its fields is 8 bytes. (pos and dir are composed of 8-byte floating point numbers). The addresses of its fields are:

use std::ptr;

// Vec3 and Ray are as defined above.
fn main() {
    let ray = Ray {
        pos: Vec3 { e: [0.0, 0.0, 0.0] },
        dir: Vec3 { e: [0.0, 0.0, 1.0] },
        terminated: false,
        field_id: 0,
    };

    // addr_of! creates raw pointers to the fields without taking references,
    // so no unsafe block is needed just to print the addresses.
    println!("Address of ray.pos: {:p}", ptr::addr_of!(ray.pos));
    println!("Address of ray.dir: {:p}", ptr::addr_of!(ray.dir));
    println!("Address of ray.terminated: {:p}", ptr::addr_of!(ray.terminated));
    println!("Address of ray.field_id: {:p}", ptr::addr_of!(ray.field_id));
}

I got the following results, which will vary from system-to-system and probably run-to-run:

Address of ray.pos: 0x7fff076c6b50
Address of ray.dir: 0x7fff076c6b68
Address of ray.terminated: 0x7fff076c6b88
Address of ray.field_id: 0x7fff076c6b80

So the pos field comes first at address 0x6b50 (omitting the most significant hexadecimal digits). Then, 24 bytes later, comes dir at address 0x6b68. Note that the difference is hexadecimal 0x18, which is decimal 16 + 8 = 24! So pos really occupies 24 bytes like we previously calculated.

Next comes field_id and not terminated. It is 0x6b80 - 0x6b68 = 0x0018, or 24 bytes after dir like before. So far we have no padding, but the compiler did swap the order of the fields. Finally, terminated sits 8 bytes after field_id because field_id itself occupies 8 bytes. That accounts for 57 bytes, so the Rust compiler must have placed 7 bytes of padding after the terminated field to round the struct size up to a multiple of its 8-byte alignment.
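
A quick way to double-check these numbers without printing addresses is to ask the compiler for the alignments and sizes directly. A minimal sketch using std::mem, with Vec3 and Ray as defined above (the values in the comments are what I see on x86_64):

use std::mem;

struct Vec3 {
    e: [f64; 3],
}

struct Ray {
    pos: Vec3,
    dir: Vec3,
    terminated: bool,
    field_id: usize,
}

fn main() {
    println!("align of Vec3:  {}", mem::align_of::<Vec3>());  // 8
    println!("align of usize: {}", mem::align_of::<usize>()); // 8 on x86_64
    println!("align of bool:  {}", mem::align_of::<bool>());  // 1
    println!("align of Ray:   {}", mem::align_of::<Ray>());   // 8
    println!("size of Ray:    {}", mem::size_of::<Ray>());    // 64
}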

What makes a good data type?

As I mentioned, I already know that field_id shouldn't belong to the ray for reasons related to data access by the programmer. So the reason for removing it from the Ray struct is not related to performance. But what about the terminated bool? Well, in this case, it's resulting in 7 extra bytes of padding for each ray!

use std::mem;

#[derive(Debug)]
struct Vec3 {
    e: [f64; 3],
}

#[derive(Debug)]
struct Ray {
    pos: Vec3,
    dir: Vec3,
    terminated: bool,
}

fn main() {
    let ray = Ray {
        pos: Vec3 { e: [0.0, 0.0, 0.0] },
        dir: Vec3 { e: [0.0, 0.0, 1.0] },
        terminated: false,
    };

    println!("Size of ray: {:?}", mem::size_of::<Ray>());
}

This program prints Size of ray : 56, but 24 + 24 + 1 = 49. In both versions we waste 7 bytes.

Fitting a Ray into the CPU cache

Do I have a good reason to remove terminated from the Ray struct because it wastes space? Consider the following:

We want as many Ray instances as possible to fit within a CPU cache line if we want to maximize performance. (Note that I'm not saying that we necessarily want to maximize performance because that comes with tradeoffs.) Each CPU core on my AMD Ryzen 7 has a 64 kB L1 cache with 64 byte cache lines. This means that I can fit only 1 of the current version of Ray into each cache line for a total of 64 kB / 64 bytes = 1024 rays maximum in the L1 cache of each core. If I remove field_id and terminated, then the size of a ray becomes 48 bytes. Unfortunately, this means that only one Ray instance fits in a cache line, just as before with a 64 byte Ray.

But, if I also reduce my precision to 32-bit floats, then the size of a Ray becomes 6 * 4 = 24 bytes and I have doubled the number of rays that fit in L1 cache.

Now what if I reduced the precision but kept terminated? Then I get 6 * 4 + 1 = 25 bytes, which the compiler pads up to 28 bytes per Ray because the alignment is now only 4 bytes, and I still have 2 rays per cache line.
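
These back-of-the-envelope sizes are easy to verify. The sketch below defines hypothetical f32 variants of the structs (the names are mine) and prints their sizes; note that with only f32 fields the alignment drops to 4, which is where the 28 comes from:

use std::mem;

struct Vec3f32 {
    e: [f32; 3],
}

// f32 precision, no bookkeeping fields: 24 bytes.
struct RayF32 {
    pos: Vec3f32,
    dir: Vec3f32,
}

// f32 precision, keeping the terminated flag: 25 bytes padded up to 28.
struct RayF32Terminated {
    pos: Vec3f32,
    dir: Vec3f32,
    terminated: bool,
}

fn main() {
    println!("RayF32:           {}", mem::size_of::<RayF32>());           // 24
    println!("RayF32Terminated: {}", mem::size_of::<RayF32Terminated>()); // 28
}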

I conclude that there is no reason to remove terminated for performance reasons. Reducing my floating point precision would produce a more noticeable effect on the cache locality of the Ray data type.

Does all of this matter?

My Ryzen 7 laptop can trace about 600 rays through 3 surfaces in 380 microseconds with Firefox, Slack, and Outlook running. At this point, I doubt that crafting my data types for cache friendliness is going to offer a significant payoff. Creating data types that are easy to work with is likely more important.

I do think, however, that it's important to understand these concepts. If I do need to tune the performance in the future, then I know where to look.

An Analog LED Dimmer Circuit

I recently needed to build a circuit to control the brightness of a 4 W LED with a knob. I know basic electronics, and I thought this would be easy. I spoke to a few people I know who are knowledgeable in electronics. I also asked people on Reddit. A lot of people said it would be easy.

As it turns out, it wasn't easy.

The Requirements

My requirements are simple:

  • The brightness should be manually adjustable with a knob from OFF to nearly full ON.
  • The LED will serve as the light source of a microscope trans-illuminator. It should work across a large range of frame acquisition rates (1 Hz to 1 kHz, or exposure times of 1 ms to 1 s).
  • The range of brightnesses should be variable across the dynamic range of the camera, which in my case is 35,000:1, or about 90 dB.

I don't care about efficiency. I don't care about whether I can use a Raspberry Pi to control it. I don't care whether it can be turned on or off with different logic levels. I just want a knob that I can turn to make the LED brighter or dimmer.

In spite of the insistence of several people that I communicated with on the internet, I decided that the second requirement would preclude using pulse width modulation (PWM) to dim the LED. Even when I could convince others that PWM almost always causes aliasing at high frame rates, they tried to find obscure workarounds so I could still use PWM. I really do appreciate all the feedback I got. But I also learned that PWM is the hammer of the electronics world that makes everything look like a nail1.

The Circuit

I reached out to a friend of mine who's a wizard at analog electronics2. He suggested using a MOSFET and varying the gate-source voltage to control the current through the LED.

After a lot of thinking and reading, I arrived at the following circuit:

The LED is an Osram Oslon star LED (LST1-01F05-4070-01) with a maximum current of 1.3 A and a maximum forward voltage of 3.2 V. The MOSFET is an IRF510, whose gate-source threshold voltage is about 3 V.

Here's a brief explanation of what each component does:

  1. Voltage source : This is just a 12 V wall wart.
  2. 200 nF capacitor : This smooths out any fluctuations from the wall wart.
  3. 50 kOhm potentiometer : The "knob." Turning it will vary the gate-source voltage of the MOSFET, which controls how much current flows through the LED.
  4. 50 kOhm resistor : This along with the potentiometer forms a voltage divider to keep the minimum voltage at the MOSFET gate close to where the LED turns on. Without it, you need to rotate the potentiometer almost half of its full range for the LED to turn on.
  5. 300 nF capacitor : A debounce capacitor that smooths out the mechanical irregularities of the pot when it turns.
  6. IRF510 MOSFET : Basically a valve that I can vary continuously to control the LED current by setting the voltage at the gate.
  7. LED : So pretty.
  8. 10 Ohm resistor : This limits the current through the LED. I calculated its value by dividing the maximum supply voltage minus the maximum forward voltage drop across the LED by the maximum current, then rounded up for safety.

$$ R = \frac{V}{I} = \frac{\left( 12 \, V - 3.2 \, V \right) }{1.3 \, A} = 6.8 \Omega $$

The resistor also has to handle a large power dissipation at the maximum current:

$$ P = I^2 R = \left(1.3 \, A \right)^2 \left( 10 \Omega \right) = 16.9 W $$

I decided instead to keep the current to less than 1 Amp so that I could use a 10 Watt resistor that I had.

Power dissipation is also why we don't just use a potentiometer to control the LED current: my pots were only rated up to about 50 mW, whereas I expected that the MOSFET would have to handle loads on the order of Watts due to the high current.
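
As a rough sanity check on that expectation (assuming, purely for illustration, a mid-range operating point of about 0.5 A and a roughly 3 V drop across the LED; these are my guesses, not measurements), the drain-source voltage and MOSFET dissipation would be on the order of:

$$ V_{DS} \approx 12 \, V - 3 \, V - \left( 0.5 \, A \right) \left( 10 \, \Omega \right) = 4 \, V, \qquad P \approx V_{DS} I = \left( 4 \, V \right) \left( 0.5 \, A \right) = 2 \, W $$

which is indeed orders of magnitude more than a small potentiometer is rated to dissipate.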

What I Learned

I really need to study MOSFETS

I still don't really know how to solve circuits with MOSFETs. I arrived at the above circuit largely by trial-and-error on a prototype and by performing naive calculations on the voltage divider that turned out to not be entirely correct. I also expected that the LED would turn on once I passed the MOSFET's gate-source voltage threshold, but this turned out to be off by about 2 or 3 V.

MOSFETs suffer from second order effects

There is currently about half a Volt of hysteresis between the gate-to-ground voltages at which the LED turns on and at which it turns off. According to a helpful person on Reddit, this is likely due to a change in both the LED's forward voltage and the MOSFET's threshold voltage with temperature once the current starts flowing. A possible fix is to swap the order of the LED and the MOSFET so that only the MOSFET will contribute to the hysteresis.

You can always complicate things to make them better

The same person on Reddit also suggested making the circuit robust to temperature variations by adding an opamp to control the MOSFET gate voltage. It would compensate for temperature changes by comparing the voltage set by the potentiometer to the voltage across the 10 Ohm resistor in a closed feedback loop.

Electronics is an art

Yes, electronics is a science, but I would argue that having to mentally juggle second order effects and the fact that experts seem to make an initial design "by feel" are signatures of an art.

It also struck me how nearly every step of the process forced me to take a detour to address something I hadn't at first considered, such as current limits in the wires and large variations in the MOSFET specs.

The next time I need to do something like this, I will expect the problem to take longer to solve than I first anticipate.


  1. To my non-native English speaking readers: I mean that people try to use PWM to solve problems where it's not appropriate. 

  2. And if you play electric guitar, be sure to check out his handmade effects pedals: https://www.volumeandpower.com/