Literature Review: Clarification and Unification of the Obliquity Factor in Diffraction and Scattering Theories

Citation

Yijun Bao and Thomas K. Gaylord, "Clarification and Unification of the Obliquity Factor in Diffraction and Scattering Theories: Discussion," J. Opt. Soc. Am. A 34, 1738-1745 (2017)

Abstract

Two-dimensional (2D) and three-dimensional (3D) diffraction theories form the underlying basis of quantitative phase imaging. This paper reviews how 2D and 3D diffraction theories are developed based on thin and thick object requirements. However, some previously reported work has mixed 2D and 3D theories. This discussion shows that it is possible to enable consistent mixed use of 2D and 3D theories by applying appropriate obliquity factor (OF) modifications. The discussion is concluded with an overall unifying representation for the usage of the OF modifications in 2D and 3D diffraction theories as applied to both thin and thick objects.

Reason for this Review

I often notice that articles concerning quantitative phase imaging (QPI) are unclear about what is meant by 2D and 3D objects. The article helps to clarify this point.

Summary of the Paper

The paper addresses the problem of when to use 2D and 3D diffraction theories in forward models of image formation in a microscope. I will refer to this problem as the choice of dimensionality. In addition, the authors highlight the choice of whether an object may be treated as a "thick" object or a "thin" object, which is not the same as the choice of dimensionality. I call this the choice of object model.

Ultimately, a decision matrix is constructed in which the dimensionality and object model serve as inputs. The output is the form of the obliquity factor, a term found in all diffraction integrals related to the Huygens-Fresnel principle that is used to prevent backward energy flow of the wave field1. The correct forms of the obliquity factor (OF) are then used to allow the application of 2D diffraction theory to thick objects (the so-called Type-1 OF modification) and 3D diffraction theory to thin objects (the Type-2 OF modification).

The unifying theory is meant to address a problem of consistency in the authors' previous work, but I think that it is important on a more fundamental level.

Diffraction Theory

2D Diffraction Theory

2D diffraction theory follows from the well-known developments of Huygens, Fresnel, Kirchhoff, Rayleigh, and Sommerfeld. The integral equation for 2D diffraction is:

$$ u ( \vec{x}, z ) = \frac{1}{j \lambda }\int u_{inc} ( \vec{x}', 0) \, t ( \vec{x}' ) \frac{e^{ j k \sqrt{ | \vec{x} - \vec{x}' |^2 + z^2 } } }{\sqrt{ | \vec{x} - \vec{x}' |^2 + z^2 }} K ( \vec{x}, z; \vec{x}' ) \, d \vec{x}' . $$

In words, the above expression determines the field \( u \) at transverse coordinate \( \vec{x} \) and axial coordinate \( z \) due to an incident field \( u_{inc} \) on a 2D complex transmission screen \( t \) at \(z = 0 \). \( K \) is the obliquity factor, which takes the form \( \cos \theta \) in the first Rayleigh-Sommerfeld diffraction integral, \( \theta \) being the angle between the normal to the screen and the line from the point \( \vec{x}' \) on the screen to the point of observation.

This expression is equivalent to Eq. 3-41 of Goodman2.
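To make the integral concrete, here is a minimal brute-force numerical evaluation in Python. The wavelength, grid, aperture, and observation point are arbitrary illustration values (not anything from the paper), and the obliquity factor is taken as \( K = \cos \theta = z / r \):

import numpy as np

# Brute-force quadrature of the 2D diffraction integral above.
# All parameters are arbitrary, coarsely sampled illustration values.
wavelength = 0.5e-6                      # meters
k = 2 * np.pi / wavelength

N = 256                                  # samples per side of the screen
L = 200e-6                               # physical side length of the screen
xs = np.linspace(-L / 2, L / 2, N)
dx = xs[1] - xs[0]
Xs, Ys = np.meshgrid(xs, xs)

u_inc = np.ones((N, N))                            # unit-amplitude plane wave
t = (Xs**2 + Ys**2 <= (50e-6)**2).astype(float)    # circular aperture

def diffracted_field(x, y, z):
    """Field at the observation point (x, y, z) for z > 0."""
    r = np.sqrt((x - Xs)**2 + (y - Ys)**2 + z**2)
    K = z / r                                      # obliquity factor cos(theta)
    integrand = u_inc * t * np.exp(1j * k * r) / r * K
    return integrand.sum() * dx * dx / (1j * wavelength)

print(abs(diffracted_field(0.0, 0.0, 1e-3)))       # on-axis field 1 mm downstream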

Assumption of the 2D Theory

The most important assumption in 2D diffraction theory is that the object is thin. I think the authors give a somewhat unsatisfactory definition of "thin," stating:

A thin object usually means that the light exits the object approximately at the same transversal coordinate as it enters the object, or the transversal deviation of light can be neglected.

For this definition they cite Goodman2. I think they are specifically referring to this passage in Chapter 5, section 1:

With reference to Appendix B, a lens is said to be a thin lens if a ray entering at coordinates (x, y) on one face exits at approximately the same coordinates on the opposite face, i.e. if there is negligible translation of a ray within the lens. Thus a thin lens simply delays an incident wavefront by an amount proportional to the thickness of the lens at each point.

I find the definition unsatisfactory partly because Goodman is specifically talking about rays and lenses, whereas Bao and Gaylord are talking about waves and inhomogeneous media. At the end of this post I provide a more detailed critique and a possible solution.

Mathematically, this assumption allows us to write the refractive index difference due to the screen as

$$ \Delta n ( \vec{x}, z) = \phi ( \vec{x} ) \delta ( z ) / k $$

where \( \delta \) is the Dirac delta function. The delta function is ultimately the source of the difficulties in reconciling the 2D and 3D theories.

3D Diffraction Theory

By "3D diffraction theory," Bao and Gaylord are referring to what I normally think of as "scattering." In particular, under the first Born approximation:

$$ u ( \vec{r} ) = u_{inc} ( \vec{r} ) + \int u_{inc} ( \vec{r}' ) F ( \vec{ r }' ) G ( \vec{r}, \vec{r}' ) \, d \vec{r}' .$$

This expression states that the total field diffracted (i.e. scattered) by a 3D scattering potential \( F \) is the sum of the incident field and an integral over source terms \( u_{inc} ( \vec{r}') F ( \vec{r}' ) \) multiplied by the Green's function \( G \).

Assumptions of the 3D Theory

In deriving the integral expression above, Born and Wolf3 note in chapter 13 that the gradient of the object's refractive index must be small so that the electric field components can be decoupled, thereby reducing the vector theory to a scalar one. Additionally, the first Born approximation requires that the refractive index contrast of the object be small.

The authors rightly point out that these assumptions are in contradiction with what constitutes a thin object. As a result of the delta function in the expression for \( \Delta n \) of a thin object, the refractive index gradient is huge and 3D diffraction theory should not apply. More specifically, the scalar approximation to the vector Helmholtz equation should be invalid, and it is this approximation that leads to the integral equation of potential scattering.

The authors further state that another consequence of a thin object is that the values of the refractive index become large, which violates the assumption of the first Born approximation. Here I am less certain of their argument, and I think the reason again is due to the nature of the delta function. I think that their argument is this: the refractive index values are large because the delta function technically has an infinite value. But I usually think that the delta function by itself is physically meaningless unless integrated over, which is why I am less certain of the strength of this argument.

An Inconsistency Arises

When applied to a thin object, the two different theories lead to nearly identical expressions for the diffracted field. They differ in that the expression from the 2D theory has an obliquity factor whereas that from the 3D theory does not. The authors point out that this near similarity was discussed in Chapter 1, section 8 of Cowley1, a book originally from the 1970s. The authors clarify that the inconsistency arises from "dissimilar object requirements." I think another way to say this is that the assumptions of the 3D theory appear to be violated when applied to a thin object. We will see later, however, that the assumptions are not actually violated; rather, the discrepancy comes from incorrectly accounting for the phase accumulated by light incident at an angle to the object.

Conversions between 2D and 3D Theories

Application of 2D theory to 3D objects

In an illuminating (pun intended) discussion the authors proceed to demonstrate that 3D diffraction theory actually emerges from the 2D theory by applying multislice theory. In this theory, a 3D object is divided into many, infinitesimally thin slices and the final field is the sum over the 2D diffracted fields from each slice.

The authors state that, for each slice, both the thin object requirement and the small refractive index contrast requirement are satisfied, which allows the 2D and 3D theories to be linked. There is an extremely subtle point here because in the previous section the authors stated that an infinitesimally thin slice violates the assumption of small refractive index contrast values. But this is probably because the phase shift of the 2D screen is finite rather than infinitesimally small. Here, by making the slices of the 3D object extremely thin, we are also inducing an infinitesimal phase shift in each slice. An infinitesimal phase shift is consistent with the small refractive index contrasts required by the first Born approximation.

Thus the 3D theory can be in a sense rescued by proper and repeated application of the 2D theory.

Now, a proper accounting of the phase shift of each slice must include a division by the obliquity factor:

$$ d \phi ( \vec{x}', z' ) = \frac{ k \Delta n (\vec{x}', z' ) dz' } {K (\vec{x}, z; \vec{x}', z' )} .$$

The logic behind this assertion is that illumination of the slice at an angle results in a larger distance covered in traversing the slice relative to normal illumination. In the words of the authors:

The division of the OF enlarges the effective phase, because the light path length in the slice is \( dz' / K \) for off-axis light.

Combining this with the 2D integral theory results in a cancellation of the obliquity factors and a recovery of the 3D integral expression for the diffracted field.

The authors call this modification to the differential phase imparted by each slice a type I OF modification.

Application of 3D theory to 2D objects

3D theory cannot be applied to 2D objects in an attempt to recover 2D diffraction theory because of the "dissimilar object requirements." By "cannot" the authors really mean "can" because in just a few sentences they circumvent the perceived difficulty by modifying the Green's function in the 3D theory by multiplying by the obliquity factor. This leads to what the authors call a type II OF modification, which appears never to have been published before in the literature.

Discussion on the Two Types of OF Modification

Section 3C is probably the most important section of the paper. In it, the authors discuss the nature of the proposed OF modifications and why they are justified. By the term "nature," I mean whether they are physically motivated or just mathematical conveniences.

The application of 2D theory to 3D objects is discussed first. The corresponding OF modification to the phase of each slice is physically motivated by "the longer propagation distances of the off-axis rays." The statement at the beginning of the paragraph "A thick object must satisfy [ \( | n ( \vec{r} ) -1 | \ll 1 \) ], which is the requirement of 3D diffraction theory," is a bit puzzling not because of what it asserts (it's correct), but because it seems to imply that a proper application of 2D theory can recover this requirement without having to assume it. But by asserting that we can arrive at the total diffracted field by summing contributions from each infinitesimal slice, we are assuming exactly the first Born approximation. So it is not surprising that we should recover the 3D diffraction theory from multislice modeling when the slices are independent of one another because we made the first Born approximation in assuming the independence of each slice.

Perhaps the authors' intent was just to point out the physical basis of the OF modification to the phase. In this case, I completely agree with the modification and its physical origin.

Next we arrive at a paragraph about why 2D theory cannot be derived from 3D theory. Here the authors state that the phase induced by a thin 2D object of thickness \( l \) is \( \phi \approx k l \Delta n \). Since both \( l \) and \( \Delta n \) are small, their product is very small and therefore the object has no appreciable effect on the phase. They next claim that the object cannot satisfy the first Born approximation, which is a bit confusing because the discussion in section 2C seemed to suggest that the problem comes from refractive index values that are too large, not too small.

To be honest, I don't quite follow the logic here, but I again agree with the conclusion. The type II OF modification of the Green's function is just a mathematical convenience that produces the correct results when applying 3D theory to thin objects.

What about the Depth of Field?

Overall, this paper has helped me immensely in understanding the subtleties in the different ways to model diffraction and scattering from transparent objects. In particular, I have a much clearer understanding now on how to properly carry out forward modeling of image formation in high numerical aperture (NA) microscopes.

There is however one significant oversight, which is the definition of what it means for an object to be thin. I find their definition unsatisfactory for the following reason.

Consider a flat piece of glass in air and a ray of light incident upon it. If thin means that "light exits the object approximately at the same transversal coordinate as it enters the object," then a physically thick piece of glass can be just as "thin" under a small angle of incidence as a physically thin piece of glass under a large angle of incidence. It's the optical path length that matters, not the physical extent of the object.

This is not so much an objection to their definition, but rather to the word "thin" itself. "Thin" alludes to physical extent, but we must remind ourselves that it is really optical extent. But consider next a ray of light that is normally incident upon the glass. Now it doesn't matter how physically thick or thin the glass is; it always exits at the same transverse coordinate. Is every object under illumination at normal incidence a thin object?

These examples indicate that any definition of "thin" might have to include the angle of the illumination.

Carrying this example further, if we continuously reduce the NA of the system, then we will measure light that is confined to smaller and smaller angles to the axis. But then we will approach the situation described above where even physically thick objects appear thin under illumination at angles close to normal. Since decreasing the NA increases the depth of field, I suspect that a more satisfactory definition of "thin" will ultimately depend on the depth of field as well.

My hypothesis is that an object's "thinness" really depends on two things:

  1. The ratio of its physical extent to the depth of field
  2. The refractive index contrast of the object

We need both of the above quantities to be small for an object to be thin.

Conclusion

In conclusion, I really love this paper. It made me critically examine aspects of QPI that I think are taken for granted by clarifying a common point of confusion. Though I might sound critical in this post, it's only because I wish to see the authors' arguments further improved to the point where no ambiguity remains.


  1. Cowley, John Maxwell. Diffraction physics. Elsevier, 1995. 

  2. Goodman, Joseph W. Introduction to Fourier optics. Roberts and Company publishers, 2005. 

  3. Born, Max, and Emil Wolf. Principles of optics: electromagnetic theory of propagation, interference and diffraction of light. Elsevier, 2013. 

Image Formation in Brightfield Microscopy: Part 1 - Object Models

The mathematical model of image formation of a thick, transilluminated object by a microscope is complicated yet incredibly interesting. It draws from diverse areas such as crystallography, light scattering, and the theory of partial coherence. I find it much more complicated, but yet more satisfying, than the image formation models of fluorescence microscopy.

This is the first in a series of posts in which I discuss the theory of image formation of a 3D object in brightfield microscopy. The series is inspired by the classic 1985 paper Three-dimensional imaging by a microscope by N. Streibl, but includes a few digressions that I think help to understand subtleties in the theory.

The Problem

The problem is simple: we would like a model that predicts (as much as is possible) the image of a microscopic object that is captured by a brightfield microscope.

Note that this is slightly different from the problem of recovering the object, which cannot be done with complete fidelity in brightfield microscopy.

What is the Object Model?

In the theory, the microscopic object, such as a cell or microorganism, is modeled as a complex-valued, three-dimensional function \( n \) of spatial coordinates x, y, and z.

\( n \) is of course the refractive index of the object. The volume outside the object is a uniform medium of refractive index \( n_0 \).

\( n \) is complex to account for

  1. phase shifts imparted onto the light, and
  2. absorption, which is related to the imaginary part of the refractive index.

Model Classification by Scattering Strength, Object Extent, Depth of Focus

The general model of brightfield image formation is too complex to be of any real use for practical work. As a result, we need to make simplifications to make the theory manageable.

I think that the most useful way to understand these simplifications is by specifying three quantities:

  1. The strength of light scattering by the object
  2. The object's axial extent
  3. The depth of focus of the microscope

We will also need to consider properties such as the coherence of the light source, but I will leave this for a later discussion.

The Scattering Strength and the Object's Axial Extent

The degree of light scattering by an object is the most important characteristic in determining the modeling approach because there is little hope of high-fidelity imaging in strongly scattering samples. (Think of trying to see a distant object in a thick fog1.)

Roughly speaking, the scattering strength of an object depends on the degree of refractive index variations within the object and the object's extent. All else being equal, a stronger variation of the refractive index within the object leads to stronger scattering. And as the object becomes larger, a larger fraction of the energy carried by the light is scattered because the light spends more time inside the object.

One simple heuristic for the scattering strength is the mean free path of a photon, \( \ell \), inside the object. This is the average distance a photon travels before being scattered. The ratio of an object's extent \( L \) to \( \ell \) is therefore an indication of how many times a photon is likely to scatter.

For weakly scattering objects,

$$ \frac{L}{\ell} \ll 1 $$

I say that this is a heuristic because there are a few problems with this explanation. Namely,

  1. As far as I am aware, there isn't a clear relationship between \( \ell \) and the gradient of the refractive index.
  2. The mean free path makes sense only if light is modeled as discrete, point-like objects traveling ballistically through the sample, occasionally scattering off of other point-like objects. This is an obvious over-simplification that doesn't reflect the wave nature of light.

Additionally, if the ratio \( L / \ell \) is much less than one, the above picture suggests that light will barely interact with the sample at all, but it can still very much diffract.

We'll be dealing with waves and not photons from this point onward, so if you're already experiencing some cognitive dissonance this is part of the reason. Still, this model does help one think about scattering strength without resorting to deeper and much more complicated mathematics, such as the Born approximation and Banach's fixed point theorem.

The Depth of Focus

The depth of focus of the microscope is also important for determining the modeling approach.

Consider the following ratio between the object's axial extent and the microscope's depth of focus, or

$$ \frac{L}{\text{DOF}} $$

The depth of focus is the axial range within which an object appears in focus. When the ratio is significantly less than 1, the object fits entirely within the depth of focus and appears to be effectively two dimensional. When it is about 1 or greater, some sections of the object will appear in focus whereas others will be out-of-focus. The light originating from out-of-focus sections will contribute a nonzero background to the image of the in focus section.

There is no hard cutoff value for this ratio that I am aware of that determines when an object is effectively 2D. Most likely it is more like a continuum where models that assume 2D objects become continuously less accurate as the value of the ratio increases.

In terms of microscope optics, the depth of focus (also confusingly called the depth of field) depends, among other things, on the numerical aperture (NA) of the objective. A higher NA leads to a smaller depth of focus.

An interesting situation arises when the thickness of an object, such as a cell, varies considerably across its cross section. As illustrated below, a cell might be approximately 2D everywhere when imaged with a 0.25 NA objective, which has an approximately \( 10 \, \mu m \) depth of focus. On the other hand, only the contents within its periphery would be 2D with a 0.75 NA objective possessing a \( 1 \, \mu m \) depth of focus. Significant portions of the nucleus would be out-of-focus.
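For a rough sense of these numbers, the common approximation \( \text{DOF} \approx n \lambda / \text{NA}^2 \) reproduces the values quoted above; the wavelength and medium below are my assumptions, and the exact prefactor depends on the definition used.

# Rough depth-of-focus estimate using DOF ~ n * lambda / NA**2.
wavelength_um = 0.55   # assumed mid-visible wavelength
n_medium = 1.0         # imaging in air

for NA in (0.25, 0.75):
    dof = n_medium * wavelength_um / NA**2
    print(f"NA = {NA}: DOF ~ {dof:.1f} um")
# NA = 0.25: DOF ~ 8.8 um  (close to the ~10 um quoted above)
# NA = 0.75: DOF ~ 1.0 um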

Object Modeling Approaches

So what are the modeling approaches for the object? I came up with the following decision tree to help explain the process of choosing one, which is further described below.

Strongly Scattering Objects

This is the easiest case. We're probably not going to be doing microscopy for such samples, so we don't even try to model it.

There are techniques other than brightfield microscopy for studying multiply scattering samples, but they are outside the scope of this discussion.

Weakly Scattering Objects

Here we need to know whether the object is larger or smaller than the microscope depth of focus.

Objects Smaller than the Depth of Focus

If the object is smaller than the depth of focus, we can model the sample as a complex 2D transmission mask:

$$ t( x, y) \approx e^{ j k_0 \int \left[ n( x, y, z) - n_0 \right] \, dz } $$

where \( k_0 \) is the free space wavenumber and \( n_0 \) is the refractive index of medium surrounding the object.

This approach effectively projects the small refractive index differences onto a plane transverse to the z-axis. It can be derived from the Helmholtz equation by ignoring transverse gradients of the field's amplitude and approximating the refractive index term, which is quadratic, as a linear function in \( n \).
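As a sketch of how this projection might look in code (the grid, index values, and sampling step below are all hypothetical):

import numpy as np

def transmission_mask(n, n0, dz, wavelength):
    """Project a weak 3D index distribution n(x, y, z) onto a 2D complex mask."""
    k0 = 2 * np.pi / wavelength
    opd = np.sum(n - n0, axis=2) * dz          # integral of (n - n0) over z
    return np.exp(1j * k0 * opd)

# Example: a water-like background with a small region of higher index
n = np.full((64, 64, 32), 1.33)
n[24:40, 24:40, 12:20] = 1.37
t = transmission_mask(n, n0=1.33, dz=0.1e-6, wavelength=0.55e-6)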

Objects Larger than the Depth of Focus

One approach to this case is to divide the sample into thin, independent slices. Each slice is modeled as previously described. The final image is obtained by propagating the field from each slice with various degrees of defocus to the image plane, summing the propagated fields, and computing the intensity.

Of course, the slices are really only independent if the scattering is extremely weak. If this is not the case, we could use multi-slice modeling. In this approach, the sample is again divided into thin sequential slices. The difference is that the light diffracted from the first plane serves as the input to the second plane, which diffracts and serves as the input to the third plane, and so on.
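Below is a minimal sketch of the multi-slice idea using angular spectrum propagation between thin phase screens. It is not a validated implementation; the sampling, wavelength, and slice structure are left to the caller, and evanescent components are simply discarded.

import numpy as np

def angular_spectrum_propagate(u, dz, wavelength, dx):
    """Propagate a 2D field u over a distance dz with the angular spectrum method."""
    ny, nx = u.shape
    fx = np.fft.fftfreq(nx, d=dx)
    fy = np.fft.fftfreq(ny, d=dx)
    FX, FY = np.meshgrid(fx, fy)
    arg = (1.0 / wavelength)**2 - FX**2 - FY**2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    H = np.exp(1j * kz * dz) * (arg > 0)       # drop evanescent components
    return np.fft.ifft2(np.fft.fft2(u) * H)

def multislice(u_in, slices, n0, dz, wavelength, dx):
    """Pass the field through thin index slices, propagating between them."""
    k0 = 2 * np.pi / wavelength
    u = u_in
    for n_slice in slices:                     # each slice: a 2D map of n(x, y)
        u = u * np.exp(1j * k0 * (n_slice - n0) * dz)   # thin phase screen
        u = angular_spectrum_propagate(u, dz, wavelength, dx)
    return u

Dropping the inter-slice propagation step (or summing defocused fields from independent slices) recovers the first, extremely-weak-scattering approach described above.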

Moderately Scattering Samples

I think things get really interesting here, but likely this is best left for later as I don't fully know what options are available.

I will however say that microscopy approaches with optical sectioning, such as confocal and lightsheet, might be of some use.

In the next post I plan on discussing a concept that determines the resolving power of a 3D object under a microscope: the 3D aperture.


  1. I think a better analogy would be imaging the 3D structure of the body with infrared light because it is the body itself that scatters the light, unlike fog which only obscures the object that we really care about. This example is more technical and has other difficulties, though, so I think the fog analogy is sufficient. 

The Axis of Symmetry of a Parabola and the Eigenvectors of its Matrix Representation

The last few months I have been working on developing parametric representations of conic sections so that any arbitrary conic can be drawn to the computer screen. Besides being a fun intellectual exercise, I expect that the performance of the parametric approach should be quite good relative to any iterative method for drawing implicit representations of arbitrary conic curves because no iteration is required.

The parabola in particular has proven to be trickier than I had anticipated, and it has forced me to revisit some of its basic properties.

The Setup

Consider the most general form of a conic curve, which is the implicit equation:

$$ Q ( x, y ) = A x^2 + B x y + C y^2 + D x + E y + F = 0 $$

For a parabola, \( B \) is dependent on the coefficients \( A \) and \( C \) via the equation \( B^2 = 4 A C \), so that the implicit equation for a parabola is

$$ Q ( x, y ) = (a x + c y)^2 + D x + E y + F = 0 $$

where \( a^2 = A \) and \( c^2 = C \).

It can be shown that the axis of symmetry of the parabola is the line

$$ a x + c y + \frac{a D + c E}{2 \left( a^2 + c^2 \right)} = 0 $$

The matrix of the quadratic form for the parabola is

$$\begin{eqnarray} A_{33} = \left( \begin{array}{cc} A & B / 2 \\ B / 2 & C \end{array} \right) = \left( \begin{array}{cc} a^2 & ac \\ ac & c^2 \end{array} \right) \end{eqnarray}$$

\( A_{33} \) is singular and has one eigenvalue whose value is 0.

In this post I show that the eigenvector of the matrix of the quadratic form of a parabola with the zero eigenvalue is parallel to the axis of symmetry.

Determine the Eigenvalues of \( A_{33} \)

The characteristic polynomial of the matrix \( A_{33} \) is

$$\begin{eqnarray} \left( a^2 - \lambda \right) \left( c^2 - \lambda \right) - a^2 c^2 &=& 0 \\ a^2 c^2 - \left( a^2 + c^2 \right) \lambda + \lambda^2 - a^2 c^2 &=& 0 \\ - \left( a^2 + c^2 \right) \lambda + \lambda^2 &=& 0 \end{eqnarray}$$

The solutions to the above equation are \( \lambda = \{ 0, \, a^2 + c^2 \} \).

Find the Eigenvector of the Zero Eigenvalue

The system of equations for finding the eigenvectors is

$$\begin{eqnarray} \left( a^2 - \lambda \right) x + acy &=& 0 \\ acx + \left( c^2 - \lambda \right) y &=& 0 \end{eqnarray}$$

Solving the first equation for \( y \) gives

$$ y = - \left( \frac{a^2 - \lambda}{ac} \right) x $$

Substitute in \( \lambda = 0 \) to find

$$ y = - \left( \frac{a}{c} \right) x $$

The eigenvector is thus

$$\begin{pmatrix} 1 \\ -a / c \end{pmatrix}$$

Recall from above that the axis of symmetry is

$$ a x + c y + \frac{a D + c E}{2 \left( a^2 + c^2 \right)} = 0 $$

This is a line of slope \( m = -a / c \) and is therefore parallel to the eigenvector with the zero eigenvalue.
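A quick numerical check of this claim with NumPy (the coefficients \( a \) and \( c \) are arbitrary):

import numpy as np

a, c = 2.0, 3.0                                  # arbitrary parabola coefficients
A33 = np.array([[a**2, a * c],
                [a * c, c**2]])

eigvals, eigvecs = np.linalg.eigh(A33)
v0 = eigvecs[:, np.argmin(np.abs(eigvals))]      # eigenvector of the zero eigenvalue

axis_dir = np.array([1.0, -a / c])               # direction along the axis of symmetry
cross = v0[0] * axis_dir[1] - v0[1] * axis_dir[0]
print(np.isclose(cross, 0.0))                    # True: the two directions are parallel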

The Tangent at the Vertex and the Eigenvector with Non-Zero Eigenvalue

The tangent to the parabola at its vertex is perpendicular to the axis of symmetry and must therefore have a slope of \( c / a \)1.

The eigenvector with eigenvalue \( \lambda = a^2 + c^2 \) of the matrix of the quadratic form is found by substituting this value into the first equation in the system above:

$$\begin{eqnarray} \left( a^2 - \lambda \right) x + ac y &=& 0 \\ -c^2 x + ac y &=& 0 \end{eqnarray}$$

This gives

$$ y = \left( \frac{c}{a} \right) x $$

which is a line whose slope is the negative reciprocal of the slope of the axis of symmetry. The eigenvector with non-zero eigenvalue is therefore parallel to the tangent at the vertex.

For completeness, the eigenvector with non-zero eigenvalue is

$$\begin{pmatrix} 1 \\ c / a \end{pmatrix}$$


  1. Perpendicular lines have slopes whose product is equal to -1. 

Completing the Square and the Normal Form of Quadrics

I am working on rendering cross section views of optical systems for my ray tracer. The problem is one of finding the intersection curve between a plane (the cutting plane) and a quadric surface which represents an interface between two media with different refractive indexes. Quadric surfaces are important primitives for modeling optical interfaces because they represent common surface types in optics, such as spheroids and paraboloids. A pair of quadrics, or a quadric and a plane, models a common lens.

In 3D, the implicit surface equation for a quadric is

$$ A x^2 + B y^2 + C z^2 + D x y + E y z + F x z + G x + H y + I z + J = 0 $$

Any quadric can be reduced to a so-called normal form that identifies its class, i.e. ellipsoid, hyperbolic paraboloid, etc. Except for paraboloids, none of the normal form equations contain linear terms in \( x \), \( y \), or \( z \).

A quadric of revolution occurs when two or more of the parameters of the quadric's normal form are equal, such as \( x^2 / R^2 + y^2 / R^2 + z^2 / R^2 = 1 \), which is the equation for a spheroid with radius parameter \( R \)1. Quadrics of revolution are the surface types most-often encountered in optics2.

The surface sag of a quadric surface is a very important quantity for ray tracing. The sag of a quadric is usually given in terms of the conic constant, \( K \). One obtains the sag by solving the following quadric equation for \( z \):

$$ x^2 + y^2 - 2 R z + ( K + 1 ) z^2 = 0 $$

Here, \( R \) is the radius of curvature of the surface at its apex, \( x = y = z = 0 \).
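As an aside, solving this quadratic for the root nearest the apex gives the familiar sag formula \( z = (x^2 + y^2) / \left[ R \left( 1 + \sqrt{ 1 - (K+1)(x^2+y^2)/R^2 } \right) \right] \). A minimal Python version, with arbitrary example values of \( R \) and \( K \):

import numpy as np

def conic_sag(x, y, R, K):
    """Sag of the conic surface x^2 + y^2 - 2 R z + (K + 1) z^2 = 0,
    taking the root nearest the apex. The rationalized form below also
    covers the paraboloid case K = -1."""
    r2 = x**2 + y**2
    return r2 / (R * (1.0 + np.sqrt(1.0 - (K + 1.0) * r2 / R**2)))

print(conic_sag(5.0, 0.0, R=50.0, K=0.0))   # sphere, R = 50: sag ~ 0.2506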

At this point I asked myself how I could rewrite the above expression in its normal form, and for a while I was unable to do it. After a bit of searching on the internet, I eventually realized that the solution involves completing the square, a topic that was not given much attention during my high school education. After this exercise, I realize now that the purpose of completing the square is essentially to move any linear terms of a quadratic equation into squared parentheses. This allows one to then remove the linear terms entirely by applying a suitable transformation, leaving only quadratic and constant terms.

Converting the Quadric to its Normal Form

The conversion of the above equation proceeds as follows. We first factor out \( ( K + 1 ) \) from the terms involving \( z \).

$$ x^2 + y^2 + ( K + 1 ) \left[ z^2 - \frac{ 2 R z }{ K + 1 }\right] = 0 $$

Next, we "add zero" to the term inside the square brackets by adding \( [ 2 R / 2 ( K + 1 ) ]^2 - [ 2 R / 2 ( K + 1 ) ]^2 = [ R / ( K + 1 ) ]^2 - [ R / ( K + 1 ) ]^2 \):

$$ x^2 + y^2 + ( K + 1 ) \left[ z^2 - \frac{ 2 R z }{ K + 1 } + \left( \frac{ R }{ K + 1 } \right)^2 - \left( \frac{ R }{ K + 1 } \right)^2 \right] = 0 $$

We can understand this a bit more generally by considering the expression \( z^2 - a z \). Here I need to add and subtract \( ( a / 2 )^2 \). The reason is that now we can rewrite the first three terms inside the square brackets as a squared binomial:

$$ x^2 + y^2 + ( K + 1 ) \left[ \left( z - \frac{ R }{ K + 1 } \right)^2 - \left( \frac{ R } { K + 1 } \right)^2 \right] = 0 $$

These last two steps complete the square. To place the equation into its normal form, I apply the Euclidean transformation \( z' = z - \frac{ R }{ K + 1 } \) and carry through the \( K + 1 \).

$$ x^2 + y^2 + ( K + 1 ) z^{ \prime 2 } - R^2 / ( K + 1 ) = 0 $$

The above equation is almost a normal form expression for a quadric. To finish the job, I would need to substitute in a specific value for the conic constant and divide through so that the constant is either -1, 0, or 1.


  1. The coefficients of each term need not be equal in general. 

  2. A cylindrical lens does not actually contain a quadric of revolution, but rather would consist of at least one toroidal surface. These are less common than lenses with spherical profiles, however. 

Why is Camera Read Noise Gaussian Distributed?

As a microscopist I work with very weak light signals, often just tens of photons per camera pixel. The images I record are noisy as a result1. To a good approximation, the value of a pixel is a sum of two random variables describing two different physical processes:

  1. photon shot noise, which is described by a Poisson probability mass function, and
  2. camera read noise, which is described by a Gaussian probability density function.

Read noise has units of electrons, which must be discrete, positive integers. So why is it modeled as a continuous probability density function2?

The Source(s) of Read Noise

Janesick3 defines read noise as "any noise source that is not a function of signal." This means that there is not necessarily one single source of read noise. It is commonly understood that it comes from somewhere in the camera electronics, but "somewhere" need not imply that it is isolated to one location.

The signal from a camera pixel is the number of photoelectrons that were generated inside the pixel. I imagine readout of this signal as a linear path consisting of many steps. The signal might change form along this path, such as going from number of electrons to a voltage. At each step, there is a small probability that some small error is added to (or maybe also removed from?) the signal. The final result is a value that differs randomly from the original signal.

Importantly, I do not think that it matters which physical process each step actually represents; rather there just has to be many of them for this abstraction to be valid.

"But aren't there only a handful of steps?" you might ask. After all, linear models of photon transfer typically consist of a few processes such as detection, amplification, readout, and analog-to-digital conversion. I am not referring to these when I use the term "step." Rather, I am referring to processes that are much more microscopic, such as passage of a signal through a transistor or amplifier chip. At the very least Johnson noise, or random currents induced by thermal motion of the charge carriers, will be present in all of the camera's components.

Read Noise is Gaussian because of the Central Limit Theorem

The reason for my conclusion that I can ignore the details so long as there are many steps is the following:

I can model the error introduced by each step as a random variable. Let's assume that each step is independent of the others. The result of camera readout is then a sum of a large number of independent random variables. The Central Limit Theorem states that the distribution of such a sum tends towards a normal distribution, i.e. a Gaussian, as the number of random variables tends towards infinity. This happens regardless of the distributions of the underlying random variables, provided that they have finite variance.

So read noise can appear to be effectively Gaussian so long as there are many steps along the path of conversion from photoelectrons to pixel values and each step has a chance of introducing an error.
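A quick simulation illustrates the idea. Each readout "step" below adds a small error drawn from a decidedly non-Gaussian (uniform) distribution; the step count and scale are arbitrary.

import numpy as np

rng = np.random.default_rng(0)

n_steps = 200          # number of microscopic "steps" in the readout chain (arbitrary)
n_pixels = 100_000     # number of simulated pixel readouts

# Each step adds a small, uniformly distributed error; the total per pixel is their sum.
errors = rng.uniform(-0.5, 0.5, size=(n_pixels, n_steps))
read_noise = errors.sum(axis=1)

# A histogram of read_noise is visually indistinguishable from a Gaussian
# with this mean and standard deviation.
print(read_noise.mean(), read_noise.std())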

Sums of Discrete Random Variables

I encountered one conceptual difficulty here: the sum of discrete random variables is still discrete. If I have several variables that produce only integers, their sum is still an integer. I cannot get, say, 3.14159 as a result. Does the Gaussian approximation, which is for continuous random variables, still apply in this case?

This question is relevant because the signal in a camera is transformed between discrete and continuous representations at least twice: from electrons to voltage and from voltage to analog-to-digital units (ADUs).

Let's say that I have a discrete random variable that can assume values of 0 or 1, and the probability that the value is 1 is denoted \( p \). This is known as a Bernoulli trial. Now let's say that I have a large number \( n \) of Bernoulli trials. The sum of these \( n \) Bernoulli trials has a binomial distribution, which is well known to be approximated by a Gaussian when certain conditions are met, including large \( n \)4. So a sum of a large number of discrete random variables can have a probability distribution function that is approximated as a Gaussian.

This does not mean that the sum of discrete random variables can take on continuous values. Rather, the probability associated with any one output value can be estimated by a Gaussian probability density function.

But how exactly can I use a continuous distribution to approximate a discrete one? After all, if the random variable \( Y \) is a continuous, Gaussian random variable, then \(P (Y = a) = 0 \) for all values of \( a \). To get a non-zero probability from a probability density function, I need to integrate it over some interval of its domain. I can therefore integrate the Gaussian in a small interval around each possible value of the discrete random variable, and then associate this integrated area with the probability of obtaining that discrete value. This is called a continuity correction.

Example of a Continuity Correction

As a very simple example, consider a discrete random variable \( X \) that is approximated by a Gaussian continuous random variable \( Y \). The probability of getting a discrete value 5 is \( P (X = 5) \). The Gaussian approximation is \( P ( 4.5 \lt Y \lt 5.5 ) \), i.e. I integrate the Gaussian from 4.5 to 5.5 to compute the approximate probability of getting the discrete value 5.
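With a binomial distribution standing in for the discrete variable, the continuity correction is easy to check numerically; the parameters below are arbitrary.

from scipy.stats import binom, norm

n, p = 1000, 0.3                                 # arbitrary binomial parameters
mu, sigma = n * p, (n * p * (1 - p)) ** 0.5      # matching Gaussian mean and std. dev.

k = 305
exact = binom.pmf(k, n, p)
approx = norm.cdf(k + 0.5, mu, sigma) - norm.cdf(k - 0.5, mu, sigma)
print(exact, approx)                             # the two values agree closely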


  1. I wrote a blog post about this a while back: https://kmdouglass.github.io/posts/modeling-noise-for-image-simulations/ 

  2. This is often asserted without justification. See for example Janesick, Photon Transfer, page 34. 

  3. https://doi.org/10.1117/3.725073 

  4. https://en.wikipedia.org/wiki/Binomial_distribution#Normal_approximation 

3D Sequential Optical System Layouts

I am working on a new feature in my ray tracer that will allow users to lay out sequential optical systems in 3D. This is forcing me to think carefully about 3D rigid body transformations in a level of detail that I have never before considered.

In this post I walk through the mathematics for modeling a pair of flat mirrors that are oriented at different angles. Strictly speaking, the layout can be represented more easily in 2D, but I will treat the problem as if it were the more general 3D case. Emphasis will be placed on specifying rotations in an intuitive manner, which will mean rotations about the optical axis, rather than about a fixed axis in a global reference frame.

The Problem

The problem that I will consider is depicted as follows:

A system of two flat mirrors whose optical axis forms the figure Z.

The system consists of two flat mirrors whose optical axis forms a "figure Z." The normal of the first mirror is at 30 degrees to the axis, and likewise for the second. The optical axis emerges from the second mirror parallel to the axis that was incident on the first.

The questions are:

  1. How do I construct the system without requiring the user to specify the absolute coordinates of the mirror surfaces?
  2. How do I represent the local coordinate reference frames for each mirror surface?
  3. How do I handle transformations between frames?

Ray Tracing Review

As a quick review, the ray tracing algorithm that I implemented was described by Spencer and Murty1. It loosely follows this pseudo-code:

for each surface in system:
    for each ray in ray bundle:
        1. transform the ray coordinates by rotating the reference frame into the local surface frame
        2. find the ray/surface intersection point
        3. propagate the ray to the intersection point
        4. perform bounds checking against the surface
        5. redirect the ray according to the laws of refraction or reflection
        6. transform the ray coordinates by rotating the reference frame back into the global frame

Rotations are performed using 3x3 rotation matrices. Ray/surface intersections are found numerically using the Newton-Raphson method, even for spherical surfaces2. I computed the expressions for the surface sag and normal vectors for conics and flat surfaces by hand and hard-coded them as functions of the intersection point in the local surface reference frame to avoid having to compute them on-the-fly.
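As an illustration of step 2 of the loop, here is a minimal sketch of a Newton-Raphson ray/conic intersection in a surface's local frame. This is not the code from my ray tracer; it uses a numerical derivative where the real implementation uses the hard-coded analytic sag and normal expressions.

import numpy as np

def conic_sag(x, y, R, K):
    """Sag of a conic surface with apex radius of curvature R and conic constant K."""
    r2 = x**2 + y**2
    return r2 / (R * (1.0 + np.sqrt(1.0 - (K + 1.0) * r2 / R**2)))

def intersect(p0, d, R, K, t0=0.0, tol=1e-10, max_iter=50):
    """Find the ray/surface intersection point in the local frame.
    p0: ray origin, d: unit direction vector (both length-3 arrays)."""
    t = t0
    for _ in range(max_iter):
        p = p0 + t * d
        f = p[2] - conic_sag(p[0], p[1], R, K)    # signed distance along z to the surface
        eps = 1e-6
        p_eps = p0 + (t + eps) * d
        f_eps = p_eps[2] - conic_sag(p_eps[0], p_eps[1], R, K)
        t_next = t - f * eps / (f_eps - f)        # Newton-Raphson update
        if abs(t_next - t) < tol:
            return p0 + t_next * d
        t = t_next
    raise RuntimeError("Ray/surface intersection did not converge")

# A ray parallel to the axis at height 5 mm hitting a sphere with R = 50 mm
print(intersect(np.array([5.0, 0.0, -10.0]), np.array([0.0, 0.0, 1.0]), R=50.0, K=0.0))
# -> approximately [5.  0.  0.2506]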

Looking at the ray trace algorithm, I see three things that are relevant to this discussion:

  1. There are both global and local reference frames
  2. Surfaces are iterated over sequentially
  3. There are rotations, one at the beginning of each loop iteration and one at the end

Let's explore each one individually, starting with the global and local reference frames.

Reference Frames

I will use only right-handed reference frames where positive rotations are in the counterclockwise direction.

The Global Reference Frame

The global reference frame \( \mathbf{G} \) remains fixed. Sometimes it's called the world frame. I denote the coordinate axes of the global frame using \( x \), \( y \), and \( z \).

The global reference frame.

By convention, I put its origin at the first non-object surface; this would be at the first mirror in the system of two mirrors I described above3. I also establish the convention that the optical axis between the object and the first surface is parallel to the global z-axis.

The global frame is important because the orthonormal vectors defining the local and cursor frames (to be explained later) are expressed relative to it.

Local Reference Frames

Each surface \( i \) has a local reference frame \( \mathbf{L}_i \) whose origin lies at the vertex of the surface. Its coordinate axes are denoted \( x_i^{\prime} \), \( y_i^{\prime} \), and \( z_i^{\prime} \). For flat surfaces, I set the \( z_i^{\prime} \) axis perpendicular to the surface.

The local reference frames of the two mirrors.

Notice that the \( x^{\prime} \) axes flip directions when going from mirror 1 to mirror 2. This is done to preserve the right-handedness of the reference frames. More about this will be explained in the next section.

Sequential System Models

Ray tracing programs for optical design are often divided into two categories: sequential and nonsequential. In sequential ray tracers, rays are traced from one surface to another in the sequence for which they are defined. This means that a ray could pass right through a surface if it is not the next surface in the model sequence.

Nonsequential ray tracers do not take account of the order in which surfaces are defined. Rays are fired into the world and they intersect whatever the closest object is in their path. Illumination optics often use nonsequential ray tracing, as do rendering engines for cinema.

My ray tracer is a sequential ray tracer because sequential ray tracing is easier to implement and can be applied to nearly all the use cases that I encounter in the lab.

3D Layouts of Sequential Surfaces

One possibility for laying out sequential surfaces in 3D is to specify the coordinates and orientations of each surface relative to the global frame. This is how one adds surfaces in 3D in the open source Python library Optiland, for example. In practice, I found that I need to have a piece of paper by my side to work out the positions of each surface independently. This option provides maximum flexibility in surface placement.

The other possibility that I considered is to leverage the fact that the surfaces are an ordered sequence, and position them in 3D space along the optical axis. The axis is redirected at reflecting surfaces according to the law of reflection. Furthermore, any tilt or decenter could be specified relative to this axis. I ultimately chose this solution because I felt that it better matches my mental model of sequential optical systems. It also seems to follow more closely what I do in the lab when I build a system, i.e. add components along an axis that bends through 3D space.

The Cursor

I created the idea of the cursor to position sequential surfaces in 3D space. A cursor has a 3D position, \( \vec{ t } \left( s \right) \) that is parameterized over the track length \( s \). \( s \) is negative for the object surface, \( s = 0 \) at the first non-object surface, and achieves its greatest value at the final image plane.

In addition, the cursor has a reference frame attached to it that I denote \( \mathbf{C} \left( s \right) \). The axes of the cursor frame are \( r \), \( u \) and \( f \), which stand for right, up, and forward, respectively. This nearly matches the FRU coordinate system in game engines such as Unreal, except I take the forward direction to represent the optical axis because I would say that this convention is universal in optical design.

The cursor frames at three different positions along the optical axis.

Above I show the cursor frame at three different positions along the optical axis \( s_1 < 0 < s_2 < s_3 \). Refracting surfaces will not change the orientation of the cursor frame, but reflecting surfaces will.

Finally, when \( s \) is exactly equal to a reflecting surface position, I take the orientation of the cursor frame to be the one before reflection. An infinitesimal distance later, the frame reorients by reflecting about the surface normal at the vertex of the surface in its local frame.

Convention for Maintaining Right Handedness upon Reflection

There is an ambiguity that arises in the cursor frame upon reflection that is best illustrated in the example below:

Ambiguity in the cursor frame upon reflection.

In panel a, the cursor is incident upon a mirror with its frame's forward direction antiparallel to the mirror's normal vector. There are two equally valid choices when defining the cursor frame after reflection. In panel b, the cursor frame is rotated about the up direction, whereas in panel c it is rotated about the right direction. This means that there is no fundamentally correct way to position the cursor frame after reflection. We must choose a convention and stick with it.

Reflections of the cursor frame are handled in two steps:

  1. Reflect the frame
  2. Adjust the results to maintain right handedness and address the ambiguity illustrated above

The vector law of reflection is used to compute the new \( \hat{ r } \), \( \hat{ u } \), and \( \hat{ f } \) unit vectors for any general angle of incidence of the cursor frame upon a reflecting surface:

$$\begin{equation} \hat{ f }^{\prime} = \hat{ f } - 2 \left( \hat{ f } \cdot \hat{ n } \right) \hat{ n } \end{equation}$$

where \( \hat{ n } \) is the surface's unit normal vector. The same applies for the right and up unit vectors.

After reflection, I perform a check for right handedness. By convention, I maintain the direction of the up unit vector because many optical systems are laid out in 2D and their elements are rotated about this direction. This convention means that the right unit vector must be flipped:

if cross(right, up) · forward < 0:
    right = -right

The cross product between the right and up directions must point in the forward direction if the system is right handed. "Pointing in the forward direction" means that the dot product of the result with the forward unit vector must be greater than zero. The conditional in the pseudocode above checks whether this condition is violated and flips the right unit vector if necessary.
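Putting the reflection and the handedness check together, a small sketch; the mirror normal below is my assumed orientation for the first mirror in the figure-Z example.

import numpy as np

def reflect(v, n):
    """Vector law of reflection: v' = v - 2 (v . n) n, with n a unit normal."""
    return v - 2.0 * np.dot(v, n) * n

def reflect_cursor_frame(right, up, forward, normal):
    """Reflect all three cursor axes, then restore right-handedness by flipping `right`."""
    r, u, f = reflect(right, normal), reflect(up, normal), reflect(forward, normal)
    if np.dot(np.cross(r, u), f) < 0:       # the reflected frame is left handed
        r = -r
    return r, u, f

# Assumed normal of the first mirror: 30 degrees from the incoming axis
normal = np.array([0.0, np.sin(np.radians(30.0)), -np.cos(np.radians(30.0))])
r, u, f = reflect_cursor_frame(np.array([1.0, 0.0, 0.0]),
                               np.array([0.0, 1.0, 0.0]),
                               np.array([0.0, 0.0, 1.0]),
                               normal)
print(r)   # [-1.  0.  0.]
print(u)   # [0.  0.5        0.8660254]
print(f)   # [0.  0.8660254  -0.5     ]

These reproduce the cursor-frame axes used at the second mirror in the worked example that follows.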

Transformations between Reference Frames

There are two different transformations required by the ray trace algorithm:

  1. From the global frame to a surface local frame
  2. From a surface local frame to the global frame

Because the system is laid out relative to the cursor frame, I need to chain together two rotations, one from the global to the cursor frame, and one from the cursor to the local frame.

Example

Let's say that the second mirror has a diameter of 25.4 mm, and that the mirrors are separated by \( \| \vec{ t } \left( s_2 \right) \| = 100 \, mm \). I want to find the transformation from the global frame coordinates of a point on the bottom edge of the mirror to its local frame coordinates, which should come out to \( y_2^{\prime} = -12.7 \, mm \). The image below illustrates the geometry that will be used for this example.

Example geometry for transforming from the global frame to the local frame via the cursor frame.

From relatively straightforward trigonometry4 we get the global frame coordinates of both \( {}^{\mathbf{G}}\vec{ t } \left( s_2 \right) \) and the point we are trying to find, \( {}^{\mathbf{G}}\vec{ p } \). (Vectors preceded by superscripts with reference frame names indicate the coordinate system they are being referred to.)

$$\begin{eqnarray} {}^{\mathbf{G}}\vec{ t } \left( s_2 \right) = \left( \begin{array}{c} 0 \\ 50 \sqrt{ 3 } \\ -50 \end{array} \right) \end{eqnarray}$$

$$\begin{eqnarray} {}^{\mathbf{G}}\vec{ p } = \left( \begin{array}{c} 0 \\ 43.65 \sqrt{ 3 } \\ -56.35 \end{array} \right) \end{eqnarray}$$

Step 1: Translate from the Global Origin to the Cursor Frame

The first step in computing \( {}^{\mathbf{ C }} \vec{ p } \) is to translate from the origin of the global frame to the position of the cursor.

$$ {}^{\mathbf{G}}\vec{ p } - {}^{\mathbf{G}}\vec{ t } \left( s_2 \right) = (0, -6.35 \sqrt{3}, -6.35)^{ \mathrm{ T }}$$

Step 2: Rotate into the Cursor Frame

A rotation from the global frame into the cursor frame can be achieved by taking the \( \hat{ r } \), \( \hat{ u } \), and \( \hat{ f } \) unit vectors that define the cursor frame in the global coordinate system and making them the columns of a \( 3 \times 3 \) rotation matrix. At the second mirror, this matrix is:

$$\begin{eqnarray} R_{GC} \left( \theta = 30^{ \circ } \right) = \left( \begin{array}{ccc} -1 & 0 & 0 \\ 0 & 1 / 2 & \sqrt{ 3 } / 2 \\ 0 & \sqrt{ 3 } / 2 & -1 / 2 \end{array} \right) \end{eqnarray}$$

If this is not clear, consider that the columns of a rotation matrix represent the basis vectors of the coordinate system after rotation, but expressed in the original (global) frame's coordinate system. Also, from the diagram above, \( \hat{ u } \left( s_2 \right) = ( 0, 1 / 2, \sqrt{ 3 } / 2)^{ \mathrm{ T }} \) and \( \hat{ f } \left( s_2 \right) = ( 0, \sqrt{ 3 } / 2, - 1 / 2)^{ \mathrm{ T }}\), which are the second and third columns of the matrix. (Strictly speaking, a matrix built with the cursor basis vectors as its columns maps cursor-frame coordinates to global-frame coordinates; its transpose, with the basis vectors as rows, performs the global-to-cursor transformation. Because this particular matrix is symmetric, the two coincide here.)

The rotation into the cursor frame is the product between the rotation matrix and the difference \( {}^{\mathbf{G}}\vec{ p } - {}^{\mathbf{G}}\vec{ t } \left( s_2 \right) \):

$$\begin{eqnarray} {}^{\mathbf{ C }} \vec{ p } = R_{GC} \left[ {}^{\mathbf{G}}\vec{ p } - {}^{\mathbf{G}}\vec{ t } \left( s_2 \right) \right] = \left( \begin{array}{ccc} -1 & 0 & 0 \\ 0 & 1 / 2 & \sqrt{ 3 } / 2 \\ 0 & \sqrt{ 3 } / 2 & -1 / 2 \end{array} \right) \left( \begin{array}{c} 0 \\ -6.35 \sqrt{ 3 } \\ -6.35 \end{array} \right) = \left( \begin{array}{c} 0 \\ -6.35 \sqrt{ 3 } \\ -6.35 \end{array} \right) \end{eqnarray}$$

At first, I thought I had made a mistake when I did this calculation because the vector is unchanged after rotation. However, as illustrated below, you can see that the relative lengths of the projections of \( {}^{\mathbf{G}}\vec{ p } - {}^{\mathbf{G}}\vec{ t } \left( s_2 \right) \) onto the \( u \) and \( f \) axes make sense.

A simplified schematic showing the projection of the difference vector onto the -u and -f axes.

As it turns out, I inadvertently chose an eigenvector of the rotation matrix as an example; any general point will in fact change its coordinates when moving from the global to the cursor frame. For example, if we try to rotate a vector that is antiparallel to the global z-axis, i.e. \( {}^{\mathbf{G}}\vec{ p } - {}^{\mathbf{G}}\vec{ t } \left( s_2 \right) = ( 0, 0, -1 )^{ \mathrm{ T } }\), then it will become

$$\begin{eqnarray} R_{GC} \left[ {}^{\mathbf{G}}\vec{ p } - {}^{\mathbf{G}}\vec{ t } \left( s_2 \right) \right] = \left( \begin{array}{ccc} -1 & 0 & 0 \\ 0 & 1 / 2 & \sqrt{ 3 } / 2 \\ 0 & \sqrt{ 3 } / 2 & -1 / 2 \end{array} \right) \left( \begin{array}{c} 0 \\ 0 \\ -1 \end{array} \right) = \left( \begin{array}{c} 0 \\ -0.8660 \\ 0.5 \end{array} \right) \end{eqnarray}$$

in the cursor frame.
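The two products above are easy to verify numerically; this minimal check assumes nothing beyond the matrix and vectors already given.

import numpy as np

R_GC = np.array([[-1.0,              0.0,              0.0],
                 [ 0.0,              0.5,  np.sqrt(3) / 2],
                 [ 0.0,  np.sqrt(3) / 2,            -0.5]])

d = np.array([0.0, -6.35 * np.sqrt(3), -6.35])   # the difference vector from Step 1
print(R_GC @ d)                                  # unchanged (an eigenvector): [0. -10.9985 -6.35]

print(R_GC @ np.array([0.0, 0.0, -1.0]))         # [0. -0.8660254  0.5]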

Step 3: Rotate into the Surface Local Frame

For the final step, I need to compose a rotation matrix from a sequence of three rotations. To do this well, I need to be very clear about what types of rotations I am performing and their sequence.

Active vs. Passive Rotations

The difference between active and passive rotations is illustrated below for a 45 degree rotation about the right axis.

Active vs. passive rotations

Active rotations specify the rotation of a point relative to a fixed reference frame; passive rotations specify the rotation of a reference frame, keeping the point fixed. And pay attention here: the right axis points into the screen, so a positive rotation would be clockwise when viewed from the perspective drawn above.

What are the corresponding rotation matrices? Here, I found that the internet is absolutely littered with wrong answers, including on sites like Wikipedia. I even get different answers from LLMs depending on when I ask. Therefore, I am including them here as a gift to my future self.

The active rotation matrices about the x (right), y (up), and z (forward) axes are:

$$\begin{eqnarray} R_x \left( \theta \right) = \left( \begin{array}{ccc} 1 & 0 & 0 \\ 0 & \cos \theta & - \sin \theta \\ 0 & \sin \theta & \cos \theta \end{array} \right) \end{eqnarray}$$

$$\begin{eqnarray} R_y \left( \psi \right) = \left( \begin{array}{ccc} \cos \psi & 0 & \sin \psi \\ 0 & 1 & 0 \\ - \sin \psi & 0 & \cos \psi \end{array} \right) \end{eqnarray}$$

$$\begin{eqnarray} R_z \left( \phi \right) = \left( \begin{array}{ccc} \cos \phi & - \sin \phi & 0 \\ \sin \phi & \cos \phi & 0 \\ 0 & 0 & 1 \end{array} \right) \end{eqnarray}$$

The passive rotation matrices about the x (right), y (up), and z (forward) axes are:

$$\begin{eqnarray} R_x \left( \theta \right) = \left( \begin{array}{ccc} 1 & 0 & 0 \\ 0 & \cos \theta & \sin \theta \\ 0 & - \sin \theta & \cos \theta \end{array} \right) \end{eqnarray}$$

$$\begin{eqnarray} R_y \left( \psi \right) = \left( \begin{array}{ccc} \cos \psi & 0 & - \sin \psi \\ 0 & 1 & 0 \\ \sin \psi & 0 & \cos \psi \end{array} \right) \end{eqnarray}$$

$$\begin{eqnarray} R_z \left( \phi \right) = \left( \begin{array}{ccc} \cos \phi & \sin \phi & 0 \\ - \sin \phi & \cos \phi & 0 \\ 0 & 0 & 1 \end{array} \right) \end{eqnarray}$$

Notice that all that changes between these two types of rotations is the location of a negative sign on the \( \sin \) terms.

I found that a useful way to remember whether a matrix represents an active or passive rotation is as follows. Take for example the +45 degree rotation of the vector \( ( 0, 0, 1 )^{ \mathrm{ T } } \) about the right direction illustrated above. You can see that an active rotation should result in a negative \( u \) and a positive \( f \) component. This means5:

$$\begin{eqnarray} \left( \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 / \sqrt{ 2 } & - 1 / \sqrt{ 2 } \\ 0 & 1 / \sqrt{ 2 } & 1 / \sqrt{ 2 } \end{array} \right) \left( \begin{array}{c} 0 \\ 0 \\ 1 \end{array} \right) = \left( \begin{array}{c} 0 \\ - 1 / \sqrt{ 2 } \\ 1 / \sqrt{ 2 } \end{array} \right) \end{eqnarray}$$

The passive rotation should result in positive values for both the \( u^{ \prime } \) and \( f^{ \prime } \) components:

$$\begin{eqnarray} \left( \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 / \sqrt{ 2 } & 1 / \sqrt{ 2 } \\ 0 & - 1 / \sqrt{ 2 } & 1 / \sqrt{ 2 } \end{array} \right) \left( \begin{array}{c} 0 \\ 0 \\ 1 \end{array} \right) = \left( \begin{array}{c} 0 \\ 1 / \sqrt{ 2 } \\ 1 / \sqrt{ 2 } \end{array} \right) \end{eqnarray}$$

I can do a similar check for the other directions to verify the other matrices.
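The check is easy to automate; note that the passive matrix is simply the transpose of the active one. A small verification script, assuming nothing beyond the matrices above:

import numpy as np

def Rx_active(theta):
    """Active rotation about the right (x) axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0,   c,  -s],
                     [0.0,   s,   c]])

theta = np.radians(45.0)
v = np.array([0.0, 0.0, 1.0])

print(Rx_active(theta) @ v)      # active:  [0. -0.7071  0.7071]
print(Rx_active(theta).T @ v)    # passive: [0.  0.7071  0.7071]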

Extrinsic vs. Intrinsic Rotations

I found these easier to understand than active and passive rotations. Extrinsic rotations are rotations that are always about a fixed global reference frame. On the other hand, intrinsic rotations are about the intermediate frames that result from a single rotation. So if I rotate about the \( f \) axis, then the \( r \) and \( u \) axes will be rotated, resulting in an intermediate \( r^{ \prime }u^{ \prime }f^{ \prime } \) frame. The next rotation will be about one of these intermediate axes.

The confusing thing about these two types of rotations is the order in which the rotation matrices are applied to a vector. An extrinsic rotation of a vector \( \vec{ v } \) about \( r \), then \( u \), then \( f \) is written as:

$$ R_f R_u R_r \vec{ v} $$

which follows the usual rule that the matrix nearest the vector is applied first. An intrinsic rotation of a vector \( \vec{ v } \) about \( r \), then \( u^{ \prime } \), then \( f^{ \prime \prime } \), on the other hand, is written as:

$$ R_r R_u R_f \vec{ v} $$

So even though the rotation about the right direction is performed first, we multiply the vector first by the rotation matrix about the \( f \) direction in the second intermediate frame.

All of this might seem confusing and lead one to wonder why they would want to use intrinsic rotations, but actually they are much more intuitive than extrinsic rotations and make a lot of sense when laying out an optical system. For example, if I have a two-axis mirror mount and I rotate the mirror about the vertical axis, a horizontal rotation that follows will be about the axis in the newly rotated frame, not the global laboratory frame. In any case, a sequence of three extrinsic rotations and three intrinsic rotations through the same angles will produce the same result so long as the order of the rotation matrices is correct.

Euler Angles and Rotation Sequences

The most important thing I learned about Euler angles is that they are completely meaningless unless you also specify a rotation sequence. Additionally, the internet is full of resources about the distinction between proper and improper Euler angles. The gist of what I learned here is that proper Euler angles are really a distraction to scientists and engineers because they rely on rotation sequences in which one of the axes is used twice. More useful are what aerospace engineers sometimes refer to as the Tait-Bryan angles, which are the rotation angles associated with sequences like \( z-y^{ \prime }-x^{ \prime \prime } \) or \( x-y-z \).

Now, there is one point here that is worth making and that is relevant to optical system layout: rotations about \( f \), the forward direction, are best performed last in the sequence. To understand why, consider a cylindrical lens with its axis parallel to the local \( z' \) direction. If we perform an intrinsic rotation about the cursor's \( f \) direction first and then try to adjust the tip or tilt, we will be doing so about axes that have already been rotated, so that tip and tilt become coupled with respect to the global frame. When aligning such systems, no one expects that rotating a cylindrical lens about its axis will change the way that the tip and tilt adjusters on a lens mount work.

For all these reasons, I choose an intrinsic sequence \(r - u^{ \prime } - f^{{ \prime \prime} } \) of passive rotations with Euler angles \( \theta \), \( \psi \), and \( \phi \), respectively. The corresponding rotation matrix is:

$$\begin{eqnarray} R_{ \mathbf{ CL } } ( \theta, \psi, \phi ) = R_r ( \theta ) R_u ( \psi ) R_f ( \phi ) = \left( \begin{array}{ccc} \cos \phi \cos \psi & \sin \phi \cos \psi & - \sin \psi \\ - \sin \phi \cos \theta + \sin \psi \sin \theta \cos \phi & \sin \phi \sin \psi \sin \theta + \cos \phi \cos \theta & \sin \theta \cos \psi \\ \sin \phi \sin \theta + \sin \psi \cos \phi \cos \theta & \sin \phi \sin \psi \cos \theta - \sin \theta \cos \phi & \cos \psi \cos \theta \end{array} \right) \end{eqnarray}$$

And finally, the transformation of the point on the mirror from the global to the surface local frame is:

$$ {}^{ \mathbf{ L }}\vec{p} = R_{ CL }R_{ GC } \left[ {}^{\mathbf{G}}\vec{ p } - {}^{\mathbf{G}}\vec{ t } \left( s_2 \right) \right] $$
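
As a sanity check on the closed-form matrix for \( R_{ CL } \) above, it can also be built numerically by composing the three passive rotation matrices. A minimal sketch in Rust, with helper names of my own choosing:

// Passive rotations about the r, u, and f axes.
fn rot_r(theta: f64) -> [[f64; 3]; 3] {
    let (s, c) = theta.sin_cos();
    [[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]]
}

fn rot_u(psi: f64) -> [[f64; 3]; 3] {
    let (s, c) = psi.sin_cos();
    [[c, 0.0, -s], [0.0, 1.0, 0.0], [s, 0.0, c]]
}

fn rot_f(phi: f64) -> [[f64; 3]; 3] {
    let (s, c) = phi.sin_cos();
    [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]
}

// Matrix product a * b.
fn mat_mul(a: [[f64; 3]; 3], b: [[f64; 3]; 3]) -> [[f64; 3]; 3] {
    let mut out = [[0.0; 3]; 3];
    for i in 0..3 {
        for j in 0..3 {
            for k in 0..3 {
                out[i][j] += a[i][k] * b[k][j];
            }
        }
    }
    out
}

// R_CL = R_r(theta) * R_u(psi) * R_f(phi) for the intrinsic r-u'-f'' sequence.
fn rot_cl(theta: f64, psi: f64, phi: f64) -> [[f64; 3]; 3] {
    mat_mul(rot_r(theta), mat_mul(rot_u(psi), rot_f(phi)))
}

fn main() {
    // theta = 30 degrees, psi = phi = 0 reproduces the matrix used in the example below.
    let m = rot_cl(30.0_f64.to_radians(), 0.0, 0.0);
    for row in &m {
        println!("{:?}", row);
    }
}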

The Solution to the Example

Does this give the correct result in the above example? Well, the mirror is rotated +30 degrees about the right direction, so the cursor-to-local rotation matrix is:

$$\begin{eqnarray} R_{CL} \left( \theta = 30^{ \circ }, \psi = 0, \phi = 0 \right) = \left( \begin{array}{ccc} 1 & 0 & 0 \\ 0 & \sqrt{ 3 } / 2 & 1 / 2 \\ 0 & - 1 / 2 & \sqrt{ 3 } / 2 \end{array} \right) \end{eqnarray}$$

From earlier, the vector representing the point in the cursor frame is \( ( 0, -6.35 \sqrt{ 3 }, -6.35 )^{ \mathrm{ T } } \). Their product gives the final answer:

$$\begin{eqnarray} {}^{ \mathbf{ L } }\vec{ p } = \left( \begin{array}{ccc} 1 & 0 & 0 \\ 0 & \sqrt{ 3 } / 2 & 1 / 2 \\ 0 & - 1 / 2 & \sqrt{ 3 } / 2 \end{array} \right) \left( \begin{array}{c} 0 \\ -6.35 \sqrt{ 3 } \\ -6.35 \end{array} \right) = \left( \begin{array}{c} 0 \\ -12.7 \\ 0 \end{array} \right) \end{eqnarray}$$

This is exactly as expected, as we wanted to get a point on the bottom of the 25.4 mm diameter mirror in its local frame.

Step 4: Rotate back from the Surface Local to the Global Frame

I need to go back to the global frame at the end of each iteration for a ray trace. Fortunately, it's easy to undo a rotation because the inverse of a rotation matrix is just its transpose. I also need to swap the order of the matrices when taking the inverse, and add back the offset from the origin of the global system. This means:

$${}^{\mathbf{ G } }\vec{ p } = R_{GC}^{ \mathrm{ T} } R_{CL}^{ \mathrm{ T} } {}^{\mathbf{ L } }\vec{ p } + {}^{\mathbf{ G }} \vec{ t } (s_2) $$

I plugged in the numbers in Python and verified that I get the original point back.
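
Here is the same round trip sketched in Rust rather than Python, restricted to the \( R_{CL} \) part of the chain so that it stays self-contained; it reuses the 30-degree example from above, and the helper names are mine:

// Multiply a 3x3 matrix by a 3-element column vector.
fn mat_vec(m: [[f64; 3]; 3], v: [f64; 3]) -> [f64; 3] {
    let mut out = [0.0; 3];
    for i in 0..3 {
        for j in 0..3 {
            out[i] += m[i][j] * v[j];
        }
    }
    out
}

// The inverse of a rotation matrix is its transpose.
fn transpose(m: [[f64; 3]; 3]) -> [[f64; 3]; 3] {
    let mut out = [[0.0; 3]; 3];
    for i in 0..3 {
        for j in 0..3 {
            out[i][j] = m[j][i];
        }
    }
    out
}

fn main() {
    let (s, c) = 30.0_f64.to_radians().sin_cos();

    // R_CL for theta = 30 degrees, psi = phi = 0, as in the example above.
    let r_cl = [[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]];

    // The point expressed in the cursor frame.
    let p_c = [0.0, -6.35 * 3.0_f64.sqrt(), -6.35];

    // Cursor frame -> surface local frame, then back again via the transpose.
    let p_l = mat_vec(r_cl, p_c);
    let p_back = mat_vec(transpose(r_cl), p_l);

    println!("p_l    = {:?}", p_l);    // approximately [0.0, -12.7, 0.0]
    println!("p_back = {:?}", p_back); // recovers p_c up to rounding error
}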

This should be all I need to know to implement 3D sequential optical system layouts in my ray tracer.


  1. G. H. Spencer and M. V. R. K. Murty, "General Ray-Tracing Procedure," J. Opt. Soc. Am. 52, 672-678 (1962). https://doi.org/10.1364/JOSA.52.000672

  2. Ray/surface intersections with spherical surfaces can be found analytically using the quadratic equation with only minor caveats considering stability issues due to floating point arithmetic. This would likely be faster than using Newton-Raphson. However, a general system contains both spherical and non-spherical surfaces, and I was concerned that checking each surface type would result in a performance hit due to branch prediction failures by the processor. I could probably have found a way around this by deciding ahead of time which algorithm to use to determine the intersection for each surface before entering the main ray tracing loop, but during initial development I decided to just use Newton-Raphson for everything because doing so resulted in very simple code. (Thanks to Andy York for telling me about the numerical instabilities when using the quadratic equation. See Chapter 7 here: https://www.realtimerendering.com/raytracinggems/rtg/index.html.) 

  3. The object plane is the flat surface perpendicular to the optical axis in which the object lies. It is always at surface index 0 in my convention. 

  4. Since the only angles involved are \( 30^{ \circ} \) and \( 60^{ \circ } \), I used a 30-60-90 triangle of lengths 1, \( \sqrt{ 3 } \), and 2, respectively to compute the cosines and sines. 

  5. The cosine and sine of 45 degrees are both \( 1 / \sqrt{ 2 } \). 

  6. Passive rotations result in a rotation of the coordinate axes, keeping a point fixed; active rotations rotate a point about a set of axes. 

A Very Brief Summary of The Analytic Signal in Fourier Optics

The Analytic Signal Representation of a Monochromatic Wave

Monochromatic Scalar Waves

A monochromatic, scalar waveform is described by the expression:

$$ u \left( \mathbf{r}, t\right) = A ( \mathbf{r} ) \cos \left[2 \pi f_0 t + \phi \left( \mathbf{r} \right) \right] $$

  • The signal is real-valued
  • The signal has a known phase for all \( t \)

The Analytic Signal

An analytic signal is a generalization of a phasor. It is used to represent a real-valued signal as a complex exponential or a sum of complex exponentials. When Goodman1 refers to a phasor, he often means the analytic signal. This is made clear in Chapter 6 where he describes the construction of the phasor for a narrowband signal as follows:

  1. compute its Fourier transform
  2. set the positive frequency components to zero
  3. double the amplitudes of the negative frequency components
  4. inverse Fourier transform the resulting one-sided spectrum

Strictly speaking, the analytic signal is obtained by setting the negative frequencies to zero and doubling the positive ones; equivalently, it is the original signal plus \( j \) times its Hilbert transform. However, many engineering fields have adopted the convention used in the recipe above of setting the positive frequencies to zero instead. The two conventions produce results that are complex conjugates of one another, so they are physically equivalent as long as one convention is used consistently.
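
The recipe is easy to check numerically. The sketch below is in Rust to match the rest of the code on this site; it uses a naive \( O(N^2) \) DFT to avoid pulling in an FFT crate, assumes the signal has no DC or Nyquist content, and all of the names are my own. It builds a sampled cosine, zeroes the positive-frequency bins, doubles the negative ones, inverse transforms, and confirms that the real part reproduces the original samples.

use std::f64::consts::PI;

// Naive DFT on (re, im) pairs; sign = -1.0 for the forward transform,
// +1.0 for the inverse (which is also divided by n).
fn dft(x: &[(f64, f64)], sign: f64) -> Vec<(f64, f64)> {
    let n = x.len();
    let scale = if sign > 0.0 { 1.0 / n as f64 } else { 1.0 };
    (0..n)
        .map(|k| {
            let mut acc = (0.0, 0.0);
            for (m, &(re, im)) in x.iter().enumerate() {
                let ang = sign * 2.0 * PI * (k * m) as f64 / n as f64;
                let (s, c) = ang.sin_cos();
                acc.0 += re * c - im * s;
                acc.1 += re * s + im * c;
            }
            (acc.0 * scale, acc.1 * scale)
        })
        .collect()
}

fn main() {
    let n = 64;
    let (k0, phi) = (5.0, 0.8); // carrier bin and phase

    // Step 0: a real cosine sampled over one DFT window.
    let x: Vec<(f64, f64)> = (0..n)
        .map(|m| ((2.0 * PI * k0 * m as f64 / n as f64 + phi).cos(), 0.0))
        .collect();

    // Step 1: Fourier transform.
    let mut spec = dft(&x, -1.0);

    // Steps 2 and 3: zero the positive-frequency bins, double the negative ones.
    for k in 1..=n / 2 {
        spec[k] = (0.0, 0.0);
    }
    for k in (n / 2 + 1)..n {
        let (re, im) = spec[k];
        spec[k] = (2.0 * re, 2.0 * im);
    }

    // Step 4: inverse transform the one-sided spectrum.
    let analytic = dft(&spec, 1.0);

    // The real part of the analytic signal reproduces the original cosine.
    let max_err = x
        .iter()
        .zip(&analytic)
        .map(|(a, b)| (a.0 - b.0).abs())
        .fold(0.0_f64, f64::max);
    println!("max |Re(analytic) - original| = {:.3e}", max_err);
}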

The Fourier transform of \( u \left( \mathbf{r}, t\right) \) is:

$$ \mathcal{F} \left\{ u \left( \mathbf{r}, t\right) \right\} = \frac{A ( \mathbf{r} ) }{2} \left[ e^{j \phi \left( \mathbf{r} \right) } \delta \left( f - f_0 \right) + e^{-j \phi \left( \mathbf{r} \right) } \delta \left( f + f_0 \right) \right] $$

Drop the positive frequency term \( e^{j \phi \left( \mathbf{r} \right) } \delta \left( f - f_0 \right) \) and double the result. This produces:

$$ A ( \mathbf{r} ) e^{-j \phi \left( \mathbf{r} \right) } \delta \left( f + f_0 \right) $$

Let \( U ( \mathbf{r} ) := A ( \mathbf{r} ) e^{-j \phi \left( \mathbf{r} \right) } \). The inverse Fourier transform of this signal is:

$$ \mathcal{F}^{-1} \left\{ U ( \mathbf{r} ) \delta \left( f + f_0 \right) \right\} = U ( \mathbf{r} ) e^{-j 2 \pi f_0 t } $$

We can recover the original field by taking the real part of this expression, which is equivalent to applying Euler's identity and dropping the imaginary part:

$$ u \left( \mathbf{r}, t \right) = \Re \left[ U ( \mathbf{r} ) e^{-j 2 \pi f_0 t} \right] $$

Polychromatic Scalar Waves

To model a polychromatic wave, we integrate over the analytic signals of each spectral component and take the real part of the result:

$$ u \left( \mathbf{r}, t\right) = \Re \left[ \int_{-\infty}^{\infty} \tilde{U} \left( \mathbf{r}, f \right) e^{-j 2 \pi f t} \,df \right] $$

The Narrowband Assumption

We get a useful representation of the expression above if we assume that the bandwidth of the signal is much smaller than its center frequency, \( \Delta f \ll f_0 \):

$$ \int_{-\infty}^{\infty} \tilde{U} \left( \mathbf{r}, f \right) e^{-j 2 \pi f t} \,df = U \left( \mathbf{r}, t \right) e^{-j 2 \pi f_0 t} $$

To better understand the meaning of this assumption, make the substitution \( \nu = f - f_0 \) into the expression on the left hand side:

$$\begin{eqnarray} \int_{-\infty}^{\infty} \tilde{U} \left( \mathbf{r}, f \right) e^{-j 2 \pi f t} \,df &=& \int_{-\infty}^{\infty} \tilde{U} \left( \mathbf{r}, \nu + f_0 \right) e^{-j 2 \pi \left( \nu + f_0 \right) t} \,d\nu \\ &=& e^{-j 2 \pi f_0 t} \int_{-\infty}^{\infty} \tilde{U} \left( \mathbf{r}, \nu + f_0 \right) e^{-j 2 \pi \nu t} \,d\nu \end{eqnarray}$$

Under the narrowband assumption, \( \tilde{U} \) is appreciable only for values of \( \nu \) near zero that are much smaller than \( f_0 \), so the remaining integral varies slowly compared with the phasor oscillating at frequency \( f_0 \). If we define the following function:

$$ U \left( \mathbf{r}, t \right) := \int_{-\infty}^{\infty} \tilde{U} \left( \mathbf{r}, \nu + f_0 \right) e^{-j 2 \pi \nu t} \,d\nu $$

then it will vary slowly with respect to the carrier frequency \( f_0 \).

As a result, under the assumptions of narrowbandedness, we can interpret the complex function \( U \left( \mathbf{r}, t \right) \) as an "envelope" modulating the amplitude of the fast oscillating carrier wave. If the assumption is not valid, then this interpretation fails.

The Slowly Varying Envelope Assumption

It is instructive to reverse our reasoning and see why a slowly-varying envelope implies a narrowband signal. Compute the Fourier transform of the envelope \( U \left( \mathbf{r}, t \right) \), along with the Fourier transform of its time derivative.

The Fourier transform of the envelope (the analysis kernel \( e^{j 2 \pi f t} \) is the one that matches the synthesis convention used above):

$$ \int_{-\infty}^{\infty} U \left( \mathbf{r}, t \right) e^{j 2 \pi f t} \,dt = \tilde{U} \left( \mathbf{r}, f + f_0 \right) $$

The Fourier transform of the derivative of \( U \):

$$\begin{eqnarray} \int_{-\infty}^{\infty} \frac{d}{dt} \left[ U \left( \mathbf{r}, t \right) \right] e^{j 2 \pi f t} \,dt &=& -j 2 \pi f \tilde{U} \left( \mathbf{r}, f + f_0 \right) \end{eqnarray}$$

Now, apply the slowly varying envelope approximation (SVEA) by asserting that the rate of change of \( U \) with respect to time is much less than \( 2 \pi f_0 \) times the value of \( U \), or \( \left| \frac{d}{dt} U \left( \mathbf{r}, t\right) \right| \ll \left| 2 \pi f_0 U \left( \mathbf{r}, t \right) \right| \):

$$\begin{eqnarray} \left| 2 \pi f \tilde{U} \left( \mathbf{r}, f + f_0 \right) \right| &=& \left| \int_{-\infty}^{\infty} \frac{d}{dt} \left[ U \left( \mathbf{r}, t \right) \right] e^{j 2 \pi f t} \,dt \right| \\ &\ll& \left| \int_{-\infty}^{\infty} 2 \pi f_0 U \left( \mathbf{r}, t \right) e^{j 2 \pi f t} \,dt \right| \\ &=& 2 \pi f_0 \left| \tilde{U} \left( \mathbf{r}, f + f_0 \right) \right| \end{eqnarray}$$

This expression means that \( \tilde{U} \left( \mathbf{r}, f + f_0 \right) \) has appreciable magnitude only where \( \left| f \right| \ll f_0 \), i.e. the appreciable frequency components of the envelope \( U \left( \mathbf{r} , t \right) \) are much less than the frequency \( f_0 \)2. Equivalently, the spectrum \( \tilde{U} \left( \mathbf{r}, f \right) \) of the field is concentrated around \( f_0 \) within a bandwidth \( \Delta f \) that is small with respect to \( f_0 \).

Assumptions, not Approximations!

The narrowband and slowly varying envelope assumptions are usually referred to as approximations. This is misleading! The resulting expression for the field is not an approximation at all; instead, under the assumptions of narrowbandedness, we can interpret the complex function \( U \left( \mathbf{r}, t \right) \) as an "envelope" modulating the amplitude of the fast oscillating carrier wave. If the assumption is not valid, then this interpretation is not correct.

Narrowband Polychromatic Waves

In summary, narrowband polychromatic waves with a center frequency \( f_0 \) are modeled as the product of a fast rotating phasor and slowly varying envelope:

$$ u \left( \mathbf{r}, t \right) = \Re \left[ U \left( \mathbf{r}, t \right) e^{-j 2 \pi f_0 t} \right] $$

The amplitude and the phase of the envelope are the amplitude and phase of the real optical wave.

Coherence

While the expression for the analytic signal \( U \left( \mathbf{r}, t \right) \) as an integral over frequency components appears deterministic, the phase relationships between the spectral components are often unknown and vary randomly in time. As a result, the envelope of the optical wave will vary unpredictably and must be analyzed in terms of its statistical properties.

Monochromatic Light is Coherent

Since monochromatic light has only one spectral component by definition, it is completely coherent.

  • I mean monochromatic in the ideal sense, not like how we sometimes describe lasers.
  • Monochromatic waves, like plane waves, cannot exist in real life. The uncertainty principle requires that a monochromatic wave exist for an infinite duration.

  1. Goodman, Joseph W. Introduction to Fourier optics. Roberts and Company publishers (2005). ISBN 978-0974707723. 

  2. https://physics.stackexchange.com/questions/451239/slowly-varying-envelope-approximation-what-does-it-imply 

A Very Brief Summary of Fresnel and Fraunhofer Diffraction Integrals

Fourier Optics is complicated, and though I have internalized its concepts over the years, I often still need to review the specifics of its mathematical models. Unfortunately, my go-to resource for this, Goodman's Fourier Optics1, tends to disperse information across chapters and homework problems. This makes quick review difficult.

Here I condense what I think are the essentials of Fresnel and Fraunhofer diffraction into one blog post.

Starting Point: the Huygens-Fresnel Principle

Ignore Chapter 3 of Goodman; it's largely irrelevant for practical work. The Huygens-Fresnel principle itself is a good intuitive model to start with.

The Model

An opaque screen with a clear aperture \( \Sigma \) is located in the \( z = 0 \) plane with transverse coordinates \( \left( \xi , \eta \right ) \). It is illuminated by a complex-valued scalar field \( U \left( \xi, \eta \right) \). Let \( \vec{r_0} = \left( \xi, \eta, 0 \right) \) be a point in the plane of the aperture and \( \vec{r_1} = \left( x, y, z \right) \) be a point in the observation plane. The Huygens-Fresnel Principle provides the following formula for the diffracted field \( U \left( x, y \right) \) in the plane \( z \):

$$ U \left( x, y; z \right) = \frac{z}{j \lambda} \iint_{\Sigma} U \left( \xi , \eta \right) \frac{\exp \left( j k r_{01} \right)}{r_{01}^2} \, d\xi d\eta $$

with the distance \( r_{01}^2 = \left( x - \xi \right)^2 + \left( y - \eta \right)^2 + z^2 \).

  • We assumed an obliquity factor \( cos \, \theta = z / r_{01}\). The choice of obliquity factor depends on the boundary conditions discussed in Chapter 3, but again this isn't terribly important for practical work.
  • The integral is a sum over secondary spherical wavelets emitted by each point in the aperture and weighted by the incident field and the obliquity factor.
  • The factor \( 1 / j \) means that each secondary wavelet from a point \( \left( \xi, \eta \right) \) is 90 degrees out-of-phase with the incident field at that point.

Approximations used in the Huygens-Fresnel Principle

  1. The electromagnetic field can be approximated as a complex-valued scalar field.
  2. \( r_{01} \gg \lambda \), or the observation screen is many multiples of the wavelength away from the aperture.

The Fresnel Diffraction Integral

The Fresnel Approximation

Rewrite \( r_{01} \) as:

$$ r_{01} = z \sqrt{ 1 + \frac{\left( x - \xi \right)^2 + \left( y - \eta \right)^2}{z^2} } $$

Apply the binomial approximation:

$$ r_{01} \approx z + \frac{\left( x - \xi \right)^2 + \left( y - \eta \right)^2}{2z} $$
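
A commonly quoted sufficient (but not necessary) condition for dropping the next term of this expansion is that the phase it would contribute, \( k \left[ \left( x - \xi \right)^2 + \left( y - \eta \right)^2 \right]^2 / 8 z^3 \), be much smaller than one radian:

$$ z^3 \gg \frac{\pi}{4 \lambda} \left[ \left( x - \xi \right)^2 + \left( y - \eta \right)^2 \right]^2_{\text{max}} $$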

In the Huygens-Fresnel diffraction integral, replace:

  1. \(r_{01}^2 \) in the denominator with \( z^2 \)
  2. \(r_{01}\) in the argument of the exponential with \( z + \frac{\left( x - \xi \right)^2 + \left( y - \eta \right)^2}{2z} \)

The Diffraction Integral: Form 1

Perform the substitutions for \( r_{01} \) into the Huygens-Fresnel formula that were mentioned above to get the first form of the Fresnel diffraction integral:

$$ U \left( x, y; z \right) = \frac{ e^{jkz} }{j \lambda z} \iint_{-\infty}^{\infty} U \left( \xi , \eta \right) \exp \left\{ \frac{jk}{2z} \left[ \left( x - \xi \right)^2 + \left( y - \eta \right)^2 \right] \right\} \,d\xi \,d\eta $$

  • It is space invariant, i.e. it depends only on the differences in coordinates \( \left( x - \xi \right) \) and \( \left( y - \eta \right) \).
  • It represents a convolution of the input field with the kernel \( h \left( x, y \right) = \frac{e^{j k z}}{j \lambda z} \exp \left[ \frac{j k}{2 z} \left( x^2 + y^2 \right) \right] \).

The Diffraction Integral: Form 2

Expand the squared quantities inside the parentheses of Form 1 to get the second form of the integral:

$$ U \left( x, y; z \right) = \frac{ e^{jkz} }{j \lambda z} e^{\frac{j k}{2 z} \left( x^2 + y^2 \right)} \iint_{-\infty}^{\infty} \left[ U \left( \xi , \eta \right) e^{\frac{j k}{2 z} \left( \xi^2 + \eta^2 \right)} \right] e^{-j \frac{2 \pi }{\lambda z} \left( x \xi + y \eta \right) } \,d\xi \,d\eta $$

  • It is proportional to the Fourier transform of the product of the incident field and a parabolic phase curvature \( e^{\frac{j k}{2 z} \left( \xi^2 + \eta^2 \right)} \).

Phasor Conventions

Section 4.2.1 of Goodman is an interesting practical aside about how to identify whether a spherical or parabolic wavefront is converging or diverging based on the sign of its phasor. It is useful for solving the important homework problem 4.16 which concerns the diffraction pattern from an aperture that is illuminated by a converging spherical wave.

Unfortunately, Figure 4.2 does not align well with its description in the text about negative z-values, and it's not clear how the interpretations change for point sources not at \( z = 0 \). I address this below.

  • Let the point of convergence (or center of divergence) of a spherical wave sit on the z-axis at \( z = Z \).
  • The phasor describing the time-dependent part of the field in Goodman's notation is \( e^{-j 2 \pi f t} \).
  • If we move away from the center of the wave such that \( z - Z \) is positive and we encounter wavefronts emitted earlier in time, then \( t \) is decreasing and the argument to the phasor is increasing. The wave is therefore diverging if the argument is positive.
  • If we move away from the center of the wave such that \( z - Z \) is negative and we encounter wavefronts emitted earlier in time, then \( t \) is decreasing and the argument of the phasor must again increase as we move away. But moving away now makes \( z - Z \) more negative, so the argument only increases if the phasor carries a negative sign, i.e. \( e^{-j k \left( z - Z \right)} \). The wave is therefore diverging if the argument is negative.
  • Likewise for converging waves.

To summarize:

  • \( e^{ j k r} \): diverging when \( \left( z - Z \right) \) is positive; converging when \( \left( z - Z \right) \) is negative.
  • \( e^{ -j k r} \): converging when \( \left( z - Z \right) \) is positive; diverging when \( \left( z - Z \right) \) is negative.

The Fraunhofer Diffraction Integral

The Fraunhofer Approximation

Assume we are so far from the screen that the quadratic phasor inside the diffraction integral is effectively flat. This means:

$$ z \gg \frac{k \left( \xi^2 + \eta^2 \right)_{\text{max}}}{2} $$

The Diffraction Integral

Applying the approximation above allows us to drop the quadratic phasor inside the Fresnel diffraction integral because it is effectively 1:

$$ U \left( x, y; z \right) = \frac{ e^{jkz} }{j \lambda z} e^{\frac{j k}{2 z} \left( x^2 + y^2 \right)} \iint_{-\infty}^{\infty} U \left( \xi , \eta \right) e^{-j \frac{2 \pi }{\lambda z} \left( x \xi + y \eta \right) } \,d\xi \,d\eta $$

  • Apart from the phase term that depends on \( z \), this expression represents a Fourier transform of the incident field.
  • It appears to break spatial invariance because we no longer depend on differences of coordinates, e.g. \( x - \xi \). However, we can still use the Fresnel transfer function (the Fourier transform of the Fresnel convolution kernel) as the transfer function for Fraunhofer diffraction because if the Fraunhofer approximation is valid, then so is the Fresnel approximation.
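
As a quick numerical illustration of the Fraunhofer result (a one-dimensional analogue with unit-amplitude plane-wave illumination, ignoring the leading phase and \( 1 / j \lambda z \) factors; the parameter values are arbitrary choices of mine), the sketch below evaluates the Fraunhofer integral for a slit of half-width \( a \) by direct summation and compares the magnitude with the familiar \( 2 a \, \mathrm{sinc} \left( 2 a x / \lambda z \right) \) profile:

use std::f64::consts::PI;

fn sinc(u: f64) -> f64 {
    if u.abs() < 1e-12 { 1.0 } else { (PI * u).sin() / (PI * u) }
}

fn main() {
    let lambda = 0.5e-6; // wavelength, m
    let a = 50e-6;       // slit half-width, m
    let z = 1.0;         // propagation distance, m
    let n = 2000;        // samples across the slit

    for i in 0..5 {
        let x = i as f64 * 2.0e-3; // observation point, m

        // Direct Riemann sum of the 1D Fraunhofer integral over the slit.
        let (mut re, mut im) = (0.0, 0.0);
        let d_xi = 2.0 * a / n as f64;
        for m in 0..n {
            let xi = -a + (m as f64 + 0.5) * d_xi;
            let ang = -2.0 * PI * x * xi / (lambda * z);
            re += ang.cos() * d_xi;
            im += ang.sin() * d_xi;
        }
        let numeric = (re * re + im * im).sqrt();

        // Analytic result: |2a sinc(2a x / (lambda z))|.
        let analytic = (2.0 * a * sinc(2.0 * a * x / (lambda * z))).abs();

        println!("x = {:.1} mm: numeric = {:.4e}, analytic = {:.4e}",
                 x * 1e3, numeric, analytic);
    }
}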

Solution to Homework Problem 4.16

Problem 4.16 is important because it is a basis for the development of the frequency analysis of image-forming systems in later chapters of Goodman.

The purpose of 4.16 is to show that the diffraction pattern of an aperture that is illuminated by a spherical converging wave in the Fresnel regime is the Fraunhofer diffraction pattern of the aperture.

Part a: Quadratic phase approximation to the incident wave

Let \( z = 0 \) be the plane of the aperture and \( z = Z \) be the observation plane. Additionally, let \( \left( \xi, \eta \right) \) represent the coordinates in the plane of the aperture, and \( \left( x, y \right) \) the coordinates in the observation plane. The spherical wave that illuminates the aperture is converging to a point \( \vec{r}_P = Y \hat{ \jmath} + Z \hat{k} \) in the observation plane.

To find a quadratic phase approximation for the incident wave, start with its representation as a time-harmonic spherical wave of amplitude \( A \):

$$ U \left( x, y, z \right) = A \frac{e^{j k |\vec{r} - \vec{r}_P|}}{|\vec{r} - \vec{r}_P|} $$

Note that \( \vec{r} - \vec{r}_P = x \hat{\imath} + \left( y - Y \right) \hat{\jmath} + \left( z - Z \right) \hat{k} \). Its magnitude is

$$\begin{eqnarray} | \vec{r} - \vec{r}_P | &=& \sqrt{x^2 + \left( y - Y \right)^2 + \left( z - Z \right)^2} \\ &=& \left( z - Z \right) \sqrt{1 + \frac{x^2 + \left( y - Y \right)^2}{\left( z - Z \right)^2} } \\ &\approx& \left( z - Z \right) + \frac{ x^2 + \left( y - Y \right)^2 }{2 \left( z - Z \right)} \end{eqnarray}$$

At first glance, there's a problem here because allowing \( \left( z - Z \right) \) to be negative will result in a negative value for the magnitude of the vector \( \left( \vec{r} - \vec{r}_P \right) \). However, if we use the summary above and select \( e^{j k r} \) as the phasor for a converging wave when \( \left( z - Z \right) \) is negative, then we will have the correct sign of the argument to the phasor. We do however need to take the absolute value of the \( z - Z \) term in the denominator of the expression for the spherical wave.

Replacing the distance in the phasor's argument with the two lowest order terms in the binomial expansion and the lowest order term in the denominator:

$$ U \left( x, y, z \right) \approx A \frac{e^{j k \left(z - Z \right)} e^{j k \left[ x^2 + \left( y - Y \right)^2 \right] / 2 \left(z - Z \right) }}{\left|z - Z \right|} $$

In the \( z = 0 \) plane, this becomes:

$$ U \left( x, y; z = 0 \right) \approx A \left(x, y \right) \frac{e^{-j k Z} e^{-j k \left[ x^2 + \left( y - Y \right)^2 \right] / 2 Z }}{Z} $$

I moved the finite extent of the aperture into a new function for the amplitude \( A \) above. This function is zero outside the aperture and a constant \( A \) inside it.

Part b: Diffraction pattern in the plane containing \( P \)

Use the second form of the Fresnel diffraction integral to compute the field in the observation plane containing \( P \). To keep the algebra compact, evaluate it along the line \( x = 0 \) (the point \( P \) lies on this line); the extension to general \( x \) is straightforward:

$$\begin{eqnarray} U \left( x = 0, y, z = Z \right) &=& \frac{ e^{jkZ} }{j \lambda Z} e^{\frac{j k y^2}{2 Z}} \iint_{-\infty}^{\infty} \left[ U \left( \xi , \eta ; z = 0 \right) e^{\frac{j k}{2 Z} \left( \xi^2 + \eta^2 \right)} \right] e^{-j \frac{2 \pi }{\lambda Z} y \eta } \,d\xi \,d\eta \\ &\approx& \frac{ e^{jkZ} e^{-jkZ} }{j \lambda Z^2} e^{\frac{j k y^2}{2 Z}} \iint_{-\infty}^{\infty} A \left(\xi, \eta \right) e^{-\frac{j k}{2Z} \left[ \xi^2 + \left( \eta - Y \right)^2 \right]} e^{\frac{j k}{2 Z} \left( \xi^2 + \eta^2 \right)} e^{-j \frac{2 \pi }{\lambda Z} y \eta } \,d\xi \,d\eta \\ &=& \frac{1}{j \lambda Z^2} e^{\frac{j k y^2}{2 Z} } \iint_{-\infty}^{\infty} A \left(\xi, \eta \right) e^{-\frac{j k}{2Z} \left( - 2 \eta Y + Y^2 \right)} e^{-j \frac{2 \pi }{\lambda Z} y \eta } \,d\xi \,d\eta \\ &=& \frac{1}{j \lambda Z^2} e^{\frac{j k \left( y^2 - Y^2 \right)}{2 Z}} \iint_{-\infty}^{\infty} A \left(\xi, \eta \right) e^{\frac{j k \eta Y}{Z}} e^{-j \frac{2 \pi}{\lambda Z} y \eta } \,d\xi \,d\eta \\ &=& \frac{1}{j \lambda Z^2} e^{\frac{j k \left( y^2 - Y^2 \right)}{2 Z}} \iint_{-\infty}^{\infty} A \left(\xi, \eta \right) e^{-j \frac{2 \pi }{\lambda Z} \left( y - Y \right) \eta } \,d\xi \,d\eta \end{eqnarray}$$

The final expression above is, apart from the phase factor in front of the integral, the Fraunhofer diffraction pattern of the aperture evaluated at the spatial frequency \( \left( y - Y \right) / \lambda Z \), i.e. the Fraunhofer pattern of the aperture centered on the point \( P \). The reason that the Fraunhofer diffraction pattern appears as the result is that the converging spherical wavefronts exactly cancel the quadratic phase term inside the Fresnel diffraction formula, leaving a simple Fourier transform of the aperture as a result.


  1. Goodman, Joseph W. Introduction to Fourier optics. Roberts and Company publishers (2005). ISBN 978-0974707723. 

Data Type Alignment for Ray Tracing in Rust

I would like to clean up my 3D ray trace routines for my Rust-based optical design library. The proof of concept (PoC) is finished, and now I need to make the code easier to modify to better support the features that I want to add on the frontend. I suspect that I might be able to make some performance gains as well during refactoring. Towards this end, I want to take a look at my ray data type from the perspective of making it CPU cache friendly.

One of the current obstacles to adding more features to the GUI (for example color selection for different ray bundles) is how I handle individual rays. For the PoC it was fastest to add two additional fields to each ray to track where they come from and whether they are terminated:

struct Vec3 {
    e: [f64; 3],
}

struct Ray {
    pos: Vec3,
    dir: Vec3,
    terminated: bool,
    field_id: usize,
}

A ray is just two 3-element arrays of floats that specify the coordinates of a point on the ray and its direction cosines. I have additionally included a boolean flag to indicate whether the ray has terminated, i.e. gone out-of-bounds of the system or failed to converge during calculation of the intersection point with a surface.

A ray fan is a collection of rays and is specified by a 3-tuple of wavelength, axis, and field; field_id really should not belong to an individual Ray because it can be stored along with the set of all rays for the current ray fan. I probably added it because it was the easiest thing to do at the time to get the application working.
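
To make that concrete, here is one possible shape for the refactored data, sketched with names of my own choosing (RayFan and Axis are not part of the existing code); the per-fan metadata lives next to a flat Vec of rays instead of inside each Ray:

// Hypothetical refactor: per-fan metadata stored once, not per ray.
struct Vec3 {
    e: [f64; 3],
}

struct Ray {
    pos: Vec3,
    dir: Vec3,
    terminated: bool,
}

enum Axis {
    X,
    Y,
}

struct RayFan {
    wavelength: f64,
    axis: Axis,
    field_id: usize,
    rays: Vec<Ray>,
}

fn main() {
    let fan = RayFan {
        wavelength: 0.5876e-6,
        axis: Axis::Y,
        field_id: 0,
        rays: vec![Ray {
            pos: Vec3 { e: [0.0, 0.0, 0.0] },
            dir: Vec3 { e: [0.0, 0.0, 1.0] },
            terminated: false,
        }],
    };
    println!("rays in fan {}: {}", fan.field_id, fan.rays.len());
}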

A deeper look into the Ray struct

Size of a ray

Let's first look to see how much space the Ray struct occupies.

use std::mem;

#[derive(Debug)]
struct Vec3 {
    e: [f64; 3],
}

#[derive(Debug)]
struct Ray {
    pos: Vec3,
    dir: Vec3,
    terminated: bool,
    field_id: usize,
}

fn main() {
    println!("Size of ray: {:?}", mem::size_of::<Ray>());
}

The Ray struct occupies 64 bytes in memory. Does this make sense?

The sizes of the individual fields are:

  • pos: 24 bytes
  • dir: 24 bytes
  • terminated: 1 byte
  • field_id: 8 bytes (on a 64-bit target)

pos and dir are each 24 bytes because each is composed of three 64-bit floats, and 8 bits = 1 byte, so each float occupies 8 bytes. terminated is only one byte because it is a boolean. field_id is a usize, whose size depends on the compilation target; on 64-bit targets, such as x86_64, it is 64 bits = 8 bytes.

Adding the sizes in the above table gives 57 bytes, not 64 bytes as was output from the example code. Why is this?

Alignment and padding

Alignment refers to the layout of a data type in memory and how it is accessed. CPUs read memory in chunks that are equal in size to the word size. Misaligned data is inefficient to access because the CPU requires more cycles than is necessary to fetch the data.

Natural alignment refers to the most efficient alignment of a data type for CPU access. To achieve natural alignment, a compiler can introduce padding between fields of a struct so that the memory address of a field or datatype is a multiple of the field's/data type's alignment.

As an example of misalignment, consider a 4-byte integer that starts at memory address 5. The CPU has 32-bit memory words. To read the data, the CPU must:

  1. read bytes 4-7,
  2. read bytes 8-11,
  3. and combine the relevant parts of both reads to get the 4 bytes, i.e. bytes 5 - 8.

Notice that we must specify the memory word size to determine whether a data type is misaligned.

Here is an important question: why can't the CPU just start reading from memory address 5? The answer, as far as I can tell, is that it just can't. This is not how the CPU, RAM, and memory bus are wired.

Alignment in Rust

Alignment in Rust is defined as follows:

The alignment of a value specifies what addresses are valid to store the value at. A value of alignment n must only be stored at an address that is a multiple of n.

The Rust compiler only guarantees the following when it comes to padding fields in structs:

  1. The fields are properly aligned.
  2. The fields do not overlap.
  3. The alignment of the type is at least the maximum alignment of its fields.

So for my Ray data type, its alignment is 8 because the maximum alignment of its fields is 8 bytes. (pos and dir are composed of 8-byte floating point numbers). The addresses of its fields are:

use std::ptr;

// Vec3 and Ray are as defined above.
fn main() {
    let ray = Ray {
        pos: Vec3 { e: [0.0, 0.0, 0.0] },
        dir: Vec3 { e: [0.0, 0.0, 1.0] },
        terminated: false,
        field_id: 0,
    };

    // addr_of! creates raw pointers to the fields without taking references,
    // so no unsafe block is needed just to print the addresses.
    println!("Address of ray.pos: {:p}", ptr::addr_of!(ray.pos));
    println!("Address of ray.dir: {:p}", ptr::addr_of!(ray.dir));
    println!("Address of ray.terminated: {:p}", ptr::addr_of!(ray.terminated));
    println!("Address of ray.field_id: {:p}", ptr::addr_of!(ray.field_id));
}

I got the following results, which will vary from system-to-system and probably run-to-run:

Address of ray.pos: 0x7fff076c6b50
Address of ray.dir: 0x7fff076c6b68
Address of ray.terminated: 0x7fff076c6b88
Address of ray.field_id: 0x7fff076c6b80

So the pos field comes first at address 0x6b50 (omitting the most significant hexadecimal digits). Then, 24 bytes later, comes dir at address 0x6b68. Note that the difference is hexadecimal 0x18, which is decimal 16 + 8 = 24! So pos really occupies 24 bytes like we previously calculated.

Next comes field_id and not terminated. It is 0x6b80 - 0x6b68 = 0x0018, or 24 bytes after dir like before. So far we have no padding, but the compiler did swap the order of the fields. Finally, terminated sits 8 bytes after field_id because field_id itself occupies 8 bytes. That accounts for 57 bytes, so the Rust compiler must have placed 7 bytes of padding after the terminated field to round the struct size up to a multiple of its 8-byte alignment.
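
A quick way to double-check these numbers without printing addresses is to ask the compiler for the alignments and sizes directly. A minimal sketch using std::mem, with Vec3 and Ray as defined above (the values in the comments are what I see on x86_64):

use std::mem;

struct Vec3 {
    e: [f64; 3],
}

struct Ray {
    pos: Vec3,
    dir: Vec3,
    terminated: bool,
    field_id: usize,
}

fn main() {
    println!("align of Vec3:  {}", mem::align_of::<Vec3>());  // 8
    println!("align of usize: {}", mem::align_of::<usize>()); // 8 on x86_64
    println!("align of bool:  {}", mem::align_of::<bool>());  // 1
    println!("align of Ray:   {}", mem::align_of::<Ray>());   // 8
    println!("size of Ray:    {}", mem::size_of::<Ray>());    // 64
}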

What makes a good data type?

As I mentioned, I already know that field_id shouldn't belong to the ray for reasons related to data access by the programmer. So the reason for removing it from the Ray struct is not related to performance. But what about the terminated bool? Well, in this case, it's resulting in 7 extra bytes of padding for each ray!

use std::mem;

#[derive(Debug)]
struct Vec3 {
    e: [f64; 3],
}

#[derive(Debug)]
struct Ray {
    pos: Vec3,
    dir: Vec3,
    terminated: bool,
}

fn main() {
    let ray = Ray {
        pos: Vec3 { e: [0.0, 0.0, 0.0] },
        dir: Vec3 { e: [0.0, 0.0, 1.0] },
        terminated: false,
    };

    println!("Size of ray: {:?}", mem::size_of::<Ray>());
}

This program prints Size of ray : 56, but 24 + 24 + 1 = 49. In both versions we waste 7 bytes.

Fitting a Ray into the CPU cache

Do I have a good reason to remove terminated from the Ray struct because it wastes space? Consider the following:

We want as many Ray instances as possible to fit within a CPU cache line if we want to maximize performance. (Note that I'm not saying that we necessarily want to maximize performance because that comes with tradeoffs.) Each CPU core on my AMD Ryzen 7 has a 64 kB L1 cache with 64 byte cache lines. This means that I can fit only 1 of the current version of Ray into each cache line for a total of 64 kB / 64 bytes = 1024 rays maximum in the L1 cache of each core. If I remove field_id and terminated, then the size of a ray becomes 48 bytes. Unfortunately, this means that only one Ray instance fits in a cache line, just as before with a 64 byte Ray.

But, if I also reduce my precision to 32-bit floats, then the size of a Ray becomes 6 * 4 = 24 bytes and I have doubled the number of rays that fit in L1 cache.

Now what if I reduced the precision but kept terminated? Then I get 6 * 4 + 1 = 25 bytes, which the compiler pads up to 28 bytes per Ray because the alignment is now only 4 bytes, and I still have 2 rays per cache line.
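
These back-of-the-envelope sizes are easy to verify. The sketch below defines hypothetical f32 variants of the structs (the names are mine) and prints their sizes; note that with only f32 fields the alignment drops to 4, which is where the 28 comes from:

use std::mem;

struct Vec3f32 {
    e: [f32; 3],
}

// f32 precision, no bookkeeping fields: 24 bytes.
struct RayF32 {
    pos: Vec3f32,
    dir: Vec3f32,
}

// f32 precision, keeping the terminated flag: 25 bytes padded up to 28.
struct RayF32Terminated {
    pos: Vec3f32,
    dir: Vec3f32,
    terminated: bool,
}

fn main() {
    println!("RayF32:           {}", mem::size_of::<RayF32>());           // 24
    println!("RayF32Terminated: {}", mem::size_of::<RayF32Terminated>()); // 28
}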

I conclude that there is no reason to remove terminated for performance reasons. Reducing my floating point precision would produce a more noticeable effect on the cache locality of the Ray data type.

Does all of this matter?

My Ryzen 7 laptop can trace about 600 rays through 3 surfaces in 380 microseconds with Firefox, Slack, and Outlook running. At this point, I doubt that crafting my data types for cache friendliness is going to offer a significant payoff. Creating data types that are easy to work with is likely more important.

I do think, however, that it's important to understand these concepts. If I do need to tune the performance in the future, then I know where to look.

An Analog LED Dimmer Circuit

I recently needed to build a circuit to control the brightness of a 4 W LED with a knob. I know basic electronics, and I thought this would be easy. I spoke to a few people I know who are knowledgeable in electronics. I also asked people on Reddit. A lot of people said it would be easy.

As it turns out, it wasn't easy.

The Requirements

My requirements are simple:

  • The brightness should be manually adjustable with a knob from OFF to nearly full ON.
  • The LED will serve as the light source of a microscope trans-illuminator. It should work across a large range of frame acquisition rates (1 Hz to 1 kHz, or exposure times of 1 ms to 1 s).
  • The range of brightnesses should be variable across the dynamic range of the camera, which in my case is 35,000:1, or about 90 dB.

I don't care about efficiency. I don't care about whether I can use a Raspberry Pi to control it. I don't care whether it can be turned on or off with different logic levels. I just want a knob that I can turn to make the LED brighter or dimmer.

In spite of the insistence of several people that I communicated with on the internet, I decided that the second requirement would preclude using pulse width modulation (PWM) to dim the LED. Even when I could convince others that PWM almost always causes aliasing at high frame rates, they tried to find obscure workarounds so I could still use PWM. I really do appreciate all the feedback I got. But I also learned that PWM is the hammer of the electronics world that makes everything look like a nail1.

The Circuit

I reached out to a friend of mine who's a wizard at analog electronics2. He suggested using a MOSFET and varying the gate-source voltage to control the current through the LED.

After a lot of thinking and reading, I arrived at the following circuit:

The LED is an Osram Oslon star LED (LST1-01F05-4070-01) with a maximum current of 1.3 A and a maximum forward voltage of 3.2 V. The MOSFET is an IRF510, whose gate-source threshold voltage is about 3 V.

Here's a brief explanation of what each component does:

  1. Voltage source : This is just a 12 V wall wart.
  2. 200 nF capacitor : This smooths out any fluctuations from the wall wart.
  3. 50 kOhm potentiometer : The "knob." Turning it will vary the gate-source voltage of the MOSFET, which controls how much current flows through the LED.
  4. 50 kOhm resistor : This along with the potentiometer forms a voltage divider to keep the minimum voltage at the MOSFET gate close to where the LED turns on. Without it, you need to rotate the potentiometer almost half of its full range for the LED to turn on.
  5. 300 nF capacitor : A debounce capacitor that smooths out the mechanical irregularities of the pot when it turns.
  6. IRF510 MOSFET : Basically a valve that I can vary continuously to control the LED current by setting the voltage at the gate.
  7. LED : So pretty.
  8. 10 Ohm resistor : This limits the current through the LED. I calculated its value by dividing the maximum supply voltage minus the maximum forward voltage drop across the LED by the maximum current, then rounded up for safety.

$$ R = \frac{V}{I} = \frac{\left( 12 \, V - 3.2 \, V \right) }{1.3 \, A} = 6.8 \Omega $$

The resistor also has to handle a large power dissipation at the maximum current:

$$ P = I^2 R = \left(1.3 \, A \right)^2 \left( 10 \Omega \right) = 16.9 W $$

I decided instead to keep the current to less than 1 Amp so that I could use a 10 Watt resistor that I had.

Power dissipation is also why we don't just use a potentiometer to control the LED current: my pots were only rated up to about 50 mW, whereas I expected that the MOSFET would have to handle loads on the order of Watts due to the high current.
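
As a rough sanity check on that expectation (assuming, purely for illustration, a mid-range operating point of about 0.5 A and a roughly 3 V drop across the LED; these are my guesses, not measurements), the drain-source voltage and MOSFET dissipation would be on the order of:

$$ V_{DS} \approx 12 \, V - 3 \, V - \left( 0.5 \, A \right) \left( 10 \, \Omega \right) = 4 \, V, \qquad P \approx V_{DS} I = \left( 4 \, V \right) \left( 0.5 \, A \right) = 2 \, W $$

which is indeed orders of magnitude more than a small potentiometer is rated to dissipate.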

What I Learned

I really need to study MOSFETS

I still don't really know how to solve circuits with MOSFETs. I arrived at the above circuit largely by trial-and-error on a prototype and by performing naive calculations on the voltage divider that turned out to not be entirely correct. I also expected that the LED would turn on once I passed the MOSFET's gate-source voltage threshold, but this turned out to be off by about 2 or 3 V.

MOSFETs suffer from second order effects

There is currently about half a Volt of hysteresis between the gate-to-ground voltages at which the LED turns on and at which it turns off. According to a helpful person on Reddit, this is likely due to a change in both the LED's forward voltage and the MOSFET's threshold voltage with temperature once the current starts flowing. A possible fix is to swap the order of the LED and the MOSFET so that only the MOSFET will contribute to the hysteresis.

You can always complicate things to make them better

The same person on Reddit also suggested making the circuit robust to temperature variations by adding an opamp to control the MOSFET gate voltage. It would compensate for temperature changes by comparing the voltage set by the potentiometer to the voltage across the 10 Ohm resistor in a closed feedback loop.

Electronics is an art

Yes, electronics is a science, but I would argue that having to mentally juggle second order effects and the fact that experts seem to make an initial design "by feel" are signatures of an art.

It also struck me how nearly every step of the process forced me to take a detour to address something I hadn't at first considered, such as current limits in the wires and large variations in the MOSFET specs.

The next time I need to do something like this, I will expect the problem to take longer to solve than I first anticipate.


  1. To my non-native English speaking readers: I mean that people try to use PWM to solve problems where it's not appropriate. 

  2. And if you play electric guitar, be sure to check out his handmade effects pedals: https://www.volumeandpower.com/