A Very Brief Summary of Fresnel and Fraunhofer Diffraction Integrals

Fourier Optics is complicated, and though I have internalized its concepts over the years, I often still need to review the specifics of its mathematical models. Unfortunately, my go-to resource for this, Goodman's Fourier Optics1, tends to disperse information across chapters and homework problems. This makes quick review difficult.

Here I condense what I think are the essentials of Fresnel and Fraunhofer diffraction into one blog post.

Starting Point: the Huygens-Fresnel Principle

Ignore Chapter 3 of Goodman; it's largely irrelevant for practical work. The Huygens-Fresnel principle itself is a good intuitive model to start with.

The Model

An opaque screen with a clear aperture \( \Sigma \) is located in the \( z = 0 \) plane with transverse coordinates \( \left( \xi , \eta \right ) \). It is illuminated by a complex-valued scalar field \( U \left( \xi, \eta \right) \). Let \( \vec{r_0} = \left( \xi, \eta, 0 \right) \) be a point in the plane of the aperture and \( \vec{r_1} = \left( x, y, z \right) \) be a point in the observation plane. The Huygens-Fresnel Principle provides the following formula for the diffracted field \( U \left( x, y \right) \) in the plane \( z \):

$$ U \left( x, y; z \right) = \frac{z}{j \lambda} \iint_{\Sigma} U \left( \xi , \eta \right) \frac{\exp \left( j k r_{01} \right)}{r_{01}^2} \, d\xi d\eta $$

with the distance \( r_{01}^2 = \left( x - \xi \right)^2 + \left( y - \eta \right)^2 + z^2 \).

  • We assumed an obliquity factor \( \cos \theta = z / r_{01} \). The choice of obliquity factor depends on the boundary conditions discussed in Chapter 3, but again this isn't terribly important for practical work.
  • The integral is a sum over secondary spherical wavelets emitted by each point in the aperture and weighted by the incident field and the obliquity factor.
  • The factor \( 1 / j \) means that each secondary wavelet from a point \( \left( \xi, \eta \right) \) is 90 degrees out-of-phase with the incident field at that point.

Approximations used in the Huygens-Fresnel Principle

  1. The electromagnetic field can be approximated as a complex-valued scalar field.
  2. \( r_{01} \gg \lambda \), or the observation screen is many multiples of the wavelength away from the aperture.

The Fresnel Diffraction Integral

The Fresnel Approximation

Rewrite \( r_{01} \) as:

$$ r_{01} = z \sqrt{ 1 + \frac{\left( x - \xi \right)^2 + \left( y - \eta \right)^2}{z^2} } $$

Apply the binomial approximation:

$$ r_{01} \approx z + \frac{\left( x - \xi \right)^2 + \left( y - \eta \right)^2}{2z} $$

In the Huygens-Fresnel diffraction integral, replace:

  1. \(r_{01}^2 \) in the denominator with \( z^2 \)
  2. \(r_{01}\) in the argument of the exponential with \( z + \frac{\left( x - \xi \right)^2 + \left( y - \eta \right)^2}{2z} \)

The Diffraction Integral: Form 1

Perform the substitutions for \( r_{01} \) into the Huygens-Fresnel formula that were mentioned above to get the first form of the Fresnel diffraction integral:

$$ U \left( x, y; z \right) = \frac{ e^{jkz} }{j \lambda z} \iint_{-\infty}^{\infty} U \left( \xi , \eta \right) \exp \left\{ \frac{jk}{2z} \left[ \left( x - \xi \right)^2 + \left( y - \eta \right)^2 \right] \right\} \,d\xi \,d\eta $$

  • It is space invariant, i.e. it depends only on the differences in coordinates \( \left( x - \xi \right) \) and \( \left( y - \eta \right) \).
  • It represents a convolution of the input field with the kernel \( h \left( x, y \right) = \frac{e^{j k z}}{j \lambda z} \exp \left[ \frac{j k}{2 z} \left( x^2 + y^2 \right) \right] \).
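
Since Form 1 is a convolution, it can be evaluated numerically by multiplying the Fourier transform of the input field by the transfer function \( H \left( f_X, f_Y \right) = e^{jkz} \exp \left[ -j \pi \lambda z \left( f_X^2 + f_Y^2 \right) \right] \), the Fourier transform of the kernel \( h \). Below is a minimal NumPy sketch of this approach; the wavelength, propagation distance, grid, and square-aperture input field are arbitrary values chosen only for illustration.

import numpy as np

# Illustrative parameters (not taken from the text)
wavelength = 0.5e-6      # meters
z = 0.1                  # propagation distance, meters
k = 2 * np.pi / wavelength
N, dx = 512, 10e-6       # samples per side and grid spacing

x = (np.arange(N) - N // 2) * dx
X, Y = np.meshgrid(x, x)
u0 = ((np.abs(X) < 0.5e-3) & (np.abs(Y) < 0.5e-3)).astype(complex)  # square aperture

# Fresnel transfer function: the Fourier transform of the kernel h(x, y)
fx = np.fft.fftfreq(N, d=dx)
FX, FY = np.meshgrid(fx, fx)
H = np.exp(1j * k * z) * np.exp(-1j * np.pi * wavelength * z * (FX**2 + FY**2))

# The convolution of Form 1, evaluated as a product in the frequency domain
u1 = np.fft.ifft2(np.fft.fft2(u0) * H)
intensity = np.abs(u1)**2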

The Diffraction Integral: Form 2

Expand the squared quantities inside the parentheses of Form 1 to get the second form of the integral:

$$ U \left( x, y; z \right) = \frac{ e^{jkz} }{j \lambda z} e^{\frac{j k}{2 z} \left( x^2 + y^2 \right)} \iint_{-\infty}^{\infty} \left[ U \left( \xi , \eta \right) e^{\frac{j k}{2 z} \left( \xi^2 + \eta^2 \right)} \right] e^{-j \frac{2 \pi }{\lambda z} \left( x \xi + y \eta \right) } \,d\xi \,d\eta $$

  • It is proportional to the Fourier transform of the product of the incident field and a parabolic phase curvature \( e^{\frac{j k}{2 z} \left( \xi^2 + \eta^2 \right)} \).

Phasor Conventions

Section 4.2.1 of Goodman is an interesting practical aside about how to identify whether a spherical or parabolic wavefront is converging or diverging based on the sign of its phasor. It is useful for solving the important homework problem 4.16 which concerns the diffraction pattern from an aperture that is illuminated by a converging spherical wave.

Unfortunately, Figure 4.2 does not align well with its description in the text about negative z-values, and it's not clear how the interpretations change for point sources not at \( z = 0 \). I address this below.

  • Let the point of convergence (or center of divergence) of a spherical wave sit on the z-axis at \( z = Z \).
  • The phasor describing the time-dependent part of the field in Goodman's notation is \( e^{-j 2 \pi f t} \).
  • If we move away from the center of the wave such that \( z - Z \) is positive and we encounter wavefronts emitted earlier in time, then \( t \) is decreasing and the argument to the phasor is increasing. The wave is therefore diverging if the argument is positive.
  • If we move away from the center of the wave such that \( z - Z \) is negative and we encounter wavefronts emitted earlier in time, then \( t \) is decreasing and the argument to the phasor is increasing. However, a negative \( z - Z \) flips the sign of the spatial part of the argument, so that it is in fact decreasing. The wave is therefore diverging if the argument is negative.
  • Likewise for converging waves.

To summarize:

Phasor \( \left( z - Z \right) \) positive \( \left( z - Z \right) \) negative
\( e^{ j k r} \) diverging converging
\( e^{ -j k r} \) converging diverging

The Fraunhofer Diffraction Integral

The Fraunhofer Approximation

Assume we are so far from the screen that the quadratic phasor inside the diffraction integral is effectively flat. This means:

$$ z \gg \frac{k \left( \xi^2 + \eta^2 \right)_{\text{max}}}{2} $$

The Diffraction Integral

Applying the approximation above allows us to drop the quadratic phasor inside the Fresnel diffraction integral because it is effectively 1:

$$ U \left( x, y; z \right) = \frac{ e^{jkz} }{j \lambda z} e^{\frac{j k}{2 z} \left( x^2 + y^2 \right)} \iint_{-\infty}^{\infty} U \left( \xi , \eta \right) e^{-j \frac{2 \pi }{\lambda z} \left( x \xi + y \eta \right) } \,d\xi \,d\eta $$

  • Apart from the phase term that depends on \( z \), this expression represents a Fourier transform of the incident field.
  • It appears to break spatial invariance because we no longer depend on differences of coordinates, e.g. \( x - \xi \). However, we can still use the Fresnel transfer function (the Fourier transform of the Fresnel convolution kernel) as the transfer function for Fraunhofer diffraction because if the Fraunhofer approximation is valid, then so is the Fresnel approximation.
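
As a concrete illustration of this Fourier transform relationship, here is a short NumPy sketch that estimates the Fraunhofer intensity pattern of a small square aperture with a single FFT. The wavelength, grid, aperture size, and distance are assumptions chosen only so that the Fraunhofer condition above holds; the observation plane coordinates follow from \( x = \lambda z f_X \) and \( y = \lambda z f_Y \).

import numpy as np

# Illustrative parameters chosen to satisfy the Fraunhofer condition
wavelength = 0.5e-6              # meters
z = 1.0                          # meters
N, d_xi = 512, 2e-6              # samples per side and aperture plane spacing

xi = (np.arange(N) - N // 2) * d_xi
XI, ETA = np.meshgrid(xi, xi)
aperture = ((np.abs(XI) < 50e-6) & (np.abs(ETA) < 50e-6)).astype(complex)

# Up to the phase factors in front of the integral, the far field is the
# Fourier transform of the aperture
U = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(aperture))) * d_xi**2

# Observation plane coordinates and intensity
x = wavelength * z * np.fft.fftshift(np.fft.fftfreq(N, d=d_xi))
intensity = np.abs(U / (wavelength * z))**2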

Solution to Homework Problem 4.16

Problem 4.16 is important because it is a basis for the development of the frequency analysis of image-forming systems in later chapters of Goodman.

The purpose of 4.16 is to show that the diffraction pattern of an aperture that is illuminated by a spherical converging wave in the Fresnel regime is the Fraunhofer diffraction pattern of the aperture.

Part a: Quadratic phase approximation to the incident wave

Let \( z = 0 \) be the plane of the aperture and \( z = Z \) be the observation plane. Additionally, let \( \left( \xi, \eta \right) \) represent the coordinates in the plane of the aperture, and \( \left( x, y \right) \) the coordinates in the observation plane. The spherical wave that illuminates the aperture is converging to a point \( \vec{r}_P = Y \hat{ \jmath} + Z \hat{k} \) in the observation plane.

To find a quadratic phase approximation for the incident wave, start with its representation as a time-harmonic spherical wave of amplitude \( A \):

$$ U \left( x, y, z \right) = A \frac{e^{j k |\vec{r} - \vec{r}_P|}}{|\vec{r} - \vec{r}_P|} $$

Note that \( \vec{r} - \vec{r}_P = x \hat{\imath} + \left( y - Y \right) \hat{\jmath} + \left( z - Z \right) \hat{k} \). Its magnitude is

$$\begin{eqnarray} | \vec{r} - \vec{r}_P | &=& \sqrt{x^2 + \left( y - Y \right)^2 + \left( z - Z \right)^2} \\ &=& \left( z - Z \right) \sqrt{1 + \frac{x^2 + \left( y - Y \right)^2}{\left( z - Z \right)^2} } \\ &\approx& \left( z - Z \right) + \frac{ x^2 + \left( y - Y \right)^2 }{2 \left( z - Z \right)} \end{eqnarray}$$

At first glance, there's a problem here because allowing \( \left( z - Z \right) \) to be negative will result in a negative value for the magnitude of the vector \( \vec{r} - \vec{r}_P \). However, if we use the above table for selecting \( e^{j k r} \) as the phasor for a converging wave when \( \left( z - Z \right) \) is negative, then we will have the correct sign of the argument to the phasor. We do, however, need to take the absolute value of the \( z - Z \) term in the denominator of the expression for the spherical wave.

Replacing the distance in the phasor's argument with the two lowest order terms in the binomial expansion and the lowest order term in the denominator:

$$ U \left( x, y, z \right) \approx A \frac{e^{j k \left(z - Z \right)} e^{j k \left[ x^2 + \left( y - Y \right)^2 \right] / 2 \left(z - Z \right) }}{\left|z - Z \right|} $$

In the \( z = 0 \) plane, written in the aperture coordinates \( \left( \xi, \eta \right) \), this becomes:

$$ U \left( \xi, \eta; z = 0 \right) \approx A \left(\xi, \eta \right) \frac{e^{-j k Z} e^{-j k \left[ \xi^2 + \left( \eta - Y \right)^2 \right] / 2 Z }}{Z} $$

I moved the finite extent of the aperture into a new amplitude function \( A \left( \xi, \eta \right) \) above. This function is zero outside the aperture and equal to a constant \( A \) inside it.

Part b: Diffraction pattern at the point \( P \)

Use the second form of the Fresnel diffraction integral to compute the diffraction pattern along the line \( x = 0 \) through \( P \) in the observation plane:

$$\begin{eqnarray} U \left( x = 0, y; z = Z \right) &=& \frac{ e^{jkZ} }{j \lambda Z} e^{\frac{j k y^2}{2 Z}} \iint_{-\infty}^{\infty} \left[ U \left( \xi , \eta ; z = 0 \right) e^{\frac{j k}{2 Z} \left( \xi^2 + \eta^2 \right)} \right] e^{-j \frac{2 \pi }{\lambda Z} y \eta } \,d\xi \,d\eta \\ &\approx& \frac{ e^{jkZ} e^{-jkZ} }{j \lambda Z^2} e^{\frac{j k y^2}{2 Z}} \iint_{-\infty}^{\infty} A \left(\xi, \eta \right) e^{-\frac{j k}{2Z} \left[ \xi^2 + \left( \eta - Y \right)^2 \right]} e^{\frac{j k}{2 Z} \left( \xi^2 + \eta^2 \right)} e^{-j \frac{2 \pi }{\lambda Z} y \eta } \,d\xi \,d\eta \\ &=& \frac{1}{j \lambda Z^2} e^{\frac{j k y^2}{2 Z} } \iint_{-\infty}^{\infty} A \left(\xi, \eta \right) e^{-\frac{j k}{2Z} \left( - 2 \eta Y + Y^2 \right)} e^{-j \frac{2 \pi }{\lambda Z} y \eta } \,d\xi \,d\eta \\ &=& \frac{ e^{\frac{j k}{2 Z} \left( y^2 - Y^2 \right)} }{j \lambda Z^2} \iint_{-\infty}^{\infty} A \left(\xi, \eta \right) e^{\frac{j k \eta Y}{Z}} e^{-j \frac{2 \pi}{\lambda Z} y \eta } \,d\xi \,d\eta \\ &=& \frac{ e^{\frac{j k}{2 Z} \left( y^2 - Y^2 \right)} }{j \lambda Z^2} \iint_{-\infty}^{\infty} A \left(\xi, \eta \right) e^{-j \frac{2 \pi }{\lambda Z} \eta \left( y - Y \right) } \,d\xi \,d\eta \end{eqnarray}$$

Apart from the quadratic phase factor in front of the integral, the final expression above is the Fraunhofer diffraction pattern of the aperture, centered on the point \( P \). The Fraunhofer pattern appears because the converging spherical wavefronts exactly cancel the diverging quadratic phase term inside the Fresnel diffraction formula, leaving a simple Fourier transform of the aperture.


  1. Goodman, Joseph W. Introduction to Fourier optics. Roberts and Company publishers (2005). ISBN 978-0974707723. 

Data Type Alignment for Ray Tracing in Rust

I would like to clean up my 3D ray trace routines for my Rust-based optical design library. The proof of concept (PoC) is finished, and now I need to make the code easier to modify so that it better supports the features I want to add to the frontend. I suspect that I might be able to make some performance gains as well during refactoring. Towards this end, I want to take a look at my ray data type from the perspective of making it CPU cache friendly.

One of the current obstacles to adding more features to the GUI (for example color selection for different ray bundles) is how I handle individual rays. For the PoC it was fastest to add two additional fields to each ray to track where they come from and whether they are terminated:

struct Vec3 {
    e: [f64; 3],
}

struct Ray {
    pos: Vec3,
    dir: Vec3,
    terminated: bool,
    field_id: usize,
}

A ray is just two 3-element arrays of floats that specify the coordinates of a point on the ray and its direction cosines. I have additionally included a boolean flag to indicate whether the ray has terminated, i.e. gone out-of-bounds of the system or failed to converge during calculation of the intersection point with a surface.

A ray fan is a collection of rays and is specified by a 3-tuple of wavelength, axis, and field; field_id really should not belong to an individual Ray because it can be stored along with the set of all rays for the current ray fan. I probably added it because it was the easiest thing to do at the time to get the application working.

A deeper look into the Ray struct

Size of a ray

Let's first look to see how much space the Ray struct occupies.

use std::mem;

#[derive(Debug)]
struct Vec3 {
    e: [f64; 3],
}

#[derive(Debug)]
struct Ray {
    pos: Vec3,
    dir: Vec3,
    terminated: bool,
    field_id: usize,
}

fn main() {
    println!("Size of ray: {:?}", mem::size_of::<Ray>());
}

The Ray struct occupies 64 bytes in memory. Does this make sense?

The sizes of the individual fields are:

Field Size, bytes
pos 24
dir 24
terminated 1
field_id 8*

pos and dir are each 24 bytes because each is composed of three 64-bit floats, and 64 bits = 8 bytes. terminated is only one byte because it is a boolean. field_id is a usize, which means that its size depends on the compilation target. On 64-bit targets, such as x86_64, it is 64 bits = 8 bytes in size.

Adding the sizes in the above table gives 57 bytes, not 64 bytes as was output from the example code. Why is this?

Alignment and padding

Alignment refers to the layout of a data type in memory and how it is accessed. CPUs read memory in chunks that are equal in size to the word size. Misaligned data is inefficient to access because the CPU requires more cycles than is necessary to fetch the data.

Natural alignment refers to the most efficient alignment of a data type for CPU access. To achieve natural alignment, a compiler can introduce padding between fields of a struct so that the memory address of a field or datatype is a multiple of the field's/data type's alignment.

As an example of misalignment, consider a 4-byte integer that starts at memory address 5. The CPU has 32-bit memory words. To read the data, the CPU must:

  1. read bytes 4-7,
  2. read bytes 8-11,
  3. and combine the relevant parts of both reads to get the 4 bytes, i.e. bytes 5 - 8.

Notice that we must specify the memory word size to determine whether a data type is misaligned.

Here is an important question: why can't the CPU just start reading from memory address 5? The answer, as far as I can tell, is that it just can't. This is not how the CPU, RAM, and memory bus are wired.

Alignment in Rust

Alignment in Rust is defined as follows:

The alignment of a value specifies what addresses are valid to store the value at. A value of alignment n must only be stored at an address that is a multiple of n.

The Rust compiler only guarantees the following when it comes to padding fields in structs:

  1. The fields are properly aligned.
  2. The fields do not overlap.
  3. The alignment of the type is at least the maximum alignment of its fields.

So for my Ray data type, its alignment is 8 because the maximum alignment of its fields is 8 bytes. (pos and dir are composed of 8-byte floating point numbers). The addresses of its fields are:

use std::ptr;

// Vec3 and Ray are defined as in the previous snippet.
fn main() {
    let ray = Ray {
        pos: Vec3 { e: [0.0, 0.0, 0.0] },
        dir: Vec3 { e: [0.0, 0.0, 1.0] },
        terminated: false,
        field_id: 0,
    };

    // addr_of! creates raw pointers without dereferencing them, so no unsafe block is needed
    println!("Address of ray.pos: {:p}", ptr::addr_of!(ray.pos));
    println!("Address of ray.dir: {:p}", ptr::addr_of!(ray.dir));
    println!("Address of ray.terminated: {:p}", ptr::addr_of!(ray.terminated));
    println!("Address of ray.field_id: {:p}", ptr::addr_of!(ray.field_id));
}

I got the following results, which will vary from system-to-system and probably run-to-run:

Address of ray.pos: 0x7fff076c6b50
Address of ray.dir: 0x7fff076c6b68
Address of ray.terminated: 0x7fff076c6b88
Address of ray.field_id: 0x7fff076c6b80

So the pos field comes first at address 0x6b50 (omitting the most significant hexadecimal digits). Then, 24 bytes later, comes dir at address 0x6b68. Note that the difference is hexadecimal 0x18, which is decimal 16 + 8 = 24! So pos really occupies 24 bytes like we previously calculated.

Next comes field_id and not terminated. It is 0x6b80 - 0x6b68 = 0x0018, or 24 bytes after dir like before. So far we have no padding, but the compiler did swap the order of the fields. Finally, terminated comes 8 bytes after field_id because field_id occupies 8 bytes. Since the struct's size must be a multiple of its 8-byte alignment, the Rust compiler must have placed 7 bytes of padding after the terminated field.

What makes a good data type?

As I mentioned, I already know that field_id shouldn't belong to the ray for reasons related to data access by the programmer. So the reason for removing it from the Ray struct is not related to performance. But what about the terminated bool? Well, in this case, it's resulting in 7 extra bytes of padding for each ray!

use std::mem;

#[derive(Debug)]
struct Vec3 {
    e: [f64; 3],
}

#[derive(Debug)]
struct Ray {
    pos: Vec3,
    dir: Vec3,
    terminated: bool,
}

fn main() {
    let ray = Ray {
        pos: Vec3 { e: [0.0, 0.0, 0.0] },
        dir: Vec3 { e: [0.0, 0.0, 1.0] },
        terminated: false,
    };

    println!("Size of ray: {:?}", mem::size_of::<Ray>());
}

This program prints Size of ray : 56, but 24 + 24 + 1 = 49. In both versions we waste 7 bytes.

Fitting a Ray into the CPU cache

Do I have a good reason to remove terminated from the Ray struct because it wastes space? Consider the following:

We want as many Ray instances as possible to fit within a CPU cache line if we want to maximize performance. (Note that I'm not saying that we necessarily want to maximize performance because that comes with tradeoffs.) Each CPU core on my AMD Ryzen 7 has a 64 kB L1 cache with 64 byte cache lines. This means that I can fit only 1 of the current version of Ray into each cache line for a total of 64 kB / 64 bytes = 1024 rays maximum in the L1 cache of each core. If I remove field_id and terminated, then the size of a ray becomes 48 bytes. Unfortunately, this means that only one Ray instance fits in a cache line, just as before with a 64 byte Ray.

But, if I also reduce my precision to 32-bit floats, then the size of a Ray becomes 6 * 4 = 24 bytes and I have doubled the number of rays that fit in L1 cache.

Now what if I reduced the precision but kept terminated? Then I get 6 * 4 + 4 = 28 bytes per Ray (the bool plus 3 bytes of padding, since the struct's alignment is now 4) and I still have 2 rays per cache line.

I conclude that there is no reason to remove terminated for performance reasons. Reducing my floating point precision would produce a more noticeable effect on the cache locality of the Ray data type.

Does all of this matter?

My Ryzen 7 laptop can trace about 600 rays through 3 surfaces in 3.8 microseconds with Firefox, Slack, and Outlook running. At this point, I doubt that crafting my data types for cache friendliness is going to offer a significant payoff. Creating data types that are easy to work with is likely more important.

I do think, however, that it's important to understand these concepts. If I do need to tune the performance in the future, then I know where to look.

An Analog LED Dimmer Circuit

I recently needed to build a circuit to control the brightness of a 4 W LED with a knob. I know basic electronics, and I thought this would be easy. I spoke to a few people I know who are knowledgeable in electronics. I also asked people on Reddit. A lot of people said it would be easy.

As it turns out, it wasn't easy.

The Requirements

My requirements are simple:

  • The brightness should be manually adjustable with a knob from OFF to nearly full ON.
  • The LED will serve as the light source of a microscope trans-illuminator. It should work across a large range of frame acquisition rates (1 Hz to 1 kHz, or exposure times of 1 ms to 1 s).
  • The range of brightnesses should be variable across the dynamic range of the camera, which in my case is 35,000:1, or about 90 dB.

I don't care about efficiency. I don't care about whether I can use a Raspberry Pi to control it. I don't care whether it can be turned on or off with different logic levels. I just want a knob that I can turn to make the LED brighter or dimmer.

In spite of the insistence of several people that I communicated with on the internet, I decided that the second requirement would preclude using pulse width modulation (PWM) to dim the LED. Even when I could convince others that PWM almost always causes aliasing at high frame rates, they tried to find obscure workarounds so I could still use PWM. I really do appreciate all the feedback I got. But I also learned that PWM is the hammer of the electronics world that makes everything look like a nail1.

The Circuit

I reached out to a friend of mine who's a wizard at analog electronics2. He suggested using a MOSFET and varying the gate-source voltage to control the current through the LED.

After a lot of thinking and reading, I arrived at the following circuit:

The LED is an Osram Oslon star LED (LST1-01F05-4070-01) with a maximum current of 1.3 A and a maximum forward voltage of 3.2 V. The MOSFET is an IRF510, whose gate-source threshold voltage is about 3 V.

Here's a brief explanation of what each component does:

  1. Voltage source : This is just a 12 V wall wart.
  2. 200 nF capacitor : This smooths out any fluctuations from the wall wart.
  3. 50 kOhm potentiometer : The "knob." Turning it will vary the gate-source voltage of the MOSFET, which controls how much current flows through the LED.
  4. 50 kOhm resistor : This along with the potentiometer forms a voltage divider to keep the minimum voltage at the MOSFET gate close to where the LED turns on. Without it, you need to rotate the potentiometer almost half of its full range for the LED to turn on.
  5. 300 nF capacitor : A debounce capacitor that smooths out the mechanical irregularities of the pot when it turns.
  6. IRF510 MOSFET : Basically a valve that I can vary continuously to control the LED current by setting the voltage at the gate.
  7. LED : So pretty.
  8. 10 Ohm resistor : This limits the current through the LED. I calculated its value by dividing the maximum supply voltage minus the maximum forward voltage drop across the LED by the maximum current, then rounded up for safety.

$$ R = \frac{V}{I} = \frac{\left( 12 \, V - 3.2 \, V \right) }{1.3 \, A} = 6.8 \Omega $$

The resistor also has to handle a large power dissipation at the maximum current:

$$ P = I^2 R = \left(1.3 \, A \right)^2 \left( 10 \Omega \right) = 16.9 W $$

I decided instead to keep the current to less than 1 Amp so that I could use a 10 Watt resistor that I had.

Power dissipation is also why we don't just use a potentiometer to control the LED current: my pots were only rated up to about 50 mW, whereas I expected that the MOSFET would have to handle loads on the order of Watts due to the high current.

What I Learned

I really need to study MOSFETS

I still don't really know how to solve circuits with MOSFETs. I arrived at the above circuit largely by trial-and-error on a prototype and by performing naive calculations on the voltage divider that turned out to not be entirely correct. I also expected that the LED would turn on once I passed the MOSFET's gate-source voltage threshold, but this turned out to be off by about 2 or 3 V.

MOSFETs suffer from second order effects

There is currently about half a Volt of hysteresis in the gate-ground voltage at which the LED turns on and off. According to a helpful person on Reddit, this is likely due to a change in both the LED's forward voltage and the MOSFET's threshold voltage with temperature once the current starts flowing. A possible fix is to swap the order of the LED and the MOSFET so that only the MOSFET will contribute to the hysteresis.

You can always complicate things to make them better

The same person on Reddit also suggested making the circuit robust to temperature variations by adding an opamp to control the MOSFET gate voltage. It would compensate for temperature changes by comparing the potentiometer voltage to the voltage across the 10 Ohm resistor in a closed feedback loop.

Electronics is an art

Yes, electronics is a science, but I would argue that having to mentally juggle second order effects and the fact that experts seem to make an initial design "by feel" are signatures of an art.

It also struck me how nearly every step of the process forced me to take a detour to address something I hadn't at first considered, such as current limits in the wires and large variations in the MOSFET specs.

The next time I need to do something like this, I will expect the problem to take longer to solve than I first anticipate.


  1. To my non-native English speaking readers: I mean that people try to use PWM to solve problems where it's not appropriate. 

  2. And if you play electric guitar, be sure to check out his handmade effects pedals: https://www.volumeandpower.com/

Meta-Biophysics of the Cell

I work in a biophysics lab applying microscopy techniques to study cell biology. I am not a biologist, and I was not trained as one. I therefore have had to develop heuristics to help me understand what cell biologists do and to communicate with them effectively.

In this post, I present these heuristics as a sort of model for how I think that cell biologists think. I collectively call them "Meta-Biophysics of the Cell" because:

  1. they model the models of cell biologists, hence they are a meta-model,
  2. they are inspired by physics and quantitative modeling, and
  3. I limit myself to cellular processes and not, for example, in vitro or organismal studies.

Furthermore, they are heavily biased by my work in microscopy.

These heuristics may very well be wrong. I do not pretend to understand biology nearly as well as a biologist. If you think I am wrong, please do not hesitate to leave a comment and explain why.

They also are certainly not complete. I present here only what I think the most important heuristics are.

Biologists want to know distributions of proteins across space and time

Cell biology is concerned with understanding the structure and behavior of cells as complex phenomena that emerge from the interactions of molecules. The types of these molecules may be proteins, nucleic acids, lipids, or some other type. I will use the term "protein" generally because

  1. it's easier than always listing all the types of molecules,
  2. it's more precise than just saying "molecule"
  3. there are at least 10,000 different types of proteins in the human cell, making proteins a core building block of the cell, and
  4. these heuristics do not change if you substitute another type of molecule for the word "protein."

Within a single cell, we can model the number of proteins at a point in space and a point in time with a function \( N \left( \mathbf{r}, t \right) \). This is a spatiotemporal distribution for protein number density. Its dimensions are in numbers of proteins per unit of volume.

A single protein located at position \( \mathbf{r}_0 \left( t \right) \) can be represented as a point in space, i.e. a delta function:

$$ N \left( \mathbf{r}, t \right) = \delta \left( \mathbf{r} - \mathbf{r}_0 \left( t \right) \right) $$

Of course a real protein occupies some volume and is not a point, but at the level of a cell I think that this is an adequate model for what cell biologists want to know about protein distributions.

But wait. There are more than ten thousand types of proteins within the cell. Some proteins are of high abundance, and some are very rare. So it is not sufficient merely to count proteins in space and time: we also need to identify their types. For this, I assign a unique ID to each type that I call \( s \) for "species". Our model now becomes a function of another variable, one that is categorical rather than continuous:

$$ N \left( \mathbf{r}, t ; s\right) $$

Below I show a simplified schematic of the volume that this model occupies. It is simplified because I show only one spatial dimension (otherwise it would be a five dimensional hypervolume). I saw a figure like this once in a paper about a decade ago, but sadly I cannot find it to credit it. (Update 2025-01-30: The paper referred to is Megason and Fraser, "Imaging in Systems Biology," Cell 130(5), 784-795 (2007))

A cell can be represented as a "volume" of spatiotemporal distributions where one spatiotemporal slice belongs to each protein species.

The \( x \) and \( t \) dimensions are continuous; the \( s \) dimension is a discrete, categorical variable. Each \( s \) slice is the spatiotemporal distribution of that protein within a single cell. Each point in the volume is the protein density for that point in space and time and for that species of protein.

Individual proteins might also vary amongst themselves as in, for example, post translation modifications. This is not really a problem for the above model because we can use a different value for the \( s \) of each variant.

Creation and degradation of a protein

The appearance and disappearance of a protein is modeled as a non-zero value over a time range from \( t_0 \) to \( t_1 \). For example, a single protein existing over a finite time interval may be expressed as

$$ N \left( \mathbf{r}, t \right) = \begin{cases} \delta \left( \mathbf{r} - \mathbf{r}_0 \left( t \right) \right), & t_0 < t < t_1 \\ 0, & \text{otherwise} \end{cases} $$

Biologists can only measure slices of protein distributions

Look again at the figure above. Remember that there are in fact five dimensions. Can any one experimental technique measure the whole hypervolume?

No. Instead, biologists can measure slices from the volume and try to piece together a complete picture of \( N \) from individual measurements.

For example, fluorescence microscopy can measure proteins in space and time. Sometimes it can measure in 3 spatial dimensions, but it is easier to measure in 2. Unfortunately, it can only measure a small number of protein species relative to all the proteins that are in the cell. These would correspond to the different fluorescence channels of the measurement. Thus, fluorescence microscopy provides a slice of the volume that looks like the following:

Fluorescence microscopy measures a small slice of the spatiotemporal protein distribution that represents a cell.

Other types of measurements, such as those in single cell proteomics, might measure a large number of proteins but cannot resolve them in space and time. They would slice the volume like this:

So after their measurements, biologists are always going to be left with less than the total amount of information contained within a single cell because they can only measure slices of \( N \).
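
To make the idea of slicing concrete, here is a toy NumPy sketch of a discretized \( N \left( x, t; s \right) \) with one spatial dimension, as in the figure. The array sizes, the Poisson-distributed counts, and the species IDs are all made up for illustration; the point is only that the two kinds of measurements select different slices of the same array.

import numpy as np

# Toy, discretized N(x, t; s): S species, T time points, X spatial bins
S, T, X = 1000, 50, 64
rng = np.random.default_rng(0)
N = rng.poisson(1.0, size=(S, T, X))    # made-up protein counts per bin

# Fluorescence microscopy: a handful of species, resolved in space and time
channels = [3, 42, 137]                 # hypothetical species IDs
fluorescence_slice = N[channels, :, :]  # shape (3, T, X)

# Single-cell proteomics: many species, but no spatial or temporal resolution
proteomics_slice = N.sum(axis=(1, 2))   # shape (S,): one total count per species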

Organelles are mutually exclusive slices of protein distributions in space

Consider a mitochondrion. It is a membrane-bound organelle. Everything inside the mitochondrion is considered part of the organelle, and everything outside it is not. An organelle is therefore a mutually-exclusive slice through the volume dimensions of \( N \).

The slices are mutually-exclusive because two mitochondria cannot occupy the same volume at the same time and have a distinct identity.

For non-membrane bound organelles, keep reading.

Organelles contain many types of proteins

If we use the definition of an organelle as a mutually-exclusive slice through the volume dimensions of \( N \), then we can look sideways along the \( s \) dimension to find all the proteins that belong to the organelle. If the distribution for protein \( s_i \) is non-zero inside an organelle's volume at a given time, then it belongs to the organelle. The set of all proteins within the volume slice at a given time constitute the organelle.

Knowing \( N \) doesn't by itself tell us what is and is not an organelle

Membrane-bound organelles are easy to identify because of their structure. Other sets of proteins within a given volume may or may not form an organelle. In these situations, we might look at their function instead to decide whether the volume is or is not occupied by an organelle.

For example, organelles like the centrosome have a diffuse, pericentriolar material that surrounds them. In this case, the border defining what is and isn't inside the centrosome is likely to be somewhat arbitrary.

Cause and Effect is the probability of one protein distribution given another

At this time, I am much less certain about how function and causal relationships fit within this model. It is nevertheless important because biologists are deeply interested in the function of proteins and other complexes. To a rough approximation, I would say that cause and effect describes how the spatiotemporal distributions of a subset of protein species can serve as a predictor of another distribution at a later time. In other words, we can assign a probability to a certain distribution given another one.

I would guess that not every possible set of proteins is linked by causal relationships. This would mean that the limitations that come from only being able to sample slices of the protein distribution hypervolume are not so significant. You would then want to choose your measurements so that you slice the volume to include only the species that are causally linked for the phenomenon that you are studying.

As a consequence of this, the causal links between distributions are likely more important than knowing \( N \) itself. I doubt that we would have a satisfactory understanding of the cell even if we could exhaustively measure \( N \) for a single cell.

Interactions between proteins require spatial colocalization

Protein-protein interactions occur on length-scales on the order of the size of individual proteins. For two different proteins to "interact," we require that they be separated by less than this distance. Colocalization means that two proteins are closer together than the distance required for an interaction to occur.

Furthermore, colocalization is necessary but not sufficient for an interaction. A real interaction involves the chemistry between the two different species.

Summary

In summary, my main three heuristics for meta-biophysics of the cell are:

  1. Biologists want to know distributions of proteins across space and time
  2. Organelles are mutually exclusive slices of protein distributions in space
  3. Cause and Effect is the probability of one or more protein distributions given another

I find that nearly all the problems of the cell biologists that I work with can be reformulated into this language. I emphasize again that the "real" science is being done by the biologists, and I in no way mean to diminish the complexity of their work. These heuristics are merely a tool that I use to understand what they are doing when I myself am unfamiliar with their jargon and mental models.

Coordinate Systems for Modeling Microscope Objectives

A common model for infinity corrected microscope objectives is that of an aplanatic and telecentric optical system. In many developments of this model, emphasis is placed upon the calculation of the electric field near the focus. However, this has the effect that the definition of the coordinate systems and geometry are conflated with the determination of the fields. In addition, making the model amenable to computation often occurs as an afterthought.

In this post I will explore the geometry of an aplanatic system for modeling high NA objectives with an emphasis on computational implementations. My approach follows Novotny and Hecht1 and Herrera and Quinto-Su2.

The Model Components

The model system is illustrated below:

A high NA, infinity corrected microscope objective as an aplanatic and telecentric optical system.

In this model, we abstract over the details of the objective by representing it as four surfaces:

  1. A back focal plane containing an aperture stop
  2. A back principal plane, \( P \)
  3. A front principal surface, \( P' \)
  4. A front focal plane

The space to the left of the back principal plane is called the infinity space. The space to the right of the front principal surface is called the sample space.

We let the infinity space refractive index \( n_1 = 1 \) because it is in air. The refractive index \( n_2 \) is the refractive index of the immersion medium.

The unit vectors \( \mathbf{n} \) are not used in this discussion; they are relevant for computing the fields.

Assumptions

We make one assumption: the system obeys the sine condition. The meaning of this will be explained later.

An aplanatic system is one that obeys the sine condition.

We will not assume the intensity law to conserve energy because it is only necessary when computing the electric field near the focus.

The Aperture Stop and Back Focal Plane

The aperture stop (AS) of an optical system is the element that limits the angle of the marginal ray.

The system is telecentric because the aperture stop is located in the back focal plane (BFP). We can shape the focal field by spatially modulating any of the amplitude, phase, or polarization of the incident light in a plane conjugate to the BFP.

The Back Principal Plane

This is the plane in infinity space at which rays appear to refract. It is a plane because rays coming from a point in the front focal plane all emerge into the infinity space in the same direction.

Strictly speaking, focus field calculations require us to propagate the field from the AS to the back principal plane before computing the Debye diffraction integral, but this step is often omitted3. The assumptions of paraxial optics should hold here.

The Front Principal Surface

The front principal surface is the surface at which rays appear to refract in the sample space. It is a surface because

  1. this is a non-paraxial system, and
  2. we assumed the sine condition.

The sine condition states that refraction of a ray coming from an on-axis point in the front focal plane occurs on a spherical cap centered upon the focal point. The distance from the optical axis of the point of intersection of the ray with the surface is proportional to the sine of the angle that the ray makes with the axis.

The principal surface is in the far field of the electric field coming from the focal region. For this reason, each point on this surface can be taken to represent a single ray or a plane wave1.

The Front Focal Plane

This plane is located a distance \( n_2 f \) from the principal surface4. It is not at a distance \( f \) from this surface. This is a result of imaging in an immersion medium.

Geometry and Coordinate Systems

The Aperture Stop Radius

The aperture stop radius \( R \) corresponds to the distance from the axis to the point where the marginal ray intersects the front principal surface. In the sample space, the marginal ray travels at an angle \( \theta_{max} \) with respect to the axis.

Under the sine condition, this height is

$$ R = n_2 f \sin{ \theta_{max} } = f \, \text{NA} $$

The right-most expression uses the definition of the numerical aperture \( \text{NA} \equiv n_2 \sin{ \theta_{max} } \).

Compare this result to the oft-cited expression for the entrance pupil diameter of an objective lens: \( D = 2 f \, \text{NA} \). They agree, since the diameter is just twice the radius \( R \). This makes sense because an entrance pupil is either

  1. an image of an aperture stop, or
  2. a physical stop.

The Back Principal Plane

There are two independent coordinate systems in the back principal plane:

  1. the spatial coordinate system defining the far field positions \( \left( x_{\infty} , y_{\infty} \right) \), and
  2. the coordinate system of the angular spectrum of plane waves \( \left( k_x, k_y \right) \).

The Far Field Coordinate System

The far field coordinate system may be written in Cartesian form as \( \left( x_{\infty} , y_{\infty} \right) \). It also has a cylindrical representation as

$$\begin{eqnarray} \rho &=& \sqrt{x_{\infty}^2 + y_{\infty}^2} \\ \phi &=& \arctan \left( \frac{y_{\infty}}{x_{\infty}} \right) \end{eqnarray}$$

The cylindrical representation appears to be preferred in textbook developments of the model. The Cartesian representation is likely preferred for computational models because it works naturally with two-dimensional arrays of numbers, and because beam shaping elements such as spatial light modulators are rectangular arrays of pixels2.

The Angular Spectrum Coordinate System

Each point in the angular spectrum coordinate system represents a plane wave in the sample space that is traveling at an angle \( \theta \) to the axis according to:

$$\begin{eqnarray} k_x &=& k \sin \theta \cos \phi \\ k_y &=& k \sin \theta \sin \phi \\ k_z &=& k \cos \theta \end{eqnarray}$$

where \( k = 2 \pi n_2 / \lambda = n_2 k_0 \).

Along the y-axis ( \( x_{\infty} = 0 \) ), the maximum value of \( k_y \) is \(n_2 k_0 \sin \theta_{max} = k_0 \, \text{NA} \).

Substitute in the expression \( \text{NA} = R / f \) and we get \(k_{y, max} = k_0 R / f\). But \( R = y_{\infty, max} \). This (and similar reasoning for the x-axis) implies that:

$$\begin{eqnarray} k_x &=& k_0 x_{\infty} / f \\ k_y &=& k_0 y_{\infty} / f \end{eqnarray}$$

The above equations link the angular spectrum coordinate system to the far field coordinate system. They are no longer independent once \( f \) and \( \lambda \) are specified.

Numerical Meshes

There are four free parameters for defining the coordinate systems of the numerical meshes:

  1. The numerical aperture, \( \text{NA} \)
  2. The wavelength, \( \lambda \)
  3. The focal length, \( f \)
  4. The linear mesh size, \( L \)

Below is a figure that illustrates the construction of the meshes. Both the far field and angular spectrum coordinate systems are represented by an \( L \times L \) array. \( L = 16 \) in the figure below. In general the value of \( L \) should be a power of 2 to help ensure the efficiency of the Fast Fourier Transform (FFT). By considering only powers of 2, we need only consider arrays of even size as well.

A numeric mesh representing the far field and angular spectrum coordinate systems of a microscope objective. Fields are sampled at the center of each mesh pixel.

The fields are defined on a region of circular support that is centered on this array. The radius of the domain of the far field coordinate system is \( f \text{NA} \); the radius of the domain of the angular spectrum coordinate system is \( k_0 \text{NA} \).

The boxes that are bound by the gray lines indicate the location of each field sample. The \( \left( x_{\infty} , y_{\infty} \right) \) and the \( \left( k_x, k_y \right) \) coordinate systems are sampled at the center of each gray box. The origin is therefore not sampled, which will help avoid division by zero errors when the fields are eventually computed.

The figure suggests that we could create only one mesh and scale it by either \( f \text{NA} \) or \( k_0 \text{NA} \) depending on which coordinate system we are working with. The normalized coordinates become \( \left( x_{\infty} / \left( f \text{NA} \right), y_{\infty} / \left( f \text{NA} \right) \right) \) and \( \left( k_x / \left( k_0 \text{NA} \right), k_y / \left( k_0 \text{NA} \right) \right) \).

1D Mesh Example

As an example, let \( L = 16 \). To four decimal places, the normalized coordinates are \( -1.0000, -0.8667, \ldots, -0.0667, 0.0667, \ldots, 0.8667, 1.0000 \).

The spacing between array elements is \( 2 / \left( L - 1 \right) = 0.1333 \). Note that 0 is not included in the 1D mesh as it goes from -0.0667 to 0.0667.

A 2D mesh is easily constructed from the 1D mesh using tools such as NumPy's meshgrid.
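
Here is a minimal NumPy sketch of the mesh construction described above. The particular values of \( \text{NA} \), \( \lambda \), and \( f \) are placeholders; only the structure of the mesh follows from the discussion.

import numpy as np

L = 16                    # linear mesh size (a power of 2)
NA = 1.2                  # numerical aperture (placeholder value)
wavelength = 0.532e-6     # wavelength in meters (placeholder value)
f = 3.0e-3                # focal length in meters (placeholder value)
k0 = 2 * np.pi / wavelength

# Normalized 1D coordinates: L points from -1 to 1 with spacing 2 / (L - 1); the origin is excluded
coords = np.linspace(-1.0, 1.0, L)
X, Y = np.meshgrid(coords, coords)

# Scale the same normalized mesh to the two coordinate systems
x_inf, y_inf = f * NA * X, f * NA * Y    # far field coordinates
kx, ky = k0 * NA * X, k0 * NA * Y        # angular spectrum coordinates

# Circular region of support on which the fields are defined
support = X**2 + Y**2 <= 1.0

Scaling the same normalized mesh by either \( f \text{NA} \) or \( k_0 \text{NA} \) automatically satisfies the link \( k_x = k_0 x_{\infty} / f \) derived above.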

Back Principal Plane Mesh Spacings

In the x-direction, the mesh spacing of the far field coordinate system is

$$ \Delta x_{\infty} = 2 R / \left( L - 1 \right) = 2 f \text{NA} / \left( L - 1 \right) $$

In the \( k_x \)-direction, the mesh spacing of the angular spectrum coordinate system is

$$ \Delta k_x = 2 k_{max} / \left( L - 1 \right) = 2 k_0 \text{NA} / \left( L - 1 \right) $$

Note the symmetry between these two expressions. One scales with \( f \text{NA} \) and the other \( k_0 \text{NA} \). Recall that these are free parameters of the model.

Sample Space Mesh Spacing

It is interesting to compute the spacing between mesh elements \( \Delta x \) in the sample space when the fields are eventually computed.

The sampling angular frequency in the sample space is \( k_S = 2 \pi / \Delta x \).

The Nyquist-Shannon sampling theorem states that the maximum informative angular frequency is \( k_{max} = k_S / 2 \).

From the previous section, we know that \( k_{max} = \left(L - 1 \right) \Delta k_x / 2 \), and that \( \Delta k_x = 2 k_0 \text{NA} / \left( L - 1 \right) \).

Combining all the previous expressions and simplifying, we get:

$$\begin{eqnarray} k_S &=& 2 k_{max} \\ 2 \pi / \Delta x &=& \left(L - 1 \right) \Delta k_x \\ 2 \pi / \Delta x &=& \left(L - 1 \right) \left[ 2 k_0 \text{NA} / \left( L - 1 \right) \right] \\ 2 \pi / \Delta x &=& \left(L - 1 \right) \left[ 2 \left(2 \pi / \lambda \right) \text{NA} / \left( L - 1 \right) \right] \end{eqnarray}$$

Solving the above expression for \( \Delta x \), we arrive at

$$ \Delta x = \frac{\lambda}{2 \text{NA}} $$

which is of course the Abbe diffraction limit.

Effect of not Sampling the Origin

Herrera and Quinto-Su2 point out that an error will be introduced if we naively apply the FFT to compute the field components in the \( \left( k_x, k_y \right) \) coordinate system because the origin is not sampled, whereas the FFT assumes that we sample the zero frequency component. The effect is that the result of the FFT has a constant phase error that accounts for a half-pixel shift in each direction of the mesh.

Consider again the 1D mesh example with \(L = 16 \): \( -1.0000, -0.8667, \ldots, -0.0667, 0.0667, \ldots, 0.8667, 1.0000 \)

In Python and other languages that index arrays starting at 0, the origin is located at \(L / 2 - 0.5 \), i.e. halfway between the samples at index 7 and 8. A lateral shift in Fourier space is equivalent to a phase shift in real space:

$$ \phi_{shift} \left(X, Y \right) = -j 2 \pi \frac{0.5}{L} X - j 2 \pi \frac{0.5}{L} Y $$

where \( X \) and \( Y \) are normalized coordinates.

At this point, I am uncertain whether the phasor with the above argument needs to be multiplied or divided with the result of the FFT because 1. there are a few typos in the signs for the coordinate system bounds in the manuscript of Herrera and Quinto-Su, and 2. the correction was developed for use in MATLAB, which indexes arrays starting at 1. Once the fields are computed, it would be easy to verify the correct sign of the phase terms following the procedure outlined in Figure 3 of Herrera and Quinto-Su's manuscript.

Structure of the Algorithm

The algorithm to compute the focus fields will proceed as follows:

  1. (optional) Propagate the input fields from the AS to the back principal plane using paraxial wave propagation
  2. Input the sampled fields in the back principal plane in the \( \left( x_{\infty}, y_{\infty} \right) \) coordinate system
  3. Transform the fields to the \( \left( k_x, k_y \right) \) coordinate system
  4. Compute the fields in the \( \left(x, y, z \right) \) coordinate system using the FFT

Additional Remarks

  • Zero padding the mesh will decrease the sample space spacing \( \Delta x \) below the Abbe limit, but since the fields remain zero outside of the support, no new information is added.
  • On the other hand, zero padding might be required when computing fields going from the sample space to the back principal plane to faithfully reproduce any evanescent components.
  • Separating the coordinate system and mesh construction from the calculation of the fields reveals that the two assumptions of the model belong separately to each part. The sine condition is used in the construction of the coordinate systems, whereas energy conservation is used when computing the fields.
  • This post did not explain how to compute the fields.
  • Herrera and Quinto-Su (and possibly also Novotny and Hecht) appear to use an "effective" focal length which can be obtained by multiplying the one that I use by the sample space refractive index. I prefer my formulation because it is consistent with geometric optics and the well-known expression for the diameter of an objective's entrance pupil. When the fields are calculated, however, I do not yet know whether the arguments of the phasors of the Debye integral will require modification.

  1. Lukas Novotny and Bert Hecht, "Principles of Nano-Optics," Cambridge University Press (2006). https://doi.org/10.1017/CBO9780511813535 

  2. Isael Herrera and Pedro A. Quinto-Su, "Simple computer program to calculate arbitrary tightly focused (propagating and evanescent) vector light fields," arXiv:2211.06725 (2022). https://doi.org/10.48550/arXiv.2211.06725 

  3. Marcel Leutenegger, Ramachandra Rao, Rainer A. Leitgeb, and Theo Lasser, "Fast focus field calculations," Opt. Express 14, 11277-11291 (2006). https://doi.org/10.1364/OE.14.011277 

  4. Sun-Uk Hwang and Yong-Gu Lee, "Simulation of an oil immersion objective lens: A simplified ray-optics model considering Abbe’s sine condition," Opt. Express 16, 21170-21183 (2008). https://doi.org/10.1364/OE.16.021170 

GitHub CLI Authorization with a Fine-grained Access Token

It is a good idea to use fine-grained access tokens for shared PCs in the lab that require access to private GitHub repos so that you can restrict the scope of their use to specific repositories and not use your own personal SSH keys on the shared machines. I am experimenting with the GitHub command line tool gh to authenticate with GitHub using fine-grained access tokens and make common remote operations on repos easier.

Today I encountered a subtle problem in the gh authentication process. If you set the protocol to ssh during login, then you will not have access to the repos that you granted permissions to in the fine-grained access token. This can lead to a lot of head scratching because it's not at all clear which permissions map to which git operations. In other words, what you think is a specific permissions error with the token is actually an authentication error.

To avoid the problem, be sure to specify https and not ssh as the protocol during authentication:

 echo "$ACCESS_TOKEN" | gh auth login -p https --with-token

Raspberry Pi I2C Quickstart

Below are my notes concerning the control of a Sparkfun MCP4725 12-bit DAC over I2C with a Raspberry Pi.

Raspberry Pi Setup

  1. Enable the I2C interface with raspi-config if it isn't already enabled. Verify that the I2C device file(s) are present in /dev/ with ls /dev | grep i2c. (I had two files: i2c-1 and i2c-2.)
  2. Install the i2c-tools package for debugging I2C interfaces.
sudo apt update && sudo apt install -y i2c-tools

i2cdetect

Attach the DAC to the Raspberry Pi. The pinout is simple:

Raspberry Pi MCP4725
GND GND
3.3V Vcc
SCL SCL
SDA SDA

Next, run the command i2cdetect -y 1. This will check for a device on bus 1 (/dev/i2c-1); the -y flag skips the interactive confirmation prompt:

leb@raspberrypi:~/$ i2cdetect -y 1
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:                         -- -- -- -- -- -- -- --
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
60: 60 -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
70: -- -- -- -- -- -- -- --

Each I2C device must have a unique 7-bit address, i.e. 0x00 to 0x7f. The ranges [0x00, 0x07] and [0x78, 0x7f] are reserved. The above output indicates the DAC is at address 0x60. (Rows are the value of the first hexadecimal number of the address, columns are the second.)

i2cset

i2cset is a command line tool that is part of i2c-tools and that is used to write data to I2C devices. I can set the voltage output of the DAC to 0 as follows:

i2cset -y 1 0x60 0x40 0x00 0x00 i

The arguments mean the following:

  • -y : skip the interactive confirmation prompt
  • 1 : the I2C bus number, i.e. /dev/i2c-1
  • 0x60 : the address of the DAC found with i2cdetect
  • 0x40 : the command byte (explained below)
  • 0x00 0x00 : the two data bytes (explained below)
  • i : write the data bytes as an I2C block

Command byte

The command byte is explained on pages 23 and 25 of the MCP4725 datasheet. From most-significant to least-significant bits, the bits mean:

  1. C2 : command bit
  2. C1 : command bit
  3. C0 : command bit
  4. X : unused
  5. X : unused
  6. PD1 : Power down select
  7. PD0 : Power down select
  8. X : unused

According to Table 6-2 and Figure 6-2, C2, C1, C0 = 0, 1, 0 identifies the command to write to the DAC register and NOT also to the EEPROM. In normal operation, the power down bits are 0, 0 (page 28).

So, to write to the DAC register, we want to send 0b01000000 which in hexadecimal is 0x40.
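
As a quick sanity check, the command byte can be assembled from the bit fields listed above. This is just illustrative Python; the field values come from the datasheet tables already cited.

# C2 C1 C0 = 0, 1, 0 (write DAC register only); PD1 PD0 = 0, 0 (normal operation)
command_byte = (0b010 << 5) | (0b00 << 1)
assert command_byte == 0x40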

Data bytes to voltage

The data bytes are explained in Figure 6-2 of the datasheet. The first byte contains bits 11-4, and the second byte bits 3-0 in the most-significant bits:

D11 D10 D9 D8 D7 D6 D5 D4 | D3 D2 D1 D0 X X X X

12 bits are used because this is a 12-bit DAC. The mapping between data bytes and output voltage is:

Data bytes (hex) | Data bytes (decimal) | Voltage
0x00 0x00 | 0 | 0
0xFF 0xF0 | 65520 | V_max

where V_max is the voltage supplied to the chip's Vcc pin (3.3V in my case). The output step size is \( \Delta V = V_{max} / 4096 \) or about 0.8 mV.
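
To make the mapping in the table explicit, here is a hypothetical helper that converts the two data bytes back into the expected output voltage. It assumes V_MAX is the 3.3 V supply and follows my reading of the data format above.

V_MAX = 3.3  # supply voltage on Vcc


def dac_bytes_to_volts(byte0: int, byte1: int) -> float:
    """Convert the two data bytes into the expected DAC output voltage."""
    code = ((byte0 << 4) | (byte1 >> 4)) & 0xfff  # recover the 12-bit DAC code
    return V_MAX * code / 4096


assert dac_bytes_to_volts(0x00, 0x00) == 0.0
assert abs(dac_bytes_to_volts(0xFF, 0xF0) - V_MAX) < 1e-3  # one step below V_max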

Control via Python

This is modified from Sparkfun's tutorial and uses the smbus Python bindings. Be aware that the tutorial example has a bug in how it prepares the list of bytes to send to the DAC.

import smbus


OUTPUT_MAX: int = 4095
V_MAX = 3.3


def send(output: float, channel: int = 1, device_address: int = 0x60, command_byte: int = 0x40):
    assert 0.0 <= output <= 1.0, "Output voltage must be expressed as a fraction of the maximum in the range [0.0, 1.0]"

    bus = smbus.SMBus(channel)

    output_bytes = int(output * OUTPUT_MAX) & 0xfff
    data_byte_0: int = (output_bytes & 0xff0) >> 4  # First data byte
    data_bytes: list[int] = [data_byte_0, (output_bytes & 0xf) << 4]  # Second data byte

    bus.write_i2c_block_data(device_address, command_byte, data_bytes)


if __name__ == "__main__":
    output: float = 0.42
    send(output)

    print(f"Estimated output: {output * V_MAX}")

Misc.

Basic Calculator bc

This is a command line calculator and can be used for hexadecimal, binary, and decimal conversions. Install with apt install bc.

# Convert 0x40 to binary
echo "ibase=16; obase=2; 40" | bc

# Convert 0x40 to decimal
echo "ibase=16; 40" | bc

Note that hexadecimal values must be uppercase, e.g. 0xC7, not 0xc7!

Persist Internet Connection Sharing after Reboot

In my previous post I wrote about how to use Microsoft's Internet Connection Sharing to share an internet connection on a Windows machine with a Raspberry Pi. Unfortunately, I learned that the ICS service settings do not persist after the Windows machine reboots, and as a result the ICS connection is lost.

The fix is explained in this Microsoft Learn page.

To fix the issue, add a key in the Windows registry with the following information:

  • Path: HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\SharedAccess
  • Type: DWORD
  • Setting: EnableRebootPersistConnection
  • Value: 1

I then had to reset the shared connection by unchecking and rechecking the boxes in the Sharing tab of the internet connection as explained previously. After a reboot, I confirmed that I could connect to the Pi without manually re-enabling ICS.

Internet Connection Sharing for Raspberry Pi Setups

Today I decided to set up an old Raspberry Pi 3B+ for a task in the lab. After burning the latest Raspberry Pi OS Lite image on the SD card, I booted it up and was faced with the unfortunately common problem of network access. It would have taken days to get IT to register the Pi's MAC address on our system, and I did not want to wait that long.

Luckily, I had a spare network crossover cable and an extra ethernet interface on my Windows work laptop, so I plugged the Pi directly into the laptop and enabled Microsoft Internet Connection Sharing (ICS) between the network connection through which I was connected to the internet and the connection to the Pi. In my specific example:

  1. Press the Windows key and navigate to View network connections
  2. Right click on my internet connection (Ethernet 2 in my case), select Properties..., and then the Sharing tab.
  3. Check Allow other network users to connect... and in the Home networking connection: dropdown, select the connection corresponding to the Pi (Ethernet in my case).
  4. Check Allow other network users to control.... I'm not sure whether this is necessary.

Click OK and restart the Pi if it's already connected. Once it restarts, it should now have internet access through the laptop.

Next I wanted to connect with SSH to the Pi from my laptop and I needed to know the Pi's IP address. Luckily, ICS uses the mshome.net domain name for the network, and the Raspberry Pi by default has the hostname raspberrypi. So getting the IP is as easy as running the ping raspberrypi.mshome.net command in PowerShell.

The Mono16 Format and Flir Cameras

For a long time I had found the Mono16 image format of Flir's cameras a bit strange. In the lab I have several Flir cameras with 12-bit ADC's, but the images they output in Mono16 would span a range from 0 to around 65535. How does the camera map a 12-bit number to a 16-bit number?

If you search for the Mono16 format you will find that it's a padded format. This means that, in the 12-bit ADC example, 4 bits in each pixel are always 0, and the remaining 12 bits represent the pixel's value. But this means that we should get pixel values only between 0 and 2^12 - 1, or 4095. So how is it that we can saturate one of these cameras with values near 65535?

Today it occurred to me that Flir's Mono16 format might not use all the values in the range [0, 65535]. This is indeed the case, as I show below with an image stack that I acquired from one of these cameras:

>>> sorted_unique_pixels = np.unique(images.ravel())
>>> np.unique(np.diff(sorted_unique_pixels))
array([ 16,  32,  48,  64,  96, 144], dtype=uint16)

This prints all the possible, unique differences between the sorted and flattened pixel values in my particular image stack. Notice how they are all multiples of 16?

Let's look also at the sorted array of unique values itself:

>>> sorted_unique_pixels
array([ 5808,  5824,  5856, ..., 57312, 57328, 57472], dtype=uint16)

There are more than a million pixels in the image stack, yet they all take values that are integer multiples of 16.

It looks like Flir's Mono16 format rescales the camera's output onto the interval [0, 65535] by introducing "gaps" between the allowed values equal to 2^(16 - N), where N is the bit-depth of the camera's ADC.

But wait just a moment. Above I said that 4 bits in the Mono16 are zero, but I assumed that these were the most significant bits. If the least significant bits are the zero padding, then the allowed pixel values would be, for example, 0000 0000 = 0, 0001 0000 = 16, 0010 0000 = 32, 0011 0000 = 48, etc. (Here I ignored the first 8 bits for clarity.)

So it appears that Flir is indeed padding the 12-bit ADC data with 0's in its Mono16 format. But, somewhat counter-intuitively, it is the four least significant bits that are the zero padding. I say this is counter-intuitive because I have another camera that pads the most significant bits, so that the maximum pixel value is really 2^N - 1, with N being the ADC's bit-depth.
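
If that interpretation is correct, then recovering the native ADC codes from a Mono16 stack should just be a matter of dropping the four zero-padded least-significant bits. A quick check in the same style as the snippets above, assuming images is the same uint16 NumPy array:

>>> adc_codes = images >> 4  # drop the four zero-padded least-significant bits
>>> unique_codes = np.unique(adc_codes)

The differences between neighboring values of unique_codes should then be the multiples of 16 found earlier divided by 16, and no code should exceed 2^12 - 1 = 4095.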