
Candle Flame Oscillations as a Clock

By: cpldcpu

Today's candles have been optimized over millennia not to flicker. But it turns out that when we bundle three of them together, we can undo all of these optimizations and the resulting triplet will start to oscillate naturally. A fascinating fact is that the oscillation frequency is rather stable at ~9.9 Hz, as it mainly depends on gravity and the diameter of the flame.

We use a rather unusual approach to detect this frequency and divide it down to 1 Hz: a wire suspended in the flame senses capacitance changes caused by the ionized gases.

Introduction

Candlelight is a curious thing. Candles seem to have a life of their own: the brightness wanders, they flicker, and they react to the faintest motion of air.

People have always been curious about how candle flames work and behave. In recent years, many have also sought to emulate this behavior with electronic light sources. I have been fascinated by this as well and have tried to understand both real candles and how artificial candles work.

Now, it’s a curious thing that we try to emulate the imperfections of candles. After all, candle makers have worked for centuries (and millennia) on optimizing candles NOT to flicker.

In essence: The trick is that there is a very delicate balance in how much fuel (the molten candle wax) is fed into the flame. If there is too much, the candle starts to flicker even when undisturbed. This is controlled by how the wick is made.

Candle Triplet Oscillations

Now, there is a particularly fascinating effect that has more recently been the subject of publications in scientific journals[1,2]: when several candles are brought close to each other, they start to “communicate” and their behavior synchronizes. The simplest demonstration is to bundle three candles together; they will behave like a single large flame.

So, what happens with our bundle of three candles? It will basically undo millennia of candle technology optimization to avoid candle flicker. If left alone in motionless air, the flames will suddenly start to rapidly change their height and begin to flicker. The image below shows two states in that cycle.

Two states of the oscillation cycle in bundled candles

We can also record the brightness variation over time to understand this process better. In this case, a high-resolution ambient light sensor was used to sample the flicker over time. (This was part of a more comprehensive set of experiments conducted a while ago, which are still unpublished.)

Plotting the brightness evolution over time shows that the oscillations are surprisingly stable, as shown in the image below. We can see a very nice sawtooth-like signal: the flame slowly grows larger until it collapses and the cycle begins anew. You can see a video of this behavior here (which, unfortunately, cannot be embedded properly due to WordPress…).

Left: Brightness variation over time showing sawtooth pattern.
Right: Power spectral density showing stable 9.9 Hz frequency

On the right side of the image, you can see the power spectral density plot of the brightness signal on the left. The oscillation is remarkably stable at a frequency of 9.9 Hz.

This is very curious. Wouldn’t you expect more chaotic behavior, considering that everything else about flames seems so random?

The phenomenon of flame oscillations has baffled researchers for a long time. Curiously, they found that the oscillation frequency of a candle flame (or rather a “wick-stabilized buoyant diffusion flame”) depends mainly on just two variables: gravity and the dimensions of the fuel source. A comprehensive review can be found in Xia et al.[3].

Now that is interesting: gravity is rather constant (on Earth) and the dimensions of the fuel source are defined by the size (diameter) of the candles and possibly their proximity. This leaves us with a fairly stable source of oscillation, or timing, at approximately 10 Hz. Could we use the 9.9 Hz oscillation to derive a time base?

Sensing Candle Frequencies with a Phototransistor

Now that we have a source of stable oscillations—mind you, FROM FIRE—we need to convert them into an electrical signal.

The previous investigation of candle flicker was based on an I²C light sensor to sample the light signal. This provides a very high SNR, but is comparatively complex and adds latency.

A phototransistor provides a simpler option. Below you can see the setup with a phototransistor in a 3mm wired package (arrow). Since the phototransistor has internal gain, it provides a much higher current than a photodiode and can be easily picked up without additional amplification.

Phototransistor setup with sensing resistor configuration

The phototransistor was connected via a sensing resistor to a constant voltage source, with the oscilloscope connected across the sensing resistor. The output signal was quite stable and showed a nice ~9.9 Hz oscillation.

In the next step, this could be connected to an ADC input of a microcontroller to process the signal further. But curiously, there is also a simpler way of detecting the flame oscillations.

Capacitive Flame Sensing

Capacitive touch peripherals are part of many microcontrollers and can be implemented easily with an integrated ADC, either by measuring the discharge rate against an integrated pull-up resistor or by a charge-sharing approach in a capacitive ADC.
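To build some intuition for the discharge-timing approach, here is a minimal, hardware-free C sketch (my own illustration, not the CH32fun implementation): it models an RC discharge and counts loop iterations until the voltage crosses the logic threshold, showing how a small change in capacitance shows up as a different count. All component values are hypothetical.

#include <stdio.h>

// Count simulated time steps until an RC discharge falls below the input threshold.
// The count is roughly proportional to the capacitance on the pin.
static int discharge_ticks(double C, double R, double vdd, double vth, double dt)
{
    double v = vdd;
    int ticks = 0;
    while (v > vth) {
        v -= (v / (R * C)) * dt;   // explicit-Euler step of dv/dt = -v/(RC)
        ticks++;
    }
    return ticks;
}

int main(void)
{
    const double R = 50e3, VDD = 3.3, VTH = 1.1, DT = 1e-9;
    // Hypothetical values: a flame near the sense wire nudges the effective capacitance.
    printf("no flame: %d ticks\n", discharge_ticks(10e-12, R, VDD, VTH, DT));
    printf("in flame: %d ticks\n", discharge_ticks(11e-12, R, VDD, VTH, DT));
    return 0;
}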

While this is not the most obvious way of measuring changes in a flame, some variation is to be expected. The heated flame, with all its combustion products, contains ionized molecules to some degree and is likely to have different dielectric properties than the surrounding air, which will show up as either a change of capacitance or increased electrical loss. A quick internet search also revealed publications on capacitance-based flame detectors.

A CH32V003 microcontroller with the CH32fun environment was used for the experiments. The setup is shown below: the microcontroller is located on the small PCB to the left. The capacitance is sensed between a wire suspended in the flame (the scorched one) and a ground wire wound around the candle. The setup is completed with an LED as an output.

Complete capacitive sensing setup with CH32V003 microcontroller, candle triplet, and an LED.

Initial attempts with two wires in the flame did not yield better results and the setup was mechanically much more unstable.

Readout was implemented in a straightforward way using the TouchADC function that is part of CH32fun. This function measures the capacitance on an input pin by charging it to a voltage and measuring the voltage decay while it is discharged via a pull-up/pull-down resistor. To reduce noise, it was necessary to average 32 measurements.

// Enable GPIOA, GPIOD, GPIOC and the ADC
RCC->APB2PCENR |= RCC_APB2Periph_GPIOA | RCC_APB2Periph_GPIOD | RCC_APB2Periph_GPIOC | RCC_APB2Periph_ADC1;

InitTouchADC();
...

int iterations = 32;
sum = ReadTouchPin( GPIOA, 2, 0, iterations );

First attempts confirmed that the concept works. The sample trace below shows sequential measurements of a flickering candle until it was blown out at the end, as signified by the steep drop in the signal.

The signal is noisier than the optical signal and shows more baseline wander and amplitude drift—but we can work with that. Let’s put it all together.

Capacitive sensing trace showing candle oscillations and extinction

Putting everything together

Additional digital signal processing is necessary to clean up the signal and extract a stable 1 Hz clock reference.

The data traces were recorded with a Python script from the monitor output and saved as CSV files. A separate Python script was used to analyze the data and prototype the signal-processing chain. The sample rate is limited to around ~90 Hz by the overhead of printing data via the debug output, but this turned out to be sufficient for this case.

The image above shows an overview of the signal chain. The raw data (after 32x averaging) is shown on the left. The signal is filtered with an IIR filter to extract the baseline (red). The middle figure shows the signal with the baseline removed and zero-cross detection. The zero-cross detector tags the first sample after a negative-to-positive transition, with a short dead time to prevent it from latching onto noise. The right plot shows the PSD of the overall and high-pass filtered signal: despite the wandering input signal, we get a sharp ~9.9 Hz peak at the main frequency.

A detailed zoom-in of raw samples with baseline and HP filtered data is shown below.

The inner loop code is shown below, including the implementation of the IIR filter, HP filter, and zero-crossing detector. Conversion from 9.9 Hz to 1 Hz is implemented using a fractional counter, and the output is used to blink the attached LED. Alternatively, a more advanced implementation using a software DPLL might provide a bit more stability in case of excessive noise or missing zero crossings, but this was not attempted for now.
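To see how the constants below produce 1 Hz: each detected zero crossing adds interval = 65536 / 9.9 ≈ 6620 to the time accumulator, and the LED toggles whenever the accumulator passes 32768, i.e. every 32768 / 6620 ≈ 4.95 flicker periods. At 9.9 Hz that is one toggle every ~0.5 s, or a full on/off blink cycle at 1 Hz.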

const int32_t led_toggle_threshold = 32768;  // Toggle LED every 32768 time units (0.5 second)
const int32_t interval = (int32_t)(65536 / 9.9); // 9.9Hz flicker rate
...

sum = ReadTouchPin( GPIOA, 2, 0, iterations );

if (avg == 0) { avg = sum;} // initialize the accumulator on the first run
avg = avg - (avg>>5) + sum; // IIR low-pass: avg accumulates ~32x the baseline
hp = sum - (avg>>5); // high-pass filter: subtract the baseline estimate (avg/32)

// Zero crossing detector with dead time
if (dead_time_counter > 0) {
    dead_time_counter--;  // Count down dead time
    zero_cross = 0;  // No detection during dead time
} else {
    // Check for positive zero crossing (sign change)
    if ((hp_prev < 0 && hp >= 0)) {
        zero_cross = 1;  
        dead_time_counter = 4;  
        time_accumulator += interval;  
        
        // LED blinking logic using time accumulator
        // Check if time accumulator has reached LED toggle threshold
        if (time_accumulator >= led_toggle_threshold) {
            time_accumulator = time_accumulator - led_toggle_threshold;  // Subtract threshold (no modulo)
            led_state = led_state ^ 1;  // Toggle LED state using XOR
            
            // Set or clear PC4 based on LED state
            if (led_state) {
                GPIOC->BSHR = 1<<4;  // Set PC4 high
            } else {
                GPIOC->BSHR = 1<<(16+4);  // Set PC4 low
            }
        }
    } else {
        zero_cross = 0;  // No zero crossing
    }
}

hp_prev = hp;

Finally, let’s marvel at the result again! You can see the candle flickering at 10 Hz and the LED next to it blinking at 1 Hz! The framerate of the GIF is unfortunately limited, which causes some aliasing. You can see a higher framerate version on YouTube or the original file.

That’s all for our journey from undoing millennia of candle-flicker-mitigation work to turning this into a clock source that can be sensed with a bare wire and a microcontroller. Back to the decade-long quest to build a perfect electronic candle emulation…

All data and code are published in this repository.

This is an entry to the HaD.io “One Hertz Challenge”

References

  1. Okamoto, K., Kijima, A., Umeno, Y. & Shima, H. “Synchronization in flickering of three-coupled candle flames.”  Scientific Reports 6, 36145 (2016). ↩
  2. Chen, T., Guo, X., Jia, J. & Xiao, J. “Frequency and Phase Characteristics of Candle Flame Oscillation.”  Scientific Reports 9, 342 (2019). ↩
  3. Xia, J. & Zhang, P. “Flickering of buoyant diffusion flames.” Combustion Science and Technology (2018). ↩

A transistor for heat

By: VM

Quantum technologies and the prospect of advanced, next-generation electronic devices have been maturing at an increasingly rapid pace. Both research groups and governments around the world are investing more attention in this domain.

India for example mooted its National Quantum Mission in 2023 with a decade-long outlay of Rs 6,000 crore. One of the Mission’s goals, in the words of IISER Pune physics professor Umakant Rapol, is “to engineer and utilise the delicate quantum features of photons and subatomic particles to build advanced sensors” for applications in “healthcare, security, and environmental monitoring”.

On the science front, as these technologies become better understood, scientists have been paying increasing attention to managing and controlling heat in them. These technologies often rely on quantum physical phenomena that appear only at extremely low temperatures and are so fragile that even a small amount of stray heat can destabilise them. In these settings, scientists have found that traditional methods of handling heat — mainly by controlling the vibrations of atoms in the devices’ materials — become ineffective.

Instead, scientists have identified a promising alternative: energy transfer through photons, the particles of light. And in this paradigm, instead of simply moving heat from one place to another, scientists have been trying to control and amplify it, much like how transistors and amplifiers handle electrical signals in everyday electronics.

Playing with fire

Central to this effort is the concept of a thermal transistor. This device resembles an electrical transistor but works with heat instead of electrical current. Electrical transistors amplify or switch currents, allowing the complex logic and computation required to power modern computers. Creating similar thermal devices would represent a major advance, especially for technologies that require very precise temperature control. This is particularly true in the sub-kelvin temperature range where many quantum processors and sensors operate.

This circuit diagram depicts an NPN bipolar transistor. When a small voltage is applied between the base and emitter, electrons are injected from the emitter into the base, most of which then sweep across into the collector. The end result is a large current flowing through the collector, controlled by the much smaller current flowing through the base. Credit: Michael9422 (CC BY-SA)

Energy transport at such cryogenic temperatures differs significantly from normal conditions. Below roughly 1 kelvin, atomic vibrations no longer carry most of the heat. Instead, electromagnetic fluctuations — ripples of energy carried by photons — dominate the conduction of heat. Scientists channel these photons through specially designed, lossless wires made of superconducting materials. They keep these wires below their superconducting critical temperatures, allowing only photons to transfer energy between the reservoirs. This arrangement enables careful and precise control of heat flow.

One crucial phenomenon that allows scientists to manipulate heat in this way is negative differential thermal conductance (NDTC). NDTC defies common intuition. Normally, decreasing the temperature difference between two bodies reduces the amount of heat they exchange. This is why a glass of water at 50° C in a room at 25° C will cool faster than a glass of water at 30° C. In NDTC, however, reducing the temperature difference between two connected reservoirs can actually increase the heat flow between them.

NDTC arises from a detailed relationship between temperature and the properties of the material that makes up the reservoirs. When physicists harness NDTC, they can amplify heat signals in a manner similar to how negative electrical resistance powers electrical amplifiers.

A ‘circuit’ for heat

In a new study, researchers from Italy have designed and theoretically modelled a new kind of ‘thermal transistor’ that they have said can actively control and amplify how heat flows at extremely low temperatures for quantum technology applications. Their findings were published recently in the journal Physical Review Applied.

To explore NDTC experimentally, the researchers studied reservoirs made of a disordered semiconductor material that exhibited a transport mechanism called variable range hopping (VRH). An example is neutron-transmutation-doped germanium. In VRH materials, the electrical resistance at low temperatures depends very strongly, sometimes exponentially, on temperature.

This attribute makes it possible to tune their impedance, a property that controls the material’s resistance to energy flow, simply by adjusting the temperature. That is, how well two reservoirs made of VRH materials exchange heat can be controlled by tuning the impedance of the materials, which in turn can be controlled by tuning their temperature.

In the new study, the researchers reported that impedance matching played a key role. When the reservoirs’ impedances matched perfectly (when their temperatures became equal), the efficiency with which they transferred photonic heat reached a peak. As the materials’ temperatures diverged, heat flow dropped. In fact, the researchers wrote that there was a temperature range, especially as the colder reservoir’s temperature rose to approach that of the warmer one, within which the heat flow increased even as the temperature difference shrank. This effect forms the core of NDTC.
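To make the matching condition concrete, a standard expression from the wider literature on photon-mediated heat transport between two resistive elements joined by lossless superconducting lines (background physics, not a formula quoted from the new paper) is

$$ P_\gamma \;\approx\; r\,\frac{\pi k_B^2}{12\hbar}\left(T_1^2 - T_2^2\right), \qquad r = \frac{4\,R_1(T_1)\,R_2(T_2)}{\left[R_1(T_1) + R_2(T_2)\right]^2} $$

The matching coefficient r reaches its maximum of 1 when the two resistances are equal. Because the resistance of a VRH material depends exponentially on temperature, r can climb towards 1 faster than the (T₁² − T₂²) factor shrinks as the colder reservoir warms towards the hotter one, so the net heat flow can grow even as the temperature difference narrows. That is the essence of NDTC.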

The research team, associated with the NEST initiative at the Istituto Nanoscienze-CNR and Scuola Normale Superiore, both in Pisa in Italy, have proposed a device they call the photonic heat amplifier. They built it using two VRH reservoirs connected by superconducting, lossless wires. One reservoir was kept at a higher temperature and served as the source of heat energy. The other reservoir, called the central island, received heat by exchanging photons with the warmer reservoir.

The proposed device features a central island at temperature T1 that transfers heat currents to various terminals. The tunnel contacts to the drain and gate are positioned at heavily doped regions of the yellow central island, highlighted by a grey etched pattern. Each arrow indicates the positive direction of the heat flux. The substrate is maintained at temperature Tb, the gate at Tg, and the drain at Td. Credit: arXiv:2502.04250v3

The central island was also connected to two additional metallic reservoirs named the “gate” and the “drain”. These points operated with the same purpose as the control and output terminals in an electrical transistor. The drain stayed cold, allowing the amplified heat signal to exit the system from this point. By adjusting the gate temperature, the team could modulate and even amplify the flow of heat between the source and the drain (see image below).

To understand and predict the amplifier’s behaviour, the researchers developed mathematical models for all forms of heat transfer within the device. These included photonic currents between VRH reservoirs, electron tunnelling through the gate and drain contacts, and energy lost as vibrations through the device’s substrate.

(Tunnelling is a quantum mechanical phenomenon in which an electron has a small chance of passing through a thin barrier instead of going around it.)

Raring to go

By carefully selecting the device parameters — including the characteristic temperature of the VRH material, the source temperature, resistances at the gate and drain contacts, the volume of the central island, and geometric factors — the researchers said they could tailor the device for different amplification purposes.

They reported two main operating modes. The first was called ‘current modulation amplifier’. In this configuration, the device amplified small variations in thermal input at the gate. In this mode, small oscillations in the gate heat current produced much larger oscillations, up to 15 times greater, in the photon current between the source and the central island and in the drain current, according to the paper. This amplification was efficient down to 20 millikelvin, matching the ultracold conditions required in quantum technologies. The output range of heat current was similarly broad, showing the device’s suitability to amplify heat signals.

The second mode was called ‘temperature modulation amplifier’. Here, slight changes of only a few millikelvin in the gate temperature, the team wrote, caused the output temperature of the central island to swing by as much as 3.3 times the change in the input. The device could also handle input temperature ranges over 100 millikelvin. This performance reportedly matched or surpassed other temperature amplifiers already reported in the scientific literature. The researchers also noted that this mode could be used to pre-amplify signals in bolometric detectors used in astronomy telescopes.

An important quantity relevant for practical use is the relaxation time, i.e. how quickly the device returns to its original state after one operation, ready for the next run. The amplifier in both configurations showed relaxation times between microseconds and milliseconds. According to the researchers, this speed resulted from the device’s low thermal mass and efficient heat channels. Such a fast response could make it suitable to detect and amplify thermal signals in real time.

The researchers wrote that the amplifier also maintained good linearity and low distortion across various inputs. In other words, the output heat signal changed proportionally to the input heat signal and the device didn’t add unwanted changes, noise or artifacts to the input signal. Its noise-equivalent power values were also found to rival the best available solid-state thermometers, indicating low noise levels.

Approaching the limits

For all these promising results, realising this device involves some significant practical challenges. For instance, NDTC depends heavily on precise impedance matching. Real materials inevitably have imperfections, including those due to imperfect fabrication and environmental fluctuations. Such deviations could lower the device’s heat transfer efficiency and reduce the operational range of NDTC.

The system also banked on lossless superconducting wires being kept well below their critical temperatures. Achieving and maintaining these ultralow temperatures requires sophisticated and expensive refrigeration infrastructure, which adds to the experimental complexity.

Fabrication also demands very precise doping and finely tuned resistances for the gate and drain terminals. Scaling production to create many devices or arrays poses major technical difficulties. Integrating numerous photonic heat amplifiers into larger thermal circuits risks unwanted thermal crosstalk and signal degradation, a risk compounded by the extremely small heat currents involved.

Furthermore, the fully photonic design offers benefits such as electrical isolation and long-distance thermal connections. However, it also approaches fundamental physical limits. Thermal conductance caps the maximum possible heat flow through photonic channels. This limitation could restrict how much power the device is able to handle in some applications.

Then again, many of these challenges are typical of cutting-edge research in quantum devices, and highlight the need for detailed experimental work to realise and integrate photonic heat amplifiers into operational quantum systems.

If they are successfully realised for practical applications, photonic heat amplifiers could transform how scientists manage heat in quantum computing and nanotechnologies that operate near absolute zero. They could pave the way for on-chip heat control, allowing computers to autonomously stabilise their temperature and perform thermal logic operations. Redirecting or harvesting waste heat could also improve efficiency and significantly reduce noise — a critical barrier in ultra-sensitive quantum devices like quantum computers.

Featured image credit: Lucas K./Unsplash.

The Hyperion dispute and chaos in space

By: VM

I believe my blog’s subscribers did not receive email notifications of some recent posts. If you’re interested, I’ve listed the links to the last eight posts at the bottom of this edition.

When reading around for my piece yesterday on the wavefunctions of quantum mechanics, I stumbled across an old and fascinating debate about Saturn’s moon Hyperion.

The question of how the smooth, classical world around us emerges from the rules of quantum mechanics has haunted physicists for a century. Most of the time the divide seems easy: quantum laws govern atoms and electrons while planets, chairs, and cats are governed by the laws of Newton and Einstein. Yet there are cases where this distinction is not so easy to draw. One of the most surprising examples comes not from a laboratory experiment but from the cosmos.

In the 1990s, Hyperion became the focus of a deep debate about the nature of classicality, one that quickly snowballed into the so-called Hyperion dispute. It showed how different interpretations of quantum theory could lead to apparently contradictory claims, and how those claims can be settled by making their underlying assumptions clear.

Hyperion is not one of Saturn’s best-known moons but it is among the most unusual. Unlike round bodies such as Titan or Enceladus, Hyperion has an irregular shape, resembling a potato more than a sphere. Its surface is pocked by craters and its interior appears porous, almost like a sponge. But the feature that caught physicists’ attention was its rotation. Hyperion does not spin in a steady, predictable way. Instead, it tumbles chaotically. Its orientation changes in an irregular fashion as it orbits Saturn, influenced by the gravitational pulls of Saturn and Titan, which is a moon larger than Mercury.

In physics, chaos does not mean complete disorder. It means a system is sensitive to its initial conditions. For instance, imagine two weather models that start with almost the same initial data: one says the temperature in your locality at 9:00 am is 20.000° C, the other says it’s 20.001° C. That seems like a meaningless difference. But because the atmosphere is chaotic, this difference can grow rapidly. After a few days, the two models may predict very different outcomes: one may show a sunny afternoon and the other, thunderstorms.

This sensitivity to initial conditions is often called the butterfly effect — it’s the idea that the flap of a butterfly’s wings in Brazil might, through a chain of amplifications, eventually influence the formation of a tornado in Canada.

Hyperion behaves in a similar way. A minuscule difference in its initial spin angle or speed grows exponentially with time, making its future orientation unpredictable beyond a few months. In classical mechanics this is chaos; in quantum mechanics, those tiny initial uncertainties are built in by the uncertainty principle, and chaos amplifies them dramatically. As a result, predicting its orientation more than a few months ahead is impossible, even with precise initial data.

To astronomers, this was a striking case of classical chaos. But to a quantum theorist, it raised a deeper question: how does quantum mechanics describe such a macroscopic, chaotic system?

Why Hyperion interested quantum physicists is rooted in that core feature of quantum theory: the wavefunction. A quantum particle is described by a wavefunction, which encodes the probabilities of finding it in different places or states. A key property of wavefunctions is that they spread over time. A sharply localised particle will gradually smear out, with a nonzero probability of it being found over an expanding region of space.

For microscopic particles such as electrons, this spreading occurs very rapidly. For macroscopic objects, like a chair, an orange or you, the spread is usually negligible. The large mass of everyday objects makes the quantum uncertainty in their motion astronomically small. This is why you don’t have to be worried about your chai mug being in two places at once.

Hyperion is a macroscopic moon, so you might think it falls clearly on the classical side. But this is where chaos changes the picture. In a chaotic system, small uncertainties get amplified exponentially fast. A variable called the Lyapunov exponent measures this sensitivity. If Hyperion begins with an orientation with a minuscule uncertainty, chaos will magnify that uncertainty at an exponential rate. In quantum terms, this means the wavefunction describing Hyperion’s orientation will not spread slowly, as for most macroscopic bodies, but at full tilt.
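Schematically (my own gloss on the standard argument, not Zurek's exact calculation), an initial uncertainty in orientation grows as

$$ \delta\theta(t) \approx \delta\theta_0\, e^{\lambda t}, $$

where λ is the Lyapunov exponent, so the time needed for a microscopic uncertainty to reach a macroscopic angular spread Δ is only logarithmic in the ratio of scales, t ≈ λ⁻¹ ln(Δ/δθ₀). Because the logarithm grows so slowly, even the vast gap between quantum-scale and moon-scale uncertainties buys only a modest multiple of the Lyapunov time: years or decades rather than aeons.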

In 1998, the Polish-American theoretical physicist Wojciech Zurek calculated that within about 20 years, the quantum state of Hyperion should evolve into a superposition of macroscopically distinct orientations. In other words, if you took quantum mechanics seriously, Hyperion would be “pointing this way and that way at once”, just like Schrödinger’s famous cat that is alive and dead at once.

This startling conclusion raised the question: why do we not observe such superpositions in the real Solar System?

Zurek’s answer to this question was decoherence. Say you’re blowing a soap bubble in a dark room. If no light touches it, the bubble is just there, invisible to you. Now shine a torchlight on it. Photons from the bulb will scatter off the bubble and enter your eyes, letting you see its position and color. But here’s the catch: every photon that bounces off the bubble also carries away a little bit of information about it. In quantum terms, the bubble’s wavefunction becomes entangled with all those photons.

If the bubble were treated purely quantum mechanically, you could imagine a strange state where it was simultaneously in many places in the room — a giant superposition. But once trillions of photons have scattered off it, each carrying “which path?” information, the superposition is effectively destroyed. What remains is an apparent mixture of “bubble here” or “bubble there”, and to any observer the bubble looks like a localised classical object. This is decoherence in action: the environment (the sea of photons here) acts like a constant measuring device, preventing large objects from showing quantum weirdness.

For Hyperion, decoherence would be rapid. Interactions with sunlight, Saturn’s magnetospheric particles, and cosmic dust would constantly ‘measure’ Hyperion’s orientation. Any coherent superposition of orientations would be suppressed almost instantly, long before it could ever be observed. Thus, although pure quantum theory predicts Hyperion’s wavefunction would spread into cat-like superpositions, decoherence explains why we only ever see Hyperion in a definite orientation.

Thus Zurek argued that decoherence is essential to understand how the classical world emerges from its quantum substrate. To him, Hyperion provided an astronomical example of how chaotic dynamics could, in principle, generate macroscopic superpositions, and how decoherence ensures these superpositions remain invisible to us.

Not everyone agreed with Zurek’s conclusion, however. In 2005, physicists Nathan Wiebe and Leslie Ballentine revisited the problem. They wanted to know: if we treat Hyperion using the rules of quantum mechanics, do we really need the idea of decoherence to explain why it looks classical? Or would Hyperion look classical even without bringing the environment into the picture?

To answer this, they did something quite concrete. Instead of trying to describe every possible property of Hyperion, they focused on one specific and measurable feature: the part of its spin that pointed along a fixed axis, perpendicular to Hyperion’s orbit. This quantity — essentially the up-and-down component of Hyperion’s tumbling spin — was a natural choice because it can be defined both in classical mechanics and in quantum mechanics. By looking at the same feature in both worlds, they could make a direct comparison.

Wiebe and Ballentine then built a detailed model of Hyperion’s chaotic motion and ran numerical simulations. They asked: if we look at this component of Hyperion’s spin, how does the distribution of outcomes predicted by classical physics compare with the distribution predicted by quantum mechanics?

The result was striking. The two sets of predictions matched extremely well. Even though Hyperion’s quantum state was spreading in complicated ways, the actual probabilities for this chosen feature of its spin lined up with the classical expectations. In other words, for this observable, Hyperion looked just as classical in the quantum description as it did in the classical one.

From this, Wiebe and Ballentine drew a bold conclusion: that Hyperion doesn’t require decoherence to appear classical. The agreement between quantum and classical predictions was already enough. They went further and suggested that this might be true more broadly: perhaps decoherence is not essential to explain why macroscopic bodies, the large objects we see around us, behave classically.

This conclusion went directly against the prevailing view of quantum physics as a whole. By the early 2000s, many physicists believed that decoherence was the central mechanism that bridged the quantum and classical worlds. Zurek and others had spent years showing how environmental interactions suppress the quantum superpositions that would otherwise appear in macroscopic systems. To suggest that decoherence was not essential was to challenge the very foundation of that programme.

The debate quickly gained attention. On one side stood Wiebe and Ballentine, arguing that simple agreement between quantum and classical predictions for certain observables was enough to resolve the issue. On the other stood Zurek and the decoherence community, insisting that the real puzzle was more fundamental: why we never observe interference between large-scale quantum states.

At this time, the Hyperion dispute wasn’t just about a chaotic moon. It was about how we could define ‘classical behavior’ in the first place. For Wiebe and Ballentine, classical meant “quantum predictions match classical ones”. For Zurek et al., classical meant “no detectable superpositions of macroscopically distinct states”. The difference in definitions made the two sides seem to clash.

But then, in 2008, physicist Maximilian Schlosshauer carefully analysed the issue and showed that the two sides were not actually talking about the same problem. The apparent clash arose because Zurek and Wiebe-Ballentine had started from essentially different assumptions.

Specifically, Wiebe and Ballentine had adopted the ensemble interpretation of quantum mechanics. In everyday terms, the ensemble interpretation says, “Don’t take the quantum wavefunction too literally.” That is, it does not describe the “real state” of a single object. Instead, it’s a tool to calculate the probabilities of what we will see if we repeat an experiment many times on many identical systems. It’s like rolling dice. If I say the probability of rolling a 6 is 1/6, that probability does not describe the dice themselves as being in a strange mixture of outcomes. It simply summarises what will happen if I roll a large collection of dice.

Applied to quantum mechanics, the ensemble interpretation works the same way. If an electron is described by a wavefunction that seems to say it is “spread out” over many positions, the ensemble interpretation insists this does not mean the electron is literally smeared across space. Rather, the wavefunction encodes the probabilities for where the electron would be found if we prepared many electrons in the same way and measured them. The apparent superposition is not a weird physical reality, just a statistical recipe.

Wiebe and Ballentine carried this outlook over to Hyperion. When Zurek described Hyperion’s chaotic motion as evolving into a superposition of many distinct orientations, he meant this as a literal statement: without decoherence, the moon’s quantum state really would be in a giant blend of “pointing this way” and “pointing that way”. From his perspective, there was a crisis because no one ever observes moons or chai mugs in such states. Decoherence, he argued, was the missing mechanism that explained why these superpositions never show up.

But under the ensemble interpretation, the situation looks entirely different. For Wiebe and Ballentine, Hyperion’s wavefunction was never a literal “moon in superposition”. It was always just a probability tool, telling us the likelihood of finding Hyperion with one orientation or another if we made a measurement. Their job, then, was simply to check: do these quantum probabilities match the probabilities that classical physics would give us? If they do, then Hyperion behaves classically by definition. There is no puzzle to be solved and no role for decoherence to play.

This explains why Wiebe and Ballentine concentrated on comparing the probability distributions for a single observable, namely the component of Hyperion’s spin along a chosen axis. If the quantum and classical results lined up — as their calculations showed — then from the ensemble point of view Hyperion’s classicality was secured. The apparent superpositions that worried Zurek were never taken as physically real in the first place.

Zurek, on the other hand, was addressing the measurement problem. In standard quantum mechanics, superpositions are physically real. Without decoherence, there is always some observable that could reveal the coherence between different macroscopic orientations. The puzzle is why we never see such observables registering superpositions. Decoherence provided the answer: the environment prevents us from ever detecting those delicate quantum correlations.

In other words, Zurek and Wiebe-Ballentine were tackling different notions of classicality. For Wiebe and Ballentine, classicality meant the match between quantum and classical statistical distributions for certain observables. For Zurek, classicality meant the suppression of interference between macroscopically distinct states.

Once Schlosshauer spotted this difference, the apparent dispute went away. His resolution showed that the clash was less over data than over perspectives. If you adopt the ensemble interpretation, then decoherence indeed seems unnecessary, because you never take the superposition as a real physical state in the first place. If you are interested in solving the measurement problem, then decoherence is crucial, because it explains why macroscopic superpositions never manifest.

The overarching takeaway is that, from the quantum point of view, there is no single definition of what constitutes “classical behaviour”. The Hyperion dispute forced physicists to articulate what they meant by classicality and to recognise the assumptions embedded in different interpretations. Depending on your personal stance, you may emphasise the agreement of statistical distributions or you may emphasise the absence of observable superpositions. Both approaches can be internally consistent — but they also answer different questions.

For school students who are reading this story, the Hyperion dispute may seem obscure. Why should we care about whether a distant moon’s tumbling motion demands decoherence or not? The reason is that the moon provides a vivid example of a deep issue: how do we reconcile the strange predictions of quantum theory with the ordinary world we see?

In the laboratory, decoherence is an everyday reality. Quantum computers, for example, must be carefully shielded from their environments to prevent decoherence from destroying fragile quantum information. In cosmology, decoherence plays a role in explaining how quantum fluctuations in the early universe influenced the structure of galaxies. Hyperion showed that even an astronomical body can, in principle, highlight the same foundational issues.


Last eight posts:

1. The guiding light of KD45

2. What on earth is a wavefunction?

3. The PixxelSpace constellation conundrum

4. The Zomato ad and India’s hustle since 1947

5. A new kind of quantum engine with ultracold atoms

6. Trade rift today, cryogenic tech yesterday

7. What keeps the red queen running?

8. A limit of ‘show, don’t tell’

Watch the celebrations, on mute

By: VM

Right now, Shubhanshu Shukla is on his way back to Earth from the International Space Station. Am I proud he’s been the first Indian up there? I don’t know. It’s not clear.

The whole thing seemed to be stage-managed. Shukla didn’t say anything surprising, nothing that popped. In fact he said exactly what we expected him to say. Nothing more, nothing less.

Fuck controversy. It’s possible to be interesting in new ways all the time without edging into the objectionable. It’s not hard to beat predictability — but there it was for two weeks straight. I wonder if Shukla was fed all his lines. It could’ve been a monumental thing but it feels… droll.

“India’s short on cash.” “India’s short on skills.” “India’s short on liberties.” We’ve heard these refrains as we’ve covered science and space journalism. But it’s been clear for some time now that “India’s short on cash” is a myth.

We’ve written and spoken over and over that Gaganyaan needs better accountability and more proactive communication from ISRO’s Human Space Flight Centre. But it’s also true that it needs even more money than the Rs 20,000 crore it’s already been allocated.

One thing I’ve learnt about the Narendra Modi government is that if it puts its mind to it, if it believes it can extract political mileage from a particular commitment, it will find a way to go all in. So when it doesn’t, the fact that it doesn’t sticks out. It’s a signal that The Thing isn’t a priority.

Looking at the Indian space programme through the same lens can be revealing. Shukla’s whole trip and back was carefully choreographed. There’s been no sense of adventure. Grit is nowhere to be seen.

But between Prime Minister Modi announcing his name in the list of four astronaut-candidates for Gaganyaan’s first crewed flight (currently set for 2027) and today, I know marginally more about Shukla, much less about the other three, and nothing really personal to boot. Just banal stuff.

This isn’t some military campaign we’re talking about, is it? Just checking.

Chethan Kumar at ToI and Jatan Mehta have done everyone a favour: one by reporting extensively on Shukla’s and ISRO’s activities and the other by collecting even the most deeply buried scraps of information from across the internet in one place. The point, however, is that it shouldn’t have come to this. Their work is laborious, made possible by the fact that it’s by far their primary responsibility.

It needed to be much easier than this to find out more about India’s first homegrown astronauts. ISRO itself has been mum, so much so that every new ISRO story is turning out to be an investigative story. The details of Shukla’s exploits needed to be interesting, too. They haven’t been.

So now, Shukla’s returning from the International Space Station. It’s really not clear what one’s expected to be excited about…

Featured image credit: Ray Hennessy/Unsplash.

Sharks don’t do math

By: VM

From ‘Sharks hunt via Lévy flights’, Physics World, June 11, 2010:

They were menacing enough before, but how would you feel if you knew sharks were employing advanced mathematical concepts in their hunt for the kill? Well, this is the case, according to new research, which has tracked the movement of these marine predators along with a number of other species as they foraged for prey in the Pacific and Atlantic oceans. The results showed that these animals hunt for food by alternating between Brownian motion and Lévy flights, depending on the scarcity of prey.

Animals don’t use advanced mathematical concepts. This statement encompasses many humans as well because it’s not a statement about intelligence but one about language and reality. You see a shark foraging in a particular pattern. You invent a language to efficiently describe such patterns. And in that language your name for the shark’s pattern is a Lévy flight. This doesn’t mean the shark is using a Lévy flight. The shark is simply doing what makes sense to it, but which we — in our own description of the world — call a Lévy flight.

The Lévy flight isn’t an advanced concept either. It’s a subset of a broader concept called the random walk. Say you’re on a square grid, like a chessboard. You’re standing on one square. You can move only one step at a time. You roll a four-sided die. Depending on the side it lands on, you step one square forwards, backwards, to the right or to the left. The path you trace over time is called a random walk because its shape is determined by the die roll, which is random.

A random walk of 2,500 steps.

There are different kinds of walks depending on the rule that determines the choice of your next step. A Lévy flight is a random walk that varies both the direction of the next step and the length of the step. In the random walk on the chessboard, you took steps of fixed lengths: to the adjacent squares. In a Lévy flight, the direction of the next step is random and the length is picked at random from a Lévy distribution. This is what the distribution looks like:

Probability density functions of the Lévy distribution for different values of c.

Notice how a small part of each curve (for different values of c in the distribution’s function) has high values and the majority has smaller values. When you pick your step length at random from, say, the red curve, you have higher odds of picking a smaller step length than a longer one. This means that in a Lévy flight, most of the step lengths will be short but a small number of steps will be long. Thus the ‘flight’ looks like this:

Sharks and many other animals have been known to follow a Lévy flight when foraging. To quote from an older post:

Research has shown that the foraging path of animals looking for food that is scarce can be modelled as a Lévy flight: the large steps correspond to the long distances towards food sources that are located far apart and the short steps to finding food spread in a small area at each source.
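To make the two kinds of walk concrete, here is a minimal C sketch (my own illustration, not taken from the post or the research) of a two-dimensional Lévy flight. It relies on the fact that if Z is a standard normal random number, then c/Z² follows a Lévy distribution with scale c, so most drawn step lengths come out short while a few come out very long.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define PI 3.14159265358979323846

// Draw a standard normal sample via the Box-Muller transform.
static double randn(void)
{
    double u1 = (rand() + 1.0) / ((double)RAND_MAX + 2.0); /* stay strictly inside (0,1) */
    double u2 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    return sqrt(-2.0 * log(u1)) * cos(2.0 * PI * u2);
}

int main(void)
{
    const double c = 0.01; /* Levy scale parameter: keeps most steps short */
    double x = 0.0, y = 0.0;
    srand(42);

    for (int i = 0; i < 1000; i++) {
        double z = randn();
        double len = c / (z * z); /* Levy-distributed step length (heavy tail) */
        double angle = 2.0 * PI * rand() / ((double)RAND_MAX + 1.0); /* random direction */
        x += len * cos(angle);
        y += len * sin(angle);
        printf("%f %f\n", x, y); /* mostly short hops, occasionally a very long jump */
    }
    return 0;
}

Swapping the Lévy-distributed length for a fixed length of 1 turns the same loop into the ordinary fixed-step random walk described above.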

Brownian motion is a more famous kind of random walk. It’s the name for the movement of an object that’s following the Wiener process. This means the object’s path needs to obey the following five rules (from the same post):

(i) Each increment of the process is independent of other (non-overlapping) increments;

(ii) How much the process changes over a period of time depends only on the duration of the period;

(iii) Increments in the process are randomly sampled from a Gaussian distribution;

(iv) The process has a statistical mean equal to zero;

(v) The process’s covariance between any two points in time is equal to the smaller of the variances at those two points (variance denotes how quickly the value of a variable spreads out over time).

Thus Brownian motion models the movement of pollen grains in water, dust particles in the air, electrons in a conductor, and colloidal particles in a fluid, as well as the fluctuation of stock prices, the diffusion of molecules in liquids, and population dynamics in biology. That is, all these processes in disparate domains evolve at least in part according to the rules of the Wiener process.

Still doesn’t mean a shark understands what a Lévy flight is. By saying “sharks use a Lévy flight”, we also discard in the process how the shark makes its decisions — something worth learning about in order to make more complete sense of the world around us rather than force the world to make sense only in those ways we’ve already dreamt up. (This is all the more relevant now with #sharkweek just a week away.)

I care so much because metaphors are bridges between language and reality. Even if the statement “sharks employ advanced mathematical concepts” doesn’t feature a metaphor, the risk it represents hews close to one that stalks the use of metaphors in science journalism: the creation of false knowledge.

Depending on the topic, it’s not uncommon for science journalists to use metaphors liberally, yet scientists have not infrequently upbraided them for using the wrong metaphors in some narratives or for not alerting readers to the metaphors’ limits. This is not unfair: while I disagree with some critiques along these lines for being too pedantic, in most cases it’s warranted. As science philosopher Daniel Sarewitz put it in a 2012 article:

Most people, including most scientists, can acquire knowledge of the Higgs only through the metaphors and analogies that physicists and science writers use to try to explain phenomena that can only truly be characterized mathematically.

Here’s The New York Times: “The Higgs boson is the only manifestation of an invisible force field, a cosmic molasses that permeates space and imbues elementary particles with mass … Without the Higgs field, as it is known, or something like it, all elementary forms of matter would zoom around at the speed of light, flowing through our hands like moonlight.” Fair enough. But why “a cosmic molasses” and not, say, a “sea of milk”? The latter is the common translation of an episode in Hindu cosmology, represented on a spectacular bas-relief panel at Angkor Wat showing armies of gods and demons churning the “sea of milk” to produce an elixir of immortality.

For those who cannot follow the mathematics, belief in the Higgs is an act of faith, not of rationality.

A metaphor is not the thing itself and shouldn’t be allowed to masquerade as such.

Just as well, there are important differences between becoming aware of something and learning it, and a journalist may require metaphors only to facilitate the former. Toeing this line also helps journalists tame the publics’ expectations of them.

Featured image credit: David Clode/Unsplash.

Tracking the Meissner effect under pressure

By: VM

In the last two or three years, groups of scientists from around the world have made several claims that they had discovered a room-temperature superconductor. Many of these claims concerned high-pressure superconductors — materials that superconduct electricity at room temperature but only if they are placed under extreme pressure (a million atmospheres’ worth). Yet other scientists had challenged these claims on many grounds, but one in particular was whether these materials really exhibited the Meissner effect.

Room-temperature superconductors are often called the ‘holy grail’ of materials science. I abhor clichés but in this case the idiom fits perfectly. If such a material is invented or discovered, it could revolutionise many industries. To quote at length from an article by electrical engineer Massoud Pedram in The Conversation:

Room-temperature superconductors would enable ultra high-speed digital interconnects for next-generation computers and low-latency broadband wireless communications. They would also enable high-resolution imaging techniques and emerging sensors for biomedical and security applications, materials and structure analyses, and deep-space radio astrophysics.

Room-temperature superconductors would mean MRIs could become much less expensive to operate because they would not require liquid helium coolant, which is expensive and in short supply. Electrical power grids would be at least 20% more power efficient than today’s grids, resulting in billions of dollars saved per year, according to my estimates. Maglev trains could operate over longer distances at lower costs. Computers would run faster with orders of magnitude lower power consumption. And quantum computers could be built with many more qubits, enabling them to solve problems that are far beyond the reach of today’s most powerful supercomputers.

However, this surfeit of economic opportunities could also lure scientists into not thoroughly double-checking their results, cherry-picking from their data or jumping to conclusions if they believe they have found a room-temperature superconductor. Many papers written by scientists claiming they had found a room-temperature superconductor have in fact been published in and subsequently retracted from peer-reviewed journals with prestigious reputations, including Nature and Science, after independent experts found the papers to contain flawed data. Whatever the reasons for these mistakes, independent scrutiny of such reports has become very important.

If a material is a superconductor, it needs to meet two conditions*. The first, of course, is that it needs to conduct a direct electric current with zero resistance. Second, the material should display the Meissner effect. Place a magnet over a superconducting material. Then gradually cool the material to lower and lower temperatures, until you cross the critical temperature. Just as you cross this threshold, the magnet will start to float above the material. You’ve just physically observed the Meissner effect. It happens because when the material transitions to its superconducting state, it expels all magnetic fields within its bulk to its surface. This results in any magnets already sitting nearby being pushed away. In fact, the Meissner effect is considered the hallmark sign of a superconductor because it’s difficult to fake.

An illustration of the Meissner effect. B denotes the magnetic field, T is the temperature, and Tc is the critical temperature. Credit: Piotr Jaworski
Wait for the 1:03 mark.

The problem with acquiring evidence of the Meissner effect is the setup in which many of these materials become superconductors. In order to apply the tens to hundreds of gigapascals (GPa) of pressure, a small sample of the material — a few grams or less — is placed between a pair of high-quality diamond crystals and squeezed. This diamond anvil cell apparatus leaves no room for a conventional magnetic field sensor to be placed inside the cell. Measuring the magnetic properties of the sample is also complicated because of the fields from other sources in the apparatus, which will have to be accurately measured and then subtracted from the final data.

To tackle this problem, some scientists have of late suggested measuring the sample’s magnetic properties using the only entity that can still enter and leave the diamond anvil cell: light.

In technical terms, such a technique is called optical magnetometry. Magnetometry in general is any technique that converts some physical signal into data about a magnetic field. In this case the signal is in the form of light, thus the ‘optical’ prefix. To deploy optical magnetometry in the context of verifying whether a material is a high-pressure superconductor, scientists have suggested using nitrogen vacancy (NV) centres.

Say you have a good crystal of diamond with you. The crystal consists of carbon atoms bound to each other in sets of four in the shape of a pyramid. Millions of copies of such pyramids together make up the diamond. Now, say you substitute one of the carbon atoms in the gem with a nitrogen atom and also knock out an adjacent carbon atom. Physicists have found that this vacancy in the lattice, called an NV centre, has interesting, useful properties. For example, an NV centre can fluoresce, i.e. absorb light of a higher frequency and emit light of a lower frequency.

An illustration of a nitrogen vacancy centre in diamond. Carbon atoms are shown in green. Credit: Public domain

Because each NV centre is surrounded by three carbon atoms and one nitrogen atom, the vacancy hosts six electrons, two of which are unpaired. All electrons have a property called quantum spin. The quantum spin is the constitutive entity of magnetism the same way the electric charge is the constitutive entity of electricity. For example, if a block of iron is to be turned into a magnet, the spins of all the electrons inside have to be made to point in the same direction. Each spin can point in one of two directions, which for a magnet are called ‘north’ and ‘south’. Planet Earth has a magnetic north and a magnetic south because the spins of the trillions upon trillions of electrons in its core have come to point in roughly the same direction.

The alignment of the spins of different electrons also affects what energy they have. For example, in the right conditions, an atom with two electrons will have more energy if the electrons’ spins are aligned (↑↑) than when the electrons’ spins are anti-aligned (↑↓). This fundamental attribute of the electrons in NV centres allows the centres to operate as super-sensitive detectors of magnetic fields — which is what scientists from institutions around France have reported doing in a June 30 paper in Physical Review Applied.

The scientists implanted a layer of 10,000 to 100,000 NV centres a few nanometres under the surface of one of the diamond anvils. These centres had electron spin states with energies precisely 2.87 GHz apart.** When the centres were then exposed to microwave radiation of a given frequency, every NV centre could absorb green laser light and re-emit red light.
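For context, the standard relation used in NV magnetometry (general background, not a formula quoted from this paper) places the two spin resonances at approximately

$$ f_\pm \approx D \pm \frac{\gamma_e}{2\pi} B_\parallel, \qquad D \approx 2.87\ \mathrm{GHz}, \quad \frac{\gamma_e}{2\pi} \approx 28\ \mathrm{GHz/T}, $$

so sweeping the microwave frequency and noting where the red emission dims locates the resonances, and their splitting reveals the component of the magnetic field along the NV axis.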

The experimental setup. DAC stands for ‘diamond anvil cell’. PL stands for ‘photoluminescence’, i.e. the red light emission. Credit: arXiv:2501.14504v1

As the diamond anvils squeezed the sample past 4 GPa, the pressure at which it would have become a superconductor, the sample displayed the Meissner effect, expelling magnetic fields from within its bulk to the surface. As a result, the NV centres were exposed to a magnetic field in their midst that wasn’t there before. This field affected the electrons’ collective spin and thus their energy levels, which in turn caused the red light being emitted from the centres to dim.
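For a rough sense of scale: a textbook figure for the NV centre (not taken from this paper) is that each of the two spin resonances shifts by about 28 MHz for every millitesla of magnetic field along the defect's axis. The minimal C sketch below, using an assumed field value, shows how such a field would move the resonances away from 2.87 GHz.

#include <stdio.h>

/* Illustration only: the NV ground-state resonance sits near 2.87 GHz at zero
   field, and the two spin sublevels shift by roughly +/- 28 MHz per millitesla
   (the NV gyromagnetic ratio). These are textbook values, not figures from the
   paper discussed above; the field below is an assumed example. */
int main(void) {
    const double f0_ghz    = 2.87;  /* zero-field splitting, GHz */
    const double gamma_mhz = 28.0;  /* resonance shift per millitesla, MHz */
    const double b_mt      = 1.0;   /* hypothetical field at the NV layer, mT */

    double shift_ghz = gamma_mhz * b_mt / 1000.0;
    printf("B = %.2f mT -> resonances near %.3f GHz and %.3f GHz\n",
           b_mt, f0_ghz - shift_ghz, f0_ghz + shift_ghz);
    return 0;
}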

The researchers could easily track the levels and patterns of dimming in the NV centres with a microscope, and based on that were able to ascertain whether the sample had displayed the Meissner effect. As Physical Review Letters associate editor Martin Rodriguez-Vega wrote in Physics magazine: “A statistical analysis of the [optical] dataset revealed information about the magnetic-field strength and orientation across the sample. Mapping these quantities produced a visualisation of the Meissner effect and revealed the existence of defects in the superconductor.”

In (a), the dotted lines show the parts of the sample that the diamond anvils were in contact with. (b) shows the parts of the sample associated with the red-light emissions from the NV centres, meaning these parts of the sample exhibited the Meissner effect in the experiment. (c) shows the normalised red-light emission along the y-axis and the frequency of microwave light shined along the x-axis. Red lines show the emission in normal conditions and blue lines show the emissions in the presence of the Meissner effect. Credit: arXiv:2501.14504v1

Because the NV centres were less than 1 micrometre away from the sample, they were extremely sensitive to changes in the magnetic field. In fact the researchers reported that the various centres were able to reveal the critical temperature for different parts of the sample separately, rather than only for the sample as a whole — a resolution not possible with conventional techniques. The pristine diamond matrix also conferred a long lifetime on the electrons’ spins inside the NV centres. And because there were so many NV centres, the researchers were able to ‘scan’ them with microwaves en masse instead of having to maintain focus on a single point on the diamond anvil when looking for evidence of changes in the sample’s magnetic field. Finally, while the sample in the study became superconducting at a critical temperature of around 140 K, the centres remained stable down to under 4 K.

Another major advantage of the technique is that it can be used with type II superconductors as well. Type I superconductors are materials that transition to their superconducting state in a single step, below the critical temperature. Type II superconductors transition to their superconducting states in more than one step and display a combination of flux-pinning and the Meissner effect. From my piece in The Hindu in August 2023: “When a flux-pinned superconductor is taken away from a particular part of the magnetic field and put back in, it will snap back to its original relative position.” This happens because type II materials, while they don’t fully expel magnetic fields from within their bulk, also prevent the fields from moving around inside. Thus the magnetic field lines are pinned in place.

Because of the spatial distribution of the NV centres and their sensitivity, they can reveal flux-pinning in the sample by ‘sensing’ the magnetic fields at different distances.


* The material can make a stronger case for itself if it displays two more properties. (i) The heat energy required to raise the temperature of the material's electrons by 1° C has to change drastically at the critical temperature, which is the temperature below which the material becomes a superconductor. (ii) The material's electrons shouldn't be able to have certain energies. (That is, a map of the energies of all the electrons should show some gaps.) These properties are however considered optional.

** While 2.87 GHz is a frequency figure, recall Planck's equation from high school: E = hν. Energy is equal to frequency times Planck's constant, h. Since h is a constant (6.62 × 10⁻³⁴ m² kg/s), energy figures are frequently denoted in terms of frequency in physics. An interested party can calculate the energy by themselves.
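As a quick, illustrative check of that footnote, here is a small C snippet that multiplies Planck's constant by 2.87 GHz; the conversion to electronvolts at the end is an added assumption for readability, not something stated in the post.

#include <stdio.h>

/* Back-of-the-envelope calculation for the footnote above: E = h * f. */
int main(void) {
    const double h = 6.62e-34;    /* Planck's constant, m^2 kg/s (i.e. J s) */
    const double f = 2.87e9;      /* 2.87 GHz expressed in Hz */

    double e_joule = h * f;
    double e_ev    = e_joule / 1.602e-19;   /* 1 eV = 1.602e-19 J */
    printf("E = %.3e J, i.e. about %.1f micro-eV\n", e_joule, e_ev * 1e6);
    return 0;
}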

Enfeebling the Indian space programme

By: VM

There’s no denying that there currently prevails a public culture in India that equates criticism, even well-reasoned, with pooh-poohing. It’s especially pronounced in certain geographies where the Bharatiya Janata Party (BJP) enjoys majority support as well as vis-à-vis institutions that the subscribers of Hindu politics consider to be ripe for international renown, especially in the eyes of the country’s former colonial masters. The other side of the same cultural coin is the passive encouragement it offers to those who’d play up the feats of Indian enterprises even if they lack substantive evidence to back their claims up. While these tendencies are pronounced in many enterprises, I have encountered them most often in the spaceflight domain.

Through its feats of engineering and administration over the years, the Indian Space Research Organisation (ISRO) has cultivated a deserved reputation of setting a high bar for itself and meeting it. Its achievements are the reason why India is one of a few countries today with a functionally complete space programme. It operates launch vehicles, conducts spaceflight-related R&D, has facilities to develop as well as track satellites, and maintains data-processing pipelines to turn the data it collects from space into products usable for industry and academia. It is now embarking on a human spaceflight programme as well. ISRO has also launched interplanetary missions to the moon and Mars, with one destined for Venus in the works. In and of itself the organisation has an enviable legacy. Thus, unsurprisingly, many sections of the Hindutva brigade have latched onto ISRO's achievements to animate their own propaganda of India's greatness, both real and imagined.

The surest signs of this adoption are most visible when ISRO missions fail or succeed in unclear ways. The Chandrayaan 2 mission and the Axiom-4 mission respectively are illustrative examples. As if to forestall any allegations that the Chandrayaan 2 mission failed, then ISRO chairman K. Sivan said right after its Vikram lander crashed on the moon that it had been a “98% success”. Chandrayaan 2 was a technology demonstrator and it did successfully demonstrate most of the technologies onboard very well. The “98%” figure, however, was so disproportionate as to suggest Sivan was defending the mission less on its merits than on its ability to fit into reductive narratives of how good ISRO was. (Recall, similarly, when former DCGI V.G. Somani claimed the homegrown Covaxin vaccine was “110% safe” when safety data from its phase III clinical trials weren't even available.)

On the other hand, even as the Axiom-4 mission was about to kick off, neither ISRO nor the Department of Space (DoS) had articulated what Indian astronaut Shubhanshu Shukla’s presence onboard the mission was expected to achieve. If these details didn’t actually exist before the mission, to participate in which ISRO had paid Axiom Space more than Rs 500 crore, both ISRO and the DoS were effectively keeping the door open to picking a goalpost of their choosing to kick the ball through as the mission progressed. If they did have these details but had elected to not share them, their (in)actions raised — or ought to have — difficult questions about the terms on which these organisations believed they were accountable in a democratic country. Either way, the success of the Axiom-4 mission vis-à-vis Shukla’s participation was something of an empty vessel: a ready receptacle for any narrative that could be placed inside ex post facto.

At the same time, raising this question in the public domain (especially on social media platforms, in response to arguments presented in the news, and in conversations among people interested in Indian spaceflight) has often been construed as naysaying Shukla's activities altogether. By all means let's celebrate Shukla's and by extension India's ‘citius, altius, fortius’ moment in human spaceflight; the question remains: what didn't ISRO/DoS share before Axiom-4 lifted off and why? (Note that what journalists have been reporting since liftoff, while valuable, isn't the answer to the question posed here.) While it's tempting to think this pinched communication is a strategy developed by the powers that be to cope with insensitive reporting in the press, thinking so would also ignore the political capture that comparable institutions have already suffered, and which ISRO arguably has as well, during and after Sivan's term as chairman.

For just two examples of institutions that have historically enjoyed a popularity comparable in both scope and flavour to that of ISRO, consider India's cricket administration and the Election Commission. During the 2024 men's T20 World Cup that India eventually won, the Indian team had the least amount of travel and the most foreknowledge of the ground it was to play its semifinal game on. At the 2023 men's ODI World Cup, too, India played all its matches on Sundays, ensuring the highest attendance for its own contests rather than sharing that opportunity with all teams. The tournament is intended to be a celebration of the sport, after all. For good measure, police personnel were also deployed at various stadia to take away spectators' placards and flags in support of Pakistan in matches featuring the Pakistani team. The stage management of both World Cups only lessened, rather than heightened, the Indian team's victories.

It’s been a similar story with the Election Commission of India, which has of late come under repeated attack from the Indian National Congress party and some of its allies for allegedly rigging their electronic voting machines and subsequently entire elections in favour of the BJP. While the Congress has failed to submit the extraordinary evidence required to support these extraordinary claims, doubts about the ECI’s integrity have spread anyway because there are other, more overt ways in which the once-independent institution of Indian democracy favours the BJP — including scheduling elections according to the availability of party supremo Narendra Modi to speak at rallies.

Recently, a more obscure but nonetheless pertinent controversy erupted in some circles when, in an NDTV report, incumbent ISRO chairman V. Narayanan seemed to suggest that SpaceX called off one of the attempts to launch Axiom-4 because his team at ISRO had insisted that the company thoroughly check its rocket for bugs. The incident followed SpaceX engineers spotting a leak on the rocket. What was egregious here is that while SpaceX had built and flown that very type of rocket hundreds of times, Narayanan and the ambiguous wording in the NDTV report made it out to be that SpaceX would have flown the rocket if not for ISRO's insistence. What's more likely to have happened is that NASA and SpaceX engineers consulted ISRO, as they would have consulted the other agencies involved in the flight — ESA, HUNOR, and Axiom Space — about their stand, and that the ISRO team in turn clarified its position: that SpaceX recheck the rocket before the next launch attempt. However, the narrative “if not for ISRO, SpaceX would've flown a bad rocket” took flight anyway.

Evidently these are not isolated incidents. The last three ISRO chairmen — Sivan, Somanath, and now Narayanan — have progressively curtailed the flow of information from the organisation to the press even as they have maintained a steady pro-Hindutva, pro-establishment rhetoric. All three leaders have also only served at ISRO's helm when the BJP was in power at the Centre, wielding its tendency to centralise power by, among other things, centralising the permission to speak freely. Some enterprising journalists like Chethan Kumar and T.S. Subramanian and activists like r/Ohsin and X.com/@SolidBoosters have thus far kept the space establishment from resembling a black hole. But the overarching strategy is as simple as it is devious: while critical arguments become preoccupied with whataboutery and with fending off misguided accusations of neocolonialist thinking (“why should we measure an ISRO mission's success the way NASA measures its missions' successes?”), unconditional expressions of support and adulation spread freely through our shared communication networks. This can only keep up a false veil of greatness that crumbles the moment it brooks legitimate criticism, becoming desperate for yet another veil to replace it.

But even that is beside the point: to echo the philosopher Bruno Latour, when criticism is blocked from attending to something we have all laboured to build, that something is deprived of the “care and caution” it needs to grow, to no longer be fragile. Yet that’s exactly what the Indian space programme risks becoming today. Here’s a brand new case in point, from the tweets that prompted this post: according to an RTI query filed by @SolidBoosters, India’s homegrown NavIC satellite navigation constellation is just one clock failure away from “complete operational collapse”. The issue appears to be ISRO’s subpar launch cadence and the consequently sluggish replacement of clocks that have already failed.

6/6 Root Cause Analysis for atomic clock failures has been completed but classified under RTI Act Section 8 as vital technical information. Meanwhile public transparency is limited while the constellation continues degrading. #NavIC #ISRO #RTI

— SolidBoosters (@SolidBoosters) July 2, 2025

Granted, rushed critiques and critiques designed to sting more than guide can only be expected to elicit defensive posturing. But to minimise one's exposure to all criticism altogether, especially criticism from learned quarters conveyed in respectful language, is to deprive oneself of the pressure and the drive to solve the right problems in the right ways, both drawing from and adding to India's democratic fabric. The end results are public speeches and commentary that are increasingly removed from reality as well as, more importantly, thicker walls between criticism and The Thing it strives to nurture.

Externalised costs and the human on the bicycle

By: VM

Remember the most common question the protagonists of the eponymous British sitcom The IT Crowd asked a caller checking why a computer wasn’t working? “Have you tried turning it off and on again?” Nine times out of 10, this fixed the problem, whatever it was, and the IT team could get on with its life.

Around COP26 or so, I acquired a similar habit: every time someone presented something as a model of energy and/or cost efficiency, my first thought was whether they’d included the externalised costs. This is clearly a global problem today yet many people continue to overlook it in contexts big and small. So when I came across a neat graph on Bluesky (shown below), drawn from an old article in Scientific American, I began to wonder if the awesome transportation efficiency of the human on the bicycle (HotB) included the energy costs of making the bicycle as well.

According to the article, written by one S.S. Wilson and published in 1973, the HotB required 1-2 calories per gram per km to move around. The next most efficient mover was the salmon, which needed 4 cal/g/km. If the energy costs of making the bicycle are included, the energy cost per g/km would shoot up and, depending on the distance the HotB travels, the total cost may never become fully amortised. (It also matters that the math works out only this way at the scale of the human: anything smaller or bigger and the energy cost increases per unit weight per unit distance.)
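To make the amortisation point concrete, here is a rough, illustrative calculation. All the specific numbers in it are assumptions except the ~0.15 cal/g/km riding figure, which comes from the Scientific American excerpt at the end of this post.

#include <stdio.h>

/* Rough sketch of the amortisation argument. The embodied manufacturing
   energy and the rider + bicycle mass below are made-up placeholder values;
   only the ~0.15 cal/g/km riding figure comes from the excerpt quoted later
   in this post. */
int main(void) {
    const double riding_cal_per_g_km = 0.15;              /* cycling cost, from the excerpt */
    const double mass_g              = 80000.0 + 12000.0; /* assumed rider + bicycle, grams */
    const double embodied_kcal       = 60000.0;           /* assumed manufacturing energy, kcal */

    for (double km = 1000.0; km <= 100000.0; km *= 10.0) {
        double amortised = (embodied_kcal * 1000.0) / (mass_g * km); /* cal per g per km */
        printf("%8.0f km ridden: %.3f (riding) + %.3f (embodied) cal/g/km\n",
               km, riding_cal_per_g_km, amortised);
    }
    return 0;
}

The per-distance share of the embodied energy shrinks as the bicycle is ridden further, which is the sense in which whether the cost is ever fully amortised depends on the distance travelled.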

But there's a problem with this line of thinking. On a more basic level, neither Wilson nor Scientific American intended the graph to be completely accurate or claimed it was backed by any research beyond what was required to estimate the energy costs of moving different kinds of movers over some distance. It was a graph meant to make one limited point. More importantly, it illustrates how factoring in externalised costs can become counterproductive if the attempt isn't guided by subjective, qualitative assessments of what we're arguing for or against.

Of course the question of external costs is an important one to ask — more so today, when climate commitments and actions are being reinterpreted in dollar figures and quantitative assessments are gaining in prominence as the carbon budget may well have to be strictly rationed among the world’s countries. But whether or not some activity is rendered more or less efficient by factoring in its externalised costs, any human industrial activities — including those to manufacture bicycles — are polluting. There’s no escaping that. And the struggle to mitigate climate change is a struggle to mitigate climate change while ensuring we don’t undermine or compromise the developmental imperative. Otherwise the struggle isn’t one at all.

Even more importantly, this balancing act isn’t a strategy and isn’t the product of consensus: it’s an implicit and morally and ethically correct assumption, an implicit and inviolable component of global climate mitigation efforts. Put another way, this is how it needs to be. In this milieu, and at a time it’s becoming clear the world’s richer countries have a limit to how much they’re prepared to spend to help poorer countries deal with climate change, the impulse to consider externalised costs can mislead decision-making by making some choices seem more undesirable than they really are.

Externalised costs are, or ought to be, important when the emissions from some activity don't stack up commensurately with any social, cultural, and/or political advantages they confer as well. These costs are not always avoidable, nor are they always undesirable, and we need to keep an eye on where we're drawing the line between acceptable and unacceptable costs. The danger is that as richer countries both expect and force poorer ones to make more emissions cuts, the latter may have to adopt more robust quantitative rationales to determine what emissions to cut from which sources and when. Should they include externalised costs, many enterprises that should actually live on may face the axe instead.

For one, the HotB should be able to continue to ride on.


Addendum: Here's an (extended) excerpt from the Scientific American article on where the HotB scores its efficiency gains.

Before considering these developments in detail it is worth asking why such an apparently simple device as the bicycle should have had such a major effect on the acceleration of technology. The answer surely lies in the sheer humanity of the machine. Its purpose is to make it easier for an individual to move about, and this the bicycle achieves in a way that quite outdoes natural evolution. When one compares the energy consumed in moving a certain distance as a function of body weight for a variety of animals and machines, one finds that an unaided walking man does fairly well (consuming about .75 calorie per gram per kilometer), but he is not as efficient as a horse, a salmon or a jet transport. With the aid of a bicycle, however, the man’s energy consumption for a given distance is reduced to about a fifth (roughly .15 calorie per gram per kilometer). Therefore, apart from increasing his unaided speed by a factor of three or four, the cyclist improves his efficiency rating to No. 1 among moving creatures and machines.

… The reason for the high energy efficiency of cycling compared with walking appears to lie mainly in the mode of action of the muscles. … the cyclist … saves energy by sitting, thus relieving his leg muscles of their supporting function and accompanying energy consumption. The only reciprocating parts of his body are his knees and thighs; his feet rotate smoothly at a constant speed and the rest of his body is still. Even the acceleration and deceleration of his legs are achieved efficiently, since the strongest muscles are used almost exclusively; the rising leg does not have to be lifted but is raised by the downward thrust of the other leg. The back muscles must be used to support the trunk, but the arms can also help to do this, resulting (in the normal cycling attitude) in a little residual strain on the hands and arms.

Featured image credit: Luca Zanon/Unsplash.

Tamil Nadu’s lukewarm heatwave policy

By: VM

From ‘Tamil Nadu heatwave policy is only a start’, The Hindu, November 21, 2024:

Estimates of a heatwave’s deadliness are typically based on the extent to which the ambient temperature deviates from the historical average at a specific location and the number of lives lost during and because of the heatwave. This is a tricky, even devious, combination as illustrated by the accompanying rider: “to the reasonable exclusion of other causes of hyperthermia”.

A heatwave injures and/or kills by first pushing more vulnerable people over the edge; the less vulnerable are further down the line. The new policy is presumably designed to help the State catch those whose risk exposure the State has not been able to mitigate in time. However, the goal should be to altogether reduce the number of people requiring such catching. The policy lacks the instruments to guide the State toward this outcome.

The farm fires paradox

By: VM

From The Times of India on November 18, 2024:

A curious claim by all means. The scientist, one Hiren Jethva at NASA Goddard, compared data from the Aqua, Suomi-NPP, and GEO-KOMPSAT 2A satellites and reported that the number of farm fires over North India and Pakistan had dropped whereas the aerosol optical depth — a proxy measure of the aerosol load in the atmosphere — had remained what it has been over the last half decade or so. He interpreted this to suggest farmers could be burning paddy stubble after the Aqua and Suomi-NPP satellites had completed their overpass. GEO-KOMPSAT 2A is in a geostationary orbit, so there's no evading its gaze.

The idea that farmers across the many paddy-growing states in North India collectively decided to postpone their fires to keep them out of the satellites' sight seems preposterous. The Times of India article has some experts towards the end saying this…

… and I sort of agree because it’s in farmers’ interests for the satellites to see more of their fires so the national and state governments can give them better alternatives with better incentives.

The farmers aren’t particularly keen on burning the stubble — they’re doing it because it’s what’s cheapest and quickest. It also matters that there is no surer path to national headlines than being one of the causes of air pollution in New Delhi, much more than dirtying the air in any other city in the country, and that both national and states’ governments have thus far failed to institute sustainable alternatives to burning the stubble. Taken together, if any farmers are looking for better alternatives, more farm fires seem to be the best way to put pressure on governments to do better.

All this said, there may be a fallacy lurking in Jethva's decision to interpret the timing change solely with respect to the overpass times of the two US satellites and not with any other factor. It's amusing, with a tinge of disappointment, that the possibility of someone somewhere “educating” farmers to change their behaviour, and of them then following suit en masse, seemed more plausible than the possibility of the satellite data being flawed. If a fire burns in a farm and no satellite is around to see it, does it still produce smoke?

As The Hindu reported:

The data on fire counts are from a heat-sensing instrument on two American satellites — Suomi-NPP and NOAA-20 polar-orbiting satellites. Instruments on polar-orbiting satellites typically observe a wildfire at a given location a few times a day as they orbit the Earth, pole to pole. They pass over India from 1 p.m. to 2 p.m. …

Other researchers also suggest that merely relying on fire counts from the polar satellites may be inadequate and newer satellite data parameters, such as estimating the actual extent of fields burned, may be a more accurate indicator of the true measure of stubble burning.

An infuriating editorial in Science

By: VM

I’m not just disappointed with an editorial published by the journal Science on November 14, I’m angry.

Irrespective of whether the Republican Party in the US has shifted more or less rightward on specific issues, it has certainly shifted towards falsehoods on many of them. Party leaders, including Donald Trump, have been using everything from lazily inaccurate information to deliberately misleading messages to preserve conservative attitudes wherever that’s been the status quo and to stoke fear, confusion, uncertainty, and animosity where peace and good sense have thus far prevailed.

Against this backdrop, which the COVID-19 pandemic revealed in all its glory, Science‘s editorial is headlined “Science is neither red nor blue”. (Whether this is a reference to the journal itself is immaterial.) Its author, Marcia McNutt, president of the US National Academy of Sciences (NAS), writes (emphasis added):

… scientists need to better explain the norms and values of science to reinforce the notion—with the public and their elected representatives—that science, at its most basic, is apolitical. Careers of scientists advance when they improve upon, or show the errors in, the work of others, not by simply agreeing with prior work. Whether conservative or liberal, citizens ignore the nature of reality at their peril. A recent example is the increased death rate from COVID-19 (as much as 26% higher) in US regions where political leaders dismissed the science on the effectiveness of vaccines. Scientists should better explain the scientific process and what makes it so trustworthy, while more candidly acknowledging that science can only provide the best available evidence and cannot dictate what people should value. Science cannot say whether society should prioritize allocating river water for sustaining fish or for irrigating farms, but it can predict immediate and long-term outcomes of any allocation scheme. Science can also find solutions that avoid the zero-sum dilemma by finding conservation approaches to water management that benefit both fish and farms.

Can anyone explain to me what the first portion in bold even means? Because I don’t want to assume a science administrator as accomplished as McNutt is able to ignore the narratives and scholarship roiling around the sociology of science at large or the cruel and relentless vitiation of scientific knowledge the first Trump administration practiced in particular. Even if the editorial’s purpose is to extend an olive branch to Trump et al., it’s bound to fail. If, say, a Republican leader makes a patently false claim in public, are we to believe an institution as influential as the NAS will not call it out for fear of being cast as “blue” in the public eye?

The second portion in bold is slightly less ridiculous: “science can only provide the best available evidence and cannot dictate what people should value.” McNutt is creating a false impression here by failing to present the full picture. During a crisis, science has to be able to tell people what to value more or less rather than what to value at all. Crises create uncertainty whereas science creates knowledge that is free from bias (at least it can be). It offers a pillar to lean on while we figure out everything else. People should value these pillars.

When a national government — in this case the government of one of the world's most powerful countries — gives conspiracies and lies free rein, crises will be everywhere. If McNutt means to suggest these crises are so only insofar as the liberal order is faced with changes inimical to its sustenance, she will be confusing what is today the evidence-conspiracy divide for what was once, but is no longer, the conservative-liberal divide.

As if to illustrate this point, she follows up with the third portion in bold: “Science cannot say whether society should prioritize allocating river water for sustaining fish or for irrigating farms, but it can predict immediate and long-term outcomes of any allocation scheme.” Her choice of example is clever because it’s also fallacious: it presents a difficult decision with two reasonable outcomes, ‘reasonable’ being the clincher. The political character of science-in-practice is rarely revealed in debates where reasonability is allowed through the front door and given the power to cast the decisive vote. This was almost never the case under the first Trump administration nor the parts of the Republican Party devoted to him (which I assume is the whole party now), where crazy* has had the final say.

The choice McNutt should really have deliberated is “promoting the use of scientifically tested vaccines during a pandemic versus urging people to be cautious about these vaccines” or “increasing the stockpile of evidence-backed drugs and building social resilience versus hawking speculative ideas and demoralising science administrators”. When the choice is between irrigation for farms and water for fisheries, science can present the evidence and then watch. When the choice is between reason and bullshit, still advocating present-and-watch would be bullshit, too — i.e. science would be “red”.

This is just my clumsy, anger-flecked take on what John Stuart Mill and many others recognised long ago: “Bad men need nothing more to compass their ends than that good men should look on and do nothing.” But if McNutt would still rather push the line that what seem like “bad men” to me might be good men to others, she and the policies she influences will have committed themselves to the sort of moral relativism that could never be relevant to politics in practice, which in turn would be a blow for us all.


(* My colloquialism for the policy of being in power for the sake of being in power, rather than to govern.)

Low Orbit Satellite Companies Respond to Scientists’ Concerns About Light and Environmental Pollution With Even Bigger, Brighter Satellites

By: Nick Heer

Karl Bode, Techdirt:

Scientists say that low earth orbit (LEO) satellite constellations being built by Amazon, Starlink, and AT&T pose a dire threat to astronomy and scientific research, and that too little is being done to address the issue.

There are costs to suddenly widespread satellite connectivity. Apple’s partner in its offering, Globalstar, operates a constellation of satellites which would similarly be concerning to scientists.

It is a tricky balance. Adding redundant communications layers in our everyday devices can be useful and is, plausibly, of lifesaving consequence. Yet it also means the sky is littered with fields of objects which interfere with ground-based instruments. The needs of scientists might seem more abstract and less dire than, say, people seeking help in a natural disaster — I understand that. But I am not certain we will be proud of ourselves fifty years from now if we realize astronomical research has been severely curtailed because a bunch of private companies decided to compete in our shared sky. There is surely a balance to be struck.


What can science education do, and what can it not?


On September 29, 2021, The Third Eye published an interview with Milind Sohoni, a teacher at the Centre for Technology Alternatives for Rural Areas and at IIT Bombay. (Thanks to @labhopping for bringing it into my feed.) I found it very thought-provoking. I’m pasting below some excerpts from the interview together with my notes. I think what Prof. Sohoni says doesn’t build up to a coherent whole. He is at times simplistic and self-contradictory, and what he says is often descriptive instead of offering a way out. Of course I don’t know whether what I say builds up to a coherent whole either but perhaps you’ll realise details here that I’ve missed.


… I wish the textbooks had exercises like let’s visit a bus depot, or let’s visit a good farmer and find out what the yields are, or let’s visit the PHC sub-centre, talk to the nurse, talk to the compounder, talk to the two doctors, just getting familiar with the PHC as something which provides a critical health service would have helped a lot. Or spend time with an ASHA worker. She has a notepad with names of people in a village and the diseases they have, which family has what medical emergency. How is it X village has so much diabetes and Y village has none?

I’m sure you’ll agree this would be an excellent way to teach science — together with its social dependencies instead of introducing the latter as an add-on at the level of higher, specialised education.

… science education is not just about big science, and should not be about big science. But if you look at the main central government departments populated by scientists, they are Space, Atomic Energy and Defence. Okay, so we have missile men and women, big people in science, but really, so much of science in most of the developed world is really sadak, bijli, pani.

I disagree on three counts. (i) Science education should include ‘big science’; if it doesn’t we lose access to a domain of knowledge and enterprise that plays an important role in future-proofing societies. We choose the materials with which we will build buildings, lay roads, and make cars and batteries and from which we will generate electric power based on ‘big science’. (ii) Then again, what is ‘big science’? I’m not clear what Sohoni means by that in this comment. But later in the interview he refers to Big Science as a source of “certainty” (vis-à-vis life today) delivered in the form of “scientific things … which we don’t understand”.

If by “Big Science” he means large scientific experiments that have received investments worth millions of dollars from multiple governments, and which are churning out results that don't inform or enhance contemporary daily life, his statement seems all the more problematic. If a government invests some money in a Big Science project but then pulls out, it doesn't necessarily or automatically redirect those funds to a project that a critic has deemed more worthwhile, like, say, multiple smaller science projects. Government support for Big Science has never operated that way. Further, Big Science frequently, and almost by design, leads to a lot of derivative ‘Smaller Science’, spinoff technologies, and advances in allied industries. Irrespective of whether these characteristics — accidental or otherwise — suffice to justify supporting a Big Science project, wanting to expel such science from science education is still reckless.

(Embedded post: ‘You're allowed to be interested in particle physics’)

(iii) Re: “… so much of science in most of the developed world is really streets, electricity, water” — Forget proving/disproving this and ask yourself: how do we separate research in space, atomic energy, and defence from knowledge that gave rise to better roads, cheaper electricity, and cleaner water? We can’t. There is also a specific history that explains why each of these departments Sohoni has singled out were set up the way they were. And just because they are staffed with scientists doesn’t mean they are any good or worth emulating. (I’m also setting aside what Sohoni means by “much”. Time consumed in research? Money spent? Public value generated? Number of lives improved/saved?).

Our science education should definitely include Big Science: following up from the previous quote, teachers can take students to a radio observatory nearby and speak to the scientists about how the project acquired so much land, how it secured its water and power requirements, how administrators negotiated with the locals, etc. Then perhaps we can think about avoiding cases like the INO.

(Embedded post: ‘India-based neutrino oblivion’)
The Prohibition of Employment as Manual Scavengers Act came long ago, and along with it came a list of 42 [pieces of] equipment, which every municipality should have: a mask, a jetting machine, pumps and so on. Now, even IIT campuses don't have that equipment. Is there any lab that has a ‘test mask’ even? Our men are going into tanks and dying because of [lethal] fumes. A ‘test mask’ is an investment. You need a face-like structure and an artificial lung exposed to various environments to test its efficacy. And this mask needs to be standard equipment in every state. But these are things we never asked IITs to do, right?

This comment strikes a big nail on the head. It also brings to mind an incident on the Anna University campus eight years ago. To quote from Thomas Manuel’s report in The Wire on the incident: “On June 21, 2016, two young men died. Their bodies were found in a tank at the Anna University campus in Chennai. They were employees of a subcontractor who had been hired to seal the tank with rubber to prevent any leakage of air. The tank was being constructed as a part of a project by the Ministry of Renewable Energy to explore the possibilities of using compressed air to store energy. The two workers, Ramesh Shankar and Deepan, had arrived at the site at around 11.30 am and begun work. By 3.30 pm, when they were pulled out of the tank, Deepan was dead and Ramesh Shankar, while still breathing at the time, died a few minutes later.”

This incident seemed, and still seems, to say that even within a university — a place where scientists and students are keenly aware of the rigours of science and the value it brings to society — no one thinks to ensure the people hired for what is casually called “menial” labour are given masks or other safety equipment. The gaps in science education Sohoni is talking about are evident in the way scientists think about how they can ensure society is more rational. A society rife with preventable deaths is not rational.

I think what science does is that it claims to study reality. But most of reality is socially administered, and so we need to treat this kind of reality also as a part of science.

No, we don’t. We shouldn’t. Science offers a limited set of methods and analytical techniques with which people can probe and describe reality and organise the knowledge they generate. He’s right, most of reality is socially administered, but that shouldn’t be an invitation to forcibly bring what currently lies beyond science to within the purview of science. The scientific method can’t deal with them — but importantly it shouldn’t be expected to. Science is incapable of handling multiple, equally valid truths pertaining to the same set of facts. In fact a few paras later Sohoni ironically acknowledges that there are truths beyond science and that their existence shouldn’t trouble scientists or science itself:

… scientists have to accept that there are many things that we don’t know, and they still hold true. Scientists work empirically and sometimes we say okay, let’s park it, carry on, and maybe later on we will find out the ‘why’. The ‘why’ or the explanation is very cultural…

… whereas science needs that ‘why’, and needs it to be singular and specific. If these explanations for aspects of reality don’t exist in a form science can accommodate, yet we also insist as Sohoni did when he said “we need to treat this kind of reality also as a part of science”, then we will be forced to junk these explanations for no fault except that they don’t meet science’s acceptability criteria.

Perhaps there is a tendency here as if to say we need a universal theory of everything, but do we? We can continue to use different human intellectual and social enterprises to understand and take advantage of different parts of human experience. Science and for that matter the social sciences needn’t be, and aren’t, “everything”.

Science has convinced us, and is delivering on its promise of making us live longer. Whether those extra five years are of higher quality is not under discussion. You know, this is the same as people coming from really nice places in the Konkan to a slum in Mumbai and staying there because they want certainty. Life in rural Maharashtra is very hard. There’s more certainty if I’m a peon or a security guard in the city. I think that science is really offering some ‘certainty’. And that is what we seem to have accepted.

This seems to me to be too simplistic. Sohoni says this in reply to being asked whether science education today leans towards “technologies that are serving Big Business and corporate profits, rather than this developmental model of really looking critically at society”. And he would have been fairer to say we have many more technological devices and products around us today, founded on what were once scientific ideas, that serve corporate profits more than anything else. The French philosopher Jacques Ellul elucidated this idea brilliantly in his book The Technological Society (1964).

It’s just that Sohoni’s example of ageing is off the mark, and in the process it is harder to know what he’s really getting at. Lifespan is calculated as the average number of years an individual in a particular population lives. It can be improved by promoting factors that help our bodies become more resilient and by dissuading factors that cause us to die sooner. If lifespan is increasing today, it’s because fewer babies are succumbing to vaccine-preventable diseases before they turn five, because there are fewer road accidents thanks to vehicle safety, and because novel treatments like immunotherapy are improving the treatment rates of various cancers. Any new scientific knowledge in the prevailing capitalist world-system is susceptible to being coopted by Big Business but I’m also glad the knowledge exists at all.

(Embedded post: ‘Hair conditioners and immortality’)

Sure, we can all live for five more years on average, but if those five years will be spent in, say, the humiliating conditions of palliative care, let’s fix that problem. Sohoni says science has strayed from that path and I’m not so sure — but I’m convinced there’s enough science to go around (and enough money for it, just not the political will): scientists can work on both increasing lifespan and improving the conditions of palliative care. We shouldn’t vilify one kind of science in order to encourage the other. Yet Sohoni persists with this juxtaposition as he says later:

… we are living longer, we are still shitting on the road or, you know, letting our sewage be cleaned by fellow humans at the risk of death, but we are living longer. And that is, I think, a big problem.

We are still shitting on the road and we are letting our sewage be cleaned by fellow humans at the risk of death. These are big problems. Us living longer is not a big problem.

Big Technology has a knack of turning us all into consumers of science, by neutralising questions on ‘how’ and ‘why’ things work. We accept it and we enjoy the benefits. But see, if you know the benefits are divided very unevenly, why doesn’t it bother us? For example, if you buy an Apple iPhone for Rs. 75,000 how much does the actual makers of the phone (factory workers) get? I call it the Buddhufication Crisis: a lot of people are just hooked on to their smartphones, and live in a bubble of manufactured certainty; and the rest of society that can’t access smartphones, is left to deal with real-world problems.

By pushing us to get up, get out, and engage with science where it is practised, a better science education can inculcate a more inquisitive, critical-thinking population that applies the good sense that comes of a good education to more, or all, aspects of society and social living. And it isn't Big Technology in particular that tempts us into becoming “consumers” of science rather than encouraging us to pick at its pieces; practically everything does. Similarly, Sohoni's “Buddhufication” description is muddled. Of course it's patronising towards the people who create value — especially if it is new and/or takes unexpected forms — out of smartphones and use it as a means of class mobility, and it seems to suggest a person striving for any knowledge other than of the scientific variety is being a “buddhu”. And what such “buddhufication” has to do with the working conditions of Apple's “factory workers” is unclear.

Speaking of relationships:

Through our Public Health edition, we also seem to sit with the feeling that science is not serving rural areas, not serving the poor. In turn, there is also a lower expectation of science from the rural communities. Do you feel this is true?
Yes, I think that is true to a large extent. But it’s not to do with rural. You see, for example, if you look at western Maharashtra — the Pune-Nashik belt — some of the cleverest people live there. They are basically producing vegetables for the big urban markets: in Satara, Sangli, that entire irrigated area. And in fact, you will see that they are very careful about their future, and understand their place in society and the role of the state. And they expect many things from the state or the government; they want things to work, hospitals to work, have oxygen, etc. And so, it is really about the basic understanding of cause and effect of citizenship. They understand what is needed to make buses work, or hospitals function; they understand how the state works. This is not very different from knowing how gadgets work.

While the distinction may seem trivial to many, “science” and “scientists” are not the same thing. The two are conflated throughout the interview. At first I assumed the conflation was casual and harmless, but at this point, given the links between science, science education, technology, and public welfare that Sohoni has tried to draw, the distinction is crucial. Science is already serving rural areas — Sohoni says as much in the comment here and the one that follows. But many, or maybe most, scientists may not be serving rural areas, a framing that also lets us acknowledge that some scientists are. “Science is not serving rural areas” would mean no researcher in the country — or anywhere, really — has brought the precepts of science to bear on the problems of rural India. This is just not true. On the other hand, saying “most scientists are not serving rural areas” tells us some useful scientific knowledge exists but (i) too few scientists are working on it (i.e. mindful of the local context) and (ii) there are problems with translating it from the lab bench to its application in the field, at ground zero.

This version of this post benefited from inputs from and feedback by Prathmesh Kher.

Neural Networks (MNIST inference) on the “3-cent” Microcontroller

By: cpldcpu

Buoyed by the surprisingly good performance of neural networks with quantization-aware training on the CH32V003, I wondered how far this can be pushed. How much can we compress a neural network while still achieving good test accuracy on the MNIST dataset? When it comes to absolutely low-end microcontrollers, there is hardly a more compelling target than the Padauk 8-bit microcontrollers. These are microcontrollers optimized for the simplest and lowest-cost applications there are. The smallest device of the portfolio, the PMS150C, sports 1024 13-bit words of one-time-programmable memory and 64 bytes of RAM, more than an order of magnitude smaller than the CH32V003. In addition, it has a proprietary accumulator-based 8-bit architecture, as opposed to the much more powerful RISC-V instruction set of the CH32V003.

Is it possible to implement an MNIST inference engine, which can classify handwritten numbers, also on a PMS150C?

On the CH32V003 I used MNIST samples that were downscaled from 28×28 to 16×16, so that every sample takes 256 bytes of storage. This is quite acceptable when there is 16 KB of flash available, but with only 1 kword of ROM, it is too much. Therefore I started by downscaling the dataset to 8×8 pixels.

The image above shows a few samples from the dataset at both resolutions. At 16×16 it is still easy to discriminate different numbers. At 8×8 it is still possible to guess most numbers, but a lot of information is lost.
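For a concrete picture of the kind of downscaling involved, here is a minimal C sketch that reduces a 16×16 image to 8×8 by averaging 2×2 blocks. It is only an illustration of the idea, not the preprocessing code actually used for the dataset.

#include <stdint.h>
#include <stdio.h>

/* Illustrative 16x16 -> 8x8 downscale by 2x2 block averaging (not the
   project's actual preprocessing code). */
static void downscale_16_to_8(const uint8_t in[16][16], uint8_t out[8][8]) {
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            int sum = in[2 * y][2 * x]     + in[2 * y][2 * x + 1]
                    + in[2 * y + 1][2 * x] + in[2 * y + 1][2 * x + 1];
            out[y][x] = (uint8_t)(sum / 4);  /* average of the 2x2 block */
        }
    }
}

int main(void) {
    uint8_t src[16][16] = {{0}}, dst[8][8];
    src[0][0] = 255;                 /* single bright pixel as a quick test */
    downscale_16_to_8(src, dst);
    printf("dst[0][0] = %d\n", dst[0][0]);  /* prints 63, i.e. 255/4 */
    return 0;
}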

Surprisingly, it is still possible to train a machine learning model to recognize even these very low-resolution numbers with impressive accuracy. It's important to remember that the test dataset contains 10,000 images that the model does not see during training. The only way for a very small model to recognize these images accurately is to identify common patterns; the model capacity is too limited to “remember” complete digits. I trained a number of different network combinations to understand the trade-off between network memory footprint and achievable accuracy.

Parameter Exploration

The plot above shows the result of my hyperparameter exploration experiments, comparing models with different configurations of weights and quantization levels from 1 to 4 bits for input images of 8×8 and 16×16 pixels. The smallest models had to be trained without data augmentation, as they would not converge otherwise.

Again, there is a clear relationship between test accuracy and the memory footprint of the network. Increasing the memory footprint improves accuracy up to a certain point. For 16×16, around 99% accuracy can be achieved at the upper end, while around 98.5% is achieved for 8×8 test samples. This is still quite impressive, considering the significant loss of information for 8×8.

For small models, 8×8 achieves better accuracy than 16×16. The reason for this is that the size of the first layer dominates in small models, and this size is reduced by a factor of 4 for 8×8 inputs.
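A quick back-of-the-envelope comparison shows why. Assuming a hypothetical fully connected first layer with 16 output channels and 2-bit weights (illustrative numbers, not the exact configurations from the sweep), the first layer alone shrinks by a factor of four when the input goes from 16×16 to 8×8:

#include <stdio.h>

/* Illustrative first-layer size comparison: a fully connected layer with 16
   outputs and 2-bit weights, fed either 8x8 or 16x16 input images. The layer
   width and bit width are assumptions for the sake of the example. */
int main(void) {
    const int hidden = 16, weight_bits = 2;
    const int inputs[2] = {8 * 8, 16 * 16};

    for (int i = 0; i < 2; i++) {
        int weights = inputs[i] * hidden;
        printf("%3d inputs -> %4d first-layer weights -> %4d bytes at %d bits/weight\n",
               inputs[i], weights, weights * weight_bits / 8, weight_bits);
    }
    return 0;
}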

Surprisingly, it is possible to achieve over 90% test accuracy even on models as small as half a kilobyte. This means that it would fit into the code memory of the microcontroller! Now that the general feasibility has been established, I needed to tweak things further to accommodate the limitations of the MCU.

Training the Target Model

Since the RAM is limited to 64 bytes, the model structure had to keep the number of intermediate activations during inference to a minimum. I found that it was possible to use layers as narrow as 16. This reduces the buffer size during inference to only 32 bytes: 16 bytes each for one input buffer and one output buffer, leaving 32 bytes for other variables. The 8×8 input pattern is read directly from the ROM.
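The buffering scheme this implies is a simple ping-pong between two 16-byte arrays, sketched below. The function names, layer count, and the dummy layer kernel are all illustrative; they are not the project's actual identifiers.

#include <stdint.h>

/* Sketch of the ping-pong activation buffers described above: two 16-byte
   arrays alternate as input and output of successive layers, keeping peak
   activation storage at 32 bytes. The kernel below is only a placeholder. */
static uint8_t buf_a[16], buf_b[16];

static void fc_layer(const uint8_t *in, uint8_t n_in, uint8_t *out) {
    /* stand-in for the real 2-bit-weight multiply-accumulate kernel */
    for (uint8_t i = 0; i < 16; i++) out[i] = in[i % n_in];
}

void run_network(const uint8_t *rom_input_8x8) {
    fc_layer(rom_input_8x8, 64, buf_a); /* layer 1: 8x8 input read from ROM -> 16 activations */
    fc_layer(buf_a, 16, buf_b);         /* layer 2: buf_a -> buf_b */
    fc_layer(buf_b, 16, buf_a);         /* layer 3: buf_b -> buf_a, reusing the first buffer */
    /* the classification result is read from whichever buffer holds the last output */
}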

Furthermore, I used 2-bit weights with irregular spacing of (-2, -1, 1, 2) to allow for a simplified implementation of the inference code. I also skipped layer normalization and instead used a constant shift to rescale activations. These changes slightly reduced accuracy. The resulting model structure is shown below.
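In plain C, the weight scheme described above boils down to something like the sketch below, which mirrors the logic of the assembly shown further down: one bit doubles the activation, the other negates it. The bit positions and names here are illustrative; in the actual code four such 2-bit weights are packed into each byte.

#include <stdint.h>

/* Illustrative decode of a 2-bit weight from the set (-2, -1, 1, 2): one bit
   selects the magnitude (1 or 2), the other the sign, so no real multiply is
   needed. Bit positions are chosen for clarity, not taken from the project. */
int16_t apply_2bit_weight(uint8_t w, int16_t activation) {
    int16_t v = activation;
    if (w & 0x01) v += v;   /* "double" bit: magnitude 1 -> 2 */
    if (w & 0x02) v = -v;   /* "negate" bit: flip the sign */
    return v;               /* activation multiplied by +1, +2, -1 or -2 */
}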

All things considered, I ended up with a model with 90.07% accuracy and a total of 3392 bits (0.414 kilobytes) in 1696 weights, as shown in the log below. The panel on the right displays the first layer weights of the trained model, which directly mask features in the test images. In contrast to the higher accuracy models, each channel seems to combine many features at once, and no discernible patterns can be seen.

Implementation on the Microcontroller

In the first iteration, I used a slightly larger variant of the Padauk Microcontrollers, the PFS154. This device has twice the ROM and RAM and can be reflashed, which tremendously simplifies software development. The C versions of the inference code, including the debug output, worked almost out of the box. Below, you can see the predictions and labels, including the last layer output.

Squeezing everything down to fit into the smaller PMS150C was a different matter. One major issue when programming these devices in C is that every function call consumes RAM for the return stack and function parameters. This is unavoidable because the architecture has only a single register (the accumulator), so all other operations must occur in RAM.

To solve this, I flattened the inference code and implemented the inner loop in assembly to optimize variable usage. The inner loop for memory-to-memory inference of one layer is shown below. The two-bit weight is multiplied with a four-bit activation in the accumulator and then added to a 16-bit register. The multiplication requires only four instructions (t0sn, sl, t0sn, neg), thanks to the powerful bit manipulation instructions of the architecture. The sign-extending addition (add, addc, sl, subc) also consists of four instructions, demonstrating the limitations of 8-bit architectures.

void fc_innerloop_mem(uint8_t loops) {

    sum = 0;
    do  {
       weightChunk = *weightidx++;
__asm   
    idxm  a, _activations_idx
	inc	_activations_idx+0

    t0sn _weightChunk, #6
    sl     a            ;    if (weightChunk & 0x40) in = in+in;
    t0sn _weightChunk, #7
    neg    a           ;     if (weightChunk & 0x80) in =-in;                    

    add    _sum+0,a     ;     add a to the low byte of the 16-bit sum
    addc   _sum+1       ;     propagate the carry into the high byte
    sl     a            ;     shift the sign bit of a into the carry
    subc   _sum+1       ;     subtract the carry to sign-extend a into the high byte

  ... 3x more ...

__endasm;
    } while (--loops);

    int8_t sum8 = ((uint16_t)sum)>>3; // Normalization
    sum8 = sum8 < 0 ? 0 : sum8; // ReLU
    *output++ = sum8;
}

In the end, I managed to fit the entire inference code into 1 kword of memory and reduced SRAM usage to 59 bytes, as seen below. (Note that the output from SDCC assumes 2 bytes per instruction word, while a word is really only 13 bits.)

Success! Unfortunately, there was no ROM space left for the soft UART to output debug information. However, based on the verification on the PFS154, I trust that the code works, and since I don't have any specific application in mind, I left it at that stage.

Summary

It is indeed possible to implement MNIST inference with good accuracy using one of the cheapest and simplest microcontrollers on the market. A lot of memory footprint and processing overhead is usually spent on implementing flexible inference engines that can accommodate a wide range of operators and model structures. Cutting this overhead away and reducing the functionality to its core allows for astonishing simplification at this very low end.

This hack demonstrates that there truly is no fundamental lower limit to applying machine learning and edge inference. However, the feasibility of implementing useful applications at this level is somewhat doubtful.

You can find the project repository here.

The pitfalls of Somanath calling Aditya L1 a “protector”

By: VM

In a WhatsApp group of which I’m a part, there’s a heated discussion going on around an article published by NDTV on June 10, entitled ‘Sun’s Fury May Fry Satellites, But India Has A Watchful Space Protector’. The article was published after the Indian Space Research Organisation (ISRO) published images of the Sun the Aditya L1 spacecraft (including its coronagraph) captured during the May solar storm. The article also features quotes by ISRO chairman S. Somanath — and some of them in particular prompted the discussion. For example, he says:

“Aditya L1 captured when the Sun got angry this May. If it gets furious in the near future, as scientists suggest, India’s 24x7X365 days’ eye on the Sun is going to provide a forewarning. After all, we have to protect the 50-plus Indian satellites in space that have cost the country an estimated more than ₹ 50,000 crore. Aditya L1 is a celestial protector for our space assets.”

A space scientist on the group pointed out that any solar event that could fry satellites in Earth orbit would also fry Aditya L1, which is stationed at the first Earth-Sun Lagrange point (1.5 million km from Earth in the direction of the Sun), so it doesn’t make sense to describe this spacecraft as a “protector” of India’s “space assets”. Instead, the scientist said, we’re better off describing Aditya L1 as a science mission, which is what it’d been billed as.

Another space scientist in the same group contended that the coronagraph onboard Aditya L1, plus its other instruments, still gives the spacecraft a not insignificant early-warning ability, using which ISRO could consider protective measures. He also said not all solar storms are likely to fry all satellites around Earth, only the very powerful ones; likewise, not all satellites around Earth are engineered to the same degree to withstand solar radiation that is more intense than usual. With these variables in mind, he added, Aditya L1 — which is protected to a greater degree — could give ISRO folks enough of a head start to manoeuvre ‘weaker’ satellites out of harm’s way or prevent catastrophic failures. By virtue of being ISRO’s eyes on the Sun, then, he suggested Aditya L1 was a scientific mission that could also perform some, but not all, of the functions expected of a full-blown early warning system.

(For such a system vis-a-vis solar weather, he said the fourth or the fifth Earth-Sun Lagrange points would have been better stations.)

I’m putting this down here as a public service message. Characterising a scientific mission — which is driven by scientists’ questions, rather than ISRO’s perception of threats or as part of any overarching strategy of the Indian government — as something else is not harmless because it downplays the fact that we have open questions and that we need to spend time and money answering them. It also creates a false narrative about the mission’s purpose that the people who have spent years designing and building the instruments onboard Aditya L1 don’t deserve, and a false impression of how much room the Indian space programme currently has to launch and operate spacecraft that are dedicated to providing early warnings of bad solar weather.

To be fair, the NDTV article says in a few places that Aditya L1 is a scientific mission, as does astrophysicist Somak Raychaudhury in the last paragraph. It’s just not clear why Somanath characterised it as a “protector” and as a “space-based insurance policy”. NDTV also erred by putting “protector” in the headline (based on my experiences at The Wire and The Hindu, most readers of online articles read and share nothing more than the headline). That it was the ISRO chairman who said these things is more harmful: as the person heading India’s nodal space research body, he has a protagonist’s role in making room in the public imagination for the importance and wonders of scientific missions.

The BHU Covaxin study and ICMR bait

By: VM

Earlier this month, a study by a team at Banaras Hindu University (BHU) in Varanasi concluded that fully 1% of Covaxin recipients may suffer severe adverse events. One percent is a large number because the multiplier (x in 1/100 * x) is very large — several million people. The study first hit the headlines for claiming it had the support of the Indian Council of Medical Research (ICMR) and reporting that both Bharat Biotech and the ICMR are yet to publish long-term safety data for Covaxin. The latter is probably moot now, with the COVID-19 pandemic well behind us, but it’s the principle that matters. Let it go this time and who knows what else we’ll be prepared to let go.

But more importantly, as The Hindu reported on May 25, the BHU study is too flawed to claim Covaxin is harmful, or claim anything for that matter. Here’s why (excerpt):

Though the researchers acknowledge all the limitations of the study, which is published in the journal Drug Safety, many of the limitations are so critical that they defeat the very purpose of the study. “Ideally, this paper should have been rejected at the peer-review stage. Simply mentioning the limitations, some of them critical to arrive at any useful conclusion, defeats the whole purpose of undertaking the study,” Dr. Vipin M. Vashishtha, director and pediatrician, Mangla Hospital and Research Center, Bijnor, says in an email to The Hindu. Dr. Gautam Menon, Dean (Research) & Professor, Departments of Physics and Biology, Ashoka University shares the same view. Given the limitations of the study one can “certainly say that the study can’t be used to draw the conclusions it does,” Dr. Menon says in an email.

Just because you’ve admitted your study has limitations doesn’t absolve you of the responsibility to interpret your research data with integrity. In fact, the journal needs to speak up here: why did Drug Safety publish the study manuscript? Too often when news of a controversial or bad study is published, the journal that published it stays out of the limelight. While the proximal cause is likely that journalists don’t think to ask journal editors and/or publishers tough questions about their publishing process, there is also a cultural problem here: when shit hits the fan, only the study’s authors are pulled up, but when things are rosy, the journals are out to take credit for the quality of the papers they publish. In either case, we must ask what they actually bring to the table other than capitalising on other scientists’ tendency to judge papers based on the journals they’re published in instead of their contents.

Of course, it’s also possible to argue that unlike, say, journalistic material, research papers aren’t required to be in the public interest at the time of publication. Yet the BHU paper threatens to undermine public confidence in observational studies, and that can’t be in anyone’s interest. Even at the outset, experts and many health journalists knew observational studies don’t carry the same weight as randomised controlled trials as well as that such studies still serve a legitimate purpose, just not the one to which its conclusions were pressed in the BHU study.

After the paper’s contents hit the headlines, the ICMR shot off a letter to the BHU research team saying it hasn’t “provided any financial or technical support” to the study and that the study is “poorly designed”. Curiously, the BHU team’s repartee to the ICMR’s letter makes repeated reference to Vivek Agnihotri’s film The Vaccine War. In the same point in which two of these references appear (no. 2), the team writes: “While a study with a control group would certainly be of higher quality, this immediately points to the fact that it is researchers from ICMR who have access to the data with the control group, i.e. the original phase-3 trials of Covaxin – as well publicized in ‘The Vaccine War’ movie. ICMR thus owes it to the people of India, that it publishes the long-term follow-up of phase-3 trials.”

I’m not clear why the team saw fit to appeal to statements made in this of all films. As I’ve written earlier, The Vaccine War — which I haven’t watched but which directly references journalistic work by The Wire during and of the pandemic — is most likely a mix of truths and fictionalisation (and not in the clever, good-faith ways in which screenwriters adopt textual biographies for the big screen), with the fiction designed to serve the BJP’s nationalist political narratives. So when the letter says in its point no. 5 that the ICMR should apologise to a female member of the BHU team for allegedly “spreading a falsehood” about her and offers The Vaccine War as a counterexample (“While ‘The Vaccine War’ movie is celebrating women scientists…”), I can’t but retch.

Together with another odd line in the letter — that the “ICMR owes it to the people of India” — the appeals read less like a debate between scientists on the merits and the demerits of the study and more like they’re trying to bait the ICMR into doing better. I’m not denying the ICMR started it, as a child might say, but saying that this shouldn’t have prevented the BHU team from keeping it dignified. For example, the BHU letter reads: “It is to be noted that interim results of the phase-3 trial, also cited by Dr. Priya Abraham in ‘The Vaccine War’ movie, had a mere 56 days of safety follow-up, much shorter than the one-year follow-up in the IMS-BHU study.” Surely the 56-day period finds mention in a more respectable and reliable medium than a film that confuses you about what’s real and what’s not?

In all, the BHU study seems to have been designed to draw attention to gaps in the safety data for Covaxin — but by adopting such a provocative route, all that took centre stage was its spat with the ICMR plus its own flaws.

India can do it!

By: VM

Against the background of the H5N1 pandemic in birds and an epidemic among cattle in the US, the Government of Victoria, in Australia, published a statement on May 21 that the state had recorded the country’s first human H5N1 case. This doesn’t seem to be much cause (but also not negligible cause) for concern because, according to the statement as well as other experts, this strain of avian influenza hasn’t evolved to spread easily between people. The individual who had the infection — “a child”, according to Victoria’s statement — had a severe form of it but has since recovered fully.

But this story isn’t testament to Australia’s pathogen surveillance, at least not primarily; it’s testament to India’s ability to do it. In Vivek Agnihotri’s film The Vaccine War — purportedly about the efforts of Bharat Biotech, the ICMR, and the NIV to develop Covaxin during the COVID-19 pandemic — Raima Sen, who plays the science editor of a fictitious publication called The Daily Wire, says about developing the vaccine in a moment of amusing cringe on a TV news show that “India can’t do it”. Agnihotri didn’t make it difficult to see myself in Sen’s character: I was science editor of the very real publication The Wire when Covaxin was being developed. And I’m here to tell you that India, in point of fact, can: according to Victoria’s statement, the child became infected by a strain of the H5N1 virus in India and fell ill in March 2024.

And what is it that India can do? According to Victoria’s statement, spotting the infection required “Victoria’s enhanced surveillance system”. Further, “most strains don’t infect humans”; India was able to serve the child with one of the rare strains that could. “Transmission to humans” is also “very rare”, happening largely among people who “have contact with infected birds or animals, or their secretions, while in affected areas of the world”. Specifically: “Avian influenza is spread by close contact with an infected bird (dead or alive), e.g. handling infected birds, touching droppings or bedding, or killing/preparing infected poultry for cooking. You can’t catch avian influenza through eating fully cooked poultry or eggs, even in areas with an outbreak of avian influenza.”

So let’s learn our lesson: If we give India’s widespread dysregulation of poultry and cattle health, underinvestment in pathogen surveillance, and its national government’s unique blend of optimism and wilful ignorance a chance, the country will give someone somewhere a rare strain of an avian influenza virus that can infect humans. Repeat after me: India can do it!

The billionaire’s solution to climate change

By: VM

On May 3, Bloomberg published a profile of Salesforce CEO Marc Benioff’s 1t.org project to plant or conserve one trillion trees around the world in order to sequester 200 gigatonnes of carbon every year. The idea reportedly came to Benioff from Thomas Crowther’s infamous September 2015 paper in Nature that claimed restoring trees was the world’s best way to ‘solve’ climate change.

Following pointed criticism of the paper’s attitude and conclusions, it was revised to a significant extent in October 2019 to temper its predictions about the carbon sequestration potential of the world’s trees and to withdraw the assertion that no other solution could work better than planting and/or restoring trees.

According to Bloomberg’s profile, Benioff’s 1t.org initiative seems to be faltering as well, with unreliable accounting of the pledges companies submitted to 1t.org and, unsurprisingly, many of these companies engaging in shady carbon-credit transactions. This is also why Jane Goodall’s comment in the article is disagreeable: it isn’t better for these companies to do something vis-à-vis trees than nothing at all because the companies are only furthering an illusion of climate action — claiming to do something while doing nothing at all — and perpetuating the currency of counterproductive ideas like carbon-trading.

A smattering of Benioff’s comments to Bloomberg is presented throughout the profile, as a result of which he might come across as a sage figure — but take the comments together, in one go, and he actually sounds like a child.

“I think that there’s a lot of people who are attacking nature and hate nature. I’m somebody who loves nature and supports nature.”

This comment follows one by “the climate and energy policy director at the Union of Concerned Scientists”, Rachel Cleetus, that trees “should not be seen as a substitute for the core task at hand here, which is getting off fossil fuels.” But in Bloomberg’s telling, Cleetus is a [checks notes] ‘nature hater’. Similarly, the following thoughtful comment is Benioff’s view of other scientists who criticised the Crowther et al. paper:

“I view it as nonsense.”

Moving on…

“I was in third grade. I learned about photosynthesis and I got it right away.”

This amazing quote appears as the last line of a paragraph; the rest of it goes thus: “Slashing fossil fuel consumption is critical to slowing warming, but scientists say we also need to pull carbon that’s already in the air back out of it. Trees are really good at that, drawing in CO2 and then releasing oxygen.” Then Benioff’s third-grade quote appears. It’s just comedy.

His other statements make for an important reminder of the oft-understated purpose of scientific communication. Aside from being published by a ‘prestige’ journal — Nature — the Crowther et al. paper presented an easy and straightforward solution to reducing the concentration of atmospheric carbon: to fix lots and lots of trees. Even without knowing the specific details of the study’s merits, any environmental scientist in South and Southeast Asia, Africa, and South America, i.e. the “Global South”, would have said this is a terrible idea.

“I said, ‘What? One trillion trees will sequester more than 200 gigatons of carbon? We have to get on this right now. Who’s working on this?’”

“Everybody agreed on tree diplomacy. I was in shock.”

“The greatest, most scalable technology we have today to sequester carbon is the tree.”

The countries in these regions have become sites of aggressive afforestation that provide carbon credits for the “Global North” to encash as licenses to keep emitting carbon. But the flip sides of these exercises are: (i) only some areas are naturally amenable to hosting trees, and it’s not feasible to plant them willy-nilly through ecosystems that don’t naturally support them; (ii) unless those in charge plant native species, afforestation will only precipitate local ecosystem decline, which will further lower the sequestration potential; (iii) unafforested land runs the risk of being perceived as ‘waste land’, sidelining the ecosystem services provided by wetlands, deserts, grasslands, etc.; and (iv) many of these countries need to be able to emit more carbon before being expected to reach net-zero, in order to pull their populations out of poverty and become economically developed — the same right the “Global North” countries had in the 19th and 20th centuries.

Scientists have known all this from well before the Crowther et al. paper turned up. Yet Benioff leapt for it the moment it appeared, and was keen on seeing it to its not-so-logical end. It’s impossible to miss the fact that his being worth $10 billion didn’t encourage him to use all that wealth and his clout to tackle the more complex actions in the soup of all actions that make up humankind’s response to climate change. Instead, he used his wealth to go for an easy way out, while dismissing informed criticism of it as “nonsense”.

In fact, a similar sort of ‘ease-seeking’ is visible in the Crowther et al. paper as well, as brought out in a comment published by Veldman et al. In response to this, Crowther et al. wrote in October 2019 that their first paper simply presented value-neutral knowledge and that it shouldn’t be blamed for how it’s been construed:

Veldman et al. (4) criticize our results in dryland biomes, stating that many of these areas simply should not be considered suitable for tree restoration. Generally, we must highlight that our analysis does not ever address whether any actions “should” or “should not” take place. Our analysis simply estimated the biophysical limits of global forest growth by highlighting where trees “can” exist.

In fact, the October 2019 correction to Crowther et al., in which the authors walked back on the “trees are the best way” claim, was particularly important because it has come to mirror the challenges Benioff has found himself facing through 1t.org: it isn’t just that there are other ways to improve climate mitigation and adaptation, it’s that those ways are required, and giving up on them for any reason could never be anything short of a moral hazard, if not an existential one.

Featured image credit: Dawid Zawiła/Unsplash.

The “coherent water” scam is back

By: VM

On May 7, I received a press release touting a product called “coherent water” made by a company named Analemma Water India. According to the document, “coherent water” is based on more than “15 years of rigorous research and development” and confers “a myriad … health benefits”.

This “rigorous research” is flawed research. There’s definitely such a thing as “coherent water” and it’s indistinguishable from regular water at all scales. The “coherent water” scam has reared its serpentine head before with the names “hexagonal water”, “structured water”, “polywater”, “exclusion zone water”, and water with one additional hydrogen and oxygen atom each, i.e. “H3O2”. Analemma’s “Mother Water”, which is its brand name for “coherent water”, itself is a rebranding of a product called “Somarka” that hit the Indian market in 2021.

The scam here is the claim that the constituent molecules of “coherent water” get together to form hexagonal structures that persist indefinitely. And these structures distinguish “coherent water”, giving it wonderful abilities like possessing a greater energy content than regular water, boosting one’s “life force”, and — this one I love — being able to “encourage” other water molecules around it to form similar hexagonal assemblages.

I hope people won’t fall for this hoax but I know some will. But given the lowest price of what Analemma is offering — a vial of “Mother Water” that it claims is worth $180 (Rs 15,000) — it’ll be some rich buggers, and I think that’s okay. Fools, their wealth, and all that. Then again, it’s somewhat saddening that while (some) people are fighting to keep junk foods and bad medicines out of the market, we have “coherent water” companies and their PR outfits bravely broadcasting their press releases to news publications (and at least one publishing it) at around the same time.

If you’re curious about the issue with “coherent water”: At room temperature and pressure, the hydrogen atoms of water keep forming and breaking weak bonds with other hydrogen atoms. These bonds last for a very small duration and give water its high boiling point and ice crystals their characteristic hexagonal structure.

Sometimes water molecules organise themselves using these bonds into a hexagonal structure as well. But these formations are very short-lived because the hydrogen bonds last only around 200 quadrillionths of a second at a time, if not lower. According to the hoax, however, in “coherent water”, the hydrogen bonds continue to hold such that its water molecules persist in long-lived hexagonal clusters. But this conclusion is not supported by research — nor is the claim that, “When swirled in normal water, the [magic water] encourages chaotic and irregular H2O molecules to rearrange into the same liquid crystalline structure as the [magic water]. What’s more, the coherent structure is retained over time – this stability is unique to Analemma.”

I don’t think this ability is unique to the “Mother Water”. In 1963, a scientist named Felix Hoenikker invented a variant of ice that, when it came in contact with water cooler than 45.8º C, quickly converted it to ice-nine as well. Sadly Hoenikker had to abandon the project after he realised the continued use of ice-nine would simply destroy all life on Earth.

Anyway, water that’s neither acidic nor basic also has a few rare hydronium (H3O+) and hydroxide (OH-) ions floating around as well. The additional hydrogen ion — basically a proton — from the hydronium ion is engaged in a game of musical chairs with the protons in the same volume of water, each one jumping to a molecule, dislodging a proton there, which jumps to another molecule, and so on. This is happening so rapidly that the hydrogen atoms in every water molecule are practically being changed several thousand times every minute.

In this milieu, it’s impossible for a fixed group of water molecules to be hanging around. In addition, the ultra-short lifetime of the hydrogen bonds is what makes water a liquid: a thing that flows, fills containers, squeezes between gaps, collects into droplets, etc. Take this ability and the fast-switching hydrogen bonds away, as “coherent water” claims to do by imposing a fixed structure, and it’s no longer water — any kind of water.

Analemma has links to some reports on its website; if you’re up to it, I suggest going through them with a simple checklist of the signs of bad research side by side. You should be able to spot most of the gunk.

Infinity in 15 kilograms

By: VM

While space is hard, there are also different kinds of hardness. For example, on April 15, ISRO issued a press release saying it had successfully tested nozzles made of a carbon-carbon composite that would replace those made of Columbium alloy in the PSLV rocket’s fourth stage and thus increase the rocket’s payload capacity by 15 kg. Just 15 kg!

The successful testing of the C-C nozzle divergent marked a major milestone for ISRO. On March 19, 2024, a 60-second hot test was conducted at the High-Altitude Test (HAT) facility in ISRO Propulsion Complex (IPRC), Mahendragiri, confirming the system’s performance and hardware integrity. Subsequent tests, including a 200-second hot test on April 2, 2024, further validated the nozzle’s capabilities, with temperatures reaching 1216K, matching predictions.

Granted, the PSLV’s cost of launching a single kilogram to low-Earth orbit is more than 8 lakh rupees (a very conservative estimate, I reckon) – so an additional 15 kg is worth at least an additional Rs 1.2 crore per launch. But finances alone are not a useful way to evaluate this addition: more payload mass could mean, say, one additional instrument onboard an indigenous spacecraft instead of waiting for a larger rocket to become available or postponing that instrument’s launch to a future mission.

But equally fascinating, and pride- and notice-worthy, to me is the fact that ISRO’s scientists and engineers were able to fine-tune the PSLV to this extent. This isn’t to say I’m surprised they were able to do it at all; on the contrary, it means the feat is as much about the benefits accruing to the rocket, and the Indian space programme by extension, as about R&D advances on the materials science front. It speaks to the oft-underestimated importance of the foundations on which a space programme is built.

Vikram Sarabhai Space Centre … has leveraged advanced materials like Carbon-Carbon (C-C) Composites to create a nozzle divergent that offers exceptional properties. By utilizing processes such as carbonization of green composites, Chemical Vapor Infiltration, and High-Temperature Treatment, it has produced a nozzle with low density, high specific strength, and excellent stiffness, capable of retaining mechanical properties even at elevated temperatures.

A key feature of the C-C nozzle is its special anti-oxidation coating of Silicon Carbide, which extends its operational limits in oxidizing environments. This innovation not only reduces thermally induced stresses but also enhances corrosion resistance, allowing for extended operational temperature limits in hostile environments.

The advances here draw from insights into metallurgy, crystallography, ceramic engineering, composite materials, numerical methods, etc., which in turn stand on the shoulders of people trained well enough in these areas, the educational institutions (and their teachers) that did so, and the schooling system and socio-economic support structures that brought them there. A country needs a lot to go right for achievements like squeezing an extra 15 kg into the payload capacity of an already highly fine-tuned machine to be possible. It’s a bummer that such advances are currently largely vertically restricted, except in the case of the Indian space programme, rather than diffusing freely across sectors.

Other enterprises ought to have these particular advantages ISRO enjoys. Even should one or two rockets fail, a test not work out or a spacecraft go kaput sooner than designed, the PSLV’s new carbon-carbon-composite nozzles stand for the idea that we have everything we need to keep trying, including the opportunity to do better next time. They represent the idea of how advances in one field of research can lead to advances in another, such that each field is no longer held back by the limitations of its starting conditions.

Justice delayed but a ton of bricks await

By: VM

From ‘SC declines Ramdev, Patanjali apology; expresses concern over FMCGs taking gullible consumers ‘up and down the garden path’’, The Hindu, April 10, 2024:

The Supreme Court has refused to accept the unconditional apology from Patanjali co-founder Baba Ramdev and managing director Acharya Balkrishna for advertising medical products in violation of giving an undertaking in the apex court in November 2023 prohibiting the self-styled yoga guru. … Justices Hima Kohli and Ahsanuddin Amanullah told senior advocate Mukul Rohatgi that Mr. Ramdev has apologised only after being caught on the back foot. His violations of the undertaking to the court was deliberate and willful, they said. The SC recorded its dissatisfaction with the apology tendered by proposed contemnors Patanjali, Mr. Balkrishna and Mr. Ramdev, and posted the contempt of court case on April 16.

… The Bench also turned its ire on the Uttarakhand State Licensing Authority for “twiddling their thumbs” and doing nothing to prevent the publications and advertisements. “Why should we not come down like a ton of bricks on your officers? They have been fillibustering,” Justice Kohli said. The court said the assurances of the State Licensing Authority and the apology of the proposed contemnors are not worth the paper they are written on.

A very emotionally gratifying turn of events, but perhaps not as gratifying as they might have been had they transpired at the government’s hands when Patanjali was issuing its advertisements of pseudoscience-backed COVID-19 cures during the pandemic. Or if the Supreme Court had proceeded to actually hold the men in contempt instead of making a slew of observations and setting a date for another hearing. Still, something to cheer for and occasion to reserve some hope for the April 16 session.

But in matters involving Ramdev and Patanjali Ayurved, many ministers of the current government ought to be pulled up as well, including former Union health minister Harsh Vardhan, Union micro, small, and medium enterprises minister Nitin Gadkari, and Prime Minister Narendra Modi. Modi’s governance and policies, both written and unwritten, enabled Patanjali’s charlatanry, while Messrs Vardhan and Gadkari were present at an event in February 2021 when Patanjali launched a product it claimed could cure COVID-19, with Vardhan – who was health minister then – speaking in favour of people buying and using the unproven thing.

I think the Supreme Court’s inclination to hold Ramdev et al. in contempt should extend to Vardhan as well, not only because his presence at the event conferred a sheen of legitimacy on the product but also because of a specific bit of theatrics he pulled in May the same year involving Ramdev and former Prime Minister Manmohan Singh. Ramdev apologising because that’s more politically convenient rather than because he thinks he screwed up isn’t new. That May, Ramdev had called evidence-based medicine “stupid” and alleged such medicine had killed more people than the virus itself. After some virulent public backlash, Vardhan wrote a really polite letter to Ramdev asking him to apologise, and Ramdev obliged.

But just the previous month, in April 2021, Manmohan Singh had written a letter to Modi suggesting a few courses of action to improve India’s response to the virus’s spread. Its contents were perfectly reasonable, yet Vardhan responded to it by accusing Singh of spreading “vaccine hesitancy” and alleging that Congress-ruled states were responsible for fanning India’s deadly second wave of COVID-19 infections (in 2021). These were all ridiculous assertions. But equally importantly, his lashing out stood in stark contrast to his letter to Ramdev: respect for the self-styled godman and businessman whose company was attempting to corner the market for COVID-19 cures with untested, pseudo-Ayurvedic froth versus unhinged rhetoric for a well-regarded economist and statesman.

For this alone, Vardhan deserves the “ton of bricks” the Supreme Court is waiting with.

The "coherent water" scam is back

By: VM
The "coherent water" scam is back

On May 7, I received a press release touting a product called "coherent water" made by a company named Analemma Water India. According to the document, "coherent water" is based on more than "15 years of rigorous research and development" and confers "a myriad … health benefits".

This "rigorous research" is flawed research. There's definitely such a thing as "coherent water" and it's indistinguishable from regular water at all scales. The "coherent water" scam has reared its serpentine head before with the names "hexagonal water", "structured water", "polywater", "exclusion zone water", and water with one additional hydrogen and oxygen atom each, i.e. "H3O2". Analemma's "Mother Water", which is its brand name for "coherent water", itself is a rebranding of a product called "Somarka" that hit the Indian market in 2021.

The scam here is that the constituent molecules of "coherent water" get together to form hexagonal structures that persist indefinitely. And these structures distinguish "coherent water", giving it wonderful abilities like possessing a greater energy content than regular water, boosting one's "life force", and — this one I love — being able to "encourage" other water molecules around it to form similar hexagonal assemblages.

I hope people won't fall for this hoax but I know some will. But thanks to the lowest price of what Analemma is offering — a vial of "Mother Water" that it claims is worth $180 (Rs 15,000) — it'll be some rich buggers and I think that's okay. Fools, their wealth, and all that. Then again, it's somewhat saddening that while (some) journalists, policymakers, activists, and members of the judiciary are fighting to keep junk foods and bad medicines out of the market, we have also companies and their PR outfits bravely broadcasting their press releases to news publications (and at least one publishing it) at around the same time.

Anyway, if you're curious about the issue with "coherent water": At room temperature and pressure, the hydrogen atoms of water keep forming and breaking weak bonds with other hydrogen atoms. These bonds last for a very small duration and give water its high boiling point and ice crystals their characteristic hexagonal structure.

Sometimes water molecules organise themselves using these bonds into a hexagonal structure as well. But these formations are very short-lived because the hydrogen bonds last only around 200 quadrillionths of a second at a time, if not lower. According to the hoax, however, in "coherent water", the hydrogen bonds continue to hold such that its water molecules persist in long-lived hexagonal clusters. But this conclusion is not supported by research — nor is the  claim that, "When swirled in normal water, the [magic water] encourages chaotic and irregular H2O molecules to rearrange into the same liquid crystalline structure as the [magic water]. What’s more, the coherent structure is retained over time – this stability is unique to Analemma."

I don't think this ability is unique to the "Mother Water". In 1963, a scientist named Felix Hoenikker invented a variant of ice that, when it came in contact with water cooler than 45.8º C, quickly converted it to ice-nine as well. Sadly Hoenikker had to abandon the project after he realised the continued use of ice-nine would simply destroy all life on Earth.

Anyway, water that's neither acidic nor basic also has a few rare hydronium (H3O+) and hydroxide (OH-) ions floating around as well. The additional hydrogen ion — basically a proton — from the hydronium ion is engaged in a game of musical chairs with the protons in the same volume of water, each one jumping to a molecule, dislodging a proton there, which jumps to another molecule, and so on. This is happening so rapidly that the hydrogen atoms in every water molecule are practically being changed several thousand times every minute.

In this milieu, it's impossible for a fixed group of water molecules to be hanging around. In addition, the ultra-short lifetime of the hydrogen bonds is what makes water a liquid: a thing that flows, fills containers, squeezes between gaps, collects into droplets, etc. Take this ability and the fast-switching hydrogen bonds away, as "coherent water" claims to do by imposing a fixed structure, and it's no longer water — any kind of water.

Analemma has links to some reports on its website; if you're up to it, I suggest going through them with a simple checklist of the signs of bad research side by side. You should be able to spot most of the gunk.

End of the line

By: VM

The folks at The Wire have laid The Wire Science to rest, I’ve learnt. The site hasn’t published any (original) articles since February 2 and its last tweet was on February 16, 2024.

At the time I left, in October 2022, the prospect of it continuing to run on its own steam was very much in the picture. But I’ve also been out of the loop since and learnt a short while ago that The Wire Science stopped being a functional outlet sometime earlier this year, and that its website and its articles will, in the coming months, be folded into The Wire, where they will continue to live. The Wire must do what’s best for its future and I don’t begrudge the decision to stop publishing The Wire Science separately – although I do wonder if, even if they didn’t see sense in finding a like-for-like replacement, they could have attempted something less intensive with another science journalist. I’m nonetheless sad because some things will still be lost.

Foremost on my mind are The Wire Science‘s distinct sensibilities. As is the case at The Hindu as well as at all publications whose primary journalistic product is ‘news’, the science coverage doesn’t have the room or license to examine a giant swath of the science landscape, which – while in many ways being science news in the sense that it presents new information derived from scientific work – can only manifest in the pages of a news product as ‘analysis’, ‘commentary’, ‘opinion’, etc. The Wire has the latter, or had when I left and I don’t know how they’ll be thinking about that going ahead, but there is still the risk of science coverage there not being able to spread its wings nearly as widely as it could on The Wire Science.

I still think such freedom is required because we haven’t figured out how best to cover science, at least not without also getting entangled in questions about science’s increasingly high-strung relationship with society and whether science journalists, as practitioners of a science journalism coming of age anew in the era of transdisciplinary technologies (AI, One Health, open access, etc.), can expect to be truly objective, forget covering science by the same rules and expectations that guide the traditional journalisms of business, politics, sports, etc. If however The Wire‘s journalists are still thinking about these things, kudos and best wishes to them.

Of course, one thing was definitely lost: the room to experiment with forms of storytelling that better interrogate many of these alternative possibilities I think science journalism needs to embrace. Such things rarely, if ever, survive the demands of the everyday newsroom. Again, The Wire must do what it deems best for its future; doing otherwise would be insensible. But loss is also loss. RIP. I’m sad, but also proud The Wire Science was what it was when it lived.

The foundation of shit

By: VM

I’ve been a commissioning editor in Indian science, health, and environment journalism for a little under a decade. I’ve learnt many lessons in this time but one in particular still surprises me. Whenever I receive an email, I’m quick to at least shoot off a holding reply: “I’m caught up with other stuff today, I’ll get back to you on this <whenever>”. Having a horizon makes time management much easier. What surprises me is that many commissioning editors don’t do this. I’ve heard the same story from scores of freelancing writers and reporters: “I email them but they just don’t reply for a long time.” Newsrooms are short-staffed everywhere and I readily empathise with any editor who says there’s just no time or mental bandwidth. But that’s also why the holding email exists and can even be automated to ask the sender to wait for <insert number here> hours. A few people have even said they prefer working with me because, among other things, I’m prompt. This really isn’t a brag. It’s a fruit hanging so low it’s touching the ground. Sure, it’s nice to have an advantage just by being someone who replies to emails and sets expectations – but if you think about it, especially from a freelancer’s point of view, it has a foundation of shit. It shouldn’t exist.

There’s a problem on the other side of this coin here. I picked up the habit of the holding email when I was with The Wire (before The Wire Science) – a very useful piece of advice SV gave me. When I first started to deploy it, it worked wonders when engaging with reporters and writers. Because I wrote back, almost always within less than half a day of their emails, they submitted more of their work. Bear in mind at this point that freelancers are juggling payments for past work (from this or other publications), negotiations for payment for the current submission, and work on other stories in the pipeline. In the midst of all this – and I’m narrating second-hand experiences here – to have an editor come along who replies possibly seems very alluring. Perhaps it’s one less variable to solve for. I certainly wanted to take advantage of it. Over time, however, a problem arose. Being prompt with emails means checking the inbox every <insert number here> minutes. I quickly lost my mind over having to check for new emails as often as I could, but I kept at it because the payoff stayed high. This behaviour also changed some writers’ expectations of me: if I didn’t reply within six hours, say, I’d receive an email or two checking in or, in one case, accusing me of being like “the others”.

I want my job to be about doing good science journalism as much as giving back to the community of science journalists. In fact, I believe doing the latter will automatically achieve the former. We tried this in one way when building out The Wire Science and I think we’ve taken the first steps in a new direction at The Hindu Science – yet these are also drops in the ocean. For a community that requires so, so much still, giving can be so easy that one loses oneself in the process, including on the deceptively trivial matter of replying to emails. Reply quickly and meaningfully and it’s likely to offer a value of its own to the person on the other side of the email server. Suddenly you have a virtue, and because it’s a virtue, you want to hold on to it. But it’s a pseudo-virtue, a false god, created by the expectations of those who deserve better and the aspirations of those who want to meet those expectations. Like it or not, it comes from a bad place. The community needs so, so much still, but that doesn’t mean everything I or anyone else has to give is valuable.

I won’t stop being prompt but I will have to find a middle ground where I’m prompt enough and at the same time the sender of the email doesn’t think I or any other editor for that matter has dropped the ball. This is as much about managing individual expectations as the culture of thinking about time a certain way, which includes stakeholders’ expectations of the editor-writer relationship in all Indian newsrooms publishing science-related material. (The fact that India is the sort of country where the place you’re at – and increasingly the government there – is one of the first things getting in the way of life also matters.) This culture should also serve the interests of science journalism in the country, including managing the tension between the well-being of its practitioners and sustainability on one hand and the effort and the proverbial extra push required for its growth on the other.

Neural Networks (MNIST inference) on the “3-cent” Microcontroller

By: cpldcpu

Buoyed by the surprisingly good performance of neural networks with quantization aware training on the CH32V003, I wondered how far this can be pushed. How much can we compress a neural network while still achieving good test accuracy on the MNIST dataset? When it comes to absolutely low-end microcontrollers, there is hardly a more compelling target than the Padauk 8-bit microcontrollers. These are microcontrollers optimized for the simplest and lowest-cost applications there are. The smallest device in the portfolio, the PMS150C, sports 1024 words of 13-bit one-time-programmable memory and 64 bytes of RAM, more than an order of magnitude less than the CH32V003. In addition, it has a proprietary accumulator-based 8-bit architecture, as opposed to the much more powerful RISC-V instruction set of the CH32V003.

Is it possible to implement an MNIST inference engine, which can classify handwritten digits, on a PMS150C as well?

On the CH32V003 I used MNIST samples that were downscaled from 28×28 to 16×16, so that every sample takes 256 bytes of storage. This is quite acceptable with 16 kb of flash available, but with only 1 kword of ROM it is too much. Therefore, I started by downscaling the dataset to 8×8 pixels.

The image above shows a few samples from the dataset at both resolutions. At 16×16 it is still easy to discriminate different numbers. At 8×8 it is still possible to guess most numbers, but a lot of information is lost.

Surprisingly, it is still possible to train a machine learning model to recognize even these very low-resolution numbers with impressive accuracy. It's important to remember that the test dataset contains 10,000 images that the model does not see during training. The only way for a very small model to recognize these images accurately is to identify common patterns; the model capacity is too limited to "remember" complete digits. I trained a number of different network combinations to understand the trade-off between network memory footprint and achievable accuracy.

Parameter Exploration

The plot above shows the results of my hyperparameter exploration experiments, comparing models with different weight configurations and quantization levels from 1 to 4 bits for input images of 8×8 and 16×16. The smallest models had to be trained without data augmentation, as they would not converge otherwise.

Again, there is a clear relationship between test accuracy and the memory footprint of the network. Increasing the memory footprint improves accuracy up to a certain point. For 16×16, around 99% accuracy can be achieved at the upper end, while around 98.5% is achieved for 8×8 test samples. This is still quite impressive, considering the significant loss of information for 8×8.

For small models, 8×8 achieves better accuracy than 16×16. The reason for this is that the size of the first layer dominates in small models, and this size is reduced by a factor of 4 for 8×8 inputs.
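To put numbers on this (assuming a first hidden layer of 16 channels, as used later for the target model): with 16×16 inputs, the first layer alone needs 256 × 16 = 4096 weights, while with 8×8 inputs it needs only 64 × 16 = 1024 weights, or 1024 versus 256 bytes at 2 bits per weight.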

Surprisingly, it is possible to achieve over 90% test accuracy even with models as small as half a kilobyte. This means that such a model would fit into the code memory of the microcontroller! Now that the general feasibility had been established, I needed to tweak things further to accommodate the limitations of the MCU.
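(For scale: the PMS150C's 1024 × 13-bit program memory corresponds to roughly 1.6 kilobytes of storage, so a half-kilobyte model leaves room for the inference code itself.)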

Training the Target Model

Since the RAM is limited to 64 bytes, the model structure had to use a minimum number of latent parameters during inference. I found that it was possible to use layers as narrow as 16. This reduces the buffer size during inference to only 32 bytes, 16 bytes each for one input buffer and one output buffer, leaving 32 bytes for other variables. The 8×8 input pattern is directly read from the ROM.
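As a rough sketch of this double-buffering scheme (the names, the layer count, and the placeholder layer routine are illustrative, not taken from the actual firmware):

#include <stdint.h>

// Two 16-byte buffers are alternated between layers, so activations
// never occupy more than 32 of the 64 bytes of RAM.
static uint8_t buf_a[16];
static uint8_t buf_b[16];

// Placeholder layer routine; the real code flattens this and uses the
// assembly inner loop shown further below.
static void fc_layer(const uint8_t *in, uint8_t *out) {
    for (uint8_t i = 0; i < 16; i++) out[i] = in[i];
}

void run_network(void) {
    // (In the real firmware the first layer reads the 8x8 input pattern
    // directly from ROM; here we only show the ping-pong between buffers.)
    fc_layer(buf_a, buf_b);
    fc_layer(buf_b, buf_a); // final scores end up in whichever buffer is current
}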

Furthermore, I used 2-bit weights with irregular spacing of (-2, -1, 1, 2) to allow for a simplified implementation of the inference code. I also skipped layer normalization and instead used a constant shift to rescale activations. These changes slightly reduced accuracy. The resulting model structure is shown below.

All things considered, I ended up with a model with 90.07% accuracy and a total of 3392 bits (0.414 kilobytes) in 1696 weights, as shown in the log below. The panel on the right displays the first layer weights of the trained model, which directly mask features in the test images. In contrast to the higher accuracy models, each channel seems to combine many features at once, and no discernible patterns can be seen.

Implementation on the Microcontroller

In the first iteration, I used a slightly larger variant of the Padauk Microcontrollers, the PFS154. This device has twice the ROM and RAM and can be reflashed, which tremendously simplifies software development. The C versions of the inference code, including the debug output, worked almost out of the box. Below, you can see the predictions and labels, including the last layer output.

Squeezing everything down to fit into the smaller PMS150C was a different matter. One major issue when programming these devices in C is that every function call consumes RAM for the return stack and function parameters. This is unavoidable because the architecture has only a single register (the accumulator), so all other operations must occur in RAM.

To solve this, I flattened the inference code and implemented the inner loop in assembly to optimize variable usage. The inner loop for memory-to-memory inference of one layer is shown below. The two-bit weight is multiplied with a four-bit activation in the accumulator and then added to a 16-bit register. The multiplication requires only four instructions (t0sn, sl, t0sn, neg), thanks to the powerful bit manipulation instructions of the architecture. The sign-extending addition (add, addc, sl, subc) also consists of four instructions, demonstrating the limitations of 8-bit architectures.

void fc_innerloop_mem(uint8_t loops) {

    sum = 0;
    do {
        weightChunk = *weightidx++;
__asm
        idxm    a, _activations_idx
        inc     _activations_idx+0

        t0sn    _weightChunk, #6
        sl      a               ; if (weightChunk & 0x40) in = in+in;
        t0sn    _weightChunk, #7
        neg     a               ; if (weightChunk & 0x80) in = -in;

        add     _sum+0, a
        addc    _sum+1
        sl      a
        subc    _sum+1

        ... 3x more ...

__endasm;
    } while (--loops);

    int8_t sum8 = ((uint16_t)sum)>>3; // Normalization
    sum8 = sum8 < 0 ? 0 : sum8;       // ReLU
    *output++ = sum8;
}
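For readers not fluent in Padauk assembly, the multiply step above corresponds roughly to the following plain C (a sketch; only the first of the four bit pairs in each packed weight byte is shown, and the names are illustrative):

#include <stdint.h>

// One 2-bit weight occupies bits 7:6 of the packed byte: bit 6 selects
// magnitude 1 or 2, bit 7 selects the sign, giving the set (-2, -1, 1, 2).
static int16_t mac_2bit(int16_t sum, uint8_t weightChunk, int8_t activation) {
    int16_t in = activation;
    if (weightChunk & 0x40) in += in;  // sl  a : double the activation
    if (weightChunk & 0x80) in = -in;  // neg a : negate the activation
    return sum + in;                   // 16-bit sign-extending accumulate
}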

In the end, I managed to fit the entire inference code into 1 kiloword of memory and reduced SRAM usage to 59 bytes, as seen below. (Note that the output from SDCC assumes 2 bytes per instruction word, while the actual instruction word is only 13 bits wide.)

Success! Unfortunately, there was no ROM space left for the soft UART to output debug information. However, based on the verification on the PFS154, I trust that the code works, and since I don't have any specific application in mind, I left it at that stage.

Summary

It is indeed possible to implement MNIST inference with good accuracy using one of the cheapest and simplest microcontrollers on the market. A lot of memory footprint and processing overhead is usually spent on implementing flexible inference engines that can accommodate a wide range of operators and model structures. Cutting this overhead away and reducing the functionality to its core allows for astonishing simplification at this very low end.

This hack demonstrates that there truly is no fundamental lower limit to applying machine learning and edge inference. However, the feasibility of implementing useful applications at this level is somewhat doubtful.

You can find the project repository here.

Implementing Neural Networks on the “10-cent” RISC-V MCU without Multiplier

By: cpldcpu

I have been meaning for a while to establish a setup to implement neural-network-based algorithms on smaller microcontrollers. After reviewing existing solutions, I found none that I really felt comfortable with. One obvious issue is that flexibility is often traded for overhead. As always, for a really optimized solution you have to roll your own. So I did. You can find the project here and a detailed writeup here.

It is always easier to work with a clear challenge: I picked the CH32V003 as my target platform. This is the smallest RISC-V microcontroller on the market right now, addressing a $0.10 price point. It sports 2kb of SRAM and 16kb of flash. It is somewhat unique in implementing the RV32EC instruction set architecture, which does not even support multiplications. In other words, for many purposes this controller is less capable than an Arduino UNO.

As a test subject I chose the well-known MNIST dataset, which consists of images of handwritten digits that need to be classified from 0 to 9. Many inspiring implementations of MNIST on Arduino exist, for example here. In that case, the inference time was 7 seconds and 82% accuracy was achieved.

The idea is to train a neural network on a PC and optimize it for inference on the CH32V003 while meeting these criteria:

  1. Be as fast and as accurate as possible
  2. Low SRAM footprint during inference to fit into the 2 kb of SRAM
  3. Keep the weights of the neural network as small as possible
  4. No multiplications!

These criteria can be addressed by using a neural network with quantized weights, where each weight is represented with as few bits as possible. The best results are achieved when the network is already trained on quantized weights (Quantization Aware Training), as opposed to quantizing a model that was trained with high-accuracy weights. There is currently some hype around using binary and ternary weights for large language models. But indeed, we can also use these approaches to fit a neural network onto a small microcontroller.

The benefit of only using a few bits to represent each weight is that the memory footprint is low and we do not need a real multiplication instruction – inference can be reduced to additions only.

Model structure and optimization

For simplicity, I decided to go for a network architecture based on fully connected layers instead of convolutional neural networks. The input images are reduced to a size of 16×16 = 256 pixels and are then fed into the network as shown below.

The implementation of the inference engine is straightforward since only fully connected layers are used. The code snippet below shows the inner loop, which implements the multiplication by 4-bit weights using adds and shifts. The weights use a one-complement encoding without zero, which helps with code efficiency. One-bit, ternary, and 2-bit quantization were implemented in a similar way.

    int32_t sum = 0;
    for (uint32_t k = 0; k < n_input; k += 8) {
        uint32_t weightChunk = *weightidx++;

        for (uint32_t j = 0; j < 8; j++) {
            int32_t in = *activations_idx++;
            int32_t tmpsum = (weightChunk & 0x80000000) ? -in : in;
            sum += tmpsum;                                  // sign*in*1
            if (weightChunk & 0x40000000) sum += tmpsum<<3; // sign*in*8
            if (weightChunk & 0x20000000) sum += tmpsum<<2; // sign*in*4
            if (weightChunk & 0x10000000) sum += tmpsum<<1; // sign*in*2
            weightChunk <<= 4;
        }
    }
    output[i] = sum;

In addition to the fully connected layers, normalization and ReLU operators are also required. I found that it was possible to replace a more complex RMS normalization with simple shifts in the inference. Not a single full 32×32 multiplication is needed for the inference! Having such a simple structure for inference means that the effort has to be focused on the training part.
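A minimal sketch of what such shift-based rescaling plus ReLU might look like per layer (the shift amount, the saturation limit, and the names are assumptions, not the values used in the actual model):

#include <stdint.h>

// Rescale each 32-bit accumulator with a fixed right shift (standing in for
// RMS normalization) and clamp negative values to zero (ReLU).
static void normalize_relu(const int32_t *sums, int8_t *activations, uint32_t n) {
    for (uint32_t i = 0; i < n; i++) {
        int32_t v = sums[i] >> 5;  // constant shift instead of a multiply or divide
        if (v < 0) v = 0;          // ReLU
        if (v > 127) v = 127;      // saturate to the 8-bit activation range
        activations[i] = (int8_t)v;
    }
}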

I studied variations of the network with different numbers of bits and different sizes by varying the number of hidden activations. To my surprise, I found that the accuracy of the prediction is proportional to the total number of bits used to store the weights. For example, when 2 bits are used for each weight, twice the number of weights is needed to achieve the same performance as a 4-bit weight network. The plot below shows training loss vs. total number of bits. We can see that for 1-4 bits, we can basically trade more weights for fewer bits. This trade-off is less efficient for 8 bits and no quantization (fp32).

I further optimized the training by using data augmentation, a cosine schedule and more epochs. It seems that 4 bit weights offered the best trade off.

More than 99% accuracy was achieved for a 12-kbyte model size. While it is possible to achieve better accuracy with much larger models, this is significantly more accurate than other on-MCU implementations of MNIST.

Implementation on the Microcontroller

The model data is exported to a C header file for inclusion in the inference code. I used the excellent ch32v003fun environment, which allowed me to reduce the overhead enough to store 12 kb of weights plus the inference engine in only 16 kb of flash.
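Such an exported header might look roughly like the following; the names and values here are entirely illustrative, the real file is generated by the training scripts:

// mnist_weights.h -- hypothetical excerpt of an auto-generated weight header.
// 4-bit weights are packed eight to a 32-bit word, matching the inner loop above.
#include <stdint.h>

#define L1_INPUTS   256
#define L1_OUTPUTS  64

static const uint32_t L1_weights[L1_INPUTS * L1_OUTPUTS / 8] = {
    0x13572468, 0x8ACEF137, // ... generated values ...
};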

There was still enough free flash to include four sample images. The inference output is shown above. Execution time for one inference is 13.7 ms, which would actually allow the model to process moving image input in real time.
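(At 13.7 ms per inference, that is roughly 73 classifications per second.)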

Alternatively, I also tested a smaller model with 4512 2-bit parameters and only 1 kb of flash memory footprint. Despite its size, it still achieves a 94.22% test accuracy and executes in only 1.88 ms.
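(4512 parameters × 2 bits = 9024 bits, i.e. roughly 1.1 kb of weight data.)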

Conclusions

This was quite a tedious project, hunting down many lost bits and rounding errors. I am quite pleased with the outcome, as it shows that it is possible to compress neural networks very significantly with dedicated effort. I learned a lot and am planning to use the data pipeline for more interesting applications.
