
Candle Flame Oscillations as a Clock

By: cpldcpu
13 August 2025 at 11:16

Today’s candles have been optimized over millennia not to flicker. But it turns out that when we bundle three of them together, we can undo all of these optimizations: the resulting triplet will start to oscillate naturally. A fascinating fact is that the oscillation frequency is rather stable at ~9.9 Hz, as it mainly depends on gravity and the diameter of the flame.

To detect this frequency and divide it down to 1 Hz, we use a rather unusual approach: a wire suspended in the flame senses the capacitance changes caused by the ionized gases.

Introduction

Candlelight is a curious thing. Candles seem to have a life of their own: the brightness wanders, they flicker, and they react to the faintest motion of air.

People have long been curious about how candle flames work and behave. In recent years, there have also been extensive efforts to emulate this behavior with electronic light sources. I have been fascinated by this as well and have tried to understand both real candles and how artificial candles work.

Now, it’s a curious thing that we try to emulate the imperfections of candles. After all, candle makers have worked for centuries (and millennia) on optimizing candles NOT to flicker!

In essence, the trick is a very delicate balance in how much fuel (the molten candle wax) is fed into the flame. If there is too much, the candle starts to flicker even when undisturbed. This balance is controlled by how the wick is made.

Candle Triplet Oscillations

Now, there is a particularly fascinating effect that has more recently been the subject of publications in scientific journals [1, 2]: When several candles are brought close to each other, they start to “communicate” and their behavior synchronizes. The simplest demonstration is to bundle three candles together; they will behave like a single large flame.

So, what happens with our bundle of three candles? It will basically undo millennia of candle technology optimization to avoid candle flicker. If left alone in motionless air, the flames will suddenly start to rapidly change their height and begin to flicker. The image below shows two states in that cycle.

Two states of the oscillation cycle in bundled candles

We can also record the brightness variation over time to understand this process better. In this case, a high-resolution ambient light sensor was used to sample the flicker over time. (This was part of a more comprehensive set of experiments conducted a while ago, which are still unpublished.)

Plotting the brightness evolution over time shows that the oscillations are surprisingly stable, as shown in the image below. We can see a very nice sawtooth-like signal: the flame slowly grows larger until it collapses, and the cycle begins anew. You can see a video of this behavior here. (Which, unfortunately, cannot be embedded properly due to WordPress…)

Left: Brightness variation over time showing sawtooth pattern.
Right: Power spectral density showing stable 9.9 Hz frequency

On the right side of the image, you can see the power spectral density plot of the brightness signal on the left. The oscillation is remarkably stable at a frequency of 9.9 Hz.

This is very curious. Wouldn’t you expect more chaotic behavior, considering that everything else about flames seems so random?

The phenomenon of flame oscillations has baffled researchers for a long time. Curiously, they found that the oscillation frequency of a candle flame (or rather a “wick-stabilized buoyant diffusion flame”) depends mainly on just two variables: gravity and the dimensions of the fuel source. A comprehensive review can be found in Xia et al. [3].

Now that is interesting: gravity is rather constant (on Earth), and the dimensions of the fuel source are defined by the size (diameter) of the candles and possibly their proximity. This leaves us with a fairly stable source of oscillation, or timing, at approximately 10 Hz. Could we use the 9.9 Hz oscillation to derive a time base?

Sensing Candle Frequencies with a Phototransistor

Now that we have a source of stable oscillations—mind you, FROM FIRE—we need to convert them into an electrical signal.

The previous investigation of candle flicker was based on an I²C light sensor to sample the light signal. This provides a very high SNR but is comparatively complex and adds latency.

A phototransistor provides a simpler option. Below you can see the setup with a phototransistor in a 3 mm wired package (arrow). Since the phototransistor has internal gain, it provides a much higher current than a photodiode, and its signal can easily be picked up without additional amplification.

Phototransistor setup with sensing resistor configuration

The phototransistor was connected via a sensing resistor to a constant voltage source, with the oscilloscope connected across the sensing resistor. The output signal was quite stable and showed a nice ~9.9 Hz oscillation.

In the next step, this could be connected to an ADC input of a microcontroller to process the signal further. But curiously, there is also a simpler way of detecting the flame oscillations.
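
Before moving on to that simpler way, here is a rough sketch of what the ADC approach could look like. The helper functions are placeholders (not ch32fun API), so treat this as an illustration of the sampling loop rather than a working driver:

// Hypothetical sketch: sample the voltage across the phototransistor's
// sensing resistor at ~100 Hz, well above the ~9.9 Hz flicker frequency.
// adc_read() and delay_ms() are placeholders, not ch32fun functions.
#include <stdint.h>

extern uint16_t adc_read(void);     // placeholder: returns one ADC sample
extern void     delay_ms(uint32_t); // placeholder: millisecond delay

void sample_flame(int16_t *buf, uint32_t n)
{
    for (uint32_t i = 0; i < n; i++) {
        buf[i] = (int16_t)adc_read(); // voltage across the sensing resistor
        delay_ms(10);                 // ~100 Hz sample rate
    }
}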

Capacitive Flame Sensing

Capacitive touch peripherals are part of many microcontrollers and can easily be implemented with an integrated ADC by measuring the discharge rate against an integrated pull-up resistor, or by a charge-sharing approach in a capacitive ADC.
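
The principle is simple enough to sketch in a few lines. The gpio_* helpers below are placeholders for the target’s GPIO driver, not a specific vendor API: charge the electrode, release it into a weak pull, and count how long it takes to cross the input threshold:

// Minimal sketch of discharge-rate capacitance sensing. The gpio_* helpers
// are placeholders, not a specific vendor API.
#include <stdint.h>

extern void gpio_drive_high(void);     // placeholder: push-pull output, high
extern void gpio_input_pulldown(void); // placeholder: input with weak pull-down
extern int  gpio_read(void);           // placeholder: read the logic level

uint32_t measure_discharge_time(void)
{
    gpio_drive_high();                 // charge the electrode capacitance
    gpio_input_pulldown();             // release it into the weak pull-down
    uint32_t t = 0;
    while (gpio_read() && t < 10000)   // count until the level falls below
        t++;                           // the input threshold (with timeout)
    return t;                          // larger capacitance -> larger count
}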

While this is not the most obvious way of measuring changes in a flame, some variation is to be expected: the hot flame with all its combustion products contains ionized molecules to some degree and likely has different dielectric properties than the surrounding air, which will show up as either a change of capacitance or increased electrical loss. A quick internet search also turned up publications on capacitance-based flame detectors.

A CH32V003 microcontroller with the CH32fun environment was used for the experiments. The setup is shown below: the microcontroller sits on the small PCB to the left. The capacitance is sensed between a wire suspended in the flame (the scorched one) and a ground wire wound around the candle. An LED completes the setup as an output.

Complete capacitive sensing setup with CH32V003 microcontroller, candle triplet, and an LED.

Initial attempts with two wires in the flame did not yield better results, and that setup was mechanically much less stable.

Readout was implemented straightforwardly using the TouchADC function that is part of CH32fun. This function measures the capacitance on an input pin by charging it to a fixed voltage and measuring the voltage decay while it is discharged via a pull-up/pull-down resistor. To reduce noise, it was necessary to average 32 measurements.

// Enable GPIOA, GPIOC, GPIOD and the ADC
RCC->APB2PCENR |= RCC_APB2Periph_GPIOA | RCC_APB2Periph_GPIOD | RCC_APB2Periph_GPIOC | RCC_APB2Periph_ADC1;

InitTouchADC();
...

int iterations = 32;                           // average 32 measurements to reduce noise
sum = ReadTouchPin( GPIOA, 2, 0, iterations ); // measure capacitance on PA2

First attempts confirmed that the concept works. The sample trace below shows sequential measurements of a flickering candle until it was blown out at the end, as indicated by the steep drop in the signal.

The signal is noisier than the optical signal and shows more baseline wander and amplitude drift—but we can work with that. Let’s put it all together.

Capacitive sensing trace showing candle oscillations and extinction

Putting everything together

Additional digital signal processing is necessary to clean up the signal and extract a stable 1 Hz clock reference.

The data traces were recorded with a Python script from the monitor output and saved as CSV files. A separate Python script was used to analyze the data and prototype the signal-processing chain. The sample rate is limited to around ~90 Hz by the overhead of printing data via the debug output, but this turned out to be sufficient for this case.
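
On the firmware side, the logging loop can stay very simple. A minimal sketch, assuming ch32fun’s debug printf and the same ReadTouchPin call used later:

// Sketch of the logging loop: print each 32x-averaged touch reading over the
// debug monitor so a host-side script can capture it to a CSV file.
while (1)
{
    uint32_t sum = ReadTouchPin( GPIOA, 2, 0, 32 );
    printf( "%d\n", (int)sum );   // printing overhead limits the rate to ~90 Hz
}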

The image above shows an overview of the signal chain. The raw data (after 32x averaging) is shown on the left. The signal is filtered with an IIR filter to extract the baseline (red). The middle figure shows the signal with the baseline removed and the zero-crossing detection. The zero-crossing detector tags the first sample after a negative-to-positive transition, with a short dead time to prevent it from latching onto noise. The right plot shows the PSD of the overall and high-pass-filtered signal: despite the wandering input signal, we get a sharp ~9.9 Hz peak for the main frequency.

A detailed zoom-in of raw samples with baseline and HP filtered data is shown below.

The inner-loop code is shown below, including the implementation of the IIR filter, the HP filter, and the zero-crossing detector. The conversion from 9.9 Hz to 1 Hz is implemented using a fractional counter. The output is used to blink the attached LED. Alternatively, a more advanced implementation using a software DPLL might provide a bit more stability in case of excessive noise or missing zero crossings, but this was not attempted for now.

const int32_t led_toggle_threshold = 32768;  // Toggle LED every 32768 time units (0.5 second)
const int32_t interval = (int32_t)(65536 / 9.9); // 9.9Hz flicker rate
...

sum = ReadTouchPin( GPIOA, 2, 0, iterations );

if (avg == 0) { avg = sum;} // initialize the filter state on the first run
avg = avg - (avg>>5) + sum; // IIR low-pass filter; avg holds ~32x the baseline
hp = sum - (avg>>5);        // subtract the baseline -> high-pass filtered signal

// Zero crossing detector with dead time
if (dead_time_counter > 0) {
    dead_time_counter--;  // Count down dead time
    zero_cross = 0;  // No detection during dead time
} else {
    // Check for positive zero crossing (sign change)
    if ((hp_prev < 0 && hp >= 0)) {
        zero_cross = 1;
        dead_time_counter = 4;         // suppress retriggering for 4 samples
        time_accumulator += interval;  // add 1/9.9 s in units of 1/65536 s
        
        // LED blinking logic using time accumulator
        // Check if time accumulator has reached LED toggle threshold
        if (time_accumulator >= led_toggle_threshold) {
            time_accumulator = time_accumulator - led_toggle_threshold;  // Subtract threshold (no modulo)
            led_state = led_state ^ 1;  // Toggle LED state using XOR
            
            // Set or clear PC4 based on LED state
            if (led_state) {
                GPIOC->BSHR = 1<<4;  // Set PC4 high
            } else {
                GPIOC->BSHR = 1<<(16+4);  // Set PC4 low
            }
        }
    } else {
        zero_cross = 0;  // No zero crossing
    }
}

hp_prev = hp;

Finally, let’s marvel at the result again! You can see the candle flickering at 10 Hz and the LED next to it blinking at 1 Hz! The framerate of the GIF is unfortunately limited, which causes some aliasing. You can see a higher framerate version on YouTube or the original file.

That’s all for our journey from undoing millennia of candle-flicker-mitigation work to turning this into a clock source that can be sensed with a bare wire and a microcontroller. Back to the decade-long quest to build a perfect electronic candle emulation…

All data and code are published in this repository.

This is an entry to the HaD.io “One Hertz Challenge”

References

  1. Okamoto, K., Kijima, A., Umeno, Y. & Shima, H., “Synchronization in flickering of three-coupled candle flames,” Scientific Reports 6, 36145 (2016).
  2. Chen, T., Guo, X., Jia, J. & Xiao, J., “Frequency and Phase Characteristics of Candle Flame Oscillation,” Scientific Reports 9, 342 (2019).
  3. Xia, J. & Zhang, P., “Flickering of buoyant diffusion flames,” Combustion Science and Technology (2018).

Implementing Neural Networks on the “10-cent” RISC-V MCU without Multiplier

By: cpldcpu
24 April 2024 at 10:20

I have been meaning for a while to establish a setup for implementing neural-network-based algorithms on smaller microcontrollers. After reviewing existing solutions, I did not find one I was really comfortable with. One obvious issue is that flexibility is often traded for overhead. As always, for a really optimized solution you have to roll your own. So I did. You can find the project here and a detailed writeup here.

It is always easier to work with a clear challenge: I picked the CH32V003 as my target platform. This is the smallest RISC-V microcontroller on the market right now, addressing a $0.10 price point. It sports 2 kB of SRAM and 16 kB of flash. It is somewhat unique in implementing the RV32EC instruction set architecture, which does not even support multiplication. In other words, for many purposes this controller is less capable than an Arduino UNO.

As a test subject I chose the well-known MNIST dataset, which consists of images of handwritten digits that need to be classified as 0 to 9. Many inspiring implementations of MNIST on Arduino exist, for example here. In that case, the inference time was 7 seconds and 82% accuracy was achieved.

The idea is to train a neural network on a PC and optimize it for inference on the CH32V003 while meeting these criteria:

  1. Be as fast and as accurate as possible
  2. Low SRAM footprint during inference to fit into the 2 kB of SRAM
  3. Keep the weights of the neural network as small as possible
  4. No multiplications!

These criteria can be addressed by using a neural network with quantized weights, where each weight is represented with as few bits as possible. The best results are achieved when the network is already trained on quantized weights (quantization-aware training), as opposed to quantizing a model that was trained with high-accuracy weights. There is currently some hype around using binary and ternary weights for large language models, but we can indeed also use these approaches to fit a neural network onto a small microcontroller.

The benefit of only using a few bits to represent each weight is that the memory footprint is low and we do not need a real multiplication instruction – inference can be reduced to additions only.

Model structure and optimization

For simplicity, I decided to go for a network architecture based on fully connected layers instead of convolutional neural networks. The input images are reduced to a size of 16×16 = 256 pixels and are then fed into the network as shown below.

The implementation of the inference engine is straightforward, since only fully connected layers are used. The code snippet below shows the inner loop, which implements multiplication by 4-bit weights using only adds and shifts. The weights use a one’s-complement encoding without zero, which helps with code efficiency. One-bit, ternary, and 2-bit quantization were implemented in a similar way; a sketch of the ternary variant follows the snippet.

int32_t sum = 0;
for (uint32_t k = 0; k < n_input; k += 8) {
    uint32_t weightChunk = *weightidx++;

    for (uint32_t j = 0; j < 8; j++) {
        int32_t in = *activations_idx++;
        int32_t tmpsum = (weightChunk & 0x80000000) ? -in : in;
        sum += tmpsum;                                  // sign*in*1
        if (weightChunk & 0x40000000) sum += tmpsum<<3; // sign*in*8
        if (weightChunk & 0x20000000) sum += tmpsum<<2; // sign*in*4
        if (weightChunk & 0x10000000) sum += tmpsum<<1; // sign*in*2
        weightChunk <<= 4;
    }
}
output[i] = sum;
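
For comparison, the ternary variant mentioned above could look roughly like this. This is my sketch of the idea (including the assumed 2-bit encoding), not the repository’s exact code:

// Sketch of a ternary inner loop: 2 bits per weight, 16 weights per 32-bit
// word. Assumed encoding: 10 = -1, 01 = +1, 00 = 0. With weights limited to
// -1/0/+1, each weight contributes at most a single addition.
int32_t sum = 0;
for (uint32_t k = 0; k < n_input; k += 16) {
    uint32_t weightChunk = *weightidx++;

    for (uint32_t j = 0; j < 16; j++) {
        int32_t in = *activations_idx++;
        if (weightChunk & 0x80000000)      sum -= in; // weight -1
        else if (weightChunk & 0x40000000) sum += in; // weight +1
        weightChunk <<= 2;                            // next 2-bit weight
    }
}
output[i] = sum;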

In addition to the fully connected layers, normalization and ReLU operators are also required. I found that it was possible to replace a more complex RMS normalization with simple shifts during inference. Not a single full 32×32 multiplication is needed for the entire inference! Having this simple structure for inference means that we have to focus the effort on the training part.
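
I will not reproduce the exact scheme here, but a shift-based normalization fused with ReLU could look roughly like this sketch: derive a right-shift from the largest activation and apply it while clamping negative values to zero:

// Sketch of shift-based normalization fused with ReLU (an illustration of the
// idea, not the actual implementation): scale the activations down by a power
// of two so they fit into 8 bits, clamping negative values to zero.
#include <stdint.h>

void normalize_relu(int32_t *act, uint32_t n)
{
    int32_t max = 0;
    for (uint32_t i = 0; i < n; i++)
        if (act[i] > max) max = act[i];   // largest positive activation

    uint32_t shift = 0;
    while ((max >> shift) > 127)          // choose shift so results fit 8 bits
        shift++;

    for (uint32_t i = 0; i < n; i++)
        act[i] = (act[i] < 0) ? 0 : (act[i] >> shift); // ReLU, then normalize
}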

I studied variations of the network with different numbers of bits per weight and different sizes by varying the number of hidden activations. To my surprise, I found that the accuracy of the prediction is proportional to the total number of bits used to store the weights. For example, when 2 bits are used for each weight, twice the number of weights is needed to achieve the same performance as a 4-bit-weight network. The plot below shows training loss vs. the total number of bits. We can see that for 1-4 bits, we can basically trade more weights for fewer bits. This trade-off is less efficient for 8 bits and for no quantization (fp32).

I further optimized the training by using data augmentation, a cosine learning-rate schedule, and more epochs. In the end, 4-bit weights offered the best trade-off.

More than 99% accuracy was achieved with a model size of 12 kB. While it is possible to achieve better accuracy with much larger models, this is significantly more accurate than other on-MCU implementations of MNIST.

Implementation on the Microcontroller

The model data is exported to a C header file for inclusion in the inference code. I used the excellent ch32v003fun environment, which allowed me to reduce the overhead enough to store 12 kB of weights plus the inference engine in only 16 kB of flash.
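
The exported header then boils down to arrays of packed weights. All names and values below are made up to show the shape of such a file, not the project’s actual export:

// Hypothetical shape of the generated header: packed 4-bit weights, eight per
// 32-bit word, one array per fully connected layer. Names/sizes illustrative.
#include <stdint.h>

#define L1_INPUTS  256
#define L1_OUTPUTS 64

static const uint32_t L1_weights[L1_INPUTS * L1_OUTPUTS / 8] = {
    0x3A50C1F2, 0x9B04D7E6, /* ... filled in by the training pipeline ... */
};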

There was still enough free flash to include four sample images. The inference output is shown above. The execution time for one inference is 13.7 ms, which would actually allow the model to process moving image input in real time.

Alternatively, I also tested a smaller model with 4512 2-bit parameters and a flash memory footprint of only 1 kB. Despite its size, it still achieves 94.22% test accuracy and executes in only 1.88 ms.

Conclusions

This was quite a tedious project, hunting many lost bits and rounding errors. I am quite pleased with the outcome, as it shows that it is possible to compress neural networks very significantly with dedicated effort. I learned a lot and am planning to use the data pipeline for more interesting applications.
