Wikis for Everyone: Bridging the Accessibility Gap at the 2026 Hackathon

Italian wikimedians discussing web accessibility at the Wikimedia Hackathon 2026

Web accessibility is not merely a technical feature. It is a prerequisite for truly free knowledge. During the recent Wikimedia Hackathon 2026, held in Milan, we came together as a dedicated group hailing from Italy to confront a quiet yet persistent issue: the barriers that prevent visually-impaired individuals from fully engaging with Wikipedia and its sister projects.

Thus, Valcio, Daimona Eaytoy, and Piergiovanna Grossi (WMIT) led the unconference session “Wikipedia for Everyone: Closing the Accessibility Gap”, which served as both a wake-up call and a collaborative workshop. By examining how community-made templates and interface elements often fail our users, we aimed to transition from identifying problems to building sustainable solutions.

This is a short recap for those who missed it.

The Reality of the Digital Barrier

Home page for MediaWiki Accessibility Checker

The session opened with a candid look at the current state of our interfaces. While MediaWiki provides a robust foundation, years of community-driven customisation have inadvertently introduced many accessibility violations. Key issues discussed included:

  • Missing Alt-Text: Images essential for understanding content often lack the alternative text that screen readers (assistive technologies that read page content aloud to visually impaired users) rely on to describe them.
  • The “HTML Wall”: Many tables and templates lack proper semantic markup, forcing text-to-speech tools to read out raw code rather than structured information.
  • Contrast and Colour: Numerous gadgets and banners still fall short of the minimum contrast ratios required by WCAG 2.2 level AA (a web-accessibility standard), making them hard or impossible to read for users with colour blindness or low vision.
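To make the contrast criterion concrete, here is a minimal Python sketch of the WCAG contrast-ratio formula. It is an illustration only, not code from the session or from the Accessibility Checker; the function names are made up for this example. The 4.5:1 and 3:1 thresholds are the level AA minimums for normal and large text.

```python
def _linearize(channel_8bit):
    """Convert one 8-bit sRGB channel to linear light, as WCAG defines it."""
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) tuple with 0-255 channels."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """WCAG contrast ratio between two colours, from 1.0 up to 21.0."""
    l1, l2 = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(fg, bg, large_text=False):
    """True if the pair meets the AA minimum (4.5:1, or 3:1 for large text)."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)
```

A check like this could run over the colours declared in a gadget or banner stylesheet before it goes live.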

Measuring Missing Alt-Text

The unconference session also sparked a small follow-up experiment. CristianCantoro set out to measure how widespread the issue of missing alt-text is on Italian Wikipedia and Lombard Wikipedia, combining the Wikipedia HTML dumps provided by Wikimedia Enterprise with the XML dumps published by the Wikimedia Foundation. The initial results confirm the scale of the challenge: more than 90% of images used in Italian and Lombard Wikipedia articles lack alternative text.
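The measurement itself is not described in detail here, so the following is only a sketch of the basic idea: count the `<img>` elements in an article's HTML and check how many have no usable `alt` attribute. It uses Python's standard `html.parser`; the class and function names are hypothetical. Treating an empty `alt=""` as missing is an assumption of this sketch, since an empty value can be legitimate for purely decorative images.

```python
from html.parser import HTMLParser


class AltTextCounter(HTMLParser):
    """Counts <img> tags and how many of them have no non-empty alt text."""

    def __init__(self):
        super().__init__()
        self.total = 0
        self.missing = 0

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        self.total += 1
        alt = dict(attrs).get("alt")
        # Assumption: an absent, valueless, or blank alt counts as missing.
        if alt is None or not alt.strip():
            self.missing += 1


def missing_alt_counts(html):
    """Return (images_without_alt, total_images) for one HTML document."""
    counter = AltTextCounter()
    counter.feed(html)
    counter.close()
    return counter.missing, counter.total
```

Summing these counts over every article in a dump would give a percentage comparable to the one quoted above.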

This is not an isolated finding. In 2023, a team of researchers from Stanford University and Google Research conducted a cross-lingual analysis of image accessibility across 108 Wikipedia language editions, finding that, on average, only around 10% of images had alt-text. The research was presented at the 2023 edition of the Wiki Workshop.

These numbers are a reminder that missing alt-text is still an open and large-scale challenge across languages. If we want Wikipedia to be truly open to everyone, we need better tools, workflows, and community practices to help editors add alt-text and meaningful descriptions to images.

From Discussion to Action: The MediaWiki Accessibility Checker

Logo for MediaWiki Accessibility Checker

To move from awareness to action, one of the session participants, Super nabla from the Indic MediaWiki Developers User Group, built a concrete solution during the hackathon itself: the MediaWiki Accessibility Checker. The tool, available on Toolforge, helps editors and developers meet accessibility standards. Try it out: https://accessibility-checker.toolforge.org/

Built on the industry-standard axe-core engine and Playwright, the tool is specifically adapted for the MediaWiki ecosystem. It allows editors and developers to (i) perform deep audits based on WCAG 2.2 AA (and other standards) on any wiki URL, including project pages, either from the frontend interface or through a dedicated RESTful API; (ii) generate professional reports in multiple formats, including PDF and Wikitext for easy sharing on-wiki; (iii) use a modern interface built with the Wikimedia Codex design system, ensuring a seamless experience for contributors.

This tool represents a small yet important step forward in democratising accessibility auditing, allowing gadget authors — even those without formal expertise — to identify and rectify errors before they impact our readers.

A Legacy of “Wikiricci” and Community Care

Daimona Eaytoy with the WikiRiccio

The roots of this technical collaboration extend back to 2018 at itWikiCon in Como (Italy), where the “Officina” (the Italian Wikipedia’s technical project) was honoured for its quiet, essential labour, carried out by the smanettoni (hackers) — the tinkerers and wizards who operate behind the scenes to ensure the platform’s gears continue to turn. This community recognition is personified by the Wikiriccio (wiki hedgehog), a physical trophy whose travel history has become something of a legendary saga within the Italian community. Traditionally held in rotation, after years of near-misses, it finally found its way to Daimona Eaytoy during this hackathon, reminding us that accessibility work is also about human connections and shared care.

For us, this light-hearted tradition and award serve as a reminder: behind every accessibility tool or interface fix is a human connection, a shared community-based vision and history, and a commitment to “making the shop run” for the benefit of all users.

Next Steps and Community Involvement

The hackathon session was only the beginning. Its outcomes are being synthesised into a formal proposal on the Italian Wikipedia and a Phabricator task to help standardise CSS custom properties and automated linting workflows.

Yet, technology alone cannot solve a cultural challenge. We invite all UI/UX designers, developers, and experienced wiki-editors to join the effort. Whether you are improving the alt text on a high-traffic policy page or helping modernise an old template, your contribution ensures that Wikipedia remains truly accessible, enabling everyone to share in the sum of all knowledge.

A special thanks to the hackathon organisers and all the participants who shared their lived experiences; your insights are what drive these technical improvements forward.

Tech News 2026 – Issue 23

Latest tech news from the Wikimedia technical community. Please tell other users about these changes. Not all changes will affect you. Translations are available.

Updates for editors

  • The Reader Experience team is conducting an experiment to show the reading lists feature, which is still in development, to logged-out mobile readers to test whether it encourages account creation at a higher rate compared to the watchstar button. The experiment was launched on May 18th on German, Spanish, Italian, Portuguese, Polish, Dutch, Turkish, and Urdu wikis, and it will run for a month.
  • The Wikimedia Apps team released Phase 1 of the redesigned Home Feed to the Android Beta app. The new Home Feed includes a refreshed “Community” tab and a personalized “For You” tab featuring daily updated reading recommendations. The redesign is part of a broader effort to improve content discovery and create more engaging learning experiences in the Wikipedia apps.
  • View all 18 community-submitted tasks that were resolved last week. For example, an issue where images could fail to load for some suggested edits on Special:Homepage, leaving the thumbnail stuck in a loading state, has now been fixed.

Updates for technical contributors

  • Detailed code updates later this week: MediaWiki

Tech news prepared by Tech News writers and posted by bot.

AWW Podcast Season 2 Episode #1: Can Wikipedia Evolve With the Digital Age?

By: AnnComms

There was a time when Wikipedia was the go-to source for information and one of the most trusted tools for research across the world. From students and journalists to researchers and everyday internet users, millions relied on the platform for quick and accessible knowledge. However, as technology continues to evolve, the way people consume information has also changed.

Today, Wikipedia faces growing competition from emerging technologies such as Artificial Intelligence (AI) tools and social media platforms, which now shape how many people search for and engage with information online. As a result, the platform has experienced a decline in page views over the years, raising important questions about its future relevance and visibility in the digital age.

To address these concerns, about 100 Wikimedian affiliates, volunteers, and external experts gathered in Frankfurt am Main from 30 January to 1 February 2026, for the Wikimedia Futures Lab event organised by the Wikimedia movement. The Futures Lab serves as a space for research, experimentation, and forward-thinking conversations on the future of free knowledge.

At a time when technology is rapidly transforming the internet and information-sharing, the event provided an opportunity for participants to reflect on how Wikipedia can continue to remain relevant, visible, and trusted in an increasingly digital and AI-driven world.

From the attendees

The conversations and ideas shared during the event formed the AWW Voices Podcast episode “Can Wikipedia Evolve with the Digital Age?”. In this episode, host Oluwapelumi Aina is joined by Ruby D Brown, Co-Founder of African Wiki Women, Tochi Precious, Language Advocate and Co-Founder of the Igbo User Group, and Olubusola Afolabi, Community Engagement Lead at Free Knowledge Africa.

Screenshot of AWW Voices Podcast host and guests.

Having attended the Wikimedia Futures Lab event, the guests shared their experiences, reflections, and key takeaways from the discussions held in Frankfurt. 

“The world around us is changing really fast. When you think about how people trust information online, AI-generated media, new laws, and shifting technologies, it becomes important to understand how these trends affect us as the Wikimedia community,” says Tochi.

Wikipedia vs Digital Age

Wikipedia, once regarded as one of the most trusted digital information platforms, has seen a decline in page views since 2016 as more people turn to AI tools for information. However, it is important to recognise that many AI systems are trained using content from platforms like Wikipedia.

“For example, when you search for something on Google, the AI overview provides a summary alongside references. Very few people actually click on the Wikipedia link for the longer version. This shows that people are still consuming Wikipedia content, but AI tools now act as middlemen,” explains Olubusola.

According to her, this shift means Wikipedia can no longer rely solely on users visiting the platform directly. Instead, it must adapt to changing online habits by meeting audiences where they already are, bringing information to the platforms people use rather than expecting them to always visit the main website.

The solution

The rise of AI and social media has also changed how people consume information. Many users now prefer short-form content over long-form reading because of shrinking attention spans. Since Wikipedia is traditionally a long-form platform, there is growing pressure for it to evolve alongside these changing habits.

For many younger internet users, information is no longer consumed through lengthy articles alone. Videos, creators, podcasts, and short-form explainers are increasingly becoming the preferred way to learn and engage online.

“People are moving away from institution-based information and increasingly relying on personalities. They want direct interaction, and video content makes information easier to consume. As Wikimedia, we need to pay attention to these shifts so we can meet people where they are,” says Ruby.

The Dilemma

Wikimedia exists because of the volunteers who edit and write the content on the platform. While keeping up with technological change is necessary, the movement also faces the challenge of ensuring that technology does not overshadow the human element that has always been at the centre of Wikimedia projects.

As conversations around AI continue to grow, many community members believe the focus should remain on supporting contributors rather than replacing them.

Last year, the Wikimedian community launched its AI Strategy, which clearly showed that AI should not replace the human writers and editors but rather support their work.

When the Home Page Gets Boring: How My Colleagues and I Revitalised Thai Wikipedia

After a few years away from Thai Wikipedia, I returned to find that the Main Page had become stagnant. It lacked the dynamic energy a landing page needs. So, my colleagues and I decided to revitalise it—and here is exactly how we did it.

Thai Wikipedia’s Home Page as of 26 May 2026, showing only the site logo, search box, page name, welcome message, featured sections, and broad category links.

Before diving into the details, let me explain the structure of Thai Wikipedia’s Home Page. It was heavily inspired by the original English edition’s layout, featuring four core content sections:

  • This Month’s Featured Articles (TMFA): An excerpt of a well-written article (Thai Wikipedia lacks the volume to change this daily like the English site).
  • Did You Know (DYK): Interesting facts pulled from recently expanded or created articles.
  • In The News (ITN): Recent global (and occasionally space-related) events.
  • On This Day (OTD): A look back at historical events on the current date.

When I returned to active editing in mid-2024, I realised these sections were frozen in time. Sometimes, content remained identical for days. After a thorough review, I found the issues were threefold: stagnant content, unpredictable update schedules (except for the strictly automated OTD), and complex, opaque backend procedures for publishing content to the Main Page.

To build a sustainable solution, we had to attack the problem from two angles: community contribution and technical infrastructure.

On the contribution side, we introduced clear, easy-to-follow Standard Operating Procedures (SOPs) to ensure nominators and reviewers wouldn’t feel overwhelmed. We also lifted several legacy constraints that were discouraging newbie and intermediate editors.

On the nerdy side, we introduced a “Nested Transclude Template System” to make pulling content to the main page seamless. No more messy, bespoke coding required. All nominations can now be tracked and recalled without digging through a chaotic page history.

For the less tech-savvy, here is how simple it is now: You no longer need to deal with any messy, complicated coding. As shown in the diagram, everything is built like a set of nesting dolls:

Diagram of the nested template system: content such as hooks and excerpts is grouped inside date-based templates, which are automatically pulled into the main DYK and TMRA templates.
  • Write your content: You just write your proposal or excerpt in a standard form.
  • Name it with the date: You save it inside a specific date format (like YYYY-MM-DD).
  • The system does the rest: When that day arrives, the Main Page template automatically fetches the correct date’s content and puts it live—completely on its own!

This means no one has to lift a finger to update it manually, and we can track past nominations without digging through a chaotic page history.
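The naming convention can be sketched in a few lines of Python. This only illustrates the lookup logic: on-wiki, the same thing would presumably be done in wikitext with date magic words. The base title `Template:DYK` and both function names are hypothetical, not the actual Thai Wikipedia page names.

```python
from datetime import date


def dated_subpage(base_title, day):
    """Build the title of the date-named subpage, e.g. 'Template:DYK/2026-05-26'."""
    return f"{base_title}/{day.isoformat()}"  # isoformat() yields YYYY-MM-DD


def transclusion(base_title, day):
    """Wikitext that would pull that day's subpage into the parent template."""
    return "{{" + dated_subpage(base_title, day) + "}}"
```

Because the title is derived purely from the date, past entries stay addressable by name, which is what makes old nominations easy to find again.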

Did You Know it’s now easier than ever to nominate your articles?

The first backlog I tackled was the DYK section. There, I crossed paths with Taweethaも, a renowned Thai Wikipedian. That chance encounter inspired a complete revolution of our process. We teamed up to clear backlogs that had been sitting untouched for over six months. Together, we drafted new SOPs and built a backend system to support them—queuing content chronologically by nomination date, enforcing character limits, and scheduling release dates.
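As a rough sketch of what such a queue does, the Python below filters out over-long hooks, orders the rest by nomination date, and assigns release dates. The 200-character limit, the one-hook-per-day rate, and all names are placeholders chosen for illustration, not the rules in the actual SOPs.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Nomination:
    article: str
    hook: str
    nominated_on: date


def schedule(nominations, first_release, max_hook_chars=200, hooks_per_day=1):
    """Map each release date to the articles scheduled for that day."""
    # Enforce the character limit (hypothetical value), then queue oldest first.
    accepted = [n for n in nominations if len(n.hook) <= max_hook_chars]
    accepted.sort(key=lambda n: n.nominated_on)
    plan = {}
    for i, nom in enumerate(accepted):
        release = first_release + timedelta(days=i // hooks_per_day)
        plan.setdefault(release, []).append(nom.article)
    return plan
```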

Once the system stabilised, we launched a content contest to diversify the topics and test our new workflow under pressure. The campaign was a massive success: 16 contributors created or improved over 90 articles. Crucially, three of those contributors remain highly active “DYK editors” today.

We also noticed that while some nominators were incredibly prolific, they rarely helped review others’ work. To keep the backlog manageable, we implemented a Quid Pro Quo (QPQ) policy, requiring nominators to review a peer’s submission to qualify their own.

Opening the Gates: Allowing Good Articles onto the Main Page

With DYK running smoothly, we turned our attention to TMFA. This section had suffered from a decade-long drought of new Featured Articles (FAs) to showcase. Beyond adapting our new DYK SOPs, we made a major policy shift: we lifted the strict FA constraint and allowed Good Articles (GAs) to be featured. To reflect this, we renamed the section from This Month’s Featured Article to Recommended Articles.

Whilst long-form, high-quality writing requires significantly more energy from contributors—meaning it wasn’t as explosive as the DYK campaign—the initiative still successfully brought 7 brand-new, high-quality articles to the front page from 7 different writers.

A new solution brings a new quirk

An excerpt of Thai Wikipedia’s Home Page on 4 June 2025 showing OTD content from 31 May due to caching issues.

Every new system has its bugs. Just a day into the DYK campaign, a participant noticed that logged-out readers were seeing stale, outdated main page content, while logged-in users saw the updates perfectly.

We spent days hunting for a fix. Thankfully, User:Chlod—a perennial savior of Wikipedia infrastructure—pointed out that the server cache just needed to be manually “purged” (which simply means appending ?action=purge to the URL string).

To automate this, I sat down for some classic “vibe coding” and wrote a Python script. Hosted on Toolforge (Wikimedia’s dedicated server for customised scripts within the Wikimedia Movement) and linked to my bot account, it now runs via a cron job twice a day to keep the page fresh. I also added a secondary feature to the script: it automatically archives the Main Page to the Internet Archive’s Wayback Machine daily.

For those unfamiliar with the tech jargon, here is the simple version: I asked the AI chatbot, Google Gemini, to help me write a program in the Python language. After testing it repeatedly until I was sure it worked, I uploaded the code to Toolforge—which is essentially a free, 24/7 computer server available to Wikipedia volunteers. I set the server to run my code twice a day to automatically fix the glitch and keep the Main Page fresh. As a bonus, I also programmed it to save a daily copy of the Main Page to the Wayback Machine (a digital archive of the internet) so we always have a historical record.
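For readers curious what such a job involves, here is a minimal Python sketch of the two requests. It is not the published script (see the repository link below for that). It only builds the requests, so it can be read and tested without touching the network. The function names are made up; `action=purge` with a `titles` parameter is the standard MediaWiki API purge call, and `https://web.archive.org/save/` is the Wayback Machine's "Save Page Now" address.

```python
from urllib.parse import urlencode


def purge_request(api_url, title):
    """Return (url, POST body) for a MediaWiki API purge of one page."""
    params = {"action": "purge", "titles": title, "format": "json"}
    # The API expects purge requests to be sent as POST, hence a body.
    return api_url, urlencode(params).encode("utf-8")


def wayback_save_url(page_url):
    """URL that asks the Wayback Machine to capture page_url."""
    return "https://web.archive.org/save/" + page_url


# Sending would look roughly like this (left as a comment on purpose):
#   from urllib.request import Request, urlopen
#   url, body = purge_request("https://th.wikipedia.org/w/api.php", "หน้าหลัก")
#   urlopen(Request(url, data=body, headers={"User-Agent": "my-bot/0.1"}))
```

A scheduler such as cron would then run the script at fixed times, twice a day in the setup described above.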

I’ve published the source code on GitHub, in case you’re interested: https://github.com/sarawutkhs/wthpurge

What about the other two sections?

You might be wondering why I haven’t mentioned ITN or OTD. To be completely honest, I tried to implement similar reforms for OTD, but couldn’t find anyone in the community available to jump in. If you have ideas on how we can spark interest and bring that same magic to the remaining sections, please drop a comment!

Acknowledgements

This transformation wouldn’t have been possible without an incredible support system. Beyond those already mentioned, I want to thank the original architects of the Main Page structure, as well as every single campaign participant who dedicated time to improving Thai Wikipedia. Finally, my deepest respect goes to Taweethaも, whose guidance both on- and off-wiki was invaluable.

Declaration: This case study was previously presented at the ESEAP Conference 2026 and the October 2025 ESEAP Community Call. The initial phase of this project was also published on the ESEAP’s Substack.

A surprising IC in a LED light chain.

By: cpldcpu

LED-based festive decorations are a fascinating subject for exploration of ingenuity in low-cost electronics. New products appear every year and often very surprising technology approaches are used to achieve some differentiation while adding minimal cost.

This year, there wasn’t any fancy new controller, but I was surprised how much the cost of simple light strings was reduced. The LED string above includes a small box with batteries and came in a set of ten for less than $2 shipped, so <$0.20 each. While I may have benefitted from promotional pricing, it is also clear that quite some work went into making the product cheap.

The string is constructed in the same way as one I had analyzed earlier: it uses phosphor-converted blue LEDs that are soldered to two insulated wires and covered with an epoxy blob. In contrast to the earlier device, they seem to have switched from copper wire to cheaper steel wires.

The interesting part is in the control box. It comes with three button cells, a small PCB, and a tactile button that turns the string on and cycles through different modes of flashing and constant light.

Curiously, there is nothing on the PCB except the button and a device that looks like an LED. Also, note how some “redundant” joints have simply been left unsoldered.

Closer inspection reveals that the “LED” is actually a very small integrated circuit packaged in an LED package. The four pins are connected to the push button, the cathode of the LED string, and the power supply pins. I didn’t measure the die size exactly, but I estimate that it is smaller than 0.3 mm × 0.2 mm, i.e. below about 0.06 mm².

What is the purpose of packaging an IC in an LED package? Most likely, the company that made the light string is also packaging their own LEDs, and they saved costs by also packaging the IC themselves—in a package type they had available.

I characterized the current-voltage behavior of the IC supply pins with the LED string connected. The LED string started to emit light at around 2.7 V, which is consistent with the forward voltage of blue LEDs. Above that point, the current increased proportionally to the voltage, which suggests that there is no current limit or constant current sink in the IC: it’s simply a switch with some series resistance.

Left: LED string in “constantly on” mode. Right: Flashing

Using an oscilloscope, I found that the string is modulated with an on-off ratio of 3:1 at a frequency of ~1.2 kHz, most likely to limit the average current. The image above shows the voltage at the cathode; the anode is connected to the positive supply.
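The timing implied by those two numbers can be worked out directly. The short Python sketch below assumes "3:1" means three parts on to one part off; the function name is made up for illustration.

```python
def pwm_timing(on_parts, off_parts, frequency_hz):
    """Return (duty_cycle, on_time_s, off_time_s) for an on:off ratio."""
    duty = on_parts / (on_parts + off_parts)
    period_s = 1.0 / frequency_hz
    return duty, period_s * duty, period_s * (1.0 - duty)


duty, t_on, t_off = pwm_timing(3, 1, 1200.0)
# duty is 0.75: the string is driven for three quarters of each ~833 µs period,
# i.e. roughly 625 µs on and 208 µs off.
```

With a purely resistive switch, the average LED current would scale with that duty cycle, which fits the reading that the modulation acts as a crude current limit.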

All in all, it is rather surprising to see an ASIC being used when it barely does more than flash the LED string. It would have been nice to see a constant current source to stabilize the light levels over the lifetime of the battery and maybe more interesting light effects. But I guess that would have increased the cost of the ASIC too much, and then using an ultra-low-cost microcontroller may have been cheaper. This almost calls for a transplant of an MCU into this device…

Neural Networks (MNIST inference) on the “3-cent” Microcontroller

By: cpldcpu

Buoyed by the surprisingly good performance of neural networks with quantization-aware training on the CH32V003, I wondered how far this can be pushed. How much can we compress a neural network while still achieving good test accuracy on the MNIST dataset? When it comes to absolutely low-end microcontrollers, there is hardly a more compelling target than the Padauk 8-bit microcontrollers. These are microcontrollers optimized for the simplest and lowest-cost applications there are. The smallest device of the portfolio, the PMS150C, sports 1024 13-bit words of one-time-programmable memory and 64 bytes of RAM, more than an order of magnitude smaller than the CH32V003. In addition, it has a proprietary accumulator-based 8-bit architecture, as opposed to a much more powerful RISC-V instruction set.

Is it possible to implement an MNIST inference engine, which can classify handwritten numbers, also on a PMS150C?

On the CH32V003 I used MNIST samples that were downscaled from 28×28 to 16×16, so that every sample takes 256 bytes of storage. This is quite acceptable if there is 16 kB of flash available, but with only 1 kword of ROM, this is too much. Therefore I started with downscaling the dataset to 8×8 pixels.
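To illustrate the storage saving, here is a small Python sketch that halves an image's resolution by averaging each 2×2 block of pixels. How the dataset was actually resampled is not stated, so the averaging method and the function name are assumptions. At one byte per pixel, a 16×16 sample needs 256 bytes and an 8×8 sample only 64.

```python
def downscale_2x2(image):
    """Halve width and height of a square image given as a list of rows."""
    size = len(image) // 2
    return [
        [
            (
                image[2 * r][2 * c]
                + image[2 * r][2 * c + 1]
                + image[2 * r + 1][2 * c]
                + image[2 * r + 1][2 * c + 1]
            )
            // 4  # integer mean of the 2x2 block
            for c in range(size)
        ]
        for r in range(size)
    ]
```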

The image above shows a few samples from the dataset at both resolutions. At 16×16 it is still easy to discriminate different numbers. At 8×8 it is still possible to guess most numbers, but a lot of information is lost.

Surprisingly, it is still possible to train a machine learning model to recognize even these very low-resolution numbers with impressive accuracy. It’s important to remember that the test dataset contains 10000 images that the model does not see during training. The only way for a very small model to recognize these images accurately is to identify common patterns, because the model capacity is too limited to “remember” complete digits. I trained a number of different network combinations to understand the trade-off between network memory footprint and achievable accuracy.

Parameter Exploration

The plot above shows the result of my hyperparameter exploration experiments, comparing models with different configurations of weights and quantization levels from 1 to 4 bit for input images of 8×8 and 16×16. The smallest models had to be trained without data augmentation, as they would not converge otherwise.

Again, there is a clear relationship between test accuracy and the memory footprint of the network. Increasing the memory footprint improves accuracy up to a certain point. For 16×16, around 99% accuracy can be achieved at the upper end, while around 98.5% is achieved for 8×8 test samples. This is still quite impressive, considering the significant loss of information for 8×8.

For small models, 8×8 achieves better accuracy than 16×16. The reason for this is that the size of the first layer dominates in small models, and this size is reduced by a factor of 4 for 8×8 inputs.

Surprisingly, it is possible to achieve over 90% test accuracy even on models as small as half a kilobyte. This means that it would fit into the code memory of the microcontroller! Now that the general feasibility has been established, I needed to tweak things further to accommodate the limitations of the MCU.

Training the Target Model

Since the RAM is limited to 64 bytes, the model structure had to use a minimum number of latent parameters during inference. I found that it was possible to use layers as narrow as 16. This reduces the buffer size during inference to only 32 bytes, 16 bytes each for one input buffer and one output buffer, leaving 32 bytes for other variables. The 8×8 input pattern is directly read from the ROM.

Furthermore, I used 2-bit weights with irregular spacing of (-2, -1, 1, 2) to allow for a simplified implementation of the inference code. I also skipped layer normalization and instead used a constant shift to rescale activations. These changes slightly reduced accuracy. The resulting model structure is shown below.

All things considered, I ended up with a model with 90.07% accuracy and a total of 3392 bits (0.414 kilobytes) in 1696 weights, as shown in the log below. The panel on the right displays the first layer weights of the trained model, which directly mask features in the test images. In contrast to the higher accuracy models, each channel seems to combine many features at once, and no discernible patterns can be seen.
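The footprint figures can be checked with a few lines of Python. The layer widths used here (64 inputs, three hidden layers of 16, 10 outputs, no bias terms) are an assumption: they are one layout that reproduces the 1696 weights quoted above, not a structure taken from the text.

```python
def weight_count(layer_widths):
    """Number of weights in a stack of fully connected layers without biases."""
    return sum(a * b for a, b in zip(layer_widths, layer_widths[1:]))


def footprint_bits(layer_widths, bits_per_weight):
    """Total storage for all weights, in bits."""
    return weight_count(layer_widths) * bits_per_weight


assumed_layout = [64, 16, 16, 16, 10]  # hypothetical, chosen to match 1696 weights
bits = footprint_bits(assumed_layout, 2)
kilobytes = bits / 8 / 1024
```

With these widths, `bits` comes out at 3392 and `kilobytes` at about 0.414. The same function also shows why small models favour 8×8 inputs: a first layer of width 16 needs 256 × 16 = 4096 weights for a 16×16 image but only 64 × 16 = 1024 for an 8×8 one.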

Implementation on the Microcontroller

In the first iteration, I used a slightly larger variant of the Padauk Microcontrollers, the PFS154. This device has twice the ROM and RAM and can be reflashed, which tremendously simplifies software development. The C versions of the inference code, including the debug output, worked almost out of the box. Below, you can see the predictions and labels, including the last layer output.

Squeezing everything down to fit into the smaller PMS150C was a different matter. One major issue when programming these devices in C is that every function call consumes RAM for the return stack and function parameters. This is unavoidable because the architecture has only a single register (the accumulator), so all other operations must occur in RAM.

To solve this, I flattened the inference code and implemented the inner loop in assembly to optimize variable usage. The inner loop for memory-to-memory inference of one layer is shown below. The two-bit weight is multiplied with a four-bit activation in the accumulator and then added to a 16-bit register. The multiplication requires only four instructions (t0sn, sl, t0sn, neg), thanks to the powerful bit manipulation instructions of the architecture. The sign-extending addition (add, addc, sl, subc) also consists of four instructions, demonstrating the limitations of 8-bit architectures.

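void fc_innerloop_mem(uint8_t loops) {

    sum = 0;
    do  {
        weightChunk = *weightidx++;   // one byte holds four 2-bit weights
__asm
    idxm  a, _activations_idx         ; a = next activation (indirect load)
    inc   _activations_idx+0

    t0sn  _weightChunk, #6            ; skip next instruction if bit 6 is clear
    sl    a                           ; if (weightChunk & 0x40) in = in+in;
    t0sn  _weightChunk, #7            ; skip next instruction if bit 7 is clear
    neg   a                           ; if (weightChunk & 0x80) in = -in;

    add   _sum+0, a                   ; add to low byte of 16-bit sum
    addc  _sum+1                      ; propagate carry into high byte
    sl    a                           ; shift sign bit of a into carry
    subc  _sum+1                      ; sign extension: high byte -= carry

  ... 3x more ...

__endasm;
    } while (--loops);

    int8_t sum8 = ((uint16_t)sum)>>3; // Normalization
    sum8 = sum8 < 0 ? 0 : sum8;       // ReLU
    *output++ = sum8;
}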
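For readers who do not read Padauk assembly, the same arithmetic can be modelled in Python. This is a reference sketch, not part of the project: the bit layout of the first weight (bit 7 is the sign, bit 6 doubles the value) is taken from the listing above, while the assumption that the remaining three weights follow in bits 5..4, 3..2, and 1..0 covers the elided "3x more" part and may not match the real code. Function names are made up.

```python
def decode_weight(two_bits):
    """Map a 2-bit code to a weight in (-2, -1, 1, 2): high bit = sign, low bit = x2."""
    magnitude = 2 if two_bits & 0b01 else 1
    return -magnitude if two_bits & 0b10 else magnitude


def unpack_weights(chunk):
    """Split one byte into four weights, most significant pair first (assumed order)."""
    return [decode_weight((chunk >> shift) & 0b11) for shift in (6, 4, 2, 0)]


def fc_neuron(weight_chunks, activations):
    """One output activation: weighted sum, shift by 3, int8 wrap, then ReLU."""
    total = 0
    acts = iter(activations)
    for chunk in weight_chunks:
        for w in unpack_weights(chunk):
            total += w * next(acts)
    # Mirror the C line: int8_t sum8 = ((uint16_t)sum) >> 3;
    scaled = ((total & 0xFFFF) >> 3) & 0xFF
    signed = scaled - 256 if scaled >= 128 else scaled
    return max(signed, 0)  # ReLU
```

A model like this is handy for checking the microcontroller's outputs against known-good values when no debug UART fits into ROM.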

In the end, I managed to fit the entire inference code into 1 kiloword of memory and reduced SRAM usage to 59 bytes, as seen below. (Note that the output from SDCC assumes 2 bytes per instruction word, while a word is only 13 bits wide.)

Success! Unfortunately, there was no ROM space left for the soft UART to output debug information. However, based on the verification on the PFS154, I trust that the code works, and since I don’t have any specific application in mind, I left it at that stage.

Summary

It is indeed possible to implement MNIST inference with good accuracy using one of the cheapest and simplest microcontrollers on the market. A lot of memory footprint and processing overhead is usually spent on implementing flexible inference engines that can accommodate a wide range of operators and model structures. Cutting this overhead away and reducing the functionality to its core allows for astonishing simplification at this very low end.

This hack demonstrates that there truly is no fundamental lower limit to applying machine learning and edge inference. However, the feasibility of implementing useful applications at this level is somewhat doubtful.

You can find the project repository here.

Neural Networks (MNIST inference) on the “3-cent” Microcontroller

By: cpldcpu

Bouyed by the surprisingly good performance of neural networks with quantization aware training on the CH32V003, I wondered how far this can be pushed. How much can we compress a neural network while still achieving good test accuracy on the MNIST dataset? When it comes to absolutely low-end microcontrollers, there is hardly a more compelling target than the Padauk 8-bit microcontrollers. These are microcontrollers optimized for the simplest and lowest cost applications there are. The smallest device of the portfolio, the PMS150C, sports 1024 13-bit word one-time-programmable memory and 64 bytes of ram, more than an order of magnitude smaller than the CH32V003. In addition, it has a proprieteray accumulator based 8-bit architecture, as opposed to a much more powerful RISC-V instruction set.

Is it possible to implement an MNIST inference engine, which can classify handwritten numbers, also on a PMS150C?

On the CH32V003, I used MNIST samples that were downscaled from 28×28 to 16×16, so that every sample takes 256 bytes of storage. This is quite acceptable if there is 16 kB of flash available, but with only 1 kword of ROM, this is too much. Therefore, I started by downscaling the dataset to 8×8 pixels.

The image above shows a few samples from the dataset at both resolutions. At 16×16 it is still easy to discriminate different numbers. At 8×8 it is still possible to guess most numbers, but a lot of information is lost.
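The storage argument and one possible downscaling step can be sketched in C. This is only an illustration: the post does not say which resampling method was used, so the 2×2 box filter below is an assumption, and the names `downscale_16_to_8` and `sample_bytes` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { SRC_DIM = 16, DST_DIM = 8 };

/* Average each 2x2 block of a 16x16 image into one pixel of an 8x8 image.
   Box filtering is an assumption; the source does not name the method. */
void downscale_16_to_8(const uint8_t src[SRC_DIM * SRC_DIM],
                       uint8_t dst[DST_DIM * DST_DIM])
{
    assert(src != 0 && dst != 0);
    for (int y = 0; y < DST_DIM; y++) {
        for (int x = 0; x < DST_DIM; x++) {
            const uint8_t *p = &src[(2 * y) * SRC_DIM + 2 * x];
            unsigned sum = p[0] + p[1] + p[SRC_DIM] + p[SRC_DIM + 1];
            dst[y * DST_DIM + x] = (uint8_t)((sum + 2) / 4); /* rounded mean */
        }
    }
}

/* Bytes needed to store one square sample at one byte per pixel. */
unsigned sample_bytes(unsigned dim)
{
    return dim * dim;
}
```

At one byte per pixel, `sample_bytes(16)` gives the 256 bytes quoted above, while `sample_bytes(8)` gives 64 bytes, a factor of four less.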

Surprisingly, it is still possible to train a machine learning model to recognize even these very low-resolution numbers with impressive accuracy. It’s important to remember that the test dataset contains 10000 images that the model does not see during training. The only way for a very small model to recognize these images accurately is to identify common patterns, since the model capacity is too limited to “remember” complete digits. I trained a number of different network combinations to understand the trade-off between network memory footprint and achievable accuracy.

Parameter Exploration

The plot above shows the result of my hyperparameter exploration experiments, comparing models with different configurations of weights and quantization levels from 1 to 4 bit for input images of 8×8 and 16×16. The smallest models had to be trained without data augmentation, as they would not converge otherwise.

Again, there is a clear relationship between test accuracy and the memory footprint of the network. Increasing the memory footprint improves accuracy up to a certain point. For 16×16, around 99% accuracy can be achieved at the upper end, while around 98.5% is achieved for 8×8 test samples. This is still quite impressive, considering the significant loss of information for 8×8.

For small models, 8×8 achieves better accuracy than 16×16. The reason for this is that the size of the first layer dominates in small models, and this size is reduced by a factor of 4 for 8×8 inputs.

Surprisingly, it is possible to achieve over 90% test accuracy even on models as small as half a kilobyte. This means that it would fit into the code memory of the microcontroller! Now that the general feasibility has been established, I needed to tweak things further to accommodate the limitations of the MCU.

Training the Target Model

Since the RAM is limited to 64 bytes, the model structure had to use a minimum number of latent parameters during inference. I found that it was possible to use layers as narrow as 16. This reduces the buffer size during inference to only 32 bytes, 16 bytes each for one input buffer and one output buffer, leaving 32 bytes for other variables. The 8×8 input pattern is directly read from the ROM.
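The two-buffer scheme described above can be sketched as follows. This is a host-side illustration under assumptions, not the code from the project: the `layer_fn` signature, `run_layers`, and the two toy layers are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stdint.h>

enum { WIDTH = 16 };              /* widest hidden layer */

static uint8_t buf_a[WIDTH];      /* 16 bytes */
static uint8_t buf_b[WIDTH];      /* 16 bytes, 32 bytes of RAM in total */

/* Hypothetical layer signature: read activations from in, write to out. */
typedef void (*layer_fn)(const uint8_t *in, uint8_t *out);

/* Run the layers in order. The first layer reads the image (for example
   from ROM); every later layer reads the buffer the previous one wrote. */
const uint8_t *run_layers(const uint8_t *image, const layer_fn *layers,
                          int n_layers)
{
    assert(image != 0 && layers != 0 && n_layers >= 1);
    const uint8_t *in = image;
    uint8_t *out = buf_a;
    for (int i = 0; i < n_layers; i++) {
        layers[i](in, out);
        in = out;                               /* output becomes next input */
        out = (out == buf_a) ? buf_b : buf_a;   /* swap to the other buffer */
    }
    return in;                                  /* final activations */
}

/* Two toy layers for demonstration only: copy 16 inputs, then add one. */
void demo_copy(const uint8_t *in, uint8_t *out)
{
    for (int i = 0; i < WIDTH; i++) out[i] = in[i];
}

void demo_inc(const uint8_t *in, uint8_t *out)
{
    for (int i = 0; i < WIDTH; i++) out[i] = (uint8_t)(in[i] + 1);
}
```

Because a layer never reads and writes the same buffer, no third copy of the activations is needed, which is what keeps the activation storage at 32 bytes.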

Furthermore, I used 2-bit weights with irregular spacing of (-2, -1, 1, 2) to allow for a simplified implementation of the inference code. I also skipped layer normalization and instead used a constant shift to rescale activations. These changes slightly reduced accuracy. The resulting model structure is shown below.

All things considered, I ended up with a model with 90.07% accuracy and a total of 3392 bits (0.414 kilobytes) in 1696 weights, as shown in the log below. The panel on the right displays the first layer weights of the trained model, which directly mask features in the test images. In contrast to the higher accuracy models, each channel seems to combine many features at once, and no discernible patterns can be seen.
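The quoted size figures can be cross-checked with a few lines of C. The layer widths used in the usage note (64-16-16-16-10, fully connected, no biases) are an assumption inferred from the stated weight count, since the structure diagram is not reproduced here; `total_weights` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Total number of weights in a stack of fully connected layers without
   biases. widths[0] is the input size, widths[n-1] the output size. */
unsigned long total_weights(const unsigned *widths, size_t n)
{
    assert(widths != NULL && n >= 2);
    unsigned long total = 0;
    for (size_t i = 0; i + 1 < n; i++)
        total += (unsigned long)widths[i] * widths[i + 1];
    return total;
}
```

With widths {64, 16, 16, 16, 10} this yields 1024 + 256 + 256 + 160 = 1696 weights. At 2 bits per weight that is 3392 bits, or 424 bytes, which is 424 / 1024 ≈ 0.414 kilobytes. Under this assumed layout, the first layer alone accounts for 1024 of the 1696 weights.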

Implementation on the Microcontroller

In the first iteration, I used a slightly larger variant of the Padauk Microcontrollers, the PFS154. This device has twice the ROM and RAM and can be reflashed, which tremendously simplifies software development. The C versions of the inference code, including the debug output, worked almost out of the box. Below, you can see the predictions and labels, including the last layer output.

Squeezing everything down to fit into the smaller PMS150C was a different matter. One major issue when programming these devices in C is that every function call consumes RAM for the return stack and function parameters. This is unavoidable because the architecture has only a single register (the accumulator), so all other operations must occur in RAM.

To solve this, I flattened the inference code and implemented the inner loop in assembly to optimize variable usage. The inner loop for memory-to-memory inference of one layer is shown below. The two-bit weight is multiplied by a four-bit activation in the accumulator and then added to a 16-bit register. The multiplication requires only four instructions (t0sn, sl, t0sn, neg), thanks to the powerful bit manipulation instructions of the architecture. The sign-extending addition (add, addc, sl, subc) also consists of four instructions, demonstrating the limitations of 8-bit architectures.

void fc_innerloop_mem(uint8_t loops) {

    sum = 0;
    do  {
       weightChunk = *weightidx++;
__asm   
    idxm  a, _activations_idx
	inc	_activations_idx+0

    t0sn _weightChunk, #6
    sl     a            ;    if (weightChunk & 0x40) in = in+in;
    t0sn _weightChunk, #7
    neg    a           ;     if (weightChunk & 0x80) in =-in;                    

    add    _sum+0,a
    addc   _sum+1
    sl     a 
    subc   _sum+1  

  ... 3x more ...

__endasm;
    } while (--loops);

    int8_t sum8 = ((uint16_t)sum)>>3; // Normalization
    sum8 = sum8 < 0 ? 0 : sum8; // ReLU
    *output++ = sum8;
}
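To make the two four-instruction sequences easier to follow, here is a portable C model of one multiply-accumulate step. It is a host-side sketch, not the code that runs on the device. The function names are hypothetical, the packing order of the three elided weights (top bit pair first, then downwards) is an assumption, and the wrap-around behaviour of the original int8_t cast for large sums is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply an activation by a 2-bit weight code using only a conditional
   doubling and a conditional negation (t0sn, sl, t0sn, neg).
   Bit 0 selects magnitude 2, bit 1 selects a negative sign: {+1,+2,-1,-2}. */
int8_t mul_weight(uint8_t act, uint8_t code)
{
    assert(code < 4 && act <= 63);      /* keeps 2*act inside int8_t */
    int v = act;
    if (code & 1) v = v + v;            /* sl:  double */
    if (code & 2) v = -v;               /* neg: negate */
    return (int8_t)v;
}

/* Add a signed 8-bit value to a 16-bit sum held as two bytes, using only
   8-bit operations and a carry flag (add, addc, sl, subc). */
void add_s8_to_s16(uint8_t *lo, uint8_t *hi, int8_t a)
{
    uint8_t ua = (uint8_t)a;
    unsigned t = (unsigned)*lo + ua;
    *lo = (uint8_t)t;                   /* add:  low byte */
    *hi = (uint8_t)(*hi + (t >> 8));    /* addc: carry into the high byte */
    uint8_t sign = (uint8_t)(ua >> 7);  /* sl:   sign bit of a goes to carry */
    *hi = (uint8_t)(*hi - sign);        /* subc: subtract it from high byte */
}

/* One output neuron: weights packed four per byte, first weight in the
   top two bits (order of the lower pairs is an assumption). */
uint8_t fc_neuron(const uint8_t *weights, const uint8_t *act, unsigned n_bytes)
{
    uint8_t lo = 0, hi = 0;
    for (unsigned i = 0; i < n_bytes; i++) {
        for (int shift = 6; shift >= 0; shift -= 2) {
            uint8_t code = (uint8_t)((weights[i] >> shift) & 3);
            add_s8_to_s16(&lo, &hi, mul_weight(*act++, code));
        }
    }
    int sum = ((int)hi << 8) | lo;
    if (sum >= 0x8000) sum -= 0x10000;  /* reinterpret as signed 16-bit */
    if (sum < 0) sum = 0;               /* ReLU */
    return (uint8_t)(sum >> 3);         /* constant shift as normalization */
}
```

The sign extension works because adding the unsigned byte and then subtracting the sign bit from the high byte is the same as adding the value minus 256 whenever it is negative.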

In the end, I managed to fit the entire inference code into 1 kiloword of memory and reduced SRAM usage to 59 bytes, as seen below. (Note that the output from SDCC assumes 2 bytes per instruction word, while a word is actually only 13 bits wide.)

Success! Unfortunately, there was no ROM space left for the soft UART to output debug information. However, based on the verification on the PFS154, I trust that the code works, and since I don’t have any specific application in mind, I left it at that stage.

Summary

It is indeed possible to implement MNIST inference with good accuracy using one of the cheapest and simplest microcontrollers on the market. A lot of memory footprint and processing overhead is usually spent on implementing flexible inference engines that can accommodate a wide range of operators and model structures. Cutting this overhead away and reducing the functionality to its core allows for astonishing simplification at this very low end.

This hack demonstrates that there truly is no fundamental lower limit to applying machine learning and edge inference. However, the feasibility of implementing useful applications at this level is somewhat doubtful.

You can find the project repository here.

Decapsulating the CH32V203 Reveals a Separate Flash Die

By: cpldcpu

The CH32V203 is a 32-bit RISC-V microcontroller. In WCH’s product portfolio, it is the next step up from the CH32V003, sporting a much higher clock rate of 144 MHz and a more powerful RISC-V core with the RV32IMAC instruction set architecture. The CH32V203 is also extremely affordable, starting at around 0.40 USD (>100 bracket), depending on configuration.

An interesting remark on Twitter piqued my interest: supposedly, the listed flash memory size only refers to the fraction that can be accessed with zero wait states, while the total flash size is actually 224 kB. The datasheet indeed has a footnote claiming the same. In addition, the RB variant offers the option to reconfigure the split between RAM and flash, which is rather odd, considering that writing to flash is usually much slower than writing to RAM.

The 224 kB figure also appears in the memory map. Besides the code flash, there is also a 28 kB boot section and additional configurable space. 224 kbyte + 28 kbyte + 4 kbyte = 256 kbyte, which suggests that the total available flash is 256 kbyte and is remapped to different locations in the memory map.

All of these are red flags for an architecture where a separate NOR flash die is used to store the code and the main CPU core has a small SRAM that is used as a cache. This configuration was pioneered by GigaDevice and is also famously used by the ESP32 and, more recently, the RP2040, although the latter two use an external NOR flash device.

Flash memory is quite different from normal CMOS devices, as it requires a special gate stack, isolation and much higher voltages. Therefore, integrating flash memory into a CMOS logic die usually requires extra process steps. The added complexity increases when going to smaller technology nodes. Separating both dies offers the option of using a high-density logic process (for example 45 nm) and pairing it with a low-cost off-the-shelf NOR flash die.

Decapsulation and Die Images

To confirm my suspicions, I decapsulated a CH32V203C8T6 sample, shown above. I heated the package to drive out the resin and then carefully broke the now-brittle package apart. As soon as the lead frame is removed, we can clearly see that it contains two dies.

The small die is around 0.5 mm² in area. I wasn’t able to completely remove the remaining filler, but we can see that it is an IC with a small number of pads, consistent with a serial flash die.

The microcontroller die came out really well. Unfortunately, the photos below are severely limited by my low-cost USB microscope. I hope Zeptobars or others will come up with nicer images at some point.

The die size of ~1.8 mm² is surprisingly small. In fact, it is even smaller than the die of the CH32V003, which measures ~2.0 mm² according to the Zeptobars die shot. Apart from the flash having been moved off-chip, the CH32V203 most likely also uses a much smaller CMOS technology node than the V003.

Summary

It was quite surprising to find a two-die configuration in such a low-cost device. But obviously, it explains the oddities in the device specification, and it also explains why 144 MHz core clock is possible in this device without wait-states.

What are the repercussions?

Amazingly, it seems that, instead of only 32 kB of flash, as listed for the smallest device, a total of 224 kB can be used for code and data storage. The datasheet mentions a special “flash enhanced read mode” that can apparently be used to execute code from the extended flash space. It’s not entirely clear what the impact on speed is, but that’s certainly an area for exploration.

I also expect this MCU to be highly overclockable, similar to the RP2040.

What are you really doing when you fill in an hCaptcha

hCaptcha is a reCAPTCHA clone that has been growing in popularity over 2020 and 2021, in particular due to Cloudflare’s conversion of their nag screens from Google’s reCAPTCHA to hCaptcha. Although hCaptcha advertises itself as being a privacy-conscious alternative to reCAPTCHA, there’s also an incentive for websites to switch over: hCaptcha will pay websites each time one of their users completes an hCaptcha challenge.

Now the question is: how does you completing a captcha earn anyone money? Of course, hCaptcha is a VC-funded business, so it can afford to burn money in the pursuit of market share; nonetheless there needs to be a plausible business model there, and it’s not obvious at first sight.

If you read the hCaptcha website, they suggest that AI startups will pay them to label their images for them.[1] Labelling images is a labour-intensive task and is required for some current-generation machine learning approaches. AI startups are well-funded and have money to spend on labelling, so this sounds like a reasonable case of selling shovels during a gold rush. But the output from solving CAPTCHAs isn’t obviously isomorphic to the type of labelling required for machine learning, which is often quite specific and requires a very low error rate.

Complex CAPTCHA challenges are not possible, as web users turn out to be drunk, blind, 3 years old, or just randomly clicking buttons to get this infernal thing to go away. Accordingly, hCaptcha challenges are simple: select the images that match a simple 1-3 word prompt from a 3x3 grid. This is fortunately easy for most real people.[2][3]

The most common prompts seem to be selecting buses, trucks, boats or trains out of the grid.[4] The market demand for this sort of simple labelling must be rather limited, even if challenges have to be repeated many times and cross-checked to get an acceptable error rate.

So far, a little inscrutable but all seems sensible enough. But then it all gets interesting when you actually take a look at the images in a little more detail:

hCaptcha example

Starting from the top left and going right, we have:

  • A boat that appears to have been painted by Dalí, with a mast drooping like a wet noodle.
  • A plane with tricycle landing gear, except it’s got two sets of wheels at the front and one at the back. That’s not normal!
  • A normal looking plane with some odd-looking clouds above.
  • A bus with an axle in front of the door, and another behind it, and another at the back. Hmm
  • A boat in a marina made of splodges.
  • A normal-looking boat on a normal-looking sea, except - look at that horizon! How did that happen?
  • A single-decker London bus with a ghost of its double-decker cousin above. And a giant moth perched on it at the back.
  • Another ghostly upper deck on a regional bus.
  • A sailing boat with some oddly stylised “alien” writing on the sail.

These images are obviously AI-generated. They have all the hallmarks of GAN output, with typical artifacts and oddities. Have some more and see if you can spot the same things in these other challenges - it’s not hard at all, is it!

The question then is why? Why would hCaptcha be generating these challenges - aren’t they supposed to be labelling real life, not some AI mirages? You know the labels before you generate them, what’s the point in using humans to re-label them again… And why are the results so bad - these are definitely not state of the art!

The only explanation that makes sense is that hCaptcha is not really doing this whole AI-labelling business at all, or if they are, it’s only in a very limited fashion. Most of the time they’re just using a GAN to generate images that defeat the bots’ image recognition AI. And the GAN isn’t trained to optimise human recognition, but rather to confound the bots in an arms race, which leads to the bad image quality.

If you have any better ideas I’d be glad to hear them because this whole thing doesn’t really make much sense.

Footnotes:

  1. If you look closer, they have an article that purports to explain the “technical architecture of hCaptcha” which is a supreme example of buzzword-stuffing blockchain-washed nothing. There is less than zero need for a blockchain to track customer requests, much less the public Ethereum blockchain, but it’s the buzzword of the month so it must go in. 

  2. Most real users, that is. There are some users for whom the challenge is actually too hard, or who’ve been blackholed and are interpreting bad IP reputation as poor skill. But the ones who fall down most often are those who try too hard and analyse the prompt and challenge in too much detail. The real way to solve these image challenges is to answer what you think other people will answer, rather than the correct answer. And don’t take too long either, just a quick glance is all your competition are giving! Anecdotally, this isn’t too common with hCaptcha, but reCAPTCHA challenges are extremely prone to this failure if you think too hard. 

  3. Unfortunately this is also quite easy for bots, somewhat subverting the point of a CAPTCHA, so that’s how browser fingerprinting and IP reputation creep in to get reasonable enough results. 

  4. These prompts are so common that a front-page post on Hacker News consisted of this observation (and prompted me to write up my thoughts on the topic from the past few months). 

Searching for Nothing, Finding a Surprise

Following on from my post yesterday about an edge case in YouTube, I thought I’d write about a class of edge cases perhaps even more strange that I’ve been exploring recently:

Search engines are a fact of daily life for most of the population nowadays. Google (sub your preferred provider) is an extension of the brain, imagined as giving you access to the sum of the world’s information at the click of a button. But a search engine isn’t just a Ctrl-F for the internet with a nice interface and ads; rather it’s a tremendously complicated system with lots of features and interactions between those features. And all you need to explore the system yourself is some well-tuned search queries.

I recently had an epiphany: search engines are designed to find you results for something and that’s a job they perform well. But there’s nothing stopping you from searching for nothing! And the search engines will still give you results!

And what results they are - have a go on the links below:

An empty query on DDG: https://duckduckgo.com/?q=+""
A different empty query on DDG: https://duckduckgo.com/?q=("")
An empty query on Google: https://www.google.com/search?q=("")
An empty query on Google News: https://www.google.com/search?q=""&tbm=nws

And have you ever thought about doing an anything but search? Normally you can add negations to the end of your search term to remove unwanted results, but there’s nothing stopping you from having a search term consisting entirely of negations!

Here’s one on DDG: https://duckduckgo.com/?q=-"an entirely negated query"
On Bing: https://www.bing.com/search?q=-"an entirely negated query"
And on Google Books: https://www.google.com/search?q=-"nothing to see here"&tbm=bks
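If you want to build such query URLs programmatically rather than paste them into the address bar, the quotes and spaces need to be percent-encoded. Browsers normally do this for you. The following is a minimal sketch, assuming straight double quotes in the query, and `percent_encode` is a hypothetical helper rather than part of any search engine API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Percent-encode src for use as a URL query value. Unreserved characters
   (RFC 3986: letters, digits, '-', '.', '_', '~') are copied, everything
   else becomes %XX. Returns 0 on success, -1 if dst is too small. */
int percent_encode(const char *src, char *dst, size_t dst_size)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t o = 0;
    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        int unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                         (c >= '0' && c <= '9') ||
                         c == '-' || c == '.' || c == '_' || c == '~';
        size_t need = unreserved ? 1 : 3;
        if (o + need >= dst_size) return -1;   /* keep room for the NUL */
        if (unreserved) {
            dst[o++] = (char)c;
        } else {
            dst[o++] = '%';
            dst[o++] = hex[c >> 4];
            dst[o++] = hex[c & 15];
        }
    }
    if (o >= dst_size) return -1;
    dst[o] = '\0';
    return 0;
}
```

Encoding the fully negated query -"an entirely negated query" this way gives -%22an%20entirely%20negated%20query%22, which can be appended after `?q=` in any of the URLs above.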

Commentary

Google appears to have some half-effective filtering for these empty search queries, so you’ll mostly get the same two YouTube videos as a result - is this an Easter egg? Google News and Books, however, don’t appear to have any filter, and you do get some odd results there!

DuckDuckGo doesn’t appear to have any filtering at all, although it’s obvious just how much DDG relies on Bing’s whitelabel product for its results by looking at how similar the two are.

If you can think of a deeper reason for these results, please do leave a comment and let’s try to explain some of the mystery away.
