Perf Matters at Wikipedia in 2020

20 December 2022 at 20:00

Organizing a web performance conference

Photo by Sia Karamalegos, CC BY-SA 4.0.

There are numerous industry conferences dedicated to web performance. We have attended and spoken at several of them, and noticed that important topics remain underrepresented. While the logistics of organizing a conference are too daunting for our small team, FOSDEM presents an appealing compromise.

The Wikimedia Performance Team organized the inaugural Web Performance devroom at FOSDEM 2020. 

FOSDEM is the biggest Free and Open Source software conference in the world. It takes place in Brussels every year, is free to attend, and attracts over 8000 attendees. FOSDEM is known for its many self-organized conference tracks, known as “devrooms”. The logistics are taken care of by FOSDEM, while we focus on programming the content. We run our own CfP, curate and invite speakers, and emcee the event.

📖 Read all about it on our blog: Organizing a developer room at FOSDEM
🎥 Watch sessions: Web Performance @ FOSDEM 2020
📙 See also: History of Wikimedia attending FOSDEM

Multi-DC progress

This year saw the completion of two milestones on the MediaWiki Multi-DC roadmap. Multi-DC is a cross-team initiative driven by the Performance Team to evolve MediaWiki for operation from multiple datacenters. It is motivated by higher resilience and by eliminating steps from switchover procedures. This eases or enables routine maintenance by allowing clusters to be turned off without requiring a major switchover event.

The Multi-DC initiative has brought about performance and resiliency improvements across the MediaWiki codebases, and at every level of our infrastructure. These gains are effective even in today’s single-DC operation. We resolved long-standing tech debt and improved extension interfaces, which increased developer productivity. We also reduced dependencies and coupling, restructured business logic, and implemented asynchronous eventual-consistency solutions.

This year we applied the Multi-DC strategy to MediaWiki’s ChronologyProtector (T254634), and started work on the MainStash DB (T212129).

Read more at Performance/Multi-DC MediaWiki.

Setting up a mobile device lab

Today we collect real-user data from pageviews, which alerts us when a regression happens, but doesn’t help us investigate and fix the cause. Synthetic testing complements this for desktop browsers, but we have no equivalent for mobile devices. Desktop browsers have an “emulate mobile” option, but DevTools emulation is nothing like real mobile devices.

The goal of the mobile device lab is to find performance regressions on Wikipedia that are relevant to the experience of our mobile users. Alerts will include detailed profiles for investigation, as we have for desktop browsers today.

📖 Read more at Learnings from setting up a performance device lab 

Introducing: Web Perf Hero award

Starting in 2020, we give out a Web Perf Hero award to individuals who have gone above and beyond to improve site performance. It’s awarded (up to) once a quarter to individuals who demonstrate repeated care and discipline around performance.

Browse posts tagged Web Perf Hero award or find an overview of Web Perf Hero award on Wikitech.

Performance perception survey

Since 2018, we have run an ongoing survey measuring performance perception on several Wikipedias. You can find the main findings in last year’s blog post. An important takeaway was that none of the standard and novel metrics we tried correlates well with the real user experience. The “best” metric (page load time) scored a mere 0.14 on the Pearson coefficient scale (from 0 to 1). As such, it remains valuable to survey the real perceived performance, as an empirical barometer to validate other performance monitoring.

Data from three cohorts, seen in Grafana. You can see that there’s loose correlation with page load time (“loadEventEnd”). When site performance degrades (time goes up), satisfaction gets worse too (positive percentage goes down). Likewise, when load time improves (yellow goes down), satisfaction improves (green goes up).

Refer to Research:Study of performance perception for the full dataset used in the 2019 paper.

Catalan Wikipedia, ca.wikipedia.org
Spanish Wikipedia, es.wikipedia.org
Russian Wikipedia, ru.wikipedia.org

Miscellaneous

Further reading

About this post

Featured image by Kuhnmi, CC BY-SA 4.0, via Wikimedia Commons.

Perf Matters at Wikipedia in 2019

19 December 2022 at 11:00

A large-scale study of Wikipedia’s quality of experience

Last year we reported how our extensive literature review on performance perception changed our perspective on what the field of web performance actually knows.

Existing frontend metrics correlated poorly with user-perceived performance. It became clear that the best way to understand perceived performance is still to ask people directly about their experience. We set out to run our own survey to do exactly that, and to look for correlations between a range of well-known and novel performance metrics and the lived experience. We partnered with Dario Rossi, Telecom ParisTech, and Wikimedia Research to carry out the study (T187299).

While machine learning failed to explain everything, the survey unearthed many key findings. It gave us a newfound appreciation for the old-school Page Load Time metric, as the metric that best (or least terribly) correlated with the real human experience.

📖 A large-scale study of Wikipedia’s quality of experience, the published paper.

Refer to Research:Study of performance perception on Meta-Wiki for the dataset, background info, and an extended technical report.

Throughout the study we blogged about various findings:

Join the World Wide Web Consortium (W3C)

W3C Logo

The Performance Team has been participating in web standards as individual “invited experts” for a while. We initiated the work for the Wikimedia Foundation to become a W3C member organization, and by March 2019 it was official.

As a member organization, we are now collaborating in W3C working groups alongside other major stakeholders in the Web!

Read more at Joining the World Wide Web Consortium

Element Timing API for Images experiment

In the search for a better user experience metric, we tried out the upcoming Element Timing API for images. This is meant to measure when a given image is displayed on-screen. We enrolled wikipedia.org in the ongoing Google Chrome origin trial for the Element Timing API.

Read all about it at Evaluating Element Timing API for Images 

Event Timing API origin trial

The upcoming Event Timing API is meant to help developers identify slow event handlers on web pages. This is an area of web performance that hasn’t gotten a lot of attention, but its effects can be very frustrating for users.

Via another Chrome origin trial, this experiment gave us an opportunity to gather data, discover bugs in several MediaWiki extensions, and provide early feedback on the W3C Editor’s Draft to the browser vendors designing this API.

Read more at Tracking down slow event handlers with Event Timing

Implement a new API in upstream WebKit

We decided to commission the implementation of a browser feature that measures performance from an end-user perspective. The Paint Timing API measures when content appears on-screen for a visitor’s device. This was, until now, a largely Chrome-only feature. Being unable to measure such a basic user experience metric for Safari visitors risks long-term bias, negatively affecting over 20% of our audience. It’s essential that we maintain equitable access and keep Wikimedia sites fast for everyone.

We funded and oversaw implementation of the Paint Timing API in WebKit. We contracted Noam Rosenthal, who brings experience in both web standards and upstream WebKit development.

Read more at How Wikimedia contributed Paint Timing API to WebKit

Update (April 2021): The Paint Timing API has been released in Safari 14.1!

Wikipedia’s JavaScript initialisation on a budget

ResourceLoader is Wikipedia’s delivery system for styles, scripts, and localization. It delivers JavaScript code on web pages in two stages. This design prioritizes the user experience through optimal cache performance of HTML and individual modules, and through a consistent experience between page views (i.e. no flip-flopping between pages based on when they were cached). It also achieves a great developer experience by ensuring we don’t mix incompatible versions of modules on the same page, and by ensuring rollout (and rollback) of deployments completes worldwide in under 10 minutes.

This design rests on the first stage (the startup manifest) staying small. We carried out a large-scale audit that shrank the manifest size back down, and put monitoring and guidelines in place. This work, tracked under T202154, took three approaches:

  1. Identify modules that are unused in practice. This included picking up unfinished or forgotten software deprecations, and removing code for obsolete browser compatibility.
  2. Consolidate modules that did not represent an application entrypoint or logical bundle. Extensions are encouraged to use directories and file splitting for internal organization. Some extensions were registering internal files and directories as public module bundles (like a linker or autoloader), thus growing the startup manifest for all page views (see the sketch after this list).
  3. Shrink the registry holistically through clever math and improved compression.
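For illustration, this is roughly what a consolidated module registration looks like in PHP (the module name and file names are made up). Each registered bundle costs an entry in the startup manifest, while files inside a bundle do not:

```php
<?php
// Illustrative only: one public bundle entry, with internal organization kept
// to files inside the bundle rather than separately registered modules.
$wgResourceModules['ext.example'] = [
	'localBasePath' => __DIR__ . '/modules',
	'remoteExtPath' => 'Example/modules',
	'scripts' => [ 'init.js', 'util.js', 'view/Dialog.js' ],
	'styles' => [ 'styles.css' ],
];
```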

We wrote new frontend development guides as reference material, enabling developers to understand how each stage of the page load process is impacted by different types of changes. We merged and redirected various older guides in favor of these.

Read about it at Wikipedia’s JavaScript initialisation on a budget

Autonomous Systems performance report

We published our first AS report, which explores the experience of Wikimedia visitors by their IP network (such as mobile carriers and Internet service providers, also known as Autonomous Systems).

This new monthly report is notable for how it accounts for differences in device type and device performance, because device ownership and content choice are not equally distributed among people and regions. We believe our method creates a fair assessment that focuses specifically on the connectivity of mobile carriers and internet service providers to Wikimedia datacenters.

The goal is to watch the evolution of these metrics over time, allowing us to identify improvements and potential pain points.

Read more at Introducing: Autonomous Systems performance report

Miscellaneous

Further reading

About this post

Featured image by Peng LIU, licensed under Creative Commons CC0 1.0.

Web Perf Hero: Máté Szabó

16 January 2024 at 05:30

MediaWiki is the platform that powers Wikipedia and other Wikimedia projects. These sites receive a lot of traffic, and we want to serve our audience with the best experience and performance possible, so the efficiency of the MediaWiki platform is of great importance to us and our readers.

MediaWiki is a relatively large application with 645,000 lines of PHP code in 4,600 PHP files, and growing! (Reported by cloc.) When you have as much traffic as Wikipedia, working on such a project can create interesting problems. 

MediaWiki uses an “autoloader” to find and import classes from PHP files into memory. In PHP, this happens on every single request, as each request gets its own process. In 2017, we introduced support for loading classes from PSR-4 namespace directories (in MediaWiki 1.31). This mechanism involves checking which directory contains a given class definition.

Problem statement

Kunal (@Legoktm) noticed that after MediaWiki 1.35, wikis became slower due to spending more time in fstat system calls. Syscalls make a program switch to kernel mode, which is expensive.

We learned that our Autoloader was the one doing the fstat calls, to check file existence. The logic powers the PSR-4 namespace feature, and actually existed before MediaWiki 1.35. But, it only became noticeable after we introduced the HookRunner system, which loaded over 500 new PHP interfaces via the PSR-4 mechanism.

MediaWiki’s Autoloader has a class map array that maps class names to their file paths on disk. PSR-4 classes do not need to be present in this map. Before introducing HookRunner, very few classes in MediaWiki were loaded by PSR-4. The new hook files leveraged PSR-4, exposing many calls to file_exists() for PSR-4 directory searching on every request. This adds up pretty quickly, thereby degrading MediaWiki performance.
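To make the cost concrete, here is a simplified sketch of a PSR-4 lookup (illustrative only, not MediaWiki’s actual Autoloader code; the prefix and path are made up). Every candidate directory costs a file_exists() check:

```php
<?php
// Illustrative only: a simplified PSR-4 lookup. A namespace prefix may map to
// several base directories that each might contain the class file.
$psr4Prefixes = [
	'MediaWiki\\Hook\\' => [ '/srv/mediawiki/includes/Hook/' ], // hypothetical path
];

function findClassFilePsr4( string $class, array $prefixes ): ?string {
	foreach ( $prefixes as $prefix => $dirs ) {
		if ( strncmp( $class, $prefix, strlen( $prefix ) ) === 0 ) {
			$relative = str_replace( '\\', '/', substr( $class, strlen( $prefix ) ) ) . '.php';
			foreach ( $dirs as $dir ) {
				// Each candidate costs a file_exists() check (an fstat syscall),
				// repeated on every request because each request is a new PHP process.
				if ( file_exists( $dir . $relative ) ) {
					return $dir . $relative;
				}
			}
		}
	}
	return null; // only now may the autoloader give up ("class not found")
}
```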

See task T274041 on Phabricator for the collaborative investigation between volunteers and staff.

Solution: Optimized class map

Máté Szabó (@TK-999) took a deep dive and profiled a local MediaWiki install with php-excimer and generated a flame graph. He found that about 16.6% of request time was spent in the Autoloader::find() method, which is responsible for finding which file contains a given class.

Figure 1: Flame graph by Máté Szabó.

Checking for file existence during PSR-4 autoloading seems necessary because one namespace can correspond to multiple directories that promise to define some of its classes. The search logic has to check each directory until it finds a class file. Only when the class is not found anywhere may the program crash with a fatal error.

Máté avoided the directory searching cost by expanding MediaWiki’s Autoloader class map to include all classes, including those registered via PSR-4 namespaces. This solution makes use of a hash map, in which each class maps to exactly one file path on disk: a 1-to-1 mapping.

This means the Autoloader::find() method no longer has to search through the PSR-4 directories. It now knows upfront where each class is, by merely accessing the array from memory. This removes the need for file existence checks. This approach is similar to the autoloader optimization flag in Composer.
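A simplified sketch of the optimized lookup (illustrative, not the actual patch): with a complete class map, resolution is a single in-memory array access.

```php
<?php
// Illustrative only: a generated class map covering all classes, including those
// previously resolved through PSR-4 directory searches.
$classMap = [
	'MediaWiki\\HookContainer\\HookRunner' => '/srv/mediawiki/includes/HookContainer/HookRunner.php', // hypothetical path
	// ... thousands more entries ...
];

spl_autoload_register( static function ( string $class ) use ( $classMap ) {
	if ( isset( $classMap[$class] ) ) {
		// One in-memory hash-map lookup; no file_exists() probing.
		require $classMap[$class];
	}
} );
```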


Impact

Máté’s optimization significantly reduced response time by optimizing the Autoloader::find() method. This is largely due to the elimination of file system calls.

After deploying the change to MediaWiki appservers in production, we saw a major shift in response times toward faster buckets: a ~20% increase in requests completed within 50ms, and a ~10% increase in requests served under 100ms (T274041#8379204).

Máté analyzed the baseline and classmap cases locally, benchmarking 4800 requests, controlled at exactly 40 requests per second. He found latencies reduced on average by ~12%:

Table 1: Difference in latencies between baseline and classmap autoloader.
Latencies            Baseline   Full classmap
p50 (mean average)   26.2ms     22.7ms (~13.3% faster)
p90                  29.2ms     25.7ms (~11.8% faster)
p95                  31.1ms     27.3ms (~12.3% faster)

We reproduced Máté’s findings locally as well. On the Git commit right before his patch, Autoloader::find() really stands out.

Figure 2: Profile before optimization.
Figure 3: Profile after optimization.

NOTE: We used ApacheBench to load the /wiki/Main_Page URL from a local MediaWiki installation with PHP 8.1 on an Apple M1. We ran it both in a bare metal environment (PHP built-in webserver, 8 workers, no APCu), and in MediaWiki-Docker. We configured our benchmark to run 1000 requests with 7 concurrent requests. The profiles were captured using Excimer with a 1ms interval. The flame graphs were generated with Speedscope, and the box plots were created with Gnuplot.

In Figures 4 and 5, the “After” box plot has a lower median than the “Before” box plot. This means there is a reduction in latency. Also, the standard deviation in the “After” scenario shrank, which indicates that responses were more consistently fast (not only on average). This increases the percentage of our users that have an experience very close to the average response time of web requests. Fewer users now experience an extreme case of web response slowness.

Figure 4: Boxplot for requests on bare metal.
Figure 5: Boxplot for requests on Docker.

Web Perf Hero award

The Web Perf Hero award is given to individuals who have gone above and beyond to improve the web performance of Wikimedia projects. The initiative is led by the Performance Team and started mid-2020. It is awarded quarterly and takes the form of a Phabricator badge.

Read about past recipients at Web Perf Hero award on Wikitech.


Further reading

Flame graphs arrive in WikimediaDebug

8 June 2023 at 19:00

The new “Excimer UI” option in WikimediaDebug generates flame graphs. What are flame graphs, and when do you need this?

A flame graph visualizes a tree of function calls across the codebase, and emphasizes the time spent in each function. In 2014, we introduced Arc Lamp to help detect and diagnose performance issues in production. Arc Lamp samples live traffic and publishes daily flame graphs. This same diagnostic power is now available on-demand for debug sessions!

Debugging until now

WikimediaDebug is a browser extension for Firefox and Chromium-based browsers. It helps stage deployments and diagnose problems in backend requests. It can pin your browser to a given data center and server, send verbose messages to Logstash, and… capture performance profiles!

Our main debug profiler has been XHGui. XHGui is an upstream project that we first deployed in 2016. It’s powered by php-tideways under the hood, which favors accuracy in memory and call counts. This comes at the high cost of producing wildly inaccurate time measurements. The Tideways data model also can’t represent a call tree, needed to visualize a timeline (learn more, upstream change). These limitations have led to misinterpretations and inconclusive investigations. Some developers work around this manually with time-consuming instrumentation from a production shell. Others might repeatedly try fixing a problem until a difference is noticeable.

Table that lists function names with their call count, memory usage, and estimated runtime.
Screenshot of XHGui.

Accessible performance profiling

Our goal is to lower the barrier to performance profiling, such that it is accessible to any interested party, and quick enough to do often. This includes reducing knowledge barriers (internals of something besides your code), and mental barriers (context switch).

You might wonder (in code review, in chat, or reading a mailing list) why one thing is slower than another, what the bottlenecks are in an operation, or whether some complexity is “worth” it?

With WikimediaDebug, you flip a switch, find out, and continue your thought! It is part of a culture in which we can make things faster by default, and allows for a long tail of small improvements that add up.

Example: In reviewing a change that proposed adding caching somewhere, I was curious: why is that function slow? I opened the feature and enabled WikimediaDebug. That brought me to an Excimer profile, in which you can search (ctrl-F) for the changed function (“doDomain”) and find exactly how much time is spent in that particular function. You can verify our results, or capture your own!

Tree diagram of function calls from top to bottom, each level sized by how long that function runs.
Flame graph in Excimer UI via Speedscope (by Jamie Wong, MIT License).

What: Production vs Debugging

We measure backend performance in two categories: production and debugging.

“Production” refers to live traffic from the world at large. We collect statistics from MediaWiki servers, like latency, CPU/memory, and errors. These stats are part of the observability strategy and measure service availability (“SLO”). To understand the relationship between availability and performance, let’s look at an example. Given a browser that timed out after 30 seconds, can you tell the difference between a response that will never arrive (it’s lost), and a response that could arrive if you keep waiting? From the outside, you can’t!

When setting expectations, you thus actually define both “what” and “when”. This makes performance and availability closely intertwined concepts. When a response is slower than expected, it counts toward the SLO error budget. We do deliver most “too slow” responses to their respective browser (better than a hard error!). But above a threshold, a safeguard stops the request mid-way, and responds with a timeout error instead. This protects us against misuse that would drain web server and database capacity for other clients.

These high-level service metrics can detect regressions after software deployments. To diagnose a server overload or other regression, developers analyze backend traffic to identify the affected route (pageview, editing, login, etc.). Then, developers can dig one level deeper to function-level profiling, to find which component is at fault. On popular routes (like pageviews), Arc Lamp can find the culprit. Arc Lamp publishes daily flame graphs with samples from MediaWiki production servers.

Production profiling is passive. It happens continuously in the background and represents the shared experience of the public. It answers: What routes are most popular? Where is server time generally spent, across all routes?

“Debug” profiling is active. It happens on-demand and focuses on an individual request—usually your own. You can analyze any route, even less popular ones, by reproducing the slow request. Or, after drafting a potential fix, you can use debugging tools to stage and verify your change before deploying it worldwide.

These “unpopular” routes are more common than you might think. Wikipedia is among the largest sites with ~8 million requests per minute. About half a million are pageviews. Yet, looking at our essential workflows, anything that isn’t a pageview has too few samples for real-time monitoring. Each minute we receive a few hundred edits. Other workflows are another order of magnitude below that. We can take all edits, reviews of edits (“patrolling”), discussion replies, account blocks, page protections, etc; and their combined rate would be within the error budget of one high-traffic service.

Excimer to the rescue

Tim Starling on our team realized that we could leverage Excimer as the engine for a debug profiler. Excimer is the production-grade PHP sampling profiler used by Arc Lamp today, and was specifically designed for flame graphs and timelines. Its data model represents the full callstack.

Remember that we use XHGui with Tideways, which favors accurate call counts by intercepting every function call in the PHP engine. That costly choice skews time. Excimer instead favors low overhead, through a sampling interval on a separate thread. This creates more representative time measurements.

Re-using Excimer felt obvious in retrospect, but when we first deployed the debug services in 2016, Excimer did not yet exist. As a proof of concept, we first created an Excimer recipe for local development.

How it works

After completing the proof of concept, we identified four requirements to make Excimer accessible on-demand:

  1. Capture the profiling information,
  2. Store the information,
  3. Visualize the profile in a way you can easily share or link to,
  4. Discover and control it from an interface.

We took the capturing logic as-is from the proof of concept, and bundled it in mediawiki-config. This builds on the WikimediaDebug component, with an added conditional for the “excimer” option.
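A minimal sketch of that capture step, assuming the Excimer PHP extension is loaded (the header check and storage helper are illustrative, not the actual mediawiki-config code):

```php
<?php
// Illustrative only: start a sampling profiler when the WikimediaDebug
// header carries the "excimer" option.
if ( isset( $_SERVER['HTTP_X_WIKIMEDIA_DEBUG'] )
	&& strpos( $_SERVER['HTTP_X_WIKIMEDIA_DEBUG'], 'excimer' ) !== false
) {
	$profiler = new ExcimerProfiler();
	$profiler->setEventType( EXCIMER_REAL ); // wall-clock time
	$profiler->setPeriod( 0.001 );           // sample every 1ms
	$profiler->start();

	register_shutdown_function( static function () use ( $profiler ) {
		$profiler->stop();
		// Speedscope-compatible JSON, one of Excimer's built-in output formats.
		$json = json_encode( $profiler->getLog()->getSpeedscopeData() );
		// storeProfile() is a hypothetical helper; in production the profile is
		// written to a MySQL key-value table and linked from the debug popup.
		storeProfile( $json );
	} );
}
```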

To visualize the data we selected Speedscope, an interactive profile data visualization tool that creates flame graphs. We did consider Brendan Gregg’s original flamegraph.pl script, which we use in Arc Lamp. flamegraph.pl specializes in aggregate data, using percentages and sample counts. This is great for Arc Lamp’s daily summaries, but when debugging a single request we actually know how much time has passed. It would be more intuitive to developers if we presented the time measurements, instead of losing that information. Speedscope can display time.

We store each captured profile in a MySQL key-value table, hosted in the Foundation’s misc database cluster. The cluster is maintained by SRE Data Persistence, and also hosts the databases of Gerrit, Phabricator, Etherpad, and XHGui.

Freely licensed software

We use Speedscope as the flame graph visualizer. Speedscope is an open source project by Jamie Wong. As part of this project we upstreamed two improvements, including a change to bundle a font rather than calling on a third-party CDN. This aligns with our commitment to privacy and independence.

The underlying profile data is captured by Excimer, a low-overhead sampling profiler for PHP. We developed Excimer in 2018 for Arc Lamp. To make the most of Speedscope’s feature set, we added support for time units and added the Speedscope JSON format as a built-in output type for Excimer.

We added Excimer to the php.net registry and submitted it to major Linux package managers (Debian, Ubuntu, Sury, and Remi’s RPM). Special thanks to Kunal Mehta as Debian Developer and fellow Wikimedian who packaged Excimer for Debian Linux. These packages make Excimer accessible to MediaWiki contributors and their local development environment (e.g. MediaWiki-Docker).

Our presence in the Debian repository carries special meaning: it signals trust, stability, and confidence in our software to the free software ecosystem. For example, we were pleased to learn that Sentry adopted Excimer to power their Sentry Profiling for PHP service!

Try it!

If you haven’t already, install WikimediaDebug in your Firefox or Chrome browser.

  1. Navigate to any article on Wikipedia.
  2. Set the widget to On, with the “Excimer UI” checked.
  3. Reload the page.
  4. Click the “Open profile” link in the WikimediaDebug popup.

Accessible debugging tools empower you to act on your intuitions and curiosities, as part of a culture where you feel encouraged to do so. What we want to avoid is filtering these intuitions down to big incidents only, where you can justify hours of work, or depend on specialists.


Further reading:

Around the world: How Wikipedia became a multi-datacenter deployment

Learn why we transitioned the MediaWiki platform to serve traffic from multiple data centers, and the challenges we faced along the way.

Wikimedia Foundation provides access to information for people around the globe. When you visit Wikipedia, your browser sends a web request to our servers and receives a response. Our servers are located in multiple geographically separate datacenters. This gives us the ability to quickly respond to you from the closest possible location.

Six Wikimedia Foundation data centers around the world.

Two application data centers located in the United States, in Ashburn and Carrollton.

Four caching data centers, in Amsterdam, San Francisco, Singapore, and Marseille.
Data centers, Wikitech.

You can find out which data center is handling your requests by using the Network tab in your browser’s developer tools (e.g. right-click -> Inspect element -> Network). Refresh the page and click the top row in the table. In the “x-cache” response header, the first digit corresponds to a data center in the above map.

HTTP headers from en.wikipedia.org, shown in browser DevTools.
The "x-cache" header is set to CP4043.
The "server" header says MW2393.

In the example above, we can tell from the 4 in “cp4043” that San Francisco was chosen as my nearest caching data center. The cache did not contain a suitable response, so the 2 in “mw2393” indicates that Dallas was chosen as the application data center. The application data centers are where we run the MediaWiki platform on hundreds of bare metal Apache servers. The backend response from there is then proxied via San Francisco back to me.

Why multiple data centers?

Our in-house Content Delivery Network (CDN) is deployed in multiple geographic locations. This lowers response time by reducing the distance that data must travel through (inter)national cables and other networking infrastructure from your ISP and Internet backbones. Each caching data center that makes up our CDN contains cache servers that remember previous responses to speed up delivery. Requests that have no matching cache entry yet must be forwarded to a backend server in the application data center.

If these backend servers are also deployed in multiple geographies, we lower the latency for requests that are missing from the cache, or that are uncachable. Operating multiple application data centers also reduces organizational risk from catastrophic damage or connectivity loss to a single data center. To achieve this redundancy, each application data center must contain all hardware, databases, and services required to handle the full worldwide volume of our backend traffic.

Multi-region evolution of our CDN

Wikimedia started running its first datacenter in 2004, in St Petersburg, Florida. This contained all our web servers, databases, and cache servers. We designed MediaWiki, the web application that powers Wikipedia, to support cache proxies that can handle our scale of Internet traffic. This involves including Cache-Control headers, sending HTTP PURGE requests when pages are edited, and imposing intentional limitations to ensure content renders the same for different people. We originally deployed Squid as the cache proxy software, and later replaced it with Varnish and Apache Traffic Server.

In 2005, with only minimal code changes, we deployed cache proxies in Amsterdam, Seoul, and Paris. More recently, we’ve added caching clusters in San Francisco, Singapore, and Marseille. Each significantly reduces latency for visitors in its region.

Adding cache servers increased the overhead of cache invalidation, as the backend would send an explicit PURGE request to each cache server. After ten years of growth both in Wikipedia’s edit rate and the number of servers, we adopted a more scalable solution in 2013 in the form of a one-to-many broadcast. This eventually reaches all caching servers, through a single asynchronous message (based on UDP multicast). This was later replaced with a Kafka-based system in 2020.

Screenshot of the "Bicycle" article on Wikipedia. The menu includes Create account and Log in links, indicating you are not logged-in. The toolbar includes a View source link.
When articles are temporarily restricted, “View source” replaces the familiar “Edit” link for most readers.

The traffic we receive from logged-in users is only a fraction of that of logged-out users, while also being difficult to cache. We forward such requests uncached to the backend application servers. When you browse Wikipedia on your device, the page can vary based on your name, interface preferences, and account permissions. Notice the elements highlighted in the example above. This kind of variation gets in the way of whole-page HTTP caching by URL.

Our highest-traffic endpoints are designed to be cacheable even for logged-in users. This includes our CSS/JavaScript delivery system (ResourceLoader), and our image thumbnails. The performance of these endpoints is essential to the critical path of page views.

Multi-region for application servers

Wikimedia Foundation began operating a secondary data center in 2014, as a contingency to facilitate a quick and full recovery within minutes in the event of a disaster. We exercise full switchovers annually, and we use it throughout the year to ease maintenance through partial switchovers of individual backend services.

Actively serving traffic from both data centers would bring several advantages over a cold-standby system:

  • Requests are forwarded to closer servers, which reduces latency. 
  • Traffic load is spread across more hardware, instead of half sitting idle. 
  • No need to “warm up” caches in a standby data center prior to switching traffic from one data center to another.
  • With multiple data centers in active use, there is institutional incentive to make sure each one can correctly serve live traffic. This avoids creation of services that are configured once, but not reproducible elsewhere.

We drafted several ideas into a proposal in 2015, to support multiple application data centers. Many components of the MediaWiki platform assumed operation from one backend data center: for example, that a primary database is always reachable for querying, or that deleting a key from “the” Memcached cluster suffices to invalidate a cache. We needed to adopt new paradigms and patterns, deploy new infrastructure, and update existing components to accommodate these. Our seven-year journey ended in 2022, when we finally enabled concurrent use of multiple data centers!

The biggest changes that made this transition possible are outlined below.

HTTP verb traffic routing

MediaWiki was designed from the ground up to make liberal use of relational databases (e.g. MySQL). During most HTTP requests, the backend application makes several dozen round trips to its databases. This is acceptable when those databases are physically close to the web servers (<0.2ms ping time). But, this would accumulate significant delays if they are in different regions (e.g. 35ms ping time).

MediaWiki is also designed to strictly separate primary (writable) from replica (read-only) databases. This is essential at our scale. We have a CDN and hundreds of web servers behind it. As traffic grows, we can add more web servers and replica database servers as-needed. But, this requires that page views don’t put load on the primary database server — of which there can be only one! Therefore we optimize page views to rely only on queries to replica databases. This generally respects the “method” section of RFC 9110, which states that requests that modify information (such as edits) use HTTP POST requests, whereas read actions (like page views) only involve HTTP GET (or HTTP HEAD) requests.
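In MediaWiki code, this read/write split is explicit. A minimal sketch of the pattern (the queries themselves are illustrative):

```php
<?php
// Read path (page views, HTTP GET): use a replica, available in every data center.
$dbr = wfGetDB( DB_REPLICA );
$row = $dbr->selectRow(
	'page',
	[ 'page_id', 'page_touched' ],
	[ 'page_namespace' => 0, 'page_title' => 'Bicycle' ],
	__METHOD__
);

// Write path (edits, HTTP POST): use the primary, which lives only in the primary DC.
$dbw = wfGetDB( DB_PRIMARY );
$dbw->update(
	'page',
	[ 'page_touched' => $dbw->timestamp() ],
	[ 'page_id' => $row->page_id ],
	__METHOD__
);
```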

The above pattern gave rise to the key idea that there could be a “primary” application datacenter for “write” requests, and “secondary” data centers for “read” requests. The primary databases reside in the primary datacenter, while we have MySQL replicas in both data centers. When the CDN has to forward a request to an application server, it chooses the primary datacenter for “write” requests (HTTP POST) and the closest datacenter for “read” requests (e.g. HTTP GET).

We cleaned up and migrated components of MediaWiki to fit this pattern. For pragmatic reasons, we did make a short list of exceptions. We allow certain GET requests to always route to the primary data center. The exceptions require HTTP GET for technical reasons, and change data at the same low frequency as POST requests. The final routing logic is implemented in Lua on our Apache Traffic Server proxies.

Media storage

Our first file storage and thumbnailing infrastructure relied on NFS. NetApp hardware provided mirroring to standby data centers.

By 2012, this required increasingly expensive hardware and proved difficult to maintain. We migrated media storage to Swift, a distributed file store.

As MediaWiki assumed direct file access, Aaron Schulz and Tim Starling introduced the FileBackend interface to abstract this. Each application data center has its own Swift cluster. MediaWiki attempts writes to both clusters, and the “swiftrepl” background service manages consistency. When our CDN finds thumbnails absent from its cache, it forwards requests to the nearest Swift cluster.

Job queue

MediaWiki has featured a job queue system since 2009, for performing background tasks. We took our Redis-based job queue service, and migrated to Kafka in 2017. With Kafka, we support bidirectional and asynchronous replication. This allows MediaWiki to quickly and safely queue jobs locally within the secondary data center. Jobs are then relayed to and executed in the primary data center, near the primary databases.

The bidirectional queue helps support legacy features that discover data updates during a pageview or other HTTP GET request. Changing each of these features was not feasible in a reasonable time span. Instead, we designed the system to ensure queueing operations are equally fast and local to each data center.
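A minimal sketch of what queueing looks like from MediaWiki code (the job class is hypothetical): the push is cheap and local, while execution happens later near the primary databases.

```php
<?php
use MediaWiki\MediaWikiServices;

// ExampleRecountJob is a hypothetical Job subclass, for illustration only.
$job = new ExampleRecountJob( [ 'pageId' => 12345 ] );

// lazyPush() buffers the job and enqueues it at the end of the request, into the
// local (possibly secondary) data center's queue. Kafka replication relays it to
// the primary data center, where job runners execute it near the primary databases.
MediaWikiServices::getInstance()->getJobQueueGroup()->lazyPush( $job );
```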

In-memory object cache

MediaWiki uses Memcached as an LRU key-value store to cache frequently accessed objects. Though not as efficient as whole-page HTTP caching, this very granular cache is suitable for dynamic content.

Some MediaWiki extensions assumed that Memcached had strong consistency guarantees, or that a cache could be invalidated by setting new values at relevant keys when the underlying data changes. Although these assumptions were never valid, they worked well enough in a single data center.

We introduced WANObjectCache as a simple yet robust interface in MediaWiki. It takes care of dealing with multiple independent data centers. The system is backed by mcrouter, a Memcached proxy written by Facebook. WANObjectCache provides two basic functions: getWithSet and delete. It uses cache-aside in the local data center, and broadcasts invalidation to all data centers. We’ve migrated virtually all Memcached interactions in MediaWiki to WANObjectCache.
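A minimal sketch of those two operations, using the current method names (the key and query are illustrative):

```php
<?php
use MediaWiki\MediaWikiServices;

$cache = MediaWikiServices::getInstance()->getMainWANObjectCache();
$key = $cache->makeKey( 'example-widget-count' ); // illustrative key

// "getWithSet" is exposed as getWithSetCallback(): compute the value on a miss
// (from a local replica) and cache it in the local data center.
$count = $cache->getWithSetCallback( $key, $cache::TTL_HOUR, static function () {
	$dbr = wfGetDB( DB_REPLICA );
	return (int)$dbr->selectField( 'widgets', 'COUNT(*)', [], __METHOD__ ); // illustrative query
} );

// "delete": when the underlying data changes, broadcast an invalidation to all data centers.
$cache->delete( $key );
```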

Parser cache

Most of a Wikipedia page is the HTML rendering of the editable content. This HTML is the result of parsing wikitext markup and expanding template macros. MediaWiki stores this in the ParserCache to improve scalability and performance. Originally, Wikipedia used its main Memcached cluster for this. In 2011, we added MySQL as the lower tier key-value store. This improved resiliency from power outages and simplified Memcached maintenance. ParserCache databases use circular replication between data centers.

Ephemeral object stash

The MainStash interface provides MediaWiki extensions on the platform with a general key-value store. Unlike Memcached, this is a persistent store (disk-backed, to survive restarts) and replicates its values between data centers. Until now, in our single data center setup, we used Redis as our MainStash backend.

In 2022 we moved this data to MySQL, and replicate it between data centers using circular replication. Our access layer (SqlBagOStuff) adheres to a Last-Write-Wins consistency model.

Login sessions were similarly migrated away from Redis, to a new session store based on Cassandra. It has native support for multi-region clustering and tunable consistency models.

Reaping the rewards

Most multi-DC work took the form of incremental improvements and infrastructure cleanup, spread over several years. While we did find latency reductions from some of the individual changes, we mainly looked out for improvements in availability and reliability.

The final switch to “turn on” concurrent traffic to both application data centers was the HTTP verb routing. We deployed it in two stages. The first stage applied the routing logic to 2% of web traffic, to reduce risk. After monitoring and functional testing, we moved to the second stage: route 100% of traffic.

We reduced latency of “read” requests by ~15ms for users west of our data center in Carrollton (Texas, USA). For example, logged-in users within East Asia. Previously, we forwarded their CDN cache-misses to our primary data center in Ashburn (Virginia, USA). Now, we could respond from our closer, secondary, datacenter in Carrollton. This improvement is visible in the 75th percentile TTFB (Time to First Byte) graph below. The time is in seconds. Note the dip after 03:36 UTC, when we deployed the HTTP verb routing logic.

Line graph dated 6 September 2022, plotting Singapore upstream latency. Previously around 510ms, and drops down to 490ms after 3 AM.

Further reading

About this post

Featured image credit: Wikimedia servers by Victor Grigas, licensed CC BY-SA 3.0.

Perf Matters at Wikipedia in 2016

8 December 2022 at 16:43

Thumbor shadow-serving production traffic

Wikimedia Commons is our open media repository. Like Wikipedia and its other sister projects, Commons runs on the MediaWiki platform. Commons is home to millions of photos, documents, videos, and other multimedia files.

MediaWiki has a built-in imagescaler that, until now, we used in production as well. To improve security isolation, we started an effort in 2015 to develop support in MediaWiki for external media handling services. We chose Thumbor, an open-source thumbnail generation service, for Wikimedia’s thumbnailing needs.

During 2016 and 2017 we worked on Thumbor until it was feature complete and able to support the same open media formats and low memory footprint as our MediaWiki setup. This included contributions to upstream Thumbor, and development of the wikimedia-thumbor plugin. We also fully packaged all dependencies for Debian Linux. Read more in The Journey to Thumbor (3-part series), or check the Wikitech docs.

– Gilles Dubuc.


Exclude background tabs

During routine post-deployment checks we found the p99 First Paint metric regressed from 4s to 20s. That’s quite a jump. The median and p75 during the same time period remained constant at their sub-second values.

A graph of the distribution of firstPaint
Distribution of First Paint, which prompted our investigation.

After an investigation we learned that page load time and visual rendering metrics are often skewed in visually hidden browser tabs (such as tabs that are open in the background). The deployment had refactored code such that background tabs could deprioritize more of the rendering work. Rather than revert this, we decided to change how MediaWiki’s Navigation Timing client collects these metrics. We now only sample pageviews in browser tabs that are “visible” from their birth until the page finishes loading.

To understand why background tabs had such an impact on our global metrics, we also ran a simple JS counter for a few days. We found that over a three-day period, 8.4% of page views in capable browsers were visually hidden for at least part of their load time. (Measured using the Page Visibility API, which itself was available on 98% of the sampled pageviews.)

Browser support and distribution of page visibility on Wikipedia.

– Peter Hedenskog and Timo Tijhof.


Performance Inspector goes Beta

We had an idea to improve page load time performance on Wikipedia by providing performance metrics to editors through an in-article modal link (T117411). By using the Performance Inspector, tech-savvy Wikipedians could use this extra data to inform edits that make the article load faster. At least, that was the idea.

It turns out that in reality it’s hard for users to distinguish between costs due to the article content and costs of our own software features. It was hard for editors to actually do something that made a noticeable difference in page load time. We discontinued the Performance Inspector in favor of providing more developer-oriented tools.

— Peter Hedenskog.

The discontinued Perf Inspector offered a modal interface to list each bundle with its size in kilobytes.
The “mw.inspect” console utility for calculating bundle sizes. 

Hello, HTTP/2!

Deploying HTTP/2 support to the Wikimedia CDN significantly changed how browsers negotiate and transfer data during the page load process. We anticipated a speed-up as part of the transition, and also identified specific opportunities to leverage HTTP/2 in our architecture for even faster page loads.

We also found unexpected regressions in page load performance during the HTTP/2 transition. In Chrome, pageviews using HTTP/2 initially had a slower Time to First Paint experience when compared to the previous HTTP/1 stack.  We wrote about this in HTTP/2 performance revisited.

– Timo Tijhof and Peter Hedenskog.


Stylesheet-aware dependency tracking

2016 saw a new state-tracking mechanism for stylesheets in ResourceLoader (Wikipedia’s JS/CSS delivery system). The HTML we send from MediaWiki to the browser references a bundle of stylesheets. The server now also transmits a small metadata blob alongside that HTML, which provides the JS client with information about those stylesheets. On the client side, we utilize this new metadata to act as if those stylesheets were already imported by the client.

Why now

MediaWiki is built with semantic HTML and standardized CSS classes in both PHP-rendered and client-rendered elements alike. The server is responsible for loading the current skin stylesheets. We generally do not declare an explicit dependency from a JS feature to a specific skin stylesheet. This is by design, and allows us to separate concerns and give each skin control over how to style these elements.

The adoption of OOUI (our in-house UI framework that renders natively in both PHP and JavaScript) got to a point where an increasing number of features needed to load OOUI both as a stylesheet for server-rendered elements, and potentially also for (unrelated) JS functionality such as modal interactions elsewhere on the page. These JS-based interactions can happen on any page, including on pages that don’t embed OOUI elements server-side. Thus the OOUI module must include stylesheets in its bundle. This would have caused the stylesheet to sometimes download twice. We worked around this issue for OOUI, through a boolean signal from the server to the JS client (in the HTML head). The signal indicates whether OOUI styles were already referenced (change 267794).

Outcome

We turned our workaround into a small general-purpose mechanism built-in to ResourceLoader. It works transparently to developers, and is automatically applied to all stylesheets.

This enabled wider adoption of OOUI, and also applied the optimization to other reusable stylesheets in the wider MediaWiki ecosystem (such as for Gadgets). It also facilitates easy creation of multiple distinct OOUI bundles without developers having to manually track each with a boolean signal.

This tiny capability took only a few lines of code to implement, but brought huge bandwidth savings; both through relative improvements as well as through what we prevented from being incurred in the future.

Despite being small in code, we did plan for a multi-month migration (T92459). Over the years, some teams had begun to rely on a subtle bug in the old behavior. It was previously permitted to load a JavaScript bundle through a static stylesheet link. This wasn’t an intended feature of ResourceLoader, and would load only the stylesheet portion of the bundle. Their components would then load the same JS bundle a second time from the client-side, disregarding the fact that it downloaded CSS twice. We found that the reason some teams did this was to avoid a FOUC (first load the CSS for the server-rendered elements, then load the module in its entirety for client-side enhancements). In most cases, we mitigated this by splitting the module in question in two: a reusable stylesheet and a pure JS payload.

–  Timo Tijhof.


One step closer to Multi-DC

Prior to 2015, numerous MediaWiki extensions treated Memcached (erroneously) as a linearizable “black box” that could be written to in a naive way. This approach, while somewhat intuitive, was based on dated and unrealistic assumptions:

  • That cache servers are always reachable for updates.
  • That transactions for database writes never fail, time out, or get rolled back later in the same request.
  • That database servers do not experience replication lag.
  • That there are no concurrent web requests also writing to the same database or cache in between our database reads.
  • That application and cache servers reside in a single data center region, with cache reads always reflecting prior writes.

The Flow extension, for example, made these assumptions and experienced anomalies even within our primary data center. The addition of multiple data centers would amplify these anomalies, reminding us to face the reality that these assumptions were not true.

Flow became among the first to adopt WANCache, a new developer-friendly interface we built for Memcached, specifically to offer high resiliency when operating at Wikipedia scale.

Replication lag was especially important. In MySQL/MariaDB, database reads can enjoy an “isolation level” that offers session consistency with repeatable reads. MediaWiki implements this by wrapping queries from a web request in one transaction. This means web requests will interact with one consistent and internally stable point-in-time state of the database. For example, this ensures foreign keys reliably resolve to related rows, even when queried later in the same request. However, it also means these queries perceive more replication lag.

WANCache is built using the “cache aside” and “purge” strategies. This means callers let go of the fine-grained control of (problematically) directly writing cache values. In exchange, they enjoy the simplicity of only declaring a cache key and a closure that computes the value. Optionally, they can send a “purge” notification to invalidate a cache key during a (soon-to-be-committed) database write.

Instead of proactively writing new values to both the database and the cache, WANCache lets subsequent HTTP requests fill the cache on-demand from a local DB replica. During the database write, we merely purge relevant cache keys. This avoids having to wait for, and incur load on, the primary DB during the critical path of wiki edits and other user actions. WANCache’s tombstone system prevents lagged data from getting (back) into a long-lived cache.
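A minimal sketch of this cache-aside and purge pattern, using today’s API names (the Flow table, fields, and key are only for illustration):

```php
<?php
use MediaWiki\MediaWikiServices;
use Wikimedia\Rdbms\Database;

$workflowId = 'abc123'; // illustrative
$cache = MediaWikiServices::getInstance()->getMainWANObjectCache();
$key = $cache->makeKey( 'flow-workflow', $workflowId ); // illustrative key

// Cache-aside read: on a miss, recompute from a local replica.
$workflow = $cache->getWithSetCallback( $key, $cache::TTL_DAY,
	static function ( $oldValue, &$ttl, array &$setOpts ) use ( $workflowId ) {
		$dbr = wfGetDB( DB_REPLICA );
		// Tell WANCache about replication lag, so lagged values get a shortened TTL.
		$setOpts += Database::getCacheSetOptions( $dbr );
		return $dbr->selectRow( 'flow_workflow', '*',
			[ 'workflow_id' => $workflowId ], __METHOD__ );
	}
);

// During a write: update the primary database, then purge the key. The tombstone
// keeps lagged replica data from re-entering the cache; later requests repopulate it.
$dbw = wfGetDB( DB_PRIMARY );
$dbw->update( 'flow_workflow',
	[ 'workflow_last_update_timestamp' => $dbw->timestamp() ],
	[ 'workflow_id' => $workflowId ], __METHOD__ );
$cache->delete( $key );
```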

Read more about the Flow case study or Multi-DC MediaWiki.

– Aaron Schulz.


Improve database resilience

We made numerous improvements to database performance across the platform. This is often in collaboration with SRE and/or with the engineering teams that build atop our platform. We regularly review incident reports, flame graphs, and other metrics; and look for ways to address infra problems at the source, in higher-level components and MediaWiki service classes.

For example, one incident involved a partial outage due to database unavailability, caused by significant network saturation on the Wikimedia Commons database replicas. The saturation occurred because the PdfHandler service fetched metadata from the database during every thumbnail transformation and every access to the PDF page count. This was mitigated by removing the need for metadata loads from the thumbnail handler, and by refactoring the page count to utilize WANCache.

Another time we used our flame graphs to learn one of the top three queries came from WikiModule::preloadTitleInfo. This DB query uses batching to improve latency, and would traditionally be difficult to cache due to variable keys that each relate to part of a large dataset. We applied WANCache to WikiModule and used the “checkKeys” feature to facilitate easy cache invalidation of a large category of cache keys, through a single operation; without need for any propagation or tracking.
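A minimal sketch of the “checkKeys” idea (the key names and helper are illustrative): many per-module cache entries can be invalidated at once by touching one shared check key.

```php
<?php
use MediaWiki\MediaWikiServices;

$cache = MediaWikiServices::getInstance()->getMainWANObjectCache();
$moduleName = 'ext.example'; // illustrative

// One shared "check key" covers a whole category of per-module cache entries.
$checkKey = $cache->makeKey( 'wikimodule-info' );

$info = $cache->getWithSetCallback(
	$cache->makeKey( 'wikimodule-info', $moduleName ),
	$cache::TTL_HOUR,
	static function () use ( $moduleName ) {
		return computeModuleTitleInfo( $moduleName ); // hypothetical helper
	},
	[ 'checkKeys' => [ $checkKey ] ]
);

// On a relevant change, bump the check key once: every dependent entry is then
// treated as stale and lazily regenerated, with no per-key tracking or propagation.
$cache->touchCheckKey( $checkKey );
```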

Read more about our flame graphs in Profiling PHP in production at scale.

– Aaron Schulz.


Further reading

About this post

Featured image credit: Long exposure of highway by PxHere, licensed under Creative Commons CC0 1.0.

Web Perf Hero: Valentín Gutierrez

21 November 2022 at 18:00

By Timo Tijhof

I’m happy to share that the second Web Perf Hero award of 2022 goes to Valentín Gutierrez!

This award is in recognition of Valentín’s work on the Wikimedia CDN over the past three months. In particular, Valentín dove deep into Apache Traffic Server. We use ATS as the second layer in our HTTP Caching strategy for MediaWiki. (The first layer is powered by Varnish.)

Cache miss

Valentín (@Vgutierrez) observed that ATS was treating many web requests as cache misses, despite holding a seemingly matching entry in the cache. To understand why, we have to talk about the Vary header.

If a page is served the same way to everyone, it can be cached under its URL and served as such to anyone navigating to that same URL. This is nearly true for us from a statistical viewpoint, except that we have editors with logged-in sessions, whose pageviews must bypass the CDN and its static HTML caches. In HTTP terminology, we say that MediaWiki server responses “vary” by cookies. Two clients with different cookies may get a different response. Two clients with the same cookies, or with no cookies, can enjoy the same cached response. But, log-in sessions aren’t the only cookies in town! For example, our privacy-conscious device counting metric also utilizes a cookie (“WMF-Last-Access”). It is a very low entropy cookie, but a cookie nonetheless. We also optionally use cookies for fundraising localisation, and various other JavaScript features. As such, a majority of connecting browsers will have at least one cookie.

The HTTP specification says that when a response for a URL varies by the value of a header (in our case, the Cookie header controls whether you’re logged in), then cache proxies like ATS and Varnish must not re-use a cache entry, unless the original and current browser have the exact same cookies. For the cache to be effective, though, we must pay attention to the session cookie only, and ignore cookies related to metrics and JavaScript. For our Varnish cache, we do exactly that (through custom VCL code), but we never did this for ATS.

And so work began to implement Lua code for ATS to identify session cookies, and treat all other cookies as if they don’t exist — but only within the context of finding a match in the cache, restoring them right after.
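The production code is Lua running inside ATS, but the idea can be sketched in a few lines of PHP (the session-cookie pattern here is illustrative): keep only the cookies that affect the response when looking up a cache entry, and restore the original header afterwards.

```php
<?php
// Illustrative PHP sketch only; the production logic is Lua inside ATS, and the
// real list of session cookie names is maintained there.
function normalizeCookiesForCacheLookup( string $cookieHeader ): string {
	$kept = [];
	foreach ( explode( ';', $cookieHeader ) as $cookie ) {
		$name = trim( explode( '=', $cookie, 2 )[0] );
		// Keep only cookies that change the response (session/token cookies);
		// drop metrics cookies such as WMF-Last-Access.
		if ( preg_match( '/(session|token)$/i', $name ) ) {
			$kept[] = trim( $cookie );
		}
	}
	return implode( '; ', $kept );
}
```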

Plot of p75 ATS backend latency in September 2022. The week before it swung around 475 ms, then it dropped down to 350ms.

In our Singapore data center, our ATS latency improved by 25% at the p75, e.g. from 475ms down to 350ms compared to the same time and day a week earlier. That’s a 125ms drop, which is one of the biggest reductions we’ve ever documented!

The reduction is due to more requests being served directly from the cache, instead of generating new pageviews for each combination of unrelated cookies. We can also measure this as a ratio between cache hits and cache misses — the cache hit ratio. For the Amsterdam data center, ATS cache hits went from ~600/s to 1200/s. As a percentage of all backend traffic, that’s from 2% to 4%. (The CDN frontend enjoys a cache hit ratio of 90-99% depending on entrypoint.)

Plot of ATS backend requests, from 600 per second in August to 1000 per second on September 1st with new peaks up to 1200 per second.

Disk reads

In September, Valentín created a Grafana dashboard to explore metrics from internal operations within ATS. This is part of on-going work to establish a high-level SLO for ATS. ATS reads from disk as part of serving a cache hit. Valentín discovered that disk reads were regularly taking up to a whole second. 

Plot of ATS cache_read_time p999 before the change, regularly spiking from 5 ms up to 1000 ms every few minutes.

Most traffic passing through ATS is a cache miss, where we respond within 300ms at the p75 (latency shown earlier). For the subset where we serve a cache hit at the ATS layer, we generally respond within ~5ms, magnitudes faster. When we observed a cache hit taking 1000ms to respond, that is not only very slow, it is also notably slower than generating a fresh page from a MediaWiki server.

After ruling out timeout-related causes, Valentín traced the issue to the ATS cache_dir_sync operation. This operation synchronizes metadata about cache entries to disk, and runs once every few minutes. It takes about one minute, during which we consistently saw 0.1% of requests experience the delay. Cache reads had to wait for a safety lock held by a single sync for the entire server. Valentín worked around the issue by partitioning the cache into multiple volumes, with the sync (and its lock) applying only to a portion of the data. These are held for a shorter period of time, and less likely to overlap with a cache read in the first place. (our investigation, upstream issue)

Plot of ATS cache_read_time p999, the sporadic spikes to 1000 ms make way for a flat line at 1 ms.

On most ATS servers, the cache read p999 dropped from spiking at 1000ms down to a steady 1ms. That’s a 1000X reduction!

Note that this issue was not observable through the 75th percentile measure, because each minute affected a different 0.1% of requests, despite happening consistently throughout the day. This is why we don’t recommend p75 for backend objectives. Left unaddressed, much more than 0.1% of clients would eventually experience the issue. Resolving it avoids a constant drain on the SLO error budget, preserving our budget for more unusual and unforeseen issues down the line.

Web Perf Hero award

The Web Perf Hero award is given to individuals who have gone above and beyond to improve the web performance of Wikimedia projects. The initiative is led by the Performance Team and started mid-2020. It is awarded quarterly and takes the form of a Phabricator badge.

Read about past recipients at Web Perf Hero award on Wikitech.

HTTP/2 performance revisited

4 November 2022 at 14:00

By Timo Tijhof

Hello, HTTP/2!

In 2016, the Wikimedia Foundation deployed HTTP/2 (or “H2”) support to our CDN. At the time, we used Nginx- for TLS termination and two layers of Varnish for caching. We anticipated a possible speed-up as part of the transition, and also identified opportunities to leverage H2 in our architecture.

The HTTP/2 protocol was standardized through the IETF, with Google Chrome shipping support for the experimental SPDY protocol ahead of the standard. Brandon Black (SRE Traffic) led the deployment and had to make a choice between SPDY and H2. We launched with SPDY in 2015, as H2 support was still lacking in many browsers, and Nginx did not support having both. By May 2016, browser support had picked up and we switched to H2.

Goodbye domain sharding?

You can benefit more from HTTP/2 through domain consolidation. The following improvements were achieved by effectively undoing domain sharding:

  • Faster delivery of static CSS/JS assets. We changed ResourceLoader to no longer use a dedicated cookieless domain (“bits.wikimedia.org”), and folded our asset entrypoint back into the MediaWiki platform for faster requests local to a given wiki domain name (T107430).
  • Speed up mobile page loads, specifically mobile-device “m-dot” redirects. We consolidated the canonical and mobile domains behind the scenes, through DNS. This allows the browser to reuse and carry the same HTTP/2 connection over a cross-domain redirect (T124482).
  • Faster Geo service and faster localized fundraising banner rendering. The Geo service was moved from geoiplookup.wikimedia.org to /geoiplookup on each wiki. The service was later removed entirely, in favor of an even faster zero-roundtrip solution (0-RTT): an edge-injected cookie within the Wikimedia CDN (T100902, patch). This transfers the information directly alongside the pageview without the delay of a JavaScript payload requesting it after the fact.

Could HTTP/2 be slower than HTTP/1?

During the SPDY experiment, Peter Hedenskog noticed early on that SPDY and HTTP/2 have a very real risk of being slower than HTTP/1. We observed this through our synthetic testing infrastructure.

In HTTP/1, all resources are considered equal. When your browser navigates to an article, it creates a dedicated connection and starts downloading HTML from the server. The browser streams, parses, and renders in real-time as each chunk arrives. The browser creates additional connections to fetch stylesheets and images when it encounters references to them. For a typical article, MediaWiki’s stylesheets are notably smaller than the body content. This means, despite naturally being discovered from within (and thus after the start of) the HTML download, the CSS download generally finishes first, while chunks from the HTML continue to trickle in. This is good, because it means we can achieve the First Paint and Visually Complete milestones (above-the-fold) on page views before the HTML has fully downloaded in the background.

Page load over HTTP/1.

In HTTP/2, the browser assigns a bandwidth priority to each resource, and resources share a single connection. This is different from HTTP/1, where each resource has its own connection, and lower-level networks and routers divide bandwidth roughly equally between what appear to be unrelated connections. During the time where HTML and CSS downloads overlap, HTTP/1 connections each enjoyed about half the available bandwidth. This was enough for the CSS to slip through without any apparent delay. With HTTP/2, we observed that Chrome was not getting any CSS response until after the HTML was mostly done.

Page load over SPDY.

This HTTP/2 feature can solve a similar issue in reverse. If a webpage suffers from large amounts of JavaScript code and below-the-fold images being downloaded during the page load, under HTTP/1 those low-priority resources would compete for bandwidth and starve the critical HTML and CSS downloads. The HTTP/2 priority system allows the browser and server to agree, and give more bandwidth to the important resources first. A bug in Chrome caused CSS to effectively have a lower priority relative to HTML (chromium #586938).

A graph of SPDY usage vs time to first paint
First paint regression correlated with SPDY rollout. (Ori Livneh, T96848#2199791)

We confirmed the hypothesis by disabling SPDY support on the Wikimedia CDN for a week (T125979). After Chrome resolved the bug, we transitioned from SPDY to HTTP/2 (T166129, T193221). This transition saw improvements both to how web browsers give signals to the server, and the way Nginx handled those signals.

As it stands today, page load time is overall faster on HTTP/2, and the CSS once again often finishes before the HTML. Thus, we achieve the same great early First Paint and Visually Complete milestones that we were used to from HTTP/1. But, we do still see edge cases where HTTP/2 is sometimes not able to re-negotiate priorities quick enough, causing CSS to needlessly be held back by HTML chunks that have already filled up the network pipes for that connection (chromium #849106, still unresolved as of this writing).

Lessons learned

These difficulties in controlling bandwidth prioritization taught us that domain consolidation isn’t a cure-all. We decided to keep operating our thumbnail service at upload.wikimedia.org through a dedicated IP and thus a dedicated connection, for now (T116132).

Browsers may reuse connections for multiple domains if an existing HTTPS connection carries a TLS certificate that includes the other domain in its SNI information, even when this connection is for a domain that corresponds to a different IP address in DNS. Under certain conditions, this can lead to a surprising HTTP 404 error (T207340, mozilla #1363451, mozilla #1222136). Emanuele Rocca from SRE Traffic Team mitigated this by implementing HTTP 421 response codes in compliance with the spec. This way, visitors affected by non-compliant browsers and middleware will automatically recover and reconnect accordingly.

Further reading
