Normal view

There are new articles available, click to refresh the page.
Before yesterdayPixel Envy

Web Development Tip: Disable Pointer Events on Link Images

By: Nick Heer
27 November 2025 at 04:45

Good tip from Jeff Johnson:

My business website has a number of “Download on the App Store” links for my App Store apps. Here’s an example of what that looks like:

[…]

The problem is that Live Text, “Select text in images to copy or take action,” is enabled by default on iOS devices (Settings → General → Language & Region), which can interfere with the contextual menu in Safari. Pressing down on the above link may select the text inside the image instead of selecting the link URL.

I love the Live Text feature, but it often conflicts with graphics like these. There is a good, simple, two-line CSS trick for web developers that should cover most situations. Also, if you rock a user stylesheet — and I think you should — it seems to work fine as a universal solution. Any issues I have found have been minor and not worth noting. I say give it a shot.

Update: Adding Johnson’s CSS to a user stylesheet mucks up the layout of Techmeme a little bit. You can exclude it by adding div:not(.ii) > before a:has(> img) { display: inline-block; }.

⌥ Permalink

Typepad Is Shutting Down Next Month

By: Nick Heer
28 August 2025 at 02:51

Typepad:

After September 30, 2025, access to Typepad – including account management, blogs, and all associated content – will no longer be available. Your account and all related services will be permanently deactivated.   

I have not thought about Typepad in years, and I am certain I am not alone. That is not a condemnation; Typepad occupies a particular time and place on the web. As with anything hosted, however, users are unfortunately dependent on someone else’s interest in maintaining it.

If you have anything hosted at Typepad, now is a good time to back it up.

⌥ Permalink

Interview With MacSurfer’s New Owner, Ken Turner

By: Nick Heer
16 August 2025 at 20:56

Nice scoop from Eric Schwarz:

Over the past week, I’ve been working to track down the new owner of MacSurfer’s Headline News, a beloved site that shut down in 2020 and has recently had somewhat mysterious revival. Fortunately, after some digging that didn’t really lead anywhere, I received an email from its new owner, Ken Turner, and he graciously took the time to answer a few questions about the new project.

Turner sounds like a great steward to carry on the MacSurfer legacy. Even in an era of well-known aggregators like Techmeme and massive forums like Hacker News and Reddit, I think there is still a role for a smaller and more focused media tracking site.

I am uncertain what the role of BackBeat Media is in all this. I have not heard from Dave Hamilton or anyone there to confirm if they even have a role.

⌥ Permalink

MacSurfer Returns

By: Nick Heer
14 August 2025 at 17:39

Five years ago, Apple and tech news aggregator MacSurfer announced it was shutting down. The site was still accessible albeit in a stopped-time state, and it seemed that is how it would sit until the server died.

In June, though, MacSurfer was relaunched. The design has been updated and it is no longer as technically simple as it once was, but — charmingly — the logo appears to be the exact same static GIF as always. I cannot find any official announcement of its return.

Eric Schwarz:

It looks like Macsurfer is coming back, but I can’t find any details or who’s behind it? I really hope it’s not AI slop or someone trying to make a buck off nostalgia like iLounge or TUAW.

I had the same question, so I started digging. MxToolbox reveals a txt record on the domain for validating with Google apps, registered to BackBeat Media. BackBeat’s other properties include the Mac Observer, AppleInsider, and PowerPage. A review of historical MacSurfer txt records using SecurityTrails indicates the site has been with Backbeat Media since at least 2011, even though BackBeat’s site has not listed MacSurfer even when it was actively updated.

I cannot confirm the ownership is the same yet but I have asked Dave Hamilton, of BackBeat, and will update this if I hear back.

⌥ Permalink

Trapping Misbehaving Bots in an A.I. Labyrinth

By: Nick Heer
22 March 2025 at 04:32

Reid Tatoris, Harsh Saxena, and Luis Miglietti, of Cloudflare:

Today, we’re excited to announce AI Labyrinth, a new mitigation approach that uses AI-generated content to slow down, confuse, and waste the resources of AI Crawlers and other bots that don’t respect “no crawl” directives. When you opt in, Cloudflare will automatically deploy an AI-generated set of linked pages when we detect inappropriate bot activity, without the need for customers to create any custom rules.

Two thoughts:

  1. This is amusing. Nothing funnier than using someone’s own words or, in this case, technology against them.

  2. This is surely going to lead to the same arms race as exists now between privacy protections and hostile adtech firms. Right?

⌥ Permalink

Cool URLs Mean Something

By: Nick Heer
1 August 2024 at 03:55

Tim Berners-Lee in 1998:

Keeping URIs so that they will still be around in 2, 20 or 200 or even 2000 years is clearly not as simple as it sounds. However, all over the Web, webmasters are making decisions which will make it really difficult for themselves in the future. Often, this is because they are using tools whose task is seen as to present the best site in the moment, and no one has evaluated what will happen to the links when things change. The message here is, however, that many, many things can change and your URIs can and should stay the same. They only can if you think about how you design them.

Jay Hoffmann:

Links give greater meaning to our webpages. Without the link, we would lose this significant grammatical tool native the web. And as links die out and rot on the vine, what’s at stake is our ability to communicate in the proper language of hypertext.

A dead link may not seem like it means very much, even in the aggregate. But they are. One-way links, the way they exist on the web where anyone can link to anything, is what makes the web universal. In fact, the first name for URL’s was URI’s, or Universal Resource Identifier. It’s right there in the name. And as Berners-Lee once pointed out, “its universality is essential.”

In 2018, Google announced it was deprecating its URL shortener, with no new links being created after March 2019. All existing shortened links would, however, remain active. It announced this in a developer blog post which — no joke — returns a 404 error at its original URL, which I found via 9to5Google. Google could not bother to redirect posts from just six years ago to their new valid URLs.

Google’s URL shortener was in the news again this month because the company has confirmed it will turn off these links in August 2025 except for those created via Google’s own apps. Google Maps, for example, still creates a goo.gl short link when sharing a location.

In principle, I support this deprecation because it is confusing and dangerous for Google’s own shortened URLs to have the same domain as ones created by third-party users. But this is a Google-created problem because it designed its URLs poorly. It should have never been possible for anyone else to create links with the same URL shortener used by Google itself. Yet, while it feels appropriate for a Google service to be unreliable over a long term, it also should not be ending access to links which may have been created just about five years ago.

By the way, the Sophos link on the word “dangerous” in that last paragraph? I found it via a ZDNet article where the inline link is — you guessed it — broken. Sophos also could not bother to redirect this URL from 2018 to its current address. Six years ago! Link rot is a scourge.

⌥ Permalink

Third-Party Cookies Have Got to Go

By: Nick Heer
30 July 2024 at 02:26

Anthony Chavez, of Google:

[…] Instead of deprecating third-party cookies, we would introduce a new experience in Chrome that lets people make an informed choice that applies across their web browsing, and they’d be able to adjust that choice at any time. We’re discussing this new path with regulators, and will engage with the industry as we roll this out.

Oh good — more choices.

Hadley Beeman, of the W3C’s Technical Architecture Group:

Third-party cookies are not good for the web. They enable tracking, which involves following your activity across multiple websites. They can be helpful for use cases like login and single sign-on, or putting shopping choices into a cart — but they can also be used to invisibly track your browsing activity across sites for surveillance or ad-targeting purposes. This hidden personal data collection hurts everyone’s privacy.

All of this data collection only makes sense to advertisers in the aggregate, but it only works because of specifics: specific users, specific webpages, and specific actions. Privacy Sandbox is imperfect but Google could have moved privacy forward by ending third-party cookies in the world’s most popular browser.

⌥ Permalink

⌥ Engineering Consent

By: Nick Heer
30 July 2024 at 02:02

Anthony Ha, of TechCrunch, interviewed Jean-Paul Schmetz, CEO of Ghostery, and I will draw your attention to this exchange:

AH I want to talk about both of those categories, Big Tech and regulation. You mentioned that with GDPR, there was a fork where there’s a little bit of a decrease in tracking, and then it went up again. Is that because companies realized they can just make people say yes and consent to tracking?

J-PS What happened is that in the U.S., it continued to grow, and in Europe, it went down massively. But then the companies started to get these consent layers done. And as they figured it out, the tracking went back up. Is there more tracking in the U.S. than there is in Europe? For sure.

AH So it had an impact, but it didn’t necessarily change the trajectory?

J-PS It had an impact, but it’s not sufficient. Because these consent layers are basically meant to trick you into saying yes. And then once you say yes, they never ask again, whereas if you say no, they keep asking. But luckily, if you say yes, and you have Ghostery installed, well, it doesn’t matter, because we block it anyway. And then Big Tech has a huge advantage because they always get consent, right? If you cannot search for something in Google unless you click on the blue button, you’re going to give them access to all of your data, and you will need to rely on people like us to be able to clean that up.

The TechCrunch headline summarizes this by saying “regulation won’t save us from ad trackers”, but I do not think that is a fair representation of this argument. What it sounds like, to me, is that regulations should be designed more effectively.

The E.U.’s ePrivacy Directive and GDPR have produced some results: tracking is somewhat less pervasive, people have a right to data access and portability, and businesses must give users a choice. That last thing is, as Schmetz points out, also its flaw, and one it shares with something like App Tracking Transparency on iOS. Apps affected by the latter are not permitted to keep asking if tracking is denied, but they do similarly rely on the assumption a user can meaningfully consent to a cascading system of trackers.

In fact, the similarities and differences between cookie banner laws and App Tracking Transparency are considerable. Both require some form of consent mechanism immediately upon accessing a website or an app, assuming a user can provide that choice. Neither can promise tracking will not occur should a user deny the request. Both are interruptive.

But cookie consent laws typically offer users more information; many European websites, for example, enumerate all their third-party trackers, while App Tracking Transparency gives users no visibility into which trackers will be allowed. The latter choice is remembered forever unless a user removes and reinstalls the app, while websites can ask you for cookie consent on each visit. Perhaps the latter may sometimes be a consequence of using Safari; it is hard to know.

App Tracking Transparency also has a system-wide switch to opt out of all third-party tracking. There used to be something similar in web browsers, but compliance was entirely optional. Its successor effort, Global Privacy Control, is sadly not as widely supported as it ought to be, but it appears to have legal teeth.

Both of these systems have another important thing in common: neither are sufficiently protective of users’ privacy because they burden individuals with the responsibility of assessing something they cannot reasonably comprehend. It is patently ridiculous to put the responsibility on individuals to mitigate a systemic problem like invasive tracking schemes.

There should be a next step to regulations like these because user tracking is not limited to browsers where Ghostery can help — if you know about it. A technological response is frustrating and it is unclear to me how effective it is on its own. This is clearly not a problem only regulation can solve but neither can browser extensions. We need both.

⌥ On Robots and Text

By: Nick Heer
20 June 2024 at 17:25

After Robb Knight found — and Wired confirmed — Perplexity summarizes websites which have followed its opt out instructions, I noticed a number of people making a similar claim: this is nothing but a big misunderstanding of the function of controls like robots.txt. A Hacker News comment thread contains several versions of these two arguments:

  • robots.txt is only supposed to affect automated crawling of a website, not explicit retrieval of an individual page.

  • It is fair to use a user agent string which does not disclose automated access because this request was not automated per se, as the user explicitly requested a particular page.

That is, publishers should expect the controls provided by Perplexity to apply only to its indexing bot, not a user-initiated page request. Wary of being the kind of person who replies to pseudonymous comments on Hacker News, this is an unnecessarily absolutist reading of how site owners expect the Robots Exclusion Protocol to work.

To be fair, that protocol was published in 1994, well before anyone had to worry about websites being used as fodder for large language model training. And, to be fairer still, it has never been formalized. A spec was only recently proposed in September 2022. It has so far been entirely voluntary, but the draft standard proposes a more rigid expectation that rules will be followed. Yet it does not differentiate between different types of crawlers — those for search, others for archival purposes, and ones which power the surveillance economy — and contains no mention of A.I. bots. Any non-human means of access is expected to comply.

The question seems to be whether what Perplexity is doing ought to be considered crawling. It is, after all, responding to a direct retrieval request from a user. This is subtly different from how a user might search Google for a URL, in which case they are asking whether that site is in the search engine’s existing index. Perplexity is ostensibly following real-time commands: go fetch this webpage and tell me about it.

But it clearly is also crawling in a more traditional sense. The New York Times and Wired both disallow PerplexityBot, yet I was able to ask it to summarize a set of recent stories from both publications. At the time of writing, the Wired summary is about seventeen hours outdated, and the Times summary is about two days old. Neither publication has changed its robots.txt directives recently; they were both blocking Perplexity last week, and they are blocking it today. Perplexity is not fetching these sites in real-time as a human or web browser would. It appears to be scraping sites which have explicitly said that is something they do not want.

Perplexity should be following those rules and it is shameful it is not. But what if you ask for a real-time summary of a particular page, as Knight did? Is that something which should be identifiable by a publisher as a request from Perplexity, or from the user?

The Robots Exclusion Protocol may be voluntary, but a more robust method is to block bots by detecting their user agent string. Instead of expecting visitors to abide by your “No Homers Club” sign, you are checking IDs. But these strings are unreliable and there are often good reasons for evading user agent sniffing.

Perplexity says its bot is identifiable by both its user agent and the IP addresses from which it operates. Remember: this whole controversy is that it sometimes discloses neither, making it impossible to differentiate Perplexity-originating traffic from a real human being — and there is a difference.

A webpage being rendered through a web browser is subject to the quirks and oddities of that particular environment — ad blockers, Reader mode, screen readers, user style sheets, and the like — but there is a standard. A webpage being rendered through Perplexity is actually being reinterpreted and modified. The original text of the page is transformed through automated means about which neither the reader or the publisher has any understanding.

This is true even if you ask it for a direct quote. I asked for a full paragraph of a recent article and it mashed together two separate sections. They are direct quotes, to be sure, but the article must have been interpreted to generate this excerpt.1

It is simply not the case that requesting a webpage through Perplexity is akin to accessing the page via a web browser. It is more like automated traffic — even if it is being guided by a real person.

The existing mechanisms for restricting the use of bots on our websites are imperfect and limited. Yet they are the only tools we have right now to opt out of participating in A.I. services if that is something one wishes to do, short of putting pages or an entire site behind a user name and password. It is completely reasonable for someone to assume their signal of objection to any robotic traffic ought to be respected by legitimate businesses. The absolute least Perplexity can do is respecting those objections by clearly and consistently identifying itself, and excluding websites which have indicated they do not want to be accessed by these means.


  1. I am not presently blocking Perplexity, and my argument is not related to its ability to access the article. I am only illustrating how it reinterprets text. ↥︎

Perplexity A.I. Is Lying About Its User Agent

By: Nick Heer
15 June 2024 at 15:49

Robb Knight blocked various web scrapers via robots.txt and through nginx. Yet Perplexity seemed to be able to access his site:

I got a perfect summary of the post including various details that they couldn’t have just guessed. Read the full response here. So what the fuck are they doing?

[…]

Before I got a chance to check my logs to see their user agent, Lewis had already done it. He got the following user agent string which certainly doesn’t include PerplexityBot like it should: […]

I am sure Perplexity will respond to this by claiming it was inadvertent, and it has fixed the problem, and it respects publishers’ choices to opt out of web scraping. What matters is how we have only a small amount of control over how our information is used on the web. It defaults to open and public — which is part of the web’s brilliance, until the audience is no longer human.

Unless we want to lock everything behind a login screen, the only mechanisms for control that we have are dependent on companies like Perplexity being honest about their bots. There is no chance this problem only affects the scraping of a handful of independent publishers; this is certainly widespread. Without penalty or legal reform, A.I. companies have little incentive not to do exactly the same as Perplexity.

⌥ Permalink

The Deskilling of Web Development

By: Nick Heer
29 May 2024 at 17:55

Baldur Bjarnason:

But instead we’re all-in on deskilling the industry. Not content with removing CSS and HTML almost entirely from the job market, we’re now shifting towards the model where devs are instead “AI” wranglers. The web dev of the future will be an underpaid generalist who pokes at chatbot output until it runs without error, pokes at a copilot until it generates tests that pass with some coverage, and ships code that nobody understand and can’t be fixed if something goes wrong.

There are parallels in the history of software development to the various abstractions accumulated in a modern web development stack. Heck, you can find people throughout history bemoaning how younger generations lack some fundamental knowledge since replaced by automation or new technologies. It is always worth a gut-check about whether newer ideas are actually better. In the case of web development, what are we gaining and losing by eventually outsourcing much of it to generative software?

I think Bjarnason is mostly right: if web development become accessible by most through layers of A.I. and third-party frameworks, it is abstracted to such a significant extent that it becomes meaningless gibberish. In fairness, the way plain HTML, CSS, and JavaScript work is — to many — meaningless gibberish. It really is better for many people that creating things for the web has become something which does not require a specialized skillset beyond entering a credit card number. But that is distinct from web development. When someone has code-level responsibility, they have an obligation to understand how things work.

⌥ Permalink

❌
❌