Internet Archive Wayback Machine Link Fixer

By: Nick Heer
28 November 2025 at 05:05

The Internet Archive released a WordPress plugin not too long ago:

Internet Archive Wayback Machine Link Fixer is a WordPress plugin designed to combat link rot—the gradual decay of web links as pages are moved, changed, or taken down. It automatically scans your post content — on save and across existing posts — to detect outbound links. For each one, it checks the Internet Archive’s Wayback Machine for an archived version and creates a snapshot if one isn’t available.
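The description above does not say how the plugin performs that check. As a rough sketch only, this is how a lookup against the Wayback Machine's public availability endpoint could look in Python. The function names are made up for illustration, and the plugin itself is written in PHP and JavaScript and may use a different API entirely.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url: str) -> str:
    """Build the availability query for one outbound link."""
    return f"{AVAILABILITY_ENDPOINT}?{urlencode({'url': url})}"


def closest_snapshot(payload: dict) -> Optional[str]:
    """Return the closest snapshot URL from a decoded response, if any."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(url: str, timeout: float = 10.0) -> Optional[str]:
    """Hypothetical helper: ask the Wayback Machine about a single URL."""
    with urlopen(availability_query(url), timeout=timeout) as response:
        return closest_snapshot(json.load(response))
```

Keeping the parsing separate from the network call means the logic can be exercised without depending on a third-party service being up.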

Via Michael Tsai:

The part where it replaces broken links with archive links is implemented in JavaScript. I like that it doesn’t modify the post content in your database. It seems safe to install the plug-in without worrying about it messing anything up. However, I had kind of hoped that it would fix the links as part of the PHP rendering process. Doing it in JavaScript means that the fixed links are not available in the actual HTML tags on the page. And the data that the JavaScript uses is stored in an invisible <div> under the attribute data-iawmlf-post-links, which makes the page fail validation.

I love the idea of this plugin, but I do not love this implementation. I think I understand why it works this way: for the nondestructive property mentioned by Tsai, and also to account for its dependence on a third-party service of varying reliability. I would love to see a demo of this plugin in action.
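For illustration, here is roughly what the server-side approach Tsai hoped for might look like: rewrite the links while rendering, and leave the stored post alone. This is a Python sketch rather than the plugin's PHP. The function name and the `archived` mapping are hypothetical, and the regular expression is a naive stand-in for a real HTML parser.

```python
import re
from html import escape, unescape
from typing import Mapping

# Naive pattern for double-quoted href attributes on anchor tags.
HREF_PATTERN = re.compile(r'(<a\b[^>]*?\bhref=")([^"]*)(")', re.IGNORECASE)


def rewrite_broken_links(html: str, archived: Mapping[str, str]) -> str:
    """Return a copy of `html` with known-broken links pointed at archives.

    `archived` maps a broken URL to its archived replacement (assumed to
    have been collected earlier). The input string is never modified.
    """

    def swap(match: "re.Match[str]") -> str:
        original = unescape(match.group(2))
        replacement = archived.get(original)
        if replacement is None:
            return match.group(0)  # link is not known to be broken
        return match.group(1) + escape(replacement, quote=True) + match.group(3)

    return HREF_PATTERN.sub(swap, html)
```

Because the rewrite happens at render time, the archive links end up in the actual HTML and no extra data attribute is needed. The cost is that every page render depends on the mapping being available.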

⌥ Permalink

Internet Archive ‘Glitch’ Affects User Data

By: Nick Heer
10 October 2024 at 05:06

Speaking of the Internet Archive, Matt Sephton, in August, posted about the surprise loss of his account there:

Recently at Internet Archive a “glitch” (their choice of word) deleted a great many accounts, including my account that had been at archive.org/details/@gingerbeardman since 2015.

I had meant to post this nearer to when it happened but, as others also found, my requests for comment went unanswered, even when sent directly to an organization representative instead of a generic media inbox. Parts of Sephton’s account were thankfully restored, but only after his post was submitted to Hacker News.

I find the Internet Archive’s utility unparalleled. I find some of its recent behaviour frustrating.

⌥ Permalink

The Internet Archive Is Under DDoS Attack

By: Nick Heer
9 October 2024 at 23:28

Jason Scott:

Someone is DDOSing the internet archive, so we’ve been down for hours. According to their twitter, they’re doing it just to do it. Just because they can. No statement, no idea, no demands.

An X account claiming responsibility says it is a politically motivated attack. If that is true, it is an awfully stupid rationale and a poor choice of target.

Wes Davis, the Verge:

Here’s what the popup said:

“Have you ever felt like the Internet Archive runs on sticks and is constantly on the verge of suffering a catastrophic security breach? It just happened. See 31 million of you on HIBP!”

HIBP refers to Have I Been Pwned?, a website where people can look up whether or not their information has been published in data leaked from cyber attacks. It’s unclear what is happening with the site, but attacks on services like TweetDeck have exploited XSS or cross-site scripting vulnerabilities with similar effects.

I have no idea if this group actually obtained any Internet Archive user data. The site has only a placeholder page directing visitors to its X account for status updates, but I see nothing there or on Brewster Kahle’s personal one.

Update: Three minutes after publishing this post, I received an alert from Have I Been Pwned that my Internet Archive account was one of over 31 million total which had been exposed. Troy Hunt, who runs HIBP, and Lawrence Abrams of Bleeping Computer both tried contacting the Internet Archive with no response.

⌥ Permalink

Cool URLs Mean Something

By: Nick Heer
1 August 2024 at 03:55

Tim Berners-Lee in 1998:

Keeping URIs so that they will still be around in 2, 20 or 200 or even 2000 years is clearly not as simple as it sounds. However, all over the Web, webmasters are making decisions which will make it really difficult for themselves in the future. Often, this is because they are using tools whose task is seen as to present the best site in the moment, and no one has evaluated what will happen to the links when things change. The message here is, however, that many, many things can change and your URIs can and should stay the same. They only can if you think about how you design them.

Jay Hoffmann:

Links give greater meaning to our webpages. Without the link, we would lose this significant grammatical tool native the web. And as links die out and rot on the vine, what’s at stake is our ability to communicate in the proper language of hypertext.

A dead link may not seem like it means very much, even in the aggregate. But they are. One-way links, the way they exist on the web where anyone can link to anything, is what makes the web universal. In fact, the first name for URL’s was URI’s, or Universal Resource Identifier. It’s right there in the name. And as Berners-Lee once pointed out, “its universality is essential.”

In 2018, Google announced it was deprecating its URL shortener, with no new links being created after March 2019. All existing shortened links would, however, remain active. It announced this in a developer blog post which, no joke, returns a 404 error at its original URL, which I found via 9to5Google. Google could not be bothered to redirect posts from just six years ago to their new valid URLs.

Google’s URL shortener was in the news again this month because the company has confirmed it will turn off these links in August 2025 except for those created via Google’s own apps. Google Maps, for example, still creates a goo.gl short link when sharing a location.

In principle, I support this deprecation because it is confusing and dangerous for Google’s own shortened URLs to have the same domain as ones created by third-party users. But this is a Google-created problem because it designed its URLs poorly. It should never have been possible for anyone else to create links with the same URL shortener used by Google itself. Yet, while it feels appropriate for a Google service to be unreliable over the long term, it also should not be ending access to links which may have been created only about five years ago.

By the way, the Sophos link on the word “dangerous” in that last paragraph? I found it via a ZDNet article where the inline link is, you guessed it, broken. Sophos also could not be bothered to redirect this URL from 2018 to its current address. Six years ago! Link rot is a scourge.
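Keeping an old address alive after a move does not take much. As a minimal sketch, assuming a hypothetical site whose paths and redirect table are invented here, a permanent redirect lookup could be as small as this:

```python
from typing import Mapping, Optional, Set, Tuple

# Hypothetical data: pages that exist today, and where moved pages went.
LIVE_PATHS = {"/blog/old-announcement/"}
MOVED = {"/2018/03/old-announcement.html": "/blog/old-announcement/"}


def respond(
    path: str,
    live: Set[str] = LIVE_PATHS,
    moved: Mapping[str, str] = MOVED,
) -> Tuple[int, Optional[str]]:
    """Return (status code, redirect target) for a requested path."""
    if path in live:
        return 200, None
    target = moved.get(path)
    if target is not None:
        # 301 tells browsers and crawlers the move is permanent.
        return 301, target
    return 404, None
```

The table only grows when something moves, so the maintenance cost is one line per relocated page.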

⌥ Permalink
