I feel like this is related to these issues (with somebody attempting this appro...

ylere · 2026-03-12T19:40:18 1773344418

It also shows why this approach is questionable. Opus 4.6, without tool use or web access, can provide chardet's source code in full from memory/training data (ironically, including the licensing header): https://gist.github.com/yannleretaille/1ce99e1872e5f3b7b133e...

torginus · 2026-03-12T21:18:26 1773350306

This comes with the uncomfortable implication that it's impossible to tell to what extent LLMs are actually pulling together snippets of GPL'd code, and to what extent is that legally acceptable.

pera · 2026-03-12T22:31:19 1773354679

There have been a lot of examples like that since GitHub Copilot was first announced in 2021; search for "verbatim" (copying) in this submission:

https://news.ycombinator.com/item?id=27676266

Here is a more recent example I found in Cursor's browser experiment from January:

https://news.ycombinator.com/item?id=46661236

SlinkyOnStairs · 2026-03-12T21:46:41 1773352001

> and to what extent is that legally acceptable.

De jure, not at all.

Parallel creation is a very minimal defense to copyright infringement claims. It is practically impossible to prove in humans, much to the annoyance of musicians. "Go prove in a court that you have never heard this song, not even in the background somewhere".

LLMs, having been trained on all the software they could get their hands on, will fail this test. There is no parallel creation claim to be had. AI firms love to trot out the "they learn just like humans" line, which is both false and irrelevant; it's copyright infringement when humans do it too. If you view a GPL'd repo and later reproduce the code unintentionally? Still copyright infringement.

De facto, though, things are different. The technical details behind LLMs are irrelevant. AI companies lie and frustrate discovery, whilst begging politicians to pass laws legalizing their copyright infringement.

There won't be a copyright reckoning, not anymore. All the dumb politicians think AI is going to bail out their economies.

codethief · 2026-03-12T20:37:50 1773347870

Wow, I did not expect such perfect reproduction. Link to the actual source code (before being rewritten):

https://github.com/chardet/chardet/blob/5.0.0/chardet/mbchar...

ylere · 2026-03-13T01:01:20 1773363680

Indeed, and that's through the API. If you use Claude Chat/Code, even with web search turned off, it still has access to some of its tools (for doing calculations, running small code snippets, etc.), and that environment contains chardet's code four times:

  /home/claude/.cache/uv/archive-v0/nZCy52fMCgTsNaLySn0xf/chardet
  /home/claude/.cache/uv/wheels-v6/pypi/chardet
  /usr/lib/python3/dist-packages/pip/_vendor/chardet
  /usr/local/lib/python3.12/dist-packages/chardet

It's not surprising that they were able to create a new, working version of chardet this quickly. It seems the author just told Claude Code to "do a clean room implementation" and to make sure the code looks different from the original chardet (named several times in the prompt) without considering the training set and the tendency for LLMs to "cheat".

lupire · 2026-03-12T17:45:26 1773337526

That's worth its own submission and discussion.

alberto-m · 2026-03-12T18:08:25 1773338905

It was submitted last week, happy reading:

https://news.ycombinator.com/item?id=47259177

alexwebb2 · 2026-03-12T22:26:33 1773354393

Wow. The guy who’s been thanklessly maintaining the project for 10+ years, with very little help, went way out of his way to produce a zero-reuse, ground-up reimplementation so that it could be MIT licensed... and the very-online copyleft crowd is crucifying him for it and telling him to kick rocks.

Unbelievable. This is why we can’t have nice things.

aeyes · 2026-03-13T13:15:26 1773407726

Mark Pilgrim isn't even the original author, he just ported the C version to Python and contributed nothing to it for the last 10 years.

If you take 5 minutes to look at the code you'll see that v7 works in a completely different way: it mostly uses machine learning models instead of heuristics. Even if you compare the UTF-8 or UTF-16 detection code you'll see that they have absolutely nothing in common.

It's just API-compatible, and the API is basically 3 functions.

If he had published this under a different name nobody would have challenged it.

marxisttemp · 2026-03-13T02:09:20 1773367760

Nothing to help out a thankless maintainer like allowing companies to use his work wholesale while contributing nothing back! Enjoy your nice things