AI’s Dirty Little Secret

I believe artificial intelligence’s implementation is going to be huuuuuuuuuuuge, but it has a dirty little secret: it crawls the web with impunity and appropriates trillions of nuggets of information, some of which are protected by copyright, and a great percentage of which the owners do not want to surrender without compensation.

The creepy crawlers do not ask permission; they just crawl while you are sleeping.

OpenAI ChatGPT crawler closing in on blogger content in Amarillo, Texas this morning.

Why can’t AI just steal that stuff, Big Red Car?

A good many creative providers of content have “opted out” from allowing AI crawlers — such as OpenAI, Microsoft, Google, Common Crawl — to crawl their sites. This is true of more than 600 self-proclaimed news publishers.
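For the curious, the usual way a site "opts out" is a robots.txt file that names the crawlers it wants to turn away. Here is a minimal sketch in Python of how that check works, using the standard library's robots.txt parser. The sample file, the example.com URL, and the helper name crawler_permissions are made up for illustration; GPTBot and CCBot are the user-agent names published by OpenAI and Common Crawl. Keep in mind that robots.txt is a request, not a lock: it only stops crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration: turn away two AI crawlers,
# let everyone else in.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def crawler_permissions(robots_txt, user_agents, url):
    """Return {user_agent: allowed?} for one URL under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in user_agents}


# example.com is a placeholder address, not a real blog.
permissions = crawler_permissions(
    SAMPLE_ROBOTS_TXT,
    ["GPTBot", "CCBot", "Googlebot"],
    "https://example.com/blog/post",
)

for agent, allowed in permissions.items():
    print(f"{agent}: {'allowed' if allowed else 'opted out'}")
```

With this sample file, the two AI crawlers are told to stay out while an ordinary search crawler is still welcome, which is roughly the arrangement many publishers are after.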

The New York Times is amongst this group. They claim that OpenAI’s ChatGPT has been trained by using gargantuan amounts of their current and archived content without their permission and without compensating them.

The NYT is currently engaged in litigation with both OpenAI and Microsoft. This will be fun to watch as the NYT has a ton of content and the crawlers have already gotten most of it.

This single action will be of substantial importance when the matter gets where it is ultimately headed: the courthouse.

Why can’t they steal it, Big Red Car?

Well, because much of it is, as I said, copyrighted. The AI companies are instead taking two different approaches:

 1. In some instances, they are entering into agreements with content providers (Associated Press, Axel Springer, Business Insider) to compensate them for the use of their content. This is the licensing approach.

 2. In some other instances, they are dealing with lawsuits from folks who object to the free use of their content.

Amongst this group is the author John Grisham, who objects to people using his original content without paying him for that right. Hard to find fault with that approach, no?

This matter will be decided in a courtroom or on the steps of the courtroom. This is the copyright infringement litigation approach.

Whoa, wait just a damn second, Big Red Car — fair use?

Ahhh, dear reader, yes, the defense offered by the AI companies is what is called “fair use” under Section 107 of the US Copyright Act of 1976, which they argue allows the use of copyrighted material to “train” the AI model and to “transform” that information into something that is unrecognizable as the original copyrighted material.

The problem with that, as it relates to the NYT instance, is that the NYT claims the crawlers broke through its paywall and that, when ChatGPT was tasked to write an article, it came out almost exactly like the NYT copyrighted material.

OK, Big Red Car, how about the European Union AI Act?

Yes, dear reader, those European sharpies are ahead of the US in dealing with AI (perhaps) and enacted the EU AI Act, which forces AI developers to list their source materials. Where they have used copyrighted material to train their products, they will have to have had legal access, which means they will have to have struck a deal with, and compensated, the likes of the NYT and others.

Bottom line it, Big Red Car — we’re leaving at noon

OK, dear reader, there is a lot of contentious activity going on behind the scenes between the AI crawlers and the content suppliers, and it is also going to be huuuuuuuuuge.

But, hey, what the Hell do I really know anyway? I’m just a Big Red Car.

This weekend call someone who is lonely and chat them up. You will feel good about yourself and it will justify indulging yourself. Trust me on this.