When AI Companies Train on Your Work and Become Your Competitor
- Nikita Silaech
- Dec 17, 2025
- 3 min read

The New York Times sued Perplexity AI on December 5 for copyright infringement, alleging that the startup scraped millions of Times articles without permission and used them to power its AI products.
The lawsuit joined more than 40 other active copyright disputes between publishers and AI companies, but this one stands out because it lays bare the fundamental contradiction embedded in how AI models are built: Perplexity trained its system on the Times' work, and Perplexity now competes directly with the Times as an information source (NYT, 2025).
This contradiction has been present from the beginning but invisible until recently. ChatGPT and other large language models were trained on vast amounts of internet content scraped without permission from creators. The companies behind these models argued that using copyrighted material to train AI models constitutes fair use, analogous to how humans learn by reading copyrighted books without paying each author. The training process doesn't distribute the copyrighted material; it just learns from it. The argument sounded reasonable until the models started producing content that directly competed with the sources they learned from.
Perplexity's particular offense, according to the Times, goes beyond using Times content in training. The company actively reproduces verbatim or near-verbatim portions of Times articles in its search results and sometimes attributes fabricated information to the Times while displaying the Times' trademark. A user searching through Perplexity gets content that looks like Times content, stripped of the Times' copyright protections and presented as if Perplexity created it. The Times argues this is not fair use; it is theft that happens to use an AI as the mechanism.
The courts have not settled whether training on copyrighted material constitutes infringement at all. The fair use doctrine allows limited use of copyrighted material for purposes like criticism, commentary, and education. Defendants argue that AI training falls into this category because the model learns patterns rather than reproducing the original works. Plaintiffs argue that when AI systems output text that closely resembles or reproduces the original copyrighted work, that is infringement regardless of what happened during training.
This is the Times' second such lawsuit. Its case against OpenAI, filed in 2023, remains unresolved more than two years later. That case has become a proxy for the entire question of whether AI companies have the right to train on copyrighted material without permission or compensation. Getty Images sued Stability AI on similar grounds. Authors have sued multiple AI companies. Multiple publishers have sued Perplexity.
However, the companies being sued have not clearly lost. OpenAI sought extensive discovery from the Times, which suggests the company believes it has a defensible legal position, and the courts have not issued sweeping rulings against AI companies on copyright grounds. The result is a strange interim state in which thousands of companies stand accused of copyright infringement that no court has ruled illegal (USC IP, 2025).
Some jurisdictions are trying to impose solutions before the courts decide the question. India's Department for Promotion of Industry and Internal Trade has proposed a mandatory licensing scheme under which AI companies that train on copyrighted material would pay a flat percentage of global revenue to creators (Indian Express, 2025). The scheme sounds fair until you consider that participation is mandatory for copyright holders too: creators cannot opt out of having their work used for AI training. They simply get paid whether they want to participate or not.