NYTimes suing OpenAI, Microsoft over copyright
Ruby Media Group CEO Kristen Ruby tells ‘Fox News Live’ that plagiarism ‘does not benefit humanity,’ saying machine learning ‘circumvents’ copyright laws.
The New York Times sued OpenAI and Microsoft, accusing the AI company of using millions of the newspaper’s articles without permission to train artificial intelligence technologies.
The New York Times Takes Legal Action
The New York Times recently filed a federal lawsuit against OpenAI and Microsoft, accusing them of copyright violations. The lawsuit alleges that these companies used the Times’ stories to train their chatbots, resulting in billions of dollars’ worth of work stolen from the Times’ journalists.
Kristen Ruby, CEO of Ruby Media Group and a commentator on Big Tech and AI, shared her insights on Fox News. She emphasized the importance of copyright protection laws and the impact of AI companies using machine learning to circumvent these laws. Kristen also highlighted the potential mass exodus of creative talent from the industry if creative professionals are unable to opt out of having their work used for AI training without compensation.
Notable AI Lawsuits
The number of lawsuits filed against OpenAI continues to grow, with major authors like John Grisham, Jodi Picoult, and George R.R. Martin joining the legal battle to protect their creative content. These lawsuits are crucial in setting legal precedent and standing up for the value of creative work.
The Future of Creative Content: Protecting Your Work in the Age of AI
Why does this matter and how does it impact you?
The NYT vs. OpenAI lawsuit could set legal precedent for how AI companies handle training data and copyright claims pertaining to news organizations. Even if the lawsuit goes in favor of The NY Times, what does this really mean for individual authors and writers? What does it mean for you, the small business owner who has invested hundreds of thousands of dollars into your own content marketing assets over the years? The truth is, not much.
Protecting media companies does not mean regular business owners will be protected from having their content used for training data. There are some things you can do to prevent your content being used for training, but that doesn’t change the underlying issue. If The NY Times wins this case, the gap between the protection afforded to large media outlets and the protection afforded to regular business owners will only grow.
What is the value of content and thought leadership if it can be stolen and used to train AI without your consent? What is the value of your investment in content marketing work? If you can’t protect your work, the value will go down to zero. As an agency owner, this is the conversation I am having every week with clients. How can we make sure what we create for you remains protected so that you get the most out of your investment? Up until this point, that meant putting everything out in the open. But if the paradigm of the Internet is shifting so that you lose the fundamental right to own what you create, more people will start gating their content.
Does this lead to information gain or information loss? I predict this will lead to a mass exodus of the type of free content you are used to consuming right now. If a medical practice owner is paying an agency to assist them with content marketing, why would they put that out for free knowing it can be used for training and they can’t stop it?
It was hard enough to convince people that inbound marketing was the new frontier of marketing. But now to convince people to be okay with putting out free content that can also be stolen and used for training, with nothing they can do about it – that is a bridge too far. It is not a recommendation I will make. To make the most out of your investment in content marketing, you need to be able to protect your digital assets. IP is one of the most valuable assets you have as a business owner. This content becomes integral to who you are, how you operate, and how you communicate to the world about the services you offer.
If all marketing will be plagiarized, then why would anyone pay for marketing at all? This may lead to a new era of marketing where even marketing becomes gated. People will not pay others to create any type of marketing asset if it can be ripped off. The only way to hold the value of the creation is to keep it secret. That certainly defeats the purpose and premise of marketing altogether, but it may lead to a new era of exclusive marketing where your message is exposed to fewer people.
More people will start charging for courses and access to exclusive content, not because they want to, but because they have to. It is not feasible to pay an agency to create assets that an AI company can illegally train on. The only way to protect your investment is to make sure the underlying work can’t be stolen. This is where marketing is headed. The age of the free and open Internet of ideas is coming to an end.
What is AI plagiarism?
The term plagiarism comes from the Latin word “plagiarius,” meaning “kidnapper,” and it has been a concern since ancient times. Plagiarism prevents readers from tracing ideas back to their original source.
Without access to the underlying training data, it is currently not possible to confirm whether an AI model was trained on a particular piece of writing. That being said, any writer can easily recognize their work when they ask AI a question and the answer it generates is verbatim from their writing. Perplexity AI cites the sources it pulls data points from, but that is different from the question of what a model was trained on.
AI is the ultimate plagiarist.
AI technology has led to new forms of cheating, such as using special text and other techniques to deceive plagiarism detection solutions. Article spinning, scraping, illegal training – all of this is a form of IP theft.
We must stop making excuses for theft and hold those responsible accountable for the crimes they are committing. If your entire business model is built on scraped (stolen) data and you are charging users a fee to use the tool while paying zero royalties to the people whose data created your program, you are stealing from the original source. This means your entire business model is built on theft. OpenAI must take responsibility for this and either start paying those they stole from or start over and only use training data that people have agreed to provide.
“Is it okay to plagiarize from ChatGPT today?”
If you are using content from ChatGPT, you are plagiarizing the work of someone else. The problem is that you don’t know who that person is, because OpenAI hid who they stole from. Therefore, no, it is not okay to plagiarize from ChatGPT, because you are plagiarizing a plagiarist. This will create a copy of a copy of a copy.
Is it okay for ChatGPT to plagiarize the work of other people?
The answer is no. It is not okay for anyone to plagiarize from anyone, regardless of the technology used. Plagiarizing “from” ChatGPT means plagiarizing from a plagiarist. The premise is flawed. The answer to “Is it okay to plagiarize” does not depend on the outcome of The NY Times vs. OpenAI case. A functioning society that respects copyright law would not need the outcome of this case to reiterate what we already know. Theft is wrong. Charging users a monthly fee to use a tool that is built on theft at scale is in fact – very wrong.
AI is not being “weaponized” to detect plagiarism. It has always been used this way and there is nothing wrong with that use case.
Plagiarism detection software has been around for a long time. Academic work is not being targeted by AI. Rather, everyone else’s work is. Colleges have used plagiarism detection software for a long time in the college admissions process.
Using AI to detect plagiarism is not targeting someone. It’s a form of intellectual property protection. If someone is plagiarizing – they should be outed – regardless of the type of emerging technology used for detection.
What is plagiarism?
- Failure to paraphrase properly is plagiarism.
- Failure to properly credit the work of others is plagiarism.
- Duplicate content is a form of plagiarism.
- Article spinning is plagiarism.
“The more impactful the work and the more important it is, the more likely it will be at risk of being reviewed for plagiarism.”
This is 100 percent false. No one is reviewing anyone’s work for plagiarism. OpenAI is straight up stealing people’s work and plagiarizing it.
You are conflating two different issues. Theft is theft – regardless of the impact of the work.
Re the above logic – only famous writers will have their work plagiarized by LLMs. This is simply not true. Everyone’s work is being stolen. No one has given consent. No one has the ability to opt out. The only way to rectify this situation is to start over.
“We are going to have to come up with new standards for plagiarism.”
No we don’t. We need to uphold the current standards that exist and stop giving AI companies a pass for plagiarism at scale.
The issue isn’t our standards. Rather – it is the complete lack of standards of those who think nothing of using emerging technology to circumvent the law.
“But what do we do about papers written before today, which will inevitably fail an AI plagiarism test?”
What do we do about articles written after today?
What will we do when no one wants to write anymore because they know their work will be plagiarized?
The issue is not that papers will fail a plagiarism test. Rather – it is that we should be testing models to make sure they aren’t plagiarizing the work of humans.
You are protecting the model – not the people who had their work stolen.
Instead of looking at the issue retroactively for your personal use case, you need to look at it proactively for the larger use case. This issue is much greater than academic writers being called out for plagiarism. LLMs are plagiarizing everyone’s work at scale.
The model is the plagiarist. The entire system is built on plagiarism. This ultimately leads to erosion of trust in a transparent Internet. The outcome of this will be less creation, not more.
Why should I pay a writer for their work? LLM theft essentially drives the value of said work down to zero. IP theft promotes moral decay. People quickly adopt this attitude. Others take it for free. Who are the “others?” OpenAI.
Publisher concerns grow around AI copyright infringement, compensation, and royalty claims.
The legal battle between the New York Times and AI companies is a significant development in the ongoing conversation about protecting creative content in the age of AI. The outcome of these lawsuits will shape the future of labor and copyright protection for creative professionals. As we navigate the intersection of technology and creativity, it is crucial to advocate for fair compensation and protection of intellectual property. Stay tuned for updates on this evolving issue and its impact on the marketing industry.
In 2024, we expect to see a heated ongoing debate on publisher rights and the laws surrounding generative AI. We have been covering this issue for the past year. For continued AI reporting, follow Ruby Media Group Founder Kris Ruby on X @sparklingruby.
Is it possible to opt out of AI training?
Should my company opt out of AI?
There is a difference between ethical AI and unethical AI. Using AI tools for organizational efficiency is a great use of AI. Using AI tools trained on other people’s work is not a good use of AI and can open your company up to all kinds of legal risks and vulnerabilities. Every new AI content marketing tool you add to your tech stack comes with an attached risk. It is important to weigh this risk with each new tool you onboard your organization to.
Your company should use AI that you can control.
If your company wants to leverage AI, consider using AI that you can directly control. This includes control of training data, servers, and models. The greater control you have, the more you reduce compliance risk for your organization.
Why should companies be concerned about LLMs scraping user data?
Companies should worry about LLMs scraping user data, which can result in legal compliance and HIPAA issues. Additionally, if automation is used in decision making, that is another legal landmine. You cannot integrate this technology and hope for the best. You must have a plan to avoid the downstream consequences of the technology.
Training any AI system on user-generated content without permission is a serious issue not only for end users, but also for your company. For example, let’s say you are a local dentist and want to add an SMS AI chat functionality to your marketing program. Not so fast. Is it HIPAA compliant? Most AI is not, unless you jump through hoops to get a BAA from OpenAI Enterprise. Even then, there are still risks with how the data is processed and stored.
Why should companies be concerned about LLMs scraping their data?
Companies should be concerned about large language models scraping their data because it essentially drives the value of their investment in content to zero. It also means another company is stealing profitability from your company by monetizing your thought leadership. You may have hired a marketing agency or PR firm to assist you in crafting your content. Do not let someone else steal your investment.
Over the years, that amounts to hundreds of thousands of dollars in content marketing asset creation. OpenAI has essentially robbed business owners of their investment by training an AI model on what they wrote. They never got permission to train a model on the data, including proprietary data with copyright protection.
This creates a lethal situation where business owners will stop investing in content if they know the value of the investment goes down to zero, and agency owners will be less likely to offer content creation services if they know their content will be stolen with zero attribution. The cycle of automated AI plagiarism is detrimental to culture and society.
It mirrors what we are seeing in cities that are in rapid decline. People steal because they know they can. If there are no consequences for stealing, the theft will continue to increase. This behavior is currently being mirrored online by some of the largest tech companies in the world. They have set a terrible example and are eroding the democratic principles our society was built on: ethics, morality, integrity, and intellectual property rights.
Big tech companies are essentially saying, what you created isn’t yours. It’s now ours. Not only are they stealing from you in broad daylight, they are also making money on the assets they stole from you right in front of your eyes for all to see. This display of arrogance is the greatest ponzi scheme of all time. If any other criminal stole diamonds from a jewelry store and then bragged about their robbery online, they would be in prison. Yet, we are living in a culture where the complete opposite is happening.
Our government is now dependent on the robbers to help them understand the system of crime, leading to zero arrests. This will ultimately lead to a surge in crime. We cannot rely on criminals to reduce crime. We must rely on those who are not in bed with criminals to guide elected officials out of this mess.
Machine learning is being used as a criminal weapon to circumvent the Constitution. In The Ruby Files, we learned how machine learning circumvented the First Amendment. That was done behind closed doors. This time it’s different. This time the stealing is happening right in front of you and instead of calling it out, we are paying for the right to use the stolen property.
It is another nail in the coffin of this country that will ultimately lead to moral decline and failure. In business, ethics, integrity, and strong character are essential skills. We are now telling future business leaders that none of that matters. No need to have ethics. Just steal and then charge other people to access the stolen goods. This message will lead to a business wasteland over the next decade if it is not course corrected now.
Why should users be concerned about LLMs scraping user data?
“Protecting proprietary content, traffic, and search engine rankings. Website owners invest time and resources in content that drives inbound traffic and leads to their website. They spend years researching a subject and publishing new content. Why would they want to allow that information to be scraped in bulk by a GPT Bot that could use the information on their own platform? This would take away traffic from the website owner. It is essentially a redirect method of traffic, rankings, and content. With a traditional search engine, the purpose is to drive traffic to the website and display the original source when someone uses their search engine and queries a relevant term. With AI large language model scraping, the publisher of the content (website owner) sees no tangible value of having their content scraped by LLMs.” – Jeff Rose
AI TRAINING OPT OUT GUIDE
What can users do to prevent their data being used by LLMs?
Ruby Media Group AI Privacy Hub
Here’s how to protect your content from AI plagiarism and illegal training
How do I opt out of generative AI?
Google and OpenAI want all online content available to train their models on. The more content they can scrape, the better. But what they want is not what you want. Did you know that many AI companies are illegally using your data to train their models? They are training their models using your content without your permission or consent. Only content protected by a paywall is safeguarded against OpenAI’s GPTBot. If the platform allows it to use the content, it is still not protected.
Here’s what you need to know to put a stop to it.
How do I opt out of generative AI training on OpenAI?
Block ChatGPT from using your website content. OpenAI now lets you block its web crawler from scraping your site to help train GPT models. Manually opt out of training data with your OpenAI account. You can opt out of having your input used to train their models by emailing them at firstname.lastname@example.org or filling out a form here. This does not apply to data already used to train their model.
Jeff Rose, President of iCircle Technologies, said this move will give users more control over what they allow the GPT Bot to ingest. “If a website owner does not want to share all of their website content, they can insert code into their robots.txt file that lets GPT Bot know what it can or cannot scrape.”
What can users do to prevent their data being used by LLMs?
According to OpenAI documentation, the best way to prevent GPTBot from scraping your site is to add a few lines of code to your robots.txt file.
To block all GPTBot access, enter the following:
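As a minimal sketch, the two lines inside the BLOCK_ALL string below are the rule as we understand OpenAI's documentation to give it. The Python around it is only an illustrative way to check the rule with the standard library's robots.txt parser. The helper name can_crawl and the example.com URLs are assumptions made for this example, not part of OpenAI's instructions.

```python
from urllib.robotparser import RobotFileParser

# The text to place in robots.txt: ask GPTBot to stay off the whole site.
BLOCK_ALL = """\
User-agent: GPTBot
Disallow: /
"""


def can_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Hypothetical helper: would `agent` be allowed to fetch `url`?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # GPTBot is refused everywhere; other crawlers are unaffected.
    print(can_crawl(BLOCK_ALL, "GPTBot", "https://example.com/blog/post"))
    print(can_crawl(BLOCK_ALL, "Googlebot", "https://example.com/blog/post"))
```

Keep in mind that robots.txt is a request that well-behaved crawlers honor, not a technical barrier.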
To block or allow only certain directories you can add the following code:
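A minimal sketch of the directory-level rule follows, again checked with Python's standard robots.txt parser. The paths /directory-1/ and /directory-2/ are placeholders to replace with your own folders, and can_crawl and example.com are assumptions made for this example.

```python
from urllib.robotparser import RobotFileParser

# The text to place in robots.txt: GPTBot may read one folder but not another.
# Both directory names are placeholders for your own paths.
PARTIAL = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
"""


def can_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Hypothetical helper: would `agent` be allowed to fetch `url`?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(can_crawl(PARTIAL, "GPTBot", "https://example.com/directory-1/page"))
    print(can_crawl(PARTIAL, "GPTBot", "https://example.com/directory-2/page"))
```

Anything not listed under Disallow stays open to the bot, so gated or high-value content should be named explicitly.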
How do I opt out of generative AI training on Google?
Don’t want Google to use your website for AI training data? You are not alone.
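Google-Extended offers an opt-out for training data. With Google-Extended, publishers can add a directive to their robots.txt file that tells Google not to use their pages as training data for its generative AI models. As we understand Google's announcement, this is separate from Googlebot, so it should not affect how your site appears in Google Search.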
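A minimal sketch of that directive is shown in the GOOGLE_EXTENDED string below, using the Google-Extended user agent token. The Python check around it, the helper name can_crawl, and the example.com URL are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# The text to place in robots.txt: opt the whole site out of Google-Extended.
GOOGLE_EXTENDED = """\
User-agent: Google-Extended
Disallow: /
"""


def can_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Hypothetical helper: would `agent` be allowed to fetch `url`?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "https://example.com/blog/post"
    # The AI training token is refused; the regular search crawler is not.
    print(can_crawl(GOOGLE_EXTENDED, "Google-Extended", url))
    print(can_crawl(GOOGLE_EXTENDED, "Googlebot", url))
```

The same file can hold this rule alongside a GPTBot rule, since each User-agent group applies only to the crawler it names.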
How do I opt out of generative AI on Facebook?
If you want to stop Meta from using your personal data to train generative AI models, fill out this form. However, the form is more of a PR stunt than a real privacy protection measure and ultimately does nothing to stop Meta from using your data.
How do I opt out of generative AI on X?
There is currently no way to opt out of generative AI training on X. After acquiring Twitter, Elon Musk paused OpenAI’s access to the social media platform’s database, which was being used for training purposes. It is important to note that Musk has not paused his own access to your work, which he can still illegally use for his own training purposes.
How do I opt out of generative AI on Google Bard?
In your Bard account settings, turn off Bard activity under “my activity.” Be sure to frequently delete any prompts used for Bard as well.
How do I opt out of generative AI training on Substack?
Most Substack writers are unaware that the popular newsletter platform enables AI training by default. To stop AI training bots from using your Substack newsletter, go to the publication settings page, scroll to publication details, and toggle the option to block AI training.
How do I opt out of generative AI training on Stability AI?
Check to see if your work has been used in large scale training sets. Stable Diffusion is based on the open-source LAION-5B dataset, which was built by scraping images from the internet, including copyrighted work of artists. AI prompt engineers create nearly identical images by coming up with text-based prompts to copy the style and work of their favorite artists. To stay out of legal trouble, we highly suggest never doing this. HaveIBeenTrained by Spawning allows artists to remove their images from datasets used to train AI models.
“In December 2022, Spawning announced that Stability AI would consider this so-called artist opt-out when training Stable Diffusion 3. The deadline to opt-out has passed. Artists removed over 70 million images from Stable Diffusion 3’s training set. This shows you how many people want to opt out. According to The Decoder, over 40,000 opt-out requests were submitted directly through haveibeentrained.com.”
Other privacy protection recommendations include:
- File a copyright claim for your work.
- Register for copyright protection for your written work.
- Add a DMCA protection notice to your website.
KRIS RUBY is the CEO of Ruby Media Group, an award-winning public relations and media relations agency in Westchester County, New York. Kris Ruby has more than 15 years of experience in the Media industry. She is a sought-after media relations strategist, content creator and public relations consultant. Kris Ruby is also a national television commentator and political pundit and she has appeared on national TV programs over 200 times covering big tech bias, politics and social media. She is a trusted media source and frequent on-air commentator on social media, tech trends and crisis communications and frequently speaks on FOX News and other TV networks. She has been featured as a published author in OBSERVER, ADWEEK, and countless other industry publications. Her research on brand activism and cancel culture is widely distributed and referenced. She graduated from Boston University’s College of Communication with a major in public relations and is a founding member of The Young Entrepreneurs Council. She is also the host of The Kris Ruby Podcast Show, a show focusing on the politics of big tech and the social media industry. Kris is focused on PR for SEO and leveraging content marketing strategies to help clients get the most out of their media coverage.