🔒 Is AI Stealing Your Words? How to Stop ChatGPT from Using Your Writing

Keep your writing safe from AI bots with easy steps to block unauthorized use.

Neil Phan
October 27, 2024

Do you think AI companies should pay for using writers’ content? 💬

Let’s hear your thoughts below!

Introduction
I. The Scale of AI Training
II. Legal Actions Against AI Data Usage
- 1. Big Money, Big Problems
- 2. The Word Shortage Crisis
III. How to Protect Your Writing
- 1. Robots.txt: Your Digital “Do Not Enter” Sign
- 2. Checking Your Site’s Robots.txt File
IV. Setting Up Robots.txt on Your Own Website
V. The Role of Common Crawl
- 1. How Common Crawl Got Caught Up in AI Training
- 2. Why You Might Want to Block Common Crawl Too
VI. Call to Action
- 1. What You Can Do as a Writer
- 2. Final Thought: Your Words Are Yours
Conclusion

Introduction

Ever had someone swipe your work, slap their name on it, and act like it’s all good? Yeah, it’s the worst. Now imagine an AI doing that—scooping up your words, taking bits and pieces, and turning them into part of its “AI training.” Feels kind of off, right?

But here’s the twist: no one’s talking about it, and not everyone even knows it’s happening. Your writing is your thing, your effort, and yeah, your intellectual property. So when AI companies use your words without a heads-up, it’s more than just annoying. It’s a problem. And not a small one.

AI training needs an insane amount of data. We’re talking hundreds of billions of words! And, uh, they’re not exactly asking for permission. Yep, some companies just scrape the web, grab all the writing they can find (maybe even yours), and use it to teach their bots. No credit, no compensation, just poof, gone into the AI void.

So, what do you do when a machine decides to borrow your words? And should we just accept it? Let’s break this down and see how we can stop being the free buffet for AI training.

I. The Scale of AI Training

So, let’s talk about the elephant in the room—AI training. When ChatGPT launched, it wasn’t just playing around with a few essays and Wikipedia articles. Nope, we’re talking about 300 billion words. Yeah, billion with a B.

To put that into perspective, if you sat down and wrote 1,000 words every single day, it would take you about 2,740 years to reach just one billion. And ChatGPT needed 300 of those.
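
If you want to check that math yourself, here is a minimal Python sketch. It uses only the figures quoted above (1,000 words a day, one billion words, 300 billion words in total) and assumes a plain 365-day year.

```python
# Figures quoted in the text above; the 365-day year is an assumption.
WORDS_PER_DAY = 1_000
ONE_BILLION = 1_000_000_000
TRAINING_WORDS = 300 * ONE_BILLION


def years_to_write(total_words: int, words_per_day: int = WORDS_PER_DAY) -> float:
    """Return how many years it takes to write total_words at a steady daily pace."""
    days = total_words / words_per_day
    return days / 365


years_for_one_billion = years_to_write(ONE_BILLION)
years_for_training_set = years_to_write(TRAINING_WORDS)

print(f"One billion words: about {years_for_one_billion:,.0f} years")
print(f"300 billion words: about {years_for_training_set:,.0f} years")
```

At that pace, one billion words works out to roughly 2,740 years, so 300 billion would take more than 800,000.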

1. Big Numbers, Bigger Problems

Now, I get it—big numbers like that are hard to imagine. But what’s important here is where those words came from. They didn’t just fall from the sky. AI training is like a super picky eater that only wants the best stuff.

Remember when Microsoft released its Tay chatbot on Twitter back in 2016? It learned from whatever people tweeted at it, went rogue, and was pulled offline in less than a day. Lessons like that pushed companies like OpenAI toward high-quality writing instead of random tweets or internet fluff. So, they started “borrowing” from:

Books
Articles
Scientific papers
Even your favorite blogs

And let’s be honest, they weren’t exactly asking for permission.

2. Quality Matters

The thing is, they didn’t ask first. It’s like that friend who shows up at your house, eats all your snacks, and says they’ll pay you back “someday.”

Only this time, the "friend" is a giant AI company, and they’re making billions off those borrowed words. Here’s the catch:

High-quality writing is essential for AI training.
The better the content, the smarter the AI becomes.

So while we’re over here putting blood, sweat, and tears into our work, AI companies are happily using it to train their bots without even a thank you.

3. It’s Not Just About Quantity, It’s About Quality

Bad content = AI failure (like Microsoft’s Twitter bot)
Good content = Successful AI (like ChatGPT)

AI training isn’t just about gobbling up words—it’s about choosing the right ones. And that’s where it gets tricky for writers like us.

They’re using our words to make their money, and most of us don’t even know it.

So, how do you feel about that?

II. Legal Actions Against AI Data Usage

Let’s talk about AI training and how it’s facing some serious heat in court. The New York Times decided they weren’t just going to sit back while AI companies helped themselves to their articles. In December 2023, they became the first major American news organization to sue OpenAI and Microsoft over it. Why? Because sometimes these AI models don’t just “learn.” They spit out full-on chunks of text, practically verbatim. Imagine writing an article, only for parts of it to show up in an AI’s response without even a wink of acknowledgment. Yeah, not cool.

And it’s not just The New York Times making noise. Other companies might join the lawsuit bandwagon too. Because, let’s face it, if AI training is using your work without permission, it’s fair to ask, “Where’s my cut?”

1. Big Money, Big Problems

Now, while writers are counting every penny, OpenAI is counting millions. They made around $300 million in August alone and are aiming for $3.7 billion this year. Meanwhile, most of us are just hoping to afford an extra coffee this month. It’s a little frustrating to watch them rake in profits from AI training while using words they didn’t exactly get permission for.

Here’s the breakdown:

OpenAI's Revenue:
- $300 million in a single month.
- Projected to hit $3.7 billion this year.
- Aiming for $100 billion by 2029.
Writers' Earnings:
- Often just scraping by.
- No compensation for AI training using their work.
- Seeing their words turned into AI-generated responses without credit.

The reality is simple: AI training makes a lot of money, but the original writers? Not so much. And it’s not just about fairness—it’s about acknowledging where the content comes from.

2. The Word Shortage Crisis

Here’s a twist: even AI can run out of stuff to read. A study by Epoch AI suggests that by 2026, AI models might run out of fresh, human-generated content. Turns out, there’s only so much internet to go around, and without new material, AI training might hit a wall. Imagine an AI just re-reading the same old articles like they’re stuck in a never-ending rerun. That’s the future if they don’t find new words to munch on.

Why does this matter?

Content Shortage = Stagnant AI: Without new data, AI models can’t improve.
Paying for Content: AI companies might need to start compensating writers to keep up.
Legal Pressure: More companies could sue, pushing for better content practices.

So, what’s next? Will AI training keep helping itself to free words, or will writers finally get a seat at the table?

III. How to Protect Your Writing

Okay, let’s be real: 100% protection from AI training is like trying to hide your snacks from a hungry roommate. It’s not gonna happen. Writing offline in a notebook? Cute idea, but not exactly practical when your rent depends on digital work. Most of us need a solution that works online, where our words actually live.

1. Robots.txt: Your Digital “Do Not Enter” Sign

Now, before you freak out, robots.txt is literally just a tiny text file. It’s like putting up a little sign that says, “Hey bots, keep out!” And it’s pretty easy to set up, even if you’re not a tech wizard.

Here’s how it works:

User-agent: This is the bot’s name (like GPTBot).
Disallow: This means no, you can’t access this.
Slash ( / ): This means the whole site or account.

So, a basic line in your file might look like this:

User-agent: GPTBot
Disallow: /

It’s basically a polite way of saying, “Hey, GPTBot, these are my words—not yours.”
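
To see how a rule-following crawler would read those two lines, here is a small sketch using Python's built-in `urllib.robotparser`. The page address and the `SomeOtherBot` name are placeholders made up for illustration, and this only shows how a polite bot interprets the file. It does not force any bot to obey it.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule from the text above.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com and SomeOtherBot are placeholders, not real targets.
page = "https://example.com/my-best-essay"
print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot", page))
print("SomeOtherBot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", page))
```

The first check comes back `False` and the second `True`: the rule only covers the bot it names, so every other crawler needs its own entry.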

2. Checking Your Site’s Robots.txt File

Curious to see how this looks in action? Just type yourwebsite.com/robots.txt in your browser, and boom, you’ll see the file. You can even peek at how big sites like Wikipedia handle their settings.

For example, on my WordPress blog, I’ve got to manually turn off AI training:

Go to Settings > Privacy.
Tick Prevent third-party for vicngaao.wordpress.com.

It’s not automatic, so make sure you check!
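
Back to robots.txt for a second: if you'd rather look at a file from a script than from the address bar, something like this sketch would do it. The helper names `robots_url` and `fetch_robots_txt` are made up for this example, and defaulting to HTTPS when no scheme is given is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_url(site: str) -> str:
    """Build the robots.txt address for a site given as a domain or a full URL."""
    if "://" not in site:
        site = "https://" + site  # assumption: use HTTPS when no scheme is given
    parts = urlsplit(site)
    # robots.txt lives at the root of the host, whatever page you start from.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_robots_txt(site: str, timeout: float = 10.0) -> str:
    """Download a site's robots.txt and return it as text (needs network access)."""
    with urlopen(robots_url(site), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


print(robots_url("yourwebsite.com"))
print(robots_url("https://example.com/blog/my-post?ref=home"))
```

Calling `fetch_robots_txt("yourwebsite.com")` should return the same text you would see in the browser, though it raises an error if the site has no robots.txt or refuses the request.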

So, no, we can’t hide our words in a cave somewhere. But we can make it harder for those bots to use our writing for AI training. And honestly, isn’t it worth a shot?

IV. Setting Up Robots.txt on Your Own Website

Okay, so you want to stop AI training from munching on your words, but you’ve got a website, and you’re not sure where to start? Let’s break it down. It’s easier than dealing with an 8 a.m. Monday meeting, I promise.

1. For WordPress Users: The Easy Way

Use the Yoast SEO Plugin

If you’re already using Yoast for SEO, you’re halfway there. Here’s what you do:
- Go to your WordPress sidebar.
- Click on Yoast > Tools > File Editor.
- From here, you can edit your robots.txt file directly.

It’s like leaving a note on your front door: “Dear bots, please respect my space.”

Install the WP Robots Txt Plugin

Not using Yoast? No problem. The WP Robots Txt plugin is your new bestie:
- Go to Plugins > Add New in WordPress.
- Search for WP Robots Txt, install it, and activate.
- Now you can easily add blocking rules to your robots.txt file, such as the GPTBot lines shown earlier.

2. Manual Method: For the Brave

If you’re feeling adventurous and know how to access your website’s files through FTP:

Head over to the root directory of your website.
Find the robots.txt file or create one if it’s not there.
Copy-paste this code:

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-Web
Disallow: /

It’s like putting up a digital “No Trespassing” sign.
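
If your site already has a robots.txt, you probably want to add these rules without wiping out what's there or pasting the same bot twice. Here is a rough sketch of that idea. The function names and the sample `existing` file (including its `/wp-admin/` rule) are hypothetical, and the script only works on text, so you would still upload the result yourself.

```python
# The crawler names listed in the text above.
AI_BOTS = ("GPTBot", "ChatGPT-User", "Google-Extended",
           "Omgilibot", "ClaudeBot", "Claude-Web")


def named_agents(robots_txt: str) -> set[str]:
    """Collect the lowercased names found on every User-agent line."""
    agents = set()
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # ignore comments
        key, _, value = line.partition(":")
        if key.strip().lower() == "user-agent" and value.strip():
            agents.add(value.strip().lower())
    return agents


def add_block_rules(existing: str, bots: tuple[str, ...] = AI_BOTS) -> str:
    """Append a 'Disallow: /' group for each bot that existing does not name yet.

    A bot that is already named is left alone, even if its current rule
    is weaker than a full block.
    """
    already_named = named_agents(existing)
    pieces = [existing.strip()] if existing.strip() else []
    pieces += [f"User-agent: {bot}\nDisallow: /"
               for bot in bots if bot.lower() not in already_named]
    return "\n\n".join(pieces) + "\n"


# Hypothetical starting file: one general rule plus GPTBot already blocked.
existing = """\
User-agent: *
Disallow: /wp-admin/

User-agent: GPTBot
Disallow: /
"""

updated = add_block_rules(existing)
print(updated)
```

Running it a second time on its own output changes nothing, so it's safe to re-run whenever you add a new bot name to the list.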

3. When You Need a Little Help

Not everyone is into tech (and that’s okay). If this feels like too much, don’t stress! Reach out to your hosting company’s support. Most of them are pretty nice about helping you out—just tell them you want to block bots from AI training on your site, and they’ll know what to do.

Why Bother?

Because, honestly, your words are your words. And while robots.txt won’t solve everything, it’s a solid way to make sure those AI bots don’t take without asking. Plus, it’s way more satisfying than feeling like your site is an all-you-can-eat buffet for AI training.

Got questions? Need more tips? Drop them below!

V. The Role of Common Crawl

Okay, let’s talk about Common Crawl—the quiet little data collector that got tangled up in the big AI training drama. Originally, Common Crawl had a pretty noble goal: make a giant library of internet data that anyone could use for research and analysis. Think of it like the friendly neighbor who lets everyone borrow their books.

1. How Common Crawl Got Caught Up in AI Training

So, what happened? Well, OpenAI used data from Common Crawl to train its models. Instead of scraping every website one by one, they went straight to this big, pre-made collection of words and text. It was like skipping the grocery store and going directly to Costco. And sure, that sounds efficient—except they didn’t exactly get the okay from every “book” owner in that library.

The New York Times lawsuit called this out, and suddenly, this little nonprofit found itself mentioned in a big legal battle. It’s kind of like being the friend who just wanted to host a casual game night but ended up in the middle of a heated Monopoly feud. 🎲

2. Why You Might Want to Block Common Crawl Too

Here’s the thing: If you’re a writer, blogger, or anyone who puts their heart into online words, you might not want your content in that giant data pool. Because guess what? That data could end up in the hands of AI companies that might use it for training without giving you a second thought (or a penny).

Reasons to Block Common Crawl:

Protect your content: Keep your work out of AI training datasets.
Control your digital presence: Decide who gets to use your words.
Avoid being part of the controversy: Don’t let your content end up in legal disputes without your say.

Blocking Common Crawl won’t solve all your AI training worries, but it’s a solid start. It’s like saying, “Hey, Common Crawl, thanks but no thanks.” And if you’re not sure how to do this, scroll up to the part where we talked about setting up robots.txt—it’s not rocket science, promise.
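
For reference, here is what that could look like. One assumption to flag loudly: this article never names Common Crawl's bot, and the sketch uses `CCBot`, the user-agent name Common Crawl publishes for its crawler. The `CCBot/2.0` version string and the example.com address are illustrative only.

```python
from urllib.robotparser import RobotFileParser

# Assumption: CCBot is the user-agent name of Common Crawl's crawler.
# The article itself does not name it.
COMMON_CRAWL_RULE = """\
User-agent: CCBot
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Crawlers usually send a longer string than the bare name;
# the version suffix here is illustrative.
full_agent = "CCBot/2.0"
url = "https://example.com/essays/first-draft"

print("CCBot allowed:", can_crawl(COMMON_CRAWL_RULE, "CCBot", url))
print("CCBot/2.0 allowed:", can_crawl(COMMON_CRAWL_RULE, full_agent, url))
```

Both checks come back `False`, because Python's parser compares only the name before the slash and ignores upper and lower case. Pages collected before you add the rule may still sit in older Common Crawl archives, so treat this as a way to limit future crawls.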

In the end, Common Crawl might have started with the best intentions, but it’s become part of the bigger conversation about AI training and how our words get used. If you want to keep your content out of the mix, now you know how. And honestly, it’s worth taking control over where your words end up—no one likes being an accidental contributor to a billion-dollar AI project, right?

VI. Call to Action

Look, I know we’re all hoping the courts step up and do the right thing here. Companies like The New York Times are pushing back against AI training that snatches up content without permission, and honestly, I’m rooting for them. It’s about time some legal wins protect our work.

I also hope more companies follow their lead—because if enough of them challenge these AI giants, it could change the way AI training operates. We might finally see some respect for intellectual property rights.

1. What You Can Do as a Writer

But while we’re waiting for those court rulings, there are a few things you can do right now to protect your content:

Check your website settings: Make sure your robots.txt file is telling AI bots, “No, you can’t use my work for AI training.”
Platforms you write on: Look at their AI policies. If it’s not clear, shoot them a message asking about how they handle AI training.
- Platforms like Substack have settings for this, but some don’t make it obvious, so don’t be shy about asking questions.
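
For the first item, a quick way to audit your settings is to ask which AI bots your current robots.txt still lets in. This is a minimal sketch using the bot names from earlier in this article. The `sample` file and the `still_allowed` helper are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The crawler names from the robots.txt example earlier in this article.
AI_BOTS = ("GPTBot", "ChatGPT-User", "Google-Extended",
           "Omgilibot", "ClaudeBot", "Claude-Web")


def still_allowed(robots_txt: str,
                  bots: tuple[str, ...] = AI_BOTS,
                  url: str = "https://example.com/") -> list[str]:
    """Return the bots that robots_txt does not block from url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if parser.can_fetch(bot, url)]


# Hypothetical file that blocks only two of the six bots.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""

for bot in still_allowed(sample):
    print("Not blocked yet:", bot)
```

With this sample file, the four bots without their own entry show up as not blocked yet. An empty list means every bot on your list has a rule telling it to stay out.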

2. Final Thought: Your Words Are Yours

At the end of the day, your words are your property. Until laws change, it’s up to us to stay on top of where our content goes. AI training may be the new reality, but that doesn’t mean you can’t control how much you’re involved in it.

So keep an eye on your content. Protect it. Because the reality is, no one’s going to care about your work more than you do.

And hey, we’re all in this together. Let’s not let AI companies turn our words into their next paycheck without at least asking first.

Conclusion

In the end, protecting your writing from AI training isn’t some impossible task. Sure, writing in a notebook might keep the bots out, but let’s be real—that’s not how most of us get work done. It’s about finding those small steps—like using robots.txt—that can make a big difference.

And yes, we need bigger legal changes to truly protect writers. It’s about time those courtrooms started valuing our words as much as we do. Until then, keep an eye on your content and stay on top of those settings.

Because your words? They’re yours—no matter how much AI wants a piece of them.
