Stuck with a website you didn’t build, don’t love, and don’t trust? Your website should be your biggest ally — not your biggest headache. Learn how to fix it fast with the FREE Website Triage Template.
We’ve spent years optimizing website content to get found, dialing in SEO, crafting thought leadership, and building better digital experiences to engage visitors.
But then came AI. Is it time to rethink that approach?
Your content is being scraped and repurposed in ways you didn’t anticipate. You see it surface at the top of Google search results, in Perplexity, and across other AI engines. And it happens without a visit, without credit, and without you even knowing it. That strategic guide your team spent weeks developing could already be powering chatbot responses somewhere that isn’t your website.
So is this a panic moment? Nah – it’s a pivot point.
For most brands, wide exposure still makes sense. For others, especially those with proprietary insights or high-value content, it may be time to reevaluate what visibility really means.
But blocking AI bots?!?
It’s not for every organization, but depending on your objectives, it might be a smart move to protect your IP, brand voice, and the long-term value of your content.
Oh, the irony! You’ve poured countless meetings, dollars, and hours into building content to be discovered. Now AI is doing the discovering without ever visiting. Your best-performing content could be training a model while your brand remains invisible. That’s not the visibility you invested in.
Your site may house frameworks, playbooks, or methodologies that are core to your value proposition. If those assets are feeding AI tools without attribution, they are no longer exclusive.
Brand voice is strategic. It reflects who you are, what you believe, and how you connect with your audience. When AI tools extract and repackage your copy, the nuance gets lost. The result is generic, impersonal, and off-brand.
If an AI tool can answer your audience’s questions before they ever reach your site, your traffic suffers. That means fewer conversions, less engagement, and a declining return on your content investment.
For companies in regulated industries, the risks are even greater. Even anonymized or public-facing content can raise red flags when reused out of context. AI scraping introduces a layer of complexity that most compliance teams are not prepared for.
Once your content is pulled into an AI model, you lose visibility into where it appears, how it’s framed, or who benefits. Blocking bots helps protect the integrity of your messaging and your intent.
If this makes sense for your brand, know that no single solution is airtight. But the following steps can help you take back control.
Reputable AI companies publish their bot names and follow exclusion rules. If you want to block AI crawlers from your entire site, add this to your robots.txt file:
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /
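Those three cover OpenAI, Anthropic, and Common Crawl, but the roster of crawlers keeps growing. Two other commonly published tokens are Google-Extended (which governs Google’s AI training, not regular search indexing) and PerplexityBot. Crawler names change, so verify the current list in each vendor’s documentation:

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /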
If you prefer to block specific pages or folders—such as proprietary resources or gated tools—you can target just those areas. For example:
User-agent: GPTBot
Disallow: /resources/pricing-strategy-guide.pdf
Disallow: /insights/internal-research/
This gives you flexibility. You can still allow general visibility for most of your site while protecting high-value or sensitive assets.
Keep in mind: while this strategy deters compliant AI crawlers, it won’t stop bad actors who ignore robots.txt. That’s why it works best as part of a layered approach that includes IP blocking, meta tags, and legal protections.
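One way to add that enforcement layer is to reject self-identified AI crawlers at the web server itself, so the block no longer depends on a bot choosing to honor robots.txt. A minimal sketch for nginx (adapt the bot list and the syntax to your own server):

# Inside the server block: refuse requests from known AI user agents
if ($http_user_agent ~* "(GPTBot|ClaudeBot|CCBot)") {
    return 403;
}

Like robots.txt, this only catches bots that identify themselves honestly, which is where IP blocking comes in.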
Some companies publish the IP addresses their bots use. Blocking these at the firewall or server level gives you an added layer of protection beyond robots.txt.
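As a sketch, here’s what that looks like in nginx. The ranges below are documentation placeholders, not real crawler addresses; substitute the current ranges each vendor publishes (OpenAI, for example, publishes GPTBot’s):

# Deny requests from published AI-crawler IP ranges (placeholder values shown)
deny 203.0.113.0/24;
deny 198.51.100.0/24;
allow all;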
Prevent pages from being indexed or summarized with the following:
<meta name="robots" content="noindex, noarchive, nosnippet">
This won’t stop a bot from fetching the page, but compliant crawlers will honor the directives and avoid indexing or reusing it.
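One caveat: a meta tag only works on HTML pages. For PDFs and other file types, you can send the same directives as an X-Robots-Tag HTTP header instead; here’s a sketch in nginx (adjust the file pattern to your setup):

# Apply the same robots directives to PDF downloads
location ~* \.pdf$ {
    add_header X-Robots-Tag "noindex, noarchive, nosnippet";
}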
Proprietary resources should sit behind logins, paywalls, or CAPTCHAs. If bots can’t access them, they can’t train on them.
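Even a simple HTTP auth gate raises the bar considerably. A minimal nginx sketch, protecting the same /resources/ folder from the earlier robots.txt example:

# Require a login before anything under /resources/ is served
location /resources/ {
    auth_basic "Restricted";
    auth_basic_user_file /etc/nginx/.htpasswd;
}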
Make your expectations clear in your terms of use: prohibit scraping and the use of your content for AI training. It may not block activity directly, but it strengthens your legal position.
Protecting access to your content does not mean hiding – it means making intentional decisions about how your content is used. If your business is built on thought leadership, proprietary frameworks, or a strong brand identity, it’s worth considering who your content is serving—and whether that aligns with your strategy.
Need help implementing these techniques or just want to discuss them? Reach out to talk.