Are AI Detectors Good at Identifying AI Content?

Jun 1

AI detection tools have become necessary because the use of ChatGPT and other artificial intelligence platforms for writing has exploded recently. Articles written entirely (or mostly) by AI hurt everyone. AI-generated content does not rank well in search results because it merely summarizes existing information instead of showcasing the author’s E-E-A-T (experience, expertise, authoritativeness, and trustworthiness). This hurts the publisher and the company behind it. Writers, in turn, are affected by having their work scrutinized more closely. Readers are directly impacted by the low-quality, regurgitated information present in AI-generated answers in organic search (AI Overviews) and apps (ChatGPT).

To address this emerging problem, there are now numerous tools that claim to detect AI-generated text with a high degree of certainty. ZeroGPT, Copyleaks, and GPTZero are popular choices on the market. But how accurate are they when it comes to flagging AI-generated content? Our findings included several surprises, both from the humans and the machines.

Methodology

We produced a range of blog articles with varying degrees of AI use, on topics spanning construction, finance, entertainment, health, business, and manufacturing. These articles include:

16 human-written articles

6 human-written articles, including outlines
2 human-written, AI-revised articles
3 human-written articles based on an AI-produced outline
3 human-written articles based on an AI-produced, human-reviewed outline
2 human-written articles with AI used only to brainstorm

10 AI-generated articles

8 AI-generated, human-revised articles
2 purely AI-generated articles

We ran the same articles through ZeroGPT, Copyleaks, and GPTZero and noted the AI scores assigned to each. The results are presented in order, beginning with the least accurate AI detection tool and ending with the most accurate.

How Accurate Is ZeroGPT?

ZeroGPT came out as the least accurate AI detection tool of the three we tested. It generated false positives for almost all human-written articles, with especially high AI scores for technical topics.

The AI scores it returned for actual AI articles (with human review) often came out lower than the scores for the human-written articles. This was most pronounced when we included instructions in the prompt to ChatGPT to use an "inspiring, uplifting tone" rather than a formal one.

ZeroGPT Scoring Methodology

ZeroGPT scores text based on “a series of complex and deep algorithms” patented as DeepAnalyse™ Technology. They developed and tested these algorithms through in-house experiments using educational datasets, collections of texts from the internet, and their own AI-generated content using a range of language models.

After scanning a text, the AI detection tool returns a probability verdict on a continuum from “Your text is Human written” to “Your text is AI/GPT generated.”

ZeroGPT claims a detection accuracy rate of up to 98 percent. The team aims to achieve a 2 percent error rate.

Our tests showed that AI GPT scores were higher for construction, business, and finance articles. They were lower for entertainment and health articles, irrespective of their human or AI origin.

ZeroGPT Accuracy in Detecting Human-Written Text

ZeroGPT seems to give a high AI score to any text that sounds formal, technical, or clear and easy to understand.

Topic	Content Generation Method	AI GPT Score
Construction 1	Human-written content	69.54% AI GPT
Construction 2	Human-written content	98.76% AI GPT
Construction 3	Human-written content	88.76% AI GPT
Construction 4	Human-written content	78.38% AI GPT
Finance 1	Human-written content simplified using AI	49.21% AI GPT
Finance 2	Human-written content simplified using AI	79.93% AI GPT
Finance 3	Human-written content	64.92% AI GPT
Finance 4	Human-written content	77.77% AI GPT
Finance 5	AI outline, human-written content	86.29% AI GPT
Finance 6	AI outline, human-written content	79.35% AI GPT
Finance 7	AI outline, human-written content	60.05% AI GPT
Entertainment 1	Human-written content, AI used to brainstorm	32.96% AI GPT
Entertainment 2	Human-written content, AI used to brainstorm	46.12% AI GPT
Health 1	AI-generated + human-reviewed outline, human-written text	37.10% AI GPT
Health 2	AI-generated + human-reviewed outline, human-written text	46.20% AI GPT
Health 3	AI-generated + human-reviewed outline, human-written text	51.20% AI GPT

ZeroGPT Accuracy in Detecting AI-Generated Text

ZeroGPT didn't prove particularly accurate in detecting AI content, especially when we instructed ChatGPT to use an informal tone. The prompts to ChatGPT for the first two articles in the table below included instructions to use a "professional tone." Their ZeroGPT AI scores are notably on the high end.

The prompts to ChatGPT for the rest of the articles specified an "inspiring and uplifting tone." These articles came out with lower scores on the whole than the human-written technical articles (the construction and finance articles listed above).

Topic	Content Generation Method	AI GPT Score
Business formal 1	AI-generated with human review	99.46% AI GPT
Business formal 2	AI-generated with human review	89.2% AI GPT
Business casual 1	AI-generated with human review	52.21% AI GPT
Business casual 2	AI-generated with human review	71.1% AI GPT
Business casual 3	Purely AI-generated	90.9% AI GPT
Manufacturing casual 1	AI-generated with human review	33.08% AI GPT
Manufacturing casual 2	AI-generated with human review	6.10% AI GPT
Manufacturing casual 3	AI-generated with human review	25.43% AI GPT
Manufacturing casual 4	AI-generated with human review	16.54% AI GPT
Manufacturing casual 5	Purely AI-generated	61.1% AI GPT
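
The comparison between the two ZeroGPT tables can be checked with simple arithmetic. The sketch below averages the scores copied from the tables above. The grouping is our reading of the text: "human technical" covers every construction and finance article in the first table (including the AI-simplified and AI-outline ones), and "AI uplifting" covers the eight articles prompted for an inspiring and uplifting tone. The variable names are illustrative only.

```python
from statistics import mean

# ZeroGPT scores for the construction and finance articles (first table).
human_technical = [69.54, 98.76, 88.76, 78.38, 49.21, 79.93,
                   64.92, 77.77, 86.29, 79.35, 60.05]

# ZeroGPT scores for the AI articles prompted for an uplifting tone
# (second table, excluding the two "Business formal" articles).
ai_uplifting = [52.21, 71.1, 90.9, 33.08, 6.10, 25.43, 16.54, 61.1]

human_mean = mean(human_technical)
ai_mean = mean(ai_uplifting)

print(f"Human technical average: {human_mean:.1f}%")
print(f"AI uplifting average:    {ai_mean:.1f}%")
print(f"Gap: {human_mean - ai_mean:.1f} points")
```

On these numbers, the human-written technical articles average roughly 76% while the uplifting-tone AI articles average roughly 45%, which is the inversion described above.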

Triggers Leading to High and Low Scores in ZeroGPT

We noticed three specific triggers that led to a high AI GPT score in ZeroGPT:

Extensive use of bullet points
Technical topic
Formal tone

Our team also noticed that “humanizing” an article section by section doesn’t work with ZeroGPT. Individual sections may come out as “human” using the tool, and then the entire article, pasted together, returns a “written by AI” result. The tool is looking for patterns across the entire text.

How Accurate Is Copyleaks?

Copyleaks claims to be "the most accurate AI detection tool with over 99% accuracy (backed by third-party studies)." Our tests confirm that it is a very accurate AI detection tool. It generates some false positives, but rarely returns a false negative.

Copyleaks Scoring Methodology

Copyleaks currently scores scanned text using the V10 model of the Copyleaks API. It assigns each text an AI Detection Percentage. This refers to the percentage of the text that is deemed AI-produced. It is not a confidence score. The user can adjust the sensitivity level to raise or lower the likelihood of false positives.

The tool used to provide an “overused phrase score” if AI patterns were detected. However, this score is no longer provided. Instead, the tool provides a heat map of phrases according to the probability that they are AI or human, referred to as AI Phrases.

The scoring algorithm looks for certain patterns in writing that help to reveal whether it’s AI or human-written:

Frequency ratios of certain phrases
Parts of speech (grammar and syntax)
Syllable dispersion
Hyphen use

Copyleaks Accuracy in Detecting Human-Written Text

Using AI to simplify human writing appeared to "trip the switch" in Copyleaks. We have also seen the occasional human-written technical article return a score of 100% AI using this tool. Copyleaks seems to be all or nothing, as we most frequently see results of either 0% or 100% (including false positives) rather than intermediate percentages.

Topic	Content Generation Method	Copyleaks Score
Construction 1	Human-written content	0%
Construction 2	Human-written content	0%
Construction 3	Human-written content	0%
Construction 4	Human-written content	0%
Finance 1	Human-written content simplified using AI	95.30%
Finance 2	Human-written content simplified using AI	100%
Finance 3	Human-written content	0%
Finance 4	Human-written content	0%
Finance 5	AI outline, human-written content	100%
Finance 6	AI outline, human-written content	100%
Finance 7	AI outline, human-written content	100%
Entertainment 1	Human-written content, AI used to brainstorm	34.50%
Entertainment 2	Human-written content, AI used to brainstorm	0%
Health 1	AI-generated + human-reviewed outline, human-written text	0%
Health 2	AI-generated + human-reviewed outline, human-written text	0%
Health 3	AI-generated + human-reviewed outline, human-written text	0%

Copyleaks Accuracy in Detecting AI-Generated Text

The Copyleaks scores for the AI-generated test articles were also quite accurate. Copyleaks correctly identified AI-generated text in every case. The three scores that came out at less than 100% were all for articles that had been modified through human review.

Topic	Content Generation Method	Copyleaks Score
Business formal 1	AI-generated with human review	98.2%
Business formal 2	AI-generated with human review	100%
Business casual 1	AI-generated with human review	100%
Business casual 2	AI-generated with human review	74.7%
Business casual 3	Purely AI-generated	100%
Manufacturing casual 1	AI-generated with human review	97.30%
Manufacturing casual 2	AI-generated with human review	100%
Manufacturing casual 3	AI-generated with human review	100%
Manufacturing casual 4	AI-generated with human review	100%
Manufacturing casual 5	Purely AI-generated	100%

How Accurate Is GPTZero?

GPTZero appears to be about as accurate as Copyleaks at detecting AI content, while providing an even more detailed score (Human % - Mixed % - AI %).

GPTZero Scoring Methodology

GPTZero uses two bases for classifying a text:

The perplexity score, which measures how predictable a text is to a language model. Generative AI tools work in a similar way to autocomplete functions in search engines and instant messaging applications. They predict the next word based on patterns and probability. Text in which nearly every word is the statistically likely choice has low perplexity and tends to sound machine-like. Human writers make less predictable word choices, so their text usually has higher perplexity.
The burstiness score, which measures the variation in perplexity. This refers to the variations in sentence length and structure. Humans naturally write a mix of long, complex sentences and short, direct ones. This measurement considers both the amount of variation and how closely it resembles natural as opposed to artificial variation.
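
The two measures above can be sketched in a few lines of Python. This is a minimal illustration of the general definitions, not GPTZero's actual implementation, which is not public in this form. It assumes perplexity is the exponential of the average negative log-probability of each word, and it approximates burstiness as the standard deviation of per-sentence perplexity. The word probabilities are hypothetical values that a language model might assign.

```python
import math
import statistics

def perplexity(token_probs):
    """Exponential of the average negative log-probability of the tokens."""
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

def burstiness(sentences):
    """Spread of perplexity across sentences (population standard deviation).

    This is one simple way to express "variation in perplexity";
    it is an assumption made for illustration.
    """
    return statistics.pstdev(perplexity(s) for s in sentences)

# Hypothetical per-word probabilities, one inner list per sentence.
predictable_text = [[0.9, 0.8, 0.9], [0.8, 0.9, 0.9]]
varied_text = [[0.9, 0.8, 0.9], [0.2, 0.05, 0.4, 0.1]]

for name, text in [("predictable", predictable_text), ("varied", varied_text)]:
    all_tokens = [p for sentence in text for p in sentence]
    print(name, round(perplexity(all_tokens), 2), round(burstiness(text), 2))
```

In this sketch, the text with uniformly likely words scores low on both measures, while the text that mixes an easy sentence with a surprising one scores higher on both. That is the pattern a detector of this kind would read as more human.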

One independent study found GPTZero to have an accuracy of 80% with many false negatives, in which AI-generated content returned a “human” score.

Formal human writing that features clear explanations and short sentences also triggers false positives. We have seen AI scores up to 100% for technical articles in some instances.

GPTZero Accuracy in Detecting Human-Written Text

GPTZero accurately scored all of the purely human-written articles in our trial, assigning AI scores of 0% or 1%. The AI-simplified human-written articles received a score of 100% AI. Articles written based on AI-generated outlines also resulted in high AI scores in two out of three cases, and the third was rated 99% mixed.

Topic	Content Generation Method	AI%	Mixed%	Human%
Construction 1	Human-written content	0	5	95
Construction 2	Human-written content	0	2	98
Construction 3	Human-written content	0	5	95
Construction 4	Human-written content	0	2	98
Finance 1	Human-written content simplified using AI	100	0	0
Finance 2	Human-written content simplified using AI	100	0	0
Finance 3	Human-written content	0	1	99
Finance 4	Human-written content	1	0	99
Finance 5	AI outline, human-written content	94	6	0
Finance 6	AI outline, human-written content	100	0	0
Finance 7	AI outline, human-written content	1	99	0
Entertainment 1	Human-written content, AI used to brainstorm	1	3	96
Entertainment 2	Human-written content, AI used to brainstorm	0	3	97
Health 1	AI-generated + human-reviewed outline, human-written text	0	1	99
Health 2	AI-generated + human-reviewed outline, human-written text	0	2	98
Health 3	AI-generated + human-reviewed outline, human-written text	3	2	95

This AI detection tool differentiates itself from the other tools mentioned above because it sometimes explains the reasons why each article received a particular score (AI, Human, or Mixed). It also highlights the sentences that had the largest impact on the probability score (both AI and Human).

Several human-written articles that we’ve produced since this test returned a 100% AI score. Interestingly, the reasons GPTZero most often gives for an AI score on human-written articles align with many SEO best practices. Namely:

Robotic Formality: The writing style is formal and polished, focusing on clarity and orderliness, but may appear robotic due to lack of variation.
Sophisticated Clarity: Precise word choice prioritizes clarity and sophistication, sometimes affecting natural flow.
Rigid Guidance: The writing style is consistent, focusing on practical advice and strategies.

These similarities between SEO writing and AI writing make it even more challenging for both companies and freelancers to determine the best approach to take when producing human-written content.

GPTZero Accuracy in Detecting AI-Generated Text

GPTZero returned the following scores for our AI-generated test articles:

Topic	Content Generation Method	AI%	Mixed%	Human%
Business formal 1	AI-generated with human review	100	0	0
Business formal 2	AI-generated with human review	100	0	0
Business casual 1	AI-generated with human review	100	0	0
Business casual 2*	AI-generated with human review	16	67	17
Business casual 3	Purely AI-generated	100	0	0
Manufacturing casual 1	AI-generated with human review	100	0	0
Manufacturing casual 2**	AI-generated with human review	47	53	0
Manufacturing casual 3	AI-generated with human review	94	6	0
Manufacturing casual 4	AI-generated with human review	100	0	0
Manufacturing casual 5	Purely AI-generated	100	0	0

For the two AI-generated articles that received an AI score below 50%, GPTZero provided the following classifications:

*"Lightly edited by AI: We are moderately confident this text was originally human written and polished by AI"

**"We are uncertain about this document. If we had to classify it, it would likely be a mix of AI and human"

In both cases, the combined “AI” and “mixed” percentage was above 80%.

AI Detection Tools Compared

This side-by-side comparison shows the accuracy, cost, and main features of ZeroGPT, Copyleaks, and GPTZero.

	ZeroGPT	Copyleaks	GPTZero
True positives*	66.67%	100%	100%
False positives	68.75%	25%	18.75%
True negatives	31.25%	75%	81.25%
False negatives	33.33%	0%	0%
Free plan	15,000 characters per scan. Ads are displayed on the screen.	25,000 characters per scan (more with a free account)	10,000 words per month, 3 free advanced scans, and 5 free AI highlights in the Chrome extension
Paid personal plans	Pro ($9.99/month), Plus ($19.99/month), Max ($26.99/month), and Expert (price not displayed) options with increasing character limits, number of batch files, PDF report generation, and detection histories.	$16.99/month or $13.99/month billed annually. Includes AI scanning in 30+ languages and plagiarism scanning in 100+ languages, Google Chrome extension, Google Docs add-on, and 100 scan credits for up to 25,000 words per month.	$23.99/month or $12.99/month billed annually. Includes 300,000 words per month of basic scans, advanced scans, multilingual scans, downloadable reports, and AI highlights in the Chrome extension.
Paid professional plans	Pay-as-you-go API plans (Beginner, PRO, VIP) with increasing character limits, number of batch files, and maximum file sizes. Customers on the Paid Professional plans have access to a customized version of the tool’s private API.	$99.99/month or $74.99/month billed annually. Includes advanced detection filters, full website scanning, cross-language translation detection, and 1,000 scan credits for up to 250,000 words per month.	$45.99/month or $24.99/month billed annually. Includes up to 500,000 words per month with all of the premium features, plus up to 10 million words overage, the ability to scan up to 250 files simultaneously, page-by-page scanning, and enterprise-grade security LMS integration.
Paid enterprise plans	Contact sales for pricing.	Custom enterprise and education solutions with API access and LMS integration.	Contact sales for pricing.
Additional tools	Advanced AI ChatBot, AI Summarizer, AI Paraphraser, AI Grammar Check, AI Translator, Word Counter, AI Email Helper, Citation Generator, Plagiarism Checker	Plagiarism checker, Chrome extension	Plagiarism checker, writing feedback, writing report, Chrome extension

*The two AI-simplified human articles were counted as AI articles for these calculations.
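
For readers who want to run a similar comparison on their own articles, the four rates in the table can be derived from labeled scores as sketched below. This is a generic illustration, not a record of how the figures above were calculated. The 50% cutoff, the function name, and the sample data are all assumptions made for the example.

```python
def detection_rates(samples, threshold=50.0):
    """Compute TP/FN/FP/TN rates as percentages.

    samples: list of (is_ai, score) pairs, where is_ai is the known origin
    and score is the detector's AI percentage. An article counts as
    "flagged" when score >= threshold (assumed cutoff).
    Assumes at least one AI and one human article are present.
    """
    ai_scores = [score for is_ai, score in samples if is_ai]
    human_scores = [score for is_ai, score in samples if not is_ai]

    tp = sum(score >= threshold for score in ai_scores)
    fp = sum(score >= threshold for score in human_scores)

    return {
        "true_positives": round(100 * tp / len(ai_scores), 2),
        "false_negatives": round(100 * (len(ai_scores) - tp) / len(ai_scores), 2),
        "false_positives": round(100 * fp / len(human_scores), 2),
        "true_negatives": round(100 * (len(human_scores) - fp) / len(human_scores), 2),
    }

# Hypothetical scores, not taken from the test articles.
sample_scores = [
    (True, 99.0), (True, 72.0), (True, 30.0), (True, 88.0),
    (False, 12.0), (False, 64.0), (False, 5.0), (False, 41.0),
]
rates = detection_rates(sample_scores)
print(rates)
```

True positives and false negatives are measured against the AI articles, while false positives and true negatives are measured against the human articles, so each pair adds up to 100%.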

Takeaways:

GPTZero was the most accurate AI detection tool of the three, closely followed by Copyleaks.
ZeroGPT was the least accurate AI detection tool of the three.
ZeroGPT is more accessible as a free tool with an unlimited number of scans, but less accurate overall than Copyleaks or GPTZero.
Copyleaks typically gives “all-or-nothing” scores, in contrast to ZeroGPT and GPTZero, which generally give scores along the continuum from 0 to 100.
GPTZero is the only tool of the three that breaks percentages down into AI, mixed, and human.
All three tools return false positives for human-written technical articles with a formal tone.
ZeroGPT is more easily fooled than Copyleaks or GPTZero into rating AI articles with an informal tone as “likely human.”
GPTZero is the only tool of the three that provides detailed justifications for why scanned text was rated as human or AI.
All three tools provide plagiarism checking capacity along with AI detection.
GPTZero is the only tool of the three that provides writing feedback to help human writers improve their craft.

How AI Is Used Impacts the Score

The way AI is used in the writing process makes a big difference to AI scores. Brainstorming with an AI tool doesn’t seem to lead to high AI scores. However, using it to generate an outline or to “simplify” human-written text does.

Tone & Topic Make a Difference

Technical and formal articles tend to trigger higher AI scores. Lifestyle articles and those with a conversational tone often come back with lower AI scores, even if they were written entirely by AI.

Technical writers and businesses that hire them should expect higher AI scores (and even plagiarism scores) and use other methods to confirm the writers' process. There are only so many ways to word facts like "ice is a solid" before the statement becomes factually incorrect.

It becomes important in these cases to add unique insights and examples that existing articles on the topic don't have. A little humor in the introduction and conclusion goes a long way toward "humanizing" the content, as well.

Additional Ways to Detect AI Use

If AI detection tools aren't accurate, how can you tell if a writer has used ChatGPT?

1. Look for ChatGPT Markers in the Version History

Cloud-based word processing software like Google Docs provides access to version history tracking. If your in-house writers don’t currently use them, moving to a cloud-based solution will allow you to see the following telltale signs of ChatGPT use in the writing process:

Large Blocks of Copied and Pasted Text

Entire sections (or the entire article) appearing as a copied and pasted block

Typical ChatGPT Formatting Features

Bolded terms and phrases
Extensive use of bullet points
Bullet lists with fairly consistent numbers of sub-bullets within main bullets
Extensive use of em dashes (—)
Horizontal lines between sections
Hand icons pointing to links
FAQ answers indented by a single space compared to the questions
Random headings that look like headings but are formatted as “normal text” when you select them and clear the formatting (CTRL+\)

External Link Clues

External link URLs with a ChatGPT UTM tag at the end of them
External links to source URLs that don’t exist

Generic Headings

“Common Mistakes to Avoid” as a heading (unless requested in the article outline)
“Final Thoughts” as the heading for the conclusion

Examples

Made-up “real-world” examples with company names that don’t actually exist. E.g., “Barrio Soleado Shoes, a shoe shop in Madrid, launched a customer loyalty program and saw a 40 percent increase in sales.”

Sentence Structure

The appearance of identical sentence openings across multiple articles in the same batch. For example, two or more articles with an introduction that ends with, “Let’s dive into ABC…”
“It’s not this, it’s that” turns of phrase. For example, “Great writing isn’t about sounding smarter. It’s about making readers feel smarter.”
Sentences beginning with dependent clauses that are similar to: "In today's technology-driven world," “In today's dynamic workforce,” or “In today’s competitive business landscape.”

Excessive CTAs (Calls to Action)

An additional CTA right at the end of the conclusion, which often occurs when ChatGPT is prompted to include one in a specific place in the main body of the article

Prompts or Answers

Prompts or conversational AI answers pasted in accidentally, for example: “Here is the XYZ you requested”

Caveat 1: You won't see these markers if your writers work in an on-premises word processing program like Microsoft Word, or if they write in a separate “working Google Doc” and copy and paste the final article into a "clean" document that they submit to you. It could be worth asking writers to use a single Google Doc from start to finish so you can see their process. This may sound like overkill. However, a nice, long editing history is one of the best ways for a writer to demonstrate that they really wrote their articles themselves.

Caveat 2: Blocks of text could also appear suddenly in a cloud-based document if the writer works offline and then syncs their changes all at once. It's important to know your writers and how they work.
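
A few of the markers listed above lend themselves to a quick automated pass before a manual review. The sketch below is a rough heuristic, not a detector. It assumes that ChatGPT-sourced links carry a `utm_source=chatgpt.com` parameter, and the heading list, the pasted-answer pattern, and the em dash limit are illustrative choices that you would tune to your own content.

```python
import re

# Assumption: links copied from ChatGPT carry this query parameter.
CHATGPT_UTM = re.compile(r"utm_source=chatgpt\.com", re.IGNORECASE)
# Stock headings mentioned in the list above.
STOCK_HEADINGS = {"final thoughts", "common mistakes to avoid"}
# Leftover chatbot answer such as "Here is the XYZ you requested".
PASTED_ANSWER = re.compile(r"here is the .{1,60}? you requested", re.IGNORECASE)
EM_DASH = "\u2014"

def find_markers(text, max_em_dashes=5):
    """Return a list of human-readable markers found in the text."""
    findings = []
    if CHATGPT_UTM.search(text):
        findings.append("ChatGPT UTM tag on a link")
    for line in text.splitlines():
        heading = line.strip().lower()
        if heading in STOCK_HEADINGS:
            findings.append(f"stock heading: {heading}")
    if PASTED_ANSWER.search(text):
        findings.append("pasted chatbot answer")
    if text.count(EM_DASH) > max_em_dashes:
        findings.append("heavy em dash use")
    return findings

draft = """Final Thoughts
Here is the summary you requested.
Source: https://example.com/report?utm_source=chatgpt.com
"""
print(find_markers(draft))
```

An empty result does not prove an article is human-written, and a match does not prove AI use. Treat the output as a prompt for the conversation described in the next section.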

2. Ask the Writer About Their Process

Asking the writer about the process they used for researching, writing, and editing their article may be warranted if all of their articles come back with high AI scores across tools or if you see any of the markers listed above. They might have:

Used ChatGPT for research or brainstorming.
Used ChatGPT to generate an outline, and then wrote the article themselves.
Used ChatGPT to write the entire article, and then simply shuffled the sentences or reworded them.
Used ChatGPT to generate certain sections, lists, or examples.
Used ChatGPT to simplify an article they wrote themselves.
Used an AI tool to rephrase sentences marked as "plagiarized."
Used an AI tool to create a table, based on data they provided themselves.

Once you know what's going on, discuss internally and decide which uses of AI for researching and writing you will and won't accept. Then communicate this to your writers or freelancers to avoid unpleasant surprises in the future (for either party).

Tip: GPTZero offers a Writing Report with the Chrome extension that provides a replay of edits made in Google Docs, together with a score for how "natural" the typing patterns were. This is a helpful resource for quickly identifying copy-paste work.

How to Lower AI Scores

Your company or head of marketing may require a low AI detection score as a condition for in-house article acceptance. The strategies required to achieve this depend on the tool. However, these tricks often help:

Rewrite introductions and conclusions.
Use AI tools at the brainstorming stage, and then to check spelling and grammar at the end. Don’t use them to generate an outline, lists, or steps, or to rephrase or simplify human writing after the fact. These uses will often lead to a high AI score.
Use numbered lists instead of bullet lists.
Reduce the use of bullet lists in general.
Limit the use of em dashes.
Avoid “Final Thoughts” as the heading for the conclusion.
Vary your sentence structure and avoid overused phrases.
Avoid typical ChatGPT language like "delve."
Avoid frequent use of "Not X but Y" sentence structures.
Add insightful real-world use cases and examples that are real and not hypothetical.

Note: We have often reworked entire article sections to clear the "suspected AI" score, only to see a new section flagged. If you're spending hours trying to lower the score only to find that the article is becoming less coherent, it's worth communicating this to stakeholders and suggesting other ways to ensure both quality and transparency. Running checks with two or three AI detection tools combined with spot-checking the edit histories of working docs seems to be a balanced approach.
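
The multi-tool check described in the note above can be expressed as a simple rule: only escalate an article for manual review when at least two tools flag it. The sketch below uses scores from our test tables. The 50% cutoff, the two-tool minimum, and the function names are assumptions for illustration, not settings recommended by any of the vendors.

```python
def flagged_by(scores, threshold=50.0):
    """Return the tools whose AI score meets or exceeds the cutoff."""
    return [tool for tool, score in scores.items() if score >= threshold]

def needs_review(scores, min_tools=2, threshold=50.0):
    """True when enough tools agree that the text looks AI-generated."""
    return len(flagged_by(scores, threshold)) >= min_tools

# Scores taken from the tables earlier in this article.
finance_5 = {"ZeroGPT": 86.29, "Copyleaks": 100.0, "GPTZero": 94.0}
construction_2 = {"ZeroGPT": 98.76, "Copyleaks": 0.0, "GPTZero": 0.0}

print("Finance 5:", flagged_by(finance_5), needs_review(finance_5))
print("Construction 2:", flagged_by(construction_2), needs_review(construction_2))
```

Under this rule, the human-written Construction 2 article is not escalated even though ZeroGPT scored it at 98.76%, while Finance 5 (written from an AI outline) is sent on for a look at its edit history.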

Does a Low AI Score Matter for SEO Rankings?

According to Google Search's guidance about AI-generated content, the use of automation in content production is not penalized as long as it is used to generate helpful content. Content generated "with the primary purpose of manipulating ranking in search results" (however it is made) is considered a violation of Google's spam policies.

Our own testing confirms this. Dozens of our human-written articles that were flagged as 100% AI by detection tools rank on page one of SERPs. These pieces are well-researched, expertly written, and genuinely helpful to readers.

This raises the question of whether using ChatGPT for writing SEO content is a good idea, given the potential to save time and money. It depends on how strategic you are.

Keep these points in mind when using generative AI for drafting content:

AI relies on articles that are already published on the internet, so it tends to create generic-sounding articles. Add unique information to the prompt to ChatGPT, Gemini, or your chosen generative AI tool for a higher-quality result.
AI sometimes makes factually inaccurate statements that only an expert would pick up. To avoid this problem, add your own insights and have another expert review the content before it's published.

AI Detection Tools Can Be Helpful

AI detectors are far from perfect and vary in precision. Even the best AI detection tools get it wrong. Technical articles, in particular, seem to trigger the AI detection alarm even when real human experts were the authors.

High AI detection scores across several tools may indicate that AI was used at some stage of the writing process (beyond simple brainstorming). Talk with your writers and implement strategies to detect and eliminate unauthorized AI use. Work together to find solutions that work for your writers, your customers, and your brand.


Rosanne Walters

Rosanne is the Content Manager at Avidon Marketing Group, bringing eight years of specialized experience in SEO and content marketing. She directs our clients’ content strategy through every phase, leading keyword research, copywriting, editorial governance, and writer development.