12 BEST AI Test Automation Tools for 2026: The Third Wave

The Bottom Line for 2026:

After 25+ years in QA and interviewing over 580 automation experts on the TestGuild podcast, the reality is that most “AI testing tools” are just GPT wrappers. The AI test automation tools actually delivering ROI for enterprise teams right now fall into three categories: visual validation (Applitools), autonomous test generation (BlinqIO, Mabl), and agentic self-healing execution (Perfecto, now part of Perforce). If you want to cut through the hype and find the right tool for your specific stack, use our interactive Tool Matcher. Otherwise, here are the 12 AI testing tools actually worth your time.

AI Testing Waves

I first wrote about the AI “three waves” back in 2017, and honestly?

I thought I was just documenting a trend.

Turns out, I was watching a revolution.

(Quick note: I originally wrote this in 2017 and updated it in 2023 to keep up with all the innovation in this space since it was first posted. Now in late 2025, we’re living in what I predicted – the third wave is real, and it’s changing everything.)

Here’s what’s wild: back in 2017, when I first started talking about AI in testing on my TestGuild podcast, people thought I was overselling it. “Joe, is this just hype?” they’d ask. Now in 2025, 81% of development teams use AI in their testing workflows.

The question isn’t “should we use AI?” anymore. It’s “which AI tool won’t waste our time?”

And that’s what this guide is really about.

Cucumber + GenAI: BlinqIO – AI meets prompt engineering
Autonomous everything: testers.ai – AI agents write and run tests
Agentic workflows: Mabl – Autonomous test agents
All-in-one platform: Katalon – Self-healing + AI generation
Visual testing: Applitools – Visual AI pioneer
Codeless automation: ACCELQ – Generative AI test creation
Test observability: BrowserStack – AI root cause analysis
Reducing flaky tests: Testim – ML-powered locators
LLM-powered: TestMu KaneAI – Natural language tests
No selectors: TestResults.io – Selector-free testing
Enterprise: Tricentis – Fully codeless AI
Manual regression testing: Parasoft (TIA) – Code coverage-driven prioritization

Find the Right Test Tool For You Now with our Tool Matcher

Frequently Asked Questions About AI Test Automation Tools

Will AI test automation replace QA engineers?

Real talk: No, and I’ll tell you why. After interviewing 500+ testing leaders on the TestGuild Podcast, here’s what I’ve learned: AI handles the grunt work that burns out good engineers. We’re talking test generation, flake detection, self-healing selectors, the repetitive stuff that makes testers want to quit. But here’s what AI can’t do: understand business risk, know which bugs actually matter to users, or make judgment calls when your test results are ambiguous. The teams I’ve talked to at Automation Guild who are winning with AI? They’re using it to amplify their engineers’ impact, not replace them. Test everything and keep the good, including the human expertise.

Are AI test automation tools ready for real production use?

Some are, most aren’t. I’ve seen this firsthand covering AI testing tools for TestGuild over the past few years: the vendors who are honest admit they’re solving specific problems, not replacing your entire test stack. Tools focused on targeted use cases like Selenium self-healing, visual regression analysis, or Playwright test generation are in production CI/CD pipelines right now. Full autonomous testing with zero human oversight? That’s mostly conference demo magic. On the TestGuild Podcast, I’ve talked to engineering teams at companies like Microsoft and Google who use AI in production, but they’re extremely selective about which problems they’re solving with AI versus traditional automation.

How do I choose the right AI testing tool for my team?

Start with the pain, not the hype. What’s actually breaking your workflow right now? Flaky tests eating your CI/CD time? Maintenance hell every time the UI changes? Coverage gaps you can’t close with your current headcount? Then find the tool that solves that specific problem for your stack. Don’t buy a “complete AI testing platform” that does everything poorly. I built the TestGuild Tool Matcher specifically for this: answer a few questions about your tech stack and pain points, and it’ll recommend tools that actually fit your situation. And if you’re evaluating tools, ask vendors the hard questions I ask in my podcast interviews: How does this integrate with our existing framework? What’s the learning curve for my team? Can we export the tests if we need to leave?

Do AI testing tools work with Playwright, Selenium, or Cypress?

Most modern AI tools integrate with the big frameworks (Playwright, Selenium, Cypress, and even older WebDriver-based stacks), but the quality of that integration varies wildly. Some tools bolt onto your existing tests through plugins or APIs. Others generate tests in your framework’s native language (like JavaScript for Playwright or Python for Selenium) so you own the output. The sketchy vendors lock you into their proprietary format and make migration painful. When I interview AI tool founders on the TestGuild Podcast, I specifically ask them about framework compatibility because I know that’s where teams get burned. Before you commit to any AI testing tool, run a proof of concept with your actual application and framework, not just their demo app.

The Three Waves: How We Got Here

Before I dive into tools, you need to understand this framework.

It’ll help you cut through the AI washing and marketing BS that’s everywhere now.

First Wave: The Vendor Lock-In Era (1990s-2000s)

I cut my teeth on WinRunner. Man, I loved that tool. Then Mercury killed it for QTP, and my heart broke a little.

That was the first wave – proprietary tools that locked you in:

WinRunner, Silk Test, QTP – The OGs
Proprietary everything – Each vendor had their own scripting language (TSL for WinRunner, anyone?)
Record and playback – Sounded great, produced brittle garbage
Expensive as hell – Enterprise pricing before “enterprise” pricing was cool

The problem? When the vendor pivoted (or died), you were screwed. Your entire test infrastructure could become obsolete overnight.

Enter the second wave.

Second Wave: Open Source Changes Everything (2004-2020)

Then Selenium happened. And Selenium changed everything.

I’ve interviewed dozens of people on my podcast about this shift – from Jason Huggins (Selenium’s creator) to folks building Cypress and Playwright.

The second wave was all about:

Open source – Free, community-driven, no vendor lock-in
Web-first – Built for the modern web app explosion
Developer-focused – Real programming, not wizard-driven nonsense
Explosion of tools – Cypress, Playwright, Appium, and hundreds more

But here’s what nobody talks about: this wave just moved the pain. Instead of paying vendors, you paid engineers. Instead of brittle record-playback, you got brittle selectors. Different problem, same headache.

By 2017, we were seeing early ML attempts – basic self-healing, visual AI.

But the real AI explosion? That didn’t hit until ChatGPT launched in late 2022.

Suddenly everyone was racing to add LLMs to their testing tools.

Third Wave: AI That Actually Works (2020-Present)

Here’s where we are now. And I’m not gonna lie – after interviewing hundreds of testing experts on TestGuild, I’m cautiously optimistic about this wave.

What makes a tool “third wave”?

Self-healing – Tests adapt when your app changes
Natural language – Write tests in plain English
Autonomous agents – AI that can reason and make decisions
Visual intelligence – “Sees” your app like a human does
Predictive smarts – Knows which tests to run and when

The big shift? Third wave tools don’t just run your tests faster. They actively reduce the maintenance burden that’s been killing teams since the Selenium days.

Look, I’m a skeptic by nature. But after testing these tools myself and talking to teams using them in production, this is real. Not perfect. Not magical. But real.

The 12 Tools Actually Worth Your Time

Alright, let’s get into it.

I’ve personally tested, used, or extensively interviewed founders/users of every tool here.

No fluff, just what I’ve learned.

1. BlinqIO: Where Cucumber Meets Generative AI

Podcast Connection: I had founders Guy Arieli and Tal Barmeir on episode A485 to talk about “AI Meets Cucumber: A New Testing Approach Using Prompt Engineering”. They’re also a Platinum Sponsor at Automation Guild 2025.

Here’s what got me excited: Guy and Tal are serial entrepreneurs with 25 years in testing. Their previous company, Experitest (now Digital.ai), was at the forefront of mobile test automation. Instead of retiring with a pile of money, they built BlinqIO.

The Innovation: BlinqIO calls Cucumber a “test speak” language – a way to communicate precisely with AI. Their AI virtual testers translate test scenarios into automation code, and here’s the kicker: they work 24/7. As Tal told me on the podcast, “You can have an army of virtual testers underneath you that work during the night.”

What Makes It Third Wave:

AI Test Engineer – Automatically generates BDD (Gherkin) scenarios from feature requirements
AI Recorder – Captures test steps and generates Playwright code + business descriptions
Self-healing – Detects UI changes and automatically recovers/fixes tests
No vendor lock-in – Complete access to your project code in a private Git repository
Multilingual – Supports testing in 50+ languages

From Our Podcast Conversation: Guy emphasized how generative AI creates a “synthetic human brain” that dramatically boosts tester productivity.

Unlike tools that replace testers, BlinqIO augments them – testers direct the AI army.

Real Results:

RedHat Test Automation Engineer reported 10x boost in test creation efficiency
Vodafone Team Leader praised seamless integration into team processes

Best For: Teams already using or familiar with Cucumber/BDD, organizations wanting AI without vendor lock-in, global companies needing multilingual testing

Pricing: Freemium model available

Check it out: blinq.io

2. testers.ai: The Ex-Google Team Bringing Chrome-Level Testing to Everyone

What I Love: Built by engineers who tested Chrome at Google. These folks know what actual enterprise testing looks like.

When I first saw testers.ai, I thought “oh great, another ‘autonomous AI’ promise.” Then I dug deeper. The team behind this – they’re the ones who built the testing infrastructure that keeps Chrome running for billions of users. That’s a different pedigree than most startups.

The Hook: AI agents that write AND run tests. No scripts to maintain. No brittle selectors. No manual clicking through the same flows for the hundredth time.

Two Types of Checks:

Autonomous Static Checks – AI scans your app for the basics you’re probably missing:

Performance issues
Privacy & consent problems
Security vulnerabilities
Third-party supply chain risks
API design issues
Error handling gaps

Autonomous Dynamic Checks – This is where it gets interesting. AI analyzes your app and generates interactive tests covering:

Happy paths (the obvious stuff)
Edge cases (the stuff that breaks in production)
Invalid inputs (the stuff users WILL try)
Statistically likely bugs (based on patterns across millions of apps)

Plus – and this is clever – it gives you “Copilot fix prompts” you can paste directly into GitHub Copilot or Cursor to fix issues.

Real Talk: The claim is that tests that used to take 8-12 hours to write now run in minutes. I haven’t validated that personally, but knowing their background, I believe the tech is solid. Bonus: Jason Arbon has been on multiple TestGuild podcast episodes and Automation Guild sessions.

Best For: Teams who want Google-level testing without hiring a Google-sized QA team

Pricing: Not published, targeting teams who previously couldn’t afford this level of coverage

Check it out: testers.ai

3. Mabl: Agentic AI (Finally Living Up to the Hype)

Podcast Connection: I’ve been following mabl since they started, and recently had them on TestGuild to talk about their new agentic workflows.

When mabl talks about “agentic workflows,” they mean AI that acts like a skilled human tester. Not just running scripts – actually thinking about what to test.

What’s New in 2025:

Test Creation Agent – Give it requirements in plain English, it builds your test suite
mabl MCP Server – IDE integration that lets you query tests with natural language
Auto TFA – Autonomous root cause analysis for every failure
Visual Assist – Adapts tests when UI changes

My Take: I’ve seen a lot of tools claim “autonomous” testing. Mabl is one of the few actually delivering on it. Their approach to test creation from user stories is legitimately impressive.

Real Results I’ve Heard:

One team told me they’d save $240K over 2 years vs. Selenium
Another said they went from 2 weeks of work to 2 hours

Best For: Teams ready to embrace truly autonomous testing, unified testing across web/mobile/API

Pricing: Starts around $450/month

Learn more: mabl.com

4. Katalon: The Gartner-Approved Choice

Podcast Connection: We’ve had Katalon folks on multiple times discussing their AI features.

Katalon got named a Visionary in the 2025 Gartner Magic Quadrant. That’s enterprise-speak for “these folks are legit.”

What I like about Katalon is they’re not chasing hype. They’ve built a solid all-in-one platform that works for teams at different skill levels.

Key Features:

No-code test creation (for beginners)
Full scripting capabilities (for experts)
Self-healing scripts (reduces maintenance)
AI-powered test generation
Covers web, mobile, API, and desktop

My Take: If you need ONE tool that does everything reasonably well, Katalon’s your answer. It’s not the flashiest, but it’s reliable.

Best For: Teams with mixed technical skills, organizations wanting an all-in-one solution

Pricing: Free tier available (actually usable), premium starts at $208/month

Check it out: katalon.com

5. Applitools: Visual AI That Made Me a Believer

Podcast Connection: I interviewed founder Adam Carmi back in the early days (listen to episode 43), and he’s been back on the show multiple times.

I’ll be honest – when Adam first told me about visual validation testing in 2015, I thought it was BS. “An algorithm that finds bugs without explicitly defining elements? Come on.”

Then I tried it. And my skeptical mind was blown.

Why Applitools Is Different: No pixel-by-pixel comparisons. No fragile baseline images. Their Visual AI actually understands what matters visually and what doesn’t.

What’s New in 2025:

AI-based self-healing execution cloud
Automated maintenance grouping – ML clusters similar changes across pages/browsers/devices
Smart diff prioritization – AI knows what’s a bug vs. an intentional change

Real Story: One company saved a million dollars a year by replacing thousands of assertion lines with visual checkpoints. A MILLION. DOLLARS.

My Take: If you’re doing any UI testing and not using Applitools, you’re working too hard.

Best For: Visual regression testing, cross-browser validation, teams obsessed with UI/UX quality

Pricing: Starts at $199/month

Try it: applitools.com

6. ACCELQ: Generative AI Gets Real

What I’ve Seen: ACCELQ’s approach to generative AI is different from most. They’re using LLMs to actually understand test intent, not just generate scripts.

Key Features:

Plain English test creation – No rigid syntax, just describe what you want
Autonomous healing – Handles complex element type changes automatically
Logic insights – AI analyzes your test design and suggests optimizations
Reusable test assets – Reduces duplication across your test suite

My Take: The “logic insights” feature is underrated. It’s like having a senior test engineer review your work and suggest improvements.

Best For: Teams wanting to scale test coverage fast, organizations moving from manual to automated

Pricing: Custom enterprise pricing

Learn more: accelq.com

7. BrowserStack Test Observability: AI Debugging That Doesn’t Suck

What It Does: Turns test failure chaos into clear root causes using AI.

Look, everyone has test reporting. BrowserStack’s Test Observability actually uses AI to tell you WHY tests failed and whether it’s a product bug, automation issue, or environment problem.

Key Features:

AI-powered root cause analysis – No more digging through logs for hours
AI-based tagging – Automatically categorizes failures
Smart prioritization – Tells you what to fix first
Works anywhere – BrowserStack, local, other platforms

My Take: If you have a large test suite and spend hours debugging failures, this pays for itself immediately.

Best For: Teams with 100+ tests, distributed teams needing unified observability

Pricing: Starts at $29/month (add-on to BrowserStack)

Try it: browserstack.com/test-observability

8. TestResults.io: No More Selector Hell

Podcast Connection: I had founder Tobias Müller on the show (episode on Next Gen Functional Visual Testing), and what he showed me was legitimately innovative.

The big idea: What if you never had to deal with XPath, CSS selectors, or element IDs ever again?

How It Works: You describe what users do in plain language. TestResults.io figures out the rest. No selectors. Just user journeys.

Key Benefits:

3x faster testing (according to their data)
Eliminates flakiness through AI stability
Massive maintenance reduction
Works across any platform users can interact with

My Take: If selector maintenance is killing your team (and it probably is), check this out.

Best For: Cross-platform testing, teams tired of selector maintenance

Pricing: Custom

Try it: testresults.io

9. Testim: ML for Locator Intelligence

Podcast Connection: I spoke with co-founder Oren Rubin about their mission to make test automation accessible beyond just developers.

Testim uses machine learning specifically to solve the “flaky test” problem that drives everyone crazy.

How It Works: Multiple fallback strategies for finding elements. If one locator breaks, ML automatically tries others. Tests self-correct when UI changes.
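
Here’s a rough sketch of that fallback idea in Python. To be clear, this is my own illustration, not Testim’s implementation: the page is modeled as a plain dictionary, the locator strategies and element names are made up, and the fallback order is fixed where a real tool would rank candidates with a learned model. It only shows the control flow: try locators in priority order and report which one matched, so a stale primary locator can be flagged for repair.

```python
from typing import Dict, Optional, Sequence, Tuple

Locator = Tuple[str, str]          # (strategy, value), e.g. ("id", "login-button")
Page = Dict[Locator, str]          # hypothetical stand-in for a rendered page

def find_with_fallback(
    page: Page, locators: Sequence[Locator]
) -> Tuple[Optional[str], Optional[Locator]]:
    """Try each locator in order; return the element and the locator that matched."""
    for locator in locators:
        element = page.get(locator)
        if element is not None:
            return element, locator
    return None, None

# The button's id changed in a new build, so the primary locator is stale.
page_after_ui_change: Page = {
    ("text", "Log in"): "login-button-element",
    ("css", "form button[type=submit]"): "login-button-element",
}

locators = [
    ("id", "login-button"),                 # primary locator, no longer matches
    ("text", "Log in"),                     # first fallback
    ("css", "form button[type=submit]"),    # second fallback
]

element, used = find_with_fallback(page_after_ui_change, locators)
print(element, used)
```

In this toy run the test still finds the button through the text fallback, and the returned locator tells you the primary one needs updating.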

Key Features:

ML-powered locators – Multiple ways to find elements
Smart execution – AI optimizes test order
Intelligent grouping – Related failures grouped for efficient debugging
Auto-healing – Tests fix themselves

My Take: They’re laser-focused on one problem (flaky tests) and solving it well. I respect that approach.

Best For: Developer teams, CI/CD environments, teams fighting test flakiness

Pricing: Starts at $450/month

Learn more: testim.io

10. LambdaTest KaneAI: Modern LLMs Meet Testing

What’s Different: Built on modern large language models – think GPT-level natural language understanding.

KaneAI lets you create, debug, and evolve tests using natural language. And because it’s LambdaTest, you get their entire cloud infrastructure for cross-browser testing.

Key Features:

Natural language test creation
LLM-powered debugging
Autonomous test evolution
Integrates with LambdaTest’s cross-browser platform

My Take: This is where testing is heading – conversational interfaces powered by modern AI. Check out my podcast episode all about this: “AI as Your Testing Assistant” with Mudit Singh.

Best For: Teams wanting cutting-edge LLM tech, cloud-based cross-browser testing

Pricing: LambdaTest starts at $15/month

Check it out: lambdatest.com/kane-ai

11. Tricentis: Enterprise AI at Scale

What It Is: The big enterprise play. Fully AI-driven, fully codeless, built for massive scale.

If you’re a large enterprise with SAP, mainframes, and a complex application portfolio, Tricentis is built for your world.

Key Features:

AI-powered test design and generation
Automated maintenance at enterprise scale
Intelligent test execution optimization
Packaged application testing (SAP, Salesforce, etc.)

My Take: Not for startups. But if you’re a Fortune 500 with complex enterprise apps, this is the tool built for you.

Best For: Large enterprises, SAP environments, complex application portfolios

Pricing: Custom enterprise pricing

Learn more: tricentis.com

12. Parasoft Test Impact Analysis: Data-Driven Test Selection

Podcast Connection: I had Wilhelm Haaker (Director of Solution Engineering) and Daniel Garay (Director of QA) walk me through their Test Impact Analysis approach and give me a hands-on demo.

Here’s my pet peeve from being an automation engineer: someone checks in code, the build triggers, and you’re debugging test failures that have NOTHING to do with what changed. Pure noise.

Parasoft’s Test Impact Analysis solves this using actual code coverage data. Instead of guessing which tests to run, it tells you exactly which tests are affected by code changes.

How It Works: Captures code coverage from your manual test sessions (not just unit tests). When new code deploys, it compares what changed and says: “These specific tests need to be rerun.”
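
To make the idea concrete, here’s a minimal sketch of coverage-based test selection. This is an illustration under my own assumptions, not Parasoft’s implementation: I’m assuming coverage is stored per test as a set of file names, and the test and file names below are hypothetical. The real product may track coverage at a finer granularity.

```python
from typing import Dict, Iterable, List, Set

def select_impacted_tests(
    coverage: Dict[str, Set[str]], changed_files: Iterable[str]
) -> List[str]:
    """Return the tests whose recorded coverage touches any changed file."""
    changed = set(changed_files)
    return sorted(test for test, files in coverage.items() if files & changed)

# Hypothetical coverage recorded during earlier test sessions:
# test name -> source files that test exercised.
coverage = {
    "checkout_flow": {"cart.java", "payment.java"},
    "login_flow": {"auth.java"},
    "profile_update": {"auth.java", "profile.java"},
}

impacted = select_impacted_tests(coverage, ["auth.java"])
print(impacted)  # ['login_flow', 'profile_update']
```

A change to auth.java selects only the two tests that actually exercised it, and checkout_flow stays out of the rerun list.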

What It Supports:

Languages: Java and C# (Spring Boot, .NET)
Test Framework: Agnostic – works with any framework
Integration: Imports tests from Jira X-ray, Azure DevOps
Deployment: Web server, supports Kubernetes

My Take: As Daniel put it – QA works in a black box. You don’t see code changes. This gives you data-driven answers instead of stress-driven guessing. Not magic, but practical.

Real Benefit: Wilhelm’s point resonated: time savings doesn’t mean less work – it means deeper exploratory testing in areas that actually changed.

Best For: Teams with large regression suites, Java/Spring Boot or .NET apps, organizations needing to justify test scope with data

Pricing: Custom enterprise pricing

Learn more: parasoft.com

Overwhelmed? Use My Free Tool Matcher

Look, I get it – 12 tools is a lot to process. That’s exactly why I built the TestGuild Tool Matcher.

Answer a few quick questions about your tech stack, budget, and testing goals, and it’ll shortlist the best options from over 300 tools (including all the ones in this article).

Takes about 60 seconds. Completely free. No email required, no sales BS. Just a straight answer about what tools actually fit your needs.

Try the Tool Matcher now →

How to Actually Choose (By Pain Point)

By Team Size

Small Teams (1-10): Go with testers.ai, BlinqIO, or LambdaTest KaneAI. Low learning curve, affordable, fast value.

Mid-Size Teams (10-50): Mabl, Katalon, or Testim. Good balance of power and usability.

Enterprise (50+): Tricentis, ACCELQ, or Katalon Enterprise. Built for scale.

By Primary Pain Point

“Our tests are flaky as hell” → Testim or BrowserStack
“Maintenance is killing us” → TestResults.io or testers.ai
“We need visual testing” → Applitools (no question)
“Want plain English tests” → testers.ai, BlinqIO, or ACCELQ
“Need autonomous agents” → Mabl (most advanced)
“Love Cucumber/BDD” → BlinqIO (built for it)
“Need everything in one” → Katalon or Tricentis

By Technical Skill

Non-Technical Team:

testers.ai
BlinqIO
ACCELQ

Mixed Skills:

Mabl
Katalon
Testim

Highly Technical: Any of these work. Focus on integration capabilities.

Real Talk: Is AI in Testing Just Hype?

I get asked this on every podcast episode. Here’s my honest answer:

In 2017? Yes, mostly hype.

In 2023? Getting real, but oversold.

In 2025? It’s mainstream. The question isn’t “is it hype?” but “which tools actually deliver?”

After interviewing Jason Huggins (Selenium creator), Ben Fellows (LoopQA), Guy Arieli and Tal Barmeir (BlinqIO), Jim Trentadue, and dozens of other testing leaders on TestGuild, here’s what I’ve learned:

AI won’t replace QA engineers. But it WILL change what they do:

Less time writing/maintaining scripts
More time on exploratory testing
More time on test strategy
More time analyzing quality trends
More time on complex scenarios AI can’t handle

The teams winning aren’t the ones avoiding AI. They’re the ones figuring out how to work WITH it.

Wait… Is There Already a Fourth Wave?

Podcast Connection: I just had Don Jackson on episode A554, and what he showed me made me question everything I thought I knew about where automation testing is heading.

Don recently joined Perfecto (now part of Perforce), and their new agentic AI approach is fundamentally different from every tool in this article. Here’s why:

Third Wave vs Fourth Wave: The Critical Difference

Third Wave Tools (everything above in this guide):

AI helps CREATE scripts
AI MAINTAINS scripts
AI HEALS scripts when they break
But there’s still a SCRIPT being executed

Perfecto’s Fourth Wave Approach: No script. Ever.

Instead, you write goal-oriented prompts in natural language. Don’s example from our podcast:

“Book a flight from San Francisco to New York in business class, prefer an aisle seat, second preference is window seat. If there are no flights available that have one of those seats, I don’t want to sit in the middle. Come back with an error message.”

That’s it. That’s your entire “test.”

How It Actually Works

At runtime, the AI:

Takes a screenshot of your application
Interrogates the image to understand context
Makes decisions about what to do next to achieve the goal
Handles UI changes automatically (because there’s no brittle script to break)
Works across web, iOS native, Android native, mobile responsive – all from ONE test
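
The runtime steps above boil down to an observe, decide, act loop. Here’s a toy sketch of that loop, and I want to be upfront that it is not Perfecto’s code: the “screen” is just a label standing in for a screenshot, the decision function is a hard-coded lookup standing in for the AI’s reasoning, and every name in it is hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Decide = Callable[[str, str], str]   # (goal, screen) -> next action

def run_goal(
    goal: str,
    start_screen: str,
    decide: Decide,
    transitions: Dict[Tuple[str, str], str],
    max_steps: int = 10,
) -> List[str]:
    """Observe the screen, ask for the next action, act, and repeat until done."""
    screen = start_screen
    actions: List[str] = []
    for _ in range(max_steps):
        action = decide(goal, screen)            # a real agent would query a vision model here
        if action == "done":
            return actions
        actions.append(action)
        screen = transitions[(screen, action)]   # stand-in for driving the real app
    raise RuntimeError("goal not reached within max_steps")

def scripted_decide(goal: str, screen: str) -> str:
    """Hard-coded policy standing in for the AI's runtime reasoning."""
    return {"login_page": "submit_credentials", "home_page": "done"}[screen]

transitions = {("login_page", "submit_credentials"): "home_page"}
steps = run_goal("Log into the application", "login_page", scripted_decide, transitions)
print(steps)  # ['submit_credentials']
```

Notice there is no stored script: the list of actions is a by-product of the run, and the step cap is there so a confused agent fails loudly instead of looping forever.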

Don’s tagline says it all: “No scripts, no frameworks, no maintenance”

Real Example That Blew My Mind

I asked Don about reliability concerns. He told me about testing a weather app:

He wrote: “If the app isn’t installed, go install it.”

What the AI did autonomously:

Recognized it was on an Android device (not iOS)
Swiped from bottom to check app catalog
Didn’t find it, so did a search
Still not found, clicked home
Opened Play Store (not App Store – it knew!)
Searched for the app
Clicked install
Waited and checked progress bar repeatedly until done
Clicked “Open” when button appeared

No explicit loop scripting. No device-specific logic. No progress bar waits coded. Just one simple prompt.

As Don said on the podcast: “Think about how hard that would be to script today.”

What Makes This “Fourth Wave”?

The difference is agency – real, autonomous decision-making:

Third Wave Example:

AI generates:
click('#login-button')
type('#username', '[email protected]')
type('#password', 'password123')
click('#submit')

Script is created and executed.

Fourth Wave Example:

Prompt: “Log into the application”

AI figures out HOW at runtime based on what it sees.

Things That Were Previously “Untestable”

Don mentioned several beta customers finding use cases nobody expected:

Financial Services Company: Stock trading app with dynamic graphs. The AI can now validate:

If a price point is higher than the previous point, it should show green (not red)
The chart visualization matches the numbers in the table below
All this with DYNAMIC data (no static test data required)
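
The green/red rule is a nice example of checking a relationship instead of fixed values. Here’s a small sketch of how that rule could be expressed in code. It is my illustration, not how Perfecto does it: the prices and rendered colors are invented, and I’m assuming any point that is not higher than the previous one should be red, which the podcast example doesn’t spell out.

```python
from typing import List, Sequence

def expected_colors(prices: Sequence[float]) -> List[str]:
    """Expected color for each point after the first: green if it rose, red otherwise."""
    return ["green" if curr > prev else "red" for prev, curr in zip(prices, prices[1:])]

def mismatches(prices: Sequence[float], rendered: Sequence[str]) -> List[int]:
    """Positions (among the points after the first) where the rendered color is wrong."""
    expected = expected_colors(prices)
    return [i for i, (want, got) in enumerate(zip(expected, rendered)) if want != got]

# Hypothetical live data scraped from the chart at test time.
prices = [101.2, 102.0, 101.5, 103.1]
rendered = ["green", "green", "green"]

print(expected_colors(prices))       # ['green', 'red', 'green']
print(mismatches(prices, rendered))  # [1]
```

Because the check is derived from whatever prices show up at runtime, it works with live data and needs no static test fixture.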

E-commerce Company: Product images with descriptions

They run marketing campaigns where descriptions change (“Sale on Laptops!” added to everything)
Couldn’t test these campaigns before (static data problem)
Now they can validate: “Does the text match the picture? If it says ‘HP laptop with 17-inch screen and 10-key’, does the image show the HP logo and 10-key keyboard?”

Accessibility Testing: One prompt: “Make sure this page matches WCAG 2.0 standards”

The AI grabs those standards, checks compliance, reports back. Done.

My Honest Take (The Skeptic’s View)

Look, I’ve been in automation for 25+ years. I’ve seen a LOT of “revolutionary” promises that turned into vaporware.

When Don first described this 18 months ago, I thought it was interesting theory. When he demoed it, I was intrigued. Now that it’s actually released and I’ve seen real customer results?

The Good:

Solves the selector maintenance nightmare
Works across platforms without rewriting
Enables non-technical testers to automate
Handles complex scenarios that were too hard to script

The Trade-offs:

Slower than traditional scripts (it’s taking screenshots and processing them)
Requires good prompting skills (vague prompts = vague results)
You need to build trust through auditing early on
Not a replacement for API testing or unit testing

The Real Question: Is this production-ready today?

For some use cases – absolutely. For dynamic UIs like Salesforce Lightning (Don’s example), for exploratory testing, for applications that change frequently.

For high-speed regression suites where you need maximum performance? Maybe not yet.

The Controversial Take: Scripters vs Testers

Don said something on the podcast that’s going to upset some people:

“Some of the best testers I’ve known in my career are the worst scripters. And conversely, some of the best scripters were the worst testers because they didn’t have that destructive mindset. Wouldn’t it be amazing if I could have my best testers be able to do automation?”

I’ve seen this my entire career. The business experts who understand the domain can’t automate. The automation experts don’t understand the business context.

Fourth wave tools might finally bridge that gap.

Exploratory Testing, Automated

This is what really got me excited. Don described a beta customer who asked the AI:

“Find all the different paths to get to the shopping cart.”

The AI found 12 paths.

The customer only knew about 9.

Think about that. Automated exploratory testing that discovers things your manual testers missed.
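
If you want a feel for what “find all the paths” means mechanically, here’s a tiny sketch that enumerates routes through a navigation graph with a depth-first search. This is just an illustration: the screens and links are hypothetical, and an agent like the one Don described discovers the graph by interacting with the app, where this sketch is handed the graph up front.

```python
from typing import Dict, List

def all_paths(graph: Dict[str, List[str]], start: str, goal: str) -> List[List[str]]:
    """Enumerate every simple path (no repeated screens) from start to goal."""
    paths: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if node == goal:
            paths.append(path)
            return
        for nxt in graph.get(node, []):
            if nxt not in path:          # skip screens already visited on this path
                walk(nxt, path + [nxt])

    walk(start, [start])
    return paths

# Hypothetical navigation map: screen -> screens reachable in one step.
nav = {
    "home": ["search", "product", "cart"],
    "search": ["product"],
    "product": ["cart"],
}

cart_paths = all_paths(nav, "home", "cart")
for p in cart_paths:
    print(" -> ".join(p))
```

Even this three-screen map yields three distinct routes to the cart, which hints at why a real app can have paths nobody on the team has written down.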

Should You Adopt This Now?

Immediate Use Cases:

Salesforce Lightning testing (notoriously difficult to automate)
Dynamic applications that change frequently
Multilingual testing (works in 98% of languages)
Accessibility compliance checking
Exploratory test automation

Wait a Bit If:

You have stable apps with established automation
You need maximum execution speed
Your team isn’t comfortable with AI/prompting
You’re just getting started with automation (learn traditional first)

My Prediction

In our podcast conversation, Don mentioned he’d been talking about this concept for 18 months and calling it “goal-oriented testing.” The fact that multiple companies (including Perfecto) are now building this approach tells me something:

This is where testing is going.

Not in 10 years. In the next 2-3 years.

The tools in the main part of this article (third wave) are amazing and will continue to evolve. But I think we’re watching the fourth wave emerge right now.

Check it out: Perfecto – Look for their Agentic AI features (released July 15, 2025)

Watch my hands-on review: TestGuild YouTube Channel – I did a deep dive showing this in action

Hear the full conversation: TestGuild Podcast Episode A554 with Don Jackson

My Actual Recommendation

Stop overthinking it. Pick 2-3 tools from this list based on your primary pain point. Get trial access. Build the same 5 tests in each. See which one clicks with your team.

For the adventurous: Try Perfecto’s new agentic AI on one particularly painful automation scenario. See if runtime decision-making works better than scripting.

Don’t wait for perfect. Start experimenting this quarter.

The teams I see succeeding with third-wave tools aren’t necessarily the ones with the biggest budgets or most engineers. They’re the ones who started early and learned by doing.

And the teams that will lead in the fourth wave? They’re experimenting with these agentic approaches RIGHT NOW.

Stay Connected

Want more? Here’s how to keep learning:

TestGuild Podcast: Every week I interview testing leaders about what’s actually working. We’ve covered AI testing extensively with folks from BlinqIO, Applitools, Testim, Reflect, and many more. Subscribe here

Automation Guild Conference: My annual online conference brings together the biggest names in test automation. We’ll have sessions specifically on AI testing tools. Learn more

Weekly Newsletter: I send out weekly updates on the latest tools, trends, and techniques. No BS, just actionable insights. Join 40,000+ subscribers