AI Testing Is Breaking Your Pipeline (Fix Quality)

About This Episode:

AI coding tools are helping teams move faster than ever, but there’s a hidden cost.

In this episode, we break down new insights from a DevOps industry report revealing a growing “velocity paradox”: teams are shipping more code, but experiencing more failures, rollbacks, and burnout.

You’ll discover why AI adoption is heavily skewed toward coding rather than testing, pipelines, or observability, and how that imbalance is creating fragile systems that break under pressure.

More importantly, you’ll learn what high-performing teams are doing differently to maintain quality while scaling speed.

What You’ll Discover:
Why AI is increasing deployment failures (and how to stop it)
The “velocity vs quality” trap hurting modern DevOps teams
How to reduce flaky tests and pipeline instability
Why observability and feature flags are now critical, not optional
Practical ways to improve your CI/CD pipeline for AI-driven development
The role of QA engineers in the age of AI (and why it’s growing, not shrinking)

If you’re a tester, automation engineer, or DevOps leader trying to keep up, this episode is for you.


About Eric Minick

Eric is a veteran of the DevOps space and recently co-authored the O’Reilly book AI-Native Software Delivery. Lately, he’s been digging into the downstream effects teams are seeing after adopting AI coding assistants.

Connect with Eric Minick

Company: Harness https://www.harness.io/
LinkedIn: https://www.linkedin.com/company/harnessinc/
Twitter: https://x.com/harnessio

What This Episode Is Really About

Harness principal product manager Eric Minick published a DevOps state-of-the-industry report that confirmed what a lot of us suspected but couldn’t prove: AI is accelerating code output while quietly torching quality, security, and developer wellbeing. This episode gets into the specific numbers, why the tools most teams are skipping matter more than ever, and what the small cohort of teams actually thriving with AI is doing differently.

The Velocity Paradox — and Why the Numbers Aren’t What You’d Expect

The term Eric used to frame the whole conversation was the “velocity paradox.” Roughly a year ago, DORA research found something strange: even though developers reported massive productivity gains from AI coding assistants, the actual volume of software being delivered hadn’t moved. More code, same output. That was the first signal something was off.

Since then, the bottleneck has moved. Teams figured out how to ship more, but at the expense of quality. The data from Eric’s report tells the story bluntly:

95% of developers use AI in coding at least once a week; about half use it daily
Only 60–70% use AI in testing — and Eric suspects a lot of that is “my vendor added an AI badge to the UI”
22% of production deployments from the heaviest AI users caused an incident, required a rollback, or needed a hotfix
69% of heavy AI users said AI-generated code caused deployment issues
70% of teams reported their pipelines are plagued by flaky tests and failures

A more unsettling stat: engineers who use AI the most report the highest levels of manual toil, 38% in that cohort compared with 32% among lighter users. The explanation isn’t complicated. If AI accelerates the coding phase but nothing else changes downstream, every non-automated step just has to happen more often. Triple the releases and, even after you’ve automated away 20% of the manual checks, you’re still doing 2.4 times as many as before.
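The toil math is easy to check. This small Python sketch assumes, for illustration only, that every release carries the same fixed set of manual checks and that automation removes a fixed fraction of them; `relative_manual_toil` is a hypothetical helper, not something from the report.

```python
def relative_manual_toil(release_multiplier: float, fraction_automated: float) -> float:
    """Manual checks per period relative to the pre-AI baseline (hypothetical helper).

    Assumes every release carries the same number of manual checks and that
    automation removes a fixed fraction of those checks from each release.
    """
    if release_multiplier < 0:
        raise ValueError("release_multiplier must be non-negative")
    if not 0.0 <= fraction_automated <= 1.0:
        raise ValueError("fraction_automated must be between 0 and 1")
    # Checks per release shrink by the automated fraction, but releases multiply.
    return release_multiplier * (1.0 - fraction_automated)


if __name__ == "__main__":
    # Three times the releases, 20% of manual checks automated away.
    print(round(relative_manual_toil(3, 0.2), 2))  # 2.4
```

Under those assumptions, manual work only stays flat when the automated fraction reaches 1 - 1/multiplier, which for a 3x release rate means automating roughly two-thirds of the checks.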

The Organizations Getting It Right Are Doing Old Things Well

This is where the conversation got interesting. There’s a minority — roughly 30% of heavy AI users — who said AI actually reduced their quality issues. Security problems down, performance problems down, fewer bugs. Eric’s read on what separates them: they brought rigor back to the fundamentals.

Spec first. Test first. TDD. Treat the AI as a pair programmer, not an autocomplete button. Run dependency firewalls. Keep pipelines strong enough that a developer’s shortcuts get caught before production. Connect observability tooling to deployment pipelines so a rollback can happen automatically — not after a human reads an alert, opens a ticket, and tracks down the DevOps team.

“If everything’s solid, you can go fast safely,” Eric said. “It’s all the stuff we’ve known we should be doing for the last 10 or 15 years.”
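To make the automatic-rollback idea concrete, here’s a minimal Python sketch of the decision a pipeline might make after a deploy. It assumes the pipeline can read request and error counts for the old and new versions from its observability tooling; `HealthSample`, `should_roll_back`, and the thresholds are hypothetical illustrations, not any specific vendor’s API.

```python
from dataclasses import dataclass


@dataclass
class HealthSample:
    """One observability reading for a service version (hypothetical shape)."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def should_roll_back(
    baseline: HealthSample,
    new_version: HealthSample,
    max_ratio: float = 2.0,
    min_requests: int = 500,
) -> bool:
    """Hypothetical gate: roll back if the new version errors far more than the baseline.

    Thresholds are illustrative assumptions, not recommended values.
    """
    if new_version.requests < min_requests:
        return False  # not enough traffic yet to judge; keep watching
    # Floor the baseline so a near-zero baseline doesn't make any single error fatal.
    baseline_rate = max(baseline.error_rate, 0.001)
    return new_version.error_rate > baseline_rate * max_ratio


if __name__ == "__main__":
    stable = HealthSample(requests=10_000, errors=20)      # 0.2% errors
    candidate = HealthSample(requests=1_000, errors=15)    # 1.5% errors
    if should_roll_back(stable, candidate):
        print("rollback")  # a real pipeline would trigger its deploy tool's rollback here
    else:
        print("promote")
```

Comparing against the live baseline rather than a fixed error budget keeps the check meaningful for services with different normal error rates, and the `min_requests` guard stops a handful of early errors from triggering a rollback on noise.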

The Role QA Engineers Should Be Playing Right Now

One of the sharpest moments in the episode was Eric’s take on where quality ownership lands when everyone’s an AI-assisted developer. His answer: testers shouldn’t be siloed, but they’re more valuable than ever. Not as a separate function churning out test cases in isolation, but as consultants embedded with engineering teams, asking the right questions before code ships.

Are you doing TDD with your coding assistant? Have you thought through the negative cases? The corner cases? Do we know whether this is actually shippable?

Eric called it “the revenge of QA” — his argument being that if writing code is no longer the hard part, the ability to think critically about whether something is ready to ship might be the most valuable skill on the team right now.
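Here’s what those questions can look like in practice, as a small Python sketch. `apply_discount` is a hypothetical function; in a test-first flow you’d write the test functions before asking a coding assistant for the implementation, and the negative and corner cases are the kind a tester would push the team to think through.

```python
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test: apply a percentage discount."""
    if price < 0:
        raise ValueError("price must be non-negative")
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


def expect_value_error(func, *args) -> bool:
    """Return True if func(*args) raises ValueError."""
    try:
        func(*args)
    except ValueError:
        return True
    return False


# Happy path: the case an assistant will usually cover on its own.
def test_typical_discount():
    assert apply_discount(100.0, 15) == 85.0


# Corner cases: the boundaries of the valid range.
def test_boundaries():
    assert apply_discount(49.99, 0) == 49.99
    assert apply_discount(49.99, 100) == 0.0
    assert apply_discount(0.0, 50) == 0.0


# Negative cases: inputs that must be rejected, not silently accepted.
def test_invalid_inputs_rejected():
    assert expect_value_error(apply_discount, -1.0, 10)
    assert expect_value_error(apply_discount, 10.0, 150)
    assert expect_value_error(apply_discount, 10.0, -5)


if __name__ == "__main__":
    for test in (test_typical_discount, test_boundaries, test_invalid_inputs_rejected):
        test()
    print("all tests passed")
```

The `test_` functions also run unchanged under pytest, so they can drop straight into an existing pipeline.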

The Confidence Gap Is Real, and It Should Worry Enterprise Teams

Here’s the stat that genuinely surprised Eric: 95% of teams in the survey rated their DevOps practice as “good” or “very good.” This from the same teams reporting nights-and-weekends work, climbing incident rates, security issues, and non-compliance risks.

His interpretation: teams that built solid CI/CD pipelines feel pride in what they built — and that pride has calcified into complacency. What was genuinely impressive a year ago isn’t sufficient when you’re generating three times the code. For teams in regulated industries — healthcare, insurance, financial services — the combination of security issues and non-compliance is a lawsuit waiting to happen.

Rapid-Fire Answers

AI coding — net positive or net risk? Net positive. But.

More tests or better tests? Better tests.

Feature flags or test coverage? Feature flags. Testing in production is where it’s at.

AI-generated tests — useful or dangerous? Both. It’s all code. You have to keep your eyes on it.

One metric every team should track in an AI-native SDLC? Change failure rate. “We are lighting production on fire way too much and we need to stop.”
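For the feature-flags answer in the rapid-fire round, here’s a minimal Python sketch of the core mechanism behind a percentage rollout. The flag name and the `is_enabled` helper are hypothetical; commercial flag platforms typically layer targeting rules, audit trails, and remote configuration on top of something like this.

```python
import hashlib


def is_enabled(flag_name: str, user_id: str, rollout_percent: int) -> bool:
    """Hypothetical helper: deterministically decide whether a user sees a flagged feature.

    Hashing flag name + user id puts each user in a stable bucket from 0 to 99,
    so the same user gets the same answer on every request.
    """
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 100
    return bucket < rollout_percent


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    enabled = sum(is_enabled("new-checkout", u, 10) for u in users)
    print(f"{enabled} of {len(users)} users see the new checkout")
    # Setting the rollout to 0 acts as a kill switch, with no redeploy needed.
    assert not any(is_enabled("new-checkout", u, 0) for u in users)
```

Because each user’s bucket is stable, raising the percentage only adds users and never switches existing ones off, which is what makes gradual exposure in production safe to reason about.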
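Change failure rate is straightforward to compute once deployments are recorded consistently. This Python sketch counts a deployment as failed if it caused an incident, was rolled back, or needed a hotfix, mirroring how the report’s 22% figure is described earlier; the `Deployment` record is hypothetical and would in practice be fed from your deploy tooling and incident tracker.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    """One production deployment record (hypothetical fields)."""
    id: str
    caused_incident: bool = False
    rolled_back: bool = False
    hotfixed: bool = False

    @property
    def failed(self) -> bool:
        # Counts as a failure if it needed any kind of remediation.
        return self.caused_incident or self.rolled_back or self.hotfixed


def change_failure_rate(deployments: list[Deployment]) -> float:
    """Fraction of deployments that needed remediation (0.0 if nothing shipped)."""
    if not deployments:
        return 0.0
    failures = sum(1 for d in deployments if d.failed)
    return failures / len(deployments)


if __name__ == "__main__":
    history = [
        Deployment("d1"),
        Deployment("d2", rolled_back=True),
        Deployment("d3"),
        Deployment("d4", caused_incident=True, hotfixed=True),
        Deployment("d5"),
    ]
    print(f"change failure rate: {change_failure_rate(history):.0%}")  # 40%
```

A deployment that triggers both an incident and a hotfix still counts once, so the rate measures how often a release goes wrong rather than how many things went wrong.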

Rate and Review TestGuild

Thanks again for listening to the show. If it has helped you in any way, shape, or form, please share it using the social media buttons you see on the page. Additionally, reviews for the podcast on iTunes are extremely helpful and greatly appreciated! They do matter in the rankings of the show and I read each and every one of them.