Thursday, December 04, 2025

AI Is a Power Tool, Not a Magic Wand. Why 95% of AI Projects Fail—and What the Survivors Do Differently.

There's a seductive lie circulating in boardrooms and pitch decks: that AI is a turnkey solution. Plug it in, point it at your problem, watch the magic happen.

MIT's research tells a different story. Ninety-five percent of AI initiatives never make it to production. They die in pilots. They rot in proof-of-concept purgatory. They get abandoned when the team realizes that "works in a notebook" and "works in the real world" are separated by an ocean of engineering.

The survivors—the 5%—don't treat AI as a destination. They treat it as a power tool in a well-stocked shop. And they remember that a table saw is useless without a blueprint, measurements, and someone who knows how to build.

The Prompting Delusion

Let's be direct: prompting is not AI development. Prompting is using AI. It's valuable. It's a skill. But it's the equivalent of knowing how to use a search engine and calling yourself a data engineer.

The market is now flooded with "AI solutions" that are, architecturally, a wrapper around an API call to GPT-4. There's a text box. There's a system prompt someone spent an afternoon on. There's a prayer that the model doesn't hallucinate in front of a customer. This isn't engineering. It's hope with a billing page.

Real AI development means: Training or fine-tuning models on domain-specific data. Building retrieval systems that ground responses in verified sources.Designing evaluation frameworks that catch failures before users do. Creating feedback loops that improve performance over time. Engineering for graceful degradation when the model gets it wrong.

The difference between prompting and AI engineering is the difference between driving a car and building one.

SDLC Isn't Optional Anymore

The software development lifecycle exists because we learned—through decades of expensive failures—that complex systems require discipline. Requirements. Design. Implementation. Testing. Deployment. Maintenance. AI doesn't get a pass on this. If anything, AI systems demand more rigor because they introduce a new failure mode: systems that are confidently, plausibly wrong.

Traditional software fails predictably. It throws errors. It crashes. It returns null. You can write tests that catch these failures.

AI systems fail unpredictably. They return fluent, well-formatted nonsense. They work perfectly on your test set and catastrophically on edge cases. They drift as the world changes and their training data grows stale.
This means your SDLC for AI must include: Requirements that specify failure tolerance. What happens when the model is wrong? How wrong is acceptable? What's the blast radius of a bad prediction?

Design that includes human checkpoints. Where does a human review the output? What decisions should never be fully automated?
Testing that goes beyond unit tests. Adversarial inputs. Distribution shift. Prompt injection. Hallucination detection. Bias audits.
Deployment that enables rollback. Model versioning. A/B testing. Shadow mode before full production.
Monitoring that catches drift. Performance metrics over time. Feedback collection. Continuous evaluation against ground truth.
Skip any of these, and you're building on sand.

AI as a Tool in the Toolkit

A craftsman doesn't use a single tool for every job. A table saw is essential, but so is a chisel, a level, and a measuring tape. The magic isn't in any single tool—it's in knowing which tool to reach for and how they work together.

The same applies to AI builds:
Large Language Models are powerful for language tasks—summarization, extraction, generation, classification. They are not databases. They are not calculators. They are not deterministic systems.

Traditional ML models (XGBoost, random forests, logistic regression) often outperform LLMs on structured prediction tasks and are orders of magnitude cheaper to run.

Rule-based systems still have a place. When you need guaranteed behavior—compliance checks, validation logic, business rules—don't reach for AI. Reach for code.

Retrieval systems (vector databases, search indices) ground AI in facts. RAG architectures aren't a workaround for model limitations; they're a fundamental design pattern.

Human-in-the-loop workflows aren't a failure of AI. They're a feature. The best systems augment human judgment rather than replacing it.
The 5% of AI projects that succeed understand this. They don't ask "How do we use AI?" They ask "What's the right tool for each part of this problem?"—and AI is one answer among many.

The Real Work

Building AI systems that work in production is engineering. It requires:
Data pipelines that are reliable, documented, and monitored
Model evaluation that reflects real-world performance, not benchmark scores.

Infrastructure that can scale, fail gracefully, and be debugged at 2 AM
Documentation that lets someone else maintain the system when you're gone.

Security that treats model inputs as untrusted user data (because they are).

Observability that tells you what the system is actually doing, not what you hoped it would do.

None of this is glamorous. None of it makes for good LinkedIn posts. But it's the difference between a demo and a product.

The Path Forward

If you're building with AI, here me now:  AI will not fix a broken process. If your workflow is broken, AI will automate the broken process and make it fast.

AI will not replace domain expertise. It will amplify the experts who know how to wield it.

AI will not succeed without engineering discipline. The 95% failure rate isn't because the technology doesn't work. It's because organizations skip the hard parts.
The organizations winning with AI aren't the ones with the biggest models or the most GPU spend. They're the ones treating AI as what it is: a powerful, imperfect tool that requires skilled hands, careful planning, and the humility to know its limits.

A power tool, not a magic wand.
Jeff Stutzman is CEO of Monadnock Cyber LLC, a Guerrilla AI Lab building patented sales and marketing intelligence systems. He previously founded Trusted Internet, an MSSP, and holds a Senior Executive Fellowship from Harvard Kennedy School.

Saturday, January 04, 2025

What do you think will be the most important predictions for 2025?


Here’s what I think. My top three.

Artificial intelligence


In the world of cyber defense, AI in defense is clumsily integrated (today), fraught with false positives, but today is just the beginning. In both attack and defense, integration and usefulness of AI will get SO much better in 2025; transformative, a disrupter. There will be a gap between attack and defense. There always is. Attackers have already taken the early lead in adoption: deepfakes and dead accurate social engineering. BEC scams dwarf ransomware attacks. Why? How? AI-generated social engineering is used to convince someone to send them a check. Capabilities in attack and defend will level out, likely not in 2025 but soon after. 


What’s coming next? Here's some speculation. Remember Bees with Machine Guns? Bees with Machine Guns uses numerous micro EC2 instances (the bees) to load test web applications. Think about hundreds of AI-driven self-learning micro EC2 instances attacking an entire infrastructure all at the same time. Think cyber swarms using AI to guide multi-vector high volume attack – not just DDoS; high speed overwhelming attacks. Defenses are going to need to keep up. The volley of attack and defense will be carried out at speeds no human could imagine, analyze, and correlate. Long gone are the days of dumping packet captures and running them manually. 2025 will be a significant year for AI.


Next, AI-driven Information Warfare (an old term but still accurate) against the masses is coming. “I read it on the Internet, it must be right, right?” How many times have each of us said this?! Think about that! LLMs are taught by feeding data from the Internet. Could the output of an LLM be shaped by feeding it volumes of data?

Have you noticed any of the LLMs giving you answers containing slanted product information? I asked Gemini (I love Gemini!) about correlating cyber security data. It gives me Microsoft Azure as an answer. I had to tell Gemini to answer but without Azure!


I can’t wait to see how AI shapes marketing and news. I refuse to hire analysts who use only AI (and we’ve had a few). Keep thinking independently.


What about Quantum computing? 


There’s been speculation about quantum computing for years. 2025 will be the year that we see risks to existing encryption methods. Interestingly enough, we’ve seen (heard) vendors hawking “quantum-resistant cryptography” based on NIST standards. [1]


Many companies (around the world) are busy developing and offering Quantum computing, offering various levels of access: IBM, Google, Microsoft, Intel, Amazon, plus IONQ and the Chinese, Origin Wukong.


Much of this is still marketeer noise. NIST says they believe quantum computers will break encryption within the next decade. Me? We’re more than inching toward it; we’re marching, and the footsteps are growing louder.


Ransomware attacks 


Ransomware is by far the biggest threat to cyber today. It will continue to be a major threat, evolving with new techniques and becoming more disruptive incorporating AI and automation, making them more sophisticated and harder to detect. This is a no-brainer. Lockbit 4 is coming out in the spring (February? March?) and others are standing in line directly behind them.


Ransomware operators will take advantage of AI. It’s cheap and easy to use. Ransomware operators building AI into their operations is a no-brainer. A stop sign could have predicted that. But what about Quantum? When Quantum is as cheap to use as AI, expect it. My guess? We’re going to measure intent by monitoring bad guys hoarding encrypted data. When we see that, we’ll know they likely intend to use Quantum computing to break encryption on previously protected data, and ransome owners. I don’t expect this in 2025, but it will come.


2025 is going to be awesome. the tech is changing so fast (again). I can't wait to see how this unfolds!


[1] https://www.federalregister.gov/documents/2024/08/14/2024-17956/announcing-issuance-of-federal-information-processing-standards-fips-fips-203-module-lattice-based