AI Use Case Evaluator

Score your project before you build it

7 pages · 220 KB · CC-BY 4.0

Tags: business · strategy · evaluation · ROI · decision-making · planning

No email required. No signup. View online or print as PDF.

View Full Resource →

What's included

  • 6-dimension scoring framework (data, repeatability, error tolerance, budget, resources, ROI)
  • 1-5 scoring scale with detailed criteria and examples
  • Real-world evaluated examples (good and poor AI fits)
  • Decision tree for quick assessment
  • Score interpretation guide with next steps
  • Common pitfalls and red flags to watch for

Why evaluating matters before building

Here's the painful truth: 80% of AI projects fail to make it to production. Not because the technology doesn't work, but because teams picked the wrong use cases.

You don't need more AI. You need the right AI for the right problems. This evaluator helps you identify which of your ideas are actually worth pursuing—before you spend 6 months and $200,000 finding out the hard way.

This tool is for:

  • Product managers evaluating AI feature requests
  • CTOs prioritizing AI initiatives
  • Business leaders assessing AI investment opportunities
  • Consultants advising clients on AI strategy
  • Technical leads determining project feasibility

How the evaluation framework works

This isn't a simple yes/no decision. AI readiness exists on a spectrum. The framework evaluates your use case across 6 critical dimensions, each scored 1-5:

  1. Data Availability — Do you have the data AI needs to learn?
  2. Task Repeatability — Is this a recurring problem worth automating?
  3. Error Tolerance — Can you handle when AI makes mistakes?
  4. Budget Reality — Do you have realistic funding for the full lifecycle?
  5. Technical Resources — Do you have the skills to build and maintain this?
  6. Expected ROI — Will the benefits justify the investment?

Maximum possible score: 30 points

Each dimension is equally important. A low score in even one dimension can sink an entire project. That's why this holistic evaluation matters—it forces you to confront uncomfortable truths early, when pivoting is cheap.

How to use this evaluator

Step 1: Define your use case clearly

Write a one-sentence description: "We want to use AI to [do what] so that [business outcome]."

Bad example: "Use AI to improve customer service"
Good example: "Use AI to automatically categorize and route customer support tickets so response times drop by 30%"

Step 2: Score each dimension honestly

Read the detailed criteria for each dimension below. Score your use case 1-5 on each one. Be brutally honest—overestimating readiness leads to failed projects.

Step 3: Calculate your total score

Add up all six scores (maximum 30). Then check the interpretation guide to understand what your score means.
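If you prefer to keep the math in code (for example, to score a whole backlog of ideas at once), here is a minimal Python sketch that totals the six dimension scores and maps the total to the interpretation bands used later in this guide. The dimension keys, function name, and sample scores are illustrative, not a prescribed format.

```python
# Minimal scoring helper for the 6-dimension framework (illustrative sketch).
DIMENSIONS = [
    "data_availability",
    "task_repeatability",
    "error_tolerance",
    "budget_reality",
    "technical_resources",
    "expected_roi",
]

# Interpretation bands from the scoring guide below (total ranges from 6 to 30).
BANDS = [
    (26, "Excellent AI candidate"),
    (21, "Good candidate with gaps"),
    (15, "Risky candidate"),
    (10, "Poor fit for AI"),
    (6,  "Definitely not ready"),
]

def evaluate(scores: dict[str, int]) -> tuple[int, str]:
    """Sum the six 1-5 scores and return (total, interpretation)."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"Missing dimensions: {missing}")
    if any(not 1 <= scores[d] <= 5 for d in DIMENSIONS):
        raise ValueError("Each dimension must be scored 1-5")
    total = sum(scores[d] for d in DIMENSIONS)
    label = next(name for floor, name in BANDS if total >= floor)
    return total, label

# Example: the support-ticket-routing case evaluated later in this guide.
print(evaluate({
    "data_availability": 5, "task_repeatability": 5, "error_tolerance": 4,
    "budget_reality": 4, "technical_resources": 3, "expected_roi": 5,
}))  # -> (26, 'Excellent AI candidate')
```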

Step 4: Identify gaps and make a decision

Look at your lowest-scoring dimensions. Can you improve them? Should you? Or should you choose a different use case?

Dimension 1: Data Availability

What this measures

AI learns from data. Without sufficient, quality data that's actually accessible, your project is dead on arrival. This dimension evaluates whether you have the data foundation AI needs.

How to score (1-5)

Score 1 — No viable data

  • You have no historical data for this use case
  • Data exists but is completely inaccessible (legal, technical, or political barriers)
  • Data quality is so poor it's unusable (missing values, errors, inconsistencies)

Red flags: Starting from zero, locked-down data silos, severe quality issues

Score 2 — Minimal data

  • You have some data but far less than needed (e.g., 100 examples when you need 10,000)
  • Data exists but requires 6+ months to access or clean
  • Data is spread across incompatible systems with no integration plan

Red flags: Insufficient volume, major access delays, no data infrastructure

Score 3 — Adequate data with caveats

  • You have enough data to train a basic model, but coverage has gaps
  • Data quality requires significant cleaning (3-6 months of work)
  • Data is accessible but in difficult formats (PDFs, scans, legacy systems)
  • You're missing key labels or annotations needed for supervised learning

Watch out for: Data blind spots, major preprocessing needed, labeling burden

Score 4 — Good data with minor issues

  • You have substantial, representative data (10,000+ examples for most use cases)
  • Data quality is good but needs some cleaning (1-2 months)
  • Data is mostly accessible in workable formats
  • You have labels or a clear path to getting them

Green flags: Large dataset, good quality, accessible, labeled or labelable

Score 5 — Excellent data ready to use

  • You have abundant, high-quality, labeled data (50,000+ examples)
  • Data is clean, structured, and immediately accessible
  • Data is representative of real-world conditions you'll encounter
  • You have ongoing data pipelines to capture more data
  • Data is already centralized and in AI-friendly formats

Green flags: Data goldmine, production-ready, continuously updating

Real-world examples

Score 1 example — Medical diagnosis for rare disease
A hospital wants to detect a condition that affects 1 in 50,000 people. They have 12 historical cases. Verdict: Not viable without multi-institutional data sharing.

Score 5 example — Email spam detection
An email provider has 10 billion labeled emails (spam/not spam) from 20 years of user reports. Verdict: Ideal data foundation.

Dimension 2: Task Repeatability

What this measures

AI excels at automating repetitive tasks. If your use case is a one-time analysis or rarely occurs, the ROI math doesn't work. This dimension evaluates whether your problem happens often enough to justify AI.

How to score (1-5)

Score 1 — One-time or rare task

  • This task happens once or maybe a few times per year
  • The task is highly variable—no two instances are similar
  • Building AI takes longer than just doing the task manually

Red flags: One-off project, high variability, manual approach is faster

Score 2 — Infrequent task

  • Task occurs monthly or quarterly
  • Instances vary significantly each time
  • Manual approach is tedious but manageable

Red flags: Low frequency, high variability, questionable ROI

Score 3 — Moderate repetition

  • Task occurs weekly or a few times per week
  • Instances follow a general pattern with some variation
  • Manual approach is becoming a bottleneck

Watch out for: Borderline frequency, process standardization needed first

Score 4 — Frequent repetitive task

  • Task occurs daily or multiple times per day
  • Instances follow a clear, consistent pattern
  • Manual approach is a significant time sink

Green flags: High frequency, consistent pattern, clear automation value

Score 5 — Constant high-volume task

  • Task occurs hundreds or thousands of times per day
  • Instances are highly standardized
  • Manual approach is impossible at this scale
  • Task is a core part of operations

Green flags: Massive scale, highly repetitive, mission-critical automation

Real-world examples

Score 1 example — Merger due diligence analysis
A company needs AI to analyze documents for a single acquisition. Verdict: Hire consultants, don't build AI.

Score 5 example — Fraud detection for payment processor
A payment company processes 10 million transactions per day and needs real-time fraud checks. Verdict: Perfect AI use case.

Dimension 3: Error Tolerance

What this measures

AI is never 100% accurate. This dimension evaluates whether your use case can tolerate mistakes, and whether you have processes to catch and correct errors.

How to score (1-5)

Score 1 — Zero error tolerance

  • A single mistake causes catastrophic harm (safety, legal, financial)
  • No human review process is feasible
  • Errors are irreversible

Red flags: Life-or-death decisions, no safety net, irreversible consequences

Example: Autonomous surgery, legal sentencing decisions, nuclear power plant control

Score 2 — Very low error tolerance

  • Errors cause significant harm but not catastrophic
  • Human review is theoretically possible but not practical at scale
  • Error correction is expensive or slow

Red flags: High stakes, limited oversight capacity, expensive fixes

Example: Automated loan approvals with no human review, medical diagnosis without doctor confirmation

Score 3 — Moderate error tolerance

  • Errors cause inconvenience or minor costs
  • Human review is possible for some decisions (e.g., flagged cases)
  • Errors can be corrected but require effort

Watch out for: User frustration risk, limited review capacity, correction burden

Example: Resume screening (humans review top candidates), content moderation (appeals process exists)

Score 4 — High error tolerance

  • Errors cause minimal harm or inconvenience
  • Human review is built into the workflow
  • Errors are easy to correct
  • Users understand AI isn't perfect

Green flags: Low-stakes decisions, human-in-the-loop design, easy corrections

Example: Product recommendations (user just ignores bad ones), email categorization (user can move emails)

Score 5 — Very high error tolerance

  • Errors are nearly harmless
  • Users expect imperfection and self-correct naturally
  • Failed AI outputs are obvious and easy to discard
  • AI is providing suggestions, not making decisions

Green flags: Suggestion-only, obvious failures, user remains in control

Example: AI writing assistant (user edits output), photo filters, music playlist recommendations

Real-world examples

Score 1 example — Autonomous vehicle braking
AI decides when to apply emergency brakes. No human review is possible. Verdict: Requires extraordinary accuracy and redundant safety systems.

Score 5 example — Netflix recommendations
AI suggests movies. If it's wrong, user just picks something else. Verdict: Errors are essentially free.

Dimension 4: Budget Reality

What this measures

AI projects cost more than most people expect. This dimension evaluates whether you have realistic funding for the full lifecycle—not just initial development, but ongoing maintenance, monitoring, and improvement.

How to score (1-5)

Score 1 — Severely underfunded

  • Budget is under $10,000 or "we'll figure it out later"
  • Expecting to build a custom model with free tools and no staff time
  • No budget allocated for maintenance or improvements

Red flags: Unrealistic expectations, no real commitment, hoping for free solutions

Score 2 — Insufficient budget

  • Budget is $10,000-$50,000 for a complex custom AI system
  • Funding covers initial build but not ongoing costs
  • No contingency for unexpected challenges (there will be many)

Red flags: Underestimating costs, no maintenance plan, no buffer

Score 3 — Adequate budget with risks

  • Budget is $50,000-$200,000 for a custom system
  • Funding covers initial build and 6-12 months of maintenance
  • Limited contingency budget (10-20%)
  • You're betting on hitting milestones to unlock more funding

Watch out for: Funding cliffs, optimistic timelines, limited runway

Score 4 — Solid budget

  • Budget is $200,000-$500,000+ for a custom system, or $50,000+/year for SaaS solutions
  • Funding covers 18-24 months including development and operations
  • Realistic contingency budget (25%+)
  • Ongoing operational budget is secured

Green flags: Multi-year funding, healthy contingency, operational budget secured

Score 5 — Well-funded strategic initiative

  • Budget is $500,000+ for custom systems with multi-year commitment
  • Funding is tied to strategic priorities, not just project budget
  • Ample contingency for experimentation and iteration
  • Executive buy-in ensures continued funding
  • Budget includes team scaling, tools, and infrastructure

Green flags: Strategic priority, executive sponsorship, long-term commitment

Budget reality check

Typical costs for custom AI projects:

  • Discovery & planning: $20,000-$50,000 (2-4 weeks)
  • Data collection & preparation: $50,000-$200,000 (2-6 months)
  • Model development & training: $100,000-$500,000 (3-9 months)
  • Integration & deployment: $50,000-$150,000 (2-4 months)
  • Ongoing monitoring & maintenance: $100,000-$300,000/year

Total first-year cost: $320,000-$1,200,000 for a serious custom AI system

Using SaaS AI tools: $10,000-$100,000/year depending on scale and features

Real-world examples

Score 2 example — Startup with $25,000 budget
Founders want a custom recommendation engine for their new app. Verdict: Use an off-the-shelf solution or wait until Series A.

Score 5 example — Enterprise with $2M AI initiative
Fortune 500 company allocating $2M over 3 years for customer service AI with executive sponsorship. Verdict: Properly funded.

Dimension 5: Technical Resources

What this measures

AI requires specialized skills to build, deploy, and maintain. This dimension evaluates whether you have access to the technical expertise needed—either in-house, through hiring, or via partners.

How to score (1-5)

Score 1 — No technical capability

  • No one on your team has ML/AI experience
  • No budget to hire or outsource
  • No partnerships with AI vendors or consultants
  • No internal IT/engineering function

Red flags: Zero capability, no path to capability, no partners

Score 2 — Minimal capability

  • You have general software developers but no AI specialists
  • No experience with ML frameworks (TensorFlow, PyTorch, scikit-learn)
  • You could hire or outsource but haven't yet
  • You'd be learning everything from scratch

Red flags: No AI expertise, steep learning curve, dependency on external help

Score 3 — Developing capability

  • You have 1-2 people with some ML experience (online courses, side projects)
  • Software engineering team can support integration but not build models
  • You're planning to hire AI specialists or engage consultants
  • You have basic data infrastructure (databases, APIs) but no ML pipeline

Watch out for: Capability gaps, reliance on junior staff, infrastructure needs

Score 4 — Solid capability

  • You have 2-3+ people with production ML experience
  • Team has successfully deployed AI models before (even if small)
  • Strong software engineering foundation to integrate AI
  • You have or can build ML infrastructure (training pipelines, monitoring, versioning)
  • You have a path to hire or upskill for any gaps

Green flags: Experienced team, proven track record, infrastructure exists

Score 5 — Strong AI-native capability

  • You have a dedicated ML/AI team (5+ people)
  • Team includes ML engineers, data scientists, and ML ops specialists
  • You've deployed multiple production AI systems successfully
  • Mature ML infrastructure and practices (CI/CD for models, monitoring, A/B testing)
  • AI is a core competency of your organization

Green flags: Dedicated team, mature practices, AI-native culture

Skills required for most AI projects

Essential skills:

  • Machine learning fundamentals (supervised learning, model evaluation)
  • Python programming and ML libraries (scikit-learn, pandas, numpy)
  • Data cleaning and feature engineering
  • Model deployment and API integration
  • Basic statistics and experimentation

Advanced skills (for complex projects):

  • Deep learning (TensorFlow, PyTorch)
  • ML ops (model monitoring, versioning, retraining)
  • Distributed training and infrastructure
  • Domain expertise (NLP, computer vision, etc.)

Real-world examples

Score 1 example — Brick-and-mortar retailer
Local retail chain with no tech team wants to build inventory prediction AI. Verdict: Partner with a vendor or hire externally.

Score 5 example — AI-first startup
Company founded by ML PhDs with 15-person ML team and production systems serving millions. Verdict: Built for AI.

Dimension 6: Expected ROI

What this measures

AI is an investment, not magic. This dimension evaluates whether the business value you'll capture justifies the cost and effort. Clear, measurable ROI is essential for project success and continued funding.

How to score (1-5)

Score 1 — No clear ROI

  • You're building AI because "everyone else is doing it"
  • No specific business metrics will improve
  • Value is entirely speculative or intangible ("it would be cool")
  • Cost far exceeds any plausible benefit

Red flags: No business case, following hype, no measurable value

Score 2 — Weak ROI

  • Benefits are vague or hard to measure (e.g., "better user experience")
  • Expected value is $50,000-$100,000 but costs are $200,000+
  • ROI timeline is 5+ years
  • Assumptions require everything to go perfectly

Red flags: Negative or barely-positive ROI, long payback, optimistic assumptions

Score 3 — Moderate ROI

  • Clear but modest benefits (e.g., 10% efficiency improvement)
  • Expected value is $200,000-$500,000 over 3 years
  • ROI is positive but requires 2-3 years to break even
  • Benefits are measurable but incremental

Watch out for: Long payback period, modest gains, other priorities might be better

Score 4 — Strong ROI

  • Significant measurable benefits (e.g., 30% cost reduction or 50% faster process)
  • Expected value is $500,000-$2M over 3 years
  • ROI is 2-3x within 18-24 months
  • Benefits directly impact revenue or major cost centers
  • ROI case is defensible with conservative assumptions

Green flags: Clear value, reasonable timeline, measurable impact, solid business case

Score 5 — Exceptional ROI

  • Transformational impact (e.g., enables new business model or saves millions)
  • Expected value is $2M+ over 3 years
  • ROI is 5x+ within 12-18 months
  • Benefits are measurable, substantial, and defensible
  • Project unlocks strategic capabilities beyond immediate ROI
  • Competitive necessity (not doing this puts you at risk)

Green flags: Massive value, fast payback, strategic importance, competitive imperative

How to calculate ROI

Formula: ROI = (Total Benefits - Total Costs) / Total Costs × 100%

Total Benefits (over 3 years):

  • Labor savings: Hours saved × Hourly cost
  • Revenue increase: New sales enabled by AI
  • Cost avoidance: Errors prevented, efficiency gains
  • Strategic value: What's it worth to be 12 months ahead of competitors?

Total Costs (over 3 years):

  • Development costs (see Dimension 4 budget guide)
  • Ongoing operational costs ($100,000-$300,000/year)
  • Opportunity cost (what else could the team build?)

Target minimum: 200% ROI (3x return) within 24 months for most projects
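Here is the same formula as a small Python sketch, with hypothetical three-year figures plugged into the benefit and cost categories above. The numbers are placeholders for illustration, not benchmarks.

```python
# ROI = (Total Benefits - Total Costs) / Total Costs * 100%
# All figures below are placeholders, not benchmarks.

benefits_3yr = {
    "labor_savings":    5_000 * 3 * 80.0,   # hours saved per year * 3 years * hourly cost
    "revenue_increase": 300_000.0,          # new sales enabled by AI
    "cost_avoidance":   150_000.0,          # errors prevented, efficiency gains
}
costs_3yr = {
    "development":      200_000.0,          # see the Dimension 4 budget guide
    "operations":   3 * 100_000.0,          # ongoing monitoring & maintenance
}

total_benefits = sum(benefits_3yr.values())
total_costs = sum(costs_3yr.values())
roi_pct = (total_benefits - total_costs) / total_costs * 100

print(f"Benefits ${total_benefits:,.0f} | Costs ${total_costs:,.0f} | ROI {roi_pct:.0f}%")
# -> Benefits $1,650,000 | Costs $500,000 | ROI 230%  (clears the 200% target)
```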

Real-world examples

Score 1 example — AI chatbot for 20-person company
Building a custom support chatbot that answers 5 tickets per week. Verdict: Costs $150,000, saves $10,000/year. No business case.

Score 5 example — Fraud detection for payment processor
AI prevents $10M/year in fraud losses, costs $800,000/year to operate. Verdict: 12x ROI, mission-critical.

Scoring guide: Interpreting your total score

Add up your scores across all 6 dimensions. Here's what your total means:

Score 26-30: Excellent AI candidate

What it means: This use case is a strong fit for AI across all dimensions. You have the data, resources, budget, and business case to succeed. These are the projects that should be top priority.

What to do next:

  1. Build a detailed project plan with milestones and success metrics
  2. Assemble your team (in-house, hiring, or partners)
  3. Start with a small proof-of-concept (4-8 weeks) to validate assumptions
  4. If POC succeeds, move to production development
  5. Set up monitoring and evaluation from day one

Watch out for: Even great projects fail with poor execution. Stay disciplined about scope, timelines, and metrics.

Score 21-25: Good candidate with gaps

What it means: This use case has strong potential but has 1-2 dimensions that need attention. You can likely move forward, but address the weaknesses first.

What to do next:

  1. Identify your lowest-scoring dimensions
  2. Create a plan to address each gap:
    • Low data score? Invest in data collection or labeling first
    • Low budget score? Seek additional funding or use cheaper alternatives
    • Low technical resources? Hire, train, or partner before you start building
  3. Re-evaluate once gaps are addressed
  4. Consider starting with a smaller pilot to prove value before full investment

Watch out for: Don't proceed hoping gaps will resolve themselves. They won't.

Score 15-20: Risky candidate

What it means: This use case has significant challenges across multiple dimensions. Success is possible but requires major changes to your approach, scope, or timeline.

What to do next:

  1. Identify all dimensions scoring 3 or below
  2. Ask: Can we realistically fix these issues? How long will it take?
  3. Consider alternatives:
    • Use off-the-shelf AI tools instead of building custom
    • Narrow scope to a smaller, more achievable problem
    • Partner with an AI vendor who can fill your capability gaps
    • Delay the project until you've built foundational capabilities
  4. If you proceed, treat this as a high-risk R&D project, not a sure bet

Watch out for: These projects often become zombie initiatives—consuming resources without delivering value.

Score 10-14: Poor fit for AI

What it means: This use case has fundamental issues that make AI success unlikely. Proceeding will likely waste time and money.

What to do next:

  1. Honestly assess why the score is so low
  2. Consider non-AI solutions:
    • Manual processes with better tools
    • Simple automation (rules-based, not ML)
    • Outsourcing to humans
    • Process improvements before automation
  3. If AI still seems necessary, focus on building foundational capabilities first:
    • Spend 6-12 months collecting data
    • Hire or develop technical expertise
    • Start with simpler automation projects to build confidence
  4. Re-evaluate in 12-18 months once foundations are stronger

Watch out for: Pressure to "do AI" because it's trendy. Bad use cases don't become good just because executives want them.

Score 6-9: Definitely not ready

What it means: This use case should not be pursued with AI at this time. The gaps are too large.

What to do next:

  1. Say no. Seriously. This is the most valuable thing you can do.
  2. Explain why to stakeholders using this framework
  3. Propose alternatives (see "Poor fit" guidance above)
  4. Document what would need to change for this to become viable
  5. Focus your limited resources on better opportunities

Watch out for: Political pressure to do AI anyway. Stand your ground. Failed AI projects damage credibility and waste resources.

Real-world evaluated examples

Let's score 6 actual use cases to see how the framework works in practice:

Example 1: Customer support ticket routing

Context: E-commerce company receives 5,000 support tickets per day across 15 categories. Routing is currently done manually by a team of 10 people.

Scores:

  • Data Availability: 5 (2 years of tickets with manual categories)
  • Task Repeatability: 5 (thousands per day, highly repetitive)
  • Error Tolerance: 4 (misrouting is annoying but fixable)
  • Budget Reality: 4 ($200,000 budget, ongoing operational commitment)
  • Technical Resources: 3 (have engineers but no ML team yet)
  • Expected ROI: 5 (save 4,000 hours/month = $400,000/year)

Total: 26/30 — Excellent candidate
Verdict: Strong use case. Address the technical resource gap by hiring or partnering, then proceed.

Example 2: Predicting rare equipment failures

Context: Manufacturing plant wants to predict failures of a critical machine that fails 2-3 times per year. Each failure costs $100,000 in downtime.

Scores:

  • Data Availability: 2 (only 15 historical failures over 5 years)
  • Task Repeatability: 1 (very rare event)
  • Error Tolerance: 3 (false positives waste maintenance time, false negatives are costly)
  • Budget Reality: 3 ($150,000 budget)
  • Technical Resources: 2 (no ML experience)
  • Expected ROI: 3 (save maybe $200,000/year if perfect)

Total: 14/30 — Poor fit
Verdict: Not enough failure data to train a reliable model. Recommendation: Invest in better preventive maintenance schedules and sensor monitoring. Collect more data for 3-5 years, then revisit.

Example 3: Document extraction for tax forms

Context: Accounting firm processes 50,000 tax forms per year. Currently 20 staff members manually type information from PDFs into their system.

Scores:

  • Data Availability: 5 (50,000 annotated forms from previous years)
  • Task Repeatability: 5 (highly repetitive, year-round need)
  • Error Tolerance: 3 (errors must be caught before filing; human review required)
  • Budget Reality: 5 (saving $800,000/year in labor justifies major investment)
  • Technical Resources: 2 (small IT team, no ML experience)
  • Expected ROI: 5 (massive labor savings)

Total: 25/30 — Good candidate
Verdict: Excellent use case but capability gap exists. Recommendation: Use a commercial OCR/document extraction tool (like AWS Textract or Google Document AI) rather than building custom. Requires integration work but not ML expertise.

Example 4: AI-generated marketing copy

Context: Marketing team wants AI to write first drafts of blog posts and social media content. 5-person team creates 20 blog posts and 100 social posts per month.

Scores:

  • Data Availability: 4 (3 years of blog posts and social content)
  • Task Repeatability: 4 (daily content creation)
  • Error Tolerance: 5 (humans review and edit everything anyway)
  • Budget Reality: 5 (using ChatGPT/Claude = $200/month per user)
  • Technical Resources: 5 (no AI team needed; using existing tools)
  • Expected ROI: 4 (30% productivity boost = 1 FTE saved = $80,000/year)

Total: 27/30 — Excellent candidate
Verdict: Perfect use case for off-the-shelf generative AI. Implement immediately with clear editorial guidelines.

Example 5: Predicting customer churn

Context: SaaS company with 5,000 customers wants to predict which customers will cancel so they can intervene proactively.

Scores:

  • Data Availability: 4 (3 years of customer data, usage logs, and cancellation history)
  • Task Repeatability: 5 (continuous monitoring of entire customer base)
  • Error Tolerance: 4 (false positives mean wasted outreach; false negatives mean lost customers)
  • Budget Reality: 4 ($150,000 for development + $50,000/year operations)
  • Technical Resources: 4 (have a data analyst with some ML experience)
  • Expected ROI: 4 (reducing churn 20% = $500,000/year in retained revenue)

Total: 25/30 — Good candidate
Verdict: Strong business case and good data. Start with a simple model (logistic regression) to prove value, then iterate. Upskill your data analyst or hire an ML engineer for production deployment.
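Because the verdict suggests starting with logistic regression, here is a minimal scikit-learn sketch of what that first churn baseline might look like. The file name (customers.csv), the feature columns, and the churned label are hypothetical placeholders; your schema will differ.

```python
# Minimal churn baseline, assuming a hypothetical customers.csv export with
# usage features and a binary "churned" label. Illustrative only.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

df = pd.read_csv("customers.csv")  # hypothetical export of customer data
features = ["monthly_logins", "support_tickets", "tenure_months"]  # placeholder columns

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["churned"], test_size=0.2, stratify=df["churned"], random_state=42
)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Rank held-out customers by predicted churn risk for proactive outreach.
risk = model.predict_proba(X_test)[:, 1]
print("ROC AUC:", round(roc_auc_score(y_test, risk), 3))
```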

Example 6: Fully autonomous contract generation

Context: Law firm wants AI to automatically generate complete, binding legal contracts from client briefs with no attorney review.

Scores:

  • Data Availability: 3 (10,000 past contracts but high variability)
  • Task Repeatability: 3 (50-100 contracts per month, each unique)
  • Error Tolerance: 1 (legal errors create malpractice liability)
  • Budget Reality: 3 ($200,000 budget but unclear if enough)
  • Technical Resources: 2 (no ML team)
  • Expected ROI: 3 (time savings exist but error risk is enormous)

Total: 15/30 — Risky candidate
Verdict: Fatal flaw is error tolerance. Recommendation: Change scope to AI-assisted drafting where attorneys review and approve everything. Use AI for research, clause suggestions, and first drafts, not autonomous generation.

The decision tree: Quick assessment

Use this decision tree for fast initial screening before doing the full 6-dimension evaluation:

Question 1: Do you have data?

  • YES, lots of quality data (1,000+ examples) → Continue to Question 2
  • YES, but limited or poor quality → YELLOW FLAG. Assess if you can collect more.
  • NO → STOP. Come back when you have data.

Question 2: Is this task repetitive?

  • YES, happens constantly (daily/hourly) → Continue to Question 3
  • SOMETIMES, happens weekly/monthly → YELLOW FLAG. ROI may be marginal.
  • NO, rare or one-time → STOP. AI is overkill.

Question 3: Can you tolerate errors?

  • YES, errors are low-stakes or humans review outputs → Continue to Question 4
  • MAYBE, errors are costly but catchable → YELLOW FLAG. Design for human oversight.
  • NO, errors are catastrophic → STOP. AI is too risky.

Question 4: Do you have budget?

  • YES, $100,000+ for custom or $10,000+/year for SaaS → Continue to Question 5
  • MAYBE, but it's tight → YELLOW FLAG. Consider cheaper alternatives.
  • NO, hoping to do this for free → STOP. Not realistic.

Question 5: Do you have technical capability?

  • YES, in-house or clear path to hire/partner → Continue to Question 6
  • MAYBE, we're learning → YELLOW FLAG. Start small and simple.
  • NO, and no plan to get it → STOP. You can't execute.

Question 6: Is the ROI clear?

  • YES, measurable benefits justify costs (2x+ ROI) → GREEN LIGHT. Do the full evaluation.
  • MAYBE, benefits are soft or uncertain → YELLOW FLAG. Tighten the business case.
  • NO, we're just experimenting → STOP. Experiments are fine, but don't call it a real project.

Decision tree result:

  • All green, no yellow flags? This is likely a strong candidate. Do the full 6-dimension evaluation.
  • 1-2 yellow flags? Address those issues, then evaluate fully.
  • 3+ yellow flags or any stops? This use case needs major changes before it's viable.
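If you want to run this quick screen programmatically across a list of candidate use cases, here is a rough Python sketch of the same tree. The answer format (yes/maybe/no per question) is an assumption; the stop, yellow-flag, and green-light logic mirrors the six questions above.

```python
# Quick-screen sketch of the six-question decision tree.
# Each answer is "yes", "maybe", or "no", keyed by question; the format is an assumption.
QUESTIONS = ["data", "repetitive", "error_tolerance", "budget", "capability", "roi"]

def quick_screen(answers: dict[str, str]) -> str:
    yellow = 0
    for q in QUESTIONS:
        answer = answers.get(q, "no").lower()
        if answer == "no":
            return f"STOP at '{q}': address this before evaluating further"
        if answer == "maybe":
            yellow += 1
    if yellow == 0:
        return "GREEN LIGHT: do the full 6-dimension evaluation"
    if yellow <= 2:
        return f"{yellow} yellow flag(s): address them, then evaluate fully"
    return f"{yellow} yellow flags: major changes needed before this use case is viable"

# Example: solid data and ROI, but technical capability is still developing.
print(quick_screen({
    "data": "yes", "repetitive": "yes", "error_tolerance": "yes",
    "budget": "yes", "capability": "maybe", "roi": "yes",
}))  # -> 1 yellow flag(s): address them, then evaluate fully
```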

Common pitfalls to avoid

Even good use cases fail when teams make these mistakes:

Pitfall 1: Confusing AI with magic

The mistake: Assuming AI will just "figure it out" without clear requirements, quality data, or ongoing maintenance.

Why it fails: AI learns from data and patterns. Garbage in = garbage out. Vague requirements lead to models that technically work but don't solve the actual problem.

How to avoid: Define success metrics before you start. What accuracy rate do you need? What does good look like? Write this down.

Pitfall 2: Skipping the data audit

The mistake: Assuming you have "plenty of data" without actually checking quality, representativeness, or accessibility.

Why it fails: Most organizations discover too late that their data is incomplete, biased, in the wrong format, or impossible to access due to compliance constraints.

How to avoid: Before scoring Dimension 1, actually pull and examine your data. Don't guess. Look at it. How much do you have? What % is missing or corrupted? Is it representative? Can you actually access it?
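If your data can be exported to a flat file, a few lines of pandas answer most of these questions. A minimal audit sketch, assuming a hypothetical tickets.csv export and a category label column:

```python
# Quick data-audit sketch: volume, missingness, duplicates, label coverage.
# "tickets.csv" and the "category" label column are hypothetical placeholders.
import pandas as pd

df = pd.read_csv("tickets.csv")

print(f"Rows: {len(df):,}   Columns: {df.shape[1]}")
print("\nPercent missing per column:")
print((df.isna().mean() * 100).round(1).sort_values(ascending=False))
print(f"\nDuplicate rows: {df.duplicated().sum():,}")

# If you plan supervised learning, check how many rows actually carry a label
# and whether the classes are badly imbalanced.
if "category" in df.columns:
    print(f"\nLabeled rows: {df['category'].notna().mean():.0%}")
    print(df["category"].value_counts(normalize=True).head(10))
```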

Pitfall 3: Underestimating the "last mile"

The mistake: Focusing all effort on model accuracy and none on deployment, monitoring, and integration with existing systems.

Why it fails: A 95% accurate model in a Jupyter notebook is worthless. Getting it into production, integrated with your systems, monitored for drift, and maintained takes longer than building the model.

How to avoid: Budget 50% of your timeline for the "last mile" of deployment and integration. Plan for ML ops from day one.

Pitfall 4: No human-in-the-loop design

The mistake: Building fully autonomous AI for use cases that actually need human judgment or oversight.

Why it fails: Even great AI makes mistakes. Use cases with low error tolerance need humans in the loop to review, approve, or handle exceptions.

How to avoid: If your use case scored 3 or lower on error tolerance, design for human oversight from the start. AI assists, humans decide.

Pitfall 5: Ignoring change management

The mistake: Building technically sound AI but failing to get users to actually adopt it.

Why it fails: People resist tools that feel like black boxes or threaten their jobs. Even great AI fails if users don't trust it or won't use it.

How to avoid: Involve end users early. Explain how the AI works (at a high level). Show them it makes their jobs easier, not obsolete. Provide clear feedback mechanisms when AI is wrong.

Pitfall 6: Falling for sunk cost fallacy

The mistake: Continuing a failing AI project because you've already invested so much.

Why it fails: Bad projects don't become good with more investment. Continuing to fund a doomed initiative wastes resources that could be used on better opportunities.

How to avoid: Set clear go/no-go milestones at the start (e.g., "If we don't hit 80% accuracy by month 4, we kill the project"). Actually follow through.

Pitfall 7: Building when you should buy

The mistake: Attempting to build custom AI for use cases that are already solved by commercial tools.

Why it fails: Building takes longer and costs more than expected. Meanwhile, off-the-shelf solutions already exist, are battle-tested, and include ongoing support.

How to avoid: Before building, spend 2 weeks researching existing solutions. SaaS tools often cost $10,000-$50,000/year vs $500,000+ to build custom. Can you customize or integrate an existing tool rather than starting from scratch?

Pitfall 8: No plan for model decay

The mistake: Launching AI and assuming it will work forever without retraining or updates.

Why it fails: AI models decay over time as the world changes. Customer behavior shifts, products change, new edge cases emerge. Yesterday's 90% accuracy becomes 70% within months.

How to avoid: Plan for ongoing monitoring, retraining, and updates from day one. Budget for ML ops, not just initial development.
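One lightweight way to catch decay early is to keep scoring a rolling sample of recent, human-verified outputs and alert when accuracy drops below what you validated at launch. A minimal sketch; the baseline, threshold, and window size are arbitrary examples:

```python
# Drift-check sketch: compare rolling accuracy against the accuracy validated
# at deployment and flag when the gap gets too large. Thresholds are examples.
from collections import deque

BASELINE_ACCURACY = 0.90   # accuracy measured at launch
MAX_DROP = 0.05            # investigate/retrain if we fall more than 5 points
WINDOW = 500               # most recent human-verified predictions

recent = deque(maxlen=WINDOW)  # stores 1 for correct, 0 for incorrect

def record_outcome(prediction, ground_truth) -> None:
    """Call this whenever a human verifies (or corrects) a model output."""
    recent.append(1 if prediction == ground_truth else 0)
    if len(recent) == WINDOW:
        accuracy = sum(recent) / WINDOW
        if accuracy < BASELINE_ACCURACY - MAX_DROP:
            print(f"ALERT: rolling accuracy {accuracy:.1%} vs baseline "
                  f"{BASELINE_ACCURACY:.0%}; schedule retraining and review.")
```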

What to do next based on your score

If you scored 26-30:

You have a strong candidate. Your next steps:

  1. Build a detailed project plan (timeline, milestones, success metrics)
  2. Assemble your team (or identify partners)
  3. Secure funding and executive buy-in
  4. Start with a 4-8 week proof of concept
  5. Set up evaluation and monitoring frameworks
  6. If POC succeeds, move to production development

Timeline: Expect 6-12 months from start to production for most projects.

If you scored 21-25:

You have a good candidate with gaps. Your next steps:

  1. Identify your lowest-scoring dimensions (especially anything 3 or below)
  2. Create a gap-closing plan:
    • Data gaps? Invest 2-4 months in data collection, cleaning, or labeling
    • Budget gaps? Seek additional funding or consider cheaper alternatives (SaaS vs custom)
    • Technical gaps? Hire, train, or partner before starting development
    • ROI gaps? Tighten your business case or narrow scope to improve ROI
  3. Re-evaluate once gaps are addressed
  4. Consider a smaller pilot to prove value before full commitment

Timeline: Expect 3-6 months to address gaps, then 6-12 months for development.

If you scored 15-20:

This is a risky use case. Your next steps:

  1. List all dimensions scoring 3 or below
  2. Honestly assess: Can these gaps be closed? How long will it take? How much will it cost?
  3. Consider alternatives:
    • Use off-the-shelf tools instead of building custom
    • Narrow scope dramatically (do 20% of the original vision)
    • Solve the problem with non-AI automation first
    • Delay until foundational capabilities are built
  4. If you proceed, treat this as R&D, not a sure thing. Set clear kill criteria.

Timeline: Expect 6-12 months to address foundational gaps before starting, then 6-12 months for development. Total: 12-24 months.

If you scored 10-14:

This is a poor fit for AI right now. Your next steps:

  1. Acknowledge this isn't viable in its current form
  2. Explore non-AI solutions:
    • Process improvements
    • Simple rule-based automation
    • Better tools for manual processes
    • Outsourcing to specialists
  3. If AI still seems necessary long-term, build foundations first:
    • Spend 6-12 months collecting quality data
    • Hire technical talent or develop internal skills
    • Start with simpler automation projects to build confidence
  4. Re-evaluate in 12-18 months

Timeline: 12-18 months to build foundations before starting. Not a near-term initiative.

If you scored 6-9:

Stop. Do not pursue this with AI. Your next steps:

  1. Say no clearly to stakeholders
  2. Explain why using this framework
  3. Propose non-AI alternatives
  4. Document what would need to change for this to become viable (likely years of foundational work)
  5. Focus resources on better opportunities

Timeline: Not viable. Consider other approaches entirely.

License & Attribution

This resource is licensed under Creative Commons Attribution 4.0 (CC-BY). You're free to:

  • Share with your team or organization
  • Adapt for your specific use cases or industry
  • Use in client workshops or internal strategy sessions

Just include this attribution:

"AI Use Case Evaluator" by Field Guide to AI (fieldguidetoai.com) is licensed under CC BY 4.0

Ready to view?

Access your free AI Use Case Evaluator now. No forms, no wait—view online or print as PDF.

View Full Resource →

Licensed under CC-BY 4.0 · Free to share and adapt with attribution