Building an AI-Powered Self-Review Skill

April 28, 2026 Data & AI

Your Style Guide Has 200 Rules. Your Information Developers Remember 12. Here's How to Fix That.

This blog post describes a self-review skill built on GitHub Copilot's extensibility framework. The approach and architecture described here are generalizable to any domain where editorial or compliance standards can be expressed as structured rules and applied by the content creator before formal review.

The Problem Every Documentation Team Knows

Technical documentation is hard to keep consistent. Style guides grow to dozens of pages. Terminology tables expand with every release. New information developers join, experienced information developers forget edge cases, and everyone submits drafts with issues they could have caught themselves—if they had the time and mental bandwidth to cross-check every rule.

Our team maintains documentation for a mature enterprise platform with strict editorial standards—more than 14 broad categories of review criteria, from terminology accuracy and grammatical correctness to structural completeness and typographic conventions. Information developers genuinely wanted to get things right the first time, but no one can hold hundreds of rules in their head while also thinking about clarity, accuracy, and structure. The result was predictable: editors spent most of their time catching mechanical errors instead of focusing on the high-value work—content accuracy, narrative flow, and reader experience.

We asked ourselves: What if information developers had a way to self-review against the full style guide before submitting a draft?

The Idea: A Personal Style-Guide Coach in the Information Developer's IDE

The core insight was simple—our editorial standards were already codified in a written style guide. The rules were explicit, the examples were concrete, and the expected corrections were well-defined. This is exactly the kind of structured knowledge that a large language model can apply consistently, at scale, and without fatigue. The missing piece wasn't the rules—it was putting them in the information developer's hands at the moment of writing.

Rather than building a standalone application or another review gate, we chose to build a GitHub Copilot custom skill—a modular extension that plugs directly into the information developer's IDE. This meant:

  • Self-service, on demand. Information developers run a review whenever they want—after finishing a section, before submitting a draft, or after incorporating editor feedback. No need to wait for someone else's calendar.
  • Zero context switching. The review happens in the same tool the information developer already uses to write.
  • Natural language invocation. No CLI flags, no configuration files—just describe what you want in plain English.
  • Iterative workflow. Fix a batch of issues, paste the revised text, and get a delta review instantly. Information developers can tighten their own drafts through multiple self-review passes before anyone else sees the document.

Architecture: Three Files, One Self-Review Workflow

The skill is remarkably lightweight. The entire system consists of three core files:

  1. The Skill Definition

This is the orchestration layer—a structured Markdown file that tells the AI who it is and how to behave. It defines:

  • The persona: A senior technical editor with deep domain expertise and linguistic precision.
  • The procedure: A step-by-step workflow—ingest the document, detect its type, compute metrics, run the review and deliver the results.
  • Input handling: The skill accepts pasted text, file paths (including PDFs) or documentation URLs. PDF extraction is fully automated using Python libraries—no manual copy-paste required.
  • Output format: A structured Executive Summary with quantitative metrics, followed by a detailed Annotated Change Sheet with one row per issue.

The key design principle here is deterministic orchestration with flexible execution. The steps are numbered and mandatory, but the AI has latitude in how it applies each rule to the specific document at hand. From the information developer's perspective, it feels like having a tireless colleague who has memorized the entire style guide and is always available for a quick read-through.
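
To make that orchestration idea concrete, here is a minimal Python sketch of the same flow. The real skill definition is a Markdown file, not code, and every function below is a hypothetical stand-in; the point is only that the step order is fixed while each step keeps its latitude.

```python
# Conceptual sketch only: the real skill definition is a structured Markdown
# file, not Python. Every function here is a hypothetical stand-in used to
# illustrate "deterministic orchestration with flexible execution": the step
# order is fixed, while each step keeps latitude for the document at hand.

def ingest(source: str) -> str:
    # The real skill accepts pasted text, file paths, PDFs and URLs.
    return source

def detect_document_type(text: str) -> str:
    # Simplified stand-in; structural-signal detection is sketched later.
    return "reference" if "Syntax" in text else "general"

def compute_metrics(text: str) -> dict:
    # Deterministic computation (word counts, densities, coverage).
    return {"word_count": len(text.split())}

def review(text: str, doc_type: str) -> list:
    # Placeholder for the LLM pass that applies the criteria reference.
    return []

def deliver(metrics: dict, findings: list) -> dict:
    return {"executive_summary": metrics, "change_sheet": findings}

def run_self_review(source: str) -> dict:
    """Mandatory steps, in order."""
    text = ingest(source)
    doc_type = detect_document_type(text)
    metrics = compute_metrics(text)
    findings = review(text, doc_type)
    return deliver(metrics, findings)
```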

  2. The Review Criteria Reference

This is the knowledge base—the encoded style guide. It contains:

  • Terminology tables mapping incorrect forms to approved forms (with exceptions for edge cases like official product names).
  • Grammar rules covering subject-verb agreement, article usage, tense consistency, punctuation and more.
  • Clarity guidelines with specific wordy-phrase-to-concise-phrase mappings.
  • Style rules for voice, tone, pronoun usage and register.
  • Structural checks for heading hierarchy, list construction, table formatting and cross-references.
  • Domain-specific checks that activate conditionally based on document type (courseware gets instructional design checks; reference docs get syntax-example pairing checks).

The criteria file is loaded by reference during the review—the AI reads it at runtime, which means updates to the style guide are immediately reflected in reviews without redeploying anything.
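
A minimal sketch of that runtime loading, assuming the criteria live in a file named review-criteria.md (a hypothetical name). Because the prompt is rebuilt from whatever is on disk at the moment of review, a style-guide edit takes effect on the very next self-review.

```python
from pathlib import Path

# Minimal sketch, assuming a hypothetical "review-criteria.md" next to the skill.
# Reading the file at review time, rather than baking its contents into the
# skill, is what makes style-guide updates take effect without redeployment.

def load_criteria(path: str = "review-criteria.md") -> str:
    return Path(path).read_text(encoding="utf-8")

def build_review_prompt(document: str, criteria_path: str = "review-criteria.md") -> str:
    criteria = load_criteria(criteria_path)   # always the latest version on disk
    return (
        "Review the document below against every rule in the criteria.\n\n"
        f"## Criteria\n{criteria}\n\n## Document\n{document}"
    )
```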

  3. The Annotation Template

This is the output schema—a structured Markdown template that ensures every review produces consistent, machine-readable output:

  • Executive Summary with issue counts by category and severity, document statistics, consistency metrics and a structural completeness checklist.
  • Annotated Change Sheet where each issue gets a row with location, category, severity, original text, suggested change and justification.

The template enforces a contract: every review looks the same, regardless of which document was reviewed or when.
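
As an illustration, the same contract could be expressed as data structures. This is a hedged sketch; the field names are illustrative, not the template's exact headings.

```python
from dataclasses import dataclass, field

# A sketch of the output contract as data structures; the real template is a
# Markdown file, and these field names are illustrative, not the exact schema.

@dataclass
class Issue:
    location: str          # e.g. "Section 2, paragraph 3"
    category: str          # e.g. "Terminology", "Grammar", "Structure"
    severity: str          # "Critical", "Major" or "Minor"
    original_text: str
    suggested_change: str
    justification: str     # cites the specific rule in the criteria reference

@dataclass
class ExecutiveSummary:
    counts_by_severity: dict = field(default_factory=dict)
    counts_by_category: dict = field(default_factory=dict)
    document_statistics: dict = field(default_factory=dict)   # word count, metrics
    structural_checklist: dict = field(default_factory=dict)  # present/missing sections
```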

What Makes It Work: Design Principles

  1. Encode Rules, Not Opinions

For a self-review tool to earn information developers' trust, it has to be precise—not vague. The most important decision we made was to express every editorial rule as a concrete, testable criterion with examples of both correct and incorrect usage. Instead of "use clear language," we wrote specific tables:

 

| Flag this | Replace with |
| --- | --- |
| “in order to” | “to” |
| “has the ability to” | “can” |
| “prior to” | “before” |
 

This precision reduces hallucination and subjective variation. The AI isn't deciding what "clear" means—it's pattern-matching against an explicit inventory of known issues. For information developers, this is crucial: the feedback feels objective and actionable, not like an opinionated critique.

  2. Severity as a Decision Framework

Every issue is classified into one of three severity levels:

  • Critical—factual errors or broken instructions. Must fix before publishing.
  • Major—significantly hurts reader comprehension. Strongly recommended fix.
  • Minor—polish and preference. Fix if time allows.

This three-tier system gives information developers a clear prioritization framework during self-review. A document with twenty Minor issues and zero Critical issues is in much better shape than one with two Critical issues, and the severity labels make that immediately visible. Information developers can focus on the Critical and Major items first and address Minor polish if time allows — just as a human editor would advise.
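
A short sketch of that prioritization, assuming each finding carries a severity label; the ordering values are illustrative, not part of the skill itself.

```python
# Illustrative sketch: map the three-tier severity to a prioritized to-do list.
SEVERITY_ORDER = {"Critical": 0, "Major": 1, "Minor": 2}

def prioritize(issues: list) -> list:
    """Sort findings so Critical items surface first and Minor polish last."""
    return sorted(issues, key=lambda issue: SEVERITY_ORDER.get(issue["severity"], 3))

issues = [
    {"severity": "Minor", "note": "wordy phrase"},
    {"severity": "Critical", "note": "broken instruction"},
]
print([i["severity"] for i in prioritize(issues)])  # ['Critical', 'Minor']
```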

  3. Quantitative Metrics Alongside Qualitative Review

The skill doesn't just find issues—it computes measurable document health indicators:

  • Average sentence length (target: 15–22 words)
  • Passive voice density (target: under 20%)
  • Lexical density—the ratio of content words to total words, computed via NLP part-of-speech tagging (target: 45–60%)
  • Acronym coverage—what percentage of acronyms in the document are defined on first use (target: ≥ 90%)
  • Spelling error count with domain-aware exclusion lists

These metrics give information developers a numeric snapshot of their draft's health before they even read the detailed findings. A quick glance at the summary answers the question every information developer asks before submitting: "Is this ready or does it need another pass?" The metrics also make it possible for information developers to track their own improvement over time.
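
Two of those metrics are simple enough to sketch directly. The snippet below is a rough approximation assuming plain-text input, not the skill's actual scripts, which also rely on part-of-speech tagging; the targets are the ones quoted above.

```python
import re

# Rough sketches of two document-health metrics over plain text.

def average_sentence_length(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+\s", text) if s.strip()]
    words = text.split()
    return len(words) / max(len(sentences), 1)           # target: 15-22 words

def acronym_coverage(text: str) -> float:
    acronyms = set(re.findall(r"\b[A-Z]{2,}\b", text))
    if not acronyms:
        return 1.0
    # Treat an acronym as "defined" if it appears in parentheses right after an
    # expansion, e.g. "application programming interface (API)".
    defined = {a for a in acronyms if re.search(r"\(" + re.escape(a) + r"\)", text)}
    return len(defined) / len(acronyms)                  # target: >= 0.90
```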

  4. Document-Type-Aware Conditional Logic

Not all documents are the same. A courseware lesson has learning objectives and check-your-understanding questions; a reference topic has syntax blocks and parameter tables; release notes have neither.

The skill auto-detects the document type from structural signals (headings like "Learning Objectives" or "Syntax" or "Release Notes") and activates or deactivates entire categories of checks accordingly. Courseware gets Bloom's Taxonomy verb validation and objective-assessment alignment checks. Reference docs get syntax-example pairing checks. Release notes get only the universal grammar, style and terminology checks.

This conditional architecture prevents false positives—the skill never flags a reference topic for missing learning objectives.
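
A sketch of what that detection can look like, assuming Markdown-style headings; the heading cues mirror the examples above, and the real detection logic is richer.

```python
import re

# Sketch of structural-signal detection over Markdown-style headings.

def detect_document_type(text: str) -> str:
    headings = " ".join(re.findall(r"^#+\s*(.+)$", text, flags=re.MULTILINE)).lower()
    if "learning objectives" in headings:
        return "courseware"      # activates Bloom's-verb and alignment checks
    if "syntax" in headings:
        return "reference"       # activates syntax-example pairing checks
    if "release notes" in headings:
        return "release-notes"   # universal grammar/style/terminology checks only
    return "general"
```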

  5. Structured Output With a Consistent Contract

Every review produces the same output structure:

  1. Executive Summary—the 30-second health check: Am I ready to submit?
  2. Annotated Change Sheet—the line-by-line detail to work through.
  3. Optional Word Document—a color-coded, shareable report for cases where the information developer wants to attach the self-review alongside the draft.

The consistency of the output format means that information developers learn the layout once and can scan it quickly on every subsequent self-review. When they do submit to an editor, the editor receives a cleaner draft—and if the information developer attaches the self-review report, the editor can see exactly what was already checked.

Bringing Computation into the Loop

One of the more novel aspects of the design is how the skill blends LLM reasoning with deterministic computation. The AI doesn't estimate passive voice percentage—it runs a Python script that counts passive constructions using regex patterns and NLP part-of-speech tagging.

This hybrid approach plays to each technology's strengths:

  • The LLM handles nuanced judgment—is this sentence genuinely ambiguous? Is this passive voice acceptable because the agent is truly unknown? Is this heading parallel with its siblings?
  • Python scripts handle precise measurement—word counts, POS tagging, regex-based pattern detection, readability scores.

The result is a review that is both computationally rigorous and editorially intelligent.
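
For the deterministic half, even a regex-only heuristic goes a long way. The sketch below is an approximation rather than the production script, and it will over- and under-count irregular participles that proper POS tagging catches.

```python
import re

# Heuristic sketch: a form of "to be" followed by a word that looks like a
# past participle. The production script pairs regex patterns like this with
# part-of-speech tagging for better precision.

BE_FORMS = r"\b(?:is|are|was|were|be|been|being)\b"
PASSIVE = re.compile(BE_FORMS + r"\s+\w+(?:ed|en)\b", re.IGNORECASE)

def passive_voice_density(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+\s", text) if s.strip()]
    passive = sum(1 for s in sentences if PASSIVE.search(s))
    return passive / max(len(sentences), 1)              # target: under 0.20
```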

Results and Impact

Since deploying the skill as a self-review tool:

  • Information developers self-review in minutes, not hours. A 10-page document review completes in under two minutes—fast enough to run after every major editing pass.
  • Draft quality at submission improved dramatically. Editors reported that the mechanical errors they used to spend most of their time on—terminology, comma rules, passive voice—dropped significantly. They could focus on what only a human editor can do: content accuracy, narrative arc and reader experience.
  • Information developers gained confidence. Instead of wondering "did I miss something?" before submitting, information developers could see a concrete quality summary. The self-review became a checklist they trusted.
  • The style guide became a living system rather than a static PDF that nobody re-reads after onboarding. Every self-review is an encounter with the style guide's rules, complete with explanations of why each change is recommended.
  • Onboarding new information developers accelerated. New team members used the skill as an interactive tutor—write a section, self-review, learn from the justifications and internalize the standards faster than any training session could achieve.
  • Editor–information developer friction decreased. When information developers catch their own issues, editorial feedback focuses on substance rather than mechanics. Reviews feel collaborative rather than corrective.

How Your Team Can Build a Self-Review Skill

You don't need our specific domain or our specific platform. The pattern generalizes to any team that has editorial standards and produces written content. The goal is simple: give every information developer the ability to review their own work against the team's full set of standards, instantly and on demand. Here's the playbook:

Step 1: Audit Your Style Guide

Go through your existing style guide and classify every rule into one of these types:

  • Terminology rules — "Use X, not Y." These are the easiest to encode.
  • Pattern rules — "Replace wordy phrase A with concise phrase B." Build lookup tables (see the sketch after this list).
  • Structural rules — "Every procedure must have numbered steps." These become checklist items.
  • Judgment rules — "Use active voice when the agent is known." These are where the LLM shines.
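
As promised above, here is a sketch of terminology and pattern rules encoded as lookup tables once they are pulled out of prose; the entries are illustrative examples, not our actual rule set.

```python
# Illustrative rule tables; a real criteria file would be far larger and would
# also record exceptions (e.g. official product names).

TERMINOLOGY = {
    "log-in": "login",           # "Use X, not Y" rules
    "e-mail": "email",
}

WORDY_PHRASES = {
    "in order to": "to",         # wordy phrase -> concise phrase
    "has the ability to": "can",
    "prior to": "before",
}

def find_pattern_issues(text: str) -> list:
    issues = []
    for wrong, right in {**TERMINOLOGY, **WORDY_PHRASES}.items():
        if wrong in text.lower():
            issues.append({"original": wrong, "suggested": right, "severity": "Minor"})
    return issues
```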

Step 2: Build the Criteria Reference

Create a single document that contains all your rules in explicit, example-rich format. Structure it with clear section numbers so the AI can cite specific rules in its justifications. Include both positive examples (correct) and negative examples (incorrect) for every rule.

Step 3: Define the Output Contract

Design your annotation template before you build the skill. Decide:

  • What metadata goes in the summary?
  • What columns does the change sheet need?
  • What severity levels will you use?
  • What metrics will you compute?

Step 4: Write the Skill Definition

Define the persona, the procedure and the input/output handling. Be specific about the order of operations. Include error handling for edge cases (PDF extraction failures, ambiguous document types, etc.).

Step 5: Add Computational Metrics

Identify which aspects of quality can be measured numerically. Sentence length, passive voice density and acronym coverage are good starting points. Write small scripts that compute these metrics and integrate them into the review flow.
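
A sketch of that integration, assuming each metric is a plain function from text to a number (like the sketches earlier in this post); the targets are the ones quoted above, and the structure is illustrative.

```python
# Fold individual metric scripts into one report the review can cite.
TARGETS = {
    "average_sentence_length": (15, 22),    # words per sentence
    "passive_voice_density": (0.0, 0.20),   # fraction of sentences
    "acronym_coverage": (0.90, 1.0),        # fraction defined on first use
}

def metric_report(text: str, metric_fns: dict) -> dict:
    report = {}
    for name, fn in metric_fns.items():
        value = fn(text)
        low, high = TARGETS.get(name, (float("-inf"), float("inf")))
        report[name] = {"value": round(value, 2), "within_target": low <= value <= high}
    return report
```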

Step 6: Iterate With Real Documents

Run the skill against real documents from your team. Have information developers use it for self-review and collect their feedback. Tune the criteria based on false positives and missed issues. Add exceptions for legitimate edge cases. Refine the severity classifications based on what information developers actually find actionable. The skill improves fastest when the people using it daily are the ones shaping the rules.

Lessons Learned

Start with the rules, not the AI. The quality of the skill is directly proportional to the quality of the encoded criteria. Spend most of your time on the style guide, not the prompt engineering.

Be explicit about exceptions. Every terminology rule has edge cases. Document them upfront. Without explicit exceptions, the AI will flag legitimate usage and information developers will stop trusting the tool. Trust is everything in a self-review system—the moment information developers dismiss the output as noisy, they stop using it.

Severity matters more than count. A review that finds 50 Minor issues is less actionable than one that finds 3 Critical issues. The severity framework is what transforms a wall of feedback into a prioritized to-do list that information developers can actually work through.

Self-review is not a replacement for editorial review. The skill catches mechanical and structural issues — the things that are tedious for humans but straightforward for a machine. It does not replace the editor's judgment on content accuracy, audience appropriateness or narrative quality. What it does is free up the editor's time and attention for exactly that high-value work.

The skill teaches the style guide. Every self-review is a micro-learning moment. The justification column explains why each change is recommended, citing specific rules. Over time, information developers internalize the standards and make fewer errors in the first draft. The skill doesn't just catch mistakes—it prevents them.

Structured output enables automation. Because the output follows a consistent schema, downstream processes can consume it—tracking metrics over time, generating quality dashboards or gating submissions on minimum quality thresholds.
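
For example, a few lines of Python can consume the change sheet, assuming it is rendered as a Markdown table with severity in the third column (an illustrative layout, not the template's exact one).

```python
# Sketch of downstream consumption: count issues by severity from a Markdown
# change-sheet table and gate submission on "no Critical issues".

def severity_counts(change_sheet_markdown: str) -> dict:
    counts = {"Critical": 0, "Major": 0, "Minor": 0}
    for line in change_sheet_markdown.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) >= 3 and cells[2] in counts:
            counts[cells[2]] += 1
    return counts

def ready_to_submit(change_sheet_markdown: str) -> bool:
    return severity_counts(change_sheet_markdown)["Critical"] == 0
```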

The skill is the style guide. Once the criteria are encoded in a machine-readable format, the "style guide" is no longer a document that people read once during onboarding—it's a system that actively coaches information developers every time they self-review. Updates to the criteria file immediately change the behavior of every future review.

Looking Ahead

This is just the beginning. The same pattern—encode domain expertise as structured criteria, let the LLM apply it with judgment and augment with deterministic computation—can be applied as a self-review tool beyond documentation:

  • Code review—let developers self-check against coding standards before submitting a pull request.
  • API documentation—information developers validate that every endpoint has examples, error responses and authentication details before publishing.
  • Regulatory compliance—authors self-check documents against specific regulatory requirements before legal review.
  • Localization review—translators verify that their output meets style and terminology standards for each locale before it goes to a reviewer.

The common thread is shifting quality left—empowering the person doing the work to catch issues at the point of creation, not the point of review. The tools are here. The pattern is proven. The only prerequisite is a team that has codified its standards—and the willingness to put those standards directly in the hands of the people who write.

Deepti Gupta

Deepti Gupta has over 16 years of experience spanning software development, content development and technical writing. Her career began as an Oracle Apps Developer, where she built a strong foundation in ERP databases, business reports and supporting financial operations through Oracle Apps Finance modules.

This deep technical background now informs her approach to documentation, ensuring clarity in every piece of technical and courseware content she creates. Eager to implement AI-enhanced workflows, Deepti seeks to learn about the broader applicability of AI across technical documentation and instructional design, including adaptive learning paths and microlearning modules.

Driven by curiosity and a commitment to continuous improvement, Deepti advocates for hybrid workflows that combine AI efficiency with human creativity. Her goal is to empower teams to produce high-impact content that resonates with users and supports business goals. In her free time, she enjoys reading literary fiction, learning to play the ukulele and watching mind-bending sci-fi movies.
