Word Agent

1. Overview

WenCe AI (Word Agent) is an AI-assisted writing system built on single-agent and multi-agent workflows. After installing the add-in in an office suite (such as WPS or Microsoft Word), users can interact with the AI in natural language to get writing suggestions, generate content, and optimize document structure.

WenCe AI (Word Agent): strategy-driven writing, smarter expression.

The backend is built with FastAPI. The frontend WPS add-in communicates with the backend via streaming APIs, so users can see LLM outputs in real time for a smooth writing-assistant experience.

The frontend uses Vue 3 and JavaScript. A key module is the DocxJson bidirectional converter, which converts formatted Word content and JSON structures back and forth.

The backend is implemented in Python, using LangChain and LangGraph for agent design and collaboration, ChatOpenAI-compatible APIs for SSE streaming and tool calling, and a lightweight PySide6 desktop panel for add-in installation and terminal log inspection.
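To make the streaming idea concrete, here is a minimal sketch of SSE-style framing using only the standard library. It assumes the backend-to-add-in stream carries JSON payloads in SSE frames; the source only confirms SSE for the LLM API, so the event names (`token`, `done`), the `delta` field, and all function names below are hypothetical and not taken from the project.

```python
import json


def format_sse(payload, event=None):
    """Encode one payload as a Server-Sent Events frame."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps escapes newlines, so the payload always fits on one data line.
    lines.append(f"data: {json.dumps(payload, ensure_ascii=False)}")
    return "\n".join(lines) + "\n\n"


def stream_tokens(tokens):
    """Yield one SSE frame per token, then a final 'done' frame."""
    for token in tokens:
        yield format_sse({"delta": token}, event="token")
    yield format_sse({}, event="done")


def parse_sse(stream_text):
    """Decode SSE frames back into (event, payload) pairs."""
    events = []
    for frame in stream_text.split("\n\n"):
        if not frame.strip():
            continue
        event, data = "message", ""
        for line in frame.split("\n"):
            if line.startswith("event: "):
                event = line[len("event: "):]
            elif line.startswith("data: "):
                data += line[len("data: "):]
        events.append((event, json.loads(data)))
    return events


raw = "".join(stream_tokens(["Hello", ", ", "world"]))
events = parse_sse(raw)
text = "".join(payload["delta"] for name, payload in events if name == "token")
```

In a real service the generator would typically be handed to the web framework's streaming response type, and the add-in would append each `delta` to the panel as it arrives.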

At its core, this project focuses on structured Word document generation. The project defines a JSON schema conceptually similar to HTML and CSS, abstracting paragraph and text-run style attributes so the agent can better understand and generate well-formatted Word content.

Main JSON data structures:

- paragraphs: an array of Word paragraphs, each containing multiple runs; this is the primary editable object for the agent
  - pStyle: paragraph style ID (for example, Heading 1, Heading 2, Body)
  - runs: text-run array, the smallest content unit in this project
    - text: text content
    - rStyle: character style ID (for example, bold, red)
  - paraIndex: paragraph index, used by the agent to locate and edit a specific paragraph precisely
- styles: style-definition dictionary that contains all paragraph and character style definitions; the agent references these style IDs to preserve formatting correctness
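As a rough illustration of the structure above, the following Python sketch builds a small document and checks that every referenced style ID is defined. The field names `paragraphs`, `pStyle`, `runs`, `text`, `rStyle`, `paraIndex`, and `styles` come from the description above. The concrete style IDs, the style attributes (`type`, `bold`, `color`, `fontSize`), and the helper functions are assumptions for illustration only, not the project's actual schema.

```python
# Illustrative document in the JSON shape described above.
# Style IDs and attribute names (type, bold, color, fontSize) are hypothetical.
doc = {
    "paragraphs": [
        {
            "paraIndex": 0,
            "pStyle": "Heading1",
            "runs": [{"text": "Internship Objective", "rStyle": "Bold"}],
        },
        {
            "paraIndex": 1,
            "pStyle": "Body",
            "runs": [
                {"text": "Gain hands-on experience ", "rStyle": None},
                {"text": "in backend development.", "rStyle": "Red"},
            ],
        },
    ],
    "styles": {
        "Heading1": {"type": "paragraph", "fontSize": 16},
        "Body": {"type": "paragraph", "fontSize": 11},
        "Bold": {"type": "character", "bold": True},
        "Red": {"type": "character", "color": "FF0000"},
    },
}


def find_paragraph(document, para_index):
    """Return the paragraph with the given paraIndex, or None if absent."""
    for paragraph in document["paragraphs"]:
        if paragraph["paraIndex"] == para_index:
            return paragraph
    return None


def paragraph_text(paragraph):
    """Concatenate the text of all runs in a paragraph."""
    return "".join(run["text"] for run in paragraph["runs"])


def undefined_styles(document):
    """List style IDs that are referenced but missing from the styles dictionary."""
    missing = set()
    for paragraph in document["paragraphs"]:
        ids = [paragraph["pStyle"]] + [run["rStyle"] for run in paragraph["runs"]]
        missing.update(i for i in ids if i is not None and i not in document["styles"])
    return sorted(missing)
```

A check like `undefined_styles` reflects why the agent references style IDs rather than raw formatting: output that names an undefined style can be caught before it is rendered.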

Compared with many AI writing assistants on the market, WenCe AI provides:

Cross-version and cross-platform support: built on mainstream office software with a Copilot-like Word add-in UX, lowering the barrier for general users, and supporting both Windows and Linux.
Native rich-text editing with style and paragraph awareness: unlike many Word AI tools, this project can understand Word document structure, autonomously gather online information, and modify both structure and content based on user requirements.
Efficient editing with multi-agent collaboration: multiple agents take different expert roles and collaborate to produce in-depth long-form writing.
Open and flexible integration: supports user-provided API keys and is compatible with most mainstream LLM providers and models.

2. Project Preview

Screenshots: WPS add-in UI and backend Qt UI.

For example, in WPS single-agent mode, if a user asks: "Expand my internship objective into five points," the agent follows a "locate → read → understand → edit" workflow. It calls search_document to locate the target paragraph, read_document to fetch the content, then performs edits (for example via delete_document) and finally calls generate_document to produce the rewritten result. The frontend add-in highlights before/after changes with different colors so users can clearly see what was modified.

Note: The output includes not only text content but also matching style metadata (for example headings/body, bold, font, indentation, and line spacing). The frontend add-in renders the final Word-formatted result based on these style definitions.
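The "locate → read → understand → edit" workflow above can be sketched with in-memory stand-ins for the four tools. The tool names come from the text; their signatures, the list-based document, and the placeholder rewrite step (which stands in for the LLM) are assumptions for illustration and do not reflect the project's real tool interfaces.

```python
# In-memory stand-ins for the tools named above. Signatures are hypothetical.
document = [
    {"paraIndex": 0, "pStyle": "Heading1", "runs": [{"text": "Internship Report"}]},
    {"paraIndex": 1, "pStyle": "Body", "runs": [{"text": "Internship objective: learn a lot."}]},
    {"paraIndex": 2, "pStyle": "Body", "runs": [{"text": "Schedule: June to August."}]},
]


def search_document(doc, keyword):
    """Locate: return paraIndex values of paragraphs whose text contains keyword."""
    return [
        p["paraIndex"]
        for p in doc
        if keyword.lower() in "".join(r["text"] for r in p["runs"]).lower()
    ]


def read_document(doc, start, end):
    """Read: return paragraphs with start <= paraIndex <= end."""
    return [p for p in doc if start <= p["paraIndex"] <= end]


def delete_document(doc, start, end):
    """Edit: return a copy of doc without the paragraphs in [start, end]."""
    return [p for p in doc if not (start <= p["paraIndex"] <= end)]


def generate_document(doc, position, texts, p_style="Body"):
    """Generate: insert new paragraphs at list position and renumber paraIndex."""
    new = [{"pStyle": p_style, "runs": [{"text": t}]} for t in texts]
    kept = [dict(p) for p in doc]  # shallow copies so the input stays unchanged
    merged = kept[:position] + new + kept[position:]
    for i, p in enumerate(merged):
        p["paraIndex"] = i
    return merged


target = search_document(document, "internship objective")[0]    # locate
original = read_document(document, target, target)               # read
points = [f"Objective {n}: ..." for n in range(1, 6)]            # understand (LLM stand-in)
edited = generate_document(delete_document(document, target, target), target, points)
```

Keeping `document` and `edited` as separate values is what lets a frontend compare the two and highlight the before/after difference.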

Another example, this time in multi-agent mode: when a user asks for a long illustrated novel, the specialized agents work in sequence. The planner agent orchestrates the workflow, the research agent searches online novels and calls image-generation tools, the outline agent drafts the novel outline, the writer agent writes the content, and finally the reviewer agent reviews the paragraphs and suggests revisions.

Note: Multi-agent mode excels at generating long-form content while staying on-topic and maintaining coherence, but has slightly weaker tool-calling capability compared to single-agent mode.

In addition, the project supports two types of pluggable extensions: MCP servers and Skills.

MCP server example (third-party API/service integration): users can configure MCP servers so the agent can call third-party APIs as tools. For example, with Amap (Gaode) Maps MCP and a Visualization Chart MCP Server, when a user asks: "Query Changsha's weather for the next five days, draw a temperature line chart, and write a weather report," the agent can retrieve temperature data via the Amap MCP server, then generate a chart image URL via the chart MCP server and render it in the add-in panel.

Skill example (packaged, reusable workflows): a Skill bundles reusable capabilities and procedures (for example prompt templates, tool-call orchestration, or domain-specific writing logic). Once loaded, the agent can select and execute the appropriate Skill to complete certain task types more reliably.

3. Development Plan

Single-agent mode
Multi-agent mode
Remote MCP server integration
Local MCP server and Skill tool integration
Context compression support
Advanced style editing (tables, illustrations, equations, etc.); equations can be read but not generated

Supported Office Suites

WPS Office (Windows, Linux), version 12.1.25225 and above
Microsoft Word (Windows, Web), version 2019/2021 and above

4. System Architecture

To better satisfy user needs and improve generation stability and depth, the project provides two agent architectures.

4.1 Single-Agent Loop Architecture

Architecture Diagram

The frontend WPS add-in converts the user's request and selected document range into structured JSON and sends it to the backend.

In the backend single-agent architecture, the system follows a standard ReAct loop. In each round, the agent reasons over user input and current document state, chooses whether to call a tool (such as web search) or finish directly, and continues this tool-use/reasoning loop until completion.

read_document tool: reads content in the (startPosition, endPosition) range and returns structured JSON to the agent.
generate_document tool: generates structured JSON document content and returns it to the frontend add-in.
search_document tool: locates paragraph positions by format or text criteria and returns positions to the agent.
web_fetch tool: fetches information from user-provided links.
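The loop described above can be sketched in plain Python. This is a minimal illustration of a ReAct-style control loop, not the project's LangGraph implementation: `decide` is a hypothetical stand-in for the LLM, the stub tools return fixed values, and the `text` argument of `search_document` is an assumed parameter name (`startPosition` and `endPosition` follow the tool description above).

```python
def react_loop(decide, tools, user_request, max_rounds=8):
    """Run a reason/act loop until `decide` returns a final answer.

    `decide` stands in for the LLM: given the request and the observations
    so far, it returns ("call", tool_name, kwargs) or ("finish", answer).
    """
    observations = []
    for _ in range(max_rounds):
        action = decide(user_request, observations)
        if action[0] == "finish":
            return action[1], observations
        _, tool_name, kwargs = action
        if tool_name not in tools:
            # Feed the error back so the next round can react to it.
            observations.append((tool_name, f"error: unknown tool {tool_name}"))
            continue
        observations.append((tool_name, tools[tool_name](**kwargs)))
    raise RuntimeError("agent did not finish within max_rounds")


# Stub tools; real tools would touch the document or the network.
tools = {
    "search_document": lambda text: [3],
    "read_document": lambda startPosition, endPosition: {"paragraphs": ["old text"]},
}


def scripted_policy(request, observations):
    """Hypothetical policy: locate, then read, then finish."""
    if len(observations) == 0:
        return ("call", "search_document", {"text": "objective"})
    if len(observations) == 1:
        position = observations[0][1][0]
        return ("call", "read_document", {"startPosition": position, "endPosition": position})
    return ("finish", f"rewrote paragraph {observations[0][1][0]}")


answer, trace = react_loop(scripted_policy, tools, "Expand my internship objective")
```

The `max_rounds` cap is a design choice of this sketch: it guarantees termination even if the policy never decides to finish.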

4.2 Multi-Agent Architecture

Architecture Diagram

The frontend flow is the same as in single-agent mode. In the backend multi-agent workflow, a planner agent orchestrates and schedules several specialized agents.

research agent: collects online reference information
outline agent: generates an outline based on references and user requirements
writer agent: writes content based on references and user requirements
reviewer agent: reviews generated content and provides revision suggestions
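The planner-driven flow above can be sketched as a sequence of functions passing a shared state. The agent roles come from the list above; the state keys, the fixed execution order, and the stub outputs are assumptions for illustration, and the real project builds this flow with LangChain and LangGraph rather than plain functions.

```python
def research_agent(state):
    """Collect reference material (stubbed with a placeholder string)."""
    state["references"] = [f"reference about {state['request']}"]
    return state


def outline_agent(state):
    """Produce an outline from the references and the request."""
    state["outline"] = ["Introduction", "Main body", "Conclusion"]
    return state


def writer_agent(state):
    """Write one draft section per outline entry."""
    count = len(state["references"])
    state["draft"] = [f"{title}: text based on {count} reference(s)" for title in state["outline"]]
    return state


def reviewer_agent(state):
    """Attach one revision suggestion per draft section."""
    state["review"] = [
        f"Section {i + 1}: consider adding an example" for i in range(len(state["draft"]))
    ]
    return state


AGENTS = {
    "research": research_agent,
    "outline": outline_agent,
    "writer": writer_agent,
    "reviewer": reviewer_agent,
}


def planner_agent(request, plan=("research", "outline", "writer", "reviewer")):
    """Schedule the specialized agents in order and record who ran."""
    state = {"request": request, "history": []}
    for name in plan:
        state = AGENTS[name](state)
        state["history"].append(name)
    return state


result = planner_agent("a long novel with illustrations")
```

A real planner would choose and possibly repeat steps dynamically, for example sending the draft back to the writer after review; the fixed `plan` tuple here only shows the hand-off of shared state.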

5. Quick Start

Environment Setup

Node v22.12.0
wpsjs 2.2.3
Python 3.11.14
Windows 10/11 or Ubuntu 22.04

Build Frontend Add-in

# Choose ONE of the two add-in directories:
cd frontend/wps_word_plugin            # WPS Word add-in
# cd frontend/microsoft_word_plugin    # or the Microsoft Word add-in
pnpm install
pnpm build

Run Backend Service

cd backend
uv run python main.py

Use LangSmith Tracing

The project also supports LangSmith for tracing and analyzing agent behavior. For setup details, see backend/README.md.

Package the Desktop App

cd backend/deploy
uv run pyinstaller wence.spec

The packaged executable is generated in backend/deploy/dist.

If you do not want to package it yourself, you can directly download the packaged archive from Releases and run the executable after extraction.

Download

Packaged release files are available on the project's Releases page.

Run the App

After downloading, run the executable, start the backend service (wence_word_plugin -> Install), open Word, trust the add-in, and start using the system.

You need to configure an LLM API. This project is currently tested with Alibaba Bailian Qwen3.5-Plus APIs.

6. LLM API Compatibility

The project has been tested with several mainstream LLM APIs, and compatibility is still expanding.

Recommended: Use GPT series models for best results, followed by Qwen series models. See the evaluation document for details.

Note: part of development used free quotas from Alibaba Bailian and OpenRouter.

7. About

Contact: https://visresearch.github.io/WordAgent/guide/about.html

8. License

Apache License 2.0.
