Skip to content

visresearch/WordAgent

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

137 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Word Agent

Python FastAPI LangChain LangGraph Node.js Version License

English | 中文文档

1. Overview

This project is an AI-assisted writing system based on (multi-)agent workflows: WenCe AI (Word Agent). After users install the add-in in office suites (such as WPS and Microsoft Word), they can interact with AI through natural language to get writing suggestions, content generation, and structure optimization.

WenCe AI (Word Agent): strategy-driven writing, smarter expression.

The backend is built with FastAPI. The frontend WPS add-in communicates with the backend via streaming APIs, so users can see LLM outputs in real time for a smooth writing-assistant experience.

The frontend uses Vue 3 and JavaScript. A key module is the DocxJson bidirectional converter, which converts formatted Word content and JSON structures back and forth.

The backend is implemented in Python, using LangChain and LangGraph for agent design and collaboration, ChatOpenAI-compatible APIs for SSE streaming and tool calling, and a lightweight PySide6 desktop panel for add-in installation and terminal log inspection.

At its core, this project focuses on structured Word document generation. The project defines a JSON schema conceptually similar to HTML and CSS, abstracting paragraph and text-run style attributes so the agent can better understand and generate well-formatted Word content.

Main JSON data structures:

  • paragraphs: an array of Word paragraphs containing multiple runs; this is the primary editable object for the agent
    • pStyle: paragraph style ID (for example, Heading 1, Heading 2, Body)
    • runs: text-run array, the smallest content unit in this project
      • text: text content
      • rStyle: character style ID (for example, bold, red)
    • paraIndex: paragraph index, used by the agent to locate and edit a specific paragraph precisely
  • styles: style-definition dictionary that contains all paragraph and character style definitions; the agent references these style IDs to preserve formatting correctness

Compared with many AI writing assistants on the market, WenCe AI provides:

  1. Cross-version and cross-platform support: built on mainstream office software with a Copilot-like Word add-in UX, lowering the barrier for general users, and supporting both Windows and Linux.
  2. Native rich-text editing with style and paragraph awareness: unlike many Word AI tools, this project can understand Word document structure, autonomously gather online information, and modify both structure and content based on user requirements.
  3. Efficient editing with multi-agent collaboration: multiple agents take different expert roles and collaborate to produce in-depth long-form writing.
  4. Open and flexible integration: supports user-provided API keys and is compatible with most mainstream LLM providers and models.

2. Project Preview

WPS Add-in UI Backend Qt UI

For example, in WPS single-agent mode, if a user asks: "Expand my internship objective into five points," the agent follows a "locate → read → understand → edit" workflow. It calls search_document to locate the target paragraph, read_document to fetch the content, then performs edits (for example via delete_document) and finally calls generate_document to produce the rewritten result. The frontend add-in highlights before/after changes with different colors so users can clearly see what was modified.

Note: The output includes not only text content but also matching style metadata (for example headings/body, bold, font, indentation, and line spacing). The frontend add-in renders the final Word-formatted result based on these style definitions.

Another example, switching to multi-agent mode: when a user requests to write a long novel with illustrations, each specialized agent works in sequence: the planner agent orchestrates the workflow, the research agent searches online novels and calls image generation tools, the outline agent describes the novel outline, the writer agent outputs the article content, and finally the reviewer agent reviews paragraphs and suggests revisions.

Note: Multi-agent mode excels at generating long-form content while staying on-topic and maintaining coherence, but has slightly weaker tool-calling capability compared to single-agent mode.

In addition, the project supports two types of pluggable extensions: MCP servers and Skills.

  1. MCP server example (third-party API/service integration): users can configure MCP servers so the agent can call third-party APIs as tools. For example, with Amap (Gaode) Maps MCP and a Visualization Chart MCP Server, when a user asks: "Query Changsha's weather for the next five days, draw a temperature line chart, and write a weather report," the agent can retrieve temperature data via the Amap MCP server, then generate a chart image URL via the chart MCP server and render it in the add-in panel.

  1. Skill example (packaged, reusable workflows): a Skill bundles reusable capabilities and procedures (for example prompt templates, tool-call orchestration, or domain-specific writing logic). Once loaded, the agent can select and execute the appropriate Skill to complete certain task types more reliably.

3. Development Plan

  • Single-agent mode
  • Multi-agent mode
  • Remote MCP server integration
  • Local MCP server and Skill tool integration
  • Context compression support
  • Advanced style editing (tables, illustrations, equations, etc.) — equations are readable but cannot be generated

Supported Office Suites

  • WPS Office (Windows, Linux), version 12.1.25225 and above
  • Microsoft Word (Windows, Web), version 2019/2021 and above

4. System Architecture

To better satisfy user needs and improve generation stability and depth, the project provides two agent architectures.

4.1 Single-Agent Loop Architecture

Architecture Diagram

The frontend WPS add-in converts the user's request and selected document range into structured JSON and sends it to the backend.

In the backend single-agent architecture, the system follows a standard ReAct loop. In each round, the agent reasons over user input and current document state, chooses whether to call a tool (such as web search) or finish directly, and continues this tool-use/reasoning loop until completion.

  • read_document tool: reads content in the (startPosition, endPosition) range and returns structured JSON to the agent.
  • generate_document tool: generates structured JSON document content and returns it to the frontend add-in.
  • search_document tool: locates paragraph positions by format or text criteria and returns positions to the agent.
  • web_fetch tool: fetches information from user-provided links.

4.2 Multi-Agent Architecture

Architecture Diagram

The frontend flow is the same as in single-agent mode. In the backend multi-agent workflow, a planner agent orchestrates and schedules several specialized agents.

  • research agent: collects online reference information
  • outline agent: generates an outline based on references and user requirements
  • writer agent: writes content based on references and user requirements
  • reviewer agent: reviews generated content and provides revision suggestions

5. Quick Start

Environment Setup

  • Node v22.12.0
  • wpsjs 2.2.3
  • Python 3.11.14
  • Windows 10/11 or Ubuntu 22.04

Build Frontend Add-in

cd frontend/wps_word_plugin       # WPS Word add-in
cd frontend/microsoft_word_plugin # Or Microsoft Word add-in
pnpm install
pnpm build

Run Backend Service

cd backend
uv run python main.py

Use LangSmith Tracing

The project also supports LangSmith for tracing and analyzing agent behavior. For setup details, see backend/README.md.

Package the Desktop App

cd backend/deploy
uv run pyinstaller wence.spec

The packaged executable is generated in backend/deploy/dist.

If you do not want to package it yourself, you can directly download the packaged archive from Releases and run the executable after extraction.

Download

Packaged release files: Release.

Run the App

After downloading, run the executable, start the backend service (wence_word_plugin -> Install), open Word, trust the add-in, and start using the system.

You need to configure an LLM API. This project is currently tested with Alibaba Bailian Qwen3.5-Plus APIs.

6. LLM API Compatibility

The project has tested part of the mainstream LLM APIs, and compatibility is still expanding:

  • Qwen 3.6 Plus (stable)
  • GLM-5.1 (stable)
  • GPT 5.4 (stable)
  • MiniMax M2.5 (stable)
  • Step 3.5 Flash (stable)
  • DeepSeek v4 pro (stable)
  • Claude Sonnet/Opus (stable)
  • MiMo-V2.5 (stable)
  • Gemini 3.1 Pro

Recommended: Use GPT series models for best results, followed by Qwen series models. See the evaluation document for details.

Note: part of development used free quotas from Alibaba Bailian and OpenRouter.

7. About

Contact: https://visresearch.github.io/WordAgent/guide/about.html

8. License

Apache License 2.0.

About

An AI agent-powered writing assistance system (Copilot style) that enables AI-assisted content creation via WPS and Microsoft Word add-ins. 基于AI智能体的写作辅助系统,通过WPS、Microsoft Word加载项,实现AI辅助的文字创作

Topics

Resources

License

Stars

Watchers

Forks

Contributors