A transparent AutoHotkey v2 text laboratory for digital humanities exploration and introductory NLP teaching
TextLab-AHK is a lightweight Windows desktop text-analysis environment written entirely in AutoHotkey v2.
It was developed with two parallel goals:
- to support exploratory digital humanities text work, and
- to teach beginner Natural Language Processing concepts through highly readable code.
Most NLP teaching tools today rely on Python ecosystems, external libraries, hidden tokenizers, and opaque package dependencies. TextLab-AHK takes the opposite approach: every core text-processing operation is visible, linear, and easy for students to inspect.
This makes the software not only a usable corpus experimentation tool, but also a practical instructional model for explaining:
- tokenization
- normalization
- lexical counting
- n-gram generation
- concordance construction
- corpus statistics
- vocabulary rarity
Because the codebase is plain AutoHotkey, students can open the scripts and directly follow what each algorithm is doing.
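For a sense of that readability, a whole tokenizer can fit in a few lines of AutoHotkey v2. The sketch below is hypothetical, not the project's shipped code:

```autohotkey
; Hypothetical tokenizer sketch, not TextLab-AHK's actual code.
Tokenize(text) {
    text := StrLower(text)                     ; normalization
    text := RegExReplace(text, "[^\w\s]", "")  ; drop punctuation
    text := RegExReplace(text, "\s+", " ")     ; collapse whitespace
    return StrSplit(Trim(text), " ")           ; split into tokens
}

tokens := Tokenize("To be, or not to be.")
MsgBox tokens.Length  ; 6 tokens
```

Every step is one visible line, which is exactly the property the project relies on for teaching.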
The project contains two script editions.
The full edition, app.ahk, is the main interactive application containing:
- tabbed corpus workspace
- cleaning controls
- analysis modules
- KWIC concordance panel
- notes panel
- export system
It functions as a small desktop textual laboratory.
Recommended for:
- classroom demonstrations
- digital humanities workshops
- guided corpus experiments
- student labs
The lite edition, app-lite.ahk, removes the full graphical desktop interface and instead works through:
- keyboard hotkeys
- clipboard loading
- file loading
- popup prompts
- clipboard result output
- optional export
It is intentionally simpler and easier for students to inspect because the computational logic is not buried inside GUI layout code.
Recommended for:
- teaching source-code reading
- beginner NLP exercises
- algorithm walkthroughs
- minimalist corpus experiments
AutoHotkey was chosen deliberately.
While Python is more powerful in large-scale NLP, AutoHotkey offers several advantages for teaching introductory computational text analysis:
- extremely readable syntax
- straightforward loops
- simple string handling
- no external package installation
- instant desktop execution
- easy hotkey experimentation
- visible procedural logic
Students can therefore focus on algorithmic understanding rather than environment configuration.
The objective of TextLab-AHK is not industrial NLP production.
The objective is:
to make text-processing algorithms understandable.
Each component of the software corresponds to a foundational text-processing concept.
| Module | Demonstrates |
|---|---|
| Corpus Loading | text ingestion |
| Cleaning | normalization / preprocessing |
| Word Frequency | token counting |
| Bigram / Trigram | n-gram construction |
| Corpus Stats | lexical measurement |
| Hapax Legomena | rare vocabulary behavior |
| KWIC Concordance | context retrieval |
| Notes / Export | interpretive documentation |
This allows instructors to move students through a full visible workflow:
raw text → cleaned corpus → token patterns → contextual reading → interpretation.
Load corpus material from:
- .txt files
- OCR text
- copied archive excerpts
- clipboard text
Useful for small to medium classroom corpora.
Preprocessing options include:
- lowercase conversion
- punctuation removal
- number stripping
- whitespace normalization
- short-word removal
- regex substitutions
This helps demonstrate how preprocessing choices alter downstream linguistic results.
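The options above can be sketched as a single cleaning pass. This is an illustrative function (the real controls live in the scripts' UI and hotkeys), with `minLen` as an assumed parameter name:

```autohotkey
; Hypothetical cleaning pass illustrating the preprocessing options.
CleanCorpus(text, minLen := 1) {
    text := StrLower(text)                     ; lowercase conversion
    text := RegExReplace(text, "[^\w\s]", "")  ; punctuation removal
    text := RegExReplace(text, "\d+", "")      ; number stripping
    text := RegExReplace(text, "\s+", " ")     ; whitespace normalization
    kept := ""
    for w in StrSplit(Trim(text), " ")         ; short-word removal
        if (StrLen(w) >= minLen)
            kept .= (kept = "" ? "" : " ") . w
    return kept
}
```

Reordering or disabling any line changes every downstream count, which is the pedagogical point.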
Generate:
- word frequency tables
- bigram frequencies
- trigram frequencies
Students can directly observe lexical repetition and phrase recurrence.
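Both counting tasks reduce to a Map keyed by word or word pair. A minimal sketch (hypothetical helpers, not the shipped code):

```autohotkey
; Hypothetical frequency helpers using AutoHotkey v2 Maps.
WordFreq(words) {
    counts := Map()
    counts.Default := 0        ; missing keys read as 0
    for w in words
        counts[w] += 1
    return counts
}

Bigrams(words) {
    counts := Map()
    counts.Default := 0
    loop words.Length - 1      ; pair each word with its successor
        counts[words[A_Index] . " " . words[A_Index + 1]] += 1
    return counts
}

freq := WordFreq(["to", "be", "or", "not", "to", "be"])
MsgBox freq["to"]  ; 2
```

Trigrams follow the same pattern with a window of three words.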
Quick descriptive metrics:
- character count
- token count
- unique vocabulary
- lexical diversity
Useful for introducing corpus descriptives.
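The four metrics fall out of one pass over the tokens. A sketch, assuming a simple type-token ratio as the lexical diversity measure (other measures exist):

```autohotkey
; Hypothetical descriptive-statistics helper.
; Lexical diversity here is the type-token ratio (an assumption).
CorpusStats(text, words) {
    types := Map()
    for w in words
        types[w] := true       ; collect unique vocabulary
    return Map("chars", StrLen(text), "tokens", words.Length
             , "vocab", types.Count, "diversity", types.Count / words.Length)
}
```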
Lists all words appearing only once.
A simple but powerful introduction to vocabulary distribution.
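Given a frequency Map like the one above, hapax extraction is a one-condition filter (hypothetical sketch):

```autohotkey
; Hypothetical hapax filter: words whose count is exactly 1.
Hapaxes(counts) {              ; counts is a Map of word -> frequency
    result := []
    for word, n in counts
        if (n = 1)
            result.Push(word)
    return result
}
```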
Search a term and generate:
left context + keyword + right context.
This is one of the clearest demonstrations of why context matters beyond frequency counts.
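A character-window KWIC can be sketched as follows; the window size and function name are illustrative choices, not the project's actual interface:

```autohotkey
; Hypothetical character-based KWIC sketch.
Kwic(text, term, window := 30) {
    lines := []
    pos := 1
    while (pos := InStr(text, term, false, pos)) {
        left  := SubStr(text, Max(1, pos - window), Min(window, pos - 1))
        right := SubStr(text, pos + StrLen(term), window)
        lines.Push(left . " [" . term . "] " . right)
        pos += StrLen(term)    ; continue past this match
    }
    return lines
}
```

Each result line shows the keyword bracketed between its left and right context, ready for close reading.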
The full version includes:
- research note taking
- session documentation
- exportable reports
Useful for DH interpretive workflows.
| Hotkey | Function |
|---|---|
| Ctrl + Alt + V | Load clipboard as corpus |
| Ctrl + Alt + O | Open TXT file |
| Ctrl + Alt + C | Clean corpus |
| Ctrl + Alt + F | Word frequency |
| Ctrl + Alt + B | Bigram frequency |
| Ctrl + Alt + T | Trigram frequency |
| Ctrl + Alt + K | KWIC concordance |
| Ctrl + Alt + S | Corpus statistics |
| Ctrl + Alt + H | Hapax words |
| Ctrl + Alt + E | Export last result |
TextLab-AHK is particularly effective for:
- demonstrating what happens inside basic text algorithms
- running lightweight literary or historical corpus inspection
- exploring lexical recurrence and phrase distribution
- showing normalization on messy archival text
Because the source code is short and linear, students can modify functions and immediately see textual consequences.
- Windows
- AutoHotkey v2 installed
Download AutoHotkey: https://www.autohotkey.com/
Place both scripts in the same project folder:
app.ahk
app-lite.ahk
README.md
Then simply run either:
- app.ahk for the full GUI desktop laboratory, or
- app-lite.ahk for the lightweight no-GUI teaching version
No external libraries are required.
A productive classroom sequence is:
- Show students an unprocessed text.
- Demonstrate normalization decisions.
- Discuss token repetition.
- Show phrase windows and co-occurrence.
- Move from counting to contextual interpretation.
- Walk through the AutoHotkey functions line by line.
This last stage is where TextLab-AHK becomes especially valuable: students can connect visible interface actions to visible procedural code.
TextLab-AHK sits between:
- manual close reading,
- digital humanities experimentation,
- and introductory NLP pedagogy.
It is intentionally small, hackable, and transparent.
The software is best understood as:
a visible computational text sandbox.
Free for academic, classroom, personal, and experimental use.
Modification is encouraged.