AI-powered paper reading assistant for Zotero
PaperWorm adds an AI chat panel to Zotero's PDF reader. While reading a paper, you can ask questions, request summaries, quote selected passages as context, and carry on a conversation — all grounded in the paper you are currently reading.
- Contextual chat — the paper's full text, title, authors, year, and abstract are automatically included in every conversation; no manual indexing required for text-based PDFs
- Streaming responses — AI replies appear word by word in real time
- Quick actions — one-click to summarize the paper, or select text from the PDF as context for your questions
- Multi-provider — supports OpenAI, DeepSeek, Anthropic (Claude), Google Gemini, Kimi (Moonshot), Qwen (Alibaba Cloud), OpenRouter, Xiaomi MiMo, MiniMax, and Ollama (local). See Model Guide for details
- Persistent sessions with cross-device sync — every conversation is automatically saved as a Zotero child note attached to the paper; sessions survive Zotero restarts and sync to other devices via your free Zotero account
- Multiple sessions per paper — start new conversations and switch between them via the Session List view
- Rich text rendering — Markdown (headings, bold, lists, code blocks) and LaTeX math (via KaTeX MathML) rendered in AI responses
- Precise layout extraction (MinerU) — optional layout-aware text extraction using MinerU to better handle tables, formulas, and structured content
- Multimodal screenshot analysis — drag to draw a rectangle over any figure, chart, or equation in the PDF; the captured region is attached to your next message and analyzed by a vision-capable model
- Zotero 7, 8, or 9 (`strict_min_version: 6.999`)
- An API key for at least one supported LLM provider (or a local Ollama instance)
- Download the latest `.xpi` file from Releases
- In Zotero, open Tools → Add-ons
- Click the gear icon → Install Add-on From File…
- Select the downloaded `.xpi`
- Restart Zotero when prompted
- Open Edit → Settings → PaperWorm (or Zotero → Settings on macOS)
- Select your LLM provider and enter your API key
- Click Test Connection to verify
- Click Fetch Models to automatically populate the available models for this provider
- Open any PDF in Zotero's reader — the PaperWorm panel will appear in the right sidebar. You can now switch between the fetched models directly in the panel.
PaperWorm supports enhanced text extraction using MinerU, which analyzes document layout to better extract tables, formulas, and structured content from academic papers.
To enable this feature:
- Go to mineru.net and create a free account
- Obtain your API token from the dashboard
- Open Edit → Settings → PaperWorm and enter your MinerU API Token
- Click Test Connection to verify
- When reading a paper in Zotero, click the ⚡ Fine Extraction button in the PaperWorm panel to extract structured text
- The extracted content will be cached as a Zotero note and used for subsequent conversations with this paper
Note: The cached content syncs across devices via your Zotero account. Once extracted, the structured text is available on all your devices without re-extraction.
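Conceptually, the flow is check-cache-then-extract: a hedged sketch is shown below, where the three callback parameters are illustrative stand-ins for PaperWorm's real Zotero-note and MinerU calls, not its actual API.

```typescript
// Hypothetical sketch of the fine-extraction caching flow described above.
// The callbacks are assumptions standing in for real Zotero/MinerU operations.
async function getFineExtraction(
  readCachedNote: () => Promise<string | null>, // previously stored extraction note, if any
  runMinerU: () => Promise<string>,             // one call to the MinerU service
  saveNote: (text: string) => Promise<void>,    // persist as a Zotero child note (syncs)
): Promise<string> {
  const cached = await readCachedNote();
  if (cached !== null) return cached; // synced cache hit: skip re-extraction
  const text = await runMinerU();
  await saveNote(text);
  return text;
}
```

Because the cache lives in a synced note, the second device that opens the paper hits the `readCachedNote` branch and never calls MinerU again.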
PaperWorm can analyze figures, charts, equations, and any visible region of a PDF by capturing a screenshot and sending it to a vision-capable LLM.
How to use:
- While reading a PDF, click the 框选区域 (Select Region) button in the PaperWorm panel toolbar
- The cursor changes to a crosshair — drag to select the region you want to capture
- A thumbnail chip appears above the input box showing your screenshot
- Type your question (or leave it blank) and press Send
- The AI will receive both the screenshot and your text
Configuring the vision assistant model:
The vision assistant runs as a separate LLM call, independent of your main chat model. To configure it:
- Open Edit → Settings → PaperWorm
- Scroll to the 视觉辅助模型 (Vision Assistant Model) section
- Select a provider and enter the vision model name (e.g. `kimi-k2.6` for Kimi)
- The vision assistant uses the same API key as the selected provider — no extra key needed
Recommended vision models:
| Provider | Model | Notes |
|---|---|---|
| Kimi (Moonshot) | kimi-k2.6 | Supports native vision; API key shared with Kimi chat config |
Notes:
- If no vision assistant is configured, the screenshot is replaced with the placeholder text "视觉模型未配置" ("vision model not configured") and the message is sent with that placeholder — the main model will acknowledge the screenshot but cannot see the image
- Screenshot descriptions are cached for the session: the same image region sent multiple times only calls the vision model once
- Screenshot data stays in memory only; stored session notes use `[截图 📷]` ("screenshot") as a placeholder instead of the raw image
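The session-scoped description cache can be sketched as follows. The hashing scheme and function names here are illustrative assumptions; PaperWorm's actual keying strategy may differ.

```typescript
// Hypothetical sketch of the per-session screenshot-description cache.
const descriptionCache = new Map<string, string>();

// A cheap, deterministic key for a captured region: a simple string hash
// over the base64 image data (an assumption, not PaperWorm's real scheme).
function cacheKey(imageBase64: string): string {
  let h = 0;
  for (let i = 0; i < imageBase64.length; i++) {
    h = (h * 31 + imageBase64.charCodeAt(i)) | 0;
  }
  return h.toString(16);
}

async function describeScreenshot(
  imageBase64: string,
  callVisionModel: (img: string) => Promise<string>,
): Promise<string> {
  const key = cacheKey(imageBase64);
  const cached = descriptionCache.get(key);
  if (cached !== undefined) return cached; // same region again: no second vision call
  const description = await callVisionModel(imageBase64);
  descriptionCache.set(key, description);
  return description;
}
```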
| Provider | Recommended Model | Notes |
|---|---|---|
| OpenAI | gpt-5.4 | Requires API key. See OpenAI Model Guide below |
| DeepSeek | deepseek-v4-pro | Requires API key. Supports thinking mode and tool calling |
| Anthropic | claude-sonnet-4-6 | Requires API key. See Claude Model Guide below |
| Google Gemini | gemini-3-flash-preview (free) | Recommended for most tasks; free tier available. See Gemini Model Guide below |
| Kimi (Moonshot) | kimi-k2.6 | Requires API key from platform.moonshot.cn |
| Alibaba Cloud Bailian | qwen3.6-max | Model aggregation platform; requires API key from bailian.console.aliyun.com |
| OpenRouter | any model (e.g. openai/gpt-4.1) | Access hundreds of models via a single API key from openrouter.ai |
| Xiaomi MiMo | mimo-v2-pro | Requires API key from platform.xiaomimimo.com |
| MiniMax | MiniMax-M2.7 | Requires API key from platform.minimaxi.com |
| Ollama | any local model | No API key needed; set base URL (default: http://localhost:11434) |
Every setting — provider, model, API key, temperature, max tokens, and system prompt — is read fresh on each message send. There is no restart or reload required.
This means you can switch providers or models at any point during a conversation. The next message will be sent to the new provider, while the full conversation history remains intact and is passed along as context.
Each provider's API key and model are stored independently, so switching between providers never overwrites your other configurations.
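As a sketch, per-provider storage with send-time resolution might look like this. The names `setConfig` and `resolveSendConfig` are hypothetical, not PaperWorm's actual preferences API.

```typescript
// Illustrative sketch of "read settings fresh on each send" with
// independent per-provider storage. Names are assumptions.
interface ProviderConfig {
  apiKey: string;
  model: string;
}

const store = new Map<string, ProviderConfig>(); // one slot per provider
let activeProvider = "openai";

function setConfig(provider: string, cfg: ProviderConfig): void {
  store.set(provider, cfg); // writing one provider never touches the others
}

function resolveSendConfig(): ProviderConfig {
  // Resolved at send time, so a provider/model switch takes effect
  // on the very next message without any restart.
  const cfg = store.get(activeProvider);
  if (!cfg) throw new Error(`provider ${activeProvider} not configured`);
  return cfg;
}
```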
Google Gemini offers a wide range of models through the Gemini API. Below is a guide to help you choose the right model for paper reading tasks.
Gemini 3 Series (Preview status - latest generation)
| Model | API Identifier | Best For | Free Tier |
|---|---|---|---|
| Gemini 3 Flash | gemini-3-flash-preview | General paper reading, fast responses | ✅ Free |
| Gemini 3.1 Pro | gemini-3.1-pro-preview | Complex reasoning, detailed analysis | ❌ Paid only |
| Gemini 3.1 Flash-Lite | gemini-3.1-flash-lite-preview | High-volume, cost-effective tasks | ✅ Free |
Gemini 2.5 Series (Stable release)
| Model | API Identifier | Best For | Free Tier |
|---|---|---|---|
| Gemini 2.5 Flash | gemini-2.5-flash | Balanced performance and speed | ✅ Free |
| Gemini 2.5 Flash-Lite | gemini-2.5-flash-lite | Most cost-effective option | ✅ Free |
| Gemini 2.5 Pro | gemini-2.5-pro-preview-03-25 | Maximum reasoning capability | ❌ Paid only |
Gemini model names follow this pattern:
gemini-{major}.{minor}-{variant}-{status}
Examples:
- `gemini-3-flash-preview` — Gemini 3 Flash, preview release
- `gemini-3.1-pro-preview` — Gemini 3.1 Pro, preview release
- `gemini-2.5-flash` — Gemini 2.5 Flash, stable release
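A parser for this naming pattern might look like the following sketch. The regex is inferred from the examples above and is not an official Gemini API contract.

```typescript
// Illustrative parser for the gemini-{major}.{minor}-{variant}-{status}
// pattern. The regex is an assumption based on the documented examples.
interface GeminiModelName {
  major: number;
  minor: number | null; // e.g. "gemini-3-flash-preview" has no minor version
  variant: string;      // "flash", "pro", "flash-lite", ...
  status: "preview" | "stable";
}

function parseGeminiModel(id: string): GeminiModelName | null {
  const m = id.match(/^gemini-(\d+)(?:\.(\d+))?-([a-z-]+?)(?:-(preview))?$/);
  if (!m) return null;
  return {
    major: Number(m[1]),
    minor: m[2] !== undefined ? Number(m[2]) : null,
    variant: m[3],
    status: m[4] === "preview" ? "preview" : "stable",
  };
}
```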
- For most users: Use `gemini-3-flash-preview` (free, fast, capable)
- For complex analysis: Use `gemini-3.1-pro-preview` (best reasoning)
- For high-volume reading: Use `gemini-3.1-flash-lite-preview` (most economical)
- For stable production: Use `gemini-2.5-flash` (non-preview, reliable)
- Go to Google AI Studio
- Sign in with your Google account
- Click "Create API Key"
- Copy the key and paste it into PaperWorm settings
Note: The free tier includes generous rate limits for Flash and Flash-Lite models. Pro models require a paid plan.
OpenAI's GPT models are state-of-the-art large language models capable of understanding and generating natural language.
| Model | API Identifier | Best For | Cost |
|---|---|---|---|
| GPT-5.4 | gpt-5.4 | Complex reasoning, coding, analysis | Standard |
| GPT-5.4 Mini | gpt-5.4-mini | Balanced performance and efficiency | Lower |
| GPT-5.4 Nano | gpt-5.4-nano | Simple tasks, high-volume processing | Lowest |
| Model | API Identifier | Best For |
|---|---|---|
| GPT-4o | gpt-4o | Multimodal tasks (text + vision) |
| GPT-4o Mini | gpt-4o-mini | Cost-effective general tasks |
- For most users: Use `gpt-5.4` (flagship model with best reasoning)
- For cost-conscious users: Use `gpt-5.4-mini` (good balance of quality and cost)
- For quick summaries: Use `gpt-5.4-nano` (fastest, most economical)
- Go to platform.openai.com
- Sign up or sign in to your account
- Navigate to API Keys in the left sidebar
- Click Create new secret key
- Copy the key and paste it into PaperWorm settings
Note: OpenAI requires a paid account with available credits. New accounts may receive free trial credits.
Anthropic's Claude models are designed for high performance across language, reasoning, analysis, and coding tasks.
| Model | Best For | Speed | Intelligence |
|---|---|---|---|
| Claude Opus 4.6 | Complex analysis, coding, professional work | Standard | Highest |
| Claude Sonnet 4.6 | General paper reading, balanced performance | Fast | High |
| Claude Haiku 4.5 | Quick summaries, high-volume processing | Fastest | Near-frontier |
Claude model names follow this pattern:
claude-{variant}-{version}
- Opus: Most intelligent, best for complex reasoning and coding
- Sonnet: Balanced intelligence and speed, good for most tasks
- Haiku: Fastest, most cost-effective for simple tasks
- For most users: Use `claude-sonnet-4-6` (best balance of quality and speed)
- For deep analysis: Use `claude-opus-4-6` (maximum reasoning capability)
- For quick summaries: Use `claude-haiku-4-5` (fastest, most economical)
- Go to console.anthropic.com
- Sign up or sign in to your account
- Navigate to API Keys section
- Click Create Key and copy it
- Paste the key into PaperWorm settings
PaperWorm automatically injects the paper's full text into every conversation so the AI can answer detailed questions about the content.
PaperWorm uses multiple strategies in order, stopping at the first that succeeds:
- MinerU precise cache — if you have previously performed fine extraction on this paper, the structured text (with proper handling of tables and formulas) is used immediately
- Zotero full-text index — if the paper has already been indexed by Zotero, the cached text is used immediately
- On-demand indexing — if not yet indexed, PaperWorm triggers Zotero's built-in PDF indexer (`pdftotext`) and reads the result; this is automatic and requires no user action
- Rendered page text — if indexing fails or is unavailable, PaperWorm reads the text directly from the PDF viewer's rendered pages (`.textLayer` DOM elements)
You do not need to manually pre-index papers. Strategy 3 handles indexing automatically on first use.
- Scanned / image-only PDFs (no OCR text layer): none of the four strategies can extract text. The AI will work from the title, authors, and abstract only.
- Strategy 4 coverage: only pages that have been rendered in the viewer (i.e., pages you have scrolled to) are available. For complete coverage of long papers, strategies 1-3 (which read the full file) are preferable — they are attempted first.
- Character limit: up to 400,000 characters (~120 pages) of full text are injected per message.
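The four-step chain above can be sketched as an ordered list of extractors tried in turn. The signatures are illustrative, not PaperWorm's actual module API.

```typescript
// Sketch of the ordered fallback chain: each extractor returns the paper
// text, or null/empty when its strategy is unavailable.
type Extractor = () => Promise<string | null>;

const MAX_CHARS = 400_000; // per-message injection limit noted above

async function extractFullText(strategies: Extractor[]): Promise<string | null> {
  for (const tryNext of strategies) {
    const text = await tryNext();
    if (text) return text.slice(0, MAX_CHARS); // first success wins
  }
  return null; // e.g. scanned PDF with no OCR layer: metadata-only context
}

// Usage: order matters — precise cache first, rendered pages last.
// const text = await extractFullText([
//   readMinerUCache, readZoteroIndex, runPdfToText, readTextLayerDom,
// ]);
```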
PaperWorm stores each conversation as a Zotero child note attached to the paper item. This has two benefits:
- Persistence — sessions survive Zotero restarts (unlike in-memory history)
- Free cross-device sync — Zotero notes are item metadata and sync automatically with a free Zotero account, with no file storage subscription required
Each session note contains:
- A human-readable transcript (alternating 用户 (User) and AI turns)
- A machine-readable metadata block used by PaperWorm to restore the session
The metadata block appears at the bottom of the note under a 会话元数据 (Session Metadata) label as a Base64-encoded JSON string. This is intentional and expected — it is how PaperWorm stores the conversation data for reloading.
If you open a PaperWorm session note directly in Zotero's note editor, you will see this Base64 text at the bottom. You can ignore it. Do not edit or delete it manually, as doing so will prevent PaperWorm from loading that session.
Session notes are titled PaperWorm · <first message> and are listed under the paper in Zotero's item tree.
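Conceptually, the metadata block is just a UTF-8 JSON payload wrapped in Base64. A minimal encode/decode sketch follows, with an assumed schema that may differ from PaperWorm's actual one.

```typescript
// Hypothetical sketch of the Base64 metadata roundtrip. The SessionMeta
// shape is an assumption for illustration only.
import { Buffer } from "node:buffer";

interface SessionMeta {
  messages: { role: "user" | "assistant"; content: string }[];
}

function encodeSessionMeta(meta: SessionMeta): string {
  // UTF-8-encode before Base64: transcripts contain CJK text.
  return Buffer.from(JSON.stringify(meta), "utf8").toString("base64");
}

function decodeSessionMeta(block: string): SessionMeta {
  return JSON.parse(Buffer.from(block, "base64").toString("utf8"));
}
```

This is also why hand-editing the block breaks session loading: any stray character invalidates the Base64/JSON decode.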
API keys are stored in Zotero's local preferences file (prefs.js) in plain text, which is standard practice for Zotero plugins. This file is excluded from version control via .gitignore.
Never share your Zotero profile directory or commit prefs.js to any repository.
- Node.js 18+ (tested with v22)
- npm
```bash
npm install
npm run build
```

The built `.xpi` is output to `.scaffold/build/`.
```
PaperWorm/
├── addon/                 Static plugin assets
│   ├── bootstrap.js       Zotero entry point
│   ├── manifest.json      Plugin manifest
│   ├── content/           XHTML UI + icons
│   └── locale/            FTL localization (en-US, zh-CN)
├── src/                   TypeScript source
│   ├── index.ts           Entry point
│   ├── hooks.ts           Lifecycle hooks
│   └── modules/
│       ├── llm/           LLM provider abstraction + implementations
│       ├── chat/          Chat history management
│       ├── paper/         Paper metadata extraction
│       └── ui/            Reader panel UI
├── docs/                  Project documentation
└── typings/               Global type declarations
```
- Create `src/modules/llm/<name>.ts` implementing the `LLMProvider` interface (skip if the provider is OpenAI-compatible — reuse `OpenAIProvider` with a different `baseUrl`)
- Register it in `src/modules/llm/manager.ts` — add to the `ProviderName` type and the `buildProvider()` switch
- Add the corresponding preference fields in `addon/prefs.js` and `addon/content/preferences.xhtml` (use full pref paths: `extensions.zotero.paperworm.llm.<name>.*`)
- Add the provider name to the `providers` array in `src/modules/preferenceScript.ts` → `showProviderSection()`
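As an illustration of the first step, a minimal in-memory provider might look like the sketch below. The `LLMRequestOptions` shape is an assumption here; check the real type in `src/modules/llm` before copying.

```typescript
// Hypothetical minimal provider. LLMRequestOptions is assumed, not
// PaperWorm's actual type definition.
interface LLMRequestOptions {
  model: string;
  messages: { role: string; content: string }[];
}

interface LLMProvider {
  readonly name: string;
  chat(options: LLMRequestOptions): Promise<string>;
  chatStream(
    options: LLMRequestOptions,
    onChunk: (delta: string) => void,
    onDone: () => void,
    onError: (err: Error) => void,
  ): Promise<void>;
  testConnection(): Promise<boolean>;
}

class EchoProvider implements LLMProvider {
  readonly name = "echo";

  async chat(options: LLMRequestOptions): Promise<string> {
    const last = options.messages[options.messages.length - 1];
    return last ? last.content : "";
  }

  async chatStream(
    options: LLMRequestOptions,
    onChunk: (delta: string) => void,
    onDone: () => void,
    onError: (err: Error) => void,
  ): Promise<void> {
    try {
      // Emit the reply word by word to mimic streaming.
      for (const word of (await this.chat(options)).split(" ")) {
        onChunk(word + " ");
      }
      onDone();
    } catch (e) {
      onError(e as Error);
    }
  }

  async testConnection(): Promise<boolean> {
    return true; // a real provider would ping its API endpoint here
  }
}
```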
The LLMProvider interface requires three methods:

```typescript
interface LLMProvider {
  readonly name: string;
  chat(options: LLMRequestOptions): Promise<string>;
  chatStream(options, onChunk, onDone, onError): Promise<void>;
  testConnection(): Promise<boolean>;
}
```

- zotero-plugin-template by @windingwind — project scaffold this plugin is based on
- zotero-plugin-toolkit by @windingwind — Zotero plugin utility library
- zotero-plugin-scaffold by @northword — build toolchain
- KaTeX — math formula rendering
- MinerU by OpenDataLab — free, non-commercial layout-aware PDF text extraction service
- HermesAgent by NousResearch — the vision routing strategy (dedicated vision LLM → text description → main model injection) was inspired by HermesAgent's multimodal architecture
MIT