ShadowCursor
MIT Open Source
Press a shortcut, speak your intent. ShadowCursor reads the page and guides you through it.
Overview
ShadowCursor is an open-source Chrome extension that turns voice commands into guided browser automation. Press Cmd+Shift+K, speak your request, and the extension captures a screenshot, scrapes the visible DOM, and sends both, along with your transcript, to the LLM provider you configured (Anthropic or OpenAI). The model either answers your question inline or proposes a step-by-step action plan, which you supervise while an animated ghost cursor carries it out.
- →Voice capture with live transcript via MediaRecorder + Web Speech API; stops automatically on silence or manually via keyboard.
- →Multimodal context assembly: screenshot + DOM snapshot + page URL are bundled into a single LLM prompt for accurate, page-aware responses.
- →Confirmation-first execution: every action step requires user approval before the ghost cursor moves, so nothing runs without your sign-off.
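The confirmation-first behavior described above can be pictured as a small queue that only advances when the user approves the pending step. This is a minimal sketch, not the extension's actual code: `Step`, `ConfirmationQueue`, and the choice to abandon the remaining plan on rejection are all assumptions made for illustration.

```typescript
// Hypothetical step shape; the real project may carry selectors, action types, etc.
type Step = { description: string };

// Holds an action plan and runs one step at a time, only after approval.
class ConfirmationQueue {
  private index = 0;
  private readonly steps: readonly Step[];
  private readonly perform: (step: Step) => void;

  constructor(steps: readonly Step[], perform: (step: Step) => void) {
    this.steps = steps;
    this.perform = perform;
  }

  // The step currently waiting for approval, or undefined when the plan is done.
  pending(): Step | undefined {
    return this.steps[this.index];
  }

  // Called when the user approves: perform the pending step, then advance.
  approve(): void {
    const step = this.pending();
    if (step === undefined) return;
    this.perform(step);
    this.index += 1;
  }

  // Called when the user declines: abandon the rest of the plan (an assumption).
  reject(): void {
    this.index = this.steps.length;
  }
}
```

In a design like this, nothing calls `perform` except `approve()`, so a step cannot run before the user has confirmed it.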
§Runtime flow
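The full pipeline from keypress to executed action is orchestrated inside your browser. Requests go straight from the extension to the LLM provider (and, optionally, an STT provider) you configured, with no relay server and no persistent cloud session in between: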
- →Cmd+Shift+K (macOS) or Ctrl+Shift+K triggers voice capture; a recording indicator appears in-page.
- →On capture end, the content script bundles raw audio, transcript, DOM snapshot, and page metadata and hands it to the background service worker.
- →The service worker captures a tab screenshot, optionally upgrades the transcript via an external STT provider, then assembles the full multimodal prompt.
- →The LLM returns either mode: 'answer' (explanation card) or mode: 'action' (step list); the content script renders the result and executes each confirmed step.
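The two response modes in the last step map naturally onto a discriminated union. The sketch below validates a raw model reply before anything is rendered. Only the `mode` values `'answer'` and `'action'` come from the description above; the `text`, `steps`, `description`, and `selector` fields and the `parseLlmResult` name are hypothetical.

```typescript
// Hypothetical payload shapes; only the two `mode` values are from the source.
type ActionStep = { description: string; selector: string };
type LlmResult =
  | { mode: "answer"; text: string }
  | { mode: "action"; steps: ActionStep[] };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function isActionStep(value: unknown): value is ActionStep {
  return (
    isRecord(value) &&
    typeof value["description"] === "string" &&
    typeof value["selector"] === "string"
  );
}

// Parse and validate a raw JSON reply; throw on anything unexpected.
function parseLlmResult(raw: string): LlmResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    throw new Error("LLM response is not valid JSON");
  }
  if (!isRecord(data)) {
    throw new Error("LLM response must be a JSON object");
  }
  const mode = data["mode"];
  const text = data["text"];
  const steps = data["steps"];
  if (mode === "answer" && typeof text === "string") {
    return { mode: "answer", text };
  }
  if (mode === "action" && Array.isArray(steps) && steps.every(isActionStep)) {
    return { mode: "action", steps };
  }
  throw new Error("LLM response has an unknown mode or malformed fields");
}
```

Rejecting malformed replies up front means the rendering and execution code only ever sees one of the two known shapes.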
§Stack & architecture
Built entirely in TypeScript with Webpack, structured around Chrome's Manifest V3 service worker model:
- →background/: service-worker.ts orchestrates screenshots, STT resolution, and LLM routing; llm-router.ts supports Anthropic and OpenAI using API keys the user supplies (BYOK).
- →content/: trigger.ts, voice-capture.ts, dom-scraper.ts, action-executor.ts, and shadow-cursor.ts each own a single responsibility; they communicate via typed chrome.runtime messages.
- →shared/: types.ts, constants.ts, storage.ts, and messaging.ts provide a strict contract between background and content layers.
- →options page: lets users configure the LLM provider, API keys, STT provider, auto-execute preference, and destructive-action confirmation.
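A typed message contract between the background and content layers could look roughly like the following. The message names and fields are invented for illustration and are not taken from the project's types.ts or messaging.ts. The sketch shows a runtime guard for incoming messages and an exhaustive switch that fails to compile if a new message type is added without a handler.

```typescript
// Hypothetical message shapes shared by both layers.
type CaptureComplete = { type: "CAPTURE_COMPLETE"; transcript: string; url: string };
type RenderResult = { type: "RENDER_RESULT"; mode: "answer" | "action" };
type Message = CaptureComplete | RenderResult;

// Runtime guard: messages arriving over the wire are untyped, so check them.
function isMessage(value: unknown): value is Message {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  switch (v["type"]) {
    case "CAPTURE_COMPLETE":
      return typeof v["transcript"] === "string" && typeof v["url"] === "string";
    case "RENDER_RESULT":
      return v["mode"] === "answer" || v["mode"] === "action";
    default:
      return false;
  }
}

// Exhaustive handler: the `never` assignment is a compile-time completeness check.
function describeMessage(msg: Message): string {
  switch (msg.type) {
    case "CAPTURE_COMPLETE":
      return `capture from ${msg.url}`;
    case "RENDER_RESULT":
      return `render ${msg.mode}`;
    default: {
      const unreachable: never = msg;
      throw new Error(`Unhandled message: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

In an extension, a listener registered with `chrome.runtime.onMessage` would typically call a guard like `isMessage` first and ignore anything that does not match.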
§Security & privacy
ShadowCursor handles sensitive page context; screenshots and DOM snapshots may contain personal data. The project ships with explicit guidance:
- →API keys are stored in chrome.storage.sync, never in source control or .env files.
- →No keys, screenshots, or session exports are committed to the repository.
- →Host access uses <all_urls> by default; production deployments should restrict it to an allowlist of hosts.
- →SECURITY.md explains how to report vulnerabilities through responsible disclosure.
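Restricting host access is primarily a manifest concern, but a runtime check before capturing page context can complement it. The following is a minimal sketch of such a check, assuming an HTTPS-only policy and subdomain matching. `isAllowedUrl` and the example hosts are hypothetical and not part of the project.

```typescript
// Hypothetical allowlist; a real deployment would load this from configuration.
const ALLOWED_HOSTS: readonly string[] = ["example.com", "docs.example.org"];

// Returns true only for HTTPS URLs whose host is an allowed host or one of its subdomains.
function isAllowedUrl(rawUrl: string, allowedHosts: readonly string[]): boolean {
  let parsed: URL;
  try {
    parsed = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (parsed.protocol !== "https:") return false;
  const host = parsed.hostname;
  // Match the exact host or a dot-separated subdomain, so "evilexample.com" is rejected.
  return allowedHosts.some((allowed) => host === allowed || host.endsWith(`.${allowed}`));
}
```

Matching on the `.${allowed}` suffix rather than a bare `endsWith(allowed)` avoids accepting lookalike domains that merely end with an allowed name.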