ShadowCursor

MIT Open Source

Press a shortcut, speak your intent. ShadowCursor reads the page and guides you through it.

TypeScript, Chrome Extension, AI, LLM, Voice, Multimodal, MV3, Webpack, Claude, OpenAI

Overview

ShadowCursor is an open-source Chrome extension that turns voice commands into guided browser automation. Press Cmd+Shift+K (Ctrl+Shift+K outside macOS), speak your request, and the extension captures a screenshot, scrapes the visible DOM, and sends both, together with your transcript, to Claude or OpenAI. The model either answers your question inline or proposes a step-by-step action plan that an animated ghost cursor carries out under your supervision.

  • Voice capture with live transcript via MediaRecorder + Web Speech API; stops automatically on silence or manually via keyboard.
  • Multimodal context assembly: screenshot + DOM snapshot + page URL are bundled into a single LLM prompt for accurate, page-aware responses.
  • Confirmation-first execution: every action step requires user approval before the ghost cursor moves, so nothing runs without your sign-off.
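
The context-assembly step could be sketched roughly as follows. The `PageContext` and `PromptPart` shapes, the `assemblePrompt` name, and the 20,000-character DOM budget are illustrative assumptions rather than the project's actual types; a real router would still have to map these parts onto each provider's request format.

```typescript
// Hypothetical shapes for illustration; the project's real types may differ.
interface PageContext {
  transcript: string;        // what the user said
  url: string;               // URL of the active page
  domSnapshot: string;       // serialized visible DOM
  screenshotDataUrl: string; // e.g. "data:image/png;base64,...."
}

type PromptPart =
  | { kind: 'text'; text: string }
  | { kind: 'image'; mediaType: string; base64: string };

const MAX_DOM_CHARS = 20_000; // assumed budget that keeps the prompt bounded

function assemblePrompt(
  ctx: PageContext,
  maxDomChars: number = MAX_DOM_CHARS,
): PromptPart[] {
  // Split "data:<mediaType>;base64,<payload>" into media type and payload.
  const match = /^data:([^;,]+);base64,(.+)$/.exec(ctx.screenshotDataUrl);
  if (!match) {
    throw new Error('Expected a base64 data URL for the screenshot');
  }
  const mediaType = match[1];
  const base64 = match[2];

  // Truncate oversized DOM snapshots and say so, so the model knows the
  // page text is incomplete.
  const dom =
    ctx.domSnapshot.length > maxDomChars
      ? ctx.domSnapshot.slice(0, maxDomChars) + '\n[DOM truncated]'
      : ctx.domSnapshot;

  const text = [
    `User request: ${ctx.transcript}`,
    `Page URL: ${ctx.url}`,
    'Visible DOM:',
    dom,
  ].join('\n');

  // One image part plus one text part form a single multimodal prompt.
  return [
    { kind: 'image', mediaType, base64 },
    { kind: 'text', text },
  ];
}
```

Keeping the parts provider-neutral means the same bundle can feed either backend, with only the final mapping differing.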

§Runtime flow

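The full pipeline from keypress to executed action is orchestrated inside your browser. Requests go straight from the extension to the LLM provider (and to the STT provider, if one is configured), with no relay server and no persistent cloud session: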

  • Cmd+Shift+K on macOS, or Ctrl+Shift+K elsewhere, triggers voice capture; a recording indicator appears in-page.
  • On capture end, the content script bundles raw audio, transcript, DOM snapshot, and page metadata and hands it to the background service worker.
  • The service worker captures a tab screenshot, optionally upgrades the transcript via an external STT provider, then assembles the full multimodal prompt.
  • The LLM returns either mode: 'answer' (explanation card) or mode: 'action' (step list); the content script renders the result and executes each confirmed step.
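
The two response modes map naturally onto a discriminated union. The sketch below validates untrusted model output before anything is rendered or executed. Only the `mode: 'answer'` and `mode: 'action'` values come from the description above; `LlmResult`, `ActionStep`, `parseLlmResult`, and the `text`, `steps`, `description`, and `selector` fields are hypothetical names chosen for illustration, and the assumption that the model replies with JSON is also mine.

```typescript
// Hypothetical step shape; the real project may carry more fields.
interface ActionStep {
  description: string; // human-readable label shown for confirmation
  selector: string;    // CSS selector the ghost cursor would target
}

type LlmResult =
  | { mode: 'answer'; text: string }
  | { mode: 'action'; steps: ActionStep[] };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null;
}

function isActionStep(value: unknown): value is ActionStep {
  return (
    isRecord(value) &&
    typeof value.description === 'string' &&
    typeof value.selector === 'string'
  );
}

// Parse and validate raw model output; throws on anything unexpected so a
// malformed reply can never reach the step executor.
function parseLlmResult(raw: string): LlmResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    throw new Error('Model output is not valid JSON');
  }
  if (!isRecord(data)) {
    throw new Error('Model output is not an object');
  }

  const mode = data.mode;
  const text = data.text;
  const steps = data.steps;

  if (mode === 'answer' && typeof text === 'string') {
    return { mode: 'answer', text };
  }
  if (mode === 'action' && Array.isArray(steps) && steps.every(isActionStep)) {
    return { mode: 'action', steps };
  }
  throw new Error('Unrecognized response shape');
}
```

A caller can then `switch` on `result.mode`, and the compiler checks that both the explanation-card branch and the step-list branch are handled.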

§Stack & architecture

Built entirely in TypeScript with Webpack, structured around Chrome's Manifest V3 service worker model:

  • background/: service-worker.ts orchestrates screenshots, STT resolution, and LLM routing; llm-router.ts supports Anthropic and OpenAI with user-supplied BYOK keys.
  • content/: trigger.ts, voice-capture.ts, dom-scraper.ts, action-executor.ts, and shadow-cursor.ts each own a single responsibility; they communicate via typed chrome.runtime messages.
  • shared/: types.ts, constants.ts, storage.ts, and messaging.ts provide a strict contract between background and content layers.
  • Options page lets users configure LLM provider, API keys, STT provider, auto-execute preference, and destructive-action confirmation.
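
A typed message contract of the kind described above might look like the following sketch. The message names and payload fields are hypothetical; the real definitions live in shared/types.ts and shared/messaging.ts and may differ. The function is kept free of `chrome.*` calls so that it can run outside the extension.

```typescript
// Hypothetical message contract shared by background and content scripts.
type Message =
  | { type: 'CAPTURE_COMPLETE'; transcript: string; url: string }
  | { type: 'LLM_RESULT'; mode: 'answer' | 'action' }
  | { type: 'STEP_CONFIRMED'; stepIndex: number };

// Route a message by its discriminant. Inside each case, TypeScript narrows
// `msg` to the matching variant, so only that variant's fields are accessible.
function describeMessage(msg: Message): string {
  switch (msg.type) {
    case 'CAPTURE_COMPLETE':
      return `capture from ${msg.url}: "${msg.transcript}"`;
    case 'LLM_RESULT':
      return `model replied in ${msg.mode} mode`;
    case 'STEP_CONFIRMED':
      return `user approved step ${msg.stepIndex + 1}`;
    default: {
      // Compile-time exhaustiveness check: adding a new variant without a
      // case above makes this assignment fail to type-check.
      const unreachable: never = msg;
      throw new Error(`Unhandled message: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

In the extension itself, a function like this would sit behind a `chrome.runtime.onMessage` listener, with senders passing the same `Message` values to `chrome.runtime.sendMessage`.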

§Security & privacy

ShadowCursor handles sensitive page context; screenshots and DOM snapshots may contain personal data. The project ships with explicit guidance:

  • API keys are stored in chrome.storage.sync, never in source control or .env files.
  • No keys, screenshots, or session exports are committed to the repository.
  • Host access uses <all_urls> by default; production deployments should restrict to an allowlist.
  • SECURITY.md documents responsible disclosure guidance for vulnerability reports.
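
One way to approach the allowlist recommendation is a runtime check before any capture starts, sketched below. `ALLOWED_HOSTS` and `isAllowedHost` are hypothetical names and the listed hosts are placeholders. A production build would also narrow `host_permissions` in manifest.json rather than rely on this check alone.

```typescript
// Hypothetical allowlist with placeholder hosts.
const ALLOWED_HOSTS: readonly string[] = ['example.com', 'docs.example.org'];

// True when `pageUrl` is an http(s) URL whose hostname equals an allowed
// host or is a subdomain of one.
function isAllowedHost(
  pageUrl: string,
  allowed: readonly string[] = ALLOWED_HOSTS,
): boolean {
  let parsed: URL;
  try {
    parsed = new URL(pageUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (parsed.protocol !== 'https:' && parsed.protocol !== 'http:') {
    return false; // skip chrome://, file://, and similar schemes
  }
  const host = parsed.hostname.toLowerCase();
  // Require a leading dot for subdomain matches so that
  // "evil-example.com" does not pass as "example.com".
  return allowed.some((entry) => host === entry || host.endsWith('.' + entry));
}
```

Matching on the parsed hostname, rather than on a substring of the URL, avoids lookalike hosts such as `example.com.evil.net` slipping through.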