Swipium 1.5.0

Today I published version 1.5.0. It is mostly the fixes and improvements you would expect from an early-stage project, plus iterations that came out of using the tool myself and out of feedback from teammates.

It is live on GitHub and on npm as swipium@1.5.0.

If you are new to the project, the 1.0.0 launch post explains what Swipium is and why it exists. The docs cover setup, and the integrations page lists the supported agents. These are the most important changes in 1.5.0.

95 tools down to 60

Swipium grew fast between 1.1 and 1.4. Each release added capability, and by 1.4 the public MCP surface had reached 95 tools.

The problem with that many tools is that the agent, even with the documentation, often picked the wrong one. Sometimes the tool name was the cause. Other times several tools shared the same core function and differed in only one behavior, or were built for one narrow scenario.

In this version I merged the most similar tools while keeping their original capabilities and scenarios. I also simplified their names and documentation, so the agent can pick a single tool and then choose the capability through a parameter, instead of reading through every variant.

This significantly increased the rate at which the agent chose the correct tool.

Before: 95 public tools, many of them overlapping or lower-level.

Now: 60 public tools, grouped around the workflow the user actually wants.

Why: the end-to-end capability is the same; what changed is that the correct path is easier to identify. Most of the reduction came from consolidating entry points, which is what the next few sections describe.

One generator instead of five

Generation used to be spread across several tools that all did the same category of thing: turn observed behavior into a reusable asset.

Before: separate tools to generate flows, page objects, suites, test cases, and Appium code.

Now: a single qa_generate, where the kind of output is a parameter:

qa_generate {
  "sessionId": "...",
  "target": "appium",
  "mode": "plan"
}

Set target to flow, pom, suite, or testcases to get the other outputs.

Why: the agent can hear "generate automation" and pick one obvious tool, then choose the detail through a field. Underneath, all paths share the same validation and diagnostics, so new targets do not grow the public surface.
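
As a rough illustration of that shape (not Swipium's actual code), here is a minimal TypeScript sketch of one entry point that validates once and then dispatches on target. The target names come from the post; the function names, the stub generators, the result shape, and "run" as the counterpart to "plan" are assumptions made for the example.

```typescript
type Target = "flow" | "pom" | "suite" | "testcases" | "appium";
type Mode = "plan" | "run"; // "run" is an assumed counterpart to "plan"

interface GenerateRequest {
  sessionId: string;
  target: Target;
  mode: Mode;
}

type GenerateResult =
  | { ok: true; target: Target; mode: Mode; output: string }
  | { ok: false; what: string };

// One stub generator per target; real generators would build the asset.
const generators: Record<Target, (req: GenerateRequest) => string> = {
  flow: (r) => `flow from session ${r.sessionId}`,
  pom: (r) => `page objects from session ${r.sessionId}`,
  suite: (r) => `suite from session ${r.sessionId}`,
  testcases: (r) => `test cases from session ${r.sessionId}`,
  appium: (r) => `Appium code from session ${r.sessionId}`,
};

function isTarget(x: unknown): x is Target {
  return typeof x === "string" && Object.prototype.hasOwnProperty.call(generators, x);
}

function isMode(x: unknown): x is Mode {
  return x === "plan" || x === "run";
}

// Shared validation: runs once, whatever the target is.
function validate(input: Record<string, unknown>): GenerateRequest | string {
  const { sessionId, target, mode } = input;
  if (typeof sessionId !== "string" || sessionId === "") {
    return "sessionId must be a non-empty string";
  }
  if (!isTarget(target)) return `unknown target: ${String(target)}`;
  if (!isMode(mode)) return "mode must be 'plan' or 'run'";
  return { sessionId, target, mode };
}

// Single public entry point; adding a target means adding a map entry.
function qaGenerate(input: Record<string, unknown>): GenerateResult {
  const req = validate(input);
  if (typeof req === "string") return { ok: false, what: req };
  const output =
    req.mode === "plan"
      ? `plan: would generate ${req.target}`
      : generators[req.target](req);
  return { ok: true, target: req.target, mode: req.mode, output };
}
```

In this sketch a new target is one more key in the generators map, which is the property the paragraph above describes: the public surface stays at one tool.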

One first-run workflow

First-run screens are one of the harder parts of mobile QA: login, signup, onboarding, permissions, OTP, paywalls, and the jump to home. They need judgment more than blind automation.

Before: two tools, one to plan the first-run screen and one to continue it.

Now: one qa_first_run with a mode. mode:"plan" classifies the current screen and returns a safe plan without acting; mode:"continue" runs bounded steps and stops at gates:

qa_first_run { "sessionId": "...", "mode": "continue", "until": "until_home" }

Why: one intent should map to one tool. Planning first and only acting when allowed matches how these flows actually work.
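
The plan/continue split can be sketched as a small loop. This is a simulation under stated assumptions, not the real tool: the screen list stands in for what a device would show, OTP and paywall are assumed to be the gates, and maxSteps, the result shape, and the reason strings are hypothetical.

```typescript
type Screen =
  | "login" | "signup" | "onboarding" | "permissions"
  | "otp" | "paywall" | "home";
type Mode = "plan" | "continue";

// Assumption: these screens need a human or explicit approval.
const GATES: ReadonlySet<Screen> = new Set<Screen>(["otp", "paywall"]);

interface FirstRunResult {
  acted: number; // steps actually executed
  stoppedAt: Screen;
  reason: "planned" | "reached_home" | "gate" | "step_limit" | "no_more_screens";
}

// screens[0] is the current screen; later entries simulate what follows.
function qaFirstRun(screens: Screen[], mode: Mode, maxSteps = 5): FirstRunResult {
  if (screens.length === 0) throw new Error("need at least one screen");

  // plan: classify and report, never act.
  if (mode === "plan") {
    return { acted: 0, stoppedAt: screens[0], reason: "planned" };
  }

  // continue: advance until home, a gate, or the step bound.
  let index = 0;
  let acted = 0;
  for (;;) {
    const current = screens[index];
    if (current === "home") return { acted, stoppedAt: current, reason: "reached_home" };
    if (GATES.has(current)) return { acted, stoppedAt: current, reason: "gate" };
    if (acted >= maxSteps) return { acted, stoppedAt: current, reason: "step_limit" };
    if (index + 1 >= screens.length) {
      return { acted, stoppedAt: current, reason: "no_more_screens" };
    }
    index += 1;
    acted += 1;
  }
}
```

The bound matters as much as the gates: even on a screen sequence with no gate, the loop cannot run past maxSteps.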

Plan and run are one tool with a mode

I had a habit of shipping "plan" and "run" as separate tools, which doubled the surface for no gain.

Before: qa_build_plan next to the real build, qa_flow_plan next to running a flow.

Now: a single mode parameter, with defaults based on how risky the action is. qa_build defaults to plan, because building from source is expensive and mutating. qa_flow_run defaults to run, because running a saved flow is the point of the tool.

Why: it removed a class of accidental mutations where an agent triggered a real build when it only meant to look.
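
A minimal sketch of risk-based defaults, assuming the mode values are "plan" and "run" as the text suggests. The resolveMode helper and the lookup table are hypothetical names for illustration.

```typescript
type Mode = "plan" | "run";

// Per-tool default: the safe mode for expensive or mutating tools,
// the active mode where acting is the whole point of the tool.
const DEFAULT_MODE = {
  qa_build: "plan",
  qa_flow_run: "run",
} as const;

type ToolName = keyof typeof DEFAULT_MODE;

// An explicit mode always wins; otherwise fall back to the tool's default.
function resolveMode(tool: ToolName, requested?: Mode): Mode {
  return requested ?? DEFAULT_MODE[tool];
}
```

With this shape, an agent that calls qa_build without a mode gets a plan back, and a real build needs an explicit mode:"run".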

Clearer suite naming and selectors

Before: the durable repo-level suite and a per-run generated suite had names close enough to misread, and qa_act accepted free-form selector strings.

Now: qa_suite_* always means the persistent suite in .swipium/test-suite.json, and a per-run suite is qa_generate with target:"suite". Selectors in qa_act are structured objects.

Why: "update the suite" now means exactly one thing, and structured selectors are easier for an agent to build correctly and for me to validate. For the same reason, a few lower-level helpers were left out of the public contract for now; the high-value workflows (feature testing, mobile audits, the app map, reports, durable issue memory) all stayed first-class.
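
To show why a structured selector is easier to validate than a free-form string, here is a small sketch. The post does not give the real selector schema, so the by/value shape and the three strategies below are purely hypothetical.

```typescript
// Hypothetical selector shape; Swipium's real schema may differ.
type SelectorBy = "id" | "text" | "accessibilityId";

interface Selector {
  by: SelectorBy;
  value: string;
}

function isSelectorBy(x: unknown): x is SelectorBy {
  return x === "id" || x === "text" || x === "accessibilityId";
}

// Returns a typed selector, or null when the input is not a valid one.
function parseSelector(input: unknown): Selector | null {
  if (typeof input !== "object" || input === null) return null;
  const { by, value } = input as Record<string, unknown>;
  if (!isSelectorBy(by)) return null;
  if (typeof value !== "string" || value.length === 0) return null;
  return { by, value };
}
```

Every rejection here is a field-level check, which is hard to do for a free-form string whose syntax the agent has to guess.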

A more reliable release

The other half of 1.5.0 is reliability, mostly from problems I found the hard way.

Stale builds. TypeScript leaves old compiled files in dist/ after you delete a source file, so a removed tool could still ship. Now npm run build cleans dist/ before compiling, so the package matches the source tree.

Error handling. A few tools threw stack traces at the protocol layer, which is useless to an agent. Every recoverable failure now returns a structured envelope with ok:false, plus what, changedState, retrySafe, nextSteps, and a failureCode. A test invokes every public tool with hostile inputs and checks it returns a clean result or that envelope, never a crash.
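
A sketch of what such an envelope and the hostile-input check could look like. The field names come from the list above; the field types, the safeInvoke wrapper, the UNEXPECTED_ERROR code, and the sample handler are assumptions made for the example.

```typescript
// Field names are from the post; the types are assumed.
interface FailureEnvelope {
  ok: false;
  what: string;          // human-readable description of the failure
  changedState: boolean; // whether anything was mutated before failing
  retrySafe: boolean;    // whether the agent may retry the same call
  nextSteps: string[];   // suggested recovery actions
  failureCode: string;   // stable, machine-readable code
}

type ToolResult<T> = { ok: true; result: T } | FailureEnvelope;

// Wrap a handler so a thrown error becomes an envelope, not a stack trace.
function safeInvoke<T>(handler: (input: unknown) => T, input: unknown): ToolResult<T> {
  try {
    return { ok: true, result: handler(input) };
  } catch (err) {
    return {
      ok: false,
      what: err instanceof Error ? err.message : String(err),
      // Assumption for this sketch: the handler fails before mutating anything.
      changedState: false,
      retrySafe: true,
      nextSteps: ["Check the input against the tool schema and retry."],
      failureCode: "UNEXPECTED_ERROR", // hypothetical code
    };
  }
}

// Hypothetical handler that trusts its input too much.
const echoSession = (input: unknown): string =>
  (input as { sessionId: string }).sessionId.toUpperCase();

// Hostile inputs: wrong types, missing fields, empty values.
const hostileInputs: unknown[] = [undefined, null, 42, "", [], { sessionId: 123 }];
const results = hostileInputs.map((input) => safeInvoke(echoSession, input));
```

Each entry in results is either a clean result or an envelope, which is the property the test in the release checks for across every public tool.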

File locking. Swipium keeps small JSON files under .swipium, and multiple clients can run at once. I added advisory locking, and on a lock timeout it now fails the write instead of proceeding unlocked, so the agent gets a recoverable error rather than corrupted state.
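
The fail-closed policy can be sketched as follows. To keep the example self-contained, an in-memory set stands in for lock files on disk and a bounded number of attempts stands in for a wall-clock timeout; all names, and the LOCK_TIMEOUT code, are hypothetical.

```typescript
// In-memory stand-in for lock files; a real version would create a
// lock file next to the JSON file and wait between attempts.
const heldLocks = new Set<string>();

function tryAcquire(path: string): boolean {
  if (heldLocks.has(path)) return false;
  heldLocks.add(path);
  return true;
}

function release(path: string): void {
  heldLocks.delete(path);
}

type WriteResult =
  | { ok: true }
  | { ok: false; failureCode: string; retrySafe: boolean };

// Run `write` only while holding the lock for `path`.
function writeLocked(path: string, write: () => void, maxAttempts = 3): WriteResult {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (tryAcquire(path)) {
      try {
        write();
        return { ok: true };
      } finally {
        release(path); // always release, even if the write throws
      }
    }
  }
  // Fail closed: never fall through to an unlocked write.
  return { ok: false, failureCode: "LOCK_TIMEOUT", retrySafe: true };
}
```

The important line is the last return: when the lock cannot be taken, the write callback is never called, so the caller gets a retryable error and the file is left untouched.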

Release checks. The release now runs a typecheck, ESLint, Prettier, the full test suite, a dependency audit, a clean build, and npm pack --dry-run. The npm publish uses GitHub Actions provenance, and the package ships a THREAT_MODEL.md and CHANGELOG.md.

Thoughts

Building an MCP server is fun and helps me in my daily work. It is also messy to build and maintain. I have several more ideas to implement here, so I will write another blog post when there is a new version.

This version includes quite a few improvements, but several items did not make it in. I have already identified them and plan to fix or implement them soon.