Entity-level diffs on top of Git. Structured output for AI agents, CI pipelines, and humans who want more than line numbers.
```
$ sem diff
┌─ src/auth/login.ts ──────────────────────────────────
│
│  ⊕ function validateToken          [added]
│  ∆ function authenticateUser       [modified]
│  ⊖ function legacyAuth             [deleted]
│
└──────────────────────────────────────────────────────
┌─ config/database.yml ─────────────────────────────────
│
│  ∆ property production.pool_size   [modified]
│    - 5
│    + 20
│
└──────────────────────────────────────────────────────

Summary: 1 added, 1 modified, 1 deleted across 2 files
```
Pass `--format json` for machine-readable output. More commands coming soon.
```
-s, --staged              Show staged changes only
-c, --commit <sha>        Diff a specific commit
--from <ref> --to <ref>   Diff a commit range
-f, --format <fmt>        terminal (default) or json
--file-exts <ext>...      Only include files with these extensions (e.g. .py .rs)

--entity <name>           Show dependencies/dependents for a specific entity
--format <fmt>            terminal (default) or json
--file-exts <ext>...      Only include files with these extensions

<entity>                  Name of the entity to analyze
--json                    Output as JSON
--file-exts <ext>...      Only include files with these extensions

<file>                    File to blame
--json                    Output as JSON
```

| Format | Extensions | Entities |
|---|---|---|
| TypeScript | .ts .tsx | functions, classes, interfaces, types, enums |
| JavaScript | .js .jsx .mjs .cjs | functions, classes, variables |
| Python | .py | functions, classes, decorators |
| Go | .go | functions, methods, types |
| Rust | .rs | functions, structs, enums, impls, traits |
| JSON | .json | properties, objects (RFC 6901 paths) |
| YAML | .yml .yaml | sections, properties (dot paths) |
| TOML | .toml | sections, properties |
| CSV | .csv .tsv | rows (first column as ID) |
| Markdown | .md .mdx | heading-based sections |
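For JSON, entities are addressed by RFC 6901 paths (JSON Pointer). Under that spec each reference token is prefixed with `/`, and inside a token `~` is written `~0` and `/` is written `~1`. A minimal Rust sketch of building such a path from a list of keys; `escape_token` and `json_pointer` are hypothetical helper names, not sem's code:

```rust
/// Escape one reference token per RFC 6901: `~` becomes `~0`, then `/` becomes `~1`.
/// Escaping `~` first keeps the `~` introduced by `~1` from being escaped again.
fn escape_token(token: &str) -> String {
    token.replace('~', "~0").replace('/', "~1")
}

/// Join object keys (or array indices) into a JSON Pointer such as `/production/pool_size`.
/// An empty slice yields "", which RFC 6901 defines as the whole document.
fn json_pointer(segments: &[&str]) -> String {
    segments
        .iter()
        .map(|s| format!("/{}", escape_token(s)))
        .collect()
}
```

For example, the key `/users/{id}` under `paths` becomes `/paths/~1users~1{id}`, so a slash inside a key never gets confused with a path separator. YAML entities use dot paths instead (as in `production.pool_size` above).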
sem matches entities between the before and after versions with a three-phase algorithm that detects additions, modifications, deletions, renames, and moves.

1. Exact ID: same entity ID in before and after? Modified or unchanged.
2. Content hash: same SHA-256, different name? Renamed or moved.
3. Fuzzy similarity: more than 80% Jaccard token overlap? Probable rename.

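A minimal Rust sketch of the three phases, under stated assumptions: an entity is a plain (ID, source text) pair whose ID embeds the file path, std's 64-bit `DefaultHasher` stands in for SHA-256, tokens are whitespace-separated, and phase 3 matches greedily instead of solving an optimal assignment. `Entity`, `Change`, and `match_entities` are illustrative names, not sem's API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{HashMap, HashSet};
use std::hash::{Hash, Hasher};

struct Entity {
    id: String,      // e.g. "src/auth.ts::function::validateToken"
    content: String, // the entity's source text
}

#[derive(Debug, PartialEq)]
enum Change {
    Modified(String),
    Renamed { from: String, to: String },        // phase 2: identical content
    ProbableRename { from: String, to: String }, // phase 3: similar content
    Deleted(String),
    Added(String),
}

/// Stand-in for SHA-256: std's 64-bit DefaultHasher (fast, not cryptographic).
fn content_hash(s: &str) -> u64 {
    let mut h = DefaultHasher::new();
    s.hash(&mut h);
    h.finish()
}

/// Jaccard similarity of whitespace-separated token sets: |A ∩ B| / |A ∪ B|.
fn jaccard(a: &str, b: &str) -> f64 {
    let ta: HashSet<&str> = a.split_whitespace().collect();
    let tb: HashSet<&str> = b.split_whitespace().collect();
    let union = ta.union(&tb).count();
    if union == 0 { return 0.0; } // no tokens on either side: nothing to compare
    ta.intersection(&tb).count() as f64 / union as f64
}

fn match_entities(before: &[Entity], after: &[Entity]) -> Vec<Change> {
    let mut changes = Vec::new();
    let mut used_b = vec![false; before.len()];
    let mut used_a = vec![false; after.len()];

    // Phase 1: exact ID. Different content means modified; identical content emits nothing.
    let by_id: HashMap<&str, usize> =
        after.iter().enumerate().map(|(j, e)| (e.id.as_str(), j)).collect();
    for (i, b) in before.iter().enumerate() {
        if let Some(&j) = by_id.get(b.id.as_str()) {
            used_b[i] = true;
            used_a[j] = true;
            if b.content != after[j].content {
                changes.push(Change::Modified(b.id.clone()));
            }
        }
    }

    // Phase 2: identical content under a different ID (new name or new file) means renamed or moved.
    let mut by_hash: HashMap<u64, Vec<usize>> = HashMap::new();
    for (j, a) in after.iter().enumerate() {
        if !used_a[j] {
            by_hash.entry(content_hash(&a.content)).or_default().push(j);
        }
    }
    for (i, b) in before.iter().enumerate() {
        if used_b[i] { continue; }
        // Compare contents too, so a 64-bit hash collision cannot fake a rename.
        let hit = by_hash.get(&content_hash(&b.content)).and_then(|js| {
            js.iter().copied().find(|&j| !used_a[j] && after[j].content == b.content)
        });
        if let Some(j) = hit {
            used_b[i] = true;
            used_a[j] = true;
            changes.push(Change::Renamed { from: b.id.clone(), to: after[j].id.clone() });
        }
    }

    // Phase 3: fuzzy similarity. Take the best remaining candidate with overlap above 80%.
    for (i, b) in before.iter().enumerate() {
        if used_b[i] { continue; }
        let best = (0..after.len())
            .filter(|&j| !used_a[j])
            .map(|j| (j, jaccard(&b.content, &after[j].content)))
            .filter(|&(_, score)| score > 0.8)
            .max_by(|x, y| x.1.total_cmp(&y.1));
        if let Some((j, _)) = best {
            used_b[i] = true;
            used_a[j] = true;
            changes.push(Change::ProbableRename { from: b.id.clone(), to: after[j].id.clone() });
        }
    }

    // Anything still unmatched was deleted (before side) or added (after side).
    let deleted = before.iter().enumerate().filter(|(i, _)| !used_b[*i]);
    changes.extend(deleted.map(|(_, e)| Change::Deleted(e.id.clone())));
    let added = after.iter().enumerate().filter(|(j, _)| !used_a[*j]);
    changes.extend(added.map(|(_, e)| Change::Added(e.id.clone())));
    changes
}
```

Running the cheap, exact phases first shrinks the pool that the quadratic fuzzy phase has to scan. Greedy matching keeps the sketch short; when several candidates are similar, an optimal bipartite matching could pair them differently.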
Structured JSON. Pipe sem into your AI agent, CI pipeline, or automation.
```json
{
  "summary": {
    "fileCount": 2,
    "added": 1,
    "modified": 1,
    "deleted": 1,
    "total": 3
  },
  "changes": [
    {
      "entityId": "src/auth.ts::function::validateToken",
      "changeType": "added",
      "entityType": "function",
      "entityName": "validateToken",
      "filePath": "src/auth.ts"
    }
  ]
}
```
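In the example record, `entityId` appears to join `filePath`, `entityType`, and `entityName` with `::`. Each record already carries those three fields, so splitting the ID is only useful when a consumer has nothing but the ID at hand, for instance when IDs are used as map keys. A hedged Rust sketch, assuming that layout holds in general (inferred from this single record, not a documented contract) and that file paths never contain `::`; `EntityId` and `parse_entity_id` are hypothetical names:

```rust
/// The three parts of an entity ID, assuming the `filePath::entityType::entityName`
/// layout seen in the example record (inferred, not a documented contract).
#[derive(Debug, PartialEq)]
struct EntityId<'a> {
    file_path: &'a str,
    entity_type: &'a str,
    entity_name: &'a str,
}

/// Splits on the first two `::` separators. Whatever follows the second one is the
/// name, so a name that itself contains `::` stays in one piece.
/// Returns None when the ID has fewer than three parts.
fn parse_entity_id(id: &str) -> Option<EntityId<'_>> {
    let mut parts = id.splitn(3, "::");
    Some(EntityId {
        file_path: parts.next()?,
        entity_type: parts.next()?,
        entity_name: parts.next()?,
    })
}
```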
git gives you lines. sem gives you entities — functions, properties, rows, sections.
| Feature | git diff | sem diff |
|---|---|---|
| Diff granularity | lines | entities (functions, classes, properties) |
| Code parsing | no | tree-sitter (TS, Python, Go, Rust, JS) |
| JSON / YAML / TOML | lines | key-path entities |
| CSV | lines | row + cell identity |
| Rename detection | heuristic (file-level) | 3-phase (ID + hash + fuzzy) |
| Machine-readable output | patch format | JSON |
| Agent accuracy | 41.5% avg | 95.9% avg (benchmark) |
| Speed | 9ms | 8ms |
| Adoption | - | single binary, drop into any Git repo |
Real measurements on the sem repo: wall-clock time with no shell overhead, median of 50 runs via `hyperfine -N --warmup 10 --runs 50`. LTO-optimized Rust binary with xxHash64 and cached tree resolution.
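The reported figure is the median of the 50 timed runs, which is less sensitive than the mean to a few runs slowed by scheduling or cold caches. With an even sample count, the median is conventionally the mean of the two middle values. A small Rust sketch of that calculation; `median` is an illustrative helper, not part of sem or hyperfine:

```rust
/// Median of timing samples (e.g. milliseconds).
/// Returns None for an empty slice; for an even count, averages the two middle values.
fn median(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    // total_cmp gives a total order over f64 (NaN included), so sorting never panics.
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    })
}
```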
Same commit (5 files), same repo. sem adds entity-level parsing on top of git's line diff.
Built-in instrumentation via sem diff --profile. Shows where time is spent inside the binary.
CPU time breakdown for sem diff --commit (large, 13 files).
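A per-phase breakdown like the one `sem diff --profile` reports can be built by wrapping each stage in a closure, timing it, and reporting each stage's share of the total. The sketch below is a hypothetical illustration of that pattern, not sem's instrumentation; `Profiler` and the phase names in any usage are assumptions. `std::time::Instant` measures wall-clock time, which tracks CPU time only for single-threaded work that does not block.

```rust
use std::time::{Duration, Instant};

#[derive(Default)]
struct Profiler {
    // (phase name, accumulated elapsed time), in first-seen order.
    phases: Vec<(&'static str, Duration)>,
}

impl Profiler {
    /// Runs `f`, adds its elapsed wall-clock time to phase `name`, and returns `f`'s result.
    fn time<T>(&mut self, name: &'static str, f: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = f();
        let elapsed = start.elapsed();
        let slot = self.phases.iter().position(|(n, _)| *n == name);
        match slot {
            Some(i) => self.phases[i].1 += elapsed,
            None => self.phases.push((name, elapsed)),
        }
        out
    }

    /// Each phase's share of the total measured time, in percent.
    fn breakdown(&self) -> Vec<(&'static str, f64)> {
        let total: f64 = self.phases.iter().map(|(_, d)| d.as_secs_f64()).sum();
        self.phases
            .iter()
            .map(|&(name, d)| {
                let pct = if total > 0.0 { 100.0 * d.as_secs_f64() / total } else { 0.0 };
                (name, pct)
            })
            .collect()
    }
}
```

Because `time` passes the closure's result through, a stage can be wrapped in place without restructuring the surrounding code, and repeated calls with the same name accumulate into one row.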