Adding a language to graph

The graph tool started as a TypeScript-only call-graph engine.

Language Pluggability

The language-pluggability work introduced a six-method GraphLanguageAdapter contract, so the engine itself carries no knowledge of any specific language.

There are five first-party adapters (TypeScript, Python, Rust, Go, and Java), each published as its own npm package under packages/graph/graph-<lang>/.

The four tree-sitter adapters are backed by vendored web-tree-sitter WASM grammars (no native build / node-gyp at install) and share the parse/discover/walk/cache-key scaffolding in @opensip-cli/graph-adapter-common.

Discovery and Registration

Auto-discovery is marker-based: a package declares an opensipTools block in its package.json (kind: "graph-adapter", targetDomain: "graph-adapter", targetDomainApiVersion: 1) and exports a value named adapter from its main entry. First-party adapters use the @opensip-cli/graph-* namespace, but the marker is the discovery contract, so third-party adapters can use any npm name. To pin an exact list instead, name the packages under plugins.graphAdapters in opensip-cli.config.yml.
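
As a rough illustration of the marker check described above, the sketch below tests a parsed package.json for the three marker fields. The helper name isGraphAdapterPackage, the PackageJsonLike type, and the exact-match comparison on the API version are assumptions made for this example; the real loader may validate differently.

```typescript
// Hypothetical helper: NOT the real loader, only the marker rule as this doc states it.
interface PackageJsonLike {
  name?: string;
  opensipTools?: {
    kind?: string;
    targetDomain?: string;
    targetDomainApiVersion?: number;
  };
}

// Assumed: the loader accepts exactly the API version it was built against.
const GRAPH_ADAPTER_API_VERSION = 1;

function isGraphAdapterPackage(pkg: PackageJsonLike): boolean {
  const marker = pkg.opensipTools;
  return (
    marker !== undefined &&
    marker.kind === 'graph-adapter' &&
    marker.targetDomain === 'graph-adapter' &&
    marker.targetDomainApiVersion === GRAPH_ADAPTER_API_VERSION
  );
}
```

Note that the package name plays no part in the check: a package called acme-graph-kotlin with the marker qualifies, while a package under @opensip-cli/graph-* without it does not.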

Any adapter, first-party or third-party, slots in by implementing the contract and either shipping the marker/export pair or being named in the explicit plugins.graphAdapters list.

This doc walks a contributor through that workflow.

What you'll have done after this:

- Decided where your adapter lives (first-party in this repo vs. third-party npm package).

- Implemented the six adapter methods against your language's parser.

- Run the contract test suite against your fixture project.

- Registered the adapter so opensip graph works on a project in your language.


1. Read first

The canonical contract source is the TypeScript file itself: packages/graph/engine/src/lang-adapter/types.ts — interface signatures, behavioral invariants I-1 through I-9 (in JSDoc), and the I/O shapes that flow between the orchestrator and your adapter.

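Then look at the reference implementations. Five ship today: graph-typescript/ and the four tree-sitter adapters graph-python/, graph-rust/, graph-go/, and graph-java/.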


2. The six methods

A GraphLanguageAdapter exposes six methods plus three identity fields (id, fileExtensions, displayName). The same data flows through them in order:

| Method | Responsibility | TypeScript reference |
|---|---|---|
| discoverFiles | Resolve which files belong to the project for a given cwd. Reads language-specific config (tsconfig.json, pyproject.toml, Cargo.toml, go.mod, etc.). Returns absolute, realpath-normalized, sorted file paths. | graph-typescript/discover.ts |
| parseProject | Build adapter-internal parse state. The shape is opaque (P = unknown); the engine passes it back into walkProject and resolveCallSites unchanged. Must stay synchronous (the engine calls it synchronously and the shard worker serializes results across a process boundary). TypeScript holds a ts.Program; the tree-sitter adapters hold a Map<filePath, { tree, source }> parsed via web-tree-sitter. The one-time async Parser.init() / Language.load(<wasm>) calls are confined to module top-level await, so parseProject itself stays sync. | graph-typescript/parse.ts |
| walkProject | One pass over the parsed project. Emit FunctionOccurrences (one per callable thing: function, method, arrow, constructor, getter/setter, plus a synthetic module-init per file) AND CallSiteRecords (pre-located call expressions, owner-keyed by bodyHash). | graph-typescript/walk.ts |
| resolveCallSites | Resolve the call-site list against the frozen catalog. Return a bodyHash → CallEdge[] map plus resolution stats. Call edges carry a confidence ('high' for symbol-resolved, 'medium'/'low' for name-only resolution). | graph-typescript/edges.ts (resolveEdgesFromRecords) |
| cacheKey | Compute an opaque per-adapter cache invalidation key. Different adapters MUST emit different prefixes (e.g. ts-…, py-…, rs-…, go-…, java-…) so cross-adapter accidents hash-mismatch immediately. | graph-typescript/cache-key.ts |
| ruleHints | Optional. Declare what counts as a test file in your language and which side-effect primitives the no-side-effect-path rule should look for. Without this, defaults apply and rules silently degrade. | graph-typescript/index.ts (ruleHints) |

The exact TypeScript signatures live in lang-adapter/types.ts. Read that file once — it's the technical reference.
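
To make the order of the data flow concrete, here is a minimal sketch of how an orchestrator might thread results through the four pipeline methods. Every type here (MiniAdapter, Occurrence, CallSite, Edge) is a simplified stand-in invented for this example, and runPipeline is hypothetical; the real signatures in lang-adapter/types.ts carry more fields and also return resolution stats.

```typescript
// Simplified stand-ins; the authoritative shapes live in lang-adapter/types.ts.
interface Occurrence { bodyHash: string; name: string; filePath: string }
interface CallSite { ownerHash: string; calleeName: string }
interface Edge { to: string; confidence: 'high' | 'medium' | 'low' }

interface MiniAdapter<P> {
  id: string;
  discoverFiles(cwd: string): string[];
  parseProject(files: string[]): P; // synchronous, opaque result
  walkProject(parsed: P): { occurrences: Occurrence[]; callSites: CallSite[] };
  resolveCallSites(
    parsed: P,
    callSites: CallSite[],
    catalog: ReadonlyMap<string, Occurrence>,
  ): Map<string, Edge[]>;
}

// Hypothetical orchestrator: discover -> parse -> walk -> resolve.
function runPipeline<P>(adapter: MiniAdapter<P>, cwd: string): Map<string, Edge[]> {
  const files = adapter.discoverFiles(cwd);
  const parsed = adapter.parseProject(files); // passed back unchanged below
  const { occurrences, callSites } = adapter.walkProject(parsed);
  const catalog = new Map<string, Occurrence>();
  for (const occ of occurrences) catalog.set(occ.bodyHash, occ);
  return adapter.resolveCallSites(parsed, callSites, catalog);
}

// Toy adapter with hard-coded results, just to exercise the flow.
const toyAdapter: MiniAdapter<string[]> = {
  id: 'toy',
  discoverFiles: () => ['/proj/a.toy'],
  parseProject: (files) => files,
  walkProject: () => ({
    occurrences: [
      { bodyHash: 'h-main', name: 'main', filePath: '/proj/a.toy' },
      { bodyHash: 'h-helper', name: 'helper', filePath: '/proj/a.toy' },
    ],
    callSites: [{ ownerHash: 'h-main', calleeName: 'helper' }],
  }),
  resolveCallSites: (_parsed, callSites, catalog) => {
    const edges = new Map<string, Edge[]>();
    for (const site of callSites) {
      catalog.forEach((target) => {
        if (target.name !== site.calleeName) return;
        const list = edges.get(site.ownerHash) ?? [];
        list.push({ to: target.bodyHash, confidence: 'medium' });
        edges.set(site.ownerHash, list);
      });
    }
    return edges;
  },
};

const toyEdges = runPipeline(toyAdapter, '/proj');
```

Here toyEdges maps 'h-main' to a single edge pointing at 'h-helper'. The sketch leaves out cacheKey and ruleHints because they sit outside the per-run data flow.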


3. Recommended file layout

A new first-party adapter ships as its own publishable npm package under packages/graph/graph-<id>/:

packages/graph/graph-<id>/
  package.json       name "@opensip-cli/graph-<id>"; depends on @opensip-cli/graph
                     and @opensip-cli/core (plus graph-adapter-common and
                     web-tree-sitter for tree-sitter adapters); `files` includes
                     `wasm`; carries the discovery marker
                       opensipTools: { kind: "graph-adapter",
                                       targetDomain: "graph-adapter",
                                       targetDomainApiVersion: 1 }
                     (the `@opensip-cli/graph-*` name prefix is only the
                     first-party naming convention; default discovery
                     consults the marker alone)
  tsconfig.json
  src/
    discover.ts      discoverFiles implementation (reads pyproject.toml / Cargo.toml / go.mod / etc.)
    parse.ts         parseProject implementation (top-level `await Language.load(<wasm>)`,
                     then bind the shared `createTreeSitterParseProject` driver)
    walk.ts          walkProject implementation (one pass, emit occurrences + call-site records)
    resolve.ts       resolveCallSites implementation (name-based or symbol-based)
    cache-key.ts     cacheKey implementation (hash language config + toolchain version)
    rule-hints.ts    ruleHints constant (isTestFile, sideEffectPrimitives, throwSyntaxRegex)
    index.ts         exports the adapter:
                       export const <id>GraphAdapter: GraphLanguageAdapter<P> = {
                         id: '<id>',
                         fileExtensions: ['.<ext>'],
                         displayName: '<DisplayName>',
                         discoverFiles, parseProject, walkProject,
                         resolveCallSites, cacheKey,
                         ruleHints,
                       };
                       export const adapter = <id>GraphAdapter;
  __tests__/
    fixtures/<id>/   small project that exercises file discovery,
                     parsing, occurrence emission, call resolution

This mirrors graph-python/, graph-rust/, graph-go/, and graph-java/ — the recommended template for tree-sitter adapters. The four tree-sitter adapters keep most of their discover / parse / walk / cache-key scaffolding in @opensip-cli/graph-adapter-common (e.g. createTreeSitterParseProject, createDiscover, hashConfig) and each src/ file is a thin per-language binding plus the vendored wasm/<grammar>.wasm (shipped in the package's files). The TypeScript adapter has a deeper subdir layout (inventory-visitors/, edge-resolvers/, inventory-helpers/) because its symbol-resolved walk is genuinely more complex; for a tree-sitter adapter the flat layout is plenty. Adapters that prefer one big file or a different breakdown are fine — the contract doesn't care, only the public index.ts export matters.
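
For a sense of what rule-hints.ts might contain, here is a sketch for a Python adapter. The three field names come from the layout above, but their types (a predicate, a string list, and a RegExp) and the specific primitives listed are assumptions for illustration; check lang-adapter/types.ts for the real shape.

```typescript
// Assumed shape for illustration; the authoritative type is in lang-adapter/types.ts.
interface RuleHintsSketch {
  isTestFile(filePath: string): boolean;
  sideEffectPrimitives: readonly string[];
  throwSyntaxRegex: RegExp;
}

const pythonRuleHints: RuleHintsSketch = {
  // pytest naming conventions (test_*.py, *_test.py) plus anything under test/ or tests/.
  isTestFile: (filePath) =>
    /(^|\/)tests?\//.test(filePath) ||
    /(^|\/)(test_[^/]+|[^/]+_test)\.py$/.test(filePath),
  // Illustrative list only; a real adapter would curate this per language.
  sideEffectPrimitives: ['print', 'open', 'os.remove', 'subprocess.run'],
  // Matches a line that begins with a `raise` statement.
  throwSyntaxRegex: /^\s*raise\b/,
};
```

Keeping the hints in a plain constant makes them easy to unit-test separately from the walker.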

Third-party graph adapters are supported via the same discovery path the first-party packages use: any package installed in node_modules that declares opensipTools.kind: "graph-adapter" plus the graph-adapter target-domain epoch is loaded and its adapter export registered. For deployments that need pinned discovery, list the exact package names under plugins.graphAdapters: in opensip-cli.config.yml — that list replaces the auto-scan entirely. plugins.autoDiscoverGraphAdapters: false disables the scan. The adapter contract types (GraphLanguageAdapter, registerAdapter, pickAdapter) are exported from @opensip-cli/graph.
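
The precedence between the pinned list, the auto-discovery switch, and the marker scan can be sketched as a small pure function. The function name selectAdapterPackages and the PluginsConfigSketch type are hypothetical, and treating an empty pinned list as "load nothing" is an assumption this doc does not confirm.

```typescript
// Hypothetical mirror of the `plugins` block in opensip-cli.config.yml.
interface PluginsConfigSketch {
  graphAdapters?: string[];
  autoDiscoverGraphAdapters?: boolean;
}

// scannedMarkerPackages: names of installed packages that carry the graph-adapter marker.
function selectAdapterPackages(
  plugins: PluginsConfigSketch,
  scannedMarkerPackages: string[],
): string[] {
  // A pinned list replaces the auto-scan entirely.
  if (plugins.graphAdapters !== undefined) return [...plugins.graphAdapters];
  // With no pinned list, the scan can still be switched off.
  if (plugins.autoDiscoverGraphAdapters === false) return [];
  // Default: every marker-carrying package found by the scan.
  return [...scannedMarkerPackages];
}
```

With a pinned list of one package, only that package is returned regardless of what the scan found; with autoDiscoverGraphAdapters set to false and no pinned list, the result is empty.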


4. The contract test suite

Every adapter MUST pass the shared contract test suite at packages/graph/graph-typescript/src/__tests__/lang-adapter-contract.test.ts. It validates the nine behavioral invariants documented on the GraphLanguageAdapter interface:

| Invariant | What the test checks |
|---|---|
| I-1 | walkProject is deterministic: two calls return the same occurrences and call-site summary. |
| I-2 | Different bodies produce different bodyHashes (the duplicated-function-body rule depends on this). |
| I-3 | Every CallSiteRecord.ownerHash exists in the same walk's occurrences. |
| I-4 | resolveCallSites does not mutate its input catalog. |
| I-5 | Every CallEdge.to references a catalog bodyHash or is empty (no dangling targets). |
| I-6 | cacheKey is stable for stable input AND changes when the language config file changes. |
| I-7 | parseProject is total over its files input: every file is either parsed or named in parseErrors. |
| I-8 | adapter.id matches the language family the adapter handles, and cacheKey carries an adapter-distinct prefix. |
| I-9 | discoverFiles is referentially transparent: repeated calls return the same files list. |

Add a describe block to that test file for your adapter. Each adapter ships with a small fixture project under __tests__/fixtures/<id>/ that exercises file discovery, parsing, occurrence emission, and call resolution. The fixture should produce non-trivial occurrences (at least one function, one method, one arrow / lambda equivalent, one anonymous callable).
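
Before wiring a fixture into the shared suite, it can help to check a couple of the invariants by hand. The sketch below covers I-3 and I-5 using simplified stand-in types; the helper names are hypothetical, it is not the contract suite's own code, and representing an "empty" edge target as the empty string is an assumption.

```typescript
// Simplified stand-ins for FunctionOccurrence, CallSiteRecord, and CallEdge.
interface OccurrenceLike { bodyHash: string }
interface CallSiteLike { ownerHash: string }
interface EdgeLike { to: string }

// I-3: every call site's ownerHash must appear among the same walk's occurrences.
function findOrphanCallSites(
  occurrences: OccurrenceLike[],
  callSites: CallSiteLike[],
): CallSiteLike[] {
  const known = new Set(occurrences.map((o) => o.bodyHash));
  return callSites.filter((site) => !known.has(site.ownerHash));
}

// I-5: every edge target is a catalog bodyHash or empty (assumed here to mean '').
function findDanglingEdges(
  catalogHashes: ReadonlySet<string>,
  edgesByOwner: Map<string, EdgeLike[]>,
): EdgeLike[] {
  const dangling: EdgeLike[] = [];
  edgesByOwner.forEach((edges) => {
    for (const edge of edges) {
      if (edge.to !== '' && !catalogHashes.has(edge.to)) dangling.push(edge);
    }
  });
  return dangling;
}
```

Both helpers return the offending records rather than a boolean, so a failing fixture shows which call site or edge broke the invariant.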


5. Per-language fidelity expectations

Different adapters produce different-fidelity edges. This is intrinsic — TypeScript's getSymbolAtLocation is rich; tree-sitter-based adapters resolve calls by name and have no symbol table. The contract surfaces this via CallEdge.confidence:

| Adapter | confidence for direct calls | Notes |
|---|---|---|
| typescript | 'high' (symbol-resolved) | Reference. Has the TS type-checker. |
| python | Mostly 'medium'; 'low' on simple-name collisions | Tree-sitter; multiple functions named process may resolve to the wrong target. |
| rust | 'medium' (with impl block context for receivers) | Tree-sitter; trait dispatch and method-on-generic resolution stay name-only. |
| go | 'medium' (with receiver-type narrowing) | Tree-sitter; package-aware discovery via go.mod. |
| java | 'medium' (with class context) | Tree-sitter; class-resident scope means the resolver always knows the enclosing type. |
| c/c++ (planned) | 'medium' | Header/source duplication and namespace resolution are the wrinkles. |
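
The name-only rows above can be illustrated with a small resolver that downgrades confidence when a simple name collides. This is a sketch under assumptions: the types and the function resolveByName are invented here, and emitting one edge per candidate is an illustrative choice, not necessarily what the shipped adapters do.

```typescript
type Confidence = 'high' | 'medium' | 'low';
interface NamedOccurrence { bodyHash: string; name: string }
interface ResolvedEdge { to: string; confidence: Confidence }

// Name-only resolution: one match is 'medium', a collision drops every candidate to 'low'.
function resolveByName(calleeName: string, catalog: NamedOccurrence[]): ResolvedEdge[] {
  const candidates = catalog.filter((o) => o.name === calleeName);
  if (candidates.length === 0) return []; // unresolved: emit no edge
  const confidence: Confidence = candidates.length === 1 ? 'medium' : 'low';
  return candidates.map((o) => ({ to: o.bodyHash, confidence }));
}
```

With a catalog holding two functions named process, resolveByName('process', catalog) yields two 'low' edges, while a uniquely named function yields a single 'medium' edge. A resolver with extra context (receiver type, enclosing class, impl block) would narrow the candidate list before this step.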

Per-rule fidelity expectations:

| Rule | TS adapter (today) | Tree-sitter adapter (Python/Rust/Go/Java) |
|---|---|---|
| orphan-subtree | High: symbol resolution gives accurate transitive callee sets | Medium: name-based resolution; multiple process functions may pick wrong target |
| duplicated-function-body | Medium: body hash is textual; lexical-scope FPs documented | Medium: same fidelity (body hashing is language-agnostic) |
| no-side-effect-path | High: accurate edges + side-effect primitive list | Low: edge inaccuracy compounds; side-effect list is also language-specific |
| test-only-reachable | High: symbol resolution makes "callable from test only" precise | Low: same fidelity issue as no-side-effect-path |
| always-throws-branch | Medium: textual heuristic on CallEdge.text, language-agnostic | Medium: same heuristic, different syntax via the adapter's throwSyntaxRegex hint |

When you ship a new adapter, add a row to this table in your PR.


6. Registration

First-party and third-party adapters use the same registration path: ship a package that declares opensipTools.kind: "graph-adapter" plus targetDomain: "graph-adapter" / targetDomainApiVersion: 1, and whose main entry exports adapter (or list the package name under plugins.graphAdapters: in the project config).

{
  "name": "@opensip-cli/graph-cpp",
  "main": "dist/index.js",
  "files": ["dist", "wasm"],
  "opensipTools": {
    "kind": "graph-adapter",
    "targetDomain": "graph-adapter",
    "targetDomainApiVersion": 1
  },
  "dependencies": {
    "@opensip-cli/core": "workspace:*",
    "@opensip-cli/graph": "workspace:*",
    "@opensip-cli/graph-adapter-common": "workspace:*",
    "web-tree-sitter": "0.25.10"
  }
}

The opensipTools.kind marker is what default graph-adapter discovery consults; the target-domain epoch declares the graph-adapter API version your package targets. An explicit plugins.graphAdapters: list can pin the adapter set instead.

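// packages/graph/graph-cpp/src/index.ts
import type { GraphLanguageAdapter } from '@opensip-cli/graph';

// Per-method modules follow the section 3 layout. The export names and the
// CppParsedProject type are assumed here; adjust to match your own files.
import { discoverFiles } from './discover';
import { parseProject, type CppParsedProject } from './parse';
import { walkProject } from './walk';
import { resolveCallSites } from './resolve';
import { cacheKey } from './cache-key';
import { ruleHints } from './rule-hints';

export const cppGraphAdapter: GraphLanguageAdapter<CppParsedProject> = {
  id: 'cpp',
  displayName: 'C/C++',
  fileExtensions: ['.c', '.cc', '.cpp', '.h', '.hpp'],
  discoverFiles,
  parseProject,
  walkProject,
  resolveCallSites,
  cacheKey,
  ruleHints,
};

// Discovery looks for this named export on the package's main entry.
export const adapter = cppGraphAdapter;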

The generic capability loader (packages/cli/src/bootstrap/load-tool-capabilities.ts) discovers graph-adapter packages per command and routes each package's adapter export through graph's registrar into the per-run scope. A new adapter is live once it is installed or present in the workspace.

Once two or more adapters are registered, pickAdapter(cwd) chooses by file-extension dominance with a deterministic preference list. Add your language to the preference list in resolveTie if you ship a new first-party adapter.
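
The selection rule can be sketched as follows. This is an illustration only: the real pickAdapter takes a cwd and the real tie-break lives in resolveTie, whereas this hypothetical pickByExtensionDominance takes a file list directly, and the order in TIE_PREFERENCE is invented for the example.

```typescript
interface AdapterLike { id: string; fileExtensions: string[] }

// Hypothetical preference order; the real list lives in resolveTie.
const TIE_PREFERENCE = ['typescript', 'python', 'rust', 'go', 'java'];

function pickByExtensionDominance(
  adapters: AdapterLike[],
  filePaths: string[],
): AdapterLike | undefined {
  let best: AdapterLike[] = [];
  let bestCount = 0;
  for (const adapter of adapters) {
    // Count the files this adapter claims by extension.
    const count = filePaths.filter((p) =>
      adapter.fileExtensions.some((ext) => p.endsWith(ext)),
    ).length;
    if (count > bestCount) {
      best = [adapter];
      bestCount = count;
    } else if (count === bestCount && count > 0) {
      best.push(adapter);
    }
  }
  if (best.length <= 1) return best[0]; // undefined when no adapter matched any file
  // Deterministic tie-break: preference list first, then id as a final fallback.
  const rank = (a: AdapterLike): number => {
    const i = TIE_PREFERENCE.indexOf(a.id);
    return i === -1 ? TIE_PREFERENCE.length : i;
  };
  return [...best].sort((a, b) => rank(a) - rank(b) || a.id.localeCompare(b.id))[0];
}
```

An adapter missing from the preference list sorts after every listed one, which is one reason to add a new first-party language to the list explicitly rather than rely on the fallback.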


7. First PR checklist

When you open the PR for a new adapter, verify each of these:


8. Common gotchas

These are drawn from real bugs caught while shipping the tree-sitter adapters.


9. Where to ask


What's next