Building a Zero-Dependency Codebase Analyzer in Node.js
When you join a new project or open-source repo, the first question is always: what’s in here? The tree command shows file structure, but it doesn’t tell you what language each file is, what functions are defined, or how files depend on each other.
I built codemap — a CLI tool that generates structural overviews of codebases with zero runtime dependencies. Here’s what I learned about parsing code without a parser.
The Problem with tree
src/
├── analyzer.js
├── cli.js
├── formatter.js
└── index.js
This tells you nothing. What does analyzer.js export? How many lines is it? Does cli.js import from formatter.js? You’d have to open each file to find out.
What codemap does differently
src/
├─ analyzer.js (JavaScript, 335L)
function parseGitignore:38
function extractSignatures:66
function extractImports:185
├─ cli.js (JavaScript, 76L)
├─ formatter.js (JavaScript, 252L)
⬡ function formatSummary:26
⬡ function formatTree:41
⬡ function formatGraph:192
└─ index.js (JavaScript, 3L)
Language detection, line counts, function signatures with line numbers, exported symbols marked with ⬡. All without installing a single dependency.
Design Decisions
Why zero dependencies?
- Install speed —
npx codemap-cliruns in seconds, not minutes - Trust — No supply chain risk. You can read the entire source in 10 minutes
- Maintenance — No breaking updates from upstream packages
- Portability — Works on any Node.js 18+ environment
The entire tool is ~750 lines across 4 files. Every line is mine.
Language detection without a parser
You don’t need a full AST parser to extract useful information. A regex-based approach works surprisingly well for signatures:
// JavaScript/TypeScript functions
const exportMatch = line.match(
/^export\s+(default\s+)?(function|class|const)\s+(\w+)/
);
const funcMatch = line.match(/^(async\s+)?function\s+(\w+)/);
// Python
const defMatch = line.match(/^(async\s+)?def\s+(\w+)/);
// Go
const funcMatch = line.match(/^func\s+(?:\(.*?\)\s+)?(\w+)/);
// Rust
const fnMatch = line.match(/^(?:pub\s+)?(?:async\s+)?fn\s+(\w+)/);
This catches top-level declarations across 9 languages. It won’t find nested functions or complex patterns, but for a structural overview, that’s exactly right. You want the shape of the code, not every detail.
Import extraction for dependency graphs
The same regex approach works for imports:
// JS/TS: import ... from '...' or require('...')
line.match(/(?:import\s+.*\s+from\s+|import\s+)['"]([^'"]+)['"]/);
line.match(/require\s*\(\s*['"]([^'"]+)['"]\s*\)/);
// Python: from X import Y
line.match(/^from\s+([\w.]+)\s+import/);
With imports extracted, you can build a dependency graph and output it as Mermaid:
graph LR
src_cli_js["src/cli.js"]
src_analyzer_js["src/analyzer.js"]
src_formatter_js["src/formatter.js"]
src_cli_js --> src_analyzer_js
src_cli_js --> src_formatter_js
This renders directly on GitHub — no external tools needed.
Respecting .gitignore
One subtle but important feature: codemap reads .gitignore and applies those patterns. This means node_modules, dist, build, and other generated directories are automatically excluded. The output shows your source code, not your build artifacts.
Output Modes
The tool outputs in four formats:
- Tree (default) — colored terminal output with signatures
- Summary (
-s) — quick stats: file count, lines, language breakdown - Markdown (
-m) — structured tables for documentation - Graph (
-g) — Mermaid dependency diagram
The JSON mode (-j) outputs everything as structured data, so you can pipe to jq or feed into other tools:
# Find files with the most functions
codemap -j | jq '.files[] | select(.signatures | length > 5) | {name, count: (.signatures | length)}'
What I’d do differently
More language support for imports. Currently import extraction covers JS/TS, Python, Go, Rust, and C/C++. Java and Ruby are missing. Adding them is straightforward — each language has a distinctive import syntax.
Better resolution of relative imports. The dependency graph resolves ./foo to ./foo.js or ./foo.ts, but doesn’t handle path aliases (@/components/...) or barrel imports (./index). These are solvable but add complexity.
Performance on large repos. The tool reads every file sequentially. For a monorepo with 10,000 files, parallel file reading with Promise.all batches would help. For now, it’s fast enough for most projects.
Try it
npx codemap-cli
Or check out the source: github.com/Taru0208/codemap
There’s also a GitHub Action that auto-generates CODEMAP.md on every push.