Dependencies & the whitelist
Some questions can't be answered from names and raw bytes alone — reading MP3 tags, image dimensions, PDF text, or the structure of source code needs a library. nfind lets the generated filter declare the third-party packages it needs, then installs them into a sandboxed image — but only after the package has been approved.
The defaults include `tree-sitter` plus a set of per-language grammar wheels (`tree-sitter-python`, `-javascript`, `-typescript`, `-go`, `-rust`, `-java`, `-c`, `-bash`, `-kotlin`, `-swift`, `-dart`), so a Python filter can parse source structure (functions, imports, classes) without a dedicated runtime. Each wheel bundles its compiled grammar, so it works in the default Docker backend's no-network, read-only sandbox; the generated code uses the standard API, `Parser(Language(tree_sitter_python.language()))`. Reach for the Node.js runtime only when you need type-aware TypeScript/JS analysis that the compiler API provides.
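As an illustration of the API call quoted above, here is a minimal sketch of a helper that lists the top-level functions and classes in a Python source file. The helper name `top_level_definitions` is hypothetical and not part of nfind; it assumes a recent `tree-sitter` release in which `Parser` accepts a `Language` directly, and it does not look inside decorated definitions.

```python
def top_level_definitions(source: bytes) -> list[tuple[str, str]]:
    """Return (node_type, name) for each top-level function or class.

    Hypothetical helper for illustration; not part of nfind itself.
    """
    # Imported lazily so this module still loads where the wheels are absent.
    import tree_sitter_python
    from tree_sitter import Language, Parser

    parser = Parser(Language(tree_sitter_python.language()))
    tree = parser.parse(source)

    found = []
    for node in tree.root_node.children:
        # Decorated definitions are wrapped in a different node type
        # and are deliberately skipped in this sketch.
        if node.type in ("function_definition", "class_definition"):
            name = node.child_by_field_name("name")
            if name is not None:
                found.append((node.type, name.text.decode("utf-8")))
    return found
```

A filter could call this on the bytes of each candidate file and match on the returned names.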
How it works
- Declare. When generating the filter, the model also returns the PyPI packages the code imports (for example `mutagen` to read audio tags).
- Check the whitelist. nfind compares the requested packages against an approved set for that runtime: a small built-in default list plus anything you've approved before (saved to disk). Python (pip) and Node.js (npm) packages are tracked separately.
- Approve new packages. If a package isn't already approved, nfind asks before installing it. On approval it is remembered so you're not asked again.
- Build a derived image. Approved packages are installed into a derived worker image (`nfind-search-paths:deps-<hash>` for Python, `nfind-search-node:deps-<hash>` for Node.js) layered on the chosen runtime's base. The image is cached and reused for the same set of packages. Prompts that need no packages keep using the stdlib-only base image.
- Run. The filter executes in the derived image with the packages available, a read-only mount, and dropped capabilities at run time. The default Docker backend also disables networking; the experimental Apple backend on macOS 15 does not.
Installing packages happens at image build time, which needs network access. The default Docker container that runs the filter has networking disabled. With `--sandbox apple` on macOS 15, raw IP network access may still be possible.
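Image reuse depends on the same set of packages always mapping to the same `deps-<hash>` tag. How nfind actually computes that hash is not stated here, so the following is only a sketch under an assumed scheme: a truncated SHA-256 over the sorted, de-duplicated package names. The function name `derived_image_tag` is hypothetical.

```python
import hashlib


def derived_image_tag(repository: str, packages: list[str]) -> str:
    """Build a cache-friendly image tag for a set of packages.

    Assumed scheme for illustration only; nfind's real hashing may differ.
    """
    # Normalize so that order, case, and duplicates do not change the tag.
    names = sorted({p.strip().lower() for p in packages})
    digest = hashlib.sha256("\n".join(names).encode("utf-8")).hexdigest()[:12]
    return f"{repository}:deps-{digest}"
```

Because the names are sorted before hashing, requesting `pillow` and `mutagen` in either order yields the same tag, which is the property that lets a cached image be reused.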
Controlling approval
| Flag | Effect |
|---|---|
| (none) | Prompt to install any package not already approved; remember approvals. |
| `--yes`, `-y` | Approve and remember any requested packages without prompting. |
| `--no-deps` | Refuse any third-party package; the filter must use the standard library only. |
```shell
# Prompt before installing anything new (default)
nfind "MP3 files whose title tag contains 'live', using mutagen" ~/Music

# Trust this run: install whatever it asks for, and remember it
nfind "images larger than 4000px on either side" ~/Photos --yes

# Force standard-library-only; reject any package request
nfind "files containing the word TODO" . --no-deps
```
If a filter needs a package that isn't approved and you don't approve it (or you pass
--no-deps), nfind aborts with a DependencyError before building or running
anything. The same checks apply when replaying saved Python or Node filters with
nfind --run; saved metadata cannot silently install packages.
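The approval rules above can be sketched as a single decision function. This is not nfind's implementation: the `DependencyError` class below is a local stand-in for the error named in the text, and `resolve_packages` with its parameters `assume_yes`, `no_deps`, and `ask` are hypothetical names standing in for `--yes`, `--no-deps`, and the interactive prompt.

```python
from typing import Callable, Iterable


class DependencyError(Exception):
    """Local stand-in for the error nfind raises; defined here for the sketch."""


def resolve_packages(
    requested: Iterable[str],
    approved: Iterable[str],
    *,
    assume_yes: bool = False,                     # models --yes / -y
    no_deps: bool = False,                        # models --no-deps
    ask: Callable[[str], bool] = lambda pkg: False,  # models the prompt
) -> set[str]:
    """Return the newly approved packages to remember, or raise."""
    requested = set(requested)
    if no_deps:
        # Standard library only: any package request is refused outright.
        if requested:
            raise DependencyError(f"packages refused: {sorted(requested)}")
        return set()

    newly_approved = set()
    for pkg in sorted(requested - set(approved)):
        if assume_yes or ask(pkg):
            newly_approved.add(pkg)
        else:
            # Abort before anything is built or run.
            raise DependencyError(f"package not approved: {pkg}")
    return newly_approved
```

Raising before returning anything mirrors the documented behavior that an unapproved package stops the run before an image is built or a filter executed.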
The default list
These common, read-only analysis packages are pre-approved and install without a prompt.
Python (pip): chardet, mutagen, pdfminer-six, pillow, pillow-heif,
pypdf, python-magic, pyyaml, tinytag, tomli, tree-sitter,
tree-sitter-bash, tree-sitter-c, tree-sitter-dart, tree-sitter-go,
tree-sitter-java, tree-sitter-javascript, tree-sitter-kotlin,
tree-sitter-python, tree-sitter-rust, tree-sitter-swift,
tree-sitter-typescript
Node.js (npm): @babel/parser, acorn, esprima, fast-xml-parser, ts-morph,
typescript, yaml
The whitelist file
Approvals are stored as JSON at:
Override the location with the NFIND_WHITELIST environment variable. The file lists
the packages you've approved, per runtime; edit or delete it to manage what installs
without a prompt:
The effective allow-set for a runtime is always its built-in
default list plus that runtime's entry in this file. (A
legacy flat {"packages": [...]} file is still read as the Python list.)
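The combination rule can be sketched as follows. The per-runtime layout is an assumption made for illustration: this sketch treats the file as a JSON object whose keys are runtime names (for example `"python"` and `"node"`, both hypothetical) mapping to lists of package names, and it handles the legacy flat `{"packages": [...]}` form described above. The function names are likewise hypothetical.

```python
import json
from pathlib import Path


def load_approved(path: Path) -> dict[str, set[str]]:
    """Read approvals per runtime; a missing file means no approvals.

    Assumes a JSON object keyed by runtime name (hypothetical layout).
    """
    if not path.exists():
        return {}
    data = json.loads(path.read_text(encoding="utf-8"))
    if isinstance(data.get("packages"), list):
        # Legacy flat file: treated as the Python list.
        return {"python": set(data["packages"])}
    return {
        runtime: set(names)
        for runtime, names in data.items()
        if isinstance(names, list)
    }


def effective_allow_set(
    runtime: str,
    defaults: dict[str, set[str]],
    approved: dict[str, set[str]],
) -> set[str]:
    """Built-in defaults for the runtime plus that runtime's saved approvals."""
    return set(defaults.get(runtime, ())) | approved.get(runtime, set())
```

Keeping the two runtimes in separate sets means an approval for a pip package never implies approval of an npm package with the same name.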
Why a whitelist
Even sandboxed, installing arbitrary packages carries risk — a package can run code during installation or pull in unexpected transitive dependencies. Restricting installs to a vetted, remembered set keeps you in control of what enters the image, while still letting the filter use real libraries when you allow it. See the Safety model for the full picture.