Skip to content

Python API

Home

Note: The programmatic API is still evolving and may change between releases without notice. Pin a specific version if you depend on it. The CLI interface is stable.

The public entry point is search, re-exported from the top-level package:

from nfind import search

# Returns a list of records, each a dict with at least a "path" key (a host path).
# When the prompt asks for extra per-file values, they appear as additional keys.
records = search(".", "directories that contain only audio files")

for record in records:
    print(record["path"])

Requirements are the same as the CLI: Docker running by default (or Apple Containers on macOS via sandbox_backend="apple"), and OPENAI_API_KEY set in the environment. The first call builds the worker image; later calls reuse it. The Apple backend is experimental on macOS 15 because it cannot disable networking the way Docker does.

def search(
    path: str | Path | Sequence[str | Path],
    prompt: str,
    *,
    image: str | None = None,         # override the chosen runtime's base tag
    model: str = "openai/gpt-5.4",
    timeout: float = 180.0,
    memory: str = "256m",
    cpus: float = 1.0,
    pids_limit: int = 64,
    rebuild: bool = False,
    build_timeout: float = 120.0,
    on_generated: Callable[[GeneratedFilter], None] | None = None,
    on_retry: Callable[[int, ValueError], None] | None = None,
    approve_dependencies: Callable[[list[str]], bool] | None = None,
    whitelist: set[str] | None = None,
    macos_meta: bool = False,            # macOS: expose tags/quarantine to the filter
    format_code: bool = True,            # tidy generated Python with ruff before running
    sandbox: Sandbox | None = None,      # override the execution backend (see below)
    sandbox_backend: Literal["docker", "apple"] = "docker",
    exclude: Sequence[str] = (),         # glob patterns to prune before filtering
    max_depth: int | None = None,
    use_default_ignores: bool = True,
) -> list[dict[str, Any]]:

Generates a filter for prompt, runs it against path in the sandbox, and returns the matching paths as records (host paths plus any extra fields the prompt produced). path may be one root or a sequence of roots, each a directory (walked) or a single file; with several roots each is mounted separately and results are merged as host paths. The model picks the runtime (Python or Node.js) and the matching base image is used unless image overrides it. The keyword arguments mirror the CLI options: exclude, max_depth, and use_default_ignores shape host-side enumeration before the generated filter runs, format_code=False matches --no-format, and sandbox_backend="apple" matches --sandbox apple.

When using sandbox_backend="apple" on macOS 15, apply the same caveat as the CLI: Apple's container does not support Docker's --network none there. nfind uses --no-dns, but raw IP network access may still be possible.

model accepts a bare name (OpenAI) or a provider/model selector for any OpenAI-compatible provider in backend.PROVIDERS (anthropic/…, gemini/…, groq/…, ollama/…, openrouter/<vendor>/<model>, …). nfind reuses the OpenAI SDK against the provider's base URL and reads its *_API_KEY; see Providers.

Reviewing or gating the generated code

on_generated, if given, is called with the GeneratedFilter after it is produced but before it runs. It exposes .code, .runtime ("python" or "node"), and .dependencies. Use it to inspect, log, or save the code — or raise to abort before execution:

from nfind import search
from nfind.backend import GeneratedFilter

def review(generated: GeneratedFilter) -> None:
    print(f"[{generated.runtime}] {generated.dependencies}")
    print(generated.code)
    if "child_process" in generated.code:    # your own policy check
        raise RuntimeError("rejected")

records = search(".", "files with no extension", on_generated=review)

This is the same hook the CLI uses to implement --show-code, --save, and --confirm.

Saving and replaying filters

serialize_filter(generated, prompt, model) renders a GeneratedFilter as a self-describing, replayable script (a PEP 723 script for the Python runtime, a commented file with a machine-readable metadata line for Node) — the same artifact the CLI's --save writes. run_saved(filter_path, path, …) parses such a file back and replays it through the sandbox without an LLM call, gating any declared Python or Node packages through approve_dependencies/the per-runtime whitelist exactly as search does. It accepts the same one-or-many path, exclude, max_depth, and use_default_ignores enumeration controls as search:

from pathlib import Path

from nfind import serialize_filter, run_saved, search
from nfind.backend import GeneratedFilter

# Capture and persist a filter while searching:
saved: list[GeneratedFilter] = []
search(".", "Python files that import os", on_generated=saved.append)
Path("os-imports.py").write_text(
    serialize_filter(saved[0], "Python files that import os", "openai/gpt-5.4")
)

# Later, replay it sandboxed with no model call:
records = run_saved("os-imports.py", "./src")

See Saving & replaying filters for the file format and the Python-only uv run trusted fast path.

Generation retries

The model is asked for the filter in a single call. If its reply fails validation (malformed JSON, the wrong function shape, an invalid package name, an unknown runtime), nfind feeds the error back and retries — up to 3 attempts in total. The first attempt runs at temperature 0; retries nudge the temperature up so the model diverges from the reply that just failed. Only validation errors are retried; API, sandbox backend, and dependency-approval failures are not. If every attempt fails, the last validation error is raised.

on_retry, if given, is called with the 1-based retry number and the ValueError before each retry — handy for logging. The CLI uses it to print a notice under --verbose. generate_filter takes the same on_retry, plus an attempts argument (default 3) to tune or disable retries.

Approving dependencies

If the generated filter requests third-party packages that aren't already approved, approve_dependencies is called with the new package names. Return True to install them (into a derived image) and remember them; return False (the default behaviour when no approver is given) to reject with a DependencyError. whitelist overrides the approved set (defaults to load_whitelist(runtime), i.e. the chosen runtime's built-in list plus saved approvals). load_whitelist and approve_packages take a runtime argument ("python" or "node", default "python").

from nfind import search, load_whitelist

records = search(
    "~/Music",
    "MP3 files whose title tag contains 'live', using mutagen",
    approve_dependencies=lambda packages: True,   # auto-approve (like --yes)
    whitelist=load_whitelist() | {"tinytag"},
)

See Dependencies & the whitelist for the full model.

macOS metadata

With macos_meta=True on a macOS host, nfind reads selected per-path attributes (Finder tags, quarantine/where-from) during enumeration and exposes them to a Python filter as a global META dict, so filters can combine macOS metadata with file contents. It is a no-op on other platforms. See macOS metadata for the field schema and examples.

Errors

from nfind import DependencyError, DockerError, DockerUnavailableError

try:
    records = search(".", "files with no extension")
except DockerUnavailableError as exc:
    ...   # selected sandbox CLI/daemon/services could not be reached
except DockerError as exc:
    ...   # other sandbox lifecycle failures (build/run)
except DependencyError as exc:
    ...   # filter needed packages that were not approved

DockerUnavailableError is a backwards-compatible alias for the generic sandbox unavailable error, and remains a subclass of DockerError. DependencyError is raised when a filter needs unapproved packages. Filter execution problems (timeouts, invalid results) surface as TimeoutError and RuntimeError.

Lower-level building blocks

For finer control, nfind.generation owns the model-to-filter step, nfind.enumeration owns host-side path enumeration, and nfind.execution owns sandbox image preparation, dependency gating, and worker execution. nfind.backend orchestrates those pieces and re-exports the older helper names for compatibility:

from pathlib import Path
from nfind import enumeration, execution, generation, runtimes

root = Path(".").resolve()
container_paths, host_by_container, mounts = enumeration.enumerate_roots([root])
generated = generation.generate_filter("files with no extension")   # .code and .dependencies
runtime = runtimes.RUNTIMES[generated.runtime]
image = execution.build_worker_image(
    runtime.base_image,
    generated.dependencies,
    runtime=runtime,
)
records = execution.run_filter(
    generated.code,
    root,
    container_paths,
    image=image,
    mounts=mounts,
)
Function Purpose
enumeration.enumerate_paths(root, exclude=…, max_depth=…, use_default_ignores=…) Walk one tree; return container paths and a container→host map. exclude prunes matching globs, max_depth bounds depth, and default VCS/dependency/cache dirs are skipped unless disabled. Also re-exported from nfind.backend for compatibility.
enumeration.enumerate_roots(roots, exclude=…, max_depth=…, use_default_ignores=…) Walk one or more roots; return container paths, a container→host map, and the mounts needed for execution. This is what search and run_saved use.
collect_macos_metadata(host_by_container) macOS: read tags/quarantine/where-from per path; {} off macOS.
generation.generate_filter(prompt, model=…, attempts=…, on_retry=…) Ask the LLM for a GeneratedFilter (.code + .dependencies), validated for shape; retries on invalid replies. Also re-exported from nfind.backend for compatibility.
build_image(image=…, rebuild=…, build_timeout=…) Build the stdlib-only base worker image when absent or on request.
execution.build_worker_image(image=…, dependencies=…, …) Ensure a runnable image (base, or a derived image with packages); return the tag to run. Also re-exported from nfind.backend for compatibility.
execution.run_filter(code, root, container_paths, …) Execute the filter in the sandbox; return container-path records. Pass limits=Limits(…) to set the resource/output caps directly, or a sandbox= to override the backend. Also re-exported from nfind.backend for compatibility.
load_whitelist(runtime="python") / approve_packages(pkgs, runtime="python") Read the approved-package set / persist new approvals for one runtime ("python" or "node").
check_docker_available() Raise DockerUnavailableError if Docker can't be reached.
check_sandbox_available("docker" | "apple") Raise DockerUnavailableError if the selected sandbox backend can't be reached.

These lower-level helpers return the in-container paths the filter will see — each root's own host path when it can be safely mounted there, or neutral /data / /data/0 … mount points otherwise — alongside the container→host map that search and run_saved use to translate results back.

The sandbox component

The hardened execution lives behind a small, domain-agnostic Sandbox protocol in nfind.sandbox. The default backend, DockerSandbox, owns the security-relevant docker run flag set (no network, read-only root, dropped capabilities, no-new-privileges, and process/memory/CPU/file-descriptor/tmpfs limits) in one auditable place, plus the image build/derive mechanics. AppleContainerSandbox is an experimental macOS backend selected with sandbox_backend="apple"; on macOS 15 it does not provide Docker-equivalent no-network isolation. execution.build_worker_image and execution.run_filter are nfind-specific adapters over the selected backend.

search and run_saved accept an optional sandbox to override the backend — pass a fake implementing the protocol to drive the nfind logic without Docker, or an alternate backend later:

from nfind import search
from nfind.sandbox import CompletedRun, Limits, Mount

class DryRunSandbox:                 # structural match for the Sandbox protocol
    def check_available(self) -> None: ...
    def ensure_image(self, *, rebuild: bool = False) -> None: ...
    def derive_image(self, dockerfile_text: str, *, rebuild: bool = False) -> str:
        return "dry-run:latest"
    def run(self, stdin: bytes, *, mounts: list[Mount], limits: Limits) -> CompletedRun:
        return CompletedRun(stdout=b'{"ok": true, "results": []}', stderr=b"", returncode=0)

records = search(".", "files with no extension", sandbox=DryRunSandbox())

run raises SandboxTimeout / SandboxOutputTooLarge / SandboxUnavailable (DockerUnavailableError is an alias of SandboxUnavailable); it does not interpret exit codes or parse output — run_filter does that.