:orphan: Programmatic API ================ This page documents the Python API for creating, submitting, and monitoring jobs from code (without using the CLI). Installation ------------ .. code-block:: bash pip install hpc-runner Quick start ----------- .. code-block:: python from hpc_runner import Job job = Job(command="python my_script.py", cpu=4, mem="8G", time="1:00:00") result = job.submit() # auto-detect scheduler final = result.wait() # block until complete print(final, result.returncode) Creating jobs ------------- ``Job`` is the core unit of work. It’s a scheduler-agnostic data container; a scheduler implementation is responsible for translating fields into submission flags/directives. .. code-block:: python from hpc_runner import Job job = Job( command="python train.py", name="training_job", cpu=4, mem="16G", time="4:00:00", queue="gpu.q", modules=["python/3.11", "cuda/12.0"], workdir="/path/to/project", stdout="train.out", stderr=None, # None means “merge stderr into stdout” for most schedulers ) Submitting jobs --------------- Submit to an auto-detected scheduler: .. code-block:: python result = job.submit() Or explicitly select a scheduler: .. code-block:: python from hpc_runner import get_scheduler scheduler = get_scheduler("sge") result = scheduler.submit(job) Monitoring jobs --------------- ``Job.submit()`` returns a ``JobResult`` which can be polled or waited on. .. code-block:: python from hpc_runner import Job, JobStatus result = Job("python train.py").submit() while not result.is_complete: print(result.job_id, result.status) if result.status == JobStatus.COMPLETED: print("ok:", result.read_stdout(tail=20)) else: print("failed:", result.read_stderr(tail=50)) Cancel a job: .. code-block:: python result.cancel() Configuration-driven jobs ------------------------- ``Job()`` is config-aware by default. It auto-consults the TOML config hierarchy, merging ``[defaults]`` with a matched ``[tools.]`` or ``[types.]`` section, then applies any explicit keyword arguments you pass. The tool name is auto-detected from the command (first word, path stripped). Use the ``job_type`` keyword to look up a ``[types.*]`` entry instead (this skips tool auto-detection). If the tool has ``[tools..options]`` entries defined in the config, the **full command string** (not just the first word) is used to match argument-specific overrides. See :ref:`tool-option-specialisation` for the matching rules. .. code-block:: python from hpc_runner import Job, reload_config # Explicitly load a config file for this process (optional) reload_config("./hpc-runner.toml") # Auto-detects "python" from command → looks up [tools.python] job = Job("python train.py", cpu=8) # Look up [types.gpu], merge with [defaults], override cpu job = Job("python train.py", job_type="gpu", cpu=8) # No matching tool — just [defaults] job = Job("echo hello") # With [tools.fusesoc.options."--tool slang"] in config, the command # arguments trigger option-specific overrides automatically: job = Job("fusesoc run --tool slang core:v:n") result = job.submit() Job dependencies ---------------- At the low level, you can chain jobs by attaching a dependency to a new job: .. code-block:: python from hpc_runner import Job r1 = Job("python preprocess.py", name="pre").submit() j2 = Job("python train.py", name="train") j2.dependencies = [r1] # programmatic dependency j2.dependency_type = "afterok" r2 = j2.submit() Pipelines (multi-step workflows) -------------------------------- For larger workflows, use ``Pipeline`` to define steps and dependencies by name. When used as a context manager, the pipeline auto-submits on exit and results are available via the ``results`` property. .. code-block:: python from hpc_runner import Pipeline with Pipeline("ml") as p: p.add("python preprocess.py", name="preprocess", cpu=8, mem="32G") p.add("python train.py", name="train", depends_on=["preprocess"]) p.add("python evaluate.py", name="evaluate", depends_on=["train"]) # Auto-submitted when the with-block exits cleanly. # Results are accessible on the pipeline object. for name, result in p.results.items(): print(name, result.job_id, result.status) p.wait() Without a context manager, call ``submit()`` explicitly: .. code-block:: python p = Pipeline("ml") p.add("make build", name="build") p.add("make test", name="test", depends_on=["build"]) results = p.submit() # must call manually p.wait() Config-aware pipeline jobs ^^^^^^^^^^^^^^^^^^^^^^^^^^ ``Pipeline.add()`` creates jobs through ``Job()``, so every step picks up ``[defaults]`` automatically and the tool is auto-detected from the command. Use the ``job_type`` keyword to pull in ``[types.*]`` config instead: .. code-block:: python with Pipeline("ml") as p: # Auto-detects "python" → picks up [tools.python] config p.add("python preprocess.py", name="preprocess") # Picks up [types.gpu] config (queue, resources, etc.) p.add("python train.py", name="train", depends_on=["preprocess"], job_type="gpu") # Only [defaults] — no matching tool or type p.add("echo done", name="notify", depends_on=["train"]) Keyword arguments override whatever comes from config: .. code-block:: python # [types.gpu] sets cpu=8, but we want 16 for this step p.add("python big_train.py", name="train", job_type="gpu", cpu=16) Per-job dependency types ^^^^^^^^^^^^^^^^^^^^^^^^ By default every dependency uses ``AFTEROK`` (run only if all parents succeed). You can set a different dependency type per step: .. code-block:: python from hpc_runner import DependencyType, Pipeline with Pipeline("robust") as p: p.add("python train.py", name="train") # Only runs if train succeeds p.add("python evaluate.py", name="evaluate", depends_on=["train"], dependency_type=DependencyType.AFTEROK) # Runs regardless of success/failure (cleanup, notifications, etc.) p.add("python notify.py", name="notify", depends_on=["train"], dependency_type=DependencyType.AFTERANY) Available types: ``AFTEROK``, ``AFTERANY``, ``AFTER``, ``AFTERNOTOK``. Choosing a scheduler ^^^^^^^^^^^^^^^^^^^^ By default the scheduler is auto-detected at submit time. You can pin it at construction: .. code-block:: python from hpc_runner import Pipeline, get_scheduler sge = get_scheduler("sge") with Pipeline("build", scheduler=sge) as p: p.add("make build", name="build") Or pass it to ``submit()`` directly: .. code-block:: python p = Pipeline("build") p.add("make build", name="build") p.submit(scheduler=sge) Handling submission failures ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ If a scheduler error interrupts submission partway through, the successfully submitted jobs are preserved. Call ``submit()`` again to retry only the remaining jobs: .. code-block:: python p = Pipeline("etl") p.add("python extract.py", name="extract") p.add("python transform.py", name="transform", depends_on=["extract"]) p.add("python load.py", name="load", depends_on=["transform"]) try: p.submit() except RuntimeError: # extract submitted, transform failed — fix the issue, then: p.submit() # skips extract, retries transform and load Array jobs ---------- Use ``JobArray`` when you want a scheduler array job (SGE: ``qsub -t``): .. code-block:: python from hpc_runner import Job, JobArray base = Job("python work.py", name="work", cpu=2, time="0:30:00") array = JobArray(job=base, start=1, end=100, max_concurrent=10) array_result = array.submit() statuses = array_result.wait() print("completed:", sum(1 for s in statuses.values() if s.name == "COMPLETED"))