
DATASET_INFO

Extracts detailed metadata from a LAS/LAZ point cloud using PDAL. The step runs pdal info on the input dataset and produces a JSON document describing the structure and statistics of the point cloud.

This is a pure inspection step: it does not transform the data. It extracts structured metadata and publishes it as an artifact that downstream steps or external systems can consume.

Typical use: dataset inspection, metadata extraction, quality control, and feeding metadata to CALL_WEBHOOK so external systems receive point cloud statistics at job completion.


Contract

Type      DATASET_INFO
Accepts   input_las: las
Produces  metadata: json
Params    none

Inputs

Slot       Type  Description
input_las  las   Point cloud dataset to inspect

Outputs

Slot      Type  Description
metadata  json  PDAL metadata report describing the dataset

What it does internally

  1. Downloads the LAS artifact from MinIO
  2. Runs pdal info input.las
  3. Saves the JSON output to info.json
  4. Uploads info.json to MinIO as an artifact
  5. Returns metadata pointing to the uploaded JSON
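
Steps 2 and 3 above can be sketched in Python as follows. This is a minimal illustration, not the step's actual implementation: the function names are hypothetical, the MinIO download and upload (steps 1, 4 and 5) are left out, and the pdal CLI is assumed to be on PATH.

```python
import json
import subprocess
from pathlib import Path


def build_pdal_info_command(las_path: str) -> list:
    """Return the argv for step 2: `pdal info <file>`."""
    return ["pdal", "info", las_path]


def save_report(report_text: str, out_path: Path) -> dict:
    """Parse the text printed by pdal and write it to info.json (step 3)."""
    # json.loads raises if the captured output is not valid JSON,
    # so a broken report is never written to disk.
    report = json.loads(report_text)
    out_path.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return report


def run_dataset_info(las_path: str, workdir: Path) -> Path:
    """Run steps 2-3 locally; requires the pdal CLI to be installed."""
    result = subprocess.run(
        build_pdal_info_command(las_path),
        capture_output=True,  # pdal info prints the JSON report to stdout
        text=True,
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    out_path = workdir / "info.json"
    save_report(result.stdout, out_path)
    return out_path
```

Calling run_dataset_info("input.las", Path(".")) would leave info.json in the current directory, ready to be uploaded as the metadata artifact.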

The PDAL metadata report includes:

  • Bounding boxes (native CRS and EPSG:4326)
  • Total point count
  • Per-dimension statistics (min, max, mean, stddev) for X, Y, Z, Intensity, etc.
  • File size
  • PDAL reader information
  • Coordinate system information (when available)
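
A downstream consumer might pull a few headline values out of the report like this. The sketch assumes the key layout that pdal info commonly emits ("stats", "statistic", "bbox", "native", "average", "file_size", "reader"); those key names are not confirmed by this page, so check them against a real info.json before relying on them. The summarize helper is hypothetical.

```python
def summarize(report: dict) -> dict:
    """Extract a few headline values from a `pdal info` style report.

    All key names are assumptions about the report layout;
    any value whose key is missing comes back as None.
    """
    stats = report.get("stats", {})
    # Index the per-dimension entries by dimension name (X, Y, Z, ...).
    per_dim = {d["name"]: d for d in stats.get("statistic", [])}
    x = per_dim.get("X", {})
    z = per_dim.get("Z", {})
    return {
        # Every point has an X value, so X's count is used as the point count.
        "point_count": x.get("count"),
        "z_min": z.get("minimum"),
        "z_max": z.get("maximum"),
        "z_mean": z.get("average"),
        "native_bbox": stats.get("bbox", {}).get("native", {}).get("bbox"),
        "file_size": report.get("file_size"),
        "reader": report.get("reader"),
    }
```

To use it, load the downloaded info.json with json.load and pass the resulting dict to summarize.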

Role in standard pipelines

All current MapPrism preset pipelines run DATASET_INFO in parallel with the main conversion step:

json
"recipe": [
  {
    "id": "build_copc",
    "type": "BUILD_COPC",
    "inputs": { "input_las": "job:input_las" },
    "outputs": { "output_copc": "step:build_copc.output_copc" }
  },
  {
    "id": "dataset_info",
    "type": "DATASET_INFO",
    "inputs": { "input_las": "job:input_las" },
    "outputs": { "metadata": "step:dataset_info.metadata" }
  }
]

The CALL_WEBHOOK step in on_exit waits for step:dataset_info.metadata, so the full job object, including the metadata, is available when the webhook fires.
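
One way to see why the two steps in the recipe above can run in parallel is to check that neither reads the other's output. The sketch below assumes that an input of the form "step:<id>.<slot>" refers to another step's output and that "job:<slot>" refers to a job-level input; the step_dependencies helper is hypothetical and not part of MapPrism.

```python
def step_dependencies(recipe: list) -> dict:
    """Map each step id to the ids of the steps whose outputs it reads."""
    deps = {}
    for step in recipe:
        needed = set()
        for ref in step.get("inputs", {}).values():
            # Assumed reference syntax: "step:<id>.<slot>"
            if ref.startswith("step:"):
                needed.add(ref[len("step:"):].split(".", 1)[0])
        deps[step["id"]] = needed
    return deps


recipe = [
    {
        "id": "build_copc",
        "type": "BUILD_COPC",
        "inputs": {"input_las": "job:input_las"},
        "outputs": {"output_copc": "step:build_copc.output_copc"},
    },
    {
        "id": "dataset_info",
        "type": "DATASET_INFO",
        "inputs": {"input_las": "job:input_las"},
        "outputs": {"metadata": "step:dataset_info.metadata"},
    },
]

deps = step_dependencies(recipe)
# Steps with no step-level inputs depend only on the job input.
independent = sorted(sid for sid, needed in deps.items() if not needed)
print(independent)  # ['build_copc', 'dataset_info']
```

Both steps read only job:input_las, so under these assumptions neither has to wait for the other.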


Recipe usage

json
{
  "id": "dataset_info",
  "type": "DATASET_INFO",
  "inputs":  { "input_las": "job:input_las" },
  "outputs": { "metadata": "step:dataset_info.metadata" }
}

Artifact storage path

artifacts/job_{id}/dataset_info/info.json
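
A client that needs to locate the report can fill in the template above. This is a small sketch; the helper name and the job id 42 are made up for illustration.

```python
def dataset_info_artifact_path(job_id) -> str:
    """Fill the job id into the artifact path template shown above."""
    return f"artifacts/job_{job_id}/dataset_info/info.json"


print(dataset_info_artifact_path(42))  # artifacts/job_42/dataset_info/info.json
```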