Pipestat CLI

Before following this tutorial, please make sure you are familiar with the more information-rich "Pipestat Python API" tutorial.

Prepare environment

The pipestat command line interface reads several environment variables, so you do not have to repeat the same arguments in every subsequent pipestat call.

Please refer to the Environment variables reference for the complete list of supported environment variables. We will set a few for this tutorial:

export PIPESTAT_RESULTS_SCHEMA=../tests/data/sample_output_schema.yaml
export PIPESTAT_RECORD_ID=sample1
export PIPESTAT_RESULTS_FILE=$(mktemp) # temporary file for results storage
export PIPESTAT_NAMESPACE=test
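Every later call in this tutorial relies on these variables, so it can be worth confirming they are set before running pipestat. The sketch below is plain POSIX sh. The helper name check_pipestat_env is hypothetical (it is not part of pipestat), and it only checks the four variables used in this tutorial:

```shell
# Hypothetical helper (not part of pipestat): report which of the
# environment variables used in this tutorial are not set.
# Returns 0 if all are set, 1 otherwise.
check_pipestat_env() {
  missing=0
  for var in PIPESTAT_RESULTS_SCHEMA PIPESTAT_RECORD_ID PIPESTAT_RESULTS_FILE PIPESTAT_NAMESPACE; do
    # printenv exits with a non-zero status when the variable is not set
    if ! printenv "$var" > /dev/null; then
      echo "$var is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running check_pipestat_env right after the exports above should print nothing and return 0.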

Usage reference

To learn about pipestat usage, use the --help/-h option at any level. If the environment variables are set, the help output reflects their current values:

pipestat -h
version: 0.0.3
usage: pipestat [-h] [--version] [--silent] [--verbosity V] [--logdev]
                {report,inspect,remove,retrieve,status} ...

pipestat - report pipeline results

positional arguments:
  {report,inspect,remove,retrieve,status}
    report              Report a result.
    inspect             Inspect a database.
    remove              Remove a result.
    retrieve            Retrieve a result.
    status              Manage pipeline status.

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --silent              Silence logging. Overrides verbosity.
  --verbosity V         Set logging level (1-5 or logging module level name)
  --logdev              Expand content of logging message format.

Pipestat standardizes reporting of pipeline results and pipeline status
management. It formalizes a way for pipeline developers and downstream tools
developers to communicate -- results produced by a pipeline can easily and
reliably become an input for downstream analyses. The object exposes API for
interacting with the results and pipeline status and can be backed by either a
YAML-formatted file or a PostgreSQL database.

pipestat report -h
usage: pipestat report [-h] [-n N] [-f F] [-c C] [-a] [-s S] [--status-schema ST]
                       [--flag-dir FD] -i I [-r R] -v V [-o] [-t]

Report a result.

optional arguments:
  -h, --help                   show this help message and exit
  -n N, --namespace N          Name of the pipeline to report result for. If not provided
                               'PIPESTAT_NAMESPACE' env var will be used. Currently set
                               to: test
  -f F, --results-file F       Path to the YAML file where the results will be stored.
                               This file will be used as pipestat backend and to restore
                               the reported results across sessions
  -c C, --config C             Path to the YAML configuration file. If not provided
                               'PIPESTAT_CONFIG' env var will be used. Currently not set
  -a, --database-only          Whether the reported data should not be stored in the
                               memory, only in the database.
  -s S, --schema S             Path to the schema that defines the results that can be
                               reported. If not provided 'PIPESTAT_RESULTS_SCHEMA' env var
                               will be used. Currently set to:
                               ../tests/data/sample_output_schema.yaml
  --status-schema ST           Path to the status schema. Default will be used if not
                               provided: /Library/Frameworks/Python.framework/Versions/3.6
                               /lib/python3.6/site-
                               packages/pipestat/schemas/status_schema.yaml
  --flag-dir FD                Path to the flag directory in case YAML file is the
                               pipestat backend.
  -i I, --result-identifier I  ID of the result to report; needs to be defined in the
                               schema
  -r R, --record-identifier R  ID of the record to report the result for. If not provided
                               'PIPESTAT_RECORD_ID' env var will be used. Currently set
                               to: sample1
  -v V, --value V              Value of the result to report
  -o, --overwrite              Whether the result should override existing ones in case of
                               name clashes
  -t, --try-convert            Whether to try to convert the reported value into reqiuired
                               class in case it does not meet the schema requirements

pipestat retrieve -h
usage: pipestat retrieve [-h] [-n N] [-f F] [-c C] [-a] [-s S] [--status-schema ST]
                         [--flag-dir FD] -i I [-r R]

Retrieve a result.

optional arguments:
  -h, --help                   show this help message and exit
  -n N, --namespace N          Name of the pipeline to report result for. If not provided
                               'PIPESTAT_NAMESPACE' env var will be used. Currently set
                               to: test
  -f F, --results-file F       Path to the YAML file where the results will be stored.
                               This file will be used as pipestat backend and to restore
                               the reported results across sessions
  -c C, --config C             Path to the YAML configuration file. If not provided
                               'PIPESTAT_CONFIG' env var will be used. Currently not set
  -a, --database-only          Whether the reported data should not be stored in the
                               memory, only in the database.
  -s S, --schema S             Path to the schema that defines the results that can be
                               reported. If not provided 'PIPESTAT_RESULTS_SCHEMA' env var
                               will be used. Currently set to:
                               ../tests/data/sample_output_schema.yaml
  --status-schema ST           Path to the status schema. Default will be used if not
                               provided: /Library/Frameworks/Python.framework/Versions/3.6
                               /lib/python3.6/site-
                               packages/pipestat/schemas/status_schema.yaml
  --flag-dir FD                Path to the flag directory in case YAML file is the
                               pipestat backend.
  -i I, --result-identifier I  ID of the result to report; needs to be defined in the
                               schema
  -r R, --record-identifier R  ID of the record to report the result for. If not provided
                               'PIPESTAT_RECORD_ID' env var will be used. Currently set
                               to: sample1

pipestat remove -h
usage: pipestat remove [-h] [-n N] [-f F] [-c C] [-a] [-s S] [--status-schema ST]
                       [--flag-dir FD] -i I [-r R]

Remove a result.

optional arguments:
  -h, --help                   show this help message and exit
  -n N, --namespace N          Name of the pipeline to report result for. If not provided
                               'PIPESTAT_NAMESPACE' env var will be used. Currently set
                               to: test
  -f F, --results-file F       Path to the YAML file where the results will be stored.
                               This file will be used as pipestat backend and to restore
                               the reported results across sessions
  -c C, --config C             Path to the YAML configuration file. If not provided
                               'PIPESTAT_CONFIG' env var will be used. Currently not set
  -a, --database-only          Whether the reported data should not be stored in the
                               memory, only in the database.
  -s S, --schema S             Path to the schema that defines the results that can be
                               reported. If not provided 'PIPESTAT_RESULTS_SCHEMA' env var
                               will be used. Currently set to:
                               ../tests/data/sample_output_schema.yaml
  --status-schema ST           Path to the status schema. Default will be used if not
                               provided: /Library/Frameworks/Python.framework/Versions/3.6
                               /lib/python3.6/site-
                               packages/pipestat/schemas/status_schema.yaml
  --flag-dir FD                Path to the flag directory in case YAML file is the
                               pipestat backend.
  -i I, --result-identifier I  ID of the result to report; needs to be defined in the
                               schema
  -r R, --record-identifier R  ID of the record to report the result for. If not provided
                               'PIPESTAT_RECORD_ID' env var will be used. Currently set
                               to: sample1

pipestat inspect -h
usage: pipestat inspect [-h] [-n N] [-f F] [-c C] [-a] [-s S] [--status-schema ST]
                        [--flag-dir FD] [-d]

Inspect a database.

optional arguments:
  -h, --help              show this help message and exit
  -n N, --namespace N     Name of the pipeline to report result for. If not provided
                          'PIPESTAT_NAMESPACE' env var will be used. Currently set to:
                          test
  -f F, --results-file F  Path to the YAML file where the results will be stored. This
                          file will be used as pipestat backend and to restore the
                          reported results across sessions
  -c C, --config C        Path to the YAML configuration file. If not provided
                          'PIPESTAT_CONFIG' env var will be used. Currently not set
  -a, --database-only     Whether the reported data should not be stored in the memory,
                          only in the database.
  -s S, --schema S        Path to the schema that defines the results that can be
                          reported. If not provided 'PIPESTAT_RESULTS_SCHEMA' env var will
                          be used. Currently set to:
                          ../tests/data/sample_output_schema.yaml
  --status-schema ST      Path to the status schema. Default will be used if not provided:
                          /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/
                          site-packages/pipestat/schemas/status_schema.yaml
  --flag-dir FD           Path to the flag directory in case YAML file is the pipestat
                          backend.
  -d, --data              Whether to display the data

pipestat status -h
usage: pipestat status [-h] {set,get} ...

Manage pipeline status.

positional arguments:
  {set,get}
    set       Set status.
    get       Get status.

optional arguments:
  -h, --help  show this help message and exit

pipestat status get -h
usage: pipestat status get [-h] [-n N] [-f F] [-c C] [-a] [-s S] [--status-schema ST]
                           [--flag-dir FD] [-r R]

Get status.

optional arguments:
  -h, --help                   show this help message and exit
  -n N, --namespace N          Name of the pipeline to report result for. If not provided
                               'PIPESTAT_NAMESPACE' env var will be used. Currently set
                               to: test
  -f F, --results-file F       Path to the YAML file where the results will be stored.
                               This file will be used as pipestat backend and to restore
                               the reported results across sessions
  -c C, --config C             Path to the YAML configuration file. If not provided
                               'PIPESTAT_CONFIG' env var will be used. Currently not set
  -a, --database-only          Whether the reported data should not be stored in the
                               memory, only in the database.
  -s S, --schema S             Path to the schema that defines the results that can be
                               reported. If not provided 'PIPESTAT_RESULTS_SCHEMA' env var
                               will be used. Currently set to:
                               ../tests/data/sample_output_schema.yaml
  --status-schema ST           Path to the status schema. Default will be used if not
                               provided: /Library/Frameworks/Python.framework/Versions/3.6
                               /lib/python3.6/site-
                               packages/pipestat/schemas/status_schema.yaml
  --flag-dir FD                Path to the flag directory in case YAML file is the
                               pipestat backend.
  -r R, --record-identifier R  ID of the record to report the result for. If not provided
                               'PIPESTAT_RECORD_ID' env var will be used. Currently set
                               to: sample1

pipestat status set -h
usage: pipestat status set [-h] [-n N] -i S [-f F] [-c C] [-a] [-s S] [--status-schema ST]
                           [--flag-dir FD] [-r R]

Set status.

optional arguments:
  -h, --help                   show this help message and exit
  -n N, --namespace N          Name of the pipeline to report result for. If not provided
                               'PIPESTAT_NAMESPACE' env var will be used. Currently set
                               to: test
  -i S, --status-identifier S  Status identifier to use
  -f F, --results-file F       Path to the YAML file where the results will be stored.
                               This file will be used as pipestat backend and to restore
                               the reported results across sessions
  -c C, --config C             Path to the YAML configuration file. If not provided
                               'PIPESTAT_CONFIG' env var will be used. Currently not set
  -a, --database-only          Whether the reported data should not be stored in the
                               memory, only in the database.
  -s S, --schema S             Path to the schema that defines the results that can be
                               reported. If not provided 'PIPESTAT_RESULTS_SCHEMA' env var
                               will be used. Currently set to:
                               ../tests/data/sample_output_schema.yaml
  --status-schema ST           Path to the status schema. Default will be used if not
                               provided: /Library/Frameworks/Python.framework/Versions/3.6
                               /lib/python3.6/site-
                               packages/pipestat/schemas/status_schema.yaml
  --flag-dir FD                Path to the flag directory in case YAML file is the
                               pipestat backend.
  -r R, --record-identifier R  ID of the record to report the result for. If not provided
                               'PIPESTAT_RECORD_ID' env var will be used. Currently set
                               to: sample1

Usage demonstration

Reporting

The command line interface provides access to all of pipestat's Python API functionality. For example, to report a result using the file backend given by PIPESTAT_RESULTS_FILE, use:

pipestat report -i number_of_things -v 100 --try-convert
Reported records for 'sample1' in 'test' namespace:
 - number_of_things: 100

The result has been reported and the results file has been updated:

cat $PIPESTAT_RESULTS_FILE
test:
  sample1:
    number_of_things: 100

Let's report another result:

pipestat report -i percentage_of_things -v 1.1 --try-convert
Reported records for 'sample1' in 'test' namespace:
 - percentage_of_things: 1.1

cat $PIPESTAT_RESULTS_FILE
test:
  sample1:
    number_of_things: 100
    percentage_of_things: 1.1
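Reporting results one at a time gets repetitive in a pipeline script. Below is a minimal sketch that loops over the same pipestat report call shown above. The helper name report_many and its <result_identifier>=<value> argument format are hypothetical, made up for illustration; namespace, record, results file and schema still come from the environment variables set earlier:

```shell
# Hypothetical helper (not part of pipestat): report several results for the
# current record. Each argument has the form <result_identifier>=<value>.
report_many() {
  for pair in "$@"; do
    case $pair in
      *=*) ;;
      *) echo "expected <result_identifier>=<value>, got '$pair'" >&2; return 1 ;;
    esac
    id=${pair%%=*}     # text before the first '='
    value=${pair#*=}   # text after the first '='
    pipestat report -i "$id" -v "$value" --try-convert || return 1
  done
}
```

For example, report_many number_of_things=100 percentage_of_things=1.1 issues the two report calls from this section. The loop stops at the first failed report instead of continuing with the remaining results.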

Inspection

The pipestat inspect command gives a brief overview of the PipestatManager state, such as the number of records and the type of backend:

pipestat inspect


PipestatManager (test)
Backend: file (/var/folders/3f/0wj7rs2144l9zsgxd3jn5nxc0000gn/T/tmp.Zid7BMd1)
Results schema source: ../tests/data/sample_output_schema.yaml
Status schema source: /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pipestat/schemas/status_schema.yaml
Records count: 1
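The inspect output is meant for humans, but a script can still extract a single field from it. The hypothetical records_count helper below picks out the "Records count" line shown above. It parses human-readable text, so treat it as a sketch that may break if a future pipestat version changes the output format:

```shell
# Hypothetical helper (not part of pipestat): print the number after
# "Records count:" in the output of `pipestat inspect`.
records_count() {
  pipestat inspect | awk -F': ' '/^Records count:/ { print $2 }'
}
```

With the state above, records_count would print 1.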

To display the contents of the results file or database table associated with the namespace, add the --data flag:

pipestat inspect --data


PipestatManager (test)
Backend: file (/var/folders/3f/0wj7rs2144l9zsgxd3jn5nxc0000gn/T/tmp.Zid7BMd1)
Results schema source: ../tests/data/sample_output_schema.yaml
Status schema source: /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pipestat/schemas/status_schema.yaml
Records count: 1

Data:
test:
  sample1:
    number_of_things: 100
    percentage_of_things: 1.1

Retrieval

Reported results can be retrieved with pipestat retrieve:

pipestat retrieve -i percentage_of_things
1.1
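Because pipestat retrieve prints the bare value, its output can be captured in a shell variable and used in downstream logic. The hypothetical result_exceeds helper below compares a retrieved numeric result against a threshold. It uses awk because the shell's [ ] test only compares integers and would reject a value like 1.1:

```shell
# Hypothetical helper (not part of pipestat): succeed if the numeric result
# named by $1 is greater than the threshold $2.
# Returns 2 if the retrieval itself fails.
result_exceeds() {
  value=$(pipestat retrieve -i "$1") || return 2
  # awk does the floating-point comparison; "+ 0" forces numeric context
  awk -v v="$value" -v t="$2" 'BEGIN { exit ((v + 0) > (t + 0) ? 0 : 1) }'
}
```

For example: if result_exceeds percentage_of_things 1; then echo "above threshold"; fi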

Removal

To remove a result, call pipestat remove:

pipestat remove -i percentage_of_things
Removed result 'percentage_of_things' for record 'sample1' from 'test' namespace

The results file and the state of the PipestatManager object reflect the removal:

cat $PIPESTAT_RESULTS_FILE
test:
  sample1:
    number_of_things: 100

pipestat inspect --data


PipestatManager (test)
Backend: file (/var/folders/3f/0wj7rs2144l9zsgxd3jn5nxc0000gn/T/tmp.Zid7BMd1)
Results schema source: ../tests/data/sample_output_schema.yaml
Status schema source: /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pipestat/schemas/status_schema.yaml
Records count: 1

Data:
test:
  sample1:
    number_of_things: 100
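With the file backend, the results file is plain YAML, so a quick text search can confirm whether a result is still present. The hypothetical result_in_file helper below is a plain-text match rather than a YAML parse. It matches the key at any nesting level, so it cannot tell which namespace or record a key belongs to, and it does not apply to a PostgreSQL backend:

```shell
# Hypothetical helper (not part of pipestat): succeed if a line of the YAML
# results file starts (after indentation) with "<key>:".
# Plain-text match only: nesting (namespace/record) is ignored.
result_in_file() {
  grep -q "^[[:space:]]*$1:" "$PIPESTAT_RESULTS_FILE"
}
```

After the removal above, result_in_file number_of_things succeeds while result_in_file percentage_of_things fails.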

Status management

To manage pipeline status, call pipestat status <subcommand>:

  • set to set pipeline statuses
  • get to retrieve pipeline statuses

Starting with pipestat 0.0.3, the --schema argument is not required for status management if a YAML file is used as the backend.

pipestat status set --status-identifier running
pipestat status get
running
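A common pattern in a pipeline is to mark the record as running, do the work, and then set a final status based on the outcome. The hypothetical run_with_status wrapper below sketches this with pipestat status set. It assumes the status schema defines completed and failed in addition to running; confirm this in the file given by --status-schema (or the default path shown in the help output):

```shell
# Hypothetical wrapper (not part of pipestat): set status "running", run the
# given command, then set "completed" or "failed" from its exit code.
# Assumes 'completed' and 'failed' are defined in the status schema.
run_with_status() {
  pipestat status set --status-identifier running || return
  if "$@"; then
    pipestat status set --status-identifier completed
  else
    exit_code=$?   # exit status of the wrapped command
    pipestat status set --status-identifier failed
    return "$exit_code"
  fi
}
```

For example, run_with_status sleep 1 would leave the status as completed, and the wrapper passes a failing command's exit code back to the caller.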

Note that only statuses defined in the status schema are supported.

Finally, remove the temporary results file:

rm "$PIPESTAT_RESULTS_FILE"