.. _pipeline:

==============================================
The toasty Image-Processing Pipeline Framework
==============================================

The toasty_ package provides a series of “pipeline” commands that automate the
process of tiling a collection of images and publishing the resulting data files
in a format easily usable by the `AAS WorldWide Telescope`_.

.. _toasty: https://toasty.readthedocs.io/
.. _AAS WorldWide Telescope: http://worldwidetelescope.org/


Overview
========

The toasty_ image-processing pipeline is designed to run automatically and
incrementally. The pipeline connects two nodes:

- A “source” of new images to process
- A “destination” where processed images get published

Pipeline operations follow this general scheme:

1. :ref:`Initialize a workspace <cli-pipeline-init>`, which gets associated with a
   source and a destination.
2. :ref:`Query the source <cli-pipeline-refresh>` for new images.
3. Choose which images to process and :ref:`fetch <cli-pipeline-fetch>` them.
4. :ref:`Process the source images <cli-pipeline-process-todos>` into WWT’s formats.
5. Check the results and :ref:`approve <cli-pipeline-approve>` images that are
   ready for publication.
6. :ref:`Publish <cli-pipeline-publish>` the approved images.

Each pipeline stage is implemented as a subcommand of the ``toasty``
command-line program.


Configuration
=============

The root of the *destination* data repository should contain a configuration
file named ``toasty-pipeline-config.yaml``. Once a pipeline workflow is set up,
you shouldn’t need to worry about this file. But to get a new pipeline going,
you need to create it and then place it in your data destination.

This file contains structured data in the `YAML <https://yaml.org/>`_ format. An
example is:

.. code-block:: YAML

    source_type: djangoplicity
    publish_url_prefix: http://data1.wwtassets.org/feeds/noirlab/

    djangoplicity:
      base_url: https://noirlab.edu/public/images/
      channel_name: noirlab

The top-level settings are:

- ``source_type`` specifies the kind of system from which images are drawn.
  Values are documented below.
- ``publish_url_prefix`` specifies the URL prefix below which the data
  produced by the pipeline will ultimately become publicly accessible. This
  setting needs to be provided in order to write the correct data access URLs
  into the WTML files generated by the pipeline. For maximum cross-compatibility,
  the URLs should begin with an ``http://`` prefix.

Djangoplicity Data Source
-------------------------

Currently, the only functional ``source_type`` is ``djangoplicity``, which
downloads and parses an imagery feed from a website powered by the
`Djangoplicity <https://github.com/djangoplicity/djangoplicity>`_ gallery
system. An example is the `ESA Hubble gallery
<https://spacetelescope.org/images/>`_.

When using the ``djangoplicity`` data source, the ``toasty-pipeline-config.yaml``
file should contain a dictionary named ``djangoplicity`` as in the example above.
This dictionary should contain the following keys:

- ``base_url``, giving the root URL of the gallery. The URL
  ``{base_url}/archive/search`` should get you to a search page.
- ``channel_name``, giving the name of the “channel” that the output metadata will
  be annotated with, analogous to a YouTube channel. This should be a brief, lowercase,
  URL-friendly name.

The following keys are optional:

- ``search_page_name`` specifies the text used to construct search pagination URLs
  at the data source. The default is ``page``, which means that search page
  result URLs look like ``{base_url}/archive/search/page/1/``. As of late 2020,
  for ``eso.org``, and potentially other sites, the correct setting is ``list``,
  because the search page URLs look like ``{base_url}/archive/search/list/1/``.
- ``force_insecure_tls`` takes a boolean value. If true, TLS-encrypted
  connections to the data source are made without being verified. As of late
  2020 this is necessary for ``noirlab.edu``.
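
The optional keys go in the same ``djangoplicity`` dictionary as the required
ones. As a sketch only, a configuration for a gallery such as ESO's, which uses
``list`` in its search URLs, might look like the following. The ``base_url``,
``channel_name``, and ``publish_url_prefix`` values are illustrative
assumptions, not settings taken from a working pipeline:

.. code-block:: YAML

    source_type: djangoplicity
    publish_url_prefix: http://example.org/feeds/eso/   # hypothetical prefix

    djangoplicity:
      base_url: https://www.eso.org/public/images/      # assumed gallery root
      channel_name: eso                                 # hypothetical channel name
      search_page_name: list    # search pages at {base_url}/archive/search/list/1/
      # force_insecure_tls: true   # uncomment only for sites that require it

Leaving ``search_page_name`` out selects the ``page`` default described above.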

Astropix Data Source
--------------------

The ``astropix`` data source downloads and parses an imagery feed from the
`AstroPix <https://astropix.ipac.caltech.edu/>`_ service. **It needs updating
and won’t work right now**.

When using the ``astropix`` data source, the ``toasty-pipeline-config.yaml``
file should contain a dictionary named ``astropix``. This dictionary should
contain one key, ``json_query_url``. This key should give the URL of a saved
AstroPix search in the JSON output format. When fetching new data, the pipeline
will download JSON from this URL and parse it to index the available imagery.
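
For orientation, an ``astropix`` configuration would presumably follow the same
layout as the ``djangoplicity`` example above. This is an untested sketch, since
the source does not currently work, and both URLs are placeholders rather than
real endpoints:

.. code-block:: YAML

    source_type: astropix
    publish_url_prefix: http://example.org/feeds/astropix/   # hypothetical prefix

    astropix:
      # Placeholder: substitute the URL of a saved AstroPix search that
      # returns its results in the JSON output format.
      json_query_url: https://example.org/saved-astropix-search.json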