.. _pipeline:

==============================================
The toasty Image-Processing Pipeline Framework
==============================================

The toasty_ package provides a series of “pipeline” commands that automate the
process of tiling a collection of images and publishing the resulting data
files in a format easily usable by the `AAS WorldWide Telescope`_.

.. _toasty: https://toasty.readthedocs.io/
.. _AAS WorldWide Telescope: http://worldwidetelescope.org/


Overview
========

The toasty_ image-processing pipeline is designed to run automatically and
incrementally. The pipeline connects two nodes:

- A “source” of new images to process
- A “destination” where processed images get published

Pipeline operations follow this general scheme:

1. :ref:`Initialize a workspace `, which gets associated with a source and a
   destination.
2. :ref:`Query the source ` for new images.
3. Choose which images to process and :ref:`fetch ` them.
4. :ref:`Process the source images ` into WWT’s formats.
5. Check the results and :ref:`approve ` images that are ready for
   publication.
6. :ref:`Publish ` the approved images.

Each pipeline stage is implemented as a subcommand of the ``toasty``
command-line program.


Configuration
=============

The root of the *destination* data repository should contain a configuration
file named ``toasty-pipeline-config.yaml``. Once a pipeline workflow is set up,
you shouldn’t need to worry about this file. To get a new pipeline going,
however, you must create it and place it in your data destination.

This file contains structured data in the `YAML `_ format. For example:

.. code-block:: YAML

   source_type: djangoplicity
   publish_url_prefix: http://data1.wwtassets.org/feeds/noirlab/

   djangoplicity:
     base_url: https://noirlab.edu/public/images/
     channel_name: noirlab

The top-level settings are:

- ``source_type`` specifies the kind of system from which images are drawn.
  The supported values are documented below.
- ``publish_url_prefix`` specifies the URL prefix below which the data produced
  by the pipeline will ultimately become publicly accessible. This setting is
  required so that the pipeline can write the correct data-access URLs into the
  WTML files that it generates. For maximum cross-compatibility, the URLs should
  begin with an ``http://`` prefix.


Djangoplicity Data Source
-------------------------

Currently, the only functional ``source_type`` is ``djangoplicity``, which
downloads and parses an imagery feed from a website powered by the
`Djangoplicity `_ gallery system. An example is the `ESA Hubble gallery `_.

When using the ``djangoplicity`` data source, the
``toasty-pipeline-config.yaml`` file should contain a dictionary named
``djangoplicity``, as in the example above. This dictionary should contain the
following keys:

- ``base_url``, giving the root URL of the gallery. The URL
  ``{base_url}/archive/search`` should lead to a search page.
- ``channel_name``, giving the name of the “channel” with which the output
  metadata will be annotated, analogous to a YouTube channel. This should be a
  brief, lowercase, URL-friendly name.

The following keys are optional:

- ``search_page_name`` specifies the text used to construct search pagination
  URLs at the data source. The default is ``page``, which means that search
  result page URLs look like ``{base_url}/archive/search/page/1/``. As of late
  2020, for ``eso.org`` and potentially other sites, the correct setting is
  ``list``, because their search page URLs look like
  ``{base_url}/archive/search/list/1/``.
- ``force_insecure_tls`` should take a boolean value. If true, TLS-encrypted
  connections to the data source are not verified. As of late 2020, this is
  necessary for ``noirlab.edu``.


Astropix Data Source
--------------------

The ``astropix`` data source downloads and parses an imagery feed from the
`AstroPix `_ service. **It needs updating and won’t work right now.**


When using the ``astropix`` data source, the ``toasty-pipeline-config.yaml``
file should contain a dictionary named ``astropix``. This dictionary should
contain one key, ``json_query_url``, which gives the URL of a saved AstroPix
search in the JSON output format. When fetching new data, the pipeline will
download JSON from this URL and parse it to index the available imagery.
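
A configuration file for the ``astropix`` source might therefore be laid out as
in the following sketch. Both URLs are hypothetical placeholders, not real
endpoints: substitute your own ``publish_url_prefix`` and the URL of your own
saved AstroPix search. Because this source currently needs updating, treat the
sketch as an illustration of the expected file layout rather than a tested
configuration.

.. code-block:: YAML

   # Sketch only: both URLs below are hypothetical placeholders.
   source_type: astropix
   publish_url_prefix: http://example.org/feeds/my-pipeline/

   astropix:
     # Hypothetical URL of a saved AstroPix search in JSON output format
     json_query_url: https://example.org/my-saved-astropix-search.json

As with the ``djangoplicity`` example, the source-specific settings live in a
dictionary named after the ``source_type``, alongside the top-level settings.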