The toasty Image-Processing Pipeline Framework

The toasty package provides a series of “pipeline” commands that automate the process of tiling a collection of images and publishing the resulting data files in a format easily usable by the AAS WorldWide Telescope.

Overview

The toasty image-processing pipeline is designed to run automatically and incrementally. The pipeline connects two nodes:

  • A “source” of new images to process

  • A “destination” where processed images get published

Pipeline operations follow this general scheme:

  1. Initialize a workspace, which gets associated with a source and a destination.

  2. Query the source for new images.

  3. Choose which images to process and fetch them.

  4. Process the source images into WWT’s formats.

  5. Check the results and approve images that are ready for publication.

  6. Publish the approved images.

Each pipeline stage is implemented as a subcommand of the toasty command-line program.

Configuration

The root of the destination data repository should contain a configuration file named toasty-pipeline-config.yaml. Once a pipeline workflow is set up, you shouldn’t need to touch this file, but to start a new pipeline you must create it and place it in the root of your data destination.

This file contains structured data in the YAML format. An example is:

source_type: djangoplicity
publish_url_prefix: http://data1.wwtassets.org/feeds/noirlab/

djangoplicity:
  base_url: https://noirlab.edu/public/images/
  channel_name: noirlab

The top-level settings are:

  • source_type specifies the kind of system from which images are drawn. Values are documented below.

  • publish_url_prefix specifies the URL prefix below which the data produced by the pipeline will ultimately become publicly accessible. This setting needs to be provided in order to write the correct data access URLs into the WTML files generated by the pipeline. For maximum cross-compatibility the URLs should begin with an http:// prefix.
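As a rough illustration of how these two settings could be sanity-checked, the sketch below works on a configuration that has already been parsed from YAML into a Python dictionary. The helper name check_toplevel_settings and the KNOWN_SOURCE_TYPES set are assumptions made for this example and are not part of toasty itself; the http:// check simply mirrors the cross-compatibility recommendation above.

```python
# Hypothetical helper, not part of toasty: sanity-check the top-level settings
# of an already-parsed toasty-pipeline-config.yaml.

# Assumption: only the source types named in this document are considered.
KNOWN_SOURCE_TYPES = {"djangoplicity", "astropix"}


def check_toplevel_settings(config):
    """Return a list of human-readable problems found in `config` (a dict)."""
    problems = []

    source_type = config.get("source_type")
    if source_type not in KNOWN_SOURCE_TYPES:
        problems.append(f"unrecognized source_type: {source_type!r}")
    elif source_type not in config:
        # Each source type keeps its own settings in a dictionary of the same name.
        problems.append(f"missing {source_type!r} settings dictionary")

    prefix = config.get("publish_url_prefix")
    if not prefix:
        problems.append("publish_url_prefix is not set")
    elif not prefix.startswith("http://"):
        problems.append("publish_url_prefix should begin with http://")

    return problems


# The example configuration from above, written out as a dictionary.
example = {
    "source_type": "djangoplicity",
    "publish_url_prefix": "http://data1.wwtassets.org/feeds/noirlab/",
    "djangoplicity": {
        "base_url": "https://noirlab.edu/public/images/",
        "channel_name": "noirlab",
    },
}
print(check_toplevel_settings(example))  # prints []
```

An empty list means that no problems were found; a real pipeline may well apply different or stricter checks.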

Djangoplicity Data Source

Currently, the only functional source_type is djangoplicity, which downloads and parses an imagery feed from a website powered by the Djangoplicity gallery system. An example is the ESA Hubble gallery.

When using the djangoplicity data source, the toasty-pipeline-config.yaml file should contain a dictionary named djangoplicity as in the example above. This dictionary should contain the following keys:

  • base_url, giving the root URL of the gallery. The URL {base_url}/archive/search should get you to a search page.

  • channel_name, giving the name of the “channel” that the output metadata will be annotated with, analogous to a YouTube channel. This should be a brief, lowercase, URL-friendly name.

The following keys are optional:

  • search_page_name specifies the text used to construct search pagination URLs at the data source. The default is page, which means that search page result URLs look like {base_url}/archive/search/page/1/. As of late 2020, for eso.org, and potentially other sites, the correct setting is list, because the search page URLs look like {base_url}/archive/search/list/1/.

  • force_insecure_tls should be a boolean value. If true, the certificates of TLS-encrypted connections to the data source are not verified. As of late 2020 this is necessary for noirlab.edu.
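To make the effect of these keys concrete, here is a minimal sketch of how the search pagination URLs might be assembled and how force_insecure_tls could translate into a TLS setup using Python's standard ssl module. The function names search_page_url and tls_context are hypothetical, the example.org gallery is a placeholder, and treating a missing force_insecure_tls as false is an assumption; toasty's own implementation may differ.

```python
import ssl


def search_page_url(settings, page_number):
    """Build a search-results URL from a `djangoplicity` settings dictionary."""
    # Strip any trailing slash so the pieces join without a doubled "/".
    base_url = settings["base_url"].rstrip("/")
    # "page" is the documented default for search_page_name.
    page_name = settings.get("search_page_name", "page")
    return f"{base_url}/archive/search/{page_name}/{page_number}/"


def tls_context(settings):
    """Return an SSL context honoring the optional force_insecure_tls key."""
    context = ssl.create_default_context()
    # Assumption: when the key is absent, certificates are verified as usual.
    if settings.get("force_insecure_tls", False):
        # check_hostname must be disabled before verification can be turned off.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


noirlab = {
    "base_url": "https://noirlab.edu/public/images/",
    "channel_name": "noirlab",
    "force_insecure_tls": True,
}
print(search_page_url(noirlab, 1))
# prints https://noirlab.edu/public/images/archive/search/page/1/

# Placeholder gallery whose search pages use "list" instead of "page".
other = {
    "base_url": "https://example.org/public/images/",
    "channel_name": "example",
    "search_page_name": "list",
}
print(search_page_url(other, 2))
# prints https://example.org/public/images/archive/search/list/2/
```

Disabling certificate verification removes protection against impersonated servers, so it is best reserved for sources that cannot be reached any other way.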

Astropix Data Source

The astropix data source downloads and parses an imagery feed from the AstroPix service. This source needs updating and does not currently work.

When using the astropix data source, the toasty-pipeline-config.yaml file should contain a dictionary named astropix. This dictionary should contain one key, json_query_url. This key should give the URL of a saved AstroPix search in the JSON output format. When fetching new data, the pipeline will download JSON from this URL and parse it to index the available imagery.