The toasty Image-Processing Pipeline Framework
The toasty package provides a series of “pipeline” commands that automate the process of tiling a collection of images and publishing the resulting data files in a format easily usable by the AAS WorldWide Telescope.
Overview
The toasty image-processing pipeline is designed to run automatically and incrementally. The pipeline connects two nodes:
A “source” of new images to process
A “destination” where processed images get published
Pipeline operations follow this general scheme:
Initialize a workspace, which gets associated with a source and a destination.
Query the source for new images.
Choose which images to process and fetch them.
Process the source images into WWT’s formats.
Check the results and approve images that are ready for publication.
Publish the approved images.
Each pipeline stage is implemented as a subcommand of the toasty
command-line program.
Configuration
The root of the destination data repository should contain a configuration
file named toasty-pipeline-config.yaml. Once a pipeline workflow is set up,
you shouldn’t need to worry about this file. But to get a new pipeline going,
you need to create it and then place it in your data destination.
This file contains structured data in the YAML format. An example is:
source_type: djangoplicity
publish_url_prefix: http://data1.wwtassets.org/feeds/noirlab/
djangoplicity:
  base_url: https://noirlab.edu/public/images/
  channel_name: noirlab
The toplevel settings are:
source_type specifies the kind of system from which images are drawn. Values are documented below.
publish_url_prefix specifies the URL prefix below which the data produced by the pipeline will ultimately become publicly accessible. This setting needs to be provided in order to write the correct data access URLs into the WTML files generated by the pipeline. For maximum cross-compatibility the URLs should begin with an http:// prefix.
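As a rough illustration of the role of publish_url_prefix, the sketch below joins the prefix with a path relative to the data root to form a public URL. The helper name public_url and the relative path are hypothetical and chosen only for illustration; they are not part of toasty.

```python
def public_url(publish_url_prefix: str, relative_path: str) -> str:
    """Join the configured prefix with a path relative to the data root.

    Hypothetical helper, not a toasty API.
    """
    # Normalize slashes so exactly one separates the prefix and the path.
    return publish_url_prefix.rstrip("/") + "/" + relative_path.lstrip("/")


# The prefix comes from the example configuration; the path is made up.
prefix = "http://data1.wwtassets.org/feeds/noirlab/"
print(public_url(prefix, "some-image/index.wtml"))
```

Whatever the actual joining logic is, the point is that the prefix ends up inside the published files, so it has to match the location where the data will really be served.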
Djangoplicity Data Source
Currently, the only functional source_type is djangoplicity, which
downloads and parses an imagery feed from a website powered by the
Djangoplicity gallery system. An example is the ESA Hubble gallery.
When using the djangoplicity data source, the toasty-pipeline-config.yaml
file should contain a dictionary named djangoplicity as in the example above.
This dictionary should contain the following keys:
base_url, giving the root URL of the gallery. The URL {base_url}/archive/search should get you to a search page.
channel_name, giving the name of the “channel” that the output metadata will be annotated with, analogous to a YouTube channel. This should be a brief, lowercase, URL-friendly name.
The following keys are optional:
search_page_name specifies the text used to construct search pagination URLs at the data source. The default is page, which means that search page result URLs look like {base_url}/archive/search/page/1/. As of late 2020, for eso.org, and potentially other sites, the correct setting is list, because the search page URLs look like {base_url}/archive/search/list/1/.
force_insecure_tls should take on a boolean value. If true, it specifies that TLS-encrypted connections shouldn’t be verified. As of late 2020 this is necessary for noirlab.edu.
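The effect of search_page_name on pagination URLs can be sketched as follows. This is a minimal illustration of the URL pattern described above, not toasty's actual code: the function name search_page_url is hypothetical, the example.org gallery is a placeholder, and stripping a trailing slash from base_url is an assumption made here to avoid a doubled slash.

```python
def search_page_url(base_url: str, page_number: int,
                    search_page_name: str = "page") -> str:
    """Build a search pagination URL for a Djangoplicity-style gallery.

    Hypothetical helper illustrating the documented URL pattern.
    """
    # Assumption: tolerate a trailing slash on base_url.
    base = base_url.rstrip("/")
    return f"{base}/archive/search/{search_page_name}/{page_number}/"


# Default setting ("page"), using the base_url from the example config.
print(search_page_url("https://noirlab.edu/public/images/", 1))

# A site that needs search_page_name set to "list" (placeholder URL).
print(search_page_url("https://example.org/public/images/", 1, "list"))
```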
Astropix Data Source
The astropix data source downloads and parses an imagery feed from the
AstroPix service. It currently needs updating and does not work.
When using the astropix data source, the toasty-pipeline-config.yaml
file should contain a dictionary named astropix. This dictionary should
contain one key, json_query_url. This key should give the URL of a saved
AstroPix search in the JSON output format. When fetching new data, the pipeline
will download JSON from this URL and parse it to index the available imagery.
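The configuration rules described above can be summarized in a small validation sketch. It assumes the YAML file has already been parsed into a Python dictionary (for instance with a YAML library), and the function name validate_config is hypothetical. It only checks that the documented keys are present; it does not reproduce how toasty itself reads the file.

```python
# Required keys for each documented source_type.
REQUIRED_SOURCE_KEYS = {
    "djangoplicity": ("base_url", "channel_name"),
    "astropix": ("json_query_url",),
}


def validate_config(cfg: dict) -> dict:
    """Check a parsed pipeline config and return its source dictionary.

    Hypothetical helper, not a toasty API.
    """
    for key in ("source_type", "publish_url_prefix"):
        if key not in cfg:
            raise ValueError(f"missing top-level setting: {key}")

    source_type = cfg["source_type"]
    if source_type not in REQUIRED_SOURCE_KEYS:
        raise ValueError(f"unknown source_type: {source_type}")

    # The source-specific settings live in a dictionary named after the type.
    section = cfg.get(source_type)
    if not isinstance(section, dict):
        raise ValueError(f"missing dictionary named {source_type}")

    for key in REQUIRED_SOURCE_KEYS[source_type]:
        if key not in section:
            raise ValueError(f"missing key: {source_type}.{key}")
    return section


# The example configuration from this page, as a parsed dictionary.
example = {
    "source_type": "djangoplicity",
    "publish_url_prefix": "http://data1.wwtassets.org/feeds/noirlab/",
    "djangoplicity": {
        "base_url": "https://noirlab.edu/public/images/",
        "channel_name": "noirlab",
    },
}
print(validate_config(example)["channel_name"])
```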