Getting started

Overview

The CLAMS project has many moving parts that make various computational analysis tools talk to each other to create customized workflows. However, the most important part of the project is the apps published for the CLAMS platform. The CLAMS Python SDK helps app developers handle the MMIF data format with high-level Python classes and methods, and publish their code as a CLAMS app that can be easily deployed to the site via CLAMS workflow engines, such as the CLAMS appliance.

A CLAMS app can be any software that performs automated content analysis on text, audio, and/or image/video data streams, while using MMIF as its I/O format. When deployed into a CLAMS workflow engine, an app needs to run as a webapp wrapped in a container. In this documentation, we explain which Python APIs and HTTP APIs an app must implement.

Prerequisites

  • Python: the latest clams-python requires Python 3.8 or newer. We have no plan to support Python 2.7.

  • Containerization software: when deployed to a CLAMS workflow engine, an app must be containerized. Developers can choose any containerization software for doing so, but the clams-python SDK is developed with Docker in mind.

Installation

The clams-python distribution package is available on PyPI. You can use pip to install the latest version.

pip install clams-python

Note that installing clams-python will also install the mmif-python PyPI package, a companion Python library for the MMIF data format. More information about the MMIF specification can be found on the MMIF website. Documentation on mmif-python is available at its Python API documentation website.
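As a quick sanity check after installation, the small helper below reports which version of each distribution pip installed. It is our own stdlib-only sketch built on importlib.metadata (available from Python 3.8), not part of clams-python or mmif-python.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return the installed version of each distribution, or None if it is missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Installing clams-python should also have pulled in mmif-python.
    for pkg, ver in installed_versions(["clams-python", "mmif-python"]).items():
        print(f"{pkg}: {ver or 'not installed'}")
```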

Quick Start with the App Template

clams-python comes with a cookiecutter template for creating a CLAMS app. You can use it to create a new app project.

clams develop --help

The newly created project will have a README.md file that explains how to develop and deploy the app. Here we explain the basic structure of the project. Developing a CLAMS app involves three (or four, depending on the underlying analyzer) major steps.

  1. (Write the computational analysis code, or use existing code in Python.)

  2. Make the analyzer speak MMIF by wrapping it with clams.app.ClamsApp.

  3. Turn the app into a web app by wrapping it with clams.restify.Restifier.

  4. Containerize the app by writing a Containerfile or Dockerfile.

CLAMS App API

A CLAMS Python app is a Python class that implements and exposes two core methods: annotate() and appmetadata(). In essence, these methods (discussed further below) are wrappers around _annotate() and _appmetadata(), and they provide some common functionality, such as making sure the output is serialized into a string.

  • annotate(): Takes MMIF as input, processes it, and returns the result as a serialized MMIF str.

  • appmetadata(): Returns a JSON-formatted str that contains metadata about the app.
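To make this division of labor concrete, here is a deliberately simplified, stdlib-only model of the public/private method pattern. ToyClamsApp and EchoApp are hypothetical names, and MMIF is handled as plain JSON dicts; the real clams.app.ClamsApp works with mmif-python objects and does more, so treat this as an illustration rather than the SDK's implementation.

```python
import json
from abc import ABC, abstractmethod
from typing import Union


class ToyClamsApp(ABC):
    """Hypothetical, simplified stand-in for clams.app.ClamsApp.

    The public methods take care of serialization; the private
    methods hold the app-specific logic that a developer writes.
    """

    def appmetadata(self) -> str:
        # Public method: always returns a JSON-formatted str.
        return json.dumps(self._appmetadata())

    def annotate(self, mmif: Union[str, dict], **runtime_params) -> str:
        # Accept either a JSON string or an already-parsed dict.
        mmif_obj = json.loads(mmif) if isinstance(mmif, str) else mmif
        annotated = self._annotate(mmif_obj, **runtime_params)
        # Public method: always returns a serialized str.
        return json.dumps(annotated)

    @abstractmethod
    def _appmetadata(self) -> dict:
        """Return the app metadata as a plain dict."""

    @abstractmethod
    def _annotate(self, mmif: dict, **runtime_params) -> dict:
        """Return the input MMIF (as a dict) with new views added."""


class EchoApp(ToyClamsApp):
    # A do-nothing app: returns its input unchanged.
    def _appmetadata(self) -> dict:
        return {"name": "Echo App"}

    def _annotate(self, mmif: dict, **runtime_params) -> dict:
        return mmif


if __name__ == "__main__":
    app = EchoApp()
    print(app.appmetadata())  # {"name": "Echo App"}
    print(app.annotate('{"documents": [], "views": []}'))
```

Because the serialization lives in the base class, a subclass only ever deals with parsed data, which is the same reason ClamsApp asks you to implement _annotate() rather than annotate().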

A good way to start writing a CLAMS app is to inherit from clams.app.ClamsApp. Let’s discuss the two methods in detail in the context of inheriting from that class.

annotate()

The annotate() method is the core method of a CLAMS app. It takes a MMIF JSON string as the main input, along with optional keyword arguments for runtime configuration, analyzes the MMIF input, and returns the analysis results as a serialized MMIF str. When you inherit from ClamsApp, you need to implement

  • _annotate() instead of annotate() (read the docstrings as they contain important information about the app implementation).

  • at a high level, _annotate() is mostly concerned with

    • finding processable documents and relevant annotations from previous views,

    • creating new views,

    • and calling the code that runs over the documents and inserts the results into the new views.
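The three responsibilities listed above can be sketched on plain dicts shaped loosely like MMIF JSON (top-level "documents" and "views", text stored under properties.text["@value"]). This is a hypothetical, stdlib-only illustration: the short type strings, the id scheme, and the whitespace tokenizer are all simplifications we chose, and in a real app you would use mmif-python's Mmif, Document, and View classes instead.

```python
import re
from typing import Dict, List

# Simplified type identifiers; real MMIF uses full, versioned vocabulary URIs.
TEXT_DOCUMENT = "TextDocument"
TOKEN = "Token"


def find_text_documents(mmif: Dict) -> List[Dict]:
    """Step 1: find processable documents in the input."""
    return [d for d in mmif.get("documents", []) if TEXT_DOCUMENT in d["@type"]]


def new_view(mmif: Dict, app_id: str) -> Dict:
    """Step 2: create a new, empty view and append it to the MMIF."""
    views = mmif.setdefault("views", [])
    view = {
        "id": f"v{len(views) + 1}",
        "metadata": {"app": app_id, "contains": {TOKEN: {}}},
        "annotations": [],
    }
    views.append(view)
    return view


def tokenize_into(view: Dict, doc: Dict) -> None:
    """Step 3: run the analyzer (a whitespace tokenizer) and store results in the view."""
    text = doc["properties"]["text"]["@value"]
    for match in re.finditer(r"\S+", text):
        view["annotations"].append({
            "@type": TOKEN,
            "properties": {
                "id": f"t{len(view['annotations']) + 1}",
                "document": doc["properties"]["id"],
                "start": match.start(),
                "end": match.end(),
            },
        })


def annotate_sketch(mmif: Dict, app_id: str = "hypothetical-tokenizer") -> Dict:
    """Hypothetical stand-in for an _annotate() body; mutates and returns the input."""
    view = new_view(mmif, app_id)
    for doc in find_text_documents(mmif):
        tokenize_into(view, doc)
    return mmif
```

For a single document whose text is "Hello CLAMS world", this adds one view holding three token annotations with character offsets (0, 5), (6, 11), and (12, 17).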

As a developer, you can expose different behaviors of the annotate() method by providing configurable parameters as keyword arguments of the method. For example, you can let users specify the resampling rate of an audio file to be analyzed by providing a resample_rate parameter.

Note

These runtime configurations are not part of the MMIF input, but for reproducible analysis, you should record these configurations in the output MMIF.

Note

There are universal parameters defined at the SDK-level that all CLAMS apps commonly use. See clams.app.ClamsApp.universal_parameters.

Warning

All the runtime configurations should be pre-announced in the app metadata.
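The notes above (declare every parameter up front, accept it as a keyword argument, record it in the output) fit together roughly as follows. This is a hypothetical, stdlib-only sketch: DECLARED_PARAMETERS stands in for the parameter section of the app metadata, the "parameters" key in the view metadata is our assumption, and rejecting undeclared names is a design choice of this sketch, not documented SDK behavior. The real SDK also handles the universal parameters, which this sketch ignores.

```python
from typing import Any, Dict

# Hypothetical parameter declarations, standing in for the app metadata.
DECLARED_PARAMETERS: Dict[str, Dict[str, Any]] = {
    "resample_rate": {"type": "integer", "default": 16000},
}


def resolve_parameters(**runtime_params: Any) -> Dict[str, Any]:
    """Fill in declared defaults and reject parameters the metadata does not announce."""
    unknown = set(runtime_params) - set(DECLARED_PARAMETERS)
    if unknown:
        raise ValueError(f"undeclared runtime parameter(s): {sorted(unknown)}")
    resolved = {name: spec["default"] for name, spec in DECLARED_PARAMETERS.items()}
    resolved.update(runtime_params)
    return resolved


def new_view_metadata(app_id: str, **runtime_params: Any) -> Dict[str, Any]:
    """Build view metadata that records the effective configuration for reproducibility."""
    return {"app": app_id, "parameters": resolve_parameters(**runtime_params)}


if __name__ == "__main__":
    print(new_view_metadata("hypothetical-resampler", resample_rate=8000))
```

Recording the resolved values (defaults included), rather than only what the user passed, means the output MMIF fully describes the run even if the app's defaults change later.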

Also see <Tutorial: Wrapping an NLP Application> for a step-by-step tutorial on how to write the _annotate() method with a simple example NLP tool.

appmetadata()

App metadata is a map where important information about the app itself is stored as key-value pairs. Accordingly, the appmetadata() method should not perform any analysis on the input MMIF. In fact, it shouldn’t take any input at all.

When using clams.app.ClamsApp, you have several options for supplying the metadata. See _load_appmetadata() for the options, and <CLAMS App Metadata> for the metadata specification.
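As a rough picture of what such a key-value map contains, here is a hypothetical metadata record for an imaginary tokenizer app, served by a function that takes no input and performs no analysis. The field names loosely follow the CLAMS App Metadata specification and all values, including the type URIs and the example.org identifier, are illustrative assumptions; the specification is the authoritative schema.

```python
import json
from typing import Any, Dict

# Hypothetical metadata for an imaginary whitespace tokenizer app.
# Field names loosely follow the CLAMS App Metadata spec; values are made up.
APP_METADATA: Dict[str, Any] = {
    "name": "Whitespace Tokenizer",
    "description": "Splits text documents into tokens at whitespace.",
    "app_version": "v1.0",
    "app_license": "Apache-2.0",
    "identifier": "https://example.org/apps/whitespace-tokenizer",
    "input": [{"@type": "http://mmif.clams.ai/vocabulary/TextDocument/v1"}],
    "output": [{"@type": "http://vocab.lappsgrid.org/Token"}],
    "parameters": [
        {
            "name": "lowercase",
            "type": "boolean",
            "default": False,
            "description": "Lowercase tokens before output.",
        },
    ],
}


def appmetadata() -> str:
    """Takes no input and performs no analysis: just serializes the static map."""
    return json.dumps(APP_METADATA, indent=2)


if __name__ == "__main__":
    print(appmetadata())
```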

Note

In the future, the app metadata will be used for automatic generation of CLAMS App Directory.

HTTP webapp

To be integrated into CLAMS workflow engines, a CLAMS app needs to serve as a webapp. Once your application class is ready, you can use clams.restify.Restifier to wrap your app as a Flask-based web application.

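from clams.app import ClamsApp
from clams.restify import Restifier

class AnApp(ClamsApp):
    """Implements an app that does this and that."""

    def _appmetadata(self):
        # Leave this as a do-nothing method when the metadata comes from another
        # source (see the appmetadata() section); otherwise build it here.
        pass

    def _annotate(self, mmif, **runtime_params):
        # Placeholder: find documents, create new views, run the analysis here.
        return mmif

if __name__ == "__main__":
    app = AnApp()
    webapp = Restifier(app)
    webapp.run()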

When you run the above code, Python will start a web server and host your CLAMS app. By default the server listens on 0.0.0.0:5000, but you can adjust the hostname and port number. In this webapp, appmetadata and annotate are mapped to GET and POST requests to the root route, respectively. Hence, for example, you can POST a MMIF file to the webapp and get a response with the annotated MMIF string in the body.

Note

With the HTTP interface, users can pass runtime configurations as URL query strings. Since query string values are always strings, Restifier tries to convert them to the types specified in the app metadata, using clams.restify.ParameterCaster.
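The idea behind that type conversion can be illustrated with the standard library. The sketch below is not ParameterCaster's implementation: the type names in CASTERS, the PARAM_TYPES declarations, the accepted boolean spellings, and the "last value wins" rule are all assumptions made for this example.

```python
from typing import Any, Callable, Dict, List
from urllib.parse import parse_qs


def to_bool(value: str) -> bool:
    """Interpret common textual spellings of a boolean."""
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise ValueError(f"not a boolean: {value!r}")


# Hypothetical mapping from metadata type names to converters.
CASTERS: Dict[str, Callable[[str], Any]] = {
    "integer": int,
    "number": float,
    "boolean": to_bool,
    "string": str,
}

# Hypothetical declared parameter types, as they might appear in app metadata.
PARAM_TYPES: Dict[str, str] = {"resample_rate": "integer", "pretty": "boolean"}


def cast_query(query: str) -> Dict[str, Any]:
    """Turn a raw query string into typed values; undeclared names stay strings."""
    parsed: Dict[str, List[str]] = parse_qs(query)
    casted: Dict[str, Any] = {}
    for name, values in parsed.items():
        caster = CASTERS[PARAM_TYPES.get(name, "string")]
        # Keep only the last value; multivalued parameters are out of scope here.
        casted[name] = caster(values[-1])
    return casted


if __name__ == "__main__":
    print(cast_query("resample_rate=8000&pretty=True"))
```

Note that a plain bool("False") would be True in Python, which is why booleans need an explicit converter instead of the built-in constructor.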

In the AnApp example above, clams.restify.Restifier.run() starts the webapp in debug mode on a Werkzeug server, which is not suitable for production use. For a more robust server that can handle multiple requests concurrently, you might want to use a production-ready HTTP server. In such a case you can use serve_production(), which spins up a multi-worker Gunicorn server. If that does not suit you (for example, because Gunicorn does not support Windows), you can write your own HTTP wrapper. At the end of the day, all you need is a webapp that maps appmetadata and annotate to GET and POST requests.
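To show how small such a custom wrapper can be, here is a stdlib-only sketch using http.server. EchoApp is a hypothetical stand-in exposing the two core methods, and this wrapper deliberately ignores query-string runtime parameters and error handling, so it is an illustration of the GET/POST mapping rather than a replacement for Restifier.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EchoApp:
    """Hypothetical app object exposing the two core methods."""

    def appmetadata(self) -> str:
        return json.dumps({"name": "Echo App"})

    def annotate(self, mmif: str, **runtime_params) -> str:
        return mmif  # a real app would add new views here


def make_handler(app):
    class ClamsHandler(BaseHTTPRequestHandler):
        def _reply(self, status: int, body: str) -> None:
            payload = body.encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def _is_root(self) -> bool:
            # Ignore any query string when matching the root route.
            return self.path.split("?")[0] == "/"

        def do_GET(self):
            # GET on the root route -> appmetadata()
            if not self._is_root():
                return self._reply(404, json.dumps({"error": "not found"}))
            self._reply(200, app.appmetadata())

        def do_POST(self):
            # POST on the root route -> annotate(), with the MMIF in the request body
            if not self._is_root():
                return self._reply(404, json.dumps({"error": "not found"}))
            length = int(self.headers.get("Content-Length", 0))
            mmif = self.rfile.read(length).decode("utf-8")
            self._reply(200, app.annotate(mmif))

        def log_message(self, format, *args):
            pass  # keep the console quiet in this sketch

    return ClamsHandler


if __name__ == "__main__":
    # ThreadingHTTPServer handles each request in its own thread.
    ThreadingHTTPServer(("0.0.0.0", 5000), make_handler(EchoApp())).serve_forever()
```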

To test the behavior of the application in a Flask server, you should run the app as a webapp in a terminal (shell) session:

$ python app.py --develop --port 5000
# default port number is 5000

And poke at it from a new shell session:

# in a new terminal session
$ curl http://localhost:5000/
$ curl -H "Accept: application/json" -X POST -d@input/example-1.mmif "http://localhost:5000/?pretty=True"

The first command prints the metadata, and the second prints the output MMIF. Appending ?pretty=True to the URL results in pretty-printed output. Note that with the --develop option we started a Flask development server. Without this option, a production server will be started. For more information about the input file format (the contents of input/example-1.mmif), please refer to the user manual.
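If you prefer to poke at the app from Python instead of curl, the two commands translate roughly into the urllib requests below. This is our own stdlib sketch; BASE_URL assumes the app is running locally on the default port 5000, and the request builders are separated from sending so they can be inspected without a server.

```python
import urllib.request
from pathlib import Path

BASE_URL = "http://localhost:5000/"  # assumes the app runs locally on port 5000


def metadata_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Rough equivalent of `curl http://localhost:5000/`."""
    return urllib.request.Request(base_url, method="GET")


def annotate_request(mmif_bytes: bytes, base_url: str = BASE_URL,
                     pretty: bool = True) -> urllib.request.Request:
    """Rough equivalent of the curl POST command."""
    url = base_url + ("?pretty=True" if pretty else "")
    req = urllib.request.Request(url, data=mmif_bytes, method="POST")
    req.add_header("Accept", "application/json")
    return req


if __name__ == "__main__":
    with urllib.request.urlopen(metadata_request()) as resp:
        print(resp.read().decode("utf-8"))
    mmif = Path("input/example-1.mmif").read_bytes()
    with urllib.request.urlopen(annotate_request(mmif)) as resp:
        print(resp.read().decode("utf-8"))
```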

Containerization

In addition to the HTTP service, a CLAMS app is expected to be containerized for seamless deployment to CLAMS workflow engines. Independently of CLAMS platform compatibility, containerizing your app is also recommended, especially when your app processes video streams and depends on complicated system-level video processing libraries (e.g. OpenCV, FFmpeg).

When you start developing an app with the clams develop command, it will create a Containerfile with some instructions as inline comments (you can always start from scratch with any containerization tool you like).

Note

If you are part of the CLAMS team and want to publish your app to the https://github.com/clamsproject organization, the clams develop command will also create GitHub Actions files that automatically build and push an app image to the organization’s container registry. For the actions to work, you must use the name Containerfile instead of Dockerfile.

If you are not familiar with Containerfile or Dockerfile, refer to the official documentation to learn how to write one. To integrate with CLAMS workflow engines, a containerized CLAMS app must automatically start itself as a webapp when instantiated as a container, and listen on port 5000.

We maintain a public GitHub Container Registry, where we publish Debian-based base images to help developers write Containerfiles and save the build time spent installing common libraries. At the moment, we provide various base images with Python 3.8, clams-python, and commonly used video and audio processing libraries installed.

Once you have finished writing your Containerfile, you can build and test the containerized app locally. If you are not familiar with building and running container images, the official Docker documentation on building images will be helpful.
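For a local build-and-run loop, the Docker CLI calls can be scripted as below. This is a hypothetical helper of our own: the image tag my-clams-app:dev is made up, and it only assumes the standard docker build flags -t and -f and the docker run flags --rm and -p. The run command maps a host port to container port 5000, where a CLAMS app container is expected to listen.

```python
import shlex
import subprocess
from typing import List

IMAGE = "my-clams-app:dev"  # hypothetical image tag


def build_cmd(image: str = IMAGE, containerfile: str = "Containerfile",
              context: str = ".") -> List[str]:
    # docker build -t <tag> -f <file> <context>
    return ["docker", "build", "-t", image, "-f", containerfile, context]


def run_cmd(image: str = IMAGE, host_port: int = 5000) -> List[str]:
    # Publish the container's port 5000 on the given host port; --rm cleans up on exit.
    return ["docker", "run", "--rm", "-p", f"{host_port}:5000", image]


if __name__ == "__main__":
    for cmd in (build_cmd(), run_cmd()):
        print("$", shlex.join(cmd))
    subprocess.run(build_cmd(), check=True)
    # Blocks while the container runs; stop it with Ctrl+C.
    subprocess.run(run_cmd(), check=True)
```

While the container is running, the same curl commands used against the development server should work against http://localhost:5000/.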