The CLAMS project has many moving parts that make various computational analysis tools talk to each other to create customized workflows. However, the most important part of the project is the apps published for the CLAMS platform. The CLAMS Python SDK helps app developers handle the MMIF data format with high-level Python classes and methods, and publish their code as a CLAMS app that can be easily deployed to the site via CLAMS workflow engines, such as the CLAMS appliance.
A CLAMS app can be any software that performs automated content analysis on text, audio, and/or image/video data streams, while using MMIF as its I/O format. When deployed into a CLAMS workflow engine, an app needs to run as a webapp wrapped in a container. In this documentation, we will explain which Python APIs and HTTP APIs an app must implement.
Containerization software: when deployed to a CLAMS workflow engine, an app must be containerized. Developers can choose any containerization software for doing so, but
clams-python SDK is developed with Docker in mind.
clams-python distribution package is available at PyPI. You can use
pip to install the latest version.
pip install clams-python
Note that installing
clams-python will also install the
mmif-python PyPI package, which is a companion Python library for the MMIF data format. More information regarding the MMIF specification can be found here. Documentation on
mmif-python is available at the Python API documentation website.
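To confirm that both packages landed in the active environment, a check along the following lines may help. This is a minimal sketch that relies only on the standard library's importlib.metadata; the helper name installed_version is made up for illustration and is not part of the SDK.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names as they appear on PyPI.
    for name in ("clams-python", "mmif-python"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```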
Quick Start with the App Template
clams-python comes with a cookiecutter template for creating a CLAMS app. You can use it to create a new app project.
clams develop --help
The newly created project will have a
README.md file that explains how to develop and deploy the app. Here we will explain the basic structure of the project. Developing a CLAMS app has three (or four depending on the underlying analyzer) major components.
(Write computational analysis code, or use existing code, in Python)
Make the analyzer speak MMIF by wrapping with
Make the app into a web app by wrapping with
Containerize the app by writing a
CLAMS App API
A CLAMS Python app is a Python class that implements and exposes two core methods:
annotate() and appmetadata(). In essence, these methods (discussed further below) are wrappers around
_annotate() and _appmetadata(), and they provide some common functionality such as making sure the output is serialized into a string.
annotate(): Takes a MMIF as input, processes it, and returns serialized MMIF.
appmetadata(): Returns a JSON-formatted
str that contains metadata about the app.
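The relationship between the public methods and their underscore-prefixed counterparts can be pictured with a small stand-in class. The sketch below is not the implementation of clams.app.ClamsApp; SketchApp and EchoApp are hypothetical names, and plain dicts stand in for MMIF objects. It only illustrates the idea that the public methods take care of serialization while the developer-written methods do the actual work.

```python
import json
from abc import ABC, abstractmethod


class SketchApp(ABC):
    """Stand-in illustrating the wrapper pattern; NOT the real ClamsApp."""

    def appmetadata(self) -> str:
        # Public method: takes no input and always returns a JSON string.
        return json.dumps(self._appmetadata())

    def annotate(self, mmif: str, **runtime_params) -> str:
        # Public method: deserialize the input, delegate, serialize the output.
        result = self._annotate(json.loads(mmif), **runtime_params)
        return json.dumps(result)

    @abstractmethod
    def _appmetadata(self) -> dict:
        """Developer-provided: describe the app."""

    @abstractmethod
    def _annotate(self, mmif: dict, **runtime_params) -> dict:
        """Developer-provided: analyze the input and return the result."""


class EchoApp(SketchApp):
    """Hypothetical app that returns its input unchanged."""

    def _appmetadata(self) -> dict:
        return {"name": "Echo App (hypothetical)"}

    def _annotate(self, mmif: dict, **runtime_params) -> dict:
        return mmif  # a real app would add a new view with its results
```

Because the serialization lives in the base class, every subclass returns strings from its public methods without repeating that logic.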
A good way to start writing a CLAMS app is to inherit from
clams.app.ClamsApp. Let’s look at the two methods in detail when inheriting the class.
annotate() method is the core method of a CLAMS app. It takes a MMIF JSON string as the main input, along with other kwargs for runtime configurations, and analyzes the MMIF input, then returns analysis results in a serialized MMIF
When you inherit
ClamsApp, you need to implement
_annotate() rather than annotate() (read the docstrings, as they contain important information about the app implementation).
At a high level,
_annotate() is mostly concerned with
finding processable documents and relevant annotations from previous views,
creating new views,
and calling the code that runs over the documents and inserts the results to the new views.
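The three concerns listed above can be sketched on a plain dictionary. This is an illustration only: it does not use the mmif-python classes, the dictionary layout is a simplified, MMIF-like structure rather than valid MMIF, and the type labels, the app name, and the function name annotate_sketch are all assumptions made for the example.

```python
import copy
import re


def annotate_sketch(mmif: dict) -> dict:
    """Walk through the three concerns on a simplified, MMIF-like dict."""
    out = copy.deepcopy(mmif)  # leave the caller's input untouched

    # 1. Find processable documents (here: anything labeled as a text document).
    text_docs = [d for d in out.get("documents", [])
                 if d.get("@type") == "TextDocument"]

    # 2. Create a new view to hold this app's results.
    views = out.setdefault("views", [])
    view = {
        "id": f"v_{len(views)}",
        "metadata": {"app": "hypothetical-whitespace-tokenizer"},
        "annotations": [],
    }
    views.append(view)

    # 3. Run the analysis over each document and insert results into the view.
    for doc in text_docs:
        doc_id = doc["properties"]["id"]
        text = doc["properties"]["text"]
        for i, match in enumerate(re.finditer(r"\S+", text)):
            view["annotations"].append({
                "@type": "Token",
                "properties": {
                    "id": f"{doc_id}_t{i}",
                    "document": doc_id,
                    "start": match.start(),
                    "end": match.end(),
                },
            })
    return out
```

Documents the app cannot process (a video document, for instance) are simply skipped, and the results land in a fresh view instead of modifying earlier ones.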
As a developer you can expose different behaviors of the
annotate() method by providing configurable parameters as keyword arguments of the method. For example, you can have the user specify a re-sample rate of an audio file to be analyzed by providing a
These runtime configurations are not part of the MMIF input, but for reproducible analysis, you should record these configurations in the output MMIF.
There are universal parameters defined at the SDK-level that all CLAMS apps commonly use. See
All the runtime configurations should be pre-announced in the app metadata.
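One way to picture the two requirements, recording the configuration in the output and announcing parameters beforehand, is the sketch below. It works on a plain dict standing in for a view's metadata; the "parameters" key, the resample_rate parameter name, and the function name record_runtime_params are assumptions made for illustration, not the SDK's API.

```python
def record_runtime_params(view_metadata: dict, declared: set, **runtime_params) -> dict:
    """Store runtime configuration in view metadata, rejecting unannounced names."""
    undeclared = sorted(set(runtime_params) - set(declared))
    if undeclared:
        # A parameter that the app metadata never announced is treated as an error.
        raise ValueError(f"parameters not announced in app metadata: {undeclared}")
    # Keep the values alongside the results so the analysis can be reproduced.
    view_metadata["parameters"] = dict(runtime_params)
    return view_metadata


if __name__ == "__main__":
    announced = {"resample_rate"}  # hypothetical parameter name
    meta = record_runtime_params({"app": "hypothetical-app"}, announced,
                                 resample_rate=16000)
    print(meta)
```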
Also see <Tutorial: Wrapping an NLP Application> for a step-by-step tutorial on how to write the
_annotate() method with a simple example NLP tool.
App metadata is a map where important information about the app itself is stored as key-value pairs. That said, the
appmetadata() method should not perform any analysis on the input MMIF. In fact, it shouldn’t take any input at all.
When inheriting clams.app.ClamsApp, you have different options for implementing the information source of the metadata. See
_load_appmetadata() for the options, and <CLAMS App Metadata> for the metadata specification.
In the future, the app metadata will be used for automatic generation of the CLAMS App Directory.
To be integrated into CLAMS workflow engines, a CLAMS app needs to serve as a webapp. Once your application class is ready, you can use
clams.restify.Restifier to wrap your app as a Flask-based web application.
from clams.app import ClamsApp
from clams.restify import Restifier

class AnApp(ClamsApp):
    # Implements an app that does this and that.
    ...

if __name__ == "__main__":
    app = AnApp()
    webapp = Restifier(app)
    webapp.run()
When running the above code, Python will start a web server and host your CLAMS app. By default the server will listen to
0.0.0.0:5000, but you can adjust the hostname and port number. In this webapp,
appmetadata and annotate will be respectively mapped to
GET and POST to the root route. Hence, for example, you can
POST a MMIF file to the web app and get a response with the annotated MMIF string in the body.
Now with HTTP interface, users can pass runtime configuration as URL query strings. As the values of query string parameters are always strings,
Restifier will try to convert the values to the types specified in the app metadata, using
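The conversion of query-string values into typed values can be sketched with the standard library alone. This is not the code Restifier uses; the type names ("integer", "number", "boolean", "string"), the rule for what counts as a true value, and the function name cast_query are assumptions chosen for illustration.

```python
from urllib.parse import parse_qs


def _to_bool(value: str) -> bool:
    # Assumed convention for truthy strings.
    return value.strip().lower() in {"true", "1", "yes"}


# Assumed type names; the actual metadata specification may differ.
CASTERS = {"integer": int, "number": float, "boolean": _to_bool, "string": str}


def cast_query(query: str, declared_types: dict) -> dict:
    """Cast query-string values using a {parameter name: type name} mapping."""
    casted = {}
    for name, values in parse_qs(query).items():
        raw = values[-1]  # keep the last value if a parameter is repeated
        caster = CASTERS.get(declared_types.get(name, "string"), str)
        casted[name] = caster(raw)
    return casted


if __name__ == "__main__":
    types = {"pretty": "boolean", "resample_rate": "integer"}
    print(cast_query("pretty=True&resample_rate=16000", types))
```

Parameters without a declared type are left as strings in this sketch, which mirrors the fact that everything arrives as a string over HTTP.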
In the above example,
clams.restify.Restifier.run() will start the webapp in debug mode on a Werkzeug server, which is not always suitable for a production server. For a more robust server that can handle multiple requests asynchronously, you might want to use a production-ready HTTP server. In such a case you can use
serve_production(), which will spin up a multi-worker Gunicorn server. If you don’t like it (because, for example, gunicorn does not support Windows OS), you can write your own HTTP wrapper. At the end of the day, all you need is a webapp that maps
appmetadata and annotate to GET and POST requests.
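For readers curious what a hand-written HTTP wrapper might look like, here is a minimal sketch as a plain WSGI callable built with the standard library only. It is not how Restifier is implemented; make_wsgi_app, FakeApp, and call are hypothetical names, and the wrapper assumes only that the wrapped object exposes appmetadata() and annotate() returning strings.

```python
import io
import json


def make_wsgi_app(app):
    """Wrap an object with appmetadata()/annotate() as a WSGI callable."""
    def wsgi_app(environ, start_response):
        method = environ.get("REQUEST_METHOD", "GET")
        if method == "GET":
            body = app.appmetadata()
        elif method == "POST":
            length = int(environ.get("CONTENT_LENGTH") or 0)
            mmif = environ["wsgi.input"].read(length).decode("utf-8")
            body = app.annotate(mmif)
        else:
            start_response("405 Method Not Allowed", [("Allow", "GET, POST")])
            return [b""]
        payload = body.encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json"),
                                  ("Content-Length", str(len(payload)))])
        return [payload]
    return wsgi_app


class FakeApp:
    """Hypothetical stand-in exposing the two public methods."""

    def appmetadata(self) -> str:
        return json.dumps({"name": "fake"})

    def annotate(self, mmif: str) -> str:
        return mmif  # echo the input back


def call(wsgi_app, method: str, body: bytes = b""):
    """Invoke the WSGI callable directly, without opening a socket."""
    captured = {}
    environ = {
        "REQUEST_METHOD": method,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    chunks = wsgi_app(environ, lambda status, headers: captured.update(status=status))
    return captured["status"], b"".join(chunks)
```

A callable like this can be tried locally with wsgiref.simple_server.make_server from the standard library, or handed to any WSGI-compatible server.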
To test the behavior of the application in a Flask server, you should run the app as a webapp in a terminal (shell) session:
$ python app.py --develop --port 5000
# default port number is 5000
And poke at it from a new shell session:
# in a new terminal session
$ curl http://localhost:5000/
$ curl -H "Accept: application/json" -X POST -d@input/example-1.mmif "http://0.0.0.0:5000?pretty=True"
The first command prints the metadata, and the second prints the output MMIF file. Appending
?pretty=True to the URL will result in a pretty printed output. Note that with the
--develop option we started a Flask development server. Without this option, a production server will be started. To get more information about the input file format (the contents of
input/example-1.mmif), please refer to the user manual.
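The same POST request can be issued from Python. The sketch below uses only urllib from the standard library and mirrors the curl invocation shown earlier; the function names are made up for illustration, and actually sending the request assumes the app is running locally on port 5000.

```python
import urllib.request


def build_annotate_request(base_url: str, mmif_text: str,
                           pretty: bool = False) -> urllib.request.Request:
    """Build a POST request carrying MMIF text, like the curl command."""
    url = base_url + ("?pretty=True" if pretty else "")
    return urllib.request.Request(
        url,
        data=mmif_text.encode("utf-8"),
        headers={"Accept": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    with open("input/example-1.mmif", encoding="utf-8") as f:
        req = build_annotate_request("http://localhost:5000/", f.read(), pretty=True)
    print(send(req))  # requires the webapp to be running
```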
In addition to the HTTP service, a CLAMS app is expected to be containerized for seamless deployment to CLAMS workflow engines. Also, independently from being compatible with the CLAMS platform, containerization of your app is recommended especially when your app processes video streams and is dependent on complicated system-level video processing libraries (e.g. OpenCV, FFmpeg).
When you start developing an app with the
clams develop command, the command will create a
Containerfile with some instructions as inline comments for you (you can always start from scratch with any containerization tool you like).
If you are part of CLAMS team and you want to publish your app to the
clams develop command will also create GitHub Actions files to automatically build and push an app image to the organization’s container registry. For the actions to work, you must use the name
Containerfile instead of
If you are not familiar with
Dockerfile, refer to the official documentation to learn how to write one. To integrate with the CLAMS workflow engines, a containerized CLAMS app must automatically start itself as a webapp when instantiated as a container, and listen to
We maintain a public GitHub Container Repository and publish Debian-based base images to help developers write a
Containerfile and save the build time spent installing common libraries. At the moment we have various base images with Python 3.8,
clams-python, and commonly used video and audio processing libraries installed.
Once you have finished writing your
Containerfile, you can build and test the containerized app locally. If you are not familiar with building and running container images, the documentation on building a Docker image will be helpful.