CLAMS App Metadata

Overview

Every CLAMS app must provide information about the app itself. We call this set of information App Metadata.

Format

CLAMS App Metadata must be serializable to a JSON string.

Input/Output type specification

Essentially, all CLAMS apps are designed to take one MMIF file as input and produce another MMIF file as output. This section describes how to specify the semantics of the input and output MMIF files in the App Metadata, and how to format that information using the input and output lists and the type vocabularies in which @type values are defined.

Note

CLAMS App Metadata is encoded in JSON format, but is not part of the MMIF specification. The full JSON schema for app metadata is available in the section below. When an app is published to the CLAMS app directory, the app metadata is rendered as an HTML page, with some additional information about the submission. Visit the CLAMS app directory to see how the app metadata is rendered.

Annotation types in MMIF

As described in the MMIF documentation, MMIF files can contain annotations of various types. Currently, the CLAMS team is using the following vocabularies with pre-defined annotation types:

Each annotation object type in the vocabularies has a unique URI that is used as the value of the @type field. However, the more important parts of a type definition in the context of CLAMS app development are the metadata and properties fields, which provide additional information about the annotation type. Semantically, there is no difference between a metadata field and a properties field; the difference lies in their intended use. A metadata field provides common information about a group of annotation objects of the same type, while a properties field provides information about an individual annotation instance. In practice, metadata fields are placed in the view metadata (view[].metadata.contains) and properties fields are placed in the annotation object itself. Because the two are not semantically distinct, we use the term “type property” to refer to both metadata and properties in the context of annotation type (I/O) specifications in the app metadata.

Type definitions in the vocabularies are intentionally kept minimal and underspecified, because they are meant to be extended by app developers. For example, the LAPPS vocabulary defines a type called Token, primarily to represent a token in a natural language text. However, the type can also be used to represent a sub-word token used in a machine learning model, or a minimal unit of a sign language video. If an app developer needs to add information to the type definition, they can do so by adding arbitrary properties to the type as the app actually uses it. In such a case, the app developer is expected to explain the extended type in the app metadata. See below for the syntax of I/O specification in the app metadata.

Syntax for I/O specification in App Metadata

In the App Metadata, the input and output types are specified as lists of objects. Each object in the list should have the following fields:

  • @type: The URI of the annotation type. This field is required.

  • description: A human-readable description of the annotation type. This field is optional.

  • properties: Key-value pairs of type properties. This field is optional.

  • required: A boolean value indicating whether the annotation type is required as input. This field is optional and defaults to true. Not applicable for output types.
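
As a rough illustration of the four fields above, the following sketch builds one I/O entry as a plain Python dict and serializes it. The helper name io_spec is hypothetical and is not part of the clams-python SDK; it only shows which fields are mandatory and which can be left out.

```python
import json


def io_spec(at_type, description=None, properties=None, required=None):
    """Build one input/output entry; optional fields are omitted when unset.

    Hypothetical helper for illustration only, not an SDK function.
    """
    entry = {"@type": at_type}  # the only mandatory field
    if description is not None:
        entry["description"] = description
    if properties:
        entry["properties"] = dict(properties)
    if required is not None:
        # Only meaningful for input entries; omitted means "required".
        entry["required"] = required
    return entry


token_input = io_spec(
    "http://vocab.lappsgrid.org/Token",
    properties={"posTagSet": "*"},
)
print(json.dumps({"input": [token_input]}, indent=2))
```

Leaving required out of the entry relies on the documented default of true.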

Simple case - using types as defined in the vocabularies

In the simplest case, where a developer merely re-uses an annotation type definition and pre-defined properties, an input specification can be as simple as the following:

{
  # other app metadata fields,
  "input":
  [
    {
      "@type": "http://vocab.lappsgrid.org/Token",
      "properties": {
        "posTagSet": "https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html"
      }
    }
  ],
  # and more app metadata fields,
}

In the above example, the developer declares that the app expects Token annotation objects in the input MMIF, each with a posTagSet property whose value is, verbatim, the URL of the Penn Treebank POS tag set. All other annotation types in the input MMIF will be ignored during processing. A few rules govern how this input list can be written.

  • The value of a property specification can be a verbatim string, or "*" to indicate that the property can have any value.

  • If the app expects multiple types of annotations, the input field should contain multiple objects in the list.

  • And if the app expects “one-of” specified types, one can specify the set of those types in a nested list. One nested list in the input specification means one required type.

  • And finally, if an input type is optional (i.e., required=false), it indicates that the app can use extra information from the optional annotations. In such a case, it is recommended to provide a description of differences in the output MMIF when the extra information is available.

Here is a more complex instance of the simple case:

{
  # other app metadata fields,
  "input":
  [
    [
      { "@type": "https://mmif.clams.ai/vocabulary/AudioDocument/v1/" },
      { "@type": "https://mmif.clams.ai/vocabulary/VideoDocument/v1/" }
    ],
    {
      "@type": "https://mmif.clams.ai/vocabulary/TimeFrame/v5",
      "properties": {
        "label": "speech"
      },
      "required": false
    }
  ],
  # and more app metadata fields,
}

This app is a speech-to-text (automatic speech recognition) app that can take either an audio document or a video document and transcribe the speech in the document. The app can also take TimeFrame annotation objects with a label="speech" property. When speech time frames are available, the app can perform transcription only on the speech segments, to save time and compute power.
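
To make the reading of this specification concrete, here is a minimal sketch of how the list could be checked against the types available in an input MMIF. It is not the SDK's implementation: the function name satisfies_input is hypothetical, and the check looks only at @type URIs, ignoring properties.

```python
def satisfies_input(input_spec, available_types):
    """Check available type URIs against an input list.

    Top-level entries are conjunctive (AND); a nested list is a one-of (OR)
    group; entries with "required": false never cause a failure.
    """
    for entry in input_spec:
        if isinstance(entry, list):
            # One-of group: at least one member type must be available.
            if not any(member["@type"] in available_types for member in entry):
                return False
        elif entry.get("required", True):
            if entry["@type"] not in available_types:
                return False
    return True


AUDIO = "https://mmif.clams.ai/vocabulary/AudioDocument/v1/"
VIDEO = "https://mmif.clams.ai/vocabulary/VideoDocument/v1/"
TIMEFRAME = "https://mmif.clams.ai/vocabulary/TimeFrame/v5"

asr_input = [
    [{"@type": AUDIO}, {"@type": VIDEO}],
    {"@type": TIMEFRAME, "properties": {"label": "speech"}, "required": False},
]

print(satisfies_input(asr_input, {VIDEO}))      # True: one of the two document types is present
print(satisfies_input(asr_input, {TIMEFRAME}))  # False: neither document type is present
```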

Here is another example, with an even more complex input specification:

{
  # other app metadata fields,
  "input":
  [
    { "@type": "https://mmif.clams.ai/vocabulary/VideoDocument/v1/" },
    [
      {
        "@type": "https://mmif.clams.ai/vocabulary/TimeFrame/v5",
        "properties": {
          "timeUnit": "*",
          "label": "slate"
        }
      },
      {
        "@type": "https://mmif.clams.ai/vocabulary/TimeFrame/v5",
        "properties": {
          "timeUnit": "*",
          "label": "chyron"
        }
      }
    ]
  ],
  # and more app metadata fields,
}

This is a text recognition app that can take a video document and TimeFrame annotations that are labeled as either slate or chyron and that have a timeUnit property. The value of the timeUnit property doesn’t matter, but the input time frames must have it.
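
The matching rule for property values (a verbatim value, or "*" for any value as long as the property is present) can be sketched as follows. The function name properties_match is hypothetical and the code is an illustration of the rule as described here, not the SDK's type-checking logic.

```python
def properties_match(spec_props, actual_props):
    """Return True if every specified property is present and matches.

    "*" accepts any value, but the property must still be present.
    Properties not named in the specification are ignored.
    """
    for key, expected in spec_props.items():
        if key not in actual_props:
            return False
        if expected != "*" and actual_props[key] != expected:
            return False
    return True


slate_spec = {"timeUnit": "*", "label": "slate"}

# Any timeUnit value is accepted as long as the property exists.
print(properties_match(slate_spec, {"timeUnit": "milliseconds", "label": "slate"}))  # True
# timeUnit is missing, so the wildcard cannot be satisfied.
print(properties_match(slate_spec, {"label": "slate"}))  # False
# label is a verbatim value and does not match.
print(properties_match(slate_spec, {"timeUnit": "frames", "label": "chyron"}))  # False
```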

Note

Currently, there is no way to specify optional properties within the type definition.

Finally, let’s take a look at the output specification of a scene recognition CLAMS app:

{
  # other app metadata fields,
  "output":
  [
      {
        "@type": "https://mmif.clams.ai/vocabulary/TimePoint/v4/",
        "description": "An individual \"still frame\"-level image classification results.",
        "properties": {
            "timeUnit": "milliseconds",
            "labelset": ["slate", "chyron", "talking-heads-no-text"],
            "classification": "*",
            "label": "*"
        }
      }
  ],
  # and more app metadata fields,
}

Note that in the actual output MMIF, more properties can be stored in the TimePoint objects. The output specification in the app metadata is a subset of the properties to be produced that are useful for type checking in the downstream apps, as well as for human readers to understand the output.

Extended case - adding custom properties to the type definition

When the type definition is extended on the fly, developers are expected to provide the extended specification in the form of key-value pairs in the properties field. The grammar of the JSON object does not change, but developers are expected to provide a verbose description of the type extension in the description field.

Runtime parameter specification

CLAMS apps are designed to run as HTTP servers, preferably stateless ones. When accepting HTTP requests, the app should take the request data payload (body) as the input MMIF, and any exposed configurations should be read from query strings in the URL.

That said, the only data type users can pass as a parameter value at request time is a string. Hence, the app developer is responsible for parsing the string values into the appropriate data types. (The clams-python SDK provides some basic parsing functions, which are called automatically by the web framework wrapper.) At the app metadata level, developers can specify the expected data type of a parameter, one of integer, number, string, boolean, or map, and can also specify a default value (when specified, default values should be properly typed, not given as strings). Notably, there is no list among the available data types, because a parameter can be specified as multivalued=True to accept multiple values as a list. For details of how the SDK’s built-in parameter value parsing works, please refer to the App Metadata JSON schema in the section below.
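
The following sketch shows the general idea of turning query-string values into typed parameters using only the Python standard library. It is not the SDK's parser: PARAM_SPECS, CASTERS, and parse_params are hypothetical names, the set of accepted boolean spellings is an assumption, the map type is left out, and "last value wins" for single-valued parameters is an arbitrary choice.

```python
from urllib.parse import parse_qs

# Hypothetical parameter specifications, shaped like entries of "parameters".
PARAM_SPECS = [
    {"name": "threshold", "type": "number", "default": 0.5},
    {"name": "pretty", "type": "boolean", "default": False},
    {"name": "label", "type": "string", "multivalued": True, "default": []},
]

CASTERS = {
    "integer": int,
    "number": float,
    "string": str,
    # Assumed truthy spellings; any other string is treated as False.
    "boolean": lambda s: s.lower() in ("true", "t"),
}


def parse_params(query_string, specs):
    """Cast raw query-string values according to the declared types."""
    raw = parse_qs(query_string)  # every value arrives as a list of strings
    parsed = {}
    for spec in specs:
        cast = CASTERS[spec["type"]]
        values = raw.get(spec["name"])
        if values is None:
            # A parameter without a default is treated as required.
            if "default" not in spec:
                raise ValueError(f"missing required parameter: {spec['name']}")
            parsed[spec["name"]] = spec["default"]
        elif spec.get("multivalued", False):
            parsed[spec["name"]] = [cast(v) for v in values]
        else:
            parsed[spec["name"]] = cast(values[-1])  # assumption: last value wins
    return parsed


params = parse_params("threshold=0.8&label=slate&label=chyron", PARAM_SPECS)
print(params)
```

Repeating a key in the query string (label=slate&label=chyron) is how a multivalued parameter receives more than one value in this sketch.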

Syntax for parameter specification in App Metadata

Metadata Schema

The schema for app metadata is as follows. (You can also download the schema in JSON Schema format from here.)

CLAMS AppMetadata

Data model that describes a CLAMS app.

Can be initialized by simply passing all required key-value pairs.

If you have pre-generated metadata in an external file, you can read the file in as a dict and use it as keyword arguments for initialization. But be careful with keys whose values are automatically generated by the SDK.

Please refer to <CLAMS App Metadata> for the metadata specification.

type

object

properties

  • name

Name

A short name of the app.

type

string

  • description

Description

A longer description of the app (what it does, how to use, etc.).

type

string

  • app_version

App Version

(AUTO-GENERATED, DO NOT SET MANUALLY)

Version of the app.

When the metadata is generated using clams-python SDK, this field is automatically filled in.

type

string

  • mmif_version

Mmif Version

(AUTO-GENERATED, DO NOT SET MANUALLY)

Version of the MMIF specification the app targets.

When the metadata is generated using clams-python SDK, this field is automatically filled in.

type

string

  • analyzer_version

Analyzer Version

(optional) Version of an analyzer software, if the app is working as a wrapper for one.

type

string

  • app_license

App License

License information of the app.

type

string

  • analyzer_license

Analyzer License

(optional) License information of an analyzer software, if the app works as a wrapper for one.

type

string

  • identifier

Identifier

(partly AUTO-GENERATED)

IRI-formatted unique identifier for the app.

If the app is to be published to the CLAMS app-directory, the developer should give a single string value composed of valid URL characters (no /, no whitespace). Then, when the metadata is generated using the clams-python SDK, the app-directory URL is prepended and the app_version value is appended automatically.

For example, example-app -> http://apps.clams.ai/example-app/1.0.0

Otherwise, only the app_version value is appended as a suffix, so provide the identifier in IRI form, but leave the version number out.

type

string

maxLength

65536

minLength

1

format

uri

  • url

Url

A public repository where the app’s source code (git-based) and/or documentation is available.

type

string

maxLength

65536

minLength

1

format

uri

  • input

Input

List of input types. Must have at least one element.

This list should enumerate all input types exhaustively and meticulously. Developers are advised to pay extra attention to the input and output fields so that 1) all types are listed, and 2) if a type must have specific properties, those properties are included.

This list should be interpreted conjunctively (AND).

However, a nested list in this list means a oneOf disjunctive (OR) specification.

For example, input = [TypeA (req=True), [TypeB, TypeC]] means TypeA is required and either TypeB or TypeC is additionally required.

All input elements in the nested list must not be required=False, and only a single nesting level is allowed (e.g. input = [TypeA, [ [TypeB, TypeC], [TypeD, TypeE] ] ] is not allowed).

type

array

default

items

anyOf

#/definitions/Input

type

array

items

#/definitions/Input

  • output

Output

List of output types. Must have at least one element. This list should enumerate all output types exhaustively and meticulously. Developers are advised to pay extra attention to the input and output fields so that 1) all types are listed, and 2) if a type has specific properties, those properties are included.

type

array

default

items

#/definitions/Output

  • parameters

Parameters

List of runtime parameters. Can be empty.

type

array

default

items

#/definitions/RuntimeParameter

  • dependencies

Dependencies

(optional) List of software dependencies of the app.

This list is completely optional, as in most cases such dependencies are specified in a separate file in the codebase of the app (for example, requirements.txt file for a Python app, or pom.xml file for a maven-based Java app).

List items must be strings, not any kind of structured data. Thus, it is recommended to include a package name and its version in the string value at the minimum (e.g., clams-python==1.2.3).

type

array

items

type

string

  • more

More

(optional) A string-to-string map that can be used to store any additional metadata of the app.

type

object

additionalProperties

type

string

additionalProperties

False
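
The identifier rule described for the identifier field above can be sketched as below. This is an illustration only: the function name build_identifier is hypothetical, the base URL is taken from the example in the field description, and telling the two cases apart by the presence of "://" is an assumption about how an IRI-form value might be detected, not the SDK's documented behavior.

```python
APP_DIRECTORY_BASE = "http://apps.clams.ai"  # taken from the example in the field description


def build_identifier(identifier, app_version):
    """Compose the final app identifier from a developer-supplied value."""
    if "://" in identifier:
        # Assumed to be IRI-shaped already: only the version is appended.
        return f"{identifier.rstrip('/')}/{app_version}"
    if "/" in identifier or any(ch.isspace() for ch in identifier):
        raise ValueError("short identifiers must not contain '/' or whitespace")
    # Short name: prepend the app-directory URL and append the version.
    return f"{APP_DIRECTORY_BASE}/{identifier}/{app_version}"


print(build_identifier("example-app", "1.0.0"))
# http://apps.clams.ai/example-app/1.0.0
print(build_identifier("https://example.org/my-app", "1.0.0"))
# https://example.org/my-app/1.0.0
```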

CLAMS Input Specification

Data model that describes input specification of a CLAMS app.

CLAMS apps are expected to have at least one input type, and each type must be defined by a @type URI string. If the type has specific properties and values required by the app, they can be described in the (optional) properties field. Finally, a human-readable verbose description can be provided in the (optional) description field for users.

Developers should take diligent care to include all input types and their properties in the app metadata.

type

object

properties

  • @type

@Type

The type of the object. Must be an IRI string.

type

string

maxLength

65536

minLength

1

format

uri

  • description

Description

A verbose, human-readable description of the type. This is intended for documenting a particular use case of this annotation type and is not expected to be consumed by software. This description should work as a guideline for users to understand the input type, and can also be used as an expansion specification for the type definition beyond the base vocabulary.

type

string

  • properties

Properties

(optional) Specification for type properties, if any. "*" indicates any value.

type

object

default

additionalProperties

anyOf

type

integer

type

number

type

boolean

type

string

  • required

Required

(optional, True by default) Indicating whether this input type is mandatory or optional.

type

boolean

additionalProperties

False
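
The structural rules stated for the input list (at least one element, a single nesting level, no required=False inside a nested list, and a mandatory @type) can be checked with a short function like the one below. The name validate_input_spec is hypothetical; this is a sketch of the rules as written in this document, not the validation the SDK performs.

```python
def validate_input_spec(input_spec):
    """Return a list of structural problems; an empty list means none were found."""
    problems = []
    if not input_spec:
        problems.append("input must have at least one element")
    for i, entry in enumerate(input_spec):
        if not isinstance(entry, list):
            if "@type" not in entry:
                problems.append(f"input[{i}]: @type is required")
            continue
        # A nested list is a one-of group; check each of its members.
        for j, member in enumerate(entry):
            where = f"input[{i}][{j}]"
            if isinstance(member, list):
                problems.append(f"{where}: only a single nesting level is allowed")
            elif member.get("required", True) is False:
                problems.append(f"{where}: entries in a one-of group cannot be optional")
            elif "@type" not in member:
                problems.append(f"{where}: @type is required")
    return problems


valid = [{"@type": "TypeA"}, [{"@type": "TypeB"}, {"@type": "TypeC"}]]
too_deep = [
    {"@type": "TypeA"},
    [[{"@type": "TypeB"}, {"@type": "TypeC"}], [{"@type": "TypeD"}, {"@type": "TypeE"}]],
]

print(validate_input_spec(valid))     # no problems reported
print(validate_input_spec(too_deep))  # one problem per doubly nested list
```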

CLAMS Output Specification

Data model that describes output specification of a CLAMS app.

CLAMS apps are expected to have at least one output type, and each type must be defined by a @type URI string. If the type has common properties and values generated by the app, they can be described in the (optional) properties field. Finally, a human-readable verbose description can be provided in the (optional) description field for users.

Developers should take diligent care to include all output types and their properties in the app metadata. To specify the property values, developers can use an actual value (for full match) or '*' (for any value).

type

object

properties

  • @type

@Type

The type of the object. Must be an IRI string.

type

string

maxLength

65536

minLength

1

format

uri

  • description

Description

A verbose, human-readable description of the type. This is intended for documenting a particular use case of this annotation type and is not expected to be consumed by software. This description should work as a guideline for users to understand the output type, and can also be used as an expansion specification for the type definition beyond the base vocabulary.

type

string

  • properties

Properties

(optional) Specification for type properties, if any. "*" indicates any value.

type

object

default

additionalProperties

anyOf

type

integer

type

number

type

boolean

type

string

additionalProperties

False

CLAMS App Runtime Parameter

Defines a data model that describes a single runtime configuration of a CLAMS app. Usually, an app keeps a list of these configuration specifications in the parameters field. When initializing a RuntimeParameter object in Python, the value for the default field must be a string. For example, to set a default value for a boolean parameter, use any of 'True', 'true', 't', or their falsy counterparts, instead of True or False.
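
As a minimal sketch of how such boolean strings might be interpreted, the function below maps the spellings listed above to Python booleans. The falsy spellings are assumed to mirror the truthy ones, rejecting unknown strings is a design choice of this example, and the name parse_bool is hypothetical; the SDK's own parser may accept a different set.

```python
TRUTHY = {"True", "true", "t"}
FALSY = {"False", "false", "f"}  # assumed counterparts of the truthy spellings


def parse_bool(value):
    """Convert a boolean-like string to a Python bool, rejecting anything else."""
    if value in TRUTHY:
        return True
    if value in FALSY:
        return False
    raise ValueError(f"not a recognized boolean string: {value!r}")


print(parse_bool("t"))      # True
print(parse_bool("False"))  # False
```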

type

object

properties

  • name

Name

A short name of the parameter (works as a key).

type

string

  • description

Description

A longer description of the parameter (what it does, how to use, etc.).

type

string

  • type

Type

Type of the parameter value the app expects. Must be one of (‘integer’, ‘number’, ‘string’, ‘boolean’, ‘map’). When type is map, multivalued=true is automatically forced.

Notes for developers:

When the type is map, the parameter value (still a single string from the users’ perspective) must be formatted as a KEY:VALUE pair, namely a colon-separated string. To pass multiple key-value pairs, users need to pass the parameter multiple times (remember type=map implies multivalued=true) with pairs in the colon-separated format.

Also, the VALUE part of the user input is always expected and handled as a string. If a developer wants to do more text processing on the passed value to accept more complex data types or structures (e.g., a map from a string to a list of strings), it is up to the developer. However, any additional format requirements should be precisely described in the description field for users.

Finally, the same format is expected for the default value, if any. For example, if the desired default dictionary is {'key1': 'value1', 'key2': 'value2'}, the default value (used when initializing a parameter) should be ['key1:value1','key2:value2'].

type

string

enum

integer, number, string, boolean, map

  • choices

Choices

(optional) List of string values that can be accepted.

type

array

items

anyOf

type

integer

type

number

type

boolean

type

string

  • default

Default

(optional) Default value for the parameter.

Notes for developers:

Setting a default value makes a parameter optional.

When multivalued=True, the default value should be a list of values.

When type=map, the default value should be a list of colon-separated strings.

anyOf

type

integer

type

number

type

boolean

type

string

type

array

items

anyOf

type

integer

type

number

type

boolean

type

string

  • multivalued

Multivalued

(optional, False by default) Set True if the parameter can have multiple values.

Note that, for parameters that allow multiple values, the SDK will pass a singleton list to _annotate() even when one value is passed via HTTP.

type

boolean

additionalProperties

False
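
The KEY:VALUE convention described for map parameters can be illustrated with the sketch below, which turns a list of colon-separated strings into a dictionary. The name parse_map is hypothetical, and splitting on the first colon only (so that values may themselves contain colons) is an assumption of this example rather than documented SDK behavior.

```python
def parse_map(pairs):
    """Turn ["key1:value1", "key2:value2"] into {"key1": "value1", "key2": "value2"}."""
    result = {}
    for pair in pairs:
        # Split on the first colon only; the rest of the string is the value.
        key, sep, value = pair.partition(":")
        if not sep:
            raise ValueError(f"expected KEY:VALUE, got {pair!r}")
        result[key] = value  # the value is kept as a string
    return result


print(parse_map(["key1:value1", "key2:value2"]))
# {'key1': 'value1', 'key2': 'value2'}
print(parse_map(["model:http://example.org/m"]))
# {'model': 'http://example.org/m'}
```

The input is always a list, even for a single pair, which matches the note that multivalued parameters are passed as lists.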