CLAMS App Metadata
Overview
Every CLAMS app must provide information about the app itself. We call this set of information App Metadata.
Format
CLAMS App Metadata should be serializable as a JSON string.
Input/Output type specification
Essentially, all CLAMS apps are designed to take one MMIF file as input and produce another MMIF file as output. In this section, we will discuss how to specify, in the App Metadata, the semantics of the input and output MMIF files, and how that information should be formatted in terms of the App Metadata syntax, concretely by using the input and output lists and the type vocabularies where @type values are defined.
Note
CLAMS App Metadata is encoded in JSON format, but it is not part of the MMIF specification. The full JSON schema for app metadata is available in the Metadata Schema section below. When an app is published to the CLAMS app directory, the app metadata is rendered as an HTML page, with some additional information about the submission. Visit the CLAMS app directory to see how the app metadata is rendered.
Annotation types in MMIF
As described in the MMIF documentation, MMIF files can contain annotations of various types. Currently, the CLAMS team is using the following vocabularies with pre-defined annotation types:
Each annotation object type in the vocabularies has a unique URI that is used as the value of the @type field.
However, the more important part of the type definition in the context of CLAMS app development is the metadata and properties fields. These fields provide additional information about the annotation type. Semantically, there is no difference between a metadata field and a properties field; the difference is in the intended use. A metadata field provides common information about a group of annotation objects of the same type, while a properties field provides information about an individual annotation instance. In practice, metadata fields are placed in the view metadata (view[].metadata.contains) and properties fields are placed in the annotation object itself. Because the two are not semantically distinct, we will use the term “type property” to refer to both metadata and properties in the context of annotation type (I/O) specifications in the app metadata.
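To make the placement concrete, here is a minimal sketch of a single MMIF view written as a Python dictionary. The app URL, annotation identifiers, and property values are made up for illustration, and the helper type_properties is a hypothetical name; only the split between view metadata (contains) and per-annotation properties reflects the description above.

```python
TIME_FRAME = "https://mmif.clams.ai/vocabulary/TimeFrame/v5"

# A single view: group-level information sits under metadata.contains,
# instance-level information sits in each annotation's properties.
view = {
    "id": "v_0",
    "metadata": {
        "app": "http://example.org/hypothetical-app/v1",  # hypothetical app URL
        "contains": {
            TIME_FRAME: {"timeUnit": "milliseconds"},  # shared by all TimeFrames here
        },
    },
    "annotations": [
        {"@type": TIME_FRAME,
         "properties": {"id": "tf_1", "start": 0, "end": 1500, "label": "speech"}},
        {"@type": TIME_FRAME,
         "properties": {"id": "tf_2", "start": 1500, "end": 4000, "label": "music"}},
    ],
}


def type_properties(view, annotation):
    """Merge group-level metadata and instance-level properties into one dict."""
    merged = dict(view["metadata"]["contains"].get(annotation["@type"], {}))
    merged.update(annotation["properties"])  # instance values win on key clashes
    return merged


first = type_properties(view, view["annotations"][0])
print(first["timeUnit"], first["label"])
```

Treating both sources as one merged dictionary is what the term “type property” amounts to in practice: a consumer does not need to care where a value was stored.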
Type definitions in the vocabularies are intentionally kept minimal and underspecified, because the definitions are meant to be extended by app developers. For example, the LAPPS vocabulary defines a type called Token, primarily to represent a token in a natural language text. However, the usage of the type can be extended to represent a sub-word token used in a machine learning model, or a minimal unit of a sign language video. If the app developer needs to add information to the type definition, they can do so by adding arbitrary properties to the type when using it. In such a case, the app developer is expected to explain the extended type in the app metadata. See below for the syntax of I/O specification in the app metadata.
Syntax for I/O specification in App Metadata
In the App Metadata, the input and output types are specified as lists of objects. Each object in the list should have the following fields:
@type: The URI of the annotation type. This field is required.
description: A human-readable description of the annotation type. This field is optional.
properties: Key-value pairs of type properties. This field is optional.
required: A boolean value indicating whether the annotation type is required as input. This field is optional and defaults to true. Not applicable for output types.
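As a rough illustration of the four fields, the following sketch builds one specification entry as a plain Python dictionary and serializes it to JSON. The helper name io_spec and the description text are hypothetical and are not part of the clams-python SDK.

```python
import json


def io_spec(at_type, description=None, properties=None, required=None):
    """Build one I/O specification entry, leaving out optional fields that are unset."""
    if not at_type:
        raise ValueError("@type is required")
    entry = {"@type": at_type}
    if description is not None:
        entry["description"] = description
    if properties:
        entry["properties"] = dict(properties)
    if required is not None:  # only meaningful for input types
        entry["required"] = bool(required)
    return entry


token_input = io_spec(
    "http://vocab.lappsgrid.org/Token",
    description="Tokens to be processed.",  # hypothetical description
    properties={"posTagSet": "*"},
)
print(json.dumps({"input": [token_input]}, indent=2))
```

Leaving required out of the entry relies on the documented default of true.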
Simple case - using types as defined in the vocabularies
In the simplest case, where a developer merely re-uses an annotation type definition and pre-defined properties, an input specification can be as simple as the following:
{
  # other app metadata fields,
  "input":
  [
    {
      "@type": "http://vocab.lappsgrid.org/Token",
      "properties": {
        "posTagSet": "https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html"
      }
    }
  ],
  # and more app metadata fields,
}
In the above example, the developer is declaring that the app expects Token annotation objects in the input MMIF, each with a posTagSet property whose value is, verbatim, the URL of the Penn Treebank POS tag set. All other annotation types in the input MMIF will be ignored during processing. There are a few rules for how this input list can be written.
The value of a property specification can be a verbatim string, or "*" to indicate that the property can have any value.
If the app expects multiple types of annotations, the input field should contain multiple objects in the list.
If the app expects “one-of” specified types, one can specify the set of those types in a nested list. One nested list in the input specification means one required type.
Finally, if an input type is optional (i.e., required=false), it indicates that the app can use extra information from the optional annotations. In such a case, it is recommended to provide a description of differences in the output MMIF when the extra information is available.
Here is a more complex example of the simple case:
{
  # other app metadata fields,
  "input":
  [
    [
      { "@type": "https://mmif.clams.ai/vocabulary/AudioDocument/v1/" },
      { "@type": "https://mmif.clams.ai/vocabulary/VideoDocument/v1/" }
    ],
    {
      "@type": "https://mmif.clams.ai/vocabulary/TimeFrame/v5",
      "properties": {
        "label": "speech"
      },
      "required": false
    }
  ],
  # and more app metadata fields,
}
This app is a speech-to-text (automatic speech recognition) app that can take either an audio document or a video document and transcribe the speech in the document. The app can also take TimeFrame annotation objects with a label="speech" property. When speech time frames are available, the app can perform transcription only on the speech segments, to save time and compute power.
Another example with even more complex input specification:
{
  # other app metadata fields,
  "input":
  [
    { "@type": "https://mmif.clams.ai/vocabulary/VideoDocument/v1/" },
    [
      {
        "@type": "https://mmif.clams.ai/vocabulary/TimeFrame/v5",
        "properties": {
          "timeUnit": "*",
          "label": "slate"
        }
      },
      {
        "@type": "https://mmif.clams.ai/vocabulary/TimeFrame/v5",
        "properties": {
          "timeUnit": "*",
          "label": "chyron"
        }
      }
    ]
  ],
  # and more app metadata fields,
}
This is a text recognition app that takes a video document and TimeFrame annotations that are labeled as either slate or chyron and have a timeUnit property. The value of the timeUnit property doesn’t matter, but the input time frames must have it.
Note
Unfortunately, currently there is no way to specify optional properties within the type definition.
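The rules above (verbatim values, the "*" wildcard, nested “one-of” lists, and optional types) can be sketched as a small checker. This is an illustrative reading of the rules, not the SDK's own implementation: the function names are hypothetical, type URIs are compared as exact strings, and the available annotations are simplified to (type URI, type properties) pairs.

```python
def matches(spec, annotation_type, type_props):
    """True if an annotation satisfies a single specification entry."""
    if spec["@type"] != annotation_type:
        return False
    for key, expected in spec.get("properties", {}).items():
        if key not in type_props:  # the property must exist, even for "*"
            return False
        if expected != "*" and type_props[key] != expected:
            return False
    return True


def missing_requirements(input_spec, available):
    """Return the top-level entries of input_spec that no available annotation satisfies."""
    missing = []
    for entry in input_spec:
        nested = isinstance(entry, list)
        if not nested and entry.get("required", True) is False:
            continue  # optional types never count as missing
        alternatives = entry if nested else [entry]  # a nested list means "one of"
        if not any(matches(alt, t, p) for alt in alternatives for t, p in available):
            missing.append(entry)
    return missing


VIDEO = "https://mmif.clams.ai/vocabulary/VideoDocument/v1/"
TIME_FRAME = "https://mmif.clams.ai/vocabulary/TimeFrame/v5"
input_spec = [
    {"@type": VIDEO},
    [
        {"@type": TIME_FRAME, "properties": {"timeUnit": "*", "label": "slate"}},
        {"@type": TIME_FRAME, "properties": {"timeUnit": "*", "label": "chyron"}},
    ],
]

with_unit = [(VIDEO, {}), (TIME_FRAME, {"timeUnit": "seconds", "label": "chyron"})]
without_unit = [(VIDEO, {}), (TIME_FRAME, {"label": "chyron"})]
print(missing_requirements(input_spec, with_unit))          # []
print(len(missing_requirements(input_spec, without_unit)))  # 1
```

The second call reports the nested list as unsatisfied because the only time frame lacks timeUnit, which mirrors the statement that the value does not matter but the property must be present.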
Finally, let’s take a look at the output specification of a scene recognition CLAMS app:
{
  # other app metadata fields,
  "output":
  [
    {
      "@type": "https://mmif.clams.ai/vocabulary/TimePoint/v4/",
      "description": "An individual \"still frame\"-level image classification results.",
      "properties": {
        "timeUnit": "milliseconds",
        "labelset": ["slate", "chyron", "talking-heads-no-text"],
        "classification": "*",
        "label": "*"
      }
    }
  ],
  # and more app metadata fields,
}
Note that in the actual output MMIF, more properties can be stored in the TimePoint objects. The output specification in the app metadata is a subset of the properties to be produced that are useful for type checking in the downstream apps, as well as for human readers to understand the output.
Extended case - adding custom properties to the type definition
When the type definition is extended on the fly, developers are expected to provide the extended specification in the form of key-value pairs in the properties field. The grammar of the JSON object does not change, but developers are expected to provide a verbose description of the type extension in the description field.
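As a hedged illustration of an extended type, the sketch below declares sub-word tokens (one of the extended Token usages mentioned earlier) as an output type. The property names tokenizerName and vocabIndex and their values are invented for this example and are not defined by any vocabulary; the point is only that custom properties go into properties and are explained in description.

```python
import json

extended_token = {
    "@type": "http://vocab.lappsgrid.org/Token",
    "description": (
        "Sub-word tokens produced by a model tokenizer rather than linguistic tokens. "
        "The custom tokenizerName property names the tokenizer that produced them, and "
        "vocabIndex holds each token's index in that tokenizer's vocabulary."
    ),
    "properties": {
        "tokenizerName": "example-subword-tokenizer",  # hypothetical property, exact value
        "vocabIndex": "*",                             # hypothetical property, any value
    },
}

app_metadata_fragment = {"output": [extended_token]}
print(json.dumps(app_metadata_fragment, indent=2))
```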
Runtime parameter specification
CLAMS apps are designed to run as HTTP servers, preferably stateless ones. When accepting HTTP requests, the app should take the request data payload (body) as the input MMIF, and any exposed configurations should be read from query strings in the URL.
That said, the only allowed data type for users to pass as parameter values at request time is a string. Hence, the app developer is responsible for parsing the string values into the appropriate data types. (The clams-python SDK provides some basic parsing functions, automatically called by the web framework wrapper.) At the app metadata level, developers can specify the expected parameter data type, choosing among integer, number, string, boolean, and map, and can also specify the default value of the parameter (when specified, default values should be properly typed, not given as strings). Notably, there is NO list among the available data types; that is because a parameter can be specified as multivalued=True to accept multiple values as a list. For details of how the SDK’s built-in parameter value parsing works, please refer to the App Metadata JSON schema (in the section below).
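The idea of string-only query values being cast to declared types can be sketched with the standard library alone. This is not the SDK's parser: the function names, the example parameter specifications, and the accepted boolean spellings are assumptions made for illustration, and the map type is left out for brevity.

```python
from urllib.parse import parse_qs


def parse_bool(text):
    """Cast a boolean-like string; the accepted spellings are an assumption of this sketch."""
    lowered = text.lower()
    if lowered in ("true", "t"):
        return True
    if lowered in ("false", "f"):
        return False
    raise ValueError(f"not a boolean string: {text!r}")


CASTERS = {"integer": int, "number": float, "string": str, "boolean": parse_bool}


def parse_parameters(query_string, specs):
    """Turn a URL query string into typed values according to parameter specs."""
    raw = parse_qs(query_string)  # every value arrives as a list of strings
    parsed = {}
    for spec in specs:
        name = spec["name"]
        if name in raw:
            values = [CASTERS[spec["type"]](v) for v in raw[name]]
            # multivalued parameters keep the whole list; otherwise the last value wins
            parsed[name] = values if spec.get("multivalued", False) else values[-1]
        elif "default" in spec:
            parsed[name] = spec["default"]  # defaults are already properly typed
        else:
            raise ValueError(f"missing required parameter: {name}")
    return parsed


# Hypothetical parameter specifications for illustration.
specs = [
    {"name": "threshold", "type": "number", "default": 0.5},
    {"name": "label", "type": "string", "multivalued": True, "default": []},
    {"name": "pretty", "type": "boolean", "default": False},
]
result = parse_parameters("label=slate&label=chyron&pretty=true", specs)
print(result)
```

Here label is collected into a list because it is multivalued, while threshold falls back to its typed default because the request did not supply it.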
Syntax for parameter specification in App Metadata
Metadata Schema
The schema for app metadata is as follows. (You can also download the schema in JSON Schema format from here.)
CLAMS AppMetadata
Data model that describes a CLAMS app.
Can be initialized by simply passing all required key-value pairs.
If you have pre-generated metadata in an external file, you can read the file in as a dict and use it as keyword arguments for initialization. But be careful with keys whose values are automatically generated by the SDK.
Please refer to <CLAMS App Metadata> for the metadata specification.
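One cautious way to handle the auto-generated keys is to drop them before passing the dictionary on as keyword arguments. The sketch below assumes the keys are spelled app_version and mmif_version (inferred from the App Version and Mmif Version fields in the schema below) and uses made-up metadata values; it stops short of calling the SDK's metadata class.

```python
import json

# Assumed key names for the two AUTO-GENERATED fields (App Version, Mmif Version).
AUTO_GENERATED_KEYS = frozenset({"app_version", "mmif_version"})


def metadata_kwargs(json_text):
    """Parse pre-generated metadata and drop keys the SDK is expected to fill in itself."""
    data = json.loads(json_text)
    if not isinstance(data, dict):
        raise ValueError("app metadata must be a JSON object")
    return {k: v for k, v in data.items() if k not in AUTO_GENERATED_KEYS}


# Hypothetical pre-generated metadata; all key spellings and values are illustrative.
pregenerated = json.dumps({
    "name": "Example App",
    "description": "Does nothing useful.",
    "app_version": "1.0.0",
    "mmif_version": "1.0.0",
    "app_license": "MIT",
    "identifier": "example-app",
    "url": "https://example.org/example-app",
})
kwargs = metadata_kwargs(pregenerated)
print(sorted(kwargs))
```

The resulting dictionary could then be unpacked as keyword arguments when initializing the metadata object.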
type: object
additionalProperties: False

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| Name | string | | A short name of the app. |
| Description | string | | A longer description of the app (what it does, how to use, etc.). |
| App Version | string | | (AUTO-GENERATED, DO NOT SET MANUALLY) Version of the app. When the metadata is generated using clams-python SDK, this field is automatically filled in |
| Mmif Version | string | | (AUTO-GENERATED, DO NOT SET MANUALLY) Version of MMIF specification the app. When the metadata is generated using clams-python SDK, this field is automatically filled in. |
| Analyzer Version | string | | (optional) Version of an analyzer software, if the app is working as a wrapper for one. |
| App License | string | | License information of the app. |
| Analyzer License | string | | (optional) License information of an analyzer software, if the app works as a wrapper for one. |
| Identifier | string | maxLength: 65536; minLength: 1; format: uri | (partly AUTO-GENERATED) IRI-formatted unique identifier for the app. If the app is to be published to the CLAMS app-directory, the developer should give a single string value composed with valid URL characters (no then when the metadata is generated using clams-python SDK, the app-directory URL is prepended and For example, Otherwise, only the |
| Url | string | maxLength: 65536; minLength: 1; format: uri | A public repository where the app’s source code (git-based) and/or documentation is available. |
| Input | array | default: (empty); items: anyOf #/definitions/Input, or array of #/definitions/Input | List of input types. Must have at least one element. This list should iterate all input types in an exhaustive and meticulous manner, meaning it is recommended for developers to pay extra attention to This list should interpreted conjunctive ( However, a nested list in this list means For example, All input elements in the nested list must not be |
| Output | array | default: (empty); items: #/definitions/Output | List of output types. Must have at least one element. This list should iterate all output types in an exhaustive and meticulous manner, meaning it is recommended for developers to pay extra attention to |
| Parameters | array | default: (empty); items: #/definitions/RuntimeParameter | List of runtime parameters. Can be empty. |
| Dependencies | array | items: string | (optional) List of software dependencies of the app. This list is completely optional, as in most cases such dependencies are specified in a separate file in the codebase of the app (for example, List items must be strings, not any kind of structured data. Thus, it is recommended to include a package name and its version in the string value at the minimum (e.g., |
| More | object | additionalProperties: string | (optional) A string-to-string map that can be used to store any additional metadata of the app. |
CLAMS Input Specification
Data model that describes input specification of a CLAMS app.
CLAMS apps are expected to have at least one input type, and each type must be defined by a @type URI string. If the type has specific properties and values required by the app, they can be described in the (optional) properties field. Finally, a human-readable verbose description can be provided in the (optional) description field for users.
Developers should take diligent care to include all input types and their properties in the app metadata.
type: object
additionalProperties: False

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| @Type | string | maxLength: 65536; minLength: 1; format: uri | The type of the object. Must be a IRI string. |
| Description | string | | A verbose, human-readable description of the type. This is intended to be used for documentation purpose for a particular use case of this annotation type and is not expected to be consumed by software. This description should work as a guideline for users to understand the output type, and also can be used as a expansion specification for the type definition beyond the base vocabulary. |
| Properties | object | default: (empty); additionalProperties: anyOf integer, number, boolean, string | (optional) Specification for type properties, if any. |
| Required | boolean | | (optional, True by default) Indicating whether this input type is mandatory or optional. |
CLAMS Output Specification
Data model that describes output specification of a CLAMS app.
CLAMS apps are expected to have at least one output type, and each type must be defined by a @type URI string. If the type has common properties and values generated by the app, they can be described in the (optional) properties field. Finally, a human-readable verbose description can be provided in the (optional) description field for users.
Developers should take diligent care to include all output types and their properties in the app metadata. To specify the property values, developers can use an actual value (for full match) or '*' (for any value).
type: object
additionalProperties: False

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| @Type | string | maxLength: 65536; minLength: 1; format: uri | The type of the object. Must be a IRI string. |
| Description | string | | A verbose, human-readable description of the type. This is intended to be used for documentation purpose for a particular use case of this annotation type and is not expected to be consumed by software. This description should work as a guideline for users to understand the output type, and also can be used as a expansion specification for the type definition beyond the base vocabulary. |
| Properties | object | default: (empty); additionalProperties: anyOf integer, number, boolean, string | (optional) Specification for type properties, if any. |
CLAMS App Runtime Parameter
Defines a data model that describes a single runtime configuration of a CLAMS app.
Usually, an app keeps a list of these configuration specifications in the parameters field.
When initializing a RuntimeParameter object in Python, the value for the default field must be a string. For example, if you want to set a default value for a boolean parameter, use any of 'True', 'true', 't', or their falsy counterparts, instead of True or False.
type: object
additionalProperties: False

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| Name | string | | A short name of the parameter (works as a key). |
| Description | string | | A longer description of the parameter (what it does, how to use, etc.). |
| Type | string | enum: integer, number, string, boolean, map | Type of the parameter value the app expects. Must be one of (‘integer’, ‘number’, ‘string’, ‘boolean’, ‘map’). When type is Notes for developers: When the type is Also, the VALUE part of the user input is always expected and handled as a string. If a developers wants to do more text processing on the passed value to accept more complex data types or structures (e.g., map from a string to a list of strings), it is up to the developer. However, any additional form requirements should be precisely described in the Finally, the same format is expected for the default value, if any. For example, if the default desired dictionary is |
| Choices | array | items: anyOf integer, number, boolean, string | (optional) List of string values that can be accepted. |
| Default | anyOf integer, number, boolean, string, array | array items: anyOf integer, number, boolean, string | (optional) Default value for the parameter. Notes for developers: Setting a default value makes a parameter optional. When When |
| Multivalued | boolean | | (optional, False by default) Set True if the parameter can have multiple values. Note that, for parameters that allow multiple values, the SDK will pass a singleton list to |