clams.source package

Module contents

class clams.source.PipelineSource(common_documents_json: Optional[List[Union[str, dict]]] = None, common_metadata_json: Optional[Union[str, dict]] = None)

Bases: object

The PipelineSource class.

A PipelineSource object is used at the beginning of a CLAMS pipeline to populate a new MMIF file with media.

The same PipelineSource object can be used repeatedly to generate multiple MMIF objects.

  • common_documents_json – a list of documents, each given as a JSON string or dict, that should be common to all MMIF objects produced by this pipeline.

  • common_metadata_json – metadata, given as a JSON string or dict, that should be common to all MMIF objects produced by this pipeline.

add_document(document: Union[str, dict, mmif.serialize.annotation.Document]) → None

Adds a document to the working source MMIF.

When you’re done, fetch the source MMIF with produce().


  • document – the document to add, as a JSON dict or string or as a MMIF Document object
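Because the parameter accepts either JSON text or an already-parsed dict, callers can pass whichever form they have. The sketch below shows one way such input could be normalized. It is not the library's code: `to_document_dict` is a hypothetical helper, and the dict fields are placeholders rather than the actual MMIF document schema.

```python
import json
from typing import Union


def to_document_dict(document: Union[str, dict]) -> dict:
    """Return the document as a dict, parsing JSON text when a string is given."""
    if isinstance(document, str):
        return json.loads(document)
    return document


# Placeholder fields for illustration only; not the actual MMIF document schema.
as_dict = {"id": "d1", "location": "/path/to/video.mp4"}
as_string = json.dumps(as_dict)

# Both input forms end up as the same dict.
print(to_document_dict(as_dict) == to_document_dict(as_string))
```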

change_metadata(key: str, value)

Adds or changes a metadata entry in the working source MMIF.

  • key – the desired key of the metadata property

  • value – the desired value of the metadata property

from_data(doc_lists: Iterable[List[Union[str, dict, mmif.serialize.annotation.Document]]], metadata_objs: Optional[Iterable[Optional[Union[str, dict, mmif.serialize.mmif.MmifMetadata]]]] = None) → Generator[mmif.serialize.mmif.Mmif, None, None]

Provided with an iterable of document lists and an optional iterable of metadata objects, generates MMIF objects produced from that data.

doc_lists and metadata_objs should be matched pairwise, so that if they are zipped together, each pair defines a single MMIF object from this pipeline source.

  • doc_lists – an iterable of document lists to generate MMIF from

  • metadata_objs – an iterable of metadata objects paired with the document lists


Returns: a generator of MMIF objects produced from the data
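The pairwise matching of doc_lists and metadata_objs can be illustrated without the library. The sketch below is a stand-in, not the clams implementation: plain dicts take the place of MMIF documents and metadata, and `pair_inputs` is a hypothetical helper name. It also assumes that omitting metadata_objs means no per-object metadata for any document list.

```python
from itertools import repeat
from typing import Iterable, Iterator, List, Optional, Tuple


def pair_inputs(
    doc_lists: Iterable[List[dict]],
    metadata_objs: Optional[Iterable[Optional[dict]]] = None,
) -> Iterator[Tuple[List[dict], Optional[dict]]]:
    """Yield one (documents, metadata) pair per MMIF object to be built."""
    if metadata_objs is None:
        # Assumption: no metadata iterable means no extra metadata for every list.
        metadata_objs = repeat(None)
    # zip pairs the n-th document list with the n-th metadata object.
    yield from zip(doc_lists, metadata_objs)


doc_lists = [
    [{"id": "d1"}, {"id": "d2"}],  # documents for the first MMIF object
    [{"id": "d3"}],                # documents for the second MMIF object
]
metadata_objs = [{"batch": "a"}, None]  # the second object gets no extra metadata

pairs = list(pair_inputs(doc_lists, metadata_objs))
for docs, metadata in pairs:
    print([d["id"] for d in docs], metadata)
```

Note that `zip` stops at the shorter input, so in this sketch a metadata iterable shorter than doc_lists would silently drop the trailing document lists.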

mmif: mmif.serialize.mmif.Mmif

prime() → None

Primes the PipelineSource with a fresh MMIF object.

Call this method if you want to reset the PipelineSource without producing a MMIF object with produce().


produce() → mmif.serialize.mmif.Mmif

Returns the source MMIF and resets the PipelineSource.

Call this method once you have added all the documents for your pipeline.


Returns: the current MMIF object that has been prepared
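The add, produce, and reset cycle described above can be sketched with a small stand-in class. This is a hypothetical illustration, not the clams implementation: `ToySource` uses a plain dict as the working object, and it assumes that a reset restores the common documents supplied at construction.

```python
import copy
from typing import List, Optional


class ToySource:
    """Hypothetical stand-in mimicking the documented add/produce/reset cycle."""

    def __init__(self, common_documents: Optional[List[dict]] = None) -> None:
        self._common = list(common_documents or [])
        self.prime()

    def prime(self) -> None:
        # Start over with a fresh working object holding only the common documents.
        self.working = {"documents": copy.deepcopy(self._common)}

    def add_document(self, document: dict) -> None:
        self.working["documents"].append(document)

    def produce(self) -> dict:
        # Hand back the prepared object, then reset for the next one.
        produced = self.working
        self.prime()
        return produced


source = ToySource(common_documents=[{"id": "shared"}])
source.add_document({"id": "d1"})
first = source.produce()

source.add_document({"id": "d2"})
source.prime()  # discard the working object without producing it
source.add_document({"id": "d3"})
second = source.produce()

first_ids = [d["id"] for d in first["documents"]]
second_ids = [d["id"] for d in second["documents"]]
print(first_ids, second_ids)
```

In this sketch the document added before the explicit reset never appears in a produced object, which is the difference between resetting alone and producing.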