clams.source package

Module contents

class clams.source.PipelineSource(common_documents_json: Optional[List[Union[str, dict]]] = None, common_metadata_json: Optional[Union[str, dict]] = None)

Bases: object

The PipelineSource class.

A PipelineSource object is used at the beginning of a CLAMS pipeline to populate a new MMIF file with media.

The same PipelineSource object can be used repeatedly to generate multiple MMIF objects.

Parameters
  • common_documents_json – JSON strings or dicts describing any documents that should be common to all MMIF objects produced by this pipeline.

  • common_metadata_json – a JSON string or dict of metadata that should be common to all MMIF objects produced by this pipeline.
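
Per the type hints, each entry may be either a JSON string or an already-parsed dict. The sketch below shows the two forms side by side using only the standard library. The helper as_dict is hypothetical, and the field names and @type URI in the sample document are assumptions about the MMIF version in use, not values taken from this page.

```python
import json
from typing import List, Union


def as_dict(doc: Union[str, dict]) -> dict:
    """Return a document description as a dict, parsing it if it is a JSON string."""
    return json.loads(doc) if isinstance(doc, str) else doc


# Hypothetical document description; the @type URI and property names are assumptions.
video_doc = {
    "@type": "http://mmif.clams.ai/vocabulary/VideoDocument/v1",
    "properties": {
        "id": "d1",
        "mime": "video/mp4",
        "location": "file:///data/video.mp4",
    },
}

# The same list may mix dicts and JSON strings.
common_documents_json: List[Union[str, dict]] = [video_doc, json.dumps(video_doc)]

# Both forms describe the same document once parsed.
normalized = [as_dict(d) for d in common_documents_json]
```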

add_document(document: Union[str, dict, mmif.serialize.annotation.Document]) → None

Adds a document to the working source MMIF.

When you’re done, fetch the source MMIF with produce().

Parameters

document – the document to add, as a JSON dict or string or as a MMIF Document object

change_metadata(key: str, value)

Adds or changes a metadata entry in the working source MMIF.

Parameters
  • key – the desired key of the metadata property

  • value – the desired value of the metadata property

from_data(doc_lists: Iterable[List[Union[str, dict, mmif.serialize.annotation.Document]]], metadata_objs: Optional[Iterable[Optional[Union[str, dict, mmif.serialize.mmif.MmifMetadata]]]] = None) → Generator[mmif.serialize.mmif.Mmif, None, None]

Provided with an iterable of document lists and an optional iterable of metadata objects, generates MMIF objects produced from that data.

doc_lists and metadata_objs should be matched pairwise, so that if they are zipped together, each pair defines a single MMIF object from this pipeline source.

Parameters
  • doc_lists – an iterable of document lists to generate MMIF from

  • metadata_objs – an iterable of metadata objects paired with the document lists

Returns

a generator of MMIF objects produced from the data
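
The pairwise matching described above can be sketched with the standard library alone. The function pair_inputs is a hypothetical illustration of the pairing, not the library's code. It assumes that omitting metadata_objs means "no extra metadata for any object", and it uses plain zip, which stops at the shorter input. How the library treats inputs of different lengths is not stated here.

```python
import itertools
from typing import Any, Iterable, Iterator, List, Optional, Tuple


def pair_inputs(
    doc_lists: Iterable[List[Any]],
    metadata_objs: Optional[Iterable[Optional[Any]]] = None,
) -> Iterator[Tuple[List[Any], Optional[Any]]]:
    """Pair each document list with its metadata object (or None)."""
    if metadata_objs is None:
        # Assumption: no metadata iterable means None for every document list.
        metadata_objs = itertools.repeat(None)
    return zip(doc_lists, metadata_objs)


# Two MMIF objects' worth of input; file names are placeholders.
doc_lists = [["a.mp4"], ["b.mp4", "b.txt"]]
metadata_objs = [{"note": "first"}, None]

# Each pair holds everything needed for one MMIF object.
pairs = list(pair_inputs(doc_lists, metadata_objs))
pairs_without_metadata = list(pair_inputs(doc_lists))
```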

mmif: mmif.serialize.mmif.Mmif

prime() → None

Primes the PipelineSource with a fresh MMIF object.

Call this method if you want to reset the PipelineSource without producing a MMIF object with produce().

produce() → mmif.serialize.mmif.Mmif

Returns the source MMIF and resets the PipelineSource.

Call this method once you have added all the documents for your pipeline.

Returns

the current MMIF object that has been prepared
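
The add, produce, and reset cycle documented on this page can be modeled with a small toy class. ToySource is a hypothetical stand-in that uses plain dicts. It is not the clams implementation and does not build real MMIF objects. It assumes that priming re-seeds the working object with the common documents and metadata.

```python
import copy
from typing import Any, Dict, List, Optional


class ToySource:
    """Toy stand-in for the documented lifecycle; not the clams implementation."""

    def __init__(
        self,
        common_documents: Optional[List[dict]] = None,
        common_metadata: Optional[Dict[str, Any]] = None,
    ) -> None:
        self.common_documents = common_documents or []
        self.common_metadata = common_metadata or {}
        self.prime()

    def prime(self) -> None:
        # Start a fresh working object seeded with the common parts.
        self.working = {
            "metadata": copy.deepcopy(self.common_metadata),
            "documents": copy.deepcopy(self.common_documents),
        }

    def add_document(self, document: dict) -> None:
        self.working["documents"].append(document)

    def change_metadata(self, key: str, value: Any) -> None:
        self.working["metadata"][key] = value

    def produce(self) -> dict:
        # Hand back the prepared object, then reset for the next one.
        finished = self.working
        self.prime()
        return finished


source = ToySource(common_documents=[{"id": "shared"}])

source.add_document({"id": "d1"})
source.change_metadata("note", "first run")
first = source.produce()

# The same source is reused; only the common document carries over.
source.add_document({"id": "d2"})
second = source.produce()
```

In this model, calling prime() directly discards any documents added since the last produce() without returning anything, which mirrors the reset described for prime() above.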