Reference

Datasets

Dataset objects are the main entry point for user code in Quevedo. They provide methods to manage the dataset, and also to retrieve the other objects within it. Therefore, you don't usually need to create instances of other classes directly: the methods in the Dataset class return them already built.

For example:

from quevedo import Dataset, Target

ds = Dataset('path/to/the/dataset')

# annotation is of type quevedo.Grapheme, a subclass of quevedo.Annotation
annotation = ds.get_single(Target.GRAPH, 'subset', 32)
print(annotation.to_dict())

# net is of type quevedo.Network
net = ds.get_network('grapheme_classify')
net.auto_annotate(annotation)
annotation.save()

# creating a quevedo.Logogram (subclass of quevedo.Annotation)
new_a = ds.new_single(Target.LOGO, 'my_new_subset',
                      image_path='path/to/the/image',
                      graphemes=[
                          {'tags': ['character', 'letter_a'],
                           'box': [0.2, 0.6, 0.3, 0.3]},
                          {'tags': ['character', 'acute_accent'],
                           'box': [0.2, 0.2, 0.1, 0.2]}
                      ])
class

Dataset(path='.')

Class representing a Quevedo dataset.

It provides access to the annotations, subsets, and any neural networks contained.

Parameters
  • path (optional) the path to the dataset directory (existing or to be created)
Attributes
Methods
  • create() Create or initialize a directory to be a Quevedo dataset.
  • create_subset(target, name, existing) Creates the directory for a new subset.
  • get_annotations(target, subset) Get annotations from the dataset.
  • get_config(section, key) Get the configuration for a key under a section (a value in a table, e.g. [network.example], where network is the section and example is the key). This method looks for the "extend" key and merges configuration recursively.
  • get_network(name) Get a single neural network by name.
  • get_pipeline(name) Get a pipeline by name.
  • get_single(target, subset, id) Retrieve a single annotation.
  • get_subsets(target) Gets information about subsets in the dataset.
  • is_test(annotation) Checks if an annotation belongs to the test split.
  • is_train(annotation) Checks if an annotation belongs to the training split.
  • list_networks() Get a list of all neural networks for this dataset.
  • list_pipelines() Get a list of all pipelines for this dataset.
  • new_single(target, subset, **kwds) Create a new annotation.
method

get_config(section, key)

Get the configuration for a key under a section (a value in a table, e.g. [network.example], where network is the section and example is the key). This method looks for the "extend" key and merges configuration recursively.

Returns

dict
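The "extend" mechanism can be pictured with a plain-Python sketch (a hypothetical helper, not quevedo's implementation, which may merge more deeply):

```python
# Hypothetical sketch of "extend" resolution: an entry that names another
# entry via 'extend' inherits its values, with its own values winning.
def resolve_config(config, section, key):
    entry = dict(config[section][key])      # copy so we can pop safely
    base_name = entry.pop('extend', None)
    if base_name is not None:
        merged = dict(resolve_config(config, section, base_name))
        merged.update(entry)                # extending entry overrides
        return merged
    return entry

config = {'network': {
    'base': {'task': 'classify', 'epochs': 100},
    'example': {'extend': 'base', 'epochs': 200},
}}
resolve_config(config, 'network', 'example')
# {'task': 'classify', 'epochs': 200}
```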

method

create()

Create or initialize a directory to be a Quevedo dataset.

method

list_networks()

Get a list of all neural networks for this dataset.

Returns

list of Networks

method

get_network(name)

Get a single neural network by name.

Parameters
  • name name of the neural network as specified in the configuration file.
Returns

a Network object.

method

list_pipelines()

Get a list of all pipelines for this dataset.

Returns

list of Pipelines

method

get_pipeline(name)

Get a pipeline by name.

Parameters
  • name name of the pipeline as specified in the configuration file.
Returns

a Pipeline object.

method

get_single(target, subset, id)

Retrieve a single annotation.

Parameters
  • target (AnnotationTarget) Target (type) of the annotation to retrieve.
  • subset name of the subset where the annotation is stored.
  • id number of the annotation in the subset.
Returns

a single Annotation of the appropriate type.

method

new_single(target, subset, **kwds)

Create a new annotation.

This method creates the annotation files in the corresponding directory, and initializes them with create_from. Any extra arguments will be passed to that method.

Parameters
  • target (AnnotationTarget) Target (type) of the annotation to create.
  • subset name of the (existing) subset where to place it.
Returns

the new Annotation.

method

get_annotations(target=Target.GRAPH | Target.LOGO, subset=None)

Get annotations from the dataset.

Depending on the arguments, all annotations, those of a given target, or only those in a given subset (or subsets) and target will be selected.

Parameters
  • target (AnnotationTarget, optional) Target (type) of the annotations to retrieve. By default, it is the union of both types, so all annotations are retrieved: Target.GRAPH | Target.LOGO.
  • subset (optional) name of the subsets to get, or None to get annotations from all subsets.
Returns

a generator that yields selected annotations.

method

get_subsets(target)

Gets information about subsets in the dataset.

Parameters
  • target (AnnotationTarget) Target (type) of the annotation subsets.
Returns

a sorted list of dict, each with the keys name for the name of the subset, and count for the number of annotations in it.
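The return shape can be illustrated with a self-contained sketch (the directory layout and helper name are assumptions, not quevedo internals):

```python
from pathlib import Path

# Illustrative sketch of the return shape only: one dict per subset
# directory, sorted by name, counting the annotation images inside.
def list_subsets(root):
    return sorted(
        ({'name': d.name, 'count': len(list(d.glob('*.png')))}
         for d in Path(root).iterdir() if d.is_dir()),
        key=lambda s: s['name'])
```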

method

create_subset(target, name, existing='a')

Creates the directory for a new subset.

Parameters
  • target (AnnotationTarget) Target (type) of the annotation subset to create.
  • name name for the new subset.
  • existing (optional) controls behaviour when the directory already exists. It can be 'a' to abort (the default), 'r' to remove existing annotations, or 'm' (merge) to do nothing.
Returns

the path of the created directory.
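The three modes of the existing flag can be sketched in plain Python (a hypothetical helper, not quevedo's implementation):

```python
import shutil
from pathlib import Path

# Sketch of the 'a'bort / 'r'emove / 'm'erge behaviour described above.
def prepare_subset_dir(path, existing='a'):
    path = Path(path)
    if path.exists():
        if existing == 'a':
            raise FileExistsError(f"subset {path.name} already exists")
        if existing == 'r':
            shutil.rmtree(path)   # drop the existing annotations
        # 'm' (merge): keep the directory and its contents
    path.mkdir(parents=True, exist_ok=True)
    return path
```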

method

is_train(annotation)

Checks if an annotation belongs to the training split.

method

is_test(annotation)

Checks if an annotation belongs to the test split.

Annotations

Quevedo annotations are of two types, logograms and graphemes, both derived from the parent class Annotation. When it is necessary to distinguish logograms and graphemes in a process, there is the enum Target, which can take the values Target.GRAPH or Target.LOGO.

There is also the BoundGrapheme class, used to represent each of the graphemes which make up a logogram.
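Target behaves like a flag enum, so the two values can be combined into a union that selects both types; a minimal stand-in (illustrative only, not the real class):

```python
from enum import Flag, auto

# Illustrative stand-in for quevedo's Target; not the real class.
class Target(Flag):
    GRAPH = auto()
    LOGO = auto()

both = Target.GRAPH | Target.LOGO   # union selects both annotation types
Target.GRAPH in both
# True
```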

class

Annotation(path=None, image=None, **kwds)

Class representing a single annotation: either a logogram (of a sign or signs in the dataset) or an isolated grapheme.

Parameters
  • path (optional) the full path to the annotation files (either the source image or the tag dictionary, which should share path and filename but not extension; the annotation dictionary need not exist).
  • image (optional) either a file object or a PIL image to create a "path-less" annotation which lives in memory.
Attributes
  • id Number which identifies this annotation in its subset.
  • image (PIL.Image.Image) image data for this annotation.
  • image_path Path to the source image for the annotation. It is the id plus png extension.
  • json_path Path to the json annotation file. It is the id plus json extension.
  • meta Dictionary of metadata annotations.
Methods
  • create_from(image_path, binary_data, pil_image, **kwds) Initialize an annotation with some source image data.
  • save() Persist the information to the filesystem.
  • to_dict() Get the annotation data as a dictionary.
  • update(meta, fold, **kwds) Update the content of the annotation.
method

update(meta=None, fold=None, **kwds)

Update the content of the annotation.

This method should be overridden by the specific annotation classes to add their specific annotation information.

Parameters
  • meta (optional) dictionary of metadata values to set.
  • fold (optional) fold to which the annotation will belong.
method

to_dict()

Get the annotation data as a dictionary.

method

save()

Persist the information to the filesystem.

method

create_from(image_path=None, binary_data=None, pil_image=None, **kwds)

Initialize an annotation with some source image data.

One of image_path, binary_data or pil_image must be provided. Other arguments will be passed to update so metadata or tags can also be set.

The annotation will be persisted with a call to save too.

Parameters
  • image_path (optional) path to an image in the filesystem to use as image for this annotation. The image will be copied into the dataset.
  • binary_data (optional) bytes array which encodes the source image. The contents will be dumped to the appropriate image file in the dataset.
  • pil_image (optional) [PIL.Image.Image] object to be stored as image file for this annotation in the dataset.
Returns

self to allow chaining.
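The "exactly one source" contract can be sketched as follows (a hypothetical helper; the real create_from also accepts a PIL image and copies the data into the dataset):

```python
from pathlib import Path

# Sketch of the source dispatch only: exactly one of the two inputs
# must be given, and both end up as raw image bytes.
def read_image_source(image_path=None, binary_data=None):
    if (image_path is None) == (binary_data is None):
        raise ValueError("provide exactly one image source")
    if image_path is not None:
        return Path(image_path).read_bytes()
    return binary_data
```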

class

Grapheme(*args, **kwargs)

Bases
quevedo.annotation.annotation.Annotation

Annotation for an isolated grapheme.

Attributes
  • id Number which identifies this annotation in its subset.
  • image (PIL.Image.Image) image data for this annotation.
  • image_path Path to the source image for the annotation. It is the id plus png extension.
  • json_path Path to the json annotation file. It is the id plus json extension.
  • meta Dictionary of metadata annotations.
  • tags annotated tags for this grapheme.
Methods
  • create_from(image_path, binary_data, pil_image, **kwds) Initialize an annotation with some source image data.
  • save() Persist the information to the filesystem.
  • to_dict() Get the annotation data as a dictionary.
  • update(tags, **kwds) Extends base update, other arguments will be passed through.
method

save()

Persist the information to the filesystem.

method

create_from(image_path=None, binary_data=None, pil_image=None, **kwds)

Initialize an annotation with some source image data.

One of image_path, binary_data or pil_image must be provided. Other arguments will be passed to update so metadata or tags can also be set.

The annotation will be persisted with a call to save too.

Parameters
  • image_path (optional) path to an image in the filesystem to use as image for this annotation. The image will be copied into the dataset.
  • binary_data (optional) bytes array which encodes the source image. The contents will be dumped to the appropriate image file in the dataset.
  • pil_image (optional) [PIL.Image.Image] object to be stored as image file for this annotation in the dataset.
Returns

self to allow chaining.

method

update(tags=None, **kwds)

Extends base update, other arguments will be passed through.

Parameters
  • tags (optional) new tags for this grapheme (replaces all).
method

to_dict()

Get the annotation data as a dictionary.

class

Logogram(*args, **kwargs)

Bases
quevedo.annotation.annotation.Annotation

Annotation for a logogram, with its contained graphemes.

Attributes
  • edges list of edges found within this logogram.
  • graphemes list of bound graphemes found within this logogram.
  • id Number which identifies this annotation in its subset.
  • image (PIL.Image.Image) image data for this annotation.
  • image_path Path to the source image for the annotation. It is the id plus png extension.
  • json_path Path to the json annotation file. It is the id plus json extension.
  • meta Dictionary of metadata annotations.
  • tags annotated tags for this logogram.
Methods
  • create_from(image_path, binary_data, pil_image, **kwds) Initialize an annotation with some source image data.
  • save() Persist the information to the filesystem.
  • to_dict() Get the annotation data as a dictionary.
  • update(tags, graphemes, edges, **kwds) Extends base update, other arguments will be passed through.
method

save()

Persist the information to the filesystem.

method

create_from(image_path=None, binary_data=None, pil_image=None, **kwds)

Initialize an annotation with some source image data.

One of image_path, binary_data or pil_image must be provided. Other arguments will be passed to update so metadata or tags can also be set.

The annotation will be persisted with a call to save too.

Parameters
  • image_path (optional) path to an image in the filesystem to use as image for this annotation. The image will be copied into the dataset.
  • binary_data (optional) bytes array which encodes the source image. The contents will be dumped to the appropriate image file in the dataset.
  • pil_image (optional) [PIL.Image.Image] object to be stored as image file for this annotation in the dataset.
Returns

self to allow chaining.

method

update(tags=None, graphemes=None, edges=None, **kwds)

Extends base update, other arguments will be passed through.

Parameters
  • tags (optional) new tags for this logogram (replaces all).
  • graphemes (optional) either a list of Graphemes, BoundGraphemes, or dicts with the keys necessary to initialize a BoundGrapheme.
  • edges (optional) either a list of Edges, or dicts with the keys necessary to initialize an Edge. In this case, start and end should be the indices of the BoundGraphemes in the graphemes list.
method

to_dict()

Get the annotation data as a dictionary.

class

BoundGrapheme(logogram, box=[0, 0, 0, 0], *args, **kwargs)

Bases
quevedo.annotation.grapheme.Grapheme quevedo.annotation.annotation.Annotation

A grapheme which is not isolated, but rather forms part of a logogram.

To promote this bound grapheme to an isolated grapheme with its own annotation, create a grapheme object using create_from, passing this object's image as the pil_image argument.

Attributes
  • box (list[float]) Bounding box coordinates (x, y, w, h) of this grapheme within the logogram.
    • (x, y): coordinates of the center.
    • (w, h): width and height.
    Values are relative to the logogram size, in the range [0, 1].
  • id Number which identifies this annotation in its subset.
  • image (PIL.Image.Image) image data for only this grapheme, cropped out of the parent logogram's image.
  • image_path Path to the source image for the annotation. It is the id plus png extension.
  • json_path Path to the json annotation file. It is the id plus json extension.
  • logogram Logogram where this grapheme is found.
  • meta Dictionary of metadata annotations.
  • tags annotated tags for this grapheme.
Methods
  • create_from(image_path, binary_data, pil_image, **kwds) Initialize an annotation with some source image data.
  • inbound() Generator[Edge,None,None]: edges in the logogram ending in this grapheme.
  • outbound() Generator[Edge,None,None]: edges in the logogram emanating from this grapheme.
  • save() Persist the information to the filesystem.
  • to_dict() Get the annotation data as a dictionary.
  • update(tags, **kwds) Extends base update, other arguments will be passed through.
method

save()

Persist the information to the filesystem.

method

create_from(image_path=None, binary_data=None, pil_image=None, **kwds)

Initialize an annotation with some source image data.

One of image_path, binary_data or pil_image must be provided. Other arguments will be passed to update so metadata or tags can also be set.

The annotation will be persisted with a call to save too.

Parameters
  • image_path (optional) path to an image in the filesystem to use as image for this annotation. The image will be copied into the dataset.
  • binary_data (optional) bytes array which encodes the source image. The contents will be dumped to the appropriate image file in the dataset.
  • pil_image (optional) [PIL.Image.Image] object to be stored as image file for this annotation in the dataset.
Returns

self to allow chaining.

method

update(tags=None, **kwds)

Extends base update, other arguments will be passed through.

Parameters
  • tags (optional) new tags for this grapheme (replaces all).
method

to_dict()

Get the annotation data as a dictionary.

method

outbound()

Generator[Edge,None,None]: edges in the logogram emanating from this grapheme.

method

inbound()

Generator[Edge,None,None]: edges in the logogram ending in this grapheme.
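The relative box coordinates described above can be turned into absolute pixel corners with a short helper (hypothetical name, shown for illustration):

```python
# box is (x, y, w, h): center coordinates plus width and height, all
# relative to the logogram size (range [0, 1]).
def box_to_pixels(box, img_width, img_height):
    x, y, w, h = box
    return ((x - w / 2) * img_width,  (y - h / 2) * img_height,
            (x + w / 2) * img_width,  (y + h / 2) * img_height)

box_to_pixels([0.5, 0.5, 0.2, 0.4], 100, 200)
# (40.0, 60.0, 60.0, 140.0)
```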

class

Edge(start, end, tags={})

An edge between graphemes in a logogram.

Edges are used to connect two graphemes, and can be used to define the dependency or function between them. The edges and graphemes of a logogram form a directed graph. The tags for an edge are a dictionary with keys in the dataset's e_tags field.
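Since edges connect graphemes into a directed graph, traversal can be sketched with plain index pairs (illustrative helper name):

```python
# Sketch: edges given as (start, end) index pairs, as accepted by
# Logogram.update, form a directed graph over the graphemes list.
def outbound_indices(edges, node):
    return [end for start, end in edges if start == node]

edges = [(0, 1), (0, 2), (1, 2)]
outbound_indices(edges, 0)
# [1, 2]
```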

Attributes
  • start grapheme where this edge starts.
  • end grapheme where this edge ends.
  • tags dictionary of tags for this edge, with keys from the dataset's e_tags field.

Networks

Network objects in Quevedo represent not only the network itself, but also its configuration, training and testing process, and use. There are two types of networks, detector networks and classifier networks, which work on logograms and graphemes respectively.

The Network class documented here is a base class that defines general behaviour, but the code specific to each type of network lives in its own subclass. Therefore, you should get networks with the get_network method of a Quevedo dataset so that the proper instance is built.

class

Network(dataset, name, config)

Class representing a neural net to train and predict logograms or graphemes.

Attributes
  • config Configuration dictionary
  • dataset Parent dataset
  • get_tag function to get the relevant label for the network from a list of tags according to g_tags
  • name Name of the network
  • path Path to the network directory
  • prediction_to_tag function to get the g_tags values from the tag/label/class predicted by the network
Methods
  • auto_annotate(annotation) Use the network to automatically annotate a real instance.
  • get_annotations(test) Get the annotations configured for use with this network.
  • is_prepared() Checks whether the neural network configuration files have been made.
  • is_trained() Checks whether the neural network has been trained and can be used to predict.
  • predict(image_path) Use the trained neural network to predict results from an image.
  • prepare() Creates the files needed for training (and later using) darknet.
  • test(annotation, stats) Method to test the network on an annotation.
  • train(initial) Trains the neural network.
method

is_prepared()

Checks whether the neural network configuration files have been made.

method

is_trained()

Checks whether the neural network has been trained and can be used to predict.

method

get_annotations(test=False)

Get the annotations configured for use with this network.

Parameters
  • test (optional) get test annotations instead of train
Returns

a list of relevant Annotations.

method

prepare()

Creates the files needed for training (and later using) darknet.

Stores the files in the network directory so they can be reused or tracked by a version control system. Must be called before training, and the files must not be deleted (except maybe the "train" directory) before testing or predicting with the net.

method

train(initial=None)

Trains the neural network.

When finished, removes partial weights and keeps only the last. Can be interrupted and optionally resumed later.

Parameters
  • initial (optional) path to the weights from which to resume training.
method

predict(image_path)

Use the trained neural network to predict results from an image.

Parameters
  • image_path path to the image in the file system.
Returns

a list of dictionaries with the predictions. Each prediction has a confidence value. For classify networks, each entry is a possible classification of the image, with a tag key for the predicted class. For detector networks, each entry is a possible grapheme found, with a name (predicted class) and a box (bounding box).
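For example, the highest-confidence classification can be picked from the returned list (a sketch; the dict keys follow the description above):

```python
# Sketch of consuming predict() results from a classify network.
def best_classification(predictions):
    return max(predictions, key=lambda p: p['confidence'])

preds = [{'tag': 'letter_a', 'confidence': 0.91},
         {'tag': 'letter_o', 'confidence': 0.07}]
best_classification(preds)['tag']
# 'letter_a'
```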

method

test(annotation, stats)

Method to test the network on an annotation.

This method is intended for internal library use; you probably want to use predict instead.

Uses the network to get the prediction for a real annotation, compare results and update stats. See test.py for stats.

method

auto_annotate(annotation)

Use the network to automatically annotate a real instance.

In detector networks, existing bound graphemes will be removed. In classify networks, tags which are not relevant to this network won't be modified, so it can be used incrementally.

Parameters
  • annotation Annotation to automatically tag using this network's predictions.

Pipelines

Pipeline objects in Quevedo can often act as substitutes for networks, performing the same task but using a more complex system of networks and steps. There are a number of types of pipelines, each represented by a different class in the code, with a base class that serves as the common interface. To get a configured pipeline from a dataset, use its get_pipeline method.

class

Pipeline(dataset, name, config)

A pipeline combines networks and logical steps into a computational graph that can be used for the task of detection or classification of logograms or graphemes.

Methods
  • predict(image_path) (Annotation) Run the pipeline on the given image and return the resulting annotation.
  • run(a) Run the pipeline on the given annotation.
staticmethod

run(a)

Run the pipeline on the given annotation.

Parameters
  • a (Annotation) Annotation to run the pipeline on.
method

predict(image_path)

Run the pipeline on the given image and return the resulting annotation.

Parameters
  • image_path (str) Path to the image to run the pipeline on.
Returns (Annotation)

The resulting annotation.

class

NetworkPipeline(dataset, name, config)

Bases
quevedo.pipeline.Pipeline

A pipeline step that runs a network on the given annotation.

Methods
  • predict(image_path) (Annotation) Run the pipeline on the given image and return the resulting annotation.
  • run(a) Run the pipeline on the given annotation.
method

predict(image_path)

Run the pipeline on the given image and return the resulting annotation.

Parameters
  • image_path (str) Path to the image to run the pipeline on.
Returns (Annotation)

The resulting annotation.

method

run(a)

Run the pipeline on the given annotation.

Parameters
  • a (Annotation) Annotation to run the pipeline on.
class

LogogramPipeline(dataset, name, config)

Bases
quevedo.pipeline.Pipeline

A pipeline for detecting graphemes within a logogram and then classifying them, using networks or sub-pipelines.

Methods
  • predict(image_path) (Annotation) Run the pipeline on the given image and return the resulting annotation.
  • run(a) Run the pipeline on the given annotation.
method

predict(image_path)

Run the pipeline on the given image and return the resulting annotation.

Parameters
  • image_path (str) Path to the image to run the pipeline on.
Returns (Annotation)

The resulting annotation.

method

run(a)

Run the pipeline on the given annotation.

Parameters
  • a (Annotation) Annotation to run the pipeline on.
class

SequencePipeline(dataset, name, config)

Bases
quevedo.pipeline.Pipeline

A pipeline that runs a sequence of other pipelines. All steps should have the same target.

Methods
  • predict(image_path) (Annotation) Run the pipeline on the given image and return the resulting annotation.
  • run(a) Run the pipeline on the given annotation.
method

predict(image_path)

Run the pipeline on the given image and return the resulting annotation.

Parameters
  • image_path (str) Path to the image to run the pipeline on.
Returns (Annotation)

The resulting annotation.

method

run(a)

Run the pipeline on the given annotation.

Parameters
  • a (Annotation) Annotation to run the pipeline on.
class

BranchPipeline(dataset, name, config)

Bases
quevedo.pipeline.Pipeline

A pipeline that runs one of many possible branches depending on a criterion. The criterion can be:

  • a tag name: the pipeline will run the branch corresponding to the tag value. Can also be a meta tag.
  • a lambda expression: the pipeline will run the branch corresponding to the result of the lambda expression, which will receive the annotation as parameter.

All branches should have the same target.
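The two criterion forms can be sketched in plain Python (hypothetical helper; the annotation's tag access is simplified to a dict):

```python
# Hypothetical sketch of criterion dispatch for a branching pipeline.
def select_branch(branches, criterion, annotation):
    if callable(criterion):                  # lambda criterion
        key = criterion(annotation)
    else:                                    # tag (or meta tag) name
        key = annotation['tags'][criterion]
    return branches[key]

annotation = {'tags': {'script': 'latin'}}
branches = {'latin': 'latin_branch', 'other': 'fallback_branch'}
select_branch(branches, 'script', annotation)
# 'latin_branch'
```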

Methods
  • predict(image_path) (Annotation) Run the pipeline on the given image and return the resulting annotation.
  • run(a) Run the pipeline on the given annotation.
method

predict(image_path)

Run the pipeline on the given image and return the resulting annotation.

Parameters
  • image_path (str) Path to the image to run the pipeline on.
Returns (Annotation)

The resulting annotation.

method

run(a)

Run the pipeline on the given annotation.

Parameters
  • a (Annotation) Annotation to run the pipeline on.
class

FunctionPipeline(dataset, name, config)

Bases
quevedo.pipeline.Pipeline

A pipeline that runs a user-defined function.

The config should be a string in the form 'module.py:function'. 'module.py' should be a file in the scripts directory of the dataset, and 'function' should be the name of a function in that file, that accepts an annotation and a dataset and returns nothing. Alternatively, a string containing a lambda function, receiving the same arguments, can be used.

The target of this pipeline is deduced from the signature of the function. This is often inconsequential, but if this pipeline is the first step of a sequence or branching pipeline, its target will be the target for the whole pipeline. To ensure correct deduction, use a type annotation of Logogram or Grapheme for the second parameter.
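Parsing the two config forms could look like this (a sketch; loading the function from the dataset's scripts directory is omitted, and the helper name is hypothetical):

```python
# Sketch of distinguishing the 'module.py:function' form from the
# lambda form; eval is used only for the lambda string, as an example.
def parse_function_config(config):
    if config.lstrip().startswith('lambda'):
        return eval(config)          # a callable taking the same arguments
    module, _, function = config.partition(':')
    return module, function          # file name and function name to load

parse_function_config('clean.py:normalize')
# ('clean.py', 'normalize')
```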

Methods
  • predict(image_path) (Annotation) Run the pipeline on the given image and return the resulting annotation.
  • run(a) Run the pipeline on the given annotation.
method

predict(image_path)

Run the pipeline on the given image and return the resulting annotation.

Parameters
  • image_path (str) Path to the image to run the pipeline on.
Returns (Annotation)

The resulting annotation.

method

run(a)

Run the pipeline on the given annotation.

Parameters
  • a (Annotation) Annotation to run the pipeline on.