Reference¶
Datasets¶
Dataset objects are the main entry point for user code in Quevedo. They provide methods to manage the dataset, but also to retrieve other objects within it. Therefore, you don't usually need to create instances of other objects directly, but rather use the methods in the Dataset class to get them already built.
For example:
from quevedo import Dataset, Target
ds = Dataset('path/to/the/dataset')
# annotation is of type quevedo.Grapheme, a subclass of quevedo.Annotation
annotation = ds.get_single(Target.GRAPH, 'subset', 32)
print(annotation.to_dict())
# net is of type quevedo.Network
net = ds.get_network('grapheme_classify')
net.auto_annotate(annotation)
annotation.save()
# creating a quevedo.Logogram (subclass of quevedo.Annotation)
new_a = ds.new_single(Target.LOGO, 'my_new_subset',
                      image_path='path/to/the/image',
                      graphemes=[
                          {'tags': ['character', 'letter_a'],
                           'box': [0.2, 0.6, 0.3, 0.3]},
                          {'tags': ['character', 'acute_accent'],
                           'box': [0.2, 0.2, 0.1, 0.2]}
                      ])
Dataset(path='.')

Class representing a Quevedo dataset.
It provides access to the annotations, subsets, and any neural networks contained.

Parameters:
path (optional) — the path to the dataset directory (existing or to be created).

Attributes:
config (dict) — Dataset configuration.
path (pathlib.Path) — Path to the dataset directory.
Methods:
create() — Create or initialize a directory to be a Quevedo dataset.
create_subset(target, name, existing) — Creates the directory for a new subset.
get_annotations(target, subset) — Get annotations from the dataset.
get_config(section, key) — Get the configuration for a key under a section (a value in a table, e.g. [network.example], where network is the section and example is the key). This method looks for the "extend" key and merges configuration recursively.
get_network(name) — Get a single neural network by name.
get_pipeline(name) — Get a pipeline by name.
get_single(target, subset, id) — Retrieve a single annotation.
get_subsets(target) — Gets information about subsets in the dataset.
is_test(annotation) — Checks if an annotation belongs to the test split.
is_train(annotation) — Checks if an annotation belongs to the training split.
list_networks() — Get a list of all neural networks for this dataset.
list_pipelines() — Get a list of all pipelines for this dataset.
new_single(target, subset, **kwds) — Create a new annotation.
get_config(section, key)

Get the configuration for a key under a section (a value in a table, e.g. [network.example], where network is the section and example is the key). This method looks for the "extend" key and merges configuration recursively.

Returns: dict
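For instance, a minimal sketch (the network name 'grapheme_classify' is hypothetical and assumed to exist as a [network.grapheme_classify] table in the dataset configuration):

# Merged configuration for the table [network.grapheme_classify]
net_conf = ds.get_config('network', 'grapheme_classify')
print(net_conf)  # a plain dict, with any "extend"ed values merged in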
create()

Create or initialize a directory to be a Quevedo dataset.

list_networks()

Get a list of all neural networks for this dataset.

Returns: list of Networks.
get_network(name)

Get a single neural network by name.

Parameters:
name — name of the neural network as specified in the configuration file.

Returns: a Network object.
list_pipelines()

Get a list of all pipelines for this dataset.

Returns: list of Pipelines.
get_pipeline(name)

Get a pipeline by name.

Parameters:
name — name of the pipeline as specified in the configuration file.

Returns: a Pipeline object.
get_single(target, subset, id)

Retrieve a single annotation.

Parameters:
target (AnnotationTarget) — Target (type) of the annotation to retrieve.
subset — name of the subset where the annotation is stored.
id — number of the annotation in the subset.

Returns: a single Annotation of the appropriate type.
new_single(target, subset, **kwds)

Create a new annotation.
This method creates the annotation files in the corresponding directory, and initializes them with create_from. Any extra arguments will be passed to that method.

Parameters:
target (AnnotationTarget) — Target (type) of the annotation to create.
subset — name of the (existing) subset where to place it.

Returns: the new Annotation.
get_annotations(target=Target.GRAPH | Target.LOGO, subset=None)

Get annotations from the dataset.
Depending on the arguments, all annotations, those of a given target, or only those in a given subset (or subsets) and target will be selected.

Parameters:
target (AnnotationTarget, optional) — Target (type) of the annotations to retrieve. By default, it is the union of both types, so all annotations are retrieved: Target.GRAPH | Target.LOGO.
subset (optional) — name of the subsets to get, or None to get annotations from all subsets.

Returns: a generator that yields the selected annotations.
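A minimal usage sketch (the subset name 'letters' is hypothetical):

# All graphemes in one subset
for a in ds.get_annotations(Target.GRAPH, subset='letters'):
    print(a.id, a.to_dict())

# Every annotation in the dataset, regardless of target or subset
all_annotations = list(ds.get_annotations())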
get_subsets(target)

Gets information about subsets in the dataset.

Parameters:
target (AnnotationTarget) — Target (type) of the annotation subsets.

Returns: a sorted list of dicts, each with the key name for the name of the subset, and count for the number of annotations in it.
create_subset(target, name, existing='a')

Creates the directory for a new subset.

Parameters:
target (AnnotationTarget) — Target (type) of the annotation subset to create.
name — name for the new subset.
existing (optional) — controls behaviour when the directory already exists. It can be 'a' to abort (the default), 'r' to remove existing annotations, or 'm' (merge) to do nothing.

Returns: the path of the created directory.
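A short sketch (the subset name is hypothetical):

# Create a grapheme subset, reusing the directory if it already exists
subset_path = ds.create_subset(Target.GRAPH, 'extracted_graphemes', existing='m')
print(subset_path)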
is_train(annotation)

Checks if an annotation belongs to the training split.

is_test(annotation)

Checks if an annotation belongs to the test split.
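For example, to split the graphemes of the dataset according to the configured folds:

graphemes = list(ds.get_annotations(Target.GRAPH))
train = [a for a in graphemes if ds.is_train(a)]
test = [a for a in graphemes if ds.is_test(a)]
print(len(train), 'train /', len(test), 'test')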
Annotations¶
Quevedo annotations are of two types, logograms and graphemes, both derived from the parent class Annotation. When it is necessary to distinguish logograms and graphemes in a process, there is the enum Target, which can take the values Target.GRAPH or Target.LOGO.
There is also the BoundGrapheme class, used to represent each of the graphemes which make up a logogram.
Annotation(path=None, image=None, **kwds)

Class representing a single annotation of either a logogram (a sign or signs in the dataset) or an isolated grapheme.

Parameters:
path (optional) — the full path to the annotation files (either the source image or the tag dictionary), which should share path and filename but not extension (the annotation dictionary need not exist).
image (optional) — either a file object or a PIL image to create a "path-less" annotation which lives in memory.

Attributes:
id — Number which identifies this annotation in its subset.
image (PIL.Image.Image) — image data for this annotation.
image_path — Path to the source image for the annotation. It is the id plus the png extension.
json_path — Path to the json annotation file. It is the id plus the json extension.
meta — Dictionary of metadata annotations.
Methods:
create_from(image_path, binary_data, pil_image, **kwds) — Initialize an annotation with some source image data.
save() — Persist the information to the filesystem.
to_dict() — Get the annotation data as a dictionary.
update(meta, fold, **kwds) — Update the content of the annotation.
update(meta=None, fold=None, **kwds)

Update the content of the annotation.
This method should be overridden by the specific annotation classes to add their specific annotation information.

Parameters:
meta (optional) — dictionary of metadata values to set.
fold (optional) — fold to which the annotation will belong.
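Continuing the introductory example, a sketch (the metadata key and fold value are hypothetical and depend on the dataset's configuration):

annotation.update(meta={'annotator': 'alice'}, fold=0)
annotation.save()  # persist the change to the filesystem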
to_dict()

Get the annotation data as a dictionary.

save()

Persist the information to the filesystem.
create_from(image_path=None, binary_data=None, pil_image=None, **kwds)

Initialize an annotation with some source image data.
One of image_path, binary_data or pil_image must be provided. Other arguments will be passed to update, so metadata or tags can also be set. The annotation will also be persisted with a call to save.

Parameters:
image_path (optional) — path to an image in the filesystem to use as image for this annotation. The image will be copied into the dataset.
binary_data (optional) — bytes array which encodes the source image. The contents will be dumped to the appropriate image file in the dataset.
pil_image (optional) — PIL.Image.Image object to be stored as the image file for this annotation in the dataset.

Returns: self to allow chaining.
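As a sketch, create_from is most often reached through Dataset.new_single, which forwards its extra arguments here (the subset and tag names are hypothetical; requires Pillow):

from PIL import Image

img = Image.open('path/to/source.png')
g = ds.new_single(Target.GRAPH, 'letters',
                  pil_image=img,
                  tags=['character', 'letter_a'])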
Grapheme(*args, **kwargs)

Annotation for an isolated grapheme.

Attributes:
id — Number which identifies this annotation in its subset.
image (PIL.Image.Image) — image data for this annotation.
image_path — Path to the source image for the annotation. It is the id plus the png extension.
json_path — Path to the json annotation file. It is the id plus the json extension.
meta — Dictionary of metadata annotations.
tags — annotated tags for this grapheme.

Methods:
create_from(image_path, binary_data, pil_image, **kwds) — Initialize an annotation with some source image data.
save() — Persist the information to the filesystem.
to_dict() — Get the annotation data as a dictionary.
update(tags, **kwds) — Extends base update; other arguments will be passed through.
save()

Persist the information to the filesystem.

create_from(image_path=None, binary_data=None, pil_image=None, **kwds)

Initialize an annotation with some source image data.
One of image_path, binary_data or pil_image must be provided. Other arguments will be passed to update, so metadata or tags can also be set. The annotation will also be persisted with a call to save.

Parameters:
image_path (optional) — path to an image in the filesystem to use as image for this annotation. The image will be copied into the dataset.
binary_data (optional) — bytes array which encodes the source image. The contents will be dumped to the appropriate image file in the dataset.
pil_image (optional) — PIL.Image.Image object to be stored as the image file for this annotation in the dataset.

Returns: self to allow chaining.

to_dict()

Get the annotation data as a dictionary.
Logogram(*args, **kwargs)

Annotation for a logogram, with its contained graphemes.

Attributes:
edges — list of edges found within this logogram.
graphemes — list of bound graphemes found within this logogram.
id — Number which identifies this annotation in its subset.
image (PIL.Image.Image) — image data for this annotation.
image_path — Path to the source image for the annotation. It is the id plus the png extension.
json_path — Path to the json annotation file. It is the id plus the json extension.
meta — Dictionary of metadata annotations.
tags — annotated tags for this logogram.

Methods:
create_from(image_path, binary_data, pil_image, **kwds) — Initialize an annotation with some source image data.
save() — Persist the information to the filesystem.
to_dict() — Get the annotation data as a dictionary.
update(tags, graphemes, edges, **kwds) — Extends base update; other arguments will be passed through.
save()

Persist the information to the filesystem.

create_from(image_path=None, binary_data=None, pil_image=None, **kwds)

Initialize an annotation with some source image data.
One of image_path, binary_data or pil_image must be provided. Other arguments will be passed to update, so metadata or tags can also be set. The annotation will also be persisted with a call to save.

Parameters:
image_path (optional) — path to an image in the filesystem to use as image for this annotation. The image will be copied into the dataset.
binary_data (optional) — bytes array which encodes the source image. The contents will be dumped to the appropriate image file in the dataset.
pil_image (optional) — PIL.Image.Image object to be stored as the image file for this annotation in the dataset.

Returns: self to allow chaining.
update(tags=None, graphemes=None, edges=None, **kwds)

Extends base update; other arguments will be passed through.

Parameters:
tags (optional) — new tags for this logogram (replaces all).
graphemes (optional) — either a list of Graphemes, BoundGraphemes, or dicts with the keys necessary to initialize a BoundGrapheme.
edges (optional) — either a list of Edges, or dicts with the keys necessary to initialize an Edge. In this case, start and end should be the indices of the bound graphemes in the graphemes list.
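A sketch of annotating a logogram with two graphemes and one edge between them (the subset name, tag names and values are hypothetical and depend on the dataset's g_tags/e_tags configuration):

logogram = ds.get_single(Target.LOGO, 'my_new_subset', 0)
logogram.update(
    graphemes=[
        {'tags': ['character', 'letter_a'], 'box': [0.3, 0.5, 0.2, 0.4]},
        {'tags': ['diacritic'], 'box': [0.3, 0.2, 0.1, 0.1]},
    ],
    edges=[
        # start and end are indices into the graphemes list above
        {'start': 0, 'end': 1, 'tags': {'function': 'modifier'}},
    ])
logogram.save()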
to_dict()

Get the annotation data as a dictionary.
BoundGrapheme(logogram, box=[0, 0, 0, 0], *args, **kwargs)

A grapheme which is not isolated, but rather forms part of a logogram.
To promote a bound grapheme to an isolated grapheme with its own annotation, create a grapheme object using create_from, passing this object's image to the argument pil_image.
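Following that recipe, a minimal sketch of promoting the first bound grapheme of a logogram (the subset names are hypothetical):

logogram = ds.get_single(Target.LOGO, 'my_new_subset', 0)
bg = logogram.graphemes[0]
isolated = ds.new_single(Target.GRAPH, 'extracted_graphemes',
                         pil_image=bg.image,  # cropped from the parent logogram
                         tags=bg.tags)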
Attributes:
box (list[float]) — Bounding box coordinates (x, y, w, h) of this grapheme within the logogram, where (x, y) are the coordinates of the center and (w, h) are the width and height. Values are relative to the logogram image, in the range [0, 1].
id — Number which identifies this annotation in its subset.
image (PIL.Image.Image) — image data for only this grapheme, cropped out of the parent logogram's image.
image_path — Path to the source image for the annotation. It is the id plus the png extension.
json_path — Path to the json annotation file. It is the id plus the json extension.
logogram — Logogram where this grapheme is found.
meta — Dictionary of metadata annotations.
tags — annotated tags for this grapheme.
Methods:
create_from(image_path, binary_data, pil_image, **kwds) — Initialize an annotation with some source image data.
inbound() — Generator[Edge, None, None]: edges in the logogram ending in this grapheme.
outbound() — Generator[Edge, None, None]: edges in the logogram emanating from this grapheme.
save() — Persist the information to the filesystem.
to_dict() — Get the annotation data as a dictionary.
update(tags, **kwds) — Extends base update; other arguments will be passed through.
save()

Persist the information to the filesystem.

create_from(image_path=None, binary_data=None, pil_image=None, **kwds)

Initialize an annotation with some source image data.
One of image_path, binary_data or pil_image must be provided. Other arguments will be passed to update, so metadata or tags can also be set. The annotation will also be persisted with a call to save.

Parameters:
image_path (optional) — path to an image in the filesystem to use as image for this annotation. The image will be copied into the dataset.
binary_data (optional) — bytes array which encodes the source image. The contents will be dumped to the appropriate image file in the dataset.
pil_image (optional) — PIL.Image.Image object to be stored as the image file for this annotation in the dataset.

Returns: self to allow chaining.

to_dict()

Get the annotation data as a dictionary.

outbound()

Generator[Edge, None, None]: edges in the logogram emanating from this grapheme.

inbound()

Generator[Edge, None, None]: edges in the logogram ending in this grapheme.
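For example, to walk the directed graph of a logogram outward and inward from one bound grapheme:

bg = logogram.graphemes[0]
for edge in bg.outbound():
    print('->', edge.end.tags, edge.tags)
for edge in bg.inbound():
    print('<-', edge.start.tags, edge.tags)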
Edge(start, end, tags={})

An edge between graphemes in a logogram.
Edges are used to connect two graphemes, and can be used to define the dependency or function between them. The edges and graphemes of a logogram form a directed graph. The tags for an edge are a dictionary with keys in the dataset's e_tags field.

Attributes:
start — bound grapheme at the origin of the edge.
end — bound grapheme at the end of the edge.
tags — annotated tags for this edge.
Networks¶
Network objects in Quevedo represent the network itself, as well as its configuration, its training and testing process, and its use. There are two types of networks, Detector networks and Classifier networks, which work on logograms and graphemes respectively.
The Network class documented here is a base class that defines general behaviour, but code specific to each type of network lives in its own class. Therefore, you should get the network from a Quevedo dataset's get_network method so that the proper instance is built.
Network(dataset, name, config)

Class representing a neural net to train and predict logograms or graphemes.

Attributes:
config — Configuration dictionary.
dataset — Parent dataset.
get_tag — function to get the relevant label for the network from a list of tags, according to g_tags.
name — Name of the network.
path — Path to the network directory.
prediction_to_tag — function to get the g_tags values from the tag/label/class predicted by the network.
Methods:
auto_annotate(annotation) — Use the network to automatically annotate a real instance.
get_annotations(test) — Get the annotations configured for use with this network.
is_prepared() — Checks whether the neural network configuration files have been made.
is_trained() — Checks whether the neural network has been trained and can be used to predict.
predict(image_path) — Use the trained neural network to predict results from an image.
prepare() — Creates the files needed for training (and later using) darknet.
test(annotation, stats) — Method to test the network on an annotation.
train(initial) — Trains the neural network.
is_prepared()

Checks whether the neural network configuration files have been made.

is_trained()

Checks whether the neural network has been trained and can be used to predict.

get_annotations(test=False)

Get the annotations configured for use with this network.

Parameters:
test (optional) — get test annotations instead of train ones.

Returns: a list of relevant Annotations.
prepare()

Creates the files needed for training (and later using) darknet.
Stores the files in the network directory so they can be reused or tracked by a version control system. It must be called before training, and the files must not be deleted (except perhaps the "train" directory) before testing or predicting with the net.

train(initial=None)

Trains the neural network.
When finished, removes partial weights and keeps only the last. Training can be interrupted and optionally resumed later.

Parameters:
initial (optional) — path to the weights from which to resume training.
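A typical flow, as a sketch (the network name and the weights path are hypothetical placeholders):

net = ds.get_network('grapheme_classify')
if not net.is_prepared():
    net.prepare()
net.train()
# If training was interrupted, it can be resumed from partial weights:
# net.train(initial='path/to/partial.weights')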
predict(image_path)

Use the trained neural network to predict results from an image.

Parameters:
image_path — path to the image in the file system.

Returns: a list of dictionaries with the predictions. Each prediction has a confidence value. Classify networks have a tag key for the predicted class, and each entry is a possible classification of the image. Detector network results are the possible graphemes found, each with a name (predicted class) and a box (bounding box).
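Continuing the previous sketch, the predictions can be iterated directly (the image path is a placeholder):

for p in net.predict('path/to/image.png'):
    # classify networks: p['tag'] and p['confidence']
    # detector networks: p['name'], p['box'] and p['confidence']
    print(p)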
test(annotation, stats)

Method to test the network on an annotation.
This method is intended for internal library use; you probably want to use predict instead.
Uses the network to get the prediction for a real annotation, compares the results, and updates stats. See test.py for stats.
auto_annotate(annotation)

Use the network to automatically annotate a real instance.
In detector networks, existing bound graphemes will be removed. In classify networks, tags which are not relevant to this network won't be modified, so it can be used incrementally.

Parameters:
annotation — Annotation to automatically tag using this network's predictions.
Pipelines¶
Pipeline objects in Quevedo can often act as substitutes for networks, performing the same task but using a more complex system of networks and steps. There are a number of types of pipelines, each represented by a different class in the code, with a base class serving as the common interface. To get a configured pipeline from a dataset, use its get_pipeline method.
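As a sketch (the pipeline and subset names are hypothetical):

pipeline = ds.get_pipeline('detect_and_classify')
# run the pipeline on an existing annotation
logogram = ds.get_single(Target.LOGO, 'my_new_subset', 0)
pipeline.run(logogram)
# or build a new annotation directly from an image
result = pipeline.predict('path/to/image.png')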
Pipeline(dataset, name, config)

A pipeline combines networks and logical steps into a computational graph that can be used for the task of detection or classification of logograms or graphemes.

run(a)

Run the pipeline on the given annotation.

Parameters:
a (Annotation) — Annotation to run the pipeline on.

predict(image_path)

Run the pipeline on the given image and return the resulting annotation.

Parameters:
image_path (str) — Path to the image to run the pipeline on.

Returns: the resulting annotation.
NetworkPipeline(dataset, name, config)

A pipeline step that runs a network on the given annotation.

predict(image_path)

Run the pipeline on the given image and return the resulting annotation.

Parameters:
image_path (str) — Path to the image to run the pipeline on.

Returns: the resulting annotation.

run(a)

Run the pipeline on the given annotation.

Parameters:
a (Annotation) — Annotation to run the pipeline on.
LogogramPipeline(dataset, name, config)

A pipeline for detecting graphemes within a logogram and then classifying them, using networks or sub-pipelines.

predict(image_path)

Run the pipeline on the given image and return the resulting annotation.

Parameters:
image_path (str) — Path to the image to run the pipeline on.

Returns: the resulting annotation.

run(a)

Run the pipeline on the given annotation.

Parameters:
a (Annotation) — Annotation to run the pipeline on.
SequencePipeline(dataset, name, config)

A pipeline that runs a sequence of other pipelines. All steps should have the same target.

predict(image_path)

Run the pipeline on the given image and return the resulting annotation.

Parameters:
image_path (str) — Path to the image to run the pipeline on.

Returns: the resulting annotation.

run(a)

Run the pipeline on the given annotation.

Parameters:
a (Annotation) — Annotation to run the pipeline on.
BranchPipeline(dataset, name, config)

A pipeline that runs one of many possible branches depending on a criterion. The criterion can be:
- a tag name: the pipeline will run the branch corresponding to the tag value. It can also be a meta tag.
- a lambda expression: the pipeline will run the branch corresponding to the result of the lambda expression, which will receive the annotation as a parameter.
All branches should have the same target.

predict(image_path)

Run the pipeline on the given image and return the resulting annotation.

Parameters:
image_path (str) — Path to the image to run the pipeline on.

Returns: the resulting annotation.

run(a)

Run the pipeline on the given annotation.

Parameters:
a (Annotation) — Annotation to run the pipeline on.
FunctionPipeline(dataset, name, config)

A pipeline that runs a user-defined function.
The config should be a string of the form 'module.py:function'. 'module.py' should be a file in the scripts directory of the dataset, and 'function' should be the name of a function in that file which accepts an annotation and a dataset and returns nothing. Alternatively, a string containing a lambda function, receiving the same arguments, can be used.
The target of this pipeline is deduced from the signature of the function. This is often inconsequential, but if this pipeline is the first of a sequence or branching, its target will be the target for the whole pipeline. To ensure correct deduction, use a type annotation of Logogram or Grapheme for the second parameter.
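For instance, a hypothetical scripts/normalize.py defining a function usable as a step (the file, function and metadata names are invented; the argument order assumes, per the note above, that the annotation is the second parameter):

# scripts/normalize.py
from quevedo import Dataset, Logogram

def mark_processed(ds: Dataset, logogram: Logogram) -> None:
    # the Logogram type hint lets Quevedo deduce this pipeline's target
    logogram.update(meta={'processed': 'yes'})

# referenced in the dataset configuration as: 'normalize.py:mark_processed'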
predict(image_path)

Run the pipeline on the given image and return the resulting annotation.

Parameters:
image_path (str) — Path to the image to run the pipeline on.

Returns: the resulting annotation.

run(a)

Run the pipeline on the given annotation.

Parameters:
a (Annotation) — Annotation to run the pipeline on.