Teanga Transforms Module
TransformedCorpus
Bases: ImmutableCorpus
A corpus that lazily applies a transformation to its documents.
Source code in teanga/transforms.py
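As a rough illustration of what "lazily applies a transformation" can mean, the sketch below wraps a list of plain dictionaries and only runs the layer functions when the documents are iterated. `LazyTransformed` and the dict-based documents are hypothetical stand-ins for illustration, not the teanga classes themselves.

```python
from typing import Callable, Iterator


class LazyTransformed:
    """Hypothetical stand-in for a lazily transformed corpus (not teanga code)."""

    def __init__(self, docs: list[dict[str, str]],
                 transform: dict[str, Callable[[str], str]]):
        self._docs = docs            # source documents, never modified
        self._transform = transform  # layer name -> function

    @property
    def docs(self) -> Iterator[dict[str, str]]:
        # Nothing is computed until a caller iterates over this generator.
        for doc in self._docs:
            yield {
                layer: self._transform.get(layer, lambda s: s)(value)
                for layer, value in doc.items()
            }


calls = []


def shout(text: str) -> str:
    calls.append(text)  # record each call so the laziness can be observed
    return text.upper()


source = [{"text": "This is a document."}]
lazy = LazyTransformed(source, {"text": shout})
```

In this sketch, constructing `lazy` leaves `calls` empty; `shout` only runs once `lazy.docs` is consumed, and `source` is never altered.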
docs
property
Return an iterator over the documents in the corpus. See Corpus.docs for more information.
meta
property, writable
Return the metadata of the corpus. See Corpus.meta for more information.
__init__(corpus, transform)
Create a new TransformedCorpus.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
corpus | ImmutableCorpus | The corpus to transform. | required |
transform | dict[str, Callable[[str], str]] | A dictionary mapping layer names to functions. | required |
Source code in teanga/transforms.py
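To make the shape of the `transform` argument concrete, the following sketch builds a dictionary of the documented type and applies its entries to strings. The layer names `text` and `title` are assumptions chosen for illustration; only the `dict[str, Callable[[str], str]]` type comes from the signature above.

```python
from typing import Callable

# Keys are layer names; values take a layer's string and return a new string.
# The layer names used here are hypothetical.
layer_transforms: dict[str, Callable[[str], str]] = {
    "text": str.lower,
    "title": lambda s: s.strip(),
}

lowered = layer_transforms["text"]("This Is A Document.")
trimmed = layer_transforms["title"]("  Heading  ")
```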
add_doc(*args, **kwargs)
Add a document to the corpus. See Corpus.add_doc for more information.
Source code in teanga/transforms.py
add_layer_meta(name=None, layer_type='characters', base=None, data=None, link_types=None, target=None, default=None)
Add a layer to the corpus. See Corpus.add_layer_meta for more information.
Source code in teanga/transforms.py
add_meta_from_service(service)
Add metadata from a service to the corpus. See Corpus.add_meta_from_service for more information.
Source code in teanga/transforms.py
apply(service)
Apply a service to the corpus. See Corpus.apply for more information.
Source code in teanga/transforms.py
doc_by_id(doc_id)
Return a document by its id. See Corpus.doc_by_id for more information.
Source code in teanga/transforms.py
doc_ids()
Return a list of document ids in the corpus. See Corpus.doc_ids for more information.
Source code in teanga/transforms.py
lower()
Lowercase all the text in the corpus.
Source code in teanga/transforms.py
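One way to think about `lower()` is as shorthand for a transform that maps every text layer to `str.lower`. The sketch below shows that idea on a plain dictionary; whether the library selects layers this way is an assumption, and `lowercase_all` is a hypothetical helper.

```python
def lowercase_all(doc: dict[str, str]) -> dict[str, str]:
    """Hypothetical helper: lowercase every string-valued layer in a document."""
    # Build the layer -> function mapping, then apply it to build a new dict.
    transform = {layer: str.lower for layer in doc}
    return {layer: transform[layer](value) for layer, value in doc.items()}


doc = {"text": "This is a Document.", "title": "INTRO"}
lowered_doc = lowercase_all(doc)
```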
transform(layer, transform)
Transform a layer in the corpus.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
layer | str | The name of the layer to transform. | required |
transform | Callable[[str], str] | The transformation function. | required |
Examples:
>>> import teanga
>>> corpus = teanga.text_corpus()
>>> doc = corpus.add_doc("This is a document.")
>>> corpus = corpus.upper().transform("text", lambda x: x[:10])
>>> list(corpus.docs)
[Document('Kjco', {'text': 'THIS IS A '})]
Source code in teanga/transforms.py
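The example chains `upper()` with a further `transform` call, and its output shows the uppercasing applied before the truncation. A minimal way to model that ordering is plain function composition; `compose` is a hypothetical helper, and how the library combines chained transforms internally is not stated here.

```python
from typing import Callable


def compose(first: Callable[[str], str],
            second: Callable[[str], str]) -> Callable[[str], str]:
    """Hypothetical helper: return a function that runs `first`, then `second`."""
    return lambda text: second(first(text))


upper_then_cut = compose(str.upper, lambda x: x[:10])
result = upper_then_cut("This is a document.")
```

Here `result` is the first ten characters of the uppercased string, matching the `'THIS IS A '` value in the example.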
transform_doc(doc)
Transform a document using the transformation functions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
doc | Document | The document to transform. | required |
Returns:
Type | Description |
---|---|
Document | A new document with the transformed layers. |
Source code in teanga/transforms.py
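The contract described above (take a document, return a new one with the transformed layers) can be sketched with dictionaries standing in for `Document`. `transform_doc_sketch` is a hypothetical name; layers without an entry in the mapping are copied unchanged here, which is an assumption about the intended behaviour.

```python
from typing import Callable


def transform_doc_sketch(doc: dict[str, str],
                         transform: dict[str, Callable[[str], str]]) -> dict[str, str]:
    """Return a new dict; the input document is left untouched."""
    new_doc = {}
    for layer, value in doc.items():
        if layer in transform:
            new_doc[layer] = transform[layer](value)  # transformed layer
        else:
            new_doc[layer] = value                    # assumed: pass through
    return new_doc


original = {"text": "This is a document.", "lang": "en"}
transformed = transform_doc_sketch(original, {"text": str.upper})
```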
upper()
Uppercase all the text in the corpus.
Source code in teanga/transforms.py