Creating processors
To create Mustash processors, you need to create a new class inheriting from either Processor, FieldProcessor, or another existing processor.
This guide walks through the steps to create sample processors illustrating these cases.
Creating field processors
If the processor you're trying to create takes a value from one field, applies some logic to it, and then updates the document with the result, either on the same field or on a separate field, it is recommended to inherit from FieldProcessor directly:
- It defines all of the settings you may want for such a use case, including:
  - FieldProcessor.field: the field from which to read the value;
  - FieldProcessor.ignore_missing: whether to just not apply the processor if the source field is not present in the document;
  - FieldProcessor.target_field: the target field to which the result should be applied, if the source field shouldn't be updated in-place;
  - FieldProcessor.remove_if_successful: whether to remove the source field if the target field is different and the processor has been applied successfully.
- It is able to validate the type of the value present in the source field, with a simple syntax tweak;
- It implements Processor.apply(), and defines a more appropriate FieldProcessor.process() method you can override.
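To make the interplay between these settings concrete, here is a minimal plain-Python sketch of the read, process and write flow they describe. It is not Mustash's implementation: the function apply_field_settings and its handling of flat dictionary keys are hypothetical, only the four setting names come from the list above, and raising KeyError for a missing field when ignore_missing is false is an assumption.

```python
from __future__ import annotations

from typing import Any, Callable


def apply_field_settings(
    document: dict[str, Any],
    process: Callable[[Any], Any],
    *,
    field: str,
    target_field: str | None = None,
    ignore_missing: bool = False,
    remove_if_successful: bool = False,
) -> None:
    """Hypothetical stand-in for the flow described above (flat keys only)."""
    if field not in document:
        if ignore_missing:
            return  # Skip silently: the processor is simply not applied.
        raise KeyError(field)  # Assumption: a missing field is an error.

    result = process(document[field])

    # Without a target field, the source field is updated in place.
    destination = field if target_field is None else target_field
    document[destination] = result

    # The source is only removed when the result went somewhere else.
    if remove_if_successful and destination != field:
        del document[field]


doc = {"name": "hello"}
apply_field_settings(
    doc, str.upper, field="name", target_field="shout", remove_if_successful=True
)
print(doc)  # {'shout': 'HELLO'}
```

In this sketch, the source field is removed only when a distinct target field received the result, which mirrors the description of remove_if_successful above.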
Let's assume we want to create a processor that appends ", haha" to a field assumed to contain a string. There are two inheritance options:
1. We can inherit from FieldProcessor directly, and implement the type checking on the source field ourselves in FieldProcessor.process();
2. We can inherit from FieldProcessor[str], and let the parent class handle type checking.
For ease of implementation, we prefer to use option 2. We can now define our class with the appropriate method:
from mustash.core import FieldProcessor


class HahaProcessor(FieldProcessor[str]):
    """Processor for adding ``", haha"`` to a field."""

    async def process(self, value: str, /) -> str:
        return value + ", haha"
In order to test our processor, we can create our sample document and check the result. Here’s a full snippet on how to do so:
from __future__ import annotations

import asyncio

from mustash.core import Document, FieldProcessor


class HahaProcessor(FieldProcessor[str]):
    """Processor for adding ``", haha"`` to a field."""

    async def process(self, value: str, /) -> str:
        return value + ", haha"


d: Document = {"my_field": "hello, world"}
processor = HahaProcessor(field="my_field")
asyncio.run(processor.apply(d))
print(d)
The script above prints the following:
{'my_field': 'hello, world, haha'}
If we don't always want to add ", haha", but want to support other suffixes as well, we can add an option named suffix to the processor.
A complete snippet for doing precisely this and testing it is the following:
from __future__ import annotations

import asyncio

from mustash.core import Document, FieldProcessor


class SuffixProcessor(FieldProcessor[str]):
    """Processor for adding a suffix to a field."""

    suffix: str
    """Suffix to add to the field."""

    async def process(self, value: str, /) -> str:
        return value + self.suffix


d: Document = {"my_field": "hello, world"}
processor = SuffixProcessor(field="my_field", suffix=", wow")
asyncio.run(processor.apply(d))
print(d)
This snippet prints the following:
{'my_field': 'hello, world, wow'}
Creating more complex processors
If your processor's logic is more complex than that of a simple field processor, especially if it has multiple inputs and/or multiple outputs, you will need to inherit from Processor, define your settings, and override Processor.apply().
Let's assume we want to create a processor that computes the sum of two fields and places the result in a third one. We will need to define options to provide the path to all three fields, using FieldPath:
from mustash.core import FieldPath, Processor


class SumProcessor(Processor):
    """Processor for computing the sum of two fields into a third one."""

    first_field: FieldPath
    """Path to the first field."""

    second_field: FieldPath
    """Path to the second field."""

    target_field: FieldPath
    """Path to the target field, to set with the sum."""
If you want to ensure all fields are different, you can use a model validator:
from typing import Self

from mustash.core import FieldPath, Processor
from pydantic import model_validator


class SumProcessor(Processor):
    """Processor for computing the sum of two fields into a third one."""

    first_field: FieldPath
    """Path to the first field."""

    second_field: FieldPath
    """Path to the second field."""

    target_field: FieldPath
    """Path to the target field, to set with the sum."""

    @model_validator(mode="after")
    def _validate(self, /) -> Self:
        if (
            self.first_field == self.second_field
            or self.first_field == self.target_field
            or self.second_field == self.target_field
        ):
            raise ValueError("All three fields must be different.")

        return self
Note
When writing model validators for processors built into Mustash, contributors usually prefix them with an underscore so that they do not appear in the Sphinx documentation.
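The pairwise comparison in the validator above grows quickly if more fields are added. As a possible alternative, the same "all different" rule can be expressed with a set. The sketch below uses plain strings and a hypothetical helper named all_distinct; whether FieldPath instances are hashable, and could therefore be used the same way, is an assumption that is not confirmed here.

```python
def all_distinct(*paths: str) -> bool:
    """Return True when no two of the given paths are equal."""
    # A set drops duplicates, so equal sizes mean every path is unique.
    return len(set(paths)) == len(paths)


print(all_distinct("animals.chickens", "animals.cows", "animals.total"))  # True
print(all_distinct("animals.chickens", "animals.cows", "animals.cows"))  # False
```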
You must then override Processor.apply() to perform your validations and operations on a provided document:
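from mustash.core import Document, Processor


class SumProcessor(Processor):
    ...  # Settings and validator as defined above.

    async def apply(self, document: Document, /) -> None:
        # Read both operands as integers, then write their sum to the target.
        first = self.first_field.get(document, cls=int)
        second = self.second_field.get(document, cls=int)
        self.target_field.set(document, first + second)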
Note
Processor.apply() should be written in a naive manner, as exceptions are handled by either Processor.__call__(), or the function above.
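As an illustration of what "naive" means in practice, the sketch below separates the two responsibilities: apply contains no error handling, and a wrapper catches whatever it raises. SketchProcessor is a hypothetical stand-in and not Mustash's Processor. What Processor.__call__() actually does with a caught exception is not described in this guide, so recording it under an "error" key is purely an assumption.

```python
import asyncio
from typing import Any


class SketchProcessor:
    """Hypothetical stand-in base class; not Mustash's ``Processor``."""

    async def apply(self, document: dict[str, Any], /) -> None:
        raise NotImplementedError

    async def __call__(self, document: dict[str, Any], /) -> None:
        # The wrapper owns error handling, so ``apply`` does not need any.
        try:
            await self.apply(document)
        except Exception as exc:
            # Assumption: failures are recorded on the document.
            document["error"] = f"{type(exc).__name__}: {exc}"


class NaiveSum(SketchProcessor):
    async def apply(self, document: dict[str, Any], /) -> None:
        # No try/except here: a missing key simply raises KeyError.
        document["total"] = document["a"] + document["b"]


ok = {"a": 1, "b": 2}
broken = {"a": 1}
asyncio.run(NaiveSum()(ok))
asyncio.run(NaiveSum()(broken))
print(ok)      # {'a': 1, 'b': 2, 'total': 3}
print(broken)  # {'a': 1, 'error': "KeyError: 'b'"}
```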
You can now test your processor using a sample document! Here’s a full snippet you can run:
from __future__ import annotations

import asyncio
from typing import Self

from pydantic import model_validator

from mustash.core import Document, FieldPath, Processor


class SumProcessor(Processor):
    """Processor for computing the sum of two fields into a third one."""

    first_field: FieldPath
    """Path to the first field."""

    second_field: FieldPath
    """Path to the second field."""

    target_field: FieldPath
    """Path to the target field, to set with the sum."""

    @model_validator(mode="after")
    def _validate(self, /) -> Self:
        if (
            self.first_field == self.second_field
            or self.first_field == self.target_field
            or self.second_field == self.target_field
        ):
            raise ValueError("All three fields must be different.")

        return self

    async def apply(self, document: Document, /) -> None:
        first = self.first_field.get(document, cls=int)
        second = self.second_field.get(document, cls=int)
        self.target_field.set(document, first + second)


d: Document = {"farm": "Old MacDonalds", "animals": {"chickens": 4, "cows": 7}}
processor = SumProcessor(
    first_field="animals.chickens",
    second_field="animals.cows",
    target_field="animals.total",
)
asyncio.run(processor.apply(d))
print(d)
This snippet prints the following:
{'farm': 'Old MacDonalds', 'animals': {'chickens': 4, 'cows': 7, 'total': 11}}
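The dotted paths in this example, such as animals.total, address keys inside nested dictionaries. The helpers below are a plain-Python approximation of what the get(document, cls=int) and set(document, value) calls appear to do in the snippet above. The names get_path and set_path are hypothetical, and both raising TypeError on a type mismatch and creating missing intermediate dictionaries are assumptions rather than documented FieldPath behaviour.

```python
from typing import Any


def get_path(document: dict[str, Any], path: str, cls: type = object) -> Any:
    """Read a dotted path and check the type of the value found."""
    current: Any = document
    for key in path.split("."):
        current = current[key]  # Raises KeyError if a part is missing.
    if not isinstance(current, cls):
        raise TypeError(f"{path!r} is not of type {cls.__name__}")
    return current


def set_path(document: dict[str, Any], path: str, value: Any) -> None:
    """Write a value at a dotted path, creating intermediate dictionaries."""
    *parents, last = path.split(".")
    current = document
    for key in parents:
        current = current.setdefault(key, {})
    current[last] = value


d = {"farm": "Old MacDonalds", "animals": {"chickens": 4, "cows": 7}}
total = get_path(d, "animals.chickens", int) + get_path(d, "animals.cows", int)
set_path(d, "animals.total", total)
print(d["animals"]["total"])  # 11
```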