Creating processors

To create Mustash processors, you need to create a new class that inherits from Processor, FieldProcessor, or another existing processor.

In this guide, we walk through the creation of sample processors for each case.

Creating field processors

If the processor you’re creating reads the value of one field, applies some logic to it, and writes the result back to the document, either to the same field or to a separate one, we recommend inheriting from FieldProcessor directly:

  • It defines the settings such a use case typically needs, including field, which designates the source field;

  • It can validate the type of the value in the source field: you only need to parametrize it with the expected type, as in FieldProcessor[str];

  • It implements Processor.apply() for you, and defines FieldProcessor.process(), which receives the source value and returns the new one; this is the method you override.
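To make this division of labor concrete, here is a hypothetical, much simplified stand-in for a field processor base class. It is not Mustash's implementation: ToyFieldProcessor, value_type, target_field, and ShoutProcessor are names invented for this sketch, and the type check uses a class attribute instead of the FieldProcessor[str] subscription syntax. It only shows the pattern: apply() reads, checks, and writes the document, while subclasses supply process().

```python
from __future__ import annotations

import asyncio
from typing import Any


class ToyFieldProcessor:
    """Hypothetical stand-in for a field processor base class (not Mustash's code)."""

    # Expected type of the source value; None disables the check.
    # (Mustash expresses this as FieldProcessor[str]; this is a simplification.)
    value_type: type | None = None

    def __init__(self, field: str, target_field: str | None = None) -> None:
        self.field = field
        # Hypothetical option: write to target_field, else back to the source field.
        self.target_field = target_field or field

    async def apply(self, document: dict[str, Any], /) -> None:
        value = document[self.field]
        if self.value_type is not None and not isinstance(value, self.value_type):
            raise TypeError(
                f"{self.field!r} holds {type(value).__name__}, "
                f"expected {self.value_type.__name__}"
            )
        document[self.target_field] = await self.process(value)

    async def process(self, value: Any, /) -> Any:
        """Value-level logic; subclasses override only this method."""
        raise NotImplementedError


class ShoutProcessor(ToyFieldProcessor):
    value_type = str

    async def process(self, value: str, /) -> str:
        return value.upper()


doc = {"greeting": "hello"}
asyncio.run(ShoutProcessor(field="greeting", target_field="loud").apply(doc))
print(doc)  # {'greeting': 'hello', 'loud': 'HELLO'}
```

The subclass never touches the document: it declares the expected type and transforms one value, which is the same shape HahaProcessor takes with the real FieldProcessor.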

Let’s assume we want to create a processor that appends ", haha" to a field assumed to contain a string. There are two options for the inheritance:

  1. We can inherit from FieldProcessor directly, and implement the type check on the source field ourselves in our override of FieldProcessor.process();

  2. We can inherit from FieldProcessor[str], and let the parent class handle type checking.
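For comparison, option 1 moves the type check into the process() override itself. The sketch below writes that logic as a free coroutine so it runs without Mustash; in a real subclass it would be the body of process(self, value, /), and raising TypeError is an assumption about how you would report a bad value.

```python
from __future__ import annotations

import asyncio
from typing import Any


async def process(value: Any, /) -> str:
    """Option 1 sketch: the process() logic checks the value type itself."""
    if not isinstance(value, str):
        raise TypeError(f"expected str, got {type(value).__name__}")
    return value + ", haha"


print(asyncio.run(process("hello, world")))  # hello, world, haha
```

This isinstance() boilerplate is what option 2 delegates to the parent class.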

For ease of implementation, we prefer to use option 2. We can now define our class with the appropriate method:

from mustash.core import FieldProcessor


class HahaProcessor(FieldProcessor[str]):
    """Processor for adding ``", haha"`` to a field."""

    async def process(self, value: str, /) -> str:
        return value + ", haha"

To test our processor, we can create a sample document and check the result. Here’s a full snippet that does so:

from __future__ import annotations

import asyncio

from mustash.core import Document, FieldProcessor


class HahaProcessor(FieldProcessor[str]):
    """Processor for adding ``", haha"`` to a field."""

    async def process(self, value: str, /) -> str:
        return value + ", haha"


d: Document = {"my_field": "hello, world"}
processor = HahaProcessor(field="my_field")
asyncio.run(processor.apply(d))
print(d)

The script above prints the following:

{'my_field': 'hello, world, haha'}

If we want to append suffixes other than ", haha", we can add an option named suffix to the processor. The following complete snippet does precisely this and tests it:

from __future__ import annotations

import asyncio

from mustash.core import Document, FieldProcessor


class SuffixProcessor(FieldProcessor[str]):
    """Processor for adding a suffix to a field."""

    suffix: str
    """Suffix to add to the field."""

    async def process(self, value: str, /) -> str:
        return value + self.suffix


d: Document = {"my_field": "hello, world"}
processor = SuffixProcessor(field="my_field", suffix=", wow")
asyncio.run(processor.apply(d))
print(d)

This snippet prints the following:

{'my_field': 'hello, world, wow'}
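Since this guide later applies Pydantic's model_validator to a processor, settings such as suffix appear to behave like Pydantic model fields. If so, giving suffix a default value (for example suffix: str = ", haha") should make the option optional and turn HahaProcessor into a special case of SuffixProcessor. The sketch below illustrates the idea with a standard-library dataclass rather than Mustash: ToySuffixProcessor is hypothetical, and whether Mustash treats defaults exactly this way is an assumption.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from typing import Any


@dataclass
class ToySuffixProcessor:
    """Hypothetical stand-in: annotated attributes become keyword options."""

    field: str
    # With a default, callers may omit the option; ", haha" mimics HahaProcessor.
    suffix: str = ", haha"

    async def apply(self, document: dict[str, Any], /) -> None:
        document[self.field] = document[self.field] + self.suffix


default_doc = {"my_field": "hello, world"}
asyncio.run(ToySuffixProcessor(field="my_field").apply(default_doc))

custom_doc = {"my_field": "hello, world"}
asyncio.run(ToySuffixProcessor(field="my_field", suffix=", wow").apply(custom_doc))

print(default_doc)  # {'my_field': 'hello, world, haha'}
print(custom_doc)  # {'my_field': 'hello, world, wow'}
```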

Creating more complex processors

If your processor’s logic is more complex than that of a field processor, especially if it has multiple inputs or outputs, you need to inherit from Processor, define your settings, and override Processor.apply().

Let’s assume we want to create a processor that computes the sum of two fields and stores the result in a third one. We need to define options holding the paths to all three fields, using FieldPath:

from mustash.core import FieldPath, Processor


class SumProcessor(Processor):
    """Processor for computing the sum of two fields into a third one."""

    first_field: FieldPath
    """Path to the first field."""

    second_field: FieldPath
    """Path to the second field."""

    target_field: FieldPath
    """Path to the target field, to set with the sum."""

To ensure that all three fields are different, you can use a Pydantic model validator:

from typing import Self

from mustash.core import FieldPath, Processor
from pydantic import model_validator


class SumProcessor(Processor):
    """Processor for computing the sum of two fields into a third one."""

    first_field: FieldPath
    """Path to the first field."""

    second_field: FieldPath
    """Path to the second field."""

    target_field: FieldPath
    """Path to the target field, to set with the sum."""

    @model_validator(mode="after")
    def _validate(self, /) -> Self:
        if (
            self.first_field == self.second_field
            or self.first_field == self.target_field
            or self.second_field == self.target_field
        ):
            raise ValueError("All three fields must be different.")

        return self
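The three pairwise comparisons can also be expressed as a single set-size check, which scales to any number of fields. This assumes FieldPath values are hashable and that equal paths hash equally, which this guide does not state; the sketch below therefore uses plain strings and a hypothetical all_distinct() helper.

```python
from __future__ import annotations


def all_distinct(*paths: str) -> bool:
    """Hypothetical helper: True if no two of the given paths are equal."""
    # A set drops duplicates, so its size shrinks if any two paths are equal.
    return len(set(paths)) == len(paths)


print(all_distinct("animals.chickens", "animals.cows", "animals.total"))  # True
print(all_distinct("animals.cows", "animals.cows", "animals.total"))  # False
```

Inside the validator, the condition would then read: if not all_distinct(self.first_field, self.second_field, self.target_field), raise the ValueError.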

Note

When writing model validators for processors built into Mustash, contributors usually prefix their names with an underscore so that they don’t appear in the Sphinx documentation.

You must then override Processor.apply() to perform your checks and operations on the provided document:

from mustash.core import Document, Processor


class SumProcessor(Processor):
    ...  # Settings and model validator, as defined above.

    async def apply(self, document: Document, /) -> None:
        first = self.first_field.get(document, cls=int)
        second = self.second_field.get(document, cls=int)
        self.target_field.set(document, first + second)

Note

Processor.apply() can be written naively, without catching exceptions itself, since exceptions are handled either by Processor.__call__() or by the function calling it.
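To illustrate why apply() can stay naive, here is a hypothetical caller that owns all error handling. How Processor.__call__() actually reacts to failures is not described in this guide, so NaiveSum, run_guarded(), and the policy of returning the caught exception are assumptions made for this sketch.

```python
from __future__ import annotations

import asyncio
from typing import Any


class NaiveSum:
    """Hypothetical processor whose apply() does no error handling."""

    async def apply(self, document: dict[str, Any], /) -> None:
        # No try/except: a missing key or a wrong type simply propagates.
        document["total"] = document["a"] + document["b"]


async def run_guarded(processor: NaiveSum, document: dict[str, Any]) -> Exception | None:
    """Hypothetical caller: the single place where failures are caught."""
    try:
        await processor.apply(document)
    except Exception as exc:  # Assumed policy: report the error to the caller.
        return exc
    return None


ok = {"a": 4, "b": 7}
error = asyncio.run(run_guarded(NaiveSum(), ok))
print(error, ok)  # None {'a': 4, 'b': 7, 'total': 11}

bad = {"a": 4}
error = asyncio.run(run_guarded(NaiveSum(), bad))
print(repr(error))  # KeyError('b')
```

Keeping the try/except in one caller means every processor fails the same way, instead of each apply() inventing its own handling.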

You can now test your processor using a sample document! Here’s a full snippet you can run:

from __future__ import annotations

import asyncio
from typing import Self

from pydantic import model_validator

from mustash.core import Document, FieldPath, Processor


class SumProcessor(Processor):
    """Processor for computing the sum of two fields into a third one."""

    first_field: FieldPath
    """Path to the first field."""

    second_field: FieldPath
    """Path to the second field."""

    target_field: FieldPath
    """Path to the target field, to set with the sum."""

    @model_validator(mode="after")
    def _validate(self, /) -> Self:
        if (
            self.first_field == self.second_field
            or self.first_field == self.target_field
            or self.second_field == self.target_field
        ):
            raise ValueError("All three fields must be different.")

        return self

    async def apply(self, document: Document, /) -> None:
        first = self.first_field.get(document, cls=int)
        second = self.second_field.get(document, cls=int)
        self.target_field.set(document, first + second)


d: Document = {"farm": "Old MacDonalds", "animals": {"chickens": 4, "cows": 7}}
processor = SumProcessor(
    first_field="animals.chickens",
    second_field="animals.cows",
    target_field="animals.total",
)
asyncio.run(processor.apply(d))
print(d)

This snippet prints the following:

{'farm': 'Old MacDonalds', 'animals': {'chickens': 4, 'cows': 7, 'total': 11}}
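As the output shows, a dotted path such as animals.total addresses a key nested under animals. This guide does not document how FieldPath.get() and FieldPath.set() are implemented; the sketch below is one hypothetical way to resolve dotted paths on plain dictionaries. get_path() mirrors the cls type check, and set_path() creating missing parent dictionaries is an assumption.

```python
from __future__ import annotations

from typing import Any, TypeVar

T = TypeVar("T")


def get_path(document: dict[str, Any], path: str, cls: type[T]) -> T:
    """Hypothetical helper: read a dotted path and check the value's type."""
    value: Any = document
    for key in path.split("."):
        value = value[key]  # Raises KeyError if a segment is missing.
    # Note: bool is a subclass of int, so True would pass an int check.
    if not isinstance(value, cls):
        raise TypeError(f"{path!r} is {type(value).__name__}, expected {cls.__name__}")
    return value


def set_path(document: dict[str, Any], path: str, value: Any) -> None:
    """Hypothetical helper: write a dotted path, creating missing parents."""
    *parents, last = path.split(".")
    current = document
    for key in parents:
        current = current.setdefault(key, {})
    current[last] = value


d = {"farm": "Old MacDonalds", "animals": {"chickens": 4, "cows": 7}}
total = get_path(d, "animals.chickens", int) + get_path(d, "animals.cows", int)
set_path(d, "animals.total", total)
print(d)  # {'farm': 'Old MacDonalds', 'animals': {'chickens': 4, 'cows': 7, 'total': 11}}
```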