Tukuy is a robust, extensible data transformation library that leverages a flexible plugin system. It simplifies the manipulation, validation, and extraction of data across multiple formats (text, HTML, JSON, dates, numbers, and more), making it an ideal tool for building data pipelines and cleaning workflows.
A flexible data transformation library with a plugin system for Python.
Tukuy (meaning "to transform" or "to become" in Quechua) is a powerful and extensible data transformation library that makes it easy to manipulate, validate, and extract data from various formats. With its plugin architecture, Tukuy provides a unified interface for working with text, HTML, JSON, dates, numbers, and more.
pip install tukuy
from tukuy import TukuyTransformer
# Create transformer
TUKUY = TukuyTransformer()
# Basic text transformation
text = " Hello World! "
result = TUKUY.transform(text, [
    "strip",
    "lowercase",
    {"function": "truncate", "length": 5}
])
print(result)  # "hello..."
# HTML transformation
html = "<div>Hello <b>World</b>!</div>"
result = TUKUY.transform(html, [
    "strip_html_tags",
    "lowercase"
])
print(result)  # "hello world!"
# Date transformation
date_str = "2023-01-01"
age = TUKUY.transform(date_str, [
    {"function": "age_calc"}
])
print(age)  # e.g. 1; the result depends on the date the code is run
# Validation
email = "user@example.com"
valid = TUKUY.transform(email, ["email_validator"])
print(valid)  # "user@example.com" or None if invalid
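To make the step format used above concrete, here is a minimal standalone sketch of how a chain of named steps could be applied in order. It does not use or reproduce Tukuy's implementation: the `REGISTRY` dictionary, the `apply_chain` function, and the truncate behavior (appending `...` when the text is shortened) are assumptions for illustration, modeled on the quick-start output.

```python
from typing import Any, Callable, Dict, List, Union

# Hypothetical registry mapping step names to plain functions.
# Extra keys of a dict step are passed as keyword arguments.
REGISTRY: Dict[str, Callable[..., Any]] = {
    "strip": lambda value: value.strip(),
    "lowercase": lambda value: value.lower(),
    # Assumed behavior: cut to `length` characters and append "..." when shortened.
    "truncate": lambda value, length=50: (
        value if len(value) <= length else value[:length] + "..."
    ),
}

Step = Union[str, Dict[str, Any]]


def apply_chain(value: Any, steps: List[Step]) -> Any:
    """Apply each step in order, feeding every result into the next step."""
    for step in steps:
        if isinstance(step, str):
            name, params = step, {}
        else:
            params = dict(step)  # copy so the caller's dict is left untouched
            name = params.pop("function")
        value = REGISTRY[name](value, **params)
    return value


result = apply_chain(
    " Hello World! ",
    ["strip", "lowercase", {"function": "truncate", "length": 5}],
)
print(result)  # hello...
```

A plain string names a step with no arguments, while a dictionary names the step under `function` and carries its parameters alongside, which is why both forms can sit in the same list.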
Tukuy uses a plugin system to organize transformers into logical groups and make it easy to extend functionality.
You can create custom plugins by extending the TransformerPlugin class:
from tukuy import TukuyTransformer
from tukuy.plugins import TransformerPlugin
from tukuy.base import ChainableTransformer


class ReverseTransformer(ChainableTransformer[str, str]):
    def validate(self, value: str) -> bool:
        return isinstance(value, str)

    def _transform(self, value: str, context=None) -> str:
        return value[::-1]


class MyPlugin(TransformerPlugin):
    def __init__(self):
        super().__init__("my_plugin")

    @property
    def transformers(self):
        return {
            'reverse': lambda _: ReverseTransformer('reverse')
        }


# Usage
TUKUY = TukuyTransformer()
TUKUY.register_plugin(MyPlugin())
result = TUKUY.transform("hello", ["reverse"])  # "olleh"
See the example plugin for a more detailed example.
Plugins can implement initialize() and cleanup() methods for setup and teardown:
class MyPlugin(TransformerPlugin):
    def initialize(self) -> None:
        super().initialize()
        # Load resources, connect to databases, etc.

    def cleanup(self) -> None:
        super().cleanup()
        # Close connections, free resources, etc.
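This README does not say when Tukuy itself calls these hooks, so the following is only a sketch of one way to pair them by hand so that cleanup runs even if an error occurs in between. `DemoPlugin` is a stand-in class with the same two method names, and `plugin_lifecycle` is a hypothetical helper; neither is part of Tukuy.

```python
from contextlib import contextmanager


class DemoPlugin:
    """Stand-in with the same two hooks; not a real TransformerPlugin."""

    def __init__(self):
        self.events = []

    def initialize(self) -> None:
        self.events.append("initialize")

    def cleanup(self) -> None:
        self.events.append("cleanup")


@contextmanager
def plugin_lifecycle(plugin):
    """Call initialize() on entry and cleanup() on exit, even on errors."""
    plugin.initialize()
    try:
        yield plugin
    finally:
        plugin.cleanup()


plugin = DemoPlugin()
try:
    with plugin_lifecycle(plugin):
        raise RuntimeError("simulated failure while the plugin is in use")
except RuntimeError:
    pass

print(plugin.events)  # ['initialize', 'cleanup']
```

Wrapping the pair in a context manager keeps teardown next to setup, so resources opened in `initialize()` are not leaked when a transformation raises.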
Tukuy provides powerful pattern-based extraction capabilities for both HTML and JSON data.
pattern = {
    "properties": [
        {
            "name": "title",
            "selector": "h1",
            "transform": ["strip", "lowercase"]
        },
        {
            "name": "links",
            "selector": "a",
            "attribute": "href",
            "type": "array"
        }
    ]
}
data = TUKUY.extract_html_with_pattern(html, pattern)
pattern = {
    "properties": [
        {
            "name": "user",
            "selector": "data.user",
            "properties": [
                {
                    "name": "name",
                    "selector": "fullName",
                    "transform": ["strip"]
                }
            ]
        }
    ]
}
data = TUKUY.extract_json_with_pattern(json_str, pattern)
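As a rough illustration of what a nested pattern like the one above expresses, the standalone sketch below resolves dotted selectors against parsed JSON using only the standard library. It is not Tukuy's extractor: the `resolve` and `extract` helpers, the sample input, and the shape of the output are all assumptions made for this example, and only the `strip` transform is modeled.

```python
import json
from typing import Any, Dict


def resolve(data: Any, selector: str) -> Any:
    """Follow a dotted selector such as "data.user" through nested dicts."""
    for key in selector.split("."):
        if not isinstance(data, dict) or key not in data:
            return None
        data = data[key]
    return data


def extract(data: Any, pattern: Dict[str, Any]) -> Dict[str, Any]:
    """Build a result dict with one entry per property in the pattern."""
    result = {}
    for prop in pattern["properties"]:
        value = resolve(data, prop["selector"])
        if "properties" in prop:
            # Nested properties are resolved relative to the selected value.
            value = extract(value, prop)
        elif isinstance(value, str) and "strip" in prop.get("transform", []):
            value = value.strip()  # only "strip" is modeled in this sketch
        result[prop["name"]] = value
    return result


# Made-up sample input for the sketch.
json_str = '{"data": {"user": {"fullName": "  Ada Lovelace  "}}}'
pattern = {
    "properties": [
        {
            "name": "user",
            "selector": "data.user",
            "properties": [
                {"name": "name", "selector": "fullName", "transform": ["strip"]}
            ],
        }
    ]
}

data = extract(json.loads(json_str), pattern)
print(data)  # {'user': {'name': 'Ada Lovelace'}}
```

In this sketch each nested `properties` list is evaluated against the value its parent selected, so `fullName` is looked up inside `data.user` rather than at the top level.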
Tukuy is designed to handle a wide range of data transformation scenarios.
Tukuy provides comprehensive error handling with detailed error messages:
from tukuy.exceptions import ValidationError, TransformationError, ParseError

try:
    result = TUKUY.transform(data, transformations)
except ValidationError as e:
    print(f"Validation failed: {e}")
except ParseError as e:
    print(f"Parsing failed: {e}")
except TransformationError as e:
    print(f"Transformation failed: {e}")
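When many records are processed, it can help to collect failures instead of stopping at the first one. The sketch below shows that pattern in plain Python. `transform_all`, `to_int`, and `DemoTransformationError` are hypothetical names for this example; in real code the `errors` tuple would hold whichever exception types the transformation is expected to raise.

```python
from typing import Any, Callable, Iterable, List, Tuple, Type


class DemoTransformationError(Exception):
    """Stand-in for a library error type; hypothetical, not from tukuy."""


def transform_all(
    values: Iterable[Any],
    fn: Callable[[Any], Any],
    errors: Tuple[Type[BaseException], ...] = (Exception,),
) -> Tuple[List[Any], List[Tuple[Any, str]]]:
    """Apply fn to every value, returning (results, failures)."""
    results: List[Any] = []
    failures: List[Tuple[Any, str]] = []
    for value in values:
        try:
            results.append(fn(value))
        except errors as exc:
            # Keep the offending input next to its error message.
            failures.append((value, str(exc)))
    return results, failures


def to_int(value: str) -> int:
    """Toy transformation that raises the stand-in error on bad input."""
    try:
        return int(value)
    except ValueError as exc:
        raise DemoTransformationError(f"not an integer: {value!r}") from exc


results, failures = transform_all(
    ["1", "x", "3"], to_int, errors=(DemoTransformationError,)
)
print(results)   # [1, 3]
print(failures)  # [('x', "not an integer: 'x'")]
```

Exceptions outside the `errors` tuple still propagate, so unexpected bugs are not silently recorded as data failures.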
Contributions are welcome! Here's how you can help:
git checkout -b feature/amazing-feature
pytest
git commit -m 'Add amazing feature'
git push origin feature/amazing-feature