Operators

Abstract

Operators are functions that perform operations on Mappings.

combine

Combines two recursive tree structures using a binary operator op to resolve conflicts at the leaf nodes.

This is a powerful generalization of merge. It recursively walks two tree structures and applies the op only when it encounters a conflict at the leaf nodes (e.g., two scalars at the same path, or a structural mismatch like a dict vs a list).

You can pass a custom callable (old, new) -> resolved or use one of the pre-built strategies from the resolver enums: Resolver (structural), LogicalResolver (bitwise/sets), or NumericResolver (math).
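To make the traversal concrete, here is a minimal plain-Python sketch of the idea. It is not the library's implementation: it recurses only through dicts (lists are treated as leaves here), and the names combine_sketch and keep_larger are hypothetical, introduced only for illustration.

```python
def combine_sketch(tree1, tree2, op):
    """Recursively combine two dict trees, calling op(old, new) on conflicts."""
    if isinstance(tree1, dict) and isinstance(tree2, dict):
        result = dict(tree1)  # shallow copy so the inputs are not mutated
        for key, new in tree2.items():
            if key in result:
                # Same path on both sides: recurse (or resolve at a leaf).
                result[key] = combine_sketch(result[key], new, op)
            else:
                # Path exists only in tree2: no conflict, take it as-is.
                result[key] = new
        return result
    # Two leaves, or a structural mismatch: let the resolver decide.
    return op(tree1, tree2)


def keep_larger(old, new):
    """A custom (old, new) -> resolved callable."""
    return max(old, new)


tree1 = {"a": 1, "b": {"c": 10}}
tree2 = {"a": 2, "b": {"c": 5}, "d": 5}
print(combine_sketch(tree1, tree2, keep_larger))
# {'a': 2, 'b': {'c': 10}, 'd': 5}
```

The resolver is only consulted where both trees hold a value at the same path, so "d" passes through untouched.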

Some of the most common resolvers from the Resolver enum include:

  • Resolver.FIRST: Keeps the original value (tree1).
  • Resolver.LAST: Overwrites with the new value (tree2).
  • Resolver.COALESCE_FIRST: Returns the first value if it is not None, otherwise the last value.
  • Resolver.COALESCE_LAST: Returns the last value if it is not None, otherwise the first value.
  • Resolver.PREFER_FIRST: Returns the first truthy value, otherwise the last value.
  • Resolver.PREFER_LAST: Returns the last truthy value, otherwise the first value.
  • Resolver.ALL: Combines both values into a tuple.
  • Resolver.FAIL: Raises a ValueError on any conflict.
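The COALESCE and PREFER families differ only in how they treat falsy values that are not None (such as 0 or an empty string). The plain functions below are stand-ins written from the descriptions above, not the library's own code, and their names are hypothetical.

```python
def coalesce_first(old, new):
    """First value unless it is None."""
    return old if old is not None else new


def coalesce_last(old, new):
    """Last value unless it is None."""
    return new if new is not None else old


def prefer_first(old, new):
    """First value if it is truthy, otherwise the last value."""
    return old if old else new


def prefer_last(old, new):
    """Last value if it is truthy, otherwise the first value."""
    return new if new else old


# 0 is falsy but not None, so the two families disagree on it.
print(coalesce_first(0, 5))    # 0
print(prefer_first(0, 5))      # 5
print(coalesce_last(5, 0))     # 0
print(prefer_last(5, 0))       # 5
print(coalesce_last(5, None))  # 5
```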

If you just need the standard "last-wins" behavior, you can use the simpler merge operator, which is essentially combine(tree1, tree2, op=Resolver.LAST), except for how it handles list-vs-scalar conflicts.

Combining with different conflict resolutions

from mappingtools.operators import combine
from mappingtools.resolvers import NumericResolver, Resolver

tree1 = {"a": 1, "b": {"c": 10}}
tree2 = {"a": 2, "b": {"c": 20}, "d": 5}

# 1. Sum conflicts
summed = combine(tree1, tree2, op=NumericResolver.SUM)
print(summed)
# output: {'a': 3, 'b': {'c': 30}, 'd': 5}

# 2. Keep original values on conflict
first_wins = combine(tree1, tree2, op=Resolver.FIRST)
print(first_wins)
# output: {'a': 1, 'b': {'c': 10}, 'd': 5}

# 3. Coalesce None values
# (tree2 has no None leaves here, so every conflict resolves to tree2's value)
tree3 = {"a": 0, "b": {"c": None}}
coalesced = combine(tree3, tree2, op=Resolver.COALESCE_LAST)
print(coalesced)
# output: {'a': 2, 'b': {'c': 20}, 'd': 5}

Decision Metrics

You can optionally extract companion metadata trees that describe how the combination occurred (e.g., provenance, conflict audits, or mutation changelogs). They are built during the same recursive traversal that produces the combined tree, so no second pass is needed.

To do this, pass a list of strategies (DecisionMetric) or custom callbacks to the decision_metrics parameter. When passed, combine will return a 2-tuple: (combined_tree, metrics_dict).

Built-in metric strategies include:

  • DecisionMetric.AUDIT: Produces conflict log descriptions (e.g., "conflict: 10 vs 20 -> 20" or "clean").
  • DecisionMetric.CHANGELOG: Tracks mutation status relative to tree1 ("added", "updated", "unchanged").
  • DecisionMetric.PROVENANCE: Tracks which tree (0: tree1, 1: tree2, None: composite/aggregative conflict) each leaf came from.
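As a rough illustration of how a companion tree can be produced in the same traversal, the sketch below builds a provenance tree alongside the combined result. It is an assumption-laden stand-in, not the library's implementation: it handles dicts only, decides provenance by object identity, and the names mark and combine_with_provenance are hypothetical.

```python
def mark(tree, source):
    """Build a tree of the same dict shape whose leaves are all `source`."""
    if isinstance(tree, dict):
        return {key: mark(value, source) for key, value in tree.items()}
    return source


def combine_with_provenance(tree1, tree2, op):
    """Return (combined, provenance) from one recursive walk over both trees."""
    if isinstance(tree1, dict) and isinstance(tree2, dict):
        combined, provenance = {}, {}
        for key, old in tree1.items():
            if key in tree2:
                combined[key], provenance[key] = combine_with_provenance(
                    old, tree2[key], op
                )
            else:
                combined[key], provenance[key] = old, mark(old, 0)
        for key, new in tree2.items():
            if key not in tree1:
                combined[key], provenance[key] = new, mark(new, 1)
        return combined, provenance
    # Conflict at a leaf: resolve it, then record where the result came from.
    resolved = op(tree1, tree2)
    if resolved is tree1:
        return resolved, 0
    if resolved is tree2:
        return resolved, 1
    return resolved, None  # composite value, e.g. a sum of both sides


tree1 = {"a": 10, "b": {"c": 100}}
tree2 = {"a": 20, "b": {"c": 200, "e": 300}}

combined, provenance = combine_with_provenance(tree1, tree2, lambda old, new: old + new)
print(combined)    # {'a': 30, 'b': {'c': 300, 'e': 300}}
print(provenance)  # {'a': None, 'b': {'c': None, 'e': 1}}
```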

Extracting decision metrics in a single pass

from mappingtools.operators import combine
from mappingtools.resolvers import DecisionMetric, NumericResolver

tree1 = {"a": 10, "b": {"c": 100}, "d": [1, 2]}
tree2 = {"a": 20, "b": {"c": 200, "e": 300}, "d": [3]}

# Extract multiple metadata trees simultaneously
combined, metrics = combine(
    tree1,
    tree2,
    op=NumericResolver.SUM,
    decision_metrics=[DecisionMetric.PROVENANCE, DecisionMetric.AUDIT, DecisionMetric.CHANGELOG],
)

print(combined)
# output: {'a': 30, 'b': {'c': 300, 'e': 300}, 'd': [4, 2]}

print(metrics["PROVENANCE"])
# output: {'a': None, 'b': {'c': None, 'e': 1}, 'd': [None, 0]}

print(metrics["AUDIT"])
# output: {'a': 'conflict: 10 vs 20 -> 30', 'b': {'c': 'conflict: 100 vs 200 -> 300', 'e': 'clean'}, 'd': ['conflict: 1 vs 3 -> 4', 'clean']}

print(metrics["CHANGELOG"])
# output: {'a': 'updated', 'b': {'c': 'updated', 'e': 'added'}, 'd': ['updated', 'unchanged']}

distinct

Yields distinct values for a specified key across multiple mappings.

Example

from mappingtools.operators import distinct

mappings = [
    {'a': 1, 'b': 2},
    {'a': 2, 'b': 3},
    {'a': 1, 'b': 4}
]
distinct_values = list(distinct('a', *mappings))
print(distinct_values)
# output: [1, 2]
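For intuition, a rough plain-Python equivalent is sketched below. It assumes that values are hashable and that mappings lacking the key are skipped; neither assumption is confirmed for the library, and distinct_sketch is a hypothetical name.

```python
def distinct_sketch(key, *mappings):
    """Lazily yield each value of `key` the first time it is seen."""
    seen = set()
    for mapping in mappings:
        if key not in mapping:
            continue  # assumption: mappings without the key are ignored
        value = mapping[key]
        if value not in seen:
            seen.add(value)
            yield value


rows = [{"a": 1, "b": 2}, {"a": 2, "b": 3}, {"a": 1, "b": 4}, {"b": 2}]
print(list(distinct_sketch("a", *rows)))  # [1, 2]
print(list(distinct_sketch("b", *rows)))  # [2, 3, 4]
```

Because it is a generator, values are produced in first-seen order and nothing is computed until the result is iterated.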

flatten

The flatten function takes a nested tree structure (dicts and lists) and converts it into a single-level dictionary. The key path formatting can be customized using the key_format parameter with the KeyFormat enum:

  • KeyFormat.TUPLE: Keys are path tuples of (key, index, ...). (Default)
  • KeyFormat.STR: Keys are string-joined path parts (e.g. '"a","b","c"').
  • KeyFormat.JAVASCRIPT: Keys are JavaScript-style dot/bracket paths (e.g. a[0].b).
  • KeyFormat.JSONPATH: Keys are RFC 9535 JSONPath strings (e.g. $.a[0].b).
  • KeyFormat.JSONPOINTER: Keys are RFC 6901 JSON Pointer strings (e.g. /a/0/b).

Example

from mappingtools.operators import KeyFormat, flatten

nested_dict = {
    'a': {'b': 1, 'c': {'d': 2}},
    'e': 3
}

# 1. Default (Tuple keys)
flat_dict = flatten(nested_dict)
print(flat_dict)
# output: {('a', 'b'): 1, ('a', 'c', 'd'): 2, ('e',): 3}

# 2. JSONPath keys
jsonpath_dict = flatten(nested_dict, key_format=KeyFormat.JSONPATH)
print(jsonpath_dict)
# output: {'$.a.b': 1, '$.a.c.d': 2, '$.e': 3}
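To show what the JSON Pointer key format involves, here is a small standalone sketch that flattens dicts and lists into RFC 6901 pointer keys, including the "~0" and "~1" escapes the RFC requires. It is an illustration only, not how the library builds its keys, and flatten_to_pointers is a hypothetical name.

```python
def flatten_to_pointers(tree, prefix=""):
    """Flatten nested dicts/lists into {json_pointer: leaf_value}."""
    if isinstance(tree, dict):
        items = tree.items()
    elif isinstance(tree, list):
        items = enumerate(tree)
    else:
        return {prefix: tree}  # a leaf: the accumulated prefix is its pointer
    flat = {}
    for key, value in items:
        # RFC 6901 escaping: "~" becomes "~0" first, then "/" becomes "~1".
        token = str(key).replace("~", "~0").replace("/", "~1")
        flat.update(flatten_to_pointers(value, f"{prefix}/{token}"))
    return flat  # note: empty containers contribute no entries in this sketch


print(flatten_to_pointers({"a": [{"b": 1}], "x/y": 2}))
# {'/a/0/b': 1, '/x~1y': 2}
```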

inverse

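Swaps keys and values in a mapping. As the example below shows, when the values are sets, each element becomes a key in the result, mapped to the set of original keys that contained it.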

Example

from mappingtools.operators import inverse

original_mapping = {'a': {1, 2}, 'b': {3}}
inverted_mapping = inverse(original_mapping)
print(inverted_mapping)
# output: defaultdict(<class 'set'>, {1: {'a'}, 2: {'a'}, 3: {'b'}})
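The behavior shown above can be approximated with a defaultdict(set). This is a sketch that assumes every value is an iterable of hashable items; inverse_sketch is a hypothetical name and the library may handle other value types differently.

```python
from collections import defaultdict


def inverse_sketch(mapping):
    """Map each element of each value back to the set of keys that held it."""
    inverted = defaultdict(set)
    for key, values in mapping.items():
        for value in values:
            inverted[value].add(key)
    return inverted


result = inverse_sketch({"a": {1, 2}, "b": {3}, "c": {1}})
print(result[1] == {"a", "c"})  # True: 1 appeared under both 'a' and 'c'
print(result[3] == {"b"})       # True
```

Collecting into sets means no information is lost when several keys share a value.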

merge

A pure function (Monoid operation) that deeply merges two recursive tree structures. Conflicts are resolved by overwriting existing values with new ones (right-side precedence), unless the conflict is between a list and a scalar, in which case the two are concatenated (the scalar is appended or prepended to the list).

Mathematically, this operation forms a composite Monoid:

  • Last Monoid (Scalar Fallback): When resolving conflicts between simple values, the right-hand side (tree2) wins.
  • Pointwise Monoid (Dictionary Merge): If the values are dictionaries, they are merged by key, recursively calling merge on the values.
  • Zip Monoid (List Merge): If both are lists, they are zipped and merged positionally, substituting MISSING for missing indices.
  • Free Monoid (Mixed List/Scalar): If one is a list and the other is a scalar/dict, it concatenates (appends/prepends).

Because it forms a Monoid, this function can be used with functools.reduce to collect an iterable of trees into a single structure.
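The two monoid laws that make reduce safe (associativity and an identity element) can be checked on a simplified stand-in. The sketch below covers only the dict and scalar cases, not the list rules, and merge_dicts is a hypothetical helper rather than the library's merge.

```python
from functools import reduce


def merge_dicts(left, right):
    """Dict-only deep merge: recurse into shared keys, otherwise right wins."""
    if isinstance(left, dict) and isinstance(right, dict):
        merged = dict(left)  # copy so the inputs stay untouched
        for key, value in right.items():
            merged[key] = merge_dicts(merged[key], value) if key in merged else value
        return merged
    return right


a = {"x": 1, "n": {"p": 1}}
b = {"n": {"q": 2}}
c = {"x": 3, "n": {"p": 9}}

# Associativity: grouping does not change the result.
assert merge_dicts(merge_dicts(a, b), c) == merge_dicts(a, merge_dicts(b, c))
# Identity: for dict-shaped trees, {} leaves the other operand unchanged.
assert merge_dicts({}, a) == a == merge_dicts(a, {})

print(reduce(merge_dicts, [a, b, c], {}))
# {'x': 3, 'n': {'p': 9, 'q': 2}}
```

Passing the identity as the initial value also makes the reduction well defined for an empty iterable.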

Merging two trees directly

from mappingtools.operators import merge

tree1 = {"a": 1, "b": [1, 2]}
tree2 = {"b": [3], "c": 4}

merged = merge(tree1, tree2)
print(merged)
# output: {'a': 1, 'b': [3, 2], 'c': 4}

Reducing an iterable of trees

Using Python's standard functools.reduce, we can easily merge an entire sequence of nested structures.

from functools import reduce
from mappingtools.operators import merge

trees = [
    {"a": 1, "b": {"c": 2}},
    {"b": {"d": 3}},
    {"a": 10}, # Overwrites previous "a"
]

merged = reduce(merge, trees)
print(merged)
# output: {'a': 10, 'b': {'c': 2, 'd': 3}}

Deep merging with Lenses

If you need to merge data into a specific, deeply nested location of a larger tree, you can compose the merge function with an Optic (Lens). This avoids modifying the pure merge function with path traversal logic.

from mappingtools.operators import merge
from mappingtools.optics import Lens

system_state = {"system": {"config": {"retries": 3}}}
new_config = {"timeout": 30}

# Focus specifically on the 'config' node inside 'system'
config_lens = Lens.path("system", "config")

# Apply the merge function OVER the focused node
new_state = config_lens.modify(
    system_state, 
    lambda old: merge(old, new_config)
)

print(new_state)
# output: {'system': {'config': {'retries': 3, 'timeout': 30}}}

pivot

Reshapes a list of mappings into a nested dictionary based on index and column keys. Supports different aggregation modes via Aggregation.

Example

from mappingtools.operators import pivot
from mappingtools.aggregations import Aggregation

data = [
    {"city": "NYC", "month": "Jan", "temp": 10},
    {"city": "NYC", "month": "Feb", "temp": 12},
    {"city": "LON", "month": "Jan", "temp": 5},
    {"city": "NYC", "month": "Jan", "temp": 20}, # Duplicate
]

# Default mode (LAST wins)
result = pivot(data, index="city", columns="month", values="temp")
print(result)
# output: {'NYC': {'Jan': 20, 'Feb': 12}, 'LON': {'Jan': 5}}

# Aggregation mode: ALL (collect list)
result_all = pivot(data, index="city", columns="month", values="temp", aggregation=Aggregation.ALL)
print(result_all["NYC"]["Jan"])
# output: [10, 20]

reshape

A generalization of pivot that creates nested dictionaries (tensors) of arbitrary depth. While pivot is limited to 2 dimensions (Index, Columns), reshape accepts a sequence of keys to define the hierarchy.

Example

from mappingtools.operators import reshape
from mappingtools.aggregations import Aggregation

data = [
    {"country": "US", "state": "NY", "city": "NYC", "pop": 8.4},
    {"country": "US", "state": "CA", "city": "LA", "pop": 3.9},
    {"country": "UK", "state": "ENG", "city": "LON", "pop": 8.9},
    {"country": "US", "state": "NY", "city": "Albany", "pop": 0.1},
]

# 3-Level Hierarchy: Country -> State -> City
tree = reshape(data, keys=["country", "state", "city"], value="pop")

print(tree["US"]["NY"]["NYC"])
# output: 8.4

# Aggregation: Sum population by Country -> State
# (City is marginalized/ignored)
state_pop = reshape(
    data, 
    keys=["country", "state"], 
    value="pop", 
    aggregation=Aggregation.SUM
)

print(state_pop["US"]["NY"])
# output: 8.5

# Deep Keys (using Lenses or Callables)
# If your data is nested, you can use callables to extract keys.
# This works perfectly with the library's `Lens` or standard `operator.itemgetter`.

nested_data = [
    {"id": 1, "meta": {"region": "US"}, "val": 10},
    {"id": 2, "meta": {"region": "UK"}, "val": 20},
]

# Group by meta.region
deep_tree = reshape(
    nested_data,
    keys=[lambda x: x["meta"]["region"]],
    value="val"
)
print(deep_tree)
# output: {'US': 10, 'UK': 20}
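The nesting step itself is small. The following sketch shows one way to build a hierarchy of arbitrary depth from a sequence of keys, accepting either field names or callables. It uses last-wins on duplicates and assumes at least one key; reshape_sketch is a hypothetical name and this is not the library's code.

```python
def reshape_sketch(rows, keys, value):
    """Nest rows into dicts, one level per entry in `keys` (last value wins)."""
    tree = {}
    for row in rows:
        # Each key is either a field name or a callable applied to the row.
        path = [key(row) if callable(key) else row[key] for key in keys]
        node = tree
        for part in path[:-1]:
            node = node.setdefault(part, {})  # descend, creating levels as needed
        node[path[-1]] = row[value]
    return tree


rows = [
    {"id": 1, "meta": {"region": "US"}, "val": 10},
    {"id": 2, "meta": {"region": "UK"}, "val": 20},
    {"id": 3, "meta": {"region": "US"}, "val": 30},
]
print(reshape_sketch(rows, [lambda r: r["meta"]["region"], "id"], "val"))
# {'US': {1: 10, 3: 30}, 'UK': {2: 20}}
```

With exactly two keys this reduces to the pivot shape (index, then columns).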

rekey

Transforms keys of a mapping based on a factory function of (key, value). This allows "re-indexing" a mapping where the new key depends on the content of the value or a combination of the old key and value. Collisions are handled according to the specified aggregation.

Example

from mappingtools.operators import rekey
from mappingtools.aggregations import Aggregation

mapping = {
    "alice": {"dept": "IT", "id": 1},
    "bob": {"dept": "HR", "id": 2},
    "charlie": {"dept": "IT", "id": 3},
}

# Re-index by 'id'
by_id = rekey(mapping, lambda k, v: v["id"])
print(by_id[1])
# output: {'dept': 'IT', 'id': 1}

# Group by 'dept' using Aggregation.ALL
by_dept = rekey(mapping, lambda k, v: v["dept"], aggregation=Aggregation.ALL)
print(list(by_dept.keys()))
# output: ['IT', 'HR']
print(len(by_dept["IT"]))
# output: 2
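The difference between the two collision behaviors shown above can be sketched in a few lines. The collect flag below is a hypothetical simplification standing in for the aggregation parameter, and rekey_sketch is not the library's implementation.

```python
def rekey_sketch(mapping, factory, collect=False):
    """Re-index a mapping by factory(key, value); optionally collect collisions."""
    result = {}
    for key, value in mapping.items():
        new_key = factory(key, value)
        if collect:
            result.setdefault(new_key, []).append(value)  # keep every value
        else:
            result[new_key] = value  # later entries overwrite earlier ones
    return result


staff = {
    "alice": {"dept": "IT", "id": 1},
    "bob": {"dept": "HR", "id": 2},
    "charlie": {"dept": "IT", "id": 3},
}

last_wins = rekey_sketch(staff, lambda k, v: v["dept"])
print(last_wins["IT"]["id"])  # 3

grouped = rekey_sketch(staff, lambda k, v: v["dept"], collect=True)
print([v["id"] for v in grouped["IT"]])  # [1, 3]
```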

rename

Renames keys in a mapping based on a mapper (Mapping or Callable). If a key is not present in the mapper, it remains unchanged. Collisions are handled according to the specified aggregation.

Example

from mappingtools.operators import rename

data = {"usr_id": 1, "usr_name": "Alice", "email": "alice@example.com"}

# Using a mapping
renamed = rename(data, {"usr_id": "id", "usr_name": "name"})
print(list(renamed.keys()))
# output: ['id', 'name', 'email']

# Using a callable
renamed_upper = rename(data, str.upper)
print(list(renamed_upper.keys()))
# output: ['USR_ID', 'USR_NAME', 'EMAIL']
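A compact sketch of the same idea, accepting either a Mapping or a callable, is shown below. It resolves collisions by letting the later key win, which is an assumption about the default rather than documented behavior; rename_sketch is a hypothetical name.

```python
from collections.abc import Mapping


def rename_sketch(data, mapper):
    """Rename keys via a mapping (missing keys unchanged) or a callable."""
    if isinstance(mapper, Mapping):
        def lookup(key):
            return mapper.get(key, key)  # fall back to the original key
    else:
        lookup = mapper
    return {lookup(key): value for key, value in data.items()}


record = {"usr_id": 1, "usr_name": "Alice", "email": "alice@example.com"}
print(list(rename_sketch(record, {"usr_id": "id"})))
# ['id', 'usr_name', 'email']

# Two keys that map to the same name collide; here the later one wins.
print(rename_sketch({"a": 1, "A": 2}, str.upper))
# {'A': 2}
```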