Data Model

This document covers the RevitPy ORM data model: the entity type hierarchy, Pydantic validation rules, cache system, change tracking, relationship management, and the query execution pipeline.

All types are defined in revitpy/orm/validation.py, revitpy/orm/types.py, revitpy/orm/cache.py, revitpy/orm/change_tracker.py, revitpy/orm/relationships.py, and revitpy/orm/query_builder.py.

ORM Entity Types

BaseElement

Defined in revitpy/orm/validation.py. All entity models inherit from BaseElement, which extends Pydantic v2’s BaseModel.

class BaseElement(BaseModel):
    model_config = ConfigDict(
        extra="allow",
        validate_default=True,
        use_enum_values=True,
        validate_assignment=True,
        populate_by_name=True,
        strict=False,
        arbitrary_types_allowed=True,
    )

    id: ElementId          # Required, unique identifier
    name: str | None       # max_length=1000, stripped of whitespace
    category: str | None   # max_length=255, stripped of whitespace
    level_id: ElementId | None
    family_name: str | None  # max_length=255
    type_name: str | None    # max_length=255
    created_at: datetime     # Auto-set to UTC now
    modified_at: datetime    # Auto-set to UTC now
    version: int             # >= 1, default 1
    is_valid: bool           # default True
    state: ElementState      # default UNCHANGED

The validate_assignment=True setting means every field assignment after construction is validated through Pydantic. The extra="allow" setting permits dynamic properties for Revit-specific attributes not covered by the base schema.

Field validators on BaseElement:

  • validate_name – strips whitespace; converts empty strings to None.
  • validate_category – strips whitespace; converts empty strings to None.

Model validator (mode="after"):

  • validate_element – updates modified_at timestamp on every model change.

WallElement

class WallElement(BaseElement):
    height: float      # gt=0, warns if > 100 ft
    length: float      # gt=0
    width: float       # gt=0, warns if > 5 ft (wall thickness)
    area: float | None   # ge=0, auto-calculated as height * length
    volume: float | None # ge=0, auto-calculated as height * length * width
    base_constraint: str | None
    top_constraint: str | None
    base_offset: float       # default 0.0
    top_offset: float        # default 0.0
    structural_material: str | None
    finish_material_interior: str | None
    finish_material_exterior: str | None
    structural: bool         # default False
    fire_rating: int | None  # ge=0, le=4 (hours)

Validators:

  • validate_height – rejects <= 0; logs warning if > 100 ft.
  • validate_width – rejects <= 0; logs warning if > 5 ft.
  • calculate_derived_properties (model validator, mode="after") – auto-computes area and volume from dimensions if not provided.

Cross-property validation (in ElementValidator._validate_element_specific):

  • Checks area consistency: |area - height * length| <= 0.1.
  • Checks volume consistency: |volume - height * length * width| <= 0.1.
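
The derived-property and consistency rules above can be illustrated with a standalone sketch that uses neither Pydantic nor RevitPy. The names derive_wall_properties, check_wall_consistency, and TOLERANCE are hypothetical; only the formulas and the 0.1 tolerance come from the description above.

```python
TOLERANCE = 0.1  # tolerance quoted in the cross-property checks above


def derive_wall_properties(height, length, width, area=None, volume=None):
    """Fill in area and volume only when the caller did not provide them."""
    if area is None:
        area = height * length
    if volume is None:
        volume = height * length * width
    return area, volume


def check_wall_consistency(height, length, width, area, volume):
    """Return a list of problems; an empty list means the values agree."""
    problems = []
    if abs(area - height * length) > TOLERANCE:
        problems.append("area does not match height * length")
    if abs(volume - height * length * width) > TOLERANCE:
        problems.append("volume does not match height * length * width")
    return problems


area, volume = derive_wall_properties(height=10.0, length=20.0, width=0.5)
print(check_wall_consistency(10.0, 20.0, 0.5, area, volume))
```

Because the derived values are only filled in when missing, a caller-supplied area that disagrees with the dimensions is left untouched and is caught by the consistency check instead.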

RoomElement

class RoomElement(BaseElement):
    number: str         # min_length=1, max_length=50
    area: float         # ge=0, warns if > 10,000 sq ft
    perimeter: float    # ge=0
    volume: float       # ge=0
    department: str | None     # max_length=255
    occupancy: int | None      # ge=0
    ceiling_height: float | None  # gt=0
    temperature: float | None  # ge=-50, le=150 (Fahrenheit)
    humidity: float | None     # ge=0, le=100 (percentage)
    air_flow_required: float | None  # ge=0 (CFM)

Validators:

  • validate_room_number – strips whitespace; rejects empty values; allows only alphanumeric characters plus ".", "-", and "_".
  • validate_area – rejects negative; logs warning if > 10,000 sq ft.

Cross-property validation:

  • Occupancy check: if area / occupancy < 50, the occupancy is flagged as too high (minimum 50 sq ft per person).
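
The room-number and occupancy rules can be approximated with plain Python. This is a sketch under assumptions: "alphanumeric" is read as ASCII letters and digits, and the function and constant names are hypothetical rather than RevitPy's own.

```python
import re
from typing import Optional

# Assumption: ASCII letters, digits, '.', '-', and '_' are the allowed characters.
ROOM_NUMBER_RE = re.compile(r"[A-Za-z0-9._-]+")
MIN_AREA_PER_PERSON_SQFT = 50.0


def normalize_room_number(raw: str) -> str:
    """Strip whitespace, then reject empty or malformed room numbers."""
    number = raw.strip()
    if not number:
        raise ValueError("room number must not be empty")
    if ROOM_NUMBER_RE.fullmatch(number) is None:
        raise ValueError(f"invalid characters in room number: {number!r}")
    return number


def occupancy_too_high(area: float, occupancy: Optional[int]) -> bool:
    """True when each occupant would have less than 50 sq ft."""
    if not occupancy:
        # None or zero occupancy cannot be checked (and zero would divide by zero)
        return False
    return area / occupancy < MIN_AREA_PER_PERSON_SQFT
```

A 400 sq ft room with 10 occupants gives 40 sq ft per person and would be flagged; the same room with 8 occupants gives exactly 50 and would not.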

DoorElement

class DoorElement(BaseElement):
    width: float        # gt=0, warns if > 20 ft
    height: float       # gt=0, warns if > 20 ft
    material: str | None
    fire_rating: float | None  # ge=0, le=4 (hours)
    hardware_set: str | None
    hand: str | None           # pattern: ^(Left|Right)$
    operation_type: str | None # e.g. Swing, Sliding

Validators:

  • validate_dimensions – applied to both width and height; rejects <= 0; logs warning if > 20 ft.

WindowElement

class WindowElement(BaseElement):
    width: float        # gt=0
    height: float       # gt=0
    glass_type: str | None
    frame_material: str | None
    u_factor: float | None           # gt=0
    solar_heat_gain: float | None    # ge=0, le=1 (SHGC coefficient)
    sound_transmission_class: int | None  # ge=0, le=100 (STC rating)
    energy_star_rated: bool          # default False

ElementState Enum

Defined in revitpy/orm/types.py:

State      Description
UNCHANGED  Entity matches the data source
ADDED      New entity, not yet persisted
MODIFIED   Entity has pending changes
DELETED    Entity marked for deletion
DETACHED   Entity is not tracked by any context

Validation Infrastructure

The ElementValidator class supports four strictness levels via the ValidationLevel enum:

Level     Behaviour
NONE      Skip all validation
BASIC     Basic type checking only
STANDARD  Standard validation rules (default)
STRICT    Strict validation with all constraints

Custom validation rules can be added at runtime via ValidationRule objects:

rule = ValidationRule(
    property_name="height",
    constraint_type=ConstraintType.MAX_VALUE,
    constraint_value=50.0,
    error_message="Height exceeds project limit",
)
validator.add_custom_rule(rule)

Supported constraint types: REQUIRED, MIN_VALUE, MAX_VALUE, MIN_LENGTH, MAX_LENGTH, PATTERN, CUSTOM.
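
How ElementValidator evaluates a rule is not shown here, so the following is only a sketch of one plausible interpretation of the constraint types. ConstraintType and ValidationRule below are local stand-ins with the same names and fields as in the example above, and violates() is a hypothetical helper, not part of RevitPy.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Any


class ConstraintType(Enum):  # stand-in mirroring the names listed above
    REQUIRED = "required"
    MIN_VALUE = "min_value"
    MAX_VALUE = "max_value"
    MIN_LENGTH = "min_length"
    MAX_LENGTH = "max_length"
    PATTERN = "pattern"
    CUSTOM = "custom"


@dataclass
class ValidationRule:  # stand-in with the four fields used above
    property_name: str
    constraint_type: ConstraintType
    constraint_value: Any
    error_message: str


def violates(rule: ValidationRule, value: Any) -> bool:
    """Return True when value breaks rule (assumed semantics)."""
    kind, limit = rule.constraint_type, rule.constraint_value
    if kind is ConstraintType.REQUIRED:
        return value is None
    if value is None:
        return False  # assumption: missing optional values skip other checks
    if kind is ConstraintType.MIN_VALUE:
        return value < limit
    if kind is ConstraintType.MAX_VALUE:
        return value > limit
    if kind is ConstraintType.MIN_LENGTH:
        return len(value) < limit
    if kind is ConstraintType.MAX_LENGTH:
        return len(value) > limit
    if kind is ConstraintType.PATTERN:
        return re.fullmatch(limit, value) is None
    return not limit(value)  # CUSTOM: limit is assumed to be a predicate


rule = ValidationRule("height", ConstraintType.MAX_VALUE, 50.0,
                      "Height exceeds project limit")
if violates(rule, 60.0):
    print(rule.error_message)
```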

A global default validator is available via get_validator() and its level can be changed with set_validation_level().

Cache System

Defined in revitpy/orm/cache.py.

CacheConfiguration

Parameter                 Default  Description
max_size                  10,000   Maximum number of entries
max_memory_mb             500      Maximum memory usage in MB
default_ttl_seconds       3,600    Default time-to-live (1 hour)
eviction_policy           LRU      Eviction strategy
enable_statistics         True     Track hit/miss/eviction counters
cleanup_interval_seconds  300      Interval for expired entry cleanup (5 min)
compression_enabled       False    Data compression (reserved)
thread_safe               True     Use RLock for thread safety

Eviction Policies

Defined by the EvictionPolicy enum:

Policy      Strategy
LRU         Least Recently Used – evicts the entry that was accessed least recently. Implemented via OrderedDict.move_to_end().
LFU         Least Frequently Used – evicts the entry with the lowest access_count.
FIFO        First In, First Out – evicts the oldest entry by insertion order.
TTL         Time To Live only – relies solely on TTL expiration.
SIZE_BASED  Based on estimated memory size per entry.
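
The LRU policy is the easiest to picture in code. The sketch below shows the general OrderedDict.move_to_end() technique named above; LRUSketch is a hypothetical class and is not the MemoryCache implementation.

```python
from collections import OrderedDict


class LRUSketch:
    """Minimal LRU store: the front of the OrderedDict is the eviction candidate."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._entries = OrderedDict()

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def set(self, key, value):
        if key in self._entries:
            self._entries.move_to_end(key)
        self._entries[key] = value
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict least recently used

    def keys(self):
        return list(self._entries)


cache = LRUSketch(max_size=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")     # "a" becomes the most recently used entry
cache.set("c", 3)  # over capacity, so "b" is evicted
```

A FIFO policy would look the same without the move_to_end() calls, since insertion order alone then decides which entry leaves first.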

Cache Architecture

CacheManager (high-level API)
  |
  +-- CacheBackend (abstract)
        |
        +-- MemoryCache (default in-memory implementation)
              |
              +-- OrderedDict[str, CacheEntry]
              +-- Dependency tracking (key -> dependent keys)
              +-- Reverse dependency index (dependent -> source keys)

CacheManager wraps a CacheBackend and adds:

  • CacheStatistics – tracks hits, misses, evictions, invalidations, and memory usage. All statistics counters are guarded by threading.RLock.
  • Invalidation callbacks – registered functions called whenever a cache entry is invalidated.
  • Dependency-based invalidation – invalidate_by_dependency(dependency) cascades to all entries that declared a dependency on the given key.
  • Pattern-based invalidation – invalidate_by_pattern(pattern) removes all entries whose key string contains the pattern.
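
To make the two invalidation modes and the callbacks concrete, here is a small in-memory sketch. InvalidationSketch and its internals are hypothetical; the method names invalidate_by_dependency and invalidate_by_pattern are taken from the list above, but their return values (a count of removed entries) are an assumption.

```python
from collections import defaultdict


class InvalidationSketch:
    def __init__(self):
        self._data = {}
        # dependency key -> cache keys that declared a dependency on it
        self._dependents = defaultdict(set)
        self._callbacks = []

    def on_invalidate(self, callback):
        self._callbacks.append(callback)

    def set(self, key, value, dependencies=()):
        self._data[key] = value
        for dep in dependencies:
            self._dependents[dep].add(key)

    def _invalidate(self, key):
        if key not in self._data:
            return False
        del self._data[key]
        for callback in self._callbacks:
            callback(key)  # notify listeners about each removed key
        return True

    def invalidate_by_dependency(self, dependency):
        keys = self._dependents.pop(dependency, set())
        return sum(self._invalidate(k) for k in sorted(keys))

    def invalidate_by_pattern(self, pattern):
        # Substring match on the key string, as described above
        matches = [k for k in self._data if pattern in k]
        return sum(self._invalidate(k) for k in matches)


removed = []
store = InvalidationSketch()
store.on_invalidate(removed.append)
store.set("Wall|id:1", "wall-1")
store.set("Wall|query:abc", ["wall-1"], dependencies={"Wall|id:1"})
store.set("Room|id:7", "room-7")
```

In this sketch, invalidating the dependency "Wall|id:1" removes the cached query result that depends on it but leaves the "Wall|id:1" entry itself in place.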

Cache Keys

CacheKey is a dataclass with four components; only entity_type is required:

@dataclass
class CacheKey:
    entity_type: str
    query_hash: str | None = None
    entity_id: Any | None = None
    relationship_path: str | None = None

String representation: entity_type|id:entity_id|query:query_hash|rel:relationship_path.

Factory functions:

  • create_entity_cache_key(entity_type, entity_id) – for individual entity lookup.
  • create_query_cache_key(entity_type, query_hash) – for query result caching.
  • create_relationship_cache_key(entity_type, entity_id, relationship_path) – for relationship data.
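
The string format can be sketched as follows. CacheKeySketch is a hypothetical stand-in, and the choice to omit unset components from the string is an assumption; the actual CacheKey may render them differently.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class CacheKeySketch:
    entity_type: str
    query_hash: Optional[str] = None
    entity_id: Optional[Any] = None
    relationship_path: Optional[str] = None

    def __str__(self) -> str:
        # Order follows the documented format: type, id, query, rel
        parts = [self.entity_type]
        if self.entity_id is not None:
            parts.append(f"id:{self.entity_id}")
        if self.query_hash is not None:
            parts.append(f"query:{self.query_hash}")
        if self.relationship_path is not None:
            parts.append(f"rel:{self.relationship_path}")
        return "|".join(parts)


entity_key = CacheKeySketch("WallElement", entity_id=42)
query_key = CacheKeySketch("WallElement", query_hash="abc")
rel_key = CacheKeySketch("RoomElement", entity_id=7, relationship_path="walls")
```

Keeping the entity type as the first component is what makes substring-based invalidation by type practical.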

Cache Entries

CacheEntry carries the cached data along with metadata:

@dataclass
class CacheEntry:
    key: CacheKey
    data: Any
    created_at: datetime
    accessed_at: datetime
    access_count: int = 0
    ttl_seconds: int | None = None
    dependencies: set[str] = field(default_factory=set)

The is_expired property checks (now - created_at) > ttl_seconds. The mark_accessed() method updates accessed_at and increments access_count.

Memory estimation uses a constant of approximately 1,000 bytes per entry (MEMORY_USAGE_PER_ENTRY_BYTES), and capacity checks use 0.001 MB per entry (MEMORY_PER_ENTRY_ESTIMATE_MB).
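
The expiry check and the memory estimate can be sketched together. EntrySketch and fits_in_memory are hypothetical names; is_expired is written here as a method with an optional now argument (rather than a property) purely so the example is deterministic, and treating a missing TTL as "never expires" is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional

MEMORY_PER_ENTRY_ESTIMATE_MB = 0.001  # constant quoted above


def _utc_now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class EntrySketch:
    data: object
    created_at: datetime = field(default_factory=_utc_now)
    accessed_at: datetime = field(default_factory=_utc_now)
    access_count: int = 0
    ttl_seconds: Optional[int] = None

    def is_expired(self, now: Optional[datetime] = None) -> bool:
        if self.ttl_seconds is None:
            return False  # assumption: no TTL means the entry never expires
        now = now or _utc_now()
        return (now - self.created_at) > timedelta(seconds=self.ttl_seconds)

    def mark_accessed(self) -> None:
        self.accessed_at = _utc_now()
        self.access_count += 1


def fits_in_memory(entry_count: int, max_memory_mb: float) -> bool:
    """Capacity check based on the flat per-entry estimate."""
    return entry_count * MEMORY_PER_ENTRY_ESTIMATE_MB <= max_memory_mb
```

With the default configuration, 10,000 entries at 0.001 MB each are estimated at about 10 MB, well under the 500 MB limit, so under this flat estimate the entry-count limit is reached long before the memory limit.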

Change Tracking

Defined in revitpy/orm/change_tracker.py.

Change Types

The ChangeType enum covers:

Type                  Description
PROPERTY_CHANGED      A property value was modified
RELATIONSHIP_ADDED    A relationship was added
RELATIONSHIP_REMOVED  A relationship was removed
ENTITY_ADDED          A new entity was attached
ENTITY_DELETED        An entity was marked for deletion
ENTITY_ATTACHED       An entity was attached to the tracker
ENTITY_DETACHED       An entity was detached from the tracker

EntityTracker

Each tracked entity gets its own EntityTracker instance, which stores:

  • original_values – snapshot taken at attach time via snapshot_current_state().
  • current_values – accumulated changes.
  • property_changes – dictionary of PropertyChange objects keyed by property name.
  • relationship_changes – list of RelationshipChange objects.
  • state – current ElementState.
  • version – incremented on each accept_changes() and snapshot_current_state().

The is_dirty property returns True when state != UNCHANGED or when there are any tracked property or relationship changes.

ChangeTracker

The main ChangeTracker class manages a collection of EntityTracker instances:

  • attach(entity, entity_id) – begins tracking; takes an initial state snapshot.
  • detach(entity_id) – stops tracking.
  • track_property_change(entity, property_name, old_value, new_value) – records a property change; auto-attaches if not tracked.
  • track_relationship_change(entity, relationship_name, change_type, related_entity) – records a relationship change.
  • mark_as_added(entity) / mark_as_deleted(entity) – set entity state.
  • accept_changes(entity_id=None) – moves current values to original; clears change records.
  • reject_changes(entity_id=None) – reverts entity attributes to original_values.
  • get_all_changes() – returns ChangeSet objects for all dirty entities.
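
The snapshot, accept, and reject cycle can be illustrated with a single-entity sketch. TrackerSketch is hypothetical and much simpler than EntityTracker or ChangeTracker: it ignores relationships, states, versions, and locking, and it assumes the entity is a plain object whose attributes live in its instance dictionary.

```python
from types import SimpleNamespace


class TrackerSketch:
    def __init__(self, entity):
        self.entity = entity
        self.original_values = dict(vars(entity))  # snapshot at attach time
        self.property_changes = {}  # name -> (old_value, new_value)

    @property
    def is_dirty(self):
        return bool(self.property_changes)

    def track_property_change(self, name, old_value, new_value):
        # Keep the first old value so repeated edits still diff against the snapshot
        first_old = self.property_changes.get(name, (old_value, None))[0]
        self.property_changes[name] = (first_old, new_value)

    def accept_changes(self):
        # Current state becomes the new baseline
        self.original_values = dict(vars(self.entity))
        self.property_changes.clear()

    def reject_changes(self):
        # Restore every attribute captured in the snapshot
        for name, value in self.original_values.items():
            setattr(self.entity, name, value)
        self.property_changes.clear()


wall = SimpleNamespace(name="W1", height=10.0)
tracker = TrackerSketch(wall)
wall.height = 12.0
tracker.track_property_change("height", 10.0, 12.0)
tracker.reject_changes()  # wall.height is restored from the snapshot
```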

Thread safety is controlled by the thread_safe constructor parameter. When enabled, all public methods acquire a threading.RLock.

A @track_changes decorator is provided for automatic change tracking on setter methods.

Batch Operations

ChangeTracker also supports batch operations via BatchOperation objects:

@dataclass
class BatchOperation:
    operation_type: BatchOperationType  # INSERT, UPDATE, DELETE, BULK_UPDATE
    entity: Any
    properties: dict[str, Any]
    operation_id: UUID
    dependencies: list[UUID]
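
How the dependencies field is consumed is not described here. One plausible reading, shown below strictly as an assumption, is that each UUID names an operation that must run first, in which case a topological sort yields a valid execution order. OpSketch and execution_order are hypothetical, and the sketch assumes every dependency refers to an operation in the same batch.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from uuid import UUID, uuid4


@dataclass
class OpSketch:
    name: str
    operation_id: UUID = field(default_factory=uuid4)
    dependencies: list = field(default_factory=list)  # UUIDs of prerequisite operations


def execution_order(operations):
    """Order operations so that each one comes after its dependencies."""
    by_id = {op.operation_id: op for op in operations}
    graph = {op.operation_id: set(op.dependencies) for op in operations}
    # static_order() yields each node after all of its predecessors;
    # it raises graphlib.CycleError if the dependencies form a cycle.
    return [by_id[op_id] for op_id in TopologicalSorter(graph).static_order()]


insert_level = OpSketch("insert level")
insert_wall = OpSketch("insert wall", dependencies=[insert_level.operation_id])
ordered = execution_order([insert_wall, insert_level])
```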

Relationship Management

Defined in revitpy/orm/relationships.py.

Relationship Types

Type          Class                   Description
One-to-One    OneToOneRelationship    Single related entity
One-to-Many   OneToManyRelationship   Collection of related entities; supports add() and remove()
Many-to-Many  ManyToManyRelationship  Collection with junction table awareness

All relationship classes inherit from Relationship[T, R] and implement both load(entity) and load_async(entity).

Load Strategies

Defined by the LoadStrategy enum in revitpy/orm/types.py:

Strategy  Description
LAZY      Load on first access
EAGER     Load with parent entity
SELECT    Use a separate select query
BATCH     Batch load multiple entities
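
The difference between LAZY and EAGER comes down to when the loader runs. The sketch below uses a local stand-in enum with only those two members and a hypothetical RelationshipSketch class; SELECT and BATCH are not modelled.

```python
from enum import Enum


class LoadStrategy(Enum):  # stand-in with two of the four members above
    LAZY = "lazy"
    EAGER = "eager"


class RelationshipSketch:
    def __init__(self, loader, strategy=LoadStrategy.LAZY):
        self._loader = loader
        self._loaded = False
        self._value = None
        if strategy is LoadStrategy.EAGER:
            self.get()  # load immediately, together with the parent

    def get(self):
        if not self._loaded:  # LAZY: the first access triggers the load
            self._value = self._loader()
            self._loaded = True
        return self._value


load_calls = []


def load_walls():
    load_calls.append("load")
    return ["wall-1", "wall-2"]


lazy_walls = RelationshipSketch(load_walls)  # nothing loaded yet
```

Either way the loader runs at most once per instance; later calls to get() reuse the stored result.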

Relationship Configuration

@dataclass
class RelationshipConfiguration:
    name: str
    relationship_type: RelationshipType
    target_entity: type
    foreign_key: str | None = None
    inverse_property: str | None = None
    load_strategy: LoadStrategy = LoadStrategy.LAZY
    cascade: set[CascadeAction] = field(default_factory=set)
    cache_enabled: bool = True
    batch_size: int = 100

Cascade actions: NONE, DELETE, DELETE_ORPHAN, MERGE, PERSIST, REFRESH, DETACH.

RelationshipManager

RelationshipManager is the central registry for all relationships:

  • register_one_to_one(source_type, name, target_type, ...) – registers a 1:1 relationship.
  • register_one_to_many(source_type, name, target_type, ...) – registers a 1:N relationship.
  • register_many_to_many(source_type, name, target_type, ...) – registers an M:N relationship.
  • load_relationship(entity, relationship_name) – loads relationship data using the registered loader.
  • invalidate_entity(entity) – invalidates all cached relationship data for an entity.

Inverse relationships are tracked in a separate _inverse_relationships dictionary for bidirectional navigation.

Caching Integration

Each relationship instance maintains:

  • An in-memory _loaded_entities dictionary for fast re-access.
  • Integration with CacheManager for persistent caching when cache_enabled=True.

When a relationship collection is modified (via add() or remove() on OneToManyRelationship), the corresponding cache entry is automatically invalidated.

Query Execution Pipeline

Pipeline Stages

1. QueryBuilder.where/select/order_by/skip/take/distinct
   (Appends operations to QueryPlan, returns new QueryBuilder clone)
       |
2. Terminal method called (to_list, first, count, etc.)
       |
3. QueryPlan.optimize()
   - Moves filters before projections
   - Estimates cost (filter=2.0, select=1.0, order_by=3.0, skip/take=0.1, distinct=2.5)
   - Sets use_index=True if any filters exist
   - Enables parallel_execution if estimated_cost > 10.0
   - Applies OPTIMIZATION_IMPROVEMENT_FACTOR (0.8) to total cost
       |
4. LazyQueryExecutor.execute()
   - Checks query cache (by MD5 hash of plan operations)
   - Fetches initial elements from provider (by type or all)
   - Builds lazy generator chain:
     * filter -> generator expression
     * select -> generator expression
     * order_by -> sorted() (materialises)
     * skip -> itertools.islice(elements, count, None)
     * take -> itertools.islice(elements, count)
     * distinct -> custom generator with seen-set
   - Materialises with list()
   - Caches results if < 1,000 elements and cache policy != NONE
       |
5. Result returned to caller
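
Stages 3 and 4 can be approximated in a few lines. This is a sketch, not the QueryPlan or LazyQueryExecutor code: a plan is modelled as a list of (operation, argument) tuples, caching, index selection, and filter reordering are left out, and comparing the parallel threshold against the cost before the 0.8 factor is an assumption based on the order of the steps above.

```python
from itertools import islice

# Per-operation costs and improvement factor quoted in stage 3 above
OPERATION_COSTS = {"filter": 2.0, "select": 1.0, "order_by": 3.0,
                   "skip": 0.1, "take": 0.1, "distinct": 2.5}
OPTIMIZATION_IMPROVEMENT_FACTOR = 0.8
PARALLEL_COST_THRESHOLD = 10.0


def estimate(plan):
    """Return (final_cost, parallel_execution) for a list of (op, arg) tuples."""
    raw_cost = sum(OPERATION_COSTS[op] for op, _ in plan)
    parallel = raw_cost > PARALLEL_COST_THRESHOLD  # assumption: checked before the factor
    return raw_cost * OPTIMIZATION_IMPROVEMENT_FACTOR, parallel


def _distinct(stream):
    seen = set()
    for item in stream:
        if item not in seen:
            seen.add(item)
            yield item


def execute(elements, plan):
    """Chain lazy iterators per operation, then materialise once at the end."""
    stream = iter(elements)
    for op, arg in plan:
        if op == "filter":
            stream = filter(arg, stream)            # lazy
        elif op == "select":
            stream = map(arg, stream)               # lazy
        elif op == "order_by":
            stream = iter(sorted(stream, key=arg))  # materialises here
        elif op == "skip":
            stream = islice(stream, arg, None)
        elif op == "take":
            stream = islice(stream, arg)
        elif op == "distinct":
            stream = _distinct(stream)
        else:
            raise ValueError(f"unknown operation: {op}")
    return list(stream)


walls = [{"name": "A", "height": 12}, {"name": "B", "height": 8},
         {"name": "C", "height": 15}, {"name": "D", "height": 9}]
plan = [("filter", lambda w: w["height"] > 8.5),
        ("order_by", lambda w: w["height"]),
        ("skip", 1), ("take", 2),
        ("select", lambda w: w["name"])]
names = execute(walls, plan)
```

The sketch uses the built-in filter() and map() rather than generator expressions inside the loop, because a generator expression would look up the loop variable only when consumed and every stage would end up using the last predicate.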

Async Execution

Async terminal methods (to_list_async, first_async, count_async) delegate to LazyQueryExecutor.execute_async(), which runs the synchronous pipeline in a thread executor via asyncio.get_event_loop().run_in_executor().
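
The delegation pattern looks roughly like the sketch below. execute_sync and execute_async here are hypothetical stand-ins for the executor's methods. The sketch calls asyncio.get_running_loop(), which is the usual choice inside a coroutine; the text above names asyncio.get_event_loop(), which returns the same loop when called from running async code.

```python
import asyncio
from functools import partial


def execute_sync(values, minimum):
    """Stand-in for the synchronous pipeline (blocking work)."""
    return [v for v in values if v >= minimum]


async def execute_async(values, minimum):
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default ThreadPoolExecutor
    return await loop.run_in_executor(None, partial(execute_sync, values, minimum))


result = asyncio.run(execute_async([1, 5, 10], minimum=5))
```

Running the query in a worker thread keeps the event loop responsive, but the query itself is no faster than the synchronous call.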

Streaming Execution

StreamingQuery wraps a QueryBuilder and yields results in batches:

streaming = context.query(WallElement).where(...).as_streaming(batch_size=100)
async for batch in streaming:
    process(batch)

For queries with parallel_execution enabled, the full result set is computed first and then yielded in chunks. For all other (lower-cost) queries, all results are yielded in a single batch.
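
The "compute first, then yield in chunks" behaviour can be pictured as an async generator over an already materialised list. stream_in_batches and collect are hypothetical helpers written for this sketch, not part of StreamingQuery.

```python
import asyncio


async def stream_in_batches(results, batch_size):
    """Yield consecutive slices of an already computed result list."""
    for start in range(0, len(results), batch_size):
        yield results[start:start + batch_size]
        await asyncio.sleep(0)  # give other tasks a chance to run between batches


async def collect(results, batch_size):
    return [batch async for batch in stream_in_batches(results, batch_size)]


batches = asyncio.run(collect(list(range(5)), batch_size=2))
```

Because the results are computed before the first batch is yielded, this style of streaming limits how much the consumer handles at once but does not reduce peak memory use.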