Workflow Context & Architecture
Museum digital asset pipelines require deterministic, time-bound state transitions. Embargoes arise from donor agreements, institutional review periods, and statutory copyright terms. Unlike static access controls, embargo workflows rely on temporal triggers: asset metadata is evaluated continuously so that nothing is exposed prematurely. Production systems decouple evaluation from ingestion. Scheduled asynchronous evaluators poll the metadata stores, apply compliance gates, and emit state-change events. This design integrates with broader Rights Metadata Mapping & Licensing Automation frameworks, so embargo expiration never bypasses downstream rights verification. The pipeline evaluates assets in batches, normalizes all timestamps to UTC, and keeps routing idempotent to prevent duplicate publish events.
stateDiagram-v2
    [*] --> active
    active --> expired: end date reached
    expired --> pending_review: rights re-check
    pending_review --> published: cleared
    pending_review --> active: re-embargo
    published --> [*]
Standards Alignment & Metadata Mapping
Embargo states must serialize predictably across LIDO and IIIF. LIDO records rights information in <lido:rightsWorkSet> elements, grouped in <lido:rightsWorkWrap> under <lido:administrativeMetadata>. Within a set, <lido:rightsType> (through its <lido:term>) stores the embargo classification, and <lido:rightsDate> carries the period to which it applies. An IIIF Presentation API 3.0 manifest may carry an optional rights property holding a single license or rights-statement URI. Access control itself is not a manifest property: it is handled by the separate IIIF Authorization Flow (Auth) API, referenced through the service property, and that auth layer enforces the embargo’s temporal boundaries. Systems should parse these fields into normalized Python objects, because consistent mapping prevents downstream routing failures. See Setting Date-Based Embargo Triggers for boundary calculation logic. Reference the official LIDO v1.1 specification for element placement rules.
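As an illustration of that parsing step, the sketch below reads one rights set from a LIDO record into a small Python object. It assumes the embargo is recorded in a lido:rightsWorkSet with a lido:rightsType/lido:term and a lido:rightsDate holding lido:earliestDate and lido:latestDate. The EmbargoRecord class, the parse_embargo helper, the "embargo" term value, and the simplified sample record (not schema-valid) are hypothetical names chosen for this example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

LIDO_NS = {"lido": "http://www.lido-schema.org"}

# Simplified, hypothetical record: only the elements this sketch reads.
SAMPLE = """<lido:lido xmlns:lido="http://www.lido-schema.org">
  <lido:administrativeMetadata>
    <lido:rightsWorkWrap>
      <lido:rightsWorkSet>
        <lido:rightsType><lido:term>embargo</lido:term></lido:rightsType>
        <lido:rightsDate>
          <lido:earliestDate>2020-01-01</lido:earliestDate>
          <lido:latestDate>2030-01-01</lido:latestDate>
        </lido:rightsDate>
      </lido:rightsWorkSet>
    </lido:rightsWorkWrap>
  </lido:administrativeMetadata>
</lido:lido>"""


@dataclass
class EmbargoRecord:
    """Hypothetical normalized view of one LIDO rights set."""
    classification: Optional[str]
    start: Optional[datetime]
    end: Optional[datetime]


def _to_utc(text: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 date or datetime; naive values are assumed to be UTC."""
    if not text:
        return None
    parsed = datetime.fromisoformat(text.strip().replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        return parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def parse_embargo(lido_xml: str) -> Optional[EmbargoRecord]:
    """Return the first rights set found in the record, or None if there is none."""
    root = ET.fromstring(lido_xml)
    rights = root.find(".//lido:rightsWorkSet", LIDO_NS)
    if rights is None:
        return None

    def text_of(path: str) -> Optional[str]:
        return rights.findtext(path, default=None, namespaces=LIDO_NS)

    return EmbargoRecord(
        classification=text_of("lido:rightsType/lido:term"),
        start=_to_utc(text_of("lido:rightsDate/lido:earliestDate")),
        end=_to_utc(text_of("lido:rightsDate/lido:latestDate")),
    )
```

Calling parse_embargo(SAMPLE) yields a record whose end is midnight UTC on 2030-01-01. A real record may hold several rights sets, so a production mapper would likely iterate over all of them and select by classification rather than take the first.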
Core Evaluation Engine (Python 3.9+)
The evaluator operates as a bounded-concurrency service. It fetches asset batches, validates temporal boundaries, and routes based on state. Synchronous blocking is prohibited in production. Connection pooling with aiohttp manages throughput. Type hints document the schema, and Pydantic models (the code uses the Pydantic v2 field_validator API) validate each record before evaluation. The following implementation sketches the core of this pattern.
import asyncio
import logging
from datetime import datetime, timezone
from enum import Enum
from typing import Any, Dict, List, Optional, Union

import aiohttp
from pydantic import BaseModel, field_validator

logger = logging.getLogger("embargo_pipeline")


class EmbargoState(str, Enum):
    ACTIVE = "active"
    EXPIRED = "expired"
    PENDING_REVIEW = "pending_review"


class AssetMetadata(BaseModel):
    asset_id: str
    embargo_start: datetime
    embargo_end: Optional[datetime] = None
    rights_statement: Optional[str] = None
    state: EmbargoState = EmbargoState.ACTIVE

    @field_validator("embargo_start", "embargo_end", mode="before")
    @classmethod
    def enforce_utc(cls, v: Optional[Union[str, datetime]]) -> Optional[datetime]:
        if v is None:
            return v
        if isinstance(v, str):
            # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11.
            v = datetime.fromisoformat(v.replace("Z", "+00:00"))
        if not isinstance(v, datetime):
            return v  # leave other input types to Pydantic's own validation
        if v.tzinfo is None:
            # Naive values are assumed to already be in UTC.
            return v.replace(tzinfo=timezone.utc)
        # Aware values are converted, so every stored boundary is in UTC.
        return v.astimezone(timezone.utc)


async def fetch_asset_batch(
    session: aiohttp.ClientSession, offset: int, limit: int
) -> List[Dict[str, Any]]:
    """Fetch paginated metadata from CMS/DAM API with strict timeout.

    The session must be created with a base URL for the relative path below,
    e.g. ``aiohttp.ClientSession(base_url="https://dam.internal")``.
    """
    async with session.get(
        "/api/v1/assets/embargoed",
        params={"offset": offset, "limit": limit},
        timeout=aiohttp.ClientTimeout(total=15),
    ) as resp:
        resp.raise_for_status()
        return await resp.json()


def evaluate_embargo_state(asset: AssetMetadata, now: datetime) -> EmbargoState:
    """Determine current embargo status using UTC-normalized boundaries.

    ``now`` must be timezone-aware, e.g. ``datetime.now(timezone.utc)``.
    """
    if asset.embargo_end is None:
        # No end date: the embargo stays active indefinitely.
        return EmbargoState.ACTIVE
    if now >= asset.embargo_end:
        return EmbargoState.EXPIRED
    return EmbargoState.ACTIVE
Routing & State Transition Logic
Evaluation outputs must trigger deterministic routing. Expired assets require secondary validation: copyright status checks verify statutory terms before public release. The routing layer applies a fallback chain when rights data is incomplete. Pending-review assets enter an internal queue, and active assets remain isolated from public endpoints. Event-driven architectures publish state changes to a message broker, where consumers update CMS records and regenerate IIIF manifests. This approach aligns with Automating Copyright Status Checks for post-embargo verification.
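The routing rules above can be sketched as a pure function, which keeps the decision deterministic and easy to test. This is a minimal sketch, not the pipeline's actual router: the Route destinations, the EvaluatedAsset shape, and the collection_default_rights fallback field are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Route(str, Enum):
    HOLD = "hold"                        # remain isolated from public endpoints
    REVIEW_QUEUE = "review_queue"        # internal queue for manual review
    COPYRIGHT_CHECK = "copyright_check"  # secondary validation before release


@dataclass(frozen=True)
class EvaluatedAsset:
    asset_id: str
    state: str  # "active", "expired", or "pending_review"
    rights_statement: Optional[str] = None
    collection_default_rights: Optional[str] = None  # hypothetical fallback source


def resolve_rights(asset: EvaluatedAsset) -> Optional[str]:
    """Fallback chain: asset-level rights first, then the collection default."""
    for candidate in (asset.rights_statement, asset.collection_default_rights):
        if candidate:
            return candidate
    return None


def route(asset: EvaluatedAsset) -> Route:
    """Map an evaluated asset to exactly one destination."""
    if asset.state == "active":
        return Route.HOLD
    if asset.state == "pending_review":
        return Route.REVIEW_QUEUE
    if asset.state == "expired":
        # Expiry alone never publishes: either run the copyright check or,
        # when no rights data can be resolved, fall back to manual review.
        if resolve_rights(asset) is not None:
            return Route.COPYRIGHT_CHECK
        return Route.REVIEW_QUEUE
    raise ValueError(f"unknown embargo state: {asset.state!r}")
```

Because route has no side effects, the same input always produces the same destination. Publishing the resulting event to a broker can then live in a separate layer.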
Production Deployment & Idempotency
Idempotent routing prevents duplicate publish events. Each state transition requires a unique correlation ID. The pipeline retries failed evaluations with exponential backoff. Connection limits protect downstream DAM APIs. Monitoring tracks evaluation latency and error rates. Alerting triggers when embargo boundaries drift from expected values. Scheduled cron jobs or Kubernetes CronJobs execute the evaluator. Logs capture asset IDs, previous states, and new routing destinations. This ensures auditability for donor compliance. Consult asyncio concurrency patterns for semaphore tuning.
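One way to combine correlation IDs with retries is sketched below. It makes two assumptions that the text does not prescribe: the correlation ID is derived deterministically from the transition itself, so a replayed evaluation produces the same ID, and deduplication uses an in-memory set, which a real deployment would probably replace with a shared store. TransitionPublisher, correlation_id, and with_backoff are hypothetical helpers.

```python
import asyncio
import hashlib
from typing import Awaitable, Callable, Set, TypeVar

T = TypeVar("T")


def correlation_id(asset_id: str, previous: str, new: str, boundary_iso: str) -> str:
    """Same transition in, same ID out, so replays can be recognized."""
    raw = "|".join((asset_id, previous, new, boundary_iso))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class TransitionPublisher:
    """Sends each distinct transition at most once per process."""

    def __init__(self, send: Callable[[dict], Awaitable[None]]) -> None:
        self._send = send
        self._seen: Set[str] = set()

    async def publish(self, asset_id: str, previous: str, new: str, boundary_iso: str) -> bool:
        cid = correlation_id(asset_id, previous, new, boundary_iso)
        if cid in self._seen:
            return False  # duplicate: already published
        self._seen.add(cid)  # reserve before awaiting to avoid concurrent doubles
        try:
            await self._send({"correlation_id": cid, "asset_id": asset_id,
                              "previous": previous, "new": new})
        except Exception:
            self._seen.discard(cid)  # allow a later retry
            raise
        return True


async def with_backoff(op: Callable[[], Awaitable[T]], retries: int = 3,
                       base_delay: float = 0.5) -> T:
    """Retry op, doubling the delay after each failure."""
    for attempt in range(retries + 1):
        try:
            return await op()
        except Exception:
            if attempt == retries:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise RuntimeError("retries must be >= 0")
```

A caller might wrap a publish call as `await with_backoff(lambda: publisher.publish(...))`. Because the ID is deterministic, a retry that follows a successful send returns False instead of emitting a second event.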
Integration with Licensing Pipelines
Embargo expiration does not imply open access. Assets transition to a rights verification stage, and Creative Commons routing applies only after statutory clearance. The pipeline validates license compatibility against institutional policies. Public domain thresholds require precise date arithmetic. Missing metadata triggers fallback chains for manual review. Final publication updates the rights property of the IIIF manifest. Access tokens expire alongside embargo boundaries. This workflow keeps rights enforcement consistent across all distribution channels. See Routing Creative Commons Licenses for downstream license assignment patterns.
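To make the date arithmetic concrete, the sketch below computes a public domain threshold under one common rule, stated here as an assumption: protection lasts for the author's life plus a fixed number of years and runs to the end of that calendar year. The 70-year default and the function names are illustrative only; actual terms depend on jurisdiction and work type.

```python
from datetime import date, datetime, timezone


def public_domain_date(death_year: int, term_years: int = 70) -> date:
    """First day out of copyright under a life-plus-N, end-of-year rule.

    Assumption: protection runs through 31 December of death_year + term_years,
    so the work enters the public domain on the following 1 January.
    """
    return date(death_year + term_years + 1, 1, 1)


def is_public_domain(death_year: int, now: datetime, term_years: int = 70) -> bool:
    """Compare a timezone-aware instant against the threshold, using the UTC date."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    today_utc = now.astimezone(timezone.utc).date()
    return today_utc >= public_domain_date(death_year, term_years)
```

For an author who died in 1950, public_domain_date(1950) returns 1 January 2021: the last protected day is 31 December 2020, not the 70th anniversary of the death. Evaluating the boundary on the UTC date is a simplification that matches the pipeline's UTC normalization.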
Conclusion
The critical design decision is that an expired embargo leads to a pending_review state, not to direct publication. This enforces a secondary rights check, typically copyright status verification, before any asset becomes publicly accessible. The IIIF Auth API, not the manifest’s rights property, is the correct mechanism for enforcing access control at serving time: the manifest records what rights apply, while the Auth API controls who can exercise them.