Every respondent identified the same root cause, though the depth of engagement varied considerably. The core friction isn't the test fit itself — it's the manual labor of translating three incompatible input formats into a usable starting point for the Revit prototype. The design team is acting as the human bridge between inconsistent inputs and a parametric model, and that bridge-building is what consumes the two days, not the architectural judgment that follows.
The two highest-scoring respondents pushed this diagnosis further and arrived at the same structural insight from different angles. Both argued that the real deliverable for a simple fit — the venue metrics package that feeds cost estimation and revenue analysis — is deterministic given a site geometry. That means the system doesn't necessarily need to produce a Revit model at all. A pipeline that extracts geometry from the site plan and applies the parametric rules directly would solve the bottleneck without touching the Revit API. This is a meaningful reframe: it changes the build problem from "automate Revit" to "extract geometry, apply rules, output metrics" — a more tractable and more auditable target.
The respondent representing a purpose-built spatial feasibility platform argued differently: keep Revit as the engine and build around it, automating the repetitive configuration steps while leaving the site planner in control of judgment calls. That's a legitimate path, and it preserves the existing workflow rather than replacing it. The practical question is whether Live Nation needs Revit as the output format for downstream users, or whether the metrics package is sufficient — that question wasn't answered in the brief, and several respondents named it as a critical clarification before any scoping can happen.
Where every strong respondent converged was on the prior standardization failure. The attempt was abandoned because exceptions were too frequent. The design error was trying to constrain inputs. The new system has to constrain itself instead — tolerant of every input format the team sees today, surfacing its own uncertainty when geometry extraction isn't confident rather than generating wrong metrics silently.
Three of four respondents proposed the same first-30-day structure: spend the first week embedded with the design team watching actual test fits (not reviewing documentation), use weeks two and three to audit input formats and the Revit model's automation surface area, and end month one with a working prototype running against historical fits. The deliverable at Day 30 is not a finished product — it's a decision artifact backed by working code: here's prototype accuracy per metrics field, here's what Phase 2 needs to build.
Two specific moves appeared across nearly every response. First, sample real inputs before proposing anything: pull 10–15 historical test fits across the format mix — DWG, GIS, Google Maps — and measure what fraction of site geometry current parsing can extract reliably. That ratio sets the realistic automation coverage before anyone commits to a scope. Second, audit the Revit prototype model directly: establish which dimensions the team currently adjusts are driveable via the Revit API, and which require manual geometric edits. Several respondents flagged that if the prototype wasn't built with automation in mind, hardening it becomes part of the engagement scope — and needs to be discovered in week one, not week six.
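The first move, measuring what fraction of site geometry can be extracted reliably per format, could be tallied with something as small as the sketch below. The audit records are illustrative placeholders, not measurements from the brief, and `coverage_by_format` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical audit records: (input format, was extraction judged reliable?).
# Placeholder values only; the real corpus would be the 10-15 historical fits.
AUDIT_RECORDS = [
    ("DWG", True), ("DWG", True), ("DWG", False),
    ("GIS", True), ("GIS", True),
    ("GoogleMaps", True), ("GoogleMaps", False), ("GoogleMaps", False),
]

def coverage_by_format(records):
    """Return {format: fraction of samples whose geometry extracted reliably}."""
    totals = defaultdict(int)
    reliable = defaultdict(int)
    for fmt, ok in records:
        totals[fmt] += 1
        if ok:
            reliable[fmt] += 1
    return {fmt: reliable[fmt] / totals[fmt] for fmt in totals}

coverage = coverage_by_format(AUDIT_RECORDS)
for fmt, ratio in sorted(coverage.items()):
    print(f"{fmt}: {ratio:.0%} reliable")
```

With a corpus this small the ratios are coarse, so they are better read as a scoping signal than as an accuracy guarantee.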
The meaningful split is in ambition. Kler and Word both structured Phase 1 to end with a named go/no-go gate with the right decision-makers before Phase 2 commitment — the right posture given that the decision sits at President level. Blackwood proposed a tighter timeline (30 days to full MVP delivery via the Giraffe spatial platform), which reflects a different proposition: a purpose-built tool with existing automation rather than a custom build from first principles, with a correspondingly different risk profile.
All four respondents proposed phased structures, and three built an explicit go/no-go gate between phases. The consistent logic: this problem has enough open unknowns (input format distribution, Revit API surface area, what specifically caused the prior failure) that committing to a full production build upfront is premature. Phase 1 is designed to surface those unknowns and produce working code on real historical fits, so Phase 2 scope can be grounded in observed performance rather than estimates. The most notable structural split is custom pipeline (three respondents) vs. purpose-built spatial platform (Giraffe): different build risk profiles and different dependencies on a third-party roadmap.
| Expert / Firm | Engagement Model | Indicative Budget | Fit |
|---|---|---|---|
| Aanikh Kler (Lazer Technologies) | Phase 1 (4–6 wks): embedded discovery, parallel input + Revit audits, working slice on highest-leverage cut, go/no-go with John/Jordan/David. Phase 2 (8–12 wks): full ingest pipeline, Revit adaptation layer, exception-routing UI, throughput measurement. | Not specified | Strong Fit |
| James Word (NextFocus AI) | Phase 1 (~4 wks): audit inputs, prototype against historical fits, go/no-go gate. Phase 2 (8 wks): full triage pipeline. Phase 3 (4 wks): hardening + training. | ~$57K–$128K total; Phase 1 standalone: $12K–$18K | Strong Fit |
| James Blackwood (Giraffe) | Scoping call + PoC demo → contract + MVP config → refine to spec → optional integrations with capital allocation process. 30-day delivery claim. | Not specified | Fit |
| Justin D'Iorio (Baseforge) | Discovery phase (4 wks): process maps, time allocation, feasibility analysis, prioritized recommendations. Build MVP scope defined at close of Discovery. | ~$40K Discovery; Build TBD | Fit |
Word's $12–18K Phase 1 is the most accessible entry point in the set — it stands on its own as a scoping investment even if Live Nation decides not to proceed further. D'Iorio's $40K discovery is the most expensive pure-discovery phase, but includes the most formal deliverable (process maps + prioritized recommendations). Neither Kler nor Blackwood specified pricing, which will need to be addressed before shortlisting.
Multiple respondents named the same hidden dependency: "simple vs. complex" is currently an intuitive call, not a defined standard. The whole pipeline depends on a classifier that can route routine jobs to automation and flag complex ones for the team. If that classifier can't be built reliably from observable input signals, the automation layer either over-reaches into jobs that needed judgment, or under-delivers because the team still triages everything manually. This is week-one discovery work, not an assumption to carry into Phase 2.
"Adjusting width, depth, doors, and loading dock placement while maintaining capacity" reads like parametric work — but whether those edits are driveable via the Revit API or require manual geometric operations is an open question. Two respondents flagged this as the second critical unknown: if the prototype wasn't built with automation in mind, hardening it becomes part of the engagement scope and needs to be discovered in Phase 1, not week six.
Google Maps screenshots are fundamentally different from DWG files — computer vision accuracy is lower, cost is higher, and dimensional estimations carry inherent uncertainty. If a meaningful fraction of incoming fits arrive as raster images, the system needs to handle them gracefully, surfacing its own confidence level rather than generating wrong metrics silently. The actual format mix is unknown and is prerequisite to scoping.
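To make "surfacing its own confidence level" concrete for raster inputs, here is a minimal sketch of reference-scale calibration with a first-order error bound. It assumes a dimension is measured in pixels against one reference object of known length. The pixel-picking error, the 5% review threshold, and all names are assumptions for illustration, not values from any respondent.

```python
from dataclasses import dataclass

@dataclass
class Dimension:
    meters: float
    rel_uncertainty: float  # worst-case first-order relative error
    needs_review: bool

def measure_from_raster(length_px, ref_px, ref_m,
                        pick_error_px=2.0, max_rel_uncertainty=0.05):
    """Convert a pixel length to meters using a reference of known size.

    Assumes the same pixel-picking error applies to both the reference and the
    measured length, and adds the two relative errors (first-order bound).
    """
    scale = ref_m / ref_px                      # meters per pixel
    rel = pick_error_px / ref_px + pick_error_px / length_px
    return Dimension(length_px * scale, rel, rel > max_rel_uncertainty)

# A long reference keeps the bound low; a short one trips the review flag.
confident = measure_from_raster(length_px=400, ref_px=200, ref_m=50.0)
uncertain = measure_from_raster(length_px=60, ref_px=40, ref_m=10.0)
print(confident)
print(uncertain)
```

The design point is that every extracted dimension carries its own error bound, so downstream metrics can be withheld or flagged rather than emitted silently.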
The prior standardization attempt was abandoned because exceptions were too frequent. Several respondents argued that understanding what specifically broke it — not just that it broke — is the spec for what the new system has to tolerate. A system that makes the same architectural error (constraining inputs rather than absorbing them) will fail the same way.
The two days a simple test fit takes is not two days of architectural judgment. Most of it is the design team acting as the human bridge between three incompatible input formats (DWG, GIS, scaled Google Maps imagery) and a Revit prototype model. Two steps inside that bridge are mechanical enough to absorb: site geometry interpretation, and the iterative width/depth/door/loading-dock adjustments that keep capacity fixed. The third step — validating the result — is human judgment but takes minutes once the first two are right.
The reframe: the goal isn't to make test fits faster in general. It's to absorb the two mechanical steps (site geometry interpretation and the iterative dimension adjustments) on the routine jobs and leave the rest of the workflow unchanged, handing back the same Revit metrics output that the cost-estimation and revenue-analysis pipelines already consume. The prior standardization attempt failed because the team tried to constrain inputs. The system has to constrain itself instead: tolerant of every format, surfacing its own uncertainty when it can't confidently extract geometry.
What we'd validate first: sample 10–15 real inputs across the format mix, measure reliable extraction fraction per format; audit the Revit prototype's configurable surface area vs. what requires manual edits; understand what specifically failed last time; define "simple" rigorously by observable input signals.
Week 1 — Embedded discovery. Sit in with John and the team. Watch 2–3 test fits live, including one where the input is a scaled Google Maps image. Pull 10–15 historical test fits across the input-format mix as the audit corpus.
Week 2 — Two parallel audits. Input audit: bench-test extraction against the corpus for each format (DWGs against ODA/Teigha libraries; GIS against GDAL/OGR; scaled imagery against current vision models with reference-scale calibration). Revit configurability audit: for each dimension the team currently adjusts, establish whether it's driveable via Revit API/Dynamo or requires manual geometric edits. This sets the automation ceiling.
Week 3 — Synthesis. Map the workflow end-to-end, rate each step by automation feasibility, define "simple" rigorously (which observable signals classify a job as routine), pressure-test with John's team.
Week 4 — Working slice + decision artifact. Build a working prototype on the single highest-leverage cut (DWG → extracted site geometry → driving the Revit prototype's configurable parameters). Run on 3–5 historical fits. Measure against the 2-day baseline. Hand back a decision artifact substantive enough for Jordan and David to decide on.
Two phases, gated by working prototype.
Phase 1 — Discovery + working slice (4–6 weeks): Embedded sessions with John's team. Input audit on 10–15 historical fits. Revit prototype model audit. Reverse-engineer the prior standardization failure. Ship a working slice on the highest-leverage cut. Decision gate with John, Jordan, David: continue to Phase 2, refine scope, or stop.
Phase 2 — Production tooling (8–12 weeks): Extend ingest to remaining input formats (GIS + Google Maps with explicit uncertainty surfacing). Wrap the Revit prototype-adaptation step into a configurable workflow so the full metrics package generates without manual model edits. Lightweight exception-routing UI: anything the pipeline doesn't confidently classify as simple goes back to the design team with whatever partial work the pipeline produced. Measure throughput against the existing 2-day baseline.
The classifier is the system. If "simple vs. complex" can't be reliably detected from observable inputs, automation either over-reaches or under-delivers. The single biggest project risk is shipping good ingest + good Revit automation behind a weak classifier — net effect is no throughput gain and eroded team trust. Phase 1 has to prove the classifier works on real historical samples before Phase 2 commits to production tooling.
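One way to picture "proving the classifier on real historical samples" is a conservative rule-based router scored against the team's own labels. This is a sketch under stated assumptions: the two signals (extraction confidence and parcel vertex count) and their thresholds are hypothetical stand-ins for whatever observable signals discovery actually identifies.

```python
from dataclasses import dataclass

@dataclass
class FitRequest:
    extraction_confidence: float  # 0..1, reported by the ingest step (assumed)
    parcel_vertices: int          # crude proxy for site irregularity (assumed)

def classify(req, min_conf=0.9, max_vertices=8):
    """Route to 'simple' only when every signal clears its threshold."""
    if req.extraction_confidence < min_conf:
        return "complex"
    if req.parcel_vertices > max_vertices:
        return "complex"
    return "simple"

def evaluate(samples):
    """samples: list of (FitRequest, team_label).

    Returns (over_reach, under_delivery): complex jobs routed as simple, and
    simple jobs routed as complex.
    """
    over_reach = sum(1 for r, label in samples
                     if classify(r) == "simple" and label == "complex")
    under_delivery = sum(1 for r, label in samples
                         if classify(r) == "complex" and label == "simple")
    return over_reach, under_delivery

# Placeholder history; real labels would come from the design team.
history = [
    (FitRequest(0.97, 4), "simple"),
    (FitRequest(0.55, 4), "simple"),
    (FitRequest(0.95, 6), "complex"),
    (FitRequest(0.92, 14), "complex"),
]
over_reach, under_delivery = evaluate(history)
print(over_reach, under_delivery)
```

The two counts map directly onto the two failure modes named above: over-reach erodes trust, under-delivery erodes the throughput gain.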
The Revit prototype's configurable surface area is the second hidden dependency. "Adjusting width, depth, door and loading-dock placement while maintaining capacity" reads like parametric work, but Revit's API exposes some edits cleanly and others only through manual geometric operations. If the prototype wasn't built with automation in mind, hardening it is part of the engagement — and has to be scoped explicitly, not discovered in week 6.
Koru — AI pipeline for heterogeneous document extraction (Insurance): Built an AI extraction pipeline for insurance submission documents (SOVs, Excel, PDF schedules) — no consistent format across submitters. Framework-based architecture with swappable underlying model. 80% reported accuracy on extraction task. Positive pilot used in market conversations.
Leo Berwick LLP — Operationalizing an exception-heavy expert workflow (M&A diligence): Replaced a manual diligence process with dynamic questionnaires that re-shaped per deal configuration, dependency logic across 500 conditional questions, AI-assisted PDF parsing, and a structured report builder. Prior systematization attempts had stalled. Live in production. Why it maps: "Both have a prior failed attempt to enforce regularity on the human side of the process. Both require the build to absorb irregularity in the system rather than push it back on users."
The core problem is a triage problem. Right now every incoming fit hits the same three-person team regardless of whether it's a simple site or a genuinely complex one. The simple ones take two days each and they pile up, which means the team has less bandwidth for the hard stuff that actually needs their expertise.
The other thing that jumped out: for simple fits, the real deliverable may not need to be a Revit model. It's the metrics package (square footage, capacities, toilet counts, POS, building section, etc.). Revit calculates those from an adapted model, which means the math is deterministic. An AI and code pipeline that takes in a site plan, extracts the geometry, and generates the metrics directly would solve the problem without touching the Revit API at all.
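A minimal sketch of the "extract geometry, apply rules, output metrics" idea follows, assuming the footprint is already available as an ordered polygon in feet. The rule parameters (`SQFT_PER_PERSON`, `PERSONS_PER_TOILET`) are hypothetical placeholders. The actual parametric rules are exactly what Phase 1 would have to reverse-engineer.

```python
import math

def polygon_area(vertices):
    """Shoelace formula; vertices are (x, y) pairs in order, units of feet."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2

# HYPOTHETICAL rule parameters, for illustration only.
SQFT_PER_PERSON = 7.0
PERSONS_PER_TOILET = 75

def metrics_package(footprint):
    """Derive a small metrics package deterministically from a footprint."""
    area = polygon_area(footprint)
    capacity = math.floor(area / SQFT_PER_PERSON)
    toilets = math.ceil(capacity / PERSONS_PER_TOILET)
    return {"square_footage": area, "capacity": capacity, "toilet_count": toilets}

# A 100 ft x 70 ft rectangular footprint.
package = metrics_package([(0, 0), (100, 0), (100, 70), (0, 70)])
print(package)
```

Because the same footprint always yields the same package, each field can be audited against the Revit output one rule at a time.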
Three clarifying questions before proposing anything: (1) How many incoming site plans are Google Maps screenshots vs. clean DWG files? That changes difficulty significantly. (2) Does anyone downstream actually need a Revit file for simple fits, or is the metrics package enough? (3) Is "simple vs. complex" a defined standard or an intuitive assumption?
Week 1: Sit down with the design team and walk through a few recent fits — one simple and one complex. See where the time actually goes, what the judgment calls look like, and what formats the site plans come in. Ask for 3–5 completed simple fits with both the inputs and the final metrics packages to test against.
Weeks 2–3: Dig into the parametric rules behind each metric. For each one (sq footage, capacity, toilets, POS, etc.), figure out what drives the number and where judgment enters vs. where it's pure math. In parallel, start building the parsing pipeline against the benchmark inputs and generating prototype metrics.
Week 4: Compare prototype output to actual Revit packages, document accuracy per field, write up an honest recommendation. If it works: here's the plan for Phase 2. If it doesn't: here's why and what would need to change.
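"Document accuracy per field" could be computed along these lines. The sketch assumes both the prototype and the Revit package are available as flat dictionaries of numeric fields. The 2% tolerance and the sample values are illustrative assumptions.

```python
def field_accuracy(pairs, tolerance=0.02):
    """pairs: list of (prototype_metrics, revit_metrics) dicts.

    Returns {field: fraction of fits where the prototype value is within the
    relative tolerance of the Revit value}. Missing fields count as misses.
    """
    hits, counts = {}, {}
    for proto, actual in pairs:
        for field, truth in actual.items():
            counts[field] = counts.get(field, 0) + 1
            guess = proto.get(field)
            if guess is None:
                continue
            denom = abs(truth) if truth != 0 else 1.0
            if abs(guess - truth) / denom <= tolerance:
                hits[field] = hits.get(field, 0) + 1
    return {field: hits.get(field, 0) / counts[field] for field in counts}

# Placeholder comparison of two historical fits.
comparisons = [
    ({"square_footage": 7000, "capacity": 1000, "toilet_count": 14},
     {"square_footage": 7050, "capacity": 1000, "toilet_count": 15}),
    ({"square_footage": 5200, "capacity": 700, "toilet_count": 10},
     {"square_footage": 5200, "capacity": 750, "toilet_count": 10}),
]
accuracy = field_accuracy(comparisons)
print(accuracy)
```

Reporting per field rather than per fit shows which rules are pure math and which still hide judgment calls.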
Phase 1 — Discovery + prototype (~4 weeks, ~$12K–18K): Audit input formats, reverse-engineer the parametric rules behind each metric, build a working prototype against 2–3 historical simple fits, measure accuracy against actual Revit output. Clear go/no-go gate at the end. Phase 1 stands on its own.
Phase 2 — Production pipeline (8 weeks, ~$35K–80K): Full intake system where every fit enters the pipeline and comes out sorted: simple fits get an auto-generated metrics package, complex ones get flagged with specific reasons so the team starts with context instead of from scratch. Feedback loop built in so the team can flag misclassifications and the system improves over time.
Phase 3 — Hardening (4 weeks, ~$10K–30K): Tune accuracy on real production data, handle edge cases, train the team, set up quarterly review cadence.
Total: ~$57K–$128K. Well within the indicated budget. Phase 1 standalone if the client wants to evaluate before going further.
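As a quick check, the stated total is the sum of the three phase ranges quoted above (figures in thousands of dollars):

```python
# Phase ranges as quoted in the proposal, in $K: (low, high).
phases = {
    "Phase 1": (12, 18),
    "Phase 2": (35, 80),
    "Phase 3": (10, 30),
}

total_low = sum(low for low, _ in phases.values())
total_high = sum(high for _, high in phases.values())
print(f"Total: ~${total_low}K-${total_high}K")
```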
Input format is the biggest variable. DWG and GIS files are straightforward to parse with existing tooling. Google Maps screenshots are a different story: computer vision accuracy drops and cost goes up. If a significant portion of incoming fits arrive as raster images, that changes the scope meaningfully and expectations about accuracy need to be set explicitly.
Parametric rules may be more complex than they look. Toilet counts that vary by occupancy class, POS placement driven by code requirements — if those rules encode years of institutional knowledge about venue compliance, replicating them takes real effort. Phase 1 is designed to figure this out before anyone commits to the full build.
Medical image classification pipeline (Healthcare): Built a multimodal LLM vision system that ingests high-res X-ray images, extracts structured measurements, and sorts cases into three buckets: auto-process, escalate to specialist, or flag for review. The three-bucket approach (not binary) is what made false negatives manageable. Direct pattern match: technical image in, structured metrics out, confidence-based triage so the expert team only sees what actually needs them.
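The three-bucket pattern can be sketched as two confidence thresholds rather than one. The threshold values, and the choice to send mid-confidence cases to review and low-confidence cases to a specialist, are assumptions for illustration. The case study does not specify them.

```python
def triage(confidence, auto_threshold=0.9, review_threshold=0.6):
    """Map a confidence score in [0, 1] to one of three buckets."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= auto_threshold:
        return "auto_process"
    if confidence >= review_threshold:
        return "flag_for_review"
    return "escalate_to_specialist"

for score in (0.95, 0.70, 0.20):
    print(score, triage(score))
```

The middle bucket is what keeps borderline cases from being forced into either automation or full manual handling.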
Autonomous AI pipeline for technical document processing: Runs continuously, takes in complex technical deliverables, puts them through multiple rounds of quality checks (OCR, correctness, completeness, architectural fit) until they pass on every dimension. Routes different steps to different models depending on task needs, catches regressions, fully air-gapped and scalable. "The venue fit pipeline needs the same kind of architecture: messy inputs come in, structured multi-step processing happens, and reliable output comes out the other end without someone watching it."
Prior to tools like Giraffe, test fits were best done in Revit, a legacy, detailed design tool that is overly complex for a test fit or site plan. The issue with newer test-fit tools is that they're not purpose-built for the exact use case Live Nation is solving for.
Giraffe is a first-principles spatial engine built to be assembled around any test fitting workflow — whether that's a site plan for a music venue, a temporary housing deployment, or a complex master plan. It provides spatial tools to understand context instantly, design tools to think through site problems, bespoke algorithms to solve for repeat design problems, and automated analysis to produce exact schedules for each project.
The largest problem identified from the brief is disconnected workflows in non-modern tools — resulting in data translation fatigue and rework. We'd want to validate the detail of the site plans and how best to reproduce them; identify which design workflows are repetitive (and therefore automatable) and which are best done uniquely to the site; and understand the take-offs and downstream calculations to automate and produce consistent results.
This project should take a maximum of 30 days. The key is Giraffe becoming experts in the Live Nation workflow, designing a solution, and delivering via the Giraffe toolkit.
Phase 1: Initial engagement — scoping call to document the workflow. Deliverable: statement of works with project estimate and recommended automations. Includes an initial PoC demo of the workflow in Giraffe.
Phase 2: Execute contract, deliver licensing and MVP configuration, get a round of feedback from Live Nation team.
Phase 3: Refine automations to Live Nation spec.
Phase 4 (optional): If appetite, scope further integrations with the Live Nation capital allocation process.
Full site planning automation is not realistic. What makes Live Nation effective as site planners is their understanding of the detail and nuance of the operation of their assets. The key to building a good system is to automate only the repetitive tasks and to integrate the entire workflow, leaving the site planner, not an algorithm, to make the final decisions.
Trammell Crow — Kit of parts for site search & design feasibility: Reduced decision times from weeks to hours.
NSW Government — Spatial feasibility platform deployment: Reduced $15M in consulting spend in the first year.
Giraffe is an established feasibility platform across all sectors in real estate with additional redacted case studies available on request.
The core problem is input variability: the solution has to be flexible enough to handle a wide range of inputs.
Before proposing anything: where is time actually spent during the 2 days? That will help baseline the value of each candidate fix so we can prioritize the lowest-investment, highest-value work. And where are current tools failing vs. excelling?
Discovery phase first — map the current business processes end-to-end, decompose by time-per-activity, review sample data across the input variability, and run working sessions with the design team.
Discovery phase (4 weeks, ~$40K): Process maps, time allocation by activity, feasibility and effort-to-automate by component, and prioritized recommendations grounded in team interviews and independent analysis. Even in the worst case, Live Nation walks away with a fully documented process and a set of quick wins to improve efficiency immediately, regardless of whether we move forward together on a build.
Build MVP phase: Informed directly by Discovery findings. Scope and pricing defined at the close of Discovery, when we have the data to commit to both with confidence rather than estimate them upfront.
The risk is falling for the trap of a one-shot end-to-end solution. From experience, the result is likely to look more like an orchestration of solutions that gets you 80–90% of the way there.
AI Estimating Agent — Nationwide General Contractor: Client's estimating function was a growth bottleneck — existing estimators at capacity, qualified estimators hard to hire, bid win rates trailing benchmarks. Built an estimating agent ingesting structured and unstructured data from third-party sources, historical invoices, and cost sheets, applying codified estimating playbooks to ensure consistent methodology across estimators. Outcome: reclaimed estimator capacity, improved bid consistency, 1.3% win-rate improvement at the four-month mark. "Came across a similar issue with RFPs that varied from blueprints to one-page Word documents."