Overview
You're the technical expert who proves Modular's platform actually delivers on performance and portability promises. You run POCs in customer environments, benchmark inference speed, and answer deep technical questions about MLIR, GPU optimization, and model serving. Post-BentoML acquisition, you're now demoing the full stack (Mojo/MAX optimization + BentoML serving).
Role Snapshot
| Aspect | Details |
|---|---|
| Role Type | Pre-sales Sales Engineer (POC-heavy) |
| Sales Motion | Supporting AEs on mid-market and enterprise deals |
| Deal Complexity | Highly technical - ML engineers evaluating infrastructure |
| Sales Cycle | 2-8 months depending on customer segment |
| Deal Size | You support $50K-$1M+ deals |
| Quota (est.) | Measured on deals influenced, not personal quota (likely $4-6M influenced ARR) |
Company Context
Stage: Series B
Size: 315 employees
Growth: Just acquired BentoML - you're now responsible for demoing both optimization (Mojo/MAX) and serving layers
Market Position: Category creator - you're explaining new concepts (AI compute hypervisor, hardware portability) to customers
GTM Reality
SE to AE Ratio: Likely 1:3-4 (one SE supports 3-4 AEs), so you're juggling 8-12 active POCs at a time
Demo Frequency: 5-8 technical demos per week, plus 2-3 POCs in active hands-on execution in customer environments
POC Success Rate: Probably 40-50% of POCs convert to deals - technical validation passes, but budget/timing kills some
Competitive Landscape
Technical Objections You'll Handle:
- "How is this different from TensorRT/Triton?" (NVIDIA's native optimization)
- "We're already using BentoML OSS - what's different in the paid version?"
- "Can you really deliver cross-hardware performance?" (skepticism on AMD support)
- "What's the migration path from our current serving layer?"
- "Show me the benchmarks on OUR models with OUR data" (custom POC requirements)
Your Win Themes:
- Live performance demos showing 2-3x speedup on their actual workloads
- Deploying in their BYOC environment to prove security/compliance
- Running same model on NVIDIA and AMD to prove portability
- Showing engineering time savings (less infrastructure code to maintain)
What You'll Actually Do
Time Breakdown
POCs & Technical Validation (40%) | Demos & Discovery (30%) | Internal Prep (20%) | Customer Support (10%)
Key Activities
- Running technical discovery calls: You're on early sales calls asking about their ML stack - which frameworks (PyTorch, TensorFlow), which models (LLMs, embeddings, vision), what inference volume, what hardware (NVIDIA GPUs, and which generation), and their current serving setup. You're taking notes for POC scoping.
- Delivering product demos: You're screen-sharing and walking through Modular's platform - showing how to deploy a model with Mojo/MAX, how BentoML serving works, and how to configure for different hardware. Demos are 45-60 minutes with lots of technical Q&A from ML engineers.
- Scoping and executing POCs: You're defining success criteria with the customer ("2x throughput improvement" or "sub-100ms p99 latency"), getting access to their cloud environment or VPC, deploying Modular, running their actual models, and producing benchmark reports. POCs take 2-6 weeks and you're managing 2-3 simultaneously.
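Success criteria like these only hold up if every POC reports them the same way. Below is a minimal sketch of the kind of benchmark summary a POC report might compute from raw per-request latencies; `summarize_latencies` and the sample numbers are illustrative, not Modular tooling:

```python
import statistics

def summarize_latencies(latencies_ms, window_s):
    """Summarize a POC benchmark run: p50/p99 latency and throughput.

    latencies_ms: per-request latencies in milliseconds.
    window_s: wall-clock duration of the run in seconds.
    """
    # quantiles(n=100) yields the 1st..99th percentile cut points.
    pcts = statistics.quantiles(sorted(latencies_ms), n=100)
    return {
        "p50_ms": pcts[49],
        "p99_ms": pcts[98],
        "throughput_rps": len(latencies_ms) / window_s,
    }

# Example: 1,000 simulated request latencies over a 60-second run,
# spread evenly between 20 ms and 99.2 ms.
sample = [20 + (i % 100) * 0.8 for i in range(1000)]
report = summarize_latencies(sample, window_s=60)
print(report)
```

Pinning the report to p99 rather than mean latency matters in these evaluations: tail latency is usually what the customer's SLO is written against.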
- Troubleshooting technical issues: During POCs, things break - CUDA version mismatches, model compilation errors, infrastructure access issues. You're debugging with customers' ML engineers, working with your product/engineering team to fix bugs, and unblocking deployments.
- Building custom demos and POC environments: Between customer meetings, you're maintaining demo infrastructure, updating benchmarks for new product releases, and building repeatable POC templates for common use cases (LLM serving, recommendation engines, etc.).
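One way to make POC templates repeatable is to capture scope and exit criteria in a single structure that every engagement fills in. The `POCPlan` below is a hypothetical sketch of that idea, not an existing Modular artifact; field names and thresholds are illustrative:

```python
from dataclasses import dataclass

@dataclass
class POCPlan:
    """Hypothetical template capturing a POC's scope and exit criteria
    up front, so every engagement yields a comparable benchmark report."""
    customer: str
    model: str                    # e.g. "llama-3-8b", "two-tower recsys"
    hardware: str                 # e.g. "NVIDIA A100", "AMD MI250"
    target_p99_ms: float          # latency exit criterion
    target_throughput_rps: float  # throughput exit criterion
    weeks_budgeted: int = 4

    def passed(self, measured_p99_ms: float, measured_rps: float) -> bool:
        # Both exit criteria must be met for the POC to count as a pass.
        return (measured_p99_ms <= self.target_p99_ms
                and measured_rps >= self.target_throughput_rps)

plan = POCPlan("acme", "llama-3-8b", "NVIDIA A100",
               target_p99_ms=100.0, target_throughput_rps=50.0)
print(plan.passed(measured_p99_ms=82.0, measured_rps=64.0))
```

Writing the pass/fail rule down before the POC starts also protects the deal: it keeps "success" from being renegotiated after the benchmarks come in.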
- Answering deep technical questions: Prospects ask about MLIR compiler internals, how Mojo handles memory management, performance on AMD MI250 vs NVIDIA A100, and how BentoML's autoscaling works under the hood. You need to know this stuff or find answers quickly.
The Honest Reality
What's Hard
- POCs are unpredictable and time-consuming: You think a POC will take 2 weeks, but the customer's ML engineer is busy for 3 weeks, then their infrastructure team takes another week to grant VPC access, then you find a compatibility issue with their CUDA version. POCs routinely take 2x longer than planned.
- You're explaining cutting-edge tech that's not widely understood: Mojo is new. Many ML engineers haven't heard of MLIR. You're educating people on concepts while also trying to prove value. Lots of "wait, explain that again" in demos.
- Post-POC, you don't control the deal: You can prove a 3x performance improvement, but the deal still dies in procurement or because they decide to wait until next quarter. You did your job perfectly, but the AE couldn't close.
- Customer environments are messy: Every enterprise is a unique ML infrastructure snowflake. You're dealing with custom Docker images, legacy CUDA versions, weird networking constraints, and security policies that block everything. POCs involve a lot of environment wrangling.
- Balancing BentoML OSS vs paid positioning: Existing BentoML users might ask "why can't we just keep using the free version?" You need to articulate enterprise value without sounding like you're taking away their open source toy.
What Success Looks Like
- 40-50% POC win rate - technical validation passes and converts to a closed deal
- Supporting $4-6M in closed ARR per year across your AE team
- Juggling 8-12 open POC engagements (typically 2-3 in active execution) without letting quality slip
- Building repeatable POC frameworks that reduce time-to-value for common use cases
Who You're Selling To
Primary Contacts:
- Principal/Staff ML Engineers - Hands-on technical evaluator, runs POC on their side, needs to be convinced this is better than building in-house
- ML Platform / MLOps Engineers - Infrastructure owner, cares about deployment reliability and maintenance burden
- VP/Director of Engineering - Sponsor who approved POC, wants to see clear business outcomes (cost savings, team efficiency)
What They Care About:
- Performance: Does it actually run faster on their specific models and data?
- Ease of migration: How much work to switch from current serving layer?
- Reliability: Will this break in production? What's the failure mode?
- Vendor lock-in concerns: Can they export their models? What if Modular disappears?
- Support: When things break at 2am, can they get help?
Requirements
- 3+ years as Sales Engineer, Solutions Architect, or ML Engineer working on production model serving
- Deep understanding of ML infrastructure - CUDA, GPUs, model optimization, inference serving, containerization
- Hands-on experience with PyTorch or TensorFlow in production (you need to code, not just demo)
- Can explain complex technical concepts to both ML engineers (peer-level) and executives (simplified)
- Experience running technical POCs in customer environments (BYOC, VPC deployments, security compliance)
- Comfortable with ambiguity - you're selling new technology that doesn't fit existing categories
- Bonus: Contributed to ML open source projects or familiar with BentoML, Ray, Triton, TensorRT