Overview
You're the technical expert who proves Modular's platform actually delivers on performance and portability promises. You run POCs in customer environments, benchmark inference speed, and answer deep technical questions about MLIR, GPU optimization, and model serving. Post-BentoML acquisition, you're now demoing the full stack (Mojo/MAX optimization + BentoML serving).
Role Snapshot
| Aspect | Details |
|---|---|
| Role Type | Pre-sales Sales Engineer (POC-heavy) |
| Sales Motion | Supporting AEs on mid-market and enterprise deals |
| Deal Complexity | Highly technical - ML engineers evaluating infrastructure |
| Sales Cycle | 2-8 months depending on customer segment |
| Deal Size | You support $50K-$1M+ deals |
| Quota (est.) | Measured on deals influenced, not personal quota (likely $4-6M influenced ARR) |
Company Context
Stage: Series B
Size: 315 employees
Growth: Just acquired BentoML - you're now responsible for demoing both optimization (Mojo/MAX) and serving layers
Market Position: Category creator - you're explaining new concepts (AI compute hypervisor, hardware portability) to customers
GTM Reality
SE to AE Ratio: Likely 1:3-4 (one SE supports 3-4 AEs), so you're juggling 8-12 active POCs at a time
Demo Frequency: 5-8 technical demos per week, plus 2-3 POCs in active hands-on execution in customer environments
POC Success Rate: Probably 40-50% of POCs convert to deals - technical validation passes, but budget/timing kills some
Competitive Landscape
Technical Objections You'll Handle:
- "How is this different from TensorRT/Triton?" (NVIDIA's native optimization)
- "We're already using BentoML OSS - what's different in the paid version?"
- "Can you really deliver cross-hardware performance?" (skepticism on AMD support)
- "What's the migration path from our current serving layer?"
- "Show me the benchmarks on OUR models with OUR data" (custom POC requirements)
Your Win Themes:
- Live performance demos showing 2-3x speedup on their actual workloads
- Deploying in their BYOC environment to prove security/compliance
- Running same model on NVIDIA and AMD to prove portability
- Showing engineering time savings (less infrastructure code to maintain)
What You'll Actually Do
Time Breakdown
POCs & Technical Validation (40%) | Demos & Discovery (30%) | Internal Prep (20%) | Customer Support (10%)
Key Activities
- Running technical discovery calls: You're on early sales calls asking about their ML stack - which frameworks (PyTorch, TensorFlow), which models (LLMs, embeddings, vision), what inference volume, what hardware (NVIDIA GPUs, and which generation), and their current serving setup. You're taking notes for POC scoping.
- Delivering product demos: You're screen-sharing and walking through Modular's platform - showing how to deploy a model with Mojo/MAX, how BentoML serving works, and how to configure for different hardware. Demos are 45-60 minutes with lots of technical Q&A from ML engineers.
- Scoping and executing POCs: You're defining success criteria with the customer ("2x throughput improvement" or "sub-100ms p99 latency"), getting access to their cloud environment or VPC, deploying Modular, running their actual models, and producing benchmark reports. POCs take 2-6 weeks and you're managing 2-3 simultaneously.
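Success criteria like these only hold up if every POC reports them the same way. Below is a minimal sketch of the kind of benchmark summary a POC report might compute from raw per-request latencies; `summarize_latencies` and the sample numbers are illustrative, not Modular tooling:

```python
import statistics

def summarize_latencies(latencies_ms, window_s):
    """Summarize a POC benchmark run: p50/p99 latency and throughput.

    latencies_ms: per-request latencies in milliseconds.
    window_s: wall-clock duration of the run in seconds.
    """
    # quantiles(n=100) yields the 1st..99th percentile cut points.
    pcts = statistics.quantiles(sorted(latencies_ms), n=100)
    return {
        "p50_ms": pcts[49],
        "p99_ms": pcts[98],
        "throughput_rps": len(latencies_ms) / window_s,
    }

# Example: 1,000 simulated request latencies over a 60-second run,
# spread evenly between 20 ms and 99.2 ms.
sample = [20 + (i % 100) * 0.8 for i in range(1000)]
report = summarize_latencies(sample, window_s=60)
print(report)
```

Pinning the report to p99 rather than mean latency matters in these evaluations: tail latency is usually what the customer's SLO is written against.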
- Troubleshooting technical issues: During POCs, things break - CUDA version mismatches, model compilation errors, infrastructure access issues. You're debugging with customers' ML engineers, working with your product/engineering team to fix bugs, and unblocking deployments.
- Building custom demos and POC environments: Between customer meetings, you're maintaining demo infrastructure, updating benchmarks for new product releases, and building repeatable POC templates for common use cases (LLM serving, recommendation engines, etc.).
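One way to make POC templates repeatable is to capture scope and exit criteria in a single structure that every engagement fills in. The `POCPlan` below is a hypothetical sketch of that idea, not an existing Modular artifact; field names and thresholds are illustrative:

```python
from dataclasses import dataclass

@dataclass
class POCPlan:
    """Hypothetical template capturing a POC's scope and exit criteria
    up front, so every engagement yields a comparable benchmark report."""
    customer: str
    model: str                    # e.g. "llama-3-8b", "two-tower recsys"
    hardware: str                 # e.g. "NVIDIA A100", "AMD MI250"
    target_p99_ms: float          # latency exit criterion
    target_throughput_rps: float  # throughput exit criterion
    weeks_budgeted: int = 4

    def passed(self, measured_p99_ms: float, measured_rps: float) -> bool:
        # Both exit criteria must be met for the POC to count as a pass.
        return (measured_p99_ms <= self.target_p99_ms
                and measured_rps >= self.target_throughput_rps)

plan = POCPlan("acme", "llama-3-8b", "NVIDIA A100",
               target_p99_ms=100.0, target_throughput_rps=50.0)
print(plan.passed(measured_p99_ms=82.0, measured_rps=64.0))
```

Writing the pass/fail rule down before the POC starts also protects the deal: it keeps "success" from being renegotiated after the benchmarks come in.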
- Answering deep technical questions: Prospects ask about MLIR compiler internals, how Mojo handles memory management, performance on AMD MI250 vs NVIDIA A100, and how BentoML's autoscaling works under the hood. You need to know this stuff or find answers quickly.
The Honest Reality
What's Hard
- POCs are unpredictable and time-consuming: You think a POC will take 2 weeks, but the customer's ML engineer is busy for 3 weeks, then their infrastructure team takes another week to grant VPC access, then you find a compatibility issue with their CUDA version. POCs routinely take 2x longer than planned.
- You're explaining cutting-edge tech that's not widely understood: Mojo is new. Many ML engineers haven't heard of MLIR. You're educating people on concepts while also trying to prove value. Lots of "wait, explain that again" in demos.
- Post-POC, you don't control the deal: You can prove a 3x performance improvement, but the deal still dies in procurement or because they decide to wait until next quarter. You did your job perfectly, but the AE couldn't close.
- Customer environments are messy: Every enterprise is a unique ML infrastructure snowflake. You're dealing with custom Docker images, legacy CUDA versions, weird networking constraints, and security policies that block everything. POCs involve a lot of environment wrangling.
- Balancing BentoML OSS vs paid positioning: Existing BentoML users might ask "why can't we just keep using the free version?" You need to articulate enterprise value without sounding like you're taking away their open source toy.
What Success Looks Like
- 40-50% POC win rate - technical validation passes and converts to a closed deal
- Supporting $4-6M in closed ARR per year across your AE team
- Juggling 8-12 open POC engagements (typically 2-3 in active execution) without letting quality slip
- Building repeatable POC frameworks that reduce time-to-value for common use cases
Who You're Selling To
Primary Contacts:
- Principal/Staff ML Engineers - Hands-on technical evaluator, runs POC on their side, needs to be convinced this is better than building in-house
- ML Platform / MLOps Engineers - Infrastructure owner, cares about deployment reliability and maintenance burden
- VP/Director of Engineering - Sponsor who approved POC, wants to see clear business outcomes (cost savings, team efficiency)
What They Care About:
- Performance: Does it actually run faster on their specific models and data?
- Ease of migration: How much work to switch from current serving layer?
- Reliability: Will this break in production? What's the failure mode?
- Vendor lock-in concerns: Can they export their models? What if Modular disappears?
- Support: When things break at 2am, can they get help?
Requirements
- 3+ years as Sales Engineer, Solutions Architect, or ML Engineer working on production model serving
- Deep understanding of ML infrastructure - CUDA, GPUs, model optimization, inference serving, containerization
- Hands-on experience with PyTorch or TensorFlow in production (you need to code, not just demo)
- Can explain complex technical concepts to both ML engineers (peer-level) and executives (simplified)
- Experience running technical POCs in customer environments (BYOC, VPC deployments, security compliance)
- Comfortable with ambiguity - you're selling new technology that doesn't fit existing categories
- Bonus: Contributed to ML open source projects or familiar with BentoML, Ray, Triton, TensorRT