Tim Davis

Sales Engineer

Modular

📍 Remote
Deal Size: $50K-$1M+ ACV
Sales Cycle: 2-8 months
Posted by Tim Davis

Overview

You're the technical expert who proves Modular's platform actually delivers on performance and portability promises. You run POCs in customer environments, benchmark inference speed, and answer deep technical questions about MLIR, GPU optimization, and model serving. Post-BentoML acquisition, you're now demoing the full stack (Mojo/MAX optimization + BentoML serving).


Role Snapshot

Aspect | Details
Role Type | Pre-sales Sales Engineer (POC-heavy)
Sales Motion | Supporting AEs on mid-market and enterprise deals
Deal Complexity | Highly technical - ML engineers evaluating infrastructure
Sales Cycle | 2-8 months depending on customer segment
Deal Size | You support $50K-$1M+ deals
Quota (est.) | Measured on deals influenced, not a personal quota (likely $4-6M influenced ARR)

Company Context

Stage: Series B

Size: 315 employees

Growth: Just acquired BentoML - you're now responsible for demoing both optimization (Mojo/MAX) and serving layers

Market Position: Category creator - you're explaining new concepts (AI compute hypervisor, hardware portability) to customers


GTM Reality

SE to AE Ratio: Likely 1:3-4 (one SE supports 3-4 AEs), so you're juggling 8-12 active POCs at a time

Demo Frequency: 5-8 technical demos per week, plus 2-3 active POCs running in customer environments

POC Success Rate: Probably 40-50% of POCs convert to closed deals - technical validation usually passes, but budget or timing kills the rest
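
These GTM estimates roughly hang together, as a back-of-the-envelope check shows - every number below is a hypothetical midpoint or illustrative assumption, not real pipeline data:

```python
pocs_completed_per_quarter = 10   # assumed throughput, distinct from the 8-12 concurrent figure
win_rate = 0.45                   # midpoint of the 40-50% estimate above
avg_deal_acv = 300_000            # illustrative ACV inside the $50K-$1M+ band
quarters = 4

won_deals = pocs_completed_per_quarter * quarters * win_rate
influenced_arr = won_deals * avg_deal_acv
print(f"{won_deals:.0f} deals, ${influenced_arr / 1e6:.1f}M influenced ARR")
# prints: 18 deals, $5.4M influenced ARR - consistent with the $4-6M quota estimate
```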


Competitive Landscape

Technical Objections You'll Handle:

  • "How is this different from TensorRT/Triton?" (NVIDIA's native optimization)
  • "We're already using BentoML OSS - what's different in the paid version?"
  • "Can you really deliver cross-hardware performance?" (skepticism on AMD support)
  • "What's the migration path from our current serving layer?"
  • "Show me the benchmarks on OUR models with OUR data" (custom POC requirements)

Your Win Themes:

  • Live performance demos showing 2-3x speedup on their actual workloads
  • Deploying in their BYOC environment to prove security/compliance
  • Running same model on NVIDIA and AMD to prove portability
  • Showing engineering time savings (less infrastructure code to maintain)
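
A live "2-3x on their workload" claim boils down to timing the same payloads on two stacks. A minimal, library-free sketch - `run_inference` and the payloads are stand-ins for whatever model call the POC exercises, not a Modular API:

```python
import time

def benchmark(run_inference, payloads, warmup=5):
    """Time a batch of inference calls and return requests/sec."""
    for p in payloads[:warmup]:
        run_inference(p)  # warm caches/compilation before measuring
    start = time.perf_counter()
    for p in payloads:
        run_inference(p)
    elapsed = time.perf_counter() - start
    return len(payloads) / elapsed

def speedup(baseline_rps, candidate_rps):
    """How many times faster the candidate stack ran the same workload."""
    return candidate_rps / baseline_rps
```

For example, 120 req/s on the incumbent stack vs 300 req/s on the candidate is `speedup(120, 300)` = 2.5x - and the point of "OUR models with OUR data" is that both numbers come from the customer's own payloads.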

What You'll Actually Do

Time Breakdown

POCs & Technical Validation (40%) | Demos & Discovery (30%) | Internal Prep (20%) | Customer Support (10%)

Key Activities

  • Running technical discovery calls: You're on early sales calls asking questions about their ML stack - what frameworks (PyTorch, TensorFlow), what models (LLMs, embeddings, vision), what inference volume, what hardware (NVIDIA GPUs, which generation), current serving setup. You're taking notes for POC scoping.

  • Delivering product demos: You're screen-sharing and walking through Modular's platform - showing how to deploy a model with Mojo/MAX, how BentoML serving works, how to configure for different hardware. Demos are 45-60 minutes with lots of technical Q&A from ML engineers.

  • Scoping and executing POCs: You're defining success criteria with the customer ("2x throughput improvement" or "sub-100ms p99 latency"), getting access to their cloud environment or VPC, deploying Modular, running their actual models, and producing benchmark reports. POCs take 2-6 weeks and you're managing 2-3 simultaneously.
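
A criterion like "sub-100ms p99 latency" needs an unambiguous pass/fail check in the benchmark report. A small sketch using the nearest-rank percentile - the 100ms target mirrors the illustrative criterion above, not a fixed standard:

```python
import math

def p99_latency_ms(samples_ms):
    """99th-percentile latency (nearest-rank) from per-request timings in ms."""
    ordered = sorted(samples_ms)
    idx = max(0, math.ceil(0.99 * len(ordered)) - 1)
    return ordered[idx]

def poc_passes(samples_ms, target_ms=100.0):
    """Agreed success criterion: p99 strictly under the target."""
    return p99_latency_ms(samples_ms) < target_ms
```

Pinning success to p99 rather than mean latency is deliberate: averages hide the tail behavior that production ML engineers actually worry about.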

  • Troubleshooting technical issues: During POCs, things break - CUDA version mismatches, model compilation errors, infrastructure access issues. You're debugging with customers' ML engineers, working with your product/engineering team to fix bugs, and unblocking deployments.

  • Building custom demos and POC environments: Between customer meetings, you're maintaining demo infrastructure, updating benchmarks for new product releases, and building repeatable POC templates for common use cases (LLM serving, recommendation engines, etc.).

  • Answering deep technical questions: Prospects ask about MLIR compiler internals, how Mojo handles memory management, performance on AMD MI250 vs NVIDIA A100, how BentoML's autoscaling works under the hood. You need to know this stuff or find answers quickly.


The Honest Reality

What's Hard

  • POCs are unpredictable and time-consuming: You think a POC will take 2 weeks, but the customer's ML engineer is busy for 3 weeks, then their infrastructure team takes another week to grant VPC access, and then you hit a compatibility issue with their CUDA version. POCs routinely take 2x longer than planned.

  • You're explaining cutting-edge tech that's not widely understood: Mojo is new. Many ML engineers haven't heard of MLIR. You're educating people on concepts while also trying to prove value. Lots of "wait, explain that again" in demos.

  • Post-POC, you don't control the deal: You can prove 3x performance improvement, but the deal still dies in procurement or because they decide to wait until next quarter. You did your job perfectly, but the AE couldn't close.

  • Customer environments are messy: Every enterprise has a unique ML infrastructure snowflake. You're dealing with custom Docker images, legacy CUDA versions, weird networking constraints, and security policies that block everything. POCs involve a lot of environment wrangling.

  • Balancing BentoML OSS vs paid positioning: Existing BentoML users might ask "why can't we just keep using the free version?" You need to articulate the enterprise value without sounding like you're taking away their open source toy.

What Success Looks Like

  • 40-50% POC win rate - technical validation passes and converts to closed deal
  • Supporting $4-6M in closed ARR per year across your AE team
  • Juggling 8-12 POCs in various stages simultaneously (2-3 in active deployment) without letting quality slip
  • Building repeatable POC frameworks that reduce time-to-value for common use cases

Who You're Selling To

Primary Contacts:

  • Principal/Staff ML Engineers - Hands-on technical evaluator who runs the POC on their side; needs to be convinced this is better than building in-house
  • ML Platform / MLOps Engineers - Infrastructure owner, cares about deployment reliability and maintenance burden
  • VP/Director of Engineering - Sponsor who approved POC, wants to see clear business outcomes (cost savings, team efficiency)

What They Care About:

  • Performance: Does it actually run faster on their specific models and data?
  • Ease of migration: How much work to switch from current serving layer?
  • Reliability: Will this break in production? What's the failure mode?
  • Vendor lock-in concerns: Can they export their models? What if Modular disappears?
  • Support: When things break at 2am, can they get help?

Requirements

  • 3+ years as Sales Engineer, Solutions Architect, or ML Engineer working on production model serving
  • Deep understanding of ML infrastructure - CUDA, GPUs, model optimization, inference serving, containerization
  • Hands-on experience with PyTorch or TensorFlow in production (you need to code, not just demo)
  • Can explain complex technical concepts to both ML engineers (peer-level) and executives (simplified)
  • Experience running technical POCs in customer environments (BYOC, VPC deployments, security compliance)
  • Comfortable with ambiguity - you're selling new technology that doesn't fit existing categories
  • Bonus: Contributed to ML open source projects or familiar with BentoML, Ray, Triton, TensorRT