GenPeach AI is a product-driven research lab building vertical multimodal foundation models for hyper-realistic human generation in image and video – designed for emotionally resonant, human-centered AI experiences. Our goal is to create tools that supercharge human creativity rather than replace it.
We train models from scratch: proprietary datasets at massive scale, novel architectures and training recipes, large GPU clusters, and tight product integration so research ships to users quickly.
We are a deeply technical team of around 10 people. We’re advised by Directors from Google DeepMind and backed by leading AI-focused funds and angels from OpenAI, Meta AI, Microsoft AI, Project Prometheus, and Fal. Collectively, our team, advisors, and angels have contributed to models including Meta’s Imagine/MovieGen and foundation-model work behind OpenAI’s Sora, plus Google’s Veo and Gemini.
You’ll join the research team working across image/video generation and multimodal understanding. You’ll work closely with other Research Engineers and Scientists, as well as the Founders, and help turn research into scalable training runs, strong evaluations, and production-ready systems.
We’re hiring an AI Research Engineer to help build and scale GenPeach’s foundation models end-to-end – from implementing new model ideas and training recipes, to owning the parts of the training stack that determine quality and speed, to pushing models through production constraints.
This is a hands-on, high-ownership role. You’ll write research-grade code that becomes production-critical.
Implement and iterate on image/video generative model ideas (architecture, losses, conditioning, sampling, pre-training, distillation, post-training)
Own training performance end-to-end (distributed training, throughput, memory, stability, debugging scaling failure modes)
Build the experimentation loop (evals, ablations, reproducibility tooling, reporting, decision hygiene)
Build and improve VLMs for image/video captioning (data recipes, training strategies, model variants, evaluation)
Run high-iteration research: read papers when useful, implement ideas, validate empirically
Create captioning pipelines that improve generation training and product quality
Partner with inference/product to ship under real constraints (latency, cost, reliability, rollout safety)
Build demos and prototypes to showcase capabilities and accelerate iteration
Love the craft of experimentation: fast iteration, clear ablations, strong evals, and honest conclusions
Enjoy debugging messy real-world training runs (not just clean demos)
Can move between research and engineering: write clean code, ship utilities, and improve team velocity
Take ownership beyond your job description when needed (startup reality)
Communicate clearly and collaborate well in a small, senior team
Strong Python and PyTorch skills (4+ years of experience)
Experience implementing and training deep learning models (generative models, VLMs, LLMs, vision/video, or adjacent)
Solid understanding of training dynamics, optimization, and practical debugging
Ability to drive projects end-to-end with minimal supervision
Hands-on experience with diffusion/flow-based image or video generation, or large-scale generative modeling in adjacent domains
Experience with distributed training at scale (multi-node) and performance tuning (throughput/memory)
Experience building evaluation frameworks (offline metrics + human eval + regression tracking)
Strong intuition for data quality and dataset/labeling tradeoffs for training and captioning
Publications are a plus, but shipped impact and strong technical evidence matter more
Build frontier image/video models and the VLM captioning systems that power them
Join a lean, senior team that holds a high engineering + research bar
Direct product impact: your training runs become real user-facing capabilities
Benchmark against the best in the world and compete on model quality through what we ship
You own outcomes end-to-end and are trusted with real responsibility
Direct, low-ego communication and fast feedback loops
Bias toward impact: measure, iterate, ship
Research discipline: clear ablations, reproducibility, and crisp decision-making
Location: Zurich (Switzerland) or Warsaw (Poland) — onsite or hybrid. If you’re elsewhere, we’re open to remote (team/timezone fit considered).
Compensation: competitive salary + meaningful equity (level-dependent)
Interview process: quick screen, 2x technical rounds (practical + systems), team fit/values
Visa sponsorship (where applicable); we’ll make a strong effort to relocate you to Switzerland or Poland if desired
Remote-friendly: work fully remote, hybrid, or on-site from our hubs
Regular offsites and in-person events to collaborate and connect
Flexible PTO