Work for one of our portfolio companies

Head of Technology - Cosmos

Infinity

IT
Remote
Posted on Oct 7, 2025

Location

Remote

Employment Type

Full time

Location Type

Remote

Department

Cosmos

Head of Technology - AI-First IT Automation Platform

About the Company

We’re building the next generation of AI-powered IT operations, combining automation, intelligent agents, and human expertise to create an entirely new category of Managed Service Provider.
Our mission is to make every IT function, from endpoint management to incident response, faster, more autonomous, and more intelligent.

We are part of Infinity Constellation, a portfolio of AI-enabled service companies. This is an early-stage, hands-on builder role: you’ll define the core systems, agent frameworks, and infrastructure that power the platform from day one.

The Role

We’re looking for a Head of Technology who thrives in early-stage, fast-moving environments: someone who codes, experiments, and ships intelligent systems that solve real-world operational problems.
This isn’t a “corner-office” leadership role. It’s a hands-on founder-type position where you’ll:

  • Architect and build the AI and automation backbone of the company

  • Deploy, evaluate, and monitor ML/LLM systems in production

  • Partner with product and operations to translate real customer pain points into intelligent, self-healing systems

  • Build and lead a small technical team (5–10 engineers) as the platform scales over the next 24 months

Key Responsibilities

Architecture & Systems Design

  • Own the end-to-end technical architecture: service orchestration, observability, and AI/ML pipelines

  • Select and integrate frameworks for agentic orchestration (e.g. LangChain, LlamaIndex, OpenDevin-style frameworks, or custom alternatives)

  • Establish standards for model evaluation, context management, and secure data handling

AI/ML & Agent Development

  • Build, train, and deploy AI agents that automate IT workflows (troubleshooting, monitoring, patching, access management, etc.)

  • Select and adopt modern MLOps and agentic frameworks for deployment and monitoring

  • Implement human-in-the-loop feedback to continuously improve model behavior

Infrastructure & Reliability

  • Lead development of scalable backend systems (Python, Django/FastAPI, Pydantic, etc.) on AWS with EKS/Helm

  • Design observability and cost-tracking systems from day one

  • Ensure reliability, uptime, and security across distributed environments

Team Leadership & Scale

  • Recruit, mentor, and lead early technical hires across backend, MLOps, and automation domains

  • Define engineering standards, code review processes, and developer productivity systems

  • Build the technical culture: curiosity, ownership, and continuous learning

Who You Are

  • Builder-first: happiest when coding, prototyping, or debugging, not just delegating

  • Systems-minded: you can design for both reliability and intelligence

  • AI-fluent: you understand how to deploy, monitor, and improve LLMs and agent systems

  • Pragmatic: you pick tools that work and move fast while planning for long-term scalability

  • Operationally literate: you understand IT operations, DevOps, or infrastructure automation domains

Qualifications

Required

  • 5+ years of software engineering experience, including 2–3 years building or operating ML systems in production

  • Expertise in Python and experience with backend frameworks (Django/FastAPI), MLOps tools (Ray, MLflow, W&B, etc.), and agentic frameworks (LangChain, LangGraph, PydanticAI, CrewAI, AutoGen, etc.)

  • Experience deploying models or agent systems on cloud infrastructure (AWS preferred; EKS/Helm/Terraform a plus)

  • Familiarity with LLM orchestration or multi-agent frameworks

  • Strong foundation in observability, logging, and production reliability

  • Demonstrated ability to lead and scale small teams (5–10 engineers)

Nice to Have

  • Experience in IT automation, RMM tools, or endpoint management

  • Experience fine-tuning or evaluating LLMs (OpenAI, Anthropic, Hugging Face, etc.)

  • Familiarity with retrieval-augmented generation or evaluation frameworks

  • Background in Infrastructure-as-Code, SRE, or cloud cost optimization

  • Previous startup or founder experience