Software Engineer - AI Evals and Test

Work from home Full-time role Hiring

About P-1 AI

We are building an engineering AGI. We founded P-1 AI with the conviction that the greatest impact of artificial intelligence will be on the built world, helping mankind conquer nature and bend it to our will. Our first product is Archie, an AI capable of quantitative and spatial reasoning over physical product domains that performs at the level of an entry-level design engineer. We aim to put an Archie on every engineering team at every industrial company on earth.

Our founding team includes the top minds in model-based engineering, deep learning, and the industries that are our customers. We just closed a $23 million funding round led by Radical Ventures that includes a number of other AI and industrial luminaries. We invite you to join a team of the world’s best engineers and AI researchers, building AI’s most impactful use case.

About the Role

In this role, you’ll be responsible for the evals that we use to ensure that Archie is learning and retaining the skills needed to successfully perform its engineering work, and to benchmark it against industry expectations. Working within a small, tightly-knit team of high-performers, you’ll be principally responsible for clearly defining, implementing, and validating these evals, incorporating input from our engineering experts and industrial partners. You’ll also be responsible for translating these eval tests into multiple formats for use with different types of AI and non-AI systems and agents.

This role is remote and you can be based anywhere in the US or Canada, where you must have existing work authorization. You will be expected to travel to our San Francisco office for co-working sessions approximately one week out of every six. If you are already located in the SF Bay Area or are interested in relocation, you are of course welcome to work out of our SF office.

Responsibilities

Implement the system for organizing, transforming, running, grading, and reporting on eval benchmarks.
Ensure that evals run effectively within our CI/CD system, continuously benchmarking our evolving AI platform and the experiments we’re performing around it.
Work with our industrial partners, AI team, and engineering experts to gather and refine the evals.
Create methods for detecting and testing for common quality challenges of AI, including hallucinations, undesirable stochasticity, and regressions.
Be a technical leader in the consistent implementation and organization of automated tests across other areas of our technology stacks.

Skills

Experience in constructing comprehensive test suites for software and/or AI systems.
Experience designing metrics to evaluate systems and visualize their performance, including differences across successive generations.
Experience in developing, managing, and running evals against LLM-based systems is a strong plus.
Good communication skills with a variety of stakeholders (AI researchers, domain experts, application developers).
Proficiency in Python programming, its modules, and modern software development tools and practices (Git, CI/CD, etc.).
Ability to work in a fast-paced, dynamic startup environment.

Interview Process

Initial screening - with Head of Talent (30 mins)
Hiring manager interview - with co-founder & Head of Engineering (45 mins)
Programming interview - with member of technical staff & Head of Engineering (60 mins)
- bring your own dev environment and tools
Culture fit / Q&A - with co-founder & CEO (45 mins)
