Job Details

Member of Engineering, Pre-training/Data (Remote)

2026-01-15 | Recruiting from Scratch | all cities, AK

Description:

Who is Recruiting from Scratch:

Recruiting from Scratch is a specialized talent firm dedicated to helping companies build exceptional teams. We partner closely with our clients to deeply understand their needs, then connect them with top-tier candidates who are not only highly skilled but also the right fit for the company's culture and vision. Our mission is simple: place the best people in the right roles to drive long-term success for both clients and candidates.

Title of Role: Member of Engineering, Pre-training/Data
Location: Remote (East Coast or EMEA preferred)
Company Stage of Funding: Series B (Series C closing soon, $600M+ raised)
Office Type: Remote
Salary: $240,000 - $400,000 base + highly competitive equity

Company Description

We are representing a frontier AI lab focused on building some of the most capable foundation models in the world. Unlike many AI startups, this company both trains its own large-scale models and ships a developer-facing product, and it is backed by hundreds of millions of dollars in venture funding. The team is engineering-first, led by proven leaders from top-tier technology companies, and dedicated to pushing the boundaries of what AI can do for software development.

The company is growing rapidly and is seeking world-class engineers to join its data team, where you will help shape the future of AI-powered software development.

What You Will Do

Build and optimize massive-scale pretraining datasets of natural language and source code to improve LLM performance.
Design, experiment with, and analyze data ablations, data mix optimization, and synthetic data generation techniques.
Collaborate closely with pre-training, fine-tuning, and product teams to ensure short feedback loops on model quality.
Stay at the forefront of the latest research in dataset design and LLM pretraining, rapidly iterating on experiments to improve quality.
Deploy solutions into high-performance distributed data pipelines running on large GPU clusters.

Ideal Candidate Background

3+ years of industry experience as a research scientist or engineer.
Strong background in both machine learning and engineering.
Proven experience building large-scale pretraining datasets and running experiments such as data ablations or data-mixture modeling.
Prior hands-on involvement in LLM pretraining, including training models from scratch.
Familiarity with distributed systems, data pipelines, and large GPU cluster operations.
Passion for data quality and applied experimentation.

Preferred

Degree in Computer Science or related technical field.
Strong programming skills in Python, plus lower-level languages and GPU kernel tools such as C/C++, CUDA, or Triton.
Experience with DevOps tooling (Git, Docker, Kubernetes, Terraform).
Author of published research in ML/LLMs.
Experience generating and working with synthetic data.
Willingness to travel occasionally (e.g., to Europe for team sessions).

Compensation and Benefits

Base Salary: $240,000 - $400,000 depending on experience.
Equity: Highly competitive package.
Visa Sponsorship: Available for exceptional candidates.
Remote Work: Flexible, with preference for East Coast U.S. or EMEA time zones.
Work Environment: Join an engineering-first culture (over 75% of the team are engineers) working alongside leaders from GitHub, Snap, and other top companies.
Impact: Architect the data pipelines powering foundation models that will define the future of AI-assisted software development.
