Job Details

Member of Engineering, Pre-training/Data (Remote)

2026-01-15 | Recruiting from Scratch | all cities, AK
Description:

Who is Recruiting from Scratch:

Recruiting from Scratch is a specialized talent firm dedicated to helping companies build exceptional teams. We partner closely with our clients to deeply understand their needs, then connect them with top-tier candidates who are not only highly skilled but also the right fit for the company's culture and vision. Our mission is simple: place the best people in the right roles to drive long-term success for both clients and candidates.



Title of Role: Member of Engineering, Pre-training/Data
Location: Remote (East Coast or EMEA preferred)
Company Stage of Funding: Series B (Series C closing soon, $600M+ raised)
Office Type: Remote
Salary: $240,000 - $400,000 base + highly competitive equity
Company Description

We are representing a frontier AI lab focused on building some of the most capable foundation models in the world. Unlike many AI startups, this company both trains its own large-scale models and ships a developer-facing product, and it is backed by hundreds of millions of dollars in venture funding. The team is engineering-first, led by proven leaders from top-tier technology companies, and dedicated to pushing the boundaries of what AI can do for software development.

The company is growing rapidly and is seeking world-class engineers to join its data team, where you will help shape the future of AI-powered development.
What You Will Do

  • Build and optimize massive-scale pretraining datasets of natural language and source code to improve LLM performance.
  • Design, run, and analyze experiments on data ablations, data-mix optimization, and synthetic data generation techniques.
  • Collaborate closely with pre-training, fine-tuning, and product teams to ensure short feedback loops on model quality.
  • Stay current with the latest research in dataset design and LLM pretraining, iterating rapidly on experiments to improve data quality.
  • Deploy solutions into high-performance distributed data pipelines running on large GPU clusters.
Ideal Candidate Background
  • 3+ years of industry experience as a research scientist or engineer.
  • Strong background in machine learning AND engineering.
  • Proven experience building large-scale pretraining datasets and running experiments such as ablations or mixture modeling.
  • Prior hands-on involvement in LLM pretraining, including training models from scratch.
  • Familiarity with distributed systems, data pipelines, and large GPU cluster operations.
  • Passion for data quality and applied experimentation.
Preferred
  • Degree in Computer Science or related technical field.
  • Strong programming skills in Python, plus experience with lower-level or GPU-kernel languages such as C/C++, CUDA, or Triton.
  • Experience with DevOps tooling (Git, Docker, Kubernetes, Terraform).
  • Author of published research in ML/LLMs.
  • Experience generating and working with synthetic data.
  • Willingness to travel occasionally (e.g., to Europe for team sessions).
Compensation and Benefits
  • Base Salary: $240,000 - $400,000 depending on experience.
  • Equity: Highly competitive package.
  • Visa Sponsorship: Available for exceptional candidates.
  • Remote Work: Flexible, with preference for East Coast U.S. or EMEA time zones.
  • Work Environment: Join an engineering-first culture (over 75% of the team are engineers) working alongside leaders from GitHub, Snap, and other top companies.
  • Impact: Architect the data pipelines powering foundation models that will define the future of AI-assisted software development.
