Pre-Training Text Data

Microsoft Corporation · Multiple Locations, United States

Location: Multiple Locations
Job Type: Full-time
Posted: June 10, 2026

Job Description

**Overview**

We are seeking engineers and researchers to join our Pretraining Text Data team, where we are building the next generation of foundation large language models. If you are passionate about designing and curating high-quality datasets to power frontier AI models, this role is for you.

In this role, you’ll work at the intersection of data and innovation—collaborating with scientists, engineers, and annotators to curate, analyze, and evaluate diverse text datasets critical to model development. You will lead efforts to:

+ Develop novel data collection strategies

+ Improve dataset quality and integrity

+ Understand data-driven model behaviors

+ Train models to understand the impact of data and data mixes

+ Align datasets with ethical and societal values

This is a cross-disciplinary, high-impact role ideal for engineers and researchers who want to push the boundaries of what AI can learn from data.

...
