Machine Learning Engineer Pre-Training Data
@ Cohere

Toronto, ON
$150,000
On Site
Full Time
Posted 20 hours ago

Job Details

About the Role

The Machine Learning Engineer for Pre-Training Data at Cohere will play a central role in developing the data pipeline for advanced language models. The role covers end-to-end management of training data, including ingestion, cleaning, filtering, optimization, and data modeling, so that datasets are structured for optimal model performance.

Responsibilities

  • Design and build scalable data pipelines for diverse datasets.
  • Conduct data ablations to assess quality and experiment with data mixtures.
  • Develop robust data modeling techniques for efficient training.
  • Research and implement innovative data curation methods.
  • Collaborate with cross-functional teams including researchers and engineers.

Qualifications

  • Strong software engineering skills with proficiency in Python.
  • Experience building data pipelines and using frameworks like Apache Spark, Apache Beam, or Pandas.
  • Experience with large-scale datasets such as web data, code data, and multilingual corpora.
  • Knowledge of data quality assessment techniques and experimentation with data mixtures.
  • Passion for bridging research and engineering in AI model training.
  • Bonus: Publications at top-tier venues.

Culture & Benefits

Cohere values diversity, inclusivity, and innovative excellence. Enjoy perks like remote flexibility, a co-working stipend, competitive benefits, and a dynamic, open culture.

Key Skills and Competencies

  • Python
  • Data Pipelines
  • Apache Spark
  • Data Cleaning
  • Data Modeling
  • NLP
  • Research
  • Collaboration
  • Data Quality
  • Scaling

How to Get Hired at Cohere

🎯 Tips for Getting Hired

  • Optimize Your Resume: Highlight Python and data pipeline expertise.
  • Customize Your Application: Tailor examples of AI data projects.
  • Research Cohere: Understand their AI mission and culture.
  • Prepare for Interviews: Be ready to discuss scalable systems.

📝 Interview Preparation Advice

Technical Preparation

  • Review Python programming concepts.
  • Practice building data pipelines using Spark.
  • Study data cleaning and transformation techniques.
  • Understand scalable system architecture fundamentals.

Behavioral Questions

  • Describe how you handled project challenges under pressure.
  • Explain how you work effectively in teams and collaborate across departments.
  • Discuss your problem-solving strategies in unclear situations.
  • Share your experience managing shifting priorities.

Frequently Asked Questions