Language Data Scientist
Innodata Inc.
Job Overview

Job Description
Language Data Scientist at Innodata Inc.
Innodata (NASDAQ: INOD) is a leading data engineering company renowned for its AI technology solutions. Serving over 2,000 customers globally, including 4 out of 5 of the world’s biggest technology companies, Innodata combines advanced machine learning and artificial intelligence (ML/AI) with a global workforce of subject matter experts and a high-security infrastructure. With over 7,000 employees across 13 cities worldwide, Innodata is poised for explosive growth, helping usher in the promise of AI through digital data solutions and high-quality platforms.
About the Role
Innodata is seeking a Language Data Scientist to join a growing team of Gen AI experts. This role involves hands-on work with multi-modal and multi-lingual datasets, collaborating with cross-functional partners, and leveraging expertise in human and synthetic data workflows to drive innovation. The ideal candidate will possess a strong blend of skills in computational linguistics, human evaluation tasks, data science, and data engineering, contributing to the advancement of GenAI applications for Innodata's customers.
Key Responsibilities
- Design and improve workflows for AI/ML training and evaluation data, including human annotation and data collection, as well as synthetic data generation.
- Deeply analyze existing workflows and processes to gather insights, propose recommendations, and implement improvements through innovation and cross-functional collaboration with customers.
- Critically assess annotation tooling and workflows to optimize efficiency and quality.
- Perform quantitative analysis on large datasets, applying statistical methods, calculating metrics, and making data-driven recommendations for accuracy and performance enhancements.
- Engage closely with client stakeholders to understand objectives, gather requirements, propose effective solutions, and ensure successful execution.
Qualifications
- MA in computational linguistics, data science, computer science (AI/ML/NLU), quantitative social sciences, or a related scientific/quantitative field (PhD strongly preferred).
- Extensive experience with human language data, designing complex multi-phase human evaluation tasks, and a deep understanding of language-culture relationships.
- Ability to identify and resolve ambiguity and subjectivity in language across multi-lingual and multi-modal projects.
- Advanced knowledge of statistics, metrics (e.g., F1 score, inter-rater reliability), and data analysis methods such as sampling.
- Proficiency in Python for handling and transforming large datasets (preprocessing, postprocessing, pandas), performing quantitative analyses, and data visualization (matplotlib, seaborn).
- Deep understanding of data pipelines for ML and NLP workflows, including efficient data collection, transformation, storage, data structures, algorithms, and data engineering principles.
- Excellent interpersonal skills for effective cross-functional stakeholder engagement and problem-solving.
- Ability to work independently, collaborate effectively in a team, and adapt to evolving technologies and methodologies.
- Capacity to understand client products and services and to translate research and development insights into improvements for them.
Preferred Qualifications
- Commitment to staying up to date with the latest advancements in generative AI, machine learning, and deep learning techniques through continuous research.
- Knowledge of optimizing existing generative AI models for improved performance, scalability, and efficiency.
- Experience in developing and maintaining ML/AI pipelines, encompassing data preprocessing, feature extraction, model training, and evaluation.
- Familiarity with fine-tuning pre-trained models for specific tasks and datasets.
- Ability to create clear documentation for technical specifications, user guides, and presentations, effectively communicating complex AI concepts.
- Ability to contribute to establishing best practices for generative AI development, both internally and with customers.
- Experience providing technical mentorship and guidance to junior team members.
- Understanding of generative model families such as GPT, VAEs, and GANs.
Key Skills/Competencies
- Computational Linguistics
- Data Science
- Natural Language Processing (NLP)
- Generative AI (GenAI)
- Python Programming
- Statistical Analysis
- Data Engineering
- Machine Learning (ML)
- Human Annotation
- Multi-lingual Data
How to Get Hired at Innodata Inc.
- Research Innodata Inc.'s culture: Study their mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor.
- Tailor your resume: Highlight experience in computational linguistics, GenAI, NLP, and Python, customizing it for the Language Data Scientist role at Innodata Inc.
- Showcase projects: Prepare to discuss past projects involving multi-modal/multi-lingual data, human evaluation, and statistical analysis relevant to Innodata Inc.'s AI focus.
- Prepare for technical interviews: Expect questions on data science principles, NLP libraries (spaCy, NLTK), Python data manipulation (pandas), and ML/AI pipeline development.
- Demonstrate problem-solving: Be ready to share examples of how you've critically assessed workflows, driven innovation, and collaborated cross-functionally on data-related challenges like those Innodata Inc. works on.