Data Engineer
Noxtua

Job Description
About the Role: Data Engineer at Noxtua
As a Data Engineer, you will be instrumental in developing Noxtua's legal data and search infrastructure end-to-end. Your role involves designing, maintaining, and optimizing robust ETL pipelines to clean and normalize XML-based legal data from diverse jurisdictions. You will also develop scalable data models and metadata enrichment strategies to enhance searchability, semantic relevance, and usability of legal information for downstream AI agents and products. A significant part of your responsibilities will include leveraging generative AI to improve data processing and metadata generation, alongside continuously benchmarking and tuning database and search performance to ensure efficient, low-latency querying at scale. Collaborating closely with product teams, AI researchers, and legal domain experts, you will deliver high-quality, reliable data solutions that unlock the value of complex, multilingual legal content.
Your Team at Noxtua
You will join our collaborative Data Team, working alongside about five data experts. The team is dedicated to pushing the boundaries of generative AI, natural language processing, and privacy-preserving machine learning in the legal tech space.
Meet Your Hiring Manager
Felix, our Director of AI & Data Engineering, will be your guide at Noxtua. With extensive expertise in AI systems, Felix fosters innovation and collaboration, ensuring every team member has the opportunity to thrive.
Benefits of Working at Noxtua
- Working hours: Enjoy flexible working hours.
- Vacation: 26 days off, plus December 24th & 31st, with an additional day for each year of employment (up to a maximum of 30 days).
- Remote: 100% remote work is possible, provided you have residence and a work permit in one of the following EU countries: Austria, Croatia, Germany, Poland, or Slovakia. You can also use our offices in Berlin, Munich, Paris, or Zagreb.
- Home Office Setup Budget: Receive €1,000 with your first salary to establish your ideal remote workspace.
- Equipment: You will be provided with a laptop (Lenovo or Mac).
- Discounts: Access various discounts, such as an Urban Sports Club Membership, depending on your location.
Your Core Responsibilities
- Design, build, and optimize end-to-end ETL pipelines for legal data from multiple jurisdictions, covering ingestion, validation, cleaning, transformation, chunking, embedding, and loading into vector databases.
- Work extensively with XML-based legal data feeds: parse, validate, normalize, and transform complex XML structures into scalable internal schemas and unified document formats.
- Develop and maintain data models and storage schemas that support continuously updated datasets while ensuring consistency, scalability, and accuracy across diverse, large-scale datasets.
- Coordinate data handover and integration from multiple internal and external data providers, including official sources, APIs, and web scraping pipelines, ensuring reliable and timely updates.
- Implement and continuously refine metadata enrichment strategies to maximize searchability, ranking quality, and relevance of legal information in vector databases.
- Build and maintain high-performance search and retrieval infrastructure, enabling agent-based systems to call search functions and retrieve the most relevant legal information efficiently.
- Explore and integrate generative AI techniques to enhance data processing workflows, such as structured field extraction, metadata generation, and document normalization.
- Experiment with different embedding and chunking strategies and evaluate them thoroughly.
- Conduct database performance benchmarking and tuning to ensure efficient query execution and scalability.
- Collaborate with product, AI, and legal domain experts to deliver high-quality, reliable data solutions.
Our Key Tech Stack
- Programming Languages: Python
- Data Formats: XML, Parquet
- Vector Search: Elasticsearch, Qdrant
- Graph Databases: Neo4j, Amazon Neptune
- Libraries: Hugging Face, Transformers, NumPy, Pandas, Pydantic, FastAPI, OpenAI & PyTorch
- Deployment Tools: Docker
- Cloud Infrastructure: OTC, AWS
- Pipeline Orchestration: Apache Airflow
- Ticket System: Atlassian Jira
- Repository: GitHub
- CI/CD System: GitHub Actions
- Documentation: Confluence
- Communication: Slack
- Office Application: MS365
What We're Looking For In You
- Residence & Work Permit: Required for one of the following countries: Austria, Croatia, Germany, Poland, or Slovakia.
- Language: English proficiency at C2 level.
- Experience: Proven experience in AI development or data engineering with successfully deployed projects.
- RAG Systems: Experience building retrieval-augmented generation (RAG) pipelines for AI applications.
- Data: Expertise in data processing, filtering, and augmentation.
- Databases: Expertise in vector databases, data embedding, benchmarking, and management.
- Programming: Strong Python skills and experience with AI pipelines.
OPTIONAL:
- Experience in deploying graph databases.
- Familiarity with developing and deploying NLP and generative AI models.
- Legal background knowledge.
SOUNDS GOOD? Then, we look forward to receiving your CV via our online application form.
About Noxtua
Noxtua is Europe’s sovereign Legal AI, covering the entire spectrum of legal text work: information gathering (research), analysis of complex issues (understanding), and document creation (drafting). This legally compliant AI adheres to the professional, criminal, and data protection requirements that apply to lawyers (e.g., Section 203 of the German Criminal Code and Section 43e of the German Federal Code for Lawyers) and is certified according to BSI C5, TISAX, and ISO 27001, 9001, 27018, 27017, and 42001. Noxtua has formed exclusive partnerships with leading European publishing houses across Germany, Austria, Switzerland, Poland, the Czech Republic, and Slovakia for its Legal AI Workspaces.
Founded in Berlin in 2017 on the basis of research by Dr. Leif-Nissen Lundbæk and Professor Dr. Michael Huth at Oxford University and Imperial College London, Noxtua is a European legal tech company with extensive experience in GDPR-compliant AI solutions. It now has offices in Paris, Berlin, Zagreb, and Munich. Strategic partners, including Germany’s leading legal publisher C.H.BECK and the top law firms CMS and Dentons, invested approximately €81 million in the European scaleup during its Series B funding round.
Noxtua explicitly encourages women to apply, especially given their current underrepresentation. Our goal is to build a diverse and inclusive work environment that values different perspectives. We welcome applications from all qualified individuals, regardless of gender, ethnic origin, religion, disability, age, or sexual identity.
Key Skills/Competencies
- Data Engineering
- ETL Pipeline Development
- XML Data Processing
- Generative AI Integration
- Vector Databases
- Python Programming
- Legal Data Management
- Natural Language Processing (NLP)
- Search Infrastructure
- Database Performance Tuning
How to Get Hired at Noxtua
- Research Noxtua's culture: Study their mission, values, recent news, and employee testimonials on LinkedIn and Glassdoor.
- Tailor your resume for Legal AI: Highlight experience in data engineering, AI, NLP, and legal tech, emphasizing deployed projects.
- Demonstrate Python and ETL expertise: Showcase projects involving complex data pipelines, XML processing, and vector database management.
- Prepare for technical and behavioral questions: Focus on problem-solving, AI integration, data quality, and cross-functional collaboration scenarios.
- Emphasize your EU legal data understanding: Highlight any familiarity with legal data structures, compliance, or multilingual content processing.