
Data Engineer, Web Scraping
10a Labs · San Francisco, CA
- On site
- Full-time
- $125,000 / year
Job highlights
- Design and optimize data pipelines for web scraping.
- Collect and process structured and unstructured data.
- Prepare data for analysis and ML engineers.
- Develop internal and external APIs and tools.
- Collaborate with cross-functional teams for insights.
About the role
About 10a Labs
10a Labs is the safety and threat-intelligence layer trusted by frontier AI labs, AI unicorns, Fortune 10 companies, and leading global technology platforms. Our adversarial red teaming, model evaluations, and intelligence collection enable engineering, safety, and security teams to stay ahead of evolving threats and deploy AI systems safely.
In This Role, You Will
- Design, implement, and optimize end-to-end data pipelines for scraping and processing structured and unstructured data using Google Cloud Platform (or similar) and best practices.
- Conduct ad hoc web scraping and data collection to support research and intelligence initiatives.
- Prepare data for further analysis, including data cleaning, transformation, anonymization, and masking.
- Contribute to the development of internal and external APIs, following best practices.
- Collaborate with ML engineers, other data engineers, and software developers to deliver actionable insights and functional tools, including internal and external dashboards, APIs, and data dumps.
- Drive other critical initiatives.
Requirements
- Degree (or equivalent work experience) in Computer Science, Engineering, Information Science, Data Science or a related field (graduate degree preferred).
- 2+ years of professional experience in data engineering or a closely related field.
- Ability to communicate complex technical ideas clearly to non-technical audiences.
- Proficiency in Python and SQL.
- Experience with web scraping/crawling (e.g., Beautiful Soup, Selenium, Scrapy).
- Experience with Google Cloud Platform (or similar), including storage and database services (e.g., Cloud Storage, Cloud SQL, Cloud Spanner) and workflow orchestration (e.g., Cloud Composer/Airflow, Cloud Run, Pub/Sub).
- Experience building and managing data pipelines, especially for text data.
- Comfort working in fast-moving, high-impact environments, such as startups, AI research labs, or security-focused teams.
Compensation & Benefits
- Salary Range: $105K–$125K, depending on experience and location.
- Bonus: Performance-based annual bonus.
- Professional Development: Support for conferences, continuing education, or leadership training.
- Work Environment: Fully remote, U.S.-based.
- Health Benefits: Comprehensive health, dental, and vision coverage.
- Time Off: Generous PTO and paid holiday schedule.
- Retirement: 401(k) plan.
Key skills/competency
- Data Engineering
- Web Scraping
- Python
- SQL
- Google Cloud Platform
- Data Pipelines
- API Development
- Data Processing
- Data Analysis
- Cloud Computing
Skills & topics
- Data Engineer
- Web Scraping
- Python
- SQL
- Google Cloud Platform
- Data Pipelines
- API Development
- Data Processing
- Data Analysis
- Cloud Computing
- AI Safety
- Threat Intelligence
How to get hired
- Tailor your resume: Highlight Python, SQL, web scraping, and GCP experience.
- Showcase project impact: Quantify achievements in data pipeline design and data processing.
- Prepare for technical questions: Brush up on data structures, algorithms, and cloud services.
- Demonstrate communication skills: Practice explaining technical concepts to non-technical audiences.
- Research 10a Labs: Understand their AI safety and threat intelligence mission.
Technical preparation
- Practice Python coding challenges, with a focus on data structures.
- Master SQL for complex data querying and manipulation.
- Build small web scraping projects with Beautiful Soup/Scrapy.
- Familiarize yourself with GCP services: Storage, SQL, Composer.
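As a starting point for a small practice project, here is a minimal sketch of a scrape-and-mask step. It is an illustration only, not anything 10a Labs has published: it uses Python's standard-library `html.parser` as a dependency-free stand-in for Beautiful Soup or Scrapy, and the sample HTML, the `LinkAndTextParser` class, the `mask_emails` and `scrape` helpers, and the `[EMAIL]` placeholder are all hypothetical names chosen for the example.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample page; a real project would fetch HTML over HTTP instead.
SAMPLE_HTML = """
<html><body>
  <h1>Forum thread</h1>
  <p>Contact me at alice@example.com for details.</p>
  <a href="/post/1">First post</a>
  <a href="https://example.com/post/2">Second post</a>
</body></html>
"""

# Deliberately simple pattern (an assumption); it will not catch every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class LinkAndTextParser(HTMLParser):
    """Collects href values from <a> tags and the non-empty text nodes of a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.text_parts.append(stripped)


def mask_emails(text):
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_RE.sub("[EMAIL]", text)


def scrape(html):
    """Parse one HTML document into a record with links and masked text."""
    parser = LinkAndTextParser()
    parser.feed(html)
    parser.close()
    text = mask_emails(" ".join(parser.text_parts))
    return {"links": parser.links, "text": text}


if __name__ == "__main__":
    record = scrape(SAMPLE_HTML)
    print(record["links"])
    print(record["text"])
```

Keeping parsing and masking as separate functions makes each easy to test on its own, and the parser can later be swapped for Beautiful Soup or a Scrapy spider without touching the masking step.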
Behavioral questions
- Describe a complex data pipeline you designed.
- How do you handle ambiguous data requirements?
- Explain a technical concept to a non-technical person.
- How do you stay updated with AI safety trends?
Frequently asked questions
- What are the key technical skills for a Data Engineer at 10a Labs?
- For the Data Engineer, Web Scraping role at 10a Labs, key technical skills include proficiency in Python and SQL, experience with web scraping tools like Beautiful Soup or Scrapy, and familiarity with cloud platforms such as Google Cloud Platform. Experience in designing and managing data pipelines, especially for text data, is also crucial.
- What is the work arrangement for this Data Engineer position at 10a Labs?
- This Data Engineer position at 10a Labs is fully remote and U.S.-based. This means you can work from anywhere within the United States.
- What kind of data will a Data Engineer work with at 10a Labs?
- A Data Engineer at 10a Labs will work with both structured and unstructured data, with a specific emphasis on text data collected through web scraping. This data supports the company's safety and threat intelligence initiatives.
- Does 10a Labs offer professional development opportunities for its Data Engineers?
- Yes, 10a Labs supports professional development for its employees. This includes assistance for attending conferences, pursuing continuing education, or engaging in leadership training programs.
- What is the preferred educational background for a Data Engineer at 10a Labs?
- 10a Labs asks for a degree in Computer Science, Engineering, Information Science, Data Science, or a related field, or equivalent work experience. A graduate degree is preferred.
- How does 10a Labs ensure AI systems are deployed safely?
- 10a Labs acts as a safety and threat-intelligence layer for AI development. They achieve this through adversarial red teaming, model evaluations, and intelligence collection, enabling teams to proactively address evolving threats and deploy AI safely.