
AI Data Engineer

2 weeks ago
February 7, 2025
Kyiv
Remote work
Full-time

The Role

We're looking for an AI Data Engineer to build and maintain the data infrastructure powering our AI-driven healthcare platform. This role focuses on implementing robust data pipelines, managing our data lakehouse architecture, and ensuring high-quality data processing for our AI systems.

Responsibilities:

  • Design and implement scalable data pipelines for diverse healthcare data sources
  • Build and maintain data lakehouse architecture on AWS for storing structured and unstructured medical data
  • Create efficient ETL processes for handling medical transcriptions, clinical documentation, and practice data
  • Implement data quality monitoring systems and validation frameworks
  • Develop and maintain data crawlers for collecting domain-specific medical content
  • Support RAG system implementation with optimized data storage and retrieval mechanisms

Ideal Candidate:

  • Strong experience with AWS data services (S3, RDS, Glue, EMR Serverless, Athena, DataZone, Lake Formation, DynamoDB)
  • Expertise in data orchestration tools (Dagster, Apache Airflow, AWS MWAA, Step Functions)
  • Proficiency in Python, SQL, and PySpark with experience in data processing frameworks
  • Experience with data lakehouse architectures, ETL pipeline development, and SageMaker Feature Store
  • Strong background with AWS analytics services (Glue Catalog, Glue ETL/EMR Serverless, Athena)
  • Experience with the Apache Iceberg table format for organizing data in a data lakehouse architecture, including time travel, ACID transactions, and schema evolution
  • Experience with PostgreSQL and vector databases (pgvector, OpenSearch, etc.)
  • Proficiency in data transformation tools like dbt
  • Experience implementing data quality frameworks (Great Expectations, Glue Data Quality, PyDeequ)
  • Knowledge of healthcare data structures and medical terminology preferred
  • Experience with data preprocessing for LLM applications strongly preferred (NLP libraries like spaCy, web scraping tools, text extraction, semantic chunking, etc.)
  • Understanding of data security and HIPAA compliance requirements
  • Collaborative mindset and ability to work in a fast-paced startup environment
  • Bachelor's degree in Computer Science, Engineering, or related field

Maria Bilo
