Company Profile
-Design and implement data pipelines using tools like Apache Flink, Dataflow, Dataproc, or Cloud Composer.
-Manage and optimize GCP services such as BigQuery, Cloud Storage, and Pub/Sub for data processing.
-Develop ETL processes to extract, transform, and load data from various sources into GCP data warehouses.
-Ensure data quality, consistency, and integrity through validation and monitoring.
-Collaborate with data scientists and analysts to provide clean and structured datasets.
-Implement data security measures, including encryption, IAM roles, and access controls.
-Automate data workflows to improve efficiency and reduce manual intervention.
-Monitor and troubleshoot data pipelines to ensure reliability and performance.
-Optimize query performance and storage costs in BigQuery and other GCP services.
-Document data engineering processes and mentor team members on GCP data tools and best practices.
-University degree (or higher) in Computer Science, Software Engineering, or a related discipline.
-Excellent written and spoken communication skills in English are a must.
-At least 5 years of demonstrable commercial experience developing software, applications, or solutions for large-scale systems, ideally in either Java/Spring or Python. Experience with both is highly desirable.
-Good experience with cloud-native technologies, including but not limited to popular services on GCP, AWS, and Alicloud. Experience with GCP is desirable but not essential.
-Mastery of SQL, BigQuery, Postgres, and other relational databases. Good knowledge of MongoDB/ClickHouse would be a big plus.
-Familiar with Apache Flink for real-time data processing and streaming applications.
-Proficient in building batch and stream processing pipelines with Dataflow for scalable data transformations.
-Familiarity with Dataproc for managing and running Hadoop/Spark jobs for big data processing would be a big plus.
-Deploy and orchestrate workflows using Apache Airflow or Cloud Composer for ETL automation.
-Containerize data processing applications with Docker for portability and consistency.
-Familiar with Kubernetes (K8s) to manage and scale containerized workloads in the cloud.
-Ensure clear communication and documentation in English to collaborate effectively with teams.
-Implement robust data security and governance practices across all GCP services.