Scalable Data Pipeline using Google Cloud

© 2025 by IJACT

Volume 3 Issue 1

Year of Publication : 2025

Author : Sanjay Puthenpariyarath

DOI : 10.56472/25838628/IJACT-V3I1P116

Citation :

Sanjay Puthenpariyarath, 2025. "Scalable Data Pipeline using Google Cloud", ESP International Journal of Advancements in Computational Technology (ESP-IJACT), Volume 3, Issue 1: 149-153.

Abstract :

The data pipeline is a central component of any data processing system. Processing very large volumes of data requires the pipeline to scale and to follow sound data management practices. Google Cloud Platform offers several services for efficient data processing, and this article builds a scalable data pipeline on top of them. The solution combines Google Cloud Storage (GCS), BigQuery, Cloud Dataflow, and Cloud Composer, integrated through Python and Apache Airflow. The paper outlines data engineering standards and presents practical Python examples that run on Google Cloud infrastructure. It covers the main aspects of pipeline construction: architectural design, implementation steps, and performance optimization, illustrated with examples from practical applications.
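As a rough illustration of the orchestration idea behind Cloud Composer and Apache Airflow, the sketch below models a pipeline as a dependency graph and derives a valid execution order. It uses only the Python standard library rather than Airflow itself. The stage names and their ordering are hypothetical: the abstract names the services involved but does not specify the task graph.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical stages: each key maps to the set of stages it depends on.
# The abstract lists GCS, Cloud Dataflow, and BigQuery but not this exact graph.
PIPELINE = {
    "land_raw_files_in_gcs": set(),
    "transform_with_dataflow": {"land_raw_files_in_gcs"},
    "load_into_bigquery": {"transform_with_dataflow"},
    "validate_bigquery_tables": {"load_into_bigquery"},
}


def execution_order(dag):
    """Return stage names in an order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())
```

An orchestrator such as Airflow performs a similar dependency resolution before scheduling tasks, and adds retries, scheduling, and monitoring on top of it.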

Keywords :

Data pipeline, Google Cloud, BigQuery, Apache Airflow, Cloud Composer, GCS, Cloud Dataflow, Scalability.