Low Latency High Throughput Data Serving Layer for Generative AI Applications using the REST-based APIs

Lakshmana Kumar Yenduri

Volume 2 Issue 3

Year of Publication : 2024

Author : Lakshmana Kumar Yenduri

DOI : 10.56472/25838628/IJACT-V2I3P106

Citation :

Lakshmana Kumar Yenduri, 2024. "Low Latency High Throughput Data Serving Layer for Generative AI Applications using the REST-based APIs" ESP International Journal of Advancements in Computational Technology (ESP-IJACT) Volume 2, Issue 3: 61-76.

Abstract :

For generative AI applications such as large language models, image synthesis, and other forms of AI-generated content, application efficiency depends heavily on the underlying data management systems. Most of these applications are computationally intensive, so the serving layer must sustain high I/O rates and respond fast enough to be, in effect, real-time. Generic data-serving architectures vary, and they prove slow and hard to scale for generative AI workloads. This paper outlines a novel architecture that addresses these factors, using REST-based APIs for integration and interaction. It combines state-of-the-art techniques, namely multi-level caching, distributed databases, and an optimized RESTful API design, to build a self-contained, reliable, and elegant data-serving layer. By optimizing key aspects of data handling, such as data access patterns, queries, and network usage, the architecture kept response time growing significantly more slowly than the load. The use of distributed databases makes the system horizontally scalable, so growth in the volume of data to be processed does not compromise efficiency. Likewise, the caching layers keep frequently used data close to where it is consumed. The paper discusses the design structure and evaluates its overall performance under various inputs, relating the results to the strategies used in the implementation. Averaged performance metrics show that the architecture meets the stated requirements and is well optimized for low latency and high throughput, making it suitable for integrated real-time generative AI applications.
This work clarifies the concepts and effort required to build data-serving layers for future advances in artificial intelligence and lays out a development path for extending this evolving technology.
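
The multi-level caching idea mentioned in the abstract can be illustrated with a minimal read-through sketch. This is not the paper's implementation: the class names `LRUCache` and `TieredReader`, the use of a plain dictionary as a stand-in for a shared second-level cache, and the `load_from_store` callback standing in for a distributed database are all assumptions made for illustration.

```python
from collections import OrderedDict
from typing import Callable, Dict, Optional


class LRUCache:
    """Hypothetical level-1 cache: small, in-process, least-recently-used eviction."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: str) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry


class TieredReader:
    """Read-through lookup: level 1, then level 2, then the backing store."""

    def __init__(
        self,
        l1: LRUCache,
        l2: Dict[str, str],  # assumption: stands in for a shared cache service
        load_from_store: Callable[[str], str],  # assumption: stands in for a database read
    ) -> None:
        self.l1 = l1
        self.l2 = l2
        self.load_from_store = load_from_store
        self.hits = {"l1": 0, "l2": 0, "store": 0}

    def get(self, key: str) -> str:
        value = self.l1.get(key)
        if value is not None:
            self.hits["l1"] += 1
            return value

        value = self.l2.get(key)
        if value is not None:
            self.hits["l2"] += 1
            self.l1.put(key, value)  # promote to the faster level
            return value

        # Miss at both levels: read from the store and populate both caches.
        value = self.load_from_store(key)
        self.hits["store"] += 1
        self.l2[key] = value
        self.l1.put(key, value)
        return value
```

In a REST-based serving layer, a request handler could call `TieredReader.get` for each resource identifier, so that only misses at both cache levels reach the database. Invalidation and expiry are deliberately left out of this sketch.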

Keywords :

Low latency, High throughput, Data serving layer, Generative AI, REST APIs, Scalability.

ESP International Journal of Advancements in Computational Technology [ESP-IJACT]
