Sateesh Reddy Adavelli, 2024. "Multimodal Gen AI: Integrating Text, Image, and Video Analysis for Comprehensive Claims Assessment" ESP International Journal of Advancements in Computational Technology (ESP-IJACT) Volume 2, Issue 2: 133-141.
Claims in both the insurance and legal domains are becoming more sophisticated because of the rising stakes and the growing heterogeneity of the data needed to assess claim validity. Traditionally, this task has relied on subjective assessments and graphical rule sets, an approach that is slow and error-prone because it is purely manual. Advances in multimodal learning now offer an opportunity to address these challenges by drawing on text data (such as policies and reports), images (such as accident and evidence photographs), and videos (such as surveillance and camera footage). However, existing AI-based solutions usually address only one of these modalities, which makes it difficult to evaluate a situation as an integrated whole. This has created a need for systems that integrate information from all of these modalities and provide accurate, efficient, and transparent processing. This paper discusses the use of multimodal generative AI to address this need, as one of the most recent approaches relying on high-performing models that can process and integrate text, image, and video data. The proposed system combines these modalities so that relevant information is captured from each data type and merged in a way that provides more comprehensive and enriched decision support. An initial system was designed and empirically tested against current claim adjudication techniques and was found to yield substantial improvements in utilization rates, throughput, and the rationale provided for claim decisions. The findings highlight the potential of multimodal generative AI to transform current approaches to claims analysis and to deliver more efficient, accurate, and capable responses to a variety of real-life conditions. This integration of technologies is a significant step towards improving functional processes in the insurance and legal industries.
Multimodal Generative AI, Claims Assessment, Text Analysis, Image Analysis, Video Analysis, Deep Learning.