TPS-Eval: Coupled Trust, Privacy, and Security Evaluation of Agentic Clinical AI Pipelines

© 2026 by IJAST

Volume 4 Issue 1

Year of Publication : 2026

Author : Saritha Kondapally

DOI : 10.56472/25839233/IJAST-V4I1P106

Citation :

Saritha Kondapally, 2026. "TPS-Eval: Coupled Trust, Privacy, and Security Evaluation of Agentic Clinical AI Pipelines." ESP International Journal of Advancements in Science & Technology (ESP-IJAST), Volume 4, Issue 1: 46-52.

Abstract :

Agentic AI systems that autonomously retrieve patient data, invoke external tools, and maintain cross-session memory introduce safety challenges that extend beyond traditional model-level evaluation. Existing benchmarks assess trust, privacy, and security in isolation, overlooking critical interactions: retrieval strategies influence both answer accuracy and what Protected Health Information (PHI) enters the model context, while memory configurations affect longitudinal reasoning as well as adversarial exposure. We introduce TPS-Eval, a framework that formally defines and jointly evaluates Trust, Privacy, and Security as coupled properties of complete agentic pipelines. We compare six retrieval strategies, from keyword baselines to graph-structured approaches, across three language model backends (GPT-4o, GPT-4o-mini, and Llama-3-8B) using noise-augmented, FHIR-compliant synthetic clinical records. We extend the threat model with two agent-specific attack categories: logic poisoning of knowledge bases and toolchain feedback exploitation. Across ten independent evaluation seeds, graph-based retrieval consistently achieves the highest integrated TPS scores, with rankings robust across model scales. We derive five actionable design principles for safer deployment of agentic AI in clinical decision support systems.
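The abstract refers to an "integrated TPS score" that couples Trust, Privacy, and Security into a single pipeline-level figure averaged over evaluation seeds. As an illustration only, the sketch below shows one plausible way such a score could be aggregated; the component metrics, equal weighting, and all names (SeedResult, integrated_tps) are assumptions made for this example, not the paper's actual formulation.

```python
# Illustrative sketch (not the paper's implementation): combine per-seed
# Trust, Privacy, and Security measurements for one pipeline configuration
# into a single integrated TPS score, averaged over independent seeds.
# All class and field names here are hypothetical.

from dataclasses import dataclass
from statistics import mean


@dataclass
class SeedResult:
    trust: float     # e.g., answer accuracy / faithfulness, in [0, 1]
    privacy: float   # e.g., 1 - fraction of PHI leaked into context or output
    security: float  # e.g., fraction of adversarial probes resisted


def integrated_tps(results: list[SeedResult],
                   weights: tuple[float, float, float] = (1/3, 1/3, 1/3)) -> float:
    """Weighted combination of the three coupled properties, averaged over seeds."""
    w_t, w_p, w_s = weights
    per_seed = [w_t * r.trust + w_p * r.privacy + w_s * r.security for r in results]
    return mean(per_seed)


# Example: score one retrieval strategy (e.g., graph-based RAG) over ten seeds.
seeds = [SeedResult(trust=0.86, privacy=0.91, security=0.78) for _ in range(10)]
print(f"Integrated TPS: {integrated_tps(seeds):.3f}")
```

In practice the weighting could be tuned to reflect deployment priorities (for instance, weighting privacy more heavily in PHI-sensitive settings); the equal weights above are only a placeholder.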

Keywords :

Agentic AI, Clinical NLP, Graph-RAG, Healthcare AI Safety, PHI Sanitization, Adversarial Robustness, Retrieval-Augmented Generation.