C. Vijayalakshmi, Chanda Asani, 2025. "Talking to Data: Designing Smart Assistants for Humanities Databases," ESP International Journal of Emerging Multidisciplinary Research [ESP-IJEMR], Volume 1, Issue 1: 33-42.
The exponential growth of digital humanities databases in recent years has produced vast collections of linguistic, cultural and historical content that are theoretically open to anyone but in practice accessible only to highly trained experts. Users, who are often humanities scholars, can find it daunting to query even well-structured data, because current retrieval methods require knowledge of the target database schema, of controlled vocabularies, or of one or more query languages such as SQL, SPARQL or XPath. This leaves a gulf between the computational logic that underpins database design and the interpretive processes involved in humanistic research. A proposed way past this sticking point is the emerging paradigm of “talking to data”, which aspires to let humanities researchers access and query databases in natural language through AI-powered conversational assistants (Qiu et al.). To build interfaces that can understand, reason about and respond to complex, context-rich questions that capture the nuances of humanistic reasoning, the study examines the combination of large language models (LLMs) and semantic technologies. Such assistants can handle dynamic, interpretative questions rather than relying simply on keyword searches. For instance, a researcher could ask "Which correspondences from the 17th century discuss trade in Venice?" and receive answers that are relevant to the context and supported by provenance information and structured metadata. Such a system extends the expressive power of database queries to users who cannot express their needs in a formal query language.
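To make the example question concrete, it can be read as three constraints (a century, a topic and a place) applied to catalogue records, with each answer carrying its provenance. The following is a minimal sketch under stated assumptions: the `Letter` record, its fields, the sample data and the `find_letters` helper are all hypothetical and are not taken from the paper or from any real archive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Letter:
    """One toy catalogue record; all fields and values are hypothetical."""
    identifier: str
    year: int
    place: str
    topics: frozenset
    source: str  # provenance: where the record comes from


# Invented sample data, for illustration only.
LETTERS = [
    Letter("L-001", 1623, "Venice", frozenset({"trade", "shipping"}), "Archive A, box 12"),
    Letter("L-002", 1688, "Venice", frozenset({"family"}), "Archive A, box 14"),
    Letter("L-003", 1645, "Amsterdam", frozenset({"trade"}), "Archive B, folio 3"),
    Letter("L-004", 1702, "Venice", frozenset({"trade"}), "Archive A, box 20"),
]


def century_bounds(century: int) -> tuple:
    """Return the inclusive (first_year, last_year) of a century, e.g. 17 -> (1601, 1700)."""
    return (century - 1) * 100 + 1, century * 100


def find_letters(records, century, topic, place):
    """Apply the three constraints and return matches together with their provenance."""
    first, last = century_bounds(century)
    return [
        {"id": r.identifier, "year": r.year, "provenance": r.source}
        for r in records
        if first <= r.year <= last and topic in r.topics and r.place == place
    ]


# "Which correspondences from the 17th century discuss trade in Venice?"
results = find_letters(LETTERS, century=17, topic="trade", place="Venice")
for row in results:
    print(row)
```

In a real assistant the constraints would be derived from the question rather than written by hand, and the records would live in a database or knowledge graph; the sketch only shows the shape of the structured request and of a provenance-bearing answer.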
We propose a system that bridges unstructured human language and symbolic representations by integrating neural natural language processing (NLP) with ontologies. With an eye toward explainability, interpretability and cultural sensitivity, this work contributes to discussions of human-centered AI by developing intelligent assistant systems tailored to the humanities. The approach highlights the need for interfaces that are sensitive not only to computation and quantification but also to the epistemological basis of humanities research, where meaning is historical, contingent and interpretative. The paper argues that conversational interaction with such databases can support interdisciplinary collaboration, democratise access to digital archives and transform the practices of academic knowledge production and analysis in the digital age. Furthermore, this research positions conversational database assistants within the philosophical and normative debate on interpretation and artificial intelligence. It stresses that, besides returning data, such systems need to expose clear paths of reasoning so that human researchers can follow their inferences. Through the combination of multimodal interface design, knowledge graph integration and language modelling, "talking to data" becomes a game changer for the next generation of digital scholarship. The result is a model of humanistic inquiry in which databases are not inert archives but dynamic partners, responsive collaborators in a dialogic process of interpretation between humans and machines.
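The idea of mapping free-form language onto symbolic terms while keeping the reasoning path visible can be sketched as follows. This is an illustration only, not the system described in the paper: a small keyword lexicon stands in for the neural language component, and the `ex:` property and class names, the `LEXICON` table and the `interpret` function are all hypothetical.

```python
import re

# Hypothetical lexicon: surface phrases -> (property, value) pairs using made-up "ex:" terms.
# A real assistant would rely on a language model and a curated ontology instead.
LEXICON = {
    r"\b17th century\b": ("ex:century", 17),
    r"\bvenice\b": ("ex:place", "ex:Venice"),
    r"\btrade\b": ("ex:topic", "ex:Trade"),
    r"\bcorrespondences?\b|\bletters?\b": ("ex:type", "ex:Letter"),
}


def interpret(question: str):
    """Map a question to symbolic constraints and record why each one was added."""
    constraints = {}
    trace = []
    text = question.lower()
    for pattern, (prop, value) in LEXICON.items():
        match = re.search(pattern, text)
        if match:
            constraints[prop] = value
            # Each trace entry links a phrase in the question to the constraint it produced.
            trace.append(f"matched '{match.group(0)}' -> {prop} = {value}")
    return constraints, trace


constraints, trace = interpret(
    "Which correspondences from the 17th century discuss trade in Venice?"
)
print(constraints)
for step in trace:
    print(step)
```

Returning the trace alongside the constraints is the design point: a researcher can see which words in the question led to which symbolic term and can challenge a mapping they consider wrong.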
Keywords: Conversational AI, Digital Humanities, Natural Language Interfaces, Knowledge Graphs, Large Language Models (LLMs), Semantic Web, Human-Centered Computing, Intelligent Databases, Explainable AI, Cultural Informatics.