

The goal of the interdisciplinary seminar series Computation & Data at HSU is to bring together researchers and foster exchange on the development of algorithms, methods and software. The seminar series is typically scheduled for the last Wednesday of every month, 16:00-17:00, with one presentation per hybrid session (digital and at HSU). Immediately after each seminar, the HPC Café takes place.
Feel free to subscribe to the seminar newsletter by sending an e-mail to info-hpc-bw@hsu-hh.de with the subject line "Subscription Seminar Computation & Data".
This presentation examines how privacy-compliant AI systems can transform heterogeneous, potentially sensitive data into an interactive knowledge landscape. Grounded in ongoing research and development at the University Computing Centre in Leipzig, it offers a concise overview of initial processing steps, including transcription with speaker recognition, named-entity recognition, temporal parsing, topic detection and stance analysis. These components illustrate how semi-automated structuring of data can serve as the basis for transparent and robust knowledge extraction. These processing steps are conceptualised as prerequisites for the subsequent deployment of an on-premises explainable AI system capable of answering complex queries, rather than as ends in themselves.
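By way of illustration, the structuring step might look as follows; this is a minimal sketch assuming spaCy for named-entity recognition and temporal tagging, since the abstract does not name the actual tooling used in Leipzig.

```python
import spacy

# Assumes the small English model is installed:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def structure_segment(text: str) -> dict:
    """Turn one transcript segment into a structured record."""
    doc = nlp(text)
    return {
        "text": text,
        "entities": [(ent.text, ent.label_) for ent in doc.ents],
        # Temporal expressions fall out of the same NER pass via DATE/TIME labels.
        "temporal": [ent.text for ent in doc.ents if ent.label_ in ("DATE", "TIME")],
    }

record = structure_segment("On 12 March the rector discussed the Leipzig budget.")
print(record["entities"])  # e.g. [('12 March', 'DATE'), ('Leipzig', 'GPE')]
```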
The analysis foregrounds human-machine interaction, asking how users can steer, query, explore and exploit this structured knowledge base. Three complementary access modalities are examined: (1) a graphical user interface with a hierarchical layout, (2) an audio interface controlled via voice commands, and (3) a Text2NoSQL pipeline in which natural-language questions are transformed into database queries. This pipeline entails modelling user intent, selecting relevant collections and fields, and generating syntactically and semantically valid queries and result sets. Typical sources of error such as ambiguous questions, incomplete context and conflicting constraints are identified, together with mitigation strategies including clarification requests, incremental query reformulation and the suggestion of alternative queries.

A second focus is the design of human-machine interaction patterns that promote transparency and trustworthiness. Rather than concealing knowledge processing behind an opaque black box, the proposed interfaces aim to provide users with meaningful control and insight. Provenance indicators are used to make the origin and evidential basis of specific answers intelligible. Knowledge maps, faceted navigation and dialogue-oriented drill-downs are introduced as complementary mechanisms for moving from a single answer towards a richer, more explainable exploration of the underlying corpus.
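Returning to the Text2NoSQL pipeline described above: the toy sketch below uses invented collection names and rule-based matching in place of an actual intent model, purely to illustrate collection selection, constraint extraction and the clarification-request fallback for ambiguous questions.

```python
import re

COLLECTIONS = {
    "interviews": ["speaker", "date", "topic"],
    "documents": ["title", "date", "topic"],
}

def text2nosql(question: str) -> dict:
    q = question.lower()
    # Intent modelling / collection selection: which collection is targeted?
    hits = [c for c in COLLECTIONS if c.rstrip("s") in q]
    if len(hits) != 1:
        # Mitigation strategy from the talk: issue a clarification request
        # instead of guessing between ambiguous (or zero) candidates.
        return {"clarify": f"Did you mean one of {sorted(COLLECTIONS)}?"}
    # Field/constraint extraction (toy rules: a quoted topic, an optional year).
    filt = {}
    if m := re.search(r"'([^']+)'", q):
        filt["topic"] = m.group(1)
    if m := re.search(r"\b(?:19|20)\d{2}\b", q):
        filt["date"] = {"$regex": m.group(0)}  # MongoDB-style constraint
    return {"collection": hits[0], "filter": filt}

print(text2nosql("Find interviews about 'energy' from 2023"))
# {'collection': 'interviews', 'filter': {'topic': 'energy', 'date': {'$regex': '2023'}}}
```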
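The provenance indicators just mentioned can likewise be pictured as a data structure attached to every answer fragment; the field names here are illustrative, not taken from the Leipzig system.

```python
from dataclasses import dataclass, field

@dataclass
class Evidence:
    source_id: str         # e.g. a transcript or document identifier
    span: tuple[int, int]  # character offsets of the supporting passage

@dataclass
class AnswerFragment:
    text: str
    evidence: list[Evidence] = field(default_factory=list)

answer = AnswerFragment(
    text="The budget was approved in March.",
    evidence=[Evidence(source_id="interview_017", span=(482, 534))],
)
for ev in answer.evidence:
    # The UI can render this as a clickable "where does this come from?" link.
    print(f"supported by {ev.source_id} [{ev.span[0]}:{ev.span[1]}]")
```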
The presentation further analyses how large language models (LLMs) can be constrained to generate natural-language responses that adhere strictly to the underlying knowledge base and output format, thereby avoiding hallucinated content that has no support in the corpus. Techniques such as retrieval-augmented generation, explicit anchoring of responses to query results and template-based response structures are considered as mechanisms for aligning LLM output with legal and organisational requirements, particularly in regulated environments.
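A hedged sketch of the anchoring idea: the prompt confines the model to numbered passages and demands inline citations, and a template-based guard rejects unanchored replies. Here `call_llm` is a stand-in for whatever on-premises model is actually deployed, not a real API.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for the on-premises model; not a real API."""
    return "The seminar takes place monthly [1]."

def answer_with_anchors(question: str, passages: list[str]) -> str:
    # Explicit anchoring: passages are numbered so every claim can cite one.
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = (
        "Answer using ONLY the numbered passages below and cite every claim "
        "as [n]. If the passages are insufficient, reply 'not in corpus'.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
    reply = call_llm(prompt)
    # Template-based guard: reject replies that carry no anchor to the corpus.
    return reply if "[" in reply else "not in corpus"

print(answer_with_anchors("How often does the seminar take place?",
                          ["The seminar is scheduled monthly."]))
```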
Finally, these concepts are situated within educational and research contexts. Concrete scenarios are outlined in which students and researchers can annotate and interrogate large sensitive datasets, such as interview transcripts, institutional documents or discussion forums, using natural-language queries while remaining compliant with European data-protection regulations. The presentation concludes by synthesising lessons learned from prototyping such systems, including organisational prerequisites for deployment, and by identifying open research questions concerning the evaluation, accountability and maintenance of AI systems.