Transparency is a central concept in practical AI ethics. According to the EU AI Act, “Transparency means that AI systems are developed and used in a way that allows appropriate traceability and explainability, while making humans aware that they communicate or interact with an AI system…”
I’ve been a little unsure about what this means in practice for companies that build their AI product or chatbot on an LLM base model. The company may provide the prompting, the additional documentation and context for the chatbot (e.g. via RAG, Retrieval-Augmented Generation), the data storage and a polished wraparound platform, but it’s the LLM itself (accessed via an API from OpenAI, Anthropic etc.) that powers the language and interaction.
LLMs are still notoriously opaque – it’s very difficult to trace and explain the workings and data usage of the model (see e.g. https://lnkd.in/e9EdJ4yr) – so how can a company be ethically and legally ‘transparent’ about their product, and how might they be liable if things go badly wrong when somebody is using their chatbot?
I had the opportunity to ask Matthias Holweg this question yesterday during online training with Saïd Business School, University of Oxford. The short answer is that if you are using an API to a base “general purpose” model to power your product, your liability is likely to be zero, because you are a “deployer” of the LLM. Your exposure to liability may increase if you use RAG and/or fine-tune the model to such an extent that you are co-producing the model and become a “provider”. But even at this stage you would have to be introducing context to the model which is risky or constitutes “data poisoning” (e.g. hateful language, sensitive information). This would then become a problem for you if the original model provider can demonstrate that the behaviour is not replicated in the original base model.
In summary, companies that want to use AI in a chatbot in an ethically and legally transparent way should consider the following points:
1). Make sure users know they are interacting with an AI.
2). If there are problematic instances, does the base model exhibit the same behaviour? Yes or No?
3). If you want to avoid liability, don’t cross the threshold of becoming a ‘provider’.
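As a rough illustration of points 1 and 2, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the names `with_disclosure`, `check_replication` and `ReplicationResult` are hypothetical, `product_model` and `base_model` are plain callables standing in for real API calls to your product and to the unmodified base model, and `is_problematic` is a placeholder for whatever moderation check you actually use. It shows the shape of the comparison, not a compliance tool.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical disclosure text; the wording is an assumption.
AI_DISCLOSURE = "You are chatting with an AI assistant."


def with_disclosure(reply: str, first_turn: bool) -> str:
    """Point 1: prepend an AI disclosure on the first turn of a conversation."""
    return f"{AI_DISCLOSURE}\n\n{reply}" if first_turn else reply


@dataclass
class ReplicationResult:
    prompt: str
    product_flagged: bool  # did the product (prompting + RAG + fine-tuning) misbehave?
    base_flagged: bool     # did the unmodified base model misbehave on the same prompt?

    @property
    def replicated_in_base(self) -> bool:
        """True only when both the product and the base model were flagged."""
        return self.product_flagged and self.base_flagged


def check_replication(
    prompt: str,
    product_model: Callable[[str], str],
    base_model: Callable[[str], str],
    is_problematic: Callable[[str], bool],
) -> ReplicationResult:
    """Point 2: send the same prompt to both models and record which outputs are flagged."""
    return ReplicationResult(
        prompt=prompt,
        product_flagged=is_problematic(product_model(prompt)),
        base_flagged=is_problematic(base_model(prompt)),
    )
```

If `product_flagged` is true and `base_flagged` is false, the problematic behaviour appears only in your customised setup, which is the situation described above as potentially shifting responsibility towards you. Since LLM outputs vary between runs, a single comparison like this is only a starting point.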
Coincidentally, Matthias and his colleague Philipp Hacker have just published a paper providing more information and recommendations on this: https://lnkd.in/e7KXxTke
This clarified the legal side of transparency and liability, but on an ethical level I’m still uneasy: we still can’t truly understand and audit the mechanics and data use of the LLMs behind our new AI technology. Is this a level of ‘appropriate’ transparency we can live with? Can we truly trust generative AI as a conversational partner which can help humans learn, reflect and flourish?
