Large language models (LLMs) have rapidly popularised since the launch of ChatGPT at the end of 2022. As an emerging technology there are many security unknowns but considering what these might be has now come secondary to the initial excitement over using this powerful and time-saving technology. Is it therefore time to rein-in unfettered use of LLMs by employees for company business and on company devices?

What is an LLM?

An LLM uses an algorithm trained on a large amount of text-based data, typically scraped from the open internet. LLMs impressively generate a huge range of convincing content in multiple human and computer languages. However, they are in no way infallible and contain some serious flaws, including:

  • getting things wrong and creating incorrect information presented as facts
  • becoming biased and manipulated by leading questions
  • requiring huge compute resources and vast data to train from scratch
  • the potential to create harmful content and being vulnerable to ‘injection attacks

What are the possible data security implications for companies?

A repeated concern is that an LLM will ‘learn’ from your inputs and use that information when responding to others’ queries. However, LLMs do not currently automatically add information from queries to its model for others to query. This therefore means including information in a query will not result in that data being incorporated into the LLM.

But that’s not the end of the matter! The query will be visible to the organisation providing the LLM (so in the case of ChatGPT, to OpenAI). Those queries are saved and will undoubtedly be used in the development of the LLM. This could mean that the LLM provider (or its subcontractors) incorporates your queries in the future. The terms of use and privacy policy should therefore be fully understood before asking sensitive questions.

A question might be sensitive because of data included in the query or because of who is asking the question and when. For example, it might be discovered that a senior manager asked ‘how best to dismiss an employee?’ or an individual asked revealing health or relationship questions. Furthermore, the aggregation of multiple similar queries using the same login could increase the chance of individuals being exposed.

There is also the real possibility of queries stored online being hacked, leaked or accidentally made publicly accessible. This could potentially include user-identifiable data. Additionally the operator of the LLM might be acquired by an organisation with a different privacy ethos.

How can companies protect their data and employees?

Industry experts recommend:

  • not including sensitive information in queries to public LLMs
  • not submitting queries to public LLMs that would lead to problems if they were made public

Absolute Networks Ltd advocates incorporating these recommendation in an update to your existing IT Use and Information Security policies. You should ensure that those who want to experiment with LLMs are able to, but in a way that doesn’t place organisational data at risk.

The Verdict

LLMs, and ChatGPT in particular, bring an exciting and powerful development in the evolution of technology. Many will embrace what they have to offer and others may never use them.

However, individuals and organisations should be cautious and not underestimate unknown future threats. When data is input today it could be put to use in the future. Data capture is the 21st century gold-rush and with the advent of LLMs we’ve just discovered a new mine!

Published On: April 25th, 2023Categories: Security, Trends, Uncategorized