Generative artificial intelligence technology is providing a competitive advantage for early adopters across many industries, while for others it is already a virtual necessity. Business leaders are asking not if, but how they will integrate AI into their tech stacks to remain competitive and meet rising client expectations. A crucial part of this effort is determining how this technology can be deployed without compromising data security.
For organizations that have made the transition to cloud computing over the last two decades, the question is: what new risks does AI introduce? This blog post examines that question by mapping AI security risks across different deployment environments, and offers insights into how businesses can adopt AI without weakening their data security posture.
The consumer-facing versions of large language model (LLM) tools like ChatGPT offer a convenient, low-cost way to explore the capabilities of LLMs. However, using these tools at the enterprise level, or for any use that involves sensitive or personal information, raises significant privacy concerns. A major concern is that providers like OpenAI and Google use user data, including prompts to AI chatbots, to train their models.[1] Once data is shared with these tools, it may be exposed to other users, potentially breaching confidentiality restrictions and leaking sensitive data. Samsung learned this the hard way when employees inadvertently leaked confidential information by using ChatGPT at work.[2] Following these incidents, Samsung banned ChatGPT from the workplace, citing data security concerns.
Although mitigation measures exist, users should be wary of using these tools to process sensitive information. For example, while OpenAI users can request to opt out of having their data used to improve its services, OpenAI doesn’t explain how it assesses these requests or how long they take to process.[3] Moreover, relying on employees to take this step would be burdensome for organizations, unreliable, and difficult to track. Likewise, while users can request to have their data deleted, it is much safer to avoid putting sensitive data into these tools in the first place. As Samsung discovered, it can be difficult to retrieve and delete data from external servers once it has been shared with these tools. In fact, both OpenAI and Google actively discourage users from sharing sensitive or confidential information with their consumer-facing LLM tools.[4]
The data retention and disclosure policies applicable to these tools also present data security and privacy challenges. OpenAI retains data shared with ChatGPT for an indeterminate amount of time in an unspecified location and can disclose that data to a broad range of parties.[5] Google retains data shared with its chatbot Bard for 18 months and shares data with affiliates and other unspecified businesses for external processing.[6]
Ultimately, while consumer-facing LLM tools offer an accessible entry point for individuals to explore AI, they are not designed for use at the enterprise level. Enterprises that do intend to utilize these tools should carefully devise acceptable use policies and select tools that minimize risks of data leakage or confidentiality breaches. Anthropic, for example, does not train its models on user data and may be a more palatable option for some uses.
Permitting consumer-facing tools in the workplace at all can be risky: employees control their own accounts, so security and oversight fall to individuals, and a single error could result in a serious data leak. Enterprises should consider implementing firewalls to block riskier tools or redirect employees to approved alternatives that prioritize data security.
Accessing LLMs through an API can offer greater protection, as providers have implemented enhanced security measures for data shared through this medium.
To take OpenAI as an example, the data protections applicable to API users are a vast improvement on those that accompany its consumer-facing tools. As of March 1, 2023, OpenAI no longer trains its LLMs on data received from users through the API. It limits retention of API inputs and outputs to 30 days for the purpose of identifying abuse and offers zero data retention for certain use cases. Users also retain control over data submitted for fine-tuning purposes, which is retained until deleted by the user.[7] User data is confined to a much more limited audience of “(1) authorized employees that require access for engineering support, investigating potential platform abuse, and legal compliance and (2) specialized third-party contractors who are bound by confidentiality and security obligations, solely to review for abuse and misuse.”[8]
OpenAI also hosts its application infrastructure on a trusted cloud provider, Microsoft Azure, encrypts data in transit and at rest, and has had its API platform certified as SOC 2 Type 2 and SOC 3 compliant.[9] Additional security measures are formally established in customer agreements.[10]
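As a rough illustration of what API access looks like in practice, the sketch below sends a single request through OpenAI’s official Python SDK; the model name and prompts are placeholders, and the same pattern applies to other providers’ APIs.

```python
# Minimal sketch of calling an LLM through a provider API rather than a consumer chat UI.
# Assumes the official OpenAI Python SDK (`pip install openai`) and an API key stored in
# the OPENAI_API_KEY environment variable; the model name and prompts are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use whichever model your organization has approved
    messages=[
        {"role": "system", "content": "You are an internal assistant. Do not ask for personal data."},
        {"role": "user", "content": "Summarize our public product documentation in three bullet points."},
    ],
)

print(response.choices[0].message.content)
```

Requests made this way fall under the API data usage terms described above, rather than the consumer-facing policies.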
Whether this data security infrastructure is sufficient will come down to the needs of individual businesses, the risk they are willing to tolerate, and the applicable legal requirements. Absent in-house engineering capacity, accessing a provider’s API may require engaging a third party to provide a suitable interface; in that case, that provider’s security infrastructure will form part of the overall risk assessment. Enterprises should thoroughly review providers’ privacy documentation and negotiate their contracts to ensure their data protection needs are met. For businesses in Europe, this includes making sure any transfer of personal data to LLM tools hosted in the US complies with the GDPR or equivalent regulations. The U.S. International Trade Administration maintains a public database of U.S. organizations that meet these standards by participating in the EU-US Data Privacy Framework.[11]
Enterprises worldwide have been steadily turning to services like Microsoft Azure and Amazon Web Services (AWS) for over a decade to facilitate their transition to cloud storage. These providers offer enterprise-grade security measures, which have passed diligence reviews at millions of businesses. As these providers have generated a great deal of trust with their enterprise customers, it is unsurprising they are setting the security standards for enterprise AI solutions.
For its part, Microsoft has partnered with OpenAI to offer Azure OpenAI, a service that provides OpenAI’s GPT models hosted on the Azure platform. Similarly, AWS has collaborated with numerous foundation model providers, including AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon, to offer Amazon Bedrock, which hosts these foundation models on AWS’s secure cloud platform.
Both Azure OpenAI and Amazon Bedrock offer access to LLM tools with advanced data protection measures. For example, they offer private endpoints so traffic can be routed over a private network rather than the public internet, they let customers control data encryption by bringing their own keys, and they support role-based or identity-based access controls.
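As a rough sketch of what this looks like from the application side, the snippet below points the OpenAI SDK at an organization’s own Azure OpenAI resource (the network-level private endpoint itself is configured in Azure, not in code); the endpoint URL, API version, and deployment name are placeholders.

```python
# Illustrative sketch: routing requests to a company-controlled Azure OpenAI resource
# instead of the public OpenAI endpoint. The endpoint URL, API version, and deployment
# name below are placeholders, not real values.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://my-company-resource.openai.azure.com/",  # resource reachable via a private endpoint
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",  # placeholder; use the version your deployment supports
)

response = client.chat.completions.create(
    model="my-gpt4-deployment",  # the Azure deployment name, not the underlying model name
    messages=[{"role": "user", "content": "Draft an internal FAQ entry on our data retention policy."}],
)
print(response.choices[0].message.content)
```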
Azure OpenAI and Amazon Bedrock also give customers an additional layer of control over where data is processed and stored. Customers of Azure OpenAI can specify their data processing and storage location, and user data can only be accessed by Microsoft, not OpenAI. Similarly, Amazon Bedrock ensures user data is stored in the AWS Region where Amazon Bedrock is used and is not shared with any third-party model providers.[12] These features can streamline compliance for businesses, particularly those operating under stringent regulatory requirements like the GDPR, which places tight controls on the processing and transfer of personal data.
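On the Bedrock side, a minimal sketch of pinning requests to a chosen AWS Region using the boto3 SDK is shown below; the Region, model ID, and request body schema are illustrative and should be checked against current AWS documentation.

```python
# Minimal sketch of invoking a model via Amazon Bedrock while pinning the AWS Region,
# so requests stay within a chosen geography. The Region, model ID, and body schema
# are illustrative placeholders.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="eu-central-1")  # placeholder Region

body = {
    "anthropic_version": "bedrock-2023-05-31",  # request schema used by Anthropic models on Bedrock
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "List three considerations for GDPR-compliant logging."}],
}

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # placeholder model ID
    body=json.dumps(body),
)
print(json.loads(response["body"].read())["content"][0]["text"])
```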
Azure OpenAI and Amazon Bedrock are an attractive proposition for enterprises that want to safely use AI tools in the cloud. They offer a secure and trusted infrastructure, designed with enterprises in mind, which enables businesses to access a range of foundation models.
Businesses resistant to using commercial cloud providers may wish to explore hosting LLMs on premises. This typically involves running a small open-source model on their own servers, keeping all activity within the internal network. These companies can also work with external providers to host software solutions that enhance or secure the use of AI within their own environment.
This avoids the security risks associated with sharing data with any third party, but it is more costly and complex to operate, requiring engineering capacity to set up and monitor the deployment as well as the hardware to host it. This option is likely to be favored by businesses with considerably heightened security needs, a significant budget, and in-house technical capabilities.
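As a rough illustration of the self-hosted route at the application layer (ignoring hardware provisioning, serving infrastructure, and monitoring), the sketch below loads a small open-source model with the Hugging Face transformers library; the model name is purely illustrative, and in practice weights would typically be mirrored to an internal artifact store.

```python
# Rough sketch of running a small open-source LLM on in-house hardware, so prompts and
# outputs never leave the internal network. Requires the `transformers` and `torch`
# packages; the model name is illustrative, and weights should be mirrored internally.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",  # placeholder; any suitably licensed open model
    device_map="auto",  # spread the model across available local GPUs/CPU
)

prompt = "Summarize the key points of our internal data handling policy in plain language."
result = generator(prompt, max_new_tokens=200, do_sample=False)
print(result[0]["generated_text"])
```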
While enterprises should be aware of the potential pitfalls of permitting AI applications in the workplace, secure paths to adoption exist. The right solution for a given organization will depend on its risk profile and the problems it hopes to solve. API-based tools and those hosted in a secure cloud environment provide a much more enterprise-ready offering than consumer-facing tools, while on-premises solutions may appeal to those with significant resources and heightened security needs.
Constructing a secure data infrastructure is a crucial step towards securely using AI, but is only the starting point. Enterprises must ensure their use of AI complies with applicable laws and regulations, devise thorough acceptable use policies setting out how AI can and cannot be used, and inform employees which tools are permitted. Enterprises should provide training on the risks associated with using AI, such as hallucinations, and the consequences of improper use, and should monitor for any misuse.
Getting data security right when implementing AI is a crucial investment for enterprises. If you are interested in discussing data security for AI with our team, reach out to ravin@credal.ai.
_________________________________
[1] See OpenAI, Privacy Policy, https://openai.com/policies/privacy-policy.
[2] TechCrunch, Samsung bans use of generative AI tools like ChatGPT after April internal data leak, https://techcrunch.com/2023/05/02/samsung-bans-use-of-generative-ai-tools-like-chatgpt-after-april-internal-data-leak/.
[3] OpenAI user content opt-out request form, https://docs.google.com/forms/d/e/1FAIpQLScrnC-_A7JFs4LbIuzevQ_78hVERlNqqCPCt3d8XqnKOfdRdQ/viewform.
[4] See OpenAI, What is ChatGPT, https://help.openai.com/en/articles/6783457-what-is-chatgpt (“Please don't share any sensitive information in your conversations.”); Bard Privacy Help Hub, https://support.google.com/bard/answer/13594961?hl=en#your_data (“Please don’t enter confidential information in your Bard conversations or any data you wouldn’t want a reviewer to see or Google to use to improve our products, services, and machine-learning technologies”).
[5] OpenAI Privacy Policy, https://openai.com/policies/privacy-policy (“Content is stored on OpenAI systems and our trusted service providers’ systems in the US and around the world” and data can be shared with “affiliates, vendors and service providers, law enforcement, and parties OpenAI transacts with”).
[6] Google, Privacy Policy, https://policies.google.com/privacy#infosharing.
[7] See OpenAI Privacy Policy.
[8] OpenAI, Enterprise Privacy at OpenAI, https://openai.com/enterprise-privacy.
[9] OpenAI, SOC 3 Compliance Report, https://trust.openai.com/?itemUid=b2671060-5c66-4d9c-b70f-af4ab3dbd45a&source=documents_card; OpenAI, Security Portal, https://trust.openai.com/?itemUid=a0c2d606-48f6-4519-8db5-9029a98328d6&source=click.
[10] See SOC 3 Report.
[11] Data Privacy Framework Program, https://www.dataprivacyframework.gov/s/participant-search.
[12] AWS, Amazon Bedrock FAQs, https://aws.amazon.com/bedrock/faqs/#:~:text=Any%20customer%20content%20processed%20by,you%20are%20using%20Amazon%20Bedrock.