Using GPT-3 as a Content Management System (CMS)
It’s been said that artificial intelligence (AI) won’t put human beings out of work. But human beings who use AI will put those who don’t use AI out of work. Put differently, AI has the power to make you much, much faster and better at what you already do.
Do you code for a living? Consulting ChatGPT when you’ve hit a wall will nearly always get you through it faster than searching for a solution online (or leafing through a textbook). Do you write for a living? ChatGPT is more effective with outline creation and writer’s block than buckets of Adderall.
But it’s not just about boosting your efficiency and speed. AI is also poised to redefine entire industries. To illustrate what I mean, let’s talk about enterprise Content Management Systems (CMS).
What is a CMS?
A CMS is any software system that manages extensive data archives meant to be accessed by multiple individuals and groups. Government departments or companies might expose some or all of their data to clients, customers, or vendors. That data might live within secure databases, but appropriately phrased requests could deliver precise subsets of that data to authorized consumers. As a rule, a good CMS will provide all or most of these features:
- Access control to ensure data is available only to authorized consumers
- Search and navigation
- Network connectivity to permit secure remote access
- Version control to provide data lifecycle management and appropriate attribution
- Multi-media management to incorporate plain-text, structured SQL, audio, and video resources
- Document creation tools
Some popular CMS platforms include Atlassian Confluence and, in a very different way, WordPress.
It’s also common for smaller businesses and other organizations to maintain extensive archives of documentation and “institutional knowledge”. But they often don’t use formal CMS platforms, and this is the kind of use case we’re going to discuss here.
The problem of data management in informal settings
Organizations change. Employees come and go and, in between, change roles. And systems evolve. This means that any given data resource is as likely as not to fall out of date or simply get lost. Or the person who once knew where everything was is no longer around to ask.
Is that not confusing enough? Well, consider how an organization’s data and documents can be hosted on a dizzying range of hosts, including individual team members’ PCs, local file servers, and cloud storage platforms. If you haven’t got the money or - more important - the time to incorporate an industrial strength CMS into your workflow, you’ll need something a bit more lightweight, which is precisely where AI tools like GPT-3 can come in.
How GPT-3 can solve your document management problems
The value of a CMS is in how it can help users quickly find exactly the resources they need. The value of an internet search engine is in how it can help users quickly find the resources they need. Do you see a pattern here? More: to different degrees, both a good CMS and a search engine accept natural language inputs and, based on positional algorithms, return related information.
But how much more powerful could those tools be if they actually understood the natural language requests? Now that is the secret superpower of AI. And how much more powerful still would they be if they actually understood the content of the documents they’re returning!
To explain what I mean, I’m going to show this to you in action - although on a very small scale.
Using GPT-3 as a CMS
Imagine that your organization relies on documentation stretched across a handful of PCs and servers around your office. Different people created the documentation over many years, and it’s stored in more than one format; you know: PDFs, spreadsheets, MS Word docs, meeting transcripts, etc. Now someone wants an answer to a question but doesn’t even know what the right document is called, let alone where it’s kept.
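To make that scenario concrete, here is a minimal sketch of how you might take stock of such a scattered archive before involving any AI at all. It only walks a folder tree and groups files by extension. The function name inventory_documents and the DOCUMENT_EXTENSIONS set are hypothetical, chosen for illustration, and you would adjust them to whatever formats your own organization uses.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical set of extensions to look for; adjust to your own archive.
DOCUMENT_EXTENSIONS = {".pdf", ".xlsx", ".docx", ".md", ".txt"}


def inventory_documents(root, extensions=DOCUMENT_EXTENSIONS):
    """Group every matching file under `root` by its lowercase extension."""
    found = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in extensions:
            found[path.suffix.lower()].append(path)
    # Sort each group so the result is stable between runs.
    return {ext: sorted(paths) for ext, paths in found.items()}
```

Calling inventory_documents on a shared folder would give you a dictionary keyed by extension, which is a reasonable starting point for deciding which files can be read as plain text and which need converting first.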
Suppose you’ve already exposed GPT-3 to your entire digital archive. And then suppose you used the contents of that archive to train the AI to “understand” your organization better than anyone. How, then, could you get the answer to your question?
Simple. You sit down and type out a GPT-3 prompt. “How much did we pay for rent on our storage facility building over the past four years?” Or: “Who is the registered owner of our Amazon Web Services Organizations account?” Or: “Can you show me the immediately previous version of the source code for the data analytics app used by the web administration team?”
Training GPT-3 on private data
Here, in a very scaled-down way, is how that training might look with the GPT-3.5 API and Python.
I’ll first import a couple of libraries and the OpenAI access key (from a local file called key).
# Import Required Libraries
import openai
import re
# Reference your GPT-3 access key
openai.api_key_path = 'key'
I’ll then read a single article into a variable called text. As it happens, this code comes from an actual experiment I recently ran as a proof of concept. The article here was the Markdown version of one chapter from a book I’d written on digital security.
# Read in the Text File
with open("article1.md", "r") as file:
    text = file.read()
With our text document loaded, it’s time to feed it to GPT-3 and then prompt it with a question. I’ll go with the text-davinci-002 GPT engine. There’s obviously more than one, each with its own advantages. The question variable contains the question I’d like to ask, while the prompt argument contains both the question and the document itself. openai.Completion.create is the actual command that makes everything happen.
# Ask GPT-3 a Question
question = "Describe the primary topic of this text in 30 words or less."
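response = openai.Completion.create(  # legacy call from openai package versions before 1.0
    engine="text-davinci-002",
    prompt=f"{question} {text}",  # question followed by the full document
    max_tokens=1024,  # upper limit on the length of the reply
    temperature=0.5  # moderate randomness in the output
)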
print(response["choices"][0]["text"])
After a few minutes spent thinking about things, GPT-3 got back to me with this:
This text provides an overview of digital security threats and tools for protection.
This shows that the AI effectively understood my question and the document’s contents well enough to pull out that concise summary. Apply this process to all the documents in your archive, and you’ll have an effective CMS that might be significantly better than any commercial package on the market today.
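As a rough illustration of applying the process to more than one document, here is a minimal sketch that repeats the same question for every Markdown file in a folder. The names build_prompt and summarize_archive are hypothetical, and the ask parameter is an assumption of mine: it stands for any function that takes a prompt string and returns the model’s reply, so the loop can be tried out without spending API credits.

```python
from pathlib import Path


def build_prompt(question, text):
    """Combine the question and the document the same way as the example above."""
    return f"{question} {text}"


def summarize_archive(folder, question, ask, pattern="*.md"):
    """Send one prompt per matching file and collect the answers by file name.

    `ask` is any callable that takes a prompt string and returns the
    model's reply as a string.
    """
    answers = {}
    for path in sorted(Path(folder).glob(pattern)):
        text = path.read_text(encoding="utf-8")
        answers[path.name] = ask(build_prompt(question, text)).strip()
    return answers


# Example wiring, assuming the same pre-1.0 openai client used in this article:
# ask = lambda prompt: openai.Completion.create(
#     engine="text-davinci-002", prompt=prompt, max_tokens=1024, temperature=0.5
# )["choices"][0]["text"]
```

Passing the API call in as a function keeps the file handling separate from the model call, which makes it easier to swap engines later.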
Of course, I just showed you a very basic example. You’ll need to optimize your code for cost and efficiency, apply it to a much more extensive range of resources, and then expose your trained AI model across your networks. It would probably also be helpful to wait for improved versions of GPT-3 (or GPT-4) that are on the way. But the principle itself seems workable.
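One place where cost and efficiency come up quickly is document length: the prompt and the reply share the engine’s limited context window, and usage is billed by the token. A minimal sketch of one possible approach is to split long documents on paragraph boundaries before sending them. The four-characters-per-token figure is only a rough rule of thumb for English text, the 3000-token budget is a hypothetical value, and split_into_chunks is a name I made up for illustration; a real tokenizer would give more accurate counts.

```python
# Rough rule of thumb for English text; an assumption, not an exact tokenizer.
CHARS_PER_TOKEN = 4


def split_into_chunks(text, max_prompt_tokens=3000, chars_per_token=CHARS_PER_TOKEN):
    """Split text on paragraph boundaries into chunks under an estimated token budget."""
    max_chars = max_prompt_tokens * chars_per_token
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single oversized paragraph is cut into fixed-size slices.
        while len(paragraph) > max_chars:
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be sent with the same question, and the partial answers combined in a final prompt.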