I think it was around August 2022 that I first learnt about DALL·E, and my mind was totally blown.
Today, I'm going to show how I used OpenAI GPT-4 and OpenAI Assistants to build a bot that answers questions around DigitalOcean's documentation.
Throughout this blog post, the input from me really isn't anything special. I want to show how large language models such as GPT-4 enable human beings to achieve in hours things that would previously have taken months.
Ok, so let's ask ChatGPT for help with all of this. I know that OpenAI recently announced its Assistants API, so perhaps we can use this to build our bot.
Creating an assistant looks pretty straightforward. It takes a few steps, such as:
- Design a prompt for it
- Enable the Retrieval tool, which allows the assistant to reference files to find its answers, i.e. a CSV containing all of DigitalOcean's documentation.
So first let's scrape DigitalOcean's incredible documentation into a CSV. I asked ChatGPT to help write a Python script to achieve this.
I only wanted to scrape the title of each article and its main content. Each page has a sidebar, navigation, and last updated/created dates, and I wanted to exclude these from our CSV.
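To make the goal concrete, here is a minimal sketch of that kind of extraction using only the standard library. It is not the script ChatGPT produced (that one is in the linked conversation). The class name `ArticleExtractor`, the helper `extract_article`, and the set of skipped tags are all assumptions for illustration, and a real page may need different selectors.

```python
from html.parser import HTMLParser

# Assumption: these tags wrap the page chrome we want to leave out.
SKIPPED_TAGS = {"head", "nav", "aside", "header", "footer", "script", "style"}


class ArticleExtractor(HTMLParser):
    """Collects the first <h1> as the title and other visible text as content."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a skipped element
        self.in_title = False
        self.title_parts = []
        self.content_parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1
        elif tag == "h1" and not self.title_parts:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag == "h1":
            self.in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text or self.skip_depth > 0:
            return  # ignore whitespace and anything inside skipped elements
        if self.in_title:
            self.title_parts.append(text)
        else:
            self.content_parts.append(text)


def extract_article(html):
    """Return (title, content) for one page of HTML."""
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.title_parts), " ".join(parser.content_parts)
```

Each `(title, content)` pair can then be written out as one CSV row with `csv.writer`.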
You can see the full conversation here. I started with this prompt that I found online, and after some back and forth I ended up with a working Python script.
In addition to the above prompt, I asked ChatGPT the follow-ups below. ChatGPT responded with an updated, working script each time:
- for article_urls I want it to read a sitemap.xml and get the URLs from there
- thanks, I want the script to verbosely output each what it is doing to the console when I run it
- thanks that worked, in each article that it is scraping the content contains two things that I want removed from the final CSV
The things that I want removed are:
1. 'Validated on 3 Oct 2023'
2. 'Last edited on 10 Oct 2023'
Note that these are just examples and the dates will be different for every scrape
- thanks, add that to the overall script
Great! Our Python script worked first time and it is now scraping DigitalOcean's documentation into a CSV.
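For a rough idea of what two of those follow-up requests amount to, here is a small sketch: reading article URLs from a sitemap and stripping the "Validated on" / "Last edited on" stamps. The function names and the date pattern are my own assumptions, based on the two example stamps in the prompt, and the generated script may do this differently.

```python
import re
import xml.etree.ElementTree as ET

# Standard namespace used by sitemap.xml files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Assumption: stamps look like "Validated on 3 Oct 2023" or
# "Last edited on 10 Oct 2023", with the date varying per page.
DATE_STAMP = re.compile(r"(Validated|Last edited) on \d{1,2} [A-Z][a-z]{2,8} \d{4}")


def urls_from_sitemap(sitemap_xml):
    """Return every <loc> URL listed in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def strip_date_stamps(text):
    """Remove the date stamps and tidy up any whitespace they leave behind."""
    cleaned = DATE_STAMP.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()
```

Matching on the pattern rather than on the literal dates is what keeps this working when the dates change between scrapes.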
Ok, so let's create an OpenAI Assistant. This part was really quite easy:
- Browse to https://platform.openai.com/assistants and hit Create
- Give it a name
- Enter the prompt; for this I used GitHub Copilot's leaked prompt for inspiration
- Select gpt-4-1106-preview as our model
- Enable the Retrieval tool and upload our CSV
For some reason it did not like our data in CSV format, despite CSV being a supported format for Retrieval. Instead, it appeared to want JSON Lines. No worries! I'll just ask ChatGPT to convert our CSV into JSON Lines format.
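The conversion itself is small. As a hedged sketch (not the code ChatGPT gave me), it comes down to writing one JSON object per CSV row. The function name `csv_to_jsonl` and the `title` / `content` column names in the demo are assumptions for illustration.

```python
import csv
import io
import json


def csv_to_jsonl(csv_file, jsonl_file):
    """Write each CSV row as one JSON object per line; return the row count."""
    reader = csv.DictReader(csv_file)  # uses the header row for the keys
    count = 0
    for row in reader:
        jsonl_file.write(json.dumps(row, ensure_ascii=False) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    # In-memory demo; with real files, open the CSV with newline="" and
    # encoding="utf-8" and pass the file objects in the same way.
    source = io.StringIO('title,content\r\nDroplets,"Create, resize"\r\n')
    target = io.StringIO()
    csv_to_jsonl(source, target)
    print(target.getvalue(), end="")
```

Because every line is a complete JSON document, the output can be processed one record at a time without loading the whole file.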
Awesome! Now our Assistant is ready to take questions (yes, that fast)
I'm going to ask it a simple question without giving it much context. Let's see how it does in comparison to ChatGPT...
Great! On the left, our Assistant gives a really detailed, accurate answer and doesn't ask for additional context.
On the right is GPT-4, which gives a really generic answer (as expected). When I tell ChatGPT I am referring to DigitalOcean App Platform, it comes back with another generic answer: yes, there is a free tier, but it isn't sure of the details.
I wanted to check that our prompt was clear and that our assistant was following the rules. In the prompt I don't tell it not to answer questions about other clouds, just that it should stay on topic and not discuss opinions.