This repository contains a Jupyter notebook that loads data from a CSV file, computes an embedding for each entry using OpenAI's API, and recommends related entries by finding the closest matches (k-nearest neighbors, k-NN) among those embeddings.
To use this notebook, you'll need to install the following Python libraries:
- openai [Note: You'll need your API key. See Quickstart.]
- python-dotenv
- pandas
- numpy
- tenacity
- pickle (part of the Python standard library; no installation needed)
- tiktoken
- nomic [Note: Nomic requires an account. See Quickstart.]
You can install these using pip:
pip install openai python-dotenv pandas numpy tenacity tiktoken nomic

You'll need to set the following environment variable:
OPENAI_API_KEY: Your OpenAI API key.
You can do this by creating a .env file in the root directory of this project and adding the following line:
OPENAI_API_KEY=your_openai_api_key

Replace your_openai_api_key with your actual OpenAI API key.
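One way to read the key inside the notebook is sketched below. It assumes python-dotenv's `load_dotenv()` is what picks up the .env file, and the helper name `get_openai_api_key` is hypothetical rather than taken from the notebook.

```python
import os

# python-dotenv is listed as a dependency above; fall back gracefully if it
# is not installed so the key can still come from the shell environment.
try:
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None


def get_openai_api_key() -> str:
    """Hypothetical helper: return OPENAI_API_KEY, loading a .env file first if one is found."""
    if load_dotenv is not None:
        # load_dotenv() does not overwrite variables already set in the environment.
        load_dotenv()
    key = os.getenv("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; add it to .env or export it in your shell."
        )
    return key
```

Failing early with a clear message is easier to debug than an authentication error from the first API call. Recent (1.x) versions of the openai package also read OPENAI_API_KEY from the environment automatically once it has been loaded.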
The data is expected in a CSV file named Problem_Intake_CurrentVers_TEST.csv in the source_data directory.
The CSV file should have the following columns:
- date: The date of the data entry.
- need: The need statement.
- contact: The contact information.
- dept: The department associated with the need statement.
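A minimal pandas sketch of loading this file and checking the columns listed above follows. The function name `load_need_statements` is hypothetical, and dropping rows with an empty need statement is an assumption (such rows have nothing to embed), not documented notebook behavior.

```python
from pathlib import Path

import pandas as pd

# Column names as described in this README.
EXPECTED_COLUMNS = ["date", "need", "contact", "dept"]
DEFAULT_PATH = Path("source_data") / "Problem_Intake_CurrentVers_TEST.csv"


def load_need_statements(path=DEFAULT_PATH) -> pd.DataFrame:
    """Hypothetical loader: read the intake CSV (path or file-like) and validate its columns."""
    df = pd.read_csv(path)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"CSV is missing expected columns: {missing}")
    # Assumption: skip rows whose need statement is empty, since they cannot be embedded.
    return df.dropna(subset=["need"]).reset_index(drop=True)
```

Resetting the index keeps row positions aligned with any embedding array built from the `need` column later.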
To use the notebook, simply open it in your Jupyter notebook environment and run the cells sequentially. The notebook will:
- Fetch data from the CSV file.
- Compute the embeddings for each need statement using the OpenAI API.
- Provide recommendations based on the closest matches found through these embeddings.
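The notebook's exact neighbor search isn't reproduced here. The following is a minimal sketch of k-nearest-neighbor lookup by cosine similarity, assuming the embeddings have been stacked into a NumPy array in the same row order as the data. The function name `recommend` is hypothetical.

```python
from typing import List

import numpy as np


def recommend(query_index: int, embeddings: np.ndarray, k: int = 3) -> List[int]:
    """Hypothetical helper: indices of the k rows most similar to row `query_index`.

    `embeddings` is an (n_rows, dim) array with one embedding per need statement.
    """
    # Never return more neighbors than there are other rows.
    k = min(k, len(embeddings) - 1)
    # Normalize each row so that a dot product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sims = unit @ unit[query_index]
    # Exclude the query itself from its own recommendations.
    sims[query_index] = -np.inf
    # argsort sorts ascending, so reverse it to put the most similar rows first.
    order = np.argsort(sims)[::-1]
    return order[:k].tolist()
```

For example, with rows `[1, 0]`, `[0.9, 0.1]`, `[0, 1]` and `[-1, 0]`, `recommend(0, emb, k=2)` returns `[1, 2]`. A brute-force scan like this is adequate for a modest number of need statements. A dedicated nearest-neighbor index only becomes worthwhile at much larger scale.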
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update the tests as appropriate.