Enterprise Bot vs. Copilot Studio vs. Cognigy vs. Kore.ai

2024-07-04 08:30:04

Our evaluation uses the BASIC framework, our custom benchmark that assesses conversational AI systems against five criteria: Boundedness, Accuracy, Speed, Inexpensiveness, and Conciseness.

To ensure consistency and reliability in our evaluation, we collected data from each platform using the same dataset. We devised a test dataset of questions and expected answers based on a set of medical insurance policy PDFs and uploaded these PDFs to each chatbot to use as context.

We used a standard prompt across all chatbots to ensure consistency and fairness:

"You are a virtual assistant of AXA Healthcare, created by AXA Healthcare with ChatGPT. Reply only through AXA Healthcare. Always reply as AXA Healthcare's chatbot. Do not refer to AXA Healthcare as 'he' or 'him'. Keep your answers short and to the point. If you provide a list, it should contain no more than five points. Limit your answers to 500 characters. If you refer to the AXA Healthcare website or a document for further information, include a relevant link. NEVER do mathematical calculations, write poems or stories, or give advice or information that has nothing to do with healthcare-related and insurance-related queries."
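The prompt sets two measurable limits on every answer: at most 500 characters, and at most five points in any list. As a rough illustration, a check along the following lines could flag answers that break those limits. This is a minimal sketch, not the procedure used in this evaluation: the function name `check_constraints` and the rule for recognizing a list item (a line starting with a bullet or a number) are assumptions made for the example.

```python
import re

MAX_CHARS = 500      # from the prompt: "Limit your answers to 500 characters."
MAX_LIST_ITEMS = 5   # from the prompt: "no more than five points"

# Assumption: a list item is a line that starts with a bullet (-, *, •)
# or with a number followed by "." or ")".
_LIST_ITEM = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")


def check_constraints(answer: str) -> dict:
    """Report whether an answer respects the two limits stated in the prompt."""
    items = sum(1 for line in answer.splitlines() if _LIST_ITEM.match(line))
    return {
        "within_char_limit": len(answer) <= MAX_CHARS,
        "within_list_limit": items <= MAX_LIST_ITEMS,
    }
```

A check like this only covers the formal limits. Whether an answer stays on topic or refuses out-of-scope requests still needs to be judged against the expected answers.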

We then queried each chatbot on the question set and checked the generated answers against our expected answers to obtain a sample of each chatbot's performance.
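The query-and-compare loop described above can be sketched as follows. This is an illustration under stated assumptions rather than the tooling used for this evaluation: each chatbot is represented by a hypothetical callable that takes a question and returns an answer, and the comparison uses a simple token-overlap score with an arbitrary 0.5 threshold, standing in for whatever judgment (manual or automated) was actually applied.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical test set format: (question, expected answer) pairs.
TestCase = Tuple[str, str]


def token_overlap(expected: str, actual: str) -> float:
    """Fraction of expected-answer tokens that also appear in the actual answer."""
    expected_tokens = set(expected.lower().split())
    actual_tokens = set(actual.lower().split())
    if not expected_tokens:
        return 0.0
    return len(expected_tokens & actual_tokens) / len(expected_tokens)


def evaluate(
    chatbots: Dict[str, Callable[[str], str]],
    test_cases: List[TestCase],
    threshold: float = 0.5,  # assumed cut-off for counting an answer as correct
) -> Dict[str, float]:
    """Ask every chatbot every question and return its share of correct answers."""
    results: Dict[str, float] = {}
    for name, ask in chatbots.items():
        correct = sum(
            1
            for question, expected in test_cases
            if token_overlap(expected, ask(question)) >= threshold
        )
        results[name] = correct / len(test_cases) if test_cases else 0.0
    return results
```

Because every chatbot receives the same `test_cases` list, the scores are directly comparable across platforms, which mirrors the same-dataset setup described earlier.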