Going Beyond The "If-Else" Chatbots

Research And Development Department

Introduction:
Ever since Epicalable was formed, we have aimed to create a chatbot embedded in a GUI to help users move away from the command-line interface. Epicalable also had a long-term goal of storing the dataset in a separate file that any user could edit without prior coding knowledge, making it easy to expand the chatbot's responses and vocabulary.

Most of the developers working on this project started from a loose concept: an algorithm that runs through a file in search of keywords given by the user. Epicalable's Admin Department gave the green light to create this algorithm.
While working on the algorithm, we ran into some unforeseen problems, such as:
1. What type of file should we store our dataset in?
2. How are we going to extract the data from the file?
3. Could we show different outputs for the same input given in the chatbot?

The Dataset:
Once the task was assigned, our researchers first scoured the internet for the file type that could best store our dataset. We came across many formats, such as XML, CSV and XAML, but found that JSON was easy to read and write, supports strings, numbers and booleans, and is data-oriented. We therefore adopted JSON; an added bonus was that Python has a built-in json module to help extract the data.

We then created a JSON file and filled it with our dataset of specific keywords. The design concept was that once the user enters some text, JARVIS searches it for matching keywords and shows a random appropriate output.
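
A minimal sketch of how such a dataset might be laid out, assuming a top-level 'intents' list whose entries carry the 'tags' and 'response' keys mentioned elsewhere in this document. The sample tags and responses are invented for illustration and are not taken from the real file.

```python
import json

# Hypothetical sample data; the real dataset's contents are not shown here.
sample_dataset = {
    "intents": [
        {
            "tags": ["HI", "HELLO"],
            "response": ["Hello!", "Hi there!"],
        },
        {
            "tags": ["HOW ARE YOU"],
            "response": ["I am fine.", "I am fine, how are you?"],
        },
    ]
}

# Serialize to the text that would be saved in the JSON file.
json_text = json.dumps(sample_dataset, indent=4)
print(json_text)

# Reading it back gives ordinary Python dictionaries and lists.
loaded = json.loads(json_text)
```

Because the file is plain text, adding a new keyword or reply only means adding a string to the relevant list; no Python code has to change.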

JSON's Advantages:
Storing the dataset in a JSON file allows developers to add to it by editing the JSON file rather than the original code file.
1. Separation of Concerns: JSON files are focused purely on data representation, while Python files can contain executable code and logic. By keeping data separate from code, you make it easier to manage, update, and maintain the data independently of the logic.
2. Decoupling Data from Code: Storing data in JSON files allows you to change the data without modifying the Python code. This decoupling is useful for scenarios where you need to update configuration settings, user data, or other types of information without altering the codebase.
3. Interoperability: JSON (JavaScript Object Notation) is a widely accepted data format that can be used across different programming languages and platforms. This makes it easier to share data between systems written in different languages or to integrate with external systems and APIs.
4. Readability and Simplicity: JSON files are easy to read and understand. The format is text-based and less complex than Python's syntax, which can be beneficial for data configuration files, especially for non-programmers.
5. Data Serialization: JSON is a lightweight format for data serialization, making it easy to store and transmit structured data. It's particularly useful for web applications where data is often exchanged between a server and client in JSON format.
6. Standardized Format: JSON is a standardized format with well-defined rules for data structures like objects and arrays. This consistency helps avoid issues related to custom serialization and deserialization code.
7. Version Control: JSON files, being plain text, work well with version control systems like Git. This makes it easy to track changes, revert to previous versions, and collaborate with others.
8. Portability: JSON files are portable and can be easily shared or transferred between systems or applications without requiring specific Python code or execution environments.

In contrast, storing data directly in Python files might be more convenient for small-scale projects or scripts where the data and code are tightly coupled. However, for larger projects, data interchange, or configuration management, JSON provides a more flexible and standardized approach.

jarintents: The file where the dataset is stored

Searching Algorithm:
We then ran into a new problem: retrieving data from a JSON file is somewhat harder than accessing a dictionary. We would need to open the JSON file, take the user's input, run it against the entire file looking for matching keywords under 'tags', and then display a random output from the list of available outputs under 'response'.

Our researchers at the time were comfortable with loops, so we decided to use them, along with the random library to help pick an output. We experimented with a 'for' loop to go through the entire file and an 'if' statement to check for matching keywords. Once a matching keyword is found, we use the 'choice' function (imported with 'from random import choice') to randomly pick an output under 'response'.
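
The loop described above can be sketched roughly as follows. This is an illustrative reconstruction, not the actual Render-word source: the function name find_response, the sample data and the choice to return None when nothing matches are all assumptions.

```python
import json
from random import choice


def find_response(query, intents):
    """Return a random response for the first intent with a tag found in query."""
    for intent in intents:                      # the 'for' loop over the dataset
        for tag in intent["tags"]:
            if tag in query:                    # the 'if' check for a keyword
                return choice(intent["response"])
    return None                                 # no keyword matched


# Hypothetical stand-in for the contents of the JSON file.
json_text = """
{"intents": [
    {"tags": ["HELLO"], "response": ["Hello!", "Hi there!"]},
    {"tags": ["BYE"], "response": ["Goodbye!"]}
]}
"""
intents = json.loads(json_text)["intents"]

print(find_response("HELLO JARVIS", intents))
print(find_response("BYE", intents))
```

With a real file on disk, the data would instead be read with json.load on an opened file object; the search itself stays the same.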

Sometimes the algorithm might pick the wrong response simply because it has found a tag inside a longer word. For example, when the user types the word 'dolphin', which contains the letters 'hi', the algorithm shows the response for 'hi' rather than for 'dolphin'. We worked around this by adding a space after each tag: if a user enters 'hi', the input is treated as 'hi ' and matches the tag, whereas in 'dolphin' the letters 'hi' are followed by 'n' rather than a space, so the appropriate response is given.
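
A small sketch of the difference between the plain check and the trailing-space check. The helper names are hypothetical, and appending a space to the query as well as to the tag is an assumption based on the description above.

```python
def naive_match(tag, query):
    """Plain substring check: also matches tags hidden inside longer words."""
    return tag in query


def spaced_match(tag, query):
    """Substring check with a space appended to both the tag and the query."""
    return (tag + " ") in (query + " ")


print(naive_match("HI", "DOLPHIN"))     # True
print(spaced_match("HI", "DOLPHIN"))    # False
print(spaced_match("HI", "HI"))         # True
print(spaced_match("HI", "HI JARVIS"))  # True
```

Note that this check only guards the end of the tag. A word that ends in the tag, such as 'SUSHI', would still match 'HI '.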

We also decided to convert each word the user inputs to uppercase so that it matches our uppercase JSON tags. In subsequent versions, once JarAudit was introduced as a way to audit the chatbot's history, it was seamlessly integrated into Render-word. Render-word therefore sends the input, bound with the current time, to be stored in the JarAudit file.

Current iteration of the Render-Word Engine

Render-Word Inner Workings:
A user enters an input in the input bar. It is stored in the variable 'que', then converted to uppercase (this standardises the letters for checking against the intents) and stored in 'query'.
The 'query' is then written to the Audit file along with the time of input. The input 'que' is then displayed on the screen, capitalized, and Render-word uses 'query' while looping through "Jarintents.json".
The 'tags' in "Jarintents.json" are matched against the 'query'; if a word in 'tags' matches, a random response from 'response' under the matched word is displayed with the help of the 'choice' function from Python's random module.
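
Put together, the flow above might look something like the sketch below. Only the names 'que' and 'query' come from the description; the function name render_word, the audit line format and the in-memory audit file are assumptions made so the example can run on its own.

```python
import io
from datetime import datetime
from random import choice


def render_word(que, intents, audit_file):
    """Log the input, then return a random response for the first matching tag."""
    query = que.upper()                           # standardise the letters
    timestamp = datetime.now().isoformat(timespec="seconds")
    audit_file.write(f"{timestamp} {query}\n")    # audit line format is assumed
    for intent in intents:
        if any(tag in query for tag in intent["tags"]):
            return choice(intent["response"])
    return None                                   # nothing matched


# Hypothetical data and an in-memory stand-in for the audit file.
intents = [{"tags": ["HELLO"], "response": ["Hello!", "Hi there!"]}]
audit = io.StringIO()

reply = render_word("hello jarvis", intents, audit)
print(reply)
print(audit.getvalue())
```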

Calculating the time and space complexity:
The time complexity of this code is O(n), where n is the number of intents in the Jarintents.json file (the elements of its 'intents' list). This is because the code contains a single loop that iterates through each intent and checks whether the intent's tags are present in the query input. This treats each tag check as constant-time work, which holds when the number of tags per intent and the length of the query are small and bounded.

The space complexity of this code is O(1) because the amount of additional space used does not depend on the size of the input.
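
The linear growth can be illustrated by counting tag checks on datasets of different sizes. This is a sketch with generated, hypothetical intents (one tag each) and a query that matches none of them, which is the worst case for the scan.

```python
def count_tag_checks(query, intents):
    """Count tag comparisons made before a match is found or the scan ends."""
    checks = 0
    for intent in intents:
        for tag in intent["tags"]:
            checks += 1
            if tag in query:
                return checks
    return checks


def make_intents(n):
    """Build n hypothetical intents with one tag each."""
    return [{"tags": [f"TAG{i}"], "response": ["..."]} for i in range(n)]


# The number of checks grows in step with the number of intents.
for n in (10, 100, 1000):
    print(n, count_tag_checks("NO MATCH HERE", make_intents(n)))
```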

The Big O chart. Big O is an asymptotic notation used to express the complexity of an algorithm, or its performance, as a function of input size.

Different conversations:
The 'choice' function allows Render-word to choose a random output and display it to the user. This allows different conversations to happen even when the input commands are the same.
The image below shows two conversations: in one, the User asks JARVIS 'How are you' and the conversation is one-sided; in the other, both the User and JARVIS ask each other 'How are you'.

Different conversations from the same input command
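
The effect of random selection described in this section can be sketched as follows. The two responses are hypothetical examples modelled on the 'How are you' conversations, not the actual dataset entries.

```python
from random import choice

# Hypothetical responses for a single intent.
responses = ["I am fine.", "I am fine, how are you?"]

# The same input command asked five times can yield different replies.
replies = [choice(responses) for _ in range(5)]
for reply in replies:
    print(reply)
```

Since each pick is independent, the same reply can also appear several times in a row; adding more entries under 'response' makes repeats less likely.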

Last Revised: 9-Aug-2024