Introduction:
Ever since Epicalable was formed, we have aimed to create a chatbot embedded in a GUI to help users move away from the command-line
interface. Epicalable also had a long-term goal of creating a dataset stored in a separate file which could be easily edited
by any user with no prior coding knowledge, to help expand the chatbot's responses and vocabulary.
Most of the developers working on this project came up with a rough concept: an algorithm that runs through a
file in search of keywords given by the user. Epicalable's Admin Department gave the green light to create this algorithm.
While working on the algorithm, we ran into some unforeseen problems, such as:
1. What type of file should we store our dataset in?
2. How are we going to extract the data from the file?
3. Could we show different outputs for the same input given in the chatbot?
The Dataset:
Once the task was on our table, our researchers first scoured the internet to find which type of file
could best store our dataset. We came across many formats such as XML, CSV and XAML, but JSON stood out: it is easy to read
and write, supports strings, numbers and booleans, and is data-oriented. So we decided to adopt JSON. An added bonus was that Python has a built-in
json library to help extract the data.
We then created a JSON file and filled it with our dataset, organised around specific keywords. The concept was that once the user enters some
text, JARVIS searches the text for matching keywords and shows a random appropriate output.
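To make the idea concrete, here is a minimal sketch of how such a dataset might be laid out and loaded with Python's built-in
json module. The 'intents', 'tags' and 'response' keys follow the names used later in this document; the sample entries and the
helper name load_intents are assumptions for illustration, not the actual contents of our file.

```python
import json

# Hypothetical sample of the dataset layout; the real file's entries are not shown here.
SAMPLE_JSON = """
{
  "intents": [
    {"tags": ["HI", "HELLO"], "response": ["Hello!", "Hi there!"]},
    {"tags": ["HOW ARE YOU"], "response": ["I am fine.", "Doing well, how are you?"]}
  ]
}
"""

def load_intents(text):
    """Parse JSON text and return the list stored under 'intents'."""
    return json.loads(text)["intents"]

intents = load_intents(SAMPLE_JSON)
print(len(intents))        # number of intents in the sample
print(intents[0]["tags"])  # keywords of the first intent
```

In a project the text would normally come from a file on disk, for example by passing an open file handle to json.load instead
of a string to json.loads. A non-programmer can then add a new entry by editing the file alone.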
JSON's Advantages:
Storing the dataset in a JSON file allows developers to extend the dataset by editing the JSON file rather than the
original code file.
1. Separation of Concerns: JSON files are focused purely on data representation, while Python files can contain
executable code and logic. By keeping data separate from code, you make it easier to manage, update, and maintain
the data independently of the logic.
2. Decoupling Data from Code: Storing data in JSON files allows you to change the data without modifying the Python
code. This decoupling is useful for scenarios where you need to update configuration settings, user data, or other
types of information without altering the codebase.
3. Interoperability: JSON (JavaScript Object Notation) is a widely accepted data format that can be used across
different programming languages and platforms. This makes it easier to share data between systems written in
different languages or to integrate with external systems and APIs.
4. Readability and Simplicity: JSON files are easy to read and understand. The format is text-based and less complex
than Python's syntax, which can be beneficial for data configuration files, especially for non-programmers.
5. Data Serialization: JSON is a lightweight format for data serialization, making it easy to store and transmit
structured data. It's particularly useful for web applications where data is often exchanged between a server and
client in JSON format.
6. Standardized Format: JSON is a standardized format with well-defined rules for data structures like objects and
arrays. This consistency helps avoid issues related to custom serialization and deserialization code.
7. Version Control: JSON files, being plain text, work well with version control systems like Git. This makes it
easy to track changes, revert to previous versions, and collaborate with others.
8. Portability: JSON files are portable and can be easily shared or transferred between systems or applications
without requiring specific Python code or execution environments.
In contrast, storing data directly in Python files might be more convenient for small-scale projects or scripts where
the data and code are tightly coupled. However, for larger projects, data interchange, or configuration management, JSON
provides a more flexible and standardized approach.
Searching Algorithm:
We then ran into a new problem: retrieving data from a JSON file is somewhat harder than accessing a dictionary. We would need to open the JSON file,
take the user's input, run it through the entire file looking for matching keywords under 'tags', and then display a random output from the list of
available outputs under 'response'.
Our researchers at that time were quite used to looping, so we decided to adopt that approach and to use the random library to help pick an output.
We experimented with a 'for' loop to go through the entire file and an 'if' statement to check for matching keywords. Once a matching keyword is
found, we use the 'choice' function (imported with 'from random import choice') to randomly pick an output under 'response'.
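The loop described above can be sketched roughly as follows. This is an illustration under assumptions rather than the exact
Render-word code: the function name find_response and the sample intents are hypothetical, and only the 'tags' and 'response'
keys come from our dataset.

```python
from random import choice

# Hypothetical sample data; only the 'tags' and 'response' keys come from the document.
SAMPLE_INTENTS = [
    {"tags": ["HELLO", "HEY"], "response": ["Hello!", "Hey there!"]},
    {"tags": ["BYE"], "response": ["Goodbye!", "See you later!"]},
]

def find_response(query, intents):
    """Return a random response for the first intent with a tag found in query, else None."""
    for intent in intents:              # the 'for' loop over every intent
        for tag in intent["tags"]:
            if tag in query:            # the 'if' check: a plain substring match
                return choice(intent["response"])
    return None                         # no tag matched

print(find_response("HELLO JARVIS", SAMPLE_INTENTS))     # one of the two greetings
print(find_response("WHAT TIME IS IT", SAMPLE_INTENTS))  # None, since no tag is present
```

Returning None when nothing matches leaves it to the caller to decide what to show for unknown input.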
Sometimes the algorithm picks the wrong response simply because it finds a tag inside a longer word. For example,
when the user types the word 'dolphin', which contains the sub-word 'hi', the algorithm shows the response for 'hi'
rather than for 'dolphin'. We worked around this by adding a space after each tag: if a user enters 'hi', the input
is treated as 'hi ' and matches the tag, whereas in 'dolphin' the 'hi' is followed by the letter 'n' rather than a
space, so the tag does not match and the appropriate response is given.
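A small sketch of this workaround, under the assumption that both the tag and the input are padded with a trailing space
before they are compared (the helper name matches_with_space is hypothetical):

```python
def matches_with_space(tag, query):
    """Look for the tag followed by a space; the query is padded so a final word still matches."""
    return (tag + " ") in (query + " ")

print(matches_with_space("HI", "HI"))         # True: 'HI ' is found in 'HI '
print(matches_with_space("HI", "DOLPHIN"))    # False: 'HI' is followed by 'N', not a space
print(matches_with_space("HI", "HI JARVIS"))  # True
```

Note that this only checks the character after the tag, so a word that ends with the tag (for instance 'DELHI') would
still match 'HI '. Comparing the tag against whole words, for example those returned by query.split(), is one possible way
to tighten this further.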
We also decided to convert everything the user inputs to uppercase so that it matches our uppercase JSON tags.
In subsequent versions, once JarAudit was introduced as a way to audit the chatbot's history, it was seamlessly
integrated into Render-word: Render-word sends each input, together with the current time, to be stored in the
JarAudit file.
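As an illustration of this audit step, the sketch below appends each input with a timestamp to a text file. The line
format and the function names are assumptions; the actual JarAudit file may be organised differently.

```python
from datetime import datetime

def format_audit_entry(query, when):
    """Build one audit line: timestamp, a separator, then the user's input."""
    return f"{when.strftime('%Y-%m-%d %H:%M:%S')} | {query}"

def append_audit_entry(path, query, when=None):
    """Append the entry to the audit file at path, using the current time by default."""
    if when is None:
        when = datetime.now()
    with open(path, "a", encoding="utf-8") as audit_file:
        audit_file.write(format_audit_entry(query, when) + "\n")

print(format_audit_entry("HOW ARE YOU", datetime(2024, 1, 2, 3, 4, 5)))
# prints: 2024-01-02 03:04:05 | HOW ARE YOU
```

Opening the file in append mode ("a") keeps earlier entries intact, so the file grows into a running history of inputs.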
Render-Word Inner Workings:
A user enters an input in the input bar, which is stored in the variable 'que'. It is then converted to uppercase (this
standardises the letters for checking against the intents) and stored in 'query'.
The 'query' is then written to the audit file along with the time of input. The input 'que', which is capitalized, is then displayed
on the screen, and the 'query' is used by Render-word as it loops through "Jarindents.json".
The 'tags' in "Jarindents.json" are then matched against the 'query'; if a word in 'tags' matches, a random
response from 'response' under the matched word is displayed with the help of the 'choice' function from Python's random module.
Calculating the time and space complexity:
The time complexity of this code is O(n), where n is the number of elements in the 'intents' list of the Jarintents.Json file. This is
because the code contains a single loop that iterates through each intent in the list and checks whether the tags of that intent are
present in the query input. This estimate treats the number of tags per intent and the length of the query as small constants.
The space complexity of this code is O(1), because the amount of additional space used while searching does not depend on the size of the
input.
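To illustrate the linear growth, the sketch below counts how many intents are inspected for a query that matches nothing
(the worst case). The helpers count_intents_checked and make_intents, and the generated intents, are hypothetical and exist
only for this demonstration.

```python
def count_intents_checked(query, intents):
    """Return how many intents the loop inspects before it stops (all of them if nothing matches)."""
    checked = 0
    for intent in intents:
        checked += 1
        if any(tag in query for tag in intent["tags"]):
            break                      # a match ends the search early
    return checked

def make_intents(n):
    """Generate n dummy intents with unique tags such as 'TAG0', 'TAG1', and so on."""
    return [{"tags": [f"TAG{i}"], "response": ["..."]} for i in range(n)]

for n in (10, 20, 40):
    print(n, count_intents_checked("NO MATCH HERE", make_intents(n)))
# prints three lines: "10 10", "20 20", "40 40"
```

Doubling the number of intents doubles the number of checks, which is the behaviour O(n) describes.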
Different conversations:
The 'choice' function allows Render-word to choose a random output and display it to the user. This allows different conversations to happen
even if the input commands are the same.
The image below shows two conversations: in one, the User asks JARVIS 'How are you' and the conversation is one-sided; in the other,
both the User and JARVIS ask each other 'How are you'.