Text mining, also sometimes referred to as text analytics or text data mining,
refers to the process of deriving high-quality information from text by
identifying patterns and trends. It involves structuring the input text within
a database, deriving patterns within the structured data, and finally
evaluating and interpreting the output (Aggarwal, 2012).

In a research article by Ghosh et al. (2012), the applicability of text mining
algorithms in analyzing text data was assessed. According to Ghosh, 80% of
information is stored as text rather than as structured data. Manual,
labor-intensive text mining approaches first surfaced in the 1980s; these,
however, were very time-consuming and were rarely updated. With technological
advances and the growing complexity of data in terms of size and type, text
mining procedures have evolved into more automated, accurate and efficient
processes. Ghosh further mentioned various techniques, such as information
retrieval, natural language processing, information extraction and data
mining, which can be used to automate text mining. In the article, the
different text mining algorithms, which include classification, association
and clustering algorithms, are reviewed with their merits and demerits based on

Ghosh proposed that merging the logic of these different algorithms would
generate a more powerful algorithm that could classify a data set into
predefined classes, establish relationships between the classified data, and
finally cluster the data based on the associations between them before
grouping it. The article provides a better understanding of the various
algorithms and their implementations; however, Ghosh based all this
information on theoretical work, so the proposed algorithm was not evaluated
for sensitivity, specificity or precision.
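
To make the missing evaluation concrete, the three measures named above can be
computed from the counts of a confusion matrix. The sketch below is only an
illustration: the function name and the counts are hypothetical and do not
come from Ghosh's article.

```python
def evaluation_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity and precision from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives. A ratio with an empty denominator is reported as 0.0.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # share of real positives found
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # share of real negatives found
    precision = tp / (tp + fp) if (tp + fp) else 0.0    # share of predicted positives that are right
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
    }


# Hypothetical counts, for illustration only.
metrics = evaluation_metrics(tp=40, fp=10, tn=45, fn=5)
print(metrics)
```

Reporting all three together matters because a classifier can score well on
one measure while doing poorly on another, for example by labelling almost
every document as positive.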


A study by Raja et al. (2008) set out to evaluate the potential of text mining
in healthcare settings. The study was conducted at the University of Alabama
at Birmingham (UAB) and the University of Alabama (UA). These had the largest
information systems in the nation, with over 13,279 users and data for 661,533
patients, and contained vast amounts of unstructured data that had never been
used. According to Raja, health information systems collect large amounts of
textual and numeric information about patient visits, prescriptions and
physician notes that could lead to improvements in health care quality,
promotion of clinical and research initiatives, reduction in medical errors,
and reduction in healthcare costs. However, this information is not available
for analysis due to its unstructured nature and hence cannot be utilized. In
the study, predictive models were developed using unsupervised machine
learning algorithms and used to text mine electronic clinical records.


The study scope was restricted to pathology reports and discharge summaries,
as these contained moderately specialized vocabularies. Unsupervised analysis
was used, so no major manipulations were done to the data to identify
interesting trends and patterns. The results of the study were promising,
indicating that text mining can be an effective tool for healthcare datasets.
The clustering algorithm made it possible to group patients with similar
conditions and to predict diseases for particular patients based on the
various clusters. This study, however, was based on datasets in which standard
nomenclatures such as RxNorm and the Unified Medical Language System (UMLS)
were used. In settings such as Uganda, where such standards are rarely used in
collecting or capturing healthcare data, standard codes to represent collected
data are important if data is to be meaningful and properly utilized.
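
The idea of grouping records with similar content can be sketched as follows.
This is not the algorithm used by Raja et al., which is not described here; it
is a toy single-pass clustering over bag-of-words vectors, and the notes, the
function names and the similarity threshold are all invented for illustration.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn a note into a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_notes(notes, threshold=0.3):
    """Greedy single-pass clustering.

    Each note joins the first cluster whose seed note is similar enough;
    otherwise it starts a new cluster. The threshold is an assumed value.
    """
    clusters = []  # list of (seed_vector, [indices of member notes])
    for i, note in enumerate(notes):
        vec = vectorize(note)
        for seed, members in clusters:
            if cosine(vec, seed) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]


# Invented example notes, not real clinical data.
notes = [
    "patient reports chest pain and shortness of breath",
    "biopsy shows malignant tumour in left breast",
    "chest pain radiating to left arm with shortness of breath",
    "malignant tumour confirmed by biopsy of breast tissue",
]
groups = cluster_notes(notes)
print(groups)  # indices of notes grouped by shared vocabulary
```

Because the grouping relies purely on shared words, it also illustrates why
standard nomenclatures matter: two notes describing the same condition with
different local terms would not end up in the same cluster.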


A research article by Jusoh and Alfawareh (2012) shows that different
techniques can be applied to perform a text mining task. The authors analyse
the fundamental methods for text mining, which include natural language
processing (NLP) and information extraction (IE) techniques. NLP techniques
are concerned with natural language understanding, which is important in
situations where the tool is expected to discover knowledge. One common NLP
application is machine translation.

According to the authors, the second method of text mining is information
extraction (IE), which involves extracting useful information from texts with
the goal of finding specific data or information in natural language text. IE
deals with the extraction of specified entities, events and relationships from
unrestricted text sources. It can be described as the creation of a structured
representation of selected information drawn from texts. In IE, natural
language texts are mapped to predefined structured representations, or
templates, which, when filled, represent an extract of key information from
the original text. The input can be unstructured documents, such as free text
written in natural language, or semi-structured documents. Once extracted,
this information can be stored in databases for querying and data mining.
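
Template filling of this kind can be illustrated with a few regular
expressions. The sketch below is a minimal example under stated assumptions:
the template slots, the patterns and the sample sentence are hypothetical and
are not taken from Jusoh and Alfawareh.

```python
import re

# Hypothetical template: each slot is paired with a pattern that captures its value.
TEMPLATE_PATTERNS = {
    "drug": re.compile(r"prescribed\s+([A-Za-z]+)"),
    "dose_mg": re.compile(r"(\d+)\s*mg"),
    "frequency": re.compile(r"(once|twice|three times)\s+daily"),
}


def fill_template(text):
    """Map free text to a structured record; slots with no match stay None."""
    record = {}
    for slot, pattern in TEMPLATE_PATTERNS.items():
        match = pattern.search(text)
        record[slot] = match.group(1) if match else None
    return record


# Invented sentence, for illustration only.
record = fill_template("The patient was prescribed amoxicillin 500 mg twice daily.")
print(record)
```

The filled record is a flat dictionary, so it could be written directly to a
database row for later querying.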


In another research article (Sheng, 2016), the author proposed a novel text
mining algorithm based on an artificial neural network as a way to explore
richer knowledge mining methods, such as semantic rule discovery, trend
analysis and topic tracking, rather than the basic classification algorithms
used in text mining. According to the author, artificial neural networks are
capable of performing richer analyses than other algorithms.


In another research article (Siddarth, 2016), a supervised learning technique
was developed to automatically detect EEG reports that described seizures and
epileptiform discharges. A total of 3,277 documents were manually labeled as
describing one or more seizures versus no seizures, and as describing
epileptiform discharges versus no epileptiform discharges. A Naïve Bayes
algorithm was used to enable the system to automatically classify EEG reports
into these categories. The system consisted of normalization techniques,
extraction of key sentences, and automated feature selection.

Based on the results, the system was able to effectively and efficiently
automate the classification of free-text EEG reports. A limitation of the
study, however, was the use of the Naïve Bayes method, which is more effective
in classifying text with only two categories; various studies have indicated
that Naïve Bayes performs poorly in classifying text with more than two
categories.
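
A two-category Naïve Bayes text classifier of the kind described can be
sketched in a few lines. This is a minimal multinomial version with add-one
smoothing, not the system from the study: the normalization, key-sentence
extraction and feature selection steps are omitted, and the class name, the
labels and the training sentences are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training reports, for illustration only.
train_texts = [
    "electrographic seizure recorded during sleep",
    "two seizures captured with rhythmic discharges",
    "normal awake and asleep recording",
    "no seizures or epileptiform activity seen",
]
train_labels = ["seizure", "seizure", "no_seizure", "no_seizure"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(model.predict("a seizure was recorded"))
print(model.predict("normal recording while awake"))
```

Note that a bag-of-words model like this one ignores negation ("no seizures"),
which is one reason a real system would need the additional preprocessing
steps mentioned above.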





Worldwide efforts to combat HIV/AIDS have yielded positive results. According
to the Centers for Disease Control and Prevention (CDC) report (2016), global
efforts to fight HIV/AIDS resulted in 19.5 million people receiving
antiretroviral therapy (ART) for HIV infection in 2016. Worldwide,
AIDS-related deaths have fallen by 45% since the peak in 2005. In 2015, 1.1
million people died from AIDS-related causes worldwide, compared to two
million in 2005, according to the Joint United Nations Programme on HIV/AIDS
(UNAIDS). Despite this progress, HIV/AIDS continues to be a significant global
public health concern. In 2016, an estimated 36.7 million people worldwide
were living with the disease, about 1 million people died from AIDS-related
illnesses, and 1.8 million people became newly infected with HIV.

In one research article (John, 2015), the main objective was to find out the
information needs of people living with HIV/AIDS in Ibadan metropolis,
Nigeria, and the sources from which they get this information. This was a
qualitative study, with the primary source of information being a four-part
questionnaire. According to the study, the major information needs included
information on ARVs, nutrition and non-governmental organisations supporting
people with HIV, while poverty and discrimination were major barriers to
information access. A major weakness of the study, however, is that it was
limited to individuals living with HIV/AIDS; this could have led to
information bias, as the information needs of people not living with HIV/AIDS
were not captured. The use of questionnaires could also have limited
respondents from fully expressing themselves due to fear of stigmatization.
Health concerns raised at the point of contact with a health facility were
not captured either, yet these contain very important information regarding
health information needs.


Another study, carried out in Florida, United States of America (Oh & Park,
2014), used a text mining method (concept and category maps) for HIV/AIDS
question analysis, focusing on the questions about HIV/AIDS that people post
in online social question-and-answer (Q&A) spaces such as Yahoo! Answers.
Social Q&A is an online service that allows people to ask and answer questions
about many topics in everyday life, such as health issues. The study focused
on HIV/AIDS-related questions as these were among the major topics of concern.
A total of 15,574 HIV/AIDS questions posted in Yahoo! Answers were randomly
selected and analyzed using text mining. Category maps and concept maps were
used for analyzing and interpreting the data. According to the study, HIV/AIDS
is a major area of concern, and understanding people's information needs
regarding the disease is important. The major topics of HIV-related questions
included prevention and relationships, among others. However, this study was
carried out in a highly developed setting (USA); the information needs of
people in such a setting might be entirely different from those in settings
such as Uganda, given the various issues affecting the health care sector,
such as frequent ARV stock-outs. Information needs must therefore be localized
to their settings if proper information interventions are to be implemented.
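
A very rough impression of how concepts in such questions relate to one
another can be obtained by counting which terms appear together. The sketch
below is a much simpler stand-in for the concept and category maps used by Oh
and Park, not a reproduction of their method; the concept list and the sample
questions are invented for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical concept vocabulary; a real study would derive this from the data.
CONCEPTS = {"hiv", "test", "symptoms", "prevention", "condom", "partner"}


def concept_cooccurrence(questions):
    """Count how often each pair of concepts appears in the same question."""
    pairs = Counter()
    for question in questions:
        tokens = set(re.findall(r"[a-z]+", question.lower()))
        present = sorted(CONCEPTS & tokens)  # sorted so each pair has one canonical order
        pairs.update(combinations(present, 2))
    return pairs


# Invented questions, for illustration only.
questions = [
    "Can I get HIV if my partner used a condom?",
    "How soon after exposure should I take an HIV test?",
    "Does a condom give full prevention against HIV?",
]
pairs = concept_cooccurrence(questions)
print(pairs.most_common(3))
```

Applied to questions collected locally, the same counting would surface
setting-specific concerns, such as drug stock-outs, that a vocabulary built
from another setting would miss.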