Abstract: Territory in Iraq and Syria has been controlled by a group known as the Islamic State of Iraq and Syria (ISIS), which has led many countries and organizations, such as the United States, Canada and the United Nations, to intervene so as to restore peace to Iraq and Syria. The attacks by ISIS have filled the airwaves, as many reports and updates on them have been broadcast and published on different media, including micro blogs and social networks such as Twitter, Facebook and Instagram, in various languages and forms. These media have given people an opportunity to express and share their feelings and thoughts concerning the ISIS fight in Iraq and Syria. These feelings include sadness, happiness, fear, doubt, joy, anger and neutral [6]. The use of social media to initiate discussions on the ISIS fight has tended to create a wide network, which is an interesting area of study in social network analysis. Collecting people’s feelings and thoughts can help determine the depth of harm or benefit of the fight to the people, support decision making towards a solution, assess the risk involved and reveal the opinions of the people.

This paper deals with the overall sentiments of people on the activities of ISIS. For this we gathered Twitter data in the form of tweets and analyzed them using different approaches. The initial approach is Jeffrey Breen’s approach, followed by the use of two APIs, namely Semantaria and Repustate. The advantage of these three approaches is that they use different parameters to analyze the tweets, so the results they generate also provide a variety of perspectives.

Keywords: ISIS, sentiment analysis, Repustate, FScore, Twitter, Semantaria, positive, negative, neutral, dataset.

Introduction:
Micro blogging sites have millions of people sharing their thoughts daily because of their characteristically short and simple manner of expression.
Social media has become more popular than ever, as people are willing to share their emotions and opinions or to participate in social networking. Accordingly, understanding social media usage is also very important. Sentiment analysis has emerged as one of the important means of analyzing the emotional states expressed in textual data, including social media data. A micro blogging website like Twitter is one of the platforms used by people to share their opinions on any burning issue. Analysis of tweets or posts would help in designing smart recommendation systems [2]. In such a system the gathered tweets or posts are grouped into categories and then classified based on the type of words they use to express feelings, which helps in polarizing them into three main categories: positive, negative and neutral. Different tools are used to calculate polarity. After processing the tweets or posts, each tool gives its own FScore, and every tool has its own scale for interpreting that score. The overall analysis of the scores given by the different tools to all tweets or posts gives the overall sentiment of people. One of the limitations of these tools is their accuracy. Many tools provide results of limited accuracy because they take into account neutral or slang words. These words increase the word count but do not take an active part in determining the actual sentiment of the tweets.

Scalability is one of the key issues when dealing with a huge number of tweets or posts. As the volume of tweets or posts increases, the volume of neutral or slang words increases and affects accuracy. Data with fewer tweets or posts is therefore considered more accurate, whereas data with more tweets or posts tends to reduce accuracy, and thus the FScore is affected [6].
In this project we have tried to analyze the general opinion of people around the world about the rising activities of ISIS in the regions of Syria and Iraq. To do so, we analyzed the same issue using different sentiment analysis techniques so as to generate more accurate results and thereby create a better picture of the overall sentiments of people on ISIS globally. The techniques under consideration include the use of different streaming APIs available from the developer sites of social media networks such as Twitter, Facebook and Instagram. Another technique involves the use of an online dictionary, against which the data is tested to generate a score for each tweet or post. Fig. 1 depicts the overall taxonomy of sentiment analysis approaches for Twitter data.

Fig. 1: Taxonomy of Sentiment Analysis [19]

Sentiment analysis is a combination of two words, sentiment and analysis. Sentiment is defined as feelings, attitudes, emotions and opinions toward a discussion or object. Usually sentiments are subjective impressions and not facts. Analysis is the process of separating something into its constituent structure or elements as a basis for discussion or interpretation. Therefore, sentiment analysis, sometimes referred to as opinion mining, is a field that uses natural language processing (NLP), statistics or machine learning methods to extract, identify and characterize the sentiment content of a text unit.

The ISIS group has been known for causing several killings, beheadings and an unbearable environment for Iraqis and Syrians, which has attracted many actions from different countries and organizations as well as opinion and sentiment discussions on Twitter by internet users.
This project collected streaming tweets and retweets on the current attacks by the ISIS group in Iraq and Syria to perform a discussion analysis using opinion mining techniques. It focuses on classifying the messages from the tweets and retweets into positive, negative or neutral opinions about the ISIS group attacks.

Methodology:
The methodology comprises different stages, namely data collection, followed by sentiment analysis, and finally interpreting the results and drawing conclusions [3]. This section is divided into the following sub-sections:
- Data Collection
- Sentiment Analysis
- Result Formulation

Data Collection:
Micro blogging websites have become a widely used communication tool among internet users for expressing their opinions about different aspects of life, making such micro blogs a rich source of people’s opinions for opinion mining and sentiment analysis. One of the most popular micro blogging platforms for sentiment analysis is Twitter [3]. For the sentiment analysis we use a dataset formed from messages collected from open discussions on Twitter regarding ISIS attacks. Twitter contains a very large number of very short messages created by the users of the platform.

The contents of the messages may be individual thoughts or public statements. The short messages shared on Twitter are referred to as tweets and have a length limit of 140 characters. Because of this length restriction, people tend to use acronyms, emoticons and other characters with special meanings when composing their tweets. Twitter users can follow others to receive their tweets. Using the Twitter developer API and the R programming language, we collected 75,000 text posts on ISIS from 3rd November 2016 to 10th November 2016, a period during which the United States launched an airstrike in Iraq, which attracted many posts on Twitter.
The large set of data collected from Twitter was used to carry out sentiment analysis in order to classify the dataset into three classes:
a) Positive sentiments: These sentiments express happiness, amusement or joy.
b) Negative sentiments: These sentiments express sadness, anger, disgust or disappointment.
c) Neutral sentiments: These sentiments state facts or do not express any emotions.

Sentiment Analysis:
To assign the three classes of sentiment (positive, negative and neutral), we used three different sentiment analysis approaches on the data collected from Twitter. The approaches used in this project are described in the following section.

Sentiment Analysis Approaches

I. Jeffrey Breen’s Approach:
Jeffrey Breen’s approach [3] is named after Jeffrey Breen’s worked example, presented as slides, on sentiment analysis of tweets using R. Breen’s approach uses a sentiment score to categorize tweets into positive, negative or neutral sentiments.

To calculate the sentiment score, Breen’s approach applies a dictionary of positive and negative words to the collected tweets to classify each tweet as positive, negative or neutral.
The formula used in calculating Breen’s sentiment score is as below:

Score = number of positive words - number of negative words

If score > 0, the overall sentiment of the tweet is classified as “Positive opinion”.
If score < 0, the overall sentiment of the tweet is classified as “Negative opinion”.
Otherwise, the overall sentiment of the tweet is classified as “Neutral opinion”.

Flow Process for the Sentiment Analysis on ISIS using Breen’s Approach
The step-by-step process followed in carrying out the sentiment analysis on ISIS is summarized in the flowchart of Breen’s approach below:

Fig 2: Flowchart of ISIS tweets Sentiment analysis using Breen’s Approach [6]
1. Search Twitter for discussion on ISIS and collect tweet text
2. Load positive and negative sentiment word lists (Hu & Liu)
3. Clean each tweet of junk data
4. Calculate the sentiment score for each tweet
5. Analyze and summarize results based on the sentiment score

Step 1:
We used the Twitter streaming developer API to collect tweets about ISIS from 3rd November 2016 to 10th November 2016. The tweets were collected on a daily basis through the R programming language platform and stored in a file. The CSV file containing the collected tweets is imported into R using read.csv.

Step 2:
The positive and negative words were loaded into R from local storage using the scan function. These words form the opinion lexicon provided by Breen, which is primarily based on the papers of Hu and Liu. Hu and Liu categorized about 6,800 words as positive or negative, with 2006 positive words and 4783 negative words, based on the emotional or sentimental expressions conveyed by those words.
Some examples of positive words include: happy, love, charming, courageous and fascinate.

Step 3:
After loading the stored tweet file and the positive and negative words into R, the next task was to clean the tweets by stripping retweet entities (RT), names of people mentioned in tweets (for example, @Robin), punctuation (such as !, ., :), digits (0-9), hyperlinks (such as http or https) and unnecessary spaces, using the gsub function.

Step 4:
The cleaned tweets are now ready for sentiment evaluation. We created a sentiment score function to calculate the scores of the cleaned tweets. The sentiment score function of Breen’s approach accepts four parameters: the cleaned tweets, the positive words, the negative words and a progress indicator setting. Once the function has performed the calculation, we get a score for each sentence and store this output in a CSV file for further study and analysis. The output file contains the tweet count, the score of the processed tweet and the analyzed tweet. Fig. 3 depicts tweets along with their scores, which are useful for analyzing the overall sentiment.

Fig 3: Screen Shot of Scored Tweets [6]

II. SEMANTARIA API
Semantaria is a powerful and easy-to-use tool for monitoring and visualizing Twitter, Facebook, surveys and other unstructured data for the analytics community. It performs sentiment analysis on large sets of documents and outputs data in multiple forms: categories, entities, phrases and sentiment scores. In addition, an attractive feature of this tool is that it provides charts for all of these forms. The forms in which the data is represented are:
- Entities: These are the pronouns from the sentiments.
- Themes: Themes are the noun phrases that give meaning to a sentence. Without them the sentence does not possess any meaning.
- Categories: Defines the type of word the user wants.

Semantaria possesses diverse features for analyzing sentiments. It allows sentiments to be analyzed in two different modes:
- Detailed mode: In detailed mode the sentiment score of each text is calculated based on the words it contains. These words are matched against the words in the default dictionary. In addition, the scores and polarity of phrases and themes are also estimated.
- Discover mode: This mode allows data to be discovered from the texts in terms of entities, categories and themes. The sentiment scores and polarity of all these forms are estimated, and a chart is plotted on the basis of the scores and polarity.

The score estimated by Semantaria is a number between -2 and 2, shown with a + or - sign or no sign in front. Moreover, this tool calculates the polarity of the data, which can be categorized as positive, negative or neutral. This can be seen in the figure shown below.

Fig 4: Score and Polarity estimation by Semantaria [6]

III. REPUSTATE API
Another approach covered in this comparative study uses an API called Repustate. The chief functions of this API are cleaning the tweets, storing the tweets and calculating the FScore.

The working of this API [6] depends on categorizing words into three lexicon sets: positive, negative or neutral. Words are categorized by looking them up in an online dictionary and comparing them against the positive, negative and neutral word dictionaries. The overall sentiment of a tweet depends on the ranking of each word in its respective category and on the total number of words in the tweet, which together determine the overall score of the tweet.

These are the three approaches covered in this study. Their outcomes, presented below, are further used for result formulation and output analysis.
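As an illustration of the cleaning and scoring steps of Breen’s approach (Steps 3 and 4 above), the following is a minimal sketch, not the code used in this study. The study worked in R with gsub and the full Hu and Liu lexicon; this sketch uses Python for brevity. The function names, the lowercasing step and the small negative word list are assumptions made for the example.

```python
import re

# Tiny illustrative word lists. The study loaded the full Hu and Liu lexicon;
# the positive words are the examples quoted in Step 2, the negative words
# are a hypothetical sample chosen for this sketch.
POSITIVE_WORDS = {"happy", "love", "charming", "courageous", "fascinate"}
NEGATIVE_WORDS = {"sad", "fear", "anger", "attack", "kill"}


def clean_tweet(text):
    """Strip retweet markers, mentions, links, punctuation, digits and extra spaces."""
    text = re.sub(r"\bRT\b", " ", text)        # retweet entity
    text = re.sub(r"@\w+", " ", text)          # names of mentioned users
    text = re.sub(r"https?://\S+", " ", text)  # hyperlinks
    text = re.sub(r"[^A-Za-z\s]", " ", text)   # punctuation, digits, non-ASCII characters
    # Lowercasing is an assumption of this sketch so that lexicon lookups match.
    return re.sub(r"\s+", " ", text).strip().lower()


def sentiment_score(text, positive=POSITIVE_WORDS, negative=NEGATIVE_WORDS):
    """Score = number of positive words - number of negative words."""
    words = clean_tweet(text).split()
    positive_hits = sum(word in positive for word in words)
    negative_hits = sum(word in negative for word in words)
    return positive_hits - negative_hits


def classify(score):
    """Map a score to the three opinion classes defined by the score rule."""
    if score > 0:
        return "Positive opinion"
    if score < 0:
        return "Negative opinion"
    return "Neutral opinion"


if __name__ == "__main__":
    for tweet in ["RT @Robin: I love this!! http://t.co/abc 123",
                  "So sad, so much fear"]:
        score = sentiment_score(tweet)
        print(clean_tweet(tweet), score, classify(score))
```

Each occurrence of a lexicon word is counted, so a tweet that repeats a negative word several times receives a correspondingly lower score. A tweet with no lexicon words, or with equal numbers of positive and negative words, scores 0 and is treated as neutral.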
After the sentiment analysis we now proceed to result formulation, which will form the basis for further study.

Result Formulation:
We used the sentiment scores obtained to analyze the tweets by plotting the scores on a bar chart and a histogram. The scores obtained from the sentiment score calculation ranged from -7 to 5. Figure 5 below depicts the output of Jeffrey Breen’s approach in the form of a bar graph.

Fig 5: Bar Chart of Sentiment Analysis of Tweets on ISIS Using Breen’s Approach [6]
[Bar chart: “Sentiment Analysis of Tweets on ISIS (Scores of tweets from 3rd November to 10th November 2016)”; axis label SCORES; tweet counts from 0 to 16,000]

The bar chart shows the frequency of sentiment scores obtained from the analyzed tweets. The zero (0) value on the bar chart indicates tweets with neutral sentiments, the negative values (to the left of zero) indicate tweets with negative sentiments, and the positive values (to the right of zero) indicate tweets with positive sentiments.

We further divided the sentiments into sub-categories for effective analysis. Any tweet with a score less than or equal to -2 was classified as extreme negative sentiment, a score of -1 as negative sentiment, a score of 0 as neutral sentiment, a score of 1 as positive sentiment, and a score greater than or equal to 2 as extreme positive sentiment.
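The five sub-categories can be expressed as a small lookup on the integer score. The sketch below is an illustrative Python version under stated assumptions, not the code used in this study: the function names are hypothetical, and the sample scores in the usage lines are made up for the example rather than taken from the collected tweets.

```python
from collections import Counter


def sentiment_category(score):
    """Map an integer Breen score to one of the five sub-categories."""
    if score <= -2:
        return "extreme negative"
    if score == -1:
        return "negative"
    if score == 0:
        return "neutral"
    if score == 1:
        return "positive"
    return "extreme positive"  # score >= 2


def category_shares(scores):
    """Return the percentage of tweets in each sub-category."""
    if not scores:
        return {}
    counts = Counter(sentiment_category(score) for score in scores)
    total = len(scores)
    return {category: 100.0 * count / total for category, count in counts.items()}


if __name__ == "__main__":
    sample_scores = [-3, -1, -1, 0, 0, 0, 1, 2]  # hypothetical scores
    for category, share in category_shares(sample_scores).items():
        print(category, share)
```

Percentages of this kind are what a pie chart of the score distribution displays, with one slice per sub-category.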
Following the data on the bar chart above, for the overall tweets collected, the highest number of tweets was of neutral opinion (about 14,000), the second highest was of negative opinion (about 7,400) and the third highest was of positive opinion (3,200).

We also analyzed the overall scores of the tweets using a pie chart, with the following interpretation: neutral sentiment tweets had the highest share at 49%, the second highest share was negative (26%) and the third highest was positive (12%). The next highest share was extreme negative at 9%. Fig. 6 shows the plot of the sentiment scores against the count of tweets with each score.

Fig. 6: Sentiment Scores analysis using Semantaria tool [6]
[Plot: x-axis “Scores of the Sentiments”, -2 to 2; y-axis “Count of Scores”, 0 to 7,000]

From the above graph it can be observed that the count of negative scores (values with a negative sign) is high compared to the count of positive scores; hence we can conclude that the overall sentiment is negative.

Similarly, by plotting the polarity we can see that negative polarity is dominant, so the sentiments on ISIS are negative overall. Fig. 7 shows the distribution of tweets by polarity, which supports the conclusion that the overall sentiment of people globally is negative regarding ISIS activities around the world.

Fig. 7: Depicting the polarity of tweets [19]
[Chart: “Sentiment Analysis by Polarity”; categories negative, neutral, positive; Polarity axis from -2,500 to 1,000]

Analyzing the tweets and the score given to each tweet helps us classify the tweets as positive, negative or neutral. We can thereby create a graph based on polarity and conclude that the overall sentiment of people globally on ISIS is negative.

Result and Discussion:
We used the sentiment scores obtained to analyze the tweets by plotting the scores on a bar chart and a histogram.
The scores obtained from the sentiment score calculation ranged from -7 to 5. The zero (0) value on the bar chart indicates tweets with neutral sentiments, the negative values (to the left of zero) indicate tweets with negative sentiments, and the positive values (to the right of zero) indicate tweets with positive sentiments.

We further divided the sentiments into sub-categories for effective analysis. Any tweet with a score less than or equal to -2 was classified as extreme negative sentiment, a score of -1 as negative sentiment, a score of 0 as neutral sentiment, a score of 1 as positive sentiment, and a score greater than or equal to 2 as extreme positive sentiment [5]. Following the data on the bar chart above, for the overall tweets collected, the highest number of tweets was of neutral opinion (about 14,000), the second highest was of negative opinion (about 7,400) and the third highest was of positive opinion (3,200).

We also analyzed the overall scores of the tweets using a pie chart, with the following interpretation: neutral sentiment tweets had the highest share at 49%, the second highest share was negative (26%) and the third highest was positive (12%). The next highest share was extreme negative at 9%.

There are other approaches which are not included in this report. In this comparative study we covered different techniques, namely Jeffrey Breen’s approach, the Repustate API and the Semantaria API. As technology progresses, real-time analysis of tweets will give instant results, and how that data can be interpreted will be a topic for further discussion.
The figures below show the sentiment analysis of the tweets based on their score values.

Fig. 8: Sentiment Analysis of Tweets on ISIS Using Breen’s Approach [6]
[Chart: “Sentiment Analysis of Tweets on ISIS”; share of tweets by score: -3: 2%, -2: 9%, -1: 26%, 0: 49%, 1: 12%, 2: 2%; all other scores from -7 to 5: 0%]

Fig. 9: Bar Chart of Sentiment Analysis of Tweets on ISIS Using Breen’s Approach
[Bar chart: “Sentiment Analysis of Tweets on ISIS (Scores of tweets from 3rd November to 10th November 2016)”; axis label SCORES; tweet counts from 0 to 16,000]

Comparing the two figures, Fig. 8 and Fig. 9, we observed that in both diagrams of the sentiment scores the highest number of tweets was neutral, followed by negative and then positive. If we ignore the neutral tweets, we observed that overall the negative tweets outnumbered the positive tweets, which indicates the fear, sadness, doubt and tension experienced by people as a result of the ISIS fight.

Future Scope:
As many attacks are being carried out by ISIS, the global media has a lot of news to cover. Recent news includes the mass shooting at a music concert in Las Vegas, for which ISIS claimed responsibility and which caused heavy loss of life in the USA. There have also been terrorist attacks in London, Paris and other parts of the world which have drawn the attention of the global media.

In recent times, the murder attributed to “Love Jihad” in Kerala was also news of deep concern, as it put many people’s lives at stake. The conflict in Kashmir and the militant leader Hafeez Saeed are also among the trending topics on social media.

As future scope for this project, we can analyze and summarize the overall sentiments on other terrorist activities in the world. As of now we have restricted our survey to the destruction caused by ISIS in the northern part of Iraq and Syria.