Analyzing Sentiment Responses From Financial News Headlines

Woramanot Yomjinda (yomjinda@princeton.edu)
Jinjin Zhao (jinjinz@princeton.edu)

Abstract

In this paper, we introduce a novel concept for analyzing sentiment from financial news, where sentiment from response comments is processed to represent the ground truth of article headlines. To analyze the viability of this method, we created a new database containing over 40,000 articles of financial headlines and response comments from the Seeking Alpha website. The new ground-truth sentiments are tested and evaluated against human judgment, and multiple machine learning models are used to assess preliminary prediction accuracy of the new sentiments.

1 Introduction

When financial news headlines are analyzed with current sentiment analysis models, the results sometimes do not correlate with the expected sentiment. For example, the headline "Abeona axes CEO over personal misconduct" [3] was evaluated as neutral by StanfordCoreNLP [9] and TextBlob, and as positive by Google's Cloud Natural Language. Fig. 1 compares other examples of headline sentiment results with the expected human sentiment. A hypothesis for this phenomenon is that the sentiment and meaning of words used on financial news websites differ from those in the corpora that general sentiment analysis models are trained on.

In this project, we seek to address this issue by performing sentiment analysis specifically on financial news headlines, without resorting to hand-generating newly labeled data for the training corpus. We achieve this by using current models to generate sentiment based on the context of the headline, specifically the article and its response comments.

Figure 1: Headline sentiments mismatch with expected sentiments

2 Related Work

Various studies have investigated the use of machine learning to categorize sentiment in financial news, but the vast majority of them focus on preprocessing and modeling techniques, using some form of human judgment as the ground-truth sentiment. In Foroozan [2], text documents are first analyzed using a default R package and then validated by economic scientists. Both Kanishcheva [4] and Luo [?] directly used expert knowledge to evaluate and annotate the text. In general, some domain expertise is used in these annotations, supporting our claim that general text sentiment classifiers may not be effective for this specific task.

The datasets used in these works contain fewer than 10,000 samples, and the largest limiting factor is probably the use of human annotation. Additionally, the selection of samples can be subject to some form of human bias; for example, in Kanishcheva [4], the annotators chose which topics within the financial news to work with. An automatic sentiment classification technique specific to the domain of financial news would resolve these issues and be useful for the general evaluation of this form of text.

Alternative strategies for this task have been proposed. Koppel [6] directly correlates the categorization of financial text with the market state. A drawback of this method is that it assumes correlations without a directly substantiated cause, and it is likely to overfit the training dataset. Ruiz-Martínez [8] generates a gazetteer word list specific to financial news for identifying sentiment. However, the compilation of this list is human-intensive and is restricted to the vocabulary of the dataset used. Our proposed method is automatic, can generalize to any vocabulary, and focuses directly on sentiment classification, addressing these drawbacks in related work.
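As a concrete, minimal illustration of the mismatch described in Section 1 (a sketch, not part of the paper's pipeline), one can score the example headline with TextBlob's polarity, which lies in [-1, 1] with 0 as neutral:

    from textblob import TextBlob

    headline = "Abeona axes CEO over personal misconduct"
    polarity = TextBlob(headline).sentiment.polarity  # in [-1.0, 1.0], 0.0 = neutral
    # The paper reports that general-purpose tools rate this headline neutral,
    # even though a financial reader would call it clearly negative.
    print(headline, "->", polarity)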
3 Data and Preprocessing

3.1 Sentiment Labeling: StanfordCoreNLP

StanfordCoreNLP's sentiment annotator uses recursive deep models to produce sentiment labels. We chose this model over other options (e.g., TextBlob) because we could download the code and it uses more advanced techniques. The model is a recursive neural network that processes sentences in a tree-like fashion. It can be run as edu.stanford.nlp.pipeline.StanfordCoreNLPServer, and this is the approach we chose for processing our labels. The sentiment label is an integer from 0 to 4, ranging from extremely negative (0), to slightly negative (1), to neutral (2), to slightly positive (3), to extremely positive (4).

3.2 Datasets

After evaluating potential financial news websites, we selected Seeking Alpha based on the following observations: we could access a list of all of its financial news on one web page, the comments were loaded on the page itself, and no external service was used to create and process the comment section.

3.3 Parsing Seeking Alpha

We first parsed the website's news section for URL links to the articles. We found that the elements containing the header URL carried the attribute "add-source-assigned", so we searched for and recorded those elements with a scraper. One challenge was that the news section used infinite scroll, so the next batch of headers would only load once we reached the bottom of the web page. We solved this by running a JavaScript script in the Chrome console that repeatedly moved to the end of the web page until parsing was complete. Another issue was that Chrome limits the size of files that can be downloaded at any point, which we solved by separating our files into chunks. In the end, we had 41,892 news articles to process. This is not the final number of articles in our database, since additional preprocessing restrictions apply.

The comments are loaded through an asynchronous Ajax request, so we retrieved them directly by making the Ajax call ourselves. After getting the comments, we assembled them into an array and stored them in chunks as before. The responses included metadata, such as the number of comments, the type of user, and the position in response chains, which will allow further processing if we choose to pursue it in the future.

We did some preliminary analysis by looking at the number of comments per article in Fig. 2. Overall, as anticipated, the majority of articles had very few comments, but there was still a significant number of articles (17,225) with more than 5 comments.

Figure 2: Number of comments per headline (histogram): most headlines have fewer than 5 comments

We chose to use this subset of articles so that each article had enough comments to give a reasonable estimate of sentiment. This may pose potential biases in our results if more sensational articles attract more comments, but our later analysis did not suggest this bias (most comments were still mildly negative or neutral, though this is likely because the data was scraped in the middle of a generally bearish, negative-leaning market).
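The labeling step of Section 3.1 can be sketched as follows. This assumes a StanfordCoreNLP server already running locally on its default port 9000; the server returns JSON in which each sentence carries a sentimentValue string from "0" to "4":

    import requests

    CORENLP_URL = "http://localhost:9000"  # assumes a local StanfordCoreNLP server

    def sentiment_score(text):
        """Return the 0-4 sentiment label for a short text such as a headline."""
        params = {"properties": '{"annotators": "sentiment", "outputFormat": "json"}'}
        resp = requests.post(CORENLP_URL, params=params, data=text.encode("utf-8"))
        resp.raise_for_status()
        sentences = resp.json()["sentences"]
        # Average over sentences; headlines are usually a single sentence.
        return sum(int(s["sentimentValue"]) for s in sentences) / len(sentences)

    print(sentiment_score("Abeona axes CEO over personal misconduct"))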
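The infinite-scroll workaround of Section 3.3 can likewise be sketched with Selenium instead of the Chrome console. The CSS selector below is only a guess based on the paper's description of the "add-source-assigned" attribute, and the comment Ajax endpoint is not reproduced here since its URL is not given in the paper:

    import time
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    driver.get("https://seekingalpha.com/market-news")

    # Repeatedly jump to the bottom so the infinite scroll loads more headlines.
    last_height = 0
    while True:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(2)  # give the Ajax request time to finish
        height = driver.execute_script("return document.body.scrollHeight;")
        if height == last_height:
            break
        last_height = height

    # Collect headline links; the attribute name follows the paper's description
    # and may differ on the live site.
    links = driver.find_elements(By.CSS_SELECTOR, "[add-source-assigned] a")
    urls = [a.get_attribute("href") for a in links]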
4 Classifying the Sentiment

4.1 Preliminary Analysis

We initially classified the sentiment of every comment and headline in order to examine their distributions. From Fig. 3, it can be seen that the majority of article headlines are neutral or mildly negative, with very few mildly positive headlines. However, the distribution of the average comment sentiment of each article leans further toward negative, as shown in Fig. 4. There are far fewer articles with a sentiment of 3 or above, and significantly fewer neutral articles compared to negative ones. This distribution suggests a significant difference between the comment sentiments and the article sentiments.

Figure 3: StanfordCoreNLP headline sentiment: scores are balanced
Figure 4: Comment-based headline sentiment: scores are more skewed

4.2 Optimal Binning

We first estimated the headline sentiment by taking the average of the comment sentiments and binning the result into six categories over the range [0, 5]. This range is similar to the general representation of sentiment used in StanfordCoreNLP [9] and other papers. However, in evaluating this method, we found that only 64 of all of our headlines gave positive sentiments (> 2). This extreme bias would make our final classifier much more difficult to implement, so for this preliminary analysis we chose to classify our headlines as neutral or negative and removed all of the positive sentiment results.

We used two different methods to evaluate the sentiment. For the first method (Average), we averaged the comment sentiments and classified the article as neutral (average in [1.5, 2.5]) or negative (average < 1.5). For the second method (Majority), we categorized each comment as either negative or neutral (with the same ranges as above) and labeled the article with the category that the most comments fall under. We found that the first method is the optimal binning method across all classifier models. For comparison, we have included one result of Majority binning with the baseline model in the results table in a later section, to be compared against the first method's baseline.

When we compared the distribution of the Majority method with that of the Average method, it was much more biased toward negative. Additionally, when comparing the accuracy against the human-labeled sentiment of 300 headline samples, the accuracy for neutral comments was very low (< 0.4). Hence, we found Average to be the best representation overall.
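The two binning rules of Section 4.2 reduce to a few lines; a minimal sketch, with the thresholds given above (neutral for an average in [1.5, 2.5], negative below 1.5):

    def label_average(comment_scores):
        """Average binning: neutral if the mean falls in [1.5, 2.5], else negative."""
        mean = sum(comment_scores) / len(comment_scores)
        if mean < 1.5:
            return "negative"
        if mean <= 2.5:
            return "neutral"
        return "positive"  # dropped from the preliminary analysis

    def label_majority(comment_scores):
        """Majority binning: bin each comment first, then take the most common bin."""
        bins = [label_average([s]) for s in comment_scores]
        return max(set(bins), key=bins.count)

    print(label_average([1.0, 2.0, 1.0]))   # -> negative (mean 1.33)
    print(label_majority([1.0, 2.0, 2.0]))  # -> neutral (two of three comments)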
5 Human Sentiment Analysis

From the confusion matrices, we can observe that the human-annotated sentiments for headlines are much more aligned with the comment-averaged sentiments generated from StanfordCoreNLP than with the standard generated headline sentiments (neutral precision of 79% against 53%, for a baseline of 50%). Fig. 5 compares the true human sentiments with the generated predicted labels of 300 headlines, where the predictions come directly from StanfordCoreNLP. In Fig. 6, the same headline samples are used to compare the human labels with the average comment sentiment generated from StanfordCoreNLP. The comment sentiment has higher precision and recall against the true labels for both negative and neutral results.

Figure 5: StanfordCoreNLP headline sentiment vs. human: poor precision and recall
Figure 6: Comment-based (Average model) headline sentiment vs. human: good precision and recall

One interesting observation is that when we asked a participant with no domain knowledge in finance (a computer science undergraduate), their headline sentiment results (not shown in this paper) were more aligned with the StanfordCoreNLP-generated sentiments than with the generated comment-averaged sentiments. We reasoned that this is because their view is more aligned with a non-contextual understanding of the headline, whereas a person with a financial background has better judgment of the true reaction with respect to the general financial scene. Previous work, as mentioned in Related Work, generally uses financial experts to hand-annotate sentiment.

Lastly, we compared the human annotations for 200 sample comments against the StanfordCoreNLP-generated sentiment in Fig. 7. This evaluates the quality of our raw sentiment data for comments, since we assumed that StanfordCoreNLP would generate accurate results for these. We found the accuracy within this sample to be above 0.75, meaning that using comment sentiments was a valid assumption for our paper. However, we could potentially estimate the headline sentiment even better if this accuracy were higher, and that could be a subject for future work.

Figure 7: StanfordCoreNLP comment sentiment vs. human: good precision and recall
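The comparisons behind Figs. 5-7 come down to confusion matrices over a labeled sample. A minimal sketch with scikit-learn, where the label lists are placeholders rather than the paper's actual 300-headline sample:

    from sklearn.metrics import confusion_matrix, classification_report

    # Placeholder labels; in the paper these are human-annotated headlines
    # compared against the comment-averaged StanfordCoreNLP sentiment.
    human = ["negative", "neutral", "negative", "neutral", "negative"]
    model = ["negative", "neutral", "neutral", "neutral", "negative"]

    print(confusion_matrix(human, model, labels=["negative", "neutral"]))
    print(classification_report(human, model, labels=["negative", "neutral"]))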
6 Modeling Methods

At this point, we have processed the necessary headline and comment data and ascertained the validity of the claim that comment-based sentiments align better with human-generated sentiments. It is important to underline that our goal is neither to beat the state-of-the-art StanfordCoreNLP model nor to implement this at large scale. We want to train our classifiers with simple models to show that our comment-based approach can be put into practice and, with more complex algorithms and a larger database, can achieve high accuracy and practicality.

6.1 Word Embedding Techniques

To train classifiers effectively, we need an appropriate word embedding scheme. In this paper, we test two types of word embedding: bag-of-words with counts and bag-of-words with GloVe.

6.1.1 Bag-of-words Model

The bag-of-words model is the simplest model for word embedding. In this model, a headline text is represented as the bag (multiset) of its words. The bag-of-words model is special in that it completely disregards grammar and word order while keeping multiplicity [13]. Bag-of-words with counts is a model where we encode a headline as a vector whose entries are the counts of each word. In other words, we denote the vector of sentence $i$ by $X_i = (x_i^1, \ldots, x_i^{n_i})$, where $(x^1, \ldots, x^n)$ represents the set of all (or selected) words from the headline dataset [13].

6.1.2 Global Vectors for Word Representation (GloVe)

The bag-of-words model is simple and effective, but the simple counting method cannot capture fine-grained semantic and syntactic regularities. To this end, we use Global Vectors for Word Representation (GloVe), a log-bilinear regression model that leverages statistical information by training only on the nonzero elements of a word-word co-occurrence matrix, rather than on the entire sparse matrix or on individual context windows in a large corpus. The GloVe model proposes a weighted least squares regression objective that factorizes the log of the co-occurrence matrix with unequal weights:

J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2

where $f(X_{ij})$ is a weighting function, $w \in \mathbb{R}^d$ are word vectors, $\tilde{w} \in \mathbb{R}^d$ are separate context word vectors, and $b_i$ and $\tilde{b}_j$ are added biases [7]. Pretrained GloVe models are available for five corpora of varying sizes (billions of tokens), providing vectors for the 400,000 most frequent words [7]. Our headline data can fully utilize GloVe word embeddings in conjunction with the bag-of-words model. It is important to note that we could also use GloVe with other models, but for the sake of time and simplicity we pair it with the bag-of-words model here.
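Both embedding schemes of Section 6.1 can be sketched briefly. The GloVe path assumes a pretrained file such as glove.6B.100d.txt (a standard GloVe download, not one of the paper's artifacts); since the paper does not spell out how word vectors are aggregated over a headline, averaging them is an assumption here:

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer

    headlines = ["Abeona axes CEO over personal misconduct",
                 "Quarterly earnings beat analyst expectations"]

    # Bag-of-words with counts: one column per vocabulary word.
    vectorizer = CountVectorizer()
    X_counts = vectorizer.fit_transform(headlines)

    # Bag-of-words with GloVe: load pretrained vectors, then average the
    # vectors of the words present in each headline. Path and dimension
    # are placeholders for whichever pretrained file is used.
    glove = {}
    with open("glove.6B.100d.txt", encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            glove[parts[0]] = np.array(parts[1:], dtype=np.float32)

    def glove_features(text, dim=100):
        vecs = [glove[w] for w in text.lower().split() if w in glove]
        return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

    X_glove = np.stack([glove_features(h) for h in headlines])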