Blocking Embarrassing Messages on Online Social Networks Using K-Means Clustering Algorithm

© 2019 IJRAR May 2019, Volume 6, Issue 2 www.ijrar.org (E-ISSN 2348-1269, P-ISSN 2349-5138)
IJRAR19K2412 International Journal of Research and Analytical Reviews (IJRAR) www.ijrar.org 831

SANDHYA KRISHNA P #1, PAVANI VELLALACHERVU #2
#1 Assistant Professor, Dept of IT, Vignan’s Nirula Institute of Technology and Science for Women, Peda Palakaluru Road, Peda Palakaluru, Andhra Pradesh - 522005.
#2 Assistant Professor, Dept of IT, Vignan’s Nirula Institute of Technology and Science for Women, Peda Palakaluru Road, Peda Palakaluru, Andhra Pradesh - 522005.

ABSTRACT

Although online social networks (OSNs) have attracted wide attention as a way to exchange up-to-date information between locations around the world, they still face a major limitation: little ability to control embarrassing or abusive messages posted on OSN walls. Such messages are common in current OSN services, where a child or preteen uses the internet to harass another child with abusive messages. This practice, known as “cyberbullying”, is becoming a serious problem in current OSN services because these rude messages afflict children and young adults. In this work we design and analyze a novel approach that automatically identifies cyberbullying messages, groups them into a separate cluster, and prevents them from being posted on user walls. We use the K-Means algorithm to cluster the messages based on text filtering.

Key Words: Online Social Networks, K-Means, Cyber Bullying, Embarrassing Messages, Clustering, Harassing, Text Filtering.

I. Introduction

Data mining is a step in the knowledge discovery process that uses techniques such as classification, clustering, association rule mining, and interpretation to obtain the desired results. Clustering is one of these techniques; it partitions a set of unlabelled data into meaningful groups. The data mining literature offers many clustering algorithms, used mainly for exploratory data analysis where there is little or no prior knowledge about the data. A clustering algorithm takes raw data as input and applies a matching criterion: objects that match one another fall into one cluster, and objects that do not match are placed in separate clusters. The clusters are given distinct labels so that the inputs can be told apart easily [1].

Figure 1. Sample of the clustering technique in data mining

Figure 1 shows unlabelled data taken as input and clustered to produce grouped data blocks. We start with data blocks of three different colours and apply a clustering algorithm to organize them [2]. After clustering, the blocks are arranged in three clusters, each holding its matching blocks. The unlabelled input here is a set of colour blocks shuffled randomly into a single group, where each block has its own appearance and shape.

We now apply the K-Means clustering algorithm to categorize the colour blocks into separate groups [3]. Once K-Means is applied to this input, blocks of the same colour fall into one cluster, and blocks with a different appearance are placed in separate clusters. Any blocks that remain ungrouped during this process are kept as a separate group. The same clustering algorithm can be applied in the same way to other examples to partition the data into individual groups [4]-[7].

In this paper we apply this clustering technique to message filtering based on text classification. Today almost all users and companies on social network sites post status updates on their individual walls, and other OSN users post comments or replies on that content. Cyberbullying now has a strong impact on social media: an individual or a group posts abusive, aggressive, intentional content against a victim through digital communication such as messages and comments [8]. It differs greatly from traditional bullying, which usually occurs at school in face-to-face communication. Bullies are free to hurt their peers’ feelings because they do not need to face anyone and can hide behind the Internet. Victims are easily exposed to harassment because all of us, especially young people, are constantly connected to the Internet or social media.

The bag-of-words (BoW) model is a commonly used representation in which each dimension corresponds to a term.
By mapping text units into fixed-length vectors, the learned representation can be further processed for numerous language processing tasks.

II. Background Work

This section discusses the background work that supports the proposed cyberbullying approach for preventing embarrassing messages from being posted on the recipient’s wall.

Motivation

The main motivation for this work comes from text mining, a data mining technique that divides text into multiple parts and stores each part as a separate array item. It is also referred to as text data mining, roughly equivalent to text analytics. Text mining is the process of extracting the main features from a given input and removing the irrelevant information [9]. In this work we use the term “high-quality information” for information that is derived from patterns and trends and is in turn used in pattern learning applications. We store a set of abusive words in a BoW (bag-of-words) database and filter messages against it. If a message contains any word that matches the BoW database, the message is blocked and is not posted on the user’s OSN wall. If the message contains no word from the BoW database, it is treated as a normal message and can be displayed on the user’s OSN wall.

Figure 2. Basic flow of a clustering algorithm

Figure 2 shows that the input to any clustering algorithm is raw, mixed data that needs to be converted into clustered form. The main goal of any clustering algorithm is to determine the intrinsic grouping in a set of unlabelled data. In our approach the raw data is the set of messages on a user’s OSN wall, and the clustering algorithm we apply is K-Means [11]. The value of k is set to 2, so the algorithm converts the raw input into two clusters: a cyberbullying cluster and a non-cyberbullying cluster. Each message is matched against the BoW database: a message that contains a word from the database is identified as a cyberbullying message and placed in cluster 1, and a message that does not match is placed in cluster 2.

III. Proposed K-Means Clustering Algorithm to Block the Embarrassing Messages

This section defines the proposed K-Means clustering algorithm for detecting cyberbullying messages in OSN wall communication.

Assumptions and Notations

This section introduces the notation used in this paper. Let $D = \{w_1, \dots, w_d\}$ be the dictionary covering all the words in the text corpus. Each message is denoted by $M$ and is represented by a BoW vector $x \in \mathbb{R}^d$. The set of normal messages forms the corpus, written in matrix form as $X = [x_1, \dots, x_n] \in \mathbb{R}^{d \times n}$, where $n$ is the number of available posts.
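
The notation above and the blocking rule from the Motivation section can be sketched in Python. This is a minimal sketch, not the authors' implementation: the blocklist words, the function names, the tokenizer pattern, and the choice to run two-cluster K-Means on a single feature (the number of blocklisted words per message) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stand-in for the paper's "BoW database" of abusive words.
BLOCKLIST = frozenset({"stupid", "loser", "ugly"})


def tokenize(message):
    """Lowercase a message and return its alphabetic word tokens."""
    return re.findall(r"[a-z']+", message.lower())


def build_dictionary(messages):
    """Dictionary D = {w_1, ..., w_d}: sorted distinct words in the corpus."""
    return sorted({word for m in messages for word in tokenize(m)})


def bow_vector(message, dictionary):
    """BoW vector x in R^d: how often each dictionary word occurs in the message."""
    counts = Counter(tokenize(message))
    return [counts[word] for word in dictionary]


def bow_matrix(messages, dictionary):
    """Corpus matrix X in R^(d x n): column j is the BoW vector of message j."""
    columns = [bow_vector(m, dictionary) for m in messages]
    return [[column[i] for column in columns] for i in range(len(dictionary))]


def blocklist_hits(message, blocklist=BLOCKLIST):
    """Number of tokens in the message that appear in the blocklist."""
    return sum(1 for word in tokenize(message) if word in blocklist)


def split_by_blocklist(messages, blocklist=BLOCKLIST):
    """Rule from the text: any blocklisted word sends a message to cluster 1."""
    bullied = [m for m in messages if blocklist_hits(m, blocklist) > 0]
    normal = [m for m in messages if blocklist_hits(m, blocklist) == 0]
    return bullied, normal


def two_means(values, iterations=20):
    """Lloyd's K-Means with k = 2 on scalars, seeded at the min and max value."""
    centroids = [float(min(values)), float(max(values))]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid (ties go to 0).
        labels = [0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
                  for v in values]
        # Update step: each centroid moves to the mean of its members.
        for c in (0, 1):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids


if __name__ == "__main__":
    wall = ["You are a stupid loser", "See you at lunch",
            "Nice photo", "So ugly and stupid"]
    D = build_dictionary(wall)
    X = bow_matrix(wall, D)
    print(len(D), "dictionary words;", len(X[0]), "messages")
    print(split_by_blocklist(wall))
    print(two_means([blocklist_hits(m) for m in wall]))
```

K-Means on the hit count groups messages by how many blocklisted words they contain, so a message with a single hit can land in the low cluster when other messages contain many hits. The rule in `split_by_blocklist` is the behaviour the text actually describes: one match is enough to block a message.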
Next we examine how normal messages and corrupted messages (i.e., cyberbullying messages) are classified and clustered using the K-Means algorithm.

Problem Definition

K-Means is one of the simplest unsupervised learning algorithms designed to solve the well-known clustering problem. The method comes from vector quantization, which originated in signal processing. The normal messages are taken as $X = [x_1, \dots, x_n]$, and our main goal is to reconstruct the original input from a corrupted version $\tilde{x}_1, \dots, \tilde{x}_n$ in order to obtain a robust representation [12]. In this model we apply the BoW model to the original text corpus $X$ and map the corrupted input back to it via a linear projection. The projection matrix, written here as $W$, can be learned as

$$W = \arg\min_{W} \lVert X - W\tilde{X} \rVert_F^2 \tag{2}$$

where $\tilde{X} = [\tilde{x}_1, \dots, \tilde{x}_n]$ is the corrupted version of $X$. It is easily shown that Equation (2) is an ordinary least squares problem with the closed-form solution

$$W = P Q^{-1}$$

where $P = X\tilde{X}^T$ and $Q = \tilde{X}\tilde{X}^T$.
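
The closed-form solution stated above follows from a short calculation. The derivation below is a sketch under three assumptions that the extracted text does not fully confirm: the objective is the squared Frobenius reconstruction error between $X$ and its projection from $\tilde{X}$, the projection matrix is named $W$ (a symbol chosen here for illustration), and $Q$ is invertible.

```latex
% Sketch: assumes J(W) is the squared Frobenius reconstruction error
% and that Q = \tilde{X}\tilde{X}^{T} is invertible.
\begin{aligned}
J(W) &= \lVert X - W\tilde{X} \rVert_F^2
      = \operatorname{tr}\!\big[(X - W\tilde{X})(X - W\tilde{X})^{T}\big] \\
\frac{\partial J}{\partial W} &= -2\,(X - W\tilde{X})\,\tilde{X}^{T} = 0 \\
W\,\tilde{X}\tilde{X}^{T} &= X\tilde{X}^{T} \\
W &= X\tilde{X}^{T}\big(\tilde{X}\tilde{X}^{T}\big)^{-1} = P\,Q^{-1},
\qquad P = X\tilde{X}^{T},\quad Q = \tilde{X}\tilde{X}^{T}
\end{aligned}
```

Because $Q$ is a $d \times d$ matrix built from $n$ corrupted posts, it is invertible only when $\tilde{X}$ has full row rank, which requires $n \ge d$. When $Q$ is singular, a common workaround is to add a small multiple of the identity to $Q$ before inverting; the source does not say whether the authors do this.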