GCRIS

Clustering of News in Publications
(MEF Üniversitesi, Fen Bilimleri Enstitüsü, 2018) Sülün, Erhan; Arısoy Saraçlar, Ebru
Today, large volumes of text are produced and stored continuously with the help of computer systems and the Internet, and the Internet also makes this text accessible to everyone. Given the sheer size of the data, however, it is hard for people to analyze it and discover the meaningful information it contains. This is where machine learning techniques and computing power come in: they analyze the data and extract meaningful information so that people can access it in summarized form. The first step in analyzing text data is to represent it in a numerical format, since machine learning techniques can only use numerical inputs. Several methods exist for this representation, such as TF-IDF (Term Frequency - Inverse Document Frequency), Bag of Words, Word2Vec and Doc2Vec. The second step is to apply machine learning algorithms that take the numerical representation of the text as input. Both supervised and unsupervised techniques are available, and the choice between them depends on the structure of the problem and the data. In this study, news documents from publications in the United States, such as the New York Times, Reuters and the Washington Post, will be clustered into topics in order to categorize them and make them easier to investigate. Three data representation methods will be examined in detail and used: Bag of Words, TF-IDF and Doc2Vec. Finally, because the news data is an unlabeled set of documents, the K-Means clustering algorithm, an unsupervised learning technique, will be used with both Euclidean distance and cosine similarity as metrics. Categorization will be performed multiple times with different category counts, that is, with different K values, and the most meaningful category count will be determined by examining the clustering results.
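The two steps described in the abstract above (numerical representation, then unsupervised clustering) can be illustrated with a minimal sketch. It is not the thesis's implementation: the four toy documents, the smoothed IDF formula, the function names and the deterministic "first k documents" initialization are all assumptions chosen to keep the example small and reproducible. Using mean centroids with a cosine-based distance is also only a simple approximation.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalized TF-IDF vectors) for tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (an assumption of this sketch) keeps every weight positive.
    idf = {tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0 for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vec = [counts[tok] / len(doc) * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity, so that smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def kmeans(vectors, k, distance, n_iter=20):
    """Plain K-Means; the first k vectors seed the centroids (assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: distance(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy "news" documents, already tokenized.
docs = [
    "stocks market shares rise".split(),
    "team wins football match".split(),
    "market shares fall stocks".split(),
    "football team loses match".split(),
]
vocab, vectors = tfidf_vectors(docs)
labels_euclidean = kmeans(vectors, 2, euclidean)
labels_cosine = kmeans(vectors, 2, cosine_distance)
# Different category counts can be compared by repeating the run per K.
labels_by_k = {k: kmeans(vectors, k, euclidean) for k in (2, 3)}
print(labels_euclidean, labels_cosine, labels_by_k)
```

On real news data, the labels for each K would be inspected, for example by reading the top-weighted terms per cluster, before settling on a category count.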
Predicting Yelp Stars Based on Business Attributes
(MEF Üniversitesi, Fen Bilimleri Enstitüsü, 2018) Tek, Ahmet; Arısoy Saraçlar, Ebru
Yelp is a business review website where consumers can comment on a business from their own point of view, which gives other consumers prior knowledge of that business. Whenever we search for something, we hope to get the most relevant results, and recommender systems can help achieve this. Review websites such as Yelp and TripAdvisor allow users to post online reviews of various businesses, products and services, and have recently been shown to have a significant influence on consumer shopping behavior [1]. This paper aims to predict restaurant ratings from attributes such as alcohol, noise level, Wi-Fi, music and a smoking area, and to find the attributes that matter most for higher ratings. The Yelp dataset was selected for this project because it contains a great deal of information about businesses and consumer behavior and is free for academic use. Machine learning models were run for classification with two star labels. Since we aim to find the most important features for a higher rating, we use only the 4-star and 5-star labels from the dataset. In our research, restaurant rating prediction is implemented as binary classification in which the class labels are the star ratings and the restaurant attributes are the input features of the classifier. We will investigate Decision Trees, Naive Bayes Classifier, Two-Class Decision Forest, Two-Class Boosted Decision Trees, Two-Class Neural Network, Two-Class Support Vector Machine and Two-Class Logistic Regression, and choose the 10 most important attributes associated with high ratings.
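One simple way to see how attributes could be ranked against a 4-star versus 5-star label is information gain, a splitting criterion commonly used by decision trees. The sketch below is illustrative only and does not reproduce the models listed in the abstract above: the six restaurants, their attribute values and the function names are hypothetical, and real Yelp attributes would need cleaning and encoding first.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


# Hypothetical toy data: binary attributes and a 4-star / 5-star label.
restaurants = [
    {"alcohol": True,  "wifi": True,  "noisy": False},
    {"alcohol": True,  "wifi": False, "noisy": False},
    {"alcohol": False, "wifi": True,  "noisy": True},
    {"alcohol": False, "wifi": False, "noisy": True},
    {"alcohol": True,  "wifi": True,  "noisy": True},
    {"alcohol": False, "wifi": False, "noisy": False},
]
stars = [5, 5, 4, 4, 5, 4]

# Rank attributes from most to least informative about the star label.
ranking = sorted(
    ((name, information_gain(restaurants, stars, name))
     for name in ("alcohol", "wifi", "noisy")),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

In this toy data the `alcohol` attribute separates the two classes perfectly, so it ranks first. Keeping the top 10 entries of such a ranking would mirror the "10 most important attributes" goal, under the assumption that information gain is an acceptable importance measure.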
E-Commerce Customer Churn Prediction Based on Machine Learning Algorithms
(MEF Üniversitesi, Fen Bilimleri Enstitüsü, 2018) Eser, Ahmet Yetkin; Arısoy Saraçlar, Ebru
With the development and popularization of the digital world, human behavior has changed remarkably, and many sectors have been affected by this change. One of the most affected is the retail sector: people have left their regular shopping habits and started shopping on e-commerce sites. Thanks to the increasing variety and volume of collected data and the speed of new machines, companies can run sophisticated algorithms efficiently on their data. In this paper, we discuss how companies can use machine learning methods to predict customers who are likely to churn.

Master's Theses
