Master's Theses (Yüksek Lisans Tezleri)

Permanent URI for this collection: https://hdl.handle.net/20.500.11779/1785

Now showing 1 - 10 of 76

A Study on Churn Prediction in Telecommunication and Pay Tv Area
(MEF Üniversitesi, Fen Bilimleri Enstitüsü, 2020) Şayık, Murat; …
...
Duplicate Record Detection: a Rule-Based Approach
(MEF Üniversitesi, Fen Bilimleri Enstitüsü, 2017) Malkaralı, Gülce; Özgür Özlük
The study presents a rule-based algorithm to detect duplicate and near-duplicate records within a dataset extracted from a leading online reality platform.
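The abstract does not list the rules themselves. As a rough illustration only, a rule-based detector of this kind might look like the sketch below. The record fields (`title`, `city`, `price`), the two rules, and both thresholds are hypothetical and are not taken from the thesis.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def classify_pair(a, b, title_threshold=0.9, price_tolerance=0.05):
    """Return 'duplicate', 'near-duplicate', or 'distinct' for two records.

    Field names and thresholds are illustrative assumptions.
    """
    title_a, title_b = normalize(a["title"]), normalize(b["title"])
    same_city = normalize(a["city"]) == normalize(b["city"])

    # Rule 1: identical normalized title, same city, same price.
    if title_a == title_b and same_city and a["price"] == b["price"]:
        return "duplicate"

    # Rule 2: same city, very similar title, price within the tolerance.
    similarity = SequenceMatcher(None, title_a, title_b).ratio()
    price_gap = abs(a["price"] - b["price"]) / max(a["price"], b["price"])
    if same_city and similarity >= title_threshold and price_gap <= price_tolerance:
        return "near-duplicate"

    return "distinct"


def find_duplicates(records):
    """Compare every pair of records and keep the pairs that are not distinct."""
    results = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        label = classify_pair(a, b)
        if label != "distinct":
            results.append((i, j, label))
    return results


# Made-up records for illustration.
records = [
    {"title": "Sunny 2-bedroom flat near the park", "city": "Istanbul", "price": 250000},
    {"title": "Sunny 2-bedroom  flat near the Park", "city": "istanbul", "price": 250000},
    {"title": "Sunny 2 bedroom flat near the park", "city": "Istanbul", "price": 255000},
    {"title": "Detached villa with garden", "city": "Izmir", "price": 900000},
]
print(find_duplicates(records))
```

Comparing every pair is quadratic in the number of records, so a larger dataset would usually be grouped first (for example by city) before the rules are applied.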
Forecasting With Ensemble Methods: an Application Using Fashion Retail Sales Data
(MEF Üniversitesi, Fen Bilimleri Enstitüsü, 2019) Yüzbaşıoğlu, Orkun Berk; Küçükaydın, Hande
In this project, ensemble machine learning methods are used to predict the short-term store sales of a fashion retailer. Sales forecasts for various products at different stores are generated over a span of three months with the bagging tree regressor, random forest regressor, and gradient boosting regressor algorithms. The algorithms are trained and evaluated on real past sales data from a Turkish fashion retailer. The predictive performance of the models is compared with that of linear regression. The results of the study show that the random forest regressor performs best.
Alternative Credit Scoring Model for Thin File Customers
(MEF Üniversitesi, Fen Bilimleri Enstitüsü, 2019) Korkmaz, İstem Akça; Taş Küten, Duygu
Credit scoring is a widely used tool for banks, financial institutions, and corporations. Traditional credit scores are calculated from users’ past financial history, which may exclude people with limited financial history from the credit system. Alternative credit scoring allows sector players to access a larger portion of these customers. The credit scoring industry has expanded with an "all data is credit data" approach that combines traditional credit scoring systems with new data points. In this study, we aim to build an alternative credit scoring model for customers of a national bank in Turkey who have limited historical financial data (thin file) by using alternative data points. Some of the alternative data points and variables were gathered from one of the bank’s products: the authorized card for Turkish national league football tickets (Passolig). Combining alternative data points with demographic and geographic information, we compare machine learning approaches. We use logistic regression as a base model and compare the tree-based approaches (decision tree, random forest, and XGBoost) to select the most effective modelling approach.
Prediction of Brent Oil Spot Prices Using Country Based Inventory and Trading Data
(MEF Üniversitesi, Fen Bilimleri Enstitüsü, 2019) Usta, İsmail Batur; Ağralı, Semra
Crude oil price forecasting has been the focus of numerous authorities, yet the task remains challenging. The extremely volatile nature of the oil market and the high number of active players in it make it very difficult to establish a solid forecasting model that stays relevant over time. Recent advances in data technologies, mainly ever-increasing computing power and big data technologies, have allowed new approaches to emerge. From online learners to natural language processing, advanced data analytics models have been employed with the help of easily accessible and diverse data. This project attempts to use such available data to forecast the Brent oil spot price. Strong drivers of the crude price were explored using monthly country-by-country inventory, trading, and economic data. The data comes from various sources and in multiple formats; the final merged data frame has over 17000 observations and contains information on 86 countries. To enhance predictive power, a specialized learner is fit on each country individually, and the predictions are then accumulated and filtered before a single prediction is output. Compared to a single predictor, this approach enhanced the predictive power of the algorithm by adapting to the dynamics of each country.
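The per-country scheme described in the abstract can be sketched as follows. The abstract does not name the learner or the filter, so this sketch substitutes a one-feature least-squares line for the "specialized learner" and a median-based band for the filter. Both choices, along with every name and number in the example, are assumptions made for illustration.

```python
from statistics import mean, median


def fit_line(xs, ys):
    """Ordinary least squares with one feature; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def combine_predictions(history, latest, max_deviation=0.2):
    """Fit one learner per country, then filter and average the predictions.

    history: {country: (feature_values, prices)}
    latest:  {country: newest feature value}
    Predictions further than max_deviation (relative) from the median
    prediction are dropped before averaging. This filter is an assumption.
    """
    predictions = {}
    for country, (xs, ys) in history.items():
        slope, intercept = fit_line(xs, ys)
        predictions[country] = slope * latest[country] + intercept

    center = median(predictions.values())
    kept = [p for p in predictions.values()
            if abs(p - center) <= max_deviation * abs(center)]
    return mean(kept), predictions


# Made-up monthly data: one hypothetical feature per country against price.
history = {
    "A": ([1, 2, 3], [60, 62, 64]),
    "B": ([10, 20, 30], [61, 63, 65]),
    "C": ([1, 2, 3], [50, 100, 150]),  # behaves very differently from A and B
}
latest = {"A": 4, "B": 40, "C": 4}

final, per_country = combine_predictions(history, latest)
print(per_country)
print(final)
```

In this toy run the prediction from country C falls outside the band around the median and is excluded, so only A and B contribute to the final value.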
A Comparison of Ensemble Learning Methods in Retail Sales Forecasting
(MEF Üniversitesi, Fen Bilimleri Enstitüsü, 2019) Süer, Serhan; Güney, Evren
Forecasting has always been an essential capability that companies try to build and apply in various areas. Sales forecasting is one of its major uses and appears in almost all sectors. This study forecasts the sales of Walmart stores based on several features such as store id, department id, date, and store size. The Walmart sales data used in this study covers stores between 2010 and 2012. At the beginning of the study, the dataset was introduced and exploratory data analysis was carried out to identify the dependent and independent variables and their characteristics. To prepare the data for machine learning algorithms, preprocessing methods such as missing value treatment, outlier treatment, and feature selection were applied. Ensemble learning methods were applied in the modeling stage. These methods were addressed in three parts (Bootstrap Aggregation, Boosting, and Stacked Generalization), which together comprise six different algorithms. The models were compared on four metrics: Root Mean Square Error, Mean Absolute Error, R-Squared, and runtime. After the main metric for evaluating the models was selected, cross-validation was applied to achieve unbiased estimates. Finally, the parameters of the model with the highest cross-validation score were tuned in the hyperparameter optimization stage, yielding a machine learning model that can be used to forecast the sales of Walmart stores, together with its success score.
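For reference, the three error metrics named in the abstract can be computed from their standard definitions as in the sketch below, with runtime measured around an arbitrary prediction call. The function names and the sample numbers are illustrative and do not come from the thesis.

```python
from math import sqrt
from time import perf_counter


def regression_metrics(actual, predicted):
    """Return RMSE, MAE, and R-squared for two equal-length sequences."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    return {
        "rmse": sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1 - ss_res / ss_tot,
    }


def timed(func, *args):
    """Run func(*args) and return (result, elapsed seconds)."""
    start = perf_counter()
    result = func(*args)
    return result, perf_counter() - start


# Made-up weekly sales and forecasts.
actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 39]

metrics, seconds = timed(regression_metrics, actual, predicted)
print(metrics, seconds)
```

Lower RMSE and MAE are better, while R-squared is better when closer to 1, so a comparison across models has to fix one main metric, as the abstract describes.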
Football Player Profiling Using Opta Match Event Data: Hierarchical Clustering
(MEF Üniversitesi, Fen Bilimleri Enstitüsü, 2019) Kalenderoğlu, Uğurcan; Koç, Utku
The increasing popularity of data analytics has had an impact on the sports industry. As a result of this trend, both the volume of available data and the number of best practices for using data analytics have grown. Player profiling is one of the emerging hot topics among these, especially in football. Meanwhile, the balance of transfer income and expenses has become the biggest burden on clubs’ finances, when it should be the reverse. Scouting processes are currently dominated by bilateral relations and the intuitive comments of scouting staff. Moving to a data-driven decision framework is an important step in overcoming this situation. It is crucial to replace a player who leaves the team with someone who has potential and a very similar playing style, and player profiling is the first step toward doing so. The dataset used in this project was obtained from Opta, a sports-focused data company, and contains all on-ball actions at player level from the Turkish Super League, English Premier League, and German Bundesliga over three seasons between 2015 and 2018. Principal component analysis is applied to the dataset, which initially consists of 2469 players and 271 features, to reduce its dimensionality to 15 features. The study finds twelve different player clusters within the traditional main positions: three for defenders, four for midfielders, and five for forwards. Clubs can enrich and benefit from these clusters in three ways: 1) evaluating a player’s style over a period of time and detecting the best role fit, 2) analyzing the effect of cluster combinations to decide which line-up yields better team results, and 3) finding the closest match to a player who is to be replaced.
Predicting Customer Perception on Brands Functional Near-Infrared Spectroscopy Measurements
(MEF Üniversitesi, Fen Bilimleri Enstitüsü, 2019) Kemerci, Emre; Koç, Utku
Customer perception of brands is important for marketing professionals when making strategic decisions. Traditionally, customer perception of brands is researched through field surveys. The discipline of neuromarketing likewise studies customer behavior, perceptions, communication techniques, and related topics within the framework of the human decision-making process. In neuromarketing, functional near-infrared spectroscopy (fNIRS) is a technology used to measure oxy- and deoxy-hemoglobin concentration in tissue in order to analyze the hemodynamic responses of brain activity. In this study, the dataset consists of a group of participants’ prefrontal cortex activations, that is, the hemodynamic responses collected against a set of stimuli, each of which is a brand logo and an adjective associated with the brand. The measured hemodynamic response metrics are oxygenated hemoglobin (HbO), deoxygenated hemoglobin (HbR), total hemoglobin (HbT), and oxygenation (Oxy), and the dataset includes measurements from 168 participants for 30 stimuli. The dataset also contains the participants’ responses and the common perception of the stimuli (field study results for the same stimuli). The aim of the project is to use machine learning algorithms to predict from this feature set whether the relation between a brand and the relevant adjective is Positive, Negative, or Neutral. In terms of methodology, the fNIRS measurements are cleaned and null values are handled, the measurements are consolidated per participant and stimulus using two different feature creation methods, and classification algorithms are applied as supervised learning to predict brand perception. In conclusion, the support vector classifier and XGBoost algorithms performed poorly, reaching only slightly over 50% accuracy despite optimization with different classifier parameters.
Further studies should explore different feature engineering options.
Analyzing the Drivers of Customer Satisfaction Via Social Media
(MEF Üniversitesi, Fen Bilimleri Enstitüsü, 2019) Yücel, Kadir Kutlu; Koç, Utku
Social media has become a highly influential force over the last decade. The population of active social media users has grown with the new generations, and data has accumulated in tremendous amounts as a result. This data offers an opportunity to reach valuable insights and support business decisions. The aim of this project is to understand the drivers of customer satisfaction through public sentiment on Twitter towards a financial institution. Data was extracted from Twitter, the most popular microblogging platform, and sentiment analysis was performed. The unstructured data was classified by sentiment with a lexicon-based model and a machine learning based model. The study showed that the machine learning based model successfully overcame language-specific problems and made better predictions where the lexicon-based model struggled. Further analysis was performed on the days with extreme daily average sentiment scores to match them with prominent events. The results showed that public sentiment on Twitter is driven by three main themes: complaints related to services, advertisement campaigns, and influencers’ impact.
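The lexicon-based scoring and the daily-average analysis mentioned in the abstract can be illustrated with the minimal sketch below. The tiny English lexicon, the tokenization, and the sample tweets are all invented for illustration; the abstract does not describe the lexicon or the language handling actually used.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy lexicon: word -> polarity.
LEXICON = {"great": 1, "love": 1, "helpful": 1,
           "bad": -1, "slow": -1, "terrible": -1}


def score_text(text, lexicon=LEXICON):
    """Sum the polarity of every known word; unknown words count as 0."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(lexicon.get(w, 0) for w in words)


def daily_average_scores(tweets):
    """Average the sentiment scores of (day, text) pairs per day."""
    by_day = defaultdict(list)
    for day, text in tweets:
        by_day[day].append(score_text(text))
    return {day: mean(scores) for day, scores in by_day.items()}


def extreme_days(daily_scores, count=1):
    """Return the `count` most negative and most positive days."""
    ranked = sorted(daily_scores, key=daily_scores.get)
    return ranked[:count], ranked[-count:]


# Made-up tweets.
tweets = [
    ("2019-01-01", "Great app, love it!"),
    ("2019-01-01", "Support was helpful"),
    ("2019-01-02", "Terrible service, so slow"),
    ("2019-01-03", "Opened a new account today"),
]

daily = daily_average_scores(tweets)
print(daily)
print(extreme_days(daily))
```

A word-lookup approach like this has no notion of negation, suffixes, or sarcasm, which is consistent with the abstract's remark that the lexicon-based model struggled with language-specific problems.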
Prediction of Up and Down Signals in Selected Blue Chip Stocks
(MEF Üniversitesi, Fen Bilimleri Enstitüsü, 2019) Yıldız, Mustafa; Koç, Utku
Efforts have been made to predict the direction in which equity stocks will move in the capital markets. Most of these studies use models based on Technical Analysis and Fundamental Analysis. For daily price estimates, macroeconomic variables or the financial ratios of financial instruments are used, whereas intraday price estimates take trade book data into consideration. This study uses the equity market data analytics created by Borsa İstanbul as a benchmark for intraday price signals. These analytics are derived from trade and order book data. Intraday price and equity market data analytics datasets are created for 5-minute periods, and different algorithms are tried on these datasets. The study is carried out using one week of data for 4 selected blue chip stocks. The signal is 1 for an increase, -1 for a decrease, and 0 for no change. In the study, the decision jungle algorithm is the most successful algorithm. In addition, the lack of volatility and liquidity in the market caused overfitting problems in ensemble algorithms. According to the multiclass decision jungle confusion matrix, the true positive results for 1 (an increase in price) are promising. If an investor uses the algorithm only for price increases, it will be meaningful. The true positive ratio for 1, 54.5%, is very high compared with the corresponding false value for a decrease (-1), which is just 13.6%. The difference between the two (54.5% - 13.6%) will be the earning ratio for an investor who decides to invest on a price increase of Yapi Kredi stock with the decision jungle algorithm. Although it is often stated that big data algorithms (machine learning techniques) can give the best results for the data, domain knowledge related to the data is still very important.
As the study shows, overcoming the overfitting or bias problems that occur in other studies requires obtaining sufficient domain knowledge in consultation with experts and practitioners of the subject. In addition, more studies on intraday trading, an area that is thinly covered in the literature, will improve the results of future studies on price forecasts. In line with the literature, the results of this study show that stock price movements are difficult to estimate.
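The signal encoding and the earning-ratio arithmetic from the abstract can be written out as a short sketch. The assumption here, not stated in the abstract, is that each label compares the price at the end of one 5-minute period with the price at the end of the previous one; the sample prices are made up, while the two percentages are the figures quoted in the abstract.

```python
def label_signals(prices):
    """Label each period: 1 for an increase, -1 for a decrease, 0 for no change."""
    signals = []
    for previous, current in zip(prices, prices[1:]):
        if current > previous:
            signals.append(1)
        elif current < previous:
            signals.append(-1)
        else:
            signals.append(0)
    return signals


# Made-up prices at the end of consecutive 5-minute periods.
prices = [10.00, 10.02, 10.02, 9.98, 10.01]
signals = label_signals(prices)

# Percentages quoted in the abstract for the increase (1) signal.
true_rate_increase = 54.5
false_rate_decrease = 13.6
earning_ratio = round(true_rate_increase - false_rate_decrease, 1)

print(signals)
print(earning_ratio)
```

With the quoted figures the difference comes to 40.9 percentage points.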
