04. Enstitüler / Lisansüstü Eğitim Enstitüsü
Permanent URI for this community: https://hdl.handle.net/20.500.11779/204
Browsing 04. Enstitüler / Lisansüstü Eğitim Enstitüsü by Department "Lisansüstü Eğitim Enstitüsü, Büyük Veri Analitiği Yüksek Lisans Programı"
Master Thesis Market Basket Analysis Using Apriori Algorithm (MEF Üniversitesi, Fen Bilimleri Enstitüsü, 2018) Şimşek, Yıldırım Murat; Çakar, Tuna
Predictive analysis is a branch of data engineering that predicts occurrences or probabilities from data. To make predictions about future events, predictive analytics uses data mining techniques, which analyse historical data and predict future events based on that analysis. Predictive modelling techniques can also be used to build a model for prediction, and these models vary depending on the data they use. Predictive analytics comprises various statistical and analytical techniques used to develop models that predict future occurrences, events or probabilities. Market basket analysis is a data mining technique that focuses on discovering purchasing patterns by extracting associations from a store’s transactional data. Electronic commerce point-of-sale systems have expanded the use and application of transactional data in market basket analysis. Retailers have to know their customers' needs and adapt to them. With the help of advanced technology, retailers collect information about their customers and what they purchase. Analysing this information is extremely valuable for understanding purchasing behaviour in retail commerce. Market basket analysis is one way to discover which items can be sold together. It gives the retailer valuable information about related sales on a group-of-goods basis: for example, customers who buy bread often also buy related products such as milk or butter. It makes sense to place these groups side by side in a store so that customers can reach them quickly.
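The bread-and-milk example above can be made concrete with a minimal sketch of the Apriori idea. This is an illustration only, not the thesis's implementation: the toy baskets, the `min_support` threshold, and the helper names `frequent_itemsets` and `confidence` are all assumptions introduced here. The sketch counts single items, keeps those that meet the support threshold, and builds candidate pairs only from the surviving items, which is the pruning step that gives Apriori its name.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy baskets, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for frequent 1-itemsets and 2-itemsets."""
    n = len(transactions)

    # Pass 1: support of single items.
    item_counts = Counter(item for basket in transactions for item in basket)
    frequent = {
        frozenset([item]): count / n
        for item, count in item_counts.items()
        if count / n >= min_support
    }

    # Apriori pruning: a pair can only be frequent if both items are frequent,
    # so candidate pairs are built from the surviving single items only.
    frequent_items = {item for itemset in frequent for item in itemset}
    pair_counts = Counter()
    for basket in transactions:
        for pair in combinations(sorted(basket & frequent_items), 2):
            pair_counts[frozenset(pair)] += 1

    # Pass 2: keep pairs that meet the support threshold.
    for pair, count in pair_counts.items():
        if count / n >= min_support:
            frequent[pair] = count / n
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent: supp(A and C) / supp(A)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return frequent[both] / frequent[frozenset(antecedent)]


itemsets = frequent_itemsets(transactions, min_support=0.4)
rule_conf = confidence(itemsets, {"bread"}, {"milk"})
print(f"confidence(bread -> milk) = {rule_conf:.2f}")
```

On this toy data, bread and milk appear together in three of five baskets and bread appears in four, so three quarters of the baskets containing bread also contain milk. A full Apriori run would continue to itemsets of size three and larger in the same way.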
Market basket analysis is a very useful technique for identifying related groups of products that are bought together, for reorganizing the supermarket layout, and for designing promotional campaigns that improve product purchases. The main aim of this capstone project is to find the co-occurring items in consumer shopping baskets in the data set provided by GittiGidiyor E-Commerce Company, using the association rule mining algorithm Apriori. Mining association rules from transactional data will provide us with valuable information about co-occurrences and co-purchases of products. Such information can be used as a basis for decisions about marketing activities such as promotional support, inventory control and cross-sale campaigns.
Master Thesis Carbon Price Forecasting (MEF Üniversitesi, Fen Bilimleri Enstitüsü, 2018) Karakaya, Nurhak; Ağralı, Semra
In the last twenty years, great improvements have occurred both in technology and in world economic capacity. The total production capacity of countries has been increasing rapidly. These increases require large amounts of energy. For that reason, the prices of energy-related products are very important, as they dramatically affect company budgets. Energy accounts for a large share of the total budgets of companies and countries. A unit increase in the price of an energy-related product can severely affect the budget. Carbon is one of those products. Besides carbon prices, carbon usage also affects the global environment, so its price also has an impact on global temperature. Different machine learning methods are used to forecast the future carbon price. In the literature, support vector machines (SVM) [1, 2, 3], random forest (RF) [4, 5], artificial neural networks (ANN) [6, 7, 8] and autoregressive moving average (ARMA) [9] are commonly used methods. Each of these methods has pros and cons relative to the others.
In this project, we also apply different machine learning methods, namely ANN, SVM, RF, Lasso Regression (LG) [11] and Ridge Regression (RR) [10], to forecast the carbon price over time and to explain future price movements. Then, we compare those five models using model validation methods. Finally, we choose the best model for further experiments. We have four data types: daily carbon price (CP), electricity price (EP), natural gas price (NG) and coal price (COP), covering the period from 2009 to 2017. Prices are provided in different currencies. First, we convert all prices to the same currency. We completely eliminate null data. Then, we graphically investigate the overall trend by smoothing the data. To analyze the data, we look at daily, monthly, yearly and seasonal time scales. For every weekday or weekend day in the training data set we keep a day in the test data set, so that we can keep the time effect in our model. After the data management process, we apply different forecasting methods to explain future carbon price tendencies.
Master Thesis Alternative Credit Scoring Model for Thin File Customers (MEF Üniversitesi, Fen Bilimleri Enstitüsü, 2019) Korkmaz, İstem Akça; Taş Küten, Duygu
Credit scoring is a widely used tool for banks, financial institutions and corporations. Traditional credit scores are calculated from users' past financial history, which may exclude people with limited financial history from the credit system. Alternative credit scoring allows sector players to reach a larger portion of these customers. The credit scoring industry has expanded with an "all data is credit data" approach that combines traditional credit scoring systems with new data points. In this study, we aim to build an alternative credit scoring model for customers who have limited financial history (thin file) by using alternative data points for a national bank in Turkey.
Some of the alternative data points and variables have been gathered from one of the bank’s products: the authorized card for Turkish national league football tickets (Passolig). Combining alternative data points with demographic and geographical information, we compare machine learning approaches. We use logistic regression as a base model and compare it with tree-based approaches (decision tree, random forest and XGBoost) to select the most effective modelling approach.
Master Thesis Predicting Outcomes and Improving Game Models for Football Matches (MEF Üniversitesi, Fen Bilimleri Enstitüsü, 2019) Göçer, Murat; Küçükaydın, Hande
This study is conducted to predict the results of the 2017/2018 English Premier League football matches and to show the teams what they should pay attention to in order to win. Classification algorithms are used, and the algorithm that gives the best results is applied to real matches. After evaluating the results, some suggestions are made for similar future studies and for the teams to develop their game models.
Master Thesis Predicting Facebook Ad Impressions & Cpm Values (MEF Üniversitesi, Fen Bilimleri Enstitüsü, 2018) Tekten, Semih; Özlük, Özgür
It is estimated that there were more than two billion active users on Facebook as of the first quarter of 2018, and social media offers tremendous opportunities for advertisers in terms of performance and measurability. However, it is very difficult for marketing managers to manage all the campaigns on different marketing channels and optimize them for better results. For that reason, Facebook Marketing Partners and other optimization solutions have emerged in the adtech market. In order to improve existing optimization solutions in the market, ad impression costs are predicted in this study by using different machine learning techniques and algorithms.
The main goal of this study is to generate a robust model for predicting CPM values on Facebook, and to use that model as an input for the existing optimization solution that Adphorus offers its clients. Adphorus is one of the Facebook Marketing Partners in the market.
Master Thesis Sentiment Analysis of Hürriyet Emlak (MEF Üniversitesi, Fen Bilimleri Enstitüsü, 2017) Korkmaz, Alev; Özlük, Özgür
Sentiment analysis refers to the natural language processing task of determining whether a piece of text contains subjective information and what that information expresses, that is, whether the attitude behind the text is positive, negative or neutral.
Master Thesis Game Recommendation System for Steam Platform (MEF Üniversitesi Fen Bilimleri Enstitüsü, 2021) Bayram, Serhan; Semra Ağralı
The increasing number of choices and growing competition in the markets force companies to differentiate the services they provide to their customers. Offering better services has a positive impact on customer loyalty, and to do so, companies should understand their customers’ interests and act accordingly. One popular method for this purpose is building recommendation engines to make personalized suggestions. In this project, collaborative filtering methods with implicit feedback are used to make recommendations to users of the Steam platform. The recommendation systems are built using two different matrix factorization techniques, Alternating Least Squares and Bayesian Personalized Ranking. Different models are created with the users' implicit playtime data, and the results are evaluated using the Precision at k metric. Additionally, similar items offered by the models are analyzed. Results show that the models are considerably successful at finding personal choices and similar items.
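As a rough illustration of the Precision at k metric mentioned above, the sketch below uses one common definition: the share of the top k recommended items that the user actually has. The game identifiers, the function name `precision_at_k`, and the toy lists are hypothetical, and the thesis or the library it used may define the metric slightly differently (some implementations divide by the smaller of k and the number of relevant items).

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that appear in `relevant`."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


# Hypothetical ranked recommendations and held-out library for one user.
recommended_games = ["game_a", "game_b", "game_c", "game_d"]
held_out_library = {"game_b", "game_d", "game_x"}

p_at_3 = precision_at_k(recommended_games, held_out_library, k=3)
print(f"precision@3 = {p_at_3:.3f}")
```

In practice the value would be averaged over all users in a held-out test set; here only `game_b` among the first three recommendations is in the user's library, so one of three recommendations counts as a hit.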
The best model finds the item in the libraries of 33% of the users.
Master Thesis Scoring Neighborhoods for Locating Atm Using Machine Learning (MEF Üniversitesi, Fen Bilimleri Enstitüsü, 2018) Yıldırım, Oğuzhan; Küçükaydın, Hande
Facility location is a general problem that matters in many sectors, and it matters even more when building the facility is costly. In this project we analyzed the neighborhoods of Turkey and built two different models to estimate which neighborhoods are good or bad for locating an ATM, which is costly for banks to build. We used demographic and socio-economic data of 4,504 neighborhoods in Turkey and built models using the Linear Regression and Decision Tree machine learning techniques to find the best neighborhoods for locating a new ATM for a new bank entering the market. We compared the results of the two methods, and the results showed that machine learning can successfully predict which neighborhoods are good for locating an ATM, without classical optimization techniques that require complex calculations.
Master Thesis Airbnb Host Recommendation Engine (MEF Üniversitesi Fen Bilimleri Enstitüsü, 2021) Arslan, Batuhan; Özgür Özlük
In this project, a fifth rule is proposed for Airbnb's super host selection, one that reflects guests' comments about hosts by using a recommendation system and sentiment analysis. The project aims to contribute to Airbnb's selection of super hosts. In this study, comment data are examined with sentiment analysis, and polarity scores are created for use in recommendation systems. A collaborative filtering method is used for the recommendation system. The FunkSVD algorithm achieved the best RMSE score. Polarity scores are estimated for each latent user by looking at the host and listing id.
The resulting recommendation system ranks the polarity scores of hosts for each user.
Master Thesis The Passanger Load Factor Prediction of Airline Transport (MEF Üniversitesi, Fen Bilimleri Enstitüsü, 2017) Karakoç, Kalender; Arslan, Şuayb Ş.
Turkish Airlines is one of the most preferred leading European air carriers, with global network coverage, thanks to its strict compliance with flight safety, reliability, product line, service quality and competitiveness. Turkish Airlines maintains its identity as the flag carrier of Turkey.
Master Thesis Text Classification Using Apache Spark (MEF Üniversitesi, Fen Bilimleri Enstitüsü, 2018) Azizoğlu, Umut Rezan; Özlük, Özgür
One of the biggest problems for enterprises that run a marketplace e-commerce business model with a social platform is improper communication on that platform, which harms the customer experience and damages the brand's value both materially and morally. Because the number of daily comments is too large to be read manually with a level of human resources that is optimal for company profitability, the comment modules in social marketplaces are left unmonitored. In this project, a model is established that blocks sentences that spoil the customer experience on these social platforms. Both the data preparation and the machine learning model were developed in a Databricks notebook, using the Apache Spark platform with the SparkML libraries and PySpark. The “Text Classification” approach is adopted when determining the model.
Master Thesis Gittigidiyor Basket Analysis (MEF Üniversitesi, Fen Bilimleri Enstitüsü, 2017) Yılmaz, Kerem
Data mining is becoming more important for many sectors and companies worldwide, because it can find patterns, correlations and anomalies in databases that help us make accurate decisions about the future. Data mining consists of various statistical analyses that reveal unknown aspects of the data.
Data mining encompasses a huge variety of statistical and computational techniques, such as market basket analysis, clustering, classification and regression analysis.
Master Thesis Price Prediction Using Machine Learning Techniques: an Application To Vacation Rental Properties (MEF Üniversitesi Fen Bilimleri Enstitüsü, 2021) Ay, Oğuz; Hande Küçükaydın
Pricing is a subjective process that depends highly on the person doing it. There is no general rule for pricing a house. That is why there are both overpriced and underpriced rental houses in listings on websites such as AirBnB. In order to reduce the effect of subjective pricing, a general machine learning model is built in this project to make more objective price predictions. In the literature, there are different machine learning models for making numeric predictions. Physical features of houses are used as input to make inferences about the price of a house. These machine learning models can identify the relations between features and price, and make predictions from the features of a newly listed house that has not been priced before. In this project, six different machine learning models are developed: linear regression, ridge regression, support vector regressor, random forest regressor, light gradient boosting machine regressor and extreme gradient boosting regressor. The performances of all models are compared, and the best model is selected for hyper-parameter tuning to make more accurate predictions.
Master Thesis Customer Segmentation of an Online Retailer (MEF Üniversitesi, Fen Bilimleri Enstitüsü, 2018) Öniz, Bengisu; Demir, Şeniz
Data about customers and their shopping habits is one of the most valuable assets of many organizations. Processing customer data, discovering unknown patterns, and getting useful results from them are the primary purposes of customer segmentation. This study aims to segment the customers of one of the leading apparel retail companies in Turkey.
The data gathered from the company's e-commerce web page consists of web analytics and product purchases of customers. K-means and agglomerative clustering are used to cluster the customer data, and the number of clusters is determined via different distance metrics and silhouette scores. Our analysis results show that there are differences in purchasing frequencies, quantities, campaign sensitivities, and site usage patterns among clusters. Since customers in the same cluster are expected to share common purchasing habits, we argue that this study would be of great use in churn analysis or in a product recommendation system.
Master Thesis Duplicate Record Detection: a Rule-Based Approach (MEF Üniversitesi, Fen Bilimleri Enstitüsü, 2017) Malkaralı, Gülce; Özgür Özlük
The study presents a rule-based algorithm to detect duplicate and near-duplicate records within a dataset that is extracted from a leading online reality platform.
Master Thesis Predicttion of Brent Oil Spot Prices Using Country Based Inventory and Trading Data (MEF Üniversitesi, Fen Bilimleri Enstitüsü, 2019) Usta, İsmail Batur; Ağralı, Semra
Crude oil price forecasting has been the focus of numerous authorities, yet the task remains a challenging one. The extremely volatile nature of the oil market and the high number of active players in it make it very difficult to establish a solid forecasting model that stays relevant over time. Recent advances in data technologies, mainly ever-increasing computing power and trending big data technologies, have allowed new approaches to emerge. From online learners to natural language processing, advanced data analytics models have been employed with the help of easily accessible and diverse data. This project is an attempt to make use of such available data in order to forecast the Brent oil spot price. Using monthly country-by-country inventory, trading and economic data, strong drivers of the crude price were explored.
The data used in this project comes from various sources and in multiple formats; the final merged data frame has over 17,000 observations and contains information on 86 countries. To enhance prediction power, a specialized learner is fit on each country individually, and the predictions are then accumulated and filtered before a single prediction is output. Compared to a single predictor, this approach enhanced the predictive power of the algorithm by adapting to the dynamics of each country.
Master Thesis Prediction of Credit Card Default (MEF Üniversitesi Fen Bilimleri Enstitüsü, 2021) Akalın, Selçuk; Utku Koç
As profitable customer acquisition becomes more and more critical for the banking sector in terms of competition, the need to predict customer defaults with different machine learning algorithms is increasing. Such practices can prevent possible losses. Because machine learning changes rapidly along with technology, its fields of application in different sectors are also changing and developing rapidly. In this study, the aim is to compare model outcomes and to identify areas that can be developed or researched further, by running different supervised and unsupervised machine learning algorithms on a final dataset. This dataset is obtained from an imbalanced credit card dataset through the following steps: discovering key points in exploratory data analysis, generating different features according to those key points, and eliminating the imbalance with different oversampling and undersampling methods.
Master Thesis Forecasting Organic Traffic With Different Source of Data (MEF Üniversitesi Fen Bilimleri Enstitüsü, 2021) Çolak, Mehtap; Özgür Özlük
In this project, results obtained from different data sets are compared for forecasting the organic traffic of a website.
Two different models were developed based on the data obtained from Google Search Console (GSC), Google Analytics (GA), Ahrefs and Google Trends, and trained with the XGBoost and Random Forest machine learning algorithms. Although the .. value and accuracy rate of the first model, developed on the GSC, GA and Ahrefs data obtained between 2019 and 2020, were high, it is not suitable for predictive analysis because the data sets consist of dependent variables. The second model was developed with Google Trends data for the brand and non-brand queries with the highest Impression value. The future trends of the relevant queries were predicted using the Prophet algorithm. Through this model, Impression values of the relevant website were estimated for the remainder of 2021.
Master Thesis Trangling Weratedogs Twitter Data To Create Interesting and Trustworthy Explosatory/Predictive Anaylses and Visulation Using Different Machine Learning Algorithms (MEF Üniversitesi, Fen Bilimleri Enstitüsü, 2018) Arı, Esra; Çakar, Tuna
Social media usage has grown rapidly in recent years, and the knowledge available in these environments has increased with it. As a result, exploratory and predictive analysis of the dense data on social media has become very popular. However, almost all of the large datasets obtained are uncleaned, raw data. Therefore, assessing and cleaning the data is at least as important as the exploratory and predictive analysis. For this thesis, the tweets of the open-source WeRateDogs Twitter account were gathered, assessed, cleaned and analyzed, and predictions were made from them. The study showed that the most important and most time-consuming part of predictive data analysis is data gathering and cleaning. In this project, the probability that a dog’s breed is retriever or not is predicted from the tweet’s text body. A 24-point increase (a 34% change) in accuracy was achieved by oversampling the data sets that contain few event observations.
At the same time, the decision tree, logistic regression and random forest algorithms are compared, and it is shown that the random forest model performs better than the others: it scores 13 points higher than logistic regression and 21 points higher than the decision tree.
Master Thesis Vote Transtition Analysis and Comparison of Turkish Local Elections in 2014 and 2019 (MEF Üniversitesi, Fen Bilimleri Enstitüsü, 2019) Baydoğan, Ufuk; Güney, Evren
How voters switched their votes relative to previous elections is always a topic of debate after Election Day. The Turkish local election of 2019 was important for three reasons: first, it was the first local election after Turkey adopted the new presidential system, and the President also participated in the election campaign for his party; second, the İstanbul election, originally held on March 31, was ordered to be rerun by the Supreme Election Council; and third, the electoral alliances had a significant impact on the results, with the votes for the People's Alliance collapsing significantly. This study presents a comparative analysis of the official 2014 and 2019 Turkish local election results, as well as the 2019 re-run election results of Istanbul, to understand the vote transitions. The outcomes show significant changes in the distribution of voting rates between these elections, especially in critical metropolitan areas. Using the aggregate-level vote counts, the vote transition probabilities between the elections are inferred using ecological inference. The proposed clustering approach on vote transition probabilities shows that CHP and IYI Party benefited from forming the Nation’s Alliance in most of the cities, mainly due to vote switches from HDP and MHP. In the re-run election, the slight vote difference between the alliances in March increased significantly.
This is mainly because of the contribution of absentees to the Nation’s Alliance, and because around 5% of those who supported the People’s Alliance in March are estimated to have voted for the Nation’s Alliance.

