Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.11779/1686
Title: Retail data predictive analysis using machine learning models
Other Titles: Makine öğrenmesi modellerini kullanarak tahmine dayalı perakende verisi analizi
Authors: Güner, Müjde
Advisors: Tuna Çakar
Keywords: Machine Learning, Sales Forecasting, Time Series, Predictive Analytics
Publisher: MEF Üniversitesi Fen Bilimleri Enstitüsü
Source: Güner, M. (2021). Retail Data Predictive Analysis Using Machine Learning Models. MEF Üniversitesi Fen Bilimleri Enstitüsü, Bilişim Teknolojileri Yüksek Lisans Programı. ss. 1-39
Abstract: Machine Learning (ML) is a popular field concerned with training a system on data (experience), performing a task (regression or classification), and evaluating the system against the desired performance metrics. ML automatically extracts useful and meaningful insights from data. ML models for sales prediction apply computational intelligence in many real-world applications such as the stock market, production, economics, weather, retail, and census analysis. Sales prediction can be viewed as a regression problem to which various algorithms can be applied. In this project, real-life data is analyzed to predict sales for four product categories: Cold Cereal, Bag Snacks, Oral Hygiene Products, and Frozen Pizza. Exploratory Data Analysis (EDA) is applied to the dataset to support accurate predictions even in an unpredictable environment. The phases of EDA used in this project are Data Preprocessing and Analysis, Feature Selection and Feature Extraction, Model Building and Regression Analysis, Clustering, Time Series Analysis, and Model Evaluation using performance metrics. For outlier detection, the Interquartile Range (IQR) method is used. For filter-based feature selection, univariate feature analysis with SelectKBest and SelectPercentile, together with a Decision Tree Regressor, is used. For wrapper-based feature selection, the Sequential Feature Selector method is used. For regression analysis, Linear Regression, XGBoost Regression, and Support Vector Regression (SVR) are analyzed. The K-Means clustering algorithm is applied to the dataset to generate 4 clusters. In the time series analysis, the week-end date and average weekly basket attributes are analyzed, and the sequential data is rendered for a given time period. In the model evaluation phase, the Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), R² and Adjusted R² are calculated and validated. The project is implemented in Anaconda, an open-source distribution that includes the Jupyter Notebook platform for scientific computing, using the Python programming language with packages such as NumPy, pandas, and scikit-learn.
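The IQR outlier rule and the evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not taken from the project itself: the function names, the conventional fence multiplier of 1.5, the quartile interpolation method, and all sample numbers are assumptions made for illustration, and only the Python standard library is used so that the formulas stay visible.

```python
import math
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR.

    k=1.5 is the conventional choice; the project's value is not stated.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def regression_metrics(y_true, y_pred, n_features):
    """Compute MAE, MSE, RMSE, R2 and adjusted R2 for one set of predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R2 penalizes R2 for the number of predictors used.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2, "adj_r2": adj_r2}


# Hypothetical weekly unit sales with one obvious outlier (400).
weekly_units = [12, 15, 14, 13, 16, 15, 14, 400]
low, high = iqr_bounds(weekly_units)
cleaned = [u for u in weekly_units if low <= u <= high]

# Hypothetical actual vs. predicted values from a model with 2 features.
y_true = [10.0, 12.0, 14.0, 16.0, 18.0]
y_pred = [11.0, 12.0, 13.0, 16.0, 19.0]
metrics = regression_metrics(y_true, y_pred, n_features=2)
```

In a scikit-learn workflow the same quantities would more commonly come from the library's metric functions; the hand-written versions here only show how the numbers are defined.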
URI: https://hdl.handle.net/20.500.11779/1686
Appears in Collections: FBE, Yüksek Lisans, Proje Koleksiyonu

Files in This Item:
File: FBE_BilişimTeknolojileri_MüjdeGüner.pdf
Description: Master's project file (YL-Proje Dosyası)
Size: 1.38 MB
Format: Adobe PDF

Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.