Trangling Weratedogs Twitter Data To Create Interesting and Trustworthy Explosatory/Predictive Anaylses and Visulation Using Different Machine Learning Algorithms

Arı, Esra

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.11779/1168

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Çakar, Tuna	-
dc.contributor.author	Arı, Esra	-
dc.date.accessioned	2019-11-12T13:42:00Z	-
dc.date.available	2019-11-12T13:42:00Z	-
dc.date.issued	2018	-
dc.identifier.citation	Arı, E. (2018). Trangling weratedogs Twitter data to create interesting and trustworthy explosatory/predictive anaylses and visulation using different machine learning algorithms, MEF Üniversitesi Fen Bilimleri Enstitüsü, İstanbul, Türkiye	en_US
dc.identifier.uri	https://hdl.handle.net/20.500.11779/1168	-
dc.description.abstract	Social media usage has grown rapidly in recent years, and the volume of information available on these platforms has grown with it. As a result, exploratory and predictive analysis of social media data has become popular. However, almost all large datasets obtained this way are raw and uncleaned, so assessing and cleaning the data is at least as important as the exploratory and predictive analysis itself. For this thesis, tweets from the open source WeRateDogs Twitter account were gathered, assessed, cleaned, analyzed, and used for prediction. The study found that data gathering and cleaning are the most important and most time-consuming parts of predictive data analysis. The outcome of the project is a model that predicts, from the text body of a tweet, the probability that the dog's breed is a retriever. Oversampling the data sets that contain few event observations increased accuracy by 24 points (a 34% change). The decision tree, logistic regression, and random forest algorithms were also compared, and random forest showed the best model performance: its accuracy is 13 points higher than that of logistic regression and 21 points higher than that of the decision tree.	en_US
dc.description.abstract	The growth of social media usage in recent years has increased the amount of information accumulated on these platforms. This growing density of information has made it popular to obtain data from social media and use it for both exploratory and predictive analysis. However, almost all of the large datasets obtained are uncleaned, raw data. Cleaning and examining the data correctly is therefore at least as important as the exploratory and predictive analysis. For this graduation project, the tweets of the open source WeRateDogs Twitter account were used in order to gather dirty data from different sources, assess it, clean it, and carry out exploratory and predictive analysis. The study showed that the most important and most time-consuming part of predictive data analysis is in fact gathering and cleaning the data. As the output of this project, whether a dog's breed is retriever or not was predicted using only the text of the tweet. During prediction, oversampling the data sets containing few event observations raised the model's accuracy by 24 points. In addition, the decision tree, logistic regression, and random forest algorithms were compared, and random forest was found to perform better than the decision tree models. Accordingly, the random forest model achieved accuracy 21 points higher than the decision tree model and 13 points higher than the logistic regression model.	en_US
dc.language.iso	en	en_US
dc.publisher	MEF Üniversitesi, Fen Bilimleri Enstitüsü	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Text-Hashing	en_US
dc.subject	Data Wrangling	en_US
dc.subject	WeRateDogs	en_US
dc.subject	Machine Learning	en_US
dc.subject	Twitter Data	en_US
dc.subject	Principal Component Analysis	en_US
dc.subject	Random Forest	en_US
dc.subject	Decision Tree	en_US
dc.subject	Logistic Regression	en_US
dc.subject	Azure Machine Learning Studio	en_US
dc.subject	Veri İnceleme	en_US
dc.subject	Makine Öğrenmesi	en_US
dc.subject	Twitter Verisi	en_US
dc.subject	Principal Component Analizi	en_US
dc.subject	Karar Ağacı	en_US
dc.subject	Lojistik Regresyon	en_US
dc.title	Trangling Weratedogs Twitter Data To Create Interesting and Trustworthy Explosatory/Predictive Anaylses and Visulation Using Different Machine Learning Algorithms	en_US
dc.title.alternative	Farklı makine öğrenme algoritmalarını kullanarak weratedogs twitter hesabının verilerinin keşfedici ve tahminsel analizlerinin yapılması ve görselleştirilmesi	en_US
dc.type	Master's Degree Project	en_US
dc.identifier.wosquality	N/A	-
dc.identifier.scopusquality	N/A	-
dc.relation.publicationcategory	YL-Bitirme Projesi	en_US
dc.department	Büyük Veri Analitigi Yüksek Lisans Programı	en_US
dc.institutionauthor	Arı, Esra	-
item.languageiso639-1	en	-
item.fulltext	With Fulltext	-
item.grantfulltext	open	-
item.openairecristype	http://purl.org/coar/resource_type/c_18cf	-
item.cerifentitytype	Publications	-
item.openairetype	Master's Degree Project	-
Appears in Collections:	FBE, Yüksek Lisans, Proje Koleksiyonu
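
The abstract credits oversampling of the low-event class with a 24-point accuracy gain but does not say how the oversampling was done. The sketch below shows one common variant, random oversampling with replacement, using only the Python standard library. It is an illustration under stated assumptions and not the author's implementation (the subject keywords suggest the work was carried out in Azure Machine Learning Studio); the function name `random_oversample` and the toy tweets are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is grown to the size of the largest one.
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample with replacement to fill the gap (empty for the largest class).
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy data: tweet texts labelled 1 (retriever) or 0 (other breed).
tweets = [
    "golden boy 13/10",
    "pupper 12/10",
    "doggo 11/10",
    "floof 10/10",
    "retriever 14/10",
]
is_retriever = [1, 0, 0, 0, 1]

balanced_tweets, balanced_labels = random_oversample(tweets, is_retriever)
print(Counter(balanced_labels))  # both classes now have the same count
```

Oversampling of this kind is normally applied to the training split only, so that duplicated rows do not leak into the data used to measure accuracy.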