Authors: Bektas, Busra; Gultekin, Ali Ozgun; Ozdemiroglu, Emre; Yilmaz, Zeynep; Dikici, Buse; Demir, Seniz
Date Accessioned: 2025-10-05
Date Available: 2025-10-05
Date Issued: 2025
ISBN: 9798331566555
DOI: https://doi.org/10.1109/SIU66497.2025.11112421
Handle: https://hdl.handle.net/20.500.11779/3095
Institution: Isik University
Abstract: In recent years, large language models have demonstrated extraordinary capabilities in natural language processing tasks. The integration of these models into text summarization has highlighted the need to evaluate varying model performances within a standardized benchmarking framework. In this study, the performance of different large language models in generating abstracts of scientific papers, which have a common structure and distinctive language, is compared through an extensive experimental analysis. The abstracts automatically generated by these models using prompt engineering were evaluated with various metrics based on content overlap and semantic similarity. The results we obtained demonstrate the effectiveness of large language models in abstract generation.
Rights: © 2025 Elsevier B.V., All rights reserved.
Language: Turkish (tr)
Access Rights: info:eu-repo/semantics/closedAccess
Keywords: Benchmarking; Large Language Models; Scientific Publications; Text Summarization; Abstracting; Computational Linguistics; Natural Language Processing Systems; Semantics; Text Processing; Language Model; Language Processing; Large Language Model; Modeling Performance; Natural Languages; Performance; Scientific Papers; Text Summarisation; Varying Models
Title (Turkish): Dil Modelleri ile Akademik Özet Üretimi
Title (English): Academic Abstract Generation With LLMs
Type: Conference Object
Scopus ID: 2-s2.0-105015586999
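
Note: The abstract states that generated abstracts were evaluated with metrics based on content overlap and semantic similarity, but the record does not name the specific metrics. The following is a minimal illustrative sketch only, assuming ROUGE as the content-overlap metric and sentence-embedding cosine similarity as the semantic-similarity metric; the package choices (rouge-score, sentence-transformers) and the embedding model name are assumptions, not details from the record.

    # Sketch, not the authors' evaluation pipeline: ROUGE for content overlap,
    # sentence-embedding cosine similarity for semantic similarity.
    from rouge_score import rouge_scorer
    from sentence_transformers import SentenceTransformer, util

    def score_abstract(generated: str, reference: str) -> dict:
        """Compare a model-generated abstract against the original abstract."""
        # Content overlap: ROUGE-1 and ROUGE-L F1 scores.
        scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
        rouge = scorer.score(reference, generated)

        # Semantic similarity: cosine similarity of sentence embeddings.
        # "all-MiniLM-L6-v2" is a hypothetical model choice for illustration.
        model = SentenceTransformer("all-MiniLM-L6-v2")
        emb_gen, emb_ref = model.encode([generated, reference], convert_to_tensor=True)
        semantic = util.cos_sim(emb_gen, emb_ref).item()

        return {
            "rouge1_f1": rouge["rouge1"].fmeasure,
            "rougeL_f1": rouge["rougeL"].fmeasure,
            "semantic_similarity": semantic,
        }

    if __name__ == "__main__":
        print(score_abstract(
            "Large language models are compared on generating scientific abstracts.",
            "We benchmark several large language models on academic abstract generation.",
        ))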