Examining the Relationship between Topic Model Similarity and Software Maintenance

ABSTRACT

Software maintenance is the last phase of software development, and typically one of the most time-consuming. One reason for this is the difficulty in finding related source code fragments. A high-level understanding of the source code is necessary to make decisions about which source code fragments should be modified together, for example, in the context of fixing a bug. Even with a similarity metric available, understanding what it means to measure similarity in the first place is important; if a technique suggests that two source code fragments are related, is there a human-oriented way of explaining that relation? In this paper, we attempt to identify a concrete link between software maintenance and the similarity metrics provided by latent topic models. We show that similarity in topic models is related to the likelihood that source code fragments will be modified together in the future, and that an awareness of similar source code can make software maintenance easier.

INTRODUCTION

Software development tends to be dominated by maintenance. One difficult problem in software maintenance involves predicting other source code fragments that should be considered when making a change. One approach to solving this problem involves tracking the maintenance history of code sections and assuming that code that has been changed together in the past may need to be changed together in the future. This approach has been shown to make good suggestions, leading to the possibility of using co-maintenance history as an evaluative source of data. While it is possible to observe past co-maintenance by examining the changelists in the history of a project, making meaningful predictions for the future often requires a long history. In this paper, we show that Latent Dirichlet Allocation, an unsupervised latent topic model, can be effective at predicting required changes. We demonstrate this by artificially omitting source code fragments from clusters of historically co-maintained fragments to simulate a forgotten change. By choosing arbitrary points in the project’s revision history and generating topic models based on that version of the source code, as seen in Figure , we use the observed future changelists to evaluate the model’s ability to predict co-maintenance. By basing our experiment on actual maintenance history, we show that in many cases, topic models are able to predict co-maintenance relationships without supervision. In essence, we can evaluate how well they predict what else we might have forgotten to change when making a revision.
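The "forgotten change" evaluation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each source code fragment already has a document-topic vector (for example, from an LDA model trained on one snapshot of the code), and it assumes cosine similarity as the similarity measure, which this excerpt does not specify. The fragment names, topic values, and the helper functions `rank_candidates` and `omitted_rank` are hypothetical.

```python
import math


def cosine(p, q):
    """Cosine similarity between two topic vectors of equal length."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def rank_candidates(topics, known):
    """Rank every fragment outside `known` by its mean similarity to `known`."""
    scores = {}
    for name, vec in topics.items():
        if name in known:
            continue
        scores[name] = sum(cosine(vec, topics[k]) for k in known) / len(known)
    return sorted(scores, key=scores.get, reverse=True)


def omitted_rank(topics, changelist, omitted):
    """Hide one fragment of a future changelist and return its 1-based rank."""
    known = [f for f in changelist if f != omitted]
    return rank_candidates(topics, known).index(omitted) + 1


# Hypothetical document-topic vectors (assumed LDA output for one snapshot).
topics = {
    "parser.c:parse_expr": [0.80, 0.15, 0.05],
    "parser.c:parse_term": [0.75, 0.20, 0.05],
    "lexer.c:next_token": [0.70, 0.10, 0.20],
    "net.c:open_socket": [0.05, 0.10, 0.85],
    "ui.c:draw_menu": [0.10, 0.80, 0.10],
}

# A hypothetical future changelist; one fragment is "forgotten" on purpose.
changelist = ["parser.c:parse_expr", "parser.c:parse_term", "lexer.c:next_token"]
rank = omitted_rank(topics, changelist, "lexer.c:next_token")
print(f"omitted fragment ranked #{rank} among candidates")
```

A low rank for the omitted fragment means the topic model would have suggested it early. Repeating this over many changelists and snapshots gives one way to measure how well topic similarity predicts co-maintenance.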


Year: 2014

Publisher: IEEE

By: Scott Grant, James R. Cordy

File information: English, 5 pages, 622 KB

