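Journal of the Korean Society for Information Management (정보관리학회지), Korean Society for Information Management (한국정보관리학회)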

P-ISSN1013-0799
E-ISSN2586-2073
KCI

Search term: usefulness, results: 33

A Study on a Usage-Frequency-Based Hybrid Recommendation System for Multimedia Contents

김용 (Chonbuk National University); 문성빈 (Yonsei University) 2006, Vol.23, No.3, pp.91-125 https://doi.org/10.3743/KOSIM.2006.23.3.091


Abstract (Korean version)

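With the explosive growth of information driven by advances in information technology and the Internet, selecting appropriate information amid information overload has become necessary. To meet this need, information retrieval and information filtering systems have emerged that search or filter information so that users can use it efficiently. As a more proactive response to these changes in the information environment, libraries and information centers are expected to offer users personalized information recommendation services, as part of their effort to deliver the information users want accurately and efficiently. This study proposes a way to build a personalized recommendation system that provides users with customized information, as a means of proactive information service in libraries and information centers. To this end, it analyzes the strengths and weaknesses of existing recommendation methods and, to address their problems, proposes a personalized hybrid recommendation method for multimedia contents that is based on users' content usage frequency in large-scale content and user environments. Specifically, it proposes a new type of recommendation method that separates the top users and top contents by usage frequency and applies an appropriate recommendation method to each, together with a way of combining association rules and collaborative filtering that suits large-scale recommendation systems.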

Abstract

Recent advances in information technology and the Internet have caused an explosive increase in the amount of information available and in the means of distributing it. However, this information overload has made efficient and accurate information search difficult for most users. To address this problem, information retrieval and filtering systems were developed as important tools for users. Libraries and information centers have been at the forefront of providing customized services that satisfy users' information needs in today's changing information environment. The aim of this study is to propose an efficient information service through which libraries and information centers can offer users a personalized recommendation system. The proposed method overcomes the weaknesses of existing systems by providing a personalized hybrid recommendation method for multimedia contents that is based on users' content usage frequency and works in large-scale data and user environments. The system based on the proposed hybrid method uses an effective framework to combine association rules with collaborative filtering.
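The abstracts state only that top users and contents are separated by usage frequency and that association rules are combined with collaborative filtering; they do not say which method serves which group, how the cut-off is chosen, or how scores are computed. The Python sketch below is therefore a hypothetical illustration under assumed choices: users in the top 20% by usage count get user-based collaborative filtering (cosine similarity over usage counts), and all other users get pairwise association rules filtered by confidence. The function names, the `top_fraction`, `k` and `min_conf` parameters, and the toy log format are assumptions, not the authors' method.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

Log = dict[str, list[str]]  # hypothetical: user -> content IDs used (repeats = repeated use)

def split_by_frequency(log: Log, top_fraction: float = 0.2) -> tuple[set[str], set[str]]:
    """Split users into top (heavy) and other users by total usage count."""
    ranked = sorted(log, key=lambda u: len(log[u]), reverse=True)
    cut = max(1, int(len(ranked) * top_fraction))
    return set(ranked[:cut]), set(ranked[cut:])

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(v * b[i] for i, v in a.items() if i in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cf_scores(log: Log, user: str, k: int = 2) -> Counter:
    """User-based collaborative filtering over usage-count vectors."""
    vecs = {u: Counter(items) for u, items in log.items()}
    target = vecs[user]
    sims = {u: cosine(target, v) for u, v in vecs.items() if u != user}
    neighbours = sorted(sims, key=sims.get, reverse=True)[:k]
    scores: Counter = Counter()
    for u in neighbours:
        for item, count in vecs[u].items():
            if item not in target:  # only recommend contents the user has not used
                scores[item] += sims[u] * count
    return scores

def rule_scores(log: Log, user: str, min_conf: float = 0.5) -> Counter:
    """Pairwise association rules x -> y scored by confidence supp(x, y) / supp(x)."""
    baskets = [set(items) for items in log.values()]
    item_supp = Counter(i for b in baskets for i in b)
    pair_supp = Counter(p for b in baskets for p in combinations(sorted(b), 2))
    owned = set(log[user])
    scores: Counter = Counter()
    for (a, b), s in pair_supp.items():
        for x, y in ((a, b), (b, a)):
            conf = s / item_supp[x]
            if x in owned and y not in owned and conf >= min_conf:
                scores[y] = max(scores[y], conf)
    return scores

def recommend(log: Log, user: str, n: int = 3) -> list[str]:
    """Assumed routing: CF for top users, association rules for everyone else."""
    top_users, _ = split_by_frequency(log)
    scores = cf_scores(log, user) if user in top_users else rule_scores(log, user)
    return [item for item, _ in scores.most_common(n)]
```

The routing reflects one plausible rationale rather than the paper's stated design: heavy users have dense usage vectors on which neighbour similarity is informative, while light users may have used only one or two contents, so rules mined from all users' baskets can still fire from a single consumed item.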

A Study on Optimizing the Number of Training Documents in Text Categorization

심경 (Iris.Net) 2006, Vol.23, No.4, pp.277-294 https://doi.org/10.3743/KOSIM.2006.23.4.277


Abstract (Korean version)

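Using a kNN classifier, this study examined how well categorization performs when categorization techniques are applied to document classification in a real system environment, and what the ideal size of the training document set per category is for achieving adequate categorization performance. As the experimental collection, eight categories each containing 2,556 or more documents were selected from a database of about 150,000 records in actual service. From these, sub-collections with eight training-set sizes were built by increasing the number of training documents per category stepwise from 20 (Tr20) to 2,000 (Tr2000), and categorization performance was compared across the sub-collections by training-set size. The macro-averaged performance of the eight sub-collections was an F1 of 30%, falling short of the typical kNN performance reported in previous studies. Of the eight sub-collections tested, Tr100, with 100 training documents per category and an F1 of 31%, was judged the most cost-effective training-set size for classifier training. In addition, the accuracy of the subject categories assigned to the experimental collection was checked through manual reclassification, and the reliability of the above conclusion was strengthened on the basis of the relationship between this accuracy and per-category categorization performance.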

Abstract

This paper examines the level of categorization performance achievable on a real-life collection of science and technology article abstracts, and tests the optimal number of documents per category in a training set using a kNN classifier. The corpus is built by first choosing eight categories that each hold more than 2,556 documents and then randomly selecting 2,556 documents per category. It is further divided into eight subsets with different training-set sizes, each built by randomly selecting from 20 (Tr20) to 2,000 (Tr2000) training documents per category. The categorization performance of the eight subsets is then compared. The macro-averaged performance of the eight subsets is an F1 of 30%, which is relatively poor compared with the findings of previous studies. The experimental results suggest that, among the eight subsets, Tr100 (100 training documents per category, F1 of 31%) is the most cost-effective size for training a kNN classifier. In addition, the correctness of the subject categories assigned to the training sets is checked by manually reclassifying them, and the relationship between this correctness and categorization performance is used to support the above conclusion.
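As a rough illustration of the experimental setup, the sketch below draws per-category subsamples of a labeled collection (the analogue of Tr20 through Tr2000), classifies with kNN, and scores the result with macro-averaged F1. It is a minimal sketch under assumptions not taken from the paper: whitespace tokenization, raw term-frequency vectors compared by cosine similarity, similarity-weighted voting among the k nearest neighbours, and hypothetical names such as `subsample`, `knn_predict`, and `macro_f1`.

```python
import random
from collections import Counter, defaultdict
from math import sqrt

Doc = tuple[str, str]  # (text, category label)

def tf(text: str) -> Counter:
    """Raw term-frequency vector from lowercase whitespace tokens (assumed)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def subsample(train: list[Doc], per_cat: int, seed: int = 0) -> list[Doc]:
    """Randomly keep at most per_cat training documents per category (e.g. Tr20, Tr100)."""
    rng = random.Random(seed)
    by_cat: dict[str, list[Doc]] = defaultdict(list)
    for doc in train:
        by_cat[doc[1]].append(doc)
    out: list[Doc] = []
    for label in sorted(by_cat):
        docs = by_cat[label]
        out.extend(rng.sample(docs, min(per_cat, len(docs))))
    return out

def knn_predict(train: list[Doc], text: str, k: int = 3) -> str:
    """Label with the largest similarity-weighted vote among the k nearest documents."""
    q = tf(text)
    nearest = sorted(((cosine(q, tf(t)), label) for t, label in train), reverse=True)[:k]
    votes: Counter = Counter()
    for sim, label in nearest:
        votes[label] += sim
    return votes.most_common(1)[0][0]

def macro_f1(gold: list[str], pred: list[str]) -> float:
    """Unweighted mean of per-category F1 (macro-averaging)."""
    scores = []
    for c in set(gold) | set(pred):
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)
```

Calling `subsample(train, n)` for each candidate size n and evaluating `macro_f1` on a fixed test set reproduces the shape of the comparison; the paper's actual term weighting, value of k, and intermediate training sizes are not given in the abstract.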

Correctness of Categories Pre-assigned to the Training Set and Text Categorization Performance

심경 (Systems R&D Center, Iris.Net); 정영미 (Yonsei University) 2006, Vol.23, No.2, pp.265-285 https://doi.org/10.3743/KOSIM.2006.23.2.265


Abstract (Korean version)

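Text categorization assumes that the subject categories assigned to the training set are correct to a certain degree. However, this assumption is made without knowledge of real-world collections. This study examines the level of correctness of subject categories pre-assigned in a real-world collection and attempts to identify the relationship between the correctness of the subject categories pre-assigned to the training set and text categorization performance. In particular, it seeks to determine how far categorization performance can be improved by raising the quality of the subject categories assigned to the training set through manual re-indexing. To this end, 1,150 abstract records in science and technology were re-indexed by an expert group; after 15 duplicate documents were removed, the records were split into a training set of 907 documents and a test set of 227 documents. The categorization performance of the collection before re-indexing (the Initial set) and after re-indexing (Recat-1 and Recat-2) was compared using a kNN classifier. The average correctness of category assignment in the Initial set was 16%, and its categorization performance was an F1 of 17%. In contrast, the Recat-1 set, whose subject-category correctness had been improved, reached an F1 of 61%, 3.6 times the performance of the Initial set.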

Abstract

In text categorization, a certain level of correctness is assumed for the labels assigned to training documents, without solid knowledge of how correct such labels actually are in real-world collections. Our research explores the quality of pre-assigned subject categories in a real-world collection and identifies the relationship between the quality of category assignment in the training set and text categorization performance. In particular, we examine to what extent performance can be improved by enhancing the quality (i.e., correctness) of category assignment in the training documents. A collection of 1,150 abstracts in computer science is reclassified by an expert group and, after 15 duplicates are removed, divided into 907 training documents and 227 test documents. The performance of the collection before reclassification (the Initial set) and after reclassification (the Recat-1 and Recat-2 sets) is compared using a kNN classifier. The average correctness of subject categories in the Initial set is 16%, and categorization with the Initial set yields an F1 of 17%. By contrast, the Recat-1 set scores an F1 of 61%, 3.6 times that of the Initial set.
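The headline figures can be checked with simple arithmetic using only numbers stated in the abstracts: the "3.6 times" claim is the ratio of the two F1 values rather than a percentage-point difference, and the 907/227 split amounts to roughly an 80/20 division. The variable names below are illustrative.

```python
# Figures as reported in the abstract.
initial_f1 = 0.17   # F1 of the Initial set
recat1_f1 = 0.61    # F1 of the Recat-1 set
train_docs, test_docs = 907, 227

# "3.6 times" is a ratio of F1 values, not a difference in percentage points.
ratio = recat1_f1 / initial_f1
gain_pp = (recat1_f1 - initial_f1) * 100

# Share of the de-duplicated collection used for training.
train_share = train_docs / (train_docs + test_docs)

print(f"Recat-1 / Initial F1 ratio: {ratio:.2f}")         # 3.59, rounded to 3.6 in the abstract
print(f"absolute gain: {gain_pp:.0f} percentage points")  # 44
print(f"training share: {train_share:.1%}")               # 80.0%
```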
