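정보관리학회지 [Journal of the Korean Society for Information Management], 한국정보관리학회 [Korean Society for Information Management]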

1

김용환 (Yonsei University); 정영미 (Yonsei University) 2012, Vol.29, No.2, pp.155-171 https://doi.org/10.3743/KOSIM.2012.29.2.155

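Abstract (translated from the Korean)

A common problem in text categorization is that a term central to a document is not selected as a classification feature if it does not appear in the training document set, and that synonyms with different surface forms are treated as separate features. This study used Wikipedia to convert the synonyms appearing in documents into a single classification feature and to replace input-document terms that do not occur in the training document set with the most similar training-document term, with the aim of improving categorization performance. The feature selection experiments combined three conditions: (1) whether category information was used when extracting non-training terms, (2) the method of measuring term similarity (the title and body of Wikipedia articles, category information, or link information), and (3) the similarity measure (simple co-occurrence frequency or normalized co-occurrence frequency). When non-training terms were replaced by the training term with the highest similarity at or above the similarity threshold and documents were classified with a kNN classifier, categorization performance improved by 0.35% to 1.85% under every combination of conditions. Although the improvement was not large, the results confirmed that selecting classification features with the help of Wikipedia is effective.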

Abstract

In text categorization, core terms of an input document are unlikely to be selected as classification features if they do not occur in the training document set. In addition, synonymous terms that express the same concept are usually treated as different features. This study aims to improve text categorization performance by using Wikipedia to integrate synonyms into a single feature and to replace input terms that are absent from the training document set with the most similar term occurring in the training documents. For the selection of classification features, experiments were performed in settings that combine three conditions: whether category information is used when extracting non-training terms, the part of Wikipedia used for measuring term-term similarity, and the type of similarity measure. The categorization performance of a kNN classifier improved by 0.35% to 1.85% in F1 value in all experimental settings when non-training terms were replaced by the training term with the highest similarity above the threshold value. Although the improvement is not as high as expected, several semantic and structural devices of Wikipedia could be used to select more effective classification features.
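The replacement step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the nested-dictionary similarity table, and the choice to drop terms that have no sufficiently similar training term are all assumptions, and the Wikipedia-based similarity scores are taken as given rather than computed.

```python
from typing import Dict, Iterable, List, Set


def replace_non_training_terms(
    doc_terms: Iterable[str],
    training_vocab: Set[str],
    similarity: Dict[str, Dict[str, float]],
    threshold: float,
) -> List[str]:
    """Map each term missing from the training vocabulary to its most
    similar training term, if that similarity reaches the threshold.

    `similarity[a][b]` is a precomputed term-term similarity (hypothetical
    input; the paper derives such scores from Wikipedia).
    """
    result: List[str] = []
    for term in doc_terms:
        if term in training_vocab:
            result.append(term)  # already usable as a classification feature
            continue
        # Keep only candidates that actually occur in the training vocabulary.
        candidates = {
            t: s for t, s in similarity.get(term, {}).items() if t in training_vocab
        }
        if candidates:
            best = max(candidates, key=candidates.get)
            if candidates[best] >= threshold:
                result.append(best)  # substitute the closest training term
        # Otherwise the term is dropped (an assumption of this sketch).
    return result


if __name__ == "__main__":
    sim = {"automobile": {"car": 0.9, "road": 0.4}, "zebra": {"car": 0.05}}
    print(replace_non_training_terms(["car", "automobile", "zebra"], {"car", "road"}, sim, 0.3))
```

The rewritten term list would then be vectorized and passed to the classifier in the usual way; the threshold controls how aggressively unseen terms are folded into existing features.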

2

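디지털도서관 구축과정에서 TREC 텍스트 문서의 시각적 표현에 관한 연구 [A Study on the Visual Representation of TREC Text Documents in the Construction of Digital Libraries]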

정기태 (Assistant Professor, University of Oklahoma School of Library and Information Studies); 박일종 (Keimyung University) 2004, Vol.21, No.3, pp.1-14 https://doi.org/10.3743/KOSIM.2004.21.3.001

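Abstract (translated from the Korean)

When users search for similar documents they are helped by visual representations of those documents, and all information retrieval research proposes solutions intended to meet users' diverse needs. The solutions proposed so far range from alphabetically arranged papyrus documents to card catalogs, microfilm storage, and files kept on computer disks. In addition, most information retrieval systems use a document surrogate, that is, bibliographic material such as a summary, table of contents, abstract, review, or machine-readable cataloging (MARC) record, in place of the full text. This paper explores another form of document surrogate, built by grouping term lists. These document surrogates are plotted as coordinates on a two-dimensional graph using Multidimensional Scaling (MDS). The distance between coordinates on the graph can be interpreted as the similarity of the documents: the closer two points are, the more similar the content of the two documents.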

Abstract

Visualization of documents helps users when they search for similar documents, and all research in information retrieval addresses the problem of a user with an information need facing a data source that contains an acceptable solution to that need. In various contexts, adequate solutions to this problem have included alphabetized cubbyholes housing papyrus rolls, microfilm registers, card catalogs, and inverted files coded onto discs. Many information retrieval systems rely on a document surrogate. Though they might be surprised to discover it, nearly every information seeker uses an array of document surrogates. Summaries, tables of contents, abstracts, reviews, and MARC records are all document surrogates. That is, they stand in for a document, allowing a user to make some decision regarding it, such as whether to retrieve a book from the stacks or whether to read an entire article. In this paper another type of document surrogate, built with a method for grouping term lists, is investigated. Using Multidimensional Scaling (MDS), those surrogates are visualized on a two-dimensional graph. The distances between points on the graph represent the similarity of the documents: the closer two points are, the more similar the documents.
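To make the MDS step concrete, the sketch below places items on a plane from a matrix of pairwise distances. The abstract does not say which MDS variant was used, so classical (Torgerson) MDS is assumed here, with the two leading eigenvectors found by power iteration so that the example needs only the standard library. The function names and the requirement of a symmetric, roughly Euclidean distance matrix are assumptions of this sketch.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def _top_eigenpair(b: Matrix, iterations: int = 500, seed: int = 0) -> Tuple[float, List[float]]:
    """Power iteration: dominant eigenvalue and unit eigenvector of a symmetric matrix."""
    rng = random.Random(seed)
    n = len(b)
    v = [rng.random() for _ in range(n)]
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # nothing left to extract
            return 0.0, [0.0] * n
        v = [x / norm for x in w]
    w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(vi * wi for vi, wi in zip(v, w)), v  # Rayleigh quotient


def classical_mds(dist: Matrix, dims: int = 2) -> Matrix:
    """Embed n items in `dims` dimensions from a symmetric distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    total_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J
    b = [
        [-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean) for j in range(n)]
        for i in range(n)
    ]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        lam, v = _top_eigenpair(b)
        scale = math.sqrt(max(lam, 0.0))  # negative eigenvalues are clipped
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflate so the next pass finds the next eigenvector.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


if __name__ == "__main__":
    pts = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
    d = [[math.dist(a, c) for c in pts] for a in pts]
    for row in classical_mds(d):
        print([round(x, 3) for x in row])
```

For document surrogates, the input matrix would hold dissimilarities between term lists; points that land close together on the resulting plot correspond to documents with similar content.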

3

분포유사도를 이용한 문헌클러스터링의 성능향상에 대한 연구 [A Study on Improving Document Clustering Performance Using Distributional Similarity]

이재윤 (Kyonggi University) 2007, Vol.24, No.4, pp.267-283 https://doi.org/10.3743/KOSIM.2007.24.4.267

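Abstract (translated from the Korean)

This study explored whether distributional similarity can replace the traditional cosine similarity formula in document clustering. It devised a way of applying to document vectors three formulas derived from KL divergence, the representative distributional similarity measure: Jensen-Shannon divergence, symmetric skew divergence, and minimum skew divergence. To verify clustering performance with distributional similarity, two experiments were prepared and run on three test collections. In the first document clustering experiment, minimum skew divergence clearly outperformed not only cosine similarity but also the other divergence formulas. In the second experiment, second-order distributional similarity was computed from the first-order similarity matrix using the Pearson correlation coefficient, and documents were then clustered. The results showed that second-order distributional similarity gave better clustering performance overall. When both processing time and clustering performance matter, the minimum skew divergence formula proposed in this study is the preferable choice; when only clustering performance matters, the second-order distributional similarity approach is preferable.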

Abstract

In this study, measures of distributional similarity such as KL-divergence are applied to cluster documents instead of the traditional cosine measure, which is the most prevalent vector similarity measure for document clustering. Three variations of KL-divergence are investigated: Jensen-Shannon divergence, symmetric skew divergence, and minimum skew divergence. To verify the contribution of distributional similarities to document clustering, two experiments were designed and carried out on three test collections. In the first experiment the clustering performance of the three divergence measures was compared with that of the cosine measure. The results showed that minimum skew divergence outperformed the other divergence measures as well as the cosine measure. In the second experiment, second-order distributional similarities were calculated with the Pearson correlation coefficient from the first-order similarity matrices. Second-order distributional similarities were found to improve the overall performance of document clustering. These results suggest that minimum skew divergence should be selected as the document vector similarity measure when both time and accuracy matter, and that second-order similarity is a good choice when only clustering accuracy matters.
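The divergence measures named above can be illustrated with a short sketch over normalized term-frequency vectors. KL and Jensen-Shannon divergence follow their standard definitions. The skew divergence uses the common form KL(p || αq + (1 − α)p) with α = 0.99 as an assumed default, and the abstract does not define the symmetric and minimum variants, so taking the average and the minimum of the two directions is an assumption of this sketch rather than the paper's stated formula.

```python
import math
from typing import List, Sequence


def normalize(counts: Sequence[float]) -> List[float]:
    """Turn raw term frequencies into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q); requires q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: average KL to the midpoint distribution."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def skew_divergence(p: Sequence[float], q: Sequence[float], alpha: float = 0.99) -> float:
    """KL from p to q smoothed with a little of p, so zeros in q are harmless."""
    mix = [alpha * qi + (1 - alpha) * pi for pi, qi in zip(p, q)]
    return kl_divergence(p, mix)


def symmetric_skew_divergence(p: Sequence[float], q: Sequence[float], alpha: float = 0.99) -> float:
    """Average of the two directions (assumed definition)."""
    return 0.5 * (skew_divergence(p, q, alpha) + skew_divergence(q, p, alpha))


def minimum_skew_divergence(p: Sequence[float], q: Sequence[float], alpha: float = 0.99) -> float:
    """Smaller of the two directions (assumed definition)."""
    return min(skew_divergence(p, q, alpha), skew_divergence(q, p, alpha))


if __name__ == "__main__":
    doc_a = normalize([2, 2, 0])
    doc_b = normalize([0, 3, 3])
    print(js_divergence(doc_a, doc_b))
    print(symmetric_skew_divergence(doc_a, doc_b))
    print(minimum_skew_divergence(doc_a, doc_b))
```

All three functions return dissimilarities, so a clustering routine that expects similarities would need them negated or otherwise inverted first.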

4

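이용자 중심의 주제어 기반 분류를 위한 주제명 개발에 관한 연구: 지식조직체계 분석을 바탕으로 [A Study on Developing Subject Headings for User-Centered, Subject-Term-Based Classification: Based on an Analysis of Knowledge Organization Systems]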

백지원 (Ewha Womans University) 2011, Vol.28, No.1, pp.171-193 https://doi.org/10.3743/KOSIM.2011.28.1.171

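Abstract (translated from the Korean)

This study discusses the need to develop subject headings, which is essential when a library collection is to be classified by subject terms rather than by an existing bibliographic classification scheme, and explores, as one development methodology, the feasibility of drawing on the subject terms of various existing knowledge organization systems. To this end, works were selected for analysis, and the subject terms assigned to them in a range of knowledge organization systems were collected and compared: bibliographic classification, subject headings, the classifications, shelf names, and subject terms of large domestic and foreign bookstores, and user tags. The analysis confirmed that traditional library-centered knowledge organization systems and commercially oriented ones differ in character and in how they categorize. User tags with the highest frequencies hardly differed in vocabulary from the traditional and commercial knowledge organization systems, yet they showed distinctive characteristics as user-centered subject terms. On this basis, the study analyzed the characteristics and interrelationships to consider when using existing knowledge organization systems to build subject headings that replace classification, and proposed practical considerations for application in Korea.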

Abstract

This study aims to analyse the necessity of constructing subject headings for word-based classification and to suggest a methodology that uses various knowledge organization systems (KOS). For this purpose, six kinds of KOS were collected for the 20 selected works in each subject. The collected subjects were analysed in terms of constructing a subject heading for word-based classification. The analysis shows a noticeable difference between library-oriented KOS and commercially oriented KOS. In addition, user-oriented tags are closer to the commercial sector's KOS than to the library-oriented ones in how they categorize subjects. However, there is no noticeable difference among library-oriented KOS, commercially oriented KOS, and user-oriented tags regarding subject vocabulary. Based on these findings, some practical implications are suggested for application in Korean libraries.

5

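연구지원 정보서비스를 위한 히스토리오그래프와 SPLC 활용에 관한 실험적 연구: LED 분야 사례를 중심으로 [An Experimental Study on Using Historiographs and SPLC for Research Support Information Services: The Case of the LED Field]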

유소영 (Hannam University) 2013, Vol.30, No.3, pp.273-296 https://doi.org/10.3743/KOSIM.2013.30.3.273

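Abstract (translated from the Korean)

This study takes an exploratory look at how to set the data scope and the citation frequency threshold when SPLC (Search Path Link Count) analysis is applied to develop a research support information service that provides the core, global research trends of a particular subject field. For this purpose, five datasets were built from 2,318 papers in the RGB LED field retrieved from Web of Science and 20,109 papers citing them. From each dataset, 28 major research trend networks were extracted by varying the citation frequency threshold for the historiograph and SPLC network, and the effects of including citing documents and of the threshold setting on the SPLC network were examined. In addition, a list of 198 major papers included in the SPLC networks was given to researchers at a particular institution and their feedback was collected, to see whether global research trends match individual researchers' information needs. The analysis showed that the extracted SPLC network changed depending on whether citing documents were included and on the citation frequency threshold, but converged at certain threshold values. Individual researchers' information needs generally agreed with the global research trends provided through SPLC, apart from differences in publication year, which suggests that SPLC analysis that includes citing documents and varies the citation frequency threshold can supply the global research information that individual users want. Generalizing this finding will require follow-up studies that apply the method proposed in this exploratory study to a variety of fields.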

Abstract

The purpose of this study is to examine the data coverage and citation threshold for analyzing SPLC (Search Path Link Count) as a main path of a historiograph of a certain topic in order to provide ‘core’ papers of global research trends to a researcher affiliated with a local R&D institution. Five datasets were constructed by retrieving and collecting 2,318 articles on RGB LED published from 1990 to 2013 on Web of Science and 20,109 articles that cited these original 2,318. The SPLC analysis was performed on each dataset by increasing the threshold of citation counts, and the changes and resilience of the 28 extracted networks were compared. User feedback on 198 unique core papers from the 28 SPLC networks, received from LED researchers affiliated with a Korean government-sponsored research institution, was also analyzed. The nodes in each SPLC network in each dataset were found to differ with the citation counts, while the changes in the structure of the SPLC networks were slight once the networks’ citation counts were set at 40. Additionally, the user feedback showed that personalized research interest generally matched the global research trends identified by the SPLC analysis.
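As a rough illustration of the SPLC weighting mentioned above, the sketch below scores each citation link in a small acyclic network. It assumes one common formulation: the weight of a link u → v is the number of paths that start at any node and end at u (counting u itself) multiplied by the number of paths from v to a sink. The edge direction (cited paper to citing paper), the function name, and the toy data are assumptions; the abstract does not spell out the study's exact procedure or tooling.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]  # (cited paper, citing paper): direction of knowledge flow


def splc(edges: Iterable[Edge]) -> Dict[Edge, int]:
    """Search Path Link Count for every edge of an acyclic citation network."""
    edge_list = list(edges)
    succ: Dict[str, List[str]] = defaultdict(list)
    pred: Dict[str, List[str]] = defaultdict(list)
    nodes = set()
    for u, v in edge_list:
        succ[u].append(v)
        pred[v].append(u)
        nodes.update((u, v))

    # Topological order (Kahn's algorithm).
    indegree = {n: len(pred[n]) for n in nodes}
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    order: List[str] = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in succ[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    if len(order) != len(nodes):
        raise ValueError("citation graph contains a cycle")

    # Paths ending at each node, with every node allowed as a starting point.
    paths_in: Dict[str, int] = {}
    for n in order:
        paths_in[n] = 1 + sum(paths_in[p] for p in pred[n])

    # Paths from each node to any sink.
    paths_out: Dict[str, int] = {}
    for n in reversed(order):
        paths_out[n] = sum(paths_out[s] for s in succ[n]) if succ[n] else 1

    return {(u, v): paths_in[u] * paths_out[v] for u, v in edge_list}


if __name__ == "__main__":
    toy = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
    for edge, weight in splc(toy).items():
        print(edge, weight)
```

A main path can then be traced by repeatedly following the highest-weighted outgoing link; applying a citation-count threshold, as the study does, would amount to removing low-cited nodes before the weights are computed.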

6

RDA 자원유형 디스플레이를 위한 고려사항에 관한 연구 [A Study on Considerations for Displaying RDA Resource Types]

이미화 (Kongju National University) 2016, Vol.33, No.1, pp.33-52 https://doi.org/10.3743/KOSIM.2016.33.1.033

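Abstract (translated from the Korean)

This study seeks considerations for displaying the RDA resource types that replace the GMD: content type, media type, and carrier type. Literature review, case study, and survey were used as research methods. The following display approaches for RDA resource types were proposed. First, content type and carrier type should be combined to display the RDA resource type. Second, as algorithms for turning RDA content and carrier types into icons, two options were proposed: combining an image representing the content type with the carrier type term, or representing both content type and carrier type as images and including the corresponding term with each image. Third, to display the resource types of composite resources, subfields indicating field link and sequence should be used so that content type and carrier type are kept together as a set. Fourth, in the brief display the icon indicating resource type should be placed at the upper left of where the resource is displayed, and in the detailed display the resource type should be placed within the description area. Fifth, the expression ‘format’ should be used as the display label. Because this study presents considerations for planning the display of RDA resource types, libraries can use it to prepare practical RDA display plans.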

Abstract

This study sought to identify considerations for displaying the RDA resource types: content type, media type, and carrier type. A literature review, a case study, and a survey were used as research methods. Five display strategies were suggested. First, displaying content and carrier types was preferable to displaying all three RDA resource types. Second, two kinds of algorithm should be considered for RDA resource icon display. One combines the carrier type term with a content type icon. The other combines a carrier type icon and a content type icon, each of which must include the term for the type it represents. Third, the subfield of 33x must be used for the paired display of content type and carrier type for multi-type resources. Fourth, in the brief display the resource type icon was better positioned at the upper left, and in the detailed display resource types were better located in the description area. Fifth, ‘format’ was proposed as the display label. By suggesting practical display considerations for RDA resource types, this study should contribute to the design of resource displays.

7

자치단체의 독서진흥조례 내용분석 [A Content Analysis of Local Governments' Reading Promotion Ordinances]

홍은성 (Department of Library and Information Science, Chonnam National University); 장우권 (Chonnam National University) 2015, Vol.32, No.4, pp.107-135 https://doi.org/10.3743/KOSIM.2015.32.4.107

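Abstract (translated from the Korean)

The purpose of this study is to investigate and analyze the current state and content of the enactment and enforcement of ordinances for promoting reading culture, which are autonomous regulations of Korean local governments, and then to suggest effective improvements to the operation of ordinances and rules. To this end, a literature review was conducted and the relevant ordinances were surveyed and analyzed. The results are as follows. 1) The reading-related autonomous regulations in force across the 245 metropolitan and basic local governments nationwide comprised 77 ordinances and 7 rules. 2) The names of the ordinances and rules of local governments and local education authorities vary widely. 3) The components of the content vary with the name of the ordinance or rule, and even ordinances and rules with the same name have different components. 4) The reading-related autonomous regulations of local governments abolished to date comprised 10 ordinances and 2 directives. Accordingly, the following measures are suggested to invigorate reading culture promotion policy. 1) Awareness should be improved by publicizing reading promotion policy. 2) Ordinances should be given the name best suited to the local government's reading promotion environment, and the content of ordinances and rules should be consistent. 3) Before an ordinance is abolished, the problems that arise after abolition should be examined closely, and a replacement regulation should be enacted after the opinions of residents and experts have been fully gathered.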

Abstract

The purpose of this study is to investigate and analyze the current state of the enactment and enforcement of ordinances for reading culture promotion, which are local statutes of Korean local governments, and to suggest effective improvements to the operation of ordinances and regulations. A literature review and an analysis of the relevant ordinances were conducted. The results are as follows. 1) The reading-related local statutes of the 245 metropolitan and primary local authorities comprised 77 ordinances and 7 regulations. 2) The ordinances and regulations of local governments and local education authorities are named in various ways. 3) The composition of ordinances and regulations was not systematic, because content varied by local government according to the name of the ordinance, and similar content generally overlapped. 4) The reading-related local statutes abolished to date comprised 10 ordinances and 2 official orders. The study suggests the following measures to vitalize reading culture promotion policy. 1) Awareness should be improved by publicizing the reading promotion policy. 2) Local statutes and ordinances should be given the name best suited to the local government's reading promotion environment, and the content of reading-related ordinances and regulations should be consistent. 3) Local statutes should be established after the problems of an ordinance are thoroughly examined before abolition and sufficient opinions are collected from residents and specialists.
