Journal of the Korean Society for Information Management (정보관리학회지), Korean Society for Information Management

91

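Extracting Undiscovered Public Knowledge Using Citation Information: Reproducing and Extending Swanson's ABC Model

함정은 (Department of Library and Information Science, Yonsei University) ; 송민 (Yonsei University) 2015, Vol.32, No.2, pp.87-103 https://doi.org/10.3743/KOSIM.2015.32.2.087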

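Abstract (translated from Korean)

Literature-based discovery, which identifies and suggests targets worth examining among a large body of research, can be very useful to researchers. Swanson's ABC model, the representative theory of literature-based discovery, proposes studying relationships between entities that have not previously been verified. This study incorporates citation information into Swanson's ABC model in order to find significantly related entities more efficiently. Citation information was identified from the reference lists of the collected papers, and important terms were extracted from paper titles and abstracts using text mining techniques. Among Swanson's studies, the relationship between fish oil and Raynaud's disease and its symptoms was reproduced, and the results were analyzed for differences from the entities identified by the conventional approach.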

Abstract

Literature-based discovery is useful for finding research targets that are worth investigating. Swanson's ABC model, a well-known approach to literature-based discovery, suggests relationships between entities that have not yet been discovered. This study tries to find valid relationships between entities by referring to citations, which connect articles on similar topics. We collect citations from the references in articles and extract important concepts from titles and abstracts using text mining techniques. We reproduce the relationship between fish oil and Raynaud's disease, one of Swanson's well-known findings, and compare the results with the entities identified by the traditional approach.
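The core idea of the ABC model can be illustrated with a minimal sketch: if term A co-occurs with intermediate terms B, and those B terms co-occur with a term C that never appears together with A, then A and C become a candidate hidden relationship. The function name `abc_candidates` and the toy documents below are hypothetical and hand-made for illustration; they are not the paper's data, and the citation-based extension the paper describes is not implemented here.

```python
from collections import defaultdict

def abc_candidates(docs, a_term):
    """Open discovery from a_term: return {C term: sorted list of linking B terms}."""
    # Map each term to the set of document indices that mention it.
    postings = defaultdict(set)
    for i, terms in enumerate(docs):
        for t in terms:
            postings[t].add(i)

    # B terms: everything that co-occurs with A in at least one document.
    b_terms = {t for i in postings[a_term] for t in docs[i]} - {a_term}

    candidates = defaultdict(set)
    for b in b_terms:
        for i in postings[b]:
            for c in docs[i]:
                # C must not co-occur with A directly; every term that does
                # is already in b_terms, so excluding b_terms is sufficient.
                if c != a_term and c not in b_terms:
                    candidates[c].add(b)
    return {c: sorted(bs) for c, bs in candidates.items()}

# Toy, hand-made "documents" (sets of terms); not data from the paper.
docs = [
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"raynaud's disease", "blood viscosity"},
    {"raynaud's disease", "platelet aggregation"},
    {"aspirin", "headache"},
]
result = abc_candidates(docs, "fish oil")
print(result)
```

In this toy corpus the only candidate is Raynaud's disease, linked through both intermediate terms. A real pipeline would also need term extraction, frequency thresholds, and some ranking of candidates, none of which is shown.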

92

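A Survey of Research Trends in Computer Science and Information Science Using Text Mining Techniques: Focusing on DBLP Conference Data

김수연 (Yonsei University) ; 송성전 (Department of Library and Information Science, Yonsei University) ; 송민 (Yonsei University) 2015, Vol.32, No.1, pp.135-152 https://doi.org/10.3743/KOSIM.2015.32.1.135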

Abstract

The goal of this paper is to explore the field of Computer and Information Science with the aid of text mining techniques, by mining the Computer and Information Science conference data available in DBLP (Digital Bibliography & Library Project). Although bibliometric analysis is the most prevalent approach to investigating the dynamics of a research field, we attempt to understand the dynamics of the field using Latent Dirichlet Allocation (LDA)-based multinomial topic modeling. For this study, we collected 236,170 documents from 353 conferences related to Computer and Information Science in DBLP, aiming to cover conferences in the field as broadly as possible. We analyze the topic modeling results for data collected over the period 2000 to 2011, including the top authors and top conferences per topic. We identify four patterns in topic trends in the field during this period: growing (network-related topics), shrinking (AI and data mining topics), continuing (web, text mining, information retrieval, and database topics), and fluctuating (HCI, information system, and multimedia system topics).

93

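A Study on the Current Status of National Knowledge Information Distribution and Strategies for Its Distribution

이지연 (Yonsei University) ; 민지연 (Yonsei University) ; 주수형 (Yonsei University) 2007, Vol.24, No.3, pp.299-319 https://doi.org/10.3743/KOSIM.2007.24.3.299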

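Abstract (translated from Korean)

As national knowledge information has recently come to be recognized as superior content in terms of expertise and reliability, linkages between search portals seeking high-quality content and national institutions and integrated information centers have been actively established. The linkages so far have aimed to improve user accessibility and the ease of using knowledge information, and Korea is currently at an early stage of national knowledge information distribution. Accordingly, this study examined the state of national knowledge information distribution in Korea and abroad, focusing on knowledge information providers, search portals, and the national knowledge portal. In-depth interviews were also conducted with six experts from academia, libraries and specialized information centers, and private industry. Based on these, ways of linking the information services for efficient distribution of national knowledge information were reviewed and proposed.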

Abstract

Recently, information portals, national institutions, and integrated information centers, all eager to acquire quality content, have been actively sharing content, as Korean national knowledge is recognized as superior in terms of specialization and reliability. Content sharing so far has been intended to enhance users' access to information and ease of use, and the distribution of national knowledge in Korea is still at an early stage. In this study, we identified the status of national knowledge distribution in Korea and abroad by analyzing the roles of information providers, information search portals, and the Korea Knowledge Portal. We also conducted in-depth interviews with six experts representing academic institutions, libraries, specialized information centers, and commercial ventures. Based on the analysis and the interviews, we offer suggestions for how the respective information services can share content and cooperate to enable effective distribution of Korean national knowledge.

94

A Study on an Extended Model of the Event-Centered ABC Ontology for Music Resources

이혜원 (Seoul Women's University) ; 김태수 (Yonsei University) 2007, Vol.24, No.1, pp.273-300 https://doi.org/10.3743/KOSIM.2007.24.1.273

Abstract

This study aims to develop an ontology that can express relations between objects, with emphasis on the structural representation of semantics. Interoperability with existing ontologies and metadata was also considered so that the developed ontology can be applied in practice. Because the ABC Ontology provides a fundamental framework built around the event, it can be extended to other fields where the concept of event applies. This study applies the ABC Ontology to music, which results in a Music Ontology. The Music Ontology provides a knowledge infrastructure for inferring potential meaning as well as for simple semantic connection of terms. The extended model of the ABC Ontology was developed by applying the Music Ontology, a domain ontology that conveys meaning, to the ABC Ontology, which represents the overall framework. In the extended model, the representation of conceptual relations in the ABC Ontology becomes an association of framework and meaning, together with the reasoning rules typical of ontologies. The interoperability of the extended model is also examined with respect to cooperation with metadata schemes other than its own.

Keywords: ABC Ontology, event, metadata interoperability, Music Ontology, ontology integration, OWL

95

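A Comparative Study of Formulas for Measuring the Linguistic Difficulty of Texts: Focusing on Elementary, Middle, and High School Textbooks

최인숙 (Sookmyung Women's University) 2005, Vol.22, No.4, pp.173-195 https://doi.org/10.3743/KOSIM.2005.22.4.173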

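Abstract (translated from Korean)

This study aims to determine whether the methodology of constructing a text-level score formula from factors that affect linguistic difficulty can be extended from elementary school texts to middle and high school texts, and to identify factors that can explain the new characteristics that emerge as the range of texts expands. An integrated formula for elementary, middle, and high school texts, a formula dedicated to middle and high school texts, and a formula dedicated to elementary school texts were constructed and their characteristics compared. The results confirmed that splitting the texts into smaller groups and constructing a dedicated formula for each yields better formulas, which reflect each group's characteristics well, than constructing an integrated formula over a broad range of texts. For scoring middle and high school texts, it was efficient to use the sentences-per-paragraph factor, or the number-of-sentences and number-of-paragraphs factors. For scoring elementary school texts, it was efficient to use the number-of-unique-eojeol factor, or the number-of-unique-eojeol and new-eojeol occurrence ratio factors (an eojeol, 어절, is a space-delimited word unit in Korean).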

Abstract

The purpose of this study is to clarify whether readability formulas based on linguistic factors are suitable for secondary and older primary age texts. A comparison of formulas for primary age texts, for both primary and secondary age texts, and for secondary age texts revealed that formulas dedicated to a narrow age range were more effective. For secondary age texts, a model estimating readability scores from the average number of sentences in paragraphs, or a two-factor model using the average number of sentences and paragraphs in texts, was found to work well. For primary age texts, a model based on the total number of unique syllables, or a model using the total number of unique syllables and the new syllable occurrence ratio, worked well.

96

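A Study on an Adoption Model for Institutional Repositories: Focusing on Researchers in the Sciences

황혜경 (Korea Institute of Science and Technology Information) ; 이지연 (Yonsei University) 2017, Vol.34, No.2, pp.47-80 https://doi.org/10.3743/KOSIM.2017.34.2.047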

Abstract

The purpose of this study is to develop an adoption model for institutional repositories (IRs) by identifying the key factors that affect the intention to adopt IRs and explaining the relations among these factors. Through a survey of 270 researchers and 12 in-depth interviews in the fields of physics, mathematics, and life science in Korea, performance expectancy, perceived risks, socio-organizational influence, and individual characteristics were found to have substantial influence on the intention to adopt IRs. Among the key factors, individual characteristics showed the greatest effect on the intention to adopt IRs, followed by performance expectancy and other socio-organizational influences except for the perceived risks. Based on an analysis of the results, strategies to enhance the intention to adopt IRs are suggested: reforming the research assessment system at the national level, strengthening the role of the operating institution, and providing services in which scientists participate voluntarily.

97

An Analysis of Research Trends in Library and Information Science in Korea Using Topic Modeling

박자현 (Yonsei University) ; 송민 (Yonsei University) 2013, Vol.30, No.1, pp.7-32 https://doi.org/10.3743/KOSIM.2013.30.1.007

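Abstract (translated from Korean)

To identify research trends in library and information science in Korea, this study collected the abstracts of papers published from 1970 to 2012 in four major journals of the field (Journal of the Korean Society for Information Management, Journal of the Korean Society for Library and Information Science, Journal of Korean Library and Information Science Society, and Journal of the Korean BIBLIA Society for Library and Information Science) and ran topic modeling experiments based on LDA (Latent Dirichlet Allocation). The results are summarized as follows. First, when the research topics derived from the topic modeling experiments were compared with the subject classification scheme of library and information science, they could be linked to: digital libraries, user studies, the internet, expert systems, informetrics, automation, information retrieval, and information systems in the "information science" area; information services, services by library type, user education/information literacy, and service evaluation in the "library services" area; library and society, and professionalism, in the "foundations of library and information science" area; classification, cataloging, and metadata in the "organization of materials" area; library evaluation and collection development/management in the "library management" area; old bibliography in the "bibliography" area; library and information policy in the "library systems" area; books/publishing in the "publishing" area; and subtopics of the "records management" area. The areas in which the most research topics were found were information science and library services. Second, among the major research topics of library and information science, services and evaluation by library type, the internet, and metadata showed an upward trend, whereas books, classification, cataloging, and old bibliography showed a downward trend. Third, a comparison by journal showed that in the Journal of the Korean Society for Information Management, research topics on information science appeared more often than topics on libraries, whereas in the other three journals, topics on libraries appeared more often than topics on information science.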

Abstract

The goal of the present study is to identify topic trends in the field of library and information science in Korea. To this end, we collected the titles and abstracts of papers published between 1970 and 2012 in four major journals: Journal of the Korean Society for Information Management, Journal of the Korean Society for Library and Information Science, Journal of Korean Library and Information Science Society, and Journal of the Korean BIBLIA Society for Library and Information Science. We then applied the well-received topic modeling technique Latent Dirichlet Allocation (LDA) to the collected data sets. The findings are as follows. First, a comparison of the topics extracted by LDA with the subject headings of library and information science shows several distinct research sub-domains strongly tied to the field. These include library and society, and professionalism, in the domain of "introduction to library and information science"; library and information policy in the domain of "library system"; library evaluation and collection development and management in the domain of "library management"; information service, services by library type, user training/information literacy, and service evaluation in the domain of "library service"; classification, cataloging, and metadata in the domain of "document organization"; bibliometrics, digital libraries, user studies, the internet, expert systems, information retrieval, and information systems in the domain of "information science"; antique documents in the domain of "bibliography"; books/publications in the domain of "publication"; and archival studies. The results indicate that information science and library services are the two most studied domains. Second, research topics such as service and evaluation by library type, the internet, and metadata show a growing trend, whereas topics such as books, classification, and cataloging show a declining trend. Third, analysis by journal shows that in the Journal of the Korean Society for Information Management, information science topics appear more frequently than library science topics, whereas library science topics are more popular in the other three journals studied in this paper.

98

A Study on Expanding Image Access Points Based on Information Needs in Everyday Life Contexts

정은경 (Ewha Womans University) ; 정선영 (Ewha Womans University) 2012, Vol.29, No.4, pp.273-294 https://doi.org/10.3743/KOSIM.2012.29.4.273

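Abstract (translated from Korean)

Generational characteristics and advances in information technology accelerate the production and use of images. This study aims to discuss the expansion of image access points by analyzing image users' information needs in the context of everyday life. To this end, 105 questions seeking images were extracted from 네이버 지식인 (Naver Knowledge iN), a social Q&A service. The image questions were analyzed using a framework divided into use purposes and image attributes. The analysis identified eight use purposes in total, among which using images as data was prominent; of these, "viewing and drawing" (보고그리기) was newly derived as a use purpose not found in previous studies. Among the semantic, non-visual, and compositional aspects of image attributes, semantic and non-visual attributes were dominant. Semantic attributes have traditionally been regarded as important in image retrieval and access, but as the results of this study show, the weight of the non-visual aspect, especially contextual elements, has important implications for providing access points.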

Abstract

Images are searched for and used extensively, owing both to advances in internet and digital technologies and to the characteristics of the younger generation. This study aims to discuss ways of expanding access points to images by analyzing users' needs in the context of everyday life. To this end, 105 image-seeking questions from NAVER, one of the social Q&A services in Korea, were analyzed using a two-dimensional framework of image uses and image attributes. The findings show that a considerable share of use purposes fall at the data-oriented pole, such as information processing, information dissemination, and learning. As for image attributes, non-visual aspects, including contextual attributes, are substantially represented in users' image needs in addition to the traditional semantic attributes.

99

A Study on Keyword-Centered Extraction of Social Issues Using Word Embedding: Focusing on News Articles Related to People with Disabilities

최가람 (Kyonggi University) ; 최성필 (Kyonggi University) 2018, Vol.35, No.1, pp.231-250 https://doi.org/10.3743/KOSIM.2018.35.1.231

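Abstract (translated from Korean)

This paper presents a new methodology for extracting and formalizing topics by detailed subject at a specific point in time, using a set of keywords automatically extracted from online news articles. To this end, the TF-IDF model was first selected through comparative experiments on several statistical weighting models that can measure the importance of individual words in a large text collection, and it was used to extract a set of main keywords. In addition, to compute the semantic relatedness among the extracted keywords effectively, a set of word embedding vectors was built from a separately collected corpus of about 1,000,000 news articles. Each extracted keyword is represented numerically as an embedding vector and clustered with the K-means algorithm. A qualitative in-depth analysis of each resulting keyword cluster found that most clusters were topics with enough semantic coherence for labels to be assigned easily.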

Abstract

In this paper, we propose a new methodology for extracting and formalizing subject-specific topics at a specific time, using a set of keywords extracted automatically from online news articles. To do this, we first extracted a set of keywords using TF-IDF, which was selected through a series of comparative experiments on various statistical weighting schemes that measure the importance of individual words in a large set of texts. To calculate the semantic relation between the extracted keywords effectively, a set of word embedding vectors was constructed from about 1,000,000 separately collected news articles. The extracted keywords were represented as numerical vectors and clustered with the K-means algorithm. A qualitative in-depth analysis of each resulting keyword cluster showed that most of the clusters were appropriate topics with sufficient semantic concentration for labels to be assigned to them easily.
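The keyword-weighting step described above can be sketched as follows. This is a minimal illustration of one common TF-IDF variant (relative term frequency multiplied by log(N/df)); the abstract does not state which variant the authors used, and the function name `tfidf_keywords` and the toy documents are hypothetical. The word embedding and K-means clustering steps are not shown.

```python
import math
from collections import Counter

def tfidf_keywords(docs, top_k=2):
    """Return the top_k highest-scoring terms for each tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    keywords = []
    for doc in docs:
        tf = Counter(doc)
        # Assumed variant: relative term frequency times log(N / df).
        scores = {t: (tf[t] / len(doc)) * math.log(n_docs / df[t]) for t in tf}
        # Sort by descending score, breaking ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords.append(ranked[:top_k])
    return keywords

# Toy, hand-made tokenized "articles"; not data from the paper.
docs = [
    "welfare policy welfare budget news".split(),
    "employment quota employment news".split(),
    "accessibility subway elevator accessibility news".split(),
]
top = tfidf_keywords(docs, top_k=1)
print(top)
```

The term "news" appears in every toy document, so its IDF is zero and it is never selected, which is the behavior that makes TF-IDF useful for picking out distinctive keywords.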

100

A Study on Ontology Construction and Methodology through Building a Topic Maps-Based Medical Information Retrieval System

이명호 (Sangmyung University) 2010, Vol.27, No.3, pp.35-51 https://doi.org/10.3743/KOSIM.2010.27.3.035

Abstract

Emerging Web 2.0 services such as Twitter, blogs, and wikis, together with the poorly structured and immeasurable growth of information, require an enhanced approach to information organization. Ontology has received much attention over the last 10 years as an emerging approach for enhancing information organization, but it has seen little penetration into current systems. The purpose of this study is to propose an ontology implementation and methodology. To this end, the limitations of traditional information organization approaches are addressed and emerging information organization approaches are presented. Two ontology data models, RDF/OWL and Topic Maps, are compared, and then ontology development processes and methodology are addressed through a Topic Maps-based medical information retrieval system. The comparison of the two data models allows users to choose the right model for ontology development.
