Generating and Controlling an Interlinking Network of Technical Terms to Enhance Data Utilization

Journal of the Korean Society for Information Management (정보관리학회지), ISSN (Print) 1013-0799; (Online) 2586-2073
2018, v.35 no.1, pp.157-182
https://doi.org/10.3743/KOSIM.2018.35.1.157
Do-Heon Jeong (Duksung Women's University)

Abstract

As storage and processing technologies have advanced rapidly in the big data era, long tail data that was previously overlooked has attracted the attention of many companies and researchers. This study proposes text-mining-based methods for generating and controlling a network of technical terms in order to raise the utilization of data that lies in the long tail of the distribution. In particular, it presents an effective way to automatically generate an interlinking network of technical terms used in scholarly fields by applying the edit distance technique from text mining. Experimental data for the utilization experiments were collected from a linked open data (LOD) environment; in the process, the study also proposes techniques for making effective use of data in LOD systems and an algorithm for processing term patterns. Finally, a performance evaluation of the generated network of technical terms confirmed that the proposed methods were effective in raising the utilization of long tail data.

Keywords
long tail theory, linked open data, language resources, DBLP, edit distance algorithm
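
The abstract names edit distance as the basis for linking technical terms but does not give the procedure. The sketch below is only an illustration of the general idea, not the paper's algorithm: it computes the standard Levenshtein distance and links any two terms whose normalized forms differ by at most a threshold. The function names, the normalization rule, the threshold of 2, and the sample terms are all assumptions made for this example.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalize(term: str) -> str:
    """Hypothetical normalization: lowercase, hyphens to spaces,
    collapse repeated whitespace."""
    return " ".join(term.lower().replace("-", " ").split())


def build_term_network(terms, max_distance=2):
    """Link every pair of terms whose normalized forms are within
    max_distance edits (an assumed threshold). Returns an adjacency map."""
    network = {term: set() for term in terms}
    for left, right in combinations(terms, 2):
        if edit_distance(normalize(left), normalize(right)) <= max_distance:
            network[left].add(right)
            network[right].add(left)
    return network


# Illustrative input only; these strings are not taken from the paper's data.
terms = ["X-ray diffraction", "x ray diffraction",
         "Xray diffractions", "liquid chromatography"]
network = build_term_network(terms)
for term, neighbours in network.items():
    print(term, "->", sorted(neighbours))
```

Comparing all pairs is quadratic in the number of terms, so a real term collection would likely need blocking or indexing before the distance step; that choice is outside what the abstract describes.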

