미국 특허 서지정보 추출 방법에 대한 연구: HTML 파싱 기법의 활용을 중심으로

An Extraction Method of Bibliographic Information from the US Patents: Using an HTML Parsing Technique

정보관리학회지 / Journal of the Korean Society for Information Management, (P)1013-0799; (E)2586-2073
2010, v.27 no.2, pp.7-20
https://doi.org/10.3743/KOSIM.2010.27.2.007
한유진 (Sookmyung Women's University)
오승우 (Seoul National University)

Abstract

This study proposes a method for extracting the most recent information from US patent documents. It adopts an HTML parsing technique that connects directly to the US Patent and Trademark Office (USPTO) Web pages. After obtaining a list of 50 documents through a keyword search, the study presents an algorithm that uses HTML parsing to extract the patent number, the applicant, and the US patent class information. It also presents an algorithm that extracts both patents and their subsequent patents by exploiting the close links between them, a distinctive characteristic of US patent documents. Although the proposed method has several limitations, it can effectively supplement existing databases in terms of timeliness and comprehensiveness.
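As a rough illustration of the kind of field extraction the abstract describes, the following Python sketch parses a small, made-up HTML fragment and pulls out a patent number, an applicant, and US class codes. The markup in `SAMPLE_HTML`, the row labels, and the names `LabelValueParser` and `extract_bibliographic` are all assumptions for illustration; the abstract does not specify the actual USPTO page layout or the authors' implementation, and no network access is performed here.

```python
from html.parser import HTMLParser

# Hypothetical markup: the real USPTO page structure is not given in the
# abstract, so this fragment only stands in for a patent detail page.
SAMPLE_HTML = """
<html><body>
<table>
<tr><th>Patent No.</th><td>7,654,321</td></tr>
<tr><th>Applicant</th><td>Example Corp.</td></tr>
<tr><th>Current U.S. Class</th><td>707/3; 715/234</td></tr>
</table>
</body></html>
"""


class LabelValueParser(HTMLParser):
    """Collects <th>label</th><td>value</td> pairs from table rows."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._tag = None    # cell currently being read: "th", "td", or None
        self._label = None  # text of the most recent <th> cell
        self._buf = []      # text chunks of the current cell

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            self._tag = tag
            self._buf = []

    def handle_data(self, data):
        # Only keep text that appears inside a <th> or <td> cell.
        if self._tag:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return
        text = "".join(self._buf).strip()
        if tag == "th":
            self._label = text
        elif self._label is not None:
            self.fields[self._label] = text
            self._label = None
        self._tag = None


def extract_bibliographic(html):
    """Return patent number, applicant, and US classes from the assumed layout."""
    parser = LabelValueParser()
    parser.feed(html)
    parser.close()
    fields = parser.fields
    classes = fields.get("Current U.S. Class", "")
    return {
        "patent_number": fields.get("Patent No.", "").replace(",", ""),
        "applicant": fields.get("Applicant", ""),
        "us_classes": [c.strip() for c in classes.split(";") if c.strip()],
    }


record = extract_bibliographic(SAMPLE_HTML)
print(record)
```

In practice the label strings and table structure would have to be matched to whatever the live pages actually contain, and the same label-to-value approach could be applied to each of the documents returned by a keyword search.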

keywords
US patents, bibliographic information, extraction, HTML parsing
