The General Index: an open-source n-gram dataset of journal articles
2021. 10. 30. 11:27 · Living as a Data Analyst
https://archive.org/details/GeneralIndex
The General Index consists of 3 tables derived from 107,233,728 journal articles.
A table of n-grams, ranging from unigrams to 5-grams, is extracted using spaCy. Each of the 355,279,820,087 rows of the n-gram table consists of an n-gram coupled with a journal article id.
A second table is constructed using YAKE and consists of 19,740,906,314 rows, each with a keyword and an article id.
A third table associates an article id with metadata.
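
To make the row layout of the n-gram table concrete, here is a minimal sketch of how (n-gram, article id) rows could be produced for a single article. It is only an illustration: it assumes a plain lowercase whitespace tokenizer instead of spaCy, and the function name `ngram_rows` and the id `article-001` are hypothetical, not taken from The General Index itself.

```python
from typing import Iterator, Tuple


def ngram_rows(article_id: str, text: str, max_n: int = 5) -> Iterator[Tuple[str, str]]:
    """Yield (ngram, article_id) rows for every n-gram of length 1..max_n."""
    # Assumption: simple whitespace tokenization; the real dataset uses spaCy.
    tokens = text.lower().split()
    for n in range(1, max_n + 1):
        # Slide a window of n tokens across the text.
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n]), article_id


# Hypothetical article id and text, for illustration only.
rows = list(ngram_rows("article-001", "open access to scientific knowledge matters"))
for ngram, article_id in rows[:3]:
    print(ngram, article_id)
```

Every n-gram is repeated once per position where it occurs, and a text of k tokens yields up to 5k rows. That is consistent with the n-gram table being far larger than the article count: roughly 355 billion rows for about 107 million articles.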