Structuring Data Science Part 2
Snapshot
A secondary method for structuring online data science resources. A video presenting the project and initial results is below, I will update with the full blog soon.
Overview of Methods Used
Used topic modeling to relate technical resources to one another. Extracting the fundamental concepts in a text and creating links to other resources by their concept similarities. Allowing a user to read a journal and directly see videos or blogs covering those same concepts.
- TFIDF+SVD pipeline created to generate topic similarities based on concept vocabulary created.
- Network graph created for topic ontologies and PageRank used in recommendation engine.
- MongoDB database created for hosting the different data collections.
- Multiple web spiders developed to mine blog pages and video transcripts with a corpus of 80k blog pages, 30k papers and 8k video transcripts curated.
- Deployed through a flask generated website.