
How To Get Tf-idf Matrix Of A Large Size Corpus, Where Features Are Pre-specified?

I have a corpus consisting of 3,500,000 text documents. I want to construct a tf-idf matrix of size 3,500,000 × 5,000, where the 5,000 columns are distinct, pre-specified features (words). I am using scikit

Solution 1:

Another option is gensim. It is very memory-efficient because it streams the corpus one document at a time instead of loading it all into memory, and it is very fast. Here is the link to its tf-idf tutorial for your corpus.

