n-gram - Retrieve Ngram list with frequencies from Solr


I realized that Solr can return the top-ranking terms through the following API:
  localhost:8983/solr/admin/luke?fl=text_in&numTerms=5000&wt=json
but it only lists the top unigrams (e.g. "David"), not bigrams (e.g. "David Beckham") or trigrams. Is there a way to get a list of the top bigrams, trigrams, etc. from Solr?

You could define a field type with an n-gram filter, like this:

  <fieldType name="myngram" stored="false" class="solr.TextField">
    <analyzer type="index">
      <!-- split the text into tokens, then lowercase them -->
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- emit n-grams of length 2 to 5 -->
      <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="5"/>
    </analyzer>
  </fieldType>

and then declare a copy field of type myngram:

  <field name="ngrams" type="myngram" indexed="true" stored="false" required="false"/>
  <copyField source="doc_text" dest="ngrams"/>

assuming that the document text is in the doc_text field. Then query the new field:

  localhost:8983/solr/admin/luke?fl=ngrams&numTerms=5000&wt=json
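
If you want to read the term list from a script instead of a browser, a minimal Python sketch could look like the one below. The function names are made up for illustration, the base URL is an assumption (on a multi-core setup it would normally include the core name), and the parser assumes the `topTerms` entry comes back either as a flat `[term, count, term, count, ...]` list or as a map, so check it against the JSON your Solr version actually returns.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_luke_url(base_url, field, num_terms=5000):
    """Build the Luke request URL for one field (base_url is an assumption)."""
    query = urlencode({"fl": field, "numTerms": num_terms, "wt": "json"})
    return f"{base_url}/admin/luke?{query}"


def parse_top_terms(response, field):
    """Turn the topTerms entry of a Luke JSON response into (term, count) pairs.

    Assumes topTerms is a flat [term, count, ...] list or a {term: count} map.
    """
    top = response.get("fields", {}).get(field, {}).get("topTerms", [])
    if isinstance(top, dict):
        return list(top.items())
    return list(zip(top[0::2], top[1::2]))


def fetch_top_terms(base_url, field, num_terms=5000):
    """Call the Luke handler and return the parsed (term, count) pairs."""
    with urlopen(build_luke_url(base_url, field, num_terms)) as resp:
        return parse_top_terms(json.load(resp), field)
```

With a running Solr instance, `fetch_top_terms("http://localhost:8983/solr", "ngrams")` would issue the same request as the URL above.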

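Querying the ngrams field this way will give you the top n-grams of length 2 to 5. To limit the list to n-grams of length 2, set the filter's maxGramSize parameter to 2. Note that solr.NGramFilterFactory builds n-grams from the characters of each token; for word-level n-grams such as "david beckham", use solr.ShingleFilterFactory (with minShingleSize and maxShingleSize) in its place.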
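
To see what kind of terms end up in the index, here is a small Python illustration of the two kinds of n-gram. It is not Solr code, only a sketch that mimics character n-grams within a token (which is how I understand solr.NGramFilterFactory to behave) and word n-grams across tokens (the kind of output a shingle filter produces). The function names and defaults are hypothetical.

```python
def char_ngrams(token, min_gram=2, max_gram=5):
    """Character n-grams of one token, for every start position and size."""
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(token):
                grams.append(token[start:start + size])
    return grams


def word_shingles(tokens, min_size=2, max_size=2):
    """Word n-grams (shingles) built from consecutive tokens."""
    shingles = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                shingles.append(" ".join(tokens[start:start + size]))
    return shingles


tokens = "David Beckham".lower().split()
print(char_ngrams(tokens[0]))   # character n-grams of "david"
print(word_shingles(tokens))    # word bigrams of the whole phrase
```

The first call yields fragments such as "da" and "dav", while the second yields the phrase "david beckham", which is the shape of term the question asks for.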
