n-gram - Retrieve n-gram list with frequencies from Solr
I found that using the following API, I can get the top-ranked terms from Solr:

    localhost:8983/solr/admin/luke?fl=text_in&numTerms=5000&wt=json

but it only gives the list of top unigrams (e.g. "David"), not bigrams (e.g. "David Beckham") or trigrams. Is there a way I can get the lists of top bigrams, trigrams, etc. from Solr?
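For reference, the Luke request above can be built and its result parsed roughly as follows. This is only a sketch: the helper names `luke_url` and `top_terms` are made up for illustration, the sample payload is hypothetical, and it assumes `topTerms` comes back as a flat list alternating term and count, which is how Solr's JSON writer lays out named lists by default.

```python
import json
from urllib.parse import urlencode


def luke_url(base, field, num_terms=5000):
    """Build a Luke request URL asking for one field's top terms."""
    query = urlencode({"fl": field, "numTerms": num_terms, "wt": "json"})
    return f"{base}/admin/luke?{query}"


def top_terms(luke_json, field):
    """Return [(term, count), ...] from a parsed Luke response.

    Assumption: topTerms is a flat list [term, count, term, count, ...].
    """
    flat = luke_json["fields"][field].get("topTerms", [])
    return list(zip(flat[0::2], flat[1::2]))


# Hypothetical response fragment, for illustration only.
sample = json.loads(
    '{"fields": {"text_in": {"topTerms": ["david", 42, "beckham", 17]}}}'
)

print(luke_url("http://localhost:8983/solr", "text_in"))
print(top_terms(sample, "text_in"))
```

In a real setup the JSON would come from fetching the URL (for example with `urllib.request.urlopen`) instead of the inline sample.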
Create a field type with an NGram filter, like:
    <fieldType name="myngram" stored="false" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="5"/>
      </analyzer>
    </fieldType>

and then declare a copy field of type myngram:

    <field name="ngrams" type="myngram" indexed="true" stored="false" required="false"/>
    <copyField source="doc_text" dest="ngrams"/>

assuming that your document text is in the doc_text field. Then query:

    localhost:8983/solr/admin/luke?fl=ngrams&numTerms=5000&wt=json

This will give you the top n-grams of length 2 to 5. To restrict the list to bigrams only, set the filter's maxGramSize parameter to 2. Note that NGramFilterFactory measures gram size in characters within each token; for word-level n-grams such as "david beckham", solr.ShingleFilterFactory (with minShingleSize and maxShingleSize) is the filter that joins adjacent tokens.
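To make the difference between character n-grams and word n-grams concrete, here is a small plain-Python sketch. It does not call Solr, the function names `char_ngrams` and `word_shingles` are invented for illustration, and the order in which terms are produced is not meant to match what Solr's filters emit. It only shows which kinds of terms would land in the index in each case.

```python
def char_ngrams(token, min_size=2, max_size=5):
    """Character n-grams of a single token, sizes min_size..max_size."""
    return [
        token[i:i + n]
        for n in range(min_size, max_size + 1)
        for i in range(len(token) - n + 1)
    ]


def word_shingles(tokens, min_size=2, max_size=3):
    """Word n-grams (shingles) built by joining adjacent tokens."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(min_size, max_size + 1)
        for i in range(len(tokens) - n + 1)
    ]


# Lowercased, whitespace-split text standing in for the analyzer's tokens.
tokens = "david beckham scored".split()

print(char_ngrams(tokens[0], 2, 3))
# ['da', 'av', 'vi', 'id', 'dav', 'avi', 'vid']
print(word_shingles(tokens))
# ['david beckham', 'beckham scored', 'david beckham scored']
```

If the goal is a ranked list like "david beckham", the second kind of term is the one the Luke top-terms listing would need to contain.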