Coming from a Solr/Lucene/Algolia background, here are my opinions on this:
What's good:
==========
- Focused search for question and answer databases (such as customer FAQs)
- ML-based semantic search without requiring any explicit configuration
- Connectors for S3, AWS-hosted MySQL/PG, and SharePoint. Searching data already in the AWS ecosystem (S3, Aurora) is now easier, and likely faster and cheaper too in some respects, e.g. by saving on incoming/outgoing bandwidth
- Document-level access control on all pricing plans
- Managed search (similar to Algolia)
What's similar to existing search systems (Solr / ES / Algolia):
==========
- Indexing: All data has to be processed into a "field:value" structure prior to indexing
- Indexing file formats: Plain text, HTML, PDF, MS DOCX, MS PPT
- Searching: Usual boolean filters and faceting, but only at the field level.
- Searching: Field and value boosts for relevance, but only at index time
- Results: Highlighting support
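As a rough illustration of what the "field:value" preprocessing step involves (and what CSV data would need before indexing), here's a minimal sketch that flattens CSV rows into per-document {field: value} dicts. The sample columns and the `id_column` parameter are hypothetical, and this doesn't call any Kendra API; it only shows the shape of the data you'd have to produce yourself.

```python
import csv
import io


def csv_to_documents(csv_text, id_column):
    """Turn each CSV row into a flat {field: value} document.

    `id_column` is a hypothetical parameter naming the column that holds a
    unique document ID; every other non-empty column becomes a field.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row in reader:
        doc_id = row.pop(id_column)
        # Keep only non-empty string cells so blanks don't become empty fields.
        fields = {
            name: value.strip()
            for name, value in row.items()
            if isinstance(value, str) and value.strip()
        }
        documents.append({"id": doc_id, "fields": fields})
    return documents


# Hypothetical FAQ export; the second row has no category.
sample = """faq_id,question,answer,category
1,How do I reset my password?,Use the 'Forgot password' link.,account
2,Where is my invoice?,Invoices are under Billing.,
"""

for doc in csv_to_documents(sample, "faq_id"):
    print(doc)
```

The same idea applies to JSON or XLS: something outside the search service has to map each record to fields before upload, which is where the extra infra cost comes from.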
What's missing:
===========
- No multilingual support, only English. Given that it's AWS, I'm actually very surprised by this (or I've missed something in their docs)
- Can't configure text analysis for English. I feel this'll return relevant results for formal-style content, but probably not for informal-style content like emails.
- No connectors for common internal systems: Outlook, JIRA, Confluence
- No built-in support for CSV, XLS, JSON (that one's odd!). They'll all require preprocessing, which means additional infra costs.
- Doesn't seem to support range or query facets. I feel the lack of range facets is a big problem, especially for numerical data.
- No query-time relevance tuning
- No field-level access control
- Scores are not returned in results
- Common post-search functionality is missing: rescoring, grouping, clustering
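To make the range-facet point concrete: a range facet counts hits per numeric bucket (e.g. price 0-50, 50-100) instead of per distinct value. Without engine support, one workaround is bucketing returned values client-side, as in this minimal sketch. The `price` values and bucket edges are made up, and since it only sees the results you actually fetched, the counts describe that page of results, not the whole index.

```python
from bisect import bisect_right


def range_facets(values, edges):
    """Count values into half-open buckets [edges[i], edges[i+1]).

    Values below the first edge or at/above the last edge are simply
    ignored in this sketch.
    """
    counts = {(lo, hi): 0 for lo, hi in zip(edges, edges[1:])}
    for v in values:
        # Index of the bucket whose lower edge is the largest edge <= v.
        i = bisect_right(edges, v) - 1
        if 0 <= i < len(edges) - 1:
            counts[(edges[i], edges[i + 1])] += 1
    return counts


# Hypothetical "price" field values pulled from one page of search results.
prices = [12.5, 49.99, 50, 75, 120, 250, 999]
edges = [0, 50, 100, 500]

for (lo, hi), n in range_facets(prices, edges).items():
    print(f"{lo}-{hi}: {n}")
```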
What's unknown:
============
- I don't see any information about phrase or proximity searches. Of course, they are usually relevance hacks in keyword-based systems, but sometimes users really need exact phrase matches. Does their ML backend handle this somehow?
- All search systems fall short when handling proper nouns: names, places, things, scientific names. It's possible to alleviate this to some extent using part-of-speech-aware indexing. I'm not sure whether Kendra does this in its ML backend.
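To pin down what phrase and proximity matching mean here, this is a minimal sketch (my own simplification, not Lucene's exact slop semantics and not anything Kendra documents) that checks whether a phrase's terms appear in order with at most `max_gap` other tokens between neighbours; `max_gap=0` is a strict exact-phrase match. Something like this could post-filter a page of results client-side, with the caveat that it only sees what was returned.

```python
import re


def tokenize(text):
    # Lowercase and split on anything that isn't a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def _match_from(tokens, terms, pos, max_gap):
    # terms[0] is already matched at tokens[pos]; try to place the rest,
    # backtracking over every allowed position for the next term.
    if len(terms) == 1:
        return True
    limit = min(pos + 2 + max_gap, len(tokens))
    for p in range(pos + 1, limit):
        if tokens[p] == terms[1] and _match_from(tokens, terms[1:], p, max_gap):
            return True
    return False


def proximity_match(text, phrase, max_gap=0):
    """True if the phrase's terms occur in order, with at most `max_gap`
    other tokens between each consecutive pair."""
    tokens = tokenize(text)
    terms = tokenize(phrase)
    if not terms:
        return False
    return any(
        tok == terms[0] and _match_from(tokens, terms, i, max_gap)
        for i, tok in enumerate(tokens)
    )


print(proximity_match("Reset your password here", "reset password"))
print(proximity_match("Reset your password here", "reset password", max_gap=1))
```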