All Things Distributed
Today Amazon Web Services is introducing Amazon CloudSearch, a new web service that brings the power of the Amazon.com’s search technology to every developer. Amazon CloudSearch provides a fully-featured search engine that is easy to manage and scale. It offers full-text search with features like faceting and user-defined rank functions. And like most AWS services, Amazon CloudSearch scales automatically as your data and traffic grow, making it an easy choice for applications small to large. With Amazon CloudSearch, developers just create a Search Domain, upload data, and start querying.
Search is an essential part of many of today’s cloud-centric applications. While in our daily lives we are mostly familiar with the search functionality offered by web search, there are in fact many more cases where search is a fundamental component of an application. Search is a much broader technology than just the indexing of large collections of web pages. Many organizations have large collections of documents, structured and unstructured, that can benefit from a specialized search service. With the rise of the App developer culture there is an increasing number of consumer data sources that cannot be simply queried with a web search engine. Using specialized ranking functions these apps can give their customers a highly specialized search experience.
And increasingly, search is applied to data that, though called a “document” for the purposes of search, is really just a record in a database or an object in a NoSQL system. On the query side, we are used to seeing search results as users, but search results are increasingly being used at the core of complex distributed systems where the results are consumed by machines, not people.
With these applications in mind, our customers have told us that a cloud-based managed search service is high on their wish lists. Their main motivation is that existing search technologies, both commercial and open source, have proven to be hard to manage and complex to configure.
Amazon CloudSearch will have democratization effect as it offers features that have been out of reach for many customers. With Amazon CloudSearch, a powerful search engine is now in the hands of every developer, at our familiar low prices, using a pay-as-you-go model. It will allow developers to improve functionality of their products, at lower costs with almost zero administration. It is very simple to get started; customers can create a Search Domain, upload their documents, and can immediately start querying.
How it Works
Developers set up a Search Domain – a set of resources in AWS that will serve as the home for one collection of data. Developers then access their domain through two HTTP-based endpoints: a document upload endpoint and a query endpoint. As developers send documents to the upload endpoint they are quickly incorporated into the searchable index and become searchable.
Developers can upload data either through the AWS console, from the command-line tools, or by sending their own HTTP POST requests to the upload endpoint.
There are three features that make it easy to configure and customize the search results to meet exactly the needs of the application.
Filtering: Conceptually, this is using a match in a document field to restrict the match set. For example, if documents have a “color” field, you can filter the matches for the color “red”.
Ranking: Search has at least two major phases: matching and ranking. The query specifies which documents match, generating a match set. After that, scores are computed (or direct sort criterion is applied) for each of the matching documents to rank them best to worst. Amazon CloudSearch provides the ability to have customized ranking functions to fine tune the search results.
Faceting: Faceting allows you to categorize your search results into refinements on which the user can further search. For example, a user might search for ‘umbrellas’, and facets allow you to group the results by price, such as $0-$10, $10-$20, $20-$40, etc. Amazon CloudSearch also allows for result counts to be included in facets, so that each refinement has a count of the number of documents in that group. The example could then be: $0-$10 (4 items), $10-$20 (123 items), $20-$40 (57 items), etc.
For more information on the different configuration possibilities visit the Amazon CloudSearch detail page.
Amazon CloudSearch is itself built on AWS, which enables it to handle scale.
Amazon CloudSearch supports both horizontal and vertical scaling. The main search index is kept in memory to ensure that requests can be served at very high rates. As developers add data, CloudSearch increases either the size of your underlying node or it increases the number of nodes in the cluster. To handle growing request rates, the service autoscales the number of instances handling queries.
Amazon CloudSearch is based on more than a decade of developing high quality search technologies for Amazon.com. It has been developed by A9, the Amazon.com subsidiary that focuses on search technologies. The technology that is used at all the different places where you can search on Amazon.com is also at the core of at Amazon CloudSearch.
With the launch of Amazon CloudSearch the Amazon Web Services remove yet another pain point for developers. Almost every application these days needs some form of search and as such every developer has to spend significant time implementing it. With Amazon CloudSearch developers can now simply focus on their application and leave the management of search to the cloud.
For more information see the Amazon CloudSearch detail pages, the Amazon CloudSearch Developer Guide and the posting on the AWS developer blog.
You can sign up for the Introduction To Amazon CloudSearch webinar on May 10.