Inside Elasticsearch Shard

What is a shard? To add data to Elasticsearch, we need an index, a place to store related data. In reality, an index is just a logical namespace that points to one or more physical shards. A shard is a low-level worker unit that holds just a slice of all the data in the index. Each shard is a single instance of Lucene, the Java library on which Elasticsearch is based. »

Search Engine - Scoring and Ranking

It’s essential for a search engine to rank-order the documents matching a query. To do so, the search engine computes a score for each matching document with respect to the query at hand. Weighted Zone Scoring: Rather than being just a sequence of terms, most documents have additional structure: metadata. Metadata is data about a document, such as its author(s), title, and date of publication, and would generally include fields such as the date of creation and the format of the document. »

Search Engine - Index Construction

This is a general overview of index construction, from (term ID, document ID) pairs to the final inverted index. To answer a query, simply retrieve the IDs of all documents that contain every term in the query and return them to the user. Before going directly to index construction, we’ll talk about hardware basics first. Access to data in memory (5 × 10^-9 seconds) is much faster than access to data on disk (2 × 10^-8 seconds). »
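
As a rough illustration of the idea in this excerpt, the sketch below groups (term, document ID) pairs into postings lists and answers a query by intersecting them. It is a minimal in-memory example, not the construction method the full post describes: the function names `build_inverted_index` and `search_all_terms` and the sample pairs are hypothetical, and plain strings stand in for term IDs to keep it readable.

```python
from collections import defaultdict


def build_inverted_index(pairs):
    """Sort (term, doc_id) pairs and group them into postings lists."""
    index = defaultdict(list)
    # Sorting orders pairs by term, then by doc ID; set() drops duplicates.
    for term, doc_id in sorted(set(pairs)):
        index[term].append(doc_id)
    return dict(index)


def search_all_terms(index, terms):
    """Return the sorted doc IDs that contain every query term."""
    postings = [set(index.get(term, ())) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Hypothetical sample data: (term, document ID) pairs.
pairs = [
    ("brutus", 1), ("caesar", 1), ("caesar", 2),
    ("brutus", 4), ("calpurnia", 2), ("caesar", 4),
]
index = build_inverted_index(pairs)
print(search_all_terms(index, ["brutus", "caesar"]))
```

A real index would not fit in memory for large collections, which is why the hardware costs mentioned above matter for the construction algorithm.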

Question Answering System Overview

Types of questions. Factoid questions: How many calories are there in two slices of apple pie? Most question answering systems focus on factoid questions, which can be answered with simple facts expressed in short texts. The answers can usually be expressed as a personal name, temporal expression, or location. Complex (narrative) questions: In children with an acute febrile illness, what is the efficacy of acetaminophen in reducing fever? »

Search Engine - Spelling Correction

We cannot guarantee that every word in our queries is typo-free. However, a search engine can always try to correct our spelling and give us what we actually want. There is no mystery behind it; the truth is that search engines have to do a lot of work to make this happen. To search an index, a search engine has to keep a vocabulary. The vocabulary lookup operation usually uses a classical data structure called the dictionary, which has two broad classes of solutions: hashing and search trees. »
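
To make the two classes of dictionary solutions more concrete, here is a small sketch under some assumptions: a Python `dict` plays the role of the hash-based dictionary, and a sorted list searched with `bisect` stands in for a search tree (it is not an actual tree). The vocabulary and the helper names `exact_lookup` and `prefix_lookup` are hypothetical.

```python
import bisect

# Hypothetical vocabulary; a real engine would build this from the index.
vocabulary = ["search", "seat", "second", "see", "seed", "shard"]

# Hashing: constant-time exact lookups, but no notion of term order.
hashed = {term: term_id for term_id, term in enumerate(vocabulary)}

# Sorted list as a stand-in for a search tree: keeps terms in order.
ordered = sorted(vocabulary)


def exact_lookup(term):
    """Return the term ID for an exact match, or None if absent."""
    return hashed.get(term)


def prefix_lookup(prefix):
    """Return all vocabulary terms that start with the given prefix."""
    lo = bisect.bisect_left(ordered, prefix)
    # "\uffff" sorts after any character expected in a term.
    hi = bisect.bisect_left(ordered, prefix + "\uffff")
    return ordered[lo:hi]


print(exact_lookup("shard"))
print(prefix_lookup("see"))
```

The ordered structure is the one that can enumerate nearby terms, such as everything sharing a prefix, which a plain hash lookup cannot do.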