Inside Elasticsearch Shard

What is a shard? To add data to Elasticsearch, we need an index, a place to store related data. In reality, an index is just a logical namespace that points to one or more physical shards. A shard is a low-level worker unit that holds just a slice of all the data in the index. Each shard is a single instance of Lucene, the Java library on which Elasticsearch is based. »

Research Methods in Psycholinguistics

This is a summary of the research methods available in the field of psycholinguistics. Habituation Techniques: Looking time (LT) is the most common measure of habituation in language acquisition research. Habituation is one of the optimal tasks for testing pre-verbal infants, as it does not rely on overt productions but on implicit cognitive measures (e.g., looking time, sucking, heart rate, among others). Further, based on the comparator model, it allows researchers to determine the nature of infants’ percepts and concepts by testing differing levels of novelty from the habituated stimulus (e. »

<Database Internals> Notes

Introduction: Database systems take care of data integrity, consistency, and redundancy. Databases are modular systems that consist of multiple parts: a transport layer accepting requests, a query processor determining the most efficient way to run queries, an execution engine carrying out the operations, and a storage engine. Back in 2000, there were only a few database options, and most of them were relational. Around 2010, a new class of eventually consistent databases started appearing, and terms such as NoSQL and, later, big data grew in popularity. »

Chuanrong Li on #notes

Search Engine - Scoring and Ranking

It’s essential for a search engine to rank-order the documents matching a query. To do so, the search engine computes a score for each matching document with respect to the query at hand. Weighted Zone Scoring: Rather than being just a sequence of terms, most documents have additional structure in the form of metadata. Metadata is data about a document, such as its author(s), title, and date of publication; it generally also includes fields such as the date of creation and the format of the document. »

Graph Algorithms and Applications

Graph Basics: A graph is defined as a collection of objects where some pairs of objects are connected by links. Undirected graphs have symmetrical (reciprocal) links; directed graphs have directed links. Complete graph: an undirected graph with the maximum number of edges, so that every pair of nodes is connected.

Graph attribute: Key factor
Connected versus disconnected: Whether there is a path between any two nodes in the graph, irrespective of distance
Weighted versus unweighted: Whether there are (domain-specific) values on relationships or nodes
Directed versus undirected: Whether or not relationships explicitly define a start and end node
Cyclic versus acyclic: Whether paths start and end at the same node
Sparse versus dense: Relationship to node ratio
Monopartite, bipartite, and k-partite: Whether nodes connect to only one other node type (e. »

Chuanrong Li