<Designing Data-Intensive Applications> Notes

Chapter 1 - Reliable, Scalable, and Maintainable Applications
Typical data-intensive application functions: store data so that they, or another application, can find it again later (databases); remember the result of an expensive operation, to speed up reads (caches); allow users to search data by keyword or filter it in various ways (search indexes); send a message to another process, to be handled asynchronously (stream processing); periodically crunch a large amount of accumulated data (batch processing). An application has to meet functional requirements (e. …

Chuanrong Li on #notes,

Search Engine - Index Construction

This is a general overview of index construction, from (term ID, document ID) pairs to the final inverted index. To answer a query, we simply fetch all the document IDs that contain every term in the query and return them to the user. Before getting to index construction itself, we'll first cover some hardware basics. Access to data in memory is much faster than access to data on disk: it takes perhaps 5 × 10⁻⁹ seconds to access a byte in memory, but about 2 × 10⁻⁸ seconds to transfer a byte from disk.
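
The pipeline above, from (term ID, document ID) pairs to an inverted index and then an AND query, can be sketched in a few lines of Python. This is a minimal in-memory sketch, assuming all pairs fit in RAM and a query is a plain conjunction of term IDs; the names `build_inverted_index`, `intersect`, and `search` are hypothetical. For collections larger than memory, the sort step has to be done on disk, which is why the memory and disk access times matter.

```python
from collections import defaultdict


def build_inverted_index(pairs):
    """Build {term_id: ascending list of doc_ids} from (term_id, doc_id) pairs."""
    index = defaultdict(list)
    # Sorting by term ID, then document ID, groups each term's postings together
    # and leaves every postings list in ascending document order.
    # set() drops duplicate pairs (a term occurring twice in the same document).
    for term_id, doc_id in sorted(set(pairs)):
        index[term_id].append(doc_id)
    return dict(index)


def intersect(p1, p2):
    """Merge-style intersection of two ascending postings lists."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


def search(index, query_term_ids):
    """Return the document IDs that contain every term in the query."""
    postings = [index.get(t, []) for t in query_term_ids]
    if not postings:
        return []
    # Start from the shortest list so intermediate results stay small.
    postings.sort(key=len)
    result = postings[0]
    for p in postings[1:]:
        result = intersect(result, p)
    return result


if __name__ == "__main__":
    pairs = [(1, 2), (2, 1), (1, 1), (3, 2), (2, 2), (1, 3), (3, 3)]
    index = build_inverted_index(pairs)
    print(index)                  # {1: [1, 2, 3], 2: [1, 2], 3: [2, 3]}
    print(search(index, [2, 3]))  # [2]
```

Because every postings list is kept sorted, intersecting two lists is a single linear merge rather than a nested loop.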

Question Answering System Overview

Types of questions. Factoid questions: for example, "How many calories are there in two slices of apple pie?" Most question answering systems focus on factoid questions, which can be answered with simple facts expressed in short texts. Their answers can usually be expressed as a personal name, a temporal expression, or a location. Complex (narrative) questions: for example, "In children with an acute febrile illness, what is the efficacy of acetaminophen in reducing fever?"

Towards Robust Natural Language Understanding

What is robustness in NLU? One thing we have to admit: deep learning models are easily fooled. In software development, robust programming is a style of programming that focuses on handling unexpected termination and unexpected actions. Similarly, in natural language understanding, an NLU system needs to be prepared for input data that does not match its expectations. The expectations vary by system: for a chatbot, that usually means handling offensive language; for text classification, it means ignoring irrelevant information.

Search Engine - Spelling Correction

We cannot guarantee that every word in our queries is typo-free. However, a search engine can try to correct our spelling and give us what we actually want. There is no magic behind it; search engines have to do a lot of work to make it happen. To look up query terms in the index, a search engine has to keep a vocabulary. Vocabulary lookup usually relies on a classical data structure called a dictionary, which has two broad classes of solutions: hashing and search trees.
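
The two classes of vocabulary dictionary can be contrasted with a small Python sketch. This is an illustration under stated assumptions, not a search engine's actual implementation. `HashVocabulary` and `SortedVocabulary` are hypothetical names. The "search tree" side is approximated by a sorted list searched with the standard `bisect` module: it gives the same ordered lookups as a balanced tree, but inserts cost O(n).

```python
import bisect


class HashVocabulary:
    """Vocabulary backed by a hash table: fast exact lookup, no ordering."""

    def __init__(self, terms):
        # Assign each distinct term an ID in first-seen order.
        self.term_ids = {}
        for term in terms:
            self.term_ids.setdefault(term, len(self.term_ids))

    def lookup(self, term):
        # Returns None when the term is not in the vocabulary.
        return self.term_ids.get(term)


class SortedVocabulary:
    """Ordered vocabulary: a sorted list + binary search stands in for a search tree."""

    def __init__(self, terms):
        self.terms = sorted(set(terms))

    def contains(self, term):
        i = bisect.bisect_left(self.terms, term)
        return i < len(self.terms) and self.terms[i] == term

    def with_prefix(self, prefix):
        # All terms sharing a prefix are contiguous in sorted order, starting at
        # the insertion point of the prefix itself.
        i = bisect.bisect_left(self.terms, prefix)
        matches = []
        while i < len(self.terms) and self.terms[i].startswith(prefix):
            matches.append(self.terms[i])
            i += 1
        return matches


if __name__ == "__main__":
    terms = ["search", "engine", "spelling", "index", "seek", "sea"]
    h = HashVocabulary(terms)
    s = SortedVocabulary(terms)
    print(h.lookup("index"))     # 3
    print(h.lookup("serch"))     # None
    print(s.with_prefix("se"))   # ['sea', 'search', 'seek']
```

The trade-off this sketch is meant to show: the hash table answers "is this exact term present?" in expected constant time. However, similar-looking terms land in unrelated slots, so it cannot cheaply enumerate neighbouring terms. The ordered structure pays O(log n) per lookup but can list every term that starts with a given prefix, which is the kind of ordered query that lookups of term variants rely on.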