# Indexes

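Indexes are well known from databases; they speed up data retrieval operations on a data store. The trade-off for faster reads is extra storage overhead and slower writes, since every write must update the index as well as the data. An index lets us locate data quickly without examining every row of a database table. It can be created on one or more columns of a table, and it provides the basis for both rapid random lookups and efficient access to ordered records.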
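A minimal Python sketch of this trade-off, using an in-memory list as a stand-in for a table. The `Table` class, the `city` column, and the method names are hypothetical and chosen only for illustration; real databases typically use on-disk structures such as B-trees rather than a dictionary.

```python
from collections import defaultdict


class Table:
    """Toy table with one secondary index on the (hypothetical) 'city' column."""

    def __init__(self):
        self.rows = []                        # the actual data
        self.city_index = defaultdict(list)   # city -> positions of matching rows

    def insert(self, row):
        # A write now costs two updates: the data itself and the index.
        position = len(self.rows)
        self.rows.append(row)
        self.city_index[row["city"]].append(position)

    def find_by_city_scan(self, city):
        # Without an index: examine every row.
        return [row for row in self.rows if row["city"] == city]

    def find_by_city_indexed(self, city):
        # With an index: jump straight to the matching row positions.
        return [self.rows[pos] for pos in self.city_index.get(city, [])]


table = Table()
table.insert({"id": 1, "city": "Oslo"})
table.insert({"id": 2, "city": "Lima"})
table.insert({"id": 3, "city": "Oslo"})

# Both access paths return the same rows; only the work done differs.
assert table.find_by_city_scan("Oslo") == table.find_by_city_indexed("Oslo")
```

The index occupies extra memory and every `insert` does extra work, which is the storage and write cost described above.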

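An index is a data structure that can be viewed as a table of contents pointing to where the actual data lives. When we create an index on a column of a table, the index stores that column's value together with a pointer to the whole row. Indexes are also used to create different views of the same data. For large data sets, this is an excellent way to specify different filters or sorting schemes without creating multiple additional copies of the data.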
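The idea of "column value plus pointer to the row" can be sketched as a sorted list of `(value, position)` pairs. The sample rows and the helper names `build_sorted_index`, `lookup`, and `scan_in_order` are assumptions made for this example, not part of any particular database.

```python
import bisect

# The data is stored once, in arbitrary order.
rows = [
    {"id": 3, "name": "Carol", "age": 41},
    {"id": 1, "name": "Alice", "age": 29},
    {"id": 2, "name": "Bob", "age": 35},
]


def build_sorted_index(rows, column):
    # Each entry holds the indexed value and a pointer (list position) to the row.
    return sorted((row[column], pos) for pos, row in enumerate(rows))


def lookup(rows, index, key):
    # Binary search for the first entry whose value equals key.
    i = bisect.bisect_left(index, (key,))
    matches = []
    while i < len(index) and index[i][0] == key:
        matches.append(rows[index[i][1]])
        i += 1
    return matches


def scan_in_order(rows, index):
    # Follow the pointers in index order to read the rows sorted by that column.
    return [rows[pos] for _, pos in index]


# Two indexes give two orderings of the same rows without copying any row.
by_name = build_sorted_index(rows, "name")
by_age = build_sorted_index(rows, "age")
```

Here `scan_in_order(rows, by_age)` and `scan_in_order(rows, by_name)` present the same rows under different sort orders, while `lookup(rows, by_name, "Bob")` finds a row by binary search instead of a full scan.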

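Just as with a traditional relational data store, we can apply this concept to larger data sets. The trick with indexes is that we must think carefully about how users will access the data. For data sets that are many terabytes in size but have very small payloads (for example, 1 KB), indexes are a necessity for optimizing data access. Finding a small payload in such a large data set can be a real challenge, since we cannot iterate over that much data in any reasonable time. Furthermore, a data set that large is very likely spread across several physical devices, which means we need some way to find the correct physical location of the desired data. Indexes are the best way to do this.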
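As a rough illustration of locating data across several devices, the sketch below keeps an index that maps each key to a `(node, position)` pair. The `ShardedStore` class, the node names, and the explicit `node` argument to `put` are all hypothetical; a real system would also need to decide how the index itself is stored, partitioned, and kept consistent.

```python
class ShardedStore:
    """Toy store whose records live on several 'nodes' (plain lists here)."""

    def __init__(self, node_names):
        self.nodes = {name: [] for name in node_names}  # node -> stored payloads
        self.index = {}                                 # key -> (node, position)

    def put(self, key, payload, node):
        # Record where the payload lands so it can be found without scanning.
        records = self.nodes[node]
        self.index[key] = (node, len(records))
        records.append(payload)

    def get(self, key):
        # One index lookup yields the exact node and position to read.
        location = self.index.get(key)
        if location is None:
            return None
        node, position = location
        return self.nodes[node][position]


store = ShardedStore(["node-a", "node-b"])
store.put("user:1", b"alice", node="node-a")
store.put("user:2", b"bob", node="node-b")
store.put("user:3", b"carol", node="node-b")
```

A call such as `store.get("user:3")` consults the index once and reads a single record from `node-b`, rather than iterating over every record on every node.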

