Various Indexes in MongoDB

yuni02 2023. 11. 22. 23:03

This post is my own summary of what I learned while taking the Fast Campus course '백엔드 개발자를 위한 한 번에 끝내는 대용량 데이터 & 트래픽 처리 초격차 패키지 Online.'

As a result, it may contain mistakes. If you spot one, I would appreciate a note in the comments.

Compound Index and the ESR Rule

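While taking this course, I picked out the information I could apply to a development project I am currently working on. The key technique is measuring the performance of MongoDB queries with the explain method. With it, you can judge whether adding an index would improve a given query.

In MongoDB, the .explain() method provides detailed information about how a query is planned and executed. Its main roles and characteristics are: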

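- Shows the query execution plan: .explain() shows how the database intends to execute a particular query. This includes whether an index is used, how many documents the query has to scan, and the path the execution takes.
- Serves as a performance tuning tool: developers can use this information to analyze and optimize a query. For example, you can check whether an inefficient full collection scan is happening or whether a suitable index is being used.
- Offers several modes: .explain() supports the modes "queryPlanner", "executionStats", and "allPlansExecution", so you can view just the query plan, the execution statistics of the chosen plan, or execution information for every candidate plan.
- Helps with debugging and troubleshooting: it is useful for diagnosing database performance problems or unexpected query behavior. For example, when a query takes longer than expected, you can look for the cause.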

In short, .explain() is an essential tool for understanding and optimizing query performance in MongoDB.
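To make the three modes concrete, here is a minimal sketch that builds the document for MongoDB's `explain` database command around an aggregation. `buildExplainCommand` is a hypothetical helper name, not part of mongosh or any driver, and the pipeline is a shortened illustration. The helper only builds the object; it assumes you would then pass it to `db.runCommand(...)` in mongosh.

```javascript
// Verbosity modes accepted by explain, from least to most detailed.
const VERBOSITY_MODES = ["queryPlanner", "executionStats", "allPlansExecution"];

// Hypothetical helper: wraps an aggregate command in an explain command document.
// The verbosity is required so the caller always states which mode they want.
function buildExplainCommand(collection, pipeline, verbosity) {
  if (!VERBOSITY_MODES.includes(verbosity)) {
    throw new Error(`Unknown explain verbosity: ${verbosity}`);
  }
  return {
    explain: { aggregate: collection, pipeline, cursor: {} },
    verbosity,
  };
}

// Shortened, illustrative pipeline against the "implant" collection.
const command = buildExplainCommand(
  "implant",
  [{ $match: { isUse: true } }],
  "executionStats"
);
console.log(JSON.stringify(command, null, 2));
// In mongosh, the object could then be run with: db.runCommand(command)
```

Requesting "executionStats" actually runs the query, so it costs more than "queryPlanner" but is the mode that reports figures such as `totalDocsExamined`.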

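As the figure above shows, the explain method can guide query performance improvements, by checking performance details like those in the screen below.
The explain output was long, so I stitched two screenshots together; the two images together are a single result.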

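Below are the metrics I got when I checked the performance of a query (Aggregate) used in my development project.
Analyzing the output of MongoDB's `.explain()` method and deciding on performance improvements can be broken into several steps. The main points are: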

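1. Query plan analysis
- Namespace: the database and collection the query runs against (here `d1-op.implant`).
- Query Planner:
  - `parsedQuery`: the query filters on the `isUse` and `patientId` fields, and on the `isUse` value of the elements inside the `dcProcedure` and `opProcedure` fields.
  - `winningPlan`: the chosen plan is a `PROJECTION_DEFAULT` stage fed by a `COLLSCAN` (collection scan, which reads the entire collection).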
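Reading the plan by eye gets tedious for longer outputs, so here is a small sketch that walks a `winningPlan` and lists its stages. `collectStages` is a hypothetical helper name; the sample object is a trimmed copy of the `winningPlan` from the output in this post.

```javascript
// Walks an explain plan tree and returns the stage names from the top down.
// Child plans hang off `inputStage` (one child) or `inputStages` (several).
function collectStages(plan) {
  if (!plan || typeof plan !== "object") return [];
  const children = [];
  if (plan.inputStage) children.push(plan.inputStage);
  if (Array.isArray(plan.inputStages)) children.push(...plan.inputStages);
  const own = plan.stage ? [plan.stage] : [];
  return own.concat(children.flatMap((child) => collectStages(child)));
}

// Trimmed copy of the winningPlan shown in this post.
const winningPlan = {
  stage: "PROJECTION_DEFAULT",
  inputStage: { stage: "COLLSCAN", direction: "forward" },
};

const stages = collectStages(winningPlan);
console.log(stages.join(" <- "));
console.log(
  stages.includes("COLLSCAN") ? "full collection scan" : "no collection scan"
);
```

A plan that uses an index would typically show an `IXSCAN` stage in place of `COLLSCAN`.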

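2. Performance analysis
- Execution Stats:
  - `executionTimeMillis`: the query took 1 millisecond, which is very fast, though the collection is tiny.
  - `totalDocsExamined`: 5 documents were examined. This is the amount of data the query had to work through; `nReturned` is 0, so none of them matched.
  - `COLLSCAN`: the collection was scanned in full without using an index (`totalKeysExamined` is 0). On large datasets this can cause performance degradation.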
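The same numbers can be pulled out programmatically. The sketch below is one possible way to summarize `executionStats`; `summarizeExecution` is a hypothetical helper name, and the sample values are copied from the output in this post.

```javascript
// Extracts the headline numbers from an explain result and derives how many
// documents were examined per document returned.
function summarizeExecution(explainResult) {
  const stats = explainResult.executionStats;
  if (!stats) {
    throw new Error("No executionStats: use 'executionStats' mode or higher");
  }
  const { nReturned, totalKeysExamined, totalDocsExamined, executionTimeMillis } = stats;
  return {
    nReturned,
    totalKeysExamined,
    totalDocsExamined,
    executionTimeMillis,
    // True only when at least one index key was read.
    examinedIndexKeys: totalKeysExamined > 0,
    // null when nothing was returned, to avoid dividing by zero.
    docsExaminedPerReturned: nReturned > 0 ? totalDocsExamined / nReturned : null,
  };
}

// Values copied from the executionStats section of the output in this post.
const sample = {
  executionStats: {
    nReturned: 0,
    executionTimeMillis: 1,
    totalKeysExamined: 0,
    totalDocsExamined: 5,
  },
};
console.log(summarizeExecution(sample));
```

A ratio close to 1 suggests the query reads little more than it returns; a large ratio is a hint that an index might help.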

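3. Ways to improve performance
- Use indexes: the query currently uses `COLLSCAN`, which can be inefficient, especially with a lot of data. Creating indexes on the fields the query filters on (`isUse`, `patientId`, `dcProcedure.isUse`, `opProcedure.isUse`) can improve performance. Note that `dcProcedure` and `opProcedure` are both arrays, and a single compound index cannot cover two array fields of the same document, so they cannot all go into one index.
- Optimize the query: selecting only the fields you need so the query does not return too much data, or tightening the conditions to narrow the search range, can also help.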
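As a rough illustration of how the ESR rule (Equality, Sort, Range) from this section's title could shape such an index, the sketch below orders candidate fields into an index key specification. `buildEsrIndexSpec` is a hypothetical helper, and the sort on `examDate` is an assumption added for illustration; the query in this post has no sort stage.

```javascript
// Orders index fields per the ESR guideline: Equality fields first,
// then Sort fields, then Range fields. Object keys keep insertion order.
function buildEsrIndexSpec({ equality = [], sort = [], range = [] }) {
  const spec = {};
  for (const field of equality) spec[field] = 1;
  for (const [field, direction] of sort) {
    if (!(field in spec)) spec[field] = direction;
  }
  for (const field of range) {
    if (!(field in spec)) spec[field] = 1;
  }
  return spec;
}

// Equality fields come from the $match in this post.
// The array subfields are left out: one compound index cannot hold both arrays.
const spec = buildEsrIndexSpec({
  equality: ["patientId", "isUse"],
  sort: [["examDate", -1]], // assumption: not part of the original query
});
console.log(spec);
// In mongosh, the spec could be passed to: db.implant.createIndex(spec)
```

After creating an index, rerunning explain and checking whether `COLLSCAN` has been replaced is the way to confirm it is actually used.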

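4. Additional considerations
- Data size and query frequency: the query currently runs against a small dataset (5 documents) and finishes very quickly. If there is no performance problem right now, there may be no need for elaborate optimization. However, problems may appear as the data grows or the query is run more often.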

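Ultimately, query optimization has to weigh several factors, including data size, query frequency, and the application's specific requirements. The important thing is to use the `.explain()` output to take measures that fit the context.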

{
  explainVersion: "1",
  queryPlanner: {
    namespace: "d1-op.implant",
    indexFilterSet: false,
    parsedQuery: {
      $and: [
        {
          $or: [
            { dcProcedure: { $elemMatch: { isUse: { $eq: true } } } },
            { opProcedure: { $elemMatch: { isUse: { $eq: true } } } },
          ],
        },
        { isUse: { $eq: true } },
        { patientId: { $eq: "6551d1b85b4897748047d46b" } },
      ],
    },
    queryHash: "37095531",
    planCacheKey: "37095531",
    optimizedPipeline: true,
    maxIndexedOrSolutionsReached: false,
    maxIndexedAndSolutionsReached: false,
    maxScansToExplodeReached: false,
    winningPlan: {
      stage: "PROJECTION_DEFAULT",
      transformBy: {
        _id: true,
        dentalCode: true,
        isUse: true,
        modId: true,
        regDt: true,
        modDt: true,
        site: true,
        patientId: true,
        provNum: true,
        examDate: true,
        providerId: true,
        provAbbr: true,
        regId: true,
        opProcedure: {
          $filter: {
            input: "$opProcedure",
            as: "op",
            cond: { $eq: ["$$op.isUse", { $const: true }] },
          },
        },
        dcProcedure: {
          $filter: {
            input: "$dcProcedure",
            as: "dc",
            cond: { $eq: ["$$dc.isUse", { $const: true }] },
          },
        },
      },
      inputStage: {
        stage: "COLLSCAN",
        filter: {
          $and: [
            {
              $or: [
                { dcProcedure: { $elemMatch: { isUse: { $eq: true } } } },
                { opProcedure: { $elemMatch: { isUse: { $eq: true } } } },
              ],
            },
            { isUse: { $eq: true } },
            { patientId: { $eq: "6551d1b85b4897748047d46b" } },
          ],
        },
        direction: "forward",
      },
    },
    rejectedPlans: [],
  },
  executionStats: {
    executionSuccess: true,
    nReturned: 0,
    executionTimeMillis: 1,
    totalKeysExamined: 0,
    totalDocsExamined: 5,
    executionStages: {
      stage: "PROJECTION_DEFAULT",
      nReturned: 0,
      executionTimeMillisEstimate: 1,
      works: 6,
      advanced: 0,
      needTime: 5,
      needYield: 0,
      saveState: 0,
      restoreState: 0,
      isEOF: 1,
      transformBy: {
        _id: true,
        dentalCode: true,
        isUse: true,
        modId: true,
        regDt: true,
        modDt: true,
        site: true,
        patientId: true,
        provNum: true,
        examDate: true,
        providerId: true,
        provAbbr: true,
        regId: true,
        opProcedure: {
          $filter: {
            input: "$opProcedure",
            as: "op",
            cond: { $eq: ["$$op.isUse", { $const: true }] },
          },
        },
        dcProcedure: {
          $filter: {
            input: "$dcProcedure",
            as: "dc",
            cond: { $eq: ["$$dc.isUse", { $const: true }] },
          },
        },
      },
      inputStage: {
        stage: "COLLSCAN",
        filter: {
          $and: [
            {
              $or: [
                { dcProcedure: { $elemMatch: { isUse: { $eq: true } } } },
                { opProcedure: { $elemMatch: { isUse: { $eq: true } } } },
              ],
            },
            { isUse: { $eq: true } },
            { patientId: { $eq: "6551d1b85b4897748047d46b" } },
          ],
        },
        nReturned: 0,
        executionTimeMillisEstimate: 1,
        works: 6,
        advanced: 0,
        needTime: 5,
        needYield: 0,
        saveState: 0,
        restoreState: 0,
        isEOF: 1,
        direction: "forward",
        docsExamined: 5,
      },
    },
  },
  command: {
    aggregate: "implant",
    pipeline: [
      {
        $match: {
          isUse: true,
          patientId: "6551d1b85b4897748047d46b",
          $or: [
            { opProcedure: { $elemMatch: { isUse: true } } },
            { dcProcedure: { $elemMatch: { isUse: true } } },
          ],
        },
      },
      {
        $project: {
          opProcedure: {
            $filter: {
              input: "$opProcedure",
              as: "op",
              cond: { $eq: ["$$op.isUse", true] },
            },
          },
          dcProcedure: {
            $filter: {
              input: "$dcProcedure",
              as: "dc",
              cond: { $eq: ["$$dc.isUse", true] },
            },
          },
          dentalCode: 1,
          site: 1,
          examDate: 1,
          providerId: 1,
          provNum: 1,
          provAbbr: 1,
          patientId: 1,
          isUse: 1,
          regId: 1,
          regDt: 1,
          modId: 1,
          modDt: 1,
        },
      },
    ],
    cursor: {},
    $db: "d1-op",
  },
  serverInfo: {
    host: "ac-izgs2dd-shard-00-01.mvapd2m.mongodb.net",
    port: 27017,
    version: "6.0.11",
    gitVersion: "f797f841eaf1759c770271ae00c88b92b2766eed",
  },
  serverParameters: {
    internalQueryFacetBufferSizeBytes: 104857600,
    internalQueryFacetMaxOutputDocSizeBytes: 104857600,
    internalLookupStageIntermediateDocumentMaxSizeBytes: 16793600,
    internalDocumentSourceGroupMaxMemoryBytes: 104857600,
    internalQueryMaxBlockingSortMemoryUsageBytes: 33554432,
    internalQueryProhibitBlockingMergeOnMongoS: 0,
    internalQueryMaxAddToSetBytes: 104857600,
    internalDocumentSourceSetWindowFieldsMaxMemoryBytes: 104857600,
  },
  ok: 1,
  $clusterTime: {
    clusterTime: Timestamp({ t: 1700663692, i: 27 }),
    signature: {
      hash: Binary.createFromBase64("VbuGB3LK+4w5zoo3Cg6Au19+w9k=", 0),
      keyId: 7241868336311566000,
    },
  },
  operationTime: Timestamp({ t: 1700663692, i: 27 }),
};