[PostgreSQL] pgvector Installation and a Simple Usage Example

리거니 2025. 11. 25. 09:39

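00. Introduction to pgvector

  • This guide covers the basic concepts of pgvector and a hands-on walkthrough after installing it.
  • A vector is a mathematical representation of data in a multidimensional space. AI models convert images, text, and audio into vectors so that the similarity between vectors can be computed.
  • pgvector is a PostgreSQL extension that stores vector data and supports similarity search.
  • Key features of pgvector
    • Vector storage: stores vector data directly inside PostgreSQL.
    • Similarity search: supports several measures, including cosine distance, Euclidean (L2) distance, and inner product.
    • Scalability: indexes (the ivfflat and hnsw access methods) speed up searches over large datasets.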

01. Installing pgvector

Note: the postgresql-devel package (required to build extensions) must already be installed.

1-1. Download the source for the latest version with git

[tarandb@ztest-db1 ~]$ git clone --branch v0.8.1 https://github.com/pgvector/pgvector.git
Cloning into 'pgvector'...
remote: Enumerating objects: 12052, done.
remote: Counting objects: 100% (4776/4776), done.
remote: Compressing objects: 100% (376/376), done.
remote: Total 12052 (delta 4567), reused 4400 (delta 4400), pack-reused 7276 (from 4)
Receiving objects: 100% (12052/12052), 1.87 MiB | 8.09 MiB/s, done.
Resolving deltas: 100% (9018/9018), done.
Note: switching to '778dacf20c07caf904557a88705142631818d8cb'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

1-2. make & make install

[tarandb@ztest-db1 ~]$ cd pgvector/
[tarandb@ztest-db1 pgvector]$ ls -al
total 108
drwxrwxr-x 7 tarandb tarandb  4096 Nov 13 09:49 .
drwx------ 3 tarandb tarandb  4096 Nov 13 09:49 ..
-rw-rw-r-- 1 tarandb tarandb    84 Nov 13 09:49 .editorconfig
drwxrwxr-x 8 tarandb tarandb  4096 Nov 13 09:49 .git
drwxrwxr-x 3 tarandb tarandb  4096 Nov 13 09:49 .github
-rw-rw-r-- 1 tarandb tarandb   117 Nov 13 09:49 .gitignore
-rw-rw-r-- 1 tarandb tarandb  6255 Nov 13 09:49 CHANGELOG.md
-rw-rw-r-- 1 tarandb tarandb   713 Nov 13 09:49 Dockerfile
-rw-rw-r-- 1 tarandb tarandb  1104 Nov 13 09:49 LICENSE
-rw-rw-r-- 1 tarandb tarandb  1105 Nov 13 09:49 META.json
-rw-rw-r-- 1 tarandb tarandb  3069 Nov 13 09:49 Makefile
-rw-rw-r-- 1 tarandb tarandb  2724 Nov 13 09:49 Makefile.win
-rw-rw-r-- 1 tarandb tarandb 40773 Nov 13 09:49 README.md
drwxrwxr-x 2 tarandb tarandb  4096 Nov 13 09:49 sql
drwxrwxr-x 2 tarandb tarandb  4096 Nov 13 09:49 src
drwxrwxr-x 6 tarandb tarandb  4096 Nov 13 09:49 test
-rw-rw-r-- 1 tarandb tarandb   145 Nov 13 09:49 vector.control
[tarandb@ztest-db1 pgvector]$ make
Makefile:48: /app/tarantuladb/v15/lib/pgxs/src/makefiles/pgxs.mk: No such file or directory
make: *** No rule to make target '/app/tarantuladb/v15/lib/pgxs/src/makefiles/pgxs.mk'.  Stop.
# make fails here because pgxs.mk (the PGXS build makefile) is missing under this installation.
# The make install output below was captured on ztest-db2 (/app/tarantuladb/v16), where PGXS is available.
[tarandb@ztest-db2 pgvector]$ make install
/usr/bin/mkdir -p '/app/tarantuladb/v16/lib'
/usr/bin/mkdir -p '/app/tarantuladb/v16/share/extension'
/usr/bin/mkdir -p '/app/tarantuladb/v16/share/extension'
/usr/bin/install -c -m 755  vector.so '/app/tarantuladb/v16/lib/vector.so'
/usr/bin/install -c -m 644 .//vector.control '/app/tarantuladb/v16/share/extension/'
/usr/bin/install -c -m 644 .//sql/vector--0.7.4--0.8.0.sql .//sql/vector--0.5.0--0.5.1.sql .//sql/vector--0.1.3--0.1.4.sql .//sql/vector--0.1.1--0.1.3.sql .//sql/vector--0.4.3--0.4.4.sql .//sql/vector--0.1.7--0.1.8.sql .//sql/vector--0.2.0--0.2.1.sql .//sql/vector--0.1.5--0.1.6.sql .//sql/vector--0.3.1--0.3.2.sql .//sql/vector--0.1.8--0.2.0.sql .//sql/vector--0.2.6--0.2.7.sql .//sql/vector--0.3.0--0.3.1.sql .//sql/vector--0.4.2--0.4.3.sql .//sql/vector--0.6.2--0.7.0.sql .//sql/vector--0.4.0--0.4.1.sql .//sql/vector--0.1.4--0.1.5.sql .//sql/vector--0.2.7--0.3.0.sql .//sql/vector--0.7.0--0.7.1.sql .//sql/vector--0.4.1--0.4.2.sql .//sql/vector--0.7.3--0.7.4.sql .//sql/vector--0.5.1--0.6.0.sql .//sql/vector--0.6.1--0.6.2.sql .//sql/vector--0.1.0--0.1.1.sql .//sql/vector--0.2.3--0.2.4.sql .//sql/vector--0.1.6--0.1.7.sql .//sql/vector--0.3.2--0.4.0.sql .//sql/vector--0.8.0--0.8.1.sql .//sql/vector--0.2.5--0.2.6.sql .//sql/vector--0.4.4--0.5.0.sql .//sql/vector--0.2.1--0.2.2.sql .//sql/vector--0.7.2--0.7.3.sql .//sql/vector--0.6.0--0.6.1.sql .//sql/vector--0.2.2--0.2.3.sql .//sql/vector--0.7.1--0.7.2.sql .//sql/vector--0.2.4--0.2.5.sql sql/vector--0.8.1.sql '/app/tarantuladb/v16/share/extension/'
/usr/bin/mkdir -p '/app/tarantuladb/v16/include/server/extension/vector/'
/usr/bin/install -c -m 644   .//src/halfvec.h .//src/sparsevec.h .//src/vector.h '/app/tarantuladb/v16/include/server/extension/vector/'
/usr/bin/mkdir -p '/app/tarantuladb/v16/lib/bitcode/vector'
/usr/bin/mkdir -p '/app/tarantuladb/v16/lib/bitcode'/vector/src/
/usr/bin/install -c -m 644 src/bitutils.bc '/app/tarantuladb/v16/lib/bitcode'/vector/src/
/usr/bin/install -c -m 644 src/bitvec.bc '/app/tarantuladb/v16/lib/bitcode'/vector/src/
/usr/bin/install -c -m 644 src/halfutils.bc '/app/tarantuladb/v16/lib/bitcode'/vector/src/
/usr/bin/install -c -m 644 src/halfvec.bc '/app/tarantuladb/v16/lib/bitcode'/vector/src/
/usr/bin/install -c -m 644 src/hnsw.bc '/app/tarantuladb/v16/lib/bitcode'/vector/src/
/usr/bin/install -c -m 644 src/hnswbuild.bc '/app/tarantuladb/v16/lib/bitcode'/vector/src/
/usr/bin/install -c -m 644 src/hnswinsert.bc '/app/tarantuladb/v16/lib/bitcode'/vector/src/
/usr/bin/install -c -m 644 src/hnswscan.bc '/app/tarantuladb/v16/lib/bitcode'/vector/src/
/usr/bin/install -c -m 644 src/hnswutils.bc '/app/tarantuladb/v16/lib/bitcode'/vector/src/
/usr/bin/install -c -m 644 src/hnswvacuum.bc '/app/tarantuladb/v16/lib/bitcode'/vector/src/
/usr/bin/install -c -m 644 src/ivfbuild.bc '/app/tarantuladb/v16/lib/bitcode'/vector/src/
/usr/bin/install -c -m 644 src/ivfflat.bc '/app/tarantuladb/v16/lib/bitcode'/vector/src/
/usr/bin/install -c -m 644 src/ivfinsert.bc '/app/tarantuladb/v16/lib/bitcode'/vector/src/
/usr/bin/install -c -m 644 src/ivfkmeans.bc '/app/tarantuladb/v16/lib/bitcode'/vector/src/
/usr/bin/install -c -m 644 src/ivfscan.bc '/app/tarantuladb/v16/lib/bitcode'/vector/src/
/usr/bin/install -c -m 644 src/ivfutils.bc '/app/tarantuladb/v16/lib/bitcode'/vector/src/
/usr/bin/install -c -m 644 src/ivfvacuum.bc '/app/tarantuladb/v16/lib/bitcode'/vector/src/
/usr/bin/install -c -m 644 src/sparsevec.bc '/app/tarantuladb/v16/lib/bitcode'/vector/src/
/usr/bin/install -c -m 644 src/vector.bc '/app/tarantuladb/v16/lib/bitcode'/vector/src/
cd '/app/tarantuladb/v16/lib/bitcode' && /usr/bin/llvm-lto -thinlto -thinlto-action=thinlink -o vector.index.bc vector/src/bitutils.bc vector/src/bitvec.bc vector/src/halfutils.bc vector/src/halfvec.bc vector/src/hnsw.bc vector/src/hnswbuild.bc vector/src/hnswinsert.bc vector/src/hnswscan.bc vector/src/hnswutils.bc vector/src/hnswvacuum.bc vector/src/ivfbuild.bc vector/src/ivfflat.bc vector/src/ivfinsert.bc vector/src/ivfkmeans.bc vector/src/ivfscan.bc vector/src/ivfutils.bc vector/src/ivfvacuum.bc vector/src/sparsevec.bc vector/src/vector.bc
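
The make error earlier in this step reports that pgxs.mk could not be found. pgvector's Makefile locates that file by running pg_config --pgxs, so the same lookup can be checked before building. The following is a minimal Python sketch, assuming pg_config is on PATH; the helper name find_pgxs_makefile is hypothetical and not part of pgvector or PostgreSQL.

```python
import os
import shutil
import subprocess


def find_pgxs_makefile(pg_config="pg_config"):
    """Return the pgxs.mk path reported by pg_config, or None if it is unavailable."""
    exe = shutil.which(pg_config)
    if exe is None:
        return None  # pg_config itself is not on PATH
    # `pg_config --pgxs` prints the location of the extension makefile.
    result = subprocess.run([exe, "--pgxs"], capture_output=True, text=True, check=False)
    path = result.stdout.strip()
    # The path may be printed even when the file was never installed, so check it exists.
    if result.returncode == 0 and os.path.isfile(path):
        return path
    return None


path = find_pgxs_makefile()
print(path or "pgxs.mk not found: install the PostgreSQL development package or check PATH")
```

If several PostgreSQL installations exist on the host, passing the name or full path of the intended pg_config to the helper shows which installation make would build against.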

1-3. create vector

[tarandb@ztest-db1 pgvector]$ psql -c "select * from pg_available_extensions where name like '%vector%'";
  name  | default_version | installed_version |                       comment
--------+-----------------+-------------------+------------------------------------------------------
 vector | 0.8.1           |                   | vector data type and ivfflat and hnsw access methods
(1 row)

# Install the vector extension
[tarandb@ztest-db1 pgvector]$ psql -c "create extension vector";
CREATE EXTENSION

[tarandb@ztest-db1 pgvector]$ psql -c "\dx vector";
                           List of installed extensions
  Name  | Version | Schema |                     Description
--------+---------+--------+------------------------------------------------------
 vector | 0.8.1   | public | vector data type and ivfflat and hnsw access methods
(1 row)

02. Using pgvector

2-1. Create a 3-dimensional vector column and query it

# Create a 3-dimensional vector column
postgres=# CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
CREATE TABLE
postgres=# \d+ items
                                                          Table "public.items"
  Column   |   Type    | Collation | Nullable |              Default              | Storage  | Compression | Stats target | Description
-----------+-----------+-----------+----------+-----------------------------------+----------+-------------+--------------+-------------
 id        | bigint    |           | not null | nextval('items_id_seq'::regclass) | plain    |             |              |
 embedding | vector(3) |           |          |                                   | external |             |              |
Indexes:
    "items_pkey" PRIMARY KEY, btree (id)
Access method: heap

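# Insert vectors
# Note: run once, this statement inserts three rows and reports INSERT 0 3. The output captured below and the ids in the
# query result that follows suggest the original session inserted [1,2,3] and [4,5,6] twice and [0.5,0.1,0.2] once.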
postgres=# INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]'), ('[0.5, 0.1, 0.2]');
INSERT 0 2

# Find the rows closest to [3,1,2] by L2 distance
# Formula: sqrt( (x1 - y1)^2 + (x2 - y2)^2 + (x3 - y3)^2 )
postgres=# SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
 id |   embedding
----+---------------
  1 | [1,2,3]
  3 | [1,2,3]
  5 | [0.5,0.1,0.2]
  2 | [4,5,6]
  4 | [4,5,6]
(5 rows)
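
The ordering above can be checked by hand with the L2 formula. The following is a minimal Python sketch that runs outside the database; the function name l2_distance and the variable names are illustrative assumptions, and the vectors are the three distinct values that appear in the result.

```python
import math


def l2_distance(x, y):
    """Euclidean (L2) distance, the quantity the <-> operator orders by."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


query = [3, 1, 2]
# The three distinct embeddings that appear in the query result above.
vectors = [[1, 2, 3], [4, 5, 6], [0.5, 0.1, 0.2]]

# Sort by distance to the query, mirroring ORDER BY embedding <-> '[3,1,2]'.
ranked = sorted(vectors, key=lambda v: l2_distance(v, query))
for v in ranked:
    print(v, round(l2_distance(v, query), 3))
```

The sketch ranks [1,2,3] first (about 2.449), then [0.5,0.1,0.2] (about 3.209), then [4,5,6] (about 5.745), which matches the row order returned by the query.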

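# The following distance operators are supported.
<-> - L2 distance
<#> - (negative) inner product
<=> - cosine distance
<+> - L1 distance
<~> - Hamming distance (binary vectors)
<%> - Jaccard distance (binary vectors)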
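
To make the remaining operators concrete, here is a rough Python sketch of the quantity each one measures. The function names are illustrative and not pgvector's API, and binary vectors are modeled as strings of '0' and '1' purely for the example. The value returned by jaccard_distance for two all-zero inputs is a choice made in this sketch, not a statement about pgvector's behavior.

```python
import math


def neg_inner_product(x, y):
    """<#> : inner product with the sign flipped, so smaller means more similar."""
    return -sum(a * b for a, b in zip(x, y))


def cosine_distance(x, y):
    """<=> : 1 minus the cosine similarity of the two vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1 - dot / norms


def l1_distance(x, y):
    """<+> : sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def hamming_distance(x, y):
    """<~> : number of positions where two bit strings differ."""
    return sum(a != b for a, b in zip(x, y))


def jaccard_distance(x, y):
    """<%> : 1 minus (bits set in both) / (bits set in either)."""
    both = sum(a == "1" and b == "1" for a, b in zip(x, y))
    either = sum(a == "1" or b == "1" for a, b in zip(x, y))
    if either == 0:
        return 1.0  # assumption of this sketch for two all-zero inputs
    return 1 - both / either


x, y = [1, 2, 3], [3, 1, 2]
print(neg_inner_product(x, y))          # -11
print(l1_distance(x, y))                # 4
print(round(cosine_distance(x, y), 4))  # 0.2143
print(hamming_distance("101", "111"))   # 1
print(round(jaccard_distance("101", "111"), 4))  # 0.3333
```

Because <#> returns the negated inner product, sorting in ascending order with it returns the rows with the largest inner product first, in the same way ascending <-> returns the nearest rows first.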