• https://weaviate.io/developers/weaviate/introduction
    • We are flying...
      • Weaviate is a low-latency Vector Database for different media types (text, images, etc.).
      • It offers Semantic Search, Question-Answer Extraction, Classification, etc.
      • Built from scratch in Go.
      • Combining "vector-search" with "structured-filtering".
      • The fault tolerance of a cloud-native database.
    • Features
      • Fast queries
      • Ingest any media type with Weaviate Modules
      • Combine vector and scalar search
      • Real-time and persistent
      • Horizontal Scalability
      • High-Availability
      • Cost-Effectiveness
      • Graph-like connections between objects
  • https://weaviate.io/developers/weaviate/manage-data
    • Collection
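      • # Create a collection (roughly the equivalent of a table)
        from weaviate.classes.config import Configure, Property, DataType, Tokenization
        from weaviate.classes.tenants import Tenant

        # {db_client} is a placeholder for a connected Weaviate client
        {db_client}.collections.create(
            "CollectionName",  # collection name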
            vectorizer_config=Configure.Vectorizer.text2vec_openai(
                model='text-embedding-3-large', 
                vectorize_collection_name=True,  # vectorize the collection name.
                dimensions=3072),
            # vectorizer_config=[
            #    Configure.NamedVectors.text2vec_openai(
            #        name="title_country",
            #        source_properties=["title", "country"]),
            #    Specific properties can be set up as a NamedVector this way!
            # ],
        		
            # vector_index_config=Configure.VectorIndex.hnsw(
            #   quantizer=Configure.VectorIndex.Quantizer.bq(),
            #   ef_construction=300,
            #   distance_metric=VectorDistances.COSINE,
            #   filter_strategy=VectorFilterStrategy.SWEEPING),  #HNSW index
            # vector_index_config=Configure.VectorIndex.flat(),  #FLAT index
            vector_index_config=Configure.VectorIndex.dynamic(),  #DYNAMIC index
            
            inverted_index_config=Configure.inverted_index(
                index_null_state=True,
                index_property_length=True,
                index_timestamps=True),
            
            #reranker_config=Configure.Reranker.cohere(),  # Optional
            
            generative_config=Configure.Generative.openai(
                model='gpt-4o-mini'),
            multi_tenancy_config=Configure.multi_tenancy(
                enabled=True,
                auto_tenant_creation=True,
                auto_tenant_activation=True
            ),
            properties=[
                Property(
                    name="title",
                    data_type=DataType.TEXT,
                    vectorize_property_name=True,  # Use "title" as part of the value to vectorize
                    tokenization=Tokenization.LOWERCASE,
                    index_filterable=True,
                    index_searchable=True,
                    index_range_filters=False,
                ),
                Property(
                    name="body",
                    data_type=DataType.TEXT,
                    skip_vectorization=True,  # Don't vectorize this property
                    tokenization=Tokenization.WHITESPACE
                ),
            ]
        )
        
        # Create one tenant per user ({collection} is a placeholder for the collection object)
        {collection}.tenants.create(
            tenants=[Tenant(name="user_id")]
        )
      • inverted_index_config :
        • https://weaviate.io/developers/weaviate/config-refs/schema#invertedindexconfig
        • index_timestamps :
          • Indexes each object's internal timestamps so that queries can filter on them:
          • `creationTimeUnix` and `lastUpdateTimeUnix`
      • multi_tenancy_config :
        • Each tenant is stored on a separate shard.
        • If your application serves many different users, multi-tenancy keeps their data private and makes database operations more efficient.
        • auto_tenant_creation & auto_tenant_activation :
          • https://weaviate.io/developers/academy/py/multitenancy/setup#-enable-multi-tenancy
      • property.tokenization :
        • WORD : the tokenizer keeps alphanumeric characters, converts them to lowercase, and splits on whitespace.
          • e.g. "Test_domain_weaviate" → 'test', 'domain', 'weaviate'
        • KAGOME_KR : specialized for Korean!
        • https://weaviate.io/developers/weaviate/config-refs/schema#tokenization
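The tokenization options used in these notes (WORD, LOWERCASE, WHITESPACE) can be approximated in plain Python. This is only a rough sketch to make the differences visible, not Weaviate's implementation: the function names are made up for illustration, and the WORD rule is approximated with an ASCII-only regular expression.

```python
import re


def tokenize_word(text: str) -> list[str]:
    """Approximate WORD: lowercase, then keep runs of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tokenize_lowercase(text: str) -> list[str]:
    """Approximate LOWERCASE: lowercase, then split on whitespace only."""
    return text.lower().split()


def tokenize_whitespace(text: str) -> list[str]:
    """Approximate WHITESPACE: split on whitespace, case is preserved."""
    return text.split()


sample = "Test_domain_weaviate Hello"
print(tokenize_word(sample))
print(tokenize_lowercase(sample))
print(tokenize_whitespace(sample))
```

The practical difference shows up in keyword search and filtering: with WORD, a search for "domain" can match "Test_domain_weaviate", whereas with LOWERCASE or WHITESPACE the underscore-joined string stays a single token.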
    • Tenant State
      • https://weaviate.io/developers/weaviate/manage-data/multi-tenancy
      • https://weaviate.io/developers/weaviate/manage-data/tenant-states
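      • Active : the tenant is activated, and its data is moved to memory or SSD.
      • Inactive : the tenant is deactivated, and its data is moved to SSD.
      • Offloaded : the tenant is deactivated, and its data is moved to S3. (Reactivation incurs latency.)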
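The three tenant states and their storage tiers can be modelled as a small lookup table. This is a conceptual sketch only: the enum, the dictionary, and both helper functions are hypothetical names, not part of the Weaviate client, and the rule that only active tenants serve queries is an assumption drawn from the active/deactivated wording above.

```python
from enum import Enum


class TenantState(Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"
    OFFLOADED = "offloaded"


# Storage tier per state, following the list above.
STORAGE_TIER = {
    TenantState.ACTIVE: "memory or SSD (hot)",
    TenantState.INACTIVE: "SSD (warm)",
    TenantState.OFFLOADED: "S3 (cold)",
}


def is_queryable(state: TenantState) -> bool:
    """Assumption: only an active tenant serves reads and writes."""
    return state is TenantState.ACTIVE


def reactivation_is_slow(old: TenantState, new: TenantState) -> bool:
    """Going from offloaded back to active pulls data from S3, which adds latency."""
    return old is TenantState.OFFLOADED and new is TenantState.ACTIVE


for state in TenantState:
    print(state.name, "->", STORAGE_TIER[state], "| queryable:", is_queryable(state))
```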
    • ...
  • https://weaviate.io/developers/weaviate/concepts
    • Multi-tenancy
      • Sharding has several benefits
        • Data isolation
        • Fast, efficient querying
        • Easy and robust setup and clean up
      • Each DB server node can hold 50,000 or more active shards.
      • Each tenant has a dedicated, high-performance vector index.
      • Multi-tenancy is especially useful when you want to store data for multiple customers!
      • IDs : the combination Tenant_ID + Object_UUID is unique!
      • Cross-References :
        • multi-tenancy → non-multi-tenancy (O, allowed)
        • multi-tenancy → multi-tenancy, same tenant (O, allowed)
        • non-multi-tenancy → multi-tenancy (X, not allowed)
        • multi-tenancy → multi-tenancy, different tenant (X, not allowed)
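The ID and cross-reference rules in the multi-tenancy notes can be written down as a small check. This is a sketch under stated assumptions: "same" and "different" are taken to refer to the tenant, `None` stands for an object in a non-multi-tenancy collection, and the function name and the dictionary are hypothetical. The non-multi-tenancy → non-multi-tenancy case is not covered by the notes and is assumed to be allowed.

```python
import uuid
from typing import Optional


def cross_reference_allowed(source_tenant: Optional[str],
                            target_tenant: Optional[str]) -> bool:
    """Return True if a cross-reference from source to target is permitted.

    None means the object lives in a non-multi-tenancy collection.
    """
    if source_tenant is None:
        # A non-multi-tenancy source may not point into a multi-tenancy collection.
        return target_tenant is None
    # A multi-tenancy source may point to a non-multi-tenancy object
    # or to an object of the same tenant.
    return target_tenant is None or target_tenant == source_tenant


# The same UUID can appear once per tenant, because the key is (tenant, UUID).
shared_id = uuid.UUID(int=1)
store = {
    ("tenantA", shared_id): {"title": "A's object"},
    ("tenantB", shared_id): {"title": "B's object"},
}

print(cross_reference_allowed("tenantA", None))       # multi-tenancy -> non-multi-tenancy
print(cross_reference_allowed("tenantA", "tenantB"))  # different tenants
print(len(store))
```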
    • Compression
      • BQ (Binary Quantization) :
      • PQ (Product Quantization) :
      • SQ (Scalar Quantization) :
    • Indexing
      • Vector indexes (vector-search)
        • HNSW :
          • RAM{Hot} ↔ SSD{Warm} ↔ S3{Cold}
          • An Approximate Nearest Neighbor (ANN) search-based vector index.
          • Scales well with large datasets.
        • Flat :
          • SSD{Warm} ↔ SSD{Warm} ↔ S3{Cold}
          • For brute-force (exhaustive) searches.
          • Useful for small datasets.
        • Dynamic :
          • RAM{Hot}/SSD{Warm} ↔ SSD{Warm} ↔ S3{Cold}
          • Flat while the dataset is small →[switch]→ HNSW once the dataset grows large.
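To make the flat and dynamic entries concrete, here is a minimal sketch of a brute-force (flat) search plus the size-based choice a dynamic index makes. It is illustrative only: the function names are hypothetical, the threshold of 10,000 objects is an assumed value, and HNSW itself is not implemented.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def flat_search(vectors: dict, query, k: int):
    """Brute force: score every stored vector and return the k closest ids."""
    ranked = sorted(vectors, key=lambda vid: cosine_distance(vectors[vid], query))
    return ranked[:k]


def dynamic_index_kind(object_count: int, threshold: int = 10_000) -> str:
    """Flat up to the threshold, HNSW once the object count exceeds it.

    The threshold default is an assumption made for this sketch.
    """
    return "hnsw" if object_count > threshold else "flat"


vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
print(flat_search(vectors, [1.0, 0.0], k=2))
print(dynamic_index_kind(500), dynamic_index_kind(50_000))
```

The flat search touches every vector on each query, which is fine for a small tenant but grows linearly with the data. That cost is the reason for switching to an ANN index once the dataset is large.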
      • Inverted indexes (keyword-search)
        • Configured at the level of each Property of a Collection
          • https://weaviate.io/developers/weaviate/concepts/indexing#inverted-index-types-summary
          • https://weaviate.io/developers/weaviate/concepts/indexing#inverted-index-for-timestamps
        • indexSearchable : for BM25 and hybrid search. (Can also be used for filtering, but slower than indexFilterable.)
        • indexFilterable : a match-based index for fast filtering by matching criteria.
        • indexRangeFilters : a range-based index for filtering by numerical ranges.
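The three inverted index types can be pictured with ordinary Python data structures. This is a conceptual sketch, not how Weaviate stores its indexes on disk; the sample objects, the variable names, and the `ids_in_price_range` helper are all made up for illustration.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

objects = {
    1: {"title": "vector database", "price": 10},
    2: {"title": "graph database", "price": 25},
    3: {"title": "vector search", "price": 40},
}

# indexSearchable-style: token -> ids, the lookup a keyword (BM25) search starts from.
searchable = defaultdict(set)
for oid, obj in objects.items():
    for token in obj["title"].lower().split():
        searchable[token].add(oid)

# indexFilterable-style: exact value -> ids, for match-based filters.
filterable = defaultdict(set)
for oid, obj in objects.items():
    filterable[obj["title"]].add(oid)

# indexRangeFilters-style: values kept sorted, so a numeric range is two binary searches.
by_price = sorted((obj["price"], oid) for oid, obj in objects.items())
prices = [price for price, _ in by_price]


def ids_in_price_range(low, high):
    """Return ids whose price lies in the inclusive range [low, high]."""
    start = bisect_left(prices, low)
    end = bisect_right(prices, high)
    return {oid for _, oid in by_price[start:end]}


print(sorted(searchable["vector"]))
print(sorted(filterable["graph database"]))
print(sorted(ids_in_price_range(20, 40)))
```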
    • Vector Indexing (advanced)
      • ...
      • ASYNC_INDEXING : https://weaviate.io/developers/weaviate/concepts/vector-index#dynamic-index
      • ...
    • Filtering
      • Efficient Pre-Filtered Search
        • Each shard contains an inverted index right next to the HNSW index.
        • This allows for efficient pre-filtering.
      • Filter strategy
        • ACORN :
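          • Especially useful when the filter has a low correlation with the query vector.
          • (That is, when the filter excludes many objects in the graph region most similar to the query vector.)
          • Performs better on large-scale data.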
        • Sweeping :
          • The existing and current default filter strategy in Weaviate.
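The pre-filtering idea under "Efficient Pre-Filtered Search" can be sketched as two steps: the inverted index produces an allow-list of ids, and the vector search only considers ids on that list. This is a simplified sketch with hypothetical names and sample data. A brute-force scan stands in for the HNSW traversal, so it does not show how the ACORN or sweeping strategies walk the graph.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


objects = {
    1: {"vector": [0.0, 0.0], "country": "KR"},
    2: {"vector": [0.1, 0.0], "country": "US"},
    3: {"vector": [0.9, 0.9], "country": "KR"},
    4: {"vector": [0.2, 0.1], "country": "KR"},
}

# Inverted index kept next to the vectors: property value -> ids.
country_index = {}
for oid, obj in objects.items():
    country_index.setdefault(obj["country"], set()).add(oid)


def prefiltered_search(query, country, k):
    """Step 1: build the allow-list from the inverted index.
    Step 2: rank only the allowed ids by distance to the query."""
    allow_list = country_index.get(country, set())
    ranked = sorted(allow_list, key=lambda oid: squared_l2(objects[oid]["vector"], query))
    return ranked[:k]


# Object 2 is the nearest vector overall, but the filter excludes it.
print(prefiltered_search([0.1, 0.0], "KR", k=2))
```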
    • Reranking
      • ...
  • ...

-End-
