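Insert data

After creating a Pinecone index, you can start inserting vector embeddings and metadata into the index.

Inserting the vectors

Connect to the index:

The Python and curl code are shown below, in that order: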

index = pinecone.Index("pinecone-index")

# Not applicable

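Insert the data as a list of (id, vector) tuples. Use the Upsert operation to write vectors into a namespace:

The Python, JavaScript, and curl code are shown below, in that order: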

# Insert sample data (5 8-dimensional vectors)
index.upsert([
    ("A", [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]),
    ("B", [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2]),
    ("C", [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]),
    ("D", [0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4]),
    ("E", [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5])
])

await index.upsert({
  vectors: [
    {
      id: "A",
      values: [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
    },
    {
      id: "B",
      values: [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2],
    },
    {
      id: "C",
      values: [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3],
    },
    {
      id: "D",
      values: [0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4],
    },
    {
      id: "E",
      values: [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
    },
  ],
});

curl -i -X POST https://YOUR_INDEX-YOUR_PROJECT.svc.YOUR_ENVIRONMENT.pinecone.io/vectors/upsert \
  -H 'Api-Key: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "vectors": [
      {
        "id": "A",
        "values": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
      },
      {
        "id": "B",
        "values": [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2]
      },
      {
        "id": "C",
        "values": [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]
      },
      {
        "id": "D",
        "values": [0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4]
      },
      {
        "id": "E",
        "values": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
      }
    ]
  }'

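After you receive the upsert response, the vectors may not be immediately visible to queries. In most cases, you can tell whether the vectors have been received by checking whether the vector count returned by describe_index_stats() has been updated. This technique may not work if the index has multiple replicas. The database is eventually consistent.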
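The check described above can be sketched as a small polling helper. This is a minimal sketch, not part of the Pinecone client: `wait_for_vector_count` and its parameters are hypothetical names, and it assumes the response of describe_index_stats() can be read with the key `total_vector_count`. On indexes with multiple replicas the reported count may be unreliable, so treat the result as a hint only.

```python
import time


def wait_for_vector_count(index, expected_count, timeout_s=30.0, poll_interval_s=1.0):
    """Poll describe_index_stats() until at least expected_count vectors are reported.

    Returns True if the count was reached before the timeout, False otherwise.
    Assumption: the stats response supports ["total_vector_count"] lookup.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        stats = index.describe_index_stats()
        if stats["total_vector_count"] >= expected_count:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)
```

For the five sample vectors above, a call such as `wait_for_vector_count(index, 5)` would return once the index reports at least five vectors, assuming the index was empty beforehand.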

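Batching upserts

Clients that upsert large amounts of data should insert it into the index in batches of 100 vectors or fewer, spread over multiple upsert requests.

Example

Python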

import random
import itertools

def chunks(iterable, batch_size=100):
    """A helper function to break an iterable into chunks of size batch_size."""
    it = iter(iterable)
    chunk = tuple(itertools.islice(it, batch_size))
    while chunk:
        yield chunk
        chunk = tuple(itertools.islice(it, batch_size))

vector_dim = 128
vector_count = 10000

# Example generator that generates many (id, vector) pairs
example_data_generator = map(lambda i: (f'id-{i}', [random.random() for _ in range(vector_dim)]), range(vector_count))

# Upsert data with 100 vectors per upsert request
for ids_vectors_chunk in chunks(example_data_generator, batch_size=100):
    index.upsert(vectors=ids_vectors_chunk)  # Assuming `index` defined elsewhere
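
To see how this batching behaves without sending any requests, the following sketch runs the same loop against a stand-in object that only records batch sizes. `RecordingIndex` is a hypothetical stub, not part of the Pinecone client, and `chunks` is repeated here only so that the block runs on its own.

```python
import itertools


def chunks(iterable, batch_size=100):
    """Break an iterable into tuples of at most batch_size items."""
    it = iter(iterable)
    chunk = tuple(itertools.islice(it, batch_size))
    while chunk:
        yield chunk
        chunk = tuple(itertools.islice(it, batch_size))


class RecordingIndex:
    """Hypothetical stand-in for an index: records the size of each upsert batch."""

    def __init__(self):
        self.batch_sizes = []

    def upsert(self, vectors):
        self.batch_sizes.append(len(vectors))


# 250 small (id, vector) pairs; real data would come from your own pipeline.
data = ((f"id-{i}", [0.0, 0.0]) for i in range(250))

stub = RecordingIndex()
for batch in chunks(data, batch_size=100):
    stub.upsert(vectors=batch)

print(stub.batch_sizes)  # [100, 100, 50]
```

The last batch is simply smaller than the rest, so no padding or special casing is needed.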

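Sending upserts in parallel

By default, all vector operations block until a response is received. With our client, however, they can be executed asynchronously. For the batch upsert example above, this can be done as follows:

The Python and shell code are shown below, in that order: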

# Upsert data with 100 vectors per upsert request asynchronously
# - Create pinecone.Index with pool_threads=30 (limits to 30 simultaneous requests)
# - Pass async_req=True to index.upsert()
with pinecone.Index('example-index', pool_threads=30) as index:
    # Send requests in parallel
    async_results = [
        index.upsert(vectors=ids_vectors_chunk, async_req=True)
        for ids_vectors_chunk in chunks(example_data_generator, batch_size=100)
    ]
    # Wait for and retrieve responses (this raises in case of error)
    [async_result.get() for async_result in async_results]

# Not applicable

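Pinecone is thread-safe, so you can launch multiple read requests and multiple write requests at the same time. Launching multiple requests can help improve throughput. However, reads and writes cannot be performed at the same time, so writing in large batches may affect query latency, and vice versa.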
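
Because requests can be issued from several threads, batches can also be dispatched with Python's standard concurrent.futures module instead of async_req=True. The following is a sketch of that alternative, not the client's own mechanism: `parallel_upsert` and `max_workers` are hypothetical names, and `index` is assumed to be any object with an upsert(vectors=...) method.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_upsert(index, batches, max_workers=8):
    """Send each batch in its own upsert request using a thread pool.

    Returns the responses in the same order as the batches. If any request
    raised, the exception is re-raised here when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(index.upsert, vectors=batch) for batch in batches]
        return [future.result() for future in futures]
```

Note that this submits every batch up front, so for very large datasets the whole list of batches is held in memory while the pool works through it.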

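If uploads are slow, see Performance tuning for advice.

Partitioning an index into namespaces

You can organize the vectors added to an index into partitions, or "namespaces", so that queries and other vector operations are limited to one such namespace at a time. For more information, see Namespaces.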
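
As an illustration, vectors can be grouped by namespace before upserting, with one request per namespace. This sketch assumes the namespace parameter of index.upsert(), the same parameter used in the sparse-values example later on this page. `upsert_by_namespace` and the record layout are hypothetical.

```python
from collections import defaultdict


def upsert_by_namespace(index, records):
    """Upsert (namespace, id, vector) records, one upsert request per namespace.

    Returns a dict mapping each namespace to the number of vectors sent to it.
    """
    grouped = defaultdict(list)
    for namespace, vector_id, values in records:
        grouped[namespace].append((vector_id, values))

    for namespace, vectors in grouped.items():
        index.upsert(vectors=vectors, namespace=namespace)
    return {namespace: len(vectors) for namespace, vectors in grouped.items()}
```

For large groups, each namespace's list would still need to be split into batches of 100 vectors or fewer before sending.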

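Inserting vectors with metadata

You can insert vectors that contain metadata as key-value pairs.

When you send a query, you can use metadata to filter for items that meet specific criteria. Pinecone then searches for similar vector embeddings only among the items that match the filter. For more information, see Metadata filtering.

The Python, JavaScript, and curl code are shown below, in that order: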

index.upsert([
    ("A", [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1], {"genre": "comedy", "year": 2020}),
    ("B", [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2], {"genre": "documentary", "year": 2019}),
    ("C", [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3], {"genre": "comedy", "year": 2019}),
    ("D", [0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4], {"genre": "drama"}),
    ("E", [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5], {"genre": "drama"})
])

await index.upsert({
  vectors: [
    {
      id: "A",
      values: [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
      metadata: { genre: "comedy", year: 2020 },
    },
    {
      id: "B",
      values: [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2],
      metadata: { genre: "documentary", year: 2019 },
    },
    {
      id: "C",
      values: [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3],
      metadata: { genre: "comedy", year: 2019 },
    },
    {
      id: "D",
      values: [0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4],
      metadata: { genre: "drama" },
    },
    {
      id: "E",
      values: [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
      metadata: { genre: "drama" },
    },
  ],
});

curl -i -X POST https://YOUR_INDEX-YOUR_PROJECT.svc.YOUR_ENVIRONMENT.pinecone.io/vectors/upsert \
  -H 'Api-Key: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "vectors": [
      {
        "id": "A",
        "values": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
        "metadata": {"genre": "comedy", "year": 2020}
      },
      {
        "id": "B",
        "values": [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2],
        "metadata": {"genre": "documentary", "year": 2019}
      },
      {
        "id": "C",
        "values": [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3],
        "metadata": {"genre": "comedy", "year": 2019}
      },
      {
        "id": "D",
        "values": [0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4],
        "metadata": {"genre": "drama"}
      },
      {
        "id": "E",
        "values": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
        "metadata": {"genre": "drama"}
      }
    ]
  }'
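
Once vectors carry metadata, a query can be restricted to matching items. The sketch below assumes the query() method of the Python client with its vector, top_k, filter, and include_metadata parameters and the $eq filter operator, none of which are shown on this page; see Metadata filtering for the authoritative syntax. `query_by_genre` is a hypothetical helper.

```python
def query_by_genre(index, query_vector, genre, top_k=3):
    """Search only among vectors whose metadata field "genre" equals `genre`."""
    return index.query(
        vector=query_vector,
        top_k=top_k,
        filter={"genre": {"$eq": genre}},
        include_metadata=True,
    )
```

With the five sample vectors above, `query_by_genre(index, [0.1] * 8, "comedy")` would consider only vectors A and C, since those are the only ones tagged as comedy.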

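Upserting vectors with sparse values

Sparse vector values can be upserted alongside dense vector values.

Sparse vector values cannot be upserted without dense vector values.

⚠️ Warning
Only s1 and p1 pod types using the dotproduct distance metric support querying sparse vectors. No error is raised at upsert time: if you attempt to query any other pod type with sparse vectors, Pinecone returns an error.

Indexes created before February 22, 2023 do not support sparse values.

The Python and curl code are shown below, in that order: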

index = pinecone.Index('example-index')

upsert_response = index.upsert(
    vectors=[
      {'id': 'vec1',
        'values': [0.1, 0.2, 0.3, 0.4],
        'metadata': {'genre': 'drama'},
        'sparse_values': {
            'indices': [10, 45, 16],
            'values': [0.5, 0.5, 0.2]
        }},
      {'id': 'vec2',
        'values': [0.2, 0.3, 0.4, 0.5],
        'metadata': {'genre': 'action'},
        'sparse_values': {
            'indices': [15, 40, 11],
            'values': [0.4, 0.5, 0.2]
        }}
    ],
    namespace='example-namespace'
)

curl --request POST \
     --url https://index_name-project_id.svc.environment.pinecone.io/vectors/upsert \
     --header 'Api-Key: YOUR_API_KEY' \
     --header 'accept: application/json' \
     --header 'content-type: application/json' \
     --data '
{
     "vectors": [
          {
               "values": [
                    0.1,
                    0.2,
                    0.3,
                    0.4
               ],
               "sparseValues": {
                    "indices": [
                         10,
                         45,
                         16
                    ],
                    "values": [
                         0.4,
                         0.5,
                         0.2
                    ]
               },
               "id": "vec1"
          },
          {
               "values": [
                    0.2,
                    0.3,
                    0.4,
                    0.5
               ],
               "sparseValues": {
                    "indices": [
                         15,
                         40,
                         11
                    ],
                    "values": [
                         0.4,
                         0.5,
                         0.2
                    ]
               },
               "id": "vec2"
          }
     ]
}
'
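
Sparse data is often held as a mapping from dimension index to weight, so a small helper can turn it into the record layout used in the Python example above. This is a sketch under stated assumptions: `make_sparse_dense_record` is a hypothetical name, and sorting the indices is only for readable output, not something this page says Pinecone requires. The helper also enforces the rule that sparse values need dense values alongside them.

```python
def make_sparse_dense_record(vector_id, dense_values, sparse_weights, metadata=None):
    """Build one upsert record with both dense and sparse values.

    sparse_weights maps a dimension index to its non-zero weight.
    """
    if not dense_values:
        raise ValueError("sparse values cannot be upserted without dense values")
    indices = sorted(sparse_weights)
    record = {
        "id": vector_id,
        "values": list(dense_values),
        "sparse_values": {
            "indices": indices,
            "values": [sparse_weights[i] for i in indices],
        },
    }
    if metadata is not None:
        record["metadata"] = metadata
    return record


record = make_sparse_dense_record(
    "vec1", [0.1, 0.2, 0.3, 0.4], {10: 0.5, 45: 0.5, 16: 0.2}, {"genre": "drama"}
)
print(record["sparse_values"])
# {'indices': [10, 16, 45], 'values': [0.5, 0.2, 0.5]}
```

A list of such records could then be passed as the vectors argument of index.upsert().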

Troubleshooting index fullness errors

When upserting data, you may receive the following error:

console

Index is full, cannot accept data.

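New upserts may fail once capacity is exhausted. Although your index can still serve queries, you need to scale your environment to accommodate more vectors.

To resolve this issue, you can scale your index.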
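
One way to anticipate this error is to look at how full the index is before a large upsert. The sketch below assumes that the response of describe_index_stats() includes an `index_fullness` value between 0 and 1 and that it can be read by key. The helper name and the 0.9 threshold are hypothetical choices, not Pinecone recommendations.

```python
def is_near_capacity(index, threshold=0.9):
    """Return True if the reported index fullness is at or above threshold."""
    stats = index.describe_index_stats()
    return stats["index_fullness"] >= threshold
```

A client could call this before each bulk load and pause or scale the index when it returns True, rather than waiting for upserts to fail.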

Last updated: 3 months ago
