Hugging Face Inference Endpoints

Hugging Face Inference Endpoints offer a simple way to run model inference. Combined with Pinecone, we can easily generate and index high-quality vector embeddings.

Let's get started by initializing an Inference Endpoint for generating vector embeddings.

Endpoints

We start by heading over to the Hugging Face Inference Endpoints homepage and signing up for an account if needed. After signing up, we are taken to the Inference Endpoints dashboard.

We click Create new endpoint, choose a model repository (the model we want to serve), give the endpoint a name (anything works), and select a cloud provider. Before continuing, it is very important that we set the task to Sentence Embeddings under the Advanced configuration settings.

Other important options include the Instance Type; by default this uses a CPU, which is cheaper but slower. For faster processing we need a GPU instance. Finally, we set the privacy settings at the end of the page.

After setting these options, we can click Create Endpoint at the bottom of the page. This should take us to the next page, where we will see the current status of our endpoint.

Once the status changes from Building to Running (this can take some time), we can begin creating embeddings with the endpoint.

Create Embeddings

Each endpoint is given an Endpoint URL, which can be found on the endpoint Overview page. We need to assign this URL to the endpoint variable.

Python

endpoint = "<<ENDPOINT_URL>>"

We also need our Hugging Face API token, which we assign to the api_org variable.

Python

api_org = "<<API_ORG_TOKEN>>"
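
Hardcoding credentials in a notebook is easy to leak. As an optional alternative, both values can be read from environment variables instead; the variable names below are just illustrative, not standard ones:

Python

import os

# read the endpoint URL and API token from environment variables
# (HF_ENDPOINT_URL and HF_API_TOKEN are example names chosen here)
endpoint = os.environ["HF_ENDPOINT_URL"]
api_org = os.environ["HF_API_TOKEN"]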

Now we're ready to create embeddings via the Inference Endpoint. Let's start with a toy example.

Python

import requests

# add the api org token to the headers
headers = {
    'Authorization': f'Bearer {api_org}'
}
# we add sentences to embed like so
json_data = {"inputs": ["a happy dog", "a sad dog"]}
# make the request
res = requests.post(
    endpoint,
    headers=headers,
    json=json_data
)

We should see a 200 response.

Python

res

<Response [200]>
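
If the endpoint is still initializing, we may instead see an error status such as 503. In that case we can simply wait and retry; below is a rough sketch of such a retry loop (the number of attempts and wait time are arbitrary choices):

Python

import time

# retry the toy request a few times while the endpoint warms up
for attempt in range(5):
    res = requests.post(endpoint, headers=headers, json=json_data)
    if res.status_code == 200:
        break
    # endpoint not ready yet, wait before trying again
    time.sleep(30)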

In the response, we should find two embeddings...

Python

len(res.json()['embeddings'])

2

We can also see the dimensionality of the embeddings like so:

Python

dim = len(res.json()['embeddings'][0])
dim

768

We will need more entries to search through, so let's download a larger dataset. For this, we will use Hugging Face Datasets.

Python

from datasets import load_dataset

snli = load_dataset("snli", split='train')
snli

Downloading: 100%|██████████| 1.93k/1.93k [00:00<00:00, 992kB/s]
Downloading: 100%|██████████| 1.26M/1.26M [00:00<00:00, 31.2MB/s]
Downloading: 100%|██████████| 65.9M/65.9M [00:01<00:00, 57.9MB/s]
Downloading: 100%|██████████| 1.26M/1.26M [00:00<00:00, 43.6MB/s]

Dataset({
    features: ['premise', 'hypothesis', 'label'],
    num_rows: 550152
})

SNLI contains 550K sentence pairs, many of which are duplicates, so we take just one set of sentences (the hypotheses) and deduplicate them.

Python

passages = list(set(snli['hypothesis']))
len(passages)

480042

We will cut this down to 50K sentences so the example runs quickly; if you have the time, feel free to keep the full 480K.

Python

passages = passages[:50_000]

Vector DB

Now that we have our endpoint and dataset ready, the only thing missing is the vector database. For this, we initialize our connection to Pinecone, which requires a free API key.

Python

import pinecone

# initialize connection to pinecone (get API key at app.pinecone.io)
pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENVIRONMENT")

Now we create a new index called 'hf-endpoints'. The name isn't important, but the dimension must match the output dimensionality of our endpoint's model (which we found above), and the metric for these models is typically cosine.

Python

index_name = 'hf-endpoints'

# check if the hf-endpoints index exists
if index_name not in pinecone.list_indexes():
    # create the index if it does not exist
    pinecone.create_index(
        index_name,
        dimension=dim,
        metric="cosine"
    )

# connect to hf-endpoints index we created
index = pinecone.Index(index_name)
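
As a quick optional sanity check, a freshly created index should report zero vectors before we add anything to it:

Python

# a brand-new index should contain no vectors yet
index.describe_index_stats()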

Create and Index Embeddings

Now all of our components are ready: the endpoint, the dataset, and Pinecone. Let's go ahead and create embeddings for our dataset and index them in Pinecone.

Python

from tqdm.auto import tqdm

# we will use batches of 64
batch_size = 64

for i in tqdm(range(0, len(passages), batch_size)):
    # find end of batch
    i_end = min(i+batch_size, len(passages))
    # extract batch
    batch = passages[i:i_end]
    # generate embeddings for batch via endpoints
    res = requests.post(
        endpoint,
        headers=headers,
        json={"inputs": batch}
    )
    emb = res.json()['embeddings']
    # get metadata (just the original text)
    meta = [{'text': text} for text in batch]
    # create IDs
    ids = [str(x) for x in range(i, i_end)]
    # add all to upsert list
    to_upsert = list(zip(ids, emb, meta))
    # upsert/insert these records to pinecone
    _ = index.upsert(vectors=to_upsert)

# check that we have all vectors in index
index.describe_index_stats()

100%|██████████| 782/782 [11:02<00:00,  1.18it/s]

{'dimension': 768,
 'index_fullness': 0.1,
 'namespaces': {'': {'vector_count': 50000}},
 'total_vector_count': 50000}
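
The loop above assumes every request to the endpoint succeeds. On longer runs you may occasionally see timeouts or 5xx responses; a small retry wrapper around the embedding call, sketched below, can help (the embed name and retry settings are our own choices):

Python

import time

def embed(texts, retries=3, wait=5):
    # call the endpoint, retrying on transient failures
    for _ in range(retries):
        res = requests.post(endpoint, headers=headers, json={"inputs": texts})
        if res.status_code == 200:
            return res.json()['embeddings']
        time.sleep(wait)
    # give up and surface the last error
    res.raise_for_status()

# inside the indexing loop we would then write: emb = embed(batch)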

With everything indexed, we can now begin querying. We will take a few examples from the premise column of the dataset.

Python

query = snli['premise'][0]
print(f"Query: {query}")
# encode with HF endpoints
res = requests.post(endpoint, headers=headers, json={"inputs": query})
xq = res.json()['embeddings']
# query and return top 5
xc = index.query(xq, top_k=5, include_metadata=True)
# iterate through results and print text
print("Answers:")
for match in xc['matches']:
    print(match['metadata']['text'])

Query: A person on a horse jumps over a broken down airplane.
Answers:
The horse jumps over a toy airplane.
a lady rides a horse over a plane shaped obstacle
A person getting onto a horse.
person rides horse
A woman riding a horse jumps over a bar.

These look good; let's try a couple more examples.

Python

query = snli['premise'][100]
print(f"Query: {query}")
# encode with HF endpoints
res = requests.post(endpoint, headers=headers, json={"inputs": query})
xq = res.json()['embeddings']
# query and return top 5
xc = index.query(xq, top_k=5, include_metadata=True)
# iterate through results and print text
print("Answers:")
for match in xc['matches']:
    print(match['metadata']['text'])

Query: A woman is walking across the street eating a banana, while a man is following with his briefcase.
Answers:
A woman eats a banana and walks across a street, and there is a man trailing behind her.
A woman eats a banana split.
A woman is carrying two small watermelons and a purse while walking down the street.
The woman walked across the street.
A woman walking on the street with a monkey on her back.

And one more:

Python

query = snli['premise'][200]
print(f"Query: {query}")
# encode with HF endpoints
res = requests.post(endpoint, headers=headers, json={"inputs": query})
xq = res.json()['embeddings']
# query and return top 5
xc = index.query(xq, top_k=5, include_metadata=True)
# iterate through results and print text
print("Answers:")
for match in xc['matches']:
    print(match['metadata']['text'])

Query: People on bicycles waiting at an intersection.
Answers:
A pair of people on bikes are waiting at a stoplight.
Bike riders wait to cross the street.
people on bicycles
Group of bike riders stopped in the street.
There are bicycles outside.
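
Since we repeat the same encode-and-query steps for every example, we could also wrap them in a small reusable helper; this is just a sketch (the search name is our own):

Python

def search(query, top_k=5):
    # embed the query via the endpoint, then retrieve the closest matches from Pinecone
    res = requests.post(endpoint, headers=headers, json={"inputs": query})
    xq = res.json()['embeddings']
    xc = index.query(xq, top_k=top_k, include_metadata=True)
    return [match['metadata']['text'] for match in xc['matches']]

search(snli['premise'][200])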

All of these results look great. If you don't plan to keep your endpoint and vector database running beyond this walkthrough, you can shut them down.

Once an index is deleted, it cannot be used again.

Shut down the endpoint by navigating to the Inference Endpoints Overview page and selecting Delete endpoint. Delete the Pinecone index with:

Python

pinecone.delete_index(index_name)
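
If we want to confirm the deletion, listing the remaining indexes should no longer show 'hf-endpoints':

Python

# confirm the index has been removed
pinecone.list_indexes()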

