Cohere v3 for Multilingual QA


Notebook by Bilge Yucel

Multilingual Generative QA Using Cohere and Haystack

In this notebook, we’ll delve into the details of multilingual retrieval and multilingual generation, and demonstrate how to build a Retrieval Augmented Generation (RAG) pipeline to generate answers from multilingual hotel reviews using Cohere models and Haystack 2.0. 🏡

Haystack 2.0 Useful Sources

For Haystack 1.x version, check out Article: Multilingual Generative Question Answering with Haystack and Cohere

Installation

Let’s start by installing Haystack’s Cohere integration:

!pip install cohere-haystack

Storing Multilingual Embeddings

To create a question answering system for hotel reviews, the first thing we need is a document store. We’ll use an InMemoryDocumentStore to save the hotel reviews along with their embeddings.

from haystack.document_stores.in_memory import InMemoryDocumentStore

document_store = InMemoryDocumentStore()

Getting Cohere API Key

After signing up, you can get a Cohere API key for free to start using Cohere models.

from getpass import getpass

COHERE_API_KEY = getpass("Enter Cohere API key:")
Enter Cohere API key:··········

Creating an Indexing Pipeline

Let’s create an indexing pipeline to write the hotel reviews from different languages to our document store. For this, we’ll split the long reviews with DocumentSplitter and create multilingual embeddings for each document using embed-multilingual-v3.0 model with CohereDocumentEmbedder.

from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.writers import DocumentWriter
from haystack.components.preprocessors import DocumentSplitter
from haystack_integrations.components.embedders.cohere import CohereDocumentEmbedder
from haystack.utils import Secret

documents = [Document(content="O ar condicionado de um dos quartos deu problema, mas levaram um ventilador para ser utilizado. Também por ser em uma área bem movimentada, o barulho da rua pode ser ouvido. Porém, eles deixam protetores auriculares para o uso. Também senti falta de um espelho de corpo inteiro no apartamento. Só havia o do banheiro que mostra apenas a parte superior do corpo."),
             Document(content="Durchgängig Lärm, weil direkt an der Partymeile; schmutziges Geschirr; unvollständige Küchenausstattung; Abzugshaube über Herd ging für zwei Stunden automatisch an und lies sich nicht abstellen; Reaktionen auf Anfragen entweder gar nicht oder unfreundlich"),
             Document(content="Das Personal ist sehr zuvorkommend! Über WhatsApp war man im guten Kontakt und konnte alles erfragen. Auch das Angebot des Shuttleservices war super und würde ich empfehlen - sehr unkompliziert! Unser Flug hatte Verspätung und der Shuttle hat auf uns gewartet. Die Lage zur Innenstadt ist sehr gut,jedoch ist die Fensterfront direkt zur Club-Straße deshalb war es nachts bis drei/vier Uhr immer recht laut. Die Kaffeemaschine oder auch die Couch hätten sauberer sein können. Ansonsten war das Appartement aber völlig ok."),
             Document(content="Super appartement. Juste au dessus de plusieurs bars qui ferment très tard. A savoir à l'avance. (Bouchons d'oreilles fournis !)"),
             Document(content="Zapach moczu przy wejściu do budynku, może warto zainstalować tam mocne światło na czujnik ruchu, dla gości to korzystne a dla kogoś kto chciałby zrobić tam coś innego niekorzystne :-). Świetne lokalizacje w centrum niestety są na to narażane."),
             Document(content="El apartamento estaba genial y muy céntrico, todo a mano. Al lado de la librería Lello y De la Torre de los clérigos. Está situado en una zona de marcha, así que si vais en fin de semana , habrá ruido, aunque a nosotros no nos molestaba para dormir"),
             Document(content="The keypad with a code is convenient and the location is convenient. Basically everything else, very noisy, wi-fi didn't work, check-in person didn't explain anything about facilities, shower head was broken, there's no cleaning and everything else one may need is charged."),
             Document(content="It is very central and appartement has a nice appearance (even though a lot IKEA stuff), *W A R N I N G** the appartement presents itself as a elegant and as a place to relax, very wrong place to relax - you cannot sleep in this appartement, even the beds are vibrating from the bass of the clubs in the same building - you get ear plugs from the hotel -> now I understand why -> I missed a trip as it was so loud and I could not hear the alarm next day due to the ear plugs.- there is a green light indicating 'emergency exit' just above the bed, which shines very bright at night - during the arrival process, you felt the urge of the agent to leave as soon as possible. - try to go to 'RVA clerigos appartements' -> same price, super quiet, beautiful, city center and very nice staff (not an agency)- you are basically sleeping next to the fridge, which makes a lot of noise, when the compressor is running -> had to switch it off - but then had no cool food and drinks. - the bed was somehow broken down - the wooden part behind the bed was almost falling appart and some hooks were broken before- when the neighbour room is cooking you hear the fan very loud. I initially thought that I somehow activated the kitchen fan"),
             Document(content="Un peu salé surtout le sol. Manque de service et de souplesse"),
             Document(content="De comfort zo centraal voor die prijs."),
             Document(content="Die Lage war sehr Zentral und man konnte alles sehenswertes zu Fuß erreichen. Wer am Wochenende nachts schlafen möchte, sollte diese Unterkunft auf keinen Fall nehmen. Party direkt vor der Tür so das man denkt, man schläft mitten drin. Sehr Sehr laut also und das bis früh 5 Uhr. Ab 7 kommt dann die Straßenreinigung die keineswegs leiser ist."),
             Document(content="Ótima escolha! Apartamento confortável e limpo! O RoofTop é otimo para beber um vinho! O apartamento é localizado entre duas ruas de movimento noturno. Porem as janelas, blindam 90% do barulho. Não nos incomodou"),
             Document(content="Nous avons passé un séjour formidable. Merci aux personnes , le bonjours à Ricardo notre taxi man, très sympathique. Je pense refaire un séjour parmi vous, après le confinement, tout était parfait, surtout leur gentillesse, aucune chaude négative. Je n'ai rien à redire de négative, Ils étaient a notre écoute, un gentil message tout les matins, pour nous demander si nous avions besoins de renseignement et savoir si tout allait bien pendant notre séjour."),
             Document(content="Boa localização. Bom pequeno almoço. A tv não se encontrava funcional."),
             Document(content="Céntrico. Muy cómodo para moverse y ver Oporto. Edificio con terraza propia en la última planta. Todo reformado y nuevo. Te traen un estupendo desayuno todas las mañanas al apartamento. Solo que se puede escuchar algo de ruido de la calle a primeras horas de la noche. Es un zona de ocio nocturno. Pero respetan los horarios.")
]

indexing_pipeline = Pipeline()
indexing_pipeline.add_component("splitter", DocumentSplitter(split_by="word", split_length=200))
indexing_pipeline.add_component("embedder", CohereDocumentEmbedder(api_key=Secret.from_token(COHERE_API_KEY), model="embed-multilingual-v3.0"))
indexing_pipeline.add_component("writer", DocumentWriter(document_store=document_store))
indexing_pipeline.connect("splitter", "embedder")
indexing_pipeline.connect("embedder", "writer")

indexing_pipeline.run({"splitter": {"documents": documents}})
Calculating embeddings: 100%|██████████| 1/1 [00:00<00:00,  3.29it/s]





{'embedder': {'meta': {'api_version': {'version': '1'},
   'billed_units': {'input_tokens': 1137}}},
 'writer': {'documents_written': 16}}

Building a RAG Pipeline

Now that we have multilingual embeddings indexed in our document store, we’ll create a pipeline where users interact the most: Retrieval-Augmented Generation (RAG) Pipeline.

A RAG pipeline consists of two parts: document retrieval and answer generation.

Multilingual Document Retrieval

In the document retrieval step of a RAG pipeline, CohereTextEmbedder creates an embedding for the query in the multilingual vector space and InMemoryEmbeddingRetriever retrieves the most similar top_k documents to the query from the document store. In our case, the retrieved documents will be hotel reviews.

Multilingual Answer Generation

In the generation step of the RAG pipeline, we’ll use command model of Cohere with CohereGenerator to generate an answer based on the retrieved documents.

Let’s create a prompt template to use for hotel reviews. In this template, we’ll have two prompt variables: {{documents}} and {{question}}. These variables will later be filled with the user question and the retrieved hotel reviews outputted from the retriever.

from haystack import Pipeline
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack_integrations.components.embedders.cohere import CohereTextEmbedder
from haystack_integrations.components.generators.cohere import CohereGenerator

template = """
You will be provided with reviews in multiple languages for an accommodation.
Create a concise and informative answer for a given question based solely on the given reviews.

\nReviews:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

\nQuestion: {{question}};
\nAnswer:
"""
rag_pipe = Pipeline()
rag_pipe.add_component("embedder", CohereTextEmbedder(api_key=Secret.from_token(COHERE_API_KEY), model="embed-multilingual-v3.0"))
rag_pipe.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store, top_k=3))
rag_pipe.add_component("prompt_builder", PromptBuilder(template=template))
rag_pipe.add_component("llm", CohereGenerator(api_key=Secret.from_token(COHERE_API_KEY), model="command"))
rag_pipe.connect("embedder.embedding", "retriever.query_embedding")
rag_pipe.connect("retriever", "prompt_builder.documents")
rag_pipe.connect("prompt_builder", "llm")

Asking a Question

Learn if this hotel is a suitable place this stay with your questions!

question = "Is this place too noisy to sleep?"
result = rag_pipe.run({
    "embedder": {"text": question},
    "prompt_builder": {"question": question}
})

print(result["llm"]["replies"][0])
 The general consensus from the reviews is that this accommodation is very loud and not ideal for sleeping. The first review warns of loud club music which vibrates the beds, while the second review describes loud nightlife noise and loud cleaning practices in the early morning. 

Based on this evidence, it is fair to conclude that this accommodation is too noisy for guests to get adequate rest. 

Other questions you can try 👇

question = "What are the problems about this place?"
result = rag_pipe.run({
    "embedder": {"text": question},
    "prompt_builder": {"question": question}
})

print(result["llm"]["replies"][0])
 Some of the main issues with this accommodation include loud vibrations and poor sound insulation, likely due to the proximity of clubs in the same building. Also, the WiFi did not work and the equipment in the apartment, including the shower head and bed, was broken and aged. Finally, the staff who checked them in offered no guidance or support and charged additionally for any needed amenities. 
question = "What is good about this place?"
result = rag_pipe.run({
    "embedder": {"text": question},
    "prompt_builder": {"question": question}
})

print(result["llm"]["replies"][0])
 The reviews highlight the following positive aspects of the accommodation: 

- Great location 
- Good breakfast 
- Friendly staff and taxi driver (Ricardo)
- Gentleness and care of the staff, with a kind message each morning to ensure all was well. 

It seems like the guests greatly appreciated the staff and service of the accommodation, and recommend it, wishing to return again in the future. 
question = "Should I stay at this hotel?"
result = rag_pipe.run({
    "embedder": {"text": question},
    "prompt_builder": {"question": question}
})

print(result["llm"]["replies"][0])
 The reviews for this hotel are mixed. Some guests had an enjoyable experience at this hotel, highlighting the convenient location, breakfast, and friendly staff. However, it's important to note that there are also more critical reviews that point out some lacking amenities and an unclean atmosphere. 

If you are looking for a hotel with a more consistent reputation, it might be worth considering other options in the area. Ultimately, it is up to you to decide whether this hotel's potential strengths match your preferences and whether the reported issues are deal-breakers for your stay. 
question = "How is the wifi?"
result = rag_pipe.run({
    "embedder": {"text": question},
    "prompt_builder": {"question": question}
})

print(result["llm"]["replies"][0])
 Based on the reviews provided, the wifi is functional but may not work sometimes. There also seems to be an issue with the television set and the cleanliness of the room. 
question = "Are there pubs near by?"
result = rag_pipe.run({
    "embedder": {"text": question},
    "prompt_builder": {"question": question}
})

print(result["llm"]["replies"][0])
 Yes, the apartment is located above multiple bars. However the noise levels at night are mentioned as "ferment(ing) très tard", so you will want to consider ahead of time whether this might be an issue for you.