
meta_fields_to_embed silently drops valid falsy metadata values (0, False) #11405

@rautaditya2606

Describe the bug

meta_fields_to_embed silently drops valid falsy metadata values (0, False) during text preparation for embedding and ranking. This happens because the filtering logic uses a Python truthiness check instead of an explicit None guard.
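The difference between the two checks shows up in plain Python, independent of Haystack. The following is a minimal sketch. The `meta` dict is illustrative and includes one explicit None to show what the None guard is meant to exclude.

```python
# Illustrative metadata; "note" holds an explicit None and should be skipped.
meta = {"rating": 0, "is_available": False, "note": None}

# Truthiness check: 0 and False are falsy, so they are dropped along with None.
truthy_kept = [key for key in meta if meta[key]]

# Explicit None guard: only the None value is dropped.
not_none_kept = [key for key in meta if meta[key] is not None]

print(truthy_kept)    # []
print(not_none_kept)  # ['rating', 'is_available']
```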

Affected components:

  • SentenceTransformersDocumentEmbedder
  • SentenceTransformersSparseDocumentEmbedder
  • TransformersSimilarityRanker
  • SentenceTransformersSimilarityRanker
  • SentenceTransformersDiversityRanker

Error message

No error is raised. The values are silently excluded, making it a silent correctness bug.

Expected behavior

All metadata values specified in meta_fields_to_embed should be included in the embedded text unless they are explicitly None or the key is absent from doc.meta.

To Reproduce

from haystack import Document
from haystack.components.embedders import SentenceTransformersDocumentEmbedder

embedder = SentenceTransformersDocumentEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2",
    meta_fields_to_embed=["rating", "is_available"],
    embedding_separator="\n"
)

doc = Document(content="some content", meta={"rating": 0, "is_available": False})

# Expected embedded text: "0\nFalse\nsome content"
# Actual embedded text:   "some content"  ← both fields silently dropped
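Running the embedder requires downloading the model. The filtering step can also be mirrored in plain Python to see the effect in isolation. This is a hedged sketch, not Haystack's code. It assumes the components join the selected metadata values and the content with `embedding_separator`, as the expected output in the comments above implies. It uses a plain dict in place of `doc.meta`, and `prepare_text_buggy` is a hypothetical name.

```python
def prepare_text_buggy(meta, content, meta_fields_to_embed, separator="\n"):
    # Hypothetical mirror of the current filter: a truthiness check on each value,
    # so falsy values such as 0 and False are treated as if the key were absent.
    meta_values = [
        str(meta[key]) for key in meta_fields_to_embed if key in meta and meta[key]
    ]
    return separator.join(meta_values + [content])


meta = {"rating": 0, "is_available": False}
text = prepare_text_buggy(meta, "some content", ["rating", "is_available"])
print(repr(text))  # 'some content'
```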

Root cause: all 5 affected components filter metadata fields with

if key in doc.meta and doc.meta[key]

which treats any falsy value as absent. Should be:

if key in doc.meta and doc.meta[key] is not None
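Substituting the None guard into a plain-Python mirror of the text preparation yields the expected text. This is an illustrative sketch under the same assumptions: metadata values and content are joined with the separator, a plain dict stands in for `doc.meta`, and `prepare_text_fixed` is a hypothetical name rather than the components' actual code. For a dict, `key in meta and meta[key] is not None` is equivalent to `meta.get(key) is not None`, because `get` returns None for absent keys.

```python
def prepare_text_fixed(meta, content, meta_fields_to_embed, separator="\n"):
    # Hypothetical mirror of the proposed filter: skip only absent keys and explicit None.
    meta_values = [
        str(meta[key])
        for key in meta_fields_to_embed
        if key in meta and meta[key] is not None
    ]
    return separator.join(meta_values + [content])


# "note" is explicitly None and "missing" is absent; both are skipped.
meta = {"rating": 0, "is_available": False, "note": None}
text = prepare_text_fixed(meta, "some content", ["rating", "is_available", "note", "missing"])
print(repr(text))  # '0\nFalse\nsome content'
```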

Additional context

OpenAIDocumentEmbedder and AzureOpenAIDocumentEmbedder already use the correct is not None pattern and are unaffected.

FAQ Check

  • Have you had a look at our new FAQ page?

System:

  • OS: Fedora Linux 42 (KDE Plasma) x86_64
  • GPU/CPU: Intel i5-12500H / NVIDIA RTX 3050 4GB
  • Haystack version: main (d8a7c96)
  • DocumentStore: N/A
  • Reader: N/A
  • Retriever: N/A
