`transform` method not handling single embeddings or strings given to it.

**Issue**

Currently, the `transform` method of `BERTopic` must receive the documents as a list of strings and, if embeddings are passed, they must be a 2D array of shape `(len(documents), embedding_dimension)`. This requires extra wrapping and reshape calls when calling `transform` on a single document.

For example, it would be nicer to be able to do this:
```python
import numpy as np

# `topic_model` is an already fitted BERTopic model
document = 'This is a really interesting document'

document_embedding = np.random.rand(768)

topic_model.transform(document, document_embedding)
```
<details>
  <summary>Example of what you currently have to do</summary>

  ```python
  import numpy as np

  document = 'This is a really interesting document'
  
  document_embedding = np.random.rand(768)
  
  # Wrap the string in a list and reshape the 1D vector to shape (1, 768)
  topic_model.transform([document], document_embedding.reshape(1, -1))
  ```
</details>

Currently, running the first example raises this error:

```
File ~/code/BERTopic/bertopic/_utils.py:55, in check_embeddings_shape(embeddings, docs)
      else:
         if embeddings.shape[0] != len(docs):
--->         raise ValueError("Make sure that the embeddings are a numpy array with shape: "
                               "(len(docs), vector_dim) where vector_dim is the dimensionality "
                               "of the vector embeddings. ")

ValueError: Make sure that the embeddings are a numpy array with shape: (len(docs), vector_dim) where vector_dim is the dimensionality of the vector embeddings.
```
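To see why the check fires, here is a minimal sketch that mirrors only the condition shown in the traceback (`embeddings.shape[0] != len(docs)`). The function name `shape_check_fails` is hypothetical and this is not BERTopic's actual code. With a 1D embedding, `shape[0]` is the embedding dimension (768) rather than the number of documents, and with a bare string, `len(docs)` counts characters rather than documents.

```python
import numpy as np


def shape_check_fails(embeddings: np.ndarray, docs) -> bool:
    """Hypothetical mirror of the condition shown in the traceback.

    Returns True when that condition would raise ValueError.
    """
    return embeddings.shape[0] != len(docs)


document = "This is a really interesting document"
document_embedding = np.random.rand(768)

# 1D embedding: shape[0] is 768 (the vector dimension), len(str) is 37 (characters)
print(document_embedding.shape[0], len(document))  # 768 37
print(shape_check_fails(document_embedding, document))  # True

# Wrapping only the document is not enough: 768 != 1
print(shape_check_fails(document_embedding, [document]))  # True

# Wrapping the document and reshaping the embedding passes: 1 == 1
print(shape_check_fails(document_embedding.reshape(1, -1), [document]))  # False
```

So both inputs need normalizing: wrapping the string alone, or reshaping the vector alone, still leaves the two sides of the comparison unequal.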

**Why I think it is happening**

This happens because the embeddings shape check inside `transform` runs before:
1. the `documents` argument is converted into a list if it is a `str`, and
2. the embedding is reshaped from a 1D array to a 2D array _(note that it currently never does this)_.
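The two steps above amount to normalizing the inputs before the shape check runs. A minimal sketch, assuming a standalone helper; the name `normalize_transform_inputs` is hypothetical, and this is not necessarily how the change on the fork is written:

```python
from typing import List, Optional, Tuple, Union

import numpy as np


def normalize_transform_inputs(
    documents: Union[str, List[str]],
    embeddings: Optional[np.ndarray] = None,
) -> Tuple[List[str], Optional[np.ndarray]]:
    """Hypothetical helper: coerce a single document/embedding into batch form."""
    # Step 1: a bare string becomes a one-element list of documents
    if isinstance(documents, str):
        documents = [documents]

    if embeddings is not None:
        embeddings = np.asarray(embeddings)
        # Step 2: a single 1D vector becomes a (1, embedding_dimension) matrix
        if embeddings.ndim == 1:
            embeddings = embeddings.reshape(1, -1)
        # The existing shape check can now run on consistent inputs
        if embeddings.shape[0] != len(documents):
            raise ValueError(
                "Make sure that the embeddings are a numpy array with shape: "
                "(len(docs), vector_dim)"
            )

    return documents, embeddings


docs, emb = normalize_transform_inputs(
    "This is a really interesting document", np.random.rand(768)
)
print(len(docs), emb.shape)  # 1 (1, 768)

# Batched input passes through unchanged
docs, emb = normalize_transform_inputs(["a", "b"], np.random.rand(2, 768))
print(len(docs), emb.shape)  # 2 (2, 768)
```

Inside `transform`, a normalization like this would have to run before the existing `check_embeddings_shape` call; batched callers would see no change in behavior.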

**Thoughts on making it better**

Am I missing something here about NumPy arrays or the `transform` function? It seems to me that transforming a single document is common enough that the function could support it with a small amount of extra input handling.

I have already made this change on a fork for my own use. Do you agree with the idea, and would you like a PR?

`transform` method not handling single embeddings or strings given to it. #2018