TypeError: Cannot use scipy.linalg.eigh for sparse A with k >= N. Use scipy.linalg.eigh(A.toarray()) or reduce k. #23

Open
alifarhan357 opened this issue Apr 29, 2024 · 1 comment

Comments

@alifarhan357

from bertopic import BERTopic

# Define seed words for topics
seed_words = [
    ['software', 'programming', 'Python', 'Java', 'machine learning', 'data visualization'],
    ['project management', 'leadership'],
    ['healthcare', 'medical research', 'patient care', 'disease prevention']
]

# Sample documents (text data)
documents = [
    "This is about software development and programming languages like Python and Java.",
    "Finance and banking are important topics in the economy.",
    "Project management and leadership skills are essential for success.",
    "Healthcare and medical research focus on patient care and disease prevention."
]

# Initialize BERTopic model with seed_topic_list
model = BERTopic(seed_topic_list=seed_words)

# Fit and transform documents to obtain topics and probabilities
topics, probabilities = model.fit_transform(documents)

# Display the assigned topics for each document
for i, (doc, topic) in enumerate(zip(documents, topics)):
    print(f"Document {i+1}: Topic {topic} - '{doc}'")

Error

/usr/local/lib/python3.10/dist-packages/scipy/sparse/linalg/_eigen/arpack/arpack.py:1600: RuntimeWarning: k >= N for N * N square matrix. Attempting to use scipy.linalg.eigh instead.
warnings.warn("k >= N for N * N square matrix. "
/usr/local/lib/python3.10/dist-packages/scipy/sparse/linalg/_eigen/arpack/arpack.py:1600: RuntimeWarning: k >= N for N * N square matrix. Attempting to use scipy.linalg.eigh instead.
warnings.warn("k >= N for N * N square matrix. "

TypeError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/bertopic/_bertopic.py in _reduce_dimensionality(self, embeddings, y, partial_fit)
3471 y = np.array(y) if y is not None else None
-> 3472 self.umap_model.fit(embeddings, y=y)
3473 except TypeError:

14 frames
TypeError: Cannot use scipy.linalg.eigh for sparse A with k >= N. Use scipy.linalg.eigh(A.toarray()) or reduce k.

During handling of the above exception, another exception occurred:

TypeError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/scipy/sparse/linalg/_eigen/arpack/arpack.py in eigsh(A, k, M, sigma, which, v0, ncv, maxiter, tol, return_eigenvectors, Minv, OPinv, mode)
1603
1604 if issparse(A):
-> 1605 raise TypeError("Cannot use scipy.linalg.eigh for sparse A with "
1606 "k >= N. Use scipy.linalg.eigh(A.toarray()) or"
1607 " reduce k.")

TypeError: Cannot use scipy.linalg.eigh for sparse A with k >= N. Use scipy.linalg.eigh(A.toarray()) or reduce k.

@MaartenGr
Owner

Thank you for sharing. This is a BERTopic issue and not a Concept issue, so I would advise you to check the issues page of BERTopic instead. I believe you can also find some temporary solutions there until a fix is released.
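
For readers who land here with the same traceback, a likely reading of the `k >= N` message is that the corpus is too small for UMAP's spectral initialisation. The sketch below is only a plain-Python sanity check, not BERTopic or UMAP code. It assumes that BERTopic's default UMAP uses 5 components, that UMAP requests `n_components + 1` eigenvectors from `eigsh`, and that N is roughly the number of documents. The helper names `spectral_init_is_safe` and `min_documents_for` are hypothetical and exist only for illustration.

```python
# Assumption: BERTopic's default UMAP reduces embeddings to 5 components,
# and UMAP's spectral initialisation asks eigsh for n_components + 1
# eigenvectors of an N x N graph, with N close to the number of documents.
DEFAULT_N_COMPONENTS = 5  # assumed default, not read from the traceback


def spectral_init_is_safe(n_documents: int,
                          n_components: int = DEFAULT_N_COMPONENTS) -> bool:
    """Return True when k = n_components + 1 is strictly smaller than N."""
    k = n_components + 1
    return k < n_documents


def min_documents_for(n_components: int = DEFAULT_N_COMPONENTS) -> int:
    """Smallest corpus size for which k < N holds under the same assumption."""
    k = n_components + 1
    return k + 1


if __name__ == "__main__":
    n_docs = 4  # the issue's example passes four documents
    print("safe with 4 documents:", spectral_init_is_safe(n_docs))
    print("minimum documents under these assumptions:", min_documents_for())
```

If this reading is right, the four-document example trips the check and a noticeably larger corpus would avoid it. Passing a custom `umap_model` with fewer components or `init="random"` may sidestep the eigensolver as well, but with so few documents the clustering step could still fail, so treat that as something to try rather than a confirmed fix.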
