So I noticed that newlines are getting removed when the RAG ingestion transforms a PDF to txt. That probably reduces accuracy when using similarity search.
I kind of had a workaround: my ingestion files don't need to be PDFs, so I could just take the txt file as shown here and leave it as it is for the embedding:
---
event: meetup
title: Langchain AI MVP
date: "2024-02-03"
tags: ["meetup", "langchain"]
---
0:00:00.719,0:00:03.719
okay
0:00:05.400,0:00:09.120
awesome so I will start with the first
0:00:07.980,0:00:11.460
talk
0:00:09.120,0:00:14.280
uh thanks again for attending here to
I'm sure that gave better results than without newlines!
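To illustrate why the newlines matter for a transcript like the one above, here is a minimal Python sketch. It is not the construct's actual transformation code: `flatten` is a hypothetical stand-in that assumes the reported behavior amounts to collapsing all whitespace into single spaces, and `split_cues` is an illustrative helper that groups newline-preserved text into (timestamp, caption) pairs.

```python
import re

# Sample taken from the transcript excerpt in this issue.
RAW = """0:00:00.719,0:00:03.719
okay
0:00:05.400,0:00:09.120
awesome so I will start with the first
"""

# Matches a full cue line such as "0:00:00.719,0:00:03.719".
CUE = re.compile(r"^\d+:\d{2}:\d{2}\.\d{3},\d+:\d{2}:\d{2}\.\d{3}$")


def flatten(text: str) -> str:
    """Hypothetical stand-in for the reported behavior:
    collapse all whitespace, including newlines, into single spaces."""
    return " ".join(text.split())


def split_cues(text: str) -> list[tuple[str, str]]:
    """Group newline-separated text into (timestamp, caption) pairs."""
    cues = []
    current = None   # timestamp line of the cue being collected
    captions = []    # caption lines seen since that timestamp
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if CUE.match(line):
            # A new timestamp closes the previous cue, if any.
            if current is not None:
                cues.append((current, " ".join(captions)))
            current, captions = line, []
        elif current is not None:
            captions.append(line)
    if current is not None:
        cues.append((current, " ".join(captions)))
    return cues


if __name__ == "__main__":
    print(split_cues(RAW))           # cue boundaries are recoverable
    print(split_cues(flatten(RAW)))  # no line boundaries left to split on
```

With newlines intact, each caption stays attached to its timestamp, so a chunker can cut on cue boundaries. Once the text is flattened into one line, this line-based grouping finds nothing to split on, and timestamps and captions run together inside whatever chunk they land in.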
Expected Behavior
Newlines are not removed
Current Behavior
Newlines are removed
Reproduction Steps
Run a RAG ingestion and view the resulting txt file.
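A quick way to confirm the symptom when viewing the file could look like the sketch below. `newlines_lost` is a hypothetical helper, not part of the construct; it assumes you have both the original text and the text of the generated txt file available as strings.

```python
def newlines_lost(source_text: str, output_text: str) -> bool:
    """Return True when the source has line breaks but the output has none."""
    source_has_breaks = "\n" in source_text.strip()
    output_has_breaks = "\n" in output_text.strip()
    return source_has_breaks and not output_has_breaks
```

For example, `newlines_lost("a\nb\n", "a b")` reports the problem, while an output that still contains line breaks does not.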
Possible Solution
No response
Additional Information/Context
No response
CDK CLI Version
2.124.0
Framework Version
No response
Node.js Version
20
OS
macos
Language
Typescript
Language Version
No response
Region experiencing the issue
us-east-1
Code modification
....
Other information
No response
Service quota
I have reviewed the service quotas for this construct
Hi @mmuller88, thank you for reporting this! This method currently relies on Langchain. We will soon be merging a new capability that lets users provide their own Lambda business logic in case they want to use a different library or transformation method.
I will close this ticket and mention #284, which should be merged pretty soon.