- Added support for Docling 2.0 to the DPK pdf2parquet transform. DPK users can now ingest other types of documents, e.g. MS Word, MS PowerPoint, images, Markdown, AsciiDoc, etc.
- Released the Web2parquet transform for crawling the web.
- Data-Prep-Kit: getting your data ready for LLM application development
- Granite Code Models: A Family of Open Foundation Models for Code Intelligence
- Scaling Granite Code Models to 128K Context
- "Building Successful LLM Apps: The Power of high quality data" - Video | Slides
- "Hands on session for fine tuning LLMs" - Video
- "Build your own data preparation module using data-prep-kit" - Video
- "Data Prep Kit: A Comprehensive Cloud-Native Toolkit for Scalable Data Preparation in GenAI App" - Video | Slides
- "RAG with Data Prep Kit" Workshop @ Mountain View, CA, USA - info
- Tech Educator summit IBM CSR Event
- Talk and Hands on session at MIT Bangalore
- PyData NYC 2024 - 90-minute tutorial
- Open Source AI Demo Night
- Data Exchange Podcast with Ben Lorica
- Unstructured Data Meetup - SF, NYC, Silicon Valley
- IBM TechXchange Las Vegas
- Open Source RAG Pipeline workshop with Data Prep Kit at TechEquity's AI Summit in Silicon Valley
- Data Science Dojo Meetup - video
- DPK tutorial and hands on session at IIIT Delhi
Find example code in the README section of each transform, and some sample Jupyter notebooks for getting started here
- Data Prep Kit Discord Channel
- DPK is now listed in GitHub Awesome-LLM under the LLM Data section
- DPK is now available via IBM SkillsBuild Download
- DPK has been added to the Application Hub of the “AI Sustainability Catalog”
Feel free to contribute to discussions or create a new one to share your feedback.