New Features & Enhancements

  • Support for Docling 2.0 has been added to the DPK pdf2parquet transform. With this update, DPK users can ingest other types of documents, such as MS Word, MS PowerPoint, images, Markdown, and AsciiDoc.
  • Released the Web2parquet transform for crawling the web.

Data Prep Kit Resources

📄 Papers

  1. Data-Prep-Kit: getting your data ready for LLM application development
  2. Granite Code Models: A Family of Open Foundation Models for Code Intelligence
  3. Scaling Granite Code Models to 128K Context

🎤 External Events and Showcase

  1. "Building Successful LLM Apps: The Power of high quality data" - Video | Slides
  2. "Hands on session for fine tuning LLMs" - Video
  3. "Build your own data preparation module using data-prep-kit" - Video
  4. "Data Prep Kit: A Comprehensive Cloud-Native Toolkit for Scalable Data Preparation in GenAI App" - Video | Slides
  5. "RAG with Data Prep Kit" Workshop @ Mountain View, CA, USA - info
  6. Tech Educator summit IBM CSR Event
  7. Talk and Hands on session at MIT Bangalore
  8. PyData NYC 2024 - 90 mins Tutorial
  9. Open Source AI Demo Night
  10. Data Exchange Podcast with Ben Lorica
  11. Unstructured Data Meetup - SF, NYC, Silicon Valley
  12. IBM TechXchange Las Vegas
  13. Open Source RAG Pipeline workshop with Data Prep Kit at TechEquity's AI Summit in Silicon Valley
  14. Data Science Dojo Meetup - video
  15. DPK tutorial and hands on session at IIIT Delhi

Example Code

Find example code in the README section of each transform, and some sample Jupyter notebooks for getting started here.

Blogs / Tutorials

Relevant online communities

We Want Your Feedback!

Feel free to contribute to discussions or create a new one to share your feedback.