Pull requests: modelscope/data-juicer

use black (labels: dj:core, environment)
#714 opened Jun 20, 2025 by cyruszhang
Fix sandbox left bugs (labels: bug, dj:cookbook, enhancement)
#710 opened Jun 19, 2025 by HYLcool
[NewOp] Add download_file_mapper op
#709 opened Jun 18, 2025 by Cathy0908
[WIP] Optimization framework (labels: dj:core, dj:efficiency)
#702 opened Jun 13, 2025 by cyruszhang
Add unittest for text embedding module
#698 opened Jun 12, 2025 by lingzhq
MinHash calculation with GPU on Ray (labels: dj:dist, dj:efficiency, dj:op, dj:tools, enhancement)
#694 opened Jun 9, 2025 by cyruszhang
Add DJ-MCP server
#690 opened Jun 5, 2025 by cmgzn
support RayExporter
#687 opened Jun 4, 2025 by Cathy0908
[WIP] fix calculate_np
#679 opened May 22, 2025 by Cathy0908
[WIP] deduping benchmark suite
#607 opened Mar 4, 2025 by cyruszhang
Optimize dedup to avoid oom (labels: dj:dist, dj:efficiency, dj:tools, enhancement, good first issue)
#568 opened Feb 7, 2025 by coolderli
Add humanvbench operators (labels: dj:multimodal, dj:op, good first issue)
#553 opened Jan 17, 2025 by SYSUzhouting
Add minhash deduplicator based on RAY and Redis (labels: dj:dist, dj:efficiency, dj:op)
#489 opened Nov 15, 2024 by pan-x-c
Automatically split input dataset in ray mode
#415 opened Sep 4, 2024 by pan-x-c
[WIP]Add text tagging by prompt mapper op (labels: dj:op)
#408 opened Aug 30, 2024 by garyzhang99
1 task
Add GPT-4V as evaluator (labels: dj:multimodal, enhancement, stale-pr)
#276 opened Mar 22, 2024 by drcege (draft; DJ-SORA)