Dataset Viewer (First 5GB)
Auto-converted to Parquet Duplicate
messages
stringlengths
37.7k
1.08M
instance_id
stringlengths
9
30
rollout_patch
stringlengths
171
15.1M
func_name
stringlengths
1
81
func_path
stringlengths
10
88
line_level_recall
float64
0
1
problem_statement
stringlengths
129
13.6k
target_patch
stringlengths
0
1.54M
docker_image
stringclasses
121 values
"[{\"role\": \"system\", \"content\": \"You are a helpful assistant that can interact with a compute(...TRUNCATED)
MONAI_14513
"diff --git a/monai/transforms/intensity/dictionary.py b/monai/transforms/intensity/dictionary.py\ni(...TRUNCATED)
RandRicianNoised.__call__
monai/transforms/intensity/dictionary.py
0
"# Fix modularity and reusability issues in RandRicianNoised.__call__\n\n## Description\n\nThe `Rand(...TRUNCATED)
"diff --git a/monai/transforms/intensity/dictionary.py b/monai/transforms/intensity/dictionary.py\ni(...TRUNCATED)
jyangballin/swesmith.x86_64.project-monai_1776_monai.a09c1f08
"[{\"role\": \"system\", \"content\": \"You are a helpful assistant that can interact with a compute(...TRUNCATED)
click_10676
"diff --git a/src/click/termui.py b/src/click/termui.py\nindex d30dc19..281d4c7 100644\n--- a/src/cl(...TRUNCATED)
confirm
src/click/termui.py
1
"Fix reliability and deterministic behavior issues with click.confirm()\n\nDescription\n\t\t \n\t\t((...TRUNCATED)
"diff --git a/src/click/termui.py b/src/click/termui.py\nindex d30dc19..14db158 100644\n--- a/src/cl(...TRUNCATED)
jyangballin/swesmith.x86_64.pallets_1776_click.fde47b4b
"[{\"role\": \"system\", \"content\": \"You are a helpful assistant that can interact with a compute(...TRUNCATED)
MONAI_18409
"diff --git a/monai/transforms/inverse_batch_transform.py b/monai/transforms/inverse_batch_transform(...TRUNCATED)
BatchInverseTransform.__call__
monai/transforms/inverse_batch_transform.py
0.5
"# Resource leak in BatchInverseTransform.__call__ when exceptions occur\n\n### Description\n\nThe `(...TRUNCATED)
"diff --git a/monai/transforms/inverse_batch_transform.py b/monai/transforms/inverse_batch_transform(...TRUNCATED)
jyangballin/swesmith.x86_64.project-monai_1776_monai.a09c1f08
"[{\"role\": \"system\", \"content\": \"You are a helpful assistant that can interact with a compute(...TRUNCATED)
paramiko_11400
"diff --git a/paramiko/transport.py b/paramiko/transport.py\nindex f0fcb97..cbc43bc 100644\n--- a/pa(...TRUNCATED)
SSHClient.invoke_shell
paramiko/client.py
0
"SSHClient.invoke_shell side effects and purity issues\nDescription:\nThe SSHClient.invoke_shell() m(...TRUNCATED)
"diff --git a/paramiko/client.py b/paramiko/client.py\nindex d8be910..97fa24c 100644\n--- a/paramiko(...TRUNCATED)
jyangballin/swesmith.x86_64.paramiko_1776_paramiko.23f92003
"[{\"role\": \"system\", \"content\": \"You are a helpful assistant that can interact with a compute(...TRUNCATED)
scrapy_11482
"diff --git a/scrapy/statscollectors.py b/scrapy/statscollectors.py\nindex f3dd0f8..be2d299 100644\n(...TRUNCATED)
StatsCollector.get_stats
scrapy/statscollectors.py
0
"[Bug]: StatsCollector.get_stats() backwards compatibility issues\n### Bug summary\n\nStarting with (...TRUNCATED)
"diff --git a/scrapy/statscollectors.py b/scrapy/statscollectors.py\nindex f3dd0f8..422a45f 100644\n(...TRUNCATED)
jyangballin/swesmith.x86_64.scrapy_1776_scrapy.35212ec5
"[{\"role\": \"system\", \"content\": \"You are a helpful assistant that can interact with a compute(...TRUNCATED)
sunpy_11863
"diff --git a/sunpy/map/maputils.py b/sunpy/map/maputils.py\nindex 0f425b7..0ba1f4c 100644\n--- a/su(...TRUNCATED)
all_corner_coords_from_map
sunpy/map/maputils.py
0
"# Dependency management issues with all_corner_coords_from_map\n\n## Description\n\nThere are depen(...TRUNCATED)
"diff --git a/sunpy/map/maputils.py b/sunpy/map/maputils.py\nindex 0f425b7..1a15c51 100644\n--- a/su(...TRUNCATED)
jyangballin/swesmith.x86_64.sunpy_1776_sunpy.f8edfd5c
"[{\"role\": \"system\", \"content\": \"You are a helpful assistant that can interact with a compute(...TRUNCATED)
trio_11346
"diff --git a/src/trio/_socket.py b/src/trio/_socket.py\nindex 259992b..210f658 100644\n--- a/src/tr(...TRUNCATED)
_SocketType.sendto
src/trio/_socket.py
0
"Should `_SocketType.sendto()` be more testable through dependency injection?\n\n### Description\n\n(...TRUNCATED)
"diff --git a/src/trio/_socket.py b/src/trio/_socket.py\nindex 259992b..fa82cbd 100644\n--- a/src/tr(...TRUNCATED)
jyangballin/swesmith.x86_64.python-trio_1776_trio.cfbbe2c1
"[{\"role\": \"system\", \"content\": \"You are a helpful assistant that can interact with a compute(...TRUNCATED)
dvc_13075
"diff --git a/dvc/parsing/context.py b/dvc/parsing/context.py\nindex af8d1b3..3264ef4 100644\n--- a/(...TRUNCATED)
KeyNotInContext.__init__
dvc/parsing/context.py
0.75
"# KeyNotInContext Performance Issue\n\nThere appears to be an issue with performance and efficiency(...TRUNCATED)
"diff --git a/dvc/parsing/context.py b/dvc/parsing/context.py\nindex af8d1b3..3d98091 100644\n--- a/(...TRUNCATED)
jyangballin/swesmith.x86_64.iterative_1776_dvc.1d6ea681
"[{\"role\": \"system\", \"content\": \"You are a helpful assistant that can interact with a compute(...TRUNCATED)
pypika_11294
"diff --git a/pypika/__init__.py b/pypika/__init__.py\nindex 66f564f..c00d467 100644\n--- a/pypika/_(...TRUNCATED)
Term.__truediv__
pypika/terms.py
0
"# Reduce Add/RemoveIndex migration operations\n\nDescription\n\nWe need to optimize migration opera(...TRUNCATED)
"diff --git a/pypika/terms.py b/pypika/terms.py\nindex a277e1a..c4494d1 100644\n--- a/pypika/terms.p(...TRUNCATED)
jyangballin/swesmith.x86_64.kayak_1776_pypika.1c9646f0
"[{\"role\": \"system\", \"content\": \"You are a helpful assistant that can interact with a compute(...TRUNCATED)
trio_10018
"diff --git a/src/trio/_core/_io_epoll.py b/src/trio/_core/_io_epoll.py\nindex 5e05f08..c35e5a4 1006(...TRUNCATED)
EpollIOManager.get_events
src/trio/_core/_io_epoll.py
0.5
"RuntimeWarning in EpollIOManager.get_events when using large file descriptors\n\n#### Description\n(...TRUNCATED)
"diff --git a/src/trio/_core/_io_epoll.py b/src/trio/_core/_io_epoll.py\nindex 5e05f08..ddffcd5 1006(...TRUNCATED)
jyangballin/swesmith.x86_64.python-trio_1776_trio.cfbbe2c1
End of preview. Expand in Data Studio

YAML Metadata Warning:empty or missing yaml metadata in repo card

Check out the documentation for more information.

This dataset contains 66337 trajectories. Data was generated from the second rollout of SVG on 121 SWE-smith codebases using GLM-4.5-Air as teacher and includes three SVG runs per function. Sera-4.5-Lite-T2 is a subset of this dataset and was used to train SERA-32B-GA.

Schema:

messages: Generated trajectory
instance_id: ID of trajectory
rollout_patch: Created patch to the codebase from the current trajectory
func_name: Name of function sampled from codebase to start the pipeline
func_path: File path to the sampled function
line_level_recall: Minimum patch verification threshold that is satisfied
problem_statement: Problem statement provided to the model
target_patch: Ground truth patch (empty if T1) 
docker_image: Docker image used

Verification:
Verification can be done on T2 trajectories by comparing generated rollout patches against the target ground truth patch from T1 trajectories.
We do not verify in our main experiments but provide the metadata to do so in target_patch and rollout_patch.

Note: Apply json.loads() to the messages column to load.

Sera-4.5A-Full-T2 is licensed under the Open Data Commons Attribution License v1.0 (ODC-By). It is intended for research and educational use. For more information, please see our Responsible Use Guidelines.

Downloads last month
116

Collection including allenai/Sera-4.5A-Full-T2