Dataset Viewer (First 5GB)
Auto-converted to Parquet Duplicate
messages
stringlengths
45.6k
1.04M
instance_id
stringlengths
9
30
rollout_patch
stringlengths
128
15.1M
func_name
stringlengths
1
81
func_path
stringlengths
10
90
line_level_recall
float64
0
1
problem_statement
string
target_patch
string
docker_image
string
"[{\"role\": \"system\", \"content\": \"You are a helpful assistant that can interact with a compute(...TRUNCATED)
tornado_10294
"diff --git a/tornado/websocket.py b/tornado/websocket.py\nindex 1e0161e..c66e8d7 100644\n--- a/torn(...TRUNCATED)
WebSocketProtocol._run_callback
tornado/websocket.py
0
"Dependency management issues with WebSocketProtocol._run_callback\n\nDescription\n\nThe WebSocketPr(...TRUNCATED)
"diff --git a/tornado/websocket.py b/tornado/websocket.py\nindex 1e0161e..6f1857c 100644\n--- a/torn(...TRUNCATED)
jyangballin/swesmith.x86_64.tornadoweb_1776_tornado.d5ac65c1
"[{\"role\": \"system\", \"content\": \"You are a helpful assistant that can interact with a compute(...TRUNCATED)
MONAI_11809
"diff --git a/monai/data/utils.py b/monai/data/utils.py\nindex d03dbd3..5503f31 100644\n--- a/monai/(...TRUNCATED)
select_cross_validation_folds
monai/data/utils.py
0
"Modular and Reusable Implementation of `select_cross_validation_folds`\n\nDescription\n\n(last modi(...TRUNCATED)
"diff --git a/monai/data/utils.py b/monai/data/utils.py\nindex d03dbd3..e66b4a1 100644\n--- a/monai/(...TRUNCATED)
jyangballin/swesmith.x86_64.project-monai_1776_monai.a09c1f08
"[{\"role\": \"system\", \"content\": \"You are a helpful assistant that can interact with a compute(...TRUNCATED)
graphene_10273
"diff --git a/graphene/types/enum.py b/graphene/types/enum.py\nindex bc61cd4..6a2b105 100644\n--- a/(...TRUNCATED)
EnumMeta.get
graphene/types/enum.py
0.5
"## Performance Issue with EnumMeta.get\n\nI've identified a performance issue with `EnumMeta.get` i(...TRUNCATED)
"diff --git a/graphene/types/enum.py b/graphene/types/enum.py\nindex bc61cd4..932ef60 100644\n--- a/(...TRUNCATED)
jyangballin/swesmith.x86_64.graphql-python_1776_graphene.82903263
"[{\"role\": \"system\", \"content\": \"You are a helpful assistant that can interact with a compute(...TRUNCATED)
dask_10563
"diff --git a/dask/delayed.py b/dask/delayed.py\nindex 1718ca8..6cf3175 100644\n--- a/dask/delayed.p(...TRUNCATED)
Delayed._rebuild
dask/delayed.py
0
"# Control flow issue in Delayed._rebuild method\n\n## Problem Description\n\nThere appears to be an(...TRUNCATED)
"diff --git a/dask/delayed.py b/dask/delayed.py\nindex 1718ca8..db4e9d6 100644\n--- a/dask/delayed.p(...TRUNCATED)
jyangballin/swesmith.x86_64.dask_1776_dask.5f61e423
"[{\"role\": \"system\", \"content\": \"You are a helpful assistant that can interact with a compute(...TRUNCATED)
pypika_11270
"diff --git a/pypika/queries.py b/pypika/queries.py\nindex 42c7c45..0e38641 100644\n--- a/pypika/que(...TRUNCATED)
DropQueryBuilder.drop_user
pypika/queries.py
0
"[BUG] DropQueryBuilder.drop_user doesn't handle legacy data formats properly\n\n### Problem\n\nTher(...TRUNCATED)
"diff --git a/pypika/__init__.py b/pypika/__init__.py\nindex 66f564f..e2d7f18 100644\n--- a/pypika/_(...TRUNCATED)
jyangballin/swesmith.x86_64.kayak_1776_pypika.1c9646f0
"[{\"role\": \"system\", \"content\": \"You are a helpful assistant that can interact with a compute(...TRUNCATED)
MONAI_10993
"diff --git a/monai/apps/reconstruction/mri_utils.py b/monai/apps/reconstruction/mri_utils.py\nindex(...TRUNCATED)
root_sum_of_squares_t
monai/apps/reconstruction/mri_utils.py
0
"# root_sum_of_squares_t type signature mismatch causes data flow issues\n\n## Description\n\nThere (...TRUNCATED)
"diff --git a/monai/apps/reconstruction/mri_utils.py b/monai/apps/reconstruction/mri_utils.py\nindex(...TRUNCATED)
jyangballin/swesmith.x86_64.project-monai_1776_monai.a09c1f08
"[{\"role\": \"system\", \"content\": \"You are a helpful assistant that can interact with a compute(...TRUNCATED)
scrapy_10202
"diff --git a/scrapy/core/spidermw.py b/scrapy/core/spidermw.py\nindex 85a3b58..6a70f09 100644\n--- (...TRUNCATED)
_isiterable
scrapy/core/spidermw.py
0
"_data transformation issue with _isiterable in spidermw.py_\n\nThe `_isiterable` function in `scrap(...TRUNCATED)
"diff --git a/scrapy/core/spidermw.py b/scrapy/core/spidermw.py\nindex 85a3b58..597313b 100644\n--- (...TRUNCATED)
jyangballin/swesmith.x86_64.scrapy_1776_scrapy.35212ec5
"[{\"role\": \"system\", \"content\": \"You are a helpful assistant that can interact with a compute(...TRUNCATED)
dask_10363
"diff --git a/dask/_task_spec.py b/dask/_task_spec.py\nindex 45e6c46..ea1c79f 100644\n--- a/dask/_ta(...TRUNCATED)
Task.__init__
dask/_task_spec.py
0
"Task.__init__ violates single responsibility principle\n\nDescription\n\nThe `Task.__init__` method(...TRUNCATED)
"diff --git a/dask/_task_spec.py b/dask/_task_spec.py\nindex 45e6c46..9f37371 100644\n--- a/dask/_ta(...TRUNCATED)
jyangballin/swesmith.x86_64.dask_1776_dask.5f61e423
"[{\"role\": \"system\", \"content\": \"You are a helpful assistant that can interact with a compute(...TRUNCATED)
jinja_11341
"diff --git a/src/jinja2/parser.py b/src/jinja2/parser.py\nindex 817abec..42430da 100644\n--- a/src/(...TRUNCATED)
CodeGenerator.visit_InternalName
src/jinja2/compiler.py
0
"There appears to be an issue with code reliability and deterministic behavior related to CodeGenera(...TRUNCATED)
"diff --git a/src/jinja2/parser.py b/src/jinja2/parser.py\nindex 817abec..00c0f2e 100644\n--- a/src/(...TRUNCATED)
jyangballin/swesmith.x86_64.pallets_1776_jinja.ada0a9a6
"[{\"role\": \"system\", \"content\": \"You are a helpful assistant that can interact with a compute(...TRUNCATED)
tornado_11262
"diff --git a/tornado/tcpclient.py b/tornado/tcpclient.py\nindex 2e4b284..1801d32 100644\n--- a/torn(...TRUNCATED)
SimpleHTTPClientTestMixin.test_connect_timeout
tornado/test/simple_httpclient_test.py
0
"Backwards compatibility issue with SimpleHTTPClientTestMixin.test_connect_timeout\n\nDescription\n\(...TRUNCATED)
jyangballin/swesmith.x86_64.tornadoweb_1776_tornado.d5ac65c1
End of preview. Expand in Data Studio

YAML Metadata Warning:empty or missing yaml metadata in repo card

Check out the documentation for more information.

This dataset contains 35615 trajectories. Data was generated from the second rollout of SVG on 121 SWE-smith codebases using GLM-4.5-Air as teacher and includes one SVG run per function. 16000 samples from the dataset were used to train SERA-32B-GA. Sera-4.5-Full-T2 is a superset of this dataset with three SVG runs per function.

Schema:

messages: Generated trajectory
instance_id: ID of trajectory
rollout_patch: Created patch to the codebase from the current trajectory
func_name: Name of function sampled from codebase to start the pipeline
func_path: File path to the sampled function
line_level_recall: Minimum patch verification threshold that is satisfied
problem_statement: Problem statement provided to the model
target_patch: Ground truth patch (empty if T1) 
docker_image: Docker image used

Verification:
Verification can be done on T2 trajectories by comparing generated rollout patches against the target ground truth patch from T1 trajectories.
We do not verify in our main experiments but provide the metadata to do so in target_patch and rollout_patch.

Note: Apply json.loads() to the messages column to load. Sera-4.5A-Lite-T2 is licensed under the Open Data Commons Attribution License v1.0 (ODC-By). It is intended for research and educational use. For more information, please see our Responsible Use Guidelines.

Downloads last month
73

Collection including allenai/Sera-4.5A-Lite-T2