Dataset Viewer (First 5GB)
Auto-converted to Parquet Duplicate
messages
stringlengths
86.1k
1.07M
instance_id
stringlengths
9
30
rollout_patch
stringlengths
280
1.55M
func_name
stringlengths
1
81
func_path
stringlengths
10
88
problem_statement
stringlengths
182
874
target_patch
stringclasses
1 value
docker_image
stringclasses
120 values
"[{\"role\": \"system\", \"content\": \"You are a helpful assistant that can interact with a compute(...TRUNCATED)
python-docx_10411
"diff --git a/src/docx/image/tiff.py b/src/docx/image/tiff.py\nindex 1194929..1cbb044 100644\n--- a/(...TRUNCATED)
_TiffParser.vert_dpi
src/docx/image/tiff.py
"There appears to be an issue with boundary conditions and edge case handling related to _TiffParser(...TRUNCATED)
jyangballin/swesmith.x86_64.python-openxml_1776_python-docx.0cf6d71f
"[{\"role\": \"system\", \"content\": \"You are a helpful assistant that can interact with a compute(...TRUNCATED)
mypy_14549
"diff --git a/mypy/build.py b/mypy/build.py\nindex f6272ed..1c91d5d 100644\n--- a/mypy/build.py\n+++(...TRUNCATED)
State.check_blockers
mypy/build.py
"There appears to be an issue with code lifecycle and temporal behavior related to State.check_block(...TRUNCATED)
jyangballin/swesmith.x86_64.python_1776_mypy.e93f06ce
"[{\"role\": \"system\", \"content\": \"You are a helpful assistant that can interact with a compute(...TRUNCATED)
chardet_10225
"diff --git a/chardet/charsetprober.py b/chardet/charsetprober.py\nindex a103ca1..5c235cd 100644\n--(...TRUNCATED)
Latin1Prober.feed
chardet/latin1prober.py
"There appears to be an issue with code null safety and optional value handling related to Latin1Pro(...TRUNCATED)
jyangballin/swesmith.x86_64.chardet_1776_chardet.9630f238
"[{\"role\": \"system\", \"content\": \"You are a helpful assistant that can interact with a compute(...TRUNCATED)
conan_10397
"diff --git a/conan/cps/cps.py b/conan/cps/cps.py\nindex d8c9fb1..8ee9698 100644\n--- a/conan/cps/cp(...TRUNCATED)
CPS.deserialize
conan/cps/cps.py
"There appears to be an issue with dependency management and module coupling related to CPS.deserial(...TRUNCATED)
jyangballin/swesmith.x86_64.conan-io_1776_conan.86f29e13
"[{\"role\": \"system\", \"content\": \"You are a helpful assistant that can interact with a compute(...TRUNCATED)
dask_15963
"diff --git a/dask/array/_shuffle.py b/dask/array/_shuffle.py\nindex 5851a4a..d6ebeae 100644\n--- a/(...TRUNCATED)
_rechunk_other_dimensions
dask/array/_shuffle.py
"There appears to be an issue with code clarity and complexity management related to _rechunk_other_(...TRUNCATED)
jyangballin/swesmith.x86_64.dask_1776_dask.5f61e423
"[{\"role\": \"system\", \"content\": \"You are a helpful assistant that can interact with a compute(...TRUNCATED)
tweepy_10286
"diff --git a/tweepy/cache.py b/tweepy/cache.py\nindex bfb87b3..4147099 100644\n--- a/tweepy/cache.p(...TRUNCATED)
MongodbCache.__init__
tweepy/cache.py
"There appears to be an issue with testing and observability related to MongodbCache.__init__ in twe(...TRUNCATED)
jyangballin/swesmith.x86_64.tweepy_1776_tweepy.91a41c6e
"[{\"role\": \"system\", \"content\": \"You are a helpful assistant that can interact with a compute(...TRUNCATED)
langdetect_10107
"diff --git a/langdetect/detector.py b/langdetect/detector.py\nindex cc831a0..f2b9a04 100644\n--- a/(...TRUNCATED)
Detector.set_prior_map
langdetect/detector.py
"There appears to be an issue with code testability and design for testing related to Detector.set_p(...TRUNCATED)
jyangballin/swesmith.x86_64.mimino666_1776_langdetect.a1598f1a
"[{\"role\": \"system\", \"content\": \"You are a helpful assistant that can interact with a compute(...TRUNCATED)
astroid_11498
"diff --git a/astroid/rebuilder.py b/astroid/rebuilder.py\nindex 4c77906..9397812 100644\n--- a/astr(...TRUNCATED)
TreeRebuilder.visit_raise
astroid/rebuilder.py
"There appears to be an issue with code correctness and business logic implementation related to Tre(...TRUNCATED)
jyangballin/swesmith.x86_64.pylint-dev_1776_astroid.b114f6b5
"[{\"role\": \"system\", \"content\": \"You are a helpful assistant that can interact with a compute(...TRUNCATED)
paramiko_10752
"diff --git a/paramiko/channel.py b/paramiko/channel.py\nindex 25326ca..2826345 100644\n--- a/parami(...TRUNCATED)
Channel.shutdown
paramiko/channel.py
"There appears to be an issue with code coupling and dependency relationships related to Channel.shu(...TRUNCATED)
jyangballin/swesmith.x86_64.paramiko_1776_paramiko.23f92003
"[{\"role\": \"system\", \"content\": \"You are a helpful assistant that can interact with a compute(...TRUNCATED)
tornado_10894
"diff --git a/tornado/auth.py b/tornado/auth.py\nindex 64428c5..40d7002 100644\n--- a/tornado/auth.p(...TRUNCATED)
AuthTest.test_facebook_login
tornado/test/auth_test.py
"There appears to be an issue with code lifecycle and temporal behavior related to AuthTest.test_fac(...TRUNCATED)
jyangballin/swesmith.x86_64.tornadoweb_1776_tornado.d5ac65c1
End of preview. Expand in Data Studio

YAML Metadata Warning:empty or missing yaml metadata in repo card

Check out the documentation for more information.

This dataset contains 72118 trajectories. Data was generated from the first rollout of SVG on 121 SWE-smith codebases using GLM-4.5-Air as teacher and includes three SVG runs per function.

Schema:

messages: Generated trajectory
instance_id: ID of trajectory
rollout_patch: Created patch to the codebase from the current trajectory
func_name: Name of function sampled from codebase to start the pipeline
func_path: File path to the sampled function
problem_statement: Problem statement provided to the model
target_patch: Ground truth patch (empty if T1) 
docker_image: Docker image used

Verification:
There is no verification for rollout one.

Sera-4.5A-Full-T1 is licensed under the Open Data Commons Attribution License v1.0 (ODC-By). It is intended for research and educational use. For more information, please see our Responsible Use Guidelines.

Downloads last month
111

Collection including allenai/Sera-4.5A-Full-T1