In language testing, there’s an ongoing debate: how much time should test takers be given to complete a writing task? On the one hand, longer durations might better reflect real-life writing processes; on the other hand, beyond a certain point, extra time may contribute little to our assessment of test takers’ writing skills.

This question led our assessment research team to conduct a study on writing task duration and its effects on test scores, reliability, and validity. Here’s what we discovered.

The research setup: Testing 5-minute vs. 20-minute writing tasks

To investigate, we asked adult L2-English writers to complete writing tasks with either a 5-minute or a 20-minute limit. Responses were then scored by both human raters and an automated writing evaluation (AWE) tool, allowing us to compare how these durations affected scores, score reliability, and score validity.

We set out to answer three key questions:

  1. Does writing performance improve with more time?
  2. Does giving more time enhance the reliability and validity of writing scores?
  3. How do scores differ when rated by humans vs. automated tools?

3 key findings: How time limits affect writing scores

1. Performance improves with longer durations—but gains are modest

As expected, writers produced longer and slightly higher-scoring responses in the 20-minute condition than in the 5-minute one. These differences were due, in part, to responses from both durations being rated against the same task expectations.

In terms of length, the gains weren’t proportional to the additional time: while the task duration quadrupled, the word count only doubled. Critically, under both conditions, test takers demonstrated the full range of writing proficiency, from beginner to expert.

2. Reliability and validity hold steady across time conditions

Interestingly, our study found that the reliability and validity of scores were similar across both the 5-minute and 20-minute tasks. Reliability measures, which assess score consistency, showed no significant differences between durations. Similarly, criterion validity—how well these scores aligned with other standardized test results like IELTS—was equally robust in both short and long conditions.
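To make these two statistics concrete, here is a minimal sketch (in Python, using simulated scores rather than our study data) of how inter-rater reliability and criterion validity could be estimated separately for each time condition. The simulation setup, noise model, and function names below are illustrative assumptions, not our actual analysis pipeline.

```python
# Minimal sketch with hypothetical data: estimating score reliability
# (inter-rater consistency) and criterion validity (correlation with an
# external benchmark) for two time conditions.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

def simulate_condition(n=200, noise=0.5):
    """Simulate two human ratings and an external benchmark score."""
    ability = rng.normal(0, 1, n)                  # latent writing proficiency
    rater_a = ability + rng.normal(0, noise, n)    # human rater A
    rater_b = ability + rng.normal(0, noise, n)    # human rater B
    benchmark = ability + rng.normal(0, noise, n)  # e.g., another test's score
    return rater_a, rater_b, benchmark

for label in ["5-minute", "20-minute"]:
    a, b, criterion = simulate_condition()
    reliability, _ = pearsonr(a, b)                # inter-rater consistency
    task_score = (a + b) / 2                       # combined human score
    validity, _ = pearsonr(task_score, criterion)  # criterion validity
    print(f"{label}: reliability={reliability:.2f}, validity={validity:.2f}")
```

In the study itself, the comparison of interest is whether these two correlations differ meaningfully between the 5-minute and 20-minute conditions; in our data, they did not.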

These findings suggest that shorter tasks, if designed effectively, can be just as reliable and valid as longer ones. This has big implications for test design, as shorter tasks could provide a faster, less stressful way to assess writing ability without compromising quality. Moreover, given a fixed time for a writing assessment (say 20 minutes), our results demonstrate that it would be much more beneficial to administer 4 short tasks than one long one, allowing for responses on different topics and with different communicative purposes.

3. Human vs. automated scoring: Consistent results

The lack of advantage for longer durations held for both human raters and the AWE tool. However, automated scores tended to show slightly higher reliability and validity than human scores. This consistency across scoring methods is a positive indicator that automated scoring, when carefully calibrated, can be a reliable and accurate measure in language assessment.

Why this research matters for the future of language testing

This study sheds light on the optimal length of writing tasks for language assessment. In many real-world settings, such as university admissions, writing tasks need to balance practicality and fairness. Our research suggests that shorter tasks—if well-designed—can deliver accurate, reliable results without the need for lengthy testing sessions. For test takers, this means a potentially smoother, less time-intensive experience, while institutions can feel confident in the integrity of the results.

The findings also emphasize the value of automated scoring, which provides a consistent assessment across responses. As we refine these tools, automated scoring can continue to enhance the efficiency and accuracy of writing assessments in language testing.

Looking forward: Designing effective writing tasks

The results of this study open up new possibilities for efficient, reliable writing assessments. At the DET, we’re committed to using research to inform our test design and improve the experience for test takers and institutions alike. By creating tests that are both fair and practical, we can help ensure that English proficiency assessments accurately reflect a test taker’s skills while fitting seamlessly into their lives.

💡 For a deeper dive into this study, read our full research publication in Assessing Writing!
