The test lasted 30 min on average (range 20–50 min).

More than 60% of dogs stayed for the full test duration in both groups.

Coding reliability

The majority of behaviours were reliably codable both within and between raters. Only medium proximity, all attitude behaviours, and shaking had poor coding reliability. Excluding those variables, the average inter-rater ICC was 0.82 and the average intra-rater ICC was 0.84.
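
To illustrate how an intraclass correlation coefficient of this kind can be obtained from a subjects-by-raters table of scores, the following is a minimal sketch. It assumes a two-way random-effects, absolute-agreement, single-rater model, often written ICC(2,1); the ICC form actually used for the values reported here is not stated in this section. The function name `icc2_1` and the example scores are hypothetical.

```python
def icc2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per subject (e.g. per dog),
             each row holding one score per rater (or per coding session).
    Assumption: this model is chosen for illustration only.
    """
    n = len(ratings)        # number of subjects
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)

    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-subject mean square
    msc = ss_cols / (k - 1)              # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical scores: 3 dogs rated by 2 coders in perfect agreement.
print(icc2_1([[1, 1], [2, 2], [3, 3]]))  # 1.0
```

The same function could be applied to intra-rater data by treating the two coding sessions of one rater as the columns. The sketch does not guard against tables with no variance at all, for which the ratio is undefined.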

Test-retest reliability

Test-retest reliability was poor, with an average ICC of 0.25 in the shelter and 0.22 on the streets.

The main behaviours reliably observed over time were human-directed, in particular proximity to the experimenter(s), the phase in which the experimenter was approached, tail wagging towards the experimenter, and following the pointing cues.