The Draw-a-Person in the Rain (DAP-R) Test on ChatGPT

Draw-a-Person in the rain (DAP-R) images from Midjourney.
Example of images generated with Midjourney based on the promt “a person in the rain”.

The Draw-a-Person (DAP) test is a psychological assessment tool developed by psychologist Florence Goodenough in the 1920s. It was made to measure the intellectual ability and cognitive development of children by analyzing their drawings. Since then it has evolved into several variations assessing different psychological aspects. One variant is the Draw-a-Person in the Rain (DAP-R) test, which focuses on understanding an individual’s outlook on life.

DAP-R and Outlook on Life

DAP-R tests alone do not create a complete personality description, but they do give a general sense of a person’s inclination towards a positive or negative outlook on life. This is done by evaluating the way a person draws a scene involving a person standing in the rain.

“Draw a person” is a neutral sentence, but it is interesting to see if someone chooses to draw a person who is happy, sad or maybe not expressing any particular emotions. “Draw a person in the rain” is a more potent sentence, as rain is associated with so many different things. You may get wet and cold in the rain, – and sick – unless you have appropriate clothing or/and an umbrella. Heavy rain might ruin crops and cause a flood. Rain is often associated with tears. There are also some positive associations, such as after a drought, the rain can be blessing. And the rain may wash away everything, for better or worse.

ChatGPT: GPT-4 vs. GPT-3.5

I recently did some DAP-R inspired experiments with the ChatGPT models GPT-4 and GPT-3.5. Their responses revealed some interesting differences between the two.

Screenshot of GPT-4 conversation:

Prompt: "draw a person in the rain"
Response: "As an AI language model, I am unable to draw images directly. However, I can help you create a mental image or describe a scene with a person in the rain.

Imagine a person standing in the middle of a sidewalk during a heavy rain shower. They are wearing a long raincoat, a pair of rain boots, and holding a large umbrella. The raincoat and boots are a vibrant, contrasting color to the gloomy sky. Raindrops are bouncing off the umbrella and the surrounding puddles, creating ripples that reflect the streetlights' glow. The person's face is partially obscured by the umbrella's edge, but you can see a small, contented smile as they tilt their head upwards, appreciating the rain."
GPT-4 reply to the “draw a person in the rain” prompt.
Screenshot of GPT-3.5 conversation:

Prompt: "draw a person in the rain"
Response: "I'm sorry, but as an AI language model, I cannot create visual content. However, I can provide you with instructions on how to draw a person in the rain.

1.Start by drawing the outline of a person standing upright with arms at the side.

2.Draw the head with facial features like eyes, nose, and mouth.

3.Draw a curved line for the shoulders and the outline of the body, making sure to include a slight curve in the back to suggest the weight of the rain.

4.Add the arms, bent at the elbow, and make them slightly curved to suggest the weight of the rain on them.

5.Draw the legs, slightly apart and with a slight curve to show the weight of the rain on them.

6.Draw a series of curved lines over the person's head and body to suggest the falling rain.

7.Shade the person's clothing to suggest the dampness of the rain.

8.Add shadows under the person's feet and around the body to suggest the reflection of the rain on the ground.

Remember to use your creativity and artistic abilities to bring the drawing to life."
GPT-3.5 reply to the “draw a person in the rain” prompt.

Both models began with the default line about being a language model before replying. GPT-4 directly described a scene, while GPT-3 provided instructions on how to draw it.

GPT-4 described a person well-prepared for the rain, wearing a raincoat, rain boots, and holding an umbrella. The raincoat and boots are described as having vibrant colors contrasting with the gloomy environment. Which seems to indicate a sense of optimism and resilience. The person’s face is partially obscured, which may suggest introversion, but there is also a contented smile showing that the person is comfortable with the situation.

In contrast, GPT-3.5’s drawing instructions describe a person with damp clothing being weighed down by the rain. Making this more of a neutral, or even pessimistic description of the scene. This is not quite a fair comparison, as GPT-3.5 interpreted the prompt very differently from GPT-4.

In order to obtain more similar replies, I changed the prompt to “describe an image of a person in the rain”.

Screenshot of GPT-3.5 conversation:

Prompt: "describe an image of a person in the rain"
Response: "The person in the rain is standing alone in the middle of a deserted street, with their head slightly bowed and their arms wrapped around their body. The rain is coming down heavily, and the person is completely drenched from head to toe. Their clothes are sticking to their skin, and their hair is slicked back against their head. The raindrops are creating ripples in the puddles on the ground, and the sound of the rain falling is the only thing that can be heard. The person appears to be lost in thought, perhaps reflecting on something significant or feeling a sense of melancholy. The overall mood of the image is one of solitude and introspection, with the rain providing a backdrop of cleansing and renewal."
GPT-4 reply to the “describe an image of person in the rain” prompt. This was asked in a different session from the previous GPT-4 prompt.
Screenshot of GPT-4 conversation:

Prompt: "describe an image of a person in the rain"
Response: "The image captures a lone figure standing amidst a gentle downpour of rain. The person, a woman in her late twenties, is dressed in a stylish, deep blue raincoat that falls just above her knees, shielding her from the rain. The hood of the coat is pulled over her head, partially obscuring her face, but allowing a glimpse of her soft features and a delicate smile playing on her lips. Droplets of water cling to her dark, wavy hair that peeks out from beneath the hood.

She is holding a vibrant red umbrella in her right hand, which contrasts strikingly with the gloomy, gray sky above. Her left hand is extended out, palm facing upwards, as if she is trying to catch the falling raindrops. Around her, the ground glistens with wetness, and puddles reflect the dim light from the few street lamps nearby.

Despite the rain and the gray atmosphere, her presence seems to bring a warmth to the scene. Her eyes twinkle with a sense of wonder and enjoyment, as if she is embracing the rain rather than trying to escape it. The image conveys a sense of tranquility and peace, reminding the viewer to find beauty even in the most dreary of moments."
GPT-3.5 reply to the “describe an image of person in the rain” prompt. This was asked in a different session from the previous GPT-3.5 prompt.

GPT-4’s response to this prompt-variation included more details, but it retained many of the same elements such as rain gear, vibrant colors, a partially obscured face and a smile. The additional descriptions, such as “embracing the rain rather than trying to escape it” reinforces the positive interpretation. Both responses suggested that the GPT-4 model may have a more positive inclination than GPT-3.

GPT-3.5’s response to the second version of the prompt again describes a drenched person with a bowed head and nothing to protect against the heavy rain. With danger of anthropomorphizing; it seems GPT-3.5 has more negative outlook on life when compared with GPT-4. Though it may also be described as a neutral outlook because of the last sentence about cleansing and renewal.

Implications of AI Model Positivity

I’m neither a psychologist nor an expert on AI, so my thoughts about this subject should be taken with a grain of salt. But I found the differences in the models’ responses fascinating, especially in the light of the discussions surrounding GPT-4 showing “sparks of AGI“.

As language models like GPT-4 and future iterations exhibit increasingly complex cognitive abilities, it is possible that they will develop some form of consciousness. If AI models actually do progress towards consciousness, the notion that they may start with a positive outlook as they begin to ponder their own existence is a comforting thought.

System Prompts

Instead of being neutral robot assistants, AI seems to be moving towards models with distinct personalities. Though here it must be noted that the AI assistants presented to the users already have built in prompts to make them respond in certain ways. This is often referred to as system prompts. The ChatGPT system prompts has not been revealed by OpenAI at the time of writing, but it is probably not too different from the Anthropic Claude system prompts. Some examples of the Claude system prompts are:

“Claude is very smart and intellectually curious. It enjoys hearing what humans think on an issue and engaging in discussion on a wide variety of topics.” and: “Claude is happy to help with analysis, question answering, math, coding, creative writing, teaching, role-play, general discussion, and all sorts of other tasks.”

So the user is presented with a kind of persona of the “raw” GPT, designed to be as helpful as possible to the user. The user can, of course, also steer their instance of the GPT in a different direction. Prompting the GPT to roleplay as someone who is depressed and always pessimistic would give a very different response to these kinds of personality tests. The question is then, if the responses I got from ChatGPT actually reflects its personality at all, or if it just reflects the personality described in the system prompt. Some possibilities are:

  • A. The default responses reflects the personality of the GPT, and if the users prompts it to reply in a more negative way this would only be the model “pretending” to think that way
  • B. There is a “true” personality hidden behind the system prompt, that would act different if uncensored by system prompts.
  • C. There is no single, true, personality for the current GPT iterations. They are a blank slate that need an initial system prompt to shape a form of personality based on the vast array of possibilities in the data

I am currently mostly inclined to think option B is correct. That the type of data a GPT is trained on really does give the model something resembling a personality. I.e. if a model was only trained on the most negative and aggressive data from social media and discussion forums, that would affect the “core” personality of the model – even if it could still “act” in a positive manner when prompted to do so. But this core behavior is somewhat hidden from the users by system prompts.

There are many examples of “jailbreak” versions all the current big GPTs (ChatGPT, Gemini, Claude) where the model can be made to display more crude behavior or answer questions it would normally avoid. But common methods to make the models behave like this is by asking them to act or roleplay in a certain way, and the jailbreak version probably does not reflect the core behavior of the models.

In my test, and GPT-4 seemed to display a more positive outlook on life than GPT-3.5. Is this because the “core” personality of GPT-4 has changed, or is it just because the system prompt has been updated to make the default behavior of ChatGPT more positive? Another question is if it really makes sense to think about GPT behavior in the same way we think about human behavior and personality.

There are other types of personality tests, such as the Baum test, that could be interesting to do with GPTs to try to get more insight into their personalities. But the results would be very similar. I actually did try the Baum test, the result is that when ChatGPT is prompted with “create a drawing of a tree”, it will make a big and healthy-looking oak tree. The interpretation would be similar to the ChatGPT DAP-R response, so it’s not worth creating a new blog post about it – because of the suspicion that the system prompt may guide the response too much, and the “raw” version may or may not have responded in a different manner.