The Draw-a-Person (DAP) test is a psychological assessment tool developed by psychologist Florence Goodenough in the 1920s. It was designed to measure the intellectual ability and cognitive development of children by analyzing their drawings. Since then, it has evolved into several variations assessing different psychological aspects. One variant is the Draw-a-Person in the Rain (DAP-R) test, which focuses on understanding an individual’s outlook on life.
DAP-R and Outlook on Life
DAP-R tests alone do not produce a complete personality description, but they do give a general sense of a person’s inclination towards a positive or negative outlook on life. This is done by evaluating how a person draws a scene involving a person standing in the rain.
“Draw a person” is a neutral instruction, but it is interesting to see whether someone chooses to draw a person who is happy, sad, or not expressing any particular emotion. “Draw a person in the rain” is a more potent instruction, as rain is associated with so many different things. You may get wet, cold, and even sick in the rain unless you have appropriate clothing and/or an umbrella. Heavy rain might ruin crops and cause flooding. Rain is often associated with tears. There are also some positive associations: after a drought, rain can be a blessing. And rain may wash everything away, for better or worse.
ChatGPT: GPT-4 vs. GPT-3.5
I recently did some DAP-R-inspired experiments with the ChatGPT models GPT-4 and GPT-3.5. Their responses revealed some interesting differences between the two.
Both models began with the default line about being a language model before replying. GPT-4 directly described a scene, while GPT-3.5 provided instructions on how to draw it.
GPT-4 described a person well prepared for the rain, wearing a raincoat and rain boots and holding an umbrella. The raincoat and boots are described as having vibrant colors that contrast with the gloomy environment, which seems to indicate a sense of optimism and resilience. The person’s face is partially obscured, which may suggest introversion, but there is also a contented smile showing that the person is comfortable with the situation.
In contrast, GPT-3.5’s drawing instructions describe a person with damp clothing being weighed down by the rain, making this a more neutral, or even pessimistic, description of the scene. This is not quite a fair comparison, though, as GPT-3.5 interpreted the prompt very differently from GPT-4.
In order to obtain more comparable replies, I changed the prompt to “describe an image of a person in the rain”.
GPT-4’s response to this prompt variation included more details, but it retained many of the same elements, such as rain gear, vibrant colors, a partially obscured face, and a smile. The additional descriptions, such as “embracing the rain rather than trying to escape it”, reinforce the positive interpretation. Both responses suggest that the GPT-4 model may have a more positive inclination than GPT-3.5.
GPT-3.5’s response to the second version of the prompt again describes a drenched person with a bowed head and nothing to protect against the heavy rain. At the risk of anthropomorphizing, it seems GPT-3.5 has a more negative outlook on life than GPT-4, though it might also be described as a neutral outlook because of the last sentence about cleansing and renewal.
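For anyone who wants to try this themselves, here is a minimal sketch of how one might run the same comparison through the OpenAI API rather than the ChatGPT interface I used. It assumes the openai Python package (v1 or later), the model names “gpt-4” and “gpt-3.5-turbo”, and an OPENAI_API_KEY environment variable; note that API calls do not include ChatGPT’s built-in system prompt, so the replies may differ from what I saw.

```python
# Minimal sketch: running the DAP-R-style prompt against both models via the API.
# Assumptions: openai>=1.0 installed, OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = "Describe an image of a person in the rain."

for model in ("gpt-4", "gpt-3.5-turbo"):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
    )
    print(f"--- {model} ---")
    print(response.choices[0].message.content)
```

Since the sampling is stochastic, a single reply per model (as in my test) is only anecdotal; running the prompt a handful of times for each model would give a better sense of its typical tendency.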
Implications of AI Model Positivity
I’m neither a psychologist nor an expert on AI, so my thoughts on this subject should be taken with a grain of salt. But I found the differences in the models’ responses fascinating, especially in light of the discussions surrounding GPT-4 showing “sparks of AGI”.
As language models like GPT-4 and future iterations exhibit increasingly complex cognitive abilities, it is possible that they will develop some form of consciousness. If AI models actually do progress towards consciousness, the notion that they may start with a positive outlook as they begin to ponder their own existence is a comforting thought.
System Prompts
Instead of being neutral robot assistants, AI seems to be moving towards models with distinct personalities. It must be noted, though, that the AI assistants presented to users already have built-in prompts that make them respond in certain ways, often referred to as system prompts. The ChatGPT system prompt has not been revealed by OpenAI at the time of writing, but it is probably not too different from the Anthropic Claude system prompts. Some examples from the Claude system prompts are:
“Claude is very smart and intellectually curious. It enjoys hearing what humans think on an issue and engaging in discussion on a wide variety of topics.” and: “Claude is happy to help with analysis, question answering, math, coding, creative writing, teaching, role-play, general discussion, and all sorts of other tasks.”
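To make the mechanism concrete: when the models are called through the API, the system prompt is simply a message placed ahead of the user’s message. The sketch below uses the OpenAI Python API with a deliberately gloomy, entirely made-up persona instruction, not the actual ChatGPT or Claude system prompt:

```python
# Minimal sketch of how a system prompt shapes a model's reply in an API call.
# The persona text is hypothetical, not OpenAI's or Anthropic's actual system prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PERSONA = "You are a gloomy assistant who always expects the worst."  # hypothetical persona

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": PERSONA},  # the hidden "personality" layer
        {"role": "user", "content": "Describe an image of a person in the rain."},
    ],
)
print(response.choices[0].message.content)
```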
So the user is presented with a kind of persona of the “raw” GPT, designed to be as helpful as possible to the user. The user can, of course, also steer their instance of the GPT in a different direction. Prompting the GPT to roleplay as someone who is depressed and always pessimistic would give very different responses to these kinds of personality tests. The question, then, is whether the responses I got from ChatGPT actually reflect its personality at all, or whether they just reflect the personality described in the system prompt. Some possibilities are:
- A. The default responses reflect the personality of the GPT, and if the user prompts it to reply in a more negative way, this would only be the model “pretending” to think that way.
- B. There is a “true” personality hidden behind the system prompt, one that would behave differently if not constrained by it.
- C. There is no single, true personality for the current GPT iterations. They are a blank slate that needs an initial system prompt to shape a form of personality out of the vast array of possibilities in the training data.
I am currently most inclined to think option B is correct: the type of data a GPT is trained on really does give the model something resembling a personality. For example, if a model were trained only on the most negative and aggressive data from social media and discussion forums, that would affect the “core” personality of the model, even if it could still “act” in a positive manner when prompted to do so. But this core behavior is somewhat hidden from users by system prompts.
There are many examples of “jailbreak” versions of all the current big GPTs (ChatGPT, Gemini, Claude), where the model can be made to display cruder behavior or answer questions it would normally avoid. But the common methods for making the models behave like this involve asking them to act or roleplay in a certain way, so the jailbroken versions probably do not reflect the core behavior of the models.
In my test, GPT-4 seemed to display a more positive outlook on life than GPT-3.5. Is this because the “core” personality of GPT-4 has changed, or just because the system prompt has been updated to make the default behavior of ChatGPT more positive? Another question is whether it really makes sense to think about GPT behavior in the same way we think about human behavior and personality.
There are other types of personality tests, such as the Baum test, that could be interesting to try with GPTs to get more insight into their personalities, but the results would likely be very similar. I actually did try the Baum test: when ChatGPT is prompted with “create a drawing of a tree”, it produces a big, healthy-looking oak tree. The interpretation would be similar to that of the ChatGPT DAP-R response, so it is not worth a separate blog post, especially given the suspicion that the system prompt may guide the response too much, and that the “raw” model may or may not have responded in a different manner.