Ken writes about the lessons they’ve learned building new LLM-based features into their product.
When it comes to prompts, less is more
Not enumerating an exact list or instructions in the prompt produces better results if that thing is already common knowledge. GPT is not dumb, and it actually gets confused if you over-specify.
This has been my experience as well. For a recent project, I started with a very long and detailed prompt asking the LLM to classify a text and produce a summary. GPT-4, GPT-3.5, Claude-3-Opus, and Claude-3-Haiku all performed either average or poorly. I then experimented with shorter prompts, and with some adjustments I got much better responses from a much shorter prompt.
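To make that concrete, here is a minimal sketch of the kind of trimming that helped, using the OpenAI Python SDK. The prompts and model name below are illustrative placeholders, not the exact ones from my project:

```python
from openai import OpenAI

client = OpenAI()

# Over-specified: enumerates classification rules the model already "knows",
# which in my experience confused it more than it helped.
LONG_PROMPT = """You are an expert text classifier. Read the text below carefully.
Classify it into exactly one of: news, opinion, advertisement.
Rules: if the text mentions a product and a price, it is probably an advertisement.
If the text cites named sources and dates, it is probably news.
If the text uses first-person language and makes judgements, it is probably opinion.
Then write a summary of no more than three sentences, in a neutral tone,
without quoting the text verbatim and without adding information.

Text: {text}"""

# Shorter prompt: state the task and trust the model's common knowledge.
SHORT_PROMPT = """Classify the text as news, opinion, or advertisement,
then summarize it in at most three sentences.

Text: {text}"""

def classify_and_summarize(text: str, prompt_template: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt_template.format(text=text)}],
    )
    return response.choices[0].message.content
```

In my case the shorter variant, with a couple of wording tweaks, produced noticeably better classifications than the long one.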
GPT is really bad at producing the null hypothesis
“Return an empty output if you don’t find anything” is probably the most error-prone prompting language we came across. Not only does GPT often choose to hallucinate rather than return nothing, the instruction also causes it to lack confidence, returning blank more often than it should. It basically doesn’t know how to say “I don’t know”.
Again, very true. Be cautious when asking an LLM to return “nothing”, or to respond with “I don’t know” when some condition is met. I’ve found the models from Anthropic (especially Claude-3-Haiku) to be terrible at responding negatively. They really want to respond with something that satisfies the prompt, even when the condition for responding “I don’t know” is met.
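One workaround I’ve leaned on (my own habit, not something from Ken’s post) is to give the model an explicit flag or sentinel to fill when nothing matches, and to enforce the “nothing found” case in code rather than trusting the model to return an empty string. A rough sketch with the Anthropic Python SDK, where the extraction task and prompt are made-up examples:

```python
import json
import anthropic

client = anthropic.Anthropic()

# Instead of "return an empty output if you don't find anything", ask for a
# structured answer with an explicit "found" flag that the calling code checks.
PROMPT = """Does the text below contain a shipping address?
Respond with JSON only, in exactly this shape:
{{"found": true/false, "address": "<the address, or an empty string>"}}

Text: {text}"""

def extract_address(text: str) -> str | None:
    response = client.messages.create(
        model="claude-3-haiku-20240307",  # model name is illustrative
        max_tokens=300,
        messages=[{"role": "user", "content": PROMPT.format(text=text)}],
    )
    try:
        result = json.loads(response.content[0].text)
    except (json.JSONDecodeError, IndexError):
        return None  # treat unparseable output as "nothing found"
    # The explicit flag makes the null case a first-class answer instead of a blank.
    return result["address"] if result.get("found") else None
```

It doesn’t stop the model from wanting to answer, but it turns “I don’t know” into a concrete field it can set, which in my experience fails less often than asking for an empty response.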