410. LLM Reasoning

▮ LLM Reasoning

Despite the impressive performance of LLMs across many tasks, their reasoning processes can still introduce hallucinations and accumulate errors.

In this post, I’d like to share what I’ve learned from several state-of-the-art research papers in this field, and how we might improve LLMs’ reasoning ability.

▮ Chain-Of-Thought

The first significant progress came from a method called Chain-Of-Thought. This research shows that prompting the model to produce a series of intermediate reasoning steps, instead of asking it for the final answer directly, significantly improves the reasoning ability of LLMs.

[Figure: Chain-Of-Thought]
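To make the difference concrete, here is a minimal sketch of direct prompting versus chain-of-thought prompting. The `llm_generate` helper is a hypothetical stand-in for whatever completion API you use, and the exemplar question is only illustrative.

```python
# Sketch: direct prompting vs. chain-of-thought prompting.
# `llm_generate` is a hypothetical placeholder, not a real library call.

def llm_generate(prompt: str) -> str:
    """Placeholder: call your LLM of choice and return its text output."""
    raise NotImplementedError

QUESTION = (
    "A cafeteria had 23 apples. They used 20 to make lunch and bought 6 more. "
    "How many apples do they have now?"
)

# Direct prompting: ask only for the final answer.
direct_prompt = f"Q: {QUESTION}\nA:"

# Chain-of-thought prompting: a few-shot exemplar demonstrates the
# intermediate reasoning steps before the final answer.
cot_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
    f"Q: {QUESTION}\nA:"
)

# With the CoT prompt, the model is expected to continue with its own
# step-by-step reasoning before stating the answer, rather than emitting
# a bare number.
```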

▮ Self-Consistency

Instead of taking only the single greedy reasoning path during decoding, as in chain-of-thought, this method first samples a diverse set of reasoning paths and then selects the most consistent answer by marginalizing out the sampled reasoning paths.

[Figure: Self-Consistency]

What makes this approach attractive is that it requires no additional human annotation, and avoids any additional training, auxiliary models, or fine-tuning. It acts like a “self-ensemble” model that works on top of a single language model.
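Here is a minimal sketch of the idea: sample several reasoning paths at a non-zero temperature, keep only each path’s final answer, and return the majority answer. The `llm_sample` helper and the `extract_answer` parser are illustrative assumptions, not names from the paper.

```python
# Sketch: self-consistency as majority voting over sampled reasoning paths.
from collections import Counter
import re


def llm_sample(prompt: str, temperature: float = 0.7) -> str:
    """Placeholder: sample one completion from your LLM."""
    raise NotImplementedError


def extract_answer(completion: str):
    """Pull the final answer out of a 'The answer is X.' style completion."""
    match = re.search(r"answer is\s*(-?\d+(?:\.\d+)?)", completion)
    return match.group(1) if match else None


def self_consistency(prompt: str, num_paths: int = 20):
    # 1. Sample a diverse set of reasoning paths (temperature > 0).
    answers = []
    for _ in range(num_paths):
        completion = llm_sample(prompt, temperature=0.7)
        answer = extract_answer(completion)
        if answer is not None:
            answers.append(answer)

    # 2. Marginalize out the reasoning paths: only the final answers are kept,
    #    and the most frequent one wins the vote.
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]
```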

▮ Deductive Verification Chain-Of-Thought

Despite the improved performance from self-consistency, “consistency” and “reliability” are not inherently correlated: an answer that most sampled paths agree on is not necessarily trustworthy, and answer-level voting alone is not powerful enough to cover many kinds of reasoning processes.

Inspired by how humans use deductive logical reasoning to solve a task, this method seeks to enable LLMs to perform such reasoning explicitly and to ensure its trustworthiness through “self-verification”.

In their preliminary experiments, the authors found that existing LLMs verify a reasoning process quite reliably when that process is very short. Therefore, they verify a long chain of thought by gradually isolating only the few statements each step depends on, breaking long reasoning into smaller, individually checkable chunks.

[Figure: Self-Verification]

As you can see in the image above, they first extract only premises #1 and #2 and compare them against the reasoning step the LLM has just generated.
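Below is a minimal sketch of this step-by-step verification loop. It assumes the reasoning has already been split into numbered steps, each annotated with the premises it cites; the function names and prompt wording are illustrative assumptions, not taken from the paper.

```python
# Sketch: verifying a long chain of thought one small step at a time.
# `llm_generate` is a hypothetical placeholder for your LLM call.

def llm_generate(prompt: str) -> str:
    """Placeholder: call your LLM and return its text output."""
    raise NotImplementedError


def verify_step(step: str, cited_premises: list) -> bool:
    """Check one reasoning step against only the few premises it cites,
    keeping the verification context short and therefore more reliable."""
    prompt = (
        "Here is some information:\n"
        + "\n".join(cited_premises)
        + "\n\nBased only on the information above, is the following "
        "reasoning step logically correct? Answer yes or no.\n"
        f"Step: {step}\n"
    )
    return llm_generate(prompt).strip().lower().startswith("yes")


def verify_chain(steps: list, premises_for: dict) -> bool:
    # The whole chain is trusted only if every isolated step passes
    # verification against its own small set of cited premises.
    return all(verify_step(s, premises_for[i]) for i, s in enumerate(steps))
```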

▮ References