• Ech@lemmy.ca · 2 days ago

    Took a look because, as frustrating as it’d be, it’d still be a step in the right direction. But no, they’re still adamant that these are just a “quirk”.

    Conclusions

    We hope that the statistical lens in our paper clarifies the nature of hallucinations and pushes back on common misconceptions:

    Claim: Hallucinations will be eliminated by improving accuracy because a 100% accurate model never hallucinates.
    Finding: Accuracy will never reach 100% because, regardless of model size, search and reasoning capabilities, some real-world questions are inherently unanswerable.

    Claim: Hallucinations are inevitable.
    Finding: They are not, because language models can abstain when uncertain.

    Claim: Avoiding hallucinations requires a degree of intelligence which is exclusively achievable with larger models.
    Finding: It can be easier for a small model to know its limits. For example, when asked to answer a Māori question, a small model which knows no Māori can simply say “I don’t know” whereas a model that knows some Māori has to determine its confidence. As discussed in the paper, being “calibrated” requires much less computation than being accurate.

    Claim: Hallucinations are a mysterious glitch in modern language models.
    Finding: We understand the statistical mechanisms through which hallucinations arise and are rewarded in evaluations.

    Claim: To measure hallucinations, we just need a good hallucination eval.
    Finding: Hallucination evals have been published. However, a good hallucination eval has little effect against hundreds of traditional accuracy-based evals that penalize humility and reward guessing. Instead, all of the primary eval metrics need to be reworked to reward expressions of uncertainty.

    Infuriating.
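
    That last point is easy to see with a toy scoring rule. This is my own sketch, not from the paper, and the 30% lucky-guess rate is made up, but it shows why accuracy-only leaderboards push models to bluff instead of abstaining:

        # Toy scoring sketch (mine, not from the paper): why accuracy-only evals reward
        # guessing when a model is uncertain. The 30% lucky-guess rate is made up.

        P_LUCKY_GUESS = 0.3  # assumed chance a blind guess happens to be right

        def expected_score(score_right, score_wrong, score_abstain):
            """Expected eval score for 'always guess' vs 'always abstain' when uncertain."""
            guess = P_LUCKY_GUESS * score_right + (1 - P_LUCKY_GUESS) * score_wrong
            return guess, score_abstain

        # Traditional accuracy-based eval: a wrong answer and "I don't know" both score 0.
        print(expected_score(score_right=1, score_wrong=0, score_abstain=0))   # (0.3, 0) -> guessing wins

        # Reworked eval: wrong answers are penalized, abstaining is neutral.
        print(expected_score(score_right=1, score_wrong=-1, score_abstain=0))  # (~ -0.4, 0) -> abstaining wins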

    • lets_get_off_lemmy@reddthat.com · 2 days ago

      This further points to the solution being smaller models that know less and are trained for narrower tasks, instead of gargantuan models that require an insane amount of resources to answer easy questions. Route each query to a smaller, more specialized model based on what it’s asking. This was the motivation behind MoE models, but I think there are other architectures and paradigms to explore.
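
      Conceptually something like this toy router (the model names and keyword rules here are made up; a real system would use a learned gate, which is basically what MoE does inside the network):

          # Rough sketch of the routing idea. Everything here is hypothetical: the model
          # names and the keyword rules. A real router would use a learned classifier,
          # which is roughly what the gating network in an MoE does per token.

          SPECIALISTS = {
              "math": "small-math-model",
              "code": "small-code-model",
              "general": "small-general-model",
          }

          def route(query: str) -> str:
              """Pick a small specialized model instead of one giant model for everything."""
              q = query.lower()
              if any(kw in q for kw in ("integral", "equation", "derivative", "solve for")):
                  return SPECIALISTS["math"]
              if any(kw in q for kw in ("python", "function", "compile", "stack trace")):
                  return SPECIALISTS["code"]
              return SPECIALISTS["general"]

          print(route("How do I fix this Python function?"))  # -> small-code-model
          print(route("What's the capital of France?"))       # -> small-general-model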

    • wewbull@feddit.uk · 2 days ago

      Finding: Accuracy will never reach 100% because, regardless of model size, search and reasoning capabilities, some real-world questions are inherently unanswerable.

      Translation: PEBKAC. You asked the wrong question.