• bigboitricky@lemmy.world
    18 days ago

    It’s hard to please everybody with an answer when you also train on the responses to those answers.

    For anyone interested, Computerphile did an episode about sleeper agents in models.

  • Frezik@lemmy.blahaj.zone
    19 days ago

    Just so we’re all clear, don’t trust anything published as Anthropic “research”. They may be right, they may be wrong, but their methodology is consistently poor and their titles are written for media outrage.

  • very_well_lost@lemmy.world
    19 days ago

    This really shouldn’t be that surprising.

    Language is a chaotic system (in the mathematical sense), where even small changes to the initial conditions can lead to vastly different outcomes. Even subtle variations in tone, cadence, word choice, and word order have a major impact on how a given sentence is understood, and if any of those things are even slightly off in the training data, you’re bound to get weird results.
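
    For anyone who hasn’t seen sensitive dependence on initial conditions in action, here’s a minimal sketch using the logistic map, a standard toy example of a chaotic system. It’s just an illustration of the general mathematical point, not anything specific to language models; the perturbation size and step count are arbitrary choices:

    ```python
    # Logistic map with r = 4.0, a classic chaotic regime.
    def logistic_map(x, r=4.0):
        return r * x * (1.0 - x)

    # Two starting points that differ by one part in ten billion.
    x_a, x_b = 0.2, 0.2 + 1e-10

    for step in range(1, 51):
        x_a = logistic_map(x_a)
        x_b = logistic_map(x_b)
        if step % 10 == 0:
            print(f"step {step:2d}: x_a={x_a:.6f}  x_b={x_b:.6f}  "
                  f"|diff|={abs(x_a - x_b):.2e}")
    ```

    Run it and the two trajectories track each other for a while, then the gap blows up until they’re completely uncorrelated, despite starting a hair apart.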