How much CSAM is too much CSAM? 30 images, says Apple

That’s what it takes for its hash matching/iCloud flagging/human review protocol to reach a one-in-a-trillion chance of being wrong.

We empirically assessed NeuralHash performance by matching 100 million non-CSAM photographs against the perceptual hash database created from NCMEC’s CSAM collection, obtaining a total of 3 false positives, as verified by human inspection. Separately, we assessed NeuralHash on an adult pornography dataset of about 500,000 where we observed zero false positives against the perceptual hash database. The threat model explicitly takes into account the possibility of NeuralHash image-level false positives by imposing a required threshold of simultaneous matches for a given account before Apple’s iCloud Photos servers can decrypt any vouchers.

Apple always chooses the match threshold such that the possibility of any given account being flagged incorrectly is lower than one in one trillion, under a very conservative assumption of the NeuralHash false positive rate in the field. As the system is initially deployed, we do not assume the 3 in 100M image-level false positive rate we measured in our empirical assessment. Instead, our conservative assumption builds in a safety margin that is two orders of magnitude stronger. Specifically, we assume a worst-case NeuralHash image-level error rate of one in one million, and pick a threshold that safely produces less than a one-in-one-trillion error rate for a given account under that assumption. Building in an additional safety margin by assuming that every iCloud Photo library is larger than the actual largest one, we expect to choose an initial match threshold of 30 images. Since this initial threshold contains a drastic safety margin reflecting a worst-case assumption about real-world performance, we may change the threshold after continued empirical evaluation of NeuralHash false positive rates – but the match threshold will never be lower than what is required to produce a one-in-one-trillion false positive rate for any given account.
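
Apple doesn’t publish the library-size assumption behind that 30-image threshold, but the shape of the math is easy to sketch: treat each photo’s false match as an independent event at the worst-case 1-in-1-million rate, and the number of false matches in an account’s library is roughly Poisson-distributed. Here’s a minimal sketch in Python; the per-image rate is Apple’s stated worst case, but the 5-million-photo library size is my own assumption, not Apple’s figure.

```python
from math import exp

def poisson_tail(lam: float, threshold: int, extra_terms: int = 100) -> float:
    """P(X >= threshold) for X ~ Poisson(lam): the chance that a library
    whose expected number of false matches is `lam` produces at least
    `threshold` of them."""
    # Build P(k) = e^-lam * lam^k / k! with the recurrence
    # P(k) = P(k-1) * lam / k, which avoids huge factorials.
    term = exp(-lam)
    for k in range(1, threshold + 1):
        term *= lam / k
    total = 0.0
    for k in range(threshold, threshold + extra_terms):
        total += term
        term *= lam / (k + 1)
    return total

# Assumed inputs: the per-image rate is Apple's stated worst case;
# the library size is a guess at "larger than the actual largest one."
worst_case_image_rate = 1 / 1_000_000
assumed_library_size = 5_000_000
expected_false_matches = worst_case_image_rate * assumed_library_size  # 5.0

print(poisson_tail(expected_false_matches, 30))  # ~3e-14, well under 1 in 1 trillion
```

Under those assumptions the expected number of false matches per account is five, and the chance of 30 or more is on the order of 10^-14, comfortably below the one-in-one-trillion target. Different library-size assumptions move that number, which is presumably why Apple describes 30 as carrying a drastic safety margin.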

My take: I think I understand the system well enough now to skip any commentary or criticism that describes what Apple is doing with CSAM as “scanning photos.” I’ve flagged a few comments below:

7 Comments

  1. Fred Stein said:
    The arrogance and ignorance of some Apple critics. Do they not care, or do they not know, that Apple spent years debating and exploring technology approaches before going forward? Further, these debates included the most talented, ethical people in the industry, Apple’s most valuable human resources.

    As an investor, I’m happy with the ROIC.

    2
    August 14, 2021
  2. Jerry Doyle said:
    Apple’s implementation of this new algorithm for CSAM Detection specifically on the iPhone, when Apple always has gone out of its way to say the information on our personal devices is “private,” opens the back door for access to personal devices which Government long has sought. If Apple can implement such a program for CSAM to accommodate NCMEC’s requests, then Apple can (and should) do so to accommodate federal and national security agencies to thwart potential terrorist activities and the activities of crime syndicates involving fraud and abuse of innocent citizens, especially the elderly.

    Apple, itself, has chosen voluntarily to set aside its strong policy line upholding personal privacy for a specific group. While children are a most vulnerable group, they are no more vulnerable than other potential innocent victims at the mercy of perpetrators of terrorist acts and crime syndicates.

    If Apple fails to comply with future federal agency requests for access to information on iPhone devices, then Congress now has the tools to easily pass legislation requiring government access to users’ personal data on their iPhones.

    0
    August 14, 2021
    • Gregg Thurman said:
      opens the back door for access to personal devices which Government long has sought.

      No, it doesn’t. Apple compares the NeuralHash of photos being uploaded to iCloud with the hashes of known CSAM photos in NCMEC’s database. The feature DOES NOT RESPOND to any outside “requests” for data.

      There is no back door in this feature, as the capability is part of iOS and not an independent app.

      2
      August 14, 2021
  3. David Emery said:
    I still don’t know where the “1 in 1 trillion” estimate comes from… Seems they got 3 in 100 million, which is a small number, but A LOT BIGGER than 1 in 1 trillion. (I leave it to the reader to calculate how many zeros there are between those numbers 🙂 )

    0
    August 14, 2021
    • Gregg Thurman said:
      David, I struggled with this until I reread the PR. Apple’s measured experience is 3:100 Million per image. Apple then ASSUMED a WORST CASE per-image error rate of 1:1 Million, roughly two orders of magnitude worse than its measurement, and picked a match threshold so that the odds of any given ACCOUNT being flagged incorrectly stay below 1:1 Trillion.

      In order to clear that bar, the system has to flag 30 matching images for a single account. At that point a human reviewer steps in and visually examines the matches against the NCMEC database.

      0
      August 15, 2021
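
To put numbers on David Emery’s “how many zeros” question: the 3-in-100-million figure is a per-image rate, while one-in-a-trillion is a per-account rate, and the 30-match threshold is what bridges them. A quick back-of-the-envelope in Python, using Apple’s published per-image figures (the comparison itself is mine, not Apple’s):

```python
from math import log10

measured_image_rate = 3 / 100_000_000        # per image, from Apple's 100M-photo test
assumed_image_rate  = 1 / 1_000_000          # per image, Apple's worst-case assumption
per_account_target  = 1 / 1_000_000_000_000  # per account, the stated design goal

# The gap David is pointing at, in orders of magnitude:
print(log10(measured_image_rate / per_account_target))  # ~4.5
print(log10(assumed_image_rate / per_account_target))   # ~6.0

# Requiring 30 independent matches before an account can be flagged is
# what closes that gap: the odds of 30 rare events piling up in one
# library fall off far faster than the per-image rate alone suggests
# (see the sketch after Apple's excerpt above).
```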
