It is very unlikely that they are using the absolute pitch that is sung or hummed, since very few people, professional musicians included, would start singing at the correct pitch without accompaniment.
Most likely they are mapping the intervals between successive sung notes, which stay the same whatever note the singer starts on, and using those as part of the ‘melodic fingerprint’ for matching.
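To illustrate why intervals would work where absolute pitch would not, here is a minimal sketch in Python. It is only a guess at the general idea, not how any particular service actually does it: the function names `hz_to_semitones` and `interval_fingerprint` are made up for this example, and it assumes the note frequencies have already been extracted from the audio.

```python
import math

def hz_to_semitones(freq_hz, ref_hz=440.0):
    """Distance of a frequency from the reference pitch, in semitones."""
    return 12.0 * math.log2(freq_hz / ref_hz)

def interval_fingerprint(freqs_hz):
    """Semitone steps between successive notes, rounded to whole semitones.

    Hypothetical helper: the absolute starting pitch cancels out,
    so only the shape of the melody remains.
    """
    semis = [hz_to_semitones(f) for f in freqs_hz]
    return [round(b - a) for a, b in zip(semis, semis[1:])]

# C4, D4, E4, C4 (approximate equal-temperament frequencies)
melody = [261.63, 293.66, 329.63, 261.63]

# The same tune started a perfect fifth (7 semitones) higher
transposed = [f * 2 ** (7 / 12) for f in melody]

print(interval_fingerprint(melody))
print(interval_fingerprint(transposed))
```

Both calls give the same list of intervals, so a singer who starts on the "wrong" note still produces the same fingerprint. A real matcher would presumably also need to tolerate out-of-tune notes and timing differences, which this sketch ignores.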