this is not entirely accurate.
comb-filtering is merely an interference pattern manifested within the 2d frequency-response due to 3d SPATIAL POLAR LOBING. what you see depends on where the receiver (mic/listening position) sits with respect to the lobes for a given wavelength and a given spacing between the sources, ie, whether you are situated within a polar lobe (area of constructive interference) or a polar null (area of destructive interference) for that wavelength.
to experience "comb-filtering" at a given location in 3d space, one does NOT require the delay introduced by a reflected (indirect) signal!
even in an anechoic chamber, where there is no reflection-induced delay because there are no indirect reflections superposing with the direct signal, two speakers (sources) WILL EXHIBIT SPATIAL POLAR LOBING, which in turn produces a comb-filter interference pattern within the 2d frequency response based on receiver position:
polar lobing interference pattern due to 2 direct sources, no indirect reflections (increasing in source frequency):
polar lobing interference pattern due to one source superposing with one indirect reflection (bottom gray boundary/wall)
a 3d balloon plot for two sources (no indirect reflections/delay) clearly showing the polar lobing and subsequent comb-filter interference pattern (thanks to dragonfyr for rendering this):
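separately from the plots, here is a rough numeric sketch of the anechoic two-source case. it is only an illustration under stated assumptions: two identical, in-phase point sources in free field, a speed of sound of 343 m/s, and no 1/r level difference between the two arrivals. the source/receiver coordinates and the function names are made up for the example. it shows that the notches come purely from the difference in path length from each source to the receiver, with no reflection involved.

```python
import cmath
import math

C = 343.0  # assumed speed of sound in air, m/s

# hypothetical layout: two sources 1 m apart on the x axis (metres)
SRC_A = (-0.5, 0.0)
SRC_B = (0.5, 0.0)


def path_difference(src_a, src_b, receiver):
    """difference in direct-path length from each source to the receiver."""
    return abs(math.dist(src_a, receiver) - math.dist(src_b, receiver))


def null_frequencies(delta_d, f_max, c=C):
    """frequencies up to f_max where delta_d is an odd number of half wavelengths."""
    if delta_d <= 0.0:
        return []  # equidistant receiver: no cancellation at any frequency
    nulls = []
    k = 0
    while True:
        f = (2 * k + 1) * c / (2.0 * delta_d)
        if f > f_max:
            break
        nulls.append(f)
        k += 1
    return nulls


def summed_level_db(freq, src_a, src_b, receiver, c=C):
    """level of the two-source sum relative to one source alone, in dB.

    simplification: both arrivals are treated as equal in amplitude.
    """
    phase = 2.0 * math.pi * freq * path_difference(src_a, src_b, receiver) / c
    magnitude = abs(1.0 + cmath.exp(-1j * phase))
    return 20.0 * math.log10(max(magnitude, 1e-12))  # floor avoids log10(0)


# on the centre line the paths are equal, so the sources sum everywhere;
# off the centre line the same two sources produce a comb of notches
for receiver in [(0.0, 3.0), (1.0, 3.0)]:
    delta = path_difference(SRC_A, SRC_B, receiver)
    print(receiver, "path difference (m):", round(delta, 4))
    print("  notches below 5 kHz (Hz):",
          [round(f, 1) for f in null_frequencies(delta, 5000.0)])
```

moving the receiver changes only the path difference, which shifts every notch at once. that is the same thing as the receiver crossing lobes and nulls in the polar pattern.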