Distinguishing surface targets from underwater targets has long been a focus of passive sonar detection. The depth of a target is closely related to the physical characteristics of its signal, and in a shallow-water waveguide, normal mode theory explains these acoustic properties well. In this paper, a new modal-domain beamforming method for a horizontal array is proposed. Given a predicted target azimuth, the acoustic path difference between the array elements along the direction of the target signal is calculated from the azimuth, which yields the phase delay of each normal mode component of the signal. Because the horizontal wavenumber varies with mode order, each normal mode has its own phase delay. By the beamforming principle, when the phase of a given normal mode is compensated, the sum of the signals over all elements gives the modal intensity of that mode. After the modal intensity of the target signal is obtained for each order, the modal intensities excited by a sound source at different depths are simulated with the KRAKEN software under the same shallow-water conditions and taken as the reference modal intensities for those depths. The correlation coefficient between the modal intensities of the target signal and the reference modal intensities at each depth is then calculated, and the reference depth at which the correlation coefficient reaches its maximum is taken as the estimate of the target depth.
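The processing chain described above can be illustrated with a minimal single-frequency sketch. It is not the implementation used in this paper, and it rests on several assumptions made only for illustration: the reference modes come from the analytic ideal waveguide (pressure-release surface, rigid bottom) instead of KRAKEN; the field is modeled as plane-wave modal arrivals; the array is bottom-mounted with a deliberately long aperture; the azimuth is measured from the array axis; and a Hann taper, which the method description does not mention, is added to suppress leakage between neighboring modes. All function names and parameter values are hypothetical, and the magnitude of the beamformer output is used as the modal intensity.

```python
import numpy as np

def ideal_modes(freq, depth, c=1500.0):
    """Vertical and horizontal wavenumbers of the propagating modes of an
    ideal waveguide (assumed stand-in for a KRAKEN run)."""
    k = 2 * np.pi * freq / c
    m = np.arange(1, int(k * depth / np.pi + 0.5) + 1)
    kz = (m - 0.5) * np.pi / depth
    kz = kz[kz < k]                       # keep propagating modes only
    return kz, np.sqrt(k**2 - kz**2)

def psi(kz, z, depth):
    """Mode shapes at depth(s) z: shape (len(z), M), or (M,) for scalar z."""
    return np.sqrt(2.0 / depth) * np.sin(np.multiply.outer(z, kz))

def modal_amplitude(kz, kr, src_depth, rcv_depth, depth):
    """Modal excitation at the receiver; the common range factor is dropped."""
    return psi(kz, src_depth, depth) * psi(kz, rcv_depth, depth) / np.sqrt(kr)

def simulate_array(kz, kr, x, az, src_depth, rcv_depth, src_range, depth):
    """Noise-free pressure on each element (plane-wave assumption)."""
    r = src_range - x * np.cos(az)        # path length to each element
    a = modal_amplitude(kz, kr, src_depth, rcv_depth, depth)
    return np.exp(1j * np.outer(r, kr)) @ a

def modal_beamform(p, kr, x, az, weights):
    """Compensate the phase delay of each mode and sum over the elements."""
    steer = np.exp(1j * np.cos(az) * np.outer(kr, x))   # shape (M, N)
    return np.abs(steer @ (weights * p))

def estimate_depth(measured, reference, depths):
    """Depth whose reference vector correlates best with the measurement."""
    mc = measured - measured.mean()
    rc = reference - reference.mean(axis=1, keepdims=True)
    corr = rc @ mc / (np.linalg.norm(rc, axis=1) * np.linalg.norm(mc))
    return depths[np.argmax(corr)], corr

def demo(src_depth=30.0):
    freq, depth = 50.0, 100.0             # Hz, m (illustrative values)
    n_elem, spacing = 256, 15.0           # half-wavelength spacing at 50 Hz
    az = np.deg2rad(30.0)                 # azimuth from the array axis
    x = (np.arange(n_elem) - (n_elem - 1) / 2) * spacing
    kz, kr = ideal_modes(freq, depth)
    p = simulate_array(kz, kr, x, az, src_depth, depth, 20e3, depth)
    w = np.hanning(n_elem)
    w /= w.sum()                          # taper is an added assumption
    measured = modal_beamform(p, kr, x, az, w)
    depths = np.arange(1.0, depth, 1.0)   # candidate source depths
    reference = np.abs(modal_amplitude(kz, kr, depths, depth, depth))
    return estimate_depth(measured, reference, depths)

if __name__ == "__main__":
    est, corr = demo()
    print(f"estimated depth: {est:.1f} m, peak correlation: {corr.max():.3f}")
```

Calling `demo()` returns the estimated depth and the correlation curve over the candidate depths; in this noise-free setting the estimate should land close to the simulated 30 m source. The long aperture matters here because neighboring modes differ only slightly in horizontal wavenumber, so a short array cannot separate them at a single frequency.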
Based on the physical origin and characteristics of the normal modes, the influences of the number of array elements, receiving array depth, signal-to-noise ratio, sound velocity profile, waveguide depth, azimuth estimation accuracy, effective array length, and frequency band on the performance of the method are analyzed. The simulation results show that the algorithm estimates the depth of the sound source effectively from a signal sample with a bandwidth of 300 Hz at a signal-to-noise ratio of -10 dB. A wider frequency band, a longer effective array, a larger number of array elements, and more accurate azimuth estimation all benefit depth estimation with this method. In addition, the depth estimation performance of the proposed method remains robust when waveguide conditions such as the sound velocity profile and the seafloor parameters are perturbed.