first thing: if you're going to include your voice in the audio, you'll need a non-linear editor (NLE). most NLEs support multiple audio and video tracks.
the recording part works just like a VCR: you hit the record button, it starts recording, and then you start playing your game. whatever the Xbox outputs goes to the inputs of your capture device. the video you record will contain 1 video track and 1 audio track. your commentary is recorded separately as a WAV file, using Sound Recorder or any other audio recorder you like.
when you put the recorded video on your editor's timeline, you'll see video track 1 and audio track 1 from the recording. add your voice recording as audio track 2, trim the video as needed, then publish it. you'll want to export H.264 in an MP4 container, which is YouTube's preferred format. YouTube transcodes every upload no matter what you give it, but if you upload in its preferred format the transcode will be faster and the result will most likely stay closer to your specifications rather than theirs. upload another format and you're at the mercy of YouTube's transcoder to make your video fit their standards.
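if you'd rather script the "add the voice track and export H.264 MP4" step than do it in an editor, here's a rough sketch of one way to do it. it assumes you have ffmpeg installed (nothing above requires it, that's my assumption), and the file names and the `build_ffmpeg_command` helper are made up for illustration. it mixes the game audio and your commentary into a single track and encodes to H.264 video with AAC audio. it does no trimming, and you may need to adjust the levels of the two tracks yourself.

```python
import shlex


def build_ffmpeg_command(capture_path, voice_path, output_path):
    """Build an ffmpeg argument list (hypothetical helper, not part of any tool).

    Mixes the capture's audio with a separate commentary WAV and encodes
    the result as H.264 video + AAC audio in an MP4 container.
    """
    return [
        "ffmpeg",
        "-i", capture_path,  # input 0: captured gameplay (1 video + 1 audio track)
        "-i", voice_path,    # input 1: commentary WAV recorded separately
        # mix both audio tracks into one; stop when the capture's audio ends
        "-filter_complex", "[0:a][1:a]amix=inputs=2:duration=first[mixed]",
        "-map", "0:v",       # keep the video from the capture
        "-map", "[mixed]",   # use the mixed audio as the only audio track
        "-c:v", "libx264",   # H.264 video
        "-pix_fmt", "yuv420p",  # widely compatible pixel format
        "-c:a", "aac",       # AAC audio
        "-movflags", "+faststart",  # put the index at the front of the MP4
        output_path,
    ]


if __name__ == "__main__":
    # assumed example file names; replace with your own
    cmd = build_ffmpeg_command("gameplay.avi", "commentary.wav", "upload.mp4")
    print(shlex.join(cmd))  # prints the command so you can inspect it before running
```

the script only prints the command. to actually run it, pass the list to `subprocess.run(cmd, check=True)` once you're happy with it.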