To introduce a voice command, start by saying the word as you give the hand gesture. The most common choice is "sit", spoken firmly in a middle register: neither high-pitched and excited nor low and growly. But exactly how you say the word (or, of course, which word you choose) matters less than using it consistently so that your dog learns it.
Once you have been doing this for a while, you can move on to giving the voice command without the hand signal. Doesn't work? Just return to an earlier step in the training process: give the hand signal, or even go back to luring the dog into the sit. Dog behavior isn't as consistent as we might think, and there is usually a lot of training work between the first time the dog does something in response to a command and the point where they can do it reliably (and then there's doing it with a squirrel running in front of them; few dogs make it all the way to the squirrel-proof stage).
One thing to avoid is repeating the voice command. If it didn't work the first time, there is a good chance it won't work the second time, and you only give the dog a chance to unlearn the connection between the word and sitting. You can give your hand signal shortly after the voice command, you can coax with phrases like "keep going" or "you can do it", or you can simply move on to something else and come back to the problem later. What you don't want is for your dog to come away with the message that they can ignore the first few commands.