These virtual sets are usually still images, and sometimes video files. The images are usually .png files, a widely supported format that can carry an embedded alpha channel. The alpha channel is what lets us put people behind desks and video into monitors.
An alpha channel is a fourth channel in the image; the first three are red, green, and blue. This fourth channel is a grayscale image that determines what is opaque and what is transparent: white is fully visible, black is fully transparent, and the shades of gray in between are semitransparent.
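To make the white/black/gray rule concrete, here is a minimal sketch of how one matte pixel could be mixed with whatever sits behind it. It assumes 8-bit channels (0 to 255) and straight, non-premultiplied alpha, which is how PNG stores it. The function names are made up for illustration; your editor does this internally and may differ in rounding or color handling.

```python
def blend_channel(front, behind, alpha):
    """Mix one 8-bit channel value. alpha 255 (white) keeps the front
    value, alpha 0 (black) lets the layer behind show through."""
    coverage = alpha / 255
    return round(front * coverage + behind * (1 - coverage))


def blend_pixel(front_rgb, behind_rgb, alpha):
    """Apply the same alpha value to the red, green, and blue channels."""
    return tuple(
        blend_channel(f, b, alpha) for f, b in zip(front_rgb, behind_rgb)
    )


if __name__ == "__main__":
    matte_pixel = (200, 100, 50)
    behind_pixel = (0, 0, 0)
    for alpha in (255, 128, 0):  # white, mid gray, black
        print(alpha, blend_pixel(matte_pixel, behind_pixel, alpha))
```

A mid-gray alpha value gives roughly a half-and-half mix, which is what produces soft edges and see-through areas in a matte.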
In our sets there are two or three versions of each angle: a 24-bit solid image, and one or two 32-bit matte versions. The matte versions have an alpha channel, either for putting the chromakeyed talent behind a desk or for putting video into a screen.
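The 24-bit and 32-bit figures are just 8 bits per channel times three channels (RGB) or four (RGB plus alpha). If you are ever unsure which version of a file you have, the sketch below reads the PNG header to tell them apart. It uses only Python's standard library and the header layout from the PNG specification; the function names are hypothetical, and it assumes the IHDR chunk comes first, as the specification requires.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Channels per pixel for each PNG color type:
# 0 = grayscale, 2 = RGB, 3 = palette, 4 = gray + alpha, 6 = RGB + alpha
CHANNELS_BY_COLOR_TYPE = {0: 1, 2: 3, 3: 1, 4: 2, 6: 4}


def png_bits_per_pixel(header):
    """Return bits per pixel from the first 26 bytes of a PNG file."""
    if header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    # IHDR data starts at byte 16: width, height, bit depth, color type
    width, height, bit_depth, color_type = struct.unpack(">IIBB", header[16:26])
    return bit_depth * CHANNELS_BY_COLOR_TYPE[color_type]


def png_has_alpha_channel(header):
    """True for color types 4 and 6, which carry a full alpha channel."""
    if header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    return header[25] in (4, 6)
```

To check a file, read its first 26 bytes in binary mode and pass them in. A solid image should report 24 bits and no alpha channel; a matte should report 32 bits with an alpha channel.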
These matte files, usually found in the mattes folder, have an A or B in their name. We used to use FG and BG, but some sets have two background mattes for different things, such as screens and windows.
The virtual set will be built on the timeline of your nonlinear editor (NLE) and may look something like the example below. Some editors are top down and some are bottom up. In this example the NLE is SpeedEdit, which uses a bottom-up view: the camera sees the bottom layer first, then the one above it, and so on. Most NLEs are top down, meaning the topmost track is what is seen first. Here, on the bottom, we have the B matte, which in this case is the table. Next is the chromakeyed talent, then the A matte, which is most of the set with an alpha channel hole where the screen is. Last is the video that goes into the monitor.
We can think of this composite as a sandwich of the above items. If you looked at it from the side, it might look something like this:
But when we line everything up correctly, we get a finished composite that looks something like this:
Voilà! Instant virtual set.
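For anyone who wants to see the sandwich in numbers, here is a rough sketch of the layer order described above, applied to a single pixel. It lists the layers from the one the camera sees first to the one it sees last: B matte, keyed talent, A matte, monitor video. The colors and names are invented for illustration, alpha is assumed to be straight 8-bit, and a real NLE does this for every pixel of every frame.

```python
def over(top_rgba, behind_rgb):
    """Place one RGBA layer pixel over an opaque RGB pixel."""
    r, g, b, alpha = top_rgba
    coverage = alpha / 255
    return tuple(
        round(t * coverage + u * (1 - coverage))
        for t, u in zip((r, g, b), behind_rgb)
    )


def composite(layers_seen_first_to_last):
    """Composite one pixel from a stack of RGBA layers.

    The last layer is the backmost one and is treated as fully opaque.
    """
    *upper_layers, back = layers_seen_first_to_last
    result = back[:3]
    # Build up from the back of the sandwich toward the camera.
    for layer in reversed(upper_layers):
        result = over(layer, result)
    return result


# Made-up example colors; CLEAR is a fully transparent pixel.
CLEAR = (0, 0, 0, 0)
DESK = (120, 80, 40, 255)       # opaque part of the B matte
TALENT = (220, 180, 160, 255)   # talent after the chroma key
SET_WALL = (30, 60, 120, 255)   # opaque part of the A matte
VIDEO = (250, 250, 250, 255)    # monitor video

if __name__ == "__main__":
    # Pixel on the desk: the B matte hides everything else.
    print(composite([DESK, TALENT, SET_WALL, VIDEO]))
    # Pixel above the desk where the talent stands.
    print(composite([CLEAR, TALENT, SET_WALL, VIDEO]))
    # Pixel inside the screen: keyed-out talent, hole in the A matte.
    print(composite([CLEAR, CLEAR, CLEAR, VIDEO]))
```

The order is what matters: the desk covers the talent, the talent covers the set, and the video only shows where the A matte's alpha channel leaves a hole.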