By: Amir Majidimehr
One of the most significant technological changes in the CCTV market is the advent of so-called “IP cameras.” Although this technology has been available for quite a few years, there is little objective information on what sets these cameras apart from the analog cameras that powered the industry for decades before. The complexity of what goes into the design of IP cameras is partly responsible for this, requiring users to be fluent in everything from advanced video compression to computers and networking.
Complicating matters are the confusing specifications and often misleading terminology used to describe the basic performance of CCTV systems. Simple terms like “lines of resolution” are used where the intuitive meaning (how many lines there are in the picture) is not actually what the metric measures! Add to this the typical marketing hype and the picture becomes even muddier.
The purpose of this series of articles is to simplify these concepts and distill them down to a level where you can make purchasing decisions intelligently. While the coverage will be comprehensive, significant simplification is applied to make the concepts easy to grasp, ensuring that you can see the proverbial “forest for the trees.”
With that introduction, now let’s take a “deep dive” into each technology and what sets them apart.
Analog Camera Overview
As with any imaging device, the analog CCTV camera has a sensor which captures the video image. The resolution of the sensor varies but for reasons which will be described later, it is limited to 720×575. This is 720 pixels across the screen (horizontal resolution) and 575 up and down (vertical resolution).
The video is captured at 60 intervals per second, called “fields,” and transmitted to the receiver (60 is the NTSC figure; PAL uses 50). Two fields together are called a “frame.” This is called interlaced transmission. More on this later.
To get the video out of the CCTV camera into a recording and display device, a single coax cable is used. To maintain compatibility with analog televisions (and hence make it easier to use off the shelf products for display and recording), the signal that comes out of the camera complies with broadcast television standards.
There are two popular analog standards in the world for television: NTSC (used, for example, in North America and Japan) and PAL (used in many other countries, especially in Europe). There is also SECAM but it is not a common standard in the CCTV world.
The first thing to understand about NTSC or PAL is that the number of horizontal lines that make up the picture (i.e. the vertical resolution) is fixed by the specific standard. Let me repeat this: the number of lines is fixed and every source must transmit that many lines to be compliant with the standard. As a result, when you see a specification for the number of lines a CCTV camera has, it does NOT refer to vertical resolution, which is capped by the standard.
In the case of NTSC, the standard calls for 525 lines and for PAL, 625. However, not every line carries picture information. In reality, the viewable number of lines is 480 for NTSC and for PAL, 575. Note that you may see variations of these numbers such as 486 for NTSC. This is due to some people rounding the number and others not. For the purposes of this article let’s stay with the rounded numbers as the extra accuracy doesn’t mean much in practice anyway.
Now let’s look at the horizontal resolution. Here the picture becomes muddy, pun intended. What does resolution mean in the case of an analog system which does not care about individual pixels of light on your display?
If you look up the spec for an analog CCTV camera, you often see a resolution specified in the form of “lines.” Could this be the horizontal resolution of the camera as the name seems to imply? Well, no! Before we can understand the definition of lines, we need to dig more into the broadcast standard.
The NTSC TV transmission system relies on a display that has elongated pixels and has an aspect ratio of 4:3. In other words, the image is wider than it is tall. What does this have to do with the “line” specification? Well, someone decided that the horizontal resolution needs to be expressed in relation to vertical resolution, trying to show what the horizontal resolution would have been if the TV were square. I know this sounds strange but please don’t shoot the messenger! I am only here to explain things not justify them.
Fortunately, the conversion from actual pixels to “lines” is much simpler than understanding the motivation for it. Multiply the horizontal resolution in pixels by 3 and divide by 4 to arrive at number of lines. So for example, if you have 100 pixels in the horizontal dimension, you only have 75 “lines of resolution” (100 x 3 / 4 = 75). In this regard, the rating understates the true resolution of the system.
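For readers who prefer to see the rule written down, here is a minimal sketch of the conversion in Python. The function names are my own and purely illustrative; the only thing taken from the discussion above is the 4:3 ratio.

```python
def pixels_to_tv_lines(horizontal_pixels, aspect_w=4, aspect_h=3):
    """Convert horizontal pixels to "lines of resolution" for a 4:3 picture."""
    return horizontal_pixels * aspect_h / aspect_w


def tv_lines_to_pixels(tv_lines, aspect_w=4, aspect_h=3):
    """Inverse conversion: "lines of resolution" back to horizontal pixels."""
    return tv_lines * aspect_w / aspect_h


print(pixels_to_tv_lines(100))  # 75.0
print(tv_lines_to_pixels(75))   # 100.0
```

The same two functions can be reused for any line rating you find on a spec sheet.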
Be sure not to confuse the term “line” used in analog TV systems with the similar term used to describe different profiles of the high definition TV (HDTV) standard. In that world the pixels are square so lines are the same as resolution. But confusingly, the rating refers to vertical resolution rather than horizontal! I know this all may sound confusing. To keep things straight, just consider “lines” as a metric for analog cameras. In higher resolution formats such as HDTV and IP cameras, true pixel resolution is used so there is no confusion there.
An interesting question then becomes: what is the highest resolution that can be achieved in an analog camera? For this, we can look at the highest grade of analog TV equipment, which is what is used in a television studio at, say, a major network (or was used before the transition to HDTV). There, we find that when analog TV is processed, it is done in the digital domain at a horizontal resolution of 720 pixels. That number then sets the upper bound for an analog CCTV camera, which is usually considerably inferior to the units used in broadcast television.
Now you see why I mentioned that the maximum resolution of any analog camera is 720×575. Even in a broadcast setting, we cannot exceed the number of vertical lines in the standard, which is 575 in case of PAL. The counterpart for NTSC is 720×480. Yes, PAL has higher resolution but displays fewer fields per second (50 versus 60 for NTSC). If you have an analog camera which supports both resolutions, you may want to opt for PAL setting to extract a bit more resolution out of the camera in vertical dimension. Converting the horizontal resolution of the broadcast camera in pixels to line rating we get 540 (720 x 3 / 4 = 540). You may have seen this spec advertised for analog CCTV cameras and now you know where it comes from.
What does it mean if you see a number higher than 540? There can be two reasons for that. One, the specification is actually in pixels, in which case it can be up to 720. People working for camera companies often get these metrics confused. Two, if the spec is indeed in “lines,” then it simply indicates the resolution of the sensor, NOT what you can extract from it after the signal is digitized and sent out. This means that the extra resolution is wasted. Its only benefit is some noise reduction.
Of course, nothing stops anyone from putting lower resolution sensors in the camera and indeed, this is often done. Examine the spec and if the line rating is less than 540, then the resolution is lower than the highest it could be.
As they say, “but wait, there is more!” It turns out even the 540 line spec is grossly overstated. So far we have been talking about the sensor resolution and its compliance with the standard. But there is another part of the standard which deals with transmission of the signal over the air. You might wonder why we would care about that part. After all, we are sending our video signal over a coax wire. Well, the standard used over the coax wire in CCTV applications is the same as what would be put on air by a network.
To make transmission easier (and reduce power consumption of the transmitter), the standard allows the signal to be reduced in bandwidth. I will not bore you with the engineering details but there is a handy rule that for every “Megahertz” of bandwidth for a radio signal, we can carry 80 “lines” of video resolution. So to carry 540 lines of the broadcast TV signal, we would need 540/80 = 6.75 MHz of bandwidth. If you look into the specifications for NTSC however, you see that the standard only allows 4.28 MHz. So it goes without saying that we are not able to transmit 540 lines (or 720 pixels).
To figure out what resolution we can transmit, we simply multiply 4.28 MHz by 80 and arrive at a maximum resolution of 340 lines for NTSC (rounding down for simplicity). Yes, you read that right. The camera which advertises 540 lines of resolution, cannot achieve more than 340 once you look at the image that comes out of it over coax. What the vendor is advertising is the raw resolution of the sensor used to capture the video, not what can actually be achieved in a real system when the output is viewed over that coax wire. That extra bit of resolution cannot be extracted out of the camera. It is simply lost as soon as the video leaves the camera.
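The bandwidth arithmetic is easy to check for yourself. The sketch below simply encodes the 80 lines per MHz rule of thumb and the 4.28 MHz figure quoted above; the function and constant names are mine, and the rule itself is an approximation rather than an exact engineering formula.

```python
import math

LINES_PER_MHZ = 80         # rule of thumb quoted in the article
NTSC_BANDWIDTH_MHZ = 4.28  # bandwidth figure used in the article


def bandwidth_needed_mhz(tv_lines):
    """Bandwidth required to carry a given number of "lines"."""
    return tv_lines / LINES_PER_MHZ


def max_tv_lines(bandwidth_mhz):
    """Largest number of "lines" a given bandwidth can carry."""
    return bandwidth_mhz * LINES_PER_MHZ


print(bandwidth_needed_mhz(540))                     # 6.75
print(math.floor(max_tv_lines(NTSC_BANDWIDTH_MHZ)))  # 342
```

The second result, 342, is what the text rounds down to 340 for simplicity.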
Using the bit of math we have learned so far, we can translate 340 lines back into pixels. The result is 450 pixels of resolutions (340 x 4 / 3 = 450), again rounding down. Are we there yet? Can we assume that our total pixel resolution is 450×480 for NTSC and 450×575 for PAL? Well, not quite! We need to re-examine the vertical resolution because that is not what it seems either!
In order to reduce the amount of data that needs to be transmitted, both NTSC and PAL employ a poor man’s form of video compression called “interlace.” NTSC updates the picture on your display 60 times a second (PAL does so 50 times per second). But instead of sending all of those 450×480 pixels in every instance, the system transmits every other line in each transmission. These are the fields mentioned earlier. The actual resolution then in each field is 450×240 for NTSC, sent 60 times a second. At the receiving end, we don’t display each field separately but rather, combine two fields into one frame and display that. In the old analog TVs, this was done by relying on your eye to average the two fields being drawn at their respective positions. In the case of digital TVs and computer monitors, they are combined in memory and then displayed as a whole. In either case, it is important to note that the transmission occurs at 60 fields of half vertical resolution, not 30 full resolution frames. The same works for PAL except that the field rate is 50.
What does this mean in real life? Well, if you mount your analog NTSC CCTV camera on a solid mount with zero vibrations and point it at a static scene with nothing whatsoever moving in it (think of a wall), then the maximum resolution of an interlaced system is the same as progressive (where we transmit full frames of video all at once). So the fact that the system is interlaced doesn’t impact us at all and we have a vertical resolution of 480. The reason is that it doesn’t matter that we captured and transmitted the subject at different times. Nothing moved 1/60th of a second later so we preserved the full resolution of the image.
Now what happens if a car goes by? Well, now the camera captures odd and even lines of that car in separate intervals (fields). Since the car is moving, when we sample the image 1/60th of a second later, the pixels are no longer lined up with where they were in the last field. The display then mixes these two and what you get is half the resolution of the previous example in the vertical dimension. The visual artifact is jagged lines (every other line appearing to be out of sync with the previous one).
Best way to think of this is to consider that analog TV standards have variable vertical resolution. When nothing moves, they have their maximum resolution (480 and 575 for NTSC and PAL respectively at the sensor). But when there is high motion, you drop down by half. And if there is slow motion, then you are somewhere in between.
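To make the static-versus-moving distinction concrete, here is a toy illustration in Python. It is only a sketch under simplifying assumptions: the "scene" is a tiny grid of characters with a single vertical bar, and the helper names (`split_fields`, `weave`, `scene`) are hypothetical ones I made up for this example.

```python
def split_fields(frame):
    """Split a frame (a list of rows) into its even-row and odd-row fields."""
    return frame[0::2], frame[1::2]


def weave(even_field, odd_field):
    """Interleave two fields back into one full-height frame."""
    frame = []
    for even_row, odd_row in zip(even_field, odd_field):
        frame.append(even_row)
        frame.append(odd_row)
    return frame


def scene(bar_column, height=6, width=8):
    """A toy scene: a vertical bar ('#') at the given column."""
    row = "".join("#" if x == bar_column else "." for x in range(width))
    return [row for _ in range(height)]


# Static scene: both fields sample the same picture, so nothing is lost.
static = scene(2)
even, odd = split_fields(static)
assert weave(even, odd) == static

# Moving scene: the odd field is captured one field period later,
# after the bar has moved three columns to the right.
even, _ = split_fields(scene(2))
_, odd = split_fields(scene(5))
combed = weave(even, odd)
for row in combed:
    print(row)
```

In the moving case the printed rows alternate between the bar's old and new positions, which is the jagged, out-of-sync look described above.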
You may have heard of ways of de-interlacing an analog video signal. The techniques vary based on sophistication and complexity of implementation. One simple technique is averaging those vertical pixels, resulting in softer images but without jaggies. At the end of the day, it is very hard to undo the effects of interlace for video source material. Yes, you may have heard that de-interlacing works well in the case of playing a movie in a DVD player but that is because the source there is a movie and as such, was a progressive source. Such is not the case for live TV.
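As a rough sketch of the averaging idea, the Python below blends each row with the row beneath it. This is just one simple variant that I am assuming for illustration; real de-interlacers are considerably more sophisticated, and the function name and sample data are made up.

```python
def blend_deinterlace(frame):
    """Average each row with the row below it (the last row is kept as is)."""
    out = []
    for i, row in enumerate(frame):
        below = frame[min(i + 1, len(frame) - 1)]
        out.append([(a + b) / 2 for a, b in zip(row, below)])
    return out


# A "combed" edge: bright pixel (100) alternates between two columns.
combed = [
    [0, 100, 0, 0],
    [0, 0, 100, 0],
    [0, 100, 0, 0],
    [0, 0, 100, 0],
]
for row in blend_deinterlace(combed):
    print(row)
```

The jagged edge turns into a wider, dimmer one: the jaggies are gone but so is some of the sharpness.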
Putting it all together, your analog NTSC camera can have a maximum resolution ranging from 450×240 to 450×480. Total number of pixels therefore ranges from 0.1 megapixels to 0.2 megapixels. No matter what someone tries to do, and how much money they put in the design of the analog TV camera, they cannot improve on this number. Period!
Needless to say, the low resolution forces you to be much more careful in camera position and lens selection. A wide angle lens on an analog camera covering a large field, is unlikely to be able to capture detail that is recognizable because the resolution simply is not there.
Note that up to now we have been generous and assumed a perfect transmission system from camera to the capture device. Such is not the case with analog signals. Despite being shielded, the coax cable can still pick up noise as can the analog capture hardware in the recorder. The noise does more damage than you may intuit. One of the enemies of video compression used in video recorders is noise. It represents randomness which is very difficult to reduce in size. End result is that the added noise results in recordings which may suffer from more compression artifacts.
Are we done yet? Sorry to say no. On top of everything already discussed we have to consider the fact that analog TV standards have imperfections which introduce artifacts of their own. So-called “decoding errors” manifest in such things as false color, where a black and white image will bleed some color that is not in the source. This is very visible in analog CCTV captures.
As you see, compliance with analog TV standards severely limits what we can do with CCTV cameras. The system works remarkably well for a 50+ year old standard but is nowhere near ideal for an application where recognition of detail (e.g. a license plate or someone’s face) is paramount, as opposed to enjoyment of a movie or TV programming where fidelity may play second fiddle to the entertainment value.
So what is the solution? Simple: cut the cord with respect to compliance with the broadcast standard. We have a simple “point to point” system where both ends are under our control. So we really don’t need to use a universal broadcast TV standard, especially one this old. Even the broadcast world has abandoned analog TV by switching to an all-digital system with much improved resolution (in the US at least). In our world, that means “IP cameras.”
IP Cameras
An IP camera has an image sensor much like the analog camera. However, once it has captured its image, it transmits it as “data” over a network connection. That data is in the form of compressed video frames sent over standardized networking protocol used for computer applications which is where it gets its name. “IP” stands for Internet Protocol which is the low-level language used to transmit data between computers in your home and the Internet. What this implies then is that the IP camera is like a little computer that you connect to, to access your video. Indeed, IP cameras are computers and run operating systems not all that different from your PC. Where they differ is that they are fixed function and their programming cannot be extended by the user.
The fact that the camera uses IP for transmission is not very important. What is important is that we are no longer bound by the broadcast standard. In theory, we could now have any resolution we wanted. You could as easily envision a camera with 10,000×2,000 pixels as you can 800×800.
Let’s drill into the different technologies used in an IP camera and their impact on system functionality and performance.
Sensor: The lowest end IP cameras use the same sensors as analog cameras. In other words, they have a resolution of 720×480 or 720×576. Some go as far as even using interlaced sensors. While interlace is a fact of life in analog cameras, we cannot think of any reason to tolerate it in the IP world where interlace only hurts the image fidelity. So where possible, avoid using interlaced IP cameras and instead, opt for units with “Progressive” sensors. You can find this fact in the fine print of the camera spec. If not, ask the manufacturer or avoid the brand altogether. It is a bad sign if they are not forthcoming with this information.
As the resolution climbs above broadcast level, the sensor type will always be progressive.
By convention, IP camera companies advertise the resolution in “megapixels.” To arrive at megapixels, simply multiply the horizontal resolution by the vertical and divide by one million. If a camera has 1280×720 resolution, it would have 0.9 million pixels but this is often rounded to one megapixel.
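If you want to run the megapixel math on any spec sheet, a small helper like the one below does the job. The function name is my own; the resolutions are the ones used in this article, including the 450×240 to 450×480 range derived earlier for analog NTSC.

```python
def megapixels(width, height):
    """Total pixel count expressed in millions of pixels."""
    return width * height / 1_000_000


print(megapixels(1280, 720))  # 0.9216, often advertised as "one megapixel"
print(megapixels(450, 240))   # 0.108, analog NTSC with fast motion
print(megapixels(450, 480))   # 0.216, analog NTSC with a static scene
```

Comparing the first result with the other two gives a feel for how far even a modest IP camera is from the analog ceiling.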
A useful feature of some cameras is the ability to capture a subset of sensor data. Since an IP camera tends to have a lot more resolution than its analog counterpart, we can still have ample resolution left for the “area of interest,” allowing us to save hard disk space in our recorder. Companies like Mobotix provide such a feature and it is a useful one to look for.
To put the resolution of the sensor in perspective, let’s look at the specs for other types of video standards in use today:
- DVD
The DVD Format was designed to deliver the same resolution used in broadcast world for analog TV. So it has the same resolution of 720×480 for NTSC and 720×575 for PAL. You may have noticed how much sharper and better the DVD quality is versus watching analog TV off air (and analog cable). This shows you how much degradation compliance with NTSC/PAL can cause!
Of note, DVD players may have “S-video” and component outputs. Using these types of interconnect, you are able to achieve higher resolutions than using the standard single cable coax connection. S-Video requires two cables (one for color and the other for black and white) and component three (one for black and white and two for “color difference” signals). However, neither one of these is in common use in CCTV world (Axis has one camera model with component output for video previews). And neither is the digital standard called HDMI used on newer “upscaling” DVD players. The latter has severe length limitations which would make it an unlikely choice for CCTV. But we digress.
Let’s compute the total resolution for DVD by multiplying its horizontal and vertical numbers together. This gives us 350,000 pixels for NTSC and 414,000 for PAL (rounding for convenience). Divide these by one million to get the “megapixel” rating of 0.35 for NTSC and 0.41 for PAL. In other words, even the best form of standard definition video, free of NTSC/PAL limitations, has much less resolution than even a camera phone! Admittedly, the quality of those pixels is far above a camera phone but you get the picture, pun intended!
Using the above numbers, a one megapixel IP camera will deliver three times more pixels than NTSC DVD. Note that this is NOT three times more pixels in either dimension: that would result in nine times higher resolution. Rather, we have square root of three or 1.7 times more pixels in either dimension.
This is a good time to also talk about why some IP cameras come in “VGA” resolution.
VGA refers to a specific PC resolution of 640×480. This resolution is also considered the “square pixel” version of NTSC video. You might think that a VGA resolution IP camera would be inferior to its full resolution analog counterpart. But such is not the case since the VGA resolution is transmitted all the way to the receiver, devoid of NTSC/PAL artifacts or reduction of resolution. Indeed, most people are shocked by how much cleaner a VGA IP camera image can be compared to even the best analog CCTV cameras, even though the advertised specs suggest otherwise.
- High Definition Television (HDTV)
The US digital TV standard comes in various flavors but the most common are “720p” and “1080i/p.” “P” means progressive and “i” interlaced. So 1080i means 1920×1080 resolution in interlaced format which is used for most broadcast HDTV signals. 1080p has the same resolution as 1080i but as the name indicates, is a progressive format. It cannot be used in broadcast HDTV but is used in Blu-ray Disc format.
Doing the math again, 720p translates to roughly one megapixel. And 1080i/p translates into two megapixels. So even though we have made quite a jump from NTSC/PAL formats in moving to HDTV, we are way short of state-of-the-art in sensor resolution as you will see below.
- Point and Shoot Cameras
These cameras come in various resolutions but even a $100 one is likely to boast 3-5 million pixels. Many come at resolutions above these. Isn’t it remarkable that such a cheap camera has more resolution than HDTV and Blu-ray Disc, to say nothing of many times the resolution of an analog CCTV camera? Yes, it is capturing still images but many also support video these days.
- Professional still image cameras
These cameras show us where we could go as far as resolution. As of this writing, high volume professional (DSLR) cameras boast resolution above 20 Megapixels and specialized units exceed 60 Megapixels. Even lower end cameras (under $1000) now have resolutions above 10 megapixels and some even support real-time video capture and compression (although limited to 1080p today).
What’s more, these cameras have superb dynamic range and sensitivity. This is due to the use of much larger sensors than those used in CCTV cameras. But there is no reason why they could not be adapted to CCTV applications (although the cost of both the cameras and lenses would go up appreciably). Of note, a number of pro cameras such as Canon’s entire DSLR range use CMOS sensors, debunking the myth that the CMOS sensors used in IP cameras are somehow inferior when it comes to low-light performance.
So what is the extra resolution good for? For one, it gives you the ability to zoom into the image much more without it turning into a soft and fuzzy image. Detail like a license plate will be much more recognizable at 3 megapixels, versus 0.3.
Turning the above upside down, you can choose to have the same resolution but have it cover much wider area. The same 3 megapixel camera can cover the same area as three analog cameras and still have more resolution to boot. Of course, details matter as far as lens selection and positioning but as far as pure resolution is concerned, we can save a lot of cost in camera installation by using fewer cameras. Note that sensor resolution is not the only metric for image quality. Lens quality and low-light ability can impact effective resolution. For example a lens that is soft in the corners is likely to offset the increased sensor resolution in that area. As the resolution goes up, it becomes progressively more important to pay attention to these details.
On light gathering capability, all else being equal, as you increase resolution, the size of “photosites” (elements that capture light in the sensor) gets smaller resulting in higher noise figures. As noted earlier, one can compensate for this by enlarging the size of the sensor. The downside is that this also increases the camera cost (and hence the reason you don’t see a $200 webcam come with a large sensor). There is also special processing which can be done to reduce noise although this tends to lower effective resolution of the camera.
Note that just because two sensors are of equal size, it does not mean that they perform the same. A quarter inch sensor may be as good as a lower quality one that is one third inch. Lux ratings of cameras often lack all the metrics needed to evaluate the camera sensitivity (e.g. shutter speed). As a result, nothing replaces independent evaluation of the unit to gauge how well the camera works in low light environments.
Video Compression: Uncompressed video takes a considerable amount of data to store and transmit. Even in standard definition, the numbers can be huge. Take DVD. At just 720×480 resolution, times 24 frames a second (used in movies), we are talking about 132 megabits/sec of data. If you have a typical broadband connection of say, 3 mbit/sec, your link is roughly 40 times slower than what is needed to watch DVDs without compression!
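Here is a quick sketch of where a figure like that comes from. Note the assumption: the article does not state how many bits are used per pixel, so I am assuming 16 bits per pixel (8-bit video with 4:2:2 color sampling) because that reproduces the number quoted above. The function name is also just illustrative.

```python
BITS_PER_PIXEL = 16  # assumption: 8-bit 4:2:2 sampling, not stated in the article


def raw_bitrate_mbps(width, height, frames_per_second, bits_per_pixel=BITS_PER_PIXEL):
    """Uncompressed video data rate in megabits per second."""
    return width * height * frames_per_second * bits_per_pixel / 1_000_000


dvd_raw = raw_bitrate_mbps(720, 480, 24)
print(round(dvd_raw, 1))  # 132.7
print(round(dvd_raw / 3))  # 44, i.e. how many times too slow a 3 mbit/sec link is
```

With a different bits-per-pixel assumption the totals change, but the conclusion does not: uncompressed video is far too large for ordinary links.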
Luckily, video is very amenable to compression. Frames of video themselves have a lot of redundancy in them, as do sequences of frames. Take a blue sky. Chances are a lot of pixels are the same and can be described using fewer bits of data. We call this “intraframe compression.” JPEG is a form of intraframe compression. Send a sequence of JPEG frames and we call that Motion JPEG or M-JPEG for short. JPEG is very cheap to implement and hence the reason it is universally offered in IP cameras.
Another form of compression is “interframe” compression. This takes advantage of redundancy between frames. MPEG-2, MPEG-4, H.264 (also called MPEG-4 Part 10 or MPEG-4 AVC), and VC-1 are popular compression standards of this type. At a high level, these systems perform a similar function to JPEG in compressing an individual frame. But they also look to see if the current frame is similar to the one before it. If so, then they only transmit what is different and the decoder combines that information with the previous frame to display the image.
For example, imagine a person walking in front of a building. The building is not changing in every frame. The only thing changing is the pixels describing the person moving. The above systems divide the screen into blocks and then track whether each block has moved. If it has, the encoder tells the decoder to move that block, rather than having to retransmit the whole image. The decoder holds on to the previous frame(s) in order to be able to perform this processing. The amount of compression is not predictable and is picture dependent. A static image achieves the highest level of compression. A noisy night-time image with lots of motion will probably be the lowest.
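The toy Python sketch below shows the flavor of the idea. Be aware that it is a deliberately simplified stand-in: it only resends blocks that changed since the previous frame, whereas real codecs such as H.264 also search for where a block moved and send motion vectors. The block size, frame contents and function names are all made up for illustration.

```python
BLOCK = 4  # toy block size in pixels; an assumption for this example


def blocks(frame):
    """Yield (row, col, block) for each BLOCK x BLOCK tile of a 2-D list."""
    for r in range(0, len(frame), BLOCK):
        for c in range(0, len(frame[0]), BLOCK):
            yield r, c, [row[c:c + BLOCK] for row in frame[r:r + BLOCK]]


def encode_delta(previous, current):
    """Return only the blocks of `current` that differ from `previous`."""
    prev_blocks = {(r, c): b for r, c, b in blocks(previous)}
    return [(r, c, b) for r, c, b in blocks(current) if b != prev_blocks[(r, c)]]


def decode_delta(previous, delta):
    """Rebuild the current frame from the previous frame plus changed blocks."""
    frame = [row[:] for row in previous]
    for r, c, block in delta:
        for dr, block_row in enumerate(block):
            frame[r + dr][c:c + BLOCK] = block_row
    return frame


# An 8x16 "building" (all zeros) with a 2x2 "person" (value 9) walking by.
frame1 = [[0] * 16 for _ in range(8)]
frame2 = [[0] * 16 for _ in range(8)]
for r in (4, 5):
    frame1[r][1:3] = [9, 9]  # person at columns 1-2
    frame2[r][2:4] = [9, 9]  # person has moved one pixel to the right

delta = encode_delta(frame1, frame2)
total_blocks = (8 // BLOCK) * (16 // BLOCK)
print(len(delta), "of", total_blocks, "blocks sent")  # 1 of 8 blocks sent
assert decode_delta(frame1, delta) == frame2
```

Only the block containing the person is resent; the unchanged building costs nothing, which is why a static scene compresses so well.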
While a scheme like JPEG can provide 10 to 20 times data reduction without a lot of fidelity loss, systems using MPEG-4 AVC can ratchet this up to 50 or even 100 times compression. In a future article I will talk more about video compression and what things to watch out for there.
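To put those ratios in perspective, the sketch below applies them to the 132 megabits/sec uncompressed figure used earlier. The ratios are the ones quoted above; the function name is mine, and actual results vary with picture content.

```python
RAW_MBPS = 132  # uncompressed DVD-resolution figure quoted in the article


def compressed_mbps(raw_mbps, ratio):
    """Data rate after applying a given compression ratio."""
    return raw_mbps / ratio


for codec, ratio in [("JPEG", 10), ("JPEG", 20), ("MPEG-4 AVC", 50), ("MPEG-4 AVC", 100)]:
    print(f"{codec} at {ratio}x: {compressed_mbps(RAW_MBPS, ratio):.2f} Mbit/s")
```

At 50 to 100 times compression, the result lands in the low single digits of megabits per second.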
Networking
Once we have a video image compressed, we need to transmit it where it is going to be viewed or stored. The favorite method of physical connection is an Ethernet port. Being the most common interconnect scheme for computers for a number of decades, one gains incredible economies of scale in this manner. And with advent of Power over Ethernet (PoE), IP cameras can be powered using the same Ethernet wire.
There is some folklore around lack of bandwidth to distribute video over Ethernet. In reality the opposite is true. A typical IP camera has a data rate of 2-3 megabits/sec. So even the old standby, 100 Mbit/sec Ethernet, has ample bandwidth to carry the signal from many cameras over the same wire. In reality, you would be using an Ethernet switch, meaning each camera gets its own private 100 Mbit/sec link so there is no congestion on the camera side at all. Yes, the final link to the recorder needs to be able to carry data from all the cameras but with the advent of ultra low cost gigabit Ethernet switches, there really is no barrier to deployment of a large number of IP cameras. And of course, being “data” and digital, we are immune to noise over the cable, unlike analog cameras.
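A back-of-the-envelope check of the link to the recorder looks like this. The `usable_fraction` parameter is my own addition to leave room for protocol overhead and bursts; it is an assumption, not a figure from this article, and it defaults to using the full link.

```python
def cameras_per_link(link_mbps, camera_mbps, usable_fraction=1.0):
    """How many cameras of a given data rate fit on one network link."""
    return int(link_mbps * usable_fraction // camera_mbps)


print(cameras_per_link(100, 3))        # 33 cameras on 100 Mbit/sec Ethernet
print(cameras_per_link(1000, 3))       # 333 cameras on gigabit Ethernet
print(cameras_per_link(1000, 3, 0.5))  # 166 if you only plan to use half the link
```

Even with generous headroom, a single gigabit link to the recorder handles far more cameras than most installations will ever have.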
Modern IP cameras provide a range of interfaces to extract the video from them. The simplest form is an included web server in the camera which you can connect to using any browser and view (usually motion JPEG) videos directly.
While operating in the browser provides a broad level of compatibility, it can be limiting from a functionality point of view. For this reason, camera companies also provide plug-ins called “ActiveX controls” in Windows lingo, which are little applications that know how to talk to the camera. These controls are like the Flash player used on the Internet to play audio/video streams. The next method is through a software development kit (SDK). This is a computer library that application developers use to talk to the camera. The SDK is used for example by third-party DVR software to capture video and control the camera. Without the SDK, third party integration is not possible.
Many cameras have the ability to upload their videos directly to a networked storage device, whether this is a NAS (Networked Attached Storage) or a PC server. Others can email you select video frames, or “ftp” the stream to an Internet server.
On the control front, that is done through software. Instead of running wires to control the functions of a PTZ camera, you would use the software interface to send the same commands to the camera, saving wiring costs.
Summary
I hope you now have a better understanding of how severely the performance of an analog CCTV camera is capped by the way it has to work (i.e. compliance with broadcast standards). By using a data connection and computer networking, IP cameras can provide much better performance with no real limitations for future growth in resolution or other capabilities. The increased picture detail allows one to save money by installing fewer cameras, or gain a level of detail that simply is not achievable through an analog camera. The fact that the camera can be accessed directly without the need for any special software is the icing on the cake.
Amir Majidimehr is a 30+ year veteran of the computer, networking and digital media industries. Most recently, he worked at Microsoft as the Vice President of the Digital Media Division where his group developed an entire suite of audio/video/imaging technologies including compression, high definition DVD formats, audio/video streaming and playback. Prior to that, he held executive roles at a wide range of companies from Sony to professional video editing and processing companies such as Pinnacle Systems. He currently is the Contributing Editor for Widescreen Video magazine where he writes tutorials on how traditional audio/video systems intermingle with modern computing systems and the Internet.