
Access from the University of Nottingham repository:
http://eprints.nottingham.ac.uk/13846/1/420354.pdf

Copyright and reuse:

The Nottingham ePrints service makes this work by researchers of the University of Nottingham available open access under the following conditions.

This article is made available under the University of Nottingham End User licence and may be reused according to the conditions of the licence. For more details see:
http://eprints.nottingham.ac.uk/end_user_agreement.pdf

For more information, please contact eprints@nottingham.ac.uk
CMOS OPTICAL CENTROID PROCESSOR FOR AN INTEGRATED SHACK-HARTMANN WAVEFRONT SENSOR

Boon Hean Pui, B.Eng (Hons)

Thesis submitted to the University of Nottingham for the degree of Doctor of Philosophy

September 2004
ABSTRACT

A Shack-Hartmann wavefront sensor is used to detect the distortion of light in an optical wavefront. It does this by sampling the wavefront with an array of lenslets and measuring the displacement of the focused spots from their reference positions. These displacements are linearly related to the local wavefront tilts, from which the entire wavefront can be reconstructed. In most Shack-Hartmann wavefront sensors, a CCD is used to sample the entire wavefront, typically at a rate of 25 to 60 Hz, and a whole frame of light spots is read out before their positions are processed. This results in a data bottleneck. In this design, parallel processing is achieved by incorporating local centroid processing for each focused spot, so that only reduced-bandwidth data needs to be transferred off-chip at a high rate. Incorporating centroid processing at the sensor level requires a level of circuit integration that is not possible with CCD technology. Instead, a standard 0.7 μm CMOS technology was used, but photodetector structures for this technology are not well characterised. Several common photodiode structures were therefore characterised and showed good responsivity, of the order of 0.3 A/W. Prior to on-chip fabrication, a hardware emulation system using a reprogrammable FPGA was built, which implemented the centroiding algorithm successfully. Subsequently, the design was implemented as a single-chip CMOS solution. The fabricated optical centroid processor successfully computed and transmitted the centroids at a rate of more than 2.4 kHz, which, when integrated as an array of tilt sensors, will allow a data rate that is independent of the number of tilt sensors employed. Besides removing the data bottleneck present in current systems, the design also offers advantages in terms of power consumption, system size and cost. The design was also shown to be highly scalable to a complete low-cost real-time adaptive optics system.
ACKNOWLEDGEMENTS

If I have seen further it is by standing on the shoulders of giants.
-Isaac Newton

I would like to thank Dr. Barrie Hayes Gill for his support, guidance, and not least, patience throughout my research work. I would also like to express my gratitude to Professor Mike Somekh and Dr. Chung Wah See for their help and guidance. To Matt Clark, all your help and brilliant insights are greatly appreciated. To the many research staff, colleagues and technicians who have in one way or another been involved in my work, I thank you.

The research work has been supported by the University of Nottingham, University of Nottingham International Office, University of Nottingham in Malaysia and the Engineering and Physical Sciences Research Council (EPSRC), UK and I would like to thank them for making this possible.

Finally, my sincere thanks go to my friends and family for making this journey bearable, and my deepest gratitude and love to my parents and brother for their belief in me.
# TABLE OF CONTENTS

**ABSTRACT**

1 **INTRODUCTION**

1.1 **ADAPTIVE OPTICS**

1.2 **APPLICATIONS**

1.2.1 **ASTRONOMY**

1.2.2 **OPHTHALMOLOGY**

1.2.3 **BEAM QUALITY CONTROL**

1.2.4 **MICROSCOPY**

1.3 **WAVEFRONT SENSING**

1.3.1 **SHACK-HARTMANN WAVEFRONT SENSOR**

1.3.2 **OTHER WAVEFRONT SENSORS**

1.4 **CENTROID DETECTION**

1.4.1 **LATERAL EFFECT PHOTODIODES (LEP)**

1.4.2 **MULTI-ELEMENT PSD**

1.4.3 **MULTI-ELEMENT PSD PERFORMANCE**

1.4.4 **CENTROID PROCESSING**

1.5 **PHOTODETECTION**

1.5.1 **OPTICAL ABSORPTION**

1.5.2 **QUANTUM EFFICIENCY AND RESPONSIVITY**

1.5.3 **NOISE AND PHOTODIODE EQUIVALENT CIRCUIT**

1.5.4 **PHOTODIODE MEASUREMENTS**

1.5.5 **TECHNOLOGY AND MATERIALS**

1.6 **PIXEL ARCHITECTURES IN CMOS**

1.6.1 **PASSIVE PIXEL SENSORS (PPS)**

1.6.2 **ACTIVE PIXEL SENSORS (APS)**

1.6.3 **NOISE REMOVAL AND EXTENDING DYNAMIC RANGE**

1.7 **CHAPTER SUMMARY**

2 **CHARACTERISATION OF CMOS PHOTODIODES**
1.1 ADAPTIVE OPTICS

Since Galileo pointed his telescope to the heavens some 400 years ago, man has been trying to see further and further into the stars and in greater detail. The fundamental limit of resolving the images is known as the diffraction limit and is governed by the diameter of the lens used. However it was observed that as larger telescope lenses were used, the astronomical images did not get any sharper when the lenses exceeded about 20cm in diameter [Angel 2000]. There was something distorting the images in a seemingly random manner. This was the air around us. Variations in temperature in the atmosphere cause random fluctuations in wind velocity and hence, changes in the refractive index [Tyson 1998]. This leads to distortion in the images obtained. Fortunately there was something we could do about it and it is called adaptive optics.

Adaptive optics (AO), which has been heavily developed over the last 30 years, allows automatic compensation of atmospheric distortion. It deals with the control of light in a real-time, closed-loop fashion and is made up of three fundamental components: the wavefront sensor, the control computer and a corrector element such as a deformable mirror. The wavefront sensor acts like the eyes, detecting light from the object of interest, such as an astronomical object or a satellite, and transducing the intensity information of the wavefront into phase information about the aberration in the wavefront. The control computer then calculates the changes required to correct this aberration and passes these on to the corrector, or deformable mirror, where the changes are made. Figure 1.1 shows the components of a typical adaptive optics system as used in a telescope [O'Byrne 1996]. Often a tip-tilt mirror is used to rapidly remove beam wander in the incoming beam of light while the deformable mirror performs the higher-order corrections.

The most widely used wavefront sensor in adaptive optics is the Shack-Hartmann [Platt 2001]. In most of these systems a CCD is used to sample the wavefront, and a frame grabber is used to acquire and digitise the image before it is transferred to a PC for reconstruction of the wavefront. The bandwidth of these AO systems is often limited to some tens of Hz [Nirmaier 2003]. Integrating these systems with processing at the detector level will reduce the bandwidth of the data to be transferred off-chip, thus allowing fast real-time wavefront detection and correction; this is the topic of this research. Furthermore, integration of the wavefront sensor with wavefront reconstruction will reduce the size and cost of the system even further, realising the concept of a System-on-a-Chip (SoC).

Adaptive optics has traditionally been known for its role in compensating wavefront distortions for astronomical applications. The main reason for this is the cost of the key elements of an adaptive optics system: deformable mirrors, wavefront sensors and control systems requiring high-speed computers. AO systems with a reasonable bandwidth (greater than a few Hz) were extremely expensive, with a component cost of >£10^5 [Munro 1999]. Applications of adaptive optics are, however, not limited to astronomy or defence initiatives, and a number of potential applications are surfacing which would benefit from some form of cheap, fast adaptive optics system. These range from laser communications, medical imaging of the retina and industrial inspection to the development of more efficient lasers, underwater imaging devices and better microscopes. Basically, adaptive optics can be used wherever light passes through a distorting medium. Section 1.2 covers some of the application areas where an adaptive optics system can be applied. Section 1.3 describes the concept of wavefront sensing, paying particular attention to the mechanics of a Shack-Hartmann wavefront sensor and how integration will remove the bottleneck in traditional CCD systems. The process of detecting a centroid, which is a fundamental component of a Shack-Hartmann wavefront sensor, is covered in Section 1.4. Sections 1.5 and 1.6 then review the theory behind photodetection and the possible implementation structures for it. Section 1.7 summarises the chapter, while Section 1.8 details the layout of the remaining chapters.

1.2 APPLICATIONS

In addition to system integration, the development of new low-cost technologies such as Micro-Opto-Electro-Mechanical Systems (MOEMS), liquid crystal wavefront correctors and micromachined deformable mirrors [Anderson 1999, Hatcher 2001, Vdovin 1997] will further open up new areas of applications. Some of the key applications for an adaptive optics system are discussed in the following subsections.

1.2.1 ASTRONOMY

The field of astronomy gave birth to the technique of adaptive optics, which is widely used to improve the imaging capabilities of ground-based telescopes. The image
The spatial resolution of uncompensated telescopes can be more than 10 times better on mountains than at sea level [Tyson 2000].

The structure and statistics of turbulence, as well as its corresponding effects, can be described by a model due to Kolmogorov [Tyson 1998]. The effect of this turbulence is to cause high spatial frequency beam spreading, low spatial frequency beam wander and intensity variations, which limit the ability of telescopes to resolve fine details. The level of turbulence at a particular site can be described by a parameter introduced by Fried called the Fried coherence length, $r_0$ [Fried 1965], which is the maximum diameter of the aperture that can be used for collection of the wavefront before atmospheric distortion seriously limits its performance. This parameter defines the limit of the achievable resolution without compensation, as shown in Figure 1.2 by the sketch of the typical point spread function of a star being imaged by an astronomical telescope. The Fried coherence length ranges from ~2 cm under poor seeing conditions to ~20 cm under good seeing conditions [Mansell 2000]. Figure 1.3 shows the uncompensated and compensated images of a binary star taken at the Starfire Optical Range [Air Force Research Laboratory Directed Energy Directorate 1997]. With compensation, the image halo or beam spread, as in Figure 1.2, has been corrected and the two distinct stars of the binary star k-Peg can be discerned.
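
As a rough numerical illustration of how the Fried coherence length caps the achievable resolution, the sketch below compares the diffraction-limited angular resolution of an aperture, taken here as $1.22\lambda/D$ (Rayleigh criterion), with a seeing-limited value approximated as $\lambda/r_0$. The 500 nm wavelength, the 4 m aperture and all function names are illustrative assumptions and do not come from this work; only the 2 cm and 20 cm coherence lengths are the figures quoted above.

```python
import math

# Conversion factor from radians to arcseconds.
ARCSEC_PER_RAD = 180.0 / math.pi * 3600.0


def diffraction_limit_arcsec(wavelength_m, aperture_m):
    """Rayleigh-criterion resolution of an ideal aperture, in arcseconds."""
    return 1.22 * wavelength_m / aperture_m * ARCSEC_PER_RAD


def seeing_limit_arcsec(wavelength_m, r0_m):
    """Approximate seeing-limited resolution (lambda / r0), in arcseconds."""
    return wavelength_m / r0_m * ARCSEC_PER_RAD


def uncompensated_resolution_arcsec(wavelength_m, aperture_m, r0_m):
    """Crude estimate: the coarser of the diffraction and seeing limits."""
    return max(diffraction_limit_arcsec(wavelength_m, aperture_m),
               seeing_limit_arcsec(wavelength_m, r0_m))


if __name__ == "__main__":
    wavelength = 500e-9   # assumed visible wavelength (m)
    aperture = 4.0        # assumed telescope diameter (m)
    for r0 in (0.02, 0.20):  # poor and good seeing, as quoted in the text
        print(r0,
              diffraction_limit_arcsec(wavelength, aperture),
              uncompensated_resolution_arcsec(wavelength, aperture, r0))
```

Under these assumptions the uncompensated telescope is limited by $r_0$ rather than by its aperture, which is the gap that adaptive optics aims to close.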

**Figure 1.2** Beam spread due to atmospheric turbulence limits the resolution for an aperture of diameter D

For adaptive optics to work, the aberrations caused by the turbulence have to be measured faster than they can change. The required rate is characterised by the Greenwood frequency $f_0$, which is strongly dependent on the wind velocity and can range from tens to hundreds of hertz under fair viewing conditions [Tyson 2000]. Another important factor to consider in the design of atmospheric adaptive optics systems is the isoplanatic angle $\theta_0$, which determines the maximum angle by which we can look away from our object point and still measure the correct wavefront [Tyson 2000]. Because the isoplanatic patch for the atmosphere is so small, only a tiny fraction of the sky will be near suitably bright stars that can serve as reference beacons. A way of overcoming this is to produce artificial guide stars using powerful lasers to illuminate the sky. Two types of artificial guide star exist. One, using Rayleigh scattering of ultraviolet or visible light, illuminates the sky at a height of 5 to 15 kilometres in the atmosphere. The other uses resonant scattering of light from a layer of sodium atoms that sits in the upper mesosphere at an altitude of about 90 to 100 kilometres. The second scheme has the advantage of putting the reference beacon higher, thus sampling a larger portion of the path of light from a celestial object in space to a telescope on Earth [Olivier 1999]. The disadvantage is that it is more expensive and requires a laser at a specific wavelength of 589 nm to excite the sodium atoms. An emerging technique called Multi-Conjugate Adaptive Optics (MCAO), which uses several guide stars and wavefront sensors, allows the field of view to be extended and could overcome the disadvantage of having to use artificial guide stars [Berkefeld 2001].

Besides atmospheric imaging, underwater imaging and fluid mechanics [Neal 1993] will also benefit from the field of adaptive optics. Just as the advancement of lasers, imaging devices and optical materials has pushed the frontiers of adaptive optics for astronomy, the theories and techniques developed for the correction of atmospheric turbulence are directly applicable to non-astronomical applications, enabling their rapid development.

1.2.2 OPHTHALMOLOGY

Imperfections in the cornea and the eye lead to refractive errors which cause image blurring. This gives rise to long- and short-sightedness, which need correction with glasses or contact lenses. It is now possible to perform these corrections through eye surgery. Laser-Assisted In-Situ Keratomileusis, or LASIK as it is commonly known, is the procedure of reshaping the cornea with a laser beam to correct for these errors. Typically LASIK corrects for low-order aberrations, and in the course of reshaping the cornea to correct these, refractive surgery can inadvertently increase higher-order aberrations. A wavefront sensor can be used to measure these higher-order aberrations and to give doctors a more detailed and quantitative view of the topography of the cornea before it is operated upon. The first commercial ophthalmic Shack-Hartmann aberrometer, the Complete Ophthalmic Analysis System (COAS), manufactured by WaveFront Sciences, Inc., became available in early 2000 and incorporates a CCD-based Shack-Hartmann wavefront sensor [Salmon]. The human eye is a non-static optical system and the corrections need to be done at a bandwidth of at least several hundred Hz [Nirmaier 2003]. Real-time wavefront correction in the human eye will also allow better diagnosis of eye diseases such as the common glaucoma and will allow the development of the next generation of customised wavefront-guided contact lenses [Thibos 2003].

1.2.3 BEAM QUALITY CONTROL

The beam quality and output power of lasers can be degraded by optical aberrations within the laser resonator [Kudryashov 2002]. Adaptive optics allows the correction of these aberrations using either intracavity or extracavity control of the beam. Intracavity control involves using an adaptive mirror as one of the end mirrors of the laser resonator, as shown in Figure 1.4.

Intracavity control is able to influence the geometry of the output modes and stabilise the output energy. The output parameters of the beam can also be changed without reconstructing the entire cavity or altering the power supply block, which is costly and time consuming. Intracavity beam control will also aid in the generation of beams with a super-Gaussian distribution [Cherezova 1997], which has lower side lobe intensities than a typical Gaussian beam and consequently a reduction in higher spatial frequencies and a higher intensity profile. This is very attractive for industrial applications.

For lower orders of aberration, extracavity control is easier to implement. Extracavity correction involves performing the correction outside the cavity of the resonator. It allows beams to be accurately focused on a sample and beam quality to be maintained over long distances. For instance, extracavity control will be used on the Laser Interferometer Gravitational-Wave Observatory (LIGO) system for the detection of gravitational waves [Mansell 1999]. Gravitational waves are produced by events such as collapses, explosions or collisions of celestial objects, and their observation will allow a better view of the universe and its beginnings. They are less attenuated than electromagnetic waves such as radio waves, but the predicted magnitudes of such waves are extremely small. Very sensitive means of detection are therefore necessary, and typically laser interferometry with large, kilometre-sized arms is used. The beam quality and its coherence must be maintained over the length of the arms, making adaptive optics necessary.

Another field that has received a lot of attention lately is free-space optical communications, which will allow high-speed transmission of large volumes of data, of the order of gigabits, without the need for cables [Weyrauch 2002]. The use of highly collimated laser beams will ensure the security of the communication. Air flow and temperature gradients at ground level degrade the quality of the communication, which can be improved with the use of some form of wavefront correction. However, limitations such as scintillation, weather, the need for line-of-sight and sun-blindness need to be addressed. In free-space optoelectronic interconnects, a key challenge is maintaining precise alignment of the opto-mechanical system, which requires tight tolerances on the optical components and opto-mechanics. Correcting any misalignment dynamically using adaptive optics will help relax the specifications and tolerance requirements of the opto-mechanical system and improve the cost/performance trade-off [Gourlay 2000].

In laser fusion, pulse shaping and precision focusing of the high-energy lasers involved will ensure the quality of the laser pulse as it goes through the amplification process, and will allow safe testing of nuclear devices as well as aid fusion energy research [Metrologic Instruments Inc.]. Industrial applications of laser beam control include laser welding and cutting [Haferkamp 1993]. With deformable mirrors, the piercing time in pulse piercing can be reduced and the thickness of material that can be laser cut with high quality can be increased. By maintaining the focus of the laser beam, adaptive optics has been used to laser cut mild steel up to 16 mm thick without the quality of the cut surface decreasing as the thickness increased [Geiger 1996]. Commercially, adaptive optics can also be applied to optical data storage, such as in CD drives.

1.2.4 MICROSCOPY

In microscopy, an adaptive optical system can aid in the sensing and correction of aberrations due to imperfections and misalignment in components and to the mismatch of refractive indices between the medium and the sample to be observed [Booth 2002a, Booth 2002b]. For instance, in a confocal microscope a pinhole is used to block out light from the specimen that is not within the focal plane. This allows strong rejection of multiply scattered light and gives significant improvements in resolution over conventional microscopes [Diaspro 2001]. Its principle is illustrated in Figure 1.5. By scanning the specimen, a full 3D image of the specimen can be built up. However, even small amounts of spherical aberration are enough to produce considerable degradation of the imaging performance in the depth direction. Also, confocal microscopes are often operated in reflection because aberrations caused by the refractive index structures within the specimen make imaging in transmission difficult. This results in a loss of the phase information that is only available in transmission. The use of an adaptive optical system would overcome this and allow compensation of the aberrations introduced by the specimen as well as any misalignment of optical components in the microscope [O'Bryne 1999, Sheppard 1991].

Figure 1.5 Principle of the confocal microscope

In multiphoton fluorescence microscopy, a point source is scanned through the sample volume and the resulting fluorescence is imaged. The localised excitation provides high spatial resolution, efficient background rejection, reduced photobleaching and increased penetration depth in specimens compared to conventional microscopes. It allows the elimination of the confocal aperture and hence does not limit the number of photons detected. However, specimen-induced aberration again reduces the achievable resolution and increases the laser power needed to achieve imaging. Aberration correction using feedback will allow the imaging depth to be extended and increase the efficiency of the system [Marsh 2003].

1.3 WAVEFRONT SENSING

As mentioned previously, an integral part of an adaptive optics system is the wavefront sensor which quantitatively measures the amount of aberration present in the wavefront. Wavefront sensing can be either modal or zonal [Tyson 1998]. In modal sensing the wavefront is expressed in terms of coefficients of the modes of a polynomial expansion each representing one of the known aberrations (e.g. tip, tilt, defocus, astigmatism, coma etc.), whose magnitudes are measured separately. Current modal sensors can only sense low-order aberrations. In zonal sensing the wavefront is divided into a number of zones, and the slope or the curvature of the local wavefront is measured in each zone. The Shack-Hartmann wavefront sensor is one such sensor.

1.3.1 SHACK-HARTMANN WAVEFRONT SENSOR

A Shack-Hartmann wavefront sensor uses an array of microlenses\(^1\) to sample the optical wavefront as shown in Figure 1.6. If the incident beam had a flat wavefront, the light falling on each lenslet would be focused at the centre of each tilt sensor. If instead the wavefront is not flat but distorted, the spots obtained by the lenslets will deviate from the centre and by measuring this deviation, the local wavefront tilts are obtained. To remove alignment errors sometimes a reference plane wave beam is used and the deviation is then measured from the reference positions obtained [Tyson 1998].

\(^1\) The Shack-Hartmann wavefront sensor is an improvement over the basic Hartmann test which uses an array of hard apertures instead of the lenslet array. The Shack-Hartmann samples the entire wavefront and has the advantage of better photon efficiency. The disadvantage is in the cost of the microlenses and the difficulty in the optical alignment.

Figure 1.6 Shack-Hartmann wavefront sensor (labels in figure: turbulence, lenslet array, array of tilt sensors measuring displacement of spots)

Traditional CCD systems for Shack-Hartmann wavefront sensing use the CCD to sample the entire wavefront, and the entire array of spots needs to be read out before it is processed, leading to a data bottleneck. This bottleneck is illustrated in Figure 1.7 in comparison with our proposed system, where each local wavefront tilt is measured by a local tilt sensor with its own detector array and local centroid processing. The parallel readout and processing of the raw data into reduced-bandwidth centroid data will allow faster frame rates to be achieved. In addition, the array of tilt sensors can be linked to a matrix processor to reconstruct the estimate of the complete wavefront. Once calculated, the reduced-bandwidth wavefront data can then be transferred off-chip. Hence, as a result of parallel processing, the data rate is independent of the number of tilt sensors employed.

Figure 1.7 Integration of on-chip centroid processing to remove data bottleneck
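
The bandwidth argument can be made concrete with a back-of-envelope comparison. The sketch below is illustrative only: the frame size, bit depths, frame rate and number of tilt sensors are assumed values, not parameters of the system described in this work, and the function names are hypothetical. It compares the off-chip data rate of reading out every pixel of a frame with that of sending only an x and a y centroid coordinate per tilt sensor.

```python
def full_frame_rate_bps(width_px, height_px, bits_per_pixel, frame_rate_hz):
    """Off-chip data rate when every pixel of every frame is read out."""
    return width_px * height_px * bits_per_pixel * frame_rate_hz


def centroid_rate_bps(num_tilt_sensors, bits_per_coordinate, frame_rate_hz):
    """Off-chip data rate when each tilt sensor sends one (x, y) centroid."""
    return num_tilt_sensors * 2 * bits_per_coordinate * frame_rate_hz


if __name__ == "__main__":
    # Assumed example: 512 x 512 pixel imager, 8-bit pixels, 50 frames per second.
    raw = full_frame_rate_bps(512, 512, 8, 50)
    # Assumed example: 16 x 16 tilt sensors, 8 bits per centroid coordinate.
    reduced = centroid_rate_bps(16 * 16, 8, 50)
    print(raw, reduced, raw / reduced)
```

For these assumed numbers the centroid-only stream is several hundred times smaller than the raw frame at the same frame rate, which is the headroom that local processing can convert into a higher frame rate.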

Assuming that at each tiny local portion of the wavefront the only aberration is the tilt, the local wavefront tilt can be linearly related to the displacement of the centroid position from its centre or reference position, as illustrated in Figure 1.8 and given by:

$$\text{Tilt} = \frac{dW}{dx} = \frac{\Delta x}{f} \tag{1.1}$$

where $\Delta x$ is the displacement of the centroid, $f$ is the distance of the subaperture from the focal or measurement plane, and $f \gg$ the maximum $dW$ over the entire subaperture. From these local wavefront tilts, the entire wavefront can be reconstructed and this will be covered further in Chapter 5.
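
Equation (1.1) translates directly into code. The following minimal sketch converts measured centroid positions into local wavefront slopes relative to reference positions. The function names and the numerical values in the usage example (a 10 mm lenslet-to-detector distance and displacements of a few micrometres) are hypothetical and chosen only for illustration.

```python
def local_tilt(displacement_m, focal_length_m):
    """Local wavefront slope dW/dx = displacement / f (dimensionless)."""
    if focal_length_m <= 0:
        raise ValueError("focal length must be positive")
    return displacement_m / focal_length_m


def tilts_from_centroids(centroids, references, focal_length_m):
    """Return (x-tilt, y-tilt) per subaperture.

    centroids and references are equal-length lists of (x, y) positions in
    metres; the displacement is measured from each reference position.
    """
    if len(centroids) != len(references):
        raise ValueError("centroids and references must have the same length")
    return [
        (local_tilt(cx - rx, focal_length_m), local_tilt(cy - ry, focal_length_m))
        for (cx, cy), (rx, ry) in zip(centroids, references)
    ]


if __name__ == "__main__":
    # Assumed example: one spot displaced by (+10 um, -5 um) from its reference.
    print(tilts_from_centroids([(12e-6, -5e-6)], [(2e-6, 0.0)], 10e-3))
```

Measuring each displacement from a reference position rather than from the geometric centre mirrors the use of a reference plane wave to remove alignment errors.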

Figure 1.8 Relationship between local wavefront tilt and displacement of the centroid (for a single lenslet of Figure 1.6)

The size of the subaperture required for correct measurement of the wavefront is given by the distance over which the subaperture can pass a coherent beam, i.e. over which the optical phase distortion is highly correlated. In the case of atmospheric optics, this is given by Fried's coherence length, $r_0$, which varies with wavelength, $\lambda$, as $\lambda^{6/5}$, and as such astronomical adaptive optics is usually performed in the infrared. Another factor to consider is the number of degrees of freedom required, that is, the number of actuators in the wavefront corrector, and this is closely related to the number of subapertures required. There should be roughly one actuator corresponding to each patch of sky equal in size to Fried's coherence length [Mansell 2000], so the number of subapertures required, $N$, will be:

$$N = \left( \frac{D}{r_0} \right)^2$$

where $D$ is the size of the entire pupil or wavefront. Because $r_0$ increases with wavelength, $N$ falls as the wavelength increases; hence the longer the wavelength, the lower the complexity.
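
To see how the choice of wavelength affects the number of subapertures, the sketch below scales the coherence length with wavelength and evaluates $N = (D/r_0)^2$. The scaling exponent is left as a parameter; its default of 6/5 is the standard Kolmogorov-turbulence result and is an assumption of this sketch. The 4 m pupil, the 10 cm coherence length at 500 nm, the 2.2 µm infrared wavelength and the function names are likewise illustrative assumptions.

```python
def scale_r0(r0_ref_m, wavelength_ref_m, wavelength_m, exponent=6.0 / 5.0):
    """Scale a coherence length measured at one wavelength to another."""
    return r0_ref_m * (wavelength_m / wavelength_ref_m) ** exponent


def num_subapertures(pupil_diameter_m, r0_m):
    """Approximate number of subapertures, N = (D / r0)^2."""
    if r0_m <= 0:
        raise ValueError("coherence length must be positive")
    return (pupil_diameter_m / r0_m) ** 2


if __name__ == "__main__":
    pupil = 4.0              # assumed pupil diameter (m)
    r0_visible = 0.10        # assumed coherence length at 500 nm (m)
    r0_infrared = scale_r0(r0_visible, 500e-9, 2.2e-6)
    print(num_subapertures(pupil, r0_visible))
    print(num_subapertures(pupil, r0_infrared))
```

Because the coherence length grows with wavelength, the infrared case needs far fewer subapertures, and therefore fewer actuators, than the visible case for the same pupil.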

The Shack-Hartmann wavefront sensor is simple to construct, robust with no moving parts and compact, and is by far the most common and established wavefront sensor. It offers high accuracy, reproducibility and a wide dynamic range [de Lima Monteiro 2002]. The work done in this thesis focuses on the use of a Shack-Hartmann wavefront sensor because of the high level of integration possible, but it is by no means the only option open to designers of adaptive optics systems. The following section briefly describes the other wavefront sensing techniques available and why these are less suitable for the purpose of this work.

1.3.2 OTHER WAVEFRONT SENSORS

The choice of wavefront sensor is very much dependent on the application. Other common wavefront sensing techniques include interferometers, phase diversity, curvature wavefront sensors and the relatively new pyramid wavefront sensors. Interferometric methods include the lateral shear interferometer, which measures the wavefront slope or the first derivative of the phase, and the point diffraction interferometer, which measures the phase of the wavefront directly [Tyson 1998]. The lateral shear interferometer works by splitting the beam, introducing a lateral shear on one arm and measuring the interference between the two beams. The point diffraction interferometer also generates its own reference, but does this by capturing a small part of the beam and expanding it as a plane wave reference. In general, interferometric methods of wavefront sensing require monochromatic, highly coherent sources, making them unsuitable for certain applications such as astronomical imaging. They are also vibration sensitive and expensive, and wavefront extraction is complicated, so real-time analysis is difficult. Unlike the Shack-Hartmann, they suffer from ambiguity for phases exceeding $2\pi$ and cannot be used with pulsed sources. However, the point diffraction interferometer, for example, performs better than the Shack-Hartmann wavefront sensor in strong scintillation, where phase discontinuities make the use of linear reconstruction difficult.

Another technique called phase diversity retrieves the phase from the analysis of two simultaneous images, one in-focus and the other defocused [Jefferies 2002]. This method has the advantage of not having any particular requirement on the optical beam and can be used with greatly extended sources. But the algorithm is non-linear and hence slow so it is often used as a post-processing technique for measuring aberrations and deblurring images.

The curvature wavefront sensor works by measuring the irradiances at two planes at the same distance from, but on opposite sides of, the focal point [Roddier 1998b]. By solving the irradiance transport equation that relates the irradiances on the two planes, the curvature of the wavefront can be obtained. Curvature sensors have the advantage of being cheaper and more sensitive than the Shack-Hartmann. However, the equation is non-linear and its solution is not trivial [de Lima Monteiro 2002]. They are also difficult to implement for systems that require a large number of degrees of freedom, such as highly segmented telescopes [Jefferies 2002], and are only suited to low-order systems. On highly segmented mirrors they could still be used for the tip/tilt alignment or the alignment of the primary mirror segment. In confocal microscopy, curvature sensing does not work well due to strong diffraction effects.

Pyramid wavefront sensors work by focusing the wavefront onto the central vertex of a glass pyramid, which splits the beam into four parts, with the four edges acting like four Foucault knife-edge tests; the resulting images contain information about the aberration present in the wavefront. Pyramid wavefront sensors offer higher sensitivity than Shack-Hartmann wavefront sensors and also allow variable gain, which makes them useful in wide-field adaptive optics. However, the fabrication of the pyramids is no simple matter. The quality of the edges between the faces of the pyramid and the size of the roof at its apex are critical [Canadian VLOT Working Group 2003]. Manufacturing single pyramid structures using the classical figuring and polishing technique is a time-consuming process, and the production of a large number of identical pyramids is still being developed.

A new development, the hybrid curvature and gradient sensor, enables one to obtain information on the local curvature as well as the local wavefront tilts or gradients while maintaining the simplicity of the Shack-Hartmann wavefront sensor [Paterson 2000]. The sensor uses quad cells placed at the foci of an array of astigmatic lenslets, and the curvature signal is obtained from the difference of the pair of diagonal elements of the quad cell. Experimental results for this design have yet to be published.

Several factors make the Shack-Hartmann wavefront sensor the choice for an integrated wavefront sensor not least of which is that it requires only simple processing in finding the spot positions which can easily be integrated at the sensor level to reduce the amount of data to be sent off chip. Lower resolution imagers can be used in finding the centroid position, instead of obtaining complicated fringe data in interferometric methods for example. The linear relationship between the spot displacement and the local wavefront tilt also means a simple linear reconstruction technique can be used. This translates to fast real-time correction of wavefront aberrations. Integration could also lead to a reduction in size and costs in many applications.
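
The spot-position processing referred to above is commonly a first-moment (centre-of-mass) calculation over the pixel intensities. The sketch below shows that generic calculation in plain Python. It is an illustrative assumption: it is not claimed to be the exact algorithm or hardware implementation used in this work, and the function name and the small test images are hypothetical.

```python
def centroid(image):
    """Centre of mass of a 2-D intensity image.

    image is a list of rows of non-negative intensities. The result is an
    (x, y) pair in pixel units, where x indexes columns and y indexes rows,
    with the origin at the centre of the first pixel of the first row.
    """
    total = 0.0
    sum_x = 0.0
    sum_y = 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            total += value
            sum_x += x * value   # intensity-weighted column index
            sum_y += y * value   # intensity-weighted row index
    if total == 0:
        raise ValueError("no light detected; centroid is undefined")
    return sum_x / total, sum_y / total


if __name__ == "__main__":
    # Assumed 2 x 2 example: more light in the bottom-right pixel.
    spot = [[0, 0],
            [1, 3]]
    print(centroid(spot))
```

Only sums, products and a single division are needed, which is why a low-resolution detector with modest local circuitry can suffice for this step.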

1.4 CENTROID DETECTION

The fundamental process performed in a Shack-Hartmann wavefront sensor is the detection of the optical centroids. Optical position-sensitive detectors (PSDs) detect the centroid position of a light spot projected on their surface and can be divided into two broad categories namely lateral-effect PSDs and multi-element PSDs [Sharman 2002]. Besides adaptive optics, optical position sensing has numerous commercial, industrial and laboratory applications. In the manufacturing process position-sensitive devices are used to characterize lasers, align optical systems, and calibrate and analyze machinery. PSDs are also used as triangulating sensors in various domestic appliances for switching the appliances on and off by detecting the presence of a body. They are also used in the feeding of paper in fax machines and printers and in the reading of disc tracks in CD players.
1.4.1 LATERAL EFFECT PHOTODIODES (LEP)

LEPs, as shown in Figure 1.9 (c), consist of a single resistive sheet formed by a p-n junction. The photogenerated charge carriers in the silicon move towards the appropriate electrode where the photocurrent at each electrode is inversely proportional to the distance between that electrode and the centroid of the incident light beam. Lateral effect PSDs are usually operated under reverse bias. Different geometries and positioning of the electrodes in lateral effect PSDs will give rise to tradeoffs in terms of linearity, sensitivity and resolution [Wang 1989].

A lateral effect PSD requires large uniform sheet resistance for linear operation, which is not readily available in a standard CMOS process. This makes integration with circuitry difficult [de Lima Monteiro 2002], so the LEP is unsuitable for the aims of this work. However, the performance of the LEP shall be compared with other PSD structures in Section 1.4.3.4.

1.4.2 MULTI-ELEMENT PSD

Multi-element PSDs consist of separate active areas. The simplest two-dimensional multi-element structure would be the quad cell, shown in Figure 1.9 (a). Larger structures are termed multi-pixel arrays. Like LEPs, quad cells have simple readout schemes. The position of the incident spot is determined by the comparison of the signals from the four quadrants as illustrated in Figure 1.9(a) and described below:

\[
x = \frac{[(B+D) - (A+C)]}{[A+B+C+D]}
\]

\[
y = \frac{[(A+B) - (C+D)]}{[A+B+C+D]}
\]
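The two quad-cell expressions above can be sketched in a few lines of Python. This is only an illustration: the function name `quad_cell_position` is hypothetical, and the quadrant layout (A top-left, B top-right, C bottom-left, D bottom-right) is an assumption inferred from the signs in the equations rather than taken from Figure 1.9.

```python
def quad_cell_position(a, b, c, d):
    """Normalised spot position from the four quadrant signals.

    Assumed layout (inferred from the signs in the equations):
        A | B
        --+--
        C | D
    """
    total = a + b + c + d
    if total <= 0:
        raise ValueError("no light detected; position is undefined")
    x = ((b + d) - (a + c)) / total   # right-hand pair minus left-hand pair
    y = ((a + b) - (c + d)) / total   # upper pair minus lower pair
    return x, y


# Equal illumination of all four quadrants gives the null position.
print(quad_cell_position(1.0, 1.0, 1.0, 1.0))
# All of the light on the right-hand quadrants drives x to its limit of +1.
print(quad_cell_position(0.0, 1.0, 0.0, 1.0))
```

Dividing by the total signal is what makes the result independent of the overall light level; the output is bounded between -1 and +1 on each axis.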

Figure 1.9 Different position sensitive detector (PSD) structures
For multi-pixel arrays, the position of the spot can be found either by simply finding the maximum signal in the array, which is termed binary position sensing [Makynen 1998], or by finding the normalized first order moment of the signals of all the pixels in the array [Horn 1986], which is given by:

\[
C(x) = \frac{\sum_n r_{xn} I_n}{\sum_n I_n}; \quad C(y) = \frac{\sum_n r_{yn} I_n}{\sum_n I_n}
\]

(1.4)

where \( r_{xn} \) is the displacement in the x-direction of pixel \( n \)
\( r_{yn} \) is the displacement in the y-direction of pixel \( n \)
\( I_n \) is the light (photocurrent) level of pixel \( n \)

This essentially finds the weighted average of the different elements. Finding the weighted average offers the advantage of subpixel accuracy at the expense of more complicated processing. Higher order moments can also be found. The second order moment for example can be used to give the axis of least inertia or orientation of the imaged object [Standley 1991]. In the field of computer vision, the centroid and higher order moments are often used for character and object recognition [Cash 1987, Dudani 1977, Low 1998] as well as image compression [Karadimitiou 1998].
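As a minimal sketch of the first-moment calculation in equation (1.4), the Python function below computes the centroid of a small intensity array. The function name `centroid`, the `pitch` parameter and the choice of origin (the centre of pixel (0, 0)) are assumptions made for illustration only.

```python
def centroid(image, pitch=1.0):
    """First-order moment of a 2-D list of pixel intensities, eq. (1.4).

    image[row][col] holds I_n. The displacement of each pixel is taken as
    its column (x) or row (y) index times `pitch`; placing the origin at
    pixel (0, 0) is an assumption of this sketch.
    """
    total = sum(sum(row) for row in image)
    if total <= 0:
        raise ValueError("no signal; centroid is undefined")
    cx = sum(col * pitch * value
             for row in image for col, value in enumerate(row)) / total
    cy = sum(r * pitch * value
             for r, row in enumerate(image) for value in row) / total
    return cx, cy


# A spot straddling columns 1 and 2, brighter in column 2.
spot = [
    [0, 0, 0, 0],
    [0, 1, 3, 0],
    [0, 1, 3, 0],
    [0, 0, 0, 0],
]
print(centroid(spot))  # (1.75, 1.5): x falls between pixels, i.e. subpixel
```

The x result lies between two pixel indices, which illustrates the subpixel accuracy mentioned above; simply picking the brightest pixel would return column 2.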

Other methods for computing a centroid from multi-pixel arrays also exist, such as the median-sum method used by students at Johns Hopkins University [Dickinson 2003] for tracking objects, which was motivated by the Robocup competition in which robots are built to play soccer. In this method, the row and column currents are summed and the median of these currents represents the centroid. This technique has the advantage of not requiring complex mathematical processing but is only accurate when a large number of pixels is used. It also does not provide subpixel accuracy.
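One possible reading of the median-sum idea is sketched below in Python. It is an interpretation, not the circuit of [Dickinson 2003]: the "median" is assumed here to be the index at which the running sum of the summed row (or column) currents first reaches half of the total. All function names are hypothetical.

```python
def median_index(profile):
    """Index at which the running sum first reaches half of the total."""
    total = sum(profile)
    if total <= 0:
        raise ValueError("no signal; median is undefined")
    running = 0
    for index, value in enumerate(profile):
        running += value
        if 2 * running >= total:
            return index


def median_sum_position(image):
    """(column, row) position from the medians of the summed currents."""
    column_sums = [sum(col) for col in zip(*image)]   # projection onto x
    row_sums = [sum(row) for row in image]            # projection onto y
    return median_index(column_sums), median_index(row_sums)


spot = [
    [0, 0, 0, 0],
    [0, 1, 3, 0],
    [0, 1, 3, 0],
    [0, 0, 0, 0],
]
print(median_sum_position(spot))  # (2, 1): whole-pixel indices only
```

Only comparisons and additions are needed, and the result is always a whole pixel index, which is consistent with the lack of subpixel accuracy noted above.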

Another technique for determining the centroid of an object is to fit a suitably defined point spread function (PSF) to a series of images [Fosu 2004], for example a Gaussian function for stellar images. This method can only be used when the image is spread over more than four pixels but is said to give better accuracy than the moment analysis method. However, it is computationally intensive and complex, making integration and real-time operation difficult.
1.4.3 MULTI-ELEMENT PSD PERFORMANCE

Pixelated position sensitive devices are typically evaluated in terms of linearity, positional sensitivity and positional range. These are affected by the detector size\(^2\), the cell density (i.e. the number of cells for a given detector size), the gap between the cells and the intensity profile of the spot. Consider a uniform circular beam incident on a bi-cell, which is a 2-cell device that measures position in one dimension. The results of sweeping beams of varying sizes across the cells are simulated and shown in Figure 1.10. This case is then extended to a 4-cell linear array and the results are shown in Figure 1.11. Note that for the simulations, truncation of the beam in the vertical direction is ignored; that is, the cells are taken to be infinitely tall and the problem is limited to one dimension. These results shall be discussed in terms of spot size, cell density and beam intensity profile.

\[
\text{Shaded area} = r^2 \cos^{-1}\left(\frac{x}{r}\right) - x\sqrt{r^2 - x^2}
\]

where \( r \) is the radius of the beam and \( x \) is the lateral displacement.
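The bi-cell sweep can be approximated with a short Python sketch built on the standard circular-segment area, r² arccos(x/r) − x√(r² − x²). It is not the simulation used to produce Figure 1.10: it assumes a zero-width gap between the two cells and cells wide enough that the beam is never truncated at the outer edges of the detector. The function names are hypothetical.

```python
import math


def segment_area(r, x):
    """Area of a circle of radius r lying beyond a chord at signed
    distance x from the centre (-r <= x <= r)."""
    return r * r * math.acos(x / r) - x * math.sqrt(r * r - x * x)


def bicell_response(r, x):
    """Normalised difference signal (right - left) / (right + left) for a
    uniform circular beam whose centre is displaced by x from the boundary.

    Assumptions of this sketch: zero gap width, and no truncation of the
    beam at the outer edges of the two cells.
    """
    x = max(-r, min(r, x))        # beyond +/- r the beam sits on one cell only
    left = segment_area(r, x)     # part of the beam left of the boundary
    right = math.pi * r * r - left
    return (right - left) / (math.pi * r * r)


for x in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(f"x = {x:+.1f}  response = {bicell_response(1.0, x):+.3f}")
```

For the same displacement a smaller beam gives a larger differential signal, and once the beam lies wholly on one cell the response saturates at ±1, which corresponds to the loss of tracking discussed in the following section.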

Figure 1.10 Response of a bi-cell PSD for spot sizes of different radius, \( r \)

\(^2\) Detector size is the size of the entire array whereas the cell size is the size of a single element or pixel in the array.
1.4.3.1 Spot size

From Figures 1.10 and 1.11, we can see that when a spot is smaller than a cell or pixel and moves completely into one cell, tracking is lost, which results in a step-like response [Sharman 2002]. While tracking is still achieved, non-linearity for spot sizes smaller than the detector size is due to the circular nature of the beam. Non-linearity for spot sizes larger than the detector is due to the truncation of the beam as it moves off the array. Maximum linearity and positional range are obtained when the spot size is the size of the entire detector, as shown in Figure 1.10 for \( r = 1 \). However, the spot size is usually made smaller for two reasons [de Lima Monteiro 2002]. Firstly, for large displacements, the beam may impinge on neighbouring cells, leading to optical crosstalk. Secondly, the positional resolution or positional sensitivity is higher for smaller spot sizes because, for a given displacement, a small spot produces a much bigger differential signal.
1.4.3.2 Cell density

The larger the cell density the better the linearity [de Lima Monteiro 2002]. This can be seen from the differential and double differential of the PSD response of a bi-cell and 4-cell linear array in Figure 1.12 (b) and (c). The downside is that the positional sensitivity is poorer as indicated by the slope of the PSD responses. Also larger cell density means more complicated processing and longer processing time. As we have seen, positional sensitivity can be improved by making the spot size smaller. There is a trade-off between linearity and positional sensitivity. Multi-pixel arrays are able to deal better with smaller spot sizes, and likewise for a given spot size of a few pixels, the larger the array the larger the positional range achievable.

Figure 1.12 Comparison of a bi-cell and a 4-cell linear array PSD response
1.4.3.3 Intensity profile

The effect of the beam shape and intensity profile also needs to be considered. The response of a quad cell is only linear over the whole range for a rectangular or square beam. With a circular beam, linearity is only achieved over the central region of the quad cell. The situation is even worse for laser beams, which have a Gaussian profile [de Lima Monteiro 2002]. For a Gaussian beam, maximum linearity is obtained not with a beam the size of the quad cell but with a smaller one, owing to the infinite extent of a Gaussian beam. With a Gaussian beam incident on a multi-pixel array, a spot size of about 1 to 2 pixels would typically be suitable for maximum linearity, sensitivity and positional range.

1.4.3.4 PSD comparisons

Because of their higher positional sensitivity but lower linearity and positional range compared to LEPs, quad cells tend to be used more as centring devices than as linear position sensors, where LEPs are more dominant [Mäkynen 2000]. As a custom device, the LEP offers fine resolution over a large positional range, as there are no gaps and no loss of tracking when the beam lies within a single detector segment, as can happen with multi-element PSDs. On the other hand, quad cells have lower noise and a faster response than LEPs. A particular disadvantage of LEPs is that they do not cope well with stray or background light, whereas discrete detectors are able to remove this to some extent by applying a threshold.

Quad cells have simple readout schemes but are not very linear. They are designed primarily for measuring small deviations because the incident beam must impinge simultaneously on all four sectors of the detector [On-Trak Photonics]. Multi-pixel arrays have better linearity and positional range at the expense of processing time and positional sensitivity. They also offer greater flexibility and are able to deal with multiple spots and non-uniform intensity profiles. Quad cells require the beam to be defocused in order to achieve sufficient linearity, making them susceptible to illumination fluctuations [Mäkynen 2000]; that is, smaller spot sizes deal better with scintillations due to atmospheric turbulence. The relative performance of the different PSD structures can be summarised as in Figure 1.13.

Figure 1.13 Performance of the different PSD structures: (a) quad cell, (b) multi-pixel array, (c) lateral effect photodiode (LEP). Linearity and positional range increase, and positional sensitivity decreases, from (a) to (c).

With any multi-element detector, the issue of crosstalk arises and requires mentioning. There are two possible sources of crosstalk: crosstalk from other elements or cells and crosstalk from the substrate. Crosstalk from outside the array due to diffused carriers actually improves the linearity by increasing the signal at the edges and gives the appearance of larger pixel size at the edges. However, crosstalk from within the array serves to average the centroid value towards the centre, leading to a reduction in positional sensitivity.

1.4.4 CENTROID PROCESSING

In the previous section it was shown how lateral effect photodiodes (LEP), quad cells and multi-pixel arrays are used for the purpose of centroid detection. In this section the processing techniques for computing the centroid from these architectures are presented. Table 1.1 shows a summary of the work done by other groups capable of obtaining optical centroids using standard CMOS or BiCMOS processes. Most LEP systems have the processing performed off-chip because the LEP itself is not usually fabricated on a standard CMOS process due to the high non-linearity obtained. Turner [Turner 1994] demonstrated an LEP in standard CMOS with photocurrents measured externally at a maximum bandwidth of 2.4 kHz. The reported resolution was approximately 0.25 μm but with non-linearity at the edges reaching 40%. Centroid processing for quad cell and multi-pixel array architectures, on the other hand, can be readily integrated on-chip.
<table>
<thead>
<tr>
<th><strong>Author, Year</strong></th>
<th><strong>Process</strong></th>
<th><strong>Architecture</strong></th>
<th><strong>Location</strong></th>
<th><strong>Application</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>Johansson, 2003</td>
<td>0.35µm CMOS</td>
<td>Centre of Gravity</td>
<td>Integrated Vision Products, Sweden</td>
<td>General purpose machine vision sensor</td>
</tr>
<tr>
<td>Nirmaier, 2003</td>
<td>0.35µm CMOS</td>
<td>Winner Take All (WTA) circuitry</td>
<td>Kirchhoff Institute of Physics</td>
<td>Ophthalmology</td>
</tr>
<tr>
<td>Akita, 2002</td>
<td>0.64µm CMOS</td>
<td>Thresholding (point masking to multiple pulse modulation)</td>
<td>Future University, Hakodate, Japan</td>
<td>Robot vision</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th><strong>Author, Year</strong></th>
<th><strong>Bandwidth</strong></th>
<th><strong>Pixel size</strong></th>
<th><strong>Array size</strong></th>
<th><strong>Positional resolution</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>Johansson, 2003</td>
<td>N/A (Bit-serial processor in each column. Multiplication takes 36 cycles of 33MHz clock)</td>
<td>9.5µm (fill factor: 60%)</td>
<td>1536x512 pixels</td>
<td>17µm (1.25 pixel uncertainty, alternate pixels for x and y centroid)</td>
</tr>
<tr>
<td>Nirmaier, 2003</td>
<td>300 Hz (390pW detectable power)</td>
<td>17µm</td>
<td>8x8 array of 21x21 pixels</td>
<td>120µm (1 pixel)</td>
</tr>
<tr>
<td>Akita, 2002</td>
<td>20 kHz</td>
<td>120µm (fill factor: 5.7%)</td>
<td>23x23 pixels</td>
<td>120µm (1 pixel)</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th><strong>Author, Year</strong></th>
<th><strong>Institution</strong></th>
<th><strong>Architecture</strong></th>
<th><strong>Process and specifications</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>Ambundo, 2002</td>
<td>Mixed-Signal-Wireless (MSW), Texas Instruments</td>
<td>Quad cell (on-chip wavefront reconstruction using resistive grid to solve 2nd derivative of phase from adjacent quad cell centroids)</td>
<td>AMI 0.5µm n-well CMOS for translinear amplifier (rest of chip not yet fabricated)</td>
</tr>
<tr>
<td>de Lima Monteiro, 2002</td>
<td>Electronic Instrumentation Laboratory, Delft University of Technology</td>
<td>Quad cells (off chip centroid computation by PC)</td>
<td>1.6µm CMOS process, 8x8 quad cells, 600µm x 600µm, 200 Hz closed-loop correction, 260 Hz wavefront reconstruction, 3.125 kHz sensor readout</td>
</tr>
<tr>
<td>Oike, 2002</td>
<td>University of Tokyo</td>
<td>Thresholding (with logarithmic sensor and correlation circuit for background suppression)</td>
<td>0.5µm CMOS, 64x64 pixels, 40µm (fill factor: 18.05%), 2 kHz</td>
</tr>
<tr>
<td>Droste, 2001</td>
<td>Kirchhoff Institute of Physics</td>
<td>WTA</td>
<td>0.6µm n-well CMOS, 16x16 array of 19x19 pixels, 17.6µm, 500 Hz (1nW detectable power)</td>
</tr>
<tr>
<td>Pain, 2000</td>
<td>NASA’s Jet Propulsion Laboratory, California</td>
<td>Capacitive array (centroid computation for up to 9x9 pixels)</td>
<td>HP 0.5μm CMOS</td>
</tr>
<tr>
<td>Furth, 1998</td>
<td>The Klipsch School of Electrical &amp; Computer Engineering, New Mexico State University</td>
<td>Quad cell (current-to-voltage conversion and quad cell differencing circuits integrated on-chip)</td>
<td>1.2μm CMOS 2x2 (single quad cell) 160μm x 160μm</td>
</tr>
<tr>
<td>Makynen, 1998</td>
<td>University of Oulu, Finland</td>
<td>Thresholding, off-chip centroid calculation of binary image map</td>
<td>1.2μm n-well CMOS 16x16 pixels 50μm (30% fill factor)</td>
</tr>
<tr>
<td>Turner, 1994</td>
<td>Optoelectronic Computing Systems Center, University of Colorado</td>
<td>WTA for coarse resolution, LEP for fine resolution</td>
<td>2μm n-well CMOS 2x1 LEPs 400μm x 180μm</td>
</tr>
<tr>
<td>Deweerth, 1992</td>
<td>160 x 160 pixels</td>
<td>256 x 256 pixels</td>
<td>±0.3% of a 25 x 25 square</td>
</tr>
<tr>
<td>Integrated Vision Products of Massachusetts Institute of Technology, Sweden</td>
<td>2μm CMOS process</td>
<td>5kHz</td>
<td>94μm x 90μm</td>
</tr>
<tr>
<td>Forchheimer, 1992</td>
<td>2μm CMOS process</td>
<td>5kHz</td>
<td>94μm x 90μm</td>
</tr>
<tr>
<td>Standley, 1991</td>
<td>2μm CMOS process</td>
<td>190μm x 190μm</td>
<td>10 x 10 pixels</td>
</tr>
<tr>
<td>Alberta Microelectronic Centre, Canada</td>
<td>3μm p-well CMOS process</td>
<td>190μm x 190μm</td>
<td>190μm x 190μm</td>
</tr>
</tbody>
</table>

Table 1.1 Other research efforts capable of optical centroid detection
1.4.4.1 Quad-Cell Centroid Processing

Processing using quad cells is relatively simple, requiring only a minimum number of signals: two for each axis. De Lima Monteiro [de Lima Monteiro 2002] demonstrated an approach for an integrated Shack-Hartmann wavefront sensor using an array of 8 x 8 quad cells in a 1.6μm CMOS process. The sensor can be read out at a rate of 3.125 kHz, but the current-to-voltage conversion and serial conversion of the analogue voltages into digital format were performed off-chip and the centroid computation was carried out on a PC. The resulting operating frequency of 260Hz was limited by the data acquisition card. Another quad cell centroiding approach, this time in analogue using a 1.2μm CMOS process, by Furth [Furth 1998], integrates the current-to-voltage conversion on-chip using passive and active loads, as well as differencing circuits which compute the difference between the photocurrents in the x and y-directions. The differencing circuits consist of double-differential transconductance amplifiers. Experimental results were not reported. More recently, however, Ambundo and Furth [Ambundo 2002] have incorporated the wavefront reconstruction on-chip: the difference between the centroid currents of neighbouring quad cells gives the second derivative of the phase, and this result is injected into a resistive grid which solves the second derivative to obtain the phase. Normalization allows the centroid computation to be independent of light intensity and was achieved using a modified current amplifier to divide by the sum of the four photocurrents in the quad cell. Currently the system has only been simulated and has yet to be fabricated, and no performance results were shown.

Charge-coupled devices (CCD) are multi-pixel arrays but, when used in a Shack-Hartmann wavefront sensing system, CCDs are typically used as an array of quad cells with guard row and column pixels between them [Thompson 2002]. Due to the serial readout nature of CCDs, the entire wavefront has to be sampled and a whole frame of light spots read out before they can be processed. This results in a data bottleneck. Processing of individual light spot positions at the sensor level would alleviate this problem as only reduced bandwidth data need to be transmitted off-chip. However, circuit integration on CCDs remains difficult (see Section 1.5.5.1).

\(^3\) There are applications where the use of more pixels per subaperture than a quad cell is needed such as in varying seeing conditions. A multi-pixel array can easily be adapted for such circumstances at the expense of reduced signal-to-noise ratio and increased computational load.

1.4.4.2 Multi-Pixel Array Centroid Processing

Processing using quad cells offers limited displacement range and requires careful alignment of the null point of the system [Tyson 1998], as large offsets from the null point will reduce the dynamic range of the system and lead to significant non-linearity [Dillon 1999]. Using a multi-pixel array will allow the system to cope better with varying aberrations and seeing conditions. Efforts to incorporate centroid computation for multi-pixel arrays at the sensor level can be categorised into two basic approaches, analogue and digital, and several different sub-approaches, as illustrated in Figure 1.14.

Figure 1.14 Different approaches for optical centroid processing using multi-pixel arrays
1.4.4.2.1 Analogue Centroid Processing for Multi-Pixel Arrays

Most multi-pixel array approaches are performed in analogue, using either an analogue current division method capable of subpixel accuracy, or discrete binary position sensing techniques\(^4\). With the analogue current dividing method, photocurrents are divided on a uniform resistive array [Gonnason 1990, Standley 1991] or a linearly varying capacitive array [Pain 2000]. Both effectively compute the first order moment of the array photocurrents. With the uniform resistive array, the photocurrent of a pixel is divided on the line and the difference in output currents at the ends of the resistive line is directly related to the position of the incident light on the array. With a quadratic resistor line, the second order moment can be obtained and used to determine the orientation of the object [Standley 1991]. In addition to the basic resistive line, Deweerth [Deweerth 1992] used a current mirror and differential transistor pairs to establish feedback, allowing the system to continuously respond to changes in spot position. However, non-idealities and mismatch in this additional circuitry caused offsets in the system. With the linear capacitive array, the pixel voltages are sampled onto separate sampling capacitors, the sizes of which are proportional to the integer row and column addresses, hence giving the inner products of the centroid computation of equation (1.4).
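The way a uniform resistive line yields a first-order moment can be illustrated with an idealised Python model. The sketch assumes identical resistors, equally spaced injection nodes and both ends of the line held at the same potential, so that each injected photocurrent splits between the two ends in proportion to its distance from the opposite end. It does not model any of the cited circuits, and the function name is hypothetical.

```python
def resistive_line_position(currents):
    """Position estimate from an ideal uniform resistive line.

    currents[n] is the photocurrent injected at node n of a line with
    len(currents) equally spaced nodes. Both ends are assumed to sit at
    the same potential, so a current injected a fraction f of the way
    along the line delivers f of itself to the right-hand end and
    (1 - f) to the left-hand end.
    Returns (I_right - I_left) / (I_right + I_left), in the range -1..+1.
    """
    segments = len(currents) + 1            # resistors between the two ends
    i_left = i_right = 0.0
    for n, current in enumerate(currents):
        fraction = (n + 1) / segments       # normalised distance from left end
        i_right += current * fraction
        i_left += current * (1.0 - fraction)
    total = i_left + i_right
    if total <= 0:
        raise ValueError("no photocurrent; position is undefined")
    return (i_right - i_left) / total


print(resistive_line_position([0.0, 1.0, 0.0]))   # light on the middle node
print(resistive_line_position([1.0, 3.0, 0.0]))   # light weighted to the left
```

Because the line is linear, the normalised difference of the two end currents equals the intensity-weighted mean of the node positions, which is the first-order moment with the origin at the centre of the line.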

Binary position sensing effectively uses a form of thresholding technique to reject all photocurrent levels below a certain threshold level or below the largest signal level in the array or a collection of pixels. Many variations are possible but two commonly used circuits are the winner-take-all (WTA) circuit [Droste 2002, Nirmaier 2003] and some form of on-pixel comparator [Burns 2003, Makynen 1998]. Figure 1.15 shows the basic form of the WTA circuit and its \(I_D\) vs. \(V_{DS}\) characteristic. A WTA circuit consists of an array of competing cells with each cell consisting of two MOSFETs \(M_S\) and \(M_F\). \(M_S\) senses the input current \(I_i\) while \(M_F\), if activated, draws the output current \(I_o\).

\(^4\) Digital centroid computation in this thesis refers to the computation of the centroid from several bits of digitized pixel values and not from a binary image map such as in the case of binary position sensing.
Because all the $M_S$ are identical and are gate-connected, they have the same $I_D$ vs. $V_{DS}$ characteristic. The one with the highest input current will generate the highest drain potential and hence the highest $V_{GS}$ of all the $M_F$, therefore sinking most of the current source $I_{src}$ and shutting off all other $M_F$. The computation is continuous in time and the winning output encodes the logarithm of its associated input since all the $M_F$ are operating in subthreshold [Lazzaro 1988]. Saturation of the pixel is determined by the saturation of the WTA $M_S$ MOSFETs and positional accuracy is limited to that of a single pixel. With this circuit, a very slow response time (several hundred ms) is obtained due to the large photodiode capacitance seen at the drain of $M_S$. The capacitance seen was reduced by using a regulated cascode configuration. Response time can be improved further by setting the drain of $M_S$ to a defined value at startup and by introducing positive feedback into the WTA. However, enabling feedback reduces the accuracy of position detection due to mismatches. Nirmaier et al. [Nirmaier 2003] introduced an interdigitated topology to the WTA concept by splitting the single WTA circuit into several groups. This has the advantage of increased robustness against defective outputs, reduced sensitivity to mismatch and faster response.

For analogue centroid computation utilizing destructive readout such as in the current division method, or for those utilizing the WTA algorithm, two discrete photodiodes are needed per pixel: one for the x-centroid and one for the y-centroid. This results in lower fill factor and sensitivity and a non-linear spatial response. De Lima Monteiro [de Lima Monteiro 2002] proposed the use of a spiral structure to reduce the non-linearity.

With these architectures, the pixels in each row and column are tied together and the photocurrents along each row and column are summed so only two sets of current division or WTA circuits are needed per array, one for each axis, as illustrated in Figure 1.16. In a CCD this would be equivalent to binning all pixels in the row or column [Dillon 1999].
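The row and column summation followed by a winner-take-all decision on each axis can be mimicked in software as below. This Python sketch models only the ideal outcome (the index of the largest summed photocurrent) and none of the circuit behaviour; it also reads one shared array for both axes, whereas the hardware described above needs a separate photodiode per axis. The function name is hypothetical.

```python
def wta_position(image):
    """Return (column, row) of the largest summed photocurrent on each axis."""
    column_sums = [sum(col) for col in zip(*image)]   # one current per column
    row_sums = [sum(row) for row in image]            # one current per row
    # An ideal winner-take-all stage simply selects the largest input.
    x = max(range(len(column_sums)), key=column_sums.__getitem__)
    y = max(range(len(row_sums)), key=row_sums.__getitem__)
    return x, y


frame = [
    [0, 1, 0],
    [0, 2, 5],
    [0, 1, 0],
]
print(wta_position(frame))  # (2, 1)
```

The answer is always a whole pixel index, which reflects the single-pixel positional accuracy of the WTA approach.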

Figure 1.16 Use of two photodiodes per pixel and the summation of photocurrents along each row and column with analogue centroid computation [Droste 2002]

Standley [Standley 1991] used a uniform grid of resistors to aggregate the photocurrents in both the x and y-dimensions, hence eliminating the need for two photodiodes per pixel. However, this suffers from non-linearity due to the tolerance of on-chip resistors as well as increased power consumption and thermal noise. It also required the use of two resistive lines per axis instead of just one. This technique has limited usage in position sensing because the advantage of increased fill factor and sensitivity from removing the need for a second structure is lost by the need to integrate a resistor at each pixel. However, its use in neural network structures for vision chips is common as interconnectivity between neighbouring pixels is desired.

Makynen [Makynen 1998] used global threshold current comparison per pixel to generate a binary image map and off-chip moment calculation of the binary map to obtain sub-pixel accuracy. Unlike the WTA circuit approach, it is able to deal with multiple beam spots and it does not require two structures per pixel. However, it does not deal well with non-uniform intensity profiles due to its binary representation and the extensive circuitry per pixel leads to low fill factor and sensitivity. With position sensing using on-pixel comparators, it is possible to use a ramp function of the threshold value, to obtain a more accurate centroid estimate as well as deal with non-uniform intensity profiles by obtaining several binary image maps at different threshold levels [Burns 2003]. However, this requires post-processing and several readouts of the array.
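The difference between a single binary map and several maps taken at stepped thresholds can be sketched in Python as follows. Summing the binary maps into a coarse intensity map is an assumption made here for illustration; it is not claimed to be the post-processing used in [Burns 2003] or [Makynen 1998], and all function names are hypothetical.

```python
def binary_map(image, threshold):
    """1 where the pixel signal exceeds the global threshold, else 0."""
    return [[1 if value > threshold else 0 for value in row] for row in image]


def stacked_map(image, thresholds):
    """Sum of the binary maps taken at each threshold level."""
    maps = [binary_map(image, t) for t in thresholds]
    return [[sum(m[r][c] for m in maps) for c in range(len(image[0]))]
            for r in range(len(image))]


def first_moment(image):
    """Centroid (x, y) of a map, or None if every entry is zero."""
    total = sum(sum(row) for row in image)
    if total == 0:
        return None
    cx = sum(c * v for row in image for c, v in enumerate(row)) / total
    cy = sum(r * v for r, row in enumerate(image) for v in row) / total
    return cx, cy


spot = [
    [0, 0, 0, 0],
    [0, 2, 6, 0],
    [0, 2, 6, 0],
    [0, 0, 0, 0],
]
print(first_moment(binary_map(spot, 1)))             # (1.5, 1.5)
print(first_moment(stacked_map(spot, (1, 3, 5))))    # (1.75, 1.5)
```

With a single threshold the brighter column carries no extra weight, so the estimate sits midway between the two lit columns; stacking three maps recovers the intensity weighting for this example, at the cost of three readouts.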

1.4.4.2.2 Digital Centroid Processing for Multi-Pixel Arrays

Analogue centroid computation offers the advantage of high speed and high functional density but suffers from lack of flexibility and imprecision due to mismatches and tight tolerances of components. De Lima Monteiro [de Lima Monteiro 2002] found that there was significant spatial variation in on-chip polysilicon-array resistance, which leads not only to a shift of the zero response but also to a change in the slope of the response curve, as per Figure 1.10. Well structures offer higher sheet resistance but have greater spatial variation and poorer temperature and voltage coefficients. Also, as CMOS technology scales, the advantage of speed and functional density of analogue over digital diminishes.

Digital centroid computation involves the analogue-to-digital conversion of the pixel values into several bits of data and computing a weighted average of the photogenerated signals. A generic 256 x 256 pixel array system with an on-chip image processor has been designed which performs several common image processing algorithms including centroiding at 250 frames/s [Forcheimer 1993, Forcheimer 1992]. Recently an even more advanced and larger array sized programmable image sensor and processor has been developed by the same group [Johansson 2002]. However, in an adaptive optics system such as the Shack-Hartmann wavefront sensor, a large number of tilt sensors are required but the pixel count of each tilt sensor can be minimal. Nonetheless, the work presented by the group is encouraging because it shows that it is possible to integrate complex digital circuits alongside a CMOS image sensor and still achieve low noise.
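The idea of digitising the pixel values and then forming a weighted average can be sketched with integer arithmetic in Python. This is a generic illustration, not the circuit of any of the cited works or of this thesis: the ideal quantiser, the bit widths, the fixed-point output format and the function names are all assumptions.

```python
def quantise(value, full_scale, bits):
    """Ideal ADC: map 0..full_scale onto integer codes 0..2**bits - 1."""
    levels = (1 << bits) - 1
    clipped = min(max(value, 0.0), full_scale)
    return int(clipped / full_scale * levels + 0.5)   # round to nearest code


def digital_centroid_1d(samples, full_scale, bits, frac_bits=4):
    """Weighted average along one axis using integer accumulators only.

    Returns the position as an integer in units of 1 / 2**frac_bits pixel,
    or None if every code is zero.
    """
    codes = [quantise(s, full_scale, bits) for s in samples]
    weighted = sum(n * code for n, code in enumerate(codes))   # sum of n * I_n
    total = sum(codes)                                         # sum of I_n
    if total == 0:
        return None
    # One division per axis, rounded to the nearest fixed-point step.
    return ((weighted << frac_bits) + total // 2) // total


result = digital_centroid_1d([0.0, 0.25, 0.75, 0.0], full_scale=1.0, bits=4)
print(result, result / 16)  # 28 1.75
```

Only two accumulators and a single division are needed per axis, which is why the pixel count of each tilt sensor can be kept small without complicating the arithmetic.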

Another generic structure for image processing is the cellular neural network (CNN) architecture where each cell (pixel) senses a point of the input image and interacts with neighbouring cells to perform parallel-processing tasks on the input image [Roska 1993]. All cells operate in parallel and in continuous time so that high operation speeds are obtained [Dominguez-Castro 1997]. However, due to the locality of the connections, global image processing tasks such as centroid detection require longer processing times, and generic structures in general are not optimised for any particular task.

The approach taken in this work is to integrate dedicated local digital centroid processing at each subaperture to measure the local wavefront tilt. By performing the centroid computation of the subapertures in parallel, the processing speed is maximised and the amount of data to be sent off-chip is reduced. In addition to an increase in speed, a single-chip system will have an advantage of reduced system size, costs and power consumption over multi-chip systems. This work represents the only dedicated digital centroid processor designed and fabricated to date.

\(^5\) However, post-processing of a Shack-Hartmann subaperture image using artificial neural networks is capable of providing a more accurate estimate of the centroid location than with conventional linear estimators (1st moment calculation) [Montera 1996].

\(^6\) There are digital chips that compute the first, second and higher order moments, e.g. [Hatamian 1986], but these do not have on-chip photodetection and are not dedicated centroid processors.
1.5 PHOTODETECTION

When determining the centroid in a given subaperture, the relative light intensities incident on each pixel in the array need to be measured accurately. An understanding of the mechanisms involved in the photogeneration of carriers is therefore needed, and this section examines them.

1.5.1 OPTICAL ABSORPTION

When a photon is incident on a piece of semiconductor, there is a possibility that the photon will be absorbed if its energy is greater than the bandgap energy of the semiconductor. When a photon is absorbed, a bound electron in the valence band is excited to the conduction band where it is free to move randomly or under the influence of an electric field. The excited electron leaves behind a vacancy, or hole, in the valence band, which is also mobile. Hence an electron-hole (e-h) pair is generated. The electron-hole pair will then either recombine, diffuse or get separated by an electric field. Silicon is an indirect bandgap material, so a phonon is required in the optical absorption process, reducing the transition probability and making the process strongly temperature dependent. For crystalline silicon, the bandgap energy, $E_g$, is 1.12 eV, which gives a cut-off wavelength, above which no photons can be absorbed, of $\lambda_c \approx 1.11 \, \mu m$. The optical absorption process can be quantified as follows. The carrier generation rate $g(x)$ at a depth $x$ in the silicon must equal the rate of decrease of the photon flux $\Phi(x)$ with $x$ and at the same time be proportional to $\Phi(x)$ [Bar-Lev 1984], as given by:

$$g(x) = -\frac{d\Phi}{dx} = \alpha(\lambda)\Phi(x)$$

(1.5)

where the proportionality constant $\alpha(\lambda)$ (cm$^{-1}$) is called the absorption coefficient and depends on the material and the wavelength, $\lambda$. The solution of this equation is an exponential decay of photon flux with penetration depth:

\[
\Phi(x) = T \Phi_0 \exp(-\alpha x)
\]  
(1.6)

where \(T\) is the transmission coefficient\(^8\) and \(\Phi_0\) is the photon flux at the surface (\(x=0\)). This then gives a carrier generation rate of:

\[
g(x) = -\frac{d\Phi}{dx} = T \alpha \Phi_0 \exp(-\alpha x)
\]  
(1.7)

Energy of a photon: $E = \frac{hc}{\lambda q}$ eV $= \frac{1.24}{\lambda(\mu m)}$ eV; cut-off wavelength: $\lambda_c = \frac{1.24}{E_g(\text{eV})} \, \mu m$.
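The relations above can be checked numerically. The following is a minimal Python sketch, not part of the original work: the function names are hypothetical, and the absorption coefficient used in the example is an illustrative assumption rather than a measured value for the process used here.

```python
import math


def cutoff_wavelength_um(bandgap_ev):
    """Cut-off wavelength in micrometres, lambda_c = 1.24 / E_g(eV)."""
    return 1.24 / bandgap_ev


def generation_rate(x_cm, alpha_per_cm, flux0, transmittance=1.0):
    """Carrier generation rate g(x) = T * alpha * Phi_0 * exp(-alpha * x), equation (1.7)."""
    return transmittance * alpha_per_cm * flux0 * math.exp(-alpha_per_cm * x_cm)


def absorbed_fraction(depth_cm, alpha_per_cm):
    """Fraction of the transmitted photons absorbed between the surface and depth_cm.

    Follows from equation (1.6): 1 - Phi(depth) / Phi(0) = 1 - exp(-alpha * depth).
    """
    return 1.0 - math.exp(-alpha_per_cm * depth_cm)


if __name__ == "__main__":
    # Bandgap of crystalline silicon as quoted in the text.
    print("cut-off wavelength (um):", cutoff_wavelength_um(1.12))
    # Assumed, illustrative absorption coefficient (cm^-1) and depth (cm).
    print("absorbed fraction:", absorbed_fraction(0.2e-4, 5e4))
```

At a depth of $1/\alpha$ the absorbed fraction is $1 - e^{-1}$, which is why $1/\alpha$ is commonly used as a measure of penetration depth.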

The absorption coefficients of several common semiconductor materials and compound semiconductors are shown in Figure 1.17. For wavelengths exceeding \(\lambda_c\), \(\alpha\) becomes negligible and the material becomes transparent to those wavelengths. For shorter wavelengths, \(\alpha\) becomes very large which means photons of shorter wavelengths get absorbed closer to the surface. The slow increase of \(\alpha\) with photon energy in silicon is due to the fact that Si is an indirect bandgap semiconductor.

---

\(^8\) The transmission coefficient or transmittance is the ratio of the amount of transmitted light to the amount of incident light i.e. the fraction of incident photons on the surface that is not reflected. With antireflection coatings, \(T=1-R \approx 1\), where \(R\) is the reflectance.

---

Figure 1.17 Absorption coefficient, \(\alpha\), for various semiconductor materials at 300K [Kasap 2001]
Figure 1.17 also shows that different semiconductor materials can be used to detect incident radiation over different wavelength regions with silicon having a characteristic wavelength range of about 250 nm to 1100 nm. Visible wavelengths range from 400nm (blue) to 750nm (red). Typically, blue light penetrates to a depth of about 0.2μm while red light penetrates more than 10μm. This difference in penetration depths can be utilized for the design of colour sensors by stacking charge collection layers at different depths, as pursued by Foveon Inc. in their commercially available Foveon X3 direct image sensors [Hubel].

The choice of silicon in this work is due to the high level of circuit integration required and available with the Complementary Metal Oxide Semiconductor (CMOS) silicon process technology. In the near infrared and infrared, compound semiconductors like Indium Gallium Arsenide (InGaAs), Indium Antimonide (InSb) and Mercury Cadmium Telluride (HgCdTe) are usually used\(^9\).

1.5.2 QUANTUM EFFICIENCY AND RESPONSIVITY

The quantum efficiency and responsivity of a photodetector are measures of how well the device detects light. Quantum efficiency is defined as the number of signal electrons generated per incident photon, while responsivity is defined as the ratio of the photogenerated current to the incident light power falling on the device. They are related as follows:

Responsivity, $R_\lambda = \frac{\text{Photocurrent generated}}{\text{Incident power}} = \frac{\text{Photocharge generated}}{\text{Incident Energy}}$

Hence, $R_\lambda = \frac{n_{\text{gen}} q}{n_{\text{inc}} \frac{hc}{\lambda}} \quad (1.8)$

Therefore, the quantum efficiency $\eta = \frac{n_{\text{gen}}}{n_{\text{inc}}} = R_\lambda \frac{hc}{\lambda q} = 1.24 \times 10^{-6} \frac{R_\lambda}{\lambda} \quad (1.9)$

where \( n_{gen} \) is the number of electron-hole pairs generated and \( n_{inc} \) is the number of incident photons, \( \lambda \) is the incident wavelength (m), \( h \) is Planck's constant \( = 6.626068 \times 10^{-34} \text{ m}^2\text{kg/s} \), \( c \) is the speed of light \( = 3 \times 10^8 \text{ m/s} \), and \( q \) is the electron charge \( = 1.6 \times 10^{-19} \text{ Coulombs} \).

\(^9\) These types of detectors are called quantum detectors. Thermal detectors like bolometers and thermopiles are also used for far infrared detection.
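Equation (1.9) is a simple unit conversion and can be sketched in a few lines of Python. The function names are hypothetical, the constants are those quoted above, and the 633 nm wavelength in the example is an assumption chosen only for illustration.

```python
H = 6.626068e-34  # Planck's constant (J s)
C = 3e8           # speed of light (m/s)
Q = 1.6e-19       # electron charge (C)


def quantum_efficiency(responsivity_a_per_w, wavelength_m):
    """Quantum efficiency from responsivity, eta = R * h * c / (lambda * q), equation (1.9)."""
    return responsivity_a_per_w * H * C / (wavelength_m * Q)


def responsivity(eta, wavelength_m):
    """Responsivity (A/W) from quantum efficiency, equation (1.8) rearranged."""
    return eta * wavelength_m * Q / (H * C)


if __name__ == "__main__":
    # A responsivity of 0.3 A/W at an assumed wavelength of 633 nm.
    print("quantum efficiency:", quantum_efficiency(0.3, 633e-9))
```

For a fixed quantum efficiency the responsivity rises linearly with wavelength, since each photon carries less energy at longer wavelengths.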

There are various types of photodetector structures that can be implemented in silicon such as p-n junction photodiodes, Schottky photodiodes, p-i-n photodiodes, avalanche photodiodes (APD), metal-oxide semiconductor (MOS) capacitors and phototransistors [Bar-Lev 1984, Sze 1981].

### 1.5.2.1 P-N Junction Photodiode

The p-n junction photodiode is by far the most common structure because of its low cost, visible wavelength range and its easy availability in standard silicon processes. In a junction photodiode, a p-n junction is used as the photodetection region as the depletion region provides an electric field to efficiently separate and collect the electron-hole pairs generated and to prevent recombination. However, electron-hole pairs generated outside the depletion region can also diffuse to the depletion region and be collected but less efficiently.

The quantum efficiency of a photodiode structure can be derived by solving for the drift current inside the depletion region and the diffusion current outside the depletion region\(^{10}\). The quantum efficiency for a vertical p-n photodiode with a very narrow p-region and n-type bulk substrate can be shown to be [Sze 1981]:

\[
\eta = 1 - \frac{\exp(-\alpha W)}{1+\alpha L_p} \tag{1.10}
\]

where \( \eta \) is the quantum efficiency, \( \alpha \) is the absorption coefficient, \( W \) is the depletion width of the junction and \( L_p \) is the diffusion length of the minority holes in the n-substrate. Hence the quantum efficiency of a photodiode can be increased by increasing the depletion width, which is dependent on the doping levels and the reverse bias voltage applied.

\(^{10}\)The drift current is obtained from integrating the carrier generation rate of equation (1.7) across the depletion region. The diffusion current is found by solving the diffusion equation for the minority carrier concentration using boundary conditions.
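The dependence of equation (1.10) on the depletion width can be explored with a short Python sketch. The function name is hypothetical and the parameter values in the example are assumptions for illustration only; they are not taken from the process characterised in this work.

```python
import math


def photodiode_qe(alpha_per_cm, depletion_width_cm, diffusion_length_cm):
    """Quantum efficiency of equation (1.10): 1 - exp(-alpha * W) / (1 + alpha * L_p)."""
    return 1.0 - math.exp(-alpha_per_cm * depletion_width_cm) / (
        1.0 + alpha_per_cm * diffusion_length_cm
    )


if __name__ == "__main__":
    alpha = 1e3   # assumed absorption coefficient (cm^-1)
    l_p = 10e-4   # assumed hole diffusion length (cm), i.e. 10 um
    for width_um in (0.5, 1.0, 2.0, 5.0):
        eta = photodiode_qe(alpha, width_um * 1e-4, l_p)
        print(f"W = {width_um} um -> eta = {eta:.3f}")
```

With both $W$ and $L_p$ set to zero the expression gives zero efficiency, and it increases monotonically with either quantity, which is consistent with the statement above.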

The speed of a photodiode is limited by three factors: diffusion of carriers, drift time in the depletion region, and capacitance of the depletion region [UDT Sensors Inc. 1982]. Carriers generated outside the depletion region must diffuse to the junction resulting in considerable time delay. The wider the depletion region, the more light is absorbed and the larger the spectral bandwidth. However, the depletion region must not be too wide or transit-time effects will limit the frequency response. It also should not be too thin or excessive photodiode capacitance $C$ will result in a large RC time constant.

1.5.2.2 Other photodetector structures

The Schottky photodiode is formed by the interface of a doped semiconductor with a metal layer and is capable of high speeds of the order of GHz but suffers from lower quantum efficiency and higher dark current. A p-i-n photodiode has a thick or lightly doped intrinsic (i) layer between the p and n-regions that serves to provide the device with a large depletion region and a low junction capacitance. This results in faster response times and higher quantum efficiency. However, the intrinsic layer, which is usually tailored to be fully depleted, is not a standard feature in the CMOS fabrication process. An avalanche photodiode (APD) achieves internal gain by operating under high reverse bias in the avalanche region where multiplication of charge carriers occurs through impact ionization. APDs have large dark current and integration of electronic circuitry with an APD is not straightforward due to the high reverse voltage requirement [de Lima Monteiro 2002]. A MOS capacitor detects light by storing photogenerated charges in a potential well that is formed when a voltage is applied to its gate. It is capable of high sensitivity and is the basis of the charge-coupled device (CCD), which is discussed in Section 1.5.5.1. Phototransistors provide internal gain, but only carriers generated in the base-collector space-charge region are amplified. Phototransistors are also slower and less linear than photodiodes and have a large dark current.
1.5.3 NOISE AND PHOTODIODE EQUIVALENT CIRCUIT

For modelling of a junction photodiode, an equivalent circuit is needed and one that is typically used is shown in Figure 1.18. The different noise sources have been collectively represented by the current source $I_N$ and the photocurrent is modelled as the current source $I_{ph}$.

![Figure 1.18 Photodiode equivalent circuit](image)

$I_d$ represents the diode current and is given by:

$$I_d = I_o \left[ \exp \left( \frac{qV}{kT} \right) - 1 \right]$$  \hspace{1cm} (1.11)

where $k=1.38\times10^{-23}$ J/K is the Boltzmann constant, $T$ is the absolute temperature in Kelvin, $q=1.6\times10^{-19}$ C is the electron charge, $V$ is the voltage across the photodiode and $I_o$ is the process dependent diode saturation current. In the reverse bias the diode current converges to $-I_o$, which is equivalent to the dark current of the photodiode. The resultant output current is given by the sum of the individual currents:

$$I = I_{ph} - I_d + I_N$$  \hspace{1cm} (1.12)

The capacitance, $C$, of the photodiode is the junction capacitance of the depletion region formed and the shunt resistance $R_{sh}$ represents the resistance of this depletion layer and is usually very large, of the order of $10\Omega$ to $100\Omega$ [de Lima Monteiro 2002]. The series resistance, $R_s$, which is the resistance of the undepleted region between the edge of the depletion layer and the metal contact, has a value ranging from several Ohms to several hundred Ohms. There are two main sources of noise in a photodiode, shot noise and thermal noise [UDT Sensors Inc. 1982, de Lima Monteiro 2002, Hornsey 1999c]. In addition, there are also $1/f$ noise, reset noise and spatial noise.
1.5.3.1 Shot Noise

Shot noise, $I_s$, is due to the statistical fluctuation of both the photocurrent $I_{ph}$ and the dark current $I_d$, and is expressed by:

$$I_s = \sqrt{2q(I_{ph} + I_d)B} \quad (1.13)$$

where $q$ is the electron charge and $B$ is the noise measurement bandwidth.

1.5.3.2 Thermal Noise

The thermal noise or Johnson noise of the photodiode, $V_t$, is due to the random motion of carriers in resistive materials and it increases with temperature. In a photodiode the thermal noise associated with the load resistance $R_L$ is given by\(^{11}\):

$$V_t = \sqrt{4kTR_LB} \quad (1.14)$$

where $k$ is the Boltzmann constant, $T$ is the absolute temperature in Kelvin and $B$ is the noise measurement bandwidth.
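Equations (1.13) and (1.14) can be evaluated directly. The following Python sketch uses hypothetical function names, and the currents, resistance and bandwidth in the example are assumed values chosen only to show the orders of magnitude involved.

```python
import math

K_B = 1.38e-23  # Boltzmann constant (J/K)
Q = 1.6e-19     # electron charge (C)


def shot_noise_current(i_ph, i_dark, bandwidth_hz):
    """RMS shot noise current, equation (1.13): sqrt(2 q (I_ph + I_d) B)."""
    return math.sqrt(2.0 * Q * (i_ph + i_dark) * bandwidth_hz)


def thermal_noise_voltage(r_load_ohm, temp_k, bandwidth_hz):
    """RMS thermal noise voltage, equation (1.14): sqrt(4 k T R_L B)."""
    return math.sqrt(4.0 * K_B * temp_k * r_load_ohm * bandwidth_hz)


if __name__ == "__main__":
    # Assumed example: 1 nA photocurrent, 10 pA dark current, 1 kHz bandwidth.
    print("shot noise (A):", shot_noise_current(1e-9, 10e-12, 1e3))
    # Assumed example: 1 kOhm load at 300 K over the same bandwidth.
    print("thermal noise (V):", thermal_noise_voltage(1e3, 300.0, 1e3))
```

Both quantities scale with the square root of the measurement bandwidth, so halving the bandwidth reduces each by a factor of $\sqrt{2}$.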

1.5.3.3 Reset Noise

Capacitors are usually thought of as noise-free devices. In the case of sampling systems, however, they exhibit a theoretical noise because the capacitor is periodically reset (see Section 1.6.3). In most image sensor pixel architectures, signal detection will involve the reset of the photodiode capacitive node. This operation gives rise to reset noise and is due to the thermal noise of the resistance of the switch used to reset the photodiode.

The noise equivalent bandwidth, $B$, of a circuit is defined from the squared voltage gain of the circuit, normalised to its low-frequency value, as follows [Hornsey 1999c]:

$$B = \frac{1}{A(0)^2} \int_0^\infty |A(f)|^2 df \quad (1.15)$$

where $A$ is the voltage gain of the circuit and $f$ is frequency. For an RC circuit, like that of a photodiode being reset through a switch, the noise equivalent bandwidth can be shown to have the value $1/(4RC)$. Substituting this into the expression for thermal noise voltage gives the well-known '$kT/C$' noise figure of:

\[ V_t = \sqrt{4kT R_{eq} B} = \sqrt{\frac{4kT R_{eq}}{4R_{eq}C}} = \sqrt{\frac{kT}{C}} \]  \hspace{1cm} (1.16)

where \( C \) is the capacitance of the photodiode (Figure 1.18) and \( R_{eq} \) is the equivalent resistance of the circuit.

---

\(^{11}\) Assuming the load resistance $R_L$ is significantly smaller than $R_{sh}$, which is a reasonable assumption in most cases.

---
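The cancellation of the resistance in equation (1.16) can be confirmed numerically. This is a minimal Python sketch with hypothetical function names; the 100 fF capacitance and the resistance values in the example are assumptions for illustration and do not describe the fabricated device.

```python
import math

K_B = 1.38e-23  # Boltzmann constant (J/K)
Q = 1.6e-19     # electron charge (C)


def rc_noise_bandwidth(r_ohm, c_farad):
    """Noise equivalent bandwidth of a single-pole RC circuit, 1 / (4 R C)."""
    return 1.0 / (4.0 * r_ohm * c_farad)


def reset_noise_voltage(c_farad, temp_k=300.0):
    """RMS reset noise voltage, equation (1.16): sqrt(k T / C)."""
    return math.sqrt(K_B * temp_k / c_farad)


def reset_noise_electrons(c_farad, temp_k=300.0):
    """The same noise expressed as an RMS number of electrons, sqrt(k T C) / q."""
    return math.sqrt(K_B * temp_k * c_farad) / Q


def thermal_noise_via_bandwidth(r_ohm, c_farad, temp_k=300.0):
    """sqrt(4 k T R B) with B = 1 / (4 R C); the result should not depend on R."""
    bandwidth = rc_noise_bandwidth(r_ohm, c_farad)
    return math.sqrt(4.0 * K_B * temp_k * r_ohm * bandwidth)


if __name__ == "__main__":
    c = 100e-15  # assumed photodiode capacitance (F)
    print("reset noise (V):", reset_noise_voltage(c))
    print("reset noise (electrons):", reset_noise_electrons(c))
    for r in (1e3, 1e6):  # assumed switch resistances (Ohm)
        print(f"R = {r:g} Ohm ->", thermal_noise_via_bandwidth(r, c))
```

The last loop prints the same voltage for both resistances, which illustrates why the reset noise is set by the capacitance and temperature alone.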

### 1.5.3.4 1/f Noise

Another noise source which exists but is given only brief mention here is the 1/f noise or flicker noise. The causes of this noise are not well understood, and it has been proposed that it arises from carrier fluctuations at surface interface traps or from mobility fluctuations. It derives its name from the fact that its power spectral density is inversely proportional to frequency, and structures with a larger area are less prone to its effects. It is also more significant in lateral shallow devices (e.g. MOS transistors) and less important in bare photodiodes.

### 1.5.3.5 Spatial Noise

The sources of noise discussed so far are forms of temporal noise. When an array of photodiodes is used, spatial noise\(^{12}\) needs to be considered as well. This consists of fixed pattern noise (FPN), which is the pixel-to-pixel variation in the absence of illumination, and photoresponse non-uniformity (PRNU), which is a function of the incident light level. The main causes of FPN are variations in photodetector geometry, dark current\(^{13}\) and threshold voltages, \( V_T \), while the non-uniformity in the photoresponse of CMOS photodiodes is caused mainly by light interference in the passivation layers as well as threshold variations [Makynen 1998]. Typical non-uniformity of a CMOS photodetector responsivity is \(<5\%\). With integrated on-pixel circuitry, threshold variations dominate the spatial non-uniformity\(^{14}\). Good matching in general requires close spacing and non-minimum size devices, which is prohibitive with on-pixel circuitry. Devices operating in the subthreshold region, such as in the logarithmic active pixel sensor (see Section 1.6.2.3), have higher threshold voltage or current variations. However, there are means to remove these spatial noise sources with the use of additional circuitry at the column or chip level, which will be discussed in Section 1.6.3.

\(^{12}\) Also known as pattern noise

\(^{13}\) Variations in photodetector geometry and dark current are smaller for larger sized devices.

\(^{14}\) Threshold variation and circuit mismatches have a larger effect on spatial noise than do the other photodiode parameters. For instance, variation of photodiode well capacity across the array does not matter if only half of the total well capacity is used for the desired application.
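The distinction between FPN and PRNU can be made concrete with a small Python sketch. This is one common way of quantifying the two quantities from a dark frame and a uniformly illuminated frame, and is not necessarily the measurement procedure used in this work; the function names and pixel values are hypothetical.

```python
from statistics import mean, pstdev


def fixed_pattern_noise(dark_frame):
    """Standard deviation of the pixel outputs in the absence of illumination."""
    return pstdev(dark_frame)


def prnu_percent(flat_frame, dark_frame):
    """Spread of the dark-subtracted response under uniform light, as a percentage of its mean."""
    signal = [flat - dark for flat, dark in zip(flat_frame, dark_frame)]
    return 100.0 * pstdev(signal) / mean(signal)


if __name__ == "__main__":
    # Hypothetical four-pixel frames in arbitrary output units.
    dark = [9.0, 11.0, 10.0, 10.0]
    flat = [104.0, 116.0, 110.0, 110.0]
    print("FPN (output units):", fixed_pattern_noise(dark))
    print("PRNU (%):", prnu_percent(flat, dark))
```

Subtracting the dark frame before computing the PRNU separates the offset variation, which is independent of light level, from the gain variation, which scales with it.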

1.5.4 PHOTODIODE MEASUREMENTS

In the testing of photodiodes it is important to understand the different measurement techniques possible when measuring photocurrents directly without integration of charge. The best place to start would be with the general I-V characteristic of a photodiode as shown in Figure 1.19. A photodiode can be operated in either quadrant 3 or quadrant 4 of the diode I-V response. In quadrant 4, one can either measure the open-circuit voltage $V_{oc}$ or the short-circuit current, $I_{sc}$.

![Figure 1.19 I-V characteristics of a photodiode](image)

When measuring the open-circuit voltage ($I=0$ in Figure 1.18 and Figure 1.19), the load resistance is very large, for example that of a high input impedance multimeter. Ignoring noise, and from equations (1.11) and (1.12), the photogenerated open-circuit voltage obtained is:

$$V_{oc} = \frac{kT}{q} \ln \left( \frac{I_{ph}}{I_o} + 1 \right)$$

(1.17)

In effect, the generated photocurrent cancels out the forward bias diode current for very small forward bias voltages. The problem with obtaining the photocurrent this way is that the measurements now depend on temperature as well as \( I_o \), which in turn depends on process parameters like doping concentration and minority carrier lifetimes. The open-circuit voltage, \( V_{oc} \), is also a highly non-linear (logarithmic) function of the photocurrent.
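The logarithmic behaviour of equation (1.17) can be seen in a short Python sketch. The function name is hypothetical and the saturation current of 1 pA is an assumed value for illustration; real values depend on the process.

```python
import math

K_B = 1.38e-23  # Boltzmann constant (J/K)
Q = 1.6e-19     # electron charge (C)


def open_circuit_voltage(i_ph, i_sat, temp_k=300.0):
    """Open-circuit voltage, equation (1.17): (k T / q) * ln(I_ph / I_o + 1)."""
    return (K_B * temp_k / Q) * math.log(i_ph / i_sat + 1.0)


if __name__ == "__main__":
    i_sat = 1e-12  # assumed diode saturation current (A)
    for i_ph in (1e-9, 2e-9, 4e-9, 8e-9):
        print(f"I_ph = {i_ph:g} A -> V_oc = {open_circuit_voltage(i_ph, i_sat):.4f} V")
```

When the photocurrent is much larger than $I_o$, each doubling of the photocurrent adds approximately $(kT/q)\ln 2$, about 18 mV at 300 K, to $V_{oc}$ rather than doubling it.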

In order to get a linear response with respect to photocurrent, it is more suitable to measure the short circuit current, \( I_{sc} \), i.e. measuring the change in photocurrent along the \( y \)-axis of Figure 1.19. In order to do so, a very low load resistance is required. An op-amp is typically used to achieve this low load resistance by keeping the voltage across the diode fixed, as shown in Figure 1.20, using the virtual earth principle. In the short-circuit mode, \( I_{sc} = I_{ph} \) (\( I_d = 0 \)).

![Figure 1.20 Photocurrent measurements using an operational amplifier](image)

The photodiode can also be operated under reverse bias in quadrant 3 with a linear response. In this region, the diode current, \( I_d \) is approximately equal to the leakage current, \( I_o \). An op-amp can again be used to obtain a low load resistance line. The advantage of operating a photodiode under reverse bias is its high speed of response as well as larger generated photocurrent. Both of these are due to the increasing depletion width with reverse bias voltage. However, the disadvantage is that the leakage current is also increased and hence the noise.

When charge integration of a photodiode is to be measured, a capacitor usually performs the charge integration, and this could be the photodiode capacitance itself. A buffer or amplifier is then used to read out the signal. The amplifier could be a sophisticated off-the-shelf component, which has the advantage of low noise, or a simple source follower buffer, which lends itself to on-chip integration with the photodiode.
1.5.5 TECHNOLOGY AND MATERIALS

Even in silicon, several different technologies and fabrication processes are available to the designer. These include the charge-coupled device (CCD), BiCMOS and CMOS technologies, as well as modifications to the standard CCD and CMOS processes and even a combined CCD/CMOS process. A newer development, born out of the move towards smaller feature sizes and silicon-on-insulator (SOI) technology, is the Thin Film on ASIC (TFA) technology [Wong 1996]. In the following sections, these technologies and their applicability to the work will be discussed.

1.5.5.1 Charge-Coupled Device (CCD)

Invented in the late 1960s by researchers at Bell Labs, the charge-coupled device (CCD) was initially intended for use as a memory circuit. But its potential in imaging soon became clear and it has since become the industry standard in image sensor technology. The basis of a CCD is the accumulation, storage and transfer of charges using closely spaced metal-oxide-semiconductor (MOS) capacitors. A MOS capacitor is simply a semiconductor substrate with an overlying thin oxide layer and a top metal contact, also known as the gate. When the structure has a p-type substrate, an n-type MOS capacitor is formed. To operate the CCD these MOS capacitors are pulsed with a positive gate voltage and driven into deep depletion (empty potential well). This is a non-equilibrium phase and the structure is able to collect any available minority carriers (electrons). The empty potential well can either be filled up by thermally generated electrons or photo-generated electrons. Fortunately thermal generation of electrons is relatively slow. It takes several seconds at room temperature to collect enough thermally generated electrons for inversion of the MOS capacitor to occur. During this time the potential well is available to collect photo-generated electrons. For low light level applications long integration times may be necessary and cooling is used to reduce the thermally generated dark current.

Once the charge has been stored, the next step is for the charge to be transferred to the output amplifier to be read off-chip. The transfer mechanisms in CCDs are well documented [Theuwissen 1995]. Figure 1.21 illustrates the charge transfer mechanism for a three-phase CCD system. A typical analogy used to describe the transfer of charge in a CCD is that of transferring water using buckets. By varying the voltages applied to the gate electrodes in a properly timed sequence, the stored charges are shuttled across the array to the output register and finally to the output amplifier. There are various transport systems possible for a CCD, from the classical four-phase system all the way to a single phase system. They have relative tradeoffs between fill factor, charge handling capacity, fabrication complexity and clocking requirements. But by far the most common is the four-phase system for transfer in the array and the two-phase system in the output register.
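The bucket analogy can be sketched as a simple simulation in Python. This is a deliberately simplified model and not a description of any particular device: the function name is hypothetical, each pixel-to-pixel shift is lumped into a single transfer (whereas a three-phase device performs three gate transfers per pixel), and the charge transfer efficiency `cte` is an assumed parameter.

```python
def clock_out(wells, cte=1.0):
    """Shift charge packets towards the output (index 0) one well per clock cycle.

    Returns the packets in read-out order and the charge left behind in the register.
    A fraction `cte` of each packet moves on every transfer; the rest stays in place.
    """
    wells = list(wells)
    output = []
    for _ in range(len(wells)):
        shifted = [0.0] * len(wells)
        # The packet nearest the output is handed to the output amplifier.
        output.append(cte * wells[0])
        shifted[0] += (1.0 - cte) * wells[0]
        # Every other packet moves one well closer to the output.
        for i in range(1, len(wells)):
            shifted[i - 1] += cte * wells[i]
            shifted[i] += (1.0 - cte) * wells[i]
        wells = shifted
    return output, wells


if __name__ == "__main__":
    packets = [3.0, 1.0, 2.0]  # hypothetical charge packets, arbitrary units
    print(clock_out(packets))             # ideal transfer
    print(clock_out(packets, cte=0.99))   # assumed imperfect transfer
```

With ideal transfer the packets arrive in their original order and the register is left empty. With imperfect transfer, packets further from the output lose more charge to the trailing wells because they undergo more transfers, which is why high charge transfer efficiency matters for large arrays.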

![Figure 1.21 Three-phase charge transport mechanism in a CCD (Source: Eastman Kodak)](image)

Besides the various transport mechanisms, there are several architectures possible in a CCD imager, the main architectures being the full frame, frame transfer and interline transfer CCDs. Full frame CCDs represent the basic architecture, where the image is directly transferred to the readout register. They suffer from image smear because the sensor is still exposed to illumination as the image is transferred out, necessitating the use of a shutter and making them unsuitable for video applications. The other architectures aim to correct this by having a fast intermediate transfer to an on-chip storage area before the image is read out serially.

Since its inception, the CCD has had its fabrication process specially tailored towards imaging. CCD fabrication is complex, with typically 15 - 25 masks [Hornsey 1999b]. To name but a few unique features: closely spaced or overlapping gates and large clocking voltages (10-20V) are necessary to produce high charge transfer efficiencies; large operating voltages mean the gate oxide thickness has to be large (80nm, compared to 10nm in CMOS); and a buried channel structure reduces surface traps and improves charge transfer efficiency. Crosstalk is reduced by controlling the doping concentration and resistivity of the substrate to limit the diffusion length of minority carriers, and unique antiblooming structures, specialised channel stop implants and stepped oxide isolation are used to absorb free carriers. Thinning and backside illumination are often used to improve blue and ultraviolet (UV) response, while Multi Pinned Phase (MPP) clocking is used to suppress dark current by inverting the channel and quenching stray electrons. However, these specialised fabrication procedures and techniques, though optimised for image sensing, make integration with circuitry difficult and cause the sensor to be susceptible to radiation damage, making it unsuitable for certain applications such as space based imaging. This has led to the resurgence of CMOS image sensors.

Much has been said about the possibility of CMOS image sensors eclipsing CCDs in the image sensing market. While this seems to be true in low end and high volume applications, CCDs still continue to dominate the scientific imaging market. Indeed, work on improving the performance of CCDs is still ongoing, with several innovations being introduced. Roper Scientific's deep depletion CCDs use a high resistivity silicon substrate to reduce diffusion of charge carriers and improve quantum efficiency in the near-infrared (NIR). Kodak's Microelectronics Technology Division developed a gate structure based on indium tin oxide (ITO), which is more transparent than polysilicon, hence giving better sensitivity in the blue/green region. Fujifilm's 3rd Generation Super CCD System uses octagonal-shaped photodiodes in an interwoven layout to achieve higher sensitivity and equal resolution in the horizontal, vertical and diagonal directions. The orthogonal transfer CCD (OTCCD), developed by Tonry, Burke and Schechter [Tonry 1997], permits parallel clocking in both the horizontal and vertical direction by replacing the channel stop between columns of pixels by an additional gate and was used to remove image motion caused by atmospheric turbulence at rates of up to 100Hz. The low light level CCD (LLLCCD) from E2V is able to achieve sub-electron readout noise levels even at MHz pixel rates using on-chip charge multiplication and is currently being incorporated into the NAOMI wavefront sensor at the Isaac Newton Group of Telescopes (ING). Sony has introduced its HAD (Hole Accumulation Diode), Super HAD and EXview HAD CCD technology, where an additional accumulation layer has been included to drain off thermally generated currents. The newer Super HAD and EXview HAD technology also incorporates two layers of on-chip microlenses for better light collection. Kodak integrated clock drivers on-chip with its interline KAI2020 CCD chip. Research is also being done into making CCDs more radiation tolerant. All of this means that the predicted demise of CCDs is far from certain. However, for the purpose of this work, CCDs do not offer the level of integration needed to allow parallel processing of subaperture centroids. The disadvantages of the CCD process will be highlighted further in Section 1.5.5.3 when the CMOS process technology is discussed.

1.5.5.2 BiCMOS

The BiCMOS process was introduced to combine the performance, high packing density and low power dissipation of the CMOS process with the high current drive, high switching speed and low mismatch of the bipolar device [Gray 1992]. However, the use of BiCMOS processes for imaging has been limited [Biber 2000, Chou 1991, Guidash 1995, Kuo 1991, Tanaka 1989, Wohl 2003] due to its complexity and cost with no obvious advantage in possible photosensing structures. The process is not yet mature and, unfortunately, many of the improvements in CMOS fabrication techniques do not directly transfer to BiCMOS fabrication. Also the large area required for each bipolar transistor makes them unattractive in large vision chips [Moini 1999]. The bipolar image sensor did achieve some commercial success with the base-stored image sensor (BASIS) [Tanaka 1989] which was used in Canon’s EOS line of autofocus sensors but has since been dropped in favour of CMOS sensors. The imager achieves amplification using a vertical bipolar transistor structure with the optically generated holes being integrated on the base.

1.5.5.3 CMOS

Complementary Metal-Oxide Semiconductor (CMOS) technology is the dominant technology in integrated circuit (IC) fabrication and is continuing to mature. CMOS image sensors, on the other hand, are relatively immature, having been sidelined for the better image quality of CCD sensors. However, these devices are making a comeback and an in-depth historical account of the birth and development of CMOS image sensors is given by Fossum [Fossum 1997].

Unlike CCDs, standard CMOS processes are not tailored for imaging purposes. For example, in a standard CMOS process, a shallow epi-layer substrate (see Figure 2.1) is used to mitigate latch-up, which reduces the response in the red, while heavily doped junctions, which enable denser, shorter gate-length devices, reduce the response in the blue/green region. Furthermore, CMOS imagers suffer from high temporal noise and 1/f noise because signals are transferred to the outside world via multiple transistor stages. However, CMOS imagers offer higher levels of integration and, compared to multi-chip systems, a reduction in system size and power consumption [Janesick 2002]. CMOS imagers are more suited for high volume, space-constrained applications where imaging quality is less important, such as in security cameras, PC peripherals, toys, fax machines, and some automotive applications [Litwiller 2001]. A summary of the relative advantages and disadvantages of the CMOS and CCD processes is given in Table 1.2.

<table>
<thead>
<tr>
<th>CMOS Advantages</th>
<th>CCD Advantages</th>
</tr>
</thead>
<tbody>
<tr>
<td>Capable of on-chip circuit integration</td>
<td>High light sensitivity and low noise</td>
</tr>
<tr>
<td>Low power consumption</td>
<td>Low dark current</td>
</tr>
<tr>
<td>Random access to pixel regions of interest (ROI)</td>
<td>High uniformity</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>CMOS Disadvantages</th>
<th>CCD Disadvantages</th>
</tr>
</thead>
<tbody>
<tr>
<td>Higher noise levels</td>
<td>Circuit integration difficult</td>
</tr>
<tr>
<td>Larger dark current</td>
<td>High power consumption</td>
</tr>
<tr>
<td>Lower fill factor</td>
<td>Require large multiple supply voltages and complex timing signals</td>
</tr>
<tr>
<td></td>
<td>Pixel defects could render entire row/column unusable</td>
</tr>
</tbody>
</table>

Table 1.2 Comparison of advantages and disadvantages of CMOS and CCD image sensors

The cost advantage of CMOS over CCDs is not very well understood. CMOS imagers would be much cheaper if they could be produced on the same high-volume wafer processing lines as mainstream logic or memory chips. However, typically for improved performance, CMOS imagers would require additional modifications to the basic process such as optical packaging, on-chip colour filter arrays and on-chip microlenses. So at the chip or sensor level costs are similar but at the system level CMOS imagers are generally cheaper due to the additional related circuitry required for CCD operation [Litwiller 2001].

An issue which faces CMOS imagers is that of decreasing feature sizes. Smaller feature sizes mean higher packing densities, improved fill factors, lower power consumption, faster speeds\(^{15}\) and reduced crosstalk\(^{16}\). However, reduced voltage swing due to downscaling reduces dynamic range, and smaller junction depths mean a reduced volume for photocharge collection and an increase in surface effects [Wong 1996], as well as a shift of the quantum efficiency curves to shorter wavelengths. Short channel effects lead to off-state leakage currents and tunneling currents which contribute to the dark current of pixels. Furthermore, for these processes, opaque silicide layers (WSi\(_2\), TiSi\(_2\), CoSi\(_2\)) are used to reduce contact and sheet resistances of source/drain regions and gates. Hence, as technology scales beyond 0.5\(\mu\)m, modifications to the fabrication process, such as the removal of the silicide layer, are needed to enable good quality imaging [Lule 2000, Wong 1996].

The use of CMOS imagers currently proves difficult for low-light level applications such as astronomy due to the high level of noise in CMOS compared to CCDs. However, CMOS imaging is a relatively new development and noise reduction techniques by means of specialized circuitry are being heavily researched [Bursky 1999, Lai 2002, Meynants 2001, Pain 2003, Rullmann 2003]. Watabe et al. [Watabe 2003] mentioned the overlaying of a high-gain avalanche rushing amorphous photoconductor (HARP) film on top of a CMOS image sensor to produce an ultrahigh sensitivity CMOS image sensor. So it may be that in the not too distant future CMOS image sensors will achieve the level of sensitivity now only seen in CCDs.

\(^{15}\) This is due to lower capacitance which leads to better conversion efficiency (\(q/C\)) between electron charge and output voltage.

\(^{16}\) This is due to higher doping levels which lead to reduced diffusion lengths.

1.5.5.4 CCD/CMOS

Modifications of the basic CCD and CMOS process in order to allow more flexible readout in CCDs or improved imaging quality in CMOS include the charge injection device (CID), static induction transistor (SIT), charge modulation device (CMD), pinned photodiodes and many more. The CID uses MOS capacitors as in CCDs but allows X-Y addressing and non-destructive readout [Theuwissen 1995]. The SIT achieves current amplification by placing a light sensitive MOS capacitor on top of a bipolar transistor. A CMD sensor, developed by Olympus, consists of a MOSFET structure where photogenerated charges collected under the gate of the device modulate the current flowing through the transistor [Hornsey 1999a]. Amplification is achieved and the device is compact, requiring only two transistors per pixel, but it suffers from large dark current and fixed pattern noise. The pinned photodiode\(^{17}\) developed by JPL/Kodak offers high quantum efficiency, low dark current and low noise readout [Fossum 1997]. However, none of these sensors is fully compatible with a standard CMOS process and additional fabrication steps are required.

Several efforts have been made to combine CCD and CMOS processes to make use of their relative advantages, in particular the better imaging quality of CCD sensors with the high level of integration of the CMOS process. However this is not without its difficulties [Hornsey 1999b, Moini 1999]. CCD/CMOS processes do not provide an optimised CCD structure. In fact, neither process is fully optimised in a combined device and the approach represents more of a compromise than an improvement. Also, the high clocking pulses needed for CCD operation induces noise into any circuitry that is integrated. Being highly capacitive devices, CCD structures will cause adjacent CMOS circuits to dissipate too much power. Furthermore, combining CMOS and CCD processes to obtain the best of both worlds would require almost all the stages

---

\(^{17}\) The pinned photodiode has a p'np' structure where the voltage applied to the n-layer fully depletes the n-layer and the voltage is pinned. Photogenerated majority carriers are then stored in this depletion region, decreasing the pinned voltage. This is different from a p-i-n photodiode, which utilizes an intrinsic layer between a p-layer and n-layer (typically p'n'n') and in which photogenerated minority carriers are swept across the depletion region and collected by electrodes connected to the p and n-layers.
from both processes, which means an excessive number of fabrication masks and the resulting process tends to be more expensive than either the standard CMOS process or CCD manufacturing. High volume production is highly unlikely.

One of the approaches taken by NASA is their concept of "Hybrid imaging technology" (HIT). Instead of uniting CCD and CMOS devices at the device-fabrication-process level, the devices are fabricated separately and then joined mechanically and electrically (hybridized) by standard bump bonding techniques where indium bumps are deposited on matching bump-bond pads formed on the CCD imager and CMOS chips.

1.5.5.5 Thin Film on ASIC (TFA)

TFA (Thin Film on ASIC) image sensors consist of a hydrogenated amorphous silicon (a-Si:H) photodiode with the a-Si:H layers directly deposited on the CMOS chip to give fill factors approaching 100% [Wong 1996]. Furthermore, detectors and electronic circuitry can be developed independently, with the potential of obtaining very low dark currents due to the higher energy gap of a-Si:H (1.75 eV). However, this is a relatively immature technology and is not widely available, hence costs are still high. For downscaled processes, though, this technology holds great promise.

1.6 PIXEL ARCHITECTURES IN CMOS

Of the technologies discussed, CMOS offers the best prospect of on-chip processing at a reasonable cost and performance. This section describes the possible pixel architectures in a CMOS process. Readout for CMOS photodetector structures can be made either in the direct readout mode or in the charge integration mode. Charge integration readout has several advantages. It offers higher signal sensitivity [Fossum 1997] and allows the dynamic range to be controlled by changing the integration time. It has low sensitivity to device mismatch because the integration time depends on the input capacitance, which has less mismatch than other parameters of the circuit [Moini 1999]. It also has a linear transfer characteristic, and the integration acts as a low-pass filter which removes the high frequency components of the noise.
1.6.1 PASSIVE PIXEL SENSORS (PPS)

The passive pixel sensor (PPS) first introduced by Weckler in 1967 [Weckler 1967] represents the early form of the CMOS imager and is responsible for much of its initial criticism due to its poor noise performance. Passive pixel sensors have one transistor per pixel for addressing purposes as shown in Figure 1.22. Operating the passive pixel sensor in a direct or continuous mode usually involves the use of a transimpedance amplifier, with the feedback resistance providing the current-to-voltage conversion. However, this technique does not lend itself to on-chip integration due to the difficulty in incorporating the large feedback resistance required. A more common approach is to operate the sensor in a charge integration mode using a charge amplifier with feedback capacitance at the column or chip level [Hornsey 1999b] as in Figure 1.22.

![Figure 1.22 Passive pixel sensors with column-level charge amplifier readout circuitry](image)

The photocharge integrated on the photodiode capacitance is transferred to the feedback capacitance of the charge amplifier and output as a voltage. Gain is provided by the ratio of the photodiode capacitance to the feedback capacitance. With passive pixel sensors, the parasitic capacitance of the data line is a major concern as it limits the speed at which the pixel can be read out, increases readout noise and reduces the charge seen at the output. As such, passive pixel devices do not scale well to larger array sizes and are not usually the architecture of choice except where fill factor is a limitation or current readout is desired. Integration of a charge amplifier at the column level has the advantage of reduced bus capacitance but the disadvantage of mismatches between the amplifiers and limited space, and hence performance, available for each amplifier.
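
The charge-to-voltage conversion described above can be illustrated with a minimal Python sketch. It assumes an ideal charge amplifier (all of the photocharge reaches the feedback capacitor, with no loss to the bus capacitance) and takes the gain as the ratio of photodiode capacitance to feedback capacitance, which follows from Q = CV. The function names and all component values are hypothetical and are not taken from the design described in this thesis.

```python
Q_E = 1.602176634e-19  # electron charge in coulombs

def charge_amp_output(q_photo, c_feedback):
    """Magnitude of the output step of an ideal charge amplifier (V = Q / C_f)."""
    return q_photo / c_feedback

def charge_amp_gain(c_photodiode, c_feedback):
    """Voltage gain relative to the swing on the photodiode itself."""
    return c_photodiode / c_feedback

# Hypothetical example values, chosen only for illustration.
n_electrons = 10_000   # integrated photoelectrons
c_pd = 100e-15         # photodiode capacitance in farads
c_f = 10e-15           # feedback capacitance in farads

q_photo = n_electrons * Q_E
v_pd = q_photo / c_pd                    # swing on the photodiode node
v_out = charge_amp_output(q_photo, c_f)  # swing at the amplifier output

print(f"photodiode swing: {v_pd * 1e3:.2f} mV")
print(f"amplifier output: {v_out * 1e3:.2f} mV")
print(f"gain C_pd / C_f:  {charge_amp_gain(c_pd, c_f):.1f}")
```

A smaller feedback capacitor gives a larger output swing for the same photocharge, which is why the capacitance ratio sets the gain.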

1.6.2 ACTIVE PIXEL SENSORS (APS)

Active pixel sensors incorporate an active amplifier or buffer at each pixel, typically a source follower, to overcome the large bus capacitance of the passive pixel sensors. The initial problem with active pixel sensors was the poor fill factor caused by the incorporation of the on-pixel amplifier, but decreasing feature sizes mean more and more functionality can now be built into a single pixel. Pixels as small as 4 microns have been fabricated [Endo 2003]. There are three major types of active pixel sensors, namely the photodiode APS, photogate APS and logarithmic APS.

1.6.2.1 Photodiode APS

The structure of a photodiode APS is shown in Figure 1.23. Light incident on the photodiode generates charge carriers which are collected on the photodiode capacitance. After the integration time has elapsed the voltage on the capacitor is read out and is linearly related to the charge collected and hence the incident illumination. After readout, the reset line is pulsed high to reset the photodiode to the supply voltage. The integration may then be repeated.
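
The integrate-then-reset cycle can be sketched numerically. The following Python fragment is a simplified model only: it assumes a constant photodiode capacitance, a reset exactly to the supply voltage (ignoring any threshold drop in the reset transistor) and a constant photocurrent during integration. The function name and the numerical values are hypothetical.

```python
def photodiode_voltage(v_reset, i_photo, t_int, c_pd):
    """Photodiode node voltage after integrating i_photo for t_int seconds.

    The node is discharged linearly from v_reset and saturates at 0 V
    once the stored charge is used up.
    """
    delta_v = i_photo * t_int / c_pd
    return max(v_reset - delta_v, 0.0)

# Hypothetical values for illustration only.
V_RESET = 5.0   # reset (supply) voltage in volts
C_PD = 1e-12    # photodiode capacitance in farads
T_INT = 1e-3    # integration time in seconds

for i_photo in (1e-10, 1e-9, 1e-8):
    v = photodiode_voltage(V_RESET, i_photo, T_INT, C_PD)
    print(f"I_photo = {i_photo:.0e} A -> V = {v:.2f} V")
```

In this model the voltage drop is proportional to the photocurrent until the pixel saturates, which is the linear relationship referred to above.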

![Figure 1.23 A photodiode APS with array row/column selection](image-url)
Due to the added circuitry and their threshold drops, the dynamic range of an active pixel sensor is normally limited by the voltage swing of the circuit rather than the full well capacity of the photodiode. Methods to extend this output swing include using a complementary PMOS readout structure in addition to the regular NMOS source follower readout structure [Xu 2002]. This however causes a reduction in fill factor. Numerous modifications to the basic photodiode APS have been carried out in order to improve its functionality or its performance, as detailed in [Fossum 1997].

1.6.2.2 Photogate APS

A photogate APS is based on a CCD device where photogenerated charge is collected in a potential well when a voltage is applied to the photogate (PG). The structure of the photogate APS is shown in Figure 1.24. After integration, the floating diffusion is reset, and its reset voltage is stored. A transfer gate is then pulsed to transfer the stored photogenerated charge to the floating diffusion and this voltage is then read. Readout of the reset and signal voltages is performed through a source follower buffer and a row select transistor, as in the photodiode APS of Figure 1.23. The difference between the reset and signal voltages is the output of the sensor. This approach is called correlated double sampling (see Section 1.6.3) and it suppresses reset noise, 1/f noise and FPN. The photogate APS has a lower fill factor, higher mismatch\(^{18}\) and lower quantum efficiency, particularly in the blue, than the photodiode APS due to the additional circuitry and the overlying polysilicon gate. However, it has better noise suppression and charge conversion efficiency\(^{19}\), making it suitable for low-light level applications.

\(^{18}\) This is due to the surface states at the Si-SiO\(_2\) interface contributing to the recombination of stored carriers

\(^{19}\) This is because it has a separate smaller output node (floating diffusion) which means a smaller capacitance (Charge conversion efficiency = \(q/C\) in V/e\(^-\))
Figure 1.24 Photogate APS with (a) overlapping transfer gate and (b) with n+ transfer diffusion [de Lima Monteiro 2002]

Ideally the transfer gate should overlap the photogate to ensure effective charge transfer. This would require a double poly process. However, the need for an additional gate can be avoided by utilizing an intermediate 'bridging' diffusion, as shown in Figure 1.24 (b). This has little effect on the performance of the pixel except for the possible introduction of image lag [Mendis 1997].

1.6.2.3 Logarithmic APS

The logarithmic pixel, depicted in Figure 1.25, is a modification of the linear photodiode active pixel sensor in which the gate of the reset transistor is connected to the supply voltage, giving continuous readout of the photocurrent. The small photocurrent causes the reset transistor to operate in the weak inversion or subthreshold region where the MOS current flow is dependent upon the exponential of $V_{DS}$. The voltage at the photodiode node therefore varies logarithmically with the photocurrent, giving the pixel a very large dynamic range. It can be expressed by the following equation [Hornsey 1999b]:

$$V_s = VDD - \frac{kT}{q} \ln \left( \frac{i_{\text{photo}}}{i_o} \right)$$ (1.18)

where $k$ is the Boltzmann constant, $T$ is the absolute temperature in Kelvin, $q$ is the electron charge, $VDD$ is the supply voltage, $i_{\text{photo}}$ is the generated photocurrent and $i_o$ is a process dependent parameter.
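
To give a feel for the compression implied by equation (1.18), the short Python sketch below evaluates it over five decades of photocurrent. The supply voltage, temperature and the value of $i_o$ are hypothetical choices made for illustration; only the form of the equation is taken from the text.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant in J/K
Q_E = 1.602176634e-19  # electron charge in C

def log_pixel_voltage(i_photo, i_o, vdd=5.0, temp_k=300.0):
    """Photodiode node voltage from equation (1.18)."""
    thermal_voltage = K_B * temp_k / Q_E
    return vdd - thermal_voltage * math.log(i_photo / i_o)

I_O = 1e-15  # hypothetical process dependent parameter in amperes

# Sweep five decades of photocurrent.
for exponent in range(-13, -7):
    i_photo = 10.0 ** exponent
    print(f"i_photo = 1e{exponent} A -> V_s = {log_pixel_voltage(i_photo, I_O):.3f} V")

# Voltage change for each tenfold increase in photocurrent.
step = log_pixel_voltage(1e-12, I_O) - log_pixel_voltage(1e-11, I_O)
print(f"change per decade: {step * 1e3:.1f} mV")
```

At 300 K the output changes by roughly 60 mV per decade of photocurrent, so five decades of illumination map onto a swing of only about 0.3 V in this idealised model.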
Logarithmic pixels can measure illumination over 5 orders of magnitude, an order of magnitude more than ordinary APS [Hornsey 1999b]. In addition, logarithmic pixels do not require a reset line and have simpler timing and operation as well as a larger fill factor. Since logarithmic pixels operate in continuous time, they are randomly accessible both in time and in space. This also means they are able to operate at a higher sampling rate. On the downside, because of the subthreshold operation of the MOSFET and its dependence on temperature and process parameters such as threshold voltage and oxide thickness, logarithmic sensors suffer from large pixel offset non-uniformity or FPN. So although their dynamic range is larger, logarithmic pixels typically have lower SNR. This FPN cannot be removed by correlated double sampling because of the continuous time operation of the pixel. The offset can, however, be removed by storing it in memory and subtracting it when the pixel is read. This can be performed in software but, for the highest possible speed, a parallel hardware correction method is used. Dierickx et al. [Dierickx 1996] used an external PROM and a dedicated co-processor while Ricquier et al. [Ricquier 1995] performed the non-uniformity correction on-chip.
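
The store-and-subtract offset correction can be sketched in software as follows. This is a minimal illustration, not the method of either cited work: it assumes that the per-pixel offsets are estimated from a single reference frame taken under uniform illumination, and the function names and pixel values are hypothetical.

```python
def calibrate_offsets(reference_frame):
    """Estimate per-pixel offsets from a frame taken under uniform illumination.

    Each offset is the deviation of a pixel from the frame mean.
    """
    mean = sum(reference_frame) / len(reference_frame)
    return [value - mean for value in reference_frame]

def correct_frame(raw_frame, offsets):
    """Subtract the stored offset from each pixel as it is read."""
    return [value - offset for value, offset in zip(raw_frame, offsets)]

# Hypothetical fixed offsets (in volts) for a row of four pixels.
true_offsets = [0.03, -0.02, 0.01, -0.02]

# Calibration: every pixel sees the same illumination (2.5 V ideal output).
reference = [2.5 + offset for offset in true_offsets]
stored = calibrate_offsets(reference)

# Later readout of a uniform 2.0 V scene, corrupted by the same offsets.
raw = [2.0 + offset for offset in true_offsets]
print(correct_frame(raw, stored))
```

Because the offsets are fixed for a given chip, the stored table only needs refreshing when operating conditions such as temperature change appreciably.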

Another disadvantage of the logarithmic APS is its slow response under low illumination levels because of the small photocurrent available for charging/discharging of the sensing node [Hornsey 1999b]. Delbrück, however, used feedback to improve the speed of response. An adaptive element was also used in order to give compression for slowly varying signals and higher gain for higher frequencies, making it useful for biological vision systems and motion detection [Moini 1999]. In fact, logarithmic sensors are the preferred sensors for modelling biologically inspired vision systems as they mimic the large dynamic range response of such systems.

An inverted logarithmic APS structure, where the positions of the photodiode and the load (transistor in subthreshold) are reversed, was used to reduce pattern noise and improve output voltage swing by reducing signal compression [Hong 2001]. The electrical sensitivity of the conventional structure can be improved by increasing the number of subthreshold diode connected MOS transistors (MOS diodes) in the pixel at the expense of reduced fill factor and speed of response. With an inverted structure the effect is less pronounced (no increase in sensitivity) but instead the subthreshold region of operation is extended over a wider region, offering an even larger dynamic range, again at the expense of reduced fill factor.

1.6.3 NOISE REMOVAL AND EXTENDING DYNAMIC RANGE

Noise has been the weak point of CMOS imagers compared to the highly sensitive CCDs. However, there are various means to achieve noise removal in CMOS sensors. New techniques are constantly being developed but two well established methods are the Correlated Double Sampling (CDS) and Delta-Difference Sampling (DDS) techniques for removing the ‘kTC’ reset noise\(^{20}\) and FPN. Figure 1.26 shows the typical circuit for performing CDS and DDS.

\(^{20}\) It is known as kTC noise because the number of noise electrons generated is \( n = \frac{\sqrt{kTC}}{q} \), though the noise voltage at the output is given by \( V_n = \sqrt{\frac{kT}{C}} \) (from \( Q=CV \)). Since the number of signal electrons increases in proportion to area but the number of reset noise electrons increases as the square root of area (capacitance), SNR improves with a larger photodetection area.
Figure 1.26 Correlated Double Sampling (CDS) and Delta-Difference Sampling (DDS) applied to a photogate APS [Mendis 1997]

Correlated double sampling is usually performed at the column level and works by differentially reading out the reset and signal levels. However, due to the threshold voltage variations between the two readout circuits, column-wise FPN is generated. Delta-difference sampling removes this by shorting the two sample and hold capacitors (by pulsing CB and SEL in Figure 1.26) and taking another differential reading. This reading is proportional to the threshold voltage difference between the two circuits and subtracting this from the initial reading gives the final offset free output.
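
The two-step subtraction can be made concrete with a simplified numerical model. The Python sketch below assumes unity-gain readout branches whose only imperfection is a fixed offset, and equal sample-and-hold capacitors so that shorting them leaves both at the average of the two stored levels. These assumptions, the function names and the voltages are all illustrative and are not taken from the circuit of Figure 1.26.

```python
def cds_reading(v_reset, v_signal, offset_r, offset_s):
    """First differential reading: reset branch minus signal branch."""
    return (v_reset + offset_r) - (v_signal + offset_s)

def dds_reading(v_common, offset_r, offset_s):
    """Second differential reading with both branches at the same voltage."""
    return (v_common + offset_r) - (v_common + offset_s)

def offset_free_output(v_reset, v_signal, offset_r, offset_s):
    """CDS followed by DDS, returning the offset-corrected pixel output."""
    first = cds_reading(v_reset, v_signal, offset_r, offset_s)
    # Shorting two equal capacitors leaves both at the average level.
    v_common = 0.5 * (v_reset + v_signal)
    second = dds_reading(v_common, offset_r, offset_s)
    return first - second

# The same pixel levels read through two columns with different
# (hypothetical) branch offsets.
V_RESET, V_SIGNAL = 3.0, 2.2
columns = [(0.015, -0.010), (-0.020, 0.005)]

for offset_r, offset_s in columns:
    cds_only = cds_reading(V_RESET, V_SIGNAL, offset_r, offset_s)
    corrected = offset_free_output(V_RESET, V_SIGNAL, offset_r, offset_s)
    print(f"CDS only: {cds_only:.3f} V, CDS + DDS: {corrected:.3f} V")
```

The CDS-only readings differ from column to column, whereas the corrected output is the same 0.8 V in both, showing how the second reading cancels the column-wise offset.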

Reset noise of an APS is the thermal noise (see equation (1.16)) associated with the finite resistance of the reset switch. This noise is transferred to the capacitor when the reset switch opens. In the case of a photogate APS, the ‘kTC’ noise freezes when the reset transistor switches off because the effective noise bandwidth, $B = 1/4RC$ (equation (1.15)), drops significantly ($R_{off} \gg R_{on}$). Figure 1.27 (a) illustrates this. However, in a photodiode APS the charge is integrated on the output node such that when the reset signal goes low the photodiode immediately starts discharging the stored charge. Removal of reset noise would require sampling right at the instant reset
is switched off and the photodiode discharges, which is difficult to do. However, it is still possible to use double sampling (not correlated) to remove 1/f noise and fixed pattern noise from the photodiode pixel.
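
The size of the reset noise can be estimated from the expressions given in footnote 20. The following Python sketch evaluates them for a hypothetical 10 fF sense node at 300 K; the capacitance and temperature are illustrative assumptions only.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant in J/K
Q_E = 1.602176634e-19  # electron charge in C

def ktc_noise_electrons(capacitance, temp_k=300.0):
    """RMS reset noise expressed in electrons: sqrt(kTC) / q."""
    return math.sqrt(K_B * temp_k * capacitance) / Q_E

def ktc_noise_voltage(capacitance, temp_k=300.0):
    """RMS reset noise expressed as a voltage: sqrt(kT / C)."""
    return math.sqrt(K_B * temp_k / capacitance)

C_NODE = 10e-15  # hypothetical sense node capacitance in farads

n_rms = ktc_noise_electrons(C_NODE)
v_rms = ktc_noise_voltage(C_NODE)
print(f"reset noise: {n_rms:.1f} electrons rms, {v_rms * 1e3:.2f} mV rms")

# Quadrupling the capacitance doubles the noise electrons.
print(ktc_noise_electrons(4 * C_NODE) / n_rms)
```

For these values the reset noise is about 40 electrons rms, which indicates why its suppression matters for low light level operation.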

Besides noise, another important characteristic of image sensors is the dynamic range. There are several means to extend the dynamic range of active pixel sensors [Yadid-Pecht 1999]. These include the logarithmic pixel discussed previously, multi mode sensors, clipping a sensor's response, having a variable integration time [Yasuda 2003], and conversion of the sensor output to a pulse frequency [Yang 1994]. Multi mode sensors allow the photodetector structure used to be operated under different modes. One such example makes use of the fact that it is simple to switch between the linear and logarithmic mode of the active pixel sensor by proper biasing of the reset/subthreshold transistor. This has been commercially marketed under the label LINLOG technology by Photonfocus AG and it uses a linear response at low illumination levels and logarithmic compression at high intensities. Clipping sensors
have anti-blooming structures that bleed off excess charge as it builds up. Control of integration time to extend dynamic range works on the fact that increasing integration time allows more charge to be stored in the pixel and this can be done either globally or locally. The advantage of controlling the integration time locally is that if the scene being captured consists of different illumination levels, the dynamic range at the brighter part of the scene is extended while the resolution at the darker regions is maintained. Most dynamic range enhancement efforts, specifically those requiring on-pixel circuitry, suffer from reduced fill factor, sensitivity and spatial resolution as well as increased mismatch.

It is clear that the ability to integrate circuitry on-chip with CMOS imagers has opened the doors to a wide range of applications and possibilities. Its flexibility has meant enhanced functionality of devices. From adaptive photocircuits and foveated pixels for robotic vision [Moini 1999], to unique readout and pixel reset structures [Yadid-Pecht 2003], to pixel-level ADCs for high frame rates [Kleinfelder 2001], to on-chip or in-pixel analogue memory [Simoni 1995] for motion detection, extended dynamic range and electronic shuttering; the possibilities seem endless for CMOS imaging.

1.7 CHAPTER SUMMARY

This chapter has emphasized the need for adaptive optics (AO) highlighting several key application areas where low cost real-time AO systems would be useful such as astronomy, ophthalmology, intra and extra-cavity laser correction, free space optical communications and microscopy. A fundamental part of any AO system is the wavefront sensor and with current Shack-Hartmann wavefront sensors, conventional imagers are used with limited frame rates ranging from 25 to 60 Hz. Using a dedicated CCD increases the frame rate but at the expense of increased cost, and the need for an image-processing step and special hardware still remains [de Lima Monteiro 2002]. In this thesis, a solution to the data bottleneck is proposed by integrating local centroid processing at the detector level.
There are several possible structures for implementing a position sensitive device (PSD) such as the lateral effect photodiode (LEP), the quad cell and the multi-pixel array. A lateral effect PSD requires large uniform sheet resistance for linear operation, which is not readily available in a standard CMOS process making integration with circuitry difficult. Quad cells have simple readout schemes but are not very linear. Multi pixel arrays have better linearity and positional range, which translates to larger tilt measurement capability. They also offer greater flexibility and are able to deal with multiple spots and non-uniform intensity profiles. The drawback is the increased computational load but for moderate array sizes this is reasonable and this was the architecture chosen for our system. A 5 x 5 pixel array was selected as a tradeoff between linearity and circuit complexity.
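
As an illustration of the computation a multi-pixel array requires, the Python sketch below evaluates an intensity-weighted first moment (centre of mass) over a 5 x 5 array. This is the standard textbook form of a centroid and is given here as an assumption; it is not claimed to be the exact arithmetic of the processor developed in this thesis. Coordinates are expressed in units of the pixel pitch with the origin at the centre of the array, and the pixel values are hypothetical.

```python
def centroid(image):
    """Intensity-weighted centroid (x, y) of a 2-D list of pixel values.

    Coordinates are in pixel pitches, measured from the array centre.
    """
    n_rows, n_cols = len(image), len(image[0])
    total = sum(sum(row) for row in image)
    if total == 0:
        raise ValueError("no light detected, centroid undefined")
    x_centre = (n_cols - 1) / 2
    y_centre = (n_rows - 1) / 2
    x_moment = sum(value * (col - x_centre)
                   for row in image for col, value in enumerate(row))
    y_moment = sum(value * (row_idx - y_centre)
                   for row_idx, row in enumerate(image) for value in row)
    return x_moment / total, y_moment / total

# Hypothetical 5 x 5 frame: a spot shared between the centre pixel
# and its right-hand neighbour.
spot = [[0] * 5 for _ in range(5)]
spot[2][2] = 3
spot[2][3] = 1

print(centroid(spot))
```

The result lies a quarter of a pixel to the right of centre, showing the subpixel resolution that a weighted sum provides and that a binary winner-take-all scheme cannot.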

Several technological options were highlighted and the standard CMOS process was chosen as the technology of choice as it allows the high levels of circuit integration needed to implement the local centroid processing. There have been various efforts to implement centroid detection on a CMOS process for numerous applications. In general, analogue multi-pixel array approaches suffer from low fill factor and sensitivity, requiring either separate x and y pixels or on-pixel circuitry such as a comparator or resistors. In addition, binary position sensing techniques using Winner Take All (WTA) circuitry or an on-pixel comparator do not offer subpixel accuracy and cannot cope with multiple spots or non-uniform spots. A dedicated digital centroid processor has yet to be demonstrated, though several generic image processors exist, and this research explores this approach. A dedicated digital centroid processor offers high accuracy and greater flexibility. Also, the processor can be made programmable and additional image processing tasks can easily be implemented if necessary.

The fundamentals of the photodetection mechanism were described along with issues of response, noise and operation. The junction photodiode structure was chosen as the basis of the imaging component as it is readily available in a standard CMOS process and offers good quantum efficiency as well as high linearity and dynamic range. In terms of pixel architectures, the photodiode active pixel sensor (APS) was selected as it offers high fill factor and low mismatch compared to other APS types. Ideally, the
pixel size has to be sufficiently large in order to achieve a large fill factor and sufficient tilt dynamic range. A large fill factor also means less mismatch.

In summary, in the proposed design each tilt sensor will consist of: i) a 5 x 5 photodiode active pixel sensor array in a standard CMOS process ii) a dedicated on-chip digital centroid processor to remove the data bottleneck. A discussion of the data bottleneck in current CCD systems and how our system addresses this is given in Appendix A1.1. The following chapters of this thesis will cover the design, fabrication and implementation of the proposed system.

Chapter 2 will discuss the results from the characterisation of fabricated full custom photodiodes in a standard CMOS process. Their suitability and performance are assessed. Chapter 3 then describes the use of a hardware emulation system to validate the functionality of the design prior to committing the design to silicon. The emulation system consists of a photodiode array as the front-end for light detection and a Field Programmable Gate Array (FPGA) as the digital backend that performs the centroid computation. The system was tested using both a commercial photodiode array and a fabricated full custom CMOS photodiode array. Chapter 4 then details the integration of a full custom CMOS photodiode array with on-chip digital centroid processing. Chapter 5 discusses the reconstruction of an optical wavefront from an array of centroid data and finally Chapter 6 will offer some concluding remarks and some discussions on possible further developments and improvements.
CHAPTER 2
CHARACTERISATION OF CMOS PHOTODIODES

2.1 INTRODUCTION
In the design of any complete system, particularly in VLSI, the individual parts of the system need to be evaluated and characterised before the complete system is fabricated. One of the fundamental building blocks of any optoelectronic system is the photodetector. As mentioned in the previous chapter, photodetectors are formed in a CMOS process by the generation of a p-n junction and are typically of the “well-substrate”, “diffusion-substrate” or “diffusion-well” photodiode types. This chapter covers the characterisation of these discrete photodiodes and the selection of the optimum device prior to the addition of any circuitry or processing.

2.2 FABRICATION OF TEST STRUCTURES
The process used for the test structures and also for the fabrication of the centroid processor is the Alcatel Microelectronics (Mietec)\(^{21}\) 0.7μm self-aligned twin-well, single-poly, double-metal layer CMOS process with LOCOS isolation [Europractice IC Service]. This process is accessed via IMEC in Belgium through the Europractice IC Service. The Europractice Multi-Project Wafer (MPW) service enables the prototyping to be carried out at a reduced cost. The main electrical and physical parameters of this process such as the resistivity, threshold voltage and transistor transconductance are highlighted in Appendix A2.1. However, to give a clearer view

\(^{21}\) Now known as AMI Semiconductor (AMIS) after AMIS acquired Alcatel Microelectronics' mixed-signal business activities from STMicroelectronics.
of the characterisation results the junction depths of the process have been illustrated in a typical CMOS cross section shown in Figure 2.1.

![Figure 2.1 Junction depths of the Mietec 0.7μm CMOS process](image)

2.2.1 FIRST CHARACTERISATION CHIP (PDFINAL)

Figure 2.2 shows the layout of the first test chip PDfinal. This chip contained the following: 1. well-substrate photodiodes; 2. diffusion-substrate photodiode; 3. combined well-substrate and diffusion-well photodiode; 4. lateral effect photodiode (LEP); 5. active pixel sensor; 6. 5-by-5 array of combined well-substrate and diffusion-well photodiodes. The various junction photodiodes were included in order to determine their relative response and characteristics as well as their individual variation with area and periphery. In addition, a combined device was designed and included in order to capture a wider range of wavelengths than either the well-substrate (deep) photodiode or the diffusion-substrate (shallow) photodiode; this will be discussed further in Section 2.2.3. The LEP is commonly used for position sensing as a custom device and was included for characterisation in a CMOS process, but it was not used in this work as the multi-pixel array approach was chosen for our application. Finally, the 5-by-5 photodiode array was included for use in the hardware emulation system of the centroid processor, which is described in Chapter 3.

\(^{22}\) The term p-substrate will be used frequently in this thesis and this will refer to the p-epilayer substrate and not the bulk substrate.
Initial characterisation of this chip revealed several issues. Firstly, light absorbed in the substrate and diffusing to the photodiode active region gave rise to crosstalk and a larger signal than expected. Secondly, the pads used were those available in the library and contained a diode protection structure (see Figure 2.3) which, if biased incorrectly, may interfere with the characterisation of the raw devices. Also, when operating the photodiode in reverse bias it was necessary to power up the protection circuit in order to avoid any forward bias current from the protection structure affecting the results. Nevertheless, we were able to obtain satisfactory responsivity values from the photodiodes, and the array on this chip allowed us to proceed with the development of the centroiding system, as will be discussed in the next chapter. The chip size was 2513.8µm x 2412.2µm (Area = 6.0638mm²) and it was packaged in a 44-pin ceramic J-leaded chip carrier (JLCC 44).

![Figure 2.2 Layout of 1st photodiode test chip (PDfinal)](image)

![Figure 2.3 Diode protection structure present in pads used in PDfinal](image)
2.2.2 SECOND CHARACTERISATION CHIP (CHIP1BFINAL)

Figure 2.4 shows the layout of the 2nd characterisation chip (chip1bfinal). In this chip several changes were made. Firstly, a metal light shield surrounding each structure was incorporated. However, the process required that holes be included in the metal every 25μm to relieve mechanical stress, which meant total blockage was not possible. Secondly, more structures were incorporated and the number of different sized devices for each structure was increased in order to better determine any area and perimeter scaling effects. Finally, it was necessary to design a pad without any additional circuitry on it to allow accurate characterisation of the photodiode test structures.

---

**Figure 2.4 Layout of the 2nd photodiode test chip (chip1bfinal)**

---

**Figure 2.5 Optical image of well-substrate photodiodes on test chip (transposed)**
The size of this chip was 3640μm × 3543μm, which is an area of approximately 12.9mm², and it was packaged in a 68-pin ceramic J-leaded chip carrier (JLCC 68). The full list of devices present on this chip is summarised in Table 2.1. Each device will henceforth be referred to by its assigned short name, which is also used in Figure 2.4.

<table>
<thead>
<tr><th>Short name</th><th>Photodiode type</th><th>Size of photodiode</th></tr>
</thead>
<tbody>
<tr><td>deep1</td><td rowspan="6">n-well/p-substrate with n+ removed (deep)</td><td>30μm × 30μm</td></tr>
<tr><td>deep2</td><td>60μm × 60μm</td></tr>
<tr><td>deep3</td><td>80μm × 80μm</td></tr>
<tr><td>deep4</td><td>100μm × 100μm</td></tr>
<tr><td>deep5</td><td>160μm × 160μm</td></tr>
<tr><td>deep6</td><td>200μm × 200μm</td></tr>
<tr><td>ndeep1</td><td rowspan="2">n-well/p-substrate with n+ across (deep with n+)</td><td>100μm × 100μm</td></tr>
<tr><td>ndeep2</td><td>200μm × 200μm</td></tr>
<tr><td>nshal1</td><td rowspan="6">n+/p-substrate (shallow n+)</td><td>30μm × 30μm</td></tr>
<tr><td>nshal2</td><td>60μm × 60μm</td></tr>
<tr><td>nshal3</td><td>80μm × 80μm</td></tr>
<tr><td>nshal4</td><td>100μm × 100μm</td></tr>
<tr><td>nshal5</td><td>160μm × 160μm</td></tr>
<tr><td>nshal6</td><td>200μm × 200μm</td></tr>
<tr><td>pshal1</td><td rowspan="6">p+/n-well (shallow p+)</td><td>30μm × 30μm</td></tr>
<tr><td>pshal2</td><td>60μm × 60μm</td></tr>
<tr><td>pshal3</td><td>80μm × 80μm</td></tr>
<tr><td>pshal4</td><td>100μm × 100μm</td></tr>
<tr><td>pshal5</td><td>160μm × 160μm</td></tr>
<tr><td>pshal6</td><td>200μm × 200μm</td></tr>
<tr><td>ncomb1</td><td rowspan="6">Combined n+/p-substrate and n-well/p-substrate</td><td>30μm × 30μm</td></tr>
<tr><td>ncomb2</td><td>60μm × 60μm</td></tr>
<tr><td>ncomb3</td><td>80μm × 80μm</td></tr>
<tr><td>ncomb4</td><td>100μm × 100μm</td></tr>
<tr><td>ncomb5</td><td>160μm × 160μm</td></tr>
<tr><td>ncomb6</td><td>200μm × 200μm</td></tr>
<tr><td>pcomb1</td><td rowspan="6">Combined p+/n-well and n-well/p-substrate</td><td>30μm × 30μm</td></tr>
<tr><td>pcomb2</td><td>60μm × 60μm</td></tr>
<tr><td>pcomb3</td><td>80μm × 80μm</td></tr>
<tr><td>pcomb4</td><td>100μm × 100μm</td></tr>
<tr><td>pcomb5</td><td>160μm × 160μm</td></tr>
<tr><td>pcomb6</td><td>200μm × 200μm</td></tr>
<tr><td>APSPMOS</td><td>Active pixel sensor with PMOS reset gate (n-well/p-substrate photodiode)</td><td>100μm × 100μm</td></tr>
<tr><td>APSCMOS</td><td>Active pixel sensor with CMOS reset gate (n-well/p-substrate photodiode)</td><td>100μm × 100μm</td></tr>
<tr><td>PDarray</td><td>5 by 5 n-well/p-substrate (deep4) photodiode array</td><td>100μm × 100μm</td></tr>
</tbody>
</table>

Table 2.1 Devices present in characterisation chip ‘chip1bfinal’
Figure 2.6 shows the layout and cross section of the deep photodiode with n+ removed from the active region except for the cathode contact region (i.e. deep1-6 and those used in PDarray). In addition, the simplified cross sections of the other photodiode types present on the chip are shown in Figure 2.7.

Figure 2.6 Layout and cross section of fabricated well-substrate (deep1-6 and PDarray) photodiode
2.2.3 COMBINED PHOTODIODES

In Section 1.4.1 we observed that the absorption depth of a photon depends upon the wavelength. As a result the two different junction depths (at 0.3μm and 2μm) in theory lend themselves to being sensitive to different wavelengths. Hence the combined devices were designed and fabricated in order to extend the spectral
response of the typical photodiode over a wider range of wavelengths. Two types of combined photodiodes are possible: the combined shallow p+/n-well and deep n-well/p-substrate photodiode (pcomb1-6, Figure 2.7(d)) and the combined shallow n+/p-well and deep n-well/p-substrate photodiode (ncomb1-6, Figure 2.7(c)). In the first chip, the former was included. In the second chip, both were included. The ncomb devices are in effect two discrete photodiodes laterally adjacent to each other and will not provide any additional advantage when focused light is used, as in the intended application. In fact, in the intended application of finding a centroid, the use of this device would be detrimental due to its non-symmetrical spatial response. As for the pcomb devices, when tested in room light they were found to become very leaky with increasing reverse bias voltage. This observed effect is shown in Figure 2.8. It is believed to be due to the depletion region being formed at the surface of the photodiode, leading to a large leakage current as a result of the large electric field caused by increased mechanical stress and the increased number of surface traps present [Bogaerts 2000, Pain 2001]. As the reverse bias voltage increases, the leakage current increases as a result of the larger depletion width. As a consequence of these observations, the majority of the characterisation work presented henceforth will be focused on the deep (Figure 2.6) and shallow devices (Figure 2.7(a) and 2.7(b)). However, results on the combined devices will be presented where deemed relevant to highlight their uniqueness or simply for completeness.

![Graph](image)

**Figure 2.8 I-V response in room light showing increased leakage of combined shallow p+/deep (pcomb1-6) device compared to the deep (deep1-6) device for reverse bias operation (PDfinal)**


2.3 DARK RESPONSE OF PHOTODIODES

The response of a photodiode can be evaluated under dark or illuminated conditions. Its response in the dark is assessed by its current-voltage (I-V) characteristics and its capacitance-voltage (C-V) characteristics.

2.3.1 DARK I-V MEASUREMENTS

The dark current of a photodiode determines the smallest detectable photocurrent and hence the dynamic range achievable. Dark current also gives rise to shot noise. It is therefore necessary, particularly for low light level applications, to quantify the amount of dark current present in the system, and I-V measurements of the devices under no illumination, i.e. in the dark, were carried out. Note that the direction of the current and voltage on the I-V plots to be shown is such that a positive current and a positive voltage represent the photodiode operating in reverse bias, i.e. in quadrant 3 of a typical I-V plot of a photodiode (the plot is therefore transposed).
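Returning to the shot noise mentioned above, the following minimal sketch evaluates the standard shot noise expression, i_n = sqrt(2 q I B), for a given dark current. The 1pA current and 1Hz bandwidth are illustrative values only, and the function name is not taken from the original work.

```python
import math

Q_E = 1.602176634e-19  # elementary charge in coulombs


def shot_noise_rms(i_dark_a, bandwidth_hz):
    """RMS shot noise current (A) for a DC current i_dark_a over bandwidth_hz."""
    return math.sqrt(2.0 * Q_E * i_dark_a * bandwidth_hz)


# Illustrative values (assumptions): 1 pA of dark current in a 1 Hz bandwidth.
noise_1pa_1hz = shot_noise_rms(1e-12, 1.0)
print(f"shot noise: {noise_1pa_1hz:.3e} A rms")
```

Because the noise grows only with the square root of the dark current, a hundredfold increase in leakage raises the shot noise by a factor of ten.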

2.3.1.1 Experimental setup for dark I-V measurements

Figure 2.9 shows the setup. The dark current was measured using a Keithley 236 Source-Measure Unit. The unit is capable of measuring currents as low as 10fA and sourcing voltages from 100μV to 110V. In order to avoid any pickup of electromagnetic interference, the sample was placed in a metallic die-cast box with a coaxial connection. The connections on the Keithley are made through triaxial cables. In order to convert the triaxial connections to a coaxial connection, a second die-cast box was made. Initially a PCB was built to house the sample (photodiode chip) but it was found to introduce too high a leakage current, even with the chip mounted simply in its JLCC socket. The lowest leakage current was obtained with the packaged chip tested on its own with no socket or PCB, that is, with the test probes connected directly to the pins of the packaged chip.
The measured dark current of the fabricated photodiodes is typically less than 1pA. With the added DC leakage from the cabling, packaging and housing, the actual leakage current can be much larger. The system DC leakage was therefore measured with no sample (photodiode) attached, i.e. with just the cables and die-cast box. These results are shown in Figure 2.11. Here we can see that the DC leakage of the system corresponds to a resistance of the order of 833 GΩ with a 1pA offset.

Figure 2.9 Experimental setup (left) and Keithley 236 Source-Measure Unit (right)

Figure 2.10 Die-cast box to hold sample (left) and die-cast box for triax to coaxial connection (right)

Figure 2.11 I-V measurement of the cables showing systematic error in the setup

Hence this systematic error was subtracted from the photodiode readings of the test devices. Figure 2.12 shows the dark current measurements for the deep photodiodes with n+ across (ndeep1, ndeep2) before and after subtraction of the cable offset.
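One possible way to carry out such a subtraction is sketched below. It assumes the cable-only sweep can be modelled as a straight line, I = V/R + I_offset, using the 833 GΩ and 1pA figures quoted above. The sweep data, the photodiode current and the function names are all synthetic or hypothetical, and the original work may simply have subtracted the raw cable readings point by point.

```python
import numpy as np


def fit_cable_leakage(v_cable, i_cable):
    """Least-squares straight-line fit I = V / R_leak + I_offset to a cable-only sweep."""
    slope, offset = np.polyfit(v_cable, i_cable, 1)
    return 1.0 / slope, offset  # (R_leak in ohms, I_offset in amps)


def subtract_cable_leakage(v, i_measured, r_leak, i_offset):
    """Remove the fitted cable contribution from a measured I-V sweep."""
    return np.asarray(i_measured) - (np.asarray(v) / r_leak + i_offset)


# Synthetic cable-only sweep built from the figures quoted in the text.
v = np.linspace(0.0, 5.0, 11)
i_cable = v / 833e9 + 1e-12

r_fit, i_off_fit = fit_cable_leakage(v, i_cable)

# Hypothetical photodiode dark current (not measured data) plus the cable leakage.
i_diode_true = 0.1e-12 * v
i_corrected = subtract_cable_leakage(v, i_diode_true + i_cable, r_fit, i_off_fit)
```

Since the cable contribution was found to depend on cable position, a fit of this kind would need to be repeated for each measurement rather than reused.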

![Graph showing dark current measurements for deep photodiodes with n+ across](image)

**Figure 2.12** Dark I-V measurements of deep photodiodes with n+ across (ndeep1, ndeep2) before (left) and after (right) subtraction of cable offset

2.3.1.2 Results and discussion of dark I-V measurements

From a closer look at the dark current measurement of the deep photodiodes with n+ across in Figure 2.13, it can be seen that the plots do not pass through the origin indicating that a systematic error in the reading still exists. We can however see that at 2V the dark current of ndeep1 and ndeep2 is estimated to be 0.35pA and 0.5pA respectively.

![Graph showing close up of dark I-V measurements for deep photodiodes with n+ across](image)

**Figure 2.13** Close up of dark I-V measurements of deep photodiodes with n+ across (ndeep1, ndeep2)

It was also found that the measurements of the dark current were affected by the position of the cable in the die-cast box, possibly due to triboelectric effects at such low currents. Hence it was necessary to measure the offset introduced by the cable for every measurement of the sample, with the cable in roughly the same position. This was difficult, but the dark current was found to be of the order of 0.2 - 1.0pA for a reverse bias voltage of 2.0 - 4.0V for both the deep photodiodes (deep1-6 and ndeep1-2) and the shallow photodiodes (nsha1-6, psha1-6). This is considerably larger than the value obtained through simulation and could be due to the uncertainty in the cable measurement and parasitics in the connection of the photodiode to the outside world, i.e. pad and wiring capacitance and resistance. However, these results allowed the typical measurement accuracy of the characterisation system to be determined and provided a figure for the dark current limits for deciding the next stage of the design.

An interesting observation was made in the forward bias currents of the deep devices with n+ removed (deep1-6). The forward bias current in these devices does not rise exponentially as in a typical forward biased diode but was significantly smaller. Initially, because of its somewhat linear response, this was thought to be due to a large load resistance in series with the diode introduced somewhere in the design or in the setup. However when modelled for this, it showed that this was not the case as a large resistance would make the response linear at an early stage of the bias. Forward I-V plots of deep2 and deep3 are shown in Figure 2.14 and are shown in comparison to ndeep1 and a simulation plot of a 50kΩ resistor in series with a deep3 photodiode model. The reason for this anomaly was later discovered and will be explained in Section 2.3.2.2.

![Figure 2.14 Forward bias currents of deep2 and deep3 showing abnormal behaviour compared to ndeep1](image-url)

2.3.2 DARK C-V MEASUREMENTS

The response time of a photodiode is dependent on the drift time of charge carriers across its depletion region, the charge collection time of carriers outside of the depletion region diffusing to it, and the RC time constant of the photodiode and the circuit [Centronic Ltd. 1998, UDT Sensors Inc., Zimmermann 2000]. This response time is highly dependent on the applied bias voltage. By increasing the applied reverse bias, the depletion region of the diode increases thereby reducing the diffusion time\(^{23}\) of the photodiode. The RC time constant also decreases because the capacitance of the photodiode, which arises from the junction capacitance of the depletion region, is inversely proportional to the width of the depletion region [Sze 1981]. Depending on the circuitry connected to the photodiode the RC time constant could very well dominate the response time of the system. As such it was necessary to characterise the photodiodes in terms of their C-V characteristics. This would also allow one to determine a suitable operating voltage for the photodiode depending on the application.
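For reference, the dependence described above can be written using the textbook one-sided abrupt junction approximation [Sze 1981]. This is a simplified sketch: the symbols \(A\) (junction area), \(W\) (depletion width), \(V_{bi}\) (built-in potential), \(V_R\) (applied reverse bias), \(N\) (doping concentration of the lightly doped side) and \(R\) (load resistance) are introduced here for illustration, and the graded junctions of a real process will deviate from the square-root law.

\[
C_j = \frac{\varepsilon_s A}{W}, \qquad W = \sqrt{\frac{2\varepsilon_s\,(V_{bi} + V_R)}{qN}}, \qquad \tau_{RC} = R\,C_j
\]

Increasing \(V_R\) therefore widens \(W\), lowers \(C_j\) and shortens \(\tau_{RC}\).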

The C-V characteristics of the various junction diodes in the Mietec 0.7\(\mu\)m CMOS process were simulated using PSpice and the models provided. Figure 2.15 shows the C-V plots for the p+/n-well (pshall-6), n+/p-well (nshall-6) and the n-well/p-substrate (ndeep1, ndeep2) junction diodes. The values shown are for an area of 10000 \(\mu\)m\(^2\) (100\(\mu\)m x 100\(\mu\)m).

\(^{23}\)The drift time also decreases due to the increase of drift velocity, \(v_d\), with electric field, \(E\), applied \((v_d=\mu E\) where \(\mu\) is the mobility of carriers\). However, once saturation is reached, the drift velocity does not increase further and drift time, \(t_d\), increases with depletion width, \(w\) \((t_d=w/v_d)\).
Figure 2.15 Simulated C-V plots for the (a) p+/n-well (pshall-6), (b) n+/p-well (nshall-6) and (c) n-well/p-substrate (ndeep1-2) junction diodes (all devices are of area 100µm x 100µm)

The software does not simulate periphery capacitance but it took into account scaling factors and grading coefficients provided in the models. Table 2.2 summarises the parameters from the models and the results of the simulation.

<table>
<thead>
<tr>
<th rowspan="2">Photodiode type</th>
<th colspan="2">Process datasheet (0V)</th>
<th rowspan="2">Simulations (PSpice)</th>
</tr>
<tr>
<th>\(C_j\) (pF/µm²)</th>
<th>\(C_{jsw}\) (pF/µm)</th>
</tr>
</thead>
<tbody>
<tr>
<td>p+/n-well (pshall-6)</td>
<td>\(6.0 \times 10^{-4}\)</td>
<td>\(3.6 \times 10^{-4}\)</td>
<td></td>
</tr>
<tr>
<td>n+/p-well (nshall-6)</td>
<td>\(5.0 \times 10^{-4}\)</td>
<td>\(2.8 \times 10^{-4}\)</td>
<td></td>
</tr>
<tr>
<td>n-well/p-substrate (ndeep1, ndeep2)</td>
<td>\(7.89 \times 10^{-5}\)</td>
<td>\(7.33 \times 10^{-4}\)</td>
<td></td>
</tr>
</tbody>
</table>

Table 2.2 Junction capacitance of diodes based on model parameters and simulations

Based on these values, the periphery capacitance of the shallow photodiodes (nshall1-6, pshall1-6) only starts to dominate the total capacitance value for areas < 1µm x 1µm. But with the deep photodiode (ndeep1, ndeep2) the periphery capacitance remains the dominant capacitance for areas up to 10µm x 10µm. The deep photodiodes have the lowest capacitance. This is because they have the largest depletion region due to their lower doping concentration. At zero bias the p+/n-well photodiode has a larger capacitance than the n+/p-well photodiode but it becomes lower at bias voltages of more than 2V. The p+/n-well photodiode has the largest variation in capacitance with voltage while the deep photodiodes have the smallest, making them suitable for use in the integrating and discharge mode. For a 100μm x 100μm device, the calculated zero bias junction capacitances of the p+/n-well, n+/p-well and n-well/p-substrate diodes are 6.14pF, 5.11pF and 1.08pF respectively.
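The quoted zero bias values can be reproduced from the datasheet parameters in Table 2.2 by adding the area term and the sidewall (periphery) term for a square device. The short sketch below does this; the dictionary and function names are illustrative and not part of the original work.

```python
# Zero-bias datasheet parameters from Table 2.2: (Cj in pF/um^2, Cjsw in pF/um).
DATASHEET_0V = {
    "p+/n-well": (6.0e-4, 3.6e-4),
    "n+/p-well": (5.0e-4, 2.8e-4),
    "n-well/p-substrate": (7.89e-5, 7.33e-4),
}


def zero_bias_capacitance_pf(cj, cjsw, side_um):
    """Area plus sidewall capacitance (pF) of a square junction of the given side."""
    area_um2 = side_um ** 2
    perimeter_um = 4.0 * side_um
    return cj * area_um2 + cjsw * perimeter_um


totals = {
    name: zero_bias_capacitance_pf(cj, cjsw, 100.0)
    for name, (cj, cjsw) in DATASHEET_0V.items()
}

for name, c_pf in totals.items():
    print(f"{name}: {c_pf:.2f} pF")
```

For the 100μm x 100μm devices the sidewall term contributes roughly 0.14pF, 0.11pF and 0.29pF respectively, so it is a small correction for the shallow junctions but a significant fraction of the total for the deep junction.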

It should be noted that the process parameters can vary from run to run and with external conditions such as temperature. This makes the simulations an estimate at best. Mietec provide a set of models to account for the variation in process parameters such as threshold voltage, gate oxide thickness and gate lengths drawn. The models provided are TYP for nominal process conditions, FAST for fast devices (to estimate worst-case power dissipation) and SLOW for slow devices (to estimate worst-case delay). Simulations were mainly performed and shown for TYP but illustrated in Figure 2.16 is the effect of process variations on the C-V characteristics of an n+/p-well junction diode.

Figure 2.16 Simulations of the C-V characteristics of a 100μm x 100μm n+/p-well diode

2.3.2.1 Experimental setup for C-V measurements

In order to verify the values given by the simulations, the capacitance was measured using a Boonton Electronics Capacitance Meter (Model 72B) capable of measuring capacitance down to a resolution of 0.01pF. In order to eliminate any parasitic capacitance (cable, package and bondpads) from the measurement, a differential measurement was made. The capacitance meter allows a direct difference measurement to be made at its two terminals. So by connecting one photodiode of a particular size to one terminal and another photodiode of a different size to the other terminal, a reading corresponding to the capacitance of the difference in size of the two photodiodes is obtained. This of course assumes the stray capacitances in the two connections are similar. A reverse bias voltage was applied to the photodiodes via the back of the capacitance meter.

2.3.2.2 Results and discussions for C-V measurements

Figure 2.17 shows the differential capacitance measurement between ndeep2 and ndeep1. The result is hence equivalent to the effective capacitance of a 30000μm² (or 0.03mm²) deep photodiode with n+ across. The C-V response obtained has the characteristic inverse shape of a C-V curve we would expect from a junction diode and it agrees satisfactorily with simulations.

![Figure 2.17 Differential C-V measurements of deep photodiodes with n+ across](image)

Figure 2.17 Differential C-V measurements of deep photodiodes with n+ across (effective area of 30000μm² ≈ 170μm x 170μm) compared with the simulation model

However, when the deep photodiodes without n+ (deep1-6) were tested, an unusual C-V response was obtained, just as we saw with the I-V plots for these devices. The response observed was essentially and uncharacteristically flat, as shown in Figure 2.18. In general the capacitance of these devices was lower than that obtained with the deep photodiodes with n+ across (ndeep1, ndeep2).

![Figure 2.18 Differential C-V measurements (left) of deep photodiodes with n+ removed (deep1-6) and compared to simulations (right) for an effective area of 39100μm² (~ 200μm x 200μm)](image)

The reasons for requesting the removal of the n+ layer were to maximize the light entering the substrate without being strongly absorbed at the surface and to reduce recombination of photogenerated carriers in this region, and hence improve the overall quantum efficiency. Also, silicide, which is opaque to light, is used in CMOS processes to reduce the resistivity of the diffusion and polysilicon layers [Yang 1996], making it necessary to remove these layers for image sensing. In order to remove the n+ diffusion layer, Mietec allowed the use of a reserved layer called NO_GEN to indicate where diffusion is to be removed. The layout rule provided by Mietec describing its usage is as follows:

IGS layer 3 (NMOS_FIELD) and IGS layer 16 (N+ _IMPLANT) are automatically generated, unless in these areas covered by IGS layer 61 (NO_GEN). On these IGS layers, all data that is not covered by IGS layer 61 will be ignored during mask preparation.

Figure 2.19 shows how NO_GEN was used to generate an active region without any diffusion layer. Note that the use of NO_GEN as shown gave a design rule warning because this was a reserved layer that was not recognised by the design rule checker used. However, when checked via IMEC's Dracula design rule checker the layer was recognised and the diffusion layer was removed at that junction.

![Diagram](image.png)

**Figure 2.19 Use of NO_GEN to remove diffusion layers in an active area**

It was later established that the Mietec 0.7μm CMOS process used was a polycide process and not a salicide process. But the abnormal results obtained with the deep devices with n+ removed necessitated closer inspection of the device. It seems that the use of NO_GEN layer over the explicitly drawn n+ region had removed the n+ diffusion layer here as well, despite the description of the layout rule. As a consequence a Schottky barrier diode was formed between the n-well and the cathode (K) contacts. Hence two diodes in series were formed as illustrated in Figure 2.20, which explains the C-V characteristics as well as the forward bias current obtained.

---

\(^{24}\) Silicidization is the process of depositing metal (typically titanium or cobalt) on to the silicon in order to lower the resistance of the polysilicon interconnect or the source-drain contact. In a polycide process only the polysilicon is silicided. In a silicide process both the polysilicon gate and the source-drain regions are silicided. If this silicide process is a self-aligned process, it is usually termed salicide.

The overall capacitance of the photodiode is the capacitance of these two diodes in series and as such the smaller of the two capacitances dominates. The Schottky barrier diode is formed only over the contact area and is hence much smaller than the junction diode. When the deep junction is reverse biased the Schottky barrier is forward biased. However, since it is small its capacitance dominates. Hence a smaller and more linear capacitance value is obtained, which agrees with that observed. As the deep junction capacitance drops with increasing bias it will come into play in determining the overall capacitance. The effect of the Schottky diode on the photoresponse in the reverse bias operation of the photodiode is not observable, which is reasonable because under these conditions the Schottky diode is forward biased. This will be shown later. Paradoxically, the lower capacitance obtained with these deep devices is a useful by-product for high-speed applications and increased charge conversion efficiency. Also, the linear C-V curve obtained will give rise to a linear discharge curve when the photodiode is used in an integrating mode, such as in an integrating active pixel sensor.
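The series argument above can be illustrated numerically. In the minimal sketch below the two capacitance values are hypothetical placeholders chosen only to show the trend; they are not measured values for the junction or the Schottky contact.

```python
def series_capacitance(c1, c2):
    """Equivalent capacitance of two capacitors connected in series."""
    return c1 * c2 / (c1 + c2)


# Hypothetical values in pF, for illustration only.
c_junction_pf = 1.0   # large-area deep junction
c_schottky_pf = 0.1   # small-area Schottky contact

c_total_pf = series_capacitance(c_junction_pf, c_schottky_pf)
print(f"series capacitance: {c_total_pf:.4f} pF")
```

The result is always below the smaller of the two values, and when one capacitance is much smaller than the other the total is close to the smaller one and only weakly dependent on the larger one.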

It should be pointed out that the diode capacitances measured do not scale linearly with the effective area, and are not expected to, because a differential measurement will have a lower periphery component than a direct measurement. Consider a difference measurement between a 200μm x 200μm and a 100μm x 100μm diode: the effective area measured will be 30000μm² and the effective periphery measured will be the difference in periphery, which is 400μm. However, for an area of 30000μm² the periphery expected would be close to 700μm. A more accurate estimate of the capacitance is obtained by considering the measurement for the largest difference in area, for example the measurement for deep6 – deep1. Figures 2.21, 2.22 and 2.23 show the measured C-V characteristics of the shallow n+, the shallow p+ and both the combined photodiodes respectively.
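The periphery argument for differential measurements can be checked with a few lines of arithmetic. The sketch below computes the area difference, the periphery difference and the periphery of a single square diode of the same area. The 200μm side used for deep6 is inferred from the quoted effective area of 39100μm² together with the 30μm deep1 device, and the function name is illustrative.

```python
import math


def differential_geometry(side_large_um, side_small_um):
    """Return (area difference, periphery difference, periphery of an equal-area square)."""
    area = side_large_um ** 2 - side_small_um ** 2
    periphery = 4.0 * (side_large_um - side_small_um)
    equivalent_periphery = 4.0 * math.sqrt(area)
    return area, periphery, equivalent_periphery


# Example from the text: 200 um and 100 um square diodes.
area_a, peri_a, equiv_a = differential_geometry(200.0, 100.0)

# Largest difference: deep6 (side inferred as 200 um) minus deep1 (30 um).
area_b, peri_b, equiv_b = differential_geometry(200.0, 30.0)

print(area_a, peri_a, round(equiv_a, 1))
print(area_b, peri_b, round(equiv_b, 1))
```

The measured periphery is about 58% of the equal-area value in the first case and about 86% in the second, which is consistent with the largest size difference giving the more representative estimate.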

Figure 2.21 Differential C-V measurements of shallow n+ photodiodes (nshall1-6) (left) and compared to simulations for an effective area of 39100μm² (right)

Figure 2.22 Differential C-V measurements of shallow p+ photodiodes (pshall1-6) (left) and compared to simulations for an effective area of 39100μm² (right)

Figure 2.23 Differential C-V measurements of combined shallow n+/deep photodiodes (ncomb1-6) (left) and combined shallow p+/deep photodiodes (pcomb1-6) (right)

The p+/n-well (pshall-6) photodiode has a larger capacitance than the n+/p-well (nshall-6) photodiode and both have a larger capacitance per unit area than the deep photodiodes, which agrees with what the simulations suggest. The measured capacitances of the combined devices are also shown. The pcomb devices exhibit a strange response which has yet to be explained, but it is thought to be due to the formation of the depletion region at the surface and the way this depletion region grows with reverse bias until it eventually meets the n+ collection region, leading to punch-through.

2.4 PHOTORESPONSE OF PHOTODIODES

The photoresponse of a photodiode can be evaluated in terms of its spatial and spectral sensitivity. The following sections detail experimental work carried out in determining both of these responses for the fabricated standard CMOS photodiodes.

2.4.1 SPATIAL RESPONSE

Edge-effects due to the lateral diffusion of photogenerated carriers in imaging detectors lead to an increase in photocurrent at the periphery and a larger effective charge collection area than the actual geometry of the photodiode [Holloway 1983]. This effect is expected to be more pronounced in small photodiodes, which have a larger perimeter-to-area ratio. A series of photodiodes of varying sizes was included in the second characterisation chip (chip1bfinal) in order to evaluate this. Also, with the first characterisation chip, the effect of lateral crosstalk was seen. Lateral crosstalk arises from the diffusion of lateral photocharge from outside the pixel region, either from a neighbouring photodiode or from collection in the substrate. The effect of this for imaging applications is that the contrast obtained will be significantly degraded, and decreasing the pixel size to increase resolution will reach a limit if this crosstalk is not removed. The following section demonstrates and evaluates this issue.

2.4.1.1 Experimental setup of spatial response test

In order to determine the spatial photoresponse of the photodiodes, a laser beam (667nm) was focused to a spot of approximately 5\(\mu\)m and scanned across the area of the photodiode. This was done by placing the sample on a scanning stage and adjusting the height and position of the stage such that the laser was focused on the sample. The stage is controlled by a PC to move in 2 dimensions to cover the scanning area, with the focused laser remaining fixed. The scanning stage is capable of moving in step sizes as small as 1\(\mu\)m but a step size of 5\(\mu\)m was mainly used, as too small a step would lead to excessively long scan times. It would also be unnecessary to make the step size much smaller when the spot size is limited to 5\(\mu\)m anyway. The PRO8000 laser diode controller from Profile, Germany, was used to control the laser output power over a range of 2 decades (0.4mW - 10mW). It also maintains the temperature of the laser at a specified level for stability, and a temperature of 25°C (room temperature) was chosen. Figure 2.24 shows the setup of the scanning system. The power of the laser diode was set at 0.4mW with no neutral density filter (NDF) in the optical path, but after going through the optics the power incident on the chip was approximately 82\(\mu\)W. The reflected beam was imaged on a reference photodiode to obtain an image of the scan and to determine if the setup was in focus. As with the dark current measurements, the Keithley Source-Measure Unit was used to apply a bias voltage and take the current measurement. The scan was performed with a reverse bias voltage of 2V applied to the test photodiode. However, unlike the dark current measurements, the Keithley was controlled through the IEEE 488.2 GPIB (General Purpose Interface Bus) interface [Keithley Instruments Inc. 2001] to allow automatic collection of data\(^{25}\).
However, it was necessary to wait for a period of at least 3s after setting the bias conditions before taking a reading from the Keithley as the bus remains busy for this period. Consequently a time between readings of 5s was used throughout. A test board allowed each photodiode on the test chip to be tested in turn by connecting the appropriate jumper. The schematic and PCB of the test board for the scanning experiment are included in Appendix A2.2.

---

\(^{25}\) When controlling the Keithley through the GPIB, the autoranging feature of the Keithley would fail at low measured currents and an arbitrary value of +0.001mA was obtained. When that occurred, it was necessary to change the measurement range. The easiest way to do this was to check the reading obtained and, if the reading went out of range, have the program switch to an appropriate measurement range.

Figure 2.24 Setup of scanning system for the characterisation of fabricated photodiodes for spatial response measurements
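To illustrate why the step size was limited, the following minimal sketch estimates the duration of a raster scan from the 5s interval between readings quoted above. The 200μm x 200μm scan field is a hypothetical value chosen for illustration, the function names are not from the original work, and stage travel time is neglected.

```python
def scan_points(extent_um, step_um):
    """Number of sample positions along one axis, including both end points."""
    return int(round(extent_um / step_um)) + 1


def scan_duration_s(extent_x_um, extent_y_um, step_um, seconds_per_reading=5.0):
    """Total acquisition time of a raster scan, ignoring stage travel time."""
    n_points = scan_points(extent_x_um, step_um) * scan_points(extent_y_um, step_um)
    return n_points * seconds_per_reading


# Hypothetical 200 um x 200 um scan field.
t_5um = scan_duration_s(200.0, 200.0, 5.0)
t_1um = scan_duration_s(200.0, 200.0, 1.0)

print(f"5 um step: {t_5um / 3600:.1f} h")
print(f"1 um step: {t_1um / 3600:.1f} h")
```

Under these assumptions the scan time rises from a few hours to more than two days when the step is reduced from 5μm to 1μm.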

2.4.1.2 Results and discussion of spatial sensitivity measurements

Figure 2.25 shows the optical image obtained from a scan of the deep4 photodiode and Figure 2.26 shows the spatial results in both x and y direction obtained from this scan. The increase of photocurrent at the edges is due to the side-wall of the photodiode providing a larger volume depletion region (see Figure 2.1) and hence collection region. Also the large number of defects at the edges, particularly at the surface and at the field-oxide/well-junction interface, could contribute to its presence. During the oxidation process in chip fabrication, stresses are generated that slightly lift the protective nitride at its edges, creating a tapered oxide called a bird's beak. The LOCOS or bird's beak region is the transition between the field oxide and the thin oxide that covers the n+ implant and is under elevated mechanical stress. The presence of this can lead to a larger leakage current. In a recent paper by Hornsey and Renshaw [Lee 2003 (Part II)], it was observed that the edge-effect in CMOS photodiodes is significantly affected by surface recombination and mobility degradation along the Si-SiO₂ interface.
It can be seen that edge effects are more significant in the x-direction than the y-direction. It is not yet clear why this is so. However it is felt that shadowing effects in the optics made the observed edge effects more pronounced than they actually are as the edge effects are also seen in the optical image which is the reflection of the beam from the surface. Also the structure is not perfectly planar, particularly at the edges, and variation of type and thickness in the layers will mean the relative effect of the response between the edges and its centre could depend on the wavelength used and the reflections that occur. Furthermore, there exists a grain in the wafer, which can be observed in the optical image previously shown (Figure 2.5) and will give rise to different responses depending on where along the grain the spot lies. Another issue is that the photoresponse extends outside the area of the exposed photodiode. So it is possible that at the edges, diffraction effects and multiple reflections in the passivation layers are occurring, and not discounting the possibility that the light spot is diffused significantly more than expected by the imaging optics. Stray and scattered light was also an issue in the experiment.

Figure 2.25 Optical image from reflected beam

Figure 2.26 Scan of 100μm x 100μm deep photodiode with n+ removed (deep4)

Figure 2.27 shows the scan of different sized devices of the deep photodiodes with n+ removed. The edge effects can again be seen, except for ‘deep1’, the 30µm x 30µm device, where only a single peak exists at the centre of the pixel. For the other sizes (i.e. deep2-6) the response at the centre decreases with increasing size. It seems that as the pixel size gets smaller the peaks get closer together increasing the response at the centre until the peaks merge and further decrease in size reduces the central response.

For applications where a flooded light source is required and the pixel size dictates the resolution, there would be an optimum size in the trade-off between sensitivity and resolution [Chen 2000]. In the case of a focused spot, however, the size of the device is not expected to matter until it becomes comparable to the spot size. However, because of the edge effects and non-uniform response, it may be prudent to make the detector somewhat larger than the spot size.

Crosstalk

In scanning a laser beam across the chip, two sources of crosstalk could be seen: diffusion of photogenerated carriers from neighbouring photodiodes and diffusion of carriers from the exposed substrate. The crosstalk from neighbouring photodiodes could be removed by grounding the neighbouring photodiodes. This is illustrated in Figure 2.28 (a) and (b). Figure 2.28 (a) shows the photocurrent detected by the device in the centre as the beam is scanned across the other photodiodes which were left floating. By grounding these devices this crosstalk was removed as shown in Figure 2.28 (b). Figure 2.28 (c) and (d) show the measured photocurrents along \( y=0 \) and \( x=0 \) of the scan before and after grounding of the neighbouring photodiodes. Although the crosstalk from neighbouring photodiodes has been removed, the crosstalk from the substrate still remains. To remove crosstalk from the substrate a metal light shield is placed around each photodiode to block the incident light in this region. However due to the large diffusion length of the carriers relative to the scale of the devices, substrate current as far as 300\( \mu \)m away from the pixel is still detected by the photodiode under test. This implies either a larger area light shield is required or a guard ring or parasitic photodiode structure is needed to absorb the leakage current. However, it can be seen that the crosstalk from the substrate is also reduced when the neighbouring photodiodes are grounded because some of the diffused substrate current is now drawn and collected by the other photodiodes.

![Measured photocurrent](image)

(a) Neighbouring photodiodes floating

Figure 2.28 Scan of ‘deep4’ photodiode with crosstalk present and crosstalk removed
Figure 2.29 shows the response obtained from the scan of the 100\( \mu \text{m} \) x 100\( \mu \text{m} \) deep photodiode with n+ across. Several cross sections of the scan are shown. The crosstalk from the substrate clearly shows the extent of the diffusion length of the carriers. The minority carrier diffusion lengths of epitaxial silicon in modern CMOS processes are typically in the order of hundreds of micrometers [El Gamal, Lee 2003 (Part I)].
Figure 2.29 Scan of the 100\(\mu\)m x 100\(\mu\)m deep photodiode with n+ across

The contact area of the photodiode can clearly be discerned in Figure 2.29 (b) establishing the size of the spot to be less than 6\(\mu\)m. Note that in Figure 2.29 (c) and (d) (circled regions) the presence of metal tracks did not block the light completely because the spot size was larger than the track size (2\(\mu\)m) at these points. This causes the size of the tracks in the optical image to appear broader than they are. Spatial filtering has occurred.
The diffusion of minority carriers follows an exponential decay with length [Shcherback 2003, Sze 1981] and hence the diffusion length\(^{26}\) of the process used can be estimated from the plots of the substrate crosstalk as follows:

\[
\frac{I_1}{I_2} = \frac{\exp\left(-\frac{x_1}{L_n}\right)}{\exp\left(-\frac{x_2}{L_n}\right)}
\]

(2.1)

where \(I_1\) and \(I_2\) are the photocurrents generated at \(x_1\) and \(x_2\) respectively, and \(L_n\) is the diffusion length of minority carrier electrons in the p-epi substrate.

Therefore, the diffusion length, \(L_n = \frac{x_2 - x_1}{\ln\left(\frac{I_1}{I_2}\right)}\) (2.2)

Choosing two points, \(x_2 = -200\mu\text{m}\) and \(x_1 = -150\mu\text{m}\), from the plot of \(y=0\mu\text{m}\) of the scan (\(I_2 = 1.0111 \times 10^{-5}\text{A}, I_1 = 8.0628 \times 10^{-6}\text{A}\)), a diffusion length of approximately \(220\mu\text{m}\) for electrons in the p-substrate is obtained. It takes three diffusion lengths for the concentration of diffused carriers to drop to 5% of its original value.
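As a numerical check, the short sketch below evaluates Equation (2.2) with the values exactly as quoted above, and also evaluates the fraction of carriers remaining after three diffusion lengths. The function name is illustrative.

```python
import math


def diffusion_length_um(x1_um, i1_a, x2_um, i2_a):
    """Diffusion length from Equation (2.2): L_n = (x2 - x1) / ln(I1 / I2)."""
    return (x2_um - x1_um) / math.log(i1_a / i2_a)


# Values quoted in the text for the y = 0 um cross-section.
l_n_um = diffusion_length_um(-150.0, 8.0628e-6, -200.0, 1.0111e-5)

# Fraction of the original carrier concentration left after three diffusion lengths.
fraction_after_3_lengths = math.exp(-3.0)

print(f"L_n = {l_n_um:.1f} um")
print(f"fraction after 3 L_n = {fraction_after_3_lengths:.3f}")
```

The computed value is close to 221μm, in agreement with the figure of approximately 220μm, and exp(-3) is about 0.05, matching the 5% quoted.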

Figure 2.30 shows the crosstalk obtained when the beam is scanned across the photodiode array with the central device connected and the remaining devices floating. The lateral crosstalk is significant with adjacent pixels reaching more than 50% of the central pixel value. Furthermore, the response of the pixel under test is lower than that obtained in the isolated pixel case. This is possibly because part of the photoresponse is due to the diffusion of carriers outside the depletion region and this is now being collected by the p-n junctions (depletion regions) of neighbouring photodiodes. The diffusion process, though contributing to the photocurrent, acts as a spatial filter performing spatial averaging of the image. It is also interesting to note, from the 3D image obtained (Figure 2.30 (e)), that the edge-effects are most prominent at the corners of a photodiode pixel where the electric field stresses are higher [Shcherback 2002].

---

\(^{26}\) The distance over which the concentration of free charge carriers injected into a semiconductor falls to \(1/e\) (37%) of its original value.

Figure 2.30 Crosstalk between neighbouring pixels of the photodiode array (PDArray)

Figure 2.31 shows the response of the different types of photodiode of size 100μm x 100μm. In general the shallow n+ photodiode had a lower response, and the combined shallow n+/deep device shows two distinct response levels with an abrupt transition between them. The lower responsivity of the shallow n+ photodiode can be attributed to its narrower depletion region [Xiangliang 2002]. Also, its shallow junction depth and the isolation provided by the deep field oxide trenches mean that collection of diffusing carriers is poorer than in deep photodiodes. Whether or not the choice of wavelength used had an effect on the response obtained will be discussed later.

Figure 2.31 Photoresponse of the different 100μm x 100μm sized photodiodes

Figure 2.32 shows the measured photoresponse in the combined devices. In the case of the combined shallow n+/deep (ncomb) device, the response is due to the fact that this photodiode consists of two distinct photodiodes next to each other (see photodiode cross-section in Figure 2.7 (c)), with the deep device having a higher responsivity than the shallow n+ photodiode as mentioned. It is also interesting to observe the breakdown effect in the combined shallow p+/deep (pcomb) devices as mentioned previously in Section 2.2.3. With the pcomb device, a large background current is obtained with reverse bias voltage but with no amplification of the photocurrent. This device will suffer from poor signal-to-noise ratio due to large shot noise and poor dynamic range due to saturation, if used in the reverse bias mode.

![Measured photocurrent](image)

(a) ncomb4  
(b) pcomb4

Figure 2.32 Photoresponse of the 100µm x 100µm combined photodiodes
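
The effect of a large background current on the signal-to-noise ratio can be illustrated with the standard shot noise expression, \(i_n = \sqrt{2qI\Delta f}\). The following is a minimal sketch only: the current and bandwidth values are hypothetical and were not measured in this work, and the function names are introduced purely for illustration.

```python
import math

ELECTRON_Q = 1.602176634e-19  # elementary charge in coulombs

def shot_noise_rms_a(total_current_a, bandwidth_hz):
    """RMS shot noise current for a given DC current and noise bandwidth."""
    return math.sqrt(2.0 * ELECTRON_Q * total_current_a * bandwidth_hz)

def shot_limited_snr(photocurrent_a, background_a, bandwidth_hz):
    """Signal-to-noise ratio when shot noise is the only noise source.

    The background current adds noise but carries no signal.
    """
    noise = shot_noise_rms_a(photocurrent_a + background_a, bandwidth_hz)
    return photocurrent_a / noise

if __name__ == "__main__":
    # Hypothetical values: 1 nA photocurrent, 1 kHz bandwidth.
    for background in (0.0, 1e-6):
        snr = shot_limited_snr(1e-9, background, 1e3)
        print(f"background {background:.1e} A -> SNR {snr:.1f}")
```

With the photocurrent unchanged, any increase in background current lowers the ratio, which is the behaviour described for the pcomb device in reverse bias.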

**Chip to chip variation**

In order to see the variation in photoresponse from chip to chip, i.e. with process or wafer variations, scans of different samples of the deep4 photodiode were performed. It can be seen from Figure 2.33 that the shape of the peaks varied. This could be due to the grain in the wafer as mentioned previously. Also, in setting up the experiment for different chips, slight differences in the clamping of the test board to the scanning stage (i.e. if the board is not flat) could lead to different shadow effects in the scan. Overall the standard deviation of the images over the photosensitive area is still less than 1.7%. From the image of the measured photocurrent of Figure 2.33 (a), the absorption of light through the metal holes can also be seen. Also, with the neighbouring photodiodes grounded, the contribution from the edges of these devices is still visible though significantly reduced, as seen in Figure 2.33 (b) and (c). Perfect removal of crosstalk just by grounding is not possible: diffusion is a statistical process and a very small proportion of carriers still diffuses to the test photodiode.

(a) Image of measured photocurrent

(b) Measured photocurrent along y=0

(c) Measured photocurrent along x=0

Figure 2.33 Scan of the deep 100µm x 100µm photodiode with n+ removed (deep4) on various chips

**Responsivity**

From the spatial sensitivity experimental setup, responsivity values can be obtained. However, there are several means to determine this value due to the spatial nature of the response. The responsivity can be obtained by taking a mean over the area of the photodiode. But it is difficult to determine exactly the size of the photodiode as the light entering the substrate outside the defined photodiode area can also be picked up.
Thus far the photodiode size has been defined by the size of the n-well as this corresponds to the location of the p-n junction or depletion region. But the photodiode exposed area is slightly larger than this because of the necessary substrate contacts around the periphery of the device – see Figure 2.6. For our design, the exposed region of each photodiode is about 20μm larger than the stated n-well width in both directions.

Four different conditions are defined for the possible calculation of the responsivity. The responsivity can be calculated based on an average over a defined area. Here two will be used: the total area exposed to light i.e. before the boundary of the light shield and the area of the drawn n-well. Or the responsivity can be obtained from specific points on the photoresponse scan. Intuitively, either the centre of the pixel or the maximum value across the scan is used. Table 2.3 gives the responsivities obtained for a 100μm x 100μm deep photodiode with n+ across for these different conditions.

<table>
<thead>
<tr>
<th>Exposed area</th>
<th>N-well area</th>
<th>Centre</th>
<th>Maximum</th>
</tr>
</thead>
<tbody>
<tr>
<td>Responsivity (A/W)</td>
<td>0.242</td>
<td>0.293</td>
<td>0.335</td>
</tr>
<tr>
<td>Quantum efficiency (%)</td>
<td>45.1</td>
<td>54.5</td>
<td>62.3</td>
</tr>
</tbody>
</table>

Table 2.3 Responsivity and quantum efficiency values for the 100μm x 100μm deep photodiode with n+ across (ndeep1) at λ = 667nm

It is felt that the average value obtained using the n-well area gives a fair and good estimate of the responsivity and will be used from now on. In the case of photodiodes without an n-well, the equivalent active area drawn defines the area. Quantum efficiency values are also shown and are obtained from the measured responsivity values using equation (1.9).
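
As a check on the conversion between the two quantities, the sketch below assumes that equation (1.9) has the usual form \(\eta = R\,hc/(q\lambda)\); this form is an assumption here, since the equation itself is given in Chapter 1. The function name is introduced for illustration only.

```python
# Physical constants in SI units.
PLANCK_H = 6.62607015e-34      # J s
LIGHT_C = 2.99792458e8         # m/s
ELECTRON_Q = 1.602176634e-19   # C

def quantum_efficiency(responsivity_a_per_w, wavelength_m):
    """Fraction of incident photons that contribute an electron to the photocurrent."""
    photon_energy_j = PLANCK_H * LIGHT_C / wavelength_m
    return responsivity_a_per_w * photon_energy_j / ELECTRON_Q

if __name__ == "__main__":
    # N-well area responsivity of ndeep1 at 667nm.
    eta = quantum_efficiency(0.293, 667e-9)
    print(f"Quantum efficiency: {100.0 * eta:.1f}%")
```

For the n-well area responsivity of 0.293 A/W at 667nm this gives approximately 54.5%, in agreement with Table 2.3.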

Table 2.4 shows the responsivities obtained for the photodiodes tested. The average responsivity of the deep devices is 0.298 A/W. The responsivity of the shallow n+ device is slightly lower as it has a smaller depletion region to collect the photogenerated charges. The shallow p+ device has a comparable responsivity to the deep devices because of the presence of the parasitic n-well/p-substrate junction. A more detailed analysis of the responsivity will be given later when the spectral response of the devices is observed.

<table>
<thead>
<tr>
<th>Photodiode</th>
<th>Responsivity (A/W) at 667nm</th>
</tr>
</thead>
<tbody>
<tr>
<td>ndeep1 (100µm x 100µm)</td>
<td>0.293</td>
</tr>
<tr>
<td>ndeep2 (200µm x 200µm)</td>
<td>0.307</td>
</tr>
<tr>
<td>deep1 (30µm x 30µm)</td>
<td>0.259</td>
</tr>
<tr>
<td>deep2 (60µm x 60µm)</td>
<td>0.287</td>
</tr>
<tr>
<td>deep3 (80µm x 80µm)</td>
<td>0.281</td>
</tr>
<tr>
<td>deep4 (100µm x 100µm)</td>
<td>0.307</td>
</tr>
<tr>
<td>deep5 (160µm x 160µm)</td>
<td>0.306</td>
</tr>
<tr>
<td>deep6 (200µm x 200µm)</td>
<td>0.320</td>
</tr>
<tr>
<td>nshal4 (100µm x 100µm)</td>
<td>0.258</td>
</tr>
<tr>
<td>pshal4 (100µm x 100µm)</td>
<td>0.303</td>
</tr>
<tr>
<td>ncomb4 (100µm x 100µm)</td>
<td>0.282</td>
</tr>
<tr>
<td>pcomb4 (100µm x 100µm)</td>
<td>0.273</td>
</tr>
</tbody>
</table>

Table 2.4 Responsivities of photodiodes tested at 667nm (based on n-well area)

### 2.4.2 I-V CHARACTERISTICS

By obtaining the I-V characteristics of a photodiode (see Section 1.5.4) under varied illumination levels, its linearity and suitable operating range in terms of light intensity and bias voltage can be determined.

#### 2.4.2.1 Experimental setup

The same scanning system employed in the measurement of the spatial photoresponse (see Section 2.4.1.1) was used to obtain the I-V characteristics of the photodiodes in light. However, the incident power on the sample is now adjusted by varying the output power of the laser diode and by placing various NDFs in the optical path. NDFs with optical densities, D, of 3.2, 0.8 and 0.4, where the NDF transmittance is \(T = 10^{-D}\), were used to give an incident power range of nearly 5 decades (60nW to 2.7mW). The focused laser beam is imaged onto the centre of the pixel under test and then an I-V sweep is performed using the Keithley with a step size of 0.1V up to a reverse bias voltage of 5V and a forward bias of 1V (5V for the deep and pcomb devices with Schottky diodes). The results obtained from the characterisation will be shown in the following section.
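
The attenuation provided by the filters can be sketched as follows. This is an illustrative calculation only: it assumes that stacked filters behave ideally so that their optical densities add, and the function names are hypothetical. The actual filter combinations and laser power settings used for each measurement are not listed here.

```python
def ndf_transmittance(optical_density):
    """Transmittance of a neutral density filter, T = 10^(-D)."""
    return 10.0 ** (-optical_density)

def attenuated_power_w(source_power_w, optical_densities):
    """Power after a stack of filters, assuming the optical densities add."""
    total_density = sum(optical_densities)
    return source_power_w * ndf_transmittance(total_density)

if __name__ == "__main__":
    for density in (3.2, 0.8, 0.4):
        print(f"D = {density}: T = {ndf_transmittance(density):.2e}")
    # Example: the D = 3.2 and D = 0.8 filters together attenuate by 10^4.
    print(f"{attenuated_power_w(2.7e-3, [3.2, 0.8]):.2e} W")
```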

#### 2.4.2.2 Results and discussion of I-V characterisation in light

Figure 2.34 shows the I-V characteristics obtained for various illumination levels (82µW to 2.7mW) of the deep 100µm x 100µm photodiode with n+ removed (deep4). As expected, the larger the incident power, the larger the photocurrent. The response appears linear and this will be investigated further. It is interesting to note that there is some response to light in the forward bias because the Schottky diode (a consequence of having no n+ under the contacts) can act as a photodiode as well. However, its response is weak, partly due to the size of the device and partly because it is completely covered in metal, without the interdigitated structure required for proper operation of a Schottky photodiode. The main mechanism for light detection here is probably the diffusion of carriers from outside the contact area.

![Figure 2.34 I-V characteristics for deep 100µm x 100µm photodiode with n+ removed (deep4) for increasing light level](image-url)
Figure 2.35 shows the comparison between the I-V characteristics of the deep 100μm x 100μm photodiode with n+ across (ndeep1) and that without (deep4) for incident light powers of 41.6μW and 494μW. The deep photodiode with n+ across is slightly more responsive (8.6%). It is also observed that the deep photodiode with n+ removed has significantly less photoresponse in 'quadrant 4' of the I-V plot and has a fixed open-circuit voltage in the presence of illumination. This is due to the reverse bias action of the Schottky diode in this region. This limits the operating range of this device as a photodiode.

![Figure 2.35 I-V characteristics of the deep 100μm x 100μm photodiode with n+ across (ndeep1) and without (deep4)](image)

The linearity with illumination level was tested and the results for the deep 100μm x 100μm photodiode with n+ removed (deep4) at a reverse bias voltage of 2V are shown in Figure 2.36 (a) and (b), as compared to a linear (dotted) line. The non-linearity\(^{27}\), or the maximum deviation over the full range of powers tested (60nW to 2.7mW), is 0.73% of the full scale range, while the average deviation was about 0.13%. Over the range of 60nW to 1μW, the non-linearity was 2.26%. The linearity of the deep photodiodes with n+ across (ndeep1-2) and with n+ removed (deep1-6) is shown in Figure 2.37. In general, all the photodiodes were found to be of similar linearity.

\(^{27}\) Non-linearity is defined as the maximum deviation of the transmitter output from the reference line (terminal or best-fit straight line) and is reported as a percentage of the unit's full-scale range.
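
The non-linearity figure defined above can be computed as in the following sketch. It assumes a least-squares best-fit line as the reference and takes the full-scale range as the span of the measured photocurrents; both choices are assumptions made for illustration, and the function names are hypothetical.

```python
def best_fit_line(xs, ys):
    """Least-squares straight line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def non_linearity_percent(powers, currents):
    """Maximum deviation from the best-fit line as a percentage of full scale."""
    slope, intercept = best_fit_line(powers, currents)
    deviations = [abs(i - (slope * p + intercept)) for p, i in zip(powers, currents)]
    full_scale = max(currents) - min(currents)
    return 100.0 * max(deviations) / full_scale

if __name__ == "__main__":
    # Synthetic data for illustration only, not measured values.
    powers = [0.0, 1.0, 2.0]
    currents = [0.0, 1.5, 2.0]
    print(f"{non_linearity_percent(powers, currents):.2f}%")
```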

Photocurrent vs. Incident Power

(a) Incident power of 60nW to 2.7mW  (b) Incident power of 60nW to 1µW

Figure 2.36 Photoresponse linearity of deep4 for different ranges of incident light levels (2V)

Figure 2.37 Photoresponse linearity of the deep photodiodes with n+ across (ndeep1-2) and without (deep1-6) for incident power of 40µW to 460µW (2V)

When the photocurrent measured is plotted against area as shown in Figure 2.38, a non-linear response was obtained. Also the response seems to be larger for smaller sized devices with the exception of the 30µm x 30µm device. This test was repeated on a separate chip with similar results. In actual fact, this is the same response that was seen with the spatial photoresponse of the deep devices in Figure 2.27 because it is the photoresponse at the centre of the pixel that is being measured. The dip in the photoresponse gets shallower as the pixel gets smaller and the edge effects merge till a single peak is seen for 'deep1'.

![Figure 2.38 Variation of photocurrent with area for deep devices (2V)](image)

### 2.4.3 SPECTRAL RESPONSE

The spectral response of a photodiode shows how the magnitude of the photocurrent for a given incident light power varies over a range of wavelengths. Obtaining the spectral response will help determine a suitable operating wavelength to use in a chosen application.

#### 2.4.3.1 Experimental setup for spectral sensitivity tests

This section describes the setup and testing of the experimental apparatus for obtaining the spectral response of the full custom photodiodes, as shown in Figure 2.39. The first part of the setup involves providing a monochromatic or single wavelength output over a wide range of wavelengths. The H20 IR Jobin Yvon monochromator [Jobin Yvon (Horiba) New Jersey, USA] with a grating of 600 lines/mm and an output wavelength range of 400nm to 1100nm was used. The monochromator takes in white light from a 70W tungsten-halogen lamp through its entrance slit. Mirrors inside the monochromator direct the light to a diffraction grating, which divides the white light into its spectrum. Another set of mirrors directs the light to the exit slit where the spectrum is narrowed down to near-monochromatic light. The wavelength exiting the monochromator is selected by rotating the grating, which is controlled by the dial on the monochromator.

The resolution of the monochromator is specified as 0.5nm for a wavelength of 500nm and a diffraction grating of 1200 lines/mm. In order to observe and confirm the resolution of the monochromator for the diffraction grating used, its output was observed through a spectrometer. The grating used in the spectrometer allowed a range of 400nm to 700nm to be observed. Figure 2.40 shows the resolution observed for wavelengths of 420nm and 600nm. The resolution, specified as the full width at half maximum (FWHM), is approximately 5nm at both wavelengths. In addition, the output beam of the monochromator was diverging and non-uniform. The non-uniformity was partly due to the image of the grating appearing on the output beam, the position of which changes with wavelength. Hence a diffuser was used in order to produce a uniform beam of light over the area of the sample. The disadvantage of this is that less light gets through.
Figure 2.40 Spectral output of the monochromator viewed through a spectrometer

A stepper motor was used to automatically rotate the grating and step through the wavelengths. The UCN5804B BiMOS II Unipolar Stepper Motor Driver is used to convert CMOS/TTL logic inputs into a stepper motor drive format to drive the four-phase unipolar stepper motor attached to the monochromator grating turret [Chen 2002]. The format used was the two-phase drive format, which has better torque performance and is less susceptible to motor resonance. The driver accepts two signals from the PC's parallel port. The first controls the rotation sequence of the outputs and hence the direction of the motor, i.e. whether the wavelength is increased or decreased. The second advances the sequence position of the outputs by one position with every high-to-low transition. Six step pulses were needed to advance the wavelength by 1nm. Step sizes of 5nm were used in the measurement of the spectral response.
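
The pulse count for a wavelength scan follows directly from these figures. The sketch below only performs the arithmetic and does not drive any hardware; the function names are hypothetical, and the 400nm to 1100nm range is taken from the stated output range of the monochromator.

```python
STEPS_PER_NM = 6  # step pulses needed to advance the wavelength by 1nm

def pulses_for_move(start_nm, stop_nm):
    """Number of step pulses needed to move between two wavelengths."""
    return abs(stop_nm - start_nm) * STEPS_PER_NM

def scan_wavelengths(start_nm=400, stop_nm=1100, step_nm=5):
    """Wavelengths visited in a scan, including both end points."""
    return list(range(start_nm, stop_nm + 1, step_nm))

if __name__ == "__main__":
    points = scan_wavelengths()
    print(f"{len(points)} measurement points")
    print(f"{pulses_for_move(0, 5)} pulses per 5nm step")
    print(f"{pulses_for_move(400, 1100)} pulses for the full scan")
```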

The second part of the setup is the measurement of the sample’s photocurrent and the incident light power. The Keithley was used to obtain the photocurrent measurements. In theory, fluctuations in the source (wavelength and intensity) can be compensated for by splitting the light and simultaneously measuring (and cross calibrating) the photocurrent generated on the test photodiode and on a calibrated reference photodiode. However, the Keithley only allows for measurements on one channel and so this cannot be carried out. The power of the light incident on the sample was measured with a Newport Optical Power Meter (Model 835) that used an 818-SL detector type with an active area of 1cm². Readings were double-checked with a second power meter, namely the Coherent LabMaster Ultima Power Meter which had a detector aperture of 7.9mm, a spectral range of 400-1064nm and a resolution of 1nW. Measuring the power of a non-uniform beam as illustrated in Figure 2.41 gives rise to an incorrectly higher responsivity value because the power measured is averaged across the beam but the power incident on the detector, which is smaller than the aperture, is higher. An aperture was used so that a more uniform area of illumination is obtained and this also allowed a more accurate determination of the illuminated area when measuring the incident light power with the power meter.

![Figure 2.41 The use of an aperture to obtain more uniform power measurements](image)

Initial tests were made with the Temic BPW34 photodiode in order to use its datasheet values for comparison. However, this showed a significantly larger response than that specified in its datasheet. This turned out to be due to the source not being accurately imaged on the entrance slit, which resulted in a more non-uniform beam at the output. However, the response obtained after correcting this was still high, as shown in Figure 2.42 (a). Also shown are measurements taken at different times (correctly imaged on the slit), which show that the temporal variations of power or wavelength could not account for this discrepancy. A calibrated photodiode from Hamamatsu (S6058 4-quadrant Si PIN Photodiode) was then used for further tests but this still showed similar results, with the response being higher than expected. This is shown in Figure 2.42 (b). It was thought that the uniformity of the beam was still affecting the reading, so a smaller aperture and a second diffuser were used. This improved the reading as shown. However, some non-uniformity probably still existed. As such, it was decided to scale the spectral response curves obtained with the value of the responsivity obtained from the spatial response experiment, as the light source was focused in that experiment and the incident power could be determined more accurately. The scaled result is also shown in Figure 2.42 (b). Now the curve is slightly lower but this is marginal and can be explained by scattered light, which gets measured by the power meter, and losses in the photocurrent measurement.

![Absolute Spectral Response](image1)

(a) BPW34

![Absolute Spectral Response](image2)

(b) S6058

Figure 2.42 Comparing measured and documented spectral response of reference photodiodes

Table 2.5 shows the responsivity values obtained from the scanning system compared to those quoted on the datasheets for the reference photodiodes. The incident power in this test was measured at 84μW. It was concluded that the best and most accurate means of determining the spectral response of any test photodiodes was to scale the response obtained with the responsivity value obtained from the previous spatial scanning experiment. This removed the need for the second diffuser, which meant more light could be directed at the sample and the signal-to-noise ratio of the system was improved.

<table>
<thead>
<tr>
<th>Reference photodiode</th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>BPW34</td>
<td>S6058</td>
</tr>
<tr>
<td>Measured photocurrent</td>
<td>35.4 µA</td>
</tr>
<tr>
<td>Measured responsivity</td>
<td>0.421 A/W</td>
</tr>
<tr>
<td>Responsivity from datasheet</td>
<td>0.425 A/W</td>
</tr>
<tr>
<td>Percentage error</td>
<td>0.94 %</td>
</tr>
</tbody>
</table>

Table 2.5 Comparing measured responsivity values of the reference photodiodes using the scanning system with their quoted datasheet values at 667nm
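
The tabulated values can be cross-checked with the short sketch below, which takes the responsivity as the ratio of photocurrent to incident power and expresses the error relative to the datasheet value. The function names are introduced for illustration only.

```python
def responsivity_a_per_w(photocurrent_a, incident_power_w):
    """Responsivity as photocurrent per unit incident optical power."""
    return photocurrent_a / incident_power_w

def percentage_error(measured, reference):
    """Magnitude of the deviation from the reference, as a percentage of it."""
    return 100.0 * abs(measured - reference) / reference

if __name__ == "__main__":
    measured = responsivity_a_per_w(35.4e-6, 84e-6)  # 35.4 uA at 84 uW
    print(f"Measured responsivity: {measured:.3f} A/W")
    print(f"Error vs datasheet: {percentage_error(round(measured, 3), 0.425):.2f}%")
```

The percentage error of 0.94% in the table corresponds to the responsivity rounded to three decimal places (0.421 A/W).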

Due to size restrictions in the setup, unshielded flying leads had to be used to make the measurement instead of a direct shielded BNC connection to the board (as in the spatial scanning system). This caused the dark DC leakage current from the spectral system (biased at 2V) to be about 50pA compared to that of the scanning system which was about 3pA. As such, dark current measurements were taken and subtracted for each run. However, this did not eliminate problems due to electromagnetic pickup and steps had to be taken to remove this, such as isolating from external circuitry and human motion.

The output light intensity of the monochromator could be increased by increasing the entrance and exit slit widths of the monochromator but this has the effect of reducing the resolution of the wavelength selection. Using a higher power (wattage) light bulb may not necessarily increase the input light intensity because a higher power light bulb may just have a larger filament area whose image is truncated by the input slit anyway.
#### 2.4.3.2 Results and discussion of spectral sensitivity tests

The spectral response of the deep 100μm x 100μm photodiode with n+ removed (deep4) from 400nm to 1100nm is shown in Figure 2.43 (c). The response curves when scaled to the different responsivity values are shown. Figures 2.43 (a) and (b) show the measured photocurrent and incident light power respectively. The readings go below 100pA for wavelengths of 450nm or less for the 200μm x 200μm devices and 470nm or less for the 100μm x 100μm devices, making them susceptible to noise in this range. However, this corresponds to where the response curves quickly drop off anyway. An internal spline function in Matlab was used to interpolate the data and obtain a smoother, more detailed waveform.
The spectral response curve of the device is typical of a photodiode fabricated in a standard CMOS process [Lee 2003 (Part I), Stoppa 2002]. The peaks and troughs seen in the response are due to the interference of the reflections within the passivation layers covering the active area of the photodiode. The response drops off towards 1100nm as it approaches the cut-off wavelength of silicon (see Section 1.5.1). Photons with energies smaller than the bandgap energy of 1.12eV at room temperature will not be absorbed at all. At the other end, the reason for the drop-off of responsivity at shorter wavelengths is twofold. Firstly, light at these wavelengths is absorbed closer to the surface and the photogenerated carriers do not diffuse to the depletion region but are lost due to surface recombination at the Si-SiO₂ interface. Secondly, for a given amount of power \(P\) incident on the detector, the shorter the wavelength, \( \lambda \), the more energy, \( E_{ph} \), the photons have and hence the number of quanta or incident photons is smaller, as given below:

\[
N = \frac{P}{E_{ph}} = \frac{P}{hc/\lambda} \text{ photons/s}
\]  

(2.3)
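
Equation (2.3) and the cut-off wavelength implied by the bandgap can be evaluated numerically as in the sketch below. The 1μW incident power is a hypothetical value chosen for illustration, and the function names are not part of the original work.

```python
PLANCK_H = 6.62607015e-34      # J s
LIGHT_C = 2.99792458e8         # m/s
ELECTRON_Q = 1.602176634e-19   # C, also joules per electronvolt

def photon_flux_per_s(power_w, wavelength_m):
    """Number of incident photons per second, N = P / (hc / lambda)."""
    photon_energy_j = PLANCK_H * LIGHT_C / wavelength_m
    return power_w / photon_energy_j

def cutoff_wavelength_nm(bandgap_ev):
    """Longest wavelength whose photon energy still equals the bandgap."""
    return PLANCK_H * LIGHT_C / (bandgap_ev * ELECTRON_Q) * 1e9

if __name__ == "__main__":
    for wavelength_nm in (400, 667, 800):
        flux = photon_flux_per_s(1e-6, wavelength_nm * 1e-9)
        print(f"{wavelength_nm}nm: {flux:.3e} photons/s")
    print(f"Cut-off for 1.12eV: {cutoff_wavelength_nm(1.12):.0f}nm")
```

For the same incident power, halving the wavelength halves the photon flux, and a bandgap of 1.12eV corresponds to a cut-off of approximately 1107nm.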

Figure 2.44 shows the spectral response obtained for the different sized deep photodiodes with n+ removed (deep1-6). The peaks of the curves line up reasonably well, as expected, because the devices differ only in lateral width and not in their vertical structure. As the experiment was carried out with a flooded, not focused, light source, it was susceptible to crosstalk from the substrate: an optical blocking layer was used around the test devices but this only extended to 180μm. However, this is not expected to affect the shape of the spectral response.
Figure 2.44 Measured spectral response of the deep photodiodes with n+ removed (deep1-6)

Figure 2.45 shows the measured spectral response of the various 100μm x 100μm photodiodes. The 'deep' and 'ncomb' devices show similar responses, with the 'nshal' device showing a lower response (maximum of 0.3081A/W at 735nm). The lower response of the 'nshal' device is believed to be due to the higher doping concentration of the p-well compared to the p-substrate, leading to a potential barrier for the collection of diffusion electrons in the substrate [Dierickx 1997]. Also, the 'pshal' device shows a better response at shorter wavelengths and the 'pcomb' device shows an overall wider response than the others. Overall the 'ndeep1' device gave the best response at longer wavelengths while the 'pshal' device was better at shorter wavelengths. This response is consistent with the fact that light of longer wavelength penetrates deeper into the substrate where the junction of the deep photodiode lies, so charges generated here are swept across the junction and collected, while the reverse is true for shorter wavelengths. This is due to the absorption coefficient, which is highly wavelength dependent (see Section 1.5.1) and results in a penetration depth\(^{28}\) of light into silicon as shown in Figure 2.46.

\(^{28}\) Penetration depth is defined as the distance that light travels before the intensity falls to 37\% (1/e) of its original value at the surface.
Figure 2.45 Measured spectral response of the 100µm x 100µm photodiodes

Figure 2.46 Penetration depth of light into the silicon substrate at various wavelengths [UDT Sensors Inc.]
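
The relationship between penetration depth and junction depth can be made concrete with a simple exponential absorption model. The sketch below uses penetration depths of about 0.3μm at 422nm and 2μm at 586nm, as quoted in this section, and ignores reflection and carrier diffusion, so it is an idealised illustration only. The function name is hypothetical.

```python
import math

def fraction_absorbed_within(depth_um, penetration_depth_um):
    """Fraction of the light entering the silicon that is absorbed above a given depth.

    Assumes the intensity decays as exp(-x / penetration_depth).
    """
    return 1.0 - math.exp(-depth_um / penetration_depth_um)

if __name__ == "__main__":
    # (wavelength in nm, penetration depth in um) as quoted in the text.
    for wavelength_nm, delta_um in ((422, 0.3), (586, 2.0)):
        shallow = fraction_absorbed_within(0.3, delta_um)
        deep = fraction_absorbed_within(2.0, delta_um)
        print(f"{wavelength_nm}nm: {100 * shallow:.1f}% within 0.3um, "
              f"{100 * deep:.1f}% within 2um")
```

In this model about 63% of 422nm light is absorbed within the first 0.3μm, compared with about 14% of 586nm light, which is consistent with the advantage of a shallow junction at short wavelengths.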
Comparing the spectral response curves in Figure 2.45 (b) of the deep4 photodiode (Figure 2.6) and the pcomb4 photodiode (Figure 2.7 (d)), at a wavelength of 422nm the response is higher for the pcomb4 device. This is because pcomb4 has an additional shallow junction for the collection of charges at a depth of 0.3μm (see Figure 2.46), which corresponds to the penetration depth of light at this wavelength. At a wavelength of 586nm, or a penetration depth equal to the n-well junction depth, \(x_j\), of 2μm, they show similar responsivity values. This again is consistent, since any photons penetrating past the deep n-well junction will only be collected by this junction and not the shallow source/drain region. Table 2.6 summarises the responsivities obtained with the test photodiodes.

<table>
<thead>
<tr>
<th>Photodiode</th>
<th>Responsivity (A/W) at 667nm</th>
<th>Maximum responsivity (A/W)</th>
<th>Wavelength of max. responsivity (nm)</th>
</tr>
</thead>
<tbody>
<tr>
<td>ndeep1 (100μm x 100μm)</td>
<td>0.293</td>
<td>0.403</td>
<td>690</td>
</tr>
<tr>
<td>ndeep2 (200μm x 200μm)</td>
<td>0.307</td>
<td>0.413</td>
<td>683</td>
</tr>
<tr>
<td>deep1 (30μm x 30μm)</td>
<td>0.259</td>
<td>0.312</td>
<td>779</td>
</tr>
<tr>
<td>deep2 (60μm x 60μm)</td>
<td>0.287</td>
<td>0.354</td>
<td>740</td>
</tr>
<tr>
<td>deep3 (80μm x 80μm)</td>
<td>0.281</td>
<td>0.326</td>
<td>733</td>
</tr>
<tr>
<td>deep4 (100μm x 100μm)</td>
<td>0.307</td>
<td>0.357</td>
<td>735</td>
</tr>
<tr>
<td>deep5 (160μm x 160μm)</td>
<td>0.306</td>
<td>0.377</td>
<td>739</td>
</tr>
<tr>
<td>deep6 (200μm x 200μm)</td>
<td>0.320</td>
<td>0.402</td>
<td>681</td>
</tr>
<tr>
<td>nshal4 (100μm x 100μm)</td>
<td>0.258</td>
<td>0.308</td>
<td>735</td>
</tr>
<tr>
<td>pshal4 (100μm x 100μm)</td>
<td>0.303</td>
<td>0.387</td>
<td>647</td>
</tr>
<tr>
<td>ncomb4 (100μm x 100μm)</td>
<td>0.282</td>
<td>0.358</td>
<td>686</td>
</tr>
<tr>
<td>pcomb4 (100μm x 100μm)</td>
<td>0.273</td>
<td>0.340</td>
<td>685</td>
</tr>
</tbody>
</table>

Table 2.6 Responsivities of the photodiodes tested
## 2.5 CHAPTER SUMMARY

Unlike CCDs, which have a specially tailored process and structures such as buried channels and surface state pinning to achieve very low dark current levels, photodetectors in a standard CMOS process make use of the parasitic junctions that exist. The work done in this chapter was carried out in order to evaluate the design and use of these junction photodiodes from a standard CMOS process. Several factors need to be considered, such as the dark current, the capacitance and its variation with bias, the responsivity of the device and its spatial and chromatic variation. The dark current determines the minimum sensitivity of the device and has two main sources [Shcherback 2002]: dark current from the diffusion of carriers across the depletion region, which depends on the doping concentration, bandgap, temperature, bias voltage and active area, and stress induced or defect generated leakage current, which depends on the active area shape and bias voltage. For the same fill factor, the smoother the shape the lower the leakage current. This was clearly seen from the observed spatial response of the devices, where edge effects showed increased leakage current. The dark current for the devices tested was of the order of 1pA or less for a reverse bias voltage of 2 - 4V.

For applications where speed of response is important, the junction capacitance of the devices needs to be small for a fast response time. The capacitance of the deep device was shown to be smaller than the shallow devices with the presence of the inadvertent Schottky barrier diode in reverse lowering the capacitance further. In determining the spatial response of a device, the issue of crosstalk between pixels and from the substrate was highlighted. Due to the large diffusion lengths in silicon, either a metal shield or a guard ring is required to prevent degradation of the contrast of the image obtained. Also, due to the presence of edge effects the response of the photodiodes does not scale linearly with area but is affected by the peripheral response. So in cases where there is a trade-off between sensitivity and resolution, this must be taken into account. The junction photodiodes were also shown to be very linear with light power, with saturation level not yet reached for an incident light power of 2.7mW. The linearity range of a photodiode can be extended slightly by applying a reverse bias voltage [UDT Sensors Inc.].
The deep or well-substrate photodiode showed better responsivity than the shallow devices due to its wide depletion region caused by the relatively low carrier concentration in the n-well. Since it is deep it is also able to collect the minority carriers photogenerated deep in the substrate provided that they are generated within a diffusion length of the depletion region. In terms of spectral response, the deep photodiode has better spectral response at longer wavelengths while the shallow devices performed better at shorter wavelengths. This is due to the absorption coefficient and penetration depth of light into silicon, where light of longer wavelength penetrates deeper into the substrate. The deep photodiode is sensitive to substrate noise and crosstalk from the neighbouring photodiodes due to its large and deep collection region, while the shallow n+/p-substrate photodiode has good substrate noise immunity due to the presence of the deep field oxide (FOX) implants. Also, the presence of the diffusion implant at the surface helps reduce the collection of dark current generated at the surface states of the Si-SiO₂ interface. The shallow p+/n-well photodiode is the least sensitive to substrate noise and crosstalk with neighbouring pixels because each junction is isolated within its own n-well [de Lima Monteiro 2002]. However, the presence of the n-well also means that arrays using these photodiodes are less dense, with the n+/p-substrate photodiode offering the best packing density. Noise was not characterised as the noise components in a bare photodiode without any additional circuitry are small and the large shunt resistance of the photodiode gives rise to a very small noise bandwidth (see Section 1.5.3). In most applications, where there is sufficient light budget, the imaging system tends to be photon shot noise limited [Hornsey 1999c]. In addition, the connection of the photodiode to readout circuitry will induce another form of noise known as read noise, which limits the noise at low-light levels.

In summary, this chapter has demonstrated that the photodiodes present in a standard CMOS process offer great potential as an optical detector. This work provides an essential foundation to the rest of this thesis. In the next chapter, the design of a hardware emulation system of the optical centroid processor will be presented which makes use of a full custom array of photodiodes as the front end.
## 3.1 INTRODUCTION

The fabrication of an ASIC (Application Specific Integrated Circuit), especially one that contains analogue components, carries the possibility that the design may fall outside specifications and hence more than one fabrication iteration may be required before a satisfactory operating circuit can be realised. This carries a heavy cost penalty. Hence, a more conservative approach of a hardware emulation system prior to ASIC fabrication has been adopted in order to reduce the number of iterations needed. The hardware emulation system consists of a photodiode array as the optical front end and a reconfigurable digital device (called a Field Programmable Gate Array or FPGA) for the digital centroid processing. Once the emulation hardware confirms the satisfactory performance of a design in its intended application, it can then be converted into a mask programmed CMOS integrated circuit. Due to the reprogrammable nature of the FPGA the hardware emulation environment can also be used to evaluate many other optical processing algorithms prior to ASIC fabrication.

## 3.2 SYSTEM OVERVIEW

The hardware emulation system is shown in Figure 3.1 and consists of two printed circuit boards: a main motherboard and a smaller daughter board. The motherboard contains a single channel 16-bit analogue-to-digital converter (ADC), a Field Programmable Gate Array (FPGA), an RS232 transceiver for a PC serial interface, LED displays for debugging purposes and miscellaneous switches for initiating various test routines under user control. The second, smaller, daughter board contains an optical front end with a 5 by 5 photodetector array for optical light detection, multiplexers for pixel access and current-to-voltage conversion prior to digitisation. The individual parts of the system will be described in the following sections.
### 3.2.1 OPTICAL FRONT END

Initially a commercial photodetector array from Centronic (part number MD25-5T) [Centronic Ltd. 1998] was used in the front end. This device is a 5 by 5 photodiode array with a pixel size of 2.7mm x 2.7mm, a wavelength range of 340nm to 1100nm and a quoted responsivity of 0.18A/W at 436nm. This allowed the design and testing of the hardware emulation system to be carried out in order to locate any possible problems. Once the operation of the system was confirmed, a full custom photodetector array was fabricated, and once the fabricated array (5 by 5 photodiode array in PDfinal)\(^{29}\) was tested, it was incorporated into the emulation system in place of the commercial photodetector array. The size of the array chosen is a trade-off between linearity and positional range on the one hand and complexity and the desired centroid processing time on the other, as discussed in Section 1.4.3.2. Each pixel in the full custom array has a size of 100μm x 100μm and, though the exact size of the photodiodes is not crucial, there is a compromise between ease of focusing, efficient use of silicon area and light budget when deciding upon the pixel size.

Both the commercial and the full custom array have a passive pixel architecture with no active circuitry at each photodiode. Current to voltage conversion is achieved using an op-amp in the transimpedance mode i.e. with a feedback resistance. A 5MΩ variable resistor was used to allow the design to cope with different photocurrent levels and hence light intensities. A single transimpedance amplifier is used and multiplexers are used to select each photodiode output in turn to be converted. Serial multiplexers (Maxim MAX349) [Maxim Integrated Products Inc. 1998] were used to reduce the number of control signals needed. The schematics and PCB layouts for the daughter board with the commercial photodiode array and for the daughter board with the full custom photodiode array are shown in Appendix A3.1 and A3.2 respectively.

\(^{29}\) Note that the photodiode type used in the full custom array is the combined device which is leaky in the reverse bias. However, in the emulation system the photodiodes were biased at 0V where the operation of the combined devices is acceptable.

The choice of op-amp is crucial when using a large feedback resistance to detect a small photocurrent. An op-amp with low input bias current is necessary. The input bias current should be significantly smaller than the photocurrent that is to be converted because the large feedback resistance will convert this input bias current into a dc offset voltage at the output of the op-amp for every pixel. This background offset will significantly affect the centroid algorithm by shifting the centroid position towards the centre. Initially the Texas Instruments TLE2024Y op-amp [Texas Instruments Inc. 1997] was used in the front end (Appendices A3.1 and A3.2), which had an input bias current of 50nA, but this was replaced with the pin-compatible TLC2274I [Texas Instruments Inc. 2000], which has an input bias current of only 1pA. In addition, the TLC2274I has a low noise voltage of 9nV/√Hz and rail-to-rail output voltage, hence providing a larger dynamic range.
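As a rough illustration of why the bias current matters, the sketch below converts each op-amp's input bias current into an output offset through the full 5MΩ feedback resistance, and then shows how a uniform offset on every pixel pulls a one-dimensional centroid towards the centre of the array. The bias currents and the feedback resistance are taken from the text; the row of signal voltages is a hypothetical example, and the variable resistor is assumed to be at its maximum setting.

```python
R_FEEDBACK = 5e6  # ohms, assumed maximum setting of the variable feedback resistor


def offset_voltage(bias_current_a, r_feedback=R_FEEDBACK):
    """DC output offset produced by the input bias current alone."""
    return bias_current_a * r_feedback


def centroid_1d(levels):
    """First-order moment with pixel coordinates 1..len(levels)."""
    total = sum(levels)
    return sum((i + 1) * v for i, v in enumerate(levels)) / total


# Hypothetical row of signal voltages with the spot entirely on pixel 5 of 5.
signal = [0.0, 0.0, 0.0, 0.0, 0.5]

for name, i_bias in (("TLE2024Y", 50e-9), ("TLC2274I", 1e-12)):
    v_off = offset_voltage(i_bias)
    shifted = centroid_1d([v + v_off for v in signal])
    print(f"{name}: offset = {v_off:.6f} V, centroid = {shifted:.3f} (ideal 5.000)")
```

With the larger bias current the offset is comparable to the signal itself, so the computed centroid moves well away from pixel 5 towards pixel 3, whereas with the smaller bias current the shift is far below the 0.0625 pixel resolution used later in this chapter.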

The Centronics photodiode array has a common cathode configuration and hence the photodiodes must be wired as shown in Figure 3.2(a). If the non-inverting input of the transimpedance amplifier is biased at 0V, the output of the transimpedance amplifier will be a negative voltage. However, the input voltage range for the ADC on the FPGA board was hardwired for 0 to 5V operation (see Section 3.2.2.2). Two options were therefore available: either a second op-amp configured as an analogue inverter is added, which would also require generating a -5V supply for the daughter board, or the non-inverting input of the transimpedance amplifier is biased at 2.5V. The latter was chosen, although this halved the dynamic range, which was adequate for testing purposes. The MAX873 voltage reference generator [Maxim Integrated Products Inc. 1992] was used to generate the 2.5V ± 1.5mV reference.
Figure 3.2 Connection of photodiode array pixels to the current-to-voltage converter on the daughter boards

In the case of the full custom array, the photodiodes have a common anode configuration (Figure 3.2(b)) and the output of the transimpedance amplifier goes from 0 to 5V. However, due to the on-resistance of the switches (60Ω for MAX349 and 10Ω for MAX4514 [Maxim Integrated Products Inc. 1996c]), the input voltage at the switches goes negative when photocurrent is drawn, so the switches had to be able to cope with a negative input voltage range. A MAX660 voltage inverter [Maxim Integrated Products Inc. 1996b] was therefore used to generate a -5V supply, the MAX349s were configured for ±5V operation and the MAX4514 was replaced with the dual supply DG418DY [Maxim Integrated Products Inc. 1996a].
3.2.2 FPGA PROCESSOR

The FPGA processor board consists of an ADC for digitising the analogue signal voltages, an FPGA to perform the centroid processing and an RS232 transceiver for transmitting the computed centroids to a PC. Peripheral circuitry includes LEDs for debugging purposes, switches for control and power supply protection. An onboard 25MHz crystal oscillator provides a clock input for the FPGA. The schematic and PCB layout for the FPGA motherboard are given in Appendix A3.3 and the following sections will discuss the construction of the different components of the board.

3.2.2.1 FPGA

The processor selected to perform the centroid processing in the hardware emulation system is the Xilinx Spartan XCS40-3PQ208C FPGA [Xilinx Inc. 1999] with 40,000 system gates\(^{30}\) or 784 Configurable Logic Blocks (CLBs)\(^{31}\). CLBs are used to implement most of the logic in the FPGA and are organised as a two dimensional array interconnected by routing channels and surrounded by a perimeter of programmable Input/Output Blocks (IOBs). Figure 3.3(a) shows the basic block diagram of a Spartan FPGA. Each CLB consists of primitive hardware elements such as look-up tables (LUT) and positive-edge triggered flip flops as shown in Figure 3.3(b). Each IOB controls one package pin and can be configured for input, output, or bidirectional signals\(^{32}\).

---

\(^{30}\) This is the quoted maximum value but the typical gate range can vary from 13,000 – 40,000 logic and RAM gates depending on how much of the resources can be utilised in a design. It is more common to quote the number of CLBs used.

\(^{31}\) The Spartan devices with speed grade -3 have a specified minimum clock high time and clock low time of 4.0ns. So theoretically these devices can be run up to a speed of 125MHz.

\(^{32}\) Note that if an I/O is unused after configuration, it is configured as an input with a pull-up resistor activated.
The FPGA is programmed by loading configuration data into its internal static memory cells. The values stored in these memory cells determine the logic functions and interconnections implemented in the FPGA. The board is designed to allow configuration from a Xilinx XCS17S40-PD8C [Xilinx Inc. 1999] serial PROM (Master Serial mode) or from an external device such as a PC via an XChecker cable (Slave Serial mode) as shown in Figure 3.4. In the Master Serial mode, the FPGA’s internal oscillator generates a Configuration Clock (CCLK) for driving the serial-configuration PROM (SPROM) while in the Slave Serial mode, CCLK is driven by an external signal. Clearing of the configuration memory is done using the PROGRAM pin which is controlled by a pushbutton on the board. *INIT and DONE provide status outputs during configuration of the FPGA. Connecting the *RESET of the SPROM to the *INIT output of the Spartan device ensures that the SPROM address counter is reset before the start of any configuration.
3.2.2.2 Analogue-to-Digital Converter (ADC)

The ADC used for the hardware emulation system is the Burr-Brown ADS7807UB 16-bit sampling successive approximation ADC [Burr Brown Corporation 1994]. Figure 3.5 shows how the ADC is controlled and connected to the FPGA and also how the processed data from the ADC is to be displayed or transmitted to a PC. The ADC was hardwired for an input voltage range of 0 to 5V. The ADS7807UB can acquire and convert 16-bits in 25μs (40kHz) while consuming only 35mW (max) with a maximum integral non-linearity error of ±1.5LSB and no missing codes. It has 8 parallel output lines and a BYTE signal that has to be controlled to read the high byte and low byte in turn. Conversion is initiated by controlling a convert signal, R/C, with the 25MHz clock used to run the FPGA, as shown in Figure 3.6.
The ADC used had a low input impedance of only 20kΩ. However, the output impedance of the transimpedance amplifier was significantly smaller (130Ω), so the voltage drop at the ADC input due to loading was negligible and no buffering was required. An anti-aliasing filter was incorporated into the front end of the ADC. The switches of the multiplexers on the daughter board are updated 10 clock cycles before the ADC convert signal is sent.
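The loading effect can be estimated by treating the amplifier output impedance and the ADC input impedance as a simple resistive divider. The sketch below is only that estimate: it uses the two impedance values quoted above and ignores the anti-aliasing filter and any frequency dependence.

```python
R_OUT = 130.0  # ohms, transimpedance amplifier output impedance
R_IN = 20e3    # ohms, ADC input impedance


def loaded_fraction(r_out=R_OUT, r_in=R_IN):
    """Fraction of the amplifier output voltage that reaches the ADC input."""
    return r_in / (r_in + r_out)


loss_percent = (1.0 - loaded_fraction()) * 100.0
print(f"signal lost to loading: {loss_percent:.2f} %")
```

Because a single amplifier serves every pixel, this attenuation is a common gain factor that scales the numerator and denominator of the centroid calculation equally, so it would not be expected to move the computed centroid.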

3.2.2.3 RS232 Transceivers

The RS232 (or EIA232) standard was introduced to ensure reliable serial communication between devices. In the RS232 standard\(^{33}\), voltages of -3V to -25V with respect to signal ground (pin 7 on DB25 connectors or pin 5 on DB9 connectors) are considered logic '1' while voltages of +3V to +25V are considered logic '0'. An RS232 transceiver is a level converter IC which converts CMOS level voltages to RS232 level voltages and vice versa, and for this purpose, a MAX3232E transceiver [Maxim Integrated Products Inc. 2000] was employed which has two receivers and two drivers guaranteed to run at data rates of 250kbps while maintaining RS-232 output levels\(^{34}\).

---

\(^{33}\) In RS232 the start bit is logic '0' and stop bit is logic '1' and the least significant bit is always the first bit sent.

\(^{34}\) The output voltage swing of the transmitters is ±5.4V (typ).
For serial communication with a PC, a null modem connection is made. For synchronising, the receiver on the PC scans the incoming data for valid start and stop bit pairs. The receiver uses a 16x clock for detecting the incoming start bit, so the occurrence of the start bit will be located within the ±1/2 16x clock cycle or ±1/32 bit or ±3.125%. The design of the transmitter for generating and sending the output data will be discussed in Section 3.3 when the design of the digital centroid processor is presented.

3.2.2.4 Peripheral Circuitry

LED bargraphs and 7 segment displays are used to display results and for troubleshooting. A logic low level on an FPGA output connected to the LEDs draws current through the LEDs turning them on. In addition, an 8-way rocker DIL (Dual In-Line) switch and 3 tactile pushbutton switches were included to allow the user to easily control several input pins of the FPGA. Some of these input switches are used as mode or control inputs during the configuration of the FPGA.

All the components on the board operate on a 5V supply. Power supply protection is incorporated to protect these devices from voltage surges and incorrect powering of the board. The power supply protection circuit is shown in Figure 3.7 and includes a fuse, a zener diode, a varistor (voltage-dependent resistor) and a PMOS power MOSFET. When there is a power surge or a large voltage spike, the zener diode goes into breakdown and the varistor's resistance rapidly decreases, creating a shunt path for the over-voltage. The PMOS SI9430SDY is used to protect the board from incorrect connection of the power supply. When the supply is connected correctly, the gate of the PMOS is at 0V and the source is at the +5V input, so Vgs < 0 and the PMOS is on. If the supply is connected in reverse, Vgs > 0 and the PMOS turns off, cutting the supply to the board.

\(^{35}\) To display the output data in decimal on the 7 segment display, binary to BCD (binary coded decimal) conversion is performed using the FPGA prior to output. Only three 7 segment displays were available, requiring data larger than 12 bits to be truncated. If the data is to be displayed in units other than binary, say volts, the effect of this truncation has to be taken into account when scaling.

\(^{36}\) Resistors are used to limit the current drawn from the LEDs.

\(^{37}\) \( R_{ds(on)} = 0.1 \Omega \)
Under quiescent conditions, the FPGA board drew 105.6mA from a 5V regulated supply. With all the LEDs on, it drew 189.6mA and with the Centronics array daughter board connected, 213.5mA was drawn. The board also allows users to use an unregulated power supply or a non-compliant supply voltage such as a 9V battery. It does this by having a second power supply input with the same power supply protection circuitry but connected to a MAX667 voltage regulator [Maxim Integrated Products Inc. 1994] prior to connection to the board's power lines. The power supply protection in the voltage regulator path has the same structure as that in the regulated supply path but with different ratings. For example, the zener diodes have a zener voltage of 5.1V and 16V respectively in the regulated and unregulated supply paths. The MAX667 accepts a +3.5V to +16.5V input and has a maximum dropout voltage of 350mV and maximum supply current of 250mA, sufficient for the design needs.

3.2.2.5 Layout and Testing of FPGA Board

The PCBs were designed in Protel and the PCB motherboard was sent away for fabrication while the daughter boards were built in-house. When laying out the board, several basic rules were adhered to for reducing EMI (Electromagnetic Interference) and crosstalk:

- Use a large ground plane.
- Make power supply tracks large.
- As far as possible keep signals away from power lines.
- Avoid creating a loop when routing the power line\(^{38}\). Use a star configuration.
- Two decoupling capacitors (0.1μF ceramic and 10μF tantalum) are placed as close as practically possible to each power and ground pin of all the IC components\textsuperscript{39}.
- Use of surface mount technology (SMT) components is preferred over through-hole mounting due to its shorter lead length and hence lower inductance.
- Use a regulated power supply. If not, use the onboard voltage regulator.

\(^{38}\) Routing traces in a loop around the board can increase the board's susceptibility to external fields as well as increase the generation of them.

Figures 3.8(a), (b) and (c) respectively show photos of the FPGA board, the FPGA board with the Centronics array board attached and the FPGA board with the full custom array daughter board connected. A basic test of the FPGA was to generate a counter and observe the outputs on the LEDs. The serial link was tested by generating and sending data from the FPGA to a PC [Goodwin 1992] via the onboard transceivers. For the testing of the ADC, various analogue test signals such as a triangle wave and a sine wave input were acquired and converted and the results transmitted via the RS232 port to the PC. The analogue front end was tested by applying fixed voltages via resistors to generate an input current at the bare photodiode array sockets and observing the multiplexer switching and I-to-V conversion outputs. Once the system was fully tested, it was inserted into an optical bench setup for obtaining centroids.

\textsuperscript{39} A real capacitor includes both an inductor and resistor in the form of leads, traces, and even ground planes in series with it. This means that, in a circuit, a capacitor acts as a low-impedance element only over a limited range of frequencies. To extend this frequency range, many references propose adding a second capacitor to bypass frequencies outside the limited range of the single capacitor.
Figure 3.8 Photographs of the boards designed and built for the centroid hardware emulation system

3.3 DIGITAL CENTROID PROCESSOR DESIGN

This section discusses the design of the digital centroid processor on the FPGA. Digital systems can be specified in three domains [Yalamanchili 2001]. Under the functional domain, the system is described in terms of its operation or behaviour. Under the structural domain, the system is specified by the interconnection and hierarchy of its components and finally in the physical or geometrical domain, it is specified by the physical layout of the components. At the same time, a digital system can have different levels of abstraction from the algorithm level to the register transfer level (RTL)\textsuperscript{40} to the boolean logic level. In this design, the digital backend of the system was described in VHDL where the individual behavioural RTL-based VHDL macros or components are placed and connected together on a schematic to give a structural and graphical description of the system. A block diagram of the VHDL macros of the centroiding system is shown in Figure 3.9. The schematics for the FPGA centroiding system of the Centronics array and the full custom array are shown in Appendix A3.4 and A3.5 respectively\textsuperscript{41}. The only difference between the two is that the digital inputs in the Centronics array case are inverted and an offset is subtracted within the centroid processor block in order to account for the inverse direction of the signal in the Centronics array due to its common cathode configuration. The advantage of using VHDL (VHSIC\textsuperscript{42} Hardware Description Language) to model a digital system is that it is technology independent, hence allowing a standardised, portable model of electronic systems. Technology independence will allow technology migration to, for example, reduced feature lengths in ASICs (deep sub-micron) or from, say, a Field Programmable Gate Array (FPGA) to ASIC where an FPGA has been used to prove the functionality of a design. In addition, a VHDL model of a digital system can be described both structurally and behaviourally and at different levels of abstraction, providing a means of managing large, complex designs.

\textsuperscript{40} At the register transfer level (RTL) a digital system is represented by a set of registers and a set of transfer functions describing the flow of data between the registers.

\textsuperscript{41} As the inputs of the transimpedance amplifier for the Centronics array are biased at 2.5V instead of 0V, its output voltage goes from 2.5V to 0V (10000000 to 00000000) for an effective signal level of 00000000 to 10000000, so inversion (shown in schematic) and subtraction of an offset of 01111111 from the digitised data bits is performed before centroid processing is carried out.

\textsuperscript{42} VHSIC: Very High Speed Integrated Circuits
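The inversion and offset subtraction applied to the Centronics data can be checked with a few lines of arithmetic. The sketch below assumes 8-bit samples and a bitwise inversion, as suggested by the binary values quoted in the footnote above; the function name is hypothetical and the actual implementation is the VHDL shown in the appendix schematics.

```python
OFFSET = 0b01111111  # offset subtracted after inversion


def centronics_to_signal(adc_byte):
    """Bitwise-invert an 8-bit sample, then subtract the 01111111 offset.

    Assumes the input stays within 00000000..10000000 (0V to 2.5V).
    """
    inverted = ~adc_byte & 0xFF
    return inverted - OFFSET


for raw in (0b10000000, 0b01000000, 0b00000000):
    print(f"{raw:08b} -> {centronics_to_signal(raw):08b}")
```

An amplifier output of 2.5V (no light) therefore maps to a signal level of zero, and an output of 0V maps to the maximum signal level of 10000000.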
To obtain a centroid from incident light levels of a photodetector array, the 1st order moment of the light levels has to be calculated as described in Section 1.4.2 and given by equation (1.4). A simplified example of a centroid calculation is shown in Figure 3.10. In this example, a 4 x 4 photodiode array (shaded) with arbitrary light intensity given by the decimal numbers in the top right hand corner of each pixel produces centroid positions of \( C(x) = 2.53 \) and \( C(y) = 2.68 \).

[Figure: 4 by 4 photodiode array on X and Y axes numbered 0 to 4 from the reference point (0,0)]

**Figure 3.10 Example centroid calculation for a 4 by 4 photodiode array giving centroid positions of \( C(x) = 2.53 \) and \( C(y) = 2.68 \)**

Note that the reference point is chosen outside of the array. If the reference point was chosen as the centre of the array with positive and negative coordinate ranges, this reference point will not carry any weighting in the centroid calculation, leading to poor noise characteristics when the spot is close to the centre.
If the light levels are now represented digitally then these centroid moments can be implemented using the block diagram shown in Figure 3.11 for the x-coordinate and another duplicate block (not shown) for the y-coordinate. Photocurrent data is clocked in sequentially from each photo-detector and multiplied by a counter (Mod \( N^{1/2} \), where N is the number of pixel elements in the array) that holds the position of the detector relative to the reference point in the x-direction. The output of this multiplier is continually accumulated via an adder block and the result is divided by the total photocurrent acquired via a separate and parallel running accumulator. The resultant division represents the x-centroid coordinate. A second centroid processing block calculates, in parallel, the y-centroid coordinate.

Figure 3.11 Block diagram of centroid processor in the x-direction
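The multiply and accumulate structure described above can be modelled in software as follows. This is a minimal behavioural sketch, not the VHDL used in the design: it assumes the pixels arrive in raster order (row by row) and that coordinates run from 1 to 5, and the function name is hypothetical.

```python
def centroid_stream(samples, side=5):
    """Accumulate sum(x*I), sum(y*I) and sum(I) as pixels arrive one per cycle."""
    num_x = num_y = den = 0
    for n, level in enumerate(samples):
        x = n % side + 1   # column counter (modulo sqrt(N)), coordinates 1..side
        y = n // side + 1  # row counter, advances once per completed row
        num_x += x * level  # x-moment accumulator
        num_y += y * level  # y-moment accumulator
        den += level        # total light level accumulator
    if den == 0:
        return None  # no light detected, centroid undefined
    return num_x / den, num_y / den


# Hypothetical frame: a single lit pixel in column 3 of row 2.
frame = [0] * 25
frame[7] = 200
print(centroid_stream(frame))
```

In hardware the two moment accumulators and the total accumulator run in parallel, so only the final division adds latency after the last pixel has been read.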

For binary addition and multiplication, VHDL operators (functions) within the IEEE numeric_std package are used. The addition effectively synthesises carry look-ahead adders\textsuperscript{44} while the binary multiplication process is effectively a shift and add procedure [Chang 1999]. Binary division however was not supported and division is implemented using shift and conditional subtract operations of long division\textsuperscript{45} [Dewey 1997]. For a 5 x 5 array with a digitised 8-bit input light level, 15 bits are required for the numerator (255 x 15 x 5 + 1 levels) and 13 bits for the denominator (255 x 25 + 1 levels). This results in an integer quotient output of 3 bits and corresponds to the coordinate range of 1 to 5 of the array, or 001 to 101 in binary. To increase the number of quotient bits and hence the precision of the division process, additional shift and subtract cycles are performed. This represents an increase in the number of cycles of operation with minimal increase in hardware as the dividend and divisor size remains the same (as long as 8-bit representation of light level is sufficient). A 7-bit representation of the centroid coordinates was chosen with 3 integer bits and 4 fractional bits, giving a positional resolution of 0.0625 of a pixel.

---

\textsuperscript{44} Adders can be implemented using a ripple structure, which is small but slow, or as carry look-ahead adders, which are faster but larger.
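The shift and conditional subtract division can be sketched in software as below. This is an illustrative model under stated assumptions rather than the VHDL implementation: the dividend is pre-scaled by the number of fractional bits so that integer arithmetic can be used throughout, whereas the hardware performs additional shift and subtract cycles, and the function name is hypothetical.

```python
def divide_fixed(numerator, denominator, int_bits=3, frac_bits=4):
    """Restoring division by shift and conditional subtract.

    Returns the quotient as an integer holding int_bits integer bits and
    frac_bits fractional bits (the true quotient must fit in that width).
    """
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    remainder = numerator << frac_bits  # scale so fractional bits become integer
    quotient = 0
    for shift in range(int_bits + frac_bits - 1, -1, -1):
        trial = denominator << shift    # divisor aligned to this quotient bit
        quotient <<= 1
        if remainder >= trial:          # conditional subtract
            remainder -= trial
            quotient |= 1
    return quotient


q = divide_fixed(38, 15)  # hypothetical accumulator values
print(f"{q:07b} = {q / 16} pixels")
```

Each pass of the loop yields one quotient bit, so extending the result from 3 to 7 bits costs four extra passes while the widths of the dividend and divisor registers are unchanged.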

A centroid is obtained after N+5 conversion cycles or pixel cycles from the start of the frame where N, the number of photodiodes, is 25. For a 40 kHz (25\mu s) conversion rate, a centroid is obtained after 0.75ms from the start of the frame. The 5 remaining cycles are required to allow the division process to complete. However, a new frame is started after N+1 or 26 cycles by making use of the latency during the division process, so centroids are updated every 26 cycles or at a rate of 1.54 kHz. The additional cycle in this case is for the latching and reset of the dividend and divisor result prior to division and the start of the next frame\textsuperscript{46}.
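The timing figures above follow directly from the conversion period, as the short calculation below illustrates. The constants are those quoted in the text; the function names are hypothetical.

```python
T_CONVERSION_S = 25e-6  # one ADC conversion (pixel) cycle at 40 kHz
N_PIXELS = 25           # 5 x 5 array


def centroid_latency_s(n_pixels=N_PIXELS, t_conv=T_CONVERSION_S):
    """Time from start of frame to a valid centroid: N + 5 pixel cycles."""
    return (n_pixels + 5) * t_conv


def frame_rate_hz(n_pixels=N_PIXELS, t_conv=T_CONVERSION_S):
    """Centroid update rate when a new frame starts every N + 1 pixel cycles."""
    return 1.0 / ((n_pixels + 1) * t_conv)


print(f"latency    = {centroid_latency_s() * 1e3:.2f} ms")
print(f"frame rate = {frame_rate_hz():.0f} Hz")
```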

The calculated centroid positions are then converted into the serial RS232 format with one start bit (logic 0), 8 data bits, no parity bits and 1 stop bit (logic 1). The MSB of each byte sent is used to indicate whether it is x or y-data while the remaining 7-bits are for the actual centroid data. A standard RS232 baud clock of 19,200 bits/s is generated to transmit the centroid coordinates, which limits the frame rate to 960 Hz.

\textsuperscript{45} Like in long division, the divisor needs to be aligned to the dividend before subtraction can be carried out. This is done by buffering or padding the divisor with additional zeros.

\textsuperscript{46} A conversion cycle was used for convenience sake and a shorter cycle could be utilised by controlling the final latching of the dividend and divisor on a faster clock.
When a baud rate of 38,400 bits/s is used, the full frame rate of the centroid processing is utilised. The serial centroid data is then sent off the FPGA chip to the MAX3232E for RS232 level conversion. In addition to the computation and transmission of the centroid, the digital processor had to control the ADC for the digitisation and acquisition of the photocurrents as well as the serial multiplexers for selecting the individual pixels in turn.
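The serial format and the resulting frame rate limits can be illustrated with the sketch below. The 10-bit frame, the 7-bit data field, the least significant bit first ordering and the divide-by-1302 baud clock are taken from the text and its footnotes; which MSB value denotes x and which denotes y is not stated, so MSB = 1 for y-data is an assumption, and all function names are hypothetical.

```python
BITS_PER_BYTE = 10   # 1 start + 8 data + 1 stop, no parity
BYTES_PER_FRAME = 2  # one x byte and one y byte per centroid


def max_frame_rate(baud):
    """Highest centroid rate the serial link can carry at a given baud rate."""
    return baud / (BITS_PER_BYTE * BYTES_PER_FRAME)


def actual_baud(f_clk_hz=25_000_000, divider=1302):
    """Baud clock obtained by dividing the 25MHz FPGA clock."""
    return f_clk_hz / divider


def pack_centroid(value_7bit, is_y):
    """MSB flags the axis (assumed 1 for y); low 7 bits carry the coordinate."""
    if not 0 <= value_7bit < 128:
        raise ValueError("centroid must fit in 7 bits")
    return (0x80 if is_y else 0x00) | value_7bit


def frame_bits(byte):
    """Start bit (0), eight data bits least significant first, stop bit (1)."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]


for baud in (19200, 38400):
    print(baud, "bits/s ->", max_frame_rate(baud), "centroids/s")
print(frame_bits(pack_centroid(0b0101000, is_y=True)))
```

At 19,200 bits/s the link carries fewer centroids per second than the processor produces, while at 38,400 bits/s the link capacity exceeds the 1.54 kHz update rate, so the processor rather than the link becomes the limit.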

3.4 FPGA CAD ENVIRONMENT AND DESIGN FLOW

The CAD environment used for the development and programming of the FPGA system is the Xilinx Foundation Series 2.1i software [Xilinx Inc.] which fully supports the use of the Spartan device. The design flow for an FPGA design environment is shown in Figure 3.12. VHDL programs are analyzed to check for syntax errors and compiled to a form executable by a VHDL simulator. The analyzed design is synthesized to a library of components, typically gates, latches or flip-flops. Hierarchical designs are synthesized in a bottom up fashion, that is lower level components are synthesized before higher level components. Once the design is synthesized we have a gate-level netlist. This gate-level netlist can now be simulated\(^{48}\). Functional simulation is possible but accurate timing simulation is not possible at this point because the actual timing characteristics are determined by the physical placement of this design within the FPGA chip.

\(^{47}\) The 25MHz clock is divided by 1302 to obtain a baud rate clock of 19201.23 bits/s. This represents an error in the bit rate of 0.0064% which is not significant and in addition, RS232 receivers are designed to synchronise the transmission at the start of each new byte sent by clocking in the start bit at 16x the baud rate clock.

\(^{48}\) Xilinx simulation script files (.cmd) are used to ease the input of test vectors as well as allow simulations to be repeated or modified quickly.
Once the gate-level netlist is obtained the next step is to map this design onto the FPGA. Mapping a design onto an FPGA involves translating the gate-level netlist produced by the synthesis compiler into a netlist of FPGA primitive hardware components. Locking of the input and output ports on the design schematic to specific physical pins on the FPGA chip is done using the LOC property on the individual ports. The LOC property is a property provided for within the Xilinx Foundation Series software for assigning pin numbers to each input and output pin. This option was preferred over the use of the Constraint Editor to lock the pinouts as it did not always register or store the values entered.

In the Place and Route stage of the design, these primitive hardware components are assigned to actual physical primitives on the FPGA chip and the interconnections between these components are made. The Spartan device, and FPGAs in general, have different types of routing channels from single-length lines between each CLB to long or global lines which run the entire length of the array. These global routing networks can be used to route and distribute critical nets such as clock signals and high fanout signals throughout the device with minimal skew. This is done by placing global buffers from the library on the design schematic. Besides layout constraints, timing constraints can also be placed in the User Constraints File (UCF) for controlling and optimising the placement and routing. In addition, Xilinx Foundation Series 2.1i allows several options to be selected during design implementation such as optimisation for area or speed, number of place and route passes to make, configuration of input/output pins (TTL or CMOS) and so on.

After place and route, the design can be simulated with propagation delays of the routed signals incorporated. Two types of post-layout simulation are possible in the Xilinx Foundation Series design environment, namely logic timing simulation and static timing analysis. In the timing simulation, user defined test vectors are dynamically propagated through the circuit and the resulting output waveforms are observed. The time required to perform the simulation limits the number of input vectors and circuit operating modes, and the length of circuit operation that can be simulated. Static timing analysis on the other hand does not have a simulation cycle and therefore does not schedule events. Instead of evaluating logic functions, static tools sum up and compare delays through paths, relative to pre-defined clocks. Static timing analysis will determine the critical paths in the design and verify that the design meets the timing constraints set. Static timing analysis is faster and provides a wider coverage but is less comprehensive and may generate false paths. Note that the Timing Analyzer (i.e. static timing analysis) in Xilinx Foundation 2.1 does not detect setup and hold violations but these violations are highlighted during logic simulation. Xilinx FPGA chips come with different speed grades and the static timing analyser can provide a quick analysis of the effect of different speed grades on the same design.

Once the design has been properly verified, the configuration bits generated during implementation can be downloaded onto the FPGA via a Xilinx XChecker cable. Due to the reprogrammability of the device, the design can be verified in-circuit using real data and any errors can easily be corrected and the device reprogrammed until the desired performance is achieved.
3.5 RESULTS OF HARDWARE EMULATION SYSTEM

For testing the hardware emulation system, a 20μm diameter laser beam (a double YAG laser at 532nm with approximate output power of 0.86mW) was scanned across the array at a speed of 2000μm/sec. Figure 3.13 shows the experimental setup used for testing the hardware emulation system. Centroid values were computed by the FPGA and serially transmitted in real time to a PC at a rate of 38,400 bits/s. Initially the Centronics photodiode array was incorporated in order to test the VHDL centroid algorithm. Then the full custom photodiode array was included to evaluate the performance of a full custom array for centroid detection. The results of these tests are shown in the following sections.

**Figure 3.13 Experimental setup of scanning system**

3.5.1 COMMERCIAL PHOTODIODE ARRAY (CENTRONICS)

Figure 3.14 shows a grey scale map of the centroid values successfully recorded at each position on the array – each photodiode was of size 2.7mm x 2.7mm. The dark regions correspond to larger centroid coordinates whilst lighter regions correspond to smaller centroid coordinates. As expected, as we scan in the x-direction, the x-centroid values increase while the y-centroid values remain constant and vice versa. Since the laser beam size (20μm) is less than the size of one pixel, a stepped appearance can be seen as the beam moves across the array passing from one discrete detector to another. The guard rings of the Centronics array were left floating (Appendix A3.1) and this gave rise to crosstalk, but the purpose of the test was to check the functionality of the centroid processing algorithm, which shows the desired response.

Figure 3.14 Image map of x and y centroids for the Centronics array

3.5.2 FULL CUSTOM PHOTODIODE ARRAY

Centroid values were again calculated in real time by the FPGA with the 532nm laser scanned across the custom made CMOS array – the pixel size is 100μm x 100μm. Figure 3.15 shows the y-coordinate centroid values plotted as a function of pixel position for different beam diameter sizes. The array goes from -250μm to 250μm. Near the edges we can see non-linearity effects as the beam falls off the edge of the array. This effect is more pronounced for larger beam sizes because these will fall off the edge first. For very small beam sizes, we obtain discrete steps in the waveform as we would expect as the beam passes from 1 pixel to another. The steeper rise in centroid value occurs when the beam lands in-between two pixels. As the beam size increases the response becomes more linear in the centre of the array.
Figure 3.15 Measured position vs. actual position for different beam sizes
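The dependence of the response on beam size can be reproduced qualitatively with a simple one-dimensional model. The sketch below is an idealisation and not a model of the measured data: it assumes a uniform (top-hat) beam profile rather than the real laser profile, a 100% fill factor, and no noise or crosstalk; the function name is hypothetical.

```python
PITCH_UM = 100.0  # pixel pitch of the full custom array
N_PIXELS = 5      # pixels along one axis, spanning -250um to +250um


def centroid_top_hat(position_um, width_um):
    """Centroid (pixel units, 1..5) of a uniform 1-D beam centred at position_um."""
    lo = position_um - width_um / 2.0
    hi = position_um + width_um / 2.0
    moment = total = 0.0
    for i in range(1, N_PIXELS + 1):
        left = -250.0 + (i - 1) * PITCH_UM
        right = left + PITCH_UM
        # Length of the beam that falls on pixel i.
        overlap = max(0.0, min(hi, right) - max(lo, left))
        moment += i * overlap
        total += overlap
    return moment / total if total > 0 else None


for width in (20.0, 100.0):
    row = [centroid_top_hat(p, width) for p in (0.0, 25.0, 50.0, 75.0, 100.0)]
    print(f"beam width {width:5.1f} um:", [round(c, 3) for c in row])
```

In this model the narrow beam gives a flat response while it sits within one pixel and a steep rise only as it crosses a pixel boundary, whereas a beam as wide as one pixel gives a linear response until it starts to fall off the edge of the array.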

Figure 3.16 shows the grey scale centroid maps for both the x and y centroid coordinates obtained in real time by the CMOS array via the FPGA. Again the darker regions correspond to large centroid coordinates and vice-versa for light regions. Again we can see the stepped appearance with small beam size and a smoother appearance for larger beam size. From these results it can be seen that the hardware emulation system has proven the functionality of the digital centroid processor on the FPGA in computing the required centroids.

Figure 3.16 Image map of x and y centroids for different beam sizes
3.6 CHAPTER SUMMARY

A hardware emulation system allows the designer to test out various processing algorithms prior to IC fabrication. Although conservative, this approach aims to reduce the number of design iterations needed to produce a working design, and hence can lead to a reduction in cost and time-to-market. The hardware emulation system for the optical centroid detector consists of a 5 x 5 photodiode array, I-to-V conversion of the photocurrent, ADC of the signal voltage and the centroid calculation of the digital data with a reprogrammable FPGA. The hardware emulation system was tested with both a commercial photodiode array and a full custom standard CMOS photodiode array fabricated in the Mietec 0.7μm CMOS process. Current-to-voltage conversion was achieved using a transimpedance amplifier with a feedback resistance.

The centroid processor successfully computed the centroids at a rate of 1.54kHz, which was limited by the 40kHz maximum conversion frequency of the ADC. Even at this speed, an array of these centroid detection systems operating in parallel will enable fast, low cost adaptive optical systems to be built. The centroid data was then transmitted off-chip to a PC using an RS232 transmitter. Having proven the functionality of the digital centroid processor and the use of a full custom photodiode array, the next step in the design was to integrate the full custom array with the digital centroid processor onto a single CMOS IC chip.
4.1 INTRODUCTION

After the performance of a full custom photodiode array and the centroid processing algorithm was verified by the hardware emulation system, the next stage is to integrate the full custom array and the processing onto a single piece of silicon. A block diagram of the overall system is shown in Figure 4.1. The top level schematic of the system is shown in Appendix A4.1 which shows the centroid chip divided into an analogue front end and a digital backend. The analogue front end consists of an active pixel sensor array and analogue-to-digital conversion circuitry. The digital backend consists of the centroid processor and a serial link for transmitting and receiving data off-chip. The individual components of the system are discussed in greater detail in the following sections.

Figure 4.1 Block diagram of the single CMOS chip optical centroid processor
4.2 ANALOGUE FRONT END

The analogue front end in the hardware emulation system consists of a passive pixel array and a transimpedance amplifier with a large feedback resistance. However, it is difficult to integrate a large resistor on silicon and the use of an external resistor would lead to parasitics and noise. Instead, an integrating photodiode APS array was used which incorporates a buffer at each pixel to convert the photocurrent to a discharging voltage output (see Section 1.6.2.1). The optimisation of this architecture is presented in Section 4.2.1 while the digitisation of the output signal is discussed in Section 4.2.2.

4.2.1 PHOTODIODE AND PIXEL ARCHITECTURE

For the photodiode array, the deep n-well to p-substrate photodiode is used. The deep photodiode has better responsivity and lower junction capacitance than the shallow devices due to its wide depletion region caused by the relatively low carrier concentration in the n-well. Also, it does not suffer from a large leakage current in reverse bias as the combined shallow p+/n-well and deep n-well/p-substrate photodiode in the original array does. Furthermore, its higher responsivity at longer wavelengths makes it suitable for adaptive optical systems where longer wavelength operation means less stringent requirements.

Each pixel of the APS array, shown in Figure 4.2 (a), has a size of 100μm x 100μm and consists of the deep photodiode (Djunc), a complementary NMOS/PMOS reset gate (MRST, MNRST), a source follower (MACT, MBIAS) and two select transistors (MRSEL, MCSEL). All pixels are reset globally and the inverter output and the bias transistor (MBIAS) are shared with all pixels. Having a CMOS transmission gate allows the pixel to take advantage of a wider dynamic range by pulling the pixel up to the 5V supply voltage (VDD) during reset. This eliminates the problem obtained with using only an NMOS reset transistor whereby the reset level varies with light intensity [Tian 2001]. The layout of a single pixel is shown in Figure 4.2(b). Circuitry other than the photodetector is light shielded using one of the two available metal layers but this is not shown in Figure 4.2(b).
Figure 4.2 Active pixel circuit and its layout

The backend of the active pixel sensor, shown in Figure 4.3, acts as a current-to-voltage converter and buffer for the photodiode node. It consists of the source follower active transistor (MACT), a row select transistor (MRSEL), a column select transistor (MCSEL) and a bias transistor (MBIAS) which is shared by all pixels. To optimize the design of the backend, simulations were carried out to find the optimum W/L of the transistors and biasing voltage. For the simulations, the gates of the row-column access transistors were held at VDD i.e. 5V. To verify and confirm the simulation results, first order analysis of the circuit was carried out. The results of the simulation and the circuit analysis are presented in the following subsections.

![Diagram of the active pixel sensor backend](image)

**Figure 4.3 Backend of the active pixel sensor**

Optimisation of MBIAS:

As $V_{\text{bias}}$ is increased, the dynamic range decreases and the response becomes more non-linear, particularly for lower input voltages (Figure 4.4). An optimum bias voltage of 1V was selected to keep the biasing transistor operating above the threshold voltage of 0.76V [Alcatel Microelectronics 1999a] but sufficiently small so as to maintain a wide linear operating range. For further optimisation after fabrication, the bias voltage of the bias transistor of the active pixel array can also be applied externally.

![Graph of VOUT against Vin for different values of Vbias (W/L = 3μm/3μm)](image)

**Figure 4.4 VOUT against Vin for different values of Vbias (W/L = 3μm/3μm)**
Similarly, as the width-to-length (W/L) ratio of MBIAS increases, the dynamic range reduces and the response becomes more non-linear at lower input voltages (Figure 4.5). Also using larger size transistors does not improve the linearity of the response. As such, a W/L ratio of $3 \mu m/3 \mu m$ was chosen.

![Figure 4.5 VOUT against Vin for different sizes of the MBIAS transistor](image)

**Figure 4.5** VOUT against Vin for different sizes of the MBIAS transistor

**Optimisation of MCSEL and MRSEL:**

The effect of sweeping the voltage at Vrow on the output VOUT is shown in Figure 4.6. The response is fairly linear at voltages below 3V. As the W/L ratio of MCSEL increases, the dynamic range improves and flattens off at a higher voltage. As such, a large W/L ratio is desirable but the improvement in increasing the W/L ratio reduces as the W/L ratio increases. A W/L ratio of $3 \mu m/1 \mu m$ is used to keep the fill factor of the pixel large. The simulations indicate the need of a CMOS transmission gate to allow satisfactory transmission of higher voltages. But the dynamic range achievable is more than adequate for our application. Similar results were obtained with MRSEL and a W/L ratio of $3 \mu m/1 \mu m$ was also used for MRSEL.
Figure 4.6 VOUT against Vrow for different sizes of MCSEL

Optimisation of MACT:
Increasing the W/L of MACT improves the voltage gain and linearity at lower input voltages but at higher voltages the linearity degrades because of the poor transmission of high voltages by the NMOS row-column select transistors (Figure 4.7). The improvement in increasing the W/L ratio reduces as the ratio increases. Using a larger size transistor with the same W/L ratio does not give better results. A W/L ratio of $6 \mu m/1 \mu m$ was chosen for MACT. The selected W/L ratios of the transistors are shown in Figure 4.3.

Figure 4.7 VOUT against Vin for different sizes of MACT
Circuit analysis (ignoring second order effects):

In the circuit analysis, the node and voltage names in Figure 4.3 are used and the transconductance, $K$, and threshold voltages, $V_T$, of the different transistors are referred to using a subscript of that transistor's assigned name (MACT, MRSEL, MCSEL and MBIAS), e.g. $K_{MBIAS}$ is the transconductance of the transistor MBIAS.

For output voltages $V_{out} > V_{bias} - V_{T_{MBIAS}}$ and $V_{bias} > V_{T_{MBIAS}}$, MBIAS is operated in the saturation region [Gray 1992] such that its drain-source current is independent of the drain-source voltage, $V_{out}$. Therefore, the current through MBIAS, $i$ is given by:

$$i = \frac{K_{MBIAS}}{2} (V_{bias} - V_{T_{MBIAS}})^2$$  \hspace{1cm} (4.1)

MACT is also in saturation and because the voltage at $V_{out}$ is buffered, the current through MACT is equal to the current through MBIAS, MRSEL and MCSEL. Therefore:

$$i = \frac{K_{MACT}}{2} (V_{in} - V_{act} - V_{T_{MACT}})^2$$  \hspace{1cm} (4.2)

From (4.1) and (4.2), the voltage at $V_{act}$ in terms of $V_{in}$ and $V_{bias}$ is obtained as follows:

$$V_{act} = V_{in} - V_{T_{MACT}} - \sqrt{\frac{K_{MBIAS}}{K_{MACT}}} (V_{bias} - V_{T_{MBIAS}})$$  \hspace{1cm} (4.3)

MRSEL is operating in the linear region. Hence:

$$i = K_{MRSEL} (V_{DD} - V_{row} - V_{T_{MRSEL}}) (V_{act} - V_{row})$$  \hspace{1cm} (4.4)

From (4.1) and (4.4), the voltage $V_{row}$ in terms of $V_{act}$ and $V_{bias}$ becomes:

$$V_{row} = V_{act} - \frac{K_{MBIAS}}{2K_{MRSEL}} \frac{(V_{bias} - V_{T_{MBIAS}})^2}{V_{DD} - V_{row} - V_{T_{MRSEL}}}$$  \hspace{1cm} (4.5)

Similarly, MCSEL is also operating in the linear region so:

$$i = K_{MCSEL} (V_{DD} - V_{out} - V_{T_{MCSEL}}) (V_{row} - V_{out})$$  \hspace{1cm} (4.6)

From (4.1) and (4.6), the voltage $V_{out}$ in terms of $V_{row}$ and $V_{bias}$ is given by:

$$V_{out} = V_{row} - \frac{K_{MBIAS}}{2K_{MCSEL}} \frac{(V_{bias} - V_{T_{MBIAS}})^2}{V_{DD} - V_{out} - V_{T_{MCSEL}}}$$  \hspace{1cm} (4.7)

Therefore, from (4.3), (4.5) and (4.7) the output voltage $V_{out}$ in terms of $V_{in}$ and $V_{bias}$ is now obtained as follows:

$$V_{out} = V_{in} - V_{T_{MACT}} - \sqrt{\frac{K_{MBIAS}}{K_{MACT}}} (V_{bias} - V_{T_{MBIAS}}) - \frac{K_{MBIAS}}{2K_{MRSEL}} \frac{(V_{bias} - V_{T_{MBIAS}})^2}{V_{DD} - V_{row} - V_{T_{MRSEL}}} - \frac{K_{MBIAS}}{2K_{MCSEL}} \frac{(V_{bias} - V_{T_{MBIAS}})^2}{V_{DD} - V_{out} - V_{T_{MCSEL}}}$$  \hspace{1cm} (4.8)

This shows that as $V_{bias}$ or the W/L ratio of MBIAS is increased, the dynamic range is reduced as indicated by Figure 4.4. It also confirms that increasing the W/L ratios of MACT, MRSEL and MCSEL improves the voltage gain and linearity of the transfer function. Furthermore, the last term shows that as the output voltage drops, the non-linearity increases. This agrees with the simulation results presented earlier. The non-linearity for $V_{in}$ values close to the supply voltage is due to the use of NMOS select transistors which do not pass high voltages very well.
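
The first-order analysis can also be evaluated numerically. The sketch below is a rough illustration rather than a substitute for the circuit simulations: it equates the bias current of (4.1) with the device currents of (4.2), (4.4) and (4.6) in turn, and it assumes that all four NMOS devices share the same process gain factor, so that the ratios of $K$ reduce to ratios of W/L, and the same 0.76V threshold voltage with no body effect. The helper names are hypothetical.

```python
import math

VDD = 5.0
VT = 0.76  # assumption: one threshold voltage for all four NMOS devices

# W/L ratios selected in the text; with a common process gain factor the
# transconductance ratios reduce to ratios of these numbers (assumption).
WL = {"MBIAS": 3.0 / 3.0, "MACT": 6.0 / 1.0, "MRSEL": 3.0 / 1.0, "MCSEL": 3.0 / 1.0}


def _switch_output(v_top, k_ratio, v_ov):
    """Solve v = v_top - (k_ratio / 2) * v_ov**2 / (VDD - v - VT) for v.

    This is the linear-region select transistor carrying the MBIAS current.
    Substituting x = (VDD - VT) - v gives x**2 - b*x - a = 0 with x > 0.
    """
    a = 0.5 * k_ratio * v_ov ** 2
    b = (VDD - VT) - v_top
    x = 0.5 * (b + math.sqrt(b * b + 4.0 * a))
    return (VDD - VT) - x


def v_out(v_in, v_bias=1.0):
    """First-order pixel backend output for a given photodiode node voltage."""
    v_ov = v_bias - VT  # MBIAS overdrive voltage
    # Source follower MACT: equate (4.1) and (4.2).
    v_act = v_in - VT - math.sqrt(WL["MBIAS"] / WL["MACT"]) * v_ov
    # Row select MRSEL: equate (4.1) and (4.4).
    v_row = _switch_output(v_act, WL["MBIAS"] / WL["MRSEL"], v_ov)
    # Column select MCSEL: equate (4.1) and (4.6).
    return _switch_output(v_row, WL["MBIAS"] / WL["MCSEL"], v_ov)


if __name__ == "__main__":
    for vin in (2.0, 3.0, 4.0, 5.0):
        print(f"Vin = {vin:.1f} V -> Vout = {v_out(vin):.3f} V")
```

Under these assumptions the incremental gain is close to unity at mid-range inputs and falls as $V_{in}$ approaches the supply, and raising $V_{bias}$ lowers the output for a given input, in line with the trends described above.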

### 4.2.2 ANALOGUE-TO-DIGITAL CONVERSION

The pixel is operated in the charge integration mode. Each pixel is globally reset to 5V for 8μs (with a 32MHz clock) after which the pixel photodiode is allowed to discharge through its own photocurrent as shown in Figure 4.8. The discharge rate is proportional to the photocurrent of that pixel, which in turn is proportional to its incident light level. The discharge curve is approximately linear for voltages above 1V. This is because the photodiode capacitance varies inversely with the square root of the diode voltage. Also, as the diode and the pixel output voltage drops, the bias transistor starts to operate in the linear region and is no longer independent of the output voltage.
When digitising the pixel light level from a discharge curve, either the final output voltage is digitised using standard ADC techniques (such as successive approximation, dual slope or flash conversion), or the time taken for the discharge to occur is measured. The second method was preferred because it requires only simple circuitry and because, by integrating the ADC into the discharge curve of the pixel using a counter technique, the digital output is available immediately after the discharge period. The discharge time is measured by starting an 8-bit counter when the pixel voltage passes through an initial voltage level and stopping it when it passes a second, lower voltage level. These voltage levels are set by two sets of reference voltage generators.

Each set of reference voltage generators can generate a voltage between 1V and 3.75V with a step size of 0.25V and each connects to a comparator input. Six levels of one of these reference generators are shown in Figure 4.9. The reference voltage generator consists of a set of voltage dividers implemented using active resistors (transistors in saturation) and transmission gates for selecting the desired voltage. These transmission gates are used to select the switching points of the comparators during analogue-to-digital conversion. This style of reference generation was used because of its efficient use of space [Allen 1987]. Increasing the number of devices in an active resistor can reduce the total required area by reducing the voltage across the transistors and changing the ratio required for the desired output. For testability, external reference voltages can also be applied in place of the internal ones.

49 If the transistors are identical in size, the volt drops across the transistors are equal.

Figure 4.9 Reference voltage generator (diagram labels: voltage divider for 1V reference; select switch)

When reset is fired, one reference voltage (Vref1) will be set at 3.75V while the second reference voltage (Vref2) will be set at 3.5V. Then the reference voltages are decreased until both reference voltages are below the pixel-reset level. This is determined by when the comparators switch over. In order to cope with a wide range of light levels, three modes of operation have been designed. In the first mode, the counter is started when the reset is removed and stopped when the discharge curve passes the 1st reference voltage. In the second mode, the counter is started when the discharge curve passes the 1st reference voltage and stopped when it passes the 2nd reference voltage. This has the advantage that if the reset level varies from pixel to pixel, the reading will be independent of this offset. In the third mode, a 2-cycle approach is used. In the 1st cycle, a reading is obtained as in mode 2. In the 2nd cycle the value of Vref2 is adjusted such that a larger dynamic range is obtained thereby increasing the resolution for higher light levels.

The simulated comparator delay is 0.38μs and the pixel reset period is sufficiently long for the setting of the threshold levels. The default discharge time of 8μs for the system was chosen for detecting photocurrents of the order of 10nA to 10μA in Mode 1 and 2. The minimum and maximum detectable currents, $I_{\text{min}}$ and $I_{\text{max}}$ respectively, are given by:

$$I_{\text{min}} = \frac{C\Delta V}{\Delta T_{\text{max}}} = \frac{0.5\text{pF} \times 0.25\text{V}}{8\mu\text{s}} = 15.6\text{nA}$$  \hspace{1cm} (4.9)

$$I_{\text{max}} = \frac{C\Delta V}{\Delta T_{\text{min}}}$$  \hspace{1cm} (4.10)
where $C$ is the capacitance of the photodiode at 2V (taken as the average voltage on the discharge curve) and is obtained from the dark C-V measurements (see Section 2.3.2), $\Delta V$ is the volt drop over which the time is measured, and $\Delta T_{\text{max}}$ and $\Delta T_{\text{min}}$ are the maximum and minimum measurable time step respectively. Mode 3 extends this dynamic range by 8 times. Increasing $\Delta V$ to increase the minimum (and maximum) detectable current is in effect a form of thresholding and can be used to remove background signals. With this ADC technique, the sensitivity at low light intensity is limited by the spacing of the reference voltage levels while the sensitivity at high light intensity is limited by the speed of the counter.
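
As a worked check of the detectable current range, the sketch below evaluates $I = C\Delta V/\Delta T$ for both limits and models the counter reading for a given photocurrent. It assumes a perfectly linear discharge, takes $\Delta T_{\text{max}}$ as the full 8-bit count at 32MHz (256 periods, i.e. 8μs) and, as an assumption not stated explicitly above, takes $\Delta T_{\text{min}}$ as a single period of the 32MHz counter clock. The function names are hypothetical.

```python
C_PD = 0.5e-12     # photodiode capacitance at 2 V, farads (from the text)
DELTA_V = 0.25     # spacing between the two reference levels, volts
F_CLK = 32e6       # counter clock, hertz
COUNTER_BITS = 8


def detectable_currents(delta_v=DELTA_V, f_clk=F_CLK):
    """Return (I_min, I_max) in amperes for a linear discharge."""
    dt_max = (2 ** COUNTER_BITS) / f_clk   # full count: 8 us at 32 MHz
    dt_min = 1.0 / f_clk                   # assumption: one counter tick
    return C_PD * delta_v / dt_max, C_PD * delta_v / dt_min


def counter_value(i_photo, delta_v=DELTA_V, f_clk=F_CLK):
    """Clock ticks counted while the pixel discharges by delta_v (saturating)."""
    ticks = int(C_PD * delta_v / i_photo * f_clk)
    return min(ticks, 2 ** COUNTER_BITS - 1)


if __name__ == "__main__":
    i_min, i_max = detectable_currents()
    print(f"I_min = {i_min * 1e9:.1f} nA, I_max = {i_max * 1e6:.1f} uA")
```

With these assumptions the upper limit comes out at a few microamperes, which is consistent with the stated design range of the order of 10nA to 10μA; a larger $\Delta V$ shifts both limits upwards in proportion.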

Two of the main sources of noise in an active pixel sensor are shot noise and reset noise. Shot noise due to integration is given by [Droste 2002]:

$$V_n = \sqrt{\frac{q}{C^2} \int_0^T (I_{ph} + i_{dc}) dt} = \sqrt{\frac{q\Delta V}{C}}$$  \hspace{1cm} (4.11)

where $q = 1.6 \times 10^{-19}$C is the electron charge, $C$ is the photodiode capacitance, $I_{ph}$ is the pixel photocurrent, $i_{dc}$ is the pixel dark current, $T$ is the integration period and $\Delta V$ is the signal volt drop over the integration period. For a signal volt drop of 0.25V, the shot noise voltage is:

$$V_n = \sqrt{\frac{1.6 \times 10^{-19} \times 0.25}{0.5 \times 10^{-12}}} = 0.283 \text{mV}$$  \hspace{1cm} (4.12)

The reset noise voltage (equation (1.16)), on the other hand, works out to be:

$$V_{\text{reset}} = \sqrt{\frac{kT}{C}} = \sqrt{\frac{1.38 \times 10^{-23} \times 300}{0.5 \times 10^{-12}}} = 91 \mu \text{V}$$  \hspace{1cm} (4.13)

For a signal volt drop of 0.25V, this represents a total SNR of more than 58.9dB.
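
The noise figures above can be checked with a few lines of arithmetic. The sketch below recomputes the shot noise and reset noise voltages from the stated constants and converts them to a signal-to-noise ratio for the 0.25V signal. How the two terms are combined into a total is not spelled out in the text; the root-sum-square combination used here is an assumption, and the function names are hypothetical.

```python
import math

Q = 1.6e-19        # electron charge, coulombs
K_B = 1.38e-23     # Boltzmann constant, J/K
T_KELVIN = 300.0   # temperature used in (4.13)
C_PD = 0.5e-12     # photodiode capacitance, farads
DELTA_V = 0.25     # signal volt drop, volts


def shot_noise(delta_v=DELTA_V, c=C_PD):
    """Integrated shot noise voltage, sqrt(q * dV / C)."""
    return math.sqrt(Q * delta_v / c)


def reset_noise(c=C_PD, temperature=T_KELVIN):
    """kT/C reset noise voltage."""
    return math.sqrt(K_B * temperature / c)


def snr_db(signal, noise):
    """Voltage signal-to-noise ratio in decibels."""
    return 20.0 * math.log10(signal / noise)


if __name__ == "__main__":
    v_shot, v_reset = shot_noise(), reset_noise()
    v_total = math.hypot(v_shot, v_reset)   # assumed uncorrelated sources
    print(f"shot = {v_shot * 1e3:.3f} mV, reset = {v_reset * 1e6:.0f} uV")
    print(f"SNR (shot only)  = {snr_db(DELTA_V, v_shot):.1f} dB")
    print(f"SNR (shot+reset) = {snr_db(DELTA_V, v_total):.1f} dB")
```

The shot noise term alone gives approximately 58.9dB; adding the reset noise in quadrature lowers the result by roughly 0.4dB, so shot noise dominates under these conditions.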

A programmable discharge clock is used such that for lower light levels, a slower clock is used to measure the slower discharge curve over a longer period, thus allowing optimum resolution to be maintained for different intensity levels. This is yet another advantage of using CMOS processing, which allows on-chip programmability to be incorporated. Four programmable discharge clock frequencies are possible, which are internally selected by two mode registers, the states of which can be read via the on-chip RS232 receiver.

### 4.2.3 APPLICABILITY OF DESIGN

Given that the sensor was designed to operate with photocurrents of 10nA to 10μA in Mode 1 and 2, the typical incident light levels for several applications are examined in order to determine which applications are feasible. In astronomy, the number of photons reaching the Earth's surface in a given area in unit time is given by the astronomical brightness, \( B_{\text{astro}} \) and this is defined for a visible passband by:

$$B_{\text{astro}} = \left(4 \times 10^6 \right) 10^{-m_v/2.5} \ \text{photons/cm}^2\text{-sec}$$  \hspace{1cm} (4.14)

where \( m_v \) is the visual magnitude of the observed star and a visual magnitude of 14 is roughly the brightness of a sunlit geosynchronous satellite [Tyson 1995]. For \( m_v = 14 \), \( B_{\text{astro}} = 10 \text{ photons/cm}^2 \text{-s} \). For an 8m telescope, the photon flux will be \( 5 \times 10^6 \text{ photons/s} \) and assuming Fried's coherence length, \( r_o \), is 15cm i.e. the size of a single subaperture, the photon flux per subaperture will be 2500 photons/s or about 0.8fW of incident power. For a responsivity of approximately 0.3A/W, the photocurrent generated will be about 0.2fA! In order to detect this level of intensity, the integration time needs to increase by \( 10^8 \) times i.e. 3s.

According to [Nirmaier 2003], specifications for ophthalmic applications state that a safely applicable laser power results in 200pW per spot or about 60pA of photocurrent, which is about 250 times below the measurement limit. With faster clocks, shorter integration times, moving to smaller lower capacitance pixels and better readout techniques, it is possible with future designs to improve the sensitivity to this level and enable its use in ophthalmic applications.
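
The light-level estimates above follow from short calculations, sketched below. The code evaluates (4.14), scales it by the collecting area of a circular aperture (the circular shape is an assumption), and converts incident power to photocurrent with the 0.3A/W responsivity. The per-subaperture photon count and the integration time are taken from the text and are not recomputed. The function names are hypothetical.

```python
import math

RESPONSIVITY = 0.3      # A/W, approximate value from the text
I_MIN = 15.6e-9         # minimum detectable current from (4.9), amperes


def b_astro(m_v):
    """Astronomical brightness of (4.14) in photons per cm^2 per second."""
    return 4e6 * 10 ** (-m_v / 2.5)


def telescope_flux(m_v, diameter_m):
    """Photons per second collected by a circular aperture (assumed shape)."""
    area_cm2 = math.pi * (diameter_m * 100.0 / 2.0) ** 2
    return b_astro(m_v) * area_cm2


def photocurrent(power_w, responsivity=RESPONSIVITY):
    """Photocurrent in amperes for a given incident optical power."""
    return power_w * responsivity


if __name__ == "__main__":
    print(f"B_astro(14)       = {b_astro(14):.1f} photons/cm^2/s")
    print(f"8 m telescope     = {telescope_flux(14, 8.0):.2e} photons/s")
    print(f"0.8 fW subaperture= {photocurrent(0.8e-15) * 1e15:.2f} fA")
    i_eye = photocurrent(200e-12)
    print(f"200 pW spot       = {i_eye * 1e12:.0f} pA, "
          f"{I_MIN / i_eye:.0f} times below I_min")
```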

In free-space optical (FSO) communications, the requirement is for the system to be eye safe as per IEC 60825 Class 1 or Class 1M specifications (up to 2mW/cm²). In addition, FSO systems operate at longer wavelengths, either 780-850nm or near the 1550nm band, in order to be completely eye safe. For example, at approximately 1550nm, the regulatory agencies allow approximately 100 times higher power for "eye safe" lasers. This is because at this wavelength, the aqueous fluid of the eye absorbs much more of the energy of the beam, preventing it from travelling to the retina and inflicting damage.

With further work it is foreseen that the design will be applicable to the fields of free space optical communications, microscopy and ophthalmology, but it is unlikely to be used in astronomy, where currently CMOS imagers are not as sensitive as CCDs, and the cooling and long integration times required negate the benefits of using the low-cost highly-integrated CMOS option.

4.3 DIGITAL BACKEND

The digital backend consists of the centroid processor which was previously verified by the hardware emulation system, counters and control for the ADC, the required clock dividers, serial transmitters for sending output data off-chip and serial receivers for receiving control signals. The block diagram of the digital backend is shown in Figure 4.10.

---

Figure 4.10 Block diagram of digital backend of the ASIC
4.3.1 ASIC CENTROID PROCESSOR

The centroid processor computes the first order moment of the light intensity (see Section 1.4.2) and was successfully demonstrated in the hardware emulation system. The x and y-centroids are calculated in parallel with separate processors. In addition to finding the centroid, the position of the pixel with the highest intensity is also found. In the ASIC processor, the number of bits is extended to 11 bits due to the extended dynamic range. This extends the number of bits required for the dividend and divisor to 18 bits and 16 bits respectively. The only change required to the centroid processor in the hardware emulation system is larger storage while the operations performed remain the same.
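
A software model of the first-order moment calculation is sketched below. It is not the VHDL implementation; it only mirrors the accumulate-then-divide structure described above. The pixel coordinates are assumed to run from 1 to 5, which is a choice made for this example, although it does reproduce the quoted 18-bit dividend and 16-bit divisor widths for 11-bit pixel values. The function names are hypothetical.

```python
def centroid_sums(frame):
    """Accumulate dividends and divisor for a 5x5 frame, frame[row][col].

    Coordinates run from 1 to 5 (an assumption for this sketch).
    Returns (x_dividend, y_dividend, divisor, peak_position).
    """
    x_dividend = y_dividend = divisor = 0
    peak_value, peak_position = -1, None
    for row, line in enumerate(frame, start=1):
        for col, value in enumerate(line, start=1):
            x_dividend += col * value   # first-order moment in x
            y_dividend += row * value   # first-order moment in y
            divisor += value            # total intensity
            if value > peak_value:
                peak_value, peak_position = value, (col, row)
    return x_dividend, y_dividend, divisor, peak_position


def centroid(frame):
    """Return (x, y) centroid, or None if no light was detected."""
    x_dividend, y_dividend, divisor, _ = centroid_sums(frame)
    if divisor == 0:
        return None
    return x_dividend / divisor, y_dividend / divisor


if __name__ == "__main__":
    full_scale = [[2047] * 5 for _ in range(5)]   # worst case, 11-bit pixels
    xd, yd, div, _ = centroid_sums(full_scale)
    print("dividend bits:", xd.bit_length(), "divisor bits:", div.bit_length())
```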

The centroid multiplications and summations are carried out as the pixels are read while the division process of the centroid calculation is performed once every frame, after all the pixel values have been read and the final dividend and divisor values obtained. After the division process is completed, the centroid values are latched out. A frame lasts 26 pixel periods. A pixel period consists of a reset period that lasts 256 clock cycles (8μs) and a discharge period that lasts up to 256 clock cycles depending on the light level. For the full 32MHz clock, this gives a frame rate of between 2.4kHz (32MHz / (512*26)) and 4.8kHz (32MHz / (256*26)) depending on the incident light level of every pixel. The use of 26 pixel periods per frame was out of convenience and it is possible to reduce this to 25 pixel periods by using a faster clock to latch out the data and clear the registers, thereby allowing a new frame to start immediately after 25 pixel periods.
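
The frame-rate figures quoted above follow directly from the clock and cycle counts, as the short sketch below shows. The function name and parameter names are hypothetical.

```python
F_CLK = 32e6          # external clock, hertz
RESET_CYCLES = 256    # reset period per pixel, clock cycles
PIXEL_PERIODS = 26    # pixel periods per frame in the fabricated design


def frame_rate(discharge_cycles, pixel_periods=PIXEL_PERIODS):
    """Frame rate in hertz for a given discharge length (0 to 256 cycles)."""
    cycles_per_pixel = RESET_CYCLES + discharge_cycles
    return F_CLK / (cycles_per_pixel * pixel_periods)


if __name__ == "__main__":
    print(f"slowest (full discharge): {frame_rate(256):.0f} Hz")
    print(f"fastest (no discharge):   {frame_rate(0):.0f} Hz")
    print(f"slowest with 25 periods:  {frame_rate(256, 25):.0f} Hz")
```

Reducing the frame to 25 pixel periods, as suggested above, would raise the worst-case rate from about 2.4kHz to 2.5kHz.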

In the hardware emulation system, the division process was performed asynchronously due to the style of coding and required about 125μs (5 conversion cycles of the ADC) to settle to its result\(^{50}\). The VHDL division process was modified in the centroid ASIC for synchronous operation, leading not only to an improvement in speed but also a reduction in gate count despite the increase in number of bits. The division process now needed just 15 clock cycles of a 16MHz clock to complete. The clock frequency used was half the external 32MHz clock frequency as the critical path delay was found to be 31.37ns in the division process (Appendix A4.3). At 16MHz, 15 clock cycles take just 0.9375μs to complete. With a pixel period of 8μs to 16μs, there is plenty of latency to be utilised, allowing the division processor to be shared among several centroid processors; this is discussed further in Section 4.3.3. For testability, the reset of the pixels and the row-column addressing can also be controlled externally.

\(^{50}\) Although the division process in the hardware emulation system is slower, the update rate of the centroid values was also 26 ADC conversion cycles or pixel access cycles.

4.3.2 DATA TRANSMISSION

Serial transmission was preferred over parallel outputs in order to minimise the
number of pinouts required. Centroid data and intermediate values of centroid
processing (such as the x-dividend, y-dividend, divisor, individual pixel light level
and peak pixel position) are transmitted in RS232 format\(^{51}\) with, as before, one start
bit (logic '0'), 8 data bits, no parity bits and 1 stop bit (logic '1') at a selectable baud
rate of 115200, 76800, 57600, 38400, 19200, 9600, 4800 or 2400 bits/s selected by
three external control pins. The default startup rate is 115200 bits/s and this is the only
RS232 baud rate capable of transmitting the centroid data in real time. The minimum
possible data rate needed to transmit the centroid data in real time = (32 MHz / (256 x
26)) x 20 bits = 96154 bits/s.

The row/column address and digitised light levels are also transmitted off-chip so that external processing of the centroids can be performed. Unlike the centroid values, these need to be transmitted at the pixel rate. So the light level data was transmitted in a modified serial format with one start bit (logic '0'), 11 data bits (10 for the row/column address), no parity bits and 1 stop bit (logic '1') and the minimum possible data rate required in this case is (32 MHz / 256) x 13 bits = 1.625 Mbits/s. Hence, a baud clock of 4 MHz was used to transmit the row/column address and digitised light levels. This is a non-standard format intended for use with an FPGA or DSP to perform the external centroid processing.

\(^{51}\) A single transmitter is used to transmit the centroid and intermediate values by controlling three external pins to select the data type to send. Some of the intermediate values have more than two bytes (or just one byte for the pixel position of maximum intensity) to transmit and as before, the MSBs are coded to distinguish between the packets transmitted. clkd10, clkd20 and clkd30 signals are used to ensure that a complete set of bytes are sent before newly available or updated data are transmitted.
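
The two minimum data rates quoted in this section can be reproduced as follows. The sketch also checks which of the listed RS232 baud rates can carry the centroid data at the fastest frame rate. The function and variable names are hypothetical.

```python
F_CLK = 32e6
BAUD_RATES = [115200, 76800, 57600, 38400, 19200, 9600, 4800, 2400]


def centroid_bit_rate(bits_per_frame=20, cycles_per_pixel=256, pixel_periods=26):
    """Minimum bit rate to send one centroid packet per frame."""
    return F_CLK / (cycles_per_pixel * pixel_periods) * bits_per_frame


def pixel_bit_rate(bits_per_pixel=13, cycles_per_pixel=256):
    """Minimum bit rate to send one light-level word per pixel period."""
    return F_CLK / cycles_per_pixel * bits_per_pixel


def usable_baud_rates(required):
    """Listed baud rates that meet or exceed the required bit rate."""
    return [b for b in BAUD_RATES if b >= required]


if __name__ == "__main__":
    need = centroid_bit_rate()
    print(f"centroid data: {need:.0f} bits/s, usable: {usable_baud_rates(need)}")
    print(f"pixel data:    {pixel_bit_rate() / 1e6:.3f} Mbits/s")
```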

Two serial receivers were implemented. The first obtains control signals for selecting the ADC mode of operation, the discharge clock rate, the accessibility of certain input/output pins, and the enabling of external processor control. The second receiver is used to receive external row/column address inputs and, like the row/column address transmitter, operates at 4 Mbits/s to enable real-time operation with, for example, an FPGA. The control signal receiver, on the other hand, does not need to operate very fast and is designed to detect a serial RS232 format input of one start bit, 8 data bits and 1 stop bit at a baud rate of 19200 bits/s. Two bytes of input data (or 16 bits) provide the required control signals. The receivers synchronise the baud rate clock at every start bit using the 32 MHz clock and start looking for the next start bit at the centre of a stop bit.
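
The receive framing described above can be illustrated with a small software model. This is a sketch only, not the on-chip receiver: it assumes the usual least-significant-bit-first ordering of RS232 data, uses an arbitrary oversampling factor of 16 in place of the 32MHz clock, and samples each bit at its centre after re-synchronising on the falling edge of the start bit. The function names are hypothetical.

```python
def encode(byte, oversample=16):
    """Return an oversampled line waveform for one 10-bit frame."""
    bits = [0] + [(byte >> i) & 1 for i in range(8)] + [1]  # start, data, stop
    return [bit for bit in bits for _ in range(oversample)]


def decode(samples, oversample=16):
    """Recover one byte from an oversampled line that idles at logic 1."""
    start = samples.index(0)                 # falling edge marks the start bit
    centre = start + oversample // 2         # re-synchronise to the bit centre
    bits = [samples[centre + n * oversample] for n in range(10)]
    if bits[0] != 0 or bits[9] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))


if __name__ == "__main__":
    line = [1] * 5 + encode(0xA5) + [1] * 5  # idle, one frame, idle
    print(hex(decode(line)))
```

Two such frames received in succession would supply the 16 control bits mentioned above.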

4.3.3 LAYOUT AND TEST BOARD

The ASIC centroid processor represents a single tilt sensor in an integrated Shack Hartmann wavefront sensor and the core-limited layout of this tilt sensor is shown in Figure 4.11. The chip contains 7200 logic gates and has a size of 4500μm x 4000μm. The core area is 3800μm x 3400μm and the photodiode array takes up 530μm x 600μm. The digital circuitry, which makes up a significant portion of the chip, will scale favourably as technology scales and as we move towards a triple metal process with improved routing capabilities. The main centroid processor block, for example, takes up an area of 1800μm x 1900μm or 3.42mm² and the division circuitry takes up about 44% of this area or 1.5mm². However, since the division is performed only once every frame and requires only 15 cycles of the 16MHz clock or 0.9375μs to complete, there is significant latency in the use of the divider. Assuming that it is necessary to obtain all the centroid/tilt outputs within one pixel cycle (256 x 1/32MHz = 8μs) in order that the tilts correctly represent the same wavefront, about 8 centroid processors could time-share one divider without any loss in data rate. However, such a large number of centroid processors to one divider would lead to significant routing and crosstalk issues. Also circuitry like the clock dividers, receiver and control logic can be shared alongside the divider while the transmitters are only needed for the final output signals. So overall there is a 50% fill factor in the circuitry that can be shared. Integrating 4 tilt sensors for every divider such as illustrated in Figure 4.12 will give a fill factor of about 20% which is a reasonable solution.

52 The received data is latched out at this point as well. Certain control signals were not received serially but as parallel inputs, such as the baud rate clock select, the transmit data type, the transmit trigger signal and the global reset signal.
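
The divider time-sharing estimate in this section can be reproduced with integer arithmetic on 32MHz clock periods, as sketched below. The constant and function names are hypothetical; the area figures are those quoted in the text.

```python
PIXEL_CYCLE_TICKS = 256        # one pixel cycle in 32 MHz clock periods (8 us)
DIVIDE_CYCLES = 15             # division length in 16 MHz clock cycles
TICKS_PER_DIVIDE_CYCLE = 2     # the 16 MHz clock is half the 32 MHz clock


def max_shared_processors():
    """Whole divisions that fit back to back in one pixel cycle."""
    divide_ticks = DIVIDE_CYCLES * TICKS_PER_DIVIDE_CYCLE
    return PIXEL_CYCLE_TICKS // divide_ticks


def divider_area_mm2(width_um=1800, height_um=1900, fraction=0.44):
    """Approximate divider area from the centroid block size."""
    return width_um * height_um * 1e-6 * fraction


if __name__ == "__main__":
    print("processors per divider:", max_shared_processors())
    print(f"divider area: {divider_area_mm2():.2f} mm^2")
```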

For testing of the centroid processor, the fabricated ASIC is incorporated into a test board with power supply protection, input switches, serial port connection and a 32MHz crystal oscillator, as shown in Figure 4.13. The pushbuttons control reset signals and, to ensure default values are used on startup, an active-high power-on reset circuit with an RC time constant of 2.7ms is used as shown in Figure 4.14. The schematic and PCB layout for this test board are shown in Appendix A4.2.

Figure 4.11 Layout of ASIC optical centroid processor (a single tilt sensor)
Figure 4.12 Scalability of design to a complete wavefront sensing and reconstruction system

Figure 4.13 Test board for ASIC centroid processor

Figure 4.14 Power-on reset used in centroid ASIC test board
4.4 ASIC CAD ENVIRONMENT AND DESIGN FLOW

The design flow for an ASIC, illustrated in Figure 4.15, is a little more involved than for an FPGA. An FPGA has a highly integrated synthesis, placement and routing flow with less need and scope for manual optimisation due to the highly regular and structured architecture of the FPGA. An ASIC, on the other hand, does not have a regular layout and routing structure and can consist of both full-custom and semi-custom cells. Hence, for an ASIC, there is greater flexibility and manual control in the floorplanning, placement and routing stage.

![Figure 4.15 ASIC design flow](image)

The CAD environment used for the design of the ASIC was the Mentor Graphics C4 suite of tools [Mentor Graphics Corporation 1998]. Design entry is made via Design Architect v8.6_4. VHDL macros are incorporated into the design schematic by synthesizing the VHDL code using the Leonardo Spectrum synthesis tool and converting the EDIF netlist (.edf) generated into Mentor's proprietary netlist format, Electronic Design Data Model (EDDM), prior to schematic and symbol generation. The simulation tools used were Mentor's Accusim v8.6_3, QuickSim II v8.6_4 and QuickPath v8.5_1 for analogue simulation, digital simulation and static timing analysis respectively. Mixed analogue and digital simulation was not available so the analogue and digital parts of the design had to be simulated separately. Layout was carried out using Mentor's IC Station v8.7_3 family of tools, which include ICgraph for full custom editing, ICplan for floorplanning, ICblock and ICroute for automatic placement and routing of standard cells and blocks, and ICcompact for layout compaction. The target technology for the design was the same as in the first two chips (Section 2.2.1 and 2.2.2), that is the Alcatel Microelectronics (Mietec) 0.7μm CMOS process.

Because of the added element of full custom editing, the physical verification stage is an important part of any ASIC design and can be divided into three tasks: Electrical Rule Checks (ERC), Design Rule Checks (DRC) and Layout Versus Schematic (LVS) checks. The ERC checks for simple circuit violations such as short circuits, open circuits and incorrect power and ground connections. The DRC ensures that the design meets the layout rules set by the foundry, such as minimum spacing and minimum lengths of any given mask [Alcatel Microelectronics 1999b], while the LVS checks whether the layout structure matches the original schematic design. In IC Station the verification toolset is called ICverify and consists of the ICtrace, ICrules and ICextract tools for ERC, DRC and LVS checks respectively. ICextract is also the extraction tool used for extracting parasitic resistance and capacitance from the layout for backannotation into the design schematic. Post layout simulations were not carried out as the tools for backannotation were not properly set up, but LVS and DRC were performed. LVS had to be performed separately on each individual block before the top level check, that is, a hierarchical check had to be done. Finally, when fully verified, the design was exported in GDSII format and submitted to the chosen foundry, IMEC in Belgium, via the Europractice IC Service. There, further ERC and DRC checks were carried out using their Cadence Dracula set of tools before fabrication could commence.

53 Floorplanning is the process of placing groups of circuits on a die, and analyzing the effect of that placement in terms of design performance and routability. Floorplanning also helps to monitor the actual size of a design.

54 Backannotation is the process of extracting timing information from the layout back into the design schematic for post-layout simulation.

4.4.1 DESIGN AND LAYOUT ISSUES

The design is a mixed semi-custom and full-custom design with full custom cells (photodiode array, voltage reference generators), Mietec analogue cells (comparators, biasing) and digital standard cells (VHDL macros). With such a mixed design, several design and layout issues need to be addressed [Baker 1997, Johns 1997], such as the combining, partitioning and routing of the design in the Mentor Environment, clock tree buffering of digital circuitry and power consumption. These issues are discussed in the following subsections.

4.4.1.1 Design Entry and Full Custom Editing

In the Mentor environment, specific properties are used on the components at different levels of hierarchy such as the PWR_NET and GND_NET properties to specify which global power supply nets to use, SUB_COMP and CELL_COMP to specify the lowest level of hierarchy, MODEL, INSTPAR and INST properties for analogue simulation, PLACE properties to specify placement on the layout, etc. When designing full custom cells, these properties had to be incorporated and the layout instance pin names must match the schematic symbol pin names for correct LVS. In addition, equivalent circuit models of the full custom cells are included in subcircuit schematics for simulation purposes. At the top level schematic (Appendix A4.1), the full custom components, Mietec components and VHDL macros are connected and external input/output and power supply pads are attached. Each VHDL macro corresponds to a single layout block where the standard cells within each individual block are autoplaced and autorouted.

4.4.1.2 Design Partitioning

The design was partitioned into blocks, which aids the separation of the analogue and digital sections of the design and avoids coupling of high frequency digital switching onto sensitive analogue lines. Clocks and transmitters were placed further away from the analogue portion of the circuit but generated clocks were kept close enough to the ADC and centroid processing blocks to prevent excessive clock skews and delays. Keeping the ADC and centroid computation blocks together also helps to minimise wire lengths and delays. Once the individual blocks were placed, routing was carried out.

4.4.1.3 Power Supply Routing and Protection

Power supply nets and critical nets were routed manually before autorouting of all nets was performed. By using a 'keep pre-routes' option, the router keeps the widths of the manual routing but moves the routes as it sees fit. When the autorouter has finished, further manual changes are made where necessary. Mietec specifies the maximum current for the metal layers, the contacts and the vias under different conditions and this works out to be approximately 1mA/µm, 0.30mA/contact and 0.35mA/via respectively [Alcatel Microelectronics 1999]. To ensure low resistance and inductance, a conservative approach was taken: power supply nets were widened to at least 25µm, expanding as power supply buses join, and clock lines were widened to 10µm. In the design, four pairs of power supplies (VDD1-4, VSS1-4) are used for the digital I/O, one pair for the digital core cells (VDD, VSS) and three pairs (VDDA1-3, VSSA1-3) for the analogue cells, namely the photodiode array, the comparators (with biasing) and the voltage reference generators\(^{55}\) respectively. The Mietec P_SUPPROT power supply protection structure was placed between each power and ground pair for better immunity against electrostatic discharge (ESD) [Alcatel Microelectronics 1995a].

4.4.1.4 Mietec Analogue Cells

The comparators used were the analogue CFCMP1 cell provided in Mietec's MTC22500 Analog Library [Alcatel Microelectronics 1995b]. In order to use this cell, it had to be biased according to the biasing strategy stated by Mietec. This includes a bandgap voltage reference (1.20V), a master bias generator for providing a 'sinking' current source and a slave bias cell to convert the reference current to bias voltages for the analogue cells. The slave bias cells and the analogue cells that they bias are placed close together and in the same row of cells, since voltage drops in the supply lines can cause errors in the bias current.

\(^{55}\) VDDA3-VSSA3 was also used to power the analogue I/Os.

4.4.1.5 Clock Tree Planning

Clock tree planning is necessary as it is impossible to use an ideal clock to drive all the latches due to issues of routability, circuit drive strength and clock latency and skew. In the FPGA design environment, global buffers and routing channels are used to distribute critical nets such as input clock signals on the device with minimum skew. In the ASIC environment, no clock-tree synthesis tool was available so clock tree synthesis had to be done manually by calculating the load of every input and output signal of each block and inserting clock drivers or buffers (inverters) into the design in a tree-like manner.

Several buffers are available within the Mietec standard cell library [Alcatel Microelectronics 1998] as shown in Table 4.1. CBTSA, CBTSB and CBTSC are positive enabled tristate buffers with low drive, 2X drive and 3X drive respectively, while CIA, CIB and CIC are inverters with low drive, 2X drive and 3X drive respectively. CIB was chosen as a compromise between high output drive, low propagation delay, minimum area and low power consumption. Buffers were inserted wherever the load on a net would otherwise exceed 20SL, so that the maximum load on each arm of the clock tree was kept to 20SL.

<table>
<thead>
<tr>
<th>Cell name</th>
<th>Area (µm²)</th>
<th>Power (µW/MHz)</th>
<th>Propagation delay for 32SL (ns)</th>
<th>Input capacitance (SL)*</th>
<th>Output drive (SL)*</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td></td>
<td>t PLH</td>
<td>t PHL</td>
<td></td>
</tr>
<tr>
<td>CBTSA</td>
<td>432</td>
<td>2.1360</td>
<td>3.69</td>
<td>2.74</td>
<td>2.0</td>
</tr>
<tr>
<td>CBTSB</td>
<td>649</td>
<td>3.7925</td>
<td>1.85</td>
<td>1.52</td>
<td>4.0</td>
</tr>
<tr>
<td>CBTSC</td>
<td>1189</td>
<td>9.6915</td>
<td>0.97</td>
<td>0.83</td>
<td>9.7</td>
</tr>
<tr>
<td>CIA</td>
<td>216</td>
<td>0.8878</td>
<td>2.29</td>
<td>1.65</td>
<td>2.0</td>
</tr>
<tr>
<td>CIB</td>
<td>324</td>
<td>1.2632</td>
<td>1.12</td>
<td>0.96</td>
<td>3.3</td>
</tr>
<tr>
<td>CIC</td>
<td>757</td>
<td>4.5378</td>
<td>0.16</td>
<td>0.16</td>
<td>11.6</td>
</tr>
</tbody>
</table>

* SL is defined in the Mietec library as a standard load (SL) of 0.029pF

Table 4.1 Buffer types in the Mietec MTC23000 standard cell library
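
The manual buffering procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually followed for the chip: the function name `plan_clock_tree`, the greedy grouping of sinks and the buffer input load of 3.3SL are all hypothetical placeholders, and only the 20SL limit per net is taken from the text.

```python
def plan_clock_tree(sink_loads_sl, max_load_sl=20.0, buffer_input_sl=3.3):
    """Return the number of buffers inserted at each level, leaf level first.

    sink_loads_sl   : loads (in SL) presented by the clocked inputs
    max_load_sl     : maximum load allowed on any one net (20SL in the text)
    buffer_input_sl : load presented by one buffer input (assumed value)
    Inverting buffers flip the clock polarity at every level; that is not
    modelled here.
    """
    loads = list(sink_loads_sl)
    levels = []
    # Keep adding levels until the root net is within the load limit.
    while sum(loads) > max_load_sl:
        groups = 1
        current = 0.0
        for load in loads:
            if load > max_load_sl:
                raise ValueError("a single sink exceeds the load limit")
            if current + load > max_load_sl:
                # Start a new net driven by its own buffer.
                groups += 1
                current = 0.0
            current += load
        if groups >= len(loads):
            raise ValueError("buffering makes no progress; use a stronger buffer")
        levels.append(groups)
        # The next level up sees one buffer input per group.
        loads = [buffer_input_sl] * groups
    return levels


# Hypothetical example: 100 clocked inputs of 2SL each.
plan = plan_clock_tree([2.0] * 100)
```

With these assumed numbers, ten buffers each drive ten sinks, and two further buffers drive the first level, leaving the root net lightly loaded.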

Only part of the clock tree buffering circuit can be seen in the top level schematic (Appendix A4.1), with additional clock tree buffering extending into the lower level schematics of certain blocks. Only the input and output signals of the blocks were analysed by hand. Internal signals within each block were not dealt with manually; instead, static timing analysis was used to highlight nets with long propagation delays and heavy loads.

4.4.1.6 Power Consumption

Another issue in circuit design is the power consumption of the circuit. A power analysis tool was not available, so manual calculations of the chip’s power consumption were carried out. Based on simulations and the specified supply currents of the analogue cells, the majority of the power consumption is expected to come from the digital circuits, which also make up a bigger proportion of the circuitry. The power dissipation of a cell in the Mietec MTC23000 CMOS Standard Cell library is given by:

\[
\text{Total Power Dissipation per cell} = \left[ \text{POW (value from datasheet)} + (VDD^2 \times C_{\text{ext}}) \right] \times \text{FREQ}
\]

where
- \(C_{\text{ext}}\) = load capacitance in pF for each cell
- \(\text{POW}\) = power in \(\mu\text{W/MHz}\)
- \(\text{FREQ}\) = switching frequency in MHz

The first term represents power dissipation due to the internal circuitry of the cell while the second term is the power consumption due to the charging and discharging of the load capacitance of the cell. Except for heavily loaded lines, the second term can be ignored and it was not taken into account in these power consumption calculations. Calculation of the total power consumption of the digital circuitry is shown in Appendix A4.3, where the power dissipation of the components in each block is summed and multiplied by the operating frequency of the block. A conservative approach was used where the highest frequency at which any part of the block runs was taken as the operating frequency of the whole block. For example, though the calculation of the dividend and divisor runs at the pixel access rate of 2.4kHz to 4.8kHz, the switching frequency of the block was taken as the frequency at which the division process occurs, i.e. \(32\text{MHz}/2 = 16\text{MHz}\). Furthermore, not every part of the circuit is constantly being operated: the transmitters, the receiver and the control logic only operate when activated. A summary of the results as well as the number of gates and the critical path delay of each block is shown in Table 4.2.

<table>
<thead>
<tr>
<th>Block name</th>
<th>POW(μW/MHz)</th>
<th>Frequency, f (MHz)</th>
<th>Power Consumption (mW) at frequency, f (MHz)</th>
<th>No. of Gates</th>
<th>Critical path delay (ns)</th>
</tr>
</thead>
<tbody>
<tr>
<td>clkdiv1</td>
<td>72.9231</td>
<td>32</td>
<td>2.3335392</td>
<td>47</td>
<td>6.46</td>
</tr>
<tr>
<td>CTR5by5a</td>
<td>5131.3544</td>
<td>16</td>
<td>82.1016704</td>
<td>3641</td>
<td>31.37</td>
</tr>
<tr>
<td>CTRa2d</td>
<td>1554.306</td>
<td>32</td>
<td>49.737792</td>
<td>1028</td>
<td>17.45</td>
</tr>
<tr>
<td>ctrlrcvr</td>
<td>1029.2836</td>
<td>32</td>
<td>32.9370752</td>
<td>625</td>
<td>13.2</td>
</tr>
<tr>
<td>div2</td>
<td>12.632</td>
<td>32</td>
<td>0.404224</td>
<td>7</td>
<td>2.37</td>
</tr>
<tr>
<td>div8</td>
<td>58.0278</td>
<td>32</td>
<td>1.8568896</td>
<td>35</td>
<td>5.02</td>
</tr>
<tr>
<td>div10</td>
<td>60.6702</td>
<td>0.1152</td>
<td>0.006989207</td>
<td>37</td>
<td>4.86</td>
</tr>
<tr>
<td>div20</td>
<td>79.6368</td>
<td>0.1152</td>
<td>0.009174159</td>
<td>48</td>
<td>5.51</td>
</tr>
<tr>
<td>div30</td>
<td>74.3362</td>
<td>0.1152</td>
<td>0.00856353</td>
<td>46</td>
<td>5.69</td>
</tr>
<tr>
<td>divbaud</td>
<td>307.2977</td>
<td>32</td>
<td>9.8335264</td>
<td>191</td>
<td>13.76</td>
</tr>
<tr>
<td>opctrlsyn</td>
<td>150.0007</td>
<td>32</td>
<td>4.8000224</td>
<td>92</td>
<td>2.05</td>
</tr>
<tr>
<td>RCrcvr</td>
<td>554.6075</td>
<td>32</td>
<td>17.74744</td>
<td>335</td>
<td>7.56</td>
</tr>
<tr>
<td>txall</td>
<td>869.5475</td>
<td>0.1152</td>
<td>0.100171872</td>
<td>590</td>
<td>11.02</td>
</tr>
<tr>
<td>txlightout</td>
<td>399.6852</td>
<td>4</td>
<td>1.5987408</td>
<td>256</td>
<td>8.84</td>
</tr>
<tr>
<td>txrowcol</td>
<td>381.9001</td>
<td>4</td>
<td>1.5276004</td>
<td>239</td>
<td>8.5</td>
</tr>
<tr>
<td><strong>Total</strong></td>
<td></td>
<td></td>
<td><strong>205.0034192</strong></td>
<td><strong>7217</strong></td>
<td></td>
</tr>
</tbody>
</table>

Table 4.2 Power consumption, gate count and critical path delay of digital blocks
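
The per-block arithmetic behind Table 4.2 can be reproduced with a short script. The sketch below is illustrative only: the function and variable names are not from the original design flow, and the loaded-inverter example uses an assumed 5V supply and an assumed 20SL load purely to show the size of the second term in the power equation.

```python
def cell_power_uw(pow_uw_per_mhz, freq_mhz, vdd=5.0, c_ext_pf=0.0):
    """Power of a cell or block in microwatts.

    (POW + VDD^2 * C_ext) * FREQ, with POW in uW/MHz, C_ext in pF and
    FREQ in MHz. Leaving c_ext_pf at 0 ignores the load term, as was done
    for the estimates in Table 4.2.
    """
    return (pow_uw_per_mhz + vdd ** 2 * c_ext_pf) * freq_mhz


# (POW in uW/MHz, frequency in MHz) for each block, copied from Table 4.2.
BLOCKS = {
    "clkdiv1": (72.9231, 32), "CTR5by5a": (5131.3544, 16),
    "CTRa2d": (1554.306, 32), "ctrlrcvr": (1029.2836, 32),
    "div2": (12.632, 32), "div8": (58.0278, 32),
    "div10": (60.6702, 0.1152), "div20": (79.6368, 0.1152),
    "div30": (74.3362, 0.1152), "divbaud": (307.2977, 32),
    "opctrlsyn": (150.0007, 32), "RCrcvr": (554.6075, 32),
    "txall": (869.5475, 0.1152), "txlightout": (399.6852, 4),
    "txrowcol": (381.9001, 4),
}

# Convert each block from microwatts to milliwatts and sum.
block_mw = {name: cell_power_uw(p, f) / 1000.0 for name, (p, f) in BLOCKS.items()}
total_mw = sum(block_mw.values())

# Illustration of the load term: one CIB inverter (1.2632 uW/MHz) at 32MHz
# driving an assumed 20SL load (20 * 0.029pF) from an assumed 5V supply.
loaded_cib_uw = cell_power_uw(1.2632, 32, vdd=5.0, c_ext_pf=20 * 0.029)
unloaded_cib_uw = cell_power_uw(1.2632, 32)
```

The summed value agrees with the total of approximately 205mW in the table. The loaded inverter example shows why the second term cannot be ignored on heavily loaded lines: under the assumed load it is roughly an order of magnitude larger than the internal term.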

These figures are estimates at best, but they showed that the chip could cope with the power supply demands. Certainly, compared to the hefty power consumption of a PC (typically 100W), there are significant power savings to be made by going to a single chip solution. When the chip had been fabricated and tested on a circuit board, the supply current drawn from a 5V supply was 30mA (rms) when computing and transmitting centroid data, which corresponds to approximately 150mW. Hence, the power drawn was less than the estimated 205mW for the digital circuitry, which is reasonable as a conservative approach of over-estimating the frequency of operation was taken.
4.5 RESULTS OF CENTROID ASIC

A 3μm diameter beam from a 633nm HeNe laser was scanned across the array at a speed of 2000μm/sec. Centroid values were computed by the processor and serially transmitted in real time to a PC running Linux with a baud rate of 115,200 bits/s. Figure 4.16 shows the experimental setup used to scan the beam across the ASIC and obtain the centroids. A neutral density filter (NDF) was used to bring the power incident on the ASIC down to about 1μW.

![Experimental scanning setup for testing the centroid ASIC](image)

Figure 4.16 Experimental scanning setup for testing the centroid ASIC

Figure 4.17 shows the optical image obtained from the reference photodiode as the beam is scanned across the array. Figures 4.18 and 4.19 show grey scale maps of the x and y-centroid values successfully recorded at each position on the array, with the device operating under mode 1 of the digitisation procedure. The dark regions correspond to smaller centroid coordinates whilst lighter regions correspond to larger centroid coordinates. As expected, as we scan in the x-direction, the x-centroid values increase while the y-centroid values remain constant and vice versa. Since the laser beam size is less than the size of one pixel, a stepped appearance can be seen as the beam moves across the array passing from one discrete detector to another. Figures 4.20 and 4.21 show the averaged x and y-centroid values plotted as a function of pixel position. Also shown in Figures 4.20 and 4.21 are the error bars for the measurement of the position across the array.
Figure 4.17 Optical image of scan

Figure 4.18 Image map of x-centroids

Figure 4.19 Image map of y-centroids
4.5.1 POSITION RESOLUTION

The positional resolution is obtained by finding the average of the maximum deviation (error bar) in the position response across the array in a single scan and this is found to be 19.5\(\mu m\) in the y (2.1LSB) and 14.9\(\mu m\) in the x (1.8LSB)\(^{56}\). This is comparable to the positional resolution obtained by Nirmaier [Nirmaier 2003] whose chip was used to measure wavefront aberrations in the human eye. De Lima Monteiro’s integrated wavefront sensor [de Lima Monteiro 2002] achieved a positional resolution of 1.4μm with a 7.0μW spot but a resolution of 47.1μm with a 0.2μW spot. Hence, the design showed reasonable position resolution.

\(^{56}\)The difference in resolution in the x and y is accentuated by the fact that the array size is 530\(\mu m\) in the x and 600\(\mu m\) in the y.

The noise in the positional response curves obtained is attributed mainly to FPN and shot noise. The shot noise and thermal/reset noise level for the design were discussed in Section 4.2.2 and the shot noise was shown to dominate the reset noise. Typical figures for FPN, on the other hand, are hard to define because FPN depends significantly on the precise process used [Hornsey 1999c]. The FPN reported with CMOS image sensors in a submicron process was approximately 2-3% of saturation for raw data without FPN removal circuitry, and as discussed in Section 1.5.3.5, the main cause of FPN is the variation in $V_T$ in the pixel circuitry rather than the variation in photoresponse. As seen in Section 2.4.1.2 on chip to chip variation, the FPN from pixel to pixel is expected to be very small when the entire pixel is flooded, i.e. for large spot sizes.

FPN can be removed at the photodetection level using focal plane FPN removal circuitry such as CDS and DDS, as described in Section 1.6.3, or by subtracting a stored averaged dark frame of pixel values. This initial prototype did not include either of these. However, FPN can still be removed using a suitable calibration technique. To remove FPN for a system that outputs a centroid would require scanning a spot across the array many times over and averaging out the temporal noise in the 2D images of centroids acquired, to obtain a single 2D image map consisting of the positional response, the dark FPN component and the PRNU component. Assuming the PRNU component is negligible and this image map is applicable to all other intensity levels, curve fitting can be used to fit ideal or average curves through the positional response curves obtained, as illustrated in Figure 4.22, and the difference between the curves can be stored in memory and subsequently subtracted from future centroid readings. However, this is only accurate for the particular spot size for which the design is tailored. Note that for larger spot sizes the effect of the FPN is expected to decrease.

In addition, it is possible to increase the resolution by reducing the size of the pixels but this, however, reduces the maximum detectable tilt and hence, the maximum measurable aberration magnitude. Also, the resolution was inherently limited by the number of bits in the centroid representation (7 bits) and to increase this requires very little overhead. That is, just an additional shift-and-subtract cycle in the divider is required for each additional bit in the result, without the need for additional storage for the dividend and divisor.

Under conditions of low signal-to-noise ratio (SNR) it is possible to improve the accuracy of the centroiding by removing the background signal through thresholding. With a digital centroid processor this is easily achieved by subtracting a programmable, even adaptive, offset from the digitised input or by setting all pixel values below a certain threshold to zero before the centre of gravity is found. It is also possible to apply windowing around the pixel of maximum intensity to improve the SNR further.
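
The effect of thresholding on a centre-of-gravity calculation can be illustrated with a small software model. This is a minimal sketch, not the on-chip implementation: the function name `centroid`, the `mode` argument and the test frame are hypothetical, and floating point division is used in place of the shift-and-subtract divider.

```python
def centroid(frame, threshold=0, mode="subtract"):
    """Centre of gravity (x, y) of a 2D list of pixel values.

    mode="subtract": subtract the threshold as an offset, clamping at zero.
    mode="clip":     zero pixels below the threshold, keep the rest unchanged.
    Returns None if no signal remains after thresholding.
    """
    total = sum_x = sum_y = 0
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            if mode == "subtract":
                v = max(value - threshold, 0)
            else:
                v = value if value >= threshold else 0
            total += v
            sum_x += v * x
            sum_y += v * y
    if total == 0:
        return None
    return sum_x / total, sum_y / total


# Hypothetical 5 x 5 frame: uniform background of 10 with a spot of 200
# on the pixel at column 3, row 1.
frame = [[10] * 5 for _ in range(5)]
frame[1][3] = 200

raw = centroid(frame)                    # background pulls the result inwards
corrected = centroid(frame, threshold=10)
```

In this example the uniform background drags the raw centroid from the true spot position at (3, 1) towards the centre of the array at (2, 2), while subtracting the background level restores it.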
4.5.2 SPEED

The design achieved a frame rate of more than 2.4kHz which when scaled to an array of centroid processors or tilt sensors in parallel will achieve a frame rate which is independent of the number of tilt sensors employed, allowing fast real time adaptive optical systems to be built.

The speed of the design is limited by the frame readout and digitisation technique and not the centroid computation. In the initial design, 26 pixel periods were required per frame but this can be reduced to 25 pixel periods by using a faster clock to latch out the data and clear the registers, thereby allowing a new frame to start immediately after 25 pixel periods.

Although the system is able to remove the data bottleneck in present systems, it is possible for the design to go even faster. Currently, with the ADC technique used, only 1 pixel is digitised per discharge cycle and 25 separate discharge cycles are required before a frame is read out, as illustrated in Figure 4.23 (a). Because the ADC measures the time to discharge to a particular voltage level, different pixels with different discharge rates (due to different incident light intensities) complete the ADC conversion at different times, making it difficult to sequentially measure every pixel during one cycle. It would be possible to reduce the current system to a single integration period by comparing every pixel during each count as shown in Figure 4.23 (b), but this would require a very fast clock (800MHz for the default 8μs range) and an equally fast comparator. This may be feasible for long integration times but is not considered a suitable alternative. Instead, a design is proposed which requires only one discharge cycle but uses a separate conventional ADC [Hoeschele 1994] that does not incorporate the discharge curve into the digitisation process. This is illustrated in Figure 4.23 (c) and Figure 4.24. By starting the integration period of the pixels at different times and using a fixed integration period, pixel values can be read out and digitised sequentially. Variable integration time is achieved by controlling the position at which integration is started and stopped. The integration time is given by the number of pixel access times between these points. With this technique, the frame delay is reduced to a maximum of 25 pixel access times for a 5 x 5 array, which is significantly faster than the current design.
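
The 800MHz figure quoted above can be checked with a short calculation. The sketch assumes that the 8μs conversion range is divided into 256 counts (an 8-bit count, consistent with the full-scale value of 255 used elsewhere in this chapter); the function name is illustrative.

```python
def compare_clock_hz(n_pixels, n_counts, conversion_range_s):
    """Comparison rate needed to test n_pixels at every one of n_counts
    count steps within a conversion range of conversion_range_s seconds."""
    return n_pixels * n_counts / conversion_range_s


# One pixel per discharge cycle (current system): the count clock itself.
single_pixel_hz = compare_clock_hz(1, 256, 8e-6)

# All 25 pixels compared during every count (single discharge cycle).
all_pixels_hz = compare_clock_hz(25, 256, 8e-6)
```

Under this assumption the count clock is 32MHz, and comparing all 25 pixels within each count period requires 25 times that rate, i.e. 800MHz.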
Figure 4.23 Pixel discharge and access with (a) the current system, (b) current system modified for single discharge cycle operation and (c) a proposed sequential digitisation structure

Figure 4.24 Alternative pixel access and digitisation structure
4.5.3 DYNAMIC RANGE

The spatial dynamic range of the centroid outputs as shown in Figures 4.20 and 4.21 was limited, leading to limited positional sensitivity\(^57\). A limited spatial dynamic range translates to a limited tilt measurement range. There are several reasons for the reduced spatial dynamic range. Firstly, stray light in the test system will lead to a large background signal on all pixels, shifting the centroid values towards the centre. Secondly, as a global reset was used, all the pixels discharge simultaneously whether or not they are read, and during this discharge the photodiode node is floating and the pixel current can diffuse to neighbouring pixels. This crosstalk will lead to a larger background reading in all the signals, once again shifting the centroid output towards the centre. Also, as simulation results may vary from actual values, the voltage drop in mode 1 may be significantly smaller than that designed for, leading to a limited dynamic range in this mode. In order to have a minimum voltage drop of 0.25V in mode 1, the second reference voltage level should be used instead of the first reference voltage level. Finally, as a consequence of the digitising technique used, where the discharge time is measured for a given light level, a 1/x compression of the input photocurrent is achieved, leading to a high light intensity dynamic range [Forchheimer 1994] but also a smaller signal to background ratio, as expressed by:

\[
\Delta T = T_{\text{max}} - \frac{C\Delta V}{I_{\text{ph}}}
\]  

(4.14)

where \(\Delta T\) is the measured time value, \(T_{\text{max}}\) is the maximum discharge time, and \(C\Delta V / I_{\text{ph}}\) is the time taken for a photodiode of capacitance \(C\) to discharge through a voltage drop \(\Delta V\) at a photocurrent \(I_{\text{ph}}\). For the case of the current system, with \(C = 0.5\text{pF}\) at 2V and \(\Delta V = 0.25\text{V}\), a digital output count, \(x\), is obtained as follows (and shown in Figure 4.25):

\[
x = 255 - \frac{4\mu\text{A}}{I_{\text{ph}}}
\]  

(4.15)
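
The compression described by equations (4.14) and (4.15) can be explored numerically. The sketch below assumes a 32MHz count clock (8μs divided into 256 counts), which is the value that reproduces the 4μA constant in equation (4.15) for the stated capacitance and voltage drop; the function name and the clamping at zero are illustrative choices.

```python
def output_count(i_ph_amps, c_farads=0.5e-12, delta_v=0.25,
                 clock_hz=32e6, full_scale=255):
    """Digital output x = full_scale - (C * dV / I_ph) * f_clk.

    With the default values, C * dV * f_clk = 4uA, so x = 255 - 4uA / I_ph.
    The result is clamped at zero for very small photocurrents.
    """
    discharge_time_s = c_farads * delta_v / i_ph_amps
    counts = discharge_time_s * clock_hz
    return max(full_scale - counts, 0.0)


# Sweep the photocurrent over more than two decades.
x_16na = output_count(16e-9)   # near the bottom of the range
x_1ua = output_count(1e-6)
x_4ua = output_count(4e-6)     # within one count of full scale
```

A 250-fold increase in photocurrent, from 16nA to 4μA, spans almost the whole output range, but the final factor of four from 1μA to 4μA changes the output by only three counts. This illustrates the 1/x compression: bright spots are squeezed into the top few codes.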

---

\(^57\) Poor positional sensitivity does not imply poor positional resolution and the accurate positional resolution of spots could still be obtained by careful calibration of the response curves.
The issue of limited spatial dynamic range can be addressed in various manners. In order to remove any background signals, a thresholding technique can be used, or alternatively, an initial frame of dark (or background) readings is stored and subtracted from subsequent readings of each pixel. To resolve the problem of pixel crosstalk, the pixels need to be reset individually [Yadid-Pecht 1997] such that when one pixel is discharging the other pixels remain under reset. Also, a suitably biased guard ring structure can be incorporated between pixels to mop up any crosstalk current. As for the compression effect of the current digitisation technique, this has little benefit for determining centroids but would be a useful feature in imaging where it is desired to capture both the bright and dark regions of an image. Hence, an alternate digitisation structure like that proposed in Section 4.5.2 and Figure 4.23 is preferred. Note that the proposed pixel access and digitisation structure does not allow crosstalk to be removed by having individual pixel reset. Instead a guard ring structure must be used with every pixel.
4.5.4 SCALABILITY

The system as a whole is extremely scalable. The division circuitry takes up a significant amount of the processing area, but the division process is performed only once every frame and requires only 15 cycles of the 16MHz clock to complete, so the divider is idle for most of each frame. When several centroid processors are integrated in parallel, the divider can therefore be shared without a significant increase in size or loss of speed.

In addition, the specified gate density of the Mietec process is 1250 gates/mm². Migration to smaller feature sizes will mean greater packing density. The austriamicrosystems (AMS) 0.35μm CMOS process, for example, has a gate density of 18k gates/mm². This is a 14 times reduction in size of the digital circuitry, making the integration of a large number of centroid processors for a complete wavefront sensing system feasible.
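
As a rough worked example of this scaling, the digital area implied by the two gate densities can be estimated for the 7217 gates listed in Table 4.2. This is only an order-of-magnitude sketch: it ignores routing, pads and analogue circuitry, and the function name is illustrative.

```python
def digital_area_mm2(gate_count, gates_per_mm2):
    """Area occupied by gate_count gates at a given gate density."""
    return gate_count / gates_per_mm2


GATES = 7217                                     # total from Table 4.2
mietec_mm2 = digital_area_mm2(GATES, 1250)       # Mietec 0.7um process
ams_mm2 = digital_area_mm2(GATES, 18000)         # AMS 0.35um process
reduction = mietec_mm2 / ams_mm2
```

On these figures the digital logic shrinks from roughly 5.8mm² to roughly 0.4mm², a reduction factor of 14.4.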

4.6 CHAPTER SUMMARY

A real time VLSI optical centroid processor was successfully designed and fabricated for integration into a proposed Shack-Hartmann wavefront sensor. The chip consists of an optimised 5 x 5 active pixel array and analogue-to-digital conversion circuitry integrated with the centroid processor previously demonstrated using a hardware emulation system. Centroid values can be obtained at a rate of 2.4 - 4.8 kHz with a position resolution of less than 20μm or 0.2 of a pixel, allowing real time performance of the adaptive optical system. By replacing the use of a CCD, a frame grabber and a PC with a dedicated on-chip centroid processor, significant savings in power, size and cost can also be achieved.
CHAPTER 5
WAVEFRONT RECONSTRUCTION

5.1 INTRODUCTION

Once an array of optical centroid processors has obtained the wavefront slopes, the next step in the process of an adaptive optics system is the reconstruction of the aberrated wavefront. The main aim of wavefront reconstruction is to generate the required actuator signals to deform a flexible mirror to compensate for the distortions in the wavefront. Hence, in order to understand the process of wavefront reconstruction, one needs to understand how wavefronts are described and how deformable mirrors are used to perform the correction before delving into the reconstruction techniques available. The following sections discuss the concepts of wavefront reconstruction and how this process can be incorporated into the design, which will enable the design of a complete, compact, fast and low-cost adaptive optical system.

5.2 WAVEFRONT DESCRIPTION

A wavefront can be described using a zonal approach or a modal approach [Tyson 1998]. In a zonal approach, the wavefront is expressed in terms of the phase over a small spatial area or zone and by combining all the zones within the aperture, a complete wavefront is described. If the number of zones approaches infinity, the wavefront is exactly represented. In the modal approach, the wavefront is expressed in terms of a weighted sum of spatial modes such as tip/tilt, defocus, etc. where each mode is defined over the entire aperture. For wavefronts with low spatial frequencies, the entire wavefront can be adequately represented by a few low-order modes whereas if high spatial frequencies are present a large number of terms are needed and a zonal approach may be preferable [Geary 1995]. This weighted sum of modes is expressed as a suitable polynomial expansion and one such expansion is the sum of Zernike polynomials, $Z_k$ of order $k$:

$$\phi(\rho, \theta) = \sum_k A_k Z_k(\rho, \theta) \quad (5.1)$$

where $\rho$, $\theta$ are polar coordinates and each coefficient $A_k$ is a time varying parameter which is typically smaller for higher orders. $Z_1$ and $Z_2$ correspond to the tilt of the wavefront in the $x$ and $y$-directions, $Z_3$ to defocus, $Z_4$ and $Z_5$ to astigmatism and so on [Noll 1976]. Zernike polynomials are a popular choice because the polynomials are defined over a unit circle similar to the circular aperture of a telescope, making it straightforward to express such wavefronts in terms of Zernike polynomials. They can also take into account the effect of the annulus present in telescopes. The orthogonality of the polynomials over a unit circle is also useful for incorporating higher order terms that are independent of the lower order terms, and Zernike polynomials also allow easy calculation of the wavefront variance or error.

5.3 DEFORMABLE MIRRORS

Deformable mirrors are used to produce the phase conjugate of the aberrated wavefront in order to produce a plane wave. There are different types of deformable mirrors used in adaptive optics and these include segmented mirrors, continuous faceplate mirrors and bimorph mirrors [Tyson 2000].

Segmented mirrors can be manufactured to tight tolerances and each segment acts independently so the control computer is simplified. However, they do not provide a smooth surface transition and the gap between segments can have an adverse effect on the optical beam because its regular pattern acts somewhat like a diffraction grating by imparting diffractive modes into the beam. In addition, segmented mirrors need more actuators than continuous faceplate mirrors. The continuous faceplate deformable mirror eliminates the gaps and the optical problems associated with segmented mirrors at the expense of more complicated control. The shape of a continuous faceplate deformable mirror is described by its influence function, which describes the influence of one actuator on the surrounding surface.

A bimorph mirror consists of two thin layers of material bonded together. The layers can be oppositely polarized piezoelectric wafers or a piezoelectric wafer bonded with an optical surface made from glass or silicon. An array of electrodes is deposited between the two wafers and when a voltage is applied to an electrode, one wafer expands relative to the other producing a curvature proportional to the voltage applied. For a given number of electrodes bimorph mirrors achieve the highest degree of turbulence compensation. Compared to other deformable mirror technologies such as membrane mirrors, bimorph mirror fabrication uses lower cost components and involves fewer and much simpler processes. Bimorph mirrors produce a curvature (which follows a Poisson equation), making them suitable for use with curvature wavefront sensors [Roddier 1998a] without the need for complex reconstruction circuitry, but less suitable for use with other wavefront sensors, which require the Poisson equation to be solved. Also, the geometry of the actuators in bimorph mirrors is radial-circular, which conveniently matches circular telescope apertures with a central annulus. However, the number of modes or actuators remains limited.

![Figure 5.1 Comparison between (a) segmented mirrors, (b) continuous faceplate mirrors and (c) bimorph mirrors [Doelman 2000]](image)

Micromachined deformable mirrors are a new class of deformable mirrors fabricated in silicon Micro-Electro-Mechanical Systems (MEMS) technology where small mirror elements are deflected by electrostatic forces. They offer potential for low cost and a large number of actuators [Hatcher 2001, Mansell 2000]. However, insufficient stroke and the small size of the elements currently remain a limitation.

Liquid crystal (LC) spatial light modulators (SLM) are another way to control the phase of light [Dayton 1997]. They operate based on the fact that an applied voltage will change the alignment of the long thin LC molecules and hence change the index of refraction of the material. They can have a large number of elements but the phase shifts introduced by liquid crystals remain too small and wavelength-dependent.

5.4 WAVEFRONT RECONSTRUCTION

As described in Section 1.3.1, the Shack Hartmann wavefront sensor obtains local wavefront tilts from focal spot position displacements. The reconstruction of a wavefront from a set of local wavefront tilts involves solving a system of linear equations [Tyson 2000], which expressed in matrix algebra has the form:

\[ s = B a \] (5.2)

where \( s \) is a vector of the local wavefront tilts, \( a \) is a vector of the required actuator commands (if modal reconstruction is used modes are obtained instead of phases and have to be converted into actuator commands) and \( B \) is called the reconstruction matrix that contains information on how the tilts are related to the actuator signals.

The system is usually overdetermined with the system having more equations than unknowns such that \( s \) has a higher dimension than \( a \). A least-squares approximation can be used to solve for vector \( a \) and this is equivalent to calculating:

\[ a = (B^T B)^{-1} B^T s \] (5.3)

where \( B^T \) is the transpose and \( (B^T B)^{-1} B^T \) is the pseudo-inverse of the reconstruction matrix \( B \). The equation is valid on the condition that \( B^T B \) is invertible (not singular). If this condition is not met, a method called singular value decomposition (SVD) is used. Otherwise, direct inversion methods like Gaussian elimination can be used [de Lima Monteiro 2002]. The pseudo-inverse matrix \( (B^T B)^{-1} B^T \) only needs to be calculated once for a given configuration (sensor-actuator geometry and reconstruction method). After this, the system only needs to compute the centroids and the associated tilts, and evaluate a matrix multiplication for the actuator commands. There are two types of reconstruction method that can be used, namely zonal and modal reconstruction. The choice between them depends very much on the choice of deformable mirror and the choice of wavefront sensor.
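
The least-squares solution of equation (5.3) can be sketched in plain Python using the normal equations and Gaussian elimination. This is an illustration under stated assumptions rather than a recommended implementation: the matrix `B` and the vector `a_true` are hypothetical values chosen for the example, and the solver simply raises an error when \( B^T B \) is singular instead of falling back to SVD.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(m, rhs):
    """Solve the square system m x = rhs by Gaussian elimination
    with partial pivoting."""
    n = len(m)
    aug = [list(m[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("B^T B is singular; an SVD-based method is needed")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def least_squares(b_matrix, s):
    """Return a minimising |s - B a|^2 by solving (B^T B) a = B^T s."""
    bt = transpose(b_matrix)
    btb = matmul(bt, b_matrix)
    bts = [sum(x * y for x, y in zip(row, s)) for row in bt]
    return solve(btb, bts)


# Hypothetical overdetermined system: four tilt measurements, two unknowns.
B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
a_true = [0.5, -0.25]
s = [sum(b * a for b, a in zip(row, a_true)) for row in B]
a_est = least_squares(B, s)
```

Here the normal equations are solved afresh for each tilt vector to keep the sketch short. In a real-time system the pseudo-inverse would be computed once in advance, leaving only a matrix-vector multiplication per frame.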
5.4.1 MODAL APPROACH

In a modal reconstructor, the coefficients of a polynomial function for describing the wavefront, such as the Zernike polynomials, are obtained. From equation (5.1) the local tilts, $S_{ix}$ and $S_{iy}$, can be related to the local derivatives of the phase and hence the local derivatives of the Zernike polynomials of subaperture $i$, as follows:

$$S_{ix} = \frac{d\phi}{dx_i} = \sum_k A_k \frac{dZ_k}{dx}$$

(5.4)

$$S_{iy} = \frac{d\phi}{dy_i} = \sum_k A_k \frac{dZ_k}{dy}$$

(5.5)

In matrix form of $N$ subapertures and $M$ modes, this can be written as:

$$
\begin{bmatrix}
S_{1x} \\
S_{1y} \\
S_{2x} \\
S_{2y} \\
\vdots \\
S_{Nx} \\
S_{Ny}
\end{bmatrix} =
\begin{bmatrix}
\frac{dZ_1}{dx_1} & \frac{dZ_2}{dx_1} & \frac{dZ_3}{dx_1} & \cdots & \frac{dZ_M}{dx_1} \\
\frac{dZ_1}{dy_1} & \frac{dZ_2}{dy_1} & \frac{dZ_3}{dy_1} & \cdots & \frac{dZ_M}{dy_1} \\
\frac{dZ_1}{dx_2} & \frac{dZ_2}{dx_2} & \frac{dZ_3}{dx_2} & \cdots & \frac{dZ_M}{dx_2} \\
\frac{dZ_1}{dy_2} & \frac{dZ_2}{dy_2} & \frac{dZ_3}{dy_2} & \cdots & \frac{dZ_M}{dy_2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\frac{dZ_1}{dx_N} & \frac{dZ_2}{dx_N} & \frac{dZ_3}{dx_N} & \cdots & \frac{dZ_M}{dx_N} \\
\frac{dZ_1}{dy_N} & \frac{dZ_2}{dy_N} & \frac{dZ_3}{dy_N} & \cdots & \frac{dZ_M}{dy_N}
\end{bmatrix}
\begin{bmatrix}
A_1 \\
A_2 \\
\vdots \\
A_M
\end{bmatrix}
$$

(5.6)

By solving for vector $A_k$ using equation (5.3), the coefficients of the Zernike polynomial are obtained. At the same time, the influence function $\varphi_i(x, y)$ of each actuator in a deformable mirror can also be expressed as a function of the Zernike polynomial as follows [Zhu 1999]:

$$\varphi_i(x, y) = \sum_{k=1}^{M} b_{ik} Z_k(x, y)$$

(5.7)

where $b_{ik}$ is the coefficient corresponding to the $k$th Zernike polynomial due to the control signal of the $i$th channel of the mirror. Assuming that the total deflection of the mirror is a linear superposition of the deflections from all the control channels, the mirror surface deflection $\Delta\phi(x, y)$ can be expressed as:
\[ \Delta \phi(x, y) = \sum_{i=1}^{p} c_i \varphi_i(x, y) \]
\[ = \sum_{i=1}^{p} c_i \sum_{k=1}^{M} b_{ik} Z_k(x, y) \]
\[ = \sum_{k=1}^{M} \left( \sum_{i=1}^{p} c_i b_{ik} \right) Z_k(x, y) \]

(5.8)

where \( c_i \) is the control signal of the \( i \)th channel of the deformable mirror. Therefore, the Zernike coefficients obtained from solving equation (5.6) can be related to the control signals \( c_i \) as follows:

\[ A_k = \sum_{i=1}^{p} c_i b_{ik} \]

(5.9)

where \( b_{ik} \) is experimentally determined and the equation is solved using equation (5.3) once again to obtain the required control signals \( c_i \) to perform the corrections.
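
As a hedged illustration of how equations (5.6) and (5.3) work together, suppose that only the two lowest-order modes are retained and that they are taken, unnormalised, as \( Z_1 = x \) and \( Z_2 = y \) (an assumption made here for simplicity; no particular normalisation is implied by the text). Every x-row of the derivative matrix in equation (5.6) is then \( [1 \;\; 0] \) and every y-row is \( [0 \;\; 1] \). Writing \( D \) for this \( 2N \times 2 \) derivative matrix, which takes the place of \( B \) in equation (5.3), gives:

\[ D^T D = N \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad D^T s = \begin{bmatrix} \sum_{i=1}^{N} S_{ix} \\ \sum_{i=1}^{N} S_{iy} \end{bmatrix} \]

\[ A_1 = \frac{1}{N} \sum_{i=1}^{N} S_{ix}, \qquad A_2 = \frac{1}{N} \sum_{i=1}^{N} S_{iy} \]

In this special case the least-squares solution simply averages the measured slopes, so the tip and tilt coefficients are the mean x and y tilts over the \( N \) subapertures. Higher-order modes couple the rows and require the full pseudo-inverse.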

5.4.2 ZONAL APPROACH

In a zonal reconstructor, the phase at regular grid points across the aperture is evaluated. Several sensor-actuator geometries are possible, such as the Hudgin geometry and the Fried geometry, which are shown in Figure 5.2 for a 3 x 3 actuator system. Here, \( a \) represents the actuator positions and \( S_i \) represents the slopes of subaperture \( i \).

**Figure 5.2 Wavefront sensor-actuator geometries**

With the Hudgin geometry only one centroid per subaperture is used and \( S_1, S_2, S_6, S_7, S_{11} \) and \( S_{12} \) represent slopes in the x-direction while \( S_3, S_4, S_5, S_8, S_9 \) and \( S_{10} \) represent slopes in the y-direction. For \( N \times N \) actuators, a Hudgin geometry requires \( 2N(N-1) \) subapertures (\( N(N-1) \) x-centroids and \( N(N-1) \) y-centroids) while the Fried geometry requires \( (N-1)^2 \) subapertures so the Hudgin geometry requires more subapertures but less processing per subaperture.
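
The subaperture counts quoted above can be tabulated with a short Python sketch. The function names are hypothetical and chosen only for this illustration.

```python
def hudgin_subapertures(n):
    """Tilt sensors needed for an n x n actuator grid in the Hudgin geometry."""
    # n*(n-1) x-slopes plus n*(n-1) y-slopes, one slope per subaperture.
    return 2 * n * (n - 1)

def fried_subapertures(n):
    """Subapertures needed in the Fried geometry for an n x n actuator grid."""
    return (n - 1) ** 2

for n in (3, 4, 8):
    h, f = hudgin_subapertures(n), fried_subapertures(n)
    # Slope measurements: one per Hudgin subaperture, two per Fried subaperture.
    print(n, h, f, h, 2 * f)
```

For the 3 x 3 system of Figure 5.2 this gives 12 Hudgin subapertures against 4 Fried subapertures, the latter each supplying both an x and a y slope.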

From these configurations, the equations (equation (5.1)) that relate the wavefront sensor signals to the actuator commands can be developed. Besides the geometry and alignment of the sensor subapertures and the actuators, the type of mirror used for reconstruction determines how the tilt values are related to the required actuator signals and hence the reconstruction matrix \( B \) [Tyson 2000]. In the case of a segmented mirror, the slope of a particular subaperture only depends on the influence of the neighbouring actuator signals and the reconstruction matrix \( B \) is sparse. In the case of the continuous faceplate deformable mirror, the remaining elements in \( B \) are not zero but dependent on the influence function of the mirror.\(^{58}\)

5.4.3 RECONSTRUCTION PROCEDURE

A Shack Hartmann wavefront sensor measures local wavefront slopes, which provide a zonal description of the aberrated wavefront, and therefore lends itself to zonal reconstruction procedures [Geary 1995]. To illustrate the architecture required for wavefront reconstruction from a set of Shack Hartmann wavefront tilts, the reconstruction matrix is generated and its pseudo-inverse found for a chosen architecture, and the implementation of this structure is then considered.

58 For example, $B_{11}$ of the reconstruction matrix represents the influence of actuator 1 on the 1st slope in the x-direction, $B_{12}$ represents the influence of actuator 2 on the 1st slope in the x-direction and so on.

For this purpose, a segmented mirror system with 3 x 3 actuators and a Hudgin geometry (requiring 12 tilt sensors), as shown in Figure 5.2(b), is selected. In matrix form this is expressed as:

$$s = B a, \qquad a = [a_1, a_2, \ldots, a_9]^T$$

$$B = \begin{bmatrix}
1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & -1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & -1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & -1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & -1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & -1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & -1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & -1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 \\
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1
\end{bmatrix}
$$

where $s$ holds the twelve slopes $S_1, S_2, \ldots, S_{12}$ followed by one entry for the piston constraint.

The row of 1's at the bottom of the reconstruction matrix is used to force the average surface of the wavefront to a specific shape or value and to keep the reconstruction matrix from being singular.\(^{59}\) The pseudo-inverse matrix of $B$, written $B^{-1}$, is found as follows:

$$B^{-1} = (B^T B)^{-1} B^T, \qquad a = B^{-1} s \qquad (5.10)$$
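
The 3 x 3 Hudgin example can be checked numerically. The sketch below is an illustration under stated assumptions, not the procedure used in this work: it assumes the actuators are numbered $a_1$ to $a_9$ row by row and the slopes $S_1$ to $S_{12}$ in the order described in Section 5.4.2, builds the 13 x 9 matrix $B$ (twelve slope rows plus the row of ones), and evaluates $(B^T B)^{-1} B^T$ exactly with rational arithmetic. The function names are hypothetical.

```python
from fractions import Fraction

def hudgin_matrix(n):
    """Build B for an n x n actuator grid: slope rows first, then a row of ones."""
    rows = []
    for r in range(n):
        for c in range(n - 1):              # x-slopes along actuator row r
            row = [0] * (n * n)
            row[r * n + c], row[r * n + c + 1] = 1, -1
            rows.append(row)
        if r < n - 1:
            for c in range(n):              # y-slopes between rows r and r + 1
                row = [0] * (n * n)
                row[r * n + c], row[(r + 1) * n + c] = 1, -1
                rows.append(row)
    rows.append([1] * (n * n))              # piston constraint
    return rows

def pseudo_inverse(B):
    """Exact (B^T B)^-1 B^T via Gauss-Jordan elimination on [B^T B | B^T]."""
    m, n = len(B), len(B[0])
    aug = []
    for i in range(n):
        btb_row = [Fraction(sum(B[k][i] * B[k][j] for k in range(m)))
                   for j in range(n)]
        bt_row = [Fraction(B[k][i]) for k in range(m)]
        aug.append(btb_row + bt_row)
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

B = hudgin_matrix(3)
P = pseudo_inverse(B)                        # 9 rows x 13 columns

values = [v for row in P for v in row]
smallest = min(abs(v) for v in values if v != 0)
full_scale = max(values) - min(values)
multiplications = len(P) * len(P[0])         # one per matrix element
additions = len(P) * (len(P[0]) - 1)
print(float(smallest), float(full_scale), multiplications, additions)
```

Under these assumptions the smallest non-zero element is 1/72 and the full-scale range is 8/9, and one reconstruction needs 9 x 13 products and 9 x 12 additions.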

59 The piston component of the mirror can take on any value and still match the wavefront shape. Hence the mirror has to be constrained to an average surface height.

Consequently, the solution of the actuator commands from equation (5.3) is reduced to a straightforward matrix multiplication. In the case of this example, 117 multiplications (9 actuators x 13 matrix columns) and a further 108 additions are required. For on-chip implementation, the values of the matrix are stored in memory and the choice of the number of bits to represent the values depends on the accuracy of the centroid calculation as well as the control specifications of the deformable mirror used. In order that the error in the reconstruction matrix does not propagate through the reconstruction algorithm, and the reconstruction error is mainly due to the error in the position response, the maximum fractional uncertainty in $[B]^{-1}$ must be sufficiently less than the minimum fractional uncertainty in the wavefront slopes $s$. At this current stage of development, the centroid processor achieved a positional resolution of 2.1LSB in $y$ and 1.8LSB in $x$. Hence the minimum fractional uncertainty in the centroid measurement is given by:

$$\frac{\delta s}{s_{\text{max}}} = \frac{1.8}{80} = 0.0225$$

(5.11)

where $s_{\text{max}}$ is the maximum position output from the centroid processor and this corresponds to a pixel position of 5 (1010000 or 80). $[B]^{-1}$ has a minimum element value of 0.0139 and a full scale range of 0.8888. Hence the maximum allowable uncertainty in $B^{-1}$ is given by:

$$\delta B^{-1}_{\text{max}} = \left| B^{-1}_{\text{min}} \right| \frac{\delta s}{s_{\text{max}}} = 0.0139 \times \frac{1.8}{80} = 3.1275 \times 10^{-4}$$

(5.12)

As such the minimum number of bits required for $[B]^{-1}$ is given by:

$$N = \left\lceil \log_2 \frac{0.8888}{3.1275 \times 10^{-4}} \right\rceil = \lceil 11.47 \rceil = 12 \text{ bits}$$

(5.13)

However, it turns out that for this configuration and this set of values of $[B]^{-1}$, the round off error is very small and the same accuracy is obtained if 10 bits are used. Also, although the result of the matrix multiplication will consist of 7 (slopes) + 12 (reconstruction matrix) = 19 bits, a typical deformable mirror usually requires 8 bits or less of control input and the result is usually truncated.
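
The word-length estimate of equations (5.11) to (5.13) can be reproduced with a few lines of Python. This is only a sketch of the arithmetic using the figures quoted above; the variable names are illustrative.

```python
import math

delta_s_lsb = 1.8     # positional resolution in x (LSB)
s_max = 80            # maximum centroid output (binary 1010000)
b_min = 0.0139        # smallest element magnitude of the pseudo-inverse
b_range = 0.8888      # full-scale range of the pseudo-inverse

frac_uncertainty = delta_s_lsb / s_max                  # equation (5.11)
delta_b_max = b_min * frac_uncertainty                  # equation (5.12)
bits = math.ceil(math.log2(b_range / delta_b_max))      # equation (5.13)
product_bits = 7 + bits                                 # 7-bit slope times matrix word
print(frac_uncertainty, delta_b_max, bits, product_bits)
```

The logarithm evaluates to roughly 11.5, so rounding up gives the 12 bits quoted, and a 7-bit slope multiplied by a 12-bit matrix value gives the 19-bit product.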
If signed 10-bit encoding is used, for example, the signed integer representation of the inverse matrix values\(^{60}\) becomes:

\[
\begin{bmatrix}
512 & 208 & 512 & 176 & 80 & 176 & 112 & 208 & 112 & 64 & 80 & 64 & 128 \\
-304 & 304 & 176 & 416 & 176 & -64 & 64 & 112 & 160 & 112 & -16 & 16 & 128 \\
-208 & -512 & 80 & 176 & 512 & -112 & -176 & 64 & 112 & 208 & -64 & -80 & 128 \\
176 & 112 & -304 & -64 & -16 & 416 & 160 & 304 & 64 & 16 & 176 & 112 & 128 \\
-64 & 64 & -64 & -256 & -64 & -256 & 256 & 64 & 256 & 64 & -64 & 64 & 128 \\
-112 & -176 & -16 & -64 & -304 & -160 & -416 & 16 & 64 & 304 & -112 & -176 & 128 \\
80 & 64 & -208 & -112 & -64 & 176 & 112 & -512 & -176 & -80 & 512 & 208 & 128 \\
-16 & 16 & -112 & -160 & -112 & -64 & 64 & -176 & -416 & -176 & -304 & 304 & 128 \\
-64 & -80 & -64 & -112 & -208 & -112 & -176 & -80 & -176 & -512 & -208 & -512 & 128 \\
\end{bmatrix}
\]
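
One way to sanity-check the integer values is to confirm that the first row of the matrix above, multiplied by $B$, returns a scaled unit vector. The sketch below assumes the same slope and actuator ordering as the 3 x 3 Hudgin example and a scale factor of 1152 (512 counts for the largest element, 4/9, which is half of the full-scale range); both are inferences from the listed values rather than stated design parameters.

```python
# Actuator index pairs (plus, minus) for slopes S1..S12, with a1..a9 numbered
# 0..8 row by row; this ordering is an assumption consistent with the example.
PAIRS = [(0, 1), (1, 2), (0, 3), (1, 4), (2, 5), (3, 4),
         (4, 5), (3, 6), (4, 7), (5, 8), (6, 7), (7, 8)]
B = [[(k == p) - (k == m) for k in range(9)] for p, m in PAIRS] + [[1] * 9]

SCALE = 1152   # assumed: 512 counts correspond to the largest element, 4/9
ROW_A1 = [512, 208, 512, 176, 80, 176, 112, 208, 112, 64, 80, 64, 128]

# A row of the scaled pseudo-inverse times B must give SCALE in its own
# column and zero in every other column.
check = [sum(w * B[j][k] for j, w in enumerate(ROW_A1)) for k in range(9)]

# Integer-only reconstruction of actuator 1 from integer slope values.
a_true = [3, 1, 4, 1, 5, 9, 2, 6, 5]             # arbitrary test commands
s = [sum(b * a for b, a in zip(row, a_true)) for row in B]
a1 = sum(w * x for w, x in zip(ROW_A1, s)) // SCALE
print(check, a1)
```

The multiply-accumulate uses integers only, with a single division by the scale factor at the end, which is the kind of operation the stored matrix is intended to support in hardware.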

This can then be converted into signed binary or two's complement for hardware implementation. In general the number of bits of memory required to implement wavefront reconstruction on-chip is given by:

\[
\text{No. of actuators} \times (\text{No. of subapertures} + 1 \text{ piston term}) \times \text{No. of bits required}
\]
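
As a small worked example of this expression, the following Python sketch evaluates the memory requirement for the 3 x 3 Hudgin system at the two word lengths discussed above. The function name is hypothetical, and the second argument counts slope terms, which equals the number of subapertures only when one slope per subaperture is used.

```python
def reconstruction_memory_bits(n_actuators, n_slope_terms, word_bits):
    """Bits needed to store the pseudo-inverse, including the piston column."""
    return n_actuators * (n_slope_terms + 1) * word_bits

print(reconstruction_memory_bits(9, 12, 10))   # 10-bit encoding
print(reconstruction_memory_bits(9, 12, 12))   # 12-bit encoding
```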

The same concepts can be applied for the Fried geometry but in this case the relationship between the tilts and the actuator signals is given by (see Figure 5.2(a)):

\[
\begin{aligned}
S_{1x} &= (a_2 + a_5) - (a_1 + a_4) \\
S_{1y} &= (a_1 + a_2) - (a_4 + a_5)
\end{aligned}
\]

\[\ldots\]

5.5 COMPLETE AO SYSTEM

Traditional systems require data to be transmitted from the optical sensor (i.e. a CCD) to a host computer by means of an analogue video line, an analogue-to-digital converter (ADC) and a frame memory, and hence are invariably slow and costly. Figure 5.3 shows the structure of our proposed AO system and that of a traditional system. By partitioning the design by function and incorporating processing at the sensor level, the data bottleneck present in traditional systems can be alleviated. The final integrated wavefront sensor (iWFS) will consist of an array of tilt sensors with local centroid processing, and wavefront reconstruction circuitry implemented either on-chip or on a dedicated processor such as an FPGA, as illustrated in Figure 5.4. There is a further reduction in data bandwidth by a factor of 2 after reconstruction of the wavefront from the wavefront slopes. On-chip implementation will allow higher speeds and a more compact design, while an FPGA implementation has greater flexibility in allowing the system to cope with different mirror and optical configurations.

**Figure 5.3 Partitioning the AO system by function instead of hardware reduces the data bottleneck**

**Figure 5.4 Proposed integrated wavefront sensor**

There was insufficient time to build a complete working AO system. However, in order to highlight the potential of this system, a proposed architecture incorporating the fabricated tilt sensor and a continuous deformable mirror will be discussed in terms of its speed, its area and its cost. Scalability of this design will also be considered. The deformable mirror selected is the 37-channel 15mm micromachined membrane deformable mirror from OKO Technologies with a settling time of 1ms and is a device which is widely used in other adaptive optical systems [Dayton 2002, de Lima Monteiro 2002, Paterson 2000, Rhoadarmer 1999]. The cost of the mirror is EUR4850 or £3300 including control electronics [Flexible Optical BV]. Each channel is driven by an 8-bit input signal.

The proposed geometry to be used for alignment of the subapertures with the mirror is shown in Figure 5.5, which requires 37 subapertures for the 37 actuators, and has been used by Rhoadarmer et al. [Rhoadarmer 1999] for testing the wavefront sensor hardware and software for the new Multiple Mirror Telescope adaptive optics system. The DM actuators and WFS subapertures have been projected onto the entrance pupil. The large circle represents the entrance pupil diameter. The hexagons are the DM actuators and the squares are the WFS subapertures. The small circles mark the centers of the actuators. Dayton et al. [Dayton 2002] used a slightly different geometry with 32 actuators as shown in Figure 5.6.

**Figure 5.5 Proposed subaperture-actuator geometry for AO system [Rhoadarmer 1999]**

5.5.1 SPEED

A zonal reconstruction procedure is assumed and the closed loop response time of the system is estimated as follows:

\[
\text{Closed loop response time} = \text{Frame time of tilt sensors} + \text{Readout time of tilt sensors} + \text{Reconstruction time} + \text{Mirror settling time}
\]

5.5.1.1 Frame time of tilt sensors

All the tilt sensors in the prototype design operate in parallel at a minimum frame rate of 2.4kHz, that is, a frame time of 0.416ms.

5.5.1.2 Readout time of tilt sensors

In the current design, each tilt sensor produces two (x and y) 7-bit centroid values, although the number of bits used to represent the centroids may be increased in subsequent designs to achieve better positional resolution. Parallel 7-bit readout is possible but a serial readout using the 32MHz clock will be assumed here giving a readout time of:

\[
T_{\text{readout}} = \frac{(14 \text{ bits/subaperture} \times 37 \text{ subapertures})}{32\text{MHz}} = 16.2\mu\text{s}
\]  

(5.15)
5.5.1.3 Reconstruction time

The reconstruction procedure involves the matrix multiplication of the 7-bit centroid values with the inverse reconstruction matrix as described in Section 5.4. Hence, the reconstruction time, $T_{\text{recon}}$, required is given by:

$$T_{\text{recon}} = N_{\text{mult}} \times (T_{\text{mult}} + T_{\text{mem}})$$

(5.16)

where $N_{\text{mult}}$ is the number of multiplication procedures required, $T_{\text{mult}}$ is the time required for a single multiplication process and $T_{\text{mem}}$ is the memory access time for accessing the stored inverse reconstruction matrix values, assuming a single multiplier operating serially on the entire matrix. If instead more than one multiplier unit is used, then the reconstruction time required is given by:

$$T_{\text{recon}} = \left(\frac{N_{\text{mult}}}{N_{\text{unit}}}\right) \times (T_{\text{mult}} + T_{\text{mem}})$$

(5.17)

where $N_{\text{unit}}$ is the number of multiplier units operating in parallel.

i) Number of multiplication procedures, $N_{\text{mult}}$

The number of multiplication procedures, $N_{\text{mult}}$, required is given by:

$$N_{\text{mult}} = \text{number of actuators} \times (\text{number of slope terms} + \text{1 piston term})$$

(5.18)

Typically, there are two slopes obtained per subaperture (for the Hudgin geometry only one slope per subaperture is required) and hence, for the configuration proposed,

$$N_{\text{mult}} = 37 \times (2 \times 37 + 1) = 2775 \text{ procedures.}$$

ii) Multiplication delay time, $T_{\text{mult}}$

If 12 bits are used for the inverse reconstruction matrix values, then the result of the multiplication will be 19 bits long. For the multiplication, a shift and add method can be used and for the addition, a carry-look-ahead adder is preferred over a ripple adder for its parallelism and hence shorter delay. A 16-bit 2-level carry-look-ahead (CLA) adder\(^{61}\) requires just 9 gate delays to complete [Parhami 2000] and this can easily be extended to a 19-bit value by incorporating another 4-bit CLA adder in the 1st level. This does not entail significant delay overhead because although the fanout and the individual gate delay increase slightly, the number of levels or the number of gate delays remains the same. An alternative to the carry-look-ahead adder is the carry-save adder, which has the advantage of a reduced number of gates at the expense of longer delay times. Also, the pipelining of carry-save adders is a simple matter and is suitable where the result of a carry-save addition is immediately re-used in another addition, e.g. in multiplication. In the case of a 12-bit by 7-bit multiplication, a carry-save adder approach will require 2 x 19 clock cycles while the carry-look-ahead adder technique requires 2 x 7 clock cycles, but the cycle can be made faster for the case of the pipelined carry-save adders. Other, more complicated optimised multipliers and adders [Parhami 2000] are also possible. However, for the purpose of this investigation only the carry-look-ahead adder will be considered. With a typical gate delay of <1ns, the addition process can be performed within one cycle of the 32MHz clock \( (1/32\text{MHz} = 31.25\text{ns}) \) and the shift and add multiplication requires \( 7 \times 2 \) clock cycles with a 7-bit quotient (centroid) and 2 cycles per bit for shift and add. Hence the multiplication delay time, \( T_{\text{mult}} \), becomes:

61 This consists of four 4-bit CLA adders in the first level and a 2nd stage carry-look-ahead generator in the second level.

\[
T_{\text{mult}} = 14 / 32\text{MHz} = 0.4375\mu s
\]  

\( (5.19) \)

**iii) Memory access time, \( T_{\text{mem}} \)**

The inverse reconstruction matrix is fixed for a given configuration and these values can be stored in memory for fast reconstruction computation. Foundries often provide a service for the generation of on-chip random access memory (RAM) and in the AMS 0.35\( \mu \)m CMOS (C35) process, for example, a 2775 word \( \times \) 12-bit single port RAM will take up an area of 1.86mm\(^2\) with an access time of 5.75ns [Austriamicrosystems – Memory Compiler].

**iv) Reconstruction time, \( T_{\text{recon}} \)**

Therefore, with a single serial multiplier the reconstruction time, \( T_{\text{recon}} \), required for this design is:

\[
T_{\text{recon}} = 2775 \times (0.4375\mu s + 5.75\text{ns}) = 1.23\text{ms}
\]  

\( (5.20) \)
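
Equations (5.16) to (5.20) can be collected into a short Python sketch. The function names are hypothetical, and the ten-unit case simply applies equation (5.17) without rounding the number of products per unit up to a whole number, which a real implementation would have to do.

```python
def n_mult(n_actuators, slopes_per_subaperture, n_subapertures):
    # Equation (5.18): one product per matrix element, including the piston column.
    return n_actuators * (slopes_per_subaperture * n_subapertures + 1)

def t_recon(n_products, t_mult, t_mem, n_units=1):
    # Equations (5.16) and (5.17).
    return (n_products / n_units) * (t_mult + t_mem)

CLOCK_HZ = 32e6
T_MULT = (7 * 2) / CLOCK_HZ      # 7 bits, 2 cycles per bit: equation (5.19)
T_MEM = 5.75e-9                  # RAM access time quoted above

products = n_mult(37, 2, 37)
single = t_recon(products, T_MULT, T_MEM)
ten_units = t_recon(products, T_MULT, T_MEM, n_units=10)
print(products, single, ten_units)
```

With one multiplier the result is about 1.23ms, as in equation (5.20); ten parallel units at the same clock would reduce this to about 0.12ms.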
It is possible to move to a faster clock to perform the arithmetic calculations as well as use several parallel multiplier units to reduce the delay time. De Lima Monteiro [de Lima Monteiro 2002] performed modal reconstruction on a PC using a 750MHz Pentium III processor for 64 quad cells (128 slopes), 9 Zernike modes and 37 mirror control signals, and this took 134μs, so it is possible to go much faster. Also, according to de Lima Monteiro, the time taken by the control and feedback algorithms, run on a 750MHz Pentium III PC under Linux, is not of concern compared to the other elements of the system.

5.5.1.4 Mirror settling time

The mirror settling time was 1ms [de Lima Monteiro 2002]. The mirror has to be stable during WFS integration so the mirror actuation and settling cannot be pipelined with the wavefront sensing and reconstruction.

5.5.1.5 Closed loop response time

Therefore, the closed loop response time = 0.416ms + 16.2μs + 1.23ms + 1ms = 2.66ms. Hence, the closed loop bandwidth = 376Hz. Rhoadarmer et al. [Rhoadarmer 1999] used an 80 x 80 array 4-port, split frame transfer CCD with a 1kHz frame rate and zonal reconstruction and achieved a closed loop bandwidth of 5Hz, while Dayton et al. [Dayton 2002] achieved a closed loop bandwidth of 80Hz. De Lima Monteiro [de Lima Monteiro 2002] achieved an operational frequency (sensor readout and wavefront reconstruction) of 370Hz and a closed loop bandwidth (sensor readout, wavefront reconstruction, mirror actuation and settling time) of 260Hz with 44 quad cells and modal reconstruction of 9 Zernike modes. Paterson et al. [Paterson 2000] used a 128 x 128 CCD with a maximum frame rate of 800Hz and the closed loop bandwidth achieved was 50Hz. Hence the proposed system compares favourably with other similar systems.

Assuming the number of subapertures is equal to the number of actuators and there are two slopes per subaperture, the delay of the system as the number of degrees of freedom (actuators) increases can be shown. Figure 5.7 shows the delay of the system when a 32MHz clock with a single multiplier is used while Figure 5.8 shows the delay when a 200MHz clock and 10 multiplier units are used. It can be seen that moving to faster clock speeds and using parallel multiplier units will move the delay bottleneck from the reconstruction procedure to the mirror settling time. Also, by moving to faster ADC techniques for the tilt sensors, it is possible to remove the frame time as a key delay as well.
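
The scaling behaviour described above can be explored with a small Python model of the four delay terms. This is a sketch under several assumptions that are not stated in the text: the serial readout runs at the same clock as the arithmetic, the frame time and memory access time stay fixed as the system grows, and the products are shared evenly between the multiplier units. The function name `loop_delays` is hypothetical.

```python
def loop_delays(n_actuators, clock_hz, n_units,
                frame_time=0.416e-3, settling_time=1e-3,
                t_mem=5.75e-9, centroid_bits=7):
    """Return the four delay terms (seconds) of the closed loop response time."""
    n_subapertures = n_actuators                    # as assumed in the text
    n_slopes = 2 * n_subapertures                   # x and y slope per subaperture
    readout = centroid_bits * n_slopes / clock_hz   # serial readout
    t_mult = 2 * centroid_bits / clock_hz           # shift and add, 2 cycles per bit
    n_mult = n_actuators * (n_slopes + 1)           # plus the piston term
    recon = (n_mult / n_units) * (t_mult + t_mem)
    return {"frame": frame_time, "readout": readout,
            "reconstruction": recon, "settling": settling_time}

for clock_hz, n_units in ((32e6, 1), (200e6, 10)):
    for n in (37, 100, 200):
        d = loop_delays(n, clock_hz, n_units)
        total = sum(d.values())
        # Print the dominant term and the total delay in milliseconds.
        print(clock_hz, n_units, n, max(d, key=d.get), round(total * 1e3, 3), "ms")

baseline = loop_delays(37, 32e6, 1)
fast = loop_delays(37, 200e6, 10)
```

For 37 actuators the model reproduces the 2.66ms response time with reconstruction as the largest term at 32MHz, while at 200MHz with 10 multipliers the mirror settling time dominates.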

Figure 5.7  Delay times for the AO system when a 32MHz clock and 1 multiplier is used

Figure 5.8  Delay times for the AO system when a 200MHz clock and 10 multipliers are used
5.5.2 AREA AND COST

The integrated wavefront sensor consists of 3 main components: the fabricated tilt sensors, the wavefront reconstruction circuitry and memory storage for the reconstruction matrix.

The 5 x 5 photodiode array size is 530\(\mu\)m x 600\(\mu\)m (0.318\(mm^2\)) and the area of the centroid processor (excluding the array) is 12.6\(mm^2\). The gate density for the Mietec 0.7\(\mu\)m CMOS process is 1250 gates/mm\(^2\) while that of the AMS 0.35\(\mu\)m CMOS (C35) process is 18k gates/mm\(^2\). When scaled to the AMS 0.35\(\mu\)m CMOS (C35) process, the area consumed per tilt sensor will approximately be 12.6\(mm^2\) x 1250/18k + 0.318\(mm^2\) = 1.2\(mm^2\).

A 19-bit CLA adder is expected to take less than 200 gates and an area of 100\(\mu\)m\(^2\) per gate, and a 19-bit shift register would require 19 D-type flip flops at 400\(\mu\)m\(^2\) each [Austriamicrosystems]. Hence, the wavefront reconstruction circuitry takes up less than 0.03\(mm^2\). So it is feasible to use 10 multiplier units in parallel, or even more. Ultimately, the bottleneck of the system will lie with the settling time of the mirror except for the case where a very large number of actuators are needed.

Hence for the proposed system with 37 subapertures, the total area required will be 37 \(\times\) 1.2\(mm^2\) + 0.03\(mm^2\) + 1.86 \(mm^2\) \(\approx\) 46.3\(mm^2\). Excluding packaging costs, the fabrication cost will come to 580EUR/mm\(^2\) x 46.3\(mm^2\) \(\approx\) 26800EUR or about £18000 for 10 samples. Considering that traditional systems have component costs of >£10\(^5\) [Munro 1999], the design offers a significant saving in system cost. The largest chip area available through Europractice is 16.5mm x 16.5mm with a ceramic quad flat pack (CQFP208) package, and this is capable of encompassing over 220 subapertures or tilt sensors.
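
The area and cost estimate can be retraced with the following Python sketch, which uses only the figures quoted in this section. The variable names are illustrative, and the calculation keeps unrounded intermediate values, so the total comes out slightly below the figure obtained with the rounded 1.2mm² and 0.03mm² terms.

```python
PROCESSOR_AREA_07 = 12.6            # mm^2, centroid processor in 0.7 um CMOS
ARRAY_AREA = 0.530 * 0.600          # mm^2, 5 x 5 photodiode array (not scaled)
GATE_DENSITY_07 = 1250              # gates/mm^2
GATE_DENSITY_035 = 18000            # gates/mm^2
RAM_AREA = 1.86                     # mm^2
COST_PER_MM2 = 580                  # EUR per mm^2 for 10 samples
N_SUBAPERTURES = 37

# Scale the digital logic by the gate density ratio; keep the array size.
tilt_sensor_area = PROCESSOR_AREA_07 * GATE_DENSITY_07 / GATE_DENSITY_035 + ARRAY_AREA
# Reconstruction logic: 200 gates at 100 um^2 plus 19 flip-flops at 400 um^2.
recon_area = (200 * 100 + 19 * 400) * 1e-6      # um^2 converted to mm^2

total_area = N_SUBAPERTURES * tilt_sensor_area + recon_area + RAM_AREA
cost_eur = COST_PER_MM2 * total_area
max_sensors = int(16.5 * 16.5 // 1.2)           # using the rounded 1.2 mm^2
print(round(tilt_sensor_area, 3), round(recon_area, 4),
      round(total_area, 1), round(cost_eur), max_sensors)
```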
5.6 CHAPTER SUMMARY

In this chapter, the concepts for wavefront reconstruction were introduced. A wavefront can be described using a zonal approach where the wavefront is divided up into subapertures (zones) or using a modal approach where the wavefront is treated as a sum of basis functions (modes) with Zernike polynomials being a popular choice.

Wavefront reconstruction usually involves the solution of a linear system of equations in matrix form \( s = B a \) where, for an overdetermined system, a linear least-squares approximation can be used to solve for the actuator signals \( a \) by finding the pseudo-inverse of the reconstruction matrix \( B \) and multiplying this by the measured wavefront tilts \( s \) \( (a = [B^T B]^{-1} B^T s) \). For a given configuration, the pseudo-inverse matrix \( [B^T B]^{-1} B^T \) need only be calculated once, reducing the wavefront reconstruction computation to a single matrix multiplication.

Two types of reconstruction techniques from wavefront slopes are possible and these are the modal and zonal techniques. With the modal technique, coefficients of the polynomial function for describing the wavefront (Zernike) are obtained and these need to be converted into actuator commands for driving the deformable mirror. With the zonal technique, the phases of the wavefront at regular discrete points on the aperture are obtained and these translate directly into actuator commands. In the zonal approach, the sensor-actuator geometry and choice of deformable mirror directly affect the generation of the reconstruction matrix \( B \) and an example for the reconstruction of a 3 x 3 actuator system with a Hudgin geometry was shown. Once the pseudo-inverse matrix of \( B \) is obtained, it can be converted into binary values and stored in on-chip RAM for wavefront reconstruction, allowing a complete, fast, low cost adaptive optics system to be built.

The structure for our proposed AO system was then presented and it was shown that the parallel processing achieved with the system allowed a closed loop bandwidth of more than 370Hz and at a fraction of the cost of traditional AO systems. The design is able to remove the bottleneck from the readout and processing of the wavefront to its fundamental limit of the mirror settling time.
CHAPTER 6

CONCLUSIONS

6.1 DISCUSSION

The research covered in this thesis addresses the need for a fast, low cost integrated wavefront sensor for use in an adaptive optical system. An adaptive optical (AO) system corrects for wavefront distortion in the imaging medium, such as the atmosphere, by having a closed loop detection and correction scheme. A Shack Hartmann wavefront sensor uses an array of small lenslets to sample the optical wavefront and by detecting the deviation of the focused spots from reference positions, the local wavefront tilts are obtained. Currently with most of these systems, a single CCD is used to sample the entire wavefront before it can be processed, resulting in a data bottleneck. This research addresses this issue by integrating local centroid processing for each local wavefront tilt which will allow parallel processing of the wavefront. In addition, removing the need for a CCD-frame grabber-PC architecture will lead to a reduction in system size, cost and power consumption. Adaptive optics has traditionally been known for its use in astronomical and military applications mainly because of the high cost of the components in the system. With a low-cost real-time adaptive optical system, many new application areas such as ophthalmology, intra and extra-cavity laser correction, free space optical communications and microscopy, will become feasible. The design stages of the system are summarised below.

6.1.1 DESIGN SPECIFICATIONS

There are several possible structures for implementing a position sensitive device (PSD) such as the lateral effect photodiode (LEP), the quad cell and the multi-pixel array. A lateral effect PSD requires large uniform sheet resistance for linear operation, which is not readily available in a standard CMOS process making integration with circuitry difficult. Quad cells have simple readout schemes but are not very linear. Multi pixel arrays have better linearity, sub-pixel accuracy and positional range. They also offer greater flexibility and are able to deal with multiple spots and non-uniform intensity profiles. The drawback is the increased computational load but for moderate array sizes this is reasonable and this was the architecture chosen for our system. A 5 x 5 pixel array was selected as a tradeoff between linearity and circuit complexity.

In terms of centroid processing, there have been various efforts to implement centroid detection on a CMOS process for numerous applications. In general, analogue multi-pixel array approaches suffer from low fill factor and poor linearity due to poor tolerance of components such as polysilicon resistors and capacitors. Binary position sensing techniques using Winner Take All (WTA) circuitry or an on-pixel comparator do not offer subpixel accuracy and cannot cope with multiple spots or non-uniform spots. This research explores the approach of a dedicated digital centroid processor which offers high accuracy and greater flexibility and programmability for various image processing tasks. In terms of pixel architectures, the CMOS active pixel sensor (APS) was selected as it offers high fill factor and low mismatch compared to other APS types.

6.1.2 CHARACTERISATION OF CMOS PHOTODIODE STRUCTURES

An important design requirement for this work is the integration of circuitry at the sensor level and this is difficult to achieve with CCDs. As such, a standard CMOS process was used. However, CMOS processes have been optimised towards microelectronic circuitry rather than imaging. Hence, characterisation of photodetector structures in a standard CMOS process was necessary. The CMOS process selected for the work was the Mietec 0.7µm CMOS process accessed via the Europractice IC Service. The CMOS photodiode structures were characterised for dark current, capacitance, spatial response, responsivity and spectral response. The dark current for the devices tested was of the order of 1pA or less for a reverse bias voltage of 2 - 4V. The capacitance of the deep device (0.5pF for a 100µm x 100µm photodiode at 2V reverse bias) was shown to be smaller than that of the shallow devices (3.2pF for a 100μm x 100μm photodiode at 2V reverse bias), making it more suitable for high speed applications. The presence of an inadvertent Schottky barrier diode lowered the capacitance further. The photodiode structures were also shown to be highly linear with incident light intensity and to have saturation levels higher than 2.7mW of light power. The results of the characterisation work showed that, without the need for any process modifications, photodiodes in standard CMOS showed good responsivity of the order of 0.3A/W. In terms of spectral response, the deep photodiode has better spectral response at longer wavelengths while the shallow photodiode performed better at shorter wavelengths. This is due to the absorption coefficient and penetration depth of light into silicon, where light of longer wavelength penetrates deeper into the substrate. The deep well-substrate photodiode was chosen for integration of the ASIC because of its low capacitance, low leakage in reverse bias and high responsivity, particularly at longer wavelengths.

6.1.3 DESIGN PROTOTYPING

To achieve the goal of fabricating a single IC optical centroid processor, a design philosophy of functional validation via a hardware emulation system prior to chip fabrication was employed. This reduces the risk and the number of iterations and fabrication runs needed to produce a working centroid processor. The hardware emulation system consists of a 5 x 5 photodiode array, a transimpedance amplifier for current-to-voltage conversion, an analogue-to-digital converter (ADC) and a reprogrammable FPGA processor for calculating the centroid. The hardware emulation system was tested with a commercial photodiode array and a full custom standard CMOS photodiode array fabricated in the Mietec 0.7μm CMOS process. The centroid processor successfully computed the centroids at a rate of 1.54kHz, which was limited by the maximum conversion frequency of the ADC of 40kHz. Having proven the functionality of the digital centroid processor and the use of a full custom photodiode array, the next stage in the design was to integrate the full custom array with the digital centroid processor onto a single CMOS IC chip.

For the ASIC centroid processor, an active pixel sensor array was used for buffering of the pixel and current-to-voltage conversion. The pixel architecture was optimised according to simulation results and circuit analysis. Digitisation of the pixel output was done using a counter and two comparators to measure the discharge time of the pixel. The dynamic range of the pixel output could be extended using a two cycle adaptive technique. The digitised pixel values were then processed by the digital centroid processor, which was previously verified by the hardware emulation system. The ASIC allowed different modes of operation and various control signals for increased testability and observability. Being a mixed full-custom and semi-custom design, several layout and design issues had to be considered, such as design partitioning, power supply management, physical design verification and clock tree planning. The fabricated optical centroid processor successfully obtained and transmitted the centroids at a rate of 2.4 - 4.8 kHz, allowing real-time operation in many applications.
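The discharge-time digitisation summarised above can be illustrated with a minimal sketch. It assumes an ideal constant-current discharge of a fixed capacitance between two comparator thresholds; the function names, the threshold voltages, the optical powers and the 16 MHz counter clock are hypothetical choices for illustration, while the 0.3A/W responsivity and 0.5pF capacitance are the characterisation figures quoted earlier and are not necessarily the values seen at the pixel node of the ASIC.

```python
def photocurrent(responsivity_a_per_w, optical_power_w):
    """Photocurrent in amperes for a given incident optical power."""
    return responsivity_a_per_w * optical_power_w


def discharge_count(capacitance_f, v_high, v_low, current_a, clock_hz):
    """Clock cycles counted while the pixel node falls from v_high to v_low.

    Assumes an ideal constant-current discharge of a fixed capacitance,
    so the discharge time is t = C * (v_high - v_low) / I.
    """
    if current_a <= 0:
        raise ValueError("photocurrent must be positive")
    t_discharge = capacitance_f * (v_high - v_low) / current_a
    return int(t_discharge * clock_hz)


# Illustrative values only: thresholds, clock and optical powers are assumed.
for power_w in (1e-9, 2e-9, 4e-9):
    i_ph = photocurrent(0.3, power_w)
    count = discharge_count(0.5e-12, 3.0, 2.0, i_ph, 16e6)
    print(f"{power_w:.0e} W -> {count} counts")
```

In this idealised model the count is inversely proportional to the light level, so equal steps in intensity produce progressively smaller steps in count at high intensity.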

6.1.4 COMPLETE AO SYSTEM

It was shown that wavefront reconstruction for a Shack Hartmann wavefront sensor can be reduced to a simple matrix multiplication with on-chip memory storage, so integration of wavefront sensing and wavefront reconstruction can easily be achieved, leading to cheap and fast adaptive optical systems. The structure of a proposed AO system was presented to illustrate the scalability of the design and the advantage drawn from processing the centroids in parallel. The system was capable of achieving a closed loop bandwidth of more than 370Hz at a fraction of the cost of traditional AO systems. The design removes the bottleneck in the readout and processing of the wavefront, leaving the mirror settling time as the fundamental limit on system speed.
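As a minimal sketch of reconstruction by matrix multiplication, the following assumes a precomputed reconstruction matrix held in memory and a vector of measured tilts. The function name, the matrix entries and the slope values are hypothetical and are chosen only to show the multiply-accumulate structure; they are not the reconstructor used in this work.

```python
def reconstruct(recon_matrix, slopes):
    """Multiply a precomputed reconstruction matrix by the slope vector."""
    if any(len(row) != len(slopes) for row in recon_matrix):
        raise ValueError("matrix width must match the number of slopes")
    # One multiply-accumulate chain per output value.
    return [sum(r * s for r, s in zip(row, slopes)) for row in recon_matrix]


# Hypothetical example: two outputs formed from the x and y tilts
# of two subapertures, ordered as [x0, y0, x1, y1].
R = [
    [0.5, 0.0, 0.5, 0.0],  # output 0: mean x tilt
    [0.0, 0.5, 0.0, 0.5],  # output 1: mean y tilt
]
slopes = [0.2, -0.1, 0.4, 0.3]
print(reconstruct(R, slopes))
```

Because the matrix is fixed for a given sensor geometry, only the multiply-accumulate operations need to run for every frame.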
6.2 FURTHER WORK

The recommended further work for this design is summarised below:

1) To improve the positional resolution of the processor, the number of bits in the centroid representation can be increased. This requires very little overhead: each additional bit in the result needs only one additional shift-and-subtract cycle in the divider, with no additional storage for the dividend and divisor.

2) The limited spatial dynamic range of the design was attributed to stray light, crosstalk between pixels and the choice of ADC technique, which compresses higher intensity light levels and hence reduces the signal-to-background ratio. The problem of stray light can be overcome by improving the optical setup. To resolve the problem of pixel crosstalk, individual pixel reset can be implemented, such that when one pixel is discharging the other pixels remain under reset. Also, a suitably biased guard ring structure can be incorporated between pixels to mop up any crosstalk current. FPN removal circuitry or pixel offset subtraction should be incorporated in future designs.

3) To overcome the speed limitation of the current ADC technique, an alternative digitisation structure can be implemented in which a conventional ADC is used to digitise the final discharge voltage. This would also allow pixel values to be read out and digitised sequentially without the need for separate discharge cycles per pixel.

4) In the initial design, 26 pixel periods were required per frame but this can be reduced to 25 pixel periods by using the fast clock instead of the pixel reset signal to latch out the data and clear the registers, thereby allowing a new frame to start immediately after 25 pixel periods.

5) Under conditions of low signal-to-noise ratio (SNR) it is possible to improve the accuracy of the centroiding by removing the background signal through thresholding. With a digital centroid processor this is easily achieved by subtracting a programmable, even adaptive, offset from the digitised input or by setting all pixel values below a certain threshold to zero before the centre of gravity is found. It is also possible to apply windowing to improve the SNR further.

6) Mode 2 and mode 3 operation of the ADC, which are expected to give better noise rejection and increased dynamic range capability respectively, should be tested.

7) The current design should be tested with different laser beam sizes in order to characterise and quantify the linearity of the device as a centroid detector.

8) Faster, more robust off-chip readout techniques, such as the Universal Serial Bus (USB) with a data rate of up to 480Mbps, can be considered in place of the RS232 link, which has limited data rates.

9) Finally, an array of tilt sensors can be integrated along with wavefront reconstruction to form a complete low-cost real-time adaptive optical system.
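Items 1 and 5 above can be illustrated together with a minimal sketch: a centre of gravity calculation with background thresholding, followed by a shift-and-subtract (restoring) division in which each additional result bit costs one additional cycle. The function names, the fixed-point format and the test frame are assumptions made for illustration and are not taken from the ASIC design.

```python
def shift_subtract_divide(dividend, divisor, int_bits, frac_bits):
    """Restoring division with one shift-and-subtract cycle per result bit.

    Returns floor(dividend * 2**frac_bits / divisor), assuming
    0 <= dividend < divisor * 2**int_bits.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    quotient = 0
    remainder = dividend
    # Integer bits: compare against the divisor shifted to each bit weight.
    for bit in range(int_bits - 1, -1, -1):
        trial = divisor << bit
        quotient <<= 1
        if remainder >= trial:
            remainder -= trial
            quotient |= 1
    # Fractional bits: one extra cycle per extra bit of resolution.
    for _ in range(frac_bits):
        remainder <<= 1
        quotient <<= 1
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient


def centroid(frame, threshold=0, frac_bits=4):
    """Centre of gravity of a frame after setting sub-threshold pixels to zero.

    Returns (x, y) in fixed point with frac_bits fractional bits,
    or None if no pixel survives the threshold.
    """
    total = sum_x = sum_y = 0
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            if value < threshold:
                value = 0
            total += value
            sum_x += x * value
            sum_y += y * value
    if total == 0:
        return None
    int_bits = max(len(frame), len(frame[0])).bit_length()
    return (shift_subtract_divide(sum_x, total, int_bits, frac_bits),
            shift_subtract_divide(sum_y, total, int_bits, frac_bits))


# Hypothetical 5 x 5 frame: uniform background of 2 with one bright pixel.
frame = [[2] * 5 for _ in range(5)]
frame[1][3] = 50
print(centroid(frame, threshold=0), centroid(frame, threshold=10))
```

Without the threshold the uniform background pulls the estimate towards the centre of the array; with the threshold applied the result is the position of the bright pixel, expressed in units of 1/16 of a pixel.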

6.3 CONCLUSIONS

The main conclusions from this work are highlighted below:

1) CMOS photodiode structures offer satisfactory responsivity (about 0.3 A/W) for the intended application while allowing high levels of circuit integration not possible with the CCD process. This has allowed the use of parallel processing to remove the data bottleneck in traditional CCD systems.

2) A hardware emulation system was used to confirm the performance of the design prior to ASIC fabrication, hence reducing the risk and the number of iterations needed to produce a working centroid processor. The hardware emulation system successfully computed centroids at a rate of 1.54kHz, which was limited by the speed of the ADC used [Pui 2002]. Due to the re-programmable nature of the FPGA, the hardware emulation environment can also be used for prototyping many other optical processing algorithms.

3) This work represents the only dedicated digital centroid processor fabricated to date. It was integrated with an on-chip CMOS photodiode array, and the system successfully processed and transmitted centroids at a rate of 2.4 – 4.8 kHz [Pui 2004], removing the data bottleneck present in traditional CCD systems and allowing real-time operation in many applications.

4) The centroid processor has the potential to be scaled to a complete cheap and fast AO system. By making use of latency in the design, the divider of the centroid processor can be shared among several centroid processors. In addition, moving to smaller feature sizes and improved routing capability will lead to a significant reduction in the size of the digital centroid processor, which is an advantage not offered by analogue approaches due to the tolerance of their components. When integrated with an array of tilt sensors operating in parallel, the frame rate of the design is not limited by the number of tilt sensors employed. In fact, the speed advantage over traditional systems increases with the number of tilt sensors required.

REFERENCES


[Salmon] Salmon, T. O. "A Primer on Using Wavefront Analysis for Refractive Surgery and Other Ophthalmic Applications". Available at http://www.opt.pacificu.edu/ce/catalog10260-RS/WavefrontSalmon.html



Appendix A1.1: Removal of data bottleneck in traditional wavefront sensors

In this section, the data bottleneck in current CCD AO systems is quantified and compared with our proposed system. Figures A1.1 and A1.2 show the structure of a traditional single-sensor system, typically based on a CCD, and that of our proposed system respectively. \( N_{\text{light}} \) is the number of bits used to represent the light level while \( N_{\text{centroid}} \) is the number of bits that make up the centroid values and typically, \( N_{\text{light}} > N_{\text{centroid}} \). For each centroid processing block two centroid values are obtained (x and y). Figure A1.3 shows the timing diagram associated with both. The off-chip centroid computation time is ignored, as it is possible to make use of the latency in the frame acquisition and readout, just as the parallel on-chip computation does, and to use several parallel processors/CPU/DSP units off-chip.

![Figure A1.1 Traditional systems with conventional CCD readout architecture](image)

![Figure A1.2 Our proposed system with parallel centroid processing](image)
Figure A1.3 Timing diagrams of our proposed system and the traditional system

The acquisition time specified includes the pixel integration period and the ADC acquisition time. For long integration periods, the frame rate is limited by the acquisition time, and in this case the frame periods of our system, $T_1$, and of the traditional system, $T_2$, are given by:

$$T_1 = 25T_{\text{pixel}}$$

$$T_2 = 25n^2T_{\text{pixel}}$$

For short integration times and fast digitisation, the frame rate is limited by the off-chip readout time $T_{\text{read}}$, and in this case:

$$T_1 = 2N_{\text{centroid}} \times n^2 \times T_{\text{read}}$$

$$T_2 = N_{\text{light}} \times 25n^2 \times T_{\text{read}}$$

In summary, for long integration times, our system removes the data bottleneck by allowing parallel acquisition of the raw data, while for short integration times, the bottleneck is removed by processing the raw data on-chip and only transmitting reduced bandwidth data off-chip. Also, the speed advantage offered increases with the number of subapertures in the system, which is $n^2$ for an $n \times n$ array.
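The four expressions above can be evaluated with a short sketch. The function names and the numerical values used in the example call are hypothetical; only the formulas themselves are taken from this appendix.

```python
def frame_periods_acquisition_limited(n, t_pixel, pixels_per_spot=25):
    """Frame periods (T1, T2) when acquisition time dominates."""
    t1 = pixels_per_spot * t_pixel            # all spots acquired in parallel
    t2 = pixels_per_spot * n ** 2 * t_pixel   # one sensor acquires every pixel
    return t1, t2


def frame_periods_readout_limited(n, n_centroid, n_light, t_read,
                                  pixels_per_spot=25):
    """Frame periods (T1, T2) when off-chip readout dominates."""
    t1 = 2 * n_centroid * n ** 2 * t_read               # x and y per spot
    t2 = n_light * pixels_per_spot * n ** 2 * t_read    # every raw pixel value
    return t1, t2


# Hypothetical example: 10 x 10 subapertures, 8-bit centroids,
# 10-bit light levels, times in arbitrary units.
t1_acq, t2_acq = frame_periods_acquisition_limited(n=10, t_pixel=1)
t1_rd, t2_rd = frame_periods_readout_limited(n=10, n_centroid=8,
                                             n_light=10, t_read=1)
print("acquisition-limited speed-up:", t2_acq / t1_acq)
print("readout-limited speed-up:", t2_rd / t1_rd)
```

With these expressions the ratio $T_2/T_1$ is $n^2$ in the acquisition-limited case and $25N_{\text{light}}/(2N_{\text{centroid}})$ in the readout-limited case.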

Typically, guard row and column pixels are needed to avoid optical crosstalk when a CCD is used. As such, the array size, and hence the readout time, of the CCD system is larger than that assumed here.

Appendix A2.1: Alcatel Microelectronics (Mietec) 0.7μm CMOS Process Parameters

This process is a self-aligned twin-well CMOS process with n+ doped polysilicon gate. Several key process and electrical parameters are highlighted here.

**Electrical Parameters:**

**Layer Thickness**

<table>
<thead>
<tr>
<th>Layer</th>
<th>Thickness (μm)</th>
</tr>
</thead>
<tbody>
<tr>
<td>n-well</td>
<td>2.0</td>
</tr>
<tr>
<td>p-epilayer</td>
<td>15.0 - 18.7</td>
</tr>
<tr>
<td>field oxide</td>
<td>0.45</td>
</tr>
<tr>
<td>gate oxide</td>
<td>0.0175</td>
</tr>
</tbody>
</table>

**Resistivity**

<table>
<thead>
<tr>
<th>Layer</th>
<th>Sheet resistance (Ω/sq) \(^{63}\)</th>
<th>Resistivity (Ω cm)</th>
</tr>
</thead>
<tbody>
<tr>
<td>n-well</td>
<td>1300</td>
<td>-</td>
</tr>
<tr>
<td>p+</td>
<td>96</td>
<td>-</td>
</tr>
<tr>
<td>n+</td>
<td>67.5</td>
<td>-</td>
</tr>
<tr>
<td>Poly</td>
<td>27</td>
<td>-</td>
</tr>
<tr>
<td>Metal 1</td>
<td>0.050</td>
<td>-</td>
</tr>
<tr>
<td>Metal 2</td>
<td>0.035</td>
<td>-</td>
</tr>
<tr>
<td>p-epilayer</td>
<td>-</td>
<td>27.2 - 40.8</td>
</tr>
<tr>
<td>substrate \(^{64}\)</td>
<td>-</td>
<td>0.01 - 0.02</td>
</tr>
</tbody>
</table>

\(^{63}\) Sheet resistance, \(R_s = \frac{\rho}{t}\) (Ω/sq), where \(\rho\) is the resistivity (Ω cm) and \(t\) is the thickness (cm)

\(^{64}\) p-epilayer is also denoted as p-substrate.

**Junction diodes**

<table>
<thead>
<tr>
<th>Junction type</th>
<th>Junction capacitance</th>
<th>Leakage current</th>
<th>Breakdown voltage (V)</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>$C_j$ (pF/μm²)</td>
<td>$i_a$ (fA/μm²)</td>
<td>$i_{pf}$ (fA/μm)</td>
</tr>
<tr>
<td>p+/n-well</td>
<td>$6.0 \times 10^{-4}$</td>
<td>$3.6 \times 10^{-10}$</td>
<td>1.1</td>
</tr>
<tr>
<td>n+/p-well</td>
<td>$5.0 \times 10^{-4}$</td>
<td>$2.8 \times 10^{-10}$</td>
<td>0.13</td>
</tr>
<tr>
<td>n-well/p-substrate</td>
<td>$7.8901 \times 10^{-5}$</td>
<td>$7.3315 \times 10^{-10}$</td>
<td>1.1</td>
</tr>
</tbody>
</table>

**Transistor Parameters**

<table>
<thead>
<tr>
<th>Parameters</th>
<th>NMOS</th>
<th>PMOS</th>
</tr>
</thead>
<tbody>
<tr>
<td>Gate oxide thickness (nm)</td>
<td>17.5</td>
<td>17.5</td>
</tr>
<tr>
<td>Threshold voltage (V)</td>
<td>0.75</td>
<td>-1.0</td>
</tr>
<tr>
<td>Transconductance ($\mu A/V^2$)</td>
<td>95</td>
<td>30</td>
</tr>
</tbody>
</table>

**Diode Models:**

Shallow n+/p-well junction

.MODEL DNPLUS D IS=3E-7 ISW=6E-11 CJO=5E-4 M=0.35 CSO=2.8E-10 MS=0.21 VJ=0.8

Shallow p+/n-well junction

.MODEL DPPLUS D IS=2E-8 ISW=7E-11 CJO=6.0E-4 M=0.51 CSO=3.6E-10 MS=0.35 VJ=0.8

Deep n-well/p-substrate junction

.MODEL Djunc D IS=1E-15 CJ=7.8901E-5 MJ=0.27412 PB=0.42842 CJSW=7.3315E-10 MJSW=0.25301
+FC=0.99232
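As a hedged illustration of how the deep junction model parameters relate to a reverse-biased capacitance, the sketch below applies the standard junction capacitance expression, with an area term scaled by CJ and a sidewall term scaled by CJSW. It assumes that CJ is expressed in F/m² and CJSW in F/m, which is not stated in this appendix; the function name and the 100μm x 100μm geometry are chosen for illustration.

```python
def junction_capacitance(area_m2, perimeter_m, v_reverse,
                         cj=7.8901e-5, mj=0.27412, pb=0.42842,
                         cjsw=7.3315e-10, mjsw=0.25301):
    """Reverse-bias junction capacitance in farads.

    C = CJ*A / (1 + V/PB)**MJ + CJSW*P / (1 + V/PB)**MJSW,
    with CJ assumed to be in F/m^2 and CJSW in F/m.
    """
    if v_reverse < 0:
        raise ValueError("this sketch covers reverse bias only")
    bias = 1.0 + v_reverse / pb
    area_term = cj * area_m2 / bias ** mj
    sidewall_term = cjsw * perimeter_m / bias ** mjsw
    return area_term + sidewall_term


side = 100e-6  # 100 um square photodiode, as in the characterisation work
c_2v = junction_capacitance(side * side, 4 * side, 2.0)
print(f"{c_2v * 1e12:.2f} pF at 2 V reverse bias")
```

The value obtained depends on the unit assumption above and on the model fit, so it need not coincide with the measured capacitance quoted in the characterisation summary.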

**Digital Logic:**

System speed up to 80 MHz

Power: 3.2 μW/gate/MHz at 5 V

Density: 1250 gates/mm² (incl. routing, typical density for 20,000 gates design)
Appendix A2.2: Schematic and PCB layout of test board for laser scanning experiment

Top layer

Appendix A3.1: Schematic and layout of optical front end of the hardware emulation system with a commercial photodetector array

Top layer

Bottom layer

Appendix A3.2: Schematic and layout of optical front end of the hardware emulation system with a full custom photodetector array

Top layer

Bottom layer

Appendix A3.3: Schematic and layout of FPGA processor board

Bottom layer

Appendix A3.4: Schematic of FPGA centroid processor for commercial photodetector array front end

Appendix A3.5: Schematic of FPGA centroid processor for full custom photodetector array front end

Appendix A4.1: Top level schematic of ASIC centroid processor

Appendix A4.2: Schematic and layout of ASIC centroid processor test board

Top layer

Bottom layer

Appendix A4.3: Power consumption calculation of digital circuitry in ASIC

<table>
<thead>
<tr>
<th>Component types</th>
<th>CAND2</th>
<th>CAO121</th>
<th>CAO1211</th>
<th>CAO122</th>
<th>CFD2</th>
<th>CFD2S</th>
<th>CFD3</th>
<th>CFD3S</th>
<th>CIA</th>
<th>CN2111</th>
<th>CNAND2</th>
<th>CNAND3</th>
<th>CNAND4</th>
</tr>
</thead>
<tbody>
<tr>
<td>POW(µW/MHz)</td>
<td>3.4614</td>
<td>1.4067</td>
<td>3.6269</td>
<td>1.6254</td>
<td>11.7442</td>
<td>13.209</td>
<td>12.1246</td>
<td>13.5925</td>
<td>0.8878</td>
<td>1.726</td>
<td>1.3169</td>
<td>1.5593</td>
<td>1.9488</td>
</tr>
<tr>
<td>clkdiv1</td>
<td>1</td>
<td>2</td>
<td>4</td>
<td>5</td>
<td>1</td>
<td>1</td>
<td>8</td>
<td>3</td>
<td>21</td>
<td>114</td>
<td>28</td>
<td>4</td>
<td>1</td>
</tr>
<tr>
<td>CTR5by5a</td>
<td>7</td>
<td>11</td>
<td>17</td>
<td>71</td>
<td>4</td>
<td>242</td>
<td>3</td>
<td>364</td>
<td>21</td>
<td>114</td>
<td>28</td>
<td>4</td>
<td>1</td>
</tr>
<tr>
<td>CTRa2d</td>
<td>3</td>
<td>12</td>
<td>2</td>
<td>46</td>
<td>32</td>
<td>37</td>
<td>3</td>
<td>12</td>
<td>98</td>
<td>8</td>
<td>38</td>
<td>10</td>
<td>4</td>
</tr>
<tr>
<td>ctrlrcvr</td>
<td>1</td>
<td>1</td>
<td>7</td>
<td>49</td>
<td>4</td>
<td>3</td>
<td>47</td>
<td>49</td>
<td>8</td>
<td>2</td>
<td>2</td>
<td>1</td>
<td></td>
</tr>
<tr>
<td>div2</td>
<td>1</td>
<td>1</td>
<td>7</td>
<td>49</td>
<td>4</td>
<td>3</td>
<td>47</td>
<td>49</td>
<td>8</td>
<td>2</td>
<td>2</td>
<td>1</td>
<td></td>
</tr>
<tr>
<td>div8</td>
<td>1</td>
<td>1</td>
<td>7</td>
<td>49</td>
<td>4</td>
<td>3</td>
<td>47</td>
<td>49</td>
<td>8</td>
<td>2</td>
<td>2</td>
<td>1</td>
<td></td>
</tr>
<tr>
<td>div10</td>
<td>1</td>
<td>4</td>
<td>1</td>
<td>4</td>
<td>2</td>
<td>2</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td></td>
</tr>
<tr>
<td>div20</td>
<td>2</td>
<td>1</td>
<td>5</td>
<td>4</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td></td>
</tr>
<tr>
<td>div30</td>
<td>2</td>
<td>1</td>
<td>5</td>
<td>4</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td></td>
</tr>
<tr>
<td>divbaud</td>
<td>2</td>
<td>4</td>
<td>10</td>
<td>4</td>
<td>2</td>
<td>2</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td></td>
</tr>
<tr>
<td>opctrlsyn</td>
<td>1</td>
<td>5</td>
<td>2</td>
<td>49</td>
<td>13</td>
<td>30</td>
<td>2</td>
<td>47</td>
<td>2</td>
<td>31</td>
<td>8</td>
<td>12</td>
<td></td>
</tr>
<tr>
<td>RCREv</td>
<td>2</td>
<td>1</td>
<td>13</td>
<td>21</td>
<td>4</td>
<td>2</td>
<td>21</td>
<td>2</td>
<td>8</td>
<td>12</td>
<td>4</td>
<td>1</td>
<td></td>
</tr>
<tr>
<td>txall</td>
<td>4</td>
<td>1</td>
<td>1</td>
<td>9</td>
<td>12</td>
<td>2</td>
<td>20</td>
<td>3</td>
<td>14</td>
<td>3</td>
<td>1</td>
<td>1</td>
<td></td>
</tr>
<tr>
<td>txlightout</td>
<td>3</td>
<td>2</td>
<td>1</td>
<td>10</td>
<td>2</td>
<td>2</td>
<td>20</td>
<td>2</td>
<td>12</td>
<td>2</td>
<td>2</td>
<td>2</td>
<td></td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Component types</th>
<th>CNAND5</th>
<th>CNOR2</th>
<th>CNOR3</th>
<th>CNOR4</th>
<th>CNOR5</th>
<th>COAI21</th>
<th>COAI211</th>
<th>COAI22</th>
<th>COR2</th>
<th>CXNOR2</th>
<th>CXOR2</th>
<th>CVDD</th>
<th>CVSS</th>
</tr>
</thead>
<tbody>
<tr>
<td>clkdiv1</td>
<td>2</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>CTR5by5a</td>
<td>119</td>
<td>4</td>
<td>2</td>
<td>1</td>
<td>76</td>
<td>2</td>
<td>77</td>
<td>28</td>
<td>232</td>
<td>82</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>CTRa2d</td>
<td>12</td>
<td>15</td>
<td>3</td>
<td>2</td>
<td>36</td>
<td>23</td>
<td>7</td>
<td>4</td>
<td>4</td>
<td>7</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>ctrlrcvr</td>
<td>5</td>
<td>3</td>
<td>3</td>
<td>3</td>
<td>1</td>
<td>1</td>
<td>5</td>
<td>3</td>
<td>2</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>div2</td>
<td>1</td>
<td>1</td>
<td>2</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>div8</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>div10</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>div20</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>div30</td>
<td>3</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>divbaud</td>
<td>8</td>
<td>1</td>
<td>3</td>
<td>12</td>
<td>2</td>
<td>6</td>
<td>2</td>
<td>1</td>
<td>1</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>opctrlsyn</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>RCrecvr</td>
<td>5</td>
<td>1</td>
<td>1</td>
<td>2</td>
<td>1</td>
<td>3</td>
<td>1</td>
<td>1</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>txall</td>
<td>19</td>
<td>7</td>
<td>2</td>
<td>2</td>
<td>10</td>
<td>1</td>
<td>2</td>
<td>1</td>
<td>1</td>
<td>3</td>
<td>1</td>
<td></td>
<td></td>
</tr>
<tr>
<td>txlightout</td>
<td>1</td>
<td>2</td>
<td>1</td>
<td>2</td>
<td>16</td>
<td>1</td>
<td>2</td>
<td>2</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td></td>
</tr>
<tr>
<td>txrowcol</td>
<td>2</td>
<td>2</td>
<td>1</td>
<td>1</td>
<td>16</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>3</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Block name</th>
<th>POW (μW/MHz)</th>
<th>Frequency, f (MHz)</th>
<th>Power consumption at frequency f (mW)</th>
<th>No. of instances</th>
<th>No. of gates</th>
<th>Critical path delay (ns)</th>
</tr>
</thead>
<tbody>
<tr>
<td>clkdiv1</td>
<td>72.9231</td>
<td>32</td>
<td>2.3335392</td>
<td>21</td>
<td>47</td>
<td>6.46</td>
</tr>
<tr>
<td>CTR5by5a</td>
<td>5131.3544</td>
<td>16</td>
<td>82.1016704</td>
<td>1511</td>
<td>3641</td>
<td>31.37</td>
</tr>
<tr>
<td>CTRa2d</td>
<td>1554.306</td>
<td>32</td>
<td>49.737792</td>
<td>420</td>
<td>1028</td>
<td>17.45</td>
</tr>
<tr>
<td>ctrlrcvr</td>
<td>1029.2836</td>
<td>32</td>
<td>32.9370752</td>
<td>165</td>
<td>625</td>
<td>13.2</td>
</tr>
<tr>
<td>div2</td>
<td>12.632</td>
<td>32</td>
<td>0.404224</td>
<td>2</td>
<td>7</td>
<td>2.37</td>
</tr>
<tr>
<td>div8</td>
<td>58.0278</td>
<td>32</td>
<td>1.8568896</td>
<td>11</td>
<td>35</td>
<td>5.02</td>
</tr>
<tr>
<td>div10</td>
<td>60.6702</td>
<td>0.1152</td>
<td>0.006989207</td>
<td>13</td>
<td>37</td>
<td>4.86</td>
</tr>
<tr>
<td>div20</td>
<td>79.6368</td>
<td>0.1152</td>
<td>0.009174159</td>
<td>18</td>
<td>48</td>
<td>5.51</td>
</tr>
<tr>
<td>div30</td>
<td>74.3362</td>
<td>0.1152</td>
<td>0.00856353</td>
<td>16</td>
<td>46</td>
<td>5.69</td>
</tr>
<tr>
<td>divbaud</td>
<td>307.2977</td>
<td>32</td>
<td>9.8335264</td>
<td>94</td>
<td>191</td>
<td>13.76</td>
</tr>
<tr>
<td>opctrlsyn</td>
<td>150.0007</td>
<td>32</td>
<td>4.8000224</td>
<td>16</td>
<td>92</td>
<td>2.05</td>
</tr>
<tr>
<td>RCrcvr</td>
<td>554.6075</td>
<td>32</td>
<td>17.74744</td>
<td>81</td>
<td>335</td>
<td>7.56</td>
</tr>
<tr>
<td>txall</td>
<td>869.5475</td>
<td>0.1152</td>
<td>0.100171872</td>
<td>251</td>
<td>590</td>
<td>11.02</td>
</tr>
<tr>
<td>txlightout</td>
<td>399.6852</td>
<td>4</td>
<td>1.5987408</td>
<td>99</td>
<td>256</td>
<td>8.84</td>
</tr>
<tr>
<td>txrowcol</td>
<td>381.9001</td>
<td>4</td>
<td>1.5276004</td>
<td>93</td>
<td>239</td>
<td>8.5</td>
</tr>
<tr>
<td>Total</td>
<td></td>
<td></td>
<td>205.0034192</td>
<td>2811</td>
<td>7217</td>
<td></td>
</tr>
</tbody>
</table>
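The per-block power figures in the summary table follow from multiplying each block's power coefficient (μW/MHz) by its operating frequency (MHz) and converting to mW. The sketch below repeats that arithmetic using the tabulated values; the dictionary and function names are chosen for illustration.

```python
# (power coefficient in uW/MHz, operating frequency in MHz) per block,
# copied from the summary table.
BLOCKS = {
    "clkdiv1": (72.9231, 32),
    "CTR5by5a": (5131.3544, 16),
    "CTRa2d": (1554.306, 32),
    "ctrlrcvr": (1029.2836, 32),
    "div2": (12.632, 32),
    "div8": (58.0278, 32),
    "div10": (60.6702, 0.1152),
    "div20": (79.6368, 0.1152),
    "div30": (74.3362, 0.1152),
    "divbaud": (307.2977, 32),
    "opctrlsyn": (150.0007, 32),
    "RCrcvr": (554.6075, 32),
    "txall": (869.5475, 0.1152),
    "txlightout": (399.6852, 4),
    "txrowcol": (381.9001, 4),
}


def power_mw(pow_uw_per_mhz, freq_mhz):
    """Dynamic power in mW from a uW/MHz coefficient and a frequency in MHz."""
    return pow_uw_per_mhz * freq_mhz / 1000.0


total_mw = sum(power_mw(p, f) for p, f in BLOCKS.values())
print(f"Total digital power: {total_mw:.4f} mW")
```

The sum agrees with the tabulated total of about 205 mW, most of which is contributed by the CTR5by5a and CTRa2d blocks.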