58 results for "Eric J. Kelmelis"
Search Results
2. Real-time high performance atmospheric distortion correction using a Xilinx UltraScale Plus
- Author
-
Eric J. Kelmelis, Nick Henning, Aaron Paolini, James Bonnett, Wayne Cranwell, and Steve Carl Jamieson Parker
- Subjects
business.industry ,Computer science ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Image processing ,Video processing ,MPSoC ,Lucky imaging ,business ,Adaptive optics ,Frame rate ,Field-programmable gate array ,Bispectrum ,Computer hardware
- Abstract
Long-range video surveillance is usually limited by the wavefront aberrations caused by atmospheric turbulence, rather than by the quality of the imaging optics or sensor. These aberrations can be mitigated optically by adaptive optics, or corrected post-detection by digital video processing. Video processing is preferred if the quality of the enhancement is acceptable, because the hardware is less expensive and has lower size, weight and power (SWaP). Several competing video processing solutions may be employed: speckle imaging with bispectrum processing, lucky imaging, geometric correction and blind deconvolution. Speckle imaging was originally developed for astronomy. It has subsequently been adapted for the more challenging problem of low-altitude, slant-path imaging, where the atmosphere is denser and more turbulent. This paper considers a bispectrum-based video processing solution, called ATCOM, which was originally implemented on an i7 CPU and accelerated using a GPU by EM Photonics Ltd. The design has since been adapted in a joint venture with RFEL Ltd to produce a low SWaP implementation based around Xilinx’s Zynq 7045 all-programmable system-on-a-chip (SoC). This system is called ATACAMA. Bispectrum processing is computationally expensive and, for both ATCOM and ATACAMA, a sub-region of the image must be processed to achieve operation at standard video frame rates. This paper considers how the design may be optimized to increase the size of this region, while maintaining high performance. Finally, use of Xilinx’s next-generation UltraScale+ multiprocessor SoC (MPSoC), which has an embedded Mali-400 GPU as well as an ARM CPU, is explored to further improve functionality.
- Published
- 2018
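The abstract of result 2 refers to bispectrum processing without defining it. As a rough illustration only, and not the ATCOM or ATACAMA implementation, the sketch below computes the bispectrum B(u, v) = F(u) F(v) conj(F(u + v)) of a short 1-D signal and checks the property that makes it attractive for turbulence work: it is unchanged when the signal is shifted, whereas the plain Fourier phases are not. The function names, the naive DFT, and the sample data are assumptions made for this example.

```python
import cmath


def dft(signal):
    """Naive O(n^2) discrete Fourier transform (fine for short demo signals)."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def bispectrum(signal):
    """Return B[u][v] = F(u) * F(v) * conj(F(u + v)), frequency indices modulo n."""
    spectrum = dft(signal)
    n = len(spectrum)
    return [
        [spectrum[u] * spectrum[v] * spectrum[(u + v) % n].conjugate() for v in range(n)]
        for u in range(n)
    ]


def circular_shift(signal, shift):
    """Shift a list cyclically by `shift` samples (a stand-in for image translation)."""
    shift %= len(signal)
    return signal[-shift:] + signal[:-shift]


def max_abs_difference(a, b):
    """Largest element-wise magnitude difference between two square matrices."""
    return max(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


# Hypothetical sample data: one "scan line" and a shifted copy of it.
line = [1.0, 4.0, 2.0, 0.5, 3.0, 0.0, 1.5, 2.5]
shifted = circular_shift(line, 3)

spectrum_change = abs(dft(line)[1] - dft(shifted)[1])
bispectrum_change = max_abs_difference(bispectrum(line), bispectrum(shifted))
print("change in F(1):", spectrum_change)
print("max change in bispectrum:", bispectrum_change)
```

A real pipeline would work on 2-D image tiles with an FFT and average the bispectrum over many frames; the 1-D version is only meant to make the definition concrete.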
3. Front Matter: Volume 10204
- Author
-
Eric J. Kelmelis
- Subjects
Optics ,Materials science ,Range (biology) ,business.industry ,business
- Published
- 2017
4. Enhancing data from commercial space flights (Conference Presentation)
- Author
-
Eric J. Kelmelis, Stephen Kozacik, Aaron Paolini, and Ariel Sherman
- Subjects
Rocket (weapon) ,Presentation ,business.product_category ,Aeronautics ,Rocket ,Computer science ,Range (aeronautics) ,media_common.quotation_subject ,ComputerApplications_COMPUTERSINOTHERSYSTEMS ,Space (commercial competition) ,business ,Remote sensing ,media_common
- Abstract
Video tracking of rocket launches inherently must be done from long range. Due to the high temperatures produced, cameras are often placed far from launch sites and their distance to the rocket increases as it is tracked through the flight. Consequently, the imagery collected is generally severely degraded by atmospheric turbulence. In this talk, we present our experience in enhancing commercial space flight videos. We will present the mission objectives, the unique challenges faced, and the solutions to overcome them.
- Published
- 2017
5. Photo-acoustic and video-acoustic methods for sensing distant sound sources
- Author
-
Eric J. Kelmelis, Dan Slater, and Stephen Kozacik
- Subjects
Electromagnetic field ,Pixel ,Dynamic range ,Computer science ,Turbulence ,business.industry ,Microphone ,Video processing ,Signal ,Photodiode ,law.invention ,Stereophonic sound ,Transducer ,Sampling (signal processing) ,law ,Distortion ,Demodulation ,Waveform ,Computer vision ,Artificial intelligence ,business
- Abstract
Long-range telescopic video imagery of distant terrestrial scenes, aircraft, rockets and other aerospace vehicles can be a powerful observational tool. But what about the associated acoustic activity? A new technology, Remote Acoustic Sensing (RAS), may provide a method to remotely listen to the acoustic activity near these distant objects. Local acoustic activity sometimes weakly modulates the ambient illumination in a way that can be remotely sensed. RAS is a new type of microphone that separates an acoustic transducer into two spatially separated components: 1) a naturally formed in situ acousto-optic modulator (AOM) located within the distant scene and 2) a remote sensing readout device that recovers the distant audio. These two elements are passively coupled over long distances at the speed of light by naturally occurring ambient light energy or other electromagnetic fields. Stereophonic, multichannel and acoustic beam forming are all possible using RAS techniques and, when combined with high-definition video imagery, it can help to provide a more cinema-like immersive viewing experience. A practical implementation of a remote acousto-optic readout device can be a challenging engineering problem. The acoustic influence on the optical signal is generally weak and often accompanied by a strong bias term. The optical signal is further degraded by atmospheric seeing turbulence. In this paper, we consider two fundamentally different optical readout approaches: 1) a low pixel count photodiode-based RAS photoreceiver and 2) audio extraction directly from a video stream. Most of our RAS experiments to date have used the first method for reasons of performance and simplicity. But there are potential advantages to extracting audio directly from a video stream. These advantages include the straightforward ability to work with multiple AOMs (useful for acoustic beam forming), simpler optical configurations, and a potential ability to use certain preexisting video recordings. However, doing so requires overcoming significant limitations, typically including much lower sample rates, reduced sensitivity and dynamic range, more expensive video hardware, and the need for sophisticated video processing. The ATCOM real-time image processing software environment provides many of the needed capabilities for researching video-acoustic signal extraction. ATCOM currently is a powerful tool for the visual enhancement of atmospheric-turbulence-distorted telescopic views. In order to explore the potential of acoustic signal recovery from video imagery, we modified ATCOM to extract audio waveforms from the same telescopic video sources. In this paper, we demonstrate and compare both readout techniques for several aerospace test scenarios to better show where each has advantages.
- Published
- 2017
6. Development of an embedded atmospheric turbulence mitigation engine
- Author
-
Aaron Paolini, Stephen Kozacik, Eric J. Kelmelis, and James Bonnett
- Subjects
Ubiquitous computing ,business.industry ,Computer science ,Atmospheric correction ,Graphics processing unit ,02 engineering and technology ,021001 nanoscience & nanotechnology ,Chip ,01 natural sciences ,010309 optics ,Software ,Gate array ,Software deployment ,Embedded system ,0103 physical sciences ,0210 nano-technology ,business ,Field-programmable gate array
- Abstract
Methods to reconstruct pictures from imagery degraded by atmospheric turbulence have been under development for decades. The techniques were initially developed for observing astronomical phenomena from the Earth’s surface, but have more recently been modified for ground and air surveillance scenarios. Such applications can impose significant constraints on deployment options because they both increase the computational complexity of the algorithms themselves and often dictate a requirement for low size, weight, and power (SWaP) form factors. Consequently, embedded implementations must be developed that can perform the necessary computations on low-SWaP platforms. Fortunately, there is an emerging class of embedded processors driven by the mobile and ubiquitous computing industries. We have leveraged these processors to develop embedded versions of the core atmospheric correction engine found in our ATCOM software. In this paper, we will present our experience adapting our algorithms for embedded systems on a chip (SoCs), namely the NVIDIA Tegra that couples general-purpose ARM cores with their graphics processing unit (GPU) technology and the Xilinx Zynq which pairs similar ARM cores with their field-programmable gate array (FPGA) fabric.
- Published
- 2017
7. Enhancement of DARPA SRVS data with a real-time commercial turbulence mitigation software
- Author
-
Aaron Paolini, Ariel Sherman, Eric J. Kelmelis, Richard L. Espinola, and Stephen Kozacik
- Subjects
Computer science ,business.industry ,Turbulence ,Digital imaging ,02 engineering and technology ,021001 nanoscience & nanotechnology ,01 natural sciences ,010309 optics ,Speckle pattern ,Software ,0103 physical sciences ,0210 nano-technology ,business ,Remote sensing
- Abstract
Modern digital imaging systems are susceptible to degraded imagery because of atmospheric turbulence. Notwithstanding significant improvements in resolution and speed, significant degradation of captured imagery still hampers system designers and operators. Several techniques exist for mitigating the effects of turbulence on captured imagery; we concentrate on the effects of the Bi-Spectrum Speckle Averaging [1], [2] approach to image enhancement on a data set captured in conjunction with meteorological data.
- Published
- 2017
8. Improving developer productivity with C++ embedded domain specific languages
- Author
-
Eric J. Kelmelis, James Bonnett, Aaron Paolini, Evenie Chao, and Stephen Kozacik
- Subjects
010302 applied physics ,Domain-specific language ,Vocabulary ,Programming language ,Computer science ,media_common.quotation_subject ,Interoperability ,computer.software_genre ,Supercomputer ,01 natural sciences ,Domain (software engineering) ,010309 optics ,Set (abstract data type) ,0103 physical sciences ,Preprocessor ,Compiler ,computer ,media_common
- Abstract
Domain-specific languages are a useful tool for productivity, allowing domain experts to program using familiar concepts and vocabulary while benefiting from performance choices made by computing experts. Embedding the domain-specific language into an existing language allows easy interoperability with non-domain-specific code and use of standard compilers and build systems. In C++, this is enabled through the template and preprocessor features. C++ embedded domain-specific languages (EDSLs) allow the user to write simple, safe, performant, domain-specific code that has access to all the low-level functionality that C and C++ offer as well as the diverse set of libraries available in the C/C++ ecosystem. In this paper, we will discuss several tools available for building EDSLs in C++ and show examples of projects successfully leveraging EDSLs. Modern C++ has added many useful new features to the language, which we have leveraged to further extend the capability of EDSLs. At EM Photonics, we have used EDSLs to allow developers to transparently benefit from using high-performance computing (HPC) hardware. We will show ways EDSLs combine with existing technologies and EM Photonics' high-performance tools and libraries to produce clean, short, high-performance code in ways that were not previously possible.
- Published
- 2017
9. Quantifying the improvement of turbulence mitigation technology
- Author
-
Aaron Paolini, Stephen Kozacik, James Bonnett, Ariel Sherman, and Eric J. Kelmelis
- Subjects
Scintillation ,Computer science ,business.industry ,Turbulence ,Atmospheric correction ,Strehl ratio ,Image processing ,02 engineering and technology ,01 natural sciences ,010309 optics ,Optical transfer function ,0103 physical sciences ,Metric (mathematics) ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer vision ,Artificial intelligence ,Image warping ,business ,Remote sensing
- Abstract
Atmospheric turbulence degrades imagery by imparting scintillation and warping effects that can reduce the ability to identify key features of the subjects. While a human can intuitively understand the improvement in visual information that turbulence mitigation techniques offer, this enhancement is rarely quantified in a meaningful way. In this paper, we discuss methods for measuring the potential improvement in system performance that video enhancement algorithms can provide. To accomplish this, we explore two metrics. We use resolution targets to determine the difference between imagery degraded by turbulence and that improved by atmospheric correction techniques. By comparing line scans between the data before and after processing, it is possible to quantify the additional information extracted. Advanced processing of this data can provide information about the effective modulation transfer function (MTF) of the system when atmospheric effects are considered and removed; using this data, we compute a second metric, the relative improvement in Strehl ratio.
- Published
- 2017
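The abstract of result 9 describes comparing line scans across a resolution target before and after processing. The paper's exact metric is not given in this listing, so the sketch below substitutes a simple, well-known stand-in: the Michelson contrast (max - min) / (max + min) of each scan, and the ratio of the two contrasts. The function names and pixel values are hypothetical.

```python
def michelson_contrast(line_scan):
    """Michelson contrast of a 1-D intensity profile across a bar target."""
    brightest = max(line_scan)
    darkest = min(line_scan)
    if brightest + darkest == 0:
        return 0.0  # an all-black scan carries no modulation
    return (brightest - darkest) / (brightest + darkest)


def relative_contrast_gain(before, after):
    """Ratio of processed to unprocessed contrast; values above 1 mean improvement."""
    baseline = michelson_contrast(before)
    if baseline == 0:
        raise ValueError("the unprocessed scan has no measurable modulation")
    return michelson_contrast(after) / baseline


# Hypothetical pixel values sampled across the same bars of a resolution target.
turbulent_scan = [90, 110, 92, 108, 91, 109]
corrected_scan = [40, 160, 42, 158, 41, 159]

print("contrast before:", michelson_contrast(turbulent_scan))
print("contrast after:", michelson_contrast(corrected_scan))
print("relative gain:", relative_contrast_gain(turbulent_scan, corrected_scan))
```

Repeating the measurement at several bar spacings would give contrast as a function of spatial frequency, which is the kind of data an MTF estimate is built from.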
10. Front Matter: Volume 9846
- Author
-
Eric J. Kelmelis
- Subjects
Optics ,Materials science ,Range (biology) ,business.industry ,business
- Published
- 2016
11. Comparison of turbulence mitigation algorithms
- Author
-
Aaron Paolini, Eric J. Kelmelis, Stephen Kozacik, and James Bonnett
- Subjects
Computer science ,Turbulence ,Image quality ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Image processing ,02 engineering and technology ,030218 nuclear medicine & medical imaging ,03 medical and health sciences ,0302 clinical medicine ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Image warping ,Image sensor ,Algorithm
- Abstract
When capturing image data over long distances (0.5 km and above), images are often degraded by atmospheric turbulence, especially when imaging paths are close to the ground or in hot environments. These issues manifest as time-varying scintillation and warping effects that decrease the effective resolution of the sensor and reduce actionable intelligence. In recent years, several image processing approaches to turbulence mitigation have shown promise. Each of these algorithms has different computational requirements, usability demands, and degrees of independence from camera sensors. They also produce different degrees of enhancement when applied to turbulent imagery. Additionally, some of these algorithms are applicable to real-time operational scenarios while others may only be suitable for post-processing workflows. EM Photonics has been developing image-processing-based turbulence mitigation technology since 2005 as a part of our ATCOM [1] image processing suite. In this paper we will compare techniques from the literature with our commercially available real-time GPU-accelerated turbulence mitigation software suite, as well as in-house research algorithms. These comparisons will be made using real, experimentally obtained data for a variety of different conditions, including varying optical hardware, imaging range, subjects, and turbulence conditions. Comparison metrics will include image quality, video latency, computational complexity, and potential for real-time operation.
- Published
- 2016
12. Enabling power-aware software in embedded systems
- Author
-
Eric J. Kelmelis, Adam Markey, Aaron Paolini, James Bonnett, Paul Fox, and Stephen Kozacik
- Subjects
Battery (electricity) ,Mobile processor ,business.industry ,Computer science ,Wearable computer ,Linux kernel ,02 engineering and technology ,021001 nanoscience & nanotechnology ,computer.software_genre ,01 natural sciences ,010309 optics ,Software ,Embedded system ,0103 physical sciences ,Compiler ,0210 nano-technology ,business ,Mobile device ,Electrical efficiency ,computer
- Abstract
The use of commodity mobile processors in wearable computing and field-deployed applications has risen as these processors have become increasingly powerful and inexpensive. Battery technology, however, has not advanced as quickly, and as the processing power of these systems has increased, so has their power consumption. In order to maximize endurance without compromising performance, fine-grained control of power consumption by these devices is highly desirable. Various methodologies exist to affect system-level bias with respect to the prioritization of performance or efficiency, but these are fragmented and global in effect, and so do not offer the breadth and granularity of control desired. This paper introduces a method of giving application programmers more control over system power consumption using a directive-based approach similar to existing APIs such as OpenMP. On supported platforms the compiler, application runtime, and Linux kernel will work together to translate the power-saving intent expressed in compiler directives into instructions to control the hardware, reducing power consumption when possible while still providing high performance when required.
- Published
- 2016
13. Many-core graph analytics using accelerated sparse linear algebra routines
- Author
-
Paul Fox, Eric J. Kelmelis, Stephen Kozacik, and Aaron Paolini
- Subjects
Power graph analysis ,Wait-for graph ,Theoretical computer science ,Graph database ,business.industry ,Computer science ,Computer programming ,02 engineering and technology ,computer.software_genre ,01 natural sciences ,Graph ,010309 optics ,Analytics ,0103 physical sciences ,Linear algebra ,0202 electrical engineering, electronic engineering, information engineering ,Programming paradigm ,Graph (abstract data type) ,020201 artificial intelligence & image processing ,business ,computer
- Abstract
Graph analytics is a key component in identifying emerging trends and threats in many real-world applications. Large-scale graph analytics frameworks provide a convenient and highly scalable platform for developing algorithms to analyze large datasets. Although conceptually scalable, these techniques exhibit poor performance on modern computational hardware. Another model of graph computation has emerged that promises improved performance and scalability by using abstract linear algebra operations as the basis for graph analysis, as laid out by the GraphBLAS standard. By using sparse linear algebra as the basis, existing highly efficient algorithms can be adapted to perform computations on the graph. This approach, however, is often less intuitive to graph analytics experts, who are accustomed to vertex-centric APIs such as Giraph, GraphX, and Tinkerpop. We are developing an implementation of the high-level operations supported by these APIs in terms of linear algebra operations. This implementation is backed by many-core implementations of the fundamental GraphBLAS operations required, and offers the advantages of both the intuitive programming model of a vertex-centric API and the performance of a sparse linear algebra implementation. This technology can reduce the number of nodes required, as well as the run-time for a graph analysis problem, enabling customers to perform more complex analysis with less hardware at lower cost. All of this can be accomplished without the requirement for the customer to make any changes to their analytics code, thanks to the compatibility with existing graph APIs.
- Published
- 2016
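The abstract of result 13 contrasts vertex-centric graph APIs with graph analysis expressed as sparse linear algebra. To make that idea concrete, the sketch below writes breadth-first search as repeated sparse matrix-vector products over a Boolean semiring, with already-visited vertices masked out after each product. It does not use the GraphBLAS API or the authors' library; the dict-of-sets matrix layout, the function names, and the sample graph are assumptions made for this illustration.

```python
def boolean_matvec(adjacency, frontier):
    """One Boolean-semiring product: every vertex one hop away from `frontier`.

    `adjacency` maps a source vertex to the set of its out-neighbours, which
    plays the role of the nonzero columns in one row of a sparse matrix.
    """
    reached = set()
    for vertex in frontier:
        reached |= adjacency.get(vertex, set())
    return reached


def bfs_levels(adjacency, source):
    """Return the hop count from `source` to every reachable vertex."""
    levels = {source: 0}
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        reached = boolean_matvec(adjacency, frontier)
        # Mask step: keep only vertices that have not been assigned a level.
        frontier = {vertex for vertex in reached if vertex not in levels}
        for vertex in frontier:
            levels[vertex] = depth
    return levels


# Hypothetical directed graph; "f" cannot be reached from "a".
graph = {
    "a": {"b", "c"},
    "b": {"d"},
    "c": {"d"},
    "d": {"e"},
    "e": set(),
    "f": {"a"},
}
print(bfs_levels(graph, "a"))
```

Because each BFS step is a single bulk operation on the whole frontier, an optimized sparse kernel could replace `boolean_matvec` without changing the algorithm, which is the appeal of the linear algebra formulation.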
14. Front Matter: Volume 9478
- Author
-
Eric J. Kelmelis
- Subjects
Volume (thermodynamics) ,Mechanics ,Geology ,Front (military)
- Published
- 2015
15. Adaptive OpenCL libraries for platform portability
- Author
-
Paul Fox, Allyssa L. Batten, Eric J. Kelmelis, and Marcus Hayes
- Subjects
Software portability ,Computer architecture ,business.industry ,Computer science ,Computer data storage ,Computer programming ,Memory organisation ,business ,Field-programmable gate array ,Execution model ,Massively parallel
- Abstract
The OpenCL API provides an abstract mechanism for massively parallel programming on a very wide range of hardware, including traditional CPUs, GPUs, accelerator devices, FPGAs, and more. However, these different hardware architectures and platforms function quite differently. Therefore, coding OpenCL applications that are usefully portable is challenging. Certain considerations are therefore required in developing an effectively portable OpenCL library to enable parallel application development without requiring fully separate code paths for each target platform. By making use of device detection and characterization provided by the OpenCL API, valuable information can be obtained to make runtime decisions for optimization. In particular, the effects of memory affinity change depending on the memory organization of the device architecture. Work partitioning and assignment depend on the device execution model, in particular the types of parallel execution supported and available synchronization primitives. These considerations, in turn, affect the selection and invocation of kernel code. For certain devices, platform-specific libraries are available, while others can benefit from generated kernel code based on the specified device parameters. By parameterizing an algorithm based on how these considerations affect performance, a combination of device parameters can be used to produce an execution strategy that will provide improved performance for that device or collection of devices.
- Published
- 2015
16. Real-time image processing for passive mmW imagery
- Author
-
Aaron Paolini, Daniel G. Mackrides, James Bonnett, Thomas E. Dillon, Eric J. Kelmelis, Charles Harrity, Dennis W. Prather, Christopher A. Schuetz, Richard D. Martin, and Stephen Kozacik
- Subjects
Diffraction ,Computer science ,business.industry ,Aperture ,Noise reduction ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Image processing ,Signal-to-noise ratio ,Transmission (telecommunications) ,Digital image processing ,Computer vision ,Artificial intelligence ,Noise (video) ,business
- Abstract
The transmission characteristics of millimeter waves (mmWs) make them suitable for many applications in defense and security, from airport preflight scanning to penetrating degraded visual environments such as brownout or heavy fog. While the cold sky provides sufficient illumination for these images to be taken passively in outdoor scenarios, this utility comes at a cost; the diffraction limit of the longer wavelengths involved leads to lower resolution imagery compared to the visible or IR regimes, and the low power levels inherent to passive imagery allow the data to be more easily degraded by noise. Recent techniques leveraging optical upconversion have shown significant promise, but are still subject to fundamental limits in resolution and signal-to-noise ratio. To address these issues we have applied techniques developed for visible and IR imagery to decrease noise and increase resolution in mmW imagery. We have developed these techniques into fieldable software, making use of GPU platforms for real-time operation of computationally complex image processing algorithms. We present data from a passive, 77 GHz, distributed aperture, video-rate imaging platform captured during field tests at full video rate. These videos demonstrate the increase in situational awareness that can be gained through applying computational techniques in real-time without needing changes in detection hardware.
- Published
- 2015
17. Real-time technology for enhancing long-range imagery
- Author
-
Eric J. Kelmelis, Paul Fox, Aaron Paolini, James Bonnett, and Stephen Kozacik
- Subjects
Atmosphere (unit) ,business.industry ,Computer science ,media_common.quotation_subject ,Real-time computing ,Image processing ,Video processing ,Range (mathematics) ,Computer vision ,Quality (business) ,Artificial intelligence ,Image warping ,business ,Focus (optics) ,media_common
- Abstract
Many ISR applications require constant monitoring of targets from long distance. When capturing over long distances, imagery is often degraded by atmospheric turbulence. This adds a time-variant blurring effect to captured data, and can result in a significant loss of information. To recover it, image processing techniques have been developed to enhance sequences of short-exposure images or videos in order to remove frame-specific scintillation and warping. While some of these techniques have been shown to be quite effective, the associated computational complexity and required processing power limit the application of these techniques to post-event analysis. To meet the needs of real-time ISR applications, video enhancement must be done in real time in order to provide actionable intelligence as the scene unfolds. In this paper, we will provide an overview of an algorithm capable of providing the enhancement desired and focus on its real-time implementation. We will discuss the role that GPUs play in enabling real-time performance. This technology can be used to add performance to ISR applications by improving the quality of long-range imagery as it is collected and effectively extending sensor range.
- Published
- 2015
18. Automatic parameter estimation for atmospheric turbulence mitigation techniques
- Author
-
Aaron Paolini, Stephen Kozacik, and Eric J. Kelmelis
- Subjects
Range (mathematics) ,Computer science ,Estimation theory ,Turbulence ,Real-time computing ,Image processing ,Simulation
- Abstract
Several image processing techniques for turbulence mitigation have been shown to be effective under a wide range of long-range capture conditions; however, complex, dynamic scenes have often required manual interaction with the algorithm’s underlying parameters to achieve optimal results. While this level of interaction is sustainable in some workflows, in-field determination of ideal processing parameters greatly diminishes usefulness for many operators. Additionally, some use cases, such as those that rely on unmanned collection, lack human-in-the-loop usage. To address this shortcoming, we have extended a well-known turbulence mitigation algorithm based on bispectral averaging with a number of techniques to greatly reduce (and often eliminate) the need for operator interaction. Automations were made in the areas of turbulence strength estimation (Fried’s parameter), as well as the determination of optimal local averaging windows to balance turbulence mitigation and the preservation of dynamic scene content (non-turbulent motions). These modifications deliver a level of enhancement quality that approaches that of manual interaction, without the need for operator interaction. As a consequence, the range of operational scenarios where this technology is of benefit has been significantly expanded.
- Published
- 2015
19. Comparison of turbulence mitigation algorithms
- Author
-
Ariel Sherman, Stephen Kozacik, Aaron Paolini, James Bonnett, and Eric J. Kelmelis
- Subjects
Image fusion ,020205 medical informatics ,Turbulence ,Computer science ,Image quality ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,General Engineering ,Image processing ,02 engineering and technology ,Real image ,01 natural sciences ,Atomic and Molecular Physics, and Optics ,010309 optics ,Speckle pattern ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Deconvolution ,Image warping ,Image sensor ,Algorithm ,ComputingMethodologies_COMPUTERGRAPHICS
- Abstract
When capturing imagery over long distances, atmospheric turbulence often degrades the data, especially when observation paths are close to the ground or in hot environments. These issues manifest as time-varying scintillation and warping effects that decrease the effective resolution of the sensor and reduce actionable intelligence. In recent years, several image processing approaches to turbulence mitigation have shown promise. Each of these algorithms has different computational requirements, usability demands, and degrees of independence from camera sensors. They also produce different degrees of enhancement when applied to turbulent imagery. Additionally, some of these algorithms are applicable to real-time operational scenarios while others may only be suitable for postprocessing workflows. EM Photonics has been developing image-processing-based turbulence mitigation technology since 2005. We will compare techniques from the literature with our commercially available, real-time, GPU-accelerated turbulence mitigation software. These comparisons will be made using real (not synthetic), experimentally obtained data for a variety of conditions, including varying optical hardware, imaging range, subjects, and turbulence conditions. Comparison metrics will include image quality, video latency, computational complexity, and potential for real-time operation. Additionally, we will present a technique for quantitatively comparing turbulence mitigation algorithms using real images of radial resolution targets.
- Published
- 2017
20. Practical considerations for real-time turbulence mitigation in long-range imagery
- Author
-
Aaron Paolini, Stephen Kozacik, and Eric J. Kelmelis
- Subjects
Signal processing ,Computer science ,Image quality ,Turbulence ,business.industry ,Real-time computing ,General Engineering ,Image processing ,02 engineering and technology ,Video processing ,021001 nanoscience & nanotechnology ,01 natural sciences ,Atomic and Molecular Physics, and Optics ,010309 optics ,Range (mathematics) ,0103 physical sciences ,Computer vision ,Artificial intelligence ,Image warping ,0210 nano-technology ,business
- Abstract
Atmospheric turbulence degrades imagery by imparting scintillation and warping effects that blur the collected pictures and reduce the effective level of detail. While this reduction in image quality can occur in a wide range of scenarios, it is particularly noticeable when capturing over long distances, when close to the ground, or in hot and humid environments. For decades, researchers have attempted to correct these problems through device and signal processing solutions. While fully digital approaches have the advantage of not requiring specialized hardware, they have been difficult to realize in real-time scenarios due to a variety of practical considerations, including computational performance, the need to integrate with cameras, and the ability to handle complex scenes. We address these challenges and our experience overcoming them. We enumerate the considerations for developing an image processing approach to atmospheric turbulence correction and describe how we approached them to develop software capable of real-time enhancement of long-range imagery.
- Published
- 2017
21. Optimization techniques for OpenCL-based linear algebra routines
- Author
-
John R. Humphrey, Stephen Kozacik, Paul Fox, Aryeh Kuller, Dennis W. Prather, and Eric J. Kelmelis
- Subjects
Set (abstract data type) ,Kernel (linear algebra) ,Computer science ,business.industry ,Linear algebra ,Computer programming ,Multiplication ,Parallel computing ,General-purpose computing on graphics processing units ,business ,Parametrization ,Matrix multiplication ,Block (data storage)
- Abstract
The OpenCL standard for general-purpose parallel programming allows a developer to target highly parallel computations towards graphics processing units (GPUs), CPUs, co-processing devices, and field-programmable gate arrays (FPGAs). The computationally intense domains of linear algebra and image processing have shown significant speedups when implemented in the OpenCL environment. A major benefit of OpenCL is that a routine written for one device can be run across many different devices and architectures; however, a kernel optimized for one device may not exhibit high performance when executed on a different device. For this reason, kernels must typically be hand-optimized for every target device family. Due to the large number of parameters that can affect performance, hand tuning for every possible device is impractical and often produces suboptimal results. For this work, we focused on optimizing the general matrix multiplication routine. General matrix multiplication is used as a building block for many linear algebra routines and often comprises a large portion of the run-time. Prior work has shown this routine to be a good candidate for high-performance implementation in OpenCL. We selected several candidate algorithms from the literature that are suitable for parameterization. We then developed parameterized kernels implementing these algorithms using only portable OpenCL features. Our implementation queries device information supplied by the OpenCL runtime and utilizes this as well as user input to generate a search space that satisfies device and algorithmic constraints. Preliminary results from our work confirm that optimizations are not portable from one device to the next, and show the benefits of automatic tuning. Using a standard set of tuning parameters seen in the literature for the NVIDIA Fermi architecture achieves a performance of 1.6 TFLOPS on an AMD 7970 device, while automatic tuning achieves a peak of 2.7 TFLOPS.
- Published
- 2014
- Full Text
- View/download PDF
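As a rough illustration of the constrained search-space generation mentioned in the abstract above, the following minimal sketch enumerates candidate matrix-multiplication tuning configurations and discards those that violate device or algorithmic limits. The parameter names (tile_m, tile_n, tile_k, wg_x, wg_y), the candidate values, and the three constraints are hypothetical and chosen only for illustration; the abstract does not list the actual parameters. In a real OpenCL host program the two limits would typically come from clGetDeviceInfo with CL_DEVICE_MAX_WORK_GROUP_SIZE and CL_DEVICE_LOCAL_MEM_SIZE; here they are passed in as plain integers.

```python
from itertools import product


def gemm_search_space(max_work_group_size, local_mem_bytes, elem_bytes=4,
                      tile_sizes=(16, 32, 64, 128), k_sizes=(8, 16, 32),
                      wg_dims=(4, 8, 16, 32)):
    """Enumerate hypothetical GEMM tuning configurations that fit a device.

    All parameter names and candidate values are illustrative assumptions.
    """
    configs = []
    for tile_m, tile_n, tile_k, wg_x, wg_y in product(
            tile_sizes, tile_sizes, k_sizes, wg_dims, wg_dims):
        # Device constraint: the work-group must not exceed the device limit.
        if wg_x * wg_y > max_work_group_size:
            continue
        # Algorithmic constraint: each tile must divide evenly among work-items.
        if tile_m % wg_y != 0 or tile_n % wg_x != 0:
            continue
        # Device constraint: the A and B tiles staged in local memory must fit.
        local_bytes = (tile_m * tile_k + tile_k * tile_n) * elem_bytes
        if local_bytes > local_mem_bytes:
            continue
        configs.append({"tile_m": tile_m, "tile_n": tile_n, "tile_k": tile_k,
                        "wg_x": wg_x, "wg_y": wg_y})
    return configs
```

An auto-tuner along these lines would then compile and time a kernel for each surviving configuration and keep the fastest one for that device.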
22. Targeting multiple heterogeneous hardware platforms with OpenCL
- Author
-
Aaron Paolini, John R. Humphrey, Stephen Kozacik, Paul Fox, Aryeh Kuller, and Eric J. Kelmelis
- Subjects
Hardware architecture ,business.industry ,Computer science ,Subroutine ,Symmetric multiprocessor system ,computer.software_genre ,Software portability ,Just-in-time compilation ,Computer architecture ,Modular programming ,Preprocessor ,Compiler ,business ,computer ,Implementation ,Computer hardware - Abstract
The OpenCL API allows for the abstract expression of parallel, heterogeneous computing, but hardware implementations have substantial implementation differences. The abstractions provided by the OpenCL API are often insufficiently high-level to conceal differences in hardware architecture. Additionally, implementations often do not take advantage of potential performance gains from certain features due to hardware limitations and other factors. These factors make it challenging to produce code that is portable in practice, resulting in much OpenCL code being duplicated for each hardware platform being targeted. This duplication of effort offsets the principal advantage of OpenCL: portability. The use of certain coding practices can mitigate this problem, allowing a common code base to be adapted to perform well across a wide range of hardware platforms. To this end, we explore some general practices for producing performant code that are effective across platforms. Additionally, we explore some ways of modularizing code to enable optional optimizations that take advantage of hardware-specific characteristics. The minimum requirement for portability implies avoiding the use of OpenCL features that are optional, not widely implemented, poorly implemented, or missing in major implementations. Exposing multiple levels of parallelism allows hardware to take advantage of the types of parallelism it supports, from the task level down to explicit vector operations. Static optimizations and branch elimination in device code help the platform compiler to effectively optimize programs. Modularization of some code is important to allow operations to be chosen for performance on target hardware. Optional subroutines exploiting explicit memory locality allow for different memory hierarchies to be exploited for maximum performance. 
The C preprocessor and JIT compilation using the OpenCL runtime can be used to enable some of these techniques, as well as to factor in hardware-specific optimizations as necessary.
- Published
- 2014
- Full Text
- View/download PDF
23. Mean square error performance evaluation of a commercial speckle imaging system using simulated imagery
- Author
-
Jeremy P. Bos, Michael C. Roggemann, Eric J. Kelmelis, and Aaron Paolini
- Subjects
Diffraction ,Mean squared error ,Turbulence ,Computer science ,business.industry ,Volume (computing) ,Image processing ,Computer Science::Computer Vision and Pattern Recognition ,Range (statistics) ,Computer vision ,Speckle imaging ,Artificial intelligence ,business ,Bispectrum ,Remote sensing - Abstract
We examine the performance of a commercially available speckle imaging system in reconstructing static scenes from imagery corrupted by anisoplanatic distortions commonly observed when imaging over long horizontal paths near the ground. Performance is evaluated using the Mean Squared Error between system outputs and a diffraction-limited reference image. Input image frames are taken from a large library of simulated imagery of a static object observed over a 1 km horizontal path through volume turbulence in 3 turbulence conditions. 1000 image frames are available for each condition allowing for a statistically significant characterization of system performance over a range of turbulence conditions.
- Published
- 2014
- Full Text
- View/download PDF
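The evaluation metric named in the abstract above, the Mean Squared Error between a system output and a reference image, can be sketched as follows. This is a minimal sketch that assumes both images are equally sized 2D lists of grayscale intensities; the function name is hypothetical, and any normalization or registration the authors may have applied is not described in the abstract and is not modeled here.

```python
def mean_squared_error(output, reference):
    """Return the mean of squared pixel differences between two images.

    Both images are assumed to be 2D lists (rows of numbers) of equal shape.
    """
    if len(output) != len(reference):
        raise ValueError("images must have the same number of rows")
    total = 0.0
    count = 0
    for row_out, row_ref in zip(output, reference):
        if len(row_out) != len(row_ref):
            raise ValueError("images must have the same number of columns")
        for pixel_out, pixel_ref in zip(row_out, row_ref):
            total += (pixel_out - pixel_ref) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    return total / count
```

A lower value indicates that the reconstructed frame is closer to the reference; identical images give exactly zero.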
24. Multi-frame image processing with panning cameras and moving subjects
- Author
-
Eric J. Kelmelis, Aaron Paolini, Petersen F. Curt, and John R. Humphrey
- Subjects
Computer science ,business.industry ,Digital image processing ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Image processing ,Computer vision ,Artificial intelligence ,Speckle imaging ,Panning (camera) ,business ,Multi frame - Abstract
Imaging scenarios commonly involve erratic, unpredictable camera behavior or subjects that are prone to movement, complicating multi-frame image processing techniques. To address these issues, we developed three techniques that can be applied to multi-frame image processing algorithms in order to mitigate the adverse effects observed when cameras are panning or subjects within the scene are moving. We provide a detailed overview of the techniques and discuss the applicability of each to various movement types. In addition to this, we evaluated algorithm efficacy with demonstrated benefits using field test video, which has been processed using our commercially available surveillance product. Our results show that algorithm efficacy is significantly improved in common scenarios, expanding our software’s operational scope. Our methods introduce little computational burden, enabling their use in real-time and low-power solutions, and are appropriate for long observation periods. Our test cases focus on imaging through turbulence, a common use case for multi-frame techniques. We present results of a field study designed to test the efficacy of these techniques under expanded use cases.
- Published
- 2014
- Full Text
- View/download PDF
25. Using ATCOM to enhance long-range imagery collected by NASA’s flight test tracking cameras at Armstrong Flight Research Center
- Author
-
David Tow, Aaron Paolini, and Eric J. Kelmelis
- Subjects
Computer science ,business.industry ,Turbulence ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,ComputerApplications_COMPUTERSINOTHERSYSTEMS ,Image processing ,Tracking (particle physics) ,Flight test ,Rocket launch ,Software ,Range (aeronautics) ,business ,Research center ,Remote sensing - Abstract
Located at Edwards Air Force Base, Armstrong Flight Research Center (AFRC) is NASA’s premier site for aeronautical research and operates some of the most advanced aircraft in the world. As such, flight tests for advanced manned and unmanned aircraft are regularly performed there. All such tests are tracked through advanced electro-optic imaging systems to monitor the flight status in real-time and to archive the data for later analysis. This necessitates the collection of imagery from long-range camera systems of fast moving targets from a significant distance away. Such imagery is severely degraded due to the atmospheric turbulence between the camera and the object of interest. The result is imagery that becomes blurred and suffers a substantial reduction in contrast, causing significant detail in the video to be lost. In this paper, we discuss the image processing techniques located in the ATCOM software, which uses a multi-frame method to compensate for the distortions caused by the turbulence.
- Published
- 2014
- Full Text
- View/download PDF
26. Front Matter: Volume 8752
- Author
-
Eric J. Kelmelis
- Subjects
Volume (thermodynamics) ,Mechanics ,Geology ,Front (military) - Published
- 2013
- Full Text
- View/download PDF
27. Advances in computational fluid dynamics solvers for modern computing environments
- Author
-
Aaron Paolini, John R. Humphrey, Eric J. Kelmelis, and Daniel Hertenstein
- Subjects
Multi-core processor ,business.industry ,Computer science ,Parallel computing ,Computational fluid dynamics ,Solver ,Supercomputer ,Computational science ,Software ,Scalability ,Software architecture ,business ,Multicore architecture ,Xeon Phi - Abstract
EM Photonics has been investigating the application of massively multicore processors to a key problem area: Computational Fluid Dynamics (CFD). While the capabilities of CFD solvers have continually increased and improved to support features such as moving bodies and adjoint-based mesh adaptation, the software architecture has often lagged behind. This has led to poor scaling as core counts reach the tens of thousands. In the modern High Performance Computing (HPC) world, clusters with hundreds of thousands of cores are becoming the standard. In addition, accelerator devices such as NVIDIA GPUs and Intel Xeon Phi are being installed in many new systems. It is important for CFD solvers to take advantage of the new hardware as the computations involved are well suited for the massively multicore architecture. In our work, we demonstrate by example that new features in NVIDIA GPUs can empower existing CFD solvers, using AVUS, a CFD solver developed by the Air Force Research Laboratory (AFRL) and the Volcanic Ash Advisory Center (VAAC). The effort has resulted in increased performance and scalability without sacrificing accuracy. There are many well-known codes in the CFD space that can benefit from this work, such as FUN3D, OVERFLOW, and TetrUSS. Such codes are widely used in the commercial, government, and defense sectors.
- Published
- 2013
- Full Text
- View/download PDF
28. Frontmatter: Volume 8403
- Author
-
Eric J. Kelmelis
- Subjects
Volume (thermodynamics) ,Mechanics ,Geology - Published
- 2012
- Full Text
- View/download PDF
29. Accelerating CULA Linear Algebra Routines with Hybrid GPU and Multicore Computing
- Author
-
Daniel K. Price, Eric J. Kelmelis, John R. Humphrey, and Kyle E. Spagnoli
- Subjects
Fortran ,Computer science ,Graphics processing unit ,Parallel computing ,System of linear equations ,LU decomposition ,Computational science ,law.invention ,law ,Interfacing ,Linear algebra ,Computer Science::Mathematical Software ,Central processing unit ,MATLAB ,computer ,computer.programming_language - Abstract
Publisher Summary The LU decomposition is a popular linear algebra technique with applications such as the solution of systems of linear equations and calculation of matrix inverses and determinants. Central processing unit (CPU) versions of this routine exhibit very high performance, making the port to a graphics processing unit (GPU) a challenging prospect. This chapter discusses the implementation of LU decomposition in the CULA library for linear algebra on the GPU, describing the steps necessary for achieving significant speed-ups over the CPU. Specialized techniques are employed by CULA to obtain significant speed-ups over existing packages. CULA features a wide variety of linear algebra functions, including least squares solvers (constrained and unconstrained), system solvers (general and symmetric positive definite), eigenproblem solvers (general and symmetric), singular value decompositions, and many useful factorizations (QR, Hessenberg). The chapter also presents a number of methods for interfacing with CULA. The two major interfaces are host and device, and they accept data via host memory and device memory, respectively. The host interface is the more convenient of the two, whereas the device interface is more manual but can avoid data transfer times. Additionally, there are facilities for interfacing with MATLAB and the Fortran language.
- Published
- 2012
- Full Text
- View/download PDF
30. Accelerating sparse linear algebra using graphics processing units
- Author
-
Eric J. Kelmelis, John R. Humphrey, Kyle E. Spagnoli, and Daniel K. Price
- Subjects
Numerical linear algebra ,Computer science ,Graphics processing unit ,Parallel computing ,computer.software_genre ,Finite element method ,Computational science ,CUDA ,Linear algebra ,Computer Science::Mathematical Software ,Central processing unit ,General-purpose computing on graphics processing units ,Graphics ,computer ,Execution model - Abstract
The modern graphics processing unit (GPU) found in many standard personal computers is a highly parallel math processor capable of over 1 TFLOPS of peak computational throughput at a cost similar to a high-end CPU with an excellent FLOPS-to-watt ratio. High-level sparse linear algebra operations are computationally intense, often requiring large amounts of parallel operations, and would seem a natural fit for the processing power of the GPU. Our work is on a GPU accelerated implementation of sparse linear algebra routines. We present results from both direct and iterative sparse system solvers. The GPU execution model featured by NVIDIA GPUs based on CUDA demands very strong parallelism, requiring between hundreds and thousands of simultaneous operations to achieve high performance. Some constructs from linear algebra map extremely well to the GPU and others map poorly. CPUs, on the other hand, do well at smaller order parallelism and perform acceptably during low-parallelism code segments. Our work addresses this via a hybrid processing model, in which the CPU and GPU work simultaneously to produce results. In many cases, this is accomplished by allowing each platform to do the work it performs most naturally. For example, the CPU is responsible for the graph theory portion of the direct solvers while the GPU simultaneously performs the low level linear algebra routines.
- Published
- 2011
- Full Text
- View/download PDF
31. Front Matter: Volume 8060
- Author
-
Eric J. Kelmelis
- Subjects
Volume (thermodynamics) ,Mechanics ,Geology ,Front (military) - Published
- 2011
- Full Text
- View/download PDF
32. CULA: hybrid GPU accelerated linear algebra routines
- Author
-
John R. Humphrey, Daniel K. Price, Aaron Paolini, Kyle E. Spagnoli, and Eric J. Kelmelis
- Subjects
CUDA ,law ,Computer science ,Linear algebra ,Singular value decomposition ,Computer Science::Mathematical Software ,Graphics processing unit ,Parallel computing ,Central processing unit ,FLOPS ,LU decomposition ,law.invention ,QR decomposition - Abstract
The modern graphics processing unit (GPU) found in many standard personal computers is a highly parallel math processor capable of nearly 1 TFLOPS peak throughput at a cost similar to a high-end CPU and an excellent FLOPS/watt ratio. High-level linear algebra operations are computationally intense, often requiring O(N^3) operations, and would seem a natural fit for the processing power of the GPU. Our work is on CULA, a GPU accelerated implementation of linear algebra routines. We present results from factorizations such as LU decomposition, singular value decomposition and QR decomposition along with applications like system solution and least squares. The GPU execution model featured by NVIDIA GPUs based on CUDA demands very strong parallelism, requiring between hundreds and thousands of simultaneous operations to achieve high performance. Some constructs from linear algebra map extremely well to the GPU and others map poorly. CPUs, on the other hand, do well at smaller order parallelism and perform acceptably during low-parallelism code segments. Our work addresses this via a hybrid processing model, in which the CPU and GPU work simultaneously to produce results. In many cases, this is accomplished by allowing each platform to do the work it performs most naturally.
- Published
- 2010
- Full Text
- View/download PDF
33. Comparing FPGAs and GPUs for high-performance image processing applications
- Author
-
Michael R. Bodnar, Daniel K. Price, Petersen F. Curt, Eric J. Kelmelis, Fernando E. Ortiz, Kyle E. Spagnoli, and Aaron Paolini
- Subjects
Flexibility (engineering) ,Workstation ,business.industry ,Computer science ,Image quality ,Image processing ,law.invention ,Microprocessor ,Parallel processing (DSP implementation) ,law ,Embedded system ,Computer data storage ,business ,Field-programmable gate array - Abstract
Modern image enhancement techniques have been shown to be effective in improving the quality of imagery. However, the computational requirements of applying such algorithms to streams of video in real-time often cannot be satisfied by standard microprocessor-based systems. While a scaled solution involving clusters of microprocessors may provide the necessary arithmetic capacity, deployment is limited to data-center scenarios. What is needed is a way to perform these techniques in real time on embedded platforms. A new paradigm of computing utilizing special-purpose commodity hardware including Field-Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs) has recently emerged as an alternative to parallel computing using clusters of traditional CPUs. Recent research has shown that for many applications, such as image processing techniques requiring intense computations and large memory spaces, these hardware platforms significantly outperform microprocessors. Furthermore, while microprocessor technology has begun to stagnate, GPUs and FPGAs have continued to improve exponentially. FPGAs, flexible and powerful, are best targeted at embedded, low-power systems and specific applications. GPUs, cheap and readily available, are accessible to most users through their standard desktop machines. Additionally, as fabrication scale continues to shrink, heat and power consumption issues typically limiting GPU deployment to high-end desktop workstations are becoming less of a factor. The ability to include these devices in embedded environments opens up entire new application domains. In this paper, we investigate two state-of-the-art image processing techniques, super-resolution and the average-bispectrum speckle method, and compare FPGA and GPU implementations in terms of performance, development effort, cost, deployment options, and platform flexibility.
- Published
- 2010
- Full Text
- View/download PDF
34. Front Matter: Volume 7705
- Author
-
Eric J. Kelmelis
- Subjects
Volume (thermodynamics) ,Mechanics ,Geology ,Front (military) - Published
- 2010
- Full Text
- View/download PDF
35. Organically enabled silicon-based photonic/RF-photonic applications
- Author
-
Matthew Zablocki, Peng Yao, Dennis W. Prather, Ozgenc Ebil, Ahmed S. Sharkawy, Christopher A. Schuctz, Eric J. Kelmelis, and Shouyuan Shi
- Subjects
Fabrication ,Materials science ,Silicon ,business.industry ,Physics::Optics ,chemistry.chemical_element ,Amorphous solid ,chemistry ,Polymer chemistry ,Optoelectronics ,Photonics ,Hybrid material ,business ,Ultrashort pulse ,Realization (systems) ,Photonic crystal - Abstract
In this paper, we present novel designs for the realization of organic-inorganic hybrid material systems and develop concepts and designs for silicon-organic hybrid ultrafast RF photonic devices. The designs presented combine crystalline electro-optic materials, conventional crystalline materials, and amorphous polymers. Numerical simulation results as well as fabrication results are also included.
- Published
- 2010
- Full Text
- View/download PDF
36. An embedded processor for real-time atmospheric compensation
- Author
-
Petersen F. Curt, Eric J. Kelmelis, Fernando E. Ortiz, Carmen J. Carrano, and Michael R. Bodnar
- Subjects
Speckle pattern ,business.industry ,Image quality ,Computer science ,Interface (computing) ,Image processing ,Computer vision ,Artificial intelligence ,business ,Bispectrum ,Computer hardware ,Compensation (engineering) - Abstract
Imaging over long distances is crucial to a number of defense and security applications, such as homeland security and launch tracking. However, the image quality obtained from current long-range optical systems can be severely degraded by the turbulent atmosphere in the path between the region under observation and the imager. While this obscured image information can be recovered using post-processing techniques, the computational complexity of such approaches has prohibited deployment in real-time scenarios. To overcome this limitation, we have coupled a state-of-the-art atmospheric compensation algorithm, the average-bispectrum speckle method, with a powerful FPGA-based embedded processing board. The end result is a light-weight, lower-power image processing system that improves the quality of long-range imagery in real-time, and uses modular video I/O to provide a flexible interface to most common digital and analog video transport methods. By leveraging the custom, reconfigurable nature of the FPGA, a 20x speed increase over a modern desktop PC was achieved in a form-factor that is compact, low-power, and field-deployable.
- Published
- 2009
- Full Text
- View/download PDF
37. A GPU-accelerated toolbox for the solutions of systems of linear equations
- Author
-
John R. Humphrey, Aaron Paolini, Daniel K. Price, and Eric J. Kelmelis
- Subjects
Computer science ,law ,Graphics processing unit ,Parallel computing ,Solver ,System of linear equations ,Supercomputer ,Generalized minimal residual method ,Linear equation ,LU decomposition ,law.invention - Abstract
The modern graphics processing unit (GPU) found in many off-the-shelf personal computers is a very high performance computing engine that often goes unutilized. The tremendous computing power coupled with reasonable pricing has made the GPU a topic of interest in recent research. An application for such power would be the solution of large systems of linear equations. Two popular solution domains are direct solution, via the LU decomposition, and iterative solution, via a solver such as the Generalized Minimal Residual method (GMRES). Our research focuses on the acceleration of such processes, utilizing the latest in GPU technologies. We show performance that exceeds that of a standard computer by an order of magnitude, thus significantly reducing the run time of the numerous applications that depend on the solution of a set of linear equations.
- Published
- 2009
- Full Text
- View/download PDF
38. Biologically inspired collision avoidance system for unmanned vehicles
- Author
-
Eric J. Kelmelis, Kyle E. Spagnoli, Brett J. Graham, and Fernando E. Ortiz
- Subjects
business.industry ,Computer science ,Controller (computing) ,Central nervous system ,Robotics ,Optic tectum ,Cerebro ,Object detection ,Midbrain ,medicine.anatomical_structure ,Computer architecture ,Embedded system ,medicine ,Collision avoidance system ,Artificial intelligence ,Field-programmable gate array ,business ,Massively parallel ,Collision avoidance - Abstract
In this project, we collaborate with researchers in the neuroscience department at the University of Delaware to develop a Field Programmable Gate Array (FPGA)-based embedded computer, inspired by the brains of small vertebrates (fish). The mechanisms of object detection and avoidance in fish have been extensively studied by our Delaware collaborators. The midbrain optic tectum is a biological multimodal navigation controller capable of processing input from all senses that convey spatial information, including vision, audition, touch, and lateral-line (water current sensing in fish). Unfortunately, computational complexity makes these models too slow for use in real-time applications. These simulations are run offline on state-of-the-art desktop computers, presenting a gap between the application and the target platform: a low-power embedded device. EM Photonics has expertise in developing high-performance computers based on commodity platforms such as graphics cards (GPUs) and FPGAs. FPGAs offer (1) high computational power, low power consumption and small footprint (in line with typical autonomous vehicle constraints), and (2) the ability to implement massively-parallel computational architectures, which can be leveraged to closely emulate biological systems. Combining UD's brain modeling algorithms and the power of FPGAs, this computer enables autonomous navigation in complex environments, and further types of onboard neural processing in future applications.
- Published
- 2009
- Full Text
- View/download PDF
39. Real-time embedded atmospheric compensation for long-range imaging using the average bispectrum speckle method
- Author
-
Eric J. Kelmelis, Petersen F. Curt, Fernando E. Ortiz, Carmen J. Carrano, and Michael R. Bodnar
- Subjects
Speckle pattern ,business.industry ,Computer science ,Image processing ,Angular resolution ,Speckle imaging ,business ,Field-programmable gate array ,Bispectrum ,Computer hardware ,Simulation - Abstract
While imaging over long distances is critical to a number of security and defense applications, such as homeland security and launch tracking, current optical systems are limited in resolving power. This is largely a result of the turbulent atmosphere in the path between the region under observation and the imaging system, which can severely degrade captured imagery. There are a variety of post-processing techniques capable of recovering this obscured image information; however, the computational complexity of such approaches has prohibited real-time deployment and hampers the usability of these technologies in many scenarios. To overcome this limitation, we have designed and manufactured an embedded image processing system based on commodity hardware which can compensate for these atmospheric disturbances in real-time. Our system consists of a reformulation of the average bispectrum speckle method coupled with a high-end FPGA processing board, and employs modular I/O capable of interfacing with most common digital and analog video transport methods (composite, component, VGA, DVI, SDI, HD-SDI, etc.). By leveraging the custom, reconfigurable nature of the FPGA, we have achieved performance twenty times faster than a modern desktop PC, in a form-factor that is compact, low-power, and field-deployable. Keywords: bispectral speckle imaging, FPGA, embedded, atmospheric compensation, real-time image processing
- Published
- 2009
- Full Text
- View/download PDF
40. Fabrication of Large Area 'Woodpile' Photonic Crystal Structures for Near IR
- Author
-
Dennis W. Prather, Peng Yao, Shouyuan Shi, Ahmed S. Sharkawy, Ozgenc Ebil, Elton Marchena, Neilanjan Dutta, and Eric J. Kelmelis
- Subjects
chemistry.chemical_classification ,Fabrication ,Materials science ,business.industry ,Nanotechnology ,Polymer ,law.invention ,Planar ,chemistry ,Resist ,law ,Optoelectronics ,Batch fabrication ,Photolithography ,business ,Lithography ,Photonic crystal - Abstract
We have fabricated large area 3D polymer photonic crystals by modifying planar lithography to achieve exposure confinement and multiple resist application. This fabrication process allows arbitrary defect introduction and is suitable for batch fabrication.
- Published
- 2009
- Full Text
- View/download PDF
41. Accelerated determination of UAV flight envelopes
- Author
-
Michael R. Bodnar, John R. Humphrey, Eric J. Kelmelis, and Lyle N. Long
- Subjects
Engineering ,business.industry ,Computational fluid dynamics ,Solver ,Supercomputer ,Euler equations ,Modeling and simulation ,symbols.namesake ,Software ,symbols ,System integration ,Aerospace engineering ,Graphics ,business ,Simulation - Abstract
Unmanned Aerial Vehicle (UAV) system integration with naval vessels is currently realized in limited form. The operational envelopes of these vehicles are constricted due to the complexities involved with at-sea flight testing. Furthermore, the unsteady nature of ship airwakes and the use of automated UAV control software necessitate that these tests be extremely conservative in nature. Modeling and simulation are natural alternatives to flight testing; however, a fully-coupled computational fluid dynamics (CFD) solution requires many thousands of CPU hours. We therefore seek to decrease simulation time by accelerating the underlying computations using state-of-the-art, commodity hardware. In this paper we present the progress of our proposed solution, harnessing the computational power of high-end commodity graphics processing units (GPUs) to create an accelerated Euler equations solver on unstructured hexahedral grids.
- Published
- 2008
- Full Text
- View/download PDF
42. Fabrication of 3D polymer photonic crystals for near-IR applications
- Author
-
Garrett J. Schneider, Dennis W. Prather, Liang Qiu, Eric J. Kelmelis, Ahmed S. Sharkawy, Shouyuan Shi, and Peng Yao
- Subjects
Materials science ,Fabrication ,business.industry ,Nanotechnology ,law.invention ,Surface micromachining ,Resist ,law ,Optoelectronics ,X-ray lithography ,Photolithography ,business ,Lithography ,Microfabrication ,Photonic crystal - Abstract
Photonic crystals[1, 2] have stirred enormous research interest and become a growing enterprise in the last 15 years. Generally, PhCs consist of periodic structures that possess periodicity comparable with the wavelength that the PhCs are designed to modulate. If material and periodic pattern are properly selected, PhCs can be applied to many applications based on their unique properties, including photonic band gaps (PBG)[3], self-collimation[4], super prism[5], etc. Strictly speaking, PhCs need to possess periodicity in three dimensions to maximize their advantageous capabilities. However, much current research is based on scaled two-dimensional PhCs, mainly due to the difficulty of fabricating such three-dimensional PhCs. Many approaches have been explored for the fabrication of 3D photonic crystals, including layer-by-layer surface micromachining[6], glancing angle deposition[7], 3D micro-sculpture method[8], self-assembly[9] and lithographic methods[10-12]. Among them, lithographic methods have become increasingly accepted due to low costs and precise control over the photonic crystal structure. The three most developed lithographic methods are X-ray lithography[10], holographic lithography[11] and two-photon polymerization[12]. Although significant progress has been made in developing these lithography-based technologies, these approaches still suffer from significant disadvantages. X-ray lithography relies on an expensive radiation source. Holographic lithography lacks the flexibility to create engineered defects, and multi-photon polymerization is not suitable for parallel fabrication. In our previous work, we developed a multi-layer photolithography process[13, 14] that is based on multiple resist application and enhanced absorption upon exposure. Using a negative lift-off resist (LOR) and a 254nm DUV source, we have demonstrated fabrication of arbitrary 3D structures with feature sizes of several microns.
However, a severe intermixing problem occurred as we reduced the lattice constant for near-IR applications. In this work, we address this problem by employing SU8. The exposure is vertically confined by using a mismatched 220nm DUV source. The intermixing problem is eliminated due to more densely crosslinked resist molecules. Using this method, we have demonstrated a 3D "woodpile" structure with a 1.55μm lattice constant and a 2mm-by-2mm pattern area.
- Published
- 2008
- Full Text
- View/download PDF
43. FPGA acceleration of superresolution algorithms for embedded processing in millimeter-wave sensors
- Author
-
Fernando E. Ortiz, Dennis W. Prather, James P. Durbano, and Eric J. Kelmelis
- Subjects
Pixel ,Computer science ,business.industry ,Computation ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Optical flow ,Bottleneck ,Power (physics) ,Computer Science::Computer Vision and Pattern Recognition ,Computer vision ,Artificial intelligence ,Dither ,business ,Field-programmable gate array ,Algorithm ,Linear least squares - Abstract
Superresolution reconstruction (SR-REC) algorithms combine multiple frames captured using spatially under-sampled imagers to produce a single higher-resolution image. Sub-pixel information is gained from natural motion within the image instead of active pixel scanning (dithering/micro-scanning), eliminating the reliability issues and power consumption associated with moving parts. One of the major computational challenges associated with SR-REC methods is the estimation of the optical flow of the image (i.e., determining the unknown pixel shifts between consecutive frames). A linear least squares approximation is the simplest method for estimating the pixel movements from the captured data, but the size of the problem (directly proportional to the number of pixels in the image) creates a computational bottleneck, which in turn limits the usability of this algorithm in real-time portable systems. We propose the use of a reconfigurable platform to implement these computations in a low power/size environment, suitable for integration into portable millimeter wave imagers.
- Published
- 2007
- Full Text
- View/download PDF
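The abstract of entry 43 describes estimating pixel shifts between consecutive frames with a linear least-squares approximation. As a rough illustration only, the sketch below estimates a single global sub-pixel shift between two frames by linearizing brightness constancy and solving the resulting 2x2 normal equations. It is not the authors' FPGA implementation: the function names (`estimate_global_shift`, `synthetic_frame`), the one-shift-per-frame simplification, and the synthetic test pattern are all assumptions made for this example.

```python
import math


def estimate_global_shift(frame1, frame2):
    """Estimate one (dx, dy) shift such that frame2(x, y) ~ frame1(x - dx, y - dy).

    Assumption for this sketch: a single global translation, small enough
    that the first-order model  It = -(Ix * dx + Iy * dy)  holds.
    Frames are lists of rows (frame[y][x]).
    """
    rows, cols = len(frame1), len(frame1[0])
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            # Central-difference spatial gradients of the first frame.
            ix = 0.5 * (frame1[y][x + 1] - frame1[y][x - 1])
            iy = 0.5 * (frame1[y + 1][x] - frame1[y - 1][x])
            # Temporal difference between the two frames.
            it = frame2[y][x] - frame1[y][x]
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    # Normal equations: [[sxx, sxy], [sxy, syy]] @ [dx, dy] = [-sxt, -syt]
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("gradients do not constrain both shift components")
    dx = (-sxt * syy + syt * sxy) / det
    dy = (-syt * sxx + sxt * sxy) / det
    return dx, dy


def synthetic_frame(rows, cols, dx=0.0, dy=0.0):
    """Hypothetical smooth test pattern, shifted analytically by (dx, dy)."""
    return [
        [math.sin(0.2 * (x - dx)) + math.cos(0.15 * (y - dy)) for x in range(cols)]
        for y in range(rows)
    ]
```

The accumulation loop touches every pixel once, which is consistent with the abstract's remark that the problem size grows in proportion to the number of pixels. A per-pixel or per-block flow field would repeat the same sums over local windows.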
44. Reconfigurable device for enhancement of long-range imagery
- Author
-
Fernando E. Ortiz, Eric J. Kelmelis, Petersen F. Curt, and Carmen J. Carrano
- Subjects
Flexibility (engineering), Acceleration, Engineering, Speedup, business.industry, Electronic engineering, Solver, business, Field-programmable gate array, Reconfigurable computing, Compensation (engineering), Reusability
- Abstract
In this paper, we discuss the real-time compensation of air turbulence in imaging through long atmospheric paths. We propose the use of a reconfigurable hardware platform, specifically field-programmable gate arrays (FPGAs), to reduce costs and development time, as well as increase flexibility and reusability. We present the results of our acceleration efforts to date (40x speedup) and our strategy to achieve a real-time, atmospheric compensation solver for high-definition video signals.
- Published
- 2007
- Full Text
- View/download PDF
45. An architecture for the efficient implementation of compressive sampling reconstruction algorithms in reconfigurable hardware
- Author
-
Eric J. Kelmelis, Gonzalo R. Arce, and Fernando E. Ortiz
- Subjects
Signal processing, Compressed sensing, Computer science, Pipeline (computing), Bandwidth (signal processing), Sampling (statistics), Hardware acceleration, Algorithm, Wireless sensor network, Reconfigurable computing
- Abstract
According to the Shannon-Nyquist theory, the number of samples required to reconstruct a signal is proportional to its bandwidth. Recently, it has been shown that acceptable reconstructions are possible from a reduced number of random samples, a process known as compressive sampling. Taking advantage of this realization has a radical impact on power consumption and communication bandwidth, which are crucial in applications based on small/mobile/unattended platforms such as UAVs and distributed sensor networks. Although the benefits of these compression techniques are self-evident, the reconstruction process requires the solution of nonlinear signal processing algorithms, which limits applicability in portable and real-time systems. In particular, (1) the power consumption associated with the difficult computations offsets the power savings afforded by compressive sampling, and (2) limited computational power prevents these algorithms from keeping pace with the data-capturing sensors, resulting in undesirable data loss. FPGA-based computers offer low power consumption and high computational capacity, providing a solution to both problems simultaneously. In this paper, we present an architecture that implements the algorithms central to compressive sampling in an FPGA environment. We start by studying the computational profile of the convex optimization algorithms used in compressive sampling. Then we present the design of a pixel pipeline suitable for FPGA implementation, able to compute these algorithms.
- Published
- 2007
- Full Text
- View/download PDF
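The abstract of entry 45 refers to convex optimization algorithms for compressive sampling reconstruction without naming a specific one. As an illustration of the kind of computation involved, the sketch below uses the iterative shrinkage-thresholding algorithm (ISTA) for the L1-regularized least-squares problem. ISTA is a stand-in chosen for this example, not necessarily the algorithm the paper implements, and every name here (`ista`, `soft_threshold`, `demo_problem`, the regularization weight `lam`) is an assumption.

```python
import random


def soft_threshold(value, threshold):
    """Shrink a scalar toward zero by `threshold` (proximal step for the L1 norm)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def matvec(matrix, vector):
    """Compute A @ x for a matrix stored as a list of rows."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def rmatvec(matrix, vector):
    """Compute A^T @ v for a matrix stored as a list of rows."""
    return [
        sum(matrix[i][j] * vector[i] for i in range(len(matrix)))
        for j in range(len(matrix[0]))
    ]


def objective(matrix, y, x, lam):
    """0.5 * ||A x - y||^2 + lam * ||x||_1"""
    residual = [p - q for p, q in zip(matvec(matrix, x), y)]
    return 0.5 * sum(r * r for r in residual) + lam * sum(abs(v) for v in x)


def ista(matrix, y, lam=0.01, num_iters=200):
    """Minimize the objective above with fixed-step ISTA, starting from zero."""
    # The squared Frobenius norm upper-bounds the largest eigenvalue of A^T A,
    # so 1 / lipschitz is a conservative (safe but slow) step size.
    lipschitz = sum(a * a for row in matrix for a in row)
    step = 1.0 / lipschitz
    x = [0.0] * len(matrix[0])
    for _ in range(num_iters):
        residual = [p - q for p, q in zip(matvec(matrix, x), y)]
        gradient = rmatvec(matrix, residual)
        x = [soft_threshold(xi - step * gi, step * lam) for xi, gi in zip(x, gradient)]
    return x


def demo_problem(seed=0, num_measurements=20, signal_length=50):
    """Hypothetical test case: a 3-sparse signal observed through random projections."""
    rng = random.Random(seed)
    matrix = [[rng.gauss(0.0, 1.0) for _ in range(signal_length)]
              for _ in range(num_measurements)]
    x_true = [0.0] * signal_length
    x_true[3], x_true[17], x_true[41] = 1.5, -2.0, 1.0
    return matrix, matvec(matrix, x_true), x_true
```

Each iteration consists of two matrix-vector products followed by an element-wise threshold. That regular, streaming structure is the sort of workload the abstract argues is well suited to a hardware pipeline.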
46. A reconfigurable self-collimation-based photonic crystal switch in silicon
- Author
-
Dennis W. Prather, Richard K. Martin, Ahmed S. Sharkawy, Eric J. Kelmelis, Caihua Chen, and Binglin Miao
- Subjects
Optics, Materials science, business.industry, Electric field, Dispersion (optics), Topology (electrical circuits), Absorption (electromagnetic radiation), business, Optical switch, Electromagnetic radiation, Signal, Photonic crystal
- Abstract
We present a reconfigurable, compact, low loss, optical switch in silicon. The device utilizes the self-collimation properties of photonic crystal structures and provides a technique for efficiently switching an electromagnetic wave guided through a pre-engineered dispersion-based photonic crystal self-guiding structure. The electromagnetic wave can be either in the microwave or optical regime based on the constituent materials and dimensions of the photonic crystal host. We propose that the loss tangent of dielectric material in the switching region can be modified by external commands to control the direction of propagation of the self-collimated signal and hence attain switching, thereby re-directing the light. Based on the geometrical orientation and position of the applied electric field, electromagnetic waves can be completely redirected (switched), or partially routed towards any arbitrary direction on a Manhattan grid or network. We have found that the induced loss does not significantly attenuate the waves switched in any direction. The structure presented can be generalized to an arbitrary N by M interconnected switching network or fabric, where the switching topology can be dynamically modulated by the application of external fields. To attain switching, the free-carrier absorption loss of Si is controlled by carrier injection from a forward-biased PN junction. The concept device is designed and analyzed using the FastFDTD
- Published
- 2007
- Full Text
- View/download PDF
47. GPU-based accelerated 2D and 3D FDTD solvers
- Author
-
John R. Humphrey, Daniel K. Price, and Eric J. Kelmelis
- Subjects
Electromagnetic field, symbols.namesake, Acceleration, Maxwell's equations, Computer science, Scattering-matrix method, Finite-difference time-domain method, symbols, Computational electromagnetics, Graphics, Computational science, Visualization
- Abstract
Our group has employed modern graphics processor units (GPUs) for the acceleration of finite-difference based computational electromagnetics (CEM) codes. In particular, we accelerated the well-known Finite-Difference Time-Domain (FDTD) method, which is commonly used for the analysis of electromagnetic phenomena. This algorithm uses difference-based approximations for Maxwell's Equations to simulate the propagation of electromagnetic fields through space and materials. The method is very general and is applicable to a wide array of problems, but runtimes can be very long, so acceleration is highly desired. In this paper we present GPU-based accelerated solvers for the FDTD method in both its 2D and 3D embodiments.
- Published
- 2007
- Full Text
- View/download PDF
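The abstract of entry 47 describes FDTD as using difference-based approximations of Maxwell's equations to step electromagnetic fields through space and time. To make the update pattern concrete, the sketch below is a minimal 1D, CPU-only version in normalized units. It is not the authors' 2D/3D GPU solver: the function name `fdtd_1d`, the grid size, the Courant number of 0.5, and the Gaussian soft source are all assumptions made for illustration.

```python
import math


def fdtd_1d(num_cells=200, num_steps=100, source_cell=100, courant=0.5):
    """Leapfrog update of Ez and Hy on a staggered 1D grid in free space.

    Assumptions for this sketch: fields are scaled so that the only update
    coefficient is the Courant number, and the two end cells of Ez are never
    updated, so they act as perfectly conducting walls.
    """
    ez = [0.0] * num_cells
    hy = [0.0] * (num_cells - 1)  # hy[k] sits between ez[k] and ez[k + 1]
    for n in range(num_steps):
        # Update the magnetic field from the spatial difference of Ez.
        for k in range(num_cells - 1):
            hy[k] += courant * (ez[k + 1] - ez[k])
        # Update the electric field from the spatial difference of Hy.
        for k in range(1, num_cells - 1):
            ez[k] += courant * (hy[k] - hy[k - 1])
        # Additive (soft) Gaussian pulse injected at one cell.
        ez[source_cell] += math.exp(-((n - 30) / 10.0) ** 2)
    return ez, hy
```

Every cell update depends only on its immediate neighbors from the previous half-step, so all cells in a sweep can be updated independently. That locality is what makes the method a natural fit for the data-parallel hardware the entry discusses.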
48. Fabrication of Large Area Polymer-Based 3D Photonic Crystals
- Author
-
Peng Yao, Ahmed S. Sharkawy, Dennis W. Prather, Eric J. Kelmelis, and Shouyuan Shi
- Subjects
chemistry.chemical_classification, Materials science, Fabrication, business.industry, Photonic integrated circuit, Physics::Optics, Nanotechnology, Polymer, Condensed Matter::Soft Condensed Matter, Laser linewidth, chemistry, Optoelectronics, Integrated optics, business, Laser beams, Photonic crystal
- Abstract
Polymer based photonic crystals are ideal candidates for applications relying on engineering the dispersive properties of those periodic structures. We present a process for fabricating arbitrary large area 3D photonic crystal structures in polymers.
- Published
- 2007
- Full Text
- View/download PDF
49. Accelerated Electromagnetic Solvers Using Commodity Graphics Cards
- Author
-
Daniel K. Price, John R. Humphrey, and Eric J. Kelmelis
- Subjects
Acceleration, Computer science, Computer graphics (images), Finite-difference time-domain method, Graphics, Commodity (Marxism), Computational science
- Abstract
We have employed modern graphics processor units (GPUs) for the acceleration of the well-known Finite-Difference Time-Domain (FDTD) method. Our implementation achieves speedups of up to 40x over traditional microprocessor-based solutions.
- Published
- 2007
- Full Text
- View/download PDF
50. Modeling and simulation of nanoscale devices with a desktop supercomputer
- Author
-
Eric J. Kelmelis, Fernando E. Ortiz, Petersen F. Curt, James P. Durbano, and John R. Humphrey
- Subjects
Modeling and simulation, Computer science, business.industry, Key (cryptography), Finite-difference time-domain method, Graphics, Supercomputer, Field-programmable gate array, business, Field (computer science), Computer hardware, Level of detail, Computational science
- Abstract
Designing nanoscale devices presents a number of unique challenges. As device features shrink, the computational demands of the simulations necessary to accurately model them increase significantly. This is a result of not only the increasing level of detail in the device design itself, but also the need to use more accurate models. The approximations that are generally made when dealing with larger devices break down as feature sizes decrease. This can be seen in the optics field when contrasting the complexity of physical optics models with those requiring a rigorous solution to Maxwell's equations. This added complexity leads to more demanding calculations, stressing computational resources and driving research to overcome these limitations. There are traditionally two means of improving simulation times as model complexity grows beyond available computational resources: modifying the underlying algorithms to maintain sufficient precision while reducing overall computations, and increasing the power of the computational system. In this paper, we explore the latter. Recent advances in commodity hardware technologies, particularly field-programmable gate arrays (FPGAs) and graphics processing units (GPUs), have allowed the creation of desktop-style devices capable of outperforming PC clusters. We will describe the key hardware technologies required to build such a device and then discuss their application to the modeling and simulation of nanophotonic devices. We have found that FPGAs and GPUs can be used to significantly reduce simulation times and allow for the solution of much larger problems.
- Published
- 2006
- Full Text
- View/download PDF