2018

Niccolò Discacciati – Implementation of Discontinuous Galerkin model problem for atmospheric dynamics on emerging architectures

About: CSCS, Switzerland

Description

The Swiss National Supercomputing Centre (CSCS) is the national high-performance computing centre of
Switzerland. Operated by ETH Zurich, it is located in Lugano, Canton Ticino.

My internship combined a theoretical (numerical) aspect with a computational one. On the one hand, Discontinuous Galerkin (DG) methods constitute a class of solvers for partial differential equations that combines features of finite elements and finite volumes, with good parallelization potential. On the other hand, GridTools (GT) is an efficient C++ library developed at CSCS that enables architecture-specific optimizations and extensive parallelism for solving real-world problems. Although GT is mainly designed for finite differences, the Galerkin4GridTools (G4GT) library provides the framework to support finite element discretizations inside GT.

The main goal of the project was the implementation and evaluation of a DG solver using G4GT, in view of possible applications in atmospheric dynamics and climate modeling. The solver is designed for advection equations and was validated by means of different benchmark tests. Support for a spatially variable discretization degree was then added. Such flexibility plays a key role both for stability (via the CFL condition) on spherical geometries and for saving computational resources: the required time is approximately halved with respect to a constant degree. At the same time, an extensive performance analysis was carried out using the Roofline model. For a CPU backend the results are consistent with the model, while in the GPU case the measured values even exceed the nominal hardware limit.
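The Roofline model itself is compact enough to sketch. The following minimal Python snippet (with illustrative, not CSCS-specific, hardware numbers) computes the attainable performance of a kernel from its arithmetic intensity:

```python
import numpy as np

def roofline(ai, peak_flops, peak_bw):
    """Attainable performance (flop/s) under the Roofline model: a kernel
    with arithmetic intensity `ai` (flop/byte) is capped either by memory
    bandwidth (peak_bw, byte/s) or by the compute peak (peak_flops, flop/s)."""
    return np.minimum(peak_flops, peak_bw * np.asarray(ai, dtype=float))

# Example with made-up numbers: at 0.5 flop/byte, a 1 Tflop/s node with
# 100 GB/s of memory bandwidth is memory bound at 50 Gflop/s.
print(roofline(0.5, peak_flops=1e12, peak_bw=100e9))
```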

 

Figure 1: Performance analysis using a CPU backend, with different discretization degrees r = M − 1. The triangles denote the dominant kernel, the circles the less efficient one (which suffers from poor memory accesses), the blue line the hardware limit, and the crosses the performance of the overall program.

Guillermo Julián Moreno – Development of a mass spectrometry data analysis solution

About : Nestlé, Lausanne

 

Description

My internship was conducted at the Nestlé Research Center (NRC) in Lausanne, more specifically in the Proteins/Peptides group of the Department of Analytical Science, from 10 July to 22 December 2017. Nestlé is a very well-known Swiss company, and the NRC remains its flagship center for research and development. The project I worked on focused on the chemical analysis of proteins to improve milk-based products.

One of the analytical methods they use is intact protein analysis by Liquid Chromatography – High Resolution Mass Spectrometry (LC-HRMS). My role was to develop an application to analyze the data output by the instrument and present the results to the user quickly and easily. The goal was to extract information such as protein composition, protein ratios, protein modifications, protein quantification and impurity detection, among other things. This required understanding the data, modeling the problems mathematically, and designing and implementing the necessary algorithms.

This internship proved very enriching, as I was able to delve into a field that I did not know much about, working independently but with a valuable feedback loop on what was needed and how we could solve it, and providing a solution that made the analysts' job easier, all in a well-known and established company. All in all, it was a great experience and an excellent preparation for the future.

Figure 4.1: View of the detected signals in the sample and the detected proteins.

 

Figure 4.2: View of the modifications of a protein in different samples.

Lie He – Data Science Internship (Deep Learning/NLP)

Description

I did my internship at Deeption SA from 1 July to 31 December 2017. Deeption is a spin-off of EPFL's social media lab that specializes in data mining: it discovers insights in social media and delivers them to customers.

The first part of my internship was to develop natural language processing tools. One task was to link entities in text with pages on Wikipedia. By using the internal links between anchor texts and the associated Wikipedia pages, one can build a labeled dataset and train a machine learning model on it. Training entity embeddings is a further improvement for resolving entity mentions.
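As a minimal sketch of this idea (with a hypothetical `wiki_links` input standing in for links parsed from a Wikipedia dump), the anchor-text statistics already give a strong baseline linker:

```python
from collections import Counter, defaultdict

def build_anchor_dictionary(wiki_links):
    """Count how often each anchor text links to each Wikipedia page.
    `wiki_links` is an iterable of (anchor_text, page_title) pairs."""
    counts = defaultdict(Counter)
    for anchor, page in wiki_links:
        counts[anchor.lower()][page] += 1
    return counts

def link_entity(mention, counts):
    """Naive baseline: map a mention to its most frequent target page.
    A trained model (or entity embeddings) would re-rank the candidates
    using the surrounding context."""
    candidates = counts.get(mention.lower())
    return candidates.most_common(1)[0][0] if candidates else None

# Toy example.
links = [("apple", "Apple Inc."), ("apple", "Apple"), ("apple", "Apple Inc.")]
print(link_entity("Apple", build_anchor_dictionary(links)))  # -> Apple Inc.
```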

The second part of my internship was to improve the performance of a sentiment analysis task. Sentiment classification is a classical natural language processing task; here, the goal was to classify tweets containing cashtags into bullish and bearish ones. Several deep learning models and architectures were implemented and tuned to achieve good performance. The tweet-level sentiments were then aggregated by cashtag and used to forecast next-day stock market movement. Classical time series techniques and recurrent neural network based models were used in the forecasting part.
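A minimal sketch of the aggregation step, assuming tweet-level scores (+1 bullish, −1 bearish) from some classifier and illustrative column names:

```python
import pandas as pd

# Tweet-level sentiment is averaged per cashtag and day, producing a
# daily signal that a forecasting model (time series or RNN) can consume.
tweets = pd.DataFrame({
    "date":      ["2017-10-02", "2017-10-02", "2017-10-03"],
    "cashtag":   ["$AAPL", "$AAPL", "$AAPL"],
    "sentiment": [1, -1, 1],
})
daily_signal = tweets.groupby(["cashtag", "date"])["sentiment"].mean()
print(daily_signal)
```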

What I liked about my internship at Deeption was that I had plenty of time to read papers and implement the methods from scratch. The internship provided me with an excellent opportunity to apply the knowledge I acquired in the CSE program to a real application.

 

Julien Rüegg – Predictive maintenance for aeronautics

About: Meggitt SA, Switzerland

 

Description

I did my internship at Meggitt SA (Fribourg) from 1 August 2017 to 31 January 2018. Meggitt is a world-leading company in the field of sensing systems, especially in the energy and aeronautics branches. It designs sensors for turbines, which are essential to the proper functioning of power stations and aircraft engines.

My job at Meggitt was to develop an algorithm for predictive maintenance in the aeronautics field, one able to predict when specific aircraft parts are going to fail. In this way, the airline can schedule the replacement of a part more easily, and the plane stays grounded for less time. This research project belongs to a broader European project called AIRMES, whose goal is to optimize end-to-end maintenance activities within an operator's environment (http://www.airmes-project.eu).

Most of the algorithms I implemented are based on pattern recognition. They belong to the family of "subspace methods" and consist of extracting a subspace of much smaller dimension than the original data space. This subspace should retain the essential information related to a part replacement and discard the useless part of the information. Once the subspace is generated, each part replacement is represented by a template, which is a point in the subspace. This procedure can be seen in Figure 1. New data are classified by projecting them into the subspace and measuring the distances between this projection and the different templates.
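As a minimal sketch of such a method, assuming a PCA-style subspace construction (the actual construction used at Meggitt may differ):

```python
import numpy as np

def fit_subspace(X, k):
    """Extract a k-dimensional subspace from training data X
    (rows = samples) via the top-k principal directions."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]                      # mean and k basis vectors

def classify(x, mu, basis, templates):
    """Project a new sample into the subspace and return the label of
    the nearest template (each template is a point in the subspace)."""
    z = basis @ (x - mu)
    labels = list(templates)
    dists = [np.linalg.norm(z - templates[lab]) for lab in labels]
    return labels[int(np.argmin(dists))]
```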

It was a great pleasure to work at Meggitt. People were welcoming, my tasks were clear from the beginning, and I could fully complete them using the knowledge I acquired at EPFL. My supervisor trusted my work from the first day, was fully available, and was open to discussions that were very instructive for both of us.

 

 

Mehrdad Kiani – Performing Field Simulations for Foil-Type Medium-Frequency Transformers with COMSOL

About: ABB, Switzerland

 

Description

I had the great opportunity to do my internship in the research and development section of ABB, in the "Power Electronics Integration" department, from 1 July 2017 until 31 December 2017. The goal of the internship was the electromagnetic and thermal 2D and 3D simulation of foil windings of high-frequency transformers with COMSOL Multiphysics, and the validation of the results against Dowell's equations as an analytical solution. All the simulations were run on the ABB server, with access to 8 cores and 48 GB of RAM. I documented the problems and challenges I encountered during the internship, together with the alternative approaches we used to solve them. Every few weeks we had a meeting with my supervisors to share and discuss the results and further work.

As one can see in Fig. 1, the alternating current resistance R_w and the direct current resistance R_w0 are calculated with the software, and the ratio F_R = R_w/R_w0 (discrete markers) is compared with Dowell's equations as analytical solution (solid lines). The left and right plots correspond to two- and three-dimensional simulations, respectively, with F_R plotted against the dimensionless number |α²h²|. Effects such as the proximity and skin effects, which dominate in the medium- and high-frequency domain in transformers, were taken into account during the simulations. One of the biggest challenges was 3D meshing, which is described in detail in the report. At the end of the internship, a Python code was written to interpolate and extrapolate the data extracted from COMSOL, for use in a larger code.
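For reference, the analytical baseline can be sketched in a few lines. The snippet below implements the standard form of Dowell's formula for the resistance factor of a winding portion with m layers, with Δ = h/δ the foil thickness over the skin depth (the report's |α²h²| axis is assumed to be a rescaling of this quantity):

```python
import numpy as np

def dowell_fr(delta, m):
    """Dowell's AC/DC resistance ratio F_R = R_w / R_w0 for a winding
    portion with m layers; delta = foil thickness / skin depth."""
    skin = delta * (np.sinh(2 * delta) + np.sin(2 * delta)) \
                 / (np.cosh(2 * delta) - np.cos(2 * delta))
    prox = delta * (2.0 * (m**2 - 1) / 3.0) \
                 * (np.sinh(delta) - np.sin(delta)) \
                 / (np.cosh(delta) + np.cos(delta))
    return skin + prox   # skin-effect term + proximity-effect term

print(dowell_fr(1.0, m=2))  # F_R at one illustrative operating point
```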

Figure 1: F_R as a function of |α²h²| for winding portions with integral numbers of layers, according to Dowell's equations, compared with two-dimensional (left) and three-dimensional (right) simulations. Δ: simulation results for m = 1; +: m = 2; ○: m = 3; *: m = 4.

Alberto Chiappa – Elevator context recognition

About: Schindler, Switzerland

Description

I was part of the New Technologies team of Schindler Aufzüge AG from July 17, 2017 to January 19, 2018. The office is located in Ebikon, in the canton of Lucerne. Schindler, founded in Lucerne in 1874, is one of the world's leading manufacturers of escalators and elevators.

The team's mission is to investigate the latest technologies and to understand how they could benefit Schindler's products and services. Past projects focused mainly on mechanics and electronics, but more recent ones show an increasing presence in the data science area. I was involved in a project named "Internet of Passenger", whose goal is to infer the operational health of an elevator from passengers' mobile phones. Indeed, we constantly carry in our pockets a set of sensors that includes an accelerometer, a gyroscope, a magnetometer, a light sensor and sometimes even a barometer. The project's main task is to develop signal processing algorithms to extract certain trip characteristics of the elevator (e.g. maximum velocity, maximum acceleration).

My initial duty on the project was to record the first set of real-world data and to study the feasibility of the approach against a predefined set of elevator trip characteristics to be inferred. Having collected 1025 trips from about 80 people, I devised signal processing algorithms for two purposes: first, to segment the sensor signals corresponding to an elevator trip, and second, to extract the aforementioned trip characteristics; a sketch of the idea follows.
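A minimal, hypothetical sketch of the segmentation idea (names, threshold and sampling rate are illustrative, not Schindler's actual algorithm):

```python
import numpy as np

def max_trip_velocity(acc_z, fs=100.0, thresh=0.15):
    """Segment an elevator trip from the vertical acceleration signal
    (gravity removed, m/s^2) and estimate the maximum velocity.

    acc_z  : 1-D array of vertical acceleration samples
    fs     : sampling rate (Hz)
    thresh : acceleration magnitude (m/s^2) marking the trip boundaries
    """
    # Smooth with a short moving average to suppress sensor noise.
    win = int(0.5 * fs)
    smooth = np.convolve(acc_z, np.ones(win) / win, mode="same")

    # The trip spans the first to the last sample above the threshold.
    active = np.flatnonzero(np.abs(smooth) > thresh)
    if active.size < 2:
        return 0.0
    start, end = active[0], active[-1]

    # Integrate acceleration over the trip to obtain velocity (m/s).
    velocity = np.cumsum(smooth[start:end]) / fs
    return float(np.max(np.abs(velocity)))
```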

The following picture shows the locations of some of the recordings taken on Schindler's campus, first as dots and then as a heatmap.

 

Cécile Le Sueur – How to deal with missing values in proteomics datasets?

About : EMBL, Germany

Description

EMBL is an intergovernmental organisation present in 6 different countries, with more than 80 independent research groups covering the spectrum of molecular biology. I spent my 6-month internship in the Huber group at EMBL Heidelberg (Germany).

The goal of the internship was to develop a statistical tool to study proteomics data obtained using tandem mass spectrometry. With this technology, one can detect the presence of proteins in the tested samples and measure their abundance using quantification methods.

We were especially interested in testing for differential expression, i.e. searching for proteins present in different abundances under different conditions. To this aim, we estimate the log fold change, which is the difference in abundance between the two conditions, and assess its significance. However, proteomics datasets contain numerous missing values, i.e. lacking observations. The goal was to handle these missing values so as to improve the estimation of the log fold change and detect more truly differentially expressed proteins.
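A minimal sketch of the naive baseline, computed per protein on log-transformed intensities and simply ignoring missing values (the project's contribution is precisely to do better than this):

```python
import numpy as np
from scipy import stats

def log_fold_change(cond_a, cond_b):
    """Naive estimate: difference of nan-aware means between two
    conditions, with a Welch t-test for significance. Values missing
    not-at-random bias this estimate, which is why a dedicated model
    of the missingness mechanism is needed."""
    a, b = np.asarray(cond_a, float), np.asarray(cond_b, float)
    lfc = np.nanmean(a) - np.nanmean(b)
    _, p = stats.ttest_ind(a[~np.isnan(a)], b[~np.isnan(b)], equal_var=False)
    return lfc, p

print(log_fold_change([21.3, np.nan, 20.9], [19.8, 20.1, np.nan]))
```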

My project was quite theoretical, but the computational aspect was important for testing the developed mathematical models. I worked in RStudio, using real and simulated datasets to benchmark the performance of the models. The method is not finished yet, and we will continue to work on the project.

I really enjoyed the research project, and I appreciated all the people I met. My supervisors trusted me, were really supportive and gave me a lot of advice. The knowledge I acquired at EPFL was also very useful for my project. EMBL Heidelberg is a really nice place, and the people from the Huber group are kind, funny and stimulating. To conclude, this internship was a wonderful experience!

Simple model used in a hypothetical case where all data are observed. How should we model the missing observations present in real datasets?

Riccardo Peli – Exploiting Deep Learning for Computer Vision Applications

About: BOBST, Mex, Switzerland

Description

I carried out my internship at BOBST SA, one of the world's leading suppliers of machinery for packaging with folding carton, corrugated board and flexible materials. It has roughly 5000 employees worldwide, and its headquarters is in Mex, close to Lausanne, where I worked for six months. I joined the Quality Control development team in the CORES (COntrol and REgister Solutions) department. The main goal of the team is to build applications capable of detecting defects on printed boxes by analyzing acquired images and thus ensuring a high quality of the final product. The whole pipeline, from image acquisition to defect detection, is very fast, enabling real-time quality control without the supervision of an operator.

My role was to investigate the capabilities of deep learning on different problems the team needs to solve. In particular, I applied deep learning to facilitate the work of the operator who has to set up the machine before printing starts. For instance, it is important to identify text regions, on which an expensive but very precise algorithm is run to ensure that even tiny defects are detected. I therefore developed an algorithm that generates bounding boxes around small text and thin lines, as shown in the first image below. This can save the operator time and work, while still letting him modify the output to suit his needs.

A second algorithm was developed to find barcodes and data matrices. A deep learning network for object detection in images was adapted to this task, leading to results such as the one in the second image. This algorithm could be useful in the future if barcodes need to be read during the printing process.

I coded a third algorithm to find an alignment model for the box. An alignment model is a pair of regions used to estimate the shift and rotation angle of the current box with respect to a reference. The two regions should exhibit patterns with two independent directions and, to achieve acceptable robustness, the patterns must not be too subtle. A convolutional neural network was trained to classify regions sampled from the image, and the best two, in terms of contrast and distance, are kept as the alignment model; a sketch of this selection step follows.
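The selection step might be sketched as follows; the scoring and the distance weight are assumptions for illustration, not BOBST's actual criterion:

```python
import numpy as np
from itertools import combinations

def pick_alignment_model(regions, scores, centers, dist_weight=0.01):
    """Given candidate regions, a CNN suitability score per region and
    the region centers, keep the pair that trades off high scores
    against large spatial separation."""
    best_value, best_pair = -np.inf, None
    for i, j in combinations(range(len(regions)), 2):
        dist = np.linalg.norm(np.asarray(centers[i]) - np.asarray(centers[j]))
        value = scores[i] + scores[j] + dist_weight * dist
        if value > best_value:
            best_value, best_pair = value, (regions[i], regions[j])
    return best_pair
```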

 

Jiaxi Gu – Algorithm design and data analysis of computational crowdsourcing urban design

 

About: TalkingData, China

 Description

I did my internship as a data scientist in the Human Data Lab (HLab) of TalkingData in Beijing, China, from 1 August 2017 to 31 January 2018. TalkingData is China's leading big data service platform, and HLab is a business unit focused on data analytics in social science.

For the whole duration of my internship I focused on a computational crowdsourcing urban design project. The goals of the project were to i) set up a crowdsourcing urban design event at the Beijing International Design Week 2017, and ii) perform data analysis and design algorithms to decipher visitors’ designs.

For the first part, I set up a web application based on qua-kit, an urban analysis toolkit prototyped by ETH Zurich. The application allowed visitors to easily re-design the Beijing Hutong area while collecting data from each design action.

The data was put to full use in the second part. With the help of machine learning models, visitors were clustered based on their design activities. I found that for some specific social groups (e.g. local residents, people who enjoy historic sites), visitors of the same group tended to show identical design patterns, while other social groups did not. I also found some design patterns common to all visitors. For example, most visitors started their design with the large-volume buildings, regardless of their function or exterior.

Finally, I designed algorithms to calculate computational architectural metrics such as vicinity, visibility and set-back distance in visitors' designs. Even though some visitors claimed they were simply moving buildings at random rather than designing thoughtfully, their changes usually improved these computational metrics, as the toy sketch below illustrates.
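As a toy illustration of such a metric (the project's actual definitions of vicinity, visibility and set-back distance are not reproduced here), one could score vicinity as the mean nearest-neighbour distance between buildings:

```python
import numpy as np

def vicinity(centers):
    """Mean distance from each building to its nearest neighbour,
    computed from the 2-D centers of the building footprints."""
    pts = np.asarray(centers, dtype=float)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)        # ignore self-distances
    return float(d.min(axis=1).mean())

print(vicinity([[0, 0], [10, 0], [10, 8]]))
```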

The project was very satisfying to work on, since it is the first ever crowdsourcing urban design project powered by machine learning and computational science. The CSE study program allowed me to acquire a wide range of skills, and I was really happy to apply them to their full extent in this interdisciplinary project.

 

Figure: Heatmap of the re-designed positions of an old, large residential building

Santo Gioia – Facial Augmented Reality

About: Kapanu AG, Switzerland

 

Description

I did my internship from 21.08.2017 to 20.02.2018 at Kapanu AG, a spin-off of the Institute of Visual Computing / Computer Graphics Laboratory of ETH Zurich. The company specializes in research and engineering on visual computing technologies: computer vision, computer graphics and machine learning.

During my internship, I studied papers on deep learning projects that could be useful for the company and investigated how the company could benefit from them. This was very interesting, because I learned many innovative deep learning techniques for computer vision tasks and gained insight into how to apply them to real-life problems.

Using an open-source game engine, I also developed a test bed framework in Python that helps visualize rendered scenes. I used this framework to display the facial landmarks and the head pose derived from an image of a face, following one of the papers I studied (see Figure 1).
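A helper of the kind such a test bed needs might look as follows; the pinhole-camera interface is a hypothetical sketch, not Kapanu's actual code:

```python
import numpy as np

def project_landmarks(X, R, t, f, c):
    """Project 3-D facial landmarks into image coordinates with a pinhole
    camera, given the estimated head pose (rotation R, translation t),
    focal length f and principal point c."""
    X = np.asarray(X, dtype=float)              # (N, 3) landmark positions
    cam = X @ np.asarray(R).T + np.asarray(t)   # world -> camera frame
    uv = f * cam[:, :2] / cam[:, 2:3] + np.asarray(c)  # perspective divide
    return uv                                   # (N, 2) pixel coordinates
```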

I was also introduced to Swift, a programming language I had never used before. Many of the tasks I had to solve were related to app development, mainly UI development. Together with my colleagues, we considerably improved the UI of the iOS application the company is developing.

Figure 1: Scene with facial landmarks and head pose