

SDS|2017 Parallel Session

Date: 16 June 2017

Venue: Kursaal, Bern

Project & Start-Up Support

Financial support is one of the primary concerns when formulating a research proposal. This platform brings together experts from two of Switzerland's funding agencies, under whose umbrella hundreds of projects are funded every year: the federal Commission for Technology and Innovation (CTI) and EUResearch.

 

EUResearch will explain how to prepare an EU R&D proposal, how to form a consortium, which calls are currently open, and more. CTI will describe how its funding works, its expectations regarding proposal content and innovation, and the business case for the Swiss industry partners involved in a project. CTI will also present its support programmes for start-ups and spin-offs. Join this session to hear from and meet the experts.

* Only participants who have registered for the conference can sign up for this session

 
Use-Case Talks 
 

An interactive session in which companies come forward with their use-case problems and seek answers from you. The use cases cover new technologies and applications where the presenters face technical problems. Each presenter will explain what they have developed and where they ran into problems, and you will offer answers drawing on your expertise in that specific field.

 

Please look at the descriptions of the use cases to see what sparks your interest and sign up to participate. Please note that two use cases will run at the same time.

Use-Case 1: Helsana

Building an Analytics Infrastructure – the right questions to ask

Building a hardware infrastructure for analytics is a very challenging endeavor. Coming from a fragmented application landscape with Linux, Windows, and single- and multi-tier architectures makes the switch much more difficult and complex. Many models and batch jobs require up-to-date data, so there are strong dependencies between different worlds (for example, the data warehouse and analytics) that need to be considered and planned for. The number of teams involved makes it even more difficult to define and decide on a target architecture, since every team has its own requirements. Sizing (RAM, CPU, disk), the number of environments, and the tools (and their integration) play a fundamental role in a successful analytics infrastructure.

 

There are fundamental questions that need to be clarified before deciding on the target infrastructure: How will deployment occur? How many environments are needed? What exactly is a development environment, given that data scientists will always need productive data? What kinds of data sources do you have (Oracle, Teradata, files, etc.), and how many? What is more important, CPUs or RAM? What are the roles of IT and of data scientists (and does such a division make sense)?

 

In this use case we will share our experience, the challenges we encountered, and the right questions to ask before defining a target analytics infrastructure.

In particular, we would like to share our experience with the distinction between the roles and responsibilities of "IT" and "Data Scientists". Does such a distinction make sense?

Questions

  • What is an analytics infrastructure? Are we talking only about hardware?

  • How many environments are needed? Is a "test" environment needed?

  • What is the role of a "development" environment?

  • What role does the deployment process play in defining the target infrastructure?

  • How important is the integration of different tools (for example, R and SAS)? Is it needed at all?

  • How do analytics projects differ from "standard" IT projects (for example, those run with waterfall or agile methodologies)?

  • What are the roles and responsibilities of IT and of data scientists? Does a distinction make sense in our data-driven world?

Target audience

The presentation is relevant for anyone working with data analytics infrastructure (hardware, tools, processes). We recommend that IT staff, data scientists, and analytics architects (and decision makers) attend to gain insight into what it means to consolidate a complex infrastructure starting from a fragmented one and to ensure success. We particularly encourage data scientists to participate and hear about our experience with the division of roles and responsibilities between IT and data scientists.

 
 

Use-Case 2: SWITCH & ZHAW

Automated provisioning of data analytics clusters

Data analytics frameworks such as Hadoop and Spark offer very powerful tools for analyzing huge amounts of data and gaining deep insights, but managing a reasonably optimized cluster is complex. With the aim of removing this complexity and automating the lifecycle management of such clusters for research groups and the student community within Switzerland, ZHAW and SWITCH have been collaborating within the SCALE-UP project. The researchers at the ICCLab in InIT, ZHAW have developed DISCO, a distributed computing framework orchestration solution that can create on-demand Hadoop and Spark clusters in a few minutes on SWITCHengines. Additional tools such as Zeppelin and Jupyter can also be provisioned with just a click through DISCO. A few questions still remain to be answered, and these will be addressed in the discussion.
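
To make the idea of one-click cluster provisioning more concrete, the following minimal Python sketch shows what a request to a generic cluster-orchestration REST API could look like. The URL, field names, and response format are purely illustrative assumptions and are not DISCO's actual interface.

    import requests

    # Hypothetical endpoint; DISCO's real interface may differ.
    ORCHESTRATOR_API = "https://disco.example.ch/clusters"

    def create_spark_cluster(token: str, workers: int = 4) -> str:
        """Request an on-demand Spark cluster and return its identifier.

        The payload fields below are illustrative assumptions about what a
        cluster-orchestration API typically needs, not DISCO's actual schema.
        """
        payload = {
            "framework": "spark",          # or "hadoop"
            "workers": workers,
            "master_flavor": "m1.large",   # VM size on the IaaS backend
            "worker_flavor": "m1.medium",
            "extras": ["jupyter"],         # optional notebook tooling
        }
        resp = requests.post(ORCHESTRATOR_API, json=payload,
                             headers={"X-Auth-Token": token})
        resp.raise_for_status()
        return resp.json()["cluster_id"]

In such a self-service model the user specifies only the framework, cluster size, and optional tools, while the orchestrator handles image selection, networking, and configuration on the underlying infrastructure.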

Questions

  • How big are typical big-data use cases in the various industries in terms of storage needs?

  • SWITCHengines is a cloud solution primarily for the Swiss academic and research sector, developed and managed by SWITCH. The data is hosted in Switzerland, where Swiss data protection law applies. Beyond cost factors, what further requirements would make the data analytics solution offered via DISCO on SWITCHengines preferable to commercial providers located outside of Switzerland / Europe?

  • Are organizations' use cases mainly long-lived analysis tasks, or are organizations and groups open to using a cluster only for the duration of a task and then deleting or pausing the cluster resources to minimize costs in the long run?

  • By default, the clusters created by DISCO come with their own dedicated storage that is not shared with any other cluster. Is this a reasonable default, or is there a general consensus in the user community that data should be more easily shared between clusters?

  • Apart from Hadoop and Spark, which other popular non-commercial frameworks does the community use?

Target audience

The presentation is relevant for anyone working with open-source data analytics frameworks such as Hadoop and Spark; students of big data and allied subjects such as AI and machine learning are a suitable audience. Companies offering business intelligence solutions and big-data cluster administrators from universities and industry are encouraged to attend.

 

Use-Case 3: PwC

Machine learning in transactional data

At PwC, data are at the heart of everything we do. We have a wealth of experience in all data-related disciplines, from collection, cleansing, and management to building analytical algorithms and visualisation tools. We combine this expertise with PwC's other professional offerings, such as Tax Services, to create powerful, data-enabled solutions for clients.


We present a use case resulting from the application of machine learning to the tax statements of corporations. Taking this case as a basis, we elaborate on other use cases in various business settings. In a group discussion during the use-case session, we encourage you to share your own organization's challenges with the group and to take home practical insights.

Questions

  • What are ways to build machine learning value chains in environments with high accuracy and precision requirements?

  • What experience do you have with artificial intelligence in any kind of transaction-related problem?

  • Have you looked into technology for coping with regulations in your own organization?

  • Have you explored possibilities for automating office processes (also known as Robotic Process Automation), and can you share your experiences?

  • If you haven't had the chance to deal with these questions, what are the obstacles in your environment?

Target Audience

Students and professionals with an interest in accurate, transaction-style data, such as accounting, tax, and regulatory data.

 
 

Use-Case 4: eBay

Identification of Experts: Possible Approaches

eBay is a unique marketplace with millions of live items and millions of active users. Naturally, with such a massive ecosystem there is considerable diversity in both quality of inventory and user expertise / knowledge. This is especially true in categories with specialized equipment (e.g. cameras, snowboards, mountain bikes).

 

The ability to identify users with expertise in a domain or high-quality inventory would enable us to provide interesting recommendations, give us deeper inventory insights, etc.

 

We have many data points related to user browsing and purchasing behaviour that we can leverage to identify expertise and quality. We have implemented a modified version of the graph-based SPEAR algorithm (Noll and Yeung 2009) for this purpose. However, we have open questions and are open to other approaches.
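
For readers unfamiliar with SPEAR, the following minimal Python sketch illustrates the HITS-style mutual reinforcement it builds on: user expertise and item quality scores reinforce each other, and users who act on an item earlier than others receive more credit (discoverers over mere followers). The input format, credit function, and function name are illustrative assumptions, not eBay's modified implementation or the exact algorithm from the paper.

    import numpy as np

    def spear_like_ranking(actions, credit=np.sqrt, iterations=50):
        """Simplified, HITS-style mutual reinforcement inspired by SPEAR.

        `actions` is a list of (user, item, timestamp) tuples, e.g. views or
        purchases. Users who act on an item earlier than many others receive
        more credit for it. Returns (expertise_by_user, quality_by_item).
        """
        users = sorted({u for u, _, _ in actions})
        items = sorted({i for _, i, _ in actions})
        u_idx = {u: k for k, u in enumerate(users)}
        i_idx = {i: k for k, i in enumerate(items)}

        # A[u, i] = credit(number of users acting on item i no earlier than u)
        A = np.zeros((len(users), len(items)))
        by_item = {}
        for u, i, t in actions:
            by_item.setdefault(i, []).append((t, u))
        for i, events in by_item.items():
            events.sort()                              # earliest action first
            n = len(events)
            for rank, (_, u) in enumerate(events):
                A[u_idx[u], i_idx[i]] = credit(n - rank)

        expertise = np.ones(len(users))
        quality = np.ones(len(items))
        for _ in range(iterations):                    # mutual reinforcement
            expertise = A @ quality
            quality = A.T @ expertise
            expertise /= np.linalg.norm(expertise) or 1.0
            quality /= np.linalg.norm(quality) or 1.0

        return dict(zip(users, expertise)), dict(zip(items, quality))

In practice, the choice of credit function and the construction of the user-item matrix are the main tuning points, which is typically where a modification of the published algorithm would come in.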

Questions

How would you build a model to identify

  • high-quality inventory and

  • domain experts at scale?

Target Audience

Data science practitioners / students interested in applied machine learning. Those with experience in graphical models are encouraged to attend.

Use-Case 5: Valora

Demand forecasting in high-frequency retailing

Data Science at Valora is part of Valora's LAB, which focuses on building a superior shopping experience by developing new digital products and innovative services.

Understanding our data is key in this process. The LAB's analytics team accelerates business by providing analytical insights to everyone in the company.

 

A particular cornerstone among the many analytics tasks is demand forecasting to streamline business processes at the POS level.

Our use case concerns the Valora kkiosk brand, with more than a thousand POS in Switzerland. The goal is to develop new and better prediction models that further improve the efficiency of our supply chains and order systems. We look forward to discussing aspects of this journey with you in the session.

 

Questions

  • Choosing the right model(s) for demand forecasting, univariate vs. multivariate (see the sketch after this list).

  • Analysis of computational and maintenance costs.

  • Usability and scalability of a fully fledged, high-dimensional multivariate model.

  • Making POS-tailored forecasts that include POS specifics, assortment cubes, external factors, and so on.

  • Sourcing external data (e.g., weather data) in a consistent and persistent way, also with respect to granularity and data quality.
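
As a point of reference for the univariate vs. multivariate question above, here is a minimal Python sketch of the kind of univariate, per-POS baseline that more sophisticated models are often compared against: a seasonal average over the same weekday in recent weeks. The data layout, season length, and series name are assumptions; this is not Valora's actual model.

    import pandas as pd

    def seasonal_naive_forecast(sales: pd.Series, season: int = 7,
                                horizon: int = 7) -> pd.Series:
        """Forecast the next `horizon` days as the mean of recent sales on the
        same weekday (season length 7 for daily retail data)."""
        last_day = sales.index[-1]
        history = sales.iloc[-(season * 4):]          # last four weeks
        values = []
        for h in range(1, horizon + 1):
            target_day = last_day + pd.Timedelta(days=h)
            same_weekday = history[history.index.dayofweek == target_day.dayofweek]
            values.append(same_weekday.mean())
        idx = pd.date_range(last_day + pd.Timedelta(days=1),
                            periods=horizon, freq="D")
        return pd.Series(values, index=idx)

    # Hypothetical usage: one univariate series per POS/article combination,
    # indexed by calendar day.
    # forecast = seasonal_naive_forecast(daily_sales_for_one_pos)

Such a baseline is cheap to compute and maintain per POS; the open question in the session is when the added accuracy of a high-dimensional multivariate model justifies its computational and maintenance costs.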

 

Target audience

Data scientists and researchers (students and professionals) with interest or experience in the field of demand forecasting and model building.