Chris Lemmens

Unleashing the Power of Low-Code

Let’s be honest: speed, agility, and scalability are paramount to meeting business expectations. Low code has quickly become a powerful tool that complements full-code solutions, enabling both data teams and business users. This article summarizes the benefits of low code in data consultancy and explores its impact on the development cycle.

Fast Explorative Analytics (FEA): First to come to mind is speed of delivery, applying the fail-fast method. Low code empowers data-savvy users to quickly prototype, experiment, and iterate through various scenarios. By abstracting complex coding processes, low-code tools enable users to build and customize data pipelines visually, perform data transformations, and apply machine learning algorithms with minimal coding effort. This accelerates time-to-insight and facilitates fast explorative analytics for product development, trend analysis, and other data-driven decision-making processes.
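To make that concrete, the visual pipeline a low-code tool assembles often corresponds to only a handful of lines of conventional code. The sketch below is illustrative only: the file name, columns, and model choice are hypothetical and not tied to any specific platform.

```python
# Minimal sketch of the prototype-and-iterate loop that low-code tools abstract away.
# "sales.csv" and its columns are hypothetical placeholders.
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv("sales.csv")                                   # ingest
df["margin"] = df["revenue"] - df["cost"]                       # transform
model = LinearRegression().fit(df[["margin"]], df["units_sold"])  # quick baseline model
print(model.score(df[["margin"]], df["units_sold"]))            # fast feedback, then iterate
```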

Democratization and Scalability: Low code democratizes access to data and analytics by reducing the dependency on technical expertise, empowering both business users and data teams. Business users who possess domain knowledge but may lack coding skills can leverage low-code solutions to extract valuable insights from data without relying on data specialists. This empowers organizations to scale their data-driven initiatives and use resources efficiently.

Controlled Traceability for Auditability: Through enhanced traceability, low-code solutions support the auditability of decision-making and product development. Low code provides built-in mechanisms for version control, data lineage, and documentation. This traceability facilitates audits, compliance reporting, and the ability to trace decisions back to the specific data and analytics pipelines they were based on.
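As an illustration of what such traceability can look like under the hood (the field names and hashing choice below are assumptions, not taken from any particular low-code platform), each pipeline step could emit a small lineage record:

```python
# Illustrative lineage record: what ran, on which inputs, when, and under which code version.
import datetime
import hashlib
import json

def lineage_record(step_name: str, input_files: list[str], code_version: str) -> dict:
    return {
        "step": step_name,
        "inputs": {
            path: hashlib.sha256(open(path, "rb").read()).hexdigest()  # fingerprint of each input
            for path in input_files
        },
        "code_version": code_version,
        "executed_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }

# Hypothetical usage: record the inputs of a cleaning step for later audits.
print(json.dumps(lineage_record("clean_orders", ["orders.csv"], "v1.4.2"), indent=2))
```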

Potential Pitfalls and Mitigating Measures: While low code offers significant advantages, there are potential pitfalls to be mindful of:

Scalability Challenges: As the complexity and scale of the projects increase, low-code solutions may encounter limitations. It is crucial to regularly assess the scalability requirements and evaluate if additional custom code or full code solutions are needed to meet the growing demands.

Over-Reliance on Pre-Built Components: Low-code platforms often provide pre-built components and integrations. However, relying solely on these components may limit customization and flexibility. It is important to strike a balance between leveraging pre-built functionalities and having the ability to extend and customize as needed.

Security and Data Governance: While low-code platforms offer governance features, data security and privacy must still be diligently addressed. Organizations should ensure proper access controls, encryption, and compliance measures are in place to safeguard sensitive data and comply with regulatory standards.

Low code has proven itself to be a valuable complement to full-code solutions in the data field. It facilitates fast explorative analytics, empowers business users, and ensures controlled traceability for auditability. By leveraging low-code platforms, organizations can accelerate their data-driven initiatives, democratize data access, and scale their analytical capabilities.

Chris Lemmens

How to monetize data by integrating data teams

How important is the structure of data teams in the organisation?

Picture an organisation that wants to become more data-driven. It implements an updated strategy by hiring data specialists in business intelligence, data architecture, data engineering & data science. The organisation does not yet have a clear vision on how to structure & manage this new field of specialists. Small data teams pop up within the various business departments & IT. They work in close collaboration with business experts to create an impact & an appetite for data-driven change. As the number of data specialists in the organisation grows, it creates a need for standardisation, quality control, knowledge sharing, monitoring and management.

Sound familiar? Organisations worldwide are in the process of taking this next step. In this blog, we will discuss how to structure & integrate teams of data specialists into the organisation. We will base this discussion on Accenture’s classification and AltexSoft’s expansion of it.

Two elements are essential when discussing the structure and management of data teams.


The element of control

Customers and the organisation need work to be delivered predictably, with quality under control. In other words, tooling, methods & processes need to be standardised among data specialists. Output can be planned & communicated, delivery of output is recognisable & easy to use, and assurances can be given on the quality of work through adherence to standards. Adding to the control element are the practices of sharing knowledge & a code base between specialists, and centralised monitoring & management.


The element of relevance

Data specialists rely on domain expertise to deliver output that is relevant to the organisation and its customers. Domain expertise is gained by working closely with business experts within and outside the organisation. Expertise building is slow & specific to the domain. Relevance and speed of delivery go hand in hand: data specialists create maximum value when working closely & continuously with business experts. Adding to the element of relevance are the practices of customer-centricity, value-driven development, and adaptability to the given situation in tooling, methods & processes.

The elements of control & relevance determine the success of the next step in data-driven change. The structure & integration of data teams depends on the amount of control and relevance required by the organisation. We will discuss three common approaches for structuring teams.


Decentralised approach

This approach leverages the relevance element to the fullest. In the decentralised approach, specialists work in cross-functional teams (product, functional or business) within the business departments. This close collaboration allows for flexibility in the process & delivery of output. Communication lines within the cross-functional teams are short, and business knowledge among the data specialists is generally high. Central coordination & control of tooling, methods & processes is minimal, as expertise & people are scattered across the organisation. Organisations implementing this approach may have less need for control or, in many cases, are just starting data-driven work and therefore lack the need for elaborate control measures.

Centralised approach

As the name suggests, this approach centralises data expertise in one or more teams and leverages the control element. Data specialists work closely together, enabling fast knowledge sharing. Because the data specialists work in functional teams, management (including resource management, prioritisation & funding) is simplified. Centralisation encourages career growth within functional teams. Standardisation of delivery & results is common, and monitoring of these principles is done more efficiently. Communication lines with business experts & clients are longer, creating the danger of the data teams turning into a support team. As team members work on a project-by-project basis, business expertise remains dispersed within the functional data teams, adding lead time to projects. Organisations implementing this approach may have a high need for control & standardisation. Furthermore, as work is tightly prioritised & resource management is coordinated, the centralised approach helps organisations with strict budgets take steps towards data-driven work.

Center of Excellence (CoE) approach

The CoE is a balanced approach that tries to optimise both the elements of control & relevance. In this approach, local pockets of data specialists work within business teams (cross-functional), while a Center of Excellence team enables knowledge sharing, coordination and standardisation. The data-savvy professionals aligned with the business teams build business expertise and have efficient communication lines within the business departments. The CoE team provides a way for management to coordinate and prioritise (cross-)department tasks and developments by enabling CoE team members to support the local data specialists (commonly called the SWAT technique). Furthermore, the CoE team is tasked with standardising delivery & results across the organisation. Organisations implementing the CoE approach need local expertise within units to support daily operations and a more coordinated team to support & standardise. As data specialists work in both the business departments and the Center of Excellence, the organisation needs to support both groups financially. A higher level of data-driven maturity is required to justify the SWAT-like approach for high-priority data projects.

In conclusion, control & relevance are two critical elements to consider when deciding how to integrate & structure data teams within an organisation. We elaborated on three common approaches: centralised, decentralised and the Center of Excellence, each balancing control & relevance differently. Which structure will work for your organisation depends on the current level of maturity and the need for either control or relevance.

Chris Lemmens

Synthetic data

The concept of synthetic data generation is the following: take an original dataset that is based on actual events and create a new, artificial dataset with similar statistical properties from it. These similar properties allow for the same statistical conclusions as if the original dataset had been used.

Generating synthetic data increases the amount of available data by adding slightly modified copies of already existing data or newly created synthetic records derived from it. It creates new, representative data that can be processed into output that plausibly could have been drawn from the original dataset.

Synthetic data is created through the use of generative models: unsupervised machine learning models that automatically discover and learn the regularities and patterns of the original data.
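As a deliberately simple illustration of this idea (a Gaussian mixture on made-up numeric data, rather than a production-grade synthesiser), one can fit a generative model to the original records and sample a new, artificial dataset with similar statistical properties:

```python
# Sketch: fit a simple generative model to (stand-in) original data and sample synthetic rows.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
original = rng.normal(loc=[50.0, 3.2], scale=[12.0, 0.8], size=(1000, 2))  # stand-in for real data

# Unsupervised learning of the joint distribution of the original data.
model = GaussianMixture(n_components=3, random_state=0).fit(original)

# Draw a new, artificial dataset with similar statistical properties.
synthetic, _ = model.sample(1000)

print(original.mean(axis=0), synthetic.mean(axis=0))   # means should be close
print(np.cov(original.T))                              # covariance structure ...
print(np.cov(synthetic.T))                             # ... approximately preserved
```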

Why is synthetic data important now?

With the rise of Artificial Intelligence (AI) and Machine Learning, the need for large and rich (test & training) datasets is increasing rapidly. This is because AI and Machine Learning models are trained on enormous amounts of data, which are often difficult to obtain or generate without synthetic data. In most sectors, large datasets are not yet available at scale. Think of health, autonomous vehicle sensors, image recognition, and financial services data. By generating synthetic data, more and more data will become available. At the same time, the consistency and availability of large datasets are a solid foundation for a mature Development/Test/Acceptance/Production (DTAP) process, which is becoming a standard approach for data products & outputs.

Existing initiatives on federated AI (where data availability is increased by keeping the data at the source and sending the AI model to the source to run the algorithms there) have proven to be complex due to differences between (the quality of) these data sources. In other words, data synthetization achieves more reliability and consistency than federated AI.

An additional benefit of generating synthetic data is compliance with privacy legislation. Synthesized data is less directly referable (though not zero) to an identified or identifiable person. This increases the opportunities to use data: enabling data transfers to cross-border cloud servers, extending data sharing with trusted third parties, and selling data to customers & partners.

Relevant considerations

Privacy

Synthetization increases data privacy but is not in itself an assurance of compliance with privacy regulations.

A good synthetization solution will:

  • include multiple data transformation techniques (e.g., data aggregation);

  • remove potentially sensitive data;

  • include ‘noise’ (randomization added to datasets);

  • perform manual stress testing.

Companies must realize that even with these techniques, additional measures such as anonymization can still be relevant.
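A minimal sketch of two of these techniques, dropping a direct identifier and adding Laplace-distributed noise to a numeric column, is shown below. The column names and noise scale are illustrative assumptions, and real solutions combine far more measures:

```python
# Sketch: remove a direct identifier and perturb a numeric column with random noise.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
df = pd.DataFrame({
    "customer_id": [101, 102, 103],          # hypothetical direct identifier
    "age": [34, 58, 41],
    "monthly_spend": [220.0, 515.5, 310.2],
})

# Remove potentially sensitive / directly identifying columns.
safe = df.drop(columns=["customer_id"])

# Add 'noise': Laplace-distributed perturbation of the numeric values.
noise_scale = 10.0  # illustrative; tune to the desired privacy/utility trade-off
safe["monthly_spend"] = safe["monthly_spend"] + rng.laplace(0.0, noise_scale, size=len(safe))

print(safe)
```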

Outliers

Outliers may be missing: synthetic data mimics real-world data, but it is not an exact replica of it. As a result, synthetic data may omit or distort some of the original data's outliers. Yet outliers are important for training & test data.

Quality

The quality of synthetic data depends on the quality of the data source. This should be taken into account when working with synthetic data.

Black-box

Although data synthetization is taking centre stage in the current hype cycle, it is still in the pioneering phase for most companies. This means that at this stage, the full effect of unsupervised data generation is unclear. In other words, it is data generated by machine learning for machine learning: a potential double black box. Companies need to build evaluation systems for the quality of synthetic datasets. As the use of synthetic data methods increases, an assessment of the quality of their output will be required. A trusted synthetization solution must always include good information on the origin of the dataset, its intended purposes, its requirements for usage, a data quality indication, a data diversity indication, a description of (potential) bias, and risk descriptions including mitigating measures based on a risk evaluation framework.
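A starting point for such an evaluation system could be a simple column-wise comparison between the original and synthetic data, as sketched below. The statistics chosen here (mean, standard deviation, Kolmogorov-Smirnov distance) are illustrative assumptions; a full framework would also cover multivariate structure, diversity and bias:

```python
# Sketch: compare one numeric column of original vs. synthetic data on basic statistics.
import numpy as np
from scipy.stats import ks_2samp

def column_quality_report(original: np.ndarray, synthetic: np.ndarray) -> dict:
    result = ks_2samp(original, synthetic)  # two-sample distributional similarity test
    return {
        "mean_diff": float(abs(original.mean() - synthetic.mean())),
        "std_diff": float(abs(original.std() - synthetic.std())),
        "ks_statistic": float(result.statistic),   # closer to 0 = more similar
        "ks_p_value": float(result.pvalue),
    }

# Hypothetical usage with stand-in data.
rng = np.random.default_rng(1)
real_column = rng.normal(100, 15, size=5000)
synthetic_column = rng.normal(101, 16, size=5000)
print(column_quality_report(real_column, synthetic_column))
```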

Synthetic data is a new phenomenon for most digital companies. Understanding its potential and risks will allow you to keep up with the latest developments and stay ahead of your competition, or even your clients!
