With more than 15 million members and offices in Hamburg, Munich, Barcelona, Vienna, Zurich and Porto, XING is the leading online business network for professional contacts in German-speaking countries. 90 percent of page views on XING come from the DACH region (Germany, Austria and Switzerland). Professionals from all industries use XING to network with each other and search for jobs, projects, cooperation partners, expert opinions and business ideas. XING members interact in around 90,000 groups or get to know each other personally at one of over 130,000 networking events each year. The company was founded in Hamburg in 2003, has been listed on the stock exchange since 2006, and has been listed on the TecDAX since September 2011.
Xing SE
Talend Big Data
Client
XING SE
Industry: Social Media
Headquarters: Hamburg
Consulting topics
Results
- Processing of up to 50 million events per day for the marketing automation application "Braze"
- Provision of data within 10 minutes of its creation
- Easier GDPR compliance for over 15 million users who entrust their personal data to XING
Better system connectivity for better business networking
Processing large amounts of data in a time-critical event streaming environment
XING offers a decisive advantage when it comes to providing local news and job information. Because XING’s services are based on the collection, storage and processing of high-quality, timely data, the company relies on being able to efficiently integrate data from a wide variety of sources into its own systems.
“The systems we used to use forced us to write scripts for data integration. It was very difficult to keep track of where data sets were generated,” says Mustafa Engin Sözer, Senior Business Intelligence & Big Data Architect at XING. “We knew we wanted to do it all without scripting if possible, centralize our metadata, and simplify data integration. We were also looking for ways to improve data quality.” Sözer also talks about other data challenges XING faced: “Some data sources provide us with a large number of small files that were technically impossible to process separately. That’s why we were looking for an efficient approach to process this data to make it available in the central data warehouse within 10 minutes of its creation.”
Sözer adds that file formats were also problematic: the data was in Apache Avro format, which traditional data processing tools typically do not support. XING therefore needed a solution that would work with a variety of file formats and support interoperability within the data processing toolchain. For example, XING uses Apache Drill to push big data workloads down to its big data clusters and then continues with ELT-style data processing. While Avro files can be processed with Apache Spark in near real time, maintaining Spark jobs requires programming skills as well as a basic understanding of distributed computing, which less technically oriented business developers may not have. For this reason, XING wanted to reduce the complexity of its integration processes so that stakeholders and developers with a less technical background could manage the integrations.
MUSTAFA ENGIN SÖZER
Senior Business Intelligence and Big Data Architect, XING SE
Why Talend?
XING evaluated several solutions for its data integration needs and then chose Talend for several reasons: “Among the features that won us over with Talend were its open-source approach, the wide range of connectors, and the fact that it is Java-based and offers a familiar and easy-to-use Eclipse-based interface,” says Sözer. “We also liked the metadata management and automated documentation features, the quick adoption of new technologies, and the ability to implement new use cases.”
XING now uses Talend as a bridge between a local 150 TB MapR-DB NoSQL database and a 60 TB Exasol database used for analytics. “We set up Apache Drill with Talend on top of MapR to build a data processing pipeline that converts a huge number of Avro files into CSV format,” Sözer explains. “This means we only have to write a single query that we can use to read thousands of files. This drastically reduces the processing overhead.”
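The single-query idea Sözer describes can be sketched as follows. This is an illustration only, not XING's actual pipeline: it builds the two Apache Drill statements that would read every Avro file under one directory and write the result as CSV. The workspace `dfs.tmp`, the directory path, the table name and the column names are hypothetical, and the sketch assumes a Drill `dfs` storage plugin with the Avro format enabled.

```python
def build_avro_to_csv_statements(source_dir, target_table, columns):
    """Return Drill SQL statements that convert a directory of Avro files to CSV.

    Drill treats a directory as a single table, so one SELECT covers every
    file beneath it instead of one job per file.
    """
    if not columns:
        raise ValueError("at least one column is required")
    # Backticks quote identifiers in Drill SQL.
    column_list = ", ".join(f"`{name}`" for name in columns)
    return [
        # Ask Drill to write CTAS output as CSV instead of the default Parquet.
        "ALTER SESSION SET `store.format` = 'csv'",
        # One query reads all Avro files under source_dir.
        f"CREATE TABLE {target_table} AS "
        f"SELECT {column_list} FROM dfs.`{source_dir}`",
    ]


# Hypothetical names, for illustration only.
statements = build_avro_to_csv_statements(
    source_dir="/data/braze/events",
    target_table="dfs.tmp.`braze_events_csv`",
    columns=["user_id", "event_type", "event_time"],
)
for statement in statements:
    print(statement)
```

Because the file count does not appear anywhere in the query, the same two statements apply whether the directory holds ten files or ten thousand.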
Using Talend for this particular use case also allows XING to keep track of metadata, giving it a global overview of all its data assets, something that would not have been easy to achieve with a plain Spark job. Sözer adds that Talend also helps with dynamic data extraction. With Talend, XING can easily retrieve specific columns from the data in the MapR file system, apply business or technical transformations to them, and then store them in Exasol. There, they are subsequently used by MicroStrategy to generate reports. This functionality also allows data extraction to adapt to changes and ensures that only relevant columns are moved to Exasol, while all raw data can remain in the MapR file system or database for further processing, with an emphasis on batch analysis.
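The column-level extraction described above can be illustrated with a small sketch. It is plain Python rather than a Talend job, and the field names and transformations are hypothetical. It only shows the principle that a configurable column list decides what moves on to the warehouse while the raw record stays untouched.

```python
from datetime import datetime, timezone

# Hypothetical mapping: target column -> (source field, transformation).
# Changing this mapping changes what is extracted; the code stays the same.
COLUMN_RULES = {
    "user_id": ("user_id", str),
    "event_type": ("name", str.lower),
    "event_time": (
        "time",
        lambda ts: datetime.fromtimestamp(ts, tz=timezone.utc).isoformat(),
    ),
}


def extract_row(raw_record, rules=COLUMN_RULES):
    """Project a raw record onto the configured columns and transform them."""
    row = {}
    for target, (source, transform) in rules.items():
        value = raw_record.get(source)
        # Missing source fields become None instead of failing the load.
        row[target] = transform(value) if value is not None else None
    return row


raw = {"user_id": 42, "name": "EMAIL_OPEN", "time": 0, "payload": {"x": 1}}
row = extract_row(raw)
```

Here the `payload` field is never copied into `row`, which mirrors the idea that only relevant columns reach the analytics database.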
“Talend helps us simplify the entire integration. The data processing activity becomes more understandable and easier to track – all while maintaining high performance,” Sözer says. “This allows us to more easily involve actors with less technical background in the process.”
Why cimt?
“Working with the Talend experts from cimt ag, the migration from the previous solutions to Talend went smoothly, as did establishing the standards and frameworks needed for a healthy Talend ETL environment,” says Sözer. Rouven Homann, Management Partner at cimt ag, adds, “Our combined experience and knowledge in both technologies, Talend and Exasol, enabled us to develop a custom Exasol SCD component (tExaSCDELT) for XING based on ELT for Talend. We are very pleased that XING was willing to make this component available to the Talend open source community.”
Connect professionals, increase productivity and success
The driving force behind the integration of many different systems and the centralization of data processing was the goal of analyzing marketing data in near real time and providing meaningful information about the effectiveness of campaigns. The real-time marketing response data, provided by marketing automation provider Braze, consists of thousands of small files in Avro format cached on Amazon S3. “We use Talend to connect systems and deliver the data to the central data warehouse as quickly as possible so it can be used to create analytics reports, support operational marketing and targeted campaigns, and monitor campaign performance,” Sözer says.
The processing capabilities are impressive. According to Sözer, XING is able to process 2 million events per hour using this approach. At peak times, this adds up to 50 million events per day. Key benefits of XING’s new integration architecture include better business traceability, as data is consolidated on a single platform, and also the ability to run analyses and reports more efficiently, thus optimizing decision-making within the company.
“In addition,” says Sözer, “maintenance costs have decreased – while productivity and efficiency have increased. With Talend, we gain insights and can measure our performance against metrics. For example, we can now analyze data faster and more accurately, and extract metrics and KPIs that are used across XING to implement business strategies. We also have better statistics on the number of daily and weekly active users, new job listings, the number of users who clicked on specific offers, and more.”
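As a quick plausibility check, the figures of 2 million events per hour and up to 50 million events per day are consistent with each other. The short calculation below assumes, as a simplification, that the hourly rate is sustained for a full 24 hours.

```python
EVENTS_PER_HOUR = 2_000_000  # rate quoted by Sözer

events_per_day = EVENTS_PER_HOUR * 24        # sustained over 24 hours
events_per_second = EVENTS_PER_HOUR / 3600   # average rate per second

print(f"{events_per_day:,} events per day")
print(f"{events_per_second:.0f} events per second on average")
```

This gives 48 million events per day, in line with the reported peak of up to 50 million, and an average of roughly 556 events per second.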
More than 15 million users entrust their personal data to XING. The company therefore has a special responsibility to its customers, as they all expect the social network to keep their data secure and keep sensitive information confidential. Talend also helps XING meet strict standards for corporate governance, data protection and the GDPR. “Online business networking is based on trust,” Sözer says. “Therefore, it is critical for compliance to store metadata centrally and keep it in view. With Talend, we centralize all source and target systems. This allows us to analyze data and find out which data sets are relevant to which requirement. We can also identify whether or not personal data is involved. And we have full control of our data and metadata.”