In this article, we’re going to examine the world of data science. We’ll untangle the often-used (and misused) terms and expressions, and look at data science courses and data scientist salaries.
What Even is Data Science, Business Intelligence and all the Other Expressions?
You might find all these expressions a bit too much to get your head around. Data scientist, data engineer, data analyst, business analyst, deep learning engineer, machine learning engineer… Why are there so many tasks in the world of data science, and what do they entail exactly? What technologies do they use and what are their purposes? This is what we’ll touch on first.
Most languages use English terms because it is such a new field. You will find these in job advertisements, professional forums and courses.
BI vs Data Science
Business intelligence (or BI for short) and data science are the two biggest areas of professional data analysis. Both use large amounts of data to determine trends and patterns, but there are many differences between them. They require different skills and work from different perspectives: business intelligence focuses on business life and the present, while data science has a broader range and focuses mostly on the future.
Business intelligence (BI)
Business intelligence is the sum of strategies and technologies that an organization uses to collect and analyze large amounts of data. Its goal is to support strategic decisions with the trends and correlations it uncovers.
For example, the small emoticon feedback you can give on every assignment of CodeBerry’s online programming courses is a form of business intelligence. It shows how learners liked the assignments, and the collected feedback can be summarized so that the curriculum developers can improve the assignments that confused many people.
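The summarizing step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the assignment names, the rating labels, and the 50% threshold are all made up, and nothing here reflects how CodeBerry actually stores or processes its feedback.

```python
from collections import Counter, defaultdict

# Hypothetical feedback records: (assignment name, emoticon rating).
# Names and rating labels are invented for this example.
feedback = [
    ("loops-1", "happy"),
    ("loops-1", "happy"),
    ("loops-1", "confused"),
    ("functions-2", "confused"),
    ("functions-2", "confused"),
    ("functions-2", "happy"),
    ("functions-2", "confused"),
]

def summarize(records):
    """Count how often each rating was given per assignment."""
    summary = defaultdict(Counter)
    for assignment, rating in records:
        summary[assignment][rating] += 1
    return summary

def confusing_assignments(summary, threshold=0.5):
    """Return assignments whose share of 'confused' ratings exceeds the threshold."""
    flagged = []
    for assignment, counts in summary.items():
        total = sum(counts.values())
        if counts["confused"] / total > threshold:
            flagged.append(assignment)
    return sorted(flagged)

summary = summarize(feedback)
print(confusing_assignments(summary))
```

With this sample data, only the assignment where more than half of the ratings are "confused" is flagged for the curriculum developers.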
Data science is a complex field that combines mathematics, statistics, machine learning, programming, presentation skills and other disciplines. It creates models, and its goal is to make predictions based on the information extracted from data.
Data science is one of the main fields that support artificial intelligence development. It is also present in many other areas: scientific innovation, research and development, social and health sciences, and education.
Team Members of BI and Data Science Groups
Data scientists are experts who use scientific methods to analyze large amounts of data and extract information from it, in order to help solve complex problems.
They use their (usually mathematical, statistical, and programming) skills to clean and convert data, and then build models from it, mostly with machine learning methods, that can be used to make predictions. Besides a good knowledge of programming and statistics, they also need to quickly understand the field that will use the results of their analysis.
They also need to be able to visualize and present their results, so good communication skills are required as well.
Data analysts only focus on analyzing data, which means their area of expertise is narrower than that of data scientists.
When browsing data analyst job offers, you will find that some require strong mathematical, statistical, and programming skills, while others state that data analysts mostly create SQL queries, analyses, and visualizations, meaning that strong mathematics or programming skills are not needed.
The data analyst position is similar to the data scientist position but usually covers only part of data science. (For example, data analysts sometimes only create queries or visualizations, or they may not have to do needs assessments or communicate with clients.)
Data engineers create and maintain the infrastructure for storing and handling data. They write code that helps get the data from the servers to the data analysts who will process it. They also handle systems that store extremely large amounts of data and have an important role in preprocessing data, which means “cleaning” the data up and formatting it for analysis.
The task of business analysts involves needs assessment and keeping in contact with clients, decision-makers, and the IT team. In order to do this, they must have excellent communication and presentation skills.
In a nutshell, they are the ones who professionally translate the needs of clients to the language of information technology, and translate the feedback from the IT team for the clients as well.
Business analysts also analyze business data statistically. They explore correlations in the data to support corporate development, product and service development, and financial decisions.
Machine Learning Engineer
Machine learning engineers are specialists in machine learning: they set up machine learning systems and are skilled in algorithms and in preparing the data used for machine learning. They build prototype models into production systems and maintain them.
What you must know about data
What kind of data are we talking about in the context of data science? Where do they get it from, how do they store and process it? What tasks need to be done in order to do so?
Big data usually means extremely large amounts of data. The term often describes data that “can no longer be stored and managed on a normal desktop computer,” and that traditional data management tools (like Excel or SQL databases) cannot handle.
Big data refers not only to the amount of data. There are also quality requirements that the data set needs to meet: processing speed, variability, accuracy, reliability, and use value all decide whether we can call the data we are using big data.
Where do the large amounts of data that data scientists work with come from? Countless tools create large amounts of data: smartwatches, security systems, intelligent household appliances, bank transfers, purchase data, meteorological measurements, and so on ad infinitum. Also, a single user can generate large amounts of data with a single smartphone: calls, emails, messages, photos, searches, music and so on.
Raw data is data that hasn’t been processed yet. Data engineers and data scientists receive data in formats that are not suitable for analysis: it has inconsistencies, missing values, duplicates, and even typos. This data needs to go through preprocessing (cleaning, converting, filtering) before it can be used. This is called data preparation or data wrangling.
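To make cleaning, converting, and filtering more concrete, here is a minimal sketch in plain Python. The records and field names are hypothetical, and real projects usually rely on dedicated tools, so treat this as an illustration of the idea and not a recommended pipeline.

```python
# Hypothetical raw records with typical problems: stray whitespace,
# inconsistent capitalization, a missing value, and a duplicate.
raw_rows = [
    {"name": " Alice ", "city": "budapest", "age": "34"},
    {"name": "Bob", "city": "Vienna", "age": ""},
    {"name": "alice", "city": "Budapest", "age": "34"},
    {"name": "Carol", "city": " vienna", "age": "29"},
]

def clean(rows):
    """Normalize text fields, convert ages, drop incomplete rows and duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        # Cleaning: trim whitespace and unify capitalization.
        name = row["name"].strip().title()
        city = row["city"].strip().title()
        age_text = row["age"].strip()
        # Filtering: skip rows with a missing or non-numeric age.
        if not age_text.isdigit():
            continue
        # Converting: turn the age from text into a number.
        record = (name, city, int(age_text))
        # Skip rows that duplicate an earlier record after normalization.
        if record in seen:
            continue
        seen.add(record)
        cleaned.append({"name": name, "city": city, "age": record[2]})
    return cleaned

print(clean(raw_rows))
```

Of the four raw rows, two survive: the row with the missing age is filtered out, and the second "alice" row turns out to be a duplicate once the text is normalized.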
Technologies that use data science
Artificial intelligence (AI for short)
Artificial intelligence is a field of science where people research and develop computers capable of making decisions based on large amounts of data. They use machine learning to achieve this.
Artificial intelligence controls self-driving cars (like Tesla Autopilot and Waymo), robots that are aware of their surroundings (like the Pepper robot and Boston Dynamics robots), and digital assistants that are capable of voice recognition and speech interpretation, like Siri and Alexa.
Machine learning (ML for short)
Machine learning is a process in which a computer uses algorithms to find patterns in data sets and makes predictions and decisions based on them. The system learns from past examples, and once it has seen enough input data, it can return a sensible output even for an input it has never seen before.
For example, if we have a flashlight that we want to control with our voice, we can use machine learning algorithms to teach the computer to switch on the light whenever we say “light”. As the number of sound samples grows, the control mechanism gets better and better and recognizes the word “light” more reliably. After a while, it can even pick the word out of full sentences, for example, “I need some lamplight.”
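The idea of learning from labeled past examples can be sketched with one of the simplest machine learning methods, the nearest-neighbor classifier. Everything below is an assumption made for illustration: real speech recognition works on far richer audio features, while here each sound sample is reduced to two made-up numbers.

```python
import math

# Hypothetical training examples: each sound sample is reduced to two
# made-up numeric features and labeled by a human.
training = [
    ((0.9, 0.8), "light"),
    ((0.8, 0.9), "light"),
    ((0.2, 0.1), "other"),
    ((0.1, 0.3), "other"),
]

def predict(features, examples):
    """Return the label of the closest training example (1-nearest neighbor)."""
    nearest = min(examples, key=lambda ex: math.dist(features, ex[0]))
    return nearest[1]

def should_switch_on(features, examples):
    """Switch the flashlight on only when the sample is classified as 'light'."""
    return predict(features, examples) == "light"

# Neither of these inputs appears in the training data.
print(predict((0.85, 0.7), training))
print(should_switch_on((0.3, 0.2), training))
```

Adding more labeled samples gives the classifier more neighbors to compare against, which mirrors the point above that recognition improves as the number of sound samples grows.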
Deep learning, DL for short
Deep learning is a type of machine learning. It is among the most complex machine learning approaches, and it uses artificial neural networks that are loosely inspired by the human brain. Deep learning requires more data and time, but it can give more precise results and learn more complex correlations than other types of machine learning.
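A tiny example can hint at why stacking layers of artificial neurons helps with more complex correlations. The sketch below computes "exclusive or" (true when exactly one input is on), which a single neuron of this kind cannot do but two layers can. Note the assumptions: the weights are hand-picked for illustration instead of being learned from data, and real deep learning models have many more layers and neurons.

```python
def step(x):
    """Threshold activation: 1 if the input is positive, else 0."""
    return 1 if x > 0 else 0

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    return step(sum(i * w for i, w in zip(inputs, weights)) + bias)

def xor_network(x1, x2):
    # Hidden layer: weights are hand-picked for illustration, not learned.
    h_or = neuron((x1, x2), (1.0, 1.0), -0.5)   # fires if at least one input is 1
    h_and = neuron((x1, x2), (1.0, 1.0), -1.5)  # fires only if both inputs are 1
    # Output layer combines the hidden neurons: "or, but not and".
    return neuron((h_or, h_and), (1.0, -1.0), -0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```

In an actual deep learning system, a training procedure adjusts the weights and biases automatically from examples, which is where the large data and time requirements come from.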
Uses of data science
Data science is a varied field, and it has many uses in business and in research and development. It can even be used by private individuals or in a social context.
Data science in medicine
Researchers use big data in drug development and in genetic and genomic research. Image recognition technology is used in diagnostics to analyze X-ray, MRI, and CT scans. Work is also under way on patient care with virtual assistants and on disease prediction. These systems can also generate customized healthcare programs.
Data science and climate research
Data science is used to create weather prediction models, to model and research climate change, to analyze the effects of global warming, to predict natural disasters, to monitor pollution and the air quality of cities, and so on.
Data science, search engines, and recommendation systems
Data science technologies make targeted ads and interest-based recommendations possible on YouTube, Netflix, and Spotify.
Search engines also process large amounts of data. Dating apps like Tinder use data to find you a perfect match.
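As a rough illustration of interest-based recommendations, the sketch below finds the user whose ratings are most similar to yours and suggests titles that this user rated but you have not seen. This is a textbook-style simplification with made-up users, titles, and ratings. It is not a description of how YouTube, Netflix, Spotify, or Tinder work internally.

```python
import math

# Hypothetical ratings (1-5) that users gave to made-up titles.
ratings = {
    "anna": {"Space Saga": 5, "Robot Wars": 4, "Love Story": 1},
    "ben": {"Space Saga": 4, "Robot Wars": 5, "Alien Planet": 5},
    "cara": {"Love Story": 5, "Wedding Bells": 4, "Space Saga": 1},
}

def cosine_similarity(a, b):
    """Cosine similarity computed over the titles both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(a[t] ** 2 for t in shared))
    norm_b = math.sqrt(sum(b[t] ** 2 for t in shared))
    return dot / (norm_a * norm_b)

def recommend(user, all_ratings):
    """Suggest unseen titles rated by the most similar other user."""
    others = [u for u in all_ratings if u != user]
    best = max(others, key=lambda u: cosine_similarity(all_ratings[user], all_ratings[u]))
    seen = set(all_ratings[user])
    return sorted(t for t in all_ratings[best] if t not in seen)

print(recommend("anna", ratings))
```

Here Anna's tastes are closest to Ben's, so she is offered the one title Ben rated that she has not seen yet.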
Data science and route planning
Route-planning applications such as those of Uber and Google rely on data science to find optimal routes. Coordinating traffic lights, measuring traffic, and monitoring traffic jams all produce a lot of valuable data. Giant companies like Amazon and eBay also use data science to optimize logistics.
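Finding an optimal route can be illustrated with Dijkstra's algorithm, a classic textbook method for the shortest path in a network. The road network and travel times below are invented, and this sketch makes no claim about what Uber or Google actually run, since production route planners also account for live traffic and many other factors.

```python
import heapq

# Hypothetical road network: travel times in minutes between made-up places.
roads = {
    "home": {"market": 7, "station": 2},
    "station": {"market": 3, "office": 8},
    "market": {"office": 4},
    "office": {},
}

def shortest_time(graph, start, goal):
    """Dijkstra's algorithm: smallest total travel time from start to goal."""
    best = {start: 0}
    queue = [(0, start)]
    while queue:
        time, node = heapq.heappop(queue)
        if node == goal:
            return time
        if time > best.get(node, float("inf")):
            continue  # stale queue entry, a faster way to this node is known
        for neighbor, cost in graph[node].items():
            new_time = time + cost
            if new_time < best.get(neighbor, float("inf")):
                best[neighbor] = new_time
                heapq.heappush(queue, (new_time, neighbor))
    return None  # goal is not reachable from start

print(shortest_time(roads, "home", "office"))
```

In this sample network the fastest trip from home to the office goes through the station and the market, not along the most direct-looking road.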
Data science and language technology
Developers face a serious challenge with human language recognition, interpretation and reproduction. There are many areas where human languages and machines meet, and data science plays a huge role in handling this. Speech recognition and automatic captioning are some of these areas.
Automatic machine translation is developing lightning-fast. We only need to compare what Google’s translations were a few years ago with what they are now. Of course, they are far from perfect, but the translations already sound more natural and have better grammar than they did a few years ago.
Virtual assistants like Apple’s Siri or Amazon’s Alexa are also an exciting topic. They need to recognize and interpret human speech and give appropriate answers, so they rely on machine learning and other data science techniques.
Creating accessible reading software for the visually impaired and developing more natural human voices and intonations for robots are also goals pursued with dynamically evolving data science technologies.
Data science and augmented reality
Augmented reality is a set of technologies that overlays digital content on a live camera view in real time. We can, for example, try on clothes without physically putting them on, see how a piece of IKEA furniture might look in our apartment, or put dog ears on ourselves in our selfies. The image recognition and interpretation required to do these things are all based on data science.
Machine vision is a technology used to interpret surroundings, objects, and obstacles. This also requires data-based learning.
Data science is also needed when creating VR experiences.
Data science and image recognition
Identifying and categorizing image content, image-based search, visual search engines: we couldn’t do any of it without machine learning and the use of neural networks.
There are other useful and important technologies as well, such as face identification and face recognition in general, which are used in forensics and security technology. There is also emotion recognition, which can help us monitor the emotional state of psychiatric patients or astronauts. These technologies are currently being developed.
What does a data scientist actually do?
Needs assessment, orientation, consulting the client
Data scientists have to thoroughly assess needs, know exactly what their research is going to be used for, and gain an in-depth knowledge of the project’s features.
Business-centric thinking is usually unavoidable, as clients typically have business goals (like wanting to sell more of a product or minimizing losses). When setting a project’s goals, the client’s viewpoint needs to be understood.
Data preparation, cleaning and transforming
Once a data scientist knows the needs and the goals of data analysis, data collection or data “harvest” can begin from different sources.
Big data cannot be managed with ordinary data storage software (like Excel, MySQL, etc.). We need software designed specifically for working with big data (like Cassandra, Hadoop or Spark).
Raw data can have missing elements, typos and duplicates, and it can contain inconsistent data types. This means that the first and most time-consuming step is the complex process of cleaning, followed by conversion. For this, data scientists need programming skills and software tools like Talend and Informatica.
Data engineers can be a great help with collecting and preprocessing data since they are specialists in this field.
Then comes data analysis, using mathematical and statistical methods.
Python and R (as well as some other languages like MATLAB and Scala) and statistics software like SPSS and SAS are used for creating models.
In this phase, analysts are looking for trends using many different techniques like classification, clustering, regression, decision trees, machine learning, and neural networks.
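Of the techniques listed above, regression is the easiest to show in a few lines. The sketch below fits a straight line to data with ordinary least squares and uses it to predict a new value. The numbers are made up, and in practice analysts would normally use an established statistics library instead of writing the formula by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Use the fitted line to estimate y for a new x."""
    return slope * x + intercept

# Hypothetical data: advertising spend vs. units sold (made-up numbers).
spend = [1, 2, 3, 4, 5]
sales = [12, 14, 16, 18, 20]

slope, intercept = fit_line(spend, sales)
print(slope, intercept, predict(6, slope, intercept))
```

The fitted slope describes the trend in the data (how much sales change per unit of spend), and the prediction for an unseen value is the kind of forward-looking result that models are built for.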
Data visualization and presentation
Incorporating models into a corporate environment
After presenting the data, another process may begin, or a process might change in the life of the company. If a data scientist’s work can be incorporated into the operation of a corporation, the data scientist usually gets to be part of a team that completes this task.
What do data scientists need to know?
Data science is the intersection of statistics, programming and the field it’s being used in. Data scientists need to be able to see the process as a whole, from needs assessment to presenting the results, and they need good communication, problem-solving, and teamwork skills. A practical approach, business skills, and insight into the area of use are also important.
Data scientists also need to know software for data cleaning, processing, analysis and visualization, as well as Python or R programming.
Not all positions require a university degree, but data scientists usually must have at least a BSc or MSc degree in a related field of study (mathematics, IT, engineering) to get a job. This is because the mathematical and statistical skills needed for this job are mostly acquired at universities. Data science is often a specialization within these fields.
Data scientist salaries
Data science jobs are usually extremely well-paid, even compared to the rest of IT. This is no coincidence: as a data scientist, you have complex tasks, so you need to be versatile and have broad knowledge to complete them.
There are three main areas of data scientist jobs: developer, infrastructure, and manager.
Of course, salaries differ across these three areas. You can earn the most as a manager and the least in infrastructure.
Where can I learn this? – Data Science courses
There are two main learning paths to follow if you’d like to become a Data Scientist.
The first option is to complete a data science course. You can find several courses on the Internet, some even for free, and they are worth a try if you’d like to get a feel for this area.
Here are a few online Data Science courses we selected:
DataCamp Free – Data Science course for Everyone
- 2 hours
Udemy – The Data Science Course 2021: Complete Data Science Bootcamp
- 28.5 hours on-demand video
- Price: Check the website for actual pricing
Coursera – IBM Data Science Professional Certificate
- 11 months
edX – Data Science courses on edX
- Several courses to complete
- Beginner, Intermediate, Advanced
SkillShare – Online Data Science Courses
- Several courses to complete
- Beginner, Intermediate, Advanced
ScaleFree – Data Vault 2.0 Training
- 6 weeks
- Price: €2,798
SimpliLearn – Data Science Course
- 220+ hours
- Price: €1,499
Emeritus – Online Data Science & Analytics Courses
- Several courses to complete
DataQuest – Data Science Courses Pathway
- Several courses to complete
The second way to become a Data Scientist is to attend a university program. You can complete both undergraduate and graduate-level degrees, and you also have the option to attend selected courses as a guest student.
Here we selected a few Uni degrees about Data Science:
If you’d like to see more courses, the lists below cover more Data Science training options and other educational materials.
- Top 8 Online Data Science Courses – 2021 Guide & Reviews
- 5 Online Data Science Courses You Can Start Now
- The 9 Best Free Online Data Science Courses in 2020
- I Ranked Every Intro to Data Science Course on the Internet, Based on Thousands of Data Points
- 15 Data Science Course Certifications that pay off
In our article series “Programming courses” we collected learning opportunities, so if you want to learn to program, you can find them there, along with other interesting facts that can help you find the perfect course.
- Top 13 Python programming courses
- Top 10 Java courses for beginners
- A New Necessary Skill? Programming for Kids
- 4+1 easy ways of learning programming for free
- 10+ Best Web Development Courses Online
- Playing it safe – Top 10 Programming Courses With Job Guarantees
- Data Science Courses, Data Scientist Salaries, and Everything About DS
Would you like to start programming online? Come and try the first 25 lessons of CodeBerry Programming School for free!