Data science is the area of study that involves extracting insights from vast amounts of data using scientific methods, algorithms, and processes. It helps you discover hidden patterns in raw data. The term emerged from the evolution of mathematical statistics, data analysis, and big data.
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data. It is related to data mining, deep learning, and big data.
Data science is a "concept to unify Statistics, data analysis, machine learning and their related methods" in order to "understand and analyze actual phenomena" with data. It uses techniques and theories drawn from many fields within the context of mathematics, statistics, computer science, and information science.
Turing Award winner Jim Gray imagined data science as a "fourth paradigm" of science (empirical, theoretical, computational, and now data-driven) and asserted that "everything about science is changing because of the impact of information technology" and the data deluge.
Why Data Science?
Here are significant advantages of using data analytics technology:
- Data is the oil of today's world. With the right tools, technologies, and algorithms, we can use data and convert it into a distinctive business advantage.
- Data science can help you detect fraud using advanced machine learning algorithms.
- It helps you prevent significant monetary losses.
- It allows you to build intelligent capabilities into machines.
- You can perform sentiment analysis to gauge customer brand loyalty.
- It enables you to make better and faster decisions.
- It helps you recommend the right product to the right customer to enhance your business.
Data Science Components
Statistics:
Statistics is the most critical unit in data science. It is the method or science of collecting and analyzing numerical data in large quantities to get useful insights.
Visualization:
Visualization techniques help you access huge amounts of data through easy-to-understand, digestible visuals.
Machine Learning:
Machine learning explores the building and study of algorithms that learn to make predictions about unforeseen or future data.
Deep Learning:
Deep learning is a newer area of machine learning research in which the algorithm selects the analysis model to follow.
Data Science Process:
1. Discovery:
The discovery step involves acquiring data from all the identified internal and external sources that help you answer the business question.
The data can be:
- Logs from webservers
- Data gathered from social media
- Census datasets
- Data streamed from online sources using APIs
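As a minimal sketch of the discovery step, the snippet below turns raw web server log lines into records that can be counted or filtered. It assumes the logs follow the Common Log Format, which may not match a given server's configuration. The sample lines, the `parse_log_line` helper, and the field names are hypothetical and made up for illustration.

```python
import re
from collections import Counter

# Pattern for the Common Log Format (an assumption; real servers may log differently).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A "-" size means no response body was sent; treat it as zero bytes.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Hypothetical sample lines, invented for this example.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '203.0.113.9 - - [10/Oct/2023:13:55:40 +0000] "GET /missing HTTP/1.1" 404 -',
    'not a valid log line',
]

# Keep only the lines that parsed, then count responses per status code.
records = [r for r in map(parse_log_line, SAMPLE_LOG) if r is not None]
status_counts = Counter(r["status"] for r in records)
print(status_counts)
```

Lines that do not match are dropped here rather than raising an error. In a real project you would probably want to log or count them, since a high rejection rate usually means the format assumption is wrong.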
2. Data Preparation:
Data can have many inconsistencies, such as missing values, blank columns, and incorrect data formats, which need to be cleaned. You need to process, explore, and condition data before modeling. The cleaner your data, the better your predictions.
3. Model Planning:
In this stage, you need to determine the method and technique to draw the relation between input variables. Model planning is performed using different statistical formulas and visualization tools. SQL Analysis Services, R, and SAS/ACCESS are some of the tools used for this purpose.
4. Model Building:
In this step, the actual model building process starts. Here, the data scientist splits the dataset into training and testing sets. Techniques like association, classification, and clustering are applied to the training dataset. Once prepared, the model is tested against the testing dataset.
5. Operationalize:
In this stage, you deliver the final baselined model with reports, code, and technical documents. The model is deployed into a real-time production environment after thorough testing.
6. Communicate Results:
In this stage, the key findings are communicated to all stakeholders. This helps you decide whether the results of the project are a success or a failure based on the inputs from the model.
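The data preparation and model building steps of the process above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the `RAW` records are invented, missing features are filled with the mean (one of several possible strategies), and the "model" is a deliberately simple one-feature threshold classifier rather than any particular production technique. All function names are hypothetical.

```python
import random
from statistics import mean

# Hypothetical raw records: (monthly_spend, is_repeat_customer); None marks a missing value.
RAW = [
    (120.0, 1), (15.0, 0), (None, 1), (98.5, 1), (22.0, 0),
    (130.0, 1), (18.5, 0), (25.0, None), (110.0, 1), (30.0, 0),
    (12.0, 0), (105.0, 1),
]


def prepare(rows):
    """Data preparation: drop rows with a missing label, fill missing features with the mean."""
    labelled = [(x, y) for x, y in rows if y is not None]
    fill = mean(x for x, _ in labelled if x is not None)
    return [(fill if x is None else x, y) for x, y in labelled]


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle reproducibly, then hold out a fraction of the rows for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_threshold(train):
    """Model building: use the midpoint between the two class means as the decision threshold."""
    mean_pos = mean(x for x, y in train if y == 1)
    mean_neg = mean(x for x, y in train if y == 0)
    return (mean_pos + mean_neg) / 2


def accuracy(threshold, rows):
    """Fraction of rows whose predicted label (x > threshold) matches the true label."""
    return mean(int((x > threshold) == bool(y)) for x, y in rows)


clean = prepare(RAW)
train, test = train_test_split(clean)
threshold = fit_threshold(train)
print(f"held-out accuracy: {accuracy(threshold, test):.2f}")
```

The model is fitted only on the training rows and scored only on the held-out rows, which is the point of distributing the data into training and testing sets: the score then estimates how the model behaves on data it has not seen.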
Data Science Job Roles:
The most prominent data science job titles are:
- Data Scientist
- Data Engineer
- Data Analyst
- Statistician
- Data Architect
- Data Admin
- Business Analyst
- Data/Analytics Manager
Data Scientist:
A data scientist is a professional who manages enormous amounts of data to come up with compelling business visions by using various tools, techniques, methodologies, and algorithms.
Languages:
R, SAS, Python, SQL, Hive, MATLAB, Pig, Spark
Data Engineer:
A data engineer works with large amounts of data. He or she develops, constructs, tests, and maintains architectures such as large-scale processing systems and databases.
Languages:
SQL, Hive, R, SAS, MATLAB, Python, Java, Ruby, C++, and Perl
Data Analyst:
A data analyst is responsible for mining vast amounts of data. He or she looks for relationships, patterns, and trends in the data, then delivers compelling reports and visualizations that support the most viable business decisions.
Languages:
R, Python, HTML, JS, C, C++, SQL
Statistician:
The statistician collects, analyzes, and understands qualitative and quantitative data using statistical theories and methods.
Languages:
SQL, R, MATLAB, Tableau, Python, Perl, Spark, and Hive
Data Administrator:
A data administrator should ensure that the database is accessible to all relevant users. He or she also makes sure that it is performing correctly and is kept safe from hacking.
Languages:
Ruby on Rails, SQL, Java, C#, and Python
Business Analyst:
This professional needs to improve business processes. He or she acts as an intermediary between the business executive team and the IT department.
Specializations and associated careers
- Machine Learning Scientist: Machine learning scientists research new methods of data analysis and create algorithms.
- Data Analyst: Data analysts utilize large data sets to gather information that meets their company’s needs.
- Data Consultant: Data consultants work with businesses to determine the best usage of the information yielded from data analysis.
- Data Architect: Data architects build data solutions that are optimized for performance and design applications.
- Applications Architect: Applications architects track how applications are used throughout a business and how they interact with users and other applications.
Applications of Data science
Internet Search:
Google Search uses data science technology to return specific results within a fraction of a second.
Recommendation Systems:
Data science is used to create recommendation systems. For example, "suggested friends" on Facebook and "suggested videos" on YouTube are produced with the help of data science.
Image & Speech Recognition:
Speech recognition systems such as Siri, Google Assistant, and Alexa run on data science techniques. Moreover, Facebook recognizes your friends when you upload a photo with them, with the help of data science.
Gaming world:
EA Sports, Sony, and Nintendo are using data science technology to enhance your gaming experience. Games are now developed using machine learning techniques, so a game can update itself as you move to higher levels.
Online Price Comparison:
PriceRunner, Junglee, and Shopzilla work on data science mechanisms. Here, data is fetched from the relevant websites using APIs.
Challenges of Data science Technology
- A high variety of information and data is required for accurate analysis
- The available data science talent pool is not adequate
- Management does not provide financial support for a data science team
- Data is unavailable or difficult to access
- Data science results are not effectively used by business decision makers
- Explaining data science to others is difficult
- Privacy issues
- Lack of significant domain expertise
- A very small organization may not be able to have a data science team
Impacts of data science
Technologies and techniques
Techniques
- Clustering is a technique used to group data together.
- Dimensionality reduction is used to reduce the complexity of data computation so that it can be performed more quickly.
- Machine learning is a technique used to perform tasks by inferring patterns from data.
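To make the clustering technique above concrete, here is a minimal sketch of k-means (Lloyd's algorithm), one common way to group data. It is an illustration, not a reference implementation: the `POINTS` data is invented, and the centers are initialized from the first `k` points to keep the example deterministic, whereas practical implementations usually initialize randomly or with a scheme such as k-means++.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, k, iterations=20):
    """Group points into k clusters; returns the centers and the last assignment."""
    # Simplifying assumption: start from the first k points instead of a random choice.
    centers = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster (keep it if the cluster is empty).
        new_centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break  # Converged: the assignment can no longer change.
        centers = new_centers
    return centers, clusters


# Hypothetical 2-D data with two visibly separate groups.
POINTS = [(1.0, 1.1), (0.9, 1.3), (1.2, 0.8), (8.0, 8.2), (8.3, 7.9), (7.8, 8.1)]

centers, clusters = k_means(POINTS, k=2)
for center, members in zip(centers, clusters):
    print(center, members)
```

On data this cleanly separated, the two groups are recovered after a few iterations. On real data the result depends on the initialization and on the choice of `k`, so it is common to run the algorithm several times and compare the outcomes.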
Technologies
- Python is a programming language with simple syntax that is commonly used for data science. A number of Python libraries are used in data science, including NumPy, pandas, and SciPy.
- R is a programming language that was designed for statisticians and data mining and is optimized for computation.
- TensorFlow is a framework for creating machine learning models developed by Google.
- PyTorch is another framework for machine learning, developed by Facebook.
- Jupyter Notebook is an interactive web interface for Python that allows faster experimentation.
- Tableau makes a variety of software that is used for data visualization.
- Apache Hadoop is a software framework that is used to process data over large distributed systems.
Note this:
- Data science is the area of study that involves extracting insights from vast amounts of data using scientific methods, algorithms, and processes.
- Statistics, visualization, deep learning, and machine learning are important data science concepts.
- The data science process goes through discovery, data preparation, model planning, model building, operationalizing, and communicating results.
- Important data science job roles are: 1) Data Scientist 2) Data Engineer 3) Data Analyst 4) Statistician 5) Data Architect 6) Data Admin 7) Business Analyst 8) Data/Analytics Manager
- R, SQL, Python, and SAS are essential data science tools.
- Business intelligence looks backward, while data science looks forward.
- Important applications of data science are 1) Internet Search 2) Recommendation Systems 3) Image & Speech Recognition 4) Gaming world 5) Online Price Comparison.
- The need for a high variety of information and data is the biggest challenge of data science technology.