8 Workflow: projects | R for Data Science
There is a great pair of keyboard shortcuts that work together to make sure you’ve captured the important parts of your code in the editor:
- Press Cmd/Ctrl + Shift + F10 to restart the R session.
- Press Cmd/Ctrl + Shift + S to rerun the current script.
I use this pattern hundreds of times a week.
- You should never use absolute paths in your scripts, because they hinder sharing: no one else will have exactly the same directory configuration as you.
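As a minimal sketch of the point about paths: a relative path is resolved against the project's working directory, so it keeps working when the project folder is copied to another machine. The `data/flights.csv` location and the `is_absolute()` helper are hypothetical, used only for illustration.

```r
# Hypothetical layout: a data/ folder inside the project directory.
# file.path() joins components with "/" on every platform.
data_path <- file.path("data", "flights.csv")

# Avoid: an absolute path that exists on one machine only, e.g.
# "/Users/alice/projects/analysis/data/flights.csv"

# Hypothetical helper: a rough check for absolute paths
# (leading "/", leading "~", or a Windows drive letter such as "C:").
is_absolute <- function(path) {
  grepl("^(/|~|[A-Za-z]:)", path)
}

is_absolute(data_path)
is_absolute("/Users/alice/projects/analysis/data/flights.csv")
```

The helper is deliberately rough; it only illustrates the distinction and is not a complete path parser.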
4 Workflow: basics | R for Data Science
All R statements where you create objects, assignment statements, have the same form:
object_name <- value
When reading that code say “object name gets value” in your head.
- We recommend snake_case, where you separate lowercase words with _.
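A short illustration of the assignment form with snake_case names. The object names and values here are made up for the example.

```r
# Each line follows the form object_name <- value.
seconds_per_minute <- 60
minutes_per_hour <- 60

# Read as: "seconds per hour gets seconds per minute times minutes per hour".
seconds_per_hour <- seconds_per_minute * minutes_per_hour
seconds_per_hour
```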
1 Introduction | R for Data Science
- Tidying your data means storing it in a consistent form that matches the semantics of the dataset with the way it is stored. In brief, when your data is tidy, each column is a variable, and each row is an observation.
- A good visualisation will show you things that you did not expect, or raise new questions about the data.
- Visualisations can surprise you, but don’t scale particularly well because they require a human to interpret them.
- Models are complementary tools to visualisation. Once you have made your questions sufficiently precise, you can use a model to answer them.
- By its very nature, a model cannot question its own assumptions. That means a model cannot fundamentally surprise you.
- There’s a rough 80-20 rule at play; you can tackle about 80% of every project using the tools that you’ll learn in this book, but you’ll need other tools to tackle the remaining 20%.
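To make the tidy data definition above concrete, here is a small sketch in base R. The table and its values are invented for illustration. The book itself works with tidyverse tools; base R is used here only so the example runs without extra packages.

```r
# Hypothetical untidy table: one row per country, one column per year.
wide <- data.frame(
  country = c("A", "B"),
  cases_1999 = c(10, 20),
  cases_2000 = c(15, 25)
)

# Tidy form: each column is a variable (country, year, cases)
# and each row is one observation (one country in one year).
tidy <- data.frame(
  country = rep(wide$country, times = 2),
  year = rep(c(1999, 2000), each = nrow(wide)),
  cases = c(wide$cases_1999, wide$cases_2000)
)

tidy
```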
The complement of hypothesis generation is hypothesis confirmation. Hypothesis confirmation is hard for two reasons:
You need a precise mathematical model in order to generate falsifiable predictions. This often requires considerable statistical sophistication.
You can only use an observation once to confirm a hypothesis.
- Hypothesis generation and confirmation
- Models are often used for exploration, and with a little care you can use visualisation for confirmation. The key difference is how often you look at each observation: if you look only once, it’s confirmation; if you look more than once, it’s exploration.
- Models for exploration, visualizations for confirmation
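One way to respect the "look only once" rule is to set aside part of the data before exploring and touch it a single time at the end. The sketch below assumes a made-up data set and an arbitrary 60/40 split. Neither comes from the highlighted passages.

```r
set.seed(1)  # make the random split reproducible

# Hypothetical data: 100 observations with an id and one measurement.
n <- 100
obs <- data.frame(id = seq_len(n), x = rnorm(n))

# Hold out 40 observations for confirmation; explore only the rest.
confirm_idx <- sample(n, size = 40)
confirmation <- obs[confirm_idx, ]
exploration <- obs[-confirm_idx, ]

# Explore freely, as many times as needed.
summary(exploration$x)

# The confirmation set is meant to be used once, for the final check.
nrow(confirmation)
```

The split proportions are a judgment call; the point is only that no observation appears in both sets.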
2 Introduction | R for Data Science
- The goal of data exploration is to generate many promising leads that you can later explore in more depth.
Sunday, May 9, 2021
Educational Resources & Tech Tools 05/10/2021