13. Automate
Why automate?
Make life simpler.
Ensure reproducibility.
Enhance efficiency.
Foster collaboration.
Achieve scalability.
Efficiency Through Command Line
Bash, the Bourne Again SHell, is the default command-line shell on most Linux systems and is also available on macOS and Windows. Where an interactive graphical user interface (GUI) limits you to the actions its designers exposed, a shell lets you issue precise commands, combine them, and save them as scripts, which is what makes controlled automation possible. Open a terminal (search for ‘Terminal’ on macOS; on Windows, use a Bash environment such as Windows Subsystem for Linux or Git Bash, since Command Prompt does not run Bash). Recent versions of macOS open zsh rather than Bash by default, as the % prompt in the transcript below shows, but the commands in this chapter behave the same way there. Then type ls -l to list the files in your current directory. You’ve just executed your first shell command.
Bash Basics:
Question: Which of the following commands lists the files in a directory in Linux?
A) dir
B) ls
C) show
D) open
# view from a MacOS terminal
(myenv) d@Poseidon 1.ontology % ls -l
total 1376
-rw-r--r--@ 1 d staff 3993 Aug 25 01:21 README.md
drwxr-xr-x@ 13 d staff 416 Aug 25 02:49 alpha
drwxr-xr-x@ 35 d staff 1120 Aug 24 14:52 bloc
drwxr-xr-x@ 9 d staff 288 Aug 21 12:45 cst
drwxr-xr-x@ 16 d staff 512 Aug 24 17:47 git
drwxr-xr-x@ 17 d staff 544 Aug 7 22:35 git-filter-repo
drwxr-xr-x@ 7 d staff 224 Aug 6 07:33 myenv
drwxr-xr-x@ 13 d staff 416 Aug 14 18:53 nh_projectbeta
-rw-r--r--@ 1 d staff 77753 Aug 23 12:44 particle_animation.gif
-rw-r--r--@ 1 d staff 463 Aug 23 12:43 populate_be.ipynb
-rw-r--r--@ 1 d staff 562412 Aug 25 06:37 python.ipynb
-rw-r--r--@ 1 d staff 34173 Aug 23 12:47 r.ipynb
drwxr-xr-x@ 13 d staff 416 Aug 15 02:12 seasons_projectalpha
drwxr-xr-x@ 13 d staff 416 Aug 17 14:05 sibs
-rw-r--r--@ 1 d staff 231 Aug 24 17:08 stata.do
-rw-r--r--@ 1 d staff 1196 Aug 23 18:43 stata.dta
-rw-r--r--@ 1 d staff 1892 Aug 24 17:03 stata.log
-rwxr-xr-x@ 1 d staff 90 Aug 24 17:08 stata.sh
drwxr-xr-x@ 139 d staff 4448 Jun 25 08:29 summer
drwxr-xr-x@ 25 d staff 800 Jul 20 20:21 verano
(myenv) d@Poseidon 1.ontology %
To see the contents of a simple Bash script, type cat stata.sh. The cat command prints the contents of a file to the terminal (given several files, it concatenates them). The stata.sh file contains the following:
(myenv) d@Poseidon 1.ontology % cat stata.sh
export PATH=$PATH:/applications/stata/statase.app/contents/macos/
stata-se -b do stata.do
(myenv) d@Poseidon 1.ontology %
This tells us that the stata.sh script appends the directory containing the Stata executable to PATH and then runs the stata.do file in batch mode (the -b flag). To run the script, type bash stata.sh, which passes the file to the bash interpreter. You can also run it by typing ./stata.sh. The ./ prefix tells the shell to look for the script in the current directory; this form requires execute permission, which stata.sh has (note the x flags next to it in the ls -l listing above). The script is short, but it shows how Bash can automate the execution of Stata code.
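The two ways of running a script can be tried without Stata installed. The sketch below is only an illustration: hello.sh is a hypothetical stand-in for stata.sh, written into a temporary directory so nothing in your project is touched.

```shell
# hello.sh is a hypothetical stand-in for stata.sh, so this runs without Stata.
workdir=$(mktemp -d)      # scratch directory
cd "$workdir"

# Write a two-line script to disk.
cat > hello.sh <<'EOF'
#!/usr/bin/env bash
echo "hello from a script"
EOF

# Way 1: hand the file to bash explicitly; no execute permission is needed.
out_bash=$(bash hello.sh)

# Way 2: mark the file executable, then run it by its path.
chmod +x hello.sh
out_direct=$(./hello.sh)

echo "$out_bash"
echo "$out_direct"
```

Both calls print the same line. The difference is only in who chooses the interpreter: with bash hello.sh you name it yourself, while ./hello.sh relies on the execute permission and on the first line of the file.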
Before running stata.sh, let’s look at the contents of the stata.do file:
(myenv) d@Poseidon 1.ontology % cat stata.do
if 0 {
cd /applications/stata
ls -kla
export PATH=$PATH:/applications/stata/statase.app/contents/macos/
stata-se -b do stata.do
}
webuse auto, clear
l make price mpg foreign in 1/9
di "`c(N)' obs & `c(k)' vars"%
(myenv) d@Poseidon 1.ontology %
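The if 0 { ... } block in stata.do never executes, so Stata skips the shell commands stored inside it and runs only the last three lines. Now let’s run the script and view the log it writes: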
(myenv) d@Poseidon 1.ontology % ./stata.sh
(myenv) d@Poseidon 1.ontology % cat stata.log
___ ____ ____ ____ ____ ®
/__ / ____/ / ____/ 18.0
___/ / /___/ / /___/ SE—Standard Edition
Statistics and Data Science Copyright 1985-2023 StataCorp LLC
StataCorp
4905 Lakeway Drive
College Station, Texas 77845 USA
800-STATA-PC https://www.stata.com
979-696-4600 stata@stata.com
Stata license: Single-user , expiring 27 Mar 2024
Serial number: 401809200438
Licensed to: Poseidon
Johns Hopkins University
Notes:
1. Stata is running in batch mode.
2. Unicode is supported; see help unicode_advice.
3. Maximum number of variables is set to 5,000 but can be increased;
see help set_maxvar.
. do stata.do
. if 0 {
.
. cd /applications/stata
. ls -kla
. export PATH=$PATH:/applications/stata/statase.app/contents/macos/
. stata-se -b do stata.do
.
. }
.
. webuse auto, clear
(1978 automobile data)
. l make price mpg foreign in 1/9
+-----------------------------------------+
| make price mpg foreign |
|-----------------------------------------|
1. | AMC Concord 4,099 22 Domestic |
2. | AMC Pacer 4,749 17 Domestic |
3. | AMC Spirit 3,799 22 Domestic |
4. | Buick Century 4,816 20 Domestic |
5. | Buick Electra 7,827 15 Domestic |
|-----------------------------------------|
6. | Buick LeSabre 5,788 18 Domestic |
7. | Buick Opel 4,453 26 Domestic |
8. | Buick Regal 5,189 20 Domestic |
9. | Buick Riviera 10,372 16 Domestic |
+-----------------------------------------+
. di "`c(N)' obs & `c(k)' vars"
74 obs & 12 vars
.
end of do-file
(myenv) d@Poseidon 1.ontology %
We can see that the script ran the stata.do file in batch mode. Stata wrote its output to stata.log, which we then displayed in the terminal with cat.
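When a script runs unattended, nobody reads the log, so it helps to check it automatically. The sketch below is one way to do that, not part of the original workflow. It assumes two conventions: that a completed run ends with the line "end of do-file", as in stata.log above, and that Stata reports an error as a line of the form r(198);. The function name log_ok is hypothetical.

```shell
# log_ok <logfile>: succeed only if the log looks like a completed, error-free run.
# Assumptions: a finished run contains "end of do-file" (as in stata.log above)
# and errors appear as a line such as "r(198);".
log_ok() {
    local log=$1
    if [ ! -f "$log" ]; then
        echo "missing log: $log"
        return 1
    fi
    if grep -E -q '^r\([0-9]+\);' "$log"; then
        echo "error code found in $log"
        return 1
    fi
    if ! grep -q 'end of do-file' "$log"; then
        echo "$log did not reach the end of the do-file"
        return 1
    fi
    echo "$log looks clean"
}

# Typical use after a batch run (commented out so the sketch runs without Stata):
# ./stata.sh && log_ok stata.log
```

The error check comes first because a log can contain both an error code and the closing line.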
Overshadowed by Aesthetic Simplicity
In the face of intuitive graphical user interfaces (GUIs) such as those of Windows, MacOS, and Android, the depth of Bash is often overshadowed. With colorful icons that you simply click, GUIs are aesthetically pleasing, and their immediate visual feedback can make the command line appear daunting by comparison. Yet those willing to embrace Bash will find the experience extremely rewarding.
Early steps
Jupyter
Jupyter Notebooks support automated data analysis by integrating code, text, and visuals in one document, which simplifies work-sharing and reproducibility. They are central to this Fena platform, which is built on Jupyter Notebooks and GitHub Pages and seeks to automate data analysis and share the results (as the simple example above shows).
Workflow
Initiate a Jupyter Notebook: Using Google Colab, Jupyter Lab, or VS Code.
Develop Code: Use the built-in code editor in Jupyter Lab or VS Code. Our unique workflow even allows code writing in non-traditional Jupyter languages like Stata.
Execute the Code.
Save the Notebook.
Share the Notebook: Ranging from basic sharing via Google Colab to advanced methods like pushing to GitHub pages.
Python
Python’s large user community and integration with Jupyter make it an ideal choice for automating data analysis.
Libraries Overview:
Data Analysis: Use Pandas for data manipulation and analysis.
Visualization: Matplotlib and Seaborn are renowned for plotting and graphing.
Machine Learning: Scikit-learn simplifies machine learning model creation.
Statistical Modeling: Utilize Statsmodels.
Bayesian Modeling: Turn to PyMC3.
Deep Learning: Opt for TensorFlow, Keras, or PyTorch.
Web Scraping: Scrapy, Beautiful Soup, Selenium, and Requests are your go-to libraries.
Natural Language Processing: Use NLTK, SpaCy, or Gensim.
Network Analysis: Explore networks with NetworkX, PyGraphviz, or PyTorch Geometric.
See the Python chapter for a simple hands-on exercise.
Later steps
Familiarize yourself with basic Unix commands.
Construct a bash script for executing code in Python, R, Stata, etc.
Set a cron job to automate the bash script execution.
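The last two steps can be sketched as follows. This is a minimal illustration under stated assumptions, not a prescribed setup: the script name run_analysis.sh, the helper run_step, the file names analysis.py and analysis.R, and the path /path/to/project are all hypothetical, while stata.do and the stata-se -b call come from the example earlier in this chapter.

```shell
#!/usr/bin/env bash
# run_analysis.sh (hypothetical name). analysis.py and analysis.R are
# placeholders; stata.do is the do-file from earlier in this chapter.

# run_step <input-file> <command> [args...]
# Run the command only if the input file exists and the program is installed.
run_step() {
    local input=$1
    shift
    if [ ! -f "$input" ]; then
        echo "skip: $input not found"
        return 0
    fi
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "skip: $1 not installed"
        return 0
    fi
    echo "run: $*"
    "$@"
}

# One log file per day, so unattended runs leave a record.
log="analysis_$(date +%Y-%m-%d).log"

{
    run_step analysis.py python3 analysis.py
    run_step analysis.R  Rscript analysis.R
    run_step stata.do    stata-se -b do stata.do
} >> "$log" 2>&1

# Hypothetical crontab entry (edit the table with `crontab -e`).
# The five fields are minute, hour, day of month, month, day of week,
# so this line runs the script every day at 06:00:
# 0 6 * * * cd /path/to/project && /bin/bash run_analysis.sh
```

Steps whose input file or interpreter is missing are skipped with a note in the log rather than stopping the run. Because cron typically starts jobs with a minimal PATH, a line like the export PATH=... in stata.sh is still needed for stata-se to be found.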
GitHub Actions & Automation
GitHub Actions can be incredibly powerful, especially as your project grows and you might benefit from automation. Here are some tutorials and resources that can help you learn more about GitHub Actions:
Official Documentation
GitHub Actions Documentation: This is the official documentation and is quite comprehensive. It provides a great overview and dives into more complex topics.
Online Tutorials
GitHub Actions for Beginners: This is a simple, straightforward tutorial aimed at beginners.
Automate your Workflow with GitHub Actions: A more in-depth look into GitHub Actions, great if you’re already familiar with CI/CD concepts.
Articles and Blog Posts
Intro to GitHub Actions: A Medium article that walks you through the basics.
Automating Workflows with GitHub Actions: This post provides a deep dive into automating various workflows.
Courses and Workshops
GitHub Actions: Continuous Delivery: This course on GitHub’s own training platform is a more structured way to learn GitHub Actions.
Automate your code-to-cloud workflows with GitHub Actions: A workshop that provides hands-on experience.
Sample Repositories
Awesome Actions: A curated list of custom actions, articles, and other resources.
The landscape is constantly changing, so it’s worth keeping an eye out for new tutorials and resources. But these should give you a good starting point.
Quantified Tokens as Checklist
Automation, Iteration, and Token Quantification: Lessons from Icons in Clinical Research
In clinical research, where precision is paramount, the examples of Chesley Sullenberger in aviation and of Peter Pronovost and Marty Makary in medicine hold particular significance. Each has left a mark on his field and shown the enduring relevance of automation, iteration, and token quantification in achieving exceptional outcomes.
Chesley Sullenberger: Precision in the Skies
Captain Chesley “Sully” Sullenberger, celebrated for the “Miracle on the Hudson,” exemplifies the critical role of precision in aviation. Beyond his water landing, Sully’s commitment to safety underscores the importance of automation. In aviation, where the margin for error is slim, automation is a linchpin: autopilot systems and flight management computers help aircraft hold their planned course precisely, reducing the potential for human error.
Yet, Sully’s story also teaches us the value of iteration. Pilots undergo rigorous, iterative training to sharpen their skills and adapt to evolving technologies. Each flight is a learning experience, and continuous improvement is a non-negotiable ethos in aviation. The lessons from Sully’s iconic landing extend beyond the cockpit, emphasizing that iteration is not confined to one profession but is a universal pursuit of excellence.
Peter Pronovost and Marty Makary: Checklist Champions
In the world of clinical research and healthcare, Peter Pronovost and Marty Makary are renowned for their groundbreaking work on checklists. Their efforts have saved countless lives by reducing medical errors and enhancing patient safety. Their checklist methodology embodies the principles of automation and token quantification.
Checklists automate critical steps in complex medical procedures, ensuring that healthcare professionals adhere to best practices. These checklists act as safeguards against error, promoting precision in high-stakes environments. Moreover, they exemplify token quantification, as each item on the checklist represents a crucial token in the process. By quantifying these tokens, Pronovost and Makary provide a clear roadmap for achieving precision in healthcare.
Token Quantification: A Universal Approach
Incorporating token quantification into clinical research can lead to more rigorous and error-free studies. By quantifying the tokens within research papers, scientists can ensure that their work aligns with industry standards. Just as checklists in healthcare represent essential tokens for patient safety, token quantification acts as a checklist for precision in research.
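One possible reading of token quantification, offered only as an illustration, is to count how often each checklist term appears in a manuscript and flag the terms that never appear. In the sketch below the checklist terms, the sample text, and the function name count_token are all made up for the example; they are not taken from any reporting standard.

```shell
# Illustrative only: the sample text and checklist terms are invented.
manuscript=$(mktemp)
cat > "$manuscript" <<'EOF'
Methods: We performed a randomized trial. Randomization was stratified by site.
Participants gave informed consent. The primary outcome was 30-day mortality.
EOF

# count_token <term> <file>: print the number of case-insensitive matches.
# grep exits non-zero when nothing matches; "|| true" keeps that from
# aborting scripts that run with "set -e -o pipefail".
count_token() {
    { grep -o -i -- "$1" "$2" || true; } | wc -l | tr -d ' '
}

# Print one checklist line per term, ticked if the term occurs at least once.
for token in "random" "consent" "primary outcome" "blinding"; do
    n=$(count_token "$token" "$manuscript")
    if [ "$n" -gt 0 ]; then
        echo "[x] $token ($n)"
    else
        echo "[ ] $token (0)"
    fi
done
```

With this sample text, "blinding" is the one item left unticked, which is the kind of gap a checklist is meant to surface.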
Automation, iteration, and token quantification, as demonstrated by these icons, transcend specific professions. They are universal principles that underpin excellence. Whether you’re flying a commercial airliner, performing life-saving surgery, or conducting clinical research, these principles serve as beacons of precision. By heeding the lessons of Sully, Pronovost, and Makary, professionals in any field can chart a course toward ever-higher levels of excellence and precision.