9. Stata

Stata is a software package mainly used for data management and complex statistical tasks 📈💻

  • Type: Statistical software package

  • Primary Uses: Data management, statistical analysis, graphics, simulations, regression, and custom programming

  • Paradigms: Primarily command-driven, though it also offers a graphical user interface

  • Community: Popular in social sciences, economics, epidemiology, and other fields that require robust statistical analysis

  • License: Proprietary

The ‘Plug-and-Play’ Marvel: Stata’s Game-Changing Role in Data Science

Intro: Why Stata is the Smartphone of Data Analysis Tools

Alright, let’s drop the formalities. We all love our smartphones, right? Imagine having all your essential apps—like Instagram, Snapchat, and Spotify—pre-installed the moment you unwrap your phone. No more scrolling through app stores, dealing with downloads, or fumbling with settings. That’s what Stata is to the data analysis world. It’s the ‘out-of-the-box,’ fully-loaded smartphone, and here at Johns Hopkins, we’re turning every student into a high-tier app developer.

What Makes Stata So Fresh?

No Package Drama 🙅‍♂️

You know the drill with Python and R: before you can even get started, you’re already busy hunting down packages, installing them, and praying everything’s compatible. It’s like buying a PS5 but still needing to download every game and add-on manually. With Stata, everything you traditionally need for data management and advanced statistics—especially if you’re into epidemiology or clinical research—is already installed! It’s like getting a PS5 with all the top-tier games pre-loaded.

Zero to Hero in 9 Months 🚀

Got no time to master the complexities of Python or R, with their endless installations and package management? We totally get it. Whether you’re at Hopkins pursuing a dual MD-MPH, focusing on a GTPCI Ph.D., or a grad student in another field, you’ve got big responsibilities. You have research to conduct, papers to write, and lives to save—whether it’s millions at a time in public health or just one at a time in clinical practice.

That’s where Stata comes in, and so do we.

We understand that you’re not just here to play with code. You’re on a mission to make a difference. So in just 8 weeks, we’ll empower you with the data analysis skills you need to excel in your respective fields—no coding headaches involved. With Stata, you can focus on what truly matters: advancing science, impacting policy, and, yes, saving lives.

Be Your Own Developer 👨‍💻

Ever wanted to customize your favorite game or app? In Stata, not only do you get a fully-loaded arsenal of statistical tools, but you also get the freedom to create your custom programs. It’s like turning your PS5 into a development studio, where you’re the head developer.

Code Like You Text 📱

Coding in Stata is like texting with syntax highlighting, thanks to VS Code’s Stata extensions. No more walls of plain text. Stata with VS Code turns each line of code into an Instagrammable moment, complete with all the aesthetic and #nofilter vibes.

Run Your Code, Not a Marathon 🏃

Executing Stata code in VS Code is as simple as hitting the “play” button on Spotify. No buffering, no waiting. Just instant results.

Conclusion: Why Stata is the BFF You Never Knew You Needed 🤗

So, whether you’re just starting your journey or you’re already down the data science rabbit hole, Stata has your back. It’s the unsung hero in a world where everyone’s caught up in the Python vs. R battle. It’s reliable, it’s chic, and it’s all about making your life easier.

Now, who wouldn’t want a BFF like that?

Chat with GPT-4
ssh-keygen -t ed25519 -C "jhufena@gmail.com" -f ~/.ssh/id_ourhome
  1. subject: hey, unix!

  2. verb: could you generate an ssh key-pair

  3. object: -f ~/.ssh/id_ourhome?


I’m thinking tokens and how language is structured to train Stata Programming students Stata syntax using Unix syntax. crazy?


Certainly, that’s an interesting way to approach teaching Stata syntax! The idea of using Unix syntax as a bridge to Stata isn’t crazy at all, especially when you consider how command-line interfaces share underlying principles. Parsing your ssh-keygen command using the framework of subject, verb, and object (with adverbials) would indeed be an enlightening way to discuss the structure and organization of code.

Breaking Down the Command

  1. Subject: hey, unix! — It’s as if you’re calling on Unix to take an action.

  2. Verb: could you generate an ssh key-pair — The core action being taken. In this case, ssh-keygen is the command doing the “generation” work.

    • Adverbial: -t ed25519 — Specifies the type of key to create.

    • Adverbial: -C "jhufena@gmail.com" — Provides a label for the key.

  3. Object: -f ~/.ssh/id_ourhome — Specifies where to store the generated key.
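
As a rough illustration of the breakdown above, the same command can be split into these grammatical roles with a few lines of Python. This is only a sketch: the function name parse_command and the role labels are hypothetical, and it assumes every flag is followed by exactly one value, which holds for this command but not for command lines in general.

```python
import shlex


def parse_command(command, object_flag="-f"):
    """Split a shell command into verb, adverbials, and object.

    Assumption (for illustration only): after the program name, tokens
    come in (flag, value) pairs, and `object_flag` marks the object.
    """
    tokens = shlex.split(command)          # respects the quotes around the email
    verb, rest = tokens[0], tokens[1:]
    adverbials, obj = {}, None
    for flag, value in zip(rest[0::2], rest[1::2]):
        if flag == object_flag:
            obj = value                    # where the key is stored
        else:
            adverbials[flag] = value       # how the verb is carried out
    return {"verb": verb, "adverbials": adverbials, "object": obj}


parsed = parse_command(
    'ssh-keygen -t ed25519 -C "jhufena@gmail.com" -f ~/.ssh/id_ourhome'
)
```

Here parsed["verb"] is the program name, parsed["adverbials"] holds the -t and -C values, and parsed["object"] is the key file path, left unexpanded because shlex does not expand ~.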

Application to Stata Syntax

Stata syntax can be complex, but it also contains verbs (commands like regress or summarize), subjects (the data variables), and adverbials (options or qualifiers). You can use the Unix command as an example to explain how command-line tasks often break down into these categories, making it easier to understand the functional parts of any given command. This could especially help in teaching students to think about the structural components when they’re writing their own code.
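
To make the parallel concrete, here is a minimal Python sketch that splits a Stata command along the lines of Stata’s general syntax, command [varlist] [if] [in] [, options]. The function name parse_stata is hypothetical, the example variables (mpg, weight, foreign) come from Stata’s bundled auto dataset, and the parser is deliberately naive: it ignores prefixes, weights, and commas inside quotes or parentheses.

```python
def parse_stata(command):
    """Naively split a Stata command into verb, varlist, qualifiers, options."""
    # Everything after the first comma is treated as options (an assumption
    # that breaks for commas inside quotes or parentheses).
    main, _, options = command.partition(",")
    tokens = main.split()
    verb, rest = tokens[0], tokens[1:]

    varlist, qualifiers = [], []
    in_varlist = True
    for tok in rest:
        if tok in ("if", "in"):
            qualifiers.append(tok)         # start a new qualifier
            in_varlist = False
        elif in_varlist:
            varlist.append(tok)            # the "subjects" of the command
        else:
            qualifiers[-1] += " " + tok    # extend the current qualifier
    return {
        "verb": verb,
        "varlist": varlist,
        "qualifiers": qualifiers,
        "options": options.split(),
    }


example = parse_stata("regress mpg weight if foreign == 0, robust")
```

In this example the verb is regress, the varlist is mpg and weight, the single qualifier is the if condition, and robust is the only option.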

Using Tokens to Explain Variance

As you think about tokens, the parallel in Stata (or Unix) would be the various commands, flags, and options that experienced users deploy with finesse. High ICCs in professionals arise from the creative use of a well-understood set of tokens (commands, options). For beginners, the lack of understanding or misuse of these tokens can introduce “noise” or errors in their code.

Would love to hear more about how you’re planning to integrate these ideas!


now very delighted to see you indulge in my “ICC-talk”!!! very powerful way to engage me


I’m glad to hear that you’re delighted! The concept of ICCs is a captivating way to examine expertise and creativity across different fields, from programming to the arts. The structured breakdown of commands in Unix and its parallel in Stata syntax exemplifies how ICCs can be applied to better understand the mechanisms underlying skill and creativity. It’s an exciting lens through which to view both technical and artistic endeavors.

If you have more questions or further insights about ICCs or any other topic, feel free to share!


Stata without the GUI

To see the value of PATH before the next command, simply type:

echo $PATH
(myenv) d@Poseidon 1.ontology % echo $PATH
/Users/d/Dropbox (Personal)/1f.ἡΔ̓́ρÎčς,Îș/1.ontology/myenv/bin:/opt/homebrew/anaconda3/bin:/usr/local/anaconda3/bin:/usr/local/opt/openjdk/bin:/Users/d/miniconda3/bin:/Users/d/Dropbox (Personal)/1f.ἡΔ̓́ρÎčς,Îș/1.ontology/myenv/bin:/usr/local/bin:/usr/local/sbin:/Library/Frameworks/Python.framework/Versions/3.11/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Library/TeX/texbin:/opt/X11/bin:/Library/Apple/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/Users/d/Dropbox (Personal)/1f.ἡΔ̓́ρÎčς,Îș/1.ontology/myenv/bin:/opt/homebrew/anaconda3/bin:/usr/local/anaconda3/bin:/usr/local/opt/openjdk/bin:/Users/d/miniconda3/bin:/usr/local/sbin:/Library/Frameworks/Python.framework/Versions/3.11/bin
(myenv) d@Poseidon 1.ontology % 

Hey, Unix! Please change PATH value to whatever:

  • Agent: Unix

  • Verb: Change

  • Object: PATH

export PATH="$PATH:/Applications/Stata/StataSE.app/Contents/MacOS/"
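
For readers working from a Python script or notebook, the same PATH change can be sketched as follows. The helper name append_to_path is hypothetical, and the Stata directory in the usage comment is an assumption about where Stata/SE is installed. Note that changing os.environ only affects the current Python process and the programs it launches, not the terminal it was started from.

```python
import os


def append_to_path(path_value, new_dir):
    """Return path_value with new_dir appended, unless it is already listed."""
    parts = path_value.split(os.pathsep) if path_value else []
    if new_dir not in parts:
        parts.append(new_dir)
    return os.pathsep.join(parts)


# Usage (assumed install location; adjust to your machine):
# import shutil
# stata_bin = "/Applications/Stata/StataSE.app/Contents/MacOS"
# os.environ["PATH"] = append_to_path(os.environ.get("PATH", ""), stata_bin)
# print(shutil.which("stata-se"))  # None means the executable was not found
```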

Hey, Unix! Now that the Stata/SE executable is on the PATH, run the do-file in batch mode:

  • Agent: Unix

  • Verb: Run

  • Object: abikesa_stata.do

stata-se -b do work/abikesa_stata.do
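
The batch run can also be launched from Python. The sketch below assumes the stata-se -b do <file> form of the batch command and that stata-se is already on the PATH; the function names are hypothetical. In batch mode Stata should write its output to a log file named after the do-file in the current working directory, rather than to the terminal.

```python
import subprocess


def batch_command(do_file, executable="stata-se"):
    """Build the argument list for a Stata batch run (assumed -b do form)."""
    return [executable, "-b", "do", do_file]


def run_batch(do_file, executable="stata-se"):
    """Run the do-file in batch mode; raises CalledProcessError on failure."""
    return subprocess.run(batch_command(do_file, executable), check=True)


# Usage (requires a working Stata installation):
# run_batch("work/abikesa_stata.do")
```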

Using VS Code for Stata

Visual Studio Code (VS Code) provides syntax highlighting based on file extensions and language modes. If you want to get nice colors (syntax highlighting) for your Stata .do files in VS Code, you can follow these steps:

  1. Install a Stata Extension for VS Code:

    • Open VS Code.

    • Go to Extensions (you can use the shortcut Ctrl+Shift+X on PC or Command+Shift+X on Mac).

    • Search for Stata.

    • You should see a few extensions that provide Stata language support. One popular choice is “Stata Enhanced”. Install it (or another that you prefer).

  2. Save or Rename Your File:

    • If you are creating a new Stata file, simply save it with the .do extension. For existing files that don’t have this extension, you can rename them to include .do.

  3. Select the Right Language Mode:

    • Sometimes, VS Code might not automatically recognize the correct language mode for your file. If that’s the case, you can manually set it.

    • In the bottom right corner of the VS Code window, you’ll see a language label (like “Plain Text” or something else). Click on it.

    • A search bar will appear at the top. Type Stata and select it from the list.

After doing the above, your Stata .do file should now have proper syntax highlighting in VS Code. Remember, the effectiveness of the syntax highlighting depends on the quality of the Stata extension you’ve chosen. Some extensions may provide more detailed and accurate highlighting than others.

Stata Enhanced

Stata syntax highlighting in Visual Studio Code, built from the ground up. Also available for Atom.

Screenshot: code snippet from Gtools, a faster implementation of Stata’s collapse and egen using C plugins. Shown with the Material theme.

Features

This package highlights:

  • System commands, functions, and function arguments

  • Macros, both global and local

    • Accurately colors nested macros and escaped macros in strings when you want the inner macro to evaluate at runtime

    • Colors macro extended functions inside `: ' as well as after local lname:

  • Comments, more accurately than Stata’s Do-file Editor

  • Regular expressions

    • Colors both the limited syntax provided through the regexr() and regexm() functions, as well as the vastly expanded regex syntax provided in Stata 14 and 15 through the ustrregexm(), ustrregexrf(), and ustrregexra() functions

  • Dynamic Markdown and LaTeX documents. Instructions below.

Other nice features:

  • Works with unicode identifiers. Use unicode anywhere it’s legal Stata syntax.

  • Alerts you if your variable name is illegal, i.e. if your variable name is more than 32 chars, starts with a number, or is a reserved name.

  • Alerts you if you have any text other than } on a line ending a foreach/forvalues/if/else command.

  • Local macro back tick autocompletion. When you write a `, Visual Studio Code automatically fills in a ' after your cursor.

  • Makes it easy to spot incorrect nesting of compound quotes.

  • Support for programming ligatures for all valid Stata syntax for fonts that support them, like the Fira Code font.

  • Highlights SQL queries used in the odbc command. (The language-sql base package must be active.)

Note: Some themes may not color all parts of the syntax.

Installation

Launch VS Code Quick Open (Ctrl+P), paste the following command, and press enter.

ext install stata-enhanced

Running Code

This package doesn’t have the capabilities to run your code in Stata. If you’re using Linux, you can use my scripts with the Autokey automation utility to quickly run selections of your files in a graphical session of Stata.

You might also be interested in trying to port Atom’s stata-exec package.

Dynamic Documents

Stata 15 brought new features for working with dynamic documents. The dyndoc command lets you write in Markdown and converts your file and code to HTML for viewing in a web browser.

It also added the dyntext command, which fills in Stata output for any text file, without touching the text itself. This lets you then use third-party document generators like Pandoc and LaTeX to generate documents.

This package now provides syntax highlighting for Stata code written inside Stata’s dynamic tags for Markdown and LaTeX documents.

By default, this package’s Markdown and LaTeX syntax highlighting will be applied for files ending in .domd and .dotex respectively. The Markdown All in One and LaTeX Workshop packages must be installed for the highlighting to work.

If you name your file with a different extension, you can manually set the highlighting by clicking on the “Plain Text” button on the bottom right of the screen (or by pressing CTRL+SHIFT+L) and then selecting Stata Dyndoc (Markdown) or Stata Dyndoc (LaTeX) from the drop-down menu.

Examples

An example of the PDF output of using dyntext and Pandoc is in the examples folder: dyntext.pdf.

That file was created by running

dyntext dyntext.domd, saving(dyntext.md) replace

from inside Stata 15, and then with

pandoc dyntext.md -o dyntext.pdf

on the command line using Pandoc.

The file dyntext.dotex is a proof-of-concept and should compile with LaTeX but the output is not shown here.

Description

stataRun allows you to run Stata code (.do, .ado) and commands from within the Visual Studio Code editor, so you can take advantage of the editor’s syntax highlighting. You may want to install Stata Enhanced and Stata Language for syntax highlighting and other features.

Features

The following commands and keyboard shortcuts are implemented (replace ctrl with cmd on macOS):

shift+ctrl+a: Run All Lines ==> runs the entire file
shift+ctrl+s: Run Selection ==> runs the selected lines; if nothing is selected, it defaults to Run All
shift+ctrl+c: Run Current Line ==> runs the current line
shift+ctrl+d: Run Downward Lines ==> runs from the current line downward
shift+ctrl+f: Run Frontward Lines ==> runs from the first line to the current line
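
To make the difference between these run commands explicit, the short Python sketch below shows which lines of a do-file each one would select. It is an illustration of the descriptions above, not the extension’s actual implementation: the function name select_lines and the mode labels are hypothetical, and it assumes the current line is included in both the downward and frontward runs.

```python
def select_lines(lines, current, mode):
    """Return the lines a given run command would send to Stata.

    `current` is the 0-based index of the line the cursor is on.
    """
    if mode == "all":
        return lines[:]
    if mode == "current":
        return [lines[current]]
    if mode == "downward":
        return lines[current:]             # current line to end of file
    if mode == "frontward":
        return lines[: current + 1]        # first line through current line
    raise ValueError(f"unknown mode: {mode}")


do_file = ["sysuse auto, clear", "summarize mpg", "regress mpg weight"]
downward = select_lines(do_file, 1, "downward")
frontward = select_lines(do_file, 1, "frontward")
```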

Settings are derived from stata-exec, which provides the same features for Atom.

Requirements

Linux

Requirements for Linux are xdotool and xclip, as in the Atom version.

Windows

Installation instructions for Windows basically follow the original Atom stata-exec build manual, with some modifications for VS Code.

Install the stataRun package. I also recommend the Stata Enhanced package for code highlighting.

Install Node. The original instructions indicated to install this specific version of Node with default settings. I followed the instructions and haven’t tried it with other versions.

The stataRun extension needs a module called winax to be compiled into a binary file. To do this, Node needs windows-build-tools with Python 2.7 and Visual Studio compiler packages. The installation takes some time (over 20 min. in my case) and a couple of GB of disk space. To install it, open up an administrator PowerShell (you can right-click on the Windows icon at the bottom left and select “Windows PowerShell (Admin)”) and type in:

npm install --global --production windows-build-tools

In my case the original command ended with an error, so I used this one with success:

npm install --global --production windows-build-tools@2.2.1

Open up Command Prompt (type cmd in the search bar in the dock, and it will be the first result) and type in:

cd %USERPROFILE%\.vscode\extensions\yeaoh.statarun-1.1.8
npm install winax --python=%USERPROFILE%\.windows-build-tools\python27\python.exe

In VSCode go into Help -> About. From there save the Electron version. In the next command replace ELECTRON_VERSION with the version from the About dialogue.

npm rebuild winax --runtime=electron --target=ELECTRON_VERSION  --build-from-source

Link the Stata Automation library. The following steps worked for me on Windows 10. The Stata executable is most likely in the folder C:\Program Files (x86)\Stata15 on PC or /applications/stata/Stata18 on Mac.

  • In the installation directory, right-click on the Stata executable, for example, StataSE.exe. Choose “Create Shortcut”.

  • Right-click on the newly created “Shortcut to StataSE.exe”, choose “Properties”, and change the Target from C:\Program Files\Stata13\StataSE.exe to C:\Program Files\Stata13\StataSE.exe /Register. Click “OK”.

  • Right-click on the updated “Shortcut to StataSE.exe” and choose “Run as administrator”.

While you’re doing that, add the path of the Stata executable to the “Stata Path” option in the settings.

Restart VS Code and enjoy executing Stata code from VS Code! (I hope.)

Known Issues

Only tested on macOS so far. Please create an issue if you run into any problems. Feel free to contribute.

Installation

Install as a normal VS Code extension by searching for stataRun in the VS Code extension marketplace.

Release 2.0.3

Improvements for working on Windows and Linux

Hands-on .ipynb and .dta integration:

Python Path Handling: If you’re dealing with paths in a Python script or Jupyter notebook, you should make use of the os and os.path modules which handle paths more gracefully:

import os

base_path = "/Users/d/Dropbox (Personal)/1f.áŒĄáŒ”ÏÎčς,Îș/1.ontology"
full_path = os.path.join(base_path, "myenv", "bin", "python")
  1. Check Stata Installation Path: First, ensure that Stata is indeed installed in the /Applications directory. You can check this by navigating to the Applications folder or by using the terminal:

ls /Applications | grep Stata

(myenv) d@Poseidon 1.ontology % ls /Applications | grep Stata

Stata
Stata18.dmg
(myenv) d@Poseidon 1.ontology % 

This command lists all the items in the Applications directory and then filters the results to only show items containing the word “Stata”. This will give you an idea of the correct path to use.
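
The same check can be done from inside a notebook without shelling out. This is a small sketch; the function name find_entries is hypothetical, and like grep it matches the substring case-sensitively.

```python
import os


def find_entries(directory, needle="Stata"):
    """List names in `directory` that contain `needle` (case-sensitive)."""
    return sorted(name for name in os.listdir(directory) if needle in name)


# Usage on a Mac:
# print(find_entries("/Applications"))
```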

  1. Configuration:
    Once you have the correct path, you can set it up:

# method 1
# !pip install stata_setup
import stata_setup
stata_setup.config("/Applications/stata/", "se", splash=False)
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[2], line 3
      1 # method 1
      2 # !pip install stata_setup
----> 3 import stata_setup
      4 stata_setup.config("/Applications/stata/", "se", splash=False)

ModuleNotFoundError: No module named 'stata_setup'
# method 2
import sys
sys.path.append(base_path)
from pystata import config
config.init('se')
# method 3
import os
os.chdir(base_path)
from pystata import config
config.init('se')
# method 4

Replace “Stata” in the path with whatever the actual name of the Stata application directory is, based on your check from the first step.
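
Because the ModuleNotFoundError above stops the cell before any configuration happens, it can help to guard the import. The sketch below is one way to do that, assuming the stata_setup package from PyPI and its config(path, edition, splash=...) call as used in method 1; the wrapper name configure_stata is hypothetical.

```python
import importlib


def configure_stata(stata_dir, edition="se"):
    """Try to configure the Stata integration; return True on success.

    Returns False (with a hint) when stata_setup is not installed in the
    active environment, instead of raising.
    """
    try:
        stata_setup = importlib.import_module("stata_setup")
    except ImportError:
        print("stata_setup is not installed here; try: pip install stata_setup")
        return False
    # Same call as method 1; may still fail if stata_dir is wrong.
    stata_setup.config(stata_dir, edition, splash=False)
    return True


# Usage (path taken from method 1; adjust to your installation):
# configure_stata("/Applications/stata/", "se")
```

If the function returns False inside a virtual environment such as myenv, the package probably needs to be installed into that environment rather than into the system Python.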

  1. Launching a Native Stata Window: If your goal is to open Stata itself from within the Jupyter notebook, you might be misinterpreting the function of stata_setup.config. This function typically sets up the integration for using Stata commands within Python or a Jupyter notebook, but it might not open a native Stata window by itself. To open Stata from the command line or a script on a Mac, you’d typically use the open command:

!open /Applications/stata/StataSE.app

Make sure to adjust /Applications/stata/StataSE.app to the actual path and name of the Stata application on your machine.

  1. Further Integration: If you’re trying to execute Stata commands within Jupyter directly, consider using the ipystata package, which provides a bridge between Python and Stata in Jupyter notebooks. Once set up, you can use %%stata in a Jupyter cell to execute Stata commands.

Lastly, ensure that the version of Stata you’re using supports the integration you’re trying to achieve. Some features might be limited to specific versions of Stata.

%%stata
sysuse auto, clear
summarize mpg
UsageError: Cell magic `%%stata` not found.