There are a few:
Of course, for Python or R you get datacamp.com or dataquest.io, and that’s down to their immensely large user bases and commercial pull. You are not going to get that for F# or Julia. Sites like that make life much more pleasant: you can quickly find professional training in almost any relevant application area, with real worked-out examples to follow or to use as solution templates. They are also very expensive to set up and to keep up to date content-wise.
ML.NET, the most important ML platform Microsoft has rolled out, is strangely C#-ish and has no decent F# wrapper. It could have really helped F# adoption and growth the way Spark did with Scala, but with Microsoft, F# seems to be that ignored in-the-back child that never gets the really cool stuff.
Now to ML:
- Data Preparation, as in transforming information (structured/unstructured) into structured data as input: cleaning data, manipulating data, handling erroneous data and missing values, … In most jobs, that’s anywhere from 50% (if you’re very lucky) to 90% of your time and effort. This bit --> get data and make it ready for analysis.
- Do exploratory and visual analysis. For a lot of jobs, this is where it ends: you provide simple statistics (measures of central tendency, dispersion, skewness, kurtosis, and the like) along with plots and graphs. Then you either compile them into a report or develop dashboards with live graphics that can be drilled up and down. This bit --> overview of data and what you might be able to do with it.
- Modelling - Inferential or Predictive, Frequentist or Bayesian, …?
3.a) Hypothesis Testing (t Test, F Test, A/B Testing, ANOVA, …), Regression, Classification, Clustering. These are your bread and butter unless you land in the hot seat at some cutting-edge mathematics/computer-science research centre at a university, or at one of a few dozen companies like Microsoft, Google, …, Goldman Sachs, Lockheed Martin, …
3.b) GLMs, LDA, SVMs, Non-Linear Regression, Bayesian Graph Networks, ANNs, Deep Learning, … You could in theory spend forever learning algorithms one after another until you turn into a skeleton, and there would still be new stuff coming out. Within each area of application, there are a few widely adopted techniques and methodologies you can hack together, and most people will only ever work in a few areas.
3.c) Most important is to know which class of models and techniques to use for each situation, what data you need for it and in what format to make it a viable input, what you get out and how to interpret the important bits of output, and how to assess/monitor the performance of whatever tools you are employing (e.g., accuracy, precision, specificity, F1 score, ROC curve, …).
- CLOUD? Azure, AWS + Keras, TensorFlow, ML.NET, Spark, Hadoop, etc.
You go there simply because the size of your problem/data and the complexity of your solution mean it can no longer be handled by your local computer’s resources, such as RAM and processing power. You just have to use different syntax and write a few more lines to access the cloud resources rather than your PC or local server.
- Your end result is invariably a report or a dashboard or the like that is, most of the time, dry and devoid of any formulas, messy tables, piles of numbers, or crowded graphics. The target audience is rarely an ML expert, and mathematically or programming-savvy people are in general not that many.
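To make the performance measures from 3.c a bit more concrete, here is a minimal sketch in plain Python (standard library only) that derives accuracy, precision, recall, specificity, and F1 from a confusion matrix. The labels and predictions are made up for illustration, and the function names are my own, not from any library.

```python
# Hypothetical binary labels: 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1, 1, 0]


def confusion_counts(actual, predicted):
    """Return (TP, TN, FP, FN) for binary 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(actual, predicted):
    """Compute common classification metrics from the confusion counts."""
    tp, tn, fp, fn = confusion_counts(actual, predicted)
    total = tp + tn + fp + fn
    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name:12s} {value:.3f}")
```

In practice you would more likely call a library such as scikit-learn for this, but spelling it out shows that all of these numbers come from the same four counts.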
So for this whole workflow, I find that F# is still lacking: lack of libraries, lack of native libraries, lack of documentation, lack of article-quality graphics facilities, and no interactive visualisation, meaning no dashboards or reports. You can do a simple y ~ x regression on artificial data in R, Python, and F# (ML.NET), then decide which looks and feels simpler/faster. Data analytics by and large is still scripting and prototyping, and in that R/Julia/MATLAB still outshine full-blown general-purpose programming languages like F# or Python. On the positive side, F# is brilliant for data prep: manipulating/merging data sources and producing inputs. I usually move on to R/Python for the rest of the workflow. Also, F# code is generally orders of magnitude faster than R and Python, with performance more akin to C#.
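For reference, the Python side of that y ~ x comparison can be sketched with nothing but the standard library. This is only an illustration: the true intercept (2), slope (3), noise level, and seed are values I made up, and `fit_line` and `r_squared` are hypothetical helper names, not library functions.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Artificial data: y = 2 + 3x plus Gaussian noise (assumed values).
rng = random.Random(42)
xs = [i / 10 for i in range(100)]
ys = [2.0 + 3.0 * x + rng.gauss(0.0, 1.0) for x in xs]

intercept, slope = fit_line(xs, ys)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
print(f"R^2       = {r_squared(xs, ys, intercept, slope):.3f}")
```

The R equivalent is essentially `lm(y ~ x)`, which is the kind of brevity the comparison is about.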