100 Splunk Interview Questions and Answers
In this article, we will cover 100 important Splunk interview questions and answers. Here are the top interview questions and answers on Splunk:
Splunk is software used for monitoring, searching, and analyzing machine data in real time. The data source can be a web application, sensors, devices, or user-created data.
The components of Splunk are:
(a) Search head – the GUI for searching
(b) Forwarder – forwards data to the indexer
(c) Indexer – indexes machine data
(d) Deployment server – manages Splunk components in a distributed environment
Splunk works by collecting, parsing, indexing, and analyzing data. The forwarder collects data from the source and forwards it to the indexer. The search head then searches, visualizes, and analyzes the data stored in the indexer.
The Splunk forwarder is used to forward data to the indexer.
There are two types:
Universal Forwarder
Heavy Forwarder
The Universal Forwarder is a lightweight Splunk agent installed on a non-Splunk system to gather data locally, but it cannot parse or index the data.
The Heavy Forwarder is a full instance of Splunk with advanced functionality; it works as a remote collector, intermediate forwarder, and data filter.
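For example, a forwarder is pointed at an indexer through outputs.conf. Here is a minimal sketch, assuming a single indexer reachable at indexer1.example.com on the default receiving port 9997 (the hostname is a placeholder):

[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = indexer1.example.com:9997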
Splunk is a single integrated tool for machine data. It does everything from running IT operations and analyzing machine logs to providing business intelligence. There are other tools in the market, but Splunk is the only tool that provides end-to-end data operations. You might need three or four separate tools to do what Splunk does as a single piece of software.
props.conf
indexes.conf
inputs.conf
transforms.conf
server.conf
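For illustration, here is a minimal inputs.conf monitor stanza (a sketch, assuming we want to index /var/log/messages into the main index with the syslog sourcetype; all three values are placeholders):

[monitor:///var/log/messages]
sourcetype = syslog
index = main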
There are six types of licenses in Splunk:
Enterprise license
Free license
Forwarder license
Beta license
Licenses for search heads
Licenses for cluster members
With a free license, we cannot authenticate, schedule searches, distribute search, forward data over TCP/HTTP, or use deployment management.
The License Master controls how much data we can index in a day. For example, if we have a 200 GB license model, then we can only index 200 GB of data in a day. So we should have a license that covers the maximum volume of data we receive.
Data search will stop if the License Master is not reachable; however, data will continue to be indexed. You will get a warning in the web UI or on the search head that you have exceeded the indexing volume, but the indexing will not stop.
It's a plugin that connects Splunk to a generic SQL database and integrates with it.
To enable Splunk to start at boot time, the command is:
$SPLUNK_HOME/bin/splunk enable boot-start
To disable Splunk from starting at boot time, the command is:
$SPLUNK_HOME/bin/splunk disable boot-start
To boost reporting efficiency, summary indexes are used. Basically, they enable users to generate reports after processing a huge volume of machine data.
There are two types:
Default Summary Index – used by Splunk Enterprise by default when no other summary index is specified.
Additional Summary Index – used to enable running a variety of reports.
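As a sketch of how events reach a summary index, a scheduled search can write aggregated results with the collect command. This assumes a summary index named summary_web has already been created and a web access sourcetype exists (both names are placeholders):

index=web sourcetype=access_combined | stats count by status | collect index=summary_web

Reports can then search index=summary_web instead of re-scanning the raw events.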
The five default fields are
source,
host,
sourcetype,
index,
timestamp
Splunk can be restarted from Splunk Web. The steps are:
1. Go to Settings and, under System, navigate to Server controls.
2. Click Restart Splunk.
Using lookup tables, we can search for multiple IP addresses at once.
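A minimal sketch: assuming a lookup file suspicious_ips.csv with a column named src_ip (both names are placeholders), a subsearch expands the lookup into an OR of IP filters:

index=firewall [ | inputlookup suspicious_ips.csv | fields src_ip ]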
The most efficient way to filter events in Splunk is by time range.
To reset the password, file-system access to the machine where Splunk is running is necessary. Then perform the following steps:
Move the $SPLUNK_HOME/etc/passwd file to $SPLUNK_HOME/etc/passwd.bak.
Restart Splunk and log in with the default username and password, i.e. admin/changeme.
Reset the password and merge the password file with the backup file.
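The first two steps as shell commands (a sketch; admin/changeme applies to Splunk versions before 7.1):

mv $SPLUNK_HOME/etc/passwd $SPLUNK_HOME/etc/passwd.bak
$SPLUNK_HOME/bin/splunk restart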
Sourcetype in Splunk is a default data field. Sourcetype is the format of the data and indicates its origin; for example, .evt files originate from the Event Viewer. Incoming data can be classified based on service, system, format, and character code. Common source types are apache_error, websphere_core, access_combined, and cisco_syslog. Splunk uses the sourcetype to process and distribute incoming data into different events.
I would like to give a use case on how to search two sourcetypes against a lookup file:
sourcetype=X OR sourcetype=Y | lookup country.csv
Using this search, sourcetypes X and Y can be matched against the lookup file.
KV stands for key-value; the KV store allows you to store and retrieve data inside Splunk. The KV store has the following functions:
(a) To manage a job queue.
(b) For storing metadata by the user.
(c) Analyzing the workflow.
(d) Storing the user application state required for handling a UI session.
(e) Storing the results of search queries in Splunk.
(f) Maintaining a list of environment assets and checkpoint data.
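As a sketch of how a KV store collection is defined inside an app (the collection, lookup, and field names are all placeholders), collections.conf declares the collection and transforms.conf exposes it as a lookup:

collections.conf:
[assets]
field.ip = string
field.owner = string

transforms.conf:
[assets_lookup]
external_type = kvstore
collection = assets
fields_list = _key, ip, owner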
A deployer is used to distribute apps and configuration information to the members of a search head cluster. The set of configuration details that the deployer sends, such as updates, is called the configuration bundle. The deployer pushes this bundle when a new member joins the cluster. It handles the baseline app configurations and user configurations.
However, the latest runtime states cannot be restored to the members of the cluster.
Data models can be created by users with the admin or power role. Other users can create these models only if they have write access to the application. Role-based permissions determine whether a user can edit or view them.
auto_high_volume is used when indexes are of very high volume; a high-volume index can get over 10 GB of data a day. It is a value for the maxDataSize setting in indexes.conf (for normal indexes, auto is used instead).
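A sketch of an indexes.conf stanza that uses it (the index name is a placeholder):

[high_volume_index]
homePath = $SPLUNK_DB/high_volume_index/db
coldPath = $SPLUNK_DB/high_volume_index/colddb
thawedPath = $SPLUNK_DB/high_volume_index/thaweddb
maxDataSize = auto_high_volume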
Logstash
Loggly
LogLogic
Sumo Logic
To restart the web server: splunk restart splunkweb
To restart the daemon: splunk restart splunkd
We need to delete searches.log at this path: $SPLUNK_HOME/var/log/splunk/searches.log
It's a directory or index at the default location /opt/splunk/var/lib/splunk. It contains seek pointers and CRCs for the files you are indexing, so splunkd can tell if it has read them already. We can access it through the GUI by searching for “index=_thefishbucket”.
The commands are:
top
rare
stats
chart
timechart
stats reports data in a tabular format, and multiple fields can be used to build the table.
As the name indicates, chart is used to display data as a bar, line, or area graph. It takes two fields to display the chart.
timechart is used to display data on a timeline. It takes just one field, since the other axis is by default the time field.
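A quick sketch of the difference, assuming a web index with host and status fields (all placeholders):

index=web | stats count by host, status
index=web | chart count over host by status
index=web | timechart span=1h count by host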
$SPLUNK_HOME/bin/splunk disable boot-start
We can disable the Splunk launch message by adding this setting in splunk-launch.conf:
OFFENSIVE=Less
A Splunk app has a GUI configuration, whereas a Splunk add-on doesn't have one (it is configured via the command line or configuration files only).
Using the command splunk offline, we can take a peer offline.
SPL commands fall into five major categories:
sorting results, filtering results, grouping results, filtering/modifying/adding fields, and reporting results.
Using the following command we can set the minimum free disk space:
/opt/splunk/bin/splunk set minfreemb 20000
It requires a restart, so:
/opt/splunk/bin/splunk restart
Yes. SOS stands for Splunk on Splunk. It's a Splunk app which provides a graphical view of Splunk performance and issues.
lookup – adds fields to events by matching a field value in the event against a lookup table and bringing in the fields from the matching rows.
inputlookup – returns the whole lookup table as the search results.
outputlookup – outputs the current search results to a lookup table on the disk.
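A sketch of each, assuming a lookup definition named country_lookup with fields ip and country (all placeholders):

... | lookup country_lookup ip OUTPUT country
| inputlookup country_lookup
... | outputlookup country_enriched.csv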
As the name explains, it sorts the search results by the specified fields.
Here is the syntax:
sort [<count>] <sort-by-clause>... [desc]
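For example, this sketch returns all results (a count of 0 means no limit), sorted descending by count and ascending by host (field names are placeholders):

... | sort 0 -count, +host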
The transaction command is helpful in two specific scenarios:
(a) When a unique ID (from one or more fields) alone is not enough to discriminate between two transactions. This is the case when the identifier is reused, for example in web sessions identified by cookie/client IP. In this scenario, time spans or pauses are also used to segment the data into transactions. In other cases where an identifier is reused, say in DHCP logs, a particular message may identify the beginning or end of a transaction.
(b) When it is desirable to see the raw text of the events combined rather than an analysis of the constituent fields of the events.
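A sketch of the first scenario, grouping web events into sessions by client IP with a ten-minute idle timeout (the index and field names are placeholders):

index=web | transaction clientip maxpause=10m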
Well, we can start from here: first, I would check splunkd.log to trace any errors. If all is fine there, I would check for server/VM performance issues (i.e. CPU, memory, storage I/O, etc.), and lastly install Splunk on Splunk, which provides a GUI where we can check for any performance issues.
Go to the /opt/splunk/bin directory and run: ./splunk create app New_App -template app1
$SPLUNK_HOME/var/run/splunk/dispatch
contains a directory for each search that is running or has completed. For example, a directory named 1434308973.367 will contain a CSV file of its search results, a search.log with details about the search execution, and other stuff. Using the defaults (which you can override in limits.conf), these directories will be deleted 10 minutes after the search completes – unless the user saves the search results, in which case the results will be deleted after 7 days.
Both are features provided by Splunk for high availability of the search head in case any one search head goes down. Search head clustering was introduced more recently, and search head pooling will be removed in upcoming versions. A search head cluster is managed by a captain, and the captain controls its members. A search head cluster is more reliable and efficient than search head pooling.
The null queue is used to trim out all the unwanted data, so it is never indexed.
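A sketch of routing events to the null queue at parse time, assuming a sourcetype my_sourcetype whose [DEBUG] lines should be discarded (both are placeholders):

props.conf:
[my_sourcetype]
TRANSFORMS-null = setnull

transforms.conf:
[setnull]
REGEX = \[DEBUG\]
DEST_KEY = queue
FORMAT = nullQueue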
There are three modes:
Fast mode
Smart mode
Verbose mode
Splunk btool is a command-line tool to troubleshoot configuration file issues. It is also used to see what values are actually being used by your Splunk Enterprise installation in the existing environment.
Command: /opt/splunk/bin/splunk btool inputs list
Here is the command: /opt/splunk/bin/splunk rollback cluster-bundle
/opt/splunk/bin/splunk set web-port <port_number>
The MapReduce algorithm is inspired by the map and reduce functions in functional programming and is used for batch-based, large-scale parallelization.
The default configuration files are located in $SPLUNK_HOME/etc/system/default
The lookup command is used to reference fields from an external CSV file that match fields in your event data.
This is done by keeping track of indexed events in a directory called the fishbucket, which contains seek pointers and CRCs for the indexed files. This way Splunk can check whether a file has already been indexed and avoid duplicate indexing.
CIM stands for Common Information Model.
CIM is used to normalize field names in the data so that we can search for the exact same field across different types of logs using a common field name.
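For example, assuming the CIM Authentication data model is installed and accelerated (it ships with the Splunk Common Information Model add-on), one normalized search works across every authentication log source:

| tstats count from datamodel=Authentication by Authentication.action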
The index-time precedence order is: system local, app local, app default, system default.
The search-time precedence order is: app local, app default, system local, system default.
In the raw event, a lazy regex matches as little text as possible, stopping at the first possible match, whereas a greedy regex matches as much text as possible, capturing everything up to the last possible match.
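A sketch with the rex command: given a raw event like user=alice,role=admin,role=root, the lazy pattern stops at the first comma (capturing alice), while the greedy one runs to the last comma (capturing alice,role=admin); the field names are placeholders:

... | rex field=_raw "user=(?<user_lazy>.*?),"
... | rex field=_raw "user=(?<user_greedy>.*),"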
Yes, using a bash script it is possible to install the Splunk forwarder remotely.
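A minimal sketch of such a script, assuming passwordless SSH to each target, a hosts.txt listing the hosts, and the forwarder tarball splunkforwarder.tgz in the current directory (all of these are assumptions):

#!/bin/bash
# Copy the forwarder package to each host, unpack it, and start it,
# accepting the license non-interactively.
for host in $(cat hosts.txt); do
  scp splunkforwarder.tgz "$host":/tmp/
  ssh "$host" 'tar -xzf /tmp/splunkforwarder.tgz -C /opt && /opt/splunkforwarder/bin/splunk start --accept-license --answer-yes --no-prompt'
done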
The search factor and replication factor need to be configured for an indexer cluster.
Yes, here are some default port numbers used in Splunk:
Splunk Web – 8000
Splunk management port (splunkd) – 8089
Indexing port – 9997
Splunk KV store – 8191
Syslog (UDP) – 514
/opt/splunk
/opt/splunk/etc/system/default
/opt/splunk/etc/apps
/opt/splunk/etc/deployment-apps
/opt/splunk/var/log/splunk
On a Linux system, it is located by default at /var/log/messages
/opt/splunk/var/lib/splunk
We use the forwarder tab on the DMC to monitor the status of our forwarders, and the deployment server to manage them.
https://regexr.com
Think of devices like routers and switches where we cannot install a forwarder; a syslog server is used to collect data from such devices.
That’s all for Splunk interview questions with answers. If you have any questions, please mention them in the comments. Thanks!!!